GitHub - Ebiquity/wdtools: Tools to link entities and concepts in text to Wikidata items and to extract useful information from the Wikidata knowledge graph

Wikidata entity linking tools

Tools to link entities and concepts in text to Wikidata items and to extract useful information from the Wikidata knowledge graph

Entities play an important role in text and are often used to describe what the text is about. One approach we evaluated was to find entity mentions in a report or document, link these to corresponding entities in the Wikidata knowledge graph, and use their information to improve the search.

Document entities

We use spaCy to process the text of a report or document and identify named entity mentions along with their types. The standard spaCy pipelines for most common languages come with a named entity recognition module that has been trained to identify the relatively limited set of OntoNotes types (Hovy et al., 2006). While it works reasonably well, it misses some named entities and often assigns the wrong type to the entities it does find.
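As a rough illustration of this step, the sketch below runs a stock spaCy pipeline over a text and tallies the entity types it assigns. The function names and the choice of the `en_core_web_sm` model are assumptions made for the example, not taken from this repository's code.

```python
from collections import Counter


def extract_entities(text, model_name="en_core_web_sm"):
    """Return (mention text, entity type) pairs found by a spaCy pipeline.

    spaCy is imported lazily so that count_labels() below can be used
    without it. The model name is an assumption; any pipeline that
    includes an NER component should work.
    """
    import spacy  # needs: pip install spacy; python -m spacy download en_core_web_sm

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def count_labels(entities):
    """Count how many mentions were assigned each entity type."""
    return Counter(label for _, label in entities)
```

Calling `count_labels(extract_entities(report_text))` gives a quick view of which types dominate a document, which can help spot the mistyped entities mentioned above.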

spaCy's predefined pipelines also do not perform coreference resolution, i.e., identifying the set of entity mentions, nominal mentions and pronouns that refer to the same entity. This can be important for many subsequent tasks, such as counting how often an entity is mentioned in a document. We experimented with adding a simple coreference tool that recognizes name shortening (e.g., "Joe Biden" and "Biden") and abbreviations (e.g., "World Health Organization" and "WHO").
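A minimal sketch of such a heuristic follows. It is not the repository's implementation: the function names are hypothetical, and it assumes that the full form of a name appears in the text before its shortened form or abbreviation.

```python
def initials(name):
    """Build an acronym from the capitalized words of a name."""
    return "".join(word[0] for word in name.split() if word[0].isupper())


def is_short_form(short, full):
    """True if `short` looks like a shortened name or abbreviation of `full`."""
    if short == full:
        return False
    short_tokens, full_tokens = short.split(), full.split()
    # Name shortening: fewer tokens, all of them taken from the full name.
    if len(short_tokens) < len(full_tokens) and all(
        token in full_tokens for token in short_tokens
    ):
        return True
    # Abbreviation: an all-caps string equal to the initials of the full name.
    return len(short) > 1 and short.isupper() and short == initials(full)


def cluster_mentions(mentions):
    """Group mentions in document order.

    Each cluster starts with the first (assumed full) form; later mentions
    join the first cluster whose head they repeat or shorten.
    """
    clusters = []
    for mention in mentions:
        for cluster in clusters:
            if mention == cluster[0] or is_short_form(mention, cluster[0]):
                cluster.append(mention)
                break
        else:
            clusters.append([mention])
    return clusters
```

The length of each cluster then gives a mention count per entity. A heuristic this simple will over-merge in some cases, for example two different people who share a surname.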

The OntoNotes named entities refer to an instance of a type, such as an individual person or organization, a specific location, or a nationality. We also experimented with identifying potential mentions of concepts that might be linkable, such as "letter bomb", "lava flow" or "potentially hazardous asteroid". Our strategy was simple: look for a nominal compound, possibly preceded by an adjective.
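One way to read this strategy is sketched below. It works on (word, part-of-speech) pairs that use Universal POS tags such as `NOUN` and `ADJ`, which is what spaCy exposes as `token.pos_`. The function name and the rule that a candidate must span at least two tokens are assumptions made for the example.

```python
def concept_candidates(tagged_tokens):
    """Return candidate concept phrases from a list of (word, POS) pairs.

    A candidate is a maximal run of NOUN tokens, optionally preceded by
    one ADJ token, spanning at least two tokens in total.
    """
    candidates = []
    i, n = 0, len(tagged_tokens)
    while i < n:
        start = i
        # Optional leading adjective.
        if tagged_tokens[i][1] == "ADJ":
            i += 1
        noun_start = i
        # Consume the run of nouns that follows.
        while i < n and tagged_tokens[i][1] == "NOUN":
            i += 1
        if i > noun_start:
            if i - start >= 2:
                words = [word for word, _ in tagged_tokens[start:i]]
                candidates.append(" ".join(words))
        elif i == start:
            i += 1  # neither an adjective nor a noun: skip this token
        # Otherwise an adjective with no noun after it was consumed; move on.
    return candidates
```

For a sentence tagged as `[("A", "DET"), ("hazardous", "ADJ"), ("lava", "NOUN"), ("flow", "NOUN"), ("destroyed", "VERB")]`, the function returns `["hazardous lava flow"]`. Each candidate can then be looked up in Wikidata in the same way as a named entity mention.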

Wikidata entities

Wikidata is a collaboratively edited multilingual knowledge graph that is intended to provide common data for Wikipedia sites and other applications. It currently has about one billion facts on about 100 million items. It has a web interface that supports exploration and editing by people, a set of APIs for accessing its information programmatically, and a SPARQL endpoint for querying the RDF model of the knowledge graph. Wikidata's ontology has a very fine-grained type system with more than two million types and a much smaller set of properties, currently about NNNNN in number.
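To make the two programmatic access routes concrete, the sketch below builds request URLs for the MediaWiki `wbsearchentities` action and for the public SPARQL endpoint. The helper names and default parameter values are assumptions for illustration; the code only constructs URLs and does not send any request.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def search_url(mention, language="en", limit=5):
    """URL for wbsearchentities, which finds items by label or alias."""
    params = {
        "action": "wbsearchentities",
        "search": mention,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def sparql_url(query):
    """URL that submits a SPARQL query and asks for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

Either URL can be fetched with `urllib.request` or a similar HTTP client. Wikimedia asks clients to send a descriptive `User-Agent` header and to keep request rates modest.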

Wikidata is inherently multilingual and draws on data from Wikipedia sites in nearly 300 different languages. It gives each item an identifier beginning with the letter Q, such as Q64780099 (the Human Language Technology Center of Excellence), and each property an identifier beginning with the letter P. For example, P31 is the property "instance of", which links an item to one of its immediate types, and P279 ("subclass of") links a type to one of its immediate supertypes.
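Together, P31 and P279 let us collect every type an item belongs to: follow one "instance of" link, then any number of "subclass of" links. The sketch below expresses this both as a SPARQL property path and as a traversal over small in-memory dictionaries. The function names and the toy identifiers in the usage note are hypothetical and do not reflect actual Wikidata content.

```python
def types_query(qid):
    """SPARQL for all types of an item: one P31 step, then zero or more P279 steps."""
    return f"SELECT DISTINCT ?type WHERE {{ wd:{qid} wdt:P31/wdt:P279* ?type . }}"


def all_types(item, instance_of, subclass_of):
    """Perform the same traversal over in-memory dictionaries.

    `instance_of` maps an item to its immediate types (P31) and
    `subclass_of` maps a type to its immediate supertypes (P279).
    Types are returned in breadth-first order without duplicates.
    """
    seen = []
    frontier = list(instance_of.get(item, []))
    while frontier:
        current = frontier.pop(0)
        if current not in seen:
            seen.append(current)
            frontier.extend(subclass_of.get(current, []))
    return seen
```

With the toy data `instance_of = {"item_x": ["research_center"]}` and `subclass_of = {"research_center": ["organization"], "organization": ["agent"]}`, the call `all_types("item_x", instance_of, subclass_of)` returns `["research_center", "organization", "agent"]`. The string from `types_query("Q64780099")` can be submitted to the SPARQL endpoint to run the equivalent query against the live graph.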

Entity linking

...

Folders and files

evaluation
examples
notebooks
README.md
abbreviation.py
create_mesh_items_set_pkl.py
entity_types.py
evaluate_column.py
gkg_search.py
linking_tests.tsv
mesh_items.pkl
mesh_items.txt
pd_all_annotations.ipynb
pd_procure.ipynb
procure.ipynb
procure_column_eval.ipynb
procure_config.yml
scale_reports.py
spacy_ner_reports.py
spacy_plus.py
tensor2attr.py
test.py
throttle.ctrl
user-config.py
wd_search.py
wd_search_config.yml
wd_search_default_config.yml
wd_search_scale_config.yml
wdsearch.ipynb
