The Wikipedia Citations dataset currently includes around 30M citations from Wikipedia pages to a variety of sources – of which 4M are

Wikipedia Citations in Wikidata

submited by
Style Pass
2021-07-15 22:00:08

The Wikipedia Citations dataset currently includes around 30M citations from Wikipedia pages to a variety of sources – of which 4M are to scientific publication. The increase of the connection with external data services and the provision of structured data to one of the key elements of Wikipedia articles has two significant benefits: first of all, a better discoverability of relevant encyclopedic articles related to scholarly studies; furthermore, the enacting of Wikipedia as a social authority and policy hub which would enable policymakers to assess the importance of an article, person, research group and institution by looking at how many Wikipedia articles cite them. 

These are the motivations behind the “Wikipedia Citations in Wikidata” project, supported by a grant from the WikiCite Initiative. From January 2021 until the end of April, the team of Silvio Peroni (co-founder and director of OpenCitations), Giovanni Colavizza, Marilena Daquino, Gabriele Pisciotta and Simone Persiani from the University of Bologna (Department of Classical Philology and Italian Studies) has been working in developing a codebase to enrich Wikidata with citations to scholarly publications that are currently referenced in English Wikipedia. This codebase consists of four software modules in Python and integrates new components (a classifier to distinguish citations by cited source and a look-up module to equip citations with identifiers from Crossref or other APIs). In so doing, Wikipedia Citations extends upon prior work which only focused on citations already equipped with identifiers.  

In the first two steps of the workflow (extractor and converter) the mapping between the various ways Wikipedia citations are represented in Wikipedia articles and the OpenCitations Data Model (OCDM) has been implemented and then enriched with a component responsible to find new identifiers to the entities in a dataset compliant with OCDM (enricher), while in the pusher step the mapping between the OCDM and Wikidata has been enabled, and the code has been finally released in GitHub.  

Leave a Comment