Evaluation for wikidata

Evaluator: Automated Evaluation

Evaluated on: 2026-01-06

⚠️ Automated Evaluation: This evaluation was generated automatically using an AI-based system. It is distinct from manual evaluations curated by human experts. Please review findings carefully and report any inaccuracies.

Evaluation Criteria: This evaluation uses the KG-Registry evaluation rubric as described in Cortes et al. (2025). The rubric assesses knowledge graphs across multiple dimensions, including access, provenance, documentation, maintenance, and fitness for purpose.


Access Level and Types

| Question | Answer | Comment |
| --- | --- | --- |
| Access to data outside of the knowledge graph | Y | Wikidata web portal at wikidata.org provides a browsing and editing interface for 119+ million data items |
| API or online access to the knowledge graph | Y | SPARQL query service at query.wikidata.org supports complex semantic queries; MediaWiki Action API also available |
| Multiple access options available | Y | Five documented access methods: web portal, SPARQL endpoint, query editor, bulk dumps, and REST API |
| Source code availability | Y | MediaWiki-based platform; source code publicly available through Wikimedia organization repositories |
| Downloadable knowledge graph | Y | Complete database dumps in JSON, RDF/XML, and TTL formats available as compressed archives at dumps.wikimedia.org |

Section Score: 5/5
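
The SPARQL access path noted above can be illustrated with a minimal sketch using only the Python standard library. The endpoint URL is the public one at query.wikidata.org; the helper names (`build_sparql_request`, `extract_bindings`, `run_query`), the placeholder User-Agent string, and the example query are assumptions made for illustration and are not part of the evaluation.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_sparql_request(query: str) -> Request:
    """Build a GET request that asks the endpoint for SPARQL JSON results."""
    url = f"{SPARQL_ENDPOINT}?{urlencode({'query': query})}"
    headers = {
        "Accept": "application/sparql-results+json",
        # Placeholder value; replace with a descriptive agent and contact.
        "User-Agent": "kg-eval-example/0.1 (placeholder contact)",
    }
    return Request(url, headers=headers)


def extract_bindings(payload: dict) -> list[dict]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]


def run_query(query: str, timeout: float = 30.0) -> list[dict]:
    """Send the query over the network and return the flattened rows."""
    with urlopen(build_sparql_request(query), timeout=timeout) as resp:
        return extract_bindings(json.load(resp))


# Illustrative query: three items that are instances (P31) of house cat (Q146).
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 3
"""
```

Calling `run_query(EXAMPLE_QUERY)` would issue a live request; the request-building and parsing helpers are kept separate so they can be checked without network access.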

Provenance of Nodes and Edges

| Question | Answer | Comment |
| --- | --- | --- |
| Source list provided | Y | Data sourced from Wikipedia articles, Wikimedia projects, and external linked open data sources |
| Source versions information | N | No explicit versioning of upstream Wikipedia or external data sources; continuous updates without version tracking |
| Import dependencies | Y | Explicit relationships documented for Wikipedia article mappings and interlinking with VIAF, GND, and other identifiers |
| Node and edge sources | Y | Each item traceable to a Wikimedia source; edit history provides attribution of knowledge statements |
| Edges deduplication | N | The community identifies duplicates, but no formal deduplication algorithm is documented; merging is handled ad hoc |
| Triples source details | N | RDF export schema partially documented, but source attribution for individual statements not explicit |
| Edge type schema | N | Extensive property vocabulary used, but mapping to standard ontologies (RDF, OWL) not fully formalized |

Section Score: 3/7

Documented standards, schema, construction

| Question | Answer | Comment |
| --- | --- | --- |
| Biological usable data | Y | Wikidata includes extensive biomedical data: genes, proteins, diseases, drugs, and biological pathways |
| Resolvable IDs | Y | Wikidata IDs (Q-identifiers) are stable and resolvable; cross-references to standard identifiers (UniProt, NCBI Gene, etc.) |
| Construction documentation | N | Edit history is transparent, but no formal KG construction methodology is documented; wiki-based approach |
| Transformation documentation | N | RDF/TTL dump generation process not formally documented; no data quality control procedures published |
| Schema used | N | Uses the Wikidata property model; mapping to RDF schema and standard ontologies incomplete |

Section Score: 2/5
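
As an illustration of what "resolvable" means for Q-identifiers, the sketch below builds the concept URI and a `Special:EntityData` download URL for an item. The helper names and the validation pattern are assumptions for this example; Q42 is used purely as a sample identifier.

```python
import re

# Item identifiers are the letter Q followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")

# Serializations assumed here for Special:EntityData.
ENTITY_DATA_FORMATS = {"json", "ttl", "rdf", "nt"}


def validate_qid(qid: str) -> str:
    """Return the identifier unchanged, or raise if it is not a Q-identifier."""
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    return qid


def concept_uri(qid: str) -> str:
    """Canonical URI that names the item in RDF data."""
    return f"http://www.wikidata.org/entity/{validate_qid(qid)}"


def entity_data_url(qid: str, fmt: str = "json") -> str:
    """URL that returns the item's data in the requested serialization."""
    if fmt not in ENTITY_DATA_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"https://www.wikidata.org/wiki/Special:EntityData/{validate_qid(qid)}.{fmt}"
```

For example, `entity_data_url("Q42", "ttl")` yields a URL for the Turtle serialization of the sample item, while a property identifier such as `"P31"` is rejected because this sketch only covers items.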

Update frequency and versioning

| Question | Answer | Comment |
| --- | --- | --- |
| Stable versions | Y | Database dumps published regularly with dates; snapshot releases available for reproducibility |
| Public tracker information | N | Phabricator system used for tracking but not specifically scoped to Wikidata KG development |
| Knowledge graph contact information | Y | Wikimedia Foundation provides support; contact available through wikidata.org/wiki/Wikidata:Contact |
| Updated annually | Y | Continuously updated knowledge base; new dumps published regularly (weekly/monthly snapshot releases) |
| Prior versions access | Y | Historical dumps available through archive; complete edit history accessible for any item or statement |

Section Score: 4/5

Evaluation - Metrics and Fitness for Purpose

| Question | Answer | Comment |
| --- | --- | --- |
| Use case provided | Y | Clear use cases: central structured data repository for Wikipedia, machine-readable linked data access, integration hub |
| Evaluation against other models | N | No formal comparison with other general-purpose knowledge graphs (DBpedia, YAGO); relative completeness not quantified |
| Defined scope | Y | Scope well-defined: comprehensive general knowledge base with 119+ million items covering all domains |
| Multiple evaluation methods | N | No systematic evaluation framework published; quality assessment relies on community and edit history |
| Accuracy metrics | N | No reported accuracy metrics, precision/recall, or data quality benchmarks; community-driven validation |

Section Score: 2/5
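
The section scores reported so far appear to be the count of Y answers over the number of questions in each section. The sketch below re-derives them from the answers as transcribed from the tables above; the function names are assumptions, and the License section is left out because its answers are not shown here.

```python
# Y/N answers per section, in the order the questions appear above.
ANSWERS = {
    "Access Level and Types": "YYYYY",
    "Provenance of Nodes and Edges": "YNYYNNN",
    "Documented standards, schema, construction": "YYNNN",
    "Update frequency and versioning": "YNYYY",
    "Evaluation - Metrics and Fitness for Purpose": "YNYNN",
}


def section_score(answers: str) -> tuple[int, int]:
    """Return (number of Y answers, number of questions) for one section."""
    return answers.count("Y"), len(answers)


def tally(sections: dict[str, str]) -> tuple[dict[str, str], tuple[int, int]]:
    """Format each section as 'met/total' and sum across all given sections."""
    formatted = {}
    met_total, question_total = 0, 0
    for name, answers in sections.items():
        met, total = section_score(answers)
        formatted[name] = f"{met}/{total}"
        met_total += met
        question_total += total
    return formatted, (met_total, question_total)


scores, overall = tally(ANSWERS)
```

Under this reading, the five sections shown sum to 16 of 27 criteria met; that figure is a derived illustration and not a score stated by the evaluation.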

License Information

QuestionAnswerComment
License