Evaluation for ekg

Evaluator: Automated evaluation (GPT-5.4)

Evaluated on: 2026-04-22

⚠️ Automated Evaluation: This evaluation was generated automatically using an AI-based system. It is distinct from manual evaluations curated by human experts. Please review findings carefully and report any inaccuracies.

Evaluation Criteria: This evaluation uses the KG-Registry evaluation rubric as described in Cortes et al. (2025). The rubric assesses knowledge graphs across multiple dimensions, including access, provenance, documentation, maintenance, and fitness for purpose.


Access Level and Types

Question | Answer | Comment
Access to data outside of the knowledge graph | Y | Raw epidemiological extractions are downloadable as CSV in addition to the graph serializations.
API or online access to the knowledge graph | Y | Public SPARQL endpoint at https://api-vast.jrc.service.ec.europa.eu/sparql/ and faceted browser at https://api-vast.jrc.service.ec.europa.eu/fct/.
Multiple access options available | Y | RDF/XML download, Turtle download, raw CSV download, SPARQL endpoint, and browser interface are all available.
Source code availability | N | The paper cites a GitHub repository for the extraction pipeline, but the cited repository was not publicly resolvable at evaluation time.
Downloadable knowledge graph | Y | The complete graph is downloadable from the JRC catalogue in RDF/XML and Turtle.

Section Score: 4/5
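A minimal sketch of querying the public SPARQL endpoint listed in the table above, using only the Python standard library. It assumes the endpoint follows the standard SPARQL 1.1 Protocol (HTTP GET with a `query` parameter) and can return `application/sparql-results+json`; neither behaviour was verified as part of this evaluation. The predicate-count query is a hypothetical example for inspecting edge types and assumes nothing about eKG-specific IRIs; a full scan like this may be slow or time out on a large public endpoint.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as listed in the evaluation table.
ENDPOINT = "https://api-vast.jrc.service.ec.europa.eu/sparql/"

# Hypothetical example query: count how often each predicate (edge type) is used.
PREDICATE_COUNT_QUERY = """
SELECT ?p (COUNT(*) AS ?n)
WHERE { ?s ?p ?o }
GROUP BY ?p
ORDER BY DESC(?n)
LIMIT 20
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON results document into {variable: value} dicts."""
    data = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list[dict[str, str]]:
    """Send the query over HTTP and return flattened bindings (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return parse_bindings(resp.read().decode("utf-8"))
```

Calling `run_query(ENDPOINT, PREDICATE_COUNT_QUERY)` requires network access; request building and JSON parsing are kept separate so they can be checked offline against a saved response.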

Provenance of Nodes and Edges

Question | Answer | Comment
Source list provided | Y | The paper clearly states that eKG is extracted from WHO Disease Outbreak News (DON) and enriched through BioPortal and GeoNames mappings.
Source versions information | N | No explicit versioning or snapshot identifiers are given for the upstream WHO DON corpus or for the external ontologies used in alignment.
Import dependencies | Y | The construction reuses IDO, BFO, OBO Foundry terms, BioPortal ontology links, and GeoNames identifiers.
Node and edge sources | Y | The paper explains how outbreak entities come from DON reports and how diseases and locations are linked to external ontology and GeoNames resources.
Edges deduplication | N | The paper discusses minimizing duplication through ontology grounding, but it does not document a specific edge deduplication procedure.
Triples source details | N | Provenance is described at the dataset and entity-linking level, not as explicit triple-level provenance metadata.
Edge type schema | Y | Relations are described using RDF/OWL with reused ontology properties such as skos:related and OBO relations, and the paper documents the TBox/ABox structure.

Section Score: 4/7
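The section scores in this evaluation are consistent with awarding one point per Y answer out of the number of questions in the section; this is an inference from the reported numbers, not a rule stated in the evaluation itself. A short sketch of that arithmetic, applied to the Provenance table:

```python
def section_score(answers: list[str]) -> str:
    """Return 'k/n', where k counts 'Y' answers and n is the number of questions.

    Assumes one point per 'Y' (inferred from the reported section scores).
    """
    yes = sum(1 for a in answers if a == "Y")
    return f"{yes}/{len(answers)}"


# Answers from the Provenance of Nodes and Edges table, in row order.
provenance = {
    "Source list provided": "Y",
    "Source versions information": "N",
    "Import dependencies": "Y",
    "Node and edge sources": "Y",
    "Edges deduplication": "N",
    "Triples source details": "N",
    "Edge type schema": "Y",
}

print(section_score(list(provenance.values())))  # 4/7
```

The same count reproduces the other section scores, for example 2/5 for Update frequency and versioning.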

Documented standards, schema, construction

Question | Answer | Comment
Biological usable data | Y | The graph is designed for epidemiological and public-health analysis over infectious disease outbreaks.
Resolvable IDs | Y | The graph uses resolvable JRC namespace IRIs and links out to BioPortal ontology entities and GeoNames identifiers.
Construction documentation | Y | The Scientific Data article provides a detailed methods section describing extraction, FAIR publishing, ontology alignment, and services.
Transformation documentation | Y | The paper documents the ETL and LLM ensemble pipeline used to transform WHO DON reports into structured extractions and then into RDF/OWL.
Schema used | Y | eKG explicitly uses RDF, OWL, Linked Open Data principles, and ontology reuse from IDO/BFO/OBO with documented class mappings.

Section Score: 5/5

Update frequency and versioning

Question | Answer | Comment
Stable versions | N | The dataset has persistent URLs and a DOI-backed landing page, but no explicit versioned releases are exposed in the curated metadata.
Public tracker information | N | No public issue tracker or repository tracker was accessible at evaluation time.
Knowledge graph contact information | Y | The JRC dataset page and article provide Sergio Consoli as a contact.
Updated annually | Y | The paper describes the resource as updated daily, and the JRC catalogue records a modified date of 2025-10-10.
Prior versions access | N | No archive of prior graph releases or historical snapshots was identified from the public landing page.

Section Score: 2/5

Evaluation - Metrics and Fitness for Purpose

Question | Answer | Comment
Use case provided | Y | The stated purpose is epidemiological research, outbreak surveillance, and structured querying over WHO DON reports.
Evaluation against other models | Y | The paper compares the ensemble against multiple open-source LLMs and OpenAI GPT baselines for extraction tasks.
Defined scope | Y | The resource scope is clearly defined as outbreak information extracted from WHO Disease Outbreak News and enriched for public-health knowledge representation.
Multiple evaluation methods | Y | The paper reports benchmark classification metrics on a gold-standard subset and also includes a qualitative comparison of reconstructed outbreak trends against WHO counts.
Accuracy metrics | Y | Precision, Recall, and F1 are reported across multiple extraction tasks, with the ensemble achieving the best F1 values in the presented tables.

Section Score: 5/5
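For reference, the standard definitions behind the reported metrics are Precision P = TP/(TP+FP), Recall R = TP/(TP+FN), and F1 = 2PR/(P+R). The sketch below computes them from raw counts. The counts in the example are hypothetical and do not come from the eKG paper, which may also aggregate across tasks or classes differently (for example, micro- versus macro-averaging).

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F1 from true positive, false positive and false negative counts."""
    # Guard each ratio so an empty denominator yields 0.0 instead of ZeroDivisionError.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts for a single extraction task (not taken from the eKG paper).
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.800 R=0.667 F1=0.727
```

Because F1 is a harmonic mean, it stays closer to the weaker of precision and recall, which is why it is a common single figure for comparing extraction systems.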

License Information

Question | Answer | Comment
License | Y | The paper and JRC catalogue state that the produced data and ontology are available under CC BY 4.0.