Evaluation for ekg
Evaluator: Automated evaluation (GPT-5.4)
Evaluated on: 2026-04-22
⚠️ Automated Evaluation: This evaluation was generated automatically using an AI-based system. It is distinct from manual evaluations curated by human experts. Please review findings carefully and report any inaccuracies.
Evaluation Criteria: This evaluation uses the KG-Registry evaluation rubric as described in Cortes et al. (2025). The rubric assesses knowledge graphs across multiple dimensions including access, provenance, documentation, maintenance, and fitness for purpose.
Access Level and Types
| Question | Answer | Comment |
|---|---|---|
| Access to data outside of the knowledge graph | Y | Raw epidemiological extractions are downloadable as CSV in addition to the graph serializations. |
| API or online access to the knowledge graph | Y | Public SPARQL endpoint at https://api-vast.jrc.service.ec.europa.eu/sparql/ and faceted browser at https://api-vast.jrc.service.ec.europa.eu/fct/. |
| Multiple access options available | Y | RDF/XML download, Turtle download, raw CSV download, SPARQL endpoint, and browser interface are all available. |
| Source code availability | N | The paper cites a GitHub repository for the extraction pipeline, but that repository did not resolve publicly at evaluation time. |
| Downloadable knowledge graph | Y | The complete graph is downloadable from the JRC catalogue in RDF/XML and Turtle. |
Section Score: 4/5
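As an illustration of the API access noted above, the following is a minimal sketch of querying the public SPARQL endpoint with the Python standard library. It assumes the endpoint follows the standard SPARQL 1.1 Protocol (a `query` parameter over GET, JSON results negotiated through the `Accept` header); this was not verified as part of the evaluation. The probe query is deliberately schema-agnostic, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as listed in the access table above.
ENDPOINT = "https://api-vast.jrc.service.ec.europa.eu/sparql/"

# Schema-agnostic probe: fetch any five triples (no eKG classes assumed).
PROBE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request per the SPARQL 1.1 Protocol (assumed to be supported)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload):
    """Flatten a SPARQL JSON result document into a list of {variable: value} dicts."""
    data = json.loads(payload)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query over the network and return the parsed bindings."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as resp:
        return parse_bindings(resp.read().decode("utf-8"))
```

Calling `run_query(PROBE_QUERY)` requires network access and returns whatever triples the endpoint serves at that moment, so no expected output is shown here.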
Provenance of Nodes and Edges
| Question | Answer | Comment |
|---|---|---|
| Source list provided | Y | The paper clearly states that eKG is extracted from WHO Disease Outbreak News and enriched through BioPortal and GeoNames mappings. |
| Source versions information | N | No explicit versioning or snapshot identifiers are given for the upstream WHO DON corpus or the external ontologies used in alignment. |
| Import dependencies | Y | The construction reuses IDO, BFO, OBO Foundry terms, BioPortal ontology links, and GeoNames identifiers. |
| Node and edge sources | Y | The paper explains how outbreak entities come from DON reports and how diseases and locations are linked to external ontology and GeoNames resources. |
| Edges deduplication | N | The paper discusses minimizing duplication through ontology grounding, but it does not document a specific edge deduplication procedure. |
| Triples source details | N | Provenance is described at the dataset and entity-linking level, not as explicit triple-level provenance metadata. |
| Edge type schema | Y | Relations are described using RDF/OWL with reused ontology properties such as skos:related and OBO relations, and the paper documents the TBox/ABox structure. |
Section Score: 4/7
Documented standards, schema, construction
| Question | Answer | Comment |
|---|---|---|
| Biological usable data | Y | The graph is designed for epidemiological and public-health analysis over infectious disease outbreaks. |
| Resolvable IDs | Y | The graph uses resolvable JRC namespace IRIs and links out to BioPortal ontology entities and GeoNames identifiers. |
| Construction documentation | Y | The Scientific Data article provides a detailed methods section describing extraction, FAIR publishing, ontology alignment, and services. |
| Transformation documentation | Y | The paper documents the ETL and LLM ensemble pipeline used to transform WHO DON reports into structured extractions and then into RDF/OWL. |
| Schema used | Y | eKG explicitly uses RDF, OWL, Linked Open Data principles, and ontology reuse from IDO/BFO/OBO with documented class mappings. |
Section Score: 5/5
Update frequency and versioning
| Question | Answer | Comment |
|---|---|---|
| Stable versions | N | The dataset has persistent URLs and a DOI-backed landing page, but no explicit versioned releases are exposed in the curated metadata. |
| Public tracker information | N | No public issue tracker or repository tracker was accessible at evaluation time. |
| Knowledge graph contact information | Y | The JRC dataset page and article provide Sergio Consoli as a contact. |
| Updated annually | Y | The paper describes the resource as updated daily, which exceeds the annual threshold, and the JRC catalogue records a last-modified date of 2025-10-10. |
| Prior versions access | N | No archive of prior graph releases or historical snapshots was identified from the public landing page. |
Section Score: 2/5
Evaluation - Metrics and Fitness for Purpose
| Question | Answer | Comment |
|---|---|---|
| Use case provided | Y | The stated purpose is epidemiological research, outbreak surveillance, and structured querying over WHO DON reports. |
| Evaluation against other models | Y | The paper compares the ensemble against multiple open-source LLMs and OpenAI GPT baselines for extraction tasks. |
| Defined scope | Y | The resource scope is clearly defined as outbreak information extracted from WHO Disease Outbreak News and enriched for public-health knowledge representation. |
| Multiple evaluation methods | Y | The paper reports benchmark classification metrics on a gold-standard subset and also includes a qualitative comparison of reconstructed outbreak trends against WHO counts. |
| Accuracy metrics | Y | Precision, Recall, and F1 are reported across multiple extraction tasks, with the ensemble achieving the best F1 values in the presented tables. |
Section Score: 5/5
License Information
| Question | Answer | Comment |
|---|---|---|
| License | Y | The paper and JRC catalogue state that the produced data and ontology are available under CC BY 4.0. |
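
The section scores in this report appear to equal the number of Y answers over the number of questions in each table. The following is a minimal sketch for rechecking a score from a section's table rows under that assumed scoring rule; the rubric's actual scoring procedure is not reproduced in this report, and the function name is hypothetical. The sample rows are taken from the "Update frequency and versioning" table with the comment column omitted.

```python
def section_score(table_rows):
    """Return (yes_count, question_count) for rows shaped '| Question | Answer | Comment |'."""
    answers = []
    for row in table_rows:
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        # Skip the header and separator rows, whose second cell is not Y or N.
        if len(cells) < 2 or cells[1] not in ("Y", "N"):
            continue
        answers.append(cells[1])
    return answers.count("Y"), len(answers)


# Rows from "Update frequency and versioning" (comments omitted for brevity).
versioning_rows = [
    "| Question | Answer | Comment |",
    "|---|---|---|",
    "| Stable versions | N | (comment omitted) |",
    "| Public tracker information | N | (comment omitted) |",
    "| Knowledge graph contact information | Y | (comment omitted) |",
    "| Updated annually | Y | (comment omitted) |",
    "| Prior versions access | N | (comment omitted) |",
]

yes_count, total = section_score(versioning_rows)
print(f"Section Score: {yes_count}/{total}")
```

Under this rule the sample reproduces the 2/5 reported for that section.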