Evaluation for primekg
Evaluator: Not specified
Evaluated on: 2025-08-14
This is a manual evaluation intended to identify potential barriers to reuse.
Access Level and Types
Question | Answer | Comment |
---|---|---|
Access to data outside of the knowledge graph | Y | ClinicalBERT-based embeddings were used to group disease nodes, providing an embedding-derived version of the graph |
API or online access to the knowledge graph | N | |
Multiple access options available | Y | Available via Harvard Dataverse as the raw KG (kg_raw.csv) and its largest connected component (kg_giant.csv) https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/IXA7BM |
Source code availability | Y | Full source code is available on GitHub https://github.com/mims-harvard/PrimeKG |
Downloadable knowledge graph | Y | Harvard Dataverse Repo hosts the downloadable KG and intermediate files https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/IXA7BM |
Section Score: 4/5
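
The relationship between a raw graph and its largest connected component, as noted in the access options above, can be illustrated with a minimal sketch. It assumes an undirected edge list of (source, target) pairs. The function name and the toy node labels are hypothetical and are not taken from the PrimeKG code base.

```python
from collections import defaultdict, deque


def largest_connected_component(edges):
    """Return the set of nodes in the largest connected component.

    `edges` is an iterable of (source, target) pairs, treated as undirected.
    """
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)

    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Toy edge list (hypothetical labels, not real PrimeKG identifiers).
toy_edges = [
    ("drug_A", "gene_1"),
    ("gene_1", "disease_X"),
    ("drug_B", "gene_2"),
]
giant_nodes = largest_connected_component(toy_edges)
# Keep only the edges whose endpoints both lie in the largest component.
giant_edges = [(s, t) for s, t in toy_edges if s in giant_nodes and t in giant_nodes]
```

On the toy input, the two-edge chain through `gene_1` is retained and the isolated `drug_B`/`gene_2` pair is dropped.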
Provenance of Nodes and Edges
Question | Answer | Comment |
---|---|---|
Source list provided | Y | 20 primary data sources listed, including DisGeNET, DrugBank, UMLS, Orphanet, etc. in the paper https://www.nature.com/articles/s41597-023-01960-3 |
Source versions information | Y | Explicit versions and download dates provided for each dataset in the Methods/Data Records section |
Import dependencies | Y | Partially: tools such as goatools, beautifulsoup, regex scripts, and vocabulary mappings are mentioned in the GitHub repo, but not all formal dependencies are listed |
Node and edge sources | Y | Each node includes a node source field; edges are annotated by type and origin |
Edges deduplication | Y | Duplicates and self-loops were removed during KG preprocessing and merging |
Triples source details | Y | Clear documentation on what triples were derived from which resource (e.g., drug–protein from DrugBank, phenotype–disease from HPO) |
Edge type schema | Y | The paper documents a schema of 30 edge types and their source ontologies |
Section Score: 7/7
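
The edge deduplication step noted above (removal of duplicates and self-loops) can be sketched as follows. This is an illustrative example only, assuming triples are (head, relation, tail) tuples and that only exact duplicates are dropped; the function name and toy triples are hypothetical, and the actual PrimeKG preprocessing may differ, for example in how reversed edges are handled.

```python
def clean_triples(triples):
    """Drop self-loops and exact duplicate triples, keeping first-seen order."""
    seen = set()
    cleaned = []
    for head, relation, tail in triples:
        # A self-loop connects a node to itself and is discarded.
        if head == tail:
            continue
        key = (head, relation, tail)
        # Skip triples that have already been emitted.
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(key)
    return cleaned


# Toy triples (hypothetical labels, not real PrimeKG identifiers).
toy_triples = [
    ("drug_A", "targets", "gene_1"),
    ("drug_A", "targets", "gene_1"),  # exact duplicate
    ("gene_1", "interacts_with", "gene_1"),  # self-loop
    ("gene_1", "associated_with", "disease_X"),
]
cleaned_triples = clean_triples(toy_triples)
```

Four input triples reduce to two: the repeated drug-target edge is kept once and the self-loop is removed.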
Documented standards, schema, construction
Question | Answer | Comment |
---|---|---|
Biological usable data | Y | Clinical and pharmacological text features are readable and interpretable (e.g., Mayo Clinic descriptions, DrugBank pharmacodynamics) |
Resolvable IDs | Y | Uses Mondo, DrugBank, HPO, MeSH, Entrez Gene IDs, and UMLS CUIs, which are mappable and resolvable via external resources |
Construction documentation | Y | Documented extensively in the paper and the GitHub repository |
Transformation documentation | Y | Transformations like self-loop removal, duplicate dropping, phenotype-disease resolution, and mapping across ontologies are documented |
Schema used | Y | Node and edge formats, and their standardized schema, are explained in the methodology and data files |
Section Score: 5/5
Update frequency and versioning
Question | Answer | Comment |
---|---|---|
Stable versions | N | No version tags (e.g., v1.0, v1.1) are mentioned or used on Dataverse or GitHub |
Public tracker information | N | The GitHub Issues tab is not actively used for public feature requests or bug tracking |
Knowledge graph contact information | Y | Maintained by Zitnik Lab at Harvard with lab contact and GitHub maintainers listed |
Updated annually | N | Only one release (May 2022) is available as of the evaluation date |
Prior versions access | N | No archived prior versions or changelog indicating updates |
Section Score: 1/4
Evaluation - Metrics and Fitness for Purpose
Question | Answer | Comment |
---|---|---|
Use case provided | Y | Autism case study demonstrates disease concept resolution and clinical alignment |
Evaluation against other models | Y | Compared to other KGs (e.g., SPOKE); benchmarks and references to prior systems included |
Defined scope | Y | Focused on disease-centric precision medicine with defined coverage: 17,080 diseases, 10 biological scales, 20 sources |
Multiple evaluation methods | Y | Structure connectivity, edge density, text embedding-based grouping, and clinical relevance tested |
Accuracy metrics | Y | Partially: uses similarity thresholds (e.g., cosine ≥ 0.98 for disease grouping); no formal metrics such as precision/recall are provided |
Section Score: 5/5
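
The similarity threshold mentioned under accuracy metrics (cosine ≥ 0.98) can be illustrated with a minimal sketch. It assumes each disease name maps to a non-zero embedding vector; the function names and the two-dimensional toy vectors are hypothetical stand-ins and are not real ClinicalBERT embeddings or PrimeKG code.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similar_pairs(embeddings, threshold=0.98):
    """Return name pairs whose cosine similarity meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(embeddings), 2)
        if cosine_similarity(embeddings[a], embeddings[b]) >= threshold
    ]


# Toy embeddings (hypothetical values chosen for illustration).
toy_embeddings = {
    "disease_A": [1.0, 0.0],
    "disease_B": [0.99, 0.05],
    "disease_C": [0.0, 1.0],
}
pairs = similar_pairs(toy_embeddings)
```

Only `disease_A` and `disease_B` point in nearly the same direction, so they form the single candidate pair for grouping. A threshold alone gives no indication of how many such groupings are correct, which is why the absence of precision/recall is noted above.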
License Information
Question | Answer | Comment |
---|---|---|
License | CC BY 4.0 |