Evaluation for pharmkg
Evaluator: Not specified
Evaluated on: 2025-08-14
This is a manual evaluation intended to identify potential barriers to reuse.
Access Level and Types
Question | Answer | Comment |
---|---|---|
Access to data outside of the knowledge graph | Y | PharmKG provides embeddings (learned low-dimensional representations of entities and relations from KGE models), top predicted paths for mechanistic interpretation (Fig 6), and t-SNE visualizations showing semantic structure (Fig 5). |
API or online access to the knowledge graph | N | |
Multiple access options available | N | |
Source code availability | Y | The full source code for KG construction, processing, and the Heterogeneous Graph Attention Network (HRGAT) embedding model is provided in the GitHub repository: https://github.com/MindRank-Biotech/PharmKG/tree/master |
Downloadable knowledge graph | Y | The benchmark KG is downloadable in both a clean version (the final benchmark) and a raw version (with more entities and relations). Access to the dataset is gated by Google Drive permissions |
Section Score: 3/5
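The section scores in this report appear to equal the number of Y answers over the number of questions in each table. A minimal sketch of that tally, assuming this counting rule (the function name `section_score` is hypothetical and not part of any evaluation tooling):

```python
def section_score(answers):
    """Return (yes_count, total) for a list of 'Y'/'N' answers.

    Assumption: a section score is the number of 'Y' answers out of
    all questions in the section, as the scores in this report suggest.
    """
    normalized = [a.strip().upper() for a in answers]
    return sum(a == "Y" for a in normalized), len(normalized)


# Answers from the "Access Level and Types" table, in row order.
access_answers = ["Y", "N", "N", "Y", "Y"]
yes, total = section_score(access_answers)
print(f"Section Score: {yes}/{total}")
```

Note that answers qualified as "Partially" in the comments still count as a full Y under this rule.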
Provenance of Nodes and Edges
Question | Answer | Comment |
---|---|---|
Source list provided | Y | Integrates 6 sources (OMIM, DrugBank, PharmGKB, TTD, SIDER, HumanNet), plus entity features from MeSH, PubChem, BioBERT, BioGPS, and Connectivity Map |
Source versions information | N | The paper mentions integrating “recent versions” of each source and gives details of ID mapping and unification (e.g., Entrez Gene ID, MeSH hierarchy), but the text shows no explicit version numbers or file names for the data dumps |
Import dependencies | Y | The code is built on PyKEEN and KG-reeval, with details for training and hyperparameters. The paper also mentions the use of RDKit for chemical fingerprints and BioBERT for embeddings |
Node and edge sources | Y | Partially. The paper confirms that nodes were unified with standard IDs (Entrez, MeSH, PubChem) and that duplicate synonyms were resolved, but it does not state that each edge carries explicit source provenance tags, only that relations were merged and thematically assigned using Global Network of Biomedical Relationships (GNBR) |
Edges deduplication | Y | Explicitly describes merging overlapping triplets, disambiguation with synonym tables, and merging semantically similar relations using clustering |
Triples source details | Y | Explains how 29 relation types were derived, how Global Network of Biomedical Relationships (GNBR) semantic themes were mapped, and which sources contribute to which entity types and interactions |
Edge type schema | Y | Documents edge semantics (“interaction,” “disease-gene,” “chemical-gene,” “disease-chemical,” and subtypes), all mapped from GNBR semantic themes and curated databases |
Section Score: 6/7
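To illustrate what the "Edges deduplication" row refers to, the sketch below merges overlapping triples after mapping entity synonyms to a canonical name. It is a simplified illustration on toy data, not PharmKG's actual pipeline: the function `deduplicate_triples`, the synonym table, and the example triples are all hypothetical, and the clustering of semantically similar relations described in the paper is not reproduced here.

```python
def deduplicate_triples(triples, synonyms):
    """Canonicalize entity names and drop repeated triples.

    triples:  iterable of (head, relation, tail) tuples
    synonyms: dict mapping a synonym to its canonical entity name
    Returns the unique triples in first-seen order.
    """
    seen = set()
    unique = []
    for head, relation, tail in triples:
        # Replace each entity with its canonical form if one is known.
        key = (synonyms.get(head, head), relation, synonyms.get(tail, tail))
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Toy data (hypothetical): two triples differ only by a drug synonym.
synonyms = {"acetylsalicylic acid": "aspirin"}
triples = [
    ("aspirin", "chemical-gene", "PTGS2"),
    ("acetylsalicylic acid", "chemical-gene", "PTGS2"),
    ("aspirin", "chemical-gene", "PTGS1"),
]
merged = deduplicate_triples(triples, synonyms)
print(merged)
```

Because the merged triples carry no record of which source contributed them, this style of deduplication is also where per-edge provenance can be lost unless source tags are kept alongside each triple.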
Documented standards, schema, construction
Question | Answer | Comment |
---|---|---|
Biological usable data | Y | All entities explicitly use standard biomedical identifiers and the structure is designed to support drug repurposing, target discovery, adverse reaction prediction, etc., in real-world biomedical tasks |
Resolvable IDs | Y | Genes use Entrez Gene IDs; diseases use MeSH IDs; chemicals use PubChem IDs |
Construction documentation | Y | Explains the construction in detail, including entity filtering, merging, disambiguation, feature extraction, and the final schema design |
Transformation documentation | Y | The authors removed trivial entities, merged low-level symptoms into MeSH parent diseases, clustered and merged redundant relation types, and applied PCA for feature reduction; all steps are explained clearly |
Schema used | Y | 29 defined relation types with source mappings and entity categories, with semantics drawn from GNBR themes and curated databases |
Section Score: 5/5
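As a quick check of the "Resolvable IDs" row, identifiers of each type can be turned into web links. The sketch below assumes the public URL patterns of NCBI Gene, the NLM MeSH Browser, and PubChem; these patterns, the `URL_TEMPLATES` table, and the `resolve` helper are assumptions made for illustration and do not come from the PharmKG paper or repository. How the PharmKG files encode identifiers (prefixes, casing) has not been verified here.

```python
# Assumed URL patterns for the three identifier types named in the table.
URL_TEMPLATES = {
    "gene": "https://www.ncbi.nlm.nih.gov/gene/{id}",              # Entrez Gene ID
    "disease": "https://meshb.nlm.nih.gov/record/ui?ui={id}",      # MeSH unique ID
    "chemical": "https://pubchem.ncbi.nlm.nih.gov/compound/{id}",  # PubChem CID
}


def resolve(entity_type, identifier):
    """Build a browsable URL for an identifier of the given entity type."""
    template = URL_TEMPLATES.get(entity_type)
    if template is None:
        raise ValueError(f"Unknown entity type: {entity_type!r}")
    return template.format(id=identifier)


# Example identifiers chosen for illustration only.
print(resolve("gene", "348"))         # APOE
print(resolve("disease", "D000544"))  # Alzheimer Disease
print(resolve("chemical", "2244"))    # aspirin
```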
Update frequency and versioning
Question | Answer | Comment |
---|---|---|
Stable versions | N | There is no formal semantic versioning scheme mentioned for the dataset or codebase |
Public tracker information | N | There is no mention of a public tracker for feature requests or issues |
Knowledge graph contact information | Y | Two corresponding authors are provided with full institutional contacts (Prof. Yuedong Yang and Dr. Zhangming Niu) in the published paper (https://academic.oup.com/bib/article/22/4/bbaa344/6042240) |
Updated annually | N | No evidence yet. The authors state plans for future expansions (e.g., transcriptomics, clinical data, auto-extraction), but no recurring updates have been published so far |
Prior versions access | Y | Partially. The raw version is provided alongside the final benchmark on GitHub, but no changelog is maintained across multiple yearly versions |
Section Score: 2/5
Evaluation - Metrics and Fitness for Purpose
Question | Answer | Comment |
---|---|---|
Use case provided | Y | The paper provides detailed case studies for Alzheimer’s disease and Parkinson’s disease covering drug repurposing and target discovery, with literature validation, top-scored predictions, and visualized paths |
Evaluation against other models | Y | Benchmarked Heterogeneous Graph Attention Network (HRGAT) and 9 other KGE baselines on Hetionet and PharmKG side by side and analyzed results |
Defined scope | Y | The paper is a dedicated benchmark for evaluating KGE models in biomedical relation prediction, with explicit focus on drug repurposing, target identification, and multi-relation prediction tasks |
Multiple evaluation methods | Y | Link prediction (MRR, Hits@k), downstream tasks (AUROC, AUPRC), t-tests for significance |
Accuracy metrics | Y | Multiple: MRR, Hits@1/3/10/100, AUROC, AUPRC, p-values for statistical tests, plus visualization and interpretability checks with t-SNE plots and path analyses |
Section Score: 5/5
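For readers unfamiliar with the link-prediction metrics named above, the following sketch computes MRR and Hits@k from a list of ranks, using their standard definitions. The ranks are made-up toy values, not results from the PharmKG paper, and the function names are hypothetical.

```python
def mean_reciprocal_rank(ranks):
    """MRR: average of 1/rank over all test triples (rank 1 is best)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Hits@k: fraction of test triples whose true entity ranks in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


# Toy ranks of the correct entity for four hypothetical test triples.
ranks = [1, 2, 4, 10]
print("MRR:", mean_reciprocal_rank(ranks))
for k in (1, 3, 10):
    print(f"Hits@{k}:", hits_at_k(ranks, k))
```

Both metrics lie in the range 0 to 1, with higher values indicating that the model places correct entities nearer the top of its ranking.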
License Information
Question | Answer | Comment |
---|---|---|
License | Explicit license (restricted use) | The paper notes: “© The Author(s) 2020. Published by Oxford University Press. All rights reserved.” Neither the paper nor the GitHub repo gives any further license information |