Evaluation for PharmKG

Evaluator: Not specified

Evaluated on: 2025-08-14

This is a manual evaluation intended to identify potential barriers to reuse.


Access Level and Types

Question | Answer | Comment
Access to data outside of the knowledge graph | Y | PharmKG provides embeddings (learned low-dimensional representations of entities and relations) produced by knowledge graph embedding (KGE) models. It also provides top predicted paths for mechanistic interpretation (Fig. 6) and t-SNE visualizations showing semantic structure (Fig. 5).
API or online access to the knowledge graph | N |
Multiple access options available | N |
Source code availability | Y | The full source code for KG construction, processing, and the Heterogeneous Graph Attention Network (HRGAT) embedding model is available in the GitHub repository: https://github.com/MindRank-Biotech/PharmKG/tree/master
Downloadable knowledge graph | Y | The benchmark KG is downloadable in both a clean version (the final benchmark) and a raw version (with more entities and relations). Access to the dataset is gated by Google Drive permissions.

Section Score: 3/5
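
The section scores in this report appear to count the questions answered "Y" (including "Y, partially") out of the total number of questions. A minimal Python sketch of that tally follows; the scoring rule is inferred from the tables rather than stated in the report, and the function name is illustrative.

```python
def section_score(answers):
    """Return (number of 'Y' answers, total questions) for one section.

    Assumption: any answer beginning with 'Y' (e.g. 'Y, partially')
    counts as one point; everything else counts as zero.
    """
    yes = sum(1 for a in answers if a.strip().upper().startswith("Y"))
    return yes, len(answers)


# Answers from the "Access Level and Types" table, in row order.
access_answers = ["Y", "N", "N", "Y", "Y"]
yes, total = section_score(access_answers)
print(f"Section Score: {yes}/{total}")
```

Under this assumed rule, the five answers above reproduce the 3/5 reported for this section.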

Provenance of Nodes and Edges

Question | Answer | Comment
Source list provided | Y | Integrates 6 sources (OMIM, DrugBank, PharmGKB, TTD, SIDER, HumanNet), plus entity features from MeSH, PubChem, BioBERT, BioGPS, and Connectivity Map.
Source versions information | N | The paper mentions integrating “recent versions” of each source and details the ID mapping and unification (e.g., Entrez Gene ID, MeSH hierarchy), but it gives no explicit version numbers or file names for the data dumps.
Import dependencies | Y | The code is built on PyKEEN and KG-reeval, with details of training and hyperparameters. RDKit (chemical fingerprints) and BioBERT (embeddings) are also used.
Node and edge sources | Y | Partially. The paper confirms that nodes were unified under standard IDs (Entrez, MeSH, PubChem) and that duplicate synonyms were resolved, but it does not state that each edge carries an explicit source provenance tag, only that relations were merged and thematically assigned using the Global Network of Biomedical Relationships (GNBR).
Edges deduplication | Y | Explicitly describes merging overlapping triplets, disambiguating entities with synonym tables, and merging semantically similar relations by clustering.
Triples source details | Y | Explains how the 29 relation types were derived, how GNBR semantic themes were mapped, and which sources contribute to which entity types and interactions.
Edge type schema | Y | Documents edge semantics: “interaction,” “disease-gene,” “chemical-gene,” “disease-chemical,” and subtypes, all mapped from GNBR semantic themes and curated databases.

Section Score: 6/7
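
To make the edge deduplication criterion concrete, the sketch below shows one simple way overlapping triples can be merged once entity synonyms are mapped to a canonical ID. It is an illustration only, not the PharmKG pipeline: the function name, the synonym table, and the toy triples are all hypothetical, and the clustering of similar relation types is not covered.

```python
def deduplicate_triples(triples, synonyms):
    """Map entity names to canonical IDs, then drop repeated triples.

    `synonyms` is a hypothetical name -> canonical ID table; names that
    are missing from it are kept unchanged. Input order is preserved.
    """
    seen = set()
    unique = []
    for head, relation, tail in triples:
        canonical = (synonyms.get(head, head), relation, synonyms.get(tail, tail))
        if canonical not in seen:
            seen.add(canonical)
            unique.append(canonical)
    return unique


# Toy data for illustration; not taken from the PharmKG dataset.
synonyms = {"aspirin": "CID:2244", "acetylsalicylic acid": "CID:2244"}
triples = [
    ("aspirin", "chemical-gene", "GENE:5742"),
    ("acetylsalicylic acid", "chemical-gene", "GENE:5742"),  # same edge, different name
    ("aspirin", "chemical-gene", "GENE:5743"),
]
print(deduplicate_triples(triples, synonyms))
```

The second triple collapses into the first after synonym resolution, so two unique edges remain.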

Documented standards, schema, construction

Question | Answer | Comment
Biological usable data | Y | All entities use standard biomedical identifiers, and the structure is designed to support real-world biomedical tasks such as drug repurposing, target discovery, and adverse reaction prediction.
Resolvable IDs | Y | Genes use Entrez Gene IDs; diseases use MeSH IDs; chemicals use PubChem IDs.
Construction documentation | Y | Explains the construction in detail, including entity filtering, merging, disambiguation, feature extraction, and the final schema design.
Transformation documentation | Y | The authors removed trivial entities, merged low-level symptoms into MeSH parent diseases, clustered and merged redundant relation types, and applied PCA for feature reduction; all steps are explained clearly.
Schema used | Y | 29 defined relation types with source mapping and entity categories, with semantics drawn from Global Network of Biomedical Relationships (GNBR) themes and curated databases.

Section Score: 5/5
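
As a quick check of the "Resolvable IDs" criterion, the three identifier types can be turned into resolvable links. The sketch below uses the identifiers.org compact-identifier prefixes `ncbigene`, `mesh`, and `pubchem.compound`; it assumes the KG stores bare accessions (with PubChem IDs being compound CIDs), which should be confirmed against the downloaded files. The function name and entity-type keys are illustrative.

```python
# identifiers.org prefixes for the three identifier types named in the table.
PREFIXES = {
    "gene": "ncbigene",             # Entrez Gene IDs
    "disease": "mesh",              # MeSH IDs
    "chemical": "pubchem.compound", # PubChem compound IDs (assumed to be CIDs)
}


def resolver_url(entity_type, accession):
    """Build an identifiers.org URL for a bare accession."""
    if entity_type not in PREFIXES:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    return f"https://identifiers.org/{PREFIXES[entity_type]}:{accession}"


# Example accessions chosen for illustration only.
for entity_type, accession in [("gene", "351"), ("disease", "D000544"), ("chemical", "2244")]:
    print(resolver_url(entity_type, accession))
```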

Update frequency and versioning

Question | Answer | Comment
Stable versions | N | No formal versioning scheme (e.g., semantic versioning) is mentioned for the dataset or the codebase.
Public tracker information | N | There is no mention of a public tracker for feature requests or issues.
Knowledge graph contact information | Y | The published paper lists two corresponding authors with full institutional contacts (Prof. Yuedong Yang and Dr. Zhangming Niu): https://academic.oup.com/bib/article/22/4/bbaa344/6042240
Updated annually | N | No evidence yet. The authors state that they plan future expansions (e.g., transcriptomics, clinical data, auto-extraction), but no recurring updates have been published so far.
Prior versions access | Y | Partially. The raw version is provided alongside the final benchmark on GitHub, but no changelog is maintained across multiple yearly versions.

Section Score: 2/5

Evaluation - Metrics and Fitness for Purpose

Question | Answer | Comment
Use case provided | Y | Provides detailed case studies on drug repurposing and target discovery for Alzheimer’s disease and Parkinson’s disease, with literature validation, top-scored predictions, and visualized paths.
Evaluation against other models | Y | Benchmarks the Heterogeneous Graph Attention Network (HRGAT) and 9 other KGE baselines side by side on Hetionet and PharmKG, and analyzes the results.
Defined scope | Y | The paper is a dedicated benchmark for evaluating KGE models in biomedical relation prediction, with an explicit focus on drug repurposing, target identification, and multi-relation prediction tasks.
Multiple evaluation methods | Y | Link prediction (MRR, Hits@k), downstream tasks (AUROC, AUPRC), and t-tests for significance.
Accuracy metrics | Y | Multiple: MRR, Hits@1/3/10/100, AUROC, AUPRC, and p-values from statistical tests, plus visualization and interpretability checks with t-SNE plots and path analyses.

Section Score: 5/5
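
For readers unfamiliar with the link prediction metrics named above, the following sketch computes MRR and Hits@k using their standard definitions: MRR is the mean of the reciprocal ranks, and Hits@k is the fraction of test triples whose correct entity is ranked at position k or better. The ranks are made-up values for illustration, not results from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Mean of 1/rank over all test triples (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four test triples.
ranks = [1, 2, 4, 10]
print(f"MRR     = {mean_reciprocal_rank(ranks):.4f}")
for k in (1, 3, 10):
    print(f"Hits@{k:<2} = {hits_at_k(ranks, k):.2f}")
```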

License Information

Question | Answer | Comment
License | Explicit license (restricted use) | The paper notes: “© The Author(s) 2020. Published by Oxford University Press. All rights reserved.” However, neither the paper nor the GitHub repository states a specific license.