Evaluation for pharmkg

Evaluator: Not specified

Evaluated on: 2025-08-26

This is a manual evaluation intended to identify potential barriers to reuse.

Access Level and Types

Question	Answer	Comment
Access to data outside of the knowledge graph	Y	PharmKG provides embeddings (learned low-dimensional representations of entities and relations) produced by knowledge graph embedding (KGE) models. It also provides top predicted paths for mechanistic interpretation (Fig 6) and t-SNE visualizations showing semantic structure (Fig 5).
API or online access to the knowledge graph	N
Multiple access options available	N
Source code availability	Y	The full source code for KG construction, processing, and the Heterogeneous Graph Attention Network (HRGAT) embedding model is available in the GitHub repository: https://github.com/MindRank-Biotech/PharmKG/tree/master
Downloadable knowledge graph	Y	The benchmark KG is downloadable in both a clean version (the final benchmark) and a raw version (with more entities and relations). However, the dataset is hosted on Google Drive and access is gated by Google Drive permissions

Section Score: 3/5
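
The section scores in this report appear to be the number of "Y" answers over the number of questions; that rule is an assumption inferred from the tallies, not something the rubric states. A minimal sketch of the tally under that assumption (the function name `section_score` is hypothetical):

```python
def section_score(rows):
    """Return (yes_count, total) for a list of (question, answer) pairs.

    Assumes a section score is simply the count of "Y" answers
    divided by the number of questions in that section.
    """
    answers = [answer.strip().upper() for _, answer in rows]
    return sum(a == "Y" for a in answers), len(answers)


# Answers copied from the "Access Level and Types" table above.
access_rows = [
    ("Access to data outside of the knowledge graph", "Y"),
    ("API or online access to the knowledge graph", "N"),
    ("Multiple access options available", "N"),
    ("Source code availability", "Y"),
    ("Downloadable knowledge graph", "Y"),
]

yes, total = section_score(access_rows)
print(f"Section Score: {yes}/{total}")  # prints "Section Score: 3/5"
```

Note that answers qualified as "Partially" are still recorded as "Y" in this report, so they count fully under this tally.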

Provenance of Nodes and Edges

Question	Answer	Comment
Source list provided	Y	An integration of 6 sources (OMIM, DrugBank, PharmGKB, TTD, SIDER, HumanNet); plus entity features from MeSH, PubChem, BioBERT, BioGPS, and Connectivity Map
Source versions information	N	The paper mentions integrating “recent versions” of each source and details the ID mapping and unification (e.g., Entrez Gene ID, MeSH hierarchy), but gives no explicit version numbers or file names for the individual data dumps
Import dependencies	Y	The code is built on PyKEEN and KG-reeval, with training details and hyperparameters documented. The paper also mentions the use of RDKit for chemical fingerprints and BioBERT for embeddings
Node and edge sources	Y	Partially - The paper confirms that nodes were unified with standard IDs (Entrez, MeSH, PubChem) and that duplicate synonyms were resolved, but it does not state that each edge carries an explicit source provenance tag. It states only that relations were merged and thematically assigned using the Global Network of Biomedical Relationships (GNBR)
Edges deduplication	Y	Explicitly describes merging overlapping triplets, disambiguation with synonym tables, and merging semantically similar relations using clustering
Triples source details	Y	Explains how 29 relation types were derived, how Global Network of Biomedical Relationships (GNBR) semantic themes were mapped, and which sources contribute to which entity types and interactions
Edge type schema	Y	Documents edge semantics: “interaction,” “disease-gene,” “chemical-gene,” “disease-chemical,” and subtypes, all mapped from Global Network of Biomedical Relationships (GNBR) semantic themes and curated databases

Section Score: 6/7
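
To illustrate what the "Edges deduplication" row refers to, the following is a simplified sketch of synonym-based triple merging. It is not the PharmKG pipeline: the function name, the synonym table, and the placeholder identifiers (`CHEM:1`, `GENE:1`, `DIS:1`) are all hypothetical, and the clustering of semantically similar relations described in the paper is not shown.

```python
def deduplicate(triples, synonyms):
    """Map entity names to canonical IDs, then drop repeated triples.

    First-seen order is preserved. `synonyms` maps a surface name to a
    canonical identifier; names without an entry are kept unchanged.
    """
    seen = set()
    unique = []
    for head, relation, tail in triples:
        key = (synonyms.get(head, head), relation, synonyms.get(tail, tail))
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Hypothetical synonym table and raw triples, for illustration only.
synonym_table = {
    "aspirin": "CHEM:1",
    "acetylsalicylic acid": "CHEM:1",
}
raw_triples = [
    ("aspirin", "chemical-gene", "GENE:1"),
    ("acetylsalicylic acid", "chemical-gene", "GENE:1"),  # same edge after mapping
    ("DIS:1", "disease-chemical", "aspirin"),
]

for triple in deduplicate(raw_triples, synonym_table):
    print(triple)
```

A deduplication step of this kind discards which source each merged triple came from unless provenance is carried along explicitly, which is the gap noted in the "Node and edge sources" row.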

Documented standards, schema, construction

Question	Answer	Comment
Biological usable data	Y	All entities explicitly use standard biomedical identifiers and the structure is designed to support drug repurposing, target discovery, adverse reaction prediction, etc., in real-world biomedical tasks
Resolvable IDs	Y	Genes use Entrez Gene IDs; diseases use MeSH IDs; chemicals use PubChem IDs
Construction documentation	Y	Explains the construction in detail, including entity filtering, merging, disambiguation, feature extraction, and the final schema design
Transformation documentation	Y	The authors removed trivial entities, merged low-level symptoms into MeSH parent diseases, clustered and merged redundant relation types, and applied PCA for feature reduction; all steps are explained clearly
Schema used	Y	29 defined relation types with source mapping and entity categories, with semantics drawn from Global Network of Biomedical Relationships (GNBR) themes and curated databases

Section Score: 5/5
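
As a quick check of what "resolvable" means for the three identifier types listed above, the sketch below builds public landing-page URLs from identifiers. The dictionary keys and the `resolve` helper are hypothetical names, and the example identifiers are well-known public ones chosen for illustration; they are not taken from the PharmKG files.

```python
# URL patterns for the public record pages of each identifier type.
URL_TEMPLATES = {
    "entrez_gene": "https://www.ncbi.nlm.nih.gov/gene/{id}",
    "mesh": "https://meshb.nlm.nih.gov/record/ui?ui={id}",
    "pubchem_compound": "https://pubchem.ncbi.nlm.nih.gov/compound/{id}",
}


def resolve(id_type, identifier):
    """Return the landing-page URL for an identifier of the given type."""
    return URL_TEMPLATES[id_type].format(id=identifier)


# Illustrative identifiers (assumed examples, not drawn from PharmKG).
examples = [
    ("entrez_gene", "351"),        # APP gene
    ("mesh", "D000544"),           # Alzheimer Disease
    ("pubchem_compound", "2244"),  # aspirin
]

for id_type, identifier in examples:
    print(resolve(id_type, identifier))
```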

Update frequency and versioning

Question	Answer	Comment
Stable versions	N	There is no formal semantic versioning scheme mentioned for the dataset or codebase
Public tracker information	N	There is no mention of a public tracker for feature requests or issues
Knowledge graph contact information	Y	Two corresponding authors are provided with full institutional contacts (Prof. Yuedong Yang and Dr. Zhangming Niu) in the published paper (https://academic.oup.com/bib/article/22/4/bbaa344/6042240)
Updated annually	N	No evidence yet. They state they plan future expansions (e.g., transcriptomics, clinical data, auto-extraction) but no recurring updates are published so far
Prior versions access	Y	Partially - They do provide the raw version alongside the final benchmark on GitHub but do not maintain a changelog across multiple yearly versions

Section Score: 2/5

Evaluation - Metrics and Fitness for Purpose

Question	Answer	Comment
Use case provided	Y	Provides detailed case studies on Alzheimer’s disease and Parkinson’s disease for drug repurposing and target discovery, with literature validation, top-scored predictions, and visualized paths
Evaluation against other models	Y	Benchmarked Heterogeneous Graph Attention Network (HRGAT) and 9 other KGE baselines on Hetionet and PharmKG side by side and analyzed results
Defined scope	Y	The paper is a dedicated benchmark for evaluating KGE models in biomedical relation prediction, with explicit focus on drug repurposing, target identification, and multi-relation prediction tasks
Multiple evaluation methods	Y	Link prediction (MRR, Hits@k), downstream tasks (AUROC, AUPRC), t-tests for significance
Accuracy metrics	Y	Multiple: MRR, Hits@1/3/10/100, AUROC, AUPRC, p-values for statistical tests, plus visualization and interpretability checks with t-SNE plots and path analyses

Section Score: 5/5
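
For readers unfamiliar with the link prediction metrics named above, the following sketch shows the standard definitions of MRR and Hits@k computed from the rank of each true entity. The ranks are made-up illustrative values, not results from the PharmKG paper, and the function names are hypothetical.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples (ranks start at 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test triples whose true entity is ranked in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


# Made-up ranks of the correct entity for five test triples.
ranks = [1, 2, 4, 10, 50]

print(f"MRR     = {mean_reciprocal_rank(ranks):.3f}")
for k in (1, 3, 10, 100):
    print(f"Hits@{k:<3} = {hits_at_k(ranks, k):.2f}")
```

Both metrics lie between 0 and 1, with higher values being better; MRR rewards placing the correct entity near the top, while Hits@k only checks whether it falls within the cutoff.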

License Information

Question	Answer	Comment
License		Explicit license (restricted use). The paper carries the notice “© The Author(s) 2020. Published by Oxford University Press. All rights reserved.”, but neither the paper nor the GitHub repo states license terms for the dataset or code