Evaluation for hetionet

Evaluator: Not specified

Evaluated on: 2025-08-14

This is a manual evaluation intended to identify potential barriers to reuse.


Access Level and Types

Question | Answer | Comment
Access to data outside of the knowledge graph | Y | Can access paths, DWPCs, prediction probabilities, network support breakdowns for compound–disease pairs (via Neo4j Browser & guides)
API or online access to the knowledge graph | Y | Fully hosted on a public Neo4j instance with Cypher queries, guides, tutorials: https://neo4j.het.io/browser/
Multiple access options available | Y | Downloadable as JSON, Neo4j DB, TSV; also query online in Neo4j Browser; source code & intermediate datasets on GitHub, Zenodo, Figshare
Source code availability | Y | The source code and scripts are public on hetio and GitHub, linked in the paper: https://github.com/elifesciences-publications/hetionet
Downloadable knowledge graph | Y | Multiple export formats (JSON, Neo4j dump, TSV)

Section Score: 5/5
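
As a rough illustration of reuse via the TSV export, the sketch below reads a typed edge list and summarizes it. The three-column layout (source, metaedge, target), the "Kind::identifier" node ID style, and all sample rows are assumptions made for illustration; they are not copied from the actual download and should be checked against the real file.

```python
import csv
import io
from collections import Counter

# Made-up sample rows. The column names and the "Kind::identifier" node ID
# style are assumptions for illustration, not taken from the real export.
SAMPLE_TSV = (
    "source\tmetaedge\ttarget\n"
    "Compound::C1\tCtD\tDisease::D1\n"
    "Compound::C1\tCbG\tGene::G1\n"
    "Gene::G1\tGaD\tDisease::D1\n"
)


def read_edges(handle):
    """Return (source, metaedge, target) tuples from a tab-separated handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["source"], row["metaedge"], row["target"]) for row in reader]


def node_kind(node_id):
    """Extract the node type from an ID of the assumed form 'Kind::identifier'."""
    return node_id.split("::", 1)[0]


edges = read_edges(io.StringIO(SAMPLE_TSV))
metaedge_counts = Counter(metaedge for _, metaedge, _ in edges)
kinds = sorted({node_kind(n) for s, _, t in edges for n in (s, t)})

print(metaedge_counts)
print(kinds)
```

To run this against a downloaded file, replace `io.StringIO(SAMPLE_TSV)` with an opened file handle and adjust the column names to match the actual header.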

Provenance of Nodes and Edges

Question | Answer | Comment
Source list provided | Y | 29 sources documented; each node/edge carries source information in properties; full list with versions in paper: https://elifesciences.org/articles/26726
Source versions information | Y | Versions noted: e.g., DrugBank v4.2, SIDER v4.1, LINCS L1000 (Oct 2015), Pathway Commons (with date)
Import dependencies | Y | Input ontologies and databases fully listed with versions; intermediate resources also described (e.g., STARGEO, PharmacotherapyDB)
Node and edge sources | Y | Node/edge properties include URLs, source, license, confidence scores (for applicable edges)
Edges deduplication | Y | Merged redundant pathways; multiple studies for same edge consolidated; non-informative gene sets removed
Triples source details | Y | Explicit per metaedge: e.g., binding affinities (≤1 μM), co-occurrence p-values (MEDLINE), gene interaction specifics
Edge type schema | Y | Clear metagraph with 11 node types & 24 metaedges; each with documented origin & justification

Section Score: 7/7

Documented standards, schema, construction

Question | Answer | Comment
Biologically usable data | Y | Uses standard biomedical IDs: Entrez, UMLS, MeSH, DO, Uberon
Resolvable IDs | Y | Entrez Gene, DOID, MeSH IDs, InChIKeys used for easy cross-referencing
Construction documentation | Y | Extensive: paper + Thinklab logs + GitHub issues + detailed guides
Transformation documentation | Y | Pruning steps explained (e.g., filtering Uberon terms, merging pathways, restricting GO terms by size)
Schema used | Y | Metagraph is the explicit schema; node and edge types clearly defined

Section Score: 5/5
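
To illustrate what "resolvable" means in practice, the sketch below turns prefixed identifiers into resolver URLs. The prefix table and the URL templates are assumptions chosen for illustration (a generic Identifiers.org-style pattern); they are not taken from Hetionet and should be verified against the resolver registry before use.

```python
# Hypothetical prefix-to-URL templates; verify each against the resolver
# you actually intend to use before relying on them.
URL_TEMPLATES = {
    "ncbigene": "https://identifiers.org/ncbigene:{id}",
    "DOID": "https://identifiers.org/DOID:{id}",
    "mesh": "https://identifiers.org/mesh:{id}",
}


def resolve(curie):
    """Map a 'prefix:local_id' string to a URL using the templates above."""
    prefix, _, local_id = curie.partition(":")
    if not local_id or prefix not in URL_TEMPLATES:
        raise ValueError(f"cannot resolve identifier: {curie!r}")
    return URL_TEMPLATES[prefix].format(id=local_id)


# Placeholder identifiers, used only to show the string handling.
for curie in ["ncbigene:1234", "DOID:1234", "mesh:D000001"]:
    print(curie, "->", resolve(curie))
```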

Update frequency and versioning

Question | Answer | Comment
Stable versions | Y | Partial - “v1.0” labeled, but no formal version history beyond the initial release
Public tracker information | Y | Partial - Thinklab (now static); issues can be filed on GitHub
Knowledge graph contact information | Y | Daniel Himmelstein and team, contactable via GitHub, Thinklab archives, paper
Updated annually | N | Only v1.0 publicly released so far
Prior versions access | N | Early versions mentioned but no archived download versions listed

Section Score: 3/5
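
The section scores in this report are consistent with counting Y answers over the number of questions in each section. The sketch below reproduces the tallies under that assumption; the scoring rule itself is inferred from the reported numbers, not stated by the evaluator. Answers marked "Partial" in the comments are recorded as Y in the report and are counted as a full Y here.

```python
# Y/N answers transcribed from the five scored sections of this report.
ANSWERS = {
    "Access Level and Types": ["Y", "Y", "Y", "Y", "Y"],
    "Provenance of Nodes and Edges": ["Y", "Y", "Y", "Y", "Y", "Y", "Y"],
    "Documented standards, schema, construction": ["Y", "Y", "Y", "Y", "Y"],
    "Update frequency and versioning": ["Y", "Y", "Y", "N", "N"],
    "Evaluation - Metrics and Fitness for Purpose": ["Y", "Y", "Y", "Y", "Y"],
}


def section_score(answers):
    """Return (number of Y answers, number of questions)."""
    return sum(a == "Y" for a in answers), len(answers)


scores = {name: section_score(a) for name, a in ANSWERS.items()}
total_yes = sum(yes for yes, _ in scores.values())
total_questions = sum(n for _, n in scores.values())

for name, (yes, n) in scores.items():
    print(f"{name}: {yes}/{n}")
print(f"Overall: {total_yes}/{total_questions}")  # prints "Overall: 25/27"
```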

Evaluation - Metrics and Fitness for Purpose

Question | Answer | Comment
Use case provided | Y | Nicotine dependence (bupropion), epilepsy predictions (acamprosate)
Evaluation against other models | Y | Compared to PREDICT, Guney et al., Cheng et al.; used baselines & permutation
Defined scope | Y | Designed for systematic drug repurposing + broader knowledge integration
Multiple evaluation methods | Y | DWPC + AUROC + permutation + cross-validation + external test sets (DrugCentral, ClinicalTrials.gov)
Accuracy metrics | Y | Probability scores, cross-validated elastic net, path-level contribution breakdowns, AUROC

Section Score: 5/5
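
For readers unfamiliar with the DWPC (degree-weighted path count) feature named in this section, the sketch below computes it on a toy graph. Each path along a metapath contributes the product of its node degrees raised to the power -w, where each degree is counted along the metaedge being traversed; the DWPC is the sum over paths. The toy edges, the node names, the one-directional traversal of each metaedge, and the default w=0.4 are assumptions for illustration; this is not the project's own implementation.

```python
from collections import defaultdict

# Toy typed edges (source, metaedge, target); all identifiers are made up.
EDGES = [
    ("C1", "CbG", "G1"),
    ("C1", "CbG", "G2"),
    ("C2", "CbG", "G1"),
    ("G1", "GaD", "D1"),
    ("G2", "GaD", "D1"),
    ("G2", "GaD", "D2"),
]


def build_index(edges):
    """Index outgoing neighbors and in-degrees per (node, metaedge)."""
    out_nbrs = defaultdict(list)
    in_deg = defaultdict(int)
    for source, metaedge, target in edges:
        out_nbrs[(source, metaedge)].append(target)
        in_deg[(target, metaedge)] += 1
    return out_nbrs, in_deg


def dwpc(edges, source, target, metapath, w=0.4):
    """Sum the path-degree products of all paths following metapath."""
    out_nbrs, in_deg = build_index(edges)
    total = 0.0
    # Each stack entry: (current node, step index, running product, visited).
    stack = [(source, 0, 1.0, (source,))]
    while stack:
        node, step, product, visited = stack.pop()
        if step == len(metapath):
            if node == target:
                total += product
            continue
        metaedge = metapath[step]
        nbrs = out_nbrs.get((node, metaedge), [])
        for nxt in nbrs:
            if nxt in visited:
                continue  # skip walks that revisit a node
            # Damp by the metaedge-specific degree at both ends of the edge.
            factor = (len(nbrs) ** -w) * (in_deg[(nxt, metaedge)] ** -w)
            stack.append((nxt, step + 1, product * factor, visited + (nxt,)))
    return total


print(dwpc(EDGES, "C1", "D1", ["CbG", "GaD"]))
print(dwpc(EDGES, "C1", "D1", ["CbG", "GaD"], w=0))  # w=0 gives the plain path count, 2.0
```

With w=0 every path contributes 1, so the result reduces to a plain path count; larger w downweights paths that pass through high-degree nodes.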

License Information

Question | Answer | Comment
License | CC0 1.0 |