Evaluation for pheknowlator

Evaluator: Shilpa Sundar

Evaluated on: 2025-08-26

This is a manual evaluation intended to identify potential barriers to reuse.


Access Level and Types

Question | Answer | Comment
Access to data outside of the knowledge graph | Y | Prebuilt node embeddings (created using DeepWalk variants, v1.0.0); Jupyter notebooks for entity/path search and RDF querying; OWL-NETS abstractions for analysis. Future plans also mention providing more embeddings using GRAPE.
API or online access to the knowledge graph | Y | API/SPARQL: hosted SPARQL endpoint (Blazegraph) with proxy UI and deployment code. Neo4j hosting is not mentioned.
Multiple access options available | Y | Zenodo community archives, GitHub releases, PyPI package, Docker images, SPARQL endpoint, plus logs/metadata.
Source code availability | Y | Full repo on GitHub and installable via PyPI; notebooks for OWL-NETS, RDF queries, and entity search.
Downloadable knowledge graph | Y | 11 monthly builds (2019–2021), each with 12 KG types (class/instance × standard/inverse × OWL vs. OWL-NETS ± harmonization), provided in standard formats.

Section Score: 5/5
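
Evaluator note: the hosted SPARQL endpoint listed above could be queried along the following lines. This is a minimal sketch, not the project's documented client. The endpoint URL is a placeholder (an assumption, not the real address), and the query simply counts triples. The code only builds the request, so it runs without network access.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder URL: an assumption, not the real PheKnowLator endpoint address.
ENDPOINT = "https://example.org/blazegraph/sparql"

# Generic SPARQL query that counts all triples in the default graph.
QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing the returned object to urllib.request.urlopen would send the query once ENDPOINT is replaced with the address of a live endpoint.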

Provenance of Nodes and Edges

Question | Answer | Comment
Source list provided | Y | Comprehensive source lists (Supp. Table 12) plus edge_source_metadata.txt, ontology_source_metadata.txt, and downloaded_build_metadata.txt.
Source versions information | Y | Metadata capture provider, filenames, URLs, licenses, and download dates; per-build logs and version tags.
Import dependencies | Y | Methods detail OWLTools, ROBOT (noted as future improvement), Blazegraph, DBCLS SPARQL proxy, Docker, NetworkX, Gephi/OpenOrd, GitHub Actions/Google Cloud specs, etc. The wiki/docs enumerate inputs and environment.
Node and edge sources | Y | Largely yes: build metadata records which sources contribute which node/edge types; evidence such as PubMed IDs is added for some relationships (e.g., CTD). Fine-grained per-triple provenance is not universal.
Edges deduplication | Y | Partial: inverse and symmetric relations are handled explicitly and self-loops are reported, but there is no standalone deduplication policy document. The build supports an inverse/bidirectional relation strategy and treats interaction edges as implicitly symmetric.
Triples source details | Y | Figure 8 walk-through (e.g., ClinVar → variant–disease); Table 5 counts edges by relation/source rules.
Edge type schema | Y | Predicates from the Relation Ontology (RO); OWL-NETS converts OWL to hybrid triples; harmonization rules (rdf:type ↔ rdfs:subClassOf) per knowledge model.

Section Score: 7/7
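
Evaluator note: because no standalone deduplication policy was found, the sketch below illustrates one way mirrored copies of symmetric edges could be collapsed. It is an assumption about what such a policy might look like, not PheKnowLator's implementation. The predicate label and node names are hypothetical.

```python
from typing import Iterable

Triple = tuple[str, str, str]

# Hypothetical label standing in for a symmetric RO predicate.
SYMMETRIC = {"interacts_with"}


def dedupe(triples: Iterable[Triple], symmetric: set[str] = SYMMETRIC) -> list[Triple]:
    """Drop exact duplicates and mirrored symmetric edges, keeping first-seen order."""
    seen: set[Triple] = set()
    kept: list[Triple] = []
    for s, p, o in triples:
        # For symmetric predicates, (a, p, b) and (b, p, a) map to the same key.
        key = (min(s, o), p, max(s, o)) if p in symmetric else (s, p, o)
        if key not in seen:
            seen.add(key)
            kept.append((s, p, o))
    return kept


edges = [
    ("geneA", "interacts_with", "geneB"),
    ("geneB", "interacts_with", "geneA"),  # mirrored copy of a symmetric edge
    ("geneA", "regulates", "geneC"),
    ("geneA", "regulates", "geneC"),       # exact duplicate
    ("geneC", "regulates", "geneA"),       # directed predicate, so kept
]
print(dedupe(edges))
```

Directed predicates are keyed on the full (subject, predicate, object) triple, so an edge and its reverse both survive unless the predicate is declared symmetric.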

Documented standards, schema, construction

Question | Answer | Comment
Biological usable data | Y | Outputs in RDF/XML, N-Triples, and JSON/TSV with resolvable IRIs; queryable via SPARQL; broadly compatible with standard bio/semantic-web tooling and pipelines.
Resolvable IDs | Y | Uses HGNC, Entrez, Ensembl, PRO, ChEBI, Uberon, HPO, MONDO, etc.; explicit ID mapping during preparation.
Construction documentation | Y | Strong yes: detailed wiki, notebooks, per-build pages, logs, and metadata; SPARQL deployment docs.
Transformation documentation | Y | Ontology cleaning with reports; data preparation steps (replace NaN, unnest, reformat IDs); filtering/mapping dictionaries logged.
Schema used | Y | Semantic-web first: OBO Foundry grounding, RO relations, OWL complex graphs, OWL-NETS abstraction. (Biolink/Bioregistry noted as future integration.)

Section Score: 5/5
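
Evaluator note: the three data preparation steps named above (replace NaN, unnest, reformat IDs) can be pictured with the following sketch. It is illustrative only and uses the standard library rather than the project's own code. The row layout, the "|" separator, the "None" sentinel, and the "HGNC:" prefix are all assumptions.

```python
import math


def is_missing(value) -> bool:
    """Return True for None, empty strings, and float NaN."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))


def prepare(rows, sep="|", prefix="HGNC:"):
    """Replace missing values, unnest multi-valued cells, and normalize ID prefixes."""
    out = []
    for subject, raw_ids in rows:
        if is_missing(raw_ids):
            out.append((subject, "None"))  # replace NaN with a sentinel string
            continue
        for part in str(raw_ids).split(sep):  # unnest: one output row per identifier
            part = part.strip()
            # reformat: make sure every identifier carries the expected prefix
            out.append((subject, part if part.startswith(prefix) else prefix + part))
    return out


# Hypothetical input rows: (subject, identifier cell).
rows = [("rs123", "HGNC:5|1100"), ("rs456", float("nan"))]
print(prepare(rows))
```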

Update frequency and versioning

Question | Answer | Comment
Stable versions | Y | Semantic Versioning used for code/docs; builds labeled (e.g., v2.1.0); artifacts versioned on GitHub/Zenodo/Docker.
Public tracker information | Y | GitHub Issues; community bug reports acknowledged.
Knowledge graph contact information | Y | The GitHub repo lists Tiffany Callahan (first author) as the contact.
Updated annually | Y | The paper states, "Eleven monthly PKT Human Disease benchmark KG builds were created between September 2, 2019 and November 1, 2021." Eleven builds over that roughly 26-month span average well over one per year, which meets this criterion, although the cadence was not strictly one build per calendar month.
Prior versions access | Y | Zenodo community archive plus GitHub releases/wiki per build; file manifests and changes described.

Section Score: 5/5
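
Evaluator note: the update cadence can be checked with simple date arithmetic on the figures quoted from the paper (eleven builds, first on September 2, 2019, last on November 1, 2021). The sketch below is a worked calculation. The variable names are illustrative, and the 365.25-day year is a convention chosen here.

```python
from datetime import date

# Figures quoted from the paper.
first_build = date(2019, 9, 2)
last_build = date(2021, 11, 1)
build_count = 11

# Length of the build window in days and in calendar months.
span_days = (last_build - first_build).days
span_months = (last_build.year - first_build.year) * 12 + (last_build.month - first_build.month)

# Average builds per year, using a 365.25-day year.
builds_per_year = build_count / (span_days / 365.25)

print(span_days, span_months, round(builds_per_year, 2))
```

Any average above one build per year satisfies the "Updated annually" criterion.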

Evaluation - Metrics and Fitness for Purpose

Question | Answer | Comment
Use case provided | Y | There is a dedicated section in the paper. Applications include toxicogenomic inference, depression→AD causal features, MGMLink, RNA-KG, HuBMAP/SenNet ingestion, pathway "cartooning," etc.
Evaluation against other models | Y | Qualitative: survey against 14 open-source builders; feature coverage and maturity compared (no head-to-head build on the same data).
Defined scope | Y | Human disease mechanisms spanning central dogma, pathways, variants, and treatments across multiple biological scales.
Multiple evaluation methods | Y | Tool survey, computational performance (runtime/memory), structural stats (nodes/edges/density/self-loops), visualizations, embeddings/tasks, and a systematic qualitative comparison.
Accuracy metrics | Y | No universal per-triple confidence score, but the resource includes evidence IDs for some relations and reasoner-based logical consistency checks, and it encourages task-level evaluation using benchmarks/embeddings.

Section Score: 5/5
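
Evaluator note: the structural statistics mentioned above (nodes, edges, density, self-loops) can be computed from an edge list as sketched below. This is a standard-library illustration, not the paper's pipeline. Density is assumed to follow the usual directed-graph definition m / (n(n - 1)), and the toy triples are hypothetical.

```python
def structural_stats(edges):
    """Return node count, edge count, self-loop count, and directed density."""
    nodes = {node for s, _, o in edges for node in (s, o)}
    n, m = len(nodes), len(edges)
    self_loops = sum(1 for s, _, o in edges if s == o)
    # Directed density: edges divided by the number of ordered node pairs.
    density = m / (n * (n - 1)) if n > 1 else 0.0
    return {"nodes": n, "edges": m, "self_loops": self_loops, "density": density}


# Hypothetical toy graph with one self-loop.
toy = [("a", "p", "b"), ("b", "p", "c"), ("c", "q", "c")]
stats = structural_stats(toy)
print(stats)
```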

License Information

Question | Answer | Comment
License | Apache 2.0 License |