Evaluation for pheknowlator
Evaluator: Shilpa Sundar
Evaluated on: 2025-08-26
This is a manual evaluation intended to identify potential barriers to reuse.
Access Level and Types
Question | Answer | Comment |
---|---|---|
Access to data outside of the knowledge graph | Y | Prebuilt node embeddings (created using DeepWalk variants, v1.0.0); Jupyter notebooks for entity/path search and RDF querying; OWL-NETS abstractions for analysis. Future plans also mention providing more embeddings using GRAPE |
API or online access to the knowledge graph | Y | SPARQL access: hosted SPARQL endpoint (Blazegraph) with proxy UI and deployment code. Neo4j hosting is not mentioned |
Multiple access options available | Y | Zenodo community archives, GitHub releases, PyPI package, Docker images, SPARQL endpoint, plus logs/metadata |
Source code availability | Y | Full repo on GitHub and installable via PyPI; notebooks for OWL-NETS, RDF queries, and entity search |
Downloadable knowledge graph | Y | 11 monthly builds (2019–2021), each with 12 KG types (class/instance × standard/inverse × OWL vs. OWL-NETS ± harmonization), provided in standard formats |
Section Score: 5/5
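The count of 12 KG types per build follows from multiplying the three build axes named in the table above. A minimal sketch of that arithmetic is given below; the axis labels are assumptions read off the evaluator's comment, not official PheKnowLator file-name tokens.

```python
from itertools import product

# Axis labels are assumptions inferred from the table comment above;
# they are not official PheKnowLator identifiers.
KNOWLEDGE_MODELS = ["class", "instance"]
RELATION_STRATEGIES = ["standard", "inverse"]
SEMANTIC_ABSTRACTIONS = ["OWL", "OWL-NETS", "OWL-NETS + harmonization"]


def enumerate_kg_types():
    """Return every combination of the three build axes as a tuple."""
    return list(
        product(KNOWLEDGE_MODELS, RELATION_STRATEGIES, SEMANTIC_ABSTRACTIONS)
    )


if __name__ == "__main__":
    # Print one line per KG type: 2 x 2 x 3 combinations.
    for model, relations, abstraction in enumerate_kg_types():
        print(f"{model:8} | {relations:8} | {abstraction}")
```

Under these assumed labels, the product yields 2 × 2 × 3 = 12 combinations, matching the count reported for each build.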
Provenance of Nodes and Edges
Question | Answer | Comment |
---|---|---|
Source list provided | Y | Comprehensive source lists (Supp. Table 12) + edge_source_metadata.txt, ontology_source_metadata.txt, downloaded_build_metadata.txt |
Source versions information | Y | Metadata files capture provider, filenames, URLs, licenses, and download dates; per-build logs and version tags are provided |
Import dependencies | Y | Methods detail OWLTools, ROBOT (noted as future improvement), Blazegraph, DBCLS SPARQL proxy, Docker, NetworkX, Gephi/OpenOrd, GitHub Actions/Google Cloud specs, etc. The wiki/docs enumerate inputs and environment |
Node and edge sources | Y | Largely yes - build metadata records which sources contribute which node/edge types; evidence such as PubMed IDs is added for some relationships (e.g., CTD). Fine-grained per-triple provenance is not universal |
Edges deduplication | Y | Partial - explicit handling of inverse/symmetric relations; self-loops are reported. There is no standalone deduplication policy document, but the build supports an inverse/bidirectional relation strategy and handles implicitly symmetric interaction edges |
Triples source details | Y | Figure 8 walk-through (e.g., ClinVar → variant–disease); Table 5 counts edges by relation/source rules |
Edge type schema | Y | Predicates from Relation Ontology (RO); OWL-NETS converts OWL to hybrid triples; harmonization rules (rdf:type ↔ rdfs:subClassOf) per knowledge model |
Section Score: 7/7
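Because the deduplication criterion was judged only partially met, it may help a reuser to check a downloaded edge list directly. The sketch below shows one generic way to collapse mirrored copies of symmetric edges and to count self-loops. It is not PheKnowLator's own procedure; the function names and the idea of passing in a set of symmetric predicates are assumptions made for illustration.

```python
def deduplicate_edges(edges, symmetric_relations):
    """Drop exact duplicates and mirrored copies of symmetric edges.

    edges: iterable of (subject, predicate, object) tuples.
    symmetric_relations: predicates for which (a, p, b) equals (b, p, a).
    The first occurrence of each edge is kept, in input order.
    """
    seen = set()
    unique = []
    for subj, pred, obj in edges:
        if pred in symmetric_relations:
            # Order-independent key so A-B and B-A collapse to one entry.
            key = (pred, frozenset((subj, obj)))
        else:
            # Directed relation: direction is part of the identity.
            key = (pred, subj, obj)
        if key not in seen:
            seen.add(key)
            unique.append((subj, pred, obj))
    return unique


def count_self_loops(edges):
    """Count edges whose subject and object are the same node."""
    return sum(1 for subj, _, obj in edges if subj == obj)
```

For a directed predicate, A→B and B→A are kept as two edges; for a predicate listed as symmetric, only the first of the pair survives.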
Documented standards, schema, construction
Question | Answer | Comment |
---|---|---|
Biological usable data | Y | Outputs in RDF/XML, N-Triples, and JSON/TSV with resolvable IRIs; queryable via SPARQL; broadly compatible with standard bio/semantic-web tooling and pipelines |
Resolvable IDs | Y | Uses HGNC, Entrez, Ensembl, PRO, ChEBI, Uberon, HPO, MONDO, etc.; explicit ID mapping during preparation |
Construction documentation | Y | Strong yes - Detailed wiki, notebooks, per-build pages, logs, and metadata; SPARQL deployment docs |
Transformation documentation | Y | Ontology cleaning with reports; data preparation steps (replace NaN, unnest, reformat IDs); filtering/mapping dictionaries logged |
Schema used | Y | Semantic-web first: OBO Foundry grounding, RO relations, OWL complex graphs, OWL-NETS abstraction. (Biolink/Bioregistry noted as future integration.) |
Section Score: 5/5
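The transformation steps listed above (replace NaN, unnest, reformat IDs) can be pictured with a small standard-library sketch. This is an illustration of the general pattern only, not the project's code: the function names, the "None" placeholder, the "|" separator, and the colon-to-underscore ID rewrite (the form used in OBO-style IRIs) are all assumptions, and the sample identifiers in the usage note are made up.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def prepare_rows(rows, multi_field, id_field, placeholder="None", sep="|"):
    """Apply three preparation steps to a list of dict rows.

    1. Replace missing values with a placeholder string.
    2. Unnest multi_field: one output row per separator-delimited value.
    3. Reformat id_field from CURIE style (PREFIX:123) to PREFIX_123.
    """
    prepared = []
    for row in rows:
        clean = {
            key: (placeholder if is_missing(value) else value)
            for key, value in row.items()
        }
        for part in str(clean[multi_field]).split(sep):
            new_row = dict(clean)
            new_row[multi_field] = part.strip()
            new_row[id_field] = str(clean[id_field]).replace(":", "_")
            prepared.append(new_row)
    return prepared
```

For example, a hypothetical row `{"gene": "EX:0001", "phenotypes": "EX:0100|EX:0200", "score": float("nan")}` passed as `prepare_rows(rows, "phenotypes", "gene")` would become two rows, each with the gene ID rewritten and the missing score replaced by the placeholder.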
Update frequency and versioning
Question | Answer | Comment |
---|---|---|
Stable versions | Y | Semantic Versioning used for code/docs; builds labeled (e.g., v2.1.0); artifacts versioned on GitHub/Zenodo/Docker |
Public tracker information | Y | GitHub Issues; community bug reports acknowledged |
Knowledge graph contact information | Y | The GitHub repo lists Tiffany Callahan (first author) as the contact |
Updated annually | Y | The paper states, "Eleven monthly PKT Human Disease benchmark KG builds were created between September 2, 2019 and November 1, 2021." Eleven builds over roughly 26 months means the KG was updated more than once per year during that period |
Prior versions access | Y | Zenodo community archive + GitHub releases/wiki per build; file manifests and changes described |
Section Score: 5/5
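The "Updated annually" judgment can be checked with simple date arithmetic on the figures quoted from the paper (eleven builds between September 2, 2019 and November 1, 2021). The sketch below is one way to do that; the function name and the use of a 365.25-day year are choices made here, not something taken from the source.

```python
from datetime import date


def build_cadence(first_build, last_build, n_builds):
    """Return (span in days, mean days between builds, builds per year).

    Assumes n_builds >= 2 and that the first and last builds fall on
    the given dates, so there are n_builds - 1 intervals in the span.
    """
    span_days = (last_build - first_build).days
    mean_interval_days = span_days / (n_builds - 1)
    builds_per_year = n_builds / (span_days / 365.25)
    return span_days, mean_interval_days, builds_per_year


if __name__ == "__main__":
    span, interval, per_year = build_cadence(
        date(2019, 9, 2), date(2021, 11, 1), 11
    )
    print(f"span: {span} days")
    print(f"mean interval: {interval:.1f} days")
    print(f"builds per year: {per_year:.2f}")
```

With the quoted dates the span is 791 days, so the mean gap between builds is about 79 days. That comfortably satisfies the once-per-year criterion, although the average cadence over the whole period is slower than one build per calendar month.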
Evaluation - Metrics and Fitness for Purpose
Question | Answer | Comment |
---|---|---|
Use case provided | Y | There is a dedicated section in the paper. Applications include toxicogenomic inference, depression→AD causal features, MGMLink, RNA-KG, HuBMAP/SenNet ingestion, pathway “cartooning,” etc. |
Evaluation against other models | Y | (Qualitative) Survey vs 14 open-source builders; feature coverage and maturity compared (no head-to-head same-data build) |
Defined scope | Y | Human disease mechanisms spanning central dogma, pathways, variants, treatments across multiple biological scales |
Multiple evaluation methods | Y | Tool survey, computational performance (runtime/memory), structural stats (nodes/edges/density/self-loops), visualizations, embeddings/tasks, and qualitative comparison in a systematic manner |
Accuracy metrics | Y | No universal per-triple confidence score, but the resource includes evidence IDs for some relations, checks logical consistency with a reasoner, and encourages task-level evaluation using benchmarks/embeddings |
Section Score: 5/5
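Each section score in this evaluation appears to be the number of "Y" answers over the number of questions in that section's table. A minimal sketch of that tally, assuming rows in the same pipe-delimited layout used here, is shown below; the function name and the sample rows in the usage note are hypothetical.

```python
def section_score(table_rows):
    """Tally 'Y' answers from pipe-delimited 'Question | Answer | Comment' rows.

    The header row and the '---|---|---|' separator row are skipped.
    Returns a string such as '5/5'.
    """
    answers = []
    for row in table_rows:
        cells = [cell.strip() for cell in row.split("|")]
        # Skip malformed rows and separator rows made only of dashes.
        if len(cells) < 2 or set(cells[0]) <= {"-"}:
            continue
        # Skip the header row.
        if cells[0] == "Question":
            continue
        answers.append(cells[1])
    met = sum(1 for answer in answers if answer.upper() == "Y")
    return f"{met}/{len(answers)}"
```

For instance, a table with a header, a separator, two rows answered "Y", and one hypothetical row answered "N" would score "2/3".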
License Information
Question | Answer | Comment |
---|---|---|
License | Apache License 2.0 | |