Evaluation for bioteque

Evaluator: Not specified

Evaluated on: 2025-08-14

This is a manual evaluation intended to identify potential barriers to reuse.


Access Level and Types

Question | Answer | Comment
Access to data outside of the knowledge graph | Y | Bioteque provides node-type-specific embeddings for 11 of the node types in its graph, along with downloadable files that map the nodes of each type to their representations in the embeddings.
API or online access to the knowledge graph | N |
Multiple access options available | N |
Source code availability | Y | Source code for assembling the KG is provided in a GitHub repository, https://github.com/sbnb-irb/bioteque. This includes source-specific scripts for retrieving data components.
Downloadable knowledge graph | N |

Section Score: 2/5
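
The section scores in this evaluation are consistent with counting "Y" answers over the number of answered questions. The sketch below illustrates that reading. The scoring rule is an assumption inferred from the tables, not a documented method, and `section_score` is a hypothetical helper name.

```python
def section_score(answers):
    """Return (yes_count, answered_count) for a list of answers.

    Assumption: only 'Y' and 'N' count as answered; blank answers are skipped.
    """
    answered = [a for a in answers if a in ("Y", "N")]
    yes_count = sum(1 for a in answered if a == "Y")
    return yes_count, len(answered)


# Answers from the "Access Level and Types" table, in row order.
access_answers = ["Y", "N", "N", "Y", "N"]
yes, total = section_score(access_answers)
print(f"Section Score: {yes}/{total}")
```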

Provenance of Nodes and Edges

Question | Answer | Comment
Source list provided | Y | See https://bioteque.irbbarcelona.org/sources
Source versions information | N |
Import dependencies | Y | The list of data sources is documented, and the data retrieval scripts name specific source files, e.g., the source for COSMIC is defined here: https://github.com/sbnb-irb/bioteque/blob/master/datasets/cosmic_mutsig/get_data.sh
Node and edge sources | Y | The sources of nodes and edges are provided in the documentation and the node files.
Edges deduplication | Y | The 2022 Bioteque paper makes two notes about duplicate resolution: "...we first mapped the samples and genes to our reference vocabulary and collapsed the duplicates by their mean value", and "We mapped the cell lines and genes to our reference vocabularies and took the mean value whenever duplicates occurred".
Triples source details | Y | The list of data sources is documented and states which associations are derived from which sources; see https://bioteque.irbbarcelona.org/sources
Edge type schema | N |

Section Score: 5/7
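
The duplicate resolution quoted in the table above (taking the mean value when several records map to the same identifiers) can be illustrated as follows. This is a minimal sketch, not the Bioteque code: the function name, record layout, and sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_duplicates(records):
    """Average the values of records sharing the same (sample, gene) key."""
    grouped = defaultdict(list)
    for sample, gene, value in records:
        grouped[(sample, gene)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical measurements after mapping to a reference vocabulary;
# the first two rows collide on the same (sample, gene) pair.
records = [
    ("CL1", "GENE_A", 2.0),
    ("CL1", "GENE_A", 4.0),
    ("CL2", "GENE_A", 1.5),
]
collapsed = collapse_duplicates(records)
print(collapsed)
```

Averaging keeps one value per pair, but it hides how many source records contributed to it, which matters when tracing an edge back to its origin.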

Documented standards, schema, construction

Question | Answer | Comment
Biological usable data | Y | The node files are provided as TSVs, though assembling a full KG would require running the graph assembly code. The assembly code also appears to produce nodes and edges in TSV format.
Resolvable IDs | Y | Node identifiers are from clearly defined sources and expressed as CURIEs.
Construction documentation | Y | Documentation regarding the assembly code is provided; see https://github.com/sbnb-irb/bioteque
Transformation documentation | Y | Each source has its own transform code and documentation, provided in the GitHub repository, https://github.com/sbnb-irb/bioteque
Schema used | N |

Section Score: 4/5
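
Because node identifiers are expressed as CURIEs (a prefix and a local identifier joined by a colon), a reuser can split them to check which source each node comes from. The sketch below shows one way to do that. The helper name and the example identifier are hypothetical and are not taken from the Bioteque files.

```python
def split_curie(curie):
    """Split a CURIE of the form 'prefix:local_id' into its two parts."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id


# Hypothetical identifier used only for illustration.
prefix, local_id = split_curie("EXAMPLE:12345")
print(prefix, local_id)
```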

Update frequency and versioning

Question | Answer | Comment
Stable versions | N |
Public tracker information | Y | The GitHub repository at https://github.com/sbnb-irb/bioteque is public and permits issue creation.
Knowledge graph contact information | Y | No contact is explicitly designated, but the responsible organization, the Structural Bioinformatics and Network Biology Group at the Institute for Research in Biomedicine Barcelona, is identified along with a link to its home page.
Updated annually |  |
Prior versions access | N |

Section Score: 2/4

Evaluation - Metrics and Fitness for Purpose

Question | Answer | Comment
Use case provided | Y | Examples of use are described in the 2022 Nature Communications paper, including an example of generating predictions for drug repurposing.
Evaluation against other models | N |
Defined scope | N |
Multiple evaluation methods | Y | Multiple evaluation methods are provided in the 2022 Nature Communications paper, primarily for embedding evaluation.
Accuracy metrics | Y | Multiple validation methods are provided in the 2022 Nature Communications paper, including two distinct analyses involving gene expression data and protein-protein interactions, respectively.

Section Score: 3/5

License Information

Question | Answer | Comment
License | CC BY 4.0 |