Evaluation for bioteque
Evaluator: Not specified
Evaluated on: 2025-08-14
This is a manual evaluation intended to identify potential barriers to reuse.
Access Level and Types
Question | Answer | Comment |
---|---|---|
Access to data outside of the knowledge graph | Y | Bioteque provides node-type-specific embeddings for 11 of the node types in its graph, along with downloadable files that map the nodes of each type to their representations in the embeddings. |
API or online access to the knowledge graph | N | |
Multiple access options available | N | |
Source code availability | Y | Source code for assembling the KG is provided in a GitHub repository, https://github.com/sbnb-irb/bioteque. This includes source-specific scripts for retrieving data components. |
Downloadable knowledge graph | N | |
Section Score: 2/5
Provenance of Nodes and Edges
Question | Answer | Comment |
---|---|---|
Source list provided | Y | See https://bioteque.irbbarcelona.org/sources |
Source versions information | N | |
Import dependencies | Y | The list of data sources is documented, and the data retrieval scripts name the specific source files they use; e.g., the source for COSMIC is defined here: https://github.com/sbnb-irb/bioteque/blob/master/datasets/cosmic_mutsig/get_data.sh |
Node and edge sources | Y | The sources of nodes and edges are provided in the documentation and the node files. |
Edges deduplication | Y | The 2022 Bioteque paper makes two notes about duplicate resolution: "...we first mapped the samples and genes to our reference vocabulary and collapsed the duplicates by their mean value", and "We mapped the cell lines and genes to our reference vocabularies and took the mean value whenever duplicates occurred". |
Triples source details | Y | The list of data sources is documented and makes specific mention of which associations are derived from which sources; see https://bioteque.irbbarcelona.org/sources |
Edge type schema | N | |
Section Score: 5/7
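
The duplicate resolution quoted in the "Edges deduplication" row (collapsing duplicates by their mean value) can be illustrated with a minimal sketch. This is not Bioteque's own code: the function name `collapse_duplicates`, the `(sample, gene, value)` record layout, and the identifiers are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def collapse_duplicates(records):
    """Average the values of records that share the same (sample, gene) key.

    `records` is an iterable of (sample, gene, value) tuples; this layout is
    an assumption made for the example, not Bioteque's actual data format.
    """
    grouped = defaultdict(list)
    for sample, gene, value in records:
        grouped[(sample, gene)].append(value)
    # One entry per unique (sample, gene) pair, holding the mean of its values.
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical measurements, already mapped to a reference vocabulary.
records = [
    ("CELL_A", "GENE_1", 2.0),
    ("CELL_A", "GENE_1", 4.0),  # duplicate of the pair above
    ("CELL_A", "GENE_2", 1.5),
]
collapsed = collapse_duplicates(records)
```

Here the two `("CELL_A", "GENE_1")` records collapse into a single entry holding their mean, while the unduplicated pair is kept unchanged.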
Documented standards, schema, construction
Question | Answer | Comment |
---|---|---|
Biological usable data | Y | The node files are provided as TSVs, though assembling a full KG would require running the graph assembly code. It appears that the assembly code also produces nodes and edges in TSV format. |
Resolvable IDs | Y | Node identifiers come from clearly defined sources and are expressed as CURIEs. |
Construction documentation | Y | Documentation regarding the assembly code is provided; see https://github.com/sbnb-irb/bioteque |
Transformation documentation | Y | Each source has its own transform code and documentation, provided on the GitHub repo, https://github.com/sbnb-irb/bioteque |
Schema used | N | |
Section Score: 4/5
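
For readers unfamiliar with the CURIE form mentioned in the "Resolvable IDs" row, the sketch below splits a compact identifier into its prefix and local ID. It assumes the simple `prefix:local_id` shape; the helper name `split_curie` and the identifier `EXAMPLE:12345` are hypothetical and are not taken from Bioteque's node files.

```python
def split_curie(curie):
    """Split a CURIE of the form 'prefix:local_id' into its two parts.

    Raises ValueError when the string has no colon or an empty part.
    """
    # partition splits on the first colon only, so local IDs that
    # themselves contain colons are kept intact.
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id


# Hypothetical identifier, used only to show the shape of the result.
prefix, local_id = split_curie("EXAMPLE:12345")
```

Resolving the identifier would then mean looking up the prefix in whatever prefix-to-URL mapping the consumer maintains, a step this sketch leaves out.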
Update frequency and versioning
Question | Answer | Comment |
---|---|---|
Stable versions | N | |
Public tracker information | Y | The GitHub repository at https://github.com/sbnb-irb/bioteque is public and permits issue creation. |
Knowledge graph contact information | Y | No contact is explicitly stated, but the responsible organization, the Structural Bioinformatics and Network Biology Group at the Institute for Research in Biomedicine Barcelona, is identified along with a link to its home page. |
Updated annually | | |
Prior versions access | N | |
Section Score: 2/4
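
The section scores in this evaluation are consistent with counting "Y" answers over the number of answered questions, which would explain the denominator of 4 here: one of the five questions has no answer. The sketch below reproduces that arithmetic. The scoring rule is an inference from the tables, not a documented method, and the function name `section_score` is hypothetical.

```python
def section_score(answers):
    """Return 'yes/answered' for a list of Y/N answers.

    Blank answers are skipped. This rule is an assumption inferred from
    the scores reported in this evaluation.
    """
    answered = [a.strip().upper() for a in answers if a.strip()]
    yes = sum(1 for a in answered if a == "Y")
    return f"{yes}/{len(answered)}"


# Answers from the "Update frequency and versioning" table, in row order;
# the empty string is the unanswered "Updated annually" question.
update_answers = ["N", "Y", "Y", "", "N"]
print(section_score(update_answers))  # prints 2/4
```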
Evaluation - Metrics and Fitness for Purpose
Question | Answer | Comment |
---|---|---|
Use case provided | Y | Examples of use are described in the 2022 Nature Communications paper, including an example of generating predictions for drug repurposing. |
Evaluation against other models | N | |
Defined scope | N | |
Multiple evaluation methods | Y | Multiple evaluation methods are provided in the 2022 Nature Communications paper, primarily for embedding evaluation. |
Accuracy metrics | Y | Multiple validation methods are provided in the 2022 Nature Communications paper, including two distinct analyses, one involving gene expression data and the other protein-protein interactions. |
Section Score: 3/5
License Information
Question | Answer | Comment |
---|---|---|
License | CC BY 4.0 | |