tbkg

TBKG is a knowledge graph.

A weighted heterogeneous knowledge graph containing four types of entities (Tumor, Biomarker, Drug, and ADR) extracted from the MEDLINE corpus to support the discovery of adverse drug reactions (ADRs) of antitumor drugs. TBKG uses a naive Bayesian model to estimate correlations between entities and provides explainable predictions through tumor-biomarker-drug pathways. The knowledge graph contains 1,179 tumors, 2,550 biomarkers, 1,806 drugs, and 756 ADRs, connected by six types of relationships totaling 139,254 edges.

License

CC BY 4.0

Homepage

tbkg

Infores ID

Unknown

FAIRsharing ID

Unknown

Product Summary

Products

From this Resource
ID | Name | URL | Category | Format | Description
tbkg.data | TBKG Knowledge Graph Data | full#supplementary-material | GraphProduct | mixed | Weighted heterogeneous knowledge grap...
tbkg.osimertinib_case_study | TBKG Osimertinib ADR Case Study Data | full#supplementary-material | Product | mixed | Clinical validation dataset with calc...

Details

Overview

TBKG (Tumor-Biomarker Knowledge Graph) is an explainable, knowledge graph-based approach for discovering potential adverse drug reactions (ADRs) of antitumor drugs. The system extracts entities from biomedical literature (the MEDLINE database, with more than 22 million citations) using the UMLS Metathesaurus 2020AA and the Apache cTAKES natural language processing tool.

Key Features

  • Entity Types: Four node types (Tumor, Biomarker, Drug, ADR), each with a minimum frequency threshold of 50
  • Knowledge Graph Structure: Weighted heterogeneous graph with undirected edges representing correlations
  • Relationship Discovery: Naive Bayesian model combining prior and posterior probabilities to avoid bias
  • ADR Discovery: Depth-first search algorithm to find all paths between drug-ADR pairs
  • Explainability: Provides tumor-biomarker-drug pathways explaining predicted ADRs
  • Performance: Accuracy of 0.81 in three-fold cross-validation, outperforming co-occurrence analysis
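
The path search described under ADR Discovery can be illustrated with a minimal sketch. It assumes an undirected weighted graph stored as an adjacency map and enumerates simple paths between one drug and one ADR with an iterative depth-first search. The node names, edge weights, and the `max_len` cap are made up for illustration and are not taken from TBKG; the source does not specify how paths are bounded or scored.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted adjacency map from (u, v, weight) triples."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return graph


def all_simple_paths(graph, source, target, max_len=4):
    """Enumerate simple paths from source to target depth first.

    max_len caps the number of nodes per path (a hypothetical limit,
    added here only to keep the search bounded).
    """
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            paths.append(path)
            continue
        if len(path) >= max_len:
            continue
        # Reverse-sorted push so neighbors are visited in alphabetical order.
        for nxt in sorted(graph[node], reverse=True):
            if nxt not in path:  # keep the path simple (no repeated nodes)
                stack.append((nxt, path + [nxt]))
    return paths


# Toy edges with invented weights; not real TBKG data.
edges = [
    ("drug:D1", "biomarker:B1", 0.8),
    ("drug:D1", "tumor:T1", 0.5),
    ("tumor:T1", "biomarker:B1", 0.6),
    ("biomarker:B1", "adr:A1", 0.7),
    ("tumor:T1", "adr:A1", 0.2),
]
graph = build_graph(edges)
paths = all_simple_paths(graph, "drug:D1", "adr:A1")
for p in paths:
    print(" -> ".join(p))
```

Each returned path names the intermediate tumors and biomarkers, which is the kind of information an explanation of a predicted drug-ADR link would draw on.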

Data Sources

  • MEDLINE database (1928-2020, filtered with “cancer therapy” keyword)
  • UMLS Metathesaurus 2020AA for entity dictionaries
  • WHO source dictionary for ADR coding
  • Clinical validation from 3rd Xiangya Hospital (8 patients, 2017-2020)

Technical Implementation

  • Entity Extraction: cTAKES NLP system with UMLS concept mapping
  • Importance Measure: log p(biomarker | tumor present) - log p(biomarker | tumor absent)
  • Graph Construction: Binary matrix representation (0/1 for entity presence in abstracts)
  • Cross-validation: Three-fold to prevent overfitting
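
As a rough illustration of the importance measure, the sketch below computes the log-probability difference from two binary columns, where each row stands for one abstract and 1 means the entity was found in it. The function name, the toy data, and the additive smoothing term `alpha` are assumptions introduced here to avoid taking the log of zero; the source does not say how zero counts are handled.

```python
import math


def log_ratio_importance(biomarker_col, tumor_col, alpha=1.0):
    """Return log p(biomarker | tumor present) - log p(biomarker | tumor absent).

    Both inputs are 0/1 lists of equal length (one entry per abstract).
    alpha is additive smoothing, an assumption of this sketch.
    """
    with_tumor = [b for b, t in zip(biomarker_col, tumor_col) if t == 1]
    without_tumor = [b for b, t in zip(biomarker_col, tumor_col) if t == 0]
    p_present = (sum(with_tumor) + alpha) / (len(with_tumor) + 2 * alpha)
    p_absent = (sum(without_tumor) + alpha) / (len(without_tumor) + 2 * alpha)
    return math.log(p_present) - math.log(p_absent)


# Invented presence vectors for eight abstracts; not real TBKG data.
tumor = [1, 1, 1, 1, 0, 0, 0, 0]
biomarker = [1, 1, 1, 0, 0, 0, 1, 0]
score = log_ratio_importance(biomarker, tumor)
print(round(score, 3))
```

A positive score means the biomarker appears more often in abstracts that mention the tumor than in those that do not; a score near zero suggests no association.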

Clinical Validation

Osimertinib case study demonstrated:

  • Moderate consistency with official manual (Kappa=0.68)
  • Better specificity than co-occurrence methods (Kappa=0.4 for co-occurrence)
  • Discovery of rare/unreported ADRs (e.g., renal failure requiring dialysis)
  • Identification of mediating biomarkers (e.g., Macrophage-Activating Factors for nephrosclerosis)
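
The Kappa values above are agreement statistics. For readers unfamiliar with the measure, here is a minimal sketch of Cohen's kappa for two sets of labels over the same items. The label vectors are invented for illustration and are not the case-study data, and the source does not state which kappa variant was used.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    categories = set(labels_a) | set(labels_b)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels: 1 = ADR listed, 0 = not listed, for ten candidate ADRs.
predicted = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
manual = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
kappa = cohen_kappa(predicted, manual)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, so two methods with the same raw accuracy can still differ in kappa when their label frequencies differ.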

Applications

  • Early ADR detection before drug development
  • Mechanism research through biomarker pathways
  • Clinical decision support for oncologists
  • Literature-based drug safety surveillance
  • Rare ADR discovery and explanation


Created: November 22, 2025 | Last modified: May 29, 2026