diseases

DISEASES is an Aggregator.

DISEASES is a weekly updated database that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. It provides confidence scores to facilitate comparison of different types and sources of evidence.

Domains

health, genomics, biomedical, literature

License

CC BY 4.0

Homepage

https://diseases.jensenlab.org/

Repository

Unknown

Infores ID

infores:diseases

FAIRsharing ID

Unknown

Products

From this Resource
ID | Name | URL | Category | Format | Description
diseases.portal | DISEASES Web Search | diseases.jensenlab.org | GraphicalInterface | http | Web search interface for querying hum...
diseases.textmining-full | Text Mining Channel (Full) | human_disease_textmining_full.tsv (1.8 GB) | Product | tsv | Disease-gene associations from text m...
diseases.textmining-filtered | Text Mining Channel (Filtered) | human_disease_textmining_filtered.tsv (46.4 MB) | Product | tsv | Disease-gene associations from text m...
diseases.knowledge-full | Knowledge Channel (Full) | human_disease_knowledge_full.tsv (6.6 MB) | Product | tsv | Disease-gene associations from manual...
diseases.knowledge-filtered | Knowledge Channel (Filtered) | human_disease_knowledge_filtered.tsv (588.0 KB) | Product | tsv | Disease-gene associations from manual...
diseases.experiments-full | Experiments Channel (Full) | human_disease_experiments_full.tsv (25.7 MB) | Product | tsv | Disease-gene associations from experi...
diseases.experiments-filtered | Experiments Channel (Filtered) | human_disease_experiments_filtered.tsv (2.4 MB) | Product | tsv | Disease-gene associations from experi...
diseases.integrated-full | Integrated Channel (Full) | human_disease_integrated_full.tsv (618.3 MB) | Product | tsv | Experimental integrated channel combi...
diseases.dictionary | DISEASES Dictionary | diseases_dictionary.tar.gz (15.2 MB) | Product | | Dictionary of human gene and disease ...
amyco.annotations | AmyCo Curated Annotations | | Product | | Manually curated disease-gene associa...
From other Resources
ID | Name | URL | Category | Format | Description
clinicalkg.graph | CKG Graph Dump | 1 | GraphProduct | mixed | Neo4j database dump of the Clinical K...
cancer-genome-interpreter.clinicalkg.graph | CKG Graph Dump | 1 | GraphProduct | mixed | Neo4j database dump of the Clinical K...
spoke.graph | SPOKE Graph | | GraphProduct | | The SPOKE knowledge graph containing ...
translator.diseases.graph | Translator DISEASES KGX Graph | latest | GraphProduct | kgx-jsonl | KGX JSONL graph package for DISEASES ...
string.protein.links | STRING Protein Links | protein.links.v12.0.txt.gz (128.7 GB) | GraphProduct | txt | protein network data (full network, s...
string.protein.links.detailed | STRING Protein Links Detailed | protein.links.detailed.v12.0.txt.gz (189.6 GB) | GraphProduct | txt | protein network data (full network, i...
string.protein.links.full | STRING Protein Links Full | protein.links.full.v12.0.txt.gz (199.6 GB) | GraphProduct | txt | protein network data (full network, i...
string.protein.physical.links | STRING Protein Physical Links | protein.physical.links.v12.0.txt.gz (11.1 GB) | GraphProduct | txt | protein network data (physical subnet...
string.protein.physical.links.detailed | STRING Protein Physical Links Detailed | protein.physical.links.detailed.v12.0.txt.gz (13.8 GB) | GraphProduct | txt | protein network data (physical subnet...
string.protein.physical.links.full | STRING Protein Physical Links Full | protein.physical.links.full.v12.0.txt.gz (14.5 GB) | GraphProduct | txt | protein network data (physical subnet...
string.cog.links | STRING COG Links | COG.links.v12.0.txt.gz (176.8 MB) | GraphProduct | txt | association scores between orthologou...
string.cog.links.detailed | STRING COG Links Detailed | COG.links.detailed.v12.0.txt.gz (238.7 MB) | GraphProduct | txt | association scores (incl. subscores p...
string.database | STRING Database Network Schema | network_schema.v12.0.sql.gz (262.2 GB) | GraphProduct | | full database, part II: the networks ...
translator.translator_kg.graph | Translator Aggregate KGX Graph | latest | GraphProduct | kgx-jsonl | Aggregated KGX JSONL graph package co...
ckg.graph | CKG Graph Database Dump | 1 | GraphProduct | neo4j | Graph database dump and additional re...

Details

DISEASES

Overview

DISEASES is a comprehensive database that integrates disease-gene associations from multiple evidence sources. Maintained by the JensenLab and currently hosted at the Swiss Institute of Bioinformatics (University of Zurich), it provides weekly updated data combining automatic text mining, manually curated knowledge, cancer mutation data, and genome-wide association studies. The resource assigns unified confidence scores to facilitate comparison across different types of evidence.

Data Content

DISEASES integrates disease-gene associations through four main channels:

1. Text Mining Channel

  • Automatic extraction of disease-gene mentions from scientific literature
  • Z-scores indicating co-mention strength
  • Confidence scores for each association
  • Links to underlying abstracts

2. Knowledge Channel

  • Manually curated disease-gene associations from literature
  • Evidence type classification
  • Source database attribution
  • Confidence scores

3. Experiments Channel

  • Cancer mutation data
  • Genome-wide association study (GWAS) results
  • Source scores from original databases
  • Confidence scores

4. Integrated Channel

  • Experimental combination of all evidence sources
  • Unified confidence scoring across channels
  • Non-redundant filtered associations
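
To make the idea of one score per gene-disease pair concrete, here is a minimal Python sketch that merges several channels by keeping the highest confidence seen for each pair. This page does not describe how DISEASES actually combines evidence, so the maximum rule is only a stand-in, and `combine_max` and the `(gene_id, disease_id, confidence)` tuple shape are hypothetical names chosen for illustration.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]                 # (gene_id, disease_id)
Scored = Tuple[str, str, float]        # (gene_id, disease_id, confidence)


def combine_max(*channels: Iterable[Scored]) -> Dict[Pair, float]:
    """Keep the highest confidence per gene-disease pair across channels.

    Assumption: taking the maximum is a placeholder rule, not the scoring
    scheme used by DISEASES itself.
    """
    best: Dict[Pair, float] = {}
    for channel in channels:
        for gene_id, disease_id, confidence in channel:
            key = (gene_id, disease_id)
            # Overwrite only when this channel gives stronger evidence.
            if confidence > best.get(key, float("-inf")):
                best[key] = confidence
    return best
```

Because each pair appears once in the result, the output is also free of duplicates across channels, which is the property a downstream network or ranking step usually needs.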

Data Organization

All downloadable files contain:

  • Gene identifier and name
  • Disease identifier and name
  • Channel-specific evidence metrics
  • Confidence scores for comparison

Full datasets: Complete associations from the database

Filtered datasets: Non-redundant associations shown in the web interface
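
As a rough illustration of this layout, the following Python sketch reads one channel file into typed records. It assumes the files are headerless and tab-separated, that the first four columns are gene identifier, gene name, disease identifier, and disease name, and that the caller knows where the confidence score sits among the remaining columns. None of this is confirmed on this page, so check it against the actual download. `Association` and `read_associations` are hypothetical names, and the sample row is a placeholder rather than real DISEASES data.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple, Tuple


class Association(NamedTuple):
    gene_id: str
    gene_name: str
    disease_id: str
    disease_name: str
    evidence: Tuple[str, ...]  # channel-specific columns, left as raw strings
    confidence: float


def read_associations(
    lines: Iterable[str], confidence_index: int = -1
) -> Iterator[Association]:
    """Yield one Association per tab-separated line.

    Assumptions: no header row, four leading id/name columns, and the
    confidence score at `confidence_index` within the remaining columns.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 5:
            continue  # skip blank or truncated lines
        extra = row[4:]
        position = confidence_index % len(extra)
        evidence = tuple(v for i, v in enumerate(extra) if i != position)
        yield Association(
            row[0], row[1], row[2], row[3], evidence, float(extra[position])
        )


# Placeholder row, not real data: ids, names, two evidence columns, confidence.
sample = io.StringIO(
    "GENE_ID_1\tGENE_A\tDISEASE_ID_1\tdisease one\tSOURCE_DB\tCURATED\t4.0\n"
)
for record in read_associations(sample):
    print(record.gene_name, record.disease_name, record.confidence)
```

Passing the confidence position as a parameter keeps one reader usable across channels whose evidence columns differ. For a real file, pass an open file handle instead of the `io.StringIO` sample.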

Key Features

  • Weekly Updates: Regular integration of new evidence from literature and databases
  • Multiple Evidence Types: Text mining, curated knowledge, experimental data
  • Confidence Scores: Unified scoring system for comparing evidence quality
  • Human-Specific: Focus on human disease-gene associations
  • Open Access: All data available under CC BY 4.0 license
  • Interactive Search: Web interface for exploring associations

Access Methods

  • Web Interface: Search and browse at https://diseases.jensenlab.org/
  • Direct Downloads: TSV files for each channel (full and filtered)
  • Dictionary: Gene and disease name dictionary for local text mining
  • Historical Data: Previous versions archived on figshare

Download Products

Each channel is available in two versions:

  1. Full: All associations in the database
  2. Filtered: Non-redundant associations (shown in web interface)
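
The file names in the product listing follow a regular pattern, which the Python sketch below turns into a small helper. The pattern `human_disease_<channel>_<version>.tsv` is inferred from the names listed on this page. `channel_filename` is a hypothetical helper, not part of any DISEASES tooling. The listing shows only a full file for the integrated channel, so the sketch rejects the filtered variant rather than guess at a name.

```python
CHANNELS = ("textmining", "knowledge", "experiments", "integrated")
VERSIONS = ("full", "filtered")


def channel_filename(channel: str, version: str = "filtered") -> str:
    """Build a download file name such as human_disease_knowledge_full.tsv.

    Assumption: the naming pattern is inferred from the product listing.
    """
    if channel not in CHANNELS:
        raise ValueError(f"unknown channel: {channel!r}")
    if version not in VERSIONS:
        raise ValueError(f"unknown version: {version!r}")
    if channel == "integrated" and version == "filtered":
        # Only a full integrated file appears in the listing.
        raise ValueError("no filtered file is listed for the integrated channel")
    return f"human_disease_{channel}_{version}.tsv"
```

The helper returns only a file name. The download host is left out on purpose, since this page does not state it.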

Additional resources:

  • DISEASES tagger for local installation (Unix platforms)
  • Disease and gene name dictionary
  • Benchmark dataset from original publication
  • List of excluded PubMed IDs (papermill publications)

Use Cases

  1. Disease Gene Discovery: Identify candidate genes for diseases of interest
  2. Literature Mining: Access text-mined associations from biomedical literature
  3. Evidence Integration: Compare multiple lines of evidence for disease-gene links
  4. Network Analysis: Build disease-gene networks for systems biology studies
  5. Validation: Benchmark other disease-gene prediction methods
  6. Local Text Mining: Use dictionary and tagger for custom analyses
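
As one possible starting point for the disease gene discovery use case, the Python sketch below ranks candidate genes for a single disease by confidence score. It assumes associations have already been reduced to `(gene_id, disease_id, confidence)` tuples. `top_genes` and its parameters are hypothetical, and breaking ties alphabetically by gene identifier is an arbitrary choice made only to keep the output deterministic.

```python
from typing import Iterable, List, Tuple


def top_genes(
    associations: Iterable[Tuple[str, str, float]],
    disease_id: str,
    n: int = 10,
    min_confidence: float = 0.0,
) -> List[str]:
    """Return up to n gene ids for one disease, strongest evidence first."""
    hits = [
        (confidence, gene_id)
        for gene_id, assoc_disease, confidence in associations
        if assoc_disease == disease_id and confidence >= min_confidence
    ]
    # Sort by descending confidence, then by gene id to break ties.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [gene_id for _, gene_id in hits[:n]]
```

Raising `min_confidence` trades recall for precision, which is useful when the input comes from the text mining channel and includes many weak co-mentions.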

Management

Current Maintainer: Qingyao Huang (Swiss Institute of Bioinformatics, University of Zurich)

Original Developers:

  • Sune Frankild
  • Alexander Junge
  • Albert Pallejà
  • Dhouha Grissa
  • Kalliopi Tsafou
  • Lars Juhl Jensen

Affiliation: Novo Nordisk Foundation Center for Protein Research

Funding

  • Novo Nordisk Foundation (NNF14CC0001)
  • National Institutes of Health (U54 CA189205, U24 224370)
  • European Union’s Seventh Framework Programme (grant agreement no. 259348)

License

Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)

Citation

Primary Publication: Grissa, D., Junge, A., Oprea, T. I., & Jensen, L. J. (2022). DISEASES 2.0: a weekly updated database of disease–gene associations from text mining and data integration. Database, 2022, baac019. https://doi.org/10.1093/database/baac019 (PMID: 35348650)

Original Publication: Pletscher-Frankild, S., Pallejà, A., Tsafou, K., Binder, J. X., & Jensen, L. J. (2015). DISEASES: Text mining and data integration of disease–gene associations. Methods, 74, 83-89. (PMID: 25484339)

Created: June 04, 2025 | Last modified: January 30, 2026