KG-Registry

ID	Name	URL	Category	Format	Description
imodulondb.browser	iModulonDB Web Interface	imodulondb.org	GraphicalInterface	http	Interactive web interface for browsin...
imodulondb.pymodulon	PyModulon Python Package	pymodulon	ProgrammingInterface	http	Python package for characterization, ...
imodulondb.imodulonminer	iModulonMiner Pipeline	iModulonMiner	Product	http	Computational pipeline for gathering ...
imodulondb.modulome_workflow	Modulome Workflow	modulome-workflow	Product	http	Modulome workflow for scraping public...
imodulondb.datasets	iModulonDB Datasets	imodulondb.org	Product	mixed	Downloadable transcriptomic datasets ...

Relevant Taxa

Bacteria (NCBITaxon:2)

Details

Overview

iModulonDB is a comprehensive knowledgebase that applies machine learning to bacterial transcriptomes to reveal independently modulated gene sets and transcriptional regulatory networks. First released in 2020 and significantly expanded in 2024, the database uses Independent Component Analysis (ICA) to decompose high-quality transcriptomic datasets into iModulons: sets of co-expressed genes that represent independently modulated regulatory signals.

Key Features

Extensive Coverage: Contains 22 ICA decompositions spanning 15 prokaryotic organisms, with 1924 curated iModulons derived from 9576 expression profiles across 525 studies
Supported Organisms: E. coli, B. subtilis, S. aureus, M. tuberculosis, P. aeruginosa, P. putida, P. syringae, S. enterica, A. baumannii, L. reuteri, S. acidocaldarius, S. pyogenes, S. albidoflavus, S. elongatus, V. natriegens, S. pneumoniae
Interactive Dashboards: Organism-specific pages with iModulon browsers, gene search functionality, and detailed visualizations of gene weights, iModulon activities, and regulatory relationships
Data Curation: Comprehensive metadata including study abstracts, experimental conditions, control/experimental variables, and genetic perturbations
Analysis Tools: Search-driven pairwise analysis of iModulon-iModulon, iModulon-gene, and gene-gene relationships; dataset-wide correlation analyses; phase-plane plots showing iModulon activity relationships
External Integration: Direct links to BioCyc operon diagrams and STRING protein-protein interaction networks for biological context

Understanding iModulons

What are iModulons?

iModulons are independently modulated groups of genes identified through Independent Component Analysis (ICA) of transcriptomic datasets. Each iModulon consists of:

Gene Weights: Indicating how strongly each gene belongs to the iModulon
Activity Profiles: Showing how active the iModulon is under different experimental conditions
Regulatory Associations: Links to known transcription factors and regulons when characterized

iModulons vs Regulons

While regulons are defined by experimentally validated transcription factor binding sites, iModulons are data-driven:

Regulons: Bottom-up approach based on binding site identification
iModulons: Top-down approach based on co-expression patterns
The two often overlap substantially, but iModulons can reveal:
- Novel genes regulated by known transcription factors
- Subsets of regulons responding to specific conditions
- Uncharacterized regulatory modules
- Cross-regulation between regulatory systems
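
The comparison between an iModulon and a known regulon can be illustrated with plain set arithmetic. The sketch below is only an illustration: the function name `compare_gene_sets` and the gene names are hypothetical, and it assumes iModulon membership has already been reduced to a list of gene IDs.

```python
def compare_gene_sets(imodulon_genes, regulon_genes):
    """Summarize the overlap between an iModulon and a regulon.

    Both arguments are iterables of gene identifiers.
    """
    imodulon = set(imodulon_genes)
    regulon = set(regulon_genes)
    shared = imodulon & regulon
    return {
        "shared": sorted(shared),
        # In the iModulon but not the regulon: candidate novel targets
        "candidate_novel_targets": sorted(imodulon - regulon),
        # In the regulon but not the iModulon: may respond only under other conditions
        "regulon_genes_not_in_imodulon": sorted(regulon - imodulon),
        # Fraction of iModulon genes that are known regulon members
        "precision": len(shared) / len(imodulon) if imodulon else 0.0,
        # Fraction of the regulon captured by the iModulon
        "recall": len(shared) / len(regulon) if regulon else 0.0,
    }


# Hypothetical gene names, for illustration only
result = compare_gene_sets(
    ["geneA", "geneB", "geneC", "geneX"],
    ["geneA", "geneB", "geneC", "geneD", "geneE"],
)
print(result)
```

High precision with low recall would suggest an iModulon that captures a subset of a regulon, while genes listed as candidate novel targets are the ones worth checking experimentally.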

Applications and Use Cases

Gene Function Prediction

Assign putative functions to uncharacterized genes based on their iModulon membership
224 genes in E. coli have been characterized through iModulon analysis

Regulatory Network Discovery

Identify new transcriptional regulators and their target genes
Reveal condition-specific activation patterns
Understand systems-level organization of transcriptional regulation

Experimental Validation

MetJ and CysB regulons in E. coli were expanded through ChIP-exo validation of iModulon predictions
BrlR-mediated regulation of the mexPQ-opmE efflux system was discovered in P. aeruginosa

Comparative Analysis

Pan-genomic analysis across strains (e.g., Salmonella Typhimurium)
Evolutionary insights from iModulon conservation
Understanding adaptation to environmental conditions

Data Projection

Project new datasets onto existing iModulon structures
Perform differential iModulon activity analysis between conditions
Simplify analysis from hundreds of genes to a handful of iModulons
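
Projection onto an existing iModulon structure can be sketched as a least-squares problem: given the gene-weight matrix M of an existing decomposition and a new expression matrix, solve for the activities that best reconstruct the new data. This is a minimal sketch under stated assumptions, not the PyModulon API: the function name `project_activities` is hypothetical, the data are synthetic, and the new data are assumed to share gene order and normalization with the original dataset.

```python
import numpy as np


def project_activities(M, X_new):
    """Least-squares estimate of A_new such that X_new ~= M @ A_new.

    M:     (n_genes, n_imodulons) gene weights from an existing decomposition.
    X_new: (n_genes, n_samples) new expression data, same gene order and
           normalization as the data used to compute M (an assumption here).
    """
    M = np.asarray(M, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    if M.shape[0] != X_new.shape[0]:
        raise ValueError("M and X_new must have the same number of genes (rows)")
    A_new, *_ = np.linalg.lstsq(M, X_new, rcond=None)
    return A_new


# Synthetic check: build data from known activities and recover them
rng = np.random.default_rng(0)
M = rng.normal(size=(50, 3))       # 50 genes, 3 iModulons
A_true = rng.normal(size=(3, 4))   # 3 iModulons, 4 new samples
X_new = M @ A_true
A_est = project_activities(M, X_new)
print(np.allclose(A_est, A_true))
```

With real data the reconstruction is not exact, so the inferred activities are estimates whose quality depends on how well the new samples fit the original regulatory structure.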

Technical Details

ICA Methodology

Independent Component Analysis decomposes a transcriptomic matrix (X) into:

M matrix: Links genes to iModulons (gene weights)
A matrix: Links iModulons to conditions (activity levels)
The decomposition assumes that transcriptional regulators have independent, linearly additive effects.
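
The relationship between X, M, and A can be made concrete with a toy example. The sketch below does not run ICA; it goes in the opposite direction, building X from a hand-written M and A, to show what the linear additive assumption means for a single gene. All sizes and values are hypothetical, and the noise present in real data is omitted.

```python
import numpy as np

# Toy sizes (hypothetical): 6 genes, 2 iModulons, 4 samples
M = np.array([
    [2.0, 0.0],
    [1.5, 0.0],
    [0.0, 0.0],   # gene that belongs to neither iModulon
    [0.0, 3.0],
    [0.0, 2.5],
    [0.5, 1.0],   # gene influenced by both iModulons
])
A = np.array([
    [0.0, 1.0, 2.0, 0.0],   # activity of iModulon 0 in each sample
    [0.0, 0.0, 1.0, 4.0],   # activity of iModulon 1 in each sample
])

# Expression matrix implied by the decomposition: genes x samples
X = M @ A

# Expression of one gene in one sample is the sum, over iModulons,
# of (gene weight) * (iModulon activity)
gene, sample = 5, 2
contributions = M[gene, :] * A[:, sample]
print(X.shape, contributions, X[gene, sample])
```

In an actual analysis only X is observed, and ICA estimates M and A from it.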

Quality Control

Datasets must meet criteria for high-quality iModulon computation:

High-quality RNA-seq with rigorous QC (alignment, contamination checks)
Consistent underlying transcriptional regulatory network (typically strain-specific)
Large number of unique conditions (minimum ~100 samples recommended)
Minimal batch effects through careful normalization

Data Processing Pipeline

Download RNA-seq data from public repositories (Sequence Read Archive)
Align reads to reference genome
Quality control and filtering
Normalize expression data to reference conditions
Run ICA to compute iModulons
Characterize iModulons through regulon enrichment
Generate interactive dashboard files
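
The regulon enrichment step in the pipeline above can be illustrated with a hypergeometric tail probability, which is equivalent to a one-sided Fisher's exact test. Whether the iModulonDB pipeline uses exactly this test and these inputs is an assumption; the function name and all counts below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_genome, n_regulon, n_imodulon, n_overlap):
    """P(overlap >= n_overlap) when n_imodulon genes are drawn at random
    from a genome of n_genome genes that contains n_regulon regulon genes."""
    max_overlap = min(n_regulon, n_imodulon)
    total = comb(n_genome, n_imodulon)
    tail = sum(
        comb(n_regulon, k) * comb(n_genome - n_regulon, n_imodulon - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 4000 genes in the genome, a 20-gene regulon,
# a 15-gene iModulon, and 10 genes shared between them
p = hypergeom_enrichment_pvalue(4000, 20, 15, 10)
print(p)
```

A small p-value indicates that the overlap is unlikely to arise by chance. When many regulons are tested against many iModulons, a multiple-testing correction would also be needed; it is left out of this sketch.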

Access and Usage

Web Interface

The main website (iModulonDB.org) provides:

Organism selection page listing all available datasets
Dataset pages with iModulon tables and dataset-wide analyses
Individual iModulon dashboards with detailed visualizations
Gene search functionality across organisms
Project pages showing experimental metadata and publication information
Analysis pages for conducting user-specified comparisons

Programmatic Access

PyModulon Package: Available on GitHub and installable via pip; provides a Python API for:

Loading and manipulating iModulon data
Computing differential iModulon activity
Generating custom plots and analyses
Projecting new data onto existing iModulon structures
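
To show what differential iModulon activity means in practice, the sketch below compares mean activities between two groups of samples using NumPy only. It is a simplified illustration and not the PyModulon API: the function name `differential_activity`, the `min_diff` cutoff, and the activity values are hypothetical, and the statistical testing across replicates that a real analysis needs is omitted.

```python
import numpy as np


def differential_activity(A, group_a, group_b, min_diff=5.0):
    """Difference in mean iModulon activity between two sample groups.

    A:        (n_imodulons, n_samples) activity matrix.
    group_a:  column indices of the baseline samples.
    group_b:  column indices of the comparison samples.
    min_diff: hypothetical cutoff on the absolute activity difference.

    Returns the per-iModulon difference (group_b minus group_a) and the
    indices of iModulons whose absolute difference reaches the cutoff.
    """
    A = np.asarray(A, dtype=float)
    diff = A[:, group_b].mean(axis=1) - A[:, group_a].mean(axis=1)
    changed = np.flatnonzero(np.abs(diff) >= min_diff)
    return diff, changed


# Hypothetical activities: 3 iModulons x 4 samples (2 replicates per condition)
A = np.array([
    [0.0, 1.0, 10.0, 12.0],
    [2.0, 2.0, 3.0, 1.0],
    [-8.0, -6.0, 0.0, 2.0],
])
diff, changed = differential_activity(A, group_a=[0, 1], group_b=[2, 3])
print(diff, changed)
```

Working on the activity matrix rather than the full expression matrix is what reduces the comparison from thousands of genes to a small number of iModulons.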

Pipeline Tools: iModulonMiner and Modulome Workflow are available for generating custom iModulon decompositions

Data Downloads

All transcriptomic datasets and iModulon structures are downloadable directly from the website or through associated publications.

Recent Updates (Version 2.0)

The 2024 update represents a 1370% growth in expression profiles and includes:

Expanded Content: 19 new datasets, 12 additional organisms, 8925 new samples
Enhanced Visualization: Condition-specific coloring, genetic perturbation highlighting, improved correlation plots
New Analysis Tools: Search-driven pairwise analyses, phase-plane plots, dataset-wide correlation searches
Improved Curation: Study abstracts, experimental variables, reference conditions displayed for all projects
External Resources: Direct links to BioCyc operons and STRING interaction networks
Interactive Features: TreeMaps for explained variance, Recall Plots for regulon overlap, correlation heatmaps

Database Statistics

Datasets: 22 ICA decompositions across 15 organisms
iModulons: 1924 curated sets with characterized functions and regulators
Expression Profiles: 9576 samples from diverse experimental conditions
Studies: 525 unique experiments with 247 accompanying publications
Monthly Active Users: ~500 researchers worldwide
Citations: Over 300 citations of iModulonDB-related publications

Community and Support

Contributing

Researchers can submit their own ICA decompositions and transcriptomic datasets through the contact form on the website.

Contact

For questions, comments, feedback, or collaboration inquiries:

Email: ecatoiu@ucsd.edu
Organization: Systems Biology Research Group (SBRG) at UC San Diego
Website: http://systemsbiology.ucsd.edu/

Related Resources

RegulonDB: Experimentally validated regulons for E. coli
BioCyc: Metabolic pathways and operon structures
STRING: Protein-protein interaction networks
Sequence Read Archive: Source of public RNA-seq data
