Overview
iModulonDB is a knowledgebase that applies machine learning to bacterial transcriptomes to reveal independently modulated gene sets and transcriptional regulatory networks. First released in 2020 and significantly expanded in 2024, the database uses Independent Component Analysis (ICA) to decompose high-quality transcriptomic datasets into iModulons: sets of co-expressed genes that each represent one independently modulated regulatory signal.
Key Features
- Extensive Coverage: Contains 22 ICA decompositions spanning 15 prokaryotic organisms, with 1924 curated iModulons derived from 9576 expression profiles across 525 studies
- Supported Organisms: E. coli, B. subtilis, S. aureus, M. tuberculosis, P. aeruginosa, P. putida, P. syringae, S. enterica, A. baumannii, L. reuteri, S. acidocaldarius, S. pyogenes, S. albidoflavus, S. elongatus, V. natriegens, S. pneumoniae
- Interactive Dashboards: Organism-specific pages with iModulon browsers, gene search functionality, and detailed visualizations of gene weights, iModulon activities, and regulatory relationships
- Data Curation: Comprehensive metadata including study abstracts, experimental conditions, control/experimental variables, and genetic perturbations
- Analysis Tools: Search-driven pairwise analysis of iModulon-iModulon, iModulon-gene, and gene-gene relationships; dataset-wide correlation analyses; phase-plane plots showing iModulon activity relationships
- External Integration: Direct links to BioCyc operon diagrams and STRING protein-protein interaction networks for biological context
Understanding iModulons
What are iModulons?
iModulons are independently modulated groups of genes identified through Independent Component Analysis (ICA) of transcriptomic datasets. Each iModulon consists of:
- Gene Weights: Indicating how strongly each gene belongs to the iModulon
- Activity Profiles: Showing how active the iModulon is under different experimental conditions
- Regulatory Associations: Links to known transcription factors and regulons when characterized
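As a rough illustration of how gene weights translate into iModulon membership, the sketch below keeps the genes whose absolute weight exceeds a cutoff. The gene names, the weights, and the cutoff of 0.05 are hypothetical values chosen for the example; the database's own thresholding procedure is not described here and may differ.

```python
# Hypothetical gene weights for one iModulon (one column of the gene-weight matrix).
gene_weights = {
    "geneA": 0.21,
    "geneB": -0.15,
    "geneC": 0.01,
    "geneD": -0.02,
    "geneE": 0.004,
}


def imodulon_members(weights, threshold):
    """Return genes whose absolute weight exceeds the threshold, strongest first."""
    members = [gene for gene, w in weights.items() if abs(w) > threshold]
    return sorted(members, key=lambda gene: abs(weights[gene]), reverse=True)


# The threshold is an arbitrary illustrative value, not a database default.
members = imodulon_members(gene_weights, threshold=0.05)
print(members)
```

Using the absolute value means that strongly negative weights (genes repressed when the iModulon is active) count as members alongside strongly positive ones.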
iModulons vs Regulons
While regulons are defined by experimentally validated transcription factor binding sites, iModulons are data-driven:
- Regulons: Bottom-up approach based on binding site identification
- iModulons: Top-down approach based on co-expression patterns
- The two often overlap heavily, but iModulons can also reveal:
  - Novel genes regulated by known transcription factors
  - Subsets of regulons responding to specific conditions
  - Uncharacterized regulatory modules
  - Cross-regulation between regulatory systems
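The overlap between an iModulon and a known regulon can be quantified in a few lines. The sketch below is an assumption-laden illustration: the gene sets and genome size are made up, and the one-sided hypergeometric test is a common choice for enrichment rather than a confirmed description of the database's procedure.

```python
from math import comb


def overlap_stats(imodulon_genes, regulon_genes, n_genome_genes):
    """Compare an iModulon gene set with a regulon gene set.

    Assumes both sets are non-empty and drawn from the same genome.
    """
    imod, reg = set(imodulon_genes), set(regulon_genes)
    shared = imod & reg
    precision = len(shared) / len(imod)  # fraction of the iModulon inside the regulon
    recall = len(shared) / len(reg)      # fraction of the regulon captured by the iModulon
    # Hypergeometric upper tail: probability of an overlap at least this large by chance.
    k, K, n, N = len(shared), len(reg), len(imod), n_genome_genes
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return {"shared": sorted(shared), "precision": precision,
            "recall": recall, "p_value": tail / comb(N, n)}


# Hypothetical gene sets in a hypothetical 100-gene genome.
stats = overlap_stats({"g1", "g2", "g3", "g4"}, {"g2", "g3", "g4", "g5", "g6"}, 100)
print(stats)
```

Genes that are in the iModulon but not in the regulon (here `g1`) are the candidates for "novel genes regulated by known transcription factors", while low recall suggests the iModulon captures only a subset of the regulon.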
Applications and Use Cases
Gene Function Prediction
- Assign putative functions to uncharacterized genes based on their iModulon membership
- 224 genes in E. coli have been characterized through iModulon analysis
Regulatory Network Discovery
- Identify new transcriptional regulators and their target genes
- Reveal condition-specific activation patterns
- Understand systems-level organization of transcriptional regulation
Experimental Validation
- MetJ and CysB regulons in E. coli were expanded through ChIP-exo validation of iModulon predictions
- BrlR-mediated regulation of the mexPQ-opmE efflux system was discovered in P. aeruginosa
Comparative Analysis
- Pan-genomic analysis across strains (e.g., Salmonella Typhimurium)
- Evolutionary insights from iModulon conservation
- Understanding adaptation to environmental conditions
Data Projection
- Project new datasets onto existing iModulon structures
- Perform differential iModulon activity analysis between conditions
- Simplify analysis from hundreds of genes to a handful of iModulons
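Projection can be pictured as a least-squares problem: given a fixed gene-weight matrix M, find the activities A_new that best reproduce a new expression matrix X_new. The sketch below solves the normal equations (MᵀM) A_new = Mᵀ X_new in plain Python. All matrices are toy values, the function names are hypothetical, and it assumes the new data are already normalized the same way as the original dataset and that the columns of M are linearly independent.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a @ x = b for a square matrix a by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("columns of M are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def project(M, X_new):
    """Least-squares activities (iModulons x samples) for new data (genes x samples)."""
    Mt = transpose(M)
    return solve(matmul(Mt, M), matmul(Mt, X_new))


# Toy example: 4 genes, 2 iModulons, 2 new samples.
M = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
X_new = [[2.0, -1.0], [2.0, -1.0], [0.5, 3.0], [1.0, 6.0]]
A_new = project(M, X_new)
print(A_new)
```

Because M stays fixed, the projected activities are directly comparable with the activities already stored for the original dataset, which is what makes differential activity analysis between old and new conditions meaningful.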
Technical Details
ICA Methodology
Independent Component Analysis decomposes a transcriptomic matrix X (genes × conditions) into the product of two matrices, X ≈ M · A:
- M matrix: Links genes to iModulons (gene weights)
- A matrix: Links iModulons to conditions (activity levels)
The decomposition assumes that transcriptional regulators have independent, linearly additive effects.
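The linear, additive assumption means that each gene's expression in a condition is modeled as a sum of per-iModulon contributions: gene weight times iModulon activity. The toy example below, with made-up matrices and gene names, reconstructs an expression matrix from M and A and breaks one entry into its contributions. It illustrates the model only and does not run ICA itself.

```python
# Hypothetical toy matrices; real ones come from running ICA on a large compendium.
genes = ["geneA", "geneB", "geneC"]
M = [[0.5, 0.0],
     [0.25, 0.5],
     [0.0, 1.0]]        # genes x iModulons (gene weights)
A = [[2.0, 0.0, -2.0],
     [1.0, 4.0, 0.0]]   # iModulons x conditions (activities)


def reconstruct(M, A):
    """Matrix product M @ A: modeled expression, genes x conditions."""
    return [[sum(m * a for m, a in zip(row, col)) for col in zip(*A)] for row in M]


def contributions(M, A, gene_idx, cond_idx):
    """Per-iModulon contributions to one gene's expression in one condition."""
    return [M[gene_idx][k] * A[k][cond_idx] for k in range(len(A))]


X_hat = reconstruct(M, A)
geneB_parts = contributions(M, A, gene_idx=1, cond_idx=0)
print(X_hat)
print(geneB_parts, sum(geneB_parts))
```

Here geneB belongs to both iModulons, so its modeled expression in the first condition is the sum of two separate regulatory signals.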
Quality Control
Datasets must meet criteria for high-quality iModulon computation:
- High-quality RNA-seq with rigorous QC (alignment, contamination checks)
- Consistent underlying transcriptional regulatory network (typically strain-specific)
- Large number of unique conditions (minimum ~100 samples recommended)
- Minimal batch effects through careful normalization
Data Processing Pipeline
- Download RNA-seq data from public repositories (Sequence Read Archive)
- Align reads to reference genome
- Quality control and filtering
- Normalize expression data to reference conditions
- Run ICA to compute iModulons
- Characterize iModulons through regulon enrichment
- Generate interactive dashboard files
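The normalization step in the pipeline above can be sketched as centering each gene on its mean in the reference samples, so that every value becomes a change relative to the reference condition. The sample names and values are hypothetical, and the sketch assumes the inputs are already log-transformed expression values, which the text does not specify.

```python
from statistics import mean

# Hypothetical log-transformed expression values: gene -> {sample: value}.
log_expr = {
    "geneA": {"ref_1": 5.0, "ref_2": 7.0, "heat_1": 9.0},
    "geneB": {"ref_1": 2.0, "ref_2": 2.0, "heat_1": 1.0},
}
reference_samples = ["ref_1", "ref_2"]


def center_on_reference(expr, ref_samples):
    """Subtract each gene's mean reference expression from all of its samples."""
    centered = {}
    for gene, values in expr.items():
        baseline = mean(values[s] for s in ref_samples)
        centered[gene] = {sample: v - baseline for sample, v in values.items()}
    return centered


centered = center_on_reference(log_expr, reference_samples)
print(centered)
```

After centering, the reference samples average to zero for every gene, so an iModulon activity near zero can be read as "similar to the reference condition".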
Access and Usage
Web Interface
The main website (iModulonDB.org) provides:
- Organism selection page listing all available datasets
- Dataset pages with iModulon tables and dataset-wide analyses
- Individual iModulon dashboards with detailed visualizations
- Gene search functionality across organisms
- Project pages showing experimental metadata and publication information
- Analysis pages for conducting user-specified comparisons
Programmatic Access
PyModulon Package: Available on GitHub and installable via pip; provides a Python API for:
- Loading and manipulating iModulon data
- Computing differential iModulon activity
- Generating custom plots and analyses
- Projecting new data onto existing iModulon structures
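The idea behind differential iModulon activity can be shown without the package itself. The sketch below does not use or mirror PyModulon's actual function names: `differential_activity`, the activity values, and the fixed difference cutoff are all hypothetical, and a real analysis would normally add a statistical test on top of the difference in means.

```python
from statistics import mean

# Hypothetical activities: iModulon -> condition -> replicate activities.
activities = {
    "iM_1": {"control": [0.1, -0.1], "treated": [6.0, 8.0]},
    "iM_2": {"control": [1.0, 1.5], "treated": [1.25, 1.75]},
}


def differential_activity(acts, cond_a, cond_b, min_diff):
    """Return {iModulon: mean(cond_b) - mean(cond_a)} where |difference| >= min_diff."""
    result = {}
    for name, by_condition in acts.items():
        diff = mean(by_condition[cond_b]) - mean(by_condition[cond_a])
        if abs(diff) >= min_diff:
            result[name] = diff
    return result


# min_diff is an illustrative cutoff, not a recommended value.
dima = differential_activity(activities, "control", "treated", min_diff=5.0)
print(dima)
```

Comparing a few dozen iModulon activities instead of thousands of individual genes is what keeps this kind of comparison short and interpretable.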
Pipeline Tools: iModulonMiner and Modulome Workflow are available for generating custom iModulon decompositions
Data Downloads
All transcriptomic datasets and iModulon structures are downloadable directly from the website or through associated publications.
Recent Updates (Version 2.0)
The 2024 update represents roughly 1370% growth in expression profiles (from 651 to 9576 samples) and includes:
- Expanded Content: 19 new datasets, 12 additional organisms, 8925 new samples
- Enhanced Visualization: Condition-specific coloring, genetic perturbation highlighting, improved correlation plots
- New Analysis Tools: Search-driven pairwise analyses, phase-plane plots, dataset-wide correlation searches
- Improved Curation: Study abstracts, experimental variables, reference conditions displayed for all projects
- External Resources: Direct links to BioCyc operons and STRING interaction networks
- Interactive Features: TreeMaps for explained variance, Recall Plots for regulon overlap, correlation heatmaps
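The pairwise and dataset-wide correlation analyses listed above come down to correlating two profiles across the same set of samples. As a minimal sketch, assuming Pearson correlation is the measure of interest (the site's exact statistic is not stated here), two iModulon activity profiles can be compared as follows. The activity values are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Hypothetical activities of three iModulons across the same four samples.
act_1 = [1.0, 2.0, 3.0, 4.0]
act_2 = [2.0, 4.0, 6.0, 8.0]
act_3 = [4.0, 3.0, 2.0, 1.0]
r_12 = pearson(act_1, act_2)
r_13 = pearson(act_1, act_3)
print(r_12, r_13)
```

A phase-plane plot is the visual counterpart: the same two activity profiles plotted against each other, where a strong correlation appears as points falling along a line.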
Database Statistics
- Datasets: 22 ICA decompositions across 15 organisms
- iModulons: 1924 curated sets with characterized functions and regulators
- Expression Profiles: 9576 samples from diverse experimental conditions
- Studies: 525 unique experiments with 247 accompanying publications
- Monthly Active Users: ~500 researchers worldwide
- Citations: Over 300 citations of iModulonDB-related publications
Community and Support
Contributing
Researchers can submit their own ICA decompositions and transcriptomic datasets through the contact form on the website.
Contact
For questions, comments, feedback, or collaboration inquiries:
- Email: ecatoiu@ucsd.edu
- Organization: Systems Biology Research Group (SBRG) at UC San Diego
- Website: http://systemsbiology.ucsd.edu/
Related Resources
- RegulonDB: Experimentally validated regulons for E. coli
- BioCyc: Metabolic pathways and operon structures
- STRING: Protein-protein interaction networks
- Sequence Read Archive: Source of public RNA-seq data