lincs-l1000

is a Data Source.

The LINCS L1000 is a high-throughput, reduced representation gene expression profiling assay developed as part of the NIH Library of Integrated Network-Based Cellular Signatures (LINCS) Program. L1000 directly measures 978 landmark genes and computationally infers the expression of 11,350 additional genes, enabling cost-effective large-scale transcriptional profiling at approximately $2 per sample. The technology was developed to create a Connectivity Map (CMap) that catalogs cellular responses to genetic and chemical perturbations, facilitating drug discovery, mechanism of action determination, and functional annotation of genetic variants. The LINCS L1000 dataset comprises over 1.3 million gene expression profiles representing responses to 19,811 chemical compounds, genetic perturbations targeting 5,075 genes (via shRNA knockdowns and cDNA overexpression), and 314 biologics across multiple cell lines and time points.

Compare

Domains

drug discovery

License

Warning: No license entered

Homepage

lincs-l1000

Repository

GitHub

Infores ID

Unknown

FAIRsharing ID

Unknown

Product Summary

Contacts

Broad Institute LINCS Transcriptomics Center

Email: lincs-help@broadinstitute.org

Publications

A Next Generation Connectivity Map: L1000 platform and the first 1,000,000 profiles Preferred

Products

From this Resource

ID	Name	URL	Category	Format	Description
lincs-l1000.cmap	LINCS Connectivity Map (CMap)	clue.io	GraphicalInterface	http	The Connectivity Map (CMap) database ...
lincs-l1000.clue	CLUE Platform	clue.io	GraphicalInterface	http	The CLUE platform provides interactiv...
lincs-l1000.geo ⚠	LINCS L1000 GEO Dataset	acc.cgi?acc=GSE92742	Product	http	LINCS L1000 data deposited in the Gen...

From other Resources

ID	Name	URL	Category	Format	Relation	Description
spoke.graph	SPOKE Graph	data-tools	GraphProduct	http	had primary source	The SPOKE knowledge graph containing ...
alzkb.browser	AlzKB Graph Database Browser	login	GraphicalInterface	http	had primary source	A browser interface for a knowledge g...
alzkb.data	AlzKB Data Release (Version 2.0.0)	v2.0.0	GraphProduct	mixed	had primary source	Memgraph data release for AlzKB.
harmonizome.downloads	Harmonizome Downloads	download	Product	mixed	was derived from	Harmonizome 3.0 processed dataset dow...
harmonizome.kg-neo4j	Harmonizome Knowledge Graph Neo4j Database	harmonizome-kg.maayanlab.cloud	GraphProduct	neo4j	was derived from	Neo4j knowledge graph serialization o...
spoke-okn.graph	SPOKE-OKN Graph	spoke.ucsf.edu	GraphProduct	ttl	was influenced by	The SPOKE-OKN knowledge graph, an OKN...

Relevant Taxa

Human (NCBITaxon:9606)

Details

Overview

The LINCS L1000 platform represents a major advance in high-throughput transcriptional profiling technology. By measuring only 978 carefully selected “landmark” genes and using computational inference for the remaining transcriptome, L1000 achieves dramatic cost reduction (approximately $2 per sample) while maintaining high reproducibility and comparability to RNA-seq. This breakthrough enabled the generation of over 1.3 million gene expression profiles as part of the NIH LINCS Program, creating one of the largest publicly available transcriptional profiling resources.

Key Features

Technology

Landmark genes: 978 directly measured genes selected through data-driven analysis of 12,000+ microarray profiles
Inferred genes: 11,350 genes computationally inferred with 81% showing high accuracy
Platform: Ligation-mediated amplification (LMA) with Luminex bead-based detection
Cost: Approximately $2 per profile (vs. traditional microarrays or RNA-seq)
Reproducibility: >88% of technical replicates show Spearman correlation >0.9
Validation: Highly comparable to RNA-seq (median cross-platform correlation 0.84)

Dataset Composition (CMap-L1000v1)

1,319,138 L1000 profiles comprising:
- 19,811 small molecule compounds (drugs, tool compounds, screening libraries)
- 18,493 shRNAs targeting 5,075 genes for loss-of-function studies
- 3,462 cDNAs for gain-of-function studies
- 314 biologics
- 473,647 signatures (consolidating biological replicates)
Cell line coverage:
- 9 core cancer cell lines (Touchstone reference dataset)
- Up to 77 cell lines for Discovery dataset
- Includes neuronal cell types (NPCs and differentiated neurons)
Time points: 6h and 24h for chemical perturbations, 96h for genetic perturbations

Applications

Drug discovery: Mechanism of action (MOA) determination, off-target effect identification
Compound annotation: Functional classification of uncharacterized molecules via connectivity to Perturbagen Classes (PCLs)
Genetic variant interpretation: Assess functional impact of disease-associated alleles (e.g., FBXW7, KEAP1, PTEN)
Clinical trial analysis: Evaluate target engagement and identify resistance mechanisms
Connectivity mapping: Discover relationships between genes, drugs, and disease states through gene expression signatures
Bioactivity screening: Identify transcriptionally active compounds from screening libraries

Data Access and Tools

Primary Data Portals

CLUE Platform (https://clue.io): Interactive analysis tools, signature search, data downloads, and APIs
GEO (GSE92742): Raw and processed data at multiple preprocessing levels
GitHub (https://github.com/cmap/cmapM): Pre-processing code and tools

Additional LINCS Resources

SigCom LINCS (https://maayanlab.cloud/sigcom-lincs): Search across 1.5M+ signatures
L1000FWD (https://maayanlab.cloud/L1000FWD/): L1000 Characteristic Direction Signature Search Engine
iLINCS (http://www.ilincs.org): Integrated LINCS data portal
LINCS Data Portal (http://lincsportal.ccs.miami.edu/dcic-portal/): DCIC data coordination center

Data Levels

Level 1: Raw bead count and fluorescence intensity
Level 2: Deconvoluted data (assigning expression to two genes per bead color)
Level 3: Normalized data (LISS + quantile normalization) with inferred gene expression
Level 4: Differential expression (z-scores)
Level 5: Replicate-consensus signatures

Methodology Highlights

Perturbagen Classes (PCLs)

171 high-confidence Perturbagen Classes have been defined, representing groups of perturbagens with shared mechanisms. These include:

92 compound classes (e.g., HDAC inhibitors, kinase inhibitors)
60 loss-of-function gene classes
17 gain-of-function gene classes

PCLs enhance interpretability by aggregating similar perturbagens to strengthen on-target signals while diminishing off-target effects.

Consensus Gene Signatures (CGS)

To mitigate strong off-target effects of shRNAs (where seed sequence effects often exceed on-target effects), a Consensus Gene Signature algorithm was developed that identifies consistent gene expression changes across multiple shRNAs targeting the same gene.

Query Methodology

The Connectivity Map uses a weighted connectivity score (WTCS) approach similar to Gene Set Enrichment Analysis, computing similarity between query gene signatures and database signatures. Results are normalized and summarized across cell lines using quantile-based metrics (τ scores) to identify robust connections.

Key Publications and Citations

Primary Reference:

Subramanian A, Narayan R, Corsello SM, et al. A Next Generation Connectivity Map: L1000 platform and the first 1,000,000 profiles. Cell. 2017;171(6):1437-1452.e17. doi:10.1016/j.cell.2017.10.049

Additional Resources:

Original Connectivity Map concept: Lamb J, et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006;313(5795):1929-1935.
Detailed protocols: https://clue.io/sop-L1000.pdf
Connectopedia knowledge base: https://clue.io/connectopedia

Funding

Supported by NIH grants including 5U54HG006093, U54HG008699, and 5U01HG008699 as part of the NIH Common Fund LINCS Program.

Is this information incorrect or incomplete? Request an update.

Created: January 08, 2025 | Last modified: November 08, 2025