gencode is a Data Source.

GENCODE is a comprehensive and high-quality reference annotation of the human and mouse genomes, providing evidence-based gene annotations including protein-coding genes, long non-coding RNAs, small RNAs, pseudogenes, and other genomic features based on manual curation and computational analysis.

Domains

genomics

License

Warning: No license entered

Homepage

gencode

Repository

Unknown

Infores ID

Unknown

FAIRsharing ID

Unknown

Product Summary

Products

From this Resource
ID | Name | URL | Category | Format | Description
gencode.human.gtf | GENCODE Human Annotations GTF | human | Product | gff | Comprehensive gene annotations for hu...
gencode.mouse.gtf | GENCODE Mouse Annotations GTF | mouse | Product | gff | Comprehensive gene annotations for mo...
gencode.primary | GENCODE Primary Transcripts | gencode_primary | Product | gff | GENCODE Primary transcript set captur...
From other Resources
ID | Name | URL | Category | Format | Description
ubkg.neo4j | UBKG Neo4j Docker Distribution | ubkg-downloads.xconsortia.org | GraphProduct | | Turnkey neo4j distributions that depl...
ubkg.csv | UBKG Ontology CSV Files | ubkg-downloads.xconsortia.org | GraphProduct | csv | Ontology CSV files that can be import...

Details

GENCODE

Overview

GENCODE (Encyclopedia of Genes and Gene Variants) is a scientific project that produces evidence-based gene annotations for the human and mouse genomes. These annotations serve as a reference standard for genome interpretation and biomedical research.

Mission

The goal of the GENCODE project is to identify and classify all gene features in the human and mouse genomes with high accuracy based on biological evidence, and to release these annotations for the benefit of biomedical research and genome interpretation.

Key Features

  • Comprehensive Annotations: Protein-coding genes, long non-coding RNAs (lncRNAs), small RNAs, pseudogenes, and other genomic features
  • Evidence-Based: Combines manual curation with computational analysis
  • Regular Updates: New releases approximately every 3-4 months
  • High Quality: Integrates experimental data including Capture Long-read Sequencing
  • Global Core Biodata Resource: Recognized as essential infrastructure for life sciences research

Current Releases (September 2025)

  • GENCODE 49: Human genome annotations
  • GENCODE M38: Mouse genome annotations

Annotation Types

Protein-Coding Genes

  • Comprehensive transcript annotations
  • Alternative splicing isoforms
  • Coding sequence (CDS) definitions
  • Translation information

Non-Coding RNAs

  • Long non-coding RNAs (lncRNAs): Expanded significantly by integrating data from the Capture Long-read Sequencing project
  • Small RNAs: miRNAs, snoRNAs, snRNAs
  • Pseudogenes: Processed and unprocessed pseudogenes

Special Annotation Sets

  • GENCODE Primary: Minimal set of transcripts at protein-coding genes
  • Promoter Windows: First catalog of human promoter windows
  • Ribo-seq ORFs: Non-canonical human ORFs predicted from ribosome profiling data, with peptidomics and immunopeptidomics evidence integrated

Data Formats

  • GTF/GFF3: Annotation files in Gene Transfer Format (GTF) and General Feature Format version 3 (GFF3)
  • FASTA: Sequence files for transcripts and proteins
  • Metadata: Gene and transcript metadata
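
To illustrate the GTF layout listed above, here is a minimal Python sketch that parses one GTF line into its nine tab-separated columns and splits the attribute column into key/value pairs. The record, the identifiers in it, and the helper names (`GtfRecord`, `parse_attributes`, `parse_gtf_line`) are hypothetical and chosen only for illustration; this is not an official GENCODE parser, and real files may contain cases it does not handle.

```python
import re
from typing import Dict, List, NamedTuple

# One attribute: a key followed by either a quoted string or a bare token.
_ATTR_RE = re.compile(r'(\S+)\s+(?:"([^"]*)"|([^\s;]+))\s*(?:;|$)')


class GtfRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    strand: str
    attributes: Dict[str, List[str]]


def parse_attributes(column: str) -> Dict[str, List[str]]:
    """Parse the ninth GTF column; repeated keys such as `tag` are kept as a list."""
    attrs: Dict[str, List[str]] = {}
    for key, quoted, bare in _ATTR_RE.findall(column):
        attrs.setdefault(key, []).append(quoted if quoted else bare)
    return attrs


def parse_gtf_line(line: str) -> GtfRecord:
    """Split a GTF data line into its nine columns and parse the attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, _score, strand, _frame, attrs = fields
    return GtfRecord(seqname, source, feature, int(start), int(end),
                     strand, parse_attributes(attrs))


# Illustrative record with made-up identifiers (not taken from a real release).
SAMPLE = (
    "chr1\tHAVANA\ttranscript\t1000\t2000\t.\t+\t.\t"
    'gene_id "ENSG00000000001.1"; transcript_id "ENST00000000001.1"; '
    'gene_type "lncRNA"; level 2; tag "basic"; tag "Ensembl_canonical";'
)

record = parse_gtf_line(SAMPLE)
print(record.feature, record.start, record.end)
print(record.attributes["gene_type"], record.attributes["tag"])
```

Attribute values are stored as lists because a key can appear more than once on the same line; callers that expect a single value can take the first element.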

Applications

  • RNA-seq analysis and transcript quantification
  • Variant annotation and interpretation
  • Comparative genomics
  • Functional genomics studies
  • Gene expression analysis
  • Knowledge graph construction (UBKG)
  • Clinical genomics and precision medicine
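
As a sketch of the transcript quantification use case above, the following Python example builds a transcript-to-gene mapping from GTF `transcript` records and uses it to sum transcript-level values per gene. The input lines, identifiers, counts, and function names (`transcript_to_gene`, `sum_to_gene`) are hypothetical; real pipelines typically rely on dedicated tools for this step.

```python
import re
from typing import Dict, Iterable

_GENE_ID = re.compile(r'gene_id "([^"]+)"')
_TX_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_to_gene(lines: Iterable[str]) -> Dict[str, str]:
    """Map transcript_id to gene_id using only `transcript` feature rows."""
    mapping: Dict[str, str] = {}
    for line in lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "transcript":
            continue
        gene = _GENE_ID.search(fields[8])
        tx = _TX_ID.search(fields[8])
        if gene and tx:
            mapping[tx.group(1)] = gene.group(1)
    return mapping


def sum_to_gene(tx_values: Dict[str, float],
                mapping: Dict[str, str]) -> Dict[str, float]:
    """Sum transcript-level values per gene; unmapped transcripts are skipped."""
    totals: Dict[str, float] = {}
    for tx_id, value in tx_values.items():
        gene_id = mapping.get(tx_id)
        if gene_id is not None:
            totals[gene_id] = totals.get(gene_id, 0.0) + value
    return totals


# Made-up records for illustration only (not real GENCODE data).
GTF_LINES = [
    "##description: illustrative records",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "ENSG00000000001.1";',
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\t'
    'gene_id "ENSG00000000001.1"; transcript_id "ENST00000000001.1";',
    'chr1\tHAVANA\ttranscript\t150\t900\t.\t+\t.\t'
    'gene_id "ENSG00000000001.1"; transcript_id "ENST00000000002.1";',
    'chr2\tHAVANA\ttranscript\t500\t800\t.\t-\t.\t'
    'gene_id "ENSG00000000002.1"; transcript_id "ENST00000000003.1";',
]

tx2gene = transcript_to_gene(GTF_LINES)
tx_counts = {
    "ENST00000000001.1": 10.0,
    "ENST00000000002.1": 5.0,
    "ENST00000000003.1": 7.5,
    "ENST99999999999.1": 1.0,  # absent from the annotation, so it is skipped
}
gene_counts = sum_to_gene(tx_counts, tx2gene)
print(gene_counts)
```

Skipping unmapped transcripts silently is a simplification; in practice it may be preferable to report them, since they often indicate a mismatch between the annotation release and the quantification index.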

Integration

GENCODE annotations are integrated into:

  • Ensembl genome browser
  • UCSC Genome Browser
  • NCBI RefSeq
  • UniProt
  • UBKG (Unified Biomedical Knowledge Graph)
  • GTEx (Genotype-Tissue Expression)
  • Various RNA-seq analysis pipelines
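
When combining GENCODE annotations with data from another resource, identifiers often need to be normalized first. The sketch below assumes that GENCODE gene identifiers carry a trailing version suffix (for example `.5`) while the other table is keyed by unversioned identifiers; both the assumption and all identifiers, values, and function names here are illustrative, so the actual convention of each resource should be checked before joining.

```python
from typing import Dict


def strip_version(stable_id: str) -> str:
    """Drop a trailing '.<digits>' version; other IDs are returned unchanged."""
    base, dot, version = stable_id.rpartition(".")
    return base if dot and version.isdigit() else stable_id


def join_on_unversioned(gencode_names: Dict[str, str],
                        external: Dict[str, float]) -> Dict[str, float]:
    """Attach external values to gene names by matching unversioned gene IDs."""
    joined: Dict[str, float] = {}
    for versioned_id, gene_name in gencode_names.items():
        key = strip_version(versioned_id)
        if key in external:
            joined[gene_name] = external[key]
    return joined


# Hypothetical inputs: versioned IDs on one side, unversioned on the other.
gencode_names = {
    "ENSG00000000001.5": "EXAMPLE1",
    "ENSG00000000002.12": "EXAMPLE2",
}
external_scores = {"ENSG00000000001": 0.8, "ENSG00000000003": 0.1}

print(strip_version("ENSG00000000001.5"))
print(join_on_unversioned(gencode_names, external_scores))
```

Identifiers whose final dot-separated part is not purely numeric are left untouched, so any extra suffix conventions need separate handling.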

Consortium Members

  • EMBL-EBI (European Bioinformatics Institute)
  • The Wellcome Sanger Institute
  • CRG (Centre for Genomic Regulation, Barcelona)
  • UCSC (University of California, Santa Cruz)
  • CNIO (Spanish National Cancer Research Centre)
  • MIT (Massachusetts Institute of Technology)
  • Yale University

Funding

GENCODE is supported by the National Human Genome Research Institute (NHGRI) of the US National Institutes of Health.

Access

  • Website: https://www.gencodegenes.org/
  • FTP Downloads: Gene annotations, sequences, and metadata
  • Genome Browsers: Via Ensembl and UCSC
  • Social Media: @GencodeGenes on Twitter

Citation

When using GENCODE data, please cite the GENCODE project and relevant publications describing the specific release used.

License

GENCODE data are freely available for research use. See EMBL-EBI Terms of Use for details.

Created: November 26, 2025 | Last modified: November 26, 2025