refseq

refseq is a Data Source.

The NCBI Reference Sequence Database (RefSeq) provides a comprehensive, integrated, non-redundant, well-annotated set of reference sequences including genomic, transcript, and protein sequences for naturally occurring molecules of the central dogma.

Domains

genomics, biomedical, biological systems

License

Warning: No license entered

Homepage

refseq

Infores ID

Unknown

FAIRsharing ID

Unknown

Products

From other Resources
ID | Name | URL | Category | Format | Description
ncbigene.gene_refseq_uniprotkb_collab | Gene RefSeq UniProtKB Collaboration Data | gene_refseq_uniprotkb_collab.gz (1.1 GB) | MappingProduct | tsv | Gene to RefSeq/UniProtKB collaboratio...
ncbigene.gene2refseq | Gene to RefSeq Mapping | gene2refseq.gz (1.9 GB) | MappingProduct | tsv | Gene to RefSeq mapping data providing...
clinicalkg.graph | CKG Graph Dump | 1 | GraphProduct | mixed | Neo4j database dump of the Clinical K...
cancer-genome-interpreter.clinicalkg.graph | CKG Graph Dump | 1 | GraphProduct | mixed | Neo4j database dump of the Clinical K...
genecards.gene.annotations | GeneCards Gene Annotations | www.genecards.org | Product | http | Integrated gene annotation data aggre...
string.protein.links | STRING Protein Links | protein.links.v12.0.txt.gz (128.7 GB) | GraphProduct | txt | protein network data (full network, s...
string.protein.links.detailed | STRING Protein Links Detailed | protein.links.detailed.v12.0.txt.gz (189.6 GB) | GraphProduct | txt | protein network data (full network, i...
string.protein.links.full | STRING Protein Links Full | protein.links.full.v12.0.txt.gz (199.6 GB) | GraphProduct | txt | protein network data (full network, i...
string.protein.physical.links | STRING Protein Physical Links | protein.physical.links.v12.0.txt.gz (11.1 GB) | GraphProduct | txt | protein network data (physical subnet...
string.protein.physical.links.detailed | STRING Protein Physical Links Detailed | protein.physical.links.detailed.v12.0.txt.gz (13.8 GB) | GraphProduct | txt | protein network data (physical subnet...
string.protein.physical.links.full | STRING Protein Physical Links Full | protein.physical.links.full.v12.0.txt.gz (14.5 GB) | GraphProduct | txt | protein network data (physical subnet...
string.cog.links | STRING COG Links | COG.links.v12.0.txt.gz (176.8 MB) | GraphProduct | txt | association scores between orthologou...
string.cog.links.detailed | STRING COG Links Detailed | COG.links.detailed.v12.0.txt.gz (238.7 MB) | GraphProduct | txt | association scores (incl. subscores p...
string.database | STRING Database Network Schema | network_schema.v12.0.sql.gz (262.2 GB) | GraphProduct | | full database, part II: the networks ...
ckg.graph | CKG Graph Database Dump | 1 | GraphProduct | neo4j | Graph database dump and additional re...
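
The mapping products listed above (for example ncbigene.gene2refseq) are gzip-compressed TSV files of a gigabyte or more, so reading them row by row is usually safer than loading them whole. The following is a minimal sketch, assuming the file starts with a single header line; the function name iter_tsv_gz is hypothetical, and column names are taken from whatever header the file provides rather than assumed here.

```python
import csv
import gzip


def iter_tsv_gz(path):
    """Yield one dict per data row of a gzip-compressed TSV file.

    Assumption: the first line of the file is a header naming the columns.
    Rows are streamed, so memory use stays flat even for multi-GB files.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary data, as is usual for TSV.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row
```

A caller would typically loop over iter_tsv_gz("gene2refseq.gz") and filter rows on the columns of interest, keeping only the matches in memory.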

Details

RefSeq: NCBI Reference Sequence Database

The NCBI Reference Sequence Database (RefSeq) provides a comprehensive, integrated, non-redundant, well-annotated set of reference sequences including genomic, transcript, and protein sequences. RefSeq standards serve as a foundation for functional annotation of genomes and provide stable reference points for mutation analysis, gene expression studies, and polymorphism discovery.

Overview

RefSeq provides reference sequence standards for naturally occurring molecules of the central dogma, from chromosomes to mRNAs to proteins. The database includes:

  • Genomic sequences: Complete chromosomes and genomic regions
  • Transcript sequences: mRNAs and non-coding RNAs
  • Protein sequences: Reference protein sequences
  • Cross-references: Mappings between different sequence databases

Scope and Content

Current Statistics (Release 231)

  • Proteins: 418,412,148 sequences
  • Transcripts: 72,500,061 sequences
  • Organisms: 167,222 represented
  • Releases: Comprehensive releases are published periodically

Molecule Types and Accession Prefixes

  • Proteins: NP_, XP_, AP_, YP_, WP_
  • RNA: NM_, NR_, XM_, XR_
  • Genomic: NC_, AC_, NG_, NT_, NW_, NZ_
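
The prefix list above can be turned into a small lookup that reports the molecule type of an accession. This is an illustrative sketch only: the function name classify_accession is hypothetical, and the accession shape it checks (two uppercase letters, an underscore, an alphanumeric body, and an optional ".version" suffix) is an assumption rather than a full validation of the RefSeq accession format.

```python
import re

# Prefix-to-molecule-type table, copied from the list above.
REFSEQ_PREFIXES = {
    "protein": ("NP_", "XP_", "AP_", "YP_", "WP_"),
    "rna": ("NM_", "NR_", "XM_", "XR_"),
    "genomic": ("NC_", "AC_", "NG_", "NT_", "NW_", "NZ_"),
}

# Assumed shape: 2 letters + "_" + alphanumeric body + optional ".version".
_ACCESSION_RE = re.compile(r"^([A-Z]{2}_)[A-Z0-9]+(?:\.\d+)?$")


def classify_accession(accession):
    """Return "protein", "rna", or "genomic", or None if the prefix is not listed."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        return None
    prefix = match.group(1)
    for molecule_type, prefixes in REFSEQ_PREFIXES.items():
        if prefix in prefixes:
            return molecule_type
    return None


if __name__ == "__main__":
    for acc in ("NM_000546.6", "NP_000537.3", "NC_000001.11", "ZZ_12345"):
        print(acc, classify_accession(acc))
```

Returning None for unlisted prefixes, rather than raising, keeps the helper usable as a filter over mixed identifier columns.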

Key Features

Data Organization

  • Taxonomic coverage: Archaea, bacteria, fungi, invertebrates, plants, vertebrates, viruses
  • Sequence formats: FASTA, GenBank flatfile, ASN.1
  • Update frequency: Daily updates and periodic releases
  • Quality control: Curated and computationally annotated sequences
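
Of the sequence formats listed above, FASTA is the simplest to read without a dedicated library. The sketch below is a minimal parser, assuming plain-text input in which each record starts with a ">" header line followed by one or more sequence lines; the function name read_fasta is hypothetical, and the GenBank flatfile and ASN.1 formats need proper parsers that are not shown here.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open text handle in FASTA format."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # Made-up records for illustration; not real RefSeq entries.
    sample = io.StringIO(">seq1 example\nACGT\nAC\n>seq2\nTT\n")
    for name, seq in read_fasta(sample):
        print(name, len(seq))
```

Because the function is a generator, it can be pointed at a large release file (opened with gzip.open in text mode if compressed) without holding all sequences in memory.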

Special Collections

  • RefSeqGene: Reference standards for well-characterized genes
  • Functional Elements: Experimentally validated non-genic functional elements
  • MANE: Matched Annotation from NCBI and EMBL-EBI
  • Targeted Loci: rRNA sequences from type material

Access Methods

FTP Access

Web Interfaces

  • CCDS: Consensus CDS project for human and mouse
  • RefSeq Select: High-confidence transcript sets
  • NCBI Virus: Viral sequence collection
  • Prokaryotic Genome Annotation Pipeline: Automated annotation system

Citation

When using RefSeq data, please cite:

  • O’Leary NA, et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 2016;44(D1):D733-45.

Contact Information

For questions, updates, or collaborations:

  • Help Desk: info@ncbi.nlm.nih.gov
  • RefSeq Updates: Subscribe to the refseq-announce mailing list
  • GitHub: https://github.com/ncbi


Created: July 17, 2025 | Last modified: August 07, 2025