greengenes

greengenes is a Data Source.

It is part of the BER collection.

Greengenes2 is a comprehensive reference database and phylogenetic tree for 16S rRNA gene sequences that unifies microbial data from multiple sources into a single coherent framework. It provides standardized taxonomic assignments and phylogenetic placement for microbiome research and enables consistent analysis across different studies and sequencing platforms.

Domains

biomedical, microbiome, microbiology, organisms

License

Warning: No license entered

Homepage

greengenes

Repository

GitHub

Infores ID

Unknown

FAIRsharing ID

Unknown

Product Summary

Products

From this Resource

ID | Name | URL | Category | Format | Description
greengenes.portal | Greengenes2 Web Portal | greengenes2.ucsd.edu | GraphicalInterface | http | Web-based search interface for queryi...
greengenes.ftp | Greengenes2 FTP Archive | current | Product | http | FTP archive containing Greengenes2 da...
greengenes.phylogeny | Greengenes2 Phylogenetic Tree | current | Product | txt | Reference phylogenetic tree in Newick...
greengenes.sequences | Greengenes2 Sequences | current | Product | fasta | 16S rRNA gene sequences in FASTA form...
greengenes.taxonomy | Greengenes2 Taxonomy | current | Product | tsv | Taxonomic assignments and metadata fo...
greengenes.qiime2-plugin | q2-greengenes2 Plugin | q2-greengenes2 | ProcessProduct | python | QIIME 2 plugin for integrating Greeng...

From other Resources

ID | Name | URL | Category | Format | Description
rnacentral.portal | RNAcentral Portal | rnacentral.org | GraphicalInterface | http | Web portal for searching and browsing...
rnacentral.api | RNAcentral REST API | api | ProgrammingInterface | http | REST API for programmatic access to R...
rnacentral.ftp | RNAcentral FTP Archive | RNAcentral | Product | http | FTP archive with current and archived...
rnacentral.public-db | RNAcentral Public Postgres Database | public-database | DataModelProduct | postgres | Public PostgreSQL database for direct...

Details

Greengenes2

Greengenes2 is a comprehensive reference database and phylogenetic framework for 16S rRNA gene sequences that addresses the fragmentation problem in microbial data analysis. By unifying sequences from multiple databases into a single coherent phylogenetic tree, Greengenes2 enables consistent taxonomic assignments and comparative analyses across different microbiome studies and sequencing platforms.

Key Features

Unified Phylogenetic Framework

  • Single reference tree containing over 33 million 16S rRNA sequences
  • Consistent phylogenetic placement for all sequences in the database
  • Integration of data from multiple sources including SILVA, RDP, and NCBI
  • Standardized taxonomy based on phylogenetic relationships

Comprehensive Sequence Coverage

  • Amplicon sequence variants (ASVs) from diverse environmental samples
  • Full-length and partial 16S rRNA gene sequences
  • Sequences from both cultured and uncultured microorganisms
  • Regular updates incorporating new sequence data from public repositories

Advanced Search and Query Capabilities

  • Search by species name, genus, or higher taxonomic levels
  • Query by ASV sequence or MD5 hash identifier
  • Clade-based searches for phylogenetic groups
  • Integration with QIIME 2 for streamlined analysis workflows
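
Querying by MD5 hash means the identifier can be derived from the ASV sequence itself. The sketch below shows one way to compute such a hash with Python's standard library. The function name `asv_md5` and the normalization step (stripping whitespace and upper-casing before hashing) are assumptions made for illustration; the exact normalization Greengenes2 applies may differ.

```python
import hashlib


def asv_md5(sequence: str) -> str:
    """Return the hex MD5 digest of a nucleotide sequence.

    Assumption: whitespace is removed and the sequence is upper-cased
    before hashing. This is an illustrative choice, not a documented rule.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


# Illustrative sequence fragment, not a real database entry.
example = "TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAG"
print(asv_md5(example))
```

Because the digest depends only on the sequence string, two studies that observe the same ASV would arrive at the same identifier under the same normalization.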

Data Structure

Phylogenetic Tree

  • Newick format tree files with branch lengths and support values
  • Multiple tree representations (full, backbone, and region-specific)
  • Bootstrap support values for assessing phylogenetic confidence
  • Time-calibrated molecular clock estimates where applicable
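
To make the Newick layout concrete, the following is a minimal sketch of a parser for a simplified subset of the format: unquoted labels, optional branch lengths after a colon, and internal-node labels (which is where support values are commonly stored). The `Node` class and the `parse_newick` and `leaf_names` functions are hypothetical helpers written for this example. The parser is recursive, so a tree with millions of tips would likely call for a dedicated phylogenetics library instead.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""                      # tip name or internal label (e.g. support)
    length: Optional[float] = None      # branch length to the parent, if present
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a simplified Newick string (no quoted labels, no comments)."""
    text = text.strip()
    if text.endswith(";"):
        text = text[:-1]
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                node.children.append(parse_node())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected character at {pos}")
        # Read the label that follows a tip or a closing parenthesis.
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        node.name = text[start:pos].strip()
        # Read the optional branch length.
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            node.length = float(text[start:pos])
        return node

    root = parse_node()
    if pos != len(text):
        raise ValueError("trailing characters after tree")
    return root


def leaf_names(node: Node) -> List[str]:
    """Collect tip names in left-to-right order."""
    if not node.children:
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(leaf_names(child))
    return names


tree = parse_newick("((A:0.1,B:0.2)0.95:0.3,C:0.4);")
print(leaf_names(tree))
```

In the toy tree, the internal node that groups `A` and `B` carries the label `0.95`, which is how a support value would appear in this layout.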

Taxonomic Assignments

  • Seven-level taxonomic hierarchy (Domain to Species)
  • Confidence scores for taxonomic assignments
  • Consistent naming conventions across all entries
  • Cross-references to external taxonomic databases
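
A seven-level lineage is often stored as a single delimited string. The sketch below splits such a string into named ranks. It assumes a semicolon-separated layout with rank prefixes such as `d__` or `p__` (and accepts the older `k__` prefix for the top rank); the prefix table, the `parse_lineage` function, and the example lineage are illustrative assumptions, not a documented Greengenes2 schema.

```python
from typing import Dict, Optional

# Assumed rank prefixes; "k" is accepted as an alternative top-level prefix.
RANK_PREFIXES = {
    "d": "domain",
    "k": "kingdom",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_lineage(lineage: str) -> Dict[str, Optional[str]]:
    """Split a prefixed lineage string into a rank -> name mapping.

    Ranks with an empty name (for example "s__") map to None.
    """
    ranks: Dict[str, Optional[str]] = {}
    for part in lineage.split(";"):
        part = part.strip()
        if not part:
            continue
        prefix, sep, name = part.partition("__")
        if not sep or prefix not in RANK_PREFIXES:
            raise ValueError(f"unrecognized rank entry: {part!r}")
        ranks[RANK_PREFIXES[prefix]] = name or None
    return ranks


# Illustrative lineage string, not copied from the database.
lineage = (
    "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
    "o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia; "
    "s__Escherichia coli"
)
parsed = parse_lineage(lineage)
print(parsed["genus"], "|", parsed["species"])
```

Mapping empty ranks to `None` keeps unresolved levels distinguishable from resolved ones when profiles from several studies are merged.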

Sequence Data

  • FASTA format files for all 16S rRNA sequences
  • Quality-filtered sequences with length and ambiguity criteria
  • Metadata including source database, collection information, and quality metrics
  • MD5 hash identifiers for unique sequence identification
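
As a rough illustration of reading FASTA records and applying length and ambiguity criteria, consider the sketch below. The `read_fasta` and `passes_filter` helpers and the default thresholds (`min_length=1200`, `max_ambiguous=0.01`) are hypothetical values chosen for the example; the criteria actually used by Greengenes2 are not stated here.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            fields = line[1:].split()
            header = fields[0] if fields else ""  # keep only the identifier
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data before first header")
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def passes_filter(seq: str, min_length: int = 1200,
                  max_ambiguous: float = 0.01) -> bool:
    """Check a sequence against hypothetical length and ambiguity thresholds."""
    if not seq or len(seq) < min_length:
        return False
    # Anything other than A, C, G, T, or U counts as ambiguous here.
    ambiguous = sum(1 for base in seq.upper() if base not in "ACGTU")
    return ambiguous / len(seq) <= max_ambiguous


demo = [">seq1 example record", "ACGT", "ACGT", ">seq2", "ACNN"]
records = list(read_fasta(demo))
kept = [name for name, seq in records
        if passes_filter(seq, min_length=4, max_ambiguous=0.25)]
print(kept)
```

With a real file, `read_fasta` could be passed an open file handle directly, since it only iterates over lines.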

Metadata Integration

  • Environmental context information where available
  • Host organism data for host-associated microbes
  • Geographic origin and sampling metadata
  • Publication and study references for sequence sources

Applications

Microbiome Analysis

  • Taxonomic profiling of amplicon sequencing data
  • Phylogenetic diversity calculations and community comparisons
  • Identification of novel or rare microbial taxa
  • Cross-study meta-analyses with consistent taxonomic framework

Comparative Genomics

  • Phylogenetic placement of newly sequenced organisms
  • Evolutionary analysis of microbial communities
  • Assessment of phylogenetic signal in functional traits
  • Integration with whole-genome phylogenies

Environmental Microbiology

  • Biodiversity assessments across different ecosystems
  • Tracking of microbial populations over time and space
  • Identification of indicator species for environmental conditions
  • Assessment of microbial community assembly processes

Clinical and Applied Microbiology

  • Pathogen identification and phylogenetic typing
  • Microbiome-based diagnostic applications
  • Probiotic strain identification and quality control
  • Food safety and industrial microbiology applications

Data Access and Integration

Web Interface

  • User-friendly search portal with autocomplete functionality
  • Interactive phylogenetic tree visualization
  • Sequence alignment and comparison tools
  • Export capabilities for analysis results

Programmatic Access

  • Direct FTP download of database files
  • QIIME 2 plugin for seamless workflow integration
  • Command-line tools for batch processing
  • API endpoints for automated data retrieval

File Formats and Standards

  • Standard bioinformatics formats (FASTA, Newick, TSV)
  • QIIME 2 artifact formats (.qza) for reproducible analysis
  • Compressed archives for efficient data transfer
  • Detailed documentation and metadata schemas
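
For the plain-text formats, a gzip-compressed TSV file can be streamed without unpacking it first. The sketch below uses only the Python standard library. The function name `read_tsv_gz` is hypothetical, and it assumes the file has a header row and was compressed with gzip; the actual Greengenes2 archives may be packaged differently.

```python
import csv
import gzip
from typing import Dict, Iterator


def read_tsv_gz(path: str) -> Iterator[Dict[str, str]]:
    """Stream rows of a gzip-compressed, tab-separated file as dictionaries.

    Assumes the first line of the file is a header row.
    """
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")
```

Each yielded row is keyed by the column names found in the header, so a call such as `for row in read_tsv_gz("taxonomy.tsv.gz")` (with a placeholder file name) can read fields by name without loading the whole file into memory.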

Quality Assurance

  • Rigorous sequence quality filtering and validation
  • Phylogenetic consistency checking across updates
  • Cross-validation with established taxonomic databases
  • Community feedback mechanisms for error reporting

Technical Implementation

Greengenes2 is built with phylogenetic inference methods that scale to the size of modern sequence databases. Constructing the unified tree involves sequence alignment, phylogenetic placement, and iterative refinement to keep the framework consistent and accurate.


Created: September 09, 2025 | Last modified: September 24, 2025