greengenes is a Data Source.

It is part of the BER collection.

Greengenes2 is a comprehensive reference database and phylogenetic tree for 16S rRNA gene sequences that unifies microbial data from multiple sources into a single coherent framework. It provides standardized taxonomic assignments and phylogenetic placement for microbiome research and enables consistent analysis across different studies and sequencing platforms.

Domains

biomedical, microbiology, organisms

License

Warning: No license entered

Homepage

greengenes

Repository

GitHub

Infores ID

Unknown

FAIRsharing ID

Unknown

Product Summary

Contacts

Daniel McDonald

University of California San Diego

URL: https://greengenes2.ucsd.edu/

Publications

Greengenes2 unifies microbial data in a single reference tree (Preferred)

Products

From this Resource

ID	Name	URL	Category	Format	Description
greengenes.portal	Greengenes2 Web Portal	greengenes2.ucsd.edu	GraphicalInterface	http	Web-based search interface for queryi...
greengenes.ftp	Greengenes2 FTP Archive	current	Product	http	FTP archive containing Greengenes2 da...
greengenes.phylogeny	Greengenes2 Phylogenetic Tree	current	Product	txt	Reference phylogenetic tree in Newick...
greengenes.sequences	Greengenes2 Sequences	current	Product	fasta	16S rRNA gene sequences in FASTA form...
greengenes.taxonomy	Greengenes2 Taxonomy	current	Product	tsv	Taxonomic assignments and metadata fo...
greengenes.qiime2-plugin	q2-greengenes2 Plugin	q2-greengenes2	ProcessProduct	python	QIIME 2 plugin for integrating Greeng...

From other Resources

ID	Name	URL	Category	Format	Relation	Description
rnacentral.portal	RNAcentral Portal	rnacentral.org	GraphicalInterface	http	had primary source	Web portal for searching and browsing...
rnacentral.api	RNAcentral REST API	api	ProgrammingInterface	http	had primary source	REST API for programmatic access to R...
rnacentral.ftp	RNAcentral FTP Archive	RNAcentral	Product	http	had primary source	FTP archive with current and archived...
rnacentral.public-db	RNAcentral Public Postgres Database	public-database	DataModelProduct	postgres	had primary source	Public PostgreSQL database for direct...

Relevant Taxa

Bacteria (NCBITaxon:2)

Details

Greengenes2

Greengenes2 is a comprehensive reference database and phylogenetic framework for 16S rRNA gene sequences. It addresses the fragmentation of microbial reference data across separate databases: by unifying sequences from multiple databases into a single coherent phylogenetic tree, it enables consistent taxonomic assignments and comparative analyses across microbiome studies and sequencing platforms.

Key Features

Unified Phylogenetic Framework

Single reference tree containing over 33 million 16S rRNA sequences
Consistent phylogenetic placement for all sequences in the database
Integration of data from multiple sources including SILVA, RDP, and NCBI
Standardized taxonomy based on phylogenetic relationships

Comprehensive Sequence Coverage

Amplicon sequence variants (ASVs) from diverse environmental samples
Full-length and partial 16S rRNA gene sequences
Sequences from both cultured and uncultured microorganisms
Regular updates incorporating new sequence data from public repositories

Advanced Search and Query Capabilities

Search by species name, genus, or higher taxonomic levels
Query by ASV sequence or MD5 hash identifier
Clade-based searches for phylogenetic groups
Integration with QIIME 2 for streamlined analysis workflows

Data Structure

Phylogenetic Tree

Newick format tree files with branch lengths and support values
Multiple tree representations (full, backbone, and region-specific)
Bootstrap support values for assessing phylogenetic confidence
Time-calibrated molecular clock estimates where applicable
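
To make the Newick layout above concrete, the following is a minimal sketch of a parser for a tiny, made-up tree with branch lengths and one internal support value. The tree string, the tip names, and the helper names (parse_newick, tip_names, total_length) are hypothetical and are not taken from Greengenes2. The sketch does not handle quoted labels or comments, and a tree with millions of tips would normally be read with a dedicated phylogenetics library instead.

```python
import re

PUNCT = set("(),;:")

def parse_newick(text):
    """Parse a simple Newick string into nested dicts (name, length, children)."""
    # Split into punctuation tokens and label/number tokens.
    tokens = re.findall(r"[(),;:]|[^(),;:\s]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        node = {"name": None, "length": None, "children": []}
        if tokens[pos] == "(":
            pos += 1  # consume "("
            while True:
                node["children"].append(parse_node())
                if tokens[pos] == ",":
                    pos += 1
                elif tokens[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected token {tokens[pos]!r}")
        # Optional label: a tip name, or a support value on an internal node.
        if tokens[pos] not in PUNCT:
            node["name"] = tokens[pos]
            pos += 1
        # Optional branch length after ":".
        if tokens[pos] == ":":
            node["length"] = float(tokens[pos + 1])
            pos += 2
        return node

    root = parse_node()
    if tokens[pos] != ";":
        raise ValueError("Newick string must end with ';'")
    return root

def tip_names(node):
    """Return tip labels in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in tip_names(child)]

def total_length(node):
    """Sum all branch lengths in the subtree rooted at node."""
    own = node["length"] or 0.0
    return own + sum(total_length(child) for child in node["children"])

# Hypothetical example: the clade (A, B) carries a support label of 0.95.
example = parse_newick("((A:0.1,B:0.2)0.95:0.3,C:0.4);")
print(tip_names(example))
```

In this representation an internal node's label is kept as a string, so a caller that expects support values would convert it with float() after checking that the node has children.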

Taxonomic Assignments

Seven-level taxonomic hierarchy (Domain to Species)
Confidence scores for taxonomic assignments
Consistent naming conventions across all entries
Cross-references to external taxonomic databases
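
As an illustration of how a seven-level lineage might be consumed, the sketch below splits one taxonomy record into named ranks. It assumes a two-column, tab-separated layout (identifier, then lineage) and a semicolon-separated lineage with rank prefixes such as d__ and p__, with the top rank labeled "domain". These are assumptions for the example, as are the identifier and lineage shown, so check the actual file before relying on them.

```python
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]
PREFIXES = ["d__", "p__", "c__", "o__", "f__", "g__", "s__"]  # assumed prefixes

def parse_lineage(lineage):
    """Map a prefixed lineage string to {rank: name or None}."""
    result = dict.fromkeys(RANKS)  # ranks missing from the string stay None
    parts = [part.strip() for part in lineage.split(";")]
    for rank, prefix, part in zip(RANKS, PREFIXES, parts):
        if not part.startswith(prefix):
            raise ValueError(f"expected prefix {prefix!r} in {part!r}")
        # An empty name after the prefix means the rank is unassigned.
        result[rank] = part[len(prefix):] or None
    return result

def parse_taxonomy_line(line):
    """Split one 'identifier<TAB>lineage' record (assumed layout)."""
    identifier, lineage = line.rstrip("\n").split("\t", 1)
    return identifier, parse_lineage(lineage)

# Hypothetical record with an unassigned species.
record = (
    "seq001\td__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
    "o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia; s__"
)
identifier, ranks = parse_taxonomy_line(record)
print(identifier, ranks["genus"], ranks["species"])
```

Returning None for unassigned ranks keeps "no assignment" distinct from an empty string when records are later grouped by rank.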

Sequence Data

FASTA format files for all 16S rRNA sequences
Quality-filtered sequences with length and ambiguity criteria
Metadata including source database, collection information, and quality metrics
MD5 hash identifiers for unique sequence identification
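
The pairing of FASTA records with MD5 identifiers can be sketched as follows, using only the Python standard library. The sequences and record names are invented, and hashing the upper-cased sequence text is an assumption made for this example; whether the database normalizes case or trims sequences before hashing is not stated here.

```python
import hashlib

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)

def md5_id(sequence):
    """Return the hex MD5 digest of the upper-cased sequence (assumed convention)."""
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()

# Hypothetical two-record FASTA; the second record wraps over two lines.
fasta_text = """>read_1
TACGGAGGGTGCAAGC
>read_2
TACGTAGG
GTGCGAGC
"""
for name, seq in read_fasta(fasta_text.splitlines()):
    print(name, len(seq), md5_id(seq))
```

Because the digest depends only on the sequence text, two studies that observe the same ASV would derive the same identifier independently, which is what makes hash-based lookup practical.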

Metadata Integration

Environmental context information where available
Host organism data for host-associated microbes
Geographic origin and sampling metadata
Publication and study references for sequence sources

Applications

Microbiome Analysis

Taxonomic profiling of amplicon sequencing data
Phylogenetic diversity calculations and community comparisons
Identification of novel or rare microbial taxa
Cross-study meta-analyses with consistent taxonomic framework

Comparative Genomics

Phylogenetic placement of newly sequenced organisms
Evolutionary analysis of microbial communities
Assessment of phylogenetic signal in functional traits
Integration with whole-genome phylogenies

Environmental Microbiology

Biodiversity assessments across different ecosystems
Tracking of microbial populations over time and space
Identification of indicator species for environmental conditions
Assessment of microbial community assembly processes

Clinical and Applied Microbiology

Pathogen identification and phylogenetic typing
Microbiome-based diagnostic applications
Probiotic strain identification and quality control
Food safety and industrial microbiology applications

Data Access and Integration

Web Interface

User-friendly search portal with autocomplete functionality
Interactive phylogenetic tree visualization
Sequence alignment and comparison tools
Export capabilities for analysis results

Programmatic Access

Direct FTP download of database files
QIIME 2 plugin for seamless workflow integration
Command-line tools for batch processing
API endpoints for automated data retrieval

File Formats and Standards

Standard bioinformatics formats (FASTA, Newick, TSV)
QIIME 2 artifact formats (.qza) for reproducible analysis
Compressed archives for efficient data transfer
Detailed documentation and metadata schemas
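
Since downloads may arrive either compressed or uncompressed, a small helper that reads both can simplify scripts. The sketch below detects gzip by its two magic bytes and then iterates over tab-separated rows. The function names and the file name in the usage comment are hypothetical, and the sketch assumes UTF-8 text.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")

def iter_tsv_rows(path):
    """Yield each non-empty line of a TSV file as a list of fields."""
    with open_text(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                yield line.split("\t")

# Usage (hypothetical file name):
# for row in iter_tsv_rows("taxonomy.tsv.gz"):
#     print(row[0])
```

Checking the magic bytes rather than the file extension means the helper still works when a download tool has renamed or already decompressed the file.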

Quality Assurance

Rigorous sequence quality filtering and validation
Phylogenetic consistency checking across updates
Cross-validation with established taxonomic databases
Community feedback mechanisms for error reporting

Technical Implementation

Greengenes2 is built using scalable phylogenetic inference methods that can accommodate the massive scale of modern sequence databases. The unified tree construction process involves careful alignment, phylogenetic placement algorithms, and iterative refinement to ensure consistency and accuracy across the entire phylogenetic framework.

Created: September 09, 2025 | Last modified: September 24, 2025