medgen

is an Aggregator.

NCBI's portal to information about conditions, phenotypes, and findings in humans related to medical genetics. Aggregates and organizes data from multiple authoritative sources including UMLS, OMIM, HPO, Mondo, Orphanet, GeneReviews, PharmGKB, and community submissions to GTR and ClinVar. Each concept is assigned a distinct Concept Unique Identifier (CUI) and integrated with related information from clinical resources, genetic testing registries, medical literature, and molecular resources.

Domains

genomics, clinical, health, biomedical

Homepage

medgen

Repository

Unknown

Infores ID

infores:medgen

FAIRsharing ID

Unknown

Product Summary

Products

From this Resource
ID | Name | URL | Category | Format | Description
medgen.portal | MedGen Portal | medgen | GraphicalInterface | http | Main web portal for searching and bro...
medgen.search | Advanced Search | advanced | GraphicalInterface | http | Advanced search interface with suppor...
medgen.mgconso | MGCONSO (Concept Names) | MGCONSO.RRF.gz (15.1 MB) | Product | txt | Rich Release Format (RRF) file contai...
medgen.mgdef | MGDEF (Definitions) | MGDEF.RRF.gz (4.8 MB) | Product | txt | Rich Release Format (RRF) file contai...
medgen.mgrel | MGREL (Relationships) | MGREL.RRF.gz (14.9 MB) | MappingProduct | txt | Rich Release Format (RRF) file contai...
medgen.mgsat | MGSAT (Attributes) | MGSAT.RRF.gz (11.2 MB) | Product | txt | Rich Release Format (RRF) file contai...
medgen.mgsty | MGSTY (Semantic Types) | MGSTY.RRF.gz (1.6 MB) | Product | txt | Rich Release Format (RRF) file contai...
medgen.names | NAMES | NAMES.RRF.gz (3.0 MB) | Product | txt | Rich Release Format (RRF) file contai...
medgen.id-mappings | MedGen ID Mappings | MedGenIDMappings.txt.gz (5.7 MB) | MappingProduct | txt | Mappings between MedGen CUIs and exte...
medgen.hpo-mapping | MedGen HPO Mapping | MedGen_HPO_Mapping.txt.gz (380.5 KB) | MappingProduct | txt | Mappings between MedGen and Human Phe...
medgen.hpo-omim-mapping | MedGen HPO OMIM Mapping | MedGen_HPO_OMIM_Mapping.txt.gz (3.9 MB) | MappingProduct | txt | Combined mappings between MedGen, HPO...
medgen.pubmed-links | MedGen PubMed Links | medgen_pubmed_lnk.txt.gz (228.8 MB) | Product | txt | Links between MedGen concepts and Pub...
medgen.cui-history | MedGen CUI History | MedGen_CUI_history.txt (86.6 KB) | Product | txt | History file tracking changes to MedG...
medgen.uid-cui-history | MedGen UID CUI History | MedGen_UID_CUI_history.txt (56.5 MB) | Product | txt | History of mappings between MedGen UI...
medgen.hpo-history | HPO CUI History | HPO_CUI_history.txt (1.2 MB) | Product | txt | History file tracking changes to HPO ...
medgen.mondo-history | Mondo CUI History | MONDO_CUI_history.txt (989.1 KB) | Product | txt | History file tracking changes to Mond...
medgen.ordo-history | ORDO CUI History | ORDO_CUI_history.txt (1.1 MB) | Product | txt | History file tracking changes to Orph...
medgen.sources | MedGen Sources | MedGen_Sources.txt (10.1 KB) | Product | txt | Information about source databases an...
medgen.merged | MERGED (Merged CUIs) | MERGED.RRF.gz (46.5 KB) | Product | txt | Merged CUI mappings showing concept c...
medgen.csv | CSV Data Files | csv | Product | csv | CSV format data files directory with ...
medgen.help | MedGen Documentation | overview | DocumentationProduct | http | Documentation, help pages, and user g...
medgen.faq | FAQ | faq | DocumentationProduct | http | Frequently asked questions about MedGen
medgen.readme | README | README.txt (16.9 KB) | DocumentationProduct | txt | README file with detailed information...
medgen.presentations | MedGen Presentations | presentations | DocumentationProduct | http | Directory of MedGen presentations rel...
From other Resources
ID | Name | URL | Category | Format | Description
pheknowlator.graph | PheKnowLator graph | knowledge_graphs?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&inv=1&invt=Ab5_1Q&project=pheknowlator | GraphProduct | owl | PheKnowLator graph files, including s...
ncbigene.mim2gene_medgen | MIM to Gene MedGen Mapping | mim2gene_medgen (932.6 KB) | MappingProduct | tsv | MIM to Gene and MedGen mapping data c...
unibiomap.links | UniBioMap Graph Links | unibiomap.links.csv (1.3 GB) | GraphProduct | csv | Core UniBioMap graph edges file.
unibiomap.auxs | UniBioMap Graph Auxiliaries | unibiomap.auxs.tsv (563.9 MB) | GraphProduct | tsv | Auxiliary UniBioMap graph annotations...
unibiomap.pred | UniBioMap Predicted Graph | unibiomap.pred.csv (2.3 GB) | GraphProduct | csv | Predicted UniBioMap graph edges with ...
unibiomap.pred.full | UniBioMap Predicted Graph (Full) | unibiomap.pred.full.csv (5.9 GB) | GraphProduct | csv | Full unfiltered UniBioMap predicted g...

Details

MedGen

MedGen is NCBI’s portal to information about conditions, phenotypes, and findings in humans related to medical genetics. It serves as a comprehensive aggregator that organizes and integrates information from multiple authoritative sources, providing a unified view of genetic conditions and their relationships.

Overview

MedGen includes diseases such as Mendelian disorders, multi-factorial disorders, chronic disease susceptibilities, somatic phenotypes, and pharmacogenetic responses. The database also includes infectious disease terms to support submitters of the NIH Genetic Testing Registry (GTR) and clinicians looking for tests and terms related to infectious agents in human samples.

Core Concept

Terms from various sources including GTR and ClinVar submissions, Unified Medical Language System (UMLS), Online Mendelian Inheritance in Man (OMIM), Human Phenotype Ontology (HPO), Mondo Disease Ontology, rare disease terms from Orphanet Rare Disease Ontology (ORDO), and other sources are integrated into unique concepts. Each concept is assigned a distinct identifier called the Concept Unique Identifier (CUI) and a preferred name.

Content Integration

The core content of a concept record includes:

  • Names and Identifiers: Names used by different databases with links to external resources (ontologies such as HPO, ORDO, and Mondo)
  • Modes of Inheritance: Information about genetic inheritance patterns
  • Clinical Features: Associated phenotypes and clinical characteristics
  • Genomic Location: Map location of the loci affecting the phenotype
  • External Resources: Links to resources for clinicians (OMIM), consumers (MedlinePlus), genetic tests (GTR), medical literature (GeneReviews, PubMed), and molecular resources (ClinVar, RefSeqGene)

Data Sources

MedGen aggregates data from multiple authoritative sources with varying update frequencies:

Source | Update Frequency | Primary Data Types | Coverage
UMLS | 2x/year | CUI, descriptions, name and ID | 96%
HPO | Monthly | Name and ID, Phenotype:Disease relationships | 8%
Mondo | Monthly | Name, ID; Orphanet and GARD | 10%
Orphanet (ORDO) | Monthly | Name, ID, Mode of Inheritance | 4%
OMIM | Daily | Name, ID, Gene:Disease, description | 5%
GeneReviews | Weekly | Name, Gene:Disease, description | 0.4%
PharmGKB | Monthly | Drug name, ID, Drug:Gene relationships | 0.5%
MedGen (internal) | Daily | Name, CN CUI, Drug:Disease, Disease:Disease | 2%
NCBI Gene | Daily | Gene symbol, chromosome location | N/A

Note: MedGen terms can have one or more external identifiers/sources mapped. Percentages are averaged from data analyzed in 2024.

Concept Unique Identifiers

MedGen primarily uses CUIs from the UMLS dataset. When no UMLS CUI matches a record in MedGen, an NCBI-generated CUI is assigned instead. These begin with “CN” to distinguish them from UMLS-provided CUIs, which begin with “C” followed by digits. CN-type CUIs may be created from submissions to the NIH Genetic Testing Registry or ClinVar that do not match a record in UMLS.
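The prefix rule can be checked mechanically. The following is a minimal sketch, assuming UMLS CUIs are "C" followed by seven digits and NCBI-generated CUIs are "CN" followed by digits; the function name `cui_origin` and the sample identifiers are hypothetical and chosen only for illustration.

```python
import re

# Assumption: UMLS CUIs look like "C" + seven digits.
UMLS_CUI = re.compile(r"C\d{7}")
# Assumption: NCBI-generated MedGen CUIs look like "CN" + digits.
NCBI_CUI = re.compile(r"CN\d+")


def cui_origin(cui: str) -> str:
    """Return "ncbi", "umls", or "unknown" based on the CUI prefix."""
    cui = cui.strip()
    # Test the more specific "CN" prefix first, since it also starts with "C".
    if NCBI_CUI.fullmatch(cui):
        return "ncbi"
    if UMLS_CUI.fullmatch(cui):
        return "umls"
    return "unknown"


# Made-up identifiers, not real MedGen records.
print(cui_origin("C0000001"))
print(cui_origin("CN000001"))
```

Checking the "CN" pattern before the "C" pattern matters because both kinds of identifier share the leading "C".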

Pharmacogenomics

MedGen provides standardized terminology for pharmacogenomics, describing the interaction between an individual’s genetic code and medication response. Using data from PharmGKB, MedGen creates disease records describing abnormal responses to drugs driven by genetic or environmental factors. These terms are created and maintained by MedGen with links to expert clinical recommendations, FDA-approved drug labels, and PharmGKB pages.

Data Processing and Curation

MedGen employs both automated and manual curation processes:

Automated Alignment

When new versions of source data become available, automated pipelines download the data and process local copies. The relevant subset is extracted, existing terms are updated as needed, and new terms are added. Terms are mapped by identifier, by concept preferred name, or by both, depending on the data sources involved.
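One way to picture the mapping step is the sketch below. It is not MedGen's pipeline: the `Concept` class, the `match_source_term` function, the case-insensitive name comparison, and all identifiers are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set, Tuple


@dataclass
class Concept:
    """Hypothetical in-memory view of an existing concept record."""
    cui: str
    preferred_name: str
    xrefs: Set[str] = field(default_factory=set)  # external source identifiers


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace (an assumed normalization)."""
    return " ".join(name.casefold().split())


def match_source_term(
    source_id: str, source_name: str, concepts: Iterable[Concept]
) -> Tuple[Optional[Concept], str]:
    """Match a source term by identifier, preferred name, or both."""
    wanted = normalize(source_name)
    best: Tuple[Optional[Concept], str] = (None, "new")
    for concept in concepts:
        id_hit = source_id in concept.xrefs
        name_hit = normalize(concept.preferred_name) == wanted
        if id_hit and name_hit:
            return concept, "both"  # strongest evidence, stop searching
        if id_hit and best[1] != "identifier":
            best = (concept, "identifier")  # identifier outranks name-only
        elif name_hit and best[1] == "new":
            best = (concept, "name")
    return best  # (None, "new") means a new term would be added


# Made-up records for demonstration only.
concepts = [
    Concept("C0000001", "Example syndrome", {"OMIM:000001"}),
    Concept("CN000002", "Another condition", {"HP:0000002"}),
]
print(match_source_term("OMIM:000001", "example  SYNDROME", concepts)[1])
```

Ranking an identifier match above a name-only match is a design choice of this sketch; the source text does not state which signal takes precedence.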

Manual Curation

The manual curation process follows a standard decision tree to evaluate source terms and potential matches in MedGen. When source data is unclear, curators contact the source to clarify term scope and meaning. MedGen terms are curated to align with sources by splitting or merging terms or creating novel MedGen records as needed.
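When terms are merged, downstream users holding an older CUI need to find the surviving one. The sketch below assumes a merge table has already been loaded into a dictionary that maps each retired CUI to its replacement; the function name `resolve_cui`, that two-column layout, and the identifiers are hypothetical.

```python
from typing import Dict


def resolve_cui(cui: str, merged: Dict[str, str]) -> str:
    """Follow retired-to-replacement links until a current CUI is reached."""
    seen = set()
    while cui in merged:
        if cui in seen:
            # Guard against a malformed table that loops back on itself.
            raise ValueError(f"cycle in merge table at {cui}")
        seen.add(cui)
        cui = merged[cui]
    return cui


# Made-up merge chain: CN000001 -> C0000002 -> C0000003.
merged = {"CN000001": "C0000002", "C0000002": "C0000003"}
print(resolve_cui("CN000001", merged))
```

A CUI that does not appear as a key in the table is returned unchanged, on the assumption that it is still current.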

MedGen integrates with several NCBI resources:

  • ClinVar: Database of genomic variation and its relationship to human health
  • Gene: NCBI’s gene-centered database
  • Genetic Testing Registry (GTR): Registry of genetic tests
  • GeneReviews: Expert-authored disease descriptions
  • RefSeqGene: Reference standard sequences for human genes
  • Medical Genetics Summaries: Information about genetic conditions

FTP Data Files

MedGen provides extensive downloadable data through its FTP site, including:

  • Rich Release Format (RRF) files: Core database files with concept names, definitions, relationships, attributes, and semantic types
  • Mapping files: ID mappings to external sources, HPO, OMIM, and other databases
  • History files: Tracking changes to CUIs and external term mappings
  • Literature links: Connections between concepts and PubMed articles
  • CSV exports: Alternative format data files for easier processing

Most of the larger data files are gzip-compressed (.gz) to reduce download size; because gzip is lossless, decompression restores the original content exactly.
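A compressed RRF file can be read without unpacking it to disk first. The sketch below assumes the file is pipe-delimited, that each line ends with a trailing pipe, and that the first line is a header prefixed with "#"; the function name `read_rrf`, the column names, and the sample rows are illustrative, so check the README for the actual layout of each file.

```python
import csv
import gzip
import io
from typing import BinaryIO, Dict, Iterator


def read_rrf(stream: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row from a gzip-compressed, pipe-delimited file."""
    with gzip.open(stream, mode="rt", encoding="utf-8", newline="") as text:
        # QUOTE_NONE treats quote characters in names as ordinary text.
        reader = csv.reader(text, delimiter="|", quoting=csv.QUOTE_NONE)
        header = None
        for row in reader:
            if row and row[-1] == "":
                row = row[:-1]  # drop the empty field after the trailing pipe
            if not row:
                continue  # skip blank lines
            if header is None:
                # Assumption: first line is a header such as "#CUI|STR|SAB|".
                header = [col.lstrip("#") for col in row]
                continue
            yield dict(zip(header, row))


# In-memory stand-in for a downloaded .RRF.gz file (made-up rows).
sample = gzip.compress(
    b"#CUI|STR|SAB|\n"
    b"C0000001|Example syndrome|DEMO|\n"
    b"CN000002|Another condition|DEMO|\n"
)
rows = list(read_rrf(io.BytesIO(sample)))
print(rows[0]["STR"])
```

For a real download, the same function should work with a file opened in binary mode, for example `open("MGCONSO.RRF.gz", "rb")`, provided the layout assumptions above hold.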


Created: July 17, 2025 | Last modified: February 26, 2026