semmeddb

SemMedDB is a Data Source.

SemMedDB is a repository of semantic predications (subject-predicate-object triples) extracted from biomedical literature by SemRep, a natural language processing system. It contains over 130 million semantic predications extracted from more than 37 million PubMed citations, supporting biomedical knowledge discovery, literature-based discovery, and clinical applications.

Domains

biomedical, literature, health, clinical, drug discovery, genomics, pharmacology

License

UMLS License

Homepage

semmeddb

Repository

Unknown

Infores ID

infores:semmeddb

FAIRsharing ID

Unknown

Product Summary

Products

From this Resource
ID | Name | URL | Category | Format | Description
semmeddb.semrep.tool | SemRep NLP System | SemRep_download.html | ProcessProduct | - | The SemRep natural language processin...
From other Resources
ID | Name | URL | Category | Format | Description
rtx-kg2.graph.nodes | RTX-KG2.10.1c KGX JSONL Nodes | kg2c-2.10.1-v1.0-nodes.jsonl.gz (359.1 MB) | GraphProduct | kgx-jsonl | Nodes for KGX distribution of the RTX...
rtx-kg2.graph.edges | RTX-KG2.10.1c KGX JSONL Edges | kg2c-2.10.1-v1.0-edges.jsonl.gz (1.7 GB) | GraphProduct | kgx-jsonl | Edges for KGX distribution of the RTX...
rtx-kg2.neo4j | RTX-KG2 Neo4j | arax.ncats.io | ProgrammingInterface | - | Neo4j distribution of the RTX-KG2 as ...
epigraphdb.graph | EpiGraphDB Graph Database | graph-database | GraphProduct | neo4j | Integrated graph knowledge base combi...
translator.semmeddb.graph | Translator SemMedDB KGX Graph | latest | GraphProduct | kgx-jsonl | KGX JSONL graph package for SemMedDB ...
translator.translator_kg.graph | Translator Aggregate KGX Graph | latest | GraphProduct | kgx-jsonl | Aggregated KGX JSONL graph package co...
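
Several of the products listed here are distributed as gzip-compressed KGX JSONL, with one JSON object per line. As a minimal sketch, the snippet below streams such an edges file and keeps only edges attributed to infores:semmeddb. It assumes each edge carries a `primary_knowledge_source` field, as the Biolink Model describes; the function name `iter_edges` and the sample edges are hypothetical and are not taken from any listed file.

```python
import gzip
import io
import json


def iter_edges(fileobj, source="infores:semmeddb"):
    """Yield edges from a gzip-compressed KGX JSONL stream.

    Only edges whose primary_knowledge_source equals `source` are kept.
    The field name is an assumption about how the file is annotated.
    """
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            edge = json.loads(line)
            if edge.get("primary_knowledge_source") == source:
                yield edge


# Build a tiny in-memory sample so the sketch runs without a download.
# These edges are illustrative only.
sample = [
    {"subject": "UMLS:C0004057", "predicate": "biolink:treats",
     "object": "UMLS:C0018681",
     "primary_knowledge_source": "infores:semmeddb"},
    {"subject": "X:1", "predicate": "biolink:related_to",
     "object": "X:2",
     "primary_knowledge_source": "infores:other"},
]
buf = io.BytesIO()
with gzip.open(buf, mode="wt", encoding="utf-8") as out:
    for edge in sample:
        out.write(json.dumps(edge) + "\n")
buf.seek(0)

edges = list(iter_edges(buf))
print(len(edges), edges[0]["predicate"])
```

For a real file, an opened binary file handle or a file path could be passed in place of `buf`; reading line by line avoids loading a multi-gigabyte edges file into memory.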

Details

Overview

The Semantic MEDLINE Database (SemMedDB) is a large-scale repository of semantic predications (subject-predicate-object triples) extracted from the biomedical literature by the SemRep natural language processing system. It provides a structured representation of biomedical knowledge contained in PubMed citations, where concepts are normalized to the Unified Medical Language System (UMLS) Metathesaurus, and their relationships are based on the UMLS Semantic Network.
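
To make the shape of a predication concrete, here is a minimal sketch of one as a Python data structure. The class and field names (`Concept`, `Predication`, `cui`, `semtype`, `pmid`) are assumptions chosen for illustration and do not reproduce the SemMedDB schema; the example values, including the placeholder PMID, are illustrative only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Concept:
    """A UMLS Metathesaurus concept used as a predication argument."""
    cui: str      # UMLS Concept Unique Identifier
    name: str     # concept name
    semtype: str  # UMLS semantic type abbreviation


@dataclass(frozen=True)
class Predication:
    """One subject-predicate-object triple tied to its source citation."""
    subject: Concept
    predicate: str  # relation drawn from the UMLS Semantic Network
    object: Concept
    pmid: str       # PubMed ID of the citation the triple came from

    def as_triple(self) -> Tuple[str, str, str]:
        """Return the triple as (subject CUI, predicate, object CUI)."""
        return (self.subject.cui, self.predicate, self.object.cui)


# Illustrative values only; not copied from a SemMedDB release.
aspirin = Concept(cui="C0004057", name="Aspirin", semtype="phsu")
headache = Concept(cui="C0018681", name="Headache", semtype="sosy")
example = Predication(aspirin, "TREATS", headache, pmid="00000000")

print(example.as_triple())
```

Keeping the citation identifier on the predication reflects the point that every triple is traceable to the text it was extracted from.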

SemMedDB version 43 (VER43_R) is the final update to the database (as of May 2024), containing data extracted from MEDLINE BASELINE 2022 with PubMed update files through May 8, 2024. The resource is deprecated and is no longer maintained as of December 31, 2024; an archived version is available through the Internet Archive.

Features

  • Contains over 130 million semantic predications extracted from more than 37 million PubMed citations
  • Uses UMLS Metathesaurus concepts as predication arguments (subjects and objects)
  • Includes a wide range of semantic relationship types defined in the UMLS Semantic Network
  • Covers biomedical relationships across clinical medicine, molecular interactions, disease etiology, pharmacogenomics, and anatomy
  • Database schema includes tables for citations, sentences, entities, predications, and metadata

Database Schema

SemMedDB has the following main tables:

  1. CITATIONS: Contains metadata for each PubMed citation including PMID, publication date, and journal information
  2. SENTENCE: Contains information about individual sentences from PubMed citations
  3. ENTITY: Contains entity information with UMLS concept identifiers, names, and semantic types
  4. PREDICATION: Contains semantic predications with subject-predicate-object triples and associated metadata
  5. PREDICATION_AUX: Contains auxiliary information for predications with mention-level details
  6. GENERIC_CONCEPT: Contains generic concepts as indicated by SemRep
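
As a hedged sketch of how these tables relate, the snippet below builds two of them in an in-memory SQLite database and counts the citations that support each triple. SQLite is used here only to keep the example self-contained; it is not a statement about how SemMedDB is distributed. The column names (`PMID`, `PYEAR`, `SUBJECT_CUI`, `PREDICATE`, and so on) are assumptions, and the rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-ins for two of the tables; columns are assumptions.
conn.executescript("""
CREATE TABLE CITATIONS (
    PMID  TEXT PRIMARY KEY,
    PYEAR INTEGER
);
CREATE TABLE PREDICATION (
    PREDICATION_ID INTEGER PRIMARY KEY,
    PMID           TEXT,
    SUBJECT_CUI    TEXT,
    SUBJECT_NAME   TEXT,
    PREDICATE      TEXT,
    OBJECT_CUI     TEXT,
    OBJECT_NAME    TEXT
);
""")

# Illustrative rows only; not copied from a SemMedDB release.
conn.executemany(
    "INSERT INTO CITATIONS VALUES (?, ?)",
    [("111", 2001), ("222", 2015)],
)
conn.executemany(
    "INSERT INTO PREDICATION VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "111", "C0004057", "Aspirin", "TREATS", "C0018681", "Headache"),
        (2, "222", "C0004057", "Aspirin", "TREATS", "C0018681", "Headache"),
        (3, "222", "C0004057", "Aspirin", "CAUSES", "C0017181",
         "Gastrointestinal Hemorrhage"),
    ],
)

# Count distinct supporting citations per triple and find the earliest year.
QUERY = """
SELECT p.SUBJECT_NAME, p.PREDICATE, p.OBJECT_NAME,
       COUNT(DISTINCT p.PMID) AS n_citations,
       MIN(c.PYEAR)           AS first_year
FROM PREDICATION AS p
JOIN CITATIONS   AS c ON c.PMID = p.PMID
GROUP BY p.SUBJECT_CUI, p.PREDICATE, p.OBJECT_CUI
ORDER BY n_citations DESC, p.PREDICATE
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Grouping by the subject CUI, predicate, and object CUI treats repeated extractions of the same triple as one assertion with several supporting citations, which is a common first step before filtering out weakly supported predications.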

Applications

SemMedDB has been used for numerous biomedical knowledge discovery applications including:

  • Clinical decision making and medical diagnosis
  • Drug repurposing
  • Literature-based discovery and hypothesis generation
  • Adverse drug reaction identification
  • Drug-drug interaction discovery
  • Gene regulatory network inference
  • Biomedical question answering
  • Semantic relatedness assessment

Availability

SemMedDB is available for download from the National Library of Medicine. A UMLS Terminology Services (UTS) account is required to access the downloads. SemMedDB version 43 (VER43_R) is the final update to the database (as of May 2024).

SemRep System

SemRep is the underlying natural language processing system that extracts semantic predications for SemMedDB. It combines syntactic and semantic principles with structured biomedical domain knowledge contained in the Unified Medical Language System (UMLS) to extract semantic relations from biomedical text. SemRep has been developed at the U.S. National Library of Medicine.

Note

These tools will no longer be maintained as of December 31, 2024. An archived copy of the webpage can be found at the Internet Archive. The Indexing Initiative GitHub repository is under development. Contact NLM Customer Service if you have questions.


Created: May 30, 2025 | Last modified: January 23, 2026