OBO Foundry Synchronization

This document describes the OBO Foundry synchronization functionality added to the KG-Registry.

Overview

The OBO Foundry sync feature automatically creates and updates KG-Registry resources for the ontologies listed in the OBO Foundry registry (https://obofoundry.org/). This keeps the KG-Registry current with OBO Foundry, a major registry of biomedical ontologies.

Files Added

Usage

Command Line

# Sync all active OBO Foundry ontologies
uv run python util/sync_obo_foundry.py --verbose

# Test sync with limited number of ontologies
uv run python util/sync_obo_foundry.py --limit 5 --verbose

# Dry run to see what would be synced
uv run python util/sync_obo_foundry.py --dry-run --verbose
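The commands above use three flags: --verbose, --limit, and --dry-run. Below is a minimal argparse sketch of how a script might accept them. It is an illustration under assumptions, not the script's actual parser. The help texts and the parse_args helper are hypothetical. Only the flag names come from the commands above.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(
        description="Sync OBO Foundry ontologies into KG-Registry resources."
    )
    parser.add_argument(
        "--verbose", action="store_true", help="Emit more detailed log output."
    )
    parser.add_argument(
        "--limit",
        type=int,
        default=None,
        help="Process at most this many ontologies (useful for testing).",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Report what would be synced without writing any files.",
    )
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argparse exposes --dry-run as the attribute dry_run.
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--limit", "5", "--verbose"])
    print(args.limit, args.verbose, args.dry_run)  # 5 True False
```

With no flags, limit defaults to None, which in this sketch means "process every ontology".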

Make Targets

# Sync all active OBO Foundry ontologies
make sync-obo-foundry

# Test sync with 5 ontologies
make sync-obo-test

# Dry run to see what would be synced
make sync-obo-dry-run

Integration with Build Process

The OBO Foundry sync is part of the main build process: the all target in the Makefile runs it, so every full build also re-synchronizes the OBO Foundry ontologies.

What Gets Synced

The sync process:

  1. Fetches the OBO Foundry registry from https://obofoundry.org/registry/ontologies.yml
  2. Filters out inactive, orphaned, or obsolete ontologies
  3. Transforms the metadata to match the KG-Registry schema
  4. Creates/updates resource files in the resource/ directory
  5. Adds all synced ontologies to the obo-foundry collection
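Steps 2, 3 and 5 can be sketched as pure functions over an already-parsed copy of ontologies.yml. This is a minimal sketch, not the script's implementation. It assumes the registry's top-level `ontologies` list and the entry fields `id`, `title`, `description`, `homepage`, `activity_status` and `is_obsolete`. The output field names (`name`, `homepage_url`, `collection`) are hypothetical stand-ins for the real KG-Registry schema slots. Fetching (step 1) and writing files (step 4) are left out.

```python
from typing import Any, Dict, List, Optional

# Step 2: activity_status values that are filtered out.
EXCLUDED_STATUSES = {"inactive", "orphaned"}


def is_syncable(entry: Dict[str, Any]) -> bool:
    """Keep an entry unless it is inactive, orphaned, or flagged obsolete."""
    if entry.get("is_obsolete", False):
        return False
    return entry.get("activity_status") not in EXCLUDED_STATUSES


def to_resource(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Step 3: map registry metadata onto hypothetical resource fields."""
    ont_id = entry["id"]
    return {
        "id": ont_id,
        "name": entry.get("title", ont_id),
        "description": entry.get("description", ""),
        "homepage_url": entry.get("homepage"),
        # Step 5: every synced ontology joins the obo-foundry collection.
        "collection": ["obo-foundry"],
    }


def sync_registry(
    registry: Dict[str, Any], limit: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Run steps 2, 3 and 5 over an already-parsed ontologies.yml document."""
    entries = [e for e in registry.get("ontologies", []) if is_syncable(e)]
    if limit is not None:
        entries = entries[:limit]  # mirrors the --limit testing flag
    return [to_resource(e) for e in entries]


if __name__ == "__main__":
    sample = {
        "ontologies": [
            {"id": "go", "title": "Gene Ontology", "activity_status": "active"},
            {"id": "old", "activity_status": "inactive"},
            {"id": "gone", "activity_status": "active", "is_obsolete": True},
        ]
    }
    print([r["id"] for r in sync_registry(sample)])  # ['go']
```

In this sketch the limit is applied after filtering, so --limit 5 yields five syncable ontologies rather than five raw registry entries.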

Features

Domain Mapping

OBO Foundry domains are mapped to KG-Registry domains as follows:

Unknown domains default to biological systems.
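A lookup with a fallback captures the documented default. This is a minimal sketch. The two table entries are hypothetical placeholders, since the real mapping table is not reproduced here. The case and whitespace normalization is also an assumption. Only the "biological systems" default comes from this document.

```python
from typing import Dict, Optional

# Hypothetical entries for illustration only; the real table lives in the
# sync script and is not reproduced in this document.
DOMAIN_MAP: Dict[str, str] = {
    "health": "health",
    "chemistry and biochemistry": "chemistry and biochemistry",
}

# Documented fallback for any domain missing from the table.
DEFAULT_DOMAIN = "biological systems"


def map_domain(obo_domain: Optional[str]) -> str:
    """Return the KG-Registry domain for an OBO Foundry domain string."""
    if not obo_domain:
        return DEFAULT_DOMAIN
    # Normalize case and surrounding whitespace before the lookup.
    return DOMAIN_MAP.get(obo_domain.strip().lower(), DEFAULT_DOMAIN)
```

For example, map_domain("Health") returns "health", while an unlisted or missing domain returns "biological systems".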

Recent Enhancements (September 2024)

The OBO Foundry sync has been enhanced with four improvements that align it more closely with the KG-Registry schema:

1. Enhanced Contact Transformation

2. Structured Product Objects

3. DataModel Category Assignment

4. Inactive Ontology Inclusion

Ontology Statistics

Based on the most recent sync (September 2024):

Schema Changes

The obo-foundry collection was added to CollectionEnum in the schema:

obo-foundry:
  description: >-
    This entity is an ontology from the OBO Foundry,
    a collaborative effort to create reference ontologies
    in the biomedical domain.
  meaning: https://obofoundry.org/

Error Handling

The sync script handles errors per ontology, so one bad entry does not abort the run:

Failed ontologies are logged, but they don't stop the overall sync process.
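This "log and continue" behavior usually comes down to a try/except around each ontology. Below is a minimal sketch of that pattern. The sync_one callable and the logger name are hypothetical. The sketch returns the succeeded and failed IDs so that a caller can report a summary at the end.

```python
import logging
from typing import Any, Callable, Dict, Iterable, List, Tuple

logger = logging.getLogger("obo_sync_example")  # hypothetical logger name


def sync_all(
    entries: Iterable[Dict[str, Any]],
    sync_one: Callable[[Dict[str, Any]], None],
) -> Tuple[List[str], List[str]]:
    """Sync each entry independently; return (succeeded, failed) ids."""
    succeeded: List[str] = []
    failed: List[str] = []
    for entry in entries:
        ont_id = entry.get("id", "<unknown>")
        try:
            sync_one(entry)
        except Exception:
            # Log the traceback, record the failure, and move on.
            logger.exception("Failed to sync ontology %s", ont_id)
            failed.append(ont_id)
        else:
            succeeded.append(ont_id)
    return succeeded, failed
```

Catching Exception rather than BaseException means a KeyboardInterrupt still stops the whole run, while ordinary per-ontology errors do not.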

Logging

The script provides detailed logging at INFO and DEBUG levels:

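One minimal way to switch between these two levels is shown below. It assumes that --verbose selects DEBUG and that INFO is the default. That mapping is an assumption about the script, not something this document states. The format string and the logger name are illustrative.

```python
import logging


def log_level(verbose: bool) -> int:
    """DEBUG when verbose, otherwise INFO (an assumed mapping)."""
    return logging.DEBUG if verbose else logging.INFO


def configure_logging(verbose: bool) -> None:
    logging.basicConfig(
        level=log_level(verbose),
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )


if __name__ == "__main__":
    configure_logging(verbose=True)
    log = logging.getLogger("obo_sync_example")  # hypothetical logger name
    log.info("Fetched registry")  # shown at both levels
    log.debug("Transformed entry: go")  # shown only with verbose=True
```

Note that logging.basicConfig has no effect if the root logger already has handlers, so it should be called once, early in the script's entry point.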
Future Enhancements

Potential future improvements: