openbiodiv

is a Knowledge Graph.

OpenBiodiv is an open-access biodiversity knowledge graph that extracts, integrates, and provides semantic access to linked open biodiversity data from scientific literature and established taxonomic backbones. It converts narrative biodiversity information into computable semantic form and maintains a comprehensive knowledge management system for biodiversity science.

Domains

literature

License

CC BY 4.0

Homepage

openbiodiv

Repository

GitHub

Infores ID

infores:openbiodiv

FAIRsharing ID

Unknown

Product Summary

Products

From this Resource
ID Name URL Category Format Description
openbiodiv.portal OpenBiodiv Web Portal openbiodiv.net GraphicalInterface http Interactive web portal for searching,...
openbiodiv.sparql OpenBiodiv SPARQL Endpoint graph.openbiodiv.net ProgrammingInterface http SPARQL endpoint for complex semantic ...
openbiodiv.api OpenBiodiv REST API webapi ProgrammingInterface http RESTful API for programmatic access t...
openbiodiv.ontology.ttl OpenBioDiv-O openbiodiv-ontology-latest.ttl (8.0 KB) OntologyProduct ttl OpenBioDiv-O, the OpenBiodiv Ontology
openbiodiv.ontology.guide OpenBioDiv-O Guide ontologies DocumentationProduct http Guide to OpenBioDiv-O, the OpenBiodiv...

Details

OpenBiodiv

Overview

OpenBiodiv is an Open Access Biodiversity Knowledge Management System (OBKMS) that creates and maintains a comprehensive biodiversity knowledge graph. It combines information extracted from scientific literature with established taxonomic backbones to create a semantic, linked-data repository of biodiversity knowledge.

The system transforms narrative biodiversity information from scholarly publications into computable semantic form, effectively “lifting” biodiversity knowledge into Linked Open Data (LOD) and making previously inaccessible information discoverable and machine-readable.

Data Content and Coverage

Scale and Scope

As of the latest updates, OpenBiodiv contains:

  • 24,939 articles - Scholarly publications analyzed for biodiversity information
  • 167,471 treatments - Taxonomic treatments and species descriptions
  • 130,359 authors - Contributing researchers and publication authors
  • 736,809 taxon names - Scientific names indexed and semantically linked
  • 129,257 sequences - Genetic and molecular sequences with taxonomic linkage
  • 1,390 institutions and collections - Museums and research institutions
  • 117,854 figures - Scientific illustrations and diagrams from publications
  • 18,585 tables - Research data tables extracted from literature
  • 90,008 material examined sections - Specimen and type material records

Data Types and Coverage

Primary Data Sources:

  • Pensoft-published articles (>5,000 biodiversity papers)
  • Plazi TreatmentBank (taxonomic treatments)
  • GBIF Backbone Taxonomy (established taxonomic frameworks)
  • Biodiversity Literature Repository (Zenodo-hosted content)

Content Categories:

  • Taxonomic treatments and species descriptions
  • Scientific nomenclature and naming relationships
  • Author and institutional information
  • Specimen and type material data
  • DNA and genetic sequences with taxonomic context
  • Ecological data and species interactions
  • Biogeographical distribution information
  • Material examined sections from taxonomic literature
  • Scientific figures and illustrations
  • Research data tables

Taxonomic Coverage: OpenBiodiv covers all kingdoms of life with particular strength in:

  • Insects - Especially the “Big 4” insect orders (Coleoptera, Lepidoptera, Hymenoptera, Diptera)
  • Plants - Extensive plant taxonomy coverage
  • Fungi - Fungal taxonomy and ecology
  • Invertebrates - Comprehensive non-insect invertebrate coverage
  • Vertebrates - Vertebrate species and taxonomy

Data Access and Formats

Access Methods

1. Web Portal

  • Interactive interface for searching and exploring biodiversity data
  • User-friendly discovery of taxonomic treatments and species information
  • Full-text search capabilities

2. SPARQL Endpoint

  • Endpoint: http://graph.openbiodiv.net/
  • Enable complex semantic queries across multiple data types
  • Query combinations of literature, taxonomy, specimens, and sequences
  • Adheres to W3C SPARQL standards

3. REST API

  • Endpoint: https://graph.openbiodiv.net/webapi
  • RESTful interface for programmatic access
  • Swagger documentation for API methods
  • Simple queries and data retrieval
  • Integration with external systems and workflows

Data Format

All data is maintained and distributed in RDF (Resource Description Framework) format, adhering to Linked Open Data (LOD) principles and stored in Ontotext GraphDB. Output formats include:

  • RDF/XML
  • RDF Linked Open Data format
  • Query results through SPARQL
  • JSON (via REST API)

Data Updates

  • Database is updated and indexed daily
  • Apache Kafka event-streaming platform enables continuous workflow automation
  • Ongoing integration of new content from source publishers
  • Real-time incorporation of new taxonomic treatments and literature

Key Features

Semantic Integration

  • Automatic entity linking and discovery across data sources
  • Connection of taxon names to publication sections and context
  • Linking of specimens to institutions and collectors
  • Association of authors with their publications and taxonomic work
  • Relationship extraction from natural language text

Query Capabilities

  • Complex queries spanning literature, taxonomy, specimens, and sequences
  • Example: “Which taxa treatments mention both species X and species Y?”
  • Example: “Find all specimens of species X from institution Y”
  • Example: “Discover all ecological interactions mentioned for a given taxon”
  • Advanced SPARQL queries for custom analyses

Standards Compliance

  • Full compliance with Linked Open Data principles
  • Integration with Global Biodiversity Information Facility (GBIF)
  • Compatible with Darwin Core standards and DarwinCore-based ontologies
  • Support for SPAR Ontologies in scholarly publishing
  • Connection to ENVO (Environment Ontology) for ecological data
  • Integration with NOMEN ontology for nomenclature

Use Cases and Applications

  • Taxonomic Research - Discovering all published species descriptions for a taxon
  • Specimen Management - Finding and linking type materials and specimen records
  • Literature Mining - Extracting structured biodiversity information from publications
  • Cross-database Research - Integrating data across literature, taxonomy, and specimens
  • Evolutionary Studies - Connecting sequence data to taxonomic descriptions
  • Biogeographical Analysis - Mapping species distributions from literature
  • Ecological Interaction Discovery - Finding documented species interactions
  • Institutional Research - Analyzing collections and museum holdings
  • Literature Integration - Semantic linking of biodiversity research outputs

Technical Infrastructure

Technologies

  • Graph Database: Ontotext GraphDB
  • Data Pipeline: Apache Kafka for event streaming
  • Publishing Platform: ARPHA (Advanced Research Publishing and Hosting Architecture)
  • Data Standards: RDF, SPARQL, Darwin Core, Linked Open Data

Ontologies

  • OpenBiodiv-O: Custom ontology defining the semantic structure of the system
  • Integration with multiple external ontologies for comprehensive coverage
  • Semantic web standards for interoperability

Citation and Usage

OpenBiodiv data is freely available under the Creative Commons Attribution 4.0 (CC BY 4.0) license. Users are encouraged to cite the appropriate OpenBiodiv publication(s) when using data from this resource in research or educational materials.

The system is operated by Pensoft Publishers in collaboration with major biodiversity institutions and the global biodiversity informatics community.

Automated Evaluation

Is this information incorrect or incomplete? Request an update.

Created: December 17, 2025 | Last modified: January 06, 2026