pfocr

PFOCR is a Data Source.

Pathway Figure OCR (PFOCR) is a resource that extracts biological pathway information from figures in scientific publications using optical character recognition (OCR) and machine learning. PFOCR automatically identifies pathway diagrams in published literature, extracts gene and protein names from pathway figures, and creates structured pathway data. The resource enables discovery of pathway knowledge that exists only in figure format and is not captured in article text or structured databases.

Domains

pathways, literature, biomedical, systems biology

License

Warning: No license entered

Homepage

pfocr

Repository

GitHub

Infores ID

infores:pfocr

FAIRsharing ID

Unknown

Product Summary

Products

From this Resource
ID         | Name                | URL                    | Category             | Format | Description
pfocr.web  | PFOCR Web Interface | pfocr.wikipathways.org | GraphicalInterface   | http   | Web interface for searching and brows...
pfocr.data | PFOCR Pathway Data  | pfocr-database         | Product              | json   | Extracted pathway information from li...
pfocr.api  | PFOCR API           | pfocr.wikipathways.org | ProgrammingInterface | http   | API for accessing PFOCR extracted pat...

Details

PFOCR - Pathway Figure OCR

Overview

Pathway Figure OCR (PFOCR) is a resource that extracts biological pathway information from figures in scientific publications using optical character recognition (OCR) and machine learning.

PFOCR addresses the challenge that much pathway knowledge exists only in published figures and is not captured in article abstracts or structured databases. By automatically processing pathway diagrams, PFOCR makes this “hidden” knowledge discoverable and machine-readable.

Key Features

  • Automated Figure Processing: Identifies pathway figures in publications using machine learning
  • OCR Extraction: Extracts gene/protein names and pathway elements from figures
  • Entity Recognition: Identifies and normalizes biological entities (genes, proteins, metabolites)
  • Pathway Reconstruction: Creates structured pathway data from visual representations
  • Literature Coverage: Processes figures from PubMed Central and other sources
  • Integration: Data compatible with WikiPathways and other pathway resources
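
The OCR extraction and entity recognition features above can be illustrated with a minimal sketch. It is not PFOCR's actual implementation: the lexicon, the function name `normalize_tokens`, and the alias-to-symbol choices (for example mapping "AKT" to AKT1) are assumptions made for illustration. A real pipeline would presumably use a curated gene nomenclature resource and more robust matching.

```python
import re

# Hypothetical lexicon mapping upper-cased aliases to canonical gene symbols.
# These entries are illustrative assumptions, not PFOCR's lexicon.
LEXICON = {
    "TP53": "TP53",
    "P53": "TP53",
    "AKT": "AKT1",
    "AKT1": "AKT1",
    "MTOR": "MTOR",
    "ERK1": "MAPK3",
}


def normalize_tokens(ocr_text, lexicon=LEXICON):
    """Return canonical symbols found in raw OCR text, in first-seen order."""
    # Split the OCR output into alphanumeric tokens, dropping arrows and punctuation.
    tokens = re.findall(r"[A-Za-z0-9]+", ocr_text)
    found = []
    for token in tokens:
        # Case-insensitive lookup: OCR text often mixes cases ("Akt", "mTOR").
        symbol = lexicon.get(token.upper())
        if symbol is not None and symbol not in found:
            found.append(symbol)
    return found


if __name__ == "__main__":
    print(normalize_tokens("p53 -> Akt / mTOR signaling; ERK1"))
    # prints ['TP53', 'AKT1', 'MTOR', 'MAPK3']
```

Tokens that are not in the lexicon (such as "signaling") are dropped, and aliases of the same gene collapse to one canonical symbol.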

Research Applications

  • Literature-based pathway discovery
  • Pathway knowledge mining
  • Gene function annotation
  • Systems biology network construction
  • Complement to text mining approaches
  • Pathway database enrichment

Products

PFOCR Web Interface

Search and browse pathway information extracted from literature figures with visualization of source figures and extracted data.

PFOCR Pathway Data

Structured pathway data extracted from literature figures, including gene/protein interactions and pathway relationships in machine-readable formats.
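
As a rough sketch of how JSON-formatted figure records might be consumed, the example below builds an index from gene symbol to the figures that mention it. The record layout is an assumption: the field names `figure_id`, `pmcid`, and `genes`, the sample values, and the helper `index_by_gene` are hypothetical and do not reflect the actual PFOCR schema.

```python
import json
from collections import defaultdict

# Hypothetical records; the field names and values are placeholders for
# illustration only, not the real PFOCR data format.
SAMPLE = """
[
  {"figure_id": "fig-001", "pmcid": "PMC0000001", "genes": ["TP53", "MDM2"]},
  {"figure_id": "fig-002", "pmcid": "PMC0000002", "genes": ["TP53", "AKT1"]}
]
"""


def index_by_gene(records):
    """Map each gene symbol to the list of figure IDs that mention it."""
    index = defaultdict(list)
    for record in records:
        # Records without a "genes" field contribute nothing to the index.
        for gene in record.get("genes", []):
            index[gene].append(record["figure_id"])
    return dict(index)


if __name__ == "__main__":
    gene_index = index_by_gene(json.loads(SAMPLE))
    print(gene_index["TP53"])
    # prints ['fig-001', 'fig-002']
```

An index of this kind is one simple starting point for gene function annotation or network construction, since genes that share figures can be linked by co-occurrence.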

PFOCR API

Programmatic access to PFOCR data for integration with pathway analysis tools and knowledge graphs.

Information Resource ID

This resource has the Information Resource identifier: infores:pfocr

Repository

Source code: https://github.com/wikipathways/pfocr

Domains

  • Pathways
  • Literature
  • Biomedical
  • Systems Biology

Created: November 05, 2025 | Last modified: November 05, 2025