pfocr is a Data Source.

Pathway Figure OCR (PFOCR) is a resource that extracts biological pathway information from figures in scientific publications using optical character recognition (OCR) and machine learning. PFOCR automatically identifies pathway diagrams in published literature, extracts gene and protein names from pathway figures, and creates structured pathway data. The resource enables discovery of pathway knowledge that exists only in figure format and is not captured in article text or structured databases.

License

CC0-1.0

Homepage

pfocr

Repository

GitHub

Infores ID

infores:pfocr

FAIRsharing ID

Unknown

Product Summary

Products

From this Resource
ID | Name | URL | Category | Format | Description
pfocr.web | PFOCR Web Interface | pfocr.wikipathways.org | GraphicalInterface | http | Web interface for searching and brows...
pfocr.database_repository | PFOCR Database Repository | pfocr-database | Product | http | GitHub repository containing the Jeky...
pfocr.search_json | PFOCR Search JSON | search.json (53.3 MB) | Product | json | Search metadata JSON used by the PFOC...
pfocr.figure_info_json | PFOCR Figure Information JSON | getFigureInfo.json (56.4 MB) | Product | json | JSON file containing all PFOCR figure...
pfocr.gmt | PFOCR GMT Gene Sets | current | Product | txt | Current GMT release of PFOCR pathway ...
pfocr.api | PFOCR API | help.html#download | ProgrammingInterface | http | JSON endpoints and help documentation...
From other Resources
ID | Name | URL | Category | Format | Relation | Description
harmonizome.downloads | Harmonizome Downloads | download | Product | mixed | was derived from | Harmonizome 3.0 processed dataset dow...
harmonizome.kg-neo4j | Harmonizome Knowledge Graph Neo4j Database | harmonizome-kg.maayanlab.cloud | GraphProduct | neo4j | was derived from | Neo4j knowledge graph serialization o...

Details

PFOCR - Pathway Figure OCR

Overview

Pathway Figure OCR (PFOCR) is a resource that extracts biological pathway information from figures in scientific publications using optical character recognition (OCR) and machine learning.

PFOCR addresses the challenge that much pathway knowledge exists only in published figures and is not captured in article abstracts or structured databases. By automatically processing pathway diagrams, PFOCR makes this “hidden” knowledge discoverable and machine-readable.

Key Features

  • Automated Figure Processing: Identifies pathway figures in publications using machine learning
  • OCR Extraction: Extracts gene/protein names and pathway elements from figures
  • Entity Recognition: Identifies and normalizes biological entities (genes, proteins, metabolites)
  • Pathway Reconstruction: Creates structured pathway data from visual representations
  • Literature Coverage: Processes figures from PubMed Central and other sources
  • Integration: Data compatible with WikiPathways and other pathway resources
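
The entity recognition step listed above can be pictured with a toy example. The sketch below is not PFOCR's pipeline; it only illustrates the general idea of matching OCR tokens against a lexicon of gene aliases. The `LEXICON` table and the `normalize_ocr_tokens` function are hypothetical names introduced for illustration, and the alias choices (for example, mapping "AKT" to AKT1) are simplifying assumptions.

```python
import re

# Hypothetical mini-lexicon mapping upper-cased aliases to canonical symbols.
# A real lexicon would come from a curated gene nomenclature source; these
# few entries are assumptions made for illustration only.
LEXICON = {
    "TP53": "TP53",
    "P53": "TP53",
    "AKT1": "AKT1",
    "AKT": "AKT1",
    "MTOR": "MTOR",
}


def normalize_ocr_tokens(ocr_text, lexicon=LEXICON):
    """Return canonical symbols found in OCR text, in first-seen order."""
    # Split on anything that is not a letter, digit, or hyphen.
    tokens = re.split(r"[^A-Za-z0-9-]+", ocr_text)
    found = []
    for token in tokens:
        symbol = lexicon.get(token.upper())
        # Keep each canonical symbol once, even if several aliases match.
        if symbol is not None and symbol not in found:
            found.append(symbol)
    return found


print(normalize_ocr_tokens("p53 -> Akt / mTOR signaling; TP53"))
```

Case-insensitive lookup is used here because OCR output and figure labels often vary in capitalization; a production system would also need to handle OCR character confusions and ambiguous aliases, which this sketch ignores.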

Research Applications

  • Literature-based pathway discovery
  • Pathway knowledge mining
  • Gene function annotation
  • Systems biology network construction
  • Complement to text mining approaches
  • Pathway database enrichment

Products

PFOCR Web Interface

Search and browse pathway information extracted from literature figures with visualization of source figures and extracted data.

PFOCR Pathway Data

Structured pathway data extracted from literature figures, including gene/protein interactions and pathway relationships in machine-readable formats.
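
One of the machine-readable products listed for this resource is a GMT gene set file. As a minimal sketch, assuming the file follows the common tab-separated GMT layout (set name, description, then one gene identifier per column), it could be read as shown below. The `parse_gmt` function is a hypothetical helper, and the sample rows are made up; real PFOCR set names, descriptions, and identifier types may differ.

```python
import io


def parse_gmt(handle):
    """Parse GMT lines into {set_name: (description, [gene, ...])}."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, description, genes = fields[0], fields[1], fields[2:]
        # Drop empty cells left by trailing tabs.
        gene_sets[name] = (description, [g for g in genes if g])
    return gene_sets


# Made-up sample rows standing in for a downloaded GMT file.
sample = io.StringIO(
    "figure_A\texample description\tGENE1\tGENE2\tGENE3\n"
    "figure_B\tanother description\tGENE2\tGENE4\n"
)
sets = parse_gmt(sample)
print(len(sets), sets["figure_B"][1])
```

To use it on a downloaded release, pass an open file handle instead of the `io.StringIO` sample, for example `parse_gmt(open(path, encoding="utf-8"))` where `path` points to the local copy.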

PFOCR API

Programmatic access to PFOCR data for integration with pathway analysis tools and knowledge graphs.
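
The schema of the PFOCR JSON products is not described here, so a cautious first step after downloading one of them is to inspect its top-level shape before writing any integration code. The sketch below assumes only that the file is valid JSON; `summarize_json` is a hypothetical helper, and the demo records are invented stand-ins rather than real PFOCR data.

```python
import json
import os
import tempfile


def summarize_json(path, sample_size=3):
    """Report the top-level shape of a JSON file without assuming a schema."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    summary = {"type": type(data).__name__, "length": None, "keys": []}
    if isinstance(data, dict):
        summary["length"] = len(data)
        summary["keys"] = sorted(data)[:sample_size]
    elif isinstance(data, list):
        summary["length"] = len(data)
        # For a list of records, sample the keys of the first record.
        if data and isinstance(data[0], dict):
            summary["keys"] = sorted(data[0])[:sample_size]
    return summary


# Demo on a small stand-in file; the records below are invented.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.json")
    with open(demo_path, "w", encoding="utf-8") as fh:
        json.dump([{"id": "x1", "title": "t"}, {"id": "x2", "title": "u"}], fh)
    demo_summary = summarize_json(demo_path)
print(demo_summary)
```

Note that `json.load` reads the whole file into memory, which is a reasonable choice for files in the tens of megabytes but would need rethinking for much larger inputs.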

Information Resource ID

This resource has the Information Resource identifier: infores:pfocr

Repository

Database repository: https://github.com/wikipathways/pfocr-database

Domains

  • Pathways
  • Literature
  • Biomedical
  • Systems Biology

Created: November 05, 2025 | Last modified: June 02, 2026