Pfam is a Data Source.
Pfam is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs), providing annotations of protein domains and functional sites.
proteomics
Unknown
infores:pfam
Unknown
| ID | Name | URL | Category | Format | Description |
|---|---|---|---|---|---|
| pfam.site | Interface for the Pfam Database | #table | GraphicalInterface | http | The core Pfam database containing pro... |
| pfam.a.models | Pfam-A HMM Library | Pfam-A.hmm.gz (331.1 MB) | Product | ❔ | The Pfam HMM library for Pfam-A famil... |
| pfam.a.data | Pfam-A HMM data | Pfam-A.hmm.dat.gz (637.4 KB) | Product | ❔ | The Pfam HMM data for Pfam-A families... |
| pfam.a.seedalignment | Pfam-A Seed alignment | Pfam-A.seed.gz (164.6 MB) | Product | ❔ | Pfam-A Seed alignment. |
| pfam.a.fullalignment | Pfam-A Full alignment | Pfam-A.full.gz (19.6 GB) | Product | ❔ | Pfam-A Full alignment. |
| pfam.api | InterPro API | api | ProgrammingInterface | json | REST API for programmatic access to P... |
| ID | Name | URL | Category | Format | Description |
|---|---|---|---|---|---|
| spoke.graph | SPOKE Graph | ❔ | GraphProduct | ❔ | The SPOKE knowledge graph containing ... |
| clinicalkg.graph | CKG Graph Dump | 1 | GraphProduct | mixed | Neo4j database dump of the Clinical K... |
| cancer-genome-interpreter.clinicalkg.graph | CKG Graph Dump | 1 | GraphProduct | mixed | Neo4j database dump of the Clinical K... |
| string.protein.links | STRING Protein Links | protein.links.v12.0.txt.gz (128.7 GB) | GraphProduct | txt | protein network data (full network, s... |
| string.protein.links.detailed | STRING Protein Links Detailed | protein.links.detailed.v12.0.txt.gz (189.6 GB) | GraphProduct | txt | protein network data (full network, i... |
| string.protein.links.full | STRING Protein Links Full | protein.links.full.v12.0.txt.gz (199.6 GB) | GraphProduct | txt | protein network data (full network, i... |
| string.protein.physical.links | STRING Protein Physical Links | protein.physical.links.v12.0.txt.gz (11.1 GB) | GraphProduct | txt | protein network data (physical subnet... |
| string.protein.physical.links.detailed | STRING Protein Physical Links Detailed | protein.physical.links.detailed.v12.0.txt.gz (13.8 GB) | GraphProduct | txt | protein network data (physical subnet... |
| string.protein.physical.links.full | STRING Protein Physical Links Full | protein.physical.links.full.v12.0.txt.gz (14.5 GB) | GraphProduct | txt | protein network data (physical subnet... |
| string.cog.links | STRING COG Links | COG.links.v12.0.txt.gz (176.8 MB) | GraphProduct | txt | association scores between orthologou... |
| string.cog.links.detailed | STRING COG Links Detailed | COG.links.detailed.v12.0.txt.gz (238.7 MB) | GraphProduct | txt | association scores (incl. subscores p... |
| string.database | STRING Database Network Schema | network_schema.v12.0.sql.gz (262.2 GB) | GraphProduct | ❔ | full database, part II: the networks ... |
| obo-db-ingest.pfam.tsv | pfam Nodes TSV | pfam.tsv (450.2 KB) | Product | tsv | pfam Nodes TSV |
| obo-db-ingest.pfam.clan.tsv | pfam.clan Nodes TSV | pfam.clan.tsv (6.3 KB) | Product | tsv | pfam.clan Nodes TSV |
| ckg.graph | CKG Graph Database Dump | 1 | GraphProduct | neo4j | Graph database dump and additional re... |
Pfam is a large collection of protein families, each represented by multiple sequence alignments and profile hidden Markov models (HMMs). Proteins are generally composed of one or more functional regions, commonly termed domains. The presence of different domains in varying combinations in different proteins gives rise to the diverse repertoire of proteins found in nature. Identifying the domains present in a protein can provide insights into its function.
Each Pfam family, usually referred to as a Pfam-A entry, consists of three parts: a curated seed alignment containing a small set of representative members of the family, profile HMMs built from the seed alignment, and an automatically generated full alignment. The full alignment contains all detectable protein sequences belonging to the family, as defined by profile HMM searches of primary sequence databases.
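To make the seed alignment component more concrete, the following is a minimal sketch of reading a Stockholm-format alignment, the format in which Pfam alignment files are distributed. The sample record, its accession `PF99999.1`, and the sequence names are invented for illustration and do not correspond to real Pfam data. The parser keeps only the last value of a repeated `#=GF` tag, which is a simplification.

```python
from dataclasses import dataclass, field

# Hypothetical Stockholm record; accession, names, and residues are made up.
SAMPLE = """\
# STOCKHOLM 1.0
#=GF ID   Example_dom
#=GF AC   PF99999.1
#=GF DE   Hypothetical example domain
#=GF TP   Domain
SEQ1_EXAMPLE/5-14    MKV-LTAGHE
SEQ2_EXAMPLE/12-21   MKVALTA-HE
SEQ3_EXAMPLE/3-12    MRV-LSAGHE
//
"""


@dataclass
class Alignment:
    annotations: dict = field(default_factory=dict)  # per-file "#=GF" tags
    sequences: dict = field(default_factory=dict)    # "name/start-end" -> aligned row


def parse_stockholm(text):
    """Return a list of Alignment objects found in Stockholm-formatted text."""
    alignments = []
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line.startswith("# STOCKHOLM"):
            current = Alignment()          # header line opens a new record
        elif current is None:
            continue                       # ignore anything outside a record
        elif line == "//":
            alignments.append(current)     # terminator closes the record
            current = None
        elif line.startswith("#=GF"):
            parts = line.split(None, 2)    # "#=GF", tag, value
            if len(parts) == 3:
                current.annotations[parts[1]] = parts[2]
        elif line.startswith("#"):
            continue                       # other markup (#=GS, #=GR, #=GC) is skipped
        else:
            parts = line.split(None, 1)
            if len(parts) == 2:
                name, residues = parts
                # Interleaved blocks repeat the name, so rows are concatenated.
                current.sequences[name] = current.sequences.get(name, "") + residues.strip()
    return alignments


alignments = parse_stockholm(SAMPLE)
for aln in alignments:
    print(aln.annotations.get("AC"), len(aln.sequences), "sequences")
```

For the multi-gigabyte full alignment, a streaming variant that reads from `gzip.open(path, "rt")` line by line would be more appropriate than loading the whole text into memory.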
Pfam entries are classified into six types: family, domain, repeat, motif, coiled-coil, and disordered.
Pfam also groups related entries into clans, which are collections of Pfam entries related by sequence, structure, or profile HMM. This is particularly useful for capturing relationships between divergent families that may have a common evolutionary origin.
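As a rough illustration of how entry types and clan membership could be tallied, the sketch below parses per-family metadata records. It assumes the records use Stockholm-style `#=GF` tags with `AC` for the accession, `TP` for the entry type, and `CL` for the clan, as the Pfam-A HMM data file is understood to do; that layout should be checked against the actual file. All accessions and clan identifiers in the sample are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical metadata records; identifiers are invented for illustration.
SAMPLE_DAT = """\
# STOCKHOLM 1.0
#=GF ID   Example_A
#=GF AC   PF90001.1
#=GF TP   Domain
#=GF CL   CL9001
//
# STOCKHOLM 1.0
#=GF ID   Example_B
#=GF AC   PF90002.3
#=GF TP   Family
#=GF CL   CL9001
//
# STOCKHOLM 1.0
#=GF ID   Example_C
#=GF AC   PF90003.2
#=GF TP   Repeat
//
"""


def iter_records(text):
    """Yield one dict of #=GF tags per record (records end with '//')."""
    record = {}
    for line in text.splitlines():
        if line.startswith("#=GF"):
            parts = line.split(None, 2)
            if len(parts) == 3:
                record[parts[1]] = parts[2].strip()
        elif line.strip() == "//":
            if record:
                yield record
            record = {}


def summarise(text):
    """Count entries per type and list member accessions per clan."""
    type_counts = Counter()
    clans = defaultdict(list)
    for rec in iter_records(text):
        type_counts[rec.get("TP", "unknown")] += 1
        if "CL" in rec:  # entries without a clan simply have no CL tag
            clans[rec["CL"]].append(rec.get("AC", "?"))
    return type_counts, dict(clans)


type_counts, clans = summarise(SAMPLE_DAT)
print(dict(type_counts))
print(clans)
```

Entries that share a clan identifier end up in the same list, which mirrors the idea of a clan as a set of families with a possible common evolutionary origin.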
Pfam version 37.0 is based on UniProt release 2023_05. The database is now maintained as part of the InterPro database at the European Bioinformatics Institute (EMBL-EBI). Pfam is powered by the HMMER3 package developed by Sean Eddy’s group at HHMI/Harvard University.
The database is freely available under the Creative Commons Zero (CC0) license and can be accessed through the InterPro website or downloaded from the FTP site.
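For programmatic access, a minimal sketch of querying a single Pfam entry through the InterPro REST API might look like the following. The base URL and the `/entry/pfam/<accession>/` path reflect the publicly documented InterPro API as understood at the time of writing and are assumptions here, since the listing above does not spell them out. The helper names `pfam_entry_url` and `fetch_pfam_entry` are hypothetical.

```python
import json
import re
import urllib.request

# Assumed base URL of the InterPro REST API; verify against current documentation.
INTERPRO_API_BASE = "https://www.ebi.ac.uk/interpro/api"

# Pfam accessions have the form "PF" followed by five digits (no version suffix here).
_PFAM_ACCESSION = re.compile(r"^PF\d{5}$")


def pfam_entry_url(accession, base=INTERPRO_API_BASE):
    """Build the URL for one Pfam entry, validating the accession first."""
    accession = accession.strip().upper()
    if not _PFAM_ACCESSION.match(accession):
        raise ValueError(f"not a Pfam accession: {accession!r}")
    return f"{base.rstrip('/')}/entry/pfam/{accession}/"


def fetch_pfam_entry(accession, timeout=30):
    """Download and decode the JSON document for one Pfam entry (network call)."""
    request = urllib.request.Request(
        pfam_entry_url(accession),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the URL needs no network access.
print(pfam_entry_url("pf00001"))
```

Calling `fetch_pfam_entry("PF00001")` would perform the actual request. The structure of the returned JSON is not described in this listing, so it is best inspected before any fields are relied on.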
Created: May 28, 2025 | Last modified: January 30, 2026