Pfam is a Data Source.
Pfam is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs), providing annotations of protein domains and functional sites.
proteomics
Unknown
infores:pfam
Unknown
| ID | Name | URL | Category | Format | Description |
|---|---|---|---|---|---|
| pfam.site | Interface for the Pfam Database | #table | GraphicalInterface | http | The core Pfam database containing pro... |
| pfam.a.models | Pfam-A HMM Library | Pfam-A.hmm.gz (331.1 MB) | Product | ❔ | The Pfam HMM library for Pfam-A famil... |
| pfam.a.data | Pfam-A HMM data | Pfam-A.hmm.dat.gz (637.4 KB) | Product | ❔ | The Pfam HMM data for Pfam-A families... |
| pfam.a.seedalignment | Pfam-A Seed alignment | Pfam-A.seed.gz (164.6 MB) | Product | ❔ | Pfam-A Seed alignment. |
| pfam.a.fullalignment | Pfam-A Full alignment | Pfam-A.full.gz (19.6 GB) | Product | ❔ | Pfam-A Full alignment. |
| pfam.api | InterPro API | api | ProgrammingInterface | json | REST API for programmatic access to P... |
| ID | Name | URL | Category | Format | Relation | Description |
|---|---|---|---|---|---|---|
| spoke.graph | SPOKE Graph | ❔ | GraphProduct | ❔ | had primary source | The SPOKE knowledge graph containing ... |
| clinicalkg.graph | CKG Graph Dump | 1 | GraphProduct | mixed | had primary source | Neo4j database dump of the Clinical K... |
| cancer-genome-interpreter.clinicalkg.graph | CKG Graph Dump | 1 | GraphProduct | mixed | had primary source | Neo4j database dump of the Clinical K... |
| string.protein.links | STRING Protein Links | protein.links.v12.0.txt.gz (128.7 GB) | GraphProduct | txt | had primary source | protein network data (full network, s... |
| string.protein.links.detailed | STRING Protein Links Detailed | protein.links.detailed.v12.0.txt.gz (189.6 GB) | GraphProduct | txt | had primary source | protein network data (full network, i... |
| string.protein.links.full | STRING Protein Links Full | protein.links.full.v12.0.txt.gz (199.6 GB) | GraphProduct | txt | had primary source | protein network data (full network, i... |
| string.protein.physical.links | STRING Protein Physical Links | protein.physical.links.v12.0.txt.gz (11.1 GB) | GraphProduct | txt | had primary source | protein network data (physical subnet... |
| string.protein.physical.links.detailed | STRING Protein Physical Links Detailed | protein.physical.links.detailed.v12.0.txt.gz (13.8 GB) | GraphProduct | txt | had primary source | protein network data (physical subnet... |
| string.protein.physical.links.full | STRING Protein Physical Links Full | protein.physical.links.full.v12.0.txt.gz (14.5 GB) | GraphProduct | txt | had primary source | protein network data (physical subnet... |
| string.cog.links | STRING COG Links | COG.links.v12.0.txt.gz (176.8 MB) | GraphProduct | txt | had primary source | association scores between orthologou... |
| string.cog.links.detailed | STRING COG Links Detailed | COG.links.detailed.v12.0.txt.gz (238.7 MB) | GraphProduct | txt | had primary source | association scores (incl. subscores p... |
| string.database | STRING Database Network Schema | network_schema.v12.0.sql.gz (262.2 GB) | GraphProduct | ❔ | had primary source | full database, part II: the networks ... |
| obo-db-ingest.pfam.tsv | pfam Nodes TSV | pfam.tsv (450.2 KB) | Product | tsv | had primary source | pfam Nodes TSV |
| obo-db-ingest.pfam.clan.tsv | pfam.clan Nodes TSV | pfam.clan.tsv (6.3 KB) | Product | tsv | had primary source | pfam.clan Nodes TSV |
| ckg.graph | CKG Graph Database Dump | 1 | GraphProduct | neo4j | had primary source | Graph database dump and additional re... |
Pfam is a large collection of protein families, each represented by multiple sequence alignments and profile hidden Markov models (HMMs). Proteins are generally composed of one or more functional regions, commonly termed domains. The presence of different domains in varying combinations in different proteins gives rise to the diverse repertoire of proteins found in nature. Identifying the domains present in a protein can provide insights into its function.
Each Pfam family, usually referred to as a Pfam-A entry, consists of a curated seed alignment containing a small set of representative members of the family, profile HMMs built from the seed alignment, and an automatically generated full alignment, which contains all detectable protein sequences belonging to the family, as defined by profile HMM searches of primary sequence databases.
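The seed and full alignments are distributed as multi-entry Stockholm files, where `#=GF` lines carry per-family annotations and `//` closes each entry. The following is a minimal sketch of a streaming reader for that layout, not Pfam's own tooling. The sample text, the accession `PF99999.1`, and the function name `iter_stockholm_entries` are made up for illustration.

```python
import io
from typing import Dict, Iterator, TextIO

# Made-up miniature entry in Stockholm layout (not real Pfam data).
SAMPLE = """\
# STOCKHOLM 1.0
#=GF ID   ExampleFam
#=GF AC   PF99999.1
#=GF DE   Made-up family for illustration
seqA/1-8      ACDEFGHI
seqB/3-10     ACD-FGHI
//
"""


def iter_stockholm_entries(handle: TextIO) -> Iterator[Dict[str, Dict[str, str]]]:
    """Yield one dict per alignment, without loading the whole file."""
    annotations: Dict[str, str] = {}
    sequences: Dict[str, str] = {}
    for raw in handle:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            # End of one family: emit it and start a fresh record.
            yield {"annotations": annotations, "sequences": sequences}
            annotations, sequences = {}, {}
        elif line.startswith("#=GF "):
            parts = line.split(None, 2)
            if len(parts) == 3:
                tag, value = parts[1], parts[2].strip()
                # Repeated tags (e.g. multi-line comments) are joined with a space.
                annotations[tag] = f"{annotations[tag]} {value}" if tag in annotations else value
        elif line and not line.startswith("#"):
            parts = line.split()
            if len(parts) == 2:
                name, aligned = parts
                # Interleaved blocks append to the same sequence name.
                sequences[name] = sequences.get(name, "") + aligned


entries = list(iter_stockholm_entries(io.StringIO(SAMPLE)))
for entry in entries:
    print(entry["annotations"]["AC"], len(entry["sequences"]), "sequences")
```

Because the reader is a generator, the same function could in principle be pointed at a handle from `gzip.open(..., "rt")` over a downloaded alignment file. The text encoding of the real files is not stated here and would need to be checked.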
Pfam entries are classified into several types, including family, domain, repeat, motif, coiled-coil, and disordered.
Pfam also groups related entries into clans, which are collections of Pfam entries related by sequence, structure, or profile HMM. This is particularly useful for capturing relationships between divergent families that may have a common evolutionary origin.
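Entry types and clan membership can be tallied from per-family annotation records. The sketch below assumes a Stockholm-style metadata layout in which `#=GF TP` holds the entry type and `#=GF CL` holds the clan accession, as in the Pfam-A HMM data file. All identifiers in the sample (`PF99991` to `PF99993`, `CL9991`) are made up, and `parse_gf_records` is a hypothetical helper name.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Made-up records in a Stockholm-style metadata layout (not real Pfam data).
SAMPLE_DAT = """\
# STOCKHOLM 1.0
#=GF ID   FamA
#=GF AC   PF99991.1
#=GF TP   Domain
#=GF CL   CL9991
//
# STOCKHOLM 1.0
#=GF ID   FamB
#=GF AC   PF99992.1
#=GF TP   Family
#=GF CL   CL9991
//
# STOCKHOLM 1.0
#=GF ID   FamC
#=GF AC   PF99993.1
#=GF TP   Repeat
//
"""


def parse_gf_records(text: str) -> List[Dict[str, str]]:
    """Collect the #=GF tag/value pairs of each '//'-terminated record."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("//"):
            if current:
                records.append(current)
            current = {}
        elif line.startswith("#=GF "):
            parts = line.split(None, 2)
            if len(parts) == 3:
                current[parts[1]] = parts[2].strip()
    return records


records = parse_gf_records(SAMPLE_DAT)

# Count entries per type; entries without a TP tag are grouped as "unknown".
type_counts = Counter(r.get("TP", "unknown") for r in records)

# Group family accessions by clan; entries with no CL tag belong to no clan.
clan_members: Dict[str, List[str]] = defaultdict(list)
for r in records:
    if "CL" in r:
        clan_members[r["CL"]].append(r.get("AC", "?"))

print(dict(type_counts))
print(dict(clan_members))
```

In this sample, two of the three made-up families share a clan while the third stands alone, mirroring how clans link only those entries judged to be related.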
Pfam version 37.0 is based on UniProt release 2023_05. Pfam is now maintained as part of InterPro at the European Bioinformatics Institute (EMBL-EBI). It is powered by the HMMER3 package developed by Sean Eddy’s group at HHMI/Harvard University.
The database is freely available under the Creative Commons Zero (CC0) license and can be accessed through the InterPro website or downloaded from the FTP site.
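As a rough illustration of the two access routes, the sketch below only builds URLs and performs no network requests. The file names come from the product table above. The two base URLs, the `entry/pfam/<accession>/` path, and the helper names `download_url` and `entry_url` are assumptions that are not stated in this text and should be checked against the current EMBL-EBI documentation before use.

```python
import re

# Assumed base locations; neither URL is given in the text above.
FTP_BASE = "https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/"
API_BASE = "https://www.ebi.ac.uk/interpro/api/"

# Product IDs and file names as listed in the product table.
RELEASE_FILES = {
    "pfam.a.models": "Pfam-A.hmm.gz",
    "pfam.a.data": "Pfam-A.hmm.dat.gz",
    "pfam.a.seedalignment": "Pfam-A.seed.gz",
    "pfam.a.fullalignment": "Pfam-A.full.gz",
}


def download_url(product_id: str) -> str:
    """Return the assumed download URL for a listed product ID."""
    return FTP_BASE + RELEASE_FILES[product_id]


def entry_url(accession: str) -> str:
    """Return the assumed API URL for one Pfam accession, e.g. 'PF00069'."""
    # Drop any version suffix such as '.27' and normalise the case.
    acc = accession.split(".")[0].upper()
    if not re.fullmatch(r"PF\d{5}", acc):
        raise ValueError(f"not a Pfam accession: {accession!r}")
    return f"{API_BASE}entry/pfam/{acc}/"


print(download_url("pfam.a.data"))
print(entry_url("PF00069"))
```

Validating the accession before building the URL keeps clan identifiers or free text from being sent to an entry endpoint by mistake.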
Created: May 28, 2025 | Last modified: January 30, 2026