pfam

pfam is a Data Source.

Pfam is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs), providing annotations of protein domains and functional sites.

Domains

proteomics

Homepage

pfam

Repository

Unknown

Infores ID

infores:pfam

FAIRsharing ID

Unknown

Product Summary

Products

From this Resource
| ID | Name | URL | Category | Format | Description |
| --- | --- | --- | --- | --- | --- |
| pfam.site | Interface for the Pfam Database | #table | GraphicalInterface | http | The core Pfam database containing pro... |
| pfam.a.models | Pfam-A HMM Library | Pfam-A.hmm.gz (331.1 MB) | Product |  | The Pfam HMM library for Pfam-A famil... |
| pfam.a.data | Pfam-A HMM data | Pfam-A.hmm.dat.gz (637.4 KB) | Product |  | The Pfam HMM data for Pfam-A families... |
| pfam.a.seedalignment | Pfam-A Seed alignment | Pfam-A.seed.gz (164.6 MB) | Product |  | Pfam-A Seed alignment. |
| pfam.a.fullalignment | Pfam-A Full alignment | Pfam-A.full.gz (19.6 GB) | Product |  | Pfam-A Full alignment. |
| pfam.api | InterPro API | api | ProgrammingInterface | json | REST API for programmatic access to P... |
From other Resources
| ID | Name | URL | Category | Format | Description |
| --- | --- | --- | --- | --- | --- |
| spoke.graph | SPOKE Graph |  | GraphProduct |  | The SPOKE knowledge graph containing ... |
| clinicalkg.graph | CKG Graph Dump | 1 | GraphProduct | mixed | Neo4j database dump of the Clinical K... |
| cancer-genome-interpreter.clinicalkg.graph | CKG Graph Dump | 1 | GraphProduct | mixed | Neo4j database dump of the Clinical K... |
| string.protein.links | STRING Protein Links | protein.links.v12.0.txt.gz (128.7 GB) | GraphProduct | txt | protein network data (full network, s... |
| string.protein.links.detailed | STRING Protein Links Detailed | protein.links.detailed.v12.0.txt.gz (189.6 GB) | GraphProduct | txt | protein network data (full network, i... |
| string.protein.links.full | STRING Protein Links Full | protein.links.full.v12.0.txt.gz (199.6 GB) | GraphProduct | txt | protein network data (full network, i... |
| string.protein.physical.links | STRING Protein Physical Links | protein.physical.links.v12.0.txt.gz (11.1 GB) | GraphProduct | txt | protein network data (physical subnet... |
| string.protein.physical.links.detailed | STRING Protein Physical Links Detailed | protein.physical.links.detailed.v12.0.txt.gz (13.8 GB) | GraphProduct | txt | protein network data (physical subnet... |
| string.protein.physical.links.full | STRING Protein Physical Links Full | protein.physical.links.full.v12.0.txt.gz (14.5 GB) | GraphProduct | txt | protein network data (physical subnet... |
| string.cog.links | STRING COG Links | COG.links.v12.0.txt.gz (176.8 MB) | GraphProduct | txt | association scores between orthologou... |
| string.cog.links.detailed | STRING COG Links Detailed | COG.links.detailed.v12.0.txt.gz (238.7 MB) | GraphProduct | txt | association scores (incl. subscores p... |
| string.database | STRING Database Network Schema | network_schema.v12.0.sql.gz (262.2 GB) | GraphProduct |  | full database, part II: the networks ... |
| obo-db-ingest.pfam.tsv | pfam Nodes TSV | pfam.tsv (450.2 KB) | Product | tsv | pfam Nodes TSV |
| obo-db-ingest.pfam.clan.tsv | pfam.clan Nodes TSV | pfam.clan.tsv (6.3 KB) | Product | tsv | pfam.clan Nodes TSV |
| ckg.graph | CKG Graph Database Dump | 1 | GraphProduct | neo4j | Graph database dump and additional re... |

Details

Pfam is a large collection of protein families, each represented by multiple sequence alignments and profile hidden Markov models (HMMs). Proteins are generally composed of one or more functional regions, commonly termed domains. The presence of different domains in varying combinations in different proteins gives rise to the diverse repertoire of proteins found in nature. Identifying the domains present in a protein can provide insights into its function.

Each Pfam family, usually referred to as a Pfam-A entry, consists of a curated seed alignment containing a small set of representative members of the family, profile HMMs built from the seed alignment, and an automatically generated full alignment, which contains all detectable protein sequences belonging to the family, as defined by profile HMM searches of primary sequence databases.

Pfam entries are classified into several types:

  • Family: A collection of related proteins
  • Domain: A structural unit that can be found in multiple protein contexts
  • Repeat: A short unit that is unstable in isolation but forms a stable structure when multiple copies are present
  • Motif: A short unit found outside globular domains
  • Coiled-coil: A region that forms a coiled-coil structure
  • Disordered: A region that is disordered in the native state
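
The entry types listed above can be tallied from per-family annotation records. The following is a minimal sketch, assuming the Pfam-A.hmm.dat file uses Stockholm-style `#=GF` annotation lines (with `ID`, `AC`, `DE`, and `TP` tags) and ends each record with `//`. The function names and the sample records are hypothetical and exist only for illustration; check the real file before relying on this layout.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def parse_gf_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict of #=GF tag -> value per record (assumed layout)."""
    record: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#=GF "):
            # Split into marker, tag, and the rest of the line as the value.
            parts = line.split(None, 2)
            if len(parts) == 3:
                record[parts[1]] = parts[2].strip()
        elif line.strip() == "//":
            # A '//' line closes the current record.
            if record:
                yield record
            record = {}
    if record:
        # Tolerate a final record without a closing '//'.
        yield record


def count_entry_types(records: Iterable[Dict[str, str]]) -> Counter:
    """Count records by their TP (type) field; missing values count as 'Unknown'."""
    return Counter(r.get("TP", "Unknown") for r in records)


# Hypothetical sample data, not taken from a real Pfam release.
SAMPLE = """\
# STOCKHOLM 1.0
#=GF ID   ExampleDomain
#=GF AC   PF00000.1
#=GF DE   Hypothetical example domain
#=GF TP   Domain
//
# STOCKHOLM 1.0
#=GF ID   ExampleRepeat
#=GF AC   PF00000.2
#=GF TP   Repeat
//
"""

records = list(parse_gf_records(SAMPLE.splitlines()))
type_counts = count_entry_types(records)
```

For a downloaded file, the same parser could be fed from `gzip.open(path, "rt")`, which yields text lines without decompressing the whole file into memory.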

Pfam also groups related entries into clans: collections of Pfam entries related by similarity of sequence, structure, or profile HMM. Clans are particularly useful for capturing relationships between divergent families that may share a common evolutionary origin.
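
As a rough illustration of the clan grouping, the sketch below collects family accessions under their clan identifier. It assumes each family carries at most one clan assignment and that this mapping has already been extracted from some source; the accessions, the clan identifiers, and the function name `group_by_clan` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def group_by_clan(
    entries: Iterable[Tuple[str, Optional[str]]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group (family accession, clan id or None) pairs by clan.

    Returns a mapping of clan id -> family accessions, plus the list of
    families that have no clan assignment.
    """
    clans: Dict[str, List[str]] = defaultdict(list)
    unassigned: List[str] = []
    for accession, clan in entries:
        if clan is None:
            unassigned.append(accession)
        else:
            clans[clan].append(accession)
    return dict(clans), unassigned


# Hypothetical assignments for illustration only.
EXAMPLE = [
    ("PF90001", "CL9001"),
    ("PF90002", "CL9001"),
    ("PF90003", None),
    ("PF90004", "CL9002"),
]

clan_members, no_clan = group_by_clan(EXAMPLE)
```

Two families that land in the same list are, under this model, candidates for a shared evolutionary origin even when their sequences have diverged.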

Pfam version 37.0 is based on UniProt release 2023_05. The database is now maintained as part of the InterPro database at the European Bioinformatics Institute (EMBL-EBI). Pfam is powered by the HMMER3 package developed by Sean Eddy’s group at HHMI/Harvard University.

The database is freely available under the Creative Commons Zero (CC0) license and can be accessed through the InterPro website or downloaded from the FTP site.
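
For programmatic access, a request to the InterPro REST API might look like the sketch below. The base URL and the `/entry/pfam/<accession>/` path are assumptions based on the publicly documented InterPro API and should be verified against the current API documentation; the helper names are hypothetical. The accession check assumes the usual `PF` plus five digits form.

```python
import json
from typing import Any, Dict
from urllib.request import urlopen

# Assumed base URL of the InterPro REST API; verify before use.
API_BASE = "https://www.ebi.ac.uk/interpro/api"


def pfam_entry_url(accession: str) -> str:
    """Build the (assumed) API URL for one Pfam entry, e.g. PF00001."""
    acc = accession.strip().upper()
    if not (len(acc) == 7 and acc.startswith("PF") and acc[2:].isdigit()):
        raise ValueError(f"not a Pfam accession: {accession!r}")
    return f"{API_BASE}/entry/pfam/{acc}/"


def fetch_pfam_entry(accession: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Fetch and decode the JSON document for one entry (needs network access)."""
    with urlopen(pfam_entry_url(accession), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_pfam_entry("PF00001")` would issue a single GET request. The structure of the returned JSON is not described here, so inspect the response before depending on specific keys.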


Created: May 28, 2025 | Last modified: January 30, 2026