pfam_db – Bioinformatics guidance page

Pfam database

General info

Pfam is a collection of protein family alignments that were constructed semi-automatically using hidden Markov models (HMMs) (Bateman et al. 2004). Sequences that are not covered by Pfam are clustered and aligned automatically and released as Pfam-B. Pfam-B used to be integrated into the Pfam website, in addition to being available as a flatfile. It was discontinued from Pfam 28.0 to Pfam 33.0. As of Pfam 33.1, Pfam-B entries are available as a tar archive of alignments. Pfam families have permanent accession numbers and contain functional annotation and cross-references to other databases, while Pfam-B families are regenerated at each release and are unannotated.

Pfam is available on the web at:

https://www.ebi.ac.uk/interpro/

Installation

Available on crunchomics: Yes

Pfam is installed as part of the bioinformatics share. If you have access to crunchomics but do not yet have access to the bioinformatics share, you can send an email with your UvA netID to Nina Dombrowski.

The Pfam database can be found here:

/zfs/omics/projects/bioinformatics/databases/pfam/release_36.0/

If you want to download the database yourself, you can run:

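# Download the Pfam-A profile HMMs and the family-to-clan mapping file
wget https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz
wget https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.clans.tsv.gz

# Decompress both files
gzip -d Pfam-A.hmm.gz
gzip -d Pfam-A.clans.tsv.gz

# Clean up the mapping file: replace spaces with underscores
sed -i "s/ /_/g" Pfam-A.clans.tsv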
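
To see what the cleanup step does without downloading anything, here is a minimal sketch that applies the same sed substitution to a toy file. The file name toy.clans.tsv, the temporary directory, and the single example row are illustrative assumptions, as is the five-column tab-separated layout; check the real Pfam-A.clans.tsv before relying on column positions. The in-place flag is used as in the command above, which assumes GNU sed.

```shell
# Toy stand-in for Pfam-A.clans.tsv (one illustrative, tab-separated row;
# the five-column layout is an assumption, not taken from this page)
tmpdir=$(mktemp -d)
printf 'PF00001\tCL0192\tGPCR_A\t7tm_1\t7 transmembrane receptor (rhodopsin family)\n' > "$tmpdir/toy.clans.tsv"

# Same substitution as above: every space becomes an underscore, tabs are untouched
sed -i "s/ /_/g" "$tmpdir/toy.clans.tsv"

# Count the tab-separated fields and pull out the cleaned description column
n_fields=$(awk -F'\t' '{print NF}' "$tmpdir/toy.clans.tsv")
description=$(cut -f5 "$tmpdir/toy.clans.tsv")
echo "$n_fields fields; description: $description"
```

Because only spaces are replaced, the tab delimiters survive and the number of columns is unchanged. Free-text descriptions become single whitespace-free tokens, which is convenient for tools that split on any whitespace.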

References

Bateman, Alex, Lachlan Coin, Richard Durbin, Robert D. Finn, Volker Hollich, Sam Griffiths-Jones, Ajay Khanna, et al. 2004. “The Pfam Protein Families Database.” Nucleic Acids Research 32 (suppl_1): D138–41. https://doi.org/10.1093/nar/gkh121.