# Go into a folder in which you have all your software installed
cd software_dir
# Make a new folder for the interproscan installation and go into the folder
mkdir interproscan
cd interproscan
# Download software
wget https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.64-96.0/interproscan-5.64-96.0-64-bit.tar.gz
wget https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.64-96.0/interproscan-5.64-96.0-64-bit.tar.gz.md5
# Recommended checksum to confirm the download was success
# Must return *interproscan-5.64-96.0-64-bit.tar.gz: OK*
md5sum -c interproscan-5.64-96.0-64-bit.tar.gz.md5
# Decompress the downloaded folder
tar -pxvzf interproscan-5.64-96.0-*-bit.tar.gz
# Index hmm models
cd interproscan-5.64-96.0
python3 setup.py -f interproscan.properties
# Setup a conda environment with the newest java version
conda deactivate
mamba create --name java_f_iprscan -c conda-forge openjdk
# Activate the environment and install some more dependencies
conda activate java_f_iprscan
mamba install -c conda-forge gfortran
# The default installation will give an error with pfsearchV3 to solve this:
# Get new pftools version via conda (v3.2.12)
mamba install -c bioconda pftools
# Remove old pftools that came with interprocscan
cd bin/prosite
rm pfscanV3
rm pfsearchV3
# Replace with new pftools we have installed via mamba (and which are found in the mamba env folder)
# !!! exchange <~/personal/mambaforge/> with where your conda environments are installed!!!
cp ~/personal/mambaforge/envs/java_f_iprscan/bin/pfscan .
cp ~/personal/mambaforge/envs/java_f_iprscan/bin/pfscanV3 .
cp ~/personal/mambaforge/envs/java_f_iprscan/bin/pfsearch .
cp ~/personal/mambaforge/envs/java_f_iprscan/bin/pfsearchV3 .
cd ../..
# Do testrun (ProSiteProfiles and ProSitePatterns not working yet, therefore this works best when setting what applications to run manually)
#srun -n 1 --cpus-per-task 1 --mem=8G ./interproscan.sh -i test_all_appl.fasta -f tsv
srun -n 1 --cpus-per-task 1 --mem=8G ./interproscan.sh -i test_all_appl.fasta -f tsv -dp --appl TIGRFAM,SFLD,SUPERFAMILY,PANTHER,GENE3D,Hamap,Coils,SMART,CDD,PRINTS,PIRSR,AntiFam,Pfam,MobiDBLite,PIRSF,ProSiteProfiles
conda deactivate
Interproscan
Introduction
InterPro is a database which integrates together predictive information about proteins’ function from a number of partner resources, giving an overview of the families that a protein belongs to and the domains and sites it contains (Blum et al. 2021).
Users who have novel nucleotide or protein sequences that they wish to functionally characterise can use the software package InterProScan to run the scanning algorithms from the InterPro database in an integrated way. Sequences are submitted in FASTA format. Matches are then calculated against all of the required member database’s signatures and the results are then output in a variety of formats (Jones et al. 2014).
Installation
Newest version
Notice: The newest version does NOT run on crunchomics unless you do some extra steps during the installation since we need a newer java version and we need to update a tool used to analyse Prosite specific databases.
These changes require to move a few things around, so if you feel uncomfortable with this feel free to install an older version (for installation, see section #### Version for Crunchomics below) that works well with the default java version installed on Crunchomics.
If you want to work with the newest version and are not afraid of some extra steps do the following:
Version for Crunchomics
On Crunchomics, newer versions of interproscan do not run due to an incompatibility with the installed Java version. However, this version has an incompatibility with the local blast install, so some additional steps are also needed to get things working as well.
# Download software
//ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.36-75.0/interproscan-5.36-75.0-64-bit.tar.gz
wget https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.36-75.0/interproscan-5.36-75.0-64-bit.tar.gz.md5
wget https:
# Recommended checksum to confirm the download was success
# Must return *interproscan-5.64-96.0-64-bit.tar.gz: OK*
-5.36-75.0-64-bit.tar.gz.md5
md5sum interproscan
# Decompress the folder
-pxvzf interproscan-5.36-75.0-64-bit.tar.gz
tar
# Index hmm models
-5.36-75.0
cd interproscan
#fix a dependency issue with the blast install that comes with this version of interproscan
## Setup new blast env (installed blast 2.16.0)
-n python3.6.8 -c conda-forge python=3.6.8
mamba create .6.8
conda activate python3
## Install a working version of blast
-c bioconda blast=2.16.0
mamba install
## Replace problematic blast versions via symlinking
## !!! Replace </zfs/omics/personal/ndombro/mambaforge/> with the path to your conda environments!!!
bin/blast/ncbi-blast-2.9.0+/rpsblast
rm -s /zfs/omics/personal/ndombro/mambaforge/envs/python3.6.8/bin/rpsblast bin/blast/ncbi-blast-2.9.0+/rpsblast
ln
# Do a test run to check the installation
-n1 --cpus-per-task 8 --mem=8G ./interproscan.sh -i test_proteins.fasta
srun
# Close the environment
conda deactivate
Usage
Required inputs:
- Protein fasta file
Generated outputs:
- TSV: a simple tab-delimited file format
- XML: the new “IMPACT” XML format
- GFF3: The GFF 3.0 format
- JSON
- SVG
- HTML
Notice:
- Interproscan does not like
*
symbols inside the protein sequence. Some tools for protein calling, like prokka, use*
add the end of a protein to indicate that the full protein was found. If your files have such symbols, use the code below to remove it first. Beware: usingsed -i
overwrites the content of your file. If that behaviour is not wanted usesed 's/*//g' Proteins_of_interest.faa > Proteins_of_interest_new.faa
instead. - If you are on Crunchomics (or most other servers): DO NOT run jobs on the head node, but add something like
srun -n 1 --cpus-per-task 4
before the actual command
Example code:
#clean faa file
#remove `*` as interproscan does not like that symbol
-i 's/*//g' Proteins_of_interest.faa
sed
#run interproscan
<path_to_install>/interproscan-5.36-75.0/interproscan.sh --cpu 4 -i Proteins_of_interest.faa -d outputfolder -T outputfolder/temp --iprlookup --goterms
To check available options use <path_to_install>/interproscan-5.36-75.0/interproscan.sh --help
or for more detailed information, see the documentation):