Mafft
Introduction
MAFFT (Katoh 2002) is a multiple sequence alignment program for Unix-like operating systems. It offers a range of multiple alignment methods, such as L-INS-i (accurate; for alignment of <∼200 sequences) and FFT-NS-2 (fast; for alignment of <∼30,000 sequences). For more information, please visit the mafft website.
Installation
Installed on crunchomics: Yes
- Mafft v7.525 is installed as part of the bioinformatics share. If you have access to crunchomics but do not yet have access to the bioinformatics share, you can send an email with your UvA netID to Nina Dombrowski, n.dombrowski@uva.nl.
- Afterwards, you can add the bioinformatics share as follows (if you have already done this in the past, you don’t need to run this command):
conda config --add envs_dirs /zfs/omics/projects/bioinformatics/software/miniconda3/envs/
If you want to install it yourself, you can run the command below. It creates an environment named mafft, so activate it with conda activate mafft rather than conda activate mafft_7.525:
mamba create -n mafft -c bioconda mafft
Usage
Example usage:
conda activate mafft_7.525
mafft-linsi --reorder --thread 20 \
    my_protein_file.faa > my_protein_file.aln
conda deactivate
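When several protein files need the same treatment, the example above can be wrapped in a loop. The following is a minimal sketch rather than part of the official instructions: the function name `align_all`, the `ALIGNER` variable and the assumption that all inputs end in `.faa` are illustrative choices, and the mafft environment is assumed to be active already.

```shell
#!/usr/bin/env bash
# Sketch: align every .faa file in a directory with L-INS-i.
# align_all and ALIGNER are hypothetical names, not part of MAFFT.

ALIGNER="${ALIGNER:-mafft-linsi}"   # command used to run the alignment

align_all() {
    local dir="$1"
    local threads="${2:-20}"
    local faa
    for faa in "$dir"/*.faa; do
        [ -e "$faa" ] || continue    # no .faa files: skip the unexpanded glob
        # my_protein_file.faa -> my_protein_file.aln
        "$ALIGNER" --reorder --thread "$threads" "$faa" > "${faa%.faa}.aln"
    done
}

# Usage (after `conda activate mafft_7.525`):
#   align_all path/to/faa_dir 20
```

Each output file is written next to its input, so an existing `.aln` file with the same base name would be overwritten.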
Notice:
- mafft-linsi (or linsi) is an alias for the accurate L-INS-i option, intended for alignments of up to ∼200-1000 sequences × ∼2,000 sites.
- By default, mafft uses a fast option (FFT-NS-2).
- If unsure what alignment option to use, you can also use --auto or check all available options in the manual.
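The size guidance above can be turned into a small helper. This is only a sketch under stated assumptions: the names `count_seqs` and `pick_mafft` are hypothetical, and the cut-off of 200 sequences is taken from the rough figure quoted in the Introduction, not from a MAFFT default. With `--auto`, MAFFT makes its own choice internally.

```shell
# Sketch: suggest a MAFFT command based on the number of sequences in a FASTA file.
# count_seqs and pick_mafft are hypothetical helpers; the 200-sequence cut-off is an assumption.

count_seqs() {
    # Each FASTA record starts with a ">" header line.
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -c '^>' "$1" || true
}

pick_mafft() {
    n=$(count_seqs "$1")
    if [ "$n" -lt 200 ]; then
        echo "mafft-linsi"      # accurate option, intended for small datasets
    else
        echo "mafft --auto"     # let MAFFT choose according to data size
    fi
}

# Usage (with the mafft environment active):
#   $(pick_mafft my_protein_file.faa) --reorder my_protein_file.faa > my_protein_file.aln
```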
Useful options (for a full list, please visit the manual):
--auto: Automatically selects an appropriate strategy from L-INS-i, FFT-NS-i and FFT-NS-2, according to data size. Default: off (always FFT-NS-2).
--maxiterate number: Performs number cycles of iterative refinement. Default: 0.
--reorder: Output order: aligned. Default: off (inputorder). This can be useful if you visually inspect the alignments, as outliers tend to appear at the bottom.
--anysymbol: Allows unusual characters (e.g., U as selenocysteine in protein sequence; i as inosine in nucleotide sequence). Use this option if your sequences contain such characters.
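Whether `--anysymbol` is needed can be checked before running the alignment. The snippet below is a rough sketch: `anysymbol_flag` is a hypothetical helper, and it only looks for U (selenocysteine) in protein sequence lines, which is an assumption made to keep the example short. Other unusual characters would need their own pattern.

```shell
# Sketch: print --anysymbol only when a protein FASTA file contains U (selenocysteine).
# anysymbol_flag is a hypothetical helper; checking for U/u only is an assumption.

anysymbol_flag() {
    # Inspect sequence lines only; ">" header lines are skipped.
    if grep -v '^>' "$1" | grep -qi 'u'; then
        echo "--anysymbol"
    fi
}

# Usage (with the mafft environment active):
#   mafft --auto --reorder $(anysymbol_flag my_protein_file.faa) \
#       my_protein_file.faa > my_protein_file.aln
```

The command substitution is left unquoted on purpose, so that it expands to nothing when the flag is not required.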
If you want to inspect your alignment visually, we also provide Jalview on the bioinformatics server (to find conserved sites more easily, go to Colour -> Clustal):
conda activate jalview_2.11.3.3
jalview my_protein_file.aln
conda deactivate
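Before opening a file in a viewer, a quick sanity check can confirm that it is a valid alignment: in an aligned FASTA file, every sequence (gaps included) has the same length. The following sketch uses only awk; `check_alignment` is a hypothetical helper, not something shipped with MAFFT or Jalview, and it identifies each sequence by the first word of its header.

```shell
# Sketch: verify that every sequence in an aligned FASTA file has the same length.
# check_alignment is a hypothetical helper, not part of MAFFT or Jalview.

check_alignment() {
    # First awk: print "name<TAB>length" for each record.
    # Second awk: compare every length with that of the first record.
    awk '
        /^>/ { if (name != "") print name "\t" len
               name = substr($1, 2); len = 0; next }
             { len += length($0) }     # sequences may wrap over several lines
        END  { if (name != "") print name "\t" len }
    ' "$1" | awk -F '\t' '
        NR == 1        { expected = $2 }
        $2 != expected { printf "length mismatch: %s (%d, expected %d)\n", $1, $2, expected; bad = 1 }
        END            { if (bad) exit 1; print "OK: " NR " sequences, " expected " columns" }
    '
}

# Usage:
#   check_alignment my_protein_file.aln
```

The function returns a non-zero exit status when lengths differ, so it can be used in a condition before launching the viewer.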
References
Katoh, K. 2002. “MAFFT: A Novel Method for Rapid Multiple Sequence Alignment Based on Fast Fourier Transform.” Nucleic Acids Research 30 (14): 3059–66. https://doi.org/10.1093/nar/gkf436.