The mOTU profiler is a computational tool that estimates the relative taxonomic abundance of known and currently unknown microbial community members from metagenomic shotgun sequencing data (Ruscheweyh et al. 2022). Check the wiki for more information.
Available on Crunchomics: No
You can install mOTUs with conda/mamba as follows:
#install the conda environment
conda create -n motus_3.1.0
conda install -n motus_3.1.0 -c bioconda motus=3.1.0
#download the required databases
#the database gets downloaded to: conda_env_folder/motus_3.1.0/lib/python3.9/site-packages/motus
conda activate motus_3.1.0
motus downloadDB
Required inputs: Illumina reads (forward, reverse, unpaired) or Nanopore reads in fastq(.gz) format
Generated output(s):
- a file with the relative abundance (or read counts) of the prokaryotic species found in the sample. More about the output can be found here
- the results of the read mapping in BAM format (optional)
Example for short-read Illumina data
#prepare folders
mkdir data
mkdir -p results/motus_sr
#run motus
motus profile -f data/SRR17913199_1.fastq -r data/SRR17913199_2.fastq \
-n SRR17913199 \
-t 30 \
-o results/motus_sr/taxonomy_profile.txt
#parse the results and extract only hits >0 in decreasing order
awk -F'\t' -v OFS="\t" '!/^#/ && $2 > 0' results/motus_sr/taxonomy_profile.txt | sort -t$'\t' -k2,2nr
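To see what the filter above does, here is a minimal sketch that runs it on a made-up profile. The file name mock_profile.txt, the taxon names and the abundance values are invented for illustration; a real mOTUs profile contains more header lines and many more rows.

```shell
# Build a tiny mock profile (invented values): header lines start with '#',
# then one tab-separated row per taxon with its abundance in column 2.
printf '%s\n' '# mock header line' > mock_profile.txt
printf '#consensus_taxonomy\tSAMPLE_A\n' >> mock_profile.txt
printf 'taxon_one\t0.25\n' >> mock_profile.txt
printf 'taxon_two\t0\n' >> mock_profile.txt
printf 'taxon_three\t0.75\n' >> mock_profile.txt

# Same filter as in the example: skip lines starting with '#',
# keep rows whose abundance is > 0, then sort on column 2, largest first.
TAB=$(printf '\t')
filtered=$(awk -F'\t' -v OFS='\t' '!/^#/ && $2 > 0' mock_profile.txt | sort -t"$TAB" -k2,2nr)
printf '%s\n' "$filtered"
```

On this mock input the header lines and the zero-abundance row are dropped, and taxon_three is listed before taxon_one.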
Example for long-read data
#prepare folders
mkdir -p results/motus_lr
#prepare long reads for profiling with mOTUs; this splits them into short reads of length 300
srun --cpus-per-task 1 --mem=10GB motus prep_long -i data/SRR17913199.fastq \
-o data/SRR17913199_prepped.fastq -n SRR17913199
#run motus
srun --cpus-per-task 30 --mem=10GB motus profile -s data/SRR17913199_prepped.fastq \
-n SRR17913199 \
-t 30 \
-o results/motus_lr/taxonomy_profile.txt
#parse the results and extract only hits >0 in decreasing order
awk -F'\t' -v OFS="\t" '!/^#/ && $2 > 0' results/motus_lr/taxonomy_profile.txt | sort -t$'\t' -k2,2nr
Useful arguments for motus profile
Input options:
-f FILE[,FILE]  input file(s) for reads in forward orientation, fastq(.gz)-formatted
-r FILE[,FILE]  input file(s) for reads in reverse orientation, fastq(.gz)-formatted
-s FILE[,FILE]  input file(s) for unpaired reads, fastq(.gz)-formatted
-n STR          sample name [‘unnamed sample’]
-i FILE[,FILE]  provide SAM or BAM input file(s) (generated by motus map_tax)
-m FILE         provide a mgc reads count file (generated by motus calc_mgc)
-db DIR         provide a different database directory
Output options:
-o FILE         output file name [stdout]
-I FILE         save the result of BWA in BAM format (output of motus map_tax)
-M FILE         save the mgc reads count (output of motus calc_mgc)
-e              only species with reference genomes (ref-mOTUs)
-u              print the full name of the species
-c              print result as counts instead of relative abundances
-p              print NCBI taxonomy identifiers
-B              print result in BIOM format
-C STR          print result in CAMI format (BioBoxes format 0.9.1), Values: [precision, recall, parenthesis]
-q              print the full rank taxonomy
-A              print all taxonomic levels together (kingdom to mOTUs, override -k)
-k STR          taxonomic level [mOTU], Values: [kingdom, phylum, class, order, family, genus, mOTU]
Algorithm options:
-g INT          number of marker genes cutoff: 1=higher recall, 6=higher precision [3]
-l INT          min length of the alignment (bp) [75]
-t INT          number of threads [1]
-v INT          verbosity level: 1=error, 2=warning, 3=message, 4+=debugging [3]
-y STR          type of read counts [insert.scaled_counts], Values: [base.coverage, insert.raw_counts, insert.scaled_counts]
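Several of these options can be combined. The following is a minimal sketch, assuming the flags behave as described in the list above: the reads are mapped once and the alignments are kept with -I, then the stored BAM is re-profiled with -i to get read counts (-c) at another taxonomic level (-k) without repeating the mapping. The function names, the thread count and the output file names are hypothetical choices for illustration.

```shell
# Map the reads once, store the alignments (-I) and write the default mOTU-level profile.
# Arguments: forward reads, reverse reads, sample name, output directory.
map_and_profile() {
  fwd=$1; rev=$2; sample=$3; outdir=$4
  mkdir -p "$outdir"
  motus profile -f "$fwd" -r "$rev" -n "$sample" -t 4 \
    -I "$outdir/$sample.bam" -o "$outdir/$sample.motus"
}

# Re-use the stored BAM (-i) to print read counts (-c) at a chosen taxonomic level (-k).
# Arguments: sample name, output directory, taxonomic level (e.g. genus).
reprofile_counts() {
  sample=$1; outdir=$2; level=$3
  motus profile -i "$outdir/$sample.bam" -n "$sample" -c -k "$level" \
    -o "$outdir/$sample.$level.counts.txt"
}
```

For the short-read example, this could be called as `map_and_profile data/SRR17913199_1.fastq data/SRR17913199_2.fastq SRR17913199 results/motus_sr` followed by `reprofile_counts SRR17913199 results/motus_sr genus`.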
Merging profiles
If you generated several profiles for different samples, you can merge them by providing a list of samples like this:
motus merge -i sampleX.motus,sampleY.motus,sampleZ.motus \
-o results/motus/merged.txt
Note: For this to work, it is best to use the -n option when running motus profile, so that each sample has a clear identifier in the abundance table.
You can also merge different profiles if they are all in the same directory like this:
motus merge -d results/motus \
-o results/motus/merged.txt
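If you have many samples, the profiling and merging steps can be wrapped in a loop. The sketch below assumes paired-end files named <sample>_1.fastq(.gz) and <sample>_2.fastq(.gz) in one directory; this naming convention, the function names, the .motus file extension and the thread count are assumptions for illustration and should be adapted to your data.

```shell
# Derive the sample name from a forward-read file,
# assuming the name pattern <sample>_1.fastq or <sample>_1.fastq.gz.
sample_name() {
  base=$(basename "$1")
  echo "${base%_1.fastq*}"
}

# Build the path of the matching reverse-read file (<sample>_2.fastq[.gz]).
mate_path() {
  suffix=${1##*_1}                 # ".fastq" or ".fastq.gz"
  echo "${1%_1"$suffix"}_2$suffix"
}

# Profile every read pair with its own -n label, then merge all profiles.
# Arguments: input directory, profile directory, merged output file.
profile_and_merge() {
  indir=$1; profdir=$2; merged=$3
  mkdir -p "$profdir"
  for fwd in "$indir"/*_1.fastq*; do
    [ -e "$fwd" ] || continue      # skip if the glob matched nothing
    sample=$(sample_name "$fwd")
    motus profile -f "$fwd" -r "$(mate_path "$fwd")" \
      -n "$sample" -t 4 -o "$profdir/$sample.motus"
  done
  motus merge -d "$profdir" -o "$merged"
}
```

A call could look like `profile_and_merge data results/motus results/merged.txt`. Writing the merged table outside the profile directory is a cautious choice here, so that a later merge with -d does not pick up an older merged table.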