conda config --add envs_dirs /zfs/omics/projects/bioinformatics/software/miniconda3/envs/
CheckM2
CheckM2 is a software to assess the quality (completeness, contamination, coding density, etc.) of a genome assembly (Chklovski et al., n.d.). Unlike CheckM1, CheckM2 has universally trained machine learning models it applies regardless of taxonomic lineage to predict the completeness and contamination of genomic bins. This allows it to incorporate many lineages in its training set that have few - or even just one - high-quality genomic representatives, by putting it in the context of all other organisms in the training set. As a result of this machine learning framework, CheckM2 is also highly accurate on organisms with reduced genomes or unusual biology, such as the Nanoarchaeota or Patescibacteria.
For more information, check out the tools github page.
Introduction
Installation
Installed on crunchomics: Yes,
- CheckM2 v1.0.1 is installed as part of the bioinformatics share. If you have access to crunchomics and have not yet access to the bioinformatics you can send an email with your Uva netID to Nina Dombrowski. Afterwards, you can add the bioinformatics share as follows (if you have already done this in the past, you don’t need to run this command):
If you want to install it yourself, you can run:
mamba create -p /zfs/omics/projects/bioinformatics/software/miniconda3/envs/checkm2_1.0.1 -c bioconda -c conda-forge checkm2=1.0.1
conda activate checkm2_1.0.1
#install right python version, otherwise you get a class error
mamba install python=3.8
#install database
checkm2 database --download --path /zfs/omics/projects/bioinformatics/databases/checkm
#do test run
checkm2 testrun
Usage
#run checkm2
conda activate checkm2_1.0.1
checkm2 predict --threads 30 \
--input folder_with_genomes_to_analyse/ \
-x fasta \
--output-directory results/checkm2
conda deactivate
After running this, you fill find all relevant information in the quality_report.tsv
file in the output folder.
Useful options (for a full list, use the help function):
--genes
: Treat input files as protein files. [Default: False]-x EXTENSION
,--extension
EXTENSION: Extension of input files. [Default: .fna]--tmpdir
TMPDIR : specify an alternative directory for temporary files--force
: overwrite output directory [default: not set]--resume
: Reuse Prodigal and DIAMOND results found in output directory [default: not set]--threads
num_threads,-t
num_threads: number of CPUS to use [default: 1]--ttable
ttable: Provide a specific prodigal translation table for bins [default: automatically determine either 11 or 4]
For more information, please visit the manual.
Common Issues and Solutions
- Issue 1: Running out of memory
- Solution 1: If you are running CheckM2 on a device with limited RAM, you can use the
--lowmem
option to reduce DIAMOND RAM use by half at the expense of longer runtime.
- Solution 1: If you are running CheckM2 on a device with limited RAM, you can use the