GTDB-Tk
Introduction
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes (Chaumeil et al. 2019). It uses the GTDB database to assign your genome(s) of interest to a taxonomy (Parks et al. 2020). For more information, visit the tool's GitHub repository and website.
Installation
Installed on Crunchomics: Yes
- GTDB-Tk v2.4.0 and the GTDB v220 database are installed as part of the bioinformatics share. If you have access to Crunchomics but do not yet have access to the bioinformatics share, send an email with your UvA netID to Nina Dombrowski.
Once you have been added to the bioinformatics share, you can add the conda environments installed in this share as follows (if you have already done this in the past, you don’t need to run this command again):
conda config --add envs_dirs /zfs/omics/projects/bioinformatics/software/miniconda3/envs/
If you want to install it yourself, you can run:
mamba create -n gtdbtk_2.4.0 -c bioconda gtdbtk=2.4.0
# Get the GTDB data (v220 when this was written; "latest" in the URL may now resolve to a newer release)
cd <my_database_folder>
wget https://data.gtdb.ecogenomic.org/releases/latest/auxillary_files/gtdbtk_package/full_package/gtdbtk_data.tar.gz
tar xvzf gtdbtk_data.tar.gz
# Link the data to GTDB-Tk
conda activate gtdbtk_2.4.0
conda env config vars set GTDBTK_DATA_PATH="<my_database_folder>/gtdb/release220"
conda deactivate
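A variable set with conda env config vars only becomes visible after the environment is activated again. If you want to confirm that the link worked, a small check along the following lines may help. This is only a sketch: the function name check_gtdbtk_data_path is made up for this example, and it only tests that the variable is set and points to an existing folder, not that the database itself is complete.

```shell
# Hypothetical helper (not part of GTDB-Tk): report whether GTDBTK_DATA_PATH
# is set and points to an existing directory.
check_gtdbtk_data_path() {
  if [ -z "${GTDBTK_DATA_PATH:-}" ]; then
    echo "GTDBTK_DATA_PATH is not set"
    return 1
  fi
  if [ ! -d "$GTDBTK_DATA_PATH" ]; then
    echo "GTDBTK_DATA_PATH is not a directory: $GTDBTK_DATA_PATH"
    return 1
  fi
  echo "GTDBTK_DATA_PATH=$GTDBTK_DATA_PATH"
}

# Run this inside the activated environment (conda activate gtdbtk_2.4.0).
# "|| true" only prevents a failed check from aborting a script that uses set -e.
check_gtdbtk_data_path || true
```

For a more thorough test of the reference data, GTDB-Tk also provides the check_install subcommand (gtdbtk check_install), which has to be run inside the activated environment.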
Usage
mkdir -p results/gtdb
conda activate gtdbtk_2.4.0
gtdbtk classify_wf --genome_dir genome_dir/ \
--extension fasta \
--out_dir results/gtdb \
--mash_db /zfs/omics/projects/bioinformatics/databases/gtdb/release220/gtdb_ref_sketch.msh \
--cpus 20
conda deactivate
The files named gtdbtk.*.summary.tsv will contain key information about the taxonomic assignment of your genome(s).
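For a quick look at the assignments, you could print just the genome name and the assigned taxonomy from each summary table. The sketch below assumes that the first two tab-separated columns are user_genome and classification; please check the header of your own files, since the column layout can differ between versions. The function name show_gtdb_classification is made up for this example.

```shell
# Hypothetical helper: print columns 1 and 2 (assumed to be user_genome and
# classification) of every GTDB-Tk summary table in a results folder.
show_gtdb_classification() {
  local dir="${1:-results/gtdb}"   # default matches the --out_dir used above
  local f
  for f in "$dir"/gtdbtk.*.summary.tsv; do
    [ -e "$f" ] || continue        # skip if no summary file was found
    cut -f1,2 "$f"                 # cut splits on tabs by default
  done
}

show_gtdb_classification results/gtdb
```

If both bacterial and archaeal genomes were classified, the header line is printed once per summary file.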
For the full set of options and a description of how the tool works, please visit the manual.