kraken2

Introduction

Kraken2 (Wood, Lu, and Langmead 2019) is a taxonomic sequence classifier that assigns taxonomic labels to DNA sequences. Kraken examines the k-mers within a query sequence and uses the information within those k-mers to query a database. That database maps k-mers to the lowest common ancestor (LCA) of all genomes known to contain a given k-mer.
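To make the term concrete: a k-mer is simply a substring of length k, and a sequence of length n contains n - k + 1 of them. The toy sketch below only lists the k-mers of a short sequence; the function name `list_kmers` is hypothetical, and kraken2's actual database lookup is considerably more involved than this.

```shell
# Toy illustration only (hypothetical helper, not part of kraken2).
# Print every k-mer (substring of length k) of a sequence, one per line.
# Usage: list_kmers SEQUENCE K
list_kmers() {
    awk -v seq="$1" -v k="$2" 'BEGIN {
        # Slide a window of width k along the sequence, one base at a time
        for (i = 1; i + k - 1 <= length(seq); i++)
            print substr(seq, i, k)
    }'
}
```

For example, `list_kmers ACGTAC 4` lists the three 4-mers of the sequence. During classification, each such k-mer would be looked up in the database to find the LCA taxon it maps to.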

Available on Crunchomics: Kraken version 2.0.8-beta installed

Installation

If you want to install kraken2 yourself, it is best to install it via mamba:

# Set up a new conda environment named kraken2
mamba create --name kraken2 -c bioconda kraken2
# Activate the environment before running kraken2
conda activate kraken2

Usage

For detailed usage information, check out the kraken2 manual.

Build a kraken database

The command below will download NCBI taxonomic information, as well as the complete genomes in RefSeq for the bacterial, archaeal, and viral domains, along with the human genome and a collection of known vectors (UniVec_Core). After downloading all this data, the build process begins; this can be the most time-consuming step. If you have multiple processing cores, you can run this process with multiple threads.

Additionally, kraken2 can build several special databases, such as the SILVA database for 16S rRNA gene analyses. Check the kraken2 manual for details on how to download and build these and other custom databases.

# Create the standard kraken2 database; $DBNAME is the directory the database is written to
kraken2-build --standard --threads 24 --db "$DBNAME"
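The standard build bundles steps that kraken2-build also exposes individually, which is useful when you only need some of the reference libraries. The sketch below wraps those steps, and the SILVA build, in two helper functions. It assumes the `--download-taxonomy`, `--download-library`, `--build`, and `--special` options described in the kraken2 manual; the function names and the example arguments are hypothetical and not part of kraken2.

```shell
# Hypothetical helpers (names are illustrative, not part of kraken2).

# Build a custom database from selected reference libraries.
# Usage: build_custom_db DBNAME THREADS LIBRARY...
build_custom_db() {
    local db="$1"
    local threads="$2"
    shift 2
    # 1. Download the NCBI taxonomy
    kraken2-build --download-taxonomy --db "$db" || return 1
    # 2. Download one reference library per argument (e.g. archaea, bacteria, viral)
    for lib in "$@"; do
        kraken2-build --download-library "$lib" --db "$db" || return 1
    done
    # 3. Build the database from the downloaded data
    kraken2-build --build --threads "$threads" --db "$db"
}

# Build the special 16S SILVA database.
# Usage: build_silva_db DBNAME [THREADS]
build_silva_db() {
    kraken2-build --db "$1" --special silva --threads "${2:-1}"
}
```

With these defined, `build_custom_db "$DBNAME" 24 archaea viral` would build a database restricted to archaeal and viral genomes, and `build_silva_db "$DBNAME" 24` would build the SILVA 16S database. Check the manual for the library and special database names supported by your kraken2 version.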

Classification

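# Classify seqs.fa; per-sequence assignments go to output.out, the per-taxon summary to output.report
kraken2 --db "$DBNAME" seqs.fa --output output.out --report output.report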
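Two common follow-ups are classifying paired-end reads and taking a quick look at the report. The sketch below assumes the `--paired`, `--gzip-compressed`, and `--threads` options from the kraken2 manual and the standard six-column, tab-separated report format; the function names and file names are hypothetical.

```shell
# Hypothetical helpers (names are illustrative, not part of kraken2).

# Classify gzipped paired-end reads.
# Usage: classify_paired DBNAME R1.fastq.gz R2.fastq.gz OUT_PREFIX [THREADS]
classify_paired() {
    kraken2 --db "$1" --threads "${5:-1}" \
        --paired --gzip-compressed \
        --output "$4.out" --report "$4.report" \
        "$2" "$3"
}

# List the species-level rows of a standard kraken2 report, most abundant first.
# Report columns (tab-separated): percent of reads in clade, reads in clade,
# reads assigned directly, rank code, NCBI taxid, indented scientific name.
# Prints: reads in clade, percent, species name (tab-separated).
# Usage: species_from_report output.report
species_from_report() {
    awk -F '\t' '$4 == "S" {
        sub(/^ +/, "", $1)   # strip padding from the percentage
        sub(/^ +/, "", $6)   # strip indentation from the name
        print $2 "\t" $1 "\t" $6
    }' "$1" | sort -t "$(printf '\t')" -k1,1nr
}
```

For example, `classify_paired "$DBNAME" sample_R1.fastq.gz sample_R2.fastq.gz sample 8` would write `sample.out` and `sample.report`, and `species_from_report sample.report | head` would show the ten species with the most reads.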

References

Wood, Derrick E., Jennifer Lu, and Ben Langmead. 2019. “Improved Metagenomic Analysis with Kraken 2.” Genome Biology 20 (1): 257. https://doi.org/10.1186/s13059-019-1891-0.