Bowtie2
Introduction
Bowtie 2 is an ultra-fast and memory-efficient tool for aligning sequencing reads to long reference sequences, such as a genome (Langmead and Salzberg 2012). It is particularly good at aligning reads from about 50 up to hundreds or thousands of characters long, and at aligning to relatively long genomes.
For more detailed information, please visit the manual.
Installation
Installed on crunchomics: Yes, Bowtie2 v2.4.1 is installed. If you want to install it yourself, you can run:
mamba create -n bowtie_2.5.3
mamba install -n bowtie_2.5.3 -c bioconda bowtie2=2.5.3
mamba activate bowtie_2.5.3
Usage
Indexing
To perform the Bowtie2 alignment, an index is required. The index is analogous to the index in a book. By indexing the reference sequence, we organize it in a manner that allows for an efficient search and retrieval of matches of the query (sequence read) to the reference sequences.
The basic syntax for building an index for a genome file called GCF_000385215.fna is as follows:
bowtie2-build GCF_000385215.fna GCF_000385215
In the command above, GCF_000385215.fna is the input file containing the reference genome sequence in FASTA format, and GCF_000385215 is the prefix of the generated index files.
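Before aligning, it can be useful to confirm that the index was fully written. For a genome of ordinary size, bowtie2-build writes six files that share the prefix, ending in .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2 and .rev.2.bt2 (very large genomes get a .bt2l extension instead). The sketch below assumes the small-index case; the helper name index_complete is hypothetical and not part of Bowtie 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if all six small-index files exist for a prefix.
# Assumes a "small" index (.bt2); large indexes use the .bt2l extension instead.
index_complete() {
    local prefix="$1" suffix
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -e "${prefix}.${suffix}.bt2" ]; then
            echo "missing: ${prefix}.${suffix}.bt2" >&2
            return 1
        fi
    done
    return 0
}

# Usage: build the index only when it is not already there.
# index_complete GCF_000385215 || bowtie2-build GCF_000385215.fna GCF_000385215
```

The usage line is left commented out so that the snippet can be sourced without starting an index build.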
Read alignment
After generating the index, we can align short Illumina reads against it. Note that Bowtie 2 does not write its alignment summary to a log file; the summary is printed to the screen (standard error). To save it in a file, we use the 2> operator.
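To illustrate what 2> does, the sketch below uses a stand-in function (fake_aligner, a hypothetical name) that writes one line to standard output and one to standard error, which is how Bowtie 2 reports its alignment summary. It is not Bowtie 2 itself, only a minimal demonstration of the redirection.

```shell
#!/usr/bin/env bash
# Stand-in for a program that writes results to stdout
# and a summary to stderr. The function name is hypothetical.
fake_aligner() {
    echo "alignment records"        # goes to stdout
    echo "alignment summary" >&2    # goes to stderr
}

workdir=$(mktemp -d)
# '>' captures stdout, '2>' captures stderr; the two streams land in separate files.
fake_aligner > "$workdir/out.txt" 2> "$workdir/run.log"

cat "$workdir/run.log"   # prints: alignment summary
```

Because the summary travels on a separate stream, redirecting it with 2> leaves the alignment output itself untouched.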
Required inputs:
- Single-end or paired-end read files in FASTA or FASTQ format (can be compressed)
Generated output:
- The output from Bowtie2 is an unsorted SAM file (Sequence Alignment/Map format). The SAM file is a tab-delimited text file that contains information for each individual read and its alignment to the genome. To learn how to work with this file format, view the page about samtools.
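Because SAM is tab-delimited text, it can be inspected with standard tools. The sketch below builds a made-up three-read SAM file (the read names, positions and sequences are invented for illustration) and counts mapped and unmapped records with awk. It relies on two facts from the SAM format: header lines start with @, and bit 0x4 of the FLAG field (column 2) marks an unmapped read.

```shell
#!/usr/bin/env bash
# Made-up three-read SAM file, used only for illustration.
sam=$(mktemp)
{
    printf '@HD\tVN:1.6\tSO:unsorted\n'
    printf '@SQ\tSN:chr1\tLN:1000\n'
    printf 'read1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n'
    printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n'
    printf 'read3\t16\tchr1\t200\t30\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n'
} > "$sam"

# Skip header lines (starting with '@'); column 2 is the FLAG field.
# Bit 0x4 of FLAG is set when a read is unmapped.
summary=$(awk -F'\t' '
    /^@/ { next }
    { if (int($2 / 4) % 2 == 1) unmapped++; else mapped++ }
    END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }
' "$sam")
echo "$summary"   # prints: mapped=2 unmapped=1
```

This counts alignment records rather than read pairs, so for paired-end data each mate is counted separately.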
If you have a single-end file, you can run:
bowtie2 -p 2 -q \
-x GCF_000385215 \
-U sample1.fastq.gz \
-S sample1_mapped.sam 2> bowtie2.log
For paired-end data you can do:
bowtie2 -p 2 -q \
-x GCF_000385215 \
-1 sample1_R1.fastq.gz \
-2 sample1_R2.fastq.gz \
-S sample1_mapped.sam 2> bowtie2.log
Basic options (for a full list, see the manual):
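-p: number of processors/cores
-q: reads are in FASTQ format
-x: /path/to/index_prefix (the prefix given to bowtie2-build, e.g. GCF_000385215)
-U: /path/to/FASTQ_file (single-end reads)
-1, -2: /path/to/FASTQ_files with the first and second mates (paired-end reads)
-S: /path/to/output/SAM_file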
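When several samples need to be aligned, the options above can be combined in a loop. The sketch below is a dry run: it only prints the single-end command for each file so it can be reviewed first. The helper name bowtie2_cmd, the file sample2.fastq.gz and the assumption that files are named <sample>.fastq.gz are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the single-end bowtie2 command for one FASTQ file.
# Assumes files are named <sample>.fastq.gz; the index prefix is passed in.
bowtie2_cmd() {
    local index="$1" fastq="$2" sample
    sample=$(basename "$fastq" .fastq.gz)
    echo "bowtie2 -p 2 -q -x ${index} -U ${fastq} -S ${sample}_mapped.sam 2> ${sample}_bowtie2.log"
}

# Dry run: print one command per sample so they can be reviewed before running.
for fq in sample1.fastq.gz sample2.fastq.gz; do
    bowtie2_cmd GCF_000385215 "$fq"
done
```

Giving each sample its own log file keeps the alignment summaries from overwriting one another, which would happen if every run wrote to the same bowtie2.log.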