Chopper

Introduction

Chopper is a tool for quality filtering of long-read data. It is a Rust implementation of two other long-read quality-filtering tools, NanoFilt and NanoLyse, both originally written in Python. Chopper is intended for long-read sequencing data, such as PacBio or Oxford Nanopore Technologies (ONT) reads, and filters and trims the reads in a FASTQ file (De Coster and Rademakers 2023).

Installation

Installed on crunchomics: Yes

  • Chopper v0.8 is installed as part of the bioinformatics share. If you have access to crunchomics but do not yet have access to the bioinformatics share, you can send an email with your UvA netID to Nina Dombrowski.
  • Afterwards, you can add the bioinformatics share as follows (if you have already done this in the past, you don’t need to run this command):
conda config --add envs_dirs /zfs/omics/projects/bioinformatics/software/miniconda3/envs/

If you want to install it yourself, you can run:

mamba create --name chopper -c bioconda chopper

Usage

Required input:

  • FASTQ files

Output:

  • FASTQ files

Example usage:

conda activate chopper0.8.0

gunzip -c results/porechop/my_reads.fastq.gz |\
    chopper -q 10 \
    --headcrop 0 --tailcrop 0 \
    -l 1000 \
    --threads 20 |\
    gzip > results/chopper/my_reads_filtered1000.fastq.gz

conda deactivate
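
After filtering, it can be worth checking that the output really contains only reads at or above the minimum length. The snippet below is a minimal sketch of such a check, not part of chopper itself: `fastq_stats` is a hypothetical helper name, and it assumes standard four-line FASTQ records (no wrapped sequence lines).

```shell
# Hypothetical helper (not part of chopper): summarise a FASTQ stream.
# Assumes 4-line FASTQ records, so every 2nd line of a record is the sequence.
fastq_stats() {
    awk 'NR % 4 == 2 {
             n++; len = length($0)
             if (min == "" || len < min) min = len
             if (len > max) max = len
         }
         END { printf "reads=%d min=%d max=%d\n", n, min, max }'
}

# Tiny in-memory demo with two reads of length 4 and 8
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n' | fastq_stats
# prints: reads=2 min=4 max=8

# On real data (path taken from the example above), one would run:
#   gunzip -c results/chopper/my_reads_filtered1000.fastq.gz | fastq_stats
```

With `-l 1000`, the reported `min` for the filtered file should be at least 1000.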

Useful arguments:

  • --headcrop Trim N nucleotides from the start of a read [default: 0]
  • --maxlength Sets a maximum read length [default: 2147483647]
  • -l, --minlength Sets a minimum read length [default: 1]
  • -q, --quality Sets a minimum average Phred quality score [default: 0]
  • --tailcrop Trim N nucleotides from the end of a read [default: 0]
  • --threads Number of parallel threads to use [default: 4]
  • --contam FASTA file with reference to check potential contaminants against [default: None]
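
To get a feel for what the `-q` threshold means for your data, the per-read average quality can be computed by hand. The sketch below is an illustration under assumptions and not chopper's own code: it assumes Phred+33 encoded, four-line FASTQ records, and it assumes the NanoFilt-style convention of averaging error probabilities and converting the mean back to a Phred score (rather than averaging raw Phred values). Check the chopper source for the exact calculation used by your version. `mean_read_quality` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of chopper): print read ID and mean Phred quality.
# Assumption: quality is averaged over error probabilities, Phred+33 encoding.
mean_read_quality() {
    awk '
        # Build a character-to-ASCII-code lookup table for printable characters
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
        # Header line: keep the read ID without the leading "@"
        NR % 4 == 1 { id = substr($1, 2) }
        # Quality line: convert each score to an error probability and average
        NR % 4 == 0 {
            n = length($0); sum = 0
            for (i = 1; i <= n; i++) {
                q = ord[substr($0, i, 1)] - 33
                sum += 10 ^ (-q / 10)
            }
            if (n > 0) printf "%s\t%.2f\n", id, -10 * log(sum / n) / log(10)
        }'
}

# Tiny in-memory demo: r1 is Q40 throughout, r2 mixes one Q40 and one Q20 base
printf '@r1\nACGT\n+\nIIII\n@r2\nAC\n+\nI5\n' | mean_read_quality
```

Under this convention a single low-quality base pulls the average down strongly: in the demo, the read with one Q40 and one Q20 base averages to roughly Q23 rather than Q30.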

References

De Coster, Wouter, and Rosa Rademakers. 2023. “NanoPack2: Population-Scale Evaluation of Long-Read Sequencing Data.” Bioinformatics 39 (5): btad311. https://doi.org/10.1093/bioinformatics/btad311.