NanoQC

Introduction

NanoQC is a quality control tool for long read sequencing data aiming to replicate some of the plots made by fastQC (De Coster et al. 2018).

Available on Crunchomics: Not by default

Installation

NanoQC is part of the Nanopack package and I would recommend installing this package to already have other useful tools installed. Therefore, we install a new conda environment called nanopack. If you already have an environment with tools for long-read analyses I suggest adding nanopack there instead.

#setup new conda environment, which we name nanopack
mamba create --name nanopack -c conda-forge -c bioconda python=3.6 pip

#activate environment
conda activate nanopack 

#install nanopack software tools using pip
$HOME/personal/mambaforge/envs/nanopore/bin/pip3 install nanopack

#close environment
conda deactivate

Usage

  • Inputs: Fastqc.gz file
  • Output: An HTML with quality information

Example code:

#start environment
conda activate nanopack

#run on a single file
nanoQC myfile.fastq.gz -o outputfolder

Useful arguments (for the full version, check the manual):

  • -l, --minlen {int} Minimum length of reads to be included in the plots. This also controls the length plotted in the graphs from the beginning and end of reads (length plotted = minlen / 2)

References

De Coster, Wouter, Svenn D’Hert, Darrin T Schultz, Marc Cruts, and Christine Van Broeckhoven. 2018. “NanoPack: Visualizing and Processing Long-Read Sequencing Data.” Bioinformatics 34 (15): 2666–69. https://doi.org/10.1093/bioinformatics/bty149.