porechop_readme – Bioinformatics guidance page

Porechop

Introduction

Porechop is a tool for finding and removing adapters from Oxford Nanopore reads. Adapters on the ends of reads are trimmed off, and when a read has an adapter in its middle, it is treated as chimeric and chopped into separate reads. Porechop performs thorough alignments to effectively find adapters, even at low sequence identity.

Notice: From 2018 on, porechop is not actively maintained anymore. It runs perfectly fine, but that is something to keep in mind when running into bugs.

Installation

Installed on crunchomics: Yes,

Porechop v0.2.4 is installed as part of the bioinformatics share. If you have access to crunchomics and have not yet access to the bioinformatics you can send an email with your Uva netID to Nina Dombrowski.
Afterwards, you can add the bioinformatics share as follows (if you have already done this in the past, you don’t need to run this command):

conda config --add envs_dirs /zfs/omics/projects/bioinformatics/software/miniconda3/envs/

If you want to install it yourself, you can run:

mamba create --name porechop -c conda-forge -c bioconda porechop=0.2.4

Usage

Important

Adaptor trimming very much depends on how the sequencing library was generated. Therefore, I recommend to carefully read through the How it works section of the softwares manual to know what to expect and look out for.

Similarly, porechop works with both demultiplexed and non-demultiplexed sequences. Also here, the manual explains in more detail how to perform barcode demultiplexing.

Required input:

FASTA/FASTQ of input reads or a directory which will be recursively searched for FASTQ files (required and can be fasta,fastq,fasta.gz,fastq.gz)

Output:

FASTA or FASTQ of trimmed reads

Example code:

conda activate porechop_0.2.4

porechop --input myfile.fastq.gz \
  --output outputfolder/myfile_filtered.fastq.gz \
  --threads 1 \
  --discard_middle

conda deactivate

Useful arguments (for the full version, check the manual):

-b {BARCODE_DIR}, --barcode_dir {BARCODE_DIR}: Reads will be binned based on their barcode and saved to separate files in this directory (incompatible with –output)
--barcode_threshold {BARCODE_THRESHOLD} A read must have at least this percent identity to a barcode to be binned (default: 75.0)
--barcode_diff {BARCODE_DIFF} If the difference between a read’s best barcode identity and its second-best barcode identity is less than this value, it will not be put in a barcode bin (to exclude cases which are too close to call) (default: 5.0)
--adapter_threshold {ADAPTER_THRESHOLD} An adapter set has to have at least this percent identity to be labelled as present and trimmed off (0 to 100) (default: 90.0)
--check_reads{CHECK_READS} This many reads will be aligned to all possible adapters to determine which adapter sets are present (default: 10000)
--no_split Skip splitting reads based on middle adapters (default: split reads when an adapter is found in the middle)
--discard_middle Reads with middle adapters will be discarded (default: reads with middle adapters are split) (required for reads to be used with Nanopolish, this option is on by default when outputting reads into barcode bins)

Porechop

Introduction

Installation

Usage

References