FeatureCounts

Introduction

FeatureCounts is part of the Subread software package, a toolkit for processing next-generation sequencing data (Liao, Smyth, and Shi 2013). The package includes the Subread aligner, the Subjunc exon-exon junction detector, and the featureCounts read summarization program.

FeatureCounts counts how many reads map to features such as genes, exons, promoters, and genomic bins. It is therefore useful after you have aligned sequences (from a genome, metagenome, or transcriptome) to reference sequences and want to generate a count table.

Detailed documentation can be downloaded from here.

Installation

Installed on crunchomics: No

If you want to install it yourself, you can run:

mamba create -n subread_2.0.6
mamba install -n subread_2.0.6 -c bioconda subread=2.0.6
mamba activate subread_2.0.6
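
To confirm that the installation worked, a quick check along these lines can help. This is a minimal sketch: it assumes the environment created above is active, and the `fc_status` variable is only an illustrative name.

```shell
# Check whether featureCounts is on the PATH after activating the environment
if command -v featureCounts >/dev/null 2>&1; then
    # Print the installed version (should match the version requested above)
    featureCounts -v
    fc_status="found"
else
    echo "featureCounts not found; activate the environment first" >&2
    fc_status="missing"
fi
```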

Usage

FeatureCounts takes as input an annotation file in GTF or GFF format and one or more sorted BAM files.

It outputs a tab-delimited text file with the counts for each feature (in our example, CDS) per sample. Note that you can use a wildcard to generate a count table for multiple BAM files at the same time.

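# Create the output directory first
mkdir -p results/featurecounts/ncbi_gtf

# -T 5: use 5 threads; -t CDS: count CDS features; -g gene_id: group counts by gene_id;
# -M: also count multi-mapping reads
featureCounts -T 5 -t CDS -g gene_id -M \
    -a data/genome/genomic.gtf \
    -o results/featurecounts/ncbi_gtf/counts.txt \
    results/bowtie/*_mapped_sorted.bam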
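
The counts file starts with a `#` comment line recording the command, followed by a header row with the columns Geneid, Chr, Start, End, Strand and Length and then one count column per BAM file. The sketch below shows one way to reduce this to a plain feature-by-sample matrix. The `simplify_counts` helper and the `_matrix.txt` file name are illustrative choices, not part of featureCounts.

```shell
# Drop the leading "#" comment line and the Chr/Start/End/Strand/Length columns,
# keeping the feature ID (column 1) and the per-sample counts (column 7 onwards).
simplify_counts() {
    grep -v '^#' "$1" | cut -f1,7-
}

counts="results/featurecounts/ncbi_gtf/counts.txt"

# Only process the files if the featureCounts run above has produced them
if [ -f "$counts" ]; then
    simplify_counts "$counts" > "${counts%.txt}_matrix.txt"
fi
if [ -f "${counts}.summary" ]; then
    # Assigned and unassigned read totals per BAM file
    cat "${counts}.summary"
fi
```

The `.summary` file is worth checking after every run: a low number of assigned reads often points to a mismatch between the `-t`/`-g` values and the annotation file.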

Useful options:

  • -a Name of an annotation file. GTF/GFF format by default. See -F option for more format information. Inbuilt annotations (SAF format) are available in the ‘annotation’ directory of the package. Gzipped files are also accepted.
  • -o Name of output file including read counts. A separate file including summary statistics of counting results is also included in the output (‘.summary’). Both files are in tab delimited format.
  • -t Specify feature type(s) in a GTF annotation. If multiple types are provided, they should be separated by ‘,’ with no space in between. ‘exon’ by default. Rows in the annotation with a matched feature will be extracted and used for read mapping.
  • -g Specify attribute type in GTF annotation. ‘gene_id’ by default. Meta-features used for read counting will be extracted from annotation using the provided value.
  • -M Multi-mapping reads will also be counted. For a multi-mapping read, all its reported alignments will be counted. The ‘NH’ tag in BAM/SAM input is used to detect multi-mapping reads.
  • -L Count long reads such as Nanopore and PacBio reads. Long read counting can only run in one thread and only reads (not read-pairs) can be counted. There is no limitation on the number of ‘M’ operations allowed in a CIGAR string in long read counting.
  • --maxMOp Maximum number of ‘M’ operations allowed in a CIGAR string. 10 by default. Both ‘X’ and ‘=’ are treated as ‘M’ and adjacent ‘M’ operations are merged in the CIGAR string.
  • -p If specified, libraries are assumed to contain paired-end reads. For any library that contains paired-end reads, the --countReadPairs option controls whether read pairs (fragments) or individual reads are counted.
  • -s Perform strand-specific read counting. A single integer value (applied to all input files) or a string of comma-separated values (applied to each corresponding input file) should be provided. Possible values include: 0 (unstranded), 1 (stranded) and 2 (reversely stranded). Default value is 0 (i.e. unstranded read counting carried out for all input files).
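
The paired-end and strandedness options described above can be combined as in the following sketch. It assumes a paired-end, reversely stranded library, which is an assumption for illustration only; check your library preparation protocol before choosing a value for -s. The `count_pairs` function name and the output directory are hypothetical, while the input paths are reused from the example above.

```shell
# Assumed output directory for this illustration (not from the original example)
outdir="results/featurecounts/paired_end"

count_pairs() {
    # -p: input is paired-end; --countReadPairs: count fragments rather than single reads
    # -s 2: reversely stranded; use 0 or 1 for unstranded or forward-stranded libraries
    featureCounts -T 5 -p --countReadPairs -s 2 \
        -t CDS -g gene_id \
        -a data/genome/genomic.gtf \
        -o "$outdir/counts.txt" \
        "$@"
}

# Only run when featureCounts and at least one BAM file are available
if command -v featureCounts >/dev/null 2>&1 && ls results/bowtie/*_mapped_sorted.bam >/dev/null 2>&1; then
    mkdir -p "$outdir"
    count_pairs results/bowtie/*_mapped_sorted.bam
else
    echo "Skipping: featureCounts or BAM files not found" >&2
fi
```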

References

Liao, Yang, Gordon K. Smyth, and Wei Shi. 2013. “featureCounts: An Efficient General Purpose Program for Assigning Sequence Reads to Genomic Features.” Bioinformatics 30 (7): 923–30. https://doi.org/10.1093/bioinformatics/btt656.