mamba create -n subread_2.0.6
mamba install -n subread_2.0.6 -c bioconda subread=2.0.6
mamba activate subread_2.0.6
FeatureCounts
Introduction
FeatureCounts is part of the Subread software package, a tool kit for processing next-gen sequencing data (Liao, Smyth, and Shi 2013). It includes Subread aligner, Subjunc exon-exon junction detector and the featureCounts read summarization program.
FeatureCounts is a program that counts how many reads map to features, such as genes, exon, promoter and genomic bins. Therefore, it is useful to use after you, for example, aligned sequences (from a genome, metagenome, transcriptome) to reference sequences and want to generate a count table.
A detailed documentation can be downloaded from here.
Installation
Installed on crunchomics: No
If you want to install it yourself, you can run:
Usage
FeatureCounts takes as input a annotation file in gtf or gff format and a sorted bam file.
It outputs a text file with the counts for each feature (in our example CDS) per sample. Notice, how you can use a wildcard to generate a counts table for multiple bam files at the same time.
featureCounts -T 5 -t CDS -g gene_id -M \
-a data/genome/genomic.gtf \
-o results/featurecounts/ncbi_gtf/counts.txt \
*_mapped_sorted.bam results/bowtie/
Useful options:
-a
Name of an annotation file. GTF/GFF format by default. See -F option for more format information. Inbuilt annotations (SAF format) is available in ‘annotation’ directory of the package. Gzipped file is also accepted. -o
Name of output file including read counts. A separate file including summary statistics of counting results is also included in the output (‘ .summary’). Both files are in tab delimited format. -t
Specify feature type(s) in a GTF annotation. If multiple types are provided, they should be separated by ‘,’ with no space in between. ‘exon’ by default. Rows in the annotation with a matched feature will be extracted and used for read mapping. -g
Specify attribute type in GTF annotation. ‘gene_id’ by default. Meta-features used for read counting will be extracted from annotation using the provided value. -M
Multi-mapping reads will also be counted. For a multi- mapping read, all its reported alignments will be counted. The ‘NH’ tag in BAM/SAM input is used to detect multi-mapping reads.-L
Count long reads such as Nanopore and PacBio reads. Long read counting can only run in one thread and only reads (not read-pairs) can be counted. There is no limitation on the number of ‘M’ operations allowed in a CIGAR string in long read counting.--maxMOp
Maximum number of ‘M’ operations allowed in a CIGAR string. 10 by default. Both ‘X’ and ‘=’ are treated as ‘M’ and adjacent ‘M’ operations are merged in the CIGAR string. -p
If specified, libraries are assumed to contain paired-end reads. For any library that contains paired-end reads, the ‘countReadPairs’ parameter controls if read pairs or reads should be counted.-s
Perform strand-specific read counting. A single integer value (applied to all input files) or a string of comma- separated values (applied to each corresponding input file) should be provided. Possible values include: 0 (unstranded), 1 (stranded) and 2 (reversely stranded). Default value is 0 (ie. unstranded read counting carried out for all input files).