cd <path_to_software_folder>
git clone https://github.com/deweylab/RSEM.git
mv RSEM/ RSEM_1.3.3
cd RSEM_1.3.3
make
make install prefix=/<path_to_software_folder>/RSEM_1.3.3
RSEM
Introduction
RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data (Li and Dewey 2011). The RSEM package supports threads for parallel computation of the EM algorithm, single-end and paired-end read data, quality scores, variable-length reads and RSPD estimation. In addition, it provides posterior mean and 95% credibility interval estimates for expression levels. For visualization, it can generate BAM and Wiggle files in both transcript-coordinate and genomic-coordinate. Genomic-coordinate files can be visualized by both UCSC Genome browser and Broad Institute’s Integrative Genomics Viewer (IGV). Transcript-coordinate files can be visualized by IGV. RSEM also has its own scripts to generate transcript read depth plots in pdf format.
For a full list of options, we recommend that the users visits the documentation and the tutorial.
Installation
Installed on crunchomics: Yes,
- RSEM v1.3.3 is installed as part of the bioinformatics share. If you have access to crunchomics and have not yet access to the bioinformatics you can send an email with your Uva netID to Nina Dombrowski.
- Individual RSEM modules can be found in this folder
/zfs/omics/projects/bioinformatics/software/RSEM_1.3.3/bin/
If you want to install it yourself, you can run:
Usage
There are many different options how to use RSEM as indicted in the figure below:
For a full list of options, we recommend that the users visits the documentation and the tutorial.
Below, you find a very brief example on how to use RSEM to estimate gene and isoform expression with BAM mapping files that were generated with STAR using --quantMode TranscriptomeSAM
. Notice, you don’t need to run the mapping with an aligner separately, as RSEM supports read mapping with Bowtie2, STAR and HISAT2.
#prepare folders
mkdir -p data/genome_files/rsem_ref
mkdir -p results/quantification/rsem
#prepare the reference files
#for this step we need the fasta and gtf files from a reference assembly (the same assembly that was also used when running STAR)
/zfs/omics/projects/bioinformatics/software/RSEM_1.3.3/bin/rsem-prepare-reference \
--gtf data/genome_files/assembly.gtf \
\
data/genome_files/assembly.fasta
data/genome_files/rsem_ref
#caculate the expression
#for this we need the bam files generated using STAR
/zfs/omics/projects/bioinformatics/software/RSEM_1.3.3/bin/rsem-calculate-expression \
--bam --no-bam-output -p 12 --paired-end --forward-prob 0.5 \
\
results/star_output/sample1_Aligned.toTranscriptome.out.bam \
data/genome_files/rsem_ref >& results/quantification/rsem/sample1.log
results/quantification/rsem/sample1
#if you ran star and rsem on more than one sample via a for-loop, then you can use the code below to
#combine rsem results for multiple samples
python3 /zfs/omics/projects/bioinformatics/scripts/combine_rsem.py \
-i 06_mapping/quantification/rsem \
-o 06_mapping/quantification/rsem/rsem_genes.tsv \
-t genes
python3 /zfs/omics/projects/bioinformatics/scripts/combine_rsem.py \
-i 06_mapping/quantification/rsem \
-o 06_mapping/quantification/rsem/rsem_transcripts.tsv \
-t isoforms
Options used in the example (please read the manual for a full set of options!):
--paired-end
is applicable to paired stranded RNA-seq data--forward-prob
NUM. Here:- 0 is for a strand-specific protocol where all (upstream) read are derived from the reverse strand
- 1 is for a strand-specific protocol where all (upstream) reads are derived from the forward strand
- 0.5 is for a non-strand specific protocol
A note on combining results from different samples
We provide a small python script o combine the rsem outputs from multiple files. This script can be found on Crunchomics but you are free to combine the files yourself in your favorite computational language. The script combine_rsem.py
requires that the gene and isoform files are located in a specific folder. Namely, each sample should be have a folder that has the same name as the sample. In the example above, the script expects a folder called sample1 in the target folder specified in the script, i.e. results/quantification/rsem/
. In the sample1 folder, the script will look for two files, rsem.genes.results
and rsem.isoforms.results
.
If you want to use DESeq2, then you can import the individual tables into R with tximport.