This lesson is still being designed and assembled (Pre-Alpha version)

Introduction to RNA-seq: Glossary

Key Points

01 Introduction
  • An RNA-Seq experiment is an experiment like any other: it needs controls, treatments, replication, etc.

  • A canonical RNA-Seq experiment consists of RNA library preparation and sequencing, followed by bioinformatic analyses.

  • RNA-Seq yields a snapshot of individual gene expression levels (count table).

  • Once the bioinformatic steps are complete, the RNA-Seq results can be analyzed with the DESeq2 R package.
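
DESeq2 builds its dataset from two inputs: a genes × samples table of raw (un-normalized) integer counts and a table describing each sample, for example via `DESeqDataSetFromMatrix(countData = ..., colData = ..., design = ...)`. The sketch below uses Python rather than R, with hypothetical gene and sample names, to show that layout and a few sanity checks worth running before handing the data to DESeq2. The helper `check_inputs` is illustrative only and is not part of DESeq2.

```python
# Hypothetical count table: gene id -> raw read counts, one per sample,
# in the same order as `sample_ids`.
counts = {
    "gene1": [120, 98, 310, 290],
    "gene2": [0, 3, 1, 0],
    "gene3": [45, 52, 40, 61],
}
sample_ids = ["ctrl_1", "ctrl_2", "treated_1", "treated_2"]

# Hypothetical sample table (the equivalent of DESeq2's colData).
sample_table = {
    "ctrl_1": "control",
    "ctrl_2": "control",
    "treated_1": "treated",
    "treated_2": "treated",
}


def check_inputs(counts, sample_ids, sample_table):
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems = []
    missing = [s for s in sample_ids if s not in sample_table]
    if missing:
        problems.append(f"samples without metadata: {missing}")
    for gene, row in counts.items():
        if len(row) != len(sample_ids):
            problems.append(f"{gene}: expected {len(sample_ids)} counts, got {len(row)}")
        # DESeq2 models raw counts, so normalized or fractional values are a red flag.
        if any(not isinstance(c, int) or c < 0 for c in row):
            problems.append(f"{gene}: counts must be non-negative integers")
    return problems


print(check_inputs(counts, sample_ids, sample_table))  # → []
```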

02 Statistics & Experimental design
  • Low statistical power reduces the chance of detecting a true effect.

  • Replication, randomization and blocking are the three core principles of proper experimental design.

  • Confounding happens when two sources of variation cannot be distinguished from one another.

  • Randomize what you cannot control, block what you can control.

  • Maximizing the number of biological replicates in RNA-seq experiments is key to increasing statistical power and reducing the number of false negatives.
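
"Block what you can control, randomize what you cannot" can be made concrete for a common situation: samples have to be processed in batches (library preparation days, sequencing lanes). If each batch contains every condition equally often, batch and condition cannot be confounded. The sketch below, assuming two conditions and equally sized batches, treats batches as blocks and randomizes both which samples share a batch and which condition each sample gets within its batch. The function name and sample names are hypothetical.

```python
import random


def blocked_randomization(sample_ids, conditions, batch_size, seed=None):
    """Assign a condition and a batch to every sample.

    Each batch (block) receives every condition equally often, and the
    assignment inside each batch is random. Assumes len(sample_ids) is a
    multiple of batch_size and batch_size is a multiple of len(conditions).
    """
    if len(sample_ids) % batch_size or batch_size % len(conditions):
        raise ValueError("batch_size must divide the number of samples "
                         "and be divisible by the number of conditions")
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)  # randomize which samples end up in the same batch
    design = []
    for b, start in enumerate(range(0, len(shuffled), batch_size)):
        batch = shuffled[start:start + batch_size]
        # Balanced within the block: each condition appears equally often.
        labels = conditions * (batch_size // len(conditions))
        rng.shuffle(labels)  # randomize the assignment inside the block
        for sample, condition in zip(batch, labels):
            design.append((sample, condition, f"batch{b + 1}"))
    return design


samples = [f"S{i}" for i in range(1, 9)]
for row in blocked_randomization(samples, ["control", "treated"], batch_size=4, seed=1):
    print(row)
```

Because every batch holds two control and two treated samples, a batch effect shifts both conditions alike and can later be modelled (for example as an extra term in the design) instead of being mistaken for a treatment effect.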

03 From fastq files to alignments
  • Next-Generation Sequencing techniques sequence millions of cDNA fragments in a massively parallel way.

  • Sequencing files are produced in a standard format: the fastq format.

  • Using FastQC, one can easily check the sequencing quality of a fastq file.

  • Read trimming ensures that no low-quality sequence and no sequencing adapter are left in your final reads.

  • Align RNA-seq reads to a reference genome using a splice-aware aligner like STAR.
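
A fastq record is four lines: an `@` header with the read name, the sequence, a `+` separator, and one quality character per base. With the Phred+33 encoding used by current Illumina data, the quality score of a base is its ASCII code minus 33. The sketch below parses records and performs a deliberately simple 3' trim: cut at an exact adapter match, then drop trailing bases below a quality threshold. It is an illustration only; FastQC and dedicated trimmers handle partial and mismatched adapter matches, sliding windows and much more. The function names, the threshold of 20 and the adapter string (assumed here to be the common start of the Illumina adapter) are illustrative choices.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed: common start of Illumina adapters; check your kit


def parse_fastq(lines):
    """Yield (read_id, sequence, qualities) from fastq text split into lines.

    Assumes the 4-line layout and Phred+33 quality encoding.
    """
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed record starting at line {i + 1}")
        # Phred+33: quality score = ASCII code - 33 ('I' -> 40, '#' -> 2)
        yield header[1:], seq, [ord(ch) - 33 for ch in qual]


def trim_3prime(seq, quals, min_quality=20, adapter=None):
    """Cut at an exact adapter match (if any), then drop low-quality 3' bases."""
    if adapter:
        pos = seq.find(adapter)
        if pos != -1:
            seq, quals = seq[:pos], quals[:pos]
    end = len(seq)
    while end > 0 and quals[end - 1] < min_quality:
        end -= 1
    return seq[:end], quals[:end]


lines = [
    "@read1",
    "ACGTACGTAGATCGGAAGAGC",   # 8 insert bases followed by the adapter
    "+",
    "IIIIII##IIIIIIIIIIIII",   # two low-quality bases just before the adapter
]
for read_id, seq, quals in parse_fastq(lines):
    trimmed_seq, _ = trim_3prime(seq, quals, min_quality=20, adapter=ADAPTER)
    print(read_id, trimmed_seq)  # prints "read1 ACGTAC"
```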

04 Visualizing and counting the alignments
  • The SAM/BAM format is the end-result of a read alignment to a reference genome.

  • The samtools software can be used to view, filter and sort alignments in a .bam file.

  • The alignments can be visualized along the reference genome using a genome viewer such as IGV.

  • The resulting BAM files are used to generate a count table for use in differential expression analyses.
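
Counting turns alignments into the count table. Each SAM line has tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, ...), and bits of FLAG mark unmapped (0x4), secondary (0x100) and supplementary (0x800) alignments, which are usually skipped so each read counts once. The sketch below assumes single-end reads and assigns a read using only its leftmost position (POS) against hypothetical gene intervals; real counting tools such as htseq-count or featureCounts use the full alignment span from the CIGAR, exon annotations from a GTF file and library strandedness.

```python
from collections import Counter

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def count_reads(sam_lines, genes):
    """Count reads per gene; genes maps name -> (chrom, start, end), 1-based inclusive."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos = int(fields[1]), fields[2], int(fields[3])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # keep only the primary alignment of mapped reads
        hits = [g for g, (c, start, end) in genes.items()
                if c == chrom and start <= pos <= end]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            counts["ambiguous"] += 1   # overlaps several genes
        else:
            counts["no_feature"] += 1  # falls outside every gene


    return counts


def sam(*fields):
    """Build a tab-separated SAM line (helper for this example only)."""
    return "\t".join(str(f) for f in fields)


sam_lines = [
    "@SQ\tSN:chr1\tLN:10000",
    sam("r1", 0, "chr1", 150, 60, "50M", "*", 0, 0, "*", "*"),
    sam("r2", 16, "chr1", 1200, 60, "50M", "*", 0, 0, "*", "*"),  # reverse strand
    sam("r3", 4, "*", 0, 0, "*", "*", 0, 0, "*", "*"),            # unmapped
    sam("r4", 256, "chr1", 160, 0, "50M", "*", 0, 0, "*", "*"),   # secondary
    sam("r5", 0, "chr1", 5000, 60, "50M", "*", 0, 0, "*", "*"),
]
genes = {"geneA": ("chr1", 100, 1000), "geneB": ("chr1", 1100, 2000)}
print(dict(count_reads(sam_lines, genes)))
```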

05 Assessing the quality of RNA-seq experiments
  • Several biases, including differences in sequencing depth, can cause analysis artifacts and must be corrected through scaling/normalisation.

  • RNA-seq produces multivariate sample-level data that can be explored through dimensionality reduction methods (e.g. PCA).

  • Sample clustering and PCA should indicate whether the observed experimental variability can be explained by the experimental design.
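
DESeq2's default correction for sequencing depth is the median-of-ratios size factor: for each gene, take the geometric mean of its counts across samples; for each sample, take the median of its count-to-geometric-mean ratios. Genes with a zero in any sample are left out because their geometric mean is zero. The sketch below reproduces that idea in plain Python, working in log space as DESeq2 does; it is an illustration on made-up counts, not a replacement for `estimateSizeFactors`.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; counts is a list of genes, each a list of raw counts per sample."""
    n_samples = len(counts[0])
    # A zero in any sample makes the geometric mean 0, so such genes are skipped.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("every gene has at least one zero count")
    # log of the geometric mean across samples, per gene
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Made-up counts: sample 2 was sequenced twice as deeply as sample 1.
counts = [
    [10, 20],
    [20, 40],
    [30, 60],
    [5, 10],
    [0, 7],  # excluded from the size-factor estimate
]
sf = size_factors(counts)
normalized = [[c / f for c, f in zip(row, sf)] for row in counts]
print([round(f, 3) for f in sf])  # → [0.707, 1.414]
```

After dividing by the size factors, the first four genes have identical normalized values in both samples, which is what we expect when the only difference is depth.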

06 Differential expression analysis
  • Calling differentially expressed genes requires knowing how to specify the right contrast.

  • Multiple hypothesis testing correction is required because multiple statistical tests are being run simultaneously.

  • Volcano plots and heatmaps are useful representations to visualise differentially expressed genes.
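
The `padj` column reported by DESeq2's `results()` is, by default (`pAdjustMethod = "BH"`), the Benjamini-Hochberg adjustment that controls the false discovery rate across all tested genes; R's `p.adjust(p, method = "BH")` computes the same thing. The sketch below implements the procedure in Python: sort p-values, scale each by m / rank, then enforce monotonicity from the largest p-value down. The p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values never
    # decrease with rank, and cap them at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
print([round(p, 3) for p in benjamini_hochberg(pvals)])  # → [0.02, 0.04, 0.04, 0.02]
```

In a volcano plot these adjusted values usually go on the y axis as -log10(padj), against the log2 fold change on the x axis.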

07 Functional enrichment analysis
08 Cluster analysis

09 Transcriptomic and metabolomic data integration
  • Integrating transcriptomic data with metabolic pathways requires mapping gene identifiers to their corresponding pathways.
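
In practice the mapping step is a lookup from the gene identifiers in your results to the identifiers used by a pathway resource, and the genes that fail to map deserve as much attention as those that do. The sketch below assumes a hypothetical gene-to-pathway table (in a real analysis it would come from a pathway database export using the same identifier type as your count table) and groups a list of differentially expressed genes by pathway.

```python
from collections import defaultdict

# Hypothetical gene -> pathway table; identifiers are placeholders.
GENE_TO_PATHWAYS = {
    "gene1": ["pathway_A"],
    "gene2": ["pathway_A", "pathway_B"],  # one gene can belong to several pathways
    "gene3": ["pathway_C"],
}


def map_to_pathways(de_genes, gene_to_pathways):
    """Group genes by pathway and report the genes with no pathway annotation."""
    by_pathway = defaultdict(list)
    unmapped = []
    for gene in de_genes:
        pathways = gene_to_pathways.get(gene)
        if not pathways:
            unmapped.append(gene)  # e.g. identifier type mismatch or no annotation
            continue
        for pathway in pathways:
            by_pathway[pathway].append(gene)
    return dict(by_pathway), unmapped


pathways, unmapped = map_to_pathways(["gene1", "gene2", "gene4"], GENE_TO_PATHWAYS)
print(pathways)  # → {'pathway_A': ['gene1', 'gene2'], 'pathway_B': ['gene2']}
print(unmapped)  # → ['gene4']
```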

Glossary

FIXME