This lesson is still being designed and assembled (Pre-Alpha version)

Introduction to RNA-seq: Glossary

Key Points

01 Introduction
An RNA-seq experiment is also a normal experiment (control, treated, replication, etc.).
A canonical RNA-seq experiment consists of RNA library preparation followed by bioinformatic analyses.
RNA-seq yields a snapshot of individual gene expression levels (count table).
Upon completion of the bioinformatic steps, the analysis of RNA-seq results can be done using the DESeq2 R package.

02 Statistics & Experimental design
Low statistical power reduces the chance of detecting a true effect.
Replication, randomization and blocking are the three core principles of proper experimental design.
Confounding happens when two sources of variation cannot be distinguished from one another.
Randomize what you cannot control, block what you can control.
Maximizing the number of biological replicates in RNA-seq experiments is key to increasing statistical power and lowering the number of false negatives.

03 From fastq files to alignments
Next-generation sequencing techniques perform massively parallel cDNA sequencing.
Sequencing files are produced in a standard format: the fastq format.
Using FastQC, one can easily check the sequencing quality of a fastq file.
Read trimming ensures that no low-quality sequence and no sequencing adapter is left in your final reads.
Align RNA-seq reads to a reference genome using a splice-aware aligner like STAR.

04 Visualizing and counting of the alignments
The SAM/BAM format is the end result of a read alignment to a reference genome.
The samtools software can be used to view, filter and order alignments in a .bam file.
The alignments can be visualized on a genome using a genome viewer like IGV.
The resulting BAM files are used to generate a count table for use in differential expression analyses.

05 Assessing the quality of RNA-seq experiments
Several biases, including sequencing depth, can result in analysis artifacts and must be corrected through scaling/normalisation.
Sample-level RNA-seq analysis yields a multivariate output that can be explored through data reduction methods (e.g. PCA).
Sample clustering and PCA should indicate whether the observed experimental variability can be explained by the experimental design.

06 Differential expression analysis
Calling differentially expressed genes requires knowing how to specify the right contrast.
Multiple hypothesis testing correction is required because multiple statistical tests are being run simultaneously.
Volcano plots and heatmaps are useful representations to visualise differentially expressed genes.

07 Functional enrichment analysis

08 Cluster analysis

09 Transcriptomic and metabolomic data integration
Integrating transcriptomic data with metabolic pathways requires mapping gene identifiers to their corresponding pathways.
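
The key point on multiple hypothesis testing correction (06 Differential expression analysis) can be made concrete with a small sketch. The Python snippet below is an illustrative, self-written implementation of the Benjamini-Hochberg procedure, one common correction method; it is not taken from the lesson, and the five p-values are hypothetical. In practice an established routine such as R's `p.adjust(method = "BH")` would normally be used instead.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that each adjusted value is the
    # minimum of its own scaled value and those of all larger p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        scaled = pvalues[idx] * m / rank
        running_min = min(running_min, scaled)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five genes (illustrative only).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
padj = benjamini_hochberg(raw)

# Genes kept at a 5% false discovery rate threshold.
significant = [p < 0.05 for p in padj]
print(padj)
print(significant)
```

With these made-up values, four genes have a raw p-value below 0.05 but only two remain below 0.05 after adjustment, which is the effect the key point describes: running many tests at once inflates the number of apparent hits unless the p-values are corrected.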
