This lesson is still being designed and assembled (Pre-Alpha version)

# Introduction to RNA-seq: Welcome!

## Welcome!

This lesson will introduce you to the basics of gene expression analysis using RNA-Seq (short for RNA sequencing). Thanks to considerable technical progress and steadily decreasing costs, RNA-Seq has become a standard technique in biology.

It is going to be fun and empowering! You will discover how total RNA is converted into short sequences called “reads”, which can in turn be used to gain insights into gene expression. Combined with careful experimental design, this gene expression information can open new research avenues and answer crucial questions.

We will mostly use R and its companion RStudio to perform our RNA-Seq analyses and visualisations.

Depending on the participants' level, the bioinformatics part (QC of fastq files, genome alignment, counting, etc.) may also be covered. For this, we will use a Docker image containing all the necessary datasets and software.

Before you begin, be sure you are all set up (see below). For complete information, see the Setup section.

## Main learning objectives

After completing this lesson, you should be able to:

• Indicate the reasons for doing an RNA-Seq experiment.
• Identify good practices when designing an RNA-Seq experiment.
• Memorize the steps of a complete RNA-Seq experiment: from sequencing to analysis.
• Assess the quality of RNA-seq sequencing data (“reads”) using command-line tools in the cloud (Linux).
• Align RNA-seq reads to a reference genome using a splice-aware aligner (e.g. STAR).
• Generate a count matrix from the RNA-seq alignments.
• Perform a QC of your experiment through Principal Component Analysis (PCA) and sample clustering.
• Execute a differential gene expression analysis using R and the DESeq2 package.
• Create key plots: volcano plots, heatmaps and clustering of differentially expressed genes.
• Provide a biological interpretation of differentially expressed genes through ORA/GSEA analyses and data integration.
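To give a first taste of what “a count matrix” and “normalizing read counts” mean, here is a minimal Python sketch. It uses the median-of-ratios idea that DESeq2 applies when estimating size factors. It is only an illustration: the lesson itself does this in R with DESeq2. The gene names, sample names and counts below are made up, and the helper `size_factors` is hypothetical, not part of any library.

```python
import math
from statistics import median

# Hypothetical toy count matrix: one row per gene, one column per sample.
# Real count matrices (produced after alignment and counting) have thousands of genes.
samples = ["sample_1", "sample_2", "sample_3"]
counts = {
    "gene_A": [10, 20, 30],
    "gene_B": [100, 200, 300],
    "gene_C": [5, 10, 15],
    "gene_D": [0, 4, 8],  # contains a zero, so it is ignored when estimating size factors
}


def size_factors(counts, n_samples):
    """Hypothetical helper: median-of-ratios size factors (the idea behind DESeq2's default)."""
    ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue  # the geometric mean would be 0, so this gene is skipped
        # Geometric mean of the gene across samples, computed on the log scale.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            # Ratio of this sample's count to the gene's geometric mean.
            ratios[j].append(math.exp(math.log(c) - log_geo_mean))
    # The size factor of a sample is the median of its ratios over all retained genes.
    return [median(r) for r in ratios]


sf = size_factors(counts, len(samples))

# Normalized counts: divide each sample's raw counts by that sample's size factor.
normalized = {gene: [c / s for c, s in zip(row, sf)] for gene, row in counts.items()}

for name, s in zip(samples, sf):
    print(f"{name}: size factor = {s:.3f}")
for gene, row in normalized.items():
    print(gene, [round(v, 2) for v in row])
```

In this toy example, sample_2 and sample_3 were simply “sequenced deeper” than sample_1. After dividing by the size factors, gene_A, gene_B and gene_C each end up with the same normalized value in every sample. This is the kind of sequencing-depth effect that scaling is meant to remove before comparing samples.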

## Before you start

Before the training, please make sure you have done the following:

1. Check the lesson Setup section to see what you need to prepare.
2. Read the workshop Code of Conduct to make sure this workshop stays welcoming for everybody.
3. Get comfortable: if you’re not in a physical workshop, use two screens if possible. You will be working in RStudio on your own computer while also following this tutorial. More instructions are available in the Setup section of the workshop website.

## Citation

If you make use of this material in some way (teaching, vocational training, research), please cite us: “Bliek Tijs, Frans van der Kloet and Marc Galland” (eds): “RNA-seq lesson.” Version 2020.04. https://github.com/ScienceParkStudyGroup/rnaseq-lesson

## Credits

This lesson is heavily based on teaching materials from the Harvard Chan Bioinformatics Core (HBC) in-depth NGS data analysis course. Materials have been adapted and some exercises created to comply with the Carpentries Foundation teaching requirements.

## Schedule

| Time | Episode | Questions |
|---|---|---|
| | Setup | Download files required for the lesson |
| 00:00 | 01 Introduction | What can I learn by doing this RNA-Seq lesson? What are the tools that I will be using? What are the tidy data principles? Why is working in a more open way beneficial? |
| 00:30 | 02 Statistics & Experimental design | What are the key statistical concepts I need to know for experimental design? What are type I and type II errors? What are the sources of variability in an experiment? What are the 3 core principles of (good) experimental design? Why is having biological replicates important in an (RNA-seq) experiment? |
| 02:30 | 03 From fastq files to alignments | How do I perform a quality check of my RNA-seq fastq files with FastQC? How can I remove low-quality RNA-seq reads using Trimmomatic? How do I align my reads to a reference genome using STAR? |
| 03:15 | 04 Visualizing and counting of the alignments | What is a BAM file? What do all these numbers mean? How do I prepare BAM files for visualisation? How do I use IGV, an interactive genome browser? |
| 04:15 | 05 Assessing the quality of RNA-seq experiments | How do I assess the success or failure of an RNA-seq experiment? How do I know that my RNA-seq experiment has worked according to my experimental design? What is a Principal Component Analysis (PCA)? How can I apply a Principal Component Analysis to RNA-seq gene count results? How are gene expression levels distributed within an RNA-seq experiment? Why do I need to scale/normalize read counts? How informative are PCA and sample clustering for sample-level RNA-seq quality checks? |
| 05:55 | 06 Differential expression analysis | What are factor levels and why are they important for differential expression analysis? How can I call the genes differentially regulated in response to my experimental design? What is a volcano plot and how can I create one? What is a heatmap and how can it be informative for my comparison of interest? |
| 07:25 | 07 Functional enrichment analysis | Given a list of differentially expressed genes, how do I search for enriched functions? What is the difference between an over-representation analysis (ORA) and a gene set enrichment analysis (GSEA)? |
| 08:40 | 08 Cluster analysis | How can genes be clustered (grouped based on expression)? How do I isolate and visualize the different clusters? |
| 10:10 | 09 Transcriptomic and metabolomic data integration | How can I map differential genes to metabolic pathways? How do I retrieve KEGG identifiers given a list of gene identifiers? |
| 11:10 | Finish | |

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.