Welcome!

This lesson will guide you through two datasets related to “real-life” problems applicable to your study track (Advanced Forensic Biology, 5274ADFB6Y).

You will see how to perform an extensive exploratory data analysis, how to find gene markers using clustering and Principal Component Analysis, and how to apply Machine Learning techniques to find candidate markers related to a response variable (here, age).
All these acquired skills should be useful in your future life as a researcher. In turn, this newly generated knowledge can yield new research avenues and serve to answer new questions.

We will use R and its companion RStudio to perform our data analysis and visualisations.

Before you begin, be sure you are all set up (see below). For complete information, see the Setup section.

Main learning objectives

Be able to exhaustively explore a dataset using the R tidyverse suite of tools: tidyr, dplyr, ggplot2, etc.
Understand the basics of Principal Component Analysis and use it both to explore a dataset and to find tissue-specific marker genes.
Build heatmaps from multivariate datasets to visualise expression/methylation values of differential variables.
Be capable of constructing sample clusters based on different clustering methods.
Be able to apply supervised Machine Learning techniques to find epigenetic markers related to age.

Before you start

Before the training, please make sure you have done the following:

Consult what you need to do in the lesson Setup.

Make yourself comfortable: if you are not in a physical workshop, set yourself up with two screens if possible. You will be working in RStudio on your own computer while following this tutorial at the same time.

Citation

If you make use of this material in some way (teaching, vocational training, research), please use this citation: “Marc Galland” (eds): “Advanced Forensic Biology.” Version 2020.10. Link.

Credits

Schedule

	Setup	Download files required for the lesson
00:00	1. Introduction to dataset #1 (gene expressions)	Where does this dataset come from? How many genes will I consider in my analysis? How many tissues do I have in my dataset? In which unit are my gene expression values measured?
00:30	2. Exploratory Data Analysis of dataset #1	What are the main exploratory data analysis steps to perform? What would you do to compare the different tissues all at once? How would you already get an idea of the similarity between tissues?
01:15	3. Principal Component Analysis (PCA)	What is Principal Component Analysis (PCA)? How can I compute a PCA using R? What are PCA loadings and scores? How to make a plot using scores and loadings?
03:15	4. Hierarchical clustering and heatmaps	How can I group genes with similar profiles together? How do I represent the result of a clustering analysis?
04:30	5. Finding tissue-specific genes through feature engineering	How do I transform my variables to build more meaningful variables? What type of transformation and checks should I perform? How to extract my list of selected tissue-specific genes?
05:15	6. Exercises on dataset #1	Can I find liver-specific genes?
06:00	7. Introduction to dataset #2 (methylation and age)	How are gene expression levels distributed within an RNA-seq experiment?
07:00	8. Advanced Data Exploration on dataset #2
08:00	9. Linear, Multiple and Regularised Regressions
09:00	10. Random Forest analysis
10:00	Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
