This lesson is still being designed and assembled (Pre-Alpha version)

Advanced Forensic Biology: Welcome!


This lesson will guide you through two datasets related to “real-life” problems relevant to your study track (Advanced Forensic Biology, 5274ADFB6Y).

You will see how to perform an extensive exploratory data analysis, how to find gene markers using clustering and Principal Component Analysis (PCA), and how to apply Machine Learning techniques to find candidates related to a response variable (here, age).
These skills should be useful in your future work as a researcher. In turn, the knowledge you generate with them can open new research avenues and help answer new questions.

We will use R and its companion RStudio to perform our data analysis and visualisations.

Before you begin, be sure you are all set up (see below). For complete information, see the Setup section.

Main learning objectives

Before you start

Before the training, please make sure you have done the following:

  1. Check what you need to do in the lesson Setup.
  2. Make yourself comfortable: if you are not in a physical workshop, set up two screens if possible. You will be working in RStudio on your own computer while following this tutorial at the same time.


If you make use of this material in some way (teaching, vocational training, research), please use this citation: “Marc Galland” (eds): “Advanced Forensic Biology.” Version 2020.10. Link.



Setup  Download files required for the lesson

00:00  1. Introduction to dataset #1 (gene expressions)
       Where does this dataset come from?
       How many genes will I consider in my analysis?
       How many tissues do I have in my dataset?
       In which unit is my gene expression measured?

00:30  2. Exploratory Data Analysis of dataset #1
       What are the main exploratory data analysis steps to perform?
       What would you do to compare the different tissues all at once?
       How would you already get an idea of the similarity between tissues?

01:15  3. Principal Component Analysis (PCA)
       What is Principal Component Analysis (PCA)?
       How can I compute a PCA using R?
       What are PCA loadings and scores? How do I make a plot using scores and loadings?

03:15  4. Hierarchical clustering and heatmaps
       How can I group genes with similar profiles together?
       How do I represent the result of a clustering analysis?

04:30  5. Finding tissue-specific genes through feature engineering
       How do I transform my variables to build more meaningful variables?
       What types of transformations and checks should I perform?
       How do I extract my list of selected tissue-specific genes?

05:15  6. Exercises on dataset #1
       Can I find liver-specific genes?

06:00  7. Introduction to dataset #2 (methylation and age)
       How are gene expression levels distributed within an RNA-seq experiment?

07:00  8. Advanced Data Exploration on dataset #2

08:00  9. Linear, Multiple and Regularised Regressions

09:00  10. Random Forest analysis

10:00  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.