Welcome!
This lesson will guide you through two datasets related to “real-life” problems applicable to your study track (Advanced Forensic Biology, 5274ADFB6Y).
You will see how to perform an extensive exploratory data analysis, how to find gene markers using clustering and Principal Component Analysis, and how to apply Machine Learning techniques to find candidates related to a response variable (here, age).
All these acquired skills should be useful in your future life as a researcher. In turn, this newly generated knowledge can yield new research avenues and serve to answer new questions.
We will use R and its companion RStudio to perform our data analysis and visualisations.
Before you begin, be sure you are all set up (see below). For complete information, see the Setup section.
Main learning objectives
- Be able to exhaustively explore a dataset using the R tidyverse suite of tools: tidyr, dplyr, ggplot2, etc.
- Understand the basics of Principal Component Analysis and use it both to explore data and to find tissue-specific marker genes.
- Build heatmaps from multivariate datasets to visualise expression/methylation values of differential variables.
- Be capable of constructing sample clusters based on different clustering methods.
- Be able to apply supervised Machine Learning techniques to find epigenetic markers related to age.
Before you start
Before the training, please make sure you have done the following:
- Check the lesson Setup to see what you need to do.
- Make yourself comfortable: if you’re not in a physical workshop, use two screens if possible. You will be working in RStudio on your own computer while following this tutorial at the same time.
Citation
If you make use of this material in some way (teaching, vocational training, research), please use this citation: “Marc Galland” (eds): “Advanced Forensic Biology.” Version 2020.10. Link.