This lesson is still being designed and assembled (Pre-Alpha version)

Advanced Forensic Biology: Glossary

Key Points

Introduction to dataset #1 (gene expressions)
  • This dataset measures gene expression from various human tissues.

  • Gene expression is measured from the hybridization of mRNA molecules to microarray probes.

  • Some genes are uniquely or strongly expressed in a particular tissue, which makes them marker genes for that tissue.

  • Tissue-specific markers can be found through several methods: PCA, clustering, or custom approaches.

Exploratory Data Analysis of dataset #1
  • Computing descriptive metrics and plotting distributions helps to visualise how values are distributed and to spot potential outliers.

  • Scaling is necessary to visualise values that differ by several orders of magnitude.

  • A pairwise plot matrix can help to pinpoint samples with similar gene expression profiles.
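
The first two points above can be sketched in a few lines of dependency-free Python. This is only an illustration: the expression values are made up, the function names `describe`, `iqr_outliers` and `log2_scale` are hypothetical, and the 1.5 × IQR rule and the pseudocount of 1 are assumptions rather than the lesson's own choices.

```python
import math
import statistics

# Hypothetical expression values for one sample (made-up numbers).
expression = [12.0, 15.0, 9.0, 14.0, 11.0, 13.0, 25000.0]


def describe(values):
    """Return a few descriptive metrics for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
    }


def iqr_outliers(values, k=1.5):
    """Flag values outside [q1 - k*IQR, q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def log2_scale(values, pseudocount=1.0):
    """Compress values spanning several orders of magnitude."""
    return [math.log2(v + pseudocount) for v in values]


summary = describe(expression)
outliers = iqr_outliers(expression)
scaled = log2_scale(expression)

print(summary)
print("potential outliers:", outliers)
print("log2 range:", min(scaled), "to", max(scaled))
```

On the raw scale the single large value would dominate any plot; after the log2 scaling all values sit within a range that can be drawn on one axis.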

Principal Component Analysis (PCA)
Hierarchical clustering and heatmaps
  • Scaling of expression values is essential for distance calculation and hierarchical clustering.

  • The choice of clustering method can have a profound impact on the resulting clusters.

  • Although clustering is a powerful technique to describe data structure, it does not easily help to pinpoint specific genes of interest.
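
To see why scaling matters for distance calculation, the following minimal sketch compares Euclidean distances before and after z-scoring each gene. The gene names and values are invented for illustration, and `zscore` and `nearest` are hypothetical helpers, not functions from the lesson.

```python
import math
import statistics

# Hypothetical expression profiles across three tissues (made-up numbers).
# gene_A and gene_B share the same pattern at very different magnitudes.
profiles = {
    "gene_A": [10.0, 20.0, 10.0],
    "gene_B": [1000.0, 2000.0, 1000.0],
    "gene_C": [20.0, 10.0, 20.0],
}


def zscore(values):
    """Centre a profile on 0 and scale it to unit standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def nearest(name, table):
    """Return the other profile with the smallest Euclidean distance."""
    others = [other for other in table if other != name]
    return min(others, key=lambda other: math.dist(table[name], table[other]))


scaled = {gene: zscore(values) for gene, values in profiles.items()}

print("raw nearest to gene_A:   ", nearest("gene_A", profiles))
print("scaled nearest to gene_A:", nearest("gene_A", scaled))
```

Without scaling, distance is driven by absolute magnitude, so gene_A is grouped with gene_C even though their patterns are opposite. After scaling, gene_A is grouped with gene_B, which has the same pattern across tissues.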

Finding tissue-specific genes through feature engineering
  • Sometimes, creating a new variable is a necessary step to find interesting leads in a dataset.

  • A data transformation that brings a distribution closer to a normal one can benefit the analysis.
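
As one possible illustration of both points, the sketch below derives a new variable (the share of a gene's total expression found in its top tissue) and applies a log2 transformation. The score, the tissue names, the values, and the helpers `specificity` and `log2_transform` are all assumptions made for this example; the lesson may engineer a different variable.

```python
import math

# Hypothetical expression values per gene and tissue (made-up numbers).
expression = {
    "gene_X": {"liver": 5000.0, "brain": 40.0, "muscle": 60.0},
    "gene_Y": {"liver": 300.0, "brain": 280.0, "muscle": 320.0},
    "gene_Z": {"liver": 10.0, "brain": 900.0, "muscle": 15.0},
}


def specificity(tissue_values):
    """New variable: top tissue and its share of the gene's total expression."""
    top_tissue = max(tissue_values, key=tissue_values.get)
    total = sum(tissue_values.values())
    return top_tissue, tissue_values[top_tissue] / total


def log2_transform(tissue_values, pseudocount=1.0):
    """Log-transform values; the pseudocount avoids log2(0)."""
    return {t: math.log2(v + pseudocount) for t, v in tissue_values.items()}


scores = {gene: specificity(values) for gene, values in expression.items()}

# Rank genes from most to least tissue-specific.
ranked = sorted(scores, key=lambda gene: scores[gene][1], reverse=True)

for gene in ranked:
    tissue, share = scores[gene]
    print(f"{gene}: top tissue = {tissue}, share = {share:.2f}")

print(log2_transform(expression["gene_X"]))
```

A share close to 1 suggests a candidate tissue marker, while a share near 1/3 (with three tissues) suggests roughly even expression.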

Exercises on dataset #1
Introduction to dataset #2 (methylation and age)
  • Several biases, including sequencing depth, can result in analysis artifacts and must be corrected through scaling/normalisation.
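
One common way to correct for sequencing depth is to rescale each sample by its total read count. The sketch below uses a counts-per-million style scaling on made-up counts; it is an assumption that this matches the normalisation used for dataset #2, and `counts_per_million` is a hypothetical helper.

```python
# Hypothetical read counts for two samples (made-up numbers).
# sample_2 was sequenced ten times deeper but has the same composition.
counts = {
    "sample_1": {"site_a": 100, "site_b": 300, "site_c": 600},
    "sample_2": {"site_a": 1000, "site_b": 3000, "site_c": 6000},
}


def counts_per_million(sample_counts):
    """Rescale counts so that each sample sums to one million."""
    depth = sum(sample_counts.values())
    return {
        feature: count * 1_000_000 / depth
        for feature, count in sample_counts.items()
    }


normalised = {
    sample: counts_per_million(sample_counts)
    for sample, sample_counts in counts.items()
}

for sample, values in normalised.items():
    print(sample, values)
```

Before normalisation, sample_2 appears to have ten times more signal at every site; after normalisation the two samples are identical, which shows the raw difference was only a depth artifact.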

Advanced Data Exploration on dataset #2
Linear, Multiple and Regularised Regressions
Random Forest analysis