Introduction

Overview

Teaching: 30 min
Exercises: 0 min

Questions

What will I learn during this workshop?

Objectives

List the learning objectives of this tutorial

Present the data sets used in this tutorial

Present the strategy used to analyse the data sets

1. Overview
2. Learning objectives
3. Context, scientific question and data sets
4. Strategy

1. Overview

This tutorial will introduce you to Microbiota data analysis and guide you through the analyses, visualization and interpretation of microbial community composition and diversity.

2. Learning objectives

By the end of this tutorial you should be able to:

Address sparsity, undersampling and uneven sampling depth using data filtering and normalization
Define, calculate, and interpret alpha- and beta-diversity of microbial communities
Visualize and interpret microbial community composition
Generate and interpret multivariate analyses

3. Context, scientific question and data sets

3.1. Scientific context

Epilithic river biofilms are complex matrix-enclosed communities harboring a great diversity of prokaryotic and eukaryotic microorganisms. Deciphering the microbial community composition in river biofilms and identifying the impacts of environmental factors on these communities are of particular importance for attaining a better understanding of the functioning of riverine biofilms.

3.2. Scientific question

In this tutorial, we will investigate how bacterial communities within epilithic biofilms from the Loue River in France change along the river and over time (using a subset of the data previously published in Zancarini et al. 2017, Scientific Reports 7:4344. We will compare bacterial diversity and composition within biofilms harvested in two different sites (Cleron, located at the upstream area of the river, and Parcey, located at the downstream area of the river) and over the summer 2012 (i.e. July, August and September). For each site and harvesting time point, three samples were harvested. Then, DNA was extracted for each sample and the 16S gene was amplified and sequenced using 454-pyrosequencing.

Below are displayed a map of the sampling sites (from Zancarini et al. 2017) and the Loue at Cléron (source Wikipedia).

Loue river

Cleron Loue river

3.3. Datasets

A first bioinformatical analysis was required to create the occurence table based on the raw sequencing data given by the sequencing company. Usually, the biom format is used as an output format for this first analysis. The biom format has been developed to support encapsulation of core study data (occurence table data and sample/observation metadata) in a single file. Similar to the biom format, the phyloseq format also uses a single file containing the three tables. Microbial ecologists usually use Vegan and/or Phyloseq packages to analyse the occurence table.

To limit the number of packages used in this tutorial, the data sets are not in biom format but in txt (tabulation separated values) and consist of three different data files which will be converted to a single phyloseq file format:

An OTU table: data_loue_16S_nonnorm.txt
A sample metadata table: data_loue_16S_nonnorm_grp.txt
A taxonomy table: data_loue_16S_nonnorm_taxo.txt

4. Strategy

We want to assess if there is any difference between bacterial communities within biofilms harvested in the upper (Cleron) and in the downstream (Parcey) area of the Loue River and over time during the summer 2012 (i.e. July, August and September). We will follow the following strategy:

Explore the data sets and check data properties (sparsity and sequencing depth).
Define, calculate and interpret alpha-diversity (i.e. Richness, Choa1, Evenness and Shannon indices).
Filter the data and normalize the data by sample (i.e. correction for library size).
Explore the beta-diversity using Principal Covariates Analysis (PCoA) and perform statistical tests to test differences among treatments.
Visualize and interpret the bacterial community composition for the different treatments.

Teaching materials

This lesson has been formatted according to the Carpentries Foundation lesson template and following their recommendations on how to teach researchers good practices in programming and data analysis.

Key Points

lesson home

Microbiota data analysis

next episode