This lesson is in the early stages of development (Alpha version)

Introduction to Open Data Science with R: Welcome

Welcome

This lesson introduces you to open data science so you can work with data in an open, reproducible, and collaborative way. Open data science means that methods, data, and code are available so that others can access, reuse, and build on them without much fuss. Here you will learn a workflow with R, RStudio, Git, and GitHub, as we describe in Lowndes et al. 2017, Nature Ecology & Evolution: "Our path to better science in less time using open data science tools".

This is going to be fun, because learning these open data science tools and practices is empowering! This training book is written (and always improving) so you can use it for self-paced learning or to teach an in-person workshop where the instructor live-codes. Either way, you should do everything hands-on, on your own computer, as you learn.

Before you begin, be sure you are all set up: see the prerequisites in Chapter \@ref(overview).

Before you start

Before the training, please make sure you have done the following:

  1. Download and install up-to-date versions of:
  2. Create a GitHub account at https://github.com. Note: shorter usernames that identify you are better, and use your work email.
  3. Read the workshop Code of Conduct to make sure this workshop stays welcoming for everybody.
  4. Get comfortable: if you’re not in a physical workshop, set yourself up with two screens if possible. You will be working in RStudio on your own computer while following this tutorial at the same time. More instructions are available on the workshop website in the Setup section.
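
As a quick sanity check after installing, the sketch below reports whether the command-line tools are reachable from a terminal. It assumes the tools to check are R and Git (taken from the workflow named in the introduction, not from an official list), and `check_tool` is a hypothetical helper name. RStudio is a desktop application and is not checked here. On Windows, R may be installed correctly without being on the PATH, so a "NOT found" result for R is not necessarily a problem.

```shell
#!/bin/sh
# Report whether a command-line tool is available on the PATH.
# check_tool is a hypothetical helper written for this sketch.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: NOT found"
    fi
}

# Assumed tool list, based on the workflow described in the introduction.
for tool in R git; do
    check_tool "$tool"
done
```

If a tool is reported as found, running it with `--version` (for example `git --version`) shows which version is installed.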

Schedule

Setup
    Download files required for the lesson

00:00  1. Introduction
    What will I learn during this workshop?
    What are the tools that I will be using?
    What are the tidy data principles?
    Why is working in a more open way beneficial?

00:30  2. R & RStudio, R Markdown
    How do I orient myself in the RStudio interface?
    How can I work with R in the console?
    What are built-in R functions and how do I use their help page?
    How can I generate an R Markdown notebook?

01:30  3. Visualizing data with ggplot2
    How can I make publication-grade plots with ggplot2?
    What are the key concepts underlying ggplot2 plotting?
    What are some of the visualisations available through ggplot2?
    How can I save my plot in a specific format (e.g. png)?

03:00  4. Data transformation with dplyr
    How do I perform data transformations, such as removing columns from my data, using R?
    What are tidy data (as opposed to messy data)?
    How do I import data into R (e.g. from a web link)?
    How can I make my code more readable when performing a series of transformations?

04:30  5. Data tidying with tidyr
    How can I turn my dataset into the tidy format to perform efficient data analyses with R?
    How can I convert from the tidy format to a more classic wide format?
    How can I make my dataset explicit for all combinations of variables?

06:00  6. Programming with R
    How can I create a script in R to automate a data analysis?
    How can I create for loops in R?
    How can I create a script that dynamically makes choices?
    How can I combine two dataframes into one?

07:30  7. Version control with Git and GitHub
    What is version control? How do I use it?
    What is the difference between Git and GitHub?
    What benefits does a version control system bring to my research?

09:00  8. Collaborating with GitHub
    How can I develop and collaborate on code with another scientist?
    How can I give access to my code to another collaborator?
    How can I keep code synchronised with another scientist?
    How can I resolve conflicts that arise from that collaboration?
    What are GitHub

10:30  9. Become a champion of open (data) science

12:00  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.