Instructors:
Tijs Bliek (1), Marc Galland (1), Enric Martínez (1)
Helpers:
Stacy Shinneman (2), Martina Ferraguti (2), Johannes de Groeve (2), Joachim Goedhart (1)
(1) member of the SILS institute, (2) member of the IBED institute
Introduction to Open Data Science with R.
Presentation
This lesson will introduce you to open data science so you can work with data in an open, reproducible, and collaborative way. Open data science means that methods, data, and code are available so that others can access, reuse, and build on them without much fuss. Here you will learn a workflow with R, RStudio, Git, and GitHub.
This is going to be fun, because learning these open data science tools and practices is empowering! This training book is written (and always improving) so that you can come back to it after the workshop, for instance if you have forgotten something.
This workshop is part of the 2021 Summer School organised by the Amsterdam Science Park Study Group. This event is made possible thanks to the support of the NWO Team Science Award (2020 edition).
Who are we
This workshop is organized by the core members of the Amsterdam Science Park Study Group. This small community of computational biologists aims to promote skill sharing and collaboration through the organisation of interactive workshops. It acts as the main local hub to set up Software and Data Carpentry workshops (official workshops and Carpentry-style). All are welcome to this study group, regardless of scientific research area, affiliation or training level.
For more information on what we teach and why, see our website: "scienceparkstudygroup".
General Information
Who:
The course is aimed at master's students and other researchers (PhD candidates and postdocs).
You don't need any previous knowledge of the tools that will be presented at the workshop.
Where:
Online, Amsterdam time (CEST, UTC+2 in June; see Zoom links).
Get directions with OpenStreetMap or Google Maps.
Requirements: Participants must have a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) on which they have administrative privileges. They should have a few specific software packages installed (listed below).
Code of Conduct: Everyone who participates in Carpentries activities is required to conform to the Code of Conduct. This document also outlines how to report an incident if needed.
Helpers: Experts helping in the room:
Monday 21 June from 9:30 to 13:00: Tijs
Monday 21 June from 14:00 to 17:00: Stacy
Tuesday 22 June from 9:30 to 13:00: Martina Ferraguti
Discord for technical assistance during the course
Discord server: We will use Discord to manage questions, general announcements and to match helpers with learners.
Please install the Discord application on your laptop/computer. The invite link to the Discord server is here.
There are several channels that we will use:
The #general channel is used for general announcements about the workshop (links, coffee breaks, etc).
The #helpdesk channel is a text channel where participants can ask questions.
The helpdesk-1 channel is a voice channel where participants can put questions to helper number 1 through video. Additional video channels called helpdesk-2, -3, etc. are available depending on the number of helpers.
Teachers have two private channels, one for text called #teacher-chat and one for video called teacher-coffee-room.
Git is a version control system that lets you track who made changes
to what and when, and has options for easily updating a shared or public
version of your code on github.com. You will need a supported web browser.
You will need an account at github.com
for parts of the Git lesson. Basic GitHub accounts are free. We encourage
you to create a GitHub account if you don't have one already.
Please consider what personal information you'd like to reveal. For
example, you may want to review these instructions provided by GitHub
for keeping your email address private.
Click on "Next" four times (two times if you've previously
installed Git). You don't need to change anything
in the Information, location, components, and start menu screens.
Select "Use the nano editor by default" and click on "Next".
Keep "Git from the command line and also from 3rd-party software" selected and click on "Next".
If you forget to do this, programs that you need for the workshop will not work properly.
If this happens, rerun the installer and select the appropriate option.
Click on "Next".
Select "Use the native Windows Secure Channel library", and click "Next".
Keep "Checkout Windows-style, commit Unix-style line endings" selected and click on "Next".
Select "Use Windows' default console window" and click on "Next".
Leave all three items selected, and click on "Next".
Do not select the experimental option. Click "Install".
Click on "Finish".
If your "HOME" environment variable is not set (or you don't know what this is):
Open command prompt (Open Start Menu then type cmd and press [Enter])
Type the following line into the command prompt window exactly as shown:
setx HOME "%USERPROFILE%"
Press [Enter]; you should see SUCCESS: Specified value was saved.
Quit command prompt by typing exit then pressing [Enter]
This will provide you with both Git and Bash in the Git Bash program.
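To see which value a newly opened Git Bash window reports for HOME, a check along these lines could be used. This is a minimal sketch: home_status is a hypothetical helper name introduced here for illustration, and it only reports what the current shell sees. It does not confirm what setx stored, and a value saved with setx is only picked up by windows opened afterwards.

```shell
# Report whether the HOME environment variable is set in this shell.
# home_status is a hypothetical helper name used only for this sketch.
home_status() {
    if [ -n "${HOME:-}" ]; then
        echo "HOME is set to $HOME"
    else
        echo "HOME is not set"
    fi
}

home_status
```

If the output says HOME is not set, repeating the setx step above and opening a fresh window is a reasonable next thing to try.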
For OS X 10.9 and higher, install Git for Mac
by downloading and running the most recent "mavericks" installer from
this list.
Because this installer is not signed by the developer, you may have to
right click (control click) on the .pkg file, click Open, and click
Open on the pop up window.
After installing Git, there will not be anything in your /Applications folder,
as Git is a command line program.
For older versions of OS X (10.5-10.8) use the
most recent installer labelled "snow-leopard",
available here.
If Git is not already available on your machine you can try to
install it via your distro's package manager. For Debian/Ubuntu run
sudo apt-get install git and for Fedora run
sudo dnf install git.
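To find out whether Git is already available before reaching for the package manager, a check like the one below can be run in a terminal. It is a minimal sketch: check_tool is a hypothetical helper name, not part of Git, and it relies only on the shell's standard command -v lookup.

```shell
# Report whether a command can be found on the PATH.
# check_tool is a hypothetical helper name, not part of Git.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

check_tool git
```

If the output says that git is missing, install it with the package manager command for your distribution as described above.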
Text Editor
When you're writing code, it's nice to have a text editor that is
optimized for writing code, with features like automatic
color-coding of keywords. The default text editor on macOS and
Linux is usually set to Vim, which is not famous for being
intuitive. If you accidentally find yourself stuck in it, hit
the Esc key, followed by :q!
(colon, lower-case 'q', exclamation mark), then hit Return to
return to the shell.
nano is a basic editor and the default that instructors use in the workshop.
It is installed along with Git.
Other editors that you can use are
Notepad++ or
Sublime Text.
Be aware that you must
add the editor's installation directory to your system path.
Please ask your instructor to help you do this.
nano may also already be pre-installed on your system.
See the Git installation video tutorial
for an example of how to open nano.
R is a programming language
that is especially powerful for data exploration, visualization, and
statistical analysis. To interact with R, we use
RStudio.
Install R by downloading and running
this .exe file
from CRAN.
Also, please install the
RStudio IDE.
Note that if you have separate user and admin accounts, you should run the
installers as administrator (right-click on the .exe file and select "Run as
administrator" instead of double-clicking). Otherwise problems may occur later,
for example when installing R packages.
You can download the binary files for your distribution
from CRAN, or
you can use your package manager (e.g. for Debian/Ubuntu
run sudo apt-get install r-base and for Fedora run
sudo dnf install R). Also, please install the
RStudio IDE.
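Once R is installed, a terminal check like the following can confirm that it is reachable from the command line. This is a sketch under assumptions: report_r is a hypothetical helper name, and it assumes the Rscript program that ships with R is on your PATH. It does not check RStudio.

```shell
# Print the installed R version, or a hint if Rscript is not on the PATH.
# report_r is a hypothetical helper name used only for this sketch.
report_r() {
    if command -v Rscript >/dev/null 2>&1; then
        # R.version.string is a built-in R variable holding the version text.
        Rscript -e 'cat(R.version.string, "\n")'
    else
        echo "Rscript not found: check that R is installed and on your PATH"
    fi
}

report_r
```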