Maciej Dobrzyński (Institute of Cell Biology, University of Bern)
November 3, 2020
The first part will demonstrate:
data.table & ggplot2
R notebook with the code.
During the second part we will process time-series data from a time-lapse microscopy experiment. We will:
Intermediate datasets throughout the workshop:
PDF with an introduction to datasets.
Notebook with the practical session.
data.table
Extension of base R's data.frame structure.
Fast data manipulation with a concise SQL-like syntax.
Check out the vignette for an introduction and Advanced tips and tricks with data.table for expanding your knowledge.
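To make the "SQL-like" description concrete, here is a minimal sketch of the general data.table form DT[i, j, by]. The table and its column names (group, value) are made up for illustration and are not part of the workshop datasets.

```r
library(data.table)

# Hypothetical toy data: two groups, three measurements each
dt = data.table(group = c("a", "a", "a", "b", "b", "b"),
                value = c(1, 2, 3, 10, 20, 30))

# DT[i, j, by]: filter rows (i), compute on columns (j), per group (by).
# Roughly equivalent to:
#   SELECT group, AVG(value) FROM dt WHERE value > 1 GROUP BY group
dtMean = dt[value > 1,
            .(meanValue = mean(value)),
            by = group]

print(dtMean)
```

The result has one row per group, with the mean taken only over the rows that pass the filter in i.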
Packages are R's greatest strength but may create confusion.
CRAN, the Comprehensive R Archive Network, is a package repository that currently features more than 15,000 packages.
Aside from an obligatory reference manual, many packages include vignettes, i.e. accessible introductions to working with a package.
To access functions provided by R packages, a package needs to be loaded:
library(data.table)
Then, functions such as dcast, melt, etc. are available directly in the R interpreter.
Several packages may provide functions with the same name. For example:
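As an illustration of the two functions just mentioned, the sketch below reshapes a small table from wide to long format with melt and back with dcast. The table and the names cellID, t0, t1, timePoint and signal are hypothetical.

```r
library(data.table)

# Hypothetical wide table: one row per cell, one column per time point
dtWide = data.table(cellID = c("c1", "c2"),
                    t0 = c(0.1, 0.2),
                    t1 = c(0.5, 0.7))

# Wide -> long: one row per cell and time point
dtLong = melt(dtWide,
              id.vars = "cellID",
              variable.name = "timePoint",
              value.name = "signal")

# Long -> wide: back to one column per time point
dtWide2 = dcast(dtLong,
                cellID ~ timePoint,
                value.var = "signal")

print(dtLong)
print(dtWide2)
```

The long format is usually the more convenient one for grouping operations and for plotting with ggplot2.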
library(plyr)
library(Hmisc)
Both provide a function called summarize. Upon loading the second package, R reports that the name is now masked:
> require(Hmisc)
Loading required package: Hmisc
Loading required package: lattice
Loading required package: survival
Loading required package: Formula
Attaching package: ‘Hmisc’
The following objects are masked from ‘package:plyr’:
is.discrete, summarize
Therefore, it is good practice to call functions with an explicit package prefix:
plyr::summarize
Hmisc::summarize
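The same masking effect can be reproduced without installing either package. The sketch below is a stand-in for the plyr/Hmisc clash: it defines a function in the global environment that deliberately clashes with stats::median, and shows that the prefixed call still reaches the original.

```r
# Define a function whose name clashes with stats::median.
# R searches the global environment first, so this definition
# masks the version from the stats package.
median = function(x) {
  "I am not the real median"
}

v = c(1, 5, 9)

resMasked    = median(v)         # calls the definition above
resQualified = stats::median(v)  # always calls the stats version

print(resMasked)
print(resQualified)

# Clean up so that later code sees stats::median again
rm(median)
```

The prefixed form is more verbose, but it makes explicit which package a function comes from, regardless of the order in which packages were loaded.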
The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
Brief, visual summaries of packages' functionality.
A variable refers to a storage location in the computer's memory, e.g.
myVariable = 5.3
The symbolic name myVariable refers to a memory location that stores the number 5.3.
A variable can vary!
myVariable = myVariable + 2.8
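We changed the value referred to by the name myVariable. Now it stores 8.1.
Data stored in variables can have different types. R has five commonly used basic types: logical, integer, double, complex, and character (a sixth, raw, holds bytes). Use the function typeof() to check the type of a value.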
Illustration from R tutorial on TechVidvan.
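A quick way to get familiar with the basic types is to call typeof() on a literal of each kind. The sketch below collects the results in a vector; the name vTypes is chosen here only for illustration.

```r
# typeof() returns the internal storage type as a string
vTypes = c(
  typeof(TRUE),     # logical
  typeof(5L),       # integer (note the L suffix)
  typeof(5.3),      # double, i.e. a floating-point number
  typeof(2 + 3i),   # complex
  typeof("abc")     # character
)

print(vTypes)
```

Note that a number typed without the L suffix, such as 5, is stored as a double, not as an integer.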
A data structure is a way of storing and organising data. For example, in order to store 10 integers, we could define 10 variables, which isn't very efficient. Instead, we can store these numbers in a vector.
Illustration from R tutorial on TechVidvan.
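The ten-integers example can be written as a single vector. This is a minimal sketch; the name vNumbers is arbitrary.

```r
# Ten integers stored in a single vector instead of ten variables
vNumbers = 1:10

nElements = length(vNumbers)  # number of elements
third     = vNumbers[3]       # third element; R indexing starts at 1
vDoubled  = vNumbers * 2L     # arithmetic applies to every element at once
total     = sum(vNumbers)     # sum over all elements

print(vDoubled)
print(total)
```

Because operations are vectorised, no explicit loop is needed to transform or summarise all elements.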
Control structures change the flow of the code. The changes are based on conditions, e.g. if variable a is greater than a certain value, do this, otherwise, do that.
Illustration from R tutorial on TechVidvan.
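The condition described above can be sketched with an if/else statement, followed by a for loop that repeats the same test for several values. The variable names and the threshold of 5 are made up for illustration.

```r
a = 7
threshold = 5  # hypothetical cut-off

# Branch on a condition
if (a > threshold) {
  msg = "a is above the threshold"
} else {
  msg = "a is at or below the threshold"
}
print(msg)

# A for loop repeats the same check for several values
vA = c(2, 5, 9)
vAbove = logical(length(vA))
for (i in seq_along(vA)) {
  vAbove[i] = vA[i] > threshold
}
print(vAbove)
```

In practice the loop can be replaced by the vectorised comparison vA > threshold, which gives the same result.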
Source code with the template.
## Load libraries ----
library(data.table)
library(ggplot2)
## Global variables ----
# Lists with parameters for easy recall
lParRW = list(
  fileIn = "experimentalResults.csv",
  fileOut = "processedData",
  filePlotOut = "boxPlot_activity.pdf"
)
lCol = list(
  time = "Time_h",
  meas = "sensor_ch0",
  group = "Exp_cond"
)
## Custom functions ----
# Define custom functions or
# load from an external file
source("myFunctionLibrary.R")
locCalcStats = function(...) {
...
}
## Read data ----
dt = fread(lParRW$fileIn)
## Clean data ----
# Remove unnecessary columns
dt[,
   c("uselessColumn1",
     "uselessColumn2") := NULL]
## Process data ----
...
## Save output data ----
fwrite(x = dt,
       file = lParRW$fileOut)
## Save plots ----
# Note: the plotting function is ggplot(); ggplot2 is the package name
p1 = ggplot(dt,
            aes(x = ...,
                y = ...)) +
  geom_line(aes(color = group))
ggsave(filename = lParRW$filePlotOut,
       plot = p1)
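One benefit of keeping column names in a list such as lCol is that the processing code does not need to hard-code them. The sketch below shows one possible way to use the stored names inside a data.table call; the toy data and the output column measMean are hypothetical, and this is not necessarily how the workshop's own processing step is written.

```r
library(data.table)

# Column names as defined in the template
lCol = list(
  time = "Time_h",
  meas = "sensor_ch0",
  group = "Exp_cond"
)

# Hypothetical data with the column names used in the template
dt = data.table(Time_h = c(0, 1, 0, 1),
                sensor_ch0 = c(1.0, 3.0, 2.0, 6.0),
                Exp_cond = c("ctrl", "ctrl", "drug", "drug"))

# get() looks up a column by the name stored in lCol;
# by accepts a character vector of column names
dtStats = dt[,
             .(measMean = mean(get(lCol$meas))),
             by = c(lCol$group)]

print(dtStats)
```

If the input file later uses different column names, only the entries of lCol need to change.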