Pre-session 1 instructions

What is this page for?

This page has instructions for installing R & RStudio on your laptops and confirming they are working all right. This is to ensure that we are ready for Stats Workshop session 1. This activity will likely take around 30 min to complete.

After following the instructions on this page, go to the Session 1 and Session 2 pages.

R/RStudio & package versions

Even if you have previously installed R/RStudio or ggplot2 or other packages, please re-install the latest versions. Old versions are likely to lead to errors during the workshops.

Get latest versions even if you have used R before

If you have previous versions of R and/or RStudio, please uninstall them and re-install their latest versions as below.

We will use the latest R and RStudio, and various packages, so please read this page to ensure we all have working copies of the required software.

Installation steps

Follow the steps in this order.

  1. R is freely available from The Comprehensive R Archive Network (CRAN). Go to the CRAN website and use the links at the top for your OS (operating system, e.g., Windows or MacOS or Linux) to download and install R. Use default options during installation.

  2. RStudio Desktop is a free IDE (integrated development environment) that makes it very easy to use the R that we installed in step 1. Go to the RStudio website and install their free RStudio Desktop version for your OS. Use default options during installation. Note that the company behind RStudio was renamed “Posit” in 2022, but the software is still called RStudio and nothing has changed for users.

  3. Open RStudio and familiarise yourself with the four windows or “panes”. You could start by watching these short videos (you will likely need an Imperial login to access the Sharepoint site). The first one, Getting started with R - intro session 1, explains the installation process.

My RStudio looks like this (settings on top and appearance below), but you can use settings you prefer!

  4. For the workshop, we will need several plotting and statistics packages. To install them, copy the code below to your ‘console’ and hit ENTER to run it. It could take several minutes to download and install all required packages. Note that R is case sensitive (knitr is not the same as Knitr, KNITR or other variants).

install.packages(c("rmarkdown", "knitr", "readr", "readxl", "grafify", "palmerpenguins"), dependencies = TRUE)

The above code will download and install packages called rmarkdown & knitr (for creating and saving code and graphs), readr & readxl (for reading data from CSV and Excel files, respectively), grafify (for graphs and ANOVA) and palmerpenguins (a dataset of measurements taken at the Palmer Station, Antarctica), and any packages that they depend on. This can take some time depending on download speeds etc.

Depending on your OS and other settings, during the installation process you might be asked questions, for example:

For this question, it is good to say “Yes” once; if it asks the same again, say “No” and proceed.

Another question could be:

Say “No” and proceed (building from source needs more software and can be much more time-consuming).
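Once the installation finishes, a quick check along the following lines can confirm which R version is running and whether each package can now be found. This is only a sketch: the object names (pkgs, installed, missing_pkgs) are made up for illustration, and requireNamespace() only reports whether a package can be loaded, not whether it is the latest version.

```r
# R version currently running
R.version.string

# packages requested in the install.packages() call above
pkgs <- c("rmarkdown", "knitr", "readr", "readxl", "grafify", "palmerpenguins")

# TRUE for each package that R can find and load, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# names of packages that still need installing (empty if all went OK)
missing_pkgs <- pkgs[!installed]
missing_pkgs
```

If missing_pkgs is not empty, re-running install.packages() for just those names is usually a reasonable next step.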

What if there are errors?

The easiest thing to do if there are errors is to copy the text of the error and do a Google search. Almost all errors have been experienced by other users and there may be help online to resolve them.
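In most cases you can simply copy the error text from the console. As an optional illustration, the sketch below shows that the error message is ordinary text that can be captured and stored. The deliberately wrong calculation and the object name error_text are made up for this example.

```r
# "A" * 5 fails because text cannot be multiplied;
# tryCatch() catches the error so that its message can be stored as text
error_text <- tryCatch(
  "A" * 5,
  error = function(e) conditionMessage(e)
)

# the message you could paste into a search engine
error_text
```

Adding the name of the function or package to the search (for example, the message plus "R grafify") often narrows down the results.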

The basic operators

Let’s look at the common operators in R.

  1. assign operator <- (you can also use =). Pressing Alt and - together is the short-cut for this operator. You will use <- to assign values and create objects in R.

Create your first objects as below. Alpha and five will appear in your Environment pane.

#an object with a letter
Alpha <- "A"
#an object with one number
five <- 5

  2. # at the start of a line marks it as a comment and not something to be executed on the console. For example, typing 5*6 on the console will give you the result (30), but # 5*6 will return nothing because R treats everything after # as a comment.

  3. common math operators are *, /, +, -, ^, and brackets ( )

# a comment
7*3 #three times seven
[1] 21
#7*3 is also a comment

#mathematical operation on an object
five * 5 
[1] 25
#this will fail; try it without the #
#Alpha * 5
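As a further sketch of the operators above, the lines below show that objects can be reused in calculations and that R follows the usual order of operations. The object name ten is made up for illustration.

```r
# objects can be reused in later calculations
five <- 5
ten <- five * 2      # hypothetical object name, for illustration
ten

# multiplication is done before addition...
2 + 3 * 4            # 14
# ...unless brackets say otherwise
(2 + 3) * 4          # 20

# ^ raises to a power and is done before * and /
2 * 3^2              # 18

# class() tells you what kind of value an object holds
class(five)          # "numeric"
class("A")           # "character"
```

The last two lines hint at why Alpha * 5 fails: Alpha holds text (a "character" value), and text cannot be multiplied.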

  4. the $ operator is used to look up columns in a table.

cars is a table in base R, let’s use it as an example.

  5. the operator = is also used to assign a value to an argument of a function.

head is a function to see the rows from the start of a table. Let’s use this function on the table cars.

head(cars) #shows top 6 rows by default
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
#with a value to an argument for head
head(cars, n = 3) #show first 3 rows
  speed dist
1     4    2
2     4   10
3     7    4
#see the table cars
View(cars)

#pick the column speed from cars
cars$speed
 [1]  4  4  7  7  8  9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15
[26] 15 16 16 17 17 17 18 18 18 18 19 19 19 20 20 20 20 20 22 23 24 24 24 24 25
#mathematical operation on the column
cars$speed + 10 
 [1] 14 14 17 17 18 19 20 20 20 21 21 22 22 22 22 23 23 23 23 24 24 24 24 25 25
[26] 25 26 26 27 27 27 28 28 28 28 29 29 29 30 30 30 30 30 32 33 34 34 34 34 35
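Columns picked with $ can be passed to other functions, and $ can also be used to add a new column. The sketch below assumes only the built-in cars table (speed is recorded in miles per hour); the names my_cars and speed_kmh are made up for illustration.

```r
# summarise a column picked with $
mean(cars$speed)     # 15.4
max(cars$dist)       # 120

# work on a copy so the built-in table stays unchanged
my_cars <- cars

# $ can also create a new column (approximate conversion of mph to km/h)
my_cars$speed_kmh <- my_cars$speed * 1.609

# the copy now has three columns
head(my_cars, n = 3)
```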

Other operators we will look at in Session 1 are:

  1. the square brackets [ ] have a special meaning, and are typically used for sub-setting data frames (i.e., only picking rows in the table that match a criterion).

  2. logic operators such as == (equal to), != (not equal to), >= (greater than or equal to) or <= (less than or equal to)
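As a small preview, here is a sketch of how these two kinds of operators work together, using the built-in cars table. The object names fast and fast_cars are made up for illustration.

```r
# a logical test on a column gives TRUE or FALSE for every row
fast <- cars$speed >= 24
sum(fast)                      # 5 rows are TRUE

# [rows, columns]: keep rows where the test is TRUE, and all columns
fast_cars <- cars[fast, ]
fast_cars

# == (two equals signs) tests for equality
cars[cars$speed == 25, ]

# != means "not equal to"
nrow(cars[cars$speed != 4, ])  # 48 rows
```

Note the comma inside the square brackets: what comes before it selects rows, and leaving the part after it empty keeps all columns.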

Check installation

Check that everything has gone OK by loading one of the packages we installed, and using it to plot a graph. To save memory, R does not load all available packages - the user must invoke them by using the command library, and then use functions in the package.

This is a two-step process.

  1. First copy the following line to your console and hit ENTER (this will make the grafify package available to use). It will only work if your installation went OK!

library(grafify)

  2. Next, copy this line to your console and hit ENTER. It should produce a graph using the grafify package.

plot_scatterbox(data_t_pdiff, Condition, Mass)

If all went OK, a graph with scattered symbols with box and whiskers should appear in the Plots pane.

Loading required package: ggplot2

In the code above, we used the function plot_scatterbox on a data table that comes with the grafify package. First, let’s look at the data table, more commonly called a data frame in R.

#View function opens the table in the source pane
View(data_t_pdiff)

#names of columns
names(data_t_pdiff)
[1] "Subject"   "Condition" "Mass"     
#structure of the table
str(data_t_pdiff)
'data.frame':   20 obs. of  3 variables:
 $ Subject  : Factor w/ 10 levels "A","B","C","D",..: 1 1 2 2 3 3 4 4 5 5 ...
 $ Condition: Factor w/ 2 levels "Treated","Untreated": 2 1 2 1 2 1 2 1 2 1 ...
 $ Mass     : num  20.9 33.4 21 33.9 28.7 ...

You now see that the table has 3 columns, Subject (an ID of each mouse used in the study), Condition (whether a mouse was Treated or Untreated with a drug) and Mass (its body weight in grams).
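The str output describes Subject and Condition as a "Factor", followed by numbers such as 2 1 2 1. The sketch below, which uses a small made-up vector rather than the real data, shows where those numbers come from.

```r
# a small made-up vector mirroring the Condition column
condition <- factor(c("Untreated", "Treated", "Untreated", "Treated"))

# levels are sorted alphabetically by default
levels(condition)        # "Treated" "Untreated"

# internally each value is stored as the position of its level
as.integer(condition)    # 2 1 2 1

# number of values in each level
table(condition)
```

So in the str output, 2 stands for "Untreated" and 1 for "Treated", because "Treated" comes first alphabetically.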

When we plotted the graph above, we used the first three arguments in their default order. We can be more explicit as below.

plot_scatterbox(data = data_t_pdiff,    #name of data frame
                xcol = Condition,       #column to plot on X axis
                ycol = Mass)            #column to plot on Y axis
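The same idea applies to any R function. As a sketch using the head function and the cars table from earlier (the object names by_position, by_name and reordered are made up for illustration), arguments can be matched by position or by name, and named arguments can be given in any order.

```r
# arguments matched by position
by_position <- head(cars, 3)

# the same call with the arguments named explicitly
by_name <- head(x = cars, n = 3)

# named arguments can be given in any order
reordered <- head(n = 3, x = cars)

identical(by_position, by_name)   # TRUE
identical(by_name, reordered)     # TRUE
```

Naming arguments takes a little more typing, but it makes code easier to read and protects against mixing up the order.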

No one can remember the arguments, let alone their order for all the functions in R. But do not worry, help is nearby. Adding a ? before the name of the function gives you usage details in the Help pane.

?plot_scatterbox()
starting httpd help server ... done

This is how help looks on my RStudio. As you can see, there are many arguments to tweak the graph, most of which have sensible defaults. We will rarely need to assign a value to all of them, but it’s good to know what can be changed in this graph.
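Besides the Help pane, a function's arguments can also be listed directly in the console. The sketch below uses read.csv, a function that comes with R, purely as an example; the object name arg_names is made up for illustration.

```r
# args() prints a function's arguments and their default values
args(read.csv)

# names(formals()) gives just the argument names, in order
arg_names <- names(formals(read.csv))
arg_names

# ?read.csv opens the full help page (in an interactive session)
# ??"scatter" searches the help of all installed packages for a keyword
```

The same calls should work for plot_scatterbox once the grafify package has been loaded with library(grafify).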

Lastly, typing :: after the name of a package in RStudio brings up an autocomplete list of what is available in it. Try it with grafify:: and you should be able to scroll through a list of data frames and functions in this package.
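The :: operator is also useful outside autocomplete. As a sketch, using the stats package that comes with R (so that it runs even if grafify is not installed), it can call a function without library, and getNamespaceExports lists the functions a package makes available. The object name exported is made up for illustration.

```r
# call a function as package::function, no library() call needed
stats::median(c(1, 5, 9))        # 5

# list the functions a package exports
# (roughly what autocomplete shows after ::, minus any data frames)
exported <- getNamespaceExports("stats")
head(sort(exported))

# check whether a particular function is in the package
"median" %in% exported           # TRUE
```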

Further exercises

Go to Session 1 if you’d like to try out more things in R!

Additional resources

Additional helpful resources are listed here at the Statistics for Micro/Immuno-biologists website.