This page has code for session 2 of BPI stats workshop. These exercises will only work if you have completed the pre-session items described here correctly. Go to the Session 1 instructions before these exercises.
Create a new Project
Create a project and work inside it to stay organised. You may continue in the project you created in session 1.
Import data from Excel
We will use data in grafify and the “Weights.xlsx” file we imported previously.
Plotting data
R prefers long data format for plotting. It is always easier to convert wide data table to long using the tidyr package as shown in an example here.
We will use Weights_long for exploratory graphs.
#load grafifylibrary(grafify)
Loading required package: ggplot2
Warning in check_dep_version(): ABI version mismatch:
lme4 was built with Matrix ABI version 1
Current Matrix ABI version is 2
Please re-install lme4 from source or restore original 'Matrix' package
penguins has some rows with no data (NULL or NA), which we will remove for this session (to avoid certain errors – it is not OK to exclude data without good reason!).
library(palmerpenguins) #load the package#look at data table dim(palmerpenguins::penguins)
[1] 344 8
#remove null data with na.omit penguins <-na.omit(palmerpenguins::penguins)dim(penguins) #table dimensions without null rows
#data distribution - QQ plot plot_qqline(penguins, ycol = body_mass_g, group = species)
#data distribution - histogram plot_histogram(penguins, ycol = body_mass_g, group = species, facet = species, TextXAngle =90)
Student’s t test
t test with long data tables
With long data, we use the formula input, i.e., Y ~ X, where Y is the numeric variable or outcome or dependent variable (this is usually plotted in the Y-axis) and X is the categorical or discrete independent variable (usually on X-axis). The tilde sign ~ is a special character in R used for formula in linear regression. It indicates the variable on the left hand side is predicted by or varies with the variable(s) on right hand side.
#formula with long tablet.test(formula = Bodyweight ~ Diet, #formula input for t.test()data = Weights_long)
Welch Two Sample t-test
data: Bodyweight by Diet
t = -2.7058, df = 17.892, p-value = 0.01453
alternative hypothesis: true difference in means between group chow and group fat is not equal to 0
95 percent confidence interval:
-7.1178367 -0.8941633
sample estimates:
mean in group chow mean in group fat
25.603 29.609
t test with wide data tables
With wide tables, the $ operator is used to point to the two columns containing data to be compared.
#x and y inputs with wide tables t.test(x = Weights$chow, y = Weights$fat)
Welch Two Sample t-test
data: Weights$chow and Weights$fat
t = -2.7058, df = 17.892, p-value = 0.01453
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-7.1178367 -0.8941633
sample estimates:
mean of x mean of y
25.603 29.609
Output of Student’s t tests
Welch test (default) applies a correction if the SDs of the two groups are not the same. In this example the SDs are slightly different, so we have been penalised by a reduction in the degrees of freedom (df). The df used with Welch’s correction is not a whole number (it is 17.892). The expected df was: total sample size (\(n\)) - 2 = 20 - 2 = 18.
The data used for the analyses.
The test statistic in a Student’s t test is called t, which is a \(signal/noise\) ratio. This is the most important number to look at in t tests. From our data, the test statistic was calculated to be -2.7058. A larger value indicates high difference between the two groups (signal) and low dispersion (noise).
The two-tailed P value is 0.01453. This is the AUC from a t distribution with 17.892 degrees of freedom (df) for t values at least as extreme as \(\pm\) 2.7058.
The 95% confidence interval of the difference between the two group means is also provided.
The means of the two groups are at the end.
Therefore, if the null hypothesis was true, the chance of observing a test statistic at least as extreme as 2.7058 is 0.01453. Given that we usually set \(\alpha\) at 0.05, we may reject the null hypothesis.
Diagnosing a t test
A t test is reliable only if the data in both groups are approximately normally distributed. In reality, with few independently performed experiments (i.e., small sample size), this may be difficult to test. But a strong deviation from normal distribution is a “red flag” (i.e., you should consider data transformations).
One way to check approximate normal distributions is to plot a QQ plot of both groups before performing the test.
Note: this is only the case for t tests; in ANOVAs (when there are more than 3 groups present), the normality of the residuals from the model should be tested rather than that of the raw data. See the section on t test as a linear model for further details.
Student’s t test is only appropriate when you are comparing exactly two groups. It is not correct to use multiple t tests when you have multiple groups to compare – you should use ANOVAs. In the statistics section of Methods and/or figure legends, you must report the test, the P value, whether you performed a one- or two-tailed test and whether you had independent or matched data. In the Methods section, you should also mention how experiments were designed, any data transformation and diagnostics performed on model residuals. More formally, the result is reported as follows (but this is uncommon in biology journals): The result of a two-tailed independent group ttest were as follows: t (17.892) = -2.71; P = 0.01453. The t statistic and the degrees of freedom give an idea of the sample size and effect size, which is the basis for obtaining the P value and is therefore more informative.
Power of a t test
Let’s first get the mean and SD of the two groups and look at the chance of getting a similar “significant” result with 10 independent draws from the same distributions.
The mean \(\pm\) SD are 25.603 \(\pm\) 3.4 and 29.609 \(\pm\) 3.18 for chow and fat, respectively.
Let’s try and see how much variability we have in t tests for 10 random numbers drawn from these distributions (i.e., normal distributions with with these mean and SD) using the rnom function.
#rnorm gives random variates from normal distributiont.test(x =rnorm(n =10, #number of variatesmean =25.604, #meansd =3.5), #sdy =rnorm(n =10,mean =29.609,sd =3.18))
Welch Two Sample t-test
data: rnorm(n = 10, mean = 25.604, sd = 3.5) and rnorm(n = 10, mean = 29.609, sd = 3.18)
t = -3.0241, df = 16.949, p-value = 0.00767
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-6.355474 -1.131175
sample estimates:
mean of x mean of y
25.08675 28.83008
#repeat 10 times with replicate()#get P values with $p.valuereplicate(t.test(x =rnorm(n =10, mean =25.604, sd =3.5),y =rnorm(n =10,mean =29.609,sd =3.18))$p.value,n =10)
From these 10 tests, how many have P < \(\alpha\) ?
Estimating the required sample size
The idea is to start with a sufficiently large sample size so we can correctly reject the null hypothesis (when it is false) at least 80 % of repeated sampling. This means a power of 0.8 (i.e., false negative rate \(\beta\) of 0.2; power = 1 - \(\beta\)).
To calculate power, we need to know Cohen’s effect size d. Below we calculate Cohen’s effect size d by hand (difference between the means/pooled SD). The effectsizepackage has a convenient function to do this for more complex cases (e.g., when the number of both groups is not equal). See an example of how to use it here.
#cohen'd by hand(25.603-29.609)/((3.4+3.18)/2)
[1] -1.217629
#alternative method#install the package pwr firstlibrary(pwr) #load the library#formula input like t.test()effectsize::cohens_d(Bodyweight ~ Diet, data = Weights_long)
Cohen's d | 95% CI
--------------------------
-1.21 | [-2.16, -0.24]
- Estimated using pooled SD.
Cohen's d | 95% CI
--------------------------
-1.21 | [-2.16, -0.24]
- Estimated using pooled SD.
#use the value in the next steppwr.t.test(n =NULL, #we want to estimate this d =-1.21, #effect sizesig.level =0.05, #desired alpha power =0.8) #desired power (1-beta)
Two-sample t test power calculation
n = 11.76412
d = 1.21
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number in *each* group
If you had known the means and SDs before hand (e.g., in a small pilot experiment), you should design the main experiment with 12 mice per group for 80 % power for the experiment. Let’s repeat the simulated t test above with n = 12 to check for increased power as compared to n = 10.
#repeat 10 times and get P valuesreplicate(t.test(x =rnorm(n =12, mean =25.604, sd =3.5),y =rnorm(n =12,mean =29.609,sd =3.18))$p.value,n =10)
This is an exercise for you. The above examples used code for groups with independent sampling. These data are from matched mice!
Repeat the above t tests and power analyses for paired or matched data. You could use the wide table for the t.test and the cohen_d functions with an additional argument paired = TRUE in both functions.
How do the results differ? Is the sample size different if the experiment was designed with independent mice or matched mice?
t test as a linear model
In R linear models (straight lines) can be fit with the function lm. In grafify, this is done with simple_model for independent groups and mixed_model for paired or matched data. See the code below and identify the formula in the outputs.
Linear mixed model fit by REML ['lmerModLmerTest']
Formula: Bodyweight ~ Diet + (1 | Mouse)
Data: Weights_long
REML criterion at convergence: 97.4276
Random effects:
Groups Name Std.Dev.
Mouse (Intercept) 2.024
Residual 2.619
Number of obs: 20, groups: Mouse, 10
Fixed Effects:
(Intercept) Dietfat
25.603 4.006
Note that the Intercept is the mean of group 1 (chosen alphabetically unless reordered by hand), and Dietfat is the slope, i.e., the difference between the two means. Compare the values of the Intercept and Dietfat (group 2) with the t.test results above. For diagnostics and post-hoc tests, we will need to save the linear model object.
#save the modelweights_model <-simple_model(data = Weights_long,Y_value ="Bodyweight",Fixed_Factor ="Diet")#use summary() to look at itsummary(weights_model)
Call:
lm(formula = Bodyweight ~ Diet, data = Weights_long)
Residuals:
Min 1Q Median 3Q Max
-5.243 -1.667 -0.694 1.538 6.417
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 25.603 1.047 24.456 2.92e-15 ***
Dietfat 4.006 1.481 2.706 0.0145 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.311 on 18 degrees of freedom
Multiple R-squared: 0.2891, Adjusted R-squared: 0.2496
F-statistic: 7.321 on 1 and 18 DF, p-value: 0.01447
#similarly save the mixed effectsmodelweights_mixed_model <-mixed_model(data = Weights_long,Y_value ="Bodyweight",Fixed_Factor ="Diet",Random_Factor ="Mouse")
summary() shows the full details of the linear model object in R. It has the t statistic for the slope (Dietfat) and the associated P value.
In linear models, the null hypothesis is that slope = 0 i.e., the means of the two groups is the same and a line through the means is parallel to the X-axis. The alternative hypothesis is that the slope != 0 (not equal to 0), i.e., a line joining the two means is not parallel to X with either a positive or negative slope. Do we have evidence to reject the null hypothesis?
Before answering the above question and accepting the result of the test, we must diagnose the model, i.e., ask whether the straight line is a good fit to the data. If the line is a good fit, the residuals of the model should be approximately normally distributed.
#diagnose by plotting residualsplot_qqmodel(weights_model)
We can then get the P value for the slope in an ANOVA table format. For balanced designs, the ANOVA F statistic is the square of the t statistic.
#get the result simple_anova(data = Weights_long,Y_value ="Bodyweight",Fixed_Factor ="Diet")
Anova Table (Type II tests)
Response: Bodyweight
Sum Sq Mean sq Df F value Pr(>F)
Diet 80.24 80.24 1 7.3212 0.01447 *
Residuals 197.28 10.96 18
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As an exercise,
repeat the above linear model code for matched mice use the mixed_model
diagnose the fitted model
get the P value from an ANOVA table with mixed_anova functions with Random_Factor = "Mouse" argument.
Look closely at the results and answer:
What is the difference from the simple_model result above?
Compare these results with the Student’s t tests above.
Question: did our experimental design with matched mice lead to increased power or should we have used different sets of mice? Which of the two is compatible with 3Rs?
One-way ANOVA
Let’s look at the data_1w_death table from the grafify package.
Plot GST vs the two factors “Strain” and “Treatment”. The question we’re addressing is whether GST levels depend on treatment across the strains of mice.
Type II Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Strain 28613 9538 3 7 3.2252 0.09144 .
Treatment 227529 227529 1 7 76.9394 5.041e-05 ***
Strain:Treatment 49591 16530 3 7 5.5897 0.02832 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Post-hoc comparisons
Functions in grafify can do this along with false-discovery rate based correction for multiple comparisons (as default).
For the one-way ANOVA analyses above, let’s compare whether KO1 and KO2 are statistically different from WT. We consider the WT level to be the ‘reference’ level, and use posthoc_vsRef, which requires the fitted linear model and factor name(s).
#default Ref_level = 1#alphabetically the first level#unless reordered with factor()posthoc_vsRef(Model = mlm_1w_1, Fixed_Factor ="Genotype")
For the two-way ANOVA analyses above, let’s compare whether each Strain is affected by Treatment. For this we will use posthoc_Levelwise (e.g., to test whether GST at each level of Treatment has an effect at each level of Strain).
One-way or factorial (two or more factors) ANOVAs used when there are more than two groups to compare. In the Methods and/or figure legends, clarify what kind of ANOVA was performed, which groups are being compared and respective P values. In the Methods, provide as many details as required to understand the experimental design. More formally, ANOVAs are reported along with the F statistic and the degrees of freedoms. F(2, 8) = 42.659; P = 5.401e-05 for the one-way mixed effects analyses above.
Data transformation
We will use the data_t_pratio data set from grafify. Look at the variables and perform exploratory plots.
Proceed to the t test on the appropriately transformed data set. Without data transformation, a paired t test calculates the paired differences.
This only works with wide tables after a recent update to t.test. So convert a long table to wide if you want to use t.test, or use mixed_model as above.
Paired t-test
data: wdata_t_pratio$WT and wdata_t_pratio$KO
t = -3.7385, df = 32, p-value = 0.0007256
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-4.523562 -1.332764
sample estimates:
mean difference
-2.928163
With log-transformations, it calculates paired ratios.
#paired ratio t-test#on log-transformed datat.test(log(wdata_t_pratio$WT), log(wdata_t_pratio$KO),paired =TRUE)
Paired t-test
data: log(wdata_t_pratio$WT) and log(wdata_t_pratio$KO)
t = -8.299, df = 32, p-value = 1.758e-09
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-0.7808986 -0.4731092
sample estimates:
mean difference
-0.6270039
Note
Many biological data are log-normally distributed. This can mean that the ratio between two group means changes consistently than the difference between them.
For paired ratio t-tests, the null hypothesis is that ratio of the two group means = 1. That is \(mean1/mean2 = 1\).
Therefore, calculations with log-transformed data can be carried out as usual t tests. The null hypothesis is now rephrased as: the difference between the log-transformed means = 0. In other words, to test whether the ratio of the raw (untransformed) means of the two groups is consistent, use paired t test on log-transformed data.