t-test for the difference in means (2 samples, equal variances)
From a biologist's standpoint, this is possibly the most commonly performed test in experimental biology.
The test is a generalisation of the normal test for the difference in means described for large samples. In large samples, the sampling distribution of the mean follows a normal distribution with a mean equal to the mean of the population and a standard deviation equal to the standard deviation of the population divided by the square root of n.
When n is small and the population standard deviation has to be estimated from the sample, the standardised sample mean follows a t-distribution with n-1 degrees of freedom instead. For the two-sample test with equal variances, the degrees of freedom are n1 + n2 - 2.
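To see how the t-distribution approaches the normal distribution as the sample size grows, one can compare their critical values. The short sketch below is only an illustration: the degrees of freedom are chosen arbitrarily, and qt() and qnorm() are the quantile functions of the stats package.

```r
# Two-sided 5% critical values: t-distribution vs. normal distribution
df_values <- c(5, 17, 100, 1000)      # arbitrary degrees of freedom, for illustration only
t_crit <- qt(0.975, df = df_values)   # critical values of the t-distribution
z_crit <- qnorm(0.975)                # critical value of the standard normal distribution

# The t critical values are larger for small df and shrink towards z_crit
data.frame(df = df_values, t_crit = t_crit, z_crit = z_crit)
```

With few degrees of freedom, a larger value of the statistic is therefore needed to reject the null hypothesis than under the normal approximation.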
t.test{stats}
- var.equal = TRUE / FALSE defines either this test (TRUE) or the Welch test which does not assume equality of variance (FALSE).
- alternative = one of "two.sided", "less", "greater". This parameter defines the type of alternative hypothesis to be used: "two.sided" for a two-sided (bilateral) test, and "greater" or "less" for one-sided (unilateral) tests.
- paired = TRUE / FALSE indicates whether the observations of the two samples are linked (paired). The default is FALSE, which corresponds to independent samples.
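As an illustration of the var.equal parameter, the sketch below runs both variants on the same data and compares their degrees of freedom. The two vectors are the ones used in the examples of this section; the object names student and welch are chosen here for illustration only.

```r
# Sample data taken from the examples of this section
v1 <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
v2 <- c(185, 169, 173, 173, 188, 186, 175, 174, 179)

student <- t.test(v1, v2, var.equal = TRUE)    # pooled-variance test
welch   <- t.test(v1, v2, var.equal = FALSE)   # Welch test (the default of t.test)

student$method      # name of the test that was run
welch$method
student$parameter   # degrees of freedom: n1 + n2 - 2
welch$parameter     # degrees of freedom estimated from the data, usually not an integer
```

The Welch test never has more degrees of freedom than the pooled test, which is the price paid for dropping the equal-variance assumption.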
The t-test exists in several versions and can be called with several syntaxes. The first syntax for a 2-sample test is the one closest to the textbook example, in which the data are held in 2 independent vectors (i.e. vectors that are not attached to any larger object such as a data frame).
# Let's create two variables v1 and v2 corresponding to two samples
#
v1 <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
v2 <- c(185, 169, 173, 173, 188, 186, 175, 174, 179)
# We can now carry out the t-test (assuming equality of variance)
#
t.test(v1, v2, var.equal = TRUE, alternative = "two.sided")
##
## Two Sample t-test
##
## data: v1 and v2
## t = -0.84472, df = 17, p-value = 0.41
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.192439 4.792439
## sample estimates:
## mean of x mean of y
## 174.8 178.0
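The values reported by t.test can be checked by hand. The following is a minimal sketch of the pooled-variance calculation, assuming the textbook formulas for the equal-variance case; the intermediate names (sp2, se, dof) are introduced here for illustration only.

```r
v1 <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
v2 <- c(185, 169, 173, 173, 188, 186, 175, 174, 179)
n1 <- length(v1)
n2 <- length(v2)

# Pooled variance: average of the two sample variances, weighted by their df
sp2 <- ((n1 - 1) * var(v1) + (n2 - 1) * var(v2)) / (n1 + n2 - 2)

# Standard error of the difference in means
se <- sqrt(sp2 * (1 / n1 + 1 / n2))

t_stat  <- (mean(v1) - mean(v2)) / se     # test statistic
dof     <- n1 + n2 - 2                    # degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = dof) # two-sided p-value

c(t = t_stat, df = dof, p = p_value)
```

The three values should agree with the t, df and p-value lines of the output above, up to rounding.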
In this particular example, we do not reject the null hypothesis (as p > 0.05).
The second example uses another syntax, in which the data are located in a data frame described by two variables: the continuous variable that is the actual measurement (Height) and a factor variable (Group) whose 2 levels describe the groups. The file should be located in the current working directory.
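# Let's now use the same two vectors but in a data frame
# read from a csv file.
# stringsAsFactors = TRUE makes Group a factor (the default is FALSE since R 4.0.0)
dt <- read.csv("HeightCm.csv", stringsAsFactors = TRUE)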
str(dt)
## 'data.frame': 19 obs. of 2 variables:
## $ Group : Factor w/ 2 levels "G1","G2": 1 1 1 1 1 1 1 1 1 1 ...
## $ Height: int 175 168 168 190 156 181 182 175 179 174 ...
# we will use the second syntax now
t.test(Height ~ Group, data = dt, var.equal = TRUE, alternative = "two.sided")
##
## Two Sample t-test
##
## data: Height by Group
## t = -0.84472, df = 17, p-value = 0.41
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.192439 4.792439
## sample estimates:
## mean in group G1 mean in group G2
## 174.8 178.0
Note the ~ syntax, which reads: compare the mean of Height as a function of Group, both located in the data frame dt. This syntax is quite flexible and allows the successive comparison of different pairs of variables located in the same data frame. The results are of course identical: the same analysis was performed.
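If the csv file is not at hand, a data frame with the same layout can be rebuilt from the two vectors. This is only a sketch: the name dt2 is hypothetical, and the column names and factor levels are assumed from the str() output shown earlier.

```r
v1 <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
v2 <- c(185, 169, 173, 173, 188, 186, 175, 174, 179)

# Hypothetical reconstruction of the data frame: one row per measurement
dt2 <- data.frame(
  Group  = factor(rep(c("G1", "G2"), times = c(length(v1), length(v2)))),
  Height = c(v1, v2)
)

res_formula <- t.test(Height ~ Group, data = dt2, var.equal = TRUE)
res_vectors <- t.test(v1, v2, var.equal = TRUE)

# Both syntaxes run the same analysis, so the statistics coincide
all.equal(res_formula$statistic, res_vectors$statistic)
```

The order of the factor levels matters: the difference tested is the mean of the first level (G1) minus the mean of the second level (G2).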
# to test if the difference (mean 1 - mean 2) is < 0, we would use the following syntax
t.test(Height ~ Group, data = dt, var.equal = TRUE, alternative = "less")
##
## Two Sample t-test
##
## data: Height by Group
## t = -0.84472, df = 17, p-value = 0.205
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf 3.390008
## sample estimates:
## mean in group G1 mean in group G2
## 174.8 178.0
The use of alternative = "less" tests whether the difference between the means of the first and second levels of the factor is negative. If we wanted to check for a positive difference (mean 1 > mean 2), we would have used alternative = "greater".
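The relationship between the three alternatives can be checked directly on the p-values. The sketch below uses the two-vector syntax with the same data; the names p_two, p_less and p_greater are introduced here for illustration only.

```r
v1 <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
v2 <- c(185, 169, 173, 173, 188, 186, 175, 174, 179)

p_two     <- t.test(v1, v2, var.equal = TRUE, alternative = "two.sided")$p.value
p_less    <- t.test(v1, v2, var.equal = TRUE, alternative = "less")$p.value
p_greater <- t.test(v1, v2, var.equal = TRUE, alternative = "greater")$p.value

# The two one-sided p-values are complementary
p_less + p_greater

# Here the observed difference is negative, so "less" gives half the two-sided p-value
c(two.sided = p_two, less = p_less, greater = p_greater)
```

The choice between "less" and "greater" should be made before looking at the data, since picking the direction afterwards halves the p-value artificially.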