Correlation | Introduction to R

Correlation tests (2 variables)

Description

This test is used to establish whether two variables are significantly correlated or not. It is based on paired data (two observations on the same statistical objects). The test can calculate the significance of a correlation based on three indices:

Pearson’s correlation
Spearman’s rank correlation
Kendall’s rank correlation

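Pearson’s correlation
Under the null hypothesis of zero correlation, and assuming the two variables are approximately bivariate normal, the statistic t = r √(n − 2) / √(1 − r²), computed from the sample correlation coefficient r, follows a t-distribution with n − 2 degrees of freedom.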
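
As a rough check of the transformation described above, the t statistic can be computed by hand and compared with what cor.test reports. This is only a sketch: the choice of the two iris columns and the names t_manual and p_manual are illustrative assumptions, not part of the original tutorial.

```r
# Sample data: two numeric columns of the built-in iris data set
x <- iris$Sepal.Length
y <- iris$Sepal.Width

n <- length(x)
r <- cor(x, y, method = "pearson")

# Transform r into a t statistic with n - 2 degrees of freedom
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t-distribution
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

# The same quantities as reported by cor.test()
ct <- cor.test(x, y, method = "pearson")

c(manual = t_manual, cor_test = unname(ct$statistic))
c(manual = p_manual, cor_test = ct$p.value)
```
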
Spearman and Kendall correlation
These two tests are based on the ranks of the observations, not on their numerical values. If n is small and there are no ties in the ranks, an exact p-value can be calculated (exact = TRUE). If n is larger, the exact p-value cannot be calculated, but good approximations are available (exact = FALSE). If exact is not specified, R attempts an exact p-value when the sample is small enough; when it cannot compute one, for instance because the dataset contains equal ranks (ties), it falls back on an approximation and issues a warning.
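
The behaviour described above can be tried on a small example. The vectors below are made-up illustrative values (an assumption, not data from the tutorial): the first pair has no ties, so an exact p-value is available, while the second contains a tie and should trigger a warning when an exact p-value is requested.

```r
# Small made-up sample without ties (illustrative values only)
x <- c(1.2, 2.5, 3.1, 4.8, 5.0, 6.7, 7.3, 8.9)
y <- c(2.1, 1.9, 3.5, 4.0, 6.2, 5.8, 8.1, 7.7)

# Exact p-value versus approximation for Kendall's test
p_exact  <- cor.test(x, y, method = "kendall", exact = TRUE)$p.value
p_approx <- cor.test(x, y, method = "kendall", exact = FALSE)$p.value
c(exact = p_exact, approx = p_approx)

# Same data with one tie introduced in x
x_ties <- c(1.2, 2.5, 2.5, 4.8, 5.0, 6.7, 7.3, 8.9)

# Capture the warning issued when an exact p-value is requested with ties
warn_msg <- NULL
res_ties <- withCallingHandlers(
  cor.test(x_ties, y, method = "spearman", exact = TRUE),
  warning = function(w) {
    warn_msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
warn_msg
res_ties$p.value
```
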

Enter the name for this tabbed section: R Syntax

In R, all three tests are performed with the same function and the same basic syntax:

cor.test{stats}

cor.test(var1, var2, method = c("pearson", "spearman", "kendall"), alternative = c("two.sided", "less", "greater"), exact = TRUE/FALSE)

method selects the type of correlation test to be performed (one of three choices: "pearson", "spearman", "kendall").
alternative is one of "two.sided", "less", "greater". This parameter defines the alternative hypothesis to be used: "two.sided" means H1: correlation ≠ 0, "greater" means H1: correlation > 0, and "less" means H1: correlation < 0.
exact = TRUE requests an exact p-value (Spearman or Kendall tests, n relatively small, no ties between ranks). R will issue a warning if the exact test is not possible. exact = FALSE requests a p-value based on an approximation.
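
To see how the method and alternative arguments described above change the result, one possible sketch loops over their allowed values. The pair of iris columns and the variable names (estimates, p_values) are illustrative assumptions; exact = FALSE is set because these measurements contain ties.

```r
# Two numeric columns of the built-in iris data set
x <- iris$Sepal.Length
y <- iris$Petal.Length

# Same call, three methods; exact = FALSE because the data contain ties
methods <- c("pearson", "spearman", "kendall")
estimates <- sapply(methods, function(m) {
  unname(cor.test(x, y, method = m, exact = FALSE)$estimate)
})
estimates

# Same method, three alternative hypotheses
alternatives <- c("two.sided", "less", "greater")
p_values <- sapply(alternatives, function(a) {
  cor.test(x, y, method = "pearson", alternative = a)$p.value
})
p_values
```

Because these two variables are strongly positively correlated, the "greater" alternative should give a very small p-value and the "less" alternative a p-value close to 1.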

Code Example

The correlation test function (cor.test) takes the same main arguments as the correlation function (cor), but it returns the result of a test of the null hypothesis that the correlation is 0. The test itself depends on the method chosen.

# Correlation tests
# Let's first load the iris data set
data(iris)
head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

We can see that the dataset includes 4 numerical variables. We can use any pair of numerical variables in a correlation test.
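
As a quick sketch of how the numerical variables and their possible pairs could be listed before choosing a test, assuming the built-in iris data set (the names num_cols and pairs are illustrative):

```r
data(iris)

# Identify the numeric columns (Species is a factor and is excluded)
num_cols <- sapply(iris, is.numeric)
names(iris)[num_cols]

# Pairwise Pearson correlation coefficients between the numeric columns
round(cor(iris[, num_cols]), 2)

# All pairs of numeric variables that could be passed to cor.test()
pairs <- combn(names(iris)[num_cols], 2)
pairs
```
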

Our first example applies Spearman’s test to Sepal.Length and Petal.Length. More specifically, we test the null hypothesis correlation = 0 against the alternative hypothesis correlation > 0.

# To calculate a correlation we use the cor() function.
# To perform a correlation test we use cor.test(), which takes the same main arguments.
res <- cor.test(iris$Sepal.Length, iris$Petal.Length, method = "spearman", alternative = "greater", exact = FALSE)
res

## 
##  Spearman's rank correlation rho
## 
## data:  iris$Sepal.Length and iris$Petal.Length
## S = 66429, p-value < 2.2e-16
## alternative hypothesis: true rho is greater than 0
## sample estimates:
##       rho 
## 0.8818981

Spearman’s test uses the exact distribution of the statistic under the null hypothesis if n is small and there are no equal ranks in the measurements. Otherwise, the test uses an approximation (in R, an asymptotic t approximation). We can force the approximation with exact = FALSE, as we have done here; the iris measurements contain many tied values, so an exact p-value is not available anyway. A similar approach is used for Kendall’s correlation test. To display the results of the test, we type the name of the R object that stores them. In our case, the null hypothesis is strongly rejected (p-value < 2.2e-16): the data strongly support a positive rank correlation between these two variables.
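
Besides printing the whole object, individual parts of the result can be extracted by name, since cor.test returns a list. A minimal sketch, repeating the call above so that it runs on its own (the 0.05 threshold and the name rejected are illustrative assumptions):

```r
res <- cor.test(iris$Sepal.Length, iris$Petal.Length,
                method = "spearman", alternative = "greater", exact = FALSE)

res$estimate     # Spearman's rho
res$statistic    # test statistic S
res$p.value      # p-value of the one-sided test
res$alternative  # alternative hypothesis that was tested

# Decision at an assumed 5% significance level
rejected <- res$p.value < 0.05
rejected
```
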

We can also test whether Pearson’s correlation is 0. We will use Sepal.Length and Sepal.Width for this test, with the default two-sided alternative.

# To calculate correlation we use the cor( ) function.
# To perform a correlation test, we use the cor.test( ) which has the same arguments
res2 <- cor.test(iris$Sepal.Length, iris$Sepal.Width, method = "pearson")
res2

## 
##  Pearson's product-moment correlation
## 
## data:  iris$Sepal.Length and iris$Sepal.Width
## t = -1.4403, df = 148, p-value = 0.1519
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.27269325  0.04351158
## sample estimates:
##        cor 
## -0.1175698

Here the p-value (0.1519) is larger than 0.05 and the 95 percent confidence interval includes 0, so we do not reject the null hypothesis: the data do not provide evidence that the correlation differs from 0.
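
The same conclusion can be reached programmatically from the components of the test result. The sketch below repeats the Pearson test so that it runs on its own; the names ci, contains_zero, alpha and reject are illustrative, and the significance level is taken from the confidence level stored with the interval.

```r
res2 <- cor.test(iris$Sepal.Length, iris$Sepal.Width, method = "pearson")

# 95 percent confidence interval for the correlation
ci <- res2$conf.int
ci

# Does the interval include 0?
contains_zero <- ci[1] < 0 && ci[2] > 0
contains_zero

# Significance level implied by the confidence level of the interval
alpha <- 1 - attr(ci, "conf.level")

# Reject the null hypothesis only if the p-value is below alpha
reject <- res2$p.value < alpha
reject
```
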
