Chi-Square (Independence)

Description
This chi-square test is used to examine whether two categorical variables are independent of each other. In other words, it assesses whether the distribution of observations across the levels of one categorical variable is similar among the levels of a second categorical variable.
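The statistic can be written in standard notation. The symbols below are introduced here and do not appear elsewhere on this page. For a table with r rows and c columns, let O_ij be the observed frequency in row i and column j, R_i and C_j the row and column totals, and N the grand total. The expected frequency under independence, the test statistic, and its degrees of freedom are then:

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (r - 1)(c - 1)
```

When the variables are independent, the statistic approximately follows a chi-square distribution with df degrees of freedom. Large values of the statistic therefore give small p-values.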
The chi-square test is based on frequencies (numbers of observations), not on relative abundances or proportions. If earlier data manipulation left only proportions or percentages, they must be converted back into frequencies.
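Below is a minimal sketch of converting percentages back to frequencies. It assumes you know the total number of observations in each row. Without those totals, the counts cannot be recovered. The names pct and n_row and all the numbers are hypothetical:

```r
# Hypothetical row percentages (each row sums to 100); values are made up
pct <- rbind(groupA = c(40, 35, 25),
             groupB = c(30, 30, 40))

# Hypothetical number of observations in each row (must be known)
n_row <- c(groupA = 200, groupB = 150)

# percentage / 100 * row total; n_row is recycled down the columns,
# so row i is multiplied by n_row[i]
counts <- round(pct / 100 * n_row)
counts
```

The rounding step absorbs small errors that appear when the available percentages were themselves rounded.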
R Syntax
In R, the test can be performed with the following syntax:

chisq.test {stats}

chisq.test(t, correct = TRUE / FALSE)
  • t is a table of frequencies.
In this syntax, R expects the frequencies in tabular format (a table or a matrix). If the data are stored in a data.frame with one row per observation, you can calculate the frequencies with the table( ) function. The argument correct = TRUE (the default) applies Yates' continuity correction. The correction only affects 2x2 tables. Set correct = FALSE to omit it.
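Here is a short sketch of building the frequency table from raw data with table(). The data.frame obs and its columns group and answer are hypothetical:

```r
# Hypothetical raw data: one row per observation, two categorical variables
obs <- data.frame(
  group  = rep(c("A", "B"), times = c(60, 40)),
  answer = c(rep(c("yes", "no"), times = c(35, 25)),   # answers in group A
             rep(c("yes", "no"), times = c(15, 25)))   # answers in group B
)

# Cross-tabulate the two columns into a contingency table of counts
freq <- table(obs$group, obs$answer)
freq

# Pass the table to the test (2x2, so Yates' correction is applied by default)
res <- chisq.test(freq)
res
```

chisq.test() can also take the two vectors directly, as chisq.test(obs$group, obs$answer). In that case, it builds the same contingency table internally.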
Code Example
Chi-Square test (Independence)

The chi-square test for independence tests whether the observed frequencies distributed among the categories of two categorical variables are independent of each other. In this first example, we assess whether the frequency of breast self-examination varies with age. First, we need to create the data:

row1 = c(91,90,51)          # first row of data
row2 = c(150,200,155)     # second row of data
row3 = c(109,198,172)     # third row of data
data.table = rbind(row1, row2, row3)   # binds the rows together
data.table   # display the content of the table
##      [,1] [,2] [,3]
## row1   91   90   51
## row2  150  200  155
## row3  109  198  172

We are now ready to carry out the test.

chisq.test(data.table)
## 
##  Pearson's Chi-squared test
## 
## data:  data.table
## X-squared = 25.086, df = 4, p-value = 4.835e-05

The p-value is very small, so we reject the null hypothesis of independence: the frequency of breast self-examination appears to vary with age. Note the degrees of freedom (df) of the test. Since the table has 3 rows and 3 columns of frequencies, df = (3-1)x(3-1) = 2x2 = 4.
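To see where these numbers come from, here is a sketch that computes the expected counts, the statistic, the df and the p-value by hand for the same table. The variable names are chosen here for illustration. The results should match the X-squared, df and p-value reported by chisq.test() above:

```r
# Observed frequencies from the first example
observed <- rbind(c(91, 90, 51),
                  c(150, 200, 155),
                  c(109, 198, 172))

# Expected counts under independence: row total x column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-square statistic
x2 <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) x (columns - 1)
df_test <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(x2, df = df_test, lower.tail = FALSE)

c(statistic = x2, df = df_test, p.value = p_value)
```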

As a second example, we use a 2x2 table. This time the data are entered with a slightly different, but entirely equivalent, method.

x <- c(28, 20, 289, 276)  # put the data in a vector, by rows
xt <- matrix(x, nrow = 2, ncol = 2, byrow = TRUE)  # arrange into a 2x2 matrix, filling by rows
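As a quick check that the two entry methods are equivalent, the sketch below enters the same 2x2 table row by row with rbind(), as in the first example, and compares it with the matrix() version. The names xt_rbind and xt_matrix are illustrative:

```r
# Row-by-row entry, as in the first example (unnamed rows)
xt_rbind <- rbind(c(28, 20),
                  c(289, 276))

# Single vector reshaped into a matrix, filled by rows
xt_matrix <- matrix(c(28, 20, 289, 276), nrow = 2, ncol = 2, byrow = TRUE)

identical(xt_rbind, xt_matrix)   # TRUE
```

Without byrow = TRUE, matrix() fills by columns, and the vector would then have to be listed column by column instead.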

We can now use the chi-square test to assess whether the rows are independent of the columns. Because each categorical variable has only two levels, we apply Yates' continuity correction (correct = TRUE, which is also the default for 2x2 tables).

chisq.test(xt, correct = TRUE)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  xt
## X-squared = 0.64909, df = 1, p-value = 0.4204

The high p-value (0.42) gives no evidence against independence: the observed frequencies across the columns appear similar between the two rows.
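To see what Yates' correction does, the sketch below computes the 2x2 statistic by hand, with and without the correction. It subtracts 0.5 from each |O - E|, capped so the subtraction never exceeds |O - E|, which matches how chisq.test() applies it. Variable names are illustrative. The two values should equal the statistics from chisq.test(xt, correct = TRUE) and chisq.test(xt, correct = FALSE):

```r
xt <- matrix(c(28, 20, 289, 276), nrow = 2, ncol = 2, byrow = TRUE)

# Expected counts under independence
expected <- outer(rowSums(xt), colSums(xt)) / sum(xt)
abs_dev  <- abs(xt - expected)   # |O - E| is the same in all four cells of a 2x2 table

# Yates' correction: subtract 0.5 from |O - E| (never more than |O - E| itself)
yates    <- min(0.5, abs_dev)
x2_yates <- sum((abs_dev - yates)^2 / expected)

# Uncorrected Pearson statistic, for comparison
x2_plain <- sum(abs_dev^2 / expected)

c(yates = x2_yates, uncorrected = x2_plain)
```

The correction always makes the statistic smaller, and therefore the p-value larger. This makes the test more conservative for small 2x2 tables.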