Chi-Square (Independence)
The chi-square test is based on frequencies (counts of observations), not on relative abundances or proportions. If earlier data manipulation has left only proportions or percentages, they must be converted back into frequencies before the test is run.
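As a minimal sketch of that back-conversion, assuming the total number of observations is still known, each proportion can be multiplied by the total and rounded to a whole count. The object names (counts, props, n, recovered) are illustrative only, and the counts are those of the first example on this page.

```r
# Illustrative counts (the 3x3 table used in the first example of this page)
counts <- rbind(c(91, 90, 51),
                c(150, 200, 155),
                c(109, 198, 172))
n <- sum(counts)               # total number of observations
props <- counts / n            # what remains if only proportions were kept

# Back-transform: proportion x total, rounded to remove floating-point noise
recovered <- round(props * n)

chisq.test(recovered)          # the test is run on counts, never on props
```

The statistic depends on the sample size, so running the test on props directly would give a misleading result.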
chisq.test {stats}
- t is a table (or matrix) of frequencies, passed as the first argument of chisq.test.
Chi-Square test (Independence)
M Claereboudt
December 19, 2015
The chi-square test for independence tests whether the observed frequencies, cross-classified by two categorical variables, are consistent with the two variables being independent of each other. In this first example, we assess whether the frequency of breast self-examination varies with age. We first need to create the data:
row1 = c(91,90,51) # first row of data
row2 = c(150,200,155) # second row of data
row3 = c(109,198,172) # third row of data
data.table = rbind(row1, row2, row3) # binds the rows together
data.table # display the content of the table
##      [,1] [,2] [,3]
## row1   91   90   51
## row2  150  200  155
## row3  109  198  172
We are now ready to carry out the test.
chisq.test(data.table)
##
## Pearson's Chi-squared test
##
## data: data.table
## X-squared = 25.086, df = 4, p-value = 4.835e-05
The p-value is very small, so we reject the null hypothesis of independence: the frequency of breast self-examination appears to vary with age. Note the df of the test. Since the table has 3 rows and 3 columns of frequencies, the degrees of freedom = (3-1) x (3-1) = 2 x 2 = 4.
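To make the statistic and its degrees of freedom less of a black box, the same result can be rebuilt by hand. This is only a sketch of the standard Pearson calculation, not the internals of chisq.test; the names obs, expected, x2, df and p are chosen here for illustration.

```r
obs <- rbind(row1 = c(91, 90, 51),
             row2 = c(150, 200, 155),
             row3 = c(109, 198, 172))

# Expected counts under independence: (row total x column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
x2 <- sum((obs - expected)^2 / expected)

# Degrees of freedom: (rows - 1) x (columns - 1)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)

# Upper-tail probability of the chi-square distribution
p <- pchisq(x2, df = df, lower.tail = FALSE)

c(X.squared = x2, df = df, p.value = p)
```

The three values should agree with the X-squared, df and p-value reported by chisq.test above.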
As a second example, we will use a 2x2 table, entered with a slightly different but entirely equivalent method.
x <- c(28, 20, 289, 276) # put the data in a vector, by rows
xt <- matrix(x, nrow = 2, ncol = 2, byrow = TRUE) # fill a 2x2 matrix row by row
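As a quick check that the two entry methods really are equivalent, the sketch below builds the same 2x2 table both ways and compares them cell by cell. The names by_matrix and by_rbind are illustrative.

```r
x <- c(28, 20, 289, 276)

by_matrix <- matrix(x, nrow = 2, ncol = 2, byrow = TRUE)  # vector filled row by row
by_rbind  <- rbind(c(28, 20), c(289, 276))                # rows bound together

by_matrix
all(by_matrix == by_rbind)   # TRUE when both tables hold the same counts
```

Without byrow = TRUE, matrix fills column by column, which would transpose the table entered here.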
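We can now use the chi-square test to assess whether the rows are independent of the columns. Because each categorical variable has only two levels (a 2x2 table, df = 1), we apply Yates' continuity correction. This is the default behaviour of chisq.test for 2x2 tables, so correct = TRUE is written out here only for clarity.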
chisq.test(xt, correct = TRUE)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: xt
## X-squared = 0.64909, df = 1, p-value = 0.4204
The high p-value gives no evidence against independence: the distribution of frequencies across the columns is similar in the two rows.
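To show what Yates' correction does to the statistic, the sketch below recomputes it by hand for the 2x2 table. It uses the simple textbook form, which subtracts 0.5 from every |observed - expected| and therefore assumes each of those differences is at least 0.5 (true for this table). The names expected, x2_yates, x2_plain and p_yates are illustrative.

```r
xt <- matrix(c(28, 20, 289, 276), nrow = 2, byrow = TRUE)

# Expected counts under independence
expected <- outer(rowSums(xt), colSums(xt)) / sum(xt)

# Yates' correction: shrink each |observed - expected| by 0.5 before squaring
x2_yates <- sum((abs(xt - expected) - 0.5)^2 / expected)

# Uncorrected Pearson statistic, for comparison
x2_plain <- sum((xt - expected)^2 / expected)

p_yates <- pchisq(x2_yates, df = 1, lower.tail = FALSE)
c(corrected = x2_yates, uncorrected = x2_plain, p.corrected = p_yates)
```

The corrected statistic is always the smaller of the two, which makes the test more conservative for small tables.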