Chi-Square (goodness of fit)

Description
The chi-square test is used to examine whether the observed frequencies of events correspond to a predefined theoretical model (the goodness of fit of the observations to the predictions of the model).
The chi-square test is based on frequencies (numbers of observations), not on relative abundances or proportions. If, as a result of earlier data manipulation, only proportions or percentages are available, they need to be transformed back into frequencies.
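As a minimal sketch of that back-transformation, assuming the total number of observations is known: the percentages and the total of 200 below are hypothetical values chosen only for illustration.

```r
# Hypothetical percentages per category and hypothetical sample size
percentages <- c(A = 25, B = 35, C = 40)
n_total <- 200

# Convert percentages back to frequencies (counts);
# round() guards against floating-point noise in the product
frequencies <- round(percentages / 100 * n_total)
frequencies

# The counts, not the percentages, are what chisq.test() should receive
result <- chisq.test(frequencies)
result
```

Running the test on the percentages themselves would silently treat them as if they were 100 observations, which changes the test statistic and the p-value.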
R Syntax
In R, the test can be performed using the following syntax:

chisq.test {stats}

chisq.test(x)
  • x is a vector representing the frequencies in each category
In this syntax, R assumes that the theoretical frequencies are equal across categories, i.e. each expected frequency is the total number of observations divided by the number of categories.
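The default model can be checked by hand. The sketch below uses hypothetical counts and compares a manually computed expected frequency with the `expected` component that `chisq.test` stores in its result.

```r
# Hypothetical counts in four categories
x <- c(20, 30, 25, 25)

# Expected frequency under the default model:
# total number of observations / number of categories
expected_manual <- rep(sum(x) / length(x), length(x))
expected_manual

# chisq.test() keeps the expected counts it used in the result object
result <- chisq.test(x)
result$expected
```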
If the theoretical model does not have equal frequencies, we must supply a vector of probabilities:
chisq.test(x, p = probabilities)
  • x is a vector representing the frequencies in each category
  • probabilities is a vector of the same length as x containing the theoretical expected probabilities (not frequencies!).
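The probabilities passed to `p` must sum to 1. When a model is stated as a ratio, it can either be converted to probabilities first or passed as is together with the `rescale.p = TRUE` argument of `chisq.test`. The counts and the 9:3:3:1 ratio below are hypothetical and serve only to illustrate the two options.

```r
# Hypothetical counts in four categories
x <- c(90, 30, 28, 12)

# Theoretical model expressed as a 9:3:3:1 ratio
ratio <- c(9, 3, 3, 1)

# Option 1: convert the ratio to probabilities that sum to 1
probabilities <- ratio / sum(ratio)
res1 <- chisq.test(x, p = probabilities)

# Option 2: pass the ratio itself and let R rescale it to sum to 1
res2 <- chisq.test(x, p = ratio, rescale.p = TRUE)

# Both calls test the same model and give the same statistic
c(res1$statistic, res2$statistic)
```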
Code Example
Chi-Square test (Goodness of fit)

The chi-square test for goodness of fit tests whether observed frequencies distributed among different categories follow a predicted model. The default model is that all categories are equally likely, i.e. that there are no differences in the theoretical frequencies. In this first example, we will investigate whether the numbers of people who declare a preference for Coke, Pepsi, Fanta or Mountain Dew are equal. A total of 765 people were interviewed and the following preferences were collected:
  • Coke: 176
  • Pepsi: 180
  • Fanta: 187
  • Mountain Dew: 222

chisq.test(c(176,180,187,222))
## 
##  Chi-squared test for given probabilities
## 
## data:  c(176, 180, 187, 222)
## X-squared = 6.9163, df = 3, p-value = 0.07461

The p-value is > 0.05, so we cannot reject the null hypothesis that people have no particular preference for any of these soft drinks (despite the considerably larger number observed for Mountain Dew).
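To see where these numbers come from, the statistic can be recomputed by hand from the same counts. This is a sketch of the standard formula, the sum of (observed - expected)^2 / expected, and not a replacement for `chisq.test`; the variable names are chosen here for illustration.

```r
# Observed preferences from the survey above
observed <- c(Coke = 176, Pepsi = 180, Fanta = 187, MountainDew = 222)

# Expected counts under the equal-preference model: 765 / 4 for each drink
expected <- rep(sum(observed) / length(observed), length(observed))

# Chi-square statistic: sum of squared deviations scaled by expected counts
statistic <- sum((observed - expected)^2 / expected)

# Degrees of freedom: number of categories minus one
df <- length(observed) - 1

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)

round(c(statistic = statistic, df = df, p_value = p_value), 4)
```

The rounded values should agree with the X-squared, df and p-value reported by `chisq.test` above.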

For the next example we will use a genetic example concerning resistance to a herbicide (supposedly controlled by a single dominant gene). When the F2 generation of rice is analyzed for resistance, there are 772 resistant plants, 1611 plants with some resistance and 737 plants susceptible to the herbicide. Is this pattern compatible with the single-gene theory, which predicts a distribution of 1/4 FF, 1/2 fF and 1/4 ff?

chisq.test(c(772, 1611, 737), p = c(1/4, 1/2, 1/4))
## 
##  Chi-squared test for given probabilities
## 
## data:  c(772, 1611, 737)
## X-squared = 4.1199, df = 2, p-value = 0.1275

The p-value is > 0.05, so we cannot reject the null hypothesis: the observed frequencies of resistance are compatible with the 1:2:1 distribution predicted by the single-gene theory.
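The object returned by `chisq.test` also holds the expected counts and the Pearson residuals, which show how far each category is from the model. A short sketch using the same rice data (the name `result` is chosen here for illustration):

```r
# Same test as above, with the result stored for inspection
result <- chisq.test(c(772, 1611, 737), p = c(1/4, 1/2, 1/4))

# Expected counts under the 1:2:1 model (total of 3120 plants)
result$expected

# Pearson residuals: (observed - expected) / sqrt(expected)
result$residuals

# The squared residuals add up to the chi-square statistic
sum(result$residuals^2)
```

A residual far from zero would point to the category that departs most from the theoretical ratio.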