Chi-Square (goodness of fit)
The chi-square test is based on frequencies (numbers of observations), not on relative abundances or proportions. If, as a result of earlier data manipulation, only proportions or percentages are available, they must be converted back into frequencies.
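The following is a minimal sketch of that conversion, assuming the total sample size is still known. The category names, the percentages and the total of 200 are hypothetical values chosen only for illustration.

```r
# Hypothetical data: only percentages and the total sample size survived
percentages <- c(A = 25, B = 35, C = 40)
n_total <- 200

# Convert back to counts; round() removes floating-point noise
counts <- round(percentages / 100 * n_total)
counts

# Test the counts, not the percentages: chisq.test(percentages) would
# silently analyse a sample of 100 observations instead of 200
chisq.test(counts)
```

Because the chi-square statistic grows in proportion to the sample size (multiplying every observed and expected count by a constant multiplies X-squared by that constant), testing percentages or proportions instead of counts changes both the statistic and the p-value.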
chisq.test {stats}
chisq.test(x)
- x is a vector representing the frequencies in each category
If the theoretical model does not have equal frequencies, we must also supply a vector of probabilities through the p argument:
chisq.test(x, p = probabilities)
- x is a vector representing the frequencies in each category
- probabilities is a vector of the same length as x containing the theoretical probability of each category (not frequencies!); by default these probabilities must sum to 1.
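If the model is only known as a ratio rather than as probabilities, the rescale.p argument of chisq.test can convert it. The sketch below reuses the herbicide counts from the second example with a 1:2:1 ratio. Both calls should return the same statistic, because rescale.p = TRUE simply divides p by its sum.

```r
observed <- c(772, 1611, 737)

# p given as probabilities that sum to 1
fit_prob <- chisq.test(observed, p = c(1/4, 1/2, 1/4))

# p given as a 1:2:1 ratio; rescale.p = TRUE divides p by its sum first
fit_ratio <- chisq.test(observed, p = c(1, 2, 1), rescale.p = TRUE)

# Without rescale.p = TRUE, p = c(1, 2, 1) stops with an error
# because the probabilities do not sum to 1
c(fit_prob$statistic, fit_ratio$statistic)
```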
Chi-Square test (Goodness of fit)
M Claereboudt
December 19, 2015
The chi-square goodness-of-fit test checks whether observed frequencies, distributed among different categories, follow a predicted model. The default model is that all categories are equally likely, i.e. that the theoretical frequencies do not differ. In this first example, we investigate whether the numbers of people who declare that they prefer Coke, Pepsi, Fanta or Mountain Dew are equal. A total of 765 people were interviewed and the following preferences were collected: Coke: 176, Pepsi: 180, Fanta: 187, Mountain Dew: 222.
chisq.test(c(176,180,187,222))
##
## Chi-squared test for given probabilities
##
## data: c(176, 180, 187, 222)
## X-squared = 6.9163, df = 3, p-value = 0.07461
The p-value (0.07461) is > 0.05, so we cannot reject the null hypothesis that people have no particular preference for any of these soft drinks (despite the noticeably larger number observed for Mountain Dew).
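To see what chisq.test computes here, the statistic can be rebuilt by hand. Under the equal-frequency model each expected count is the total divided by the number of categories (765 / 4 = 191.25), X-squared is the sum of (observed - expected)^2 / expected over the categories, and the p-value is the upper tail of a chi-square distribution with k - 1 degrees of freedom. A minimal sketch:

```r
observed <- c(Coke = 176, Pepsi = 180, Fanta = 187, MountainDew = 222)
k <- length(observed)

# Equal-frequency model: every category expects the same count
expected <- rep(sum(observed) / k, k)    # 191.25 each

# Pearson chi-square statistic and its degrees of freedom
x2 <- sum((observed - expected)^2 / expected)
df <- k - 1

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

c(X2 = x2, df = df, p = p_value)
```

Mountain Dew alone contributes about 4.94 (30.75^2 / 191.25) of the 6.92 total, which is why it stands out without making the overall test significant at the 0.05 level.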
For the next example, we use a genetics case: resistance to a herbicide, supposedly controlled by a single gene with incomplete dominance (heterozygous plants are only partly resistant). When the F2 generation of rice is analyzed for resistance, there are 772 resistant plants, 1611 plants with some resistance and 737 plants susceptible to the herbicide. Is this pattern compatible with the 1/4 FF : 1/2 fF : 1/4 ff distribution predicted by the single-gene hypothesis?
chisq.test(c(772, 1611, 737), p = c(1/4, 1/2, 1/4))
##
## Chi-squared test for given probabilities
##
## data: c(772, 1611, 737)
## X-squared = 4.1199, df = 2, p-value = 0.1275
The p-value (0.1275) is well above 0.05, so we cannot reject the null hypothesis: the observed frequencies of resistance are compatible with the single-gene model.
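The object returned by chisq.test also stores the expected counts and the Pearson residuals, (observed - expected) / sqrt(expected). These show which categories deviate most from the model and let you check that the expected counts are not too small (chisq.test warns that the approximation may be incorrect when some expected counts are below 5). A short sketch for the herbicide data:

```r
res <- chisq.test(c(772, 1611, 737), p = c(1/4, 1/2, 1/4))

# Expected counts under the 1:2:1 model (n * p, with n = 3120)
res$expected

# Pearson residuals: (observed - expected) / sqrt(expected)
round(res$residuals, 3)

# The squared residuals add up to the X-squared statistic
sum(res$residuals^2)
```

Here the susceptible class (737 observed versus 780 expected) has the largest residual in absolute value.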