SECTION 9.1 Summary
- The null hypothesis for $r \times c$ tables of count data is that there is no relationship between the row variable and the column variable.
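- Expected cell counts under the null hypothesis are computed using the formula
$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{n},$$
where $n$ is the total number of observations in the table.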
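- The null hypothesis is tested by the chi-square statistic, which compares the observed counts with the expected counts:
$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}},$$
where the sum runs over all cells of the table.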
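- Under the null hypothesis, the chi-square statistic $X^2$ has approximately the chi-square distribution with $(r-1)(c-1)$ degrees of freedom, where $r$ and $c$ are the numbers of rows and columns in the table. The P-value for the test is
$$P(\chi^2 \ge X^2),$$
where $\chi^2$ is a random variable having the $\chi^2(df)$ distribution with $df = (r-1)(c-1)$.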
- The chi-square approximation is adequate for practical use when the average expected cell count is 5 or greater and all individual expected counts are 1 or greater, except in the case of $2 \times 2$ tables. All four expected counts in a $2 \times 2$ table should be 5 or greater.
- To analyze a two-way table, first compute percents or proportions that describe the relationship between the row and column variables. Then calculate expected counts, the chi-square statistic, and the P-value.
- Two different models for generating $r \times c$ tables lead to the chi-square test. In the first model, independent SRSs are drawn from each of $c$ populations, and each observation is classified according to a categorical variable with $r$ possible values. The null hypothesis is that the distributions of the row categorical variable are the same for all $c$ populations. In the second model, a single SRS is drawn from a population, and observations are classified according to two categorical variables having $r$ and $c$ possible values. In this model, $H_0$ states that the row and column variables are independent.
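
The calculations summarized above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method. The function names and the observed counts are made up for the example. The tail probability is computed with the standard library only, using the recurrence for the chi-square upper tail in integer degrees of freedom. That is a choice made here for self-containment; a statistics package would normally supply the P-value.

```python
import math

def expected_counts(table):
    """Expected count per cell: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_statistic(table, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

def chi_square_sf(x, df):
    """P(chi-square >= x) for integer df >= 1.

    Starts from the closed forms for df = 1 and df = 2, then applies
    sf(x; k + 2) = sf(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    half = x / 2.0
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(half)), 1
    else:
        tail, k = math.exp(-half), 2
    while k < df:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail

def approximation_adequate(expected):
    """Rule of thumb from the summary for using the chi-square approximation."""
    cells = [e for row in expected for e in row]
    if len(expected) == 2 and len(expected[0]) == 2:
        return all(e >= 5 for e in cells)          # 2 x 2 table: all four >= 5
    return sum(cells) / len(cells) >= 5 and all(e >= 1 for e in cells)

def chi_square_test(table):
    """Return (X^2, df, P-value, adequacy flag) for an r x c table of counts."""
    expected = expected_counts(table)
    x2 = chi_square_statistic(table, expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df, chi_square_sf(x2, df), approximation_adequate(expected)

# Hypothetical 2 x 2 table of counts, invented for this illustration.
observed = [[20, 30],
            [30, 20]]
x2, df, p_value, ok = chi_square_test(observed)
print(f"X^2 = {x2:.2f}, df = {df}, P = {p_value:.4f}, adequate = {ok}")
# X^2 = 4.00, df = 1, P = 0.0455, adequate = True
```

In this made-up table every expected count is 25, so each cell contributes $(5)^2/25 = 1$ to the statistic. The sketch deliberately omits the first step of the analysis, computing percents or proportions to describe the relationship, which should still come before the test.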