The null hypothesis for r×c tables of count data is that there is no relationship between the row variable and the column variable.
Expected cell counts under the null hypothesis are computed using the formula
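$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{n}$$
where $n$ is the total number of observations in the table.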
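As a minimal sketch of the expected-count formula, the following computes row total × column total / n for every cell. The function name `expected_counts` and the 2×3 table of counts are hypothetical, chosen only for illustration.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)  # total number of observations
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical observed counts (2 rows, 3 columns); not data from the text.
observed = [[30, 20, 10],
            [20, 30, 40]]

expected = expected_counts(observed)
print(expected)  # [[20.0, 20.0, 20.0], [30.0, 30.0, 30.0]]
```

The expected counts keep the same row and column totals as the observed table, which is a quick check on the arithmetic.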
The null hypothesis is tested by the chi-square statistic, which compares the observed counts with the expected counts:
$$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
Under the null hypothesis, $X^2$ has approximately the chi-square distribution with $(r-1)(c-1)$ degrees of freedom. The P-value for the test is
$$P(\chi^2 \ge X^2)$$
where $\chi^2$ is a random variable having the $\chi^2(\text{df})$ distribution with $\text{df} = (r-1)(c-1)$.
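The statistic, degrees of freedom, and P-value can be sketched in plain Python as follows. The names `chi2_upper_tail` and `chi_square_test` and the table of counts are hypothetical. The tail probability uses the standard recurrence for the upper incomplete gamma function, which applies to integer degrees of freedom; in practice a statistics library would normally be used instead.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square >= x) for a positive integer df and x >= 0.

    Starts from the closed form for df = 1 or df = 2 and applies
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with y = x / 2.
    """
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)                # df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))     # df = 1
    while a < df / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def chi_square_test(table):
    """Return (X2, df, P-value) for a two-way table of counts.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df, chi2_upper_tail(x2, df)


# Hypothetical observed counts (2 rows, 3 columns); not data from the text.
observed = [[30, 20, 10],
            [20, 30, 40]]
x2, df, p = chi_square_test(observed)
print(round(x2, 4), df)  # 16.6667 2
print(f"{p:.6f}")        # 0.000240
```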
The chi-square approximation is adequate for practical use when the average expected cell count is 5 or greater and all individual expected counts are 1 or greater. The 2×2 table is the exception: all four of its expected counts should be 5 or greater.
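These guidelines translate directly into a small check on the table of expected counts. This is a sketch only; the function name `approximation_is_adequate` and the example tables are hypothetical.

```python
def approximation_is_adequate(expected):
    """Apply the rule of thumb above to a table of expected counts."""
    cells = [e for row in expected for e in row]
    is_two_by_two = len(expected) == 2 and len(expected[0]) == 2
    if is_two_by_two:
        # 2x2 tables: every expected count must be 5 or greater.
        return all(e >= 5 for e in cells)
    # Larger tables: average of 5 or greater, and no cell below 1.
    average = sum(cells) / len(cells)
    return average >= 5 and all(e >= 1 for e in cells)


# Hypothetical expected counts, for illustration only.
print(approximation_is_adequate([[20.0, 20.0, 20.0], [30.0, 30.0, 30.0]]))  # True
print(approximation_is_adequate([[6.0, 4.0], [10.0, 12.0]]))                # False
```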
To analyze a two-way table, first compute percents or proportions that describe the relationship between the row and column variables. Then calculate expected counts, the chi-square statistic, and the P-value.
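The descriptive first step can be sketched as follows: for each column, compute the percent of that column's observations falling in each row, then compare these distributions across columns. The function name `column_percents` and the counts are hypothetical.

```python
def column_percents(table):
    """Percent of each column's total that falls in each row."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * cell / col_totals[j] for j, cell in enumerate(row)]
            for row in table]


# Hypothetical observed counts (2 rows, 3 columns); not data from the text.
observed = [[30, 20, 10],
            [20, 30, 40]]

for row in column_percents(observed):
    print(row)
# [60.0, 40.0, 20.0]
# [40.0, 60.0, 80.0]
```

If the percents in a row differ substantially from column to column, as they do here, the data suggest a relationship that the chi-square test can then assess formally.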
Two different models for generating r×c tables lead to the chi-square test. In the first model, independent SRSs are drawn from each of c populations, and each observation is classified according to a categorical variable with r possible values. In this model, $H_0$ states that the distributions of the row categorical variable are the same for all c populations. In the second model, a single SRS is drawn from a population, and observations are classified according to two categorical variables having r and c possible values. In this model, $H_0$ states that the row and column variables are independent.
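A short derivation shows why both models give the same expected counts. The notation here is an assumption introduced for illustration: $R_i$ is the total for row $i$, $C_j$ is the total for column $j$, and $n$ is the overall total. In the second model, independence means the probability of cell $(i, j)$ is the product of the row and column probabilities, which are estimated by $R_i/n$ and $C_j/n$:

$$\text{expected count}_{ij} = n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n} = \frac{R_i \, C_j}{n}$$

In the first model, the common distribution of the row variable under $H_0$ is estimated by the pooled proportions $R_i/n$, and the sample from population $j$ has size $C_j$:

$$\text{expected count}_{ij} = C_j \cdot \frac{R_i}{n} = \frac{R_i \, C_j}{n}$$

Both expressions reduce to row total × column total / $n$, so the same statistic and the same $(r-1)(c-1)$ degrees of freedom apply under either model.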