SECTION 2.5 Summary
- A two-way table of counts organizes data classified by two categorical variables. Values of the row variable label the rows that run across the table, and values of the column variable label the columns that run down the table. Two-way tables are often used to summarize large amounts of information by grouping outcomes into categories.
- The row totals and column totals in a two-way table give the marginal distributions of the two individual variables. It is clearer to present these distributions as percents of the table total. Marginal distributions tell us nothing about the relationship between the variables.
- To find the conditional distribution of the row variable for one specific value of the column variable, look only at that one column in the table. Divide each entry in the column by the column total. The resulting proportions sum to 1.
- There is a conditional distribution of the row variable for each column in the table. Comparing these conditional distributions is one way to describe the association between the row and the column variables. It is particularly useful when the column variable is the explanatory variable.
- Bar graphs are a flexible means of presenting categorical data. There is no single best way to describe an association between two categorical variables.
- Mosaic plots are effective graphical displays for two-way tables, particularly when the column variable is an explanatory variable.
- A comparison between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of the third variable are combined. This is Simpson’s paradox. Simpson’s paradox is an example of the effect of lurking variables on an observed association.
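
The arithmetic behind marginal and conditional distributions can be sketched in a few lines of Python. This is only an illustration: the counts, the labels, and the helper names `marginal_distributions` and `conditional_given_column` are all invented here and do not come from the text.

```python
# Hypothetical two-way table of counts, invented for illustration only.
# Rows are values of the row variable; columns are values of the column variable.
row_labels = ["yes", "no"]
col_labels = ["group A", "group B"]
counts = [
    [30, 10],  # row "yes"
    [20, 40],  # row "no"
]


def marginal_distributions(table):
    """Return (row marginal, column marginal) as proportions of the table total."""
    total = sum(sum(row) for row in table)
    row_marginal = [sum(row) / total for row in table]
    col_marginal = [sum(col) / total for col in zip(*table)]
    return row_marginal, col_marginal


def conditional_given_column(table, j):
    """Conditional distribution of the row variable for column j:
    divide each entry in that column by the column total."""
    column = [row[j] for row in table]
    col_total = sum(column)
    return [entry / col_total for entry in column]


row_marginal, col_marginal = marginal_distributions(counts)
cond_a = conditional_given_column(counts, 0)
cond_b = conditional_given_column(counts, 1)

print("row marginal:   ", dict(zip(row_labels, row_marginal)))
print("column marginal:", dict(zip(col_labels, col_marginal)))
print("row variable given", col_labels[0], ":", dict(zip(row_labels, cond_a)))
print("row variable given", col_labels[1], ":", dict(zip(row_labels, cond_b)))
```

In this made-up table the two marginal distributions alone say nothing about how the variables are related. The association shows up only when the two conditional distributions are compared column by column.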
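
Simpson's paradox is easiest to see with numbers. The sketch below uses counts invented purely for illustration (two hypothetical treatments, X and Y, and a hypothetical third variable, severity). It is not data from the text. It shows a comparison that holds within every value of the third variable and reverses when the groups are combined.

```python
# Hypothetical (successes, trials) counts, invented for illustration only.
# Outer key: value of the third (lurking) variable; inner key: treatment.
data = {
    "mild":   {"X": (90, 100),  "Y": (400, 500)},
    "severe": {"X": (300, 500), "Y": (50, 100)},
}

# Success rate for each treatment within each value of the third variable.
within = {
    stratum: {t: successes / trials for t, (successes, trials) in cells.items()}
    for stratum, cells in data.items()
}


def combined_rate(treatment):
    """Success rate after combining the data for all values of the third variable."""
    successes = sum(data[s][treatment][0] for s in data)
    trials = sum(data[s][treatment][1] for s in data)
    return successes / trials


combined = {t: combined_rate(t) for t in ("X", "Y")}

# X has the higher rate within every stratum, yet Y has the higher combined rate.
x_better_in_every_stratum = all(r["X"] > r["Y"] for r in within.values())
y_better_overall = combined["Y"] > combined["X"]

for stratum, rates in within.items():
    print(stratum, rates)
print("combined", combined)
print("reversal:", x_better_in_every_stratum and y_better_overall)
```

The reversal arises because, in these invented counts, treatment X is used mostly on severe cases and treatment Y mostly on mild ones, so severity acts as a lurking variable in the combined comparison.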