2.168 Simpson’s paradox and regression. Simpson’s paradox occurs when a relationship between variables that holds within each group of observations reverses direction when all of the data are combined. The phenomenon is usually discussed in terms of categorical variables, but it can also occur with quantitative variables, as in regression. Here is an example:
y | x | Group |
---|---|---|
10.1 | 1 | 1 |
8.9 | 2 | 1 |
8.0 | 3 | 1 |
6.9 | 4 | 1 |
6.1 | 5 | 1 |
18.3 | 6 | 2 |
17.1 | 7 | 2 |
16.2 | 8 | 2 |
15.1 | 9 | 2 |
14.3 | 10 | 2 |
(a) Make a scatterplot of the data for Group 1. Find the least-squares regression line and add it to your plot. Describe the relationship between y and x for Group 1.
(b) Do the same for Group 2.
(c) Make a scatterplot using all 10 observations. Find the least-squares line and add it to your plot.
(d) Make a plot with all of the data using different symbols for the two groups. Include the three regression lines on the plot. Write a paragraph about Simpson’s paradox for regression using this graphical display to illustrate your description.
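The three least-squares fits asked for in parts (a) through (c) can be checked numerically. The following is a minimal sketch in plain Python, not a prescribed solution: the helper name `least_squares` and the variable names are assumptions introduced here for illustration, and the slope is computed from the usual formula b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², with intercept a = ȳ − b x̄. Plotting is left out so that the block needs no external libraries.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Data copied from the table in the exercise.
x1 = [1, 2, 3, 4, 5]
y1 = [10.1, 8.9, 8.0, 6.9, 6.1]
x2 = [6, 7, 8, 9, 10]
y2 = [18.3, 17.1, 16.2, 15.1, 14.3]

# Fit each group separately, then all 10 observations together.
fits = {
    "Group 1": least_squares(x1, y1),
    "Group 2": least_squares(x2, y2),
    "Combined": least_squares(x1 + x2, y1 + y2),
}

for label, (a, b) in fits.items():
    print(f"{label}: intercept = {a:.2f}, slope = {b:.2f}")
```

With these data, the two within-group slopes come out negative while the slope for the combined data is positive, which is the reversal that part (d) asks you to describe. The same three lines can be drawn on the scatterplot with whatever plotting software you use for the exercise.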