Question 6.80

45. Table 6.10 offers four datasets prepared by statistician Frank Anscombe to show the dangers of calculating without first plotting the data.

  1. Without making scatterplots, find the correlation and least-squares regression line for all four datasets. What do you notice? Use the regression line to predict y for x = 10.
  2. Make a scatterplot for each of the datasets and add the regression line to each plot.
  3. In which of the four cases would you be willing to use the regression line to describe the dependence of y on x? Explain your answer in each case.
Table 6.26: TABLE 6.10 Four Datasets for Exploring Correlation and Regression
Dataset A
x  10     8      13     9      11     14     6      4      12     7      5
y  8.04   6.95   7.58   8.81   8.33   9.96   7.24   4.26   10.84  4.82   5.68
Dataset B
x  10     8      13     9      11     14     6      4      12     7      5
y  9.14   8.14   8.74   8.77   9.26   8.10   6.13   3.10   9.13   7.26   4.74
Dataset C
x  10     8      13     9      11     14     6      4      12     7      5
y  7.46   6.77   12.74  7.11   7.81   8.84   6.08   5.39   8.15   6.42   5.73
Dataset D
x  8      8      8      8      8      8      8      8      8      8      19
y  6.58   5.76   7.71   8.84   8.47   7.04   5.25   5.56   7.91   6.89   12.50
Table 6.26: Data from Frank J. Anscombe, Graphs in statistical analysis, The American Statistician, 27 (1973): 17–21.
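
The calculation asked for in part 1 can be sketched in a few lines of Python. This is only one possible approach, using the standard library and the usual sums-of-squares formulas; the names DATASETS and fit_line are illustrative choices and do not come from the exercise. In each dataset the first list is taken to be x and the second y.

```python
from math import sqrt
from statistics import mean

# Data copied from Table 6.10; first list is x, second list is y (assumed order).
DATASETS = {
    "A": ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "B": ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
          [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "C": ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
          [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "D": ([8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 19],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]),
}


def fit_line(x, y):
    """Return (r, intercept, slope) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)          # correlation
    slope = sxy / sxx                  # least-squares slope b
    intercept = y_bar - slope * x_bar  # least-squares intercept a
    return r, intercept, slope


for name, (x, y) in DATASETS.items():
    r, a, b = fit_line(x, y)
    print(f"Dataset {name}: r = {r:.3f}, y-hat = {a:.2f} + {b:.3f}x")
```

Running this should show that the four summaries agree to about two decimal places, which is the point of the exercise: the numbers alone cannot tell the datasets apart.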

45.

(a) All four have correlation r ≈ 0.82 and least-squares regression line ŷ ≈ 3.00 + 0.500x.

(b) [Figure: scatterplots of Datasets A through D, each with its regression line.]

(c) Dataset A; additional answers will vary.
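
One way to explore why the line is less trustworthy for some of the other datasets is to refit after removing a single unusual point. The sketch below does this for Datasets C and D only; it is an illustration rather than the textbook's answer, and the helper names slope_and_r and drop_point are hypothetical.

```python
from math import sqrt
from statistics import mean

# Datasets C and D copied from Table 6.10 (x list, then y list).
x_c = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_c = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x_d = [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 19]
y_d = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]


def slope_and_r(x, y):
    """Return (slope, r), or None when x or y has no spread."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return None  # slope and correlation are undefined
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / sqrt(sxx * syy)


def drop_point(x, y, i):
    """Return copies of x and y with observation i removed."""
    return x[:i] + x[i + 1:], y[:i] + y[i + 1:]


# Dataset C: remove the observation with the largest y, the point (13, 12.74).
full_c = slope_and_r(x_c, y_c)
trimmed_c = slope_and_r(*drop_point(x_c, y_c, y_c.index(max(y_c))))

# Dataset D: remove the only observation whose x differs from 8.
trimmed_d = slope_and_r(*drop_point(x_d, y_d, x_d.index(19)))

print("C, all points:     slope, r =", full_c)
print("C, one point out:  slope, r =", trimmed_c)
print("D, one point out:  slope, r =", trimmed_d)
```

In Dataset C the remaining points fall almost exactly on a line with a smaller slope, and in Dataset D the remaining x values are all equal, so no line can be fitted at all. In both cases a single observation is driving the fitted line.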

Variable: Return
Year  Mean    StDev  Minimum  Q1      Median  Q3     Maximum
2002  −16.03  23.51  −50.50   −26.90  −12.80  −6.70  64.30
2003  37.74   15.78  14.10    27.50   32.30   43.90  71.90
