EXAMPLE 9 Beware the Outlier!

Figure 6.15 shows a scatterplot of data that have a strong positive straight-line relationship. In fact, the correlation r is close to the r = 1 of a perfect straight line. The line on the plot is the least-squares regression line for predicting y from x. One point is an extreme outlier in both the x- and y-directions. Let’s examine the influence of this outlier.

First, suppose we omit the outlier. The correlation r for the five remaining points (the cluster at the lower left) is much smaller. The outlier extends the straight-line pattern and greatly increases the correlation.

image
Figure 6.15: The outlier increases the correlation and fixes the location of the least-squares line.

Next, suppose we grab the outlier and pull it straight down, as in Figure 6.16. The least-squares line chases the outlier down, pivoting until it has a negative slope. This is the least-squares idea at work: The line stays close to all six points. However, in this situation its location is determined almost entirely by the one outlier. Of course, the correlation r is now also negative. Never trust a correlation or a regression line if you have not plotted the data.
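To see the same effect numerically, here is a minimal Python sketch. The six points are hypothetical values chosen only to mimic the pattern in the figures (a five-point cluster plus one extreme outlier); they are not the data plotted in Figures 6.15 and 6.16, and the helper name `summarize` is ours. The sketch computes the correlation r and the least-squares line for three cases: the cluster alone, the cluster with the outlier, and the cluster with the outlier pulled straight down.

```python
import math


def summarize(points):
    """Return (r, slope, intercept) for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    # Sums of squared deviations and of cross-products about the means.
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    s_yy = sum((y - y_bar) ** 2 for _, y in points)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    r = s_xy / math.sqrt(s_xx * s_yy)       # correlation
    slope = s_xy / s_xx                     # least-squares slope for predicting y from x
    intercept = y_bar - slope * x_bar       # the line passes through (x_bar, y_bar)
    return r, slope, intercept


# Hypothetical data (an assumption for illustration, not the textbook's values).
cluster = [(1, 2), (2, 1), (3, 3), (1, 3), (3, 1)]
outlier = (10, 10)        # extreme in both the x- and y-directions
moved_outlier = (10, -6)  # same x, pulled straight down

r_without, slope_without, _ = summarize(cluster)
r_with, slope_with, _ = summarize(cluster + [outlier])
r_moved, slope_moved, _ = summarize(cluster + [moved_outlier])

for label, r, slope in [
    ("cluster only", r_without, slope_without),
    ("cluster + outlier", r_with, slope_with),
    ("cluster + moved outlier", r_moved, slope_moved),
]:
    print(f"{label:25s} r = {r:+.3f}   slope = {slope:+.3f}")
```

With these made-up points, the cluster alone has only a weak correlation, adding the outlier produces a strong positive correlation, and pulling the outlier down flips both r and the slope of the least-squares line to negative. A single point accounts for all of these changes.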

One way to explore this concept is to use the Correlation and Regression applet. Applet Exercise 1 (page 288) asks you to animate the situation shown in Figures 6.15 and 6.16 so that you can watch r change and the regression line move as you pull the outlier down.

image
Figure 6.16: Moving the outlier unduly changes the correlation and moves the least-squares line.