EXAMPLE 2 Identification of Outliers

Consider two datasets:

  • Table 6.3 gives the results of an 8-year-old competitive swimmer’s first 14 races of the 50-yard butterfly. (The time for race 10 did not get recorded and hence is missing here.)
  • Table 6.4 shows reading and IQ test scores for a group of fifth-grade students.
TABLE 6.3 A Swimmer’s 50-Yard Butterfly Times for Her First 14 Races
Race number      1      2      3        4      5      6      7
Time (seconds)   60.81  66.11  47.32    42.69  43.40  44.82  42.67
Race number      8      9      10       11     12     13     14
Time (seconds)   45.17  41.20  missing  42.47  41.74  40.40  42.90

TABLE 6.4 Fifth-Grade Students’ IQ and Reading Test Scores
IQ test score        100  102  110  115  118  123  124
Reading test score    40   65   55   70   75   95   45
IQ test score        125  126  130  135  140  143  147
Reading test score    70   85   90   75   95   85   95

Scatterplots of these datasets appear in Figure 6.3 and Figure 6.4, respectively. Outliers have been circled. The scatterplot in Figure 6.3 shows two circled outliers—they are associated with the highest values of the response variable, time, and lowest values of the explanatory variable, race number. Whenever possible, look for explanations for the presence of outliers. In this case, the swimmer had just learned the butterfly, which explains why her times in the first two races (when she was worried about getting disqualified) were unusually slow.
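
The text does not say what rule was used to circle the two points in Figure 6.3. As a rough check, the sketch below applies one common rule of thumb, the 1.5 × IQR rule, to the times in Table 6.3. The choice of rule and the variable names (`races`, `times`, `flagged`) are assumptions made for illustration. Race 10 is left out because its time is missing.

```python
import statistics

# Data from Table 6.3; race 10 is omitted because its time was not recorded.
races = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14]
times = [60.81, 66.11, 47.32, 42.69, 43.40, 44.82, 42.67,
         45.17, 41.20, 42.47, 41.74, 40.40, 42.90]

# Quartiles of the response variable (time).
q1, _median, q3 = statistics.quantiles(times, n=4)
iqr = q3 - q1

# Assumed rule of thumb: a value is a potential outlier if it lies
# more than 1.5 * IQR below Q1 or above Q3.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

flagged = [(race, t) for race, t in zip(races, times)
           if t < low_fence or t > high_fence]
print(flagged)  # [(1, 60.81), (2, 66.11)]
```

Under this rule, only the first two races are flagged, which agrees with the points circled in the scatterplot. Note that the rule looks at the times alone and ignores race number.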

Figure 6.3 Swimmer’s times for the 50-yard butterfly in consecutive races.
Figure 6.4 Scatterplot of reading test scores against IQ test scores.

The point circled in Figure 6.4 was flagged as an outlier by a statistical program. In this case, the outlier does not correspond to the minimum or maximum value of either the response variable or the explanatory variable. Instead, the point (124, 45) represents a reading test score that is low compared with the reading test scores of other students with IQ test scores close to 124. Without additional information about this student, we don’t have an explanation for the presence of this outlier.
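
The text does not state which criterion the statistical program used. One plausible approach, sketched below as an assumption, is to fit a least-squares line of reading score on IQ score and flag any point whose residual is more than 2 residual standard deviations from zero. The threshold of 2 and all variable names are illustrative choices, not the program's documented method.

```python
from math import sqrt

# Data from Table 6.4.
iq = [100, 102, 110, 115, 118, 123, 124, 125, 126, 130, 135, 140, 143, 147]
reading = [40, 65, 55, 70, 75, 95, 45, 70, 85, 90, 75, 95, 85, 95]

n = len(iq)
mean_x = sum(iq) / n
mean_y = sum(reading) / n

# Least-squares slope and intercept for reading = intercept + slope * iq.
sxx = sum((x - mean_x) ** 2 for x in iq)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(iq, reading))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed reading score minus the score predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(iq, reading)]

# Residual standard deviation, with n - 2 degrees of freedom.
s = sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Assumed cutoff: flag points whose residual exceeds 2 * s in absolute value.
flagged = [(x, y) for x, y, r in zip(iq, reading, residuals)
           if abs(r) > 2 * s]
print(flagged)  # [(124, 45)]
```

This criterion compares each reading score with the score predicted for that student's IQ score, so it can flag a point such as (124, 45) even though neither coordinate is extreme on its own.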