EXAMPLE 2 Identification of Outliers

Consider two datasets:

Table 6.3 gives the results of an 8-year-old competitive swimmer’s first 14 races of the 50-yard butterfly. (The time for race 10 was not recorded and is therefore missing.)
Table 6.4 shows reading and IQ test scores for a group of fifth-grade students.

TABLE 6.3 A Swimmer’s 50-Yard Butterfly Times for Her First 14 Races

Race number	1	2	3	4	5	6	7
Time (seconds)	60.81	66.11	47.32	42.69	43.40	44.82	42.67
Race number	8	9	10	11	12	13	14
Time (seconds)	45.17	41.20	missing	42.47	41.74	40.40	42.90

TABLE 6.4 Fifth-Grade Students’ IQ and Reading Test Scores

IQ test score	100	102	110	115	118	123	124
Reading test score	40	65	55	70	75	95	45
IQ test score	125	126	130	135	140	143	147
Reading test score	70	85	90	75	95	85	95

Scatterplots of these datasets appear in Figure 6.3 and Figure 6.4, respectively. Outliers have been circled. The scatterplot in Figure 6.3 shows two circled outliers—they are associated with the highest values of the response variable, time, and lowest values of the explanatory variable, race number. Whenever possible, look for explanations for the presence of outliers. In this case, the swimmer had just learned the butterfly, which explains why her times in the first two races (when she was worried about getting disqualified) were unusually slow.
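The two slow races can also be checked numerically. The following is a minimal sketch, not the method used in the text (which identifies the outliers by eye from the scatterplot). It assumes the common 1.5 × IQR rule applied to the response variable alone, and it assumes the quartile definition used by Python's `statistics.quantiles` default. The variable names are illustrative.

```python
from statistics import quantiles

# Times from Table 6.3; race 10 was not recorded, so it is stored as None.
times = [60.81, 66.11, 47.32, 42.69, 43.40, 44.82, 42.67,
         45.17, 41.20, None, 42.47, 41.74, 40.40, 42.90]

# Pair each recorded time with its race number (races are numbered from 1).
recorded = [(race, t) for race, t in enumerate(times, start=1) if t is not None]

# Quartiles of the recorded times (assumption: default "exclusive" method).
q1, _, q3 = quantiles([t for _, t in recorded], n=4)
iqr = q3 - q1

# Assumed rule: a time is unusual if it lies more than 1.5 * IQR
# below the first quartile or above the third quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

flagged = [(race, t) for race, t in recorded
           if t < lower_fence or t > upper_fence]
print(flagged)
```

Under these assumptions the rule flags races 1 and 2 only. Because it ignores race number, it is a check on the times alone and does not replace looking at the scatterplot.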

Figure 6.3 Swimmer’s times for the 50-yard butterfly in consecutive races.

Figure 6.4 Scatterplot of reading test scores against IQ test scores.

The point circled in Figure 6.4 was flagged as an outlier by a statistical program. In this case, the outlier does not correspond to the minimum or maximum value of either the response or the explanatory variable. Instead, the point (124, 45) indicates a reading test score that is low in comparison to the reading test scores of other students with IQ test scores close to 124. Without additional information about this student, we don’t have an explanation for the presence of this outlier.
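The text does not say which rule the statistical program applied. As an illustration only, the sketch below assumes one common approach: fit a least-squares line of reading score on IQ score, divide each residual by the residual standard deviation, and flag points whose standardized residual exceeds 2 in absolute value. The cutoff of 2 and all variable names are assumptions, not taken from the source.

```python
from math import sqrt

# Data from Table 6.4.
iq = [100, 102, 110, 115, 118, 123, 124, 125, 126, 130, 135, 140, 143, 147]
reading = [40, 65, 55, 70, 75, 95, 45, 70, 85, 90, 75, 95, 85, 95]

n = len(iq)
mean_x = sum(iq) / n
mean_y = sum(reading) / n

# Least-squares slope and intercept for reading score against IQ score.
sxx = sum((x - mean_x) ** 2 for x in iq)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(iq, reading))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed reading score minus the score predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(iq, reading)]

# Residual standard deviation, using n - 2 degrees of freedom.
s = sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Assumed cutoff: flag points more than 2 residual SDs from the line.
CUTOFF = 2.0
outliers = [(x, y) for x, y, r in zip(iq, reading, residuals)
            if abs(r / s) > CUTOFF]
print(outliers)
```

With these data and this cutoff, the only point flagged is (124, 45). Its reading score lies well below the fitted line even though neither coordinate is extreme, which matches the description above.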