Median and quartiles

A simple and effective way to describe center and variability is to give the median and the quartiles. The median is the midpoint, the value that separates the smaller half of the observations from the larger half. The first and third quartiles mark off the middle half of the observations. The quartiles get their name because with the median they divide the observations into quarters—one-quarter of the observations lie below the first quartile, half lie below the median, and three-quarters lie below the third quartile. That’s the idea. To actually get numbers, we need a rule that makes the idea exact.

EXAMPLE 1 Finding the median

We might compare Bonds’s career with that of Hank Aaron, the previous holder of the career record. Here are Aaron’s home run counts for his 23 years in baseball.

13 27 26 44 30 39 40 34 45 44 24 32
44 39 29 44 38 47 34 40 20 12 10

To find the median, first arrange them in order from smallest to largest:

10 12 13 20 24 26 27 29 30 32 34 34
38 39 39 40 40 44 44 44 44 45 47

The second 34, the 12th entry in the list, is the center observation, with 11 observations to its left and 11 to its right. When the number of observations is odd (here n = 23), there is always one observation in the center of the ordered list. This is the median, M = 34.

How does this compare with Bonds’s record? Here are Bonds’s 22 home run counts, arranged in order from smallest to largest:

5 16 19 24 25 25 26 28 33 33 34
34 37 37 40 42 45 45 46 46 49 73

When n is even, there is no one middle observation. But there is a middle pair: the 11th and 12th entries, 34 and 34, have 10 observations on either side. We take the median to be halfway between this middle pair. So Bonds’s median is

M = (34 + 34)/2 = 34

There is a fast way to locate the median in an ordered list: count up (n + 1)/2 places from the beginning of the list. Try it. For Aaron, n = 23 and (n + 1)/2 = 12, so the median is the 12th entry in the ordered list. For Bonds, n = 22 and (n + 1)/2 = 11.5. This means “halfway between the 11th and 12th” entries, so M is the average of these two entries. This “(n + 1)/2 rule” is especially handy when you have many observations. The median of n = 46,940 incomes is halfway between the 23,470th and 23,471st in the ordered list. Be sure to note that (n + 1)/2 does not give the median M, just its position in the ordered list of observations.
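The counting rule, (n + 1)/2, can be sketched in a few lines of Python. This is only an illustration: the function name `median_position` is a hypothetical helper, not something the text defines, and positions are counted from 1 at the smallest observation.

```python
def median_position(n):
    """Return the 1-based position (n + 1)/2 of the median in an ordered list.

    A whole number points at a single entry; a value ending in .5 means
    "halfway between" the two neighboring entries.
    """
    if n < 1:
        raise ValueError("need at least one observation")
    return (n + 1) / 2


aaron_position = median_position(23)        # Aaron: 23 seasons
bonds_position = median_position(22)        # Bonds: 22 seasons
income_position = median_position(46_940)   # the income example

print(aaron_position, bonds_position, income_position)
```

The results are positions, not medians: 12 for Aaron, 11.5 for Bonds, and 23,470.5 for the incomes.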

The median

The median is the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger. To find the median of a distribution:

  1. Arrange all observations in order of size, from smallest to largest.

  2. If the number of observations n is odd, the median is the center observation in the ordered list. Find the location of the median by counting (n + 1)/2 observations up from the bottom of the list.

  3. If the number of observations n is even, the median is the average of the two center observations in the ordered list. The location of the median is again (n + 1)/2 from the bottom of the list.
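The three steps translate directly into code. The sketch below is one possible rendering, assuming the observations arrive as a plain Python list; the function name `median` and the variable names are illustrative choices, and the data are the Aaron and Bonds home run counts from Example 1.

```python
def median(observations):
    """Median by the three-step rule: sort, then take the center entry
    (odd n) or the average of the two center entries (even n)."""
    ordered = sorted(observations)      # step 1: smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    middle = n // 2                     # 0-based index of the entry just past the lower half
    if n % 2 == 1:                      # step 2: odd n, one center observation
        return ordered[middle]
    # step 3: even n, average the two center observations
    return (ordered[middle - 1] + ordered[middle]) / 2


aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32,
         44, 39, 29, 44, 38, 47, 34, 40, 20, 12, 10]
bonds = [5, 16, 19, 24, 25, 25, 26, 28, 33, 33, 34,
         34, 37, 37, 40, 42, 45, 45, 46, 46, 49, 73]

aaron_median = median(aaron)
bonds_median = median(bonds)
print(aaron_median, bonds_median)
```

Both players come out with a median of 34 home runs, matching the hand calculation.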

The Census Bureau website provides data on income inequality. For example, it tells us that in 2013 the median income of Hispanic households was $40,963. That’s helpful but incomplete. Do most Hispanic households earn close to this amount, or are the incomes very variable? The simplest useful description of a distribution consists of both a measure of center and a measure of variability. If we choose the median (the midpoint) to describe center, the quartiles (in particular, the difference between the quartiles) provide natural descriptions of variability. Again, the idea is clear: find the points one-quarter and three-quarters up the ordered list of observations. Again, we need a rule to make the idea precise. The rule for calculating the quartiles uses the rule for the median.

The quartiles Q1 and Q3

To calculate the quartiles:

  1. Arrange the observations in increasing order and locate the median M in the ordered list of observations.

  2. The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median. The overall median is not included in the observations considered to be to the left of the overall median.

  3. The third quartile Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median. The overall median is not included in the observations considered to be to the right of the overall median.

EXAMPLE 2 Finding the quartiles

Hank Aaron’s 23 home run counts are

10 12 13 20 24 26 27 29 30 32 34 34
38 39 39 40 40 44 44 44 44 45 47

There is an odd number of observations, so the median is the one in the middle, M = 34. To find the quartiles, ignore this central observation. The first quartile is the median of the 11 observations to the left of the median in the list. That’s the sixth, so Q1 = 26. The third quartile is the median of the 11 observations to the right of the median. It is Q3 = 44.

Barry Bonds’s 22 home run counts are

5 16 19 24 25 25 26 28 33 33 34 34
37 37 40 42 45 45 46 46 49 73

The median lies halfway between the middle pair. There are 11 observations to the left of this location. The first quartile is the median of these 11 numbers. That’s the sixth, so Q1 = 25. The third quartile is the median of the 11 observations to the right of the overall median’s location, Q3 = 45.
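The two calculations in this example can be checked with a short sketch of the quartile rule. The helper names `median_of_sorted` and `quartiles` are hypothetical, chosen here for illustration; the code assumes at least two observations and leaves the overall median out of both halves when n is odd, as the rule requires.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in increasing order."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def quartiles(observations):
    """Return (Q1, M, Q3) using the median-of-each-half rule."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2                    # number of observations on each side of the median
    lower = ordered[:half]           # entries to the left of the median's location
    upper = ordered[n - half:]       # entries to the right of the median's location
    return (median_of_sorted(lower),
            median_of_sorted(ordered),
            median_of_sorted(upper))


aaron = [10, 12, 13, 20, 24, 26, 27, 29, 30, 32, 34, 34,
         38, 39, 39, 40, 40, 44, 44, 44, 44, 45, 47]
bonds = [5, 16, 19, 24, 25, 25, 26, 28, 33, 33, 34, 34,
         37, 37, 40, 42, 45, 45, 46, 46, 49, 73]

aaron_quartiles = quartiles(aaron)
bonds_quartiles = quartiles(bonds)
print(aaron_quartiles, bonds_quartiles)
```

Slicing with `ordered[n - half:]` is what drops the central observation for odd n: with 23 values it keeps the last 11, and with 22 values it keeps the upper 11 starting right after the midpoint.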

NOW IT’S YOUR TURN

12.1 Babe Ruth. Prior to Hank Aaron, Babe Ruth was the holder of the career record. Here are Ruth’s home run counts for his 22 years in Major League Baseball, arranged in order from smallest to largest:

0 2 3 4 6 11 22 25 29 34 35
41 41 46 46 46 47 49 54 54 59 60

Find the median, first quartile, and third quartile of these counts.

You can use the (n + 1)/2 rule to locate the quartiles when there are many observations. The Census Bureau website tells us that there were 15,811,000 (rounded off to the nearest 1000) Hispanic households in the United States in 2013. If we ignore the roundoff, the median of these 15,811,000 incomes is halfway between the 7,905,500th and 7,905,501st in the list arranged in order from smallest to largest. So the first quartile is the median of the 7,905,500 incomes below this point in the list. Use the (n + 1)/2 rule with n = 7,905,500 to locate the quartile:

(n + 1)/2 = 7,905,501/2 = 3,952,750.5

The average of the 3,952,750th and 3,952,751st incomes in the ordered list falls in the range $20,000 to $24,999 and we estimate the first quartile to be $21,621.

The third quartile is the median of the 7,905,500 incomes above the median. By the (n + 1)/2 rule with n = 7,905,500, this will be the average of the 3,952,750th and 3,952,751st incomes above the median in the ordered list. We find that this falls in the range $65,000 to $69,999 and we estimate the third quartile to be $67,660.
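For a list this long, it can help to see where the quartiles sit in the full ordered list rather than within each half. The sketch below is an illustration only; `quartile_positions` is a hypothetical helper that applies the (n + 1)/2 rule to one half and then shifts the result past the lower half to find Q3.

```python
def quartile_positions(n):
    """1-based positions of Q1 and Q3 in the full ordered list of n values."""
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2                    # observations on each side of the median
    within_half = (half + 1) / 2     # the (n + 1)/2 rule applied to one half
    q1_position = within_half
    # Skip the lower half (and the median itself when n is odd).
    q3_position = (n - half) + within_half
    return q1_position, q3_position


households = 15_811_000              # count from the text, roundoff ignored
q1_pos, q3_pos = quartile_positions(households)
print(q1_pos, q3_pos)
```

Here the first quartile sits at position 3,952,750.5 and the third at position 11,858,250.5, which is 3,952,750.5 places above the 7,905,500 incomes in the lower half. Turning a position into a dollar figure still requires the income data themselves.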

In practice, people use statistical software to compute quartiles. Software can give results that differ from those you will obtain using the method described here, because packages use slightly different rules for placing a quartile that falls between two adjacent values in the ordered list. We have chosen the point halfway between them, but other rules exist, so two packages can give slightly different answers for the same data.
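The effect is easy to see even within a single tool. As an illustration, Python's standard `statistics.quantiles` function offers two interpolation rules through its `method` argument; here they stand in for two different software packages, applied to Bonds's 22 home run counts.

```python
import statistics

bonds = [5, 16, 19, 24, 25, 25, 26, 28, 33, 33, 34,
         34, 37, 37, 40, 42, 45, 45, 46, 46, 49, 73]

# n=4 asks for the three cut points that split the data into quarters:
# the first quartile, the median, and the third quartile.
exclusive = statistics.quantiles(bonds, n=4, method="exclusive")
inclusive = statistics.quantiles(bonds, n=4, method="inclusive")

print("exclusive:", exclusive)
print("inclusive:", inclusive)
```

For these data the "exclusive" rule agrees with the hand calculation (25, 34, 45), while the "inclusive" rule gives 25.25, 34, and 44.25. Neither is wrong; they simply interpolate between adjacent values differently.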