5 Exploring Data: Distributions

5.5 Describing Variability: Range and Quartiles

The mean and median provide two different measures of the center of a distribution. But a measure of center alone can be limiting. It would not be comfortable to live in a home with a mean temperature of 70°F if half the time it was 40°F and half the time it was 100°F! Two neighborhoods with a median house price of $193,000 can still be quite different if one has both mansions and modest homes and the other has little variation among houses. We are interested in the variability of house prices, as well as in their centers. A useful numerical description of a distribution needs to consist of both a measure of center and a measure of variability.

The simplest way to measure variability is with the range, which is the difference between the largest and smallest observations. For example, the percentages of Hispanics in the states are as low as 1% (Maine or West Virginia) and as high as 42.3% (New Mexico), so the range would be $42.3\% - 1\% = 41.3\%$. Likewise, the range of the city mileage numbers in Table 5.7 is the highest city mileage minus the lowest. The range tells us the full span of the data, but it may be greatly affected by an outlier: recomputing it without the Toyota Prius gives a different answer.

Range DEFINITION

The range is a measure of variability of a set of observations. It is obtained by subtracting the smallest observation from the largest observation.
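As a minimal sketch of this definition, the short Python function below subtracts the smallest observation from the largest. The function name `value_range` and the mileage figures are hypothetical, chosen only for illustration; they are not the values from Table 5.7.

```python
def value_range(observations):
    """Return the largest observation minus the smallest observation."""
    if not observations:
        raise ValueError("need at least one observation")
    return max(observations) - min(observations)


# Hypothetical city mileages (NOT the Table 5.7 data); the last value is an outlier.
mileages = [16, 18, 19, 20, 21, 22, 48]

print(value_range(mileages))       # 32: the full span, stretched by the outlier
print(value_range(mileages[:-1]))  # 6: the span once the outlier is dropped
```

Dropping a single extreme observation shrinks the range from 32 to 6 in this made-up dataset, which illustrates how sensitive the range can be to an outlier.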

We can improve our description of variability by looking at the spread of the middle half of the data. The first and third quartiles delineate the middle half. At the end of the first quarter of a football game, one quarter of the game is complete. Similarly, the first quartile $Q_1$ of a distribution or dataset is the point that exceeds one-quarter (or 25%) of the values. $Q_1$ is also the 25th percentile. The third quartile $Q_3$ is the point that exceeds three-quarters (or 75%) of the values. (You usually won’t hear the phrase "second quartile" because it’s equivalent to something we already named: the median!) The quartiles break the dataset into four groups with equal numbers of observations. To make the idea of quartiles more exact, we need a procedure to find them.

Finding the Quartiles $Q_1$ and $Q_3$ PROCEDURE

1. Arrange all observations (including any repeated values) in increasing order.
2. Use the median to split the ordered dataset into two halves: a lower half and an upper half. (If the number of values is odd, don’t include the middle observation in either half.)
3. The first quartile, $Q_1$, is the median of the lower half. The third quartile, $Q_3$, is the median of the upper half.
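The procedure above can be sketched in a few lines of Python. This is one possible rendering under stated assumptions: the function name `quartiles` and the two small datasets are hypothetical and are not taken from Table 5.7; `median` comes from Python's standard `statistics` module.

```python
from statistics import median


def quartiles(observations):
    """Return (Q1, Q3) using the median-split procedure described above."""
    ordered = sorted(observations)  # increasing order, repeated values kept
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    # When n is odd, the middle observation belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(upper)


# Hypothetical datasets (NOT the Table 5.7 mileages).
even_data = [22, 18, 25, 20, 19, 21, 24, 23]  # 8 observations
odd_data = even_data + [30]                   # 9 observations

print(quartiles(even_data))  # (19.5, 23.5)
print(quartiles(odd_data))   # (19.5, 24.5)
```

With eight observations, each half has four values, so each quartile is the average of the two middle values of its half. With nine observations, the middle value (22) is set aside before the halves are formed.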


EXAMPLE 11 Finding Quartiles

The city mileages of the 12 gasoline-powered midsized cars, sorted in increasing order, split into a lower half and an upper half of six observations each. The first quartile $Q_1$ is the median of the six observations in the lower half, that is, the mean of the third and fourth values in the sorted list. Similarly, the third quartile $Q_3$ is the median of the upper half: the mean of the ninth and tenth values.

For an example with an odd number of observations, try the city mileages of all 13 midsized cars in Table 5.7 (page 196). With the mileages in increasing order, the median is the seventh value, which is excluded to form two equal-sized groups of six. We find the quartiles by finding the median of each half of the dataset: $Q_1$ is the mean of the third and fourth values, and $Q_3$ is the mean of the tenth and eleventh values.

Some software packages or calculators may use a slightly different procedure to find the quartiles, so their results may be a bit different from our work here. Don’t worry about this. The differences will be too small to be important.
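To see the kind of disagreement described here, the sketch below compares the median-split procedure from this section with the two interpolation methods offered by `statistics.quantiles` in Python's standard library (Python 3.8 or later). The helper `textbook_quartiles` and the dataset are hypothetical, used only for illustration.

```python
from statistics import median, quantiles


def textbook_quartiles(observations):
    """(Q1, Q3) by the median-split procedure of this section."""
    ordered = sorted(observations)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(upper)


# Hypothetical dataset of eight observations (NOT from Table 5.7).
data = [18, 19, 20, 21, 22, 23, 24, 25]

print(textbook_quartiles(data))                   # (19.5, 23.5)
# quantiles() returns the three cut points [Q1, median, Q3].
print(quantiles(data, n=4, method="exclusive"))   # [19.25, 21.5, 23.75]
print(quantiles(data, n=4, method="inclusive"))   # [19.75, 21.5, 23.25]
```

All three approaches agree on the median (21.5) and place each quartile within a quarter of a unit of the others for this dataset, which is the sort of small discrepancy the paragraph above has in mind.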