12
Describing Distributions with Numbers
CASE STUDY Does education pay? We are told that people with more education earn more on the average than people with less education. How much more? How can we answer this question?
Data on income can be found at the Census Bureau website. The data, based on the results of the Current Population Survey in 2014, are estimates for the year 2013 of the total incomes of 136,641,000 people aged 25 and over who had earnings. The website gives the income distribution for each of several education categories. In particular, for each education category it gives the number of people who earned between $1 and $2499, between $2500 and $4999, and so on up to between $97,500 and $99,999, and finally $100,000 and over. That is a lot of information. A histogram could be used to display the data, but are there simple ways to summarize the information with just a few numbers that allow us to make sensible comparisons?
In this chapter, we will learn several ways to summarize large data sets with a few numbers. By the end of the chapter, you will be able to use these methods to answer whether education really pays.
Baseball has a rich tradition of using statistics to summarize and characterize the performance of players. We begin by investigating ways to summarize the performance of the greatest home-run hitters of all time.
In the summer of 2007, Barry Bonds broke the career home run record previously held by Hank Aaron. Here are Bonds’s home run counts for the years 1986 (his rookie year) to 2007 (his final season):
Year | 1986 | 1987 | 1988 | 1989 | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 |
Home runs | 16 | 25 | 24 | 19 | 33 | 25 | 34 | 46 | 37 | 33 | 42 |
Year | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 |
Home runs | 40 | 37 | 34 | 49 | 73 | 46 | 45 | 45 | 5 | 26 | 28 |
The stemplot in Figure 12.1 displays the data. The shape of the distribution is a bit irregular, but we see that it has one high outlier, and if we ignore this outlier, we might describe it as slightly skewed to the left with a single peak. The outlier is, of course, Bonds’s record season in 2001.
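For readers who want to rebuild a stemplot of these data themselves, here is a minimal Python sketch. It assumes the usual convention of tens digits as stems and ones digits as leaves. The function name `stemplot` is our own, and the text layout it prints may differ from that of Figure 12.1.

```python
from collections import defaultdict

# Barry Bonds's home run counts, 1986 to 2007, copied from the table above
home_runs = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
             40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]


def stemplot(values):
    """Return the lines of a stemplot (stem = tens digit, leaf = ones digit).

    Illustrative helper, not taken from the text. Assumes non-negative
    whole numbers.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)

    lines = []
    # Include every stem from smallest to largest, even the empty ones,
    # so that gaps in the distribution remain visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines


print("\n".join(stemplot(home_runs)))
```

Keeping the empty stems 5 and 6 in the display is a deliberate choice: the gap they leave is what makes the 73 on stem 7 stand apart from the rest of the distribution.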
A graph and a few words give a good description of Barry Bonds’s home run career. But words are less adequate to describe, for example, the incomes of people with a high school education. We need numbers that summarize the center and variability of a distribution.