When the data we are interested in are quantitative, we commonly summarize the data not only graphically but also numerically. As with our graphical descriptions, we would like our numerical descriptions to address the important characteristics of the data set—its shape, center and spread. Consider once again the data table that gave information about some of the 25 top-grossing movies in the United States in November 2010.
| Movie | Year | Box Office (millions of dollars) | Genre | Running Time (minutes) | Academy Awards | MPAA Rating |
|---|---|---|---|---|---|---|
| E.T.: The Extra-Terrestrial | 1982 | 435 | Family | 115 | 4 | PG |
| Star Wars: Episode I - The Phantom Menace | 1999 | 431 | Sci-Fi | 133 | 0 | PG |
| Pirates of the Caribbean: Dead Man's Chest | 2006 | 423 | Adventure | 130 | 1 | PG13 |
| Toy Story 3 | 2010 | 415 | Animation | 103 | * | G |
| Spider-Man | 2002 | 404 | Action | 121 | 0 | PG13 |
| Transformers: Revenge of the Fallen | 2009 | 402 | Action | 150 | 0 | PG13 |
| Star Wars: Episode III - Revenge of the Sith | 2005 | 380 | Sci-Fi | 140 | 0 | PG13 |
| The Lord of the Rings: The Return of the King | 2003 | 377 | Fantasy | 201 | 11 | PG13 |

*As of February 2011. Source: The Internet Movie Database
In reviewing this table we see, in particular, that the third column (box office receipts) and the fifth column (running time) each contain values of a quantitative variable. To get a “picture” of these data, we might want to determine how much, on average, a top-grossing film earned, or whether the running time for The Lord of the Rings: The Return of the King was unusually long. These are questions that can be answered using numerical descriptive measures.
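We will begin our discussion by considering a numerical description with which you are probably familiar, the mean. The mean of a set of quantitative data is simply its arithmetic average, the very same average you learned to calculate in elementary school. The mean is found by summing all data values, then dividing the result by the number of values. We indicate this symbolically using formulas:

$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N} \qquad \text{(population mean)}$$

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \text{(sample mean)}$$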
You are probably thinking that these two formulas are remarkably similar; indeed, they are nearly identical. That is because what we do is exactly the same; what we say depends on whether we are working with a population or a sample.
In working with a population, we denote the mean by µ, the Greek letter mu. The uppercase N is the population size (the number of values we are adding), and the x’s are the individual data values, labeled from 1 to N. If we are calculating the mean of a sample, we denote the result by x̄, which we call (not very creatively) x-bar; the lowercase n is the sample size, and the x’s are labeled from 1 to n.
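As a small worked illustration, the calculation can be sketched in Python using the eight films listed in the table above. The names `mean`, `box_office`, and `running_time` are chosen here for illustration only, and treating these eight films as a sample (n = 8) of the 25 top-grossing movies is an assumption made for the example.

```python
# Values copied from the table above (eight of the 25 top-grossing films).
box_office = [435, 431, 423, 415, 404, 402, 380, 377]    # millions of dollars
running_time = [115, 133, 130, 103, 121, 150, 140, 201]  # minutes


def mean(values):
    """Arithmetic mean: sum all data values, then divide by how many there are."""
    return sum(values) / len(values)


# Treated as a sample, each result would be written x-bar, with n = 8.
print(mean(box_office))    # 408.375
print(mean(running_time))  # 136.625
```

The arithmetic is identical whether the eight values are regarded as a sample or as an entire population; only the symbol attached to the result (x̄ or µ) changes.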
While this might seem unnecessarily “picky” at first glance, the notation indicates two important distinctions that we must keep in mind: