The five-number summary is not the most common numerical description of a distribution. That distinction belongs to the combination of the mean to measure center and the standard deviation to measure variability. The mean is familiar—it is the ordinary average of the observations. The idea of the standard deviation is to give the average distance of observations from the mean. The “average distance” in the standard deviation is found in a rather obscure way. We will give the details, but you may want to just think of the standard deviation as “average distance from the mean” and leave the details to your calculator.
Mean and standard deviation
The mean $\bar{x}$ (pronounced “x-bar”) of a set of observations is their arithmetic average. To find the mean of n observations, add the values and divide by n:
$$\bar{x} = \frac{\text{sum of the observations}}{n}$$
The standard deviation s measures the average distance of the observations from their mean. It is calculated by finding an average of the squared distances and then taking the square root. To find the standard deviation of n observations:
1. Find the distance of each observation from the mean and square each of these distances.
2. Average the squared distances by dividing their sum by $n - 1$. This average squared distance is called the variance.
3. The standard deviation is the square root of this average squared distance.
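The three steps above can be sketched in Python. This is a minimal illustration, not a definitive implementation; the function names `mean` and `standard_deviation` are chosen here for illustration and do not come from the text.

```python
import math


def mean(values):
    """Arithmetic average: add the values and divide by n."""
    return sum(values) / len(values)


def standard_deviation(values):
    """Standard deviation computed with the n - 1 divisor."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = mean(values)
    # Step 1: squared distance of each observation from the mean.
    squared_distances = [(x - x_bar) ** 2 for x in values]
    # Step 2: the variance, found by dividing the sum by n - 1.
    variance = sum(squared_distances) / (n - 1)
    # Step 3: the standard deviation is the square root of the variance.
    return math.sqrt(variance)
```

For the small made-up data set `[1, 2, 3, 4, 5]`, the mean is 3, the squared distances sum to 10, the variance is 10/4 = 2.5, and the standard deviation is about 1.58.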
EXAMPLE 4 Finding the mean and standard deviation
The numbers of home runs Barry Bonds hit in his 22 major league seasons are
16 | 25 | 24 | 19 | 33 | 25 | 34 | 46 | 37 | 33 | 42 |
40 | 37 | 34 | 49 | 73 | 46 | 45 | 45 | 5 | 26 | 28 |
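To find the mean of these observations,
$$\bar{x} = \frac{\text{sum of the observations}}{n} = \frac{16 + 25 + \cdots + 28}{22} = \frac{762}{22} = 34.6$$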
Figure 12.6 displays the data as points above the number line, with their mean marked by a vertical line. The arrow shows one of the distances from the mean. The idea behind the standard deviation is to average the 22 distances. To find the standard deviation by hand, you can use a table layout:
Observation | Squared distance from mean | ||
---|---|---|---|
16 | $(16 - 34.6)^2 =$ | $(-18.6)^2 =$ | 345.96 |
25 | $(25 - 34.6)^2 =$ | $(-9.6)^2 =$ | 92.16 |
⋮ | |||
28 | $(28 - 34.6)^2 =$ | $(-6.6)^2 =$ | 43.56 |
 | | sum = | 4139.12 |
The average is
$$\frac{4139.12}{21} = 197.1$$
Notice that we “average” by dividing by one less than the number of observations. Finally, the standard deviation is the square root of this number:
$$s = \sqrt{197.1} = 14.04$$
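The hand calculation in this example can be checked with a few lines of Python. The sketch below assumes the mean is rounded to 34.6 before the squared distances are taken, as in the table; the variable names are illustrative only.

```python
import math

# Home run counts from Example 4.
bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
         40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]

n = len(bonds)
total = sum(bonds)
x_bar = round(total / n, 1)  # mean rounded to one decimal place, as in the table

# Sum of squared distances from the rounded mean.
sum_sq = sum((x - x_bar) ** 2 for x in bonds)
variance = sum_sq / (n - 1)  # divide by one less than the number of observations
s = math.sqrt(variance)

print(n, total, x_bar)                                     # 22 762 34.6
print(round(sum_sq, 2), round(variance, 1), round(s, 2))   # 4139.12 197.1 14.04
```

Rounding the mean first changes the sum of squared distances only slightly, so the standard deviation still rounds to 14.04 when the unrounded mean is used.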
In practice, you can key the data into your calculator and hit the mean key and the standard deviation key. Or you can enter the data into a spreadsheet or other software to find $\bar{x}$ and s. It is usual, for good but somewhat technical reasons, to average the squared distances by dividing their total by $n - 1$ rather than by n. Many calculators have two standard deviation buttons, giving you a choice between dividing by n and dividing by $n - 1$. Be sure to choose $n - 1$.
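Software offers the same choice between the two divisors. As one illustration, Python's standard `statistics` module provides `stdev`, which divides by n − 1, and `pstdev`, which divides by n. The sketch below applies both to the home run counts from Example 4; the variable names are illustrative.

```python
import statistics

# Home run counts from Example 4.
bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
         40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]

x_bar = statistics.mean(bonds)
s_sample = statistics.stdev(bonds)       # divides by n - 1 (the choice recommended here)
s_population = statistics.pstdev(bonds)  # divides by n

print(round(x_bar, 1))                                # 34.6
print(round(s_sample, 2), round(s_population, 2))     # 14.04 13.72
```

Dividing by n always gives the smaller of the two values, which is why it matters which button or function you pick.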
NOW IT’S YOUR TURN
12.3 Hank Aaron. Here are Aaron’s home run counts for his 23 years in baseball.
13 | 27 | 26 | 44 | 30 | 39 | 40 | 34 | 45 | 44 | 24 | 32 |
44 | 39 | 29 | 44 | 38 | 47 | 34 | 40 | 20 | 12 | 10 |
Find the mean and standard deviation of the number of home runs Aaron hit in each season of his career. How do the mean and median compare?
More important than the details of the calculation are the properties that show how the standard deviation measures variability.
Properties of the standard deviation s
• s measures variability about the mean $\bar{x}$. Use s to describe the variability of a distribution only when you use $\bar{x}$ to describe the center.
• $s = 0$ only when there is no variability. This happens only when all observations have the same value. So standard deviation zero means no variability at all. Otherwise, $s > 0$. As the observations become more variable about their mean, s gets larger.
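These properties can be seen in a small experiment. The three data sets below are made up purely for illustration (they do not come from the text). All three have mean 5, but their observations lie at increasing distances from it.

```python
import statistics

# Hypothetical data sets, all with mean 5.
same = [5, 5, 5, 5]      # every observation equal: no variability
close = [4, 5, 5, 6]     # observations close to the mean
spread = [1, 3, 7, 9]    # observations far from the mean

for name, data in [("same", same), ("close", close), ("spread", spread)]:
    # stdev divides by n - 1, matching the definition of s used here.
    print(name, statistics.mean(data), round(statistics.stdev(data), 2))
```

The first data set has standard deviation zero, and the standard deviation grows as the observations move farther from their common mean.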
EXAMPLE 5 Investing 101
We have discussed examples about income. Here is an example about what to do with it once you’ve earned it. One of the first principles of investing is that taking more risk brings higher returns, at least on the average in the long run. People who work in finance define risk as the variability of returns from an investment (greater variability means higher risk) and measure risk by how unpredictable the return on an investment is. A bank account that is insured by the government and has a fixed rate of interest has no risk—its return is known exactly. Stock in a new company may soar one week and plunge the next. It has high risk because you can’t predict what it will be worth when you want to sell.
Investors should think statistically. You can assess an investment by thinking about the distribution of (say) yearly returns. That means asking about both the center and the variability of the pattern of returns. Only naive investors look for a high average return without asking about risk, that is, about how variable the returns are. Financial experts use the mean and standard deviation to describe returns on investments. The standard deviation was long considered too complicated to mention to the public, but now you will find standard deviations appearing regularly in mutual funds reports.
Here by way of illustration are the means and standard deviations of the yearly returns on three investments over the second half of the 20th century (the 50 years from 1950 to 1999):
Investment | Mean return | Standard deviation |
---|---|---|
Treasury bills | 5.34% | 2.96% |
Treasury bonds | 6.12% | 10.73% |
Common stocks | 14.62% | 16.32% |
You can see that risk (variability) goes up as the mean return goes up, just as financial theory claims. Treasury bills and bonds are ways of loaning money to the U.S. government. Treasury bills are paid back in one year, so their return changes from year to year depending on interest rates. Bonds are 30-year loans. They are riskier because the value of a bond you own will drop if interest rates go up. Stocks are even riskier. They give higher returns (on the average in the long run) but at the cost of lots of sharp ups and downs along the way. As the stemplot in Figure 12.7 shows, stocks went up by as much as 50% and down by as much as 26% in one year during the 50 years covered by our data.
Player | Salary ($) | Player | Salary ($) |
---|---|---|---|
LeBron James | 20.6 million | Iman Shumpert | 2.6 million |
Kevin Love | 15.7 million | Brendan Haywood | 2.2 million |
Anderson Varejao | 9.7 million | James Jones | 1.4 million |
Kyrie Irving | 7.1 million | Shawn Marion | 1.4 million |
J.R. Smith | 6.0 million | Joe Harris | 0.9 million |
Tristan Thompson | 5.1 million | Matthew Dellavedova | 0.8 million |
Timofey Mozgov | 4.7 million | Kendrick Perkins | 0.4 million |
Mike Miller | 2.7 million | ||
Source: The salaries are estimates from www.spotrac.com/nba/rankings/2014/base/cleveland-cavaliers/ |