Mean and standard deviation

The five-number summary is not the most common numerical description of a distribution. That distinction belongs to the combination of the mean to measure center and the standard deviation to measure variability. The mean is familiar—it is the ordinary average of the observations. The idea of the standard deviation is to give the average distance of observations from the mean. The “average distance” in the standard deviation is found in a rather obscure way. We will give the details, but you may want to just think of the standard deviation as “average distance from the mean” and leave the details to your calculator.

Mean and standard deviation

The mean x̄ (pronounced “x-bar”) of a set of observations is their arithmetic average. To find the mean of n observations, add the values and divide by n:

$$\bar{x} = \frac{\text{sum of the observations}}{n}$$

The standard deviation s measures the average distance of the observations from their mean. It is calculated by finding an average of the squared distances and then taking the square root. To find the standard deviation of n observations:

  1. Find the distance of each observation from the mean and square each of these distances.

  2. Average the squared distances by dividing their sum by n − 1. This average squared distance is called the variance.

  3. The standard deviation is the square root of this average squared distance.
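
The three steps above can be sketched in a few lines of Python. This is only an illustration, not part of the original presentation: the function names `mean` and `standard_deviation` and the four-value data set are invented for the example.

```python
import math

def mean(observations):
    """Arithmetic average: add the values and divide by n."""
    return sum(observations) / len(observations)

def standard_deviation(observations):
    """Standard deviation s, computed with the three steps in the text."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations to divide by n - 1")
    x_bar = mean(observations)
    # Step 1: squared distance of each observation from the mean.
    squared_distances = [(x - x_bar) ** 2 for x in observations]
    # Step 2: the variance "averages" them by dividing their sum by n - 1.
    variance = sum(squared_distances) / (n - 1)
    # Step 3: the standard deviation is the square root of the variance.
    return math.sqrt(variance)

# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [2, 4, 4, 6]
print(mean(data))                # 4.0
print(standard_deviation(data))  # square root of 8/3, about 1.63
```

For the made-up data, the squared distances are 4, 0, 0, and 4, so the variance is 8/3 and s is its square root.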

EXAMPLE 4 Finding the mean and standard deviation

The numbers of home runs Barry Bonds hit in his 22 major league seasons are

16 25 24 19 33 25 34 46 37 33 42
40 37 34 49 73 46 45 45 5 26 28

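To find the mean of these observations,

$$\bar{x} = \frac{\text{sum of the observations}}{n} = \frac{16 + 25 + \cdots + 28}{22} = \frac{762}{22} = 34.6$$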

Figure 12.6 displays the data as points above the number line, with their mean marked by a vertical line. The arrow shows one of the distances from the mean. The idea behind the standard deviation is to average the 22 distances. To find the standard deviation by hand, you can use a table layout:

Observation    Squared distance from mean
16             (16 − 34.6)² = (−18.6)² = 345.96
25             (25 − 34.6)² = (−9.6)² = 92.16
⋮              ⋮
28             (28 − 34.6)² = (−6.6)² = 43.56
               sum = 4139.12

The average is

$$\frac{4139.12}{21} = 197.1$$

Notice that we “average” by dividing by one less than the number of observations. Finally, the standard deviation is the square root of this number:

$$s = \sqrt{197.1} = 14.04$$

Figure 12.6 Barry Bonds’s home run counts, Example 4, with their mean and the distance of one observation from the mean indicated. Think of the standard deviation as an average of these distances.
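
If you would like to check the hand calculation in Example 4, the following Python sketch repeats it. It rounds the mean to 34.6 before computing distances, as the table layout does. The variable names are our own and are not part of the example.

```python
import math

# Barry Bonds's home run counts from Example 4.
bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
         40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]

n = len(bonds)
# Round the mean to one decimal place, as in the hand calculation.
x_bar = round(sum(bonds) / n, 1)

# Squared distance of each observation from the (rounded) mean.
squared_distances = [(x - x_bar) ** 2 for x in bonds]
total = sum(squared_distances)

# Divide by one less than the number of observations, then take the square root.
variance = total / (n - 1)
s = math.sqrt(variance)

print(x_bar)               # 34.6
print(round(total, 2))     # 4139.12
print(round(variance, 1))  # 197.1
print(round(s, 2))         # 14.04
```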

In practice, you can key the data into your calculator and hit the mean key and the standard deviation key. Or you can enter the data into a spreadsheet or other software to find x̄ and s. It is usual, for good but somewhat technical reasons, to average the squared distances by dividing their total by n − 1 rather than by n. Many calculators have two standard deviation buttons, giving you a choice between dividing by n and dividing by n − 1. Be sure to choose n − 1.
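
Software offers the same choice of divisor. As one illustration (Python is our choice here; the text does not name any particular software), the standard library's `statistics` module provides `stdev`, which divides by n − 1, and `pstdev`, which divides by n. The sketch below applies both to the Bonds data.

```python
import statistics

# Barry Bonds's home run counts from Example 4.
bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
         40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]

x_bar = statistics.mean(bonds)
s = statistics.stdev(bonds)               # divides by n - 1, the choice to use
s_divide_by_n = statistics.pstdev(bonds)  # divides by n

print(round(x_bar, 1))  # 34.6
print(round(s, 2))      # 14.04
print(s_divide_by_n < s)  # True: dividing by n gives a slightly smaller value
```

Because the software does not round the mean along the way, its intermediate values differ slightly from the hand calculation, but the rounded results agree.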

NOW IT’S YOUR TURN

12.3 Hank Aaron. Here are Aaron’s home run counts for his 23 years in baseball.

13 27 26 44 30 39 40 34 45 44 24 32
44 39 29 44 38 47 34 40 20 12 10

Find the mean and standard deviation of the number of home runs Aaron hit in each season of his career. How do the mean and median compare?

More important than the details of the calculation are the properties that show how the standard deviation measures variability.

Properties of the standard deviation s

  1. s measures variability about the mean x̄. Use s to describe the variability of a distribution only when you use x̄ to describe the center.

  2. s = 0 only when there is no variability. This happens only when all observations have the same value. So standard deviation zero means no variability at all. Otherwise, s > 0. As the observations become more variable about their mean, s gets larger.
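
A small experiment can make the second property concrete. The three data sets below are invented for illustration. Each has mean 10, but the observations spread farther from the mean as you move down the list.

```python
import statistics

# Hypothetical data sets, all with mean 10 but with increasing variability.
no_variability = [10, 10, 10, 10]
some_variability = [8, 9, 11, 12]
more_variability = [2, 6, 14, 18]

s_none = statistics.stdev(no_variability)
s_some = statistics.stdev(some_variability)
s_more = statistics.stdev(more_variability)

print(s_none)           # 0.0, because every observation equals the mean
print(round(s_some, 2)) # 1.83
print(round(s_more, 2)) # 7.3
```

The standard deviation is zero for the first data set and grows as the observations move farther from their common mean.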

EXAMPLE 5 Investing 101

We have discussed examples about income. Here is an example about what to do with it once you’ve earned it. One of the first principles of investing is that taking more risk brings higher returns, at least on the average in the long run. People who work in finance define risk as the variability of returns from an investment (greater variability means higher risk) and measure risk by how unpredictable the return on an investment is. A bank account that is insured by the government and has a fixed rate of interest has no risk—its return is known exactly. Stock in a new company may soar one week and plunge the next. It has high risk because you can’t predict what it will be worth when you want to sell.

Investors should think statistically. You can assess an investment by thinking about the distribution of (say) yearly returns. That means asking about both the center and the variability of the pattern of returns. Only naive investors look for a high average return without asking about risk, that is, about how variable the returns are. Financial experts use the mean and standard deviation to describe returns on investments. The standard deviation was long considered too complicated to mention to the public, but now you will find standard deviations appearing regularly in mutual fund reports.

Here by way of illustration are the means and standard deviations of the yearly returns on three investments over the second half of the 20th century (the 50 years from 1950 to 1999):

Investment        Mean return    Standard deviation
Treasury bills    5.34%          2.96%
Treasury bonds    6.12%          10.73%
Common stocks     14.62%         16.32%

You can see that risk (variability) goes up as the mean return goes up, just as financial theory claims. Treasury bills and bonds are ways of loaning money to the U.S. government. Treasury bills are paid back in one year, so their return changes from year to year depending on interest rates. Bonds are 30-year loans. They are riskier because the value of a bond you own will drop if interest rates go up. Stocks are even riskier. They give higher returns (on the average in the long run) but at the cost of lots of sharp ups and downs along the way. As the stemplot in Figure 12.7 shows, stocks went up by as much as 50% and down by as much as 26% in one year during the 50 years covered by our data.
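
The claim that risk rises with mean return can be checked directly against the three rows of the table in this example. The short Python sketch below does so. The dictionary layout and variable names are our own.

```python
# Mean and standard deviation of yearly returns (in percent),
# copied from the table in Example 5.
investments = {
    "Treasury bills": (5.34, 2.96),
    "Treasury bonds": (6.12, 10.73),
    "Common stocks": (14.62, 16.32),
}

# Order the investments from lowest to highest mean return.
by_mean = sorted(investments.items(), key=lambda item: item[1][0])

# Check that the standard deviation (risk) increases in the same order.
std_devs = [sd for _, (_, sd) in by_mean]
risk_rises_with_return = all(a < b for a, b in zip(std_devs, std_devs[1:]))

print([name for name, _ in by_mean])
print(risk_rises_with_return)  # True
```

With only three investments this is a consistency check on the table, not evidence for the financial theory itself.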

Figure 12.7 Stemplot of the yearly returns on common stocks for the 50 years 1950–1999, Example 5. The returns are rounded to the nearest whole percent. The stems are 10s of percents and the leaves are single percents.

Player                 Salary ($)
LeBron James           20.6 million
Kevin Love             15.7 million
Anderson Varejao       9.7 million
Kyrie Irving           7.1 million
J.R. Smith             6.0 million
Tristan Thompson       5.1 million
Timofey Mozgov         4.7 million
Mike Miller            2.7 million
Iman Shumpert          2.6 million
Brendan Haywood        2.2 million
James Jones            1.4 million
Shawn Marion           1.4 million
Joe Harris             0.9 million
Matthew Dellavedova    0.8 million
Kendrick Perkins       0.4 million
Source: The salaries are estimates from www.spotrac.com/nba/rankings/2014/base/cleveland-cavaliers/
Table 12.1 Salaries of the Cleveland Cavaliers, 2014–2015 season