4.2 Measures of Variability

  • Variability is a numerical way of describing how much spread there is in a distribution.

MASTERING THE CONCEPT

4-3: Variability is the second most common concept (after central tendency) to help us understand the shape of a distribution. Common indicators of variability are range, variance, and standard deviation.

People often poked fun at trashy Japanese products after World War II. Sometimes their transistor radios, for example, worked pretty well, but sometimes they didn’t work at all. They just weren’t reliable. However, after Japanese companies started applying Deming’s insight that the definition of high quality was low variability, the tiny nation needed just 3 years to become an industrial powerhouse whose reputation for high-quality, reliable products continues to this day. Variability is a numerical way of describing how much spread there is in a distribution. One way to numerically describe the variability of a distribution is by computing its range. A second and more common way to describe variability is by computing variance and its square root, known as standard deviation.

Range

  • The range is a measure of variability calculated by subtracting the lowest score (the minimum) from the highest score (the maximum).

The range is the easiest measure of variability to calculate. The range is a measure of variability calculated by subtracting the lowest score (the minimum) from the highest score (the maximum). Maximum and minimum are sometimes substituted in this formula to describe the highest and lowest scores, and some statistical computer programs abbreviate these as max and min. The range is represented in a formula as:

MASTERING THE FORMULA

4-2: The formula for the range is: $\text{range} = X_{\text{highest}} - X_{\text{lowest}}$. We simply subtract the lowest score from the highest score to calculate the range.

$\text{range} = X_{\text{highest}} - X_{\text{lowest}}$

Here are the scores for countries’ numbers of top finishes in the World Cup that we discussed earlier in the chapter. As before, we’ll omit countries with 0 top finishes.

1 1 1 1 2 2 2 2 2 2 3 3 4 6 8 10

EXAMPLE 4.8

We can determine the highest and lowest scores either by reading through the data or, more easily, by glancing at the frequency table for these data.

STEP 1: Determine the highest score.


In this case, the highest score is 10.

STEP 2: Determine the lowest score.


In this case, the lowest score is 1.

STEP 3: Calculate the range by subtracting the lowest score from the highest score:

$\text{range} = X_{\text{highest}} - X_{\text{lowest}} = 10 - 1 = 9$
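The three steps can also be sketched in a few lines of Python. This is only an illustration using the World Cup scores listed above; the variable names are our own.

```python
# Numbers of top World Cup finishes, as listed in the text
scores = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 4, 6, 8, 10]

highest = max(scores)            # Step 1: determine the highest score
lowest = min(scores)             # Step 2: determine the lowest score
score_range = highest - lowest   # Step 3: subtract lowest from highest

print(f"range = {highest} - {lowest} = {score_range}")
```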

The range can be a useful first indicator of variability, but it is influenced only by the highest and lowest scores. All the other scores in between could be clustered near the highest score, huddled near the center, spread out evenly, or have some other unexpected pattern. We can’t know based only on the range.
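To make this limitation concrete, the sketch below compares two hypothetical data sets (invented here for illustration; they do not come from the World Cup data). Their scores are arranged very differently, yet the range is the same for both.

```python
# Hypothetical data sets, invented for illustration only
clustered_high = [1, 9, 9, 10, 10]   # most scores sit near the maximum
spread_evenly = [1, 3, 5, 8, 10]     # scores are spread across the interval

def score_range(scores):
    """Return the highest score minus the lowest score."""
    return max(scores) - min(scores)

# Both data sets share the same minimum and maximum, so the ranges match
print(score_range(clustered_high), score_range(spread_evenly))
```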

Variance

  • Variance is the average of the squared deviations from the mean.

Variance is the average of the squared deviations from the mean. When something varies, it must vary from (or be different from) some standard. That standard is the mean. So when we compute variance, that number describes how far a distribution varies around the mean. A small number indicates a small amount of spread or deviation around the mean, and a larger number indicates a great deal of spread or deviation around the mean. Post-World War II Japanese manufacturers, for example, were always aiming at a lower number because it meant high reliability—products you could trust.

EXAMPLE 4.9

  • A deviation from the mean is the amount that a score in a sample differs from the mean of the sample; also called a deviation.

Students who seek therapy at university counseling centers often do not attend many sessions. For example, in one study, the median number of therapy sessions was 3 and the mean was 4.6 (Hatchett, 2003). Let’s examine the spread of fictional scores for a sample of five students: 1, 2, 4, 4, and 10 therapy sessions, with a mean of 4.2. We find out how far each score deviates from the mean by subtracting the mean from every score. First, we label with an X the column that lists the scores. Here, the second column includes the results we get when we subtract the mean from each score, or $X - M$. We call each of these a deviation from the mean (or just a deviation): the amount that a score in a sample differs from the mean of the sample.

X     $X - M$
1     –3.2
2     –2.2
4     –0.2
4     –0.2
10    5.8

But we can’t just take the mean of the deviations. If we do (and if you try this, don’t forget the signs—negative and positive), we get 0—every time. Are you surprised? Remember, the mean is the point at which all scores are perfectly balanced. Mathematically, the scores have to balance out. Yet we know that there is variability among these scores. The number representing the amount of variability is certainly not 0!
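A quick check in Python shows the balancing act for the therapy session scores. This is a minimal sketch; because computers store decimals such as 4.2 only approximately, the sum is compared against a tiny tolerance rather than against exactly 0.

```python
# Fictional therapy session scores from the example
scores = [1, 2, 4, 4, 10]
mean = sum(scores) / len(scores)   # 4.2

# Subtract the mean from every score, keeping the signs
deviations = [x - mean for x in scores]

# The deviations cancel out, apart from tiny floating-point rounding error
sum_of_deviations = sum(deviations)
print(abs(sum_of_deviations) < 1e-9)
```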

When we ask students for ways to eliminate the negative signs, two suggestions typically come up: (1) Take the absolute value of the deviations, thus making them all positive, or (2) square all the deviations, again making them all positive. It turns out that the latter, squaring all the deviations, is how statisticians solve this problem. Once we square the deviations, we can take their average and get a measure of variability. Later (using a beautifully descriptive term created by our students), we will “unsquare” those deviations in order to calculate the standard deviation.

To recap:

STEP 1: Subtract the mean from every score.

We call these deviations from the mean.


STEP 2: Square every deviation from the mean.

We call these squared deviations.


STEP 3: Sum all of the squared deviations.

This is often called the sum of squared deviations, or the sum of squares, for short.

STEP 4: Divide the sum of squares by the total number in the sample (N).

This number represents the mathematical definition of variance—the average of the squared deviations from the mean.

  • The sum of squares, symbolized as SS, is the sum of each score’s squared deviation from the mean.

To calculate the variance for the therapy session data, we add a third column to contain the squares of each of the deviations. Then we add all of these numbers up to compute the sum of squares (symbolized as SS), the sum of each score’s squared deviation from the mean. In this case, the sum of the squared deviations is 48.80, so the average squared deviation is 48.80/5 = 9.76. Thus, the variance equals 9.76.

X     $X - M$    $(X - M)^2$
1     –3.20      10.24
2     –2.20      4.84
4     –0.20      0.04
4     –0.20      0.04
10    5.80       33.64
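The four steps can be traced in Python as well. This is a minimal sketch using the therapy session scores; the variable names are our own, and the division is by N, as in the steps above.

```python
# Fictional therapy session scores from the example
scores = [1, 2, 4, 4, 10]
n = len(scores)
mean = sum(scores) / n                               # M = 4.2

deviations = [x - mean for x in scores]              # Step 1: X - M
squared_deviations = [d ** 2 for d in deviations]    # Step 2: (X - M)^2
sum_of_squares = sum(squared_deviations)             # Step 3: SS
variance = sum_of_squares / n                        # Step 4: SS / N

print(f"SS = {sum_of_squares:.2f}, variance = {variance:.2f}")
```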

MASTERING THE FORMULA

4-3: The formula for variance is: $SD^2 = \frac{\Sigma(X - M)^2}{N}$. To calculate variance, subtract the mean (M) from every score (X) to calculate deviations from the mean; then square these deviations, sum them, and divide by the sample size (N). By summing the squared deviations and dividing by sample size, we are taking their mean.

Language Alert! We need a few more symbols to represent the idea of variance in symbolic notation. Each of these symbols represents the same idea (variance) applied to slightly different situations. The symbols that represent the variance of a sample include $SD^2$, $s^2$, and $MS$. The first two symbols, $SD^2$ and $s^2$, both represent the words standard deviation squared. The symbolic notation $MS$ comes from the words mean square (referring to the average of the squared deviations). We’ll use $SD^2$ at this point, but we will alert you when we switch to other symbols for variance later. The variance of a sample can be written with any of these three symbols; however, the variance of a population uses just one symbol: $\sigma^2$ (pronounced “sigma squared”). Table 4-2 summarizes the symbols and language used to describe different versions of the mean and variance, but we will keep reminding you as we go along.

We already know all the other symbols needed to calculate variance: X to indicate the individual scores, M to indicate the mean, and N to indicate the sample size.

$SD^2 = \frac{\Sigma(X - M)^2}{N}$

As you can see, variance is really just a mean—the mean of squared deviations.

Standard Deviation

EXAMPLE 4.10

  • The standard deviation is the square root of the average of the squared deviations from the mean; it is the typical amount that each score varies, or deviates, from the mean.

Language Alert! Variance and standard deviation refer to the same core idea. The standard deviation is more useful because it is the typical amount that each score varies from the mean. Mathematically, the standard deviation is the square root of the average of the squared deviations from the mean, or, more simply, the square root of the variance. The beauty of the standard deviation—compared to the variance—is that we can understand it at a glance.

For example, the numbers of therapy sessions for the five students were 1, 2, 4, 4, and 10, with a mean of 4.2. The typical score does not vary from the mean by 9.76. The variance is based on squared deviations, not deviations, so it is too large. When we ask our students how to solve this problem, they invariably say, “Unsquare it,” and that’s just what we do. We take the square root of variance to come up with a much more useful number, the standard deviation. The square root of 9.76 is 3.12. Now we have a number that “makes sense” to us. We can now say that the typical number of therapy sessions for students in this sample is 4.2 and the typical amount a student varies from that is 3.12.

MASTERING THE FORMULA

4-4: The most basic formula for standard deviation is: $SD = \sqrt{SD^2}$. We simply take the square root of the variance.

As you read journal articles, you often will see the mean and standard deviation reported as: (M = 4.2, SD = 3.12). A glance at the original data (1, 2, 4, 4, 10) tells us that these numbers make sense: 4.2 does seem to be approximately in the center, and scores do seem to vary from 4.2 by roughly 3.12. The score of 10 is a bit of an outlier—but not so much of one; the mean and the standard deviation are still somewhat representative of the typical score and typical deviation.

MASTERING THE FORMULA

4-5: The full formula for standard deviation is: $SD = \sqrt{\frac{\Sigma(X - M)^2}{N}}$. To determine standard deviation, subtract the mean from every score to calculate deviations from the mean. Then, square the deviations from the mean. Sum the squared deviations, then divide by the sample size. Finally, take the square root of the mean of the squared deviations.

We didn’t actually need a formula to get the standard deviation. We just took the square root of the variance. Perhaps you guessed the symbols for standard deviation by just taking the square root of those for variance. With a sample, standard deviation is either SD or s. With a population, standard deviation is σ. Table 4-2 presents this information concisely. We can write the formula showing how standard deviation is calculated from variance:

$SD = \sqrt{SD^2}$

We can also write the formula showing how standard deviation is calculated from the original X’s, M, and N:

$SD = \sqrt{\frac{\Sigma(X - M)^2}{N}}$
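Both routes to the standard deviation can be sketched in Python for the therapy session scores. The cross-check at the end is our own addition: it assumes Python’s built-in `statistics` module, whose `pvariance` and `pstdev` functions divide by N as this section does (its `variance` and `stdev` functions divide by N – 1 instead, so they would give different numbers).

```python
import math
import statistics

# Fictional therapy session scores from the example
scores = [1, 2, 4, 4, 10]
n = len(scores)
mean = sum(scores) / n

# Variance: the mean of the squared deviations (dividing by N)
variance = sum((x - mean) ** 2 for x in scores) / n

# Standard deviation: "unsquare" the variance
sd = math.sqrt(variance)
print(f"M = {mean:.1f}, SD = {sd:.2f}")

# Cross-check against the standard library's divide-by-N functions
matches_library = (
    math.isclose(variance, statistics.pvariance(scores))
    and math.isclose(sd, statistics.pstdev(scores))
)
print(matches_library)
```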

CHECK YOUR LEARNING

Reviewing the Concepts
  • The simplest way to measure variability is to use the range, which is calculated by subtracting the lowest score from the highest score.

  • Variance and standard deviation both measure the degree to which scores in a distribution vary from the mean. The standard deviation is simply the square root of the variance: It represents the typical deviation of a score from the mean.

Clarifying the Concepts
4-6 In your own words, what is variability?
4-7 Distinguish the range from the standard deviation. What does each tell us about the distribution?
Calculating the Statistics
4-8 Calculate the range, variance, and standard deviation for the following data sets (the same ones from the section on central tendency).

  1. 10 8 22 5 6 1 19 8 13 12 8

  2. 122.5 123.8 121.2 125.8 120.2 123.8 120.5 119.8 126.3 123.6

  3. 0.100 0.866 0.781 0.555 0.222 0.245 0.234

Applying the Concepts
4-9 Final exam week is approaching, and students are not eating as well as usual. Four students were asked how many calories of junk food they had consumed between noon and 10:00 p.m. on the day before an exam. The estimated numbers of empty calories, calculated with the help of a nutritional software program, were 450, 670, 1130, and 1460.
  1. Using the formula, calculate the range for these scores.

  2. What information can’t you glean from the range?

  3. Using the formula, calculate variance for these scores.

  4. Using the formula, calculate the standard deviation for these scores.

  5. If a researcher were interested only in these four students, which symbols would he use for variance and standard deviation, respectively?

  6. If another researcher hoped to generalize from these four students to all students at the university, which symbols would she use for variance and standard deviation?

Solutions to these Check Your Learning questions can be found in Appendix D.