Variability is a numerical way of describing how much spread there is in a distribution.
People often poked fun at trashy Japanese products after World War II. Sometimes their transistor radios, for example, worked pretty well, but sometimes they didn’t work at all. They just weren’t reliable. However, after Japanese companies started applying Deming’s insight that the definition of high quality was low variability, the tiny nation needed just 3 years to become an industrial powerhouse whose reputation for high-
The range is the easiest measure of variability to calculate. The range is a measure of variability calculated by subtracting the lowest score (the minimum) from the highest score (the maximum). The terms maximum and minimum are sometimes used in place of highest and lowest, and some statistical computer programs abbreviate these as max and min. The range is represented in a formula as:
$$\text{range} = X_{\text{highest}} - X_{\text{lowest}}$$
Here are the scores for countries’ numbers of top finishes in the World Cup that we discussed earlier in the chapter. As before, we’ll omit countries with 0 top finishes.
1 1 1 1 2 2 2 2 2 2 3 3 4 6 8 10
We can determine the highest and lowest scores either by reading through the data or, more easily, by glancing at the frequency table for these data.
STEP 1: Determine the highest score.
In this case, the highest score is 10.
STEP 2: Determine the lowest score.
In this case, the lowest score is 1.
STEP 3: Calculate the range by subtracting the lowest score from the highest score:
$$\text{range} = X_{\text{highest}} - X_{\text{lowest}} = 10 - 1 = 9$$
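The three steps translate directly into code. Here is a minimal Python sketch using the World Cup scores listed above; the function name score_range is our own, and Python's built-in max and min do the work of Steps 1 and 2.

```python
# World Cup top-finish counts from the text (countries with 0 top finishes omitted)
top_finishes = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 4, 6, 8, 10]

def score_range(scores):
    """Range = highest score minus lowest score (hypothetical helper name)."""
    x_highest = max(scores)       # STEP 1: determine the highest score
    x_lowest = min(scores)        # STEP 2: determine the lowest score
    return x_highest - x_lowest   # STEP 3: subtract lowest from highest

print(score_range(top_finishes))  # 9
```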
The range can be a useful first indicator of variability, but it is influenced only by the highest and lowest scores. All the other scores in between could be clustered near the highest score, huddled near the center, spread out evenly, or have some other unexpected pattern. We can’t know based only on the range.
Variance is the average of the squared deviations from the mean. When something varies, it must vary from (or be different from) some standard. That standard is the mean. So when we compute variance, that number describes how far a distribution varies around the mean. A small number indicates a small amount of spread or deviation around the mean, and a larger number indicates a great deal of spread or deviation around the mean. Post-
Students who seek therapy at university counseling centers often do not attend many sessions. For example, in one study, the median number of therapy sessions was 3 and the mean was 4.6 (Hatchett, 2003). Let’s examine the spread of fictional scores for a sample of five students: 1, 2, 4, 4, and 10 therapy sessions, with a mean of 4.2. We find out how far each score deviates from the mean by subtracting the mean from every score. First, we label with an X the column that lists the scores. Here, the second column includes the results we get when we subtract the mean from each score, or X – M. We call each of these a deviation from the mean (or just a deviation)—the amount that a score in a sample differs from the mean of the sample.
X | X – M
1 | –3.2
2 | –2.2
4 | –0.2
4 | –0.2
10 | 5.8
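The deviation column can be checked by machine. This minimal Python sketch (variable names are our own) subtracts the mean from each of the five fictional scores. It also sums the deviations, which come out to zero apart from floating-point rounding, because the positive and negative deviations from a mean always cancel.

```python
scores = [1, 2, 4, 4, 10]          # fictional therapy-session counts from the text
mean = sum(scores) / len(scores)   # M = 4.2

# Deviation from the mean for each score: X - M
deviations = [x - mean for x in scores]
for x, d in zip(scores, deviations):
    print(f"{x:>2} | {d:5.1f}")

# Positive and negative deviations cancel, so this sum is 0
# up to tiny floating-point rounding error.
print(sum(deviations))
```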
But we can’t just take the mean of the deviations. If we do (and if you try this, don’t forget the signs—
When we ask students for ways to eliminate the negative signs, two suggestions typically come up: (1) take the absolute value of the deviations, thus making them all positive, or (2) square all the deviations, again making them all positive. It turns out that the latter, squaring all the deviations, is how statisticians solve this problem. Once we square the deviations, we can take their average and get a measure of variability. Later (using a beautifully descriptive term created by our students), we will “unsquare” those deviations in order to calculate the standard deviation.
To recap:
STEP 1: Subtract the mean from every score.
We call these deviations from the mean.
STEP 2: Square every deviation from the mean.
We call these squared deviations.
STEP 3: Sum all of the squared deviations.
This is often called the sum of squared deviations, or the sum of squares, for short.
STEP 4: Divide the sum of squares by the total number of scores in the sample (N).
This number represents the mathematical definition of variance—
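The four steps can be sketched as a short Python function. This is a minimal illustration, not a library routine: the name variance is our own, and it divides by N exactly as Step 4 describes.

```python
def variance(scores):
    """Variance as defined in this section: sum of squares divided by N."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]             # STEP 1: X - M
    squared_deviations = [d ** 2 for d in deviations]   # STEP 2: (X - M)^2
    ss = sum(squared_deviations)                        # STEP 3: sum of squares, SS
    return ss / n                                       # STEP 4: divide by N

print(round(variance([1, 2, 4, 4, 10]), 2))  # 9.76
```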
To calculate the variance for the therapy session data, we add a third column to contain the squares of each of the deviations. Then we add all of these numbers up to compute the sum of squares (symbolized as SS), the sum of each score’s squared deviation from the mean. In this case, the sum of the squared deviations is 48.80, so the average squared deviation is 48.80/5 = 9.76. Thus, the variance equals 9.76.
X | X – M | (X – M)²
1 | –3.20 | 10.24
2 | –2.20 | 4.84
4 | –0.20 | 0.04
4 | –0.20 | 0.04
10 | 5.80 | 33.64
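Written out as a worked equation, the third column of the table sums to SS, and dividing by N gives the variance:

```latex
\begin{aligned}
SS &= \Sigma(X - M)^2 = 10.24 + 4.84 + 0.04 + 0.04 + 33.64 = 48.80 \\
\text{variance} &= \frac{SS}{N} = \frac{48.80}{5} = 9.76
\end{aligned}
```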
Language Alert! We need a few more symbols to express variance in symbolic notation. Each of these symbols represents the same idea (variance) applied to slightly different situations. The symbols that represent the variance of a sample include SD², s², and MS. The first two symbols, SD² and s², both stand for the words standard deviation squared. The notation MS comes from the words mean square (referring to the average of the squared deviations). We’ll use SD² at this point, but we will alert you when we switch to other symbols for variance later. The variance of a sample can be written with any of these three notations; the variance of a population, however, uses just one symbol: σ² (pronounced “sigma squared”). Table 4-2 summarizes the symbols and language used to describe different versions of the mean and variance, but we will keep reminding you as we go along.
We already know all the other symbols needed to calculate variance: X to indicate the individual scores, M to indicate the mean, and N to indicate the sample size.
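Putting these symbols together, the four steps amount to SD² = Σ(X – M)² / N. If you check this kind of calculation in Python, one caution applies: the standard library's statistics.pvariance divides by N, as this formula does, while statistics.variance divides by N – 1 and so returns a larger number. The short sketch below (assuming Python 3 and the therapy-session scores from the text) shows both.

```python
import statistics

scores = [1, 2, 4, 4, 10]   # therapy sessions from the text
n = len(scores)
m = sum(scores) / n         # M

# SD^2 = sum of (X - M)^2, divided by N
sd_squared = sum((x - m) ** 2 for x in scores) / n

# statistics.pvariance also divides by N, so it agrees with SD^2
print(round(sd_squared, 2), round(statistics.pvariance(scores), 2))  # 9.76 9.76

# statistics.variance divides by N - 1 instead, so it gives a larger value
print(round(statistics.variance(scores), 2))  # 12.2
```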
As you can see, variance is really just a mean—
Language Alert! Variance and standard deviation refer to the same core idea. The standard deviation is more useful because it is the typical amount that each score varies from the mean. Mathematically, the standard deviation is the square root of the average of the squared deviations from the mean, or, more simply, the square root of the variance. The beauty of the standard deviation—
For example, the numbers of therapy sessions for the five students were 1, 2, 4, 4, and 10, with a mean of 4.2. The typical score does not vary from the mean by 9.76. The variance is based on squared deviations, not deviations, so it is too large. When we ask our students how to solve this problem, they invariably say, “Unsquare it,” and that’s just what we do. We take the square root of variance to come up with a much more useful number, the standard deviation. The square root of 9.76 is 3.12. Now we have a number that “makes sense” to us. We can now say that the typical number of therapy sessions for students in this sample is 4.2 and the typical amount a student varies from that is 3.12.
As you read journal articles, you often will see the mean and standard deviation reported as: (M = 4.2, SD = 3.12). A glance at the original data (1, 2, 4, 4, 10) tells us that these numbers make sense: 4.2 does seem to be approximately in the center, and scores do seem to vary from 4.2 by roughly 3.12. The score of 10 is a bit of an outlier—
We didn’t actually need a formula to get the standard deviation; we just took the square root of the variance. Perhaps you guessed the symbols for standard deviation by taking the square root of those for variance. For a sample, the standard deviation is either SD or s. For a population, it is σ. Table 4-2 presents this information concisely. We can write the formula showing how standard deviation is calculated from variance:
$$SD = \sqrt{SD^2}$$
We can also write the formula showing how standard deviation is calculated from the original X’s, M, and N:
$$SD = \sqrt{\frac{\Sigma(X - M)^2}{N}}$$
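Calculating the standard deviation straight from the raw scores can be checked against the therapy-session example. In this minimal Python sketch, standard_deviation is a hypothetical helper that sums the squared deviations, divides by N, and takes the square root; the standard library's statistics.pstdev, which also divides by N, should agree with it.

```python
import math
import statistics

def standard_deviation(scores):
    """SD = square root of [sum of (X - M)^2 divided by N], from the raw scores."""
    n = len(scores)
    m = sum(scores) / n                      # M, the mean
    ss = sum((x - m) ** 2 for x in scores)   # SS, the sum of squares
    return math.sqrt(ss / n)                 # "unsquare" the variance

sessions = [1, 2, 4, 4, 10]
print(round(standard_deviation(sessions), 2))   # 3.12
print(round(statistics.pstdev(sessions), 2))    # 3.12
```

Dividing by N here matches this section's definition; a function that divides by N – 1 would give a different, larger value for the same scores.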
Reviewing the Concepts
Clarifying the Concepts
In your own words, what is variability?
Distinguish the range from the standard deviation. What does each tell us about the distribution?
Calculating the Statistics
Calculate the range, variance, and standard deviation for the following data sets (the same ones from the section on central tendency).
Applying the Concepts
Final exam week is approaching, and students are not eating as well as usual. Four students were asked how many calories of junk food they had consumed between noon and 10:00 p.m. on the day before an exam. The estimated numbers of empty calories, calculated with the help of a nutritional software program, were 450, 670, 1130, and 1460.
Solutions to these Check Your Learning questions can be found in Appendix D.