Improving reliability, reducing bias

What time is it? Much modern technology, such as the Global Positioning System, which uses satellite signals to tell you where you are, requires very exact measurements of time. In 1967, the General Conference on Weights and Measures defined the second to be the time required for 9,192,631,770 vibrations of a cesium atom. Unlike physical clocks, the cesium atom is not affected by changes in temperature, humidity, and air pressure. The National Institute of Standards and Technology (NIST) has the world’s most accurate atomic clock and broadcasts the results (with some loss in transmission) by radio, telephone, and Internet.

EXAMPLE 9 Really accurate time

NIST’s atomic clock is very accurate but not perfectly accurate. The world standard is Coordinated Universal Time, compiled by the International Bureau of Weights and Measures (BIPM) in Sèvres, France. BIPM doesn’t have a better clock than NIST. It calculates the time by averaging the results of more than 200 atomic clocks around the world. NIST reports (after the fact) by how much its clock missed the correct time. Here are the last 12 errors as we write, in seconds:

0.0000000075 0.0000000012
0.0000000069 −0.0000000020
0.0000000067 −0.0000000045
0.0000000063 −0.0000000046
0.0000000041 −0.0000000042
0.0000000032 −0.0000000036

In the long run, NIST’s measurements of time are not biased. The NIST second is sometimes shorter than the BIPM second and sometimes longer, not always off in the same direction. NIST’s measurements are very reliable, but the preceding numbers do show some variation. There is no such thing as a perfectly reliable measurement. The average (mean) of several measurements is more reliable than a single measurement. That’s one reason BIPM combines the time measurements of many atomic clocks.
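The 12 errors listed in the example can be summarized with a few lines of Python. This is only a sketch of the arithmetic: the values are the ones printed above, converted to nanoseconds for readability, and the variable names are ours. Twelve values are far too few to judge long-run bias, so the mean here is a description of this short list and nothing more.

```python
from statistics import mean, stdev

# The 12 NIST errors from the example, converted from seconds to
# nanoseconds (0.0000000075 s = 7.5 ns). The order of the values does
# not affect any of the summaries below.
errors_ns = [7.5, 6.9, 6.7, 6.3, 4.1, 3.2, 1.2, -2.0, -4.5, -4.6, -4.2, -3.6]

# How often was the NIST time ahead of, and how often behind, the standard?
positive = sum(1 for e in errors_ns if e > 0)
negative = sum(1 for e in errors_ns if e < 0)

avg_error = mean(errors_ns)                   # average miss
spread = stdev(errors_ns)                     # variation among the misses
largest = max(abs(e) for e in errors_ns)      # worst single miss

print(f"errors above zero: {positive}, below zero: {negative}")
print(f"mean error: {avg_error:.2f} ns")
print(f"standard deviation: {spread:.2f} ns")
print(f"largest single miss: {largest:.1f} ns")
```

The errors fall on both sides of zero (7 positive, 5 negative), and their mean, about 1.4 nanoseconds, is much smaller than the largest single miss of 7.5 nanoseconds.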

Scientists everywhere repeat their measurements and use the average to get more reliable results. Even students in a chemistry lab often do this. Just as larger samples reduce variation in a sample statistic, averaging over more measurements reduces variation in the final result.

Use averages to improve reliability

No measuring process is perfectly reliable. The average of several repeated measurements of the same individual is more reliable (less variable) than a single measurement.
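A small simulation can illustrate this principle. The sketch below assumes a hypothetical object with a true weight of 10 grams and an unbiased instrument whose single readings vary with a standard deviation of 0.5 gram; both numbers, and all function names, are made up for illustration. It repeats the "measure n times and average" procedure many times and reports how much the final result varies.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

TRUE_VALUE = 10.0   # hypothetical true weight, in grams
NOISE_SD = 0.5      # hypothetical spread of a single measurement

def measure():
    """One unbiased but noisy measurement of the same object."""
    return random.gauss(TRUE_VALUE, NOISE_SD)

def average_of(n):
    """Final result when n repeated measurements are averaged."""
    return mean([measure() for _ in range(n)])

def spread_of_results(n, trials=2000):
    """Standard deviation of the final result over many repetitions."""
    return stdev([average_of(n) for _ in range(trials)])

spreads = {n: spread_of_results(n) for n in (1, 4, 16)}
for n, s in spreads.items():
    print(f"average of {n:2d} measurements: spread = {s:.3f}")
```

Under these assumptions the spread drops by about half each time the number of measurements is quadrupled. That matches the standard result for independent measurements: the variability of an average shrinks with the square root of the number of measurements.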

Figure 8.2 This atomic clock at the National Institute of Standards and Technology is accurate to 1 second in 6 million years. (Source: NIST.)

Unfortunately, there is no similarly straightforward way to reduce the bias of measurements. Bias depends on how good the measuring instrument is. To reduce the bias, you need a better instrument. The atomic clock at NIST (Figure 8.2) is accurate to 1 second in 6 million years but is a bit large to put beside your bed.
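To see why averaging does not help with bias, consider a sketch in which a hypothetical instrument reads 0.3 gram too high on average. The true value, the size of the bias, and the noise level are all invented for illustration. Averaging more readings makes the result steadier, but it settles on the wrong value.

```python
import random
from statistics import mean

random.seed(2)  # fixed seed so the run is repeatable

TRUE_VALUE = 10.0   # hypothetical true weight, in grams
BIAS = 0.3          # hypothetical systematic error of the instrument
NOISE_SD = 0.5      # hypothetical spread of a single measurement

def biased_measure():
    """One noisy measurement from an instrument that reads too high."""
    return random.gauss(TRUE_VALUE + BIAS, NOISE_SD)

# Error of the averaged result (average minus truth) for growing n.
results = {}
for n in (1, 100, 10000):
    avg = mean([biased_measure() for _ in range(n)])
    results[n] = avg - TRUE_VALUE
    print(f"n = {n:5d}: average misses the truth by {results[n]:+.3f}")
```

With 10,000 readings the average misses the truth by very nearly the assumed bias of 0.3 gram, not by zero. Only a better instrument removes that error.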

EXAMPLE 10 Measuring unemployment again

Measuring unemployment is also “measurement.” The concepts of bias and reliability apply here just as they do to measuring length or time.

The Bureau of Labor Statistics checks the reliability of its measurements of unemployment by having supervisors reinterview about 5% of the sample. This is repeated measurement on the same individual, just as a student in a chemistry lab measures a weight several times.

The BLS attacks bias by improving its instrument. That’s what happened in 1994, when the Current Population Survey was given its biggest overhaul in more than 50 years. The old system for measuring unemployment, for example, underestimated unemployment among women because the detailed procedures had not kept up with changing patterns of women’s work. The new measurement system corrected that bias—and raised the reported rate of unemployment.