Pity the poor psychologist

Statisticians think about measurement much the same way as they think about sampling. In both settings, the big idea is to ask, “What would happen if we did this many times?” In sampling, we want to estimate a population parameter, and we worry that our estimate may be biased or vary too much from sample to sample. Now we want to measure the true value of some property, and we worry that our measurement may be biased or vary too much when we repeat the measurement on the same individual. Bias is systematic error that happens every time; high variability (low reliability) means that our result can’t be trusted because it isn’t repeatable.

Thinking of measurement this way is pretty straightforward when you are measuring your weight. To start with, you have a clear idea of what your “true weight” is. You know that there are really good scales around: start at the doctor’s office, go to the physics lab, end up at NIST (the National Institute of Standards and Technology). You can measure your weight as accurately as you wish. This makes it easy to see that your bathroom scale always reads 3 pounds too high. Reliability is also easy to describe: step on and off the scale many times and see how much its readings vary.
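The bathroom-scale example can be made concrete by simulating "what would happen if we did this many times." The sketch below is illustrative only. It assumes a hypothetical true weight of 150 pounds, a scale that adds a fixed 3-pound error to every reading (the bias from the text), and random reading-to-reading noise that is normally distributed with a standard deviation of half a pound. The noise level and the normal shape are assumptions, not facts about any real scale. Bias shows up as the gap between the average reading and the true weight. Variability (the reverse of reliability) shows up as the spread of the readings around their own average.

```python
import random
import statistics


def simulate_scale(true_weight, bias, noise_sd, n_readings, seed=0):
    """Simulate repeated readings from a hypothetical scale.

    Each reading = true weight + a fixed bias (the same every time)
    + random noise (different every time, assumed normal).
    """
    rng = random.Random(seed)
    return [true_weight + bias + rng.gauss(0, noise_sd) for _ in range(n_readings)]


def summarize(readings, true_weight):
    """Return (estimated bias, spread) for a set of repeated readings.

    Estimated bias: how far the average reading sits from the true value.
    Spread: standard deviation of the readings (high spread = low reliability).
    """
    estimated_bias = statistics.mean(readings) - true_weight
    spread = statistics.stdev(readings)
    return estimated_bias, spread


# Illustrative numbers: the 150 lb true weight and 0.5 lb noise are assumptions.
true_weight = 150.0
readings = simulate_scale(true_weight, bias=3.0, noise_sd=0.5, n_readings=1000)
est_bias, spread = summarize(readings, true_weight)
print(f"estimated bias: {est_bias:.2f} lb, spread (SD): {spread:.2f} lb")
```

With many readings, the estimated bias should come out close to 3 pounds and the spread close to the assumed half pound. Averaging more readings shrinks the random part of the error but does nothing to remove the bias, which is exactly why bias and reliability are separate questions. The simulation works only because we know the true weight. That is the step that has no counterpart when the "true value" is an authoritarianism score.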

Asking “What would happen if we did this many times?” is a lot harder to put into practice when we want to measure “intelligence” or “readiness for college.” Consider as an example the poor psychologist who wants to measure “authoritarian personality.”

EXAMPLE 11 Authoritarian personality?

Do some people have a personality type that disposes them to rigid thinking and to following strong leaders? Psychologists looking back on the Nazis after World War II thought so. In 1950, a group of psychologists developed the “F-scale” as an instrument to measure “authoritarian personality.” The F-scale asks how strongly you agree or disagree with statements such as the following:

  • Obedience and respect for authority are the most important virtues children should learn.

  • Science has its place, but there are many important things that can never be understood by the human mind.

Strong agreement with such statements marks you as authoritarian. The F-scale and the idea of the authoritarian personality continue to be prominent in psychology, especially in studies of prejudice and right-wing extremist movements.

Here are some questions we might ask about using the F-scale to measure “authoritarian personality.” The same questions come to mind when we think about IQ tests or the SAT exam.

  1. Just what is an “authoritarian personality”? We understand this much less well than we understand your weight. The answer in practice seems to be “whatever the F-scale measures.” Any claim for validity must rest on what kinds of behavior high F-scale scores go along with. That is, we fall back on predictive validity.

  2. The F in “F-scale” stands for Fascist. As the second statement in Example 11 suggests, people who hold traditional religious beliefs are likely to get higher F-scale scores than similar people who don’t hold those beliefs. Does the instrument reflect the beliefs of those who developed it? That is, would people with different beliefs come up with a quite different instrument?

  3. You think you know what your true weight is. What is the true value of your F-scale score? The measuring devices at NIST can help us find a true weight but not a true authoritarianism score. If we suspect that the instrument is biased as a measure of “authoritarian personality” because it penalizes religious beliefs, how can we check that?

  4. You can weigh yourself many times to learn the reliability of your bathroom scale. If you take the F-scale test many times, you remember what answers you gave the first time. That is, repeats of the same psychological measurement are not really repeats. Thus, reliability is hard to check in practice. Psychologists sometimes develop several forms of the same instrument in order to repeat their measurements. But how do we know these forms are really equivalent?

The point is not that psychologists lack answers to these questions. The first two are controversial because not all psychologists think about human personality in the same way. The last two have at least partial answers, though not simple ones. The point is that “measurement,” which seems so straightforward when we measure weight, is complicated indeed when we try to measure human personality.

There is a larger lesson here. Be wary of statistical “facts” about squishy topics like authoritarian personality, intelligence, and even readiness for college. The numbers look solid, as numbers always do. But data are a human product and reflect human desires, prejudices, and weaknesses. If we don’t understand and agree on what we are measuring, the numbers may produce more disagreement than enlightenment.