13.3 Applying Correlation in Psychometrics
Psychometrics is the branch of statistics used in the development of tests and measures.
Psychometricians are the statisticians and psychologists who develop tests and measures.
Here’s an in-demand career available to students of the behavioral sciences: psychometrics, which is the branch of statistics used in the development of tests and measures. Not surprisingly, the statisticians and psychologists who develop tests and measures are called psychometricians. Psychometricians use the statistical procedures referred to in this textbook, particularly those for which correlation forms the mathematical backbone. Psychometricians make sure that elections are fair, they test for cultural biases in standardized tests, they identify high-achieving employees, and they make a wide range of social contributions—and we don’t have nearly enough of them. The New York Times reported (Herszenhorn, 2006) a “critical shortage” of such experts and intense competition for the few who are available—who are being offered U.S. salaries as high as $200,000 a year! Psychometricians use correlation to examine two important aspects of the development of measures—reliability and validity.
Reliability
Test–retest reliability refers to whether the scale being used provides consistent information every time the test is taken.
In Chapter 1, we defined a reliable measure as one that is consistent. For example, if we measure shyness, then a reliable measure leads to nearly the same score every time a person takes the shyness test. One particular type of reliability is test–retest reliability. Test–retest reliability refers to whether the scale being used provides consistent information every time the test is taken. To calculate a measure’s test–retest reliability, the measure is given twice to the same sample, typically with a delay between tests. The participants’ scores from the first administration are then correlated with their scores from the second administration. A large positive correlation indicates that the measure yields the same results consistently over time; that is, it has good test–retest reliability (Cortina, 1993).
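The test–retest calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the scores are invented for six hypothetical participants, and the helper name pearson_r is our own.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    # Sum of the products of deviations, divided by the square root of the
    # product of the two sums of squared deviations.
    numerator = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    denominator = sqrt(sum(dx ** 2 for dx in dev_x) * sum(dy ** 2 for dy in dev_y))
    return numerator / denominator

# Hypothetical (made-up) shyness scores for the same six participants,
# measured on two occasions with a delay in between.
time_1 = [12, 18, 25, 9, 30, 21]
time_2 = [14, 17, 27, 10, 28, 22]

test_retest_r = pearson_r(time_1, time_2)
print(f"Test-retest reliability: r = {test_retest_r:.2f}")
```

The closer this correlation is to 1.00, the more consistently the measure ranks the same people across the two occasions.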
Correlation and Reliability Correlation is used by psychometricians to help professional sports teams assess the reliability of athletic performance, such as how fast a pitcher can throw a baseball.
AP Photo/Michael Manning
Another way to measure the reliability of a test is by assessing its internal consistency in order to verify that all the items were measuring the same idea (DeVellis, 1991). Initially, researchers measured internal consistency via split-half reliability, correlating participants’ scores on the odd-numbered items (1, 3, 5, etc.) with their scores on the even-numbered items (2, 4, 6, etc.). If this correlation coefficient is large, then the test has high internal consistency. The odd–even approach is easy to understand, but computers now allow researchers to take a more sophisticated approach. A computer can calculate the average of every possible split-half reliability.
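The odd–even split described above can be sketched as follows. The response matrix is invented for illustration (six hypothetical participants answering 10 items on a 1-to-5 scale), and the names responses, odd_totals, and even_totals are our own.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    numerator = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    denominator = sqrt(sum(dx ** 2 for dx in dev_x) * sum(dy ** 2 for dy in dev_y))
    return numerator / denominator

# Hypothetical data: each row is one participant, each column is one item.
responses = [
    [4, 5, 4, 4, 5, 4, 5, 4, 4, 5],
    [2, 1, 2, 2, 1, 2, 1, 2, 2, 1],
    [3, 3, 4, 3, 3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 4, 5, 5, 4, 5, 5],
    [1, 2, 1, 2, 2, 1, 1, 2, 1, 2],
    [3, 4, 3, 3, 4, 4, 3, 3, 4, 3],
]

# Items 1, 3, 5, ... sit at list positions 0, 2, 4, ...; items 2, 4, 6, ... at 1, 3, 5, ...
odd_totals = [sum(row[0::2]) for row in responses]
even_totals = [sum(row[1::2]) for row in responses]

split_half_r = pearson_r(odd_totals, even_totals)
print(f"Odd-even split-half correlation: r = {split_half_r:.2f}")
```

Each participant contributes one total for the odd-numbered items and one for the even-numbered items, and the two sets of totals are then correlated.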
Coefficient alpha, symbolized as α, is a commonly used estimate of a test or measure’s reliability and is calculated by taking the average of all possible split-half correlations; sometimes called Cronbach’s alpha.
Consider a 10-item measure. A computer can calculate correlations between the odd-numbered items and even-numbered items, between the first 5 items and the last 5 items, between items 1, 2, 4, 8, 10 and items 3, 5, 6, 7, 9, and so on for every combination of two groups of 5 items. The computer can then calculate what is essentially (although not always exactly) the average of all possible split-half correlations (Cortina, 1993). This value is called coefficient alpha (or Cronbach’s alpha, in honor of the statistician who developed it). Coefficient alpha (symbolized as α) is a commonly used estimate of a test or measure’s reliability. It is used frequently across a wide range of fields, including psychology, education, sociology, political science, medicine, economics, criminology, and anthropology (Cortina, 1993). (Note that this alpha is different from the p level, or alpha, used in hypothesis testing.)
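In practice, statistical software usually does not compute every split-half correlation one by one. A common shortcut, which we assume here rather than take from this chapter, is the standard variance-based formula: α = [k / (k − 1)] × [1 − (sum of the item variances) / (variance of the total scores)], where k is the number of items. The sketch below applies that formula to invented data; the function name cronbach_alpha and the response matrix are hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Coefficient alpha from a participants-by-items matrix (list of rows).

    Assumes the standard variance-based formula, using sample variances.
    """
    k = len(responses[0])                       # number of items
    items = list(zip(*responses))               # regroup the scores by item
    item_variances = [variance(item) for item in items]
    total_scores = [sum(row) for row in responses]
    return (k / (k - 1)) * (1 - sum(item_variances) / variance(total_scores))

# Hypothetical data: six participants, 10 items, each scored from 1 to 5.
responses = [
    [4, 5, 4, 4, 5, 4, 5, 4, 4, 5],
    [2, 1, 2, 2, 1, 2, 1, 2, 2, 1],
    [3, 3, 4, 3, 3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 4, 5, 5, 4, 5, 5],
    [1, 2, 1, 2, 2, 1, 1, 2, 1, 2],
    [3, 4, 3, 3, 4, 4, 3, 3, 4, 3],
]

alpha = cronbach_alpha(responses)
print(f"Coefficient alpha = {alpha:.2f}")
```

When the items rise and fall together across participants, the variance of the total scores is much larger than the sum of the item variances, and alpha approaches 1.00.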
MASTERING THE CONCEPT
13-6: Correlation is used to calculate reliability either through test–retest reliability or through a measure of internal consistency such as coefficient alpha.
When developing a new scale or measure, how high should its reliability be? A scale is generally not worth using in research if its coefficient alpha is less than 0.80. However, if we are using a scale to make decisions about individuals (for example, if we are using the SAT or a diagnostic tool), we should aim for a coefficient alpha of 0.90 or even 0.95 (Nunnally & Bernstein, 1994). We want high reliability when we are using a test that directly affects people’s lives, but the test also needs to be valid.
Validity
In Chapter 1, we defined a valid measure as one that measures what it was designed or intended to measure. Many researchers consider validity to be the most important concept in the field of psychometrics (e.g., Nunnally & Bernstein, 1994). It can be a great deal more work to measure validity than reliability, however, so that work is not always done. In fact, it is quite possible to have a reliable test, one that measures a variable, such as shyness, consistently over time and is internally consistent but is still not valid. Just because the items on a test all measure the same thing doesn’t mean that they’re measuring what we want them to measure or what we think they are measuring.
MASTERING THE CONCEPT
13-7: Correlation is used to calculate validity, often by correlating a new measure with existing measures known to assess the variable of interest.
For example, Cosmopolitan magazine often has quizzes that claim to assess readers’ relationships with their boyfriends. If you’ve ever taken one of these quizzes, you might wonder whether some of the quiz items actually measure what the quiz suggests. One quiz, titled “Is He Devoted to You?,” asks “Be honest: Do you ever worry that he might cheat on you?” Does this item assess a man’s devotion to his partner or his partner’s jealousy? Another item asks: “When you introduced him to your closest friends, he said:” and then offers three options—(1) “I’ve heard so much about all of you! So, how’d you become friends?” (2) “‘Hi,’ then silence—he looked a bit bored.” (3) “‘Nice to meet you’ with a big smile.” Does this measure his devotion or his social skills? Such a quiz might be reliable (you’d consistently get the same score), but it might not be a valid measure of a man’s devotion to his partner. Devotion, jealousy, and social skills are different concepts.
It takes a psychometrician who understands correlation to test the validity of such measures. Typically, a psychometrician finds other measures with which to correlate the new measure. For instance, a new scale to measure anxiety might be correlated with an existing measure known to be valid, or with physiological measures of anxiety such as heart rate. If scores on the new anxiety measure correlate strongly with scores on these established measures, this is evidence of its validity.
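As a rough sketch of this approach, the code below correlates scores on a hypothetical new anxiety scale with two criteria: an established anxiety measure and heart rate. All of the numbers and variable names are invented for illustration; a real validation study would use far more participants.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    numerator = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    denominator = sqrt(sum(dx ** 2 for dx in dev_x) * sum(dy ** 2 for dy in dev_y))
    return numerator / denominator

# Hypothetical scores for the same seven participants on three measures.
new_scale = [10, 22, 15, 30, 8, 26, 18]       # the new anxiety scale
established = [12, 25, 14, 33, 9, 24, 20]     # an existing, validated anxiety scale
heart_rate = [62, 80, 70, 88, 60, 84, 72]     # beats per minute

r_established = pearson_r(new_scale, established)
r_heart_rate = pearson_r(new_scale, heart_rate)

print(f"New scale vs. established scale: r = {r_established:.2f}")
print(f"New scale vs. heart rate:        r = {r_heart_rate:.2f}")
```

Strong positive correlations with both criteria would count as evidence that the new scale measures anxiety; weak correlations would suggest that it measures something else.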
Validity and Personality Quizzes Correlation also can help establish the validity of a personality test—a more difficult task than establishing reliability. Magazine publishers and Web sites probably never check on the reliability or validity of the quizzes they develop. Think of them as mere entertainment.
Spencer Grant/PhotoEdit
Here’s another example concerning validity. In a groundbreaking study on affirmative action in higher education, researchers studied the success of more than 35,000 black and white students who attended 1 of 28 highly selective universities (Bowen & Bok, 2000). When determining validity, it is important that we consider how we will operationalize the variable of interest—here, success.
In this study, the researchers first considered the obvious criteria to operationalize success: these students’ future graduate education and career achievement. Their findings debunked the myth that black graduates of such institutions did not achieve the successes of their white counterparts. The researchers then went a step further and assessed a success-related criterion very important to the social fabric of a society: graduates’ levels of civic and community participation, including political involvement and community service. They found that significantly more black graduates than white graduates of these top institutions were actively involved in their communities. Through an examination of validity, this research changed the nature of the debate on affirmative action—by widening the pool of criteria by which we operationalize success.
CHECK YOUR LEARNING
Reviewing the Concepts
Correlation is a central part of psychometrics, the statistics of the construction of tests and measures.
Psychometricians, the statisticians who practice psychometrics, use correlation to establish the reliability and the validity of a test.
Test–retest reliability can be estimated by correlating the same participants’ scores on the same test at two different time points.
Coefficient alpha, now widely used to establish reliability, is essentially calculated by taking the average of all possible split-half correlations (i.e., not just the odds vs. the evens).
Clarifying the Concepts
13-11 How does the field of psychometrics make use of correlation?
13-12 What does coefficient alpha measure and how is it calculated?
Calculating the Statistics
13-13 A researcher is assessing a diagnostic tool for determining whether students should be placed in a remedial reading program. The researcher calculates coefficient alpha and finds that it is 0.85.
a. Does the test have sufficient reliability to be used as a diagnostic tool? Why or why not?
b. Does the test have sufficient validity to be used as a diagnostic tool? Why or why not?
c. What information would we need to appropriately assess the validity of the test?
Applying the Concepts
13-14 Remember the Cosmopolitan devotion quiz we referred to when discussing validity? Imagine that the magazine hired a psychometrician to assess the reliability and validity of its quizzes, and she administered this 10-item quiz to 100 readers of that magazine who had boyfriends.
a. How could the psychometrician establish the reliability of the quiz? That is, which of the methods introduced in this chapter could she use in this case? Be specific, and cite at least two ways.
b. How could the psychometrician establish the validity of the quiz? Be specific, and cite at least two ways.
c. Choose one of your criteria from part (b) and explain why it might not actually measure the underlying variable of interest. That is, explain how your criterion itself might not be valid.
Solutions to these Check Your Learning questions can be found in Appendix D.