32.4 The Question of Bias

32-5 Are intelligence tests inappropriately biased?

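If one assumes that race is a meaningful concept, the debate over racial differences in intelligence divides into three camps (Hunt & Carlson, 2007):

  • There are genetically disposed racial differences in intelligence.
  • There are socially influenced racial differences in intelligence.
  • There are race-based differences in test scores, but the tests are inadequate or biased.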

We have considered group differences from the first and second perspectives. Let’s turn now to the third: Are intelligence tests biased? The answer depends on which of two very different definitions of bias we use.

Two Meanings of Bias

The scientific meaning of bias hinges on a test’s validity—on whether it predicts future behavior only for some groups of test-takers. For example, if the SAT accurately predicted the college achievement of women but not that of men, then the test would be biased. In this statistical meaning of the term, the near-consensus among psychologists (as summarized by the U.S. National Research Council’s Committee on Ability Testing and the American Psychological Association’s Task Force on Intelligence) has been that the major U.S. aptitude tests are not biased (Hunt & Carlson, 2007; Neisser et al., 1996; Wigdor & Garner, 1982). The tests’ predictive validity is roughly the same for women and men, for various races, and for rich and poor. If an intelligence test score of 95 predicts slightly below-average grades, that rough prediction usually applies equally to all.

But we can also consider a test biased if it detects not only innate differences in intelligence but also performance differences caused by cultural experiences. This in fact happened to Eastern European immigrants in the early 1900s. Lacking the experience to answer questions about their new culture, many were classified as “feeble-minded.” In this popular sense, intelligence tests are biased. They measure your developed abilities, which reflect, in part, your education and experiences.

You may have heard of intelligence test items that make cultural assumptions (for example, that a cup goes with a saucer). Such items bias the test against those who do not use saucers. Could such questions explain cultural differences in test performance? In such cases, tests can be a vehicle for discrimination, consigning potentially capable children (some of whom may have a different native language) to dead-end classes and jobs. Thus, some intelligence researchers recommend creating culture-neutral questions, such as ones assessing people’s ability to learn novel words, sayings, and analogies, to enable culture-fair aptitude tests (Fagan & Holland, 2007, 2009).

Defenders of the existing aptitude tests have noted that racial group differences persist on nonverbal items, such as counting digits backward (Jensen, 1983, 1998). Moreover, they add, blaming the test for a group’s lower scores is like blaming a messenger for bad news. Why blame the tests for exposing unequal experiences and opportunities? If, because of malnutrition, people were to suffer stunted growth, would you blame the measuring stick that reveals it? If unequal past experiences predict unequal future achievements, a valid aptitude test will detect such inequalities.


Test-makers’ expectations, then, can introduce bias into an intelligence test. This is consistent with an observation we have seen throughout this text: Our expectations and attitudes can influence our perceptions and behaviors. This is also true for the person taking the test.

RETRIEVAL PRACTICE

  • What is the difference between a test that is biased culturally and a test that is biased in terms of its validity?

A test may be culturally biased if higher scores are achieved by those with certain cultural experiences. That same test may not be biased in terms of validity if it predicts what it is supposed to predict. For example, the SAT may be culturally biased in favor of those with experience in the U.S. school system, but it still accurately predicts U.S. college success.

Test-Takers' Expectations

When Steven Spencer and his colleagues (1997) gave a difficult math test to equally capable men and women, women did not do as well, except when they had been led to expect that women usually do as well as men on the test. Otherwise, something impaired their performance. And with Claude Steele and Joshua Aronson, Spencer (2002) again observed this self-fulfilling stereotype threat: Black students performed worse on verbal aptitude tests when they were reminded of their race just before taking them. Follow-up experiments have confirmed that negatively stereotyped minorities and women may have unrealized academic potential (Nguyen & Ryan, 2008; Walton & Spencer, 2009). If, when taking an intelligence test or an exam, you are worried that your group or “type” often doesn’t do well, your self-doubts and self-monitoring may hijack your working memory and impair your performance (Schmader, 2010). Such thoughts, and worries about what others are thinking about you, can be distracting. For such reasons, stereotype threat may impair attention, performance, and learning (Inzlicht & Kang, 2010; Rydell, 2010). Remove the threat (for example, by labeling the assessment a “warm-up” exercise rather than a “test”), and stereotyped minorities often perform better (Taylor & Walton, 2011).

“Math class is tough!”

“Teen talk” talking Barbie doll (introduced July 1992, recalled October 1992)

Critics argue that stereotype threat does not fully account for Black-White aptitude score differences or the gender gap in high-level math achievements (Sackett et al., 2004, 2008; Stoet & Geary, 2012). But it does help explain why Blacks have scored higher when tested by Blacks than when tested by Whites (Danso & Esses, 2001; Inzlicht & Ben-Zeev, 2000). It gives us insight into why women have scored higher on math tests with no male test-takers present, and why women’s online chess play drops sharply when they think they are playing a male opponent (Maass et al., 2008). It also explains “the Obama effect”—the finding that African-American adults performed better if they took a verbal aptitude test immediately after watching then-candidate Barack Obama’s stereotype-defying nomination acceptance speech or just after his 2008 presidential victory (Marx et al., 2009).

Stereotype threat: Academic success can be hampered by self-doubt and self-monitoring during exams, which may impair attention, memory, and performance.

Steele (1995, 2010) concludes that telling students they probably won’t succeed (as is sometimes implied by remedial “minority support” programs) functions as a stereotype that can erode performance. Over time, such students may detach their self-esteem from academics and look for recognition elsewhere. Indeed, as African-American male students progress from eighth to twelfth grade, a growing disconnect appears between their grades and their self-esteem, and they tend to underachieve (Osborne, 1997).


One experiment randomly assigned some African-American seventh graders to write for 15 minutes about their most important values (Cohen et al., 2006, 2009). That simple exercise in self-affirmation had the apparent effect of boosting their semester grade point average by 0.26 in a first experiment and 0.34 in a replication. Can a brief confidence-boosting exercise actually increase school achievement? “It was hard for us to believe,” reported Geoffrey Cohen (2013), “but we’ve replicated it since,” including among women in college physics. Other research teams also have reproduced the benefits of the self-affirmation exercise (Bowen et al., 2012; Harackiewicz et al., 2013; Miyake et al., 2010; Sherman et al., 2013). Likewise, minority students in university programs that have challenged them to believe in their potential, or to focus on the idea that intelligence is malleable rather than fixed, have earned markedly higher grades and dropped out less often (Wilson, 2006).

***

What, then, can we realistically conclude about aptitude tests and bias? The tests are not biased in the scientific sense of failing to make valid statistical predictions for different groups. But they are indeed biased (appropriately so, some would say) in one sense—sensitivity to performance differences caused by cultural experience. Are the tests discriminatory? Again, the answer can be Yes or No. In one sense, Yes, their purpose is to discriminate—to distinguish among individuals. In another sense, No, their purpose is to reduce discrimination by decreasing reliance on subjective criteria for school and job placement—who you know, how you dress, or whether you are the “right kind of person.” Civil service aptitude tests, for example, were devised to discriminate more fairly and objectively by reducing the political, racial, ethnic, and gender discrimination that preceded their use. Banning aptitude tests would lead those who decide on jobs and admissions to rely more on other considerations, such as personal opinion.

“Almost all the joyful things of life are outside the measure of IQ tests.”

Madeleine L’Engle, A Circle of Quiet, 1972

Perhaps, then, our goals for tests of mental abilities should be threefold. First, we should realize the benefits that intelligence testing pioneer Alfred Binet foresaw—to enable schools to recognize who might profit most from early intervention. Second, we must remain alert to Binet’s fear that intelligence test scores may be misinterpreted as literal measures of a person’s worth and potential. Third, we must remember that the competence that general intelligence tests sample is important; it helps enable success in some life paths. But it reflects only one aspect of personal competence, while missing the irrational thoughts and other kinds of thinking common to us all (Stanovich et al., 2013, 2014). Our practical intelligence and emotional intelligence matter, too, as do other forms of creativity, talent, and character.

The point to remember: There are many ways of being successful; our differences are variations of human adaptability. Life’s great achievements result not only from “can do” abilities (and fair opportunity) but also from “will do” motivation. Competence + Diligence → Accomplishment.

“[Einstein] showed that genius equals brains plus tenacity squared.”

Walter Isaacson, “Einstein’s Final Quest,” 2009

RETRIEVAL PRACTICE

  • What psychological principle helps explain why women tend to perform more poorly when they believe their online chess opponent is male?

stereotype threat
