PART 9: Thinking, Language, and Intelligence

27.2 Assessing Intelligence

27-4 What is an intelligence test, and what is the difference between achievement and aptitude tests?

intelligence test a method for assessing an individual’s mental aptitudes and comparing them with those of others, using numerical scores.

An intelligence test assesses people’s mental abilities and compares them with those of others, using numerical scores. Psychologists classify such tests as either aptitude tests, intended to predict your ability to learn a new skill, or achievement tests, intended to reflect what you have already learned. How do we design such tests, and what makes them credible? Consider why psychologists created tests of mental abilities and how they have used them.

aptitude test a test designed to predict a person’s future performance; aptitude is the capacity to learn.

achievement test a test designed to assess what a person has learned.

What Do Intelligence Tests Test?

27-5 When and why were intelligence tests created, and how do today’s tests differ from early intelligence tests?

Barely a century ago, psychologists began designing tests to assess people’s abilities. Some measured aptitude (ability to learn). Others assessed achievement (what people have already learned).

ALFRED BINET: PREDICTING SCHOOL ACHIEVEMENT Modern intelligence testing traces its birth to early-twentieth-century France, where a new law required all children to attend school. French officials knew that some children, including many newcomers to Paris, would struggle and need special classes. But how could the schools make fair judgments about children’s learning potential? Teachers might assess children who had little prior education as slow learners. Or they might sort children into classes on the basis of their social backgrounds. To minimize such bias, France’s minister of public education gave Alfred Binet (1857–1911) the task of solving this problem.

Alfred Binet (1857–1911) “Some recent philosophers have given their moral approval to the deplorable verdict that an individual’s intelligence is a fixed quantity, one which cannot be augmented. We must protest and act against this brutal pessimism” (Binet, 1909, p. 141).

mental age a measure of intelligence test performance devised by Binet; the chronological age that most typically corresponds to a given level of performance. Thus, a child who does as well as an average 8-year-old is said to have a mental age of 8.

In 1905, Binet and his student, Théodore Simon, first presented their work under the archaic title, “New Methods for Diagnosing the Idiot, the Imbecile, and the Moron” (Nicolas & Levine, 2012). They began by assuming that all children follow the same course of intellectual development, but that some develop more rapidly. A “dull” child should score much like a typical younger child, and a “bright” child like a typical older child. Binet and Simon now had a clear goal: They would measure each child’s mental age, the level of performance typically associated with a certain chronological age. The average 8-year-old, for example, has a mental age of 8. An 8-year-old with a below-average mental age (perhaps performing at the level of a typical 6-year-old) would struggle with schoolwork considered normal for 8-year-olds.

“The IQ test was invented to predict academic performance, nothing else. If we wanted something that would predict life success, we’d have to invent another test completely.”

Social psychologist Robert Zajonc (1984b)

Binet and Simon tested a variety of reasoning and problem-solving questions on Binet’s two daughters, and then on “bright” and “backward” Parisian schoolchildren. Items that the successful students more often answered correctly could then be used to predict how well other French children would handle their schoolwork. Binet hoped his test would be used to improve children’s education, but he also feared it would be used to label children and limit their opportunities (Gould, 1981).

RETRIEVE IT

Question

What did Binet hope to achieve by establishing a child's mental age?

ANSWER: Binet hoped that determining mental age (the age that typically corresponds to a child's level of performance) would help identify appropriate school placements for children.

Stanford-Binet the widely used American revision (by Terman at Stanford University) of Binet’s original intelligence test.

LEWIS TERMAN: THE INNATE IQ Binet’s fears were realized soon after his death in 1911, when others adapted his tests for use as a numerical measure of inherited intelligence. Stanford University professor Lewis Terman (1877–1956) found that the Paris-developed questions and age norms worked poorly with California schoolchildren. He adapted some items, added others, and established new standards for various ages. He also extended the upper end of the test’s range from teenagers to “superior adults” and gave his revision the name it retains today—the Stanford-Binet. Terman assumed that certain ethnic groups were naturally more intelligent, and he supported the controversial eugenics movement, which aimed to protect and improve human genetic quality through selective sterilization and breeding.

intelligence quotient (IQ) defined originally as the ratio of mental age (ma) to chronological age (ca) multiplied by 100 (thus, IQ = ma/ca × 100). On contemporary intelligence tests, the average performance for a given age is assigned a score of 100.

From such tests, German psychologist William Stern derived the famous intelligence quotient, or IQ. The IQ was simply a person’s mental age divided by chronological age and multiplied by 100 to get rid of the decimal point. Thus, an average child, whose mental age (8) and chronological age (8) are the same, has an IQ of 100. But an 8-year-old who answers questions at the level of a typical 10-year-old has an IQ of 125:

IQ = (mental age of 10 ÷ chronological age of 8) × 100 = 125

The original IQ formula worked fairly well for children but not for adults. (Should a 40-year-old who does as well on the test as an average 20-year-old be assigned an IQ of only 50?) Most current intelligence tests, including the Stanford-Binet, no longer compute an IQ in this manner (though the term IQ still lingers in everyday vocabulary as shorthand for “intelligence test score”). Instead, they represent the test-taker’s performance relative to the average performance of others the same age. This average performance is arbitrarily assigned a score of 100, and about two-thirds of all test-takers fall between 85 and 115.
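
The arithmetic behind the original ratio IQ can be sketched in a few lines of Python. This is only an illustration of the formula IQ = ma/ca × 100 given above; the function name ratio_iq is a hypothetical label chosen here, and the three cases are the ones mentioned in the text.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Original ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100


# Cases taken from the text above.
average_child = ratio_iq(8, 8)     # mental age equals chronological age
advanced_child = ratio_iq(10, 8)   # 8-year-old performing like a typical 10-year-old
adult_case = ratio_iq(20, 40)      # 40-year-old performing like an average 20-year-old

print(average_child, advanced_child, adult_case)  # prints: 100.0 125.0 50.0
```

The last case shows why the ratio formula breaks down for adults: an ordinary 40-year-old would receive a score of 50 simply because mental age stops rising with chronological age.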

RETRIEVE IT

Question

What is the IQ of a 4-year-old with a mental age of 5?

ANSWER: 125 (5 ÷ 4 × 100 = 125)

Wechsler Adult Intelligence Scale (WAIS) the WAIS and its companion versions for children are the most widely used intelligence tests; contain verbal and performance (nonverbal) subtests.

DAVID WECHSLER: SEPARATE SCORES FOR SEPARATE SKILLS Psychologist David Wechsler created what is now the most widely used individual intelligence test, the Wechsler Adult Intelligence Scale (WAIS). There is a version for school-age children (the Wechsler Intelligence Scale for Children [WISC]), and another for preschool children (Evers et al., 2012). The 2008 edition of the WAIS consists of 15 subtests, including:

Matching patterns: Block design puzzles test visual abstract processing ability. Wechsler’s individually administered intelligence test comes in forms suited for adults and children.

Similarities—Considering the commonality of two objects or concepts (“In what way are wool and cotton alike?”)
Vocabulary—Naming pictured objects, or defining words (“What is a guitar?”)
Block Design—Visual abstract processing (“Using the four blocks, make one just like this.”)
Letter-Number Sequencing—On hearing a series of numbers and letters, repeat the numbers in ascending order, and then the letters in alphabetical order (“R-2-C-1-M-3.”)

The WAIS yields both an overall intelligence score and individual scores for verbal comprehension, perceptual organization, working memory, and processing speed. Striking differences among these individual scores can provide clues to cognitive strengths or weaknesses. For example, a low verbal comprehension score combined with high scores on other subtests could indicate a reading or language disability. Other comparisons can help a therapist establish a rehabilitation plan for a stroke patient. In such ways, these tests help realize Binet’s aim: to identify opportunities for improvement and strengths that teachers and others can build upon.

RETRIEVE IT

Question

An employer with a pool of applicants for a single available position is interested in testing each applicant's potential. To help her decide whom she should hire, she should use an (achievement/aptitude) test. That same employer wishing to test the effectiveness of a new, on-the-job training program would be wise to use an (achievement/aptitude) test.

Three Tests of a “Good” Test

27-6 What is a normal curve, and what does it mean to say that a test has been standardized and is reliable and valid?

To be widely accepted, a psychological test must be standardized, reliable, and valid. The Stanford-Binet and Wechsler tests meet these requirements.

standardization defining uniform testing procedures and meaningful scores by comparison with the performance of a pretested group.

normal curve the bell-shaped curve that describes the distribution of many physical and psychological attributes. Most scores fall near the average, and fewer and fewer scores lie near the extremes.

WAS THE TEST STANDARDIZED? The number of questions you answer correctly on an intelligence test would reveal almost nothing. To know how well you performed, you would need some basis for comparison. That’s why test-makers give new tests to a representative sample of people. The scores from this pretested group become the basis for future comparisons. If you later take the test following the same procedures, your score will be meaningful when compared with others. This process is called standardization.

If we construct a graph of test-takers’ scores, the scores typically form a bell-shaped pattern called the normal curve. No matter what attributes we measure—height, weight, or mental aptitude—people’s scores tend to form a bell curve. The highest point is the midpoint, or the average score. On an intelligence test, we give this average score a value of 100 (FIGURE 27.3). Moving out from the average, toward either extreme, we find fewer and fewer people. For the Stanford-Binet and Wechsler tests, a person’s score indicates whether that person’s performance fell above or below the average. A performance higher than all but 2 percent of all scores earns an intelligence score of 130. A performance lower than 98 percent of all scores earns an intelligence score of 70.

FIGURE 27.3 The normal curve. Scores on aptitude tests tend to form a normal, or bell-shaped, curve around an average score. For the Wechsler scale, for example, the average score is 100.
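
The percentages attached to the normal curve can be checked with a short Python sketch. It assumes a mean of 100 (from the text) and a standard deviation of 15, which the text does not state but which fits the 85-to-115 and 70-to-130 landmarks it mentions. The variable names are illustrative only.

```python
from statistics import NormalDist

# Assumption: mean 100 comes from the text; the standard deviation of 15
# is not stated in the text and is supplied here for illustration.
iq_scores = NormalDist(mu=100, sigma=15)

share_85_to_115 = iq_scores.cdf(115) - iq_scores.cdf(85)  # within one SD of the mean
share_above_130 = 1 - iq_scores.cdf(130)                  # two SDs above the mean
share_below_70 = iq_scores.cdf(70)                        # two SDs below the mean

print(f"Between 85 and 115: {share_85_to_115:.1%}")
print(f"Above 130: {share_above_130:.1%}")
print(f"Below 70: {share_below_70:.1%}")
```

Under these assumptions, roughly 68 percent of scores fall between 85 and 115 and a little over 2 percent fall beyond 130 or below 70, which matches the "about two-thirds" and "2 percent" figures used in this module.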

reliability the extent to which a test yields consistent results, as assessed by the consistency of scores on two halves of the test, on alternative forms of the test, or on retesting.

IS THE TEST RELIABLE? Knowing your score in comparison to the standardization group still won’t tell you much unless the test has reliability. A reliable test gives consistent scores, no matter who takes the test or when they take it. To check a test’s reliability, researchers test people many times. They may retest people using the same test, or they may split the test in half and see whether odd-question scores and even-question scores agree. If the two sets of scores generally agree, or correlate, the test is reliable. The higher the correlation, the higher the test’s reliability. The tests we have considered so far—the Stanford-Binet, the WAIS, and the WISC—are, after early childhood, very reliable (about +.9). When retested, people’s scores generally match their first score closely.
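
The split-half check described above can be sketched in Python. The response data below are invented purely for illustration, and the helper names pearson_r and split_half_scores are hypothetical; the sketch only shows how odd-question and even-question totals might be correlated, not how any published test is actually scored.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cov / sqrt(var_x * var_y)


def split_half_scores(item_responses):
    """Return (odd-question total, even-question total) for one test-taker."""
    odd_total = sum(item_responses[0::2])   # questions 1, 3, 5, ...
    even_total = sum(item_responses[1::2])  # questions 2, 4, 6, ...
    return odd_total, even_total


# Hypothetical data: one row per test-taker, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

halves = [split_half_scores(row) for row in responses]
odd_scores = [odd for odd, _ in halves]
even_scores = [even for _, even in halves]

reliability = pearson_r(odd_scores, even_scores)
print(f"Split-half correlation: {reliability:.2f}")
```

A correlation near +1 between the two halves suggests the test is measuring consistently; a correlation near 0 would suggest the two halves disagree.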

validity the extent to which a test measures or predicts what it is supposed to. (See also content validity and predictive validity.)

IS THE TEST VALID? High reliability does not ensure a test’s validity—the extent to which the test actually measures or predicts what it promises. Imagine using a miscalibrated tape measure to measure people’s heights. Your results would be very reliable. No matter how many times you measured, people’s heights would be the same. But your results would not be valid, because you would not be giving the information you promised: real height.

content validity the extent to which a test samples the behavior that is of interest.

predictive validity the success with which a test predicts the behavior it is designed to predict; it is assessed by computing the correlation between test scores and the criterion behavior. (Also called criterion-related validity.)

Tests that tap the pertinent behavior, or criterion, have content validity. The road test for a driver’s license has content validity because it samples the tasks a driver routinely faces. Course exams have content validity if they assess your mastery of course material. But we expect intelligence tests to have predictive validity: They should predict future performance, and to some extent they do.

The predictive power of aptitude tests is fairly strong in the early school years, but later it weakens. Past grades, which reflect both aptitude and motivation, are better predictors of future achievements.

RETRIEVE IT

Question

What are the three criteria that a psychological test must meet in order to be widely accepted? Explain.

ANSWER: A psychological test must be standardized (pretested on a representative sample of people), reliable (yielding consistent results), and valid (measuring what it is supposed to measure).

Question

Correlation coefficients were used in this module. Here's a quick review: Correlations do not indicate cause-effect, but they do tell us whether two things are associated in some way. A correlation of –1.0 represents perfect (agreement/disagreement) between two sets of scores: As one score goes up, the other score goes (up/down). A correlation of ______ represents no association. The highest correlation, +1.0, represents perfect (agreement/disagreement): As the first score goes up, the other score goes (up/down).
