The types of thinking that we have discussed so far in this chapter all feed into a larger question: what is intelligence, and how can it be measured?
The history of intelligence tests. The first attempts to develop intelligence tests took place in late nineteenth-century England, where Francis Galton, convinced that intelligence was inherited, tried to measure it with simple sensory and motor tasks. These measures turned out to be unrelated to meaningful criteria such as academic success, so the effort failed, and the first accepted test came instead from France.
In France at the beginning of the twentieth century, Alfred Binet and his assistant Théophile Simon were working on the problem of academic retardation. France had recently introduced mass public education, and the French government asked Binet to develop a test to diagnose children whose intellectual development was subnormal and who would not be able to keep up with their peers in school. This test, published in 1905, was the first accepted test of intelligence (Fancher, 1985). The test calculated a child’s performance as an intellectual level, or mental age, that then could be used in diagnosing academic retardation. Mental age is the age typically associated with a child’s level of test performance. If a child’s mental age were less than his or her chronological (actual) age, the child would need remedial work. The fact that Binet helped in the development of “mental orthopedics”—mental exercises designed to raise the intelligence of children classified as subnormal—indicates that, unlike Galton, he believed that intelligence could be nurtured and improved by environmental experiences.
intelligence quotient (IQ) (mental age/chronological age) × 100.
The next major figure in intelligence test development, Lewis Terman, had Galton’s nature bias, but he used Binet and Simon’s test. Working at Stanford University, Terman revised the intelligence test for use with American schoolchildren. This revision, first published in 1916, became known as the Stanford-Binet Intelligence Scale. Terman reported a child’s performance on it as an intelligence quotient (IQ), computed as (mental age/chronological age) × 100, so a child whose mental age equaled his or her chronological age had an IQ of 100.
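Because the ratio IQ is simple arithmetic, a few lines of code make it concrete. The following is a minimal illustrative sketch; the function name and the example ages are ours, not from any test manual:

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Terman-era ratio IQ: (mental age / chronological age) x 100."""
    return (mental_age / chronological_age) * 100

# A 10-year-old performing at the level of a typical 12-year-old:
print(ratio_iq(12, 10))  # 120.0
# A 10-year-old performing at the level of a typical 8-year-old:
print(ratio_iq(8, 10))   # 80.0
```

Note that the ratio stops making sense for adults, whose chronological age keeps increasing after mental growth levels off; this is one reason Wechsler later replaced it with the deviation IQ described below.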
David Wechsler was working as chief psychologist at Bellevue Hospital in New York City in the 1930s and was in charge of testing thousands of adult patients from very diverse backgrounds (Fancher, 1985). Like Binet, Wechsler thought that intelligence was nurtured by one’s environment. Given that most of his patients were undereducated, he wanted to get a much broader assessment of their abilities. The Stanford-Binet was poorly suited to this task: it had been designed for children, its content was heavily verbal, and its scoring was based on mental age, which makes little sense for adults.
Given all of these problems, Wechsler developed his own test, the Wechsler Bellevue Scale, in 1939 (Gardner, Kornhaber, & Wake, 1996). This test became known as the Wechsler Adult Intelligence Scale (WAIS) in 1955 and is appropriate for ages 16 and older. Wechsler also developed the Wechsler Intelligence Scale for Children (WISC) in 1949 for children aged 6 to 16. Both tests (the WAIS and WISC) provide finer analyses of a person’s abilities by providing test scores for a battery of not only verbal tests (such as vocabulary and word reasoning) but also nonverbal, performance tests (such as block design, matrix reasoning, and visual puzzles). An item similar to those on the WAIS nonverbal Visual Puzzles subtest and items similar to those on the WAIS Cancellation subtest are given in Figures 6.3a and 6.3b, respectively. The WAIS is in its fourth edition, and the WISC and Stanford-Binet are in their fifth editions.
standardization The process that allows test scores to be interpreted by providing test norms.
Deviation IQ scores. Wechsler also devised a better way to report intelligence test scores, called deviation IQ scores. Like IQs, deviation IQs involve standardization. Standardization is the process that allows test scores to be interpreted by providing test norms. To standardize a test, the test must be given to a representative sample of the relevant population. The scores of this sample then serve as the test norms for interpretation. For example, Terman standardized the original Stanford-Binet on a sample of American schoolchildren, so that a child’s performance could be interpreted relative to the norms for the child’s own age group.
deviation IQ score 100 plus or minus (15 × the number of standard deviations the person is from the raw score mean for their standardization group).
To calculate a person’s deviation IQ, Wechsler compared the person’s raw test score to the normal distribution of raw scores for that person’s standardization age group. He calculated how far the raw score was from the mean raw score in terms of standard deviation units from the mean. To make the deviation scores resemble IQ formula scores, he set the mean to 100 and the standard deviation to 15. He then defined a person’s deviation IQ score as 100 plus or minus (15 × the number of standard deviation units a person’s raw test score is from the mean for the relevant age group norms). For example, if a person’s raw test score fell 1 standard deviation above the mean for his or her age group, he or she would have a deviation IQ score of 100 plus (15 × 1), or 115. The deviation IQ scale for the WAIS is illustrated in Figure 6.4. The same scale is used for the WISC but with standardization data for various child age groups; a similar deviation IQ scale with the standard deviation set at 15 is now used for scoring the Stanford-Binet.
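In code, the deviation IQ computation is just a z-score rescaled to a mean of 100 and a standard deviation of 15. This sketch assumes the age-group norms are summarized by a mean and standard deviation of raw scores; the norm values below are invented for illustration:

```python
def deviation_iq(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Deviation IQ: 100 + 15 * z, where z is the number of standard
    deviation units the raw score lies from the mean of the person's
    standardization age group."""
    z = (raw_score - norm_mean) / norm_sd
    return 100 + 15 * z

# Hypothetical age-group norms: mean raw score 40, standard deviation 8.
print(deviation_iq(48, 40, 8))  # 115.0 (one SD above the mean)
print(deviation_iq(36, 40, 8))  # 92.5  (half an SD below the mean)
```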
reliability The extent to which the scores for a test are consistent.
Reliability and validity. In addition to being standardized, a good test must also be reliable and valid. Reliability is the extent to which the scores for a test are consistent. This may be assessed in various ways. In the test–retest method, the same people take the same test on two separate occasions, and the correlation between the two sets of scores is computed; a strong positive correlation indicates good reliability.
If multiple forms of the test are available, then alternate-forms reliability can be assessed by having the same people take different forms of the test and correlating the two sets of scores, which also guards against people simply remembering their earlier answers. The intelligence tests discussed in this section have good reliability.
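Since each reliability method described above comes down to correlating two sets of scores, a short sketch shows the computation. The scores here are invented, and the Pearson correlation stands in for whichever pairing (test–retest or alternate forms) is being assessed:

```python
import numpy as np

# Hypothetical scores for eight people on two administrations of a test.
first_testing  = np.array([98, 112, 105, 87, 120, 101, 93, 110])
second_testing = np.array([101, 110, 108, 90, 118, 99, 95, 107])

# Reliability is indexed by the Pearson correlation between the two
# sets of scores; values near +1 indicate consistent measurement.
r = np.corrcoef(first_testing, second_testing)[0, 1]
print(round(r, 2))  # ~0.98 for these made-up data
```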
validity The extent to which a test measures what it is supposed to measure or predicts what it is supposed to predict.
In addition to reliability, a test should have validity. Validity is the extent to which a test measures what it is supposed to measure or predicts what it is supposed to predict. The former type of validity is called content validity, and the latter, predictive validity. Content validity means that the test covers the content that it is supposed to cover. Experts decide this. All course exams should have content validity (be on the content that was assigned). Predictive validity means that the test predicts behavior that is related to what is being measured by the test; for example, an intelligence test should predict how well a child does in school. Children and teenagers higher in intelligence should do better on average than children and teenagers lower in intelligence, and this is the case for the intelligence tests that we have discussed. These tests all have good predictive validity. It is important to note that if a test is valid, then it will also be reliable. However, the reverse is not true. A test may be reliable, but not valid. A good example is measuring the circumference of the head as an indicator of intelligence. This number would be consistent across two measurements and therefore reliable, but it does not predict intelligent behavior (does not have predictive validity).
Intelligence test scores are among the most valid predictors of academic performance and job performance across just about every major occupation studied (Neisser et al., 1996; Sackett, Schmitt, Ellingson, & Kabin, 2001; Schmidt & Hunter, 1998). Not only do intelligence tests have good predictive validity, but they also are not biased against women or minorities. The tests’ predictive validity applies equally to all: it is roughly the same regardless of gender, race, ethnicity, and so on, and group differences in test scores are accompanied by comparable differences in performance. As Lilienfeld, Lynn, Ruscio, and Beyerstein (2010) point out, the belief that these tests are biased in this way is a widespread myth; the research on this question indicates that intelligence tests and other standardized abilities tests, such as the SAT, are not biased against either women or minorities. A task force of the American Psychological Association (Neisser et al., 1996) and two National Academy of Sciences panels (Hartigan & Wigdor, 1989; Wigdor & Garner, 1982) reached this same conclusion. In sum, the question of intelligence test bias with respect to predictive validity has been settled about as conclusively as any scientific controversy can be (Gottfredson, 1997, 2009; Sackett, Borneman, & Connelly, 2008).
Psychologists agree on what an intelligence test should predict, but they do not agree on how intelligence should be defined. Is intelligence one general ability or many specific abilities? Does intelligence involve more than just mental abilities? Are there multiple types of intelligence? Different psychologists answer these questions in different ways based on their definitions of intelligence. The other major controversy concerning intelligence is the nature–nurture question: is intelligence determined primarily by heredity or by environmental experiences? We will consider the major theories of intelligence first and then turn to the nature–nurture question.
The argument over whether intelligence is a single general ability or a collection of specific abilities has been around for over a hundred years. Many theorists have argued that intelligence comprises multiple abilities, but the exact number of abilities or types of intelligence proposed has varied from 2 to over 100. In our discussion, we’ll consider a few of the more prominent theories, first considering those that propose that intelligence is one or more mental abilities and then theories that define intelligence more broadly, including more than just mental abilities as assessed by intelligence tests.
Theories of intelligence. Mental ability theories of intelligence began with Charles Spearman (1927), who argued that intelligence test performance is a function of two types of factors: (1) a g factor (general intelligence), and (2) some s factors (specific intellectual abilities such as reasoning). Spearman thought that the g factor was more important because it was relevant across the various mental abilities that make up an intelligence test, and that the s factors were more pertinent to specific subtests of an intelligence test.
Spearman’s theory was based on his observation that people who did well on one subtest usually did well on most of the subtests, and people who did poorly usually did so on most of the subtests. Individuals did not do equally well on all of the various subtests, however, indicating the secondary effects of the s factors or specific abilities on the various subtests. Contemporary research has shown the g factor to be a good predictor of performance both in school and at work (Gottfredson, 2002a, 2002b).
factor analysis A statistical technique that identifies clusters of test items that measure the same ability (factor).
In contrast, one of Spearman’s contemporaries, L. L. Thurstone, argued that specific mental abilities (like Spearman’s s factors) were more important (Thurstone, 1938). Based on his research, he argued that there were seven primary mental abilities: verbal comprehension, word fluency, number facility, spatial visualization, associative memory, perceptual speed, and reasoning. Both Spearman and Thurstone drew their conclusions from factor analysis, a statistical technique that identifies clusters of test items that measure the same ability (factor); they differed mainly in how they interpreted its results.
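A toy simulation can show what factor analysis means by “clusters of test items.” The sketch below is purely illustrative: the subtest names, the two latent abilities, and all the numbers are invented, and scikit-learn’s FactorAnalysis stands in for the procedures Spearman and Thurstone actually used:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 500  # hypothetical test takers

# Two made-up latent abilities, each driving three observed subtests.
verbal = rng.normal(size=n)
spatial = rng.normal(size=n)
noise = lambda: 0.5 * rng.normal(size=n)
subtests = np.column_stack([
    verbal + noise(),   # vocabulary
    verbal + noise(),   # comprehension
    verbal + noise(),   # word reasoning
    spatial + noise(),  # block design
    spatial + noise(),  # matrix reasoning
    spatial + noise(),  # visual puzzles
])

# Factor analysis recovers the two clusters: the first three subtests
# load heavily on one factor, the last three on the other.
fa = FactorAnalysis(n_components=2).fit(subtests)
print(np.round(fa.components_, 2))
```

How much weight to give a single general factor versus such specific clusters is precisely the interpretive dispute between Spearman and Thurstone.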
Later, Raymond Cattell (a student of Spearman) and John Horn proposed a slightly different type of mental ability theory that has mainly influenced researchers focused on aging (Cattell, 1987; Horn & Cattell, 1966, 1967). They proposed that the g factor should be viewed as two types of intelligence: (1) fluid intelligence and (2) crystallized intelligence. Fluid intelligence refers to abilities independent of acquired knowledge, such as abstract reasoning, logical problem solving, and the speed of information processing. They defined crystallized intelligence as accumulated knowledge and verbal and numerical skills. This theory has interested researchers focused on aging because crystallized intelligence increases with experience and formal education and grows as we age; fluid intelligence is not influenced by these factors and actually declines with age. However, a recent large-scale study complicates this simple picture, suggesting that different cognitive abilities peak at different ages across the life span.
All of the theories we have discussed thus far focus on definitions of intelligence as mental abilities that can be assessed by standard intelligence tests such as the Stanford-Binet and the WAIS. Other theorists, however, have proposed broader conceptions of intelligence. Howard Gardner, for example, argues that there are eight independent intelligences, summarized in Table 6.1. Critics view some of these, such as musical and bodily-kinesthetic intelligence, as talents or skills rather than true types of intelligence.
Table 6.1 Gardner’s Eight Independent Intelligences

| Type of Intelligence | Description |
|---|---|
| Linguistic | Language ability, as in reading, writing, and speaking |
| Logical-Mathematical | Mathematical problem solving and scientific analysis |
| Spatial | Reasoning about visual spatial relationships |
| Musical | Musical skills such as the ability to compose and understand music |
| Bodily-Kinesthetic | Skill in body movement and handling objects |
| Intrapersonal | Understanding oneself |
| Interpersonal | Understanding other people |
| Naturalist | Ability to discern patterns in nature |
According to Sternberg’s triarchic theory of intelligence, there are three types of intelligence: analytical, creative, and practical. Analytical intelligence consists of the mental abilities measured by standard intelligence tests; creative intelligence is the ability to deal adaptively with novel problems and situations; and practical intelligence is the ability to solve the everyday, real-world problems that standard tests do not address. In Sternberg’s view, the creative and practical intelligences are especially important outside the academic world.
Cognitive researcher Keith Stanovich (2009a, 2009b) argues that intelligence is a meaningful, useful construct, but, unlike Gardner and Sternberg, he is not interested in expanding its definition. Rather, he argues that intelligence is only one component of good thinking and thus by itself is not sufficient to explain such thinking. The other critical component is our ability to think and act rationally, which is not assessed by standard intelligence tests. Further, these two components are independent, so you can be intelligent and not act rationally, and vice versa. This is why smart people sometimes do foolish things. Stanovich coined the term “dysrationalia” to describe this failure to think and behave rationally despite having adequate intelligence.
One cause of dysrationalia is that we tend to be cognitive misers, using System 1 too much. This is the reason we have developed a whole set of heuristics and biases (many such as anchoring, representativeness, and confirmation bias were discussed earlier in this chapter) to limit the amount of reflective, analytical thinking that we need to engage in. As we have learned, these shortcut strategies provide rough and ready answers that are right sometimes but often wrong. Another source of dysrationalia is what Stanovich calls the mindware gap, which occurs when we haven’t learned the appropriate mindware (specific knowledge, such as an understanding of probability, and cognitive rules and strategies, such as scientific thinking, that are necessary to think rationally). According to Stanovich, many intelligent people never acquire the appropriate mindware. Finally, given such causes, Stanovich thinks that rational thinking and behavior can be taught and that they ought to be taught at every stage of the educational system. Richard Nisbett, in his 2015 book Mindware: Tools for Smart Thinking, provides a guide to the most essential mindware tools and how to frame common problems so that these tools can be applied to them.
The six theories of intelligence that we have discussed are briefly summarized in Table 6.2. Next we will consider the controversial nature–nurture question about the origins of intelligence.
Table 6.2 Six Theories of Intelligence

| Theorist | Theory Summary |
|---|---|
| Spearman | Intelligence is mainly a function of a general intelligence (g) factor |
| Thurstone | Intelligence is a function of seven primary mental abilities: verbal comprehension, word fluency, number facility, spatial visualization, associative memory, perceptual speed, and reasoning |
| Cattell and Horn | There are two types of intelligence: fluid intelligence (abilities independent of acquired knowledge) and crystallized intelligence (accumulated knowledge and verbal and numerical skills) |
| Gardner | Intelligence is defined as eight independent intelligences: linguistic, logical-mathematical, spatial, musical, bodily-kinesthetic, intrapersonal, interpersonal, and naturalist |
| Sternberg | Intelligence is defined as three types of abilities: analytical, creative, and practical |
| Stanovich | Intelligence is not sufficient for good thinking; rationality, which is independent of intelligence, is also necessary |
Nature versus nurture. Not only do psychologists disagree on the definition of intelligence, they also argue about its origins: is intelligence determined primarily by heredity (nature) or by environmental experiences (nurture)? As we will see, the research evidence indicates that both contribute.
First, we’ll consider the results of genetic similarity studies. Genetic similarity between people varies from 100% similarity between identical twins, to 50% between fraternal twins and between siblings (brothers and sisters), to 0% between two unrelated people. If intelligence were due to heredity, the average correlations between intelligence scores should decrease as genetic similarity decreases. Researchers have found this to be the case (Bouchard & McGue, 1981). The average correlation coefficient drops from +.86 for identical twins to essentially 0 for unrelated people. The data, however, also show effects of environment. For example, if identical twins are raised apart (adopted into different families), the average correlation between their intelligence test scores drops to +.72, indicating the importance of sharing similar environments.
Let’s consider two more of these findings to see how the effects of both heredity and environment are indicated. The average correlation between fraternal twins raised together (+.60) is less than that for identical twins reared apart (+.72), indicating the influence of heredity; but the average correlation is greater than that for ordinary siblings reared together (+.47), indicating environmental influences. Remember that the amount of genetic similarity for fraternal twins and ordinary siblings is the same, 50%. This means that the greater correlation for fraternal twins (+.60 versus +.47) must be due to environmental factors. The fraternal twins are the same age; hence their environmental experiences are more similar than those for ordinary siblings of different ages. As these two findings indicate, heredity and environment work together to influence intelligence test scores.
Researchers have also looked at adopted children and the correlations of their intelligence test scores with those of both their adoptive and biological parents. The modest correlation between the intelligence test scores of adopted children and their adoptive parents disappears as the children age (McGue, Bouchard, Iacono, & Lykken, 1993). The reverse is true, however, for the correlation between the scores for adopted children and their biological parents: it increases (Plomin, DeFries, McClearn, & Rutter, 1997). This stronger relationship between a person’s intelligence and that of his or her biological parents suggests that nature plays a larger role than environmental experiences in determining a person’s intelligence.
heritability An index of the degree that variation of a trait within a given population is due to heredity.
reaction range The genetically determined limits for an individual’s intelligence.
The results of genetic similarity studies of intelligence can also be used to estimate its heritability, an index of the degree that variation of a trait within a given population is due to heredity. These estimates vary, usually in the range from around 50% up to 70% (Bouchard, Lykken, McGue, Segal, & Tellegen, 1990). Thus, for a given population, 50% to 70% of the variation in their intelligence test scores is estimated to be due to heredity. However, because heritability is not 100%, this means that heredity and environment work together to determine intelligence (though heredity may make a larger contribution). Given this fact, recent research focuses on how heredity and environment interact to determine a person’s intelligence score. The assumption is that heredity determines a reaction range, genetically determined limits for an individual’s intelligence. Heredity places upper and lower limits on a person’s intelligence, but the quality of the person’s environment determines where the individual falls within this range. The principle is simple: enriched, stimulating environments move a person toward the upper limit of his or her reaction range, while deprived environments hold the person toward the lower limit.
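To see where estimates like 50% to 70% can come from, here is one classic and much-simplified estimator, Falconer’s formula, applied to the twin correlations reported above. The formula itself is not described in the text, and modern estimates use more sophisticated models:

```python
# Falconer's formula: h^2 = 2 * (r_identical - r_fraternal), using the
# average twin correlations reported earlier in this section.
r_identical = 0.86  # identical twins reared together
r_fraternal = 0.60  # fraternal twins reared together

h_squared = 2 * (r_identical - r_fraternal)
print(round(h_squared, 2))  # 0.52 -> about 52%, near the low end of 50-70%
```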
Two points about heritability should be made clear. First, it is a group statistic and is not relevant to individual cases. For example, if the heritability estimate for intelligence for a population were 50%, this does not mean that 50% of the intelligence of an individual in the group is determined by genetics and the other 50% by environmental experiences. It means that 50% of the variation in intelligence among people in the population is due to genetics and 50% of the variation is due to environmental experiences. Second, it is important to understand that heritability has nothing to do with the differences that have been observed between populations, such as the difference in scores for Asian schoolchildren versus American schoolchildren. Heritability only applies to the variation within a given population or group, not to variation between groups. Observed group differences must be analyzed individually and in a different way.
Let’s examine the gap between Asian and American schoolchildren to see one way such analyses are done (Stevenson, Chen, & Lee, 1993; Stevenson & Stigler, 1992). In this case, researchers examined the group difference for children of different ages and concluded that the gap is likely due to the priority placed on education in Asian countries. Why? There doesn’t seem to be a gap before the children enter school. The gap begins and increases as the children proceed through school. The Asian cultures place higher value on education, and so the children spend more time studying and work harder, achieving more in school and higher test scores. There seem to be clear environmental factors operating to explain this particular group difference.
Flynn effect The finding that the average intelligence test score in the United States and other industrialized nations has improved steadily over the last century.
As if there are not enough unanswered questions and debates about intelligence, a recent curious finding has led to yet another. It is called the “Flynn effect” because intelligence researcher James Flynn popularized it. The “Flynn effect” label was coined by Herrnstein and Murray in their book The Bell Curve (1994). Actually, Flynn (2007) says that if he had thought to name it, he would have called it the “Tuddenham effect” because Tuddenham (1948) was the first to present convincing evidence of the effect, using the intelligence test scores of U.S. soldiers in World Wars I and II. The Flynn effect refers to the fact that in the United States and other Western developed nations, average intelligence scores have improved steadily over the past century (Flynn, 1987, 1999, 2007, 2012). For example, the average score in 1918 would be equivalent to a score of 76 when compared to recent standardization norms. This translates to a gain on average of about 3 points per decade on both the Stanford-Binet and the Wechsler tests.
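The arithmetic behind that 1918 example is easy to sketch. The function below is a hypothetical illustration, not an actual re-norming procedure; it simply applies the average gain of about 3 points per decade, and it assumes the “recent” norms in the example date to roughly 1998:

```python
def renormed_score(score: float, year_tested: int, year_of_norms: int,
                   gain_per_decade: float = 3.0) -> float:
    """Re-express a score against newer norms under the text's average
    Flynn-effect gain of about 3 IQ points per decade."""
    decades_elapsed = (year_of_norms - year_tested) / 10
    return score - gain_per_decade * decades_elapsed

# An average performance (100) in 1918 looks like about 76 against
# norms collected roughly eight decades later.
print(renormed_score(100, 1918, 1998))  # 76.0
```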
Proposed explanations for the Flynn effect involve a broad range of environmental factors, from improved nutrition, hygiene, and availability of medical services to better education and smaller average family size, but the explanation of the effect still remains a source of debate, with multiple factors likely contributing to it (Neisser, 1998; Mingroni, 2014; Williams, 2013). Joining this debate, Flynn has recently proposed that the effect is not due to people getting smarter overall but rather to the fact that they are getting smarter at skills that have become more important in our society over the past century, especially abstract, scientific thinking (Flynn, 2007, 2012). Our society has changed from one based on agriculture to one driven by science, technology, and information, so people today get far more schooling and practice in exactly the kinds of abstract, scientific thinking that intelligence tests emphasize.
Pietschnig and Voracek (2015) also found that although intelligence test score gains have continued over the past few decades, the size of the gain each decade has been decreasing. Consistent with this finding, other researchers have found data indicating that the Flynn effect may have already ended in some developed countries (for example, Denmark and Norway) but is still alive in the United States and some other developed nations, such as Germany and the countries that make up the United Kingdom (Flynn, 2012; Kaufman, 2009). These findings, however, are for developed countries. Has the Flynn effect also been found for intelligence test scores in developing countries, and if so, how does it compare to that for developed countries? The findings of a recent study by Wongupparaj, Kumari, and Morris (2015) address these questions. Their meta-analysis indicates that the Flynn effect is indeed present in developing countries and that recent gains there have been larger than those observed in developed countries.
In this section, we discussed how the first attempts at developing a valid intelligence test began with Galton in late nineteenth-century England, and how the first accepted intelligence test was developed in France by Binet and Simon to diagnose academic retardation. Terman later revised their test for American schoolchildren as the Stanford-Binet, reporting performance as an IQ, and Wechsler developed tests for adults (the WAIS) and for children (the WISC), introducing deviation IQ scores based on standardization group norms.
We also learned that these intelligence tests have both reliability (consistency in measurement) and predictive validity (predicting what they are supposed to predict) and are among the most valid predictors of both academic performance and job performance. In addition, research has not supported the claim that they are biased against women or minorities.
We also considered some of the major theories of intelligence. Most of these theories define intelligence in terms of mental abilities, but differ with respect to how many abilities are proposed. Using the results of factor analysis, Spearman thought that a general intelligence factor (the g factor) was most important, but other theorists, like Thurstone, emphasized multiple, more specific abilities in their definitions. Two recent theories have attempted to broaden the conception of intelligence and another theory points to the limitations of intelligence as a sufficient explanation of good thinking. Howard Gardner has proposed a theory of eight independent types of intelligence, but critics view some of these as talents or skills and not really types of intelligence. Robert Sternberg has also attempted to broaden the conception of intelligence in his triarchic theory of intelligence, which includes analytical, creative, and practical intelligences, with the latter two having more applicability in the nonacademic world. Keith Stanovich does not want to expand the definition of intelligence but rather argues that rationality in addition to intelligence is necessary for good thinking.
Last, we considered the origins of intelligence. We found that genetic similarity studies indicate that both nature (heredity) and nurture (environmental experiences) are important in determining one’s intelligence. However, both heritability estimates and the results of adoption studies indicate that nature likely plays a larger role than nurture in determining intelligence. The concept of reaction range attempts to explain how heredity and environmental experiences work together to determine an individual’s intelligence—heredity sets the upper and lower limits on a person’s intelligence, and the quality of the person’s environmental experiences determines where within that range his or her intelligence falls. Finally, we discussed the Flynn effect, the steady rise in average intelligence test scores over the past century, whose explanation remains a matter of debate.
1. Explain why standardization of a test is necessary.
Standardization of a test is necessary for the interpretation of test performance. In the standardization process, a representative sample of the relevant population takes the test and their scores are used as the norms for the test. Test takers’ scores are compared to those of the standardization group in order to determine an index of performance. For example, on intelligence tests a person’s performance is compared to the scores for a representative sample of the person’s age group.
2. Explain what a deviation IQ score is and how it differs from an IQ score.
A deviation IQ score is based on the normal distribution of scores for a person’s standardization age group. First, a person’s raw score is determined in terms of standard deviation units above or below the mean for that normal distribution. Then the person’s score in terms of standard deviation units is converted to a score that resembles an IQ score. For example, on the WAIS the mean (or 0 point) is set equal to 100 and the standard deviation to 15. Thus, a person who scores 1 standard deviation above the mean in comparison to his or her age group norms receives a score of 100 + 15, or 115. IQ scores were based on the following formula: IQ = (mental age/chronological age) × 100. A deviation IQ tells us how well a person did relative to the standardization data for the person’s age group. An IQ told us how well a child did relative to the child’s own actual age.
3. Explain how the results of studies examining the impact of genetic similarity on intelligence support both nature and nurture explanations.
The results of genetic similarity studies support the nature (heredity) explanation of intelligence because as genetic similarity decreases, the average correlation between intelligence test scores also decreases. The average correlation is strongest for identical twins raised together (+.86). However, these results also indicate that environment (nurture) plays a role. For example, the average correlation for intelligence test scores for identical twins decreases when the twins are adopted and raised apart in different environments. The twins are 100% genetically similar in both cases; therefore, environmental factors must be responsible for the difference in average correlations.
4. Explain how the more contemporary theories of intelligence proposed by Gardner, Sternberg, and Stanovich differ from the more traditional theories of intelligence (e.g., those proposed by Spearman and Thurstone).
The theories proposed by Gardner and Sternberg are different from the traditional theories of intelligence that we discussed in that both theorists broaden the definition of intelligence by proposing types of intelligence that are not measured by standard intelligence tests like the WAIS and Stanford-Binet. Gardner proposes eight independent intelligences, some of which (such as musical and bodily-kinesthetic intelligence) critics regard as talents or skills, and Sternberg proposes creative and practical intelligences in addition to the analytical intelligence that standard tests assess. Stanovich, in contrast, does not broaden the definition of intelligence; he argues that rationality, which is independent of intelligence and is not assessed by intelligence tests, is also necessary for good thinking. Traditional theories such as Spearman’s and Thurstone’s defined intelligence solely in terms of the mental abilities measured by intelligence tests, disagreeing only over whether a single general ability or several specific abilities mattered most.