Infographic 7.4: How Smart Are Intelligence Tests?

The introduction reads as follows: Tests that claim to measure intelligence are everywhere: online, in your favorite magazine, at job interviews, and in many elementary and secondary schools. But can all of these tests be trusted? The results of an intelligence test aren’t meaningful unless the test is valid, reliable, and fair. But what do those concepts mean, and how can we be sure whether a test is valid, reliable, or fair, let alone all three? Let’s take a look.

There are three blocks in the infographic. The top block is labeled, “Validity: Does the test measure what it intends to measure?” At the left is a bathroom scale with a rubber ducky sitting on it. The caption reads, “Is a bathroom scale valid for measuring height?” At the center is a rubber ducky standing between two vertical rulers. The left ruler starts at 0 inches, and the right ruler, which is missing its first inch, starts at 1 inch. A dotted arrow marks the top of the ducky at 3” on the left ruler and 4” on the right ruler. The caption reads, “How about a ruler missing its first inch?” A callout to the image reads as follows: A shortened ruler would not be a valid measure because it would give different results from other rulers. A valid intelligence test will provide results that:

Agree with the results of other valid intelligence tests.
Predict performance in an area related to intelligence, such as academic achievement.
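Agreement between two tests is commonly summarized with a correlation coefficient: a value near 1 means people who score high on one test also score high on the other. The sketch below computes a Pearson correlation by hand; the function name `pearson_r` and the five pairs of scores are hypothetical, invented purely for illustration and not taken from any real test.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term and the two spread terms (unnormalized; the n cancels).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical scores for the same five people on a new test
# and on an established, already-validated test.
new_test = [85, 95, 100, 110, 120]
established_test = [88, 93, 102, 108, 121]

r = pearson_r(new_test, established_test)
print(f"r = {r:.2f}")  # close to 1 for these made-up scores
```

The same calculation could compare test scores with a later outcome such as grades, which is one way to check whether a test predicts performance in a related area.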

The center block of the infographic is labeled, “Reliability: Will your score be consistent every time you take the test?” There are three images of a rubber ducky next to a ruler, with the toy facing a different direction in each. In every image, a dotted arrow points to the 4” mark on the ruler. The text reads as follows: A shortened ruler isn’t valid, but it is reliable because it will give the same result every time it’s used. A reliable intelligence test will provide results that:

Are reproducible (produce a similar score if taken a second time).
Show that the first and second halves of the test are consistent with each other.
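The ruler analogy separates the two ideas cleanly: a ruler missing its first inch gives the same reading every time (reliable) but the wrong reading (not valid). The sketch below models that with a hypothetical `read_ruler` function, assuming the 3-inch ducky from the infographic and three repeated measurements, one per image.

```python
TRUE_HEIGHT_IN = 3.0  # the rubber ducky is 3 inches tall


def read_ruler(true_height: float, missing_inches: float = 0.0) -> float:
    """Reading from a ruler whose scale starts at `missing_inches`.

    A ruler that has lost its first inch starts at the 1" mark,
    so every reading comes out `missing_inches` too high.
    """
    return true_height + missing_inches


# Measure the same ducky three times with each ruler.
standard = [read_ruler(TRUE_HEIGHT_IN) for _ in range(3)]
shortened = [read_ruler(TRUE_HEIGHT_IN, missing_inches=1.0) for _ in range(3)]

# Reliable: every repeated reading is identical.
reliable = len(set(shortened)) == 1
# Valid: readings match the true height (what a standard ruler reports).
valid = all(reading == TRUE_HEIGHT_IN for reading in shortened)

print(standard)         # [3.0, 3.0, 3.0]
print(shortened)        # [4.0, 4.0, 4.0]
print(reliable, valid)  # True False
```

Real test scores also vary a little from one sitting to the next, so in practice reliability means "similar," not identical, scores.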

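Wechsler IQ scores are scaled so that the average is 100 and the standard deviation is 15, which is why the curve's axis runs from 55 to 145 in steps of 15. Under that assumption, the percentages on the curve can be recovered from the normal distribution. This sketch uses Python's standard-library `statistics.NormalDist`; it illustrates the idealized curve, not any particular test's norms.

```python
from statistics import NormalDist

# Idealized Wechsler scale: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

within_15 = iq.cdf(115) - iq.cdf(85)  # within 1 standard deviation
within_30 = iq.cdf(130) - iq.cdf(70)  # within 2 standard deviations
lower_tail = iq.cdf(70)               # below 70 (upper tail is the same size)

print(round(within_15 * 100))   # 68
print(round(within_30 * 100))   # 95
print(round(lower_tail * 100))  # 2

# Comparing yourself with others: percent of people scoring below 115.
print(round(iq.cdf(115) * 100))  # 84
```

The unrounded tail is about 2.3%, which the infographic rounds to 2%.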
A normal curve is shown. The X-axis is labeled “Wechsler IQ score” and shows values from 55 to 145 in increments of 15. The Y-axis is labeled “Number of scores” but shows no values. Vertical lines intersect the curve at the 70 and 130 marks, and the two tails beyond these lines are shaded and each labeled 2%. The areas from 70 to 85 and from 115 to 130 are shaded, and an arrow spanning 70 to 130 is labeled “95%.” The area from 85 to 115 is also shaded, with an arrow spanning 85 to 115 labeled “68%.” A callout to the arrow reads, “68% of all people score within 15 points above or below the average score.” The text reads as follows: Because most intelligence tests are standardized, you can determine how well you have performed in comparison to others. Test scores tend to form a bell-shaped curve, called the normal curve, around the average score. Most people (68%) score within 15 points above or below the average. If the test is reliable, each person’s score should stay around the same place on the curve across repeated testing.

The lower block of the infographic is labeled “Fairness: Is the test valid for the group?” Text at the left reads as follows: An animal weighing 2 stone is likely to be a:

A. sparrow
B. small dog
C. mature lion
D. blue whale

Unless you live in the United Kingdom, where imperial units such as the stone are still commonly used, you probably wouldn’t know that a stone is 14 pounds. An animal weighing 2 stone therefore weighs 28 pounds, so the correct answer is B, a small dog. Does this mean that you are less intelligent, or that the test is biased against people without a specific cultural background? A culture-fair test is designed to minimize bias arising from cultural background. At the right are three images of a rubber ducky with a vertical ruler next to it. All three toys are the same height, but each ruler uses a different unit of measurement. The first is labeled “3 inches.” The second is labeled “2.286 Chinese Imperial cùn.” The third is labeled “0.1667 cubits.”
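The arithmetic behind the fairness examples is simple once you know the units, which is exactly the point: the question tests familiarity with the unit, not reasoning. The sketch below assumes 1 stone = 14 pounds and 1 inch = 2.54 cm (both exact by definition), plus two labeled assumptions that reproduce the infographic's figures: 1 cùn = 1/30 metre and 1 cubit = 18 inches. Historical cùn and cubit lengths varied, so other values are possible.

```python
STONE_LB = 14      # 1 stone = 14 pounds (exact)
INCH_CM = 2.54     # 1 inch = 2.54 cm (exact)
CUN_CM = 100 / 30  # assumption: 1 cùn = 1/30 metre, about 3.33 cm
CUBIT_IN = 18      # assumption: 1 cubit = 18 inches


def stone_to_pounds(stone: float) -> float:
    """Convert a weight in stone to pounds."""
    return stone * STONE_LB


duck_in = 3  # height of the rubber ducky in inches
duck_cun = duck_in * INCH_CM / CUN_CM
duck_cubits = duck_in / CUBIT_IN

print(stone_to_pounds(2))     # 28
print(round(duck_cun, 3))     # 2.286
print(round(duck_cubits, 4))  # 0.1667
```

All three rulers describe the same 3-inch ducky; only the unit changes, just as a culture-fair test aims to measure the same ability regardless of the test-taker's background.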