PRELUDE

Making Sense of Statistics

Statistics is about data. Data are numbers, but they are not “just numbers.” Data are numbers with a context. The number 10.5, for example, carries no information by itself. But if we hear that a friend’s new baby weighed 10.5 pounds at birth, we congratulate her on the healthy size of the child. The context engages our background knowledge and allows us to make judgments. We know that a baby weighing 10.5 pounds is quite large and that a human baby is unlikely to weigh 10.5 ounces or 10.5 kilograms. The context makes the number informative.

Statistics uses data to gain insight and to draw conclusions. The tools are graphs and calculations, but the tools are guided by ways of thinking that amount to educated common sense. Let’s begin our study of statistics with a rapid and informal guide to coping with data and statistical studies in the news media and in the heat of political and social controversy. We will examine the examples introduced in this prelude in more detail later.

Data beat anecdotes

Belief is no substitute for arithmetic.

HENRY SPENCER

An anecdote is a striking story that sticks in our minds exactly because it is striking. Anecdotes humanize an issue, so news reports usually start (and often stop) with anecdotes. But anecdotes are weak ground for making up your mind—they are often misleading exactly because they are striking. Always ask if a claim is backed by data, not just by an appealing personal story.

Does living near power lines cause leukemia in children? The National Cancer Institute spent 5 years and $5 million gathering data on the question. Result: no connection between leukemia and exposure to magnetic fields of the kind produced by power lines. The editorial that accompanied the study report in the New England Journal of Medicine thundered, “It is time to stop wasting our research resources” on the question.

Now compare the impact of a television news report of a 5-year, $5 million investigation with that of a televised interview with an articulate mother whose child has leukemia and who happens to live near a power line. In the public mind, the anecdote wins every time. Be skeptical. Data are more reliable than anecdotes because they systematically describe an overall picture rather than focus on a few incidents.

We are tempted to add, “Data beat self-proclaimed experts.” The idea of balance held by much of the news industry is to present a quick statement by an “expert” on either side. We never learn that one expert expresses the consensus of an entire field of science, while the other is a quack with a special-interest axe to grind. As a result of the media’s taste for conflict, the public now thinks that for every expert there is an equal and opposite expert. If you really care about an issue, try to find out what the data say and how good the data are. Many issues do remain unsettled, but many others are unsettled only in the minds of people who don’t care about evidence. You can start by looking at the credentials of the “experts” and at whether the studies they cite have appeared in journals that require careful outside review before they publish a claim.

Where the data come from is important

Figures won’t lie, but liars will figure.

CHARLES GROSVENOR

Data are numbers, and numbers always seem solid. Some are and some are not. Where the data come from is the single most important fact about any statistical study. When Ann Landers asked readers of her advice column whether they would have children again and 70% of those who replied shouted “No,” readers should have just amused themselves with Ann’s excerpts from tear-stained letters describing what beasts the writers’ children are. Ann Landers was in the entertainment business. Her invitation attracted parents who regretted having their children. Most parents don’t regret having children. We know this because opinion polls have asked large numbers of parents, chosen at random to avoid attracting one opinion or another. Opinion polls have their problems, as we will see, but they beat just asking upset people to write in.
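
A few lines of arithmetic show how a write-in invitation can turn a small minority into an apparent majority. The sketch below is only an illustration: the population of 1,000 parents, the 10% who regret having children, and the two write-in rates are all invented for the example and are not taken from the Ann Landers poll or from any real survey.

```python
# Hypothetical population: every number here is invented for illustration.
population = {"regret": 100, "no_regret": 900}  # 10% of parents regret

# Assumed chance (in percent) that a parent in each group bothers to write in.
write_in_percent = {"regret": 21, "no_regret": 1}


def write_in_share(population, write_in_percent):
    """Share of write-in replies that come from the 'regret' group."""
    replies = {
        group: population[group] * write_in_percent[group] // 100
        for group in population
    }
    return replies["regret"] / sum(replies.values())


true_share = population["regret"] / sum(population.values())
poll_share = write_in_share(population, write_in_percent)

print(f"share of all parents who regret:   {true_share:.0%}")
print(f"share of write-in replies who do:  {poll_share:.0%}")
```

Under these assumed rates, upset parents are far more likely to reply, so they dominate the replies even though they are rare in the population. A random sample gives every parent the same chance to be asked, which removes this distortion.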

Even the most reputable publications have not been immune to bad data. The Journal of the American Medical Association once printed an article claiming that pumping refrigerated liquid through tubes in the stomach relieves ulcers. The patients did respond, but only because patients often respond to any treatment given with the authority of a trusted doctor. That is, placebos (dummy treatments) work. When a skeptic finally tried a properly controlled study in which some patients got the tube and some got a placebo, the placebo actually did a bit better. “No comparison, no conclusion” is a good starting point for judging medical studies. We would be skeptical about the ongoing interest in “natural remedies,” for example. Few of these have passed a comparative trial to show that they are more than just placebos sold in bottles bearing pretty pictures of plants.

Beware the lurking variable

I have enough money to last me the rest of my life, unless I buy something.

JACKIE MASON

You read that crime is higher in counties with gambling casinos. A college teacher says that students who took a course online did better than the students in the classroom. Government reports emphasize that well-educated people earn a lot more than people with less education. Don’t jump to conclusions. Ask first, “What is there that they didn’t tell me that might explain this?”

Crime is higher in counties with casinos, but it is also higher in urban counties and in poor counties. What kinds of counties are casinos in? Did these counties have high crime rates before the casinos arrived? The online students did better, but they were older and better prepared than the in-class students. No wonder they did better. Well-educated people do earn a lot. But educated people have (on the average) parents with more education and more money than the parents of poorly educated people have. They grew up in nicer places and went to better schools. These advantages help them get more education and would help them earn more even without that education.

All these studies report a connection between two variables and invite us to conclude that one of these variables influences the other. “Casinos increase crime” and “Stay in school if you want to be rich” are the messages we hear. Perhaps these messages are true. But perhaps much of the connection is explained by other variables lurking in the background, such as the nature of counties that accept casinos and the advantages that highly educated people were born with. Good statistical studies look at lots of background variables. This is tricky, but you can at least find out if it was done.
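
The online-course example can be made concrete with a small calculation. In the sketch below, every figure is hypothetical: we assume that a student's score depends only on preparation (85 for well-prepared students, 70 for the rest) and not at all on the course format. The two formats differ only in the mix of students they attract.

```python
# Hypothetical rosters: (preparation level, number of students, mean score).
# By assumption, scores depend only on preparation, not on course format.
online = [("well prepared", 80, 85.0), ("less prepared", 20, 70.0)]
classroom = [("well prepared", 30, 85.0), ("less prepared", 70, 70.0)]


def overall_mean(groups):
    """Mean score over all students, weighting each group by its size."""
    total_students = sum(n for _, n, _ in groups)
    total_points = sum(n * mean for _, n, mean in groups)
    return total_points / total_students


online_mean = overall_mean(online)
classroom_mean = overall_mean(classroom)

print(online_mean)     # 82.0
print(classroom_mean)  # 74.5
```

The online students score 7.5 points higher overall, yet within each preparation level the two formats are identical. In this made-up example the whole gap comes from the lurking variable, preparation.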

Variation is everywhere

When the facts change, I change my mind. What do you do, sir?

JOHN MAYNARD KEYNES

If a thermometer under your tongue reads higher than 98.6°F, do you have a fever? Maybe not. People vary in their “normal” temperature. Your own temperature also varies—it is lower around 6 A.M. and higher around 6 P.M. The government announces that the unemployment rate rose a tenth of a percent last month and that new home starts fell by 3%. The stock market promptly jumps (or sinks). Stocks are jumpier than is sensible. The government data come from samples that give good estimates but not the exact truth. Another run of the same samples would give slightly different answers. And economic facts jump around anyway, due to weather, strikes, holidays, and all sorts of other reasons.
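
We can watch sampling variation happen by simulating it. The sketch below assumes a fixed "true" rate of 5% and draws five independent samples of 60,000 people each; both numbers are hypothetical choices for the illustration, and the simulation is far simpler than any real government survey.

```python
import random


def sample_rate(true_rate, sample_size, rng):
    """Estimate a rate from one random sample of the given size."""
    hits = sum(rng.random() < true_rate for _ in range(sample_size))
    return hits / sample_size


rng = random.Random(2024)  # fixed seed so the run is repeatable
TRUE_RATE = 0.05           # hypothetical true rate in the population
SAMPLE_SIZE = 60_000       # hypothetical sample size

estimates = [sample_rate(TRUE_RATE, SAMPLE_SIZE, rng) for _ in range(5)]
for i, estimate in enumerate(estimates, start=1):
    print(f"sample {i}: {estimate:.2%}")
```

The truth never changes in this simulation, but the estimates differ slightly from sample to sample. A month-to-month change of that size in a published figure may be nothing more than this kind of noise.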

Many people join the stock market in overreacting to minor changes in data that are really nothing but background noise. Here is Arthur Nielsen, head of the country’s largest market research firm, describing his experience:

Too many business people assign equal validity to all numbers printed on paper. They accept numbers as representing Truth and find it difficult to work with the concept of probability. They do not see a number as a kind of shorthand for a range that describes our actual knowledge . . .

Variation is everywhere. Individuals vary; repeated measurements on the same individual vary; almost everything varies over time. Ignore the pundits who try to explain the deep reasons behind each day’s stock market moves, or who condemn a team’s ability and character after a game decided by a last-second shot that did or didn’t go in.

Conclusions are not certain

As far as the laws of mathematics refer to reality they are not certain, and as far as they are certain they do not refer to reality.

ALBERT EINSTEIN

Because variation is everywhere, statistical conclusions are not certain. Most women who reach middle age have regular mammograms to detect breast cancer. Do mammograms really reduce the risk of dying of breast cancer? Statistical studies of high quality find that mammograms reduce the risk of death in women aged 50 to 64 years by 26%. That’s an average over all women in the age group. Because variation is everywhere, the results are different for different women. Some women who have mammograms every year die of breast cancer, and some who never have mammograms live to 100 and die when they crash their motorcycles.

What the summary study actually said was “mammography reduces the risk of dying of breast cancer by 26 percent (95 percent confidence interval, 17 to 34 percent).” That 26% is, in Arthur Nielsen’s words, “shorthand for a range that describes our actual knowledge of the underlying condition.” The range is 17% to 34%, and we are 95% confident that the truth lies in that range. We’re pretty sure, in other words, but not certain. Once you get beyond news reports, you can look for phrases like “95% confident” and “statistically significant” that tell us that a study did produce findings that, while not certain, are pretty sure.
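
To see how a single number can stand for a range, here is a minimal sketch of one common recipe: a 95% confidence interval for a sample proportion, using the normal approximation. The counts are hypothetical, and the function name `proportion_interval` is our own. The mammography interval quoted above came from a more elaborate analysis of many studies, so this is an illustration of the idea rather than a reconstruction of that result.

```python
import math


def proportion_interval(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion.

    Uses the normal approximation: estimate +/- z * standard error.
    """
    estimate = successes / n
    standard_error = math.sqrt(estimate * (1 - estimate) / n)
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical sample: 260 of 1,000 people have some characteristic.
low, high = proportion_interval(260, 1000)
print(f"estimate 26%, range {low:.1%} to {high:.1%}")

# The same 26% from a sample 100 times larger gives a much narrower range.
low_big, high_big = proportion_interval(26_000, 100_000)
print(f"estimate 26%, range {low_big:.1%} to {high_big:.1%}")
```

Both samples report 26%, but the larger one pins the truth down more tightly. The estimate alone does not tell us how much we actually know; the range does.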

Data reflect social values

It’s easy to lie with statistics. But it is easier to lie without them.

FREDERICK MOSTELLER

Good data do beat anecdotes. Data are more objective than anecdotes or loud arguments about what might happen. Statistics certainly lies on the factual, scientific, rational side of public discourse. Statistical studies deserve more weight than most other evidence about controversial issues. There is, however, no such thing as perfect objectivity. Statistics shares a social context that influences what we decide to measure and how we measure it.

Suicide rates, for example, vary greatly among nations. It appears that much of the difference in the reported rates is due to social attitudes rather than to actual differences in suicide rates. Counts of suicides come from death certificates. The officials who complete the certificates (details vary depending on the state or nation) can choose to look more or less closely at, for example, drownings and falls that lack witnesses. Where suicide is stigmatized, deaths are more often reported as accidents. Countries that are predominantly Catholic have lower reported suicide rates than others, for example. Japanese culture has a tradition of honorable suicide as a response to shame. This tradition leads to better reporting of suicide in Japan because it reduces the stigma attached to suicide. In other nations, changes in social values may lead to higher suicide counts. It is becoming more common to view depression as a medical problem rather than a weakness of character and suicide as a tragic end to the illness rather than a moral flaw. Families and doctors then become more willing to report suicide as the cause of death.

Social values influence data on matters less sensitive than suicide. The percentage of people who are unemployed in the United States is measured each month by the Bureau of Labor Statistics, using a large and very professionally chosen sample of people across the country. But what does it mean to be “unemployed”? It means that you don’t have a job even though you want a job and have actively looked for work in the last 4 weeks. If you went 4 weeks without seeking work, you are not unemployed; you are “out of the labor force.” This definition of unemployment reflects the value we attach to working. A different definition might give a very different unemployment rate.
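
The effect of the definition is easy to check with a small calculation. The sketch below applies the rule just described to a hypothetical group of 120 people and then relaxes the 4-week search requirement. The `Person` fields and the `require_recent_search` switch are names we made up for the illustration. We divide the number of unemployed by the labor force (employed plus unemployed), which is the usual convention for an unemployment rate.

```python
from dataclasses import dataclass


@dataclass
class Person:
    has_job: bool
    wants_job: bool
    searched_last_4_weeks: bool


def unemployment_rate(people, require_recent_search=True):
    """Unemployed divided by labor force (employed plus unemployed)."""
    employed = sum(p.has_job for p in people)
    unemployed = sum(
        (not p.has_job)
        and p.wants_job
        and (p.searched_last_4_weeks or not require_recent_search)
        for p in people
    )
    return unemployed / (employed + unemployed)


# Hypothetical group: all counts are invented for illustration.
people = (
    [Person(True, True, False)] * 90      # employed
    + [Person(False, True, True)] * 5     # jobless, looked recently
    + [Person(False, True, False)] * 5    # jobless, want work, stopped looking
    + [Person(False, False, False)] * 20  # not seeking work at all
)

strict_rate = unemployment_rate(people)
broad_rate = unemployment_rate(people, require_recent_search=False)
print(f"with the 4-week search rule: {strict_rate:.1%}")
print(f"without it:                  {broad_rate:.1%}")
```

In this made-up group, counting people who want work but have stopped looking nearly doubles the rate, from about 5.3% to 10%. Nothing about the people changed, only the definition.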

Our point is not that you should mistrust the unemployment rate. The definition of “unemployment” has been stable over time so that we can see trends. The definition is reasonably consistent across nations so that we can make international comparisons. The data are produced by professionals free of political interference. The unemployment rate is important and useful information. Our point is that not everything important can be reduced to numbers and that reducing things to numbers is done by people influenced by many pressures, conscious and unconscious.