Beware of searching for significance

Statistical significance ought to mean that you have found an effect that you were looking for. The reasoning behind statistical significance works well if you decide what effect you are seeking, design a study to search for it, and use a test of significance to weigh the evidence you get. In other settings, significance may have little meaning. Here is an example.

EXAMPLE 4 Does your “sign” affect your health?

In June 2015, The Washington Post published an article that reported on researchers who examined the records of patients treated at the Columbia University Medical Center between 1900 and 2000. According to the article, the researchers examined the records of 1.75 million patients, and “using statistical analysis, [the researchers] combed through 1,688 different diseases and found 55 that had a correlation with birth month, including ADHD, reproductive performance, asthma, eyesight and ear infections.” Believers in astrology are vindicated! There is a relationship between your “sign” and your health. Or is there?

Before you base future decisions about your sign and your health on these findings, recall that results significant at the 5% level occur five times in 100 in the long run, even when H₀ is true. When you make dozens of tests at the 5% level, you expect a few of them to be significant by chance alone. Suppose each of the 1,688 diseases was tested at the 5% level. If birth month were unrelated to every one of them, we would still expect about 5% of the 1,688 diseases, or about 84 diseases, to show a significant relationship with birth month. Finding that 55 of the diseases had some relationship with birth month is actually fewer than we would expect if there were no relationship at all. The results are not surprising. Running one test and reaching the α = 0.05 level is reasonably good evidence that you have found something. Running several dozen tests and reaching that level once or twice is not.
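The arithmetic behind this argument can be checked with a short Python sketch. It assumes every test was run at the 5% level (the article does not report the level the researchers used), and the second function further assumes the tests are independent of one another, which real disease data need not satisfy. The function names and the choice of 36 tests to stand for "several dozen" are illustrative only.

```python
# Sketch: how many "significant" results to expect when every null hypothesis is true.

N_DISEASES = 1688   # diseases examined, from Example 4
ALPHA = 0.05        # ASSUMED significance level for each test


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests significant by chance when every H0 is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float) -> float:
    """Chance that at least one test is significant by chance alone.

    ASSUMES the tests are independent of one another.
    """
    return 1 - (1 - alpha) ** n_tests


expected = expected_false_positives(N_DISEASES, ALPHA)
print(f"Expected chance findings among {N_DISEASES} tests: {expected:.1f}")

# "Several dozen" tests, taken here as 36 purely for illustration.
print(f"P(at least one chance finding in 36 tests): {prob_at_least_one(36, ALPHA):.2f}")
```

The first line of output reproduces the figure of about 84 diseases. The second shows that, under the independence assumption, reaching the 5% level at least once in a few dozen tests is the likely outcome rather than a surprising one.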

In Example 4, the researchers tested almost 1,700 diseases and found 55 to have a relationship with birth month. Taking these results and saying that they are evidence of a relationship between your sign and your health is not appropriate. It is bad practice to confuse the roles of exploratory analysis of data (using graphs, tables, and summary statistics, like those discussed in Part II, to find suggestive patterns in data) and formal statistical inference. Finding statistical significance is not surprising if you use exploratory methods to examine many outcomes, choose the largest, and test to see if it is significantly larger than the others.
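A small simulation can illustrate why searching for the most striking outcome and then testing it is misleading. The sketch below is a simplification under stated assumptions: each simulated study measures 20 independent outcomes on 25 subjects, no outcome has any real effect, and the most extreme outcome is compared with zero using a two-sided z test at the 5% level (rather than being compared with the other outcomes). All of the constants are hypothetical and chosen only for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

N_OUTCOMES = 20     # hypothetical number of outcomes examined in each study
SAMPLE_SIZE = 25    # hypothetical number of observations per outcome
N_STUDIES = 1000    # number of simulated studies
Z_CRITICAL = 1.96   # two-sided 5% critical value for a z test


def most_extreme_z() -> float:
    """Simulate one study with no real effects; return the largest |z| among its outcomes."""
    z_scores = []
    for _ in range(N_OUTCOMES):
        # Every outcome has true mean 0 and standard deviation 1, so H0 is true.
        sample = [random.gauss(0, 1) for _ in range(SAMPLE_SIZE)]
        sample_mean = sum(sample) / SAMPLE_SIZE
        # z = sample mean divided by its standard deviation, 1 / sqrt(n).
        z_scores.append(abs(sample_mean) * SAMPLE_SIZE ** 0.5)
    return max(z_scores)


# Count the studies in which the hand-picked outcome passes the 5% test.
hits = sum(most_extreme_z() > Z_CRITICAL for _ in range(N_STUDIES))
rate = hits / N_STUDIES
print(f"Share of studies whose most striking outcome is 'significant': {rate:.2f}")
```

With 20 independent outcomes, the chance that at least one reaches the 5% level when nothing is going on is 1 − 0.95²⁰, or roughly 0.64, so the simulated share should land far above the nominal 5%.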

Searching data for suggestive patterns is certainly legitimate. Exploratory data analysis is an important part of statistics. But the reasoning of formal inference does not apply when your search for a striking effect in the data is successful. The remedy is clear. Once you have a hypothesis, design a study to search specifically for the effect you now think is there. If the result of this study is statistically significant, you have real evidence.

NOW IT’S YOUR TURN

23.2 Take me out to the ball game. A researcher compared a random sample of recently divorced men in a large city with a random sample of men from the same city who had been married at least 10 years and had never been divorced. The researcher measured 122 variables on each man and compared the two samples using 122 separate tests of significance. Only the variable measuring how often the men attended Major League Baseball games with their spouse was significant at the 1% level, with the married men attending a higher proportion of games with their spouse, on average, than the divorced men did while they were married. Is this strong evidence that attendance at Major League Baseball games improves the chance that a man will remain married? Discuss.