Chapter 23: Use and Abuse of Statistical Inference

STATISTICS IN SUMMARY

Chapter Specifics

• Statistical inference is less widely applicable than exploratory analysis of data. Any inference method requires the right setting—in particular, the right design for a random sample or randomized experiment.
• Understanding the meaning of confidence levels and statistical significance helps prevent improper conclusions.
• Increasing the number of observations has a straightforward effect on confidence intervals: the interval gets shorter for the same level of confidence.
• Taking more observations usually decreases the P-value of a test when the truth about the population stays the same, making significance tests harder to interpret than confidence intervals.
• A finding with a small P-value may not be practically interesting if the sample is large, and an important truth about the population may fail to be significant if the sample is small. Avoid depending on fixed significance levels such as 5% to make decisions.
• If a test of significance is thought of as a decision problem, we focus on two hypotheses, H₀ and Hₐ, and give a decision rule for deciding between them based on sample evidence. We can make two types of errors. If we reject H₀ (accept Hₐ) when in fact H₀ is true, this is a Type I error. If we accept H₀ (reject Hₐ) when in fact Hₐ is true, this is a Type II error.
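
The effect of sample size described in the points above can be checked with a short calculation. The following is a minimal sketch in Python, not a method taken from this chapter's examples: the sample proportion 0.52, the null value 0.5, and the function name proportion_summary are all assumptions chosen for illustration. It holds the sample proportion fixed and computes the 95% margin of error and the two-sided P-value for three sample sizes, using the Normal approximation for a proportion.

```python
from math import sqrt
from statistics import NormalDist


def proportion_summary(p_hat, n, p0=0.5, level=0.95):
    """Return (margin of error, two-sided P-value) for a sample proportion.

    p_hat, p0, and level are illustrative assumptions, not values from the text.
    """
    std_normal = NormalDist()
    # Critical value z* for the requested confidence level (about 1.96 for 95%).
    z_star = std_normal.inv_cdf(0.5 + level / 2)
    # Margin of error of the confidence interval p_hat +/- margin.
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    # Test statistic for H0: p = p0, using the standard error under H0.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # Two-sided P-value: probability of a z at least this far from 0.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return margin, p_value


# Same observed proportion (0.52, hypothetical), increasing sample sizes.
results = {n: proportion_summary(0.52, n) for n in (100, 1000, 10000)}
for n, (margin, p_value) in results.items():
    print(f"n = {n:>6}: margin of error = {margin:.4f}, P-value = {p_value:.5f}")
```

Under these assumptions, the margin of error shrinks as n grows, and the same two-point difference from 0.5 goes from far from significant at n = 100 to highly significant at n = 10,000. The size of the effect has not changed; only the amount of data has.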

In Chapters 21 and 22, we introduced the basic reasoning behind statistical estimation and tests of significance. We applied this reasoning to the problem of making inferences about a population proportion and a population mean. In this chapter, we provided some cautions about confidence intervals and tests of significance. Some of these echo the point made in Chapters 1 through 6 that where the data come from matters. Others are based on the behavior of confidence intervals and significance tests. These cautions will help you evaluate studies that report confidence intervals or the results of a test of significance.

CASE STUDY EVALUATED

Look again at the Case Study at the beginning of this chapter. The ProFunds Internet Inv Fund was identified as among the top 1% in returns after looking at more than 10,000 mutual funds.

1. Does it make sense to look at past data, take a fund that is one of the best out of more than 10,000 funds, and ask if this fund was above average? What would you expect the outcome of a test of significance to show about the performance of the fund compared with the average performance of all funds?
2. In fact, the ProFunds Internet Inv Fund was in the bottom 42% out of more than 10,000 mutual funds in 2011. Is this surprising given the fact that a test of significance shows that its average return over the previous three years was significantly higher than the average return for all funds? Discuss.
3. Significance tests work when we form a hypothesis, such as “the ProFunds Internet Inv Fund will have higher-than-average returns over the next three years,” and then collect data. Suppose you did this for a particular fund and found that its average return was significantly higher than the average return for all funds. Discuss whether this provides good evidence that the fund is a sound investment. You may wish to review the material on understanding prediction in Chapter 15. In particular, read pages 344–345.
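
One way to explore these questions is with a small simulation. The sketch below is a hypothetical model, not data on real funds: it assumes 10,000 funds whose yearly returns are all drawn from the same Normal distribution (mean 0, standard deviation 1, in arbitrary units), so that no fund has any real advantage. It then picks the top 1% by three-year average return, as in the Case Study, and looks at where those funds rank in the following year. The seed and all variable names are illustrative choices.

```python
import random
from statistics import mean

random.seed(23)  # arbitrary seed so the run is repeatable

N_FUNDS = 10_000
N_YEARS = 3

# Assumption: every fund has the same true mean return, so past
# differences between funds are due to chance alone.
past = [mean(random.gauss(0, 1) for _ in range(N_YEARS)) for _ in range(N_FUNDS)]
future = [random.gauss(0, 1) for _ in range(N_FUNDS)]

# Select the top 1% of funds by past three-year average return.
top = sorted(range(N_FUNDS), key=lambda i: past[i], reverse=True)[: N_FUNDS // 100]

# Percentile rank of each fund's return in the following year
# (1.0 = best fund that year, values near 0 = worst).
order = sorted(range(N_FUNDS), key=lambda i: future[i])
percentile = [0.0] * N_FUNDS
for rank, i in enumerate(order):
    percentile[i] = (rank + 1) / N_FUNDS

top_past_mean = mean(past[i] for i in top)
top_future_percentile = mean(percentile[i] for i in top)

print(f"Average past return of the selected funds: {top_past_mean:.2f}")
print(f"Their average percentile the next year:    {top_future_percentile:.2f}")
```

In this model the selected funds look far above average in the past data, which is guaranteed by the way they were chosen, yet their average percentile in the following year should be close to 0.5, that is, typical of all funds. A test of significance applied to a fund chosen because it was among the best of 10,000 tells us little about its future performance.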

Online Resources

• There is a StatTutor lesson on Cautions about Significance Tests.
• LearningCurve has good questions to check your understanding of the concepts.