For Exercises 6.93 and 6.94, see page 339.
6.95 Your role on a team.
You are the statistical expert on a team that is planning a study. After you have made a careful presentation of the mechanics of significance testing, one of the team members suggests using α=0.20 for the study because you would be more likely to obtain statistically significant results with this choice. Explain in simple terms why this would not be a good use of statistical methods.
6.95
If α=0.20, then whenever the null hypothesis is actually true we would still reject it, and so make a mistake, 20% of the time.
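The 20% figure can be checked with a small simulation. The sketch below is only an illustration under assumed conditions that are not part of the exercise: a two-sided z test of H0: μ = 0 with known σ = 1, samples of size 25, and 10,000 simulated studies in which the null hypothesis is true. The function name `false_alarm_rate` and all of its parameters are hypothetical.

```python
import random
from statistics import NormalDist


def false_alarm_rate(alpha, n_studies=10_000, n=25, seed=1):
    """Fraction of simulated studies that reject H0: mu = 0 when H0 is true."""
    rng = random.Random(seed)
    # Two-sided critical value for the chosen significance level.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_studies):
        # Data come from N(0, 1), so the null hypothesis is true by construction.
        sample_mean = sum(rng.gauss(0, 1) for _ in range(n)) / n
        z = sample_mean * n ** 0.5  # (xbar - 0) / (sigma / sqrt(n)) with sigma = 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_studies


for alpha in (0.05, 0.20):
    print(f"alpha = {alpha:.2f}: rejected a true H0 in "
          f"{false_alarm_rate(alpha):.1%} of studies")
```

In runs like this the rejection rate should land close to the chosen α, which is the sense in which α=0.20 buys more "significant" results only by accepting more false alarms.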
6.96 What do you know?
A research report described two results that both achieved statistical significance at the 5% level. The P-value for the first is 0.049; for the second it is 0.00002. Do the P-values add any useful information beyond that conveyed by the statement that both results are statistically significant? Write a short paragraph explaining your views on this question.
6.97 Find some journal articles.
Find two journal articles that report results with statistical analyses. For each article, summarize how the results are reported, and write a critique of the presentation. Be sure to include details regarding use of significance testing at a particular level of significance, P-values, and confidence intervals.
6.98 Vitamin C and colds.
In a study of the suggestion that taking vitamin C will prevent colds, 400 subjects are assigned at random to one of two groups. The experimental group takes a vitamin C tablet daily, while the control group takes a placebo. At the end of the experiment, the researchers calculate the difference between the percents of subjects in the two groups who were free of colds. This difference is statistically significant (P=0.03) in favor of the vitamin C group. Can we conclude that vitamin C has a strong effect in preventing colds? Explain your answer.
6.99 How far do rich parents take us?
How much education children get is strongly associated with the wealth and social status of their parents, termed “socioeconomic status,” or SES. The SES of parents, however, has little influence on whether children who have graduated from college continue their education. One study looked at whether college graduates took the graduate admissions tests for business, law, and other graduate programs. The effects of the parents’ SES on taking the LSAT test for law school were “both statistically insignificant and small.”
6.99
(a) If SES had no effect on LSAT, there would still be some small differences due to chance variation. Statistically insignificant means that the effect is small enough that it could just be due to this chance. (b) If the effect were large, it could be of practical importance even though it wasn’t statistically significant; this is especially true if the sample was small.
6.100 Do you agree?
State whether or not you agree with each of the following statements, and provide a short summary of the reasons for your answers.
6.101 Turning insignificance into significance.
Every user of statistics should understand the distinction between statistical significance and practical importance.
A sufficiently large sample will declare very small effects statistically significant. Consider the following randomly generated digits used to form (x,y) observation pairs:
x | 1 | 7 | 9 | 4 | 6 | 4 | 6 | 5 | 0 | 1 |
y | 0 | 0 | 4 | 3 | 7 | 5 | 5 | 2 | 4 | 5 |
Read the 10 ordered pair values into statistical software. We will want to test the significance of the observed correlation. Excel doesn’t provide that capability.
6.101
(a) There seems to be no relationship between x and y. (b) r=0.07565. P-value=0.8355. Yes, there is no significant correlation between x and y. (c) The plot is identical. The correlation is the same, r=0.07565. The P-value is smaller (P-value=0.7513). (d) The correlation has not changed, but the P-value gets smaller as n increases.
n | r | P-value |
---|---|---|
10 | 0.07565 | 0.8355 |
20 | 0.07565 | 0.7513 |
30 | 0.07565 | 0.6911 |
40 | 0.07565 | 0.6427 |
50 | 0.07565 | 0.6016 |
60 | 0.07565 | 0.5657 |
(e) n=680. (f) Even with no relationship and a very small correlation, a big enough sample size can show statistical significance, warning us to make sure the effect is worth our attention rather than just “trusting” the statistics.
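The values in this answer can be reproduced without a statistics package. The sketch below assumes the larger samples are formed by repeating the same 10 pairs (which is what keeps r fixed) and uses the usual t statistic for a correlation, t = r·sqrt((n−2)/(1−r²)) with n−2 degrees of freedom. The helper names are hypothetical, and the t tail area is obtained by Simpson's-rule integration of the density rather than by a library routine.

```python
import math

x = [1, 7, 9, 4, 6, 4, 6, 5, 0, 1]
y = [0, 0, 4, 3, 7, 5, 5, 2, 4, 5]


def pearson_r(xs, ys):
    """Sample correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(u, df):
    """Density of the t distribution, computed on the log scale for large df."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))


def two_sided_p(r, n, steps=2000):
    """Two-sided P-value for H0: correlation = 0 (steps must be even)."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (total * h / 3)  # P(-t < T < t) by Simpson's rule
    return 1 - central_area


def first_significant_n(alpha=0.05, max_copies=200):
    """Smallest n (a multiple of 10) at which the repeated data give P < alpha."""
    for k in range(1, max_copies + 1):
        r = pearson_r(x * k, y * k)
        if two_sided_p(r, 10 * k) < alpha:
            return 10 * k
    return None


r = pearson_r(x, y)
for n in (10, 20, 30, 40, 50, 60):
    print(n, round(r, 5), round(two_sided_p(r, n), 4))
print("first n with P < 0.05:", first_significant_n())
```

Because only n changes between rows, the loop makes the point of part (f) concrete: the P-value is a statement about sample size as much as about the strength of the relationship.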
6.102 Predicting success of trainees.
What distinguishes managerial trainees who eventually become executives from those who, after expensive training, don’t succeed and leave the company? We have abundant data on past trainees: data on their personalities and goals, their college preparation and performance, even their family backgrounds and their hobbies. Statistical software makes it easy to perform dozens of significance tests on these dozens of variables to see which ones best predict later success. From running such tests, we find that future executives are significantly more likely than washouts to have an urban or suburban upbringing and an undergraduate degree in a technical field.
Explain clearly why using these “significant” variables to select future trainees is not wise. Then suggest a follow-up study using this year’s trainees as subjects that should clarify the importance of the variables identified by the first study.
6.103 More than one test.
A P-value based on a single test is misleading if you perform several tests. The Bonferroni procedure gives a significance level for several tests together. Level α then means that if all the null hypotheses are true, the probability is α that any of the tests rejects its null hypothesis.
If you perform two tests and want to use the α=5% significance level, Bonferroni says to require a P-value of 0.05/2=0.025 to declare either one of the tests significant. In general, if you perform k tests and want protection at level α, use α/k as your cutoff for statistical significance for each test.
You perform six tests and obtain individual P-values of 0.376, 0.037, 0.009, 0.007, 0.004, and < 0.001. Which of these are statistically significant using the Bonferroni procedure with α=0.05?
6.103
0.007, 0.004, <0.001.
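The arithmetic behind this answer is a single comparison against α/k = 0.05/6 ≈ 0.0083. A minimal sketch follows; the function name `bonferroni_significant` is hypothetical, and the reported "<0.001" is represented by its upper bound 0.001, which is an assumption made only so the value can be stored as a number.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the per-test cutoff alpha/k and the P-values at or below it."""
    cutoff = alpha / len(p_values)
    return cutoff, [p for p in p_values if p <= cutoff]


# The last entry stands in for the reported "<0.001" (upper bound used).
p_values = [0.376, 0.037, 0.009, 0.007, 0.004, 0.001]

cutoff, significant = bonferroni_significant(p_values)
print(f"cutoff = {cutoff:.4f}")
print("significant:", significant)
```

Note that 0.037 and 0.009 would each pass an unadjusted 5% test but fall short of the Bonferroni cutoff.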
6.104 More than one test.
Refer to the previous exercise. A researcher has performed 12 tests of significance and wants to apply the Bonferroni procedure with α=0.05. The calculated P-values are 0.039, 0.549, 0.003, 0.316, 0.001, 0.006, 0.251, 0.031, 0.778, 0.012, 0.002, and < 0.001. Which of these tests reject their null hypotheses with this procedure?
6.105 More than one test and critical value.
Suppose that you are performing 12 two-sided tests of significance using the Bonferroni procedure with α=0.05.
6.105
(a) z*=2.87. (b) It will get bigger.
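The critical value in (a) follows from splitting the per-test level α/k = 0.05/12 across two tails and inverting the standard Normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `bonferroni_z_star` is hypothetical.

```python
from statistics import NormalDist


def bonferroni_z_star(k, alpha=0.05):
    """Two-sided Normal critical value when each of k tests uses level alpha/k."""
    per_test_alpha = alpha / k
    # Each tail gets half of the per-test level.
    return NormalDist().inv_cdf(1 - per_test_alpha / 2)


for k in (1, 12, 24):
    print(k, round(bonferroni_z_star(k), 3))
```

With k=1 this gives the familiar 1.96, and the value grows as k increases because each test is held to a stricter level.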
6.106 False-positive rate.
With the big data movement, companies are searching through thousands of variables to find patterns in the data to make better predictions on key business variables. For example, Walmart found that sales of strawberry Pop-Tarts increased significantly when the surrounding region was threatened with an impending hurricane.22 Imagine yourself in a business analytics position at a company and that you are trying to find variables that significantly correlate with company sales y. Among the variables you are going to compare y against are 80 variables that are truly unrelated to y. In other words, for each of these 80 variables, the null hypothesis is true that the correlation between y and the variables is 0. You are unaware of this fact. Suppose that the 80 variables are independent of each other and that you perform correlation tests between y and each of the variables at the 5% level of significance.
6.107 False-positives.
Refer to the setting of the previous problem. Define X as the number of false-positives occurring among the 80 correlation tests.
6.107
(a) X~B(80,0.05). (b) 0.9139.
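Binomial probabilities for X~B(80, 0.05) can be computed directly from the probability mass function. The parts of the exercise are not reproduced above, so reading 0.9139 as P(X ≥ 2) is an assumption; it is the tail probability that matches the printed value. The helper name `binom_pmf` is hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 80, 0.05

p_at_least_one = 1 - binom_pmf(0, n, p)
p_at_least_two = p_at_least_one - binom_pmf(1, n, p)

print(f"P(X >= 1) = {p_at_least_one:.4f}")
print(f"P(X >= 2) = {p_at_least_two:.4f}")
```

Either way, false positives are close to certain when 80 true null hypotheses are each tested at the 5% level.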