EXAMPLE 3 Count Buffon’s coin again

Count Buffon tossed a coin 4040 times and got 2048 heads. His sample proportion of heads was

p̂ = 2048/4040 = 0.507

Is the count’s coin balanced? Suppose we seek statistical significance at level 0.05. The hypotheses are

H0: p = 0.5

Ha: p ≠ 0.5

The test of significance works by locating the sample outcome p̂ = 0.507 on the sampling distribution that describes how p̂ would vary if the null hypothesis were true. Figure 23.1 repeats Figure 22.2. It shows that the observed p̂ = 0.507 is not surprisingly far from 0.5 and, therefore, is not good evidence against the hypothesis that the true p is 0.5. The P-value, which is 0.37, just makes this precise.
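The P-value quoted above can be checked with a few lines of Python. This is a minimal sketch, assuming the usual Normal approximation to the sampling distribution of the sample proportion; the function name two_sided_p_value is ours for illustration and is not part of the text.

```python
import math

def two_sided_p_value(p_hat, p0, n):
    """Two-sided P-value for H0: p = p0, using the Normal approximation."""
    # Standard deviation of the sample proportion when H0 is true.
    sd = math.sqrt(p0 * (1 - p0) / n)
    # How many standard deviations the observed proportion lies from p0.
    z = (p_hat - p0) / sd
    # For a standard Normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Buffon's result: sample proportion 0.507 in 4040 tosses.
p_buffon = two_sided_p_value(0.507, 0.5, 4040)
print(round(p_buffon, 2))  # 0.37
```

The sketch uses the rounded value 0.507, as the text does; using the unrounded 2048/4040 changes the result only slightly.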

Figure 23.1 The sampling distribution of the proportion of heads in 4040 tosses of a coin if in fact the coin is balanced, Example 3. Sample proportion 0.507 is not an unusual outcome.

Suppose that Count Buffon got the same result, p̂ = 0.507, from tossing a coin 100,000 times. The sampling distribution of p̂ when the null hypothesis is true always has mean 0.5, but its standard deviation gets smaller as the sample size n gets larger. Figure 23.2 displays the two sampling distributions, for n = 4040 and n = 100,000. The lower curve in this figure is the same Normal curve as in Figure 23.1, drawn on a scale that allows us to show the very tall and narrow curve for n = 100,000. Locating the sample outcome p̂ = 0.507 on the two curves, you see that the same outcome is more or less surprising depending on the size of the sample.
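To see how much the curve narrows, the following sketch computes the standard deviation of the sample proportion for both sample sizes and expresses the gap between 0.507 and 0.5 in standard deviations. It assumes the standard formula for a balanced coin, the square root of p(1 − p)/n with p = 0.5; the names sd_of_p_hat, P0, and P_HAT are illustrative only.

```python
import math

P0 = 0.5       # proportion of heads if the coin is balanced
P_HAT = 0.507  # observed sample proportion

def sd_of_p_hat(p0, n):
    """Standard deviation of the sample proportion in n tosses when p = p0."""
    return math.sqrt(p0 * (1 - p0) / n)

for n in (4040, 100_000):
    sd = sd_of_p_hat(P0, n)
    z = (P_HAT - P0) / sd  # distance from 0.5 measured in standard deviations
    print(f"n = {n}: sd = {sd:.5f}, z = {z:.2f}")
```

With 4040 tosses, 0.507 lies less than one standard deviation from 0.5. With 100,000 tosses, the same 0.507 lies more than four standard deviations away.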

Figure 23.2 The two sampling distributions of the proportion of heads in 4040 and 100,000 tosses of a balanced coin, Example 3. Sample proportion 0.507 is not unusual in 4040 tosses but is very unusual in 100,000 tosses.

The P-values are P = 0.37 for n = 4040 and P = 0.000009 for n = 100,000. Imagine tossing a balanced coin 4040 times repeatedly. You will get a proportion of heads at least as far from one-half as Buffon’s 0.507 in about 37% of your repetitions. If you toss a balanced coin 100,000 times repeatedly, however, you will almost never (nine times in one million repeats) get an outcome as or more unbalanced than this.
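The thought experiment of tossing a balanced coin 4040 times over and over can be imitated on a computer. The sketch below is one way to do it, not a method from the text: it assumes that Python's pseudorandom bits behave like fair coin tosses, and the function name simulate_tail_fraction, the number of repetitions, and the seed are arbitrary choices made for illustration.

```python
import random

def simulate_tail_fraction(n_tosses, observed_gap, repetitions, seed=1):
    """Fraction of repetitions whose proportion of heads is at least
    observed_gap away from 0.5 when the coin is balanced."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    extreme = 0
    for _ in range(repetitions):
        # Each of the n_tosses random bits is one toss (1 = heads).
        heads = bin(rng.getrandbits(n_tosses)).count("1")
        if abs(heads / n_tosses - 0.5) >= observed_gap:
            extreme += 1
    return extreme / repetitions

# Buffon's outcome was 0.507, a gap of 0.007 from one-half.
fraction = simulate_tail_fraction(4040, 0.007, 10_000)
print(fraction)
```

The printed fraction should come out close to 0.37, though the exact value depends on the seed. The same experiment for 100,000 tosses would need millions of repetitions before even a handful of outcomes this extreme appeared.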

The outcome p̂ = 0.507 is not evidence against the hypothesis that the coin is balanced if it comes up in 4040 tosses. It is completely convincing evidence if it comes up in 100,000 tosses.