In this chapter we began our study of inferential statistics, the process of drawing conclusions about a population based on information obtained from a sample. Two common inferential procedures are confidence intervals and hypothesis tests. A confidence interval is used when we want to estimate the value of a parameter, and a hypothesis test is used when we want to test a claim about a parameter.
In Section 8.1 we developed a confidence interval formula to estimate a population proportion. In Section 8.2 we developed a hypothesis test appropriate when testing a claim made about a population proportion. Both of these methods rely on knowing how the sample proportion, \(\hat{p}\), varies in repeated random samples.
The sampling distribution of \(\hat{p}\) describes how the values of the sample proportion will vary in repeated random samples, each of the same size, \(n\).
We say that the sampling distribution of \(\hat{p}\) is approximately normal with mean \(\mu_{\hat{p}} = p\) and standard deviation \(\sigma_{\hat{p}} = \sqrt{ \frac{p(1-p)}{n}}\). This approximation can be used whenever \(n\) and \(p\) are such that the expected number of successes (\(np\)) and the expected number of failures (\(n(1-p)\)) are each at least 10.
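As a quick illustration of the statement above, the following Python sketch computes the mean and standard deviation of the sampling distribution and checks the "at least 10" condition. The function names and the example values (\(p = 0.30\), \(n = 100\) and \(n = 20\)) are hypothetical, chosen only for illustration.

```python
import math

def sampling_distribution(p, n):
    """Return (mean, standard deviation) of the sampling distribution of p-hat."""
    mu = p
    sigma = math.sqrt(p * (1 - p) / n)
    return mu, sigma

def normal_approximation_ok(p, n, threshold=10):
    """Check that expected successes (np) and failures (n(1-p)) are each at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Hypothetical example values, not taken from the chapter.
mu, sigma = sampling_distribution(p=0.30, n=100)
print(f"mean = {mu:.2f}, standard deviation = {sigma:.4f}")
print(normal_approximation_ok(0.30, 100))  # expected successes 30, expected failures 70
print(normal_approximation_ok(0.30, 20))   # expected successes only 6
```

With \(n = 100\) both expected counts exceed 10, so the normal approximation is reasonable; with \(n = 20\) the expected number of successes is only 6, so it is not.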
The general formula for a confidence interval to estimate a parameter is estimate ± margin of error. The estimate is a sample statistic. The margin of error is the product of a critical value and an estimate of the standard deviation of the sampling distribution. The critical value, designated by \(z^{*}\), depends on the chosen confidence level. The most common confidence levels are 90%, 95%, and 99%, and their corresponding critical values are \(z^{*}\) = 1.645, 1.96, and 2.576, respectively.
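The critical values quoted above can be reproduced from the standard normal distribution. The sketch below is one way to do so, assuming Python 3.8 or later for `statistics.NormalDist`; the function name `critical_value` is hypothetical.

```python
from statistics import NormalDist

def critical_value(confidence):
    """Return z* such that the central `confidence` area of the standard normal lies within ±z*."""
    tail = (1 - confidence) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {critical_value(level):.3f}")
```

Rounded to three decimal places, the results agree with the values 1.645, 1.96, and 2.576 given above.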
Calculating the standard deviation of the sampling distribution requires that we know the true population proportion \(p\), since \(\sigma_{\hat{p}} = \sqrt{ \frac{p(1-p)}{n}}\). But \(p\) is exactly what we are trying to determine, so we use \(\hat{p}\) to estimate \(p\). The result is called the standard error of the sample proportion, \(SE_{\hat{p}}= \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\).
Therefore, the formula for a confidence interval to estimate \(p\) is:
estimate ± margin of error
\(\hat{p}\pm z^{*}SE_{\hat{p}}\)
\(\hat{p}\pm z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)
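The formula above translates directly into a few lines of code. The following is a minimal sketch, assuming Python 3.8 or later; the function name and the sample counts (520 successes in a sample of 1000) are hypothetical and used only for illustration.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Return (lower, upper) bounds of the interval p-hat ± z* · SE."""
    p_hat = successes / n                                     # the estimate
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)       # SE of p-hat
    margin_of_error = z_star * standard_error
    return p_hat - margin_of_error, p_hat + margin_of_error

# Hypothetical sample: 520 successes out of 1000.
low, high = proportion_confidence_interval(successes=520, n=1000)
print(f"95% confidence interval: ({low:.3f}, {high:.3f})")
```

For this hypothetical sample, \(\hat{p} = 0.52\) and the margin of error is about 0.031, so the interval runs from roughly 0.489 to 0.551.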
This confidence interval procedure can be used provided that the following conditions are satisfied:
In practice, it is very difficult to guarantee that all these conditions are met. So we must do our best to explain why we believe our sample is good enough to use a confidence interval to estimate a population proportion.
Pollsters generally want to know how small a sample can be used to achieve a specified level of confidence with a prescribed margin of error. The sample size is determined by \(n=\hat{p}(1-\hat{p})\big(\frac{z^{*}}{MoE}\big)^2\), where MoE is the margin of error expressed in decimal form and \(n\) is rounded up to the next whole number. The value of \(\hat{p}\) is either taken from a previous study or else chosen to be 0.50, the value that makes \(\hat{p}(1-\hat{p})\) as large as possible and therefore gives the most conservative sample size.
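The sample size formula can be sketched as follows, assuming Python 3.8 or later. The function name and the example inputs (a 3% margin of error, and a prior estimate of 0.20) are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(margin_of_error, confidence=0.95, p_hat=0.50):
    """Return the smallest whole n satisfying n >= p_hat(1 - p_hat)(z*/MoE)^2."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = p_hat * (1 - p_hat) * (z_star / margin_of_error) ** 2
    return math.ceil(n)  # always round up to the next whole number

# Hypothetical inputs: 95% confidence, margin of error 0.03.
print(required_sample_size(0.03))              # no prior estimate, so p_hat = 0.50
print(required_sample_size(0.03, p_hat=0.20))  # prior study suggested p_hat = 0.20
```

Using \(\hat{p} = 0.50\) always produces the largest required sample, so a prior estimate further from 0.50 reduces the sample size needed for the same margin of error.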
Next we developed a procedure for hypothesis testing. We use this procedure when we are interested in testing a claim about a population. To test the claim, we randomly select a sample from the population and then use the sample results to evaluate it. We want to see whether the results are statistically significant, that is, unlikely to have occurred merely by chance.
Every hypothesis test that we present will include the following five steps:
The null hypothesis, \(H_{0}\), is an established claim about at least one parameter (in this chapter, a claim about one population proportion). The alternative hypothesis, \(H_{a}\), is the claim that we will support if the evidence from the sample makes it unlikely that the null hypothesis is true. The results of the test include both the test statistic and its P-value. A test statistic is a calculation that we use to decide how unusual our sample results are. A P-value is the probability, given that the null hypothesis is true, that we get a test statistic at least as extreme as the one we obtained, merely by chance. The conclusion will include a sentence summarizing the results in the context of the problem.
When testing a claim about one population proportion, \(p\), we use the following five-step procedure:
When testing a claim about a population proportion, it is important to verify that the appropriate conditions to use these methods are satisfied. The conditions required for inference about a proportion are:
Deciding whether or not to reject the null hypothesis can be done using the P-value approach that we used in Section 8.2. This relies on our ability to accurately determine the area of the required tail (or tails) of the standard normal distribution. Because of the widespread availability of statistical software, this approach has become common. We believe that this is the best way to report the strength of the evidence against the null hypothesis because it allows anyone reading our results to see just how likely it would be to obtain a sample at least this extreme merely by chance if the null hypothesis were true.
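To make the tail-area calculation concrete, here is a minimal sketch of a one-proportion z test, assuming Python 3.8 or later. It uses the standard test statistic \(z = \frac{\hat{p} - p_{0}}{\sqrt{p_{0}(1-p_{0})/n}}\), where \(p_{0}\) denotes the proportion stated in the null hypothesis; the symbol \(p_{0}\), the function name, the `alternative` labels, and the example counts (60 successes in 100 trials) are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p_value) for a test of H0: p = p0 against the chosen alternative."""
    p_hat = successes / n
    # Under H0 the standard deviation of p-hat is computed from p0, not p-hat.
    standard_deviation = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_deviation

    std_normal = NormalDist()
    if alternative == "greater":        # Ha: p > p0, upper tail
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":         # Ha: p < p0, lower tail
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":    # Ha: p != p0, both tails
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value

# Hypothetical sample: 60 successes in 100 trials, testing H0: p = 0.5.
z, p_value = one_proportion_z_test(successes=60, n=100, p0=0.5)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

In this hypothetical example the test statistic is about 2, and the two-sided P-value is roughly 0.046: if the null hypothesis were true, a result at least this extreme would occur by chance in fewer than 5% of samples.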
Another method is referred to as the level of significance approach to hypothesis testing. In this method, we decide in advance how strong our evidence must be for us to reject the null hypothesis. The typical levels considered are the 10%, 5%, and 1% levels of significance, and we use \(\alpha\) to indicate the significance level. If the P-value that we find is less than or equal to the given \(\alpha\), we will reject the null hypothesis at that significance level.
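The decision rule just described amounts to a single comparison. The short sketch below applies it at the three typical significance levels; the function name and the example P-value of 0.0455 are hypothetical.

```python
def decide(p_value, alpha=0.05):
    """Apply the level of significance rule: reject H0 when P-value <= alpha."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

# Hypothetical P-value, compared against the three typical significance levels.
p_value = 0.0455
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: {decide(p_value, alpha)}")
```

The same P-value leads to rejection at the 10% and 5% levels but not at the 1% level, which is why the significance level should be chosen before looking at the data.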