OBJECTIVES By the end of this section, I will be able to …
1 Sampling Distribution of the Sample Proportion
The sample mean is not the only statistic that can have a sampling distribution. Every sample statistic has a sampling distribution. One of the most important is the sampling distribution of the sample proportion $\hat{p}$.
Suppose each individual in a population either has or does not have a particular characteristic. If we take a sample of size $n$ from this population, the sample proportion $\hat{p}$ (read “p-hat”) is
$$\hat{p} = \frac{x}{n}$$
where $x$ represents the number of individuals in the sample that have the particular characteristic. We use $\hat{p}$ to estimate the unknown value of the population proportion $p$.
EXAMPLE 10 Calculating the sample proportion
In 2014, the Harvard University Institute of Politics surveyed 3058 18- to 29-year-olds and found 1223 who had a Twitter account. Calculate the sample proportion of 18- to 29-year-olds who had a Twitter account.6
Solution
The survey sample size is $n = 3058$, and the number of successes is $x = 1223$. We calculate
$$\hat{p} = \frac{x}{n} = \frac{1223}{3058} \approx 0.4$$
Thus, the sample proportion of 18- to 29-year-olds who had a Twitter account is 0.4. That is, $\hat{p} = 0.4$, or 40%, of the 18- to 29-year-olds surveyed had a Twitter account.
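The calculation in Example 10 can be sketched in a few lines of Python. The function name `sample_proportion` is an illustrative choice, not notation from this section; the counts are the ones given in the example.

```python
def sample_proportion(x, n):
    """Return p-hat = x / n, the fraction of the sample with the characteristic."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    return x / n

# Example 10: 1223 of 3058 respondents had a Twitter account
p_hat = sample_proportion(1223, 3058)
print(round(p_hat, 4))  # 0.3999, which rounds to 0.4
```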
YOUR TURN #6
Refer to Example 10. The same survey found that 1009 had a Pinterest account. Calculate the sample proportion that had a Pinterest account.
(The solution is shown in Appendix A.)
Like $\bar{x}$, the sample proportion $\hat{p}$ varies from sample to sample. And because we do not know its value before taking the sample, $\hat{p}$ is a random variable. Just as we learned about the Central Limit Theorem for Means in Section 7.1, here in Section 7.2, we develop a Central Limit Theorem for Proportions, where the sampling distribution of the sample proportion $\hat{p}$ becomes approximately normal if the right conditions are satisfied.
The sampling distribution of the sample proportion $\hat{p}$ for a given sample size $n$ consists of the collection of the sample proportions of all possible samples of size $n$ from the population.
In general, the sampling distribution of any particular statistic for a given sample size $n$ consists of the collection of the values of that sample statistic across all possible samples of size $n$.
Recall that in Section 7.1, we found that the mean of the sampling distribution of the sample mean $\bar{x}$ is $\mu_{\bar{x}} = \mu$ and the standard error of the mean is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. We now learn the mean and standard error of the sampling distribution of the sample proportion $\hat{p}$.
Fact 5: Mean of the Sampling Distribution of the Sample Proportion
The mean of the sampling distribution of the sample proportion $\hat{p}$ is the value of the population proportion $p$. This may be denoted as $\mu_{\hat{p}} = p$ and read as “the mean of the sampling distribution of $\hat{p}$ is $p$.”
Fact 5 provides a measure of center for the sampling distribution of the sample proportion $\hat{p}$, and Fact 6 provides a measure of spread.
Fact 6: Standard Deviation of the Sampling Distribution of the Sample Proportion
The standard deviation of the sampling distribution of the sample proportion $\hat{p}$ is $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$, where $p$ is the population proportion, $q = 1 - p$, and $n$ is the sample size. $\sigma_{\hat{p}}$ is called the standard error of the proportion.
EXAMPLE 11 Mean and standard error of $\hat{p}$
The National Institutes of Health reported that color blindness linked to the X chromosome afflicts 8% of men. Suppose we take a random sample of 100 men and let $p$ denote the proportion of men in the population who have color blindness linked to the X chromosome. Find $\mu_{\hat{p}}$ and $\sigma_{\hat{p}}$.
Solution
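First, we note that this is a binomial experiment with $n = 100$ and $p = 0.08$. Fact 5 tells us that $\mu_{\hat{p}} = p = 0.08$; that is, the sampling distribution of the sample proportion $\hat{p}$ has a mean of $0.08$. Fact 6 states that the standard error is
$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.08)(0.92)}{100}} \approx 0.0271$$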
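Facts 5 and 6 translate directly into a short calculation. The sketch below uses Python with only the standard library; the function name `mean_and_standard_error` is an illustrative choice, not notation from this section.

```python
import math

def mean_and_standard_error(p, n):
    """Return (mu, sigma) for the sampling distribution of p-hat.

    mu = p (Fact 5) and sigma = sqrt(p * q / n) with q = 1 - p (Fact 6).
    """
    if not 0 <= p <= 1 or n <= 0:
        raise ValueError("need 0 <= p <= 1 and n > 0")
    q = 1 - p
    return p, math.sqrt(p * q / n)

# Example 11: 8% of men, random samples of 100 men
mu, se = mean_and_standard_error(0.08, 100)
print(mu, round(se, 4))  # 0.08 0.0271
```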
YOUR TURN #7
Refer to Example 11. Suppose we take a random sample of 400 men. Find $\mu_{\hat{p}}$ and $\sigma_{\hat{p}}$.
(The solution is shown in Appendix A.)
What Do These Numbers Mean?
Imagine that we repeatedly draw random samples of 100 men and observe the proportion of men in each sample who have color blindness linked to the X chromosome. Each sample provides us with a value for $\hat{p}$. Eventually, the values for $\hat{p}$, when graphed, form the sampling distribution shown in Figure 13.
Note that $\mu_{\hat{p}} = 0.08$ is located at the balance point of this distribution, which we should expect because the mean proportion of these samples is $p = 0.08$. Each arrow represents 1 standard error, $\sigma_{\hat{p}} \approx 0.0271$. Note that nearly all the sample proportions lie within 3 standard errors of the mean.
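A small simulation can make this repeated-sampling picture concrete. The sketch below is only an illustration under stated assumptions: each man is modeled as an independent draw with probability 0.08 of color blindness, the number of repetitions (10,000) and the random seed are arbitrary choices, and none of the variable names come from the text.

```python
import random
import statistics

random.seed(1)  # arbitrary seed, fixed only so the sketch is repeatable
p, n, reps = 0.08, 100, 10_000  # reps is an assumed number of repeated samples

# Draw many samples of 100 men and record p-hat for each sample
p_hats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

sim_mean = statistics.fmean(p_hats)   # should land near p = 0.08 (Fact 5)
sim_sd = statistics.pstdev(p_hats)    # should land near 0.0271 (Fact 6)

se = (p * (1 - p) / n) ** 0.5
share_within_3 = sum(abs(ph - p) <= 3 * se for ph in p_hats) / reps

print(round(sim_mean, 3), round(sim_sd, 3), share_within_3)
```

With these settings the simulated mean and standard deviation should sit close to 0.08 and 0.0271, and nearly all simulated proportions should fall within 3 standard errors of the mean.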
Unfortunately, the sampling distribution of $\hat{p}$ is not always normal. Recall from Section 7.1 that the approximate normality provided by the Central Limit Theorem for Means was a useful tool for solving probability problems for the sample mean $\bar{x}$. Similarly, in order to solve probability problems for the sample proportion $\hat{p}$, we need a way to achieve approximate normality for the sampling distribution of $\hat{p}$. Conditions for the approximate normality of the sampling distribution of $\hat{p}$ are as follows.
Fact 7: Conditions for Approximate Normality of the Sampling Distribution of the Sample Proportion
The sampling distribution of the sample proportion $\hat{p}$ may be considered approximately normal only if both the following conditions hold: (1) $np \ge 5$ and (2) $n(1 - p) \ge 5$.
Alternatively, the conditions may be expressed as follows: $np \ge 5$ and $nq \ge 5$, where $q = 1 - p$.
The minimum sample size required to produce approximate normality in the sampling distribution of $\hat{p}$ is the larger of either
$$n_1 = \frac{5}{p} \quad \text{or} \quad n_2 = \frac{5}{q}$$
(rounded up to the next integer).
2 Applying the Central Limit Theorem for Proportions
Using information from Facts 5, 6, and 7, we express the Central Limit Theorem for Proportions.
Central Limit Theorem for Proportions
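The sampling distribution of the sample proportion $\hat{p}$ follows an approximately normal distribution with mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$ when both the following conditions are satisfied: (1) $np \ge 5$ and (2) $n(1 - p) \ge 5$.
Alternatively, the conditions may be expressed as follows: $np \ge 5$ and $nq \ge 5$, where $q = 1 - p$.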
EXAMPLE 12 Determining whether the Central Limit Theorem for Proportions applies
In Example 11, we learned that color blindness linked to the X chromosome afflicts 8% of men. Determine whether the sampling distribution of $\hat{p}$, the proportion of men who have color blindness linked to the X chromosome, is approximately normal for samples of size (a) 50 and (b) 100.
Solution
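We need to check both conditions to find whether the sampling distribution of $\hat{p}$ is approximately normal.
(a) We are given that $n = 50$ and $p = 0.08$, so $q = 1 - p = 0.92$. Then
$$np = (50)(0.08) = 4 \quad \text{and} \quad nq = (50)(0.92) = 46$$
Because 4 is not ≥ 5, the first condition is not satisfied. The Central Limit Theorem for Proportions cannot be used. We cannot conclude that the sampling distribution of $\hat{p}$ is approximately normal.
(b) Here, $n = 100$ and $p = 0.08$, so
$$np = (100)(0.08) = 8 \quad \text{and} \quad nq = (100)(0.92) = 92$$
Because both 8 and 92 are ≥ 5, both conditions are satisfied. The Central Limit Theorem for Proportions applies, and we can conclude that the sampling distribution of $\hat{p}$ is approximately normal. From Example 11, we have $\mu_{\hat{p}} = 0.08$ and $\sigma_{\hat{p}} \approx 0.0271$. Thus, the sampling distribution of $\hat{p}$ is approximately normal with $\mu_{\hat{p}} = 0.08$ and $\sigma_{\hat{p}} \approx 0.0271$.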
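The two-condition check used in Example 12 is easy to automate. The following is a minimal sketch; the function name `clt_for_proportions_applies` and the `threshold` parameter are illustrative assumptions, with the threshold of 5 taken from Fact 7.

```python
def clt_for_proportions_applies(n, p, threshold=5):
    """Return True when both n*p and n*q (q = 1 - p) are at least the threshold."""
    q = 1 - p
    return n * p >= threshold and n * q >= threshold

# Example 12: p = 0.08 with samples of size 50 and 100
for n in (50, 100):
    print(n, clt_for_proportions_applies(n, 0.08))
# 50 False
# 100 True
```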
NOW YOU CAN DO
Exercises 7–18.
EXAMPLE 13 Minimum sample size for approximate normality
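According to George Washington University, 4.3% of all vehicles on the road are large trucks. Let $p = 0.043$ represent the population proportion. Find the minimum sample size required to produce a sampling distribution of $\hat{p}$ that is approximately normal.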
Solution
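Using Fact 7, the minimum sample size required is the larger of either
$$n_1 = \frac{5}{p} \quad \text{or} \quad n_2 = \frac{5}{q}$$
Here,
$$n_1 = \frac{5}{0.043} \approx 116.3 \quad \text{and} \quad n_2 = \frac{5}{0.957} \approx 5.2$$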
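The larger of $n_1 \approx 116.3$ and $n_2 \approx 5.2$ is $116.3$. However, it is unclear what “0.3” of a vehicle means. So we round up to the next integer: $n = 117$. Therefore, the minimum sample size required to produce a sampling distribution of $\hat{p}$ that is approximately normal is $n = 117$ vehicles. We confirm that this satisfies our conditions:
We have $np = (117)(0.043) = 5.031 \ge 5$ and $nq = (117)(0.957) = 111.969 \ge 5$.
Because the conditions are met, the Central Limit Theorem for Proportions applies. The sampling distribution of $\hat{p}$ is approximately normal, with $\mu_{\hat{p}} = 0.043$ and $\sigma_{\hat{p}} = \sqrt{(0.043)(0.957)/117} \approx 0.01875$.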
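The minimum-sample-size rule from Fact 7 can be sketched as a short function. The name `min_sample_size` and the `threshold` parameter are illustrative assumptions. Because the division uses floating-point arithmetic, a result that is mathematically a whole number could be rounded up one step too far, so it is worth confirming the conditions afterward, as the example does.

```python
import math

def min_sample_size(p, threshold=5):
    """Return the larger of threshold/p and threshold/q, rounded up (Fact 7)."""
    if not 0 < p < 1:
        raise ValueError("need 0 < p < 1")
    q = 1 - p
    return math.ceil(max(threshold / p, threshold / q))

# Example 13: 4.3% of vehicles are large trucks
n = min_sample_size(0.043)
print(n)  # 117

# Confirm both conditions hold at n, and that n - 1 is too small
print(n * 0.043 >= 5, n * (1 - 0.043) >= 5)  # True True
print((n - 1) * 0.043 >= 5)                  # False
```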
NOW YOU CAN DO
Exercises 19–24
In those cases where we determine that the sampling distribution of $\hat{p}$ is approximately normal, we can standardize using Fact 8 to obtain the standard normal $Z$. Then we may proceed to apply the normal distribution methods we learned in Chapter 6. Fact 8 for proportions is similar to Fact 4 for means.
Fact 8: Standardizing a Normal Sampling Distribution for Proportions
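When the sampling distribution of $\hat{p}$ is approximately normal, we can standardize to produce the standard normal $Z$:
$$Z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}$$
where $p$ is the population proportion of successes, $q = 1 - p$, and $n$ is the sample size.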
EXAMPLE 14 Finding probabilities using the Central Limit Theorem for Proportions
Using the information in Example 13, find the probability that a sample of vehicles will have a proportion of large trucks greater than 9% for samples of size (a) 30 vehicles and (b) 117 vehicles.
Solution
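(a) For samples of size $n = 30$, we have $np = (30)(0.043) = 1.29$, which is not ≥ 5. The Central Limit Theorem for Proportions cannot be used, so we cannot find this probability with the normal distribution methods of this section.
(b) From Example 13(b), the sampling distribution of $\hat{p}$ for $n = 117$ is approximately normal with mean $\mu_{\hat{p}} = 0.043$ and standard deviation $\sigma_{\hat{p}} \approx 0.01875$. We are then faced with a normal probability problem similar to those in Section 6.5. Figure 14 shows the sampling distribution of $\hat{p}$ and the probability we are interested in, $P(\hat{p} > 0.09)$. Using Fact 8, we standardize as follows:
$$Z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{0.09 - 0.043}{0.01875} \approx 2.51$$
Thus, $P(\hat{p} > 0.09) = P(Z > 2.51)$, as shown in Figure 15.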
Again, we can use our normal distribution methods from Section 6.5 because the Central Limit Theorem for Proportions gives us approximate normality.
Following Table 8 in Chapter 6 (page 355), we look up $Z = 2.51$ in the table and subtract this table area (0.9940) from 1 to get the desired tail area. That is,
$$P(\hat{p} > 0.09) = P(Z > 2.51) = 1 - 0.9940 = 0.0060$$
So the probability that the sample proportion of large trucks will exceed 0.09 is 0.006.
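As a cross-check on the table lookup, the same tail probability can be computed with Python's standard-library `statistics.NormalDist`. This is a sketch under the assumptions of Example 14; the function name `prob_p_hat_exceeds` is hypothetical, and it returns `None` when the conditions of the Central Limit Theorem for Proportions are not met rather than reporting a probability.

```python
import math
from statistics import NormalDist

def prob_p_hat_exceeds(cutoff, p, n, threshold=5):
    """Approximate P(p-hat > cutoff) via Fact 8, or None if the conditions fail."""
    q = 1 - p
    if n * p < threshold or n * q < threshold:
        return None  # normal approximation not justified
    se = math.sqrt(p * q / n)        # standard error of the proportion
    z = (cutoff - p) / se            # standardize (Fact 8)
    return 1 - NormalDist().cdf(z)   # upper-tail area of the standard normal

print(prob_p_hat_exceeds(0.09, 0.043, 30))              # None
print(round(prob_p_hat_exceeds(0.09, 0.043, 117), 3))   # 0.006
```

The unrounded Z-value is used here, so later decimal places can differ slightly from a table lookup at $Z = 2.51$.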
NOW YOU CAN DO
Exercises 25–32.
YOUR TURN #8
Using the information in Example 13, find the probability that a sample of vehicles will have a proportion of large trucks smaller than 4% for a sample of size vehicles.
(The solution is shown in Appendix A.)
EXAMPLE 15 Finding percentiles using the Central Limit Theorem for Proportions
Using the information from Example 13, find the 2.5th and the 97.5th percentiles of sample proportions for $n = 117$.
Solution
We use the inverse normal function on the TI-83/84, just as we did in Example 8 (page 406). The results, using mean $\mu_{\hat{p}} = 0.043$ and standard deviation $\sigma_{\hat{p}} \approx 0.01875$, are shown in Figure 16. We have the 2.5th percentile $\approx 0.006$ and the 97.5th percentile $\approx 0.080$.
The middle 95% of sample proportions lie between the 2.5th and the 97.5th percentiles, as shown in Figure 17.
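For readers without a TI-83/84, the inverse normal step can be sketched with `statistics.NormalDist.inv_cdf` from the Python standard library. The variable names are illustrative, and the values of $p$ and $n$ are those of Example 13.

```python
import math
from statistics import NormalDist

p, n = 0.043, 117
se = math.sqrt(p * (1 - p) / n)  # standard error, about 0.01875

# Approximate sampling distribution of p-hat under the CLT for Proportions
sampling_dist = NormalDist(mu=p, sigma=se)

lower = sampling_dist.inv_cdf(0.025)  # 2.5th percentile
upper = sampling_dist.inv_cdf(0.975)  # 97.5th percentile
print(round(lower, 3), round(upper, 3))  # 0.006 0.08
```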
NOW YOU CAN DO
Exercises 33–38.
YOUR TURN #9
Using the information from Example 13, find the two percentiles that contain the middle 90% of sample proportions.
(The solution is shown in Appendix A.)
Developing Your Statistical Sense
Pitfalls of Using an Approximation
Let's use symmetry and the results from Example 15 to find the 1st percentile of the sampling distribution of $\hat{p}$ for $n = 117$. By symmetry, the 1st percentile will be the same distance below the mean that the 99th percentile is above the mean. The 99th percentile, 0.0867, lies $0.0867 - 0.043 = 0.0437$ above the mean. Therefore, the 1st percentile lies 0.0437 below the mean:
$$0.043 - 0.0437 = -0.0007$$
However, this value of −0.0007 is negative and cannot represent a sample proportion. This negative result is obtained because the normality of the sampling distribution of $\hat{p}$ is only approximate and not exact.
Note: What can we do to estimate the 1st percentile? One way is to use simulation. Generate samples of size $n = 117$ from the population of the original survey respondents, record the sample proportion from each, and take the 1st percentile of those sample proportions. Proceeding in this manner, we estimate the 1st percentile as 0.0128.
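The contrast between the normal approximation and a simulation can be sketched as follows. This is an illustration only, not the procedure used to obtain the estimate quoted above: it assumes each vehicle is an independent draw with probability 0.043 of being a large truck, uses an arbitrary seed and 20,000 simulated samples, and takes the percentile by simple ranking. The exact figure depends on how the samples are generated and on the percentile convention, so it need not reproduce the quoted value.

```python
import math
import random
from statistics import NormalDist

p, n = 0.043, 117
se = math.sqrt(p * (1 - p) / n)

# Normal approximation: the 1st percentile comes out slightly below zero
normal_first = NormalDist(mu=p, sigma=se).inv_cdf(0.01)

# Simulation: sample proportions can never be negative
random.seed(7)   # arbitrary seed, fixed only for repeatability
reps = 20_000    # assumed number of simulated samples
p_hats = sorted(
    sum(random.random() < p for _ in range(n)) / n for _ in range(reps)
)
sim_first = p_hats[int(0.01 * reps)]  # value ranked at the 1% position

print(normal_first < 0, sim_first >= 0)  # True True
```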