Statistical inference uses sample data to draw conclusions about the entire population. Because good samples are chosen by random sampling, statistics computed from these samples are random variables. We can describe the behavior of a sample statistic by a probability model that answers the question “What would happen if we took random samples of the same size many times?” Here is an example that will lead us toward the probability ideas most important for statistical inference.
The proportion of uninsured drivers varies widely among states, as do the enforcement policies. Maine and Massachusetts have the lowest rates in the nation, and both require drivers to show proof of insurance when they register a vehicle. Ohio is a passive enforcement state: although it requires drivers to indicate that they have insurance when they register a vehicle, no proof of insurance is required.3
The proportion p of Ohio drivers without insurance is a parameter describing the population of Ohio drivers. To estimate p, we take a simple random sample (SRS) of 150 Ohio drivers and find that 27 do not have insurance. The sample proportion of these subjects without insurance is p̂ = 27/150 = 0.18. It seems reasonable to use the sample result p̂ = 0.18 to estimate the unknown p. An SRS should fairly represent the population, so the proportion p̂ of the sample should be somewhere near the proportion p of the population. Of course, we don’t expect p̂ to be exactly equal to p. We realize that if we choose another SRS, the sample will probably produce a different p̂.
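The arithmetic above, and the claim that another SRS would probably give a different sample proportion, can be sketched in a few lines of Python. This is only an illustration: the true p is unknown at this point in the discussion, so the value `ASSUMED_P = 0.16` is a hypothetical stand-in, and independent random draws are used as an approximation to an SRS from a very large population. The helper name `sample_proportion` is likewise invented for this sketch.

```python
import random

# Sample result from the text: 27 uninsured drivers in an SRS of 150.
uninsured, n = 27, 150
p_hat = uninsured / n
print("observed sample proportion:", p_hat)

# ASSUMPTION: the population proportion is not known here; 0.16 is a
# hypothetical value used only so that we can simulate further samples.
ASSUMED_P = 0.16
rng = random.Random(1)  # fixed seed so the run is repeatable


def sample_proportion(p, size, rng):
    """Proportion of successes among `size` independent draws,
    each a success with probability p (approximates an SRS from
    a very large population)."""
    successes = sum(rng.random() < p for _ in range(size))
    return successes / size


# Five more samples of the same size: the proportions differ from
# sample to sample even though the population does not change.
repeated = [sample_proportion(ASSUMED_P, n, rng) for _ in range(5)]
print("proportions from five more samples:", repeated)
```

Each run with a different seed gives a different list, which is exactly the sample-to-sample variation the paragraph describes.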
If p̂ is rarely exactly right and varies from sample to sample, why is it nonetheless a reasonable estimate of the population proportion p? Here is one answer: if we continue to take larger and larger samples, the statistic p̂ is guaranteed to get closer and closer to the parameter p. We have the comfort of knowing that if we can afford to continue sampling more subjects, eventually we will estimate the proportion of uninsured Ohio drivers very accurately. This is a special case of a more general mathematical result called the law of large numbers that we will encounter in Chapter 19.
Draw observations at random from any population with population proportion p. As the number of observations drawn increases, the sample proportion p̂ gets closer and closer to the population proportion p.
You should recognize the similarity between the law of large numbers for proportions and the idea of probability (page 260). The idea of probability states that in the long run, the proportion of outcomes taking any value gets close to the probability of that value. Suppose the value of the outcome corresponds to obtaining a success when sampling a single individual from the population. Then the probability of the value of the outcome is p, the proportion of successes in the population. The fact that p̂ gets closer and closer to the population proportion p is just the idea of probability rephrased in our new terminology. Because of the importance of this concept, here is an example similar to Example 11.2 (page 260), but in the context of inference.
Extensive studies conducted throughout the year show that the proportion of uninsured drivers in Ohio is very close to 16%.4 Because of this, we will take p = 0.16 as the value of the population proportion, or the true value of the parameter. Figure 15.1 shows how the sample proportion p̂ of an SRS drawn from the population of drivers changes as we add more subjects to our sample.
The first subject in the sample had insurance, so the sample proportion after sampling one subject is zero, and the line in Figure 15.1 starts there. The second subject also had insurance, so the sample proportion after sampling two subjects remains at zero, as shown in Figure 15.1. The third subject did not have insurance, so after sampling three subjects the sample proportion increases to p̂ = 1/3 ≈ 0.33, the third point on the graph. At first, the graph shows that the sample proportion changes considerably as we take more observations. Eventually, however, the sample proportion gets close to the population proportion p = 0.16 and settles down to that value.
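The first three points of the graph can be reproduced directly. The short sketch below codes each subject as 1 (no insurance) or 0 (insured), a coding chosen here for illustration, and computes the running sample proportion after each subject is added.

```python
from itertools import accumulate

# 1 = no insurance (a "success"), 0 = insured.
# These are the first three subjects described in the text.
outcomes = [0, 0, 1]

# Running count of uninsured drivers after 1, 2, and 3 subjects.
running_counts = accumulate(outcomes)

# Running sample proportion: count so far divided by subjects so far.
running_p_hat = [count / k for k, count in enumerate(running_counts, start=1)]
print(running_p_hat)  # 0, 0, then 1/3
```

Appending more 0s and 1s to `outcomes` extends the path in the same way that adding subjects extends the line in the figure.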
If we started over, again choosing people at random from the population, we would get a different path from left to right in Figure 15.1. The law of large numbers says that whatever path we get will always settle down to 0.16 as we draw more and more people.
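A small simulation makes this concrete. The sketch below is not the procedure used to draw the figure; it simply generates independent draws with success probability p = 0.16 (an approximation to sampling from a very large population) and records the running sample proportion. The function name `running_proportions`, the sample size `N`, and the seeds are all choices made for this illustration.

```python
import random


def running_proportions(p, n, seed):
    """Return the sample proportion after each of n independent draws,
    where each draw is a success with probability p."""
    rng = random.Random(seed)
    successes = 0
    path = []
    for k in range(1, n + 1):
        successes += rng.random() < p  # True counts as 1, False as 0
        path.append(successes / k)
    return path


P = 0.16   # population proportion taken from the example
N = 20000  # ASSUMPTION: an arbitrary, fairly large number of draws

# Three independent "start overs", one per seed.
paths = [running_proportions(P, N, seed) for seed in (1, 2, 3)]

# Show each path at a few sample sizes: early values differ a lot,
# later values cluster near 0.16.
for path in paths:
    print([round(path[k - 1], 3) for k in (10, 100, 1000, N)])
```

The three printed rows typically disagree noticeably at 10 and 100 draws and agree closely by the final column, which is the settling-down behavior described above.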
We know that if we continue to sample subjects, eventually we will estimate the population proportion very accurately. In practice, we take a sample of a fixed size. The key idea that will allow us to calculate the accuracy of an estimate for a fixed sample size is presented in the next section.
Prayer Among the Millennials. Members of the Millennial generation (so called because they were born after 1980 and began to come of age around the year 2000) are less religiously active than older Americans. One of the questions in the General Social Survey in 2012 was “How often does the respondent pray?” Among the 377 respondents in the survey between 18 and 30 years of age, 225 prayed at least once a week.5
The Law of Large Numbers for Proportions Made Visible. You are going to simulate sampling from a population with proportion of successes p = 0.16 as in Example 15.3. The law of large numbers says that the proportion from a sample gets closer and closer to p = 0.16 as the sample increases in size.