The law of large numbers assures us that if we measure enough subjects, the statistic p̂ will eventually get very close to the unknown parameter p. But what can we say about estimating p by p̂ from a sample of 150 subjects in Example 15.2? To answer this question, put this one sample in the context of all such samples by asking, “What would happen if we took many samples of 150 subjects from this population?” Here’s how to answer this question:
In practice it is too expensive to take many samples from a large population such as all Ohio drivers or all adults in the United States. But we can imitate taking many samples by using software. Using software to imitate chance behavior is called simulation.
Extensive studies in Ohio have found that the proportion of uninsured drivers is approximately 16%. This is the population proportion.
Figure 15.2 illustrates the process of choosing many samples and finding the sample proportion p̂ of uninsured drivers for each one. Follow the flow of the figure from the population at the left, to choosing an SRS and finding the p̂ for this sample, to collecting together the p̂’s from many samples. The first sample has p̂ = 0.120. The second sample contains a different 150 people, with p̂ = 0.147, and so on. The histogram at the right of the figure shows the distribution of the values of p̂ from 2000 separate SRSs of size 150. This histogram displays the sampling distribution of the statistic p̂.
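The process in Figure 15.2 can be imitated with a few lines of software. The following is a minimal sketch, not the procedure used to produce the figure. It assumes the population is so large that each of the 150 subjects can be treated as an independent draw with a 0.16 chance of being uninsured. The seed and the names `sample_proportion` and `p_hats` are illustrative choices.

```python
import random
import statistics

P_UNINSURED = 0.16   # population proportion p stated in the text
SAMPLE_SIZE = 150    # subjects per sample
NUM_SAMPLES = 2000   # number of simulated samples, as in Figure 15.2


def sample_proportion(rng, n, p):
    """Simulate one sample of n subjects and return the proportion uninsured.

    Assumption: the population is large enough that the n draws are
    effectively independent, each uninsured with probability p.
    """
    uninsured = sum(1 for _ in range(n) if rng.random() < p)
    return uninsured / n


rng = random.Random(2024)  # arbitrary seed so the run is repeatable
p_hats = [sample_proportion(rng, SAMPLE_SIZE, P_UNINSURED)
          for _ in range(NUM_SAMPLES)]

# Summarize the simulated values of the sample proportion.
print("smallest:", min(p_hats))
print("largest: ", max(p_hats))
print("mean:    ", round(statistics.mean(p_hats), 4))
print("std dev: ", round(statistics.stdev(p_hats), 4))
```

A histogram of `p_hats` plays the role of the histogram at the right of the figure. A different seed gives different individual values but a similar overall pattern.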
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
Strictly speaking, the sampling distribution is the ideal pattern that would emerge if we looked at all possible samples of size 150 from our population. A histogram obtained from a fixed number of trials, like the 2000 trials in Figure 15.2, is only an approximation to the sampling distribution. One of the uses of probability theory in statistics is to obtain sampling distributions without simulation. The interpretation of a sampling distribution is the same, however, whether we obtain it by simulation or by the mathematics of probability.
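As a hedged illustration of obtaining a sampling distribution from probability rather than simulation, the sketch below computes the distribution of the sample proportion directly. It relies on an assumption the passage does not state: that the population is very large, so the count of uninsured drivers in a sample of 150 can be modeled as binomial with n = 150 and p = 0.16. The variable names are illustrative.

```python
from math import comb, sqrt

n, p = 150, 0.16  # sample size and population proportion from the text

# Assumption: the count of uninsured drivers in the sample is binomial(n, p).
# The sample proportion k/n then has probability C(n, k) p^k (1 - p)^(n - k).
pmf = {k / n: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

total = sum(pmf.values())
mean = sum(value * prob for value, prob in pmf.items())
sd = sqrt(sum((value - mean) ** 2 * prob for value, prob in pmf.items()))

print("total probability:", round(total, 6))
print("mean of the sample proportion:", round(mean, 4))
print("std dev of the sample proportion:", round(sd, 4))
```

Under this model the mean of the distribution equals p, and the standard deviation equals the square root of p(1 - p)/n, about 0.03 here. A simulated histogram based on 2000 samples should resemble this ideal pattern without matching it exactly.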
We can use the tools of data analysis to describe any distribution. Let’s apply those tools to Figure 15.2. What can we say about the shape, center, and spread/variability of this distribution?
Although these results describe just one simulation of a sampling distribution, they reflect facts that are true whenever we use random sampling.
Generating a Sampling Distribution. Let’s illustrate the idea of a sampling distribution in the case of a very small sample from a very small population. The population is the sex of 10 students in a class:
Student | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Sex | F | F | M | M | F | M | M | M | F | M |
The parameter of interest is the proportion of females p in this population. The sample is an SRS of size n = 4 drawn from the population. Because the students are labeled 0 to 9, a single random digit from Table B chooses one student for the sample.
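Because this population is so small, every possible sample of size 4 can be listed, giving the sampling distribution exactly rather than by simulation. The sketch below is one way to do that, assuming an SRS never includes the same student twice. It enumerates the samples instead of reading digits from Table B, and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Sex of students 0 through 9, copied from the table in the text.
population = ["F", "F", "M", "M", "F", "M", "M", "M", "F", "M"]
n = 4  # sample size

# Population parameter: the proportion of females.
p = Fraction(population.count("F"), len(population))

# Tally the sample proportion of females over every possible sample of size n.
counts = Counter()
for labels in combinations(range(len(population)), n):
    females = sum(population[i] == "F" for i in labels)
    counts[Fraction(females, n)] += 1

total = sum(counts.values())
distribution = {p_hat: Fraction(c, total) for p_hat, c in sorted(counts.items())}
mean_p_hat = sum(p_hat * prob for p_hat, prob in distribution.items())

print("parameter p:", p)
print("number of possible samples:", total)
for p_hat, prob in distribution.items():
    print(f"sample proportion {float(p_hat):.2f}: probability {prob}")
print("mean of the sampling distribution:", mean_p_hat)
```

There are 210 possible samples. The sample proportion of females takes the values 0, 0.25, 0.5, 0.75, and 1 in 15, 80, 90, 24, and 1 of them, respectively, and the mean of this sampling distribution is 2/5, the same as the population proportion p.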