When you complete this section, you will be able to:
• Describe a level C confidence interval for a population parameter in terms of an estimate and its margin of error.
• Construct a level C confidence interval for μ from a simple random sample (SRS) of size n from a large population having known standard deviation σ.
• Explain how the margin of error changes with a change in the confidence level C.
• Determine the sample size needed to obtain a specified margin of error for a level C confidence interval for μ.
• Identify situations where inference about μ based on the confidence interval may be suspect.
The SAT is a widely used measure of readiness for college study. It consists of two sections, one for mathematical reasoning ability (SATM), and one for reading and writing ability (SATV). Possible scores on each section range from 200 to 800, for a total range of 400 to 1600. Since 1995, section scores have been recentered so that the mean is approximately 500 with a standard deviation of 100 in a large “standardized group.” This scale has been maintained so that scores have a constant interpretation.
EXAMPLE 6.3
Estimating the mean SATM score for seniors in California. Suppose that you want to estimate the mean SATM score for the 485,264 high school seniors in California.2 You know better than to trust data from the students who choose to take the SAT. Only about 38% of California students typically take the SAT. These self-selected students are planning to attend college and are not representative of all California seniors. At considerable effort and expense, you give the test to an SRS of 500 California high school seniors. The mean score for your sample is x̄ = 495. What can you say about the mean score μ in the population of all 485,264 seniors?
The sample mean x̄ is the natural estimator of the unknown population mean μ. We know that x̄ is an unbiased estimator of μ. More important, the law of large numbers says that the sample mean must approach the population mean as the size of the sample grows. The value x̄ = 495, therefore, appears to be a reasonable estimate of the mean score μ that all 485,264 students would achieve if they took the test.
But how reliable is this estimate? A second sample of 500 students would surely not give a sample mean of 495 again. Unbiasedness says only that there is no systematic tendency to underestimate or overestimate the truth. Could we plausibly get a sample mean of 485 and a sample mean of 520 in repeated samples? An estimate without an indication of its variability is of little value.
Statistical confidence
The unbiasedness of an estimator concerns the center of its sampling distribution, but questions about variation are answered by looking at its spread. The central limit theorem says that if the entire population of SATM scores has mean μ and standard deviation σ, then in repeated SRSs of size 500, the sample mean x̄ is approximately N(μ, σ/√500). Let us suppose that we know that the standard deviation σ of SATM scores in our California population is σ = 100. (We will see in the next chapter how to proceed when σ is not known. For now, we are more interested in statistical reasoning than in details of realistic methods.) This means that in repeated sampling the sample mean x̄ has an approximately Normal distribution centered at the unknown population mean μ and a standard deviation of
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{500}} \approx 4.5$$
Now we are ready to proceed. Consider this line of thought, which is illustrated in Figure 6.2:
• The 68–95–99.7 rule says that the probability is about 0.95 that x̄ will be within 9 points (that is, two standard deviations of x̄) of the population mean score μ.
• To say that x̄ lies within 9 points of μ is the same as saying that μ is within 9 points of x̄.
• So about 95% of all samples will contain the true μ in the interval from x̄ − 9 to x̄ + 9.
We have simply restated a fact about the sampling distribution of x̄. The language of statistical inference uses this fact about what would happen in the long run to express our confidence in the results of any one sample. Our sample gave x̄ = 495. We say that we are 95% confident that the unknown mean score for all California seniors lies between
$$\bar{x} - 9 = 495 - 9 = 486$$
and
$$\bar{x} + 9 = 495 + 9 = 504$$
Be sure you understand the grounds for our confidence. There are only two possibilities for our SRS:
1. The interval between 486 and 504 contains the true μ.
2. The interval between 486 and 504 does not contain the true μ.
We cannot know whether our sample is one of the 95% for which the interval contains μ or one of the unlucky 5% for which it does not contain μ. The statement that we are 95% confident is shorthand for saying, “We arrived at these numbers by a method that gives correct results 95% of the time.”
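The arithmetic behind this interval can be checked with a short Python sketch. It uses only the values stated above (σ = 100, n = 500, x̄ = 495); the variable names are illustrative choices, not notation from the text.

```python
import math

sigma = 100   # population standard deviation of SATM scores (assumed known)
n = 500       # size of the SRS
xbar = 495    # observed sample mean

# Standard deviation of the sample mean: sigma / sqrt(n)
sd_xbar = sigma / math.sqrt(n)

# The 68-95-99.7 rule uses 2 standard deviations for about 95% probability
margin = 2 * sd_xbar
low, high = xbar - margin, xbar + margin

print(f"standard deviation of x-bar: {sd_xbar:.2f}")
print(f"margin of error:             {margin:.2f}")
print(f"interval:                    ({low:.0f}, {high:.0f})")
```

The unrounded margin is a little under 9 points; rounding to whole points gives the interval from 486 to 504 used in the text.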
USE YOUR KNOWLEDGE
6.1 How much do you spend on lunch? The average amount you spend on a lunch during the week is not known. Based on past experience, you are willing to assume that the standard deviation is $2.10. If you take a random sample of 28 lunches, what is the value of the standard deviation of x̄?
6.2 Applying the 68–95–99.7 rule. In the setting of the previous exercise, the 68–95–99.7 rule says that the probability is about 0.95 that x̄ is within $________ of the population mean μ. Fill in the blank.
6.3 Constructing a 95% confidence interval. In the setting of the previous two exercises, about 95% of all samples will capture the true mean in the interval x̄ plus or minus $________. Fill in the blank.
Confidence intervals
In the setting of Example 6.3, the interval of numbers between the values x̄ ± 9 is called a 95% confidence interval for μ. Like most confidence intervals we will discuss, this one has the form
estimate ± margin of error
The estimate (x̄ = 495 in this case) is our guess for the value of the unknown parameter. The margin of error (9 here) reflects how accurate we believe our guess is, based on the variability of the estimate, and how confident we are that the procedure will produce an interval that will contain the true population mean μ.
Figure 6.3 illustrates the behavior of 95% confidence intervals in repeated sampling from a Normal distribution with mean μ. The center of each interval (marked by a dot) is at x̄ and varies from sample to sample. The sampling distribution of x̄ (also Normal) appears at the top of the figure to show the long-term pattern of this variation.
The 95% confidence intervals, x̄ ± margin of error, from 25 SRSs appear below the sampling distribution. The arrows on either side of the dot (x̄) span the confidence interval. All except one of the 25 intervals contain the true value of μ. In those intervals that contain μ, sometimes μ is near the middle of the interval and sometimes it is closer to one of the ends. This again reflects the variation of x̄. In practice, we don’t know the value of μ, but we have a method such that, in a very large number of samples, 95% of the confidence intervals will contain μ.
We can construct confidence intervals for many different parameters based on a variety of designs for data collection. We will learn the details of a number of these in later chapters. Two important things about a confidence interval are common to all settings:
1. It is an interval of the form (a, b), where a and b are numbers computed from the sample data.
2. It has a property called a confidence level that gives the probability of producing an interval that contains the unknown parameter.
Users can choose the confidence level, but 95% is the standard for most situations. Occasionally, 90% or 99% is used. We use C to stand for the confidence level in decimal form. For example, a 95% confidence level corresponds to C = 0.95.
CONFIDENCE INTERVAL
A level C confidence interval for a parameter is an interval computed from sample data by a method that has probability C of producing an interval containing the true value of the parameter.
With the Confidence Interval applet, you can construct diagrams similar to the one displayed in Figure 6.3. The only difference is that the applet displays the Normal population distribution at the top rather than the Normal sampling distribution of x̄. You choose the confidence level C, the sample size n, and whether you want to generate 1 or 25 samples at a time. A running total (and percent) of the number of intervals that contain μ is displayed so you can consider a larger number of samples.
When generating single samples, the data for the latest SRS are shown below the confidence interval. The spread in these data reflects the spread of the population distribution. This spread is assumed known, and it does not change with sample size. What does change, as you vary n, is the margin of error, since it reflects the uncertainty in the estimate of μ. As you increase n, you’ll find that the span of the interval gets smaller.
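The long-run behavior described here can also be imitated in code. The following is a minimal simulation sketch, not the applet itself: it draws many SRSs from a Normal population and counts how often the interval x̄ ± z*σ/√n contains μ. The population values (μ = 500, σ = 100), the number of samples, and the function name `coverage` are assumptions chosen for illustration.

```python
import math
import random

def coverage(mu, sigma, n, z_star, num_samples, seed=0):
    """Fraction of simulated intervals x-bar +/- z* sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    margin = z_star * sigma / math.sqrt(n)    # same margin for every sample (sigma known)
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / num_samples

# Hypothetical population: mean 500, standard deviation 100, samples of size 20
rate = coverage(mu=500, sigma=100, n=20, z_star=1.960, num_samples=10_000)
print(f"proportion of intervals containing mu: {rate:.3f}")
```

With many samples the proportion should settle near the confidence level, while any single interval either contains μ or does not.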
USE YOUR KNOWLEDGE
6.4 Generating a single confidence interval. Using the default settings in the Confidence Interval applet (95% confidence level and n = 20), click “Sample” to choose an SRS and display its confidence interval.
(a) Is the spread in the data, shown as yellow dots below the confidence interval, larger than the span of the confidence interval? Explain why this would typically be the case.
(b) For the same data set, you can compare the span of the confidence interval for different values of C by sliding the confidence level to a new value. For the SRS you generated in part (a), what happens to the span of the interval when you move C to 99%? What about 90%? Describe the relationship you find between the confidence level C and the span of the confidence interval.
6.5 80% confidence intervals. The idea of an 80% confidence interval is that the interval captures the true parameter value in 80% of all samples. That’s not high enough confidence for practical use, but 80% hits and 20% misses make it easy to see how a confidence interval behaves in repeated samples from the same population.
(a) Set the confidence level in the Confidence Interval applet to 80%. Click “Sample 25” to choose 25 SRSs and display their confidence intervals. How many of the 25 intervals contain the true mean μ? What proportion contain the true mean?
(b) We can’t determine whether a new SRS will result in an interval that contains μ or not. The confidence level only tells us what percent will contain μ in the long run. Click “Sample 25” again to get the confidence intervals from 50 SRSs. What proportion hit? Keep clicking “Sample 25” and record the proportion of hits among 100, 200, 300, 400, and 500 SRSs. As the number of samples increases, we expect the percent of captures to get closer to the confidence level, 80%. Do you find this pattern in your results?
Confidence interval for a population mean
We now construct a level C confidence interval for the mean μ of a population when the data are an SRS of size n. The construction is based on the sampling distribution of the sample mean x̄. This distribution is exactly N(μ, σ/√n) when the population has the N(μ, σ) distribution. The central limit theorem says that this same sampling distribution is approximately correct for large samples whenever the population mean and standard deviation are μ and σ. For now, we will assume we are in one of these two situations. We discuss what we mean by “large sample” after we briefly study these intervals.
Our construction of a 95% confidence interval for the mean SATM score began by noting that any Normal distribution has probability about 0.95 within ±2 standard deviations of its mean. To construct a level C confidence interval we first catch the central C area under a Normal curve. That is, we must find the number z* such that any Normal distribution has probability C within ±z* standard deviations of its mean.
Because all Normal distributions have the same standardized form, we can obtain everything we need from the standard Normal curve. Figure 6.4 shows how C and z* are related. Values of z* for many choices of C appear in the row labeled z* at the bottom of Table D. Here are the most important entries from that row:
z* | 1.645 | 1.960 | 2.576 |
C | 90% | 95% | 99% |
Notice that for 95% confidence the value 2 obtained from the 68–95–99.7 rule is replaced with the more precise 1.96.
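If Table D is not at hand, z* can be computed directly. Because the central area is C, the area to the left of z* is (1 + C)/2, so z* is the corresponding quantile of the standard Normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `z_star` is an illustrative choice.

```python
from statistics import NormalDist

def z_star(C):
    """Critical value z* with central area C under the standard Normal curve."""
    # Area C in the middle leaves (1 - C)/2 in each tail,
    # so the area to the left of z* is (1 + C)/2.
    return NormalDist().inv_cdf((1 + C) / 2)

for C in (0.90, 0.95, 0.99):
    print(f"C = {C:.2f}   z* = {z_star(C):.3f}")
```

To three decimal places this reproduces the entries 1.645, 1.960, and 2.576 shown above.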
As Figure 6.4 reminds us, any Normal curve has probability C between the point z* standard deviations below the mean and the point z* standard deviations above the mean. The sample mean x̄ has the Normal distribution with mean μ and standard deviation σ/√n, so there is probability C that x̄ lies between
$$\mu - z^{*}\frac{\sigma}{\sqrt{n}} \quad\text{and}\quad \mu + z^{*}\frac{\sigma}{\sqrt{n}}$$
This is exactly the same as saying that the unknown population mean μ lies between
$$\bar{x} - z^{*}\frac{\sigma}{\sqrt{n}} \quad\text{and}\quad \bar{x} + z^{*}\frac{\sigma}{\sqrt{n}}$$
That is, there is probability C that the interval x̄ ± z*σ/√n contains μ. This is our confidence interval. The estimate of the unknown μ is x̄, and the margin of error is z*σ/√n.
CONFIDENCE INTERVAL FOR A POPULATION MEAN
Choose an SRS of size n from a population having unknown mean μ and known standard deviation σ. The margin of error for a level C confidence interval for μ is
$$m = z^{*}\frac{\sigma}{\sqrt{n}}$$
Here, z* is the value on the standard Normal curve with area C between the critical points −z* and z*. The level C confidence interval for μ is
$$\bar{x} \pm m$$
The confidence level of this interval is exactly C when the population distribution is Normal and is approximately C when n is large in other cases.
Starting in 2008, Sallie Mae, a major provider of education loans and savings programs, has conducted an annual study titled “How America Pays for College.” In the 2015 survey, 1600 randomly selected individuals (800 parents of undergraduate students and 800 undergraduate students) were surveyed by telephone.3
Many of the survey questions focus on the composition of funding sources used to pay for college, so the undergraduates in the survey are often responding for their parents. For example, each participant is asked to report how much of the parent’s current income is used to pay for college. Do you think it is wise to combine responses across the parents and undergraduates? Are you fully aware of how much money your parents are spending and borrowing for college? The authors report overall averages and percents in their report. We will also consider this a sample from one population but this is certainly debatable.
EXAMPLE 6.4
Average college savings fund contribution. One survey question asked how much money from a college savings fund, such as a 529 plan, is used to pay for college. Of the 1600 who were surveyed, n = 1593 provided an answer. Nonresponse should always be considered as a source of bias. In this case, the nonresponse is very low, so we’ll proceed by treating the n = 1593 sample as if it were an unbiased sample.
The average amount is $1768. It’s very likely that this distribution is highly skewed to the right with many small amounts and a few very large amounts. Nevertheless, because the sample size is quite large, we can rely on the central limit theorem to assure us that the confidence interval based on the Normal distribution will be a good approximation.
Let’s compute an approximate 95% confidence interval for the true mean amount contributed from a college savings fund among all undergraduates. We’ll assume that the standard deviation for the population of college savings fund contributions is $1483. For 95% confidence, we see from Table D that z* = 1.960. The margin of error for the 95% confidence interval for μ is, therefore,
$$m = z^{*}\frac{\sigma}{\sqrt{n}} = 1.960 \times \frac{1483}{\sqrt{1593}} = 72.83$$
We have computed the margin of error with more digits than we really need. Our mean is rounded to the nearest $1, so we will do the same for the margin of error. Keeping additional digits would provide no additional useful information. Therefore, we will use m = 73. The approximate 95% confidence interval is
$$\bar{x} \pm m = 1768 \pm 73 = (1695,\ 1841)$$
We are 95% confident that the mean amount contributed from a college savings fund among all undergraduates is between $1695 and $1841.
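The calculation in this example follows the boxed formula step by step, so it is easy to express as a small function. The sketch below is one possible way to write it; the function name `z_interval` and its argument names are illustrative, and z* is taken from the standard Normal quantile function rather than from Table D.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, C=0.95):
    """Level C confidence interval for mu when sigma is known.

    Returns (lower endpoint, upper endpoint, margin of error).
    """
    z = NormalDist().inv_cdf((1 + C) / 2)   # critical value z*
    m = z * sigma / math.sqrt(n)            # margin of error
    return xbar - m, xbar + m, m

# Values from Example 6.4: x-bar = 1768, sigma = 1483, n = 1593
low, high, m = z_interval(xbar=1768, sigma=1483, n=1593, C=0.95)
print(f"margin of error = {m:.2f}")
print(f"interval        = ({low:.0f}, {high:.0f})")
```

Rounding is left to the final printing step so that the endpoints are computed from the unrounded margin of error.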
Suppose that the researchers who designed this study had used a different sample size. How would this affect the confidence interval? We can answer this question by changing the sample size in our calculations and assuming that the sample mean is the same.
EXAMPLE 6.5
How sample size affects the confidence interval. As in Example 6.4, the sample mean of the college savings fund contribution is $1768 and the population standard deviation is $1483. Suppose that the sample size is only 177 but still large enough for us to rely on the central limit theorem. In this case, the margin of error for 95% confidence is
$$m = z^{*}\frac{\sigma}{\sqrt{n}} = 1.960 \times \frac{1483}{\sqrt{177}} = 218.48$$
and the approximate 95% confidence interval is
$$\bar{x} \pm m = 1768 \pm 218 = (1550,\ 1986)$$
Notice that the margin of error for this example is three times as large as the margin of error for the full sample in Example 6.4, where 1.960 × 1483/√1593 is about 73. The only change that we made was to assume that the sample size is 177 rather than 1593. This sample size is one-ninth of the original 1593. Thus, we triple the margin of error when we reduce the sample size to one-ninth of the original value. Figure 6.5 illustrates the effect in terms of the intervals.
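The square-root relationship between sample size and margin of error can be confirmed numerically. This short sketch compares the two sample sizes in Examples 6.4 and 6.5 with everything else held fixed; the helper name `margin_of_error` is an illustrative choice.

```python
import math

def margin_of_error(z_star, sigma, n):
    """Margin of error z* sigma / sqrt(n) for a known-sigma interval."""
    return z_star * sigma / math.sqrt(n)

z_star, sigma = 1.960, 1483

m_large = margin_of_error(z_star, sigma, 1593)   # sample size in Example 6.4
m_small = margin_of_error(z_star, sigma, 177)    # one-ninth of that sample size

# Only n changes, so the ratio is sqrt(1593 / 177) = sqrt(9) = 3
ratio = m_small / m_large

print(f"n = 1593: m = {m_large:.2f}")
print(f"n =  177: m = {m_small:.2f}")
print(f"ratio     = {ratio:.2f}")
```

Because z* and σ cancel in the ratio, the factor of 3 does not depend on the confidence level or on the particular value of σ.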
USE YOUR KNOWLEDGE
6.6 Average amount paid for college. Refer to Example 6.4. The average annual amount the n = 1593 families paid for college was $24,164.4 If the population standard deviation is $8500, give the 95% confidence interval for μ, the average annual amount a family pays for a college undergraduate.
6.7 Changing the sample size. In the setting of the previous exercise, would the margin of error for 95% confidence be roughly doubled or halved if the sample size were raised to n = 6375? Verify your answer by performing the calculations.
6.8 Changing the confidence level. In the setting of Exercise 6.7, would the margin of error for 99% confidence be larger or smaller? Verify your answer by performing the calculations.
The argument leading to the form of confidence intervals for the population mean μ rested on the fact that the statistic used to estimate μ has a Normal distribution. Because many sample estimates have Normal distributions (at least approximately), it is useful to notice that the confidence interval has the form
estimate ± z*σ_estimate
The estimate based on the sample is the center of the confidence interval. The margin of error is z*σ_estimate. The desired confidence level determines z* from Table D. The standard deviation of the estimate is found from knowledge of the sampling distribution in a particular case. When the estimate is x̄ from an SRS, the standard deviation of the estimate is σ/√n. We return to this general form numerous times in the following chapters.
How confidence intervals behave
The margin of error for the mean of a Normal population illustrates several important properties that are shared by all confidence intervals in common use. The user chooses the confidence level, and the margin of error follows from this choice.
Both high confidence and a small margin of error are desirable characteristics of a confidence interval. High confidence says that our method almost always gives correct answers. A small margin of error says that we have pinned down the parameter quite precisely.
Suppose that in planning a study you calculate the margin of error and decide that it is too large. Here are your choices to reduce it:
• Use a lower level of confidence (smaller C).
• Choose a larger sample size (larger n).
• Reduce σ.
For most problems, you would choose a confidence level of 90%, 95%, or 99%, so z* will be 1.645, 1.960, or 2.576, respectively. Figure 6.4 (page 349) shows that z* will be smaller for lower confidence (smaller C). The bottom row of Table D also shows this. If n and σ are unchanged, a smaller z* leads to a smaller margin of error.
EXAMPLE 6.6
How the confidence level affects the confidence interval. Suppose that for the college savings fund contribution data in Example 6.4 (page 350), we wanted 99% confidence. Table D tells us that for 99% confidence, z* = 2.576. The margin of error for 99% confidence based on 1593 observations is
$$m = z^{*}\frac{\sigma}{\sqrt{n}} = 2.576 \times \frac{1483}{\sqrt{1593}} = 95.71$$
and the 99% confidence interval is
$$\bar{x} \pm m = 1768 \pm 96 = (1672,\ 1864)$$
Requiring 99%, rather than 95%, confidence replaces z* = 1.960 with z* = 2.576 and increases the margin of error from 73 to 96. Figure 6.6 compares the two intervals.
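To see the trade-off between confidence and precision across several levels at once, the margin of error can be tabulated for the common choices of C. The sketch below reuses the values σ = 1483 and n = 1593 from Example 6.4; the dictionary name `margins` is an illustrative choice.

```python
import math
from statistics import NormalDist

sigma, n = 1483, 1593   # values from Example 6.4

margins = {}
for C in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf((1 + C) / 2)    # critical value z* for level C
    margins[C] = z * sigma / math.sqrt(n)    # margin of error at that level
    print(f"C = {C:.0%}   z* = {z:.3f}   m = {margins[C]:.1f}")
```

With n and σ fixed, the margin of error is proportional to z*, so it grows as the confidence level rises.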
Similarly, choosing a larger sample size n reduces the margin of error for any fixed confidence level. The square root in the formula implies that we must multiply the number of observations by 4 in order to cut the margin of error in half. Likewise, if we want to reduce the standard deviation of x̄ by a factor of 4, we must take a sample 16 times as large.
The standard deviation σ measures the variation in the population. You can think of the variation among individuals in the population as noise that obscures the average value μ. It is harder to pin down the mean μ of a highly variable population; that is why the margin of error of a confidence interval increases with σ.
In practice, we can sometimes reduce σ by carefully controlling the measurement process. We also might change the mean of interest by restricting our attention to only part of a large population. Focusing on a subpopulation will often result in a smaller σ. This is why many medical studies only use healthy male subjects. The tradeoff, however, is less generalizable results.
Choosing the sample size
A wise user of statistics never plans data collection without, at the same time, planning the inference. You can arrange to have both high confidence and a small margin of error. The margin of error of the confidence interval for a population mean is
$$m = z^{*}\frac{\sigma}{\sqrt{n}}$$
Notice once again that it is the size of the sample that determines the margin of error. The size of the population (as long as the population is much larger than the sample) does not influence the sample size we need.
To obtain a desired margin of error m, plug in the value of σ and the value of z* for your desired confidence level, and solve for the sample size n. Here is the result.
SAMPLE SIZE FOR DESIRED MARGIN OF ERROR
The confidence interval for a population mean will have a specified margin of error m when the sample size is
$$n = \left(\frac{z^{*}\sigma}{m}\right)^{2}$$
This formula does not account for collection costs. In practice, taking observations costs time and money. The required sample size may be impossibly expensive. In those situations, you might consider a larger margin of error and/or a lower confidence level to find a workable sample size.
EXAMPLE 6.7
How many undergraduates should we survey? Suppose that we are planning a survey similar to the one described in Example 6.4 (page 350). If we want the margin of error for the average amount contributed from a college savings plan to be $30 with 95% confidence, what sample size n do we need? For 95% confidence, Table D gives z* = 1.960. For σ we will use the value from the previous study, $1483. If the margin of error is $30, we have
$$n = \left(\frac{z^{*}\sigma}{m}\right)^{2} = \left(\frac{1.960 \times 1483}{30}\right)^{2} = 9387.5$$
Because 9387 measurements will give a slightly wider interval than desired and 9388 measurements a slightly narrower interval, we should choose n = 9388. We need information from 9388 undergraduates to determine an estimate of mean college savings fund contribution with the desired margin of error.
It is always safe to round up to the next higher whole number when finding n because this will give us a smaller margin of error. The purpose of this calculation is to determine a sample size that is sufficient to provide useful results, but the determination of what is useful is a matter of judgment.
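Would we need a much larger sample size to obtain a margin of error of $25? Here is the calculation:
$$n = \left(\frac{z^{*}\sigma}{m}\right)^{2} = \left(\frac{1.960 \times 1483}{25}\right)^{2} = 13{,}518.1$$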
A sample of n = 13,519 is much larger, and the costs of such a large sample may be prohibitive.
Unfortunately, the actual number of usable observations is often less than what we plan at the beginning of a study. This is particularly true of data collected in surveys but is an important consideration in most studies. Careful study designers often assume a nonresponse rate or dropout rate that specifies what proportion of the originally planned sample will fail to provide data. We use this information to calculate the sample size to be used at the start of the study.
For example, if in Example 6.7 we expect only 50% of those contacted to respond, we would need to start with a sample size of 2 × 9388 = 18,776 to obtain usable information from 9388 undergraduates and parents of undergraduates.
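The sample size planning in Example 6.7, including the adjustment for nonresponse, can be sketched as two small functions. The names `required_n` and `contacts_needed` are illustrative, and the 50% response rate is the assumption stated in the text, not an estimate from data.

```python
import math

def required_n(z_star, sigma, m):
    """Smallest whole n with margin of error at most m: ceil((z* sigma / m)^2)."""
    return math.ceil((z_star * sigma / m) ** 2)

def contacts_needed(n_required, response_rate):
    """Number of people to contact so that about n_required of them respond."""
    return math.ceil(n_required / response_rate)

n_30 = required_n(z_star=1.960, sigma=1483, m=30)   # margin of error $30
n_25 = required_n(z_star=1.960, sigma=1483, m=25)   # margin of error $25
start = contacts_needed(n_30, response_rate=0.50)   # assumed 50% response

print(f"n for m = $30: {n_30}")
print(f"n for m = $25: {n_25}")
print(f"contacts needed at 50% response: {start}")
```

Rounding up with `math.ceil` matches the advice above: the next higher whole number always gives a margin of error no larger than the target.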
USE YOUR KNOWLEDGE
6.9 Starting salaries. You are planning a survey of starting salaries for recent computer science majors. In the latest survey by the National Association of Colleges and Employers, the average starting salary was reported to be $61,287.5 If you assume that the standard deviation is $3850, what sample size do you need to have a margin of error equal to $500 with 95% confidence?
6.10 Changes in sample size. Suppose that in the setting of the previous exercise you have the resources to contact 300 recent graduates. If all respond, will your margin of error be larger or smaller than $500? What if only 50% respond? Verify your answers by performing the calculations.
Some cautions
We have already seen that small margins of error and high confidence can require large numbers of observations. You should also be keenly aware that any formula for inference is correct only in specific circumstances. If the government required statistical procedures to carry warning labels like those on drugs, most inference methods would have long labels. Our formula for estimating a population mean comes with the following list of warnings for the user:
• The data should be an SRS from the population. We are completely safe if we actually did a randomization and drew an SRS. We are not in great danger if the data can plausibly be thought of as independent observations from a population. That is the case in Examples 6.4 through 6.7, provided the undergraduates and parents can be considered one population.
• The formula is not correct for probability sampling designs more complex than an SRS. Correct methods for other designs are available. We will not discuss confidence intervals based on multistage or stratified samples (page 195). If you plan such samples, be sure that you (or your statistical consultant) know how to carry out the inference you desire.
• There is no correct method for inference from data haphazardly collected with bias of unknown size. Fancy formulas cannot rescue badly produced data.
• Because x̄ is not a resistant measure, outliers can have a large effect on the confidence interval. You should search for outliers and try to correct them or justify their removal before computing the interval. If the outliers cannot be removed, ask your statistical consultant about procedures that are not sensitive to outliers.
• If the sample size is small and the population is not Normal, the true confidence level will be different from the value C used in computing the interval. Prior to any calculations, examine your data carefully for skewness and other signs of non-Normality. Remember though that the interval relies only on the distribution of x̄, which even for quite small sample sizes is much closer to Normal than is the distribution of the individual observations. When n ≥ 15, the confidence level is not greatly disturbed by non-Normal populations unless extreme outliers or quite strong skewness are present. Our college fund contribution data in Example 6.4 are very likely skewed, but because of the large sample size, we are confident that the distribution of the sample mean will be approximately Normal.
• The interval assumes that the standard deviation σ of the population is known. This unrealistic requirement renders the interval of little use in statistical practice. We will learn in the next chapter what to do when σ is unknown. If, however, the sample is large, the sample standard deviation s will be close to the unknown σ. The interval x̄ ± z*s/√n is then an approximate confidence interval for μ.
The most important caution concerning confidence intervals is a consequence of the first of these warnings. The margin of error in a confidence interval covers only random sampling errors. The margin of error is obtained from the sampling distribution and indicates how much error can be expected because of chance variation in randomized data production.
Practical difficulties such as undercoverage and nonresponse in a sample survey cause additional errors. These errors can be larger than the random sampling error. This often happens when the sample size is large (so that σ/√n is small). Remember this unpleasant fact when reading the results of an opinion poll or other sample survey. The practical conduct of the survey influences the trustworthiness of its results in ways that are not included in the announced margin of error.
Every inference procedure that we will meet has its own list of warnings. Because many of the warnings are similar to those we have mentioned, we will not print the full warning label each time. It is easy to state (from the mathematics of probability) conditions under which a method of inference is exactly correct. These conditions are never fully met in practice.
For example, no population is exactly Normal. Deciding when a statistical procedure should be used in practice often requires judgment assisted by exploratory analysis of the data. Mathematical facts are, therefore, only a part of statistics. The difference between statistics and mathematics can be stated thusly: mathematical theorems are true; statistical methods are often effective when used with skill.
Finally, you should understand what statistical confidence does not say. Based on our SRS in Example 6.3, we are 95% confident that the mean SATM score for the California students lies between 486 and 504. This says that this interval was calculated by a method that gives correct results in 95% of all possible samples. It does not say that the probability is 0.95 that the true mean falls between 486 and 504. No randomness remains after we draw a particular sample and compute the interval. The true mean either is or is not between 486 and 504. The probability calculations of standard statistical inference describe how often the method, not a particular sample, gives correct answers.
USE YOUR KNOWLEDGE
6.11 Nonresponse in a survey. In earlier versions of the Sallie Mae survey of Example 6.4 (page 350), participants were asked to report the undergraduate’s outstanding credit card balance. Only about a third reported this amount. Provide a couple of reasons why a survey respondent might not provide an amount. Based on these reasons, do you think the sample mean using just the reported amounts is biased? Is the margin of error based just on the reported amounts a good measure of precision? Explain your answers.