OBJECTIVES By the end of this section, I will be able to …
1 Calculate a Point Estimate of the Population Mean
Recall from Section 1.2 that characteristics of a sample, such as the sample mean $\bar{x}$, are called statistics, whereas characteristics of a population, such as the population mean $\mu$, are called parameters. Statistical inference consists of methods for estimating and drawing conclusions about parameters, based on the corresponding statistic. For example, we use the known value of $\bar{x}$ to estimate the unknown value of $\mu$.
Suppose a random sample of 30 male students at your school produced a sample mean height of $\bar{x} = 70$ inches. We could then use this statistic to infer that the population mean height $\mu$ of all male students at your school was close to 70 inches. This value of $\bar{x}$ is called a point estimate of the population mean $\mu$.
Point estimation is the process of estimating unknown population parameters by known sample statistics. The value of each sample statistic used as an estimate is called a point estimate.
EXAMPLE 1 Calculating a point estimate
Farmers work hard to increase the yield of their acreage. Yield represents the number of bushels of a crop produced per acre. Suppose we are interested in estimating the population mean yield for winter wheat across all 50 states. Shown here is the mean July 2014 yield for a sample of five states, in bushels, as published by the U.S. Department of Agriculture (USDA).
State | Yield (bushels) |
---|---|
California | 85 |
Georgia | 55 |
Illinois | 67 |
Ohio | 68 |
Texas | 25 |
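Solution
The point estimate of the population mean yield $\mu$ is the sample mean of the five state yields:
$$\bar{x} = \frac{85 + 55 + 67 + 68 + 25}{5} = \frac{300}{5} = 60$$
Our point estimate of the population mean winter wheat yield is 60 bushels per acre.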
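The arithmetic behind the point estimate in Example 1 can be checked with a short script. This is only a sketch: the variable names `yields` and `point_estimate` are illustrative and do not come from the text.

```python
from statistics import mean

# Winter wheat yields (bushels per acre) for the five sampled states in Example 1.
yields = {"California": 85, "Georgia": 55, "Illinois": 67, "Ohio": 68, "Texas": 25}

# The sample mean is the point estimate of the population mean yield.
point_estimate = mean(list(yields.values()))
print(point_estimate)
```

Swapping in the five yields from Your Turn #1 gives the point estimate for that sample.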
NOW YOU CAN DO
Exercises 11–14.
YOUR TURN #1
See Example 1. The USDA reports the yields for Colorado, Indiana, Maryland, Michigan, and Pennsylvania to be 36, 68, 65, 70, and 63 bushels, respectively.
(The solutions are shown in Appendix A.)
However, because a sample is only a small subset of the population, generalizing from a sample to the population carries the risk that the point estimate may not be very accurate. For example, do you think that the population mean yield of winter wheat exactly equals our point estimate of 60 bushels per acre? It's not likely, because we learned in Example 1 of Chapter 7 (page 397) that different samples will produce different sample means, and thus different point estimates of $\mu$. Our point estimate $\bar{x}$ may be close to $\mu$ or it may be far from $\mu$. In other words, we have no measure of confidence that our particular point estimate is close to $\mu$. There has to be a better way, and there is: confidence intervals, the subject of this chapter.
2 The $Z$ Interval for the Population Mean
Although we cannot measure how confident we are of $\bar{x}$ as a point estimate for $\mu$, we can use the point estimate to find an interval that is likely to contain $\mu$. Suppose we are interested in estimating the mean height of the students at your school. The students in your class are a sample of the population of students at your school, so we can calculate the sample mean height of the students in your class to be $\bar{x} = 67.5$ inches (5 feet 7.5 inches tall).
We are 90% confident that $\mu$ lies between 66.5 inches and 68.5 inches.
We may then use $\bar{x} = 67.5$ inches as a point estimate of the unknown population mean height $\mu$ of all students at your school. However, this estimate is not likely to be exactly correct. To address this uncertainty in our estimate, we can use a range of heights instead, such as 67.5 inches, give or take an inch, which we write
$$67.5 \pm 1$$
and would equal the interval
$$(67.5 - 1,\ 67.5 + 1) = (66.5,\ 68.5)$$
The “1 inch” is called the margin of error. We might then say that we are 90% confident that the mean height of all students at our school lies in the interval (66.5 inches, 68.5 inches).
To increase the confidence in our estimate, we increase the margin of error, so that we might say we are 95% confident that the mean height of all students at our school lies in the interval $67.5 \pm 2$ inches, or the interval (65.5 inches, 69.5 inches). These two intervals are examples of what are called confidence intervals.
A confidence interval is an estimate of a parameter consisting of an interval of numbers based on a point estimate, together with a confidence level specifying the probability that the interval contains the parameter.
For example, our estimate that the mean height of all students at our school would lie in the interval (66.5 inches, 68.5 inches) was reported with confidence level 90%.
Confidence intervals are often reported in the format:
(lower bound, upper bound)
In the 90% confidence interval above, we have lower bound = 66.5 and upper bound = 68.5.
Try not to confuse confidence interval with confidence level. A confidence interval is an interval of values on the number line. A confidence level is a percent, like 95%.
We use the $\alpha$ (alpha) notation here because it ties in with the notation we will need in Chapter 9, Hypothesis Testing.
A confidence level of 90% for a confidence interval means that, if we repeated the experiment many times, we would expect about 9 out of every 10 intervals (90%) of the form (lower bound, upper bound) to capture the population parameter.
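Recall that, in previous chapters, we calculated probabilities for normal distributions using the standard normal $Z$. We can use $Z$ to develop the formula for the confidence intervals for the population mean.
But before we do so, we need to define some notation. Let $Z_{\alpha/2}$ denote the value of $Z$ that has area $\alpha/2$ to its right under the standard normal curve. The area between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ is then $1 - \alpha$, which corresponds to the inequality
$$-Z_{\alpha/2} < Z < Z_{\alpha/2}$$
Next, we use the facts we learned in Chapter 7 about the sampling distribution of the sample mean to develop the formula for the confidence interval for the mean. When the sampling distribution of $\bar{x}$ is normal, we standardize $\bar{x}$ as
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
Plugging this formula for $Z$ back into the earlier inequality, $-Z_{\alpha/2} < Z < Z_{\alpha/2}$, gives
$$-Z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}$$
We then use algebra to isolate $\mu$ as the middle term:
$$\bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Therefore, because areas represent probabilities, we can write
$$P\left(\bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
The quantities on either side of $\mu$ in this inequality represent the lower bound and the upper bound for a $100(1-\alpha)\%$ confidence interval for $\mu$. This confidence interval for $\mu$ is based on the standard normal distribution, so it is called the $Z$ interval for the population mean $\mu$.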
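To use the $Z$ interval for $\mu$, the value of $\sigma$ must be known.
$Z$ Interval for the Population Mean $\mu$
The $Z$ interval for $\mu$ may be constructed only when either of the following two conditions is met:
Case 1: The population is normally distributed, and the value of $\sigma$ is known.
Case 2: The sample size is large ($n \geq 30$), and the value of $\sigma$ is known.
When a random sample of size $n$ is taken from a population, a $100(1-\alpha)\%$ confidence interval for $\mu$ is given by
$$\text{lower bound} = \bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \qquad \text{upper bound} = \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where $100(1-\alpha)\%$ is the confidence level. The $Z$ interval can also be written as
$$\bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
and is denoted
(lower bound, upper bound)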
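As a rough illustration of the interval formula for a known population standard deviation, the following sketch computes the lower and upper bounds directly. The function name `z_interval` and the sample values passed to it are hypothetical, chosen only to echo the height illustration; the critical value comes from Python's standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence):
    """Return (lower, upper) for a Z interval; sigma is the KNOWN population sd."""
    alpha = 1 - confidence
    # Critical value with area alpha/2 to its right under the standard normal curve.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical inputs: sample mean 67.5 in., assumed sigma = 3 in., n = 36, 90% confidence.
lower, upper = z_interval(x_bar=67.5, sigma=3.0, n=36, confidence=0.90)
print(round(lower, 2), round(upper, 2))
```

The function does not check the normal-population or large-sample condition; that judgment still has to be made before trusting the result.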
EXAMPLE 2 Determining whether the $Z$ interval for $\mu$ may be used
For the following situations, state whether the confidence interval for the population mean may be used. Assume the population is normally distributed.
Solution
NOW YOU CAN DO
Exercises 15–20.
YOUR TURN #2
For the following situations, state whether the confidence interval for the population mean may be used:
(The solutions are shown in Appendix A.)
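Two important results from Chapter 7 form the conditions that allow us to construct the $Z$ interval for $\mu$:
If the population is normally distributed, then the sampling distribution of $\bar{x}$ is normal for any sample size.
If the sample size is large ($n \geq 30$), then by the Central Limit Theorem the sampling distribution of $\bar{x}$ is approximately normal, whatever the shape of the population.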
The Normal Density Curve applet may be used to find critical values for confidence levels not listed in Table 1.
Table 1 provides a listing of $Z_{\alpha/2}$ values for the most common confidence levels.
Confidence level $100(1-\alpha)\%$ | $\alpha$ | $\alpha/2$ | $Z_{\alpha/2}$ |
---|---|---|---|
80% | 0.20 | 0.10 | 1.28 |
90% | 0.10 | 0.05 | 1.645 |
95% | 0.05 | 0.025 | 1.96 |
99% | 0.01 | 0.005 | 2.576 |
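For confidence levels not listed in Table 1, the critical value can also be computed rather than looked up. The sketch below uses the inverse cumulative distribution function of the standard normal from Python's standard library; the helper name `z_critical` is an illustrative choice, not notation from the text.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Value of Z with area alpha/2 to its right, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_critical(level):.3f}")
```

The computed values agree with Table 1 up to rounding; for example, the table's 1.28 for 80% confidence is a two-decimal rounding of a slightly larger value.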
EXAMPLE 3 Finding the value of $Z_{\alpha/2}$
For the following situations, find the value of $Z_{\alpha/2}$:
Solution
NOW YOU CAN DO
Exercises 21–26.
YOUR TURN #3
For the following situations, find the value of $Z_{\alpha/2}$:
(The solutions are shown in Appendix A.)
EXAMPLE 4 Constructing a confidence interval for the mean of a normal population
The College Board reports that the scores on the 2014 SAT Math test were normally distributed. A sample of $n = 25$ SAT scores had a mean of $\bar{x} = 510$. Assume that the population standard deviation of such scores is $\sigma = 118$. Construct a 90% confidence interval for the population mean score on the 2014 SAT Math test.
Be careful! In order to use the $Z$ interval for $\mu$, the population standard deviation $\sigma$ must be known, not just the sample standard deviation $s$. If the word problem provides the sample standard deviation $s$ but not the population standard deviation $\sigma$, then you cannot use the $Z$ interval. You might be able to use the $t$ confidence interval for $\mu$ (Section 8.2).
Solution
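Because the population is normal and the population standard deviation $\sigma$ is known, the requirements for the $Z$ interval are met.
We are given $n = 25$, $\bar{x} = 510$, and $\sigma = 118$. From Table 1, we have $Z_{\alpha/2} = 1.645$ for a 90% confidence level. Thus,
$$\text{lower bound} = \bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 510 - 1.645\cdot\frac{118}{\sqrt{25}} \approx 510 - 38.8 = 471.2$$
$$\text{upper bound} = \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 510 + 1.645\cdot\frac{118}{\sqrt{25}} \approx 510 + 38.8 = 548.8$$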
We are 90% confident that the population mean score on the 2014 Mathematics SAT test lies between 471.2 and 548.8.
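The calculation in Example 4 can be reproduced in a few lines. This sketch takes $\bar{x} = 510$ and $\sigma = 118$ as assumptions: they are the values consistent with the reported bounds of 471.2 and 548.8 for $n = 25$ at 90% confidence, and the critical value 1.645 is the Table 1 entry.

```python
from math import sqrt

# Assumed inputs, consistent with the interval reported in Example 4.
n, x_bar, sigma = 25, 510, 118
z = 1.645  # Table 1 critical value for 90% confidence

margin = z * sigma / sqrt(n)          # margin of error
lower, upper = x_bar - margin, x_bar + margin
print(round(lower, 1), round(upper, 1))
```

Changing `z` to 1.96 gives the 95% interval asked for in Your Turn #4.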
NOW YOU CAN DO
Exercises 27–30.
YOUR TURN #4
For the scenario in Example 4, construct a 95% confidence interval for the population mean score on the 2014 SAT Math test.
(The solution is shown in Appendix A.)
What Does This Confidence Interval Mean?
What does the 90% in the phrase 90% confidence interval mean? If we take sample after sample for a very long time, then in the long run, the proportion of intervals that will contain the population mean will equal 90%.
Interpreting Confidence Intervals
You may use the following generic interpretation for the confidence intervals that you construct: “We are 90% (or 95% or 99% and so on) confident that the population mean __________ (for example, SAT Math score) lies between __________ (lower bound) and __________ (upper bound).”
The $Z$ interval for the population mean takes the form
$$\text{point estimate} \pm \text{margin of error}$$
where the point estimate equals the sample mean $\bar{x}$ and the margin of error equals $Z_{\alpha/2}(\sigma/\sqrt{n})$.
The margin of error $E$ is a measure of the precision of the confidence interval estimate. For the $Z$ interval, the margin of error takes the form $E = Z_{\alpha/2}(\sigma/\sqrt{n})$. Smaller values of $E$ indicate a smaller margin of error and, therefore, greater precision.
Later in this section (page 437) we learn ways to reduce the margin of error.
For example, the confidence interval from Example 4 has the form
$$510 \pm 38.8$$
EXAMPLE 5 Constructing a $Z$ interval for the population mean for a large sample size
Motor Vehicle Fuel Efficiency
One of the variables in our case study is City MPG, which is the number of miles a vehicle can travel in city conditions on one gallon of gas. Because we have information on the entire population of 1141 vehicles, we know the population standard deviation $\sigma$. We obtained a sample of $n = 100$ vehicles and observed the sample mean city gas mileage $\bar{x}$.
Note: As a check on your arithmetic, make sure that (lower bound + upper bound)/2 = $\bar{x}$.
In other words, the sample mean should lie exactly midway between the lower bound and the upper bound.
Solution
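The formula for the confidence interval is given by
$$\bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
We are given $n = 100$, together with the values of $\bar{x}$ and $\sigma$. For a confidence level of 90%, Table 1 provides the value $Z_{\alpha/2} = 1.645$. Plugging into the formula:
$$\text{lower bound} = \bar{x} - 1.645\cdot\frac{\sigma}{\sqrt{100}}, \qquad \text{upper bound} = \bar{x} + 1.645\cdot\frac{\sigma}{\sqrt{100}}$$
Rounded to one decimal place, the resulting interval is (19.8, 21.6).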
NOW YOU CAN DO
Exercises 31–34.
YOUR TURN #5
For the scenario in Example 5, construct a 99% confidence interval for $\mu$, the population mean City MPG for all vehicles.
(The solution is shown in Appendix A.)
The Confidence Interval applet allows you to see for yourself how individual samples generate intervals that either do or do not contain the population mean.
Developing Your Statistical Sense
What Is Random Here?
It is important to understand that it is the interval that is random, not the population mean $\mu$. The interval is formed by sample statistics such as $\bar{x}$, and for each different sample we get different values for the statistics. So the interval is random because it is constructed using $\bar{x}$, which is also random. The population mean $\mu$, though usually unknown, is nevertheless constant.
We generated 10 samples of size 100 vehicles from the Fuel Efficiency data set, and observed the City MPG of each vehicle. For each sample, a 90% confidence interval for the population mean City MPG was constructed. The results are shown in Figure 3. Note that, because we have the entire population of 1141 vehicles, we know the population mean City MPG $\mu$, which is also shown in Figure 3. Note that the confidence intervals are random, whereas $\mu$ is constant. The confidence intervals are random because they are based on the different values that the sample mean $\bar{x}$ takes with each sample. The randomness involved in the sampling leads to the randomness of the values of $\bar{x}$. (This relates to what we learned in Chapter 7: the sample mean is a random variable that has its own distribution, the sampling distribution.)
Now, the confidence interval from our sample in Example 5 is shown as the first confidence interval, and is rounded to (19.8, 21.6). Note that this confidence interval happened to “capture” the population mean $\mu$. However, one of the confidence intervals did not capture the population mean (the red one). It turns out that 9 out of 10 of the samples (90%) produced confidence intervals that contained $\mu$. But it did not have to turn out this way. The 90% refers to the proportion of intervals that will contain $\mu$ after a great many samples are taken.
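The long-run meaning of the 90% can be explored with a small simulation. The sketch below does not use the Fuel Efficiency data set: it draws samples from a hypothetical normal population whose mean `MU` and standard deviation `SIGMA` are made-up values, builds a 90% interval from each sample, and records how often the interval captures `MU`.

```python
import random
from math import sqrt

random.seed(1)              # fixed seed so the run is repeatable

MU, SIGMA = 20.0, 5.0       # hypothetical population mean and standard deviation
N = 100                     # sample size, as in the figure described above
Z_90 = 1.645                # Table 1 critical value for 90% confidence
REPS = 2000                 # number of repeated samples

margin = Z_90 * SIGMA / sqrt(N)   # same for every sample because sigma is known

captured = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    # The interval moves with x_bar; MU stays fixed.
    if x_bar - margin < MU < x_bar + margin:
        captured += 1

coverage = captured / REPS
print(coverage)
```

Any single batch of 10 intervals may capture the mean 8, 9, or 10 times; only over many repetitions does the proportion settle near 0.90.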
EXAMPLE 6 $Z$ intervals for $\mu$ using technology
highwaympg 16
Motor Vehicle Fuel Efficiency
Another of the variables in our case study is Highway MPG, which is the number of miles a vehicle can travel on a highway on one gallon of gas. We know the population standard deviation $\sigma$. The sample of $n = 16$ vehicles, shown here, has a sample mean highway gas mileage of $\bar{x} = 431/16 \approx 26.94$.
Vehicle | Highway MPG | Vehicle | Highway MPG |
---|---|---|---|
Honda CR-V | 30 | Subaru Impreza | 25 |
Nissan Pathfinder | 26 | Ford Mustang | 26 |
Acura MDX | 28 | Cadillac ATS | 31 |
Porsche Cayenne | 29 | Chevrolet Camaro | 24 |
Mercedes-Benz GLK 250 | 33 | Ford Taurus | 29 |
Chevrolet Chevy SS | 21 | Ford Expedition | 20 |
Dodge Charger | 27 | Lincoln MKT | 25 |
Jeep Compass | 23 | BMW X1 | 34 |
Solution
We shall use the instructions provided in the Step-by-Step Technology Guide at the end of this section (page 441). The results for the TI-83/84 in Figure 5 show that the 95% confidence interval for the population mean Highway MPG is approximately
(lower bound = 23.84, upper bound = 30.04)
Figure 5 also shows the sample mean $\bar{x}$, the sample standard deviation $s$, and the sample size $n = 16$.
The Minitab results are provided in Figure 6. The “assumed standard deviation” is indicated to be $\sigma$. Then the sample size $n = 16$, the sample mean $\bar{x}$, and the sample standard deviation $s$ are displayed. “SE Mean” refers to the standard error of the mean, but we don't need it here. Finally, the 95% confidence interval is given as (lower bound = 23.84, upper bound = 30.04).
The JMP results are shown in Figure 7. The sample mean $\bar{x}$ is shown in the first column, with the sample standard deviation $s$ below it. The 95% confidence interval is given in the row labeled Mean, with lower bound = 23.84 (rounded) and upper bound = 30.04 (rounded).
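The same kind of output can be approximated in a general-purpose language. In the sketch below, the 16 Highway MPG values come from the table in Example 6, but the population standard deviation `SIGMA = 6.33` is an assumption: the value is not given in the text as reproduced here, and it was chosen only because it reproduces the reported bounds of 23.84 and 30.04.

```python
from math import sqrt
from statistics import mean, stdev

# Highway MPG for the 16 sampled vehicles in Example 6.
highway_mpg = [30, 26, 28, 29, 33, 21, 27, 23,
               25, 26, 31, 24, 29, 20, 25, 34]

n = len(highway_mpg)
x_bar = mean(highway_mpg)   # sample mean (point estimate)
s = stdev(highway_mpg)      # sample sd: reported by software, NOT used in the Z interval

SIGMA = 6.33                # ASSUMPTION: hypothetical known population sd
Z_95 = 1.96                 # Table 1 critical value for 95% confidence

margin = Z_95 * SIGMA / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(n, x_bar, round(lower, 2), round(upper, 2))
```

As with the calculator and software output, the sample standard deviation is computed and displayed, but the interval itself depends on the known population value.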
3 Ways to Reduce the Margin of Error
Remember that the “±” notation always represents a pair of numbers.
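Recall that the $Z$ interval for $\mu$ takes the form
$$\bar{x} \pm E$$
where $E = Z_{\alpha/2}(\sigma/\sqrt{n})$. We interpret the margin of error $E$ for a $100(1-\alpha)\%$ confidence interval for $\mu$ as follows:
“We can estimate $\mu$ to within $E$ units with $100(1-\alpha)\%$ confidence.”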
EXAMPLE 7 Finding and interpreting the margin of error
In Example 5, the 90% $Z$ interval for the population mean city gas mileage for all motor vehicles is, rounded to one decimal place:
(lower bound = 19.8, upper bound = 21.6)
Solution
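The point estimate is $\bar{x}$. Thus, the 90% confidence interval for the population mean city gas mileage for all motor vehicles takes the following form:
$$\text{point estimate} \pm \text{margin of error} = \bar{x} \pm E$$
The margin of error is $E \approx 0.93$ mpg, so we can estimate $\mu$ to within about 0.93 mpg with 90% confidence.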
NOW YOU CAN DO
Exercises 35–42.
Note: When it comes to the margin of error $E$, smaller is better!
Of course, we want our confidence interval estimates to be as precise as possible. Therefore, we want the margin of error to be as small as possible, which would in turn result in a tighter confidence interval. Tighter confidence intervals are better, because the likely maximum difference between the sample mean and the population mean is reduced.
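So how do we reduce the size of the margin of error? Let's look at the margin of error for the $Z$ interval:
$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
The population standard deviation $\sigma$ is fixed, so only $Z_{\alpha/2}$ and $n$ can vary. There are therefore two strategies for decreasing the margin of error:
Strategy 1: Decrease the confidence level, which decreases $Z_{\alpha/2}$.
Strategy 2: Increase the sample size $n$.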
EXAMPLE 8 Decreasing the margin of error by decreasing the confidence level
For the confidence interval for the population mean city gas mileage in Example 5, suppose we reduce the confidence level from 90% to 80% and leave everything else unchanged. Find the new margin of error. Describe how the margin of error has changed.
Solution
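From Example 5, we have the margin of error for the 90% confidence interval for $\mu$ as follows:
$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1.645\cdot\frac{\sigma}{\sqrt{100}} \approx 0.93$$
Decreasing the confidence level from 90% to 80% decreases $Z_{\alpha/2}$ from 1.645 to 1.28. This gives us the margin of error for the 80% confidence interval as:
$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1.28\cdot\frac{\sigma}{\sqrt{100}} \approx 0.72$$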
Decreasing the confidence level from 90% to 80% decreases the margin of error from 0.93 mpg to 0.72 mpg.
Developing Your Statistical Sense
There's No Free Lunch
The margin of error in Example 8 is smaller than the one in Example 5, which is good because it gives a more precise estimate of $\mu$. However, this smaller margin of error is due entirely to the decrease in the confidence level, which is not good. In statistical data analysis, there is rarely a free lunch. The trade-off here is that, while the margin of error went down, so did the confidence level, from 90% to 80%. On the other hand, confidence intervals that are too wide can be useless. For example, we can be 99.9999% confident that the population mean age of college students in Florida lies between 15 and 75 years old. But, so what? The interval is too wide to be of practical use. More useful would be a 95% confidence interval that the population mean age of college students in Florida lies between 20 and 27.
This leads us to Strategy 2 for reducing the margin of error: increase the sample size. The only way to have both high confidence and a tight interval is to boost the sample size.
EXAMPLE 9 Decreasing the margin of error by increasing the sample size
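For the confidence interval for the population mean city gas mileage in Example 5, suppose the results were based on a sample of size $n = 400$ instead of $n = 100$. Leaving everything else unchanged, find the new margin of error, and describe how the margin of error has changed.
Solution
For $n = 400$, the margin of error is
$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1.645\cdot\frac{\sigma}{\sqrt{400}} \approx 0.46$$
Increasing the sample size from $n = 100$ to $n = 400$ has decreased the margin of error from 0.93 mpg to 0.46 mpg. Because $n$ appears under a square root, quadrupling the sample size cuts the margin of error in half.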
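The two strategies from Examples 8 and 9 can be compared side by side. In this sketch the population standard deviation `SIGMA = 5.65` is an assumption: the value is not stated in the text as reproduced here, and it was picked only because it reproduces the margins of error of 0.93, 0.72, and 0.46 mpg quoted in the examples. The function name `margin_of_error` is likewise illustrative.

```python
from math import sqrt


def margin_of_error(z, sigma, n):
    """Margin of error E = z * sigma / sqrt(n) for a Z interval."""
    return z * sigma / sqrt(n)


SIGMA = 5.65  # ASSUMPTION: hypothetical population sd for City MPG

base = margin_of_error(1.645, SIGMA, 100)              # 90% confidence, n = 100
lower_confidence = margin_of_error(1.28, SIGMA, 100)   # Strategy 1: 80% confidence
larger_sample = margin_of_error(1.645, SIGMA, 400)     # Strategy 2: n = 400

print(round(base, 2), round(lower_confidence, 2), round(larger_sample, 2))
```

Only the second strategy shrinks the margin of error while leaving the confidence level at 90%.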
“More data” is a familiar refrain in statistical analysis. Of course, increasing the sample size often raises pocketbook issues, because large samples can get very expensive (“We want a large-sample estimate of the amount of damage sustained by Corvettes hitting a wall at 90 mph”). Sometimes obtaining large samples is simply impossible. Suppose an astronomer has developed a new technique for predicting corona effects during solar eclipses; she will have to wait a while (say, a few hundred years) to build up a large sample. So, take samples as large as realistically possible to keep the width of the confidence interval as narrow as possible.
Increasingly, technology is being used to perform statistical analysis, including confidence intervals. Therefore, it is important to know how to read and interpret confidence intervals provided by software output. For instance, Example 10 shows how to calculate the margin of error $E$ when the software gives you only the lower bound and upper bound of the confidence interval.
EXAMPLE 10 Finding the margin of error, given the lower and upper bounds
Figure 8 shows the results for a 95% confidence interval for $\mu$, where $\mu$ represents the population mean score on the SAT Math test. Do the following:
Solution
The TI-83/84 output gives us the following confidence interval:
(lower bound = 482.28, upper bound = 537.72)
We interpret this confidence interval as follows: We are 95% confident that the population mean score on the SAT Math test lies between 482.28 and 537.72.
Here, we show how to calculate the margin of error, given the lower bound and upper bound of the confidence interval. The confidence interval from (a) is illustrated in Figure 9.
Now, the width of a confidence interval is:
$$\text{width} = \text{upper bound} - \text{lower bound}$$
In Figure 9, the width of our confidence interval is:
$$537.72 - 482.28 = 55.44$$
Then, the margin of error is half this width, as shown in Figure 9. This gives us a margin of error of
$$E = \frac{55.44}{2} = 27.72$$
NOW YOU CAN DO
Exercises 43–46.
In general, when the lower bound and upper bound of the confidence interval for $\mu$ have already been found, then the margin of error may be calculated as follows.
$$E = \frac{\text{upper bound} - \text{lower bound}}{2}$$
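When software reports only the two bounds, both the margin of error and the point estimate can be recovered from them. The sketch below applies this to the bounds from Example 10; the function names are illustrative and not taken from any of the software packages mentioned.

```python
def margin_from_bounds(lower, upper):
    """Margin of error: half the width of the confidence interval."""
    return (upper - lower) / 2


def point_estimate_from_bounds(lower, upper):
    """Point estimate: the midpoint of the confidence interval."""
    return (lower + upper) / 2


E = margin_from_bounds(482.28, 537.72)
x_bar = point_estimate_from_bounds(482.28, 537.72)
print(round(E, 2), round(x_bar, 2))
```

The midpoint calculation is the same arithmetic check suggested in Example 5: the sample mean lies exactly midway between the bounds.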
4 Sample Size for Estimating the Population Mean
In general, more data implies more precise results. In fact, when samples are plentiful and cheap, arbitrarily precise confidence intervals with arbitrarily high confidence are possible simply by taking sufficiently large samples.
Therefore, the question arises: How large a sample size do I need to get a tight confidence interval with a high confidence level?
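Note: We solve for $n$ as follows:
$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Multiply both sides by $\sqrt{n}$:
$$E\sqrt{n} = Z_{\alpha/2}\,\sigma$$
Divide both sides by $E$:
$$\sqrt{n} = \frac{Z_{\alpha/2}\,\sigma}{E}$$
Square both sides to get the formula for $n$:
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2$$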
Sample Size for Estimating the Population Mean
The sample size for a $Z$ interval that estimates the population mean $\mu$ to within a margin of error $E$ with $100(1-\alpha)\%$ confidence is given by
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2$$
where $Z_{\alpha/2}$ is the value associated with the desired confidence level (Table 1), $E$ is the desired margin of error, and $\sigma$ is the population standard deviation. By convention, whenever this formula yields a sample size with a decimal, always round up to the next whole number.
EXAMPLE 11 Sample size for estimating the population mean
We round up because (a) the sample size must be a whole number and (b) rounding down will lead to a value of $n$ with less than the desired confidence level.
Suppose we want to estimate to within $1000 the mean salary $\mu$ of all college graduates who were business majors. Assume that the population standard deviation $\sigma$ is known. How many business majors would we sample to estimate the mean salary $\mu$ to within $1000 with 95% confidence?
Solution
“Within $1000” means that the desired margin of error $E$ is $1000, and 1.96 is the $Z_{\alpha/2}$ value associated with 95% confidence. Substituting into the formula for the sample size, we get:
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.96\,\sigma}{1000}\right)^2$$
Now, when finding the required sample size, if the formula results in a decimal, we always round up to the next whole number. Thus, we need a sample size $n$ equal to this result rounded up to the next whole number for a confidence level of 95%.
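The sample size formula, including the round-up convention, can be written as a short function. In this sketch the population standard deviation of $20,000 is a hypothetical value supplied purely for illustration, since the value is not given in the text as reproduced here; `required_sample_size` is likewise an illustrative name.

```python
from math import ceil


def required_sample_size(z, sigma, margin):
    """Smallest whole n with z * sigma / sqrt(n) <= margin (always rounds up)."""
    return ceil((z * sigma / margin) ** 2)


# ASSUMPTION: sigma = 20,000 dollars is hypothetical; margin = 1,000 dollars; 95% confidence.
n = required_sample_size(z=1.96, sigma=20_000, margin=1_000)
print(n)
```

Because the margin of error sits in the denominator and is squared, asking for a margin ten times smaller requires roughly one hundred times as many observations.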
NOW YOU CAN DO
Exercises 47–54.
YOUR TURN #6
For the situation in Example 11, suppose we now needed our estimate to be within only $100 of the mean salary $\mu$. How many business majors would we sample to estimate the mean salary $\mu$ to within $100 with 95% confidence?
(The solution is shown in Appendix A.)