8.1 Z Interval for the Population Mean

OBJECTIVES By the end of this section, I will be able to …

  1. Calculate a point estimate of the population mean.
  2. Calculate and interpret a $Z$ interval for the population mean when the population is normal and when the sample size is large.
  3. Find ways to reduce the margin of error.
  4. Calculate the sample size needed to estimate the population mean.

1 Calculate a Point Estimate of the Population Mean

Recall from Section 1.2 that characteristics of a sample, such as the sample mean $\bar{x}$, are called statistics, whereas characteristics of a population, such as the population mean $\mu$, are called parameters. Statistical inference consists of methods for estimating and drawing conclusions about parameters, based on the corresponding statistics. For example, we use the known value of $\bar{x}$ to estimate the unknown value of $\mu$.

Suppose a random sample of 30 male students at your school produced a sample mean height of $\bar{x} = 70$ inches. We could then use this statistic to infer that the population mean height $\mu$ of all male students at your school was close to 70 inches. This value of $\bar{x}$ is called a point estimate of the population mean $\mu$.

Point estimation is the process of estimating unknown population parameters by known sample statistics. The value of each sample statistic used as an estimate is called a point estimate.

EXAMPLE 1 Calculating a point estimate

Farmers work hard to increase the yield of their acreage. Yield represents the number of bushels of a crop produced per acre. Suppose we are interested in estimating the population mean yield for winter wheat across all 50 states. Shown here is the mean July 2014 yield for a sample of five states, in bushels, as published by the U.S. Department of Agriculture (USDA).

  1. Find the sample mean yield $\bar{x}$.
  2. Express $\bar{x}$ as the point estimate of $\mu$, the unknown population mean winter wheat yield for all 50 states.

State        Yield (bushels per acre)
California   85
Georgia      55
Illinois     67
Ohio         68
Texas        25

Solution

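  1. The sample mean yield is calculated as

    $$\bar{x} = \frac{\sum x}{n} = \frac{85 + 55 + 67 + 68 + 25}{5} = \frac{300}{5} = 60$$

  2. The point estimate of $\mu$, the unknown nationwide mean winter wheat yield for all 50 states, is $\bar{x} = 60$ bushels per acre.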
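
The arithmetic in Example 1 can be checked with a few lines of Python. This is only a minimal sketch using the five state yields listed in the example; the variable names are illustrative.

```python
from statistics import mean

# Winter wheat yields (bushels per acre) for the five sampled states in Example 1.
yields = {"California": 85, "Georgia": 55, "Illinois": 67, "Ohio": 68, "Texas": 25}

# The sample mean serves as the point estimate of the population mean.
x_bar = mean(list(yields.values()))
print(f"Point estimate of the population mean yield: {x_bar} bushels per acre")
```

The same two lines of computation apply to any other sample of yields, such as the one in the Your Turn exercise that follows.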

NOW YOU CAN DO

Exercises 11–14.

YOUR TURN #1

See Example 1. The USDA reports the yields for Colorado, Indiana, Maryland, Michigan, and Pennsylvania to be 36, 68, 65, 70, and 63 bushels, respectively.

  1. Find the sample mean yield $\bar{x}$.
  2. Express $\bar{x}$ as the point estimate of $\mu$, the unknown population mean winter wheat yield for all 50 states.

(The solutions are shown in Appendix A.)

However, because a sample is only a small subset of the population, generalizing from a sample to the population carries the risk that the point estimate may not be very accurate. For example, do you think that the population mean yield of winter wheat exactly equals our point estimate of 60 bushels per acre? It's not likely, because we learned in Example 1 of Chapter 7 (page 397) that different samples will produce different sample means, and thus different point estimates of $\mu$. Our point estimate $\bar{x}$ may be close to $\mu$ or it may be far from $\mu$. In other words, we have no measure of confidence that our particular point estimate is close to $\mu$. There has to be a better way, and there is: confidence intervals, the subject of this chapter.

2 The Z Interval for the Population Mean

Although we cannot measure how confident we are of $\bar{x}$ as a point estimate for $\mu$, we can use the point estimate to find an interval that is likely to contain $\mu$. Suppose we are interested in estimating the mean height of the students at your school. The students in your class are a sample of the population of students at your school, so we can calculate the sample mean height of the students in your class to be $\bar{x} = 67.5$ inches (5 feet 7.5 inches tall).

We are 90% confident that $\mu$ lies between 66.5 inches and 68.5 inches.

We may then use $\bar{x} = 67.5$ inches as a point estimate of the unknown population mean height $\mu$ of all students at your school. However, this estimate is not likely to be exactly correct. To address this uncertainty in our estimate, we can use a range of heights instead, such as 67.5 inches, give or take an inch, which we write

$$67.5 \pm 1$$

and would equal the interval

$$(67.5 - 1,\ 67.5 + 1) = (66.5,\ 68.5)$$

The “1 inch” is called the margin of error. We might then say that we are 90% confident that the mean height of all students at our school lies in the interval (66.5 inches, 68.5 inches).

To increase the confidence in our estimate, we increase the margin of error, so that we might say we are 95% confident that the mean height of all students at our school lies in the interval $67.5 \pm 2$ inches, that is, the interval (65.5 inches, 69.5 inches). These two intervals are examples of what are called confidence intervals.

A confidence interval is an estimate of a parameter consisting of an interval of numbers based on a point estimate, together with a confidence level specifying the probability that the interval contains the parameter.

For example, our estimate that the mean height of all students at our school would lie in the interval (66.5 inches, 68.5 inches) was reported with confidence level 90%.

Confidence intervals are often reported in the format:

(lower bound, upper bound)

In the 90% confidence interval above, we have lower bound = 66.5 and upper bound = 68.5.

Try not to confuse confidence interval with confidence level. A confidence interval is an interval of values on the number line. A confidence level is a percent, like 95%.

We use the $\alpha$ (alpha) notation here, writing the confidence level as $(1-\alpha)100\%$, because it ties in with the notation we will need in Chapter 9, Hypothesis Testing.

A confidence level of 90% for a confidence interval means that, if we repeat the experiment 10 times, we would expect about 9 out of 10 times (90%) that the interval (lower bound, upper bound) will capture the population parameter.

Recall that, in previous chapters, we calculated probabilities for normal distributions using the standard normal $Z$. We can use $Z$ to develop the formula for the confidence intervals for the population mean.

But before we do so, we need to define some notation.

FIGURE 1 $Z_{\alpha/2}$ is the value of $Z$ that has area $\alpha/2$ to the right of it.

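Next, we use the facts we learned in Chapter 7 about the sampling distribution of the sample mean to develop the formula for the confidence interval for the mean. Let $Z_{\alpha/2}$ denote the value of $Z$ that has area $\alpha/2$ to the right of it. By the symmetry of the standard normal curve, the area between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ is $1 - \alpha$; that is, $-Z_{\alpha/2} < Z < Z_{\alpha/2}$ with probability $1 - \alpha$. When the sampling distribution of $\bar{x}$ is normal or approximately normal, standardizing $\bar{x}$ gives

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

Plugging this formula for $Z$ back into the earlier inequality, $-Z_{\alpha/2} < Z < Z_{\alpha/2}$, gives

$$-Z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}$$

We then use algebra to isolate $\mu$ as the middle term:

$$\bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Therefore, because areas represent probabilities, we can write

$$P\left(\bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

The quantities on either side of $\mu$ in this inequality represent the lower bound and the upper bound for a $(1-\alpha)100\%$ confidence interval for $\mu$. This confidence interval for $\mu$ is based on the standard normal distribution, so it is called the $Z$ interval for the population mean $\mu$.

To use the $Z$ interval for $\mu$, the value of $\sigma$ must be known.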

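Z Interval for the Population Mean

The $Z$ interval for $\mu$ may be constructed only when either of the following two conditions is met:

  • The population is normally distributed, and the value of $\sigma$ is known.
  • The sample size is large ($n \geq 30$), and the value of $\sigma$ is known.

When a random sample of size $n$ is taken from a population, a $(1-\alpha)100\%$ confidence interval for $\mu$ is given by

$$\text{lower bound} = \bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \qquad \text{upper bound} = \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where $(1-\alpha)100\%$ is the confidence level. The interval can also be written as

$$\bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

and is denoted

$$(\text{lower bound},\ \text{upper bound})$$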
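
For readers who prefer software to tables, the Z interval for the population mean can be sketched as a small Python function. This is a minimal illustration, not a reference implementation: the function name `z_interval` and the sample inputs are hypothetical, and the critical value comes from the standard library's `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Return (lower bound, upper bound) of a Z interval for the population mean.

    Assumes sigma (the population standard deviation) is known and that the
    population is normal or the sample size is large (n >= 30).
    """
    alpha = 1 - confidence
    # Critical value: the Z value with area alpha/2 to its right.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical inputs, chosen only to illustrate the call.
lower, upper = z_interval(x_bar=50.0, sigma=10.0, n=100, confidence=0.95)
print(f"95% Z interval: ({lower:.2f}, {upper:.2f})")
```

Because the function uses the unrounded critical value, its bounds may differ in the last digit from hand calculations that use rounded table values.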

EXAMPLE 2 Determining whether the $Z$ interval for $\mu$ may be used

For the following situations, state whether the $Z$ confidence interval for the population mean may be used. Assume the population is normally distributed.

  1. The sample size is 30, but the value of $\sigma$ is unknown.
  2. The sample size is 30, and the value of $\sigma$ is known.

Solution

  1. Even though the population is normally distributed, the value of the population standard deviation $\sigma$ is unknown, so the $Z$ interval may not be used.
  2. Now the value of $\sigma$ is known, and the population is normally distributed, so the $Z$ interval may be used.
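
The decision rule applied in Example 2 can be written as a one-line check. The sketch below is illustrative only; the function name `z_interval_allowed` and its parameters are hypothetical, and the rule encoded is the one stated in this section: the population standard deviation must be known, and either the population is normal or the sample size is at least 30.

```python
def z_interval_allowed(population_normal, n, sigma_known):
    """Return True if the conditions for the Z interval are met."""
    return sigma_known and (population_normal or n >= 30)

# The two situations from Example 2 (normal population, n = 30).
print(z_interval_allowed(population_normal=True, n=30, sigma_known=False))  # part 1
print(z_interval_allowed(population_normal=True, n=30, sigma_known=True))   # part 2
```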

NOW YOU CAN DO

Exercises 15–20.

YOUR TURN #2

For the following situations, state whether the $Z$ confidence interval for the population mean may be used:

  1. The population is not normally distributed, and the sample size is 10. The value of $\sigma$ is known.
  2. The population is not normally distributed, and the sample size is 100. The value of $\sigma$ is known.

(The solutions are shown in Appendix A.)

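Two important results from Chapter 7 form the conditions that allow us to construct the $Z$ interval for $\mu$:

  • If the population is normally distributed, then the sampling distribution of $\bar{x}$ is normal for any sample size $n$.
  • If the sample size is large ($n \geq 30$), then by the Central Limit Theorem the sampling distribution of $\bar{x}$ is approximately normal, regardless of the shape of the population.

Table 1 provides a listing of $Z_{\alpha/2}$ values for the most common confidence levels. The Normal Density Curve applet may be used to find $Z_{\alpha/2}$ critical values for confidence levels not listed in Table 1.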

Table 1 $Z_{\alpha/2}$ values for common confidence levels

Confidence level $(1-\alpha)100\%$    $\alpha$    $\alpha/2$    $Z_{\alpha/2}$
80%                                   0.20        0.10          1.28
90%                                   0.10        0.05          1.645
95%                                   0.05        0.025         1.96
99%                                   0.01        0.005         2.576
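
The critical values in Table 1 can also be computed directly. The following is a minimal sketch using Python's standard library; the helper name `z_critical` is hypothetical. It finds the value of Z with area α/2 to its right by inverting the standard normal cumulative distribution.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return the critical value with area alpha/2 to its right."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: critical value = {z_critical(level):.3f}")
```

The same helper works for confidence levels not listed in the table, such as 98%.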

EXAMPLE 3 Finding the value of $Z_{\alpha/2}$

For the following situations, find the value of $Z_{\alpha/2}$:

  1. Confidence level = 95%

Solution

  1. From Table 1, we have $Z_{\alpha/2} = 1.96$. We mentioned this case earlier (page 430), and in Example 35 of Chapter 6 (page 362).
  2. Table 1 gives us $Z_{\alpha/2}$.

NOW YOU CAN DO

Exercises 21–26.

YOUR TURN #3

For the following situations, find the value of $Z_{\alpha/2}$:

  1. Confidence level = 99%

(The solutions are shown in Appendix A.)

EXAMPLE 4 Constructing a confidence interval for the mean of a normal population

The College Board reports that the scores on the 2014 SAT Math test were normally distributed. A sample of 25 SAT scores had a mean of $\bar{x} = 510$. Assume that the population standard deviation of such scores is $\sigma = 118$. Construct a 90% confidence interval for the population mean score on the 2014 SAT Math test.

Be careful! In order to use the $Z$ interval for $\mu$, the population standard deviation $\sigma$ must be known, not just the sample standard deviation $s$. If the word problem provides the sample standard deviation $s$ but not the population standard deviation $\sigma$, then you cannot use the $Z$ interval. You might be able to use the $t$ confidence interval for $\mu$ (Section 8.2).

Solution

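Because the population is normal and the population standard deviation is known, the requirements for the $Z$ interval are met.

We are given $n = 25$, $\bar{x} = 510$, and $\sigma = 118$. From Table 1, we have $Z_{\alpha/2} = 1.645$. Thus,

$$\text{lower bound} = \bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 510 - 1.645\,\frac{118}{\sqrt{25}} \approx 510 - 38.8 = 471.2$$

$$\text{upper bound} = \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 510 + 1.645\,\frac{118}{\sqrt{25}} \approx 510 + 38.8 = 548.8$$

We are 90% confident that the population mean score on the 2014 Mathematics SAT test lies between 471.2 and 548.8.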
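
The interval in Example 4 can be reproduced numerically. In the sketch below, the sample mean 510 is the midpoint of the reported interval (471.2, 548.8), and the population standard deviation 118 is an assumption inferred from the reported interval width, so treat both as illustrative inputs rather than confirmed data. The critical value 1.645 is the Table 1 value for 90% confidence.

```python
from math import sqrt

x_bar = 510   # midpoint of the reported interval (471.2, 548.8)
sigma = 118   # assumed value, inferred from the reported interval width
n = 25        # sample size stated in Example 4
z = 1.645     # Table 1 critical value for 90% confidence

margin = z * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"Margin of error: {margin:.1f}")
print(f"90% confidence interval: ({lower:.1f}, {upper:.1f})")
```

Changing `z` to the 95% value from Table 1 gives the wider interval asked for in the Your Turn exercise.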

NOW YOU CAN DO

Exercises 27–30.

YOUR TURN #4

For the scenario in Example 4, construct a 95% confidence interval for the population mean score on the 2014 SAT Math test.

(The solution is shown in Appendix A.)

What Does This Confidence Interval Mean?

What does the 90% in the phrase 90% confidence interval mean? If we take sample after sample for a very long time, then in the long run, the proportion of intervals that will contain the population mean will equal 90%.

Interpreting Confidence Intervals

You may use the following generic interpretation for the confidence intervals that you construct: “We are 90% (or 95% or 99% and so on) confident that the population mean__________(for example, SAT Math score) lies between__________ (lower bound) and__________(upper bound).”

The $Z$ interval for the population mean takes the form

$$\text{point estimate} \pm \text{margin of error}$$

where the point estimate equals the sample mean $\bar{x}$ and the margin of error equals $Z_{\alpha/2}(\sigma/\sqrt{n})$.

The margin of error $E$ is a measure of the precision of the confidence interval estimate. For the $Z$ interval, the margin of error takes the form $E = Z_{\alpha/2}(\sigma/\sqrt{n})$. Smaller values of $E$ indicate greater precision.

Later in this section (page 437) we learn ways to reduce the margin of error.

For example, the confidence interval from Example 4 has the form

$$\text{point estimate} \pm \text{margin of error} = 510 \pm 38.8$$

EXAMPLE 5 Constructing a $Z$ interval for the population mean for a large sample size

Case Study: Motor Vehicle Fuel Efficiency

One of the variables in our case study is City MPG, which is the number of miles a vehicle can travel in city conditions on one gallon of gas. Because we have information on the entire population of 1141 vehicles, we know the population standard deviation $\sigma$. We obtained a sample of $n = 100$ vehicles and observed a sample mean city gas mileage of $\bar{x} = 20.71$ mpg.

  1. Determine whether the requirements are met for constructing the $Z$ interval for $\mu$.
  2. Construct a 90% confidence interval for $\mu$, the population mean City MPG for all vehicles.
  3. Interpret the confidence interval.

Note: As a check on your arithmetic, make sure that (lower bound + upper bound)/2 = $\bar{x}$.

In other words, the sample mean should lie exactly midway between the lower bound and the upper bound.

Solution

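  1. We are not given any information about the distribution of the population, so we don't know if the population is normally distributed. However, the sample size $n = 100$ is greater than 30, and the value of $\sigma$ is known; therefore, we can proceed to construct the confidence interval.
  2. The formula for the confidence interval is given by

    $$\text{lower bound} = \bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \qquad \text{upper bound} = \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

    We are given $n = 100$, $\bar{x} = 20.71$, and the value of $\sigma$. For a confidence level of 90%, Table 1 provides the value of $Z_{\alpha/2} = 1.645$. Plugging into the formula:

    $$\bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 20.71 \pm 1.645\,\frac{\sigma}{\sqrt{100}} \approx 20.71 \pm 0.93$$

    so the lower bound is $20.71 - 0.93 = 19.78$ and the upper bound is $20.71 + 0.93 = 21.64$.

  3. We are 90% confident that $\mu$, the population mean City MPG for all motor vehicles, lies between 19.78 mpg and 21.64 mpg. (See Figure 2.)

    FIGURE 2 90% confidence interval for the population mean City MPG.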

NOW YOU CAN DO

Exercises 31–34.

YOUR TURN #5

For the scenario in Example 5, construct a 99% confidence interval for $\mu$, the population mean City MPG for all vehicles.

(The solution is shown in Appendix A.)

The Confidence Interval applet allows you to see for yourself how individual samples generate intervals that either do or do not contain the population mean.

Developing Your Statistical Sense

What Is Random Here?

It is important to understand that it is the interval that is random, not the population mean $\mu$. The interval is formed by sample statistics such as $\bar{x}$, and for each different sample we get different values for the statistics. So the interval is random because it is constructed using $\bar{x}$, which is also random. The population mean $\mu$, though usually unknown, is nevertheless constant.

We generated 10 samples of size 100 vehicles from the Fuel Efficiency data set, and observed the City MPG of each vehicle. For each sample, a 90% confidence interval for the population mean City MPG was constructed. The results are shown in Figure 3. Note that, because we have the entire population of 1141 vehicles, we know the population mean City MPG $\mu$, which is also shown in Figure 3. Note that the confidence intervals are random, whereas $\mu$ is constant. The confidence intervals are random because they are based on the different values that the sample mean $\bar{x}$ takes with each sample. The randomness involved in the sampling leads to the randomness of the values of $\bar{x}$. (This relates to what we learned in Chapter 7: the sample mean is a random variable that has its own distribution, the sampling distribution.)

Now, the confidence interval from our sample in Example 5 is shown as the first confidence interval, and is rounded to (19.8, 21.6). Note that this confidence interval happened to “capture” the population mean $\mu$. However, one of the confidence intervals did not capture the population mean (the red one). It turns out that 9 out of 10 of the samples (90%) produced confidence intervals that contained $\mu$. But it did not have to turn out this way. The 90% refers to the proportion of intervals that will contain $\mu$ after a great many samples are taken.

FIGURE 3 The confidence intervals are random; $\mu$ is constant.
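
The long-run meaning of the 90% confidence level can be explored with a short simulation. The sketch below does not use the Fuel Efficiency data; it draws repeated samples from a hypothetical normal population whose mean and standard deviation (20 and 5) are made up for illustration, builds a 90% Z interval from each sample, and counts how often the interval captures the fixed population mean.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA = 20.0, 5.0   # hypothetical population mean and standard deviation
N, REPS = 100, 2000     # sample size and number of repeated samples

z = NormalDist().inv_cdf(0.95)   # critical value for 90% confidence
margin = z * SIGMA / sqrt(N)     # the same margin of error for every sample

hits = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    # The interval is random (it depends on x_bar); MU is constant.
    if x_bar - margin < MU < x_bar + margin:
        hits += 1

coverage = hits / REPS
print(f"Proportion of intervals containing the population mean: {coverage:.3f}")
```

With many repetitions the proportion should settle near 0.90, although any small batch of intervals, such as ten, can easily contain more or fewer misses than one.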

EXAMPLE 6 $Z$ intervals for $\mu$ using technology

Case Study: Motor Vehicle Fuel Efficiency

Another of the variables in our case study is Highway MPG, which is the number of miles a vehicle can travel on a highway on one gallon of gas. We know the population standard deviation $\sigma$. The sample of $n = 16$ vehicles, shown here, has a sample mean highway gas mileage of $\bar{x} \approx 26.94$ mpg.

Vehicle                  Highway MPG    Vehicle             Highway MPG
Honda CR-V               30             Subaru Impreza      25
Nissan Pathfinder        26             Ford Mustang        26
Acura MDX                28             Cadillac ATS        31
Porsche Cayenne          29             Chevrolet Camaro    24
Mercedes-Benz GLK 250    33             Ford Taurus         29
Chevrolet Chevy SS       21             Ford Expedition     20
Dodge Charger            27             Lincoln MKT         25
Jeep Compass             23             BMW X1              34

  1. Determine whether the $Z$ interval for $\mu$ may be applied.
  2. Use the TI-83/84, Minitab, and JMP to construct a 95% confidence interval for the population mean Highway MPG.

Solution

  1. The sample size is not large ($n = 16 < 30$), so we need to check if the data follow a normal distribution. The normal probability plot of the data in Figure 4 supports the assumption of normality. Further, the population standard deviation $\sigma$ is known. We may thus apply the $Z$ interval for $\mu$.

    FIGURE 4 Normal probability plot of the Highway MPG data.
  2. We shall use the instructions provided in the Step-by-Step Technology Guide at the end of this section (page 441). The results for the TI-83/84 in Figure 5 show that the 95% confidence interval for the population mean Highway MPG is approximately

    (lower bound = 23.84, upper bound = 30.04)

    Figure 5 also shows the sample mean $\bar{x}$, the sample standard deviation $s$, and the sample size $n$.

    FIGURE 5 TI-83/84 results.

The Minitab results are provided in Figure 6. The “assumed standard deviation” is indicated to be $\sigma$. Then the sample size $n$, the sample mean $\bar{x}$, and the sample standard deviation $s$ are displayed. “SE Mean” refers to the standard error of the mean, but we don't need it here. Finally, the 95% confidence interval is given as (lower bound = 23.84, upper bound = 30.04).

FIGURE 6 Minitab results.

The JMP results are shown in Figure 7. The sample mean $\bar{x}$ is shown in the first column, with the sample standard deviation $s$ below it. The 95% confidence interval is given in the row labeled Mean, with lower bound = 23.84 (rounded) and upper bound = 30.04 (rounded).

FIGURE 7 JMP results.
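
The summary statistics behind Example 6 can also be checked in Python. This sketch is not a substitute for the TI-83/84, Minitab, or JMP steps in the example; it only recomputes the sample size, sample mean, and sample standard deviation from the 16 Highway MPG values in the table, and confirms that the sample mean sits midway between the reported bounds (23.84, 30.04).

```python
from statistics import mean, stdev

# Highway MPG values for the 16 sampled vehicles, read from the table in Example 6.
highway_mpg = [30, 26, 28, 29, 33, 21, 27, 23,
               25, 26, 31, 24, 29, 20, 25, 34]

n = len(highway_mpg)
x_bar = mean(highway_mpg)   # sample mean
s = stdev(highway_mpg)      # sample standard deviation (divides by n - 1)

# Arithmetic check: the sample mean should lie midway between the reported bounds.
lower, upper = 23.84, 30.04
midpoint = (lower + upper) / 2

print(f"n = {n}, sample mean = {x_bar:.4f}, sample standard deviation = {s:.2f}")
print(f"Midpoint of reported interval = {midpoint:.2f}")
```

Note that the Z interval itself uses the known population standard deviation, not the sample standard deviation computed here.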

3 Ways to Reduce the Margin of Error

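Remember that the “±” notation always represents a pair of numbers.

Recall that the $Z$ interval for $\mu$ takes the form

$$\text{point estimate} \pm E = \bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where $E = Z_{\alpha/2}(\sigma/\sqrt{n})$ is the margin of error. We interpret the margin of error $E$ for a $(1-\alpha)100\%$ confidence interval for $\mu$ as follows:

“We can estimate $\mu$ to within $E$ units with $(1-\alpha)100\%$ confidence.”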

EXAMPLE 7 Finding and interpreting the margin of error

In Example 5, the $Z$ interval for the population mean city gas mileage for all motor vehicles is:

(lower bound = 19.78, upper bound = 21.64)

  1. Find the margin of error $E$.
  2. Express the confidence interval in the form “point estimate ± margin of error.”
  3. Interpret the margin of error $E$.

Solution

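  1. We find the margin of error as follows:

    $$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1.645\,\frac{\sigma}{\sqrt{100}} \approx 0.93$$

    Equivalently, $E$ is half the width of the interval: $(21.64 - 19.78)/2 = 0.93$.
  2. The point estimate is $\bar{x} = 20.71$. Thus, the 90% confidence interval for the population mean city gas mileage for all motor vehicles takes the following form:

    $$\text{point estimate} \pm \text{margin of error} = 20.71 \pm 0.93$$

  3. We interpret the margin of error by saying that we can estimate the population mean city gas mileage for all vehicles to within 0.93 mpg with 90% confidence.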

NOW YOU CAN DO

Exercises 35–42.

Note: When it comes to the margin of error $E$, smaller is better!

Of course, we want our confidence interval estimates to be as precise as possible. Therefore, we want the margin of error to be as small as possible, which would in turn result in a tighter confidence interval. Tighter confidence intervals are better, because the likely maximum difference between the sample mean and the population mean is reduced.

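So how do we reduce the size of the margin of error? Let's look at the margin of error for the $Z$ interval:

$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

The population standard deviation $\sigma$ is fixed, so only $Z_{\alpha/2}$ and $n$ can vary. There are therefore two strategies for decreasing the margin of error:

  • Strategy 1: Decrease the confidence level, which decreases $Z_{\alpha/2}$.
  • Strategy 2: Increase the sample size $n$.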

EXAMPLE 8 Decreasing the margin of error by decreasing the confidence level

For the confidence interval for the population mean city gas mileage in Example 5, suppose we reduce the confidence level from 90% to 80% and leave everything else unchanged. Find the new margin of error. Describe how the margin of error has changed.

Solution

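From Example 5, we have the margin of error for the 90% confidence interval for $\mu$ as follows:

$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1.645\,\frac{\sigma}{\sqrt{100}} \approx 0.93$$

Decreasing the confidence level from 90% to 80% decreases $Z_{\alpha/2}$ from 1.645 to 1.28. This gives us the margin of error for the 80% confidence interval as:

$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1.28\,\frac{\sigma}{\sqrt{100}} \approx 0.72$$

Decreasing the confidence level from 90% to 80% decreases the margin of error from 0.93 mpg to 0.72 mpg.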

Developing Your Statistical Sense

There's No Free Lunch

The margin of error in Example 8 is smaller than the one in Example 5, which is good because it gives a more precise estimate of $\mu$. However, this smaller margin of error is due entirely to the decrease in the confidence level, which is not good. In statistical data analysis, there is rarely a free lunch. The trade-off here is that, while the margin of error went down, so did the confidence level, from 90% to 80%. On the other hand, confidence intervals that are too wide can be useless. For example, we can be 99.9999% confident that the population mean age of college students in Florida lies between 15 and 75 years old. But, so what? The interval is too wide to be of practical use. More useful would be a 95% confidence interval that the population mean age of college students in Florida lies between 20 and 27.

This leads us to Strategy 2 for reducing the margin of error: increase the sample size. The only way to have both high confidence and a tight interval is to boost the sample size.

EXAMPLE 9 Decreasing the margin of error by increasing the sample size

For the confidence interval for the population mean city gas mileage in Example 5, suppose the results were based on a sample of size $n = 400$ instead of $n = 100$. Leaving everything else unchanged, find the new margin of error, and describe how the margin of error has changed.

Solution

For $n = 400$, the margin of error is

$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1.645\,\frac{\sigma}{\sqrt{400}} \approx 0.46$$

Increasing the sample size from $n = 100$ to $n = 400$ has decreased the margin of error from 0.93 mpg to 0.46 mpg.
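
Both strategies can be compared side by side in code. The sketch below is illustrative: the function name `margin_of_error` is hypothetical, and the population standard deviation 5.65 is an assumed stand-in value, chosen only because it gives margins close to those reported in Examples 5, 8, and 9. It is not taken from the case study data.

```python
from math import sqrt

def margin_of_error(z, sigma, n):
    """Margin of error for a Z interval: critical value times sigma over root n."""
    return z * sigma / sqrt(n)

SIGMA = 5.65  # assumed, hypothetical population standard deviation (mpg)

baseline = margin_of_error(z=1.645, sigma=SIGMA, n=100)          # 90% confidence, n = 100
lower_confidence = margin_of_error(z=1.28, sigma=SIGMA, n=100)   # Strategy 1: 80% confidence
larger_sample = margin_of_error(z=1.645, sigma=SIGMA, n=400)     # Strategy 2: n = 400

print(f"Baseline (90%, n = 100):     {baseline:.2f}")
print(f"Strategy 1 (80%, n = 100):   {lower_confidence:.2f}")
print(f"Strategy 2 (90%, n = 400):   {larger_sample:.2f}")
```

Because the sample size enters through a square root, quadrupling the sample size halves the margin of error while the confidence level stays at 90%.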

“More data” is a familiar refrain in statistical analysis. Of course, increasing the sample size often raises pocketbook issues, because large samples can get very expensive (“We want a large-sample estimate of the amount of damage sustained by Corvettes hitting a wall at 90 mph”). Sometimes obtaining large samples is simply impossible. Suppose an astronomer has developed a new technique for predicting corona effects during solar eclipses; she will have to wait a while (say, a few hundred years) to build up a large sample. So, take samples as large as realistically possible to keep the width of the confidence interval as narrow as possible.

Increasingly, technology is being used to perform statistical analysis, including confidence intervals. Therefore, it is important to know how to read and interpret confidence intervals provided by software output. For instance, Example 10 shows how to calculate the margin of error $E$ when the software gives you only the lower bound and upper bound of the confidence interval.

EXAMPLE 10 Finding the margin of error, given the lower and upper bounds

Figure 8 shows the results for a 95% confidence interval for $\mu$, where $\mu$ represents the population mean score on the SAT Math test. Do the following:

  1. Report the confidence interval in the form “(lower bound, upper bound).”
  2. Interpret the confidence interval.
  3. Calculate the margin of error for the confidence interval.
  4. Interpret the margin of error.

FIGURE 8 TI-83/84 output for a $Z$-interval for $\mu$.

Solution

  1. The TI-83/84 output gives us the following confidence interval:

    (lower bound = 482.28, upper bound = 537.72)

  2. We interpret this confidence interval as follows: We are 95% confident that the population mean score on the SAT Math test lies between 482.28 and 537.72.

  3. Here, we show how to calculate the margin of error, given the lower bound and upper bound of the confidence interval. The confidence interval from part 1 is illustrated in Figure 9.

    FIGURE 9 Margin of error equals half the width of the confidence interval.

    Now, the width of a confidence interval is:

    $$\text{width} = \text{upper bound} - \text{lower bound}$$

    In Figure 9, the width of our confidence interval is:

    $$537.72 - 482.28 = 55.44$$

    Then, the margin of error is half this width, as shown in Figure 9. This gives us a margin of error of

    $$E = \frac{55.44}{2} = 27.72$$

  4. We interpret the margin of error by saying that we can estimate the population mean Math SAT score to within 27.72 points with 95% confidence.

NOW YOU CAN DO

Exercises 43–46.

In general, when the lower bound and upper bound of the confidence interval for $\mu$ have already been found, then the margin of error may be calculated as follows:

$$E = \frac{\text{upper bound} - \text{lower bound}}{2}$$
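
When software reports only the two bounds, both the margin of error and the point estimate can be recovered with simple arithmetic. The helper names below are hypothetical; the inputs are the bounds from Example 10.

```python
def margin_from_bounds(lower, upper):
    """Margin of error: half the width of the confidence interval."""
    return (upper - lower) / 2

def point_estimate_from_bounds(lower, upper):
    """Point estimate: the midpoint of the confidence interval."""
    return (lower + upper) / 2

lower, upper = 482.28, 537.72   # bounds reported in Example 10
E = margin_from_bounds(lower, upper)
x_bar = point_estimate_from_bounds(lower, upper)
print(f"Margin of error: {E:.2f}")
print(f"Point estimate:  {x_bar:.2f}")
```

The midpoint calculation doubles as the arithmetic check suggested earlier in this section: the sample mean should lie exactly midway between the two bounds.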

4 Sample Size for Estimating the Population Mean

In general, more data implies more precise results. In fact, when samples are plentiful and cheap, arbitrarily precise confidence intervals with arbitrarily high confidence are possible simply by taking sufficiently large samples.

Therefore, the question arises: How large a sample size do I need to get a tight confidence interval with a high confidence level?

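Note: We solve for $n$ as follows:

$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Multiply both sides by $\sqrt{n}$:

$$E\sqrt{n} = Z_{\alpha/2}\,\sigma$$

Divide both sides by $E$:

$$\sqrt{n} = \frac{Z_{\alpha/2}\,\sigma}{E}$$

Square both sides to get the formula for $n$:

$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2$$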

Sample Size for Estimating the Population Mean

The sample size for a $Z$ interval that estimates the population mean $\mu$ to within a margin of error $E$ with $(1-\alpha)100\%$ confidence is given by

$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2$$

where $Z_{\alpha/2}$ is the value associated with the desired confidence level (Table 1), $E$ is the desired margin of error, and $\sigma$ is the population standard deviation. By convention, whenever this formula yields a sample size with a decimal, always round up to the next whole number.
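
The sample size formula, including the round-up convention, can be sketched as follows. The function name `required_sample_size` is hypothetical, and the standard deviation of 20,000 in the usage line is a made-up value for illustration only; it is not the value used in any example in this section.

```python
from math import ceil

def required_sample_size(z, sigma, margin):
    """Smallest whole-number sample size giving at most the desired margin of error.

    z      : critical value for the desired confidence level (Table 1)
    sigma  : population standard deviation (assumed known)
    margin : desired margin of error, in the same units as sigma
    """
    return ceil((z * sigma / margin) ** 2)

# Hypothetical inputs: 95% confidence, sigma = 20,000, margin of error = 1,000.
n = required_sample_size(z=1.96, sigma=20_000, margin=1_000)
print(f"Required sample size: {n}")
```

Here the formula gives about 1536.64 before rounding, so the function returns the next whole number. Note how shrinking the desired margin of error by a factor of 10 multiplies the required sample size by roughly 100.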

EXAMPLE 11 Sample size for estimating the population mean

We round up because (a) the sample size must be a whole number and (b) rounding down will lead to a value of $n$ with less than the desired confidence level.

Suppose we want to estimate to within $1000 the mean salary of all college graduates who were business majors. Assume that the population standard deviation of such salaries is known. How many business majors would we sample to estimate the mean salary to within $1000 with 95% confidence?

Solution

“Within $1000” means that the desired margin of error is $1000, and 1.96 is the value associated with 95% confidence. Substituting into the formula for the sample size, we get:

$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.96\,\sigma}{1000}\right)^2$$

Now, when finding the required sample size, if the formula results in a decimal, we always round up to the next whole number. Thus, the sample size we need for a confidence level of 95% is this result, rounded up.

NOW YOU CAN DO

Exercises 47–54.

YOUR TURN #6

For the situation in Example 11, suppose we now needed our estimate to be within only $100 of the mean salary. How many business majors would we sample to estimate the mean salary to within $100 with 95% confidence?

(The solution is shown in Appendix A.)