OBJECTIVES By the end of this section, I will be able to …
We have seen how confidence intervals can be used to estimate the unknown value of a population mean or a population proportion. However, the variability of a population is also important. As we have learned, less variability is usually better. For example, a tool manufacturer relies on a quality control technician (who has a strong background in statistics) to make sure that the tools the company is making do not vary appreciably from the required specifications. Otherwise, the tools may be too large or too small. Data analysts therefore construct confidence intervals to estimate the unknown value of the population parameters that measure variability: the population variance $\sigma^2$ and the population standard deviation $\sigma$.
We first need to become acquainted with the $\chi^2$ (chi-square) distribution, which is used to construct these confidence intervals.
1 Properties of the $\chi^2$ (Chi-Square) Distribution
The $\chi^2$ (pronounced ky-square, to rhyme with “my square”) distribution was discovered in 1875 by the German physicist Friedrich Helmert and further developed in 1900 by the English statistician Karl Pearson. It is a continuous distribution, so the random variable $\chi^2$ is continuous.
Just as we did with the normal and $t$ distributions, we can find probabilities associated with values of $\chi^2$, and vice versa. As with any continuous distribution, probability is represented by the area under the curve above an interval. We examine the properties of the $\chi^2$ distribution and then learn how to use the $\chi^2$ table to find the critical values of the $\chi^2$ distribution.
Properties of the $\chi^2$ Distribution
To construct the confidence intervals in this section, we will need to find the critical values of a $\chi^2$ distribution for the given confidence level $100(1-\alpha)\%$, using either the $\chi^2$ table (Table E in the Appendix) or technology. The $\chi^2$ table is somewhat similar to the $t$ table (Table D in the Appendix); both tables show the degrees of freedom in the left column. The area to the right of the $\chi^2$ critical value is given across the top of the table.
The $\chi^2$ distribution is not symmetric, so we cannot construct the confidence interval for $\sigma^2$ using the “point estimate ± margin of error” method. Instead, the lower bound and upper bound for the confidence interval are determined using two critical values:
$\chi^2_{1-\alpha/2}$ = the value of the $\chi^2$ distribution with area $1-\alpha/2$ to its right (Figure 35)
$\chi^2_{\alpha/2}$ = the value of the $\chi^2$ distribution with area $\alpha/2$ to its right (Figure 35)
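For instance, for a 95% confidence interval, $1-\alpha/2 = 0.975$ and $\alpha/2 = 0.025$. Thus, $\chi^2_{1-\alpha/2} = \chi^2_{0.975}$ represents the value of the $\chi^2$ distribution with area 0.975 to the right of the critical value. The second critical value, $\chi^2_{\alpha/2} = \chi^2_{0.025}$, represents the value of the $\chi^2$ distribution with area 0.025 to the right of the critical value.
EXAMPLE 23 Finding the $\chi^2$ critical values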
Note: If the appropriate degrees of freedom are not given in the table, the conservative solution is to take the next row with the smaller df.
Find the $\chi^2$ critical values for a 90% confidence interval, where we have a sample of size $n = 10$.
Solution
For a 90% confidence interval, $1-\alpha = 0.90$, so $\alpha = 0.10$, $1-\alpha/2 = 0.95$, and $\alpha/2 = 0.05$.
So we are seeking (1) $\chi^2_{0.95}$, the critical value with area 0.95 to the right of it, and (2) $\chi^2_{0.05}$, the critical value with area 0.05 to the right of it.
Because $n = 10$, the degrees of freedom is $n - 1 = 9$. To find $\chi^2_{0.95}$ for df = 9, go across the top of the $\chi^2$ table (Table E in the Appendix) until you see 0.95 (Figure 36). $\chi^2_{0.95}$ is somewhere in that column. Now go down that column until you see your number of degrees of freedom, df = 9. Thus, for df = 9, $\chi^2_{0.95} = 3.325$. For a $\chi^2$ distribution with 9 degrees of freedom, there is area = 0.95 to the right of 3.325.
Similarly, $\chi^2_{0.05}$ is found in the column labeled “0.05” and the row corresponding to df = 9. We find that $\chi^2_{0.05} = 16.919$, as shown in Figure 37.
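The table lookup in Example 23 can also be checked numerically. The following Python sketch is one possible way to do so with only the standard library: it evaluates the area to the left of a value using a series for the regularized incomplete gamma function, then finds the critical value by bisection. The function names `chi2_cdf`, `chi2_right_tail_value`, and `chi2_critical_values` are illustrative assumptions, not part of the text or of any statistics package.

```python
import math


def chi2_cdf(x, df):
    """Area to the LEFT of x under the chi-square curve with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_right_tail_value(area_right, df):
    """Value with the given area to its RIGHT, located by bisection."""
    if not 0.0 < area_right < 1.0:
        raise ValueError("area_right must be strictly between 0 and 1")
    target = 1.0 - area_right  # convert to area on the left
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < target:  # widen until the target is bracketed
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def chi2_critical_values(confidence, n):
    """Return (chi2_{1-alpha/2}, chi2_{alpha/2}) for a sample of size n."""
    alpha = 1.0 - confidence
    df = n - 1
    return (chi2_right_tail_value(1.0 - alpha / 2.0, df),
            chi2_right_tail_value(alpha / 2.0, df))


# Example 23: 90% confidence, n = 10, so df = 9.
lower, upper = chi2_critical_values(0.90, 10)
print(f"{lower:.3f} {upper:.3f}")  # Table E lists 3.325 and 16.919 for df = 9
```

A numerical routine like this is not limited to the rows and columns printed in Table E, so the conservative "next smaller df" rule is only needed when working from the table.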
NOW YOU CAN DO
Exercises 9–16.
YOUR TURN #16
Find the $\chi^2$ critical values for a 95% confidence interval, where we have a sample of size .
(The solutions are shown in Appendix A.)
2 Constructing Confidence Intervals for the Population Variance and Standard Deviation
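We derive the formula for a confidence interval for the population variance $\sigma^2$. Suppose we take a random sample of size $n$ from a normal population with mean $\mu$ and standard deviation $\sigma$. Then the statistic
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
follows a $\chi^2$ distribution with $n-1$ degrees of freedom, where $s^2$ represents the sample variance. From Figure 35, we see that $100(1-\alpha)\%$ of the values of $\chi^2$ lie between $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$. These values are described as
$$\chi^2_{1-\alpha/2} < \frac{(n-1)s^2}{\sigma^2} < \chi^2_{\alpha/2}$$
Rearranging this inequality so that $\sigma^2$ is in the numerator gives us the formula for the $100(1-\alpha)\%$ confidence interval for $\sigma^2$:
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
Thus, the lower bound of the confidence interval for $\sigma^2$ is $(n-1)s^2/\chi^2_{\alpha/2}$, and the upper bound is $(n-1)s^2/\chi^2_{1-\alpha/2}$. Taking the square root of each gives us the lower and upper bounds of the confidence interval for $\sigma$.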
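The claim behind this derivation, that the interval captures the true variance in about $100(1-\alpha)\%$ of samples from a normal population, can be checked by simulation. The Python sketch below is a rough illustration under stated assumptions: the population mean and standard deviation (`mu=50.0`, `sigma=4.0`), the number of trials, and the function name `coverage` are all hypothetical choices, and the critical values are the Table E entries for df = 9 at 90% confidence.

```python
import random

# Table E critical values for df = 9 and a 90% confidence level.
CHI2_LOWER = 3.325   # chi-square value with area 0.95 to its right
CHI2_UPPER = 16.919  # chi-square value with area 0.05 to its right


def coverage(trials=20000, n=10, mu=50.0, sigma=4.0, seed=1):
    """Fraction of simulated samples whose interval contains the true variance.

    mu, sigma, trials, and seed are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    true_var = sigma ** 2
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
        lower = (n - 1) * s2 / CHI2_UPPER  # larger critical value -> lower bound
        upper = (n - 1) * s2 / CHI2_LOWER  # smaller critical value -> upper bound
        if lower < true_var < upper:
            hits += 1
    return hits / trials


print(f"Observed coverage: {coverage():.3f}")
```

The observed proportion should land near 0.90. Note that dividing by the larger critical value produces the lower bound, which is the reversal introduced when the inequality is rearranged.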
Confidence Interval for the Population Variance
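Suppose we take a sample of size $n$ from a normal population with mean $\mu$ and standard deviation $\sigma$. Then a $100(1-\alpha)\%$ confidence interval for the population variance $\sigma^2$ is given by
$$\text{lower bound} = \frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \qquad \text{upper bound} = \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
where $s^2$ represents the sample variance and $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ are the critical values for a $\chi^2$ distribution with $n-1$ degrees of freedom.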
Confidence Interval for the Population Standard Deviation
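A $100(1-\alpha)\%$ confidence interval for the population standard deviation $\sigma$ is then given by
$$\text{lower bound} = \sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}}, \qquad \text{upper bound} = \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}}$$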
EXAMPLE 24 Constructing confidence intervals for the population variance and population standard deviation
electricmiles
The accompanying table shows the miles-per-gallon equivalent (MPGe) for five electric cars, as reported by www.hybridcars.com in 2014. The normal probability plot in Figure 38 indicates that the data are normally distributed.
Electric Vehicle | Mileage (MPGe) |
---|---|
Tesla Model S | 89 |
Nissan Leaf | 99 |
Ford Focus | 105 |
Mitsubishi i-MiEV | 112 |
Chevrolet Spark | 119 |
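Construct a 95% confidence interval for the population variance $\sigma^2$ and for the population standard deviation $\sigma$ of MPGe.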
Solution
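There are $n = 5$ electric cars in our sample, so the degrees of freedom equal $n - 1 = 4$.
For a 95% confidence interval, $1-\alpha = 0.95$, so $\alpha = 0.05$, $1-\alpha/2 = 0.975$, and $\alpha/2 = 0.025$.
From the $\chi^2$ table (Table E in the Appendix), therefore, $\chi^2_{0.975} = 0.484$ and $\chi^2_{0.025} = 11.143$.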
Figures 39 through 41 show these results using Excel, Minitab, and JMP.
Figure 42 shows the descriptive statistics for MPGe, as obtained by the TI-83/84. The sample standard deviation is $s \approx 11.5845$, so the sample variance is $s^2 \approx 134.2$.
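Thus, our 95% confidence interval for $\sigma^2$ is given by
$$\text{lower bound} = \frac{(n-1)s^2}{\chi^2_{0.025}} = \frac{4(11.5845)^2}{11.143} \approx 48.17, \qquad \text{upper bound} = \frac{(n-1)s^2}{\chi^2_{0.975}} = \frac{4(11.5845)^2}{0.484} \approx 1109.09$$
We are 95% confident that the population variance $\sigma^2$ lies between 48.17 and 1109.09 miles per gallon squared, that is, (MPG)$^2$. (Recall that the variance is measured in units squared.) It is unclear what miles per gallon squared means, so we prefer to construct a confidence interval for the population standard deviation $\sigma$:
$$\text{lower bound} = \sqrt{48.17} \approx 6.94, \qquad \text{upper bound} = \sqrt{1109.09} \approx 33.30$$
We are 95% confident that the population standard deviation $\sigma$ lies between 6.94 and 33.30 miles per gallon. Figure 43 shows the two confidence intervals obtained using Minitab.
Figure 44 shows the confidence interval for $\sigma$ obtained using JMP. We are interested in the bottom row, which has the confidence interval for the population standard deviation $\sigma$.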
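The arithmetic in Example 24 can be reproduced with a few lines of Python. This is a minimal sketch, assuming the critical values are read from Table E for df = 4 rather than computed; the function name `variance_and_sd_intervals` is illustrative only.

```python
import math
import statistics

# MPGe values from the table in Example 24.
mpge = [89, 99, 105, 112, 119]

# Table E critical values for df = 4 and a 95% confidence level.
CHI2_0975 = 0.484   # area 0.975 to its right
CHI2_0025 = 11.143  # area 0.025 to its right


def variance_and_sd_intervals(sample, chi2_small, chi2_large):
    """Return ((var_lower, var_upper), (sd_lower, sd_upper)) for a normal sample."""
    n = len(sample)
    s2 = statistics.variance(sample)       # sample variance (divides by n - 1)
    var_lower = (n - 1) * s2 / chi2_large  # divide by the larger critical value
    var_upper = (n - 1) * s2 / chi2_small  # divide by the smaller critical value
    return (var_lower, var_upper), (math.sqrt(var_lower), math.sqrt(var_upper))


s2 = statistics.variance(mpge)
(var_lower, var_upper), (sd_lower, sd_upper) = variance_and_sd_intervals(
    mpge, CHI2_0975, CHI2_0025
)
print(f"s^2 = {s2:.1f}")
print(f"variance interval: ({var_lower:.2f}, {var_upper:.2f})")
print(f"std. dev. interval: ({sd_lower:.2f}, {sd_upper:.2f})")
```

Software that computes the critical values to more decimal places than Table E may report bounds that differ slightly in the last digit.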
NOW YOU CAN DO
Exercises 17–32.