10 Two-Sample Inference

10.4 Inference for Two Independent Standard Deviations

OBJECTIVES By the end of this section, I will be able to …

Describe the characteristics of the $F$ distribution and the $F$ test for two population standard deviations.
Perform hypothesis tests for two population standard deviations using the critical-value method.
Perform hypothesis tests for two population standard deviations using the $p$ -value method.

1 The $F$ Distribution and the $F$ Test

Sir Ronald A. Fisher

In Sections 10.1–10.3, we were introduced to inference methods for comparing two population means and two population proportions. Here, we learn how to perform hypothesis tests regarding two population standard deviations. Wall Street investors are wary of excessive stock price variability. In this section, we will compare the variability of prices between two tech stocks, Google and Apple, using a new hypothesis test, called the $F$ test. The $F$ test will determine whether there is a significant difference in the variability of the stock prices, as measured by the respective population standard deviations. The $F$ test is based on the $F$ distribution, named in honor of the “grandfather of statistics,” Sir Ronald A. Fisher.

Let population 1 be Google stock prices and population 2 be Apple stock prices. We can test whether Google's stock prices are more variable than those of Apple; that is, we may test whether the standard deviation of Google stock prices, $σ_{1}$ , is greater than the standard deviation of Apple stock prices, $σ_{2}$ . This gives us the following hypotheses for our $F$ test:

$\begin{matrix} H_{0} : σ_{1} = σ_{2} & \text{versus} & H_{a} : σ_{1} > σ_{2} \end{matrix}$

Table 16 provides the three possible forms of hypotheses for the $F$ test comparing two population standard deviations, $σ_{1}$ and $σ_{2}$ .


Table 16 Three possible forms for the $F$ test for comparing two population standard deviations

Form of test	Null hypothesis	Alternative hypothesis
Right-tailed test	$H_{0} : σ_{1} = σ_{2}$	$H_{a} : σ_{1} > σ_{2}$
Left-tailed test	$H_{0} : σ_{1} = σ_{2}$	$H_{a} : σ_{1} < σ_{2}$
Two-tailed test	$H_{0} : σ_{1} = σ_{2}$	$H_{a} : σ_{1} \neq σ_{2}$

The requirements for performing the $F$ test are the following:

We have two independent random samples taken from two populations.
The two populations are both normally distributed.

The test statistic for the $F$ test is $F_{data}$ , given as follows.

Test Statistic for the $F$ Test for Comparing Two Population Standard Deviations

Suppose that the two population variances are equal, $σ_{1}^{2} = σ_{2}^{2}$ , and that we have independent random samples of size $n_{1}$ and $n_{2}$ from two normally distributed populations with sample variances $s_{1}^{2}$ and $s_{2}^{2}$ , respectively. Then the test statistic for the $F$ test

$F_{data} = \frac{s_{1}^{2}}{s_{2}^{2}}$

follows an $F$ distribution with $n_{1} - 1$ degrees of freedom in the numerator and $n_{2} - 1$ degrees of freedom in the denominator.

Let's become better acquainted with the $F$ distribution. Similar to the $χ^{2}$ distribution, the $F$ distribution is right-skewed, never takes negative values, and has an infinite number of different $F$ curves (Figure 20). The shape of the curve depends on the degrees of freedom.

FIGURE 20 Shape of the $F$ distribution for various degrees of freedom.

As noted, the $F$ distribution resembles the $χ^{2}$ distribution. This is not surprising because an $F$ random variable is the ratio of two independent $χ^{2}$ random variables, each divided by its degrees of freedom. Moreover, the $F$ distribution has two different degrees of freedom, which we will call ${df}_{1}$ and ${df}_{2}$ , derived from the degrees of freedom of the two $χ^{2}$ distributions represented in the ratio. Often, ${df}_{1}$ is called the numerator degrees of freedom, and ${df}_{2}$ is called the denominator degrees of freedom.

Properties of the $F$ Curve

The total area under the $F$ curve equals 1.
The value of the $F$ random variable is never negative, so the $F$ curve starts at 0. However, it extends indefinitely to the right. The curve approaches but never quite meets the horizontal axis.
Because the curve starts at 0 and extends indefinitely to the right, the $F$ curve is right-skewed.
There is a different $F$ curve for each different pair of degrees of freedom: ${df}_{1}$ and ${df}_{2}$ .
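The properties above can be checked empirically with a small simulation. The sketch below draws pairs of independent samples from two normal populations with the same standard deviation and records the ratio of sample variances each time. The sample sizes (8 and 7), the common standard deviation (1.0), the number of repetitions, and the seed are arbitrary choices made here for illustration, not values from the text.

```python
import random
import statistics


def simulate_variance_ratios(n1, n2, sigma, reps, seed=0):
    """Simulate s1^2 / s2^2 for independent normal samples with equal sigma."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ratios = []
    for _ in range(reps):
        sample1 = [rng.gauss(0.0, sigma) for _ in range(n1)]
        sample2 = [rng.gauss(0.0, sigma) for _ in range(n2)]
        ratios.append(statistics.variance(sample1) / statistics.variance(sample2))
    return ratios


# Illustrative settings (assumptions, not from the text).
ratios = simulate_variance_ratios(n1=8, n2=7, sigma=1.0, reps=5000, seed=1)

print("smallest simulated ratio:", min(ratios))   # never negative
print("median of ratios:", statistics.median(ratios))
print("mean of ratios:", statistics.mean(ratios))  # pulled right by the long tail
```

A mean that sits above the median is one simple symptom of right skew: the long right tail pulls the mean upward while the median stays near the bulk of the values.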


The $F$ distribution is continuous, so we can find probabilities associated with values of $F$ , and vice versa, just as we did with the normal, $t$ , and $χ^{2}$ distributions. Just as for any continuous distribution, probability is represented by the area below the $F$ curve above an interval.

2 Perform the $F$ Test for Comparing Two Population Standard Deviations: Critical-Value Method

To perform the hypothesis tests in this section, as well as in Chapters 12 and 13, we need to find the critical values of an $F$ distribution for a given level of significance $α$ . For example, we may need to find the value of an $F$ distribution that has area $α = 0.05$ to the right of it, or we may need to find the value of an $F$ distribution that has area $α = 0.01$ to the left of it. To find these $F$ critical values, we will work with the $F$ tables (see Appendix Table F). The $F$ tables are somewhat different from the other tables that we have worked with so far.

The notation

$F_{crit} = F_{α, n_{1} - 1, n_{2} - 1}$

represents the critical value of the $F$ distribution with ${df}_{1} = n_{1} - 1$ numerator degrees of freedom and ${df}_{2} = n_{2} - 1$ denominator degrees of freedom, with area $α$ to the right of $F_{α, n_{1} - 1, n_{2} - 1}$ . For example, $F_{0.05, 15, 10}$ represents the value of the $F$ distribution with ${df}_{1} = 15$ and ${df}_{2} = 10$ , with area 0.05 to the right of it. Next, we learn how to find the $F$ critical values using the $F$ tables.

Procedure for Finding $F$ Critical Value for a Given Area $α$ to the Right of it

Suppose we have an $F$ distribution with ${df}_{1}$ and ${df}_{2}$ degrees of freedom. To find the critical value $F_{crit}$ that has area $α$ to the right of it, do the following:

Step 1 Look across the top of the $F$ table until you find your ${df}_{1}$ . Then go down that column until you see your ${df}_{2}$ on the left.
Step 2 For each ${df}_{2}$ on the left, you will see a range of $α$ values from 0.100 to 0.001. Choose the row next to ${df}_{2}$ that has your value of $α$ . The $F$ -value in that row and column is your value of $F_{crit}$ .

Note: When the degrees of freedom are not listed in the $F$ table, we do not necessarily take the closest degrees of freedom we can find. Sometimes the closest tabulated value is larger than the actual degrees of freedom, and using it would yield misleadingly overprecise results, that is, a level of precision not warranted by the data. For example, this could lead us to find significance where none actually exists.

Developing Your Statistical Sense

Degrees of Freedom Not Listed in the Table

Just as with the $t$ table, not all the degrees of freedom are listed in the $F$ table. If either of the degrees of freedom ${df}_{1}$ or ${df}_{2}$ is not listed in the $F$ table, a conservative solution is to take the next smaller value for whichever of ${df}_{1}$ or ${df}_{2}$ is not listed. For example, suppose ${df}_{1} = n_{1} - 1 = 57 - 1 = 56$ and ${df}_{2} = n_{2} - 1 = 170 - 1 = 169$ . Neither ${df}_{1} = 56$ nor ${df}_{2} = 169$ is given in the $F$ table. Therefore, we set ${df}_{1}$ to be the next smaller value given in the table, ${df}_{1} = 50$ , and we set ${df}_{2}$ to be the next smaller value, ${df}_{2} = 160$ .
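The conservative rule can be sketched as a small lookup helper. The two lists of tabulated degrees of freedom below are hypothetical stand-ins, chosen only so that 50 and 160 appear as in the example; they are not a transcription of Appendix Table F, and the function name is likewise invented for this illustration.

```python
from bisect import bisect_right


def next_smaller_df(df, tabulated):
    """Return the largest tabulated value that does not exceed df."""
    ordered = sorted(tabulated)
    position = bisect_right(ordered, df)  # count of tabulated values <= df
    if position == 0:
        raise ValueError("df is smaller than every tabulated value")
    return ordered[position - 1]


# Hypothetical tabulated degrees of freedom (assumptions for illustration only).
TABULATED_DF1 = [10, 20, 30, 40, 50, 60, 120]
TABULATED_DF2 = [100, 120, 140, 160, 180, 200]

df1 = next_smaller_df(57 - 1, TABULATED_DF1)
df2 = next_smaller_df(170 - 1, TABULATED_DF2)
print("use df1 =", df1, "and df2 =", df2)
```

Rounding down rather than to the nearest entry never credits the data with more degrees of freedom than the samples actually provide.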

The $F$ tables give only the $F$ critical values for a given area $α$ to the right. To find $F$ critical values for a given area $α$ to the left, we use the following property:

$F_{1 - α, n_{1} - 1, n_{2} - 1} = \frac{1}{F_{α, n_{2} - 1, n_{1} - 1}}$

In other words, the value from an $F$ distribution with degrees of freedom ${df}_{1} = n_{1} - 1$ and ${df}_{2} = n_{2} - 1$ and area $α$ to the left of it equals the reciprocal of the value from an $F$ distribution with degrees of freedom ${df}_{1} = n_{2} - 1$ and ${df}_{2} = n_{1} - 1$ and area $α$ to the right of it. Note that the two degrees of freedom get switched.
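One way to see why the degrees of freedom switch is the following short sketch. It assumes only that the reciprocal of an $F$ random variable is again an $F$ random variable with the numerator and denominator degrees of freedom exchanged, which follows from flipping the ratio. The symbols $c$ and $F_{n_{1} - 1, n_{2} - 1}$ (for the random variable itself) are introduced here for illustration. Let $c$ be the value with area $α$ to the left of it. Then

$α = P (F_{n_{1} - 1, n_{2} - 1} < c) = P \left( \frac{1}{F_{n_{1} - 1, n_{2} - 1}} > \frac{1}{c} \right) = P \left( F_{n_{2} - 1, n_{1} - 1} > \frac{1}{c} \right)$

so $\frac{1}{c} = F_{α, n_{2} - 1, n_{1} - 1}$ , which gives $c = \frac{1}{F_{α, n_{2} - 1, n_{1} - 1}}$ .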


Procedure for Finding $F$ Critical Value for a Given Area $α$ to the Left of it

Step 1 Switch the values of ${df}_{1}$ and ${df}_{2}$ .
Step 2 Find $F_{α, n_{2} - 1, n_{1} - 1}$ using the $F$ table.
Step 3 Calculate $F_{crit} = F_{1 - α, n_{1} - 1, n_{2} - 1} = \frac{1}{F_{α, n_{2} - 1, n_{1} - 1}}$

So, for example, to find the value of an $F$ distribution with degrees of freedom ${df}_{1} = 10$ and ${df}_{2} = 15$ with area $α = 0.05$ to the left of it, follow steps 1 and 2 above to find the value of an $F$ distribution with ${df}_{1} = 15$ and ${df}_{2} = 10$ with area $α = 0.05$ to the right of it. Then compute the reciprocal.

EXAMPLE 19 Finding critical values of the $F$ distribution

Use the excerpt from the $F$ distribution tables in Figure 21 to find the following critical values of the $F$ distribution:

FIGURE 21 $F$ table (excerpt).

(a) Find the critical value with area $α = 0.05$ to the right of it, for an $F$ distribution with ${df}_{1} = 2$ and ${df}_{2} = 7$ .
(b) Find the critical value with area $α = 0.01$ to the left of it, for an $F$ distribution with ${df}_{1} = 6$ and ${df}_{2} = 3$ .

Solution

- (a) Step 1. Go across the top of the $F$ table until we get to ${df}_{1} = 2$ , and go down that column until we see ${df}_{2} = 7$ on the left.
- (a) Step 2. Next to the 7 is a range of $α$ values from 0.100 to 0.001. We choose the row with $α = 0.05$ . The $F$ -value in that row and column is 4.74. Thus, our $F$ critical value is
  
  $F_{crit} = F_{0.05, 2, 7} = 4.74$
- (b) Step 1. Switching the two degrees of freedom, we go across the top of the $F$ table until we get to ${df}_{1} = 3$ . Then we go down that column until we see ${df}_{2} = 6$ on the left.
- (b) Step 2. Choose the row with $α = 0.010 = 0.01$ . The $F$ -value in that row and column is 9.78. Thus, our $F$ critical value is
  
  $F_{crit} = F_{0.99, 6, 3} = \frac{1}{F_{0.01, 3, 6}} = \frac{1}{9.78} \approx 0.1022$
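The table lookups in Example 19 can be cross-checked numerically. The sketch below is one possible approach, not the method used in the text: it evaluates the $F$ cumulative distribution through the regularized incomplete beta function (computed with a standard continued-fraction expansion) and then finds critical values by bisection. All function names are invented for this illustration.

```python
import math

TINY = 1e-300  # guard against division by zero in the continued fraction


def _guard(value):
    return value if abs(value) > TINY else TINY


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    c = 1.0
    d = 1.0 / _guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        # Even-numbered term of the continued fraction.
        term = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        d = 1.0 / _guard(1.0 + term * d)
        c = _guard(1.0 + term / c)
        h *= d * c
        # Odd-numbered term of the continued fraction.
        term = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        d = 1.0 / _guard(1.0 + term * d)
        c = _guard(1.0 + term / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_cdf(x, df1, df2):
    """Area under the F curve to the left of x."""
    if x <= 0.0:
        return 0.0
    return regularized_incomplete_beta(df1 / 2.0, df2 / 2.0, df1 * x / (df1 * x + df2))


def f_crit_right(alpha, df1, df2):
    """Value with area alpha to its right, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - f_cdf(hi, df1, df2) > alpha:  # expand until the bracket is valid
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if 1.0 - f_cdf(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def f_crit_left(alpha, df1, df2):
    """Value with area alpha to its left: switch the df, then take the reciprocal."""
    return 1.0 / f_crit_right(alpha, df2, df1)


print("right-tail critical value, alpha=0.05, df=(2, 7):", round(f_crit_right(0.05, 2, 7), 2))
print("left-tail critical value, alpha=0.01, df=(6, 3):", round(f_crit_left(0.01, 6, 3), 4))
```

Note that `f_crit_left` mirrors the table procedure exactly: it never searches the left tail directly, but reuses the right-tail routine with the degrees of freedom exchanged.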

NOW YOU CAN DO

Exercises 9–20.

Now we use the critical values of the $F$ distribution to help us with the critical-value method of performing the $F$ test for comparing two population standard deviations. Later we show the steps for the $p$ -value method, along with an example.

$F$ Test for Comparing Two Population Standard Deviations: Critical-Value Method

Suppose we have two independent random samples of size $n_{1}$ and $n_{2}$ taken from two normally distributed populations, with population standard deviations $σ_{1}$ and $σ_{2}$ , and sample standard deviations $s_{1}$ and $s_{2}$ , respectively.

Step 1 State the hypotheses. Use one of the forms from Table 17. State the meaning of $σ_{1}$ and $σ_{2}$ .
Step 2 Find the critical value(s) and state the rejection rule. Use Table 17 and the $F$ tables.
Step 3 Find $F_{data}$ .

$F_{data} = \frac{s_{1}^{2}}{s_{2}^{2}}$

follows an $F$ distribution with ${df}_{1} = n_{1} - 1$ and ${df}_{2} = n_{2} - 1$ .
Step 4 State the conclusion and the interpretation. Compare $F_{data}$ with the $F$ critical value from Table 17.

Table 17 Critical values, rejection rules, and rejection regions


We now return to our example comparing the variability of stock prices between Google and Apple.

EXAMPLE 20 $F$ test for comparing two population standard deviations: Critical-value method

Table 18 shows independent samples from Google and Apple stock prices from July 2014 together with the sample sizes and sample standard deviations. Test, using the critical-value method, whether the standard deviation of Google stock prices $σ_{1}$ is greater than the standard deviation of Apple stock prices $σ_{2}$ .

Table 18 Independent random samples of stock prices, July 2014

Google	Apple
574.79 590.76 583.04 593.06 580.82 599.02 579.55 587.78	93.52 94.03 95.39 96.45 95.60 99.02 97.03
$\begin{matrix} n_{1} = 8 & s_{1} \approx 7.999701 \end{matrix}$	$\begin{matrix} n_{2} = 7 & s_{2} \approx 1.862594 \end{matrix}$

Source: www.marketwatch.com/tools/quotes/historical.asp.

Solution

The normal probability plots in Figures 22a and 22b show acceptable normality for both samples. We may, therefore, proceed with the $F$ test for comparing population standard deviations.

FIGURE 22a Normal probability plot of Google stock prices.

FIGURE 22b Normal probability plot of Apple stock prices.

Step 1 State the hypotheses. We are testing whether Google's stock prices are more variable than those of Apple. Thus, because Google represents population 1, we have the following hypotheses for our $F$ test:

$\begin{matrix} H_{0} : σ_{1} = σ_{2} & \text{versus} & H_{a} : σ_{1} > σ_{2} \end{matrix}$

where $σ_{1}$ represents the standard deviation of Google stock prices and $σ_{2}$ represents the standard deviation of Apple stock prices. Use level of significance $α = 0.05$ .
Step 2 Find the critical value and state the rejection rule. We have ${df}_{1} = n_{1} - 1 = 7$ and ${df}_{2} = n_{2} - 1 = 6$ . From Table 17 and Appendix Table F, our critical value is the $F$ -value with area $α = 0.05$ to the right of it:

$F_{crit} = F_{α, n_{1} - 1, n_{2} - 1} = F_{0.05, 7, 6} = 4.21$

Our rejection rule is, therefore, from Table 17: Reject $H_{0}$ if $F_{data} > 4.21$ .

Step 3 Find $F_{data}$

$F_{data} = \frac{s_{1}^{2}}{s_{2}^{2}} = \frac{{7.999701}^{2}}{{1.862594}^{2}} \approx 18.45$

follows an $F$ distribution with ${df}_{1} = n_{1} - 1 = 8 - 1 = 7$ and ${df}_{2} = n_{2} - 1 = 7 - 1 = 6$ .
Step 4 State the conclusion and the interpretation. Because $F_{data} \approx 18.45$ is greater than $F_{crit} = 4.21$ , we reject $H_{0}$ . There is evidence that the variability in Google stock prices is greater than the variability in Apple stock prices.
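The arithmetic in Example 20 can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the data are the values from Table 18, and the critical value 4.21 is taken from the text rather than computed. The variable names are choices made for this illustration.

```python
import statistics

# Stock prices from Table 18 (July 2014).
google = [574.79, 590.76, 583.04, 593.06, 580.82, 599.02, 579.55, 587.78]
apple = [93.52, 94.03, 95.39, 96.45, 95.60, 99.02, 97.03]

# Sample standard deviations (n - 1 in the denominator).
s1 = statistics.stdev(google)
s2 = statistics.stdev(apple)

# Test statistic: ratio of the sample variances.
f_data = s1 ** 2 / s2 ** 2

# Degrees of freedom for the numerator and denominator.
df1 = len(google) - 1
df2 = len(apple) - 1

# Critical value with area 0.05 to the right, as read from the F table in the text.
f_crit = 4.21

# Right-tailed rejection rule: reject H0 if F_data exceeds the critical value.
reject_h0 = f_data > f_crit

print("s1 =", round(s1, 6), " s2 =", round(s2, 6))
print("F_data =", round(f_data, 2), "with df =", (df1, df2))
print("reject H0:", reject_h0)
```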

NOW YOU CAN DO

Exercises 21–26.

3 Perform the $F$ Test for Comparing Two Population Standard Deviations: $p$ -Value Method

We may also use the $p$ -value method to perform the $F$ test for comparing two population standard deviations. The requirements are the same.

$F$ Test for Comparing Two Population Standard Deviations: $p$ -Value Method

Suppose we have two independent random samples of size $n_{1}$ and $n_{2}$ taken from two normally distributed populations, with population standard deviations $σ_{1}$ and $σ_{2}$ , and sample standard deviations $s_{1}$ and $s_{2}$ , respectively.

Step 1 State the hypotheses and the rejection rule. Use one of the forms from Table 19. Clearly state the meaning of $σ_{1}$ and $σ_{2}$ . The rejection rule is: Reject $H_{0}$ if the $p$ -value is less than $α$ .
Step 2 Find $F_{data}$ .

$F_{data} = \frac{s_{1}^{2}}{s_{2}^{2}}$

follows an $F$ distribution with ${df}_{1} = n_{1} - 1$ and ${df}_{2} = n_{2} - 1$ .
Step 3 Find the $p$ -value. Use technology and Table 19 to find the $p$ -value.
Step 4 State the conclusion and the interpretation. Compare the $p$ -value with $α$ .

Table 19 $p$ -Value for the $F$ test for comparing two standard deviations

We illustrate the $p$ -value method with an example.

EXAMPLE 21 $F$ Test for comparing two population standard deviations: $p$ -value method

The Web site Medicare.gov publishes survey information on patient attitudes about their level of care. Table 20 shows, for independent random samples of hospitals in Florida and Georgia, the percentages of respondents who reported that their nurses always communicated well.


Table 20 Independent random samples of hospital percentages

Florida	Georgia
67 66 70 70 72 73	72 75 78 73 68 71
63 69 65 68 65	82 72 77 75 73
$\begin{matrix} n_{1} = 11 & s_{1} = 3.1305 \end{matrix}$	$\begin{matrix} n_{2} = 11 & s_{2} = 3.8162 \end{matrix}$

Source: Medicare.gov.

Test whether there is a difference in variability between the two states, using $α = 0.10$ .

Solution

The normal probability plots in Figures 23a and 23b show acceptable normality, so we may proceed with the $F$ test for comparing population standard deviations.

FIGURE 23a Normal probability plot of Florida percentages.

FIGURE 23b Normal probability plot of Georgia percentages.

Step 1 State the hypotheses. We are testing whether there is a difference in the standard deviation of the percentages for Florida ( $σ_{1}$ ) and Georgia ( $σ_{2}$ ). We therefore have a two-tailed test:

$\begin{matrix} H_{0} : σ_{1} = σ_{2} & \text{versus} & H_{a} : σ_{1} \neq σ_{2} \end{matrix}$

where $σ_{1}$ and $σ_{2}$ represent the standard deviations of the percent of Florida and Georgia respondents, respectively, who reported that their nurses always communicated well. Use level of significance $α = 0.10$ .
Step 2 Find $F_{data}$ .

$F_{data} = \frac{s_{1}^{2}}{s_{2}^{2}} = \frac{{3.1305}^{2}}{{3.8162}^{2}} \approx 0.6729$

follows an $F$ distribution with ${df}_{1} = n_{1} - 1 = 10$ and ${df}_{2} = n_{2} - 1 = 10$ .
Step 3 Find the $p$ -value. Because we have a two-tailed test, Table 19 states that the $p$ -value is

the smaller of (i) $2 \cdot P (F > F_{data})$ and (ii) $2 \cdot P (F < F_{data})$

Figure 24a shows the output from the TI-83/84, giving $P (F > F_{data})$ as the area under the $F$ distribution curve between 0.6729 and infinity. This gives

$2 \cdot P (F > F_{data}) = 2 \cdot 0.7287420993 = 1.457484199$

FIGURE 24a

This cannot represent a valid $p$ -value, because it is larger than 1. Figure 24b shows the output from the TI-83/84, giving $P (F < F_{data})$ as the area under the $F$ distribution curve between 0 and 0.6729. Thus, our $p$ -value is:

$p\text{-value} = 2 \cdot P (F < F_{data}) = 2 \cdot P (F < 0.6729) = 2 \cdot 0.2712579007 \approx 0.5425$

because this is smaller than 1.457484199.

FIGURE 24b

Step 4 State the conclusion and the interpretation. The $p$ -value 0.5425 is not less than $α = 0.10$ , so we do not reject $H_{0}$ . There is insufficient evidence for a difference in population standard deviations between percentages of patients in Florida and Georgia hospitals who reported that their nurses always communicated well.
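For readers without a TI-83/84, the $p$ -value in Example 21 can be approximated by computing the area under the $F$ curve directly. The sketch below is an illustration under stated assumptions, not the calculator's algorithm: it integrates the $F$ density from 0 to $F_{data}$ with composite Simpson's rule, which is adequate here because the density is smooth on that interval when ${df}_{1} \geq 2$ . The function names and the number of subintervals are choices made for this example.

```python
import math
import statistics

# Percentages from Table 20.
florida = [67, 66, 70, 70, 72, 73, 63, 69, 65, 68, 65]
georgia = [72, 75, 78, 73, 68, 71, 82, 72, 77, 75, 73]


def f_pdf(x, df1, df2):
    """Density of the F distribution; assumes df1 >= 2 so the value at x = 0 is finite."""
    log_const = (math.lgamma((df1 + df2) / 2) - math.lgamma(df1 / 2)
                 - math.lgamma(df2 / 2) + (df1 / 2) * math.log(df1 / df2))
    return (math.exp(log_const) * x ** (df1 / 2 - 1)
            * (1 + df1 * x / df2) ** (-(df1 + df2) / 2))


def area_left(upper, df1, df2, intervals=2000):
    """Area under the F curve between 0 and upper (composite Simpson's rule)."""
    h = upper / intervals  # intervals must be even
    total = f_pdf(0.0, df1, df2) + f_pdf(upper, df1, df2)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f_pdf(i * h, df1, df2)
    return total * h / 3


# Test statistic and degrees of freedom.
f_data = statistics.variance(florida) / statistics.variance(georgia)
df1, df2 = len(florida) - 1, len(georgia) - 1

# Two-tailed p-value: twice the smaller tail area.
left_area = area_left(f_data, df1, df2)
right_area = 1.0 - left_area
p_value = 2 * min(left_area, right_area)

alpha = 0.10
print("F_data =", round(f_data, 4))
print("p-value =", round(p_value, 4))
print("reject H0:", p_value < alpha)
```

Taking twice the smaller tail area is equivalent to taking the smaller of the two doubled areas, so the invalid value above 1 never has to be considered.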

NOW YOU CAN DO

Exercises 27–32.