
12.2 Multiple Comparisons

OBJECTIVES By the end of this section, I will be able to …

  1. Perform multiple comparisons tests using the Bonferroni method.
  2. Use Tukey's test to perform multiple comparisons.
  3. Use confidence intervals to perform multiple comparisons for Tukey's test.

Recall Example 5, where we rejected the null hypothesis that the population mean time spent in the open-ended sections of a maze was the same for three groups of genetically altered mice. But so far, we have not tested to find out which pairs of population means are significantly different.

FIGURE 21 Summary statistics for three groups of mice.

Figure 21 indicates that the sample mean time for Group 0, x̄_Group0 = 19.387, was much larger than the sample means of the other groups, x̄_Group1 = 8.660 and x̄_Group2 = 8.620. Because x̄_Group0 > x̄_Group1, and because the ANOVA test produced evidence that the three population means are not equal, we are tempted to conclude that μ_Group0 > μ_Group1. However, we cannot formally draw such a conclusion based on the one-way ANOVA results alone. Instead, we need to perform multiple comparisons.

Multiple Comparisons

Once an ANOVA result has been found significant (that is, the null hypothesis is rejected), multiple comparisons procedures seek to determine which pairs of population means are significantly different. Multiple comparisons are not performed if the ANOVA null hypothesis has not been rejected.

We will learn three multiple comparisons procedures: the Bonferroni method, Tukey's test, and Tukey's test using confidence intervals.

1 Performing Multiple Comparisons Tests Using the Bonferroni Method

In Section 10.2, we learned about the independent sample t test for determining whether pairs of population means were significantly different. We will do something similar here, except that (a) the formula for test statistic tdata is different from the one in Section 10.2, and (b) we need to apply the Bonferroni adjustment to the p-value.

Denote the number of population means as k. In general, there are

c = kC2 = k! / [2!(k − 2)!]

possible pairs of means to compare; that is, there are c pairwise comparisons. For k = 3, there are c = 3C2 = 3! / [2!(3 − 2)!] = 3 comparisons, and for k = 4 there are c = 4C2 = 4! / [2!(4 − 2)!] = 6 comparisons. We rejected the null hypothesis in Example 5, so we are interested in which pairs of population means are significantly different. There are c = 3 pairwise hypothesis tests, which are stated in Step 1 of Example 7.
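The count of pairwise comparisons can be checked directly with Python's built-in combinations function (a quick sketch; `math.comb` is part of the standard library):

```python
from math import comb

# Number of pairwise comparisons c = kC2 for k population means
print(comb(3, 2))  # k = 3 gives c = 3 comparisons
print(comb(4, 2))  # k = 4 gives c = 6 comparisons
```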

Suppose each of these three pairwise hypothesis tests is carried out using a level of significance α = 0.05. Then the experimentwise error rate, that is, the probability of making at least one Type I error in these three hypothesis tests, is

α_EW = 1 − (1 − α)³ = 1 − (0.95)³ = 0.142625

which is approximately three times larger than α = 0.05. The Bonferroni adjustment corrects for this as follows.
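As a quick check of the arithmetic above (a minimal sketch in Python):

```python
# Experimentwise error rate for c tests, each at level of significance alpha
alpha, c = 0.05, 3
alpha_ew = 1 - (1 - alpha) ** c
print(round(alpha_ew, 6))  # 0.142625, roughly c times alpha
```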

Recall that a Type I error is rejecting the null hypothesis when it is true.

The Bonferroni Adjustment

  • When performing multiple comparisons, the experimentwise error rate α_EW is the probability of making at least one Type I error in the set of hypothesis tests.
  • α_EW is always greater than the comparison level of significance α, by a factor approximately equal to the number of comparisons being made.
  • Thus, the Bonferroni adjustment corrects for the experimentwise error rate by multiplying the p-value of each pairwise hypothesis test by the number of comparisons being made. If the Bonferroni-adjusted p-value is greater than 1, then set the adjusted p-value equal to 1.

For example, when we test H0: μ_Group0 = μ_Group1 versus Ha: μ_Group0 ≠ μ_Group1, the Bonferroni adjustment says to multiply the resulting p-value by c = 3. Example 7 shows how to use the Bonferroni method of multiple comparisons.

EXAMPLE 7 Bonferroni method of multiple comparisons

Use the Bonferroni method of multiple comparisons to determine which pairs of population mean times differ, for the mice in Groups 0, 1, and 2 in Example 5. Use level of significance α=0.01.

Solution

The Bonferroni method requires that

  • the requirements for ANOVA have been met, and
  • the null hypothesis that the population means are all equal has been rejected.

In Example 5, we verified both requirements.

  • Step 1 For each of the c hypothesis tests, state the hypotheses and the rejection rule. There are k = 3 means, so there will be c = 3 hypothesis tests. Our hypotheses are

    • Test 1: H0: μ_Group0 = μ_Group1 versus Ha: μ_Group0 ≠ μ_Group1
    • Test 2: H0: μ_Group0 = μ_Group2 versus Ha: μ_Group0 ≠ μ_Group2
    • Test 3: H0: μ_Group1 = μ_Group2 versus Ha: μ_Group1 ≠ μ_Group2

    where μ_i represents the population mean time spent in the open-ended sections of the maze for the ith group. For each hypothesis test, reject H0 if the Bonferroni-adjusted p-value ≤ α = 0.01.

  • Step 2 Calculate t_data for each hypothesis test. From Figure 11 on page 676, we have the mean square error from the original ANOVA, MSE = 52.9485079, and from Figure 21 we get the sample means and the sample sizes. Thus,

    • Test 1:

      t_data = (x̄_Group0 − x̄_Group1) / √(MSE(1/n_Group0 + 1/n_Group1)) = (19.387 − 8.660) / √(52.9485079(1/15 + 1/15)) ≈ 4.037

    • Test 2:

      t_data = (x̄_Group0 − x̄_Group2) / √(MSE(1/n_Group0 + 1/n_Group2)) = (19.387 − 8.620) / √(52.9485079(1/15 + 1/15)) ≈ 4.052

    • Test 3:

      t_data = (x̄_Group1 − x̄_Group2) / √(MSE(1/n_Group1 + 1/n_Group2)) = (8.660 − 8.620) / √(52.9485079(1/15 + 1/15)) ≈ 0.015

    When the requirements are met, t_data follows a t distribution with n_t − k = 45 − 3 = 42 degrees of freedom, where n_t represents the total sample size.

    FIGURE 22 Unadjusted p-values from Excel.
  • Step 3 Find the Bonferroni-adjusted p-value for each hypothesis test. Figure 22 shows the unadjusted p-values for the values of t_data from Step 2, using the Excel function TDIST(t_data, df, 2), where df = 42 and the 2 represents a two-tailed test. Then the Bonferroni-adjusted p-value = c(p-value) = 3(p-value), for each hypothesis test.
    • Test 1: Bonferroni-adjusted p-value = 3(0.000225) = 0.000675.
    • Test 2: Bonferroni-adjusted p-value = 3(0.000215) = 0.000645.
    • Test 3: Bonferroni-adjusted p-value = 3(0.988103) = 2.964309, but this value exceeds 1, so we set this p-value equal to 1.
  • Step 4 For each hypothesis test, state the conclusion and the interpretation.
    • Test 1: The adjusted p-value = 0.000675, which is ≤0.01; therefore, reject H0. There is evidence at the 0.01 level of significance that the population mean time spent in the open-ended part of the maze differs between Group 0 and Group 1.
    • Test 2: The adjusted p-value = 0.000645, which is ≤0.01; therefore, reject H0. There is evidence at the 0.01 level of significance that the population mean time differs between Group 0 and Group 2.
    • Test 3: The adjusted p-value = 1, which is not ≤0.01; therefore, do not reject H0. There is insufficient evidence at the 0.01 level of significance that the population mean time differs between Group 1 and Group 2.
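The Bonferroni calculations in Steps 2–4 can be sketched in Python (a non-authoritative sketch; the summary statistics are taken from Figures 11 and 21, and `scipy` is assumed to be available):

```python
from math import sqrt
from scipy import stats

mse, df = 52.9485079, 42                 # MSE from the ANOVA table; df = n_t - k
groups = {"Group 0": (19.387, 15),       # (sample mean, sample size)
          "Group 1": (8.660, 15),
          "Group 2": (8.620, 15)}
pairs = [("Group 0", "Group 1"), ("Group 0", "Group 2"), ("Group 1", "Group 2")]
c = len(pairs)                           # number of pairwise comparisons

results = {}
for a, b in pairs:
    (xa, na), (xb, nb) = groups[a], groups[b]
    t = (xa - xb) / sqrt(mse * (1 / na + 1 / nb))
    p = 2 * stats.t.sf(abs(t), df)       # unadjusted two-tailed p-value
    results[(a, b)] = min(c * p, 1.0)    # Bonferroni adjustment, capped at 1
    print(a, "vs", b, "adjusted p-value:", round(results[(a, b)], 6))
```

Comparing each adjusted p-value against α = 0.01 reproduces the conclusions of Step 4.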

NOW YOU CAN DO

Exercises 9–18.

2 Tukey's Test for Multiple Comparisons

We may also use Tukey's test to determine which pairs of population means are significantly different. Tukey's test was developed by John Tukey, whom we met earlier as the developer of the stem-and-leaf display. We illustrate the steps for Tukey's method using an example.

EXAMPLE 8 Tukey's test for multiple comparisons

In the Case Study on page 678, we tested whether the population mean student motivation scores were equal for the three types of professor self-disclosure on Facebook: high, medium, and low. Figure 18 on page 678 contains the ANOVA results, for which we rejected the null hypothesis of equal population mean scores. Use Tukey's method to determine which pairs of population means are significantly different, using level of significance α=0.05.

Solution

Tukey's method has the same requirements as the Bonferroni method:

  • the requirements for ANOVA have been met, and
  • the null hypothesis that the population means are all equal has been rejected.

In the Case Study, both requirements were verified.

  • Step 1 For each of the c hypothesis tests, state the hypotheses. There are k=3 means, so there will be c=3 hypothesis tests. Our hypotheses are:

    • Test 1: H0: μ_High = μ_Medium versus Ha: μ_High ≠ μ_Medium
    • Test 2: H0: μ_High = μ_Low versus Ha: μ_High ≠ μ_Low
    • Test 3: H0: μ_Medium = μ_Low versus Ha: μ_Medium ≠ μ_Low

    where μ_i represents the population mean score for the ith category.

  • Step 2 Find the Tukey critical value q_crit and state the rejection rule. The total sample size is n_t = 43 + 44 + 43 = 130. Use experimentwise error rate α_EW = 0.05, degrees of freedom df = n_t − k = 130 − 3 = 127, and k = number of population means = 3. Using the table of Tukey critical values (Table G in the Appendix), we seek df = 127 on the left; when we don't find it, we conservatively choose df = 120. Then, in the column for k = 3, we find the Tukey critical value q_crit = 3.356 (Figure 23). The rejection rule for the Tukey method is "Reject H0 if q_data ≥ q_crit," that is, reject H0 if q_data ≥ 3.356.

    FIGURE 23 Finding the Tukey critical value q_crit.
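When the printed table doesn't list the exact degrees of freedom, the critical value can also be computed from the studentized range distribution (a sketch assuming SciPy 1.7 or later, which provides `scipy.stats.studentized_range`):

```python
from scipy.stats import studentized_range

# q_crit for alpha_EW = 0.05 with k = 3 groups and df = 120
# (the conservative stand-in for df = 127 used in Step 2)
q_crit = studentized_range.ppf(0.95, 3, 120)
print(round(q_crit, 3))  # close to the tabled value 3.356
```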
  • Step 3 Calculate the Tukey test statistic q_data for each hypothesis test. From Figure 18 on page 678, we get the sample means, the sample sizes, and the mean square error MSE = 168. Thus,
    • Test 1:

      q_data = (x̄_High − x̄_Medium) / √((MSE/2)(1/n_High + 1/n_Medium)) = (81.09 − 79.36) / √((168/2)(1/43 + 1/44)) ≈ 0.880

    • Test 2:

      q_data = (x̄_High − x̄_Low) / √((MSE/2)(1/n_High + 1/n_Low)) = (81.09 − 70.63) / √((168/2)(1/43 + 1/43)) ≈ 5.292

    • Test 3:

      q_data = (x̄_Medium − x̄_Low) / √((MSE/2)(1/n_Medium + 1/n_Low)) = (79.36 − 70.63) / √((168/2)(1/44 + 1/43)) ≈ 4.442

  • Step 4 For each hypothesis test, state the conclusion and the interpretation.
    • Test 1: q_data = 0.880, which is not ≥ q_crit = 3.356; therefore, do not reject H0. There is insufficient evidence at the 0.05 level of significance that the population mean student motivation scores differ between professors having high and medium self-disclosure on Facebook.
    • Test 2: q_data = 5.292, which is ≥ q_crit = 3.356; therefore, reject H0. There is evidence at the 0.05 level of significance that the population mean scores differ between high and low professor self-disclosure on Facebook.
    • Test 3: q_data = 4.442, which is ≥ q_crit = 3.356; therefore, reject H0. There is evidence at the 0.05 level of significance that the population mean scores differ between medium and low professor self-disclosure on Facebook.

This set of three hypothesis tests has an experimentwise error rate α_EW = 0.05.

When calculating the numerator of q_data for each pairwise comparison, be sure to subtract the smaller value of x̄ from the larger value of x̄, so that the value of q_data is positive.
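The q_data calculations in Steps 3 and 4 can be sketched in Python (a minimal sketch; the sample means, sample sizes, and MSE = 168 are read from Figure 18, and q_crit = 3.356 from Table G):

```python
from math import sqrt

mse, q_crit = 168.0, 3.356
groups = {"High": (81.09, 43), "Medium": (79.36, 44), "Low": (70.63, 43)}

results = {}
for a, b in [("High", "Medium"), ("High", "Low"), ("Medium", "Low")]:
    (xa, na), (xb, nb) = groups[a], groups[b]
    diff = abs(xa - xb)  # larger mean minus smaller mean keeps q_data positive
    q = diff / sqrt((mse / 2) * (1 / na + 1 / nb))
    results[(a, b)] = q
    print(a, "vs", b, ": q_data =", round(q, 3), "reject H0:", q >= q_crit)
```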

NOW YOU CAN DO

Exercises 19–30.

3 Using Confidence Intervals to Perform Tukey's Test

Tukey's test for multiple comparisons may also be performed using confidence intervals and technology. Recall that when using confidence intervals for hypothesis tests, H0 is rejected if the hypothesized value of the population mean does not fall inside the confidence interval.

Rejection Rule for Using Confidence Intervals to Perform Tukey's test

If a 100(1 − α)% confidence interval for μ1 − μ2 contains zero, then at level of significance α, we do not reject the null hypothesis H0: μ1 = μ2. If the interval does not contain zero, then we do reject H0.


We illustrate the concept of using confidence intervals to perform Tukey's test with an example using the Facebook data.

EXAMPLE 9 Using confidence intervals to perform Tukey's test

Use the 95% confidence intervals for the differences in population means provided by Minitab to perform Tukey's test for multiple comparisons on the Facebook data.

Solution

We use the steps in the Step-by-Step Technology Guide provided at the end of this section. Figure 24 contains the output from Minitab showing 95% confidence intervals for the differences in population means for the high, medium, and low professor disclosure levels. The output states that "Group = Low" is being subtracted from the other two groups, meaning that the first two confidence intervals are for μ_Medium − μ_Low and μ_High − μ_Low. Below that, "Group = Medium" is subtracted from the high group, indicating a confidence interval for μ_High − μ_Medium. The column headings "Lower" and "Upper" represent the lower and upper bounds of the confidence interval. Figure 25 shows the output from JMP, including 95% confidence intervals for the differences in population means. The output states that the second level listed is subtracted from the first, meaning that the first two confidence intervals are for μ_High − μ_Low and μ_Medium − μ_Low. The columns "Lower CL" and "Upper CL" represent the lower and upper bounds of each confidence interval.

FIGURE 24 Using Minitab confidence intervals to perform Tukey's test.

FIGURE 25 Using JMP confidence intervals to perform Tukey's test.

Thus, for our c = 3 hypothesis tests, we have

  • Test 1: H0: μ_Medium = μ_Low versus Ha: μ_Medium ≠ μ_Low

    The 95% confidence interval for μ_Medium − μ_Low is (2.14, 15.33), which does not contain zero, so we reject H0: μ_Medium = μ_Low at level of significance α = 0.05.

  • Test 2: H0: μ_High = μ_Low versus Ha: μ_High ≠ μ_Low

    The 95% confidence interval for μ_High − μ_Low is (3.84, 17.09), which does not contain zero, so we reject H0: μ_High = μ_Low at level of significance α = 0.05.

  • Test 3: H0: μ_High = μ_Medium versus Ha: μ_High ≠ μ_Medium

    The 95% confidence interval for μ_High − μ_Medium is (−4.86, 8.32), which does contain zero, so we do not reject H0: μ_High = μ_Medium at level of significance α = 0.05.

Note that these conclusions are exactly the same as the conclusions from Example 8.
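The confidence-interval decision rule can be sketched directly from the interval endpoints (a minimal sketch; the intervals are the Minitab values reported in Figure 24):

```python
# 95% Tukey confidence intervals for differences in population means
intervals = {
    "Medium - Low": (2.14, 15.33),
    "High - Low": (3.84, 17.09),
    "High - Medium": (-4.86, 8.32),
}

decisions = {}
for pair, (lower, upper) in intervals.items():
    decisions[pair] = not (lower <= 0 <= upper)  # reject H0 when 0 lies outside
    print(pair, "-> reject H0" if decisions[pair] else "-> do not reject H0")
```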

NOW YOU CAN DO

Exercises 31 and 32.
