OBJECTIVES By the end of this section, I will be able to …
In Section 14.4, we learned the Wilcoxon rank sum test, which tests whether the population medians of two independent random samples are equal. Here, in Section 14.5, we extend this method from two populations to three or more populations.
1 Kruskal-Wallis Test for Equal Medians in Three or More Populations
The Kruskal-Wallis test is used to determine whether the population medians of three or more independent random samples are equal. In Chapter 12, we learned how to perform analysis of variance (ANOVA), which is a hypothesis test to determine if the population means of three or more populations are equal. However, ANOVA requires that each population be normally distributed. The Kruskal-Wallis test is less strict, in that it does not require that the populations be normally distributed. Thus, the Kruskal-Wallis test is more widely applicable than is ANOVA.
The Kruskal-Wallis test is a nonparametric hypothesis test in which the original data from three or more independent samples are transformed into their ranks. It tests whether the population medians are all equal.
To calculate the test statistic for the Kruskal-Wallis test, we temporarily combine all the data values from all the samples and find the ranks of the combined data values.
So far, this is exactly what we did for the Wilcoxon rank sum test, except that now we have $k$ (three or more) samples instead of just two. Then the ranks are summed separately for each of the samples.
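Let $n_1, n_2, \ldots, n_k$ represent the sample sizes for samples $1, 2, \ldots, k$, respectively, and let $N$ represent the total number of data values in all the samples combined; that is, $N = n_1 + n_2 + \cdots + n_k$. To perform the Kruskal-Wallis test, each of the sample sizes must be at least 5. Then the Kruskal-Wallis test statistic is given by

$$H = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N+1)$$

where $R_i$ is the sum of the ranks for sample $i$.

When the conditions are met, $H$ approximately follows a $\chi^2$ distribution with $k - 1$ degrees of freedom.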
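The ranking-and-summing procedure and the formula for $H$ can be sketched in Python. This is a minimal illustration under stated assumptions, not the textbook's method. The function names `average_ranks` and `kruskal_wallis_h` are hypothetical. Tied values are assumed to receive the average of their ranks, as in the Wilcoxon rank sum test. No tie correction is applied, which matches the formula for $H$ in this section. The usage lines apply the sketch to the Example 16 data.

```python
from __future__ import annotations

from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values from smallest (rank 1) to largest; tied values share the average of their ranks (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(samples: Sequence[Sequence[float]]) -> float:
    """H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), with no tie correction."""
    combined = [x for sample in samples for x in sample]  # temporarily combine all samples
    ranks = average_ranks(combined)
    n_total = len(combined)
    total = 0.0
    start = 0
    for sample in samples:
        n_i = len(sample)
        r_i = sum(ranks[start:start + n_i])  # rank sum R_i for this sample
        total += r_i ** 2 / n_i
        start += n_i
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Usage with the Example 16 data (number of small businesses per city).
florida = [3718, 4948, 9489, 8771, 13729, 6865, 7184]
north_carolina = [4883, 5825, 2153, 3424, 2108]
texas = [8150, 4403, 3274, 2276, 3070, 3855]
print(round(kruskal_wallis_h([florida, north_carolina, texas]), 3))  # 7.261
```

Library routines may not match this sketch exactly. For example, SciPy's `scipy.stats.kruskal` divides $H$ by a correction factor for ties, so its statistic can differ slightly when ties are present. With no ties, as in Example 16, the two values agree.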
EXAMPLE 16 Calculating the Kruskal-Wallis test statistic
citybusiness
The U.S. Small Business Administration publishes the number of small businesses in medium-size cities. We are interested in testing whether the population median number of small businesses per city is the same in Florida, North Carolina, and Texas. For the independent random samples in the table below, calculate the test statistic $H$ for the Kruskal-Wallis test.
| Florida city | Number of small businesses | North Carolina city | Number of small businesses | Texas city | Number of small businesses |
|---|---|---|---|---|---|
| Gainesville | 3,718 | Asheville | 4,883 | El Paso | 8,150 |
| Tallahassee | 4,948 | Wilmington | 5,825 | Lubbock | 4,403 |
| Daytona Beach | 9,489 | Greenville | 2,153 | Killeen | 3,274 |
| Melbourne | 8,771 | Fayetteville | 3,424 | College Station | 2,276 |
| Sarasota | 13,729 | Rocky Mount | 2,108 | Laredo | 3,070 |
| Lakeland | 6,865 | | | Amarillo | 3,855 |
| Naples | 7,184 | | | | |
Solution
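Combining all 18 data values and ranking them from smallest (rank 1) to largest (rank 18) gives:

| Combined data | 2,108 | 2,153 | 2,276 | 3,070 | 3,274 | 3,424 | 3,718 | 3,855 | 4,403 |
|---|---|---|---|---|---|---|---|---|---|
| Rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |

| Combined data | 4,883 | 4,948 | 5,825 | 6,865 | 7,184 | 8,150 | 8,771 | 9,489 | 13,729 |
|---|---|---|---|---|---|---|---|---|---|
| Rank | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |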
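The sum of the ranks for Florida is $R_1 = 7 + 11 + 17 + 16 + 18 + 13 + 14 = 96$.
The sum of the ranks for North Carolina is $R_2 = 10 + 12 + 2 + 6 + 1 = 31$.
The sum of the ranks for Texas is $R_3 = 15 + 9 + 5 + 3 + 4 + 8 = 44$.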
Also, there are 7 cities in the Florida sample, 5 cities in the North Carolina sample, and 6 cities in the Texas sample, so that $n_1 = 7$, $n_2 = 5$, and $n_3 = 6$, and the total sample size is $N = 7 + 5 + 6 = 18$.
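As an optional arithmetic check, the rank sums for Florida (96), North Carolina (31), and Texas (44) must add up to $1 + 2 + \cdots + N = N(N+1)/2$, where $N = 18$ is the total sample size. This is a general property of ranks, not a step the test requires.

$$96 + 31 + 44 = 171 = \frac{18 \cdot 19}{2}$$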
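Finally, the value of the test statistic is

$$H = \frac{12}{18(19)}\left(\frac{96^2}{7} + \frac{31^2}{5} + \frac{44^2}{6}\right) - 3(19) \approx \frac{12}{342}\left(1316.5714 + 192.2 + 322.6667\right) - 57 \approx 64.2610 - 57 \approx 7.261$$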
Later, we will find out if this value for the test statistic warrants rejection of the null hypothesis. But first, we need to learn the hypotheses for the Kruskal-Wallis test.
NOW YOU CAN DO
Exercises 7–14.
Recall from Chapter 12 that the null hypothesis for ANOVA is that all population means are equal, and that the alternative hypothesis is that not all the population means are equal. The hypotheses for the Kruskal-Wallis test are the same, except that we are testing for medians instead of means.
Hypotheses for the Kruskal-Wallis Test
$H_0$: The population medians are all equal.
$H_a$: Not all the population medians are equal.
Next, we will summarize the steps for performing the Kruskal-Wallis test for the equality of three or more population medians.
Kruskal-Wallis Test for Independent Samples
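The requirements are (a) there are $k$ independent samples, each randomly selected, and (b) there are at least 5 data values in each sample. It is not required that the populations be normally distributed.
Step 1 State the hypotheses. $H_0$: The population medians are all equal. $H_a$: Not all the population medians are equal.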
Step 2 Find the critical value and state the rejection rule.
Use Appendix Table E. Select the column with “Area to the right of critical value” equal to the given level of significance $\alpha$. The value of $\chi^2_{\text{crit}}$ is in the row with degrees of freedom $k - 1$. The Kruskal-Wallis test is always a right-tailed test, because $H$ grows large when the samples' average ranks differ greatly. So the rejection rule is always to reject $H_0$ if $H \ge \chi^2_{\text{crit}}$.
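Reading $\chi^2_{\text{crit}}$ from Appendix Table E can also be done in code. The sketch below uses only the Python standard library and rests on a few assumptions. The helper names `chi2_sf` and `chi2_critical_value` are hypothetical. The right-tail area uses the closed forms of the chi-square distribution for integer degrees of freedom, and the critical value is found by bisection. In practice, a library routine such as SciPy's `scipy.stats.chi2.ppf(1 - alpha, df)` returns the same value.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        # Even df = 2m: exp(-y) * sum_{j=0}^{m-1} y^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df = 2m + 1: erfc(sqrt(y)) + exp(-y) * sum_{j=0}^{m-1} y^(j + 1/2) / Gamma(j + 3/2)
    total = math.erfc(math.sqrt(y))
    for j in range(df // 2):
        total += math.exp(-y) * y ** (j + 0.5) / math.gamma(j + 1.5)
    return total


def chi2_critical_value(alpha: float, df: int) -> float:
    """Find chi2_crit whose right-tail area equals alpha (0 < alpha < 1), by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until the tail area drops to alpha or below
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Usage: k = 3 samples gives df = k - 1 = 2.
print(round(chi2_critical_value(0.05, 2), 3))  # 5.991
```

Bisection works here because the right-tail area decreases steadily as $x$ grows, so the bracket always contains exactly one point where the area equals $\alpha$.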
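Step 3 Find the value of the test statistic $H$:

$$H = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N+1)$$

where $R_1, R_2, \ldots, R_k$ are the sums of the ranks for samples $1, 2, \ldots, k$ after the combined data have been ranked, and where $n_1, n_2, \ldots, n_k$ represent the sample sizes for samples $1, 2, \ldots, k$, respectively, and $N = n_1 + n_2 + \cdots + n_k$.

Step 4 State the conclusion and interpretation. If $H \ge \chi^2_{\text{crit}}$, reject $H_0$ and conclude that there is evidence that not all the population medians are equal. Otherwise, do not reject $H_0$; there is insufficient evidence that the population medians differ.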
EXAMPLE 17 Performing the Kruskal-Wallis test
Use the data in Example 16 to test whether the population median number of small businesses per city is the same in Florida, North Carolina, and Texas. Use the Kruskal-Wallis test with level of significance .
Solution
Each sample is independent and randomly selected, and each sample has at least five data values. Thus, the conditions for the Kruskal-Wallis test are met, and we may proceed with the hypothesis test.
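The calculation for this example can be checked with a short script. The script reuses the Example 16 data. It ranks the values directly, which is safe only because no ties occur. It also relies on the fact that for $k = 3$ samples (df $= 2$), the chi-square right-tail area beyond $H$ is exactly $e^{-H/2}$. The example's level of significance is not reproduced here, so the hypothetical `decide` function takes $\alpha$ as a parameter. Comparing the p-value with $\alpha$ gives the same decision as comparing $H$ with $\chi^2_{\text{crit}}$.

```python
import math

# Example 16 data (number of small businesses per city).
florida = [3718, 4948, 9489, 8771, 13729, 6865, 7184]
north_carolina = [4883, 5825, 2153, 3424, 2108]
texas = [8150, 4403, 3274, 2276, 3070, 3855]
samples = [florida, north_carolina, texas]

# Rank the combined data; a value-to-rank dict is valid only because there are no ties.
combined = sorted(x for s in samples for x in s)
rank_of = {value: r for r, value in enumerate(combined, start=1)}
rank_sums = [sum(rank_of[x] for x in s) for s in samples]  # R_1, R_2, R_3

# Test statistic H from the Kruskal-Wallis formula.
n_total = len(combined)
weighted = sum(r * r / len(s) for r, s in zip(rank_sums, samples))
h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# With df = k - 1 = 2, the chi-square right-tail area beyond H is exp(-H/2).
df = len(samples) - 1
p_value = math.exp(-h / 2)


def decide(alpha: float) -> str:
    """Reject H0 when H >= chi2_crit, which is equivalent to p_value <= alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


print(rank_sums, round(h, 3), round(p_value, 4))
```

The script prints `[96, 31, 44] 7.261 0.0265`. A p-value of about 0.0265 means $H_0$ would be rejected at any level of significance of at least 0.0265, such as 0.05. It would not be rejected at a smaller level, such as 0.01.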
NOW YOU CAN DO
Exercises 15–18.
EXAMPLE 18 Performing the Kruskal-Wallis test using technology
Recall the Chapter 12 Case Study, which investigated whether the amount of information a professor posts about himself or herself (that is, self-disclosure) on the online social network Facebook is related to student motivation.10 A professor constructed three different Facebook sites: one offering low self-disclosure, one offering medium self-disclosure, and one offering high self-disclosure. Study participants (students not enrolled in the professor's courses) were then randomly and independently assigned to access and browse one of the three Facebook sites, develop an impression of the professor, and complete the research questionnaire. Student motivation was measured using a set of 16 items, and the sum of the 16 items was calculated to form the total motivation score. Use technology and the Kruskal-Wallis test at level of significance to test whether the population median motivation scores are equal for the three types of Facebook pages (low, medium, and high self-disclosure). There were 43 students assigned to the low-disclosure page, 43 assigned to the medium-disclosure page, and 44 assigned to the high-disclosure page.
Solution
Each sample is independent and randomly selected, and there are at least five data values in each sample. Thus, the conditions for the Kruskal-Wallis test are met, and we may proceed with the hypothesis test.