OBJECTIVES By the end of this section, I will be able to …
The sign test that we learned about in Section 14.2 required only that the sample be randomly selected. However, because the requirements for performing the sign test are so minimal, the efficiency of the sign test may not be as high as the analyst would want it to be.
If the sample data are randomly selected and symmetric, however, the data analyst may apply the more efficient Wilcoxon signed rank test to two of the situations in which the sign test can be applied, namely, to test for a single population median and to test for the population median of the differences for matched-pair data.
1 Assessing the Symmetry of a Data Set
In Section 2.2, we learned that a distribution is symmetric if an axis of symmetry splits the image in half so that one side is the mirror image of the other. The distribution of women's heights in Figure 6 is approximately symmetric (exact symmetry is rarely achieved with real-world data). If the data were randomly selected, it would be appropriate to perform the Wilcoxon signed rank test on the data in Figure 6. On the other hand, the distributions shown in Figures 7 and 8 are not symmetric.
In Section 3.5, we learned that a boxplot is a convenient method for assessing the symmetry of a data set. Figure 7 shows the clearly right-skewed histogram of the number of calories per gram for the food items in the Nutrition data set. Note that the corresponding boxplot has a longer whisker on the right side, and that the median line is somewhat to the left of the center of the box.
For the left-skewed exam score data in Figure 8, the boxplot has a longer whisker on the left side, and the median line is somewhat to the right of center. Finally, for the symmetric fitness score data in Figure 9, the corresponding boxplot has whiskers of approximately equal length, and the median line is situated approximately in the center of the box.
Boxplot Criterion for Assessing Symmetry
A data set is symmetric when its corresponding boxplot has whiskers of approximately equal length, and the median line is situated approximately in the center of the box.
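The boxplot criterion can also be checked numerically. The sketch below is a minimal Python illustration, not the TI-83/84 procedure: the function name `boxplot_summary`, the 1.5 × IQR whisker rule, the quartile method, and both small data sets are assumptions made for illustration only.

```python
import statistics

def boxplot_summary(data):
    """Return whisker lengths and box-half widths for a rough symmetry check."""
    xs = sorted(data)
    # Quartile conventions vary between software packages; "inclusive" is one choice.
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers reach to the most extreme values that are not outliers.
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "left_whisker": q1 - inside[0],
        "right_whisker": inside[-1] - q3,
        "left_box": median - q1,    # median line to left edge of box
        "right_box": q3 - median,   # median line to right edge of box
    }

# Hypothetical data sets, invented for this illustration.
symmetric = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
right_skewed = boxplot_summary([1, 1, 2, 2, 3, 4, 6, 9, 14])
print(symmetric)
print(right_skewed)
```

For the first data set the two whiskers and the two halves of the box have equal length, which is the pattern the criterion describes. For the second, the right whisker and the right half of the box are longer, which is the pattern of a right-skewed data set.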
EXAMPLE 8 Assessing symmetry using the TI-83/84
In Example 18 in Chapter 9, we examined a random sample of young women who were admitted with the diagnosis of anorexia nervosa to the Toronto Hospital for Sick Children. Use the TI-83/84 to assess the symmetry of the ages of these women (data on page 526).
Solution
Figure 10 shows the TI-83/84 boxplot of the ages of the young women. The whiskers are of approximately equal length, and the median line is situated approximately in the center of the box. We therefore conclude that the age data are symmetric.
NOW YOU CAN DO
Exercises 11–14.
2 Wilcoxon Signed Rank Test for Matched-Pair Data from Two Dependent Samples
In Section 14.2, we performed the sign test to test for both a single population median and for the population median of the difference between two dependent samples. Similarly, in Section 14.3 we apply the Wilcoxon signed rank test for these two situations. We begin with the Wilcoxon signed rank test for matched-pair data from two dependent samples.
In the sign test, the data are converted into plus signs or minus signs. The magnitude of the data values is lost, which contributes to the low efficiency of the sign test. In 1945, Frank Wilcoxon developed the Wilcoxon signed rank test for a single population median, which takes the magnitude of the data into account by ranking the data values.
The Wilcoxon signed rank test is a nonparametric hypothesis test in which the original data are transformed into their ranks. The Wilcoxon signed rank test may be conducted for (a) a single population median or (b) matched-pair data from two dependent samples.
The following example illustrates how to calculate the signed ranks for the Wilcoxon signed rank test.
EXAMPLE 9 Calculating the Wilcoxon signed ranks
The California Community Colleges Chancellor's Office publishes enrollment data for each of its community colleges. Table 7 contains the number of students enrolled in the Spring 2013 and the Spring 2014 semesters at a random sample of six community colleges in California. We are interested in testing whether the median number of enrolled students has declined from 2013 to 2014. That is, we are interested in testing whether the population median of the differences (2014 – 2013) is less than zero.
Solution
Calculate the signed ranks as follows:
Community college | Enrollment Spring 2013 | Enrollment Spring 2014 | Difference $d$ (2014 – 2013) | $\lvert d \rvert$ | Rank of $\lvert d \rvert$ | Signed rank |
---|---|---|---|---|---|---|
Los Angeles CC | 148,754 | 148,362 | –392 | 392 | 3 | –3 |
Santa Monica CC | 31,719 | 31,437 | –282 | 282 | 2 | –2 |
El Camino CC | 22,657 | 22,791 | 134 | 134 | 1 | 1 |
North Orange CC | 51,780 | 53,993 | 2,213 | 2,213 | 5 | 5 |
Foothill CC | 34,415 | 33,574 | –841 | 841 | 4 | –4 |
Los Rios CC | 75,230 | 71,911 | –3,319 | 3,319 | 6 | –6 |
The sum of the positive signed ranks is $T_+ = 1 + 5 = 6$.
The sum of the negative signed ranks is $(-3) + (-2) + (-4) + (-6) = -15$, whose absolute value is $T_- = 15$.
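The signed-rank calculation for the enrollment data can be reproduced with a short Python sketch. The variable names are hypothetical, and the ranking shortcut assumes what holds for this sample: no difference equals zero and no two absolute differences are tied.

```python
# Enrollment figures for the six community colleges, in table order.
spring_2013 = [148754, 31719, 22657, 51780, 34415, 75230]
spring_2014 = [148362, 31437, 22791, 53993, 33574, 71911]

# Differences are taken as 2014 minus 2013.
diffs = [after - before for before, after in zip(spring_2013, spring_2014)]

# Rank the absolute differences from smallest (rank 1) to largest.
# This shortcut assumes no zero differences and no ties, as in this sample.
ordered = sorted(abs(d) for d in diffs)
signed_ranks = [(ordered.index(abs(d)) + 1) * (1 if d > 0 else -1) for d in diffs]

# Sum of the positive signed ranks, and absolute value of the sum of the negative ones.
sum_positive = sum(r for r in signed_ranks if r > 0)
sum_negative_abs = abs(sum(r for r in signed_ranks if r < 0))

print(diffs)
print(signed_ranks)
print(sum_positive, sum_negative_abs)
```

The two sums should always add up to $n(n+1)/2$, which is $6 \cdot 7 / 2 = 21$ here; this is a quick check on the ranking.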
The procedure for the Wilcoxon signed rank test for matched-pair data is summarized as follows.
Wilcoxon Signed Rank Test for Matched-Pair Data
The requirements are that the sample data be randomly selected and that the distribution of the differences be symmetric. It is not required that the population be normally distributed.
Step 1 State the hypotheses.
Choose one of the forms in Table 8.
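Null hypothesis | Alternative hypothesis | Type of test |
---|---|---|
$H_0: M_d = 0$ | $H_a: M_d > 0$ | Right-tailed test |
$H_0: M_d = 0$ | $H_a: M_d < 0$ | Left-tailed test |
$H_0: M_d = 0$ | $H_a: M_d \neq 0$ | Two-tailed test |

Here $M_d$ denotes the population median of the differences.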
Step 3 Find the value of the test statistic.
First find the signed ranks using the following steps:
Type of test | Test statistic |
---|---|
Right-tailed test | $T_{\text{test}} = T_-$ |
Left-tailed test | $T_{\text{test}} = T_+$ |
Two-tailed test | $T_{\text{test}} = T_+$ or $T_-$, whichever is smaller |
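The choice of test statistic by type of test can be written as a small helper. This is a sketch under a stated assumption: it follows the usual convention that a right-tailed test uses the negative-rank sum and a left-tailed test uses the positive-rank sum, while the two-tailed rule (whichever sum is smaller) is the one given in the table. The function name and the `tail` labels are hypothetical.

```python
def wilcoxon_test_statistic(sum_positive, sum_negative_abs, tail):
    """Select the Wilcoxon signed rank test statistic for the given type of test.

    sum_positive:      sum of the positive signed ranks
    sum_negative_abs:  absolute value of the sum of the negative signed ranks
    tail:              "right", "left", or "two"
    """
    if tail == "right":
        # Assumed convention: a small negative-rank sum supports a right-tailed alternative.
        return sum_negative_abs
    if tail == "left":
        # Assumed convention: a small positive-rank sum supports a left-tailed alternative.
        return sum_positive
    if tail == "two":
        # Two-tailed: use whichever sum is smaller.
        return min(sum_positive, sum_negative_abs)
    raise ValueError("tail must be 'right', 'left', or 'two'")

# Rank sums 6 and 15 come from the enrollment signed ranks (1 + 5 and 3 + 2 + 4 + 6).
print(wilcoxon_test_statistic(6, 15, "left"))
print(wilcoxon_test_statistic(6, 15, "two"))
```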
We illustrate the Wilcoxon signed rank test for the population median of the differences using the following example.
EXAMPLE 10 Wilcoxon signed rank test for matched-pair data
Use the data from Example 9 to test whether the population median number of enrolled students has decreased from 2013 to 2014, using level of significance .
Solution
Figure 11 is a TI-83/84 boxplot of the differences (2014 – 2013). The whiskers are approximately the same length, indicating symmetry. Thus, we have a random sample of data exhibiting acceptable symmetry, and so our conditions are met.
Step 1 State the hypotheses. We have a left-tailed test:
$H_0: M_d = 0$ versus $H_a: M_d < 0$
where $M_d$ represents the population median of the differences in number of enrolled students at California community colleges from 2013 to 2014.
Step 3 Find the value of the test statistic. The signed ranks are given in Table 7. We have a left-tailed test, so from Table 9, we have $T_{\text{test}} = T_+ = 1 + 5 = 6$.
NOW YOU CAN DO
Exercises 15–18.
3 Wilcoxon Signed Rank Test for a Single Population Median
We can use the same methods for the Wilcoxon signed rank test for a single population median that we used for the Wilcoxon signed rank test for matched-pair data. However, only one sample is involved, so no subtraction is necessary to find the differences. The hypotheses for the Wilcoxon signed rank test for a single population median are the same as those for the sign test for matched-pair data, given in Table 10.
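Null hypothesis | Alternative hypothesis | Type of test |
---|---|---|
$H_0: M = M_0$ | $H_a: M > M_0$ | Right-tailed test |
$H_0: M = M_0$ | $H_a: M < M_0$ | Left-tailed test |
$H_0: M = M_0$ | $H_a: M \neq M_0$ | Two-tailed test |

Here $M$ denotes the population median and $M_0$ its hypothesized value.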
We illustrate the small-sample case of the Wilcoxon signed rank test for a single population median using the following example.
EXAMPLE 11 Wilcoxon signed rank test for a single population median: small-sample case
The Web site www.missingkids.com provides a searchable database of missing children. The ages of the following six children were obtained from this database.
Child | Adam | Juan | Benjamin | Samantha | Kayleen | Aiko |
---|---|---|---|---|---|---|
Age | 4 | 9 | 5 | 7 | 6 | 3 |
Test, using level of significance , whether the population median age of the missing children equals 6 years old.
Solution
Step 1 State the hypotheses. We have a two-tailed test:
$H_0: M = 6$ versus $H_a: M \neq 6$
where $M$ represents the population median age of the missing children. Thus, the hypothesized value for the median is $M_0 = 6$.
Child | Age | $d = \text{Age} - 6$ | $\lvert d \rvert$ | Rank of $\lvert d \rvert$ | Signed rank |
---|---|---|---|---|---|
Adam | 4 | −2 | 2 | 3 | −3 |
Juan | 9 | 3 | 3 | 4.5 | 4.5 |
Benjamin | 5 | −1 | 1 | 1.5 | −1.5 |
Samantha | 7 | 1 | 1 | 1.5 | 1.5 |
Kayleen | 6 | 0 | — | — | — |
Aiko | 3 | −3 | 3 | 4.5 | −4.5 |
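Next, we need to sum the positive ranks and the negative ranks. There are two positive signed ranks: Juan's 4.5 and Samantha's 1.5. Thus, $T_+ = 4.5 + 1.5 = 6$. There are three negative signed ranks, which we add to get $(-3) + (-1.5) + (-4.5) = -9$. Taking the absolute value gives us $T_- = 9$. Table 9 tells us that, for a two-tailed test, $T_{\text{test}}$ is the smaller of $T_+$ and $T_-$. Thus, $T_{\text{test}} = 6$.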
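Two details of this example are easy to get wrong by hand: the data value equal to the hypothesized median is dropped, and tied absolute differences share the average of the ranks they occupy. The Python sketch below handles both. The function name `signed_ranks` and its parameters are hypothetical, and the code is an illustration rather than the procedure of any particular software package.

```python
def signed_ranks(values, hypothesized_median):
    """Signed ranks for a one-sample Wilcoxon signed rank test."""
    # Values equal to the hypothesized median give a zero difference and are omitted.
    diffs = [x - hypothesized_median for x in values if x != hypothesized_median]
    ordered = sorted(abs(d) for d in diffs)

    def average_rank(a):
        # Tied values occupy consecutive positions; each gets the average position.
        first = ordered.index(a) + 1
        last = first + ordered.count(a) - 1
        return (first + last) / 2

    return [average_rank(abs(d)) * (1 if d > 0 else -1) for d in diffs]

# Ages of Adam, Juan, Benjamin, Samantha, Kayleen, and Aiko.
ages = [4, 9, 5, 7, 6, 3]
ranks = signed_ranks(ages, 6)

sum_positive = sum(r for r in ranks if r > 0)
sum_negative_abs = abs(sum(r for r in ranks if r < 0))
two_tailed_statistic = min(sum_positive, sum_negative_abs)

print(ranks)
print(sum_positive, sum_negative_abs, two_tailed_statistic)
```

Kayleen's age of 6 is omitted, so only five signed ranks are returned, and the two rank sums add to $5 \cdot 6 / 2 = 15$.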
NOW YOU CAN DO
Exercises 19–22.
EXAMPLE 12 Large-sample Wilcoxon signed rank test for a population median using technology
Test using level of significance whether the population median age of missing children differs from 6 years old, using the random sample of 50 missing children shown here:
Child | Age | Child | Age | Child | Age | Child | Age |
---|---|---|---|---|---|---|---|
Amir | 5 | Carlos | 7 | Octavio | 8 | Christian | 8 |
Yamile | 5 | Ulisses | 6 | Keoni | 6 | Mario | 8 |
Kevin | 5 | Alexander | 7 | Lance | 5 | Reya | 5 |
Hilary | 8 | Adam | 4 | Mason | 5 | Elias | 1 |
Zitlalit | 7 | Sultan | 6 | Joaquin | 6 | Maurice | 4 |
Aleida | 8 | Abril | 6 | Adriana | 6 | Samantha | 7 |
Alexia | 2 | Ramon | 6 | Christopher | 3 | Michael | 9 |
Juan | 9 | Amari | 4 | Johan | 6 | Carlos | 2 |
Kevin | 2 | Joliet | 1 | Kassandra | 4 | Lukas | 4 |
Hazel | 5 | Christopher | 4 | Hiroki | 6 | Kayla | 4 |
Melissa | 1 | Jonathan | 8 | Kimberly | 5 | Aiko | 3 |
Kayleen | 6 | Emil | 7 | Diondre | 4 | Lorenzo | 9 |
Mirynda | 7 | Benjamin | 5 |
Solution
The boxplot of the age data is shown here.
The conditions are met because we have a random sample and the distribution of ages is symmetric.
Step 1 State the hypotheses.
$H_0: M = 6$ versus $H_a: M \neq 6$
where $M$ represents the population median age of the missing children.
Step 3 Find the value of the test statistic. We use the instructions provided in the Step-by-Step Technology Guide at the end of this section. Figure 14 shows the Minitab results, and Figure 15 shows the SPSS results, from the Wilcoxon signed rank test for the population median. Note that the original sample size (“N”) is 50, but that “N for Test” is $n = 50 - 10 = 40$, because the 10 data values equal to the hypothesized median of 6 have been omitted. The “Wilcoxon Statistic” is the value of $T$, which represents the smaller of $T_+$ and $T_-$. We use this value to find the test statistic:
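As a hedged sketch of this step, the Python below recomputes the rank sums from the 50 ages and then applies the standard large-sample normal approximation, $Z = \dfrac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$. That formula and all variable names are assumptions here. Software such as Minitab or SPSS may report a different rank sum or apply continuity or tie corrections, so its output need not match these figures exactly.

```python
import math
from collections import Counter

# Ages of the 50 missing children, read row by row from the table.
ages = [
    5, 7, 8, 8,  5, 6, 6, 8,  5, 7, 5, 5,  8, 4, 5, 1,
    7, 6, 6, 4,  8, 6, 6, 7,  2, 6, 3, 9,  9, 4, 6, 2,
    2, 1, 4, 4,  5, 4, 6, 4,  1, 8, 5, 3,  6, 7, 4, 9,
    7, 5,
]
hypothesized_median = 6

# Drop values equal to the hypothesized median; n is the "N for Test".
diffs = [a - hypothesized_median for a in ages if a != hypothesized_median]
n = len(diffs)

# Average rank for each distinct absolute difference (ties share a rank).
counts = Counter(abs(d) for d in diffs)
rank_of = {}
next_rank = 1
for a in sorted(counts):
    rank_of[a] = next_rank + (counts[a] - 1) / 2
    next_rank += counts[a]

sum_positive = sum(rank_of[abs(d)] for d in diffs if d > 0)
sum_negative_abs = sum(rank_of[abs(d)] for d in diffs if d < 0)
t = min(sum_positive, sum_negative_abs)

# Assumed large-sample approximation: no continuity or tie correction.
mean_t = n * (n + 1) / 4
sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (t - mean_t) / sd_t

print(n, sum_positive, sum_negative_abs, t)
print(round(z, 4))
```

Because the approximation is symmetric about $n(n+1)/4$, using the larger rank sum instead of the smaller one changes only the sign of $Z$, not its magnitude.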