OBJECTIVES By the end of this section, I will be able to …
Recall from Chapter 13 that one of the assumptions for the linear regression model was that the values of the response variable were independent. We checked this assumption using a scatterplot of the residuals against the fitted values; if systematic curvature was present, then the assumption was violated. Here, in Section 14.7, we learn a hypothesis test for checking this assumption, called the runs test for randomness.
1 Runs Test for Randomness
In contrast to the other sections in this chapter, in this section we look upon our data set as a sequence. The first observation is considered to occur before the second, which is before the third, and so on. That is, a sequence is an ordered data set.
Note that we are considering the data set to be a sequence (time-ordered) only for the application of the runs test for randomness. We are not suggesting that the data set itself is necessarily time-ordered.
The runs test for randomness helps us determine whether the data in the sequence are random or whether there is a pattern in the sequence. The runs test applies to data that have two possible outcomes (such as female or male) or data that can be reexpressed as one of two outcomes (such as correct or incorrect answers on a multiple-choice quiz). The runs test works by counting the number of runs in the data set.
A sequence is an ordered data set. A run is a sequence of observations sharing the same value (of two possible values), preceded or followed by data having the other possible value or by no data at all. The runs test for randomness tests whether the data in a sequence are random or whether there is a pattern in the sequence.
For example, suppose that we are noting the gender (F = female, M = male) of the first 16 students to enter your statistics classroom today as they walk in the door. Here are two possible sequences:
Sequence 1: | F | F | F | F | F | F | F | F | M | M | M | M | M | M | M | M |
Sequence 2: | F | M | F | M | F | M | F | M | F | M | F | M | F | M | F | M |
In the first sequence, there is a run of eight females, followed by a run of eight males. The eight females form a run because they represent a sequence of observations sharing the same value: F. Similarly, the eight males form a run. In Sequence 2, we note that the genders are alternating. The first data value F is followed immediately by an observation with a different value: M. Thus, the first data value itself forms a run. Similarly, each of the remaining observations forms a run of length 1.
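The following notation is used in conducting a runs test for randomness:
$n_1$ = the number of observations in the sequence having the first outcome
$n_2$ = the number of observations in the sequence having the second outcome
$n = n_1 + n_2$ = the total number of observations in the sequence
$G$ = the number of runs in the sequence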
EXAMPLE 23 Notation used for the runs test for randomness
The following sequence represents the genders of 20 students in a statistics class recorded as they enter the classroom:
F | F | M | M | M | F | F | F | M | F | F | F | M | M | F | F | M | F | F | M |
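Calculate the values of $n_1$, $n_2$, $n$, and $G$.
Solution
There are $n_1 = 12$ females and $n_2 = 8$ males, so that $n = n_1 + n_2 = 20$. Reading from left to right, the runs are FF, MMM, FFF, M, FFF, MM, FF, M, FF, and M, so there are $G = 10$ runs.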
NOW YOU CAN DO
Exercises 5–8.
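The counting in Example 23 can be checked with a short Python sketch. The function name `summarize_runs` and the choice to treat the alphabetically first outcome as the "first" outcome are assumptions made for illustration only; they are not part of the text's notation.

```python
from itertools import groupby


def summarize_runs(sequence):
    """Return (n1, n2, n, G) for a sequence with exactly two distinct outcomes.

    Assumption for this sketch: the alphabetically first outcome is counted as n1.
    """
    outcomes = sorted(set(sequence))
    if len(outcomes) != 2:
        raise ValueError("the runs test needs exactly two distinct outcomes")
    first, second = outcomes
    n1 = sequence.count(first)
    n2 = sequence.count(second)
    # groupby yields one group per maximal block of equal adjacent values,
    # which is exactly one group per run.
    g = sum(1 for _ in groupby(sequence))
    return n1, n2, n1 + n2, g


# The 20 genders from Example 23, in order of arrival.
genders = list("FFMMMFFFMFFFMMFFMFFM")
n1, n2, n, g = summarize_runs(genders)
print(n1, n2, n, g)  # 12 8 20 10
```

Applied to Sequence 1 and Sequence 2 from earlier in the section, the same function reports 2 runs and 16 runs, respectively.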
If the number of runs is too low or too high, this is evidence that a pattern exists in the data set. If the number of runs is neither too high nor too low, then there is insufficient evidence of a time-ordered pattern, and the data set may be considered random. Thus, the runs test for randomness tests whether the number of runs is either too high or too low. The test statistic and the critical values each have a small-sample case and a large-sample case, as shown in the following steps.
Runs Test for Randomness
Two conditions are necessary for the runs test: (a) the data are ordered, and (b) each data value represents one of two distinct outcomes (such as female or male).
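Step 1 State the hypotheses.
$H_0$: The sequence is random. $H_a$: The sequence is not random.
Step 2 Find the critical values, and state the rejection rule.
Small-Sample Case ($n_1 \le 20$, $n_2 \le 20$, and level of significance $\alpha = 0.05$): Use Appendix Table L. Note that the table is applicable only for level of significance $\alpha = 0.05$. Find the row with the appropriate value of $n_1$ and the column with the appropriate value of $n_2$. The two values at the intersection of this row and column represent the lower critical value and the upper critical value. The rejection rule is to reject $H_0$ if the number of runs $G$ is less than or equal to the lower critical value or if $G$ is greater than or equal to the upper critical value.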
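Large-Sample Case ($n_1 > 20$ or $n_2 > 20$, or a level of significance other than 0.05): Use the standard normal critical value $Z_{\text{crit}}$ from the following table, where $Z$ denotes the test statistic.

Level of significance α | Critical value $Z_{\text{crit}}$ | Rejection rule |
---|---|---|
0.10 | 1.645 | Reject $H_0$ if $Z \le -1.645$ or if $Z \ge 1.645$ |
0.05 | 1.96 | Reject $H_0$ if $Z \le -1.96$ or if $Z \ge 1.96$ |
0.01 | 2.58 | Reject $H_0$ if $Z \le -2.58$ or if $Z \ge 2.58$ |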
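Step 3 Find the value of the test statistic.
Small-Sample Case: The test statistic is the number of runs $G$.
Large-Sample Case: First find the mean and the standard deviation of the number of runs:

$$\mu_G = \frac{2 n_1 n_2}{n} + 1, \qquad \sigma_G = \sqrt{\frac{2 n_1 n_2 \,(2 n_1 n_2 - n)}{n^2 (n - 1)}}, \qquad \text{where } n = n_1 + n_2.$$

Finally, the test statistic is

$$Z = \frac{G - \mu_G}{\sigma_G}.$$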
Step 4 State the conclusion and the interpretation.
Compare the test statistic with the critical value, using the rejection rule.
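The large-sample version of Steps 2 through 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the text's own procedure: the function name `large_sample_runs_test` is hypothetical, the mean and standard deviation are the standard large-sample formulas for the number of runs (spelled out in the comments), and the counts in the usage lines are made up.

```python
import math

# Two-tailed standard normal critical values, as listed in the table in Step 2.
Z_CRIT = {0.10: 1.645, 0.05: 1.96, 0.01: 2.58}


def large_sample_runs_test(n1, n2, g, alpha=0.05):
    """Return (z, z_crit, reject) for the large-sample runs test.

    n1, n2: counts of the two outcomes; g: observed number of runs.
    """
    n = n1 + n2
    # Mean number of runs under randomness: 2*n1*n2/n + 1
    mu_g = 2 * n1 * n2 / n + 1
    # Standard deviation: sqrt(2*n1*n2*(2*n1*n2 - n) / (n^2 * (n - 1)))
    sigma_g = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1)))
    z = (g - mu_g) / sigma_g
    z_crit = Z_CRIT[alpha]
    # Two-tailed rule: too few runs (z very negative) or too many (z very positive).
    reject = z <= -z_crit or z >= z_crit
    return z, z_crit, reject


# Hypothetical counts, chosen only to exercise the function.
z, z_crit, reject = large_sample_runs_test(n1=25, n2=25, g=10, alpha=0.05)
print(round(z, 3), z_crit, reject)
```

With these made-up counts the number of runs is far below its expected value of 26, so the sketch rejects the null hypothesis of randomness.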
EXAMPLE 24 Conducting the runs test for randomness
Test whether the sequence from Example 23 is random by conducting the runs test for randomness, using level of significance $\alpha = 0.05$.
Solution
We know that the data are time-ordered, and that each data value represents one of two distinct outcomes. We may thus proceed with the hypothesis test.
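Step 1 State the hypotheses.
$H_0$: The sequence of genders is random. $H_a$: The sequence of genders is not random.
Step 2 Find the critical values, and state the rejection rule. We have $n_1 = 12$ females and $n_2 = 8$ males, so the small-sample case applies ($n_1 \le 20$ and $n_2 \le 20$). In Appendix Table L we find the row with $n_1 = 12$ and the column with $n_2 = 8$, giving us the lower and upper critical values (see Figure 26). We will reject $H_0$ if the number of runs $G$ is less than or equal to the lower critical value or if $G$ is greater than or equal to the upper critical value.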
NOW YOU CAN DO
Exercises 9–20.
The runs test may also be used for numerical data, as long as the numerical data are classified into two categories, as shown in the following example.
EXAMPLE 25 Runs test for randomness of numerical data classified into categories
The weather station at the University of Missouri at Columbia publishes daily information on the amount of rain that falls at Sanborn Field at the university. The following 62 observations represent the daily rainfall information for the months of July and August 2008. For example, on July 1 the weather station reported 0.00 inch of rain, and on July 2 the weather station reported 0.37 inch of rain. We categorize each day's rainfall as follows: N = no rain falling, and R = some rain falling. Test whether the sequence is random by conducting the runs test for randomness, using level of significance .
N | R | R | N | N | N | N | R | R | N | N | R | N | N | N | N | N | N | N | N | N | R | N | R | R | N | R | R | N | R | R |
N | N | N | N | N | N | N | N | N | N | N | R | N | R | N | N | N | N | N | R | R | R | N | N | N | N | N | R | N | N | N |
Solution
The data are ordered, because they are arranged from July 1 to August 31, 2008. Also, each data value represents one of two distinct outcomes: some rain or no rain. We may thus proceed with the hypothesis test.
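Step 1 State the hypotheses.
$H_0$: The sequence of rain and no-rain days is random. $H_a$: The sequence of rain and no-rain days is not random.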
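Step 3 Find the value of the test statistic. We have 44 days with no rain and 18 days with some rain, so that $n = 62$, and there are $G = 23$ runs. Then

$$\mu_G = \frac{2(44)(18)}{62} + 1 \approx 26.548, \qquad \sigma_G = \sqrt{\frac{2(44)(18)\,[2(44)(18) - 62]}{62^2 (62 - 1)}} \approx 3.206.$$

Finally, the test statistic is

$$Z = \frac{G - \mu_G}{\sigma_G} \approx \frac{23 - 26.548}{3.206} \approx -1.107.$$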
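The counts and the large-sample test statistic for the rainfall sequence can be checked with the following Python sketch. The variable names are illustrative, the strings are a transcription of the two data rows in this example (split into ten-day chunks for easier proofreading), and the formulas are the standard large-sample mean and standard deviation of the number of runs.

```python
import math
from itertools import groupby

# N = no rain, R = some rain; one character per day, in calendar order.
july = "NRRNNNNRRN" "NRNNNNNNNN" "NRNRRNRRNR" "R"
august = "NNNNNNNNNN" "NRNRNNNNNR" "RRNNNNNRNN" "N"
days = july + august

n_no_rain = days.count("N")
n_rain = days.count("R")
n = n_no_rain + n_rain
# One group per maximal block of identical adjacent days, i.e. one per run.
g = sum(1 for _ in groupby(days))

mu_g = 2 * n_no_rain * n_rain / n + 1
sigma_g = math.sqrt(
    2 * n_no_rain * n_rain * (2 * n_no_rain * n_rain - n) / (n ** 2 * (n - 1))
)
z = (g - mu_g) / sigma_g

print(n_no_rain, n_rain, n, g)
print(round(mu_g, 3), round(sigma_g, 3), round(z, 3))
```

The formulas are symmetric in the two counts, so it does not matter which outcome is labeled as the first one.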
The runs test for randomness may also be used to test the independence assumption for linear regression data, as shown in the following example. The important thing to remember is that the runs test should be applied to the residuals, which are ordered by the size of the fits ($\hat{y}$).
EXAMPLE 26 Using the runs test for linear regression
Consider the following ordered bivariate data set and the accompanying scatterplot (Figure 27). We are interested in performing linear regression of the $y$ variable on the $x$ variable. Make a scatterplot of the residuals versus the fits $\hat{y}$. Classify the residuals as being either positive (P) or negative (N). Then evaluate the independence assumption for the linear regression model by performing the runs test for randomness on the residuals, ordered by the fits.
$x$ | $y$ | $x$ | $y$ |
---|---|---|---|
0.0 | 1.00000 | 3.3 | −0.98748 |
0.3 | 0.95534 | 3.6 | −0.89676 |
0.6 | 0.82534 | 3.9 | −0.72593 |
0.9 | 0.62161 | 4.2 | −0.49026 |
1.2 | 0.36236 | 4.5 | −0.21080 |
1.5 | 0.07074 | 4.8 | 0.08750 |
1.8 | −0.22720 | 5.1 | 0.37798 |
2.1 | −0.50485 | 5.4 | 0.63469 |
2.4 | −0.73739 | 5.7 | 0.83471 |
2.7 | −0.90407 | 6.0 | 0.96017 |
3.0 | −0.98999 | 6.3 | 0.99986 |
Solution
The scatterplot of the residuals versus the fits is shown in Figure 28.
What Results Might We Expect?
When applied to linear regression analysis, the runs test for randomness tests whether a pattern exists in the residuals. Do you observe a pattern in the scatterplot of the residuals (Figure 28)? If so, then what might we expect our conclusion to be for the runs test? Yes, there appears to be a descending and then ascending pattern in the data. (In fact, can you discern the exact relationship between $x$ and $y$?) Thus, we expect to reject the null hypothesis that the data are random.
By examining Figure 28, we can classify the residuals from left to right as positive or negative, giving us:
P | P | P | P | P | P | N | N | N | N | N | N | N | N | N | N | P | P | P | P | P | P |
The residuals are ordered by the size of the fits, and we have classified each residual into one of two distinct outcomes. Thus, we may proceed with the hypothesis test.
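The classification of the residuals can also be reproduced numerically rather than read from the scatterplot. The sketch below fits the least-squares line to the data from this example using the usual slope and intercept formulas, orders the residuals by the fits, and labels each one P or N. The variable names are illustrative assumptions, and treating a residual of exactly zero as N is a convention chosen here, since the text does not address ties.

```python
from itertools import groupby

# The 22 (x, y) pairs from the data table in this example.
xs = [0.0, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3.0,
      3.3, 3.6, 3.9, 4.2, 4.5, 4.8, 5.1, 5.4, 5.7, 6.0, 6.3]
ys = [1.00000, 0.95534, 0.82534, 0.62161, 0.36236, 0.07074,
      -0.22720, -0.50485, -0.73739, -0.90407, -0.98999,
      -0.98748, -0.89676, -0.72593, -0.49026, -0.21080,
      0.08750, 0.37798, 0.63469, 0.83471, 0.96017, 0.99986]

# Least-squares slope and intercept for regressing y on x.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

fits = [intercept + slope * x for x in xs]
residuals = [y - fit for y, fit in zip(ys, fits)]

# Order the residuals by the size of the fits, then classify by sign.
ordered = [res for _, res in sorted(zip(fits, residuals), key=lambda pair: pair[0])]
signs = "".join("P" if res > 0 else "N" for res in ordered)

g = sum(1 for _ in groupby(signs))  # number of runs
print(signs)
print(signs.count("P"), signs.count("N"), g)  # 12 10 3
```

The printed sequence matches the classification read from Figure 28: six positive residuals, ten negative residuals, and six positive residuals, for only three runs in total.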
By the way, have you guessed the equation of the pattern shown in Figures 27 and 28? The relationship between $x$ and $y$ is $y = \cos x$.