11.1 Goodness of Fit Test

OBJECTIVES By the end of this section, I will be able to …

  1. Explain what a multinomial random variable is and how to calculate expected frequencies.
  2. Describe how a goodness of fit test works.
  3. Perform and interpret the results from the goodness of fit test using the critical-value method and the p-value method.

According to the Adobe Digital Index, the market share for the leading Internet browsers (both desktop and mobile) in June 2014 was as follows: Google Chrome, 32%; Microsoft Internet Explorer, 31%; others, 37%. Change is rapid in the online environment. Have these market shares changed since June 2014? How would we go about performing a hypothesis test to determine whether market shares have changed significantly? In Section 11.1, we examine this question using a new type of hypothesis test called a goodness of fit test. We begin by first considering a new type of random variable that is used to represent categorical data.

1 The Multinomial Random Variable

Recall from Chapter 1 that categorical (qualitative) variables take values that can be classified into categories. In Chapter 6, we considered binomial random variables, for which there are only two possible outcomes. Now, let's consider the following type of random variable, which can have more than two possible values.

Multinomial Random Variable

A random variable is multinomial if it satisfies each of the following conditions:

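  • Each independent trial of the experiment has $k$ possible outcomes, where $k \geq 2$.
  • The ith outcome (category) occurs with probability $p_i$, where $i = 1, 2, \ldots, k$ (that is, $p_i$ is the population proportion for category $i$).
  • $\sum_{i=1}^{k} p_i = p_1 + p_2 + \cdots + p_k = 1$ (Law of Total Probability).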

Data from a multinomial random variable are said to follow a multinomial distribution.

Note: The binomial distribution may be considered a special case of the multinomial distribution, with $k = 2$.

For example, suppose 30% of the residents of a particular town are Democrats, 30% are Republicans, and 40% are Independents. If we select residents at random, then the number of Democrats, Republicans, and Independents observed follows a multinomial distribution, with $k = 3$, $p_{\text{Democrat}} = 0.30$, $p_{\text{Republican}} = 0.30$, and $p_{\text{Independent}} = 0.40$.

EXAMPLE 1 Identifying a Multinomial Random Variable

For each of the following, determine whether the random variable is multinomial.

  1. We select 10 students at random and define our random variable to be the amount of time the student used a Web browser yesterday.
  2. We select 10 students at random and define our random variable to be the browser used most by the student the last time he or she was on the Internet, where the possible values are Google Chrome, Microsoft Internet Explorer, or Other.

Solution

  1. The amount of time spent using a browser is a continuous random variable, not categorical. So, this random variable cannot be multinomial.
  2. The browser is categorical. We have $k = 3$ different categories, with the population proportions of the three categories adding up to one (see part 1 of Example 2 below). Therefore, this random variable is multinomial.

NOW YOU CAN DO

Exercises 5–8

Next, recall from Section 6.2 that the formula for finding the expected value (mean) of a binomial random variable having $n$ trials and probability of success $p$ is

$$E(X) = n \cdot p$$

For a multinomial random variable, the expected frequency of the ith category is

$$E_i = n \cdot p_i$$

where $n$ represents the number of trials, and $p_i$ represents the population proportion for the ith category.

EXAMPLE 2 Finding the expected frequencies

According to the Adobe Digital Index, the market share for the leading Internet browsers (both desktop and mobile) in June 2014 was as shown in Table 1. Let $X$ be the browser of a randomly selected Internet user.

  1. Verify that $X$ is a valid multinomial random variable.
  2. Find the expected frequency for each category in a series of 200 trials.

Table 1 Distribution of browser market share

| Browser | Relative frequency |
| --- | --- |
| Google Chrome | 0.32 |
| Microsoft Internet Explorer | 0.31 |
| Other | 0.37 |

Solution

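  1. There are $k = 3$ possible outcomes: Google Chrome, Microsoft Internet Explorer, and Other. Assigning probabilities using the relative frequency method, we have the following hypothesized proportions for each browser:

    $p_{\text{Chrome}} = 0.32$, $p_{\text{IE}} = 0.31$, and $p_{\text{Other}} = 0.37$

    These proportions sum to one: $0.32 + 0.31 + 0.37 = 1$. Therefore, $X$ is a valid multinomial random variable.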

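  2. We have $n = 200$ trials, so the expected frequencies are as provided in Table 2.

Table 2 Expected frequencies for browser preference in a sample of size 200

| Category | $p_i$ | Expected frequency $E_i = n \cdot p_i$ |
| --- | --- | --- |
| Google Chrome | 0.32 | $200 \cdot 0.32 = 64$ |
| Microsoft Internet Explorer (IE) | 0.31 | $200 \cdot 0.31 = 62$ |
| Other | 0.37 | $200 \cdot 0.37 = 74$ |

As a check on the calculations, we should have $\sum E_i = n$. In this case, $64 + 62 + 74 = 200$.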
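
The expected-frequency calculation in Example 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `expected_frequencies` and the dictionary layout are hypothetical choices, not part of the text; the proportions and the number of trials come from Table 1 and Example 2.

```python
import math

# Hypothesized proportions from Table 1 (June 2014 market shares).
proportions = {
    "Google Chrome": 0.32,
    "Microsoft Internet Explorer": 0.31,
    "Other": 0.37,
}
n = 200  # number of trials in Example 2


def expected_frequencies(n, proportions):
    """Return the expected frequency E_i = n * p_i for every category."""
    return {category: n * p for category, p in proportions.items()}


expected = expected_frequencies(n, proportions)

# Checks from the text: the proportions sum to 1, and the expected
# frequencies sum to the number of trials n.
assert math.isclose(sum(proportions.values()), 1.0)
assert math.isclose(sum(expected.values()), n)

for category, e in expected.items():
    print(f"{category}: {e:.0f}")
```

The two assertions mirror the checks made in the solution: a valid multinomial random variable has proportions that sum to one, and the expected frequencies must add up to $n$.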

NOW YOU CAN DO

Exercises 9–12.

YOUR TURN #1

Publishers Weekly reported that, in 2014, the book format market share was as follows: paperbacks, 41%; hard covers, 34%; e-books, 13%; and all other formats, 12%. Suppose a survey was conducted this year of 2000 books purchased.

  1. Verify that $X$ = book format is a valid multinomial random variable.
  2. Find the expected frequency for each category.

(The solutions are shown in Appendix A.)

What Do These Expected Frequencies Mean?

Recall that the expected value of a random variable refers to the long-run mean of that random variable after an arbitrarily large number of trials. For example, if we repeatedly took samples of 200 Internet users and asked about browser preference, the mean number of persons who used Google Chrome would approach $E_{\text{Chrome}} = 200 \cdot 0.32 = 64$ as we took more and more different samples, if the proportions given in Table 1 are correct. Similarly, because 31% of the entire population of Internet users use Microsoft IE, we would expect about 31% of any given sample of 200 Internet users to use Microsoft IE, because the sample is a subset of the population. This of course raises the question: Are the proportions in Table 1 still true? That is the type of question we will learn how to address next.
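
The long-run interpretation can be illustrated with a small simulation. The sketch below assumes the Table 1 proportions are correct and draws repeated samples of 200 users; the seed and the number of repeated samples (`num_samples`) are arbitrary, hypothetical choices made only for this illustration.

```python
import random

random.seed(11)  # arbitrary seed so the run is repeatable

categories = ["Chrome", "IE", "Other"]
weights = [0.32, 0.31, 0.37]  # Table 1 proportions
n = 200                       # Internet users per sample
num_samples = 2000            # hypothetical number of repeated samples

chrome_counts = []
for _ in range(num_samples):
    # Draw one sample of n users according to the hypothesized proportions.
    sample = random.choices(categories, weights=weights, k=n)
    chrome_counts.append(sample.count("Chrome"))

# The mean count across samples should settle near n * p = 200 * 0.32 = 64.
mean_chrome = sum(chrome_counts) / num_samples
print(f"Mean Chrome count over {num_samples} samples: {mean_chrome:.2f}")
```

Any single sample can differ noticeably from 64 Chrome users; it is the mean across many samples that approaches the expected frequency.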

2 What Is a Goodness of Fit Test?

Do the 2014 market shares still hold true today? In other words, has the distribution of the multinomial random variable browser given in Table 1 changed since June 2014? To determine this, we introduce a new type of hypothesis test, called a goodness of fit test.

Goodness of Fit Test

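A goodness of fit test is a hypothesis test used to determine whether a random variable follows a particular distribution. In a goodness of fit test, the hypotheses are

  • $H_0$: The random variable follows a particular distribution.
  • $H_a$: The random variable does not follow the distribution specified in $H_0$.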

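For Example 2, the null hypothesis completely specifies each of the probabilities in the relative frequency distribution, as follows:

$H_0$: $p_{\text{Chrome}} = 0.32$, $p_{\text{IE}} = 0.31$, $p_{\text{Other}} = 0.37$

The alternative hypothesis simply denies the claim made by the null hypothesis:

$H_a$: The random variable does not follow the distribution specified in $H_0$.

In other words, $H_a$ claims that the browser market shares have changed since June 2014.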

Developing Your Statistical Sense

Fitting the Model to the Data

Now, a goodness of fit test sounds like something you do in a clothing store dressing room. Actually, the analogy to clothes is rather appropriate. Suppose winter is coming and you are in the market for a new pair of gloves. You find one pair that is especially attractive, but the gloves don't fit your hands. What do you do? You reject the ill-fitting gloves and search for a new pair. In statistics, the gloves represent the models and your hands represent the actual “hard data” observed in the sample.

The null hypothesis represents what is called a model, a working theory of how the population proportions are distributed. Our working model of how the market shares are distributed is stated in the null hypothesis:

Model 1. $H_0$: $p_{\text{Chrome}} = 0.32$, $p_{\text{IE}} = 0.31$, $p_{\text{Other}} = 0.37$

Of course, we could also try other models if we think the market has changed, such as the following:

Model 2.

Model 3.

In hypothesis testing, we “try on” only one model at a time.

In statistics, a goodness of fit test determines if the actual “hard data” observed in the sample are consistent with the proportions stated in the null hypothesis. Market researchers would collect data on the actual preferences of a sample of 200 real Internet users in order to determine whether or not the market shares have changed. The sample is summarized in a set of observed frequencies of Internet users who prefer the various browsers. The goodness of fit test then compares these observed frequencies with the expected frequencies found in Example 2.

How a Goodness of Fit Test Works

The goodness of fit test is based on a comparison of the observed frequencies (sample data) with the expected frequencies when $H_0$ is true. That is, we compare what we actually see with what we would expect to see if $H_0$ were true. If the difference between the observed and expected frequencies is large, we reject $H_0$.

The difference between the observed and expected frequencies is measured by the test statistic, $\chi^2_{\text{data}}$. As usual, it comes down to how large a difference is large.

Test Statistic for the Goodness of Fit Test

For a multinomial random variable with $k$ categories and $n$ trials, let $O_i$ represent the observed frequency for category $i$, and let $E_i = n \cdot p_i$ represent the expected frequency for category $i$. Then the test statistic for a goodness of fit test

$$\chi^2_{\text{data}} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

approximately follows a $\chi^2$ (chi-square) distribution with $k - 1$ degrees of freedom (df), if the following conditions are satisfied:

  1. None of the expected frequencies is less than 1.
  2. At most, 20% of the expected frequencies are less than 5.

Students may want to review the characteristics of the $\chi^2$ distribution (Chapter 10, page 618) and the procedure for finding $\chi^2$ critical values for a right-tailed test (Chapter 10, page 620).

If the conditions are not satisfied, then it may be possible to combine two or more categories so that the conditions may then be fulfilled.
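
The two conditions can be checked mechanically. The following is a minimal sketch; the function name `conditions_met` and the second and third sets of expected frequencies are hypothetical, chosen only to show each condition failing.

```python
def conditions_met(expected):
    """Check the two conditions for the chi-square goodness of fit test.

    Condition 1: none of the expected frequencies is less than 1.
    Condition 2: at most 20% of the expected frequencies are less than 5.
    """
    none_below_1 = all(e >= 1 for e in expected)
    share_below_5 = sum(1 for e in expected if e < 5) / len(expected)
    return none_below_1 and share_below_5 <= 0.20


print(conditions_met([64, 62, 74]))     # expected frequencies from Example 2
print(conditions_met([0.5, 40, 59.5]))  # hypothetical: one value below 1
print(conditions_met([3, 4, 43, 50]))   # hypothetical: half the values below 5
```

When a check like this fails, combining sparse categories (and adding their expected frequencies) is one way to obtain a set of categories that passes.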

EXAMPLE 3 Calculating $\chi^2_{\text{data}}$

Suppose the observed frequencies of browser preference in Table 3 come from a survey taken this year of 200 Internet users.

Table 3 Observed frequencies of browser preference in a sample of 200 Internet users

| Browser | Observed frequency |
| --- | --- |
| Google Chrome | 80 |
| Microsoft Internet Explorer | 62 |
| Other | 58 |

Calculate the test statistic $\chi^2_{\text{data}}$ by comparing the observed frequencies from Table 3 with the expected frequencies calculated in Table 2 of Example 2.

Solution

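The observed frequencies are found in Table 3, and the expected frequencies are given in Table 2. Table 4 then provides the quantities needed to calculate $\chi^2_{\text{data}}$. Then

$$\chi^2_{\text{data}} = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{256}{64} + \frac{0}{62} + \frac{256}{74} \approx 4 + 0 + 3.4595 = 7.4595$$

Table 4 Calculating $\chi^2_{\text{data}}$

| Category | $p_i$ | $O_i$ | $E_i$ | $O_i - E_i$ | $(O_i - E_i)^2$ | $(O_i - E_i)^2 / E_i$ |
| --- | --- | --- | --- | --- | --- | --- |
| Chrome | 0.32 | 80 | 64 | 16 | 256 | 4 |
| IE | 0.31 | 62 | 62 | 0 | 0 | 0 |
| Other | 0.37 | 58 | 74 | −16 | 256 | 3.4595 |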
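
The arithmetic in this example can be reproduced with a short Python sketch. The function name `chi_square_statistic` is a hypothetical choice; the observed frequencies come from Table 3, and the expected frequencies are $200 \cdot p_i$ using the proportions in Table 1.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [80, 62, 58]  # Table 3: Chrome, IE, Other
expected = [64, 62, 74]  # 200 * (0.32, 0.31, 0.37)

chi2_data = chi_square_statistic(observed, expected)
print(round(chi2_data, 4))
```

Note that every term in the sum is nonnegative, because each difference $O_i - E_i$ is squared before it is divided by $E_i$.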

NOW YOU CAN DO

Exercises 13–18.

YOUR TURN #2

Publishers Weekly reported that, in 2014, the book format market share was as follows: paperbacks, 41%; hard covers, 34%; e-books, 13%; and all other formats, 12%. Suppose a survey was conducted this year of 2000 books purchased, with the following book sales: 810 paperbacks, 680 hard covers, 280 e-books, and 230 others. Calculate the test statistic $\chi^2_{\text{data}}$.

(The solution is shown in Appendix A.)

3 Performing the Goodness of Fit Test

The goodness of fit test may be performed using (a) the critical-value method or (b) the p-value method. We start with the critical value method.

Goodness of Fit Test: Critical-Value Method

  • Step 1 State the hypotheses and check the conditions.
  • The null hypothesis states that the multinomial random variable follows a particular distribution.
  • The alternative hypothesis states that the random variable does not follow that distribution.

The following conditions must be met:

  1. None of the expected frequencies is less than 1.
  2. At most, 20% of the expected frequencies are less than 5.

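The expected frequency for the ith category is $E_i = n \cdot p_i$, where $n$ represents the number of trials and $p_i$ represents the population proportion for the ith category.

  • Step 2 Find the critical value, $\chi^2_{\text{crit}}$, and state the rejection rule. Use Table E in the Appendix. Reject $H_0$ if $\chi^2_{\text{data}} > \chi^2_{\text{crit}}$.
  • Step 3 Calculate $\chi^2_{\text{data}}$.

    $$\chi^2_{\text{data}} = \sum \frac{(O_i - E_i)^2}{E_i}$$

    where $O_i$ = observed frequency, and $E_i$ = expected frequency.

  • Step 4 State the conclusion and the interpretation. Compare $\chi^2_{\text{data}}$ with $\chi^2_{\text{crit}}$.

All hypothesis tests in this chapter are right-tailed tests, so we need only the critical value $\chi^2_{\text{crit}}$ that has area $\alpha$ to its right.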

EXAMPLE 4 Critical-value method for the goodness of fit test

Test whether the Internet browser market shares from Example 2 have changed since June 2014, using level of significance $\alpha = 0.05$.

Solution

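  • Step 1 State the hypotheses and check the conditions. The hypotheses are:

    $H_0$: $p_{\text{Chrome}} = 0.32$, $p_{\text{IE}} = 0.31$, $p_{\text{Other}} = 0.37$

    $H_a$: The random variable does not follow the distribution specified in $H_0$.

    Checking the conditions, the expected frequencies from Table 2 are

    $E_{\text{Chrome}} = 64$, $E_{\text{IE}} = 62$, and $E_{\text{Other}} = 74$

    Because none of these expected frequencies is less than 1, and none of the expected frequencies is less than 5, the conditions for performing the goodness of fit test are satisfied.

  • Step 2 Find the critical value, $\chi^2_{\text{crit}}$, and state the rejection rule. We have $k = 3$ categories, so the degrees of freedom are $df = k - 1 = 3 - 1 = 2$. Turning to the $\chi^2$ table (Table E in the Appendix) in the column labeled $\chi^2_{0.05}$ and the row containing $df = 2$, we find $\chi^2_{\text{crit}} = 5.991$, as shown in Figure 1. The rejection rule is “Reject $H_0$ if $\chi^2_{\text{data}} > 5.991$.”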

    FIGURE 1 Finding the $\chi^2$ critical value for $df = 2$ and level of significance $\alpha = 0.05$.
  • Step 3 From Example 3, we have $\chi^2_{\text{data}} = 7.4595$.
  • Step 4 State the conclusion and the interpretation. Compare $\chi^2_{\text{data}}$ with $\chi^2_{\text{crit}}$. Here $\chi^2_{\text{data}} = 7.4595$ is greater than $\chi^2_{\text{crit}} = 5.991$, as shown in Figure 2. Therefore, we reject $H_0$.
    FIGURE 2 Reject $H_0$ when $\chi^2_{\text{data}} > \chi^2_{\text{crit}}$.

Evidence exists at level of significance $\alpha = 0.05$ that the random variable browser does not follow the distribution specified in $H_0$. In other words, evidence exists that the market shares for Internet browsers have changed.
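
The critical-value method for this example can be sketched in Python without a statistics library. The sketch rests on two assumptions that should be read as such: the level of significance is taken to be 0.05, and the critical value is obtained from a closed form that holds only for 2 degrees of freedom, where the area to the right of $x$ under the $\chi^2$ curve is $e^{-x/2}$, so that $\chi^2_{\text{crit}} = -2 \ln \alpha$. For other degrees of freedom, a table such as Table E or statistical software is needed.

```python
import math

alpha = 0.05  # assumed level of significance for this sketch
observed = [80, 62, 58]  # Table 3: Chrome, IE, Other
expected = [64, 62, 74]  # 200 * (0.32, 0.31, 0.37)

# Degrees of freedom: number of categories minus one.
df = len(observed) - 1
assert df == 2, "the closed form below is valid only for df = 2"

# For df = 2 the right-tail area beyond x is exp(-x / 2),
# so the critical value with area alpha to its right is -2 * ln(alpha).
chi2_crit = -2 * math.log(alpha)

# Test statistic: sum of (O_i - E_i)^2 / E_i.
chi2_data = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

reject_h0 = chi2_data > chi2_crit
print(f"chi2_crit = {chi2_crit:.3f}, chi2_data = {chi2_data:.4f}, reject H0: {reject_h0}")
```

The comparison reproduces the decision reached above: the test statistic lies to the right of the critical value, so the null hypothesis is rejected.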

NOW YOU CAN DO

Exercises 19–22.

YOUR TURN #3

Test, using level of significance $\alpha = 0.05$, whether the book format market shares have changed, using the information from Your Turn #1 and Your Turn #2.

(The solution is shown in Appendix A.)

Developing Your Statistical Sense

Be Careful How You Interpret the Conclusion

Note carefully what this conclusion says and what it doesn't say. The goodness of fit test provides evidence that the random variable does not follow the distribution specified in $H_0$. In particular, the conclusion does not state, for example, that Chrome's proportion is significantly greater than it was in 2014. Informally, we can compare the observed frequency of 80 with the expected frequency of 64 for the Chrome browser and note that there appears to be evidence of an increase in market share for Chrome. But this is only informal and is not part of the hypothesis test. It is a common error in statistical analysis to form conclusions beyond what the hypothesis test is actually testing.

Next, we turn to the p-value method. The goodness of fit test is a right-tailed test, so the p-value for the $\chi^2$ statistic is defined as the area under the $\chi^2$ curve to the right of the test statistic $\chi^2_{\text{data}}$, as shown in Figure 3. That is,

$$\text{p-value} = P(\chi^2 > \chi^2_{\text{data}})$$

FIGURE 3

Goodness of Fit Test: p-Value Method

  • Step 1 State the hypotheses and the rejection rule. Check the conditions.
  • The null hypothesis states that the multinomial random variable follows a particular distribution.
  • The alternative hypothesis states that the random variable does not follow that distribution.
  • Reject $H_0$ if the p-value $< \alpha$.

The following conditions must be met:

  1. None of the expected frequencies is less than 1.
  2. At most, 20% of the expected frequencies are less than 5.

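    The expected frequency for the ith category is $E_i = n \cdot p_i$, where $n$ represents the number of trials and $p_i$ represents the population proportion for the ith category.

  • Step 2 Calculate $\chi^2_{\text{data}}$.

    $$\chi^2_{\text{data}} = \sum \frac{(O_i - E_i)^2}{E_i}$$

    where $O_i$ = observed frequency, and $E_i$ = expected frequency.

  • Step 3 Find the p-value.

    $\text{p-value} = P(\chi^2 > \chi^2_{\text{data}})$ (see Figure 3)

  • Step 4 State the conclusion and the interpretation. Compare the p-value with $\alpha$.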

EXAMPLE 5 p-Value method for the goodness of fit test using technology

Table 5 contains the distribution of violent crime in New York City in 2012.² Suppose that a random sample of 1000 violent crimes in New York City yielded the counts shown in Table 6. Test whether the population proportions have changed since 2012, using the p-value method and level of significance $\alpha = 0.05$.

Table 5 2012 violent crime in New York City

| Murder | Rape | Robbery | Assault |
| --- | --- | --- | --- |
| 0.01 | 0.04 | 0.35 | 0.60 |

Table 6 Sample of 1000 violent crimes in New York City this year

| Murder | Rape | Robbery | Assault |
| --- | --- | --- | --- |
| 6 | 50 | 350 | 594 |

Solution

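  • Step 1 State the hypotheses and the rejection rule. Check the conditions.

    $H_0$: $p_{\text{Murder}} = 0.01$, $p_{\text{Rape}} = 0.04$, $p_{\text{Robbery}} = 0.35$, $p_{\text{Assault}} = 0.60$

    $H_a$: The random variable does not follow the distribution specified in $H_0$.

    Reject $H_0$ if the p-value $< \alpha = 0.05$.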

What Results Might We Expect?

Before we do the formal hypothesis test, let's try to figure out what the conclusion might be. Figure 4 is a clustered bar graph (see Section 2.1) of the observed and expected frequencies for each of the four categories. If $H_0$ were true, then, for each category, we would expect the red bars (observed frequencies) and blue bars (expected frequencies) to have somewhat similar heights. In fact, the heights of the bars are fairly similar for all four categories, indicating not much difference between the crimes that were observed and the crimes that were expected. Thus, we might expect to not reject $H_0$.

FIGURE 4 Graph indicates no evidence against $H_0$.

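First, we need to find the expected frequencies. We have $n = 1000$, so the expected frequencies are as shown here.

Expected frequencies for violent crimes in a sample of size $n = 1000$

| Category | $p_i$ | Expected frequency $E_i = n \cdot p_i$ |
| --- | --- | --- |
| Murder | 0.01 | $1000 \cdot 0.01 = 10$ |
| Rape | 0.04 | $1000 \cdot 0.04 = 40$ |
| Robbery | 0.35 | $1000 \cdot 0.35 = 350$ |
| Assault | 0.60 | $1000 \cdot 0.60 = 600$ |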

Next, check the conditions for this test. Because (a) none of the expected frequencies is less than 1 and (b) no more than 20% of the expected frequencies are less than 5, we may proceed. We use the instructions provided in the Step-by-Step Technology Guide at the end of this section.

  • Step 2 Find the test statistic $\chi^2_{\text{data}}$. The TI-83/84 results in Figure 5 tell us that

    $$\chi^2_{\text{data}} = \sum \frac{(O_i - E_i)^2}{E_i} = 4.16$$

  • Step 3 Find the p-value. Figure 5 also tells us that

    $$\text{p-value} = P(\chi^2 > 4.16) = 0.2447$$

    This p-value, for the $\chi^2$ distribution with 3 degrees of freedom, is shown in Figure 6.

    FIGURE 5 $\chi^2_{\text{data}} = 4.16$ has p-value 0.2447.
    FIGURE 6

    Figure 7a shows the TI-84 output for the $\chi^2$ test, and Figure 7b shows the SPSS output for the $\chi^2$ test, confirming our test statistic of 4.16 and p-value of 0.2447.

    FIGURE 7a $\chi^2$ test on TI-84.
    FIGURE 7b $\chi^2$ test in SPSS.
  • Step 4 State the conclusion and the interpretation. The p-value of 0.2447 is not less than $\alpha = 0.05$, so we do not reject $H_0$, which we expected. There is insufficient evidence, at level of significance $\alpha = 0.05$, that the population proportions of violent crime have changed in New York City since 2012.
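
For readers without a TI-83/84 or SPSS, the test statistic and p-value in this example can be checked with a short Python sketch that uses only the standard library. The helper `chi2_right_tail` is a hypothetical name; it relies on the known closed-form expressions for the right-tail area of a $\chi^2$ distribution with a whole number of degrees of freedom, and it is an illustration rather than a substitute for statistical software. The level of significance of 0.05 is an assumption of this sketch.

```python
import math


def chi2_right_tail(x, df):
    """Area to the right of x under the chi-square curve (integer df >= 1)."""
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^j / j!.
        return math.exp(-half) * sum(
            half ** j / math.factorial(j) for j in range(df // 2)
        )
    # Odd df: complementary error function plus a finite correction sum.
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(
        half ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1)
    )
    return tail


proportions = [0.01, 0.04, 0.35, 0.60]  # Table 5: murder, rape, robbery, assault
observed = [6, 50, 350, 594]            # Table 6
n = sum(observed)                       # 1000 violent crimes in the sample
alpha = 0.05                            # assumed level of significance

expected = [n * p for p in proportions]
chi2_data = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_right_tail(chi2_data, df)

print(f"chi2_data = {chi2_data:.2f}, df = {df}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

The result agrees with the technology output reported in this example: a test statistic of 4.16 with 3 degrees of freedom and a p-value of about 0.2447, which is not less than the assumed level of significance.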

NOW YOU CAN DO

Exercises 23–26.