17.3 Multiple Logistic Regression

The MOVIES data set includes several explanatory variables. Example 17.10 examines the model where log opening-weekend revenue alone is used to predict the odds that the movie will have a total U.S. box-office revenue greater than the movie budget. Perhaps combining log opening-weekend revenue with other explanatory variables will give us a helpful prediction. We use multiple logistic regression to investigate this. Generating the computer output is easy, just as it was when we generalized simple linear regression with one explanatory variable to multiple linear regression with more than one explanatory variable in Chapter 11. The statistical concepts are similar, although the computations are more complex. Here is the analysis.

multiple logistic regression

EXAMPLE 17.11 Multiple Logistic Regression

movprof

As in Example 17.10, we predict the odds that a movie will be profitable. The explanatory variables are log opening-weekend revenue (LOpening), the length of the movie (Minutes), and the movie rating (Rating1). For the movie rating, we use an indicator variable

Figure 17.8 gives the SAS output. From the output, we see that the fitted model is

FIGURE 17.8 Multiple logistic regression output from SAS for the movie profit data with log opening-weekend revenue, movie length, and movie rating as the explanatory variables, Example 17.11.
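For reference, a multiple logistic regression model with these three explanatory variables has the general form below. Here $p$ denotes the probability that a movie is profitable, and the symbols $\beta_0, \ldots, \beta_3$ are notation assumed for this sketch; the fitted model replaces them with the estimates reported in the SAS output, which are not reproduced here.

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1\,\text{LOpening} + \beta_2\,\text{Minutes} + \beta_3\,\text{Rating1}$$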


When analyzing data using multiple regression, we first examine the hypothesis that all the regression coefficients for the explanatory variables are zero. We do the same for logistic regression. The hypothesis

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$

is tested by a chi-square statistic with three degrees of freedom. SAS provides results for three different calculations of this statistic. In all three approaches, the P-value is small enough that we reject $H_0$ and conclude that one or more of the explanatory variables can be used to predict the odds that the movie is profitable.
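One common way to compute a chi-square statistic for this overall test is the likelihood ratio: twice the difference in log-likelihood between the full model and an intercept-only model. The following is a minimal sketch of that calculation, not the SAS procedure itself. It uses synthetic data (the MOVIES data are not reproduced here), and every name in it, including `fit_logistic` and `chi2_sf_df3`, is hypothetical and introduced only for illustration.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_logistic(X, y, iters=25):
    """Maximum likelihood fit by Newton-Raphson; X includes the intercept column."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

def log_likelihood(X, y, beta):
    """Bernoulli log-likelihood: sum of y*eta - log(1 + exp(eta))."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        total += yi * eta - math.log1p(math.exp(eta))
    return total

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Synthetic data (an assumption for illustration, NOT the MOVIES data):
# two quantitative predictors and one 0/1 indicator.
random.seed(1)
X, y = [], []
for _ in range(200):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    x3 = float(random.random() < 0.5)
    p_true = 1.0 / (1.0 + math.exp(-(-0.5 + 1.0 * x1 + 0.8 * x2)))
    X.append([1.0, x1, x2, x3])
    y.append(1 if random.random() < p_true else 0)

X_null = [[1.0] for _ in y]  # intercept-only model under H0
ll_full = log_likelihood(X, y, fit_logistic(X, y))
ll_null = log_likelihood(X_null, y, fit_logistic(X_null, y))

G = 2.0 * (ll_full - ll_null)  # likelihood ratio chi-square, 3 df
p_value = chi2_sf_df3(G)
print(f"G = {G:.3f}, P-value = {p_value:.3g}")
```

The closed-form tail probability applies only to three degrees of freedom, which matches the three explanatory variables here; a different number of predictors would need a general chi-square routine.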

Next, examine the coefficients for each variable and the tests that each of these is 0 in a model that contains the other two. The P-values are 0.0225, 0.0089, and 0.2138. Only for the movie rating (P = 0.2138) can the null hypothesis not be rejected. That is, log opening-weekend revenue and the movie's length each add significant predictive ability once the other two explanatory variables are already in the model.
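Tests of individual coefficients like these are commonly Wald tests: the estimate divided by its standard error is squared and compared with a chi-square distribution with one degree of freedom. The sketch below shows that calculation and then applies a decision rule to the three P-values quoted in the text. The function name `wald_test`, the 0.05 significance level, and the pairing of P-values with variables (taken from the order in which the variables are listed) are assumptions made for illustration.

```python
import math

def wald_test(estimate, std_error):
    """Return the Wald chi-square statistic (1 df) and its P-value."""
    z = estimate / std_error
    chi_square = z * z
    # P(chi-square with 1 df > z^2) equals P(|Z| > |z|) for a standard Normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return chi_square, p_value

# P-values quoted in the text; the variable order is assumed, not confirmed.
p_values = {"LOpening": 0.0225, "Minutes": 0.0089, "Rating1": 0.2138}
ALPHA = 0.05  # assumed significance level

decisions = {name: p < ALPHA for name, p in p_values.items()}
for name, reject in decisions.items():
    verdict = "reject H0" if reject else "cannot reject H0"
    print(f"{name}: P = {p_values[name]:.4f} -> {verdict}")
```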

Because the explanatory variables are correlated, however, we cannot conclude that log opening-weekend revenue and the movie's length make up the best predictive model. Further analysis of these data using subsets of the three explanatory variables is needed to clarify the situation. We leave this work for the exercises.
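As a starting point for that subset analysis, the candidate models can be listed mechanically. The sketch below only enumerates the nonempty subsets of the three explanatory variables; it does not fit any model, and the variable `candidate_models` is a hypothetical name used for illustration.

```python
from itertools import combinations

variables = ["LOpening", "Minutes", "Rating1"]

# Every nonempty subset of the three explanatory variables is a candidate model.
candidate_models = [
    subset
    for size in range(1, len(variables) + 1)
    for subset in combinations(variables, size)
]

for model in candidate_models:
    print(" + ".join(model))
```

Each listed subset would then be fit as its own logistic regression and the fits compared.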