For Exercises 11.59 to 11.61, see page 568; for 11.62 and 11.63, see page 571; for 11.64 to 11.66, see pages 574–575; for 11.67 and 11.68, see page 577; for 11.69 to 11.72, see page 580.
11.73 Quadratic models.
Sketch each of the following quadratic equations for values of $x$ between 0 and 5. Then describe the relationship between $\mu_y$ and $x$ in your own words.
11.73
(a) The relationship is curved; as $x$ increases, $\mu_y$ also increases, but at larger values of $x$, $\mu_y$ increases more rapidly. (b) The relationship is curved; as $x$ increases, $\mu_y$ decreases at first but then starts to increase slowly, and at larger values of $x$, $\mu_y$ increases more rapidly. (c) The relationship is curved; as $x$ increases, $\mu_y$ increases at first but then starts to decrease slowly, and at larger values of $x$, $\mu_y$ decreases more rapidly. (d) The relationship is curved; as $x$ increases, $\mu_y$ decreases, but at larger values of $x$, $\mu_y$ decreases more rapidly.
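The "increases more rapidly" pattern described in part (a) can be checked numerically. The sketch below uses a hypothetical quadratic, $\mu_y = 1 + 2x + 3x^2$, chosen only for illustration; it is an assumption and not one of the exercise's equations. The function name `mu_y` and its default coefficients are likewise made up for this example.

```python
# Hypothetical quadratic (NOT one of the exercise's equations):
# mu_y = 1 + 2x + 3x^2
def mu_y(x, b0=1.0, b1=2.0, b2=3.0):
    return b0 + b1 * x + b2 * x ** 2

xs = [0, 1, 2, 3, 4, 5]
values = [mu_y(x) for x in xs]

# Change in mu_y for each one-unit step in x
steps = [b - a for a, b in zip(values, values[1:])]

print(values)  # mean response at x = 0, 1, ..., 5
print(steps)   # successive increases; these grow as x grows
```

Because the coefficient of $x^2$ is positive here, each one-unit step in $x$ adds more to $\mu_y$ than the previous step did, which is what a sketch of the curve would show as upward bending.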
11.74 Models with indicator variables.
Suppose that $x$ is an indicator variable with the value 0 for Group A and 1 for Group B. The following equations describe relationships between the value of $\mu_y$ and membership in Group A or B. For each equation, give the value of the mean response $\mu_y$ for Group A and for Group B.
11.75 Differences in means.
Verify that the coefficient of $x$ in each part of the previous exercise is equal to the mean for Group B minus the mean for Group A. Do you think that this will be true in general? Explain your answer.
11.75
(a) , which is the slope for $x$. (b) , which is the slope for $x$. (c) , which is the slope for $x$. Yes, it is true in general as long as $x$ is an indicator variable with values 0 and 1.
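The general claim follows from substituting the two indicator values into a one-variable model. The derivation below uses generic coefficients $\beta_0$ and $\beta_1$, and the labels $\mu_A$ and $\mu_B$ for the two group means; these symbols are assumed here for illustration and are not taken from the exercise's equations.

```latex
\begin{align*}
\mu_y &= \beta_0 + \beta_1 x \\
\text{Group A } (x = 0):\quad \mu_A &= \beta_0 \\
\text{Group B } (x = 1):\quad \mu_B &= \beta_0 + \beta_1 \\
\mu_B - \mu_A &= \beta_1
\end{align*}
```

The result depends on the 0/1 coding: with any other pair of codes, the difference in means would be $\beta_1$ multiplied by the gap between the two codes.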
11.76 Models with interactions.
Suppose that $x_1$ is an indicator variable with the value 0 for Group A and 1 for Group B, and $x_2$ is a quantitative variable. Each of the following models describes a relationship between $\mu_y$ and the explanatory variables $x_1$ and $x_2$. For each model, substitute the value 0 for $x_1$, and write the resulting equation for $\mu_y$ in terms of $x_2$ for Group A. Then substitute $x_1 = 1$ to obtain the equation for Group B, and sketch the two equations on the same graph. Describe in words the difference in the relationship for the two groups.
11.77 Differences in slopes and intercepts.
Refer to the previous exercise. Verify that the coefficient of $x_1 x_2$ is equal to the slope for Group B minus the slope for Group A in each of these cases. Also, verify that the coefficient of $x_1$ is equal to the intercept for Group B minus the intercept for Group A in each of these cases. Do you think these two results will be true in general? Explain your answer.
11.77
(a) , which is the coefficient of , which is the coefficient of . (b) , which is the coefficient of , which is the coefficient of . (c) , which is the coefficient of , which is the coefficient of . These results will be true in general as long as $x_1$ is an indicator variable with values 0 and 1.
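The general result can be seen by substituting the two indicator values into an interaction model. The derivation below assumes generic coefficients $\beta_0, \ldots, \beta_3$, an indicator $x_1$ (0 for Group A, 1 for Group B), and a quantitative variable $x_2$; the symbols are chosen for illustration and are not the specific models of the exercise.

```latex
\begin{align*}
\mu_y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 \\
\text{Group A } (x_1 = 0):\quad \mu_y &= \beta_0 + \beta_2 x_2 \\
\text{Group B } (x_1 = 1):\quad \mu_y &= (\beta_0 + \beta_1) + (\beta_2 + \beta_3)\, x_2 \\
\text{intercept}_B - \text{intercept}_A &= \beta_1 \\
\text{slope}_B - \text{slope}_A &= \beta_3
\end{align*}
```

When $\beta_3 = 0$ the two lines are parallel, and when $\beta_1 = 0$ they share an intercept, so the two coefficients separately control how the groups differ.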
11.78 Write the model.
For each of the following situations write a model for $\mu_y$ of the form

$$\mu_y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

where $p$ is the number of explanatory variables. Be sure to give the value of $p$ and, if necessary, explain how each of the $x$'s is coded.
11.79 Predicting movie revenue.
CASE 11.2 A plot of theater count versus box office revenue suggests that the relationship may be slightly curved.
movies
11.79
(a) . The quadratic term for theaters, , is significant and should be included in the model already containing Theaters. (b) For the first model, ; for the second model, . are 1 and should be included in a model that already contains Theaters. (c) with rounding error.
11.80 Assessing collinearity in the movie revenue model.
CASE 11.2 Many software packages will calculate VIF values for each explanatory variable. In this exercise, calculate the VIF values using several multiple regressions, and then use them to see if there is collinearity among the movie explanatory variables.
movies
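A minimal sketch of the calculation this exercise describes: regress each explanatory variable on the others, record $R_j^2$, and compute $\mathrm{VIF}_j = 1/(1 - R_j^2)$. The data here are synthetic stand-ins, not the movies data set, and the helper names `r_squared` and `vif` are hypothetical names chosen for this example.

```python
import numpy as np

def r_squared(y, X):
    """R^2 from a least-squares regression of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(resid ** 2) / sst

def vif(X):
    """VIF for each column of X: 1 / (1 - R_j^2), one regression per column."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)  # all explanatory variables except j
        out.append(1.0 / (1.0 - r_squared(X[:, j], others)))
    return np.array(out)

# Synthetic stand-in data (NOT the movies data set)
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)                   # unrelated to x1 and x2

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

In this made-up example the first two variables carry nearly the same information, so their VIF values are large, while the unrelated third variable has a VIF close to 1. With the movies data, the columns of `X` would be the explanatory variables from Case 11.2.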
11.81 Predicting movie revenue, continued.
CASE 11.2 Refer to Exercise 11.79. Although a quadratic relationship between total U.S. revenue and theater count provides a better fit than the linear model, it does not make sense that box office revenue would again increase for very low budgeted movies (unless you are the Syfy Channel). An alternative approach to describe the relationship between theater count and box office revenue is to consider a piecewise linear equation.
movies
11.81
(b) . The new variable is significant and should be included in the model already containing Theaters. (It should be noted that Theaters is no longer significant in this model and could be removed.) (c) As shown in the plot, it is called a piecewise linear model because we are only measuring linearity for a piece of the variable Theaters (greater than 2800). (d) The results for the quadratic model and the results for the piecewise linear model are very similar; both models required that we retain the additional variable (quadratic or new). Answers will vary for preference; both models add some complexity for interpretation.
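One common way to build a piecewise linear term is a "hinge" variable that is 0 up to a breakpoint and grows linearly beyond it. The sketch below assumes that coding with the breakpoint of 2800 mentioned in part (c); the exercise's own definition of the new variable may differ. The data are synthetic, not the movies data set, and the name `hinge` is hypothetical.

```python
import numpy as np

KNOT = 2800  # breakpoint mentioned in the answer to part (c)

def hinge(theaters, knot=KNOT):
    """Assumed coding: 0 at or below the knot, (theaters - knot) above it."""
    return np.maximum(0.0, np.asarray(theaters, dtype=float) - knot)

# Synthetic stand-in data (NOT the movies data set): flat below the knot,
# rising with slope 0.1 above it, plus random noise.
rng = np.random.default_rng(1)
theaters = rng.uniform(500, 4500, size=300)
revenue = 20.0 + 0.1 * hinge(theaters) + rng.normal(scale=5.0, size=300)

# Least-squares fit of revenue on Theaters and the hinge term
A = np.column_stack([np.ones_like(theaters), theaters, hinge(theaters)])
(b0, b1, b2), *_ = np.linalg.lstsq(A, revenue, rcond=None)

slope_below = b1       # fitted slope for Theaters <= KNOT
slope_above = b1 + b2  # fitted slope for Theaters > KNOT
print(round(slope_below, 3), round(slope_above, 3))
```

Under this coding the fitted line is continuous at the breakpoint, and the coefficient of the hinge term is the change in slope once the theater count passes 2800.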
11.82 Predicting movie revenue: Model selection.
CASE 11.2 Refer to the data set on movie revenue in Case 11.2 (page 550). In addition to the movie’s budget, opening-weekend revenue, and opening-weekend theater count, the data set also includes a column named Sequel. Sequel is 1 if the corresponding movie is a sequel, and Sequel is 0 if the movie is not a sequel. Assuming opening-weekend revenue (Opening) is in the model, there are eight possible regression models. For example, one model just includes Opening; another model includes Opening and Theaters; and another model includes Opening, Sequel, and Theaters. Run these eight regressions and make a table giving the regression coefficients, the value of $R^2$, and the value of $s$ for each regression. (If an explanatory variable is not included in a particular regression, enter a value of 0 for its coefficient in the table.) Mark coefficients that are statistically significant at the 5% level with an asterisk (*). Summarize your results and state which model you prefer.
movies
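The eight regressions can be enumerated mechanically: Opening is always included, and each of the other three variables is either in or out. The sketch below shows that loop on synthetic stand-in columns, not the movies data set; the helper `fit` and the generated values are assumptions for illustration. It reports $R^2$ and the regression standard error $s$ but does not compute the significance tests needed for the asterisks.

```python
from itertools import combinations
import numpy as np

def fit(y, X):
    """Least-squares fit with intercept: returns (coefficients, R^2, s)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = np.sum((y - A @ coef) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - sse / sst
    s = np.sqrt(sse / (n - p - 1))  # regression standard error
    return coef, r2, s

# Synthetic stand-in columns (NOT the movies data set)
rng = np.random.default_rng(2)
n = 100
data = {
    "Opening": rng.normal(30, 10, n),
    "Budget": rng.normal(60, 20, n),
    "Theaters": rng.normal(3000, 500, n),
    "Sequel": rng.integers(0, 2, n).astype(float),
}
y = 10 + 2.5 * data["Opening"] + 0.3 * data["Budget"] + rng.normal(0, 15, n)

optional = ["Budget", "Theaters", "Sequel"]
rows = []
for k in range(len(optional) + 1):
    for extra in combinations(optional, k):
        names = ["Opening", *extra]  # Opening is always included
        X = np.column_stack([data[v] for v in names])
        coef, r2, s = fit(y, X)
        rows.append((names, r2, s))

for names, r2, s in rows:
    print(f"{' + '.join(names):40s} R2={r2:.3f} s={s:.2f}")
```

Note that $R^2$ can never decrease when a variable is added, so the four-variable model always has the largest $R^2$; comparing $s$ and the significance of individual coefficients gives a fairer basis for choosing among the models.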
11.83 Effect of an outlier.
CASE 11.2 In Exercise 11.50 (page 563), we identified a movie that had much higher revenue than predicted. Remove this movie and repeat the previous exercise. Does the removal of this movie change which model you prefer?
movies
11.83
Regression coefficients (Intercept through Sequel); * marks a coefficient significant at the 5% level.
# variables | R-Square | s | Intercept | Opening | Budget | Theaters | Sequel |
1 | 0.716 | 41.368 | 18.04207 | 2.4782* | 0 | 0 | 0 |
2 | 0.7785 | 36.997 | 6.14917 | 2.14815* | 0.34102* | 0 | 0 |
2 | 0.7486 | 39.414 | 21.26068 | 2.65651* | 0 | 0 | −31.97239* |
2 | 0.7434 | 39.821 | −55.80153 | 2.09021* | 0 | 0.02787* | 0 |
3 | 0.7919 | 36.334 | 9.88794 | 2.31116* | 0.29527* | 0 | −21.28865 |
3 | 0.7831 | 37.092 | −61.61398 | 2.23857* | 0 | 0.03141* | −35.44187* |
3 | 0.7821 | 37.176 | −22.31507 | 2.03073* | 0.30005* | 0.01128 | 0 |
4 | 0.8008 | 36.026 | −35.76104 | 2.15636* | 0.21796 | 0.01843 | −26.12162 |
Five models have all terms significant: Opening alone, with Budget, with Sequel, with Theaters, or with Theaters and Sequel. Clearly, the model with both Theaters and Sequel is better than those with just Sequel or just Theaters. Likewise, the models with two variables are better than Opening alone. That leaves two potentially good models: Opening with Budget, or Opening with Theaters and Sequel. Both have very similar $R^2$ and $s$ values, so arguments for either model could be made.