For Exercises 11.59 to 11.61, see page 568; for 11.62 and 11.63, see page 571; for 11.64 to 11.66, see pages 574–575; for 11.67 and 11.68, see page 577; for 11.69 to 11.72, see page 580.
11.73 Quadratic models.
Sketch each of the following quadratic equations for values of x between 0 and 5. Then describe the relationship between μy and x in your own words.
11.73
(a) The relationship is curved; μy increases as x increases, and it increases more rapidly at larger values of x. (b) The relationship is curved; as x increases, μy decreases at first and then begins to increase, slowly at first and more rapidly at larger values of x. (c) The relationship is curved; as x increases, μy increases at first and then begins to decrease, slowly at first and more rapidly at larger values of x. (d) The relationship is curved; μy decreases as x increases, and it decreases more rapidly at larger values of x.
11.74 Models with indicator variables.
Suppose that x is an indicator variable with the value 0 for Group A and 1 for Group B. The following equations describe relationships between the value of μy and membership in Group A or B. For each equation, give the value of the mean response μy for Group A and for Group B.
11.75 Differences in means.
Verify that the coefficient of x in each part of the previous exercise is equal to the mean for Group B minus the mean for Group A. Do you think that this will be true in general? Explain your answer.
11.75
(a) 15−10=5, which is the slope for x. (b) 15−5=10, which is the slope for x. (c) 105−5=100, which is the slope for x. Yes, it is true in general as long as x is an indicator variable with values 0 and 1.
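The general claim can be checked with a short calculation. The sketch below assumes a model of the form μy = b0 + b1x with x coded 0 or 1; the names b0 and b1 and the function mean_response are illustrative, and the values 10 and 5 are chosen only to match the group means quoted in part (a).

```python
def mean_response(b0, b1, x):
    """Mean response for the model mu_y = b0 + b1 * x, with x coded 0 or 1."""
    return b0 + b1 * x

# Illustrative coefficients chosen to reproduce the means in part (a).
b0, b1 = 10, 5

mean_a = mean_response(b0, b1, 0)  # Group A: x = 0, so the mean is b0
mean_b = mean_response(b0, b1, 1)  # Group B: x = 1, so the mean is b0 + b1
difference = mean_b - mean_a       # (b0 + b1) - b0 = b1

print(mean_a, mean_b, difference)
```

Because Group A's mean is b0 and Group B's mean is b0 + b1, the difference is b1 for any choice of coefficients, provided x takes only the values 0 and 1.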
11.76 Models with interactions.
Suppose that x1 is an indicator variable with the value 0 for Group A and 1 for Group B, and x2 is a quantitative variable. Each of the following models describes a relationship between μy and the explanatory variables x1 and x2. For each model, substitute the value 0 for x1, and write the resulting equation for μy in terms of x2 for Group A. Then substitute x1=1 to obtain the equation for Group B, and sketch the two equations on the same graph. Describe in words the difference in the relationship for the two groups.
11.77 Differences in slopes and intercepts.
Refer to the previous exercise. Verify that the coefficient of x1x2 is equal to the slope for Group B minus the slope for Group A in each of these cases. Also, verify that the coefficient of x1 is equal to the intercept for Group B minus the intercept for Group A in each of these cases. Do you think these two results will be true in general? Explain your answer.
11.77
(a) 6−2=4, which is the coefficient of x1x2. 70−40=30, which is the coefficient of x1. (b) 6−4=2, which is the coefficient of x1x2. 70−40=30, which is the coefficient of x1. (c) 2−(−2)=4, which is the coefficient of x1x2. 70−30=40, which is the coefficient of x1. These results will be true in general as long as x1 is an indicator variable with values 0 and 1.
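The same reasoning can be sketched in code. The example assumes a model of the form μy = b0 + b1x1 + b2x2 + b3x1x2; the coefficient values are inferred from the intercepts and slopes quoted in part (a) and are used here only as an illustration, and group_line is a hypothetical helper name.

```python
def group_line(b0, b1, b2, b3, x1):
    """Return (intercept, slope) of mu_y as a function of x2 for a fixed x1.

    Model: mu_y = b0 + b1*x1 + b2*x2 + b3*x1*x2
                = (b0 + b1*x1) + (b2 + b3*x1) * x2
    """
    intercept = b0 + b1 * x1
    slope = b2 + b3 * x1
    return intercept, slope

# Illustrative coefficients inferred from the values in part (a).
b0, b1, b2, b3 = 40, 30, 2, 4

int_a, slope_a = group_line(b0, b1, b2, b3, x1=0)  # Group A
int_b, slope_b = group_line(b0, b1, b2, b3, x1=1)  # Group B

print("Group A:", int_a, slope_a)
print("Group B:", int_b, slope_b)
print("Intercept difference:", int_b - int_a, "Slope difference:", slope_b - slope_a)
```

Setting x1 = 0 leaves intercept b0 and slope b2; setting x1 = 1 gives intercept b0 + b1 and slope b2 + b3, so the differences are b1 and b3 regardless of the particular values.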
11.78 Write the model.
For each of the following situations write a model for μy of the form
μy=β0+β1x1+β2x2+⋯+βpxp
where p is the number of explanatory variables. Be sure to give the value of p and, if necessary, explain how each of the x’s is coded.
11.79 Predicting movie revenue.
A plot of theater count versus box office revenue suggests that the relationship may be slightly curved.
movies
11.79
(a) H0: β2 = 0. Ha: β2 ≠ 0. t = 3.03, df = 40, 0.002 < P-value < 0.005. The quadratic term for theaters, Theaters², is significant and should be included in the model already containing Theaters. (b) For the first model, R² = 0.5125; for the second model, R² = 0.4009. F = 9.16 with 1 and 40 degrees of freedom, 0.001 < P-value < 0.01; Theaters² should be included in a model that already contains Theaters. (c) 3.03² = 9.18 ≈ 9.16, with the difference due to rounding.
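The arithmetic in parts (b) and (c) can be reproduced as follows. This is a sketch that assumes 0.5125 is the R² of the model containing the quadratic term and 0.4009 is the R² of the model without it; partial_f is an illustrative function name.

```python
def partial_f(r2_full, r2_reduced, q, df_error):
    """F statistic for testing q added terms, computed from the two R^2 values.

    F = ((R2_full - R2_reduced) / q) / ((1 - R2_full) / df_error)
    """
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_error)

# Values quoted in the answer: one added term, 40 error degrees of freedom.
f_stat = partial_f(r2_full=0.5125, r2_reduced=0.4009, q=1, df_error=40)
t_stat = 3.03

print(round(f_stat, 2))       # F from the R^2 values
print(round(t_stat ** 2, 2))  # square of the t statistic from part (a)
```

When a single term is added, the F statistic equals the square of that term's t statistic, which is why the two printed values agree up to rounding.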
11.80 Assessing collinearity in the movie revenue model.
Many software packages will calculate VIF values for each explanatory variable. In this exercise, calculate the VIF values using several multiple regressions, and then use them to see if there is collinearity among the movie explanatory variables.
movies
VIF = 1/(1 − R²)
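The calculation can be sketched without a statistics package. The code below regresses each explanatory variable on the others, takes the resulting R², and applies VIF = 1/(1 − R²). The data are synthetic (not the movies data set), and the helper names solve, r_squared, and vif are illustrative.

```python
import random

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def r_squared(y, columns):
    """R^2 from the least-squares regression of y on the columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)  # normal equations
    fitted = [sum(bj * xij for bj, xij in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def vif(columns):
    """VIF for each explanatory variable: 1 / (1 - R^2) from regressing it on the rest."""
    return [1 / (1 - r_squared(col, columns[:j] + columns[j + 1:]))
            for j, col in enumerate(columns)]

# Synthetic example: c is nearly a copy of a, while b is unrelated to both.
random.seed(0)
n = 200
a = [random.gauss(0, 1) for _ in range(n)]
b = [random.gauss(0, 1) for _ in range(n)]
c = [ai + 0.1 * random.gauss(0, 1) for ai in a]

vifs = vif([a, b, c])
print([round(v, 1) for v in vifs])
```

In this synthetic example the two nearly identical variables receive large VIF values and the unrelated variable receives a value close to 1, which is the pattern to look for among the movie explanatory variables.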
11.81 Predicting movie revenue, continued.
Refer to Exercise 11.79. Although a quadratic relationship between total U.S. revenue and theater count provides a better fit than the linear model, it does not make sense that box office revenue would again increase for very low-budget movies (unless you are the Syfy Channel). An alternative approach to describe the relationship between theater count and box office revenue is to consider a piecewise linear equation.
movies
11.81
(b) t=3.21, df=40, P-value=0.0026. The new variable is significant and should be included in the model already containing Theaters. (It should be noted that Theaters is no longer significant in this model and could be removed.) (c) As shown in the plot, it is called a piecewise linear model because we are only measuring linearity for a piece of the variable Theaters (greater than 2800). (d) The results for the quadratic model and the results for the piecewise linear model are very similar; both models required that we retain the additional variable (quadratic or new). Answers will vary for preference; both models add some complexity for interpretation.
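One common way to code a piecewise linear term is sketched below. It assumes the new variable equals 0 up to a breakpoint of 2800 theaters and Theaters − 2800 beyond it; this coding is an assumption based on the breakpoint mentioned in part (c), and the coefficients b0, b1, and b2 are hypothetical values, not estimates from the movies data.

```python
def hinge(theaters, knot=2800):
    """Piecewise term: 0 up to the knot, then the amount by which theaters exceeds it."""
    return max(0, theaters - knot)

def piecewise_mean(b0, b1, b2, theaters, knot=2800):
    """Mean response for mu_y = b0 + b1*Theaters + b2*hinge(Theaters)."""
    return b0 + b1 * theaters + b2 * hinge(theaters, knot)

# Hypothetical coefficients, for illustration only.
b0, b1, b2 = 5, 1, 3

# Change in the mean per additional theater, below and above the knot.
slope_below = piecewise_mean(b0, b1, b2, 2001) - piecewise_mean(b0, b1, b2, 2000)
slope_above = piecewise_mean(b0, b1, b2, 3001) - piecewise_mean(b0, b1, b2, 3000)

print(slope_below, slope_above)
```

Under this coding the slope is b1 below the breakpoint and b1 + b2 above it, so the test of the new variable's coefficient is a test of whether the slope changes at the breakpoint.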
11.82 Predicting movie revenue: Model selection.
Refer to the data set on movie revenue in Case 11.2 (page 550). In addition to the movie’s budget, opening-weekend revenue, and opening-weekend theater count, the data set also includes a column named Sequel. Sequel is 1 if the corresponding movie is a sequel, and Sequel is 0 if the movie is not a sequel. Assuming opening-weekend revenue (Opening) is in the model, there are eight possible regression models. For example, one model just includes Opening; another model includes Opening and Theaters; and another model includes Opening, Sequel, and Theaters. Run these eight regressions and make a table giving the regression coefficients, the value of R², and the value of s for each regression. (If an explanatory variable is not included in a particular regression, enter a value 0 for its coefficient in the table.) Mark coefficients that are statistically significant at the 5% level with an asterisk (*). Summarize your results and state which model you prefer.
movies
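The count of eight comes from including or excluding each of the three remaining variables (2 × 2 × 2) while Opening stays in every model. A minimal sketch that lists the eight models follows; the variable names are the column names used in the exercise, and the list names optional and models are illustrative.

```python
from itertools import combinations

# Opening is always in the model; each of the other three may be in or out.
optional = ["Budget", "Theaters", "Sequel"]

models = []
for size in range(len(optional) + 1):
    for extra in combinations(optional, size):
        models.append(["Opening", *extra])

for terms in models:
    print(len(terms), "variable(s):", " + ".join(terms))
```

Each printed line corresponds to one regression to run and one row of the requested table.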
11.83 Effect of an outlier.
In Exercise 11.50 (page 563), we identified a movie that had much higher revenue than predicted. Remove this movie and repeat the previous exercise. Does the removal of this movie change which model you prefer?
movies
11.83
Regression coefficients are listed in the Intercept through Sequel columns.
# variables | R-Square | s | Intercept | Opening | Budget | Theaters | Sequel |
1 | 0.716 | 41.368 | 18.04207 | 2.4782* | 0 | 0 | 0 |
2 | 0.7785 | 36.997 | 6.14917 | 2.14815* | 0.34102* | 0 | 0 |
2 | 0.7486 | 39.414 | 21.26068 | 2.65651* | 0 | 0 | −31.97239* |
2 | 0.7434 | 39.821 | −55.80153 | 2.09021* | 0 | 0.02787* | 0 |
3 | 0.7919 | 36.334 | 9.88794 | 2.31116* | 0.29527* | 0 | −21.28865 |
3 | 0.7831 | 37.092 | −61.61398 | 2.23857* | 0 | 0.03141* | −35.44187* |
3 | 0.7821 | 37.176 | −22.31507 | 2.03073* | 0.30005* | 0.01128 | 0 |
4 | 0.8008 | 36.026 | −35.76104 | 2.15636* | 0.21796 | 0.01843 | −26.12162 |
Five models have all terms significant: Opening alone, Opening with Budget, Opening with Sequel, Opening with Theaters, and Opening with Theaters and Sequel. The model with both Theaters and Sequel is clearly better than the models with just Sequel or just Theaters. Likewise, each two-variable model is better than Opening alone. This leaves two potentially good models: Opening with Budget, or Opening with Theaters and Sequel. Both have very similar R² and s values, so arguments for either model could be made.