SECTION 11.3 Exercises

For Exercises 11.59 to 11.61, see page 568; for 11.62 and 11.63, see page 571; for 11.64 to 11.66, see pages 574–575; for 11.67 and 11.68, see page 577; for 11.69 to 11.72, see page 580.

Question 11.73

11.73 Quadratic models.

Sketch each of the following quadratic equations for values of x between 0 and 5. Then describe the relationship between μ_y and x in your own words.

  1. .
  2. .
  3. .
  4. .

11.73

(a) The relationship is curved; as x increases, μ_y also increases, and at larger values of x, μ_y increases more rapidly. (b) The relationship is curved; as x increases, μ_y decreases at first but then starts to increase slowly, and at larger values of x, μ_y increases more rapidly. (c) The relationship is curved; as x increases, μ_y increases at first but then starts to decrease slowly, and at larger values of x, μ_y decreases more rapidly. (d) The relationship is curved; as x increases, μ_y decreases, and at larger values of x, μ_y decreases more rapidly.
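A sketch can be checked numerically by tabulating a quadratic mean function at a few values between 0 and 5. The example below is a minimal illustration; the coefficients are hypothetical and are not the equations from the exercise.

```python
def quadratic_mean(x, b0, b1, b2):
    """Mean response for a quadratic model: b0 + b1*x + b2*x**2."""
    return b0 + b1 * x + b2 * x ** 2

# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2 = 8.0, -4.0, 1.0

xs = [0, 1, 2, 3, 4, 5]
means = [quadratic_mean(x, b0, b1, b2) for x in xs]

# Change in the mean response for each one-unit step in x.
steps = [later - earlier for earlier, later in zip(means, means[1:])]

for x, m in zip(xs, means):
    print(f"x = {x}: mean = {m:.1f}")
print("step changes:", steps)
```

With these made-up coefficients the step changes start negative and then become positive and larger, which is the "decreases at first, then increases more rapidly" pattern described in words above.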

Question 11.74

11.74 Models with indicator variables.

Suppose that x is an indicator variable with the value 0 for Group A and 1 for Group B. The following equations describe relationships between the value of μ_y and membership in Group A or B. For each equation, give the value of the mean response μ_y for Group A and for Group B.

  1. .
  2. .
  3. .

Question 11.75

11.75 Differences in means.

Verify that the coefficient of x in each part of the previous exercise is equal to the mean for Group B minus the mean for Group A. Do you think that this will be true in general? Explain your answer.

11.75

(a) , which is the slope for x. (b) , which is the slope for x. (c) , which is the slope for x. Yes, it is true in general as long as x is an indicator variable with values 0 and 1.
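The general claim can be illustrated with a short computation. The sketch below assumes a model of the form μ_y = b0 + b1·x with a 0/1 indicator x; the coefficient values are hypothetical and are not taken from Exercise 11.74.

```python
def mean_response(x, b0, b1):
    """Mean response for a model with one indicator variable x (0 or 1)."""
    return b0 + b1 * x

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = 12.0, -5.0

mean_a = mean_response(0, b0, b1)  # Group A: x = 0, so the mean is b0
mean_b = mean_response(1, b0, b1)  # Group B: x = 1, so the mean is b0 + b1
difference = mean_b - mean_a       # (b0 + b1) - b0 = b1

print(f"Group A mean: {mean_a}, Group B mean: {mean_b}, difference: {difference}")
```

Because substituting x = 0 removes the b1 term and substituting x = 1 adds it exactly once, the difference is b1 whatever values are chosen for b0 and b1.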

Question 11.76

11.76 Models with interactions.

Suppose that x1 is an indicator variable with the value 0 for Group A and 1 for Group B, and x2 is a quantitative variable. Each of the following models describes a relationship between μ_y and the explanatory variables x1 and x2. For each model, substitute the value 0 for x1, and write the resulting equation for μ_y in terms of x2 for Group A. Then substitute x1 = 1 to obtain the equation for Group B, and sketch the two equations on the same graph. Describe in words the difference in the relationship for the two groups.

  1. .
  2. .
  3. .

Question 11.77

11.77 Differences in slopes and intercepts.

Refer to the previous exercise. Verify that the coefficient of x1x2 is equal to the slope for Group B minus the slope for Group A in each of these cases. Also, verify that the coefficient of x1 is equal to the intercept for Group B minus the intercept for Group A in each of these cases. Do you think these two results will be true in general? Explain your answer.

11.77

(a) , which is the coefficient of , which is the coefficient of . (b) , which is the coefficient of , which is the coefficient of . (c) , which is the coefficient of , which is the coefficient of . These results will be true in general as long as x1 is an indicator variable with values 0 and 1.
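The substitution step can be written out as a small function. This sketch assumes the interaction model has the form μ_y = b0 + b1·x1 + b2·x2 + b3·x1·x2, with x1 the 0/1 indicator; both that ordering of terms and the coefficient values are assumptions made for illustration, not the models from Exercise 11.76.

```python
def group_line(x1, b0, b1, b2, b3):
    """Return (intercept, slope) of the line in x2 for a given indicator value x1.

    Assumed model: b0 + b1*x1 + b2*x2 + b3*x1*x2.
    Collecting terms in x2 gives (b0 + b1*x1) + (b2 + b3*x1)*x2.
    """
    intercept = b0 + b1 * x1
    slope = b2 + b3 * x1
    return intercept, slope

# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2, b3 = 50.0, 10.0, 4.0, -2.0

intercept_a, slope_a = group_line(0, b0, b1, b2, b3)  # Group A
intercept_b, slope_b = group_line(1, b0, b1, b2, b3)  # Group B

print(f"Group A: intercept {intercept_a}, slope {slope_a}")
print(f"Group B: intercept {intercept_b}, slope {slope_b}")
print(f"Intercept difference: {intercept_b - intercept_a}")
print(f"Slope difference: {slope_b - slope_a}")
```

Under the assumed form, the intercept difference equals b1 and the slope difference equals b3, which matches the pattern the exercise asks you to verify.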

Question 11.78

11.78 Write the model.

For each of the following situations write a model for μ_y of the form

μ_y = β0 + β1x1 + β2x2 + ... + βpxp

where p is the number of explanatory variables. Be sure to give the value of p and, if necessary, explain how each of the x’s is coded.

  1. A model where the explanatory variable is a categorical variable with three possible values.
  2. A model where there are four explanatory variables. One of these is categorical with two possible values; another is categorical with four possible values. Include a term that would model an interaction of the first categorical variable and the third (quantitative) explanatory variable.
  3. A cubic regression, where terms up to and including the third power of an explanatory variable are included in the model.
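When a situation involves a categorical explanatory variable, one common coding uses indicator variables with one level treated as the baseline. The sketch below shows that coding for a hypothetical three-level variable; the level names and the choice of baseline are assumptions, and other codings are possible.

```python
def indicator_code(value, levels):
    """Code a categorical value as len(levels) - 1 indicator variables.

    The first entry of `levels` is treated as the baseline and is coded
    as all zeros; each remaining level gets its own 0/1 indicator.
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels[1:]]

# Hypothetical level names, chosen only for illustration.
levels = ["A", "B", "C"]
codes = {value: indicator_code(value, levels) for value in levels}

for value, code in codes.items():
    print(f"{value}: {code}")
```

Each non-baseline level then has a coefficient that measures its difference in mean response from the baseline level.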

Question 11.79

11.79 Predicting movie revenue.

CASE 11.2 A plot of theater count versus box office revenue suggests that the relationship may be slightly curved.

movies

  1. Examine this question by running a regression to predict the box office revenue using the theater count and the square of the theater count. Report the relevant test statistic with its degrees of freedom and P-value, and summarize your conclusion.
  2. Now view this analysis in the framework of testing a hypothesis about a collection of regression coefficients, which you studied in Section 11.2 (page 559). The first model includes theater count and the square of theater count, while the second includes only theater count. Run both regressions and find the value of R² for each. Find the F statistic for comparing the models based on the difference in the values of R². Carry out the test and report your conclusion.
  3. Verify that the square of the t statistic that you found in part (a) for testing the coefficient of the quadratic term is equal to the F statistic that you found in part (b).

11.79

(a) . The quadratic term for theaters, , is significant and should be included in the model already containing Theaters. (b) For the first model, ; for the second model, . are 1 and should be included in a model that already contains Theaters. (c) with rounding error.
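The comparison of two nested models through their R² values can be sketched as follows. The function uses the standard form F = [(n − p − 1)/q] × (R1² − R2²)/(1 − R1²), where the larger model has p explanatory variables and q coefficients are being tested. The sample size and R² values below are hypothetical and are not results from the movies data.

```python
def nested_f_statistic(r2_full, r2_reduced, n, p, q):
    """F statistic for testing q coefficients in a model with p explanatory variables.

    r2_full is R-squared for the larger model, r2_reduced for the smaller one.
    Degrees of freedom are q (numerator) and n - p - 1 (denominator).
    """
    df_num = q
    df_den = n - p - 1
    f = (df_den / df_num) * (r2_full - r2_reduced) / (1.0 - r2_full)
    return f, df_num, df_den

# Hypothetical inputs, chosen only for illustration.
f, df_num, df_den = nested_f_statistic(0.60, 0.50, n=43, p=2, q=1)
print(f"F = {f:.2f} with {df_num} and {df_den} degrees of freedom")
```

The P-value would then come from the F distribution with these degrees of freedom, using software or a table. When q = 1, this F statistic equals the square of the t statistic for the single coefficient being tested, up to rounding.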

Question 11.80

11.80 Assessing collinearity in the movie revenue model.

CASE 11.2 Many software packages will calculate VIF values for each explanatory variable. In this exercise, calculate the VIF values using several multiple regressions, and then use them to see if there is collinearity among the movie explanatory variables.

movies

  1. Use statistical software to estimate the multiple regression model for predicting Budget based on Opening and Theaters. Calculate the VIF value for Budget using R² from this model and the formula

       VIF = 1 / (1 − R²)

  2. Use statistical software to estimate the multiple regression model for predicting Opening based on Budget and Theaters. Calculate the VIF value for Opening using R² from this model and the formula from part (a).
  3. Use statistical software to estimate the multiple regression model for predicting Theaters based on Budget and Opening. Calculate the VIF value for Theaters using R² from this model and the formula from part (a).
  4. Do any of the calculated VIF values indicate severe collinearity among the explanatory variables? Explain your response.
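Once each R² is in hand, the VIF calculation itself is a one-line formula, VIF = 1/(1 − R²). The sketch below applies it to a few hypothetical R² values to show how quickly VIF grows as R² approaches 1; the values are not results from the movies data.

```python
def vif(r_squared):
    """Variance inflation factor from the R-squared of regressing one
    explanatory variable on the other explanatory variables."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R-squared must be at least 0 and less than 1")
    return 1.0 / (1.0 - r_squared)

# Hypothetical R-squared values, chosen only for illustration.
example_r2 = [0.0, 0.5, 0.9]
example_vif = [vif(r2) for r2 in example_r2]

for r2, value in zip(example_r2, example_vif):
    print(f"R-squared = {r2:.2f} gives VIF = {value:.2f}")
```

A VIF of 1 means the variable is uncorrelated with the other explanatory variables, and larger values indicate that more of its variation is explained by them.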

Question 11.81

11.81 Predicting movie revenue, continued.

CASE 11.2 Refer to Exercise 11.79. Although a quadratic relationship between total U.S. revenue and theater count provides a better fit than the linear model, it does not make sense that box office revenue would again increase for very low budgeted movies (unless you are the Syfy Channel). An alternative approach to describe the relationship between theater count and box office revenue is to consider a piecewise linear equation.

movies

  1. It appears the relationship between theater count and U.S. revenue changes around a count of 2800 theaters. Create a new variable that is max(0, Theaters − 2800). This is simply the difference between the theater count and 2800 with all negative differences set to 0.
  2. Fit the model with theater count and the variable you created in part (a). Report the relevant test statistic with its degrees of freedom and P-value, and summarize your conclusion.
  3. Obtain the fitted values from this model, and plot them versus theater count. Use this diagram to explain why this is called a piecewise linear model.
  4. Compare the results of this model with the quadratic fit of Exercise 11.79. Which model do you prefer? Explain your answer.

11.81

(b) . The new variable is significant and should be included in the model already containing Theaters. (It should be noted that Theaters is no longer significant in this model and could be removed.) (c) As shown in the plot, it is called a piecewise linear model because we are only measuring linearity for a piece of the variable Theaters (greater than 2800). (d) The results for the quadratic model and the results for the piecewise linear model are very similar; both models required that we retain the additional variable (quadratic or new). Answers will vary for preference; both models add some complexity for interpretation.
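The construction of the new variable, and the way it changes the slope at 2800 theaters, can be sketched in a few lines. The function names and the coefficient values below are hypothetical and are not the fitted values from the movies data.

```python
KNOT = 2800  # theater count at which the slope is allowed to change

def hinge(theaters, knot=KNOT):
    """New variable from part (a): max(0, theaters - knot)."""
    return max(0, theaters - knot)

def piecewise_mean(theaters, b0, b1, b2, knot=KNOT):
    """Mean response b0 + b1*theaters + b2*max(0, theaters - knot)."""
    return b0 + b1 * theaters + b2 * hinge(theaters, knot)

# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2 = 5.0, 0.01, 0.05

# Change in the mean over a 100-theater step on each side of the knot.
step_below = piecewise_mean(2100, b0, b1, b2) - piecewise_mean(2000, b0, b1, b2)
step_above = piecewise_mean(3000, b0, b1, b2) - piecewise_mean(2900, b0, b1, b2)

print(f"hinge(2000) = {hinge(2000)}, hinge(3500) = {hinge(3500)}")
print(f"change per 100 theaters below the knot: {step_below:.2f}")
print(f"change per 100 theaters above the knot: {step_above:.2f}")
```

Below the knot the slope is b1, and above it the slope is b1 + b2, so the fitted values form two straight-line segments that meet at 2800 theaters.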

Question 11.82

11.82 Predicting movie revenue: Model selection.

CASE 11.2 Refer to the data set on movie revenue in Case 11.2 (page 550). In addition to the movie’s budget, opening-weekend revenue, and opening-weekend theater count, the data set also includes a column named Sequel. Sequel is 1 if the corresponding movie is a sequel, and Sequel is 0 if the movie is not a sequel. Assuming opening-weekend revenue (Opening) is in the model, there are eight possible regression models. For example, one model just includes Opening; another model includes Opening and Theaters; and another model includes Opening, Sequel, and Theaters. Run these eight regressions and make a table giving the regression coefficients, the value of R², and the value of s for each regression. (If an explanatory variable is not included in a particular regression, enter a value 0 for its coefficient in the table.) Mark coefficients that are statistically significant at the 5% level with an asterisk (*). Summarize your results and state which model you prefer.

movies
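The count of eight models comes from keeping Opening in every model and either including or excluding each of the other three variables. A minimal sketch that lists them, assuming the column names used in this exercise:

```python
from itertools import combinations

# Opening is always in the model; each of these may be in or out.
optional = ["Budget", "Theaters", "Sequel"]

models = []
for k in range(len(optional) + 1):
    for extra in combinations(optional, k):
        models.append(["Opening", *extra])

for variables in models:
    print(len(variables), "variable(s):", ", ".join(variables))
```

Each entry in the list is one set of explanatory variables to pass to whatever regression software is being used.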

Question 11.83

11.83 Effect of an outlier.

CASE 11.2 In Exercise 11.50 (page 563), we identified a movie that had much higher revenue than predicted. Remove this movie and repeat the previous exercise. Does the removal of this movie change which model you prefer?

movies

11.83

                                 Regression Coefficients
# variables  R-Square  s       Intercept   Opening   Budget    Theaters  Sequel
1            0.716     41.368   18.04207   2.4782*   0         0         0
2            0.7785    36.997    6.14917   2.14815*  0.34102*  0         0
2            0.7486    39.414   21.26068   2.65651*  0         0         −31.97239*
2            0.7434    39.821  −55.80153   2.09021*  0         0.02787*  0
3            0.7919    36.334    9.88794   2.31116*  0.29527*  0         −21.28865
3            0.7831    37.092  −61.61398   2.23857*  0         0.03141*  −35.44187*
3            0.7821    37.176  −22.31507   2.03073*  0.30005*  0.01128   0
4            0.8008    36.026  −35.76104   2.15636*  0.21796   0.01843   −26.12162

Five models have all terms significant: Opening alone, Opening with Budget, Opening with Sequel, Opening with Theaters, and Opening with Theaters and Sequel. The model with both Theaters and Sequel is better than those with just Sequel or just Theaters. Likewise, the models with two variables are better than the model with Opening alone. That leaves two potentially good models: Opening with Budget, or Opening with Theaters and Sequel. Both have very similar R² and s values, so arguments for either model could be made.