SECTION 11.1 EXERCISES

For Exercise 11.1, see page 610 and for Exercise 11.2, see page 611.

Question 11.3

11.3 What’s wrong? In each of the following situations, explain what is wrong and why.

  1. (a) A small P-value for the ANOVA F test implies that all explanatory variables are significantly different from zero.

  2. (b) R² is the proportion of variation explained by the collection of explanatory variables. It can be obtained by squaring the correlations between y and each xi and summing them up.

  3. (c) In a multiple regression with a sample size of 45 and six explanatory variables, the test statistic for the null hypothesis H0: b2 = 0 is a t statistic that follows the t(38) distribution when the null hypothesis is true.

Question 11.4

11.4 What’s wrong? In each of the following situations, explain what is wrong and why.

  1. (a) One of the assumptions for multiple regression is that the distribution of each explanatory variable is Normal.

  2. (b) The null hypothesis H0: β3 = 0 in a multiple regression involving three explanatory variables implies there is no linear association between x3 and y.

  3. (c) The multiple correlation coefficient gives the average correlation between the response variable and each explanatory variable in the model.

Question 11.5

11.5 95% confidence intervals for regression coefficients. In each of the following settings, give a 95% confidence interval for the coefficient of x1.

  1. (a) n = 25, ,

  2. (b) n = 43, ,

  3. (c) n = 25, ,

  4. (d) n = 104, ,

Question 11.6

11.6 Significance tests for regression coefficients. For each of the settings in the previous exercise, test the null hypothesis that the coefficient of x1 is zero versus the two-sided alternative.

Question 11.7

11.7 Constructing the ANOVA table. Six explanatory variables are used to predict a response variable using a multiple regression. There are 183 observations.

  1. (a) Write the statistical model that is the foundation for this analysis. Also include a description of all assumptions.

  2. (b) Outline the analysis of variance table giving the sources of variation and numerical values for the degrees of freedom.
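The degrees of freedom in a multiple regression ANOVA table depend only on the number of observations n and the number of explanatory variables p. The following is a minimal Python sketch for checking hand work on part (b); the function name `anova_degrees_of_freedom` is illustrative and not part of any library.

```python
def anova_degrees_of_freedom(n, p):
    """Return (model, error, total) degrees of freedom for a multiple
    regression with n observations and p explanatory variables."""
    df_model = p          # one degree of freedom per explanatory variable
    df_error = n - p - 1  # observations minus estimated coefficients (p slopes + intercept)
    df_total = n - 1      # model and error degrees of freedom add to the total
    return df_model, df_error, df_total


# Setting of Exercise 11.7: six explanatory variables, 183 observations.
df_model, df_error, df_total = anova_degrees_of_freedom(n=183, p=6)
print(df_model, df_error, df_total)
```

The model and error degrees of freedom should always sum to the total, which is a quick consistency check on any ANOVA table.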

Question 11.8

11.8 More on constructing the ANOVA table. A multiple regression analysis of 57 cases was performed with four explanatory variables. Suppose that SSM = 16.5 and SSE = 100.8.

  1. (a) Find the value of the F statistic for testing the null hypothesis that the coefficients of all the explanatory variables are zero.

  2. (b) What are the degrees of freedom for this statistic?

  3. (c) Find bounds on the P-value using Table E. Show your work.

  4. (d) What proportion of the variation in the response variable is explained by the explanatory variables?
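The arithmetic behind parts (a), (b), and (d) can be sketched in a few lines of Python. This is only a check on hand calculations, using the standard definitions F = MSM/MSE and R² = SSM/SST; the function name `anova_f_and_r2` is an illustrative choice. The P-value bounds in part (c) still come from Table E.

```python
def anova_f_and_r2(ssm, sse, n, p):
    """Compute the ANOVA F statistic, its degrees of freedom, and R-squared
    from the model and error sums of squares."""
    df_model = p
    df_error = n - p - 1
    msm = ssm / df_model           # mean square for the model
    mse = sse / df_error           # mean square for error
    f_stat = msm / mse
    r_squared = ssm / (ssm + sse)  # SST = SSM + SSE
    return f_stat, (df_model, df_error), r_squared


# Setting of Exercise 11.8: 57 cases, four explanatory variables.
f_stat, dfs, r_squared = anova_f_and_r2(ssm=16.5, sse=100.8, n=57, p=4)
print(f"F = {f_stat:.3f} with df = {dfs}, R^2 = {r_squared:.3f}")
```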

Question 11.9

11.9 Significance tests for regression coefficients. Refer to Exercise 11.1 (page 610). The following table contains the estimated coefficients and standard errors of their multiple regression fit. Each explanatory variable is an average of several five-point Likert scale questions.

Variable                 Estimate   SE
Intercept                   1.316   0.651
Math course anxiety        −0.212   0.114
Math test anxiety          −0.155   0.119
Numerical task anxiety     −0.094   0.116
Enjoyment                   0.176   0.114
Self-confidence             0.118   0.114
Motivation                  0.097   0.115
Feedback usefulness         0.644   0.194

  1. (a) Look at the signs of the coefficients (positive and negative). Is this what you would expect in this setting? Explain your answer.

  2. (b) What are the degrees of freedom for the model and error?

  3. (c) Test the significance of each coefficient and state your conclusions.
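Each test in part (c) is based on the ratio t = estimate / SE. A short Python sketch of that step is shown here, with the estimates and standard errors copied from the table in this exercise. The variable names are illustrative. The degrees of freedom for the reference t distribution depend on the sample size given in Exercise 11.1, which is not repeated here, so the sketch stops at the t ratios.

```python
# (estimate, standard error) pairs copied from the table in Exercise 11.9.
coefficients = {
    "Intercept": (1.316, 0.651),
    "Math course anxiety": (-0.212, 0.114),
    "Math test anxiety": (-0.155, 0.119),
    "Numerical task anxiety": (-0.094, 0.116),
    "Enjoyment": (0.176, 0.114),
    "Self-confidence": (0.118, 0.114),
    "Motivation": (0.097, 0.115),
    "Feedback usefulness": (0.644, 0.194),
}

# The t statistic for H0: coefficient = 0 is the estimate divided by its SE.
t_stats = {name: est / se for name, (est, se) in coefficients.items()}

for name, t in t_stats.items():
    print(f"{name:<24} t = {t:6.2f}")
```

Each ratio is then compared with a t critical value for the error degrees of freedom to reach a conclusion.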

Question 11.10

11.10 ANOVA table for multiple regression. Use the following information and the general form of the ANOVA table for multiple regression on page 613 to perform the ANOVA F test and compute R².

Source   Degrees of freedom   Sum of squares   Mean square   F
Model             4                  70
Error
Total            33                 524

Question 11.11

11.11 Game-day spending. Game-day spending (ticket sales and food and beverage purchases) is critical for the sustainability of many professional sports teams. In the National Hockey League (NHL), nearly half the franchises generate more than two-thirds of their annual income from game-day spending. Understanding and possibly predicting this spending would allow teams to respond with appropriate marketing and pricing strategies. To investigate this possibility, a group of researchers looked at data from one NHL team over a three-season period (n = 123 home games).3 The following table summarizes the multiple regression used to predict ticket sales. Each explanatory variable is an indicator variable taking the value 1 for the condition specified and 0 otherwise.

Explanatory variables           b        t
Constant                12,493.47    12.13
Division                  −788.74    −2.01
Nonconference             −474.83    −1.04
November                 −1800.81    −2.65
December                  −559.24    −0.82
January                   −925.56    −1.54
February                   −35.59    −0.05
March                     −131.62    −0.21
Weekend                   2992.75     8.48
Night                     1460.31     2.13
Promotion                 2162.45     5.65
Season 2                  −754.56    −1.85
Season 3                  −779.81    −1.84

  1. (a) Which of the explanatory variables significantly aid prediction in the presence of all the explanatory variables? Show your work.

  2. (b) The overall F statistic was 11.59. What are the degrees of freedom and P-value of this statistic?

  3. (c) The value of R² is 0.52. What percent of the variance in ticket sales is explained by these explanatory variables?

  4. (d) The constant predicts the number of tickets sold for a nondivisional, conference game with no promotions played during the day on a weekday in October of Season 1. What is the predicted number of tickets sold for a divisional conference game with no promotions played on a weekend evening in March during Season 3?

  5. (e) Would a 95% confidence interval for the mean response or a 95% prediction interval be more appropriate to include with your answer to part (d)? Explain your reasoning.
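Because every explanatory variable in this model is a 0/1 indicator, a prediction is the constant plus the coefficients of the indicators that equal 1. The Python sketch below illustrates that mechanism for the game described in part (d). It assumes that an evening game is coded Night = 1 and that a conference game is coded Nonconference = 0; the `predict` helper is illustrative.

```python
# Coefficients copied from the regression table in Exercise 11.11.
coefficients = {
    "Constant": 12493.47,
    "Division": -788.74,
    "Nonconference": -474.83,
    "November": -1800.81,
    "December": -559.24,
    "January": -925.56,
    "February": -35.59,
    "March": -131.62,
    "Weekend": 2992.75,
    "Night": 1460.31,
    "Promotion": 2162.45,
    "Season 2": -754.56,
    "Season 3": -779.81,
}


def predict(indicators):
    """Add the constant to the coefficient of each indicator, weighted by
    its 0/1 value. Indicators not listed are treated as 0."""
    total = coefficients["Constant"]
    for name, value in indicators.items():
        total += coefficients[name] * value
    return total


# Assumed coding for part (d): divisional conference game, no promotion,
# weekend evening (Night = 1 is an assumption), March, Season 3.
game = {"Division": 1, "March": 1, "Weekend": 1, "Night": 1, "Season 3": 1}
predicted = predict(game)
print(round(predicted, 2))
```

Passing an empty dictionary to `predict` returns the constant, which matches the baseline game described at the start of part (d).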

Question 11.12

11.12 Discrimination at work? A survey of 457 engineers in Canada was performed to examine how race, language proficiency, and location of training relate to finding work in the engineering field. In addition, each participant completed the Workplace Prejudice and Discrimination Inventory (WPDI), which is designed to measure perceptions of prejudice on the job, primarily due to race or ethnicity. The score of the WPDI ranged from 16 to 112, with higher scores indicating more perceived discrimination. The following table summarizes two multiple regression models used to predict an engineer’s WPDI score. The first explanatory variable indicates whether the engineer was foreign trained (x = 1) or locally trained (x = 0). The next seven variables indicate race, and the last six are demographic variables.

                             Model 1          Model 2
Explanatory variables        b       s(b)     b       s(b)
Foreign trained              0.55    0.21     0.58    0.22
Chinese                                       0.06    0.24
South Asian                                  −0.06    0.19
Black                                        −0.03    0.52
Other Asian                                  −0.38    0.34
Latin American                                0.20    0.46
Arab                                          0.56    0.44
Other (not white)                             0.05    0.38
Mechanical                  −0.19    0.25    −0.16    0.25
Other (not electrical)      −0.14    0.20    −0.13    0.21
Masters/PhD                  0.32    0.18     0.37    0.18
30–39 years old             −0.03    0.22    −0.06    0.22
40 or older                  0.32    0.25     0.25    0.26
Female                      −0.02    0.19    −0.05    0.19
R²                           0.10             0.11

  1. (a) The F statistics for these two models are 7.12 and 3.90, respectively. What are the degrees of freedom and P-value of each statistic?

  2. (b) The F statistics for the multiple regressions are highly significant, but the R² values are relatively low. Explain to a statistical novice how this can occur.

  3. (c) Do foreign-trained engineers perceive more discrimination than do locally trained engineers? To address this, test whether the first coefficient in each model is equal to zero versus the one-sided alternative that it is greater than zero. Summarize your results.
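The bookkeeping for parts (a) and (c) can be sketched in Python as follows. The sketch assumes that all 457 engineers are used in both fits, that Model 1 contains the seven non-race variables, and that Model 2 contains all fourteen. These counts are read from the table and are not stated explicitly in the exercise. P-values still need a table or statistical software.

```python
n = 457  # assumption: every surveyed engineer is used in both models

# p = number of explanatory variables; b and se are the "Foreign trained"
# estimate and standard error copied from the table in Exercise 11.12.
models = {
    "Model 1": {"p": 7, "b": 0.55, "se": 0.21},
    "Model 2": {"p": 14, "b": 0.58, "se": 0.22},
}

summary = {}
for name, m in models.items():
    df_model = m["p"]
    df_error = n - m["p"] - 1  # also the df for the coefficient t test
    t_stat = m["b"] / m["se"]  # t ratio for H0: coefficient = 0
    summary[name] = (df_model, df_error, t_stat)
    print(f"{name}: F df = ({df_model}, {df_error}), t = {t_stat:.2f}")
```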