For Exercise 11.1, see page 610 and for Exercise 11.2, see page 611.
11.3 What’s wrong? In each of the following situations, explain what is wrong and why.
(a) A small P-value for the ANOVA F test implies that all explanatory variables are significantly different from zero.
(b) R² is the proportion of variation explained by the collection of explanatory variables. It can be obtained by squaring the correlations between y and each xi and summing them up.
(c) In a multiple regression with a sample size of 45 and six explanatory variables, the test statistic for the null hypothesis H0: b2 = 0 is a t statistic that follows the t(38) distribution when the null hypothesis is true.
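11.3 (a) A small P-value indicates only that at least one regression coefficient is different from zero, not that all of them are. (b) R² is not obtained by squaring and adding the pairwise correlations; it is the square of the correlation between y and the fitted values ŷ. (c) The null hypothesis should be stated about the parameter β2, not the estimate b2: H0: β2 = 0.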
11.4 What’s wrong? In each of the following situations, explain what is wrong and why.
(a) One of the assumptions for multiple regression is that the distribution of each explanatory variable is Normal.
(b) The null hypothesis H0: β3 = 0 in a multiple regression involving three explanatory variables implies there is no linear association between x3 and y.
(c) The multiple correlation coefficient gives the average correlation between the response variable and each explanatory variable in the model.
11.5 95% confidence intervals for regression coefficients. In each of the following settings, give a 95% confidence interval for the coefficient of x1.
(a) n = 25, ŷ = 1.6 + 6.4x1 + 5.7x2, SEb1 = 3.3
(b) n = 43, ŷ = 1.6 + 6.4x1 + 5.7x2, SEb1 = 2.9
(c) n = 25, ŷ = 1.6 + 4.8x1 + 3.2x2 + 5.2x3, SEb1 = 2.7
(d) n = 104, ŷ = 1.6 + 4.8x1 + 3.2x2 + 5.2x3, SEb1 = 1.8
11.5 (a) (−0.4442, 13.2442). (b) (0.5391, 12.2609). (c) (−0.816, 10.416). (d) (1.2288, 8.3712).
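The intervals in 11.5 have the form b1 ± t* × SEb1, where t* comes from the t distribution with n − p − 1 degrees of freedom. The following is a minimal sketch of that arithmetic, not a prescribed method; the names `T_STAR_95`, `coefficient_ci`, and `SETTINGS` are hypothetical, and the t* values are assumed to be read from a printed t table rather than computed.

```python
# Critical values t* for 95% confidence, keyed by degrees of freedom.
# Assumption of this sketch: values are looked up in a t table, not computed.
T_STAR_95 = {21: 2.080, 22: 2.074, 40: 2.021, 100: 1.984}


def coefficient_ci(n, p, b1, se_b1):
    """Return the 95% interval b1 +/- t* x SE for a model with p explanatory variables."""
    df = n - p - 1                      # error degrees of freedom
    margin = T_STAR_95[df] * se_b1      # margin of error
    return (round(b1 - margin, 4), round(b1 + margin, 4))


# (n, number of explanatory variables, b1, SE of b1) for settings (a)-(d)
SETTINGS = {
    "a": (25, 2, 6.4, 3.3),
    "b": (43, 2, 6.4, 2.9),
    "c": (25, 3, 4.8, 2.7),
    "d": (104, 3, 4.8, 1.8),
}

for label, setting in SETTINGS.items():
    print(label, coefficient_ci(*setting))
```

Software that computes t* exactly may give endpoints that differ slightly in the last decimal places from the table-based values.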
11.6 Significance tests for regression coefficients. For each of the settings in the previous exercise, test the null hypothesis that the coefficient of x1 is zero versus the two-
11.7 Constructing the ANOVA table. Six explanatory variables are used to predict a response variable using a multiple regression. There are 183 observations.
(a) Write the statistical model that is the foundation for this analysis. Also include a description of all assumptions.
(b) Outline the analysis of variance table giving the sources of variation and numerical values for the degrees of freedom.
11.7 (a) y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + ϵ, where the deviations ϵ are independent and Normally distributed with mean 0 and standard deviation σ, written ϵ ~ N(0, σ). (b) The sources of variation are model (DFM = p = 6), error (DFE = n − p − 1 = 176), and total (DFT = n − 1 = 182).
11.8 More on constructing the ANOVA table. A multiple regression analysis of 57 cases was performed with four explanatory variables. Suppose that SSM = 16.5 and SSE = 100.8.
(a) Find the value of the F statistic for testing the null hypothesis that the coefficients of all the explanatory variables are zero.
(b) What are the degrees of freedom for this statistic?
(c) Find bounds on the P-value using Table E. Show your work.
(d) What proportion of the variation in the response variable is explained by the explanatory variables?
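The quantities asked for in Exercise 11.8 all follow from the ANOVA table: MSM = SSM/DFM, MSE = SSE/DFE, F = MSM/MSE, and R² = SSM/(SSM + SSE). The sketch below works through that arithmetic under the exercise's stated values; the function name `anova_summary` is hypothetical, and the P-value bounds in part (c) still have to be read from Table E.

```python
def anova_summary(n, p, ssm, sse):
    """Fill in the multiple regression ANOVA quantities from n, p, SSM, and SSE."""
    dfm = p              # model degrees of freedom
    dfe = n - p - 1      # error degrees of freedom
    msm = ssm / dfm      # mean square for the model
    mse = sse / dfe      # mean square for error
    return {
        "DFM": dfm,
        "DFE": dfe,
        "MSM": msm,
        "MSE": mse,
        "F": msm / mse,
        "R2": ssm / (ssm + sse),   # SST = SSM + SSE
    }


# Values stated in Exercise 11.8: 57 cases, four explanatory variables
summary = anova_summary(n=57, p=4, ssm=16.5, sse=100.8)
for name, value in summary.items():
    print(f"{name}: {value:.4g}")
```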
11.9 Significance tests for regression coefficients. Refer to Exercise 11.1 (page 610). The following table contains the estimated coefficients and standard errors of their multiple regression fit. Each explanatory variable is an average of several five-
Variable | Estimate | SE |
---|---|---|
Intercept | 1.316 | 0.651 |
Math course anxiety | −0.212 | 0.114 |
Math test anxiety | −0.155 | 0.119 |
Numerical task anxiety | −0.094 | 0.116 |
Enjoyment | 0.176 | 0.114 |
Self- | 0.118 | 0.114 |
Motivation | 0.097 | 0.115 |
Feedback usefulness | 0.644 | 0.194 |
(a) Look at the signs of the coefficients (positive and negative). Is this what you would expect in this setting? Explain your answer.
(b) What are the degrees of freedom for the model and error?
(c) Test the significance of each coefficient and state your conclusions.
11.9 (a) Things seem as expected: the three anxiety variables have negative signs, and the other four variables all have positive signs. (b) 7 and 158. (c) Only Feedback usefulness is significant at the 0.05 level (t = 3.320, P-value = 0.0011).
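Each test in part (c) uses t = estimate/SE, compared with the t distribution on the error degrees of freedom. The sketch below reproduces that check for the seven explanatory variables (the intercept is left out). The names are hypothetical, and the cutoff 1.975 is an assumed approximate two-sided 5% critical value for t(158).

```python
# Assumption: approximate two-sided 5% critical value for t with 158 df
T_CRITICAL = 1.975

# (estimate, standard error) copied from the table in Exercise 11.9
COEFFICIENTS = {
    "Math course anxiety": (-0.212, 0.114),
    "Math test anxiety": (-0.155, 0.119),
    "Numerical task anxiety": (-0.094, 0.116),
    "Enjoyment": (0.176, 0.114),
    "Self-": (0.118, 0.114),  # label is truncated in the source table
    "Motivation": (0.097, 0.115),
    "Feedback usefulness": (0.644, 0.194),
}


def t_statistic(estimate, se):
    """t statistic for H0: coefficient = 0."""
    return estimate / se


t_values = {name: t_statistic(*pair) for name, pair in COEFFICIENTS.items()}
significant = [name for name, t in t_values.items() if abs(t) > T_CRITICAL]

for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}")
print("Significant at 0.05:", significant)
```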
11.10 ANOVA table for multiple regression. Use the following information and the general form of the ANOVA table for multiple regression on page 613 to perform the ANOVA F test and compute R².
Source | Degrees of freedom | Sum of squares | Mean square | F |
---|---|---|---|---|
Model | 4 | 70 | ||
Error | ||||
Total | 33 | 524 |
11.11 Game-
Explanatory variables | b | t |
---|---|---|
Constant | 12,493.47 | 12.13 |
Division | −788.74 | −2.01 |
Nonconference | −474.83 | −1.04 |
November | −1800.81 | −2.65 |
December | −559.24 | −0.82 |
January | −925.56 | −1.54 |
February | −35.59 | −0.05 |
March | −131.62 | −0.21 |
Weekend | 2992.75 | 8.48 |
Night | 1460.31 | 2.13 |
Promotion | 2162.45 | 5.65 |
Season 2 | −754.56 | −1.85 |
Season 3 | −779.81 | −1.84 |
(a) Which of the explanatory variables significantly aid prediction in the presence of all the explanatory variables? Show your work.
(b) The overall F statistic was 11.59. What are the degrees of freedom and P-value of this statistic?
(c) The value of R² is 0.52. What percent of the variance in ticket sales is explained by these explanatory variables?
(d) The constant predicts the number of tickets sold for a nondivisional, conference game with no promotions played during the day on a weekday in October of Season 1. What is the predicted number of tickets sold for a divisional conference game with no promotions played on a weekend evening in March during Season 3?
(e) Would a 95% confidence interval for the mean response or a 95% prediction interval be more appropriate to include with your answer to part (d)? Explain your reasoning.
11.11 (a) We need |t| > 1.984, so Division, November, Weekend, Night, and Promotion are all significant in the presence of all the other explanatory variables. (b) df = 12 and 110, P-value < 0.001. (c) 52%. (d) 15,246.36. (e) A prediction interval is more appropriate because the prediction is for one particular game, not for the mean of all such games.
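Parts (a) and (d) can be checked directly from the coefficient table: a variable is flagged when |t| exceeds 1.984, and the prediction is the constant plus the coefficients of the indicators that apply to the game. The sketch below assumes that "weekend evening" switches on both the Weekend and Night indicators; the names `COEFFICIENTS` and `predict_tickets` are hypothetical.

```python
# (coefficient b, t statistic) copied from the table in Exercise 11.11
COEFFICIENTS = {
    "Constant": (12493.47, 12.13),
    "Division": (-788.74, -2.01),
    "Nonconference": (-474.83, -1.04),
    "November": (-1800.81, -2.65),
    "December": (-559.24, -0.82),
    "January": (-925.56, -1.54),
    "February": (-35.59, -0.05),
    "March": (-131.62, -0.21),
    "Weekend": (2992.75, 8.48),
    "Night": (1460.31, 2.13),
    "Promotion": (2162.45, 5.65),
    "Season 2": (-754.56, -1.85),
    "Season 3": (-779.81, -1.84),
}
T_CRITICAL = 1.984  # cutoff used in the answer to part (a)

# Part (a): explanatory variables (constant excluded) with |t| above the cutoff
significant = [
    name
    for name, (_, t) in COEFFICIENTS.items()
    if name != "Constant" and abs(t) > T_CRITICAL
]


def predict_tickets(active_indicators):
    """Constant plus the coefficient of every indicator that equals 1."""
    total = COEFFICIENTS["Constant"][0]
    for name in active_indicators:
        total += COEFFICIENTS[name][0]
    return round(total, 2)


# Part (d): divisional conference game, no promotion, weekend evening, March, Season 3
game = ["Division", "March", "Weekend", "Night", "Season 3"]
predicted = predict_tickets(game)

print(significant)
print(predicted)
```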
11.12 Discrimination at work? A survey of 457 engineers in Canada was performed to identify the relationship of race, language proficiency, and location of training in finding work in the engineering field. In addition, each participant completed the Workplace Prejudice and Discrimination Inventory (WPDI), which is designed to measure perceptions of prejudice on the job, primarily due to race or ethnicity. The score of the WPDI ranged from 16 to 112, with higher scores indicating more perceived discrimination. The following table summarizes two multiple regression models used to predict an engineer’s WPDI score. The first explanatory variable indicates whether the engineer was foreign trained (x = 1) or locally trained (x = 0). The next set of seven variables indicate race and the last six are demographic variables.
Explanatory variables | Model 1: b | Model 1: s(b) | Model 2: b | Model 2: s(b) |
---|---|---|---|---|
Foreign trained | 0.55 | 0.21 | 0.58 | 0.22 |
Chinese | | | 0.06 | 0.24 |
South Asian | | | −0.06 | 0.19 |
Black | | | −0.03 | 0.52 |
Other Asian | | | −0.38 | 0.34 |
Latin American | | | 0.20 | 0.46 |
Arab | | | 0.56 | 0.44 |
Other (not white) | | | 0.05 | 0.38 |
Mechanical | −0.19 | 0.25 | −0.16 | 0.25 |
Other (not electrical) | −0.14 | 0.20 | −0.13 | 0.21 |
Masters/PhD | 0.32 | 0.18 | 0.37 | 0.18 |
30– | −0.03 | 0.22 | −0.06 | 0.22 |
40 or older | 0.32 | 0.25 | 0.25 | 0.26 |
Female | −0.02 | 0.19 | −0.05 | 0.19 |
R² | 0.10 | | 0.11 | |
(a) The F statistics for these two models are 7.12 and 3.90, respectively. What are the degrees of freedom and P-value of each statistic?
(b) The F statistics for the multiple regressions are highly significant, but the R² values are relatively low. Explain to a statistical novice how this can occur.
(c) Do foreign-