SECTION 11.2 Exercises

For Exercise 11.33, see page 550; for 11.34 and 11.35, see page 554; for 11.36 and 11.37, see page 555; for 11.38, see page 558; for 11.39, see page 559; and for 11.40 and 11.41, see pages 560–561.

Question 11.42

11.42 Confidence interval for a regression coefficient.

In each of the following settings, give a 95% confidence interval for the coefficient of .

  1. .
  2. .
  3. .
  4. .

Question 11.43

11.43 Significance test for a regression coefficient.

For each of the settings in the previous exercise, test the null hypothesis that the coefficient of is zero versus the two-sided alternative.

11.43

for all cases. (a) ; technically, this is significant. (b) ; this is significant. (c) ; this is not significant. (d) use ; this is significant.

Question 11.44

11.44 What’s wrong?

In each of the following situations, explain what is wrong and why.

  1. One of the assumptions for multiple regression is that the distribution of each explanatory variable is Normal.
  2. The smaller the P-value for the ANOVA F test, the greater the explanatory power of the model.
  3. All explanatory variables that are significantly correlated with the response variable will have a statistically significant regression coefficient in the multiple regression model.

Question 11.45

11.45 What’s wrong?

In each of the following situations, explain what is wrong and why.

  1. The multiple correlation gives the proportion of the variation in the response variable that is explained by the explanatory variables.
  2. In a multiple regression with a sample size of 35 and four explanatory variables, the test statistic for the null hypothesis is a t statistic that follows the t(30) distribution when the null hypothesis is true.
  3. A small P-value for the ANOVA F test implies that all explanatory variables are statistically different from zero.

11.45

(a) This is true for the squared multiple correlation. (b) We should not have a slope estimate in the hypotheses; it should be a parameter, . (c) A significant test implies that at least one explanatory variable is statistically different from zero, not necessarily all.

Question 11.46

11.46 Inference basics.

You run a multiple regression with 54 cases and three explanatory variables.

  1. What are the degrees of freedom for the F statistic for testing the null hypothesis that all three of the regression coefficients for the explanatory variables are zero?
  2. Software output gives . What is the estimate of the standard deviation of the model?
  3. The output gives the estimate of the regression coefficient for the first explanatory variable as 0.85 with a standard error of 0.43. Find a 95% confidence interval for the true value of this coefficient.
  4. Test the null hypothesis that the regression coefficient for the first explanatory variable is zero. Give the test statistic, the degrees of freedom, the P-value, and your conclusion.
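
The arithmetic behind parts (a), (c), and (d) can be sketched in Python as a rough check. The critical value 2.009 is the tabled t* for 50 degrees of freedom at 95% confidence, entered by hand here as an assumption; the variable names are illustrative only.

```python
# Exercise 11.46 sketch: degrees of freedom, confidence interval, t statistic.
n = 54          # number of cases
p = 3           # number of explanatory variables

df_model = p            # numerator degrees of freedom for the F statistic
df_error = n - p - 1    # denominator df for F; also the df for each t statistic

b1 = 0.85       # estimated coefficient of the first explanatory variable
se_b1 = 0.43    # its standard error
t_star = 2.009  # assumed: tabled critical value of t(50) for 95% confidence

margin = t_star * se_b1
ci = (b1 - margin, b1 + margin)   # 95% confidence interval
t_stat = b1 / se_b1               # test statistic for H0: coefficient = 0

print("df:", df_model, df_error)
print("95% CI:", round(ci[0], 3), round(ci[1], 3))
print("t:", round(t_stat, 2))
```

Under these assumptions the interval just barely includes zero, which agrees with a t statistic slightly below the critical value.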

Question 11.47

11.47 Inference basics.

You run a multiple regression with 22 cases and four explanatory variables. The ANOVA table includes the sums of squares and .

  1. Find the F statistic for testing the null hypothesis that the regression coefficients for the four explanatory variables are all zero. Carry out the significance test and report the results.
  2. What is the value of R² for this model? Explain what this number tells us.

11.47

. Using of 4 and , the data are not significant at the 5% level, and there is not enough evidence to say that at least one of the slopes is not zero. (b) . 39.81% of the variation in the response variable is explained by all the explanatory variables.

Question 11.48

11.48 Discrimination at work?

A survey of 457 engineers in Canada was performed to identify the relationship of race, language proficiency, and location of training in finding work in the engineering field. In addition, each participant completed the Workplace Prejudice and Discrimination Inventory (WPDI), which is designed to measure perceptions of prejudice on the job, primarily due to race or ethnicity. The score of the WPDI ranged from 16 to 112, with higher scores indicating more perceived discrimination. The following table summarizes two multiple regression models used to predict an engineer’s WPDI score. The first explanatory variable indicates whether the engineer was foreign trained () or locally trained (). The next set of seven variables indicate race and the last six are demographic variables.

Explanatory variables      Model 1            Model 2
                            b      SE          b      SE
Foreign trained          0.55    0.21       0.58    0.22
Chinese                                     0.06    0.24
South Asian                                −0.06    0.19
Black                                      −0.03    0.52
Other Asian                                −0.38    0.34
Latin American                              0.20    0.46
Arab                                        0.56    0.44
Other (not white)                           0.05    0.38
Mechanical              −0.19    0.25      −0.16    0.25
Other (not electrical)  −0.14    0.20      −0.13    0.21
Masters/PhD              0.32    0.18       0.37    0.18
30-39 years old         −0.03    0.22      −0.06    0.22
40 or older              0.32    0.25       0.25    0.26
Female                  −0.02    0.19      −0.05    0.19
R²                       0.10               0.11

  1. The F statistics for these two models are 7.12 and 3.90, respectively. What are the degrees of freedom and P-value of each statistic?
  2. The F statistics for the multiple regressions are highly significant, but the R² values are relatively low. Explain to a statistical novice how this can occur.
  3. Do foreign trained engineers perceive more discrimination than do locally trained engineers? To address this, test if the first coefficient in each model is equal to zero. Summarize your results.
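
One way to see where the degrees of freedom in part (a) come from is to rebuild each F statistic from R². The sketch below assumes that the last row of the table (0.10 and 0.11) reports R² for the two models and that Model 1 uses 7 explanatory variables while Model 2 uses all 14; the function name `f_from_r2` is hypothetical.

```python
# Exercise 11.48 sketch: relate the overall F statistic to R-squared.
def f_from_r2(r2, n, p):
    """Return (F, numerator df, denominator df) for a model with
    p explanatory variables fitted to n cases."""
    df_model = p
    df_error = n - p - 1
    f = (r2 / df_model) / ((1 - r2) / df_error)
    return f, df_model, df_error

n = 457  # engineers surveyed

f1, df1_num, df1_den = f_from_r2(0.10, n, 7)    # Model 1 (assumed 7 variables)
f2, df2_num, df2_den = f_from_r2(0.11, n, 14)   # Model 2 (assumed 14 variables)

print(round(f1, 2), df1_num, df1_den)
print(round(f2, 2), df2_num, df2_den)
```

Because the tabled R² values are rounded, the rebuilt F statistics agree with the reported 7.12 and 3.90 only approximately.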

Question 11.49

11.49 Checking the model assumptions.

CASE 11.2 Statistical inference requires us to make some assumptions about our data. These should always be checked prior to drawing conclusions. For brevity, we did not discuss this assessment for the movie revenue data of Section 11.2, so let’s do it here.

movies

  1. Obtain the residuals for the multiple regression in Example 11.13 (pages 552–553), and construct a histogram and Normal quantile plot. Do the residuals appear approximately Normal? Explain your answer.
  2. Plot the residuals versus the opening-weekend revenue. Comment on anything unusual in the plot.
  3. Repeat part (b) using the explanatory variable Budget on the x axis.
  4. Repeat part (b) using the predicted value on the x axis.
  5. Summarize your overall findings from these summaries. Are the model assumptions reasonably satisfied? Explain your answer.

11.49

(a) The residuals are right-skewed and not Normally distributed. (b) There are two outliers in the residual plot for Opening—one with a very high residual, one with a very large Opening value. (c) The residual plot for Budget again shows the observation with a very high residual; otherwise, it looks fairly good (random). (d) The residual plot against the predicted values shows a megaphone effect suggesting non-constant variance. (e) The model assumptions are not reasonably satisfied; the residuals are right-skewed, and there are several outliers in the dataset that are potentially influencing the regression analysis.

Question 11.50

11.50 Effect of a potential outlier.

CASE 11.2 Refer to the previous exercise.

movies

  1. There is one movie that has a much larger total U.S. box office revenue than predicted. Which is it, and how much more revenue did it obtain compared with that predicted?
  2. Remove this movie and redo the multiple regression. Make a table giving the regression coefficients and their standard errors, t statistics, and P-values.
  3. Compare these results with those presented in Example 11.13 (pages 552–553). How does the removal of this outlying movie affect the estimated model?
  4. Obtain the residuals from this reduced data set and graphically examine their distribution. Do the residuals appear approximately Normal? Is there constant variance? Explain your answer.

Question 11.51

11.51 Game-day spending.

Game-day spending (ticket sales and food and beverage purchases) is critical for the sustainability of many professional sports teams. In the National Hockey League (NHL), nearly half the franchises generate more than two-thirds of their annual income from game-day spending. Understanding and possibly predicting this spending would allow teams to respond with appropriate marketing and pricing strategies. To investigate this possibility, a group of researchers looked at data from one NHL team over a three-season period ( home games).12 The following table summarizes the multiple regression used to predict ticket sales.

Explanatory variables            b        t
Constant                 12,493.47    12.13
Division                   −788.74    −2.01
Nonconference              −474.83    −1.04
November                  −1800.81    −2.65
December                   −559.24    −0.82
January                    −925.56    −1.54
February                    −35.59    −0.05
March                      −131.62    −0.21
Weekend                    2992.75     8.48
Night                      1460.31     2.13
Promotion                  2162.45     5.65
Season 2                   −754.56    −1.85
Season 3                   −779.81    −1.84

  1. Which of the explanatory variables significantly aid prediction in the presence of all the explanatory variables? Show your work.
  2. The overall F statistic was 11.59. What are the degrees of freedom and P-value of this statistic?
  3. The value of R² is 0.52. What percent of the variance in ticket sales is explained by these explanatory variables?
  4. The constant predicts the number of tickets sold for a nondivisional, conference game with no promotions played during the day during the week in October during Season 1. What is the predicted number of tickets sold for a divisional conference game with no promotions played on a weekend evening in March during Season 3?
  5. Would a 95% confidence interval for the mean response or a 95% prediction interval be more appropriate to include with your answer to part (d)? Explain your reasoning.

11.51

(a) Using and (use 100), for significance we need . So Division, November, Weekend, Night, and Promotion are all significant in the presence of all the other explanatory variables. (b) . (c) 52%. (d) 15246.36. (e) Because we don’t expect the same setting for very many games, the mean response interval doesn’t make sense, so a prediction interval is more appropriate to represent this particular game and its specific settings.
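
The prediction in part (d) is a sum of the constant and the coefficients of the indicator variables that apply to the game in question. A minimal Python sketch, assuming that "evening" corresponds to the Night indicator and that every indicator not listed as active equals zero:

```python
# Exercise 11.51(d) sketch: predicted ticket sales from indicator variables.
coef = {
    "Constant": 12493.47,
    "Division": -788.74,
    "Nonconference": -474.83,
    "November": -1800.81,
    "December": -559.24,
    "January": -925.56,
    "February": -35.59,
    "March": -131.62,
    "Weekend": 2992.75,
    "Night": 1460.31,
    "Promotion": 2162.45,
    "Season 2": -754.56,
    "Season 3": -779.81,
}

# Divisional conference game, no promotion, weekend evening, March, Season 3.
active = ["Division", "March", "Weekend", "Night", "Season 3"]

predicted = coef["Constant"] + sum(coef[name] for name in active)
print(round(predicted, 2))
```

Changing the `active` list gives the prediction for any other combination of game settings.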

Question 11.52

11.52 Bank auto loans.

Banks charge different interest rates for different loans. A random sample of 2229 loans made by banks for the purchase of new automobiles was studied to identify variables that explain the interest rate charged. A multiple regression was run with interest rate as the response variable and 13 explanatory variables.13

  1. The F statistic reported is 71.34. State the null and alternative hypotheses for this statistic. Give the degrees of freedom and the P-value for this test. What do you conclude?
  2. The value of R² is 0.297. What percent of the variation in interest rates is explained by the 13 explanatory variables?

Question 11.53

11.53 Bank auto loans, continued.

Table 11.4 gives the coefficients for the fitted model and the individual t statistic for each explanatory variable in the study described in the previous exercise. The t-values are given without the sign, assuming that all tests are two-sided.

TABLE 11.4 Regression coefficients and t statistics for Exercise 11.53
Variable                                               b           t
Intercept                                             15.47
Loan size (in dollars)                                −0.0015  10.30
Length of loan (in months)                            −0.906    4.20
Percent down payment                                  −0.522    8.35
Cosigner (, )                                         −0.009    3.02
Unsecured loan (, )                                    0.034    2.19
Total payments (borrower’s monthly installment debt)   0.100    1.37
Total income (borrower’s total monthly income)        −0.170    2.37
Bad credit report (, )                                 0.012    1.99
Young borrower (, )                                    0.027    2.85
Male borrower (, )                                    −0.001    0.89
Married (, )                                          −0.023    1.91
Own home (, )                                         −0.011    2.73
Years at current address                              −0.124    4.21

  1. State the null and alternative hypotheses tested by an individual t statistic. What are the degrees of freedom for these t statistics? What values of t will lead to rejection of the null hypothesis at the 5% level?
  2. Which of the explanatory variables have coefficients that are significantly different from zero in this model? Explain carefully what you conclude when an individual t statistic is not significant.
  3. The signs of many of the coefficients are what we might expect before looking at the data. For example, the negative coefficient for loan size means that larger loans get a smaller interest rate. This is reasonable. Examine the signs of each of the statistically significant coefficients and give a short explanation of what they tell us.

11.53

(a) . (b) Loan size, Length of loan, Percent down, Cosigner, Unsecured loan, Total income, Bad credit report, Young borrower, Own home, and Years at current address are significant. A coefficient that is not significant means only that the variable adds no useful information once all the other variables are already in the model. (c) Having a larger loan size gives a smaller interest rate. Having a longer loan gives a smaller interest rate. Having a larger percent down payment gives a smaller interest rate. Having a cosigner gives a smaller interest rate. Having an unsecured loan gives a larger interest rate. Having larger total income gives a smaller interest rate. Having a bad credit report gives a larger interest rate. Being a young borrower gives a larger interest rate. Owning a home gives a smaller interest rate. More years at current address gives a smaller interest rate.
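
The screening in part (b) can be sketched in Python by comparing each tabled t-value with a critical value. With 2229 loans and 13 explanatory variables the error degrees of freedom are large, so the sketch assumes the standard Normal value 1.96 as an approximation to the t critical value; the shortened variable names are illustrative.

```python
# Exercise 11.53(b) sketch: which |t| values exceed the 5% critical value?
n, p = 2229, 13
df_error = n - p - 1   # degrees of freedom for each individual t statistic

t_crit = 1.96  # assumed: Normal approximation to the two-sided 5% t critical value

# Absolute t-values from Table 11.4 (signs are omitted in the table).
t_values = {
    "Loan size": 10.30,
    "Length of loan": 4.20,
    "Percent down": 8.35,
    "Cosigner": 3.02,
    "Unsecured loan": 2.19,
    "Total payments": 1.37,
    "Total income": 2.37,
    "Bad credit report": 1.99,
    "Young borrower": 2.85,
    "Male borrower": 0.89,
    "Married": 1.91,
    "Own home": 2.73,
    "Years at current address": 4.21,
}

significant = [name for name, t in t_values.items() if t > t_crit]
not_significant = [name for name in t_values if name not in significant]

print("df:", df_error)
print("significant:", significant)
print("not significant:", not_significant)
```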

Question 11.54

11.54 Auto dealer loans.

The previous two exercises describe auto loans made directly by a bank. The researchers also looked at 5664 loans made indirectly—that is, through an auto dealer. They again used multiple regression to predict the interest rate using the same set of 13 explanatory variables.

  1. The F statistic reported is 27.97. State the null and alternative hypotheses for this statistic. Give the degrees of freedom and the P-value for this test. What do you conclude?
  2. The value of R² is 0.141. What percent of the variation in interest rates is explained by the 13 explanatory variables? Compare this value with the percent explained for direct loans in Exercise 11.53.

Question 11.55

11.55 Auto dealer loans, continued.

Table 11.5 gives the estimated regression coefficient and individual t statistic for each explanatory variable in the setting of the previous exercise. The t-values are given without the sign, assuming that all tests are two-sided.

  1. What are the degrees of freedom of any individual t statistic for this model? What values of t are significant at the 5% level? Explain carefully what significance tells us about an explanatory variable.
  2. Which of the explanatory variables have coefficients that are significantly different from zero in this model?
  3. The signs of many of these coefficients are what we might expect before looking at the data. For example, the negative coefficient for loan size means that larger loans get a smaller interest rate. This is reasonable. Examine the signs of each of the statistically significant coefficients and give a short explanation of what they tell us.

11.55

(a) . A significant coefficient tells us that the variable is useful in predicting the response once all the other variables are already in the model. (b) Only Loan size, Length of loan, Percent down, and Unsecured loan are significant. (c) Having a larger loan size gives a smaller interest rate. Having a longer loan gives a smaller interest rate. Having a larger percent down payment gives a smaller interest rate. Having an unsecured loan gives a larger interest rate.

Question 11.56

11.56 Direct versus indirect loans.

The previous four exercises describe a study of loans for buying new cars. The authors conclude that banks take higher risks with indirect loans because they do not take into account borrower characteristics when setting the loan rate. Explain how the results of the multiple regressions lead to this conclusion.

TABLE 11.5 Regression coefficients and t statistics for Exercise 11.55
Variable                                               b           t
Intercept                                             15.89
Loan size (in dollars)                                −0.0029  17.40
Length of loan (in months)                            −1.098    5.63
Percent down payment                                  −0.308    4.92
Cosigner (, )                                         −0.001    1.41
Unsecured loan (, )                                    0.028    2.83
Total payments (borrower’s monthly installment debt)  −0.513    1.37
Total income (borrower’s total monthly income)         0.078    0.75
Bad credit report (, )                                 0.039    1.76
Young borrower (, )                                   −0.036    1.33
Male borrower (, )                                    −0.179    1.03
Married (, )                                          −0.043    1.61
Own home (, )                                         −0.047    1.59
Years at current address                              −0.086    1.73

Question 11.57

11.57 Canada’s Small Business Financing Program.

The Canada Small Business Financing Program (CSBFP) seeks to increase the availability of loans for establishing and improving small businesses. A survey was performed to better understand the experiences of small businesses when seeking loans and the extent to which they are aware of and satisfied with the CSBFP.14 A total of 1050 survey interviews were completed. To understand the drivers of perceived fairness of CSBFP terms and conditions, a multiple regression was undertaken. The response variable was the subject’s perceived fairness scored on a 5-point scale, where 1 means “very unfair” and 5 means “very fair.” The 15 explanatory variables included characteristics of the survey participant (gender, francophone, loan history, previous CSBFP borrower) and characteristics of his or her small business (type, location, size).

  1. What are the degrees of freedom for the F statistic of the model that contains all the predictors?
  2. The report states that the P-value for the overall F test is and that the complete set of predictors has an R² of 0.031. Explain to a statistical novice how the F test can be highly significant but with a very low R².
  3. The report also states that only two of the explanatory variables were found significant at the 0.05 level. Suppose the model with just an indicator of previous CSBFP participation and an indicator that the business is in transportation and warehousing explained 2.5% of the variation in the response variable. Test the hypothesis that the other 13 predictors do not help predict fairness when these two predictors are already in the model.

11.57

(a) The degrees of freedom are 15 and 1034. (b) The test is significant, meaning the model is good at predicting the response variable, but there is still a lot of variance that is unexplained (a lot of scatter around our current regression line) because R² is small. This small R² just means there are other potential predictors that may also help us, in addition to our current predictors, to account for this remaining scatter or variation in the response. (c) The degrees of freedom are 13 and 1034, . The added 13 variables do not contribute significantly in explaining the response when these 2 predictors are already in the model.
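
The test in part (c) compares the R² of the full model with that of the reduced model. A minimal Python sketch of this F statistic for a collection of coefficients, using the values stated in the exercise; the function name `partial_f` and its parameter names are hypothetical:

```python
# Exercise 11.57(c) sketch: F test for a subset of regression coefficients.
def partial_f(r2_full, r2_reduced, n, p, q):
    """F statistic for testing that q of the p coefficients in the
    full model are all zero, computed from the two R-squared values."""
    df_num = q              # number of coefficients being tested
    df_den = n - p - 1      # error degrees of freedom of the full model
    f = ((r2_full - r2_reduced) / df_num) / ((1 - r2_full) / df_den)
    return f, df_num, df_den

# Full model: 15 predictors, R^2 = 0.031; reduced model: 2 predictors, R^2 = 0.025.
f_stat, df_num, df_den = partial_f(0.031, 0.025, n=1050, p=15, q=13)

print(round(f_stat, 2), df_num, df_den)
```

Under these inputs the statistic is about 0.49, well below 1, which is in line with the answer's conclusion that the 13 additional predictors do not contribute significantly.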

Question 11.58

11.58 Compensation and human capital.

A study of bank branch manager compensation collected data on the salaries of 82 managers at branches of a large eastern U.S. bank.15 Multiple regression models were used to predict how much these branch managers were paid. The researchers examined two sets of explanatory variables. The first set included variables that measured characteristics of the branch and the position of the branch manager. These were number of branch employees, a variable constructed to represent how much competition the branch faced, market share, return on assets, an efficiency ranking, and the rank of the manager. A second set of variables was called human capital variables and measured characteristics of the manager. These were experience in industry, gender, years of schooling, and age. For the multiple regression using all the explanatory variables, the value of R² was 0.77. When the human capital variables were deleted, R² fell to 0.06. Test the null hypothesis that the coefficients for the human capital variables are all zero in the model that includes all the explanatory variables. Give the test statistic with its degrees of freedom and P-value, and give a short summary of your conclusion in nontechnical language.