For Exercise 11.33, see page 550; for 11.34 and 11.35, see page 554; for 11.36 and 11.37, see page 555; for 11.38, see page 558; for 11.39, see page 559; and for 11.40 and 11.41, see pages 560–561.
11.42 Confidence interval for a regression coefficient.
In each of the following settings, give a 95% confidence interval for the coefficient of .
11.43 Significance test for a regression coefficient.
For each of the settings in the previous exercise, test the null hypothesis that the coefficient of is zero versus the two-sided alternative.
11.43
for all cases. (a) ; technically, this is significant. (b) ; this is significant. (c) ; this is not significant. (d) use ; this is significant.
11.44 What’s wrong?
In each of the following situations, explain what is wrong and why.
11.45 What’s wrong?
In each of the following situations, explain what is wrong and why.
11.45
(a) This is true for the squared multiple correlation. (b) The hypotheses should not contain a slope estimate; they should be stated in terms of a parameter. (c) A significant F test implies that at least one coefficient is statistically different from zero, not necessarily all.
11.46 Inference basics.
You run a multiple regression with 54 cases and three explanatory variables.
11.47 Inference basics.
You run a multiple regression with 22 cases and four explanatory variables. The ANOVA table includes the sums of squares and .
11.47
(a) Using degrees of freedom 4 and 17, the F test is not significant at the 5% level, and there is not enough evidence to say that at least one of the slopes is not zero. (b) $R^2 = 0.3981$. 39.81% of the variation in the response variable is explained by all the explanatory variables.
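The conclusion in part (a) can be checked from the $R^2$ reported in part (b). The following is a minimal Python sketch, assuming the standard identity $F = \dfrac{R^2/p}{(1-R^2)/(n-p-1)}$ for the overall ANOVA F test. The function name `f_from_r2` is illustrative, and the 5% critical value 2.96 is an assumption read from a standard F table for 4 and 17 degrees of freedom rather than computed.

```python
def f_from_r2(r2, n, p):
    """Return (F, df_model, df_error) for the overall F test.

    r2 is the squared multiple correlation, n the number of cases,
    and p the number of explanatory variables.
    """
    df_model = p
    df_error = n - p - 1
    f_stat = (r2 / df_model) / ((1.0 - r2) / df_error)
    return f_stat, df_model, df_error


# Values from Exercise 11.47: 22 cases, 4 explanatory variables, R^2 = 0.3981.
f_stat, df_model, df_error = f_from_r2(r2=0.3981, n=22, p=4)

# Assumption: 5% critical value for F(4, 17), taken from a printed F table.
F_CRIT_05 = 2.96
significant = f_stat > F_CRIT_05

print(f"F = {f_stat:.2f} on {df_model} and {df_error} df; significant: {significant}")
```

Because the computed F falls just below the tabled critical value, the sketch agrees with the stated answer that the regression is not significant at the 5% level.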
11.48 Discrimination at work?
A survey of 457 engineers in Canada was performed to identify the relationship of race, language proficiency, and location of training in finding work in the engineering field. In addition, each participant completed the Workplace Prejudice and Discrimination Inventory (WPDI), which is designed to measure perceptions of prejudice on the job, primarily due to race or ethnicity. The score of the WPDI ranged from 16 to 112, with higher scores indicating more perceived discrimination. The following table summarizes two multiple regression models used to predict an engineer’s WPDI score. The first explanatory variable indicates whether the engineer was foreign trained () or locally trained (). The next set of seven variables indicates race, and the last six are demographic variables.
Explanatory variables | Model 1 | | Model 2 | |
---|---|---|---|---|
Foreign trained | 0.55 | 0.21 | 0.58 | 0.22 |
Chinese | 0.06 | 0.24 | ||
South Asian | −0.06 | 0.19 | ||
Black | −0.03 | 0.52 | ||
Other Asian | −0.38 | 0.34 | ||
Latin American | 0.20 | 0.46 | ||
Arab | 0.56 | 0.44 | ||
Other (not white) | 0.05 | 0.38 | ||
Mechanical | −0.19 | 0.25 | −0.16 | 0.25 |
Other (not electrical) | −0.14 | 0.20 | −0.13 | 0.21 |
Masters/PhD | 0.32 | 0.18 | 0.37 | 0.18 |
30-39 years old | −0.03 | 0.22 | −0.06 | 0.22 |
40 or older | 0.32 | 0.25 | 0.25 | 0.26 |
Female | −0.02 | 0.19 | −0.05 | 0.19 |
0.10 | 0.11 |
11.49 Checking the model assumptions.
CASE 11.2 Statistical inference requires us to make some assumptions about our data. These should always be checked prior to drawing conclusions. For brevity, we did not discuss this assessment for the movie revenue data of Section 11.2, so let’s do it here.
movies
11.49
(a) The residuals are right-skewed and not Normally distributed. (b) There are two outliers in the residual plot for Opening—one with a very high residual, one with a very large Opening value. (c) The residual plot for Budget again shows the observation with a very high residual; otherwise, it looks fairly good (random). (d) The residual plot against the predicted values shows a megaphone effect suggesting non-constant variance. (e) The model assumptions are not reasonably satisfied; the residuals are right-skewed, and there are several outliers in the dataset that are potentially influencing the regression analysis.
11.50 Effect of a potential outlier.
CASE 11.2 Refer to the previous exercise.
movies
11.51 Game-day spending.
Game-day spending (ticket sales and food and beverage purchases) is critical for the sustainability of many professional sports teams. In the National Hockey League (NHL), nearly half the franchises generate more than two-thirds of their annual income from game-day spending. Understanding and possibly predicting this spending would allow teams to respond with appropriate marketing and pricing strategies. To investigate this possibility, a group of researchers looked at data from one NHL team over a three-season period ( home games).12 The following table summarizes the multiple regression used to predict ticket sales.
Explanatory variables | Coefficient | $t$ statistic |
---|---|---|
Constant | 12,493.47 | 12.13 |
Division | −788.74 | −2.01 |
Nonconference | −474.83 | −1.04 |
November | −1800.81 | −2.65 |
December | −559.24 | −0.82 |
January | −925.56 | −1.54 |
February | −35.59 | −0.05 |
March | −131.62 | −0.21 |
Weekend | 2992.75 | 8.48 |
Night | 1460.31 | 2.13 |
Promotion | 2162.45 | 5.65 |
Season 2 | −754.56 | −1.85 |
Season 3 | −779.81 | −1.84 |
11.51
(a) Using and (use 100), for significance we need . So Division, November, Weekend, Night, and Promotion are all significant in the presence of all the other explanatory variables. (b) . (c) 52%. (d) 15246.36. (e) Because we don’t expect the same setting for very many games, the mean response interval doesn’t make sense, so a prediction interval is more appropriate to represent this particular game and its specific settings.
11.52 Bank auto loans.
Banks charge different interest rates for different loans. A random sample of 2229 loans made by banks for the purchase of new automobiles was studied to identify variables that explain the interest rate charged. A multiple regression was run with interest rate as the response variable and 13 explanatory variables.13
11.53 Bank auto loans, continued.
Table 11.4 gives the coefficients for the fitted model and the individual $t$ statistic for each explanatory variable in the study described in the previous exercise. The $t$-values are given without the sign, assuming that all tests are two-sided.
Variable | Coefficient | $t$ statistic |
---|---|---|
Intercept | 15.47 | |
Loan size (in dollars) | −0.0015 | 10.30 |
Length of loan (in months) | −0.906 | 4.20 |
Percent down payment | −0.522 | 8.35 |
Cosigner (, ) | −0.009 | 3.02 |
Unsecured loan (, ) | 0.034 | 2.19 |
Total payments (borrower’s monthly installment debt) | 0.100 | 1.37 |
Total income (borrower’s total monthly income) | −0.170 | 2.37 |
Bad credit report (, ) | 0.012 | 1.99 |
Young borrower (, ) | 0.027 | 2.85 |
Male borrower (, ) | −0.001 | 0.89 |
Married (, ) | −0.023 | 1.91 |
Own home (, ) | −0.011 | 2.73 |
Years at current address | −0.124 | 4.21 |
11.53
(a) . (b) Loan size, Length of loan, Percent down, Cosigner, Unsecured loan, Total income, Bad credit report, Young borrower, Own home, and Years at current address are significant. A variable that is not significant is simply not useful once all the other variables are already in the model. (c) Having a larger loan size gives a smaller interest rate. Having a longer loan gives a smaller interest rate. Having a larger percent down payment gives a smaller interest rate. Having a cosigner gives a smaller interest rate. Having an unsecured loan gives a larger interest rate. Having larger total income gives a smaller interest rate. Having a bad credit report gives a larger interest rate. Being a young borrower gives a larger interest rate. Owning a home gives a smaller interest rate. More years at current address gives a smaller interest rate.
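The list in part (b) can be reproduced by comparing each unsigned statistic in Table 11.4 with a two-sided 5% critical value. The sketch below is one way to do this in Python. It assumes the error degrees of freedom, 2229 − 13 − 1 = 2215, are large enough that the standard Normal critical value (about 1.96) stands in for the exact $t$ critical value; the variable names are illustrative.

```python
from statistics import NormalDist

# (variable, unsigned t statistic) pairs copied from Table 11.4.
T_STATS = [
    ("Loan size", 10.30),
    ("Length of loan", 4.20),
    ("Percent down payment", 8.35),
    ("Cosigner", 3.02),
    ("Unsecured loan", 2.19),
    ("Total payments", 1.37),
    ("Total income", 2.37),
    ("Bad credit report", 1.99),
    ("Young borrower", 2.85),
    ("Male borrower", 0.89),
    ("Married", 1.91),
    ("Own home", 2.73),
    ("Years at current address", 4.21),
]

ALPHA = 0.05

# Assumption: with 2215 error degrees of freedom, the t distribution is
# close enough to the standard Normal to use its critical value.
critical = NormalDist().inv_cdf(1 - ALPHA / 2)

significant = [name for name, t in T_STATS if t > critical]
not_significant = [name for name, t in T_STATS if t <= critical]

print("Significant:", ", ".join(significant))
print("Not significant:", ", ".join(not_significant))
```

Each comparison is a test of one coefficient with all the other variables already in the model, so a variable that fails the screen here is not necessarily unrelated to the interest rate on its own.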
11.54 Auto dealer loans.
The previous two exercises describe auto loans made directly by a bank. The researchers also looked at 5664 loans made indirectly—that is, through an auto dealer. They again used multiple regression to predict the interest rate using the same set of 13 explanatory variables.
11.55 Auto dealer loans, continued.
Table 11.5 gives the estimated regression coefficient and individual $t$ statistic for each explanatory variable in the setting of the previous exercise. The $t$-values are given without the sign, assuming that all tests are two-sided.
11.55
(a) . A significant variable is one that is useful in predicting the response once all the other variables are already in the model. (b) Only Loan size, Length of loan, Percent down, and Unsecured loan are significant. (c) Having a larger loan size gives a smaller interest rate. Having a longer loan gives a smaller interest rate. Having a larger percent down payment gives a smaller interest rate. Having an unsecured loan gives a larger interest rate.
11.56 Direct versus indirect loans.
The previous four exercises describe a study of loans for buying new cars. The authors conclude that banks take higher risks with indirect loans because they do not take into account borrower characteristics when setting the loan rate. Explain how the results of the multiple regressions lead to this conclusion.
Variable | Coefficient | $t$ statistic |
---|---|---|
Intercept | 15.89 | |
Loan size (in dollars) | −0.0029 | 17.40 |
Length of loan (in months) | −1.098 | 5.63 |
Percent down payment | −0.308 | 4.92 |
Cosigner (, ) | −0.001 | 1.41 |
Unsecured loan (, ) | 0.028 | 2.83 |
Total payments (borrower’s monthly installment debt) | −0.513 | 1.37 |
Total income (borrower’s total monthly income) | 0.078 | 0.75 |
Bad credit report (, ) | 0.039 | 1.76 |
Young borrower (, ) | −0.036 | 1.33 |
Male borrower (, ) | −0.179 | 1.03 |
Married (, ) | −0.043 | 1.61 |
Own home (, ) | −0.047 | 1.59 |
Years at current address | −0.086 | 1.73 |
11.57 Canada’s Small Business Financing Program.
The Canada Small Business Financing Program (CSBFP) seeks to increase the availability of loans for establishing and improving small businesses. A survey was performed to better understand the experiences of small businesses when seeking loans and the extent to which they are aware of and satisfied with the CSBFP.14 A total of 1050 survey interviews were completed. To understand the drivers of perceived fairness of CSBFP terms and conditions, a multiple regression was undertaken. The response variable was the subject’s perceived fairness scored on a 5-point scale, where 1 means “very unfair” and 5 means “very fair.” The 15 explanatory variables included characteristics of the survey participant (gender, francophone, loan history, previous CSBFP borrower) and characteristics of his or her small business (type, location, size).
11.57
(a) The degrees of freedom are 15 and 1034. (b) The test is significant, meaning at least one explanatory variable helps predict the response, but much of the variation remains unexplained (there is a lot of scatter around the fitted regression) because $R^2$ is small. A small $R^2$ just means there may be other predictors that, in addition to the current ones, could help account for the remaining variation in the response. (c) The degrees of freedom are 13 and 1034. The added 13 variables do not contribute significantly to explaining the response when these 2 predictors are already in the model.
11.58 Compensation and human capital.
A study of bank branch manager compensation collected data on the salaries of 82 managers at branches of a large eastern U.S. bank.15 Multiple regression models were used to predict how much these branch managers were paid. The researchers examined two sets of explanatory variables. The first set included variables that measured characteristics of the branch and the position of the branch manager. These were number of branch employees, a variable constructed to represent how much competition the branch faced, market share, return on assets, an efficiency ranking, and the rank of the manager. A second set of variables was called human capital variables and measured characteristics of the manager. These were experience in industry, gender, years of schooling, and age. For the multiple regression using all the explanatory variables, the value of $R^2$ was 0.77. When the human capital variables were deleted, $R^2$ fell to 0.06. Test the null hypothesis that the coefficients for the human capital variables are all zero in the model that includes all the explanatory variables. Give the test statistic with its degrees of freedom and $P$-value, and give a short summary of your conclusion in nontechnical language.
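A test for a set of coefficients can be set up from the two values 0.77 and 0.06 alone. The sketch below is a hedged illustration, not a full solution. It assumes the usual F statistic for comparing nested models, $F = \dfrac{(R_1^2 - R_2^2)/q}{(1 - R_1^2)/(n - p - 1)}$, and it assumes each listed variable enters the model as a single explanatory variable, giving $p = 10$ in the full model and $q = 4$ human capital variables. The function name `partial_f` is illustrative, and the $P$-value would still need to be found from an F table or software.

```python
def partial_f(r2_full, r2_reduced, n, p_full, q):
    """F statistic for testing that q coefficients in the full model are all zero.

    r2_full and r2_reduced are the R^2 values with and without the q variables,
    n is the number of cases, and p_full is the number of explanatory
    variables in the full model.
    """
    df_num = q
    df_den = n - p_full - 1
    f_stat = ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)
    return f_stat, df_num, df_den


# Assumption: 6 branch/position variables + 4 human capital variables = 10,
# each counted as one explanatory variable.
f_stat, df_num, df_den = partial_f(
    r2_full=0.77, r2_reduced=0.06, n=82, p_full=10, q=4
)

print(f"F = {f_stat:.2f} on {df_num} and {df_den} degrees of freedom")
```

If a variable such as manager rank were coded with several indicator variables, `p_full` and the denominator degrees of freedom would change accordingly.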