CLARIFYING THE CONCEPTS
1. Write the multiple regression equation for $k$ predictor variables. (p. 743)
13.3.1
2. Which is preferable, $R^2$ or $R^2_{adj}$, and why? (p. 745)
3. Which test do we perform if we want to determine whether our multiple regression is useful? (p. 746)
13.3.3
The $F$ test for the overall significance of the multiple regression
4. If we conclude from the $F$ test that our multiple regression is useful, is it still possible that one of the $\beta_i$'s equals zero? Explain. (p. 746)
5. Explain the difference between the $t$ test and the $F$ test we learned in this section. (p. 747)
13.3.5
The $F$ test is for the overall significance of the multiple regression, and the $t$ test is for testing whether a particular $x$-variable has a significant relationship with the response variable $y$.
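The contrast between the two tests can be sketched numerically. The snippet below is only an illustration: the function names and the input values are hypothetical, not taken from the exercises. It assumes a model with an intercept, so the overall statistic can be written as $F = \frac{R^2/k}{(1-R^2)/(n-k-1)}$, while each individual statistic is $t = b_i / s_{b_i}$.

```python
def overall_f_statistic(r2, n, k):
    """One F statistic for the whole model: (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    if not 0 <= r2 < 1:
        raise ValueError("r2 must satisfy 0 <= r2 < 1")
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def t_statistic(coefficient, standard_error):
    """One t statistic per predictor: the coefficient divided by its standard error."""
    return coefficient / standard_error


# Hypothetical inputs: R^2 = 0.5 with n = 13 observations and k = 2 predictors.
print(overall_f_statistic(0.5, n=13, k=2))
# Hypothetical coefficient 2.0 with standard error 0.5.
print(t_statistic(2.0, 0.5))
```

A model has a single $F$ statistic but one $t$ statistic for each $x$-variable, which is why the two tests answer different questions.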
6. How many $t$ tests may we perform for a multiple regression model? (p. 747)
7. How do we interpret the coefficient for a dummy variable? (Hint: Consider Figure 29.) (p. 750)
13.3.7
The coefficient of a dummy variable can be interpreted as the estimated increase in $y$ for those observations with the value of the dummy variable equal to 1, as compared to those with the value of the dummy variable equal to 0, when all of the other variables are held constant.
8. What are the four steps of the Strategy for Building a Multiple Regression Model? (p. 750)
PRACTICING THE TECHNIQUES
CHECK IT OUT!
To do | Check out | Topic |
---|---|---|
Exercises 9–16 | Example 11 | Multiple regression equation, coefficients, and predictions |
Exercises 17–20 | Example 12 | Calculating and interpreting the adjusted coefficient of determination |
Exercises 21–22 and 27–28 | Example 13 | $F$ test for the overall significance of the multiple regression |
Exercises 23 and 30 | Example 14 | Performing a set of $t$ tests for the significance of a set of individual variables |
Exercise 29 | Example 15 | Dummy variables in multiple regression |
Exercises 24–26 and 31–33 | Example 16 | Strategy for building a multiple regression model |
Use the following information for Exercises 9–12: A multiple regression model has been produced for a set of observations with multiple regression equation , with multiple coefficient of determination .
Assume the regression assumptions are met.
9. Interpret the value of the coefficient for $x_1$.
13.3.9
For each increase in one unit of the variable $x_1$, the estimated value of $y$ increases by 5 units when the value of $x_2$ is held constant.
10. Explain what the value of means.
11. Interpret the coefficients , and .
13.3.11
The estimated value of when and is . means that for each increase of one unit of the variable , the estimated value of increases by 5 units when the value of is held constant. means that for each increase of one unit of the variable , the estimated value of increases by 8 units when the value of is held constant.
12. Find point estimates of for the following values of and :
Use the following information for Exercises 13–16: A multiple regression model has been produced for a set of observations with multiple regression equation , with multiple coefficient of determination . Assume the regression assumptions are met.
13. Interpret the value of the coefficient for $x_1$.
13.3.13
For each increase in one unit of the variable $x_1$, the estimated value of $y$ decreases by 0.1 unit when the value of $x_2$ is held constant.
14. Explain what the value of means.
15. Interpret the coefficients , and .
13.3.15
The estimated value of when and is . means that for each increase of one unit of the variable , the estimated value of decreases by 0.1 unit when the value of is held constant. means that for each increase of one unit of the variable , the estimated value of increases by 0.9 unit when the value of is held constant.
16. Find point estimates of for the following values of and :
17. For the data in Exercises 9–12, how should the value of $R^2$ be interpreted?
13.3.17
50% of the variability in $y$ is accounted for by this multiple regression equation.
18. Calculate $R^2_{adj}$ for the data in Exercises 9–12.
19. For the data in Exercises 13–16, how should the value of $R^2$ be interpreted?
13.3.19
75% of the variability in $y$ is accounted for by this multiple regression equation.
20. Calculate $R^2_{adj}$ for the data in Exercises 13–16.
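For the adjusted coefficient of determination, a minimal sketch of the usual formula $R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-k-1}$ may help. The function name is hypothetical, and the sample size `n = 12` used in the calls is an assumption for illustration only, since the number of observations for these exercises is not shown in this text.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# n = 12 is an assumed sample size; k = 2 matches the two predictors x1 and x2.
print(round(adjusted_r2(0.50, n=12, k=2), 4))
print(round(adjusted_r2(0.75, n=12, k=2), 4))
```

Because the penalty factor $(n-1)/(n-k-1)$ is greater than 1 whenever $k \geq 1$, the adjusted value is never larger than $R^2$ itself.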
Use the following data set for Exercises 21–26.
$y$ | $x_1$ | $x_2$ | $x_3$ |
---|---|---|---|
0.6 | 1 | 10 | 1.3 |
4.0 | 2 | 10 | −3.2 |
3.2 | 3 | 8 | −1.0 |
9.0 | 4 | 8 | 0.9 |
1.8 | 5 | 6 | −2.5 |
8.4 | 6 | 6 | 0.9 |
9.8 | 7 | 4 | 1.0 |
10.4 | 8 | 4 | 2.0 |
8.8 | 9 | 2 | 0.2 |
14.7 | 10 | 2 | −2.2 |
21. Perform the multiple regression of $y$ on $x_1$, $x_2$, and $x_3$, and write the multiple regression equation.
13.3.21
22. Assume the regression assumptions are met. Perform the test for the significance of the overall regression, using level of significance . Do the following:
23. Perform the test for the significance of the individual predictor variables, using level of significance . Do the following:
13.3.23
(a) Test : There is no linear relationship between and . : There is a linear relationship between and . Reject if the -value . Test 2: : There is no linear relationship between and . : There is a linear relationship between and . Reject if the -value . Test 3: : There is no linear relationship between and . : There is a linear relationship between and . Reject if the -value . (b) Test , with . Test , with . Test , with . (c) Test 1: The , which is . Therefore we reject . There is evidence of a linear relationship between and . Test 2: The , which is . Therefore we reject . There is evidence of a linear relationship between and . Test 3: The , which is not . Therefore we do not reject . There is insufficient evidence of a linear relationship between and .
24. Identify any predictors that have corresponding $p$-values greater than the level of significance $\alpha$. Of these, discard the variable with the largest $p$-value. Then redo Exercise 23, omitting this predictor. Repeat if necessary.
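The elimination step described in this exercise can be sketched as a small helper. This is only an illustration: the function name, the default `alpha = 0.05`, and the $p$-values in the example calls are all assumptions, not results from the exercise data.

```python
def variable_to_drop(p_values, alpha=0.05):
    """Return the predictor with the largest p-value above alpha, or None if all are significant."""
    not_significant = {name: p for name, p in p_values.items() if p > alpha}
    if not not_significant:
        return None  # every predictor is significant, so the model is final
    return max(not_significant, key=not_significant.get)


# Hypothetical p-values from a set of t tests.
print(variable_to_drop({"x1": 0.001, "x2": 0.40, "x3": 0.12}))
# After refitting without the dropped predictor (again hypothetical values).
print(variable_to_drop({"x1": 0.001, "x3": 0.01}))
```

Only one predictor is removed per pass, because the remaining $p$-values change once the model is refitted.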
25. Verify the regression assumptions for your final model from Exercise 24.
13.3.25
The scatterplot above of the residuals versus fitted values shows no strong evidence of unhealthy patterns. Thus, the independence assumption, the constant variance assumption, and the zero-mean assumption are verified. Also, the normal probability plot of the residuals above indicates no evidence of departure from normality of the residuals. Therefore we conclude that the regression assumptions are verified.
26. Report and interpret your final model from Exercise 24, by doing the following:
Use the following data set for Exercises 27–33. Note that $x_3$ is a dummy variable.
$y$ | $x_1$ | $x_2$ | $x_3$ |
---|---|---|---|
−0.7 | 2 | 0.1 | 0 |
6.4 | 4 | −2.5 | 1 |
2.8 | 6 | 2.7 | 0 |
9.4 | 8 | 2.8 | 1 |
8.6 | 10 | −1.6 | 0 |
13.1 | 12 | 1.0 | 1 |
12.2 | 14 | −1.4 | 0 |
19.1 | 16 | −0.5 | 1 |
18.8 | 18 | 1.0 | 0 |
23.2 | 20 | −2.3 | 1 |
27. Perform the multiple regression of $y$ on $x_1$, $x_2$, and $x_3$, and write the multiple regression equation.
13.3.27
The regression equation is .
28. Assume the regression assumptions are met. Perform the test for the significance of the overall regression, using level of significance . Do the following:
29. Interpret the coefficient for the dummy variable.
13.3.29
For each increase in one unit of the dummy variable $x_3$ (that is, moving from $x_3 = 0$ to $x_3 = 1$), the estimated value of $y$ increases by 3.55 units when the values of $x_1$ and $x_2$ are held constant.
30. Perform the test for the significance of the individual predictor variables, using level of significance . Do the following:
31. Identify any predictors that have corresponding $p$-values greater than the level of significance $\alpha$. Of these, discard the variable with the largest $p$-value. Then redo Exercise 30, omitting this predictor. Repeat if necessary.
13.3.31
The -value for is the only -value greater than , so we eliminate from the multiple regression equation. The new regression equation is . (a) Test : There is no linear relationship between and . : There is a linear relationship between and . Reject if the -value . Test There is no linear relationship between and . : There is a linear relationship between and . Reject if the -value . (b) Test 1: , with ; Test 2: , with . (c) Test 1: The , which is . Therefore we reject . There is evidence of a linear relationship between and . Test 2: The , which is . Therefore we reject . There is evidence of a linear relationship between and . Since all of the variables are significant, we have our final multiple regression equation.
32. Verify the regression assumptions for your final model from Exercise 31.
33. Report and interpret your final model from Exercise 31, by doing the following:
13.3.33
(a) The final multiple regression equation is $\hat{y} = -3.12 + 1.15x_1 + 3.61x_3$. For $x_3 = 0$, the regression equation is $\hat{y} = -3.12 + 1.15x_1$. For $x_3 = 1$, the regression equation is $\hat{y} = 0.49 + 1.15x_1$. (b) For each increase in one unit of the variable $x_1$, the estimated value of $y$ increases by 1.15 units when $x_3$ is held constant. The estimated increase in $y$ for those observations with $x_3 = 1$, as compared to those with $x_3 = 0$, when $x_1$ is held constant, is 3.61. (c) Using the multiple regression equation in (a), the size of the typical prediction error will be about $s = 0.959129$. 98.4% of the variability in $y$ is accounted for by this multiple regression equation.
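The figures reported in this answer can be checked with a short least-squares fit. The sketch below uses only the Python standard library and solves the normal equations directly; it assumes the columns of the data set for Exercises 27–33 are $y$, $x_1$, $x_2$, $x_3$ in that order, and the helper names `solve` and `fit_ols` are hypothetical. It is meant as a check, not as a replacement for statistical software.

```python
# Data for Exercises 27-33; column order (y, x1, x2, x3) is an assumption.
y = [-0.7, 6.4, 2.8, 9.4, 8.6, 13.1, 12.2, 19.1, 18.8, 23.2]
x1 = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
x3 = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # dummy variable


def solve(a, b):
    """Solve the square system a * beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(columns, response):
    """Return least-squares coefficients [b0, b1, ...] from the normal equations X'X b = X'y."""
    rows = [[1.0] + [float(c[i]) for c in columns] for i in range(len(response))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, response)) for i in range(p)]
    return solve(xtx, xty)


b0, b1, b3 = fit_ols([x1, x3], y)  # final model: y on x1 and the dummy x3
n, k = len(y), 2
fitted = [b0 + b1 * a + b3 * d for a, d in zip(x1, x3)]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
y_bar = sum(y) / n
sst = sum((yi - y_bar) ** 2 for yi in y)
s = (sse / (n - k - 1)) ** 0.5                      # standard error of the estimate
r2_adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))  # adjusted R^2

print(f"y-hat = {b0:.2f} + {b1:.2f} x1 + {b3:.2f} x3")
print(f"s = {s:.6f}, adjusted R^2 = {r2_adj:.3f}")
```

Under these assumptions the fit should agree with the coefficients 1.15 and 3.61 and with the standard error quoted above, and the 98.4% figure corresponds to the adjusted coefficient of determination.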
APPLYING THE CONCEPTS
For Exercises 34–39, apply the Strategy for Building a Multiple Regression Model by performing the following steps, using level of significance :
bestdating
34. Best Places for Dating. Sperling's Best Places published the list of best places for dating in America for 2010. Table 6 shows the top 10 places, along with the overall dating score ($y$) and a set of predictor variables.
City | $y$ = Overall dating score | Percentage 18–24 years old | Percentage 18–24 years and single | Online dating score |
---|---|---|---|---|
Austin | 100.0 | 13.40% | 81.20% | 77.8 |
Colorado Springs | 88.7 | 10.50% | 74.20% | 88.9 |
San Diego | 84.0 | 11.30% | 79.40% | 77.4 |
Raleigh | 80.7 | 11.60% | 82.90% | 79.2 |
Seattle | 78.7 | 9.00% | 83.90% | 100.0 |
Charleston | 78.7 | 11.20% | 82.70% | 66.9 |
Norfolk | 77.0 | 11.20% | 75.60% | 82.9 |
Ann Arbor | 75.5 | 12.90% | 90.30% | 51.1 |
Springfield | 75.2 | 11.70% | 89.80% | 63.5 |
Honolulu | 75.2 | 10.10% | 82.30% | 50.2 |
bestbusiness
35. Ease of Doing Business. Doing Business (www.doingbusiness.org) publishes statistics on how easy or difficult different countries make it to do business. Table 7 shows the top 12 countries for ease of doing business, with .
Country | Easiness score | Starting a business | Employing workers | Paying taxes |
---|---|---|---|---|
Singapore | 100 | 10 | 1 | 5 |
New Zealand | 99 | 1 | 14 | 12 |
United States | 98 | 6 | 1 | 46 |
Hong Kong | 97 | 15 | 20 | 3 |
Denmark | 96 | 16 | 10 | 13 |
U.K. | 95 | 8 | 28 | 16 |
Ireland | 94 | 5 | 38 | 6 |
Canada | 93 | 2 | 18 | 28 |
Australia | 92 | 3 | 8 | 48 |
Norway | 91 | 33 | 99 | 18 |
Iceland | 90 | 17 | 62 | 32 |
Japan | 89 | 64 | 17 | 112 |
13.3.35
See Solutions Manual.
vaweather
36. Virginia Weather. Table 8 contains data on weather in a sample of cities in the state of Virginia. We are interested in predicting .
City | Heating degree-days | Avg. Jan. temp. | Avg. July temp. | Cooling degree-days |
---|---|---|---|---|
Alexandria | 4055 | 34.9 | 79.2 | 1531 |
Arlington | 4055 | 34.9 | 79.2 | 1531 |
Blacksburg | 5559 | 30.9 | 71.1 | 533 |
Charlottesville | 4103 | 35.5 | 76.9 | 1212 |
Chesapeake | 3368 | 40.1 | 79.1 | 1612 |
Danville | 3970 | 36.6 | 78.8 | 1418 |
Hampton | 3535 | 39.4 | 78.5 | 1432 |
Harrisonburg | 5333 | 30.5 | 73.5 | 758 |
Leesburg | 5031 | 31.5 | 75.2 | 911 |
Lynchburg | 4354 | 34.5 | 75.1 | 1075 |
Manassas | 4925 | 31.7 | 75.7 | 1075 |
Newport News | 3179 | 41.2 | 80.3 | 1682 |
Norfolk | 3368 | 40.1 | 79.1 | 1612 |
Petersburg | 3334 | 39.7 | 79.6 | 1619 |
Portsmouth | 3368 | 40.1 | 79.1 | 1612 |
Richmond | 3919 | 36.4 | 77.9 | 1435 |
Roanoke | 4284 | 35.8 | 76.2 | 1134 |
Suffolk | 3467 | 39.6 | 78.5 | 1427 |
Virginia Beach | 3336 | 40.7 | 78.8 | 1482 |
healthinsurance
37. Health Insurance Coverage. We are interested in estimating , using and . Use the data in Table 9, containing a random sample of U.S. states. All data are in thousands.
State | Persons covered | Adults not covered | Children not covered |
---|---|---|---|
Alabama | 3,843 | 689 | 82 |
Arizona | 4,958 | 1,311 | 283 |
Colorado | 3,977 | 826 | 176 |
Georgia | 7,688 | 1,659 | 314 |
Illinois | 10,867 | 1,776 | 302 |
Kentucky | 3,467 | 639 | 98 |
Maryland | 4,836 | 776 | 137 |
Massachusetts | 5,678 | 657 | 103 |
Michigan | 8,928 | 1,043 | 116 |
Minnesota | 4,675 | 475 | 104 |
Missouri | 5,028 | 772 | 127 |
New Jersey | 7,319 | 1,341 | 277 |
North Carolina | 7,266 | 1,585 | 307 |
Ohio | 10,181 | 1,138 | 157 |
Pennsylvania | 11,108 | 1,237 | 203 |
South Carolina | 3,553 | 672 | 112 |
Tennessee | 5,111 | 809 | 94 |
Virginia | 6,532 | 1,006 | 185 |
Washington | 5,572 | 746 | 105 |
Wisconsin | 4,995 | 481 | 63 |
13.3.37
See Solutions Manual.
accounting
38. Regression in Accounting. We are interested in estimating using , , and . Use the data in Table 10, containing a random sample of large technology companies in 2010. Total assets and total liabilities are in billions of dollars.
Company | Current ratio | Price–earnings ratio | Assets | Liabilities |
---|---|---|---|---|
Microsoft | 1.82 | 12.51 | 77.9 | 38.3 |
Intel | 2.79 | 18.44 | 53.1 | 11.4 |
Dell | 1.28 | 10.95 | 33.7 | 28.0 |
Apple | 1.88 | 24.57 | 53.9 | 26.0 |
10.62 | 18.87 | 40.5 | 4.5 |
systolic
39. Blood Pressure. Open the data set Systolic. We are interested in estimating , based on the other predictor variables.
13.3.39
See Solutions Manual.
40. Baseball. In Example 16, interpret the coefficients for Triples, Hits, Home Runs, RBIs, Walks, and Red Sox.
Your Best Model. Work with the Nutrition data sets for Exercises 41 and 42.
nutrition
41. Use technology to apply the Strategy for Building a Multiple Regression Model, using level of significance $\alpha$, for predicting the number of calories, with the following $x$-variables: protein, fat, saturated fat, cholesterol, carbohydrates, calcium, phosphorus, iron, potassium, sodium, thiamin, niacin, and ascorbic acid.
13.3.41
The standard error of the estimate for the final model is $s = 16.7233$. That is, using the multiple regression equation given above, the size of the typical prediction error will be about 16.7233 calories. The adjusted coefficient of determination is $R^2_{adj} = 0.9991$. In other words, 99.91% of the variation in calories is accounted for by this multiple regression equation.
nutrition
42. Write a summary to interpret each regression coefficient, and comment on which variables are the most important for predicting the number of calories.