CLARIFYING THE CONCEPTS
1. What is the difference between the regression equation (calculated using the sample) and the population regression equation? (p. 718)
13.1.1
The regression equation is calculated from a sample and is valid only for values of $x$ in the range of the sample data. The population regression equation may be used to approximate the relationship between the predictor variable $x$ and the response variable $y$ for the entire population of $(x, y)$ pairs.
2. What are the four regression model assumptions? (p. 718)
3. How do we go about verifying the regression model assumptions? (p. 719)
13.1.3
We construct a scatterplot of the residuals against the fitted values and a normal probability plot of the residuals. We must make sure that the scatterplot contains no strong evidence of any unhealthy patterns and that the normal probability plot indicates no evidence of departures from normality in residuals.
4. What is the difference between $b_0$ and $b_1$ on the one hand and $\beta_0$ and $\beta_1$ on the other hand? (p. 718)
5. What does it mean for the relationship between $x$ and $y$ when $\beta_1$ equals 0? (p. 721)
13.1.5
There is no linear relationship between $x$ and $y$.
6. What is the difference between and ? (p. 722)
PRACTICING THE TECHNIQUES
CHECK IT OUT!
To do | Check out | Topic |
---|---|---|
Exercises 7–14 | Example 2 | Calculating the residuals and verifying the regression assumptions |
Exercises 15–18 | Example 4 | Test for the slope $\beta_1$: critical-value method |
Exercises 19–22 | Example 5 | Test for the slope $\beta_1$: $p$-value method |
Exercises 23–30 | Example 6 | Confidence interval for the slope |
Exercises 31–38 | Example 7 | Using confidence intervals to perform the test for the slope |
For Exercises 7–14, you are given the regression equation.
7.
$x$ | $y$ |
---|---|
1 | 15 |
2 | 20 |
3 | 20 |
4 | 25 |
5 | 25 |
13.1.7
(a) and (b)
$x$ | $y$ | Predicted value $\hat{y}$ | Residual $(y - \hat{y})$ |
---|---|---|---|
1 | 15 | 16 | –1 |
2 | 20 | 18.5 | 1.5 |
3 | 20 | 21 | –1 |
4 | 25 | 23.5 | 1.5 |
5 | 25 | 26 | –1 |
(c) and (d) See Solutions Manual. (e) The scatterplot of the residuals contains an unhealthy pattern, so the regression assumptions are not verified.
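The predicted values and residuals in the table for 13.1.7 can be reproduced with a short sketch. It assumes the regression equation given in Exercise 7 is the ordinary least-squares line, and recomputes that line from the data; the variable names are illustrative only.

```python
# Data from Exercise 7
x = [1, 2, 3, 4, 5]
y = [15, 20, 20, 25, 25]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope b1 and intercept b0 (assumed to match the given equation)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Predicted value y-hat and residual (y - y-hat) for each observation
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]

for xi, yi, yhat, e in zip(x, y, predicted, residuals):
    print(xi, yi, round(yhat, 2), round(e, 2))
```

The alternating signs of the residuals (–1, 1.5, –1, 1.5, –1) are what the residual plot in part (c) would display.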
8.
$x$ | $y$ |
---|---|
0 | 10 |
5 | 20 |
10 | 45 |
15 | 50 |
20 | 75 |
9.
$x$ | $y$ |
---|---|
−5 | 0 |
−4 | 8 |
−3 | 8 |
−2 | 16 |
−1 | 16 |
13.1.9
(a) and (b)
$x$ | $y$ | Predicted value $\hat{y}$ | Residual $(y - \hat{y})$ |
---|---|---|---|
–5 | 0 | 1.6 | –1.6 |
–4 | 8 | 5.6 | 2.4 |
–3 | 8 | 9.6 | –1.6 |
–2 | 16 | 13.6 | 2.4 |
–1 | 16 | 17.6 | –1.6 |
(c) and (d) See Solutions Manual. (e) The scatterplot of the residuals contains an unhealthy pattern, so the regression assumptions are not verified.
10.
$x$ | $y$ |
---|---|
−3 | −5 |
−1 | −15 |
1 | −20 |
3 | −25 |
5 | −30 |
11.
$x$ | $y$ |
---|---|
10 | 100 |
20 | 95 |
30 | 85 |
40 | 85 |
50 | 80 |
13.1.11
(a) and (b)
$x$ | $y$ | Predicted value $\hat{y}$ | Residual $(y - \hat{y})$ |
---|---|---|---|
10 | 100 | 99 | 1 |
20 | 95 | 94 | 1 |
30 | 85 | 89 | –4 |
40 | 85 | 84 | 1 |
50 | 80 | 79 | 1 |
(c) and (d) See Solutions Manual. (e) The scatterplot of the residuals contains an unhealthy pattern, so the regression assumptions are not verified.
12.
$x$ | $y$ |
---|---|
0 | 11 |
20 | 11 |
40 | 16 |
60 | 21 |
80 | 26 |
13.
$x$ | $y$ |
---|---|
1 | 1 |
2 | 1 |
3 | 2 |
4 | 3 |
5 | 3 |
13.1.13
(a) and (b)
$x$ | $y$ | Predicted value $\hat{y}$ | Residual $(y - \hat{y})$ |
---|---|---|---|
1 | 1 | 0.8 | 0.2 |
2 | 1 | 1.4 | –0.4 |
3 | 2 | 2 | 0 |
4 | 3 | 2.6 | 0.4 |
5 | 3 | 3.2 | –0.2 |
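(c) [Figure: scatterplot of the residuals versus fitted values]
(d) [Figure: normal probability plot of the residuals]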
(e) The scatterplot in (c) of the residuals versus fitted values shows no strong evidence of unhealthy patterns. Thus, the independence assumption, the constant variance assumption, and the zero-mean assumption are verified. Also, the normal probability plot of the residuals in (d) indicates no evidence of departure from normality of the residuals. Therefore we conclude that the regression assumptions are verified.
14.
$x$ | $y$ |
---|---|
1 | 6 |
2 | 5 |
2 | 4 |
2 | 3 |
3 | 2 |
For Exercises 15–18, follow these steps. Assume that the regression model assumptions are valid.
15. Data in Exercise 7, where
13.1.15
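(a) (b) (c) (d) (e) $H_0: \beta_1 = 0$: There is no linear relationship between $x$ and $y$. $H_a: \beta_1 \neq 0$: There is a linear relationship between $x$ and $y$. Reject $H_0$ if $t_{\text{data}} \le -t_{\text{crit}}$ or $t_{\text{data}} \ge t_{\text{crit}}$. Since $t_{\text{data}} \ge t_{\text{crit}}$, we reject $H_0$. There is evidence at level of significance $\alpha$ that $\beta_1 \neq 0$ and that there is a linear relationship between $x$ and $y$.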
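As a rough illustration of the critical-value method for the Exercise 7 data, the sketch below computes the standard error of the estimate, the standard error of the slope, and the test statistic. The level of significance 0.05 and the table value 3.182 (two-tailed, 3 degrees of freedom) are assumptions made for the example, not values taken from the exercise.

```python
import math

# Data from Exercise 7
x = [1, 2, 3, 4, 5]
y = [15, 20, 20, 25, 25]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate: s = sqrt(SSE / (n - 2))
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope and the test statistic
s_b1 = s / math.sqrt(s_xx)
t_data = b1 / s_b1

# Assumed: alpha = 0.05, so the t-table value for df = n - 2 = 3 is 3.182
T_CRIT = 3.182
reject_h0 = abs(t_data) >= T_CRIT
print(round(s, 4), round(s_b1, 4), round(t_data, 3), reject_h0)
```

Under these assumptions the statistic is about 5, which exceeds 3.182, in agreement with the decision to reject the null hypothesis.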
16. Data in Exercise 8, where
17. Data in Exercise 9, where
13.1.17
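(a) (b) (c) (d) (e) $H_0: \beta_1 = 0$: There is no linear relationship between $x$ and $y$. $H_a: \beta_1 \neq 0$: There is a linear relationship between $x$ and $y$. Reject $H_0$ if $t_{\text{data}} \le -t_{\text{crit}}$ or $t_{\text{data}} \ge t_{\text{crit}}$. Since $t_{\text{data}} \ge t_{\text{crit}}$, we reject $H_0$. There is evidence at level of significance $\alpha$ that $\beta_1 \neq 0$ and that there is a linear relationship between $x$ and $y$.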
18. Data in Exercise 10, where
For Exercises 19–22, follow these steps. Assume that the regression model assumptions are valid.
19. Data in Exercise 11, where
13.1.19
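(a) (b) (c) (d) (e) $H_0: \beta_1 = 0$: There is no linear relationship between $x$ and $y$. $H_a: \beta_1 \neq 0$: There is a linear relationship between $x$ and $y$. Reject $H_0$ if the $p$-value $\le \alpha$. Since the $p$-value $\le \alpha$, we reject $H_0$. There is evidence at level of significance $\alpha$ that $\beta_1 \neq 0$ and that there is a linear relationship between $x$ and $y$.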
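The $p$-value method can be sketched for the Exercise 11 data without a statistics library. The two-tailed $p$-value is obtained here by integrating the $t$ density numerically with Simpson's rule; the helper names `t_pdf` and `two_tailed_p` and the level of significance 0.05 are assumptions made for this illustration.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus twice the area under the density on [0, |t|]."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

# Data from Exercise 11
x = [10, 20, 30, 40, 50]
y = [100, 95, 85, 85, 80]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)

t_data = b1 / s_b1
p_value = two_tailed_p(t_data, n - 2)

ALPHA = 0.05  # assumed level of significance
print(round(t_data, 3), round(p_value, 4), p_value <= ALPHA)
```

With these data the $p$-value comes out below 0.01, so the decision to reject holds at the assumed level.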
20. Data in Exercise 12, where
21. Data in Exercise 13, where
13.1.21
(a) (b) (c)
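(d) (e) $H_0: \beta_1 = 0$: There is no linear relationship between $x$ and $y$. $H_a: \beta_1 \neq 0$: There is a linear relationship between $x$ and $y$. Reject $H_0$ if the $p$-value $\le \alpha$. Since the $p$-value $\le \alpha$, we reject $H_0$. There is evidence at level of significance $\alpha$ that $\beta_1 \neq 0$ and that there is a linear relationship between $x$ and $y$.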
22. Data in Exercise 14, where
For Exercises 23–30, follow these steps. Assume that the regression model assumptions are valid.
23. Data in Exercise 7
13.1.23
(a) (b) (c) (0.909, 4.091)
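The interval in 13.1.23 (c) can be checked with a short sketch of the form $b_1 \pm t_{\alpha/2} \cdot s_{b_1}$. The 95% confidence level and the table value 3.182 (3 degrees of freedom) are assumptions here; they are chosen because they reproduce the stated interval.

```python
import math

# Data from Exercise 7
x = [1, 2, 3, 4, 5]
y = [15, 20, 20, 25, 25]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Standard error of the slope
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)

# Assumed: 95% confidence, df = 3, t-table value 3.182
T_CRIT = 3.182
margin = T_CRIT * s_b1
lower, upper = b1 - margin, b1 + margin
print(round(lower, 3), round(upper, 3))
```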
24. Data in Exercise 8
25. Data in Exercise 9
13.1.25
(a) (b) (c) (1.4544, 6.5456)
26. Data in Exercise 10
27. Data in Exercise 11
13.1.27
(a) (b) (c) (–0.7598, −0.2402)
28. Data in Exercise 12
29. Data in Exercise 13
13.1.29
(a) (b) (c) (0.2326, 0.9674). TI-83/84: (0.2325, 0.9675)
30. Data in Exercise 14
For Exercises 31–38, using the confidence interval from the indicated exercise, perform the test for the slope $\beta_1$ at level of significance $\alpha = 0.05$.
31. Exercise 23
13.1.31
$H_0: \beta_1 = 0$: There is no linear relationship between $x$ and $y$. $H_a: \beta_1 \neq 0$: There is a linear relationship between $x$ and $y$. Since the confidence interval from Exercise 23 (c) does not contain zero, we may conclude that $\beta_1 \neq 0$ and that a linear relationship exists between $x$ and $y$, at level of significance $\alpha = 0.05$.
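The decision rule used in Exercises 31–38 reduces to checking whether zero lies inside the confidence interval for the slope. A minimal sketch, using the intervals reported in the answers to Exercises 23, 25, 27, and 29 (the function name `slope_is_significant` is hypothetical):

```python
def slope_is_significant(interval):
    """Return True when the confidence interval for the slope excludes zero."""
    lower, upper = interval
    return not (lower <= 0 <= upper)

# Intervals as reported in the answers to the odd-numbered exercises
intervals = {
    "Exercise 23": (0.909, 4.091),
    "Exercise 25": (1.4544, 6.5456),
    "Exercise 27": (-0.7598, -0.2402),
    "Exercise 29": (0.2326, 0.9674),
}

results = {name: slope_is_significant(ci) for name, ci in intervals.items()}
print(results)
```

An interval that straddles zero, such as (–0.1769, 3.0051) from 13.1.51, would return False, which corresponds to not rejecting the null hypothesis.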
32. Exercise 24
33. Exercise 25
13.1.33
$H_0: \beta_1 = 0$: There is no linear relationship between $x$ and $y$. $H_a: \beta_1 \neq 0$: There is a linear relationship between $x$ and $y$. Since the confidence interval from Exercise 25 (c) does not contain zero, we may conclude that $\beta_1 \neq 0$ and that a linear relationship exists between $x$ and $y$, at level of significance $\alpha = 0.05$.
34. Exercise 26
35. Exercise 27
13.1.35
$H_0: \beta_1 = 0$: There is no linear relationship between $x$ and $y$. $H_a: \beta_1 \neq 0$: There is a linear relationship between $x$ and $y$. Since the confidence interval from Exercise 27 (c) does not contain zero, we may conclude that $\beta_1 \neq 0$ and that a linear relationship exists between $x$ and $y$, at level of significance $\alpha = 0.05$.
36. Exercise 28
37. Exercise 29
13.1.37
$H_0: \beta_1 = 0$: There is no linear relationship between $x$ and $y$. $H_a: \beta_1 \neq 0$: There is a linear relationship between $x$ and $y$. Since the confidence interval from Exercise 29 (c) does not contain zero, we may conclude that $\beta_1 \neq 0$ and that a linear relationship exists between $x$ and $y$, at level of significance $\alpha = 0.05$.
38. Exercise 30
APPLYING THE CONCEPTS
For Exercises 39–46, assume the regression requirements are met. Test for the linear relationship between $x$ and $y$, using level of significance $\alpha = 0.05$.
volweight
39. Volume and Weight. The following table contains the volume ($x$, in cubic meters) and weight ($y$, in kilograms) of five randomly chosen packages shipped to a local college.
Volume | Weight |
---|---|
4 | 10 |
8 | 16 |
12 | 25 |
16 | 30 |
20 | 35 |
13.1.39
$H_0: \beta_1 = 0$: There is no linear relationship between volume ($x$) and weight ($y$). $H_a: \beta_1 \neq 0$: There is a linear relationship between volume ($x$) and weight ($y$). Reject $H_0$ if the $p$-value ≤ 0.05. Since the $p$-value is ≤ 0.05, we reject $H_0$. There is evidence for a linear relationship between volume ($x$) and weight ($y$).
familypet
40. Family Size and Pets. The number of family members in a random sample taken from a suburban neighborhood, along with the number of pets belonging to each family, are shown in the following table.
Family size | Pets |
---|---|
2 | 1 |
3 | 2 |
4 | 2 |
5 | 3 |
6 | 3 |
worldtemp
41. World Temperatures. Listed in the following table are the low and high temperatures for a particular day, measured in degrees Fahrenheit, for a random sample of cities worldwide.
City | Low | High |
---|---|---|
Kolkata | 57 | 77 |
London | 36 | 45 |
Montreal | 7 | 21 |
Rome | 39 | 55 |
San Juan | 70 | 83 |
Shanghai | 34 | 45 |
13.1.41
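$H_0: \beta_1 = 0$: There is no linear relationship between Low ($x$) and High ($y$).
$H_a: \beta_1 \neq 0$: There is a linear relationship between Low ($x$) and High ($y$).
Reject $H_0$ if the $p$-value ≤ 0.05. Since the $p$-value is ≤ 0.05, we reject $H_0$. There is evidence for a linear relationship between Low ($x$) and High ($y$).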
videogamereg
42. Video Game Sales. The Chapter 1 Case Study looked at video game sales for the top 30 video games. The following table contains the total sales ($y$, in game units) and weeks on the top 30 list ($x$) of five randomly chosen video games.
Video game | Weeks | Total sales |
---|---|---|
Super Mario Bros. U for WiiU | 78 | 1,690,689 |
NBA 2K14 for PS4 | 27 | 608,899 |
Battlefield 4 for PS3 | 29 | 911,687 |
Titanfall for Xbox One | 10 | 1,150,856 |
Yoshi's New Island for 3DS | 10 | 172,680 |
dartsdjia
43. Darts and the Dow Jones. The following table contains a random sample of eight days from the Chapter 3 Case Study data set, indicating the stock market gain or loss for the portfolio chosen by the random darts ($y$), as well as the Dow Jones Industrial Average gain or loss for that day ($x$).
Darts | DJIA |
---|---|
−27.4 | −12.8 |
18.7 | 9.3 |
42.2 | 8 |
−16.3 | −8.5 |
11.2 | 15.8 |
28.5 | 10.6 |
1.8 | 11.5 |
16.9 | −5.3 |
13.1.43
$H_0: \beta_1 = 0$: No linear relationship exists between DJIA and darts.
$H_a: \beta_1 \neq 0$: A linear relationship exists between DJIA and darts.
Reject $H_0$ if the $p$-value $\le \alpha = 0.05$.
The $p$-value is not $\le 0.05$, so we do not reject $H_0$. Insufficient evidence exists, at level of significance $\alpha = 0.05$, for a linear relationship between DJIA and darts.
ageheight
44. Age and Height. The following table provides a random sample from the Chapter 4 Case Study data set Body Females, showing the age and height of the eight women.
Age | Height |
---|---|
40 | 63.5 |
28 | 63 |
25 | 64.4 |
34 | 63 |
26 | 63.8 |
21 | 68 |
19 | 61.8 |
24 | 69 |
gardasilreg
45. Gardasil Shots and Age. The accompanying table shows a random sample of 10 patients from the Chapter 5 Case Study data set, Gardasil, including the age of the patient ($x$) and the number of shots taken by the patient ($y$).
Age | Shots |
---|---|
13 | 3 |
21 | 3 |
16 | 3 |
17 | 2 |
17 | 3 |
18 | 1 |
25 | 2 |
15 | 3 |
12 | 1 |
16 | 1 |
13.1.45
$H_0: \beta_1 = 0$: No linear relationship exists between age and shots.
$H_a: \beta_1 \neq 0$: A linear relationship exists between age and shots.
Reject $H_0$ if the $p$-value $\le \alpha = 0.05$.
The $p$-value is not $\le 0.05$, so we do not reject $H_0$. Insufficient evidence exists, at level of significance $\alpha = 0.05$, for a linear relationship between a patient's age and the number of shots taken by the patient.
ncaa2014
46. NCAA Power Ratings. The accompanying table shows the top 10 teams' winning percentage and power rating for the 2013-2014 NCAA basketball season, according to www.teamrankings.com.
Team | Winning proportion | Power rating |
---|---|---|
Florida | 0.923 | 121.2 |
Wichita State | 0.971 | 119.1 |
Arizona | 0.868 | 118.8 |
Louisville | 0.838 | 117.9 |
Connecticut | 0.800 | 117.2 |
Virginia | 0.811 | 116.8 |
Wisconsin | 0.789 | 116.6 |
Villanova | 0.853 | 116.4 |
Michigan State | 0.763 | 115.9 |
Michigan | 0.757 | 115.9 |
For Exercises 47–54, do the following for the indicated data:
47. Volume and Weight. Data from Exercise 39.
13.1.47
(a) (b) (1.2688, 1.9312) (c) We are 95% confident that the interval (1.2688, 1.9312) captures the population slope of the relationship between volume and weight.
48. Family Size and Pets. Data from Exercise 40
49. World Temperatures. Data from Exercise 41
13.1.49
(a) (b) (0.8064, 1.2868). TI-83/84: (0.8063, 1.2868) (c) We are 95% confident that the interval (0.8063, 1.2868) captures the population slope of the relationship between Low and High.
50. Video Game Sales. Data from Exercise 42
51. Darts and the Dow Jones. Data from Exercise 43
13.1.51
(a) 1.5910 (b) (–0.1769, 3.0051) (c) We are 95% confident that the interval (–0.1769, 3.0051) captures the slope of the population regression line. That is, we are 95% confident that for each additional $1 the stocks in the DJIA gain in one day, the daily change in the stocks in the portfolio predicted by the darts lies between –$0.1769 and $3.0051.
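The values in 13.1.51 can be checked against the Exercise 43 data with a short sketch. It assumes that the darts portfolio is the response and the DJIA is the predictor, that part (a) is the margin of error, and that the 95% table value for 6 degrees of freedom is 2.447.

```python
import math

# Data from Exercise 43: x = DJIA gain or loss, y = darts portfolio gain or loss
djia = [-12.8, 9.3, 8, -8.5, 15.8, 10.6, 11.5, -5.3]
darts = [-27.4, 18.7, 42.2, -16.3, 11.2, 28.5, 1.8, 16.9]

n = len(djia)
x_bar, y_bar = sum(djia) / n, sum(darts) / n
s_xx = sum((xi - x_bar) ** 2 for xi in djia)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(djia, darts)) / s_xx
b0 = y_bar - b1 * x_bar

# Standard error of the slope
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(djia, darts))
s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)

# Assumed: 95% confidence, df = 6, t-table value 2.447
T_CRIT = 2.447
margin = T_CRIT * s_b1
lower, upper = b1 - margin, b1 + margin
print(round(b1, 4), round(margin, 4), round(lower, 4), round(upper, 4))
```

Because the resulting interval contains zero, it is consistent with the decision in 13.1.43 not to reject the null hypothesis.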
52. Age and Height. Data from Exercise 44
53. Gardasil Shots and Age. Data from Exercise 45
13.1.53
(a) 0.19824 shot (b) (–0.1826, 0.21388) (c) We are 95% confident that the interval (–0.1826, 0.21388) captures the slope of the population regression line. That is, we are 95% confident that for each additional year in a patient's age, the change in the number of shots taken by the patient lies between −0.1826 shot and 0.21388 shot.
54. NCAA Power Ratings. Data from Exercise 46
satfatreg
55. Saturated Fat and Calories. The table contains the calories and saturated fat in a sample of 10 food items.
Food item | Calories | Grams of saturated fat |
---|---|---|
Chocolate bar (1.45 oz) | 215.66 | 6.9618 |
Meat & veggie pizza, big slice (1/8 lg pizza) | 363.81 | 5.6472 |
New England clam chowder (cup) | 148.80 | 1.8600 |
Baked chicken drumstick (no skin, medium size) | 75.24 | 0.6424 |
Curly fries, deep-fried (4 ounces) | 276.21 | 3.1752 |
Wheat bagel (large) | 374.66 | 0.2751 |
Chicken curry (cup) | 146.32 | 1.5930 |
Cake doughnut hole (one) | 58.94 | 0.5068 |
Rye bread (slice) | 67.34 | 0.1638 |
Raisin Bran cereal (cup) | 194.59 | 0.3355 |
13.1.55
(a) (–6.647, 49.817) (b) 0 lies in the interval, so we do not reject $H_0$.
displacement
56. Engine Displacement and Gas Mileage. The table provides the engine displacement (size, in liters) and the city MPG (miles per gallon) gas mileage of a random sample of 12 vehicles.
Vehicle | Engine displacement | City MPG |
---|---|---|
GMC Yukon Denali | 6.2 | 13 |
Ford E350 Wagon | 5.4 | 11 |
BMW 435i Coupe | 3.0 | 20 |
Land Rover Range Rover | 5.0 | 13 |
Infiniti Q50a | 3.7 | 19 |
Dodge Journey | 3.6 | 17 |
Jaguar XF | 5.0 | 15 |
Dodge Challenger | 6.4 | 14 |
Toyota Highlander Hybrid | 3.5 | 28 |
Mercedes-Benz S 550 | 4.7 | 17 |
Ford Fiesta | 1.6 | 29 |
Hyundai Elantra | 2.0 | 24 |
Batting Average and Runs Scored. The table shows the top 10 hitters in Major League Baseball for 2013. We are interested in estimating the number of runs scored ($y$) by using the player's batting average ($x$). Use this information for Exercises 57–60.
Batter | Team | Runs scored | Batting average |
---|---|---|---|
Miguel Cabrera | Tigers | 103 | 0.348 |
Mike Trout | Angels | 109 | 0.323 |
Matt Carpenter | Cardinals | 126 | 0.318 |
Andrew McCutchen | Pirates | 97 | 0.317 |
Paul Goldschmidt | Diamondbacks | 103 | 0.302 |
Josh Donaldson | Athletics | 89 | 0.301 |
Chris Davis | Orioles | 103 | 0.286 |
Carlos Gomez | Brewers | 80 | 0.284 |
Manny Machado | Orioles | 88 | 0.283 |
Evan Longoria | Rays | 91 | 0.269 |
batters2013
57. Assess the regression assumptions. Is it okay to proceed with the regression?
13.1.57
The scatterplot of the residuals versus the fitted values shows no evidence of the unhealthy patterns shown in Figure 4. Thus, the independence assumption, the constant variance assumption, and the zero-mean assumption are verified.
Also, the normal probability plot of the residuals indicates no evidence of departures from normality in the residuals. Therefore, we conclude that the regression assumptions are verified and it is okay to proceed with the regression.
batters2013
58. Regress runs scored on batting average. Test for the significance of the linear relationship, using level of significance .
batters2013
59. Find the residual for Matt Carpenter. What is unusual about Matt Carpenter in this regression?
13.1.59
22.51 runs; by far the highest number of runs scored but the third highest batting average.
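The residual quoted in 13.1.59 can be reproduced by regressing runs scored on batting average with the table above. This is only a sketch using ordinary least squares; the variable names are illustrative.

```python
# Top 10 hitters, 2013: (batter, runs scored, batting average)
batters = [
    ("Miguel Cabrera", 103, 0.348), ("Mike Trout", 109, 0.323),
    ("Matt Carpenter", 126, 0.318), ("Andrew McCutchen", 97, 0.317),
    ("Paul Goldschmidt", 103, 0.302), ("Josh Donaldson", 89, 0.301),
    ("Chris Davis", 103, 0.286), ("Carlos Gomez", 80, 0.284),
    ("Manny Machado", 88, 0.283), ("Evan Longoria", 91, 0.269),
]

x = [avg for _, _, avg in batters]    # predictor: batting average
y = [runs for _, runs, _ in batters]  # response: runs scored
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Residual = observed runs minus predicted runs for Matt Carpenter
idx = [name for name, _, _ in batters].index("Matt Carpenter")
residual = y[idx] - (b0 + b1 * x[idx])
print(round(residual, 2))
```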
batters2013
60. Test for the significance of the linear relationship, using level of significance . Compare your conclusion with the earlier regression, using level of significance . How do you suggest we resolve this dilemma?
BRINGING IT ALL TOGETHER
SAT Reading and Math Scores. Use this information for Exercises 61–65. The table shows the SAT scores for five states as reported by the College Board. We are interested in whether a linear relationship exists between the SAT Reading score ($x$) and the SAT Math score ($y$).
State | SAT Reading | SAT Math |
---|---|---|
New York | 497 | 510 |
Connecticut | 515 | 515 |
Massachusetts | 518 | 523 |
New Jersey | 501 | 514 |
New Hampshire | 522 | 521 |
statesat
61. What Result Might We Expect? Consider the accompanying scatterplot of Math score versus Reading score. Is there evidence for or against the null hypothesis that no linear relationship exists? Explain.
13.1.61
Against; the points appear to lie near a line with a positive slope.
statesat
62. Consider the following graphics. Is there strong evidence that the regression assumptions are violated?
statesat
63. Test, using level of significance , whether a linear relationship exists between the SAT reading score and the SAT Math score.
13.1.63
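$H_0: \beta_1 = 0$: There is no linear relationship between SAT Reading score ($x$) and SAT Math score ($y$). $H_a: \beta_1 \neq 0$: There is a linear relationship between SAT Reading score ($x$) and SAT Math score ($y$). Reject $H_0$ if $t_{\text{data}} \le -t_{\text{crit}}$ or $t_{\text{data}} \ge t_{\text{crit}}$. Since $t_{\text{data}} \ge t_{\text{crit}}$, we reject $H_0$. There is evidence at level of significance $\alpha$ that $\beta_1 \neq 0$ and that there is a linear relationship between SAT Reading score ($x$) and SAT Math score ($y$).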
statesat
64. Construct and interpret a 90% confidence interval for the slope $\beta_1$.
statesat
65. Do your inferences in Exercises 63 and 64 agree with each other? Explain.
13.1.65
Yes. Since the confidence interval from Exercise 64 does not contain zero, we may conclude that $\beta_1 \neq 0$ and that a linear relationship exists between SAT Reading score ($x$) and SAT Math score ($y$), at level of significance $\alpha = 0.10$.
66. Challenge Exercise. Suppose we have a regression equation whose slope was not significant (that is, the null hypothesis was not rejected). What if we add five new data values to the original data set, and all five data values are identical, $(\bar{x}, \bar{y})$? How and why will this affect the following statistics? Will the statistic increase, decrease, or remain unchanged, or is there insufficient information to determine? (Hint: The data point $(\bar{x}, \bar{y})$ always lies on the estimated regression line.)
67. Challenge Exercise. Refer to Exercise 66. How and why will the change affect the following items?
13.1.67
(a) $t_{\text{data}}$ increases if $b_1$ is positive and decreases if $b_1$ is negative. (b) remains the same. (c) decreases. (d) $p$-value decreases. (e) Since we don't know what the new $p$-value will be, we don't know if the $p$-value will decrease enough to change the conclusion from “Do not reject $H_0$” to “Reject $H_0$.”
68. Challenge Exercise. Suppose a regression analysis of on was found to be significant (that is, the null hypothesis was rejected). What if we get 10 new data values, all with different values of and all of which can be found on the estimated regression line of the original model? How and why will this change affect the following statistics? Will the statistic increase, decrease, or remain unchanged, or is there insufficient information to determine?
69. Refer to Exercise 68. How and why will this change affect the following measures?
13.1.69
(a) $t_{\text{data}}$ increases if $b_1$ is positive and decreases if $b_1$ is negative.
(b) increases. (c) decreases. (d) $p$-value decreases. (e) Unchanged.
70. Challenge Exercise. Suppose a regression analysis of on was found to be significant (that is, the null hypothesis was rejected) and the slope . Consider the observation , which represents the data value for the maximum value of in the data set. Suppose the residual for is negative. What if we increase max by an arbitrary amount so that the new data value is ? (All other data values in the data set are unchanged.) How will this increase affect the following measures? Will they increase, decrease, or remain unchanged, or is there insufficient information to determine the effect?
71. Challenge Exercise. Refer to Exercise 70. How and why will the change affect the following measures?
13.1.71
(a-b) Decrease (c-d) Increase (e) Depends on the new $p$-value.
WORKING WITH LARGE DATA SETS
For Exercises 72–74, use technology to solve the following problems:
darts
72. Open the Darts data set, which we used for the Chapter 3 Case Study. Use the Dow Jones Industrial Average ($x$) to estimate the pros' performance ($y$).
nutrition
73. Open the Nutrition data set. Estimate the number of calories per gram ($y$) using the amount of fat per gram ($x$).
13.1.73
(a) The scatterplot of the residuals contains evidence of an unhealthy pattern and the normal probability plot indicates evidence of departures from normality in the residuals. Therefore we conclude that the regression assumptions are not verified. (b) (7.821, 8.437). We are 95% confident that the interval (7.821, 8.437) captures the population slope of the relationship between fat per gram and calories per gram. (c) Yes (d) $H_0: \beta_1 = 0$: There is no linear relationship between fat per gram ($x$) and calories per gram ($y$). $H_a: \beta_1 \neq 0$: There is a linear relationship between fat per gram ($x$) and calories per gram ($y$). Reject $H_0$ if the $p$-value $\le \alpha$. $p$-value ≈ 0. Since the $p$-value $\le \alpha$, we reject $H_0$. There is evidence for a linear relationship between fat per gram ($x$) and calories per gram ($y$).
pulseandtemp
74. Open the Pulse and Temp data set. Estimate body temperature ($y$) using heart rate ($x$).
Use technology for Exercises 75–78. Open the Crash data set, which contains information about the severity of injuries sustained by crash dummies when the National Transportation Safety Board crashed automobiles into a wall at 35 miles per hour.
crash
75. The variable head_inj contains a measure of the severity of the head injury sustained by the dummies. The variable chest_in is a measure of the severity of the chest injury suffered by the crash dummies.
13.1.75
(a) No (b) Positive relationship (c) Unclear
crash
76. Perform a regression of the head injury severity ($y$) on the chest injury severity ($x$).
crash
77. The variable lleg_inj contains a measure of the severity of the injury sustained by the dummies' left legs. The variable weight contains the weight of the vehicles.
13.1.77
(a) No (b) No apparent relationship between the variables (c) The weight of vehicles is the predictor variable and the severity of the leg injuries should be the response variable.
crash
78. Perform a regression of the left leg injury severity ($y$) on the vehicle weight ($x$).