SECTION 2.5 EXERCISES

For Exercises 2.92 and 2.93, see page 125.

Question 2.94

2.94 Bone strength. Exercise 2.24 (page 97) gives the bone strengths of the dominant and the nondominant arms for 15 men who were controls in a study. The least-squares regression line for these data is

dominant = 2.74 + (0.936 × nondominant)

Here are the data for four cases:

ID   Nondominant   Dominant
5    12.0          14.8
6    20.0          19.8
7    12.3          13.1
8    14.4          17.5

Calculate the residuals for these four cases.
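
A residual is the observed response minus the response predicted by the fitted line, here dominant − (2.74 + 0.936 × nondominant). The Python sketch below applies that arithmetic to the four listed cases. The function and variable names are illustrative only and are not part of the exercise.

```python
# Coefficients of the least-squares line quoted in the exercise.
INTERCEPT = 2.74
SLOPE = 0.936

def predicted_dominant(nondominant):
    """Dominant-arm bone strength predicted by the fitted line."""
    return INTERCEPT + SLOPE * nondominant

# (ID, nondominant, dominant) for the four listed cases.
cases = [(5, 12.0, 14.8), (6, 20.0, 19.8), (7, 12.3, 13.1), (8, 14.4, 17.5)]

# residual = observed - predicted
residuals = {}
for case_id, nondominant, dominant in cases:
    fitted = predicted_dominant(nondominant)
    residuals[case_id] = dominant - fitted
    print(f"ID {case_id}: predicted = {fitted:.4f}, residual = {residuals[case_id]:.4f}")
```

The same pattern works for any fitted line: change the two coefficients and the list of cases.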

Question 2.95

2.95 Bone strength for baseball players. Refer to the previous exercise. Similar data for baseball players are given in Exercise 2.25 (page 98). The equation of the least-squares line for the baseball players is

dominant = 0.886 + (1.373 × nondominant)

Here are the data for the first four cases:

ID   Nondominant   Dominant
20   21.0          40.3
21   14.6          20.8
22   31.5          36.9
23   14.9          21.2

Calculate the residuals for these four cases.

Question 2.96

2.96 Least-squares regression for radioactive decay. Refer to Exercise 2.32 (page 99) for the data on radioactive decay of barium-137m. Here are the data:

Time     1    3    5    7
Count  578  317  203  118

  (a) Using the least-squares regression equation

    count = 602.8 − (74.7 × time)

    and the observed data, find the residuals for the counts.

  (b) Plot the residuals versus time.

  (c) Write a short paragraph assessing the fit of the least-squares regression line to these data based on your interpretation of the residual plot.
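
Parts (a) and (b) can be sketched in Python as follows. The code computes each residual as observed count minus predicted count, then prints a crude text stand-in for the residual plot so that the sign pattern over time is visible. The helper name and the one-mark-per-10-counts scale are assumptions made for this illustration.

```python
times = [1, 3, 5, 7]
counts = [578, 317, 203, 118]

def predicted_count(time):
    """Count predicted by the line quoted in part (a)."""
    return 602.8 - 74.7 * time

# residual = observed count - predicted count
residuals = [count - predicted_count(t) for t, count in zip(times, counts)]

# Text version of the residual plot: one row per time, one mark per
# 10 counts of residual, '+' for points above the line and '-' below it.
for t, r in zip(times, residuals):
    marks = ("+" if r >= 0 else "-") * round(abs(r) / 10)
    print(f"time {t}: residual {r:7.1f}  {marks}")
```

A run of residuals with the same sign in the middle of the time range, flanked by the opposite sign at both ends, is the kind of systematic pattern that part (c) asks you to interpret.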

Question 2.97

2.97 Least-squares regression for the log counts. Refer to Exercise 2.33 (page 99), where you analyzed the radioactive decay of barium-137m data using log counts. Here are the data:

Time             1        3        5        7
Log count  6.35957  5.75890  5.31321  4.77068

  (a) Using the least-squares regression equation

    log count = 6.593 − (0.2606 × time)

    and the observed data, find the residuals for the log counts.

  (b) Plot the residuals versus time.

  (c) Write a short paragraph assessing the fit of the least-squares regression line to these data based on your interpretation of the residual plot.
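
The sketch below first checks that the listed log counts agree with the natural logarithms of the counts from the previous exercise, and then computes the residuals on the log scale. Reading "log" as the natural logarithm is an inference from the numbers, not something stated in the exercise, and the helper name is illustrative.

```python
import math

times = [1, 3, 5, 7]
counts = [578, 317, 203, 118]                       # counts from Exercise 2.96
log_counts = [6.35957, 5.75890, 5.31321, 4.77068]   # values listed in this exercise

# Compare each listed value with the natural log of the corresponding count.
for count, listed in zip(counts, log_counts):
    print(f"ln({count}) = {math.log(count):.5f}, listed {listed:.5f}")

def predicted_log_count(time):
    """Log count predicted by the line quoted in part (a)."""
    return 6.593 - 0.2606 * time

# residual = observed log count - predicted log count
log_residuals = [lc - predicted_log_count(t) for t, lc in zip(times, log_counts)]
for t, r in zip(times, log_residuals):
    print(f"time {t}: residual {r:.5f}")
```

These residuals are on the log scale, so their sizes should be judged against the spread of the log counts rather than against the residuals from the untransformed fit.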

Question 2.98

2.98 College students by state. Refer to Exercise 2.75 (page 119), where you examined the relationship between the number of undergraduate college students and the populations for the 50 states.

  (a) Make a scatterplot of the data with the least-squares regression line.

  (b) Plot the residuals versus population.

  (c) Focus on California, the state with the largest population. Is this state an outlier when you consider only the distribution of population? Explain your answer and describe what graphical and numerical summaries you used as the basis for your conclusion.

  (d) Is California an outlier in the distribution of undergraduate college students? Explain your answer and describe what graphical and numerical summaries you used as the basis for your conclusion.

  (e) Is California an outlier when viewed in terms of the relationship between number of undergraduate college students and population? Explain your answer and describe what graphical and numerical summaries you used as the basis for your conclusion.

  (f) Is California influential in terms of the relationship between number of undergraduate college students and population? Explain your answer and describe what graphical and numerical summaries you used as the basis for your conclusion.

Question 2.99

2.99 College students by state using logs. Refer to the previous exercise. Answer parts (a) through (f) for that exercise using the logs of both variables. Write a short paragraph summarizing your findings and comparing them with those from the previous exercise.

Question 2.100

2.100 Make some scatterplots. For each of the following scenarios, make a scatterplot with 10 observations that show a moderate positive association, plus one that illustrates the unusual case. Explain each of your answers.

  (a) An outlier in x that is not influential for the regression.

  (b) An outlier in x that is influential for the regression.

  (c) An influential observation that is not an outlier in x.

  (d) An observation that is influential for the intercept but not for the slope.

Question 2.101

2.101 What’s wrong? Each of the following statements contains an error. Describe each error and explain why the statement is wrong.

  (a) An influential observation will always have a large residual.

  (b) High correlation is never present when there is causation.

  (c) If we have data at values of x equal to 1, 2, 3, 4, and 5, and we try to predict the value of y for x = 2.5 using a least-squares regression equation, we are doing an extrapolation.

Question 2.102

2.102 What’s wrong? Each of the following statements contains an error. Describe each error and explain why the statement is wrong.

  (a) If the residuals are all negative, this implies that there is a negative relationship between the response variable and the explanatory variable.

  (b) A strong negative relationship does not imply that there is an association between the explanatory variable and the response variable.

  (c) A lurking variable is always something that can be measured.

Question 2.103

2.103 Internet use and babies. Exercise 2.34 (page 99) explores the relationship between Internet use and birthrate for 106 countries. Figure 2.13 (page 99) is a scatterplot of the data. It shows a negative association between these two variables. Do you think that this plot indicates that Internet use causes people to have fewer babies? Give another possible explanation for why these two variables are negatively associated.

Question 2.104

2.104 A lurking variable. The effect of a lurking variable can be surprising when individuals are divided into groups. In recent years, the mean SAT score of all high school seniors has increased. But the mean SAT score has decreased for students at each level of high school grades (A, B, C, and so on). Explain how grade inflation in high school (the lurking variable) can account for this pattern. A relationship that holds for each group within a population need not hold for the population as a whole. In fact, the relationship can even change direction.

Question 2.105

2.105 How’s your self-esteem? People who do well tend to feel good about themselves. Perhaps helping people feel good about themselves will help them do better in their jobs and in life. For a time, raising self-esteem became a goal in many schools and companies. Can you think of explanations for the association between high self-esteem and good performance other than “Self-esteem causes better work”?

Question 2.106

2.106 Are big hospitals bad for you? A study shows that there is a positive correlation between the size of a hospital (measured by its number of beds x) and the median number of days y that patients remain in the hospital. Does this mean that you can shorten a hospital stay by choosing a small hospital? Why?

Question 2.107

2.107 Does herbal tea help nursing-home residents? A group of college students believes that herbal tea has remarkable powers. To test this belief, they make weekly visits to a local nursing home, where they visit with the residents and serve them herbal tea. The nursing-home staff reports that after several months many of the residents are healthier and more cheerful. We should commend the students for their good deeds but doubt that herbal tea helped the residents. Identify the explanatory and response variables in this informal study. Then explain what lurking variables account for the observed association.

Question 2.108

2.108 Price and ounces. In Example 2.2 (page 80) and Exercise 2.3 (page 82), we examined the relationship between the price and the size of a Mocha Frappuccino®. The 12-ounce Tall drink costs $3.95, the 16-ounce Grande is $4.45, and the 24-ounce Venti is $4.95.

  (a) Plot the data and describe the relationship. (Explain why you should plot size in ounces on the x axis.)

  (b) Find the least-squares regression line for predicting the price using size. Add the line to your plot.

  (c) Draw a vertical line from the least-squares line to each data point. This gives a graphical picture of the residuals.

  (d) Find the residuals and verify that they sum to zero.

  (e) Plot the residuals versus size. Interpret this plot.
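
Parts (b) and (d) can be checked with a short Python sketch. It uses the standard least-squares formulas, slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope × x̄, on the three size and price pairs given above. The variable names are illustrative.

```python
sizes = [12, 16, 24]          # ounces (explanatory variable x)
prices = [3.95, 4.45, 4.95]   # dollars (response variable y)

n = len(sizes)
x_bar = sum(sizes) / n
y_bar = sum(prices) / n

# Least-squares slope and intercept from the sums of cross-products and squares.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, prices))
s_xx = sum((x - x_bar) ** 2 for x in sizes)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
print(f"price = {intercept:.4f} + ({slope:.4f} x size)")

# residual = observed price - predicted price
residuals = [y - (intercept + slope * x) for x, y in zip(sizes, prices)]
for x, r in zip(sizes, residuals):
    print(f"size {x}: residual {r:.4f}")

# Up to floating-point rounding, least-squares residuals sum to zero.
print(f"|sum of residuals| < 1e-9: {abs(sum(residuals)) < 1e-9}")
```

The sum is compared with a small tolerance rather than with exactly zero because floating-point arithmetic introduces rounding error.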

Question 2.109

2.109 Use the applet. It isn’t easy to guess the position of the least-squares line by eye. Use the Correlation and Regression applet to compare a line you draw with the least-squares line. Click on the scatterplot to create a group of 15 points from lower left to upper right with a clear, positive straight-line pattern (correlation around 0.6). Click the “Draw line” button and use the mouse to draw a line through the middle of the cloud of points from lower left to upper right. Note the “thermometer” that appears above the plot. The red portion is the sum of the squared vertical distances from the points in the plot to the least-squares line. The green portion is the “extra” sum of squares for your line: it shows by how much your line misses the smallest possible sum of squares.

  (a) You drew a line by eye through the middle of the pattern. Yet the right-hand part of the bar is probably almost entirely green. What does that tell you?

  (b) Now click the “Show least-squares line” box. Is the slope of the least-squares line smaller (the new line is less steep) or larger (the new line is steeper) than that of your line? If you repeat this exercise several times, you will consistently get the same result. The least-squares line minimizes the sum of the squared vertical distances of the points from the line. It is not the line through the “middle” of the cloud of points. This is one reason it is hard to draw a good regression line by eye.

Question 2.110

2.110 Use the applet. Go to the Correlation and Regression applet. Click on the scatterplot to create a group of 12 points in the lower-right corner of the scatterplot with a strong straight-line pattern (correlation about −0.8). Now click the “Show least-squares line” box to display the regression line.

  (a) Add one point at the upper left that is far from the other 12 points but exactly on the regression line. Why does this outlier have no effect on the line even though it changes the correlation?

  (b) Now drag this last point down until it is opposite the group of 12 points. You see that one end of the least-squares line chases this single point, while the other end remains near the middle of the original group of 12. What makes the last point so influential?
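
If the applet is not available, the same experiment can be imitated numerically. The Python sketch below uses 12 made-up points (they are not applet output) and a hypothetical helper, fit_line. It adds a far-left point first exactly on the fitted line and then at the height of the cluster, refitting each time. A point with zero residual adds nothing to the sum of squared residuals, which is one way to see why the first refit reproduces the original line.

```python
def fit_line(xs, ys):
    """Return (intercept, slope, correlation) for the least-squares line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope, s_xy / (s_xx * s_yy) ** 0.5

# Made-up cluster of 12 points with a strong negative pattern.
xs = [6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11, 11.5]
ys = [4.0, 3.2, 3.9, 2.8, 3.1, 2.9, 2.0, 2.6, 1.8, 2.1, 1.2, 1.6]
a0, b0, r0 = fit_line(xs, ys)

# Part (a): a far-left point placed exactly on the fitted line.
x_new = 1.0
a1, b1, r1 = fit_line(xs + [x_new], ys + [a0 + b0 * x_new])

# Part (b): the same x, dragged down to the mean height of the cluster.
a2, b2, r2 = fit_line(xs + [x_new], ys + [sum(ys) / len(ys)])

for label, a, b, r in [("12 points", a0, b0, r0),
                       ("plus point on the line", a1, b1, r1),
                       ("plus point dragged down", a2, b2, r2)]:
    print(f"{label:24s} intercept {a:6.3f}  slope {b:6.3f}  r {r:6.3f}")
```

Comparing the three printed rows shows which quantities move when the extra point sits on the line and which move when it is dragged away from the line.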

Question 2.111

2.111 Education and income. There is a strong positive correlation between years of education and income for economists employed by business firms. (In particular, economists with doctorates earn more than economists with only a bachelor’s degree.) There is also a strong positive correlation between years of education and income for economists employed by colleges and universities. But when all economists are considered, there is a negative correlation between education and income. The explanation for this is that business pays high salaries and employs mostly economists with bachelor’s degrees, while colleges pay lower salaries and employ mostly economists with doctorates. Sketch a scatterplot with two groups of cases (business and academic) that illustrates how a strong positive correlation within each group and a negative overall correlation can occur together.
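
Before sketching, it can help to confirm numerically that such a pattern is possible. The Python sketch below uses entirely made-up (years of education, income in thousands of dollars) pairs, chosen only to mimic the description above, and a hypothetical helper, correlation. It computes the correlation within each group and for the two groups combined.

```python
def correlation(points):
    """Pearson correlation for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    s_yy = sum((y - y_bar) ** 2 for _, y in points)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    return s_xy / (s_xx * s_yy) ** 0.5

# Made-up data for illustration only: business economists have fewer years
# of education and higher incomes; academic economists have the reverse.
business = [(16, 90), (16, 94), (17, 99), (18, 106)]
academic = [(20, 60), (21, 66), (22, 70), (23, 77)]

r_business = correlation(business)
r_academic = correlation(academic)
r_all = correlation(business + academic)

print(f"business: r = {r_business:.3f}")
print(f"academic: r = {r_academic:.3f}")
print(f"combined: r = {r_all:.3f}")
```

On a scatterplot, these points form two upward-sloping clusters, with the business cluster sitting above and to the left of the academic cluster.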

Question 2.112

2.112 Dangers of not looking at a plot. Table 2.1 (page 122) presents four sets of data prepared by the statistician Frank Anscombe to illustrate the dangers of calculating without first plotting the data [22].

  (a) Use x to predict y for each of the four data sets. Find the predicted values and residuals for each of the four regression equations.

  (b) Plot the residuals versus x for each of the four data sets.

  (c) Write a summary of what the residuals tell you for each data set, and explain how the residuals help you to understand these data.