CHAPTER 2 Review Exercises

Question 2.119

2.119 Companies of the world with logs

In Exercises 2.10 (page 72), 2.27 (page 78), and 2.58 (pages 95–96), you examined the relationship between the numbers of companies that are incorporated and are listed on their country’s stock exchange at the end of the year using data collected by the World Bank.24 In this exercise, you will explore the relationship between the numbers for 2012 and 2002 using logs.

  1. Which variable do you choose to be the explanatory variable, and which do you choose to be the response variable? Explain your answer.
  2. Plot the data with the least-squares regression line. Summarize the major features of your plot.
  3. Give the equation of the least-squares regression line.
  4. Find the predicted value and the residual for Sweden.
  5. Find the correlation between the two variables.
  6. Compare the results found in this exercise with those you found in Exercises 2.10, 2.27, and 2.58. Do you prefer the analysis with the original data or the analysis using logs? Give reasons for your answer.

Question 2.120

2.120 Residuals for companies of the world with logs

Refer to the previous exercise.

  1. Use a histogram to examine the distribution of the residuals.
  2. Make a Normal quantile plot of the residuals.
  3. Summarize the distribution of the residuals using the graphical displays that you created in parts 1 and 2.
  4. Repeat parts 1, 2, and 3 for the original data, and compare these results with those you found for the log data. Which do you prefer? Give reasons for your answer.

Question 2.121

2.121 Dwelling permits and sales for 21 European countries

The Organization for Economic Cooperation and Development (OECD) collects data on Main Economic Indicators (MEIs) for many countries. Each variable is recorded as an index, with the year 2000 serving as a base year. This means that the variable for each year is reported as a ratio of the value for the year divided by the value for 2000. Use of indices in this way makes it easier to compare values for different countries.25

  1. Make a scatterplot with sales as the response variable and permits issued for new dwellings as the explanatory variable. Describe the relationship. Are there any outliers or influential observations?
  2. Find the least-squares regression line and add it to your plot.
  3. What is the predicted value of sales for a country that has an index of 160 for dwelling permits?
  4. The Netherlands has an index of 160 for dwelling permits. Find the residual for this country.
  5. What percent of the variation in sales is explained by dwelling permits?

Question 2.122

2.122 Dwelling permits and production

Refer to the previous exercise.

  1. Make a scatterplot with production as the response variable and permits issued for new dwellings as the explanatory variable. Describe the relationship. Are there any outliers or influential observations?
  2. Find the least-squares regression line and add it to your plot.
  3. What is the predicted value of production for a country that has an index of 160 for dwelling permits?
  4. The Netherlands has an index of 160 for dwelling permits. Find the residual for this country.
  5. What percent of the variation in production is explained by dwelling permits? How does this value compare with the value you found in the previous exercise for the percent of variation in sales that is explained by dwelling permits?

Question 2.123

2.123 Sales and production

Refer to the previous two exercises.

  1. Make a scatterplot with sales as the response variable and production as the explanatory variable. Describe the relationship. Are there any outliers or influential observations?
  2. Find the least-squares regression line and add it to your plot.
  3. What is the predicted value of sales for a country that has an index of 125 for production?
  4. Finland has an index of 125 for production. Find the residual for this country.
  5. What percent of the variation in sales is explained by production? How does this value compare with the percents of variation that you calculated in the two previous exercises?

Question 2.124

2.124 Salaries and raises

For this exercise, we consider a hypothetical employee who starts working in Year 1 at a salary of $50,000. Each year her salary increases by approximately 5%. By Year 20, she is earning $126,000. The following table gives her salary for each year (in thousands of dollars):

Year  Salary    Year  Salary    Year  Salary    Year  Salary
   1      50       6      63      11      81      16     104
   2      53       7      67      12      85      17     109
   3      56       8      70      13      90      18     114
   4      58       9      74      14      93      19     120
   5      61      10      78      15      99      20     126

  1. Figure 2.24 is a scatterplot of salary versus year with the least-squares regression line. Describe the relationship between salary and year for this person.
  2. The value of r² for these data is 0.9832. What percent of the variation in salary is explained by year? Would you say that this is an indication of a strong linear relationship? Explain your answer.

FIGURE 2.24 Plot of salary versus year, with the least-squares regression line, for an individual who receives approximately a 5% raise each year for 20 years, Exercise 2.124.

Question 2.125

2.125 Look at the residuals

Refer to the previous exercise. Figure 2.25 is a plot of the residuals versus year.

  1. Interpret the residual plot.
  2. Explain how this plot highlights the deviations from the least-squares regression line that you can see in Figure 2.24.

FIGURE 2.25 Plot of residuals versus year for an individual who receives approximately a 5% raise each year for 20 years, Exercise 2.125.
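
The residuals plotted in Figure 2.25 are simply the observed salaries minus the salaries predicted by the line, so they can be recomputed from the table in Exercise 2.124. The following is a minimal Python sketch under that assumption; the helper name `least_squares` is illustrative and does not come from the text, and the rounded table values are used as given.

```python
# Salary table from Exercise 2.124 (thousands of dollars), Years 1 to 20.
years = list(range(1, 21))
salary = [50, 53, 56, 58, 61, 63, 67, 70, 74, 78,
          81, 85, 90, 93, 99, 104, 109, 114, 120, 126]

def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = least_squares(years, salary)

# Residual = observed salary minus the salary predicted by the line.
residuals = [s - (intercept + slope * yr) for yr, s in zip(years, salary)]

for yr, res in zip(years, residuals):
    print(f"Year {yr:2d}: residual = {res:6.2f}")
```

Reading down the printed list and noting where the sign changes is a quick way to check the pattern you describe from the plot.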

Question 2.126

2.126 Try logs

Refer to the previous two exercises. Figure 2.26 is a scatterplot with the least-squares regression line for log salary versus year. For this model, r² = 0.9995.

  1. Compare this plot with Figure 2.24. Write a short summary of the similarities and the differences.
  2. Figure 2.27 is a plot of the residuals for the model using year to predict log salary. Compare this plot with Figure 2.25 and summarize your findings.

FIGURE 2.26 Plot of log salary versus year, with the least-squares regression line, for an individual who receives approximately a 5% raise each year for 20 years, Exercise 2.126.

FIGURE 2.27 Plot of residuals, based on log salary, versus year for an individual who receives approximately a 5% raise each year for 20 years, Exercise 2.126.

Question 2.127

2.127 Predict some salaries

The individual whose salary we have been studying in Exercises 2.124 through 2.126 wants to do some financial planning. Specifically, she would like to predict her salary five years into the future, that is, for Year 25. She is willing to assume that her employment situation will be stable for the next five years and that it will be similar to the last 20 years.

  1. Use the least-squares regression equation constructed to predict salary from year to predict her salary for Year 25.
  2. Use the least-squares regression equation constructed to predict log salary from year to predict her salary for Year 25. Note that you will need to convert the predicted log salary back to the predicted salary. Many calculators have a function that will perform this operation.
  3. Which prediction do you prefer? Explain your answer.
  4. Someone looking at the numerical summaries, and not the plots, for these analyses says that because both models have very high values of r², they should perform equally well in doing this prediction. Write a response to this comment.
  5. Write a short paragraph about the value of graphical summaries and the problems of extrapolation using what you have learned from studying these salary data.
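
The back-conversion mentioned in part 2 can be sketched in Python for readers who prefer software to a calculator. This is a sketch under stated assumptions: it uses the rounded table from Exercise 2.124, and it assumes "log" means the natural logarithm (the text does not say which base), so the inverse operation is `math.exp`. With base-10 logs you would use `10 ** value` instead, and the predicted salary would be the same. The helper name `least_squares` is illustrative.

```python
import math

# Salary table from Exercise 2.124 (thousands of dollars), Years 1 to 20.
years = list(range(1, 21))
salary = [50, 53, 56, 58, 61, 63, 67, 70, 74, 78,
          81, 85, 90, 93, 99, 104, 109, 114, 120, 126]

def least_squares(x, y):
    """Return (intercept, slope, r_squared) for the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy ** 2 / (sxx * syy)

# Model 1: salary predicted from year.
a1, b1, r2_salary = least_squares(years, salary)
pred_linear = a1 + b1 * 25

# Model 2: log salary predicted from year (natural log is an assumption).
log_salary = [math.log(s) for s in salary]
a2, b2, r2_log = least_squares(years, log_salary)
pred_on_log_scale = a2 + b2 * 25
pred_from_log = math.exp(pred_on_log_scale)  # convert back to thousands of dollars

print(f"salary model:     r^2 = {r2_salary:.4f}, Year 25 prediction = {pred_linear:.1f}")
print(f"log salary model: r^2 = {r2_log:.4f}, Year 25 prediction = {pred_from_log:.1f}")
```

Both predictions are extrapolations beyond Year 20, so neither number should be trusted without looking at the plots as well.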

Question 2.128

2.128 Faculty salaries

Data on the salaries of a sample of professors in a business department at a large university are given below. The salaries are for the academic years 2014–2015 and 2015–2016.

2014–2015     2015–2016     2014–2015     2015–2016
salary ($)    salary ($)    salary ($)    salary ($)
   145,700       147,700       136,650       138,650
   112,700       114,660       132,160       134,150
   109,200       111,400        74,290        76,590
    98,800       101,900        74,500        77,000
   112,000       113,000        83,000        85,400
   111,790       113,800       141,850       143,830
   103,500       105,700       122,500       124,510
   149,000       150,900       115,100       117,100

  1. Construct a scatterplot with the 2015–2016 salaries on the vertical axis and the 2014–2015 salaries on the horizontal axis.
  2. Comment on the form, direction, and strength of the relationship in your scatterplot.
  3. What proportion of the variation in 2015–2016 salaries is explained by 2014–2015 salaries?

Question 2.129

2.129 Find the line and examine the residuals

Refer to the previous exercise.

  1. Find the least-squares regression line for predicting 2015–2016 salaries from 2014–2015 salaries.
  2. Analyze the residuals, paying attention to any outliers or influential observations. Write a summary of your findings.

Question 2.130

2.130 Bigger raises for those earning less

Refer to the previous two exercises. The 2014–2015 salaries do an excellent job of predicting the 2015–2016 salaries. Is there anything more that we can learn from these data? In this department, there is a tradition of giving higher-than-average percent raises to those whose salaries are lower. Let’s see if we can find evidence to support this idea in the data.

  1. Compute the percent raise for each faculty member. Take the difference between the 2015–2016 salary and the 2014–2015 salary, divide by the 2014–2015 salary, and then multiply by 100. Make a scatterplot with the raise as the response variable and the 2014–2015 salary as the explanatory variable. Describe the relationship that you see in your plot.
  2. Find the least-squares regression line and add it to your plot.
  3. Analyze the residuals. Are there any outliers or influential cases? Make a graphical display and include it in a short summary of what you conclude.
  4. Is there evidence in the data to support the idea that greater percentage raises are given to those with lower salaries? Summarize your findings and include numerical and graphical summaries to support your conclusion.
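
The percent raise described in part 1 is (new salary − old salary) / old salary × 100. A minimal Python sketch of that calculation follows, using the salary pairs from Exercise 2.128; the function name `percent_raise` and the variable names are illustrative choices, not from the text.

```python
# (2014-2015 salary, 2015-2016 salary) pairs from Exercise 2.128, in dollars.
salaries = [
    (145700, 147700), (112700, 114660), (109200, 111400), (98800, 101900),
    (112000, 113000), (111790, 113800), (103500, 105700), (149000, 150900),
    (136650, 138650), (132160, 134150), (74290, 76590), (74500, 77000),
    (83000, 85400), (141850, 143830), (122500, 124510), (115100, 117100),
]

def percent_raise(old, new):
    """Difference divided by the earlier salary, expressed as a percent."""
    return (new - old) / old * 100

raises = [percent_raise(old, new) for old, new in salaries]

# List the faculty from lowest to highest 2014-2015 salary with their raises.
for (old, _), pct in sorted(zip(salaries, raises)):
    print(f"{old:>8,d}  {pct:5.2f}%")
```

The resulting `raises` list is the response variable for the scatterplot and regression in parts 1 through 3.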

Question 2.131

2.131 Marketing your college

Colleges compete for students, and many students do careful research when choosing a college. One source of information is the rankings compiled by U.S. News & World Report. One of the factors used to evaluate undergraduate programs is the proportion of incoming students who graduate. This quantity, called the graduation rate, can be predicted by other variables such as the SAT or ACT scores and the high school records of the incoming students. One of the components in U.S. News & World Report rankings is the difference between the actual graduation rate and the rate predicted by a regression equation.26 In this chapter, we call this quantity the residual. Explain why the residual is a better measure to evaluate college graduation rates than the raw graduation rate.

Question 2.132

2.132 Planning for a new product

The editor of a statistics text would like to plan for the next edition. A key variable is the number of pages that will be in the final version. Text files are prepared by the authors using a word processor called LaTeX, and separate files contain figures and tables. For the previous edition of the text, the number of pages in the LaTeX files can easily be determined, as well as the number of pages in the final version of the text. Here are the data:

Chapter        1   2   3   4   5   6   7   8   9  10  11  12  13
LaTeX pages   77  73  59  80  45  66  81  45  47  43  31  46  26
Text pages    99  89  61  82  47  68  87  45  53  50  36  52  19

  1. Plot the data and describe the overall pattern.
  2. Find the equation of the least-squares regression line, and add the line to your plot.
  3. Find the predicted number of pages for the next edition if the number of LaTeX pages for a chapter is 62.
  4. Write a short report for the editor explaining to her how you constructed the regression equation and how she could use it to estimate the number of pages in the next edition of the text.

Question 2.133

2.133 Points scored in women’s basketball games

Use the Internet to find the scores for the past season’s women’s basketball team at a college of your choice. Is there a relationship between the points scored by your chosen team and the points scored by their opponents? Summarize the data and write a report on your findings.

Question 2.134

2.134 Look at the data for men

Refer to the previous exercise. Analyze the data for the men’s team from the same college, and compare your results with those for the women.

Question 2.135

2.135 Circular saws

The following table gives the weight (in pounds) and amps for 19 circular saws. Saws with higher amp ratings tend to also be heavier than saws with lower amp ratings. We can quantify this fact using regression.

Weight  Amps    Weight  Amps    Weight  Amps
    11    15         9    10        11    13
    12    15        11    15        13    14
    11    15        12    15        10    12
    11    15        12    14        11    12
    12    15        10    10        11    12
    11    15        12    13        10    12
    13    15

  1. We will use amps as the explanatory variable and weight as the response variable. Give a reason for this choice.
  2. Make a scatterplot of the data. What do you notice about the weight and amp values?
  3. Report the equation of the least-squares regression line along with the value of r².
  4. Interpret the value of the estimated slope.
  5. How much of an increase in amps would you expect to correspond to a one-pound increase in the weight of a saw, on average, when comparing two saws?
  6. Create a residual plot for the model in part 3. Does the plot indicate curvature in the data?

Question 2.136

2.136 Circular saws

The table in the previous exercise gives the weight (in pounds) and amps for 19 circular saws. The data contain only five different amp ratings among the 19 saws.

  1. Calculate the correlation between the weights and the amps of the 19 saws.
  2. Calculate the average weight of the saws for each of the five amp ratings.
  3. Calculate the correlation between the average weights and the amps. Is the correlation between average weights and amps greater than, less than, or equal to the correlation between individual weights and amps?
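
One way to organize the three calculations is to group the saws by amp rating before averaging. The Python sketch below assumes the 19 (weight, amps) pairs as read from the table in Exercise 2.135; the function name `correlation` and the variable names are illustrative, not from the text.

```python
from collections import defaultdict
from math import sqrt

# (weight in pounds, amps) for the 19 saws, read across the table in Exercise 2.135.
saws = [
    (11, 15), (9, 10), (11, 13),
    (12, 15), (11, 15), (13, 14),
    (11, 15), (12, 15), (10, 12),
    (11, 15), (12, 14), (11, 12),
    (12, 15), (10, 10), (11, 12),
    (11, 15), (12, 13), (10, 12),
    (13, 15),
]

def correlation(x, y):
    """Correlation r between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Part 1: correlation using all 19 individual saws.
r_individual = correlation([a for _, a in saws], [w for w, _ in saws])

# Part 2: average weight at each distinct amp rating.
weights_by_amp = defaultdict(list)
for weight, amps in saws:
    weights_by_amp[amps].append(weight)
amp_levels = sorted(weights_by_amp)
avg_weights = [sum(weights_by_amp[a]) / len(weights_by_amp[a]) for a in amp_levels]

# Part 3: correlation using the averages, one point per amp rating.
r_averages = correlation(amp_levels, avg_weights)

print(f"individual saws: r = {r_individual:.3f}")
print(f"group averages:  r = {r_averages:.3f}")
```

Note that the averages are unweighted here: each amp rating contributes one point regardless of how many saws share that rating.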

Question 2.137

2.137 What correlation does and doesn’t say

Construct a set of data with two variables that have different means and correlation equal to one. Use your example to illustrate what correlation does and doesn’t say.

Question 2.138

2.138 Simpson’s paradox and regression

Simpson’s paradox occurs when a relationship between variables within groups of observations reverses when all of the data are combined. The phenomenon is usually discussed in terms of categorical variables, but it also occurs in other settings. Here is an example:

   y     x   Group       y     x   Group
10.1     1       1    18.3     6       2
 8.9     2       1    17.1     7       2
 8.0     3       1    16.2     8       2
 6.9     4       1    15.1     9       2
 6.1     5       1    14.3    10       2

  1. Make a scatterplot of the data for Group 1. Find the least-squares regression line and add it to your plot. Describe the relationship between y and x for Group 1.
  2. Do the same for Group 2.
  3. Make a scatterplot using all 10 observations. Find the least-squares line and add it to your plot.
  4. Make a plot with all of the data using different symbols for the two groups. Include the three regression lines on the plot. Write a paragraph about Simpson’s paradox for regression using this graphical display to illustrate your description.
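
The three least-squares fits can be compared numerically before plotting. The following Python sketch uses the ten observations from the table above; the helper `least_squares` and the dictionary `fits` are illustrative names, not from the text.

```python
# (x, y) pairs from the table in Exercise 2.138.
group1 = [(1, 10.1), (2, 8.9), (3, 8.0), (4, 6.9), (5, 6.1)]
group2 = [(6, 18.3), (7, 17.1), (8, 16.2), (9, 15.1), (10, 14.3)]

def least_squares(points):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

fits = {
    "Group 1": least_squares(group1),
    "Group 2": least_squares(group2),
    "Combined": least_squares(group1 + group2),
}

for label, (intercept, slope) in fits.items():
    print(f"{label:8s}: y = {intercept:.2f} + ({slope:.2f}) x")
```

Comparing the sign of the slope within each group to the sign of the slope for the combined data is the numerical counterpart of the plot requested in part 4.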

Question 2.139

2.139 Wood products

A wood product manufacturer is interested in replacing solid-wood building material by less-expensive products made from wood flakes.27 The company collected the following data to examine the relationship between the length (in inches) and the strength (in pounds per square inch) of beams made from wood flakes:

Length       5    6    7    8    9   10   11   12   13   14
Strength   446  371  334  296  249  254  244  246  239  234

  1. Make a scatterplot that shows how the length of a beam affects its strength.
  2. Describe the overall pattern of the plot. Are there any outliers?
  3. Fit a least-squares line to the entire set of data. Graph the line on your scatterplot. Does a straight line adequately describe these data?
  4. The scatterplot suggests that the relation between length and strength can be described by two straight lines, one for lengths of 5 to 9 inches and another for lengths of 9 to 14 inches. Fit least-squares lines to these two subsets of the data, and draw the lines on your plot. Do they describe the data adequately? What question would you now ask the wood experts?

Question 2.140

2.140 Aspirin and heart attacks

Does taking aspirin regularly help prevent heart attacks? “Nearly five decades of research now link aspirin to the prevention of stroke and heart attacks.” So says the Bayer Aspirin website, bayeraspirin.com. The most important evidence for this claim comes from the Physicians’ Health Study. The subjects were 22,071 healthy male doctors at least 40 years old. Half the subjects, chosen at random, took aspirin every other day. The other half took a placebo, a dummy pill that looked and tasted like aspirin. Here are the results.28 (The row for “None of these” is left out of the two-way table.)

                       Aspirin    Placebo
                         group      group
Fatal heart attacks         10         26
Other heart attacks        129        213
Strokes                    119         98
Total                   11,037     11,034

What do the data show about the association between taking aspirin and heart attacks and stroke? Use percents to make your statements precise. Include a mosaic plot if you have access to the needed software. Do you think the study provides evidence that aspirin actually reduces heart attacks (cause and effect)?

Question 2.141

2.141 More smokers live at least 20 more years!

You can see the headlines “More smokers than nonsmokers live at least 20 more years after being contacted for study!” A medical study contacted randomly chosen people in a district in England. Here are data on the 1314 women contacted who were either current smokers or who had never smoked. The tables classify these women by their smoking status and age at the time of the survey and whether they were still alive 20 years later.29

          Age 18 to 44      Age 45 to 64      Age 65+
          Smoker    Not     Smoker    Not     Smoker    Not
Dead          19     13         78     52         42    165
Alive        269    327        167    147          7     28

  1. From these data, make a two-way table of smoking (yes or no) by dead or alive. What percent of the smokers stayed alive for 20 years? What percent of the nonsmokers survived? It seems surprising that a higher percent of smokers stayed alive.
  2. The age of the women at the time of the study is a lurking variable. Show that within each of the three age groups in the data, a higher percent of nonsmokers remained alive 20 years later. This is another example of Simpson’s paradox.
  3. The study authors give this explanation: “Few of the older women (over 65 at the original survey) were smokers, but many of them had died by the time of follow-up.” Compare the percent of smokers in the three age groups to verify the explanation.
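
The percents asked for in parts 1 and 2 come from the same counts, combined in two different ways. Here is a minimal Python sketch, assuming the counts as read from the table above; the dictionary layout and the function name `percent_alive` are illustrative choices, not from the text.

```python
# counts[age group][smoking status] = (dead, alive), from Exercise 2.141.
counts = {
    "18 to 44": {"smoker": (19, 269), "nonsmoker": (13, 327)},
    "45 to 64": {"smoker": (78, 167), "nonsmoker": (52, 147)},
    "65+":      {"smoker": (42, 7),   "nonsmoker": (165, 28)},
}

def percent_alive(dead, alive):
    """Percent of a group still alive after 20 years."""
    return 100 * alive / (dead + alive)

# Part 1: ignore age and pool the counts for each smoking status.
overall = {}
for status in ("smoker", "nonsmoker"):
    dead = sum(counts[age][status][0] for age in counts)
    alive = sum(counts[age][status][1] for age in counts)
    overall[status] = percent_alive(dead, alive)

# Part 2: compute the same percent separately within each age group.
within = {
    age: {status: percent_alive(*cells) for status, cells in by_status.items()}
    for age, by_status in counts.items()
}

print("overall:", {k: round(v, 1) for k, v in overall.items()})
for age, pcts in within.items():
    print(age, {k: round(v, 1) for k, v in pcts.items()})
```

Putting the pooled percents next to the within-group percents makes the reversal in the direction of the comparison easy to see.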

Question 2.142

2.142 Recycled product quality

Recycling is supposed to save resources. Some people think recycled products are lower in quality than other products, a fact that makes recycling less practical. People who actually use a recycled product may have different opinions from those who don’t use it. Here are data on attitudes toward coffee filters made of recycled paper among people who do and don’t buy these filters:30

             Think the quality of the recycled product is:
             Higher    The same    Lower
Buyers           20           7        9
Nonbuyers        29          25       43

  1. Find the marginal distribution of opinion about quality. Assuming that these people represent all users of coffee filters, what does this distribution tell us?
  2. How do the opinions of buyers and nonbuyers differ? Use conditional distributions as a basis for your answer. Include a mosaic plot if you have access to the needed software. Can you conclude that using recycled filters causes more favorable opinions? If so, giving away samples might increase sales.
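
A marginal distribution uses the column totals divided by the grand total, while a conditional distribution divides each row by its own row total. The Python sketch below illustrates both, assuming the counts as read from the table above; the variable names are illustrative, not from the text.

```python
opinions = ["Higher", "The same", "Lower"]

# Rows of the two-way table from Exercise 2.142, in the order of `opinions`.
table = {
    "Buyers": [20, 7, 9],
    "Nonbuyers": [29, 25, 43],
}

grand_total = sum(sum(row) for row in table.values())

# Marginal distribution of opinion: column total / grand total, as percents.
marginal = {
    opinion: 100 * sum(row[i] for row in table.values()) / grand_total
    for i, opinion in enumerate(opinions)
}

# Conditional distribution of opinion given buyer status: cell / row total.
conditional = {
    group: {opinion: 100 * row[i] / sum(row) for i, opinion in enumerate(opinions)}
    for group, row in table.items()
}

print("marginal:", {k: round(v, 1) for k, v in marginal.items()})
for group, dist in conditional.items():
    print(group, {k: round(v, 1) for k, v in dist.items()})
```

Each distribution sums to 100 percent, which is a useful check on the arithmetic.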