For Exercises 2.38 and 2.39, see page 102.
2.40 Correlations and scatterplots. Explain why you should always look at a scatterplot when you want to use a correlation to describe the relationship between two quantitative variables.
2.41 Interpret some correlations. For each of the following correlations, describe the relationship between the two quantitative variables in terms of the direction and the strength of the linear relationship.
(a) r = 0.9.
(b) r = −0.9.
(c) r = −0.3.
(d) r = 0.0.
2.42 Blueberries and anthocyanins. In Exercise 2.18 (page 97), you examined the relationship between Antho4 and Antho3, two anthocyanins found in blueberries.
BERRIES
(a) Find the correlation between these two anthocyanins.
(b) Look at the scatterplot for these data that you made in part (a) of Exercise 2.18 (or make one if you did not do that exercise). Is the correlation a good numerical summary of the graphical display in the scatterplot? Explain your answer.
(c) Does the size of the correlation suggest that the amounts of these two anthocyanins are approximately equal in these blueberries? Explain why or why not.
2.43 Blueberries and anthocyanins with logs. In Exercise 2.19 (page 97), you examined the relationship between Antho4 and Antho3, two anthocyanins found in blueberries, using logs for both variables. Answer the questions in the previous exercise for the variables transformed in this way.
BERRIES
2.44 Fuel consumption. In Exercise 2.21 (page 97), you examined the relationship between CO2 emissions and highway fuel consumption for 527 vehicles that use regular fuel. Find the correlation between these two variables. Write a short paragraph describing the relationship using the scatterplot and the correlation.
CANFREG
2.45 Fuel consumption for different types of vehicles. In Exercise 2.23 (page 97), you examined the relationship between CO2 emissions and highway fuel consumption for 1067 vehicles that use four different types of fuel. Find the correlations between CO2 and highway fuel consumption for each of these four categories of vehicle. Summarize your results explaining similarities and differences in the relationships among the four types of fuel.
CANFUEL
2.46 Strong association but no correlation. Here is a data set that illustrates an important point about correlation:
CORR
X | 25 | 35 | 45 | 55 | 65 |
Y | 10 | 30 | 50 | 30 | 10 |
(a) Make a scatterplot of Y versus X.
(b) Describe the relationship between Y and X. Is it weak or strong? Is it linear?
(c) Find the correlation between Y and X.
(d) What important point about correlation does this exercise illustrate?
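To check part (c) with software, the following is a minimal Python sketch that computes r directly from its definition for the five points in the table above. The helper name `correlation` is our own illustrative choice and is not part of the text or its software.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r (illustrative helper, not from the text)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Data from the table in this exercise
x = [25, 35, 45, 55, 65]
y = [10, 30, 50, 30, 10]

r = correlation(x, y)
print(r)
```

Compare the printed value with the pattern you see in your scatterplot from part (a) before answering part (d).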
2.47 Bone strength. Exercise 2.24 (page 97) gives the bone strengths of the dominant and the nondominant arms for 15 men who were controls in a study.
ARMSTR
(a) Find the correlation between the bone strength of the dominant arm and the bone strength of the nondominant arm.
(b) Look at the scatterplot for these data that you made in part (a) of Exercise 2.24 (or make one if you did not do that exercise). Is the correlation a good numerical summary of the graphical display in the scatterplot? Explain your answer.
2.48 Bone strength for baseball players. Refer to the previous exercise. Similar data for baseball players are given in Exercise 2.25 (page 98). Answer parts (a) and (b) of the previous exercise for these data.
ARMSTR
2.49 Student ratings of teachers. A college newspaper interviews a psychologist about student ratings of the teaching of faculty members. The psychologist says, “The evidence indicates that the correlation between the research productivity and teaching rating of faculty members is close to zero.” The paper reports this as “Professor McDaniel said that good researchers tend to be poor teachers, and vice versa.” Explain why the paper’s report is wrong. Write a statement in plain language (don’t use the word “correlation”) to explain the psychologist’s meaning.
2.50 Decay of a radioactive element. Data for an experiment on the decay of barium-137m are given in Exercise 2.32 (page 99).
DECAY
(a) Find the correlation between the radioactive counts and the time after the start of the first counting period.
(b) Does the correlation give a good numerical summary of the relationship between these two variables? Explain your answer.
2.51 Decay in the log scale. Refer to the previous exercise and to Exercise 2.33 (page 99), where the counts were transformed by a log.
DECAY
(a) Find the correlation between the log counts and the time after the start of the first counting period.
(b) Does the correlation give a good numerical summary of the relationship between these two variables? Explain your answer.
(c) Compare your results for this exercise with those from the previous exercise.
2.52 Brand names and generic products.
(a) If a store always prices its generic “store brand” products at 80% of the brand name products’ prices, what would be the correlation between the prices of the brand name products and the store brand products? (Hint: Draw a scatterplot for several prices.)
(b) If the store always prices its generic products $2 less than the corresponding brand name products, then what would be the correlation between the prices of the brand name products and the store brand products?
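After drawing the scatterplots suggested in the hint, you can check your answers numerically. The sketch below assumes a few made-up brand name prices (they are hypothetical, not data from the text) and applies the two pricing rules from parts (a) and (b). The helper `correlation` is our own illustrative function.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r (illustrative helper, not from the text)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical brand name prices in dollars (assumed for illustration only)
brand = [2.50, 4.00, 6.00, 9.50]

# Part (a): store brand priced at 80% of the brand name price
store_pct = [0.8 * p for p in brand]
# Part (b): store brand priced $2 below the brand name price
store_minus2 = [p - 2 for p in brand]

r_pct = correlation(brand, store_pct)
r_minus2 = correlation(brand, store_minus2)
print(round(r_pct, 6), round(r_minus2, 6))
```

Try other sets of prices as well. The two printed values should not depend on which prices you choose, as long as the prices are not all the same.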
2.53 Alcohol and calories in beer. Figure 2.12 (page 98) gives a scatterplot of the calories versus percent alcohol for 159 brands of domestic beer.
BEERD
(a) Compute the correlation for these data.
(b) Does the correlation do a good job of describing the direction and strength of this relationship? Explain your answer.
2.54 Alcohol and calories in beer revisited. Refer to the previous exercise. The data that you used to compute the correlation include an outlier.
BEERD
(a) Remove the outlier and recompute the correlation.
(b) Write a short paragraph about the possible effects of outliers on a correlation using this example to illustrate your ideas.
2.55 Compare domestic with imported. In Exercise 2.31 (page 99), you compared domestic beers with imported beers with respect to the relationship between calories and percent alcohol. In that exercise, you used scatterplots to make the comparison. Compute the correlations for these two categories of beer and write a new summary of the comparison using correlations in addition to the scatterplots.
BEERD, BEERI
2.56 Use the applet. Go to the Correlation and Regression applet. Click on the scatterplot to create a group of 12 points in the lower-right corner of the scatterplot with a strong straight-line negative pattern (correlation about −0.9).
(a) Add one point at the upper left that is in line with the first 12. How does the correlation change?
(b) Drag this last point down until it is opposite the group of 12 points. How small can you make the correlation? Can you make the correlation positive? A single outlier can greatly strengthen or weaken a correlation. Always plot your data to check for outlying points.
2.57 Use the applet. You are going to use the Correlation and Regression applet to make different scatterplots with 12 points that have correlation close to 0.8. Many patterns can have the same correlation. Always plot your data before you trust a correlation.
(a) Stop after adding the first two points. What is the value of the correlation? Why does it have this value no matter where the two points are located?
(b) Make a lower-left to upper-right pattern of 12 points with correlation about r = 0.8. (You can drag points up or down to adjust r after you have 12 points.) Make a rough sketch of your scatterplot.
(c) Make another scatterplot, this time with 11 points in a vertical stack at the left of the plot. Add one point far to the right and move it until the correlation is close to 0.8. Make a rough sketch of your scatterplot.
(d) Make yet another scatterplot, this time with 12 points in a curved pattern that starts at the lower left, rises to the right, then falls again at the far right. Adjust the points up or down until you have a quite smooth curve with correlation close to 0.8. Make a rough sketch of this scatterplot also.
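If you do not have the applet at hand, part (a) can also be explored numerically. The sketch below computes r for two hypothetical pairs of points (the coordinates are our own assumptions, chosen only for illustration). The helper `correlation` is illustrative and requires that the two points differ in both x and y, since otherwise the formula divides by zero.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r (illustrative helper, not from the text).

    Undefined (division by zero) if all x values or all y values are equal.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Two points where y rises as x rises (hypothetical coordinates)
r_up = correlation([1, 3], [2, 7])
# Two points where y falls as x rises (hypothetical coordinates)
r_down = correlation([1, 3], [7, 2])
print(r_up, r_down)
```

Change the coordinates and rerun the code to see whether the size of the correlation ever changes, then explain the result in your answer to part (a).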
2.58 An interesting set of data. Make a scatterplot of the following data:
INTER
X | 1 | 2 | 3 | 4 | 10 | 10 |
Y | 1 | 3 | 3 | 5 | 1 | 10 |
Verify that the correlation is about 0.5. What feature of the data is responsible for reducing the correlation to this value despite a strong straight-line association between x and y in most of the observations?
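One way to carry out the verification is with a few lines of Python. This is a minimal sketch that computes r from its definition using the six points in the table above. The helper name `correlation` is our own and is not part of the text.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r (illustrative helper, not from the text)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Data from the table in this exercise
x = [1, 2, 3, 4, 10, 10]
y = [1, 3, 3, 5, 1, 10]

r = correlation(x, y)
print(round(r, 2))  # about 0.46, which rounds to roughly 0.5
```

To investigate the second question, you might recompute r after changing or dropping individual points and compare each result with your scatterplot.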
2.59 Internet use and babies. Figure 2.13 (page 99) is a scatterplot of the birth rate (births per 1000 people) versus Internet users per 100 people for 106 countries. In Exercise 2.34 (page 99), you described this relationship.
INBIRTH
(a) Make a plot of the data similar to Figure 2.13 and report the correlation.
(b) Is the correlation a good numerical summary for this relationship? Explain your answer.
2.60 What’s wrong? Each of the following statements contains a blunder. Explain in each case what is wrong.
(a) There is a high correlation between the age of American workers and their occupation.
(b) We found a high correlation (r = 1.19) between students’ ratings of faculty teaching and ratings made by other faculty members.
(c) The correlation between the sex of a group of students and the color of their cell phone was r = 0.23.