SECTION 2.1 Exercises

For Exercises 2.1 and 2.2, see pages 64–65; for 2.3 and 2.4, see page 65; for 2.5 and 2.6, see pages 66–67; and for 2.7, see page 68.

Question 2.8

2.8 What's wrong?

Explain what is wrong with each of the following:

  1. If two variables are negatively associated, then low values of one variable are associated with low values of the other variable.
  2. A stemplot can be used to examine the relationship between two variables.
  3. In a scatterplot, we put the response variable on the x axis and the explanatory variable on the y axis.

Question 2.9

2.9 Make some sketches

For each of the following situations, make a scatterplot that illustrates the given relationship between two variables.

  1. No apparent relationship.
  2. A weak negative linear relationship.
  3. A strong positive relationship that is not linear.
  4. A more complicated relationship. Explain the relationship.

Question 2.10

2.10 Companies of the world

inccom

In Exercise 1.118 (page 61), you examined data collected by the World Bank on the numbers of companies that are incorporated and are listed in their country's stock exchange at the end of the year for 2012. In Exercise 1.119, you did the same for the year 2002.³ In this exercise, you will examine the relationship between the numbers for these two years.

  1. Which variable would you choose as the explanatory variable, and which would you choose as the response variable? Give reasons for your answers.
  2. Make a scatterplot of the data.
  3. Describe the form, the direction, and the strength of the relationship.
  4. Are there any outliers? If yes, identify them by name.

Question 2.11

2.11 Companies of the world

inccom

Refer to the previous exercise. Using the questions there as a guide, describe the relationship between the numbers for 2012 and 1992. Do you expect this relationship to be stronger or weaker than the one you described in the previous exercise? Give a reason for your answer.

2.11

We expect the relationship to be weaker because the time difference is larger. (a) The data for 1992 would be the explanatory variable; the data for 2012 would be the response. We would expect the 1992 data to explain, and possibly cause, changes in the 2012 data. (c) The form is roughly linear; the direction is positive; the strength is moderate. (d) The United States is the only outlier, with a much larger value for 1992 than most other countries.

Question 2.12

2.12 Brand-to-brand variation in a product

beer

Beer100.com advertises itself as “Your Place for All Things Beer.” One of their “things” is a list of 175 domestic beer brands with the percent alcohol, calories per 12 ounces, and carbohydrates (in grams).⁴ In Exercises 1.56 through 1.58 (page 36), you examined the distribution of alcohol content and the distribution of calories for these beers.

  1. Give a brief summary of what you learned about these variables in those exercises. (If you did not do them when you studied Chapter 1, do them now.)
  2. Make a scatterplot of calories versus percent alcohol.
  3. Describe the form, direction, and strength of the relationship.
  4. Are there any outliers? If yes, identify them by name.

Question 2.13

2.13 More beer

beer

Refer to the previous exercise. Repeat the exercise for the relationship between carbohydrates and percent alcohol. Be sure to include summaries of the distributions of the two variables you are studying.

2.13

(a) From Exercise 1.56, the distribution of percent alcohol is somewhat right-skewed. The distribution of carbohydrates is fairly symmetric. (c) The form is somewhat linear; the direction is positive; the strength is weak. (d) O'Doul's could be a potential outlier; it has a very small percent alcohol value. Sierra Nevada Bigfoot could also be a potential outlier; it has a very high amount of carbohydrates.

Question 2.14

2.14 Marketing in Canada

canadap

Many consumer items are marketed to particular age groups in a population. To plan such marketing strategies, it is helpful to know the demographic profile for different areas. Statistics Canada provides a great deal of demographic data organized in different ways.⁵

  1. Make a scatterplot of the percent of the population over 65 versus the percent of the population under 15.
  2. Describe the form, direction, and strength of the relationship.

Question 2.15

2.15 Compare the provinces with the territories

canadap

Refer to the previous exercise. The three Canadian territories are the Northwest Territories, Nunavut, and Yukon. All of the other entries in the data set are provinces.

  1. Generate a scatterplot of the Canadian demographic data similar to the one that you made in the previous exercise but with the points labeled “P” for provinces and “T” for territories (or labeled in some other way if that is easier to do with your software).
  2. Use your new scatterplot to write a new summary of the demographics for the 13 Canadian provinces and territories.

2.15

(b) The three territories have smaller percentages of the population over 65 than any of the provinces. Additionally, two of the three territories have larger percentages of the population under 15 than any of the provinces.
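One way to produce the “P” and “T” labels is to derive them from the region names before plotting. The short Python sketch below does only that step. The function name `plot_label` and the exact spelling of the region names are assumptions for illustration; the names in the canadap data file may be written differently and should be checked.

```python
# Territories named in Exercise 2.15. The spellings are assumed, not taken
# from the data file, so compare them with the names in your copy of the data.
TERRITORIES = {"Northwest Territories", "Nunavut", "Yukon"}


def plot_label(region):
    """Return "T" for a territory and "P" for a province."""
    name = region.strip()
    # Accept variants such as "Yukon Territory" as well as plain "Yukon".
    if name in TERRITORIES or name.startswith("Yukon"):
        return "T"
    return "P"


# A few region names used only to show the labels that result.
for region in ["Ontario", "Nunavut", "Yukon Territory", "Quebec"]:
    print(region, plot_label(region))
```

Most plotting software can then use this label as the plotting symbol or as a grouping variable for color.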

Question 2.16

2.16 Sales and time spent on web pages

You have collected data on 1000 customers who visited the web pages of your company last week. For each customer, you recorded the time spent on your pages and the total amount of their purchases during the visit. You want to explore the relationship between these two variables.

  1. What is the explanatory variable? What is the response variable? Explain your answers.
  2. Are these variables categorical or quantitative?
  3. Do you expect a positive or negative association between these variables? Why?
  4. How strong do you expect the relationship to be? Give reasons for your answer.

Question 2.17

2.17 A product for lab experiments

decay

Barium-137m is a radioactive form of the element barium that decays very rapidly. It is easy and safe to use for lab experiments in schools and colleges.⁶ In a typical experiment, the radioactivity of a sample of barium-137m is measured for one minute. It is then measured for three additional one-minute periods, separated by two minutes. So data are recorded at one, three, five, and seven minutes after the start of the first counting period. The measurement units are counts. Here are the data for one of these experiments:⁷

Time (minutes)    1    3    5    7
Count           578  317  203  118

  1. Make a scatterplot of the data. Give reasons for the choice of which variables to use on the x and y axes.
  2. Describe the overall pattern in the scatterplot.
  3. Describe the form, direction, and strength of the relationship.
  4. Identify any outliers.
  5. Is the relationship approximately linear? Explain your answer.

2.17

(b) As time increases, the count goes down. (c) The form is curved; the direction is negative; the strength is very strong. (d) The first data point at time 1 is somewhat of an outlier because it doesn't line up as well as the other times do. (e) A curve might fit the data better than a simple linear trend.
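A quick numerical check can support the judgment that the pattern is curved. If the relationship were linear, the change in count per minute would be about the same between each pair of neighboring points. The Python sketch below computes those changes for the four observations given in the exercise. The helper name `successive_slopes` is chosen here for illustration and is not part of the exercise.

```python
# Data from Exercise 2.17: minutes after the start of the first counting
# period, and the radioactivity counts recorded at those times.
times = [1, 3, 5, 7]
counts = [578, 317, 203, 118]


def successive_slopes(x, y):
    """Return the change in y per unit of x between neighboring points."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]


raw_slopes = successive_slopes(times, counts)
print(raw_slopes)  # [-130.5, -57.0, -42.5]
```

The count falls by about 130 per minute at first but by only about 43 per minute at the end, so the points bend rather than follow a straight line.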

Question 2.18

2.18 Use a log for the radioactive decay

decay

Refer to the previous exercise. Transform the counts using a log transformation. Then repeat parts (a) through (e) for the transformed data, and compare your results with those from the previous exercise.
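The transformation itself takes one line in most software. The Python sketch below applies it to the four counts from the previous exercise and then repeats the slope check between neighboring points. The natural log is an assumption here, since the exercise does not specify a base; a different base rescales the values but does not change the shape of the plot. The helper name `successive_slopes` is illustrative only.

```python
import math

# Data from Exercise 2.17.
times = [1, 3, 5, 7]
counts = [578, 317, 203, 118]

# Natural log of each count (assumed base; the exercise says only "a log").
log_counts = [math.log(c) for c in counts]


def successive_slopes(x, y):
    """Return the change in y per unit of x between neighboring points."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]


log_slopes = successive_slopes(times, log_counts)

for t, c, lc in zip(times, counts, log_counts):
    print(f"time {t}: count {c}, log count {lc:.3f}")
print([round(s, 3) for s in log_slopes])
```

The slopes on the log scale are roughly -0.30, -0.22, and -0.27 per minute. They still vary, but far less in relative terms than the slopes of the raw counts, which is what a more nearly linear pattern looks like.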

Question 2.19

2.19 Time to start a business

tts

Case 1.2 (page 23) uses the World Bank data on the time required to start a business in different countries. For Example 1.21 and several other examples that follow, we used data for a subset of the countries for 2013. Data are also available for times to start in 2008. Let's look at the data for all 189 countries to examine the relationship between the times to start in 2013 and the times to start in 2008.

  1. Why should you use the time for 2008 as the explanatory variable and the time for 2013 as the response variable?
  2. Make a scatterplot of the two variables.
  3. How many points are in your plot? Explain why there are not 189 points.
  4. Describe the form, direction, and strength of the relationship.
  5. Identify any outliers.
  6. Is the relationship approximately linear? Explain your answer.

2.19

(a) The 2008 data come first in time, so they should help explain the 2013 data. (c) There are 182 points; some of the data for 2008 are missing. (d) The form is somewhat linear; the direction is positive; the strength is moderate. (e) Suriname is an outlier for both 2008 and 2013. (f) The relationship is somewhat linear, though there are observations that don't follow the linear trend well.

Question 2.20

2.20 Use 2003 to predict 2013

tts

Refer to the previous exercise. The data set also has times for 2003. Use the 2003 times as the explanatory variable and the 2013 times as the response variable.

  1. Answer the questions in the previous exercise for this setting.
  2. Compare the strength of this relationship (between the 2013 times and the 2003 times) with the strength of the relationship in the previous exercise (between the 2013 times and the 2008 times). Interpret this finding.

Question 2.21

2.21 Fuel efficiency and CO2 emissions

canfuel

Refer to Example 2.7 (pages 70–71), where we examined the relationship between CO2 emissions and highway MPG for 1067 vehicles for the model year 2014. In that example, we used MPG as the explanatory variable and CO2 as the response variable. Let's see if the relationship differs if we change our measure of fuel efficiency from highway MPG to city MPG. Make a scatterplot of the fuel efficiency for city driving, city MPG, versus CO2 emissions. Write a summary describing the relationship between these two variables. Compare your summary with what we found in Example 2.7.

2.21

There is a negative relationship between city MPG and CO2 emissions; better city MPG is associated with lower CO2 emissions. The relationship, however, is not linear but curved. There also seem to be two distinct lines or groups. This relationship is very similar to what we found in Example 2.7 when using highway MPG, with the patterns seen in the plot nearly identical to the form we saw in Example 2.7.

Question 2.22

2.22 Add the type of fuel to the plot

canfuel

Refer to the previous exercise. As we did in Figure 2.6 (page 71), add the categorical variable, type of fuel, to your plot. (If your software does not have this capability, make separate plots for each fuel type. Use the same range of values for the y axis and for the x axis to make the plots easier to compare.) Summarize what you have found in this exercise, and compare your results with what we found in Example 2.7 (pages 70–71).