SECTION 2.1 Exercises

For Exercises 2.1 and 2.2, see pages 64–65; for 2.3 and 2.4, see page 65; for 2.5 and 2.6, see pages 66–67; and for 2.7, see page 68.

Question 2.8

2.8 What’s wrong?

Explain what is wrong with each of the following:

  1. If two variables are negatively associated, then low values of one variable are associated with low values of the other variable.
  2. A stemplot can be used to examine the relationship between two variables.
  3. In a scatterplot, we put the response variable on the x axis and the explanatory variable on the y axis.

Question 2.9

2.9 Make some sketches

For each of the following situations, make a scatterplot that illustrates the given relationship between two variables.

  1. No apparent relationship.
  2. A weak negative linear relationship.
  3. A strong positive relationship that is not linear.
  4. A more complicated relationship. Explain the relationship.

Question 2.10

2.10 Companies of the world

In Exercise 1.118 (page 61), you examined data collected by the World Bank on the numbers of companies that are incorporated and are listed in their country’s stock exchange at the end of the year for 2012. In Exercise 1.119, you did the same for the year 2002.[3] In this exercise, you will examine the relationship between the numbers for these two years.

  1. Which variable would you choose as the explanatory variable, and which would you choose as the response variable? Give reasons for your answers.
  2. Make a scatterplot of the data.
  3. Describe the form, the direction, and the strength of the relationship.
  4. Are there any outliers? If yes, identify them by name.

Question 2.11

2.11 Companies of the world

Refer to the previous exercise. Using the questions there as a guide, describe the relationship between the numbers for 2012 and 2002. Do you expect this relationship to be stronger or weaker than the one you described in the previous exercise? Give a reason for your answer.

Question 2.12

2.12 Brand-to-brand variation in a product

Beer100.com advertises itself as “Your Place for All Things Beer.” One of their “things” is a list of 175 domestic beer brands with the percent alcohol, calories per 12 ounces, and carbohydrates (in grams).[4] In Exercises 1.56 through 1.58 (page 36), you examined the distribution of alcohol content and the distribution of calories for these beers.

  1. Give a brief summary of what you learned about these variables in those exercises. (If you did not do them when you studied Chapter 1, do them now.)
  2. Make a scatterplot of calories versus percent alcohol.
  3. Describe the form, direction, and strength of the relationship.
  4. Are there any outliers? If yes, identify them by name.

Question 2.13

2.13 More beer

Refer to the previous exercise. Repeat the exercise for the relationship between carbohydrates and percent alcohol. Be sure to include summaries of the distributions of the two variables you are studying.

Question 2.14

2.14 Marketing in Canada

Many consumer items are marketed to particular age groups in a population. To plan such marketing strategies, it is helpful to know the demographic profile for different areas. Statistics Canada provides a great deal of demographic data organized in different ways.[5]

  1. Make a scatterplot of the percent of the population over 65 versus the percent of the population under 15.
  2. Describe the form, direction, and strength of the relationship.

Question 2.15

2.15 Compare the provinces with the territories

Refer to the previous exercise. The three Canadian territories are the Northwest Territories, Nunavut, and Yukon. All of the other entries in the data set are provinces.

  1. Generate a scatterplot of the Canadian demographic data similar to the one that you made in the previous exercise but with the points labeled “P” for provinces and “T” for territories (or labeled in some other way if that is easier to do with your software).
  2. Use your new scatterplot to write a new summary of the demographics for the 13 Canadian provinces and territories.
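If your software needs a separate label column before it can mark the points, the labeling step in part 1 can be sketched as follows. This is a minimal Python sketch, not a prescribed method. The function name `region_label` and the tuple `TERRITORY_PREFIXES` are hypothetical, and the spellings are assumptions that may need adjusting to match the names in your data file.

```python
# Names of the three territories, as listed in this exercise.
# ASSUMPTION: each name in the data file begins with one of these spellings.
TERRITORY_PREFIXES = ("Northwest Territories", "Nunavut", "Yukon")


def region_label(name):
    """Return "T" for a territory and "P" for a province."""
    # str.startswith accepts a tuple and checks each prefix in turn.
    return "T" if name.strip().startswith(TERRITORY_PREFIXES) else "P"
```

Applying `region_label` to the column of region names gives a column of "P" and "T" values that most plotting software can use as a plotting symbol or a color group.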

Question 2.16

2.16 Sales and time spent on web pages

You have collected data on 1000 customers who visited the web pages of your company last week. For each customer, you recorded the time spent on your pages and the total amount of their purchases during the visit. You want to explore the relationship between these two variables.

  1. What is the explanatory variable? What is the response variable? Explain your answers.
  2. Are these variables categorical or quantitative?
  3. Do you expect a positive or negative association between these variables? Why?
  4. How strong do you expect the relationship to be? Give reasons for your answer.

Question 2.17

2.17 A product for lab experiments

Barium-137m is a radioactive form of the element barium that decays very rapidly. It is easy and safe to use for lab experiments in schools and colleges.[6] In a typical experiment, the radioactivity of a sample of barium-137m is measured for one minute. It is then measured for three additional one-minute periods, separated by two minutes. So data are recorded at one, three, five, and seven minutes after the start of the first counting period. The measurement units are counts. Here are the data for one of these experiments:[7]

Time (minutes)     1     3     5     7
Count            578   317   203   118

  1. Make a scatterplot of the data. Give reasons for the choice of which variables to use on the x and y axes.
  2. Describe the overall pattern in the scatterplot.
  3. Describe the form, direction, and strength of the relationship.
  4. Identify any outliers.
  5. Is the relationship approximately linear? Explain your answer.
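For readers working in software, one possible way to produce the plot in part 1 is sketched below. The use of Python with the matplotlib library is an assumption, since the exercise does not prescribe any software. The function name `plot_counts` and the output file name are hypothetical. The sketch puts time on the x axis and count on the y axis. You still need to justify, or revise, that choice in your answer.

```python
# Data from the table in this exercise.
times = [1, 3, 5, 7]            # minutes after the start of the first counting period
counts = [578, 317, 203, 118]   # radioactivity, in counts


def plot_counts(x, y, path="barium_scatter.png"):
    """Save a scatterplot of y versus x to `path` (a hypothetical file name)."""
    # matplotlib is imported inside the function so that the data above
    # can be inspected even where the library is not installed.
    import matplotlib
    matplotlib.use("Agg")  # draw to a file; no display window is needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xlabel("Time (minutes)")
    ax.set_ylabel("Count")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `plot_counts(times, counts)` writes the image file and returns its path.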

Question 2.18

2.18 Use a log for the radioactive decay

Refer to the previous exercise. Transform the counts using a log transformation. Then repeat parts 1 through 5 for the transformed data, and compare your results with those from the previous exercise.
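The transformation step can be sketched in a few lines of Python. The exercise does not say which base to use, so the natural log default here is an assumption. The function name `log_transform` is hypothetical. The counts are those given in the previous exercise.

```python
import math

# Data from Exercise 2.17.
times = [1, 3, 5, 7]            # minutes
counts = [578, 317, 203, 118]   # counts


def log_transform(values, base=math.e):
    """Return the logarithm of each value.

    ASSUMPTION: natural log by default; pass base=10 for common logs.
    """
    return [math.log(v, base) for v in values]


log_counts = log_transform(counts)

# Print the original and transformed values side by side.
for t, c, lc in zip(times, counts, log_counts):
    print(f"time={t}  count={c}  log count={lc:.3f}")
```

Changing the base rescales every transformed value by the same constant factor, so it should not change the form of the relationship you see in the plot.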

Question 2.19

2.19 Time to start a business

Case 1.2 (page 23) uses the World Bank data on the time required to start a business in different countries. For Example 1.21 and several other examples that follow, we used data for a subset of the countries for 2013. Data are also available for times to start in 2008. Let’s look at the data for all 189 countries to examine the relationship between the times to start in 2013 and the times to start in 2008.

  1. Why should you use the time for 2008 as the explanatory variable and the time for 2013 as the response variable?
  2. Make a scatterplot of the two variables.
  3. How many points are in your plot? Explain why there are not 189 points.
  4. Describe the form, direction, and strength of the relationship.
  5. Identify any outliers.
  6. Is the relationship approximately linear? Explain your answer.

Question 2.20

2.20 Use 2003 to predict 2013

Refer to the previous exercise. The data set also has times for 2003. Use the 2003 times as the explanatory variable and the 2013 times as the response variable.

  1. Answer the questions in the previous exercise for this setting.
  2. Compare the strength of this relationship (between the 2013 times and the 2003 times) with the strength of the relationship in the previous exercise (between the 2013 times and the 2008 times). Interpret this finding.

Question 2.21

2.21 Fuel efficiency and CO2 emissions

Refer to Example 2.7 (pages 70–71), where we examined the relationship between CO2 emissions and highway MPG for 1067 vehicles for the model year 2014. In that example, we used MPG as the explanatory variable and CO2 as the response variable. Let’s see if the relationship differs if we change our measure of fuel efficiency from highway MPG to city MPG. Make a scatterplot of the fuel efficiency for city driving, city MPG, versus CO2 emissions. Write a summary describing the relationship between these two variables. Compare your summary with what we found in Example 2.7.

Question 2.22

2.22 Add the type of fuel to the plot

Refer to the previous exercise. As we did in Figure 2.6 (page 71), add the categorical variable, type of fuel, to your plot. (If your software does not have this capability, make separate plots for each fuel type. Use the same range of values for the y axis and for the x axis to make the plots easier to compare.) Summarize what you have found in this exercise, and compare your results with what we found in Example 2.7 (pages 70–71).
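If you make separate plots for each fuel type, the shared axis ranges can be computed once and reused for every plot. The Python sketch below shows one way to do this. It reads the instruction as meaning one common x range and one common y range across all the plots, which is an assumption. The function name `shared_limits`, the `pad` parameter, and the input layout (a dictionary mapping each fuel type to a list of (x, y) points) are all hypothetical.

```python
def shared_limits(groups, pad=0.05):
    """Return ((xmin, xmax), (ymin, ymax)) covering the points in every group.

    groups: dict mapping a group name (e.g., a fuel type) to a list of (x, y) pairs.
    pad:    fraction of each range added as a margin on both sides.
    """
    xs = [x for points in groups.values() for x, _ in points]
    ys = [y for points in groups.values() for _, y in points]

    def padded_range(values):
        low, high = min(values), max(values)
        margin = (high - low) * pad
        return low - margin, high + margin

    return padded_range(xs), padded_range(ys)
```

With matplotlib, for example, the two returned ranges can be passed to `ax.set_xlim(...)` and `ax.set_ylim(...)` on each plot so that all the plots are drawn on identical scales.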