For Exercises 2.1 and 2.2, see pages 64–65; for 2.3 and 2.4, see page 65; for 2.5 and 2.6, see pages 66–67; and for 2.7, see page 68.
2.8 What's wrong?
Explain what is wrong with each of the following:
2.9 Make some sketches
For each of the following situations, make a scatterplot that illustrates the given relationship between two variables.
2.10 Companies of the world
inccom
In Exercise 1.118 (page 61), you examined data collected by the World Bank on the numbers of companies that are incorporated and are listed in their country's stock exchange at the end of the year for 2012. In Exercise 1.119, you did the same for the year 2002.3 In this exercise, you will examine the relationship between the numbers for these two years.
2.11 Companies of the world
inccom
Refer to the previous exercise. Using the questions there as a guide, describe the relationship between the numbers for 2012 and 1992. Do you expect this relationship to be stronger or weaker than the one you described in the previous exercise? Give a reason for your answer.
2.11
We expect the relationship to be weaker because the time difference is larger. (a) The data for year 1992 would be the explanatory variable; the data for year 2012 would be the response. We would expect the 1992 data to explain, and possibly cause, changes in the 2012 data. (c) The form is roughly linear; the direction is positive; the strength is moderate. (d) The United States is the only outlier, with a much larger value for the year 1992 than most other countries.
2.12 Brand-to-brand variation in a product
beer
Beer100.com advertises itself as “Your Place for All Things Beer.” One of their “things” is a list of 175 domestic beer brands with the percent alcohol, calories per 12 ounces, and carbohydrates (in grams).4 In Exercises 1.56 through 1.58 (page 36), you examined the distribution of alcohol content and the distribution of calories for these beers.
2.13 More beer
beer
Refer to the previous exercise. Repeat the exercise for the relationship between carbohydrates and percent alcohol. Be sure to include summaries of the distributions of the two variables you are studying.
2.13
(a) From Exercise 1.56, percent alcohol is somewhat right-skewed. The distribution of carbohydrates is fairly symmetric. (c) The form is somewhat linear; the direction is positive; the strength is weak. (d) O'Doul's could be a potential outlier; it has a very small percent alcohol value. Sierra Nevada Bigfoot could also be a potential outlier; it has a very high amount of carbohydrates.
2.14 Marketing in Canada
canadap
Many consumer items are marketed to particular age groups in a population. To plan such marketing strategies, it is helpful to know the demographic profile for different areas. Statistics Canada provides a great deal of demographic data organized in different ways.5
2.15 Compare the provinces with the territories
canadap
Refer to the previous exercise. The three Canadian territories are the Northwest Territories, Nunavut, and the Yukon Territories. All of the other entries in the data set are provinces.
2.15
(b) The three territories have smaller percentages of the population over 65 than any of the provinces. Additionally, two of the three territories have larger percentages of the population under 15 than any of the provinces.
2.16 Sales and time spent on web pages
You have collected data on 1000 customers who visited the web pages of your company last week. For each customer, you recorded the time spent on your pages and the total amount of their purchases during the visit. You want to explore the relationship between these two variables.
2.17 A product for lab experiments
decay
Barium-137m is a radioactive form of the element barium that decays very rapidly. It is easy and safe to use for lab experiments in schools and colleges.6 In a typical experiment, the radioactivity of a sample of barium-137m is measured for one minute. It is then measured for three additional one-minute periods, separated by two minutes. So data are recorded at one, three, five, and seven minutes after the start of the first counting period. The measurement units are counts. Here are the data for one of these experiments:7
Time | 1 | 3 | 5 | 7 |
Count | 578 | 317 | 203 | 118 |
2.17
(b) As time increases, the count goes down. (c) The form is curved; the direction is negative; the strength is very strong. (d) The first data point, at time 1, is somewhat of an outlier because it doesn't line up as well as the other points do. (e) A curve might fit the data better than a simple linear trend.
2.18 Use a log for the radioactive decay
decay
Refer to the previous exercise. Transform the counts using a log transformation. Then repeat parts (a) through (e) for the transformed data, and compare your results with those from the previous exercise.
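One way to carry out the transformation is sketched below in Python. It assumes the natural logarithm (the exercise does not specify a base) and uses the counts from Exercise 2.17. Because the measurements are equally spaced in time, a roughly linear pattern would show up as roughly equal changes from one measurement to the next. The helper names `successive_changes` and `unevenness` are illustrative only and are not part of any library.

```python
import math

# Data from Exercise 2.17: minutes after the start of the first
# counting period, and the measured counts.
times = [1, 3, 5, 7]
counts = [578, 317, 203, 118]

# Assumption: natural log. The exercise only says "a log transformation".
log_counts = [math.log(c) for c in counts]


def successive_changes(values):
    """Change from each measurement to the next (all time steps are 2 minutes)."""
    return [b - a for a, b in zip(values, values[1:])]


def unevenness(changes):
    """Largest absolute change divided by the smallest; 1.0 means perfectly even steps."""
    sizes = [abs(c) for c in changes]
    return max(sizes) / min(sizes)


raw_changes = successive_changes(counts)
log_changes = successive_changes(log_counts)

for t, c, lc in zip(times, counts, log_counts):
    print(f"time={t}  count={c}  log(count)={lc:.3f}")
print("changes in count:     ", raw_changes)
print("changes in log(count):", [round(c, 3) for c in log_changes])
print("unevenness of raw changes:", round(unevenness(raw_changes), 2))
print("unevenness of log changes:", round(unevenness(log_changes), 2))
```

Using a base-10 log instead would rescale every transformed value by the same constant factor, so the shape of the pattern in the scatterplot would not change.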
2.19 Time to start a business
tts
Case 1.2 (page 23) uses the World Bank data on the time required to start a business in different countries. For Example 1.21 and several other examples that follow we used data for a subset of the countries for 2013. Data are also available for times to start in 2008. Let's look at the data for all 189 countries to examine the relationship between the times to start in 2013 and the times to start in 2008.
2.19
(a) 2008 data should explain the 2013 data. (c) There are 182 points; some of the data for 2008 are missing. (d) The form is somewhat linear; the direction is positive; the strength is moderate. (e) Suriname is an outlier for both 2008 and 2013. (f) The relationship is somewhat linear, though there are observations that don't follow the linear trend well.
2.20 Use 2003 to predict 2013
tts
Refer to the previous exercise. The data set also has times for 2003. Use the 2003 times as the explanatory variable and the 2013 times as the response variable.
2.21 Fuel efficiency and CO2 emissions
canfuel
Refer to Example 2.7 (pages 70–71), where we examined the relationship between CO2 emissions and highway MPG for 1067 vehicles for the model year 2014. In that example, we used MPG as the explanatory variable and CO2 as the response variable. Let's see if the relationship differs if we change our measure of fuel efficiency from highway MPG to city MPG. Make a scatterplot of the fuel efficiency for city driving, city MPG, versus CO2 emissions. Write a summary describing the relationship between these two variables. Compare your summary with what we found in Example 2.7.
2.21
There is a negative relationship between city MPG and CO2 emissions; better city MPG is associated with lower CO2 emissions. The relationship, however, is not linear but curved. There also seem to be two distinct lines or groups. This relationship is very similar to what we found in Example 2.7 when using highway MPG, with the patterns seen in the plot nearly identical to the form we saw in Example 2.7.
2.22 Add the type of fuel to the plot
canfuel
Refer to the previous exercise. As we did in Figure 2.6 (page 71), add the categorical variable, type of fuel, to your plot. (If your software does not have this capability, make separate plots for each fuel type. Use the same range of values for the y axis and for the x axis to make the plots easier to compare.) Summarize what you have found in this exercise, and compare your results with what we found in Example 2.7 (pages 70–71).
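If your software needs separate plots, the bookkeeping might look like the following Python sketch. The records are made-up values for illustration only and are NOT taken from the canfuel data file; the labels "fuel A" and "fuel B" and the function names `split_by_group` and `shared_limits` are likewise hypothetical. The sketch splits the points by fuel type and computes one common range for each axis, so that every per-type plot can be drawn on the same scale.

```python
from collections import defaultdict

# Hypothetical records: (city MPG, CO2 emissions, fuel type).
# These values are invented for illustration, not read from the data file.
vehicles = [
    (18.0, 300.0, "fuel A"),
    (25.0, 220.0, "fuel A"),
    (32.0, 170.0, "fuel A"),
    (17.0, 270.0, "fuel B"),
    (24.0, 190.0, "fuel B"),
]


def split_by_group(records):
    """Return {fuel type: [(x, y), ...]} so each type can be plotted separately."""
    groups = defaultdict(list)
    for x, y, label in records:
        groups[label].append((x, y))
    return dict(groups)


def shared_limits(records):
    """Return ((x_min, x_max), (y_min, y_max)) computed over ALL records."""
    xs = [x for x, _, _ in records]
    ys = [y for _, y, _ in records]
    return (min(xs), max(xs)), (min(ys), max(ys))


groups = split_by_group(vehicles)
x_limits, y_limits = shared_limits(vehicles)

for label, points in groups.items():
    print(label, points)
print("x-axis range for every plot:", x_limits)
print("y-axis range for every plot:", y_limits)
```

Computing the limits from all vehicles, rather than from each group separately, is what keeps the plots comparable.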