Section 1.3 Exercises

For Exercise 1.33, see page 24; for 1.34 and 1.35, see page 25; for 1.36 to 1.38, see page 26; for 1.39, see page 27; for 1.40 to 1.42, see pages 30-31; for 1.43, see page 32; and for 1.44 and 1.45, see page 33.

Question 1.46

1.46 Gross domestic product for 189 countries.

gdp

The gross domestic product (GDP) of a country is the total value of all goods and services produced in the country. It is an important measure of the health of a country’s economy. For this exercise, you will analyze the 2012 GDP for 189 countries. The values are given in millions of U.S. dollars.20

  1. Compute the mean and the standard deviation.
  2. Which countries do you think are outliers? Identify them by name and explain why you consider them to be outliers.
  3. Recompute the mean and the standard deviation without your outliers. Explain how the mean and standard deviation changed when you deleted the outliers.

Question 1.47

1.47 Use the resistant measures for GDP.

gdp

Repeat parts (a) and (c) of the previous exercise using the median and the quartiles. Summarize your results and compare them with those of the previous exercise.

1.47

(a) . . . (c) Answers will vary

Question 1.48

1.48 Forbes rankings of best countries for business.

bestbus

The Forbes website ranks countries based on their characteristics that are favorable for business.21 One of the characteristics that it uses for its rankings is trade balance, defined as the difference between the value of a country’s exports and its imports. A negative trade balance occurs when a country imports more than it exports. Similarly, the trade balance will be positive for a country that exports more than it imports. Data related to the rankings are given for 145 countries.

  1. Describe the distribution of trade balance using the mean and the standard deviation.
  2. Do the same using the median and the quartiles.
  3. Using only information from parts (a) and (b), give a description of the data. Do not look at any graphical summaries or other numerical summaries for this part of the exercise.

Question 1.49

1.49 What do the trade balance graphical summaries show?

bestbus

Refer to the previous exercise.

  1. Use graphical summaries to describe the distribution of the trade balance for these countries.
  2. Give the names of the countries that correspond to extreme values in this distribution.
  3. Reanalyze the data without the outliers.
  4. Summarize what you have learned about the distribution of the trade balance for these countries.

Include appropriate graphical and numerical summaries as well as comments about the outliers.

1.49

(b) Montenegro has a very low trade balance of -45.3. Kuwait, at 42.2, and Libya, at 40.7, have very high trade balances. (c) . . . . . The distribution and numerical summaries are almost identical before and after the outliers are removed. (d) Overall, the distribution is very symmetric, so that if some countries export a lot, there are other countries that import just as much. The mean and median trade balance are both very close to 0. The outliers had almost no effect on the distribution or numerical summaries. Essentially, the outliers form longer tails on the curve.

Question 1.50

1.50 GDP Growth for 145 countries.

bestbus

Refer to the previous two exercises. Another variable that Forbes uses to rank countries is growth in gross domestic product, expressed as a percent.

  1. Use graphical summaries to describe the distribution of the growth in GDP for these countries.
  2. Give the names of the countries that correspond to extreme values in this distribution.
  3. Reanalyze the data without the outliers.
  4. Summarize what you have learned about the distribution of the growth in GDP for these countries. Include appropriate graphical and numerical summaries as well as comments about the outliers.

Question 1.51

1.51 Create a data set.

Create a data set that illustrates the idea that an extreme observation can have a large effect on the mean but not on the median.
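
One possible illustration, as a minimal sketch: the two small data sets below are hypothetical and differ only in their largest value. Replacing 5 with 500 moves the mean a long way but leaves the median where it was.

```python
from statistics import mean, median

# Hypothetical data, invented for illustration only.
typical = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 500]  # same data, largest value made extreme

mean_typical, median_typical = mean(typical), median(typical)
mean_extreme, median_extreme = mean(with_extreme), median(with_extreme)

print("mean:  ", mean_typical, "->", mean_extreme)
print("median:", median_typical, "->", median_extreme)
```

The median depends only on the middle value of the ordered data, so changing the largest observation cannot move it here.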

Question 1.52

1.52 Variability of an agricultural product.

potato

A quality product is one that is consistent and has very little variability in its characteristics. Controlling variability can be more difficult with agricultural products than with those that are manufactured. The following table gives the individual weights, in ounces, of the 25 potatoes sold in a 10-pound bag.


7.8 7.9 8.2 7.3 6.7 7.9 7.9 7.9 7.6 7.8 7.0 4.7 7.6
6.3 4.7 4.7 4.7 6.3 6.0 5.3 4.3 7.9 5.2 6.0 3.7
  1. Summarize the data graphically and numerically. Give reasons for the methods you chose to use in your summaries.
  2. Do you think that your numerical summaries do an effective job of describing these data? Why or why not?
  3. There appear to be two distinct clusters of weights for these potatoes. Divide the sample into two subsamples based on the clustering. Give the mean and standard deviation for each subsample. Do you think that this way of summarizing these data is better than a numerical summary that uses all the data as a single sample? Give a reason for your answer.
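
A minimal sketch of the numerical part of this exercise, using the 25 weights from the table. The cutoff of 6.5 ounces used to separate the two clusters is an assumption made for illustration; choosing the split is part of the exercise, and a plot of the data may suggest a different boundary.

```python
from statistics import mean, stdev

# The 25 potato weights (ounces) from the table above.
weights = [
    7.8, 7.9, 8.2, 7.3, 6.7, 7.9, 7.9, 7.9, 7.6, 7.8, 7.0, 4.7, 7.6,
    6.3, 4.7, 4.7, 4.7, 6.3, 6.0, 5.3, 4.3, 7.9, 5.2, 6.0, 3.7,
]

CUTOFF = 6.5  # hypothetical boundary between the two clusters

light = [w for w in weights if w < CUTOFF]
heavy = [w for w in weights if w >= CUTOFF]

for label, group in [("all", weights), ("light", light), ("heavy", heavy)]:
    print(f"{label:5s} n={len(group):2d} mean={mean(group):.2f} sd={stdev(group):.2f}")
```

Comparing each subsample's standard deviation with the standard deviation of the full sample shows how much of the overall spread comes from the gap between the clusters.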

Question 1.53

1.53 Apple is the number one brand.

brands

A brand is a symbol or image that is associated with a company. An effective brand identifies the company and its products. Using a variety of measures, dollar values for brands can be calculated.22 The most valuable brand is Apple, with a value of $104.3 million. Apple is followed by Microsoft, at $56.7 million; Coca-Cola, at $54.9 million; IBM, at $50.7 million; and Google, at $47.3 million. For this exercise, you will use the brand values, reported in millions of dollars, for the top 100 brands.

  1. Graphically display the distribution of the values of these brands.
  2. Use numerical measures to summarize the distribution.
  3. Write a short paragraph discussing the dollar values of the top 100 brands. Include the results of your analysis.

1.53

(b) . . . . . (c) The distribution is strongly right-skewed, with several brands far more valuable than most others. This is shown in the numerical summaries, with 75% of brand values less than the third quartile. Additionally, the median brand value is only 9.6. The mean value is 14.92, substantially higher than the median, again indicating the skew. Thus, brands like Apple and those listed in the problem dwarf the competition.

Question 1.54

1.54 Advertising for best brands.

brands

Refer to the previous exercise. To calculate the value of a brand, the Forbes website uses several variables, including the amount the company spent on advertising. For this exercise, you will analyze the amounts these companies spent on advertising, reported in millions of dollars.

  1. Graphically display the distribution of the dollars spent on advertising by these companies.
  2. Use numerical measures to summarize the distribution.
  3. Write a short paragraph discussing the advertising expenditures of the top 100 brands. Include the results of your analysis.

Question 1.55

1.55 Salaries of the chief executives.

According to the May 2013 National Occupational Employment and Wage Estimates for the United States, the median wage was $45.96 per hour and the mean wage was $53.15 per hour.23 What explains the difference between these two measures of center?

1.55

The distribution is right-skewed: a relatively small number of very high wages pulls the mean above the median.

Question 1.56

1.56 The alcohol content of beer.

beer

Brewing beer involves a variety of steps that can affect the alcohol content. A website gives the percent alcohol for 175 domestic brands of beer.24

  1. Use graphical and numerical summaries of your choice to describe the data. Give reasons for your choice.
  2. The data set contains an outlier. Explain why this particular beer is unusual.
  3. For the outlier, give a short description of how you think this particular beer should be marketed.

Question 1.57

1.57 Outlier for alcohol content of beer.

beer

Refer to the previous exercise.

  1. Calculate the mean with and without the outlier. Do the same for the median. Explain how these values change when the outlier is excluded.
  2. Calculate the standard deviation with and without the outlier. Do the same for the quartiles. Explain how these values change when the outlier is excluded.
  3. Write a short paragraph summarizing what you have learned in this exercise.

1.57

(a) With the outlier: . . Without the outlier: . . The values are nearly identical with and without the outlier. (b) With the outlier: . . . Without the outlier: . . . The values are nearly identical with and without the outlier. (c) Even though there is one outlier, its removal barely changes the numerical summaries. This is partly due to the large sample and partly due to the fact that this outlier is not too far from the other observations, so removing it doesn't have a huge effect on the analysis.

Question 1.58

1.58 Calories in beer.

Refer to the previous two exercises. The data set also lists calories per 12 ounces of beverage.

beer

  1. Analyze the data and summarize the distribution of calories for these 175 brands of beer.
  2. In Exercise 1.56, you identified one brand of beer as an outlier. To what extent is this brand an outlier in the distribution of calories? Explain your answer.
  3. Does the distribution of calories suggest marketing strategies for this brand of beer? Describe some marketing strategies.

Question 1.59

1.59 Discovering outliers.

us65

Whether an observation is an outlier is a matter of judgment. It is convenient to have a rule for identifying suspected outliers. The $1.5 \times \mathrm{IQR}$ rule is in common use:

  1. The interquartile range $\mathrm{IQR}$ is the distance between the first and third quartiles, $\mathrm{IQR} = Q_3 - Q_1$. This is the spread of the middle half of the data.
  2. An observation is a suspected outlier if it lies more than $1.5 \times \mathrm{IQR}$ below the first quartile $Q_1$ or above the third quartile $Q_3$.

The stemplot in Exercise 1.31 (page 22) displays the distribution of the percents of residents aged 65 and older in the 50 states. Stemplots help you find the five-number summary because they arrange the observations in increasing order.

  1. Give the five-number summary of this distribution.
  2. Does the $1.5 \times \mathrm{IQR}$ rule identify any outliers? If yes, give the names of the states with the percents of the population over 65.
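
The 1.5 × IQR rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not a definitive implementation: the quartiles are taken as the medians of the lower and upper halves of the ordered data (statistical software often uses other conventions and may give slightly different quartiles), and the `sample` values are hypothetical, not the state percents from Exercise 1.31.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3), taking Q1 and Q3 as medians of the halves."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]
    # With an odd count, the middle observation belongs to neither half.
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

def suspected_outliers(values):
    """Return the observations more than 1.5 x IQR outside the quartiles."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

# Hypothetical data, invented for illustration only.
sample = [10, 12, 13, 13, 14, 15, 15, 16, 30]
print(quartiles(sample))
print(suspected_outliers(sample))
```

For this sample the quartiles are 12.5 and 15.5, so the IQR is 3 and the fences sit at 8 and 20; only the value 30 falls outside them.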


The following three exercises use the Mean and Median applet available at the text website to explore the behavior of the mean and median.

1.59

(a) , , , , . (b) . . So, Utah with 9.5 percent and Alaska with 8.5 percent are low outliers. . So, Florida with 18.2 percent is a high outlier.

Question 1.60

1.60 Mean = median?

Place two observations on the line by clicking below it. Why does only one arrow appear?

Question 1.61

1.61 Extreme observations.

Place three observations on the line by clicking below it—two close together near the center of the line and one somewhat to the right of these two.

  1. Pull the rightmost observation out to the right. (Place the cursor on the point, hold down a mouse button, and drag the point.) How does the mean behave? How does the median behave? Explain briefly why each measure acts as it does.
  2. Now drag the rightmost point to the left as far as you can. What happens to the mean? What happens to the median as you drag this point past the other two? (Watch carefully.)

Question 1.62

1.62 Don't change the median.

Place five observations on the line by clicking below it.

  1. Add one additional observation without changing the median. Where is your new point?
  2. Use the applet to convince yourself that when you add yet another observation (there are now seven in all), the median does not change no matter where you put the seventh point. Explain why this must be true.

Question 1.63

1.63 $\bar{x}$ and $s$ are not enough

abdata

The mean and standard deviation measure center and spread but are not a complete description of a distribution. Data sets with different shapes can have the same mean and standard deviation. To demonstrate this fact, find $\bar{x}$ and $s$ for these two small data sets. Then make a stemplot of each, and comment on the shape of each distribution.

Data A: 9.14 8.14 8.74 8.77 9.26 8.10
6.13 3.10 9.13 7.26 4.74
Data B: 6.58 5.76 7.71 8.84 8.47 7.04
5.25 5.56 7.91 6.89 12.50
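
The numerical part of this exercise can be checked with a short sketch using Python's standard `statistics` module. The variable names are arbitrary, and the stemplots still need to be made separately to see the difference in shape.

```python
from statistics import mean, stdev

data_a = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
data_b = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]

mean_a, sd_a = mean(data_a), stdev(data_a)  # stdev divides by n - 1
mean_b, sd_b = mean(data_b), stdev(data_b)

print(f"Data A: mean = {mean_a:.2f}, s = {sd_a:.2f}")
print(f"Data B: mean = {mean_b:.2f}, s = {sd_b:.2f}")
```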

1.63

The means and standard deviations are essentially the same: $\bar{x} = 7.50$ and $s = 2.03$ for both data sets. The stemplots (rounded to 1 decimal) show very different distributions. Data A is strongly left-skewed with a couple of possible low outliers; Data B is roughly evenly distributed between 5 and 9 but has one high outlier at 12.5.

Question 1.64

1.64 Returns on Treasury bills.

CASE 1.1 Figure 1.16(a) (page 34) is a stemplot of the annual returns on U.S. Treasury bills for 50 years. (The entries are rounded to the nearest tenth of a percent.)

tbill50

  1. Use the stemplot to find the five-number summary of T-bill returns.
  2. The mean of these returns is about 5.19%. Explain from the shape of the distribution why the mean return is larger than the median return.

Question 1.65

1.65 Salary increase for the owners.

Last year, a small accounting firm paid each of its five clerks $40,000, two junior accountants $75,000 each, and the firm’s owner $455,000.

  1. What is the mean salary paid at this firm? How many of the employees earn less than the mean? What is the median salary?
  2. This year, the firm gives no raises to the clerks and junior accountants, while the owner’s take increases to $495,000. How does this change affect the mean? How does it affect the median?
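
The arithmetic in this exercise can be sketched as follows, using only the salaries stated above; the variable names are illustrative.

```python
from statistics import mean, median

# Five clerks, two junior accountants, and the owner.
last_year = [40_000] * 5 + [75_000] * 2 + [455_000]
this_year = [40_000] * 5 + [75_000] * 2 + [495_000]  # only the owner's take changes

mean_last, median_last = mean(last_year), median(last_year)
mean_this, median_this = mean(this_year), median(this_year)
below_mean = sum(1 for salary in last_year if salary < mean_last)

print("last year:", mean_last, median_last, "below the mean:", below_mean)
print("this year:", mean_this, median_this)
```

With eight salaries, the median is the average of the fourth and fifth ordered values, both of which are clerks' salaries, so the owner's increase cannot affect it.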

1.65

(a) The mean salary is $100,625. All the employees except for the owner make less than the mean. The median salary is $40,000. (b) The mean increases to $105,625. The median does not change.

Question 1.66

1.66 A skewed distribution.

Sketch a distribution that is skewed to the left. On your sketch, indicate the approximate position of the mean and the median. Explain why these two values are not equal.

Question 1.67

1.67 A standard deviation contest.

You must choose four numbers from the whole numbers 10 to 20, with repeats allowed.

  1. Choose four numbers that have the smallest possible standard deviation.
  2. Choose four numbers that have the largest possible standard deviation.
  3. Is more than one choice possible in (a)? In (b)? Explain.
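
One way to check an answer to this exercise is brute force, sketched below: enumerate every choice of four whole numbers from 10 to 20 (order ignored, repeats allowed) and compare their spreads. To avoid floating-point ties, the sketch ranks choices by the integer quantity $n \sum x^2 - (\sum x)^2$, which equals $n(n-1)s^2$; the helper name `spread_score` is hypothetical.

```python
from itertools import combinations_with_replacement
from math import sqrt

def spread_score(nums):
    """Integer proportional to the sample variance: n*sum(x^2) - (sum x)^2."""
    n = len(nums)
    return n * sum(x * x for x in nums) - sum(nums) ** 2

# Every multiset of four whole numbers from 10 to 20.
choices = list(combinations_with_replacement(range(10, 21), 4))
scores = {c: spread_score(c) for c in choices}

smallest, largest = min(scores.values()), max(scores.values())
min_choices = [c for c, v in scores.items() if v == smallest]
max_choices = [c for c, v in scores.items() if v == largest]
largest_sd = sqrt(largest / (4 * 3))  # s^2 = score / (n * (n - 1))

print(len(min_choices), "choices give the smallest standard deviation")
print(max_choices, "gives the largest, s =", round(largest_sd, 3))
```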

1.67

(a) Picking the same number for all four observations results in a standard deviation of 0. (b) Picking 10, 10, 20, and 20 results in the largest standard deviation. (c) For part (a), you may pick any number as long as all observations are the same. For part (b), only one choice provides the largest standard deviation.

Question 1.68

1.68 Imputation.

impute

Various problems with data collection can cause some observations to be missing. Suppose a data set has 20 cases. Here are the values of the variable for 10 of these cases:

27 16 2 12 22 23 9 12 16 21

The values for the other 10 cases are missing. One way to deal with missing data is called imputation. The basic idea is that missing values are replaced, or imputed, with values that are based on an analysis of the data that are not missing. For a data set with a single variable, the usual choice of a value for imputation is the mean of the values that are not missing.

  1. Find the mean and the standard deviation for these data.
  2. Create a new data set with 20 cases by setting the values for the 10 missing cases to 15. Compute the mean and standard deviation for this data set.
  3. Summarize what you have learned about the possible effects of this type of imputation on the mean and the standard deviation.
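
The steps of this exercise can be sketched with the 10 observed values given above. The imputed value of 15 is the one the exercise specifies; the variable names are illustrative.

```python
from statistics import mean, stdev

observed = [27, 16, 2, 12, 22, 23, 9, 12, 16, 21]
IMPUTED_VALUE = 15  # value specified in part (b) of the exercise

# Fill in the 10 missing cases with the imputed value.
completed = observed + [IMPUTED_VALUE] * 10

mean_obs, sd_obs = mean(observed), stdev(observed)
mean_imp, sd_imp = mean(completed), stdev(completed)

print(f"observed only: n={len(observed)}, mean={mean_obs:.2f}, s={sd_obs:.2f}")
print(f"after imputing: n={len(completed)}, mean={mean_imp:.2f}, s={sd_imp:.2f}")
```

Because every imputed case sits at a single value near the center, the completed data set shows much less spread than the observed values do.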

Question 1.69

1.69 A different type of mean.

brands

The trimmed mean is a measure of center that is more resistant than the mean but uses more of the available information than the median. To compute the 5% trimmed mean, discard the highest 5% and the lowest 5% of the observations, and compute the mean of the remaining 90%. Trimming eliminates the effect of a small number of outliers. Use the data on the values of the top 100 brands that we studied in Exercise 1.53 (page 36) to find the 5% trimmed mean. Compare this result with the value of the mean computed in the usual way.
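
The trimming procedure described above might be sketched as follows. The function name `trimmed_mean` and the sample data are hypothetical (the brand values themselves are not reproduced here), and the sketch assumes the number of observations to drop from each end is rounded down to a whole number.

```python
from statistics import mean

def trimmed_mean(values, percent=5):
    """Mean after discarding the lowest and highest `percent`% of observations."""
    xs = sorted(values)
    k = len(xs) * percent // 100  # observations dropped from each end
    return mean(xs[k:len(xs) - k])

# Hypothetical data: 19 ordinary values and one large outlier.
sample = list(range(1, 20)) + [1000]

ordinary_mean = mean(sample)
five_percent_trimmed = trimmed_mean(sample)
print(ordinary_mean, five_percent_trimmed)
```

With 20 observations, 5% trimming drops one value from each end, which removes the outlier and brings the result close to the center of the remaining data.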

1.69

The 5% trimmed mean is 12.78. The original mean was 14.92. The 5% trimmed mean is not as influenced by the large outliers as the original mean.