Chapter 1: Looking at Data

SECTION 1.3 EXERCISES

For Exercises 1.43 and 1.44, see page 29; for Exercises 1.45 to 1.47, see page 31; for Exercise 1.48, see page 33; for Exercises 1.49 and 1.50, see page 34; for Exercise 1.51, see page 35; for Exercise 1.52, see page 37; for Exercise 1.53, see page 39; for Exercise 1.54, see page 40; for Exercise 1.55, see page 40; and for Exercise 1.56, see page 45.

Question 1.57

1.57 Potassium from potatoes. Refer to Exercise 1.30 (page 24) where you examined the potassium absorption of a group of 27 adults who ate a controlled diet that included 40 mEq of potassium from potatoes for five days.

KPOT40

(a) Compute the mean for these data.
(b) Compute the median for these data.
(c) Which measure do you prefer for describing the center of this distribution? Explain your answer. (You may include a graphical summary as part of your explanation.)

Question 1.58

1.58 Potassium from a supplement. Refer to Exercise 1.31 (page 24) where you examined the potassium absorption of a group of 29 adults who ate a controlled diet that included 40 mEq of potassium from a supplement for five days.

KSUP40

(a) Compute the mean for these data.
(b) Compute the median for these data.
(c) Which measure do you prefer for describing the center of this distribution? Explain your answer. (You may include a graphical summary as part of your explanation.)

Question 1.59

1.59 Potassium from potatoes. Refer to Exercise 1.30 (page 24) where you examined the potassium absorption of a group of 27 adults who ate a controlled diet that included 40 mEq of potassium from potatoes for five days.

KPOT40

(a) Compute the standard deviation for these data.
(b) Compute the quartiles for these data.
(c) Give the five-number summary and explain the meaning of each of the five numbers.
(d) Which numerical summaries do you prefer for describing the distribution: the mean and the standard deviation, or the five-number summary? Explain your answer. (You may include a graphical summary as part of your explanation.)

Question 1.60

1.60 Potassium from a supplement. Refer to Exercise 1.31 (page 24) where you examined the potassium absorption of a group of 29 adults who ate a controlled diet that included 40 mEq of potassium from a supplement for five days.

KSUP40

(a) Compute the standard deviation for these data.
(b) Compute the quartiles for these data.
(c) Give the five-number summary and explain the meaning of each of the five numbers.
(d) Which numerical summaries do you prefer for describing the distribution: the mean and the standard deviation, or the five-number summary? Explain your answer. (You may include a graphical summary as part of your explanation.)

Question 1.61

1.61 Potassium from potatoes. Refer to Exercise 1.30 (page 24) where you examined the potassium absorption of a group of 27 adults who ate a controlled diet that included 40 mEq of potassium from potatoes for five days. In Exercise 1.30, you used a stemplot to examine the distribution of the potassium absorption.

KPOT40

(a) Make a histogram and use it to describe the distribution of potassium absorption.
(b) Make a boxplot and use it to describe the distribution of potassium absorption.
(c) Compare the stemplot, the histogram, and the boxplot as graphical summaries of this distribution. Which do you prefer? Give reasons for your answer.

Question 1.62

1.62 Potassium from a supplement. Refer to Exercise 1.31 (page 24) where you examined the potassium absorption of a group of 29 adults who ate a controlled diet that included 40 mEq of potassium from a supplement for five days. In Exercise 1.31, you used a stemplot to examine the distribution of the potassium absorption.

KSUP40

(a) Make a histogram and use it to describe the distribution of potassium absorption.
(b) Make a boxplot and use it to describe the distribution of potassium absorption.
(c) Compare the stemplot, the histogram, and the boxplot as graphical summaries of this distribution. Which do you prefer? Give reasons for your answer.

Question 1.63

1.63 Compare the potatoes with the supplement. Refer to Exercises 1.30 and 1.31 (page 24). Use a back-to-back stemplot to display the data for the two sources of potassium. Use the stemplot to compare the two distributions and write a short summary of your findings.

KPS40

Question 1.64

1.64 Potassium sources. Refer to Exercises 1.30 and 1.31 (page 24). Use side-by-side boxplots to describe the distributions.

KPS40

(a) Summarize what you see in the boxplots and compare it with what you saw in the stemplots.
(b) For comparing these two distributions, do you prefer back-to-back stemplots or side-by-side boxplots? Give reasons for your answer.

Question 1.65

1.65 Gosset’s data on double stout sales. William Sealy Gosset worked at the Guinness Brewery in Dublin and made substantial contributions to the practice of statistics.²³ In his work at the brewery, he collected and analyzed a great deal of data. Archives with Gosset’s handwritten tables, graphs, and notes have been preserved at the Guinness Storehouse in Dublin.²⁴ In one study, Gosset examined the change in the double stout market before and after World War I (1914–1918). For various regions in England and Scotland, he calculated the ratio of sales in 1925, after the war, as a percent of sales in 1913, before the war. Here are the data:

STOUT

Bristol	94	Glasgow	66
Cardiff	112	Liverpool	140
English Agents	78	London	428
English O	68	Manchester	190
English P	46	Newcastle-on-Tyne	118
English R	111	Scottish	24

(a) Compute the mean for these data.
(b) Compute the median for these data.
(c) Which measure do you prefer for describing the center of this distribution? Explain your answer. (You may include a graphical summary as part of your explanation.)
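
As a starting point for parts (a) and (b), here is a minimal sketch that computes both measures of center with Python's standard `statistics` module. The dictionary name `stout` and the variable names are our own choices; the values are copied from the table above.

```python
from statistics import mean, median

# Sales in 1925 as a percent of 1913 sales, copied from the table above.
stout = {
    "Bristol": 94, "Cardiff": 112, "English Agents": 78, "English O": 68,
    "English P": 46, "English R": 111, "Glasgow": 66, "Liverpool": 140,
    "London": 428, "Manchester": 190, "Newcastle-on-Tyne": 118, "Scottish": 24,
}

values = sorted(stout.values())
stout_mean = mean(values)      # sum of the 12 values divided by 12
stout_median = median(values)  # n is even, so this averages the two middle values

print(f"mean   = {stout_mean:.1f}")
print(f"median = {stout_median:.1f}")
```

Printing the sorted list alongside the two results makes it easier to see which regions pull the mean away from the median.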

Question 1.66

1.66 Measures of spread for the double stout data. Refer to the previous exercise.

STOUT

(a) Compute the standard deviation for these data.
(b) Compute the quartiles for these data.
(c) Which measure do you prefer for describing the spread of this distribution? Explain your answer. (You may include a graphical summary as part of your explanation.)

Question 1.67

1.67 Are there outliers in the double stout data? Refer to the previous two exercises.

STOUT

(a) Find the IQR for these data.
(b) Use the 1.5 × IQR rule to identify and name any outliers.
(c) Make a boxplot for these data and describe the distribution using only the information in the boxplot.
(d) Make a modified boxplot for these data and describe the distribution using only the information in the boxplot.
(e) Make a stemplot for these data.
(f) Compare the boxplot, the modified boxplot, and the stemplot. Evaluate the advantages and disadvantages of each graphical summary for describing the distribution of the double stout data.
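
The 1.5 × IQR rule in part (b) can be sketched as follows. This sketch assumes that the quartiles are the medians of the lower and upper halves of the sorted data, with the overall median left out of both halves when n is odd; statistical software often uses other quartile conventions and may give slightly different fences. The function names `quartiles` and `iqr_outliers` are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumption: when n is odd, the overall median belongs to neither half.
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[len(xs) - half:]
    return median(lower), median(upper)

def iqr_outliers(data):
    """Return the observations lying more than 1.5 x IQR outside the quartiles."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Double stout data from Exercise 1.65.
stout = [94, 112, 78, 68, 46, 111, 66, 140, 428, 190, 118, 24]
q1, q3 = quartiles(stout)
flagged = iqr_outliers(stout)

print("Q1 =", q1, "Q3 =", q3, "IQR =", q3 - q1)
print("flagged as suspected outliers:", flagged)
```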

Question 1.68

1.68 Smolts. Smolts are young salmon at a stage when their skin becomes covered with silvery scales and they start to migrate from freshwater to the sea. The reflectance of a light shined on a smolt’s skin is a measure of the smolt’s readiness for the migration. Here are the reflectances, in percents, for a sample of 50 smolts:²⁵

SMOLTS

57.6	54.8	63.4	57.0	54.7	42.3	63.6	55.5	33.5	63.3
58.3	42.1	56.1	47.8	56.1	55.9	38.8	49.7	42.3	45.6
69.0	50.4	53.0	38.3	60.4	49.3	42.8	44.5	46.4	44.3
58.9	42.1	47.6	47.9	69.2	46.6	68.1	42.8	45.6	47.3
59.6	37.8	53.9	43.2	51.4	64.5	43.8	42.7	50.9	43.8

(a) Find the mean reflectance for these smolts.
(b) Find the median reflectance for these smolts.
(c) Do you prefer the mean or the median as a measure of center for these data? Give reasons for your preference.

Question 1.69

1.69 Measures of spread for smolts. Refer to the previous exercise.

SMOLTS

(a) Find the standard deviation of the reflectance for these smolts.
(b) Find the quartiles of the reflectance for these smolts.
(c) Do you prefer the standard deviation or the quartiles as a measure of spread for these data? Give reasons for your preference.

Question 1.70

1.70 Are there outliers in the smolt data? Refer to the previous two exercises.

SMOLTS

(a) Find the IQR for the smolt data.
(b) Use the 1.5 × IQR rule to identify any outliers.
(c) Make a boxplot for the smolt data and describe the distribution using only the information in the boxplot.
(d) Make a modified boxplot for these data and describe the distribution using only the information in the boxplot.
(e) Make a stemplot for these data.
(f) Compare the boxplot, the modified boxplot, and the stemplot. Evaluate the advantages and disadvantages of each graphical summary for describing the distribution of the smolt reflectance data.

Question 1.71

1.71 Potatoes. A quality product is one that is consistent and has very little variability in its characteristics. Controlling variability can be more difficult with agricultural products than with those that are manufactured. The following table gives the weights, in ounces, of the 25 potatoes sold in a 10-pound bag:

POTATO

7.6	7.9	8.0	6.9	6.7	7.9	7.9	7.9	7.6	7.8	7.0	4.7	7.6
6.3	4.7	4.7	4.7	6.3	6.0	5.3	4.3	7.9	5.2	6.0	3.7

(a) Summarize the data graphically and numerically. Give reasons for the methods you chose to use in your summaries.
(b) Do you think that your numerical summaries do an effective job of describing these data? Why or why not?
(c) There appear to be two distinct clusters of weights for these potatoes. Divide the sample into two subsamples based on the clustering. Give the mean and standard deviation for each subsample. Do you think that this way of summarizing these data is better than a numerical summary that uses all the data as a single sample? Give a reason for your answer.
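
For part (c), one possible way to split the sample is sketched below. The cutoff of 6.5 ounces is purely an assumption made for illustration; choose your own cutoff from your graphical summary in part (a).

```python
from statistics import mean, stdev

# Potato weights in ounces, copied from the table above.
potato = [
    7.6, 7.9, 8.0, 6.9, 6.7, 7.9, 7.9, 7.9, 7.6, 7.8, 7.0, 4.7, 7.6,
    6.3, 4.7, 4.7, 4.7, 6.3, 6.0, 5.3, 4.3, 7.9, 5.2, 6.0, 3.7,
]

CUTOFF = 6.5  # hypothetical dividing point between the two clusters

light = [w for w in potato if w < CUTOFF]
heavy = [w for w in potato if w >= CUTOFF]

for label, group in [("all", potato), ("light", light), ("heavy", heavy)]:
    print(f"{label:5s} n={len(group):2d} mean={mean(group):.2f} s={stdev(group):.2f}")
```

Comparing the three rows of output shows how much of the overall spread comes from the gap between the clusters rather than from variation within each one.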

Question 1.72

1.72 The alcohol content of beer. Brewing beer involves a variety of steps that can affect the alcohol content. A website gives the percent alcohol for 159 domestic brands of beer.²⁶

BEER

(a) Use graphical and numerical summaries of your choice to describe the data. Give reasons for your choice.
(b) The data set contains an outlier. Explain why this particular beer is unusual.
(c) For the outlier, give a short description of how you think this particular beer should be marketed.

Question 1.73

1.73 Outlier for alcohol content of beer. Refer to the previous exercise.

BEER

(a) Calculate the mean with and without the outlier. Do the same for the median. Explain how these values change when the outlier is excluded.
(b) Calculate the standard deviation with and without the outlier. Do the same for the quartiles. Explain how these values change when the outlier is excluded.
(c) Write a short paragraph summarizing what you have learned in this exercise.

Question 1.74

1.74 Calories in beer. Refer to the previous two exercises. The data set also lists calories per 12 ounces of beverage.

BEER

(a) Analyze the data and summarize the distribution of calories for these 159 brands of beer.
(b) In the previous exercise, you identified one brand of beer as an outlier. To what extent is this brand an outlier in the distribution of calories? Explain your answer.
(c) Does the distribution of calories suggest marketing strategies for this brand of beer? Describe some marketing strategies.

Question 1.75

1.75 Median versus mean for net worth. A report on the assets of American households says that the median net worth of U.S. families is $81,200. The mean net worth of these families is $534,600.²⁷ What explains the difference between these two measures of center?

Question 1.76

1.76 Create a data set. Create a data set with seven observations for which the median would change by a large amount if the smallest observation were deleted.

Question 1.77

1.77 Mean versus median. A small accounting firm pays each of its seven clerks $55,000, three junior accountants $80,000 each, and the firm’s owner $650,000. What is the mean salary paid at this firm? How many of the employees earn less than the mean? What is the median salary?
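
The salary list is short enough to enter directly. The following sketch, using the standard `statistics` module and variable names of our own choosing, builds the eleven salaries and computes the three quantities the exercise asks about.

```python
from statistics import mean, median

# Seven clerks, three junior accountants, and the owner.
salaries = [55_000] * 7 + [80_000] * 3 + [650_000]

firm_mean = mean(salaries)
firm_median = median(salaries)
below_mean = sum(1 for s in salaries if s < firm_mean)

print(f"mean salary   = ${firm_mean:,.2f}")
print(f"median salary = ${firm_median:,.2f}")
print(f"employees earning less than the mean: {below_mean} of {len(salaries)}")
```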

Question 1.78

1.78 Be careful about how you treat the zeros. In computing the median income of any group, some federal agencies omit all members of the group who had no income. Give an example to show that the reported median income of a group can go down even though the group becomes economically better off. Is this also true of the mean income?

Question 1.79

1.79 How does the median change? The firm in Exercise 1.77 gives no raises to the clerks and junior accountants, while the owner’s take changes to $500,000. How does this change affect the mean? How does it affect the median?

Question 1.80

1.80 Metabolic rates. Calculate the mean and standard deviation of the metabolic rates in Example 1.32 (page 38), showing each step in detail. First find the mean $\bar{x}$ by summing the seven observations and dividing by 7. Then find each of the deviations $x_{i} - \bar{x}$ and their squares. Check that the deviations have sum 0. Calculate the variance as an average of the squared deviations (remember to divide by n − 1). Finally, obtain s as the square root of the variance.

METABOL
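
The steps described in this exercise can be sketched in code as shown below. The seven values in `example` are made up for illustration and are not the metabolic rates from Example 1.32; substitute the METABOL data to check your hand calculation. The function name `mean_and_sd_steps` is hypothetical.

```python
import math

def mean_and_sd_steps(xs):
    """Return every intermediate quantity used to compute s by hand."""
    n = len(xs)
    x_bar = sum(xs) / n                        # step 1: the mean
    deviations = [x - x_bar for x in xs]       # step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]     # step 3: squared deviations
    variance = sum(squared) / (n - 1)          # step 4: divide by n - 1, not n
    s = math.sqrt(variance)                    # step 5: the standard deviation
    return x_bar, deviations, squared, variance, s

# Hypothetical data, NOT the values from Example 1.32.
example = [4, 7, 9, 10, 10, 12, 18]
x_bar, deviations, squared, variance, s = mean_and_sd_steps(example)

print("mean:", x_bar)
print("deviations:", deviations, "sum:", sum(deviations))
print("squared deviations:", squared)
print("variance:", variance, "standard deviation:", round(s, 3))
```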

Question 1.81

1.81 Earthquakes. Each year there are about 900,000 earthquakes of magnitude 2.5 or less that are usually not felt. In contrast, there are about 10 of magnitude 7.0 that cause serious damage.²⁸ Explain why the average magnitude of earthquakes is not a good measure of their impact.

Question 1.82

1.82 IQ scores. Many standard statistical methods that you will study in Part II of this book are intended for use with distributions that are symmetric and have no outliers. These methods start with the mean and standard deviation, $\bar{x}$ and s. For example, standard methods would typically be used for the IQ and GPA data in Table 1.3 (page 26).

IQGPA

(a) Find $\bar{x}$ and s for the IQ data. In large populations, IQ scores are standardized to have mean 100 and standard deviation 15. In what way does the distribution of IQ among these students differ from the overall population?
(b) Find the median IQ score. It is, as we expect, close to the mean.
(c) Find the mean and median for the GPA data. The two measures of center differ a bit. What feature of the data (see your stemplot in Exercise 1.39 or make a new stemplot) explains the difference?

Question 1.83

1.83 Mean and median for two observations. The Mean and Median applet allows you to place observations on a line and see their mean and median visually. Place two observations on the line by clicking below it. Why does only one arrow appear?

Question 1.84

1.84 Mean and median for three observations. In the Mean and Median applet, place three observations on the line by clicking below it, two close together near the center of the line and one somewhat to the right of these two.

(a) Pull the single rightmost observation out to the right. (Place the cursor on the point, hold down a mouse button, and drag the point.) How does the mean behave? How does the median behave? Explain briefly why each measure acts as it does.
(b) Now drag the rightmost point to the left as far as you can. What happens to the mean? What happens to the median as you drag this point past the other two (watch carefully)?

Question 1.85

1.85 Mean and median for seven observations. Place seven observations on the line in the Mean and Median applet by clicking below it.

(a) Add one additional observation without changing the median. Where is your new point?
(b) Use the applet to convince yourself that when you add yet another observation (there are now nine in all), the median does not change no matter where you put the ninth point. Explain why this must be true.

Question 1.86

1.86 Imputation. Various problems with data collection can cause some observations to be missing. Suppose a data set has 20 cases. Here are the values of the variable x for 10 of these cases:

IMPUTE

The values for the other 10 cases are missing. One way to deal with missing data is called imputation. The basic idea is that missing values are replaced, or imputed, with values that are based on an analysis of the data that are not missing. For a data set with a single variable, the usual choice of a value for imputation is the mean of the values that are not missing. The mean for this data set is 15.

(a) Verify that the mean is 15 and find the standard deviation for the 10 cases for which x is not missing.
(b) Create a new data set with 20 cases by setting the values for the 10 missing cases to 15. Compute the mean and standard deviation for this data set.
(c) Summarize what you have learned about the possible effects of this type of imputation on the mean and the standard deviation.
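
If you want to explore mean imputation before working with the IMPUTE data, the sketch below applies it to a small made-up data set. The values in `observed` are hypothetical and are not the values for this exercise; the helper name `impute_with_mean` is also our own.

```python
from statistics import mean, stdev

def impute_with_mean(observed, n_missing):
    """Return the observed values plus n_missing copies of their mean."""
    fill = mean(observed)
    return list(observed) + [fill] * n_missing

# Hypothetical data: five observed cases and five missing cases.
observed = [2, 4, 6, 8, 10]
completed = impute_with_mean(observed, n_missing=5)

print("observed : mean =", mean(observed), " s =", round(stdev(observed), 3))
print("completed: mean =", mean(completed), " s =", round(stdev(completed), 3))
```

Each imputed case adds zero to the sum of squared deviations while increasing n − 1, which is worth keeping in mind when you answer part (c).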

Question 1.87

1.87 A standard deviation contest. This is a standard deviation contest. You must choose four numbers from the whole numbers 10 to 20, with repeats allowed.

(a) Choose four numbers that have the smallest possible standard deviation.
(b) Choose four numbers that have the largest possible standard deviation.
(c) Is more than one choice possible in either part (a) or part (b)? Explain.
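
After you have reasoned out your answers, you can check them by brute force. The sketch below, which assumes that the order of the four numbers does not matter, lists every choice of four whole numbers from 10 to 20 with repeats allowed and computes the standard deviation of each.

```python
import math
from itertools import combinations_with_replacement
from statistics import stdev

# Every unordered choice of four whole numbers from 10 to 20, repeats allowed.
choices = list(combinations_with_replacement(range(10, 21), 4))
sds = {c: stdev(c) for c in choices}

smallest = min(sds.values())
largest = max(sds.values())
min_sets = [c for c, s in sds.items() if math.isclose(s, smallest)]
max_sets = [c for c, s in sds.items() if math.isclose(s, largest)]

print(f"{len(choices)} possible choices")
print(f"smallest s = {smallest:.4f}, attained by {len(min_sets)} choice(s)")
print(f"largest  s = {largest:.4f}, attained by {len(max_sets)} choice(s)")
```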

Question 1.88

1.88 Longleaf pine trees. The Wade Tract in Thomas County, Georgia, is an old-growth forest of longleaf pine trees (Pinus palustris) that has survived in a relatively undisturbed state since before the settlement of the area by Europeans. A study collected data on 584 of these trees.²⁹ One of the variables measured was the diameter at breast height (DBH). This is the diameter of the tree at 4.5 feet and the units are centimeters (cm). Only trees with DBH greater than 1.5 cm were sampled. Here are the diameters of a random sample of 40 of these trees:

PINES

10.5	13.3	26.0	18.3	52.2	9.2	26.1	17.6	40.5	31.8
47.2	11.4	2.7	69.3	44.4	16.9	35.7	5.4	44.2	2.2
4.3	7.8	38.1	2.2	11.4	51.5	4.9	39.7	32.6	51.8
43.6	2.3	44.6	31.5	40.3	22.3	43.3	37.5	29.1	27.9

(a) Find the five-number summary for these data.
(b) Make a boxplot.
(c) Make a histogram.
(d) Write a short summary of the major features of this distribution. Do you prefer the boxplot or the histogram for these data?
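
A five-number summary for part (a) can be sketched as follows. As an assumption, the quartiles are computed as the medians of the lower and upper halves of the sorted data; software packages that use other quartile rules may report slightly different values for Q1 and Q3. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    and the overall median is excluded from both halves when n is odd.
    """
    xs = sorted(data)
    half = len(xs) // 2
    q1 = median(xs[:half])
    q3 = median(xs[len(xs) - half:])
    return xs[0], q1, median(xs), q3, xs[-1]

# DBH in centimeters for the 40 sampled trees, copied from the table above.
pines = [
    10.5, 13.3, 26.0, 18.3, 52.2, 9.2, 26.1, 17.6, 40.5, 31.8,
    47.2, 11.4, 2.7, 69.3, 44.4, 16.9, 35.7, 5.4, 44.2, 2.2,
    4.3, 7.8, 38.1, 2.2, 11.4, 51.5, 4.9, 39.7, 32.6, 51.8,
    43.6, 2.3, 44.6, 31.5, 40.3, 22.3, 43.3, 37.5, 29.1, 27.9,
]

summary = five_number_summary(pines)
for name, value in zip(["min", "Q1", "median", "Q3", "max"], summary):
    print(f"{name:6s} {value:.2f}")
```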

Question 1.89

1.89 Weight gain. A study of diet and weight gain deliberately overfed 15 volunteers for eight weeks. The mean increase in fat was $\bar{x} = 2.41$ kilograms, and the standard deviation was $s = 1.25$ kilograms. What are $\bar{x}$ and s in pounds? (A kilogram is 2.2 pounds.)

Question 1.90

1.90 Changing units from inches to centimeters. Changing the unit of length from inches to centimeters multiplies each length by 2.54 because there are 2.54 centimeters in an inch. This change of units multiplies our usual measures of spread by 2.54. This is true of IQR and the standard deviation. What happens to the variance when we change units in this way?
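
A quick numerical experiment can help you check your reasoning. The sketch below uses a small set of made-up lengths (hypothetical values, not data from this book) and compares the standard deviation and the variance before and after converting inches to centimeters.

```python
from statistics import stdev, variance

INCH_TO_CM = 2.54

# Hypothetical lengths in inches, for illustration only.
lengths_in = [10.0, 12.5, 15.0, 20.0]
lengths_cm = [INCH_TO_CM * x for x in lengths_in]

sd_ratio = stdev(lengths_cm) / stdev(lengths_in)
var_ratio = variance(lengths_cm) / variance(lengths_in)

print(f"standard deviation is multiplied by {sd_ratio:.4f}")
print(f"variance is multiplied by           {var_ratio:.4f}")
```

Try other multipliers in place of 2.54 to see how the two ratios are related.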

Question 1.91

1.91 A different type of mean. The trimmed mean is a measure of center that is more resistant than the mean but uses more of the available information than the median. To compute the 10% trimmed mean, discard the highest 10% and the lowest 10% of the observations and compute the mean of the remaining 80%. Trimming eliminates the effect of a small number of outliers. Compute the 10% trimmed mean of the service time data in Table 1.2 (page 17). Then compute the 20% trimmed mean. Compare the values of these measures with the median and the ordinary untrimmed mean.
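
The trimming procedure described above can be sketched as a short function. Two assumptions are made here: the number of observations to discard from each end is rounded down when the trimming proportion times n is not a whole number, and the function name `trimmed_mean` is our own. The data in `times` are made up for illustration and are not the service times from Table 1.2.

```python
from statistics import mean, median

def trimmed_mean(data, proportion):
    """Mean after discarding the given proportion of observations from each end.

    Assumption: the count discarded from each end is rounded down.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be at least 0 and less than 0.5")
    xs = sorted(data)
    k = int(len(xs) * proportion)   # observations dropped from each end
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

# Hypothetical data with one extreme value, NOT the Table 1.2 service times.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

print("mean            :", mean(times))
print("10% trimmed mean:", trimmed_mean(times, 0.10))
print("20% trimmed mean:", trimmed_mean(times, 0.20))
print("median          :", median(times))
```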

Question 1.92

1.92 Changing units from centimeters to inches. Refer to Exercise 1.88 (page 50). Change the measurements from centimeters to inches by multiplying each value by 0.39. Answer the questions from that exercise and explain the effect of the transformation on these data.
