For Exercise 1.33, see page 24; for 1.34 and 1.35, see page 25; for 1.36 to 1.38, see page 26; for 1.39, see page 27; for 1.40 to 1.42, see pages 30-31; for 1.43, see page 32; and for 1.44 and 1.45, see page 33.
1.46 Gross domestic product for 189 countries.
gdp
The gross domestic product (GDP) of a country is the total value of all goods and services produced in the country. It is an important measure of the health of a country’s economy. For this exercise, you will analyze the 2012 GDP for 189 countries. The values are given in millions of U.S. dollars.20
1.47 Use the resistant measures for GDP.
gdp
Repeat parts (a) and (c) of the previous exercise using the median and the quartiles. Summarize your results and compare them with those of the previous exercise.
1.47
(a) M=27035. Q1=7103. Q3=205789. (c) Answers will vary.
1.48 Forbes rankings of best countries for business.
bestbus
The Forbes website ranks countries based on their characteristics that are favorable for business.21 One of the characteristics that it uses for its rankings is trade balance, defined as the difference between the value of a country’s exports and its imports. A negative trade balance occurs when a country imports more than it exports. Similarly, the trade balance will be positive for a country that exports more than it imports. Data related to the rankings are given for 145 countries.
1.49 What do the trade balance graphical summaries show?
bestbus
Refer to the previous exercise.
Include appropriate graphical and numerical summaries as well as comments about the outliers.
1.49
(b) Montenegro has a very low trade balance of −45.3. Kuwait, at 42.2, and Libya, at 40.7, have very high trade balances. (c) x̄=−3.50. s=9.767. M=−3.3. Q1=−9.1. Q3=0.9. The distribution and numerical summaries are almost identical before and after the outliers are removed. (d) Overall, the distribution is very symmetric, so that if some countries export a lot, there are other countries that import just as much. The mean and median trade balances are both close to 0. The outliers had almost no effect on the distribution or numerical summaries. Essentially, the outliers form longer tails on the curve.
1.50 GDP Growth for 145 countries.
bestbus
Refer to the previous two exercises. Another variable that Forbes uses to rank countries is growth in gross domestic product, expressed as a percent.
1.51 Create a data set.
Create a data set that illustrates the idea that an extreme observation can have a large effect on the mean but not on the median.
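One possible way to explore this idea is sketched below. The five values are invented for illustration and are not part of the exercise; the sketch replaces the largest value with an extreme one and compares the mean and the median before and after.

```python
from statistics import mean, median

# Hypothetical data set, invented for illustration only.
base = [10, 12, 13, 14, 16]

# Replace the largest observation with an extreme value.
with_extreme = base[:-1] + [160]

print("mean:  ", mean(base), "->", mean(with_extreme))
print("median:", median(base), "->", median(with_extreme))
```

The middle observation is the same in both lists, so the median does not move, while the mean is pulled toward the extreme value.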
1.52 Variability of an agricultural product.
potato
A quality product is one that is consistent and has very little variability in its characteristics. Controlling variability can be more difficult with agricultural products than with those that are manufactured. The following table gives the individual weights, in ounces, of the 25 potatoes sold in a 10-pound bag.
7.8 | 7.9 | 8.2 | 7.3 | 6.7 | 7.9 | 7.9 | 7.9 | 7.6 | 7.8 | 7.0 | 4.7 | 7.6
6.3 | 4.7 | 4.7 | 4.7 | 6.3 | 6.0 | 5.3 | 4.3 | 7.9 | 5.2 | 6.0 | 3.7
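As a starting point for describing the variability in these weights, the following sketch computes a few numerical summaries with Python's standard library. The variable name `weights` is an assumption made for this illustration; the values are the 25 weights from the table.

```python
from statistics import mean, median, stdev

# The 25 potato weights (ounces) from the table above.
weights = [
    7.8, 7.9, 8.2, 7.3, 6.7, 7.9, 7.9, 7.9, 7.6, 7.8, 7.0, 4.7, 7.6,
    6.3, 4.7, 4.7, 4.7, 6.3, 6.0, 5.3, 4.3, 7.9, 5.2, 6.0, 3.7,
]

print("n      =", len(weights))
print("mean   =", round(mean(weights), 3))
print("median =", median(weights))
print("s      =", round(stdev(weights), 3))   # sample standard deviation
print("total  =", round(sum(weights), 1), "ounces")
```

The total is a quick check on the data entry: it should come out close to 160 ounces for a 10-pound bag.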
1.53 Apple is the number one brand.
brands
A brand is a symbol or image that is associated with a company. An effective brand identifies the company and its products. Using a variety of measures, dollar values for brands can be calculated.22 The most valuable brand is Apple, with a value of $104.3 million. Apple is followed by Microsoft, at $56.7 million; Coca-Cola, at $54.9 million; IBM, at $50.7 million; and Google, at $47.3 million. For this exercise, you will use the brand values, reported in millions of dollars, for the top 100 brands.
1.53
(b) x̄=14.92. s=14.1. M=9.6. Q1=6.95. Q3=18.05. (c) The distribution is strongly right-skewed, with several brands far more valuable than most others. This is shown in the numerical summaries, with 75% of brand values less than Q3=18.05. Additionally, the median brand value is only 9.6. The mean value is 14.92, substantially higher than the median, again indicating the skew. Thus, brands like Apple and those listed in the problem dwarf the competition.
1.54 Advertising for best brands.
brands
Refer to the previous exercise. To calculate the value of a brand, the Forbes website uses several variables, including the amount the company spent for advertising. For this exercise, you will analyze the amounts that these companies spent on advertising, reported in millions of dollars.
1.55 Salaries of the chief executives.
According to the May 2013 National Occupational Employment and Wage Estimates for the United States, the median wage was $45.96 per hour and the mean wage was $53.15 per hour.23 What explains the difference between these two measures of center?
1.55
The distribution of wages is right-skewed: a relatively small number of very high wages pulls the mean above the median.
1.56 The alcohol content of beer.
beer
Brewing beer involves a variety of steps that can affect the alcohol content. A website gives the percent alcohol for 175 domestic brands of beer.24
1.57 Outlier for alcohol content of beer.
beer
Refer to the previous exercise.
1.57
(a) With the outlier: x̄=0.0526. M=0.0494. Without the outlier: x̄=0.0529. M=0.0494. The values are nearly identical with and without the outlier. (b) With the outlier: s=0.014. Q1=0.045. Q3=0.057. Without the outlier: s=0.014. Q1=0.045. Q3=0.057. The values are nearly identical with and without the outlier. (c) Even though there is one outlier, its removal barely changes the numerical summaries. This is partly due to the large sample and partly due to the fact that this outlier is not too far from the other observations, so removing it doesn't have a large effect on the analysis.
1.58 Calories in beer.
Refer to the previous two exercises. The data set also lists calories per 12 ounces of beverage.
beer
1.59 Discovering outliers.
us65
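Whether an observation is an outlier is a matter of judgment. It is convenient to have a rule for identifying suspected outliers. The 1.5 × IQR rule is in common use: call an observation a suspected outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile.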
The stemplot in Exercise 1.31 (page 22) displays the distribution of the percents of residents aged 65 and older in the 50 states. Stemplots help you find the five-number summary because they arrange the observations in increasing order.
The following three exercises use the Mean and Median applet available at the text website to explore the behavior of the mean and median.
1.59
(a) , , , , . (b) . . So, Utah with 9.5 percent and Alaska with 8.5 percent are low outliers. . So, Florida with 18.2 percent is a high outlier.
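The 1.5 × IQR rule flags an observation as a suspected outlier when it lies more than 1.5 × IQR below Q1 or above Q3. A minimal sketch of that check follows. It assumes the median-of-halves convention for quartiles, which may differ slightly from what statistical software reports, and the data list is invented for illustration; it is not the state data used in the exercise. The function names are also assumptions of this sketch.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves convention (an assumption)."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]             # observations below the median's position
    upper = xs[half + n % 2:]     # observations above the median's position
    return median(lower), median(xs), median(upper)

def suspected_outliers(values):
    """Return the observations flagged by the 1.5 x IQR rule."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

# Hypothetical percents, invented for illustration only.
data = [6.0, 11.9, 12.4, 12.8, 13.1, 13.3, 13.6, 13.9, 14.3, 19.0]

print(quartiles(data))
print(suspected_outliers(data))
```

Because the fences are built from the quartiles rather than from the mean and standard deviation, the extreme values themselves have little influence on where the fences fall.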
1.60 Mean = median?
1.61 Extreme observations.
Place three observations on the line by clicking below it—two close together near the center of the line and one somewhat to the right of these two.
1.62 Don't change the median.
1.63 x̄ and s are not enough.
abdata
The mean and standard deviation measure center and spread but are not a complete description of a distribution. Data sets with different shapes can have the same mean and standard deviation. To demonstrate this fact, find x̄ and s for these two small data sets. Then make a stemplot of each, and comment on the shape of each distribution.
Data A: 9.14 | 8.14 | 8.74 | 8.77 | 9.26 | 8.10 | 6.13 | 3.10 | 9.13 | 7.26 | 4.74
Data B: 6.58 | 5.76 | 7.71 | 8.84 | 8.47 | 7.04 | 5.25 | 5.56 | 7.91 | 6.89 | 12.50
1.63
The means and standard deviations are the same: x̄=7.50 and s=2.03 for both data sets. The stemplots (rounded to 1 decimal) show very different distributions. Data A is strongly left-skewed with a couple of possible low outliers; Data B is spread fairly evenly between 5 and 9 but has one high outlier at 12.5.
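The comparison of the two data sets can be checked with a short computation. The sketch below uses the values of Data A and Data B as given in the exercise; the variable names are assumptions made for this illustration.

```python
from statistics import mean, stdev

data_a = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
data_b = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]

for name, data in (("A", data_a), ("B", data_b)):
    # stdev() is the sample standard deviation (divisor n - 1).
    print(f"Data {name}: mean = {mean(data):.2f}, s = {stdev(data):.2f}")
```

The summaries agree to two decimal places even though the two lists look quite different when sorted or plotted, which is the point of the exercise.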
1.64 Returns on Treasury bills.
Figure 1.16(a) (page 34) is a stemplot of the annual returns on U.S. Treasury bills for 50 years. (The entries are rounded to the nearest tenth of a percent.)
tbill50
1.65 Salary increase for the owners.
Last year, a small accounting firm paid each of its five clerks $40,000, two junior accountants $75,000 each, and the firm’s owner $455,000.
1.65
(a) x̄=$100,625. All the employees except for the owner make less than the mean. M=$40,000. (b) The mean increases to $105,625. The median does not change.
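The arithmetic behind this answer can be sketched as follows. The salaries come from the exercise statement. The raised owner's salary of $495,000 is an assumption, chosen because it is the figure that reproduces the stated new mean of $105,625; the wording of part (b) is not shown here.

```python
from statistics import mean, median

# Five clerks, two junior accountants, and the owner.
salaries = [40_000] * 5 + [75_000] * 2 + [455_000]

print("mean   =", mean(salaries))
print("median =", median(salaries))
print("below the mean:", sum(s < mean(salaries) for s in salaries), "of", len(salaries))

# Assumed change for part (b): the owner's salary rises to $495,000.
raised = salaries[:-1] + [495_000]
print("new mean   =", mean(raised))
print("new median =", median(raised))
```

Only the largest value changes, so the two middle observations, and therefore the median, stay where they were.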
1.66 A skewed distribution.
Sketch a distribution that is skewed to the left. On your sketch, indicate the approximate position of the mean and the median. Explain why these two values are not equal.
1.67 A standard deviation contest.
You must choose four numbers from the whole numbers 10 to 20, with repeats allowed.
1.67
(a) Picking the same number for all four observations results in a standard deviation of 0. (b) Picking 10, 10, 20, and 20 results in the largest standard deviation. (c) For part (a), you may pick any number as long as all observations are the same. For part (b), only one choice provides the largest standard deviation.
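Because there are only about a thousand ways to choose four whole numbers from 10 to 20 when order is ignored, these answers can be confirmed by brute force. The sketch below is one way to do so; the variable names are assumptions of this illustration.

```python
from itertools import combinations_with_replacement
from statistics import stdev

# Every unordered choice of four whole numbers from 10 to 20, repeats allowed.
choices = list(combinations_with_replacement(range(10, 21), 4))

# Sample standard deviation of each choice.
spreads = {choice: stdev(choice) for choice in choices}

largest = max(spreads.values())
best = [c for c, s in spreads.items() if s == largest]
zero_spread = [c for c, s in spreads.items() if s == 0]

print("choices examined:", len(choices))
print("largest s:", round(largest, 3), "from", best)
print("choices with s = 0:", len(zero_spread))
```

The search treats a choice as an unordered set of four numbers, so the single maximizing choice is unique up to the order in which the numbers are listed.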
1.68 Imputation.
impute
Various problems with data collection can cause some observations to be missing. Suppose a data set has 20 cases. Here are the values of the variable for 10 of these cases:
27 | 16 | 2 | 12 | 22 | 23 | 9 | 12 | 16 | 21
The values for the other 10 cases are missing. One way to deal with missing data is called imputation. The basic idea is that missing values are replaced, or imputed, with values that are based on an analysis of the data that are not missing. For a data set with a single variable, the usual choice of a value for imputation is the mean of the values that are not missing.
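A minimal sketch of mean imputation for these data is shown below. It uses the 10 observed values from the exercise and fills in the 10 missing cases with their mean; the variable names are assumptions made for this illustration.

```python
from statistics import mean, stdev

# The 10 observed values from the exercise.
observed = [27, 16, 2, 12, 22, 23, 9, 12, 16, 21]

# Impute each of the 10 missing cases with the mean of the observed values.
fill_value = mean(observed)
imputed = observed + [fill_value] * 10

print("observed: n =", len(observed),
      "mean =", mean(observed), "s =", round(stdev(observed), 2))
print("imputed:  n =", len(imputed),
      "mean =", mean(imputed), "s =", round(stdev(imputed), 2))
```

The mean is unchanged by construction, but the standard deviation shrinks, because the imputed cases add nothing to the sum of squared deviations while increasing the divisor.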
1.69 A different type of mean.
brands
The trimmed mean is a measure of center that is more resistant than the mean but uses more of the available information than the median. To compute the 5% trimmed mean, discard the highest 5% and the lowest 5% of the observations, and compute the mean of the remaining 90%. Trimming eliminates the effect of a small number of outliers. Use the data on the values of the top 100 brands that we studied in Exercise 1.53 (page 36) to find the 5% trimmed mean. Compare this result with the value of the mean computed in the usual way.
1.69
The 5% trimmed mean is 12.78. The original mean was 14.92. The 5% trimmed mean is not as influenced by the large outliers as the original mean.
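The trimming procedure described in the exercise can be sketched as a small function. The function name `trimmed_mean`, its `percent` parameter, and the 20 sample values are all assumptions made for this illustration; the sample is invented and is not the brand data.

```python
from statistics import mean

def trimmed_mean(values, percent=5):
    """Mean after discarding the lowest and highest `percent`% of the values."""
    xs = sorted(values)
    k = len(xs) * percent // 100      # number of observations cut from each end
    return mean(xs[k:len(xs) - k])

# Hypothetical sample: 19 modest values and one very large one.
sample = list(range(1, 20)) + [1000]

print("ordinary mean:  ", mean(sample))
print("5% trimmed mean:", trimmed_mean(sample))
```

With 20 observations, 5% trimming removes one value from each end, so the single extreme value no longer affects the result. With 100 observations, as in the brand data, five values would be removed from each end.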