SECTION 5.2 EXERCISES

For Exercise 5.17, see page xxx; for Exercises 5.18 and 5.19, see pages xxx–xxx; for Exercise 5.20, see page xxx; for Exercise 5.21, see page xxx; for Exercises 5.22 and 5.23, see pages xxx–xxx; and for Exercise 5.24, see page xxx.

5.25 What is wrong? Explain what is wrong in each of the following statements.

(a) If the population standard deviation is 10, then the standard deviation of $\bar{x}$ for an SRS of 10 observations will be 10/10 = 1.
(b) When taking SRSs from a population, larger sample sizes will result in larger standard deviations of $\bar{x}$.
(c) For an SRS from a population, both the mean and the standard deviation of $\bar{x}$ depend on the sample size n.
(d) The larger the population size N, the larger the sample size n needs to be for a desired standard deviation of $\bar{x}$.

5.26 What is wrong? Explain what is wrong in each of the following statements.

(a) The central limit theorem states that for large n, the population mean μ is approximately Normal.
(b) For large n, the distribution of observed values will be approximately Normal.
(c) For sufficiently large n, the 68–95–99.7 rule says that $\bar{x}$ should be within μ ± 2σ about 95% of the time.
(d) As long as the sample size n is less than half the population size N, the standard deviation of $\bar{x}$ is $\sigma / \sqrt{n}$.

5.27 Generating a sampling distribution. Let’s illustrate the idea of a sampling distribution in the case of a very small sample from a very small population. The population is the 10 scholarship players currently on your women’s basketball team. For convenience, the 10 players have been labeled with the integers 0 to 9. For each player, the total amount of time spent (in minutes) on Twitter during the last week is recorded in the following table.

Player	0	1	2	3	4	5	6	7	8	9
Total time (min)	98	63	137	210	52	88	151	133	105	168

The parameter of interest is the average amount of time on Twitter. The sample is an SRS of size n = 3 drawn from this population of players. Because the players are labeled 0 to 9, a single random digit from Table B chooses one player for the sample.

(a) Find the mean for the 10 players in the population. This is the population mean μ.
(b) Use Table B to draw an SRS of size 3 from this population. (Note: You may sample the same player’s time more than once.) Write down the three times in your sample and calculate the sample mean $\bar{x}$. This statistic is an estimate of μ.
(c) Repeat this process nine more times using different parts of Table B. Make a histogram of the 10 values of $\bar{x}$. You are approximating the sampling distribution of $\bar{x}$.
(d) Is the center of your histogram close to μ? Explain why you’d expect it to get closer to μ the more times you repeated this sampling process.
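
If Table B is not at hand, the digit-drawing procedure in parts (b) and (c) can be imitated in software. The following is a minimal Python sketch, not part of the original exercise: it uses the standard `random` module in place of Table B, allows the same digit to be drawn more than once (as the note in part (b) permits), and repeats the process 10 times. The names `times` and `draw_sample_mean` and the seed value are illustrative assumptions.

```python
import random

# Total Twitter time (minutes) for players labeled 0 to 9, from the table above.
times = [98, 63, 137, 210, 52, 88, 151, 133, 105, 168]

def draw_sample_mean(rng, n=3):
    """Pick n random digits (repeats allowed) and average the matching times."""
    digits = [rng.randrange(10) for _ in range(n)]
    return sum(times[d] for d in digits) / n

rng = random.Random(2024)  # arbitrary seed, only to make the run repeatable
sample_means = [draw_sample_mean(rng) for _ in range(10)]

population_mean = sum(times) / len(times)
print("population mean:", population_mean)
print("ten sample means:", [round(m, 1) for m in sample_means])
print("mean of the sample means:", round(sum(sample_means) / len(sample_means), 1))
```

Raising the number of repetitions from 10 to several thousand is one way to explore the question in part (d).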

5.28 Total sleep time of college students. In Example 5.4, the total sleep time per night among college students was approximately Normally distributed with mean μ = 6.78 hours and standard deviation σ = 1.24 hours. You plan to take an SRS of size n = 120 and compute the average total sleep time.

(a) What is the standard deviation for the average time?
(b) Use the 95 part of the 68–95–99.7 rule to describe the variability of this sample mean.
(c) What is the probability that your average will be below 6.9 hours?
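
Calculations of this kind can be checked in software. The sketch below is an illustration rather than a solution: it uses Python's standard `statistics.NormalDist` and deliberately hypothetical values (μ = 100, σ = 15, n = 36), not those of this exercise. It assumes the sample mean is Normal with mean μ and standard deviation σ/√n; the helper name `sampling_dist_of_mean` is made up for the example.

```python
from math import sqrt
from statistics import NormalDist

def sampling_dist_of_mean(mu, sigma, n):
    """Normal distribution of x-bar for an SRS of size n: mean mu, sd sigma/sqrt(n)."""
    return NormalDist(mu, sigma / sqrt(n))

# Hypothetical values, chosen for illustration only (not those of Exercise 5.28).
xbar_dist = sampling_dist_of_mean(mu=100, sigma=15, n=36)

sd_of_mean = xbar_dist.stdev
low = xbar_dist.mean - 2 * sd_of_mean
high = xbar_dist.mean + 2 * sd_of_mean
prob_below = xbar_dist.cdf(103)  # P(x-bar < 103)

print("sd of x-bar:", sd_of_mean)
print("about 95% of sample means fall between", low, "and", high)
print("P(x-bar < 103):", round(prob_below, 4))
```

Substituting the values of μ, σ, and n from an exercise into `sampling_dist_of_mean` gives a quick check on hand calculations made with the Normal table.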

5.29 Determining sample size. Refer to the previous exercise. You want to use a sample size such that about 95% of the averages fall within ±5 minutes (0.08 hour) of the true mean μ = 6.78.

(a) Based on your answer to part (b) in Exercise 5.28, should the sample size be larger or smaller than 120? Explain.
(b) What standard deviation of $\bar{x}$ do you need such that approximately 95% of all samples will have a mean within 5 minutes of μ?
(c) Using the standard deviation you calculated in part (b), determine the number of students you need to sample.

5.30 Music file size on a tablet PC. A tablet PC contains 3217 music files. The distribution of file size is highly skewed with many small file sizes. Assume that the standard deviation for this population is 3.25 megabytes (MB).

(a) What is the standard deviation of the average file size when you take an SRS of 25 files from this population?
(b) How many files would you need to sample if you wanted the standard deviation of $\bar{x}$ to be no larger than 0.50 MB?

5.31 Bottling an energy drink. A bottling company uses a filling machine to fill cans with an energy drink. The cans are supposed to contain 250 milliliters (ml). The machine, however, has some variability, so the standard deviation of the volume is σ = 0.4 ml. A sample of five cans is inspected each hour for process control purposes, and records are kept of the sample mean volume. If the process mean is exactly equal to the target value, what will be the mean and standard deviation of the numbers recorded?

5.32 Average file size on a tablet. Refer to Exercise 5.30. Suppose that the true mean file size of the music and video files on the tablet is 2.35 MB and you plan to take an SRS of n = 50 files.

(a) Explain why it may be reasonable to assume that the average $\bar{x}$ is approximately Normal even though the population distribution is highly skewed.
(b) Sketch the approximate Normal curve for the sample mean, making sure to specify the mean and standard deviation.
(c) What is the probability that your sample mean will differ from the population mean by more than 0.15 MB?

5.33 Can volumes. Averages are less variable than individual observations. It is reasonable to assume that the can volumes in Exercise 5.31 vary according to a Normal distribution. In that case, the mean $\bar{x}$ of an SRS of cans also has a Normal distribution.

(a) Make a sketch of the Normal curve for a single can. Add the Normal curve for the mean of an SRS of five cans on the same sketch.
(b) What is the probability that the volume of a single randomly chosen can differs from the target value by 0.1 ml or more?
(c) What is the probability that the mean volume of an SRS of five cans differs from the target value by 0.1 ml or more?

5.34 Number of friends on Facebook. To commemorate Facebook’s 10-year milestone, Pew Research reported several facts about Facebook obtained from its Internet Project survey. One was that the average adult user of Facebook has 338 friends. This population distribution takes only integer values, so it is certainly not Normal. It is also highly skewed to the right, with a reported median of 200 friends.⁸ Suppose that σ = 380 and you take an SRS of 80 adult Facebook users.

(a) For your sample, what are the mean and standard deviation of $\bar{x}$, the mean number of friends per adult user?
(b) Use the central limit theorem to find the probability that the average number of friends for 80 Facebook users is greater than 350.
(c) What are the mean and standard deviation of the total number of friends in your sample?
(d) What is the probability that the total number of friends among your sample of 80 Facebook users is greater than 28,000?

5.35 Cholesterol levels of teenagers. A study of the health of teenagers plans to measure the blood cholesterol level of an SRS of 13- to 16-year-olds. The researchers will report the mean $\bar{x}$ from their sample as an estimate of the mean cholesterol level μ in this population.

(a) Explain to someone who knows no statistics what it means to say that $\bar{x}$ is an “unbiased” estimator of μ.
(b) The sample result $\bar{x}$ is an unbiased estimator of the population truth μ no matter what size SRS the study chooses. Explain to someone who knows no statistics why a large sample gives more trustworthy results than a small sample.

5.36 Grades in a math course. Indiana University posts the grade distributions for its courses online.⁹ Students in one section of Math 118 in the fall semester received 18% A’s, 31% B’s, 26% C’s, 13% D’s, and 12% F’s.

(a) Using the common scale A = 4, B = 3, C = 2, D = 1, F = 0, take X to be the grade of a randomly chosen Math 118 student. Use the definitions of the mean (page xxx) and standard deviation (page xxx) for discrete random variables to find the mean μ and the standard deviation σ of grades in this course.
(b) Math 118 is a large enough course that we can take the grades of an SRS of 25 students and not worry about the finite population correction factor. If $\bar{x}$ is the average of these 25 grades, what are the mean and standard deviation of $\bar{x}$?
(c) What is the probability that a randomly chosen Math 118 student gets a B or better, P(X ≥ 3)?
(d) What is the approximate probability $P (\bar{x} \geq 3)$ that the grade point average for 25 randomly chosen Math 118 students is B or better?
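
The definitions referred to in part (a) translate directly into a short computation: the mean is the probability-weighted sum of the values, and the variance is the probability-weighted sum of squared deviations from that mean. The Python sketch below illustrates this on a hypothetical distribution (one roll of a fair die) rather than on the grade data; the function name `discrete_mean_sd` is an assumption made for the example.

```python
from math import sqrt

def discrete_mean_sd(values, probs):
    """Mean and standard deviation of a discrete random variable."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in zip(values, probs))
    variance = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, sqrt(variance)

# Hypothetical distribution for illustration: one roll of a fair six-sided die.
die_values = [1, 2, 3, 4, 5, 6]
die_probs = [1 / 6] * 6
mu, sigma = discrete_mean_sd(die_values, die_probs)
print("mean:", round(mu, 4), "sd:", round(sigma, 4))
```

Passing the grade values 4, 3, 2, 1, 0 with their proportions (as decimals) to the same function is one way to check the hand calculation in part (a).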

5.37 Monitoring the emerald ash borer. The emerald ash borer is a beetle that poses a serious threat to ash trees. Purple traps are often used to detect or monitor populations of this pest. In the counties of your state where the beetle is present, thousands of traps are used to monitor the population. These traps are checked periodically. The distribution of beetle counts per trap is discrete and strongly skewed. A majority of traps have no beetles, and only a few will have more than two beetles. For this exercise, assume that the mean number of beetles trapped is 0.4 with a standard deviation of 0.9.

(a) Suppose that your state does not have the resources to check all the traps, so it plans to check only an SRS of n = 100 traps. What are the mean and standard deviation of the average number of beetles $\bar{x}$ in 100 traps?
(b) Use the central limit theorem to find the probability that the average number of beetles in 100 traps is greater than 0.5.
(c) Do you think it is appropriate in this situation to use the central limit theorem? Explain your answer.

5.38 Risks and insurance. The idea of insurance is that we all face risks that are unlikely but carry high cost. Think of a fire destroying your home. So we form a group to share the risk: we all pay a small amount, and the insurance policy pays a large amount to those few of us whose homes burn down. An insurance company looks at the records for millions of homeowners and sees that the mean loss from fire in a year is μ = $500 per house and that the standard deviation of the loss is σ = $10,000. (The distribution of losses is extremely right-skewed: most people have $0 loss, but a few have large losses.) The company plans to sell fire insurance for $500 plus enough to cover its costs and profit.

(a) Explain clearly why it would be unwise to sell only 100 policies. Then explain why selling many thousands of such policies is a safe business.
(b) Suppose the company sells the policies for $600. If the company sells 50,000 policies, what is the approximate probability that the average loss in a year will be greater than $600?

5.39 Weights of airline passengers. In 2005, the Federal Aviation Administration (FAA) updated its passenger weight standards to an average of 190 pounds in the summer (195 in the winter). This includes clothing and carry-on baggage. The FAA, however, did not specify a standard deviation. A reasonable standard deviation is 35 pounds. Weights are not Normally distributed, especially when the population includes both men and women, but they are not very non-Normal. A commuter plane carries 25 passengers. What is the approximate probability that, in the summer, the total weight of the passengers exceeds 5200 pounds? (Hint: To apply the central limit theorem, restate the problem in terms of the mean weight.)

5.40 Iron depletion without anemia and physical performance. Several studies have shown a link between iron depletion without anemia (IDNA) and physical performance. In one recent study, the physical performance of 24 female collegiate rowers with IDNA was compared with 24 female collegiate rowers with normal iron status.¹⁰ Several different measures of physical performance were studied, but we’ll focus here on training-session duration. Assume that training-session duration of female rowers with IDNA is Normally distributed, with mean 58 minutes and standard deviation 11 minutes. Training-session duration of female rowers with normal iron status is Normally distributed, with mean 69 minutes and standard deviation 18 minutes.

(a) What is the probability that the mean duration of the 24 rowers with IDNA exceeds 63 minutes?
(b) What is the probability that the mean duration of the 24 rowers with normal iron status is less than 63 minutes?
(c) What is the probability that the mean duration of the 24 rowers with IDNA is greater than the mean duration of the 24 rowers with normal iron status?

5.41 Treatment and control groups. The previous exercise illustrates a common setting for statistical inference. This exercise gives the general form of the sampling distribution needed in this setting. We have a sample of n observations from a treatment group and an independent sample of m observations from a control group. Suppose that the response to the treatment has the $N(\mu_X, \sigma_X)$ distribution and that the response of control subjects has the $N(\mu_Y, \sigma_Y)$ distribution. Inference about the difference $\mu_Y - \mu_X$ between the population means is based on the difference $\bar{y} - \bar{x}$ between the sample means in the two groups.

(a) Under the assumptions given, what is the distribution of $\bar{y}$? Of $\bar{x}$?
(b) What is the distribution of $\bar{y} - \bar{x}$?
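
A simulation offers one way to check a derived answer to part (b) empirically. The Python sketch below is only an illustration under assumed, hypothetical parameter values (they do not come from the text): it draws many pairs of independent Normal samples, records the difference of the two sample means each time, and reports the mean and standard deviation of those differences for comparison with the values your formula predicts.

```python
import random
from statistics import mean, stdev

def simulate_differences(mu_x, sigma_x, n, mu_y, sigma_y, m, reps, rng):
    """Simulate reps values of y-bar minus x-bar from two independent Normal samples."""
    diffs = []
    for _ in range(reps):
        xbar = sum(rng.gauss(mu_x, sigma_x) for _ in range(n)) / n
        ybar = sum(rng.gauss(mu_y, sigma_y) for _ in range(m)) / m
        diffs.append(ybar - xbar)
    return diffs

rng = random.Random(7)  # arbitrary seed, only to make the run repeatable

# Hypothetical parameters chosen for illustration only.
diffs = simulate_differences(mu_x=10, sigma_x=2, n=10,
                             mu_y=12, sigma_y=3, m=15,
                             reps=20000, rng=rng)

print("simulated mean of y-bar - x-bar:", round(mean(diffs), 3))
print("simulated sd of y-bar - x-bar:", round(stdev(diffs), 3))
```

A histogram of `diffs` can also be compared with the shape of the distribution named in your answer.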

5.42 Investments in two funds. Jennifer invests her money in a portfolio that consists of 70% Fidelity Spartan 500 Index Fund and 30% Fidelity Diversified International Fund. Suppose that, in the long run, the annual real return X on the 500 Index Fund has mean 10% and standard deviation 15%, the annual real return Y on the Diversified International Fund has mean 9% and standard deviation 19%, and the correlation between X and Y is 0.6.

(a) The return on Jennifer’s portfolio is R = 0.7X + 0.3Y. What are the mean and standard deviation of R?
(b) The distribution of returns is typically roughly symmetric but with more extreme high and low observations than a Normal distribution. The average return over a number of years, however, is close to Normal. If Jennifer holds her portfolio for 20 years, what is the approximate probability that her average return is less than 5%?
(c) The calculation you just made is not overly helpful because Jennifer isn’t really concerned about the mean return $\bar{R}$. To see why, suppose that her portfolio returns 12% this year and 6% next year. The mean return for the two years is 9%. If Jennifer starts with $1000, how much does she have at the end of the first year? At the end of the second year? How does this amount compare with what she would have if both years had the mean return, 9%? Over 20 years, there may be a large difference between the ordinary mean $\bar{R}$ and the geometric mean, which reflects the fact that returns in successive years multiply rather than add.
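
The point of part (c), that yearly returns multiply rather than add, can be made concrete with a few lines of code. The Python sketch below is an illustration, not part of the exercise: it compounds a starting amount through a list of yearly returns and computes the geometric mean return as the constant yearly rate that would give the same final amount. The function names `final_value` and `geometric_mean_return` are assumptions made for the example.

```python
def final_value(start, yearly_returns):
    """Grow a starting amount through successive yearly returns (given as fractions)."""
    value = start
    for r in yearly_returns:
        value *= 1 + r
    return value

def geometric_mean_return(yearly_returns):
    """Constant yearly return that gives the same final value as the sequence."""
    growth = final_value(1.0, yearly_returns)
    return growth ** (1 / len(yearly_returns)) - 1

returns = [0.12, 0.06]  # the two yearly returns from part (c)
arithmetic_mean = sum(returns) / len(returns)

print("end of year 1:", round(final_value(1000, returns[:1]), 2))
print("end of year 2:", round(final_value(1000, returns), 2))
print("two years at the mean return:", round(final_value(1000, [arithmetic_mean] * 2), 2))
print("geometric mean return:", round(geometric_mean_return(returns), 5))
```

Extending `returns` to a list of 20 yearly values shows how the gap between the ordinary mean and the geometric mean can grow over a longer holding period.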