6 Introduction to Inference

CHAPTER 6 Review Exercises

Question 6.118

6.118 Change in number insured.

The Wall Street Journal reported a Rand study on the estimated change in insured Americans from September 2013 to March 2014.²³ Here is an excerpt:

… a net gain of 9.3 million people with coverage. That number came with a wide margin of error (3.5 million people), was driven largely by increased employer-based coverage, and didn’t fully capture the surge in enrollments that occurred in late March as the application deadline for Obamacare plans neared.


The reported margin of error is based on a 95% level of confidence. What is the 95% confidence interval for the change in people with coverage?
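A 95% interval reported as "estimate ± margin of error" runs from the estimate minus the margin to the estimate plus the margin. The short Python sketch below checks that arithmetic using only the two figures in the excerpt, in millions of people. The variable names are illustrative.

```python
# Reported estimate and 95% margin of error, in millions of people (from the excerpt)
estimate = 9.3
margin = 3.5

# A confidence interval of the form estimate ± margin of error
lower = round(estimate - margin, 2)
upper = round(estimate + margin, 2)
print(f"95% CI: ({lower}, {upper}) million people")
```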

Question 6.119

6.119 Coverage percent of 95% confidence interval.

For this exercise, use the Confidence Interval applet. Set the confidence level at 95%, and click the “Sample” button 10 times to simulate 10 confidence intervals. Record the percent hit (that is, percent of intervals including the population mean). Simulate another 10 intervals by clicking another 10 times (do not click the “Reset” button). Record the percent hit for your 20 intervals. Repeat the process of simulating 10 additional intervals and recording the results until you have a total of 200 intervals. Plot your results and write a summary of what you have found.

6.119

Applet, answers will vary.
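If the applet is unavailable, the experiment can be approximated in code. The Python sketch below is a stand-in built on assumptions, not the applet itself: it assumes a Normal population with hypothetical mean 0 and standard deviation 1, samples of hypothetical size n = 20, and z intervals with σ known. It records the cumulative percent hit after every 10 intervals up to 200. Passing `0.90` instead of `0.95` gives the simulation for Exercise 6.120.

```python
import math
import random
from statistics import NormalDist

BATCH = 10  # intervals added per group of "Sample" clicks, as in the exercise

def coverage_path(conf, n=20, mu=0.0, sigma=1.0, total=200, seed=1):
    """Simulate `total` z confidence intervals for a Normal mean with known sigma
    (hypothetical population settings) and return the cumulative percent hit
    after every BATCH intervals."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # critical value z*
    margin = z_star * sigma / math.sqrt(n)              # every interval has this half-width
    hits, path = 0, []
    for i in range(1, total + 1):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - margin <= mu <= xbar + margin:       # interval captures the true mean
            hits += 1
        if i % BATCH == 0:
            path.append(100 * hits / i)                # cumulative percent hit so far
    return path

path95 = coverage_path(0.95)   # Exercise 6.119
path90 = coverage_path(0.90)   # Exercise 6.120 (same seed, so the same samples)
for k, (p95, p90) in enumerate(zip(path95, path90), start=1):
    print(f"{BATCH * k:3d} intervals: 95% hit {p95:5.1f}%, 90% hit {p90:5.1f}%")
```

Plotting `path95` against the number of intervals typically shows the percent hit wandering for the first few batches and settling near 95% as intervals accumulate. Because both runs use the same seed, the 90% intervals are just narrower versions of the 95% intervals, so their percent hit can never be higher.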

Question 6.120

6.120 Coverage percent of 90% confidence interval.

Refer to the previous exercise. Do the simulations and report the results for 90% confidence.

Question 6.121

6.121 Change the confidence level.

Refer to Example 6.21 (page 329) and construct a 95% confidence interval for the mean initial return for the population of Chinese IPO firms.

6.121

(61.17, 71.43).

Question 6.122

6.122 Job satisfaction.

A study of job satisfaction of Croatian employees was conducted on a research sample of 4000+ employees.²⁴ The researcher developed a metric for overall job satisfaction based on the rating of numerous factors, including nature of work, top management, promotion, pay, status, working conditions, and others. The job satisfaction metric ranges from 1 to 5. Here is a table found in the report:

	$n$	Mean	Standard deviation	Standard error of mean
Men	2261	3.4601	0.86208	?
Women	1975	3.5842	0.75004	?

Given the large sample sizes, we can assume that the sample standard deviations are the population standard deviations.

(a) Determine the two missing standard error of mean values. As we note in Chapter 7, the “standard error” for estimating the mean is $s / \sqrt{n}$. But because the sample sizes of the study are large, $s$ is approximately equal to the population standard deviation $σ$.
(b) Compute 95% confidence intervals for the mean job satisfaction for men and for women.
(c) In the next chapter, we describe the confidence interval for the difference between two means. For now, let’s compare the men’s and women’s confidence intervals to arrive at a preliminary conclusion. In the study, the researcher states: “The results showed that there was a difference in job satisfaction between men and women.” Are the confidence intervals from part (b) consistent with this conclusion? Explain your answer.
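Here is a minimal Python sketch of the standard error and interval calculations, assuming (as the exercise allows) that each sample standard deviation can stand in for σ. It computes $s/\sqrt{n}$ and then the interval mean ± $z^*$ × SE, with $z^*$ taken from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

# Summary statistics from the table: (n, mean, standard deviation)
groups = {
    "Men":   (2261, 3.4601, 0.86208),
    "Women": (1975, 3.5842, 0.75004),
}

z_star = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence
results = {}
for name, (n, mean, s) in groups.items():
    se = s / math.sqrt(n)             # standard error of the mean, s / sqrt(n)
    lower, upper = mean - z_star * se, mean + z_star * se
    results[name] = (se, lower, upper)
    print(f"{name}: SE = {se:.5f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

Comparing the two printed intervals (whether they overlap, and by how much) is the informal comparison that the last part of the exercise asks about.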

Question 6.123

6.123 Really small $P$-value.

For Example 6.21 (page 329), we noted that the $P$-value for testing the null hypothesis of $μ = 0$ is $2P(Z \geq 25.33)$. Without calculation, we further noted that the $P$-value is obviously much less than 0.001.

(a) Just how small is the $P$-value? Excel will actually report very small probabilities. Use the NORM.DIST function to find the probability.
(b) Explain to a friend how the extremely small probability found in part (a) compares with the small probability of winning the multi-state Powerball lottery, which is 1 in 175 million.
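The tail-area calculation can also be sketched in Python instead of Excel. The identity $2P(Z \geq z) = \operatorname{erfc}(z/\sqrt{2})$ gives the two-sided tail area directly. Computing 1 minus the cumulative probability instead would round to 0 in floating point at $z = 25.33$, because the cumulative probability is indistinguishable from 1. The same caution presumably applies in Excel, where one option is to work with the lower tail, for example `2*NORM.DIST(-25.33, 0, 1, TRUE)`.

```python
import math

z = 25.33
# Two-sided P-value 2 * P(Z >= z). Since P(Z >= z) = erfc(z / sqrt(2)) / 2,
# the upper tail is computed directly, avoiding 1 - Phi(z), which rounds to 0.
p_value = math.erfc(z / math.sqrt(2))

powerball = 1 / 175e6        # probability of winning Powerball, 1 in 175 million
print(f"P-value = {p_value:.4e}")
print(f"P-value / Powerball probability = {p_value / powerball:.3e}")
```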

6.123

(a) 1.4929E-141.

Question 6.124

6.124 Supply chain practices.

In a Stanford University study of supply chain practices, researchers gathered data on numerous companies and computed the correlations between various managerial practices and metrics on social responsibility.²⁵ In the report, the researchers report only correlations that meet both of the following criteria: correlation value $\geq 0.2$ and $P$-value $\leq 0.05$. Why do you think the researchers are not reporting statistically significant correlations that are less than 0.2?

Question 6.125

6.125 Wine.

Many food products contain small quantities of substances that would give an undesirable taste or smell if they were present in large amounts. An example is the “off-odors” caused by sulfur compounds in wine. Oenologists (wine experts) have determined the odor threshold, the lowest concentration of a compound that the human nose can detect. For example, the odor threshold for dimethyl sulfide (DMS) is given in the oenology literature as 25 micrograms per liter of wine (μg/l). Untrained noses may be less sensitive, however. Here are the DMS odor thresholds for 10 beginning students of oenology:

Assume (this is not realistic) that the standard deviation of the odor threshold for untrained noses is known to be $σ = 7$ μg/l.

odor

(a) Make a stemplot to verify that the distribution is roughly symmetric with no outliers. (A Normal quantile plot confirms that there are no systematic departures from Normality.)
(b) Give a 95% confidence interval for the mean DMS odor threshold among all beginning oenology students.
(c) Are you convinced that the mean odor threshold for beginning students is higher than the published threshold, 25 μg/l? Carry out a significance test to justify your answer.

6.125

(b) (26.06, 34.74). (c) $H_{0}: μ = 25$, $H_{a}: μ > 25$, $Z = 2.44$, $P$-value $= 0.0073$. There is strong evidence that the mean odor threshold for beginning students is higher than the published threshold of 25 μg/l.
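The DMS data values are not reproduced in this excerpt, so the Python sketch below works from summary statistics. It assumes $\bar{x} = 30.4$ μg/l, recovered as the midpoint of the interval (26.06, 34.74) given in the answer to part (b). With σ = 7 and n = 10 it reproduces both that interval and the one-sided test in part (c).

```python
import math
from statistics import NormalDist

sigma = 7.0      # known population standard deviation (ug/l), as the exercise assumes
n = 10           # number of students
mu0 = 25.0       # published odor threshold (ug/l)
xbar = 30.4      # ASSUMPTION: midpoint of the reported interval (26.06, 34.74)

se = sigma / math.sqrt(n)                 # standard deviation of xbar
z_star = NormalDist().inv_cdf(0.975)      # 95% critical value
ci = (xbar - z_star * se, xbar + z_star * se)

# One-sided z test of H0: mu = 25 against Ha: mu > 25
z = (xbar - mu0) / se
p_value = 1 - NormalDist().cdf(z)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```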

Question 6.126

6.126 Too much cellulose to be profitable?

Excess cellulose in alfalfa reduces the “relative feed value” of the product that will be fed to dairy cows. If the cellulose content is too high, the price will be lower and the producer will have less profit. An agronomist examines the cellulose content of one type of alfalfa hay. Suppose that the cellulose content in the population has standard deviation $σ = 8$ milligrams per gram (mg/g). A sample of 15 cuttings has mean cellulose content $\bar{x} = 145$ mg/g.

(a) Give a 90% confidence interval for the mean cellulose content in the population.
(b) A previous study claimed that the mean cellulose content was $μ = 140$ mg/g, but the agronomist believes that the mean is higher than that figure. State $H_{0}$ and $H_{a}$, and carry out a significance test to see if the new data support this belief.
(c) The statistical procedures used in parts (a) and (b) are valid when several assumptions are met. What are these assumptions?
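A Python sketch of the interval and the test, using the summary values in the exercise. For 90% confidence the critical value is the point with 5% of the standard Normal area above it. The test is one-sided because the agronomist's alternative is μ > 140.

```python
import math
from statistics import NormalDist

sigma, n, xbar = 8.0, 15, 145.0   # mg/g, from the exercise
mu0 = 140.0                        # mean claimed by the previous study

se = sigma / math.sqrt(n)                 # standard deviation of xbar
z_star = NormalDist().inv_cdf(0.95)       # 90% confidence leaves 5% in each tail
ci90 = (xbar - z_star * se, xbar + z_star * se)

# H0: mu = 140 versus Ha: mu > 140 (one-sided)
z = (xbar - mu0) / se
p_value = 1 - NormalDist().cdf(z)
print(f"90% CI: ({ci90[0]:.2f}, {ci90[1]:.2f})")
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```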

Question 6.127

6.127 Where do you buy?

Consumers can purchase nonprescription medications at food stores, mass merchandise stores such as Kmart and Walmart, or pharmacies. About 45% of consumers make such purchases at pharmacies. What accounts for the popularity of pharmacies, which often charge higher prices?

A study examined consumers’ perceptions of overall performance of the three types of store using a long questionnaire that asked about such things as “neat and attractive store,” “knowledgeable staff,” and “assistance in choosing among various types of nonprescription medication.” A performance score was based on 27 such questions. The subjects were 201 people chosen at random from the Indianapolis telephone directory. Here are the means and standard deviations of the performance scores for the sample:²⁶

Store type	$\bar{x}$	$s$
Food stores	18.67	24.95
Mass merchandisers	32.38	33.37
Pharmacies	48.60	35.62

We do not know the population standard deviations, but a sample standard deviation $s$ from so large a sample is usually close to $σ$ . Use $s$ in place of the unknown $σ$ in this exercise.

(a) What population do you think the authors of the study want to draw conclusions about? What population are you certain they can draw conclusions about?
(b) Give 95% confidence intervals for the mean performance for each type of store.
(c) Based on these confidence intervals, are you convinced that consumers think that pharmacies offer higher performance than the other types of stores? In Chapter 12, we study a statistical method for comparing the means of several groups.

6.127

(a) The ideal population is all nonprescription medication customers. The actual population consists of those listed in the Indianapolis telephone directory. (b) Food stores: (15.22, 22.12), Mass merchandisers: (27.77, 36.99), Pharmacies: (43.68, 53.52). (c) Yes. The pharmacy interval lies entirely above both of the other intervals, so the mean performance score for pharmacies appears clearly higher.
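The three intervals in part (b) can be reproduced with a short loop. This sketch uses each sample $s$ in place of σ as the exercise directs, with n = 201 for every store type.

```python
import math
from statistics import NormalDist

n = 201
stores = {                          # store type: (sample mean, sample s)
    "Food stores": (18.67, 24.95),
    "Mass merchandisers": (32.38, 33.37),
    "Pharmacies": (48.60, 35.62),
}
z_star = NormalDist().inv_cdf(0.975)   # 95% critical value
intervals = {}
for name, (xbar, s) in stores.items():
    m = z_star * s / math.sqrt(n)       # margin of error, with s standing in for sigma
    intervals[name] = (xbar - m, xbar + m)
    print(f"{name}: ({xbar - m:.2f}, {xbar + m:.2f})")
```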

Question 6.128

6.128 Using software on a data set.

Refer to Exercise 6.125 and the DMS odor threshold data. As noted in the exercise, assume $σ = 7$ μg/l. Read the data into statistical software, and obtain the 95% confidence interval for the mean DMS odor threshold. Standard Excel does not provide an option for confidence intervals for the mean when $σ$ is known.

odor

JMP users: With the data in a data table, select the data in the Distribution platform to get the histogram and other summary statistics. From the red arrow drop-down menu, go to Confidence Interval and then select Other. You will then find an option to provide a known sigma.
Minitab users: With data in a worksheet, do the following pull-down sequence: Stat → Basic Statistics → 1-Sample Z.

Question 6.129

6.129 Using software with summary measures.

Most statistical software packages provide an option to find confidence interval limits by inputting the sample mean, sample size, population standard deviation, and desired confidence level.

JMP users: Do the following pull-down sequence: Help → Sample Data and then select Confidence Interval for One Mean found in the Calculators group.
Minitab users: Do the following pull-down sequence: Stat → Basic Statistics → 1-Sample Z, and select the Summarized data option.

(a) Have software find the 95% confidence interval for the mean when $\bar{x} = 20$, $n = 27$, and $σ = 4$.
(b) Find a 93.5% confidence interval using the information of part (a).

6.129

(a) (18.49, 21.51). (b) (18.58, 21.42).
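With summarized data, the software's calculation amounts to the formula $\bar{x} \pm z^* \sigma/\sqrt{n}$, where $z^*$ leaves $(1-C)/2$ of the standard Normal area in the upper tail. The hypothetical helper `z_interval` below sketches that calculation in Python. It reproduces both answers above, including the 93.5% level.

```python
import math
from statistics import NormalDist

def z_interval(xbar, n, sigma, conf):
    """Level-`conf` confidence interval for a mean when sigma is known (hypothetical helper)."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # upper (1 - conf)/2 critical value
    margin = z_star * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

ci95 = z_interval(20, 27, 4, 0.95)
ci935 = z_interval(20, 27, 4, 0.935)
print(f"95%:   ({ci95[0]:.2f}, {ci95[1]:.2f})")
print(f"93.5%: ({ci935[0]:.2f}, {ci935[1]:.2f})")
```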

Question 6.130

6.130 CEO pay.

A study of the pay of corporate chief executive officers (CEOs) examined the increase in cash compensation of the CEOs of 104 companies, adjusted for inflation, in a recent year. The mean increase in real compensation was $\bar{x} = 6.9\%$, and the standard deviation of the increases was $s = 55\%$. Is this good evidence that the mean real compensation $μ$ of all CEOs increased that year? The hypotheses are

$H_{0}: μ = 0$ (no increase)
$H_{a}: μ > 0$ (an increase)


Because the sample size is large, the sample $s$ is close to the population $σ$, so take $σ = 55\%$.

(a) Sketch the Normal curve for the sampling distribution of $\bar{x}$ when $H_{0}$ is true. Shade the area that represents the $P$-value for the observed outcome $\bar{x} = 6.9\%$.
(b) Calculate the $P$-value.
(c) Is the result significant at the $α = 0.05$ level? Do you think the study gives strong evidence that the mean compensation of all CEOs went up?
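A sketch of the $P$-value calculation in Python. Under $H_{0}$, $\bar{x}$ has mean 0 and standard deviation $σ/\sqrt{n} = 55/\sqrt{104}$, which is also the spread of the curve to sketch. The $P$-value is the area to the right of 6.9 because $H_{a}$ is one-sided.

```python
import math
from statistics import NormalDist

xbar, sigma, n = 6.9, 55.0, 104   # percent; sigma taken equal to s, as the exercise suggests
se = sigma / math.sqrt(n)         # SD of the sampling distribution of xbar under H0
z = (xbar - 0) / se               # standardized observed outcome
p_value = 1 - NormalDist().cdf(z) # one-sided: Ha is mu > 0
print(f"SD of xbar = {se:.3f}, z = {z:.2f}, P-value = {p_value:.4f}")
```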

Question 6.131

6.131 Large samples.

Statisticians prefer large samples. Describe briefly the effect of increasing the size of a sample (or the number of subjects in an experiment) on each of the following.

(a) The width of a level $C$ confidence interval.
(b) The $P$-value of a test when $H_{0}$ is false and all facts about the population remain unchanged as $n$ increases.
(c) The power of a fixed level $α$ test when $α$, the alternative hypothesis, and all facts about the population remain unchanged.

6.131

(a) The confidence interval gets narrower. (b) The $P$-value gets smaller. (c) Power increases.
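To see the first and third effects numerically, the sketch below uses hypothetical settings not taken from the text: σ = 1, a one-sided test of $H_{0}: μ = 0$ at α = 0.05, and a true mean of 0.5. It prints the 95% margin of error and the power for n = 10, 40, and 160. The margin shrinks like $1/\sqrt{n}$ (halving each time n is quadrupled), while the power climbs toward 1.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard Normal

def margin_of_error(n, sigma=1.0, conf=0.95):
    """Half-width of a level-conf z interval (hypothetical sigma)."""
    z_star = Z.inv_cdf(1 - (1 - conf) / 2)
    return z_star * sigma / math.sqrt(n)

def power_one_sided(n, mu0=0.0, mu_a=0.5, sigma=1.0, alpha=0.05):
    """Power of the level-alpha z test of H0: mu = mu0 vs Ha: mu > mu0 when the true mean is mu_a."""
    cutoff = mu0 + Z.inv_cdf(1 - alpha) * sigma / math.sqrt(n)  # reject H0 when xbar exceeds this
    return 1 - NormalDist(mu_a, sigma / math.sqrt(n)).cdf(cutoff)

sizes = [10, 40, 160]
rows = [(n, margin_of_error(n), power_one_sided(n)) for n in sizes]
for n, m, pw in rows:
    print(f"n = {n:3d}: margin = {m:.3f}, power = {pw:.3f}")
```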

Question 6.132

6.132 Roulette.

A roulette wheel has 18 red slots among its 38 slots. You observe many spins and record the number of times that red occurs. Now you want to use these data to test whether the probability of a red has the value that is correct for a fair roulette wheel. State the hypotheses $H_{0}$ and $H_{a}$ that you will test.

Question 6.133

6.133 Significant.

When asked to explain the meaning of “statistically significant at the $α = 0.05$ level,” a student says, “This means there is only probability 0.05 that the null hypothesis is true.” Is this a correct explanation of statistical significance? Explain your answer.

6.133

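This student is wrong. Significance at level $α = 0.05$ is a statement about the data computed assuming $H_{0}$ is true: if the null hypothesis were true, a result at least as extreme as the one observed would occur with probability at most 0.05. Equivalently, a test at this level rejects a true null hypothesis 5% of the time. It does not give the probability that the null hypothesis is true.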
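A small simulation makes the distinction concrete. The Python sketch below uses hypothetical settings (n = 25, σ = 1, a two-sided z test at α = 0.05). It generates many samples for which $H_{0}$ is true by construction and counts how often the test rejects. The rejection rate comes out near 0.05, which is what α measures. The simulation says nothing about the probability that $H_{0}$ itself is true. Because $\bar{x}$ from a Normal population is exactly Normal with standard deviation $σ/\sqrt{n}$, the sketch draws $\bar{x}$ directly rather than drawing 25 observations.

```python
import math
import random
from statistics import NormalDist

def rejection_rate(reps=20000, n=25, mu0=0.0, sigma=1.0, alpha=0.05, seed=7):
    """Fraction of level-alpha two-sided z tests that reject H0: mu = mu0
    when H0 is actually true (hypothetical settings)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(reps):
        xbar = rng.gauss(mu0, se)        # sampling distribution of xbar when H0 is true
        z = (xbar - mu0) / se
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / reps

rate = rejection_rate()
print(f"Rejection rate when H0 is true: {rate:.4f}")
```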

Question 6.134

6.134 Significant.

Another student, when asked why statistical significance appears so often in research reports, says, “Because saying that results are significant tells us that they cannot easily be explained by chance variation alone.” Do you think that this statement is essentially correct? Explain your answer.

Question 6.135

6.135 Welfare reform.

A study compares two groups of mothers with young children who were on welfare two years ago. One group attended a voluntary training program offered free of charge at a local vocational school and advertised in the local news media. The other group did not choose to attend the training program. The study finds a significant difference ($P < 0.01$) between the proportions of the mothers in the two groups who are still on welfare. The difference is not only significant but quite large. The report says that with 95% confidence the percent of the nonattending group still on welfare is $21\% \pm 4\%$ higher than that of the group who attended the program. You are on the staff of a member of Congress who is interested in the plight of welfare mothers and who asks you about the report.

(a) Explain briefly, and in nontechnical language, what “a significant difference ($P < 0.01$)” means.
(b) Explain clearly and briefly what “95% confidence” means.
(c) Is this study good evidence that requiring job training of all welfare mothers would greatly reduce the percent who remain on welfare for several years?

6.135

(a) The difference between the groups is so large that it is unlikely to be explained by chance variation alone: if there were really no difference, a difference at least this large would occur less than 1% of the time. (b) 95% confidence means that the method used to produce the interval captures the true difference in 95% of all possible samples. (c) Not necessarily, because there are likely lurking variables. For example, it is possible that those mothers willing to sign up for the training program are also more actively seeking employment, which could account for the difference.

Question 6.136

6.136 Sample mean distribution.

Consider the following distribution for a discrete random variable $X$ :

$k$	−2	−1	0	1
$P (X = k)$	1/4	1/4	1/4	1/4

Imagine a simple experiment of randomly generating a value for $X$ and recording it and then repeating a second time. Recognize that it is possible to get the same result on both trials. Finally, take the average of the two observed values.

(a) Hand draw the probability distribution of $X$.
(b) Find $P(X < 0)$ for a single trial.
(c) Find the probability that $X$ is less than 0 on both trials.
(d) List all the possible outcomes of the experiment. Find all the possible values of $\bar{x}$, and determine the probability distribution of the sample mean.
(e) Based on the probabilities found in part (d), hand draw the probability distribution of the sample mean statistic. Describe the shape of this probability distribution in relation to the probability distribution of part (a). What phenomenon discussed in this chapter is taking place?
(f) Find the probability that the sample mean statistic is less than 0. Explain why this probability is not the same as what you found in part (c).
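The 16 equally likely ordered outcomes of the two trials can be enumerated exactly. The Python sketch below uses `fractions.Fraction` so the probabilities of each possible value of $\bar{x}$ print as exact fractions rather than decimals.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

values = [-2, -1, 0, 1]
p = Fraction(1, 4)                       # each value of X is equally likely

# Enumerate all 16 equally likely (first, second) outcomes and accumulate
# the probability of each possible sample mean.
dist = defaultdict(Fraction)             # Fraction() is 0, so missing keys start at 0
for x1, x2 in product(values, repeat=2):
    dist[Fraction(x1 + x2, 2)] += p * p

for xbar in sorted(dist):
    print(f"xbar = {float(xbar):5.1f}: P = {dist[xbar]}")
```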

Question 6.137

6.137 Median statistic.

When a distribution is symmetric, the mean and median are equal. So, when sampling from a symmetric population, it would seem that we could use either the sample mean or the sample median to estimate the population mean. Let’s explore this question by simulation. With software, generate 1000 SRSs of size $n = 5$ from the standard Normal distribution. The easiest way to proceed is to create five adjacent columns of 1000 rows of random numbers from the standard Normal distribution.


Excel users: To generate a random number from the standard Normal distribution, enter “=NORM.INV(RAND(), 0, 1)” in any cell. Drag the lower-right corner of the highlighted cell to copy the formula down the column and then across columns to get five columns of 1000 random numbers.
JMP users: With a new data table, right-click the header of Column 1 and choose Column Info. In the drop-down menu named Initialize Data, pick the Random option. Choose the bullet option Random Normal, which has the standard Normal as the default setting. Enter 1000 in the Number of rows box and then click OK. Repeat to get five columns of random numbers.
Minitab users: Do the following pull-down sequence: Calc → Random Data → Normal. The default setting is the standard Normal distribution. Enter “1000” in the Number of rows of data to generate box and type “c1-c5” in the Store in column(s) box. Click OK to find 1000 random numbers in the five columns.

For each row, find the mean and median of the five random observations. In JMP, define new columns using the formula editor, with the Mean function applied to the five columns and the Quantile function with the first argument as 0.5 and the other arguments being each of the five columns. In Minitab, this all can be done using the Row Statistics option found under Calc.

(a) Find the average of the 1000 sample means and the average of the 1000 sample medians. Are these averages close to the population mean of 0?
(b) Find the standard deviation of the 1000 sample means. What is the theoretical standard deviation? Is the estimated standard deviation close to the theoretical standard deviation?
(c) Find the standard deviation of the 1000 sample medians.
(d) Compare the estimated standard deviation of the mean statistic from part (b) with the standard deviation of the median statistic from part (c).
(e) Refer to the four bull’s-eyes of Figure 5.14 (page 280). In the estimation of the mean of a symmetric population, which bull’s-eye is associated with the sample mean statistic, and which bull’s-eye is associated with the sample median statistic?

6.137

Answers will vary. (a) They both should be close to 0. (b) The theoretical standard deviation is 0.4472. The estimated standard deviation should be close to this number. (c) This will be somewhat higher than 0.4472. (d) The standard deviation of the median statistic is larger than the standard deviation of the mean statistic. (e) D is associated with the mean, and B is associated with the median.
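For readers who prefer a programming language to Excel, JMP, or Minitab, the Python sketch below runs the same simulation with only the standard library. The seed is an arbitrary choice, so the individual numbers will differ from any other run, as the answer notes. The theoretical standard deviation of the sample mean is $1/\sqrt{5} \approx 0.4472$.

```python
import random
import statistics

rng = random.Random(2024)          # arbitrary fixed seed so the run is reproducible
samples = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(1000)]   # 1000 SRSs of size 5

means = [statistics.mean(s) for s in samples]
medians = [statistics.median(s) for s in samples]

print(f"average of sample means:   {statistics.mean(means):.4f}")
print(f"average of sample medians: {statistics.mean(medians):.4f}")
print(f"SD of sample means:   {statistics.stdev(means):.4f} (theory: {1 / 5 ** 0.5:.4f})")
print(f"SD of sample medians: {statistics.stdev(medians):.4f}")
```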
