Chapter 7: Inference for Means

SECTION 7.2 EXERCISES

For Exercises 7.48 and 7.49, see page 439; for Exercises 7.50 and 7.51, see pages 440–441; for Exercise 7.52, see page 448; and for Exercises 7.53 and 7.54, see page 453.

In exercises that call for two-sample t procedures, you may use either of the two approximations for the degrees of freedom that we have discussed: the value given by your software or the smaller of n₁ − 1 and n₂ − 1. Be sure to state clearly which approximation you have used.
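The two approximations can be compared directly from summary statistics. The sketch below assumes that the software value is the usual Welch-Satterthwaite approximation. The function names and the numbers in the usage lines are hypothetical and are not taken from any exercise.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Software-style (Welch-Satterthwaite) degrees of freedom.

    Assumption: this is the approximation your software reports.
    """
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

def conservative_df(n1, n2):
    """Second approximation: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """Unpooled two-sample t statistic for the difference xbar1 - xbar2."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (xbar1 - xbar2) / se

# Hypothetical summary statistics, for illustration only.
t = two_sample_t(10.0, 2.0, 12, 8.5, 3.0, 15)
print("t =", round(t, 3))
print("software df =", round(welch_df(2.0, 12, 3.0, 15), 2))
print("conservative df =", conservative_df(12, 15))
```

The software value always lies between the smaller of n₁ − 1 and n₂ − 1 and n₁ + n₂ − 2, so the second approximation gives the more conservative procedure.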

Question 7.55

7.55 What is wrong? In each of the following situations, identify what is wrong and then either explain why it is wrong or change the wording of the statement to make it true.

(a) A researcher wants to test versus the two-sided alternative .
(b) A study recorded the IQ scores of 100 college freshmen. The scores of the 56 males in the study were compared with the scores of all 100 freshmen using the two-sample methods of this section.
(c) A two-sample t statistic gave a P-value of 0.94. From this, we can reject the null hypothesis with 90% confidence.
(d) A researcher is interested in testing the one-sided alternative . The significance test gave . Because the P-value for the two-sided alternative is 0.036, he concluded that his P-value was 0.018.

Question 7.56

7.56 Basic concepts. For each of the following, answer the question and give a short explanation of your reasoning.

(a) A 95% confidence interval for the difference between two means is reported as . What can you conclude about the results of a significance test of the null hypothesis that the population means are equal versus the two-sided alternative?
(b) Will larger samples generally give a larger or smaller margin of error for the difference between two sample means?

Question 7.57

7.57 More basic concepts. For each of the following, answer the question and give a short explanation of your reasoning.

(a) A significance test for comparing two means gave with 10 degrees of freedom. Can you reject the null hypothesis that the μ’s are equal versus the two-sided alternative at the 5% significance level?
(b) Answer part (a) for the one-sided alternative that the difference between means is negative.

Question 7.58

7.58 Physical demands of women’s rugby seven matches. Rugby sevens is rapidly growing in popularity and will be included in the 2016 Olympics. Matches are played on a full rugby field and consist of two seven-minute halves. Each team also consists of seven players. To better understand the demands of women’s rugby sevens, a group of researchers compared the physical qualities of elite players from the Canadian National team with a university squad. The following table summarizes some of these qualities:²⁸

	Elite (n = 16)		University (n = 13)
Quality	x̄	s	x̄	s
Sprint speed (km/hr)	27.3	0.7	26.0	1.5
Peak heart rate (bpm)	192.0	6.0	193.0	6.0
Intermittent recovery test (m)	1160	191	781	129

Carry out the significance tests using . Report the test statistic with the degrees of freedom and the P-value. Write a short summary of your conclusion.

Question 7.59

7.59 Noise levels in fitness classes. Fitness classes often have very loud music that could affect hearing. One study collected noise levels (decibels) in both high-intensity and low-intensity fitness classes across eight commercial gyms in Sydney, Australia.²⁹

(a) Create a histogram or Normal quantile plot for the high-intensity classes. Do the same for the low-intensity classes. Are the distributions reasonably Normal? Summarize the distributions in words.
(b) Test the equality of means using a two-sided alternative hypothesis and significance level .
(c) Are the t procedures appropriate given your observations in part (a)? Explain your answer.
(d) Remove the one low decibel reading for the low-intensity group and redo the significance test. How does this outlier affect the results?
(e) Do you think the results of the significance test from part (b) or (d) should be reported? Explain your answer.

Question 7.60

7.60 Noise levels in fitness classes, continued. Refer to the previous exercise. In most countries, the workplace noise standard is 85 dB (over eight hours). For every 3 dB increase above that, the amount of exposure time is halved. This means that the exposure time for a dB level of 91 is two hours and for a dB level of 94 it is one hour.
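The halving rule can be written as a small function. This is a minimal sketch of the rule as stated above: eight hours at 85 dB, halved for every additional 3 dB. The function name and its default arguments are hypothetical.

```python
def allowed_exposure_hours(db, base_db=85.0, base_hours=8.0, step_db=3.0):
    """Allowed daily exposure time in hours at a given noise level.

    Each step_db increase above base_db halves the allowed time.
    """
    return base_hours / 2 ** ((db - base_db) / step_db)

# The two cases quoted in the exercise.
print(allowed_exposure_hours(91))  # 2.0
print(allowed_exposure_hours(94))  # 1.0
```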

(a) Construct a 95% confidence interval for the mean dB level in high-intensity classes.
(b) Using the interval in part (a), construct a 95% confidence interval for the number of one-hour classes per day an instructor can teach before possibly risking hearing loss. (Hint: This is a linear transformation.)
(c) Repeat parts (a) and (b) for low-intensity classes.
(d) Explain how one might use these intervals to determine the staff size of a new gym.

Question 7.61

7.61 When is 30/31 days not equal to a month? Time can be expressed on different levels of scale: days, weeks, months, and years. Can the scale provided influence perception of time? For example, if you placed an order over the phone, would it make a difference if you were told the package would arrive in four weeks or one month? To investigate this, two researchers asked a group of 267 college students to imagine their car needed major repairs and would have to stay at the shop. Depending on the group he or she was randomized to, the student was either told it would take one month or 30/31 days. Each student was then asked to give best- and worst-case estimates of when the car would be ready. The interval between these two estimates (in days) was the response. Here are the results:³⁰

Group	n	x̄	s
30/31 days	177	20.4	14.3
One month	90	24.8	13.9

(a) Given that the interval cannot be less than 0, the distributions are likely skewed. Comment on the appropriateness of using the t procedures.
(b) Test that the average interval is the same for the two groups using the significance level. Report the test statistic, the degrees of freedom, and the P-value. Give a short summary of your conclusion.

Question 7.62

7.62 When is 52 weeks not equal to a year? Refer to the previous exercise. The researchers also had 60 marketing students read an announcement about a construction project. The expected duration was either one year or 52 weeks. Each student was then asked to state the earliest and latest completion date.

Group	n	x̄	s
52 weeks	30	84.1	55.8
1 year	30	139.6	73.1

Test that the average interval is the same for the two groups using the significance level. Report the test statistic, the degrees of freedom, and the P-value. Give a short summary of your conclusion.

Question 7.63

7.63 Trustworthiness and eye color. Why do we naturally tend to trust some strangers more than others? One group of researchers decided to study the relationship between eye color and trustworthiness.³¹ In their experiment, the researchers took photographs of 80 students (20 males with brown eyes, 20 males with blue eyes, 20 females with brown eyes, and 20 females with blue eyes), each seated in front of a white background looking directly at the camera with a neutral expression. These photos were cropped so the eyes were horizontal and at the same height in the photo and so the neckline was visible. They then recruited 105 participants to judge the trustworthiness of each student photo. This was done using a 10-point scale, where 1 meant very untrustworthy and 10 very trustworthy. The 80 scores from each participant were then converted to z-scores, and the average z-score of each photo (across all 105 participants) was used for the analysis. Here is a summary of the results:

Eye color	n	x̄	s
Brown	40	0.55	1.68
Blue	40	−0.38	1.53

Can we conclude from these data that brown-eyed students appear more trustworthy compared to their blue-eyed counterparts? Test the hypothesis that the average scores for the two groups are the same.

Question 7.64

7.64 Facebook use in college. Because of Facebook’s rapid rise in popularity among college students, there is a great deal of interest in the relationship between Facebook use and academic performance. One study collected information on undergraduate students to look at the relationships among frequency of Facebook use, participation in Facebook activities, time spent preparing for class, and overall GPA.³²

Students reported preparing for class an average of 706 minutes per week with a standard deviation of 526 minutes. Students also reported spending an average of 106 minutes per day on Facebook with a standard deviation of 93 minutes; 8% of the students reported spending no time on Facebook.

(a) Construct a 95% confidence interval for the average number of minutes per week a student prepares for class.
(b) Construct a 95% confidence interval for the average number of minutes per week a student spends on Facebook. (Hint: Be sure to convert from minutes per day to minutes per week.)
(c) Explain why you might expect the population distributions of these two variables to be highly skewed to the right. Do you think this fact makes your confidence intervals invalid? Explain your answer.

Question 7.65

7.65 Possible biases? Refer to the previous exercise. The researcher surveyed students at a four-year, public university in the northeastern United States (). Each student was emailed a link to the survey hosted on SurveyMonkey.com. The researcher also states:

For the students who did not participate immediately, two additional reminders were sent, one week apart. Participants were offered a chance to enter a drawing to win one of 90 $10 Amazon.com gift cards as incentive. A total of 1839 surveys were completed for an overall response rate of 48%.

Discuss how these factors influence your interpretation of the results of this survey.

Question 7.66

7.66 Comparing means. Refer to Exercise 7.64. Suppose that you wanted to compare the average minutes per week spent on Facebook with the average minutes per week spent preparing for class.

(a) Provide an estimate of this difference.
(b) Explain why it is incorrect to use the two-sample t test to see if the means differ.

Question 7.67

7.67 Sadness and spending. The “misery is not miserly” phenomenon refers to a person’s spending judgment going haywire when the person is sad. In a study, 31 young adults were given $10 and randomly assigned to either a sad or a neutral group. The participants in the sad group watched a video about the death of a boy’s mentor (from The Champ), and those in the neutral group watched a video on the Great Barrier Reef. After the video, each participant was offered the chance to trade $0.50 increments of the $10 for an insulated water bottle.³³ Here are the data:

Group	Purchase price ($)
Neutral	0.00	2.00	0.00	1.00	0.50	0.00	0.50
	2.00	1.00	0.00	0.00	0.00	0.00	1.00
Sad	3.00	4.00	0.50	1.00	2.50	2.00	1.50	0.00	1.00
	1.50	1.50	2.50	4.00	3.00	3.50	1.00	3.50

(a) Examine each group’s prices graphically. Is use of the t procedures appropriate for these data? Carefully explain your answer.
(b) Make a table with the sample size, mean, and standard deviation for each of the two groups.
(c) State appropriate null and alternative hypotheses for comparing these two groups.
(d) Perform the significance test at the level, making sure to report the test statistic, degrees of freedom, and P-value. What is your conclusion?
(e) Construct a 95% confidence interval for the mean difference in purchase price between the two groups.

Question 7.68

7.68 Diet and mood. Researchers were interested in comparing the long-term psychological effects of being on a high-carbohydrate, low-fat (LF) diet versus a high-fat, low-carbohydrate (LC) diet.³⁴ A total of 106 overweight and obese participants were randomly assigned to one of these two energy-restricted diets. At 52 weeks, 32 LC dieters and 33 LF dieters remained. Mood was assessed using a total mood disturbance score (TMDS), where a lower score is associated with a less negative mood. A summary of these results follows:

Group	n	x̄	s
LC	32	47.3	28.3
LF	33	19.3	25.8

(a) Is there a difference in the TMDS at Week 52? Test the null hypothesis that the dieters’ average mood in the two groups is the same. Use a significance level of 0.05.
(b) Critics of this study focus on the specific LC diet (that is, the science) and the dropout rate. Explain why the dropout rate is important to consider when drawing conclusions from this study.

Question 7.69

7.69 Drive-thru customer service. QSRMagazine.com assessed 1855 drive-thru visits at quick-service restaurants.³⁵ One benchmark assessed was customer service. Responses ranged from “Rude (1)” to “Very Friendly (5).” The following table breaks down the responses according to two of the chains studied.

	Rating
Chain	1	2	3	4	5
Taco Bell	0	5	41	143	119
McDonald’s	1	22	55	139	100

(a) A researcher decides to compare the average rating of McDonald’s and Taco Bell. Comment on the appropriateness of using the average rating for these data.
(b) Assuming an average of these ratings makes sense, comment on the use of the t procedures for these data.
(c) Report the means and standard deviations of the ratings for each chain separately.
(d) Test whether the two chains, on average, have the same customer satisfaction. Use a two-sided alternative hypothesis and a significance level of 5%.

Question 7.70

7.70 Comparison of two web page designs. You want to compare the daily number of hits for two different website designs for your indie rock band. You assign the next 30 days to either Design A or Design B, 15 days to each.

(a) Would you use a one-sided or a two-sided significance test for this problem? Explain your choice.
(b) If you use Table D to find the critical value, what are the degrees of freedom using the second approximation?
(c) If you perform the significance test using , how large (positive or negative) must the t statistic be to reject the null hypothesis that the two designs result in the same average hits?

Question 7.71

7.71 Comparison of dietary composition. Refer to Example 7.15 (page 443). That study also broke down the dietary composition of the main meal. The following table summarizes the total fats, protein, and carbohydrates in the main meal (g) for the two groups:

	Early eaters (n = 202)		Late eaters (n = 200)
	x̄	s	x̄	s
Fats	23.1	12.5	21.4	8.2
Protein	27.6	8.6	25.7	6.8
Carbohydrates	64.1	21.0	63.5	20.8

(a) Is it appropriate to use the two-sample t procedures that we studied in this section to analyze these data for group differences? Give reasons for your answer.
(b) Describe appropriate null and alternative hypotheses for comparing the two groups in terms of fats consumed.
(c) Carry out the significance test using . Report the test statistic with the degrees of freedom and the P-value. Write a short summary of your conclusion.
(d) Find a 95% confidence interval for the difference between the two means. Compare the information given by the interval with the information given by the significance test.

Question 7.72

7.72 More on dietary composition. Refer to the previous exercise. Repeat parts (b) through (d) for protein and for carbohydrates. Combining these results with the results of Exercise 7.71, write a short summary of your findings.

Question 7.73

7.73 Change in portion size. A study of food portion sizes reported that over a 17-year period, the average size of a soft drink consumed by Americans aged two years and older increased from 13.1 ounces (oz) to 19.9 oz. The authors state that the difference is statistically significant with .³⁶ Explain what additional information you would need to compute a confidence interval for the increase, and outline the procedure that you would use for the computations. Do you think that a confidence interval would provide useful additional information? Explain why or why not.

Question 7.74

7.74 Beverage consumption. The results in the previous exercise were based on two national surveys with a very large number of individuals. Here is a study that also looked at beverage consumption, but the sample sizes were much smaller. One part of this study compared 20 children who were 7 to 10 years old with 5 children who were 11 to 13.³⁷ The younger children consumed an average of 8.2 oz of sweetened drinks per day, while the older ones averaged 14.5 oz. The standard deviations were 10.7 oz and 8.2 oz, respectively.

(a) Do you think that it is reasonable to assume that these data are Normally distributed? Explain why or why not. (Hint: Think about the 68–95–99.7 rule.)
(b) Using the methods in this section, test the null hypothesis that the two groups of children consume equal amounts of sweetened drinks versus the two-sided alternative. Report all details of the significance-testing procedure with your conclusion.
(c) Give a 95% confidence interval for the difference in means.
(d) Do you think that the analyses performed in parts (b) and (c) are appropriate for these data? Explain why or why not.
(e) The children in this study were all participants in an intervention study at the Cornell Summer Day Camp at Cornell University. To what extent do you think that these results apply to other groups of children?

Question 7.75

7.75 Study design is important! Recall Exercise 7.70 (page 457). You are concerned that day of the week may affect the number of hits. So to compare the two web page designs, you choose two successive weeks in the middle of a month. You flip a coin to assign one Monday to the first design and the other Monday to the second. You repeat this for each of the seven days of the week. You now have seven hit amounts for each design. It is incorrect to use the two-sample t test to see if the mean hits differ for the two designs. Carefully explain why.

Question 7.76

7.76 New hybrid tablet and laptop? The purchasing department has suggested your company switch to a new hybrid tablet and laptop. As CEO, you want data to be assured that employees will like these new hybrids over the old laptops. You designate the next 16 employees needing a new laptop to participate in an experiment in which eight will be randomly assigned to receive the standard laptop and the remainder will receive the new hybrid tablet and laptop. After a month of use, these employees will express their satisfaction with their new computers by responding to the statement “I like my new computer” on a scale from 1 to 5, where 1 represents “strongly disagree,” 2 is “disagree,” 3 is “neutral,” 4 is “agree,” and 5 is “strongly agree.”

(a) The employees with the hybrid computers have an average satisfaction score of 4.3 with standard deviation 0.7. The employees with the standard laptops have an average of 3.7 with standard deviation 1.5. Give a 95% confidence interval for the difference in the mean satisfaction scores for all employees.
(b) Would you reject the null hypothesis that the mean satisfaction for the two types of computers is the same versus the two-sided alternative at significance level 0.05? Use your confidence interval to answer this question. Explain why you do not need to calculate the test statistic.

Question 7.77

7.77 Why randomize? Refer to the previous exercise. A coworker suggested that you give the new hybrid computers to the next eight employees who need new computers and the standard laptop to the following eight. Explain why your randomized design is better.

Question 7.78

7.78 Does ad placement matter? Corporate advertising tries to enhance the image of the corporation. A study compared two ads from two sources, the Wall Street Journal and the National Enquirer. Subjects were asked to pretend that their company was considering a major investment in Performax, the fictitious sportswear firm in the ads. Each subject was asked to respond to the question “How trustworthy was the source in the sportswear company ad for Performax?” on a 7-point scale. Higher values indicated more trustworthiness.³⁸ Here is a summary of the results:

Ad source	n	x̄	s
Wall Street Journal	66	4.77	1.50
National Enquirer	61	2.43	1.64

(a) Compare the two sources of ads using a t test. Be sure to state your null and alternative hypotheses, the test statistic with degrees of freedom, the P-value, and your conclusion.
(b) Give a 95% confidence interval for the difference.
(c) Write a short paragraph summarizing the results of your analyses.

Question 7.79

7.79 Size of trees in the northern and southern halves. The study of 584 longleaf pine trees in the Wade Tract in Thomas County, Georgia, had several purposes. Are trees in one part of the tract more or less like trees in any other part of the tract or are there differences? In Example 6.1 (page 342), we examined how the trees were distributed in the tract and found that the pattern was not random. In this exercise, we will examine the sizes of the trees. In Exercise 7.33 (page 429), we analyzed the sizes, measured as diameter at breast height (DBH), for a random sample of 40 trees. Here, we divide the tract into northern and southern halves and take random samples of 30 trees from each half. Here are the diameters in centimeters (cm) of the sampled trees:

	27.8	14.5	39.1	3.2	58.8	55.5	25.0	5.4	19.0	30.6
North	15.1	3.6	28.4	15.0	2.2	14.2	44.2	25.7	11.2	46.8
	36.9	54.1	10.2	2.5	13.8	43.5	13.8	39.7	6.4	4.8
	44.4	26.1	50.4	23.3	39.5	51.0	48.1	47.2	40.3	37.4
South	36.8	21.7	35.7	32.0	40.4	12.8	5.6	44.3	52.9	38.0
	2.6	44.6	45.5	29.1	18.7	7.0	43.8	28.3	36.9	51.6

(a) Use a back-to-back stemplot and side-by-side boxplots to examine the data graphically. Describe the patterns in the data.
(b) Is it appropriate to use the methods of this section to compare the mean DBH of the trees in the north half of the tract with the mean DBH of the trees in the south half? Give reasons for your answer.
(c) What are appropriate null and alternative hypotheses for comparing the two samples of tree DBHs? Give reasons for your choices.
(d) Perform the significance test. Report the test statistic, the degrees of freedom, and the P-value. Summarize your conclusion.
(e) Find a 95% confidence interval for the difference in mean DBHs. Explain how this interval provides additional information about this problem.

Question 7.80

7.80 Size of trees in the eastern and western halves. Refer to the previous exercise. The Wade Tract can also be divided into eastern and western halves. Here are the DBHs of 30 randomly selected longleaf pine trees from each half:

	23.5	43.5	6.6	11.5	17.2	38.7	2.3	31.5	10.5	23.7
East	13.8	5.2	31.5	22.1	6.7	2.6	6.3	51.1	5.4	9.0
	43.0	8.7	22.8	2.9	22.3	43.8	48.1	46.5	39.8	10.9
	17.2	44.6	44.1	35.5	51.0	21.6	44.1	11.2	36.0	42.1
West	3.2	25.5	36.5	39.0	25.9	20.8	3.2	57.7	43.3	58.0
	21.7	35.6	30.9	40.6	30.7	35.6	18.2	2.9	20.4	11.4

Using the questions in the previous exercise, analyze these data.

Question 7.81

7.81 Sales of a small appliance across months. A market research firm supplies manufacturers with estimates of the retail sales of their products from samples of retail stores. Marketing managers are prone to look at the estimate and ignore sampling error. Suppose that an SRS of 60 stores this month shows mean sales of 53 units of a small appliance, with standard deviation 12 units. During the same month last year, an SRS of 58 stores gave mean sales of 50 units, with standard deviation 10 units. An increase from 50 to 53 is a rise of 6%. The marketing manager is happy because sales are up 6%.

(a) Use the two-sample t procedure to give a 95% confidence interval for the difference in mean number of units sold at all retail stores.
(b) Explain in language that the manager can understand why he cannot be certain that sales rose by 6%, and that in fact sales may even have dropped.

Question 7.82

7.82 An improper significance test. A friend has performed a significance test of the null hypothesis that two means are equal. His report states that the null hypothesis is rejected in favor of the alternative that the first mean is larger than the second. In a presentation on his work, he notes that the first sample mean was larger than the second mean and this is why he chose this particular one-sided alternative.

(a) Explain what is wrong with your friend’s procedure and why.
(b) Suppose that he reported with a P-value of 0.06. What is the correct P-value that he should report?

Question 7.83

7.83 Breast-feeding versus baby formula. A study of iron deficiency among infants compared samples of infants following different feeding regimens. One group contained breast-fed infants, while the infants in another group were fed a standard baby formula without any iron supplements. Here are summary results on blood hemoglobin levels at 12 months of age:³⁹

Group	n	x̄	s
Breast-fed	23	13.3	1.7
Formula	19	12.4	1.8

(a) Is there significant evidence that the mean hemoglobin level is higher among breast-fed babies? State H₀ and Hₐ and carry out a t test. Give the P-value. What is your conclusion?
(b) Give a 95% confidence interval for the mean difference in hemoglobin level between the two populations of infants.
(c) State the assumptions that your procedures in parts (a) and (b) require in order to be valid.

Question 7.84

7.84 Revisiting the sadness and spending study. In Exercise 7.67 (page 456), the purchase price of a water bottle was analyzed using the two-sample t procedures that do not assume equal standard deviations. Compare the means using a significance test and find the 95% confidence interval for the difference using the pooled methods. How do the results compare with those you obtained in Exercise 7.67?

Question 7.85

7.85 Revisiting the diet and mood study. In Exercise 7.68 (page 457), the total mood disturbance score means were compared using the two-sample t procedures that do not assume equal standard deviations. Compare the means using a significance test and find the 95% confidence interval for the difference using the pooled methods. How do the results compare with those you obtained in Exercise 7.68?

Question 7.86

7.86 Revisiting dietary composition. In Exercise 7.71 (page 457), the total amount of fats was analyzed using the two-sample t procedures that do not assume equal standard deviations. Compare the means using a significance test and find the 95% confidence interval for the difference using the pooled methods. How do the results compare with those you obtained in Exercise 7.71?

Question 7.87

7.87 Revisiting the size of trees. Refer to the Wade Tract DBH data in Exercise 7.79, where we compared a sample of trees from the northern half of the tract with a sample from the southern half. Because the standard deviations for the two samples are quite close, it is reasonable to analyze these data using the pooled procedures. Perform the significance test and find the 95% confidence interval for the difference in means using these methods. Summarize your results and compare them with what you found in Exercise 7.79.

Question 7.88

7.88 Revisiting the food-timing study. Example 7.15 (page 443) gives summary statistics for weight loss in early eaters and late eaters. The two sample standard deviations are quite similar, so we may be willing to assume equal population standard deviations. Calculate the pooled t test statistic and its degrees of freedom from the summary statistics. Use Table D to assess significance. How do your results compare with the unpooled analysis in the example?
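The pooled calculation from summary statistics can be sketched as follows. This is a minimal illustration of the standard pooled two-sample t statistic under the equal-standard-deviation assumption. The function name is hypothetical, and the numbers in the usage lines are made up rather than taken from Example 7.15.

```python
import math

def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
    """Pooled two-sample t statistic, its degrees of freedom, and s_p.

    Assumes the two populations share a common standard deviation.
    """
    df = n1 + n2 - 2
    # Pooled variance: average of the sample variances weighted by their df.
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (xbar1 - xbar2) / se, df, math.sqrt(sp2)

# Hypothetical summary statistics, for illustration only.
t, df, sp = pooled_t(10.0, 2.0, 5, 8.0, 2.0, 5)
print("t =", round(t, 3), "df =", df, "pooled s =", sp)
```

The resulting t can then be compared with the Table D critical value for n₁ + n₂ − 2 degrees of freedom.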

Question 7.89

7.89 Computing the degrees of freedom. Use the Wade Tract data in Exercise 7.79 to calculate the software approximation to the degrees of freedom using the formula on page 447. Verify your calculation with software.

Question 7.90

7.90 Again computing the degrees of freedom. Use the Wade Tract data in Exercise 7.80 to calculate the software approximation to the degrees of freedom using the formula on page 447. Verify your calculation with software.

Question 7.91

7.91 Revisiting the small-sample example. Refer to Example 7.16 (page 444). This is a case where the sample sizes are quite small. With only five observations per group, we have very little information to make a judgment about whether the population standard deviations are equal. The potential gain from pooling is large when the sample sizes are small. Assume that we will perform a two-sided test using the 5% significance level.

(a) Find the critical value for the unpooled t test statistic that does not assume equal variances. Use the minimum of n₁ − 1 and n₂ − 1 for the degrees of freedom.
(b) Find the critical value for the pooled t test statistic.
(c) How does comparing these critical values show an advantage of the pooled test?

Question 7.92

7.92 Two-sample test of equivalence. In Section 7.1, we were introduced to the one-sample test of equivalence (page 421). Using those same concepts, describe how to perform a two-sample test of equivalence.