For Exercises 7.40 and 7.41, see page 382; for 7.42, see page 383; for 7.43, see page 384; and for 7.44 and 7.45, see page 392.
In exercises that call for two-sample procedures, you may use either of the two approximations for the degrees of freedom that we have discussed: the value given by your software or the smaller of $n_1 - 1$ and $n_2 - 1$. Be sure to state clearly which approximation you have used.
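The two approximations can be compared with a short sketch. It assumes that the software value is the Satterthwaite approximation; the function names and the summary statistics are hypothetical and serve only as an illustration.

```python
def satterthwaite_df(s1, n1, s2, n2):
    """Satterthwaite approximation to the degrees of freedom
    (assumed here to be what statistical software reports)."""
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def conservative_df(n1, n2):
    """Conservative choice: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


# Hypothetical summary statistics, not taken from any exercise.
s1, n1 = 2.0, 12
s2, n2 = 5.0, 20

print("software (Satterthwaite) df:", round(satterthwaite_df(s1, n1, s2, n2), 2))
print("conservative df:", conservative_df(n1, n2))
```

The Satterthwaite value is generally not a whole number, which is one reason to report which approximation was used.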
7.46 What’s wrong?
In each of the following situations, explain what is wrong and why.
7.47 Understanding concepts.
For each of the following, answer the question and give a short explanation of your reasoning.
7.47
(a) Because 0 is not in the interval, we can reject the null hypothesis; the data support a significant difference between the two means. (b) Generally, a larger sample will result in a smaller margin of error.
7.48 Determining significance.
For each of the following, answer the question and give a short explanation of your reasoning.
7.49 Advertising in sports.
Can there ever be too many commercials during a sporting event? A group of researchers compared the level of acceptance for commercials between NASCAR and NFL fans.24 Each fan was asked a series of 5-point Likert scale questions to evaluate their level of commercial acceptance. The average of these questions was used as the response, where a lower score means less acceptance. Here are the results:
Group | $n$ | $\bar{x}$ | $s$ |
---|---|---|---|
NASCAR | 300 | 3.42 | 0.84 |
NFL | 302 | 3.27 | 0.81 |
7.49
(a) Yes, because outliers are not possible and , the procedures can be used. (b) (c) . The data are significant at the 5% level, and there is evidence of a difference between NASCAR and NFL average commercial acceptance levels.
7.50 Advertising in sports, continued.
Refer to the previous exercise. This study not only allows a comparison of these two fan groups, but also an assessment of each fan group separately. Write a short paragraph summarizing the key results an advertiser should take away from this study.
7.51 Trustworthiness and eye color.
Why do we naturally tend to trust some strangers more than others? One group of researchers decided to study the relationship between eye color and trustworthiness.25 In their experiment, the researchers took photographs of 80 students (20 males with brown eyes, 20 males with blue eyes, 20 females with brown eyes, and 20 females with blue eyes), each seated in front of a white background looking directly at the camera with a neutral expression. These photos were cropped so the eyes were horizontal and at the same height in the photo and so the neckline was visible. They then recruited 105 participants to judge the trustworthiness of each student photo. This was done using a 10-point scale, where 1 meant very untrustworthy and 10 very trustworthy. The 80 scores from each participant were then converted to $z$-scores, and the average $z$-score of each photo (across all 105 participants) was used for the analysis. Here is a summary of the results:
Eye color | $n$ | $\bar{x}$ | $s$ |
---|---|---|---|
Brown | 40 | 0.55 | 1.68 |
Blue | 40 | −0.38 | 1.53 |
Can we conclude from these data that brown-eyed students appear more trustworthy compared with their blue-eyed counterparts? Test the hypothesis that the average scores for the two groups are the same.
7.51
. (c) . The data show that brown-eyed students appear more trustworthy compared with their blue-eyed counterparts.
7.52 Sadness and spending.
The “misery is not miserly” phenomenon refers to a sad person’s spending judgment going haywire. In a recent study, 31 young adults were given $10 and randomly assigned to either a sad or a neutral group. The participants in the sad group watched a video about the death of a boy’s mentor (from The Champ), and those in the neutral group watched a video on the Great Barrier Reef. After the video, each participant was offered the chance to trade $0.50 increments of the $10 for an insulated water bottle.26 Here are the data:
sadness
Group | Purchase price ($) | ||||||||
---|---|---|---|---|---|---|---|---|---|
Neutral | 0.00 | 2.00 | 0.00 | 1.00 | 0.50 | 0.00 | 0.50 | | |
 | 2.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | | |
Sad | 3.00 | 4.00 | 0.50 | 1.00 | 2.50 | 2.00 | 1.50 | 0.00 | 1.00 |
 | 1.50 | 1.50 | 2.50 | 4.00 | 3.00 | 3.50 | 1.00 | 3.50 | |
7.53 Noise levels in fitness classes.
Fitness classes often have very loud music that could affect hearing. One study collected noise levels (decibels) in both high-intensity and low-intensity fitness classes across eight commercial gyms in Sydney, Australia.27
noise
7.53
(a) Both distributions are Normally distributed, except the low-intensity class has a low outlier.
(b) . The data are significant at the 5% level, and there is evidence the noise levels are different between the high- and low-intensity fitness classes. (c) Because the low-intensity class has an outlier, the $t$-test is not appropriate. (d) . Removing the outlier didn’t change the results. (e) Because the outlier is not affecting the results, it is probably okay to report both tests. It would be a good idea to investigate the outlier and see why it had such a low decibel value; if it were drastically different in some way, it might be good to remove it and only report the test without it after mentioning its removal.
7.54 Noise levels in fitness classes, continued.
Refer to the previous exercise. In most countries, the workplace noise standard is 85 dB (over eight hours). For every 3 dB increase above that, the amount of exposure time is halved. This means that the exposure time for a dB level of 91 is two hours, and for a dB level of 94 it is one hour.
noise
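The halving rule described in Exercise 7.54 can be written as a small function. This is a minimal sketch: the function name and parameter names are hypothetical, and the rule is applied only to levels at or above the 85 dB standard, because the exercise does not say what happens below it.

```python
def allowed_exposure_hours(level_db, base_db=85.0, base_hours=8.0, step_db=3.0):
    """Exposure time in hours under the rule in Exercise 7.54:
    8 hours at 85 dB, halved for every 3 dB above that.
    Intended for levels at or above base_db."""
    halvings = (level_db - base_db) / step_db
    return base_hours / 2 ** halvings


for level in (85, 88, 91, 94):
    print(level, "dB:", allowed_exposure_hours(level), "hours")
```

The values for 91 dB and 94 dB reproduce the two-hour and one-hour figures quoted in the exercise.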
7.55 Counts of seeds in one-pound scoops.
Refer to Exercise 7.23 (pages 375–376). As part of the Six Sigma quality improvement effort, the company wants to compare scoops of seeds from two different packaging plants. An SRS of 50 one-pound scoops of seeds was collected from Plant 1746, and an SRS of 19 one-pound scoops of seeds was collected from Plant 1748. The number of seeds in each scoop was recorded.
seedcnt2
7.55
(a) For plant 1746: the data are roughly Normal. For plant 1748: the data are somewhat left-skewed but have several clusters or groups of points. (b) Because the total , the procedures are appropriate. (c) For 1746: . For 1748: . Using , the 99% C.I. is (−418.4, −76.8). (d) . The data are significant at the 1% level, and there is evidence that the mean number of seeds per 1-pound scoop is different for the two plants. (e) Answers will vary. The emphasis should be on the difference between the number of seeds so that potentially the scoops from plant 1746 are too light or the scoops from plant 1748 are too heavy (assuming the seeds are the same size/weight).
7.56 More on counts of seeds.
Refer to the previous exercise.
7.57 Drive-thru customer service.
QSRMagazine.com assessed 1855 drive-thru visits at quick-service restaurants.28 One benchmark assessed was customer service. Responses ranged from “Rude (1)” to “Very Friendly (5).” The following table breaks down the responses according to two of the chains studied.
drvthru
Chain | Rating 1 | Rating 2 | Rating 3 | Rating 4 | Rating 5 |
---|---|---|---|---|---|
Taco Bell | 0 | 5 | 41 | 143 | 119 |
McDonald’s | 1 | 22 | 55 | 139 | 100 |
7.57
(a) The problem with averaging ratings is that there is no guarantee that the steps between ratings are equal, that is, that going from a rating of 1 to 2 means the same as going from 2 to 3, and so on. Taking averages assumes equal steps, so it is likely not appropriate. (b) The data are ratings from 1 to 5; as such, they certainly will not be Normally distributed, but because and outliers are not possible, the procedures can be used. (c) McDonald’s: . Taco Bell: . (d) . The data are significant at the 5% level, and there is evidence that the average customer ratings for the two chains are different.
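When the data arrive as counts per rating, as in Exercise 7.57, the sample size, mean, and standard deviation can be computed directly from the frequency table. The sketch below uses the counts from the table; the function name is hypothetical, and the standard deviation uses the usual $n - 1$ divisor.

```python
import math


def summarize_counts(counts):
    """Return (n, mean, standard deviation) for a frequency table
    given as a mapping from rating to number of responses."""
    n = sum(counts.values())
    mean = sum(rating * c for rating, c in counts.items()) / n
    # Sample variance with the n - 1 divisor.
    var = sum(c * (rating - mean) ** 2 for rating, c in counts.items()) / (n - 1)
    return n, mean, math.sqrt(var)


# Counts copied from the table in Exercise 7.57.
taco_bell = {1: 0, 2: 5, 3: 41, 4: 143, 5: 119}
mcdonalds = {1: 1, 2: 22, 3: 55, 4: 139, 5: 100}

for name, counts in (("Taco Bell", taco_bell), ("McDonald's", mcdonalds)):
    n, mean, sd = summarize_counts(counts)
    print(f"{name}: n = {n}, mean = {mean:.2f}, s = {sd:.2f}")
```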
7.58 Dust exposure at work.
Exposure to dust at work can lead to lung disease later in life. One study measured the workplace exposure of tunnel construction workers.29 Part of the study compared 115 drill and blast workers with 220 outdoor concrete workers. Total dust exposure was measured in milligram years per cubic meter ($\text{mg} \cdot \text{y}/\text{m}^3$). The mean exposure for the drill and blast workers was with a standard deviation of . For the outdoor concrete workers, the corresponding values were 6.5 and , respectively.
7.59 Not all dust is the same.
Not all dust particles that are in the air around us cause problems for our lungs. Some particles are too large and stick to other areas of our body before they can get to our lungs. Others are so small that we can breathe them in and out and they will not deposit in our lungs. The researchers in the study described in the previous exercise also measured respirable dust. This is dust that deposits in our lungs when we breathe it. For the drill and blast workers, the mean exposure to respirable dust was with a standard deviation of . The corresponding values for the outdoor concrete workers were 1.4 and , respectively. Analyze these data using the questions in the previous exercise as a guide.
7.59
(a) Answers will vary. But there are likely differences about this company’s workers that could not be generalized to other workers. (b) (4.37, 5.43). With 95% confidence, the drill and blast workers have between 4.37 and 5.43 more exposure to respirable dust than the outdoor concrete workers. (c) . There is significant evidence that the drill and blast workers have more exposure to respirable dust than the outdoor concrete workers. (d) Because , the procedures can be used for skewed data.
7.60 Active companies versus failed companies.
CASE 7.2 Examples 7.14 and 7.15 (pages 390–391) compare active and failed companies under the special assumption that the two populations of firms have the same standard deviation. In practice, we prefer not to make this assumption, so let’s analyze the data without making this assumption. We expect active firms to have higher cash flow margins. Do the data give good evidence in favor of this expectation? By how much on the average does the cash flow margin for active firms exceed that for failed firms (use 99% confidence)?
cmps
7.61 When is 30/31 days not equal to a month?
Time can be expressed on different levels of scale: days, weeks, months, and years. Can the scale provided influence perception of time? For example, if you placed an order over the phone, would it make a difference if you were told the package would arrive in four weeks or one month? To investigate this, two researchers asked a group of 267 college students to imagine their car needed major repairs and would have to stay at the shop. Depending on the group he or she was randomized to, the student was either told it would take one month or 30/31 days. Each student was then asked to give best- and worst-case estimates of when the car would be ready. The interval between these two estimates (in days) was the response. Here are the results:30
Group | $n$ | $\bar{x}$ | $s$ |
---|---|---|---|
30/31 days | 177 | 20.4 | 14.3 |
One month | 90 | 24.8 | 13.9 |
7.61
(a) Because , we can use the procedures on skewed data. (b) . The data are significant at the 5% level, and there is evidence the means of the two groups are different. Those who are told 30/31 days have a smaller expectation interval on average than those who are told 1 month.
7.62 When is 52 weeks not equal to a year?
Refer to the previous exercise. The researchers also had 60 marketing students read an announcement about a construction project. The expected duration was either one year or 52 weeks. Each student was then asked to state the earliest and latest completion date.
Group | $n$ | $\bar{x}$ | $s$ |
---|---|---|---|
52 weeks | 30 | 84.1 | 55.8 |
1 year | 30 | 139.6 | 73.1 |
Test that the average interval is the same for the two groups using the significance level. Report the test statistic, the degrees of freedom, and the $P$-value. Give a short summary of your conclusion.
7.63 Fitness and ego.
Employers sometimes seem to prefer executives who appear physically fit, despite the legal troubles that may result. Employers may also favor certain personality characteristics. Fitness and personality are related. In one study, middle-aged college faculty who had volunteered for a fitness program were divided into low-fitness and high-fitness groups based on a physical examination. The subjects then took the Cattell Sixteen Personality Factor Questionnaire.31 Here are the data for the “ego strength” personality factor:
ego
Low fitness | High fitness | ||||
---|---|---|---|---|---|
4.99 | 5.53 | 3.12 | 6.68 | 5.93 | 5.71 |
4.24 | 4.12 | 3.77 | 6.42 | 7.08 | 6.20 |
4.74 | 5.10 | 5.09 | 7.32 | 6.37 | 6.04 |
4.93 | 4.47 | 5.40 | 6.38 | 6.53 | 6.51 |
4.16 | 5.30 | 6.16 | 6.68 |
7.63
(a) . The data are significant at both the 5% and 1% levels, and there is evidence the two groups are different in mean ego strength. (b) No, they were all college faculty who volunteered and would not represent all middle-aged men. (c) No, the study was observational; we would need an experiment to show causation.
7.64 Study design matters!
In the previous exercise, you analyzed data on the ego strength of high-fitness and low-fitness participants in a campus fitness program. Suppose that instead you had data on the ego strengths of the same men before and after six months in the program. You wonder if the program has affected their ego scores. Explain carefully how the statistical procedures you would use would differ from those you applied in Exercise 7.63.
7.65 Sales of small appliances.
A market research firm supplies manufacturers with estimates of the retail sales of their products from samples of retail stores. Marketing managers are prone to look at the estimate and ignore sampling error. Suppose that an SRS of 70 stores this month shows mean sales of 53 units of a small appliance, with standard deviation 12 units. During the same month last year, an SRS of 58 stores gave mean sales of 50 units, with standard deviation 10 units. An increase from 50 to 53 is a rise of 6%. The marketing manager is happy, because sales are up 6%.
7.65
(a) (−0.91, 6.91). (b) With 95% confidence, the mean change in sales from last year to this year is between −0.91 and 6.91. Because the interval covers 0 and includes some negative values, it is possible sales have actually decreased.
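The interval in the answer to Exercise 7.65 can be reproduced from the summary statistics. The sketch below is hedged: the function name is hypothetical, and the critical value $t^* = 2.009$ is an assumption, namely the table value for 50 degrees of freedom, which is the conservative choice when the smaller of $n_1 - 1$ and $n_2 - 1$ is 57 and the table has no row for 57.

```python
import math


def two_sample_ci(x1, s1, n1, x2, s2, n2, t_star):
    """Confidence interval for mu1 - mu2 without assuming equal
    standard deviations; t_star is the critical value to use."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # standard error of x1 - x2
    diff = x1 - x2
    margin = t_star * se
    return diff - margin, diff + margin


# This year: n = 70, mean 53, s = 12. Last year: n = 58, mean 50, s = 10.
# t_star = 2.009 is an assumed table value for 50 degrees of freedom.
low, high = two_sample_ci(53, 12, 70, 50, 10, 58, t_star=2.009)
print(f"({low:.2f}, {high:.2f})")
```

Using the software degrees of freedom instead would give a slightly smaller critical value and a slightly narrower interval.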
7.66 Compare two marketing strategies.
A bank compares two proposals to increase the amount that its credit card customers charge on their cards. (The bank earns a percentage of the amount charged, paid by the stores that accept the card.) Proposal A offers to eliminate the annual fee for customers who charge $3600 or more during the year. Proposal B offers a small percent of the total amount charged as a cash rebate at the end of the year. The bank offers each proposal to an SRS of 150 of its existing credit card customers. At the end of the year, the total amount charged by each customer is recorded. Here are the summary statistics:
Group | $n$ | $\bar{x}$ | $s$ |
---|---|---|---|
A | 150 | $3385 | $468 |
B | 150 | $3124 | $411 |
7.67 More on smart shopping carts.
Recall Example 7.10 (pages 381–382). The researchers also had participants, who were not told they were on a budget, go through the same online grocery shopping exercise.
smart1
7.67
(a) For those with feedback, . For those without feedback, . (b) Both Normal quantile plots show the two variables are both roughly Normally distributed. (c) . The data are significant at the 5% level, and there is evidence the two groups are different in total cost for those with and without feedback among those who were not told they were on a budget. The results are similar to those in Example 7.10; feedback helped reduce spending.
7.68 New hybrid tablet and laptop?
The purchasing department has suggested your company switch to a new hybrid tablet and laptop. As CEO, you want data to be assured that employees will like these new hybrids over the old laptops. You designate the next 14 employees needing a new laptop to participate in an experiment in which seven will be randomly assigned to receive the standard laptop and the remainder will receive the new hybrid tablet and laptop. After a month of use, these employees will express their satisfaction with their new computers by responding to the statement “I like my new computer” on a scale from 1 to 5, where 1 represents “strongly disagree,” 2 is “disagree,” 3 is “neutral,” 4 is “agree,” and 5 is “strongly agree.”
7.69 Why randomize?
A coworker suggested that you give the new hybrid computers to the next seven employees who need new computers and the standard laptop to the following seven. Explain why your randomized design is better.
7.69
There could be things that are similar about the next 7 employees who need new computers as well as the following 7, which could bias the results (like being from the same office or department).
7.70 Pooled procedures.
Refer to the previous two exercises. Reanalyze the data using the pooled procedure. Does the conclusion depend on the choice of method? The standard deviations are quite different for these data, so we do not recommend use of the pooled procedures in this case.
7.71 Satterthwaite approximation.
The degrees of freedom given by the Satterthwaite approximation are always at least as large as the smaller of $n_1 - 1$ and $n_2 - 1$ and never larger than the sum $n_1 + n_2 - 2$. In Exercise 7.53 (pages 394–395), you were asked to compare the analyses with and without a very low decibel reading in the low-intensity group. Redo those analyses and make a table showing the sample sizes $n_1$ and $n_2$, the standard deviations $s_1$ and $s_2$, and the Satterthwaite degrees of freedom for each of these analyses. Based on these results, suggest when the Satterthwaite degrees of freedom will be closer to the smaller of $n_1 - 1$ and $n_2 - 1$ and when they will be closer to $n_1 + n_2 - 2$.
noise
7.71
When the standard deviations are similar, the Satterthwaite degrees of freedom are closer to $n_1 + n_2 - 2$. When one standard deviation is much larger, they are closer to the smaller of $n_1 - 1$ and $n_2 - 1$.
7.72 Pooled equals unpooled?
The software outputs in Figure 7.10 (pages 387–388) give the same value for the pooled and unpooled $t$ statistics. Do some simple algebra to show that this is always true when the two sample sizes $n_1$ and $n_2$ are the same. In other cases, the two $t$ statistics usually differ.
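Before doing the algebra, the claim in Exercise 7.72 can be checked numerically. This is only a sanity check with hypothetical summary statistics, not a substitute for the derivation; the function names are illustrative.

```python
import math


def unpooled_t(x1, s1, n1, x2, s2, n2):
    """Two-sample t statistic without assuming equal standard deviations."""
    return (x1 - x2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def pooled_t(x1, s1, n1, x2, s2, n2):
    """Two-sample t statistic using the pooled variance estimate."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (x1 - x2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


# Hypothetical values: equal sample sizes, very different standard deviations.
equal_n = (10.0, 2.0, 15, 8.0, 5.0, 15)
# Same values, but with unequal sample sizes.
unequal_n = (10.0, 2.0, 15, 8.0, 5.0, 40)

print("equal n:  ", unpooled_t(*equal_n), pooled_t(*equal_n))
print("unequal n:", unpooled_t(*unequal_n), pooled_t(*unequal_n))
```

With equal sample sizes, the two statistics agree up to rounding even though the standard deviations differ; with unequal sample sizes, they do not. The degrees of freedom can still differ between the two procedures.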
7.73 The advantage of pooling.
For the analysis of wheat prices in Example 7.13 (pages 385–386), there are only five observations per month. When sample sizes are small, we have very little information to make a judgment about whether the population standard deviations are equal. The potential gain from pooling is large when the sample sizes are very small. Assume that we will perform a two-sided test using the 5% significance level.
wheat
7.73
(a) . (b) . (c) Because the critical value is smaller for the pooled test, it is easier to show significance than the unpooled test.
7.74 The advantage of pooling.
Suppose that in the setting of the previous exercise, you are interested in 95% confidence intervals for the difference rather than significance testing. Find the widths of the intervals for the two procedures (assuming or not assuming equal standard deviations). How do they compare?
wheat