22.8 Ethnocentrism. A social psychologist reports, “In our sample, ethnocentrism was significantly higher (P < 0.05) among church attenders than among nonattenders.” Explain to someone who knows no statistics what this means.
22.9 Students’ earnings. The financial aid office of a university asks a sample of students about their employment and earnings. The report says, “For academic year earnings, a significant difference (P = 0.028) was found between the sexes, with men earning more on average than women. No difference (P = 0.576) was found between the earnings of black and white students.” Explain both of these conclusions, for the effects of sex and of race on mean earnings, in language understandable to someone who knows no statistics.
22.9 There is good evidence that male students earn more (on the average) than do female students during the academic year. The difference in earnings in our sample was large enough that it would rarely occur in samples drawn from a population in which men’s and women’s average earnings are equal. Such a difference would happen in less than 3% of all samples. The average earnings of black and white students in our sample were so close together that a difference this large would not be unexpected in samples drawn from a population in which the average earnings of blacks and whites are equal. Similar differences would happen more than 50% of the time.
22.10 Diet and diabetes. Does eating more fiber reduce the blood cholesterol level of patients with diabetes? A randomized clinical trial compared normal and high-
22.11 Diet and bowel cancer. It has long been thought that eating a healthier diet reduces the risk of bowel cancer. A large study cast doubt on this advice. The subjects were 2079 people who had polyps removed from their bowels in the past six months. Such polyps may lead to cancer. The subjects were randomly assigned to a low-
(a) Outline the design of this experiment.
(b) Surprisingly, the occurrence of new polyps “did not differ significantly between the two groups.” Explain clearly what this finding means.
22.11 (a) Randomly assign the 2079 subjects to two groups, one following the low-fat, high-fiber diet and the other following their usual diet. Compare the occurrence of new polyps in the two groups. (b) The difference in polyp development between the two groups was small enough that it might occur simply by chance if diet had no effect.
22.12 Pigs and prestige in ancient China. It appears that pigs in Stone Age China were not just a source of food. Owning pigs was also a display of wealth. Evidence for this comes from examining burial sites. If the skulls of sacrificed pigs tend to appear along with expensive ornaments, that suggests that the pigs, like the ornaments, signal the wealth and prestige of the person buried. A study of burials from around 3500 B.C. concluded that “there are striking differences in grave goods between burials with pig skulls and burials without them. . . . A test indicates that the two samples of total artifacts are significantly different at the 0.01 level.” Explain clearly why “significantly different at the 0.01 level” gives good reason to think that there really is a systematic difference between burials that contain pig skulls and those that lack them.
22.13 Ancient Egypt. Settlements in Egypt before the time of the pharaohs are dated by measuring the presence of forms of carbon that decay over time. The first datings of settlements in the Nagada region used hair that had been excavated 60 years earlier. Now researchers have used newer methods and more recently excavated material. Do the dates differ? Here is the conclusion about one location: “There are two dates from Site KH6. Statistically, the two dates are not significantly different. They provide a weighted average corrected date of 3715 ± 90 B.C.” Explain to someone interested in ancient Egypt but not interested in statistics what “not significantly different” means.
22.13 The differences observed between dates estimated using the old method and estimates based on the new method were small enough that they could occur simply by chance if both methods produced the same results (on the average).
22.14 What’s a gift worth? Do people value gifts from others more highly than they value the money it would take to buy the gift? We would like to think so because we hope that “the thought counts.” A survey of 209 adults asked them to list three recent gifts and then asked, “Aside from any sentimental value, if, without the giver ever knowing, you could receive an amount of money instead of the gift, what is the minimum amount of money that would make you equally happy?” It turned out that most people would need more money than the gift cost to be equally happy. The magic words “significant (P < 0.01)” appear in the report of this finding.
(a) The sample consisted of students and staff in a graduate program and of “members of the general public at train stations and airports in Boston and Philadelphia.” The report says this sample is “not ideal.” What’s wrong with the sample?
(b) In simple language, what does it mean to say that the sample thought their gifts were worth “significantly more” than their actual cost?
(c) Now be more specific: what does “significant (P < 0.01)” mean?
22.15 Attending church. A 2010 Gallup Poll found that 39% of American adults say they attended religious services last week. This is almost certainly not true.
(a) Why might we expect answers to a poll to overstate true church attendance?
(b) You suspect strongly that the true percentage attending church in any given week is less than 39%. You plan to watch a random sample of adults and see whether or not they go to church. What are your null and alternative hypotheses? (Be sure to say in words what the population proportion p is for your study.)
22.15 (a) People might be embarrassed to admit that they have not attended religious services, or they might want to make a favorable impression by saying that they did. (b) We take p to be the proportion of American adults who attended religious services last week. H0 : p=0.39 and Ha : p<0.39.
22.16 Body temperature. We have all heard that 98.6 degrees Fahrenheit (or 37 degrees Celsius) is “normal body temperature.” In fact, there is evidence that most people have a slightly lower body temperature. You plan to measure the body temperature of a random sample of people very accurately. You hope to show that a majority have temperatures lower than 98.6 degrees.
(a) Say clearly what the population proportion p stands for in this setting.
(b) In terms of p, what are your null and alternative hypotheses?
22.17 Unemployment. The national unemployment rate in a recent month was 5.1%. You think the rate may be different in your city, so you plan a sample survey that will ask the same questions as the Current Population Survey. To see if the local rate differs significantly from 5.1%, what hypotheses will you test?
22.17 With p as the local unemployment rate, our hypotheses are H0 : p = 0.051 and Ha : p ≠ 0.051.
22.18 First-
22.19 Do our athletes graduate? The National Collegiate Athletic Association (NCAA) requires colleges to report the graduation rates of their athletes. At one large university, 78% of all students who entered in 2004 graduated within six years. One hundred thirty-
(a) Explain in words what the parameter p is in this setting.
(b) What are the null and alternative hypotheses H0 and Ha?
(c) What is the numerical value of the sample proportion ˆp? The P-value is the probability of what event?
(d) The P-value is P = 0.033. Explain why this P-value indicates there is some reason to think that graduation rates are lower among athletes than among all students.
22.19 (a) p is the graduation rate for all athletes at this university. (b) H0 : p=0.78 and Ha : p<0.78. (c) 0.7211; the P-value is the probability that ˆp≤0.7211 (under the assumption that p = 0.78). (d) A P-value of 0.033 indicates that ˆp values as extreme as 0.7211 would be somewhat rare (that is, they would occur in about 3.3% of all samples if p were 0.78). This gives some reason to doubt the assumption that p = 0.78.
22.20 Using the Internet. In 2013, 81.8% of first-
(a) Explain in words what the parameter p is in this setting.
(b) What are the null and alternative hypotheses H0 and Ha?
(c) What is the numerical value of the sample proportion ˆp ? The P-value is the probability of what event?
(d) The P-value is P = 0.210. Explain carefully why this evidence should lead administrators to fail to reject H0.
22.21 Vote for the best face? We often judge other people by their faces. It appears that some people judge candidates for elected office by their faces. Psychologists showed head-
(a) Explain in words what the parameter p is in this setting.
(b) What are the null and alternative hypotheses H0 and Ha?
(c) What is the numerical value of the sample proportion ˆp ? The P-value is the probability of what event?
(d) The P-value is P = 0.017. Explain carefully why this is reasonably good evidence that H0 may not be true and that Ha may be true.
22.21 (a) p is the proportion of times the candidate with the better face wins. (b) H0 : p =0.50 and Ha : p>0.50. (c) 0.6875; the P-value is the probability that ˆp≥0.6875 (under the assumption that p = 0.50). (d) A P-value of 0.017 indicates that ˆp values as extreme as 0.6875 would be unlikely; that is, they would occur in about 1.7% of all samples if p were 0.50. This gives pretty good reason to doubt the assumption that p = 0.50.
22.22 Do our athletes graduate? Is the result of Exercise 22.19 statistically significant at the 10% level? At the 5% level?
22.23 Using the Internet. Is the result of Exercise 22.20 statistically significant at the 5% level? At the 1% level?
22.23 Because the P-value (0.210) is greater than both 0.05 and 0.01, the result is not significant at either the 5% level or the 1% level.
22.24 Vote for the best face? Is the result of Exercise 22.21 statistically significant at the 5% level? At the 1% level?
22.25 Significant at what level? Explain in plain language why a result that is significant at the 1% level must always be significant at the 5% level. If a result is significant at the 5% level, what can you say about its significance at the 1% level?
22.25 A result is significant at the 1% level if outcomes as extreme as or more extreme than the one observed would occur less than once in 100 times when the null hypothesis is true. It is significant at the 5% level if such outcomes would occur less than five times in 100. Something that occurs less than once in 100 times also occurs less than five times in 100, so significance at the 1% level guarantees significance at the 5% level. The opposite is not necessarily true: a result that is significant at the 5% level may or may not be significant at the 1% level.
22.26 Significance means what? Asked to explain the meaning of “statistically significant at the α = 0.05 level,” a student says: “This means that the probability that the null hypothesis is true is less than 0.05.” Is this explanation correct? Why or why not?
22.27 Finding a P-value by simulation. Is a new method of teaching reading to first-
H0: p = 0.5 (no difference in effectiveness)
Ha: p > 0.5 (Method B is more effective)
The result of your experiment is that Method B gave the higher score in 12 of the 20 pairs, or ˆp=12/20=0.6.
(a) If H0 is true, the 20 pairs of students are 20 independent trials with probability 0.5 that Method B “wins” each trial (is the more effective method). Explain how to use Table A to simulate these 20 trials if we assume for the sake of argument that H0 is true.
(b) Use Table A, starting at line 105, to simulate 10 repetitions of the experiment. Estimate from your simulation the probability that Method B will do better (be the more effective method) in 12 or more of the 20 pairs when H0 is true. (Of course, 10 repetitions are not enough to estimate the probability reliably. Once you understand the idea, more repetitions are easy.)
(c) Explain why the probability you simulated in part (b) is the P-value for your experiment. With enough patience, you could find all the P-values in this chapter by doing simulations similar to this one.
22.27 (a) Take 20 digits, and use the digits 0–4 for “Method A wins the trial” and 5–9 for “Method B wins the trial.” (Or vice versa, or use even digits for Method A, etc.) (b) Results will vary depending on which digits represent “Method B wins.” (c) We simulated the probability (assuming that H0 is true) of observing results at least as extreme as those in our sample.
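The Table A procedure in part (a) can also be mimicked with software. The following is a minimal Python sketch, not part of the original exercise: the function name `simulate_p_value`, the seed, and the 20,000 repetitions are illustrative assumptions, and a pseudorandom generator stands in for the table of random digits.

```python
import random
from math import comb

def simulate_p_value(n_pairs=20, observed_wins=12, reps=20_000, seed=1):
    """Estimate P(Method B wins >= observed_wins of n_pairs) when H0: p = 0.5 holds."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    hits = 0
    for _ in range(reps):
        # One repetition: 20 random digits, with 5-9 meaning "Method B wins".
        wins = sum(1 for _ in range(n_pairs) if rng.randrange(10) >= 5)
        if wins >= observed_wins:
            hits += 1
    return hits / reps

# Exact binomial probability of 12 or more wins in 20 fair trials, for comparison.
exact = sum(comb(20, k) for k in range(12, 21)) / 2**20

estimate = simulate_p_value()
print(f"simulated P-value: {estimate:.3f}")
print(f"exact P-value:     {exact:.4f}")
```

With many repetitions the estimate settles near 0.25, so a result of 12 wins in 20 pairs is not unusual when H0 is true.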
22.28 Finding a P-value by simulation. A classic experiment to detect extra-
(a) Give H0 and Ha for a test to see if this result is significant evidence that Julie has ESP.
(b) Explain how to simulate the experiment if we assume for the sake of argument that H0 is true.
(c) Simulate 20 repetitions of the experiment; begin at line 121 of Table A.
(d) The actual experimental result was five correct in 10 tries. What is the event whose probability is the P-value for this experimental result? Give an estimate of the P-value based on your simulation. How convincing was Julie’s performance?
The following exercises concern the optional section on calculating P-
22.29 Using the Internet. Return to the study in Exercise 22.20, which found that 168 of 200 first-
22.29 H0 : p = 0.818 and Ha : p > 0.818, where p is the proportion of first-year college students at this university who use the Internet frequently for research or homework. If the null hypothesis is true, then the proportion who use the Internet frequently for research or homework in an SRS of 200 students would have (approximately) a Normal distribution with mean p = 0.818 and standard deviation 0.0273. Our sample had ˆp = 168/200 = 0.840, for which the standard score is 0.8 and the P-value is 0.2119 using Table B. This result is not significant, and the P-value is roughly the same as the one in Exercise 22.20, which was computed using technology.
22.30 Interpreting scatterplots. In 2014, the Pew Research Center’s American Trends Panel sought to better understand what Americans know about science. It was observed that among a random selection of 3278 adults, 2065 adults could correctly interpret a scatterplot. Is this good evidence that more than 60% of Americans are able to correctly interpret scatterplots?
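The arithmetic in the answer to Exercise 22.29 can be checked with a short Python sketch. The helper name `one_proportion_z` is hypothetical, and the standard library's `statistics.NormalDist` is used here in place of Table B.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """Return (p_hat, z, upper-tail P-value) for H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n
    sd = sqrt(p0 * (1 - p0) / n)        # standard deviation of p_hat if H0 is true
    z = (p_hat - p0) / sd               # standard score of the observed p_hat
    p_value = 1 - NormalDist().cdf(z)   # area under the Normal curve to the right of z
    return p_hat, z, p_value

p_hat, z, p_value = one_proportion_z(168, 200, 0.818)
print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, P-value = {p_value:.3f}")
```

Without rounding, the standard score is about 0.81 and the P-value about 0.210, the value quoted in Exercise 22.20. Table B requires rounding the standard score to 0.8, which gives 0.2119.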
22.31 Side effects. An experiment on the side effects of pain relievers assigned arthritis patients to one of several over-
(a) If 10% of all patients suffer adverse symptoms, what would be the sampling distribution of the proportion with adverse symptoms in a sample of 420 patients?
(b) Does the experiment provide strong evidence that fewer than 10% of patients who take this medication have adverse symptoms?
22.31 (a) If p = 0.1, then the proportion suffering adverse symptoms in an SRS of 420 patients has (approximately) a Normal distribution with mean p = 0.1 and standard deviation 0.01464. (b) We test H0 : p=0.1 and Ha : p<0.1. Our sample had ˆp = 21/420 = 0.05, for which the standard score is −3.42. Compared with Table B, we see that this is very strong evidence (P < 0.0003) that fewer than 10% of patients suffer adverse side effects from this medication.
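The standard deviation and standard score quoted in the answer to Exercise 22.31 follow from the usual formulas for a sample proportion. As a worked check (the symbol \sigma_{\hat p} is notation introduced here for the standard deviation of ˆp when H0 is true):

```latex
\[
\sigma_{\hat p} = \sqrt{\frac{p(1-p)}{n}}
               = \sqrt{\frac{(0.1)(0.9)}{420}} \approx 0.01464,
\qquad
z = \frac{\hat p - p}{\sigma_{\hat p}}
  = \frac{0.05 - 0.10}{0.01464} \approx -3.42 .
\]
```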
22.32 Do chemists have more girls? Some people think that chemists are more likely than other parents to have female children. (Perhaps chemists are exposed to something in their laboratories that affects the sex of their children.) The Washington State Department of Health lists the parents’ occupations on birth certificates. Between 1980 and 1990, 555 children were born to fathers who were chemists. Of these births, 273 were girls. During this period, 48.8% of all births in Washington State were girls. Is there evidence, at a significance level of 0.05, that the proportion of girls born to chemists is higher than the state proportion?
22.33 Speeding. It often appears that most drivers on the road are driving faster than the posted speed limit. Situations differ, of course, but here is one set of data. Researchers studied the behavior of drivers on a rural interstate highway in Maryland where the speed limit was 55 miles per hour. They measured speed with an electronic device hidden in the pavement and, to eliminate large trucks, considered only vehicles less than 20 feet long. They found that 5690 out of 12,931 vehicles were exceeding the speed limit. Is this good evidence, at a significance level of 0.05, that (at least in this location) fewer than half of all drivers are speeding?
22.33 H0 : p=0.5 and Ha : p<0.5, where p is the proportion of all drivers who are speeding. If the null hypothesis is true, then the proportion of speeders in an SRS of 12,931 drivers would have (approximately) a Normal distribution with mean p = 0.5 and standard deviation 0.004397. Our sample had ˆp = 5690/12,931 = 0.4400, for which the standard score is −13.6. Table B tells us that P < 0.0003. We can conclude that fewer than half of all drivers in this location are speeding.
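A standard score as extreme as the one in Exercise 22.33 lies far beyond the range of Table B, which is why the answer can only report P < 0.0003. A brief Python sketch of the lower-tail calculation, with variable names chosen here for illustration:

```python
from math import sqrt
from statistics import NormalDist

speeders, n, p0 = 5690, 12931, 0.5

p_hat = speeders / n
sd = sqrt(p0 * (1 - p0) / n)     # standard deviation of p_hat if H0: p = 0.5 is true
z = (p_hat - p0) / sd
p_value = NormalDist().cdf(z)    # area to the left of z, because Ha is p < 0.5

print(f"p_hat = {p_hat:.4f}, sd = {sd:.6f}, z = {z:.1f}")
print(f"P-value below 0.0003: {p_value < 0.0003}")
```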
The following exercises concern the optional section on tests for a population mean. To carry out a test, complete the steps illustrated in Example 4 or Example 5.
22.34 Student attitudes. The Survey of Study Habits and Attitudes (SSHA) is a psychological test that measures students’ study habits and attitudes toward school. Scores range from 0 to 200. The mean score for U.S. college students is about 115, and the standard deviation is about 30. A teacher suspects that older students have better attitudes toward school. She gives the SSHA to 25 students who are at least 30 years old. Assume that scores in the population of older students are Normally distributed with standard deviation σ = 30. The teacher wants to test the hypotheses
H0: μ = 115
Ha: μ > 115
(a) What is the sampling distribution of the mean score ˉx of a sample of 25 older students if the null hypothesis is true? Sketch the density curve of this distribution. (Hint: Sketch a Normal curve first, then mark the axis using what you know about locating μ and σ on a Normal curve.)
(b) Suppose that the sample data give ˉx=118.6. Mark this point on the axis of your sketch. In fact, the outcome was ˉx=125.7. Mark this point on your sketch. Using your sketch, explain in simple language why one outcome is good evidence that the mean score of all older students is greater than 115 and why the other outcome is not.
(c) Shade the area under the curve that is the P-value for the sample result ˉx=125.7.
22.35 Mice in a maze. Experiments on learning in animals sometimes measure how long it takes mice to find their way through a maze. The mean time is 19 seconds for one particular maze. A researcher thinks that a loud noise will cause the mice to complete the maze faster. She measures how long each of several mice takes to find its way through a maze with a noise as stimulus. What are the null hypothesis H0 and alternative hypothesis Ha?
22.35 H0 : μ=19 seconds and Ha : μ<19 seconds, where μ is the mean time to complete the maze with the noise as a stimulus.
22.36 Response time. Last year, your company’s service technicians took an average of 2.5 hours to respond to trouble calls from business customers who had purchased service contracts. Do this year’s data show a significantly different average response time? What null and alternative hypotheses should you test to answer this question?
22.37 Testing a random number generator. Our statistical software has a “random number generator” that is supposed to produce numbers scattered at random between 0 and 1. If this is true, the numbers generated come from a population with μ = 0.5. A command to generate 100 random numbers gives outcomes with ˉx = 0.536 and s = 0.312. Is this good evidence that the mean of all numbers produced by this software is not 0.5?
22.37 We test H0 : μ=0.5 versus Ha : μ≠0.5. If H0 were true, the sample mean ˉx of an SRS of 100 numbers would have (approximately) a Normal distribution with mean 0.5 and standard deviation 0.0312. For ˉx=0.536, the standard score is 1.2, for which P = 0.2302 using Table B. We have little reason to believe that the mean of all possible numbers produced by this software is not 0.5.
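The two-sided test in the answer to Exercise 22.37 can be reproduced in a few lines of Python. This is a sketch only: the function name `two_sided_z_test` is hypothetical, and, as in the answer, the sample standard deviation s = 0.312 is used as if it were the population value, which is an approximation that relies on the large sample.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test(x_bar, mu0, sd, n):
    """Return (z, P-value) for H0: mu = mu0 against Ha: mu != mu0."""
    z = (x_bar - mu0) / (sd / sqrt(n))
    # Two-sided alternative: double the area beyond |z| in one tail.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# s = 0.312 stands in for the population standard deviation (assumption; n = 100).
z, p_value = two_sided_z_test(x_bar=0.536, mu0=0.5, sd=0.312, n=100)
print(f"z = {z:.2f}, P-value = {p_value:.3f}")
```

The unrounded standard score is about 1.15, giving a P-value near 0.249. Rounding the score to 1.2 for Table B gives 0.2302. Either way the evidence against μ = 0.5 is weak.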
22.38 Will they charge more? A bank wonders whether omitting the annual credit card fee for customers who charge at least $3000 in a year will increase the amount charged on its credit cards. The bank makes this offer to an SRS of 400 of its credit card customers. It then compares how much these customers charge this year with the amount that they charged last year. The mean increase in the sample is $246, and the standard deviation is $112. Is there significant evidence at the 1% level that the mean amount charged increases under the no-
22.39 Bad weather, bad tip? People tend to be more generous after receiving good news. Are they less generous after receiving bad news? The average tip left by adult Americans is 20%. Give 20 patrons of a restaurant a message on their bill warning them that tomorrow’s weather will be bad and record the tip percentage they leave. Here are the tips as a percentage of the total bill:
18.0 19.1 19.2 18.8 18.4 19.0
18.5 16.1 16.8 18.2 14.0 17.0
13.6 17.5 20.0 20.2 18.8 18.0
23.2 19.4
Suppose that tip percentages are Normal with σ = 2, and assume that the patrons in this study are a random sample of all patrons of this restaurant. Is there good evidence that the mean tip percentage for all patrons of this restaurant is less than 20 when they receive a message warning them that tomorrow’s weather will be bad? State H0 and Ha and carry out a significance test. Use significance level 0.05.
22.39 We test H0 : μ=20 versus Ha : μ<20, where μ is the mean tip percent for all patrons of this restaurant who are given a message about bad weather. If H0 were true, the sample mean ˉx of an SRS of 20 customers would have (approximately) a Normal distribution with mean 20 and standard deviation 0.4472. For ˉx=18.19, the standard score is −4.05. Compared with Table B, we see that this is very strong evidence (P < 0.0003) that the mean tip percent for all patrons of this restaurant who are given a message about bad weather is less than 20%.
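The sample mean and standard score in the answer to Exercise 22.39 come directly from the 20 tip percentages. A minimal Python sketch of the calculation, using the given σ = 2 (the variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist, mean

# Tip percentages from the exercise, in the order listed.
tips = [18.0, 19.1, 19.2, 18.8, 18.4, 19.0,
        18.5, 16.1, 16.8, 18.2, 14.0, 17.0,
        13.6, 17.5, 20.0, 20.2, 18.8, 18.0,
        23.2, 19.4]

mu0, sigma = 20, 2                    # hypothesized mean and the stated sigma
x_bar = mean(tips)
sd_x_bar = sigma / sqrt(len(tips))    # standard deviation of x_bar if H0 is true
z = (x_bar - mu0) / sd_x_bar
p_value = NormalDist().cdf(z)         # lower tail, because Ha is mu < 20

print(f"x_bar = {x_bar:.2f}, sd = {sd_x_bar:.4f}, z = {z:.2f}")
print(f"significant at the 0.05 level: {p_value < 0.05}")
```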
22.40 Why should the significance level matter? On June 15, 2005, an article by Lawrence K. Altman appeared in the New York Times. The title of the article was “Studies Rebut Earlier Report on Pledges of Virginity.” The article began by stating the following: “Challenging earlier findings, two studies from the Heritage Foundation reported yesterday that young people who took virginity pledges had lower rates of acquiring sexually transmitted diseases and engaged in fewer risky sexual behaviors.” The new findings were based on the same national survey used by earlier studies and conducted by the Department of Health and Human Services. But the authors of the new study used different methods of statistical analysis from those in an earlier one that was widely publicized, making direct comparisons difficult. One particular criticism of the new study was that the result of a statistical test at a 0.10 level of significance was reported when journals generally use a lower level of 0.05. Why might this be a concern?
EXPLORING THE WEB
Follow the QR code to access exercises.