Chapter 22: What Is a Test of Significance?

CHAPTER 22 EXERCISES

Question 22.8

22.8 Ethnocentrism. A social psychologist reports, “In our sample, ethnocentrism was significantly higher (P < 0.05) among church attenders than among nonattenders.’’ Explain to someone who knows no statistics what this means.

Question 22.9

22.9 Students’ earnings. The financial aid office of a university asks a sample of students about their employment and earnings. The report says, “For academic year earnings, a significant difference (P = 0.028) was found between the sexes, with men earning more on average than women. No difference (P = 0.576) was found between the earnings of black and white students.’’ Explain both of these conclusions, for the effects of sex and of race on mean earnings, in language understandable to someone who knows no statistics.

22.9 There is good evidence that male students earn more (on the average) than do female students during the academic year. The difference in earnings in our sample was large enough that it would rarely occur in samples drawn from a population in which men’s and women’s average earnings are equal. Such a difference would happen in less than 3% of all samples. The average earnings of black and white students in our sample were so close together that a difference this large would not be unexpected in samples drawn from a population in which the average earnings of blacks and whites are equal. Similar differences would happen more than 50% of the time.

Question 22.10

22.10 Diet and diabetes. Does eating more fiber reduce the blood cholesterol level of patients with diabetes? A randomized clinical trial compared normal and high-fiber diets. Here is part of the researchers’ conclusion: “The high-fiber diet reduced plasma total cholesterol concentrations by 6.7 percent (P = 0.02), triglyceride concentrations by 10.2 percent (P = 0.02), and very-low-density lipoprotein cholesterol concentrations by 12.5 percent (P = 0.01).’’ A doctor who knows no statistics says that a drop of 6.7% in cholesterol isn’t a lot—maybe it’s just an accident due to the chance assignment of patients to the two diets. Explain in simple language how “P = 0.02’’ answers this objection.

Question 22.11

22.11 Diet and bowel cancer. It has long been thought that eating a healthier diet reduces the risk of bowel cancer. A large study cast doubt on this advice. The subjects were 2079 people who had polyps removed from their bowels in the past six months. Such polyps may lead to cancer. The subjects were randomly assigned to a low-fat, high-fiber diet or to a control group in which subjects ate their usual diets. Did polyps reoccur during the next four years?

(a) Outline the design of this experiment.
(b) Surprisingly, the occurrence of new polyps “did not differ significantly between the two groups.’’ Explain clearly what this finding means.

22.11 (a) Randomly assign the 2079 subjects to two groups: one on the low-fat, high-fiber diet and a control group eating their usual diets. Compare the reoccurrence of polyps in the two groups over the next four years. (b) The difference in polyp development between the two groups was small enough that it might occur simply by chance if diet had no effect.

Question 22.12

22.12 Pigs and prestige in ancient China. It appears that pigs in Stone Age China were not just a source of food. Owning pigs was also a display of wealth. Evidence for this comes from examining burial sites. If the skulls of sacrificed pigs tend to appear along with expensive ornaments, that suggests that the pigs, like the ornaments, signal the wealth and prestige of the person buried. A study of burials from around 3500 B.C. concluded that “there are striking differences in grave goods between burials with pig skulls and burials without them. . . . A test indicates that the two samples of total artifacts are significantly different at the 0.01 level.’’ Explain clearly why “significantly different at the 0.01 level’’ gives good reason to think that there really is a systematic difference between burials that contain pig skulls and those that lack them.

Question 22.13

22.13 Ancient Egypt. Settlements in Egypt before the time of the pharaohs are dated by measuring the presence of forms of carbon that decay over time. The first datings of settlements in the Nagada region used hair that had been excavated 60 years earlier. Now researchers have used newer methods and more recently excavated material. Do the dates differ? Here is the conclusion about one location: “There are two dates from Site KH6. Statistically, the two dates are not significantly different. They provide a weighted average corrected date of 3715 ± 90 B.C.’’ Explain to someone interested in ancient Egypt but not interested in statistics what “not significantly different’’ means.

22.13 The differences observed between dates estimated using the old method and estimates based on the new method were small enough that they could occur simply by chance if both methods produced the same results (on the average).

Question 22.14

22.14 What’s a gift worth? Do people value gifts from others more highly than they value the money it would take to buy the gift? We would like to think so because we hope that “the thought counts.’’ A survey of 209 adults asked them to list three recent gifts and then asked, “Aside from any sentimental value, if, without the giver ever knowing, you could receive an amount of money instead of the gift, what is the minimum amount of money that would make you equally happy?’’ It turned out that most people would need more money than the gift cost to be equally happy. The magic words “significant (P < 0.01)’’ appear in the report of this finding.

(a) The sample consisted of students and staff in a graduate program and of “members of the general public at train stations and airports in Boston and Philadelphia.’’ The report says this sample is “not ideal.’’ What’s wrong with the sample?
(b) In simple language, what does it mean to say that the sample thought their gifts were worth “significantly more’’ than their actual cost?
(c) Now be more specific: what does “significant (P < 0.01)’’ mean?

Question 22.15

22.15 Attending church. A 2010 Gallup Poll found that 39% of American adults say they attended religious services last week. This is almost certainly not true.

(a) Why might we expect answers to a poll to overstate true church attendance?
(b) You suspect strongly that the true percentage attending church in any given week is less than 39%. You plan to watch a random sample of adults and see whether or not they go to church. What are your null and alternative hypotheses? (Be sure to say in words what the population proportion p is for your study.)

22.15 (a) People might be embarrassed to admit that they have not attended religious services, or they might want to make a favorable impression by saying that they did. (b) We take p to be the proportion of American adults who attended religious services last week. $H_{0} : p = 0.39$ and $H_{a} : p < 0.39$ .

Question 22.16

22.16 Body temperature. We have all heard that 98.6 degrees Fahrenheit (or 37 degrees Celsius) is “normal body temperature.’’ In fact, there is evidence that most people have a slightly lower body temperature. You plan to measure the body temperature of a random sample of people very accurately. You hope to show that a majority have temperatures lower than 98.6 degrees.

(a) Say clearly what the population proportion p stands for in this setting.
(b) In terms of p, what are your null and alternative hypotheses?

Question 22.17

22.17 Unemployment. The national unemployment rate in a recent month was 5.1%. You think the rate may be different in your city, so you plan a sample survey that will ask the same questions as the Current Population Survey. To see if the local rate differs significantly from 5.1%, what hypotheses will you test?

22.17 With p as the local unemployment rate, our hypotheses are $H_{0} : p = 0.051$ and $H_{a} : p \neq 0.051$.

Question 22.18

22.18 First-year students. A UCLA survey of college freshmen in the 2014–2015 academic year found that 19.4% of all first-year college students identify themselves as politically conservative. You wonder if this percentage is different at your school, but you have no idea whether it is higher or lower. You plan a sample survey of first-year students at your school. What hypotheses will you test to see if your school differs significantly from the UCLA survey result?

Question 22.19

22.19 Do our athletes graduate? The National Collegiate Athletic Association (NCAA) requires colleges to report the graduation rates of their athletes. At one large university, 78% of all students who entered in 2004 graduated within six years. One hundred thirty-seven of the 190 students who entered with athletic scholarships graduated. Consider these 190 as a sample of all athletes who will be admitted under present policies. Is there evidence that the percentage of athletes who graduate is less than 78%?

(a) Explain in words what the parameter p is in this setting.
(b) What are the null and alternative hypotheses H₀ and H_a?
(c) What is the numerical value of the sample proportion $\hat{p}$ ? The P-value is the probability of what event?
(d) The P-value is P = 0.033. Explain why this P-value indicates there is some reason to think that graduation rates are lower among athletes than among all students.

22.19 (a) p is the graduation rate for all athletes at this university. (b) $H_{0} : p = 0.78$ and $H_{a} : p < 0.78$. (c) $\hat{p} = 137/190 = 0.7211$; the P-value is the probability that $\hat{p} \leq 0.7211$ (under the assumption that p = 0.78). (d) A P-value of 0.033 indicates that $\hat{p}$ values as extreme as 0.7211 would be somewhat rare if p were 0.78 (that is, they would occur in about 3.3% of all samples). This gives some reason to doubt the assumption that p = 0.78.
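The event in part (c) can also be evaluated directly from the binomial distribution. The following is a minimal sketch, not the text's method: it assumes the 190 athletes behave like independent trials with graduation probability 0.78 under the null hypothesis, and the function name `binomial_lower_tail` is a hypothetical helper introduced only for illustration.

```python
from math import comb

def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, graduated, p0 = 190, 137, 0.78   # sample size, graduates, H0 proportion
p_hat = graduated / n
# Probability of a sample proportion this small or smaller if p = 0.78.
p_value = binomial_lower_tail(graduated, n, p0)
print(f"p-hat = {p_hat:.4f}")
print(f"P(p-hat <= {p_hat:.4f} when p = {p0}) = {p_value:.3f}")
```

An exact binomial tail area and a Normal-approximation tail area need not agree to three decimal places, so a small gap between this output and a table-based answer is to be expected.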

Question 22.20

22.20 Using the Internet. In 2013, 81.8% of first-year college students responding to a national survey said that they used the Internet frequently for research or homework. Administrators at a large state university believe that their current first-year students use the Internet more frequently than students in 2013 did. They find that 168 of an SRS of 200 of the university’s first-year students said that they used the Internet frequently for research or homework. Is the proportion of first-year students at this university who said that they used the Internet frequently for research or homework larger than the 2013 national value of 81.8%?

(a) Explain in words what the parameter p is in this setting.
(b) What are the null and alternative hypotheses H₀ and H_a?
(c) What is the numerical value of the sample proportion $\hat{p}$ ? The P-value is the probability of what event?
(d) The P-value is P = 0.210. Explain carefully why this evidence should lead administrators to fail to reject H₀.

Question 22.21

22.21 Vote for the best face? We often judge other people by their faces. It appears that some people judge candidates for elected office by their faces. Psychologists showed head-and-shoulders photos of the two main candidates in 32 races for the U.S. Senate to many subjects (dropping subjects who recognized one of the candidates) to see which candidate was rated “more competent’’ based on nothing but the photos. On election day, the candidates whose faces looked more competent won 22 of the 32 contests. If faces don’t influence voting, half of all races in the long run should be won by the candidate with the better face. Is there evidence that the proportion of times the candidate with the better face wins is more than 50%?

(a) Explain in words what the parameter p is in this setting.
(b) What are the null and alternative hypotheses H₀ and H_a?
(c) What is the numerical value of the sample proportion $\hat{p}$ ? The P-value is the probability of what event?
(d) The P-value is P = 0.017. Explain carefully why this is reasonably good evidence that H₀ may not be true and that H_a may be true.

22.21 (a) p is the proportion of times the candidate with the better face wins. (b) $H_{0} : p = 0.50$ and $H_{a} : p > 0.50$. (c) $\hat{p} = 22/32 = 0.6875$; the P-value is the probability that $\hat{p} \geq 0.6875$ (under the assumption that p = 0.50). (d) A P-value of 0.017 indicates that $\hat{p}$ values as extreme as 0.6875 would be unlikely if p were 0.50; that is, they would occur in about 1.7% of all samples. This gives pretty good reason to doubt the assumption that p = 0.50.

Question 22.22

22.22 Do our athletes graduate? Is the result of Exercise 22.19 statistically significant at the 10% level? At the 5% level?

Question 22.23

22.23 Using the Internet. Is the result of Exercise 22.20 statistically significant at the 5% level? At the 1% level?

22.23 Because the P-value (0.210) is greater than both 0.05 and 0.01, the result is not significant at either the 5% level or the 1% level.

Question 22.24

22.24 Vote for the best face? Is the result of Exercise 22.21 statistically significant at the 5% level? At the 1% level?

Question 22.25

22.25 Significant at what level? Explain in plain language why a result that is significant at the 1% level must always be significant at the 5% level. If a result is significant at the 5% level, what can you say about its significance at the 1% level?

22.25 A result is significant at the 1% level if outcomes as extreme as or more extreme than the one observed would occur less than once in 100 times when the null hypothesis is true. A result is significant at the 5% level if such outcomes would occur less than five times in 100. Something that occurs less than once in 100 times also occurs less than five times in 100, but the opposite is not necessarily true: a result that is significant at the 5% level may or may not be significant at the 1% level.

Question 22.26

22.26 Significance means what? Asked to explain the meaning of “statistically significant at the α = 0.05 level,’’ a student says: “This means that the probability that the null hypothesis is true is less than 0.05.’’ Is this explanation correct? Why or why not?

Question 22.27

22.27 Finding a P-value by simulation. Is a new method of teaching reading to first-graders (Method B) more effective than the method now in use (Method A)? You design a matched pairs experiment to answer this question. You form 20 pairs of first-graders, with the two children in each pair carefully matched by IQ, socioeconomic status, and reading-readiness score. You assign at random one student from each pair to Method A. The other student in the pair is taught by Method B. At the end of first grade, all the children take a test to determine their reading skill. Assume that the higher the score on this test, the more proficient the student is at reading. Let p stand for the proportion of all possible matched pairs of children for which the child taught by Method B has the higher score. Your hypotheses are

H₀: p = 0.5	(no difference in effectiveness)
H_a: p > 0.5	(Method B is more effective)

The result of your experiment is that Method B gave the higher score in 12 of the 20 pairs, or $\hat{p} = 12 / 20 = 0.6$ .

(a) If H₀ is true, the 20 pairs of students are 20 independent trials with probability 0.5 that Method B “wins’’ each trial (is the more effective method). Explain how to use Table A to simulate these 20 trials if we assume for the sake of argument that H₀ is true.
(b) Use Table A, starting at line 105, to simulate 10 repetitions of the experiment. Estimate from your simulation the probability that Method B will do better (be the more effective method) in 12 or more of the 20 pairs when H₀ is true. (Of course, 10 repetitions are not enough to estimate the probability reliably. Once you understand the idea, more repetitions are easy.)
(c) Explain why the probability you simulated in part (b) is the P-value for your experiment. With enough patience, you could find all the P-values in this chapter by doing simulations similar to this one.

22.27 (a) Take 20 digits, and use the digits 0–4 for “Method A wins the trial” and 5–9 for “Method B wins the trial.” (Or vice versa, or use even digits for Method A, etc.) (b) Results will vary depending on which digits represent “Method B wins.” (c) We simulated the probability (assuming that $H_{0}$ is true) of observing results at least as extreme as those in our sample.
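With software standing in for Table A, the same simulation can be repeated far more than 10 times. The sketch below is only an illustration: it assumes the null hypothesis is true (Method B wins each pair with probability 0.5), uses Python's `random` module in place of the digit table, and the function name, seed, and number of repetitions are arbitrary choices that do not come from the text.

```python
import random
from math import comb

def simulate_p_value(n_pairs=20, observed=12, reps=20_000, seed=1):
    """Estimate P(Method B wins >= observed of n_pairs) when p = 0.5."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # One repetition: n_pairs independent trials, each won by B with probability 0.5.
        wins_b = sum(rng.random() < 0.5 for _ in range(n_pairs))
        if wins_b >= observed:
            hits += 1
    return hits / reps

estimate = simulate_p_value()
# Exact binomial tail area, for comparison with the simulated value.
exact = sum(comb(20, k) for k in range(12, 21)) / 2**20
print(f"simulated P-value = {estimate:.3f}, exact = {exact:.3f}")
```

The exact tail area is about 0.25, so 12 or more wins in 20 pairs is quite common when the two methods are equally effective; an outcome like this gives little evidence in favor of Method B.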

Question 22.28

22.28 Finding a P-value by simulation. A classic experiment to detect extra-sensory perception (ESP) uses a shuffled deck of cards containing five suits (waves, stars, circles, squares, and crosses). As the experimenter turns over each card and concentrates on it, the subject guesses the suit of the card. A subject who lacks ESP has probability 1-in-5 of being right by luck on each guess. A subject who has ESP will be right more often. Julie is right in five of 10 tries. (Actual experiments use much longer series of guesses so that weak ESP can be spotted. No one has ever been right half the time in a long experiment!)

(a) Give H₀ and H_a for a test to see if this result is significant evidence that Julie has ESP.
(b) Explain how to simulate the experiment if we assume for the sake of argument that H₀ is true.
(c) Simulate 20 repetitions of the experiment; begin at line 121 of Table A.
(d) The actual experimental result was five correct in 10 tries. What is the event whose probability is the P-value for this experimental result? Give an estimate of the P-value based on your simulation. How convincing was Julie’s performance?

The following exercises concern the optional section on calculating P-values. To carry out a test, complete the steps (hypotheses, sampling distribution, data, P-value, and conclusion) illustrated in Example 3.

Question 22.29

22.29 Using the Internet. Return to the study in Exercise 22.20, which found that 168 of 200 first-year students said that they used the Internet frequently for research or homework. Carry out the hypothesis test described in Exercise 22.20 and compute the P-value. How does your value compare with the value given in Exercise 22.20(d)?

22.29 H₀ : p = 0.818 and H_a : p > 0.818, where p is the proportion of first-year college students at this university who use the Internet frequently for research or homework. If the null hypothesis is true, then the proportion who use the Internet frequently for research or homework in an SRS of 200 students would have (approximately) a Normal distribution with mean p = 0.818 and standard deviation 0.0273. Our sample had $\hat{p} = \frac{168}{200} = 0.840$, for which the standard score is 0.8 and the P-value is 0.2119 using Table B. This result is not significant, and the P-value is roughly the same as the one given in Exercise 22.20 (0.210), which was computed using technology.
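The arithmetic behind this answer can be checked with a few lines of code. This is a sketch under the same Normal approximation, not a prescribed procedure: `normal_cdf` and `one_proportion_z` are hypothetical helper names, and the Normal cumulative probability is obtained from the error function in Python's `math` module rather than from Table B.

```python
from math import sqrt, erf

def normal_cdf(z):
    """Standard Normal cumulative probability, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def one_proportion_z(successes, n, p0):
    """Return (p_hat, sd, z) using the Normal approximation under H0: p = p0."""
    p_hat = successes / n
    sd = sqrt(p0 * (1 - p0) / n)      # sd of p-hat when H0 is true
    return p_hat, sd, (p_hat - p0) / sd

p_hat, sd, z = one_proportion_z(168, 200, 0.818)
p_value = 1 - normal_cdf(z)           # upper tail, because H_a is p > 0.818
print(f"p-hat = {p_hat:.3f}, sd = {sd:.4f}, z = {z:.2f}, P = {p_value:.3f}")
```

Rounding the standard score to 0.8 before consulting Table B gives 0.2119; keeping the unrounded score gives a P-value of about 0.210, which accounts for the small difference between the table-based and software-based values.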

Question 22.30

22.30 Interpreting scatterplots. In 2014, the Pew Research Center’s American Trends Panel sought to better understand what Americans know about science. It was observed that among a random selection of 3278 adults, 2065 adults could correctly interpret a scatterplot. Is this good evidence that more than 60% of Americans are able to correctly interpret scatterplots?

Question 22.31

22.31 Side effects. An experiment on the side effects of pain relievers assigned arthritis patients to one of several over-the-counter pain medications. Of the 420 patients who took one brand of pain reliever, 21 suffered some “adverse symptom.’’

(a) If 10% of all patients suffer adverse symptoms, what would be the sampling distribution of the proportion with adverse symptoms in a sample of 420 patients?
(b) Does the experiment provide strong evidence that fewer than 10% of patients who take this medication have adverse symptoms?

22.31 (a) If p = 0.1, then the proportion suffering adverse symptoms in an SRS of 420 patients has (approximately) a Normal distribution with mean p = 0.1 and standard deviation 0.01464. (b) We test $H_{0} : p = 0.1$ and $H_{a} : p < 0.1$ . Our sample had $\hat{p} = \frac{21}{420} = 0.05$ , for which the standard score is −3.42. Compared with Table B, we see that this is very strong evidence (P < 0.0003) that fewer than 10% of patients suffer adverse side effects from this medication.

Question 22.32

22.32 Do chemists have more girls? Some people think that chemists are more likely than other parents to have female children. (Perhaps chemists are exposed to something in their laboratories that affects the sex of their children.) The Washington State Department of Health lists the parents’ occupations on birth certificates. Between 1980 and 1990, 555 children were born to fathers who were chemists. Of these births, 273 were girls. During this period, 48.8% of all births in Washington State were girls. Is there evidence, at a significance level of 0.05, that the proportion of girls born to chemists is higher than the state proportion?

Question 22.33

22.33 Speeding. It often appears that most drivers on the road are driving faster than the posted speed limit. Situations differ, of course, but here is one set of data. Researchers studied the behavior of drivers on a rural interstate highway in Maryland where the speed limit was 55 miles per hour. They measured speed with an electronic device hidden in the pavement and, to eliminate large trucks, considered only vehicles less than 20 feet long. They found that 5690 out of 12,931 vehicles were exceeding the speed limit. Is this good evidence, at a significance level of 0.05, that (at least in this location) fewer than half of all drivers are speeding?

22.33 $H_{0} : p = 0.5$ and $H_{a} : p < 0.5$ , where p is the proportion of all drivers who are speeding. If the null hypothesis is true, then the proportion of speeders in an SRS of 12,931 drivers would have (approximately) a Normal distribution with mean p = 0.5 and standard deviation 0.004397. Our sample had $\hat{p} = \frac{5,690}{12,931} = 0.4400$ , for which the standard score is −13.6. Table B tells us that P < 0.0003. We can conclude that fewer than half of all drivers in this location are speeding.

The following exercises concern the optional section on tests for a population mean. To carry out a test, complete the steps illustrated in Example 4 or Example 5.

Question 22.34

22.34 Student attitudes. The Survey of Study Habits and Attitudes (SSHA) is a psychological test that measures students’ study habits and attitudes toward school. Scores range from 0 to 200. The mean score for U.S. college students is about 115, and the standard deviation is about 30. A teacher suspects that older students have better attitudes toward school. She gives the SSHA to 25 students who are at least 30 years old. Assume that scores in the population of older students are Normally distributed with standard deviation σ = 30. The teacher wants to test the hypotheses

H₀: μ = 115

H_a: μ > 115

(a) What is the sampling distribution of the mean score $\bar{x}$ of a sample of 25 older students if the null hypothesis is true? Sketch the density curve of this distribution. (Hint: Sketch a Normal curve first, then mark the axis using what you know about locating μ and σ on a Normal curve.)
(b) Suppose that the sample data give $\bar{x} = 118.6$ . Mark this point on the axis of your sketch. In fact, the outcome was $\bar{x} = 125.7$ . Mark this point on your sketch. Using your sketch, explain in simple language why one outcome is good evidence that the mean score of all older students is greater than 115 and why the other outcome is not.
(c) Shade the area under the curve that is the P-value for the sample result $\bar{x} = 125.7$ .

Question 22.35

22.35 Mice in a maze. Experiments on learning in animals sometimes measure how long it takes mice to find their way through a maze. The mean time is 19 seconds for one particular maze. A researcher thinks that a loud noise will cause the mice to complete the maze faster. She measures how long each of several mice takes to find its way through a maze with a noise as stimulus. What are the null hypothesis H₀ and alternative hypothesis H_a?

22.35 $H_{0} : \mu = 19$ seconds and $H_{a} : \mu < 19$ seconds, where μ is the mean time to complete the maze with the noise as a stimulus.

Question 22.36

22.36 Response time. Last year, your company’s service technicians took an average of 2.5 hours to respond to trouble calls from business customers who had purchased service contracts. Do this year’s data show a significantly different average response time? What null and alternative hypotheses should you test to answer this question?

Question 22.37

22.37 Testing a random number generator. Our statistical software has a “random number generator’’ that is supposed to produce numbers scattered at random between 0 and 1. If this is true, the numbers generated come from a population with μ = 0.5. A command to generate 100 random numbers gives outcomes with $\bar{x}$ = 0.536 and s = 0.312. Is this good evidence that the mean of all numbers produced by this software is not 0.5?

22.37 We test $H_{0} : \mu = 0.5$ versus $H_{a} : \mu \neq 0.5$. If $H_{0}$ were true, the sample mean $\bar{x}$ of an SRS of 100 numbers would have (approximately) a Normal distribution with mean 0.5 and standard deviation 0.0312. For $\bar{x} = 0.536$, the standard score is 1.2, for which P = 0.2302 using Table B. We have little reason to believe that the mean of all possible numbers produced by this software is not 0.5.

Question 22.38

22.38 Will they charge more? A bank wonders whether omitting the annual credit card fee for customers who charge at least $3000 in a year will increase the amount charged on its credit cards. The bank makes this offer to an SRS of 400 of its credit card customers. It then compares how much these customers charge this year with the amount that they charged last year. The mean increase in the sample is $246, and the standard deviation is $112. Is there significant evidence at the 1% level that the mean amount charged increases under the no-fee offer? State H₀ and H_a and carry out a significance test. Use significance level 0.01.

Question 22.39

22.39 Bad weather, bad tip? People tend to be more generous after receiving good news. Are they less generous after receiving bad news? The average tip left by adult Americans is 20%. Give 20 patrons of a restaurant a message on their bill warning them that tomorrow’s weather will be bad and record the tip percentage they leave. Here are the tips as a percentage of the total bill:

18.0 19.1 19.2 18.8 18.4 19.0
18.5 16.1 16.8 18.2 14.0 17.0
13.6 17.5 20.0 20.2 18.8 18.0
23.2 19.4

Suppose that tip percentages are Normal with σ = 2, and assume that the patrons in this study are a random sample of all patrons of this restaurant. Is there good evidence that the mean tip percentage for all patrons of this restaurant is less than 20 when they receive a message warning them that tomorrow’s weather will be bad? State H₀ and H_a and carry out a significance test. Use significance level 0.05.

22.39 We test $H_{0} : \mu = 20$ versus $H_{a} : \mu < 20$, where μ is the mean tip percent for all patrons of this restaurant who are given a message about bad weather. If $H_{0}$ were true, the sample mean $\bar{x}$ of an SRS of 20 customers would have (approximately) a Normal distribution with mean 20 and standard deviation 0.4472. For $\bar{x} = 18.19$, the standard score is −4.05. Compared with Table B, we see that this is very strong evidence (P < 0.0003) that the mean tip percent for all patrons of this restaurant who are given a message about bad weather is less than 20%.
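The numbers in this answer can be reproduced from the 20 tip percentages listed in the exercise. The sketch below is an illustration under the exercise's stated assumptions (Normal tip percentages with known σ = 2 and a random sample); the variable names are ours, and the lower-tail probability comes from the error function in Python's `math` module instead of Table B.

```python
from math import sqrt, erf

tips = [18.0, 19.1, 19.2, 18.8, 18.4, 19.0,
        18.5, 16.1, 16.8, 18.2, 14.0, 17.0,
        13.6, 17.5, 20.0, 20.2, 18.8, 18.0,
        23.2, 19.4]
mu0, sigma = 20, 2            # H0 mean and the assumed population sd

n = len(tips)
x_bar = sum(tips) / n
sd_mean = sigma / sqrt(n)     # sd of the sample mean when H0 is true
z = (x_bar - mu0) / sd_mean
p_value = 0.5 * (1 + erf(z / sqrt(2)))   # lower tail, because H_a is mu < 20
print(f"x-bar = {x_bar:.2f}, sd = {sd_mean:.4f}, z = {z:.2f}, P = {p_value:.6f}")
```

Because the alternative is one-sided in the "less than" direction, only the area to the left of the standard score counts toward the P-value.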

Question 22.40

22.40 Why should the significance level matter? On June 15, 2005, an article by Lawrence K. Altman appeared in the New York Times. The title of the article was “Studies Rebut Earlier Report on Pledges of Virginity.” The article began by stating the following: “Challenging earlier findings, two studies from the Heritage Foundation reported yesterday that young people who took virginity pledges had lower rates of acquiring sexually transmitted diseases and engaged in fewer risky sexual behaviors.” The new findings were based on the same national survey used by earlier studies and conducted by the Department of Health and Human Services. But the authors of the new study used different methods of statistical analysis from those in an earlier one that was widely publicized, making direct comparisons difficult. One particular criticism of the new study was that the result of a statistical test at a 0.10 level of significance was reported when journals generally use a lower level of 0.05. Why might this be a concern?
