Exercises

Clarifying the Concepts

Question 8.1

What specific danger exists when reporting a statistically significant difference between two means?

Question 8.2

In your own words, define the word confidence—first as you would use it in everyday conversation and then as a statistician would use it in the context of a confidence interval.

Question 8.3

Why do we calculate confidence intervals?

Question 8.4

What are the five steps to create a confidence interval for the mean of a z distribution?

Question 8.5

In your own words, define the word effect—first as you would use it in everyday conversation and then as a statistician would use it.

Question 8.6

What effect does increasing the sample size have on standard error and the test statistic?

Question 8.7

Relate effect size to the concept of overlap between distributions.

Question 8.8

What does it mean to say an effect-size statistic neutralizes the influence of sample size?

Question 8.9

What are Cohen’s guidelines for small, medium, and large effects?

Question 8.10

How does statistical power relate to Type II errors?

Question 8.11

In your own words, define the word power—first as you would use it in everyday conversation and then as a statistician would use it.

Question 8.12

How are statistical power and effect size different but related?

Question 8.13

Traditionally, what minimum percentage chance of correctly rejecting the null hypothesis is suggested in order to proceed with an experiment?

Question 8.14

Explain how increasing alpha increases statistical power.

Question 8.15

List five factors that affect statistical power. For each, indicate how a researcher can leverage that factor to increase power.

Question 8.16

What are the four basic steps of a meta-analysis?

Question 8.17

What is the goal of a meta-analysis?

Question 8.18

In statistics, concepts are often expressed in symbols and equations. For M_lower = −z(σ) + M_sample, (i) identify the incorrect symbol, (ii) state what the correct symbol is, and (iii) explain why the initial symbol was incorrect.

Question 8.19

In statistics, concepts are often expressed in symbols and equations. For [equation image], (i) identify the incorrect symbol, (ii) state what the correct symbol is, and (iii) explain why the initial symbol was incorrect.

Calculating the Statistics

Question 8.20

In 2008, the Gallup poll asked people whether or not they were suspicious of steroid use among Olympic athletes. Thirty-five percent of respondents indicated that they were suspicious when they saw an athlete break a track-and-field record, with a 4% margin of error. Calculate an interval estimate.

Question 8.21

In 2008, twenty-two percent of Gallup respondents indicated that they were suspicious of steroid use by athletes who broke world records in swimming. Calculate an interval estimate using a margin of error of 3.5%.

Question 8.22

In 2013, the Gallup polling organization and the online publication Inside Higher Ed reported the results of a survey of 831 university presidents and chancellors. The report stated: “For results based on the sample size of 831 total respondents, one can say with 95 percent confidence that the margin of error attributable to sampling error is ±3.4 percentage points” (p. 6). Fourteen percent of respondents indicated that they strongly agreed that massive open online courses (MOOCs) could have a positive impact on higher education. Construct an interval estimate for the point estimate of 14%.
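A minimal Python sketch of the interval estimate asked for in Exercises 8.20 through 8.22: the interval is the point estimate minus and plus the margin of error, both in percentage points. The function name `interval_estimate` is illustrative, and the example uses made-up poll numbers rather than the exercise data.

```python
def interval_estimate(point_estimate, margin_of_error):
    """Return (lower, upper): the point estimate minus and plus the margin of error.

    Both arguments are assumed to be in the same units (here, percentage points).
    """
    lower = point_estimate - margin_of_error
    upper = point_estimate + margin_of_error
    return lower, upper


# Hypothetical poll: 50% agree, with a reported margin of error of 3 percentage points.
low, high = interval_estimate(50, 3)
print(f"Interval estimate: {low}% to {high}%")
```

For the made-up poll this prints `Interval estimate: 47% to 53%`.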

Question 8.23

For each of the following confidence levels, indicate how much of the distribution would be placed in the cutoff region for a one-tailed z test.

  1. 80%

  2. 85%

  3. 99%

Question 8.24

For each of the following confidence levels, indicate how much of the distribution would be placed in the cutoff region for a two-tailed z test.

  1. 80%

  2. 85%

  3. 99%

Question 8.25

For each of the following confidence levels, look up the critical z value for a one-tailed z test.

  1. 80%

  2. 85%

  3. 99%

Question 8.26

For each of the following confidence levels, look up the critical z values for a two-tailed z test.

  1. 80%

  2. 85%

  3. 99%
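The cutoff regions and critical values in Exercises 8.23 through 8.26 can be checked with Python's standard-library `statistics.NormalDist` (Python 3.8 or later). In this sketch, whose function names are illustrative, a one-tailed test places all of alpha = 1 − confidence in one tail, while a two-tailed test places half of alpha in each tail and uses the cutoffs −z and +z. The 90% and 95% levels in the example are hypothetical.

```python
from statistics import NormalDist


def cutoff_proportion(confidence, tails):
    """Proportion of the distribution in EACH cutoff region (tail)."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    alpha = 1 - confidence
    return alpha / tails  # a two-tailed test splits alpha evenly between the tails


def critical_z(confidence, tails):
    """Positive critical z; for a two-tailed test the cutoffs are -z and +z."""
    tail = cutoff_proportion(confidence, tails)
    return NormalDist().inv_cdf(1 - tail)


# Hypothetical confidence levels (not the ones in the exercises above).
for level in (0.90, 0.95):
    for tails in (1, 2):
        print(f"{level:.0%}, {tails}-tailed: {cutoff_proportion(level, tails):.3f} "
              f"per cutoff region, critical z = {critical_z(level, tails):.3f}")
```

Values read from a printed z table may differ in the last decimal place because of rounding.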

Question 8.27

Calculate the 95% confidence interval for the following fictional data regarding daily TV viewing habits: μ = 4.7 hours; σ = 1.3 hours; sample of 78 people, with a mean of 4.1 hours.

Question 8.28

Calculate the 80% confidence interval for the same fictional data regarding daily TV viewing habits: μ = 4.7 hours; σ = 1.3 hours; sample of 78 people, with a mean of 4.1 hours.

Question 8.29

Calculate the 99% confidence interval for the same fictional data regarding daily TV viewing habits: μ = 4.7 hours; σ = 1.3 hours; sample of 78 people, with a mean of 4.1 hours.
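A sketch of the confidence-interval calculation behind Exercises 8.27 through 8.29, using M_lower = −z(σ_M) + M_sample and M_upper = z(σ_M) + M_sample, with σ_M = σ/√N. Note that the population mean μ is not an input: the interval is centered on the sample mean, and checking whether μ falls inside it is the separate hypothesis-testing step. The function name and the data in the example are made up.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population standard deviation is known.

    The interval is centered on the sample mean and uses the standard error
    sigma / sqrt(n), not sigma itself.
    """
    standard_error = sigma / sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed critical z
    lower = -z * standard_error + sample_mean
    upper = z * standard_error + sample_mean
    return lower, upper


# Hypothetical data: sigma = 10, N = 25, sample mean = 52.
low, high = z_confidence_interval(52, 10, 25, confidence=0.95)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

With these made-up values the sketch prints `95% CI: [48.08, 55.92]`; raising the confidence level widens the interval.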

Question 8.30

Calculate the standard error for each of the following sample sizes when μ = 1014 and σ = 136:

  1. 12

  2. 39

  3. 188

Question 8.31

For a given variable, imagine we know that the population mean is 1014 and the standard deviation is 136. A sample mean of 1057 is obtained. Calculate the z statistic for this mean, using each of the following sample sizes:

  1. 12

  2. 39

  3. 188
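A sketch for Exercises 8.30 and 8.31 showing how the standard error, σ_M = σ/√N, and the z statistic, z = (M − μ)/σ_M, respond to sample size. The function names, population values, and sample mean below are hypothetical.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of N."""
    return sigma / sqrt(n)


def z_statistic(sample_mean, mu, sigma, n):
    """z statistic for a sample mean: (M - mu) / sigma_M."""
    return (sample_mean - mu) / standard_error(sigma, n)


# Hypothetical values: mu = 100, sigma = 15, and the same sample mean of 106
# at three different sample sizes.
for n in (9, 25, 100):
    print(f"N = {n:>3}: standard error = {standard_error(15, n):.2f}, "
          f"z = {z_statistic(106, 100, 15, n):.2f}")
```

The mean difference is 6 points at every N, yet z climbs from 1.20 to 4.00 because the standard error shrinks as N grows.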

Question 8.32

Calculate the effect size for the mean of 1057 observed in Exercise 8.31 where μ = 1014 and σ = 136.

Question 8.33

Calculate the effect size for each of the following average SAT math scores. Remember, SAT math is standardized such that μ = 500 and σ = 100.

  1. Sixty-one people sampled have a mean of 480.

  2. Eighty-two people sampled have a mean of 520.

  3. Six people sampled have a mean of 610.
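Cohen's d for a single sample, as used in Exercises 8.32 and 8.33, divides the difference between the sample mean and the population mean by the population standard deviation σ rather than by the standard error. The sketch below assumes that formula, d = (M − μ)/σ; the function name and scale values are hypothetical.

```python
def cohens_d(sample_mean, mu, sigma):
    """Cohen's d for one sample compared with a population: (M - mu) / sigma.

    Uses the population standard deviation, not the standard error, so N
    does not appear anywhere in the formula.
    """
    return (sample_mean - mu) / sigma


# Hypothetical scale with mu = 100 and sigma = 15.
print(f"d = {cohens_d(106, 100, 15):.2f}")
```

It prints `d = 0.40`; because N never enters the formula, the result would be the same for a sample of 6 or 600.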

Question 8.34

For each of the effect-size calculations in Exercise 8.33, identify the size of the effect using Cohen’s guidelines. Remember, for SAT math, μ = 500 and σ = 100.

  1. Sixty-one people sampled have a mean of 480.

  2. Eighty-two people sampled have a mean of 520.

  3. Six people sampled have a mean of 610.

Question 8.35

For each of the following d values, identify the size of the effect, using Cohen’s guidelines.

  1. d = 0.79

  2. d = −0.43

  3. d = 0.22

  4. d = −0.04

Question 8.36

For each of the following d values, identify the size of the effect, using Cohen’s guidelines.

  1. d = 1.22

  2. d = −1.22

  3. d = 0.13

  4. d = −0.13
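A sketch that applies Cohen's conventions (small = 0.2, medium = 0.5, large = 0.8) to d values like those in Exercises 8.34 through 8.36. Two assumptions are built in: the sign is ignored because it only shows the direction of the effect, and each benchmark is treated as a lower bound. Cohen offered these as rough guidelines, so a value just under a cutoff may reasonably be described as close to the next size up. The function name, the label for values under 0.2, and the example d values are hypothetical.

```python
def cohen_label(d):
    """Classify d with Cohen's conventions (0.2 small, 0.5 medium, 0.8 large).

    The absolute value is classified because the sign only shows direction.
    Treating each benchmark as a lower bound is an assumption of this sketch.
    """
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "smaller than small"


# Hypothetical d values, not the ones in the exercises.
for d in (0.1, -0.3, 0.6, -0.9):
    print(f"d = {d:+.2f}: {cohen_label(d)}")
```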

Question 8.37

For each of the following z statistics, calculate the p value for a two-tailed test.

  1. 2.23

  2. −1.82

  3. 0.33
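Exercise 8.37 appears to ask for the two-tailed p value of each z statistic. Assuming so, a two-tailed p value is twice the area beyond |z| in one tail of the standard normal curve, which this sketch computes with Python's `statistics.NormalDist`. The function name and the z values in the example are hypothetical.

```python
from statistics import NormalDist


def two_tailed_p(z):
    """Two-tailed p value: the area beyond |z| in both tails of the standard normal curve."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * upper_tail


# Hypothetical z statistics.
for z in (1.96, -2.50):
    print(f"z = {z:+.2f}: p = {two_tailed_p(z):.4f}")
```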

Question 8.38

A meta-analysis reports an average effect size of d = 0.11, with a confidence interval of d = 0.08 to d = 0.14.

  1. Would a hypothesis test (assessing the null hypothesis that the average effect size is 0) lead us to reject the null hypothesis? Explain.

  2. Use Cohen’s conventions to describe the average effect size of d = 0.11.

Question 8.39

A meta-analysis reports an average effect size of d = 0.11, with a confidence interval of d = −0.06 to d = 0.28. Would a hypothesis test (assessing the null hypothesis that the average effect size is 0) lead us to reject the null hypothesis? Explain.

Question 8.40

Assume you are conducting a meta-analysis over a set of five studies. The effect sizes for each study follow: d = 0.67; d = 0.03; d = 0.32; d = 0.59; d = 0.22.

  1. Calculate the mean effect size for these studies.

  2. Use Cohen’s conventions to describe the mean effect size you calculated in part 1.

Question 8.41

Assume you are conducting a meta-analysis over a set of five studies. The effect sizes for each study follow: d = 1.23; d = 1.08; d = −0.35; d = 0.88; d = 1.69.

  1. Calculate the mean effect size for these studies.

  2. Use Cohen’s conventions to describe the mean effect size you calculated in part 1.
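A sketch for Exercises 8.38 through 8.41. The mean effect size here is a simple unweighted average of the d values; published meta-analyses usually weight studies (Exercise 8.59 quotes a “random effects weighted average”), so treat this as the simplified version these exercises describe. The second function encodes the rule that a confidence interval for d that does not include 0 corresponds to rejecting the null hypothesis that the average effect size is 0. Function names and example values are hypothetical.

```python
from statistics import mean


def mean_effect_size(d_values):
    """Simple (unweighted) mean of the studies' d values."""
    return mean(d_values)


def ci_excludes_zero(lower, upper):
    """True when a confidence interval for d lies entirely above or entirely below 0,
    which corresponds to rejecting the null hypothesis that the average effect is 0."""
    return lower > 0 or upper < 0


# Hypothetical studies and intervals.
print(f"mean d = {mean_effect_size([0.20, 0.40, 0.60]):.2f}")
print(ci_excludes_zero(0.05, 0.20))   # True: 0 is outside the interval
print(ci_excludes_zero(-0.10, 0.30))  # False: 0 is inside the interval
```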

Applying the Concepts

Question 8.42

Margin of error and adult education: According to a report by Public Agenda and the Kresge Foundation, online education is popular among adults planning to return to university (2013). “The majority (73 percent) of adult prospective students want to take at least some classes online, and nearly 4 in 10 (37 percent) say it is absolutely essential for them that their future school offer online classes” (p. 25). The margin of error was reported to be 4.27. Calculate an interval estimate for each of these findings.

Question 8.43

Distributions and the Burakumin: A friend reads in her Introduction to Psychology textbook about a minority group in Japan, the Burakumin, who are racially the same as other Japanese people but are viewed as outcasts because their ancestors were employed in positions that involved the handling of dead animals (e.g., butchers). In Japan, the text reported, mean IQ scores of Burakumin were 10 to 15 points below mean IQ scores of other Japanese people. In the United States, where Burakumin experienced no discrimination, there was no mean difference (from Ogbu, 1986, as reported in Hockenbury & Hockenbury, 2013). Your friend says to you: “Wow—when I taught English in Japan last summer, I had a Burakumin student. He seemed smart; perhaps I was fooled.” What should your friend consider about the two distributions, the one for Burakumin people and the one for other Japanese people?

Question 8.44

Sample size, z statistics, and the Consideration of Future Consequences scale: Here are summary data from a z test regarding scores on the Consideration of Future Consequences scale (Petrocelli, 2003): The population mean (μ) is 3.20 and the population standard deviation (σ) is 0.70. Imagine that a sample of students had a mean of 3.45.

  1. Calculate the test statistic for a sample of 5 students.

  2. Calculate the test statistic for a sample of 1,000 students.

  3. Calculate the test statistic for a sample of 1,000,000 students.

  4. Explain why the test statistic varies so much even though the population mean, population standard deviation, and sample mean do not change.

  5. Why might sample size pose a problem for hypothesis testing and the conclusions we are able to draw?

Question 8.45

Sample size, z statistics, and the Graded Naming Test: In an exercise in Chapter 7, we asked you to conduct a z test to ascertain whether the Graded Naming Test (GNT) scores for Canadian participants differed from the GNT norms based on adults in England. We also used these data in the How It Works section of this chapter. The mean for a sample of 30 adults in Canada was 17.5. The normative mean for adults in England is 20.4, and we assumed a population standard deviation of 3.2. With 30 participants, the z statistic was −4.97, and we were able to reject the null hypothesis.

  1. Calculate the test statistic for 3 participants. How does the test statistic change compared to when N of 30 was used? Conduct step 6 of hypothesis testing. Does your conclusion change? If so, does this mean that the actual difference between groups changed? Explain.

  2. Conduct steps 3, 5, and 6 for 100 participants. How does the test statistic change?

  3. Conduct steps 3, 5, and 6 for 20,000 participants. How does the test statistic change?

  4. What is the effect of sample size on the test statistic?

  5. As the test statistic changes, has the underlying difference between groups changed? Why might this present a problem for hypothesis testing?

Question 8.46

Cheating with hypothesis testing: Unsavory researchers know that one can cheat with hypothesis testing. That is, they know that a researcher can stack the deck in her or his favor, making it easier to reject the null hypothesis.

  1. If you wanted to make it easier to reject the null hypothesis, what are three specific things you could do?

  2. Would it change the actual difference between the samples? Why is this a potential problem with hypothesis testing?

Question 8.47

Overlapping distributions and the LSATs: A Midwestern U.S. university reported that its behavioral science majors tended to outperform its humanities majors on the LSAT standardized test for law school admissions. Sadie, an English major, and Kofi, a sociology major, both just took the LSAT.

  1. Can we tell which student will do better on the LSAT? Explain your answer.

  2. Draw a picture that represents what the two distributions, that for social science majors and that for humanities majors at this institution, might look like with respect to one another.

Question 8.48

Confidence intervals, effect sizes, and tennis serves: Let’s assume the average speed of a serve in men’s tennis is around 135 mph, with a standard deviation of 6.5 mph. Because these statistics are calculated over many years and many players, we will treat them as population parameters. We develop a new training method that, we hope, will increase arm strength, the force of the tennis swing, and the speed of the serve. We recruit 9 professional tennis players to use our method. After 6 months, we test the speed of their serves and compute an average of 138 mph.

  1. Using a 95% confidence interval, test the hypothesis that our method makes a difference.

  2. Compute the effect size and describe its strength.

  3. Calculate statistical power using an alpha of 0.05, or 5%, and a one-tailed test.

  4. Calculate statistical power using an alpha of 0.10, or 10%, and a one-tailed test.

  5. Explain how power is affected by alpha in the calculations in parts 3 and 4.
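A sketch of the power calculation in Exercise 8.48 and similar exercises, assuming, as these exercises appear to, that the sample mean is our best estimate of the mean of the population the sample comes from. The steps: find σ_M; convert the one-tailed critical z to a raw-score cutoff, μ + z(σ_M) for an expected increase or μ − z(σ_M) for an expected decrease; then find the proportion of a curve centered on the sample mean that falls beyond that cutoff. The function name, its `direction` parameter, and the numbers in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_power(mu, sigma, n, sample_mean, alpha=0.05, direction="upper"):
    """Power of a one-tailed z test, treating sample_mean as the mean of the
    population the sample comes from (an assumption, not a known value)."""
    std = NormalDist()
    standard_error = sigma / sqrt(n)
    z_crit = std.inv_cdf(1 - alpha)  # one-tailed critical z
    if direction == "upper":
        cutoff = mu + z_crit * standard_error  # raw-score cutoff under H0
        # proportion of the curve centered on sample_mean that lies above the cutoff
        return 1 - std.cdf((cutoff - sample_mean) / standard_error)
    if direction == "lower":
        cutoff = mu - z_crit * standard_error
        # proportion of the curve centered on sample_mean that lies below the cutoff
        return std.cdf((cutoff - sample_mean) / standard_error)
    raise ValueError("direction must be 'upper' or 'lower'")


# Hypothetical values: mu = 100, sigma = 15, N = 25, sample mean = 106.
for alpha in (0.05, 0.10):
    power = one_tailed_power(100, 15, 25, 106, alpha=alpha)
    print(f"alpha = {alpha:.2f}: power = {power:.3f}")
```

With these made-up values, power is about 0.64 at alpha = 0.05 and about 0.76 at alpha = 0.10: a larger alpha moves the cutoff closer to μ, so more of the curve centered on the sample mean lies beyond it.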

Question 8.49

Confidence intervals and football wins: In an exercise in Chapter 7, we asked whether college football teams tend to be more likely or less likely to be mismatched in the upper National Collegiate Athletic Association (NCAA) divisions. During one week of a college football season, the population of 53 Football Bowl Subdivision (FBS) games had a mean spread (winning score minus losing score) of 16.189, with a standard deviation of 12.128. We took a sample of 4 games that were played that week in the next-highest league, the Football Championship Subdivision (FCS), to see whether the spread was different; one of the many leagues within FCS, the Patriot League, played 4 games that weekend. Their mean was 8.75.

  1. Calculate the 95% confidence interval for this sample.

  2. State in your own words what we learn from this confidence interval.

  3. What information does the confidence interval give us that we also get from a hypothesis test?

  4. What additional information does the confidence interval give us that we do not get from a hypothesis test?

Question 8.50

Confidence intervals and football wins (continued): Using the football data presented in Exercise 8.49, practice evaluating data using confidence intervals.

  1. Compute the 80% confidence interval.

  2. How do the conclusion and the confidence interval change as you move from 95% confidence to 80% confidence?

  3. Why don’t we talk about having 100% confidence?

Question 8.51

Effect size and football wins: In Exercises 8.49 and 8.50, we considered the study of one week of a 2006 college football season, during which the population of 53 FBS games had a mean spread (winning score minus losing score) of 16.189, with a standard deviation of 12.128. The sample of 4 games that were played that week in the next-highest league, the FCS, had a mean of 8.75.

  1. Calculate the appropriate measure of effect size for this sample.

  2. Based on Cohen’s conventions, is this a small, medium, or large effect?

  3. Why is it useful to have this information in addition to the results of a hypothesis test?

Question 8.52

Effect size and football wins (continued): In Exercise 8.51, you calculated an effect size for data from one week of a 2006 college football season with 4 games. Imagine that you had a sample of 20 games. How would the effect size change? Explain why it would or would not change.

Question 8.53

Confidence intervals, effect sizes, and Valentine’s Day spending: According to the Nielsen Company, Americans spend $345 million on chocolate during the week of Valentine’s Day. Let’s assume that we know the average married person spends $45, with a population standard deviation of $16. In February 2009, the U.S. economy was in the throes of a recession. Comparing data for Valentine’s Day spending in 2009 with what is generally expected might give us some indication of the attitudes during the recession.

  1. Compute the 95% confidence interval for a sample of 18 married people who spent an average of $38.

  2. How does the 95% confidence interval change if the sample mean is based on 180 people?

  3. If you were testing a hypothesis that things had changed under the financial circumstances of 2009 as compared to previous years, what conclusion would you draw in part 1 versus part 2?

  4. Compute the effect size based on these data and describe the size of the effect.

Question 8.54

More about confidence intervals, effect sizes, and tennis serves: Let’s assume the average speed of a serve in women’s tennis is around 118 mph, with a standard deviation of 12 mph. We recruit 100 amateur tennis players to use our new training method this time, and after 6 months we calculate a group mean of 123 mph.

  1. Using a 95% confidence interval, test the hypothesis that our method makes a difference.

  2. Compute the effect size and describe its strength.

Question 8.55

Confidence intervals, effect sizes, and tennis serves (continued): As in the previous exercise, assume the average speed of a serve in women’s tennis is around 118 mph, with a standard deviation of 12 mph. But now we recruit only 26 amateur tennis players to use our method. Again, after 6 months we calculate a group mean of 123 mph.

  1. Using a 95% confidence interval, test the hypothesis that our method makes a difference.

  2. Compute the effect size and describe its strength.

  3. How did changing the sample size from 100 (in Exercise 8.54) to 26 affect the confidence interval and effect size? Explain your answer.

Question 8.56

Statistical power and football wins: In several exercises in this chapter, we considered the study of one week of a college football season, during which the population of 53 FBS games had a mean spread (winning score minus losing score) of 16.189, with a standard deviation of 12.128. The sample of 4 games that were played that week in the next-highest league, the FCS, had a mean of 8.75.

  1. Calculate statistical power for this study using a one-tailed test and a p level of 0.05.

  2. What does the statistical power suggest about how we should view the findings of this study?

  3. Using G*Power or an online power calculator, calculate statistical power for this study for a one-tailed test with a p level of 0.05.

Question 8.57

Statistical power and tennis serves: Calculate statistical power based on the data presented in Exercise 8.55 using the following alpha levels in a one-tailed test:

  1. Alpha of 0.05, or 5%

  2. Alpha of 0.10, or 10%

  3. Explain how power is affected by alpha in these calculations.

Question 8.58

Effect size and homeless families: A New York Times article reported on the growing problem of homelessness among families (Bellafante, 2013). The reporter wrote that families in a city-run program called Homebase had shorter stays than families not in the program—a difference of about 22.6 fewer nights in a shelter. However, the reporter observed, “Though this is a statistically significant result, it is hardly an impressive one, especially in light of the fact that the average stay for a family in the shelter system is now 13 months, up from 9 months in 2011, and the city is experiencing record levels of homelessness with 50,000 people, including 21,000 children, in shelters every night.”

  1. How is the reporter’s observation about the size of the result—“hardly an impressive one”—related to the concept of effect size?

  2. Imagine that a friend who has not taken statistics asks you to explain the difference between a statistically significant result and a large or “impressive” effect. In your own words, how would you explain this difference to your friend?

Question 8.59

Meta-analysis, mental health treatments, and cultural contexts: A meta-analysis examined studies that compared two types of mental health treatments for ethnic and racial minorities—the standard available treatments and treatments that were adapted to the clients’ cultures (Griner & Smith, 2006). An excerpt from the abstract follows:

Many previous authors have advocated traditional mental health treatments be modified to better match clients’ cultural contexts. Numerous studies evaluating culturally adapted interventions have appeared, and the present study used meta-analytic methodology to summarize these data. Across 76 studies the resulting random effects weighted average effect size was d = .45, indicating a. . . benefit of culturally adapted interventions. (p. 531)

  1. What is the topic chosen by the researchers conducting the meta-analysis?

  2. What type of effect size statistic did the researchers calculate for each study in the meta-analysis?

  3. What was the mean effect size? According to Cohen’s conventions, how large is this effect?

  4. If a study chosen for the meta-analysis did not include an effect size, what summary statistics could the researchers use to calculate an effect size?

Question 8.60

Meta-analysis, mental health treatments, and cultural contexts (continued): The research paper on culturally targeted therapy described in Exercise 8.59 reported the following:

Across all 76 studies, the random effects weighted average effect size was d = .45 (SE = .04, p < .0001), with a 95% confidence interval of d = .36 to d = .53. The data consisted of 72 nonzero effect sizes, of which 68 (94%) were positive and 4 (6%) were negative. Effect sizes ranged from d = −.48 to d = 2.7. (Griner & Smith, 2006, p. 535)

  1. What is the confidence interval for the effect size?

  2. Based on the confidence interval, would a hypothesis test lead us to reject the null hypothesis that the effect size is zero? Explain.

  3. Why would a graph, such as a histogram, be useful when conducting a meta-analysis like this one? (Hint: Consider the problems when using a mean as the measure of central tendency.)

Putting It All Together

Question 8.61

Fantasy baseball: Your roommate is reading Fantasyland: A Season on Baseball’s Lunatic Fringe (Walker, 2006) and is intrigued by the statistical methods used by competitors in fantasy baseball leagues (in which competitors select a team of baseball players from across all major league teams, winning in the fantasy league if their eclectic roster of players outperforms the chosen mixes of other fantasy competitors). Among the many statistics reported in the book is a finding that Major League Baseball (MLB) players who have a third child show more of a decline in performance than players who have a first child or a second child. Your friend remembers that Red Sox player David Ortiz has three children and drops him from consideration for his fantasy team.

  1. Explain to your friend why a difference between means doesn’t provide information about any specific player. Include a drawing of overlapping curves as part of your answer. On the drawing, mark places on the x-axis that might represent a player from the distribution of those who recently had a third child (mark with an X) scoring above a player from the distribution of those who recently had a first or second child (mark with a Y).

  2. Explain to your friend that a statistically significant difference doesn’t necessarily indicate a large effect size. How might a measure of effect size, such as Cohen’s d, help us understand the importance of these findings and compare them to other predictors of performance that might have larger effects?

  3. Given that the reported association is true, can we conclude that having a third child causes a decline in performance? Explain your answer. What confounding variables might lead to the difference observed in this study?

  4. Given the relatively limited numbers of MLB players (and the relatively limited numbers of those who recently had a child—whether first, second, or third), what general guess would you make about the likely statistical power of this analysis?

Question 8.62

Hours of sleep: The table below provides information about hours of sleep.

Mean of population 1 (from which the sample comes): 14.9 hours of sleep
Sample size: 37 infants
Mean of population 2: 16 hours of sleep
Standard deviation of the population: 1.7 hours of sleep
Standard error: σ_M = σ/√N = 1.7/√37 ≈ 0.28 hours of sleep

  1. Calculate statistical power for a one-tailed test (alpha = 0.05, or 5%) aimed at determining if those in the sample sleep fewer hours, on average, than those in the population.

  2. Recalculate statistical power with alpha of 0.01, or 1%. Explain why changing alpha affects power. Explain why we should not use a larger alpha to increase power.

  3. Without performing any computations, describe how statistical power is affected by performing a two-tailed test for this example. Why are two-tailed tests recommended over one-tailed tests?

  4. The easiest way to affect the outcome of a hypothesis test is to increase sample size. Similarly, true results may sometimes be missed because a sufficient sample was not used in the research. Perform the hypothesis test on these data with a sample of 37. Then perform the same hypothesis test but assume that the mean was based on only 4 infants.

  5. The easiest way to increase statistical power is to increase sample size. Similarly, statistical power decreases with a smaller sample size. For these data, compute the statistical power of the one-tailed statistical test with alpha of 0.05 when N is 4. How does that value compare to when N was 37?
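Parts 4 and 5 of Exercise 8.62 concern how sample size drives power. This sketch uses an algebraically equivalent one-line form of the lower-tailed power calculation, power = Φ((μ − M)/σ_M − z_crit), where Φ is the standard normal cumulative proportion and M is the expected (sample) mean; it matches finding the raw-score cutoff μ − z_crit(σ_M) and then the area below it. The function name and example values are hypothetical, not the sleep data.

```python
from math import sqrt
from statistics import NormalDist


def lower_tailed_power(mu, sigma, n, expected_mean, alpha=0.05):
    """Power of a one-tailed (lower) z test, written as one standardized formula."""
    std = NormalDist()
    standard_error = sigma / sqrt(n)
    z_crit = std.inv_cdf(1 - alpha)  # one-tailed critical z
    # Distance from mu to the expected mean in standard-error units, minus z_crit
    return std.cdf((mu - expected_mean) / standard_error - z_crit)


# Hypothetical values: mu = 100, sigma = 15, expected mean = 94.
for n in (4, 25, 100):
    print(f"N = {n:>3}: power = {lower_tailed_power(100, 15, n, 94):.3f}")
```

Holding the means and σ fixed, power rises from about 0.20 at N = 4 to about 0.64 at N = 25 and 0.99 at N = 100.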

Question 8.63

Effect size and an intervention to increase college applications: Caroline Hoxby and Sarah Turner (2013) conducted an experiment to determine whether a simple intervention could increase the number of college applications among low-income students. The intervention consisted of information about the college application process and about college costs that were specific to the student, along with an easy-to-implement waiver of college application fees. The following is an excerpt from a table. The intervention had a statistically significant effect on this variable at a p level of 0.01.

Dependent variable: Number of applications submitted
Effect in percentage change: 19.0%
Effect in effect size: 0.247

  1. Describe the sample and population of this study.

  2. What is the independent variable and what are its levels?

  3. What is the dependent variable?

  4. The finding was statistically significant. Why is this not sufficient to determine that this intervention, which costs about $6 per student, is worthwhile?

  5. What is the effect size for the dependent variable? How large is it, according to Cohen’s conventions?

  6. What does this effect size mean in terms of standard deviations in the context of this study?

  7. The researchers also included the effect in percentage change. Explain what this means in the context of this study.