Chapter 7 Exercises

Clarifying the Concepts

Question 7.1

What is a percentile?

Question 7.2

When we look up a z score on the z table, what information can we report?

Question 7.3

How do we calculate the percentage of scores below a particular positive z score?

Question 7.4

How is calculating a percentile for a mean from a distribution of means different from doing so for a score from a distribution of scores?

Question 7.5

In statistics, what do we mean by assumptions?

Question 7.6

What sample size is recommended in order to meet the assumption of a normal distribution of means, even when the underlying population of scores is not normal?

Question 7.7

What is the difference between parametric tests and nonparametric tests?

Question 7.8

What are the six steps of hypothesis testing?

Question 7.9

What are critical values and the critical region?

Question 7.10

What is the standard size of the critical region used by most statisticians?

Question 7.11

What does statistically significant mean to statisticians?

Question 7.12

What do these symbolic expressions mean: H0: μ1 = μ2 and H1: μ1 ≠ μ2?

Question 7.13

Using everyday language rather than statistical language, explain why the words critical region might have been chosen to define the area in which a z statistic must fall in order for a researcher to reject the null hypothesis.

Question 7.14

Using everyday language rather than statistical language, explain why the word cutoff might have been chosen to define the point beyond which we reject the null hypothesis.

Question 7.15

What is the difference between a one-tailed hypothesis test and a two-tailed hypothesis test in terms of critical regions?

Question 7.16

Why do researchers typically use a two-tailed test rather than a one-tailed test?

Question 7.17

Write the symbols for the null hypothesis and research hypothesis for a one-tailed test.

Question 7.18

What are three kinds of dirty data and what are their possible sources?

Question 7.19

What are three ways to deal with missing data?

Question 7.20

How can data that are misleading result in missing data?

Calculating the Statistic

Question 7.21

Calculate the following percentages for a z score of 0.74, with a tail of 22.96%:

  1. What percentage of scores falls below this z score?

  2. What percentage of scores falls between the mean and this z score?

  3. What proportion of scores falls below a z score of −0.74?

Question 7.22

Using the z table in Appendix B, calculate the following percentages for a z score of −0.08:

  1. Above this z score

  2. Below this z score

  3. At least as extreme as this z score

Question 7.23

Using the z table in Appendix B, calculate the following percentages for a z score of 1.71:

  1. Above this z score

  2. Below this z score

  3. At least as extreme as this z score
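For readers who want to check a table lookup with software, the sketch below computes the same quantities a z table reports. It is a minimal illustration and not part of the exercises: the function name `z_table_percentages` and the example value z = 1.00 are assumptions chosen for illustration, and Python's standard-library `statistics.NormalDist` stands in for the z table in Appendix B.

```python
from statistics import NormalDist

def z_table_percentages(z):
    """Return the percentages a z table lookup reports for one z score."""
    below = NormalDist().cdf(z) * 100    # percentage of scores below z
    above = 100 - below                  # percentage of scores above z
    between = abs(below - 50)            # percentage between the mean and z
    extreme = 2 * min(below, above)      # at least as extreme, both tails combined
    return {
        "below": below,
        "above": above,
        "between_mean_and_z": between,
        "at_least_as_extreme": extreme,
    }

# Illustrative value only (not one of the exercise values).
result = z_table_percentages(1.00)
for name, pct in result.items():
    print(f"{name}: {pct:.2f}%")
```

Software values may differ from a printed table in the last decimal place because the table rounds each entry.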

Question 7.24

Rewrite each of the following percentages as probabilities, or p levels:

  1. 5%

  2. 83%

  3. 51%

Question 7.25

Rewrite each of the following probabilities, or p levels, as percentages:

  1. 0.19

  2. 0.04

  3. 0.92

Question 7.26

If the critical values for a hypothesis test occur where 2.5% of the distribution is in each tail, what are the cutoffs for z?

Question 7.27

For each of the following p levels, what percentage of the data will be in each critical region for a two-tailed test?

  1. 0.05

  2. 0.10

  3. 0.01

Question 7.28

State the percentage of scores in a one-tailed critical region for each of the following p levels:

  1. 0.05

  2. 0.10

  3. 0.01

Question 7.29

You are conducting a z test on a sample of 50 people with an average SAT verbal score of 542 (assume we know the population mean to be 500 and the standard deviation to be 100). Calculate the mean and the spread of the comparison distribution (μM and σM).

Question 7.30

You are conducting a z test on a sample of 132 people for whom you observed a mean SAT verbal score of 490. The population mean is 500, and the standard deviation is 100. Calculate the mean and the spread of the comparison distribution (μM and σM).
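The two exercises above rest on the same pair of relations: the mean of the distribution of means equals the population mean, and the standard error is the population standard deviation divided by the square root of the sample size. The sketch below shows that arithmetic in Python. The function name `comparison_distribution` and the numbers used are hypothetical and are not taken from the exercises.

```python
import math

def comparison_distribution(mu, sigma, n):
    """Mean and standard error of the distribution of means for samples of size n."""
    mu_m = mu                        # mean of the distribution of means equals the population mean
    sigma_m = sigma / math.sqrt(n)   # standard error: sigma divided by the square root of n
    return mu_m, sigma_m

# Hypothetical population and sample size (assumptions for illustration).
mu_m, sigma_m = comparison_distribution(mu=100, sigma=15, n=25)
print(mu_m, sigma_m)
```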

Question 7.31

If the cutoffs for a z test are −1.96 and 1.96, determine whether you would reject or fail to reject the null hypothesis in each of the following cases:

  1. z = 1.06

  2. z = −2.06

  3. A z score beyond which 7% of the data fall in each tail

Question 7.32

If the cutoffs for a z test are −2.58 and 2.58, determine whether you would reject or fail to reject the null hypothesis in each of the following cases:

  1. z = −0.94

  2. z = 2.12

  3. A z score for which 49.6% of the data fall between z and the mean

185

Question 7.33

Use the cutoffs of −1.65 and 1.65 and a p level of approximately 0.10, or 10%. For each of the following values, determine whether you would reject or fail to reject the null hypothesis:

  1. z = 0.95

  2. z = −1.77

  3. A z statistic that 2% of the scores fall above
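The decision rule used in the three exercises above can be written in a few lines. The sketch below is a minimal Python illustration under stated assumptions: the helper names `decide` and `z_from_upper_tail` are hypothetical, the test is treated as two-tailed with symmetric cutoffs, and the example values are deliberately not the exercise values.

```python
from statistics import NormalDist

def decide(z, cutoff):
    """Two-tailed decision: reject the null hypothesis when z falls beyond either cutoff."""
    return "reject" if abs(z) > cutoff else "fail to reject"

def z_from_upper_tail(tail_pct):
    """Return the z score that has tail_pct percent of the distribution above it."""
    return NormalDist().inv_cdf(1 - tail_pct / 100)

# Hypothetical values for illustration.
print(decide(2.30, 1.96))
print(decide(-1.20, 1.96))

z = z_from_upper_tail(1.0)   # a z score with 1% of scores above it
print(round(z, 2), decide(z, 1.96))
```

When a problem describes a z score by the percentage in the tail, or by the percentage between the mean and z, the first step is always to convert that description to a z score; only then can it be compared with the cutoffs.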

Question 7.34

You are conducting a z test on a sample for which you observe a mean weight of 150 pounds. The population mean is 160, and the standard deviation is 100.

  1. Calculate a z statistic for a sample of 30 people.

  2. Repeat part 1 for a sample of 300 people.

  3. Repeat part 1 for a sample of 3000 people.
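The z statistic for a sample mean divides the difference between the sample mean and the population mean by the standard error. The sketch below illustrates this in Python; the function name `z_statistic` and all numbers are hypothetical, chosen so the arithmetic is easy to follow rather than to match the exercise.

```python
import math

def z_statistic(sample_mean, mu, sigma, n):
    """z statistic for a sample mean, using the standard error sigma / sqrt(n)."""
    sigma_m = sigma / math.sqrt(n)
    return (sample_mean - mu) / sigma_m

# Hypothetical values: the same 3-point difference from the population mean,
# evaluated at two different sample sizes.
z_small = z_statistic(sample_mean=103, mu=100, sigma=15, n=25)
z_large = z_statistic(sample_mean=103, mu=100, sigma=15, n=100)
print(z_small, z_large)
```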

Question 7.35

For each of the following, indicate whether or not the situation describes misleading data that the researcher may decide to investigate and potentially discard.

  1. A sample of 50 students rate their agreement with 100 statements designed to assess their political attitudes. The rating scale goes from 1 (definitely disagree) to 7 (definitely agree). One participant provides a response of 1 to all 100 statements.

  2. A researcher measures the time it takes participants to hit a button upon hearing a warning signal. In her sample of 34 participants, she finds that the mean response time is 413 milliseconds (ms) with a standard deviation of 30 ms. One participant has a response time of 420 ms.

  3. A researcher measures the time it takes participants to hit a button upon hearing a warning signal. In previous studies, she found that the mean response time is 413 ms with a standard deviation of 30 ms. In the current study, one participant had a response time of 1220 ms, which drives up the overall mean of the sample.

Question 7.36

Assume that the following set of data represents the responses of 10 participants to three similar statements. The participants rated their agreement with each statement on a scale from 1 to 7.

Participant   S1   S2   S3
1             2    3    2
2             6    7    3
3             3    2    5
4             7    6    5
5             2    3    3
6             5    5    6
7             9    5    4
8             2    3    7
9             6    7    7
10            3    6    5

  1. There is a piece of dirty data in this data set. Identify it and explain why it is dirty.

  2. Assume that you have decided to throw out the piece of dirty data you identified in part 1 and replace it with the mean for that variable. What is the new data point?

  3. Assume that you have decided to throw out the piece of dirty data you identified in part 1 and replace it with the mean of that participant’s responses. What is the new data point?
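Replacing an impossible value with a mean is a mechanical step once the value has been identified. The sketch below shows one way to do it in Python, under a loudly stated assumption: the replacement mean is computed from the remaining valid values only, which is one reasonable reading of "the mean for that variable" but not the only one. The function name `replace_out_of_range` and the data are hypothetical.

```python
from statistics import mean

def replace_out_of_range(values, low, high):
    """Replace any value outside [low, high] with the mean of the valid values.

    Assumption: the mean is taken over the valid values only.
    """
    valid = [v for v in values if low <= v <= high]
    fill = mean(valid)
    return [v if low <= v <= high else fill for v in values]

# Hypothetical responses on a 1-5 scale; the 8 is impossible on that scale.
responses = [3, 5, 8, 2, 4]
cleaned = replace_out_of_range(responses, low=1, high=5)
print(cleaned)
```

The same function applies whether the list holds one variable's scores across participants or one participant's scores across variables; only the list passed in changes.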

Applying the Concepts

Question 7.37

Percentiles and unemployment rates: The U.S. Bureau of Labor Statistics’ annual report published in 2011 provided adjusted unemployment rates for 10 countries. The mean was 7, and the standard deviation was 1.85. For the following calculations, treat these as the population mean and standard deviation.

  1. Australia’s unemployment rate was 5.4. Calculate the percentile for Australia—that is, what percentage is less than that of Australia?

  2. The United Kingdom’s unemployment rate was 8.5. Calculate its percentile—that is, what percentage is less than that of the United Kingdom?

  3. The unemployment rate in the United States was 8.9. Calculate its percentile—that is, what percentage is less than that of the United States?

  4. The unemployment rate in Canada was 6.5. Calculate its percentile—that is, what percentage is less than that of Canada?

Question 7.38

Height and the z distribution: Elena, a 15-year-old girl, is 58 inches tall. Based on what we know, the average height for girls at this age is 63.80 inches, with a standard deviation of 2.66 inches.

  1. Calculate Elena’s z score.

  2. What percentage of girls are taller than Elena?

  3. What percentage of girls are shorter?

  4. How much would Elena have to grow to be perfectly average?

  5. If Sarah is in the 75th percentile for height at age 15, how tall is she?

  6. How much would Elena have to grow in order to be at the 75th percentile with Sarah?

Question 7.39

Height and the z distribution: Kona, a 15-year-old boy, is 72 inches tall. According to the CDC, the average height for boys at this age is 67.00 inches, with a standard deviation of 3.19 inches.

  1. Calculate Kona’s z score.

  2. What is Kona’s percentile score for height?

  3. What percentage of boys this age is shorter than Kona?

  4. What percentage of heights is at least as extreme as Kona’s, in either direction?

  5. If Ian is in the 30th percentile for height as a 15-year-old boy, how tall is he? How does he compare to Kona?
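Several of the exercises above move in both directions: from a raw score to a percentile, and from a percentile back to a raw score. The sketch below shows both conversions in Python, assuming a normally distributed population with a known mean and standard deviation. The function names and the numbers are hypothetical and are not the exercise values.

```python
from statistics import NormalDist

def percentile_of(score, mu, sigma):
    """Percentage of the population falling below a raw score."""
    z = (score - mu) / sigma
    return NormalDist().cdf(z) * 100

def score_at_percentile(pct, mu, sigma):
    """Raw score at a given percentile: X = mu + z * sigma."""
    z = NormalDist().inv_cdf(pct / 100)
    return mu + z * sigma

# Hypothetical population with mu = 100 and sigma = 15.
print(round(percentile_of(115, 100, 15), 2))
print(round(score_at_percentile(50, 100, 15), 2))
```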

Question 7.40

Height and the z statistic: Imagine a class of thirty-three 15-year-old girls with an average height of 62.6 inches. Remember, μ = 63.8 inches and σ = 2.66 inches.

  1. Calculate the z statistic.

  2. How does this sample of girls compare to the distribution of sample means?

  3. What is the percentile rank for this sample?

Question 7.41

Height and the z statistic: Imagine a basketball team composed of thirteen 15-year-old boys. The average height of the team is 69.5 inches. Remember, μ = 67 inches and σ = 3.19 inches.

  1. Calculate the z statistic.

  2. How does this sample of boys compare to the distribution of sample means?

  3. What is the percentile rank for this sample?

Question 7.42

The z distribution and statistics test scores: Imagine that your statistics professor lost all records of students’ raw scores on a recent test. However, she did record students’ z scores for the test, as well as the class average of 41 out of 50 points and the standard deviation of 3 points (treat these as population parameters). She informs you that your z score was 1.10.

  1. What was your percentile score on this test?

  2. Using what you know about z scores and percentiles, how did you do on this test?

  3. What was your original test score?

Question 7.43

The z statistic, distributions of means, and height: Using what we know about the height of 15-year-old girls (again, μ = 63.8 inches and σ = 2.66 inches), imagine that a teacher finds the average height of 14 female students in one of her classes to be 62.4 inches.

  1. Calculate the mean and the standard error of the distribution of mean heights.

  2. Calculate the z statistic for this group.

  3. What percentage of mean heights, based on a sample size of 14 students, would we expect to be shorter than this group?

  4. How often do mean heights equal to or more extreme than this size occur in this population?

  5. If statisticians define sample means that occur less than 5% of the time as “special” or rare, what would you say about this result?

Question 7.44

The z statistic, distributions of means, and height: Another teacher decides to average the height of all 15-year-old male students in his classes throughout the day. By the end of the day, he has measured the heights of 57 boys and calculated an average of 68.1 inches (remember, for this population μ = 67 inches and σ = 3.19 inches).

  1. Calculate the mean and the standard error of the distribution of mean heights.

  2. Calculate the z statistic for this group.

  3. What percentage of groups of people would we expect to have mean heights, based on samples of this size (57), taller than this group?

  4. How often do mean heights equal to or more extreme than 68.1 occur in this population?

  5. How does this result compare to the statistical significance cutoff of 5%?

Question 7.45

Directional versus nondirectional hypotheses: For each of the following examples, identify whether the researcher has expressed a directional or a nondirectional hypothesis:

  1. A researcher is interested in studying the relation between the use of antibacterial products and the dryness of people’s skin. He thinks these products might alter the moisture in skin differently from other products that are not antibacterial.

  2. A student wonders if grades in a class are in any way related to where a student sits in the classroom. In particular, do students who sit in the front row get better grades, on average, than the general population of students?

  3. Cell phones are everywhere, and we are now available by phone almost all of the time. Does this translate into a change in the closeness of our long-distance relationships?

Question 7.46

Null hypotheses and research hypotheses: For each of the following examples (the same as those in Exercise 7.45), state the null hypothesis and the research hypothesis, in both words and symbolic notation:

  1. A researcher is interested in studying the relation between the use of antibacterial products and the dryness of people’s skin. He thinks these products might alter the moisture in skin differently from other products that are not antibacterial.

  2. A student wonders if grades in a class are in any way related to where a student sits in the classroom. In particular, do students who sit in the front row get better grades, on average, than the general population of students?

  3. Cell phones are everywhere, and we are now available by phone almost all of the time. Does this translate into a change in the closeness of our long-distance relationships?

Question 7.47

The z distribution and Hurricane Katrina: Hurricane Katrina hit New Orleans on August 29, 2005. The National Weather Service Forecast Office maintains online archives of climate data for all U.S. cities and areas. These archives allow us to find out, for example, how the rainfall in New Orleans that August compared to that in the other months of 2005. The table below shows the National Weather Service data (rainfall in inches) for New Orleans in 2005.

Month       Rainfall (inches)
January     4.41
February    8.24
March       4.69
April       3.31
May         4.07
June        2.52
July        10.65
August      3.77
September   4.07
October     0.04
November    0.75
December    3.32

  1. Calculate the z score for August. (Note: These are raw data for the population, rather than summaries, so you have to calculate the mean and the standard deviation first.)

  2. What is the percentile for the rainfall in August? Does this surprise you? Explain.

  3. When results surprise us, it is worthwhile to examine individual data points more closely or even to go beyond the data. The daily climate data as listed by this source for August 2005 shows the code “M” next to August 29, 30, and 31 for all climate statistics. The code says: “[REMARKS] ALL DATA MISSING AUGUST 29, 30, AND 31 DUE TO HURRICANE KATRINA.” Pretend you were hired as a consultant to determine the percentile for that August. Write a brief paragraph for your report, explaining why the data you generated are likely to be inaccurate.

  4. What raw scores mark the cutoff for the top and bottom 10% for these data? Based on these scores, which months had extreme data for 2005? Why should we not trust these data?
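When an exercise supplies raw population data rather than summaries, the mean and standard deviation must be computed before any z score. The sketch below shows that sequence in Python using the population standard deviation (N in the denominator), which matches the note in part 1 that these are population data. The function name `z_scores` and the data set are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(population):
    """z score of every value, treating the list as the whole population."""
    mu = mean(population)
    sigma = pstdev(population)   # population standard deviation: N in the denominator
    return [(x - mu) / sigma for x in population]

# Hypothetical population of eight scores (mean 5, standard deviation 2).
data = [2, 4, 4, 4, 5, 5, 7, 9]
zs = z_scores(data)
print([round(z, 2) for z in zs])
```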

Question 7.48

Percentiles and IQ scores: IQ scores are designed to have a mean of 100 and a standard deviation of 15. IQ testing is one way in which people are categorized as having different levels of mental disability; there are four levels of mental retardation between the IQ scores of 0 and 70.

  1. People with IQ scores of 20–35 are said to have severe mental retardation and to be able to learn only basic skills (e.g., how to talk, basic self-care). What percentage of people fall in this range?

  2. People with IQ scores of 50–70 have scores in the topmost category of IQ scores that indicate an impairment. They are said to have mild mental retardation. They can attain as high as a sixth-grade education and are often self-sufficient. What percentage of people fall in this range?

  3. A person has an IQ score of 66. What is her percentile?

  4. A person falls at the 3rd percentile. What is his IQ score? Would he be classified as having a mental disability?

Question 7.49

Step 1 of hypothesis testing for a study of the Wechsler Adult Intelligence Scale: Boone (1992) examined scores on the Wechsler Adult Intelligence Scale-Revised (WAIS-R) for 150 adult psychiatric inpatients. He determined the “intrasubtest scatter” score for each inpatient. Intrasubtest scatter refers to patterns of responses in which respondents are almost as likely to get easy questions wrong as hard ones. In the WAIS-R, we expect more wrong answers near the end, as the questions become more difficult, so high levels of intrasubtest scatter would be an unusual pattern of responses. Boone wondered if psychiatric patients have different response patterns than nonpatients have. He compared the intrasubtest scatter for 150 patients to population data from the WAIS-R standardization group. Assume that he had access both to means and standard deviations for this population. Boone reported that “the standardization group’s intrasubtest scatter was significantly greater than those reported for the psychiatric inpatients” and concluded that such scatter is normal.

  1. What are the two populations?

  2. What would the comparison distribution be? Explain.

  3. What hypothesis test would you use? Explain.

  4. Check the assumptions for this hypothesis test. Label your answers (1) through (3).

  5. What does Boone mean when he says significantly?

Question 7.50

Step 2 of hypothesis testing for a study of the Wechsler Adult Intelligence Scale: Refer to the scenario described in Exercise 7.49.

  1. State the null and research hypotheses for a two-tailed test in both words and symbols.

  2. Imagine that you wanted to replicate this study. Based on the findings described in Exercise 7.49, state the null and research hypotheses for a one-tailed test in both words and symbols.

Question 7.51

Step 1 of hypothesis testing for a study of college football: Let’s consider whether U.S. college football teams are more likely or less likely to be mismatched in the upper National Collegiate Athletic Association (NCAA) divisions. Overall, the 53 Football Bowl Subdivision (FBS) games (formerly Division I-A; the highest division) had a mean spread (winning score minus losing score) of 16.189 in a particular week, with a standard deviation of 12.128. We took a sample of 4 games that were played that week in the next-highest league, the Football Championship Subdivision (FCS; formerly Division I-AA), to see if the mean spread was different; one of the many leagues within the FCS, the Patriot League, played 4 games that weekend.

  1. List the independent variable and the dependent variable in this example.

  2. Did we use random selection? Explain.

  3. Identify the populations of interest in this example.

  4. State the comparison distribution.

  5. Check the assumptions for this test.

Question 7.52

Step 2 of hypothesis testing for a study of college football: Refer to Exercise 7.51.

  1. State the null hypothesis and the research hypothesis for a two-tailed test in both words and symbols.

  2. One of our students hypothesized that the spread would be bigger among the FCS teams because “some of them are really bad and would get crushed.” State the one-tailed null hypothesis and research hypothesis, based on our student’s prediction, in both words and symbols.

Question 7.53

Steps 3 through 6 of hypothesis testing for a study of college football: Refer to Exercise 7.51. Remember, the population mean is 16.189, with a standard deviation of 12.128. The results for the four FCS Patriot League games are as follows:

Holy Cross, 27/Bucknell, 10

Lehigh, 23/Colgate, 15

Lafayette, 31/Fordham, 24

Georgetown, 24/Marist, 21

  1. Conduct steps 3 through 6 of hypothesis testing. (You already conducted steps 1 and 2 in Exercise 7.51, part 5, and Exercise 7.52, part 1, respectively.)

  2. Would you be willing to generalize these findings beyond the sample? Explain.

Putting It All Together

Question 7.54

The Graded Naming Test and sociocultural differences: Researchers often use z tests to compare their samples to known population norms. The Graded Naming Test (GNT) asks respondents to name objects in a set of 30 black-and-white drawings. The test, often used to detect brain damage, starts with easy words like kangaroo and gets progressively more difficult, ending with words like sextant. The GNT population norm for adults in England is 20.4. Roberts (2003) wondered whether a sample of Canadian adults had different scores than adults in England. If they were different, the English norms would not be valid for use in Canada. The mean for 30 Canadian adults was 17.5. For the purposes of this exercise, assume that the standard deviation of the adults in England is 3.2.

  1. Conduct all six steps of a z test. Be sure to label all six steps.

  2. Some words on the GNT are more commonly used in England. For example, a mitre, the head-piece worn by bishops, is worn by the archbishop of Canterbury in public ceremonies in England. No Canadian participant correctly responded to this item, whereas 55% of English adults correctly responded. Explain why we should be cautious about applying norms to people different from those on whom the test was normed.

  3. When we conduct a one-tailed test instead of a two-tailed test, there are small changes in steps 2 and 4 of hypothesis testing. (Note: For this example, assume that those from populations other than the one on which it was normed will score lower, on average. That is, hypothesize that the Canadians will have a lower mean.) Conduct steps 2, 4, and 6 of hypothesis testing for a one-tailed test.

  4. Under which circumstance—a one-tailed or a two-tailed test—is it easier to reject the null hypothesis? Explain.

  5. If it becomes easier to reject the null hypothesis under one type of test (one-tailed versus two-tailed), does this mean that there is a bigger difference between the groups with a one-tailed test than with a two-tailed test? Explain.

  6. When we change the p level that we use as a cutoff, there is a small change in step 4 of hypothesis testing. Although 0.05 is the most commonly used p level, other values, such as 0.01, are often used. For this example, conduct steps 4 and 6 of hypothesis testing for a two-tailed test and p level of 0.01, determining the cutoff and drawing the curve.

  7. With which p level—0.05 or 0.01—is it easiest to reject the null hypothesis? Explain.

  8. If it is easier to reject the null hypothesis with certain p levels, does this mean that there is a bigger difference between the samples with one p level versus the other p level? Explain.

Question 7.55

Patient adherence and orthodontics: A research report (Behenam & Pooya, 2007) begins, “There is probably no other area of health care that requires…cooperation to the extent that orthodontics does,” and explores factors that affected the number of hours per day that Iranian patients wore their orthodontic appliances. The patients in the study reported that they used their appliances, on average, 14.78 hours per day, with a standard deviation of 5.31. We’ll treat this group as the population for the purposes of this example. Let’s say a researcher wanted to study whether a DVD with information about orthodontics led to an increase in the amount of time patients wore their appliances, but decided to use a two-tailed test to be conservative. Let’s say he studied the next 15 patients at his clinic, asked them to watch the DVD, and then found that they wore their appliances, on average, 17 hours per day.

  1. What is the independent variable? What is the dependent variable?

  2. Did the researcher use random selection to choose his sample? Explain your answer.

  3. Conduct all six steps of hypothesis testing. Be sure to label all six steps.

  4. If the researcher’s decision in step 6 were wrong, what type of error would he have made? Explain your answer.

Question 7.56

Radiation levels on Japanese farms: Fackler (2012) reported in the New York Times that Japanese farmers have become skeptical of the Japanese government’s assurances that radiation levels were within legal limits in the wake of the 2011 tsunami and radiation disaster at Fukushima. After reports of safe levels in Onami, more than 12 concerned farmers tested their crops and found dangerously high levels of cesium.

  1. If the farmers wanted to conduct a z test comparing their results to the cesium levels found in areas that had not been exposed to the radiation, what would their sample be? Be specific.

  2. Conduct step 1 of hypothesis testing.

  3. Conduct step 2 of hypothesis testing.

  4. Conduct step 4 of hypothesis testing for a two-tailed test and a p level of 0.05.

  5. Imagine that the farmers calculated a z statistic of 3.2 for their sample. Conduct step 6 of hypothesis testing.

  6. If the farmers’ conclusions were incorrect, what type of error would they have made? Explain your answer.

Question 7.57

You have conducted a study with 120 participants (60 female, 60 male) about the relation between attitudes toward cohabitation before marriage (on a 30-item scale) and self-reported sexual behaviors (on a 20-item scale). Most respondents filled out both scales completely. Everyone completed the scale assessing attitudes toward cohabitation, but 1 participant marked the highest possible score on every item on both scales and finished very quickly. In addition, 13 women and 4 men did not complete the 20 questions about sexual behavior. Of these, 9 women and 2 men did not respond at all to the questions about sexual behavior; 3 women and 1 man answered just 10 of these questions; and 2 women and 1 man did not answer just 1 question.

  1. What are the possible causes of incomplete data on the sexual behavior scale?

  2. What choices do you have regarding the missing data on the sexual behavior scale?

  3. What might you do with the data from the participant who reported the highest possible scores on every item on both scales?

  4. Explain why you would or would not report on how you made your decision about what to do with outliers or with the missing or incomplete data in your write-up of this experiment.

Question 7.58

You have just conducted a study testing how well two independent variables, daily sugar intake (as assessed by a 25-item eating habits scale) and physical activity (as assessed by a 20-item daily physical activity scale), predicted the dependent variable of blood sugar levels. There were only 17 participants to start with, and 3 of them dropped out before having their blood sugar levels assessed. In addition, 2 participants left one item blank on the physical activity scale, and 4 other participants left most of the data on the eating habits scale blank. At their debriefing interview, they said they just couldn’t estimate food intake with any accuracy.

  1. What will you do with the data of the 3 participants who dropped out just before having their blood sugar levels assessed?

  2. What are your options with regard to the data from the 2 participants who left one item blank on the physical activity scale?

  3. What are your options with regard to the data from the 4 participants who did not respond to most of the items on the eating habits scale?

  4. Do you recommend using these data at all? If so, how?

Question 7.59

In Next Steps, we noted that the z distribution is sometimes used to identify potential outliers in a data set. Box Office Mojo (2013) provides data on U.S. box office receipts for major films. Here are worldwide box office grosses for a randomly selected sample of 15 of the 100 top-grossing films of 2012. Note that we have rounded figures to the nearest million. The figures reported below are millions of dollars.

Movie                                       Millions of dollars
Marvel’s The Avengers                       1512
Flight                                      162
Skyfall                                     1109
Wrath of the Titans                         305
Tyler Perry’s Madea’s Witness Protection    66
Zero Dark Thirty                            109
Lincoln                                     275
Moonrise Kingdom                            68
Life of Pi                                  609
The Lucky One                               99
The Bourne Legacy                           276
The Watch                                   68
Rock of Ages                                59
Cloud Atlas                                 131
Snow White and the Huntsman                 397

  1. Eyeball the data. Which score or scores seem like they might be outliers?

  2. Sometimes potential outliers are defined as scores that are beyond 2 standard deviations from the mean—that is, scores with z scores less than −2.00 or more than 2.00. Based on that criterion, are any of these scores potential outliers? (Hint: You will have to calculate the mean and standard deviation of the data from this sample first.)

  3. Sometimes potential outliers are defined as scores that are beyond 3 standard deviations from the mean—that is, scores with z scores less than −3.00 or more than 3.00. Based on that criterion, are any of these scores potential outliers?

  4. Why might it make sense to eliminate potential outliers from any data analyses?

  5. Explain why the decision about how to identify potential outliers should be made before collecting data.
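The two criteria in this exercise differ only in the threshold applied to each score's z score. The sketch below applies both thresholds to a small data set in Python. It is a minimal illustration under stated assumptions: the function name `potential_outliers` and the data are hypothetical, and the standard deviation is the sample version (N − 1 in the denominator), which is an assumption about how the sample standard deviation is meant to be calculated.

```python
from statistics import mean, stdev

def potential_outliers(sample, threshold):
    """Return the scores whose z scores are beyond +/- threshold."""
    m = mean(sample)
    s = stdev(sample)   # assumption: sample standard deviation (N - 1 in the denominator)
    return [x for x in sample if abs((x - m) / s) > threshold]

# Hypothetical sample with one score far from the rest.
sample = [10, 12, 11, 13, 12, 11, 10, 12, 11, 60]
print(potential_outliers(sample, 2.0))
print(potential_outliers(sample, 3.0))
```

In this hypothetical sample the extreme score is flagged under the 2.00 criterion but not under the 3.00 criterion, because an extreme score also inflates the standard deviation used to judge it.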