Exercises

Clarifying the Concepts

Question 6.1

Explain how the word normal is used in everyday conversation; then explain how statisticians use it.

Question 6.2

What point on the normal curve represents the most commonly occurring observation?

Question 6.3

How does the size of a sample of scores affect the shape of the distribution of data?

Question 6.4

Explain how the word standardize is used in everyday conversation; then explain how statisticians use it.

Question 6.5

What is a z score?

Question 6.6

Give three reasons why z scores are useful.

Question 6.7

What are the mean and standard deviation of the z distribution?

Question 6.8

Why is the central limit theorem such an important idea for dealing with a population that is not normally distributed?

Question 6.9

What does the symbol μM stand for?

Question 6.10

What does the symbol σM stand for?

Question 6.11

What is the difference between standard deviation and standard error?

Question 6.12

Why does the standard error become smaller simply by increasing the sample size?

Question 6.13

What does a z statistic—a z score based on a distribution of means—tell us about a sample mean?

Question 6.14

Each of the following equations contains an error. Identify, fix, and explain each error.

  1. image

  2. image (for a distribution of means)

  3. image (for a distribution of means)

  4. image (for a distribution of scores)

Calculating the Statistics

Question 6.15

Create a histogram for these three sets of scores. Each set of scores represents a sample taken from the same population.

  1. 6 4 11 7 7

  2. 6 4 11 7 7 2 10 7 8 6 6 7 5 8

  3. 6 4 11 7 7 2 10 7 8 6 6 7 5 8
    7 8 9 7 6 9 3 9 5 6 8 11 8 3
    8 4 10 8 5 5 8 9 9 7 8 7 10 7

  4. What do you observe happening across these three distributions?
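
One way to check a hand-drawn histogram is to tally the scores in code. The following is a minimal sketch, not a prescribed method: the function name `text_histogram` and the demonstration scores are hypothetical and are not taken from this exercise.

```python
from collections import Counter

def text_histogram(scores):
    """Return one line per score value, with one '*' per occurrence."""
    counts = Counter(scores)
    lines = []
    # Include values with zero frequency so gaps in the data stay visible.
    for value in range(min(scores), max(scores) + 1):
        lines.append(f"{value:>3} | {'*' * counts[value]}")
    return "\n".join(lines)

# Hypothetical scores, used only to show the output format.
demo = [3, 5, 5, 6, 6, 6, 7, 7, 9]
print(text_histogram(demo))
```

Running the same function on progressively larger samples makes it easier to see how the shape of the distribution changes as scores are added.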

Question 6.16

A population has a mean of 250 and a standard deviation of 47. Calculate z scores for each of the following raw scores:

  1. 391

  2. 273

  3. 199

  4. 160

Question 6.17

A population has a mean of 1179 and a standard deviation of 164. Calculate z scores for each of the following raw scores:

  1. 1000

  2. 721

  3. 1531

  4. 1184
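
The raw-score-to-z-score conversion used in these exercises, z = (X − μ)/σ, can be sketched in a few lines. The function name `z_score` and the population values below (μ = 100, σ = 15) are hypothetical, chosen only so the arithmetic is easy to verify by hand.

```python
def z_score(x, mu, sigma):
    """Standardize a raw score: z = (X - mu) / sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical population: mu = 100, sigma = 15.
print(z_score(130, 100, 15))  # 2.0  (two standard deviations above the mean)
print(z_score(85, 100, 15))   # -1.0 (one standard deviation below the mean)
```

A negative result always indicates a raw score below the mean, which is a quick sanity check on hand calculations.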

Question 6.18

For a population with a mean of 250 and a standard deviation of 47, calculate the z score for 250. Explain the meaning of the value you obtain.

Question 6.19

For a population with a mean of 250 and a standard deviation of 47, calculate the z scores for 203 and 297. Explain the meaning of these values.

Question 6.20

For a population with a mean of 250 and a standard deviation of 47, convert each of the following z scores to raw scores.

  1. 0.54

  2. −2.66

  3. −1.00

  4. 1.79

Question 6.21

For a population with a mean of 1179 and a standard deviation of 164, convert each of the following z scores to raw scores.

  1. −0.23

  2. 1.41

  3. 2.06

  4. 0.03
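
Converting in the other direction rearranges the same formula to X = zσ + μ. The sketch below assumes a hypothetical population (μ = 100, σ = 15) rather than the values in these exercises, and the name `raw_score` is illustrative.

```python
def raw_score(z, mu, sigma):
    """Convert a z score back to a raw score: X = z * sigma + mu."""
    return z * sigma + mu

# Hypothetical population: mu = 100, sigma = 15.
print(raw_score(2.0, 100, 15))   # 130.0
print(raw_score(-1.0, 100, 15))  # 85.0
```

Standardizing the result again should return the original z score, which is a useful check on either calculation.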

Question 6.22

By design, the verbal subtest of the Graduate Record Examination (GRE) has a population mean of 500 and a population standard deviation of 100. Convert the following z scores to raw scores without using a formula.

  1. 1.5

  2. −0.5

  3. −2.0

Question 6.23

By design, the verbal subtest of the Graduate Record Examination (GRE) has a population mean of 500 and a population standard deviation of 100. Convert the following z scores to raw scores using symbolic notation and the formula.

  1. 1.5

  2. −0.5

  3. −2.0

Question 6.24

A study of the Consideration of Future Consequences (CFC) scale found a mean score of 3.20, with a standard deviation of 0.70, for the 800 students in the sample (Adams, 2012). (Treat this sample as the entire population of interest.)

  1. If the CFC score is 4.2, what is the z score? Use symbolic notation and the formula. Explain why this answer makes sense.

  2. If the CFC score is 3.0, what is the z score? Use symbolic notation and the formula. Explain why this answer makes sense.

  3. If the z score is 0, what is the CFC score? Explain.

Question 6.25

Using the instructions in Example 6.8, compare the following “apples and oranges”: a score of 45 when the population mean is 51 and the standard deviation is 4, and a score of 732 when the population mean is 765 and the standard deviation is 23.

  1. Convert these scores to standardized scores.

  2. Using the standardized scores, what can you say about how these two scores compare to each other?

Question 6.26

Compare the following scores:

  1. A score of 811 when μ = 800 and σ = 29 against a score of 4524 when μ = 3127 and σ = 951

  2. A score of 17 when μ = 30 and σ = 12 against a score of 67 when μ = 88 and σ = 16
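
Comparisons like these amount to standardizing each score against its own distribution and then comparing the z scores. A minimal sketch follows; the helper name `higher_relative_standing` and the example values are hypothetical and are not drawn from the exercises.

```python
def z_score(x, mu, sigma):
    """Standardize a raw score: z = (X - mu) / sigma."""
    return (x - mu) / sigma

def higher_relative_standing(first, second):
    """Each argument is a (raw score, mu, sigma) tuple.

    Returns 'first', 'second', or 'tie' depending on which score sits
    higher relative to its own distribution.
    """
    z_first = z_score(*first)
    z_second = z_score(*second)
    if z_first > z_second:
        return "first"
    if z_second > z_first:
        return "second"
    return "tie"

# Hypothetical values: 70 on a test with mu = 60, sigma = 5 (z = 2.0)
# versus 600 on a test with mu = 500, sigma = 100 (z = 1.0).
print(higher_relative_standing((70, 60, 5), (600, 500, 100)))  # first
```

The raw scores 70 and 600 are on different scales, so only the standardized values can be compared directly.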

Question 6.27

Assume a normal distribution when answering the following questions.

  1. What percentage of scores falls below the mean?

  2. What percentage of scores falls between 1 standard deviation below the mean and 2 standard deviations above the mean?

  3. What percentage of scores lies beyond 2 standard deviations away from the mean (on both sides)?

  4. What percentage of scores is between the mean and 2 standard deviations above the mean?

  5. What percentage of scores falls under the normal curve?
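
Percentages under the normal curve can be checked against the cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `percent_between` is hypothetical. Exact values from the CDF may differ slightly from rounded rule-of-thumb percentages.

```python
from statistics import NormalDist

def percent_between(z_low, z_high):
    """Percentage of a normal distribution lying between two z scores."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return 100 * (std_normal.cdf(z_high) - std_normal.cdf(z_low))

# Percentage within 1 standard deviation of the mean on either side.
print(round(percent_between(-1, 1), 2))  # 68.27
```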

Question 6.28

Compute the standard error (σM) for each of the following sample sizes, assuming a population mean of 100 and a standard deviation of 20:

  1. 45

  2. 100

  3. 4500

Question 6.29

A population has a mean of 55 and a standard deviation of 8. Compute μM and σM for each of the following sample sizes:

  1. 30

  2. 300

  3. 3000
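
For a distribution of means, the mean equals the population mean (μM = μ) and the standard error is σM = σ/√N. The sketch below illustrates the second formula with a hypothetical σ of 12; the function name `standard_error` is illustrative.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma_M = sigma / sqrt(N)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sigma / math.sqrt(n)

# Hypothetical population standard deviation of 12.
for n in (4, 16, 64):
    print(n, standard_error(12, n))  # 6.0, then 3.0, then 1.5
```

In this example, each fourfold increase in sample size halves the standard error, because N enters the formula under a square root.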

Question 6.30

Compute a z statistic for each of the following, assuming the population has a mean of 100 and a standard deviation of 20:

  1. A sample of 43 scores has a mean of 101.

  2. A sample of 60 scores has a mean of 96.

  3. A sample of 29 scores has a mean of 100.

Question 6.31

A sample of 100 people had a mean depression score of 85; the population mean for this depression measure is 80, with a standard deviation of 20. A different sample of 100 people had a mean score of 17 on a different depression measure; the population mean for this measure is 15, with a standard deviation of 5.

  1. Convert these means to z statistics.

  2. Using the z statistics, what can you say about how these two means compare to each other?
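
A z statistic follows the same logic as a z score but uses the distribution of means: z = (M − μM)/σM, with μM = μ and σM = σ/√N. The following sketch assumes hypothetical values (μ = 50, σ = 10, N = 25, M = 53); the function name `z_statistic` is illustrative.

```python
import math

def z_statistic(sample_mean, mu, sigma, n):
    """z statistic for a sample mean: z = (M - mu_M) / sigma_M."""
    mu_m = mu                       # mean of the distribution of means
    sigma_m = sigma / math.sqrt(n)  # standard error
    return (sample_mean - mu_m) / sigma_m

# Hypothetical: sigma_M = 10 / sqrt(25) = 2, so z = (53 - 50) / 2.
print(z_statistic(53, 50, 10, 25))  # 1.5
```

Because σM shrinks as N grows, the same difference between M and μ yields a larger z statistic in a larger sample.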

Applying the Concepts

Question 6.32

Normal distributions in real life: Many variables are normally distributed, but not all are. (Fortunately, the central limit theorem saves us when we conduct research on samples from nonnormal populations if the samples are larger than 30!) Which of the following are likely to be normally distributed, and which are likely to be nonnormal? Explain your answers.

  1. In the population of students admitted to the highly selective University of Toronto, scores on the federal or provincial literacy test (required for university admissions)

  2. In the population of secondary school students in New Zealand, the number of daily calories consumed

  3. In the population of employed adults in San Antonio, Texas, the amount of time spent commuting to work

  4. In the population of North American university students, the number of frequent flyer miles earned in a year

Question 6.33

Distributions and getting ready for a date: We asked 150 students in our statistics classes how long, in minutes, they typically spend getting ready for a date. The scores ranged from 1 minute to 120 minutes, and the mean was 51.52 minutes. Here are the data for 40 of these students:

30 90 60 60 5 90 30 40 45 60
60 30 90 60 25 10 90 20 15 60
60 75 45 60 30 75 15 30 45 1
20 25 45 60 90 10 105 90 30 60

  1. Construct a histogram for the 10 scores in the first row.

  2. Construct a histogram for all 40 of these scores.

  3. What happened to the shape of the distribution as you increased the number of scores from 10 to 40? What do you think would happen if the data for all 150 students were included? What if we included 10,000 scores? Explain this phenomenon.

  4. Are these distributions of scores or distributions of means? Explain.

  5. The data here are self-reported. That is, our students wrote down how many minutes they believe that they typically take to get ready for a date. This accounts for the fact that the data include many “pretty” numbers, such as 30, 60, or 90 minutes. What might have been a better way to operationalize this variable?

  6. Do these data suggest any hypotheses that you might like to study? List at least one.

Question 6.34

z scores and the GRE: By design, the verbal subtest of the GRE has a population mean of 500 and a population standard deviation of 100 (the quantitative subtest has the same mean and standard deviation).

  1. Use symbolic notation to state the mean and the standard deviation of the GRE verbal test.

  2. Convert a GRE score of 700 to a z score without using a formula.

  3. Convert a GRE score of 550 to a z score without using a formula.

  4. Convert a GRE score of 400 to a z score without using a formula.

Question 6.35

The z distribution and hours slept: A sample of 150 statistics students reported the typical number of hours that they sleep on a weeknight. The mean number of hours was 6.65, and the standard deviation was 1.24. (For this exercise, treat this sample as the entire population of interest.)

  1. What is always the mean of the z distribution?

  2. Using the sleep data, demonstrate that your answer to part 1 is the mean of the z distribution. (Hint: Calculate the z score for a student who is exactly at the mean.)

  3. What is always the standard deviation of the z distribution?

  4. Using the sleep data, demonstrate that your answer to part 3 is the standard deviation of the z distribution. (Hint: Calculate the z score for a student who is exactly 1 standard deviation above or below the mean.)

  5. How many hours of sleep do you typically get on a weeknight? What would your z score be, based on this population?

Question 6.36

The z distribution applied to admiration ratings: A sample of 148 of our statistics students rated their level of admiration for Hillary Clinton on a scale of 1 to 7. The mean rating was 4.06, and the standard deviation was 1.70. (For this exercise, treat this sample as the entire population of interest.)

  1. Use these data to demonstrate that the mean of the z distribution is always 0.

  2. Use these data to demonstrate that the standard deviation of the z distribution is always 1.

  3. Calculate the z score for a student who rated his admiration of Hillary Clinton as 6.1.

  4. A student had a z score of −0.55. What rating did she give for her admiration of Hillary Clinton?

Question 6.37

z statistics and CFC scores: We have already discussed summary parameters for CFC scores for the population of participants in a study by Adams (2012). The mean CFC score was 3.20, with a standard deviation of 0.70. (Remember that we treated the sample of 800 participants as the entire population.) Imagine that you randomly selected 40 people from this population and had them watch a series of videos on financial planning after graduation. The mean CFC score after watching the videos was 3.62.

  1. Why would it not make sense to compare the mean of this sample with the distribution of scores? Be sure to discuss the spread of distributions in your answer.

  2. In your own words, what would the null hypothesis predict? What would the research hypothesis predict?

  3. Using symbolic notation and formulas, what are the appropriate measures of central tendency and variability for the distribution from which this sample comes?

  4. Using symbolic notation and the formula, what is the z statistic for this sample mean?

Question 6.38

Converting z scores to raw CFC scores: A study using the Consideration of Future Consequences scale found a mean CFC score of 3.20, with a standard deviation of 0.70, for the 800 students in the sample (Adams, 2012).

  1. Imagine that your z score on the CFC scale is −1.2. What is your raw score? Use symbolic notation and the formula. Explain why this answer makes sense.

  2. Imagine that your z score on the CFC scale is 0.66. What is your raw score? Use symbolic notation and the formula. Explain why this answer makes sense.

Question 6.39

The normal curve and real-life variables, part I: For each of the following variables, state whether the distribution of scores would likely approximate a normal curve. Explain your answer.

  1. Number of movies that a college student watches in a year

  2. Number of full-page advertisements in a magazine

  3. Human birth weights in Canada

Question 6.40

The normal curve and real-life variables, part II: For each of the following variables, state whether the distribution of scores would likely approximate a normal curve. Explain your answer.

  1. Number of minutes that students check Facebook and other social media each week

  2. Volume of water that people drink each day

  3. The length, in minutes, of YouTube videos

Question 6.41

The normal curve in the media: Statistics geeks rejoiced when the New York Times published an article on the normal curve (Dunn, 2013)! Biologist Casey Dunn wrote that “Many real-world observations can be approximated by, and tested against, the same expected pattern: the normal distribution.” He described the normal curve as symmetric and bell-shaped with more observations gathered near the mean. He offered several examples: “The size of flowers, the physiological response to a drug, the breaking force in a batch of steel cables,” but also noted that there are important exceptions, including household income. In your own words, explain to someone who has never taken statistics why household income, unlike Dunn’s other examples, is not normally distributed.

Question 6.42

Percentiles and eating habits: As noted in How It Works 6.1, Georgiou and colleagues (1997) reported that college students had healthier eating habits, on average, than did those who were neither college students nor college graduates. The 412 students in the study ate breakfast a mean of 4.1 times per week, with a standard deviation of 2.4. (For this exercise, again imagine that this is the entire population of interest.)

  1. What is the approximate percentile for a student who eats breakfast four times per week?

  2. What is the approximate percentile for a student who eats breakfast six times per week?

  3. What is the approximate percentile for a student who eats breakfast twice a week?
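
An approximate percentile can be checked by evaluating the normal CDF at the raw score. This sketch assumes the population is normally distributed, which may not hold for every variable; the function name `percentile` and the values μ = 100, σ = 15 are hypothetical.

```python
from statistics import NormalDist

def percentile(x, mu, sigma):
    """Percentage of a normal population scoring at or below x."""
    return 100 * NormalDist(mu, sigma).cdf(x)

# Hypothetical population: a score 1 standard deviation above the mean.
print(round(percentile(115, 100, 15), 1))  # 84.1
```

Percentiles estimated from rounded segments of the normal curve may differ slightly from the value the CDF returns.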

Question 6.43

z scores and comparisons of sports teams: A common quandary faces sports fans who live in the same city but avidly follow different sports. How does one determine whose team did better with respect to its league division? In 2012, the Atlanta Braves baseball team and the Atlanta Falcons football team both did well. The Braves won 94 games and the Falcons won 13. Which team was better in 2012? The question, then, is: Were the Braves better, as compared to the other teams in Major League Baseball (MLB), than the Falcons, as compared to the other teams in the National Football League (NFL)? Some of us could debate this for hours, but it’s better to examine some statistics. Let’s operationalize performance over the season as the number of wins during regular season play.

  1. In 2012, the mean number of wins for MLB teams was 81.00, with a standard deviation of 11.733. Because all teams were included, these are population parameters. What is the Braves’ z score?

  2. In 2012, the mean number of wins for all NFL teams was 7.969, with a standard deviation of 3.036. What is the Falcons’ z score?

  3. Which team did better, according to these data?

  4. How many games would the team with the lower z score have had to win to beat the team with the higher z score?

  5. List at least one other way we could have operationalized the outcome variable (i.e., team performance).

Question 6.44

z scores and comparisons of admiration ratings: Our statistics students were asked to rate their admiration of Hillary Clinton on a scale of 1 to 7. They also were asked to rate their admiration of actor, singer, and former American Idol judge Jennifer Lopez and their admiration of tennis player Venus Williams on a scale of 1 to 7. As noted earlier, the mean rating of Clinton was 4.06, with a standard deviation of 1.70. The mean rating of Lopez was 3.72, with a standard deviation of 1.90. The mean rating of Williams was 4.58, with a standard deviation of 1.46. One of our students rated her admiration of Clinton and Williams at 5 and her admiration of Lopez at 4.

  1. What is the student’s z score for her rating of Clinton?

  2. What is the student’s z score for her rating of Williams?

  3. What is the student’s z score for her rating of Lopez?

  4. Compared to the other statistics students in our sample, which celebrity does this student most admire? (We can tell by her raw scores that she prefers Clinton and Williams to Lopez, but when we take into account the general perception of these celebrities, how does this student feel about each one?)

  5. How do z scores allow us to make comparisons that we cannot make with raw scores? That is, describe the benefits of standardization.

Question 6.45

Raw scores, z scores, percentiles, and sports teams: Let’s look at baseball and football again. We’ll look at data for all of the teams in Major League Baseball (MLB) and the National Football League (NFL), respectively.

  1. In 2012, the mean number of wins for MLB teams was 81.00, with a standard deviation of 11.733. The perennial underdogs, the Chicago Cubs, had a z score of −1.705. How many games did they win?

  2. In 2012, the mean number of wins for all NFL teams was 7.969, with a standard deviation of 3.036. The New Orleans Saints had a z score of −0.319. How many games did they win?

  3. The Indianapolis Colts were just below the 84th percentile in terms of NFL wins. How many games did they win? Explain how you obtained your answer.

  4. Explain how you can examine your answers in parts 1, 2, and 3 to determine whether the numbers make sense.

Question 6.46

Distributions and life expectancy: Researchers have reported that the projected life expectancy for South African men diagnosed with human immunodeficiency virus (HIV) at age 20 who receive antiretroviral therapy (ART) is 27.6 years (Johnson et al., 2013). Imagine that the researchers determined this by following 250 people with HIV who were receiving ART and calculating the mean.

  1. What is the dependent variable of interest?

  2. What is the population?

  3. What is the sample?

  4. For the population, describe what the distribution of scores would be.

  5. For the population, describe what the distribution of means would be.

  6. If the distribution of the population were skewed, would the distribution of scores likely be skewed or approximately normal? Explain your answer.

  7. Would the distribution of means be skewed or approximately normal? Explain your answer.

Question 6.47

Distributions, personality testing, and depression: The revised version of the Minnesota Multiphasic Personality Inventory (MMPI-2) is the most frequently administered self-report personality measure. Test-takers respond to more than 500 true/false statements, and their responses are scored, typically by a computer, on a number of scales (e.g., hypochondriasis, depression, psychopathic deviation). Respondents receive a T score on each scale that can be compared to norms. (You’re likely to encounter T scores if you take psychology classes, but it’s good to be aware that they are different from the t statistic that you will learn about in a few chapters.) T scores are another way to standardize scores so that percentiles and cutoffs can be determined. The mean T score is always 50, and the standard deviation is always 10. Imagine that you administer the MMPI-2 to 95 respondents who have recently lost a parent; you wonder whether their scores on the depression scale will be, on average, higher than the norms. You find a mean score on the depression scale of 55 in your sample.

  1. Using symbolic notation, report the mean and standard deviation of the population.

  2. Using symbolic notation and formulas (where appropriate), report the mean and standard error for the distribution of means to which your sample will be compared.

  3. In your own words, explain why it makes sense that the standard error is smaller than the standard deviation.

Question 6.48

Distributions, personality testing, and social introversion: See the description of the MMPI-2 in Exercise 6.47. The mean T score is always 50, and the standard deviation is always 10. Imagine that you administer the MMPI-2 to 50 respondents who do not use Instagram or any other social media; you wonder whether their scores on the social introversion scale will be, on average, higher than the norms. You find a mean score on the social introversion scale of 60 in your sample.

  1. Using symbolic notation, report the mean and standard deviation of the population.

  2. Using symbolic notation and formulas (where appropriate), report the mean and standard error for the distribution of means to which your sample will be compared.

  3. In your own words, explain why it makes sense that the standard error is smaller than the standard deviation.

Question 6.49

Distributions and the General Social Survey: The General Social Survey (GSS) is a survey of approximately 2000 adults conducted each year since 1972, for a total of more than 38,000 participants. During several years of the GSS, participants were asked how many close friends they have. The mean for this variable is 7.44 friends, with a standard deviation of 10.98. The median is 5.00 and the mode is 4.00.

  1. Are these data for a distribution of scores or a distribution of means? Explain.

  2. What do the mean and standard deviation suggest about the shape of the distribution? (Hint: Compare the sizes of the mean and the standard deviation.)

  3. What do the three measures of central tendency suggest about the shape of the distribution?

  4. Let’s say that these data represent the entire population. Pretend that you randomly selected a person from this population and asked how many close friends she or he had. Would you compare this person to a distribution of scores or to a distribution of means? Explain your answer.

  5. Now pretend that you randomly selected a sample of 80 people from this population. Would you compare this sample to a distribution of scores or to a distribution of means? Explain your answer.

  6. Using symbolic notation, calculate the mean and standard error of the distribution of means.

  7. What is the likely shape of the distribution of means? Explain your answer.

Question 6.50

A distribution of scores and the General Social Survey: Refer to Exercise 6.49. Again, pretend that the GSS sample is the entire population of interest.

  1. Imagine that you randomly selected one person from this population who reported that he had 18 close friends. Would you compare his score to a distribution of scores or to a distribution of means? Explain your answer.

  2. What is his z score? Based on this z score, what is his approximate percentile?

  3. Does it make sense to calculate a percentile for this person? Explain your answer. (Hint: Consider the shape of the distribution.)

Question 6.51

A distribution of means and the General Social Survey: Refer to Exercise 6.49. Again, pretend that the GSS sample is the entire population of interest.

  1. Imagine that you randomly selected 80 people from this population, and that they had a mean of 8.7. Would you compare this sample mean to a distribution of scores or to a distribution of means? Explain your answer.

  2. What is the z statistic for this mean? Based on this z statistic, what is the approximate percentile for this sample?

  3. Does it make sense to calculate a percentile for this sample? Explain your answer. (Hint: Consider the shape of the distribution.)

Question 6.52

Percentiles, raw scores, and credit card theft: Credit card companies will often call cardholders if the pattern of use indicates that the card might have been stolen. Let’s say that you charge an average of $280 a month on your credit card, with a standard deviation of $75. The credit card company will call you anytime your purchases for the month exceed the 98th percentile. What is the dollar amount beyond which you’ll get a call from your credit card company?
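
Working from a percentile back to a raw score reverses the usual steps: find the z score that cuts off the percentile, then convert it to a raw score. The sketch below uses the inverse CDF from `statistics.NormalDist` and assumes a normal distribution; the function name `cutoff_for_percentile` and the values μ = 100, σ = 15 are hypothetical.

```python
from statistics import NormalDist

def cutoff_for_percentile(pct, mu, sigma):
    """Raw score below which pct percent of a normal population falls."""
    if not 0 < pct < 100:
        raise ValueError("pct must be strictly between 0 and 100")
    return NormalDist(mu, sigma).inv_cdf(pct / 100)

# Hypothetical population: the 50th percentile is the mean itself.
print(cutoff_for_percentile(50, 100, 15))  # 100.0
```

A cutoff obtained this way may differ by a small amount from one based on a rounded z score.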

Putting It All Together

Question 6.53

Probability and medical treatments: The three most common treatments for blocked coronary arteries are medication; bypass surgery; and angioplasty, which is a medical procedure that involves clearing out arteries and that leads to higher profits for doctors than do the other two procedures. The highest rate of angioplasty in the United States is in Elyria, a small city in Ohio. A 2006 article in the New York Times stated that “the statistics are so far off the charts—Medicare patients in Elyria receive angioplasties at a rate nearly four times the national average—that Medicare and at least one commercial insurer are starting to ask questions.” The rate, in fact, is three times as high as that of Cleveland, Ohio, which is located just 30 miles from Elyria.

  1. What is the population in this example? What is the sample?

  2. How did probability play a role in the decision of Medicare and the commercial insurer to begin investigations?

  3. How might the z distribution help the investigators to detect possible fraud in this case?

  4. If the insurers determine that physicians in Elyria are committing fraud, but the insurers are wrong, what kind of error would they have made? Explain.

  5. Does Elyria’s extremely high percentile mean that the doctors in town are committing fraud? Cite two other possible reasons for Elyria’s status as an outlier.

Question 6.54

Rural friendships and the General Social Survey: Earlier, we considered data from the GSS on numbers of close friends people reported having. The mean for this variable is 7.44, with a standard deviation of 10.98. Let’s say that you decide to use the GSS data to test whether people who live in rural areas have a different mean number of friends than does the overall GSS sample. Again, treat the overall GSS sample as the entire population of interest. Let’s say that you select 40 people living in rural areas and find that they have an average of 3.9 friends.

  1. What is the independent variable in this study? Is this variable nominal, ordinal, or scale?

  2. What is the dependent variable in this study? Is this variable nominal, ordinal, or scale?

  3. What is the null hypothesis for this study?

  4. What is the research hypothesis for this study?

  5. Would we compare the sample data to a distribution of scores or to a distribution of means? Explain.

  6. Using symbolic notation and formulas, calculate the mean and standard error for the distribution of means.

  7. Using symbolic notation and the formula, calculate the z statistic for this sample.

  8. What is the approximate percentile for this sample?

  9. Let’s say that the researchers concluded that people in rural areas have fewer friends than does the general population (thus rejecting the null hypothesis). If they are incorrect, have they made a Type I or a Type II error? Explain.

Question 6.55

Cheating on standardized tests: In their book Freakonomics, Levitt and Dubner (2009) describe alleged cheating among teachers in the Chicago public school system. Certain classrooms had suspiciously strong performances on standardized tests that often mysteriously declined the following year when a new teacher taught the same students. In about 5% of classrooms studied, Levitt and other researchers found blocks of correct answers, among most students, for the last few questions, an indication that the teacher had changed responses to difficult questions for most students. Let’s assume cheating in a given classroom if the overall standardized test score for the class showed a surprising change from one year to the next.

  1. How are the researchers operationalizing the variable of cheating in this study? Is this a nominal, ordinal, or scale variable?

  2. Explain how researchers can use the z distribution to catch cheating teachers.

  3. How might a histogram or frequency polygon be useful to researchers who are trying to catch cheating teachers?

  4. If researchers falsely conclude that teachers are cheating, what kind of error would they be committing? Explain.