Chapter 18 Exercises

Clarifying the Concepts

Question 18.1

When do we convert scale data to ordinal data?

Question 18.2

When the data on at least one variable are ordinal, the data on any scale variable must be converted from scale to ordinal. How do we convert a scale variable into an ordinal one?

Question 18.3

How does the transformation of scale data to ordinal data solve the problem of outliers?

Question 18.4

What does a histogram of rank-ordered data look like and why does it look that way?

Question 18.5

Explain how the relation between ranks is the core of the Spearman rank-order correlation.

Question 18.6

Define the symbols in the following term:

Question 18.7

What is the possible range of values for the Spearman rank-order correlation and how are these values interpreted?

Question 18.8

How would you respond in a situation in which you are ranking a set of scale data and there are two numbers that are exactly the same?
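Here is a short Python sketch of the usual convention for ties. It assumes ranks run from 1 for the smallest value upward, as in the exercises below. Tied values each receive the mean of the rank positions they would otherwise occupy. The function name `average_ranks` is our own illustration, not part of any library.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean rank."""
    # Indices of the values, sorted by value (stable for equal values)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end stand for ranks start+1..end+1; share their mean
        shared_rank = (start + end + 2) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


# The two 2s would occupy ranks 1 and 2, so each receives 1.5
print(average_ranks([5, 2, 9, 2, 7]))  # [3.0, 1.5, 5.0, 1.5, 4.0]
```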

Question 18.9

When is it appropriate to use the Wilcoxon signed-rank test?

Question 18.10

How is N determined for the Wilcoxon signed-rank test and how does this differ from the way N is typically determined for most statistical tests?

Question 18.11

When conducting a Wilcoxon signed-rank test, why do we use absolute values when ranking the differences?

Question 18.12

When do we use the Mann–Whitney U test?

Question 18.13

What are the assumptions of the Mann–Whitney U test?

Question 18.14

How are the critical values for the Mann–Whitney U test and the Wilcoxon signed-rank test used differently than critical values for parametric tests?
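The decision rule fits in a few lines of Python. For parametric tests such as z and t, we reject the null hypothesis when the test statistic is as extreme as the critical value or more extreme. For the Mann–Whitney U and Wilcoxon signed-rank tables, we reject when the statistic is less than or equal to the critical value. This sketch assumes the critical value has already been looked up in the appropriate table. The function name and the numbers in the usage lines are hypothetical illustrations.

```python
def decide(statistic, critical_value):
    """Decision rule for Mann-Whitney U and Wilcoxon T tables:
    reject H0 when the statistic is at or below the critical value."""
    if statistic <= critical_value:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# For Mann-Whitney, compare the smaller of U1 and U2 with the table value.
# These numbers are arbitrary illustrations, not taken from any exercise.
u1, u2 = 30, 18
critical_u = 20
print(decide(min(u1, u2), critical_u))  # reject the null hypothesis
```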

Question 18.15

When is it appropriate to use the Kruskal–Wallis H test?

Question 18.16

Define and explain the symbols in the following equation:

Question 18.17

If the data meet the assumptions of the parametric test, why is it preferable to use the parametric test rather than the nonparametric alternative?

Question 18.18

What is bootstrapping?

Question 18.19

How can bootstrapping be used as an alternative to nonparametric tests when working with small sample sizes?

Calculating the Statistics

Question 18.20

In order to compute statistics, we need to have working formulas. For the following, (i) identify the incorrect symbol, (ii) state what the correct symbol should be, and (iii) explain why the initial symbol was incorrect.

Question 18.21

Consider the following scale data.

Participant Variable X Variable Y
  1   134.5   64.00
  2   186   60.00
  3   157   61.50
  4   129   66.25
  5   147   65.50
  6   133   62.00
  7   141   62.50
  8   147   62.00
  9   136   63.00
10   147   65.50
  1. Convert the data to ordinal or ranked data, starting with a rank of 1 for the smallest data point.

  2. Compute the Spearman correlation coefficient.
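You can check a hand calculation for Question 18.21 with a short Python sketch. It ranks each variable separately, averaging tied ranks. It then takes the difference D between each participant's two ranks and applies the shortcut formula r_s = 1 − 6ΣD² / (N(N² − 1)). One caution: that formula is exact only when there are no ties. This data set has ties. Software such as SciPy's `scipy.stats.spearmanr` instead computes a Pearson correlation on the average ranks, which can give a slightly different value. The function names here are our own.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end + 2) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """Shortcut formula: r_s = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    rank_x = average_ranks(x)
    rank_y = average_ranks(y)
    n = len(x)
    # D is each participant's rank on X minus rank on Y
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


x = [134.5, 186, 157, 129, 147, 133, 141, 147, 136, 147]
y = [64.00, 60.00, 61.50, 66.25, 65.50, 62.00, 62.50, 62.00, 63.00, 65.50]
print(round(spearman_rs(x, y), 3))  # -0.515
```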

Question 18.22

Consider the following scale data.

Participant Variable X Variable Y
1 $1250 25
2 $1400 21
3 $1100 32
4 $1450 54
5 $1600 38
6 $2100 62
7 $3750 43
8 $1300 32
  1. Convert the data to ordinal or ranked data, starting with a rank of 1 for the smallest data point.

  2. Compute the Spearman correlation coefficient.

Question 18.23

The following fictional data represent the finishing place for runners of a 5-kilometer race and the number of hours they trained per week.

Race Rank Hours Trained
1 25
2 25
3 22
4 18
5 19
6 18
7 12
8 17
9 15
10 16
  1. Calculate the Spearman correlation for this set of data.

  2. Make a decision regarding the null hypothesis. Is there a significant correlation between a runner’s finishing place and the amount the runner trained?

Question 18.24

Imagine that a researcher measured a group of participants at two time points. Fictional scores for these two time points appear below. Are the scores different at time 1 and time 2?

Person Time 1 Time 2
1 56 83    
2 74 116    
3 81 96    
4 47 56    
5 78 120    
6 96 100    
7 72 71    
  1. Compute the Wilcoxon signed-rank test statistic.

  2. Make a decision regarding the null hypothesis.
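The steps of the Wilcoxon signed-rank test can be sketched in Python using the Question 18.24 data. The sketch drops pairs with a difference of zero, so N counts only the nonzero differences. It then ranks the absolute differences with tied ranks averaged and reattaches the signs. T is the smaller of the two signed-rank sums. The function names are hypothetical. The direction of subtraction (Time 1 minus Time 2) is an arbitrary choice that does not change T.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end + 2) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def wilcoxon_t(first, second):
    """Return (T, N): T is the smaller signed-rank sum; N excludes zero differences."""
    # Difference scores, dropping any pair with no change
    diffs = [a - b for a, b in zip(first, second) if a != b]
    # Rank the absolute differences so size, not direction, sets the rank
    ranks = average_ranks([abs(d) for d in diffs])
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(positive, negative), len(diffs)


time_1 = [56, 74, 81, 47, 78, 96, 72]
time_2 = [83, 116, 96, 56, 120, 100, 71]
print(wilcoxon_t(time_1, time_2))  # (1.0, 7)
```

As a quick check on the ranking, the positive and negative rank sums should add to N(N + 1)/2.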

Question 18.25

Assume a group of students provides happiness ratings for how happy they feel during the school year and how happy they feel during the summer. Do happiness levels differ depending on the time of year? Fictional data appear below:

Student School Year Summer
1 7 4
2 4 6
3 5 5
4 3 4
5 4 8
6 5 7
7 3 2
  1. Compute the Wilcoxon signed-rank test statistic.

  2. Make a decision regarding the null hypothesis.

Question 18.26

Compute the Wilcoxon signed-rank test statistic for the following set of data:

Person Score 1 Score 2
1 6 6
2 5 3
3 4 2
4 3 5
5 2 1
6 1 4

Question 18.27

Compute the Mann–Whitney U statistic for the following data. Each participant has been assigned a group and a participant number; these are shown in the “Group 1” and “Group 2” columns.

Group 1 Ordinal Dependent Variable Group 2 Ordinal Dependent Variable
1 1 1 11  
2 2.5   2 9
3 8 3 2.5  
4 4 4 5
5 6 5 7
6 10   6 12  
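Here is a Python sketch of the Mann–Whitney U calculation for Question 18.27. It assumes the formulas U1 = n1n2 + n1(n1 + 1)/2 − ΣR1 and U2 = n1n2 + n2(n2 + 1)/2 − ΣR2. All scores are ranked together before being split back into groups. In this exercise the scores are already ranks, so ranking them again leaves them unchanged. The function names are our own.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end + 2) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_1, group_2):
    """Return (U1, U2) from the rank sums of the two groups."""
    n1, n2 = len(group_1), len(group_2)
    # Rank all scores together, then split the ranks back into groups
    ranks = average_ranks(list(group_1) + list(group_2))
    r1 = sum(ranks[:n1])
    r2 = sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2


group_1 = [1, 2.5, 8, 4, 6, 10]
group_2 = [11, 9, 2.5, 5, 7, 12]
print(mann_whitney_u(group_1, group_2))  # (25.5, 10.5)
```

U1 + U2 always equals n1n2, so each value checks the other. The smaller U is the one compared with the table's critical value.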

Question 18.28

Compute the Mann–Whitney U statistic for the following data. Each participant has been assigned a group and a participant number; these are shown in the “Group 1” and “Group 2” columns.

Group 1 Scale Dependent Variable Group 2 Scale Dependent Variable
1 8 9 3
2 5 10   4
3 5 11   2
4 7 12   1
5 10   13   1
6 14   14   5
7 9 15   6
8 11  

Question 18.29

Are men or women more likely to be at the top of their class? The following table depicts fictional class standings for a group of men and women:

Student Gender Class Standing Student Gender Class Standing
1 Male 98   7 Male 43
2 Female 72   8 Male 33
3 Male 15   9 Female 17
4 Female 3 10   Female 82
5 Female 102     11   Male 63
6 Female 8 12   Male 25
  1. Compute the Mann–Whitney U test statistic.

  2. Make a decision regarding the null hypothesis. Is there a significant difference in the class ranks of men and women?

Question 18.30

Assume a researcher compared the performance of two independent groups of participants on an ordinal variable using the Mann–Whitney U test. The first group had 8 participants and the second group had 11 participants.

  1. Using a p level of 0.05 and a two-tailed test, determine the critical value.

  2. Assume the researcher calculated U1 = 22 and U2 = 17. Make a decision regarding the null hypothesis and explain that decision.

  3. Assume the researcher calculated U1 = 24 and U2 = 30. Make a decision regarding the null hypothesis and explain that decision.

  4. Assume the researcher calculated U1 = 13 and U2 = 9. Make a decision regarding the null hypothesis and explain that decision.

Question 18.31

The following data set represents the scores of three independent groups of participants on a single scale dependent variable. Calculate the Kruskal–Wallis H statistic for this data set.

Group 1 Group 2 Group 3
0.22 1.03 0.52
0.55 0.89 0.67
1.20 0.74 2.83
0.83 1.86 3.20
1.01 2.21 2.75
0.86 0.94 1.74
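You can check the Kruskal–Wallis H statistic for Question 18.31 with a Python sketch. It pools all scores, ranks them, and applies H = 12/(N(N + 1)) Σ(R_i²/n_i) − 3(N + 1), where R_i is the sum of ranks in group i. This is algebraically equivalent to 12/(N(N + 1)) Σ n_i(M_i − GM)², where M_i is a group's mean rank and GM is the grand mean rank. The sketch applies no correction for ties, and this data set has none. The names are our own illustrations.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end + 2) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(*groups):
    """H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), with no tie correction."""
    pooled = [score for group in groups for score in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total = 0.0
    start = 0
    for group in groups:
        # Sum of the ranks that belong to this group
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


group_1 = [0.22, 0.55, 1.20, 0.83, 1.01, 0.86]
group_2 = [1.03, 0.89, 0.74, 1.86, 2.21, 0.94]
group_3 = [0.52, 0.67, 2.83, 3.20, 2.75, 1.74]
print(round(kruskal_wallis_h(group_1, group_2, group_3), 2))  # 3.03
```

H is then compared with a chi-square critical value with k − 1 degrees of freedom, where k is the number of groups.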

Question 18.32

The following data set represents the scores of three independent groups of participants on a single ordinal dependent variable. Calculate the Kruskal–Wallis H statistic for this data set.

Group 1 Group 2 Group 3
1 1 5
5 3 4
3 3 1
2 4 1
2 2 3

Question 18.33

The following data set represents the scores of three independent groups of participants on a single scale dependent variable:

Group 1 Group 2 Group 3
15 38 12
27 22 72
16 56 84
41 33
  1. Calculate the Kruskal–Wallis H statistic for this data set.

  2. Make a decision regarding the null hypothesis. Is there a significant difference among the groups?

Applying the Concepts

Question 18.34

University students, cell phone bills, and ordinal data: Here are some monthly cell phone bills, in dollars, for university students:

100     60     35     50     50     50     60     65    
0     75     100     55     50     40     80    
200     30     50     108     500     100     45    
40     45     50     40     40     100     80    
  1. Convert these data from scale to ordinal. (Don’t forget to put them in order first.) What happens to an outlier when you convert these data to ordinal?

  2. What approximate shape would the distribution of these data take? Would they likely be normally distributed? Explain why the distribution of ordinal data is never normal.

  3. Why does it not matter if the ordinal variable is normally distributed? (Hint: Think about what kind of hypothesis test you would conduct.)
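For part 1 of Question 18.34, a Python sketch shows what ranking does to the $500 outlier. As in the other exercises, tied values share the average of their rank positions. `average_ranks` is our own helper, not a library function.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end + 2) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


# Monthly cell phone bills (dollars) from Question 18.34
bills = [100, 60, 35, 50, 50, 50, 60, 65,
         0, 75, 100, 55, 50, 40, 80,
         200, 30, 50, 108, 500, 100, 45,
         40, 45, 50, 40, 40, 100, 80]

# Tied bills get the same rank, so one dictionary entry per distinct bill suffices
rank_of = dict(zip(bills, average_ranks(bills)))
for bill in sorted(rank_of):
    print(f"${bill}: rank {rank_of[bill]}")
```

On the dollar scale, $500 sits $300 above the next-highest bill. After ranking, it sits just one rank above $200 (rank 29 versus 28), the same distance that separates $200 from $108.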

Question 18.35

World cities, livability, and nonparametric hypothesis tests: CNN.com reported on a 2012 study that ranked the world’s cities in terms of how livable they are (https://www.eiu.com/public/topical_report.aspx?campaignid=Liveability2012), using a range of criteria related to stability, health care, culture and environment, education, and infrastructure. The top 10, in order, were: Melbourne, Australia; Vienna, Austria; Vancouver, Toronto, and Calgary, all in Canada; Adelaide and Sydney, both in Australia; Helsinki, Finland; Perth, Australia; and Auckland, New Zealand. For each of the following research questions, state which nonparametric hypothesis test is appropriate: the Spearman rank-order correlation coefficient, the Wilcoxon signed-rank test, the Mann–Whitney U test, or the Kruskal–Wallis H test. Explain your answers and indicate the equivalent parametric test.

  1. Which cities tend to receive higher rankings—those north of the equator or those south of the equator?

  2. Did the top 10 cities tend to change rank relative to their position in the previous study?

  3. Are the livability rankings related to a city’s economic status?

  4. On which continent do cities tend to have the highest rankings?

Question 18.36

Fantasy baseball and the Spearman correlation coefficient: In fantasy baseball, groups of 12 league participants conduct a draft in which they can “buy” any baseball players from any teams across one of the two Major League Baseball (MLB) leagues (the American League and the National League). These makeshift teams are compared on the basis of the combined statistics of the individual baseball players. For example, statistics about home runs are transformed into points, and each fantasy team receives a total score of all combined points based on its baseball players, regardless of their real-life team. Many in the fantasy and real-life baseball worlds have wondered how success in fantasy leagues maps onto real MLB teams’ success in terms of winning baseball games. Walker (2006) compared the fantasy league performances of the players for each American League team with their actual American League finishes for the 2004 season, the year the Boston Red Sox broke the legendary “curse” against them and won the World Series. The data, sorted from highest to lowest fantasy league score, are shown in the accompanying table.

Team Fantasy League Points Actual American League Finish
Boston 117.5 2
New York 109.5 1
Anaheim 108     3.5  
Minnesota 97   3.5  
Texas 85   6
Chicago 80   7
Cleveland 79   8
Oakland 77   5
Baltimore 74.5 9
Detroit 68.5 10  
Seattle 51   13  
Tampa Bay 47.5 11  
Toronto 35.5 12  
Kansas City 20   14  
  1. What are the two variables of interest? For each variable, state whether it’s scale or ordinal.

  2. Calculate the Spearman correlation coefficient for these two variables. Remember to convert any scale variables to ranks.

  3. What does the coefficient tell us about the relation between these two variables?

  4. Why couldn’t we calculate a Pearson correlation coefficient for these data?

Question 18.37

Test-taking speed, grade, and the Spearman correlation coefficient: Does speed in completing a test correlate with one’s grade? Here are test scores for eight students in one of our statistics classes. They are arranged in order from the student who turned in the test first to the student who turned in the test last.

98 74 87 92 88 93 62 67

  1. What are the two variables of interest? For each variable, state whether it’s scale or ordinal.

  2. Calculate the Spearman correlation coefficient for these two variables. Remember to convert any scale variables to ranks.

  3. What does the coefficient tell us about the relation between these two variables?

  4. Why couldn’t we calculate a Pearson correlation coefficient for these data?

  5. Does this Spearman correlation coefficient suggest that students should take their tests as quickly as possible? That is, does it indicate that taking the test quickly causes a good grade? Explain your answer.

  6. What third variables might be responsible for this correlation? That is, what third variables might cause both speedy test taking and a good test grade?

Question 18.38

Test-taking speed, grade, and interpreting the Spearman correlation coefficient: Consider again the two variables described in Exercise 18.37, test grade and speed in taking the test. Imagine that each of the following numbers represents the Spearman correlation coefficient that quantifies the relation between test grade and speed in taking the test. Recall that test grade was converted to ranks such that the top grade of 98 is ranked 1, and for speed in taking the test, the fastest person was ranked 1. What does each coefficient suggest about the relation between the variables? Using the guidelines for the Pearson correlation coefficient, indicate whether each coefficient is roughly small (0.10), medium (0.30), or large (0.50). Specify which of these coefficients suggests the strongest relation between the two variables and which suggests the weakest. [You calculated the actual correlation between these variables in Exercise 18.37, part 2.]

  1. 1.00

  2. −0.001

  3. 0.52

  4. −0.27

  5. −0.98

  6. 0.09

Question 18.39

Hockey wins and the Wilcoxon signed-rank test: Are Canadian professional hockey teams consistent over time? Here are the wins per season (out of 82 games) for the six Canadian teams in the National Hockey League (NHL). For comparison, in 1995–1996, the top team in the Eastern Conference was the Pittsburgh Penguins, with 49 wins, and the top team in the Western Conference was the Detroit Red Wings, with 62 wins. In 2005–2006, the top team in the Eastern Conference was the Ottawa Senators, with 52 wins, and the top team in the Western Conference was, once again, Detroit, with 58 wins. (The Winnipeg Jets weren’t in existence in the 2005–2006 season, so we didn’t include them here.)

Team 1995–1996 Season 2005–2006 Season
Calgary Flames 34 46
Edmonton Oilers 30 41
Montréal Canadiens 40 42
Ottawa Senators 18 52
Toronto Maple Leafs 34 41
Vancouver Canucks 32 42
  1. What is the independent variable and what are its levels? What is the dependent variable?

  2. Is this a between-groups or within-groups design? Explain.

  3. Why might it be preferable to use a nonparametric hypothesis test for these data?

  4. Conduct all six steps of hypothesis testing for a Wilcoxon signed-rank test for matched pairs.

  5. How would you present these statistics in a journal article?

Question 18.40

Online plane tickets and the Wilcoxon signed-rank test: Which Web site offers better fares, CheapTickets or Expedia? In February 2007, we searched for the cheapest fares for round-trip international flights during the peak summer travel season, which was not far in the future: leaving on July 7, 2007, and returning on July 28, 2007. We searched for each itinerary using both search engines.

Itinerary CheapTickets Expedia
Athens, GA, to Johannesburg, South Africa $2403     $2580    
Chicago to Chennai, India 1884     2044    
Columbus, OH, to Belgrade, Serbia 1259     1436    
Denver to Geneva, Switzerland 1392     1412    
Montréal to Dublin, Ireland 1097     1152    
New York City to Reykjavik, Iceland 935     931    
San Antonio to Hong Kong 1407     1400    
Toronto to Istanbul, Turkey 1261     1429    
Tulsa to Guadalajara, Mexico 565     507    
Vancouver to Melbourne, Australia 1621     1613    
  1. What is the independent variable and what are its levels? What is the dependent variable?

  2. Is this a between-groups or a within-groups design? Explain.

  3. Conduct all six steps of hypothesis testing for a Wilcoxon signed-rank test for matched pairs.

  4. How would you present these statistics in a journal article?

Question 18.41

Public versus private universities and the Mann–Whitney U test: Do public or private universities tend to have better sociology graduate programs? U.S. News & World Report publishes online rankings of graduate schools across a range of disciplines. The table below includes the Report’s 2013 list of the top 19 doctoral programs in sociology and notes whether the schools are public or private. Schools listed at the same rank are tied.

University Rank Type of School Public Rank Private Rank
Princeton University 2 Private 2
University of California, Berkeley 2 Public 2
University of Wisconsin, Madison 2 Public 2
Stanford University 4.5 Private 4.5
University of Michigan, Ann Arbor 4.5 Public 4.5
Harvard University 7 Private 7
University of Chicago 7 Private 7
University of North Carolina, Chapel Hill 7 Public 7
University of California, Los Angeles 9 Public 9
Northwestern University 10.5 Private 10.5
University of Pennsylvania 10.5 Private 10.5
Columbia University 12.5 Private 12.5
Indiana University, Bloomington 12.5 Public 12.5
Duke University 14.5 Private 14.5
University of Texas, Austin 14.5 Public 14.5
New York University 16 Private 16
Cornell University 18 Private 18
Ohio State University 18 Public 18
Pennsylvania State University, University Park 18 Public 18

  1. What is the independent variable, and what are its levels? What is the dependent variable?

  2. Is this a between-groups or within-groups design? Explain.

  3. Why do we have to use a nonparametric hypothesis test for these data?

  4. Conduct all six steps of hypothesis testing for a Mann–Whitney U test.

  5. How would you present these statistics in a journal article?

Question 18.42

Gender, aggression, and the interpretation of a Mann–Whitney U test: Spanish researchers examining aggression in children’s dreams reported the following: “Using the Mann–Whitney nonparametrical statistical test on the gender differences, we found a significant difference between boys and girls in Group 1 for overall [aggression] (U = 44.00, p = 0.004) and received aggression (U = 48.00, p = 0.005). So, in their dreams, younger boys not only had a higher level of general aggression but also received more severe aggressive acts than girls of the same age” (emphasis in original) (Oberst, Charles, & Chamarro, 2005, p. 175).

  1. What is the independent variable, and what are its levels? What is the dependent variable?

  2. Is this a between-groups or within-groups design?

  3. Which hypothesis test did the researchers conduct? Why might they have chosen a nonparametric test? Why do you think they chose this particular nonparametric test?

  4. In your own words, describe what they found.

  5. Can we conclude that gender caused a difference in levels of aggression in dreams? Explain. Provide at least two reasons why gender might not cause certain levels of aggression in dreams even though these variables are associated.

Question 18.43

Cell phone bill, hours studied, and the shapes of distributions: The following figures display data that depict the relation between students’ monthly cell phone bills and the number of hours they report that they study per week.

  1. What does the accompanying scatterplot suggest about the shape of the distribution for hours studied per week? What does it suggest about the shape of the distribution for monthly cell phone bill?

  2. What does the accompanying grouped frequency histogram suggest about the shape of the distribution for monthly cell phone bill?

  3. Is it a good idea to use a parametric hypothesis test for these data? Explain.

Question 18.44

Angelman syndrome and bootstrapping: Angelman syndrome is a rare genetic disease in which children are delayed developmentally and exhibit unusual symptoms such as inappropriate and prolonged laughter, difficulty in speaking or inability to speak, and seizures. Imagine that a researcher obtained vocabulary data for six children with Angelman syndrome and wants to develop an estimate of the mean vocabulary score of the population of children with Angelman syndrome. (Although those with Angelman syndrome often cannot speak, they are usually able to understand at least some simple language and they may learn to communicate with sign language.) The General Social Survey (GSS), using a multiple-choice format, asks children the meaning of 10 words; the GSS data have a mean of 6.1, with a standard deviation of 2.1. The fictional data for the six children with Angelman syndrome are: 0, 1, 1, 2, 3, and 4. Write each of these six numbers on a separate, small piece of paper.

  1. Put the six pieces of paper in a bowl or hat, and then pull six out, one at a time, replacing each one and mixing them up before pulling the next. List the numbers and take the mean. Repeat this procedure two more times so that you have three lists and three means.

  2. We did this 20 times and got the following 20 means:

    Determine the 90% confidence interval for these means. (Hint: Arrange them in order and then choose the middle 90% of scores.) Remember, were we really to bootstrap the data, we would have a computer do it because 20 means is far too few.

  3. Why is bootstrapping a helpful technique in this particular situation?
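In practice, a computer performs bootstrapping, and the resampling procedure in part 1 can be automated with a Python sketch. The sketch draws samples of six with replacement and records each sample's mean. It then keeps the middle 90% of the sorted means. This is a percentile interval, one of several ways to form a bootstrap confidence interval. The 10,000 resamples and the random seed are arbitrary choices of ours, and the function name is hypothetical.

```python
import random
import statistics


def bootstrap_interval(data, n_resamples=10_000, level=0.90, seed=1):
    """Percentile bootstrap interval for the mean: resample with replacement,
    record each resample's mean, and keep the middle 'level' of the means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = sorted(
        statistics.mean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    # round() guards against floating-point error in tail * n_resamples
    lower = means[round(tail * n_resamples)]
    upper = means[round((1 - tail) * n_resamples) - 1]
    return lower, upper


vocabulary = [0, 1, 1, 2, 3, 4]  # fictional scores from Question 18.44
print(bootstrap_interval(vocabulary))
```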

Putting It All Together

Question 18.45

“Smart” states and the Kruskal–Wallis H test: The Morgan Quitno Press regularly ranks U.S. states on how “smart” they are, based on 21 criteria including per-student school expenditures, percent of population with high school degrees, high school dropout rate, average class size, and “percent of 4th graders whose parents have strict rules about getting homework done.” Here are the rankings for all 50 states for 2004.

  1. Massachusetts (NE)
  2. Connecticut (NE)
  3. Vermont (NE)
  4. New Jersey (NE)
  5. Wisconsin (MW)
  6. New York (NE)
  7. Minnesota (MW)
  8. Iowa (MW)
  9. Pennsylvania (NE)
  10. Montana
  11. Maine (NE)
  12. Virginia (S)
  13. Nebraska (MW)
  14. New Hampshire (NE)
  15. Kansas (MW)
  16. Wyoming
  17. Indiana (MW)
  18. Maryland
  19. North Dakota
  20. Ohio (MW)
  21. Colorado
  22. South Dakota
  23. Rhode Island (NE)
  24. Illinois (MW)
  25. North Carolina (S)
  26. Missouri (MW)
  27. Delaware
  28. Utah
  29. Idaho
  30. Washington
  31. Michigan (MW)
  32. South Carolina (S)
  33. Texas
  34. West Virginia
  35. Oregon
  36. Arkansas (S)
  37. Kentucky (S)
  38. Georgia (S)
  39. Florida (S)
  40. Oklahoma
  41. Tennessee (S)
  42. Hawaii
  43. California
  44. Alabama (S)
  45. Alaska
  46. Louisiana (S)
  47. Mississippi (S)
  48. Arizona
  49. Nevada
  50. New Mexico

We marked states in the Northeast with an NE, in the Midwest with a MW, and in the South with an S. Do these regions tend to have different rankings from one another?

  1. What is the independent variable and what are its levels? What is the dependent variable?

  2. Is this a between-groups or within-groups design? Explain.

  3. Why do we have to use a nonparametric hypothesis test for these data?

  4. Conduct all six steps of hypothesis testing for a Kruskal–Wallis H test. Note that you must re-rank only the states in this study (those marked NE, MW, or S) rather than use their positions on the original list.

  5. How would you present these statistics in a journal article?

  6. Explain why a statistically significant Kruskal–Wallis H statistic does not tell us exactly where the specific differences lie. If there is a statistically significant finding for this example, determine where the difference lies by calculating Kruskal–Wallis H statistics for each pair.

  7. What test would you conduct if you had scale scores for the 50 states? Explain your answer.

  8. If you were interested in the validity of the categorizing of states as “smart,” how could you determine this? What criteria might you use? Be specific.
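For part 4, the re-ranking step and the H statistic can be sketched in Python. The lists hold each marked state's position on the 2004 list. Unmarked states are excluded, so those positions are converted to new ranks from 1 to 30 before the rank sums are taken. The sketch applies H = 12/(N(N + 1)) Σ(R_i²/n_i) − 3(N + 1) without a tie correction, since there are no ties here. The variable names are our own.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end + 2) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


# Positions on the 2004 list for the states marked NE, MW, and S
northeast = [1, 2, 3, 4, 6, 9, 11, 14, 23]
midwest = [5, 7, 8, 13, 15, 17, 20, 24, 26, 31]
south = [12, 25, 32, 36, 37, 38, 39, 41, 44, 46, 47]

groups = [northeast, midwest, south]
pooled = [position for group in groups for position in group]
# Re-rank only these 30 states, 1 through 30
new_ranks = average_ranks(pooled)

n_total = len(pooled)
rank_sums = []
start = 0
for group in groups:
    rank_sums.append(sum(new_ranks[start:start + len(group)]))
    start += len(group)

h = 12 / (n_total * (n_total + 1)) * sum(
    r ** 2 / len(g) for r, g in zip(rank_sums, groups)
) - 3 * (n_total + 1)
print(rank_sums, round(h, 2))  # [65.0, 136.0, 264.0] 18.68
```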

Question 18.46

Stroke patients, treatment, and type of nonparametric test: Researchers who work with special populations, such as neurologically impaired people or people with less common psychiatric conditions, often face small sample sizes because relatively few patients are available. As a result, these researchers often turn to nonparametric statistical tests. For each of the following research descriptions, state which nonparametric hypothesis test is most appropriate: the Spearman rank-order correlation coefficient, the Wilcoxon signed-rank test, the Mann–Whitney U test, or the Kruskal–Wallis H test. Explain your answers.

  1. People who have had a stroke often have whole or partial paralysis on the side of their body opposite the side of the brain damage. Leung, Ng, and Fong (2009) were interested in the effects of a treatment program for constrained movement on the recovery from paralysis. They compared the arm-movement ability of eight stroke patients before and after the treatment.

  2. Leung and colleagues (2009) were also interested in whether the amount of improvement after the therapy was related to the number of months that had passed since the patient experienced the stroke.

  3. Five of Leung and colleagues’ (2009) patients were male and three were female. We could ask whether post-treatment movement performance was different between men and women.

  4. For parts 1, 2, and 3, what test would you use if you had 180 patients instead of 8? Explain your answers.

  5. For parts 1, 2, and 3, what was the dependent variable in each case, and how might the researchers have operationalized this variable?

  6. The researchers used a treatment program to help patients recover from a stroke. If the researchers had enough patients to randomly assign some to treatment and others to no treatment, would they have been able to have either a blind or a double-blind design? Explain your answer.