Clarifying the Concepts
When do we convert scale data to ordinal data?
When the data on at least one variable are ordinal, the data on any scale variable must be converted from scale to ordinal. How do we convert a scale variable into an ordinal one?
How does the transformation of scale data to ordinal data solve the problem of outliers?
What does a histogram of rank-
Explain how the relation between ranks is the core of the Spearman rank-order correlation coefficient.
Define the symbols in the following term:
What is the possible range of values for the Spearman rank-order correlation coefficient?
How would you respond in a situation in which you are ranking a set of scale data and there are two numbers that are exactly the same?
When is it appropriate to use the Wilcoxon signed-rank test?
How is N determined for the Wilcoxon signed-rank test?
When conducting a Wilcoxon signed-
When do we use the Mann–Whitney U test?
What are the assumptions of the Mann–Whitney U test?
How are the critical values for the Mann–
When is it appropriate to use the Kruskal–Wallis H test?
Define and explain the symbols in the following equation:
If the data meet the assumptions of the parametric test, why is it preferable to use the parametric test rather than the nonparametric alternative?
What is bootstrapping?
How can bootstrapping be used as an alternative to nonparametric tests when working with small sample sizes?
Calculating the Statistics
In order to compute statistics, we need to have working formulas. For the following, (i) identify the incorrect symbol, (ii) state what the correct symbol should be, and (iii) explain why the initial symbol was incorrect.
Consider the following scale data.
Participant | Variable X | Variable Y |
---|---|---|
1 | 134.5 | 64.00 |
2 | 186 | 60.00 |
3 | 157 | 61.50 |
4 | 129 | 66.25 |
5 | 147 | 65.50 |
6 | 133 | 62.00 |
7 | 141 | 62.50 |
8 | 147 | 62.00 |
9 | 136 | 63.00 |
10 | 147 | 65.50 |
Convert the data to ordinal or ranked data, starting with a rank of 1 for the smallest data point.
Compute the Spearman correlation coefficient.
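The two steps above (ranking, then correlating the ranks) can be sketched in Python. This is a minimal sketch, not a definitive implementation: the function names `average_ranks` and `spearman_rs` are hypothetical, tied scores are assumed to share the mean of the ranks they occupy, and the coefficient is assumed to use the common shortcut formula r_s = 1 − 6ΣD²/[N(N² − 1)], which is only approximate when there are many ties.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean of their positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1          # 1-based position of the first tied score
        last = first + ordered.count(v) - 1   # 1-based position of the last tied score
        ranks.append((first + last) / 2)
    return ranks


def spearman_rs(x, y):
    """Shortcut formula (an assumption here): r_s = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Scale data from the first table above.
x = [134.5, 186, 157, 129, 147, 133, 141, 147, 136, 147]
y = [64.00, 60.00, 61.50, 66.25, 65.50, 62.00, 62.50, 62.00, 63.00, 65.50]

print("Ranks for X:", average_ranks(x))
print("Ranks for Y:", average_ranks(y))
print("r_s:", round(spearman_rs(x, y), 3))
```

Comparing the printed ranks with a hand-ranked table is a quick way to check how ties were handled before trusting the coefficient.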
Consider the following scale data.
Participant | Variable X | Variable Y |
---|---|---|
1 | $1250 | 25 |
2 | $1400 | 21 |
3 | $1100 | 32 |
4 | $1450 | 54 |
5 | $1600 | 38 |
6 | $2100 | 62 |
7 | $3750 | 43 |
8 | $1300 | 32 |
Convert the data to ordinal or ranked data, starting with a rank of 1 for the smallest data point.
Compute the Spearman correlation coefficient.
The following fictional data represent the finishing place for runners of a 5-
Race Rank | Hours Trained | Race Rank | Hours Trained |
---|---|---|---|
1 | 25 | 6 | 18 |
2 | 25 | 7 | 12 |
3 | 22 | 8 | 17 |
4 | 18 | 9 | 15 |
5 | 19 | 10 | 16 |
Calculate the Spearman correlation for this set of data.
Make a decision regarding the null hypothesis. Is there a significant correlation between a runner’s finishing place and the amount the runner trained?
Imagine that a researcher measured a group of participants at two time points. Fictional scores for these two time points appear below. Are the scores different at time 1 and time 2?
Person | Time 1 | Time 2 |
---|---|---|
1 | 56 | 83 |
2 | 74 | 116 |
3 | 81 | 96 |
4 | 47 | 56 |
5 | 78 | 120 |
6 | 96 | 100 |
7 | 72 | 71 |
Compute the Wilcoxon signed-rank test statistic.
Make a decision regarding the null hypothesis.
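For paired scores such as the time 1 and time 2 data above, the signed-rank computation can be sketched as follows. The sketch rests on assumptions the exercise does not spell out: differences of zero are dropped (so N counts only nonzero differences), absolute differences are ranked from smallest (rank 1) upward with ties sharing the mean rank, and T is the smaller of the two rank sums. The names `average_ranks` and `wilcoxon_t` are hypothetical.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean of their positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1
        last = first + ordered.count(v) - 1
        ranks.append((first + last) / 2)
    return ranks


def wilcoxon_t(scores_1, scores_2):
    """Return (T, sum of positive ranks, sum of negative ranks, N of nonzero differences)."""
    diffs = [a - b for a, b in zip(scores_1, scores_2)]
    diffs = [d for d in diffs if d != 0]            # assumption: zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])  # rank the absolute differences
    sum_pos = sum(r for d, r in zip(diffs, ranks) if d > 0)
    sum_neg = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(sum_pos, sum_neg), sum_pos, sum_neg, len(diffs)


# Fictional scores from the table above.
time_1 = [56, 74, 81, 47, 78, 96, 72]
time_2 = [83, 116, 96, 56, 120, 100, 71]

t, sum_pos, sum_neg, n = wilcoxon_t(time_1, time_2)
print("T =", t, "| sum of positive ranks =", sum_pos,
      "| sum of negative ranks =", sum_neg, "| N =", n)
```

The decision still requires comparing T with the critical value for the given N from a Wilcoxon table; the null hypothesis is rejected when T is less than or equal to that value.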
Assume a group of students provides happiness ratings for how happy they feel during the school year and how happy they feel during the summer. Do happiness levels differ depending on the time of year? Fictional data appear below:
Student | School Year | Summer |
---|---|---|
1 | 7 | 4 |
2 | 4 | 6 |
3 | 5 | 5 |
4 | 3 | 4 |
5 | 4 | 8 |
6 | 5 | 7 |
7 | 3 | 2 |
Compute the Wilcoxon signed-rank test statistic.
Make a decision regarding the null hypothesis.
Compute the Wilcoxon signed-
Person | Score 1 | Score 2 |
---|---|---|
1 | 6 | 6 |
2 | 5 | 3 |
3 | 4 | 2 |
4 | 3 | 5 |
5 | 2 | 1 |
6 | 1 | 4 |
Compute the Mann–
Group 1 | Ordinal Dependent Variable | Group 2 | Ordinal Dependent Variable |
---|---|---|---|
1 | 1 | 1 | 11 |
2 | 2.5 | 2 | 9 |
3 | 8 | 3 | 2.5 |
4 | 4 | 4 | 5 |
5 | 6 | 5 | 7 |
6 | 10 | 6 | 12 |
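Because the table above already lists ranks, the two U statistics follow directly from the rank sums. The sketch below assumes the usual formulas U1 = n1·n2 + n1(n1 + 1)/2 − ΣR1 and U2 = n1·n2 + n2(n2 + 1)/2 − ΣR2; the function name `mann_whitney_u` is hypothetical.

```python
def mann_whitney_u(ranks_1, ranks_2):
    """Compute U for each group from ranks assigned across both groups combined."""
    n1, n2 = len(ranks_1), len(ranks_2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - sum(ranks_1)
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - sum(ranks_2)
    return u1, u2


# Ranks from the table above (the two scores of 2.5 are a tie for ranks 2 and 3).
group_1 = [1, 2.5, 8, 4, 6, 10]
group_2 = [11, 9, 2.5, 5, 7, 12]

u1, u2 = mann_whitney_u(group_1, group_2)
print("U1 =", u1, "| U2 =", u2, "| smaller U =", min(u1, u2))
```

A useful arithmetic check is that U1 + U2 equals n1·n2. The smaller U is the one compared with the critical value.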
Compute the Mann–
Group 1 | Scale Dependent Variable | Group 2 | Scale Dependent Variable |
---|---|---|---|
1 | 8 | 9 | 3 |
2 | 5 | 10 | 4 |
3 | 5 | 11 | 2 |
4 | 7 | 12 | 1 |
5 | 10 | 13 | 1 |
6 | 14 | 14 | 5 |
7 | 9 | 15 | 6 |
8 | 11 | | |
Are men or women more likely to be at the top of their class? The following table depicts fictional class standings for a group of men and women:
Student | Gender | Class Standing | Student | Gender | Class Standing |
---|---|---|---|---|---|
1 | Male | 98 | 7 | Male | 43 |
2 | Female | 72 | 8 | Male | 33 |
3 | Male | 15 | 9 | Female | 17 |
4 | Female | 3 | 10 | Female | 82 |
5 | Female | 102 | 11 | Male | 63 |
6 | Female | 8 | 12 | Male | 25 |
Compute the Mann–Whitney U statistic.
Make a decision regarding the null hypothesis. Is there a significant difference in the class ranks of men and women?
Assume a researcher compared the performance of two independent groups of participants on an ordinal variable using the Mann–
Using a p level of 0.05 and a two-
Assume the researcher calculated U1 = 22 and U2 = 17. Make a decision regarding the null hypothesis and explain that decision.
Assume the researcher calculated U1 = 24 and U2 = 30. Make a decision regarding the null hypothesis and explain that decision.
Assume the researcher calculated U1 = 13 and U2 = 9. Make a decision regarding the null hypothesis and explain that decision.
The following data set represents the scores of three independent groups of participants on a single scale dependent variable. Calculate the Kruskal–
Group 1 | Group 2 | Group 3 |
---|---|---|
0.22 | 1.03 | 0.52 |
0.55 | 0.89 | 0.67 |
1.20 | 0.74 | 2.83 |
0.83 | 1.86 | 3.20 |
1.01 | 2.21 | 2.75 |
0.86 | 0.94 | 1.74 |
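For scale data like those in the table above, the scores are first ranked across all groups combined, and H is then built from each group's mean rank. The sketch below assumes the form H = [12 / (N(N + 1))] Σ n(M − GM)², where M is a group's mean rank and GM is the grand mean of all ranks, and it applies no correction for ties. The names `average_ranks` and `kruskal_wallis_h` are hypothetical.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean of their positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1
        last = first + ordered.count(v) - 1
        ranks.append((first + last) / 2)
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum over groups of n * (mean rank - grand mean rank)^2."""
    pooled = [score for group in groups for score in group]
    ranks = average_ranks(pooled)          # ranks across all groups combined
    n_total = len(pooled)
    grand_mean = (n_total + 1) / 2         # the mean of the ranks 1..N
    weighted_sq_dev = 0.0
    start = 0
    for group in groups:
        group_ranks = ranks[start:start + len(group)]
        start += len(group)
        mean_rank = sum(group_ranks) / len(group)
        weighted_sq_dev += len(group) * (mean_rank - grand_mean) ** 2
    return 12 / (n_total * (n_total + 1)) * weighted_sq_dev


# Scale scores from the table above, one list per group.
group_1 = [0.22, 0.55, 1.20, 0.83, 1.01, 0.86]
group_2 = [1.03, 0.89, 0.74, 1.86, 2.21, 0.94]
group_3 = [0.52, 0.67, 2.83, 3.20, 2.75, 1.74]

h = kruskal_wallis_h([group_1, group_2, group_3])
print("H =", round(h, 3))
```

H is compared with a chi-square critical value with (number of groups − 1) degrees of freedom.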
The following data set represents the scores of three independent groups of participants on a single ordinal dependent variable. Calculate the Kruskal–
Group 1 | Group 2 | Group 3 |
---|---|---|
1 | 1 | 5 |
5 | 3 | 4 |
3 | 3 | 1 |
2 | 4 | 1 |
2 | 2 | 3 |
The following data set represents the scores of three independent groups of participants on a single scale dependent variable:
Group 1 | Group 2 | Group 3 |
---|---|---|
15 | 38 | 12 |
27 | 22 | 72 |
16 | 56 | 84 |
41 | 33 |
Calculate the Kruskal–
Make a decision regarding the null hypothesis. Is there a significant difference among the groups?
Applying the Concepts
University students, cell phone bills, and ordinal data: Here are some monthly cell phone bills, in dollars, for university students:
100 | 60 | 35 | 50 | 50 | 50 | 60 | 65 |
0 | 75 | 100 | 55 | 50 | 40 | 80 | |
200 | 30 | 50 | 108 | 500 | 100 | 45 | |
40 | 45 | 50 | 40 | 40 | 100 | 80 |
Convert these data from scale to ordinal. (Don’t forget to put them in order first.) What happens to an outlier when you convert these data to ordinal?
What approximate shape would the distribution of these data take? Would they likely be normally distributed? Explain why the distribution of ordinal data is never normal.
Why does it not matter if the ordinal variable is normally distributed? (Hint: Think about what kind of hypothesis test you would conduct.)
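The effect of ranking on an outlier can be seen by converting the cell phone bills above to ranks. This is a minimal sketch, assuming ranks start at 1 for the smallest bill and tied bills share the mean of the ranks they occupy; the function name `average_ranks` is hypothetical.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean of their positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1
        last = first + ordered.count(v) - 1
        ranks.append((first + last) / 2)
    return ranks


# Monthly cell phone bills, in dollars, from the table above.
bills = [100, 60, 35, 50, 50, 50, 60, 65,
         0, 75, 100, 55, 50, 40, 80,
         200, 30, 50, 108, 500, 100, 45,
         40, 45, 50, 40, 40, 100, 80]

ranks = average_ranks(bills)
rank_of = dict(zip(bills, ranks))   # tied bills map to the same rank, so a dict is safe

# The $500 bill is $300 above the next highest bill ($200),
# but after ranking the two are only one rank apart.
print("Rank of $500:", rank_of[500])
print("Rank of $200:", rank_of[200])
```

Sorting `zip(bills, ranks)` prints the full ordered conversion, which can be compared with a hand-ranked list.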
World cities, livability, and nonparametric hypothesis tests: CNN.com reported on a 2012 study that ranked the world’s cities in terms of how livable they are (https:/
Which cities tend to receive higher rankings—
Did the top 10 cities tend to change rank relative to their position in the previous study?
Are the livability rankings related to a city’s economic status?
On which continent do cities tend to have the highest rankings?
Fantasy baseball and the Spearman correlation coefficient: In fantasy baseball, groups of 12 league participants conduct a draft in which they can “buy” any baseball players from any teams across one of the two Major League Baseball (MLB) leagues (the American League and the National League). These makeshift teams are compared on the basis of the combined statistics of the individual baseball players. For example, statistics about home runs are transformed into points, and each fantasy team receives a total score of all combined points based on its baseball players, regardless of their real-
Team | Fantasy League Points | Actual American League Finish |
---|---|---|
Boston | 117.5 | 2 |
New York | 109.5 | 1 |
Anaheim | 108 | 3.5 |
Minnesota | 97 | 3.5 |
Texas | 85 | 6 |
Chicago | 80 | 7 |
Cleveland | 79 | 8 |
Oakland | 77 | 5 |
Baltimore | 74.5 | 9 |
Detroit | 68.5 | 10 |
Seattle | 51 | 13 |
Tampa Bay | 47.5 | 11 |
Toronto | 35.5 | 12 |
Kansas City | 20 | 14 |
What are the two variables of interest? For each variable, state whether it’s scale or ordinal.
Calculate the Spearman correlation coefficient for these two variables. Remember to convert any scale variables to ranks.
What does the coefficient tell us about the relation between these two variables?
Why couldn’t we calculate a Pearson correlation coefficient for these data?
Test-
98 74 87 92 88 93 62 67
What are the two variables of interest? For each variable, state whether it’s scale or ordinal.
Calculate the Spearman correlation coefficient for these two variables. Remember to convert any scale variables to ranks.
What does the coefficient tell us about the relation between these two variables?
Why couldn’t we calculate a Pearson correlation coefficient for these data?
Does this Spearman correlation coefficient suggest that students should take their tests as quickly as possible? That is, does it indicate that taking the test quickly causes a good grade? Explain your answer.
What third variables might be responsible for this correlation? That is, what third variables might cause both speedy test taking and a good test grade?
Test-
1.00
−0.001
0.52
−0.27
−0.98
0.09
Hockey wins and the Wilcoxon signed-
Team | 1995– | 2005– |
---|---|---|
Calgary Flames | 34 | 46 |
Edmonton Oilers | 30 | 41 |
Montréal Canadiens | 40 | 42 |
Ottawa Senators | 18 | 52 |
Toronto Maple Leafs | 34 | 41 |
Vancouver Canucks | 32 | 42 |
What is the independent variable and what are its levels? What is the dependent variable?
Is this a between-groups or a within-groups design?
Why might it be preferable to use a nonparametric hypothesis test for these data?
Conduct all six steps of hypothesis testing for a Wilcoxon signed-
How would you present these statistics in a journal article?
Online plane tickets and the Wilcoxon signed-
Itinerary | CheapTickets | Expedia |
---|---|---|
Athens, GA, to Johannesburg, South Africa | $2403 | $2580 |
Chicago to Chennai, India | 1884 | 2044 |
Columbus, OH, to Belgrade, Serbia | 1259 | 1436 |
Denver to Geneva, Switzerland | 1392 | 1412 |
Montréal to Dublin, Ireland | 1097 | 1152 |
New York City to Reykjavik, Iceland | 935 | 931 |
San Antonio to Hong Kong | 1407 | 1400 |
Toronto to Istanbul, Turkey | 1261 | 1429 |
Tulsa to Guadalajara, Mexico | 565 | 507 |
Vancouver to Melbourne, Australia | 1621 | 1613 |
What is the independent variable and what are its levels? What is the dependent variable?
Is this a between-groups or a within-groups design?
Conduct all six steps of hypothesis testing for a Wilcoxon signed-
How would you present these statistics in a journal article?
Public versus private universities and the Mann–
University | Rank | Type of School | Public Rank | Private Rank |
---|---|---|---|---|
Princeton University | 2 | Private | | 2 |
University of California, Berkeley | 2 | Public | 2 | |
University of Wisconsin, Madison | 2 | Public | 2 | |
Stanford University | 4.5 | Private | | 4.5 |
University of Michigan, Ann Arbor | 4.5 | Public | 4.5 | |
Harvard University | 7 | Private | | 7 |
University of Chicago | 7 | Private | | 7 |
University of North Carolina, Chapel Hill | 7 | Public | 7 | |
University of California, Los Angeles | 9 | Public | 9 | |
Northwestern University | 10.5 | Private | | 10.5 |
University of Pennsylvania | 10.5 | Private | | 10.5 |
Columbia University | 12.5 | Private | | 12.5 |
Indiana University, Bloomington | 12.5 | Public | 12.5 | |
Duke University | 14.5 | Private | | 14.5 |
University of Texas, Austin | 14.5 | Public | 14.5 | |
New York University | 16 | Private | | 16 |
Cornell University | 18 | Private | | 18 |
Ohio State University | 18 | Public | 18 | |
Pennsylvania State University, University Park | 18 | Public | 18 | |
What is the independent variable, and what are its levels? What is the dependent variable?
Is this a between-groups or a within-groups design?
Why do we have to use a nonparametric hypothesis test for these data?
Conduct all six steps of hypothesis testing for a Mann–
How would you present these statistics in a journal article?
Gender, aggression, and the interpretation of a Mann–
What is the independent variable, and what are its levels? What is the dependent variable?
Is this a between-groups or a within-groups design?
Which hypothesis test did the researchers conduct? Why might they have chosen a nonparametric test? Why do you think they chose this particular nonparametric test?
In your own words, describe what they found.
Can we conclude that gender caused a difference in levels of aggression in dreams? Explain. Provide at least two reasons why gender might not cause certain levels of aggression in dreams even though these variables are associated.
Cell phone bill, hours studied, and the shapes of distributions: The following figures display data that depict the relation between students’ monthly cell phone bills and the number of hours they report that they study per week.
What does the accompanying scatterplot suggest about the shape of the distribution for hours studied per week? What does it suggest about the shape of the distribution for monthly cell phone bill?
What does the accompanying grouped frequency histogram suggest about the shape of the distribution for monthly cell phone bill?
Is it a good idea to use a parametric hypothesis test for these data? Explain.
Angelman syndrome and bootstrapping: Angelman syndrome is a rare genetic disease in which children are delayed developmentally and exhibit unusual symptoms such as inappropriate and prolonged laughter, difficulty in speaking or inability to speak, and seizures. Imagine that a researcher obtained vocabulary data for six children with Angelman syndrome and wants to develop an estimate of the mean vocabulary score of the population of children with Angelman syndrome. (Although those with Angelman syndrome often cannot speak, they are usually able to understand at least some simple language and they may learn to communicate with sign language.) The General Social Survey (GSS), using a multiple-
Put the six pieces of paper in a bowl or hat, and then pull six out, one at a time, replacing each one and mixing them up before pulling the next. List the numbers and take the mean. Repeat this procedure two more times so that you have three lists and three means.
We did this 20 times and got the following 20 means:
Determine the 90% confidence interval for these means. (Hint: Arrange them in order and then choose the middle 90% of scores.) Remember, were we really to bootstrap the data, we would have a computer do it because 20 means is far too few.
Why is bootstrapping a helpful technique in this particular situation?
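The paper-and-bowl procedure above is what a computer does when it bootstraps: it resamples the original scores with replacement many times and then takes the middle portion of the resulting means. The sketch below uses hypothetical vocabulary scores (the actual six scores are not reproduced here), a fixed random seed so the run is repeatable, and hypothetical function names `bootstrap_means` and `percentile_interval`.

```python
import random


def bootstrap_means(sample, n_resamples, rng):
    """Draw n_resamples samples of len(sample) with replacement; return their means."""
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(sample) for _ in sample]   # pull one slip, put it back
        means.append(sum(resample) / len(resample))
    return means


def percentile_interval(means, level_pct=90):
    """Keep the middle level_pct percent of the ordered means; return its endpoints."""
    ordered = sorted(means)
    drop = len(ordered) * (100 - level_pct) // 200   # how many to trim from each tail
    return ordered[drop], ordered[len(ordered) - 1 - drop]


# HYPOTHETICAL scores for six children; replace with the real data.
scores = [2, 3, 3, 5, 6, 8]

rng = random.Random(0)                         # fixed seed, an arbitrary choice
means = bootstrap_means(scores, 10_000, rng)   # far more than 20 resamples
low, high = percentile_interval(means, 90)
print("90% bootstrap interval for the mean:", low, "to", high)
```

With only 20 means, `percentile_interval` trims one mean from each end, which matches the hint about choosing the middle 90% of scores.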
Putting It All Together
“Smart” states and the Kruskal–
We marked states in the Northeast with an NE, in the Midwest with an MW, and in the South with an S. Do these regions tend to have different rankings from one another?
What is the independent variable and what are its levels? What is the dependent variable?
Is this a between-groups or a within-groups design?
Why do we have to use a nonparametric hypothesis test for these data?
Conduct all six steps of hypothesis testing for a Kruskal–
How would you present these statistics in a journal article?
Explain why a statistically significant Kruskal–
What test would you conduct if you had scale scores for the 50 states? Explain your answer.
If you were interested in the validity of the categorizing of states as “smart,” how could you determine this? What criteria might you use? Be specific.
Stroke patients, treatment, and type of nonparametric test: Researchers who work with special populations, such as neurologically impaired people or people with less common psychiatric conditions, often have small sample sizes because relatively few patients are available. As a result, these researchers often turn to nonparametric statistical tests. For each of the following research descriptions, state which nonparametric hypothesis test is most appropriate: the Spearman rank-
People who have had a stroke often have whole or partial paralysis on the side of their body opposite the side of the brain damage. Leung, Ng, and Fong (2009) were interested in the effects of a treatment program for constrained movement on the recovery from paralysis. They compared the arm-
Leung and colleagues (2009) were also interested in whether the amount of improvement after the therapy was related to the number of months that had passed since the patient experienced the stroke.
Five of Leung and colleagues’ (2009) patients were male and three were female. We could ask whether post-
For parts (a), (b), and (c), what test would you use if you had 180 patients instead of 8? Explain your answers.
For parts (a), (b), and (c), what was the dependent variable in each case, and how might the researchers have operationalized this variable?
The researchers used a treatment program to help patients recover from a stroke. If the researchers had enough patients to randomly assign some to treatment and others to no treatment, would they have been able to have either a blind or a double-