Chapter 17

  • 17.1 Nominal data are categorical; they cannot be ordered in any meaningful way and are often thought of simply as names. Ordinal data can be ordered, but we cannot assume that ranks the same number of places apart are the same distance apart. For example, the difference between the second and third scores may not be the same as the difference between the seventh and the eighth. Scale data are measured at either the interval or the ratio level; we can assume equal intervals between points along these measures.
  • 17.3 The chi-square test for goodness of fit is a nonparametric hypothesis test used with one nominal variable. The chi-square test for independence is a nonparametric test used with two nominal variables.
  • 17.5 Throughout the book, we have referred to independent variables, those variables that we hypothesize to have an effect on the dependent variable. We also described how statisticians refer to observations that are independent of one another, such as a between-groups research design requiring that observations be taken from independent samples. Here, with regard to chi square, independence takes on a similar meaning. We are testing whether the effect of one variable is independent of the other—that the proportion of cases across the levels of one variable does not depend on the levels of the other variable.
  • 17.7 In most previous hypothesis tests, the degrees of freedom have been based on sample size. For the chi-square hypothesis tests, however, the degrees of freedom are based on the numbers of categories, or cells, in which participants can be counted. For example, the degrees of freedom for the chi-square test for goodness of fit is the number of categories minus 1: dfχ2 = k − 1. Here, k is the symbol for the number of categories.
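The two degrees-of-freedom rules used in this chapter (k − 1 for goodness of fit, and (krow − 1)(kcolumn − 1) for independence, as in Exercise 17.41) can be captured in a minimal Python sketch. The function names are hypothetical and chosen only for illustration.

```python
def df_goodness_of_fit(k: int) -> int:
    """Degrees of freedom for the chi-square test for goodness of fit: k - 1."""
    if k < 2:
        raise ValueError("need at least two categories")
    return k - 1


def df_independence(k_row: int, k_column: int) -> int:
    """Degrees of freedom for the chi-square test for independence:
    (k_row - 1)(k_column - 1)."""
    if k_row < 2 or k_column < 2:
        raise ValueError("each variable needs at least two levels")
    return (k_row - 1) * (k_column - 1)


print(df_goodness_of_fit(2))   # op-ed example (men, women): 1
print(df_goodness_of_fit(4))   # four categories: 3
print(df_independence(2, 3))   # gender x diagnosis example: 2
```

Note that neither rule depends on the number of participants, only on the number of categories.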
  • 17.9 The contingency table presents the observed frequencies for each cell in the study.
  • 17.11 This is the formula to calculate the chi-square statistic: $\chi^2 = \sum \left[\frac{(O - E)^2}{E}\right]$. The symbols represent the sum, over all cells, of the squared difference between each observed frequency and its matching expected frequency, divided by the expected frequency for that cell.
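As an illustration, the sketch below applies the formula to the observed counts from Exercise 17.41, computing each expected frequency as (row total)(column total)/N. It is a minimal sketch with hypothetical function names; it uses the fact that, for exactly 2 degrees of freedom, the chi-square tail probability is exp(−χ²/2). Because it does not round the column proportions, it gives about 24.96 rather than the hand-computed 24.98.

```python
import math


def expected_frequencies(table):
    """Expected count for each cell: (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed, expected):
    """Sum over every cell of (O - E)^2 / E."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Observed counts from Exercise 17.41 (rows: boys, girls; columns:
# medical problem, no problem/below norm, no problem/normal height).
observed = [[27, 86, 69], [39, 38, 19]]
expected = expected_frequencies(observed)
chi2 = chi_square_statistic(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1)(3 - 1) = 2
# Valid only for df == 2: P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2)
print(round(chi2, 2), df, p_value < 0.05)  # 24.96 2 True
```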
  • 17.13 Relative likelihood indicates the relative chance of an outcome (i.e., how many times more likely the outcome is, given the group membership of an observation). For example, we might determine the relative likelihood that a person would be a victim of bullying, given that the person is a boy versus a girl.
  • 17.15 Relative likelihood and relative risk are exactly the same measure, but relative likelihood is typically called relative risk when it comes to health and medical situations because it describes a person’s risk for a disease or health outcome.
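Relative likelihood (or relative risk) is a ratio of two conditional proportions. A minimal sketch, using the counts for children below normal height from Exercise 17.41(h); the function name is hypothetical.

```python
def relative_likelihood(hits_a, total_a, hits_b, total_b):
    """P(outcome | group A) divided by P(outcome | group B)."""
    return (hits_a / total_a) / (hits_b / total_b)


# Exercise 17.41(h): children below normal height only.
boys_hits, boys_total = 27, 27 + 86    # medical problem / all boys below norm
girls_hits, girls_total = 39, 39 + 38  # medical problem / all girls below norm

boys_vs_girls = relative_likelihood(boys_hits, boys_total, girls_hits, girls_total)
girls_vs_boys = relative_likelihood(girls_hits, girls_total, boys_hits, boys_total)
print(round(boys_vs_girls, 2), round(girls_vs_boys, 2))  # 0.47 2.12
```

Swapping which group goes in the denominator gives the reciprocal, which is why the two relative risks in Exercise 17.41 carry complementary information.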
  • 17.17 The most useful graph for displaying the results of a chi-square test of independence is a bar graph that uses the conditional proportions rather than the frequencies, thus allowing us to compare the rates across the various levels of each variable.
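The bar heights in such a graph are conditional proportions: each cell divided by its row total. A minimal sketch using the observed counts from Exercise 17.41 (function name hypothetical); no plotting library is assumed.

```python
def conditional_proportions(table):
    """Divide each cell by its row total, so each row sums to 1."""
    return [[cell / sum(row) for cell in row] for row in table]


# Exercise 17.41 counts (rows: boys, girls; columns: medical problem,
# no problem/below norm, no problem/normal height).
observed = [[27, 86, 69], [39, 38, 19]]
props = conditional_proportions(observed)
for label, row in zip(["Boys", "Girls"], props):
    # These values would be the heights of the bars for each group.
    print(label, [round(p, 3) for p in row])
```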
  • 17.19 If a researcher obtains a significant chi-square value and one of the variables has more than two levels, the researcher can determine which cells of the table differ from expectations by comparing the adjusted standardized residual for each cell to a criterion. Many researchers adopt a criterion of 2: if the absolute value of a cell’s adjusted standardized residual is greater than 2, the observed frequency for that cell differs significantly from its expected frequency.
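A minimal sketch of the residual check, assuming the commonly used definition (O − E)/√(E(1 − row total/N)(1 − column total/N)); the exact formula a given software package uses is an assumption here, and the function name is hypothetical. Applied to the Exercise 17.41 counts, it reproduces the −4.8 reported there for boys with a medical problem.

```python
import math


def adjusted_standardized_residuals(table):
    """(O - E) / sqrt(E * (1 - row_total/N) * (1 - col_total/N)) for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            se = math.sqrt(e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            out_row.append((o - e) / se)
        residuals.append(out_row)
    return residuals


observed = [
    [27, 86, 69],  # boys: medical problem, no problem/below norm, no problem/normal height
    [39, 38, 19],  # girls
]
residuals = adjusted_standardized_residuals(observed)
for label, row in zip(["Boys", "Girls"], residuals):
    # Flag cells whose absolute residual exceeds the criterion of 2 (or 3).
    flags = ["beyond 3" if abs(r) > 3 else "beyond 2" if abs(r) > 2 else "" for r in row]
    print(label, [round(r, 2) for r in row], flags)
```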
  • 17.21
    • a. The independent variable is gender, which is nominal (men or women). The dependent variable is number of loads of laundry, which is scale.
    • b. The independent variable is need for approval, which is ordinal (rank). The dependent variable is miles on a car, which is scale.
    • c. The independent variable is place of residence, which is nominal (on or off campus). The dependent variable is whether the student is an active member of a club, which is also nominal (active or not active).
  • 17.23
    • a. dfχ2 = k − 1 = 4 − 1 = 3
    • b.
    • c.
  • 17.25 The conditional probability of being a smoker, given that a person is female is , and the conditional probability of being a smoker, given that a person is male is . The relative likelihood of being a smoker given that one is female rather than male is . These Turkish women with lung cancer were less than one-tenth as likely to be smokers as were the male lung cancer patients.
  • 17.27
    • a. The first variable is gender, which is nominal (male or female). The second variable is salary negotiation, which also is nominal (wage not explicitly negotiable or wage explicitly negotiable).
    • b. A chi-square test for independence would be appropriate because both variables are nominal.
    • c. The researchers found that both genders seemed to be more likely to negotiate when the ad stated that the wage was negotiable than when that was not stated; however, when the job posting stated that the wage was negotiable, women seemed to be somewhat more likely than men to negotiate, whereas, when wage was not explicitly mentioned as negotiable in the job posting, men seemed to be more likely than women to negotiate.
  • 17.29
    • a. A nonparametric test would be appropriate because both of the variables are nominal: gender and major.
    • b. A nonparametric test is more appropriate because the sample size is small and the data are unlikely to be normal; the “top boss” is likely to have a much higher income than the other employees. This outlier would lead to a nonnormal distribution.
    • c. A parametric test would be appropriate because the independent variable (type of student: athlete versus nonathlete) is nominal and the dependent variable (grade point average) is scale.
    • d. A nonparametric test would be appropriate because the independent variable (athlete versus nonathlete) is nominal and the dependent variable (class rank) is ordinal.
    • e. A nonparametric test would be appropriate because the research question is about the relation between two nominal variables: seat-belt wearing and degree of injuries.
    • f. A parametric test would be appropriate because the independent variable (seat-belt use: no seat belt versus seat belt) is nominal and the dependent variable (speed) is scale.
  • 17.31
    • a. (i) Year. (ii) Grades received. (iii) This is a category III research design because the independent variable, year, is nominal and the dependent variable, grade (A or not), could also be considered nominal.
    • b. (i) Type of school. (ii) Average GPA of graduating students. (iii) This is a category II research design because the independent variable, type of school, is nominal and the dependent variable, GPA, is scale.
    • c. (i) SAT scores of incoming students. (ii) College GPA. (iii) This is a category I research design because both the independent variable and the dependent variable are scale.
  • 17.33
    • a.
      MEXICAN WHITE BLACK
      MARRIED      
      SINGLE      
    • b.
      MARRIED HEAD OF HOUSEHOLD
      IMMIGRANT NEIGHBORHOOD NONIMMIGRANT NEIGHBORHOOD
      COMMITTED CRIME    
      NO CRIME    
      UNMARRIED HEAD OF HOUSEHOLD
      IMMIGRANT NEIGHBORHOOD NONIMMIGRANT NEIGHBORHOOD
      COMMITTED CRIME    
      NO CRIME    
    • c.
      FIRST GENERATION SECOND GENERATION THIRD GENERATION
      COMMITTED CRIME      
      NO CRIME      
  • 17.35
    • a. There is one variable, the gender of the op-ed writers. Its levels are men and women.
    • b. A chi-square test for goodness of fit would be used because we have data on a single nominal variable from one sample.
    • c. Step 1: Population 1 is op-ed contributors, in proportions of males and females that are like those in our sample. Population 2 is op-ed contributors, in proportions of males and females that are like those in the general population. The comparison distribution is a chi-square distribution. The hypothesis test will be a chi-square test for goodness of fit because we have only one nominal variable. This study meets three of the four assumptions. (1) The variable under study is nominal. (2) Each observation is independent of all the others. (3) There are more than five times as many participants as there are cells (there are 124 op-ed articles and only 2 cells). (4) This is not, however, a randomly selected sample of op-eds, so we must generalize with caution; specifically, we should not generalize beyond the New York Times.
      Step 2: Null hypothesis: The proportions of male and female op-ed contributors are the same as those in the population as a whole.
      Research hypothesis: The proportions of male and female op-ed contributors are different from those in the population as a whole.
      Step 3: The comparison distribution is a chi-square distribution with 1 degree of freedom: dfχ2 = 2 − 1 = 1.
      Step 4: The critical χ2, based on a p level of 0.05 and 1 degree of freedom, is 3.841.
      Step 5:
      OBSERVED (FREQUENCIES OF MEN AND WOMEN)
          MEN   WOMEN
          103   21
      EXPECTED (BASED ON THE GENERAL POPULATION)
          MEN   WOMEN
          62    62
      CATEGORY   OBSERVED (O)   EXPECTED (E)   O − E   (O − E)²   (O − E)²/E
          Men    103            62             41      1681       27.113
          Women  21             62             −41     1681       27.113
      χ2 = Σ[(O − E)²/E] = 27.113 + 27.113 = 54.226
      Step 6: Reject the null hypothesis. The calculated chi-square statistic exceeds the critical value. It appears that the proportions of op-eds written by men and by women are not the same as the proportions of men and women in the population. Specifically, women wrote a smaller share of the op-eds than their share of the general population.
    • d. χ2(1, N = 124) = 54.23, p < 0.05
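The goodness-of-fit steps above can be checked with a short sketch. The expected counts of 62 and 62 in Step 5 correspond to assuming a 50/50 split of men and women in the general population; that split is the only assumption here, and the function name is hypothetical. For 1 degree of freedom, a chi-square value is the square of a z score, so its tail probability is erfc(√(χ²/2)).

```python
import math


def goodness_of_fit(observed, null_proportions):
    """Chi-square goodness-of-fit test for a variable with two categories.

    Returns expected counts, the chi-square statistic, df, and the p-value.
    The p-value formula below is valid only for df = 1 (two categories).
    """
    if len(observed) != 2 or len(null_proportions) != 2:
        raise ValueError("this sketch handles exactly two categories (df = 1)")
    n = sum(observed)
    expected = [n * p for p in null_proportions]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    # chi-square with 1 df is z squared, so P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, df, p_value


# Op-ed counts (men, women) with an assumed 50/50 null split.
expected, chi2, df, p_value = goodness_of_fit([103, 21], [0.5, 0.5])
print(expected)                      # [62.0, 62.0]
print(round(chi2, 2), df)            # 54.23 1
print(chi2 > 3.841, p_value < 0.05)  # True True
```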
  • 17.37
    • a. The accompanying table shows the conditional proportions.
      CONDITIONAL PROPORTIONS     EXCITING   ROUTINE   DULL    TOTAL
      SAME CITY                   0.424      0.521     0.055   1.00
      SAME STATE/DIFFERENT CITY   0.468      0.485     0.047   1.00
      DIFFERENT STATE             0.502      0.451     0.047   1.00
    • b. The accompanying graph shows these conditional proportions.
    • c. The relative likelihood of finding life exciting if one lives in a different state as opposed to the same city is 0.502/0.424 = 1.18.
  • 17.39
    • a. There are two nominal variables—premarital doubts (yes or no) and divorced by 4 years (yes or no).
    • b. Chi-square tests for independence were used because there were two nominal variables. These tests were conducted for husbands and wives separately.
    • c. n should be reported as N. The specific p values for each hypothesis test should be provided. An effect size—Cramer’s V in these cases—should be reported for each hypothesis test.
    • d. The researchers could not conclude that the likelihood of husbands being divorced by 4 years was dependent on premarital doubts. However, premarital doubts did seem to be related to being divorced by 4 years for wives.
  • 17.41
    • a. There are two variables in this study. The independent variable is the referred child’s gender (boy, girl) and the dependent variable is the diagnosis (problem, no problem but below norms, no problem and normal height).
    • b. A chi-square test for independence would be used because we have data on two nominal variables.
    • c. Step 1: Population 1 is referred children like those in this sample. Population 2 is referred children from a population in which growth problems do not depend on the child’s gender. The comparison distribution is a chi-square distribution. The hypothesis test will be a chi-square test for independence because we have two nominal variables. This study meets three of the four assumptions. (1) The two variables are nominal. (2) Every participant is in only one cell. (3) There are more than five times as many participants as there are cells (there are 278 participants and 6 cells). (4) The sample, however, was not randomly selected, so we must use caution when generalizing.
      Step 2: Null hypothesis: The proportion of boys in each diagnostic category is the same as the proportion of girls in each category.
      Research hypothesis: The proportion of boys in each diagnostic category is different from the proportion of girls in each category.
      Step 3: The comparison distribution is a chi-square distribution that has 2 degrees of freedom: dfχ2 = (krow − 1)(kcolumn − 1) = (2 − 1)(3 − 1) = 2.
      Step 4: The critical χ2, based on a p level of 0.05 and 2 degrees of freedom, is 5.99.
      Step 5:
      OBSERVED   MEDICAL PROBLEM   NO PROBLEM/BELOW NORM   NO PROBLEM/NORMAL HEIGHT   TOTAL
      BOYS       27                86                      69                         182
      GIRLS      39                38                      19                         96
      TOTAL      66                124                     88                         278

      EXPECTED   MEDICAL PROBLEM   NO PROBLEM/BELOW NORM   NO PROBLEM/NORMAL HEIGHT   TOTAL
      BOYS       43.134            81.172                  57.694                     182
      GIRLS      22.752            42.816                  30.432                     96
      TOTAL      65.886            123.988                 88.126                     278
      (Each expected frequency is the row total multiplied by the rounded column proportion 0.237, 0.446, or 0.317, so the expected column totals differ slightly from the observed totals.)

      CATEGORY              OBSERVED (O)   EXPECTED (E)   O − E     (O − E)²   (O − E)²/E
      Boy; med prob         27             43.134         −16.134   260.306    6.035
      Boy; no prob/below    86             81.172         4.828     23.310     0.287
      Boy; no prob/norm     69             57.694         11.306    127.826    2.216
      Girl; med prob        39             22.752         16.248    263.998    11.603
      Girl; no prob/below   38             42.816         −4.816    23.194     0.542
      Girl; no prob/norm    19             30.432         −11.432   130.691    4.295
      χ2 = Σ[(O − E)²/E] = 6.035 + 0.287 + 2.216 + 11.603 + 0.542 + 4.295 = 24.978
      Step 6: Reject the null hypothesis. The calculated chi-square value exceeds the critical value. It appears that the proportion of boys in each diagnostic category is not the same as the proportion of girls in each category.
    • d. Cramer’s $V = \sqrt{\frac{\chi^2}{(N)(df_{row/column})}} = \sqrt{\frac{24.98}{(278)(1)}} = 0.30$. According to Cohen’s conventions, this is a small-to-medium effect size.
    • e. χ2(2, N = 278) = 24.98, p < 0.05, Cramer’s V = 0.30
    • f. The accompanying table shows the conditional proportions.
      CONDITIONAL PROPORTIONS   MEDICAL PROBLEM   NO PROBLEM/BELOW NORM   NO PROBLEM/NORMAL HEIGHT   TOTAL
      BOYS                      0.148             0.473                   0.379                      1.00
      GIRLS                     0.406             0.396                   0.198                      1.00
    • g. The accompanying graph shows all six conditions.
    • h. Of the 113 boys below normal height, 27 were diagnosed with a medical problem. Of the 77 girls below normal height, 39 were diagnosed with a medical problem. The conditional proportion for boys is 0.239 and for girls is 0.506. This makes the relative risk for having a medical condition, given that one is a boy as opposed to a girl, 0.239/0.506 = 0.47.
    • i. Boys below normal height are about half as likely to have a medical condition as are girls below normal height.
    • j. The relative risk for having a medical condition, given that one is a girl as opposed to a boy, is 0.506/0.239 = 2.12.
    • k. Girls below normal height are about twice as likely to have a medical condition as are boys below normal height.
    • l. The two relative risks give us complementary information. Saying that boys are half as likely to have a medical condition implies that girls are twice as likely to have a medical condition.
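    • m. The observed frequency is 27. The expected frequency is 43.2, from (182)(66)/278; the expected-frequency table in part c shows 43.134 because it multiplied 182 by the rounded proportion 0.237.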
    • n. The adjusted standardized residual for boys with a medical problem is −4.8. This number indicates that the observed frequency for that cell is 4.8 standard errors below its expected frequency. The cells for boys and for girls with an underlying medical condition exceed the criterion of 2 in absolute value, as do the cells for both boys and girls who are of normal height. Using the stricter criterion of 3 does not change the conclusion regarding which cells have observed frequencies significantly different from expected.
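The effect size reported in Exercise 17.41(d) and (e) can be reproduced with a minimal sketch of Cramer’s V, where dfrow/column is the degrees of freedom for whichever variable has fewer categories. The function name is hypothetical; the inputs (χ2 = 24.98, N = 278, a 2 × 3 table) come from the exercise.

```python
import math


def cramers_v(chi2: float, n: int, k_row: int, k_column: int) -> float:
    """Cramer's V = sqrt(chi2 / (N * df_row/column)), where df_row/column is
    the degrees of freedom of the variable with fewer categories."""
    df_smaller = min(k_row, k_column) - 1
    return math.sqrt(chi2 / (n * df_smaller))


print(round(cramers_v(24.98, 278, 2, 3), 2))  # 0.3
```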