15.2 Chi-Square Tests

  • The chi-square test for goodness of fit is a nonparametric hypothesis test that is used when there is one nominal variable.

  • The chi-square test for independence is a nonparametric hypothesis test that is used when there are two nominal variables.

Hot-hand research has moved from individual performance to team performance. For example, after opening the 2013 season by hanging on to win a wild game against Buffalo, Coach Urban Meyer of the Ohio State Buckeyes tried to explain: “Momentum is an amazing thing in college football. The more mature your team is, momentum’s only about seven points. . . . We were up 22–0, I believe, when it turned upside-down” (Maks, 2013). But was Coach Meyer just seeing Elvis in a potato chip when he perceived a shift in his team’s momentum? It is a testable idea, and this chapter describes what is being tested in two common kinds of chi-square statistical tests: (1) the chi-square test for goodness of fit, a nonparametric hypothesis test that is used when there is one nominal variable; and (2) the chi-square test for independence, a nonparametric hypothesis test that is used when there are two nominal variables. Both chi-square tests involve the by-now familiar six steps of hypothesis testing.


MASTERING THE CONCEPT

15-2: When we only have nominal variables, we use the chi-square statistic. Specifically, we use a chi-square test for goodness of fit when we have one nominal variable, and we use a chi-square test for independence when we have two nominal variables.

Both chi-square tests use the chi-square statistic: χ2. The chi-square statistic is based on the chi-square distribution. As with t and F distributions, there are also several chi-square distributions, depending on the degrees of freedom. After we introduce chi-square tests, we’ll introduce several ways of determining the size of a finding—by calculating an effect size, graphing the finding, or determining relative risk.

Chi-Square Test for Goodness of Fit

The chi-square test for goodness of fit calculates a statistic based on just one variable. There is no independent variable or dependent variable, just one categorical variable with two or more categories into which participants are placed. In fact, the chi-square test for goodness of fit received its name because it measures how good the fit is between the observed data in the various categories of a single nominal variable and the data we would expect according to the null hypothesis. If there’s a really good fit with the null hypothesis, then we cannot reject the null hypothesis. If we hope to receive empirical support for the research hypothesis, then we’re actually hoping for a bad fit between the observed data and what we expect according to the null hypothesis.

EXAMPLE 15.1

For example, researchers reported that the best youth soccer players in the world were more likely to have been born early in the year than later (Dubner & Levitt, 2006a). As one example, they reported that 52 elite youth players in Germany were born in January, February, or March, whereas only 4 players were born in October, November, or December. (Those born in other months were not included in this study.)

The null hypothesis predicts that when a person was born will not make any difference; the research hypothesis predicts that the month a person was born will matter when it comes to being an elite soccer player. Assuming that births in the general population are evenly distributed across months of the year, the null hypothesis posits that equal numbers of elite soccer players were born in the first 3 months and the last 3 months of the year. With 56 participants in the study (52 born in the first 3 months and 4 in the last 3 months), equal frequencies lead us to expect 28 players to have been born in the first 3 months and 28 in the last 3 months just by chance. The birth months don’t appear to be evenly distributed, but is this a real pattern, or just chance?

image
Are Elite Soccer Players Born in the Early Months of the Year? Based on data for elite German youth soccer players, a chi-square test for goodness of fit showed a statistically significant effect: Players were more likely to be born in the first 3 months than in the last 3 months of the year (Dubner & Levitt, 2006a).
© Gero Breloer/dpa/Corbis

Like previous hypothesis tests, the chi-square goodness of fit test uses the six steps of hypothesis testing.

STEP 1: Identify the populations, distribution, and assumptions.

There are always two populations involved in a chi-square test: one population that matches the frequencies of participants like those we observed and another population that matches the frequencies of participants like those we would expect according to the null hypothesis. In this case, there is a population of elite German youth soccer players with birth dates like those we observed and a population of elite German youth soccer players with birth dates like those in the general population. The comparison distribution is a chi-square distribution. There’s just one nominal variable, birth months, so we’ll conduct a chi-square test for goodness of fit.


The first assumption is that the variable (birth month) is nominal. The second assumption is that each observation is independent; no single participant can be in more than one category. The third assumption is that participants were randomly selected. If not, it may be unwise to confidently generalize beyond the sample. A fourth assumption is that there is a minimum number of expected participants in every category (also called a cell)—at least 5 and preferably more. An alternative guideline (Delucchi, 1983) is for there to be at least five times as many participants as cells. In any case, the chi-square tests seem robust to violations of this last assumption.

Summary: Population 1: Elite German youth soccer players with birth dates like those we observed. Population 2: Elite German youth soccer players with birth dates like those in the general population.

The comparison distribution is a chi-square distribution. The hypothesis test will be a chi-square test for goodness of fit because we have one nominal variable only, birth months. This study meets three of the four assumptions: (1) The one variable is nominal. (2) Every participant is in only one cell (you can’t be born in both January and November). (3) This is not a randomly selected sample of all elite soccer players. The sample includes only German youth soccer players in the elite leagues. We must be cautious in generalizing beyond young German elite players. (4) There are more than five times as many participants as cells (the table has two cells, and 2 × 5 = 10). We have 56 participants, far more than the 10 necessary to meet this guideline.

STEP 2: State the null and research hypotheses.

For chi-square tests, it’s easiest to state the hypotheses in words only, rather than in both words and symbols.

Summary: Null hypothesis: Elite German youth soccer players have the same pattern of birth months as those in the general population. Research hypothesis: Elite German youth soccer players have a different pattern of birth months than those in the general population.

STEP 3: Determine the characteristics of the comparison distribution.

Our only task at this step is to determine the degrees of freedom. In most previous hypothesis tests, the degrees of freedom have been based on sample size. For the chi-square hypothesis tests, however, the degrees of freedom are based on the numbers of categories, or cells, in which participants can be counted. The degrees of freedom for a chi-square test for goodness of fit is the number of categories minus 1:

dfχ2 = k – 1

MASTERING THE FORMULA

15-1: We calculate the degrees of freedom for the chi-square test for goodness of fit by subtracting 1 from the number of categories, represented in the formula by k. The formula is:

dfχ2 = k – 1

Here, k is the symbol for the number of categories. The current example has only two categories: Each soccer player in this study was born in either the first 3 months of the year or the last 3 months of the year:

dfχ2 = 2 – 1 = 1

Summary: The comparison distribution is a chi-square distribution, which has 1 degree of freedom: dfχ2 = 2 – 1 = 1.


STEP 4: Determine the critical values, or cutoffs.

To determine the cutoff, or critical value, for the chi-square statistic, we use the chi-square table in Appendix B. χ2 is based on squares and can never be negative, so there is just one critical value. An excerpt from Appendix B that applies to the soccer study is given in Table 15-2. We look under the p level, usually 0.05, and across from the appropriate degrees of freedom, in this case, 1. For this situation, the critical chi-square statistic is 3.841.

image

Summary: The critical χ2, based on a p level of 0.05 and 1 degree of freedom, is 3.841, as seen in the curve in Figure 15-1.

image
FIGURE 15-1
Determining the Cutoff for a Chi-Square Statistic
We look up the critical value for a chi-square statistic, based on a certain p level and degrees of freedom, in the chi-square table. Because the chi-square statistic is squared, it is never negative, so there is only one critical value.
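If statistical software is handy, the same cutoff can be recovered from the chi-square distribution itself rather than from the table. A minimal sketch in Python, assuming SciPy is installed (the critical value itself comes from Appendix B in the text):

```python
# Reproduce the chi-square table lookup for a p level of 0.05
# and 1 degree of freedom (a sketch; assumes SciPy is installed).
from scipy.stats import chi2

p_level = 0.05
df = 1

# The critical value cuts off the upper 5% of the chi-square
# distribution with 1 degree of freedom.
critical_value = chi2.ppf(1 - p_level, df)
print(round(critical_value, 3))  # 3.841
```

Because the chi-square statistic can never be negative, all 5% of the rejection region sits in the upper tail, which is why we use 1 − 0.05 here.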

STEP 5: Calculate the test statistic.

To calculate a chi-square statistic, we determine the observed frequencies and the expected frequencies, as seen in Table 15-3 and in the second and third columns of Table 15-4. The expected frequencies are determined from the information we have about the general population. In this case, we estimate that, in the general population, about half of all births (only, of course, among those born in the first or last 3 months of the year) occur in the first 3 months of the year, a proportion of 0.50:

image
image

(0.50)(56) = 28

Of the 56 elite German youth soccer players in the study, we would expect to find that 28 were born in the first 3 months of the year (versus the last 3 months of the year) if these youth soccer players are no different from the general population with respect to birth date. Similarly, we would expect a proportion of 1 − 0.50 = 0.50 of these soccer players to be born in the last 3 months of the year:


(0.50)(56) = 28

These numbers are identical only because the proportions are 0.50 and 0.50. If the proportion expected for the first 3 months of the year, based on the general population, were 0.60, then we would expect a proportion of 1 – 0.60 = 0.40 for the last 3 months of the year.

The next step in calculating the chi-square statistic is to calculate a sort of sum of squared differences. We start by determining the difference between each observed frequency and its matching expected frequency. This is usually done in columns, so we use this format even though we have only two categories. The first three columns of Table 15-4 show us the categories, observed frequencies, and expected frequencies, respectively. The fourth column, using O for observed and E for expected, displays the differences. As in the other situations, if we sum the differences, we get 0; they cancel out because some are positive and some are negative. We solve this problem as we have others—by squaring the differences, as shown in the fifth column. Next, however, we have a step that we haven’t seen before with squared differences. We divide each squared difference by the expected value for its cell, as seen in the sixth column. The numbers in the sixth column are the ones we sum.


As an example, here are the calculations for the category “first 3 months”:

O − E = (52 − 28) = 24

(O − E)2 = (24)2 = 576

(O − E)2/E = 576/28 = 20.571

Once we complete the table, the last step is easy. We just add up the numbers in the sixth column. In this case, the chi-square statistic is 20.571 + 20.571 = 41.14. We can finish the formula by adding a summation sign to the formula in the sixth column. Note that we don’t have to divide this sum by anything, as we’ve done with other statistics. We already did the dividing before we summed. This sum is the chi-square statistic. Here is the formula:

χ2 = Σ[(O − E)2/E]

Summary: χ2 = Σ[(O − E)2/E] = 20.571 + 20.571 = 41.14
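The whole goodness-of-fit calculation can be sketched in a few lines of Python, assuming NumPy and SciPy are available (the variable names here are ours, not the text's):

```python
# Chi-square goodness of fit for the soccer birth-month data:
# a sketch assuming NumPy and SciPy are installed.
import numpy as np
from scipy.stats import chisquare

observed = np.array([52, 4])        # first 3 months, last 3 months
expected = np.array([28.0, 28.0])   # (0.50)(56) = 28 in each cell

# By hand: sum of (O - E)^2 / E over the cells.
chi_square = np.sum((observed - expected) ** 2 / expected)
print(round(chi_square, 2))  # 41.14

# SciPy's built-in test gives the same statistic plus a p value.
statistic, p_value = chisquare(observed, f_exp=expected)
print(round(statistic, 2), p_value < 0.05)  # 41.14 True
```

Note that nothing is divided after the summing; each squared difference is divided by its own expected frequency before the sum, exactly as in the sixth column of Table 15-4.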

MASTERING THE FORMULA

15-2: The formula for the chi-square statistic is:

χ2 = Σ[(O − E)2/E]

For each cell, we subtract the expected count, E, from the observed count, O. Then we square each difference and divide the square by the expected count. Finally, we sum the calculations for each of the cells.

STEP 6: Make a decision.

This last step is identical to that of previous hypothesis tests. We reject the null hypothesis if the test statistic is beyond the critical value, and we fail to reject the null hypothesis if the test statistic is not beyond the critical value. In this case, the test statistic, 41.14, is far beyond the cutoff, 3.841, as seen in Figure 15-2. We reject the null hypothesis. Because there are only two categories, it’s clear where the difference lies. It appears that elite German youth soccer players are more likely to have been born in the first 3 months of the year, and less likely to have been born in the last 3 months of the year, than members of the general population. (If we had failed to reject the null hypothesis, we could only have concluded that these data did not provide sufficient evidence to show that elite German youth soccer players have a different likelihood of being born in the first, versus last, 3 months of the year than those in the general population.)

image
FIGURE 15-2
Making a Decision
As with other hypothesis tests, we make a decision with a chi-square test by comparing the test statistic to the cutoff, or critical value. We see here that 41.14 would be far to the right of 3.841.
image
Clown Therapy Israeli researchers tested whether entertainment by a clown led to higher pregnancy rates after in vitro fertilization treatment. Their study had two nominal variables—entertainment (clown, no clown) and pregnancy (pregnant, not pregnant)—and could have been analyzed with a chi-square test for independence.
Lisa F. Young/Shutterstock

Summary: Reject the null hypothesis; it appears that elite German youth soccer players are more likely to have been born in the first 3 months of the year, and less likely to have been born in the last 3 months of the year, than people in the general population.


We report these statistics in a journal article in almost the same format that we’ve seen previously. We report the degrees of freedom, the value of the test statistic, and whether the p value associated with the test statistic is less than or greater than the cutoff based on the p level of 0.05. (As usual, we would report the actual p level if we conducted this hypothesis test using software.) In addition, we report the sample size in parentheses with the degrees of freedom. In the current example, the statistics read:

χ2(1, N = 56) = 41.14, p < 0.05

The researchers who conducted this study imagined four possible explanations: “a) certain astrological signs confer superior soccer skills; b) winter-born babies tend to have higher oxygen capacity, which increases soccer stamina; c) soccer-mad parents are more likely to conceive children in springtime, at the annual peak of soccer mania; d) none of the above” (Dubner & Levitt, 2006a). What’s your guess?

Dubner and Levitt (2006a) picked (d) and suggested another alternative. Participation in youth soccer leagues has a strict cutoff date: December 31. Compared to those born in December, children born the previous January are likely to be more physically and emotionally mature, perceived as more talented, chosen for the best leagues, and given better coaching—a self-fulfilling prophecy. All this from a simple chi-square test for goodness of fit!

Chi-Square Test for Independence

The chi-square test for goodness of fit analyzes just one nominal variable. The chi-square test for independence analyzes two nominal variables.

Like the correlation coefficient, the chi-square test for independence does not require that we identify independent and dependent variables. However, specifying an independent variable and a dependent variable can help us articulate hypotheses. The chi-square test for independence is so named because it is used to determine whether the two variables—no matter which one is considered to be the independent variable—are independent of each other. Let’s take a closer look at whether pregnancy rates are independent of (that is, do not depend on) whether one is entertained by a clown after IVF treatment.

EXAMPLE 15.2

In the clown study, as reported in the mass media (Ryan, 2006), 186 women were randomly assigned to receive IVF treatment only or to receive IVF treatment followed by 15 minutes of clown entertainment. Eighteen of the 93 who received only the IVF treatment became pregnant, whereas 33 of the 93 who received both IVF treatment and clown entertainment became pregnant. The cells for these observed frequencies can be seen in Table 15-5. The table of cells for a chi-square test for independence is called a contingency table because it helps us see if the outcome of one variable (e.g., becoming pregnant versus not becoming pregnant) is contingent on the other variable (clown versus no clown). Let’s implement the six steps of hypothesis testing for a chi-square test for independence.

image


STEP 1: Identify the populations, distribution, and assumptions.

Population 1: Women receiving IVF treatment like the women we observed. Population 2: Women receiving IVF treatment for whom the presence of a clown is not associated with eventual pregnancy.

The comparison distribution is a chi-square distribution. The hypothesis test will be a chi-square test for independence because we have two nominal variables. This study meets three of the four assumptions: (1) The two variables are nominal. (2) Every participant is in only one cell. (3) The participants were not, however, randomly selected from the population of all women undergoing IVF treatment. We must be cautious in generalizing beyond the sample of Israeli women at this particular hospital. (4) There are more than five times as many participants as cells (186 participants and 4 cells; 4 × 5 = 20). We have far more participants, 186, than the 20 necessary to meet this guideline.

STEP 2: State the null and research hypotheses.

Null hypothesis: Pregnancy rates are independent of whether one is entertained by a clown after IVF treatment. Research hypothesis: Pregnancy rates depend on whether one is entertained by a clown after IVF treatment.

STEP 3: Determine the characteristics of the comparison distribution.

For a chi-square test for independence, we calculate degrees of freedom for each variable and then multiply the two to get the overall degrees of freedom. The degrees of freedom for the variable in the rows of the contingency table are:

dfrow = krow − 1

MASTERING THE FORMULA

15-3: To calculate the degrees of freedom for the chi-square test for independence, we first have to calculate the degrees of freedom for each variable. For the variable in the rows, we subtract 1 from the number of categories in the rows: dfrow = krow − 1. For the variable in the columns, we subtract 1 from the number of categories in the columns: dfcolumn = kcolumn − 1. We multiply these two numbers to get the overall degrees of freedom: dfχ2 = (dfrow)(dfcolumn). To combine all the calculations, we can use the following formula instead: dfχ2 = (krow – 1)(kcolumn – 1).

The degrees of freedom for the variable in the columns of the contingency table are:

dfcolumn = kcolumn − 1

The overall degrees of freedom are:

dfχ2 = (dfrow)(dfcolumn)

To expand this last formula, we write:

dfχ2 = (krow – 1)(kcolumn – 1)

The comparison distribution is a chi-square distribution, which has 1 degree of freedom:

dfχ2 = (krow – 1)(kcolumn – 1) = (2 – 1)(2 – 1) = 1
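The degrees-of-freedom rule is easy to express as a tiny helper function (the function name is ours, not the text's):

```python
# Degrees of freedom for a chi-square test for independence:
# df = (k_row - 1)(k_column - 1). A minimal sketch; the helper
# name is our own.
def chi_square_df(n_rows, n_columns):
    return (n_rows - 1) * (n_columns - 1)

# The clown study's 2 x 2 contingency table:
print(chi_square_df(2, 2))  # 1

# A larger 3 x 4 table, for comparison:
print(chi_square_df(3, 4))  # 6
```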


STEP 4: Determine the critical value, or cutoff.

The critical value, or cutoff, for the chi-square statistic, based on a p level of 0.05 and 1 degree of freedom, is 3.841 (Figure 15-3).

image
FIGURE 15-3
The Cutoff for a Chi-Square Test for Independence
The shaded region is beyond the critical value for a chi-square test for independence with a p level of 0.05 and 1 degree of freedom. If the test statistic falls within this shaded area, we will reject the null hypothesis.

STEP 5: Calculate the test statistic.

The next step, determining the appropriate expected frequencies, is the most important in the calculation of the chi-square test for independence. Errors are often made in this step, and if the wrong expected frequencies are used, the chi-square statistic derived from them will also be wrong. Many students want to divide the total number of participants (here, 186) by the number of cells (here, 4) and place equivalent frequencies in all cells for the expected data. Here, that would mean that the expected frequencies would be 46.5.

But this would not make sense. Of the 186 women, only 51 became pregnant; 51/186 = 0.274, or 27.4%, of these women became pregnant. If pregnancy rates do not depend on clown entertainment, then we would expect the same percentage of successful pregnancies, 27.4%, regardless of exposure to clowns. If we have expected frequencies of 46.5 in all four cells, then we have a 50%, not a 27.4%, pregnancy rate. We must always consider the specifics of the situation.

In the current study, we already calculated that 27.4% of all women in the study became pregnant. If pregnancy rates are independent of whether a woman is entertained by a clown, then we would expect 27.4% of the women who were entertained by a clown to become pregnant and 27.4% of women who were not entertained by a clown to become pregnant. Based on this percentage, 100 − 27.4 = 72.6% of women in the study did not become pregnant. We would therefore expect 72.6% of women who were entertained by a clown to fail to become pregnant and 72.6% of women who were not entertained by a clown to fail to become pregnant. Again, we expect the same pregnancy and nonpregnancy rates in both groups—those who were and were not entertained by clowns.

Table 15-6 shows the observed data, and it also shows totals for each row, each column, and the whole table. From Table 15-6, we see that 93 women were entertained by a clown after IVF treatment. As we calculated above, we would expect 27.4% of them to become pregnant:

(0.274)(93) = 25.482

image

Of the 93 women who were not entertained by a clown, we would expect 27.4% to become pregnant if clown entertainment is independent of pregnancy rates:

(0.274)(93) = 25.482


We now repeat the same procedure for not becoming pregnant. We would expect 72.6% of women in both groups to not become pregnant. For the women who were entertained by a clown, we would expect 72.6% of them to not become pregnant:

(0.726)(93) = 67.518

Of the women who were not entertained by a clown, we would expect 72.6% to not become pregnant:

(0.726)(93) = 67.518

(Note that the two expected frequencies for the first row are the same as the two expected frequencies for the second row, but only because the same number of people were in each clown condition, 93. If these two numbers were different, we would not see the same expected frequencies in the two rows.)

MASTERING THE FORMULA

15-4: When conducting a chi-square test for independence, we can calculate the expected frequencies in each cell by taking the total for the column that the cell is in, dividing it by the total in the study, and then multiplying by the total for the row that the cell is in:

Expected frequency = (Totalcolumn/N)(Totalrow)

The method of calculating the expected frequencies that we described above is ideal because it is directly based on our own thinking about the frequencies in the rows and in the columns. Sometimes, however, our thinking can get muddled, particularly when the two (or more) row totals do not match and the two (or more) column totals do not match. For these situations, a simple set of rules leads to accurate expected frequencies. For each cell, we divide its column total (Totalcolumn) by the grand total (N) and multiply that by the row total (Totalrow):

Expected frequency = (Totalcolumn/N)(Totalrow)

As an example, the observed frequency of those who became pregnant and were entertained by a clown is 33. The row total for this cell is 93. The column total is 51. The grand total, N, is 186. The expected frequency, therefore, is:

Expected frequency = (Totalcolumn/N)(Totalrow) = (51/186)(93) = (0.274)(93) = 25.482

Notice that this result is identical to what we calculated without a formula. The middle step above shows that, even with the formula, we actually did calculate the pregnancy rate overall, by dividing the column total (51) by the grand total (186). We then calculated how many in that row of 93 participants we would expect to become pregnant using this overall rate:

(0.274)(93) = 25.482


The formula follows the logic of the test, and keeps us on track when there are multiple calculations.

As a final check on the calculations, shown in Table 15-7, we can add up the frequencies to be sure that they still match the row, column, and grand totals. For example, if we add the two numbers in the first column, 25.482 and 25.482, we get 50.964 (different from 51 only because of rounding decisions). If we had made the mistake of dividing the 186 participants into cells by dividing by 4, we would have had 46.5 in each cell; then the total for the first column would have been 46.5 + 46.5 = 93, which is not a match with 51. This final check ensures that we have the appropriate expected frequencies in the cells.

image
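The column-total-times-row-total rule, including the final check on the margins, can be sketched with NumPy (assuming it is available; the layout matches Table 15-6, with rows for the clown conditions and columns for the pregnancy outcomes). One caution: carrying full precision gives expected frequencies of exactly 25.5 and 67.5, rather than the 25.482 and 67.518 obtained in the text by rounding the proportion to 0.274 first.

```python
# Expected frequencies for the clown contingency table:
# a sketch assuming NumPy is installed.
import numpy as np

# Observed frequencies: rows = clown / no clown,
# columns = pregnant / not pregnant.
observed = np.array([[33, 60],
                     [18, 75]])

grand_total = observed.sum()           # N = 186
row_totals = observed.sum(axis=1)      # [93, 93]
column_totals = observed.sum(axis=0)   # [51, 135]

# Expected frequency for each cell: (column total / N)(row total),
# done for all cells at once with an outer product.
expected = np.outer(row_totals, column_totals) / grand_total
print(expected)  # [[25.5 67.5], [25.5 67.5]] at full precision

# Final check: the expected frequencies preserve the row and
# column totals, just as in Table 15-7.
assert np.allclose(expected.sum(axis=1), row_totals)
assert np.allclose(expected.sum(axis=0), column_totals)
```

Had we wrongly used 46.5 in every cell, the first column would sum to 93 instead of 51 and this check would fail.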

The remainder of the fifth step is identical to that for a chi-square test for goodness of fit, as seen in Table 15-8. As before, we calculate the difference between each observed frequency and its matching expected frequency, square these differences, and divide each squared difference by the appropriate expected frequency. We add up the numbers in the final column of the table to calculate the chi-square statistic:

image

χ2 = Σ[(O − E)2/E] = 2.218 + 2.197 + 0.837 + 0.829 = 6.081
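SciPy can run the entire test for independence in one call; a sketch, assuming SciPy is installed. Note the correction=False argument: SciPy's default Yates continuity correction would shrink the statistic slightly relative to the hand calculation shown here.

```python
# Chi-square test for independence on the clown data:
# a sketch assuming NumPy and SciPy are installed.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[33, 60],
                     [18, 75]])

# correction=False matches the hand calculation; the default
# Yates continuity correction is designed for 2 x 2 tables and
# would give a smaller statistic.
statistic, p_value, df, expected = chi2_contingency(observed,
                                                    correction=False)
print(round(statistic, 2), df, p_value < 0.05)  # 6.08 1 True
```

The statistic at full precision is 6.08, a hair below the 6.081 in the text, because the text rounds the expected frequencies to 25.482 and 67.518 before summing.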

STEP 6: Make a decision.

Reject the null hypothesis; it appears that pregnancy rates depend on whether a woman receives clown entertainment following IVF treatment (Figure 15-4). The statistics, as reported in a journal article, would follow the format we learned for a chi-square test for goodness of fit as well as for other hypothesis tests in earlier chapters. We report the degrees of freedom and sample size, the value of the test statistic, and whether the p value associated with the test statistic is less than or greater than the critical value based on the p level of 0.05. (We would report the actual p level if we conducted this hypothesis test using software.) In the current example, the statistics would read:

χ2(1, N = 186) = 6.08, p < 0.05

image
FIGURE 15-4
The Decision
Because the chi-square statistic, 6.081, is beyond the critical value, 3.841, we can reject the null hypothesis. It is unlikely that the pregnancy rates for those who received clown therapy versus those who did not were this different from each other just by chance.


Cramér’s V, the Effect Size for Chi Square

A hypothesis test tells us only that there is a likely effect—that it would be unlikely that the observed effect would have occurred merely by chance if the null hypothesis were true. But we have to calculate an additional statistic, an effect size, before we can make claims about the importance of a study’s finding.

MASTERING THE FORMULA

15-5: The formula for Cramér’s V, the effect size typically used with the chi-square statistic, is:

Cramér’s V = √[χ2/((N)(dfrow/column))]

The numerator is the chi-square statistic, χ2. The denominator is the product of the sample size, N, and either the degrees of freedom for the rows or the degrees of freedom for the columns, whichever is smaller. We take the square root of this quotient to get Cramér’s V.

  • Cramér’s V is the standard effect size used with the chi-square test for independence; also called Cramér’s phi, symbolized as Φ.

Cramér’s V is the standard effect size used with the chi-square test for independence. It is also called Cramér’s phi (pronounced “fie”—rhymes with fly) and symbolized by Φ. Once we have calculated the test statistic, it is easy to calculate Cramér’s V by hand. The formula is:

Cramér’s V = √[χ2/((N)(dfrow/column))]

χ2 is the test statistic we just calculated, N is the total number of participants in the study (the lower-right number in the contingency table), and dfrow/column is the degrees of freedom for either the category in the rows or the category in the columns, whichever is smaller.

EXAMPLE 15.3

For the clown example, we calculated a chi-square statistic of 6.081, there were 186 participants, and the degrees of freedom for both categories were 1. When the two degrees of freedom are equal, of course, it doesn’t matter which one we choose. The effect size for the clown study, therefore, is:

Cramér’s V = √[6.081/((186)(1))] = √0.033 = 0.18
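The arithmetic is simple enough to check directly (a sketch; the variable names are ours):

```python
# Cramér's V for the clown study: the square root of the
# chi-square statistic divided by (N times the smaller df).
import math

chi_square = 6.081   # test statistic from step 5
n = 186              # total participants in the study
df_smaller = 1       # smaller of df_row and df_column

cramers_v = math.sqrt(chi_square / (n * df_smaller))
print(round(cramers_v, 2))  # 0.18
```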

Now that we have the effect size, we must ask what it means. As with other effect sizes, Jacob Cohen (1992) has developed guidelines, shown in Table 15-9, for determining whether a particular effect is small, medium, or large. The guidelines vary based on the size of the contingency table. When the smaller of the two degrees of freedom for the row and column is 1, we use the guidelines in the second column. When the smaller of the two degrees of freedom is 2, we use the guidelines in the third column. And when it is 3, we use the guidelines in the fourth column. As with the other guidelines for judging effect sizes, such as those for Cohen’s d, the guidelines are not cutoffs. Rather, they are rough indicators to help researchers gauge a finding’s importance.

image


The effect size for the clowning and pregnancy study was 0.18. The smaller of the two degrees of freedom, that for the row and that for the column, was 1 (in fact, both were 1). So we use the second column in Table 15-9. This Cramér’s V falls about halfway between the effect-size guidelines for a small effect (0.10) and a medium effect (0.30). We would call this a small-to-medium effect. We can build on the report of the statistics by adding the Cramér’s V to the end:

χ2(1, N = 186) = 6.08, p < 0.05, Cramér’s V = 0.18

Graphing Chi-Square Percentages

In addition to calculating Cramér’s V, we can graph the data. A visual depiction of the pattern of results is an effective way to understand the size of the relation between two variables assessed using the chi-square statistic. We don’t graph the frequencies, however. We graph proportions or percentages.

EXAMPLE 15.4

For the women entertained by a clown, we calculate the proportion who became pregnant and the proportion who did not. For the women not entertained by a clown, we again calculate the proportion who became pregnant and the proportion who did not. The calculations for the proportions are below.

In each case, we’re dividing the number of a given outcome by the total number of women in that group. The proportions are called conditional proportions because we’re not calculating the proportions out of all women in the study; we’re calculating proportions for women in a certain condition. We calculate the proportion of women who became pregnant, for example, conditional on their having been entertained by a clown.

Entertained by a clown

Became pregnant: 33/93 = 0.355

Did not become pregnant: 60/93 = 0.645

Not entertained by a clown

Became pregnant: 18/93 = 0.194

Did not become pregnant: 75/93 = 0.806
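The conditional proportions above amount to dividing each observed frequency by its row total; a sketch with NumPy (assuming it is available, with the same row/column layout as the contingency table):

```python
# Conditional proportions for each clown condition: divide each
# cell by its row total (rows = clown / no clown, columns =
# pregnant / not pregnant). A sketch assuming NumPy is installed.
import numpy as np

observed = np.array([[33, 60],
                     [18, 75]])
row_totals = observed.sum(axis=1, keepdims=True)  # [[93], [93]]

conditional = observed / row_totals
print(conditional.round(3))
# [[0.355 0.645]
#  [0.194 0.806]]

# Within each condition, the proportions sum to 1.00.
assert np.allclose(conditional.sum(axis=1), 1.0)
```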


We can put those proportions into a table (such as Table 15-10). For each category of entertainment (clown, no clown), the proportions should add up to 1.00; or if we used percentages, they should add up to 100%.

image

We can now graph the conditional proportions, as in Figure 15-5. Alternatively, we could simply graph the two rates at which women became pregnant—0.355 and 0.194—given that the rates at which they did not become pregnant are determined by these rates. This graph is depicted in Figure 15-6. In both cases, we include the scale of proportions on the y-axis from 0 to 1.0 so that the graph does not mislead the viewer into thinking that rates are higher than they are.

FIGURE 15-5
Graphing and Chi Square
When we graph the data for a chi-square test for independence, we graph conditional proportions rather than frequencies. The proportions allow us to compare the rates at which women became pregnant in the two conditions.
FIGURE 15-6
A Simpler Graph of Conditional Probabilities
Because the rates at which women did not become pregnant are determined by the rates at which they did become pregnant, we can simply graph one set of rates. Here we see the rates at which women became pregnant in each of the two clown conditions.

Relative Risk

  • Relative risk is a measure created by making a ratio of two conditional proportions; also called relative likelihood or relative chance.

Public health statisticians like John Snow (Chapter 1) are called epidemiologists. They often think about the size of an effect with chi square in terms of relative risk, a measure created by making a ratio of two conditional proportions. It is also called relative likelihood or relative chance.


EXAMPLE 15.5

As with Figure 15-5, we calculate the chance of getting pregnant with clown entertainment after IVF by dividing the number of pregnancies in this group by the total number of women entertained by clowns:

33/93 = 0.355

MASTERING THE CONCEPT

15-3: We can quantify the size of an effect with chi square through relative risk, also called relative likelihood. By making a ratio of two conditional proportions, we can say, for example, that one group is twice as likely to show some outcome or, conversely, that the other group is one-half as likely to show that outcome.

We then calculate the chance of getting pregnant with no clown entertainment after IVF by dividing the number of pregnancies in this group by the total number of women not entertained by clowns:

18/93 = 0.194

If we divide the chance of getting pregnant having been entertained by clowns by the chance of getting pregnant not having been entertained by clowns, then we get the relative likelihood:

0.355/0.194 = 1.830

Based on the relative risk calculation, the chance of getting pregnant when IVF is followed by clown entertainment is 1.83 times (or about twice) the chance of getting pregnant when IVF is not followed by clown entertainment. This matches the impression that we get from the graph.

Alternatively, we can reverse the ratio, dividing the chance of becoming pregnant without clown entertainment, 0.194, by the chance of becoming pregnant following clown entertainment, 0.355. This gives the relative likelihood for the reversed ratio:

0.194/0.355 = 0.546

This number gives us the same information in a different way. The chance of getting pregnant when IVF is followed by no entertainment is 0.55 (or about half) the chance of getting pregnant when IVF is followed by clown entertainment. Again, this matches the graph; one bar is about half that of the other.
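The two relative likelihoods above are just the ratio of the conditional proportions, taken in either direction. A brief sketch (our own illustration; the text's 1.83 and 0.55 come from the rounded proportions, so the unrounded fractions differ in the third decimal place):

```python
# Sketch: relative risk (relative likelihood) as a ratio of the two
# conditional proportions from this example. Counts are from the text.
pregnant_clown = 33 / 93     # ~0.355: chance of pregnancy with clown
pregnant_no_clown = 18 / 93  # ~0.194: chance of pregnancy without clown

relative_risk = pregnant_clown / pregnant_no_clown  # about 1.83
reversed_ratio = pregnant_no_clown / pregnant_clown  # about 0.55

print(round(relative_risk, 2), round(reversed_ratio, 2))
```

Either direction conveys the same information: one group is about twice as likely, the other about half as likely, to show the outcome.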

The Fun Theory The Fun Theory is an initiative by Volkswagen to identify ways to persuade people to engage in activities that are good for them or for the environment. When the stairs next to an escalator were turned into piano keys that played musical notes, the rate of people taking the stairs was 66% higher than for regular stairs. The researchers gathered data by counting people in the two conditions—those climbing the stairs and those riding the escalator. This is a nominal variable, so the researchers would have used a chi-square test. But they reported their findings in terms of relative likelihood, an easy-to-understand single number.
Yonhap News/YNA/Newscom

(Note: When this calculation is made with respect to diseases, it is referred to as relative risk [rather than relative likelihood].) We should be careful when relative risks and relative likelihoods are reported, however. We must always be aware of base rates. If, for example, a certain disease occurs in just 0.01% of the population (that is, 1 in 10,000) and is twice as likely to occur among people who eat ice cream, then the rate is 0.02% (2 in 10,000) among those who eat ice cream. Relative risks and relative likelihoods can be used to scare the general public unnecessarily—which is one more reason why statistical reasoning is a healthy way to think.
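The base-rate caution above can be made concrete with a few lines of arithmetic. This sketch uses the text's hypothetical ice-cream numbers; the variable names are our own.

```python
# Sketch: why base rates matter when interpreting relative risk.
# Numbers follow the text's hypothetical ice-cream example.
base_rate = 0.0001       # disease occurs in 0.01% of the population
relative_risk = 2.0      # "twice as likely" among ice-cream eaters

rate_among_eaters = base_rate * relative_risk   # 0.0002, i.e., 0.02%
absolute_increase = rate_among_eaters - base_rate

print(rate_among_eaters)   # still only 2 cases per 10,000 people
print(absolute_increase)   # only 1 extra case per 10,000 people
```

A doubled relative risk sounds alarming, but the absolute increase here is a single additional case per 10,000 people, which is why reporting relative risk without the base rate can mislead.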


CHECK YOUR LEARNING

Reviewing the Concepts
  • The chi-square tests are used when all variables are nominal.

  • The chi-square test for goodness of fit is used with one nominal variable.

  • The chi-square test for independence is used with two nominal variables; usually one can be thought of as the independent variable and one as the dependent variable.

  • Both chi-square hypothesis tests use the same six steps of hypothesis testing with which we are familiar.

  • The appropriate effect-size measure for the chi-square test for independence is Cramér’s V.

  • We can depict the effect size visually by calculating and graphing conditional proportions so that we can compare the rates of a certain outcome in each of two or more groups.

  • Another way to consider the size of an effect is through relative risk, a ratio of conditional proportions for each of two groups.

Clarifying the Concepts 15-5 When do we use chi-square tests?
15-6 What are observed frequencies and expected frequencies?
15-7 What is the effect-size measure for chi-square tests and how is it calculated?
Calculating the Statistics 15-8 Imagine a town that boasts clear blue skies 80% of the time. You get to work in that town one summer for 78 days and record the following data. (Note: For each day, you picked just one label.)

Clear blue skies: 59 days

Cloudy/hazy/gray skies: 19 days

[Six-column worksheet with columns: Category, Observed (O), Expected (E), O − E, (O − E)2, (O − E)2/E]
  1. Calculate degrees of freedom for this chi-square test for goodness of fit.

  2. Determine the observed and expected frequencies.

  3. Calculate the differences and squared differences between frequencies, and calculate the chi-square statistic. Use the six-column format provided here.

15-9 Assume you are interested in whether students with different majors tend to have different political affiliations. You ask U.S. psychology majors and business majors to indicate whether they are Democrats or Republicans. Of 67 psychology majors, 36 indicated that they are Republicans and 31 indicated that they are Democrats. Of 92 business majors, 54 indicated that they are Republicans and 38 indicated that they are Democrats. Calculate the relative likelihood of being a Republican, given that a person is a business major as opposed to a psychology major.
Applying the Concepts 15-10 The Chicago Police Department conducted a study comparing two types of lineups for suspect identification: simultaneous lineups and sequential lineups (Mecklenburg, Malpass, & Ebbesen, 2006). In simultaneous lineups, witnesses saw the suspects all at once, either live or in photographs, and then made their selection. In sequential lineups, witnesses saw the people in the lineup one at a time, either live or in photographs, and said yes or no to suspects one at a time. After numerous high-profile cases in which DNA evidence exonerated people who had been convicted, including many on death row, many police departments shifted to sequential lineups in the hope of reducing incorrect identifications. Several previous studies had indicated the superiority of sequential lineups with respect to accuracy. Over one year, three jurisdictions in Illinois compared the two types of lineups. Of 319 simultaneous lineups, 191 led to identification of the suspect, 8 led to identification of another person in the lineup, and 120 led to no identification. Of 229 sequential lineups, 102 led to identification of the suspect, 20 led to identification of another person in the lineup, and 107 led to no identification.
  1. Who or what are the participants in this study? Identify the independent variable and its levels, as well as the dependent variable and its levels.

  2. Conduct all six steps of hypothesis testing.

  3. Report the statistics as you would in a journal article.

  4. Why is this study an example of the importance of using two-tailed rather than one-tailed hypothesis tests?

  5. Calculate the appropriate measure of effect size for this study.

  6. Create a graph of the conditional proportions for these data.

  7. Calculate the relative likelihood of a suspect being accurately identified in the simultaneous lineups versus the sequential lineups.

Solutions to these Check Your Learning questions can be found in Appendix D.