As is the case with quantitative variables, the effects of lurking variables can change or even reverse relationships between two categorical variables. In the following example, we will find that sometimes a lurking variable might reverse the relationship of what we would expect to find in the data.
EXAMPLE 7 Do medical helicopters save lives?
Accident victims are sometimes taken by helicopter from the accident scene to a hospital. Helicopters save time. Do they also save lives? Let’s compare the percentages of accident victims who die with helicopter evacuation and with the usual transport to a hospital by road. The numbers here are hypothetical, but they illustrate a phenomenon that often appears in real data.
Helicopter | Road | Total | |
Victim died | 64 | 260 | 324 |
Victim survived | 136 | 840 | 976 |
Total | 200 | 1100 | 1300 |
We see that 32% (64 out of 200) of helicopter patients died, but only 24% (260 out of 1100) of the others died. That seems discouraging.
The explanation is that the helicopter is sent mostly to serious accidents so that the victims transported by helicopter are more often seriously injured. They are more likely to die with or without helicopter evacuation. We will break the data down into a three-
Serious Accidents | ||
---|---|---|
Helicopter | Road | |
Died | 48 | 60 |
Survived | 52 | 40 |
Total | 100 | 100 |
Less Serious Accidents | ||
---|---|---|
Helicopter | Road | |
Died | 16 | 200 |
Survived | 84 | 800 |
Total | 100 | 1000 |
585
Inspect these tables to convince yourself that they describe the same 1300 accident victims as the original two-
Among victims of serious accidents, the helicopter saves 52% (52 out of 100) compared with 40% for road transport. If we look at less serious accidents, 84% of those transported by helicopter survive versus 80% of those transported by road. Both groups of victims have a higher survival rate when evacuated by helicopter.
How can it happen that the helicopter does better for both groups of victims but worse when all victims are combined? Look at the data: half the helicopter transport patients are from serious accidents, compared with only 100 of the 1100 road transport patients. So the helicopter carries patients who are more likely to die. The original two-
Simpson’s paradox
An association or comparison that holds for all of several groups can disappear or even reverse direction when the data are combined to form a single group. This situation is called Simpson’s paradox.
Simpson’s paradox is just an extreme form of the fact that observed associations can be misleading when there are lurking variables. Remember the caution from Chapter 15: beware the lurking variable.
EXAMPLE 8 Discrimination in mortgage lending?
Studies of applications for home mortgage loans from banks show a strong racial pattern: banks reject a higher percentage of black applicants than white applicants. One lawsuit against a bank for discrimination in lending in the Washington DC area contends that the bank rejected 17.5% of blacks but only 3.3% of whites.
586
The bank replies that lurking variables explain the difference in rejection rates. Blacks have (on the average) lower incomes, poorer credit records, and less secure jobs than whites. Unlike race, these are legitimate reasons to turn down a mortgage application. It is because these lurking variables are confounded with race, the bank says, that it rejects a higher percentage of black applicants. It is even possible, thinking of Simpson’s paradox, that the bank accepts a higher percentage of black applicants than of white applicants if we look at people with the same income and credit record.
Who is right? Both sides will hire statisticians to examine the effects of the lurking variables. Both sides will present statistical arguments supporting or refuting a charge of discrimination in lending. Unfortunately, there are no formal guidelines for how juries and judges are to assess statistical arguments. And juries and judges need not have any statistical expertise. Lurking variables and seeming paradoxes such as Simpson’s paradox make it difficult for even experts to determine the cause of the disparity in the rejection of mortgage applications. The court will eventually decide as best it can, but the decision may not be based on the statistical arguments.