How to live with observational studies

Do children who are bullied suffer depression as adults? Do doctors discriminate against women in treating heart disease? Does texting while driving increase the risk of having an accident? These are cause-and-effect questions, so we reach for our favorite tool, the randomized comparative experiment. Sorry. We refuse to require children to be bullied. We can’t use random digits to assign heart disease patients to be men or women. We are reluctant to require drivers to use cell phones in traffic because talking while driving may be risky.

The best data we have about these and many other cause-and-effect questions come from observational studies. We know that observation is a weak second-best to experiment, but good observational studies are far from worthless, and we will discuss this further in Chapter 15. What makes a good observational study?

First, good studies are comparative even when they are not experiments. We compare random samples of people who were bullied as children with those who were not bullied. We compare how doctors treat men and women patients. We might compare drivers talking on cell phones with the same drivers when they are not on the phone. We can often combine comparison with matching in creating a control group. To see the effects of taking a painkiller during pregnancy, we compare women who did so with women who did not. From a large pool of women who did not take the drug, we select individuals who match the drug group in age, education, number of children, and other lurking variables. We now have two groups that are similar in all these ways so that these lurking variables should not affect our comparison of the groups. However, if other important lurking variables, not measurable or not thought of, are present, they will affect the comparison, and confounding will still be present.
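To make the matching idea concrete, here is a minimal sketch in Python. It assumes the simplest possible rule: each woman in the drug group is paired with one unused woman from the pool who agrees with her exactly on every listed variable. The records, the field names, and the function match_controls are all invented for illustration; real studies often use looser matching rules.

```python
from collections import defaultdict

def match_controls(treated, pool, keys):
    """Pair each treated subject with one unused control that has
    identical values on every variable named in `keys`."""
    available = defaultdict(list)
    for person in pool:
        available[tuple(person[k] for k in keys)].append(person)

    pairs = []
    for person in treated:
        candidates = available[tuple(person[k] for k in keys)]
        if candidates:  # treated subjects with no exact match are left out
            pairs.append((person, candidates.pop(0)))
    return pairs

# Hypothetical records, invented for illustration only.
drug_group = [
    {"id": "T1", "age_band": "20-29", "education": "college", "children": 0},
    {"id": "T2", "age_band": "30-39", "education": "high school", "children": 2},
]
no_drug_pool = [
    {"id": "C1", "age_band": "30-39", "education": "high school", "children": 2},
    {"id": "C2", "age_band": "20-29", "education": "college", "children": 0},
    {"id": "C3", "age_band": "20-29", "education": "college", "children": 1},
]

pairs = match_controls(drug_group, no_drug_pool,
                       ["age_band", "education", "children"])
matched_ids = [(t["id"], c["id"]) for t, c in pairs]
print(matched_ids)
```

The sketch can match only on variables that appear in the records. A lurking variable that nobody measured never enters the comparison, which is exactly the weakness described above.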

Matching does not entirely eliminate confounding. People who were bullied as children may have characteristics that increase susceptibility to victimization as well as independently increasing the risk of depression. They are more likely to be female, have had concurrent emotional or mental health problems as a child, have parents who suffer from depression, or have experienced maltreatment at home as a child. Although matching can reduce some of these differences, direct comparison of rates of depression in young adults who were bullied as children and in young adults who were not bullied as children would still confound any effect of bullying with the effects of mental health issues in childhood, mental health issues of the parents, and maltreatment as a child. A good comparative study measures and adjusts for confounding variables. If we measure sex, the presence of mental health issues as a child, the presence of mental health issues in the parents, and aspects of the home environment, there are statistical techniques that reduce the effects of these variables on rates of depression so that (we hope) only the effect of bullying itself remains.
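One way to see what adjustment does is to compare bullied and non-bullied children separately within groups that share a confounder, then combine the within-group comparisons. The sketch below uses the Mantel-Haenszel method, one standard technique of this kind. It is not necessarily the method used in any study discussed here, and every count is hypothetical. The comparison is expressed as an odds ratio: the odds of depression (depressed divided by not depressed) among the bullied, divided by the same odds among those not bullied.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for one 2x2 table.
    a, b = bullied with / without depression
    c, d = not bullied with / without depression"""
    return (a * d) / (b * c)

def mantel_haenszel(tables):
    """Mantel-Haenszel pooled odds ratio across strata of a confounder."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Hypothetical counts, invented for illustration only.
# Each tuple is (bullied & depressed, bullied & not depressed,
#                not bullied & depressed, not bullied & not depressed).
strata = {
    "childhood mental health problems": (30, 70, 10, 40),
    "no childhood mental health problems": (10, 90, 20, 380),
}

# Ignoring the confounder: add the strata together into a single table.
pooled = tuple(sum(column) for column in zip(*strata.values()))
crude = odds_ratio(*pooled)

# Adjusting for the confounder: combine the within-stratum comparisons.
adjusted = mantel_haenszel(strata.values())

print(f"crude odds ratio:    {crude:.2f}")
print(f"adjusted odds ratio: {adjusted:.2f}")
```

With these made-up counts the crude odds ratio is about 3.5 and the adjusted one is about 1.9. The crude figure is inflated because the bullied children in this invented data set were more likely to have had childhood mental health problems, which raise the risk of depression on their own.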


EXAMPLE 6 Bullying and depression

A recent study in the United Kingdom examined data on 3898 participants in a large observational study. For these participants, the researchers had information on both victimization by peers at age 13 and the presence of depression at age 18. They also had information on many other variables, not just the explanatory variable (bullying at age 13) and the response variable (presence of depression at age 18). The research article said:

Compared with children who were not victimized those who were frequently victimized by peers had over a twofold increase in the odds of depression. . . . This association was slightly reduced when adjusting for confounders. . .

The phrase “adjusting for confounders” means that the final results were adjusted for differences, other than bullying, between the group that was frequently victimized and the group that was not. Adjustment reduced the association between bullying at age 13 and depression at age 18, but still left a nearly twofold increase in the odds of depression.

Interestingly, the researchers go on to mention that the use of observational data does not allow them to conclude the associations are causal.

EXAMPLE 7 Sex bias in treating heart disease?

Doctors are less likely to give aggressive treatment to women with symptoms of heart disease than to men with similar symptoms. Is this because doctors are sexist? Not necessarily. Women tend to develop heart problems much later than men so that female heart patients are older and often have other health problems. That might explain why doctors proceed more cautiously in treating them.


This is a case for a comparative study with statistical adjustments for the effects of confounding variables. There have been several such studies, and they produce conflicting results. Some show, in the words of one doctor, “When men and women are otherwise the same and the only difference is gender, you find that treatments are very similar.” Other studies find that women are undertreated even after adjusting for differences between the female and male subjects.

As Example 7 suggests, statistical adjustment is complicated. Randomization creates groups that are similar in all variables known and unknown. Matching and adjustment, on the other hand, can’t work with variables the researchers didn’t think to measure. Even if you believe that the researchers thought of everything, you should be a bit skeptical about statistical adjustment. There’s lots of room for cheating in deciding which variables to adjust for. And the “adjusted” conclusion is really something like this:

If female heart disease patients were younger and healthier than they really are, and if male patients were older and less healthy than they really are, then the two groups would get the same medical care.

This may be the best we can get, and we should thank statistics for making such wisdom possible. But we end up longing for the clarity of a good experiment.