1.11–1.14: Well-designed experiments are essential to testing hypotheses.

Controlled experiments increase the power of our observations. Here, laboratory-grown plants are measured.
1.11: Controlling variables makes experiments more powerful.

From our earlier discussion of critical experiments, you have a sense of how important it is to have a well-planned, well-designed experiment. In performing experiments, our goal is to determine whether one variable influences another: if an experiment enables us to draw a correct conclusion about that cause-and-effect relationship, it is a good experiment.

In our initial discussion of experiments, we just described what the experiment was, without examining why the researchers chose to perform the experiment the way they did. In this section, we explore some of the ways to maximize an experiment’s power, and we’ll find that, with careful planning, it is possible to increase an experiment’s ability to discern causes and effects.

First, let’s consider some elements common to most experiments.

Let’s look at a real-life example that illustrates the importance of considering all these elements when designing an experiment.

Stomach ulcers are erosions of the stomach lining that can be very painful. In the late 1950s, a doctor reported in the Journal of the American Medical Association that stomach ulcers could be effectively treated by having a patient swallow a balloon connected to some tubes that circulated a refrigerated fluid. He argued that by super-cooling the stomach, acid production was reduced and the ulcer symptoms were relieved. He had convincing data to back up his claim: in all 24 of his patients who received this “gastric freezing” treatment, their condition improved (FIGURE 1-14). As a result, the treatment became widespread for many years.

Figure 1-14: No controls. Gastric freezing and stomach ulcers: a poorly designed experiment and a well-controlled “do-over.”


Although there was a clear hypothesis (“Gastric cooling reduces the severity of ulcers”) and some compelling observations (all 24 patients experienced relief), this experiment was not designed well. In particular, there was no clear group with whom to compare the patients who received the treatment. In other words, who is to say that just going to the doctor or having a balloon put into your stomach doesn’t improve ulcers? The results of this doctor’s experiment do not rule out these interpretations.

A few years later, another researcher decided to do a more carefully controlled study. He recruited 160 ulcer patients and gave 82 of them the gastric freezing treatment. The other 78 received a similar treatment in which they swallowed the balloon but had room-temperature water pumped in. The latter was an appropriate control group because these subjects were treated exactly like the experimental group, with a single difference: whether or not they experienced gastric freezing. The new experiment could test for an effect of the gastric freezing, while controlling for the effects of other, lurking variables that might affect the outcome.


Surprisingly, the researcher found that although the condition of 34% of the patients in the gastric freezing group improved, so did the condition of 38% of those in the control group. These results indicated that gastric freezing conferred no benefit when compared with a treatment that did not involve it. Not surprisingly, the practice was abandoned.
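The closeness of those two improvement rates can be checked with a standard two-proportion z-test. The sketch below (in Python, which the text itself does not use) works from the approximate patient counts implied by the reported percentages; the original study's exact figures may differ slightly.

```python
import math

# Approximate counts implied by the reported percentages (an assumption;
# the study's exact figures may differ slightly).
treated_n = 82                               # gastric freezing group
control_n = 78                               # room-temperature control group
treated_improved = round(0.34 * treated_n)   # ~28 patients improved
control_improved = round(0.38 * control_n)   # ~30 patients improved

# Pooled two-proportion z-test: is the difference bigger than chance?
p1 = treated_improved / treated_n
p2 = control_improved / control_n
pooled = (treated_improved + control_improved) / (treated_n + control_n)
se = math.sqrt(pooled * (1 - pooled) * (1 / treated_n + 1 / control_n))
z = (p1 - p2) / se

print(f"z = {z:.2f}")  # far inside +/-1.96: no significant difference
```

Because the z-score falls well inside the conventional significance threshold, the small gap between 34% and 38% is exactly what chance variation alone would produce, which is why the researcher concluded the treatment had no effect.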

A surprising result from the gastric freezing study, and from many other studies as we'll see, was that it demonstrated the placebo effect: the frequently observed, poorly understood phenomenon in which people respond favorably to any treatment, even a sham one. The placebo effect highlights the need for an appropriate control group. We want to know whether the treatment is actually responsible for any effect seen; if the control group receiving the placebo or sham treatment has an outcome much like that of the experimental group, we can conclude that the treatment itself does not have an effect.

Another pitfall to avoid in designing an experiment is allowing the people conducting it to influence its outcome. An experimenter can often unwittingly influence the results of an experiment. This phenomenon is illustrated by the story of a horse named Clever Hans. Hans was considered clever because his owner claimed that Hans could perform remarkable intellectual feats, including multiplication and division. When given a numerical problem, the horse would tap out the answer number with his foot. Controlled experiments, however, demonstrated that Hans was only able to solve problems when he could see the person asking the question and when that person knew the answer (FIGURE 1-15). It turned out that the questioners, unintentionally and through very subtle body language, revealed the answers.

Figure 1-15: Math whiz or ordinary horse? The horse Clever Hans was said to be capable of mathematical calculations, until a controlled experiment demonstrated otherwise.

The Clever Hans phenomenon highlights the benefits of instituting even greater controls when designing an experiment. In particular, it highlights the value of blind experimental design, in which the experimental subjects do not know which treatment (if any) they are receiving, and double-blind experimental design, in which neither the experimental subjects nor the experimenter knows which treatment a subject is receiving.

Another hallmark of an extremely well-designed experiment is that it combines the blind/double-blind strategies we’ve just described in a randomized, controlled, double-blind study. In this context, “randomized” refers to the fact that, as in the echinacea study described above, the subjects are randomly assigned into experimental and control groups. In this way, researchers and subjects have no influence on the composition of the two groups.
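As a rough illustration (not the procedure from the ulcer study itself), random assignment and blinding can be sketched in a few lines of Python. The patient labels and group sizes here are hypothetical, chosen only to match the 160-patient study described above.

```python
import random

# Hypothetical pool of 160 patients, labeled arbitrarily.
subjects = [f"patient_{i:03d}" for i in range(160)]

rng = random.Random(0)    # fixed seed only so the sketch is repeatable
shuffled = subjects[:]
rng.shuffle(shuffled)     # randomization: no one chooses who goes where

treatment_group = shuffled[:82]   # would receive the real treatment
control_group = shuffled[82:]     # would receive the sham treatment

# Double-blinding: a coordinator keeps the key; experimenters and
# subjects see only coded patient labels, never the group assignments.
key = {s: "treatment" for s in treatment_group}
key.update({s: "sham" for s in control_group})
coded_labels = sorted(key)        # all the experimenter actually sees
```

The essential design choice is that the shuffle, not the researcher, decides group membership, and that the assignment key is held apart from everyone who interacts with the subjects until the study ends.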

The use of randomized, controlled, double-blind experimental design can be thought of as an attempt to imagine all the possible ways that someone might criticize an experiment and to design the experiment so that the results cannot be explained by anything other than the effect of the treatment. In this way, the experimenter’s results either support the hypothesis or invalidate it—in which case, the hypothesis must be rejected. If multiple explanations can be offered for the observations and evidence from an experiment, then it has not succeeded as a critical experiment.

Suppose you want to know whether a new drug is effective in fighting the human immunodeficiency virus (HIV), the virus that leads to AIDS. Which experiment would be better, one in which the drug is added to HIV-infected cells in a test tube under carefully controlled laboratory conditions, or one in which the drug is given to a large number of HIV-infected individuals? There is no definitive answer. In laboratory studies, it is possible to control nearly every environmental variable. Because of their simplicity, however, lab studies can make it difficult to draw broader conclusions about a drug’s effectiveness under real-world conditions. For example, complex factors in human subjects, such as nutrition and stress, may interact with the experimental drug and influence its effectiveness. These interactions are not present in the controlled lab study and thus cannot be taken into account.


Good experimental design is more complex than simply following a single recipe. The only way to determine the quality of an experiment is to assess how well the variables that were not of interest were controlled, and how well the experimental treatment tested the relationship of interest.

TAKE-HOME MESSAGE 1.11

To draw clear conclusions from experiments, it is essential to hold constant all those variables we are not interested in. Control and experimental groups should differ only with respect to the treatment of interest. Differences in outcomes between the groups can then be attributed to the treatment.

In crafting a well-planned, well-designed experiment, what are the four elements common to most experiments that are used to maximize an experiment’s power? Describe each.