
1.4 The Scientific Method: Systematizing the Acquisition of Knowledge


SCIENCE: We must recognize that there are many ways of knowing, but … in the entire course of prehistory and history only one way of knowing has encouraged its own practitioners to doubt their own premises and to systematically expose their own conclusions to the hostile scrutiny of nonbelievers.

—Marvin Harris, American anthropologist (1927–2001), Cultural Materialism

Over thousands of years, humans have refined everyday thinking to sharpen it and make it less susceptible to the many biases that limit it; the result is the scientific method. Science is a method for answering questions about the nature of reality that reduces the impact of the human biases we have just reviewed.

Just like everyone else, scientists make observations, look for patterns in what they observe, and then generate explanations for how or why things happen as they do. These explanations are called theories. Research is the process whereby scientists observe events in the world, look for consistent patterns, and evaluate theories proposed to explain those patterns. Research and theory are simply scientific refinements of the observations and explanations we all make every day to help us get through life. However, whereas ordinary intuitive thinking typically leads us to accept explanations relatively uncritically (especially if they are consistent with our expectations and desires), generating a plausible account of our observations is just the beginning of scientific inquiry.

Theory

An explanation for how and why variables are related to each other.

Research

The process whereby scientists observe events, look for patterns, and evaluate theories proposed to explain those patterns.

The Cycle of Theory and Research in Social Psychology

As the social psychologist Kurt Lewin put it, “There is nothing so practical as a good theory” (1952, p. 169). Theories tell us about causal factors that influence particular kinds of behavior. This knowledge can help us alter behavior in beneficial ways. For example, if theories specify factors that lead to bad things such as child abuse and good things such as charitable giving, we can design ways to alter these factors to reduce the occurrence of the bad behaviors and increase the occurrence of good behaviors. And research tells us whether our theories provide the right explanations.

The concept of theory is often misunderstood: In grade school, many of us were taught to distinguish theories from facts. This probably gave a lot of people the idea that the difference between a theory and a fact lies in the level of certainty we have about its truth, as if a theory is a sort of weaker version of a fact that shouldn’t be taken all that seriously. But in scientific thinking, the concepts of fact and theory are entirely different from one another. They serve different functions and play different roles in the process of doing science. A fact is the content of research observations that have been replicated, that is, verified by multiple observers. A theory, on the other hand, is an explanation for the facts. Although a theory may be our current best explanation for how or why things happen as they do, it is not, and is not expected to be, an entirely complete or accurate explanation in any absolute sense. The history of science shows us that a theory accepted as useful scientific truth in one era often is viewed as a quaint but misguided misunderstanding centuries or even decades later.

FIGURE 1.3

The Cycle of Theory and Research
Theories lead to hypotheses that are then tested. The outcomes of these tests influence views and revisions of the theory.

Scientific knowledge is continually evolving, moving toward a more and more useful understanding of reality.

To assess the validity of a theory, a scientist starts by deriving testable hypotheses from the theory (see FIGURE 1.3). A hypothesis is an “if-then” statement that follows logically from the theory and specifies how certain variables (characteristics that vary and that can be measured) should be related to each other if the theory is correct. Hypotheses are the bridges that scientists use to move from a theory, which explains how or why something happens as it does, to research, in which new observations are made and checked to see if they correspond with what is predicted by a hypothesis.

Hypothesis

An “if-then” statement that follows logically from a theory and specifies how certain variables should be related to each other if the theory is correct.

Typically, a theory generates numerous hypotheses. Once they are tested, the theory is either accepted as it stands or revised or replaced in light of the research findings. The reformulated theory (or the new theory) is then used to generate additional hypotheses, which are then tested, and the cycle continues. In this way, through the ongoing interplay between theory and research, the process spirals toward more sophisticated theories that provide increasingly accurate explanations of reality and programs of research that probe increasingly refined questions about these processes. Let’s consider the cycle of theory and research using the example of the development of stereotype threat theory, a topic we will cover more fully in Chapter 11.

Stereotype Threat: Case Study of a Theory

To illustrate the ongoing interplay between theory and research, let’s focus on some influential findings in social psychology that address the question of why people who are members of stereotyped groups sometimes perform poorly on standardized tests of their abilities. This work was inspired by the fairly consistent observation that members of stigmatized groups (groups within a culture that are viewed negatively in some way), such as African Americans and women, tend, on average, to perform less well in certain academic areas—specifically, general scholastic aptitude and mathematics, respectively—than their nonstigmatized peers. Although it is clear that there is wide variability in the performance of people of all races and genders and that it is impossible to predict accurately a person’s performance from simple demographic information such as race or gender, these race and gender gaps in test scores beg for some explanation. No one disputes that these average differences between groups exist, but as you might expect, the theories that attempt to explain why they are there have been extremely controversial. They range from locating a cause in nature (the most contentious being a presumption of genetic inferiority) to pointing to systemic inequalities in environment (patterns of poverty or discrimination within American society).

In 1995, Claude Steele and Josh Aronson proposed a creative new theoretical explanation for poor performance by members of stigmatized groups, which they labeled stereotype threat theory. The basic idea is that if you are a member of a group about which there are negative stereotypic beliefs, engaging in behavior that is relevant to those negative beliefs puts you in a doubly threatening situation. Not only will you be judged as an individual but your performance also will be taken as evidence of the ability of your entire group. So in the context of a test of verbal intelligence, unlike a White male, whose performance is typically taken as indicative of only his own ability, an African American male might worry that a low score will be viewed as evidence of his entire race’s alleged deficiencies in intelligence. Likewise, a woman who misses too many math questions could be seen as confirming the stereotypes of women’s inability to do math. Steele proposed that this resulting experience of stereotype threat is at least part of the reason members of stigmatized groups tend to perform less well in areas relevant to negative stereotypes concerning their group. Steele further posited that, because of the prevailing negative stereotypic beliefs about the group, the situation itself—having to take a test—arouses stereotype threat and reduces stigmatized students’ ability to perform up to their potential. Stereotype threat theory thus proposes that conditions that bring the stereotype to mind contribute to poor performance among members of various stigmatized groups.

This, of course, is a very different explanation from one that assumes that differences in the abilities and potential of particular groups result from either genetic inferiority or a lifetime of experience with poverty or discrimination. If true, stereotype threat theory would also be a nice example of how understanding basic social psychological processes can shed new light on important personal and social issues. But to have any scientific credibility, this theoretical explanation must be tested. How would a social psychologist use the scientific method to assess the validity of the stereotype threat theory? To do so, the social psychologist will have to generate hypotheses from the theory, and then test those hypotheses with research. Consider these two hypotheses that have been generated from the theory of stereotype threat:

1. The more a person is conscious of the negative stereotype of his or her group, the worse that person will perform in areas related to the stereotype.
2. Situations that make a negative stereotype of a person’s group prominent in the person’s mind will lead to worse performance than situations that do not.

Hypothesis 1 proposes an association between two variables that can be assessed with correlational research. Hypothesis 2 posits that one variable has a causal influence on the other and can be assessed only through experimental research. We will discuss each of these two primary approaches to research in social psychology and how they were used to test these hypotheses derived from stereotype threat theory.

Research: The Correlational Method

One of the most widely used approaches to doing research is the correlational method, whereby two or more preexisting characteristics (the variables) of a group of individuals are measured and compared to determine whether and/or to what extent they are associated. If the variables are associated, then knowing a person’s standing on one variable predicts, beyond chance levels, his or her standing on the other variable; if this is the case, we can say that the variables are correlated. To test stereotype threat hypothesis 1, we might: (1) measure the extent to which particular members of a given group are conscious of their stereotyped status; and (2) assess each person’s performance on stereotype-related dimensions.

Correlational method

Research in which two or more variables are measured and compared to determine to what extent, if any, they are associated.

Liz Pinel and colleagues (Pinel et al., 2005) tested this very hypothesis. They first measured stigma consciousness, the tendency to be highly conscious of one’s stereotyped status and to believe that these stereotypes have a big effect on how one is viewed by others, in academically stigmatized students (specifically, African Americans and Hispanic Americans) and nonacademically stigmatized students (specifically, European Americans and Asian Americans). Then, they obtained information about their participants’ GPAs. To assess whether stigma consciousness is correlated with GPA, the researchers computed a correlation coefficient. Pinel and colleagues found a moderate negative correlation between stigma consciousness and GPA. Let’s briefly consider what this statistic can tell us about how two variables are related.

Correlation coefficient

A positive or negative numerical value that shows the direction and the strength of a relationship between two variables.

The Correlation Coefficient

FIGURE 1.4

Correlation Coefficient
The correlation coefficient (signified by the letter r) is a measure of the relationship between two variables. These graphs represent the three kinds of correlations between Variables X and Y: positive, negative, and no correlation. The sign of r (+ or -) tells us whether the relationship is positive or negative. The absolute value of r tells us the strength of the relationship. The stronger the correlation, the more confidently we can predict the value of one variable from the value of the other.

The correlation coefficient (typically indicated by r) gives us two vital pieces of information about a relationship: both the direction and the strength of the relationship (FIGURE 1.4).

The sign, positive (+) or negative (−), tells us the direction of the relationship. A positive correlation occurs when a high level of one variable tends to be accompanied by a corresponding high level of another variable. A negative correlation exists when a high level of one variable is accompanied by a low level of the other variable. If Pinel and colleagues had found that the higher a person scores on stigma consciousness, the better her GPA, they would have found a positive correlation. The negative correlation that they actually found tells us that the higher a person’s level of stigma consciousness the lower that person’s GPA. This negative correlation provides some evidence for stereotype threat hypothesis 1.

The numerical value tells us the strength of the relationship. The strength of a correlation refers to how closely associated the two variables are, that is, how much knowing a person’s standing on one variable tells us about, or enables us to predict, the person’s standing on the other variable. If knowing a person’s level of stigma consciousness enables us to predict his test performance with absolute certainty, the two variables are perfectly correlated, and the correlation coefficient equals −1.0 (or +1.0 if it were a positive relationship). Perfect correlations are virtually nonexistent in the behavioral sciences. When they do occur, it typically means that the two variables are different measures of the same underlying conceptual variable. For example, temperature as measured on Fahrenheit and Celsius thermometers will be perfectly correlated (as long as the thermometers are operating correctly). On the other hand, we would find a correlation of 0 if the two variables are completely unrelated. This means that knowing something about a person’s standing on one variable tells you nothing whatsoever about where she stands on the other. For example, according to stereotype threat theory, a person’s level of stigma consciousness should relate to his GPA only if he is a member of an academically stigmatized group. Sure enough, Pinel and colleagues observed no correlation between stigma consciousness and GPA for nonacademically stigmatized groups.

It’s important to be clear that although the sign of a correlation coefficient tells you whether two variables are positively or negatively correlated, it tells you nothing at all about the strength of that relationship. Thus, a correlation of −0.60 reflects a stronger relationship than a correlation of +0.35.
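As a rough illustration of how direction and strength are read off r, here is a minimal Python sketch. The function names `pearson_r` and `stronger` and the temperature readings are made up for this example; they do not come from Pinel and colleagues’ study. The sketch uses the standard Pearson formula and assumes each list contains at least two values that are not all identical.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the two means
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

def stronger(r1, r2):
    """Return the coefficient with the larger absolute value (the stronger relationship)."""
    return r1 if abs(r1) >= abs(r2) else r2

# Hypothetical readings: two measures of the same underlying variable
celsius = [-10.0, 0.0, 12.5, 25.0, 37.0]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]
r_temperature = pearson_r(celsius, fahrenheit)  # +1.0, up to floating-point rounding

# A perfectly reversed ordering gives r = -1.0, up to floating-point rounding
r_reversed = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])

# Strength ignores the sign: -0.60 is stronger than +0.35
print(stronger(-0.60, 0.35))
```

The comparison in `stronger` deliberately uses absolute values, which mirrors the point that the sign carries direction only.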

Pinel and colleagues’ finding of a moderate negative correlation between stigma consciousness and GPA tells us that knowing how sensitive a person is to stereotypes about his or her group gives us some basis for predicting how well he or she is likely to score on measures of academic performance, although we couldn’t predict the person’s performance with absolute certainty or precision. Clearly, many variables other than stigma consciousness influence college GPA. And imperfections in our two measures would also reduce the size of any correlation we observe. Nonetheless, the negative correlation between stigma consciousness and GPA tells us that these two variables are indeed related, which is consistent with the hypothesis deduced from stereotype threat theory.

Correlation Does Not Imply Causation

Scientists usually are interested in understanding why variables are correlated. But finding a negative correlation between stigma consciousness and test performance does not allow us to conclude that fear of confirming stereotypes about one’s group causes poorer performance. Correlation does not imply causality. There must be a correlation between the two variables if one variable causes the other, but there are two major reasons that correlation does not enable us to infer causation.

First, although it is certainly possible that stereotype threat causes poorer test performance, it is also possible that the causal relationship runs in the other direction: Doing poorly on tests makes a person especially sensitive to the stereotypes about his or her group, and perhaps fearful that he or she might be contributing to these stereotypes. This is known as the reverse causality problem: Correlations tell us nothing about which of two interrelated variables is the cause and which is the effect.

Reverse causality problem

A correlation between variables x and y may occur because one causes the other, but it is often impossible to determine if x causes y or y causes x.

The second major reason that we cannot draw causal inferences from correlations is referred to as the third variable problem: The two variables are correlated, but it is still possible that neither exerts a causal influence on the other. It may be that some third variable—for example, a general tendency to be self-conscious and anxiety prone—is responsible for the correlation found between stigma consciousness and performance. Being self-conscious and nervous might make a person concerned about how others view his or her group and at the same time may interfere with test performance. Such correlations between anxiety proneness and stigma consciousness, and between anxiety proneness and test performance, would create a correlation between stigma consciousness and test performance even if there were no causal relationship between the latter two variables. Taken together, the reverse causality and third variable problems make it impossible to be conclusive about causality from correlational findings.

Third variable problem

The possibility that two variables may be correlated but do not exert a causal influence on one another; rather, both are caused by some additional variable.
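The third variable problem can be made concrete with a small simulation. The sketch below is illustrative only: the data are randomly generated, and the variable names, the sample size, and the strength of each influence are assumptions chosen for the example. In the code, anxiety proneness feeds into both stigma consciousness and test performance, but neither of those two variables is ever used to compute the other.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000                # number of simulated people (an arbitrary choice)

# The third variable: simulated anxiety proneness, in standard units
anxiety = [rng.gauss(0, 1) for _ in range(n)]

# Each outcome depends on anxiety plus its own independent noise.
# Stigma consciousness never enters the formula for performance, or vice versa.
stigma_consciousness = [a + rng.gauss(0, 1) for a in anxiety]
test_performance = [-a + rng.gauss(0, 1) for a in anxiety]

r_observed = pearson_r(stigma_consciousness, test_performance)
print(round(r_observed, 2))  # clearly negative despite no direct causal link
```

Under these assumed settings the expected correlation is about −0.5, which arises entirely from the shared influence of the simulated third variable.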

Longitudinal Studies

In longitudinal studies two variables are measured at multiple points in time. By examining correlations between one variable at time 1 and another variable at time 2, such studies can make us more confident about likely causal order. For example, one classic study of aggression (see Huesmann et al., 1984) found that amount of violent television watched in childhood correlated positively with amount of aggressive behavior in adulthood. In contrast, aggressiveness in childhood did not correlate with amount of violent television watching in adulthood. The result of this longitudinal study suggests that childhood television watching affected later aggression, rather than childhood aggressiveness affecting later television viewing. However, such studies are not definitive about causation because the third variable problem remains. For example, it could be that neglectful parents both allow their children to watch a lot of violence, and for other reasons produce adult offspring with aggressive tendencies.

Longitudinal studies

Studies in which variables are measured in the same individuals over two or more periods of time, typically over months or years.

Research: The Experimental Method

Fortunately, there is an approach to research that lets us draw conclusions about cause and effect: the experimental method. As a consequence, this method is extremely popular among social psychologists. An experiment is a study in which the researcher takes active control and manipulates one variable, referred to as the independent variable, measures possible effects on another variable, referred to as the dependent variable, and tries to hold all other variables constant. The independent variable is manipulated because it is being investigated as the possible cause. The dependent variable is the one that is then measured to assess the effect. An experiment can tell us if the dependent variable depends on the independent variable. An experiment would be needed to test hypothesis 2: that conditions which increase the individual’s awareness of the negative stereotype of that person’s group (and thereby increase stereotype threat) will reduce the person’s test performance. Such an experiment must involve:

Manipulating our research participants’ awareness of the negative stereotype of their group, creating two or more conditions differing in the level of the independent variable: stereotype threat
Assessing participants’ performance on a test that is relevant to that negative stereotype, providing a measurement of the dependent variable
Holding everything else constant within the setting

Experimental method

A study in which a researcher manipulates a variable, referred to as the independent variable, measures possible effects on another variable, referred to as the dependent variable, and tries to hold all other variables constant.

When all the requirements of the experimental method are met, the study has internal validity, which means that it is possible to conclude that the manipulated independent variable caused the change in the measured dependent variable. Let’s translate that into a real example.

Internal validity

The judgment that for a particular experiment it is possible to conclude that the manipulated independent variable caused the change in the measured dependent variable.

FIGURE 1.5

Stereotype Threat
Black students performed more poorly on a test when reminded of their race. White students were unaffected by such a reminder.
[Data source: Steele & Aronson (1995) © 1995 American Psychological Association. Reprinted by permission]

Steele and Aronson (1995) conducted a series of experiments that provided the first evidence that stereotype threat caused reduced performance among members of stigmatized groups. In one study, African American and White college students were given a challenging test of verbal ability that consisted of sample items from the verbal portion of the Graduate Record Exam. Performance on the test was the dependent measure. To manipulate stereotype threat, the researchers simply asked half of the participants to indicate their race on the answer form prior to beginning the test; this simple act of indicating race was meant to bring to mind the stereotypes about how each participant’s group was supposed to perform on such tests. The other half of the participants, the control group, took the test with no mention being made of race, so they were much less likely to be thinking about stereotype-related issues while taking the test. Whether or not race was mentioned was the independent variable. The racial identity of the participants was the second variable that the experimenters expected to play a causal role. We should note that demographic variables such as race, age, or gender are commonly treated as independent variables, even though the experimenter cannot manipulate them. The caveat to interpreting these variables is to keep in mind that there are lots of different ways that Blacks and Whites might differ from each other (e.g., cultural beliefs, socioeconomic status) that could underlie any racial differences observed.

As stereotype threat hypothesis 2 predicts, when participants were reminded of their race, there was a significant drop in the performance of African American students but not in the performance of White students (see FIGURE 1.5). This pattern of results is referred to as an interaction, which occurs when the effect of one independent variable on the dependent variable depends on the level of a second variable. In this study, the effect of the reminder of race depended on whether the participant’s racial identity was African American or White. Because African American students are stereotyped in the United States as being less intelligent, for them, the reminder of racial identity led to lower performance; for White students, however, it had no effect. Thus, even though we cannot randomly assign a person to his or her race, the fact that a reminder of race influenced Blacks and Whites differently suggests that racial identity is what mattered here.

Interaction

A pattern of results in which the effect of one independent variable on the dependent variable depends on the level of a second independent variable.
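One way to see what an interaction means numerically is to compare the effect of one variable at each level of the other. The following Python sketch uses hypothetical cell means invented for illustration; they are not the values reported by Steele and Aronson (1995), and the group and condition labels are placeholders.

```python
# Hypothetical mean test scores for a 2 x 2 design (NOT the published data)
cell_means = {
    ("stereotyped group", "race reminder"): 6.0,
    ("stereotyped group", "no reminder"): 11.0,
    ("non-stereotyped group", "race reminder"): 12.0,
    ("non-stereotyped group", "no reminder"): 12.0,
}

def reminder_effect(group):
    """Effect of the reminder for one group: reminder mean minus no-reminder mean."""
    return cell_means[(group, "race reminder")] - cell_means[(group, "no reminder")]

effect_stereotyped = reminder_effect("stereotyped group")          # -5.0
effect_non_stereotyped = reminder_effect("non-stereotyped group")  # 0.0

# If the two effects differ, the effect of the reminder depends on group
# membership, which is the pattern described as an interaction.
interaction_contrast = effect_stereotyped - effect_non_stereotyped  # -5.0
print(effect_stereotyped, effect_non_stereotyped, interaction_contrast)
```

A contrast of zero would mean the reminder had the same effect in both groups. In actual research, whether a nonzero contrast is reliable is judged with a statistical test rather than by inspecting the means alone.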

How Experiments Make Causal Inference Possible

The experimental method overcomes the limitations of the correlational method so that causal inferences are possible. As we previously noted, the first major obstacle to drawing causal inferences from correlational studies is the reverse causality problem, because you typically can’t tell which variable is the cause and which is the effect. In an experiment, because the researcher determines whether a participant is exposed to the experimental condition (race reminder) or the control condition (no race reminder) and subsequently measures performance, it is impossible for the participant’s poor test performance to have caused him or her to be reminded of his or her race. Causes must come before effects. Consequently, the reverse causality problem is eliminated.

What about the third variable problem? Recall that in an experiment, the only thing that differs between conditions is the independent variable. Everything else is held constant. The researcher treats participants in the various conditions in identical ways: the same instructions are given; the physical setting is the same; and any written, audio, and video materials are identical, except for what is to be manipulated between conditions (the independent variable). All this is done so that if there is a difference between conditions, we can be confident that the cause is the independent variable. By holding everything constant across the various conditions in the experiment except the independent variable, the experimenter solves the third variable problem.

Controlling the Impact of Individual Differences by Random Assignment

But how do we know that the participants in the experimental group and the control group didn’t simply differ on the dependent measure to begin with? And how do we know that differences between the two samples on some other dimension that existed prior to manipulation of the independent variable were not responsible for the differences in test performance that occurred? The potential problem of preexisting differences among participants in the various experimental conditions is solved by random assignment, in which participants are assigned to conditions in such a way that each person has an equal chance of being in either condition (FIGURE 1.6). Deciding which treatment to give each participant can be done by tossing a coin, pulling names from a hat, or using a random number generator to put individuals into treatment conditions.

Random assignment

A procedure in which participants are assigned to conditions in such a way that each person has an equal chance of being in any condition of an experiment.

FIGURE 1.6

Random Assignment
Even though individuals differ from each other, when they are randomly assigned to groups, the groups’ averages will be largely the same.

Random assignment is an essential component of all experiments in which participants are put in different conditions. It ensures that, if a sufficiently large sample is used, no systematic average differences will exist among the participants in the various experimental conditions. This is because random assignment distributes people, and all the ways they may vary, evenly on average across all the conditions of the experiment. For example, if a sample of 100 people were randomly divided into two groups of 50, the mean height, weight, level of self-esteem, and verbal GRE performance of the two groups would be very similar. Random assignment thereby controls for individual differences that might otherwise vary between the experimental and control groups, and is thus essential for eliminating the third variable problem. Because the experimental method eliminates both the reverse causality and third variable problems, it provides internal validity and causal inferences can be made.
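The example of dividing 100 people into two groups of 50 can be sketched in a few lines of Python. The heights below are simulated, and the function name `randomly_assign`, the seed, and the number of repetitions are assumptions made for this illustration. Shuffling a list and splitting it in half is one simple way to give every person an equal chance of landing in either condition.

```python
import random
import statistics

def randomly_assign(participants, rng):
    """Shuffle a copy of the participants and split it into two equal-sized conditions."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simulated heights in centimeters for 100 hypothetical participants
heights_cm = [rng.gauss(170, 10) for _ in range(100)]

# One random assignment: the two group means typically differ by a small amount
experimental, control = randomly_assign(heights_cm, rng)
single_gap = statistics.mean(experimental) - statistics.mean(control)

# Repeating the assignment many times: the gaps average out near zero,
# so no group is systematically favored
gaps = []
for _ in range(2000):
    group_a, group_b = randomly_assign(heights_cm, rng)
    gaps.append(statistics.mean(group_a) - statistics.mean(group_b))
average_gap = statistics.mean(gaps)

print(round(single_gap, 2), round(average_gap, 2))
```

Any single assignment can leave a small chance difference between the groups; what random assignment rules out is a systematic difference, and larger samples shrink the chance differences further.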

Experimental and Correlational Research in Concert

Because the experimental method enables us to infer causes for behavior, it is generally the preferred way to conduct research in social psychology, but in some situations experimental methods cannot be applied. Many of the variables that social psychologists are interested in cannot be manipulated. There are many important questions about the effect of variables like gender, age, race, and sexual preference, but people can’t be randomly assigned to be male or female, old or young, Black or White, or straight or gay. Furthermore, many of the questions of interest to social psychologists deal with long-standing personality dispositions, attitudes, values, and other individual differences. Correlational methods that examine relationships between preexisting differences among people are the only way questions such as these can be addressed. Correlational methods also have the advantage of examining the relationship between variables as they naturally occur in the real world. Experimental methods, by definition, involve observing the effects of variables that are created by researchers; consequently there is always some question as to how well these experimentally created variables mirror the forces that operate on us in real life. For all these reasons, correlational methods have been, and will continue to be, important tools for social psychologists.

In fact, the correlational method and the experimental method provide complementary information about how or why people behave the way they do. Let’s go back to the example of research testing hypotheses derived from stereotype threat theory. The experimental research by Steele and Aronson (1995) provides compelling evidence that stereotype threat is at least one of the factors that cause poorer performance by members of stigmatized groups; on the other hand, the correlational research by Pinel and colleagues (2005) suggests that some students will be more vulnerable to these effects. When applied together, these two research strategies enable social psychologists to document the role that both individual differences and situational forces play in leading people to behave the way they do. Such evidence fits the first core assumption of social psychology: that behavior is a function of a combination of the features of the person and the situation.

Field Research and Quasi-experimental Methods

Because social psychologists ultimately want to understand the forces that operate on us in real life, another important type of research is field research. This type of research occurs outside the laboratory, for example, in schools, office buildings, medical clinics, football games, or even in shopping malls or on street corners. Field research is not wedded to an experimental or correlational approach. It can be either. It also often utilizes quasi-experimental designs. In a quasi-experimental design, groups of participants are compared on some dependent variable, but for practical or ethical reasons, the groups are not formed on the basis of random assignment. Note that the stereotype threat study described earlier can be considered partly quasi-experimental because participants are not randomly assigned to race.

Field research

Research that occurs outside the laboratory, for example, in schools, office buildings, medical clinics, football games, or even in shopping malls or on street corners.

Quasi-experimental designs

Type of research in which groups of participants are compared on some dependent variable, but for practical or ethical reasons, the groups are not formed on the basis of random assignment.

We can also use research on stereotype threat to highlight an example of field research. One goal of a field study might be to see if we can use stereotype threat theory to design interventions that reduce racial differences in students’ actual academic achievement. This is exactly what researchers such as Greg Walton and Geoff Cohen have done (Walton & Cohen, 2007, 2011). They reasoned that for many if not most college students, the transition to college can be stressful. These students have to adjust to a more rigorous type of study than they had in high school. They might also be living away from family for the first time and trying to make new friends. When we add feelings of stereotype threat, perhaps from not seeing many faculty or other students who share their racial background, students from minority backgrounds might be at greater risk for feeling that they don’t belong, and this might impair their academic performance. In the context of the transition to university, Walton and Cohen wanted to see if shoring up feelings of belonging at college would reduce stereotype threat and improve academic performance for racial minorities.

To do this, they randomly assigned a sample of White and Black first-year college students to one of two conditions. In the intervention condition, students read testimonials the researchers had compiled from more senior students. They each sounded something like this:

Freshman year even though I met large numbers of people, I didn’t have a small group of close friends. … I was pretty homesick, and I had to remind myself that making close friends takes time. Since then … I have met people some of whom are now just as close as my friends in high school were. (Walton & Cohen, 2007, p. 88)

FIGURE 1.7

Belonging and School Performance
A racial gap in achievement observed between European American and African American students was reduced when first-year students received an intervention to bolster feelings of belonging.
[Data source: Walton & Cohen (2011) © 2011 by the American Association for the Advancement of Science. Reprinted by permission]

These testimonials from students of different racial, gender, and ethnic backgrounds send the message that stress is a pretty normal and understandable part of all students’ experience. Those students in the control condition read similar testimonials about how students’ political attitudes had changed. Then the researchers proceeded to follow both groups of students for the next three years (FIGURE 1.7).

Among students in the control group, Black students earned GPAs that were significantly lower than those of their White peers. But for those students who received the intervention and learned that stress is a part of everyone’s experience at university, this racial gap in achievement was cut in half over the next three years. Whereas learning about how stressed other students are did not matter too much for White students, it significantly boosted how Black students performed in their courses, and it did so by helping students see that their experience of stress and adversity at college in no way meant that they didn’t belong there.

One of the strengths of field research like this study is that it tries to capture social behavior as it occurs out in the world. This is important because, as you well know, the world is a complex place and researchers need to study that complexity. The chief weakness, though, is that researchers often lose a lot of the control they have in the laboratory in terms of what participants are exposed to, and thus don’t always have the clearest manipulation or measurement of the variables they want to study.

Quasi-experimental designs have an additional weakness. Because the researchers are not randomly assigning participants to the levels of the independent variable, there is a greater chance that participants may differ on some other potentially important characteristic. So although the researchers may be able to overcome the reverse causality problem of correlational designs, it is more difficult to overcome the third variable problem. None of these methods is perfect. Each has its own strengths and weaknesses, and each plays a useful role in helping social psychologists understand human behavior.
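
The difference between randomly assigned and naturally formed groups can be illustrated with a small simulation. What follows is a minimal sketch in Python, not an analysis from any study described in this chapter. The hidden trait `motivation`, the sample size, and the self-selection rule are all hypothetical assumptions chosen only to make the third variable problem visible.

```python
import random
import statistics


def simulate(n=10_000, seed=1):
    """Compare group balance on a hidden trait under two grouping rules.

    Everything here is hypothetical: 'motivation' stands in for any
    unmeasured third variable that could also affect the outcome.
    """
    rng = random.Random(seed)
    motivation = [rng.gauss(0, 1) for _ in range(n)]

    # Random assignment: a coin flip decides the group, so group
    # membership is unrelated to the hidden trait.
    random_groups = [rng.random() < 0.5 for _ in range(n)]

    # Quasi-experimental grouping (assumed rule): more motivated people
    # are more likely to end up in the "treatment" group on their own.
    self_selected = [m + rng.gauss(0, 1) > 0 for m in motivation]

    def gap(groups):
        # Mean difference in the hidden trait between the two groups.
        treated = [m for m, g in zip(motivation, groups) if g]
        control = [m for m, g in zip(motivation, groups) if not g]
        return statistics.mean(treated) - statistics.mean(control)

    return gap(random_groups), gap(self_selected)


random_gap, selected_gap = simulate()
print(f"hidden-trait gap with random assignment: {random_gap:.2f}")
print(f"hidden-trait gap with self-selected groups: {selected_gap:.2f}")
```

Under these assumptions, the randomly assigned groups should differ very little on the hidden trait, whereas the self-selected groups should differ substantially. Any later difference on the dependent variable in the self-selected case could therefore reflect the hidden trait rather than the treatment.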

What Makes for a Good Theory in Social Psychology?

The ultimate function of a good theory is to be useful by moving this ongoing cyclical process of science forward. It should advance our understanding of how and why people behave the way they do, facilitating efforts to make the world a better place. Our experiences in applying our newfound knowledge to issues of real human importance ultimately come back to tell us how well our theoretical understanding fits the world in which we live. A useful theory has the following characteristics.

Organizes Observations

First, a theory should organize the observations, or facts, that come out of the research process. Theories create order out of chaos and simplify the bewildering array of facts that we observe in the world around us. Theories provide a more abstract and general way of describing the nature of reality than the complex and sometimes messy observations that theories seek to explain. For example, Steele’s stereotype threat theory summarizes and simplifies results from other studies that have shown that members of stigmatized groups perform worse when very few other members of their group are present, when the person administering a test is from a different ethnic group, and when the test is presented as one on which their group tends to perform poorly. This rather disparate set of facts coheres within the broader theory that performance is impaired when conditions make it likely that people will think of a relevant negative stereotype about their group. Generally speaking, the broader the range of observations that a theory can make sense of, the better. Theories that are able to account for a wide variety of observations are said to have conceptual power.

Explains Observations

Second, theories do much more than simplify and organize knowledge. A good theory should also give us insight into how or why things happen. To do this effectively, a theory must be conceptually coherent and logically consistent. It should specify clear relationships between variables that help us understand the processes through which particular events in the world occur. To be truly useful, a theory should provide us with understanding that goes beyond what we already know. It should shed new light on what we observe happening within and around us, giving us a sort of “aha, now I get it” experience. Stereotype threat theory provides an entirely new way of thinking about group differences in academic achievement, and it does this in a coherent and logically consistent way. It is also a relatively simple idea that fits well with our understanding of basic psychological processes. In this sense, stereotype threat theory is highly parsimonious: it explains a wide range of observations with a relatively small number of basic principles. Einstein’s theory of relativity and Darwin’s theory of evolution are two of the most parsimonious theories in the history of science in that both explain extremely diverse sets of observations with just a few relatively simple principles.

Provides Direction for Research

Third, a good theory should inspire research. It should enable us to deduce clear and novel hypotheses that follow logically from its propositions, hypotheses that in turn lead to research that tells us how well the theory fits with reality. Stereotype threat theory has inspired a great deal of research that has both supported its core propositions and led to refinements in our understanding of how stereotype threat undermines performance. Many potentially interesting ideas about why people behave the way they do have been discussed over the millennia; some of these ideas might be quite accurate. But unless a theory produces hypotheses that can be used to assess its fit with reality, it is not scientifically useful. That’s not to say that a useful theory must be easy to test, or that it must be testable immediately on its development. Indeed, some of the most influential and important theories in the history of science could not be tested directly for many years after they were proposed. For example, the theory that physical matter is made up of tiny particles moving about in space could not be tested until suitable techniques were developed to enable physicists to assess the nature and movement of atomic particles. An intriguing new theory that seems at first to defy scientific testing often provides the impetus for the development of new technologies that can be used to test the theory’s core propositions.

Generates New Questions

Fourth, in addition to inspiring research, a good theory should shed light on phenomena beyond what the theory was originally designed to explain. In other words, a good theory should be generative, providing new theoretical insights in other domains. When we combine a good theory with other ideas, new ideas should spill out. Stereotype threat theory has been generative in the sense that it has led to new ideas about performance deficits in a wide range of areas and among a wide variety of different groups of people. It has also led to finer-grained ideas about the processes through which fear of confirming negative stereotypes of one’s group undermines successful performance (more on this to follow).

Has Practical Value

Fifth, a good theory should have practical applications that help us solve pressing problems and improve the quality of life. In recent years, stereotype threat theory has begun to inform interventions applied in schools and on college campuses (Walton & Spencer, 2009). For example, the theory implies that remedial programs to help negatively stereotyped minority-group students may backfire because they continually remind the students of the negative stereotype of their group. Typically, practical applications of social psychological theories take time to emerge. One of the earliest theories about how to reduce prejudice, developed by Gordon Allport in his classic book The Nature of Prejudice (1954), led to what is known as the contact hypothesis. The idea is that specific forms of contact between groups can break down stereotypes and negative feelings and thus reduce prejudice and intergroup conflict. Muzafer Sherif and colleagues (Sherif et al., 1961) reported a famous study, conducted at a Boy Scout camp in Oklahoma, that supported this hypothesis and led to myriad practical applications. For example, Elliot Aronson (1978) used ideas that came from Allport and were supported by the Sherif study to reduce interracial conflict in Austin, Texas, public schools that had recently been desegregated. His jigsaw classroom technique promotes the kind of contact that Sherif and colleagues had found effective in their summer-camp study. We’ll cover all these examples in greater depth later in the text, and we’ll highlight examples of practical applications of theories throughout.

Assessing Abstract Theories with Concrete Research

These people are reacting to the terrorist bombings at the Boston Marathon, April 15, 2013. Social psychologists assess anxiety by self-report, facial expressions, overt behavior, and physiological measures.
[Bill Green/The Boston Globe via Getty Images]

Theories deal with the world of abstract conceptual variables, such as attitudes, self-esteem, anxiety, attraction, and conflict. They specify relationships among these variables in attempts to explain important aspects of human behavior. For example, one explanation for why stereotype threat undermines performance is that it creates anxiety that people try to regulate and control, saddling minority students with an extra cognitive task that nonstigmatized students don’t have to worry about (Schmader et al., 2008). Anxiety is a conceptual variable that most psychologists define as a vague, undifferentiated feeling of unease, tension, or fear. Anxiety can involve various psychological and bodily reactions: a feeling of dread, a vague sense of impending doom, sweaty palms, racing heart, butterflies in the stomach, fidgeting, nail-biting, or a desire to change the topic or flee the situation. Different people experience anxiety in somewhat different ways and exhibit a rather wide range of symptoms or signs that they are experiencing it. The abstract concept of anxiety refers to the essential underlying phenomenon that is indicated by these various signs and symptoms. So how would a scientist conduct research on (that is, make observations of) something so abstract and diffuse as the concept of anxiety?

To conduct research on any conceptual variable, we first must develop an operational definition of that concept. Defining a concept operationally involves moving from the abstract world of concepts to the more concrete world of specific instances. An operational definition entails finding a specific, concrete way to measure or manipulate a conceptual variable. Ideally, an operational definition will capture a typical instance of the conceptual variable that illustrates its core meaning or essence. In reality, any conceptual variable can be operationalized in a variety of ways, so that no single operational definition is likely to provide the perfect or only instance of the concept.

Operational definition

A specific, concrete method of measuring or manipulating a conceptual variable.

Measuring and Manipulating What We Intend

Let’s first examine this issue with regard to a dependent variable. Operationalizing a dependent variable refers to specifying precisely how it will be measured in a particular study. For example, a researcher might operationalize the conceptual variable anxiety in the following ways:

Scores on a self-report survey of the subjective feeling of anxiety (e.g., tension, apprehension, uneasiness, butterflies in the stomach)
Overt behaviors that are thought (on the basis of a theoretical conception) to be indicators of anxiety (e.g., chewing on the fingernails, rapidly tapping one’s foot, twitching eyelids)
Physiological measures that assess bodily symptoms or signs that are thought (again, on the basis of a theoretical conception) to be indicators of anxiety (e.g., rapid heart rate, sweaty palms, exaggerated startle response)

These various operationalizations tap into different aspects of the concept of anxiety. It’s important that multiple operationalizations of a given conceptual variable are highly correlated with each other, so that we can be confident that the various operationalizations are all tapping into the same underlying conceptual variable. Construct validity is the degree to which the dependent variable measures what it intends to measure or the independent variable manipulates what it intends to manipulate. Often researchers assess the construct validity of an independent variable by including a manipulation check, which is a measure that directly assesses whether the manipulation created the change that was intended. For dependent variables, if different operationalizations of a given conceptual variable are not strongly related to each other, we may actually be tapping into two different conceptual variables. Poor construct validity is one of the primary potential problems in the research process. If it is not clear that an operationalization of a dependent variable measures what it was intended to measure, then we can’t draw any clear conclusion from an experiment using that operationalization. An experiment that lacks construct validity for either the independent or the dependent variable does not have internal validity. No clear conclusions can be drawn from the results of such an experiment.
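
One simple way to check whether several operationalizations hang together is to compute the correlation between each pair of measures. The sketch below, in Python, computes Pearson correlations by hand for three measures of anxiety. The scores are invented for illustration and do not come from any study; the variable names `self_report`, `heart_rate`, and `foot_taps` are hypothetical stand-ins for the self-report, physiological, and behavioral measures described above.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for six participants (illustration only).
measures = {
    "self_report": [2, 4, 5, 7, 8, 9],       # survey rating of felt anxiety
    "heart_rate": [62, 70, 71, 80, 85, 88],  # beats per minute
    "foot_taps": [3, 5, 4, 8, 9, 10],        # taps counted per minute
}

# Correlation for every pair of operationalizations.
correlations = {
    (a, b): pearson_r(measures[a], measures[b])
    for a, b in combinations(measures, 2)
}

for (a, b), r in correlations.items():
    print(f"{a} vs. {b}: r = {r:.2f}")
```

If all pairs are strongly and positively correlated, as they are in this made-up data set, that is consistent with the measures tapping the same underlying conceptual variable. A weak correlation for one pair would suggest that one of the measures may be capturing something else.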

Construct validity

The degree to which the dependent measure assesses what it intends to assess or the manipulation manipulates what it intends to manipulate.

Problems with the construct validity of independent variables are particularly common in social psychological research. Operationalizations of the manipulation of any one specific conceptual independent variable might also inadvertently alter several other conceptual variables. For example, if we manipulate stereotype threat by informing our research participants that it is widely believed that their group performs poorly on a particular task, this may well be increasing their concern that their poor performance might confirm a negative stereotype, just as our conceptual definition of stereotype threat would suggest. But it may also be doing other things. Maybe it’s just creating a general increase in fear of failure that has little to do with concerns about stereotypes. It might even be creating anger at the thought that some people view one’s group as inferior.

How can we know if the effect of our independent variable is due to concerns about stereotypes, performance anxiety, anger, or any number of other possible consequences of our manipulation? This is a crucial question for determining whether a study has internal validity. When more than one conceptual variable differs across conditions in an experiment, the independent variable is confounded. Confounds cloud the interpretation of research results because a variable other than the conceptual variable we intended to manipulate may be responsible for the effect on the dependent variable, making alternative explanations possible. Alternative explanations make it unclear which conceptual variable really is responsible for the changes in the dependent variable that occur. Confounds and alternative explanations are thus a major problem in social psychological research, and in all of science. Much of the controversy and disagreement among scientists results from the confounding of variables.

Confound

A variable other than the conceptual variable intended to be manipulated that may be responsible for the effect on the dependent variable, making alternative explanations possible.

Researchers do their best to avoid confounds in their studies. Ideally, the researcher carefully considers potential confounds and alternative explanations when planning the study and includes control groups that expose participants to these possible confounding alternative causal variables without exposing them to the variable that is being investigated as a possible cause. To control for possible confounds in experiments on the effect of stereotype threat on test performance, we might include control conditions in which participants are threatened, distracted, or angered in ways unrelated to stereotypes of the groups to which they belong. If the experimental stereotype threat induction group shows worse performance than every one of these other groups, we can more confidently rule out performance anxiety, distraction, and anger as alternative explanations for our findings, which would increase our confidence that stereotype threat is, in fact, causing the poorer performance.
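
The logic of comparing the experimental condition with several control conditions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis used in any study cited here: the scores are invented, the condition names are hypothetical, and the function compares only group means, whereas real research would rely on inferential statistics.

```python
from statistics import mean


def worse_than_every_control(experimental, controls):
    """Return True only if the experimental mean is below each control mean."""
    experimental_mean = mean(experimental)
    return all(experimental_mean < mean(scores) for scores in controls.values())


# Hypothetical test scores (items correct); not data from any study.
stereotype_threat = [8, 9, 7, 10, 8]
controls = {
    "unrelated threat": [11, 12, 10, 13, 11],
    "distraction": [12, 11, 12, 10, 13],
    "anger": [11, 13, 12, 12, 10],
}

ruled_out = worse_than_every_control(stereotype_threat, controls)
print("Experimental group scored below every control group:", ruled_out)
```

The key design choice is the use of `all`: a single control condition that performs as poorly as the experimental group is enough to keep that alternative explanation alive.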

The problem of confounding can also be minimized by replicating our studies with different operationalizations of the crucial variables, a process known as conceptual replication. If different studies, each flawed in one way or another, with possible confounds operating, yield consistent results, the probability that an alternative explanation is responsible for the results is reduced. Science is thus a cumulative process, and scientific knowledge depends heavily on ongoing conceptual replications of findings to rule out any confounds that might be affecting our results.

Conceptual replication

The repetition of a study with different operationalizations of the crucial variables but yielding similar results.

Can the Findings Be Generalized?

As you can see, establishing the construct validity of an experiment’s independent and dependent variables is essential to the internal validity of the experiment. If a study has high internal validity, we may know, for instance, that stereotype threat undermined performance by a group of African American students at a university in California in the early 1990s. This is important because it supports a hypothesis derived from stereotype threat theory and thereby increases confidence in the theory. And even if this finding comes from a unique sample, it demonstrates that the effect can occur. Once internal validity has been established, we can then ask, What does this tell us about other people, in other settings, at other times? This is the basic question regarding external validity, the ability to generalize one’s findings. Can we generalize beyond the group of people studied at a particular time and place?

External validity

The judgment that a research finding can be generalized to other people, in other settings, at other times.

Social psychology studies rely heavily on readily available college-student samples. But how can we know if findings from such studies can be generalized?
[Diego Cervo/Shutterstock]

In the case of stereotype threat, one external validity question would be whether these effects are limited to African Americans or extend to other stigmatized groups, and even farther, to majority-group members in domains in which they are negatively stereotyped. For example, would the performance of American women be worsened by reminding them of the stereotype that women supposedly have poor mathematical ability? Would the athletic performance of American White males be diminished by reminding them of the stereotype that “White men can’t jump”? Research suggests that the answer to both questions is yes. For instance, one study (Spencer et al., 1999) showed that leading women participants to believe that women typically perform poorly on the math test they were about to take led to poorer math performance among the women.

FIGURE 1.8

Stereotype Threat in Blacks and Whites
Any group that is negatively stereotyped—and that is any group—can be affected by stereotype threat. In this study, White participants needed more strokes to sink a golf putt when they thought their natural ability was being assessed, but Black participants needed more strokes when they thought their sports intelligence was being assessed.
[Data source: Stone et al. (1999)]

Another study (Stone et al., 1999) had White and Black participants engage in a task akin to miniature golf. Half the participants were told that the task measured sports intelligence, and the other half were told that it measured athletic ability. The researchers reasoned that Whites would feel stereotype threat when they were led to believe that the task measured athletic ability, but Blacks would experience stereotype threat when the task was framed as a measure of sports intelligence. These hypotheses were supported: Whites performed poorly when the task was described as a measure of athletic ability, and Blacks performed poorly when it was described as a measure of sports intelligence (FIGURE 1.8). Over the years, the results of many studies have shown that the problem of stereotype threat is indeed a general one that, depending on the performance domain, can affect members of any group that is negatively stereotyped—that is, virtually everyone!

These examples illustrate that if we are really to have confidence in the external validity of the findings of psychological research, the research needs to be replicated with other types of operationalizations and other participants from varying cultures, geographical regions, and socioeconomic levels. Social psychological research has been criticized for its heavy use of college students as research participants and for participants who might be described as WEIRD (that is, from countries that are Western, educated, industrialized, rich, and democratic [Henrich et al., 2010]). This is not surprising, because most of the research has been conducted by scientists who are themselves WEIRD. However, some have wondered whether we are simply piling up knowledge about the middle class in WEIRD nations but are learning little about other North Americans, Europeans, and Australians, let alone people from other continents. This narrow choice of participants is a problem, because if culture does exert a powerful role in shaping our view of ourselves and the world around us, then building a science of human behavior largely drawn from only a limited slice of human diversity is likely to skew the conclusions we draw. The ideal solution to this problem would be to sample people randomly from the entire population of the earth. Of course, such random sampling is never possible. Although the rare cross-national survey study might be able to recruit samples that are broadly representative of people from diverse racial, ethnic, national, geographic, and economic constituencies, they are still not representative of people they cannot reach or those who are unwilling to fill out the survey. Most studies that take place in laboratories are forced to rely on samples of convenience, typically college students much like yourself. How, then, can we hope that the findings from such research will inform us about why people in general do the things they do?

One important point to remember is that scientific progress is made in the aggregate. Every study that scientists carry out contains some limitation or weakness; only by conducting multiple studies, using a diverse set of procedures and with a diverse array of samples, can we learn the more general patterns of the human condition. According to this logic, a good, internally valid experiment teaches us what is possible and lends support to a broader theory, even when it doesn’t capture the effect as it actually occurs among people in general. For example, Steele and Aronson’s (1995) demonstration that merely marking one’s race on a cover sheet to a test can lead Black but not White students to underperform doesn’t apply only to the rare occurrences when students fill out demographic information in a testing context. It tells us something conceptual about how reminding people of their group identity can lead to subtle but profound shifts in behavior.

A second answer to the problem of nonrepresentative samples is the increasingly global nature of psychology. Social psychologists can currently be found on every populated continent. Although research from North America and western Europe still dominates the field, the broadening reach of social psychology as a science will continue to fuel efforts to replicate key findings in other cultural and geographic settings. Although these true tests of generalizability will sometimes confirm the universal nature of phenomena, they might also reveal important cultural differences in how we think and feel about ourselves and others. Throughout this text, we’ll highlight some of the research that has already revealed such interesting cultural variations.

The Limitations of Science

The scientific method has helped improve our lives in many ways. By providing a way of assessing the merits of competing claims about the nature of reality, science has greatly enhanced our understanding of the world we live in, ourselves, and how we fit into that world. By applying the knowledge gained from scientific inquiry, humankind has solved many of the problems that have plagued us for millennia, greatly reducing our vulnerability to disease, providing improved means of meeting our basic needs, and giving us control over aspects of life that our ancestors never dreamed possible. But the knowledge science has given us has also created problems our ancestors could have never imagined, such as the potential to kill each other by the millions and to use up or poison the natural resources we rely on for survival. These are very real problems that must be faced. Social psychology can help us grapple with them by providing the knowledge needed to get people to look beyond their immediate personal benefits to see the long-term consequences of their decisions for others and to put aside age-old ethnic and religious rivalries and realize that our mutual survival depends on our ability to coexist peacefully with each other. However, there are some things that the science of social psychology, no matter how far it progresses, cannot help us with. Despite its enormous utility, science has some important limitations.

First, there are aspects of reality that we humans cannot know. Our knowledge of the world originates in the information provided to us by our sense organs. Unfortunately, human sense organs are capable of registering only a tiny fraction of the things that are actually happening in the world. For example, our hearing is limited to a relatively narrow range of sound frequencies. Our dogs can hear many sounds we have no hope of perceiving; bats live in an even more highly differentiated world of sound that we can’t even imagine. Although we often use the knowledge that science gives us to develop technologies that enable us to assess things that our raw sense organs cannot perceive, the fact that we are capable of perceiving only part of what is happening in the world makes a complete understanding of all aspects of reality an elusive goal.

Second, although the scientific method may be objective, the human beings who apply it are not. The scientific method was developed to provide a more objective way of answering questions and evaluating the validity of competing claims about how the world works. But science remains a human endeavor. Scientists may try their best to put their biases aside and be objective, but human nature makes a complete elimination of individual bias impossible. This is part of the reason that controversies continue to rage in all active areas of scientific inquiry. Scientists, social psychologists included, often stake their reputations, careers, and ultimately their self-esteem on the ideas they espouse. It is a rare occurrence for a scientist to gleefully greet new findings that disconfirm important claims he or she has made; more often, egos get involved, and even highly trained scientists committed to the pursuit of truth muster their best arguments to try to convince the scientific community of flaws in the competing point of view and to show that their own ideas were right all along. Fortunately, the scientific method, and the communal nature of the scientific enterprise, typically weeds out these biases in the long run. But it is important to realize that scientists are human beings subject to the same needs, desires, and expectations that produce bias in all humans.

Third, not all questions can be answered scientifically. Many of the most pressing crises facing us today involve questions of values, morality, and ethics. Although social psychology can fruitfully employ the scientific method to understand how values develop, change, and influence human behavior, science cannot tell us which values are the right ones to invest in. Is safety more important than freedom? Are the rights of the individual more important than the welfare of the group? Should scientific knowledge be used to restrict behaviors that are injurious to the people who engage in them? These are important questions we all will be facing in the years to come, and although science can help us understand the consequences of different courses of action, it cannot tell us which consequences are more important than others and which values we should use to guide our decisions.

Fourth, human values exert a powerful influence on the way science is conducted. The questions we choose to ask—or perhaps more important, choose not to ask—are often determined by nonscientific political, religious, and/or economic factors. For example, studies of the genetic underpinnings of behavior were actively discouraged or prohibited outright in the Soviet Union during most of the 20th century because communist ideology claimed that all differences among individuals are the result of environmental influences of the state and society; why bother studying genes when we already know that they’re irrelevant? Similarly, questions pertaining to women’s contributions to science and politics are unlikely to arise in cultural milieus where females are regarded as uneducable subordinates. Scientists, like all human beings, live in a world of values, morals, and ethics. Sometimes these values limit the search for truth that is the ultimate goal of the scientific method. But human values also direct scientific inquiry toward questions that serve our highest aspirations and steer scientific research away from practices that would violate these values.

The Scientific Method: Systematizing the Acquisition of Knowledge

Science is a method for answering questions that reduces the impact of human biases. Theory and research have a cyclical relationship: Research provides systematic observations; theory provides the basis for predicting and explaining these observations; research then tests hypotheses derived from the theory to assess its validity, refine it, or generate alternate theories.
Correlational Method

Two or more variables are measured and compared to determine whether or not they are related. A relationship between variables does not mean that one caused the other.

Experimental Method

This process seeks to control variables so that cause and effect can be determined. The independent variable is manipulated, and its effect on the dependent variable is observed. Participants must be randomly assigned to conditions to reduce possible confounds.

Features of a Good Theory

Organizes the facts. Explains observations. Inspires new research. Generates new questions. Has practical applications.

Internal and External Validity

Abstract ideas need to be made specific and quantifiable to be manipulated and measured properly. Studies should be able to be replicated using different operationalizations of variables.

Limitations of Science

Human knowledge is limited. Humans are biased. Some questions are outside the scope of science. Human values influence the questions asked.
