7.2 Operant Conditioning: Reinforcements from the Environment

The study of classical conditioning is the study of behaviours that are already in an organism’s repertoire. The animal already knows how to salivate, or feel anxious, fearful, or nauseated. Classical conditioning allows these behaviours to be triggered by new stimuli in the environment, but does not change the form of the behaviour. Classical conditioning is all about the stimuli (like CSs and USs) that come before a response and trigger it; it has nothing to say about how stimuli that come after a response (such as rewards and punishments) might exert control over behaviour. So how does an animal learn new behaviours?

We turn now to a different form of learning: operant conditioning, a type of learning in which the consequences of an organism’s behaviour determine the likelihood of that behaviour being repeated in the future. The study of operant conditioning is the exploration of the powerful effects of reward and punishment on behaviour.

7.2.1 The Development of Operant Conditioning: The Law of Effect

Figure 7.5: Thorndike’s Puzzle Box In Thorndike’s original experiments, food was placed just outside the door of the puzzle box, where the cat could see it. If the cat triggered the appropriate lever, it would open the door and let the cat out.

The study of how active behaviour affects the environment began at about the same time as classical conditioning. In fact, Edward L. Thorndike (1874–1949) first examined active behaviours back in the 1890s, before Pavlov published his findings. Thorndike’s research focused on instrumental behaviours; that is, behaviours that required an organism to do something, solve a problem, or otherwise manipulate elements of its environment (Thorndike, 1898). For example, Thorndike completed several experiments using a puzzle box, which was a wooden crate with a door that would open when a concealed lever was moved in the right way (see FIGURE 7.5). A hungry cat placed in a puzzle box would try various behaviours to get out—scratching at the door, meowing loudly, sniffing the inside of the box, putting its paw through the openings—but only one behaviour opened the door and led to food: tripping the lever in just the right way. After this happened, Thorndike placed the cat back in the box for another round. Do not get the wrong idea. Thorndike probably really liked cats. Far from teasing them, he was after an important behavioural principle.

What is the relationship between behaviour and reward?

Fairly quickly, the cats became quite skilled at triggering the lever for their release. Notice what is going on. At first, the cat enacts any number of likely (but ultimately ineffective) behaviours, but only one behaviour leads to freedom and food. Over time, the ineffective behaviours become less and less frequent, and the one instrumental behaviour (going right for the latch) becomes more frequent (see FIGURE 7.6). From these observations, Thorndike developed the law of effect: Behaviours that are followed by a “satisfying state of affairs” tend to be repeated and those that produce an “unpleasant state of affairs” are less likely to be repeated.

Figure 7.6: The Law of Effect Thorndike’s cats displayed trial-and-error behaviour when trying to escape from the puzzle box. They made lots of irrelevant movements and actions until, over time, they discovered the solution. Once they figured out what behaviour was instrumental in opening the latch, they stopped all other ineffective behaviours and escaped from the box faster and faster.

The circumstances that Thorndike used to study learning were very different from those in studies of classical conditioning. Remember that in classical conditioning experiments, the experimenter provides the stimuli (CS and/or US) and then the animal responds. But in Thorndike’s work, the behaviour of the animal determined what happened next. If the behaviour was “correct” (i.e., the latch was triggered), the animal was rewarded with food. Incorrect behaviours produced no results and the animal was stuck in the box until it performed the correct behaviour. Although different from classical conditioning, Thorndike’s work resonated with most behaviourists at the time: It was still observable, quantifiable, and free from explanations involving the mind (Galef, 1998).

7.2.2 B. F. Skinner: The Role of Reinforcement and Punishment

B. F. Skinner with one of his many research participants.
©SAM FALK/SCIENCE SOURCE

Several decades after Thorndike’s work, B. F. Skinner (1904–1990) coined the term operant behaviour to refer to behaviour that an organism produces that has some impact on the environment. In Skinner’s system, all of these emitted behaviours “operated” on the environment in some manner, and the environment responded by providing events that either strengthened those behaviours (i.e., they reinforced them) or made them less likely to occur (i.e., they punished them). Skinner’s elegantly simple observation was that most organisms do not behave like a dog in a harness, passively waiting to receive food no matter what the circumstances. Rather, most organisms are like cats in a box, actively engaging the environment in which they find themselves to reap rewards (Skinner, 1938, 1953).

In order to study operant behaviour scientifically, Skinner developed a variation on Thorndike’s puzzle box. The operant conditioning chamber, or Skinner box, as it is commonly called (shown in FIGURE 7.7 on the next page), allows a researcher to study the behaviour of small organisms in a controlled environment.

Figure 7.7: Skinner Box In a typical Skinner box, or operant conditioning chamber, a rat, pigeon, or other suitably sized animal is placed in this environment and observed during learning trials that use operant conditioning principles.
SCIENCE SOURCE

Skinner’s approach to the study of learning focused on reinforcement and punishment. These terms, which have commonsense connotations, turned out to be rather difficult to define. For example, some people love roller coasters, whereas others find them horrifying; the chance to go on one will be reinforcing for one group but punishing for another. Dogs can be trained with praise and a good belly rub—procedures that are nearly useless for most cats. Skinner settled on a neutral definition that would characterize each term by its effect on behaviour. Therefore, a reinforcer is any stimulus or event that functions to increase the likelihood of the behaviour that led to it, whereas a punisher is any stimulus or event that functions to decrease the likelihood of the behaviour that led to it.

Whether a particular stimulus acts as a reinforcer or a punisher depends in part on whether it is presented or removed. Presenting food is usually reinforcing and produces an increase in the behaviour that led to it; removing food is often punishing and leads to a decrease in the behaviour. Turning on an electric shock is typically punishing (and decreases the behaviour that led to it); turning it off is rewarding (and increases the behaviour that led to it).

To keep these four possibilities distinct, Skinner used the term positive for situations in which a stimulus was presented and negative for situations in which it was removed. Consequently, there is both positive reinforcement (where a rewarding stimulus is presented) and negative reinforcement (where an unpleasant stimulus is removed), as well as positive punishment (where an unpleasant stimulus is administered) and negative punishment (where a rewarding stimulus is removed). Here the words positive and negative mean, respectively, something that is added or something that is taken away, but do not mean “good” or “bad” as they do in everyday speech. As you can see from TABLE 7.1, positive and negative reinforcement increase the likelihood of the behaviour; positive and negative punishment decrease the likelihood of the behaviour.

Table 7.1: Reinforcement and Punishment

                           Increases the Likelihood      Decreases the Likelihood
                           of Behaviour                  of Behaviour
  Stimulus is presented    Positive reinforcement        Positive punishment
  Stimulus is removed      Negative reinforcement        Negative punishment

These distinctions can be confusing at first; after all, “negative reinforcement” and “punishment” both sound like they should be “bad” and produce the same type of behaviour. However, negative reinforcement involves something “good”: it is the removal of something unpleasant, like a shock, and the absence of a shock is indeed pleasant.
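The logic of Table 7.1 can be captured in a few lines of code. The sketch below is purely illustrative (the function name and its string arguments are invented for this example, not part of the text): it labels a consequence by whether a stimulus was presented or removed and whether the behaviour that produced it became more or less likely.

```python
def classify_consequence(stimulus_change: str, behaviour_change: str) -> str:
    """Label a consequence using the two distinctions in Table 7.1.

    stimulus_change: "presented" or "removed"
    behaviour_change: "increases" or "decreases" (effect on the behaviour that led to it)
    """
    if behaviour_change == "increases":
        # Anything that strengthens the behaviour is reinforcement.
        return "positive reinforcement" if stimulus_change == "presented" else "negative reinforcement"
    # Anything that weakens the behaviour is punishment.
    return "positive punishment" if stimulus_change == "presented" else "negative punishment"

# A shock is turned off and lever pressing becomes more frequent: negative reinforcement.
print(classify_consequence("removed", "increases"))
```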

Why is reinforcement more constructive than punishment in learning desired behaviour?

Reinforcement is generally more effective than punishment in promoting learning. There are many reasons (Gershoff, 2002), but one is this: Punishment signals that an unacceptable behaviour has occurred, but it does not specify what should be done instead. Spanking a young child for starting to run into a busy street certainly stops the behaviour, but it does not promote any kind of learning about the desired behaviour, such as teaching the child to look both ways before stepping into the street.

7.2.2.1 Primary and Secondary Reinforcement and Punishment

Negative reinforcement involves the removal of something unpleasant from the environment. When Daddy stops the car, he gets a reward: His little monster stops screaming. However, from the perspective of the child, this is positive reinforcement. The child’s tantrum results in something positive added to the environment—stopping for a snack.
©MICHELLE SELESNICK/FLICKR VISION

Reinforcers and punishers often gain their functions from basic biological mechanisms. A pigeon that pecks at a target in a Skinner box is usually reinforced with food pellets, just as an animal that learns to escape a mild electric shock has avoided the punishment of tingly paws. Food, comfort, shelter, and warmth are examples of primary reinforcers because they help satisfy biological needs. However, the vast majority of reinforcers or punishers in our daily lives have little to do with biology: Verbal approval, a bronze trophy, or money all serve powerful reinforcing functions, yet none of them taste very good or help keep you warm at night. The point is, we learn to perform a lot of behaviours based on reinforcements that have little or nothing to do with biological satisfaction.

These secondary reinforcers derive their effectiveness from their associations with primary reinforcers through classical conditioning. For example, money starts out as a neutral CS that, through its association with primary USs like acquiring food or shelter, takes on a conditioned emotional element. Flashing lights, originally a neutral CS, acquire powerful negative elements through association with a speeding ticket and a fine.

7.2.2.2 Immediate versus Delayed Reinforcement and Punishment

Figure 7.8: Delay of Reinforcement Rats pressed a lever in order to obtain a food reward. Researchers varied the amount of time between the lever press and the delivery of food reinforcement. The number of lever presses declined substantially with longer delays.

A key determinant of the effectiveness of a reinforcer is the amount of time between the occurrence of a behaviour and the reinforcer: The more time that elapses, the less effective the reinforcer (Lattal, 2010; Renner, 1964). This was dramatically illustrated in experiments with hungry rats in which food reinforcers were given at varying times after the rat pressed the lever (Dickinson, Watt, & Griffiths, 1992). Delaying reinforcement by even a few seconds led to a reduction in the number of times the rat subsequently pressed the lever, and extending the delay to a minute rendered the food reinforcer completely ineffective (see FIGURE 7.8). The most likely explanation for this effect is that delaying the reinforcer made it difficult for the rats to figure out exactly what behaviour they needed to perform in order to obtain it. In the same way, parents who wish to reinforce their children with a piece of candy for playing quietly should provide the candy while the child is still playing quietly; waiting until later when the child may be engaging in other behaviours—perhaps making a racket with pots and pans—will make it more difficult for the child to link the reinforcer with the behaviour of playing quietly (Powell et al., 2009).

How does the concept of delayed reinforcement relate to difficulties with quitting smoking?

The greater potency of immediate versus delayed reinforcers may help us to appreciate why it can be difficult to engage in behaviours that have long-term benefits. The smoker who desperately wants to quit smoking will be reinforced immediately by the feeling of relaxation that results from lighting up, but may have to wait years to be reinforced with better health that results from quitting; the dieter who sincerely wants to lose weight may easily succumb to the temptation of a chocolate sundae that provides reinforcement now, rather than waiting weeks or months for the reinforcement (looking and feeling better) that would be associated with losing excess weight.

CULTURE & COMMUNITY: Are There Cultural Differences in Reinforcers?

Reinforcers play a critical role in operant conditioning, and operant approaches that use positive reinforcement have been applied extensively in everyday settings such as behaviour therapy (see Treatment of Psychological Disorders). Surveys designed to assess what kinds of reinforcers are rewarding to individuals have revealed that there can be wide differences among various groups (Dewhurst & Cautela, 1980; Houlihan et al., 1991).

Recently, 750 high school students from America, Australia, Denmark, Honduras, Korea, Spain, and Tanzania were surveyed in order to evaluate possible cross-cultural differences among reinforcers (Homan et al., 2012). The survey asked students to rate on a 5-point scale how rewarding they found a range of activities, including listening to music, playing various kinds of sports, shopping, reading, spending time with friends, and so on. The American researchers hypothesized that American high school students would differ most strongly from high school students in the third-world countries of Tanzania and Honduras, and that is what they found. The differences between American and Korean students were nearly as large and, somewhat surprisingly, so were the differences between American and Spanish students. There were much smaller differences between Americans and their Australian or Danish counterparts. For example, of the 30 activities evaluated, “Hang with friends” was ranked as the most rewarding activity by American, Australian, and Danish teens. It ranked 5th in Spain, 6th in Honduras, 9th in Korea, and did not make the top-10 list at all for Tanzanian students, who ranked shopping and reading the highest.

These results should be taken with a grain of salt because the researchers did not control for variables other than culture that could influence their results, such as economic status. Nonetheless, they suggest that cultural differences should be considered in the design of programs or interventions that rely on the use of reinforcers to influence the behaviour of individuals who come from different cultures.

Similar considerations apply to punishment: As a general rule, the longer the delay between a behaviour and the administration of punishment, the less effective the punishment will be in suppressing the targeted behaviour (Kamin, 1959; Lerman & Vorndran, 2002). The reduced effectiveness of delayed punishment can be a serious problem in non-laboratory settings, because in everyday life it is often difficult to administer punishment immediately or even soon after a problem behaviour has occurred (Meindl & Casey, 2012). For example, a parent whose child misbehaves at a shopping mall may be unable to punish the child immediately with a time-out because it is impractical in the mall setting. Some problem behaviours, such as cheating, can be difficult to detect immediately and therefore punishment is necessarily delayed. Research in both the laboratory and everyday settings suggests several strategies for increasing the effectiveness of delayed punishment, including increasing the severity of the punishment or attempting to bridge the gap between the behaviour and the punishment with verbal instructions (Meindl & Casey, 2012). The parent in the shopping mall, for example, might tell the misbehaving child exactly when and where a later time-out will occur.

Suppose you are the mayor of a suburban town and you want to institute some new policies to decrease the number of drivers who speed on residential streets. How might you use punishment to decrease the behaviour you dislike (speeding)? How might you use reinforcement to increase the behaviour you desire (safe driving)? Based on the principles of operant conditioning you read about in this section, which approach do you think might be most fruitful?
PHILLIPPE RENAULT/HEMIS/ALAMY

7.2.3 The Basic Principles of Operant Conditioning

After establishing how reinforcement and punishment produced learned behaviour, Skinner and other scientists began to expand the parameters of operant conditioning. This took the form of investigating some phenomena that were well known in classical conditioning (such as discrimination, generalization, and extinction) as well as some practical applications, such as how best to administer reinforcement or how to produce complex learned behaviours in an organism. Let us look at some of these basic principles of operant conditioning.

7.2.3.1 Discrimination, Generalization, and the Importance of Context

What does it mean to say that learning takes place in contexts?

We all take off our clothes at least once a day, but usually not in public. We scream at rock concerts, but not in libraries. We say, “Please pass the gravy,” at the dinner table, but not in a classroom. Although these observations may seem like nothing more than common sense, Thorndike was the first to recognize the underlying message: Learning takes place in contexts, not in the free range of any plausible situation. As Skinner rephrased it later, most behaviour is under stimulus control, which develops when a particular response only occurs when an appropriate discriminative stimulus, a stimulus that indicates that a response will be reinforced, is present. Skinner (1972) discussed this process in terms of a “three-term contingency”: In the presence of a discriminative stimulus (classmates drinking coffee together in Tim Hortons), a response (joking comments about a psychology professor’s increasing waistline and receding hairline) produces a reinforcer (laughter among classmates). The same response in a different context—the professor’s office—would most likely produce a very different outcome.

In research on stimulus control, participants trained with Picasso paintings, such as the one on the left, responded to other paintings by Picasso or even to paintings by other Cubists. Participants trained with Monet paintings, such as the one on the right, responded to other paintings by Monet or by other French Impressionists. Interestingly, the participants in this study were pigeons.
TATE GALLERY, LONDON/ART RESOURCE, NY

Stimulus control, perhaps not surprisingly, shows both discrimination and generalization effects similar to those we saw with classical conditioning. To demonstrate this, researchers used either a painting by the French Impressionist Claude Monet or one of Pablo Picasso’s paintings from his Cubist period for the discriminative stimulus (Watanabe, Sakamoto, & Wakita, 1995). Participants in the experiment were only reinforced if they responded when the appropriate painting was presented. After training, the participants discriminated appropriately; those trained with the Monet painting responded when other paintings by Monet were presented and those trained with a Picasso painting reacted when other Cubist paintings by Picasso were shown. And as you might expect, Monet-trained participants did not react to Picassos and Picasso-trained participants did not respond to Monets. What is more, the research participants showed that they could generalize across painters as long as they were from the same artistic tradition. Those trained with Monet responded appropriately when shown paintings by Auguste Renoir (another French Impressionist), and the Picasso-trained participants responded to artwork by Henri Matisse, despite never having seen those paintings before. If these results do not seem particularly startling to you, it might help to know that the research participants were pigeons that were trained to key-peck to these various works of art. Stimulus control, and its ability to foster stimulus discrimination and stimulus generalization, is effective even if the stimulus has no meaning to the respondent.

7.2.3.2 Extinction

In classical conditioning, a CR extinguishes when the animal learns that the CS no longer predicts the US. Similarly, an operant behaviour undergoes extinction when the behaviour no longer predicts reward; in other words, when the reinforcements stop. Pigeons cease pecking at a key if food is no longer presented following the behaviour. You would not put more money into a vending machine if it failed to give you its promised chocolate bar or pop. Warm smiles that are not returned will quickly disappear. On the surface, extinction of operant behaviour looks like that of classical conditioning: The response rate drops off fairly rapidly and, if a rest period is provided, spontaneous recovery is typically seen.

How is the concept of extinction different in operant conditioning versus classical conditioning?

However, there is an important difference. In operant conditioning, the reinforcements only occur when the proper response has been made, and they do not always occur even then. Not every trip into the forest produces nuts for a squirrel, auto salespeople do not sell to everyone who takes a test drive, and researchers run many experiments that do not work out and never get published. Yet these behaviours do not weaken and gradually extinguish. In fact, they typically become stronger and more resilient. Curiously, then, extinction is a bit more complicated in operant conditioning than in classical conditioning because it depends, in part, on how often reinforcement is received. In fact, this principle is an important cornerstone of operant conditioning that we will examine next.

7.2.3.3 Schedules of Reinforcement

Skinner was intrigued by the apparent paradox surrounding extinction, and in his autobiography, he described how he began studying it (Skinner, 1979). He was laboriously rolling ground rat meal and water to make food pellets to reinforce the rats in his early experiments. It occurred to him that perhaps he could save time and effort by not giving his rats a pellet for every bar press but instead delivering food on some intermittent schedule. The results of this hunch were dramatic. Not only did the rats continue bar pressing, but they also shifted the rate and pattern of bar pressing depending on the timing and frequency of the presentation of the reinforcers. Unlike classical conditioning, where the sheer number of learning trials was important, in operant conditioning the pattern with which reinforcements appeared was crucial.

Skinner explored dozens of what came to be known as schedules of reinforcement (Ferster & Skinner, 1957) (see FIGURE 7.9 on the next page). The two most important are interval schedules, based on the time intervals between reinforcements, and ratio schedules, based on the ratio of responses to reinforcements.

Figure 7.9: Reinforcement Schedules Different schedules of reinforcement produce different rates of responding. These lines represent the amount of responding that occurs under each type of reinforcement. The black slash marks indicate when reinforcement was administered. Notice that ratio schedules tend to produce higher rates of responding than do interval schedules, as shown by the steeper lines for fixed-ratio and variable-ratio reinforcement.

7.2.3.3.1 Interval Schedules

Students cramming for an exam often show the same kind of behaviour as pigeons being reinforced under a fixed-interval schedule.
BRAND X PICTURES/JUPITER IMAGES

Under a fixed-interval schedule (FI), reinforcers are presented at fixed-time periods, provided that the appropriate response is made. For example, on a 2-minute fixed-interval schedule, a response will be reinforced, but only after 2 minutes have expired since the last reinforcement. Rats and pigeons in Skinner boxes produce predictable patterns of behaviour under these schedules. They show little responding right after the presentation of the reinforcement, but as the next time interval draws to a close, they show a burst of responding. Many undergraduates behave exactly like this. They do relatively little work until just before the upcoming exam, then engage in a burst of reading and studying.

Radio station promotions and giveaways often follow a variable-interval schedule of reinforcement.
© RICHARD HUTCHINGS/PHOTOEDIT

How does a radio station use scheduled reinforcements to keep you listening?

Under a variable-interval schedule (VI), a behaviour is reinforced based on an average time that has expired since the last reinforcement. For example, on a 2-minute variable-interval schedule, responses will be reinforced every 2 minutes on average but not after each 2-minute period. Variable-interval schedules typically produce steady, consistent responding because the time until the next reinforcement is less predictable. Variable-interval schedules are not encountered that often in real life, although one example might be radio promotional giveaways, such as tickets to rock concerts. The reinforcement—getting the tickets—might average out to once an hour across the span of the broadcasting day, but the presentation of the reinforcement is variable: It might come early in the 10 o’clock hour, later in the 11 o’clock hour, right at the start of the 12 o’clock hour, and so on.

Both fixed-interval schedules and variable-interval schedules tend to produce slow, methodical responding because the reinforcements follow a time scale that is independent of how many responses occur. It does not matter if a rat on a fixed-interval schedule presses a bar 1 time during a 2-minute period or 100 times: The reinforcing food pellet will not drop out of the chute until 2 minutes have elapsed, regardless of the number of responses before that 2 minutes is up.
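The timing logic of interval schedules lends itself to a short simulation. The sketch below is a toy model written only for illustration (the class and its names are invented, and the way the variable interval is drawn is an arbitrary choice, not taken from the text): reinforcement becomes available only after an interval has elapsed since the last reinforcement, no matter how many responses occur in between.

```python
import random

class IntervalSchedule:
    """Toy model of fixed-interval (FI) and variable-interval (VI) schedules."""

    def __init__(self, mean_interval: float, variable: bool = False):
        self.mean_interval = mean_interval        # seconds
        self.variable = variable
        self.last_reinforcement = 0.0
        self.current_interval = self._next_interval()

    def _next_interval(self) -> float:
        if self.variable:
            # VI: the required interval varies around the mean.
            return random.uniform(0.5 * self.mean_interval, 1.5 * self.mean_interval)
        return self.mean_interval                 # FI: the interval is always the same.

    def respond(self, time: float) -> bool:
        """Return True if a response made at `time` (in seconds) is reinforced."""
        if time - self.last_reinforcement >= self.current_interval:
            self.last_reinforcement = time
            self.current_interval = self._next_interval()
            return True
        return False

fi = IntervalSchedule(mean_interval=120)          # a 2-minute fixed-interval schedule
print(fi.respond(30), fi.respond(125))            # False, True: only the later response pays off
```

Notice that the number of responses never enters the reinforcement test, which is why interval schedules reward patience rather than rapid responding.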

7.2.3.3.2 Ratio Schedules

These pieceworkers in a textile factory get paid following a fixed-ratio schedule: They receive payment after a number of shirts have been sewn.
JEFF HOLT/BLOOMBERG VIA GETTY IMAGES

Under a fixed-ratio schedule (FR), reinforcement is delivered after a specific number of responses have been made. One schedule might present reinforcement after every fourth response, and a different schedule might present reinforcement after every 20 responses; the special case of presenting reinforcement after each response is called continuous reinforcement, and it was the labour of providing continuous reinforcement that drove Skinner to investigate these schedules in the first place. Notice that, in each example, the ratio of reinforcements to responses, once set, remains fixed.

How do ratio schedules work to keep you spending your money?

There are many situations in which people, sometimes unknowingly, find themselves being reinforced on a fixed-ratio schedule: Book clubs often give you a free book after a set number of regular purchases; pieceworkers get paid after making a fixed number of products; and some credit card companies return to their customers a percentage of the amount charged. When a fixed-ratio schedule is operating, it is possible, in principle, to know exactly when the next reinforcer is due. A laundry pieceworker on a 10-response, fixed-ratio schedule who has just washed and ironed the ninth shirt knows that payment is coming after the next shirt is done.

Slot machines in casinos pay out following a variable-ratio schedule. This helps explain why some gamblers feel incredibly lucky, whereas others (like this chap) cannot believe they can play a machine for so long without winning a thing.
©MBI/ALAMY

Under a variable-ratio schedule (VR) the delivery of reinforcement is based on a particular average number of responses. For example, if a laundry worker was following a 10-response variable-ratio schedule instead of a fixed-ratio schedule, she or he would still be paid, on average, for every 10 shirts washed and ironed but not for each 10th shirt. Slot machines in a modern casino pay off on variable-ratio schedules that are determined by the random number generator that controls the play of the machines. A casino might advertise that they pay off on “every 100 pulls on average,” which could be true. However, one player might hit a jackpot after 3 pulls on a slot machine, whereas another player might not hit a jackpot until after 80 pulls. The ratio of responses to reinforcements is variable, which probably helps casinos stay in business.

Not surprisingly, variable-ratio schedules produce slightly higher rates of responding than fixed-ratio schedules primarily because the organism never knows when the next reinforcement is going to appear. What is more, the higher the ratio, the higher the response rate tends to be; a 20-response variable-ratio schedule will produce considerably more responding than a 2-response variable-ratio schedule. When schedules of reinforcement provide intermittent reinforcement, in which only some of the responses made are followed by reinforcement, they produce behaviour that is much more resistant to extinction than behaviour maintained under a continuous reinforcement schedule. One way to think about this effect is to recognize that the more irregular and intermittent a schedule is, the more difficult it becomes for an organism to detect when it has actually been placed on extinction.
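Ratio schedules can be sketched the same way. Again this is an invented toy model (the class name and the particular way the variable requirement is drawn are assumptions made for the example): here reinforcement depends only on the count of responses, not on elapsed time, which is why ratio schedules reward fast responding.

```python
import random

class RatioSchedule:
    """Toy model of fixed-ratio (FR) and variable-ratio (VR) schedules."""

    def __init__(self, mean_ratio: int, variable: bool = False):
        self.mean_ratio = mean_ratio
        self.variable = variable
        self.responses_since_reinforcement = 0
        self.current_requirement = self._next_requirement()

    def _next_requirement(self) -> int:
        if self.variable:
            # VR: the required number of responses varies around the mean,
            # so the next payoff is unpredictable (as with a slot machine).
            return random.randint(1, 2 * self.mean_ratio - 1)
        return self.mean_ratio                    # FR: the requirement never changes.

    def respond(self) -> bool:
        """Record one response; return True if this response is reinforced."""
        self.responses_since_reinforcement += 1
        if self.responses_since_reinforcement >= self.current_requirement:
            self.responses_since_reinforcement = 0
            self.current_requirement = self._next_requirement()
            return True
        return False

vr = RatioSchedule(mean_ratio=10, variable=True)  # like a "pays off every 10 plays on average" machine
presses = 1
while not vr.respond():
    presses += 1
print(presses)                                    # varies from run to run; about 10 on average
```

The unpredictability of the VR requirement is also what makes extinction hard to detect: a long run of unreinforced responses looks no different from an ordinary stretch of bad luck.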

Imagine you own an insurance company and you want to encourage your salespeople to sell as many policies as possible. You decide to give them bonuses, based on the number of policies sold. How might you set up a system of bonuses using an FR schedule? Using a VR schedule? Which system do you think would encourage your salespeople to work harder, in terms of making more sales?
ISTOCKPHOTO/THINKSTOCK

For example, if you have just put a dollar into a pop machine that, unbeknownst to you, is broken, no pop comes out. Because you are used to getting your pops on a continuous reinforcement schedule—one dollar produces one pop—this abrupt change in the environment is easily noticed and you are unlikely to put additional money into the machine: You would quickly show extinction. However, if you have put your dollar into a slot machine that, unbeknownst to you, is broken, do you stop after one or two plays? Almost certainly not. If you are a regular slot player, you are used to going for many plays in a row without winning anything, so it is difficult to tell that anything is out of the ordinary. Under conditions of intermittent reinforcement, all organisms will show considerable resistance to extinction and continue for many trials before they stop responding. This effect has even been observed in infants (Weir et al., 2005).

This relationship between intermittent reinforcement schedules and the robustness of the behaviour they produce is called the intermittent reinforcement effect, the fact that operant behaviours that are maintained under intermittent reinforcement schedules resist extinction better than those maintained under continuous reinforcement. In one extreme case, Skinner gradually extended a variable-ratio schedule until he managed to get a pigeon to make an astonishing 10 000 pecks at an illuminated key for one food reinforcer! Behaviour maintained under a schedule like this is virtually immune to extinction.

7.2.3.4 Shaping through Successive Approximations

Have you ever been to Marineland and wondered how the dolphins learn to jump up in the air, twist around, splash back down, do a somersault, and then jump through a hoop, all in one smooth motion? Well, they do not. Wait—of course they do—you have seen them. It is just that they do not learn to do all those complex aquabatics in one smooth motion. Rather, elements of their behaviour are shaped over time until the final product looks like one smooth motion.

How can operant conditioning produce complex behaviours?

Skinner noted that the trial-by-trial experiments of Pavlov and Thorndike were rather artificial. Behaviour rarely occurs in fixed frameworks where a stimulus is presented and then an organism has to engage in some activity or another. We are continuously acting and behaving, and the world around us reacts in response to our actions. Most of our behaviours, then, are the result of shaping, learning that results from the reinforcement of successive steps to a final desired behaviour. The outcomes of one set of behaviours shape the next set of behaviours, whose outcomes shape the next set of behaviours, and so on.

B. F. Skinner shaping a dog named Agnes. In the span of 20 minutes, Skinner was able to use reinforcement of successive approximations to shape Agnes’s behaviour. The result was a pretty neat trick: to wander in, stand on hind legs, and jump.
LIBRARY OF CONGRESS/LOOK MAGAZINE PHOTOGRAPHIC COLLECTION

Skinner realized the potential power of shaping one day in 1943 when he was working on a wartime project sponsored by General Mills in a lab on the top floor of a flour mill where pigeons frequently visited (Peterson, 2004). In a lighthearted moment, Skinner and his colleagues decided to see whether they could teach the pigeons to “bowl” by swiping with their beaks at a ball that Skinner had placed in a box along with some pins. Nothing worked until Skinner decided to reinforce any response even remotely related to a swipe, such as merely looking at the ball. “The result amazed us,” Skinner recalled. “In a few minutes the ball was caroming off the walls of the box as if the pigeon had been a champion squash player” (Skinner, 1958, p. 974). Skinner applied this insight in his later laboratory research. For example, he noted that if you put a rat in a Skinner box and wait for it to press the bar, you could end up waiting a very long time: Bar pressing just is not very common in a rat’s natural repertoire of responses. However, it is relatively easy to shape bar pressing. Watch the rat closely: If it turns in the direction of the bar, deliver a food reward. This will reinforce turning toward the bar, making such a movement more likely. Now wait for the rat to take a step toward the bar before delivering food; this will reinforce moving toward the bar. After the rat walks closer to the bar, wait until it touches the bar before presenting the food. Notice that none of these behaviours is the final desired behaviour (reliably pressing the bar). Rather, each behaviour is a successive approximation to the final product, or a behaviour that gets incrementally closer to the overall desired behaviour. In the dolphin example—and indeed, in many instances of animal training in which relatively simple animals seem to perform astoundingly complex behaviours—you can think through how each smaller behaviour is reinforced until the overall sequence of behaviour is performed reliably.
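The shaping procedure described above can be mimicked with a small simulation. Everything in the sketch is invented for illustration (the toy "rat," the numbers, and the update rule are assumptions, not a model from the text); the point is simply that reinforcing behaviour that meets an increasingly strict criterion moves typical behaviour, step by step, toward the final target.

```python
import random

class ToyRat:
    """A stand-in 'rat' whose behaviour is summarized as a distance from the bar."""

    def __init__(self):
        self.mean_distance = 50.0                  # starts far from the bar

    def emit(self) -> float:
        # On each trial, the rat ends up some distance from the bar.
        return max(0.0, random.gauss(self.mean_distance, 10.0))

    def reinforce(self, distance: float):
        # Reinforced behaviour becomes more likely: typical behaviour shifts
        # toward the distance that was just rewarded.
        self.mean_distance = 0.8 * self.mean_distance + 0.2 * distance

rat = ToyRat()
criteria = [40, 25, 10, 2, 0.5]                    # successive approximations to reaching the bar
for criterion in criteria:
    for _ in range(200):
        d = rat.emit()
        if d <= criterion:                         # reinforce only behaviour that meets the current step
            rat.reinforce(d)

print(round(rat.mean_distance, 1))                 # small: the rat now reliably ends up at the bar
```

No single criterion is the desired behaviour on its own, yet by the last step the simulated rat spends nearly all of its time at the bar, just as each reinforced approximation brings a real animal closer to the full trick.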

7.2.3.5 Superstitious Behaviour

Everything we have discussed so far suggests that one of the keys to establishing reliable operant behaviour is the correlation between an organism’s response and the occurrence of reinforcement. In the case of continuous reinforcement, when every response is followed by the presentation of a reinforcer, there is a one-to-one, or perfect, correlation. In the case of intermittent reinforcement, the correlation is weaker (i.e., not every response is met with the delivery of reinforcement), but it is not zero. As you read in the Methods in Psychology chapter, however, just because two things are correlated (i.e., they tend to occur together in time and space) does not imply that there is causality (i.e., the presence of one reliably causes the other to occur).

People believe in many different superstitions and engage in all kinds of superstitious behaviours. Professional athletes and coaches are notoriously superstitious. Patrick Roy was perhaps the greatest goaltender the Montreal Canadiens ever had. During games he would talk to the goal posts, and touch them, expressing his gratitude to them when a puck was deflected. Before games, he would dress in his equipment in a very specific order and then, once on the ice, he would skate backward toward the net before turning around at the last second. Skinner thought superstitions resulted from the unintended reinforcement of inconsequential behaviour.
BRUCE BENNETT STUDIOS/GETTY IMAGES

How would a behaviourist explain superstitions?

Skinner (1948) designed an experiment that illustrates this distinction. He put several pigeons in Skinner boxes, set the food dispenser to deliver food every 15 seconds, and left the birds to their own devices. Later he returned and found the birds engaging in odd, idiosyncratic behaviours, such as pecking aimlessly in a corner or turning in circles. He referred to these behaviours as “superstitious” and offered a behaviourist analysis of their occurrence. The pigeons, he argued, were simply repeating behaviours that had been accidentally reinforced. A pigeon that just happened to have pecked randomly in the corner when the food showed up had connected the delivery of food to that behaviour. Because this pecking behaviour was reinforced by the delivery of food, the pigeon was likely to repeat it. Now pecking in the corner was more likely to occur, and it was more likely to be reinforced 15 seconds later when the food appeared again. For each pigeon, the behaviour reinforced would most likely be whatever the pigeon happened to be doing when the food was first delivered. Skinner’s pigeons acted as though there was a causal relationship between their behaviours and the appearance of food when it was merely an accidental correlation.

Although some researchers questioned Skinner’s characterization of these behaviours as superstitious (Staddon & Simmelhag, 1971), later studies have shown that reinforcing adults or children using schedules in which reinforcement is not contingent on their responses can produce seemingly superstitious behaviour. It seems that people, like pigeons, behave as though there is a correlation between their responses and reward when in fact the connection is merely accidental (Bloom et al., 2007; Mellon, 2009; Ono, 1987; Wagner & Morris, 1987). Such findings should not be surprising to sports fans. Baseball players who enjoy several home runs on a day when they happened not to have showered are likely to continue that tradition, labouring under the belief that the accidental correlation between poor personal hygiene and a good day at bat is somehow causal. This “stench causes home runs” hypothesis is just one of many examples of human superstitions (Gilbert et al., 2000; Radford & Radford, 1949).

7.2.4 A Deeper Understanding of Operant Conditioning

Like classical conditioning, operant conditioning also quickly proved to be a powerful approach to learning. But B. F. Skinner, like Watson before him, was satisfied to observe an organism perform a learned behaviour; he did not look for a deeper explanation of mental processes (Skinner, 1950). In this view, an organism behaved in a certain way as a response to stimuli in the environment, not because there was any wanting, wishing, or willing by the animal in question. However, some research on operant conditioning digs deeper into the underlying mechanisms that produce the familiar outcomes of reinforcement. As we did earlier in the chapter with classical conditioning, let us examine three elements that expand our view of operant conditioning: the cognitive, neural, and evolutionary elements of operant conditioning.

7.2.4.1 The Cognitive Elements of Operant Conditioning

Edward Chace Tolman advocated a cognitive approach to operant learning and provided evidence that in maze-learning experiments, rats develop a mental picture of the maze, which he called a cognitive map.
BANCROFT LIBRARY/UNIVERSITY OF CALIFORNIA, BERKELEY

Edward Chace Tolman (1886–1959) was one of the first researchers to question Skinner’s strictly behaviourist interpretation of learning, and was the strongest early advocate of a cognitive approach to operant learning. Tolman argued that there was more to learning than just knowing the circumstances in the environment (the properties of the stimulus) and being able to observe a particular outcome (the reinforced response). Instead, Tolman proposed that an animal established a means–ends relationship. That is, the conditioning experience produced knowledge or a belief that, in this particular situation, a specific reward (the end state) will appear if a specific response (the means to that end) is made.

Tolman’s means–ends relationship may remind you of the Rescorla–Wagner model of classical conditioning. Rescorla argued that the CS functions by setting up an expectation about the arrival of a US, and “expectations” most certainly involve cognitive processes. In both Rescorla’s and Tolman’s theories, the stimulus does not directly evoke a response; rather, it establishes an internal cognitive state, which then produces the behaviour. These cognitive theories of learning focus less on the stimulus–response (S–R) connection and more on what happens in the organism’s mind when faced with the stimulus. During the 1930s and 1940s, Tolman and his students conducted studies that focused on latent learning and cognitive maps, two phenomena that strongly suggest that simple S–R interpretations of operant learning behaviour are inadequate.

7.2.4.1.1 Latent Learning and Cognitive Maps

In latent learning, something is learned, but it is not manifested as a behavioural change until sometime in the future. Latent learning can easily be established in rats and occurs without any obvious reinforcement, a finding that posed a direct challenge to the then-dominant behaviourist position that all learning required some form of reinforcement (Tolman & Honzik, 1930a).

Tolman gave three groups of rats access to a complex maze every day for over 2 weeks. The control group never received any reinforcement for navigating the maze. They were simply allowed to run around until they reached the goal box at the end of the maze. In FIGURE 7.10 you can see that over the 2 weeks of the study, the control group (in green) got a little better at finding their way through the maze, but not by much. A second group of rats received regular reinforcements; when they reached the goal box, they found a small food reward there. Not surprisingly, these rats showed clear learning, as can be seen in blue in Figure 7.10. A third group was treated exactly like the control group for the first 10 days and then rewarded for the last 7 days. This group’s behaviour (in orange) was quite striking. For the first 10 days, they behaved like the rats in the control group. However, during the final 7 days, they behaved a lot like the rats in the second group that had been reinforced every day. Clearly, the rats in this third group had learned a lot about the maze and the location of the goal box during those first 10 days even though they had not received any reinforcements for their behaviour. In other words, they showed evidence of latent learning.

Figure 7.10: Latent Learning Rats in a control group that never received any reinforcement (in green) improved at finding their way through the maze over 17 days but not by much. Rats that received regular reinforcements (in blue) showed fairly clear learning; their error rate decreased steadily over time. Rats in the latent learning group (in orange) were treated exactly like the control group rats for the first 10 days and then like the regularly rewarded group for the last 7 days. Their dramatic improvement on day 12 shows that these rats had learned a lot about the maze and the location of the goal box even though they had never received reinforcements. Notice, also, that on the last 7 days, these latent learners actually seem to make fewer errors than their regularly rewarded counterparts.

What are cognitive maps and why are they a challenge to behaviourism?

These results suggested to Tolman that beyond simply learning “start here, end here,” his rats had developed a sophisticated mental picture of the maze. Tolman called this a cognitive map, a mental representation of the physical features of the environment. Tolman thought that the rats had developed a mental picture of the maze, along the lines of “make two lefts, then a right, then a quick left at the corner,” and he devised several experiments to test that idea (Tolman & Honzik, 1930b; Tolman, Ritchie, & Kalish, 1946).

7.2.4.1.2 Further Support for Cognitive Explanations

One simple experiment provided support for Tolman’s theories and wreaked havoc with the noncognitive explanations offered by staunch behaviourists. Tolman trained a group of rats in the maze shown in FIGURE 7.11a on the next page. As you can see, rats run down a straightaway, take a left, a right, a long right, and then end up in the goal box at the end of the maze. Because we are looking at it from above, we can see that the rat’s position at the end of the maze, relative to the starting point, is “diagonal to the upper right.” Of course, all the rat in the maze sees are the next set of walls and turns until it eventually reaches the goal box. Nonetheless, rats learned to navigate this maze without error or hesitation after about four nights. Clever rats. But they were more clever than you think.

After they had mastered the maze, Tolman changed things around a bit and put them in the maze shown in FIGURE 7.11b. The goal box was still in the same place relative to the start box. However, many alternative paths now spoked off the main platform, and the main straightaway that the rats had learned to use was blocked. Most behaviourists would predict that the rats in this situation—running down a familiar path only to find it blocked—would show stimulus generalization and pick the next closest path, such as one immediately adjacent to the straightaway. This was not what Tolman observed. When faced with the blocked path, the rats instead ran all the way down the path that led directly to the goal box. The rats had formed a sophisticated cognitive map of their environment and behaved in a way that suggested they were successfully following that map after the conditions had changed. Latent learning and cognitive maps suggest that operant conditioning involves much more than an animal responding to a stimulus. Tolman’s experiments strongly suggest that there is a cognitive component, even in rats, to operant learning.

Figure 7.11: Cognitive Maps (a) Rats trained to run from a start box to a goal box in the maze on the left mastered the task quite readily. When those rats were then placed in the maze on the right (b), in which the main straightaway had been blocked, they did something unusual. Rather than simply backtrack and try the next closest runway (i.e., those labelled 8 or 9 in the figure), which would be predicted by stimulus generalization, the rats typically chose runway 5, which led most directly to where the goal box had been during their training. The rats had formed a cognitive map of their environment and knew where they needed to end up spatially, compared to where they began.

7.2.4.1.3 Learning to Trust: For Better or Worse

Cognitive factors also played a key role in an experiment examining learning and brain activity (using fMRI) in people who played a “trust” game with a fictional partner (Delgado, Frank, & Phelps, 2005). On each trial, a participant could either keep a $1 reward or transfer the reward to a partner, who would receive $3. The partner could then either keep the $3 or share half of it with the participant. When playing with a partner who was willing to share the reward, the participant would be better off transferring the money, but when playing with a partner who did not share, the participant would be better off keeping the reward in the first place. Participants in such experiments typically find out who is trustworthy on the basis of trial-and-error learning during the game, transferring more money to partners who reinforce them by sharing.
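The incentive structure of this game is easy to lay out explicitly. The sketch below simply encodes the payoffs described above (the dollar amounts follow the text; the function name and the assumption that a sharing partner returns exactly half are illustrative):

```python
def participant_payoff(transfer: bool, partner_shares: bool) -> float:
    """Participant's earnings on one trial of the trust game."""
    if not transfer:
        return 1.00                # keep the $1 reward
    pot = 3.00                     # the transferred reward becomes $3 for the partner
    return pot / 2 if partner_shares else 0.00

# Transferring beats keeping only with a sharing partner ($1.50 vs. $1.00);
# with a non-sharing partner the participant earns nothing.
print(participant_payoff(True, True), participant_payoff(True, False), participant_payoff(False, True))
```

Trial-and-error learning in the game amounts to discovering, from feedback like this, which kind of partner one is facing.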

Bernard Madoff, shown here leaving a court hearing in March 2009, pleaded guilty to fraud after swindling billions of dollars from investors who trusted him.
AFP PHOTO/TIMOTHY A. CLARY/NEWSCOM

In the study by Delgado and his colleagues, participants were given detailed descriptions of their partners that portrayed the partners as trustworthy, neutral, or suspect. Even though during the game itself the sharing behaviour of the three types of partners did not differ—they each reinforced participants to the same extent through sharing—the participants’ cognitions about their partners had powerful effects. Participants transferred more money to the trustworthy partner than to the others, essentially ignoring the trial-by-trial feedback that would ordinarily shape their playing behaviour, thus reducing the amount of reward they received. Highlighting the power of the cognitive effect, signals in a part of the brain that ordinarily distinguishes between positive and negative feedback were evident only when participants played with the neutral partner; these feedback signals were absent when participants played with the trustworthy partner and reduced when participants played with the suspect partner.

Why might cognitive factors have been a factor in people’s trust of Bernie Madoff?

These kinds of effects might help us to understand otherwise perplexing real-life cases such as that of con artist Bernard Madoff, who in March 2009 pleaded guilty to swindling numerous American investors out of billions of dollars in a highly publicized case that attracted worldwide attention. Madoff had been the chairman of the American stock exchange NASDAQ and seemed to his investors an extremely trustworthy figure with whom one could safely invest money. Those powerful cognitions might have caused investors to miss danger signals that otherwise would have led them to learn about the true nature of Madoff’s operation. If so, the result was one of the most expensive failures of learning in modern history.

7.2.4.2 The Neural Elements of Operant Conditioning

Soon after psychologists came to appreciate the range and variety of things that could function as reinforcers, they began looking for underlying brain mechanisms that might account for these effects. The first hint of how specific brain structures might contribute to the process of reinforcement came from the discovery of what came to be called pleasure centres. McGill researchers James Olds and Peter Milner inserted tiny electrodes into different parts of a rat’s brain and allowed the animal to control electric stimulation of its own brain by pressing a bar. They discovered that some brain areas, particularly those in the limbic system (see the Neuroscience and Behaviour chapter), produced what appeared to be intensely positive experiences: The rats would press the bar repeatedly to stimulate these structures. The researchers observed that these rats would ignore food, water, and other life-sustaining necessities for hours on end simply to receive stimulation directly in the brain. They then called these parts of the brain pleasure centres (Olds, 1956) (see FIGURE 7.12).

Figure 7.12: Pleasure Centres in the Brain The nucleus accumbens, medial forebrain bundle, and hypothalamus are all major pleasure centres in the brain.

In the years since these early studies, researchers have identified a number of structures and pathways in the brain that deliver rewards through stimulation (Wise, 1989, 2005). The neurons in the medial forebrain bundle, a pathway that meanders its way from the midbrain through the hypothalamus into the nucleus accumbens, are the most susceptible to stimulation that produces pleasure. This is not surprising because psychologists have identified this bundle of cells as crucial to behaviours that clearly involve pleasure, such as eating, drinking, and engaging in sexual activity. In addition, the neurons all along this pathway, and especially those in the nucleus accumbens itself, are dopaminergic (i.e., they secrete the neurotransmitter dopamine). Remember from the Neuroscience and Behaviour chapter that higher levels of dopamine in the brain are usually associated with positive emotions. During recent years, several competing hypotheses about the precise role of dopamine have emerged, including the idea that dopamine is more closely linked with the expectation of reward than with reward itself (Fiorillo, Newsome, & Schultz, 2008; Schultz, 2006, 2007), or that dopamine is more closely associated with wanting or even craving something rather than simply liking it (Berridge, 2007).

How do specific brain structures contribute to the process of reinforcement?

Whichever view turns out to be correct, researchers have found good support for a reward centre in which dopamine plays a key role. First, as you have just seen, rats will work to stimulate this pathway at the expense of other basic needs (Olds & Fobes, 1981). However, if drugs that block the action of dopamine are administered to the rats, they cease pressing the lever for stimulation (Stellar, Kelley, & Corbett, 1983). Second, drugs such as cocaine, amphetamine, and opiates activate these pathways and centres (Moghaddam & Bunney, 1989), but dopamine-blocking drugs dramatically diminish their reinforcing effects (White & Milner, 1992). Third, fMRI studies show activity in the nucleus accumbens in heterosexual men looking at pictures of attractive women (Aharon et al., 2001) and in individuals who believe they are about to receive money (Cooper et al., 2009; Knutson et al., 2001). Finally, rats that are given primary reinforcers such as food or water, or that are allowed to engage in sexual activity, show increased dopamine secretion in the nucleus accumbens—but only if they are hungry, thirsty, or sexually aroused (Damsma et al., 1992). This last finding is exactly what we might expect given our earlier discussion of the complexities of reinforcement. After all, food tastes a lot better when we are hungry and sexual activity is more pleasurable when we are aroused. These biological structures underlying rewards and reinforcements probably evolved to ensure that species engaged in activities that helped survival and reproduction. (To learn about the relationship between dopamine and Parkinson’s disease, see the Hot Science box below.)

HOT SCIENCE: Dopamine and Reward Learning in Parkinson’s Disease

Many of us have relatives or friends who have been affected by Parkinson’s disease (recall Michael J. Fox), which causes difficulty moving and involves the loss of neurons that use dopamine. As you learned in the Neuroscience and Behaviour chapter, the drug L-dopa is often used to treat Parkinson’s disease because it spurs surviving neurons to produce more dopamine. Dopamine also plays a key role in reward-related learning.

Researchers have focused on the role of dopamine in reward-based learning, especially the expectation of reward. A key idea is that dopamine plays an important role in reward prediction error: the difference between the actual reward received and the amount of reward that was predicted or expected. For example, when an animal presses a lever and receives an unexpected food reward, a positive prediction error occurs (a better than expected outcome) and the animal learns to press the lever again. By contrast, when an animal expects to receive a reward from pressing a lever but does not receive it, a negative prediction error occurs (a worse than expected outcome) and the animal will subsequently be less likely to press the lever again. Reward prediction error can thus serve as a kind of “teaching signal” that helps the animal to learn to behave in a way that maximizes reward.
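How a prediction error can act as a teaching signal is easiest to see in a minimal delta-rule update, sketched below. This is a standard textbook-style formulation used here only for illustration; the learning rate and the sequence of rewards are invented, and the sketch is not a model of Schultz’s data.

```python
# Toy delta-rule learning: the prediction error (actual minus expected reward)
# nudges the expected value of an action, so better-than-expected outcomes
# strengthen it and worse-than-expected outcomes weaken it.

learning_rate = 0.1
expected_reward = 0.0                  # initial expectation for pressing the lever

for actual_reward in [1, 1, 1, 0, 1]:  # a few trials; 0 = an expected reward is withheld
    prediction_error = actual_reward - expected_reward
    expected_reward += learning_rate * prediction_error
    print(f"error = {prediction_error:+.2f}, new expectation = {expected_reward:.2f}")
```

On the fourth trial the withheld reward produces a negative prediction error (a worse than expected outcome), and the expectation drops accordingly.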

In pioneering studies linking reward prediction error to dopamine, Wolfram Schultz and his colleagues recorded activity in dopamine neurons located in the reward centres of a monkey’s brain. They found that those neurons showed increased activity when the monkey received unexpected juice rewards and decreased activity when the monkey did not receive expected juice rewards. This suggests that dopamine neurons play an important role in generating the reward prediction error (Schultz, 2006, 2007; Schultz, Dayan, & Montague, 1997). Schultz’s observations have been backed up by studies using neuroimaging techniques to show that human brain regions involved in reward-related learning also produce reward prediction error signals, and that dopamine is involved in generating those signals (O’Doherty et al., 2003; Pessiglione et al., 2006).


So, how do these findings relate to people with Parkinson’s disease? Several studies report that reward-related learning can be impaired in persons with Parkinson’s (Dagher & Robbins, 2009). Other studies provide evidence that when individuals with Parkinson’s perform reward-related learning tasks, the reward prediction error signal is disrupted (Schonberg et al., 2010). Recent studies have examined the influence of Parkinson’s treatment drugs on reward-related learning and the reward prediction error signal. In one study, participants with Parkinson’s, some treated with L-dopa (in some cases combined with drugs that stimulate dopamine receptors) and some untreated, were given a reward-learning task that involved choosing between two computer-animated crab traps; one was much more likely than the other to contain the reward (i.e., a crab) (Rutledge et al., 2009). After each trial, participants were given feedback concerning their choice. Participants on dopaminergic drugs had a higher learning rate than those not on the drugs. However, there was greater learning from positive reward prediction errors (learning based on positive outcomes) than from negative reward prediction errors (learning based on negative outcomes).
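
The idea of learning differently from positive and negative prediction errors can be illustrated with a short Python sketch. Everything here is hypothetical: the two options, their reward probabilities, and the learning rates are invented for illustration and are not the parameters or procedure used by Rutledge and colleagues.

    # Asymmetric updating: positive prediction errors (better than expected)
    # change the value estimate more strongly than negative ones do.
    import random

    def asymmetric_update(value, reward, lr_positive=0.3, lr_negative=0.1):
        error = reward - value
        rate = lr_positive if error > 0 else lr_negative
        return value + rate * error

    values = [0.0, 0.0]          # current estimates for two options ("traps")
    reward_probs = [0.8, 0.2]    # one option pays off far more often

    random.seed(1)
    for trial in range(500):
        if random.random() < 0.1:                        # occasionally explore
            choice = random.randrange(2)
        else:                                            # otherwise pick the better-looking option
            choice = 0 if values[0] >= values[1] else 1
        reward = 1.0 if random.random() < reward_probs[choice] else 0.0
        values[choice] = asymmetric_update(values[choice], reward)

    print(f"Learned values: {values[0]:.2f} vs {values[1]:.2f}")

Because positive errors are weighted more heavily than negative ones, the learned values end up above the true reward probabilities; increasing lr_positive further mimics the exaggerated learning from gains described above.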

These results may relate to another intriguing feature of Parkinson’s disease: Some individuals develop serious problems with compulsive gambling, shopping, and related impulsive behaviours. Such problems seem to be largely the consequence of Parkinson’s drugs that stimulate dopamine receptors (Ahlskog, 2011; Weintraub, Papay, & Siderowf, 2013). Voon and her colleagues (2011) studied individuals who developed gambling and shopping problems only after the onset of Parkinson’s disease and the start of treatment with drugs that stimulate dopamine receptors. Those individuals were scanned with fMRI while they performed a reward-learning task in which they chose between stimuli that had a greater or lesser probability of producing a monetary gain or loss. Their performance was compared to that of individuals with Parkinson’s without gambling and shopping problems, either on or off drug treatment. Compared with both of these groups, those who had compulsive gambling and shopping problems showed an increase in the rate of learning from gains. Importantly, they also showed an increased positive reward prediction error signal in the striatum, a subcortical region in the basal ganglia (see Figure 3.16) that is rich in dopamine receptors and has been linked with the reward prediction error. The researchers suggested that these results likely reflected an effect of the drug treatment on individuals who are susceptible to compulsive behaviours.

More studies will be needed to unravel the complex relations among dopamine, reward prediction error, learning, and Parkinson’s disease, but the studies to date suggest that such research should have important practical as well as scientific implications.


7.2.4.3 The Evolutionary Elements of Operant Conditioning

Figure 7.13: A Simple T Maze When rats find food in the right arm of a typical T maze, on the next trial, they will often run to the left arm of the maze. This contradicts basic principles of operant conditioning: If the behaviour of running to the right arm is reinforced, it should be more likely to occur again in the future. However, this behaviour is perfectly consistent with a rat’s evolutionary preparedness. Like most foraging animals, rats explore their environments in search of food and seldom return to where food has already been found. Quite sensibly, if food has already been found in the right arm of the T maze, the rat will search the left arm next to see if more food is there.

As you will recall, classical conditioning has an adaptive value that has been fine-tuned by evolution. Not surprisingly, we can also view operant conditioning from an evolutionary perspective. This viewpoint grew out of a set of curious observations from the early days of conditioning experiments. Several behaviourists who were using simple T mazes like the one shown in FIGURE 7.13 to study learning in rats discovered that if a rat found food in one arm of the maze on the first trial of the day, it typically ran down the other arm on the very next trial. A staunch behaviourist would not expect the rats to behave this way. After all, the rats in these experiments were hungry and they had just been reinforced for turning in a particular direction. According to operant conditioning, this should increase the likelihood of turning in that same direction, not reduce it. With additional trials the rats eventually learned to go to the arm with the food, but they had to learn to overcome this initial tendency to go the wrong way. How can we explain this?

What was puzzling from a behaviourist perspective makes sense when viewed from an evolutionary perspective. Rats are foragers and, like all foraging species, they have evolved a highly adaptive strategy for survival. They move around in their environment looking for food. If they find it somewhere, they eat it (or store it) and then go look somewhere else for more. If they do not find food, they forage in another part of the environment. So, if the rat just found food in the right arm of a T maze, the obvious place to look next time is the left arm. The rat knows that there is not more food in the right arm because it just ate the food it found there! Indeed, foraging animals such as rats have well-developed spatial representations that allow them to search their environment efficiently. If given the opportunity to explore a complex environment like the multi-arm maze shown in FIGURE 7.14, rats will systematically go from arm to arm collecting food, rarely returning to an arm they have previously visited (Olton & Samuelson, 1976).

Figure 7.14: A Complex T Maze Like many other foraging species, rats placed in a complex T maze such as this one show evidence of their evolutionary preparedness. These rats will systematically travel from arm to arm in search of food, rarely returning to arms they have already visited.

What explains a rat’s behaviour in a T maze?

Two of Skinner’s former students, Keller Breland and Marian Breland, were among the first researchers to discover that it was not just rats in T mazes that presented a problem for behaviourists (Breland & Breland, 1961). The Brelands pointed out that psychologists and the organisms they study often seemed to “disagree” on what the organisms should be doing. Their argument was simple: When this kind of dispute develops, the animals are always right, and the psychologists had better rethink their theories.

The Brelands, who made a career out of training animals for commercials and movies, often used pigs because pigs are surprisingly good at learning all sorts of tricks. However, they discovered that it was extremely difficult to teach a pig the simple task of dropping coins in a box. Instead of depositing the coins, the pigs persisted in rooting with them as if they were digging them up in soil, tossing them in the air with their snouts and pushing them around. The Brelands tried to train raccoons to do the same task, with different but equally dismal results. The raccoons spent their time rubbing the coins between their paws instead of dropping them in the box.


The misbehaviour of organisms: Pigs are biologically predisposed to root out their food, just as raccoons are predisposed to wash their food. Trying to train either species to behave differently can prove to be an exercise in futility.

Having learned the association between the coins and food, the animals began to treat the coins as stand-ins for food. Pigs are biologically predisposed to root out their food, and raccoons have evolved to clean their food by rubbing it with their paws. That is exactly what each species of animal did with the coins.

The Brelands’ work shows that all species, including humans, are biologically predisposed to learn some things more readily than others and to respond to stimuli in ways that are consistent with their evolutionary history (Gallistel, 2000). Such adaptive behaviours, however, evolved over extraordinarily long periods and in particular environmental contexts. If those circumstances change, some of the behavioural mechanisms that support learning can lead an organism astray. Raccoons that associated coins with food failed to follow the simple route to obtaining food by dropping the coins in the box; “nature” took over and they wasted time rubbing the coins together. The point is that, although much of every organism’s behaviour results from predispositions sharpened by evolutionary mechanisms, these mechanisms sometimes can have ironic consequences.

  • Operant conditioning, as developed by B. F. Skinner, is a process in which behaviours that are reinforced become more likely to occur; complex behaviours are shaped through reinforcement, and the contingencies between actions and outcomes are critical in determining how an organism’s behaviours will be displayed.

  • Like Watson, Skinner tried to explain behaviour without considering cognitive, neural, or evolutionary mechanisms. However, as with classical conditioning, this approach turned out to be incomplete.

  • Operant conditioning has clear cognitive components: Organisms behave as though they have expectations about the outcomes of their actions and adjust their actions accordingly. Cognitive influences can sometimes override the trial-by-trial feedback that usually influences learning.


  • Studies with both animals and people highlight the operation of a neural reward centre that impacts learning.

  • The associative mechanisms that underlie operant conditioning have their roots in evolutionary biology. Some things are relatively easily learned and others are difficult; the history of the species is usually the best clue as to which will be which.