Operant conditioning is the strengthening of behavior by the consequences that follow it. So, for example, I see a light switch on the wall and I move my finger like this, pushing the switch upward. The lights go on. Amazing, fantastic.

What if you did that and nothing happened? Would you continue, over and over and over and over again, to pop the light switch up? No.

People do something. An animal does something. To the behaviorist, any organism is essentially always doing something. And often, the environment then provides some consequence that follows that behavior. Operant conditioning is the study of the way in which that behavior, that is, the frequency or probability that the organism produces it, changes as a result of that consequence.
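To make that concrete, here's a minimal sketch in Python of the frequency-change idea. It assumes a toy organism with a single behavior; the update rule and the 0.1 learning rate are illustrative assumptions, not a model from the lecture.

```python
import random

class Operant:
    """Toy organism: one behavior whose emission probability is
    shaped by the consequences that follow it."""

    def __init__(self, p=0.5, learning_rate=0.1):
        self.p = p                        # probability of emitting the behavior
        self.learning_rate = learning_rate

    def act(self):
        """Emit (True) or withhold (False) the behavior on this trial."""
        return random.random() < self.p

    def consequence(self, reinforcing):
        """Nudge the emission probability up after a reinforcer,
        down after a punisher."""
        target = 1.0 if reinforcing else 0.0
        self.p += self.learning_rate * (target - self.p)

# A light switch that works: every press is reinforced by the light coming on.
organism = Operant()
for trial in range(50):
    if organism.act():
        organism.consequence(reinforcing=True)
print(f"p(press) after training: {organism.p:.2f}")  # climbs toward 1.0
```

If pressing produced no consequence at all, consequence() would never fire and the emission probability would just sit at its baseline; in this toy model, it's the consequence, not the action, that drives the change.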

The study of how novel behaviors are learned based on their consequences began with the classic puzzle box experiments of Edward Thorndike in the 1890s. Thorndike built a wooden crate with a door that would open when a concealed latch was moved in the right way, and into it he placed a hungry cat eager to obtain a food reward.

After a short time in the box, the cats were quite skilled at triggering the door for their own release. All the behaviors that were ineffective at getting them out gradually faded away, and soon the hungry cats would trigger the door almost immediately each time they were placed in the box.

Here's what he found, over and over again: the learning process was orderly. The first time the cat escaped, it might take, say, seven minutes, or ten, or fifteen. But the second time, it took less time, and the escape times kept shrinking with each trial. Thorndike summed this up as the law of effect: behaviors followed by satisfying consequences become more likely to recur, and behaviors followed by discomfort become less likely.

Some three decades after Thorndike left Harvard, a graduate student named B. F. Skinner arrived there. Skinner was also interested in consequences, in the law of effect, but he ended up studying it with different tools.

He ended up with this kind of setup: there's a rat in a box, and in the box there's a lever that the rat can press. There's also an opening in the wall that leads to a pellet dispenser, where the rat can access food.

In one of his many experiments, Skinner would administer food rewards, demonstrating the effect of positive reinforcers, and electric shocks, demonstrating positive punishers. Positive here means something that is added; negative, something that is taken away. By orchestrating reinforcers and punishers within the box in both positive and negative directions, Skinner was able to increase or decrease the behaviors of the animal inside, depending on the consequences of those behaviors.

One of the first things he learned was that he didn't have to present food following every single lever press. He could present food just occasionally.

Reinforcers themselves can be broken down into different types. You have your primary reinforcers, which have direct biological ties, like food and water. We have learned about cues in our environment and the actions we need to take to prepare food and beverages for ourselves, and we learn those actions through operant conditioning.

Then, you have your secondary reinforcers. These are not directly biologically relevant to the species; they derive their importance from learned associations with primary reinforcers. Money is perhaps the most potent example. It starts out as a neutral piece of paper, but through its associations with everything it gets us, it takes on a conditioned emotional charge.

Both reinforcement and punishment come in positive and negative varieties. Remember, positive means something is presented or added as part of the conditioning. Positive reinforcement presents a stimulus that strengthens the behavior that obtains it; in this case, a cup of coffee.

Positive punishment occurs when we are presented with something that deters us from that behavior in the future; in this case, illegally parking and receiving a ticket. Negative reinforcement involves taking action to remove an unpleasant stimulus, conditioning us to take that same action in the future; say, fastening your seatbelt to silence the warning chime.

Negative punishment is when a rewarding stimulus is removed, conditioning us to avoid the behavior that led to the loss; in this case, leaving your coffee sitting so long that it goes cold.
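Since these four quadrants are easy to mix up, here is a small sketch that simply mechanizes the definitions above. The pleasant/unpleasant framing is an illustrative simplification; real stimuli aren't always so cleanly one or the other.

```python
def classify(stimulus_added: bool, stimulus_pleasant: bool) -> tuple[str, str]:
    """Map a consequence onto the four operant quadrants.

    'Positive' = a stimulus is added; 'negative' = one is removed.
    Reinforcement makes the behavior more likely; punishment, less likely.
    """
    if stimulus_added and stimulus_pleasant:
        return "positive reinforcement", "behavior increases"
    if stimulus_added and not stimulus_pleasant:
        return "positive punishment", "behavior decreases"
    if not stimulus_added and not stimulus_pleasant:
        return "negative reinforcement", "behavior increases"
    return "negative punishment", "behavior decreases"

# The four examples from above:
print(classify(True, True))    # cup of coffee    -> positive reinforcement
print(classify(True, False))   # parking ticket   -> positive punishment
print(classify(False, False))  # chime silenced   -> negative reinforcement
print(classify(False, True))   # coffee gone cold -> negative punishment
```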

At first, operant conditioning looks a lot like classical conditioning when it comes to extinction. Warm smiles that are greeted with cold indifference will quickly disappear. As in classical conditioning, the response rate drops off fairly rapidly, and if a rest period is provided, spontaneous recovery is typically seen.

But there are some key differences in extinction between operant conditioning and classical conditioning. In classical conditioning, the organism learns to associate two stimuli that it does not control, and it responds automatically. So when Pavlov wanted to test extinction, he would ring the bell and not present the dog with food. If the food did not come with the bell on repeated trials, eventually, the dog simply wouldn't salivate at the sound of the bell alone.

In operant conditioning, the organism learns to associate its behaviors, those behaviors it does control, with their consequences. So we see different trends with extinction. In particular, how quickly a behavior extinguishes depends on how it was reinforced: behaviors that were only occasionally reinforced are more resistant to extinction than behaviors that were reinforced every single time.
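As a self-contained sketch of what an operant extinction curve looks like (the 0.95 per-emission decay factor is an arbitrary illustrative choice):

```python
import random

# A response trained to a high emission probability now goes unreinforced.
p = 0.9                       # p(press) at the end of training
for trial in range(100):
    if random.random() < p:   # the behavior is emitted...
        p *= 0.95             # ...but no pellet follows, so it weakens
print(f"p(press) after 100 unreinforced trials: {p:.2f}")  # well below 0.9
```

Spontaneous recovery after a rest period isn't modeled here; capturing it would take something like a separate, slower-decaying memory of the original training.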

Skinner ended up, over a period of decades, determining the relationship between the kinds of behavior you get and the schedule of reinforcement, that is, the rule according to which reinforcers are delivered.

In real life, behaviors are not reinforced every time. Skinner explored dozens of schedules of reinforcement in his work. He would follow a specific set of rules or criteria as to when the behavior would be reinforced, and observe the patterns and rates of responses.

There are two types of schedules. An interval schedule reinforces the first response made after a certain amount of time has elapsed. A ratio schedule reinforces behavior after a certain number of responses.

There are certain schedules under which organisms don't do much for a long time and then swing into action. If Skinner were in a standard classroom, he wouldn't be surprised that when you have an exam four weeks from now, you're unlikely to be studying for it in weeks one, two, three, or three and a half; then at some point you realize, jeez, I'd better start studying, the exam is tomorrow. That pause-then-burst pattern is the signature of a fixed-interval schedule.

Both schedules can be either fixed or variable. A fixed ratio schedule reinforces behavior after a set number of responses. A variable ratio schedule reinforces behavior after a changing number of responses.

One of the schedules is called the VR, or variable ratio, schedule, in which a varying number of responses is required to make the feeder operate. The ratio between the number of responses and the reinforcers keeps changing. That's why we get really excited when we're playing roulette or a slot machine: it could be this one; the very next response could produce a million dollars.
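Here's a rough sketch of how ratio schedules can be operationalized; the ratio value of 5 is an arbitrary illustrative choice, and this is a simulation of the idea, not Skinner's apparatus.

```python
import random

class FixedRatio:
    """FR-n: deliver a reinforcer after every n-th response."""
    def __init__(self, n):
        self.n, self.count = n, 0

    def respond(self):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True    # pellet delivered
        return False

class VariableRatio:
    """VR-n: reinforce after a varying number of responses averaging n.
    Each response pays off with probability 1/n, so the very next press
    could always be the jackpot (the slot-machine schedule)."""
    def __init__(self, n):
        self.p = 1.0 / n

    def respond(self):
        return random.random() < self.p

# 100 lever presses on each schedule; both average 20 pellets.
fr, vr = FixedRatio(5), VariableRatio(5)
print(sum(fr.respond() for _ in range(100)))  # exactly 20
print(sum(vr.respond() for _ in range(100)))  # about 20, unpredictably spaced
```

Interval schedules would be built the same way but keyed to elapsed time since the last reinforcer rather than to a response count; a fixed interval is what produces the cramming-before-the-exam pattern described above.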

Operant conditioning can also lead to some accidental correlations between perceived cause and effect. Something Skinner also investigated in his work was the tendency of animals to associate some of their own idiosyncratic behaviors with the delivery of food rewards, merely because the two happened, by accident, to fall within the same time window.

We see this in humans as well, and we call it superstition: the repetition of behaviors that have been accidentally reinforced. But since the time of Thorndike and Skinner, our views of operant conditioning have become somewhat more nuanced, accounting for internal cognitive processes and evolutionary factors that shape behavior and make the picture a bit more complex than black-and-white cause and effect.

For one, researchers have offered new insights into the biological mechanisms involved in operant conditioning by investigating the mind of the operator with some of the penetrating tools of modern science. Brain imaging has revealed regions of the brain, such as the nucleus accumbens, that are involved in the circuits underlying many positively and negatively reinforced behaviors. Additionally, research now takes into account the evolutionary history of the species in question, since some animals are biologically predisposed to learn some things, and to respond to some stimuli, more readily than others.

Operant conditioning involves a relationship between behavior and consequences, but it doesn't have to be a big behavior, like flicking on a light switch. And it doesn't have to be a big consequence, like the lights going on. It's more subtle than that. Watch. Here is some behavior, and then there's going to be a consequence. Watch.

I'll do it again. Here's the behavior, and there's the consequence. If you know how to program consequences, you can make your dog do anything. You'll be more effective as a teacher, more effective as a parent. The practical applications of what's known about operant behavior are nearly endless.