The Deep Magic of Behaviorism

Reinforcement schedules combine to cause complex behavior.

Behaviorism gets far too little credit. The boilerplate criticism against it is that the "simple" principles of reward and punishment aren't enough to account for complex human behavior. That's like saying the weather isn't caused by physics, or life isn't caused by chemistry. The whole of the universe is the result of simple forces combining to produce staggering complexity. You still have to understand the simple forces before you can hope to grasp the bigger picture.

This article starts with the most basic tenet of behaviorism (rewards increase behavior and punishments reduce behavior) and takes one step upwards in complexity by adding the variable of time. When behaviors are systematically rewarded or punished over time, strange and complex interactions take shape.

Before we begin, let's establish some common terms:

Continuous Schedules

When a behavior is rewarded or punished each and every time it occurs, this is called a "continuous schedule." It's the fastest way to condition a new behavior, but the results are extremely easy to extinguish. As soon as the rewards stop coming, so does the behavior.

Punishment can be quite effective when administered on a continuous schedule, but the results are still prone to extinction. Even worse, inconsistent punishment can actually make the unwanted behavior more persistent (see below). If a punishment isn't strong enough, it may increase the size of the punishment that is ultimately needed to stop the misbehavior. Finally, punishment tends to produce unwanted side-effects like anger and resentment, even when used correctly.

In either case, you get the best results by starting with a continuous schedule and then gradually transitioning into one of the interval or ratio schedules described below.
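You can sketch that acquisition-then-extinction pattern in a toy model (my own invention, not a lab-grade one): response strength moves toward 1 on each rewarded trial and decays toward 0 on each unrewarded one.

```python
import random

def run_trials(n_trials, reward_prob, strength=0.0, lr=0.2):
    """Update response strength: move toward 1 when rewarded, toward 0 when not."""
    for _ in range(n_trials):
        if random.random() < reward_prob:
            strength += lr * (1.0 - strength)   # reward strengthens the behavior
        else:
            strength -= lr * strength           # no reward weakens it (extinction)
    return strength

# Continuous schedule: every response rewarded -> fast acquisition
acquired = run_trials(20, reward_prob=1.0)
# Rewards stop entirely -> extinction nearly as fast
extinguished = run_trials(20, reward_prob=0.0, strength=acquired)
print(f"after acquisition: {acquired:.2f}, after extinction: {extinguished:.2f}")
# after acquisition: 0.99, after extinction: 0.01
```

Notice the symmetry: under a continuous schedule, the same learning rate that builds the behavior quickly tears it down just as quickly once rewards stop.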

Interval Schedules

On an "interval schedule," a certain amount of time must pass before a behavior is rewarded or punished. The interval may or may not be predictable (see below). These situations characterize many aspects of modern life and tend to produce remarkably regular patterns of behavior.

Under a fixed interval schedule, the time at which a behavior will be rewarded or punished is predictable. This is usually because the interval between rewards/punishments is always the same. Under fixed interval reinforcement, behavior vanishes immediately after a reward and spikes as the next reward time approaches. The same pattern emerges when behavior is only rewarded in the presence of a specific signal, like a bell chime. Ring the bell and behavior explodes; silence it and behavior evaporates like fog.

Fixed interval punishment creates the opposite pattern: behavior spikes shortly after punishment and drops off as the time for the next punishment draws near.

When you wake up before your alarm goes off in the morning, that's a result of fixed interval negative reinforcement. You wake up in order to turn off the alarm, which stops it from blaring in your ear. You could have woken up to turn it off at any time throughout the night, but the unwanted stimulus always occurs in the morning. As the appointed hour approaches, your preventative behavior becomes increasingly likely until... Bam! You slap that snooze button.

Variable interval schedules make things less predictable by changing the period of time between rewards/punishments and not providing any warning. They take the sharp peaks and valleys of fixed interval schedules and flatten them out a little. Otherwise, the same rules apply.
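The delivery rule itself is simple enough to state in code. Here's a minimal sketch (class name and numbers are my own, purely illustrative) of a schedule that rewards the first response after an interval has elapsed, with an option to randomize the interval:

```python
import random

class IntervalSchedule:
    """Reward the first response after an interval has elapsed.

    Fixed interval: the same delay every time. Variable interval: a random
    delay (here, uniform around the mean). A toy model, not a lab protocol.
    """
    def __init__(self, mean_interval, variable=False):
        self.mean = mean_interval
        self.variable = variable
        self.next_reward_time = self._draw()

    def _draw(self):
        if self.variable:
            return random.uniform(0.5 * self.mean, 1.5 * self.mean)
        return self.mean

    def respond(self, t):
        """Return True if a response at time t earns a reward."""
        if t >= self.next_reward_time:
            self.next_reward_time = t + self._draw()
            return True
        return False

fixed = IntervalSchedule(mean_interval=10)
results = [fixed.respond(t) for t in (3, 9, 10, 11, 25)]
print(results)  # [False, False, True, False, True]
```

Responding at t=3 or t=9 earns nothing, which is exactly why behavior sags early in the interval and scallops upward as the reward time approaches.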

Ratio Schedules

On "ratio schedules," behavior is only rewarded or punished after it has occurred a certain number of times. Once again, the number of times a behavior can occur without reward or punishment may or may not be predictable. Regardless, ratio reinforcement produces very high rates of behavior; it characterizes many kinds of compulsions and addictions.

Under a fixed ratio reinforcement schedule, the number of behaviors it takes to get a reward is always the same. This produces a steady rate of high-frequency behavior as the subject tries to get as many rewards as possible, as quickly as possible. This behavior is still easy to extinguish, but the process may take longer since the subject has learned to persist without rewards to some extent.

Fixed ratio schedules tend to weaken punishments, because the subject learns that each punishment will "buy" them a free pass on future behavior. If they stand to gain more than they lose, most people will just take their lumps and keep misbehaving.

Variable ratio schedules make things much more interesting. They happen when there's no way to predict how many behaviors it will take to earn a reward/punishment. This provokes even higher rates of behavior than fixed ratio reinforcement, if at a slightly less steady pace, and it's the most difficult type of learning to extinguish. The reason is simple: the subject has been trained to persist in their behavior as long as it takes to get that reward.

When attempting to train new behaviors, it's very difficult to start with a variable ratio schedule. It demands too much of the trainee before any reinforcement is given. Early on, it's best to use a very small ratio (ideally 1:1, which is continuous reinforcement) and then increase the ratio gradually to make the behavior more resistant to extinction.
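One way to capture "trained to persist as long as it takes" in a toy model: track the longest run of unrewarded responses the subject has ever endured during training, and assume they only quit after exceeding some multiple of it. The functions and the patience rule below are illustrative assumptions of mine, not standard formulas.

```python
import random

def train_longest_drought(n_rewards, ratio_fn):
    """Longest run of unrewarded responses the subject experiences in training."""
    longest = 0
    for _ in range(n_rewards):
        longest = max(longest, ratio_fn() - 1)  # responses made before each reward
    return longest

def responses_before_quitting(longest_drought, patience=2):
    """Toy rule: quit after exceeding a multiple of the worst drought ever seen."""
    return patience * longest_drought + 1

random.seed(1)
continuous = train_longest_drought(50, lambda: 1)                      # reward every response
variable = train_longest_drought(50, lambda: random.randint(1, 20))    # roughly VR-10
print(responses_before_quitting(continuous), responses_before_quitting(variable))
```

The continuously reinforced subject has never experienced a drought, so the first dry spell signals that the game is over. The variable ratio subject has weathered long droughts before and keeps responding well past the point where rewards have actually stopped.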

Most attempts to use punishment end up on variable ratio schedules simply because it's hard to catch the unwanted behavior each and every time it occurs. This is especially true when misbehavior is also being reinforced, which is true of most criminal acts. Unfortunately, this causes the unwanted behavior to become increasingly resistant to punishment. Yet another reason to prefer reinforcement whenever possible.

Unscheduled Reward & Punishment

Sometimes, the universe (or a particularly devious researcher) doles out rewards and punishments at random, without concern for anyone's behavior. You could call this "unscheduled" reinforcement/punishment.

The funny thing is, reinforcement tends to happen whether rewards are intentional or not. A random reward increases the odds that you'll repeat whatever behavior you happened to be doing at the time, which increases the odds that you'll be doing it again when the next random reward drops in your lap. This feedback loop accounts for a lot of superstitious behavior.
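This feedback loop is easy to simulate. In the sketch below (a toy model, all parameters invented), rewards arrive at random and strengthen whatever the subject happened to be doing at that moment. Starting from a 50/50 coin flip, individual subjects tend to drift toward doing the ritual nearly always or nearly never, even though the rewards carry no information at all.

```python
import random

def drift(seed, steps=500):
    """Random rewards reinforce whatever behavior coincided with them."""
    random.seed(seed)
    p = 0.5                                # odds of performing the "lucky ritual"
    for _ in range(steps):
        doing_it = random.random() < p
        if random.random() < 0.1:          # unscheduled reward, blind to behavior
            p = p + 0.1 * (1 - p) if doing_it else p - 0.1 * p
    return p

ends = [drift(seed) for seed in range(20)]
avg_drift = sum(abs(p - 0.5) for p in ends) / len(ends)
print(f"average drift away from 50/50 after 500 steps: {avg_drift:.2f}")
```

The update rule is perfectly symmetric; no outcome is "supposed" to win. The drift comes entirely from the loop the paragraph describes: a lucky coincidence makes the ritual more likely, which makes the next coincidence more likely to involve the ritual.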

When there's no way for a subject to avoid or prevent punishment, their behavior tends to decrease across the board. Eventually, they stop doing anything at all. This is learned helplessness, and it's a cornerstone of modern theories of depression.

Compound Schedules

Most real-world situations involve two or more schedules of reward and/or punishment laid one atop another. This is where the real complexity happens. I've already described how trainers get the most bang for their buck by starting with continuous reinforcement and then transitioning into an interval or ratio schedule, but that's bush league.

Gambling is the poster child for variable ratio reinforcement, but it's not as simple as that. When a game of chance requires players to wager something, they face a potential reward and a potential (negative) punishment every time they roll the dice (or pull the lever, or spin the wheel, or whatever). The trick is to keep the individual punishments small enough that they don't overpower the reinforcement.

Casinos use large, infrequent rewards to make people insensitive to small-but-frequent losses. If the rewards and punishments were similar in magnitude, the low-ratio punishment schedule would win out. If you changed the ratios to make rewards more frequent, gambling would increase. (Profits, however, would fall.) Finding the sweet spot is a matter of balancing the ratios of both schedules.
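The arithmetic behind that sweet spot is plain expected value. Here's a hypothetical slot machine with invented numbers, just to make the trade-off concrete:

```python
# Hypothetical slot machine: every pull costs 1 credit; a win pays 50 credits.
wager, payout = 1.0, 50.0

# At a 1.8% win rate the player loses 10 cents per pull on average:
p_win = 0.018
house_edge = wager - p_win * payout
print(f"house keeps {house_edge:.2f} credits per pull")

# Nudge the win rate upward and play gets more rewarding -- but profits fall:
for p in (0.018, 0.019, 0.020, 0.021):
    print(f"p_win={p:.3f}  player EV per pull: {p * payout - wager:+.3f}")
```

At a 2% win rate the machine breaks even, and above that it pays out more than it takes in. The casino's entire design problem is keeping the variable ratio reward schedule salient while holding the win rate just below that line.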

The same applies to crime, commerce, relationships, and everything else human beings do. The greater the number of schedules involved, the more complex the behavior becomes, but the underlying principles are always the same. Reinforcement schedules define the Skinner boxes in which we live.