Learning and Conditioning Part 3: Schedules of Reinforcement

Ashleigh Louis, PhD, LMFT, PA

Operant conditioning is a learning process based on reinforcement and punishment. As a reminder, in terms of operant conditioning, reinforcement always means that the behavior is strengthened (more likely to occur again), and punishment always means that the behavior is weakened (less likely to occur again). As discussed in the previous article about operant conditioning, people tend to learn most effectively through reinforcement rather than punishment. The impact of a reinforcement strategy depends on the schedule of reinforcement used; that is, when and how often the reinforcer is delivered.

Schedules of reinforcement can be divided into two broad categories: continuous reinforcement and partial reinforcement. A continuous reinforcement schedule involves reinforcing the desired behavior each and every time it occurs. It is very advantageous during the initial learning process and tends to shape a behavior quickly and effectively. The problem, as you might imagine, is that it’s extremely time-consuming (and draining on other resources) and difficult to maintain. Once reinforcement starts to be missed, the behavior weakens with each unreinforced response and may disappear entirely. This process is known as operant extinction and is the main reason why continuous schedules of reinforcement need to be switched to partial reinforcement strategies in order to maintain the learned behavior.

Partial, or intermittent, reinforcement involves reinforcing the desired behavior only part of the time. It is much more resistant to extinction, but if it is used from the start, the desired behavior takes longer to establish than it would under a continuous schedule. There are four types of partial reinforcement schedules, distinguished by whether reinforcement depends on elapsed time (interval) or on the number of responses (ratio), and by whether that requirement is fixed or variable. These are fixed interval, variable interval, fixed ratio, and variable ratio.

Fixed interval schedules involve reinforcing a behavior after a specific amount of time has elapsed. A person who is paid hourly or on a monthly stipend, regardless of how hard they actually work, is being reinforced on a fixed interval schedule. The reward is predictable and steady; they know that at the end of every hour or month they will have earned a certain amount of money. Fixed interval schedules are fairly easy to maintain, but they have relatively low operant strength compared to the alternatives, which means that the person is more likely to quit or reduce responding. For example, if someone stopped being paid for their work, they would likely stop working very quickly due to the lack of reward.

Variable interval schedules are similar to fixed interval schedules, except that the reinforcer arrives at unpredictable rather than regular intervals. Reinforcement is still contingent on the passage of time, but each interval may vary from a few minutes to several days or months. Because the person cannot predict the timing of the reinforcer, they are likely to behave in a relatively steady manner, hoping the reinforcer will be coming soon. Fishing is a great example of a variable interval schedule. You may catch your first fish moments after casting the line, but it could be hours until you catch your second. If you’re set on catching fish that day, you’ll continue to wait with your line in the water until you are sufficiently reinforced (that is, until you catch your desired number of fish).

Fixed ratio schedules reinforce a behavior after a set number of responses. Rather than being contingent on time, ratio schedules are based on the actual activity of the individual. While this schedule tends to lead to a high rate of response, it can lead to burnout and/or lower quality work. For example, let’s say a parent offers to pay their child $5 each time they empty the dishwasher. It’s likely that the child will be motivated to complete this chore, but in an attempt to gain their reward as quickly and easily as possible, they’re also likely to rush through it and perhaps break a dish in the process. Similarly, a child who is rewarded for every 10 books read is likely to breeze through each book at the risk of not fully comprehending the story or gaining the benefits of mindful reading.

Variable ratio schedules are also based on actual input from the individual, but rather than being fixed, the required number of responses varies randomly. The response rate is very high and steady because the individual is totally unsure of how many responses are needed before reinforcement will occur. Consider how you feel while playing a slot machine or checking your Facebook account. Every attempt comes with an exciting rush associated with the possibility of reinforcement. No matter how many times you receive a disappointing lack of reinforcement, deep down you know it’s coming eventually, so you continue to play or check for notifications. Not surprisingly, therefore, this schedule is the one most associated with behavioral addictions and is the most resistant to operant extinction.
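
For readers who find rules easier to follow as code, the four partial schedules described above can be sketched in Python. This is only an illustrative sketch and not a model of real behavior: the function names, the use of uniform random draws for the variable schedules, and the abstract time units are all assumptions made for the example. The interval functions assume the usual laboratory convention that the reinforcer is delivered on the first response made after the interval has elapsed.

```python
import random


def fixed_ratio(n_responses, ratio):
    """Reinforce every `ratio`-th response (ratio=1 is continuous reinforcement)."""
    return [i % ratio == 0 for i in range(1, n_responses + 1)]


def variable_ratio(n_responses, mean_ratio, rng):
    """Reinforce after a random number of responses.

    Assumption: the requirement is redrawn uniformly from
    1..(2 * mean_ratio - 1), so it averages `mean_ratio`.
    """
    outcomes = []
    needed = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for _ in range(n_responses):
        count += 1
        if count >= needed:
            outcomes.append(True)
            count = 0
            needed = rng.randint(1, 2 * mean_ratio - 1)
        else:
            outcomes.append(False)
    return outcomes


def fixed_interval(response_times, interval):
    """Reinforce the first response made at least `interval` time units
    after the previous reinforcer (timing starts at 0)."""
    outcomes = []
    last_reinforced = 0.0
    for t in response_times:
        if t - last_reinforced >= interval:
            outcomes.append(True)
            last_reinforced = t
        else:
            outcomes.append(False)
    return outcomes


def variable_interval(response_times, mean_interval, rng):
    """Like fixed_interval, but the required wait is redrawn after each
    reinforcer. Assumption: uniform between 0 and 2 * mean_interval."""
    outcomes = []
    last_reinforced = 0.0
    wait = rng.uniform(0, 2 * mean_interval)
    for t in response_times:
        if t - last_reinforced >= wait:
            outcomes.append(True)
            last_reinforced = t
            wait = rng.uniform(0, 2 * mean_interval)
        else:
            outcomes.append(False)
    return outcomes


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs give the same draws
    times = list(range(1, 21))  # one response per time unit
    print("fixed ratio (every 5th):   ", fixed_ratio(20, 5))
    print("variable ratio (about 5):  ", variable_ratio(20, 5, rng))
    print("fixed interval (every 4):  ", fixed_interval(times, 4))
    print("variable interval (about 4):", variable_interval(times, 4, rng))
```

Each function returns one True or False per response, where True marks a reinforced response. In the two fixed schedules the True values fall in a regular pattern, while in the two variable schedules their positions cannot be predicted from the ones before, which is the property the slot machine and fishing examples rely on.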

The likelihood of operant extinction depends on the type of reinforcement schedule being used. Variable schedules are more resistant to extinction because, despite unsuccessful attempts, people tend to hold out hope that the next attempt will be successful and that their luck is about to change. That’s why people are more likely to sit for hours on end waiting to catch a fish or hit the jackpot but are much less likely to keep performing behaviors that are normally reinforced after a fixed number of attempts or a fixed amount of elapsed time. If you haven’t been rewarded after that month of work (no paycheck) or after emptying the dishwasher (no $5), would you try it again in the future and hope for a different outcome? Probably not!