Goal-Directed and Habit-Like Modulations of Stimulus Processing during Reinforcement Learning

Authors :
Tom Beesley
Richard W. Morris
Mike E. Le Pelley
David Luque
Bradley N. Jack
Oren Griffiths
Thomas J. Whitford
Source :
The Journal of Neuroscience. 37:3009-3017
Publication Year :
2017
Publisher :
Society for Neuroscience, 2017.

Abstract

Recent research has shown that perceptual processing of stimuli previously associated with high-value rewards is automatically prioritized even when rewards are no longer available. It has been hypothesized that such reward-related modulation of stimulus salience is conceptually similar to an “attentional habit.” Recording event-related potentials in humans during a reinforcement learning task, we show strong evidence in favor of this hypothesis. Resistance to outcome devaluation (the defining feature of a habit) was shown by the stimulus-locked P1 component, reflecting activity in the extrastriate visual cortex. Analysis at longer latencies revealed a positive component (corresponding to the P3b, from 550–700 ms) sensitive to outcome devaluation. Therefore, distinct spatiotemporal patterns of brain activity were observed corresponding to habitual and goal-directed processes. These results demonstrate that reinforcement learning engages both attentional habits and goal-directed processes in parallel. Consequences for brain and computational models of reinforcement learning are discussed.

SIGNIFICANCE STATEMENT

The human attentional network adapts to detect stimuli that predict important rewards. A recent hypothesis suggests that the visual cortex automatically prioritizes reward-related stimuli, driven by cached representations of reward value; that is, stimulus–response habits. Alternatively, the neural system may track the current value of the predicted outcome. Our results demonstrate for the first time that visual cortex activity is increased for reward-related stimuli even when the rewarding event is temporarily devalued. In contrast, longer-latency brain activity was specifically sensitive to transient changes in reward value. Therefore, we show that both habit-like attention and goal-directed processes occur in the same learning episode at different latencies. This result has important consequences for computational models of reinforcement learning.

Details

ISSN :
1529-2401 and 0270-6474
Volume :
37
Database :
OpenAIRE
Journal :
The Journal of Neuroscience
Accession number :
edsair.doi.dedup.....480928bdf9b4c01f85b4b1e0210f9e23