Dopamine blockade impairs the exploration-exploitation trade-off in rats
- Author
François Cinotti, Alain R. Marchand, Etienne Coutureau, Virginie Fresno, Nassim Aklil, Benoît Girard, Mehdi Khamassi; affiliations: Institut des Systèmes Intelligents et de Robotique (ISIR), Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS); Institut de Neurosciences cognitives et intégratives d'Aquitaine (INCIA), Université Bordeaux Segalen - Bordeaux 2-Université Sciences et Technologies - Bordeaux 1-SFR Bordeaux Neurosciences-Centre National de la Recherche Scientifique (CNRS); Architectures et modèles d'Adaptation et de la cognition (AMAC), Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
0301 basic medicine; Male; Operant learning; Computer science; Dopamine; Science; Decision; Decision Making; [INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]; Trade-off; Article; [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; 03 medical and health sciences; 0302 clinical medicine; Reward; medicine; Animals; [INFO.INFO-RB]Computer Science [cs]/Robotics [cs.RO]; Rats, Long-Evans; Dopamine metabolism; Multidisciplinary; [SCCO.NEUR]Cognitive science/Neuroscience; Dopaminergic; Models, Theoretical; Blockade; Rats; 030104 developmental biology; Dopamine receptor; Exploratory Behavior; Dopamine Antagonists; Medicine; Probability Learning; Neuroscience; 030217 neurology & neurosurgery; medicine.drug
- Abstract
In a volatile environment where rewards are uncertain, successful performance requires a delicate balance between exploitation of the best option and exploration of alternative choices. It has been proposed on theoretical grounds that dopamine contributes to the control of this exploration-exploitation trade-off, specifically that the higher the level of tonic dopamine, the more exploitation is favored. We demonstrate here that there is a formal relationship between the rescaling of dopamine positive reward prediction errors and the exploration-exploitation trade-off in simple non-stationary multi-armed bandit tasks. We further show in rats performing such a task that systemically antagonizing dopamine receptors greatly increases the number of random choices without affecting learning capacities. Simulations and comparison of a set of different computational models (an extended Q-learning model, a directed exploration model, and a meta-learning model) fitted to each individual rat confirm that, independently of the model, decreasing dopaminergic activity does not affect the learning rate but is equivalent to an increase in the random exploration rate. This study shows that dopamine could adapt the exploration-exploitation trade-off in decision-making when facing changing environmental contingencies.

All organisms need to make choices for their survival while being confronted with uncertainty in their environment. Animals and humans tend to exploit actions likely to provide desirable outcomes, but they must also take into account the possibility that environmental contingencies and the outcomes of their actions may vary with time. Behavioral flexibility is thus needed in volatile environments in order to detect and learn new contingencies 1. This requires a delicate balance between exploitation of known resources and exploration of alternative options that may have become advantageous.
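The abstract states a formal relationship between rescaling dopamine reward prediction errors and the exploration-exploitation trade-off. One simple way to see a relationship of this kind is sketched below, under an assumption of ours that is not necessarily the paper's own derivation: the dopaminergic manipulation acts as a multiplicative gain `kappa` on reward magnitude in a delta-rule Q-learner (`delta = kappa * r - Q`, `Q += alpha * delta`) whose values start at zero, with choices drawn from a softmax with inverse temperature `beta`. Every RPE, and therefore every Q-value, is then scaled by `kappa`, and a softmax with `beta` applied to `kappa * Q` is identical to a softmax with `kappa * beta` applied to `Q`. The learning rate `alpha` is untouched, so `kappa < 1` looks like more random exploration rather than slower learning. The task (three arms, reward probabilities 0.8/0.2, best arm switching every 50 trials) and all parameter values are hypothetical.

```python
import numpy as np


def softmax(q, beta):
    """Softmax choice probabilities with inverse temperature beta (stable form)."""
    z = beta * np.asarray(q, dtype=float)
    z -= z.max()                        # subtract the max to avoid overflow
    p = np.exp(z)
    return p / p.sum()


def q_learning(choices, rewards, n_arms, alpha, reward_gain=1.0):
    """Delta-rule Q-learning replayed over a fixed choice/reward history.

    reward_gain is a hypothetical multiplicative gain (kappa) on reward
    magnitude, used here as a stand-in for dopaminergic modulation.
    """
    q = np.zeros(n_arms)                # values start at 0, so the gain scales them exactly
    for a, r in zip(choices, rewards):
        delta = reward_gain * r - q[a]  # reward prediction error (RPE)
        q[a] += alpha * delta           # learning rate alpha is not affected by the gain
    return q


# Hypothetical non-stationary 3-armed bandit: the best arm changes every 50 trials.
rng = np.random.default_rng(0)
n_arms, n_trials = 3, 200
choices = rng.integers(n_arms, size=n_trials)
best_arm = (np.arange(n_trials) // 50) % n_arms
p_reward = np.where(choices == best_arm, 0.8, 0.2)
rewards = (rng.random(n_trials) < p_reward).astype(float)

alpha, beta, kappa = 0.3, 5.0, 0.5      # kappa < 1 mimics a reduced dopaminergic gain
q_intact = q_learning(choices, rewards, n_arms, alpha)
q_blocked = q_learning(choices, rewards, n_arms, alpha, reward_gain=kappa)

p_blocked = softmax(q_blocked, beta)            # "blocked" agent with the original beta
p_equivalent = softmax(q_intact, kappa * beta)  # intact agent with a lower beta
print(np.allclose(q_blocked, kappa * q_intact))  # True: values are rescaled by kappa
print(np.allclose(p_blocked, p_equivalent))      # True: identical choice probabilities
```

Both agents are fed the same choice and reward history, so any difference between them can only come from the gain. With `kappa = 0.5`, the "blocked" agent behaves exactly like an intact agent whose inverse temperature has been halved, i.e. one that explores more randomly.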
How this exploration/exploitation dilemma may be resolved and regulated is still a subject of active research in the fields of neuroscience and machine learning 2-5.

Dopamine holds a fundamental place in contemporary theories of learning and decision-making. The temporal evolution of phasic dopamine signals across learning has been extensively replicated and is usually taken as evidence of a role in learning 6-8 (but see alternative views in Coddington et al. 9). Dopamine reward prediction error (RPE) signals have been identified in a variety of instrumental and Pavlovian conditioning tasks 10-13. They affect plasticity and action value learning in cortico-basal networks 14-16 and have been directly related to behavioral adaptation in a number of decision-making tasks in humans, non-human primates 17 and rodents 18-21. Accordingly, it is commonly assumed that manipulations of dopamine activity affect the rate of learning, but this could be a misconception.

Besides learning, the role of dopamine in the control of behavioral performance is still unclear. Dopamine is known to modulate incentive choice (the tendency to differentially weigh costs and benefits) 22,23 and risk-taking behavior 24, as well as other motivational aspects such as effort and response vigour 25. Because dopamine is one of the key factors that may encode success or uncertainty, it might modulate decisions by biasing them toward the options with the greatest uncertainty 26,27. This would correspond to a "directed" exploration strategy 5,28,29. Alternatively, success and failure could affect tonic dopamine levels and control random exploration of all options, as recently proposed by Humphries et al. 30. This form of undirected exploration, which is often difficult
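The introduction distinguishes directed exploration (choices biased toward the most uncertain options) from undirected, random exploration of all options, which has been linked to tonic dopamine. The sketch below contrasts the two under a softmax choice rule. The count-based bonus in the hypothetical `directed_policy` function is a UCB-style proxy for uncertainty chosen for illustration, and all numbers are made up; neither is taken from the directed-exploration model fitted in the paper.

```python
import numpy as np


def softmax_policy(q, beta):
    """Undirected exploration: softmax noise spread over all options."""
    z = beta * np.asarray(q, dtype=float)
    z -= z.max()                        # subtract the max to avoid overflow
    p = np.exp(z)
    return p / p.sum()


def directed_policy(q, counts, beta, bonus_weight):
    """Directed exploration: add an uncertainty bonus to each value before the softmax.

    The bonus bonus_weight / sqrt(n + 1) is a hypothetical, UCB-like proxy for
    uncertainty (fewer samples -> larger bonus), not the model fitted in the paper.
    """
    bonus = bonus_weight / np.sqrt(np.asarray(counts, dtype=float) + 1.0)
    return softmax_policy(np.asarray(q, dtype=float) + bonus, beta)


q = np.array([0.5, 0.5, 0.5])           # equal value estimates for three options
counts = np.array([40, 40, 2])          # option 2 has rarely been sampled

p_random = softmax_policy(q, beta=5.0)
p_directed = directed_policy(q, counts, beta=5.0, bonus_weight=1.0)
print(p_random)     # uniform over the three options
print(p_directed)   # most of the probability goes to the under-sampled option 2

# Lowering beta (the proposed effect of reduced tonic dopamine) flattens choices
# among options with unequal values, without any reference to uncertainty.
q_unequal = np.array([0.8, 0.3, 0.3])
print(softmax_policy(q_unequal, beta=10.0).max() > softmax_policy(q_unequal, beta=2.0).max())  # True
```

The design separates the two mechanisms: the directed policy changes which option is favored based on how little it has been sampled, whereas lowering `beta` leaves the ranking of options intact and only makes every non-preferred option more likely to be chosen.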
- Published
- 2019