
Bootstrapping Q-Learning for Robotics from Neuro-Evolution Results

Authors :
Matthieu Zimmer
Stéphane Doncieux
Architectures et modèles d'Adaptation et de la cognition (AMAC)
Institut des Systèmes Intelligents et de Robotique (ISIR)
Sorbonne Université (SU) - Centre National de la Recherche Scientifique (CNRS)
This work has been supported by the FET project DREAM, which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 640891. The authors would like to thank Olivier Sigaud for his comments on a first draft of the article. The data has been numerically analysed with the free software packages GNU Octave and scikit-learn. Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by INRIA and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr).
Source :
IEEE Transactions on Cognitive and Developmental Systems, Institute of Electrical and Electronics Engineers, Inc., 2017, ⟨10.1109/TCDS.2016.2628817⟩
Publication Year :
2017
Publisher :
HAL CCSD, 2017.

Abstract

Reinforcement learning problems are hard to solve in a robotics context, as classical algorithms rely on discrete representations of actions and states, whereas in robotics both are continuous. A discrete set of actions and states can be defined, but doing so requires an expertise that may not be available, in particular in open environments. A process is proposed that lets a robot build its own representation for a reinforcement learning algorithm. The principle is to first use a direct policy search in the sensorimotor space, i.e. with no predefined discrete sets of states or actions, and then to extract discrete actions from the corresponding learning traces and identify the relevant dimensions of the state in order to estimate the value function. Once this is done, the robot can apply reinforcement learning (1) to be more robust to new domains and, if required, (2) to learn faster than a direct policy search. This approach takes the best of both worlds: first learning in a continuous space to avoid the need for a specific representation, at the price of a long learning process and poor generalization, and then learning with an adapted representation to be faster and more robust.
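The abstract outlines a two-stage pipeline: a direct policy search produces learning traces in the continuous sensorimotor space, from which a discrete action set and the relevant state dimensions are extracted so that a classical reinforcement learning algorithm such as Q-learning can take over. The Python sketch below illustrates one plausible reading of that pipeline, under assumptions of our own: discrete actions are obtained by k-means clustering of the motor commands recorded in the traces (scikit-learn, mentioned in the acknowledgments, provides KMeans), and the value function is then estimated with tabular Q-learning over binned state dimensions. The names extract_actions and QLearner, the bin choices, and the synthetic usage data are hypothetical illustrations, not the authors' code.

    import numpy as np
    from sklearn.cluster import KMeans


    def extract_actions(trace_actions, n_actions=8, seed=0):
        """Cluster continuous motor commands from the traces into a discrete action set."""
        km = KMeans(n_clusters=n_actions, n_init=10, random_state=seed)
        km.fit(trace_actions)
        return km.cluster_centers_  # each cluster centre becomes one discrete action


    class QLearner:
        """Tabular Q-learning over binned state dimensions and the extracted actions."""

        def __init__(self, state_bins, n_actions, alpha=0.1, gamma=0.95, eps=0.1):
            # state_bins: one array of bin edges per state dimension kept in the representation
            self.bins = state_bins
            self.q = np.zeros(tuple(len(b) + 1 for b in state_bins) + (n_actions,))
            self.alpha, self.gamma, self.eps = alpha, gamma, eps

        def _index(self, state):
            # Map a continuous state to a tuple of bin indices (one per kept dimension).
            return tuple(np.digitize(s, b) for s, b in zip(state, self.bins))

        def act(self, state, rng):
            # Epsilon-greedy action selection over the discrete action set.
            idx = self._index(state)
            if rng.random() < self.eps:
                return int(rng.integers(self.q.shape[-1]))
            return int(np.argmax(self.q[idx]))

        def update(self, state, action, reward, next_state):
            # Standard one-step Q-learning update on the tabular value estimate.
            idx, nidx = self._index(state), self._index(next_state)
            target = reward + self.gamma * np.max(self.q[nidx])
            self.q[idx + (action,)] += self.alpha * (target - self.q[idx + (action,)])


    # Hypothetical usage on synthetic traces (placeholders for real policy-search data):
    rng = np.random.default_rng(0)
    traces = rng.normal(size=(500, 2))                 # recorded 2-D motor commands
    actions = extract_actions(traces, n_actions=4)
    agent = QLearner(state_bins=[np.linspace(-1, 1, 5)] * 2, n_actions=len(actions))
    a = agent.act(np.array([0.2, -0.3]), rng)
    agent.update(np.array([0.2, -0.3]), a, reward=1.0, next_state=np.array([0.25, -0.2]))

In this reading, the clustering step stands in for the extraction of discrete actions from neuro-evolution traces, and the choice of state bins stands in for the identification of relevant state dimensions; the paper's contribution is precisely to derive both from the traces rather than hand-pick them as done here.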

Details

Language :
English
ISSN :
2379-8920 and 2379-8939
Database :
OpenAIRE
Journal :
IEEE Transactions on Cognitive and Developmental Systems, Institute of Electrical and Electronics Engineers, Inc., 2017, ⟨10.1109/TCDS.2016.2628817⟩
Accession number :
edsair.doi.dedup.....4ca7c0f0e7052c8c147fc48d8ef003ea
Full Text :
https://doi.org/10.1109/TCDS.2016.2628817