Maximum Entropy Semi-Supervised Inverse Reinforcement Learning

Authors :
Julien Audiffren
Valko, M.
Lazaric, A.
Ghavamzadeh, M.
Centre de Mathématiques et de Leurs Applications (CMLA)
École normale supérieure - Cachan (ENS Cachan)-Centre National de la Recherche Scientifique (CNRS)
Sequential Learning (SEQUEL)
Inria Lille - Nord Europe
Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL)
Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)
ANR-14-CE24-0010,ExTra-Learn,Extraction et transfert de connaissances dans l'apprentissage par renforcement(2014)
European Project: 270327,EC:FP7:ICT,FP7-ICT-2009-6,COMPLACS(2011)
Source :
International Joint Conference on Artificial Intelligence, Jul 2015, Buenos Aires, Argentina, Scopus-Elsevier
Publication Year :
2015
Publisher :
HAL CCSD, 2015.

Abstract

A popular approach to apprenticeship learning (AL) is to formulate it as an inverse reinforcement learning (IRL) problem. The MaxEnt-IRL algorithm successfully integrates the maximum entropy principle into IRL and, unlike its predecessors, resolves the ambiguity arising from the fact that a possibly large number of policies could match the expert's behavior. In this paper, we study an AL setting in which, in addition to the expert's trajectories, a number of unsupervised trajectories is available. We introduce MESSI, a novel algorithm that combines MaxEnt-IRL with principles coming from semi-supervised learning. In particular, MESSI integrates the unsupervised data into the MaxEnt-IRL framework using a pairwise penalty on trajectories. Empirical results on highway driving and grid-world problems indicate that MESSI is able to take advantage of the unsupervised trajectories and improve the performance of MaxEnt-IRL.

Details

Language :
English
Database :
OpenAIRE
Journal :
International Joint Conference on Artificial Intelligence, Jul 2015, Buenos Aires, Argentina, Scopus-Elsevier
Accession number :
edsair.dedup.wf.001..84432c7a4a3b5879ff395320481d2c18