
Monte Carlo Tree Search for Policy Optimization

Authors:
Mykel J. Kochenderfer
Xiaobai Ma
Zongzhang Zhang
Katherine Driggs-Campbell
Source:
IJCAI
Publication Year:
2019
Publisher:
International Joint Conferences on Artificial Intelligence Organization, 2019.

Abstract

Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization based on Monte-Carlo tree search and gradient-free optimization. Our method, called Monte-Carlo tree search for policy optimization (MCTSPO), provides a better exploration-exploitation trade-off through the use of the upper confidence bound heuristic. We demonstrate improved performance on reinforcement learning tasks with deceptive or sparse reward functions compared to popular gradient-based and deep genetic algorithm baselines.
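The abstract's key mechanism, using the upper confidence bound (UCB) heuristic to balance exploration and exploitation while searching over policy parameters without gradients, can be illustrated with a small sketch. The Python below is a minimal toy, not the authors' MCTSPO implementation: the node structure, Gaussian perturbation expansion, exploration constant, and the one-dimensional deceptive reward are all illustrative assumptions.

```python
# Toy sketch: UCB-guided tree search over policy parameter perturbations.
# NOT the MCTSPO algorithm from the paper; all details here are assumptions
# chosen only to illustrate the exploration-exploitation trade-off.
import math
import random

class Node:
    def __init__(self, params, parent=None):
        self.params = params          # candidate policy parameters
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_return = 0.0

    def ucb_score(self, c=1.4):
        # UCB: mean return (exploitation) plus an exploration bonus.
        if self.visits == 0:
            return float("inf")
        exploit = self.total_return / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def evaluate(params):
    # Stand-in for a policy rollout: a 1-D deceptive reward whose
    # narrow global optimum at x = 3 is easy to miss.
    x = params[0]
    return -abs(x - 3.0) + (2.0 if abs(x - 3.0) < 0.1 else 0.0)

def expand(node, sigma=0.5, n_children=4):
    # Gradient-free expansion: Gaussian perturbations of the parent's parameters.
    for _ in range(n_children):
        child = Node([p + random.gauss(0.0, sigma) for p in node.params],
                     parent=node)
        node.children.append(child)

def search(root, iterations=500):
    expand(root)
    for _ in range(iterations):
        node = root
        # Selection: descend by UCB score until reaching a leaf.
        while node.children:
            node = max(node.children, key=Node.ucb_score)
        if node.visits > 0:
            expand(node)
            node = random.choice(node.children)
        reward = evaluate(node.params)
        # Backpropagation: update visit counts and returns up to the root.
        while node is not None:
            node.visits += 1
            node.total_return += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits)

if __name__ == "__main__":
    best = search(Node([0.0]))
    print("best params:", best.params,
          "mean return:", best.total_return / best.visits)
```

On a deceptive reward like this toy one, the UCB bonus keeps revisiting under-sampled branches rather than committing to the first locally good perturbation, which is the trade-off the abstract credits for the method's improved performance.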

Details

Database:
OpenAIRE
Journal:
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Accession number:
edsair.doi...........74659625b0d735f1a836d5fd7d9e48b7
Full Text:
https://doi.org/10.24963/ijcai.2019/432