
Evolutionary Development of Hierarchical Learning Structures

Authors:
Eiji Uchibe
Henrik I. Christensen
Stefan Elfwing
Kenji Doya
Source:
IEEE Transactions on Evolutionary Computation. 11:249-264
Publication Year:
2007
Publisher:
Institute of Electrical and Electronics Engineers (IEEE), 2007.

Abstract

Hierarchical reinforcement learning (RL) algorithms can learn a policy faster than standard RL algorithms. However, the applicability of hierarchical RL algorithms is limited by the fact that the task decomposition has to be performed in advance by the human designer. We propose a Lamarckian evolutionary approach for the automatic development of the learning structure in hierarchical RL. The proposed method combines the MAXQ hierarchical RL method and genetic programming (GP). In the MAXQ framework, a subtask can optimize its policy independently of its parent task's policy, which makes it possible to reuse learned policies of the subtasks. In the proposed method, the MAXQ method learns the policy based on the task hierarchies obtained by GP, while the GP explores the appropriate hierarchies using the results of the MAXQ method. To show the validity of the proposed method, we have performed simulation experiments for a foraging task in three different environmental settings. The results show a strong interconnection between the obtained learning structures and the given task environments. The main conclusion of the experiments is that the GP can find a minimal strategy, i.e., a hierarchy that minimizes the number of primitive subtasks that can be executed for each type of situation. The experimental results for the most challenging environment also show that the policies of the subtasks can continue to improve, even after the structure of the hierarchy has been evolutionarily stabilized, as an effect of the Lamarckian mechanisms.
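
The interaction described in the abstract (GP proposing task hierarchies, MAXQ learning policies on each candidate, and learned policies being inherited Lamarckian-style) can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: the dictionary-based individual representation, the names evaluate_with_maxq and evolve, and all parameter values are assumptions made only for this sketch.

```python
import random

def evaluate_with_maxq(individual, episodes=100):
    """Placeholder for MAXQ learning on one candidate hierarchy.

    A real implementation would train a value function for every subtask in
    individual["hierarchy"] and return the foraging reward obtained.
    Inherited policies are used as the starting point (Lamarckian reuse) and
    the improved policies are written back into the individual afterwards.
    """
    policies = individual.setdefault("policies", {})  # reuse inherited policies
    fitness = random.random()                         # stand-in for measured task performance
    individual["policies"] = policies                 # Lamarckian write-back
    return fitness

def evolve(population, generations=50):
    """Outer GP loop over tree-structured task hierarchies (sketch)."""
    for _ in range(generations):
        # Evaluate every hierarchy by running (placeholder) MAXQ learning on it.
        scored = sorted(population, key=evaluate_with_maxq, reverse=True)
        parents = scored[: max(1, len(scored) // 2)]
        # GP variation (crossover/mutation of subtask subtrees) would go here;
        # offspring copy their parent's hierarchy and its learned policies.
        offspring = [
            {"hierarchy": p["hierarchy"], "policies": dict(p["policies"])}
            for p in (random.choice(parents) for _ in parents)
        ]
        population = parents + offspring
    return population

# Hypothetical usage: individuals hold an (here opaque) hierarchy description.
seed = [{"hierarchy": ("root", ["collect", "recharge"]), "policies": {}} for _ in range(10)]
best = evolve(seed, generations=5)[0]
```

The key Lamarckian step in this sketch is that offspring inherit their parent's learned subtask policies rather than restarting from scratch, which is what allows the subtask policies to keep improving even after the hierarchy structure has stabilized, as reported for the most challenging environment.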

Details

ISSN:
1089-778X
Volume:
11
Database:
OpenAIRE
Journal:
IEEE Transactions on Evolutionary Computation
Accession number:
edsair.doi...........2012f1bf0d562d5d783e37a4760ba270
Full Text:
https://doi.org/10.1109/tevc.2006.890270