Improving actor-critic structure by relatively optimal historical information for discrete system.
- Authors
- Zhang, Xinyu; Li, Weidong; Zhu, Xiaoke; Jing, Xiao-Yuan
- Subjects
- DISCRETE systems; DISTRIBUTION (Probability theory); INFORMATION storage & retrieval systems; ACTING education; REINFORCEMENT learning
- Abstract
Recently, actor-critic neural networks have been widely used in many reinforcement learning tasks. The structure consists of two main parts: (i) an actor module, which outputs a probability distribution over actions, and (ii) a critic module, which outputs a predicted value for the current environment state. Actor-critic networks usually need expert demonstrations to provide appropriate pre-training for the actor module, but such demonstration data is often hard or even impossible to obtain. Moreover, most of these networks, such as those used in maze and robot-control tasks, suffer from a lack of proper pre-training and from unstable error propagation from the critic module to the actor module, which results in poor and unstable performance. Therefore, a specially designed module called relatively optimal historical information learning (ROHI) is proposed. The ROHI module records the historically explored information and obtains the relatively optimal information through a customized merging algorithm. This relatively optimal historical information is then used to assist in training the actor module during the main learning process. Two complex experimental environments, the complex maze problem and the flipping game, are introduced to evaluate the effectiveness of the proposed module. The experimental results demonstrate that models extended with ROHI significantly improve the success rate of the original actor-critic models and slightly decrease the number of iterations required to reach the stable phase of value-based networks. [ABSTRACT FROM AUTHOR]
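The abstract describes ROHI as a store of historically explored information that a merging algorithm reduces to the "relatively optimal" part, which then guides actor training. The paper's exact data structures and merge rule are not given here, so the following is only a minimal illustrative sketch under our own assumptions: a per-state buffer that, when a finished episode is merged in, keeps the action from the highest-return episode seen so far for each visited state (the class and method names are hypothetical).

```python
class ROHIBuffer:
    """Illustrative sketch of a 'relatively optimal historical information'
    store (assumed design, not the paper's exact algorithm): for each
    discrete state, remember the action taken in the best-returning
    episode that visited it."""

    def __init__(self):
        # state -> (best episode return seen, action taken in that episode)
        self.best = {}

    def merge(self, episode, episode_return):
        """Merge a finished episode, given as (state, action) pairs.

        The merge rule assumed here: a stored entry is replaced only when
        the new episode achieved a strictly higher return.
        """
        for state, action in episode:
            prev = self.best.get(state)
            if prev is None or episode_return > prev[0]:
                self.best[state] = (episode_return, action)

    def advice(self, state):
        """Return the relatively optimal action for a state, or None if
        the state has never been visited. A trainer could use this as an
        auxiliary supervision target for the actor module."""
        entry = self.best.get(state)
        return None if entry is None else entry[1]
```

During the main learning loop, `advice(state)` could supply an extra imitation-style loss term for the actor alongside the usual critic-driven policy-gradient update; that combination is our reading of "assist in training the actor module", not a quote of the paper's method.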
- Published
- 2022