Boosting Policy Learning in Reinforcement Learning via Adaptive Intrinsic Reward Regulation
- Author
Qian Zhao, Jinhui Han, and Mao Xu
- Subjects
Exploration, intrinsic reward, reinforcement learning, learning efficiency, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Reinforcement learning employs heuristic intrinsic rewards to help intelligent agents learn effectively and explore the environment. In environments with sparse rewards in particular, it is challenging for agents to reach goals or obtain extrinsic rewards through random exploration, and appropriate intrinsic rewards can significantly boost learning. However, using intrinsic rewards requires a careful balance between exploration and exploitation, which is typically controlled by a coefficient, and different settings of this coefficient can lead to significant differences in learning efficiency and performance. Hence, this paper presents a novel approach that regulates intrinsic rewards by adaptively tuning their coefficients, with the aim of enhancing the performance of existing intrinsic reward techniques. The primary contributions of this study are threefold: 1) designing a coefficient that adjusts the magnitude of intrinsic rewards and adapts dynamically based on the return curve; 2) developing an episode-wise adjustment strategy to improve the sample efficiency of intrinsic reward methods; and 3) modifying the advantage function in policy gradient methods to mitigate training instability caused by changes in the regulated intrinsic rewards. To evaluate the proposed method, we conducted experiments in both 2D and 3D maze environments with sparse rewards, combining it with several intrinsic reward approaches. The results demonstrate that the proposed method effectively enhances learning efficiency and improves the performance of several existing approaches.
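The abstract describes the core mechanism only at a high level: the agent's shaped reward combines the extrinsic reward with an intrinsic reward scaled by a coefficient, and that coefficient is adjusted episode-wise from the return curve. The sketch below illustrates one plausible form of such a regulation rule; the function name `update_intrinsic_coefficient`, the moving-average window, and the step sizes are assumptions for illustration and are not the paper's exact adaptation rule or its modified advantage function.

```python
import numpy as np

def update_intrinsic_coefficient(beta, returns, window=10,
                                 beta_min=0.0, beta_max=1.0, step=0.05):
    """Hypothetical episode-wise regulation of the intrinsic-reward coefficient.

    If the recent episodic return is improving, reduce the weight on the
    intrinsic (exploration) reward; if it stagnates or falls, increase it.
    """
    if len(returns) < 2 * window:
        return beta  # not enough history to estimate a trend
    recent = np.mean(returns[-window:])
    previous = np.mean(returns[-2 * window:-window])
    if recent > previous:
        # Return curve rising: rely more on the extrinsic signal.
        return max(beta_min, beta - step)
    # Return curve flat or falling: encourage more exploration.
    return min(beta_max, beta + step)

# Pseudo-usage inside a training loop (per step / per episode):
#   r_total = r_extrinsic + beta * r_intrinsic        # shaped reward per step
#   episodic_returns.append(episode_extrinsic_return) # after each episode
#   beta = update_intrinsic_coefficient(beta, episodic_returns)
```

Because the coefficient changes between episodes, the scale of the shaped reward (and hence the advantage estimates) also shifts, which is the instability the paper's modified advantage function is meant to address.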
- Published
- 2024