
Orientation-Preserving Rewards’ Balancing in Reinforcement Learning

Authors :
Shangqi Guo
Feng Chen
Jinsheng Ren
Source :
IEEE Transactions on Neural Networks and Learning Systems. 33:6458-6472
Publication Year :
2022
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2022.

Abstract

Auxiliary rewards are widely used in complex reinforcement learning tasks. However, previous work can hardly avoid the interference of auxiliary rewards with the pursuit of the main rewards, which can destroy the optimal policy. It is therefore challenging but essential to balance the main and auxiliary rewards. In this article, we explicitly formulate the problem of rewards' balancing as the search for a Pareto optimal solution, with the overall objective of preserving the policy's optimization orientation for the main rewards (i.e., the policy driven by the balanced rewards is consistent with the policy driven by the main rewards alone). To this end, we propose a variant of Pareto optimality and show that it can effectively guide the policy search toward more main rewards. Furthermore, we establish an iterative learning framework for rewards' balancing and theoretically analyze its convergence and time complexity. Experiments in both discrete (grid world) and continuous (Doom) environments demonstrate that our algorithm effectively balances rewards and achieves remarkable performance compared with RL methods that rely on heuristically designed rewards. On the ViZDoom platform, our algorithm learns expert-level policies.
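
The abstract describes the idea only at a high level; the following is a minimal, illustrative Python sketch (a hypothetical example, not the paper's algorithm) of what orientation-preserving balancing means in practice: an auxiliary-reward weight is shrunk iteratively until the policy induced by the balanced reward recovers the return that is optimal for the main reward alone. The chain MDP, the greedy_policy helper, and the weight-halving schedule are all assumptions made for this example.

# Illustrative sketch only (hypothetical example, not the paper's algorithm):
# shrink the auxiliary-reward weight until the policy induced by the balanced
# reward is still optimal for the main reward alone ("orientation preserving").
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9   # tiny chain MDP; actions: 0 = left, 1 = right

def step(s, a):
    """Deterministic transition; the rightmost (goal) state is absorbing."""
    if s == N_STATES - 1:
        return s
    return max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))

# Main reward: +1 for stepping into the goal state.
r_main = np.zeros((N_STATES, N_ACTIONS))
r_main[N_STATES - 2, 1] = 1.0
# Auxiliary reward: a small bonus for moving left; if weighted too heavily,
# it pulls the policy away from the main objective.
r_aux = np.zeros((N_STATES, N_ACTIONS))
r_aux[:N_STATES - 1, 0] = 0.3

def greedy_policy(r, iters=200):
    """Value iteration on reward table r; returns the greedy policy."""
    v = np.zeros(N_STATES)
    for _ in range(iters):
        q = np.array([[r[s, a] + GAMMA * v[step(s, a)]
                       for a in range(N_ACTIONS)] for s in range(N_STATES)])
        v = q.max(axis=1)
    return q.argmax(axis=1)

def main_return(policy, s0=0, horizon=50):
    """Discounted return of a deterministic policy measured on r_main only."""
    total, s = 0.0, s0
    for t in range(horizon):
        a = policy[s]
        total += (GAMMA ** t) * r_main[s, a]
        s = step(s, a)
    return total

target = main_return(greedy_policy(r_main))   # best achievable main-reward return

# Iterative balancing: halve the auxiliary weight until the balanced policy
# matches the main-only optimum, i.e. the optimization orientation is preserved.
w = 1.0
while w > 1e-4:
    policy = greedy_policy(r_main + w * r_aux)
    if main_return(policy) >= target - 1e-9:
        break            # auxiliary reward no longer interferes
    w *= 0.5             # too much interference: reduce the auxiliary weight

print(f"balanced auxiliary weight: {w:.4f}, greedy policy: {policy}")

In this toy setting the loop settles on a weight small enough that the greedy policy still heads straight for the goal; the paper's actual method instead balances the rewards via its Pareto-based formulation rather than a fixed halving schedule.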

Details

ISSN :
2162-2388 and 2162-237X
Volume :
33
Database :
OpenAIRE
Journal :
IEEE Transactions on Neural Networks and Learning Systems
Accession number :
edsair.doi.dedup.....6f9909bac4f8c54f0bb158f4e31a8aff