Improvement of MADRL Equilibrium Based on Pareto Optimization.

Authors :: Zhao, Zhiruo
Cao, Lei
Chen, Xiliang
Lai, Jun
Zhang, Legui
Source :: Computer Journal; Jul2023, Vol. 66 Issue 7, p1573-1585, 13p
Publication Year :: 2023
Abstract: The concept of Nash equilibrium is introduced to address the incalculability caused by inconsistent objective functions in multi-agent deep reinforcement learning (MADRL). However, a Markov game may have multiple equilibria, and how to select a stable and optimal one is worth studying. Besides the solution concept, keeping the balance between exploration and exploitation is another key issue in reinforcement learning. Building on methods that can converge to a Nash equilibrium, this paper improves them through Pareto optimization. To alleviate the overfitting caused by Pareto optimization and the non-convergence caused by strategy changes, we use stratified sampling in place of random sampling. In addition, our methods are trained through fictitious self-play to make full use of self-learning experience. Experiments on the MAgent platform show that the proposed methods are not only far better than traditional methods but also match or even surpass state-of-the-art MADRL methods. [ABSTRACT FROM AUTHOR]
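
The abstract does not say how Pareto optimization is applied to the candidate equilibria. As a rough illustration of the general idea only, the sketch below filters a set of candidate equilibria down to those whose per-agent payoff vectors are not Pareto-dominated. The function names (`dominates`, `pareto_front`) and the payoff values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if payoff vector a Pareto-dominates b (higher is better).

    a dominates b when every agent does at least as well under a,
    and at least one agent does strictly better.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(payoffs: List[Sequence[float]]) -> List[int]:
    """Return the indices of payoff vectors that no other vector dominates."""
    return [
        i
        for i, p in enumerate(payoffs)
        if not any(dominates(q, p) for j, q in enumerate(payoffs) if j != i)
    ]


# Hypothetical per-agent payoffs at three candidate equilibria
# of a two-agent game (illustrative numbers, not from the paper).
candidates = [(2.0, 2.0), (1.0, 1.0), (3.0, 0.5)]
front = pareto_front(candidates)
print(front)
```

In this toy example the second candidate is dropped because both agents do better under the first, while the first and third remain because neither improves on the other for both agents at once.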