Efficient off‐policy Q‐learning for multi‐agent systems by solving dual games.

Authors :: Wang, Yan
Xue, Huiwen
Wen, Jiwei
Liu, Jinfeng
Luan, Xiaoli
Source :: International Journal of Robust & Nonlinear Control. Apr2024, Vol. 34 Issue 6, p4193-4212. 20p.
Publication Year :: 2024
Abstract: This article develops distributed optimal control policies via Q‐learning for multi‐agent systems (MASs) by solving dual games. First, based on game theory, the distributed consensus problem is formulated as a multi‐player non‐zero‐sum game, in which each agent is a player concerned only with its local performance and the MAS as a whole reaches a Nash equilibrium. Second, for each agent, the anti‐disturbance problem is formulated as a two‐player zero‐sum game, in which the control input and the external disturbance are opponents. Specifically, (1) an offline, data‐driven, off‐policy distributed tracking algorithm based on momentum policy gradient (MPG) is developed, which achieves consensus of the MAS with a guaranteed $l_2$‐bounded synchronization error; (2) an actor‐critic‐disturbance neural network is employed to implement the MPG algorithm and obtain the optimal policies. Finally, numerical and practical simulations are conducted to verify the effectiveness of the tracking policies developed via the MPG algorithm. [ABSTRACT FROM AUTHOR]
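For a single linear discrete-time agent, the two-player zero-sum game in the abstract can be illustrated with the textbook game Riccati equation. The following is a minimal, model-based sketch under stated assumptions: dynamics x_{k+1} = A x_k + B u_k + D w_k, stage cost x'Qx + u'Ru - gamma^2 w'w, and known matrices. It runs value iteration on the Q-function kernel H, where Q(x, u, w) = z'Hz with z = [x; u; w], and then reads the saddle-point gains off H. It is NOT the authors' data-driven off-policy MPG algorithm, which learns such quantities from data without a model. The function names `zero_sum_q_kernel` and `solve_zero_sum_game` are hypothetical.

```python
import numpy as np


def zero_sum_q_kernel(A, B, D, Q, R, gamma, P):
    """Hypothetical helper: kernel H with Q(x, u, w) = z' H z, z = [x; u; w]."""
    H_xx = Q + A.T @ P @ A
    H_xu = A.T @ P @ B
    H_xw = A.T @ P @ D
    H_uu = R + B.T @ P @ B
    H_uw = B.T @ P @ D
    # The disturbance is penalized by -gamma^2, so it acts as the maximizing player.
    H_ww = D.T @ P @ D - gamma**2 * np.eye(D.shape[1])
    return np.block([[H_xx, H_xu, H_xw],
                     [H_xu.T, H_uu, H_uw],
                     [H_xw.T, H_uw.T, H_ww]])


def solve_zero_sum_game(A, B, D, Q, R, gamma, iters=1000, tol=1e-12):
    """Model-based value iteration on the game Riccati equation (illustrative sketch).

    Returns P and gains K_u, K_w such that u = -K_u x and w = -K_w x.
    """
    n, m = B.shape
    P = np.zeros((n, n))
    for _ in range(iters):
        H = zero_sum_q_kernel(A, B, D, Q, R, gamma, P)
        H_xx, H_xz = H[:n, :n], H[:n, n:]
        H_zx, H_zz = H[n:, :n], H[n:, n:]
        # Eliminate z = [u; w] at its saddle point: P = H_xx - H_xz H_zz^{-1} H_zx.
        P_next = H_xx - H_xz @ np.linalg.solve(H_zz, H_zx)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    H = zero_sum_q_kernel(A, B, D, Q, R, gamma, P)
    if np.any(np.linalg.eigvalsh(H[n + m:, n + m:]) >= 0):
        raise ValueError("gamma too small: D'PD - gamma^2 I is not negative definite")
    # Stationary point of the quadratic in z: z = -H_zz^{-1} H_zx x.
    G = np.linalg.solve(H[n:, n:], H[n:, :n])
    return P, G[:m], G[m:]


if __name__ == "__main__":
    one = np.array([[1.0]])
    P, K_u, K_w = solve_zero_sum_game(one, one, one, one, one, np.sqrt(10.0))
    print(P, K_u, K_w)
```

For the scalar case A = B = D = Q = R = 1 with gamma^2 = 10, the Riccati fixed point is P = 5/3, giving u = -(2/3)x and w = x/15, so the closed-loop gain is 0.4. A data-driven method would replace the model-based construction of H with a kernel estimated from measured trajectories.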