TPN: Triple network algorithm for deep reinforcement learning.

Authors :: Han, Chen
Wang, Xuanyin
Source :: Neurocomputing. Jul2024, Vol. 591, pN.PAG-N.PAG. 1p.
Publication Year :: 2024
Abstract: The target network method has been a foundation of deep reinforcement learning since DeepMind first proposed it in 2015, and almost all currently popular reinforcement learning algorithms include a target network. However, while the slowly updated target network improves the stability of an algorithm, it also reduces its performance. In this paper, the authors design a novel triple-network algorithm (TPN). TPN combines the temporal-difference (TD) algorithm and the policy gradient (PG) theorem, using three networks to estimate the state value (v), the action value (q), and the policy (π). These networks have no primary or secondary distinction; they are trained synchronously and influence each other. The authors found that this TPN architecture can greatly improve the convergence and stability of the algorithm without increasing the amount of calculation. Although TPN is only a basic framework at present, its calculation process is simple and easy to implement. Experiments show that the convergence speed and stability of TPN in discrete cases are better than those of PPO.
• Proposed a new deep reinforcement learning algorithm that does not rely on target networks.
• Presented the complete derivation process of the TPN algorithm.
• Verified the effectiveness of the proposed algorithm through experiments.
[ABSTRACT FROM AUTHOR]
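The abstract gives no update rules, so the following is only a rough illustration of the general idea it describes: three estimators for v, q, and π updated in the same step, with no frozen target copy. It is not the paper's derivation. The tabular representation, the function name `triple_update`, the shared TD target, and the use of q - v as the advantage are all assumptions made for this sketch.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def triple_update(v, q, theta, s, a, r, s_next, done, gamma=0.99, lr=0.1):
    """One synchronous update of three tabular estimators.

    Hypothetical sketch, NOT the published TPN update rules:
      v[s]      state-value estimate
      q[s, a]   action-value estimate
      theta[s]  policy logits, pi(.|s) = softmax(theta[s])
    """
    # Assumption: one bootstrapped TD target built from the *current* v,
    # so no slowly updated target copy is involved.
    target = r + (0.0 if done else gamma * v[s_next])

    # Assumption: advantage taken from the pre-update q and v estimates.
    advantage = q[s, a] - v[s]

    # Gradient of log softmax w.r.t. the logits: onehot(a) - pi.
    grad_log_pi = -softmax(theta[s])
    grad_log_pi[a] += 1.0

    # All three estimators are updated from the same pre-update values.
    v_new, q_new, theta_new = v.copy(), q.copy(), theta.copy()
    v_new[s] += lr * (target - v[s])                   # TD step for v
    q_new[s, a] += lr * (target - q[s, a])             # TD step for q
    theta_new[s] += lr * advantage * grad_log_pi       # policy-gradient step
    return v_new, q_new, theta_new
```

In this sketch the three tables influence each other only through the shared target and the advantage term; a neural-network version would replace the table writes with gradient steps on three separate networks.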