Dual experience replay-based TD3 for single intersection signal control.

Authors :: Gao, Yichao
Zhou, Dake
Shen, Yaqi
Yang, Xin
Source :: Journal of Supercomputing. Jul2024, Vol. 80 Issue 11, p15161-15182. 22p.
Publication Year :: 2024
Abstract: Traffic signal control methods driven by Deep Reinforcement Learning (DRL) have shown better performance than traditional methods, but they suffer from low sample utilization. To address this problem, this paper presents a novel Twin Delayed Deep Deterministic Policy Gradient with Dual Buffer (TD3_DB) for traffic signal control. In the proposed framework, two experience buffers store important samples and normal samples separately, and the proportion between the two buffers is adjusted adaptively. In addition, lane pressure, which describes the dynamic features of lane traffic flow, is used in the state design of the TD3 agent, which enhances the agent's perception of intersections. Comprehensive experiments on different traffic flow modes show that the dual experience replay scheme improves sample utilization and that the proposed TD3_DB performs better than other methods, such as the original TD3 and Proximal Policy Optimization (PPO), effectively reducing vehicle queue length and waiting time. [ABSTRACT FROM AUTHOR]
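
The dual experience replay idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how a sample is judged "important" or how the proportion between the two buffers is adapted, so both are assumptions here: importance is a fixed threshold on a caller-supplied score, and the share of each batch drawn from the important buffer (`important_share`) is left to the caller. The names `DualReplayBuffer`, `threshold`, and `important_share` are hypothetical.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Two FIFO buffers: one for 'important' transitions, one for 'normal' ones.

    Assumptions (not from the abstract): importance is decided by a fixed
    threshold on a caller-supplied score, and the sampling share of the
    important buffer is chosen by the caller on every call.
    """

    def __init__(self, capacity, threshold, seed=0):
        self.important = deque(maxlen=capacity)  # oldest entries drop out first
        self.normal = deque(maxlen=capacity)
        self.threshold = threshold
        self.rng = random.Random(seed)

    def add(self, transition, score):
        # Route the transition by its score (hypothetical criterion).
        target = self.important if score >= self.threshold else self.normal
        target.append(transition)

    def sample(self, batch_size, important_share):
        # Take about batch_size * important_share items from the important
        # buffer, capped by what it holds, and fill the rest from the normal
        # buffer (also capped, so the batch may come back short).
        n_imp = min(round(batch_size * important_share), len(self.important))
        n_norm = min(batch_size - n_imp, len(self.normal))
        batch = self.rng.sample(list(self.important), n_imp)
        batch += self.rng.sample(list(self.normal), n_norm)
        return batch
```

In a training loop, an adaptive scheme would change `important_share` between calls to `sample`; the rule for doing so is deliberately left out because the abstract does not specify it.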