Causal inference multi-agent reinforcement learning for traffic signal control.

Authors :: Yang, Shantian; Yang, Bo; Zeng, Zheng; Kang, Zhongfeng
Source :: Information Fusion. Jun2023, Vol. 94, p243-256. 14p.
Publication Year :: 2023
Abstract:
• A Causal-Inference (CI) model is designed for the non-stationary multi-agent environment.
• Combined with multi-agent learning, a CI-MA algorithm is proposed for traffic signal control.
• Traffic information of different granularity is fused for feature representation.
• A representation loss function and an MA loss function are designed for joint optimization.
• Experiments show that the CI-MA algorithm outperforms state-of-the-art algorithms.
A primary challenge in multi-agent reinforcement learning for traffic signal control is to produce effective cooperative traffic-signal policies in non-stationary multi-agent traffic environments. Each agent faces a local non-stationary traffic environment caused by the time-varying traffic-signal policies of adjacent agents. At the same time, the different agents themselves produce time-varying traffic-signal policies, which makes the whole traffic environment non-stationary, so the resulting traffic-signal policies may be ineffective. In this work, we propose a Causal Inference Multi-Agent reinforcement learning (CI-MA) algorithm, which alleviates the non-stationarity of multi-agent traffic environments through both feature representation and optimization, and thereby helps to produce effective cooperative traffic-signal policies.
Specifically, a Causal-Inference (CI) model is first designed to reason about and tackle the non-stationarity of multi-agent traffic environments by both acquiring feature representation distributions and deriving variational lower bounds (i.e., objective functions). Then, based on the designed CI model, we propose a CI-MA algorithm in which feature representations are acquired from the non-stationarity of multi-agent traffic environments at both the task level and the timestep level, and the acquired feature representations are used to produce cooperative traffic-signal policies and Q-values for multiple agents. Finally, the corresponding objective functions optimize the whole algorithm from the perspectives of both causal inference and multi-agent reinforcement learning. Experiments are conducted in different non-stationary multi-agent traffic environments. Results show that the CI-MA algorithm outperforms other state-of-the-art algorithms and demonstrate that the proposed algorithm, trained in synthetic-traffic environments, can be effectively transferred to both synthetic- and real-traffic environments with non-stationarity. [ABSTRACT FROM AUTHOR]