
Decentralized adaptive temporal-difference learning over time-varying networks and its finite-time analysis.

Authors :
Xie, Ping
Wang, Xin
Yao, Shan
Liu, Muhua
Zhao, Xuhui
Zheng, Ruijuan
Source :
Neurocomputing. Nov 2024, Vol. 604.
Publication Year :
2024

Abstract

In reinforcement learning, centralized temporal-difference (TD) learning is commonly used to solve the policy evaluation problem. However, the decentralized adaptive variant of TD learning has rarely been investigated in multi-agent reinforcement learning. To fill this gap, based on linear function approximation, we propose a decentralized adaptive TD learning algorithm over time-varying networks, referred to as D-AdaTD. We rigorously analyze the convergence performance of D-AdaTD, establishing an explicit finite-time analysis under different step-sizes. Specifically, under constant step-sizes, the average estimated value function converges to a neighborhood of the optimal value at rate O(1/(k+1)), and each agent's estimated parameter converges to a neighborhood of the optimal parameter at rate O(ξ^k), where k is the number of iterations and ξ ∈ (0, 1). Under diminishing step-sizes, the average estimated value function converges to the optimal value at rate O((1 + log(k+1))/k), and the average estimated parameter converges to the optimal parameter at rate O((1 + log(k+1))/(k+1)). In addition, we evaluate the performance of D-AdaTD via simulation experiments, which are commonly insufficient in the existing decentralized TD learning literature; the experimental results also validate the effectiveness of D-AdaTD. [ABSTRACT FROM AUTHOR]
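
The abstract does not give the authors' pseudocode, but the ingredients it names (linear function approximation, consensus over a time-varying network, and adaptive step-sizes) can be sketched as follows. This is a minimal illustrative sketch, not the paper's D-AdaTD algorithm: the Adam-style second-moment scaling, the doubly-stochastic mixing matrices W_seq, and all constants are assumptions introduced here for illustration.

```python
# Illustrative sketch only (not the authors' exact D-AdaTD): decentralized TD(0)
# with linear function approximation, consensus over a time-varying mixing matrix,
# and an adaptive (second-moment-scaled) step-size. All names are hypothetical.
import numpy as np

def decentralized_adaptive_td_sketch(features, rewards, next_features, W_seq,
                                     gamma=0.95, alpha=0.05, beta=0.999, eps=1e-8):
    """features, next_features: (n_iters, n_agents, dim); rewards: (n_iters, n_agents);
    W_seq: (n_iters, n_agents, n_agents) doubly-stochastic mixing matrices."""
    n_iters, n_agents, dim = features.shape
    theta = np.zeros((n_agents, dim))   # local parameter estimates
    v = np.zeros((n_agents, dim))       # second-moment accumulator for adaptive scaling
    for k in range(n_iters):
        phi, phi_next, r = features[k], next_features[k], rewards[k]
        # local TD(0) semi-gradient for each agent
        td_err = r + gamma * (phi_next * theta).sum(axis=1) - (phi * theta).sum(axis=1)
        g = td_err[:, None] * phi
        # adaptive step-size scaling (Adam-like second moment)
        v = beta * v + (1.0 - beta) * g ** 2
        step = alpha / (np.sqrt(v) + eps)
        # consensus over the time-varying network, then local adaptive TD update
        theta = W_seq[k] @ theta + step * g
    return theta
```

In such a scheme, the mixing step W_seq[k] @ theta drives the agents' parameters toward agreement while each agent takes a locally scaled TD step; the finite-time rates quoted in the abstract characterize how fast the averaged iterates approach the optimal value function and parameter under constant versus diminishing step-sizes.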

Details

Language :
English
ISSN :
0925-2312
Volume :
604
Database :
Academic Search Index
Journal :
Neurocomputing
Publication Type :
Academic Journal
Accession number :
179364708
Full Text :
https://doi.org/10.1016/j.neucom.2024.128311