Decentralized adaptive temporal-difference learning over time-varying networks and its finite-time analysis
- Source :
- Neurocomputing. Nov 2024, Vol. 604.
- Publication Year :
- 2024
Abstract
- In reinforcement learning, centralized temporal-difference (TD) learning is commonly used to solve the policy evaluation problem. However, decentralized adaptive variants of TD learning have rarely been investigated in multi-agent reinforcement learning. To fill this gap, we propose a decentralized adaptive TD learning algorithm over time-varying networks, based on linear function approximation, referred to as D-AdaTD. We rigorously analyze the convergence of D-AdaTD, establishing an explicit finite-time analysis under different step-sizes. Specifically, under constant step-sizes, the average estimated value function converges to a neighborhood of the optimal value at rate O(1/(k+1)), and each agent's estimated parameter converges to a neighborhood of the optimal parameter at rate O(ξ^k), where k is the number of iterations and ξ ∈ (0, 1). Under diminishing step-sizes, the average estimated value function converges to the optimal value at rate O((1 + log(k+1))/k), and the average estimated parameter converges to the optimal parameter at rate O((1 + log(k+1))/(k+1)). In addition, we evaluate the performance of D-AdaTD via simulation experiments, which are often lacking in existing work on decentralized TD learning. The experimental results further validate the effectiveness of D-AdaTD.
- Subjects :
- *MACHINE learning
*TIME-varying networks
*NEIGHBORHOODS
Details
- Language :
- English
- ISSN :
- 09252312
- Volume :
- 604
- Database :
- Academic Search Index
- Journal :
- Neurocomputing
- Publication Type :
- Academic Journal
- Accession number :
- 179364708
- Full Text :
- https://doi.org/10.1016/j.neucom.2024.128311