1. Towards Fast Rates for Federated and Multi-Task Reinforcement Learning
- Author
- Zhu, Feng, Heath Jr., Robert W., and Mitra, Aritra
- Subjects
- Computer Science - Machine Learning, Electrical Engineering and Systems Science - Systems and Control, Mathematics - Optimization and Control
- Abstract
- We consider a setting involving $N$ agents, where each agent interacts with an environment modeled as a Markov Decision Process (MDP). The agents' MDPs differ in their reward functions, capturing heterogeneous objectives/tasks. The collective goal of the agents is to communicate intermittently via a central server to find a policy that maximizes the average of long-term cumulative rewards across environments. The limited existing work on this topic either provides only asymptotic rates, generates biased policies, or fails to establish any benefit of collaboration. In response, we propose Fast-FedPG, a novel federated policy gradient algorithm with a carefully designed bias-correction mechanism. Under a gradient-domination condition, we prove that our algorithm guarantees (i) fast linear convergence with exact gradients, and (ii) with noisy, truncated policy gradients, sub-linear rates that enjoy a linear speedup with respect to the number of agents. Notably, in each case, the convergence is to a globally optimal policy with no heterogeneity-induced bias. In the absence of gradient-domination, we establish convergence to a first-order stationary point at a rate that continues to benefit from collaboration.
- Comment: Accepted to the Decision and Control Conference (CDC), 2024
- Published
- 2024