
Communication-Efficient and Resilient Distributed Q-Learning.

Authors :
Xie Y
Mou S
Sundaram S
Source :
IEEE transactions on neural networks and learning systems [IEEE Trans Neural Netw Learn Syst] 2024 Mar; Vol. 35 (3), pp. 3351-3364. Date of Electronic Publication: 2024 Feb 29.
Publication Year :
2024

Abstract

This article investigates the problem of communication-efficient and resilient multiagent reinforcement learning (MARL). Specifically, we consider a setting where a set of agents are interconnected over a given network and can only exchange information with their neighbors. Each agent observes a common Markov decision process and has a local cost that is a function of the current system state and the applied control action. The goal of MARL is for all agents to learn a policy that optimizes the infinite-horizon discounted average of all their costs. Within this general setting, we consider two extensions to existing MARL algorithms. First, we provide an event-triggered learning rule where agents exchange information with their neighbors only if a certain triggering condition is satisfied. We show that this enables learning while reducing the amount of communication. Next, we consider the scenario where some of the agents can be adversarial (as captured by the Byzantine attack model) and arbitrarily deviate from the prescribed learning algorithm. We establish a fundamental trade-off between optimality and resilience when Byzantine agents are present. We then develop a resilient algorithm and show almost sure convergence of all reliable agents' value functions to a neighborhood of the optimal value function of all reliable agents, under certain conditions on the network topology. When the optimal Q-values are sufficiently separated for different actions, we show that all reliable agents can learn the optimal policy under our algorithm.
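To make the two ideas in the abstract concrete, the sketch below shows a toy per-agent update combining an event-triggered broadcast rule with a coordinate-wise trimmed-mean fusion of neighbor Q-tables. This is only an illustrative reading of the general setting, not the authors' algorithm or analysis; the class name, the threshold `TRIGGER_EPS`, the trimming parameter `F_BYZANTINE`, and the trimmed-mean rule are all assumptions chosen for clarity.

```python
# Illustrative sketch only: a toy event-triggered, trimmed-mean style
# distributed Q-learning step. All names, thresholds, and the trimming
# rule are assumptions for illustration, not taken from the paper.
import numpy as np

N_STATES, N_ACTIONS = 4, 2
ALPHA, GAMMA = 0.1, 0.95
TRIGGER_EPS = 0.05   # event-trigger threshold (assumed)
F_BYZANTINE = 1      # number of adversarial neighbors to tolerate (assumed)


class Agent:
    def __init__(self):
        self.Q = np.zeros((N_STATES, N_ACTIONS))
        self.last_sent = np.zeros_like(self.Q)  # value last broadcast to neighbors
        self.inbox = []                          # neighbor Q-tables received this round

    def local_td_update(self, s, a, cost, s_next):
        # Standard Q-learning update driven by the agent's local cost
        # (costs are minimized, so the target uses the minimum over actions).
        target = cost + GAMMA * self.Q[s_next].min()
        self.Q[s, a] += ALPHA * (target - self.Q[s, a])

    def should_broadcast(self):
        # Event trigger: communicate only if the Q-table has moved enough
        # since the last broadcast, which saves communication between events.
        return np.abs(self.Q - self.last_sent).max() > TRIGGER_EPS

    def resilient_combine(self):
        # Coordinate-wise trimmed mean: for each (s, a) entry, discard the
        # F largest and F smallest received values before averaging, so up
        # to F Byzantine neighbors cannot pull the estimate arbitrarily far.
        if len(self.inbox) <= 2 * F_BYZANTINE:
            return  # not enough reliable information to trim safely
        stacked = np.stack(self.inbox + [self.Q])  # include the agent's own value
        trimmed = np.sort(stacked, axis=0)[F_BYZANTINE:-F_BYZANTINE]
        self.Q = trimmed.mean(axis=0)
        self.inbox.clear()
```

A simulation loop would call `local_td_update` after each transition, use `should_broadcast` to decide whether to push `Q` to neighbors (copying it into `last_sent` on every send), and call `resilient_combine` once neighbor messages have arrived; the trade-off between optimality and resilience noted in the abstract shows up here as the information discarded by trimming.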

Details

Language :
English
ISSN :
2162-2388
Volume :
35
Issue :
3
Database :
MEDLINE
Journal :
IEEE transactions on neural networks and learning systems
Publication Type :
Academic Journal
Accession number :
37436858
Full Text :
https://doi.org/10.1109/TNNLS.2023.3292036