
Multiplayer Bandits Without Observing Collision Information.

Authors :
Lugosi, Gábor
Mehrabian, Abbas
Source :
Mathematics of Operations Research; May 2022, Vol. 47 Issue 2, p1247-1265, 19p
Publication Year :
2022

Abstract

We study multiplayer stochastic multiarmed bandit problems in which the players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward. We consider two feedback models: a model in which the players can observe whether a collision has occurred, and a more difficult setup in which no collision information is available. We give the first theoretical guarantees for the second model: an algorithm with logarithmic regret, and an algorithm with square-root regret that does not depend on the gaps between the means. For the first model, we give the first square-root regret bounds that do not depend on the gaps. Building on these ideas, we also give an algorithm for reaching approximate Nash equilibria quickly in stochastic anticoordination games. [ABSTRACT FROM AUTHOR]
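The two feedback models described in the abstract can be illustrated with a minimal simulation of one round of the game. This is a hedged sketch, not the authors' algorithm: the function name `pull_round`, the Bernoulli reward distribution, and the tuple-based feedback encoding are illustrative assumptions; the paper itself specifies only that colliding players receive zero reward and, in the harder model, no collision indicator.

```python
import random

def pull_round(means, choices, observe_collisions, rng=random):
    """Simulate one round of the multiplayer bandit game.

    means: per-arm Bernoulli reward probabilities (an assumed reward model).
    choices: the arm index each player pulls this round.
    observe_collisions: True for the first feedback model, False for the second.
    Returns (rewards, feedback), one entry per player.
    """
    # Count how many players pulled each arm, to detect collisions.
    counts = {}
    for a in choices:
        counts[a] = counts.get(a, 0) + 1
    rewards, feedback = [], []
    for a in choices:
        collided = counts[a] > 1
        # Colliding players receive zero reward; others draw a Bernoulli reward.
        r = 0.0 if collided else (1.0 if rng.random() < means[a] else 0.0)
        rewards.append(r)
        if observe_collisions:
            # First model: the player sees the collision indicator alongside the reward.
            feedback.append((r, collided))
        else:
            # Second (harder) model: only the reward is seen, so a zero reward
            # is ambiguous between a collision and a genuine zero draw.
            feedback.append((r,))
    return rewards, feedback
```

The ambiguity in the second model is visible directly: a player observing reward 0.0 cannot tell whether another player pulled the same arm or the arm simply paid nothing, which is what makes that setting harder to analyze.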

Details

Language :
English
ISSN :
0364-765X
Volume :
47
Issue :
2
Database :
Complementary Index
Journal :
Mathematics of Operations Research
Publication Type :
Academic Journal
Accession Number :
157053297
Full Text :
https://doi.org/10.1287/moor.2021.1168