1. An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits
- Author
-
Gouverneur, Amaury, Rodríguez-Gálvez, Borja, Oechtering, Tobias J., and Skoglund, Mikael
- Subjects
Statistics - Machine Learning ,Computer Science - Machine Learning - Abstract
We study the performance of the Thompson Sampling algorithm for logistic bandit problems. In this setting, an agent receives binary rewards with probabilities determined by a logistic function, $\exp(\beta \langle a, \theta \rangle)/(1+\exp(\beta \langle a, \theta \rangle))$, with slope parameter $\beta>0$, and where both the action $a\in \mathcal{A}$ and parameter $\theta \in \mathcal{O}$ lie within the $d$-dimensional unit ball. Adopting the information-theoretic framework introduced by Russo and Van Roy (2016), we analyze the information ratio, a statistic that quantifies the trade-off between the immediate regret incurred and the information gained about the optimal action. We improve upon previous results by establishing that the information ratio is bounded by $\tfrac{9}{2}d\alpha^{-2}$, where $\alpha$ is a minimax measure of the alignment between the action space $\mathcal{A}$ and the parameter space $\mathcal{O}$, and is independent of $\beta$. Using this result, we derive a bound of order $O(d/\alpha\sqrt{T \log(\beta T/d)})$ on the Bayesian expected regret of Thompson Sampling incurred after $T$ time steps. To our knowledge, this is the first regret bound for logistic bandits that depends only logarithmically on $\beta$ while being independent of the number of actions. In particular, when the action space contains the parameter space, the bound on the expected regret is of order $\tilde{O}(d \sqrt{T})$., Comment: 21 pages, under review
- Published
- 2024