The non-stationary stochastic multi-armed bandit problem

Authors :
Raphaël Féraud
Robin Allesiardo
Odalric-Ambrym Maillard
Laboratoire de Recherche en Informatique (LRI)
Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)
Orange Labs [Lannion]
France Télécom
Machine Learning and Optimisation (TAO)
Centre National de la Recherche Scientifique (CNRS)-Inria Saclay - Ile de France
Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Paris-Sud - Paris 11 (UP11)-Laboratoire de Recherche en Informatique (LRI)
Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-CentraleSupélec
ANR-16-CE40-0002,BADASS,BANDITS MANCHOTS POUR SIGNAUX NON-STATIONNAIRES ET STRUCTURES(2016)
Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Inria Saclay - Ile de France
Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)
Source :
International Journal of Data Science and Analytics, Springer Verlag, 2017, 3 (4), pp.267-283. ⟨10.1007/s41060-017-0050-5⟩
Publication Year :
2017
Publisher :
Springer Science and Business Media LLC, 2017.

Abstract

We consider a variant of the stochastic multi-armed bandit with K arms where the rewards are not assumed to be identically distributed, but are generated by a non-stationary stochastic process. We first study the unique best arm setting, in which there is a single best arm. Second, we study the general switching best arm setting, in which the best arm switches at some unknown steps. For both settings, we target problem-dependent bounds, instead of the more conservative problem-free bounds. We consider two classical problems: (1) Identify a best arm with high probability (best arm identification), for which performance is measured by the sample complexity (the number of samples needed before finding a near-optimal arm). To this end, we naturally extend the definition of sample complexity so that it makes sense in the switching best arm setting, which may be of independent interest. (2) Achieve the smallest cumulative regret (regret minimization), where the regret is measured with respect to the strategy pulling an arm with the best instantaneous mean at each step.
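
The regret notion in (2) can be illustrated with a minimal Python sketch. It is not taken from the article: the function name `dynamic_regret`, the two-arm instance, and the switch at step 50 are all hypothetical choices made for illustration. The sketch only shows what it means to compare a player against the arm with the best instantaneous mean at each step.

```python
def dynamic_regret(means, choices):
    """Cumulative pseudo-regret against the best instantaneous mean.

    means[t][k] is the mean reward of arm k at step t (hypothetical input);
    choices[t] is the index of the arm pulled at step t.
    """
    return sum(max(mu_t) - mu_t[k] for mu_t, k in zip(means, choices))


# Illustrative two-arm instance (an assumption, not from the article):
# arm 0 is best for the first 50 steps, then arm 1 becomes best.
T = 100
means = [[0.8, 0.2] if t < 50 else [0.2, 0.8] for t in range(T)]

# A player that never adapts, and an oracle that always pulls the current best arm.
always_arm_0 = [0] * T
oracle = [max(range(2), key=lambda k: mu_t[k]) for mu_t in means]

regret_fixed = dynamic_regret(means, always_arm_0)
regret_oracle = dynamic_regret(means, oracle)
```

In this toy instance the non-adaptive player pays the gap of 0.6 at each of the 50 steps after the switch, while the oracle strategy has zero regret by construction.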

Details

ISSN :
2364-4168 and 2364-415X
Volume :
3
Database :
OpenAIRE
Journal :
International Journal of Data Science and Analytics
Accession number :
edsair.doi.dedup.....f62101eece9aa4f0bb3a4a108e30232b
Full Text :
https://doi.org/10.1007/s41060-017-0050-5