Finite state multi-armed bandit problems: sensitive-discount, average-reward and average-overtaking optimality

Authors :: Michael N. Katehakis
Uriel G. Rothblum
Source :: Ann. Appl. Probab. 6, no. 3 (1996), 1024-1034
Publication Year :: 1996
Publisher :: The Institute of Mathematical Statistics, 1996.
Abstract: We express Gittins indices for multi-armed bandit problems as Laurent expansions around discount factor 1. The coefficients of these expansions are then used to characterize stationary optimal policies when the optimality criteria are sensitive-discount optimality (otherwise known as Blackwell optimality), average-reward optimality and average-overtaking optimality. We also obtain bounds and derive optimality conditions for policies of a type that continue playing the same bandit as long as the state of that bandit remains in prescribed sets.

Subjects :: Statistics and Probability
Gittins index
Discounting
Mathematical optimization
Bandit problems
90C39
Markov decision chains
90C31
State (functional analysis)
Multi-armed bandit
90C47
optimality criteria
Overtaking
Laurent expansions
Finite state
Statistics, Probability and Uncertainty
Mathematical economics
60G40
Mathematics
