Finite state multi-armed bandit problems: sensitive-discount, average-reward and average-overtaking optimality
- Source :
- Ann. Appl. Probab. 6, no. 3 (1996), 1024-1034
- Publication Year :
- 1996
- Publisher :
- The Institute of Mathematical Statistics, 1996.
- Abstract :
- We express Gittins indices for multi-armed bandit problems as Laurent expansions around discount factor 1. The coefficients of these expansions are then used to characterize stationary optimal policies when the optimality criteria are sensitive-discount optimality (otherwise known as Blackwell optimality), average-reward optimality and average-overtaking optimality. We also obtain bounds and derive optimality conditions for policies of a type that continue playing the same bandit as long as the state of that bandit remains in prescribed sets.
- Subjects :
- Statistics and Probability
Gittins index
Discounting
Mathematical optimization
Bandit problems
90C39
Markov decision chains
90C31
State (functional analysis)
Multi-armed bandit
90C47
optimality criteria
Overtaking
Laurent expansions
Finite state
Statistics, Probability and Uncertainty
Mathematical economics
60G40
Mathematics
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....6629df64e3cf6d22a04dbb321d9d2d99