
An ϵ-Greedy Multiarmed Bandit Approach to Markov Decision Processes.

Authors :
Muqattash, Isa
Hu, Jiaqiao
Source :
Stats, Mar 2023, Vol. 6, Issue 1, pp. 99-112 (14 pp.)
Publication Year :
2023

Abstract

We present REGA, a new adaptive-sampling-based algorithm for the control of finite-horizon Markov decision processes (MDPs) with very large state spaces and small action spaces. We apply a variant of the ϵ-greedy multiarmed bandit algorithm to each stage of the MDP in a recursive manner, thus computing an estimate of the "reward-to-go" value at each stage of the MDP. We provide a finite-time analysis of REGA. In particular, we provide a bound on the probability that the approximation error exceeds a given threshold, where the bound is given in terms of the number of samples collected at each stage of the MDP. We empirically compare REGA against another sampling-based algorithm called RASA by running simulations against the SysAdmin benchmark problem with 2^10 states. The results show that REGA and RASA achieve similar performance. Moreover, both empirically outperform an implementation that uses the "original" ϵ-greedy algorithm as it commonly appears in the literature.
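For readers unfamiliar with the building block the abstract refers to, the following is a minimal, self-contained sketch of the standard ϵ-greedy multiarmed bandit rule: with probability ϵ a random arm is explored, otherwise the arm with the best current sample mean is exploited. This illustrates only the classical rule from the literature, not the REGA variant or its recursive application to MDP stages; the arm means, noise model, and parameter values below are illustrative assumptions.

```python
import random

def epsilon_greedy(arm_means, epsilon=0.1, horizon=2000, seed=0):
    """Classical epsilon-greedy bandit (illustrative sketch).

    With probability epsilon, pull a uniformly random arm (explore);
    otherwise pull the arm with the highest running sample mean (exploit).
    Rewards are simulated as the arm's true mean plus Gaussian noise.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms      # number of pulls per arm
    means = [0.0] * n_arms     # running sample mean per arm

    for _ in range(horizon):
        # Explore with prob. epsilon, or until every arm has been tried once.
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=means.__getitem__)
        reward = arm_means[arm] + rng.gauss(0.0, 1.0)  # simulated noisy payoff
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        means[arm] += (reward - means[arm]) / counts[arm]

    best = max(range(n_arms), key=means.__getitem__)
    return best, means

best, means = epsilon_greedy([0.2, 0.5, 0.9])
```

REGA, as described in the abstract, applies a variant of this rule at each stage of the MDP recursively, so that the "reward" observed by a stage's bandit is itself an estimated reward-to-go computed by the bandits at later stages.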

Details

Language :
English
ISSN :
2571-905X
Volume :
6
Issue :
1
Database :
Complementary Index
Journal :
Stats
Publication Type :
Academic Journal
Accession Number :
162811029
Full Text :
https://doi.org/10.3390/stats6010006