An ϵ-Greedy Multiarmed Bandit Approach to Markov Decision Processes.
- Source :
- Stats, March 2023, Vol. 6, Issue 1, pp. 99–112, 14 pp.
- Publication Year :
- 2023
-
Abstract
- We present REGA, a new adaptive-sampling-based algorithm for the control of finite-horizon Markov decision processes (MDPs) with very large state spaces and small action spaces. We apply a variant of the ϵ-greedy multiarmed bandit algorithm to each stage of the MDP in a recursive manner, thus computing an estimate of the "reward-to-go" value at each stage of the MDP. We provide a finite-time analysis of REGA. In particular, we provide a bound on the probability that the approximation error exceeds a given threshold, where the bound is given in terms of the number of samples collected at each stage of the MDP. We empirically compare REGA against another sampling-based algorithm called RASA by running simulations against the SysAdmin benchmark problem with 2^10 states. The results show that REGA and RASA achieve similar performance. Moreover, REGA and RASA empirically outperform an implementation that uses the "original" ϵ-greedy algorithm commonly found in the literature. [ABSTRACT FROM AUTHOR]
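- To make the abstract's core building block concrete, the following is a minimal sketch of a generic ϵ-greedy multiarmed bandit — the subroutine the paper applies recursively at each MDP stage. This is an illustration of the standard technique only, not the authors' REGA algorithm; all names and parameters here are hypothetical.

```python
import random

def epsilon_greedy_bandit(reward_fns, num_pulls, epsilon=0.1, seed=0):
    """Generic epsilon-greedy bandit (illustrative, not REGA itself).

    With probability epsilon, explore a uniformly random arm; otherwise
    exploit the arm with the highest empirical mean reward. Returns the
    per-arm mean-reward estimates and pull counts.
    """
    rng = random.Random(seed)
    k = len(reward_fns)
    counts = [0] * k          # number of times each arm was pulled
    means = [0.0] * k         # empirical mean reward per arm

    for _ in range(num_pulls):
        if rng.random() < epsilon or all(c == 0 for c in counts):
            arm = rng.randrange(k)                       # explore
        else:
            arm = max(range(k), key=lambda a: means[a])  # exploit
        r = reward_fns[arm]()
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]     # incremental mean

    return means, counts

# Toy example: two arms with fixed rewards 0.2 and 0.8; the exploit
# step concentrates pulls on the better arm once both have been sampled.
means, counts = epsilon_greedy_bandit([lambda: 0.2, lambda: 0.8],
                                      num_pulls=1000)
```

  In a finite-horizon MDP setting, one would run such a bandit at each stage, with each "arm pull" recursively invoking the next stage's bandit to estimate the reward-to-go; the finite-time bound in the paper relates the per-stage sample counts to the approximation error.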
Details
- Language :
- English
- ISSN :
- 2571-905X
- Volume :
- 6
- Issue :
- 1
- Database :
- Complementary Index
- Journal :
- Stats
- Publication Type :
- Academic Journal
- Accession number :
- 162811029
- Full Text :
- https://doi.org/10.3390/stats6010006