MDP Geometry, Normalization and Reward Balancing Solvers
- Publication Year :
- 2024
Abstract
- The Markov Decision Process (MDP) is a widely used mathematical model for sequential decision-making problems. In this paper, we present a new geometric interpretation of MDPs with a natural normalization procedure that allows us to adjust the value function at each state without altering the advantage of any action with respect to any policy. This procedure enables the development of a novel class of algorithms for solving MDPs that find optimal policies without explicitly computing policy values. The new algorithms we propose for different settings achieve and, in some cases, improve upon state-of-the-art sample complexity results.
- Comment: Preliminary version
- Subjects :
- Computer Science - Machine Learning
- Mathematics - Optimization and Control
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2407.06712
- Document Type :
- Working Paper
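
The abstract's central claim, that values can be adjusted state by state while every action's advantage stays fixed for every policy, can be illustrated with a small numerical check. The following is a minimal sketch and not the paper's procedure, which the abstract does not spell out: it assumes the adjustment takes the form of a potential-style reward shift, r'(s, a) = r(s, a) - phi(s) + gamma * sum over s' of P(s' | s, a) * phi(s'). The toy MDP, the offsets `phi`, and the helper names `shift_rewards`, `policy_values`, and `advantages` are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a potential-style reward shift on a toy MDP.
# The MDP, the offsets phi, and all function names are assumptions made
# for this example; they are not taken from the paper.

def expected(P_sa, vec):
    """Expectation of vec under the next-state distribution P_sa."""
    return sum(p * x for p, x in zip(P_sa, vec))

def shift_rewards(P, R, phi, gamma):
    """Return r'(s,a) = r(s,a) - phi[s] + gamma * E[phi(s') | s, a]."""
    return [[R[s][a] - phi[s] + gamma * expected(P[s][a], phi)
             for a in range(len(R[s]))]
            for s in range(len(P))]

def policy_values(P, R, policy, gamma, sweeps=2000):
    """Iterative evaluation of a deterministic policy (policy[s] = action)."""
    v = [0.0] * len(P)
    for _ in range(sweeps):
        v = [R[s][policy[s]] + gamma * expected(P[s][policy[s]], v)
             for s in range(len(P))]
    return v

def advantages(P, R, policy, gamma):
    """Advantage of every (state, action) pair with respect to the policy."""
    v = policy_values(P, R, policy, gamma)
    return [[R[s][a] + gamma * expected(P[s][a], v) - v[s]
             for a in range(len(R[s]))]
            for s in range(len(P))]

gamma = 0.9
# P[s][a] is the next-state distribution; R[s][a] is the immediate reward.
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0],
     [2.0, -1.0]]
phi = [3.0, -1.5]      # arbitrary per-state value offsets
policy = [0, 1]        # an arbitrary deterministic policy

R_shifted = shift_rewards(P, R, phi, gamma)

v_before = policy_values(P, R, policy, gamma)
v_after = policy_values(P, R_shifted, policy, gamma)
adv_before = advantages(P, R, policy, gamma)
adv_after = advantages(P, R_shifted, policy, gamma)

# Values move by exactly -phi[s]; advantages are unchanged.
max_adv_gap = max(abs(x - y)
                  for row_b, row_a in zip(adv_before, adv_after)
                  for x, y in zip(row_b, row_a))
print("value shift per state:", [a - b for a, b in zip(v_after, v_before)])
print("largest advantage difference:", max_adv_gap)
```

Under this assumed shift, each state's value changes by the chosen offset while the ranking of actions at every state is preserved, so a policy that is optimal for the shifted rewards is also optimal for the original ones.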