MDP Geometry, Normalization and Reward Balancing Solvers

Authors :: Mustafin, Arsenii
Pakharev, Aleksei
Olshevsky, Alex
Paschalidis, Ioannis Ch.
Publication Year :: 2024
Abstract: The Markov Decision Process (MDP) is a widely used mathematical model for sequential decision-making problems. In this paper, we present a new geometric interpretation of MDPs with a natural normalization procedure that allows us to adjust the value function at each state without altering the advantage of any action with respect to any policy. This procedure enables the development of a novel class of algorithms for solving MDPs that find optimal policies without explicitly computing policy values. The new algorithms we propose for different settings achieve and, in some cases, improve upon state-of-the-art sample complexity results.
Comment: Preliminary version
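
Illustration (not taken from the paper): the abstract does not spell out the normalization procedure, so the sketch below uses one standard construction that has the stated property, a potential-style reward shift r'(s,a) = r(s,a) - delta(s) + gamma * E[delta(s')]. Under this shift every policy's value at state s moves by -delta(s) while all advantages stay the same. Whether this matches the authors' exact procedure is an assumption; the helper names (`shift_rewards`, `policy_values`, `random_mdp`), the discount factor, and the random toy MDP are all hypothetical choices made for the example.

```python
import itertools
import random

GAMMA = 0.9  # discount factor (illustrative choice, not from the paper)

def q_values(P, R, V, s):
    """One-step lookahead value of every action at state s under values V."""
    return [R[s][a] + GAMMA * sum(p * v for p, v in zip(P[s][a], V))
            for a in range(len(R[s]))]

def policy_values(P, R, policy, iters=2000):
    """Evaluate a deterministic policy by iterating its Bellman equation."""
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [q_values(P, R, V, s)[policy[s]] for s in range(len(P))]
    return V

def advantages(P, R, policy):
    """adv[s][a] = Q^pi(s, a) - V^pi(s)."""
    V = policy_values(P, R, policy)
    return [[q - V[s] for q in q_values(P, R, V, s)] for s in range(len(P))]

def shift_rewards(P, R, delta):
    """Assumed normalization: r'(s,a) = r(s,a) - delta(s) + gamma * E[delta(s')]."""
    return [[R[s][a] - delta[s]
             + GAMMA * sum(p * d for p, d in zip(P[s][a], delta))
             for a in range(len(R[s]))] for s in range(len(P))]

def random_mdp(n_states, n_actions, rng):
    """Toy MDP: P[s][a] is a distribution over next states, R[s][a] a reward."""
    P, R = [], []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() + 0.01 for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])
        P.append(rows)
        R.append([rng.uniform(-1.0, 1.0) for _ in range(n_actions)])
    return P, R

rng = random.Random(0)
P, R = random_mdp(3, 2, rng)

# Choose delta as the value of a reference policy, so that policy's
# value becomes zero at every state after the shift.
reference_policy = (0, 1, 0)
delta = policy_values(P, R, reference_policy)
R_shifted = shift_rewards(P, R, delta)

# Compare advantages before and after the shift for every deterministic policy.
max_adv_gap = 0.0
for policy in itertools.product(range(2), repeat=3):
    before = advantages(P, R, policy)
    after = advantages(P, R_shifted, policy)
    for s in range(3):
        for a in range(2):
            max_adv_gap = max(max_adv_gap, abs(before[s][a] - after[s][a]))

normalized_values = policy_values(P, R_shifted, reference_policy)
print("largest advantage change over all policies:", max_adv_gap)
print("reference policy values after the shift:", normalized_values)
```

Setting delta to the value function of a reference policy is only one possible choice; any vector delta preserves advantages under this shift, which is why the ranking of actions at every state, and hence the set of optimal policies, is unaffected.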