Q-Sorting: An Algorithm for Reinforcement Learning Problems with Multiple Cumulative Constraints

Authors :
Jianfeng Huang
Guoqiang Lu
Yi Li
Jiajun Wu
Source :
Mathematics, Vol 12, Iss 13, p 2001 (2024)
Publication Year :
2024
Publisher :
MDPI AG, 2024.

Abstract

This paper proposes a method and an algorithm, called Q-sorting, for reinforcement learning (RL) problems with multiple cumulative constraints. The primary contribution is a mechanism that dynamically determines the focus of optimization among the cumulative constraints and the objective. Actions are selected in two steps: first, actions that could violate the constraints are filtered out; second, the remaining actions are sorted in descending order of the Q values of the current focus. The algorithm was originally developed for the classic tabular value representation and the episodic setting of RL, but the idea can be extended to methods that use function approximation and to the discounted setting. Numerical experiments are carried out on an adapted Gridworld and on the motor speed synchronization problem, each with one and with two cumulative constraints. Simulation results validate the effectiveness of Q-sorting: the cumulative constraints are honored both during and after the learning process. The advantages of Q-sorting are further shown through comparison with the method of lumped performances (LP), which accounts for constraints through weighting parameters. Q-sorting outperforms LP in ease of use (no trial and error is needed to set the weighting parameters) and in performance consistency (a standard deviation of the cumulative performance index of 6.1920 vs. 54.2635 rad/s over 10 repeated simulation runs). Q-sorting therefore has great potential for practical engineering use.
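
The two-step action selection described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how the filter is computed or how the focus is chosen, so the sketch makes several assumptions: Q values are held in a plain dictionary of per-action lists, each constraint is treated as a cumulative quantity that must stay at or above a lower bound, the focus is passed in by the caller, and all actions are kept if none passes the filter. The names `rank_actions`, `q_values`, `focus`, and `lower_bounds` are hypothetical.

```python
from typing import Dict, List, Sequence


def rank_actions(
    q_values: Dict[str, Sequence[float]],
    focus: str,
    lower_bounds: Dict[str, float],
) -> List[int]:
    """Filter actions by constraint estimates, then sort by the focus.

    q_values maps a name (the objective or a constraint) to one Q value
    per action for the current state. lower_bounds maps each constraint
    name to the smallest acceptable cumulative value (an assumption of
    this sketch, not taken from the paper).
    """
    n_actions = len(q_values[focus])

    # Step 1: drop actions whose estimated cumulative value falls
    # below the bound of any constraint.
    feasible = [
        a
        for a in range(n_actions)
        if all(q_values[c][a] >= bound for c, bound in lower_bounds.items())
    ]

    # Assumed fallback: if every action is filtered out, keep them all
    # so that the agent can still act.
    if not feasible:
        feasible = list(range(n_actions))

    # Step 2: sort the remaining actions by the Q values of the
    # current focus, largest first.
    return sorted(feasible, key=lambda a: q_values[focus][a], reverse=True)


if __name__ == "__main__":
    q = {"objective": [1.0, 5.0, 3.0], "c1": [0.5, -2.0, 0.2]}
    # Action 1 is filtered out by constraint c1; the rest are ranked
    # by the objective.
    print(rank_actions(q, "objective", {"c1": 0.0}))
```

The first entry of the returned list would be the action to execute. Changing `focus` from the objective to a constraint reorders the same feasible set by that constraint's Q values, which is one way to read the "focus of optimization" mentioned in the abstract.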

Details

Language :
English
ISSN :
2227-7390
Volume :
12
Issue :
13
Database :
Directory of Open Access Journals
Journal :
Mathematics
Publication Type :
Academic Journal
Accession number :
edsdoj.f0bc846f24844c81b93a1bbce13ff1bc
Document Type :
article
Full Text :
https://doi.org/10.3390/math12132001