Constrained Reinforcement Learning-Based Closed-Loop Reference Model for Optimal Tracking Control of Unknown Continuous-Time Systems
- Source :
- IEEE Transactions on Automation Science and Engineering: A Publication of the IEEE Robotics and Automation Society; October 2024, Vol. 21 Issue: 4 p7312-7324, 13p
- Publication Year :
- 2024
Abstract
- Although reinforcement learning (RL) is effective in stabilizing systems, it faces many challenges in solving the tracking problem for unknown continuous-time systems. A major challenge is that RL-based control can hardly satisfy the transient and steady-state performance requirements of the tracking problem simultaneously. In this study, instead of serving as the controller, the RL agent acts as a planner in the closed-loop reference model (CRM). The RL-based planner optimizes tracking performance using a constrained integral RL algorithm. Meanwhile, the system is controlled by the proposed library-based adaptive controller, which contains a library of candidate functions for modeling the unknown system dynamics. A natural gradient-like adaptive law is developed to update the controller, ensuring asymptotic tracking and promoting sparsity in the controller parameter. Compared with conventional RL-based control, the proposed framework can eliminate the tracking error while avoiding high-frequency oscillation and the peaking phenomenon. Furthermore, we demonstrate theoretically, through Lyapunov analysis, that our approach can improve the transient performance in terms of the $\mathcal{L}_{2}$ norm of the tracking error and explicitly limit the $\mathcal{L}_{\infty}$ norm of the peaking value. Simulations are presented at the end of the paper to support the theoretical findings.
- Note to Practitioners: Practical control design often requires tracking non-zero reference trajectories. However, a non-optimal transient response, such as oscillation and overshoot, is a major obstacle to developing a high-performance tracking control system. The proposed method addresses this issue by designing a constrained RL-based CRM to ensure optimal transient performance of the system. The primary advantage is that the maximum peaking value can be conveniently tuned by users as a hyperparameter, which is extremely useful in practice. Furthermore, the proposed library-based adaptive controller can handle unknown system dynamics, where the governing equations of the dynamics can be determined through engineering experience.
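- Illustrative note (not part of the published abstract): the abstract measures transient performance by the L2 norm of the tracking error and the peaking value by its L-infinity norm. The sketch below shows how these two quantities could be estimated from a sampled error signal. It is a minimal sketch under stated assumptions: the function names `l2_norm` and `linf_norm`, the sample grid, and the decaying-oscillation error signal are hypothetical choices made for illustration and do not come from the paper.

```python
import math


def l2_norm(times, errors):
    """Approximate sqrt(integral of e(t)^2 dt) with the trapezoidal rule."""
    total = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        total += 0.5 * dt * (errors[k - 1] ** 2 + errors[k] ** 2)
    return math.sqrt(total)


def linf_norm(errors):
    """Largest absolute sample, a discrete stand-in for sup_t |e(t)|."""
    return max(abs(e) for e in errors)


# Hypothetical tracking error (an assumption, not data from the paper):
# a decaying oscillation e(t) = exp(-t) * cos(5 t), sampled every 1 ms for 10 s.
times = [0.001 * k for k in range(10001)]
errors = [math.exp(-t) * math.cos(5.0 * t) for t in times]

print("L2 norm of tracking error:  ", l2_norm(times, errors))
print("L-infinity norm (peak value):", linf_norm(errors))
```

A smaller L2 value corresponds to less accumulated error during the transient, while the L-infinity value is the worst-case peak that, according to the abstract, users can bound through a hyperparameter.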
Details
- Language :
- English
- ISSN :
- 1545-5955 and 1558-3783
- Volume :
- 21
- Issue :
- 4
- Database :
- Supplemental Index
- Journal :
- IEEE Transactions on Automation Science and Engineering: A Publication of the IEEE Robotics and Automation Society
- Publication Type :
- Periodical
- Accession number :
- ejs67730626
- Full Text :
- https://doi.org/10.1109/TASE.2023.3340726