A Q-Learning Algorithm for Discrete-Time Linear-Quadratic Control with Random Parameters of Unknown Distribution: Convergence and Stabilization

Authors :: Kai Du
Qingxin Meng
Fu Zhang
Source :: SIAM Journal on Control and Optimization. 60:1991-2015
Publication Year :: 2022
Publisher :: Society for Industrial & Applied Mathematics (SIAM), 2022.
Abstract: This paper studies an infinite horizon optimal control problem for discrete-time linear systems and quadratic criteria, both with random parameters which are independent and identically distributed with respect to time. A classical approach is to solve an algebraic Riccati equation that involves mathematical expectations and requires certain statistical information of the parameters. In this paper, we propose an online iterative algorithm in the spirit of Q-learning for the situation where only one random sample of parameters emerges at each time step. The first theorem proves the equivalence of three properties: the convergence of the learning sequence, the well-posedness of the control problem, and the solvability of the algebraic Riccati equation. The second theorem shows that the adaptive feedback control in terms of the learning sequence stabilizes the system as long as the control problem is well-posed. Numerical examples are presented to illustrate our results.
Comment: 24 pages, 3 figures
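
The abstract does not give the update rule, so the following is only a minimal sketch under assumed conventions, not the authors' algorithm. Assumptions: dynamics x_{t+1} = A_t x_t + B_t u_t, stage cost x'Q_t x + u'R_t u, (A_t, B_t, Q_t, R_t) i.i.d. over time, and u_t chosen from x_t before the time-t parameters are revealed. Under these assumptions the Riccati equation with expectations reads P = E[Q + A'PA] - E[A'PB] (E[R + B'PB])^{-1} E[B'PA]. The hypothetical function reference_riccati iterates this equation with exact expectations over a known finite distribution (the classical approach). The hypothetical online_q_iteration instead updates a Q-matrix H from one parameter sample per step, using an assumed step size 1/(k+1), in the spirit of Q-learning. All names and the scalar example are illustrative.

```python
import numpy as np


def p_from_h(h, n):
    """Value matrix implied by a Q-matrix: P = H_xx - H_xu H_uu^{-1} H_ux."""
    hxx, hxu, huu = h[:n, :n], h[:n, n:], h[n:, n:]
    return hxx - hxu @ np.linalg.solve(huu, hxu.T)


def q_target(p, a, b, q, r):
    """One-sample Bellman target: blkdiag(Q, R) + [A B]^T P [A B]."""
    n, m = q.shape[0], r.shape[0]
    ab = np.hstack([a, b])
    cost = np.block([[q, np.zeros((n, m))], [np.zeros((m, n)), r]])
    return cost + ab.T @ p @ ab


def reference_riccati(samples, probs, iters=500):
    """Classical approach: value iteration on the Riccati equation with the
    expectations taken exactly over a known finite distribution."""
    n = samples[0][2].shape[0]
    p = np.zeros((n, n))
    for _ in range(iters):
        h = sum(w * q_target(p, *s) for w, s in zip(probs, samples))
        p = p_from_h(h, n)
    return p


def online_q_iteration(draw, n, m, steps, rng):
    """Hypothetical single-sample scheme: draw one parameter sample per step
    and move H toward its Bellman target with step size 1/(k+1)."""
    h = np.zeros((n + m, n + m))
    h[n:, n:] = np.eye(m)  # makes H_uu invertible (and P = 0) at the start
    for k in range(steps):
        a, b, q, r = draw(rng)
        h += (q_target(p_from_h(h, n), a, b, q, r) - h) / (k + 1)
    return h


def feedback_gain(h, n):
    """Adaptive feedback u = -K x with K = H_uu^{-1} H_ux."""
    return np.linalg.solve(h[n:, n:], h[n:, :n])


# Assumed scalar example: A is 0.5 or 1.5 with equal probability; B = Q = R = 1.
one = np.array([[1.0]])
support = [(np.array([[0.5]]), one, one, one), (np.array([[1.5]]), one, one, one)]


def draw(rng):
    return support[rng.integers(2)]


p_ref = reference_riccati(support, [0.5, 0.5])
h = online_q_iteration(draw, 1, 1, steps=20_000, rng=np.random.default_rng(0))
print(p_ref[0, 0], p_from_h(h, 1)[0, 0], feedback_gain(h, 1)[0, 0])
```

For this scalar example the Riccati equation reduces to 3p^2 - 5p - 4 = 0, whose positive root is p = (5 + sqrt(73))/6 ≈ 2.257, with gain p/(1 + p) ≈ 0.693. The online estimate should approach these values as the number of steps grows, at a rate governed by the assumed step-size rule.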

Subjects :: Control and Optimization
Optimization and Control (math.OC)
Applied Mathematics
Computing Methodologies: Symbolic and Algebraic Manipulation
Probability (math.PR)
FOS: Mathematics
MSC: 49N10, 93E35, 93D15
Mathematics - Optimization and Control
Mathematics - Probability
