
Probably Approximately Correct Learning in Adversarial Environments With Temporal Logic Specifications.

Authors :
Wen, Min
Topcu, Ufuk
Source :
IEEE Transactions on Automatic Control. Oct2022, Vol. 67 Issue 10, p5055-5070. 16p.
Publication Year :
2022

Abstract

Reinforcement learning (RL) algorithms have been used to learn how to perform tasks in uncertain and partially unknown environments. In practice, environments are usually uncontrolled and may affect task performance in an adversarial way. In this article, we model the interaction between an RL agent and its potentially adversarial environment as a turn-based zero-sum stochastic game. The task requirements are represented both qualitatively, as a subset of linear temporal logic (LTL) specifications, and quantitatively, as a reward function. For each case in which the LTL specification is realizable and can be equivalently transformed into a deterministic Büchi automaton, we show that there always exists a memoryless almost-sure winning strategy that is $\varepsilon$-optimal for the discounted-sum objective for any positive $\varepsilon$. We propose a probably approximately correct (PAC) learning algorithm that learns such a strategy efficiently in an online manner with a priori unknown reward functions and unknown transition distributions. To the best of our knowledge, this is the first result on PAC learning in stochastic games with independent quantitative and qualitative objectives. [ABSTRACT FROM AUTHOR]
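The setting described above can be illustrated with a minimal sketch of minimax value iteration on a toy turn-based, zero-sum, discounted stochastic game. This is not the paper's PAC learning algorithm: the sketch assumes the transition probabilities and rewards are fully known, whereas the paper learns them online, and it omits the LTL/Büchi qualitative objective entirely. All state and action names below are hypothetical.

```python
# Minimax value iteration for a toy turn-based, zero-sum, discounted
# stochastic game (assumed-known model; NOT the paper's PAC algorithm).

GAMMA = 0.9  # discount factor for the discounted-sum objective

# States are partitioned by who moves: the agent (maximizer) at "a",
# the adversarial environment (minimizer) at "e".
# transitions[s][action] -> list of (probability, next_state)
# rewards[s][action]     -> agent's immediate reward (env gets the negation)
AGENT_STATES = {"a"}
transitions = {
    "a": {"safe": [(1.0, "a")], "risk": [(1.0, "e")]},
    "e": {"punish": [(1.0, "a")], "relent": [(1.0, "a")]},
}
rewards = {
    "a": {"safe": 1.0, "risk": 2.0},
    "e": {"punish": -1.0, "relent": 2.0},
}

def value_iteration(tol=1e-10, max_iters=100_000):
    """Iterate the minimax Bellman operator to its unique fixed point."""
    V = {s: 0.0 for s in transitions}
    for _ in range(max_iters):
        delta = 0.0
        for s, acts in transitions.items():
            q = [rewards[s][a] + GAMMA * sum(p * V[t] for p, t in succ)
                 for a, succ in acts.items()]
            # Agent states maximize over actions; environment states minimize.
            new_v = max(q) if s in AGENT_STATES else min(q)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            break
    return V

V = value_iteration()
```

Against the worst-case environment (which always picks "punish"), playing "safe" forever is optimal here: V("a") = 1/(1 - 0.9) = 10 and V("e") = -1 + 0.9 · 10 = 8. A memoryless strategy suffices in this discounted toy game, echoing the memoryless-strategy existence result stated in the abstract.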

Details

Language :
English
ISSN :
0018-9286
Volume :
67
Issue :
10
Database :
Academic Search Index
Journal :
IEEE Transactions on Automatic Control
Publication Type :
Periodical
Accession number :
160621533
Full Text :
https://doi.org/10.1109/TAC.2021.3115080