Back to Search Start Over

Probably Approximately Correct Learning in Adversarial Environments With Temporal Logic Specifications

Authors :
Ufuk Topcu
Min Wen
Source :
IEEE Transactions on Automatic Control. 67:5055-5070
Publication Year :
2022
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2022.

Abstract

Reinforcement learning (RL) algorithms have been used to learn how to implement tasks in uncertain and partially unknown environments. In practice, environments are usually uncontrolled and may affect task performance in an adversarial way. In this paper, we model the interaction between an RL agent and its potentially adversarial environment as a turn-based zero-sum stochastic game. The task requirements are represented both qualitatively as a linear temporal logic specification, and quantitatively as a reward function. For each case in which the specification is realizable and can be equivalently transformed into a deterministic Buchi automaton, we show that there always exists a memoryless almost-sure winning strategy that is epsilon-optimal for the discountedd sum objective for any arbitrary positive epsilon. We propose a probably approximately correct (PAC) learning algorithm that learns such a strategy efficiently in an online manner with a priori unknown reward functions and unknown transition distributions. To the best of our knowledge, this is the first result on PAC learning in stochastic games with independent quantitative and qualitative objectives.

Details

ISSN :
23343303 and 00189286
Volume :
67
Database :
OpenAIRE
Journal :
IEEE Transactions on Automatic Control
Accession number :
edsair.doi...........6dbfa7edd4cbe126f105852f4fbb833d
Full Text :
https://doi.org/10.1109/tac.2021.3115080