Probably Approximately Correct Learning in Adversarial Environments With Temporal Logic Specifications
- Author
- Wen, Min and Topcu, Ufuk
- Subjects
- MACHINE learning, REINFORCEMENT learning, REWARD (Psychology), CLASSROOM environment, LOGIC, PICTURE archiving & communication systems
- Abstract
Reinforcement learning (RL) algorithms have been used to learn how to perform tasks in uncertain and partially unknown environments. In practice, environments are usually uncontrolled and may affect task performance in an adversarial way. In this article, we model the interaction between an RL agent and its potentially adversarial environment as a turn-based zero-sum stochastic game. The task requirements are represented both qualitatively, as a subset of linear temporal logic (LTL) specifications, and quantitatively, as a reward function. For each case in which the LTL specification is realizable and can be equivalently transformed into a deterministic Büchi automaton, we show that there always exists a memoryless almost-sure winning strategy that is $\varepsilon$-optimal for the discounted-sum objective for any positive $\varepsilon$. We propose a probably approximately correct (PAC) learning algorithm that learns such a strategy efficiently in an online manner with a priori unknown reward functions and unknown transition distributions. To the best of our knowledge, this is the first result on PAC learning in stochastic games with independent quantitative and qualitative objectives. [ABSTRACT FROM AUTHOR]
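The abstract's core model, a turn-based zero-sum stochastic game with a discounted-sum objective, can be illustrated with a minimal sketch. This is not the paper's PAC learning algorithm (which operates with unknown rewards and transitions); it is a toy value-iteration example on a hypothetical three-state game, where the agent maximizes and the adversarial environment minimizes the discounted return. All state names, rewards, and the discount factor below are made-up assumptions for illustration.

```python
# Toy turn-based zero-sum stochastic game (hypothetical example, not from the paper).
# Each state belongs to one player; transitions map an action to a list of
# (probability, next_state, reward) outcomes.
GAMMA = 0.9  # assumed discount factor

GAME = {
    "s0": ("agent", {                       # agent-controlled state
        "a": [(1.0, "s1", 1.0)],
        "b": [(0.5, "s1", 0.0), (0.5, "s2", 2.0)],
    }),
    "s1": ("env", {                         # environment (adversary) state
        "x": [(1.0, "s0", 0.0)],
        "y": [(1.0, "s2", -1.0)],
    }),
    "s2": ("agent", {
        "a": [(1.0, "s2", 0.0)],            # absorbing sink
    }),
}

def value_iteration(game, gamma=GAMMA, iters=500):
    """Compute the discounted-sum game value: the agent maximizes over its
    actions, the adversary minimizes, and chance is averaged out."""
    v = {s: 0.0 for s in game}
    for _ in range(iters):
        new_v = {}
        for s, (player, actions) in game.items():
            q_values = [
                sum(p * (r + gamma * v[t]) for p, t, r in outcomes)
                for outcomes in actions.values()
            ]
            new_v[s] = max(q_values) if player == "agent" else min(q_values)
        v = new_v
    return v

values = value_iteration(GAME)
```

In this toy game the adversary at `s1` prefers the punishing move `y`, so the agent's best reply at `s0` is the hedged action `b`; value iteration converges to the fixed point of the max/min Bellman operator. The paper's setting layers a deterministic Büchi automaton (tracking the LTL specification) on top of such a game and restricts to almost-sure winning strategies, which the sketch omits.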
- Published
- 2022