Back to Search
Start Over
Probably Approximately Correct Learning in Adversarial Environments With Temporal Logic Specifications
- Source :
- IEEE Transactions on Automatic Control. 67:5055-5070
- Publication Year :
- 2022
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2022.
-
Abstract
- Reinforcement learning (RL) algorithms have been used to learn how to implement tasks in uncertain and partially unknown environments. In practice, environments are usually uncontrolled and may affect task performance in an adversarial way. In this paper, we model the interaction between an RL agent and its potentially adversarial environment as a turn-based zero-sum stochastic game. The task requirements are represented both qualitatively as a linear temporal logic specification, and quantitatively as a reward function. For each case in which the specification is realizable and can be equivalently transformed into a deterministic Buchi automaton, we show that there always exists a memoryless almost-sure winning strategy that is epsilon-optimal for the discountedd sum objective for any arbitrary positive epsilon. We propose a probably approximately correct (PAC) learning algorithm that learns such a strategy efficiently in an online manner with a priori unknown reward functions and unknown transition distributions. To the best of our knowledge, this is the first result on PAC learning in stochastic games with independent quantitative and qualitative objectives.
- Subjects :
- Computer science
business.industry
media_common.quotation_subject
Stochastic game
Probably approximately correct learning
Büchi automaton
Computer Science Applications
Task (project management)
Linear temporal logic
Control and Systems Engineering
Reinforcement learning
Temporal logic
Artificial intelligence
Electrical and Electronic Engineering
Function (engineering)
business
media_common
Subjects
Details
- ISSN :
- 23343303 and 00189286
- Volume :
- 67
- Database :
- OpenAIRE
- Journal :
- IEEE Transactions on Automatic Control
- Accession number :
- edsair.doi...........6dbfa7edd4cbe126f105852f4fbb833d
- Full Text :
- https://doi.org/10.1109/tac.2021.3115080