
Faithful and Effective Reward Schemes for Model-Free Reinforcement Learning of Omega-Regular Objectives

Authors :: Hahn, Ernst Moritz
Perez, Mateo
Schewe, Sven
Somenzi, Fabio
Trivedi, Ashutosh
Wojtczak, Dominik
Editors :: Hung, Dang Van
Sokolsky, Oleg
Affiliation :: Formal Methods and Tools, University of Twente
Source :: Automated Technology for Verification and Analysis: 18th International Symposium, ATVA 2020, Hanoi, Vietnam, October 19–23, 2020, Proceedings, pp. 108–124. ISBN: 9783030591519
Publication Year :: 2020
Publisher :: Springer, 2020.
Abstract: Omega-regular properties, specified using linear time temporal logic or various forms of omega-automata, find increasing use in specifying the objectives of reinforcement learning (RL). The key problem that arises is that of faithful and effective translation of the objective into a scalar reward for model-free RL. A recent approach exploits Büchi automata with restricted nondeterminism to reduce the search for an optimal policy for an omega-regular property to that for a simple reachability objective. A possible drawback of this translation is that reachability rewards are sparse, being reaped only at the end of each episode. Another approach reduces the search for an optimal policy to an optimization problem with two interdependent discount parameters. While this approach provides denser rewards than the reduction to reachability, it is not easily mapped to off-the-shelf RL algorithms. We propose a reward scheme that reduces the search for an optimal policy to an optimization problem with a single discount parameter, produces dense rewards, and is compatible with off-the-shelf RL algorithms. Finally, we report an experimental comparison of these and other reward schemes for model-free RL with omega-regular objectives.
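To make the sparse-versus-dense contrast in the abstract concrete, here is a minimal Python sketch of the two prior schemes it compares against. It is not the single-discount scheme the paper proposes. Assumptions, all hypothetical: a run of the product of the environment and the automaton is abstracted as a sequence of booleans marking accepting transitions; the reachability reduction is modeled as "on each accepting transition, jump to an absorbing sink with probability 1 - zeta and collect reward 1"; the two-discount scheme is assumed to pay reward 1 - gamma_b and discount by gamma_b on accepting steps, and to pay 0 and discount by gamma elsewhere. The function and parameter names are illustrative only.

```python
import random
from typing import List, Sequence


def reachability_rewards(accepting: Sequence[bool], zeta: float,
                         rng: random.Random) -> List[float]:
    """Per-step rewards of one episode under the assumed reachability reduction.

    On each accepting transition the episode moves to an absorbing sink with
    probability 1 - zeta and pays reward 1; every other step pays 0. The reward
    is therefore sparse: at most one nonzero value, at the end of the episode.
    """
    rewards: List[float] = []
    for acc in accepting:
        # rng.random() is uniform on [0, 1), so this branch has probability 1 - zeta.
        if acc and rng.random() >= zeta:
            rewards.append(1.0)
            return rewards  # episode ends in the sink
        rewards.append(0.0)
    return rewards


def two_discount_return(accepting: Sequence[bool], gamma_b: float,
                        gamma: float) -> float:
    """Discounted return under the assumed two-discount scheme.

    Accepting steps pay 1 - gamma_b and discount the future by gamma_b;
    non-accepting steps pay 0 and discount the future by gamma. Reward arrives
    on every accepting step, so it is denser than the reachability reward.
    """
    total, scale = 0.0, 1.0
    for acc in accepting:
        if acc:
            total += scale * (1.0 - gamma_b)
            scale *= gamma_b
        else:
            scale *= gamma
    return total


if __name__ == "__main__":
    run = [False, True, False, True, True]
    print(reachability_rewards(run, zeta=0.9, rng=random.Random(1)))
    print(two_discount_return(run, gamma_b=0.9, gamma=0.99))
```

The sketch only illustrates reward density along a fixed run; it omits the environment, the automaton, and the learning algorithm. Note that the return of `two_discount_return` depends on two coupled parameters, which reflects the abstract's remark that such a scheme does not map directly onto off-the-shelf RL algorithms that expect a single discount factor.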

Subjects :: Mathematical optimization
Optimization problem
Computer science
Büchi automaton
02 engineering and technology
Reduction (complexity)
Reachability
0202 electrical engineering, electronic engineering, information engineering
Reinforcement learning
020201 artificial intelligence & image processing
Temporal logic

Details

Language :: English
ISBN :: 978-3-030-59151-9
Database :: OpenAIRE
Accession number :: edsair.doi.dedup.....bebbebe5d0c7e05b17c95cbba4a2e033
