DSMC Evaluation Stages: Fostering Robust and Safe Behavior in Deep Reinforcement Learning – Extended Version
- Author
- Gros, Timo P.; Groß, Joschka; Höller, Daniel; Hoffmann, Jörg; Klauck, Michaela; Meerkamp, Hendrik; Müller, Nicola J.; Schaller, Lukas; Wolf, Verena
- Subjects
- Deep reinforcement learning, Reinforcement learning, Deep learning, Reinforcement (Psychology), Markov processes, Active learning, Decision making, Artificial intelligence
- Abstract
- Neural networks (NN) are gaining importance in sequential decision-making. Deep reinforcement learning (DRL), in particular, is extremely successful in learning action policies in complex and dynamic environments. Despite this success, however, DRL technology is not without its failures, especially in safety-critical applications: (i) the training objective maximizes average rewards, which may disregard rare but critical situations and hence lack local robustness; (ii) optimization objectives targeting safety typically yield degenerate reward structures, which, for DRL to work, must be replaced with proxy objectives. Here, we introduce a methodology that can help to address both deficiencies. We incorporate evaluation stages (ES) into DRL, leveraging recent work on deep statistical model checking (DSMC), which verifies NN policies in Markov decision processes. Our ES apply DSMC at regular intervals to determine state space regions with weak performance. We adapt the subsequent DRL training priorities based on the outcome, (i) focusing DRL on critical situations and (ii) making it possible to foster arbitrary objectives. We run case studies on two benchmarks. One of them is the Racetrack, an abstraction of autonomous driving that requires navigating a map without crashing into a wall. The other is MiniGrid, a widely used benchmark in the AI community. Our results show that DSMC-based ES can significantly improve both (i) and (ii).
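To make the loop described in the abstract concrete, the following is a minimal sketch of how evaluation stages could be interleaved with DRL training. It is an illustrative assumption, not the authors' implementation: `train_policy` and `rollout_succeeds` are hypothetical placeholders for an arbitrary DRL trainer and policy simulator, and the DSMC step is approximated here by plain Monte Carlo rollouts of the learned policy.

```python
# Hypothetical sketch: interleaving DRL training with evaluation stages that
# re-prioritize start states based on estimated per-state success probability.
# `train_policy` and `rollout_succeeds` are assumed placeholders, not a real API.
import random

def evaluation_stage(rollout_succeeds, policy, start_states, runs=200):
    """Estimate each start state's goal probability by Monte Carlo rollouts
    (a stand-in for the statistical model checking step)."""
    return {
        s: sum(rollout_succeeds(policy, s) for _ in range(runs)) / runs
        for s in start_states
    }

def train_with_evaluation_stages(train_policy, rollout_succeeds, policy,
                                 start_states, stages=10, steps_per_stage=50_000):
    """Alternate ordinary DRL training with evaluation stages that re-weight
    the start-state distribution toward weakly performing regions."""
    weights = {s: 1.0 for s in start_states}  # uniform priorities initially
    for _ in range(stages):
        # Ordinary DRL phase: start states are drawn according to current priorities.
        sampler = lambda: random.choices(list(weights),
                                         weights=list(weights.values()))[0]
        policy = train_policy(policy, sampler, steps_per_stage)
        # Evaluation stage: estimate where the current policy performs poorly.
        scores = evaluation_stage(rollout_succeeds, policy, start_states)
        # Low estimated success => higher sampling priority in the next phase.
        weights = {s: (1.0 - scores[s]) + 1e-3 for s in start_states}
    return policy
```

In this sketch, states with low estimated goal probability receive higher sampling weight in the next training phase, which is one simple way to realize the idea of focusing DRL on weakly performing regions; the choice of re-weighting rule is an assumption for illustration.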
- Published
- 2023