Author: "Delgrange, Florent" - Searchworks@Jio Institute Digital Library Search Results

Your search for '"Delgrange, Florent"' returned 13 results.

1. Synthesis of Hierarchical Controllers Based on Deep Reinforcement Learning Policies

Author: Delgrange, Florent, Avni, Guy, Lukina, Anna, Schilling, Christian, Nowé, Ann, and Pérez, Guillermo A.
Subjects: Computer Science - Artificial Intelligence
Abstract: We propose a novel approach to the problem of controller design for environments modeled as Markov decision processes (MDPs). Specifically, we consider a hierarchical MDP: a graph with each vertex populated by an MDP called a "room". We first apply deep reinforcement learning (DRL) to obtain low-level policies for each room, scaling to large rooms of unknown structure. We then apply reactive synthesis to obtain a high-level planner that chooses which low-level policy to execute in each room. The central challenge in synthesizing the planner is the need for modeling rooms. We address this challenge by developing a DRL procedure to train concise "latent" policies together with PAC guarantees on their performance. Unlike previous approaches, ours circumvents a model distillation step. Our approach combats sparse rewards in DRL and enables reusability of low-level policies. We demonstrate feasibility in a case study involving agent navigation amid moving obstacles.
Comment: 19 pages main text, 17 pages appendix (excluding references)
Published: 2024
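The division of labour in this abstract (low-level policies per room, a high-level planner choosing among them) can be illustrated with a toy planner. This is not the authors' reactive-synthesis procedure: it assumes, hypothetically, that each low-level policy is summarized only by the room it leads to and a success probability, treats failure as absorbing, and picks, for every room, the policy that maximizes the probability of eventually reaching a goal room. The encoding and the names `plan`, `options`, and `goal` are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: options[room] lists (policy_name, next_room, success_prob).
# A failed execution is treated as absorbing, so a room's value is the best
# product of success probabilities along a path to the goal room.
Options = Dict[str, List[Tuple[str, str, float]]]


def plan(options: Options, goal: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    rooms = set(options) | {goal}
    for opts in options.values():
        rooms |= {nxt for _, nxt, _ in opts}
    value = {room: (1.0 if room == goal else 0.0) for room in rooms}
    choice: Dict[str, str] = {}
    # With probabilities <= 1, cycles never improve a value, so optimal paths
    # are simple and |rooms| rounds of Bellman-Ford-style relaxation suffice.
    for _ in range(len(rooms)):
        for room, opts in options.items():
            if room == goal:
                continue
            for name, nxt, prob in opts:
                if prob * value[nxt] > value[room]:
                    value[room] = prob * value[nxt]
                    choice[room] = name
    return value, choice


options = {
    "A": [("go_east", "B", 0.9), ("go_north", "C", 0.5)],
    "B": [("go_goal", "G", 0.8)],
    "C": [("go_goal", "G", 1.0)],
}
value, choice = plan(options, goal="G")
print(choice["A"], round(value["A"], 2))  # go_east 0.72
```

In room A the planner prefers the riskier first hop (0.9 then 0.8) over the safer-looking detour (0.5 then 1.0), because only the product of success probabilities matters under this simplified model.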

2. Wasserstein Auto-encoded MDPs: Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees

Author: Delgrange, Florent, Nowé, Ann, and Pérez, Guillermo A.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Although deep reinforcement learning (DRL) has many success stories, the large-scale deployment of policies learned through these advanced techniques in safety-critical scenarios is hindered by their lack of formal guarantees. Variational Markov Decision Processes (VAE-MDPs) are discrete latent space models that provide a reliable framework for distilling formally verifiable controllers from any RL policy. While the related guarantees address relevant practical aspects such as the satisfaction of performance and safety properties, the VAE approach suffers from several learning flaws (posterior collapse, slow learning speed, poor dynamics estimates), primarily due to the absence of abstraction and representation guarantees to support latent optimization. We introduce the Wasserstein auto-encoded MDP (WAE-MDP), a latent space model that fixes those issues by minimizing a penalized form of the optimal transport between the behaviors of the agent executing the original policy and the distilled policy, for which the formal guarantees apply. Our approach yields bisimulation guarantees while learning the distilled policy, allowing concrete optimization of the abstraction and representation model quality. Our experiments show that, besides distilling policies up to 10 times faster, the latent model quality is indeed better in general. Moreover, we present experiments from a simple time-to-failure verification algorithm on the latent space. The fact that our approach enables such simple verification techniques highlights its applicability.
Comment: ICLR 2023, 10 pages main text, 14 pages appendix (excluding references)
Published: 2023

3. The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models

Author: Avalos, Raphael, Delgrange, Florent, Nowé, Ann, Pérez, Guillermo A., and Roijers, Diederik M.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Partially Observable Markov Decision Processes (POMDPs) are used to model environments where the full state cannot be perceived by an agent. As such, the agent needs to reason by taking into account past observations and actions. However, simply remembering the full history is generally intractable due to the exponential growth in the history space. Maintaining a probability distribution that models the belief over what the true state is can be used as a sufficient statistic of the history, but its computation requires access to the model of the environment and is often intractable. While state-of-the-art (SOTA) algorithms use Recurrent Neural Networks to compress the observation-action history aiming to learn a sufficient statistic, they lack guarantees of success and can lead to sub-optimal policies. To overcome this, we propose the Wasserstein Belief Updater, an RL algorithm that learns a latent model of the POMDP and an approximation of the belief update. Our approach comes with theoretical guarantees on the quality of our approximation ensuring that our outputted beliefs allow for learning the optimal value function.
Published: 2023
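The exact belief update that this abstract describes as requiring access to the environment model can be shown on a small discrete POMDP. The sketch is the textbook Bayes-filter update b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) b(s), not the Wasserstein Belief Updater, which learns an approximation of it; the dictionary encodings `T` and `O` and the tiger-style example are hypothetical.

```python
from typing import Dict, Hashable, Tuple

State = Hashable
Action = Hashable
Obs = Hashable
# Hypothetical model format: T[(s, a)] maps next states to probabilities,
# O[(s_next, a)] maps observations to probabilities.
Transitions = Dict[Tuple[State, Action], Dict[State, float]]
Observations = Dict[Tuple[State, Action], Dict[Obs, float]]


def belief_update(belief: Dict[State, float], action: Action, obs: Obs,
                  T: Transitions, O: Observations) -> Dict[State, float]:
    # Prediction step: push the current belief through the transition model.
    predicted: Dict[State, float] = {}
    for s, p in belief.items():
        for s_next, q in T[(s, action)].items():
            predicted[s_next] = predicted.get(s_next, 0.0) + p * q
    # Correction step: weight each next state by the likelihood of the observation.
    weighted = {s: p * O[(s, action)].get(obs, 0.0) for s, p in predicted.items()}
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: w / total for s, w in weighted.items()}


# Two hidden states; listening leaves the state unchanged and is 85% accurate.
T = {("L", "listen"): {"L": 1.0}, ("R", "listen"): {"R": 1.0}}
O = {("L", "listen"): {"hearL": 0.85, "hearR": 0.15},
     ("R", "listen"): {"hearL": 0.15, "hearR": 0.85}}
b = belief_update({"L": 0.5, "R": 0.5}, "listen", "hearL", T, O)
print(round(b["L"], 2))  # 0.85
```

The belief has one entry per state rather than one per history, which is why it serves as a compact sufficient statistic; the catch, as the abstract notes, is that computing it needs `T` and `O`.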

4. Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes (Technical Report)

Author: Delgrange, Florent, Nowé, Ann, and Pérez, Guillermo A.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework introduced by Gelada et al. to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.
Comment: AAAI 2022, technical report including supplementary material (10 pages main text, 14 pages appendix)
Published: 2021

5. Simple Strategies in Multi-Objective MDPs (Technical Report)

Author: Delgrange, Florent, Katoen, Joost-Pieter, Quatmann, Tim, and Randour, Mickael
Subjects: Computer Science - Logic in Computer Science, Computer Science - Artificial Intelligence
Abstract: We consider the verification of multiple expected reward objectives at once on Markov decision processes (MDPs). This enables a trade-off analysis among multiple objectives by obtaining the Pareto front. We focus on strategies that are easy to employ and implement. That is, strategies that are pure (no randomization) and have bounded memory. We show that checking whether a point is achievable by a pure stationary strategy is NP-complete, even for two objectives, and we provide an MILP encoding to solve the corresponding problem. The bounded memory case can be reduced to the stationary one by a product construction. Experimental results using Storm and Gurobi show the feasibility of our algorithms.
Published: 2019
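The decision problem in this abstract (is a given point achievable by a pure stationary strategy?) can be made concrete with a deliberately naive sketch: enumerate every pure stationary strategy and check whether its objective values dominate the point componentwise. The enumeration is exponential in the number of states, which is consistent with the NP-completeness result, but it is not the authors' MILP encoding. The MDP encoding, the use of discounted rather than general expected rewards, the tolerance `eps`, and all function names are assumptions made for illustration.

```python
import itertools
from typing import List, Optional, Sequence, Tuple

# Hypothetical encoding: P[s][a] is a list of (next_state, prob) pairs, and
# R[k][s][a] is the reward of objective k for playing action a in state s.
Transitions = List[List[List[Tuple[int, float]]]]
Rewards = List[List[List[float]]]


def evaluate(strategy: Sequence[int], P: Transitions, rewards: List[List[float]],
             gamma: float, start: int, tol: float = 1e-12) -> float:
    """Discounted value of a pure stationary strategy, by iterative policy evaluation."""
    v = [0.0] * len(P)
    while True:
        new = [rewards[s][strategy[s]]
               + gamma * sum(p * v[t] for t, p in P[s][strategy[s]])
               for s in range(len(P))]
        if max(abs(x - y) for x, y in zip(new, v)) < tol:
            return new[start]
        v = new


def pure_stationary_witness(point: Sequence[float], P: Transitions, R: Rewards,
                            gamma: float, start: int = 0,
                            eps: float = 1e-9) -> Optional[Tuple[int, ...]]:
    """Return a pure stationary strategy whose values dominate `point`, or None."""
    # Exhaustive search over one action per state: exponential in the number of states.
    for strategy in itertools.product(*(range(len(acts)) for acts in P)):
        values = [evaluate(strategy, P, R_k, gamma, start) for R_k in R]
        # eps absorbs the small error left by stopping policy evaluation early.
        if all(v >= q - eps for v, q in zip(values, point)):
            return strategy
    return None


# One state with two self-loops: action 0 pays (1, 0), action 1 pays (0, 1).
P = [[[(0, 1.0)], [(0, 1.0)]]]
R = [[[1.0, 0.0]], [[0.0, 1.0]]]
print(pure_stationary_witness((2.0, 0.0), P, R, gamma=0.5))  # (0,)
print(pure_stationary_witness((1.0, 1.0), P, R, gamma=0.5))  # None
```

The second query fails even though playing each action with probability 1/2 would achieve (1, 1), which illustrates why restricting attention to pure strategies changes the problem.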

6. Life is Random, Time is Not: Markov Decision Processes with Window Objectives

Author: Brihaye, Thomas, Delgrange, Florent, Oualhadj, Youssouf, and Randour, Mickael
Subjects: Computer Science - Logic in Computer Science, Computer Science - Artificial Intelligence, Computer Science - Formal Languages and Automata Theory, Computer Science - Computer Science and Game Theory, Mathematics - Probability
Abstract: The window mechanism was introduced by Chatterjee et al. to strengthen classical game objectives with time bounds. It makes it possible to synthesize system controllers that exhibit acceptable behaviors within a configurable time frame, all along their infinite execution, in contrast to the traditional objectives that only require correctness of behaviors in the limit. The window concept has proved useful in a variety of two-player zero-sum games, both because it enables reasoning about such time bounds in system specifications and because of the increased tractability it usually yields. In this work, we extend the window framework to stochastic environments by considering Markov decision processes. A fundamental problem in this context is the threshold probability problem: given an objective, it asks to synthesize strategies that guarantee satisfying runs with at least a given probability. We solve it for the usual variants of window objectives, where either the time frame is set as a parameter, or we ask if such a time frame exists. We develop a generic approach for window-based objectives and instantiate it for the classical mean-payoff and parity objectives, already considered in games. Our work paves the way for a wide use of the window mechanism in stochastic models.
Published: 2019
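To make the window mechanism concrete, the sketch below checks a finite sequence of integer weights against a fixed maximal window length, in the spirit of the window mean-payoff objective of Chatterjee et al.: from every position, the mean weight (equivalently, the running sum) must become non-negative within `lmax` steps. It assumes threshold 0, inspects only a finite prefix, and skips positions whose full window does not fit; it is not the paper's algorithm for MDPs, and the function names are hypothetical.

```python
from typing import Sequence


def window_closes(weights: Sequence[int], i: int, lmax: int) -> bool:
    """True if the running sum from position i becomes non-negative within lmax steps."""
    total = 0
    for j in range(i, min(i + lmax, len(weights))):
        total += weights[j]
        if total >= 0:
            return True
    return False


def fixed_window_ok(weights: Sequence[int], lmax: int) -> bool:
    """Check every position of the finite prefix whose full window of length lmax fits."""
    return all(window_closes(weights, i, lmax) for i in range(len(weights) - lmax + 1))


print(fixed_window_ok([-1, -1, 3], 2))  # False
print(fixed_window_ok([-1, -1, 3], 3))  # True
```

The prefix has positive total weight, yet from its first position the running sum only turns non-negative after three steps, so the condition fails for `lmax = 2` and holds for `lmax = 3`. A limit-based mean-payoff objective would not distinguish these two cases.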

7. A framework for flexibly guiding learning agents

Author: Elbarbari, Mahmoud, Delgrange, Florent, Vervlimmeren, Ivo, Efthymiadis, Kyriakos, Vanderborght, Bram, and Nowé, Ann
Published: 2022

8. Simple Strategies in Multi-Objective MDPs

Author: Delgrange, Florent, Katoen, Joost-Pieter, Quatmann, Tim, Randour, Mickael, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Biere, Armin, editor, and Parker, David, editor
Published: 2020

9. Simple Strategies in Multi-Objective MDPs

Author: Delgrange, Florent, primary, Katoen, Joost-Pieter, additional, Quatmann, Tim, additional, and Randour, Mickael, additional
Published: 2020

10. The Wasserstein Believer

Author: Avalos, Raphael, Delgrange, Florent, Nowé, Ann, Pérez, Guillermo A., Roijers, Diederik M., Informatics and Applied Informatics, Faculty of Sciences and Bioengineering Sciences, Artificial Intelligence, and Electronics and Informatics
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Machine Learning (cs.LG), Artificial Intelligence (cs.AI)
Abstract: Partially Observable Markov Decision Processes (POMDPs) are useful tools to model environments where the full state cannot be perceived by an agent. As such, the agent needs to reason by taking into account past observations and actions. However, simply remembering the full history is generally intractable due to the exponential growth in the history space. Keeping a probability distribution that models the belief over what the true state is can be used as a sufficient statistic of the history, but its computation requires access to the model of the environment and is also intractable. Current state-of-the-art algorithms use Recurrent Neural Networks (RNNs) to compress the observation-action history aiming to learn a sufficient statistic, but they lack guarantees of success and can lead to suboptimal policies. To overcome this, we propose the Wasserstein-Belief-Updater (WBU), an RL algorithm that learns a latent model of the POMDP and an approximation of the belief update. Our approach comes with theoretical guarantees on the quality of our approximation ensuring that our outputted beliefs allow for learning the optimal value function.
Published: 2023

11. Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes

Author: Delgrange, Florent, Nowé, Ann, Pérez, Guillermo A., Faculty of Sciences and Bioengineering Sciences, Informatics and Applied Informatics, Artificial Intelligence, and Electronics and Informatics
Subjects: Formal methods, Reinforcement Learning, Representation learning
Abstract: We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.
Published: 2022

12. Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes

Author: Delgrange, Florent, primary, Nowé, Ann, additional, and Pérez, Guillermo A., additional
Published: 2022

13. Simple Strategies in Multi-Objective MDPs

Author: Quatmann, Tim, primary, Delgrange, Florent, primary, Katoen, Joost-Pieter, primary, and Randour, Mickael, primary
Published: 2021
