Author: "Simão, Thiago D." - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Simão, Thiago D."' showing total 29 results

1. Pessimistic Iterative Planning for Robust POMDPs

Author: Galesloot, Maris F. L., Suilen, Marnix, Simão, Thiago D., Carr, Steven, Spaan, Matthijs T. J., Topcu, Ufuk, and Jansen, Nils
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Robust POMDPs extend classical POMDPs to handle model uncertainty. Specifically, robust POMDPs exhibit so-called uncertainty sets on the transition and observation models, effectively defining ranges of probabilities. Policies for robust POMDPs must be (1) memory-based to account for partial observability and (2) robust against model uncertainty to account for the worst-case instances from the uncertainty sets. To compute such robust memory-based policies, we propose the pessimistic iterative planning (PIP) framework, which alternates between two main steps: (1) selecting a pessimistic (non-robust) POMDP via worst-case probability instances from the uncertainty sets; and (2) computing a finite-state controller (FSC) for this pessimistic POMDP. We evaluate the performance of this FSC on the original robust POMDP and use this evaluation in step (1) to select the next pessimistic POMDP. Within PIP, we propose the rFSCNet algorithm. In each iteration, rFSCNet finds an FSC through a recurrent neural network by using supervision policies optimized for the pessimistic POMDP. The empirical evaluation in four benchmark environments showcases improved robustness against several baseline methods and competitive performance compared to a state-of-the-art robust POMDP solver.
Published: 2024

2. Maintenance Strategies for Sewer Pipes with Multi-State Degradation and Deep Reinforcement Learning

Author: Jimenez-Roa, Lisandro A., Simão, Thiago D., Bukhsh, Zaharah, Tinga, Tiedo, Molegraaf, Hajo, Jansen, Nils, and Stoelinga, Marielle
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computational Engineering, Finance, and Science
Abstract: Large-scale infrastructure systems are crucial for societal welfare, and their effective management requires strategic forecasting and intervention methods that account for various complexities. Our study addresses two challenges within the Prognostics and Health Management (PHM) framework applied to sewer assets: modeling pipe degradation across severity levels and developing effective maintenance policies. We employ Multi-State Degradation Models (MSDM) to represent the stochastic degradation process in sewer pipes and use Deep Reinforcement Learning (DRL) to devise maintenance strategies. A case study of a Dutch sewer network exemplifies our methodology. Our findings demonstrate the model's effectiveness in generating intelligent, cost-saving maintenance strategies that surpass heuristics. It adapts its management strategy based on the pipe's age, opting for a passive approach for newer pipes and transitioning to active strategies for older ones to prevent failures and reduce costs. This research highlights DRL's potential in optimizing maintenance policies. Future research will aim to improve the model by incorporating partial observability, exploring various reinforcement learning algorithms, and extending this methodology to comprehensive infrastructure management.
Published: 2024
Full Text: View/download PDF

3. Factored Online Planning in Many-Agent POMDPs

Author: Galesloot, Maris F. L., Simão, Thiago D., Junges, Sebastian, and Jansen, Nils
Subjects: Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
Abstract: In centralized multi-agent systems, often modeled as multi-agent partially observable Markov decision processes (MPOMDPs), the action and observation spaces grow exponentially with the number of agents, making the value and belief estimation of single-agent online planning ineffective. Prior work partially tackles value estimation by exploiting the inherent structure of multi-agent settings via so-called coordination graphs. Additionally, belief estimation methods have been improved by incorporating the likelihood of observations into the approximation. However, the challenges of value estimation and belief estimation have only been tackled individually, which prevents existing methods from scaling to settings with many agents. Therefore, we address these challenges simultaneously. First, we introduce weighted particle filtering to a sample-based online planner for MPOMDPs. Second, we present a scalable approximation of the belief. Third, we bring an approach that exploits the typical locality of agent interactions to novel online planning algorithms for MPOMDPs operating on a so-called sparse particle filter tree. Our experimental evaluation against several state-of-the-art baselines shows that our methods (1) are competitive in settings with only a few agents and (2) improve over the baselines in the presence of many agents., Comment: Extended version (includes the Appendix) of the paper accepted at AAAI-24
Published: 2023

4. Robust Active Measuring under Model Uncertainty

Author: Krale, Merlijn, Simão, Thiago D., Tumova, Jana, and Jansen, Nils
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Partial observability and uncertainty are common problems in sequential decision-making that particularly impede the use of formal models such as Markov decision processes (MDPs). However, in practice, agents may be able to employ costly sensors to measure their environment and resolve partial observability by gathering information. Moreover, imprecise transition functions can capture model uncertainty. We combine these concepts and extend MDPs to robust active-measuring MDPs (RAM-MDPs). We present an active-measure heuristic to solve RAM-MDPs efficiently and show that model uncertainty can, counterintuitively, let agents take fewer measurements. We propose a method to counteract this behavior while only incurring a bounded additional cost. We empirically compare our methods to several baselines and show their superior scalability and performance., Comment: Accepted at AAAI 2024
Published: 2023

5. Reinforcement Learning by Guided Safe Exploration

Author: Yang, Qisong, Simão, Thiago D., Jansen, Nils, Tindemans, Simon H., and Spaan, Matthijs T. J.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward to adapt quickly once the reward is revealed. We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal. This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal. After the target task is revealed, safety violations are not allowed anymore. Thus, the guide is leveraged to compose a safe behaviour policy. Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable and gradually eliminate the influence of the guide as training progresses. The empirical analysis shows that this method can achieve safe transfer learning and helps the student solve the target task faster., Comment: Accepted at ECAI 2023
Published: 2023
Full Text: View/download PDF

6. More for Less: Safe Policy Improvement With Stronger Performance Guarantees

Author: Wienhöft, Patrick, Suilen, Marnix, Simão, Thiago D., Dubslaff, Clemens, Baier, Christel, and Jansen, Nils
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: In an offline reinforcement learning setting, the safe policy improvement (SPI) problem aims to improve the performance of a behavior policy according to which sample data has been generated. State-of-the-art approaches to SPI require a high number of samples to provide practical probabilistic guarantees on the improved policy's performance. We present a novel approach to the SPI problem that provides the means to require less data for such guarantees. Specifically, to prove the correctness of these guarantees, we devise implicit transformations on the data set and the underlying environment model that serve as theoretical foundations to derive tighter improvement bounds for SPI. Our empirical evaluation, using the well-established SPI with baseline bootstrapping (SPIBB) algorithm, on standard benchmarks shows that our method indeed significantly reduces the sample complexity of the SPIBB algorithm., Comment: Accepted at IJCAI 2023
Published: 2023

7. Act-Then-Measure: Reinforcement Learning for Partially Observable Environments with Active Measuring

Author: Krale, Merlijn, Simão, Thiago D., and Jansen, Nils
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We study Markov decision processes (MDPs), where agents have direct control over when and how they gather information, as formalized by action-contingent noiselessly observable MDPs (ACNO-MDPs). In these models, actions consist of two components: a control action that affects the environment, and a measurement action that affects what the agent can observe. To solve ACNO-MDPs, we introduce the act-then-measure (ATM) heuristic, which assumes that we can ignore future state uncertainty when choosing control actions. We show how following this heuristic may lead to shorter policy computation times and prove a bound on the performance loss incurred by the heuristic. To decide whether or not to take a measurement action, we introduce the concept of measuring value. We develop a reinforcement learning algorithm based on the ATM heuristic, using a Dyna-Q variant adapted for partially observable domains, and showcase its superior performance compared to prior methods on a number of partially-observable environments., Comment: Accepted at ICAPS 2023
Published: 2023

8. Decision-Making Under Uncertainty: Beyond Probabilities

Author: Badings, Thom, Simão, Thiago D., Suilen, Marnix, and Jansen, Nils
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Robotics, Electrical Engineering and Systems Science - Systems and Control
Abstract: This position paper reflects on the state-of-the-art in decision-making under uncertainty. A classical assumption is that probabilities can sufficiently capture all uncertainty in a system. In this paper, the focus is on the uncertainty that goes beyond this classical interpretation, particularly by employing a clear distinction between aleatoric and epistemic uncertainty. The paper features an overview of Markov decision processes (MDPs) and extensions to account for partial observability and adversarial behavior. These models sufficiently capture aleatoric uncertainty but fail to account for epistemic uncertainty robustly. Consequently, we present a thorough overview of so-called uncertainty models that exhibit uncertainty in a more robust interpretation. We show several solution techniques for both discrete and continuous models, ranging from formal verification, over control-based abstractions, to reinforcement learning. As an integral part of this paper, we list and discuss several key challenges that arise when dealing with rich types of uncertainty in a model-based fashion.
Published: 2023

9. Safe Policy Improvement for POMDPs via Finite-State Controllers

Author: Simão, Thiago D., Suilen, Marnix, and Jansen, Nils
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We study safe policy improvement (SPI) for partially observable Markov decision processes (POMDPs). SPI is an offline reinforcement learning (RL) problem that assumes access to (1) historical data about an environment, and (2) the so-called behavior policy that previously generated this data by interacting with the environment. SPI methods neither require access to a model nor the environment itself, and aim to reliably improve the behavior policy in an offline manner. Existing methods make the strong assumption that the environment is fully observable. In our novel approach to the SPI problem for POMDPs, we assume that a finite-state controller (FSC) represents the behavior policy and that finite memory is sufficient to derive optimal policies. This assumption allows us to map the POMDP to a finite-state fully observable MDP, the history MDP. We estimate this MDP by combining the historical data and the memory of the FSC, and compute an improved policy using an off-the-shelf SPI algorithm. The underlying SPI method constrains the policy-space according to the available data, such that the newly computed policy only differs from the behavior policy when sufficient data was available. We show that this new policy, converted into a new FSC for the (unknown) POMDP, outperforms the behavior policy with high probability. Experimental results on several well-established benchmarks show the applicability of the approach, even in cases where finite memory is not sufficient., Comment: Accepted at AAAI 2023
Published: 2023

10. Targeted Adversarial Attacks on Deep Reinforcement Learning Policies via Model Checking

Author: Gross, Dennis, Simao, Thiago D., Jansen, Nils, and Perez, Guillermo A.
Subjects: Computer Science - Machine Learning
Abstract: Deep Reinforcement Learning (RL) agents are susceptible to adversarial noise in their observations that can mislead their policies and decrease their performance. However, an adversary may be interested not only in decreasing the reward, but also in modifying specific temporal logic properties of the policy. This paper presents a metric that measures the exact impact of adversarial attacks against such properties. We use this metric to craft optimal adversarial attacks. Furthermore, we introduce a model checking method that allows us to verify the robustness of RL policies against adversarial attacks. Our empirical analysis confirms (1) the quality of our metric to craft adversarial attacks against temporal logic properties, and (2) that we are able to concisely assess a system's robustness against attacks., Comment: ICAART 2023 Paper (Technical Report)
Published: 2022

11. Safe Reinforcement Learning From Pixels Using a Stochastic Latent Representation

Author: Hogewind, Yannick, Simao, Thiago D., Kachman, Tal, and Jansen, Nils
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: We address the problem of safe reinforcement learning from pixel observations. Inherent challenges in such settings are (1) a trade-off between reward optimization and adhering to safety constraints, (2) partial observability, and (3) high-dimensional observations. We formalize the problem in a constrained, partially observable Markov decision process framework, where an agent obtains distinct reward and safety signals. To address the curse of dimensionality, we employ a novel safety critic using the stochastic latent actor-critic (SLAC) approach. The latent variable model predicts rewards and safety violations, and we use the safety critic to train safe policies. Using well-known benchmark environments, we demonstrate competitive performance over existing approaches with respect to computational requirements, final reward return, and satisfying the safety constraints.
Published: 2022

12. Robust Anytime Learning of Markov Decision Processes

Author: Suilen, Marnix, Simão, Thiago D., Parker, David, and Jansen, Nils
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Markov decision processes (MDPs) are formal models commonly used in sequential decision-making. MDPs capture the stochasticity that may arise, for instance, from imprecise actuators via probabilities in the transition function. However, in data-driven applications, deriving precise probabilities from (limited) data introduces statistical errors that may lead to unexpected or undesirable outcomes. Uncertain MDPs (uMDPs) do not require precise probabilities but instead use so-called uncertainty sets in the transitions, accounting for such limited data. Tools from the formal verification community efficiently compute robust policies that provably adhere to formal specifications, like safety constraints, under the worst-case instance in the uncertainty set. We continuously learn the transition probabilities of an MDP in a robust anytime-learning approach that combines a dedicated Bayesian inference scheme with the computation of robust policies. In particular, our method (1) approximates probabilities as intervals, (2) adapts to new data that may be inconsistent with an intermediate model, and (3) may be stopped at any time to compute a robust policy on the uMDP that faithfully captures the data so far. Furthermore, our method is capable of adapting to changes in the environment. We show the effectiveness of our approach and compare it to robust policies computed on uMDPs learned by the UCRL2 reinforcement learning algorithm in an experimental evaluation on several benchmarks., Comment: Accepted at NeurIPS 2022
Published: 2022

13. Reinforcement Learning by Guided Safe Exploration

Author: Yang, Qisong, primary, Simão, Thiago D., additional, Jansen, Nils, additional, Tindemans, Simon H., additional, and Spaan, Matthijs T. J., additional
Published: 2023
Full Text: View/download PDF

14. Decision-making under uncertainty: beyond probabilities: Challenges and perspectives

Author: Badings, Thom, Simão, Thiago D., Suilen, Marnix, and Jansen, Nils
Published: 2023
Full Text: View/download PDF

15. Safety-constrained reinforcement learning with a distributional safety critic

Author: Yang, Qisong, Simão, Thiago D., Tindemans, Simon H., and Spaan, Matthijs T. J.
Published: 2023
Full Text: View/download PDF

16. Safe Policy Improvement with an Estimated Baseline Policy

Author: Simão, Thiago D., Laroche, Romain, and Combes, Rémi Tachet des
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Previous work has shown the unreliability of existing algorithms in the batch Reinforcement Learning setting, and proposed the theoretically-grounded Safe Policy Improvement with Baseline Bootstrapping (SPIBB) fix: reproduce the baseline policy in the uncertain state-action pairs, in order to control the variance on the trained policy performance. However, in many real-world applications such as dialogue systems, pharmaceutical tests or crop management, data is collected under human supervision and the baseline remains unknown. In this paper, we apply SPIBB algorithms with a baseline estimate built from the data. We formally show safe policy improvement guarantees over the true baseline even without direct access to it. Our empirical experiments on finite and continuous states tasks support the theoretical findings. It shows little loss of performance in comparison with SPIBB when the baseline policy is given, and more importantly, drastically and significantly outperforms competing algorithms both in safe policy improvement, and in average performance., Comment: Published at AAMAS 2020
Published: 2019

17. Robust Active Measuring under Model Uncertainty

Author: Krale, Merlijn, Simão, Thiago D., Tumova, Jana, and Jansen, Nils
Abstract: Partial observability and uncertainty are common problems in sequential decision-making that particularly impede the use of formal models such as Markov decision processes (MDPs). However, in practice, agents may be able to employ costly sensors to measure their environment and resolve partial observability by gathering information. Moreover, imprecise transition functions can capture model uncertainty. We combine these concepts and extend MDPs to robust active-measuring MDPs (RAM-MDPs). We present an active-measure heuristic to solve RAM-MDPs efficiently and show that model uncertainty can, counterintuitively, let agents take fewer measurements. We propose a method to counteract this behavior while only incurring a bounded additional cost. We empirically compare our methods to several baselines and show their superior scalability and performance.
Published: 2024
Full Text: View/download PDF

18. When a Robot Reaches Out for Human Help

Author: Andrés, Ignasi, de Barros, Leliane Nunes, Mauá, Denis D., Simão, Thiago D., Simari, Guillermo R., editor, Fermé, Eduardo, editor, Gutiérrez Segura, Flabio, editor, and Rodríguez Melquiades, José Antonio, editor
Published: 2018
Full Text: View/download PDF

19. More for Less: Safe Policy Improvement with Stronger Performance Guarantees

Author: Wienhöft, Patrick, primary, Suilen, Marnix, additional, Simão, Thiago D., additional, Dubslaff, Clemens, additional, Baier, Christel, additional, and Jansen, Nils, additional
Published: 2023
Full Text: View/download PDF

20. Recursive Small-Step Multi-Agent A* for Dec-POMDPs

Author: Koops, Wietze, primary, Jansen, Nils, additional, Junges, Sebastian, additional, and Simão, Thiago D., additional
Published: 2023
Full Text: View/download PDF

21. Act-Then-Measure: Reinforcement Learning for Partially Observable Environments with Active Measuring

Author: Krale, Merlijn, primary, Simão, Thiago D., additional, and Jansen, Nils, additional
Published: 2023
Full Text: View/download PDF

22. Safe Policy Improvement for POMDPs via Finite-State Controllers

Author: Simão, Thiago D., primary, Suilen, Marnix, additional, and Jansen, Nils, additional
Published: 2023
Full Text: View/download PDF

23. Scalable Safe Policy Improvement via Monte Carlo Tree Search

Author: Castellini, Alberto (author), Bianchi, Federico (author), Zorzi, Edoardo (author), Simão, Thiago D. (author), Farinelli, Alessandro (author), and Spaan, M.T.J. (author)
Abstract: Algorithms for safely improving policies are important to deploy reinforcement learning approaches in real-world scenarios. In this work, we propose an algorithm, called MCTS-SPIBB, that computes safe policy improvement online using a Monte Carlo Tree Search based strategy. We theoretically prove that the policy generated by MCTS-SPIBB converges, as the number of simulations grows, to the optimal safely improved policy generated by Safe Policy Improvement with Baseline Bootstrapping (SPIBB), a popular algorithm based on policy iteration. Moreover, our empirical analysis performed on three standard benchmark domains shows that MCTS-SPIBB scales to significantly larger problems than SPIBB because it computes the policy online and locally, i.e., only in the states actually visited by the agent., Algorithmics
Published: 2023

24. When a Robot Reaches Out for Human Help

Author: Andrés, Ignasi, primary, de Barros, Leliane Nunes, additional, Mauá, Denis D., additional, and Simão, Thiago D., additional
Published: 2018
Full Text: View/download PDF

25. Safety-constrained reinforcement learning with a distributional safety critic

Author: Yang, Qisong, primary, Simão, Thiago D., additional, Tindemans, Simon H., additional, and Spaan, Matthijs T. J., additional
Published: 2022
Full Text: View/download PDF

26. WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning

Author: Yang, Qisong, primary, Simão, Thiago D., additional, Tindemans, Simon H, additional, and Spaan, Matthijs T. J., additional
Published: 2021
Full Text: View/download PDF

27. Safe and Sample-Efficient Reinforcement Learning Algorithms for Factored Environments

Author: Simão, Thiago D., primary
Published: 2019
Full Text: View/download PDF

28. Structure Learning for Safe Policy Improvement

Author: Simão, Thiago D., primary and Spaan, Matthijs T. J., additional
Published: 2019
Full Text: View/download PDF

29. Safe Policy Improvement with Baseline Bootstrapping in Factored Environments

Author: Simão, Thiago D., primary and Spaan, Matthijs T. J., additional
Published: 2019
Full Text: View/download PDF
