Descriptor: "Direct policy search" / Search Limiters: Full Text - Searchworks@Jio Institute Digital Library Search Results

Your search for '"Direct policy search"' returned 28 results

1. Connectivity of the Feasible and Sublevel Sets of Dynamic Output Feedback Control With Robustness Constraints

Author: Hu, Bin and Zheng, Yang
Subjects: Optimization landscape, sublevel set, direct policy search, H-infinity control, LQG control
Published: 2023

2. Multi-objective optimal design of interbasin water transfers: The Tagus-Segura aqueduct (Spain)

Author: Carlotta Valerio, Matteo Giuliani, Andrea Castelletti, Alberto Garrido, and Lucia De Stefano
Subjects: Interbasin water transfer, Tagus-Segura aqueduct, Multi-objective evolutionary optimization, Direct policy search, Environmental flow, Physical geography, GB3-5030, Geology, QE1-996.5
Abstract: Study region: The Tagus-Segura aqueduct (TSA) is a large and strategic water transfer scheme in Spain that connects Entrepeñas and Buendía reservoirs in the Tagus river headwaters to the Segura river basin, a highly stressed Mediterranean area. Study focus: The operating rules of the TSA underwent several modifications over the years, and the debate about which are the optimal parameters to meet the interests of the parties involved is still open. We employed Evolutionary Multi-Objective Direct Policy Search to jointly optimize the re-operation of the headwaters dams and the water transfer policy with respect to four conflicting objectives: Tagus and Segura water demands, hydropower production and socioeconomic benefit of the population living on the shores of the headwaters reservoirs. We tested the optimization under the baseline and the 2027 scenario, which foresees an increased environmental flow (EF) in the Tagus river. New hydrological insights for the region: The proposed operating rule presents optimized control parameters, a higher degree of freedom and a transferred volume that cyclically varies according to the hydrological stage of the year. In the 2027 scenario, despite the increased EF, the deficit in the aqueduct shows a limited increase compared to the historical solution (+10%), while the storage deficit is strongly reduced (−73%). This benefits the population living on the reservoirs shores and also ensures more stability to the aqueduct functioning.
Published: 2023
Full Text: View/download PDF

3. Quantifying the trade-offs in re-operating dams for the environment in the Lower Volta River.

Author: Owusu, Afua, Salazar, Jazmin Zatarain, Mul, Marloes, van der Zaag, Pieter, and Slinger, Jill
Abstract: The construction of the Akosombo and Kpong dams in the Lower Volta River Basin in Ghana changed the downstream riverine ecosystem and affected the lives of downstream communities, particularly those who lost their traditional livelihoods. In contrast to the costs borne by those in the vicinity of the river, Ghana as a whole has enjoyed vast economic benefits from the affordable hydropower, irrigation schemes and lake tourism that developed after construction of the dams. Herein lies the challenge: there exists a trade-off between water for river ecosystems and related services on the one hand, and anthropogenic water demands such as hydropower or irrigation on the other. In this study, an Evolutionary Multi-Objective Direct Policy Search (EMODPS) is used to identify the multi-sectorial trade-offs that exist in the Lower Volta River Basin. Three environmental flows, previously determined for the Lower Volta, are incorporated separately as an environmental objective. The results highlight the dominance of hydropower production in the Lower Volta, but show that there is room for providing environmental flows under current climatic and water use conditions if the firm energy requirement from Akosombo Dam is reduced by 12% to 38%, depending on the environmental flow regime that is implemented. There is uncertainty in climate change effects on runoff in this region; however, multiple scenarios are investigated. It is found that climate change leading to increased annual inflows to the Akosombo Dam reduces the trade-off between hydropower and the environment, while climate change resulting in lower inflows provides the opportunity to strategically provide dry season environmental flows, that is, reduce flows sufficiently to meet low flow requirements for key ecosystem services such as the clam fishery. This study not only highlights the challenges in balancing anthropogenic water demands and environmental considerations in managing existing dams, but also identifies opportunities for compromise in the Lower Volta River. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

4. Evaluating the choice of radial basis functions in multiobjective optimal control applications

Author: Zatarain Salazar, J., Kwakkel, J.H., and Witvliet, Mark
Abstract: Evolutionary Multi-Objective Direct Policy Search (EMODPS) is a prominent framework for designing control policies in multi-purpose environmental systems, combining direct policy search with multi-objective evolutionary algorithms (MOEAs) to identify Pareto approximate control policies. While EMODPS is effective, the choice of functions within its global approximator networks remains underexplored, despite their potential to significantly influence both solution quality and MOEA performance. This study conducts a rigorous assessment of a suite of Radial Basis Functions (RBFs) as candidates for these networks. We critically evaluate their ability to map system states to control actions, and assess their influence on Pareto efficient control policies. We apply this analysis to two contrasting case studies: the Conowingo Reservoir System, which balances competing water demands including hydropower, environmental flows, urban supply, power plant cooling, and recreation; and The Shallow Lake Problem, where a city navigates the trade-off between environmental and economic objectives when releasing anthropogenic phosphorus. Our findings reveal that the choice of RBF functions substantially impacts model outcomes. In complex scenarios like multi-objective reservoir control, this choice is critical, while in simpler contexts, such as the Shallow Lake Problem, the influence is less pronounced, though distinctive differences emerge in the characteristics of the prescribed control strategies. [Policy Analysis]
Published: 2024
Full Text: View/download PDF

5. Direct Policy Search Reinforcement Learning Based on Variational Bayesian Inference.

Author: Yamaguchi, Nobuhiko
Subjects: *REINFORCEMENT learning, *BAYESIAN analysis, *COMPUTER algorithms, *PARAMETER estimation, *GOVERNMENT policy
Abstract: Direct policy search is a promising reinforcement learning framework particularly for controlling continuous, high-dimensional systems. Peters et al. proposed reward-weighted regression (RWR) as a direct policy search. The RWR algorithm estimates the policy parameter based on the expectation-maximization (EM) algorithm and is therefore prone to overfitting. In this study, we focus on variational Bayesian inference to avoid overfitting and propose direct policy search reinforcement learning based on variational Bayesian inference (VBRL). The performance of the proposed VBRL is assessed in several experiments involving a mountain car and a ball batting task. These experiments demonstrate that VBRL yields a higher average return and outperforms the RWR. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

6. Evaluating the choice of radial basis functions in multiobjective optimal control applications.

Author: Zatarain Salazar, Jazmin, Kwakkel, Jan H., and Witvliet, Mark
Subjects: *RADIAL basis functions, *EVOLUTIONARY algorithms, *WATER management, *WATER power
Abstract: Evolutionary Multi-Objective Direct Policy Search (EMODPS) is a prominent framework for designing control policies in multi-purpose environmental systems, combining direct policy search with multi-objective evolutionary algorithms (MOEAs) to identify Pareto approximate control policies. While EMODPS is effective, the choice of functions within its global approximator networks remains underexplored, despite their potential to significantly influence both solution quality and MOEA performance. This study conducts a rigorous assessment of a suite of Radial Basis Functions (RBFs) as candidates for these networks. We critically evaluate their ability to map system states to control actions, and assess their influence on Pareto efficient control policies. We apply this analysis to two contrasting case studies: the Conowingo Reservoir System, which balances competing water demands including hydropower, environmental flows, urban supply, power plant cooling, and recreation; and The Shallow Lake Problem, where a city navigates the trade-off between environmental and economic objectives when releasing anthropogenic phosphorus. Our findings reveal that the choice of RBF functions substantially impacts model outcomes. In complex scenarios like multi-objective reservoir control, this choice is critical, while in simpler contexts, such as the Shallow Lake Problem, the influence is less pronounced, though distinctive differences emerge in the characteristics of the prescribed control strategies.
• RBF choice in EMODPS impacts tradeoffs and policies in multiobjective control.
• Lake Problem: RBFs affect control policies, not objective values.
• Concave RBFs excel in complex EMODPS, like Conowingo Reservoir.
[ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

7. Using direct policy search to identify robust strategies in adapting to uncertain sea-level rise and storm surge.

Author: Garner, Gregory G. and Keller, Klaus
Subjects: *SEA level, *STORM surges, *COASTAL ecology, *CLIMATE change, *INFRASTRUCTURE (Economics)
Abstract: Sea-level rise poses considerable risks to coastal communities, ecosystems, and infrastructure. Decision makers are faced with uncertain sea-level projections when designing a strategy for coastal adaptation. The traditional methods are often silent on tradeoffs as well as the effects of tail-area events and of potential future learning. Here we reformulate a simple sea-level rise adaptation model to address these concerns. We show that Direct Policy Search yields improved solution quality, with respect to Pareto-dominance in the objectives, over the traditional approach under uncertain sea-level rise projections and storm surge. Additionally, the new formulation produces high quality solutions with lower computational demands than an intertemporal optimization approach. Our results illustrate the utility of multi-objective adaptive formulations for the example of coastal adaptation and point to wider-ranging application in climate change adaptation decision problems. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

8. Exploring global approximators for multiobjective reservoir control

Author: Zatarain Salazar, J., Kwakkel, J.H., and Witvliet, Mark
Abstract: Efficient multi-purpose reservoir control policies are crucial in the face of frequent and severe floods and droughts, and to balance water allocation across conflicting demands. Evolutionary Multi-Objective Direct Policy Search (EMODPS) is a popular approach to design control policies for multi-purpose reservoir systems. EMODPS, however, relies on experimental choices within the key components of the framework, particularly when coupling multi-objective evolutionary optimization with nonlinear approximation networks. This study explores a suite of radial basis functions (RBFs) used to map the system's states to control actions in a flexible manner as time-varying, non-linear relationships. We provide a systematic assessment of different RBF functions to explore their suitability to obtain Pareto efficient control policies. We use the Susquehanna river basin case study in which competing water demands for hydropower, environment, urban water supply, atomic power plant cooling and recreation need to be met. Our findings suggest that the choice of RBF functions has a large impact on the model outcomes and the search behavior of the optimization algorithm. [Policy Analysis]
Published: 2022
Full Text: View/download PDF

9. Direct policy search for robust multi-objective management of deeply uncertain socio-ecological tipping points.

Author: Quinn, Julianne D., Reed, Patrick M., and Keller, Klaus
Subjects: *ENVIRONMENTAL policy, *ECOSYSTEMS, *SOCIAL systems, *PHOSPHORUS, *POLLUTION control industry, *DECISION making
Abstract: Managing socio-ecological systems is a challenge wrought by competing societal objectives, deep uncertainties, and potentially irreversible tipping points. A classic, didactic example is the shallow lake problem in which a hypothetical town situated on a lake must develop pollution control strategies to maximize its economic benefits while minimizing the probability of the lake crossing a critical phosphorus (P) threshold, above which it irreversibly transitions into a eutrophic state. Here, we explore the use of direct policy search (DPS) to design robust pollution control rules for the town that account for deeply uncertain system characteristics and conflicting objectives. The closed loop control formulation of DPS improves the quality and robustness of key management tradeoffs, while dramatically reducing the computational complexity of solving the multi-objective pollution control problem relative to open loop control strategies. These insights suggest DPS is a promising tool for managing socio-ecological systems with deeply uncertain tipping points. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

10. Connectivity of the Feasible and Sublevel Sets of Dynamic Output Feedback Control with Robustness Constraints

Author: Bin Hu and Yang Zheng
Subjects: direct policy search, Control and Optimization, Control and Systems Engineering, Optimization and Control (math.OC), LQG control, H-infinity control, FOS: Mathematics, Optimization landscape, sublevel set, Mathematics - Optimization and Control
Abstract: This paper considers the optimization landscape of linear dynamic output feedback control with $\mathcal{H}_\infty$ robustness constraints. We consider the feasible set of all the stabilizing full-order dynamical controllers that satisfy an additional $\mathcal{H}_\infty$ robustness constraint. We show that this $\mathcal{H}_\infty$-constrained set has at most two path-connected components that are diffeomorphic under a mapping defined by a similarity transformation. Our proof technique utilizes a classical change of variables in $\mathcal{H}_\infty$ control to establish a surjective mapping from a set with a convex projection to the $\mathcal{H}_\infty$-constrained set. This proof idea can also be used to establish the same topological properties of strict sublevel sets of linear quadratic Gaussian (LQG) control and optimal $\mathcal{H}_\infty$ control. Our results bring positive news for gradient-based policy search on robust control problems.
Comment: Submitted to L-CSS and CDC 2022
Published: 2022
Full Text: View/download PDF

11. Exploring global approximators for multiobjective reservoir control

Author: Jazmin Zatarain Salazar, Jan Kwakkel, and Mark Witvliet
Subjects: direct policy search, Control and Systems Engineering, global approximators, Optimal operation of water resources systems
Abstract: Efficient multi-purpose reservoir control policies are crucial in the face of frequent and severe floods and droughts, and to balance water allocation across conflicting demands. Evolutionary Multi-Objective Direct Policy Search (EMODPS) is a popular approach to design control policies for multi-purpose reservoir systems. EMODPS, however, relies on experimental choices within the key components of the framework, particularly when coupling multi-objective evolutionary optimization with nonlinear approximation networks. This study explores a suite of radial basis functions (RBFs) used to map the system's states to control actions in a flexible manner as time-varying, non-linear relationships. We provide a systematic assessment of different RBF functions to explore their suitability to obtain Pareto efficient control policies. We use the Susquehanna river basin case study in which competing water demands for hydropower, environment, urban water supply, atomic power plant cooling and recreation need to be met. Our findings suggest that the choice of RBF functions has a large impact on the model outcomes and the search behavior of the optimization algorithm.
Published: 2022

12. Season-Dependent Hedging Policies for Reservoir Operation—A Comparison Study

Author: Nikhil Bhatia, Roshan Srivastav, and Kasthrirengan Srinivasan
Subjects: parameterization, simulation, optimization, direct policy search, hedging policy, shortage ratio, vulnerability, NSGA-II, Hydraulic engineering, TC1-978, Water supply for domestic and industrial purposes, TD201-500
Abstract: During periods of significant water shortage or when drought is impending, it is customary to implement some kind of water supply reduction measures with a view to preventing the occurrence of severe shortages (vulnerability) in the near future. In the case of operation of a water supply reservoir, this reduction of water supply is effected by hedging schemes or hedging policies. This research work aims to compare the popular hedging policies: (i) linear two-point hedging; (ii) modified two-point hedging; and (iii) discrete hedging, based on time-varying and constant hedging parameters. A parameterization-simulation-optimization (PSO) framework is employed for the selection of the parameters of the compromising hedging policies. The multi-objective evolutionary search-based technique (Non-dominated Sorting based Genetic Algorithm-II) was used to identify the Pareto-optimal front of hedging policies that seek to obtain the trade-off between shortage ratio and vulnerability. The case example used for illustration is the Hemavathy reservoir in Karnataka, India. It is observed that the Pareto-optimal front obtained from time-varying hedging policies shows significant improvement in reservoir performance when compared to constant hedging policies. The variation in the monthly parameters of the time-variant hedging policies shows a strong correlation with monthly inflows and available water.
Published: 2018
Full Text: View/download PDF

13. Distribution of waiting time for dynamic pickup and delivery problems.

Author: Vonolfen, Stefan and Affenzeller, Michael
Subjects: *DISTRIBUTION (Probability theory), *EXPRESS service (Delivery of goods), *PASSENGERS, *HEURISTIC algorithms, *SIMULATION methods & models
Abstract: Pickup and delivery problems have numerous applications in practice such as parcel delivery and passenger transportation. In the dynamic variant of the problem, not all information is available in advance but is revealed during the planning process. Thus, it is crucial to anticipate future events in order to generate high-quality solutions. Previous work has shown that the use of waiting strategies has the potential to save costs and maximize service quality. We adapt various waiting heuristics to the pickup and delivery problem with time windows. Previous research has shown that specialized waiting heuristics utilizing anticipatory knowledge potentially outperform general heuristics. Direct policy search based on evolutionary computation and a simulation model is proposed as a methodology to automatically specialize waiting strategies to different problem characteristics. Based on the strengths of the previously introduced waiting strategies, we propose a novel waiting heuristic that can utilize historical request information based on an intensity measure which does not require an additional data preprocessing step. The performance of the waiting heuristics is evaluated on a single set of benchmark instances containing various instance classes that differ in terms of spatial and temporal properties. The diverse set of benchmark instances is used to analyze the influence of spatial and temporal instance properties as well as the degree of dynamism on the potential savings that can be achieved by anticipatory waiting and the incorporation of knowledge about future requests. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

14. A genetic fuzzy system for interpretable and parsimonious reinforcement learning policies

Author: Chicano, Francisco, Bishop, Jordan T., Gallagher, Marcus, and Browne, Will N.
Abstract: Reinforcement learning (RL) is experiencing a resurgence in research interest, where Learning Classifier Systems (LCSs) have been applied for many years. However, traditional Michigan approaches tend to evolve large rule bases that are difficult to interpret or scale to domains beyond standard mazes. A Pittsburgh Genetic Fuzzy System (dubbed Fuzzy MoCoCo) is proposed that utilises both multiobjective and cooperative coevolutionary mechanisms to evolve fuzzy rule-based policies for RL environments. Multiobjectivity in the system is concerned with policy performance vs. complexity. The continuous state RL environment Mountain Car is used as a testing bed for the proposed system. Results show the system is able to effectively explore the trade-off between policy performance and complexity, and learn interpretable, high-performing policies that use as few rules as possible.
Published: 2021

15. Multiobjective direct policy search using physically based operating rules in multireservoir systems

Author: Universitat Politècnica de Catalunya. Doctorat en Enginyeria Civil, Universitat Politècnica de Catalunya. CRAHI - Centre de Recerca Aplicada en Hidrometeorologia, Ritter, Josias Manuel Gisbert, Corzo, Gerald, Solomatine, Dimitri P., and Angarita, Héctor
Abstract: This study explores the ways to introduce physical interpretability into the process of optimizing operating rules for multireservoir systems with multiple objectives. Prior studies applied the concept of direct policy search (DPS), in which the release policy is expressed as a set of parameterized functions (e.g., neural networks) that are optimized by simulating the performance of different parameter value combinations over a testing period. The problem with this approach is that the operators generally avoid adopting such artificial black-box functions for the direct real-time control of their systems, preferring simpler tools with a clear connection to the system physics. This study addresses this mismatch by replacing the black-box functions in DPS with physically based parameterized operating rules, for example by directly using target levels in dams as decision variables. This leads to results that are physically interpretable and may be more acceptable to operators. The methodology proposed in this work is applied to a network of five reservoirs and four power plants in the Nechi catchment in Colombia, with four interests involved: average energy generation, firm energy generation, flood hazard, and flow regime alteration. The release policy is expressed depending on only 12 parameters, which significantly reduces the computational complexity compared to existing approaches of multiobjective DPS. The resulting four-dimensional Pareto-approximate set offers a variety of operational strategies from which operators may choose one that corresponds best to their preferences. For demonstration purposes, one particular optimized policy is selected and its parameter values are analyzed to illustrate how the physically based operating rules can be directly interpreted by the operators. [Peer Reviewed, Preprint]
Published: 2020

16. Adaptive mitigation strategies hedge against extreme climate futures

Author: Giacomo Marangoni, Patrick M. Reed, Jonathan R. Lamontagne, Klaus Keller, and J. Quinn
Subjects: Atmospheric Science, Adaptive strategies, 010504 meteorology & atmospheric sciences, Adaptive mitigation pathways, Climate change, Context (language use), Climate risk management, Integrated assessment modelling, Multi-objective optimization, Direct policy search, 01 natural sciences, 12. Responsible consumption, United Nations Framework Convention on Climate Change, 0502 economics and business, 11. Sustainability, 050207 economics, 0105 earth and related environmental sciences, Sustainable development, Global and Planetary Change, 05 social sciences, 1. No poverty, Environmental economics, 13. Climate action, Business, Futures contract
Abstract: The United Nations Framework Convention on Climate Change agreed to “strengthen the global response to the threat of climate change, in the context of sustainable development and efforts to eradicate poverty” (UNFCCC 2015). Designing a global mitigation strategy to support this goal poses formidable challenges. For one, there are trade-offs between the economic costs and the environmental benefits of averting climate impacts. Furthermore, the coupled human-Earth systems are subject to deep and dynamic uncertainties. Previous economic analyses typically addressed either the former, introducing multiple objectives, or the latter, making mitigation actions responsive to new information. This paper aims at bridging these two separate strands of literature. We demonstrate how information feedback from observed global temperature changes can jointly improve the economic and environmental performance of mitigation strategies. We focus on strategies that maximize discounted expected utility while also minimizing warming above 2 °C, damage costs, and mitigation costs. Expanding on the Dynamic Integrated Climate-Economy (DICE) model and previous multi-objective efforts, we implement closed-loop control strategies, map the emerging trade-offs and quantify the value of the temperature information feedback under both well-characterized and deep climate uncertainties. Adaptive strategies strongly reduce high regrets, guarding against mitigation overspending for less sensitive climate futures, and excessive warming for more sensitive ones.
Published: 2021

17. Reinforcement Learning with Rare Significant Events: Direct Policy Search vs. Gradient Policy Search

Author: Nicolas Fontbonne, Jean-Baptiste André, Paul Ecoffet, Nicolas Bredeche, Sorbonne Université (SU), Institut Jean-Nicod (IJN), Département d'Etudes Cognitives - ENS Paris (DEC), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL), École des hautes études en sciences sociales (EHESS), Collège de France (CdF (institution)), Centre National de la Recherche Scientifique (CNRS), and Département de Philosophie - ENS Paris
Subjects: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], PPO, Computer science, gradient policy search, Evolutionary algorithm, [INFO.INFO-NE] Computer Science [cs]/Neural and Evolutionary Computing [cs.NE], 0102 computer and information sciences, 02 engineering and technology, Machine learning, 01 natural sciences, Task (project management), continuous state and action spaces, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, evolutionary algorithms, ComputingMilieux_MISCELLANEOUS, CMAES, rare significant events, on-policy, direct policy search, 010201 computation theory & mathematics, 020201 artificial intelligence & image processing, Artificial intelligence, on-line
Abstract: This paper shows that the CMAES direct policy search method fares significantly better than PPO gradient policy search for a reinforcement learning task where significant events are rare.
Published: 2021

18. Policy Search with Rare Significant Events: Choosing the Right Partner to Cooperate with

Author: Paul Ecoffet, Nicolas Fontbonne, Jean-Baptiste André, Nicolas Bredeche, Sorbonne Université (SU), Institut Jean-Nicod (IJN), Département d'Etudes Cognitives - ENS Paris (DEC), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL), École des hautes études en sciences sociales (EHESS), Collège de France (CdF (institution)), Centre National de la Recherche Scientifique (CNRS), and Département de Philosophie - ENS Paris
Subjects: I.2, FOS: Computer and information sciences, Computer Science - Machine Learning, reinforcement learning, PPO, Computer Science - Artificial Intelligence, gradient policy search, [INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE], [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Machine Learning (cs.LG), continuous state and action spaces, Reward, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Neural and Evolutionary Computing (cs.NE), evolutionary algorithms, CMAES, rare significant events, Multidisciplinary, I.2.6, Computer Science - Neural and Evolutionary Computing, on-policy, Policy, direct policy search, Artificial Intelligence (cs.AI), cooperation and partner choice, Reinforcement, Psychology, Algorithms, on-line
Abstract: This paper focuses on a class of reinforcement learning problems where significant events are rare and limited to a single positive reward per episode. A typical example is that of an agent who has to choose a partner to cooperate with, while a large number of partners are simply not interested in cooperating, regardless of what the agent has to offer. We address this problem in a continuous state and action space with two different kinds of search methods: a gradient policy search method and a direct policy search method using an evolution strategy. We show that when significant events are rare, gradient information is also scarce, making it difficult for policy gradient search methods to find an optimal policy, with or without a deep neural architecture. On the other hand, we show that direct policy search methods are invariant to the rarity of significant events, which is yet another confirmation of the unique role evolutionary algorithms have to play as a reinforcement learning method.
Published: 2021
Full Text: View/download PDF

19. Multiobjective Direct Policy Search Using Physically Based Operating Rules in Multireservoir Systems

Author: Gerald Corzo, Josias Ritter, Hector Angarita, Dimitri Solomatine, Universitat Politècnica de Catalunya. Doctorat en Enginyeria Civil, and Universitat Politècnica de Catalunya. CRAHI - Centre de Recerca Aplicada en Hidrometeorologia
Subjects: Mathematical optimization, Physics - Physics and Society, Computer science, Process (engineering), Geography, Planning and Development, FOS: Physical sciences, Physics and Society (physics.soc-ph), Management, Monitoring, Policy and Law, Parameterization simulation optimization, Physics - Atmospheric and Oceanic Physics, Atmospheric and Oceanic Physics (physics.ao-ph), Rivers--Regulation, Policy myopia, Direct policy search, Cursos d'aigua -- Regulació -- Models matemàtics, Multiobjective reservoir optimization, Multireservoir systems, Enginyeria civil::Enginyeria hidràulica, marítima i sanitària::Embassaments i preses [Àrees temàtiques de la UPC], Water Science and Technology, Civil and Structural Engineering, Interpretability
Abstract: This study explores the ways to introduce physical interpretability into the process of optimizing operating rules for multireservoir systems with multiple objectives. Prior studies applied the concept of direct policy search (DPS), in which the release policy is expressed as a set of parameterized functions (e.g., neural networks) that are optimized by simulating the performance of different parameter value combinations over a testing period. The problem with this approach is that the operators generally avoid adopting such artificial black-box functions for the direct real-time control of their systems, preferring simpler tools with a clear connection to the system physics. This study addresses this mismatch by replacing the black-box functions in DPS with physically based parameterized operating rules, for example by directly using target levels in dams as decision variables. This leads to results that are physically interpretable and may be more acceptable to operators. The methodology proposed in this work is applied to a network of five reservoirs and four power plants in the Nechi catchment in Colombia, with four interests involved: average energy generation, firm energy generation, flood hazard, and flow regime alteration. The release policy is expressed depending on only 12 parameters, which significantly reduces the computational complexity compared to existing approaches of multiobjective DPS. The resulting four-dimensional Pareto-approximate set offers a variety of operational strategies from which operators may choose one that corresponds best to their preferences. For demonstration purposes, one particular optimized policy is selected and its parameter values are analyzed to illustrate how the physically based operating rules can be directly interpreted by the operators.
Published: 2020
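
Illustrative note on record 19: the abstract describes using a target level as the decision variable and scoring each candidate by simulation. The sketch below is one minimal way to picture that idea, not the authors' model. The single reservoir, the capacity and release limits, the inflow series, and the names simulate, target and candidates are all hypothetical.

```python
def simulate(target, inflows, capacity=100.0, max_release=15.0, start=50.0):
    """Simulate one hypothetical reservoir; return (total release, total spill)."""
    storage, released, spilled = start, 0.0, 0.0
    for q in inflows:
        storage += q
        # Physically based rule: draw storage down toward the target level,
        # limited by the maximum release per period.
        release = min(max(storage - target, 0.0), max_release)
        storage -= release
        # Water above capacity cannot be stored and is counted as spill.
        spill = max(storage - capacity, 0.0)
        storage -= spill
        released += release
        spilled += spill
    return released, spilled


inflows = [5, 30, 40, 10, 0, 0, 20, 35, 5, 0]  # hypothetical inflow series
candidates = [20.0, 40.0, 60.0, 80.0]          # hypothetical target levels

# Brute-force policy search: simulate every candidate, keep the least spill.
best_target = min(candidates, key=lambda t: simulate(t, inflows)[1])
```

A real study would replace the brute-force loop with a multiobjective optimizer and score several objectives at once; the single spill objective here only keeps the example short.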

20. Uncertainty-Driven Policies for Resource Allocation in Epidemics Response

Author: den Brok, Emma
Abstract: Humanitarians and global health actors come to the aid of many people every year, with the aim of preventing disease, increasing wellbeing, and providing (medical) aid to those suffering from disease. One of the contexts in which they operate is that of an epidemic. An epidemic is dynamic by nature and provides a complex and evolving environment in which medical aid needs to be provided. A key aspect in a response to an epidemic is logistics – specifically the allocation of resources such as personnel and medical supplies. These resources are often limited, calling for a targeted and strategic response. There is a variety of studies tackling the problem of resource allocation in the context of an epidemic, which include sequential decisions as the epidemic evolves, as well as the choice between several locations to which resources can be sent. However, these studies often assume decision-makers have complete information on the situation at hand and can make “perfect” choices. In reality, due to the large number of actors involved in a response, poor (telecommunication) infrastructure, and the fact that an epidemic is a moving target due to its dynamic nature, decision-makers often have to deal with incomplete and uncertain information on the number of patients and the way the epidemic is evolving. Engineering and Policy Analysis.
Published: 2019

21. Cross-Entropy Optimization of Control Policies With Adaptive Basis Functions.

Author: Busoniu, Lucian, Ernst, Damien, De Schutter, Bart, and Babuska, Robert
Subjects: *CROSS-entropy method, *MATHEMATICAL optimization, *APPROXIMATION theory, *MONTE Carlo method, *MARKOV processes, *DECISION making, *RADIAL basis functions, *SIMULATION methods & models, *COMPUTATIONAL complexity
Abstract: This paper introduces an algorithm for direct search of control policies in continuous-state discrete-action Markov decision processes. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions (BFs), where a discrete action is assigned to each BF. The type of the BFs and their number are specified in advance and determine the complexity of the representation. Considerable flexibility is achieved by optimizing the locations and shapes of the BFs, together with the action assignments. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. The return for each representative state is estimated using Monte Carlo simulations. The resulting algorithm for cross-entropy policy search with adaptive BFs is extensively evaluated in problems with two to six state variables, for which it reliably obtains good policies with only a small number of BFs. In these experiments, cross-entropy policy search requires vastly fewer BFs than value-function techniques with equidistant BFs, and outperforms policy search with a competing optimization algorithm called DIRECT. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF
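
Illustrative note on record 21: the abstract says policies are optimized with the cross-entropy method. The sketch below shows the generic cross-entropy loop (sample, keep an elite set, refit the sampling distribution) on a continuous parameter vector. It is an assumption-laden toy, not the paper's algorithm: the diagonal Gaussian, the population sizes, and toy_return (a stand-in for the empirical return the paper estimates by Monte Carlo simulation) are all hypothetical.

```python
import random
import statistics


def cross_entropy_search(score, dim, iters=30, pop=50, elite=10, seed=0):
    """Maximize score(theta) with a diagonal-Gaussian cross-entropy method."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [2.0] * dim
    for _ in range(iters):
        # Sample candidate parameter vectors from the current distribution.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(pop)]
        # Keep the best-scoring candidates (the elite set).
        samples.sort(key=score, reverse=True)
        best = samples[:elite]
        # Refit the sampling distribution to the elite set.
        for j in range(dim):
            column = [b[j] for b in best]
            mean[j] = statistics.fmean(column)
            std[j] = statistics.pstdev(column) + 1e-3  # floor avoids collapse
    return mean


def toy_return(theta):
    # Hypothetical stand-in for a policy's empirical return; peak at (1, -2).
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)


best_theta = cross_entropy_search(toy_return, dim=2)
```

The paper's policies also carry discrete action assignments per basis function, which would need a categorical sampling distribution alongside the Gaussian one; that part is omitted here.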

22. A diagnostic assessment of evolutionary algorithms for multi-objective surface water reservoir control

Author: Jonathan D. Herman, Jazmin Zatarain Salazar, Matteo Giuliani, Andrea Castelletti, and Patrick M. Reed
Subjects: Engineering, Mathematical optimization, Multi-objective evolutionary algorithm, 010504 meteorology & atmospheric sciences, business.industry, Management science, Suite, Reliability (computer networking), 0208 environmental biotechnology, Evolutionary algorithm, Pareto principle, Parameterized complexity, 02 engineering and technology, Benchmarking, Multi-purpose reservoir control, Benchmark, 01 natural sciences, Direct policy search, 020801 environmental engineering, Benchmark (computing), Key (cryptography), business, 0105 earth and related environmental sciences, Water Science and Technology
Abstract: Globally, the pressures of expanding populations, climate change, and increased energy demands are motivating significant investments in re-operationalizing existing reservoirs or designing operating policies for new ones. These challenges require an understanding of the tradeoffs that emerge across the complex suite of multi-sector demands in river basin systems. This study benchmarks our current capabilities to use Evolutionary Multi-Objective Direct Policy Search (EMODPS), a decision analytic framework in which reservoirs’ candidate operating policies are represented using parameterized global approximators (e.g., radial basis functions), and those parameterized functions are then optimized using multi-objective evolutionary algorithms to discover the Pareto approximate operating policies. We contribute a comprehensive diagnostic assessment of modern MOEAs’ abilities to support EMODPS using the Conowingo reservoir in the Lower Susquehanna River Basin, Pennsylvania, USA. Our diagnostic results highlight that EMODPS can be very challenging for some modern MOEAs and that epsilon dominance, time-continuation, and auto-adaptive search are helpful for attaining high levels of performance. The ϵ-MOEA, the auto-adaptive Borg MOEA, and ϵ-NSGAII all yielded superior results for the six-objective Lower Susquehanna benchmarking test case. The top algorithms show low sensitivity to different MOEA parameterization choices and high algorithmic reliability in attaining consistent results for different random MOEA trials. Overall, EMODPS is a promising method for discovering key reservoir management tradeoffs; however, algorithmic choice remains a key concern for problems of increasing complexity.
Published: 2016
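
Illustrative note on record 22: the abstract mentions representing operating policies with radial basis functions. The sketch below assumes one common form, a weighted sum of Gaussian basis functions over the policy inputs; the abstract does not give the exact formulation, and the names rbf_policy, centers, radii and weights are hypothetical. In an EMODPS setting, these centers, radii and weights would be the decision variables handed to the evolutionary algorithm.

```python
import math


def rbf_policy(inputs, centers, radii, weights):
    """Decision as a weighted sum of Gaussian radial basis functions.

    inputs:  policy inputs, e.g. normalized storage and time of year
    centers: one center vector per basis function
    radii:   one radius vector per basis function (all entries > 0)
    weights: one output weight per basis function
    """
    total = 0.0
    for center, radius, weight in zip(centers, radii, weights):
        # Squared distance to the center, scaled per input by the radius.
        dist = sum(((x - c) / r) ** 2
                   for x, c, r in zip(inputs, center, radius))
        total += weight * math.exp(-dist)
    return total


# Hypothetical two-input policy with two basis functions.
decision = rbf_policy(
    inputs=[0.5, 0.2],
    centers=[[0.5, 0.2], [0.9, 0.8]],
    radii=[[1.0, 1.0], [0.5, 0.5]],
    weights=[0.7, 0.3],
)
```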

23. Exemplar-Based Policy with Selectable Strategies and its Optimization Using GA

Subjects: exemplar, direct policy search, genetic algorithm, case based reasoning, Markov decision process
Abstract: As an approach to dynamic control and decision-making problems, usually formulated as Markov Decision Processes (MDPs), we focus on direct policy search (DPS), where a policy is represented by a model with parameters, and the parameters are optimized so as to maximize the evaluation function by applying the parameterized policy to the problem. In this paper, a novel framework for DPS, exemplar-based policy optimization using a genetic algorithm (EBP-GA), is presented and analyzed. In this approach, the policy is composed of a set of virtual exemplars and a case-based action selector, and the set of exemplars is selected and evolved by a genetic algorithm. Here, an exemplar is a real or virtual, free-form and suggestive piece of information such as "take the action A at the state S" or "the state S1 is better to attain than S2". One advantage of EBP-GA is the generalization and localization ability for policy expression, based on case-based reasoning methods. Another advantage is that both the introduction of prior knowledge and the extraction of knowledge after optimization are relatively straightforward. These advantages are confirmed through the proposal of two new policy expressions, experiments on two different problems and their analysis.
Published: 2010
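
Illustrative note on record 23: the abstract describes a policy made of exemplars plus a case-based action selector. The sketch below assumes the simplest possible selector, a nearest-neighbour lookup over state-action exemplars; the abstract does not specify the selector actually used, and select_action and the sample exemplars are hypothetical.

```python
import math


def select_action(state, exemplars):
    """Return the action attached to the exemplar nearest to `state`.

    exemplars: list of (state_vector, action) pairs, i.e. statements of the
    form "take the action A at the state S".
    """
    nearest = min(exemplars, key=lambda pair: math.dist(state, pair[0]))
    return nearest[1]


# Hypothetical exemplar set; a genetic algorithm would evolve these pairs.
exemplars = [((0.0, 0.0), "left"), ((1.0, 1.0), "right")]
action = select_action((0.9, 0.8), exemplars)
```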

24. Season-Dependent Hedging Policies for Reservoir Operation—A Comparison Study.

Author: Bhatia, Nikhil, Srivastav, Roshan, and Srinivasan, Kasthrirengan
Subjects: RESERVOIRS, WATER supply, WATER shortages, PARAMETERIZATION, WATER management
Abstract: During periods of significant water shortage or when drought is impending, it is customary to implement some kind of water supply reduction measures with a view to preventing the occurrence of severe shortages (vulnerability) in the near future. In the case of operation of a water supply reservoir, this reduction of water supply is effected by hedging schemes or hedging policies. This research work aims to compare the popular hedging policies: (i) linear two-point hedging; (ii) modified two-point hedging; and (iii) discrete hedging, based on time-varying and constant hedging parameters. A parameterization-simulation-optimization (PSO) framework is employed for the selection of the parameters of the compromising hedging policies. The multi-objective evolutionary search-based technique (Non-dominated Sorting based Genetic Algorithm-II) was used to identify the Pareto-optimal front of hedging policies that seek to obtain the trade-off between shortage ratio and vulnerability. The case example used for illustration is the Hemavathy reservoir in Karnataka, India. It is observed that the Pareto-optimal front that was obtained from time-varying hedging policies shows significant improvement in reservoir performance when compared to constant hedging policies. The variation in the monthly parameters of the time-variant hedging policies shows a strong correlation with monthly inflows and available water. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF
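
Illustrative note on record 24: the abstract names linear two-point hedging without defining it. The sketch below assumes one commonly described form of the rule: below a start point all available water is released, above an end point the full demand is met, and in between the release rises linearly. The function name, parameter names and numbers are hypothetical, and spill above reservoir capacity is not modelled.

```python
def two_point_hedging(available, demand, start, end):
    """Release under a linear two-point hedging rule (illustrative form).

    available: storage plus inflow for the period
    demand:    target delivery for the period
    start:     availability at which hedging begins (0 <= start <= demand)
    end:       availability at which full demand is met (end >= demand)
    """
    if available <= start:
        return available  # too little water: release everything
    if available >= end:
        return demand     # enough water: meet the demand in full
    # In between, ration linearly from `start` up to `demand`.
    fraction = (available - start) / (end - start)
    return start + fraction * (demand - start)


# Hypothetical numbers: demand 100, hedging between 20 and 150 units.
rationed = two_point_hedging(available=85.0, demand=100.0, start=20.0, end=150.0)
```

A time-varying policy of the kind compared in the study could hold one (start, end) pair per month instead of a single constant pair.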

25. Multi-optima exploration with adaptive Gaussian mixture model

Author: Calinon, S., Pervez, Affan, and Caldwell, D. G.
Abstract: In learning by exploration problems such as reinforcement learning (RL), direct policy search, stochastic optimization or evolutionary computation, the goal of an agent is to maximize some form of reward function (or minimize a cost function). Often, these algorithms are designed to find a single policy solution. We address the problem of representing the space of control policy solutions by considering exploration as a density estimation problem. Such representation provides additional information such as shape and curvature of local peaks that can be exploited to analyze the discovered solutions and guide the exploration. We show that the search process can easily be generalized to multi-peaked distributions by employing a Gaussian mixture model (GMM) with an adaptive number of components. The GMM has a dual role: representing the space of possible control policies, and guiding the exploration of new policies. A variation of expectation-maximization (EM) applied to reward-weighted policy parameters is presented to model the space of possible solutions, as if this space were a probability distribution. The approach is tested in a dart game experiment formulated as a black-box optimization problem, where the agent's throwing capability increases while it searches for the best strategy to play the game. This experiment is used to study how the proposed approach can exploit new promising solution alternatives in the search process, when the optimality criterion slowly drifts over time. The results show that the proposed multi-optima search approach can anticipate such changes by exploiting promising candidates to smoothly adapt to the change of global optimum.
Published: 2012
Full Text: View/download PDF

26. Exemplar-Based Direct Policy Search with Evolutionary Optimization

Author: Kokolo Ikeda
Subjects: Computer science, business.industry, Evolutionary robotics, Evolutionary algorithm, Interactive evolutionary computation, exemplar based policy, Machine learning, computer.software_genre, evolutionary optimization, Acrobot, Evolutionary computation, Evolutionary acquisition of neural topologies, Human-based evolutionary computation, Evolutionary music, Direct policy search, Q-learning, Artificial intelligence, business, Metaheuristic, computer, Evolutionary programming
Abstract: In this paper, an exemplar-based policy optimization framework for direct policy search is presented. In this exemplar-based approach, the policy to be optimized is composed of a set of exemplars and a case-based action selector. An implementation of this approach using a state-action-based policy representation and an evolutionary algorithm optimizer is shown to provide favorable search performance for two higher-dimensional problems.
Published: 2005

27. Exemplar-Based Direct Policy Search with Evolutionary Optimization

Author: IKEDA, Kokolo
Abstract: In this paper, an exemplar-based policy optimization framework for direct policy search is presented. In this exemplar-based approach, the policy to be optimized is composed of a set of exemplars and a case-based action selector. An implementation of this approach using a state-action-based policy representation and an evolutionary algorithm optimizer is shown to provide favorable search performance for two higher-dimensional problems. Identifier: https://dspace.jaist.ac.jp/dspace/handle/10119/12960
Published: 2005

28. Curses, Tradeoffs, and Scalable Management: Advancing Evolutionary Multiobjective Direct Policy Search to Improve Water Reservoir Operations

Author: Francesca Pianosi, Patrick M. Reed, Andrea Castelletti, Emanuele Mason, and Matteo Giuliani
Subjects: Mathematical optimization, Engineering, 010504 meteorology & atmospheric sciences, Reliability (computer networking), 0208 environmental biotechnology, Geography, Planning and Development, MathematicsofComputing_NUMERICALANALYSIS, Evolutionary algorithm, 02 engineering and technology, Management, Monitoring, Policy and Law, 01 natural sciences, Direct policy search, Multiobjective evolutionary algorithm, Water management, Limit (mathematics), 0105 earth and related environmental sciences, Water Science and Technology, Civil and Structural Engineering, Artificial neural network, business.industry, Stochastic programming, 020801 environmental engineering, Water resources, Scalability, business, Curse of dimensionality
Abstract: Optimal management policies for water reservoir operation are generally designed via stochastic dynamic programming (SDP). Yet, the adoption of SDP in complex real-world problems is challenged by the three curses of dimensionality, modeling, and multiple objectives. These three curses considerably limit SDP’s practical application. Alternatively, this study focuses on the use of evolutionary multiobjective direct policy search (EMODPS), a simulation-based optimization approach that combines direct policy search, nonlinear approximating networks, and multiobjective evolutionary algorithms to design Pareto-approximate closed-loop operating policies for multipurpose water reservoirs. This analysis explores the technical and practical implications of using EMODPS through a careful diagnostic assessment of the effectiveness and reliability of the overall EMODPS solution design as well as of the resulting Pareto-approximate operating policies. The EMODPS approach is evaluated using the multipurpose Hoa Binh water reservoir in Vietnam, where water operators are seeking to balance the conflicting objectives of maximizing hydropower production and minimizing flood risks. A key choice in the EMODPS approach is the selection of alternative formulations for flexibly representing reservoir operating policies. This study distinguishes between the relative performance of two widely-used nonlinear approximating networks, namely artificial neural networks (ANNs) and radial basis functions (RBFs). The results show that RBF solutions are more effective than ANN ones in designing Pareto approximate policies for the Hoa Binh reservoir. Given the approximate nature of EMODPS, the diagnostic benchmarking uses SDP to evaluate the overall quality of the attained Pareto-approximate results. Although the Hoa Binh test case’s relative simplicity should maximize the potential value of SDP, the results demonstrate that EMODPS successfully dominates the solutions derived via SDP.
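
Illustrative note on record 28: the abstract refers to Pareto-approximate policies and to one solution set dominating another. The sketch below shows the standard Pareto dominance test and a non-dominated filter, assuming every objective is to be minimized. The function names and the sample objective vectors are hypothetical and not taken from the study.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one.

    All objectives are assumed to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    better = any(x < y for x, y in zip(a, b))
    return no_worse and better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


# Hypothetical (flood risk, hydropower deficit) values for four policies.
policies = [(1, 5), (2, 2), (3, 3), (5, 1)]
front = pareto_front(policies)
```

Objectives that are maximized, such as hydropower production, can be negated before the comparison so that the same test applies.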
