28 results on '"Direct policy search"'
Search Results
2. Multi-objective optimal design of interbasin water transfers: The Tagus-Segura aqueduct (Spain)
- Author
Carlotta Valerio, Matteo Giuliani, Andrea Castelletti, Alberto Garrido, and Lucia De Stefano
- Subjects
Interbasin water transfer; Tagus-Segura aqueduct; Multi-objective evolutionary optimization; Direct policy search; Environmental flow; Physical geography; GB3-5030; Geology; QE1-996.5
- Abstract
Study region: The Tagus-Segura aqueduct (TSA) is a large and strategic water transfer scheme in Spain that connects Entrepeñas and Buendía reservoirs in the Tagus river headwaters to the Segura river basin, a highly stressed Mediterranean area. Study focus: The operating rules of the TSA underwent several modifications over the years, and the debate about which are the optimal parameters to meet the interests of the parties involved is still open. We employed Evolutionary Multi-Objective Direct Policy Search to jointly optimize the re-operation of the headwaters dams and the water transfer policy with respect to four conflicting objectives: Tagus and Segura water demands, hydropower production and socioeconomic benefit of the population living on the shores of the headwaters reservoirs. We tested the optimization under the baseline and the 2027 scenario, which foresees an increased environmental flow (EF) in the Tagus river. New hydrological insights for the region: The proposed operating rule presents optimized control parameters, a higher degree of freedom and a transferred volume that cyclically varies according to the hydrological stage of the year. In the 2027 scenario, despite the increased EF, the deficit in the aqueduct shows a limited increase compared to the historical solution (+10%), while the storage deficit is strongly reduced (−73%). This benefits the population living on the reservoirs shores and also ensures more stability to the aqueduct functioning.
- Published
- 2023
3. Quantifying the trade-offs in re-operating dams for the environment in the Lower Volta River.
- Author
Owusu, Afua, Salazar, Jazmin Zatarain, Mul, Marloes, van der Zaag, Pieter, and Slinger, Jill
- Abstract
The construction of the Akosombo and Kpong dams in the Lower Volta River Basin in Ghana changed the downstream riverine ecosystem and affected the lives of downstream communities, particularly those who lost their traditional livelihoods. In contrast to the costs borne by those in the vicinity of the river, Ghana as a whole has enjoyed vast economic benefits from the affordable hydropower, irrigation schemes and lake tourism that developed after construction of the dams. Herein lies the challenge: there exists a trade-off between water for river ecosystems and related services on the one hand, and anthropogenic water demands such as hydropower or irrigation on the other. In this study, an Evolutionary Multi-Objective Direct Policy Search (EMODPS) is used to identify the multi-sectorial trade-offs that exist in the Lower Volta River Basin. Three environmental flows, previously determined for the Lower Volta, are incorporated separately as an environmental objective. The results highlight the dominance of hydropower production in the Lower Volta, but show that there is room for providing environmental flows under current climatic and water use conditions if the firm energy requirement from Akosombo Dam is reduced by 12% to 38%, depending on the environmental flow regime that is implemented. There is uncertainty in climate change effects on runoff in this region; however, multiple scenarios are investigated. It is found that climate change leading to increased annual inflows to the Akosombo Dam reduces the trade-off between hydropower and the environment, while climate change resulting in lower inflows provides the opportunity to strategically provide dry season environmental flows, that is, reduce flows sufficiently to meet low flow requirements for key ecosystem services such as the clam fishery.
This study not only highlights the challenges in balancing anthropogenic water demands and environmental considerations in managing existing dams, but also identifies opportunities for compromise in the Lower Volta River. [ABSTRACT FROM AUTHOR]
- Published
- 2022
4. Evaluating the choice of radial basis functions in multiobjective optimal control applications
- Author
Zatarain Salazar, J. (author), Kwakkel, J.H. (author), and Witvliet, Mark (author)
- Abstract
Evolutionary Multi-Objective Direct Policy Search (EMODPS) is a prominent framework for designing control policies in multi-purpose environmental systems, combining direct policy search with multi-objective evolutionary algorithms (MOEAs) to identify Pareto approximate control policies. While EMODPS is effective, the choice of functions within its global approximator networks remains underexplored, despite their potential to significantly influence both solution quality and MOEA performance. This study conducts a rigorous assessment of a suite of Radial Basis Functions (RBFs) as candidates for these networks. We critically evaluate their ability to map system states to control actions, and assess their influence on Pareto efficient control policies. We apply this analysis to two contrasting case studies: the Conowingo Reservoir System, which balances competing water demands including hydropower, environmental flows, urban supply, power plant cooling, and recreation; and The Shallow Lake Problem, where a city navigates the trade-off between environmental and economic objectives when releasing anthropogenic phosphorus. Our findings reveal that the choice of RBF functions substantially impacts model outcomes. In complex scenarios like multi-objective reservoir control, this choice is critical, while in simpler contexts, such as the Shallow Lake Problem, the influence is less pronounced, though distinctive differences emerge in the characteristics of the prescribed control strategies.
Policy Analysis
- Published
- 2024
5. Direct Policy Search Reinforcement Learning Based on Variational Bayesian Inference.
- Author
Yamaguchi, Nobuhiko
- Subjects
*REINFORCEMENT learning; *BAYESIAN analysis; *COMPUTER algorithms; *PARAMETER estimation; *GOVERNMENT policy
- Abstract
Direct policy search is a promising reinforcement learning framework particularly for controlling continuous, high-dimensional systems. Peters et al. proposed reward-weighted regression (RWR) as a direct policy search. The RWR algorithm estimates the policy parameter based on the expectation-maximization (EM) algorithm and is therefore prone to overfitting. In this study, we focus on variational Bayesian inference to avoid overfitting and propose direct policy search reinforcement learning based on variational Bayesian inference (VBRL). The performance of the proposed VBRL is assessed in several experiments involving a mountain car and a ball batting task. These experiments demonstrate that VBRL yields a higher average return and outperforms the RWR. [ABSTRACT FROM AUTHOR]
- Published
- 2020
6. Evaluating the choice of radial basis functions in multiobjective optimal control applications.
- Author
Zatarain Salazar, Jazmin, Kwakkel, Jan H., and Witvliet, Mark
- Subjects
*RADIAL basis functions; *EVOLUTIONARY algorithms; *WATER management; *WATER power
- Abstract
Evolutionary Multi-Objective Direct Policy Search (EMODPS) is a prominent framework for designing control policies in multi-purpose environmental systems, combining direct policy search with multi-objective evolutionary algorithms (MOEAs) to identify Pareto approximate control policies. While EMODPS is effective, the choice of functions within its global approximator networks remains underexplored, despite their potential to significantly influence both solution quality and MOEA performance. This study conducts a rigorous assessment of a suite of Radial Basis Functions (RBFs) as candidates for these networks. We critically evaluate their ability to map system states to control actions, and assess their influence on Pareto efficient control policies. We apply this analysis to two contrasting case studies: the Conowingo Reservoir System, which balances competing water demands including hydropower, environmental flows, urban supply, power plant cooling, and recreation; and The Shallow Lake Problem, where a city navigates the trade-off between environmental and economic objectives when releasing anthropogenic phosphorus. Our findings reveal that the choice of RBF functions substantially impacts model outcomes. In complex scenarios like multi-objective reservoir control, this choice is critical, while in simpler contexts, such as the Shallow Lake Problem, the influence is less pronounced, though distinctive differences emerge in the characteristics of the prescribed control strategies.
• RBF choice in EMODPS impacts tradeoffs and policies in multiobjective control.
• Lake Problem: RBFs affect control policies, not objective values.
• Concave RBFs excel in complex EMODPS, like Conowingo Reservoir.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
7. Using direct policy search to identify robust strategies in adapting to uncertain sea-level rise and storm surge.
- Author
Garner, Gregory G. and Keller, Klaus
- Subjects
*SEA level; *STORM surges; *COASTAL ecology; *CLIMATE change; *INFRASTRUCTURE (Economics)
- Abstract
Sea-level rise poses considerable risks to coastal communities, ecosystems, and infrastructure. Decision makers are faced with uncertain sea-level projections when designing a strategy for coastal adaptation. The traditional methods are often silent on tradeoffs as well as the effects of tail-area events and of potential future learning. Here we reformulate a simple sea-level rise adaptation model to address these concerns. We show that Direct Policy Search yields improved solution quality, with respect to Pareto-dominance in the objectives, over the traditional approach under uncertain sea-level rise projections and storm surge. Additionally, the new formulation produces high-quality solutions with fewer computational demands than an intertemporal optimization approach. Our results illustrate the utility of multi-objective adaptive formulations for the example of coastal adaptation and point to wider-ranging application in climate change adaptation decision problems. [ABSTRACT FROM AUTHOR]
- Published
- 2018
8. Exploring global approximators for multiobjective reservoir control
- Author
Zatarain Salazar, J. (author), Kwakkel, J.H. (author), and Witvliet, Mark (author)
- Abstract
Efficient multi-purpose reservoir control policies are crucial in the face of frequent and severe floods and droughts, and to balance water allocation across conflicting demands. Evolutionary Multi-Objective Direct Policy Search (EMODPS) is a popular approach to design control policies for multi-purpose reservoir systems. EMODPS, however, relies on experimental choices within the key components of the framework, particularly when coupling multi-objective evolutionary optimization with nonlinear approximation networks. This study explores a suite of radial basis functions (RBFs) used to map the system's states to control actions in a flexible manner as time-varying, non-linear relationships. We provide a systematic assessment of different RBF functions to explore their suitability to obtain Pareto efficient control policies. We use the Susquehanna river basin case study in which competing water demands for hydropower, environment, urban water supply, atomic power plant cooling and recreation need to be met. Our findings suggest that the choice of RBF functions has a large impact on the model outcomes and the search behavior of the optimization algorithm.
Policy Analysis
- Published
- 2022
9. Direct policy search for robust multi-objective management of deeply uncertain socio-ecological tipping points.
- Author
Quinn, Julianne D., Reed, Patrick M., and Keller, Klaus
- Subjects
*ENVIRONMENTAL policy; *ECOSYSTEMS; *SOCIAL systems; *PHOSPHORUS; *POLLUTION control industry; *DECISION making
- Abstract
Managing socio-ecological systems is a challenge wrought by competing societal objectives, deep uncertainties, and potentially irreversible tipping points. A classic, didactic example is the shallow lake problem in which a hypothetical town situated on a lake must develop pollution control strategies to maximize its economic benefits while minimizing the probability of the lake crossing a critical phosphorus (P) threshold, above which it irreversibly transitions into a eutrophic state. Here, we explore the use of direct policy search (DPS) to design robust pollution control rules for the town that account for deeply uncertain system characteristics and conflicting objectives. The closed loop control formulation of DPS improves the quality and robustness of key management tradeoffs, while dramatically reducing the computational complexity of solving the multi-objective pollution control problem relative to open loop control strategies. These insights suggest DPS is a promising tool for managing socio-ecological systems with deeply uncertain tipping points. [ABSTRACT FROM AUTHOR]
- Published
- 2017
10. Connectivity of the Feasible and Sublevel Sets of Dynamic Output Feedback Control with Robustness Constraints
- Author
Bin Hu and Yang Zheng
- Subjects
direct policy search; Control and Optimization; Control and Systems Engineering; Optimization and Control (math.OC); LQG control; H-infinity control; FOS: Mathematics; Optimization landscape; sublevel set; Mathematics - Optimization and Control
- Abstract
This paper considers the optimization landscape of linear dynamic output feedback control with $\mathcal{H}_\infty$ robustness constraints. We consider the feasible set of all the stabilizing full-order dynamical controllers that satisfy an additional $\mathcal{H}_\infty$ robustness constraint. We show that this $\mathcal{H}_\infty$-constrained set has at most two path-connected components that are diffeomorphic under a mapping defined by a similarity transformation. Our proof technique utilizes a classical change of variables in $\mathcal{H}_\infty$ control to establish a surjective mapping from a set with a convex projection to the $\mathcal{H}_\infty$-constrained set. This proof idea can also be used to establish the same topological properties of strict sublevel sets of linear quadratic Gaussian (LQG) control and optimal $\mathcal{H}_\infty$ control. Our results bring positive news for gradient-based policy search on robust control problems.
Comment: Submitted to L-CSS and CDC 2022
- Published
- 2022
11. Exploring global approximators for multiobjective reservoir control
- Author
Jazmin Zatarain Salazar, Jan Kwakkel, and Mark Witvliet
- Subjects
direct policy search; Control and Systems Engineering; global approximators; Optimal operation of water resources systems
- Abstract
Efficient multi-purpose reservoir control policies are crucial in the face of frequent and severe floods and droughts, and to balance water allocation across conflicting demands. Evolutionary Multi-Objective Direct Policy Search (EMODPS) is a popular approach to design control policies for multi-purpose reservoir systems. EMODPS, however, relies on experimental choices within the key components of the framework, particularly when coupling multi-objective evolutionary optimization with nonlinear approximation networks. This study explores a suite of radial basis functions (RBFs) used to map the system's states to control actions in a flexible manner as time-varying, non-linear relationships. We provide a systematic assessment of different RBF functions to explore their suitability to obtain Pareto efficient control policies. We use the Susquehanna river basin case study in which competing water demands for hydropower, environment, urban water supply, atomic power plant cooling and recreation need to be met. Our findings suggest that the choice of RBF functions has a large impact on the model outcomes and the search behavior of the optimization algorithm.
- Published
- 2022
12. Season-Dependent Hedging Policies for Reservoir Operation—A Comparison Study
- Author
Nikhil Bhatia, Roshan Srivastav, and Kasthrirengan Srinivasan
- Subjects
parameterization; simulation; optimization; direct policy search; hedging policy; shortage ratio: Vulnerability; NSGA-II; Hydraulic engineering; TC1-978; Water supply for domestic and industrial purposes; TD201-500
- Abstract
During periods of significant water shortage or when drought is impending, it is customary to implement some kind of water supply reduction measures with a view to preventing the occurrence of severe shortages (vulnerability) in the near future. In the case of operation of a water supply reservoir, this reduction of water supply is effected by hedging schemes or hedging policies. This research work aims to compare the popular hedging policies: (i) linear two-point hedging; (ii) modified two-point hedging; and (iii) discrete hedging based on time-varying and constant hedging parameters. A parameterization-simulation-optimization (PSO) framework is employed for the selection of the parameters of the compromising hedging policies. The multi-objective evolutionary search-based technique (Non-dominated Sorting based Genetic Algorithm-II) was used to identify the Pareto-optimal front of hedging policies that seek to obtain the trade-off between shortage ratio and vulnerability. The case example used for illustration is the Hemavathy reservoir in Karnataka, India. It is observed that the Pareto-optimal front that was obtained from time-varying hedging policies shows significant improvement in reservoir performance when compared to constant hedging policies. The variation in the monthly parameters of the time-variant hedging policies shows a strong correlation with monthly inflows and available water.
- Published
- 2018
13. Distribution of waiting time for dynamic pickup and delivery problems.
- Author
Vonolfen, Stefan and Affenzeller, Michael
- Subjects
*DISTRIBUTION (Probability theory); *EXPRESS service (Delivery of goods); *PASSENGERS; *HEURISTIC algorithms; *SIMULATION methods & models
- Abstract
Pickup and delivery problems have numerous applications in practice such as parcel delivery and passenger transportation. In the dynamic variant of the problem, not all information is available in advance but is revealed during the planning process. Thus, it is crucial to anticipate future events in order to generate high-quality solutions. Previous work has shown that the use of waiting strategies has the potential to save costs and maximize service quality. We adapt various waiting heuristics to the pickup and delivery problem with time windows. Previous research has shown that specialized waiting heuristics utilizing anticipatory knowledge potentially outperform general heuristics. Direct policy search based on evolutionary computation and a simulation model is proposed as a methodology to automatically specialize waiting strategies to different problem characteristics. Based on the strengths of the previously introduced waiting strategies, we propose a novel waiting heuristic that can utilize historical request information based on an intensity measure which does not require an additional data preprocessing step. The performance of the waiting heuristics is evaluated on a single set of benchmark instances containing various instance classes that differ in terms of spatial and temporal properties. The diverse set of benchmark instances is used to analyze the influence of spatial and temporal instance properties as well as the degree of dynamism on the potential savings that can be achieved by anticipatory waiting and the incorporation of knowledge about future requests. [ABSTRACT FROM AUTHOR]
- Published
- 2016
14. A genetic fuzzy system for interpretable and parsimonious reinforcement learning policies
- Author
Chicano, Francisco, Bishop, Jordan T., Gallagher, Marcus, and Browne, Will N.
- Abstract
Reinforcement learning (RL) is experiencing a resurgence in research interest, where Learning Classifier Systems (LCSs) have been applied for many years. However, traditional Michigan approaches tend to evolve large rule bases that are difficult to interpret or scale to domains beyond standard mazes. A Pittsburgh Genetic Fuzzy System (dubbed Fuzzy MoCoCo) is proposed that utilises both multiobjective and cooperative coevolutionary mechanisms to evolve fuzzy rule-based policies for RL environments. Multiobjectivity in the system is concerned with policy performance vs. complexity. The continuous state RL environment Mountain Car is used as a testing bed for the proposed system. Results show the system is able to effectively explore the trade-off between policy performance and complexity, and learn interpretable, high-performing policies that use as few rules as possible.
- Published
- 2021
15. Multiobjective direct policy search using physically based operating rules in multireservoir systems
- Author
Universitat Politècnica de Catalunya. Doctorat en Enginyeria Civil, Universitat Politècnica de Catalunya. CRAHI - Centre de Recerca Aplicada en Hidrometeorologia, Ritter, Josias Manuel Gisbert, Corzo, Gerald, Solomatine, Dimitri P., and Angarita, Héctor
- Abstract
This study explores the ways to introduce physical interpretability into the process of optimizing operating rules for multireservoir systems with multiple objectives. Prior studies applied the concept of direct policy search (DPS), in which the release policy is expressed as a set of parameterized functions (e.g., neural networks) that are optimized by simulating the performance of different parameter value combinations over a testing period. The problem with this approach is that the operators generally avoid adopting such artificial black-box functions for the direct real-time control of their systems, preferring simpler tools with a clear connection to the system physics. This study addresses this mismatch by replacing the black-box functions in DPS with physically based parameterized operating rules, for example by directly using target levels in dams as decision variables. This leads to results that are physically interpretable and may be more acceptable to operators. The methodology proposed in this work is applied to a network of five reservoirs and four power plants in the Nechi catchment in Colombia, with four interests involved: average energy generation, firm energy generation, flood hazard, and flow regime alteration. The release policy is expressed depending on only 12 parameters, which significantly reduces the computational complexity compared to existing approaches of multiobjective DPS. The resulting four-dimensional Pareto-approximate set offers a variety of operational strategies from which operators may choose one that corresponds best to their preferences. For demonstration purposes, one particular optimized policy is selected and its parameter values are analyzed to illustrate how the physically based operating rules can be directly interpreted by the operators.
Peer Reviewed, Preprint
- Published
- 2020
16. Adaptive mitigation strategies hedge against extreme climate futures
- Author
Giacomo Marangoni, Patrick M. Reed, Jonathan R. Lamontagne, Klaus Keller, and J. Quinn
- Subjects
Atmospheric Science; Adaptive strategies; 010504 meteorology & atmospheric sciences; Adaptive mitigation pathways; Climate change; Context (language use); 01 natural sciences; 12. Responsible consumption; United Nations Framework Convention on Climate Change; 0502 economics and business; 11. Sustainability; Direct policy search; 050207 economics; Climate risk management; Integrated assessment modelling; 0105 earth and related environmental sciences; Sustainable development; Global and Planetary Change; 05 social sciences; 1. No poverty; Environmental economics; Multi-objective optimization; 13. Climate action; Business; Futures contract
- Abstract
The United Nations Framework Convention on Climate Change agreed to “strengthen the global response to the threat of climate change, in the context of sustainable development and efforts to eradicate poverty” (UNFCCC 2015). Designing a global mitigation strategy to support this goal poses formidable challenges. For one, there are trade-offs between the economic costs and the environmental benefits of averting climate impacts. Furthermore, the coupled human-Earth systems are subject to deep and dynamic uncertainties. Previous economic analyses typically addressed either the former, introducing multiple objectives, or the latter, making mitigation actions responsive to new information. This paper aims at bridging these two separate strands of literature. We demonstrate how information feedback from observed global temperature changes can jointly improve the economic and environmental performance of mitigation strategies. We focus on strategies that maximize discounted expected utility while also minimizing warming above 2 °C, damage costs, and mitigation costs. Expanding on the Dynamic Integrated Climate-Economy (DICE) model and previous multi-objective efforts, we implement closed-loop control strategies, map the emerging trade-offs and quantify the value of the temperature information feedback under both well-characterized and deep climate uncertainties. Adaptive strategies strongly reduce high regrets, guarding against mitigation overspending for less sensitive climate futures, and excessive warming for more sensitive ones.
- Published
- 2021
17. Reinforcement Learning with Rare Significant Events: Direct Policy Search vs. Gradient Policy Search
- Author
Nicolas Fontbonne, Jean-Baptiste André, Paul Ecoffet, Nicolas Bredeche, Sorbonne Université (SU), Institut Jean-Nicod (IJN), Département d'Etudes Cognitives - ENS Paris (DEC), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École des hautes études en sciences sociales (EHESS)-Collège de France (CdF (institution))-Centre National de la Recherche Scientifique (CNRS)-Département de Philosophie - ENS Paris, and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)
- Subjects
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]; PPO; Computer science; gradient policy search; Evolutionary algorithm; [INFO.INFO-NE] Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]; 0102 computer and information sciences; 02 engineering and technology; Machine learning; computer.software_genre; 01 natural sciences; Task (project management); continuous state and action spaces; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; 0202 electrical engineering, electronic engineering, information engineering; Reinforcement learning; evolutionary algorithms; ComputingMilieux_MISCELLANEOUS; CMAES; rare significant events; business.industry; on-policy; direct policy search; 010201 computation theory & mathematics; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; on-line
- Abstract
This paper shows that the CMAES direct policy search method fares significantly better than PPO gradient policy search for a reinforcement learning task where significant events are rare.
- Published
- 2021
18. Policy Search with Rare Significant Events: Choosing the Right Partner to Cooperate with
- Author
Paul Ecoffet, Nicolas Fontbonne, Jean-Baptiste André, Nicolas Bredeche, Sorbonne Université (SU), Institut Jean-Nicod (IJN), Département d'Etudes Cognitives - ENS Paris (DEC), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École des hautes études en sciences sociales (EHESS)-Collège de France (CdF (institution))-Centre National de la Recherche Scientifique (CNRS)-Département de Philosophie - ENS Paris, and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)
- Subjects
I.2; FOS: Computer and information sciences; Computer Science - Machine Learning; reinforcement learning; PPO; Computer Science - Artificial Intelligence; gradient policy search; [INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]; [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; Machine Learning (cs.LG); continuous state and action spaces; Reward; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; Neural and Evolutionary Computing (cs.NE); evolutionary algorithms; CMAES; rare significant events; Multidisciplinary; I.2.6; Computer Science - Neural and Evolutionary Computing; on-policy; Policy; direct policy search; Artificial Intelligence (cs.AI); cooperation and partner choice; Reinforcement, Psychology; Algorithms; on-line
- Abstract
This paper focuses on a class of reinforcement learning problems where significant events are rare and limited to a single positive reward per episode. A typical example is that of an agent who has to choose a partner to cooperate with, while a large number of partners are simply not interested in cooperating, regardless of what the agent has to offer. We address this problem in a continuous state and action space with two different kinds of search methods: a gradient policy search method and a direct policy search method using an evolution strategy. We show that when significant events are rare, gradient information is also scarce, making it difficult for policy gradient search methods to find an optimal policy, with or without a deep neural architecture. On the other hand, we show that direct policy search methods are invariant to the rarity of significant events, which is yet another confirmation of the unique role evolutionary algorithms have to play as a reinforcement learning method.
- Published
- 2021
19. Multiobjective Direct Policy Search Using Physically Based Operating Rules in Multireservoir Systems
- Author
Gerald Corzo, Josias Ritter, Hector Angarita, Dimitri Solomatine, Universitat Politècnica de Catalunya. Doctorat en Enginyeria Civil, and Universitat Politècnica de Catalunya. CRAHI - Centre de Recerca Aplicada en Hidrometeorologia
- Subjects
Mathematical optimization; Physics - Physics and Society; Computer science; Process (engineering); Geography, Planning and Development; FOS: Physical sciences; Physics and Society (physics.soc-ph); Management, Monitoring, Policy and Law; Parameterization simulation optimization; Physics - Atmospheric and Oceanic Physics; Atmospheric and Oceanic Physics (physics.ao-ph); Rivers--Regulation; Policy myopia; Direct policy search; Cursos d'aigua -- Regulació -- Models matemàtics; Multiobjective reservoir optimization; Multireservoir systems; Enginyeria civil::Enginyeria hidràulica, marítima i sanitària::Embassaments i preses [Àrees temàtiques de la UPC]; Water Science and Technology; Civil and Structural Engineering; Interpretability
- Abstract
This study explores the ways to introduce physical interpretability into the process of optimizing operating rules for multireservoir systems with multiple objectives. Prior studies applied the concept of direct policy search (DPS), in which the release policy is expressed as a set of parameterized functions (e.g., neural networks) that are optimized by simulating the performance of different parameter value combinations over a testing period. The problem with this approach is that the operators generally avoid adopting such artificial black-box functions for the direct real-time control of their systems, preferring simpler tools with a clear connection to the system physics. This study addresses this mismatch by replacing the black-box functions in DPS with physically based parameterized operating rules, for example by directly using target levels in dams as decision variables. This leads to results that are physically interpretable and may be more acceptable to operators. The methodology proposed in this work is applied to a network of five reservoirs and four power plants in the Nechi catchment in Colombia, with four interests involved: average energy generation, firm energy generation, flood hazard, and flow regime alteration. The release policy is expressed depending on only 12 parameters, which significantly reduces the computational complexity compared to existing approaches of multiobjective DPS. The resulting four-dimensional Pareto-approximate set offers a variety of operational strategies from which operators may choose one that corresponds best to their preferences. For demonstration purposes, one particular optimized policy is selected and its parameter values are analyzed to illustrate how the physically based operating rules can be directly interpreted by the operators.
- Published
- 2020
20. Uncertainty-Driven Policies for Resource Allocation in Epidemics Response
- Author
-
den Brok, Emma (author)
- Abstract
Humanitarians and global health actors come to the aid of many people every year, with the aim of preventing disease, increasing wellbeing, and providing (medical) aid to those suffering from disease. One of the contexts in which they operate is that of an epidemic. An epidemic is dynamic by nature and provides a complex and evolving environment in which medical aid needs to be provided. A key aspect in a response to an epidemic is logistics – specifically the allocation of resources such as personnel and medical supplies. These resources are often limited, calling for a targeted and strategic response. There is a variety of studies tackling the problem of resource allocation in the context of an epidemic, which include sequential decisions as the epidemic evolves, as well as the choice between several locations to which resources can be sent. However, these studies often assume decision-makers have complete information on the situation at hand and can make “perfect” choices. In reality, due to the large number of actors involved in a response, poor (telecommunication) infrastructure, and the fact that an epidemic is a moving target due to its dynamic nature, decision-makers often have to deal with incomplete and uncertain information on the number of patients and the way the epidemic is evolving. (Engineering and Policy Analysis)
- Published
- 2019
21. Cross-Entropy Optimization of Control Policies With Adaptive Basis Functions.
- Author
-
Busoniu, Lucian, Ernst, Damien, De Schutter, Bart, and Babuska, Robert
- Subjects
*CROSS-entropy method , *MATHEMATICAL optimization , *APPROXIMATION theory , *MONTE Carlo method , *MARKOV processes , *DECISION making , *RADIAL basis functions , *SIMULATION methods & models , *COMPUTATIONAL complexity - Abstract
This paper introduces an algorithm for direct search of control policies in continuous-state discrete-action Markov decision processes. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions (BFs), where a discrete action is assigned to each BF. The type of the BFs and their number are specified in advance and determine the complexity of the representation. Considerable flexibility is achieved by optimizing the locations and shapes of the BFs, together with the action assignments. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. The return for each representative state is estimated using Monte Carlo simulations. The resulting algorithm for cross-entropy policy search with adaptive BFs is extensively evaluated in problems with two to six state variables, for which it reliably obtains good policies with only a small number of BFs. In these experiments, cross-entropy policy search requires vastly fewer BFs than value-function techniques with equidistant BFs, and outperforms policy search with a competing optimization algorithm called DIRECT. [ABSTRACT FROM AUTHOR]
- Published
- 2011
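The cross-entropy search described in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a hypothetical one-dimensional toy system, two basis functions with fixed action assignments (`ACTIONS`), and a Gaussian sampling distribution over the basis-function centers only. The toy system is deterministic, so each return needs a single rollout instead of Monte Carlo averaging. All names and constants are illustrative.

```python
import random
import statistics

ACTIONS = (+1.0, -1.0)  # hypothetical fixed action assigned to each basis function

def policy(state, centers):
    """Return the action of the basis function whose center is nearest the state."""
    nearest = min(range(len(centers)), key=lambda i: abs(state - centers[i]))
    return ACTIONS[nearest]

def episode_return(x0, centers, steps=20):
    """Simulate the toy system x' = clip(x + 0.2 * a, -2, 2) with reward -x^2."""
    x, total = x0, 0.0
    for _ in range(steps):
        x = max(-2.0, min(2.0, x + 0.2 * policy(x, centers)))
        total += -x * x
    return total

def evaluate(centers, initial_states=(-1.5, -0.5, 0.5, 1.5)):
    """Average return over a representative set of initial states."""
    returns = [episode_return(x0, centers) for x0 in initial_states]
    return sum(returns) / len(returns)

def cross_entropy_search(n_iter=20, pop=50, n_elite=10, seed=0):
    """Sample center vectors, keep the elites, refit the sampling distribution."""
    rng = random.Random(seed)
    mean, std = [0.0, 0.0], [1.0, 1.0]
    for _ in range(n_iter):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop)]
        samples.sort(key=evaluate, reverse=True)
        elites = samples[:n_elite]
        mean = [statistics.mean(col) for col in zip(*elites)]
        # A small floor keeps the sampling distribution from collapsing to a point.
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*elites)]
    return mean

best_centers = cross_entropy_search()
print(best_centers, evaluate(best_centers))
```

In this sketch only the centers are optimized; the abstract describes also optimizing the shapes of the basis functions and the discrete action assignments, which would require extending the sampling distribution accordingly.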
22. A diagnostic assessment of evolutionary algorithms for multi-objective surface water reservoir control
- Author
-
Jonathan D. Herman, Jazmin Zatarain Salazar, Matteo Giuliani, Andrea Castelletti, and Patrick M. Reed
- Subjects
Engineering ,Mathematical optimization ,Multi-objective evolutionary algorithm ,010504 meteorology & atmospheric sciences ,business.industry ,Management science ,Suite ,Reliability (computer networking) ,0208 environmental biotechnology ,Evolutionary algorithm ,Pareto principle ,Parameterized complexity ,02 engineering and technology ,Benchmarking ,Multi-purpose reservoir control ,Benchmark ,01 natural sciences ,Direct policy search ,020801 environmental engineering ,Benchmark (computing) ,Key (cryptography) ,business ,0105 earth and related environmental sciences ,Water Science and Technology - Abstract
Globally, the pressures of expanding populations, climate change, and increased energy demands are motivating significant investments in re-operationalizing existing reservoirs or designing operating policies for new ones. These challenges require an understanding of the tradeoffs that emerge across the complex suite of multi-sector demands in river basin systems. This study benchmarks our current capabilities to use Evolutionary Multi-Objective Direct Policy Search (EMODPS), a decision analytic framework in which reservoirs’ candidate operating policies are represented using parameterized global approximators (e.g., radial basis functions); those parameterized functions are then optimized using multi-objective evolutionary algorithms to discover the Pareto approximate operating policies. We contribute a comprehensive diagnostic assessment of modern MOEAs’ abilities to support EMODPS using the Conowingo reservoir in the Lower Susquehanna River Basin, Pennsylvania, USA. Our diagnostic results highlight that EMODPS can be very challenging for some modern MOEAs and that epsilon dominance, time-continuation, and auto-adaptive search are helpful for attaining high levels of performance. The ϵ-MOEA, the auto-adaptive Borg MOEA, and ϵ-NSGAII all yielded superior results for the six-objective Lower Susquehanna benchmarking test case. The top algorithms show low sensitivity to different MOEA parameterization choices and high algorithmic reliability in attaining consistent results for different random MOEA trials. Overall, EMODPS is a promising method for discovering key reservoir management tradeoffs; however, algorithmic choice remains a key concern for problems of increasing complexity.
- Published
- 2016
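The two ingredients of EMODPS named in the abstract above, a parameterized policy and a multi-objective comparison of simulated outcomes, can be sketched as follows. This is an illustrative toy, not the study's setup: the single-reservoir mass balance, the two objectives, the synthetic inflows, the constant `N_RBF`, and all function names are assumptions, and plain random sampling stands in for the multi-objective evolutionary algorithm.

```python
import math
import random

N_RBF = 3  # hypothetical number of basis functions

def rbf_policy(storage, params):
    """Map normalized storage in [0, 1] to a release in [0, 1] using Gaussian RBFs.
    params holds (center, radius, weight) for each basis function."""
    total = 0.0
    for i in range(N_RBF):
        center, radius, weight = params[3 * i: 3 * i + 3]
        total += weight * math.exp(-((storage - center) / radius) ** 2)
    return max(0.0, min(1.0, total))

def simulate(params, inflows, demand=0.3, storage=0.5):
    """Run a toy reservoir mass balance; return two objectives to minimize:
    (mean supply deficit, mean spill above capacity)."""
    deficit = spill = 0.0
    for inflow in inflows:
        release = min(storage + inflow, rbf_policy(storage, params))
        storage = storage + inflow - release
        if storage > 1.0:  # capacity is normalized to 1
            spill += storage - 1.0
            storage = 1.0
        deficit += max(0.0, demand - release)
    n = len(inflows)
    return (deficit / n, spill / n)

def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

rng = random.Random(0)
inflows = [0.2 + 0.2 * math.sin(2 * math.pi * t / 12) for t in range(120)]

# Random sampling stands in for the evolutionary search here.
candidates = []
for _ in range(200):
    params = []
    for _ in range(N_RBF):
        params += [rng.uniform(0, 1), rng.uniform(0.1, 1), rng.uniform(0, 1)]
    candidates.append((simulate(params, inflows), params))

# Keep only the policies that no other sampled policy dominates.
front = [c for c in candidates
         if not any(dominates(other[0], c[0]) for other in candidates)]
print(len(front), "non-dominated policies out of", len(candidates))
```

A real application would replace the sampling loop with an MOEA such as those benchmarked in the study and would use the actual system model and objectives.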
23. Exemplar-Based Policy with Selectable Strategies and its Optimization Using GA
- Subjects
exemplar ,direct policy search ,genetic algorithm ,case based reasoning ,Markov decision process - Abstract
As an approach for dynamic control problems and decision making problems, usually formulated as Markov Decision Processes (MDPs), we focus on direct policy search (DPS), where a policy is represented by a model with parameters, and the parameters are optimized so as to maximize the evaluation function by applying the parameterized policy to the problem. In this paper, a novel framework for DPS, an exemplar-based policy optimization using genetic algorithm (EBP-GA), is presented and analyzed. In this approach, the policy is composed of a set of virtual exemplars and a case-based action selector, and the set of exemplars is selected and evolved by a genetic algorithm. Here, an exemplar is a real or virtual, free-styled and suggestive piece of information such as "take the action A at the state S" or "the state S1 is better to attain than S2". One advantage of EBP-GA is the generalization and localization ability for policy expression, based on case-based reasoning methods. Another advantage is that both the introduction of prior knowledge and the extraction of knowledge after optimization are relatively straightforward. These advantages are confirmed through the proposal of two new policy expressions, experiments on two different problems and their analysis.
- Published
- 2010
24. Season-Dependent Hedging Policies for Reservoir Operation—A Comparison Study.
- Author
-
Bhatia, Nikhil, Srivastav, Roshan, and Srinivasan, Kasthrirengan
- Subjects
RESERVOIRS ,WATER supply ,WATER shortages ,PARAMETERIZATION ,WATER management - Abstract
During periods of significant water shortage or when drought is impending, it is customary to implement some kind of water supply reduction measures with a view to preventing the occurrence of severe shortages (vulnerability) in the near future. In the case of operation of a water supply reservoir, this reduction of water supply is effected by hedging schemes or hedging policies. This research work aims to compare the popular hedging policies: (i) linear two-point hedging; (ii) modified two-point hedging; and, (iii) discrete hedging based on time-varying and constant hedging parameters. A parameterization-simulation-optimization (PSO) framework is employed for the selection of the parameters of the compromising hedging policies. The multi-objective evolutionary search-based technique (Non-dominated Sorting based Genetic Algorithm-II) was used to identify the Pareto-optimal front of hedging policies that seek to obtain the trade-off between shortage ratio and vulnerability. The case example used for illustration is the Hemavathy reservoir in Karnataka, India. It is observed that the Pareto-optimal front that was obtained from time-varying hedging policies shows significant improvement in reservoir performance when compared to constant hedging policies. The variation in the monthly parameters of the time-variant hedging policies shows a strong correlation with monthly inflows and available water. [ABSTRACT FROM AUTHOR]
- Published
- 2018
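For readers unfamiliar with the linear two-point hedging rule compared in the abstract above, here is a minimal sketch of one common textbook form. It is an assumption that the study uses exactly this form. The parameter names `swa` and `ewa` (starting and ending water availability, the two hedging points) and the numbers in the usage loop are illustrative only.

```python
def two_point_hedging(available, demand, swa, ewa, capacity):
    """Release under a linear two-point hedging rule.

    available: storage plus inflow in the current period
    swa, ewa:  hedging parameters, with 0 <= swa <= demand <= ewa and swa < ewa
    capacity:  reservoir storage capacity
    """
    if not (0 <= swa <= demand <= ewa and swa < ewa):
        raise ValueError("expected 0 <= swa <= demand <= ewa and swa < ewa")
    if available <= swa:
        # Too little water to hedge: release everything that is available.
        return available
    if available < ewa:
        # Hedging zone: the release rises linearly from swa up to the full demand.
        return swa + (demand - swa) * (available - swa) / (ewa - swa)
    if available <= demand + capacity:
        # Normal operation: meet the demand and store the rest.
        return demand
    # The reservoir would overflow: release the excess above capacity.
    return available - capacity

for water in (10, 85, 200, 700):
    print(water, two_point_hedging(water, demand=100, swa=20, ewa=150, capacity=500))
```

A time-varying policy of the kind the abstract describes would hold one `(swa, ewa)` pair per month, while a constant policy would use a single pair throughout the year.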
25. Multi-optima exploration with adaptive Gaussian mixture model
- Author
-
Calinon, S., Pervez, Affan, and Caldwell, D. G.
- Abstract
In learning by exploration problems such as reinforcement learning (RL), direct policy search, stochastic optimization or evolutionary computation, the goal of an agent is to maximize some form of reward function (or minimize a cost function). Often, these algorithms are designed to find a single policy solution. We address the problem of representing the space of control policy solutions by considering exploration as a density estimation problem. Such representation provides additional information such as shape and curvature of local peaks that can be exploited to analyze the discovered solutions and guide the exploration. We show that the search process can easily be generalized to multi-peaked distributions by employing a Gaussian mixture model (GMM) with an adaptive number of components. The GMM has a dual role: representing the space of possible control policies, and guiding the exploration of new policies. A variation of expectation-maximization (EM) applied to reward-weighted policy parameters is presented to model the space of possible solutions, as if this space were a probability distribution. The approach is tested in a dart game experiment formulated as a black-box optimization problem, where the agent's throwing capability increases while it searches for the best strategy to play the game. This experiment is used to study how the proposed approach can exploit new promising solution alternatives in the search process, when the optimality criterion slowly drifts over time. The results show that the proposed multi-optima search approach can anticipate such changes by exploiting promising candidates to smoothly adapt to the change of global optimum.
- Published
- 2012
26. Exemplar-Based Direct Policy Search with Evolutionary Optimization
- Author
-
Kokolo Ikeda
- Subjects
Computer science ,business.industry ,Evolutionary robotics ,Evolutionary algorithm ,Interactive evolutionary computation ,exemplar based policy ,Machine learning ,computer.software_genre ,evolutionary optimization ,Acrobot ,Evolutionary computation ,Evolutionary acquisition of neural topologies ,Human-based evolutionary computation ,Evolutionary music ,Direct policy search ,Q-learning ,Artificial intelligence ,business ,Metaheuristic ,computer ,Evolutionary programming - Abstract
In this paper, an exemplar-based policy optimization framework for direct policy search is presented. In this exemplar-based approach, the policy to be optimized is composed of a set of exemplars and a case-based action selector. An implementation of this approach using a state-action-based policy representation and an evolutionary algorithm optimizer is shown to provide favorable search performance for two higher-dimensional problems.
- Published
- 2005
27. Exemplar-Based Direct Policy Search with Evolutionary Optimization
- Author
-
IKEDA, Kokolo
- Abstract
In this paper, an exemplar-based policy optimization framework for direct policy search is presented. In this exemplar-based approach, the policy to be optimized is composed of a set of exemplars and a case-based action selector. An implementation of this approach using a state-action-based policy representation and an evolutionary algorithm optimizer is shown to provide favorable search performance for two higher-dimensional problems. Identifier: https://dspace.jaist.ac.jp/dspace/handle/10119/12960
- Published
- 2005
28. Curses, Tradeoffs, and Scalable Management: Advancing Evolutionary Multiobjective Direct Policy Search to Improve Water Reservoir Operations
- Author
-
Francesca Pianosi, Patrick M. Reed, Andrea Castelletti, Emanuele Mason, and Matteo Giuliani
- Subjects
Mathematical optimization ,Engineering ,010504 meteorology & atmospheric sciences ,Reliability (computer networking) ,0208 environmental biotechnology ,Geography, Planning and Development ,MathematicsofComputing_NUMERICALANALYSIS ,Evolutionary algorithm ,02 engineering and technology ,Management, Monitoring, Policy and Law ,01 natural sciences ,Direct policy search ,Multiobjective evolutionary algorithm ,Water management ,Limit (mathematics) ,0105 earth and related environmental sciences ,Water Science and Technology ,Civil and Structural Engineering ,Artificial neural network ,business.industry ,Stochastic programming ,020801 environmental engineering ,Water resources ,Scalability ,business ,Curse of dimensionality - Abstract
Optimal management policies for water reservoir operation are generally designed via stochastic dynamic programming (SDP). Yet, the adoption of SDP in complex real-world problems is challenged by the three curses of dimensionality, modeling, and multiple objectives. These three curses considerably limit SDP’s practical application. Alternatively, this study focuses on the use of evolutionary multiobjective direct policy search (EMODPS), a simulation-based optimization approach that combines direct policy search, nonlinear approximating networks, and multiobjective evolutionary algorithms to design Pareto-approximate closed-loop operating policies for multipurpose water reservoirs. This analysis explores the technical and practical implications of using EMODPS through a careful diagnostic assessment of the effectiveness and reliability of the overall EMODPS solution design as well as of the resulting Pareto-approximate operating policies. The EMODPS approach is evaluated using the multipurpose Hoa Binh water reservoir in Vietnam, where water operators are seeking to balance the conflicting objectives of maximizing hydropower production and minimizing flood risks. A key choice in the EMODPS approach is the selection of alternative formulations for flexibly representing reservoir operating policies. This study distinguishes between the relative performance of two widely-used nonlinear approximating networks, namely artificial neural networks (ANNs) and radial basis functions (RBFs). The results show that RBF solutions are more effective than ANN ones in designing Pareto approximate policies for the Hoa Binh reservoir. Given the approximate nature of EMODPS, the diagnostic benchmarking uses SDP to evaluate the overall quality of the attained Pareto-approximate results. Although the Hoa Binh test case’s relative simplicity should maximize the potential value of SDP, the results demonstrate that EMODPS successfully dominates the solutions derived via SDP.