Journal: operations research / Publication Year Range: This year - Searchworks@Jio Institute Digital Library Search Results

Showing total 49 results

Start Over Publication Year Range This year Journal operations research

49 results

1. Uncertainty Quantification and Exploration for Reinforcement Learning.

Author: Zhu, Yi, Dong, Jing, and Lam, Henry
Subjects: REINFORCEMENT learning, CENTRAL limit theorem, CONFIDENCE regions (Mathematics), ASYMPTOTIC distribution, INFERENTIAL statistics
Abstract: Quantify the uncertainty to decide and explore better In statistical inference, large-sample behavior and confidence interval construction are fundamental in assessing the error and reliability of estimated quantities with respect to the data noises. In the paper "Uncertainty Quantification and Exploration for Reinforcement Learning", Dong, Lam, and Zhu study the large sample behavior in the classic setting of reinforcement learning. They derive appropriate large-sample asymptotic distributions for the state-action value function (Q-value) and optimal value function estimations when data are collected from the underlying Markov chain. This allows one to evaluate the assertiveness of performances among different decisions. The tight uncertainty quantification also facilitates the development of a pure exploration policy by maximizing the worst-case relative discrepancy among the estimated Q-values (ratio of the mean squared difference to the variance). This exploration policy aims to collect informative training data to maximize the probability of learning the optimal reward collecting policy, and it achieves good empirical performance. We investigate statistical uncertainty quantification for reinforcement learning (RL) and its implications in exploration policy. Despite ever-growing literature on RL applications, fundamental questions about inference and error quantification, such as large-sample behaviors, appear to remain quite open. In this paper, we fill in the literature gap by studying the central limit theorem behaviors of estimated Q-values and value functions under various RL settings. In particular, we explicitly identify closed-form expressions of the asymptotic variances, which allow us to efficiently construct asymptotically valid confidence regions for key RL quantities. Furthermore, we utilize these asymptotic expressions to design an effective exploration strategy, which we call Q-value-based Optimal Computing Budget Allocation (Q-OCBA). The policy relies on maximizing the relative discrepancies among the Q-value estimates. Numerical experiments show superior performances of our exploration strategy than other benchmark policies. Funding: This work was supported by the National Science Foundation (1720433). [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

2. Unified Moment-Based Modeling of Integrated Stochastic Processes.

Author: Kyriakou, Ioannis, Brignone, Riccardo, and Fusai, Gianluca
Subjects: STOCHASTIC processes, STOCHASTIC integrals, STOCHASTIC models, STOCHASTIC systems, DISTRIBUTION (Probability theory)
Abstract: Dial M for Simulation For years, systems of stochastic differential equations (SDEs) were simulated by discretization, inevitably introducing a bias, which can be difficult to quantify accurately. To circumvent this, some attempts have been made to simulate exactly various models from the SDE solution. These approaches prove capable of producing accurate results. A serious drawback of such an approach, nevertheless, is the implicit need to use extensive numerical methods, which make the entire simulation computationally heavy and quite impracticable. In the paper "Unified moment-based modeling of integrated stochastic processes," Kyriakou, Brignone, and Fusai present a methodological framework based on M(oments) for the simulation of such models that overcomes earlier limitations. Theoretical results and extensive numerical experiments show that the proposed approach allows accurate simulation of complex stochastic models with low computational effort. In this paper, we present a new method for simulating integrals of stochastic processes. We focus on the nontrivial case of time integrals, conditional on the state variable levels at the endpoints of a time interval through a moment-based probability distribution construction. We present different classes of models with important uses in finance, medicine, epidemiology, climatology, bioeconomics, and physics. The method is generally applicable in well-posed moment problem settings. We study its convergence, point out its advantages through a series of numerical experiments, and compare its performance against existing schemes. Supplemental Material: The e-companion is available at https://doi.org/10.1287/opre.2022.2422. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

3. UCB-Type Learning Algorithms with Kaplan–Meier Estimator for Lost-Sales Inventory Models with Lead Times.

Author: Lyu, Chengyi, Zhang, Huanan, and Xin, Linwei
Subjects: MACHINE learning, KAPLAN-Meier estimator, LEAD time (Supply chain management), INVENTORIES, NUMERICAL analysis
Abstract: Efficient Learning Algorithms for the Best Capped Base-Stock Policy in Lost Sales Inventory Systems Periodic review, lost sales inventory systems with lead times are notoriously challenging to optimize. Recently, the capped base-stock policy, which places orders to bring the inventory position up to the order-up-to level subject to the order cap, has demonstrated exceptional performance. In the paper "UCB-Type Learning Algorithms with Kaplan–Meier Estimator for Lost Sales Inventory Models with Lead Times," Lyu, Zhang, and Xin propose an upper confidence bound–type learning framework. This framework, which incorporates simulations with the Kaplan–Meier estimator, works with censored demand observations. It can be applied to determine the optimal capped base-stock policy with a tight regret with respect to the planning horizon and the optimal base-stock policy with a regret that matches the best existing result. Both theoretical analysis and extensive numerical experiments demonstrate the effectiveness of the proposed learning framework. In this paper, we consider a classic periodic-review lost-sales inventory system with lead times, which is notoriously challenging to optimize with a wide range of real-world applications. We consider a joint learning and optimization problem in which the decision maker does not know the demand distribution a priori and can only use past sales information (i.e., censored demand). Departing from existing learning algorithms on this learning problem that require the convexity property of the underlying system, we develop an upper confidence bound (UCB)-type learning framework that incorporates simulations with the Kaplan–Meier estimator and demonstrate its applicability to learning not only the optimal capped base-stock policy in which convexity no longer holds, but also the optimal base-stock policy with a regret that matches the best existing result. Compared with a classic multi-armed bandit problem, our problem has unique challenges because of the nature of the inventory system, because (1) each action has long-term impacts on future costs, and (2) the system state space is exponentially large in the lead time. As such, our learning algorithms are not naive adoptions of the classic UCB algorithm; in fact, the design of the simulation steps with the Kaplan–Meier estimator and averaging steps is novel in our algorithms, and the confidence width in the UCB index is also different from the classic one. We prove the regrets of our learning algorithms are tight up to a logarithmic term in the planning horizon T. Our extensive numerical experiments suggest the proposed algorithms (almost) dominate existing learning algorithms. We also demonstrate how to select which learning algorithm to use with limited demand data. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.0273. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

4. A Lyapunov Theory for Finite-Sample Guarantees of Markovian Stochastic Approximation.

Author: Chen, Zaiwei, Maguluri, Siva T., Shakkottai, Sanjay, and Shanmugam, Karthikeyan
Subjects: STOCHASTIC approximation, STOCHASTIC analysis, REINFORCEMENT learning, SMOOTHNESS of functions, CONTRACTION operators, STATISTICAL learning
Abstract: The stochastic approximation (SA) method stands as the foundational mathematical tool for modern large-scale optimization and machine learning. Therefore, gaining a theoretical understanding of SA algorithms is of fundamental interest. In their paper titled "A Lyapunov Theory for Finite-Sample Guarantees of Markovian Stochastic Approximation," Chen et al. present a unified Lyapunov framework for the finite-sample analysis of a Markovian SA algorithm under a contractive operator with respect to an arbitrary norm. The key novelty lies in the construction of a smooth Lyapunov function called the generalized Moreau envelope. The authors demonstrate the effectiveness of their SA results in the context of reinforcement learning (RL), specifically through popular algorithms such as variants of temporal difference (TD) learning and Q-learning. As byproducts, the results provide theoretical insights into the efficiency of bootstrapping in TD learning with eligibility traces and the bias-variance tradeoff in off-policy learning. This paper develops a unified Lyapunov framework for finite-sample analysis of a Markovian stochastic approximation (SA) algorithm under a contraction operator with respect to an arbitrary norm. The main novelty lies in the construction of a valid Lyapunov function called the generalized Moreau envelope. The smoothness and an approximation property of the generalized Moreau envelope enable us to derive a one-step Lyapunov drift inequality, which is the key to establishing the finite-sample bounds. Our SA result has wide applications, especially in the context of reinforcement learning (RL). Specifically, we show that a large class of value-based RL algorithms can be modeled in the exact form of our Markovian SA algorithm. Therefore, our SA results immediately imply finite-sample guarantees for popular RL algorithms such as n-step temporal difference (TD) learning, TD (λ) , off-policy V-trace, and Q-learning. As byproducts, by analyzing the convergence bounds of n-step TD and TD (λ) , we provide theoretical insight into the problem about the efficiency of bootstrapping. Moreover, our finite-sample bounds of off-policy V-trace explicitly capture the tradeoff between the variance of the stochastic iterates and the bias in the limit. Funding: This work was supported by RTX, the National Science Foundation [Grants 2019844, 2107037, 211247, 2112533, 2144316, and 2240982], and the Machine Learning Laboratory at University of Texas at Austin. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.0249. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

5. Comparing Sequential Forecasters.

Author: Choe, Yo Joong and Ramdas, Aaditya
Subjects: FUTUROLOGISTS, SEQUENTIAL learning, NULL hypothesis, COMPUTER science education, CONFIDENCE intervals, NONPARAMETRIC statistics
Abstract: Anytime Valid Comparison of Sequential Forecasters How do we compare forecasters that each make a probabilistic prediction on a sequence of events (e.g., weather and sports)? In "Comparing Sequential Forecasters," Choe and Ramdas propose flexible approaches to sequential inference on the mean score difference between any two forecasters. To estimate this time-varying quantity, the authors propose a sequential analog of confidence intervals, called confidence sequences (CSs). These CSs correctly cover the score difference under continuous monitoring, and the evaluator can freely peek at the scores to stop the experiment ("anytime valid"). The authors further develop a complementary anytime valid approach called e-processes, which quantify the evidence against the claim that one forecaster is never better than the other in mean scores. The validity of these methods does not depend on stationarity or other assumptions on how the score differences evolve sequentially. In their paper, the authors showcase CSs and e-processes for comparing real-world baseball and weather forecasters. Consider two forecasters, each making a single prediction for a sequence of events over time. We ask a relatively basic question: how might we compare these forecasters, either online or post hoc, avoiding unverifiable assumptions on how the forecasts and outcomes were generated? In this paper, we present a rigorous answer to this question by designing novel sequential inference procedures for estimating the time-varying difference in forecast scores. To do this, we employ confidence sequences (CS), which are sequences of confidence intervals that can be continuously monitored and are valid at arbitrary data-dependent stopping times ("anytime-valid"). The widths of our CSs are adaptive to the underlying variance of the score differences. Underlying their construction is a game-theoretic statistical framework in which we further identify e-processes and p-processes for sequentially testing a weak null hypothesis—whether one forecaster outperforms another on average (rather than always). Our methods do not make distributional assumptions on the forecasts or outcomes; our main theorems apply to any bounded scores, and we later provide alternative methods for unbounded scores. We empirically validate our approaches by comparing real-world baseball and weather forecasters. Funding: A. Ramdas acknowledges funding from the National Science Foundation Division of Mathematical Sciences [Grant 1916320]. Research reported in this paper was sponsored in part by the DEVCOM Army Research Laboratory under Cooperative Agreement W911NF-17-2-0196 (ARL IoBT CRA). Supplemental Material: The e-companion is available at https://doi.org/10.1287/opre.2021.0792. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

6. Technical Note—Risk Sensitivity and Firm Power: Price Competition with Mean2-Variance Profit Objective Under Multinomial Logit Demand.

Author: Li, Hongmin and Webster, Scott
Subjects: PRICES, MARKET exit, NASH equilibrium, LOGISTIC regression analysis, MARKET equilibrium
Abstract: Risk Sensitivity and Firm Power: Price Competition with Mean2-Variance Profit Objective Under Multinomial Logit Demand Firms with varying degrees of risk sensitivity perceive the same stochastic profit prospect differently. It is important to understand the equilibrium pricing behaviors under risk aversion and their implications on firms' survival and market health. The authors identify a power index, which is the ratio of product attractiveness to risk sensitivity, and show that the set of profitable firms is power index ordered. Firms with a favorable power index value earn a positive profit in a price competition, and others are driven to zero profit. Interestingly, although high risk aversion handicaps a firm, this paper shows that moderate risk aversion may give a firm an advantage over an otherwise equivalent competitor that is less risk sensitive. The authors also establish that in an equilibrium with market entry and exit, the power index is generalized to the ratio of product attractiveness and an entry cost-adjusted risk sensitivity measure. This paper is the first in the literature to address a risk-sensitive price competition under the multinomial logit choice model, with each participating firm maximizing a risk-adjusted profit objective. We find that, at equilibrium, a subset of firms earns a positive profit, whereas others are driven to zero profit, contrasting with the risk-neutral equilibrium in which all firms earn a positive profit regardless of quality and cost. We identify a power index—the ratio of effective product attractiveness to risk sensitivity—and show that the set of profitable firms is power index ordered. Risk aversion drives firms toward a more aggressive equilibrium pricing strategy and intensifies competition. However, the relative profit impact across firms is not monotone; although high risk aversion handicaps a firm, moderate risk aversion may place a firm at an advantage over an otherwise equivalent competitor that is less risk sensitive, contrary to intuition. In an equilibrium with market entry cost, we show that the set of active firms at equilibrium follows a generalized power index order that depends on entry cost. Furthermore, although the power index is decreasing in risk aversion, the generalized power index may initially be increasing in risk aversion. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2023.2466. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

7. Technical Note—Greedy Algorithm for Multiway Matching with Bounded Regret.

Author: Gupta, Varun
Subjects: GREEDY algorithms, ONLINE algorithms, REVENUE management, RESOURCE allocation, TIME perspective
Abstract: A Unified View of Online Matching and Resource Allocation Problems Whereas small regret online algorithms for applications as diverse as network revenue management (NRM), assemble-to-order (ATO) systems, and online stochastic bin packing (SBP) are known in the literature, the design of the existing algorithms are tailored to the specific application and often use the strategy of resolving a planning linear program. In the paper "Greedy Algorithm for Multiway Matching with Bounded Regret," Gupta proposes a unified model for studying such online matching/allocation problems. In the unified model, resources of three types—off-line (e.g., inventory in NRM), online-queueable (e.g., orders or resources in ATO systems), or online-nonqueueable (e.g., requests in NRM, items in SBP)—must be combined to create feasible configurations. Leveraging the unified framework, the author gives one simple greedy algorithm that gives small regret (bounded or logarithmic in time horizon) for these diverse applications under a mild nondegeneracy condition on the off-line planning problem. In this paper, we prove the efficacy of a simple greedy algorithm for a finite horizon online resource allocation/matching problem when the corresponding static planning linear program (SPP) exhibits a nondegeneracy condition called the general position gap (GPG). The key intuition that we formalize is that the solution of the reward-maximizing SPP is the same as a feasibility linear program restricted to the optimal basic activities, and under GPG, this solution can be tracked with bounded regret by a greedy algorithm, that is, without the commonly used technique of periodically resolving the SPP. The goal of the decision maker is to combine resources (from a finite set of resource types) into configurations (from a finite set of feasible configurations) in which each configuration is specified by the number of resources consumed of each type and a reward. The resources are further subdivided into three types—off-line (whose quantity is known and available at time 0), online-queueable (which arrive online and can be stored in a buffer), and online-nonqueueable (which arrive online and must be matched on arrival or lost). Under GPG, we prove that (i) our greedy algorithm gets bounded anytime regret of O (1 / ϵ 0) for matching reward (ϵ0 is a measure of the GPG) when no configuration contains both an online-queueable and an online-nonqueueable resource and (ii) O (log t) expected anytime regret otherwise (we also prove a matching lower bound). By considering the three types of resources, our matching framework encompasses several well-studied problems, such as dynamic multisided matching, network revenue management, online stochastic packing, and multiclass queueing systems. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2400. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

8. Data-Driven Optimization with Distributionally Robust Second Order Stochastic Dominance Constraints.

Author: Peng, Chun and Delage, Erick
Subjects: STOCHASTIC dominance, STOCHASTIC orders, ROBUST optimization, OPTIMIZATION algorithms, AMBIGUITY
Abstract: This paper presents the first comprehensive study of a data-driven formulation of the distributionally robust second order stochastic dominance constrained problem (DRSSDCP) that hinges on using a type-1 Wasserstein ambiguity set. It is, furthermore, for the first time shown to be axiomatically motivated in an environment with distribution ambiguity. We formulate the DRSSDCP as a multistage robust optimization problem and further propose a tractable conservative approximation that exploits finite adaptability and a scenario-based lower bounding problem. We then propose the first exact optimization algorithm for this DRSSDCP. We illustrate how the data-driven DRSSDCP can be applied in practice on resource-allocation problems with both synthetic and real data. Our empirical results show that, with a proper adjustment of the size of the Wasserstein ball, DRSSDCP can reach acceptable out-of-sample feasibility yet still generating strictly better performance than what is achieved by the reference strategy. Optimization with stochastic dominance constraints has recently received an increasing amount of attention in the quantitative risk management literature. Instead of requiring that the probabilistic description of the uncertain parameters be exactly known, this paper presents a comprehensive study of a data-driven formulation of the distributionally robust second order stochastic dominance constrained problem (DRSSDCP) that hinges on using a type-1 Wasserstein ambiguity set. This formulation allows us to identify solutions with finite sample guarantees and solutions that are asymptotically consistent when observations are independent and identically distributed. It is, furthermore, shown to be axiomatically motivated in an environment with distribution ambiguity. Leveraging recent results in the field of robust optimization, we further formulate the DRSSDCP as a multistage robust optimization problem and further propose a tractable conservative approximation that exploits finite adaptability and a scenario-based lower bounding problem, both of which can reduce to linear programs under mild conditions. We then propose, to the best of our knowledge, the first exact optimization algorithm for this DRSSDCP, the efficiency of which is confirmed by our numerical results. Finally, we illustrate how the data-driven DRSSDCP can be applied in practice on resource-allocation problems with both synthetic and real data. Our empirical results show that, with a proper adjustment of the size of the Wasserstein ball, DRSSDCP can reach "acceptable" out-of-sample feasibility yet still generating strictly better performance than what is achieved by the reference strategy. Funding: This research was partially supported by the Natural Sciences and Engineering Research Council of Canada [Grant RGPIN-2016-05208], the Canada Research Chair program [Grant 950-230057], and Groupe d'études et de recherche en analyse des décisions, and it was enabled in part by support provided by Calcul Quebec (https://www.calculquebec.ca/en/) and the Digital Research Alliance of Canada (https://www.alliancecan.ca/). Supplemental Material: The e-companion is available at https://doi.org/10.1287/opre.2022.2387. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

9. Exploration and Incentives in Reinforcement Learning.

Author: Simchowitz, Max and Slivkins, Aleksandrs
Subjects: REINFORCEMENT learning, MARKOV processes, INFORMATION asymmetry, INFORMATION resources management
Abstract: How do you incentivize self-interested agents to explore when they prefer to exploit? We consider complex exploration problems, where each agent faces the same (but unknown) Markov decision process (MDP). In contrast with traditional formulations of reinforcement learning (RL), agents control the choice of policies, whereas an algorithm can only issue recommendations. However, the algorithm controls the flow of information, and can incentivize the agents to explore via information asymmetry. We design an algorithm which explores all reachable states in the MDP. We achieve provable guarantees similar to those for incentivizing exploration in static, stateless exploration problems studied previously. From the RL perspective, we design RL mechanisms, that is, RL algorithms that interact with self-interested agents and are compatible with their incentives. This is the first paper on RL mechanisms, that is, the first paper on any scenario that combines RL and incentives, to the best of our knowledge. How do you incentivize self-interested agents to explore when they prefer to exploit? We consider complex exploration problems, where each agent faces the same (but unknown) Markov decision process (MDP). In contrast with traditional formulations of reinforcement learning, agents control the choice of policies, whereas an algorithm can only issue recommendations. However, the algorithm controls the flow of information, and can incentivize the agents to explore via information asymmetry. We design an algorithm which explores all reachable states in the MDP. We achieve provable guarantees similar to those for incentivizing exploration in static, stateless exploration problems studied previously. To the best of our knowledge, this is the first work to consider mechanism design in a stateful, reinforcement learning setting. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.0495. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

10. Wasserstein Distributionally Robust Optimization and Variation Regularization.

Author: Gao, Rui, Chen, Xi, and Kleywegt, Anton J.
Subjects: ROBUST optimization, POISSON processes, MACHINE learning, DISTRIBUTION (Probability theory), STATISTICAL learning
Abstract: This paper builds a bridge between two area in optimization and machine learning by establishing a general connection between Wasserstein distributional robustness and variation regularization. It helps to demystify the empirical success of Wasserstein distributionally robust optimization and devise new regularization schemes for machine learning. Wasserstein distributionally robust optimization (DRO) is an approach to optimization under uncertainty in which the decision maker hedges against a set of probability distributions, specified by a Wasserstein ball, for the uncertain parameters. This approach facilitates robust machine learning, resulting in models that sustain good performance when the data are to some extent different from the training data. This robustness is related to the well-studied effect of regularization. The connection between Wasserstein DRO and regularization has been established in several settings. However, existing results often require restrictive assumptions, such as smoothness or convexity, that are not satisfied by many important problems. In this paper, we develop a general theory for the variation regularization effect of the Wasserstein DRO—a new form of regularization that generalizes total-variation regularization, Lipschitz regularization, and gradient regularization. Our results cover possibly nonconvex and nonsmooth losses and losses on non-Euclidean spaces and highlight the bias-variation tradeoff intrinsic in the Wasserstein DRO, which balances between the empirical mean of the loss and the variation of the loss. Example applications include multi-item newsvendor, linear prediction, neural networks, manifold learning, and intensity estimation for Poisson processes. We also use our theory of variation regularization to derive new generalization guarantees for adversarial robust learning. Funding: X. Chen is supported by the National Science Foundation [Grant IIS-1845444]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2383. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

11. Best Principal Submatrix Selection for the Maximum Entropy Sampling Problem: Scalable Algorithms and Performance Guarantees.

Author: Li, Yongchun and Xie, Weijun
Subjects: ARTIFICIAL intelligence, APPROXIMATION algorithms, ENTROPY, SEARCH algorithms, ALGORITHMS, SURETYSHIP & guaranty
Abstract: This paper studies a classic maximum entropy sampling problem (MESP), which aims to select the most informative principal submatrix of a prespecified size from a covariance matrix. MESP is widely applied to many areas, including healthcare, power systems, manufacturing, and data science. By investigating its Lagrangian dual and primal characterization, we derive a novel convex integer program for MESP and show that its continuous relaxation yields a near-optimal solution. The results motivate us to study efficient approximation algorithms and develop their approximation bounds for MESP, which improves the best known one in the literature. This paper studies a classic maximum entropy sampling problem (MESP), which aims to select the most informative principal submatrix of a prespecified size from a covariance matrix. By investigating its Lagrangian dual and primal characterization, we derive a novel convex integer program for MESP and show that its continuous relaxation yields a near-optimal solution. The results motivate us to develop a sampling algorithm and derive its approximation bound for MESP, which improves the best known bound in literature. We then provide an efficient deterministic implementation of the sampling algorithm with the same approximation bound. Besides, we investigate the widely used local search algorithm and prove its first known approximation bound for MESP. The proof techniques further inspire for us an efficient implementation of the local search algorithm. Our numerical experiments demonstrate that these approximation algorithms can efficiently solve medium-size and large-scale instances to near optimality. Finally, we extend the analyses to the A-optimal MESP, for which the objective is to minimize the trace of the inverse of the selected principal submatrix. Funding: This work was supported by the National Science Foundation Division of Information and Intelligent Systems [Grant 2246417] and Division of Civil, Mechanical and Manufacturing Innovation [Grant 2246414]. Supplemental Material: The e-companion is available at https://doi.org/10.1287/opre.2023.2488. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

12. Black-Box Acceleration of Monotone Convex Program Solvers.

Author: London, Palma, Vardi, Shai, Eghbali, Reza, and Wierman, Adam
Subjects: CONVEX programming, LINEAR programming, RESEARCH personnel, MACHINE learning, JOB titles, POSTDOCTORAL programs
Abstract: When and where was the study conducted: This work was done in 2018, 2019 and 2020 when Palma London was a PhD student at Caltech and Shai Vardi was a postdoc at Caltech. This work was also done in part while Palma London was visiting Purdue University, and while Reza Eghbali was a postdoctoral fellow the Simons Institute for the Theory of Computing. Adam Wierman is a professor at Caltech. Article Summary and Talking Points: Please describe the primary purpose/findings of your article in 3 sentences or less. This paper presents a framework for accelerating (speeding up) existing convex program solvers. Across engineering disciplines, a fundamental bottleneck is the availability of fast, efficient, accurate solvers. We present an acceleration method that speeds up linear programing solvers such as Gurobi and convex program solvers such as the Splitting Conic Solver by two orders of magnitude. Please include 3-5 short bullet points of "Need to Know" items regarding this research and your findings. - Optimizations problems arise in many engineering and science disciplines, and developing efficient optimization solvers is key to future innovation. - We speed up linear programing solver Gurobi by two orders of magnitude. - This work applies to optimization problems with monotone objective functions and packing constraints, which is a common problem formulation across many disciplines. Please identify 2 pull quotes from your article that best capture the novelty and impact of your research. "We propose a framework for accelerating exact and approximate convex programming solvers for packing linear programming problems and a family of convex programming problems with linear constraints. Analytically, we provide worst-case guarantees on the run time and the quality of the solution produced. Numerically, we demonstrate that our framework speeds up Gurobi and the Splitting Conic Solver by two orders of magnitude, while maintaining a near-optimal solution." "Our focus in this paper is on a class of packing problems for which data is either very costly or hard to obtain. In these situations, the number of data points available is much smaller than the number of variables. In a machine-learning setting, this regime is increasingly prevalent because it is often advantageous to consider larger and larger feature spaces, while not necessarily obtaining proportionally more data." Article Implications - Please describe in 5 sentences or less the innovative takeaway(s) of your research.This framework applies to optimization problems with monotone objective functions and packing constraints, which is a common problem formulation across many disciplines, including machine learning, inference, and resource allocation. Providing fast solvers for these problems is crucial. We exploit characteristics of the problem structure and leverage statistical properties of the problem constraints to allow us to speed up optimization solvers. We present worst-case guarantees on run-time, and empirically demonstrate speedups of two orders of magnitude. - Please describe in 5 sentences or less why your findings would be of interest to the general public.Many problems in engineering, science, math, and machine learning involve solving an optimization problem. Fast, efficient optimization solvers are key to future innovation in science and engineering. This work presents a tool to accelerate existing convex solvers, and thus can also be applied to future solvers. As the size of datasets grow it is even more crucial to have fast solvers. - Who would be the most impacted by your research (i.e. by industry, job title, consumer category).Our work impacts machine-learning researchers and optimization researchers, in industry or academia. This paper presents a black-box framework for accelerating packing optimization solvers. Our method applies to packing linear programming problems and a family of convex programming problems with linear constraints. The framework is designed for high-dimensional problems, for which the number of variables n is much larger than the number of measurements m. Given an (m × n) problem, we construct a smaller (m × ϵ n) problem, whose solution we use to find an approximation to the optimal solution. Our framework can accelerate both exact and approximate solvers. If the solver being accelerated produces an α-approximation, then we produce a (1 − ϵ) / α 2 -approximation of the optimal solution to the original problem. We present worst-case guarantees on run time and empirically demonstrate speedups of two orders of magnitude. Funding: Financial support from the National Science Foundation [Grants AitF-1637598, CNS-151894, and CPS-154471] and the Linde Institute is gratefully acknowledged. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

13. Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model.

Author: Li, Gen, Wei, Yuting, Chi, Yuejie, and Chen, Yuxin
Subjects: REINFORCEMENT learning, SAMPLE size (Statistics), SCHOLARSHIPS, STATISTICAL accuracy, SCIENCE awards
Abstract: This paper studies a central issue in modern reinforcement learning, the sample efficiency, and makes progress toward solving an idealistic scenario that assumes access to a generative model or a simulator. Despite a large number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy has yet to be determined. In particular, all prior results suffer from a severe sample size barrier in the sense that their claimed statistical guarantees hold only when the sample size exceeds some enormous threshold. The current paper overcomes this barrier and fully settles this problem; more specifically, we establish the minimax optimality of the model-based approach for any given target accuracy level. To the best of our knowledge, this work delivers the first minimax-optimal guarantees that accommodate the entire range of sample sizes (beyond which finding a meaningful policy is information theoretically infeasible). This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider γ-discounted infinite-horizon Markov decision processes (MDPs) with state space S and action space A. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy has yet to be determined. In particular, all prior results suffer from a severe sample size barrier in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least | S ‖ A | (1 − γ) 2 . The current paper overcomes this barrier by certifying the minimax optimality of two algorithms—a perturbed model-based algorithm and a conservative model-based algorithm—as soon as the sample size exceeds the order of | S ‖ A | 1 − γ (modulo some log factor). Moving beyond infinite-horizon MDPs, we further study time-inhomogeneous finite-horizon MDPs and prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level. To the best of our knowledge, this work delivers the first minimax-optimal guarantees that accommodate the entire range of sample sizes (beyond which finding a meaningful policy is information theoretically infeasible). Funding: Y. Wei is supported in part by the Google Research Scholar Award and the National Science Foundation [Grants CCF-2106778, DMS-2147546, and DMS-2143215]. Y. Chi is supported in part by the Office of Naval Research [Grants N00014-18-1-2142 and N00014-19-1-2404] and the National Science Foundation [Grants CCF-1806154, CCF-2007911, and CCF-2106778]. Y. Chen is supported in part by the Alfred P. Sloan Foundation [research fellowship], Google [research scholar award], the Air Force Office of Scientific Research [Grants FA9550-19-1-0030 and FA9550-22-1-0198], the Office of Naval Research [Grant N00014-22-1-2354], and the National Science Foundation [Grants CCF-2221009, CCF-1907661, DMS-2014279, IIS-2218713, and IIS-2218773]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2023.2451. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

14. Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis.

Author: Li, Gen, Cai, Changxiao, Chen, Yuxin, Wei, Yuting, and Chi, Yuejie
Subjects: SCHOLARSHIPS, REINFORCEMENT learning, RESEARCH awards, MARKOV processes, AIR forces, ACTIVE learning, ASYNCHRONOUS learning
Abstract: This paper investigates a model-free algorithm of broad interest in reinforcement learning, namely, Q-learning. Whereas substantial progress had been made toward understanding the sample efficiency of Q-learning in recent years, it remained largely unclear whether Q-learning is sample-optimal and how to sharpen the sample complexity analysis of Q-learning. In this paper, we settle these questions: (1) When there is only a single action, we show that Q-learning (or, equivalently, TD learning) is provably minimax optimal. (2) When there are at least two actions, our theory unveils the strict suboptimality of Q-learning and rigorizes the negative impact of overestimation in Q-learning. Our theory accommodates both the synchronous case (i.e., the case in which independent samples are drawn) and the asynchronous case (i.e., the case in which one only has access to a single Markovian trajectory). Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP) in a model-free fashion, lies at the heart of reinforcement learning. When it comes to the synchronous setting (such that independent samples for all state–action pairs are drawn from a generative model in each iteration), substantial progress has been made toward understanding the sample efficiency of Q-learning. Consider a γ-discounted infinite-horizon MDP with state space S and action space A : to yield an entry-wise ε-approximation of the optimal Q-function, state-of-the-art theory for Q-learning requires a sample size exceeding the order of | S ‖ A | (1 − γ) 5 ε 2 , which fails to match existing minimax lower bounds. This gives rise to natural questions: What is the sharp sample complexity of Q-learning? Is Q-learning provably suboptimal? This paper addresses these questions for the synchronous setting: (1) when the action space contains a single action (so that Q-learning reduces to TD learning), we prove that the sample complexity of TD learning is minimax optimal and scales as | S | (1 − γ) 3 ε 2 (up to log factor); (2) when the action space contains at least two actions, we settle the sample complexity of Q-learning to be on the order of | S ‖ A | (1 − γ) 4 ε 2 (up to log factor). Our theory unveils the strict suboptimality of Q-learning when the action space contains at least two actions and rigorizes the negative impact of overestimation in Q-learning. Finally, we extend our analysis to accommodate asynchronous Q-learning (i.e., the case with Markovian samples), sharpening the horizon dependency of its sample complexity to be 1 (1 − γ) 4 . Funding: Y. Chen is supported in part by the Alfred P. Sloan Research Fellowship, the Google Research Scholar Award, the Air Force Office of Scientific Research [Grant FA9550-22-1-0198], the Office of Naval Research (ONR) [Grant N00014-22-1-2354], and the National Science Foundation (NSF) [Grants CCF-2221009, CCF-1907661, DMS-2014279, IIS-2218713, and IIS-2218773]. Y. Wei is supported in part by the Google Research Scholar Award and the NSF [Grants CCF-2106778, DMS-2147546/2015447, and CAREER award DMS-2143215]. Y. Chi is supported in part by the ONR [Grants N00014-18-1-2142 and N00014-19-1-2404] and the NSF [Grants CCF-1806154, CCF-2007911, CCF-2106778, ECCS-2126634, and DMS-2134080]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2023.2450. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

15. Recursive Importance Sketching for Rank Constrained Least Squares: Algorithms and High-Order Convergence.

Author: Luo, Yuetian, Huang, Wen, Li, Xudong, and Zhang, Anru
Subjects: LEAST squares, GAUSS-Newton method, LOW-rank matrices, MACHINE learning, ALGORITHMS
Abstract: Solving Rank Constrained Least Squares via Recursive Importance Sketching In statistics and machine learning, we sometimes run into the rank-constrained least squares problems, for which we need to find the best low-rank fit between sets of data, such as trying to figure out what factors are affecting the data, filling in missing information, or finding connections between different sets of data. This paper introduces a new method for solving this problem called the recursive importance sketching algorithm (RISRO), in which the central idea is to break the problem down into smaller, easier parts using a unique technique called "recursive importance sketching." This new method is not only easy to use, but it is also very efficient and gives accurate results. We prove that RISRO converges in a local quadratic-linear and quadratic rate under some mild conditions. Simulation studies also demonstrate the superior performance of RISRO. In this paper, we propose a recursive importance sketching algorithm for rank-constrained least squares optimization (RISRO). The key step of RISRO is recursive importance sketching, a new sketching framework based on deterministically designed recursive projections, and it significantly differs from the randomized sketching in the literature. Several existing algorithms in the literature can be reinterpreted under this new sketching framework, and RISRO offers clear advantages over them. RISRO is easy to implement and computationally efficient, and the core procedure in each iteration is to solve a dimension-reduced least squares problem. We establish the local quadratic-linear and quadratic rate of convergence for RISRO under some mild conditions. We also discover a deep connection of RISRO to the Riemannian Gauss–Newton algorithm on fixed rank matrices. The effectiveness of RISRO is demonstrated in two applications in machine learning and statistics: low-rank matrix trace regression and phase retrieval. Simulation studies demonstrate the superior numerical performance of RISRO. Funding: Y. Luo and A. Zhang were partially supported by the National Science Foundation [Grant CAREER-2203741]. W. Huang was partially supported by the Fundamental Research Funds for the Central Universities [Grant 20720190060] and the National Natural Science Foundation of China [Grant 12001455]. X. Li was partially supported by the National Natural Science Foundation of China [Grants 62141407 and 12271107], the Chenguang Program by the Shanghai Education Development Foundation and Shanghai Municipal Education Commission [Grant 19CG02], and the Shanghai Science and Technology Program [Grant 21JC1400600]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2023.2445. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

16. Resource-Aware Cost-Sharing Methods for Scheduling Games.

Author: Christodoulou, George, Gkatzelis, Vasilis, and Sgouritsa, Alkmini
Subjects: COST shifting, WORKING hours, COST functions, NASH equilibrium, CONVEX functions, SCHEDULING, SHARING economy
Abstract: In large distributed systems, ensuring the efficient utilization of the available resources is a very challenging task. Given limited information regarding the state of the system and no centralized control over the outcome, decentralized scheduling mechanisms are unable to enforce optimal utilization. To better understand such systems, some classic papers that introduced game theoretic models used the "price of anarchy" measure to evaluate the system's performance. The paper "Resource-Aware Cost-Sharing Methods for Scheduling Games" by Christodoulou, Gkatzelis, and Sgouritsa overcomes some of the overly pessimistic results shown in this prior work by enhancing the scheduling mechanisms with access to some additional information regarding the state of the system: a "resource-aware" mechanism knows what machines are available in the system and uses this information to carefully incentivize the users toward more efficient Nash equilibrium outcomes. We study the performance of cost-sharing methods in a selfish scheduling setting where a group of users schedule their jobs on machines with load-dependent cost functions, aiming to minimize their own cost. Anticipating this user behavior, the system designer chooses a decentralized protocol that defines how the cost generated on each machine is to be shared among its users, and the performance of the protocol is evaluated over the Nash equilibria of the induced game. Previous work on selfish scheduling has focused on two extreme models: omniscient protocols that are aware of every machine and every job that is active at any given time, and oblivious protocols that are aware of nothing beyond the machine they control. We focus on a well-motivated middle-ground model of resource-aware protocols, which are aware of the set of machines in the system, but unaware of what jobs are active at any given time. Furthermore, we study the extent to which appropriately overcharging the users can lead to improved performance. We provide protocols that achieve small constant price of anarchy bounds when the cost functions are convex or concave, and we complement our positive results with impossibility results for general cost functions. Funding: This work was supported by the Royal Society [Grant LT140046], the Engineering and Physical Sciences Research Council [Grant EP/M008118/1], the National Science Foundation [Grants CCF-1161813, CCF-1216073, and CCF-1408635; CAREER Award CCF-2047907], and the Lise Meitner Award Fellowship. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

17. Provably Good Region Partitioning for On-Time Last-Mile Delivery.

Author: Carlsson, John Gunnar, Liu, Sheng, Salari, Nooshin, and Yu, Han
Subjects: DELIVERY of goods, STOCHASTIC orders, PARALLEL algorithms, DISCRETE geometry
Abstract: Managing on-time delivery systems is challenging because of the underlying uncertainties and combinatorial nature of the routing decision. In practice, the efficiency of such systems also hinges on the driver's familiarity with the local neighborhood. In "Provably Good Region Partitioning for On-Time Last-Mile Delivery," Carlsson et al. study a region partitioning policy to minimize the expected delivery time of customer orders in a stochastic and dynamic setting. This policy assigns every driver to a subregion, ensuring that drivers are only dispatched to their territories. The authors characterize the structure of the optimal partitioning policy and show its expected on-time performance converges to that of the flexible dispatching policy in heavy traffic. The optimal characterization features two insightful conditions that are critical to the on-time performance of last-mile delivery systems. Furthermore, the paper develops partitioning algorithms with performance guarantees, leveraging ham sandwich cuts and three-partitions from discrete geometry. On-time last-mile delivery is expanding rapidly as people expect faster delivery of goods ranging from grocery to medicines. Managing on-time delivery systems is challenging because of the underlying uncertainties and combinatorial nature of the routing decision. In practice, the efficiency of such systems also hinges on the driver's familiarity with the local neighborhood. This paper studies the optimal region partitioning policy to minimize the expected delivery time of customer orders in a stochastic and dynamic setting. We allow both the order locations and on-site service times to be random and generally distributed. This policy assigns every driver to a subregion, hence making sure drivers will only be dispatched to their own territories. We characterize the structure of the optimal partitioning policy and show its expected on-time performance converges to that of the flexible dispatching policy in heavy traffic. The optimal characterization features two insightful conditions that are critical to the on-time performance of last-mile delivery systems. We then develop partitioning algorithms with performance guarantees, leveraging ham sandwich cuts and three-partitions from discrete geometry. This algorithmic development can be of independent interest for other logistics problems. We demonstrate the efficiency of the proposed region partitioning policy via numerical experiments using synthetic and real-world data sets. Funding: The first author gratefully acknowledges the support of the Office of Naval Research [Grant N00014-21-1-2208] and METRANS [Grant PSR-21-22]. The second author gratefully acknowledges the support of the Natural Sciences and Engineering Research Council of Canada [Grant RGPIN-2022-04950] and the National Natural Science Foundation of China [Grant 72242106]. The third author gratefully acknowledges the support of the Natural Sciences and Engineering Research Council of Canada [Grant RGPIN-2023-04453]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2021.0588. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

18. Static Pricing for Multi-unit Prophet Inequalities.

Author: Chawla, Shuchi, Devanur, Nikhil, and Lykouris, Thodoris
Subjects: PRICES, REVENUE management, VALUE (Economics), PUBLIC utilities, SOCIAL services
Abstract: Characterizing the Efficiency of Static Pricing Schemes as a Function of the Supply The problem of selling a supply of k units to a stream of customers constitutes one of the cornerstones in revenue management. Static pricing schemes (that output the same price to all customers) are commonly used because of their simplicity and their many desirable properties; they are anonymous, nonadaptive, and order oblivious. Although the efficiency of those schemes should improve as the supply k increases, prior work has only focused either on algorithms that aim for a constant approximation that is independent of k or on the setting where k becomes really large. In contrast, this paper characterizes the efficiency of static pricing schemes as a function of the supply. Our approach stems from identifying a "sweet spot" between selling enough items and obtaining enough utility from customers with high valuations. Subsequent work shows that our pricing scheme is the optimal static pricing for every value of k. We study a pricing problem where a seller has k identical copies of a product, buyers arrive sequentially, and the seller prices the items aiming to maximize social welfare. When k = 1, this is the so-called prophet inequality problem for which there is a simple pricing scheme achieving a competitive ratio of 1/2. On the other end of the spectrum, as k goes to infinity, the asymptotic performance of both static and adaptive pricing is well understood. We provide a static pricing scheme for the small-supply regime: where k is small but larger than one. Prior to our work, the best competitive ratio known for this setting was the 1/2 that follows from the single-unit prophet inequality. Our pricing scheme is easy to describe as well as practical; it is anonymous, nonadaptive, and order oblivious. We pick a single price that equalizes the expected fraction of items sold and the probability that the supply does not sell out before all customers are served; this price is then offered to each customer while supply lasts. This extends an approach introduced by Samuel-Cahn for the case of k = 1. This pricing scheme achieves a competitive ratio that increases gradually with the supply. Subsequent work shows that our pricing scheme is the optimal static pricing for every value of k. Funding: This work was supported by the National Science Foundation [Grants CCF-2008006 and SHF-1704117]. T. Lykouris would like to acknowledge funding from Google Research. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

19. Statistical Inference for Aggregation of Malmquist Productivity Indices.

Author: Pham, Manh, Simar, Léopold, and Zelenyuk, Valentin
Subjects: INFERENTIAL statistics, CENTRAL limit theorem, ECONOMETRICS
Abstract: A Comprehensive Set of Asymptotic Properties for a Meaningful Aggregation of Malmquist Indices The Malmquist productivity index (MPI) has become one of the most widely used tools for analyzing dynamic performance of decision-making units. Whereas accounting for economic weights of individual units in aggregations of indices is emphasized in the literature, statistical theory for constructing confidence intervals and performing hypothesis tests based on weighted aggregation of the MPI are still unavailable. In "Statistical Inference for Aggregation of Malmquist Productivity Indices," Pham, Simar, and Zelenyuk use a novel approach (based on the uniform delta method) to develop new asymptotic theory (including new central limit theorems) for aggregate MPIs as the basis for the statistical inference and test. They also verify the finite-sample performance of their approach via extensive Monte Carlo experiments and provide an illustration using real-world data. The Malmquist productivity index (MPI) has gained popularity among studies on the dynamic change of productivity of decision-making units (DMUs). In practice, this index is frequently reported at aggregate levels (e.g., public and private firms) in the form of simple, equally weighted arithmetic or geometric means of individual MPIs. A number of studies emphasize that it is necessary to account for the relative importance of individual DMUs in the aggregations of indices in general and of the MPI in particular. Whereas more suitable aggregations of MPIs have been introduced in the literature, their statistical properties have not been revealed yet, preventing applied researchers from making essential statistical inferences, such as confidence intervals and hypothesis testing. In this paper, we fill this gap by developing a full asymptotic theory for an appealing aggregation of MPIs. On the basis of this, meaningful statistical inferences are proposed, their finite-sample performances are verified via extensive Monte Carlo experiments, and the importance of the proposed theoretical developments is illustrated with an empirical application to real data. Funding: M. Pham acknowledges support from an Australian Government Research Training Program Scholarship. V. Zelenyuk acknowledges financial support from the University of Queensland and the Australian Research Council [Grant FT170100401]. Supplemental Material: The e-companion is available at https://doi.org/10.1287/opre.2022.2424. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

20. Best of Both Worlds: Ex Ante and Ex Post Fairness in Resource Allocation.

Author: Aziz, Haris, Freeman, Rupert, Shah, Nisarg, and Vaish, Rohit
Subjects: RESOURCE allocation, DISTRIBUTION (Probability theory), FAIRNESS, ECONOMIC efficiency
Abstract: Consider the problem of allocating indivisible goods among agents with additive valuations, where monetary payments are not allowed. When randomization is allowed, it is possible to achieve compelling notions of fairness such as EV, which states that no agent should prefer any other agent's allocation to their own. When allocations must be deterministic, achieving exact fairness is impossible but approximate notions such as EV up to one good can be guaranteed. In "Best of Both Worlds: Ex Ante and Ex Post Fairness in Resource Allocation," H. Aziz, R. Freeman, N. Shah, and R. Vaish ask whether it is possible to achieve both types of guarantees simultaneously. More specifically, they ask whether there exists a probability distribution over deterministic allocations such that every deterministic allocation is envy-free up to one good and the distribution is exactly envy-free in expectation. The main result of the paper answers this question in the affirmative, showing that ex ante and ex post fairness need not be in conflict. We study the problem of allocating indivisible goods among agents with additive valuations. When randomization is allowed, it is possible to achieve compelling notions of fairness such as envy-freeness, which states that no agent should prefer any other agent's allocation to their own. When allocations must be deterministic, achieving exact fairness is impossible but approximate notions such as envy-freeness up to one good can be guaranteed. Our goal in this work is to achieve both simultaneously, by constructing a randomized allocation that is exactly fair ex ante (before the randomness is realized) and approximately fair ex post (after the randomness is realized). The key question we address is whether ex ante envy-freeness can be achieved in combination with ex post envy-freeness up to one good. We settle this positively by designing an efficient algorithm that achieves both properties simultaneously. The algorithm can be viewed as a desirable way to instantiate a lottery for the probabilistic serial rule. If we additionally require economic efficiency, we obtain three impossibility results that show that ex post or ex ante Pareto optimality is impossible to achieve in conjunction with combinations of fairness properties. Hence, we slightly relax our ex post fairness guarantees and present a different algorithm that can be viewed as a fair way to instantiate a lottery for the maximum Nash welfare allocation rule. Funding: This work was supported by the Office of Naval Research [Grant N00014-17-1-2621] and DST INSPIRE [Grant DST/INSPIRE/04/2020/000107]. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

21. Technical Note—Asymptotically Optimal Control of Omnichannel Service Systems with Pick-up Guarantees.

Author: Gao, Xuefeng, Huang, Junfei, and Zhang, Jiheng
Subjects: FOOD quality, CONSUMERS, COST control, ONLINE education, RESTAURANT customer services, TARDINESS, SURETYSHIP & guaranty
Abstract: Both walk-in and online customers appear in many service systems such as restaurants. Online customers are often given a target time for pick up, and the quality of the food/beverage can degrade if it is ready before the arrival of online customers. This distinctive feature brings essential difficulties in the analysis and control of such systems. In "Technical Note: Asymptotically Optimal Control of Omnichannel Service Systems with Pick-up Guarantees," Gao, Huang, and Zhang study the optimal control problem of such service systems based on a two-class single server queueing model. The paper proposes a nearly-optimal policy that keeps the server idle when the queue of online customers drops below a threshold and there are no walk-in customers. Motivated by the recent popularity of omnichannel service systems, we analyze the joint admission and scheduling control of a queueing system with two classes of customers: online and walk-in. Unlike walk-in customers, online customers are given a target time for pick up upon placing an order. Thus, in addition to minimizing the waiting costs of walk-in customers and the rejection cost of both classes, we need to minimize the earliness and tardiness costs of online customers. Such a distinctive objective makes the control problem difficult to analyze. We develop a novel analysis by adopting the idea of proving H = λ G to establish an asymptotic relationship between the waiting time cost and the queue length cost under general control policies in the heavy-traffic regime. We also design a policy and show that it is asymptotically optimal. Funding: X. Gao's research is supported in part by the Hong Kong Research Grants Council [Grants 14201520, 14201421, and 14212522]. J. Huang's research is supported in part by the Hong Kong Research Grants Council [Grants 14500819, 14505820, and 14501621] and the National Natural Science Foundation of China [Grant 72222023]. J. Zhang's research is supported in part by the Hong Kong Research Grants Council [Grants 16208120 and 16214121]. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

22. Distinguishing Useful and Wasteful Slack.

Author: Bogetoft, Peter and Kerstens, Pieter Jan
Subjects: ORGANIZATION management, DECOUPLING (Organizational behavior), STRATEGIC planning
Abstract: Can inefficiency be rational? Excess resources or slack may serve as a buffer against environmental shocks, help decouple organizations, ease planning and implementation, support innovation, and enable effective responses to competitors. Slack may however also be the result of inefficiency. In Bogetoft and Kerstens, Distinguishing useful and wasteful slack, we propose an approach to separate useful and wasteful slack. If an organization can maintain the same levels of output and slack at lower cost, there is wasteful or nonrationalizable spending. We develop ways to measure the extent to which total spending can be rationalized and show how to statistically estimate and test the usefulness of the available slack using bootstrapping. The literature on organization and strategic management suggests that slack in the form of excess resources may be useful. It may, for example, serve as a buffer against environmental shocks, help decouple organizations, ease planning and implementation, support innovation, and enable effective responses to competitors. In contrast, the economic literature tends to view slack as wasteful. When the same products and services can be produced with fewer resources and slack per se is not assigned any value, slack should be eliminated. The aim of this paper is to reconcile these two perspectives. We acknowledge that slack may be both useful and wasteful. The challenge is how to separate the two. Our approach relies on the simple Pareto idea. If an organization can maintain the same levels of output and slack at lower cost, there is wasteful or nonrationalizable spending. We develop ways to measure the extent to which total spending can be rationalized and show how to statistically estimate and test the usefulness of the available slack using bootstrapping. Funding: Financial support from Det Frie Forskningsråd [Grant 9038-00042A] is greatly appreciated. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2415. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

23. On Proportionally Consistent Solutions to the Divorced-Parents Problem.

Author: Romeijnders, Ward, Van Foreest, Nicky D., and Wijngaard, Jacob
Subjects: CHILDREN of divorced parents, DUTCH people, BIPARTITE graphs, PARENTS, JUDGES, DOMESTIC relations, CUSTODY of children
Abstract: When Dutch parents with children divorce, a mediator compiles a matrix with the financial needs of the children and the financial capacities of parents to meet these needs. Moreover, in case parents have children from previous marriages or are prepared to contribute to stepchildren, a bipartite graph shows which parent is financially responsible for which child. The Dutch high court ruled that the final contributions should be proportionally consistent, implying that shortages for children should be prevented if possible and any remaining parental capacities should be proportionally divided among parents responsible for the same child. Finding by hand this proportional solution is difficult for realistic court cases, as these can include several (step)parents and children. The paper on the divorced-parents problem shows that the final unique solution can be found when parents start court cases iteratively and provides efficient algorithms that can deal with large (country-size) networks. When Dutch parents divorce, Dutch law dictates that the parental contributions to cover the financial needs of the children have to be proportionally consistent. This rule is clear when parents only have common children. However, cases can be considerably more complicated, for example, when parents have financial responsibilities to children from previous marriages. We show that, mathematically, this settlement problem can be modeled as a bipartite rationing problem for which a unique global proportionally proportional solution exists. Moreover, we develop two efficient algorithms for obtaining this proportionally proportional solution, and we show numerically that both algorithms are considerably faster than standard convex optimization techniques. The first algorithm is a novel tailor-made fixed-point iteration algorithm (FPA), whereas the second algorithm only iteratively applies simple lawsuits involving a single child and its parents. The inspiration for this latter algorithm comes from our main convergence proof in which we show that iteratively applying settlements on smaller subnetworks eventually leads to the same settlement on the network as a whole. This has significant societal importance because, in practice, lawsuits are often only held between two or a few parents. Moreover, our iterative algorithm is easy to understand, also by parents, legal counselors, and judges, which is crucial for its acceptance in practice. Finally, as the method provides a unique solution to any dispute, it removes the legal inequality perceived by parents. Consequently, it may considerably reduce the workload of courts because parents and lawyers can compute the proportionally proportional parental contributions before bringing their case to court. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

24. Shipping Emission Control Area Optimization Considering Carbon Emission Reduction.

Author: Zhuge, Dan, Wang, Shuaian, and Zhen, Lu
Subjects: EMISSION control, CARBON emissions, GREENHOUSE gas mitigation, EXTERNALITIES, SAILING ships, COST allocation
Abstract: Managing Shipping Emission Control Areas The design of emission control areas (ECAs) is crucial for reducing global shipping emissions and protecting the environment. In "Shipping Emission Control Area Optimization Considering Carbon Emission Reduction," Zhuge, Wang, and Zhen focus on the ECA optimization problem for sailing legs with ECAs. First, a case with a no-ECA policy and a case with the current ECA policy are discussed. Then, two new voyage-dependent ECA policies with sulfur limits, designated sailing paths, and speed limits are proposed, under which Stackelberg game models with the ECA regulator and a shipping company are developed. The authors extend the research problem from a sailing leg to a shipping network to improve the practicality of the findings. They also develop a dynamic programming-based algorithm to optimize the ECA policies for the shipping network from the perspective of the ECA regulator. The effectiveness of the proposed policies in reducing social costs is validated by numerical experiments. Sulfur emission control areas (ECAs) are crucial for reducing global shipping emissions and protecting the environment. The main plank of an ECA policy is usually a fuel sulfur limit. However, the approaches to setting sulfur limits are relatively subjective and lack scientific support. This paper investigates the design of ECA policies, especially sulfur limits, for sailing legs with ECAs. The objective is to minimize the social costs of shipping operations, local sulfur oxides (SOx) emissions, and global carbon dioxide (CO2) emissions. First, a case with a no-ECA policy and a case with the current ECA policy are analyzed. Then, two new voyage-dependent ECA policies with sulfur limits, designated sailing paths, and speed limits are proposed. Stackelberg game models are developed to solve the research problem with the two proposed policies and two players: the ECA regulator and a shipping company aiming to minimize social costs and company costs, respectively. The ECA regulator determines the sulfur limit, sailing path, and speed limit, and the shipping company optimizes the sailing speed accordingly. We also compare and analyze each type of cost under different ECA policies (i.e., no ECA, the current ECA policy, and the proposed ECA policies). The research problem is then extended from a sailing leg to a shipping network to improve the practicality of the findings. A dynamic programming-based algorithm is developed to optimize the ECA policies for the shipping network from the perspective of the ECA regulator. Mathematical derivation shows that the proposed ECA policies can reduce the social costs of shipping. The results of extensive numerical experiments further demonstrate the ability of the proposed policies to reduce social costs, providing important insights for voyage-dependent ECA policy design. Funding: This work was supported by the National Natural Science Foundation of China [Grants 72025103, 72201163, 71831008, 72071173, 72371221, 72394360, 72394362, and 72361137001]. Supplemental Material: The e-companion is available at https://doi.org/10.1287/opre.2022.0361. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

25. Screening with Limited Information: A Dual Perspective.

Author: Chen, Zhi, Hu, Zhenyu, and Wang, Ruiqin
Subjects: PRICES, VALUE (Economics), CONVEX sets, PROBLEM solving, BEST sellers
Abstract: A Dual Perspective to the Robust Screening Problem Robust screening problem is concerned with the problem of a seller seeking a selling mechanism that maximizes the worst-case revenue obtained from a buyer whose valuation distribution lies in a certain ambiguity set. In the paper "Screening with Limited Information: A Dual Perspective", Z. Chen, Z. Hu, and R. Wang show that strong duality holds between the problem of finding the optimal robust mechanism and a minimax pricing problem, where the adversary first chooses a worst-case distribution and then the seller decides the best posted price mechanism. The duality result connects prior literature that separately studies the primal (robust screening) and problems related to the dual and offers a unified geometric intuition in solving the problem. Consider a seller seeking a selling mechanism to maximize the worst-case revenue obtained from a buyer whose valuation distribution lies in a certain ambiguity set. Such a mechanism design problem with one product and one buyer is known as the screening problem. For a generic convex ambiguity set, we show via the minimax theorem that strong duality holds between the problem of finding the optimal robust mechanism and a minimax pricing problem where the adversary first chooses a worst-case distribution, and then the seller decides the best posted price mechanism. This implies that the extra value of optimizing over more sophisticated mechanisms amounts exactly to the value of eliminating distributional ambiguity under a posted price mechanism. The duality result also connects prior literature that separately studies the primal (robust screening) and problems related to the dual (e.g., robust pricing, buyer-optimal pricing, and personalized pricing). We further analytically solve the minimax pricing problem (as well as the robust pricing problem) for several important ambiguity sets, such as the ones with mean and various dispersion measures, and with the Wasserstein metric, and we provide a unified geometric intuition behind our approach. The solutions are then used to construct the optimal robust mechanism and to compare with the solutions to the robust pricing problem. We also establish the uniqueness of the worst-case distribution for some cases. Funding: The research is funded in part by the Hong Kong Research Grants Council under its General Research Fund (CUHK-11502422) and the Ministry of Education, Singapore, under its 2019 Academic Research Fund Tier 3 (Grant MOE-2019-T3-1-010). Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.0016. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

26. Harvesting Solar Power Foments Prices in a Vicious Cycle: Breaking the Cycle with Price Mechanisms.

Author: Mamaghani, Fariba F. and Çakanyildirim, Metin
Subjects: PRICES, SOLAR energy, ENERGY harvesting, DISTRIBUTED power generation, ELECTRICITY pricing, PRICE regulation
Abstract: Harvesting Solar Power Foments Prices in a Vicious Cycle: Breaking the Cycle with Price Mechanisms An important paper in this issue quantifies and explores how residential solar power adoption affects the price of electricity and the profit of a utility firm. It reveals the mechanism behind what is known as the "utility (death) spiral." As more residences adopt solar power generation, the price of electricity goes up, whereas the profit of utility firm comes down. The study finds the equilibrium of a game played between consumers deciding to adopt residential solar power and the utility deciding on prices in a regulated market. The equilibrium, involving a potential bankruptcy for the utility firm and a high price for a nonadopter consumer, is desirable for neither. To avoid this, it suggests either decreasing the buyback price that the utility pays for the excess power of an adopter or introducing a subscription for adopters. These suggestions can inform pricing policies in the power market. Distributed solar power generation is growing but not necessarily benefiting the utility firms. Reducing the demand, it hinders the coverage of utility costs with reasonable retail electricity prices. Utilities raise prices, unintentionally reducing both demand and affordability of electricity, and are said to be caught in a utility (death) spiral. The reduced affordability adversely affects consumers who cannot invest in solar generation. Environmentally desirable solar power paradoxically can be socially undesirable. Market regulators are challenged to keep prices low within the current pricing mechanisms. We provide a profit maximization formulation for a regulated utility and reveal the interaction between optimal price increases and growing solar power adoption. Iterating with this interaction, we study the utility death spiral for myopic and forward-looking consumers. We consider new pricing mechanisms with a buyback price and a subscription fee paid only by solar power-generating consumers. The fee mitigates the optimal retail price increase by allowing for the coverage of fixed costs in part. We find appropriate values for the buyback price and subscription fee to, respectively, slow or stop the utility spiral. These mechanisms are important not only for the utility and its regulator but also for all electricity consumers. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2021.0756. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

27. Erratum to "Competitive Two-Agent Scheduling and Its Applications".

Author: Koulamas, Christos, Kyparisis, George, Pinedo, Michael, and Wan, Guohua
Subjects: TIME complexity, OPERATIONS research, SCHEDULING, DYNAMIC programming, TARDINESS
Abstract: Erratum to "Competitive Two-Agent Scheduling and Its Applications" The paper "Competitive Two-Agent Scheduling and Its Applications" by Leung, Pinedo, and Wan appeared in Operations Research in 2010. Because competitive agent scheduling has become an active area of research, the paper has garnered its share of citations over the years. This particular direction of scheduling research has become so popular that the book Multiagent Scheduling: Models and Algorithms by Agnetis, Billaut, Gawiejnowicz, Pacciarelli, and Soukhal appeared in 2014. However, one of the results in the Leung et al. paper had a minor error in it. The paper considers a two-agent scheduling problem with a single machine, with each one of the two agents having their own set of jobs. The first agent has as the objective the minimization of the total tardiness of her set of jobs, and the second agent has to ensure that the total completion time of her set of jobs is less than or equal to Q. Leung et al. showed, through a dynamic programming formulation, that this problem can be solved in pseudopolynomial time and determined the time complexity of their algorithm. Their proof contains a minor error. The proper algorithm is still pseudopolynomial but of a slightly higher time complexity. Leung et al. (2010) [Leung JY-T, Pinedo M, Wan G (2010) Competitive two-agent scheduling and its applications. Oper. Res. 58:458–469] considered a two-agent nonpreemptive single-machine scheduling problem. Agent A is responsible for n1 jobs with due dates d 1 , ... , d n and has as the objective the minimization of the total tardiness of the n1 jobs. Agent B is responsible for n2 jobs and has as the objective the minimization of the total completion time of the n2 jobs. The problem is to find a schedule for the n 1 + n 2 jobs that minimizes the objective of agent A (with regard to his n1 jobs) while keeping the objective of agent B (with regard to his n2 jobs) below or at a fixed level Q. Leung et al. (2010) [Leung JY-T, Pinedo M, Wan G (2010) Competitive two-agent scheduling and its applications. Oper. Res. 58:458–469], in their theorem 3, showed that this problem can be solved through dynamic programming in pseudopolynomial time. However, in the proof of their theorem and in their dynamic programming formulation, there is an error that requires some changes in their proof. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

28. Online Learning and Pricing for Service Systems with Reusable Resources.

Author: Jia, Huiwen, Shi, Cong, and Shen, Siqian
Subjects: ONLINE shopping, MACHINE learning, REVENUE management, HOTEL management, PRICES, ONLINE algorithms, ONLINE education
Abstract: Revenue Management of Service Systems under Incomplete Information Revenue management with reusable resources finds many important applications in today's economy, such as cloud computing services, car/bicycle rental services, ride-hailing services, hotel management, project team management, and call center services. The existing literature predominantly assumes that the stochastic demand and service processes are given as an input to the models, and the pricing decisions are made with full knowledge of the distributional information. However, in practice, the decision maker may not know how demand or service rates react to price changes. Thus, the decision maker needs to learn the underlying mapping between prices and rates from past observations, while maximizing the total expected revenue on the fly. In "Online Learning and Pricing for Service Systems with Reusable Resources", H. Jia, C. Shi, and S. Shen developed a series of online learning algorithms for revenue management problems with reusable resources and showed that they admit an optimal regret bound. We consider a price-based revenue management problem with finite reusable resources over a finite time horizon T. Customers arrive following a price-dependent Poisson process, and each customer requests one unit of c homogeneous reusable resources. If there is an available unit, the customer gets served within a price-dependent exponentially distributed service time; otherwise, the customer waits in a queue until the next available unit. In this paper, we assume that the firm does not know how the arrival and service rates depend on posted prices, and thus it makes adaptive pricing decisions in each period based only on past observations to maximize the cumulative revenue. Given a discrete price set with cardinality P, we propose two online learning algorithms, termed batch upper confidence bound (BUCB) and batch Thompson sampling (BTS), and prove that the cumulative regret upper bound is O ˜ (P T) , which matches the regret lower bound. In establishing the regret, we bound the transient system performance upon price changes via a novel coupling argument, and also generalize bandits to accommodate subexponential rewards. We also extend our approach to models with balking and reneging customers and discuss a continuous price setting. Our numerical experiments demonstrate the efficacy of the proposed BUCB and BTS algorithms. Funding: This research was partially supported by an Amazon research award and the Department of Energy [Award DE-SC0018018]. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

29. On-Demand Ride-Matching in a Spatial Model with Abandonment and Cancellation.

Author: Wang, Guangju, Zhang, Hailun, and Zhang, Jiheng
Subjects: BIG data, SUPPLY & demand, DECISION making, RESEARCH institutes
Abstract: Balancing the Abandonment and Cancellation of a Ride-Matching Market. In "On-Demand Ride-Matching in a Spatial Model with Abandonment and Cancellation," Wang, Zhang, and Zhang propose a spatial model to approximate pickup times based on the number of waiting passengers and idle drivers. They analyze the dynamics of passengers and drivers in a queueing model in which the platform can control the matching process by setting a threshold on the expected pickup time. Applying fluid approximations, we obtain accurate performance evaluations and an elegant optimality condition based on which they propose a policy that adapts to time-varying demand. Ride-hailing platforms, such as Uber, Lyft, and DiDi, coordinate supply and demand by matching passengers and drivers. The platform has to promptly dispatch drivers when receiving requests because, otherwise, passengers may lose patience and abandon the service by switching to alternative transportation methods. However, having fewer idle drivers results in a possible lengthy pickup time, which is a waste of system capacity and may cause passengers to cancel the service after they are matched. Because of the complex spatial and queueing dynamics, analysis of the matching decision is challenging. In this paper, we propose a spatial model to approximate the pickup time based on the number of waiting passengers and idle drivers. We analyze the dynamics of passengers and drivers in a queueing model in which the platform can control the matching process by setting a threshold on the expected pickup time. Applying fluid approximations, we obtain accurate performance evaluations and an elegant optimality condition, based on which we propose a policy that adapts to time-varying demand. Funding: H. Zhang's research is partially supported by the National Natural Science Foundation of China [Grants 72201231 and 72192805], the Shenzhen Science and Technology Innovation Commission [Grant RCYX20210609103124047], the Shenzhen Research Institute of Big Data [Grant JSQ202210001], and the Guangdong Provincial Key Laboratory of Big Data Computing at the Chinese University of Hong Kong, Shenzhen. J. Zhang's research is supported in part by the General Research Fund [Grants 16204718, 16208120, and 16214121] from the Hong Kong Research Grants Council. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2399. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

30. A Low-Rank Approximation for MDPs via Moment Coupling.

Author: Zhang, Amy B. Z. and Gurvich, Itai
Subjects: CENTRAL limit theorem, PARTIAL differential equations, DIFFERENTIAL equations, MARKOV processes, STOCHASTIC matrices
Abstract: Markov Decision Process Tayloring for Approximation Design Optimal control problems are difficult to solve for problems on large state spaces, calling for the development of approximate solution methods. In "A Low-rank Approximation for MDPs via Moment Coupling," Zhang and Gurvich introduce a novel framework to approximate Markov decision processes (MDPs) that stands on two pillars: (i) state aggregation, as the algorithmic infrastructure, and (ii) central-limit-theorem-type approximations, as the mathematical underpinning. The theoretical guarantees are grounded in the approximation of the Bellman equation by a partial differential equation (PDE) where, in the spirit of the central limit theorem, the transition matrix of the controlled Markov chain is reduced to its local first and second moments. Instead of solving the PDE, the algorithm introduced in the paper constructs a "sister"' (controlled) Markov chain whose two local transition moments are approximately identical with those of the focal chain. Because of this moment matching, the original chain and its sister are coupled through the PDE, facilitating optimality guarantees. Embedded into standard soft aggregation, moment matching provides a disciplined mechanism to tune the aggregation and disaggregation probabilities. We introduce a framework to approximate Markov decision processes (MDPs) that stands on two pillars: (i) state aggregation, as the algorithmic infrastructure, and (ii) central-limit-theorem-type approximations, as the mathematical underpinning of optimality guarantees. The theory is grounded in recent work by Braverman et al. [Braverman A, Gurvich I, Huang J (2020) On the Taylor expansion of value functions. Oper. Res. 68(2):631–65] that relates the solution of the Bellman equation to that of a partial differential equation (PDE) where, in the spirit of the central limit theorem, the transition matrix is reduced to its local first and second moments. Solving the PDE is not required by our method. Instead, we construct a "sister" (controlled) Markov chain whose two local transition moments are approximately identical with those of the focal chain. Because of this moment matching, the original chain and its sister are coupled through the PDE, a coupling that facilitates optimality guarantees. Embedded into standard soft aggregation algorithms, moment matching provides a disciplined mechanism to tune the aggregation and disaggregation probabilities. Computational gains arise from the reduction of the effective state space from N to N 1 2 + ϵ is as one might intuitively expect from approximations grounded in the central limit theorem. Funding: This work was supported by the National Science Foundation [Grant CMMI-1662294]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2392. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

31. Technical Note—An Improved Analysis of LP-Based Control for Revenue Management.

Author: Chen, Guanting, Li, Xiaocheng, and Ye, Yinyu
Subjects: REVENUE management, ONLINE algorithms, BUDGET, REGRET
Abstract: Bounded Regret for LP-Based Revenue-Management Problems In "An Improved Analysis of LP-Based Control for Revenue Management," Chen, Li, and Ye study a class of quantity-based network revenue-management problems. The authors consider a stochastic setting where all the orders are i.i.d. sampled and the customers are of finite type. They focus on the classic LP-based adaptive algorithm and consider regret as the performance measure. They found that when the underlying LP is nondegenerate, the algorithm achieves a problem-dependent regret upper bound that is independent of the horizon/number of time periods T; when the underlying LP is degenerate, the algorithm achieves a tight regret upper bound that scales on the order of T log(T) and matches the lower bound up to a logarithmic order. In this paper, we study a class of revenue-management problems, where the decision maker aims to maximize the total revenue subject to budget constraints on multiple types of resources over a finite horizon. At each time, a new order/customer/bid is revealed with a request of some resource(s) and a reward, and the decision maker needs to either accept or reject the order. Upon the acceptance of the order, the resource request must be satisfied, and the associated revenue (reward) can be collected. We consider a stochastic setting where all the orders are independent and identically distributed-sampled—that is, the reward-request pair at each time is drawn from an unknown distribution with finite support. The formulation contains many classic applications, such as the quantity-based network revenue-management problem and the Adwords problem. We focus on the classic linear program (LP)-based adaptive algorithm and consider regret as the performance measure defined by the gap between the optimal objective value of the certainty-equivalent LP and the expected revenue obtained by the online algorithm. Our contribution is twofold: (i) When the underlying LP is nondegenerate, the algorithm achieves a problem-dependent regret upper bound that is independent of the horizon/number of time periods T; and (ii) when the underlying LP is degenerate, the algorithm achieves a tight regret upper bound that scales on the order of T log (T) and matches the lower bound up to a logarithmic order. To our knowledge, both results are new and improve the best existing bounds for the LP-based adaptive algorithm in the corresponding setting. We conclude with numerical experiments to further demonstrate our findings. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2358. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

32. Adversarial Robustness for Latent Models: Revisiting the Robust-Standard Accuracies Tradeoff.

Author: Javanmard, Adel and Mehrabi, Mohammad
Subjects: MACHINE learning, SCHOLARSHIPS, GAUSSIAN mixture models, SCIENCE awards, TEACHING awards, CREDIT scoring systems, STATISTICAL learning
Abstract: Low-dimensional structure of data can solve the adversarial robustness-accuracy conflict for machine learning systems. Modern machine learning systems have demonstrated breakthrough performance in a multitude of applications. However, they are known to be highly vulnerable to small perturbations to the input data, known as adversarial attacks. There are many well-documented examples of such behavior, for example small perturbations of an image, which is imperceptible to a human, can significantly degrade performance of modern classifiers. Adversarial training has been put forward as a way to improve robustness of learning algorithms to adversarial attacks. However, this benefit often comes at the cost of decreasing accuracy on natural unperturbed inputs, pointing to a potential conflict between adversarial robustness and standard accuracy. In "Adversarial robustness for latent models: Revisiting the robust-standard accuracies tradeoff," Adel Javanmard and Mohammad Mehrabi develop a theory to show that when the data enjoys low-dimensional structure, then it is possible to train models that are nearly optimal with respect to both, the standard and robust accuracies. Over the past few years, several adversarial training methods have been proposed to improve the robustness of machine learning models against adversarial perturbations in the input. Despite remarkable progress in this regard, adversarial training is often observed to drop the standard test accuracy. This phenomenon has intrigued the research community to investigate the potential tradeoff between standard accuracy (a.k.a generalization) and robust accuracy (a.k.a robust generalization) as two performance measures. In this paper, we revisit this tradeoff for latent models and argue that this tradeoff is mitigated when the data enjoys a low-dimensional structure. In particular, we consider binary classification under two data generative models, namely Gaussian mixture model and generalized linear model, where the features data lie on a low-dimensional manifold. We develop a theory to show that the low-dimensional manifold structure allows one to obtain models that are nearly optimal with respect to both, the standard accuracy and the robust accuracy measures. We further corroborate our theory with several numerical experiments, including Mixture of Factor Analyzers (MFA) model trained on the MNIST data set. Funding: A. Javanmard was partially supported by the Sloan Research Fellowship in mathematics, an Adobe Data Science Faculty Research Award, the National Science Foundation Career Award DMS-1844481, and the National Science Foundation Award 2311024. Supplemental Material: The e-companion is available at https://doi.org/10.1287/opre.2022.0162. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

33. Efficient Decentralized Multi-agent Learning in Asymmetric Bipartite Queueing Systems.

Author: Freund, Daniel, Lykouris, Thodoris, and Weng, Wentao
Subjects: MACHINE learning, CALL centers, INFORMATION storage & retrieval systems
Abstract: New Algorithm Enables Efficient Decentralized Learning in Bipartite Queueing Systems Bipartite queueing systems, where agents with individual job queues request service from a pool of heterogeneous servers, are standard models for service applications like data networks and call centers. Traditionally, a central controller schedules agent requests with full knowledge of system parameters. However, emerging applications require decentralized operation without this central coordination or complete system information. This presents challenges as agents lack the global knowledge needed to efficiently route jobs. Recent research into efficient decentralized learning algorithms for such systems faces limitations in nonoptimal throughput, demanding computations, or degrading efficiency with exponential queue growth. In contrast, this paper introduces an algorithm that enables queues to efficiently learn decentralized scheduling policies while ensuring throughput optimality. The approach is computationally lightweight, achieving queue length bounds that scale polynomially rather than exponentially in system size. Experiments demonstrate faster convergence and robustness of our algorithm compared with prior decentralized algorithms. We study decentralized multiagent learning in bipartite queueing systems, a standard model for service systems. In particular, N agents request service from K servers in a fully decentralized way, that is, by running the same algorithm without communication. Previous decentralized algorithms are restricted to symmetric systems, have performance that is degrading exponentially in the number of servers, require communication through shared randomness and unique agent identities, and are computationally demanding. In contrast, we provide a simple learning algorithm that, when run decentrally by each agent, leads the queueing system to have efficient performance in general asymmetric bipartite queueing systems while also having additional robustness properties. Along the way, we provide the first provably efficient upper confidence bound–based algorithm for the centralized case of the problem. Funding: T. Lykouris would like to acknowledge funding from Google Research. Supplemental Material: The e-companion is available at https://doi.org/10.1287/opre.2022.0291. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

34. Selling Quality-Differentiated Products in a Markovian Market with Unknown Transition Probabilities.

Author: Keskin, N. Bora and Li, Meng
Subjects: CONSUMER preferences, PRODUCT lines, TIME-based pricing, CONSUMER lending, PROBABILITY theory, MARKET design & structure (Economics)
Abstract: How to Price a Product Line When Customer Preferences Change over Time Quality-differentiated products can help sellers increase their profits through market segmentation. However, in many business applications, such as online search and consumer lending, customer preferences evolve over time, making it difficult for sellers to use market segmentation. In their study "Selling Quality-Differentiated Products in a Markovian Market with Unknown Transition Probabilities," Keskin and Li analyze dynamic pricing of a vertically differentiated product line when customer preferences for quality can shift over time. Keskin and Li show that data-driven learning is essential when operating in a changing market with unknown customer heterogeneity. Keskin and Li also develop a bounded learning policy that implements near-optimal data-driven learning in a Markov-modulated demand environment. In this paper, we study a firm's dynamic pricing problem in the presence of unknown and time-varying heterogeneity in customers' preferences for quality. The firm offers a standard product as well as a premium product to deal with this heterogeneity. First, we consider a benchmark case in which the transition structure of customer heterogeneity is known. In this case, we analyze the firm's optimal pricing policy and characterize its key structural properties. Thereafter, we investigate the case of unknown market transition structure and design a simple and practically implementable policy, called the bounded learning policy, which is a combination of two policies that perform poorly in isolation. Measuring performance by regret (i.e., the revenue loss relative to a clairvoyant who knows the underlying changes in the market), we prove that our bounded learning policy achieves the fastest possible convergence rate of regret in terms of the frequency of market shifts. Thus, our policy performs well without relying on precise knowledge of the market transition structure. Supplemental Material: The e-companion is available at https://doi.org/10.1287/opre.2022.0316. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

35. Robust Dynamic Assortment Optimization in the Presence of Outlier Customers.

Author: Chen, Xi, Krishnamurthy, Akshay, and Wang, Yining
Subjects: CONSUMERS, CONSUMER behavior, BUSINESS revenue, ROBUST statistics, LOGISTIC regression analysis
Abstract: Robust Dynamic Assortment Optimization Assortment optimization with choice model estimation and learning has been studied extensively in the data-driven revenue management literature. Existing methods and analysis, however, do not take into consideration the fact that some customers arriving at certain time periods might exhibit outlier purchasing behaviors. The work of Chen et al. studies dynamic assortment optimization in the presence of outlier customers modeled by an ε -contamnination model. The impact of outlier customers on the revenue performance of an algorithm is analyzed and discussed. We consider the dynamic assortment optimization problem under the multinomial logit model with unknown utility parameters. The main question investigated in this paper is model mis-specification under the ε-contamination model, which is a fundamental model in robust statistics and machine learning. In particular, throughout a selling horizon of length T, we assume that customers make purchases according to a well-specified underlying multinomial logit choice model in a (1 − ε) -fraction of the time periods and make arbitrary purchasing decisions instead in the remaining ε-fraction of the time periods. In this model, we develop a new robust online assortment optimization policy via an active-elimination strategy. We establish both upper and lower bounds on the regret, and we show that our policy is optimal up to a logarithmic factor in T when the assortment capacity is constant. We further develop a fully adaptive policy that does not require any prior knowledge of the contamination parameter ε. In the case of the existence of a suboptimality gap between optimal and suboptimal products, we also established gap-dependent logarithmic regret upper bounds and lower bounds in both the known-ε and unknown-ε cases. Our simulation study shows that our policy outperforms the existing policies based on upper confidence bounds and Thompson sampling. Funding: X. Chen acknowledges support from the National Science Foundation [Grant IIS-1845444]. Supplemental Material: The supplementary material is available at https://doi.org/10.1287/opre.2020.0281. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

36. In Congestion Games, Taxes Achieve Optimal Approximation.

Author: Paccagnan, Dario and Gairing, Martin
Subjects: POLYNOMIAL time algorithms, APPROXIMATION algorithms, TAXATION, EXTERNALITIES, TAX & expenditure limitations, COST allocation
Abstract: The Power of Traffic Tolls The paper "In Congestion Games, Taxes Achieve Optimal Approximation" by Paccagnan and Gairing investigates the power and limitations of taxes as interventions in congestion games. Perhaps surprisingly, they find that efficiently computed taxes can achieve the same performance as the best polynomial time algorithm, even when the algorithm has full control over the agents' actions. The authors establish three main results to support this claim. First, they prove that minimizing congestion is NP-hard to approximate within a given factor based on the latency functions. Second, they demonstrate how convex optimization tools can be used to design optimal taxes. Third, they provide a polynomial time algorithm with an approximation factor matching the hardness barrier. The upshot of their contribution can be summarized as follows: Judiciously designed taxes achieve optimal approximation, and no other tractable intervention can improve upon this result. In this work, we address the problem of minimizing social cost in atomic congestion games. For this problem, we present lower bounds on the approximation ratio achievable in polynomial time and demonstrate that efficiently computable taxes result in polynomial time algorithms matching such bounds. Perhaps surprisingly, these results show that indirect interventions, in the form of efficiently computed taxation mechanisms, yield the same performance achievable by the best polynomial time algorithm, even when the latter has full control over the agents' actions. It follows that no other tractable approach geared at incentivizing desirable system behavior can improve upon this result, regardless of whether it is based on taxations, coordination mechanisms, information provision, or any other principle. In short: Judiciously chosen taxes achieve optimal approximation. Three technical contributions underpin this conclusion. First, we show that computing the minimum social cost is NP -hard to approximate within a given factor depending solely on the admissible cost functions. Second, we design a polynomially computable taxation mechanism whose efficiency (price of anarchy) matches this hardness factor, and thus is optimal among all tractable mechanisms. As these results extend to coarse correlated equilibria, any no-regret algorithm inherits the same performances, allowing us to devise polynomial time algorithms with optimal approximation. Supplemental Material: The e-companion is available at https://doi.org/10.1287/opre.2021.0526. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

37. Erratum to "Consumer Choice Under Limited Attention When Alternatives Have Different Information Costs".

Author: Huettner, Frank, Boyacı, Tamer, and Akçay, Yalçın
Subjects: CONSUMER preferences, MATHEMATICAL optimization, COST, REVENUE management
Abstract: There is an error in one of the results of our paper [Huettner F, Boyacı T, Akçay Y (2019) Consumer choice under limited attention when alternatives have different information costs. Oper. Res. 67(3):671–699]. In this erratum, we point out the error and provide a correction based on Walker-Jones [(2023) Rational inattention with multiple attributes. J. Econom. Theory 212:105688]. Our key characterizations, insights, and numerical examples do not depend on this error and, hence, remain valid. The main implication is on the stopping condition used in the algorithm. We propose a fix based on the new sufficient condition and, if needed, standard convex optimization techniques. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

38. Technical Note—Dynamic Pricing and Learning with Discounting.

Author: Feng, Zhichao, Dawande, Milind, Janakiraman, Ganesh, and Qi, Anyan
Subjects: TIME-based pricing, MACHINE learning, LEARNING problems, LEAST squares, PRICES
Abstract: Learning algorithms can take a substantial amount of time to converge, thereby raising the need to understand the role of discounting in learning. In "Dynamic Pricing and Learning with Discounting," Z. Feng, M. Dawande, G. Janakiraman, and A. Qi examine the impact of discounting on learning by examining two classic dynamic-pricing and learning problems studied in Broder and Rusmevichientong (2012) and Keskin and Zeevi (2014). In both settings, the retailer initially does not know the parameters of the demand model. Given a discount factor, the retailer's objective is to determine a pricing policy to maximize the discounted revenue over a selling horizon. The authors establish lower bounds on the regret under any policy and propose new asymptotically optimal policies that take the discount factor into consideration. They numerically examine the regret under the proposed policies and the existing policies in the aforementioned two papers. In many practical settings, learning algorithms can take a substantial amount of time to converge, thereby raising the need to understand the role of discounting in learning. We illustrate the impact of discounting on the performance of learning algorithms by examining two classic and representative dynamic-pricing and learning problems studied in Broder and Rusmevichientong (BR) [Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980] and Keskin and Zeevi (KZ) [Keskin NB, Zeevi A (2014) Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Oper. Res. 62(5):1142–1167]. In both settings, a seller sells a product with unlimited inventory over T periods. The seller initially does not know the parameters of the general choice model in BR (respectively, the linear demand curve in KZ). Given a discount factor ρ, the retailer's objective is to determine a pricing policy to maximize the expected discounted revenue over T periods. In both settings, we establish lower bounds on the regret under any policy and show limiting bounds of Ω (1 / (1 − ρ)) and Ω (T) when T → ∞ and ρ → 1 , respectively. In the model of BR with discounting, we propose an asymptotically tight learning policy and show that the regret under our policy as well that under the MLE-CYCLE policy in BR is O (1 / (1 − ρ)) (respectively, O (T)) when T → ∞ (respectively, ρ → 1). In the model of KZ with discounting, we present sufficient conditions for a learning policy to guarantee asymptotic optimality and show that the regret under any policy satisfying these conditions is O (log (1 / (1 − ρ)) 1 / (1 − ρ)) (respectively, O (log T T)) when T → ∞ (respectively, ρ → 1). We show that three different policies—namely, the two variants of the greedy iterated least squares policy in KZ and a different policy that we propose—achieve this upper bound on the regret. We numerically examine the behavior of the regret under our policies as well as those in BR and KZ in the presence of discounting. We also analyze a setting in which the discount factor per period is a function of the number of decision periods in the planning horizon. Funding: Z. Feng received support from the National Natural Science Foundation of China [Grant 72201256]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2023.2477. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

39. Reliable Off-Policy Evaluation for Reinforcement Learning.

Author: Wang, Jie, Gao, Rui, and Zha, Hongyuan
Subjects: REINFORCEMENT learning, ROBUST optimization, ARTIFICIAL intelligence, CLASSROOM environment
Abstract: Off-policy evaluation is an important topic in reinforcement learning, which estimates the expected cumulative reward of a target policy using logged trajectory data generated from a different behavior policy, without execution of the target policy. It is imperative to quantify the uncertainty of the off-policy estimate before deployment of the target policy. Here we leverage methodologies from (Wasserstein) distributionally robust optimization to provide robust and optimistic cumulative reward estimates. With proper selection of the size of the distributional uncertainty set, these estimates serve as confidence bounds with nonasymptotic and asymptotic guarantees under stochastic or adversarial environments. We also generalize those results to batch reinforcement learning. In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy using logged trajectory data generated from a different behavior policy, without execution of the target policy. Reinforcement learning in high-stake environments, such as healthcare and education, is often limited to off-policy settings due to safety or ethical concerns or inability of exploration. Hence, it is imperative to quantify the uncertainty of the off-policy estimate before deployment of the target policy. In this paper, we propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged trajectories data. Leveraging methodologies from distributionally robust optimization, we show that with proper selection of the size of the distributional uncertainty set, these estimates serve as confidence bounds with nonasymptotic and asymptotic guarantees under stochastic or adversarial environments. Our results are also generalized to batch reinforcement learning and are supported by empirical analysis. Funding: This work was supported by the Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen Science and Technology Program [Grant JCYJ20210324120011032]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2382. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

40. Sample and Computationally Efficient Stochastic Kriging in High Dimensions.

Author: Ding, Liang and Zhang, Xiaowei
Subjects: KRIGING, COVARIANCE matrices, COMPUTATIONAL complexity, EXPERIMENTAL design, MATRIX inversion
Abstract: High-dimensional Simulation Metamodeling Stochastic kriging has been widely employed for simulation metamodeling to predict the response surface of complex simulation models. However, its use is limited to cases where the design space is low-dimensional because the sample complexity (i.e., the number of design points required to produce an accurate prediction) grows exponentially in the dimensionality of the design space. The large sample size results in both a prohibitive sample cost for running the simulation model and a severe computational challenge due to the need to invert large covariance matrices. To address this long-standing challenge, Liang Ding and Xiaowei Zhang, in their recent paper "Sample and Computationally Efficient Stochastic Kriging in High Dimensions", develop a novel methodology — based on tensor Markov kernels and sparse grid experimental designs — that dramatically alleviates the curse of dimensionality. The proposed methodology has theoretical guarantees on both sample complexity and computational complexity and shows outstanding performance in numerical problems of as high as 16,675 dimensions. Stochastic kriging has been widely employed for simulation metamodeling to predict the response surface of complex simulation models. However, its use is limited to cases where the design space is low-dimensional because in general the sample complexity (i.e., the number of design points required for stochastic kriging to produce an accurate prediction) grows exponentially in the dimensionality of the design space. The large sample size results in both a prohibitive sample cost for running the simulation model and a severe computational challenge due to the need to invert large covariance matrices. Based on tensor Markov kernels and sparse grid experimental designs, we develop a novel methodology that dramatically alleviates the curse of dimensionality. We show that the sample complexity of the proposed methodology grows only slightly in the dimensionality, even under model misspecification. We also develop fast algorithms that compute stochastic kriging in its exact form without any approximation schemes. We demonstrate via extensive numerical experiments that our methodology can handle problems with a design space of more than 10,000 dimensions, improving both prediction accuracy and computational efficiency by orders of magnitude relative to typical alternative methods in practice. Funding: We gratefully acknowledge financial support from the Hong Kong Research Grants Council (GRF 17206821). Supplemental Material: The e-companion is available at https://doi.org/10.1287/opre.2022.2367. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

41. Fast Quantum Subroutines for the Simplex Method.

Author: Nannicini, Giacomo
Subjects: SIMPLEX algorithm, QUANTUM computing, MATRIX multiplications, QUANTUM computers, SPARSE matrices
Abstract: What would Dantzig do with a quantum computer? It is unlikely we will ever find out the answer to this question. However, we can try to understand if the simplex method can be implemented on a quantum computer, and this might have piqued Dantzig's interest. The paper "Fast Quantum Subroutines for the Simplex Method" gives a quantum implementation of an iteration of the simplex method, in which the basis inverse is never explicitly computed: the quantum computer takes as input the current basis and certifies optimality or outputs the next basis. Because computing the basis inverse is expensive, this can lead to an asymptotically faster algorithm in terms of the problem size: in the best case, the quantum algorithm can identify pivots in essentially linear time! This, however, comes at the cost of worse dependence on some numerical parameters: all these tradeoffs are discussed in the full article. We propose quantum subroutines for the simplex method that avoid classical computation of the basis inverse. We show how to quantize all steps of the simplex algorithm, including checking optimality, unboundedness, and identifying a pivot (i.e., pricing the columns and performing the ratio test) according to Dantzig's rule or the steepest edge rule. The quantized subroutines obtain a polynomial speedup in the dimension of the problem but have worse dependence on other numerical parameters. For example, for a problem with m constraints, n variables, at most dc nonzero elements per column of the costraint matrix, at most d nonzero elements per column or row of the basis, basis condition number κ, and optimality tolerance ϵ, pricing can be performed in O ˜ (ϵ − 1 κ d n (d c n + d m)) time, where the O ˜ notation hides polylogarithmic factors; classically, pricing requires O (d c 0.7 m 1.9 + m 2 + o (1) + d c n) time in the worst case using the fastest known algorithm for sparse matrix multiplication. For well-conditioned sparse problems, the quantum subroutines scale better in m and n and may therefore have an advantage for very large problems. The running time of the quantum subroutines can be improved if the constraint matrix admits an efficient algorithmic description or if quantum RAM is available. Funding: This work was supported by the Army Research Office [Grant W911NF-20-1-0014] and the Air Force Research Laboratory [Grant FA8750-C-18-0098]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2341. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

42. Advances in MINLP to Identify Energy-Efficient Distillation Configurations.

Author: Gooty, Radhakrishna Tumbalam, Agrawal, Rakesh, and Tawarmalani, Mohit
Subjects: GREENHOUSE gas mitigation, GREENHOUSE gases, DISTILLATION, CHEMICAL industry, PETROLEUM chemicals industry, SEPARATION (Technology), DISCRETE choice models
Abstract: Separation of mixtures of chemicals, ubiquitous in chemical and petrochemical industries, by distillation is energy intensive. Nearly 3% of the overall energy is used for distillation in the United States. Improving the distillation process is crucial for making chemical industries more sustainable. However, designing distillation sequences is challenging because the choice set is vast, and the equations governing the physical process are highly nonconvex. Traditional design practices rely on heuristics and often result in suboptimal solutions. Tumbalam Gooty et al. present the first approach that reliably identifies the distillation sequence that requires the least energy for a given separation. By embedding convex hulls of substructures and adapting the reformulation-linearization technique to fractions of polynomials, they demonstrated that their approach outperforms the state-of-the-art. Their work will help the chemical industry reduce greenhouse gas emissions associated with distillation. In this paper, we describe the first mixed-integer nonlinear programming (MINLP)-based solution approach that successfully identifies the most energy-efficient distillation configuration sequence for a given separation. Current sequence design strategies are largely heuristic. The rigorous approach presented here can help reduce the significant energy consumption and consequent greenhouse gas emissions by separation processes. First, we model discrete choices using a formulation that is provably tighter than previous formulations. Second, we highlight the use of partial fraction decomposition alongside reformulation-linearization technique (RLT). Third, we obtain convex hull results for various special structures. Fourth, we develop new ways to discretize the MINLP. Finally, we provide computational evidence to demonstrate that our approach significantly outperforms the state-of-the-art techniques. Funding: This work was supported by the U.S. Department of Energy [Grant DE-EE0005768], the Bilsland Dissertation Fellowship, and the National Science Foundation [Grant EEC-1647722]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2340. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

43. Optimal Investment, Heterogeneous Consumption, and Best Time for Retirement.

Author: Jang, Hyun Jin, Xu, Zuo Quan, and Zheng, Harry
Subjects: STOCHASTIC partial differential equations, STOCHASTIC control theory, ECONOMIC impact, RETIREMENT income, LUXURIES, LABOR costs, EARLY retirement, RETIREMENT
Abstract: We study an optimal investment and consumption problem with heterogeneous consumption of basic and luxury goods, together with the choice of time for retirement. The optimal heterogeneous consumption strategies for a class of nonhomothetic utility maximizer are shown to consume only basic goods when the wealth is small, to consume basic goods and make savings when the wealth is intermediate, and to consume almost all in luxury goods when the wealth is large. The optimal retirement policy is shown to be both universal, in the sense that all individuals should retire at the same level of marginal utility that is determined only by income, labor cost, discount factor as well as market parameters, and not universal, in the sense that all individuals can achieve the same marginal utility with different utility and wealth. We also show that individuals prefer to retire as time goes by if the marginal labor cost increases faster than that of income. This paper studies an optimal investment and consumption problem with heterogeneous consumption of basic and luxury goods, together with the choice of time for retirement. The utility for luxury goods is not necessarily a concave function. The optimal heterogeneous consumption strategies for a class of nonhomothetic utility maximizer are shown to consume only basic goods when the wealth is small, to consume basic goods and make savings when the wealth is intermediate, and to consume almost all in luxury goods when the wealth is large. The optimal retirement policy is shown to be both universal, in the sense that all individuals should retire at the same level of marginal utility that is determined only by income, labor cost, discount factor and market parameters, and not universal, in the sense that all individuals can achieve the same marginal utility with different utility and wealth. It is also shown that individuals prefer to retire as time goes by if the marginal labor cost increases faster than that of income. The main tools used in analyzing the problem are from a partial differential equation and stochastic control theory including variational inequality and dual transformation. We finally conduct the simulation analysis for the featured model parameters to investigate practical and economic implications by providing their figures. Funding: This work was supported by Hong Kong Research Grants Council General Research Fund [Grants 15202421 and 15202817], the National Research Foundation of Korea [Grant 2021R1C1C1004647], the PolyU-SDU Joint Research Center on Financial Mathematics, the CAS AMSS-PolyU Joint Laboratory of Applied Mathematics, and Hong Kong Polytechnic University, the National Natural Science Foundation of China [Grant 11971409], and the Engineering and Physical Sciences Research Council (UK) [Grant EP/V008331/1]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2328. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

44. Stochastic Liquidity as a Proxy for Nonlinear Price Impact.

Author: Muhle-Karbe, Johannes, Wang, Zexin, and Webster, Kevin
Subjects: PRICES, LIQUIDITY (Economics), MARKET manipulation, FINANCIAL engineering
Abstract: For the "In This Issue" column: Trading costs play a central role in designing and implementing quantitative trading strategies. To quantify trading costs, optimal execution and trading algorithms rely on price impact models, such as the propagator model. Empirically, price impact is concave in trade sizes, leading to nonlinear models for which optimization problems are intractable and even qualitative properties, such as price manipulation, are poorly understood. This paper shows that, in the diffusion limit of small and frequent orders, the nonlinear model converges to a tractable linear model. In this high-frequency limit, a stochastic liquidity parameter approximates the original impact function's nonlinearity. This allows us to derive simple formulas for optimal trading strategies and sharp conditions on market volumes to rule out price manipulation. A detailed empirical study using high-frequency limit-order data illustrates the practical performance of the theoretical results. Optimal execution and trading algorithms rely on price impact models, such as the propagator model, to quantify trading costs. Empirically, price impact is concave in trade sizes, leading to nonlinear models for which optimization problems are intractable, and even qualitative properties, such as price manipulation, are poorly understood. However, we show that in the diffusion limit of small and frequent orders, the nonlinear model converges to a tractable linear model. In this high-frequency limit, a stochastic liquidity parameter approximates the original impact function's nonlinearity. We illustrate the approximation's practical performance using limit order data. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

45. Robust Queue Inference from Waiting Times.

Author: Bandi, Chaithanya, Han, Eojin, and Proskynitopoulos, Alexej
Subjects: QUEUEING networks, ROBUST optimization, OPERATIONS research, INFERENTIAL statistics, STATISTICAL services
Abstract: Modeling and decision making for queueing systems have been one of fundamental topics in operations research. For these problems, uncertainty models are established by estimation of key parameters such as expected interarrival and service times. In practice, however, their distributions are unknown, and decision makers are only given historical records of waiting times, which contain relevant but indirect information on the uncertainties. Their complex temporal dependence on the queueing dynamics and the absence of distributional information on the model primitives render estimation of queueing systems remarkably challenging. In the paper "Robust Queue Inference from Waiting Times" by Chaithanya Bandi, Eojin Han, and Alexej Proskynitopoulos, a new inference framework based on robust optimization is proposed to estimate unknown service times from waiting time observations. This new framework allows data-driven, distribution-free estimation on unknown model primitives by solving tractable optimization problems. Observational data from queueing systems are of great practical interest in many application areas because they can be leveraged for better statistical inference of service processes. However, these observations often only provide partial information of the system for various reasons in real-world settings. Moreover, their complex temporal dependence on the queueing dynamics and the absence of distributional information on the model primitives render estimation of queueing systems remarkably challenging. To this end, we consider the problem of inferring service times from waiting time observations. Specifically, we propose an inference framework based on robust optimization, where service times are described via sets that are calibrated by the observed waiting times. We provide conditions under which these data-driven uncertainty sets become asymptotically confident estimators of the service process; that is, they contain unknown service times almost surely as the number of observations grows. We also introduce tractable optimization formulations to compute bounds of various service time characteristics such as moments and risk measures. In this way, our approach is data driven and free of distributional assumptions on unknown model primitives, which is required by existing methods. We also generalize the proposed inference framework to tandem queues and feed-forward networks, offering broader capability in estimation of real-world queueing systems. Our simulation study demonstrates that the proposed approach easily incorporates information of arrival processes such as moments and correlations and performs consistently well on queueing networks under various settings. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.0091. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

46. Flattening Energy-Consumption Curves by Monthly Constrained Direct Load Control Contracts.

Author: Fattahi, Ali, Ghodsi, Saeed, Dasu, Sriram, and Ahmadi, Reza
Subjects: INDEPENDENT system operators, ENERGY consumption, ELECTRIC power consumption, COST control, PUBLIC utilities, ELECTRIC utilities
Abstract: California Utility Firm Implements Innovative Model, Reducing Costs by 4% A California utility firm has successfully implemented a pioneering model to balance electricity demand and supply while minimizing costs. By utilizing direct load control contracts (DLCCs), the firm can reduce energy consumption during peak hours. Researchers developed an integer stochastic dynamic optimization problem that considers monthly and annual constraints, allowing for effective execution of DLCCs. Incorporating a "reduce-to-threshold" policy to flatten energy-consumption curves during high demand, the model was verified using real data from the California Independent System Operator. When implemented, the utility firm achieved an impressive cost reduction of approximately 4%. Sensitivity analysis was conducted to enhance customer experience and improve DLCC contract features. The success of this innovative model highlights the potential of DLCCs and advanced optimization techniques in the energy sector, offering a blueprint for other utility companies seeking to optimize grid stability and reduce costs. Balancing electricity demand and supply is one of the most critical tasks that utility firms perform to maintain grid stability and reduce system cost. Demand-response programs are among the strategies that utilities use to reduce electricity consumption during peak hours and flatten the energy-consumption curve. Direct load control contracts (DLCCs) are a class of incentive-based demand-response programs that allow utilities to assign "calls" to customer groups to reduce their energy usage by a prespecified amount for a given length of time. Given the rapid expansion of such contracts, in this paper, we develop an integer stochastic dynamic optimization problem for executing DLCCs that minimizes total system cost subject to monthly and annual constraints on the number of times and hours customers can be called. We develop a hierarchical approximation approach, which consists of an annual problem and monthly problems, to solve the DLCC implementation problem effectively and in a reasonable amount of time. Motivated by the practice in a large utility firm in California, we incorporate a reduce-to-threshold policy that attempts to flatten energy-consumption curves whenever demand exceeds a given threshold. We verified the quality of our proposed approach on real data from the California Independent System Operator, which is the umbrella organization of the utility firms in California, and measured the quality of our solution against a lower bound. A large utility firm in California implemented our model and informed us that the additional reduction in cost was approximately 4%. Our sensitivity analysis reports the impact of managerial concerns on some policies to enhance customer experience and provides insights for improving the features of DLCC contracts. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2021.0638. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

47. Technical Note—A Note on State-Independent Policies in Network Revenue Management.

Author: Manchiraju, Chandrasekhar, Dawande, Milind, and Janakiraman, Ganesh
Subjects: REVENUE management, PRACTICAL reason, PRICES, SUPPLY & demand, DIVERSIFICATION in industry, CONSUMERS
Abstract: For a variety of practical reasons, including ease of understanding and familiarity for customers, firms use static-pricing algorithms in which product prices remain unchanged over time. In a multiproduct setting with no demand substitutability or complementarity between products, prior work has shown that static-pricing algorithms can offer excellent performance. In "A Note on State-Independent Policies in Network Revenue Management," Manchiraju, Dawande, and Janakiraman extend that work and show that static-pricing algorithms offer a near-optimal performance even in the presence of demand substitutability and complementarity. We revisit two classical price-based and choice-based network revenue management problems studied in the literature. The setting for the problems is as follows: A firm sells multiple products over a finite horizon using a limited supply of resources. Product demands are stochastic. The demand rate for each product depends on the current price-vector (respectively, assortment displayed). The firm's goal is to obtain a pricing (respectively, assortment) policy that maximizes its expected revenue. The main result for the price-based problem is that the optimality gaps incurred by two state-independent policies scale proportionally to k , where k is the scale of demand and supply. The analysis in the literature implicitly assumes that the demand-price relationship is separable among the products. In this paper, we derive these results for the more general setting where the demand-price relationship need not be separable. We also consider an important practical variant of the price-based problem in which the price of each product is restricted to a discrete and finite set and show the k result for this problem. For the choice-based problem, to our knowledge, there is no result in the literature on the asymptotic convergence rate of any policy. We show that this problem is mathematically equivalent to the discrete-price variant of the price-based problem and use this equivalence to show that the choice-based deterministic linear program policy in the literature for the choice-based problem also inherits the k bound on the optimality gap. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2023.2471. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

48. A Planner-Trader Decomposition for Multimarket Hydro Scheduling.

Author: Schindler, Kilian, Rujeerapaiboon, Napat, Kuhn, Daniel, and Wiesemann, Wolfram
Subjects: HYDROELECTRIC power plants, STOCHASTIC programming, ELECTRICITY pricing, PHYSICAL sciences, SCHEDULING, WATER power
Abstract: Multimarket Multireservoir Hydro Scheduling Peak/off-peak spreads on European electricity forward and spot markets are eroding due to the ongoing nuclear phaseout in Germany and the steady growth in photovoltaic capacity. The reduced profitability of peak/off-peak arbitrage forces hydropower producers to recover part of their original profitability on the reserve markets. In their paper titled "A Planner-Trader Decomposition for Multimarket Hydro Scheduling" Schindler, Rujeerapaiboon, Kuhn, and Wiesemann propose a bi-layer stochastic programming framework that jointly optimizes the trading strategies on the spot and reserve markets. The model faithfully accounts for uncertainty in electricity prices, water inflows, and reserve activations, and it ensures that the hydropower producers can fulfill their market commitments under any circumstances. The model is numerically challenging due to the various sources of uncertainty that are revealed at different time scales and that affect the problem's objective function and constraints, and the authors propose a new planner-trader decomposition and an information restriction for its solution. A case study based on real data from Austria reveals significant benefits of simultaneously participating in the spot and the reserve markets. Peak/off-peak spreads on European electricity forward and spot markets are eroding due to the ongoing nuclear phaseout in Germany and the steady growth in photovoltaic capacity. The reduced profitability of peak/off-peak arbitrage forces hydropower producers to recover part of their original profitability on the reserve markets. We propose a bilayer stochastic programming framework for the optimal operation of a fleet of interconnected hydropower plants that sells energy on both the spot and the reserve markets. The outer layer (the planner's problem) optimizes end-of-day reservoir filling levels over one year, whereas the inner layer (the trader's problem) selects optimal hourly market bids within each day. Using an information restriction whereby the planner prescribes the end-of-day reservoir targets one day in advance, we prove that the trader's problem simplifies from an infinite-dimensional stochastic program with 25 stages to a finite two-stage stochastic program with only two scenarios. Substituting this reformulation back into the outer layer and approximating the reservoir targets by affine decision rules allows us to simplify the planner's problem from an infinite-dimensional stochastic program with 365 stages to a two-stage stochastic program that can conveniently be solved via the sample average approximation. Numerical experiments based on a cascade in the Salzburg region of Austria demonstrate the effectiveness of the suggested framework. Funding: This research was supported by the Ministry of Education, Singapore, under its 2019 Academic Research Fund Tier 3 [Grant MOE-2019-T3-1-010], the National University of Singapore [Grant A-0009135-01-00], and the Swiss National Science Foundation [Grant BSCGI0_157733] as well as the Engineering and Physical Sciences Research Council [Grant EP/R045518/1]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2023.2456. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

49. Mechanism Design Under Approximate Incentive Compatibility.

Author: Balseiro, Santiago R., Besbes, Omar, and Castro, Francisco
Subjects: GAME theory, INCENTIVE (Psychology), REVENUE management
Abstract: An assumption that is pervasive in revenue management and economics is that buyers are perfect optimizers. However, in practice, buyers may be limited by their computational capabilities or lack of information and may not be able to perfectly optimize their response to a selling mechanism. This has motivated the introduction of approximate incentive compatibility as a solution concept for practical mechanisms. In "Mechanism Design under Approximate Incentive Compatibility," Balseiro, Besbes, and Castro study, for the first time, the problem of designing optimal selling mechanisms when buyers are imperfect optimizers. Their work characterizes structural properties of approximate incentive compatible mechanisms and establishes fundamental bounds on how much revenue can be garnered by moving from exact to approximate incentive constraints. Their work brings a new perspective to the theory of mechanism design by shedding light on a novel class of optimization problems, techniques, and challenges that emerge when relaxing incentive constraints. A fundamental assumption in classical mechanism design is that buyers are perfect optimizers. However, in practice, buyers may be limited by their computational capabilities or a lack of information and may not be able to perfectly optimize their response to a mechanism. This has motivated the introduction of approximate incentive compatibility (IC) as an appealing solution concept for practical mechanism design. Although most of the literature has focused on the analysis of particular approximate IC mechanisms, this paper is the first to study the design of optimal mechanisms in the space of approximate IC mechanisms and to explore how much revenue can be garnered by moving from exact to approximate incentive constraints. In particular, we study the problem of a seller facing one buyer with private values and analyze optimal selling mechanisms under ε-incentive compatibility. We establish that the gains that can be garnered depend on the local curvature of the seller's revenue function around the optimal posted price when the buyer is a perfect optimizer. If the revenue function behaves locally like an α-power for α ∈ (1 , ∞) , then no mechanism can garner gains higher than order ε α / (2 α − 1) . This improves on state-of-the-art results that imply maximum gains of ε 1 / 2 by providing the first parametric bounds that capture the impact of revenue function's curvature on revenue gains. Furthermore, we establish that an optimal mechanism needs to randomize as soon as ε > 0 and construct a randomized mechanism that is guaranteed to achieve order ε α / (2 α − 1) additional revenues, leading to a tight characterization of the revenue implications of approximate IC constraints. Our study sheds light on a novel class of optimization problems and the challenges that emerge when relaxing IC constraints. In particular, it brings forward the need to optimize not only over allocations and payments but also over best responses, and we develop a new framework to address this challenge. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2359. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Region

Database

49 results

Search Results

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources