Author: "Grigas, Paul" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Grigas, Paul"' showing total 42 results

1. Beyond Discretization: Learning the Optimal Solution Path

Author: Dong, Qiran, Grigas, Paul, and Gupta, Vishal
Subjects: Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: Many applications require minimizing a family of optimization problems indexed by some hyperparameter $\lambda \in \Lambda$ to obtain an entire solution path. Traditional approaches proceed by discretizing $\Lambda$ and solving a series of optimization problems. We propose an alternative approach that parameterizes the solution path with a set of basis functions and solves a \emph{single} stochastic optimization problem to learn the entire solution path. Our method offers substantial complexity improvements over discretization. When using constant-step size SGD, the uniform error of our learned solution path relative to the true path exhibits linear convergence to a constant related to the expressiveness of the basis. When the true solution path lies in the span of the basis, this constant is zero. We also prove stronger results for special cases common in machine learning: When $\lambda \in [-1, 1]$ and the solution path is $\nu$-times differentiable, constant step-size SGD learns a path with $\epsilon$ uniform error after at most $O(\epsilon^{\frac{1}{1-\nu}} \log(1/\epsilon))$ iterations, and when the solution path is analytic, it only requires $O\left(\log^2(1/\epsilon)\log\log(1/\epsilon)\right)$. By comparison, the best-known discretization schemes in these settings require at least $O(\epsilon^{-1/2})$ discretization points (and even more gradient calls). Finally, we propose an adaptive variant of our method that sequentially adds basis functions and demonstrates strong numerical performance through experiments.
Published: 2024

2. New Methods for Parametric Optimization via Differential Equations

Author: Liu, Heyuan and Grigas, Paul
Subjects: Mathematics - Optimization and Control
Abstract: We develop and analyze several different second-order algorithms for computing a near-optimal solution path of a convex parametric optimization problem with smooth Hessian. Our algorithms are inspired by a differential equation perspective on the parametric solution path and do not rely on the specific structure of the objective function. We present computational guarantees that bound the oracle complexity to achieve a near-optimal solution path under different sets of smoothness assumptions. Under the assumptions, the results are an improvement over the best-known results of the grid search methods. We also develop second-order conjugate gradient variants that avoid exact computations of Hessians and solving of linear equations. We present computational results that demonstrate the effectiveness of our methods over grid search methods on both real and synthetic datasets. On large-scale problems, we demonstrate significant speedups of the second-order conjugate variants as compared to the standard versions of our methods.
Published: 2023

3. Binary Classification with Instance and Label Dependent Label Noise

Author: Im, Hyungki and Grigas, Paul
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Learning with label dependent label noise has been extensively explored in both theory and practice; however, dealing with instance (i.e., feature) and label dependent label noise continues to be a challenging task. The difficulty arises from the fact that the noise rate varies for each instance, making it challenging to estimate accurately. The question of whether it is possible to learn a reliable model using only noisy samples remains unresolved. We answer this question with a theoretical analysis that provides matching upper and lower bounds. Surprisingly, our results show that, without any additional assumptions, empirical risk minimization achieves the optimal excess risk bound. Specifically, we derive a novel excess risk bound proportional to the noise level, which holds in very general settings, by comparing the empirical risk minimizers obtained from clean samples and noisy samples. Second, we show that the minimax lower bound for the 0-1 loss is a constant proportional to the average noise rate. Our findings suggest that learning solely with noisy samples is impossible without access to clean samples or strong assumptions on the distribution of the data.
Published: 2023

4. Stochastic First-Order Algorithms for Constrained Distributionally Robust Optimization

Author: Im, Hyungki and Grigas, Paul
Subjects: Mathematics - Optimization and Control
Abstract: We consider distributionally robust optimization (DRO) problems, reformulated as distributionally robust feasibility (DRF) problems, with multiple expectation constraints. We propose a generic stochastic first-order meta-algorithm, where the decision variables and uncertain distribution parameters are each updated separately by applying stochastic first-order methods. We then specialize our results to the case of using two specific versions of stochastic mirror descent (SMD): (i) a novel approximate version of SMD to update the decision variables, and (ii) the bandit mirror descent method to update the distribution parameters in the case of $\chi^2$-divergence sets. For this specialization, we demonstrate that the total number of iterations is independent of the dimensions of the decision variables and distribution parameters. Moreover, the cost per iteration to update both sets of variables is nearly independent of the dimension of the distribution parameters, allowing for high dimensional ambiguity sets. Furthermore, we show that the total number of iterations of our algorithm has a logarithmic dependence on the number of constraints. Experiments on logistic regression with fairness constraints, personalized parameter selection in a social network, and the multi-item newsvendor problem verify the theoretical results and show the usefulness of the algorithm, in particular when the dimension of the distribution parameters is large.
Published: 2023

5. On the Softplus Penalty for Constrained Convex Optimization

Author: Li, Meng, Grigas, Paul, and Atamturk, Alper
Subjects: Mathematics - Optimization and Control
Abstract: We study a new penalty reformulation of constrained convex optimization based on the softplus penalty function. We develop novel and tight upper bounds on the objective value gap and the violation of constraints for the solutions to the penalty reformulations by analyzing the solution path of the reformulation with respect to the smoothness parameter. We use these upper bounds to analyze the complexity of applying gradient methods, which are advantageous when the number of constraints is large, to the reformulation.
Published: 2023
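
A minimal sketch of the kind of penalty the abstract above describes, assuming the common softplus form mu * log(1 + exp(t / mu)) as a smooth stand-in for max(t, 0). The penalty weight `rho`, the function names, and the `(a, b)` encoding of a constraint a^T x <= b are hypothetical choices for illustration, not the paper's notation.

```python
import math

def softplus(t, mu):
    """Smooth approximation of max(t, 0) with smoothness parameter mu > 0.

    Equals mu * log(1 + exp(t / mu)), written in a form that avoids overflow.
    """
    return max(t, 0.0) + mu * math.log1p(math.exp(-abs(t) / mu))

def penalized_objective(x, f, constraints, rho, mu):
    """f(x) plus a softplus penalty on each constraint a^T x <= b.

    `constraints` is a list of (a, b) pairs; `rho` is a hypothetical
    penalty weight introduced only for this sketch.
    """
    total = f(x)
    for a, b in constraints:
        violation = sum(ai * xi for ai, xi in zip(a, x)) - b
        total += rho * softplus(violation, mu)
    return total
```

As mu shrinks, the softplus term approaches the exact hinge max(t, 0) and never exceeds it by more than mu * log(2), which is why the solution path in the smoothness parameter is a natural object to analyze.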

6. Active Learning in the Predict-then-Optimize Framework: A Margin-Based Approach

Author: Liu, Mo, Grigas, Paul, Liu, Heyuan, and Shen, Zuo-Jun Max
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: We develop the first active learning method in the predict-then-optimize framework. Specifically, we develop a learning method that sequentially decides whether to request the "labels" of feature samples from an unlabeled data stream, where the labels correspond to the parameters of an optimization model for decision-making. Our active learning method is the first to be directly informed by the decision error induced by the predicted parameters, which is referred to as the Smart Predict-then-Optimize (SPO) loss. Motivated by the structure of the SPO loss, our algorithm adopts a margin-based criterion utilizing the concept of distance to degeneracy and minimizes a tractable surrogate of the SPO loss on the collected data. In particular, we develop an efficient active learning algorithm with both hard and soft rejection variants, each with theoretical excess risk (i.e., generalization) guarantees. We further derive bounds on the label complexity, which refers to the number of samples whose labels are acquired to achieve a desired small level of SPO risk. Under some natural low-noise conditions, we show that these bounds can be better than the naive supervised learning approach that labels all samples. Furthermore, when using the SPO+ loss function, a specialized surrogate of the SPO loss, we derive a significantly smaller label complexity under separability conditions. We also present numerical evidence showing the practical value of our proposed algorithms in the settings of personalized pricing and the shortest path problem.
Published: 2023

7. Online Contextual Decision-Making with a Smart Predict-then-Optimize Method

Author: Liu, Heyuan and Grigas, Paul
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: We study an online contextual decision-making problem with resource constraints. At each time period, the decision-maker first predicts a reward vector and resource consumption matrix based on a given context vector and then solves a downstream optimization problem to make a decision. The final goal of the decision-maker is to maximize the summation of the reward and the utility from resource consumption, while satisfying the resource constraints. We propose an algorithm that mixes a prediction step based on the "Smart Predict-then-Optimize (SPO)" method with a dual update step based on mirror descent. We prove regret bounds and demonstrate that the overall convergence rate of our method depends on the $\mathcal{O}(T^{-1/2})$ convergence of online mirror descent as well as risk bounds of the surrogate loss function used to learn the prediction model. Our algorithm and regret bounds apply to a general convex feasible region for the resource constraints, including both hard and soft resource constraint cases, and they apply to a wide class of prediction models in contrast to the traditional settings of linear contextual models or finite policy spaces. We also conduct numerical experiments to empirically demonstrate the strength of our proposed SPO-type methods, as compared to traditional prediction-error-only methods, on multi-dimensional knapsack and longest path instances.
Published: 2022

8. New Penalized Stochastic Gradient Methods for Linearly Constrained Strongly Convex Optimization

Author: Li, Meng, Grigas, Paul, and Atamturk, Alper
Subjects: Mathematics - Optimization and Control, 90C30, 90C25, 65K05
Abstract: For minimizing a strongly convex objective function subject to linear inequality constraints, we consider a penalty approach that allows one to utilize stochastic methods for problems with a large number of constraints and/or objective function terms. We provide upper bounds on the distance between the solutions to the original constrained problem and the penalty reformulations, guaranteeing the convergence of the proposed approach. We give a nested accelerated stochastic gradient method and propose a novel way for updating the smoothness parameter of the penalty function and the step-size. The proposed algorithm requires at most $\tilde O(1/\sqrt{\epsilon})$ expected stochastic gradient iterations to produce a solution within an expected distance of $\epsilon$ to the optimal solution of the original problem, which is the best complexity for this problem class to the best of our knowledge. We also show how to query an approximate dual solution after stochastically solving the penalty reformulations, leading to results on the convergence of the duality gap. Moreover, the nested structure of the algorithm and upper bounds on the distance to the optimal solutions allows one to safely eliminate constraints that are inactive at an optimal solution throughout the algorithm, which leads to improved complexity results. Finally, we present computational results that demonstrate the effectiveness and robustness of our algorithm.
Published: 2022

9. Integrated Conditional Estimation-Optimization

Author: Qi, Meng, Grigas, Paul, and Shen, Zuo-Jun Max
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Many real-world optimization problems involve uncertain parameters with probability distributions that can be estimated using contextual feature information. In contrast to the standard approach of first estimating the distribution of uncertain parameters and then optimizing the objective based on the estimation, we propose an integrated conditional estimation-optimization (ICEO) framework that estimates the underlying conditional distribution of the random parameter while considering the structure of the optimization problem. We directly model the relationship between the conditional distribution of the random parameter and the contextual features, and then estimate the probabilistic model with an objective that aligns with the downstream optimization problem. We show that our ICEO approach is asymptotically consistent under moderate regularity conditions and further provide finite performance guarantees in the form of generalization bounds. Computationally, performing estimation with the ICEO approach is a non-convex and often non-differentiable optimization problem. We propose a general methodology for approximating the potentially non-differentiable mapping from estimated conditional distribution to the optimal decision by a differentiable function, which greatly improves the performance of gradient-based algorithms applied to the non-convex problem. We also provide a polynomial optimization solution approach in the semi-algebraic case. Numerical experiments are also conducted to show the empirical success of our approach in different situations including with limited data samples and model mismatches.
Published: 2021

10. Risk Bounds and Calibration for a Smart Predict-then-Optimize Method

Author: Liu, Heyuan and Grigas, Paul
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: The predict-then-optimize framework is fundamental in practical stochastic decision-making problems: first predict unknown parameters of an optimization model, then solve the problem using the predicted values. A natural loss function in this setting is defined by measuring the decision error induced by the predicted parameters, which was named the Smart Predict-then-Optimize (SPO) loss by Elmachtoub and Grigas [arXiv:1710.08005]. Since the SPO loss is typically nonconvex and possibly discontinuous, Elmachtoub and Grigas [arXiv:1710.08005] introduced a convex surrogate, called the SPO+ loss, that importantly accounts for the underlying structure of the optimization model. In this paper, we greatly expand upon the consistency results for the SPO+ loss provided by Elmachtoub and Grigas [arXiv:1710.08005]. We develop risk bounds and uniform calibration results for the SPO+ loss relative to the SPO loss, which provide a quantitative way to transfer the excess surrogate risk to excess true risk. By combining our risk bounds with generalization bounds, we show that the empirical minimizer of the SPO+ loss achieves low excess true risk with high probability. We first demonstrate these results in the case when the feasible region of the underlying optimization problem is a polyhedron, and then we show that the results can be strengthened substantially when the feasible region is a level set of a strongly convex function. We perform experiments to empirically demonstrate the strength of the SPO+ surrogate, as compared to standard $\ell_1$ and squared $\ell_2$ prediction error losses, on portfolio allocation and cost-sensitive multi-class classification problems.
Comment: To appear in NeurIPS 2021
Published: 2021

11. Joint Online Learning and Decision-making via Dual Mirror Descent

Author: Lobos, Alfonso, Grigas, Paul, and Wen, Zheng
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: We consider an online revenue maximization problem over a finite time horizon subject to lower and upper bounds on cost. At each period, an agent receives a context vector sampled i.i.d. from an unknown distribution and needs to make a decision adaptively. The revenue and cost functions depend on the context vector as well as some fixed but possibly unknown parameter vector to be learned. We propose a novel offline benchmark and a new algorithm that mixes an online dual mirror descent scheme with a generic parameter learning process. When the parameter vector is known, we demonstrate an $O(\sqrt{T})$ regret result as well as an $O(\sqrt{T})$ bound on the possible constraint violations. When the parameter is not known and must be learned, we demonstrate that the regret and constraint violations are the sums of the previous $O(\sqrt{T})$ terms plus terms that directly depend on the convergence of the learning process.
Published: 2021

12. Stochastic In-Face Frank-Wolfe Methods for Non-Convex Optimization and Sparse Neural Network Training

Author: Grigas, Paul, Lobos, Alfonso, and Vermeersch, Nathan
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: The Frank-Wolfe method and its extensions are well-suited for delivering solutions with desirable structural properties, such as sparsity or low-rank structure. We introduce a new variant of the Frank-Wolfe method that combines Frank-Wolfe steps and steepest descent steps, as well as a novel modification of the "Frank-Wolfe gap" to measure convergence in the non-convex case. We further extend this method to incorporate in-face directions for preserving structured solutions as well as block coordinate steps, and we demonstrate computational guarantees in terms of the modified Frank-Wolfe gap for all of these variants. We are particularly motivated by the application of this methodology to the training of neural networks with sparse properties, and we apply our block coordinate method to the problem of $\ell_1$ regularized neural network training. We present the results of several numerical experiments on both artificial and real datasets demonstrating significant improvements of our method in training sparse neural networks.
Published: 2019

13. Generalization Bounds in the Predict-then-Optimize Framework

Author: Balghiti, Othman El, Elmachtoub, Adam N., Grigas, Paul, and Tewari, Ambuj
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: The predict-then-optimize framework is fundamental in many practical settings: predict the unknown parameters of an optimization problem, and then solve the problem using the predicted values of the parameters. A natural loss function in this environment is to consider the cost of the decisions induced by the predicted parameters, in contrast to the prediction error of the parameters. This loss function was recently introduced in Elmachtoub and Grigas (2022) and referred to as the Smart Predict-then-Optimize (SPO) loss. In this work, we seek to provide bounds on how well the performance of a prediction model fit on training data generalizes out-of-sample, in the context of the SPO loss. Since the SPO loss is non-convex and non-Lipschitz, standard results for deriving generalization bounds do not apply. We first derive bounds based on the Natarajan dimension that, in the case of a polyhedral feasible region, scale at most logarithmically in the number of extreme points, but, in the case of a general convex feasible region, have linear dependence on the decision dimension. By exploiting the structure of the SPO loss function and a key property of the feasible region, which we denote as the strength property, we can dramatically improve the dependence on the decision and feature dimensions. Our approach and analysis rely on placing a margin around problematic predictions that do not yield unique optimal solutions, and then providing generalization bounds in the context of a modified margin SPO loss function that is Lipschitz continuous. Finally, we characterize the strength property and show that the modified SPO loss can be computed efficiently for both strongly convex bodies and polytopes with an explicit extreme point representation.
Comment: Preliminary version in NeurIPS 2019
Published: 2019

14. Stochastic First-Order Algorithms for Constrained Distributionally Robust Optimization

Author: Im, Hyungki, primary and Grigas, Paul, additional
Published: 2024

15. Condition Number Analysis of Logistic Regression, and its Implications for Standard First-Order Solution Methods

Author: Freund, Robert M., Grigas, Paul, and Mazumder, Rahul
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning, Statistics - Computation, Statistics - Machine Learning
Abstract: Logistic regression is one of the most popular methods in binary classification, wherein estimation of model parameters is carried out by solving the maximum likelihood (ML) optimization problem, and the ML estimator is defined to be the optimal solution of this problem. It is well known that the ML estimator exists when the data is non-separable, but fails to exist when the data is separable. First-order methods are the algorithms of choice for solving large-scale instances of the logistic regression problem. In this paper, we introduce a pair of condition numbers that measure the degree of non-separability or separability of a given dataset in the setting of binary classification, and we study how these condition numbers relate to and inform the properties and the convergence guarantees of first-order methods. When the training data is non-separable, we show that the degree of non-separability naturally enters the analysis and informs the properties and convergence guarantees of two standard first-order methods: steepest descent (for any given norm) and stochastic gradient descent. Expanding on the work of Bach, we also show how the degree of non-separability enters into the analysis of linear convergence of steepest descent (without needing strong convexity), as well as the adaptive convergence of stochastic gradient descent. When the training data is separable, first-order methods rather curiously have good empirical success, which is not well understood in theory. In the case of separable data, we demonstrate how the degree of separability enters into the analysis of $\ell_2$ steepest descent and stochastic gradient descent for delivering approximate-maximum-margin solutions with associated computational guarantees as well. This suggests that first-order methods can lead to statistically meaningful solutions in the separable case, even though the ML solution does not exist.
Comment: 38 pages
Published: 2018

16. Optimal Bidding, Allocation and Budget Spending for a Demand Side Platform Under Many Auction Types

Author: Lobos, Alfonso, Grigas, Paul, Wen, Zheng, and Lee, Kuang-chih
Subjects: Mathematics - Optimization and Control
Abstract: We develop a novel optimization model to maximize the profit of a Demand-Side Platform (DSP) while ensuring that the budget utilization preferences of the DSP's advertiser clients are adequately met. Our model is highly flexible and can be applied in a Real-Time Bidding (RTB) environment with arbitrary auction types, e.g., both first and second price auctions. Our proposed formulation leads to a non-convex optimization problem due to the joint optimization over both impression allocation and bid price decisions. Using Fenchel duality theory, we construct a dual problem that is convex and can be solved efficiently to obtain feasible bidding prices and allocation variables that can be deployed in an RTB setting. With a few minimal additional assumptions on the properties of the auctions, we demonstrate theoretically that our computationally efficient procedure based on convex optimization principles is guaranteed to deliver a globally optimal solution. We conduct experiments using data from a real DSP to validate our theoretical findings and to demonstrate that our method successfully trades off between DSP profitability and budget utilization in a simulated online environment.
Comment: Demand-Side Platforms, Real-Time Bidding, Online Advertising, Optimization
Published: 2018

17. Smart 'Predict, then Optimize'

Author: Elmachtoub, Adam N. and Grigas, Paul
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Many real-world analytics problems involve two significant challenges: prediction and optimization. Due to the typically complex nature of each challenge, the standard paradigm is predict-then-optimize. By and large, machine learning tools are intended to minimize prediction error and do not account for how the predictions will be used in the downstream optimization problem. In contrast, we propose a new and very general framework, called Smart "Predict, then Optimize" (SPO), which directly leverages the optimization problem structure, i.e., its objective and constraints, for designing better prediction models. A key component of our framework is the SPO loss function which measures the decision error induced by a prediction. Training a prediction model with respect to the SPO loss is computationally challenging, and thus we derive, using duality theory, a convex surrogate loss function which we call the SPO+ loss. Most importantly, we prove that the SPO+ loss is statistically consistent with respect to the SPO loss under mild conditions. Our SPO+ loss function can tractably handle any polyhedral, convex, or even mixed-integer optimization problem with a linear objective. Numerical experiments on shortest path and portfolio optimization problems show that the SPO framework can lead to significant improvement under the predict-then-optimize paradigm, in particular when the prediction model being trained is misspecified. We find that linear models trained using SPO+ loss tend to dominate random forest algorithms, even when the ground truth is highly nonlinear.
Published: 2017
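
A small sketch of the two losses named in the abstract above, assuming a minimization problem with a linear objective whose feasible region is given as an explicit finite list of candidate decisions (for example, the extreme points of a polytope). The function names and the list-of-decisions representation are hypothetical, and ties in the predicted-cost minimizer are broken by list order rather than by any worst-case rule.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def argmin_decision(cost, decisions):
    """Decision minimizing the linear objective cost^T w over the finite list."""
    return min(decisions, key=lambda w: dot(cost, w))

def spo_loss(c_hat, c, decisions):
    """Decision error: true cost of acting on the prediction, minus the optimal true cost."""
    w_hat = argmin_decision(c_hat, decisions)
    w_star = argmin_decision(c, decisions)
    return dot(c, w_hat) - dot(c, w_star)

def spo_plus_loss(c_hat, c, decisions):
    """Convex surrogate: max_w (c - 2 c_hat)^T w + 2 c_hat^T w*(c) - z*(c)."""
    w_star = argmin_decision(c, decisions)
    shifted = [ci - 2 * chi for ci, chi in zip(c, c_hat)]
    max_term = max(dot(shifted, w) for w in decisions)
    return max_term + 2 * dot(c_hat, w_star) - dot(c, w_star)
```

With three unit-vector decisions, true cost `[1, 2, 3]`, and prediction `[3, 1, 2]`, the prediction selects the second decision, so the SPO loss is 2 - 1 = 1, while the surrogate evaluates to 5. Both losses are zero when the prediction equals the true cost.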

18. Profit Maximization for Online Advertising Demand-Side Platforms

Author: Grigas, Paul, Lobos, Alfonso, Wen, Zheng, and Lee, Kuang-chih
Subjects: Mathematics - Optimization and Control, Computer Science - Computer Science and Game Theory
Abstract: We develop an optimization model and corresponding algorithm for the management of a demand-side platform (DSP), whereby the DSP aims to maximize its own profit while acquiring valuable impressions for its advertiser clients. We formulate the problem of profit maximization for a DSP interacting with ad exchanges in a real-time bidding environment in a cost-per-click/cost-per-action pricing model. Our proposed formulation leads to a nonconvex optimization problem due to the joint optimization over both impression allocation and bid price decisions. We use Lagrangian relaxation to develop a tractable convex dual problem, which, due to the properties of second-price auctions, may be solved efficiently with subgradient methods. We propose a two-phase solution procedure, whereby in the first phase we solve the convex dual problem using a subgradient algorithm, and in the second phase we use the previously computed dual solution to set bid prices and then solve a linear optimization problem to obtain the allocation probability variables. On several synthetic examples, we demonstrate that our proposed solution approach leads to superior performance over a baseline method that is used in practice.
Published: 2017

19. An Extended Frank-Wolfe Method with 'In-Face' Directions, and its Application to Low-Rank Matrix Completion

Author: Freund, Robert M., Grigas, Paul, and Mazumder, Rahul
Subjects: Mathematics - Optimization and Control, Statistics - Computation, Statistics - Machine Learning, 90C25, G.1.6
Abstract: Motivated principally by the low-rank matrix completion problem, we present an extension of the Frank-Wolfe method that is designed to induce near-optimal solutions on low-dimensional faces of the feasible region. This is accomplished by a new approach to generating "in-face" directions at each iteration, as well as through new choice rules for selecting between in-face and "regular" Frank-Wolfe steps. Our framework for generating in-face directions generalizes the notion of away-steps introduced by Wolfe. In particular, the in-face directions always keep the next iterate within the minimal face containing the current iterate. We present computational guarantees for the new method that trade off efficiency in computing near-optimal solutions with upper bounds on the dimension of minimal faces of iterates. We apply the new method to the matrix completion problem, where low-dimensional faces correspond to low-rank matrices. We present computational results that demonstrate the effectiveness of our methodological approach at producing nearly-optimal solutions of very low rank. On both artificial and real datasets, we demonstrate significant speed-ups in computing very low-rank nearly-optimal solutions as compared to either the Frank-Wolfe method or its traditional away-step variant.
Comment: 25 pages, 3 tables and 2 figures
Published: 2015

20. A New Perspective on Boosting in Linear Regression via Subgradient Optimization and Relatives

Author: Freund, Robert M., Grigas, Paul, and Mazumder, Rahul
Subjects: Mathematics - Statistics Theory, Computer Science - Learning, Mathematics - Optimization and Control, Statistics - Machine Learning, 62J05, 62J07, 90C25
Abstract: In this paper we analyze boosting algorithms in linear regression from a new perspective: that of modern first-order methods in convex optimization. We show that classic boosting algorithms in linear regression, namely the incremental forward stagewise algorithm (FS$_\varepsilon$) and least squares boosting (LS-Boost($\varepsilon$)), can be viewed as subgradient descent to minimize the loss function defined as the maximum absolute correlation between the features and residuals. We also propose a modification of FS$_\varepsilon$ that yields an algorithm for the Lasso, and that may be easily extended to an algorithm that computes the Lasso path for different values of the regularization parameter. Furthermore, we show that these new algorithms for the Lasso may also be interpreted as the same master algorithm (subgradient descent), applied to a regularized version of the maximum absolute correlation loss function. We derive novel, comprehensive computational guarantees for several boosting algorithms in linear regression (including LS-Boost($\varepsilon$) and FS$_\varepsilon$) by using techniques of modern first-order methods in convex optimization. Our computational guarantees inform us about the statistical properties of boosting algorithms. In particular they provide, for the first time, a precise theoretical description of the amount of data-fidelity and regularization imparted by running a boosting algorithm with a prespecified learning rate for a fixed but arbitrary number of iterations, for any dataset.
Published: 2015
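
For readers unfamiliar with the incremental forward stagewise algorithm (FS$_\varepsilon$) discussed in the abstract above, here is a minimal sketch of its textbook update: pick the feature most correlated (in absolute value) with the current residuals and move its coefficient by a fixed step `eps` in the direction of that correlation. The function name and the plain-list data layout are hypothetical, and the sketch assumes the feature columns have already been standardized.

```python
def forward_stagewise(X, y, eps, num_iters):
    """Incremental forward stagewise regression on a list-of-rows matrix X."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    for _ in range(num_iters):
        # Correlation (inner product) of each feature column with the residuals.
        corr = [sum(residual[i] * X[i][j] for i in range(n)) for j in range(p)]
        j = max(range(p), key=lambda k: abs(corr[k]))
        if corr[j] == 0:
            break  # residuals are orthogonal to every feature
        step = eps if corr[j] > 0 else -eps
        beta[j] += step
        for i in range(n):
            residual[i] -= step * X[i][j]
    return beta, residual
```

The quantity driving each step, the maximum absolute correlation between features and residuals, is the loss function that the abstract interprets FS$_\varepsilon$ as minimizing by subgradient descent.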

21. AdaBoost and Forward Stagewise Regression are First-Order Convex Optimization Methods

Author: Freund, Robert M., Grigas, Paul, and Mazumder, Rahul
Subjects: Statistics - Machine Learning, Computer Science - Learning, Mathematics - Optimization and Control, 68Q32, 68T05, 62J05, 90C25, I.2.6, I.5.1, G.3, G.1.6
Abstract: Boosting methods are highly popular and effective supervised learning methods which combine weak learners into a single accurate model with good statistical performance. In this paper, we analyze two well-known boosting methods, AdaBoost and Incremental Forward Stagewise Regression (FS$_\varepsilon$), by establishing their precise connections to the Mirror Descent algorithm, which is a first-order method in convex optimization. As a consequence of these connections we obtain novel computational guarantees for these boosting methods. In particular, we characterize convergence bounds of AdaBoost, related to both the margin and log-exponential loss function, for any step-size sequence. Furthermore, this paper presents, for the first time, precise computational complexity results for FS$_\varepsilon$.
Published: 2013

22. New Analysis and Results for the Frank-Wolfe Method

Author: Freund, Robert M. and Grigas, Paul
Subjects: Mathematics - Optimization and Control, 90C25, G.1.6
Abstract: We present new results for the Frank-Wolfe method (also known as the conditional gradient method). We derive computational guarantees for arbitrary step-size sequences, which are then applied to various step-size rules, including simple averaging and constant step-sizes. We also develop step-size rules and computational guarantees that depend naturally on the warm-start quality of the initial (and subsequent) iterates. Our results include computational guarantees for both duality/bound gaps and the so-called FW gaps. Lastly, we present complexity bounds in the presence of approximate computation of gradients and/or linear optimization subproblem solutions. Comment: Changed the name of the method from "conditional gradient" to "Frank-Wolfe"
Published: 2013
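
Illustrative note: the Frank-Wolfe iteration and the "FW gap" mentioned in the abstract can be sketched on the unit simplex, where the linear optimization subproblem has a closed-form solution. This is a minimal sketch under stated assumptions, not the paper's analysis: the function name `frank_wolfe_simplex`, the quadratic test objective, and the common step-size rule 2/(k+2) are choices made here for illustration.

```python
def frank_wolfe_simplex(grad, x0, n_iter=200):
    """Frank-Wolfe on the unit simplex (hypothetical helper).

    grad maps a point (list of floats) to its gradient. Returns the final
    iterate and the FW gap measured at the last point where the gradient
    was evaluated.
    """
    x = list(x0)
    gap = float("inf")
    for k in range(n_iter):
        g = grad(x)
        # Linear minimization over the simplex: the minimizer of g^T s is the
        # vertex e_j with the smallest gradient coordinate.
        j = min(range(len(x)), key=lambda i: g[i])
        # FW gap g^T (x - e_j); for convex objectives it upper-bounds f(x) - f*.
        gap = sum(gi * xi for gi, xi in zip(g, x)) - g[j]
        step = 2.0 / (k + 2.0)
        # Convex combination of the current iterate and the chosen vertex.
        x = [(1.0 - step) * xi for xi in x]
        x[j] += step
    return x, gap


# Example objective (assumed): f(x) = 0.5 * ||x - c||^2 with c inside the simplex.
c = [0.2, 0.3, 0.5]
x, gap = frank_wolfe_simplex(lambda x: [xi - ci for xi, ci in zip(x, c)],
                             [1.0, 0.0, 0.0], n_iter=2000)
print(x, gap)
```

Every iterate stays feasible without a projection step, since each update is a convex combination of points in the simplex.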

23. On the softplus penalty for large-scale convex optimization

Author: Li, Meng, primary, Grigas, Paul, additional, and Atamtürk, Alper, additional
Published: 2023
Full Text: View/download PDF

24. A NEW PERSPECTIVE ON BOOSTING IN LINEAR REGRESSION VIA SUBGRADIENT OPTIMIZATION AND RELATIVES

Author: Freund, Robert M., Grigas, Paul, and Mazumder, Rahul
Published: 2017

25. Generalization Bounds in the Predict-Then-Optimize Framework.

Author: El Balghiti, Othman, Elmachtoub, Adam N., Grigas, Paul, and Tewari, Ambuj
Subjects: SCHOLARSHIPS, GENERALIZATION, CONVEX bodies, CONVEX domains, PROBLEM solving
Abstract: The predict-then-optimize framework is fundamental in many practical settings: predict the unknown parameters of an optimization problem and then solve the problem using the predicted values of the parameters. A natural loss function in this environment is to consider the cost of the decisions induced by the predicted parameters in contrast to the prediction error of the parameters. This loss function is referred to as the smart predict-then-optimize (SPO) loss. In this work, we seek to provide bounds on how well the performance of a prediction model fit on training data generalizes out of sample in the context of the SPO loss. Because the SPO loss is nonconvex and non-Lipschitz, standard results for deriving generalization bounds do not apply. We first derive bounds based on the Natarajan dimension that, in the case of a polyhedral feasible region, scale at most logarithmically in the number of extreme points but, in the case of a general convex feasible region, have linear dependence on the decision dimension. By exploiting the structure of the SPO loss function and a key property of the feasible region, which we denote as the strength property, we can dramatically improve the dependence on the decision and feature dimensions. Our approach and analysis rely on placing a margin around problematic predictions that do not yield unique optimal solutions and then providing generalization bounds in the context of a modified margin SPO loss function that is Lipschitz continuous. Finally, we characterize the strength property and show that the modified SPO loss can be computed efficiently for both strongly convex bodies and polytopes with an explicit extreme point representation. Funding: O. El Balghiti thanks Rayens Capital for their support. A. N. Elmachtoub acknowledges the support of the National Science Foundation (NSF) [Grant CMMI-1763000]. P. Grigas acknowledges the support of NSF [Grants CCF-1755705 and CMMI-1762744]. A. Tewari acknowledges the support of the NSF [CAREER grant IIS-1452099] and a Sloan Research Fellowship. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
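
Illustrative note: the SPO loss described in the abstract (the extra true cost of the decision induced by the predicted parameters) can be sketched for a polytope given by an explicit list of extreme points. This is a minimal sketch, assuming a minimization problem with a linear objective; the names `best_vertex` and `spo_loss` are hypothetical, and ties among optimal vertices are broken by list order here, which sidesteps the non-unique cases the abstract treats with a margin.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def best_vertex(cost, vertices):
    """Vertex minimizing cost^T w; the first minimizer in list order wins ties."""
    return min(vertices, key=lambda w: dot(cost, w))


def spo_loss(c_pred, c_true, vertices):
    """True cost of the decision chosen under c_pred, minus the optimal true cost."""
    w_pred = best_vertex(c_pred, vertices)
    w_true = best_vertex(c_true, vertices)
    return dot(c_true, w_pred) - dot(c_true, w_true)


vertices = [(1.0, 0.0), (0.0, 1.0)]
# The prediction ranks the two decisions in the wrong order, so the loss is positive.
print(spo_loss((3.0, 1.0), (1.0, 2.0), vertices))
# The prediction is numerically far from the truth but induces the same decision.
print(spo_loss((1.0, 5.0), (1.0, 2.0), vertices))
```

The second call shows why this loss differs from prediction error: a large error in the predicted costs incurs no loss when the induced decision is unchanged.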

26. New analysis and results for the Frank–Wolfe method

Author: Freund, Robert M. and Grigas, Paul
Published: 2016
Full Text: View/download PDF

27. Ch3MS-RF: a random forest model for chemical characterization and improved quantification of unidentified atmospheric organics detected by chromatography–mass spectrometry techniques

Author: Franklin, Emily B., primary, Yee, Lindsay D., additional, Aumont, Bernard, additional, Weber, Robert J., additional, Grigas, Paul, additional, and Goldstein, Allen H., additional
Published: 2022
Full Text: View/download PDF

28. Supplementary material to "Ch3MS-RF: A Random Forest Model for Chemical Characterization and Improved Quantification of Unidentified Atmospheric Organics Detected by Chromatography-Mass Spectrometry Techniques"

Author: Franklin, Emily B., primary, Yee, Lindsay D., additional, Aumont, Bernard, additional, Weber, Robert J., additional, Grigas, Paul, additional, and Goldstein, Allen H., additional
Published: 2022
Full Text: View/download PDF

29. Smart “Predict, then Optimize”

Author: Elmachtoub, Adam N., primary and Grigas, Paul, additional
Published: 2022
Full Text: View/download PDF

30. Integrated Conditional Estimation-Optimization

Author: Grigas, Paul, Qi, Meng, and Shen, Zuo-Jun
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Many real-world optimization problems involve uncertain parameters with probability distributions that can be estimated using contextual feature information. In contrast to the standard approach of first estimating the distribution of uncertain parameters and then optimizing the objective based on the estimation, we propose an \textit{integrated conditional estimation-optimization} (ICEO) framework that estimates the underlying conditional distribution of the random parameter while considering the structure of the optimization problem. We directly model the relationship between the conditional distribution of the random parameter and the contextual features, and then estimate the probabilistic model with an objective that aligns with the downstream optimization problem. We show that our ICEO approach is asymptotically consistent under moderate regularity conditions and further provide finite performance guarantees in the form of generalization bounds. Computationally, performing estimation with the ICEO approach is a non-convex and often non-differentiable optimization problem. We propose a general methodology for approximating the potentially non-differentiable mapping from estimated conditional distribution to optimal decision by a differentiable function, which greatly improves the performance of gradient-based algorithms applied to the non-convex problem. We also provide a polynomial optimization solution approach in the semi-algebraic case. Numerical experiments are also conducted to show the empirical success of our approach in different situations including with limited data samples and model mismatches.
Published: 2021
Full Text: View/download PDF

31. A new perspective on boosting in linear regression via subgradient optimization and relatives

Author: Massachusetts Institute of Technology. Operations Research Center, Sloan School of Management, Freund, Robert Michael, Grigas, Paul Edward, and Mazumder, Rahul
Abstract: We analyze boosting algorithms [Ann. Statist. 29 (2001) 1189–1232; Ann. Statist. 28 (2000) 337–407; Ann. Statist. 32 (2004) 407–499] in linear regression from a new perspective: that of modern first-order methods in convex optimization. We show that classic boosting algorithms in linear regression, namely the incremental forward stagewise algorithm (FS$_\varepsilon$) and least squares boosting [LS-BOOST($\varepsilon$)], can be viewed as subgradient descent to minimize the loss function defined as the maximum absolute correlation between the features and residuals. We also propose a minor modification of FS$_\varepsilon$ that yields an algorithm for the LASSO, and that may be easily extended to an algorithm that computes the LASSO path for different values of the regularization parameter. Furthermore, we show that these new algorithms for the LASSO may also be interpreted as the same master algorithm (subgradient descent), applied to a regularized version of the maximum absolute correlation loss function. We derive novel, comprehensive computational guarantees for several boosting algorithms in linear regression (including LS-BOOST($\varepsilon$) and FS$_\varepsilon$) by using techniques of first-order methods in convex optimization. Our computational guarantees inform us about the statistical properties of boosting algorithms. In particular, they provide, for the first time, a precise theoretical description of the amount of data-fidelity and regularization imparted by running a boosting algorithm with a prespecified learning rate for a fixed but arbitrary number of iterations, for any dataset.
Published: 2018

32. Optimal Bidding, Allocation, and Budget Spending for a Demand-Side Platform with Generic Auctions

Author: Grigas, Paul, primary, Lobos, Alfonso, additional, Wen, Zheng, additional, and Lee, Kuang-Chih, additional
Published: 2021
Full Text: View/download PDF

33. An Extended Frank--Wolfe Method with “In-Face” Directions, and Its Application to Low-Rank Matrix Completion

Author: Sloan School of Management, Freund, Robert Michael, Mazumder, Rahul, and Grigas, Paul
Abstract: Motivated principally by the low-rank matrix completion problem, we present an extension of the Frank-Wolfe method that is designed to induce near-optimal solutions on low-dimensional faces of the feasible region. This is accomplished by a new approach to generating "in-face" directions at each iteration, as well as through new choice rules for selecting between in-face and "regular" Frank-Wolfe steps. Our framework for generating in-face directions generalizes the notion of away steps introduced by Wolfe. In particular, the in-face directions always keep the next iterate within the minimal face containing the current iterate. We present computational guarantees for the new method that trade off efficiency in computing near-optimal solutions with upper bounds on the dimension of minimal faces of iterates. We apply the new method to the matrix completion problem, where low-dimensional faces correspond to low-rank matrices. We present computational results that demonstrate the effectiveness of our methodological approach at producing nearly optimal solutions of very low rank. On both artificial and real datasets, we demonstrate significant speedups in computing very low rank nearly optimal solutions as compared to the Frank-Wolfe method (as well as several of its significant variants). Key words: convex optimization, Frank–Wolfe method, computational guarantees, low-rank, matrix completion, nuclear norm regularization. Funding: United States. Air Force. Office of Scientific Research (Grant FA9550-15-1-0276), MIT-Chile Seed Fund, MIT-Belgium Program, United States. Office of Naval Research (Grant N000141512342), Gordon and Betty Moore Foundation
Published: 2018

34. A new perspective on boosting in linear regression via subgradient optimization and relatives

Author: M. Freund, Robert, primary, Grigas, Paul, additional, and Mazumder, Rahul, additional
Published: 2017
Full Text: View/download PDF

35. Methods for convex optimization and statistical learning

Author: Robert M. Freund, Massachusetts Institute of Technology. Operations Research Center, and Grigas, Paul (Paul Edward)
Abstract: Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2016. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 219-225). We present several contributions at the interface of first-order methods for convex optimization and problems in statistical machine learning. In the first part of this thesis, we present new results for the Frank-Wolfe method, with a particular focus on: (i) novel computational guarantees that apply for any step-size sequence, (ii) a novel adjustment to the basic algorithm to better account for warm-start information, and (iii) extensions of the computational guarantees that hold in the presence of approximate subproblem and/or gradient computations. In the second part of the thesis, we present a unifying framework for interpreting "greedy" first-order methods -- namely Frank-Wolfe and greedy coordinate descent -- as instantiations of the dual averaging method of Nesterov, and we discuss the implications thereof. In the third part of the thesis, we present an extension of the Frank-Wolfe method that is designed to induce near-optimal low-rank solutions for nuclear norm regularized matrix completion and, for more general problems, induces near-optimal "well-structured" solutions. We establish computational guarantees that trade off efficiency in computing near-optimal solutions with upper bounds on the rank of iterates. We then present extensive computational results that show significant computational advantages over existing related approaches, in terms of delivering low rank and low run-time to compute a target optimality gap. In the fourth part of the thesis, we analyze boosting algorithms in linear regression from the perspective of modern first-order methods in convex optimization. We show that classic boosting algorithms in linear regression can be viewed as subgradient descent to minimize the maximum absolute correlation between features and residuals. We also propose a slightly modified boosting algorithm that yields an algorithm for the Lasso, and that computes the Lasso path. Our perspective leads to first-ever comprehensive computational guarantees for all, by Paul Grigas. Ph. D.
Published: 2017

36. Profit Maximization for Online Advertising Demand-Side Platforms

Author: Grigas, Paul, primary, Lobos, Alfonso, additional, Wen, Zheng, additional, and Lee, Kuang-chih, additional
Published: 2017
Full Text: View/download PDF

37. An Extended Frank--Wolfe Method with “In-Face” Directions, and Its Application to Low-Rank Matrix Completion

Author: Freund, Robert M., primary, Grigas, Paul, additional, and Mazumder, Rahul, additional
Published: 2017
Full Text: View/download PDF

38. New analysis and results for the Frank–Wolfe method

Author: Massachusetts Institute of Technology. Operations Research Center, Freund, Robert Michael, and Grigas, Paul Edward
Abstract: We present new results for the Frank–Wolfe method (also known as the conditional gradient method). We derive computational guarantees for arbitrary step-size sequences, which are then applied to various step-size rules, including simple averaging and constant step-sizes. We also develop step-size rules and computational guarantees that depend naturally on the warm-start quality of the initial (and subsequent) iterates. Our results include computational guarantees for both duality/bound gaps and the so-called FW gaps. Lastly, we present complexity bounds in the presence of approximate computation of gradients and/or linear optimization subproblem solutions. Funding: United States. Air Force Office of Scientific Research (AFOSR Grant No. FA9550-11-1-0141), Pontifical Catholic University of Chile (MIT-Chile-Pontificia Universidad Católica de Chile Seed Fund), National Science Foundation (U.S.) (NSF Graduate Research Fellowship No. 1122374)
Published: 2016

39. AdaBoost and Forward Stagewise Regression are First-Order Convex Optimization Methods

Author: Freund, Robert M., Grigas, Paul, and Mazumder, Rahul
Published: 2014

40. New analysis and results for the Frank–Wolfe method

Author: Freund, Robert M., primary and Grigas, Paul, additional
Published: 2014
Full Text: View/download PDF

41. New Analysis and Results for the Conditional Gradient Method

Author: Freund, Robert M. and Grigas, Paul
Published: 2013

42. New analysis and results for the Frank–Wolfe method

Author: Massachusetts Institute of Technology. Operations Research Center, Freund, Robert Michael, and Grigas, Paul Edward
Subjects: Mathematical optimization, 021103 operations research, Linear programming, General Mathematics, Computation, Numerical analysis, 0211 other engineering and technologies, Duality (optimization), 010103 numerical & computational mathematics, 02 engineering and technology, 01 natural sciences, Frank–Wolfe algorithm, Iterated function, Simple (abstract algebra), Applied mathematics, 0101 mathematics, Constant (mathematics), Software, Mathematics
Abstract: We present new results for the Frank–Wolfe method (also known as the conditional gradient method). We derive computational guarantees for arbitrary step-size sequences, which are then applied to various step-size rules, including simple averaging and constant step-sizes. We also develop step-size rules and computational guarantees that depend naturally on the warm-start quality of the initial (and subsequent) iterates. Our results include computational guarantees for both duality/bound gaps and the so-called FW gaps. Lastly, we present complexity bounds in the presence of approximate computation of gradients and/or linear optimization subproblem solutions. Funding: United States. Air Force Office of Scientific Research (AFOSR Grant No. FA9550-11-1-0141), Pontifical Catholic University of Chile (MIT-Chile-Pontificia Universidad Católica de Chile Seed Fund), National Science Foundation (U.S.) (NSF Graduate Research Fellowship No. 1122374)
Published: 2013
