Author: "Pauwels Edouard" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Pauwels Edouard"' showing total 249 results

Start Over Author "Pauwels Edouard"

249 results on '"Pauwels Edouard"'

1. A second-order-like optimizer with adaptive gradient scaling for deep learning

Author: Bolte, Jérôme, Boustany, Ryan, Pauwels, Edouard, and Purica, Andrei
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Optimization and Control
Abstract: In this empirical article, we introduce INNAprop, an optimization algorithm that combines the INNA method with the RMSprop adaptive gradient scaling. It leverages second-order information and rescaling while keeping the memory requirements of standard DL methods as AdamW or SGD with momentum. After giving geometrical insights, we evaluate INNAprop on CIFAR-10, Food101, and ImageNet with ResNets, VGG, DenseNet, and ViT, and on GPT-2 (OpenWebText) train from scratch and with LoRA fine-tuning (E2E). INNAprop consistently matches or outperforms AdamW both in training speed and accuracy, with minimal hyperparameter tuning in large-scale settings. Our code is publicly available at \url{https://github.com/innaprop/innaprop}.
Published: 2024

2. Geometric and computational hardness of bilevel programming

Author: Bolte, Jérôme, Le, Quoc-Tung, Pauwels, Edouard, and Vaiter, Samuel
Subjects: Computer Science - Computational Complexity, Mathematics - Optimization and Control
Abstract: We first show a simple but striking result in bilevel optimization: unconstrained $C^\infty$ smooth bilevel programming is as hard as general extended-real-valued lower semicontinuous minimization. We then proceed to a worst-case analysis of box-constrained bilevel polynomial optimization. We show in particular that any extended-real-valued semi-algebraic function, possibly non-continuous, can be expressed as the value function of a polynomial bilevel program. Secondly, from a computational complexity perspective, the decision version of polynomial bilevel programming is one level above NP in the polynomial hierarchy ($\Sigma^p_2$-hard). Both types of difficulties are uncommon in non-linear programs for which objective functions are typically continuous and belong to the class NP. These results highlight the irremediable hardness attached to general bilevel optimization and the necessity of imposing some form of regularity on the lower level.
Published: 2024

3. On the sequential convergence of Lloyd's algorithms

Author: Portales, Léo, Cazelles, Elsa, and Pauwels, Edouard
Subjects: Mathematics - Optimization and Control
Abstract: Lloyd's algorithm is an iterative method that solves the quantization problem, i.e. the approximation of a target probability measure by a discrete one, and is particularly used in digital applications.This algorithm can be interpreted as a gradient method on a certain quantization functional which is given by optimal transport. We study the sequential convergence (to a single accumulation point) for two variants of Lloyd's method: (i) optimal quantization with an arbitrary discrete measure and (ii) uniform quantization with a uniform discrete measure. For both cases, we prove sequential convergence of the iterates under an analiticity assumption on the density of the target measure. This includes for example analytic densities truncated to a compact semi-algebraic set. The argument leverages the log analytic nature of globally subanalytic integrals, the interpretation of Lloyd's method as a gradient method and the convergence analysis of gradient algorithms under Kurdyka-Lojasiewicz assumptions. As a by-product, we also obtain definability results for more general semi-discrete optimal transport losses such as transport distances with general costs, the max-sliced Wasserstein distance and the entropy regularized optimal transport loss.
Published: 2024

4. Derivatives of Stochastic Gradient Descent in parametric optimization

Author: Iutzeler, Franck, Pauwels, Edouard, and Vaiter, Samuel
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning
Abstract: We consider stochastic optimization problems where the objective depends on some parameter, as commonly found in hyperparameter optimization for instance. We investigate the behavior of the derivatives of the iterates of Stochastic Gradient Descent (SGD) with respect to that parameter and show that they are driven by an inexact SGD recursion on a different objective function, perturbed by the convergence of the original SGD. This enables us to establish that the derivatives of SGD converge to the derivative of the solution mapping in terms of mean squared error whenever the objective is strongly convex. Specifically, we demonstrate that with constant step-sizes, these derivatives stabilize within a noise ball centered at the solution derivative, and that with vanishing step-sizes they exhibit $O(\log(k)^2 / k)$ convergence rates. Additionally, we prove exponential convergence in the interpolation regime. Our theoretical findings are illustrated by numerical experiments on synthetic tasks.
Published: 2024

5. Inexact subgradient methods for semialgebraic functions

Author: Bolte, Jérôme, Le, Tam, Moulines, Éric, and Pauwels, Edouard
Subjects: Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: Motivated by the widespread use of approximate derivatives in machine learning and optimization, we study inexact subgradient methods with non-vanishing additive errors and step sizes. In the nonconvex semialgebraic setting, under boundedness assumptions, we prove that the method provides points that eventually fluctuate close to the critical set at a distance proportional to $\epsilon^\rho$ where $\epsilon$ is the error in subgradient evaluation and $\rho$ relates to the geometry of the problem. In the convex setting, we provide complexity results for the averaged values. We also obtain byproducts of independent interest, such as descent-like lemmas for nonsmooth nonconvex problems and some results on the limit of affine interpolants of differential inclusions.
Published: 2024

6. A note on stationarity in constrained optimization

Author: Pauwels, Edouard
Subjects: Mathematics - Optimization and Control
Abstract: Minimizing a smooth function f on a closed subset C leads to different notions of stationarity: Fr{\'e}chet stationarity, which carries a strong variational meaning, and criticality, which is defined through a closure process and involves the notion of limiting, or Mordukovitch, subdifferential. The latter is an optimality condition which may loose the variational meaning of Fr{\'e}chet stationarity in some settings. The purpose of this note is to illustrate that, while criticality is the appropriate notion in full generality, Fr{\'e}chet stationarity is typical in practical scenarios.We gather two results to illustrate this phenomenon. These results are essentially known and, our goal is to provide consize self contained arguments in the constrained optimization setting. First we show that if C is semi-algebraic, then for a generic smooth semi-algebraic function f , all critical points of f on C are actually Fr{\'e}chet stationary. Second we prove that for small step-sizes, all the accumulation points of the projected gradient algorithm are Fr{\'e}chet stationary, with an explicit global quadratic estimate of the remainder, avoiding potential critical points that are not Fr{\'e}chet stationary, and some bad local minima.
Published: 2024

7. One-step differentiation of iterative algorithms

Author: Bolte, Jérôme, Pauwels, Edouard, and Vaiter, Samuel
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning
Abstract: In appropriate frameworks, automatic differentiation is transparent to the user at the cost of being a significant computational burden when the number of operations is large. For iterative algorithms, implicit differentiation alleviates this issue but requires custom implementation of Jacobian evaluation. In this paper, we study one-step differentiation, also known as Jacobian-free backpropagation, a method as easy as automatic differentiation and as performant as implicit differentiation for fast algorithms (e.g., superlinear optimization methods). We provide a complete theoretical approximation analysis with specific examples (Newton's method, gradient descent) along with its consequences in bilevel optimization. Several numerical examples illustrate the well-foundness of the one-step estimator.
Published: 2023

8. On the nature of Bregman functions

Author: Pauwels, Edouard
Subjects: Mathematics - Optimization and Control
Abstract: Let C be convex, compact, with nonempty interior and h be Legendre with domain C, continuous on C. We prove that h is Bregman if and only if it is strictly convex on C and C is a polytope. This provides insights on sequential convergence of many Bregman divergence based algorithm: abstract compatibility conditions between Bregman and Euclidean topology may equivalently be replaced by explicit conditions on h and C. This also emphasizes that a general convergence theory for these methods (beyond polyhedral domains) would require more refinements than Bregman's conditions.
Published: 2023

9. Differentiating Nonsmooth Solutions to Parametric Monotone Inclusion Problems

Author: Bolte, Jérôme, Pauwels, Edouard, and Silveti-Falls, Antonio
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: We leverage path differentiability and a recent result on nonsmooth implicit differentiation calculus to give sufficient conditions ensuring that the solution to a monotone inclusion problem will be path differentiable, with formulas for computing its generalized gradient. A direct consequence of our result is that these solutions happen to be differentiable almost everywhere. Our approach is fully compatible with automatic differentiation and comes with assumptions which are easy to check, roughly speaking: semialgebraicity and strong monotonicity. We illustrate the scope of our results by considering three fundamental composite problem settings: strongly convex problems, dual solutions to convex minimization problems and primal-dual solutions to min-max problems.
Published: 2022

10. The derivatives of Sinkhorn-Knopp converge

Author: Pauwels, Edouard and Vaiter, Samuel
Subjects: Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: We show that the derivatives of the Sinkhorn-Knopp algorithm, or iterative proportional fitting procedure, converge towards the derivatives of the entropic regularization of the optimal transport problem with a locally uniform linear convergence rate.
Published: 2022

11. On the complexity of nonsmooth automatic differentiation

Author: Bolte, Jérôme, Boustany, Ryan, Pauwels, Edouard, and Pesquet-Popescu, Béatrice
Subjects: Mathematics - Numerical Analysis, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: Using the notion of conservative gradient, we provide a simple model to estimate the computational costs of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs. The overhead complexity of the backward mode turns out to be independent of the dimension when using programs with locally Lipschitz semi-algebraic or definable elementary functions. This considerably extends Baur-Strassen's smooth cheap gradient principle. We illustrate our results by establishing fast backpropagation results of conservative gradients through feedforward neural networks with standard activation and loss functions. Nonsmooth backpropagation's cheapness contrasts with concurrent forward approaches, which have, to this day, dimensional-dependent worst-case overhead estimates. We provide further results suggesting the superiority of backward propagation of conservative gradients. Indeed, we relate the complexity of computing a large number of directional derivatives to that of matrix multiplication, and we show that finding two subgradients in the Clarke subdifferential of a function is an NP-hard problem.
Published: 2022

12. Automatic differentiation of nonsmooth iterative algorithms

Author: Bolte, Jérôme, Pauwels, Edouard, and Vaiter, Samuel
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning
Abstract: Differentiation along algorithms, i.e., piggyback propagation of derivatives, is now routinely used to differentiate iterative solvers in differentiable programming. Asymptotics is well understood for many smooth problems but the nondifferentiable case is hardly considered. Is there a limiting object for nonsmooth piggyback automatic differentiation (AD)? Does it have any variational meaning and can it be used effectively in machine learning? Is there a connection with classical derivative? All these questions are addressed under appropriate nonexpansivity conditions in the framework of conservative derivatives which has proved useful in understanding nonsmooth AD. For nonsmooth piggyback iterations, we characterize the attractor set of nonsmooth piggyback iterations as a set-valued fixed point which remains in the conservative framework. This has various consequences and in particular almost everywhere convergence of classical derivatives. Our results are illustrated on parametric convex optimization problems with forward-backward, Douglas-Rachford and Alternating Direction of Multiplier algorithms as well as the Heavy-Ball method.
Published: 2022

13. Subgradient sampling for nonsmooth nonconvex minimization

Author: Bolte, Jérôme, Le, Tam, and Pauwels, Edouard
Subjects: Mathematics - Optimization and Control
Abstract: Risk minimization for nonsmooth nonconvex problems naturally leads to first-order sampling or, by an abuse of terminology, to stochastic subgradient descent. We establish the convergence of this method in the path-differentiable case and describe more precise results under additional geometric assumptions. We recover and improve results from Ermoliev and Norkin [Cybern. Syst. Anal., 34 (1998), pp. 196--215] by using a different approach: conservative calculus and the ODE method. In the definable case, we show that first-order subgradient sampling avoids artificial critical points with probability one and applies moreover to a large range of risk minimization problems in deep learning, based on the backpropagation oracle. As byproducts of our approach, we obtain several results on integration of independent interest, such as an interchange result for conservative derivatives and integrals or the definability of set-valued parameterized integrals.
Published: 2022
Full Text: View/download PDF

14. The Iterates of the Frank-Wolfe Algorithm May Not Converge

Author: Bolte, Jérôme, Combettes, Cyrille W., and Pauwels, Édouard
Subjects: Mathematics - Optimization and Control
Abstract: The Frank-Wolfe algorithm is a popular method for minimizing a smooth convex function $f$ over a compact convex set $\mathcal{C}$. While many convergence results have been derived in terms of function values, hardly nothing is known about the convergence behavior of the sequence of iterates $(x_t)_{t\in\mathbb{N}}$. Under the usual assumptions, we design several counterexamples to the convergence of $(x_t)_{t\in\mathbb{N}}$, where $f$ is $d$-time continuously differentiable, $d\geq2$, and $f(x_t)\to\min_\mathcal{C}f$. Our counterexamples cover the cases of open-loop, closed-loop, and line-search step-size strategies. We do not assume \emph{misspecification} of the linear minimization oracle and our results thus hold regardless of the points it returns, demonstrating the fundamental pathologies in the convergence behavior of $(x_t)_{t\in\mathbb{N}}$., Comment: 15 pages, 7 figures
Published: 2022

15. Path differentiability of ODE flows

Author: Marx, Swann and Pauwels, Edouard
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: We consider flows of ordinary differential equations (ODEs) driven by path differentiable vector fields. Path differentiable functions constitute a proper subclass of Lipschitz functions which admit conservative gradients, a notion of generalized derivative compatible with basic calculus rules. Our main result states that such flows inherit the path differentiability property of the driving vector field. We show indeed that forward propagation of derivatives given by the sensitivity differential inclusions provide a conservative Jacobian for the flow. This allows to propose a nonsmooth version of the adjoint method, which can be applied to integral costs under an ODE constraint. This result constitutes a theoretical ground to the application of small step first order methods to solve a broad class of nonsmooth optimization problems with parametrized ODE constraints. This is illustrated with the convergence of small step first order methods based on the proposed nonsmooth adjoint.
Published: 2022

16. Numerical influence of ReLU'(0) on backpropagation

Author: Bertoin, David, Bolte, Jérôme, Gerchinovitz, Sébastien, and Pauwels, Edouard
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: In theory, the choice of ReLU(0) in [0, 1] for a neural network has a negligible influence both on backpropagation and training. Yet, in the real world, 32 bits default precision combined with the size of deep learning problems makes it a hyperparameter of training methods. We investigate the importance of the value of ReLU'(0) for several precision levels (16, 32, 64 bits), on various networks (fully connected, VGG, ResNet) and datasets (MNIST, CIFAR10, SVHN, ImageNet). We observe considerable variations of backpropagation outputs which occur around half of the time in 32 bits precision. The effect disappears with double precision, while it is systematic at 16 bits. For vanilla SGD training, the choice ReLU'(0) = 0 seems to be the most efficient. For our experiments on ImageNet the gain in test accuracy over ReLU'(0) = 1 was more than 10 points (two runs). We also evidence that reconditioning approaches as batch-norm or ADAM tend to buffer the influence of ReLU'(0)'s value. Overall, the message we convey is that algorithmic differentiation of nonsmooth problems potentially hides parameters that could be tuned advantageously.
Published: 2021

17. Nonsmooth Implicit Differentiation for Machine Learning and Optimization

Author: Bolte, Jérôme, Le, Tam, Pauwels, Edouard, and Silveti-Falls, Antonio
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Optimization and Control
Abstract: In view of training increasingly complex learning architectures, we establish a nonsmooth implicit function theorem with an operational calculus. Our result applies to most practical problems (i.e., definable problems) provided that a nonsmooth form of the classical invertibility condition is fulfilled. This approach allows for formal subdifferentiation: for instance, replacing derivatives by Clarke Jacobians in the usual differentiation formulas is fully justified for a wide class of nonsmooth problems. Moreover this calculus is entirely compatible with algorithmic differentiation (e.g., backpropagation). We provide several applications such as training deep equilibrium networks, training neural nets with conic optimization layers, or hyperparameter-tuning for nonsmooth Lasso-type models. To show the sharpness of our assumptions, we present numerical experiments showcasing the extremely pathological gradient dynamics one can encounter when applying implicit algorithmic differentiation without any hypothesis.
Published: 2021

18. Semialgebraic Representation of Monotone Deep Equilibrium Models and Applications to Certification

Author: Chen, Tong, Lasserre, Jean-Bernard, Magron, Victor, and Pauwels, Edouard
Subjects: Mathematics - Optimization and Control
Abstract: Deep equilibrium models are based on implicitly defined functional relations and have shown competitive performance compared with the traditional deep networks. Monotone operator equilibrium networks (monDEQ) retain interesting performance with additional theoretical guaranties. Existing certification tools for classical deep networks cannot directly be applied to monDEQs for which much fewer tools exist. We introduce a semialgebraic representation for ReLU based monDEQs which allows to approximate the corresponding input output relation by semidefinite programming (SDP). We present several applications to network certification and obtain SDP models for the following problems : robustness certification, Lipschitz constant estimation, ellipsoidal uncertainty propagation. We use these models to certify robustness of monDEQs w.r.t. a general $L_q$ norm. Experimental results show that the proposed models outperform existing approaches for monDEQ certification. Furthermore, our investigations suggest that monDEQs are much more robust to $L_2$ perturbations than $L_{\infty}$ perturbations., Comment: 16 pages, 4 tables, 2 figures
Published: 2021

19. Conservative parametric optimality and the ridge method for tame min-max problems

Author: Pauwels, Edouard
Subjects: Mathematics - Optimization and Control
Abstract: We study the ridge method for min-max problems, and investigate its convergence without any convexity, differentiability or qualification assumption. The central issue is to determine whether the ''parametric optimality formula'' provides a conservative field, a notion of generalized derivative well suited for optimization. The answer to this question is positive in a semi-algebraic, and more generally definable, context. The proof involves a new characterization of definable conservative fields which is of independent interest. As a consequence, the ridge method applied to definable objectives is proved to have a minimizing behavior and to converge to a set of equilibria which satisfy an optimality condition. Definability is key to our proof: we show that for a more general class of nonsmooth functions, conservativity of the parametric optimality formula may fail, resulting in an absurd behavior of the ridge method.
Published: 2021
Full Text: View/download PDF

20. Second-order step-size tuning of SGD for non-convex optimization

Author: Castera, Camille, Bolte, Jérôme, Févotte, Cédric, and Pauwels, Edouard
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: In view of a direct and simple improvement of vanilla SGD, this paper presents a fine-tuning of its step-sizes in the mini-batch case. For doing so, one estimates curvature, based on a local quadratic model and using only noisy gradient approximations. One obtains a new stochastic first-order method (Step-Tuned SGD), enhanced by second-order information, which can be seen as a stochastic version of the classical Barzilai-Borwein method. Our theoretical results ensure almost sure convergence to the critical set and we provide convergence rates. Experiments on deep residual network training illustrate the favorable properties of our approach. For such networks we observe, during training, both a sudden drop of the loss and an improvement of test accuracy at medium stages, yielding better results than SGD, RMSprop, or ADAM., Comment: To appear in Neural Processing Letters (accepted Nov. 2021)
Published: 2021
Full Text: View/download PDF

21. A Sublevel Moment-SOS Hierarchy for Polynomial Optimization

Author: Chen, Tong, Lasserre, Jean-Bernard, Magron, Victor, and Pauwels, Edouard
Subjects: Mathematics - Optimization and Control
Abstract: We introduce a sublevel Moment-SOS hierarchy where each SDP relaxation can be viewed as an intermediate (or interpolation) between the d-th and (d+1)-th order SDP relaxations of the Moment-SOS hierarchy (dense or sparse version). With the flexible choice of determining the size (level) and number (depth) of subsets in the SDP relaxation, one is able to obtain different improvements compared to the d-th order relaxation, based on the machine memory capacity. In particular, we provide numerical experiments for d=1 and various types of problems both in combinatorial optimization (Max-Cut, Mixed Integer Programming) and deep learning (robustness certification, Lipschitz constant of neural networks), where the standard Lasserre's relaxation (or its sparse variant) is computationally intractable. In our numerical results, the lower bounds from the sublevel relaxations improve the bound from Shor's relaxation (first order Lasserre's relaxation) and are significantly closer to the optimal value or to the best-known lower/upper bounds., Comment: 25 pages, 13 tables
Published: 2021

22. Sequential convergence of AdaGrad algorithm for smooth convex optimization

Author: Traoré, Cheik and Pauwels, Edouard
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning
Abstract: We prove that the iterates produced by, either the scalar step size variant, or the coordinatewise variant of AdaGrad algorithm, are convergent sequences when applied to convex objective functions with Lipschitz gradient. The key insight is to remark that such AdaGrad sequences satisfy a variable metric quasi-Fej\'er monotonicity property, which allows to prove convergence., Comment: 9 pages
Published: 2020

23. A H\'olderian backtracking method for min-max and min-min problems

Author: Bolte, Jérôme, Glaudin, Lilian, Pauwels, Edouard, and Serrurier, Mathieu
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: We present a new algorithm to solve min-max or min-min problems out of the convex world. We use rigidity assumptions, ubiquitous in learning, making our method applicable to many optimization problems. Our approach takes advantage of hidden regularity properties and allows us to devise a simple algorithm of ridge type. An original feature of our method is to come with automatic step size adaptation which departs from the usual overly cautious backtracking methods. In a general framework, we provide convergence theoretical guarantees and rates. We apply our findings on simple GAN problems obtaining promising numerical results.
Published: 2020

24. Incremental Without Replacement Sampling in Nonconvex Optimization

Author: Pauwels, Edouard
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: Minibatch decomposition methods for empirical risk minimization are commonly analysed in a stochastic approximation setting, also known as sampling with replacement. On the other hands modern implementations of such techniques are incremental: they rely on sampling without replacement, for which available analysis are much scarcer. We provide convergence guaranties for the latter variant by analysing a versatile incremental gradient scheme. For this scheme, we consider constant, decreasing or adaptive step sizes. In the smooth setting we obtain explicit complexity estimates in terms of epoch counter. In the nonsmooth setting we prove that the sequence is attracted by solutions of optimality conditions of the problem., Comment: Journal of Optimization Theory and Applications, 2021
Published: 2020
Full Text: View/download PDF

25. A mathematical model for automatic differentiation in machine learning

Author: Bolte, Jerome and Pauwels, Edouard
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: Automatic differentiation, as implemented today, does not have a simple mathematical model adapted to the needs of modern machine learning. In this work we articulate the relationships between differentiation of programs as implemented in practice and differentiation of nonsmooth functions. To this end we provide a simple class of functions, a nonsmooth calculus, and show how they apply to stochastic approximation methods. We also evidence the issue of artificial critical points created by algorithmic differentiation and show how usual methods avoid these points with probability one.
Published: 2020

26. Long term dynamics of the subgradient method for Lipschitz path differentiable functions

Author: Bolte, Jerome, Pauwels, Edouard, and Rios-Zertuche, Rodolfo
Subjects: Mathematics - Optimization and Control, Mathematics - Dynamical Systems, Mathematics - Numerical Analysis, 65K10 (Primary), 37A50, 37B35, 62M45 (Secondary), G.1.6, I.2.6
Abstract: We consider the long-term dynamics of the vanishing stepsize subgradient method in the case when the objective function is neither smooth nor convex. We assume that this function is locally Lipschitz and path differentiable, i.e., admits a chain rule. Our study departs from other works in the sense that we focus on the behavoir of the oscillations, and to do this we use closed measures. We recover known convergence results, establish new ones, and show a local principle of oscillation compensation for the velocities. Roughly speaking, the time average of gradients around one limit point vanishes. This allows us to further analyze the structure of oscillations, and establish their perpendicularity to the general drift., Comment: 28 pages, 2 figures
Published: 2020

27. Semialgebraic Optimization for Lipschitz Constants of ReLU Networks

Author: Chen, Tong, Lasserre, Jean-Bernard, Magron, Victor, and Pauwels, Edouard
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning
Abstract: The Lipschitz constant of a network plays an important role in many applications of deep learning, such as robustness certification and Wasserstein Generative Adversarial Network. We introduce a semidefinite programming hierarchy to estimate the global and local Lipschitz constant of a multiple layer deep neural network. The novelty is to combine a polynomial lifting for ReLU functions derivatives with a weak generalization of Putinar's positivity certificate. This idea could also apply to other, nearly sparse, polynomial optimization problems in machine learning. We empirically demonstrate that our method provides a trade-off with respect to state of the art linear programming approach, and in some cases we obtain better bounds in less time., Comment: NeurIPS 2020
Published: 2020

28. Predicting drug side-effect profiles: a chemical fragment-based approach

Author: Stoven Véronique, Pauwels Edouard, and Yamanishi Yoshihiro
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Abstract Background Drug side-effects, or adverse drug reactions, have become a major public health concern. It is one of the main causes of failure in the process of drug development, and of drug withdrawal once they have reached the market. Therefore, in silico prediction of potential side-effects early in the drug discovery process, before reaching the clinical stages, is of great interest to improve this long and expensive process and to provide new efficient and safe therapies for patients. Results In the present work, we propose a new method to predict potential side-effects of drug candidate molecules based on their chemical structures, applicable on large molecular databanks. A unique feature of the proposed method is its ability to extract correlated sets of chemical substructures (or chemical fragments) and side-effects. This is made possible using sparse canonical correlation analysis (SCCA). In the results, we show the usefulness of the proposed method by predicting 1385 side-effects in the SIDER database from the chemical structures of 888 approved drugs. These predictions are performed with simultaneous extraction of correlated ensembles formed by a set of chemical substructures shared by drugs that are likely to have a set of side-effects. We also conduct a comprehensive side-effect prediction for many uncharacterized drug molecules stored in DrugBank, and were able to confirm interesting predictions using independent source of information. Conclusions The proposed method is expected to be useful in various stages of the drug development process.
Published: 2011
Full Text: View/download PDF

29. Data Analysis from Empirical Moments and the Christoffel Function

Author: Pauwels, Edouard, Putinar, Mihai, and Lasserre, Jean-Bernard
Subjects: Christoffel-Darboux kernel, Empirical measure, Support inference, Manifold, Density estimation, Reproducing kernel Hilbert spaces, stat.ML, Mathematical Sciences, Information and Computing Sciences, Numerical & Computational Mathematics
Abstract: Spectral features of the empirical moment matrix constitute a resourcefultool for unveiling properties of a cloud of points, among which, density,support and latent structures. It is already well known that the empiricalmoment matrix encodes a great deal of subtle attributes of the underlyingmeasure. Starting from this object as base of observations we combine ideasfrom statistics, real algebraic geometry, orthogonal polynomials andapproximation theory for opening new insights relevant for Machine Learning(ML) problems with data supported on singular sets. Refined concepts andresults from real algebraic geometry and approximation theory are empowering asimple tool (the empirical moment matrix) for the task of solving non-trivialquestions in data analysis. We provide (1) theoretical support, (2) numericalexperiments and, (3) connections to real world data as a validation of thestamina of the empirical moment matrix approach.
Published: 2021

30. Curiosities and counterexamples in smooth convex optimization

Author: Bolte, Jerome and Pauwels, Edouard
Subjects: Mathematics - Optimization and Control
Abstract: Counterexamples to some old-standing optimization problems in the smooth convex coercive setting are provided. We show that block-coordinate, steepest descent with exact search or Bregman descent methods do not generally converge. Other failures of various desirable features are established: directional convergence of Cauchy's gradient curves, convergence of Newton's flow, finite length of Tikhonov path, convergence of central paths, or smooth Kurdyka-Lojasiewicz inequality. All examples are planar. These examples are based on general smooth convex interpolation results. Given a decreasing sequence of positively curved C k convex compact sets in the plane, we provide a level set interpolation of a C k smooth convex function where k $\ge$ 2 is arbitrary. If the intersection is reduced to one point our interpolant has positive definite Hessian, otherwise it is positive definite out of the solution set. Furthermore , given a sequence of decreasing polygons we provide an interpolant agreeing with the vertices and whose gradients coincide with prescribed normals.
Published: 2020

31. Rate of convergence for geometric inference based on the empirical Christoffel function

Author: Vu, Mai Trang, Bachoc, François, and Pauwels, Edouard
Subjects: Mathematics - Statistics Theory, Computer Science - Machine Learning
Abstract: We consider the problem of estimating the support of a measure from a finite, independent, sample. The estimators which are considered are constructed based on the empirical Christoffel function. Such estimators have been proposed for the problem of set estimation with heuristic justifications. We carry out a detailed finite sample analysis, that allows us to select the threshold and degree parameters as a function of the sample size. We provide a convergence rate analysis of the resulting support estimation procedure. Our analysis establishes that we may obtain finite sample bounds which are comparable to existing rates for different set estimation procedures. Our results rely on concentration inequalities for the empirical Christoffel function and on estimates of the supremum of the Christoffel-Darboux kernel on sets with smooth boundaries, that can be considered of independent interest.
Published: 2019

32. Conservative set valued fields, automatic differentiation, stochastic gradient method and deep learning

Author: Bolte, Jérôme and Pauwels, Edouard
Subjects: Mathematics - Optimization and Control, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Modern problems in AI or in numerical analysis require nonsmooth approaches with a flexible calculus. We introduce generalized derivatives called conservative fields for which we develop a calculus and provide representation formulas. Functions having a conservative field are called path differentiable: convex, concave, Clarke regular and any semialgebraic Lipschitz continuous functions are path differentiable. Using Whitney stratification techniques for semialgebraic and definable sets, our model provides variational formulas for nonsmooth automatic differentiation oracles, as for instance the famous backpropagation algorithm in deep learning. Our differential model is applied to establish the convergence in values of nonsmooth stochastic gradient methods as they are implemented in practice., Comment: Corrected typos
Published: 2019

33. An Inertial Newton Algorithm for Deep Learning

Author: Castera, Camille, Bolte, Jérôme, Févotte, Cédric, and Pauwels, Edouard
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: We introduce a new second-order inertial optimization method for machine learning called INNA. It exploits the geometry of the loss function while only requiring stochastic approximations of the function values and the generalized gradients. This makes INNA fully implementable and adapted to large-scale optimization problems such as the training of deep neural networks. The algorithm combines both gradient-descent and Newton-like behaviors as well as inertia. We prove the convergence of INNA for most deep learning problems. To do so, we provide a well-suited framework to analyze deep learning loss functions involving tame optimization in which we study a continuous dynamical system together with its discrete stochastic approximations. We prove sublinear convergence for the continuous-time differential inclusion which underlies our algorithm. Additionally, we also show how standard optimization mini-batch methods applied to non-smooth non-convex problems can yield a certain type of spurious stationary points never discussed before. We address this issue by providing a theoretical framework around the new idea of $D$-criticality; we then give a simple asymptotic analysis of INNA. Our algorithm allows for using an aggressive learning rate of $o(1/\log k)$. From an empirical viewpoint, we show that INNA returns competitive results with respect to state of the art (stochastic gradient descent, ADAGRAD, ADAM) on popular deep learning benchmark problems., Comment: To appear in Journal of Machine Learning Research (JMLR), Volume 22, acceptance date: 5/21
Published: 2019

34. Semi-algebraic approximation using Christoffel-Darboux kernel

Author: Marx, Swann, Pauwels, Edouard, Weisser, Tillmann, Henrion, Didier, and Lasserre, Jean
Subjects: Mathematics - Optimization and Control
Abstract: We provide a new method to approximate a (possibly discontinuous) function using Christoffel-Darboux kernels. Our knowledge about the unknown multivariate function is in terms of finitely many moments of the Young measure supported on the graph of the function. Such an input is available when approximating weak (or measure-valued) solution of optimal control problems, entropy solutions to non-linear hyperbolic PDEs, or using numerical integration from finitely many evaluations of the function. While most of the existing methods construct a piecewise polynomial approximation, we construct a semi-algebraic approximation whose estimation and evaluation can be performed efficiently. An appealing feature of this method is that it deals with non-smoothness implicitly so that a single scheme can be used to treat smooth or non-smooth functions without any prior knowledge. On the theoretical side, we prove pointwise convergence almost everywhere as well as convergence in the Lebesgue one norm under broad assumptions. Using more restrictive assumptions, we obtain explicit convergence rates. We illustrate our approach on various examples from control and approximation. In particular we observe empirically that our method does not suffer from the the Gibbs phenomenon when approximating discontinuous functions.
Published: 2019

35. Conservative Parametric Optimality and the Ridge Method for Tame Min-Max Problems

Author: Pauwels, Edouard
Published: 2023
Full Text: View/download PDF

36. Curiosities and counterexamples in smooth convex optimization

Author: Bolte, Jérôme and Pauwels, Edouard
Published: 2022
Full Text: View/download PDF

37. Data analysis from empirical moments and the Christoffel function

Author: Pauwels, Edouard, Putinar, Mihai, and Lasserre, Jean-Bernard
Subjects: Statistics - Machine Learning
Abstract: Spectral features of the empirical moment matrix constitute a resourceful tool for unveiling properties of a cloud of points, among which, density, support and latent structures. It is already well known that the empirical moment matrix encodes a great deal of subtle attributes of the underlying measure. Starting from this object as base of observations we combine ideas from statistics, real algebraic geometry, orthogonal polynomials and approximation theory for opening new insights relevant for Machine Learning (ML) problems with data supported on singular sets. Refined concepts and results from real algebraic geometry and approximation theory are empowering a simple tool (the empirical moment matrix) for the task of solving non-trivial questions in data analysis. We provide (1) theoretical support, (2) numerical experiments and, (3) connections to real world data as a validation of the stamina of the empirical moment matrix approach.
Published: 2018

38. An unexpected connection between Bayes $A-$optimal designs and the Group Lasso

Author: Sagnol, Guillaume and Pauwels, Edouard
Subjects: Mathematics - Optimization and Control, 62K05
Abstract: We show that the $A$-optimal design optimization problem over $m$ design points in $\mathbb{R}^n$ is equivalent to minimizing a quadratic function plus a group lasso sparsity inducing term over $n\times m$ real matrices. This observation allows to describe several new algorithms for $A$-optimal design based on splitting and block coordinate decomposition. These techniques are well known and proved powerful to treat large scale problems in machine learning and signal processing communities. The proposed algorithms come with rigorous convergence guaranties and convergence rate estimate stemming from the optimization literature. Performances are illustrated on synthetic benchmarks and compared to existing methods for solving the optimal design problem.
Published: 2018

39. Relating Leverage Scores and Density using Regularized Christoffel Functions

Author: Pauwels, Edouard, Bach, Francis, and Vert, Jean-Philippe
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Statistical leverage scores emerged as a fundamental tool for matrix sketching and column sampling with applications to low rank approximation, regression, random feature learning and quadrature. Yet, the very nature of this quantity is barely understood. Borrowing ideas from the orthogonal polynomial literature, we introduce the regularized Christoffel function associated to a positive definite kernel. This uncovers a variational formulation for leverage scores for kernel methods and allows to elucidate their relationships with the chosen kernel as well as population density. Our main result quantitatively describes a decreasing relation between leverage score and population density for a broad class of kernels on Euclidean spaces. Numerical simulations support our findings.
Published: 2018

40. Second-Order Step-Size Tuning of SGD for Non-Convex Optimization

Author: Castera, Camille, Bolte, Jérôme, Févotte, Cédric, and Pauwels, Edouard
Published: 2022
Full Text: View/download PDF

41. The multiproximal linearization method for convex composite problems

Author: Bolte, Jérôme, Chen, Zheng, and Pauwels, Edouard
Subjects: Mathematics - Optimization and Control
Abstract: Composite minimization involves a collection of smooth functions which are aggregated in a nonsmooth manner. In the convex setting, we design an algorithm by linearizing each smooth component in accordance with its main curvature. The resulting method, called the Multiprox method, consists in solving successively simple problems (e.g. constrained quadratic problems) which can also feature some proximal operators. To study the complexity and the convergence of this method we are led to study quantitative qualification conditions to understand the impact of multipliers on the complexity bounds. We obtain explicit complexity results of the form $O(\frac{1}{k})$ involving new types of constant terms. A distinctive feature of our approach is to be able to cope with oracles involving moving constraints. Our method is flexible enough to include the moving balls method, the proximal Gauss-Newton's method, or the forward-backward splitting, for which we recover known complexity results or establish new ones. We show through several numerical experiments how the use of multiple proximal terms can be decisive for problems with complex geometries.
Published: 2017

42. Qualification Conditions in Semi-algebraic Programming

Author: Bolte, Jérôme, Hochart, Antoine, and Pauwels, Edouard
Subjects: Mathematics - Optimization and Control, 26D10, 32B20, 49K24, 49J52, 37B35, 14P15
Abstract: For an arbitrary finite family of semi-algebraic/definable functions, we consider the corresponding inequality constraint set and we study qualification conditions for perturbations of this set. In particular we prove that all positive diagonal perturbations, save perhaps a finite number of them, ensure that any point within the feasible set satisfies Mangasarian-Fromovitz constraint qualification. Using the Milnor-Thom theorem, we provide a bound for the number of singular perturbations when the constraints are polynomial functions. Examples show that the order of magnitude of our exponential bound is relevant. Our perturbation approach provides a simple protocol to build sequences of "regular" problems approximating an arbitrary semi-algebraic/definable problem. Applications to sequential quadratic programming methods and sum of squares relaxation are provided.
Published: 2017

43. On Fienup Methods for Regularized Phase Retrieval

Author: Pauwels, Edouard, Beck, Amir, Eldar, Yonina C., and Sabach, Shoham
Subjects: Computer Science - Information Theory, Mathematics - Optimization and Control
Abstract: Alternating minimization, or Fienup methods, have a long history in phase retrieval. We provide new insights related to the empirical and theoretical analysis of these algorithms when used with Fourier measurements and combined with convex priors. In particular, we show that Fienup methods can be viewed as performing alternating minimization on a regularized nonconvex least-squares problem with respect to amplitude measurements. We then prove that under mild additional structural assumptions on the prior (semi-algebraicity), the sequence of signal estimates has a smooth convergent behaviour towards a critical point of the nonconvex regularized least-squares objective. Finally, we propose an extension to Fienup techniques, based on a projected gradient descent interpretation and acceleration using inertial terms. We demonstrate experimentally that this modification combined with an $\ell_1$ prior constitutes a competitive approach for sparse phase retrieval.
Published: 2017
Full Text: View/download PDF

44. The Iterates of the Frank–Wolfe Algorithm May Not Converge.

Author: Bolte, Jérôme, Combettes, Cyrille W., and Pauwels, Edouard
Subjects: CONVEX sets, SMOOTHNESS of functions, AIR forces, CONVEX functions, CHESS
Abstract: The Frank–Wolfe algorithm is a popular method for minimizing a smooth convex function f over a compact convex set C. Whereas many convergence results have been derived in terms of function values, almost nothing is known about the convergence behavior of the sequence of iterates (xt)t∈N. Under the usual assumptions, we design several counterexamples to the convergence of (xt)t∈N , where f is d-time continuously differentiable, d⩾2 , and f(xt)→minC f. Our counterexamples cover the cases of open-loop, closed-loop, and line-search step-size strategies and work for any choice of the linear minimization oracle, thus demonstrating the fundamental pathologies in the convergence behavior of (xt)t∈N. Funding: The authors acknowledge the support of the AI Interdisciplinary Institute ANITI funding through the French "Investments for the Future – PIA3" program under the Agence Nationale de la Recherche (ANR) agreement [Grant ANR-19-PI3A0004], the Air Force Office of Scientific Research, Air Force Material Command, U.S. Air Force [Grants FA866-22-1-7012 and ANR MaSDOL 19-CE23-0017-0], ANR Chess [Grant ANR-17-EURE-0010], ANR Regulia, and Centre Lagrange. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

45. The Christoffel-Darboux Kernel for Data Analysis

Author: Lasserre, Jean Bernard, Pauwels, Edouard, and Putinar, Mihai
Published: 2022
Full Text: View/download PDF

46. A sublevel moment-SOS hierarchy for polynomial optimization

Author: Chen, Tong, Lasserre, Jean-Bernard, Magron, Victor, and Pauwels, Edouard
Published: 2022
Full Text: View/download PDF

47. The empirical Christoffel function with applications in data analysis

Author: Lasserre, Jean-Bernard and Pauwels, Edouard
Subjects: Computer Science - Machine Learning
Abstract: We illustrate the potential applications in machine learning of the Christoffel function, or more precisely, its empirical counterpart associated with a counting measure uniformly supported on a finite set of points. Firstly, we provide a thresholding scheme which allows to approximate the support of a measure from a finite subset of its moments with strong asymptotic guaranties. Secondly, we provide a consistency result which relates the empirical Christoffel function and its population counterpart in the limit of large samples. Finally, we illustrate the relevance of our results on simulated and real world datasets for several applications in statistics and machine learning: (a) density and support estimation from finite samples, (b) outlier and novelty detection and (c) affine matching.
Published: 2017

48. Semi-algebraic Approximation Using Christoffel–Darboux Kernel

Author: Marx, Swann, Pauwels, Edouard, Weisser, Tillmann, Henrion, Didier, and Lasserre, Jean Bernard
Published: 2021
Full Text: View/download PDF

49. Extragradient Method in Optimization: Convergence and Complexity

Author: Nguyen, Trong Phong, Pauwels, Edouard, Richard, Emile, and Suter, Bruce W.
Subjects: Mathematics - Optimization and Control
Abstract: We consider the extragradient method to minimize the sum of two functions, the first one being smooth and the second being convex. Under the Kurdyka-Lojasiewicz assumption, we prove that the sequence produced by the extragradient method converges to a critical point of the problem and has finite length. The analysis is extended to the case when both functions are convex. We provide, in this case, a sublinear convergence rate, as for gradient-based methods. Furthermore, we show that the recent small-prox complexity result can be applied to this method. Considering the extragradient method is an occasion to describe an exact line search scheme for proximal decomposition methods. We provide details for the implementation of this scheme for the one norm regularized least squares problem and demonstrate numerical results which suggest that combining nonaccelerated methods with exact line search can be a competitive choice.
Published: 2016

50. Sorting out typicality with the inverse moment matrix SOS polynomial

Author: Lasserre, Jean-Bernard and Pauwels, Edouard
Subjects: Computer Science - Learning
Abstract: We study a surprising phenomenon related to the representation of a cloud of data points using polynomials. We start with the previously unnoticed empirical observation that, given a collection (a cloud) of data points, the sublevel sets of a certain distinguished polynomial capture the shape of the cloud very accurately. This distinguished polynomial is a sum-of-squares (SOS) derived in a simple manner from the inverse of the empirical moment matrix. In fact, this SOS polynomial is directly related to orthogonal polynomials and the Christoffel function. This allows to generalize and interpret extremality properties of orthogonal polynomials and to provide a mathematical rationale for the observed phenomenon. Among diverse potential applications, we illustrate the relevance of our results on a network intrusion detection task for which we obtain performances similar to existing dedicated methods reported in the literature.
Published: 2016

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

249 results on '"Pauwels Edouard"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources