Author: "Loizou, Nicolas" / Database: OpenAIRE - Searchworks@Jio Institute Digital Library Search Results

Search keyword: "Loizou, Nicolas" (24 results in total)

1. Locally Adaptive Federated Learning via Stochastic Polyak Stepsizes

Author: Mukherjee, Sohom, Loizou, Nicolas, and Stich, Sebastian U.
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), Statistics - Machine Learning, FOS: Mathematics, Machine Learning (stat.ML), Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: State-of-the-art federated learning algorithms such as FedAvg require carefully tuned stepsizes to achieve their best performance. The improvements proposed by existing adaptive federated methods involve tuning additional hyperparameters such as momentum parameters, and consider adaptivity only in the server aggregation round, but not locally. These methods can be inefficient in many practical scenarios because they require excessive tuning of hyperparameters and do not capture local geometric information. In this work, we extend the recently proposed stochastic Polyak stepsize (SPS) to the federated learning setting, and propose new locally adaptive and nearly parameter-free distributed SPS variants (FedSPS and FedDecSPS). We prove that FedSPS converges linearly in strongly convex and sublinearly in convex settings when the interpolation condition (overparametrization) is satisfied, and converges to a neighborhood of the solution in the general case. We extend our proposed method to a decreasing stepsize version, FedDecSPS, that also converges when the interpolation condition does not hold. We validate our theoretical claims by performing illustrative convex experiments. Our proposed algorithms match the optimization performance of FedAvg with the best tuned hyperparameters in the i.i.d. case, and outperform FedAvg in the non-i.i.d. case. 33 pages, 6 figures.
Published: 2023
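To make the FedSPS idea concrete, here is a minimal sketch, assuming a least-squares objective in which every per-sample loss has minimum value 0 (so the optimal values SPS needs are known) and an SPS_max-style rule gamma = min((f_i(x) - f_i*) / (c * ||grad f_i(x)||^2), gamma_b). The names `sps_max`, `fed_sps`, `c`, `gamma_b`, `local_steps` and all default values are hypothetical illustration choices, not the authors' code: clients run local SGD with their own adaptive steps, and the server averages the local models.

```python
import numpy as np

def sps_max(loss, grad, f_star=0.0, c=0.5, gamma_b=1.0):
    """SPS_max-style stepsize: min((f_i(x) - f_i*) / (c * ||g||^2), gamma_b)."""
    g2 = float(grad @ grad)
    if g2 == 0.0:
        return 0.0
    return min((loss - f_star) / (c * g2), gamma_b)

def fed_sps(clients, dim, rounds=300, local_steps=5, seed=0):
    """Hypothetical FedSPS-style loop: local SGD with SPS stepsizes, then averaging.

    Each client holds (A, b); a sample loss is 0.5 * (a_j @ x - b_j)^2, whose
    minimum f_j* is 0, so f_star = 0 is assumed known.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    for _ in range(rounds):
        local_models = []
        for A, b in clients:
            x_loc = x.copy()
            for _ in range(local_steps):
                j = rng.integers(len(b))
                r = A[j] @ x_loc - b[j]
                grad = r * A[j]
                x_loc = x_loc - sps_max(0.5 * r * r, grad) * grad  # locally adaptive step
            local_models.append(x_loc)
        x = np.mean(local_models, axis=0)  # server aggregation round
    return x

# Interpolating (consistent) least-squares problem split over 4 hypothetical clients.
rng = np.random.default_rng(1)
x_true = rng.normal(size=5)
clients = []
for _ in range(4):
    A = rng.normal(size=(20, 5))
    clients.append((A, A @ x_true))
x_hat = fed_sps(clients, dim=5)
```

Because these data satisfy interpolation, the averaged model should approach `x_true`; in the non-interpolated case the abstract notes that FedSPS only reaches a neighborhood of the solution, which is what FedDecSPS addresses.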

2. Communication-Efficient Gradient Descent-Ascent Methods for Distributed Variational Inequalities: Unified Analysis and Local Updates

Author: Zhang, Siqi, Choudhury, Sayantan, Stich, Sebastian U, and Loizou, Nicolas
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Machine Learning (stat.ML), Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: Distributed and federated learning algorithms and techniques are primarily associated with minimization problems. However, with the increase of minimax optimization and variational inequality problems in machine learning, the necessity of designing efficient distributed/federated learning approaches for these problems is becoming more apparent. In this paper, we provide a unified convergence analysis of communication-efficient local training methods for distributed variational inequality problems (VIPs). Our approach is based on a general key assumption on the stochastic estimates that allows us to propose and analyze several novel local training algorithms under a single framework for solving a class of structured non-monotone VIPs. We present the first local gradient descent-ascent algorithms with provably improved communication complexity for solving distributed variational inequalities on heterogeneous data. The general algorithmic framework recovers state-of-the-art algorithms and their sharp convergence guarantees when the setting is specialized to minimization or minimax optimization problems. Finally, we demonstrate the strong performance of the proposed algorithms compared to state-of-the-art methods when solving federated minimax optimization problems.
Published: 2023

3. Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods

Author: Beznosikov, Aleksandr, Gorbunov, Eduard, Berard, Hugo, and Loizou, Nicolas
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent algorithms for solving min-max optimization and variational inequality problems (VIPs) appearing in various machine learning tasks. The success of the method led to several advanced extensions of the classical SGDA, including variants with arbitrary sampling, variance reduction, coordinate randomization, and distributed variants with compression, which were extensively studied in the literature, especially during the last few years. In this paper, we propose a unified convergence analysis that covers a large variety of stochastic gradient descent-ascent methods, which so far have required different intuitions, have different applications and have been developed separately in various communities. A key to our unified framework is a parametric assumption on the stochastic estimates. Via our general theoretical framework, we either recover the sharpest known rates for the known special cases or tighten them. Moreover, to illustrate the flexibility of our approach we develop several new variants of SGDA such as a new variance-reduced method (L-SVRGDA), new distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a new method with coordinate randomization (SEGA-SGDA). Although variants of the new methods are known for solving minimization problems, they were never considered or analyzed for solving min-max problems and VIPs. We also demonstrate the most important properties of the new methods through extensive numerical experiments. Comment: AISTATS 2023. 65 pages, 5 figures, 3 tables. Changes in v2: new results were added (Theorem 2.5 and its corollaries), a few typos were fixed, more clarifications were added. Changes in v3: AISTATS formatting was applied, small clarifications were added. Code: https://github.com/hugobb/sgda
Published: 2022
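A minimal sketch of plain SGDA with uniform sampling, assuming a hypothetical finite-sum quadratic game min_x max_y (1/n) sum_i [mu/2 ||x||^2 + x^T B_i y - mu/2 ||y||^2], which is strongly monotone for mu > 0. It shows only the basic descent-ascent update, not the paper's unified assumption or new variants such as L-SVRGDA; `sgda`, `mu`, `gamma` and the matrices are illustration choices.

```python
import numpy as np

def sgda(B_list, mu=1.0, gamma=0.05, iters=2000, seed=0):
    """Stochastic gradient descent-ascent on
    min_x max_y (1/n) sum_i [ mu/2 ||x||^2 + x^T B_i y - mu/2 ||y||^2 ].
    """
    rng = np.random.default_rng(seed)
    d = B_list[0].shape[0]
    x, y = np.ones(d), np.ones(d)
    for _ in range(iters):
        B = B_list[rng.integers(len(B_list))]  # uniform sampling of one term
        gx = mu * x + B @ y                    # stochastic gradient in x
        gy = B.T @ x - mu * y                  # stochastic gradient in y
        x, y = x - gamma * gx, y + gamma * gy  # descent in x, ascent in y
    return x, y

rng = np.random.default_rng(0)
B_list = [0.5 * rng.normal(size=(3, 3)) for _ in range(8)]  # hypothetical data
x, y = sgda(B_list)
```

Every sampled operator vanishes at the solution (0, 0), so a constant step size suffices here; with noisy linear terms one would typically only reach a neighborhood of the solution.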

4. A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games

Author: Sokota, Samuel, D'Orazio, Ryan, Kolter, J. Zico, Loizou, Nicolas, Lanctot, Marc, Mitliagkas, Ioannis, Brown, Noam, and Kroer, Christian
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Machine Learning (cs.LG), Computer Science and Game Theory (cs.GT)
Abstract: This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm. Our contribution is demonstrating the virtues of magnetic mirror descent as both an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games. These virtues include: 1) Being the first quantal response equilibria solver to achieve linear convergence for extensive-form games with first-order feedback; 2) Being the first standard reinforcement learning algorithm to achieve empirically competitive results with CFR in tabular settings; 3) Achieving favorable performance in 3x3 Dark Hex and Phantom Tic-Tac-Toe as a self-play deep reinforcement learning algorithm.
Published: 2022
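A sketch of magnetic mirror descent for a two-player zero-sum matrix game, assuming the negative-entropy mirror map and a uniform magnet. The closed-form multiplicative update below is our reading of the method for this special case, not code from the paper, and `alpha` (regularization strength), `eta` (step size) and the payoff matrix are hypothetical. Under these assumptions a fixed point satisfies pi proportional to magnet * exp(q / alpha), i.e. a logit quantal response equilibrium, which is what the final check compares against.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def mmd_step(pi, q, magnet, alpha, eta):
    """One MMD step with a negative-entropy mirror map (assumed closed form):
    pi_new proportional to (pi * magnet**(alpha*eta) * exp(eta*q)) ** (1 / (1 + alpha*eta)).
    Computed in log space for numerical stability."""
    logits = (np.log(pi) + alpha * eta * np.log(magnet) + eta * q) / (1.0 + alpha * eta)
    return softmax(logits)

def solve_qre(A, alpha=0.5, eta=0.1, iters=5000):
    """Simultaneous MMD for both players of the zero-sum game x^T A y (x maximizes)."""
    m, n = A.shape
    x, y = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    magnet_x, magnet_y = x.copy(), y.copy()  # uniform magnets (hypothetical choice)
    for _ in range(iters):
        qx = A @ y      # row player's payoff for each action
        qy = -A.T @ x   # column player's payoff (it minimizes x^T A y)
        x, y = mmd_step(x, qx, magnet_x, alpha, eta), mmd_step(y, qy, magnet_y, alpha, eta)
    return x, y

alpha = 0.5
A = np.array([[2.0, -1.0], [-1.0, 1.0]])  # hypothetical payoff matrix
x, y = solve_qre(A, alpha=alpha)
```

The magnet pulls each policy toward a reference distribution; with the strength `alpha` held fixed, the iterates settle at a regularized (quantal response) equilibrium rather than cycling.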

5. Stochastic Extragradient: General Analysis and Improved Rates

Author: Gorbunov, Eduard, Berard, Hugo, Gidel, Gauthier, and Loizou, Nicolas
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: The Stochastic Extragradient (SEG) method is one of the most popular algorithms for solving min-max optimization and variational inequality problems (VIPs) appearing in various machine learning tasks. However, several important questions regarding the convergence properties of SEG are still open, including the sampling of stochastic gradients, mini-batching, convergence guarantees for monotone finite-sum variational inequalities with possibly non-monotone terms, and others. To address these questions, in this paper, we develop a novel theoretical framework that allows us to analyze several variants of SEG in a unified manner. Besides standard setups, like Same-Sample SEG under Lipschitzness and monotonicity or Independent-Samples SEG under uniformly bounded variance, our approach allows us to analyze variants of SEG that were never explicitly considered in the literature before. Notably, we analyze SEG with arbitrary sampling, which includes importance sampling and various mini-batching strategies as special cases. Our rates for the new variants of SEG outperform the current state-of-the-art convergence guarantees and rely on less restrictive assumptions. AISTATS 2022. 37 pages, 3 figures, 2 tables. Changes in v2: some minor typos were fixed, several places were clarified. Changes in v3: a few typos were fixed, inaccuracies in Appendix B were corrected. Code: https://github.com/hugobb/Stochastic-Extragradient
Published: 2021

6. Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants via the Mirror Stochastic Polyak Stepsize

Author: D'Orazio, Ryan, Loizou, Nicolas, Laradji, Issam, and Mitliagkas, Ioannis
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: We investigate the convergence of stochastic mirror descent (SMD) under interpolation in relatively smooth and smooth convex optimization. In relatively smooth convex optimization we provide new convergence guarantees for SMD with a constant stepsize. For smooth convex optimization we propose a new adaptive stepsize scheme -- the mirror stochastic Polyak stepsize (mSPS). Notably, our convergence results in both settings do not make bounded gradient assumptions or bounded variance assumptions, and we show convergence to a neighborhood that vanishes under interpolation. Consequently, these results correspond to the first convergence guarantees under interpolation for the exponentiated gradient algorithm for fixed or adaptive stepsizes. mSPS generalizes the recently proposed stochastic Polyak stepsize (SPS) (Loizou et al. 2021) to mirror descent and remains both practical and efficient for modern machine learning applications while inheriting the benefits of mirror descent. We complement our results with experiments across various supervised learning tasks and different instances of SMD, demonstrating the effectiveness of mSPS.
Published: 2021

7. Extragradient Method: $O(1/K)$ Last-Iterate Convergence for Monotone Variational Inequalities and Connections With Cocoercivity

Author: Gorbunov, Eduard, Loizou, Nicolas, and Gidel, Gauthier
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: The extragradient method (EG) (Korpelevich, 1976) is one of the most popular methods for solving saddle point and variational inequality problems (VIPs). Despite its long history and significant attention in the optimization community, there remain important open questions about the convergence of EG. In this paper, we resolve one such question and derive the first last-iterate $O(1/K)$ convergence rate for EG for monotone and Lipschitz VIPs without any additional assumptions on the operator, unlike the only known result of this type (Golowich et al., 2020), which relies on the Lipschitzness of the Jacobian of the operator. The rate is given in terms of reducing the squared norm of the operator. Moreover, we establish several results on the (non-)cocoercivity of the update operators of EG, the Optimistic Gradient Method, and the Hamiltonian Gradient Method, when the original operator is monotone and Lipschitz. AISTATS 2022; 37 pages, 4 figures. Changes in v2: the structure was changed, minor typos were fixed, several additional clarifications were added. Code: https://github.com/eduardgorbunov/extragradient_last_iterate_AISTATS_2022
Published: 2021
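The extragradient update evaluates the operator at an extrapolated point before taking the actual step. A minimal sketch on the bilinear game min_x max_y x^T B y (operator F(x, y) = (B y, -B^T x), which is monotone and Lipschitz), reporting the last-iterate squared operator norm ||F(z_K)||^2 in which the abstract's rate is stated. The matrix, step size and iteration count are hypothetical; because B is invertible here, EG actually converges linearly, so this illustrates the measured quantity rather than the O(1/K) bound itself. Plain gradient descent-ascent is included for contrast.

```python
import numpy as np

def operator(B, x, y):
    """F(x, y) = (grad_x f, -grad_y f) for the bilinear game f(x, y) = x^T B y."""
    return B @ y, -B.T @ x

def extragradient(B, gamma, iters, x0, y0):
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        gx, gy = operator(B, x, y)
        x_half, y_half = x - gamma * gx, y - gamma * gy  # extrapolation step
        gx, gy = operator(B, x_half, y_half)
        x, y = x - gamma * gx, y - gamma * gy            # update with the extrapolated operator
    gx, gy = operator(B, x, y)
    return float(gx @ gx + gy @ gy)                      # ||F(z_K)||^2 at the last iterate

def gda(B, gamma, iters, x0, y0):
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        gx, gy = operator(B, x, y)
        x, y = x - gamma * gx, y - gamma * gy            # simultaneous descent-ascent
    gx, gy = operator(B, x, y)
    return float(gx @ gx + gy @ gy)

B = np.diag([1.0, 2.0])  # hypothetical invertible coupling matrix
x0 = y0 = np.ones(2)
eg_res = extragradient(B, gamma=0.2, iters=1000, x0=x0, y0=y0)
gda_res = gda(B, gamma=0.2, iters=1000, x0=x0, y0=y0)
```

On this problem simultaneous GDA spirals away from the solution while EG contracts toward it.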

8. Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity

Author: Loizou, Nicolas, Berard, Hugo, Gidel, Gauthier, Mitliagkas, Ioannis, and Lacoste-Julien, Simon
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), Computer Science - Computer Science and Game Theory, Statistics - Machine Learning, FOS: Mathematics, Machine Learning (stat.ML), Mathematics - Optimization and Control, Machine Learning (cs.LG), Computer Science and Game Theory (cs.GT)
Abstract: Two of the most prominent algorithms for solving unconstrained smooth games are the classical stochastic gradient descent-ascent (SGDA) and the recently introduced stochastic consensus optimization (SCO) [Mescheder et al., 2017]. SGDA is known to converge to a stationary point for specific classes of games, but current convergence analyses require a bounded variance assumption. SCO is used successfully for solving large-scale adversarial problems, but its convergence guarantees are limited to its deterministic variant. In this work, we introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO under this condition for solving a class of stochastic variational inequality problems that are potentially non-monotone. We prove linear convergence of both methods to a neighborhood of the solution when they use a constant step-size, and we propose insightful stepsize-switching rules to guarantee convergence to the exact solution. In addition, our convergence guarantees hold under the arbitrary sampling paradigm, and as such, we give insights into the complexity of minibatching. 35th Conference on Neural Information Processing Systems (NeurIPS 2021).
Published: 2021

9. AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods

Author: Shi, Zheng, Sadiev, Abdurakhmon, Loizou, Nicolas, Richtárik, Peter, and Takáč, Martin
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: We present AI-SARAH, a practical variant of SARAH. As a variant of SARAH, this algorithm employs the stochastic recursive gradient yet adjusts step-size based on local geometry. AI-SARAH implicitly computes step-size and efficiently estimates local Lipschitz smoothness of stochastic functions. It is fully adaptive, tune-free, straightforward to implement, and computationally efficient. We provide technical insight and intuitive illustrations on its design and convergence. We conduct extensive empirical analysis and demonstrate its strong performance compared with its classical counterparts and other state-of-the-art first-order methods in solving convex machine learning problems.
Published: 2021
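For orientation, a sketch of plain SARAH (the base method that AI-SARAH builds on) on least squares. It shows the stochastic recursive gradient estimator v_t = grad f_i(w_t) - grad f_i(w_{t-1}) + v_{t-1}, restarted from a full gradient at each outer loop, but it uses a fixed, hand-picked step size: AI-SARAH's implicit, locally adaptive step-size computation is not reproduced here. The function name, data and parameters are hypothetical.

```python
import numpy as np

def sarah(A, b, step, outer=100, inner=None, seed=0):
    """Plain SARAH with a fixed step on f(w) = (1/2n) ||A w - b||^2 (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    inner = n if inner is None else inner

    def grad_i(w, i):
        return (A[i] @ w - b[i]) * A[i]

    w = np.zeros(d)
    for _ in range(outer):
        v = A.T @ (A @ w - b) / n              # full gradient at the start of each outer loop
        w_prev, w = w, w - step * v
        for _ in range(inner):
            i = rng.integers(n)
            v = grad_i(w, i) - grad_i(w_prev, i) + v  # recursive gradient estimator
            w_prev, w = w, w - step * v
    return w

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 5))
A /= np.linalg.norm(A, axis=1, keepdims=True)  # unit rows, so each f_i is 1-smooth
w_true = rng.normal(size=5)
b = A @ w_true
w = sarah(A, b, step=0.5)
```

Unlike SVRG, the estimator is updated recursively from the previous estimate rather than from a fixed snapshot, which is the structure AI-SARAH keeps while replacing the fixed `step`.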

10. Stochastic Hamiltonian Gradient Methods for Smooth Games

Author: Loizou, Nicolas, Berard, Hugo, Jolicoeur-Martineau, Alexia, Vincent, Pascal, Lacoste-Julien, Simon, and Mitliagkas, Ioannis
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), Computer Science - Computer Science and Game Theory, Statistics - Machine Learning, FOS: Mathematics, Mathematics of Computing - Numerical Analysis, Machine Learning (stat.ML), Mathematics - Optimization and Control, Machine Learning (cs.LG), Computer Science and Game Theory (cs.GT)
Abstract: The success of adversarial formulations in machine learning has brought renewed motivation for smooth games. In this work, we focus on the class of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games. We propose a novel unbiased estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight its benefits. Using tools from the optimization literature we show that SHGD converges linearly to the neighbourhood of a stationary point. To guarantee convergence to the exact solution, we analyze SHGD with a decreasing step-size and we also present the first stochastic variance reduced Hamiltonian method. Our results provide the first global non-asymptotic last-iterate convergence guarantees for the class of stochastic unconstrained bilinear games and for the more general class of stochastic games that satisfy a "sufficiently bilinear" condition, notably including some non-convex non-concave problems. We supplement our analysis with experiments on stochastic bilinear and sufficiently bilinear games, where our theory is shown to be tight, and on simple adversarial machine learning formulations. ICML 2020 - Proceedings of the 37th International Conference on Machine Learning.
Published: 2020
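A sketch of stochastic Hamiltonian gradient descent, which runs SGD on the Hamiltonian H(z) = 0.5 * ||F(z)||^2, whose gradient is J(z)^T F(z). Assuming a finite-sum bilinear game with F = (1/n) sum_i F_i and Jacobians J_i, drawing two independent indices i, j and using 0.5 * (J_i^T F_j + J_j^T F_i) gives an unbiased estimate of grad H; we believe this matches the spirit of the paper's estimator, but its exact form there may differ. The matrices B_i, step size and iteration count are hypothetical.

```python
import numpy as np

def jacobian(B):
    """Jacobian of F_i(x, y) = (B y, -B^T x), the operator of the term x^T B y."""
    d = B.shape[0]
    zeros = np.zeros((d, d))
    return np.block([[zeros, B], [-B.T, zeros]])

def shgd(B_list, gamma=0.2, iters=500, seed=0):
    """Stochastic Hamiltonian gradient descent on H(z) = 0.5 * ||F(z)||^2 (sketch)."""
    rng = np.random.default_rng(seed)
    J = [jacobian(B) for B in B_list]
    z = np.ones(2 * B_list[0].shape[0])
    for _ in range(iters):
        i, j = rng.integers(len(J), size=2)     # two independent uniform samples
        F_i, F_j = J[i] @ z, J[j] @ z           # F_i(z) = J_i z (no linear terms here)
        g = 0.5 * (J[i].T @ F_j + J[j].T @ F_i)  # unbiased estimate of J^T F = grad H
        z = z - gamma * g
    return z

rng = np.random.default_rng(0)
B_list = [np.eye(2) + 0.2 * rng.normal(size=(2, 2)) for _ in range(10)]
z = shgd(B_list)
```

Using two independent samples matters: the product J_i^T F_i with a single sample would be biased, since its expectation includes the covariance between J_i and F_i.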

11. Unified Analysis of Stochastic Gradient Methods for Composite Convex and Smooth Optimization

Author: Khaled, Ahmed, Sebbouh, Othmane, Loizou, Nicolas, Gower, Robert M., and Richtárik, Peter
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), Statistics - Machine Learning, FOS: Mathematics, Machine Learning (stat.ML), Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: We present a unified theorem for the convergence analysis of stochastic gradient algorithms for minimizing a smooth and convex loss plus a convex regularizer. We do this by extending the unified analysis of Gorbunov, Hanzely & Richtárik (2020) and dropping the requirement that the loss function be strongly convex. Instead, we only rely on convexity of the loss function. Our unified analysis applies to a host of existing algorithms such as proximal SGD, variance reduced methods, quantization and some coordinate descent type methods. For the variance reduced methods, we recover the best known convergence rates as special cases. For proximal SGD, the quantization and coordinate type methods, we uncover new state-of-the-art convergence rates. Our analysis also includes any form of sampling and minibatching. As such, we are able to determine the minibatch size that optimizes the total complexity of variance reduced methods. We showcase this by obtaining a simple formula for the optimal minibatch size of two variance reduced methods (L-SVRG and SAGA). This optimal minibatch size not only improves the theoretical total complexity of the methods but also improves their convergence in practice, as we show in several experiments.
Published: 2020
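As an instance of the "smooth convex loss plus convex regularizer" setting, a sketch of proximal L-SVRG (loopless SVRG) on l1-regularized least squares: each step uses the variance-reduced estimator grad f_i(x) - grad f_i(w) + grad f(w), applies the soft-thresholding prox, and refreshes the reference point w with probability p instead of running an outer loop. The step size 1/(6 L_max), p, lam and the data are hypothetical choices, and the paper's optimal-minibatch formula is not implemented; a long deterministic ISTA run serves only as a reference solution.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_lsvrg(A, b, lam, step, p, iters, seed=0):
    """Proximal L-SVRG for (1/2n)||Ax - b||^2 + lam * ||x||_1 (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    w = x.copy()                                   # reference point
    full_grad = A.T @ (A @ w - b) / n
    for _ in range(iters):
        i = rng.integers(n)
        g = (A[i] @ x - b[i]) * A[i] - (A[i] @ w - b[i]) * A[i] + full_grad
        x = soft_threshold(x - step * g, step * lam)
        if rng.random() < p:                       # coin flip replaces SVRG's outer loop
            w = x.copy()
            full_grad = A.T @ (A @ w - b) / n
    return x

def ista(A, b, lam, iters=5000):
    """Deterministic proximal gradient, used here only as a reference solution."""
    n = A.shape[0]
    L = np.linalg.norm(A, 2) ** 2 / n
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - (A.T @ (A @ x - b) / n) / L, lam / L)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))
A /= np.linalg.norm(A, axis=1, keepdims=True)      # unit rows, so L_max = 1
b = A @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
lam = 0.05
x_vr = prox_lsvrg(A, b, lam, step=1 / 6, p=0.05, iters=10000)
x_ref = ista(A, b, lam)
```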

12. SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation

Author: Gower, Robert M., Sebbouh, Othmane, and Loizou, Nicolas
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), Statistics - Machine Learning, FOS: Mathematics, Machine Learning (stat.ML), Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: Stochastic Gradient Descent (SGD) is used routinely for optimizing non-convex functions. Yet, the standard convergence theory for SGD in the smooth non-convex setting gives a slow sublinear convergence to a stationary point. In this work, we provide several convergence theorems for SGD showing convergence to a global minimum for non-convex problems satisfying some extra structural assumptions. In particular, we focus on two large classes of structured non-convex functions: (i) Quasar (Strongly) Convex functions (a generalization of convex functions) and (ii) functions satisfying the Polyak-Łojasiewicz condition (a generalization of strongly-convex functions). Our analysis relies on an Expected Residual condition which we show is a strictly weaker assumption than previously used growth conditions, expected smoothness or bounded variance assumptions. We provide theoretical guarantees for the convergence of SGD for different step-size selections including constant, decreasing and the recently proposed stochastic Polyak step-size. In addition, all of our analysis holds for the arbitrary sampling paradigm, and as such, we give insights into the complexity of minibatching and determine an optimal minibatch size. Finally, we show that for models that interpolate the training data, we can dispense with our Expected Residual condition and give state-of-the-art results in this setting. Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021.
Published: 2020

13. Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence

Author: Loizou, Nicolas, Vaswani, Sharan, Laradji, Issam, and Lacoste-Julien, Simon
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Machine Learning (stat.ML), Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although computing the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting the learning rate for stochastic gradient descent (SGD). We provide theoretical convergence guarantees for SGD equipped with SPS in different settings, including strongly convex, convex and non-convex functions. Furthermore, our analysis results in novel convergence guarantees for SGD with a constant step-size. We show that SPS is particularly effective when training over-parameterized models capable of interpolating the training data. In this setting, we prove that SPS enables SGD to converge to the true solution at a fast rate without requiring the knowledge of any problem-dependent constants or additional computational overhead. We experimentally validate our theoretical results via extensive experiments on synthetic and real datasets. We demonstrate the strong performance of SGD with SPS compared to state-of-the-art optimization methods when training over-parameterized models. Comment: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021.
Published: 2020
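A minimal sketch of SGD with a capped stochastic Polyak step, gamma_k = min((f_i(x_k) - f_i*) / (c * ||grad f_i(x_k)||^2), gamma_b), assuming each f_i* is known. The example uses over-parameterized least squares, where the model interpolates the data and f_i* = 0; with c = 1/2 the step reduces to 1/||a_i||^2, an exact projection onto the i-th equation. The function name `sgd_sps`, the values of `c` and `gamma_b`, and the data are hypothetical.

```python
import numpy as np

def sgd_sps(f_i, grad_i, n, x0, c=0.5, gamma_b=10.0, f_star=0.0, iters=20000, seed=0):
    """SGD with a capped stochastic Polyak step (SPS_max form), assuming f_i* is known."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(iters):
        i = rng.integers(n)
        g = grad_i(x, i)
        g2 = float(g @ g)
        if g2 == 0.0:
            continue  # sample i is already at its minimum
        gamma = min((f_i(x, i) - f_star) / (c * g2), gamma_b)
        x = x - gamma * g
    return x

# Over-parameterized least squares (d > n): the model can interpolate the data,
# so every f_i(x) = 0.5 * (a_i @ x - b_i)^2 has minimum value f_i* = 0.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 50))
b = rng.normal(size=20)

def f_i(x, i):
    return 0.5 * (A[i] @ x - b[i]) ** 2

def grad_i(x, i):
    return (A[i] @ x - b[i]) * A[i]

x = sgd_sps(f_i, grad_i, n=20, x0=np.zeros(50))
train_loss = 0.5 * np.mean((A @ x - b) ** 2)
```

No problem-dependent constant such as a smoothness bound is passed in; the only extra information the rule uses is f_i*, which interpolation makes available.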

14. Randomized Iterative Methods for Linear Systems: Momentum, Inexactness and Gossip

Author: Loizou, Nicolas
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Numerical Analysis (math.NA), Mathematics - Numerical Analysis, Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: In the era of big data, one of the key challenges is the development of novel optimization algorithms that can accommodate vast amounts of data while at the same time satisfying constraints and limitations of the problem under study. The need to solve optimization problems is ubiquitous in essentially all quantitative areas of human endeavor, including industry and science. In the last decade there has been a surge in the demand from practitioners, in fields such as machine learning, computer vision, artificial intelligence, signal processing and data science, for new methods able to cope with these new large scale problems. In this thesis we focus on the design, complexity analysis and efficient implementations of such algorithms. In particular, we are interested in the development of randomized iterative methods for solving large scale linear systems, stochastic quadratic optimization problems, the best approximation problem and quadratic optimization problems. A large part of the thesis is also devoted to the development of efficient methods for obtaining average consensus on large scale networks. PhD Thesis, University of Edinburgh, 2019.
Published: 2019
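A sketch of one method family covered by the thesis, randomized Kaczmarz with heavy ball momentum, for a consistent linear system Ax = b: sample a row with probability proportional to ||a_i||^2, step (with relaxation omega) onto that equation's hyperplane, and add beta * (x_k - x_{k-1}). The values omega = 1 and beta = 0.1 and the random system are hypothetical, and the thesis's conditions on the momentum parameter are not checked here.

```python
import numpy as np

def rk_heavy_ball(A, b, omega=1.0, beta=0.1, iters=5000, seed=0):
    """Randomized Kaczmarz with heavy ball momentum (sketch, hypothetical defaults)."""
    rng = np.random.default_rng(seed)
    row_norms = np.einsum("ij,ij->i", A, A)   # ||a_i||^2 for every row
    probs = row_norms / row_norms.sum()       # sample rows proportionally to ||a_i||^2
    x = x_prev = np.zeros(A.shape[1])
    for _ in range(iters):
        i = rng.choice(len(b), p=probs)
        proj = (A[i] @ x - b[i]) / row_norms[i] * A[i]  # step onto a_i^T x = b_i
        x, x_prev = x - omega * proj + beta * (x - x_prev), x
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 10))
x_true = rng.normal(size=10)
b = A @ x_true
x = rk_heavy_ball(A, b)
```

Setting `beta=0.0` recovers plain randomized Kaczmarz, which makes it easy to compare the two on the same system.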

15. Randomized iterative methods for linear systems: momentum, inexactness and gossip

Author: Loizou, Nicolas, Richtarik, Peter, Diakonikolas, Ilias, and Szpruch, Lukasz
Subjects: optimization problems, stochastic dual subspace ascent, convex quadratic problems, Controlled Noise Insertion, stochastic optimization algorithms, complexity analysis, stochastic proximal point, stochastic gradient descent, randomized Gaussian Kaczmarz, randomized gossip algorithms, heavy ball momentum, Binary Oracle, ε-Gap Oracle, stochastic Newton
Abstract: In the era of big data, one of the key challenges is the development of novel optimization algorithms that can accommodate vast amounts of data while at the same time satisfying constraints and limitations of the problem under study. The need to solve optimization problems is ubiquitous in essentially all quantitative areas of human endeavour, including industry and science. In the last decade there has been a surge in the demand from practitioners, in fields such as machine learning, computer vision, artificial intelligence, signal processing and data science, for new methods able to cope with these new large scale problems. In this thesis we focus on the design, complexity analysis and efficient implementations of such algorithms. In particular, we are interested in the development of randomized first-order iterative methods for solving large scale linear systems, stochastic quadratic optimization problems and the distributed average consensus problem. In Chapter 2, we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent: convex quadratic problems. We prove global non-asymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates, and dual function values. We also show that the primal iterates converge at an accelerated linear rate in a somewhat weaker sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesàro averages of primal iterates.
Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets. In Chapter 3, we present a convergence rate analysis of inexact variants of stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic subspace ascent. A common feature of these methods is that in their update rule a certain sub-problem needs to be solved exactly. We relax this requirement by allowing the sub-problem to be solved inexactly. In particular, we propose and analyze inexact randomized iterative methods for solving three closely related problems: a convex stochastic quadratic optimization problem, a best approximation problem and its dual, a concave quadratic maximization problem. We provide iteration complexity results under several assumptions on the inexactness error. Inexact variants of many popular and some more exotic methods, including randomized block Kaczmarz, randomized Gaussian Kaczmarz and randomized block coordinate descent, can be cast as special cases. Finally, we present numerical experiments which demonstrate the benefits of allowing inexactness. When the data describing a given optimization problem is big enough, it becomes impossible to store it on a single machine. In such situations, it is usually preferable to distribute the data among the nodes of a cluster or a supercomputer. In one such setting the nodes cooperate to minimize the sum (or average) of private functions (convex or non-convex) stored at the nodes.
Among the most popular protocols for solving this problem in a decentralized fashion (communication is allowed only between neighbours) are randomized gossip algorithms. In Chapter 4 we propose a new approach for the design and analysis of randomized gossip algorithms which can be used to solve the distributed average consensus problem, a fundamental problem in distributed computing, where each node of a network initially holds a number or vector, and the aim is to calculate the average of these objects by communicating only with its neighbours (connected nodes). The new approach consists of establishing new connections to recent literature on randomized iterative methods for solving large-scale linear systems. Our general framework recovers a comprehensive array of well-known gossip protocols as special cases and allows for the development of block and arbitrary sampling variants of all of these methods. In addition, we present novel and provably accelerated randomized gossip protocols where in each step all nodes of the network update their values using their own information but only a subset of them exchange messages. The accelerated protocols are the first randomized gossip algorithms that converge to consensus with a provably accelerated linear rate. The theoretical results are validated via computational testing on typical wireless sensor network topologies. Finally, in Chapter 5, we move in a different direction and present the first randomized gossip algorithms for solving the average consensus problem while at the same time protecting the private values stored at the nodes, as these may be sensitive. In particular, we develop and analyze three privacy-preserving variants of the randomized pairwise gossip algorithm ("randomly pick an edge of the network and then replace the values stored at vertices of this edge by their average") first proposed by Boyd et al. [16] for solving the average consensus problem.
The randomized methods we propose are all dual in nature. That is, they are designed to solve the dual of the best approximation optimization formulation of the average consensus problem. We call our three privacy preservation techniques "Binary Oracle", "ε-Gap Oracle" and "Controlled Noise Insertion". We give iteration complexity bounds for the proposed privacy-preserving randomized gossip protocols and perform extensive numerical experiments.
Published: 2019
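The randomized pairwise gossip rule described in the abstract (pick a random edge, then replace the values at both of its endpoints by their average) can be sketched as follows. The function name, ring topology, initial values and iteration count are hypothetical, and none of the three privacy-preserving variants is implemented; the sketch only shows that every step preserves the network average while driving all nodes toward it.

```python
import numpy as np

def pairwise_gossip(values, edges, iters, seed=0):
    """Randomized pairwise gossip: pick an edge uniformly at random and replace
    the values at both endpoints by their average."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float).copy()
    for _ in range(iters):
        u, v = edges[rng.integers(len(edges))]
        x[u] = x[v] = 0.5 * (x[u] + x[v])  # local averaging keeps the global sum fixed
    return x

# Ring of 6 nodes (hypothetical topology); the true average of the values is 3.0.
edges = [(k, (k + 1) % 6) for k in range(6)]
x = pairwise_gossip([1.0, 5.0, 3.0, 0.0, 2.0, 7.0], edges, iters=2000)
```

Each node only ever talks to one neighbour at a time, which is why the convergence speed depends on the network's connectivity.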

16. SGD: General Analysis and Improved Rates

Author: Gower, Robert, Loizou, Nicolas, Qian, Xun, Sailanbayev, Alibek, Shulgin, Egor, and Richtárik, Peter
Affiliations: Statistical Machine Learning and Parsimony (SIERRA), Département d'informatique de l'École normale supérieure (DI-ENS), École normale supérieure - Paris (ENS Paris, ENS-PSL), Université Paris sciences et lettres (PSL), Centre National de la Recherche Scientifique (CNRS), Institut National de Recherche en Informatique et en Automatique (Inria), Inria de Paris, Signal, Statistique et Apprentissage (S2A), Laboratoire Traitement et Communication de l'Information (LTCI), Institut Mines-Télécom [Paris] (IMT), Télécom Paris, Département Images, Données, Signal (IDS), Télécom ParisTech, University of Edinburgh, School of Mathematics - University of Edinburgh, King Abdullah University of Science and Technology (KAUST), School of Science and Technology, Nazarbayev University [Kazakhstan], Moscow Institute of Physics and Technology [Moscow] (MIPT)
Funding: Robert M. Gower acknowledges the support by a public grant as part of the Investissement d’avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH, in a joint call with Gaspard Monge Program for optimization, operations research and their interactions with data sciences.
Subjects: FOS: Computer and information sciences, Computer Science::Machine Learning, Computer Science - Machine Learning, Optimization and Control (math.OC), Statistics - Machine Learning, FOS: Mathematics, Machine Learning (stat.ML), [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Mathematics - Optimization and Control, Machine Learning (cs.LG), [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Abstract: We propose a general yet simple theorem describing the convergence of SGD under the arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of variants of SGD, each of which is associated with a specific probability law governing the data selection rule used to form mini-batches. This is the first time such an analysis is performed, and most of our variants of SGD were never explicitly considered in the literature before. Our analysis relies on the recently introduced notion of expected smoothness and does not rely on a uniform bound on the variance of the stochastic gradients. By specializing our theorem to different mini-batching strategies, such as sampling with replacement and independent sampling, we derive exact expressions for the stepsize as a function of the mini-batch size. With this we can also determine the mini-batch size that optimizes the total complexity, and show explicitly that as the variance of the stochastic gradient evaluated at the minimum grows, so does the optimal mini-batch size. For zero variance, the optimal mini-batch size is one. Moreover, we prove insightful stepsize-switching rules which describe when one should switch from a constant to a decreasing stepsize regime., 23 pages, 6 figures
Published: 2019
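To make the two mini-batching strategies named in the abstract concrete, here is a minimal sketch of SGD with (a) sampling with replacement and (b) independent sampling, where index i is included with probability p_i and reweighted by 1/(n p_i) so the estimator stays unbiased. The 1-D least-squares problem, the stepsize 0.05, the switch point and the 1/k decay are hypothetical choices made for illustration; they are not the exact stepsize expressions or switching rules derived in the paper.

```python
import random

# Hypothetical toy problem (not from the paper): 1-D least squares
#   f(x) = (1/n) * sum_i 0.5 * (a_i * x - b_i)**2
A = [1.0, 2.0, 0.5, 1.5, 3.0]
B = [2.0, 4.1, 0.9, 3.2, 5.8]
N = len(A)
X_STAR = sum(a * b for a, b in zip(A, B)) / sum(a * a for a in A)  # exact minimizer


def grad_i(i, x):
    """Gradient of the i-th term 0.5 * (a_i * x - b_i)**2."""
    return A[i] * (A[i] * x - B[i])


def with_replacement(tau):
    """Mini-batch of tau indices drawn uniformly with replacement."""
    def estimator(x, rng):
        return sum(grad_i(rng.randrange(N), x) for _ in range(tau)) / tau
    return estimator


def independent(probs):
    """Each index i is included independently with probability probs[i];
    the 1 / (n * p_i) weights make the estimator unbiased."""
    def estimator(x, rng):
        return sum(grad_i(i, x) / (N * p)
                   for i, p in enumerate(probs) if rng.random() < p)
    return estimator


def sgd(estimator, x0=0.0, gamma=0.05, switch=1000, iters=5000, seed=0):
    """Constant stepsize gamma for the first `switch` steps, then a
    decreasing stepsize gamma * switch / k (a hypothetical switching rule)."""
    rng = random.Random(seed)
    x = x0
    for k in range(1, iters + 1):
        step = gamma if k <= switch else gamma * switch / k
        x -= step * estimator(x, rng)
    return x


if __name__ == "__main__":
    print("x*                     :", round(X_STAR, 4))
    print("with replacement, tau=2:", round(sgd(with_replacement(2)), 4))
    print("independent, p_i=0.4   :", round(sgd(independent([0.4] * N)), 4))
```

Because the gradient noise at the minimizer is nonzero here (no interpolation), the constant-stepsize phase only reaches a neighbourhood of x*; the decreasing phase is what shrinks that neighbourhood, which is the regime the abstract's stepsize-switching rules address.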

17. A Privacy Preserving Randomized Gossip Algorithm via Controlled Noise Insertion

Author: Hanzely, Filip, Konečný, Jakub, Loizou, Nicolas, Richtárik, Peter, and Grishchenko, Dmitry
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing, Optimization and Control (math.OC), FOS: Mathematics, FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Systems and Control, Computer Science - Multiagent Systems, Distributed, Parallel, and Cluster Computing (cs.DC), Systems and Control (eess.SY), Mathematics - Optimization and Control, Machine Learning (cs.LG), Multiagent Systems (cs.MA)
Abstract: In this work we present a randomized gossip algorithm for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes. We give iteration complexity bounds for the method and perform extensive numerical experiments., NeurIPS 2018, Privacy Preserving Machine Learning Workshop (camera ready version). The full-length paper, which includes a number of additional algorithms and results (including proofs of statements and experiments), is available in arXiv:1706.07636
Published: 2019

18. Stochastic Gradient Push for Distributed Deep Learning

Author: Assran, Mahmoud, Loizou, Nicolas, Ballas, Nicolas, and Rabbat, Michael
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Machine Learning (stat.ML), Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Computer Science - Distributed, Parallel, and Cluster Computing, Optimization and Control (math.OC), Statistics - Machine Learning, FOS: Mathematics, Computer Science - Multiagent Systems, Distributed, Parallel, and Cluster Computing (cs.DC), Mathematics - Optimization and Control, Multiagent Systems (cs.MA)
Abstract: Distributed data-parallel algorithms aim to accelerate the training of deep neural networks by parallelizing the computation of large mini-batch gradient updates across multiple nodes. Approaches that synchronize nodes using exact distributed averaging (e.g., via AllReduce) are sensitive to stragglers and communication delays. The PushSum gossip algorithm is robust to these issues, but only performs approximate distributed averaging. This paper studies Stochastic Gradient Push (SGP), which combines PushSum with stochastic gradient updates. We prove that SGP converges to a stationary point of smooth, non-convex objectives at the same sub-linear rate as SGD, and that all nodes achieve consensus. We empirically validate the performance of SGP on image classification (ResNet-50, ImageNet) and machine translation (Transformer, WMT'16 En-De) workloads. Our code will be made publicly available., ICML 2019
Published: 2018
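The PushSum averaging step that SGP builds on can be illustrated on a small directed graph. In this sketch (assumptions: a static, strongly connected digraph; each node keeps an equal share for itself and pushes equal shares to its out-neighbours), every node tracks a value x_i and a weight w_i, and the ratio x_i / w_i is its estimate of the network average. The graph, values and the name `push_sum` are hypothetical; in SGP each node would additionally take a local stochastic gradient step before mixing, which this sketch omits.

```python
def push_sum(values, out_neighbors, num_iters):
    """PushSum averaging on a directed graph (hypothetical helper name).

    Node i splits its (x_i, w_i) pair equally among itself and its
    out-neighbours, so the mixing is column-stochastic: sums are
    preserved even though in-degrees differ. The ratio x_i / w_i
    converges to the average of the initial values.
    """
    n = len(values)
    x = [float(v) for v in values]
    w = [1.0] * n
    for _ in range(num_iters):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            targets = [i] + list(out_neighbors[i])  # keep a share for yourself
            share = 1.0 / len(targets)
            for j in targets:
                new_x[j] += share * x[i]
                new_w[j] += share * w[i]
        x, w = new_x, new_w
    return [xi / wi for xi, wi in zip(x, w)]


if __name__ == "__main__":
    # Hypothetical strongly connected digraph with unequal out-degrees.
    graph = [[1, 2], [2], [0, 3], [0, 1]]
    print(push_sum([4.0, 8.0, 15.0, 16.0], graph, num_iters=500))
```

Dividing by the weights is what corrects for the unequal in-degrees; averaging x alone with this column-stochastic mixing would not, in general, converge to the true average.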

19. Momentum and Stochastic Momentum for Stochastic Gradient, Newton, Proximal Point and Subspace Descent Methods

Author: Loizou, Nicolas and Richtárik, Peter
Subjects: math.NA, math.OC, cs.LG, stat.ML, cs.NA
Abstract: In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent. We prove global nonasymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates (in the L2 sense), and dual function values. We also show that the primal iterates converge at an accelerated linear rate in the L1 sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesàro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets, including data coming from average consensus problems.
Published: 2017

20. Linearly convergent stochastic heavy ball method for minimizing generalization error

Author: Loizou, Nicolas and Richtárik, Peter
Subjects: FOS: Computer and information sciences, Computer Science - Learning, Optimization and Control (math.OC), Statistics - Machine Learning, FOS: Mathematics, Computer Science - Numerical Analysis, Machine Learning (stat.ML), Numerical Analysis (math.NA), Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: In this work we establish the first linear convergence result for the stochastic heavy ball method. The method performs SGD steps with a fixed stepsize, amended by a heavy ball momentum term. In the analysis, we focus on minimizing the expected loss rather than on finite-sum minimization; the former is typically a much harder problem. Although the analysis is restricted to quadratic losses, the overall objective is not necessarily strongly convex., NIPS 2017, Workshop on Optimization for Machine Learning (camera ready version)
Published: 2017
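The update described in the two momentum abstracts above, an SGD step with a fixed stepsize plus a heavy ball term, reads x_{k+1} = x_k - ω ∇f_i(x_k) + β (x_k - x_{k-1}). The sketch below runs it on a quadratic loss, assuming a small consistent linear system A x = b and the per-row loss f_i(x) = ½ (a_iᵀx - b_i)² / ‖a_i‖², for which a plain SGD step with ω = 1 is a randomized Kaczmarz projection. The system, ω = 1 and β = 0.1 are hypothetical illustration values, not parameters taken from the papers.

```python
import random

# Hypothetical consistent system A x = b with solution x_true = (1, -2).
A = [[1.0, 2.0], [3.0, -1.0], [2.0, 2.0], [-1.0, 1.0]]
X_TRUE = [1.0, -2.0]
B = [sum(a * x for a, x in zip(row, X_TRUE)) for row in A]


def stochastic_grad(x, i):
    """Gradient of f_i(x) = 0.5 * (a_i . x - b_i)**2 / ||a_i||**2."""
    row = A[i]
    residual = sum(a * xj for a, xj in zip(row, x)) - B[i]
    scale = residual / sum(a * a for a in row)
    return [scale * a for a in row]


def stochastic_heavy_ball(omega=1.0, beta=0.1, iters=2000, seed=0):
    """x_{k+1} = x_k - omega * grad f_i(x_k) + beta * (x_k - x_{k-1}),
    with row i drawn uniformly at random and a fixed stepsize omega."""
    rng = random.Random(seed)
    x_prev = [0.0, 0.0]
    x = [0.0, 0.0]
    for _ in range(iters):
        g = stochastic_grad(x, rng.randrange(len(A)))
        x_next = [xj - omega * gj + beta * (xj - pj)
                  for xj, gj, pj in zip(x, g, x_prev)]
        x_prev, x = x, x_next
    return x


if __name__ == "__main__":
    print("heavy ball (beta=0.1):", stochastic_heavy_ball())
    print("plain SGD  (beta=0.0):", stochastic_heavy_ball(beta=0.0))
```

Because the system is consistent, every row's loss is minimized at x_true, so the iterates can converge to it exactly with a fixed stepsize; with inconsistent data, a fixed stepsize would instead stall in a neighbourhood of the solution.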

21. Privacy Preserving Randomized Gossip Algorithms

Author: Hanzely, Filip, Konečný, Jakub, Loizou, Nicolas, Richtárik, Peter, and Grishchenko, Dmitry
Subjects: math.OC
Abstract: In this work we present three different randomized gossip algorithms for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes. We give iteration complexity bounds for all methods, and perform extensive numerical experiments.
Published: 2017

22. Privacy Preserving Randomized Gossip Algorithms

Author: Hanzely, Filip, Konečný, Jakub, Loizou, Nicolas, Richtárik, Peter, and Grishchenko, Dmitry
Subjects: Optimization and Control (math.OC), FOS: Mathematics, Mathematics - Optimization and Control
Abstract: In this work we present three different randomized gossip algorithms for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes. We give iteration complexity bounds for all methods, and perform extensive numerical experiments., Comment: 38 pages
Published: 2017

23. Distributionally Robust Game Theory

Author: Loizou, Nicolas
Subjects: FOS: Computer and information sciences, Computer Science::Computer Science and Game Theory, Computer Science - Computer Science and Game Theory, ComputingMilieux_PERSONALCOMPUTING, TheoryofComputation_GENERAL, Computer Science and Game Theory (cs.GT)
Abstract: Classical complete-information two-player games assume that the problem data (in particular the payoff matrix) are known exactly by both players. In a now-famous result, Nash showed that any such game has an equilibrium in mixed strategies. This result was later extended to a class of incomplete-information two-player games by Harsanyi, who assumed that the payoff matrix is not known exactly but rather represents a random variable that is governed by a probability distribution known to both players. In 2006, Bertsimas and Aghassi proposed a new class of distribution-free two-player games where the payoff matrix is only known to belong to a given uncertainty set. This model relaxes the distributional assumptions of Harsanyi's Bayesian games, and it gives rise to an alternative distribution-free equilibrium concept. In this thesis we present a new model of incomplete-information games without private information in which the players use a distributionally robust optimization approach to cope with the payoff uncertainty. Under some specific restrictions, we show that our "Distributionally Robust Game" constitutes a true generalization of the three aforementioned finite games (Nash games, Bayesian games and robust games). Subsequently, we prove that the set of equilibria of an arbitrary distributionally robust game with a specified ambiguity set can be computed as the component-wise projection of the solution set of a multi-linear system of equations and inequalities. Finally, we demonstrate the applicability of our new model of games and highlight its importance., MSc Thesis, Imperial College London
Published: 2015

24. A Unified Theory of Decentralized SGD with Changing Topology and Local Updates

Author: Koloskova, Anastasia, Loizou, Nicolas, Boreiri, Sadra, Jaggi, Martin, and Stich, Sebastian U.
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, algorithm, G.1.6, Machine Learning (stat.ML), Machine Learning (cs.LG), 68W10, 68W15, 68W40, 90C06, 90C35, ml-ai, Computer Science - Distributed, Parallel, and Cluster Computing, Optimization and Control (math.OC), Statistics - Machine Learning, FOS: Mathematics, Distributed, Parallel, and Cluster Computing (cs.DC), F.2.1, Mathematics - Optimization and Control, distributed optimization
Abstract: Decentralized stochastic optimization methods have gained a lot of attention recently, mainly because of their cheap per-iteration cost, data locality, and communication efficiency. In this paper we introduce a unified convergence analysis that covers a large variety of decentralized SGD methods which so far have required different intuitions, have different applications, and have been developed separately in various communities. Our algorithmic framework covers local SGD updates and synchronous and pairwise gossip updates on adaptive network topology. We derive universal convergence rates for smooth (convex and non-convex) problems, and the rates interpolate between the heterogeneous (non-identically distributed data) and iid-data settings, recovering linear convergence rates in many special cases, for instance for over-parametrized models. Our proofs rely on weak assumptions (typically improving over prior work in several aspects) and recover (and improve) the best known complexity results for a host of important scenarios, such as cooperative SGD and federated averaging (local SGD).
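As a rough picture of the kind of method this framework covers, the sketch below alternates several local SGD steps (no communication) with one round of pairwise gossip, and changes the communication topology from round to round by cycling through two matchings. It is a toy instance, not the paper's algorithm or analysis: the quadratic local objectives f_i(x) = ½ (x - c_i)², the Gaussian gradient noise, the matchings and every parameter value are hypothetical.

```python
import random


def decentralized_local_sgd(targets, matchings, gamma=0.05, local_steps=4,
                            rounds=300, noise=0.1, seed=0):
    """Toy decentralized SGD with local updates (hypothetical helper).

    Node i minimizes f_i(x) = 0.5 * (x - targets[i])**2 using noisy
    gradients. Each round: `local_steps` local SGD steps on every node,
    then pairwise gossip averaging over the edges of the matching used
    in this round (the topology changes from round to round).
    """
    rng = random.Random(seed)
    x = [0.0] * len(targets)
    for r in range(rounds):
        # Local updates: no communication.
        for _ in range(local_steps):
            for i, c in enumerate(targets):
                g = (x[i] - c) + rng.gauss(0.0, noise)  # stochastic gradient
                x[i] -= gamma * g
        # Communication: average across each edge of the current matching.
        for i, j in matchings[r % len(matchings)]:
            avg = 0.5 * (x[i] + x[j])
            x[i] = x[j] = avg
    return x


if __name__ == "__main__":
    # Heterogeneous local targets; the global minimizer is their mean (4.0).
    c = [1.0, 3.0, 5.0, 7.0]
    topologies = [[(0, 1), (2, 3)], [(1, 2), (3, 0)]]
    x = decentralized_local_sgd(c, topologies)
    print("node values:", [round(v, 3) for v in x])
    print("node mean  :", round(sum(x) / len(x), 3))
```

Neither matching alone connects all four nodes, but alternating them does over time; the nodes' mean tracks the global minimizer, while individual values stay spread out because local steps pull each node towards its own target, a small-scale view of the heterogeneity the abstract refers to.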
