Descriptor: "Mathematics - Optimization and Control" / Publisher: hal ccsd / Topic: fos: computer and information sciences - Searchworks@Jio Institute Digital Library Search Results

Search keyword "Mathematics - Optimization and Control": 311 results



51. Mixed-Integer Programming for the ROADEF/EURO 2020 challenge

Author: Gabriel Gouvine, Conservatoire National des Arts et Métiers [CNAM] (CNAM), Centre d'études et de recherche en informatique et communications (CEDRIC), Ecole Nationale Supérieure d'Informatique pour l'Industrie et l'Entreprise (ENSIIE)-Conservatoire National des Arts et Métiers [CNAM] (CNAM), Gouvine, Gabriel, and Sciencesconf.org, CCSD
Subjects: FOS: Computer and information sciences, G.4, [INFO.INFO-RO] Computer Science [cs]/Operations Research [cs.RO], [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS], 90-08 (Primary) 90-04, 90B25, 90C11 (Secondary), roadef challenge, [MATH.MATH-OC] Mathematics [math]/Optimization and Control [math.OC], [MATH.MATH-CO] Mathematics [math]/Combinatorics [math.CO], Optimization and Control (math.OC), Computer Science - Data Structures and Algorithms, FOS: Mathematics, Data Structures and Algorithms (cs.DS), integer programming, Mathematics - Optimization and Control
Abstract: The ROADEF 2020 challenge presents a maintenance scheduling problem from the French electricity grid company RTE. The modeling of uncertainty makes the problem highly nonconvex and apparently out of the reach of mathematical solvers. We present our approach for the challenge problem. It is based on a new family of cutting planes, coupled with a constraint generation approach. We present mathematical proofs and separation algorithms for the cutting planes. We then study the practical impact of our additions on the challenge instances, showing that our approach significantly reduces the optimality gap obtained by the solver.
Published: 2021

52. Sliding window strategy for convolutional spike sorting with Lasso Algorithm, theoretical guarantees and complexity

Author: Laurent Dragoni, Rémi Flamary, Karim Lounici, Patricia Reynaud-Bouret, Laboratoire Jean Alexandre Dieudonné (JAD), Université Côte d'Azur (UCA)-Université Nice Sophia Antipolis (... - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS), Centre de Mathématiques Appliquées - Ecole Polytechnique (CMAP), and École polytechnique (X)-Centre National de la Recherche Scientifique (CNRS)
Subjects: Signal Processing (eess.SP), FOS: Computer and information sciences, Optimization, Applied Mathematics, [SCCO.NEUR]Cognitive science/Neuroscience, Mathematics - Statistics Theory, Statistics Theory (math.ST), Spike sorting, Statistics - Computation, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Optimization and Control (math.OC), [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST], FOS: Mathematics, FOS: Electrical engineering, electronic engineering, information engineering, Electrical Engineering and Systems Science - Signal Processing, Lasso, Mathematics - Optimization and Control, Sparsity, Computation (stat.CO), Neuroscience
Abstract: Spike sorting is a class of algorithms used in neuroscience to attribute the time occurrences of particular electric signals, called action potentials or spikes, to neurons. We rephrase this problem as a particular optimization problem: Lasso for convolutional models in high dimension. Lasso (least absolute shrinkage and selection operator) is a very generic tool in machine learning that helps to find sparse solutions (here, the time occurrences). However, for problems of the size encountered in this neuroscience context, classical Lasso solvers fail. We present here a new and much faster algorithm. Making use of biological properties of neurons, we explain how the particular structure of the problem allows several optimizations, leading to an algorithm whose time complexity grows linearly with the size of the recorded signal and which can be run online. Moreover, the spatial separability of the initial problem allows it to be broken into subproblems, further reducing the complexity and making it applicable to the latest recording devices, which comprise a large number of sensors. We provide several mathematical results: the size and numerical complexity of the subproblems can be estimated using percolation theory. We also show, under reasonable assumptions, that the Lasso estimator retrieves the true time occurrences of the spikes with high probability. Finally, the theoretical time complexity of the algorithm is given. Numerical simulations are also provided to illustrate the efficiency of our approach.
Published: 2021
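The Lasso problem named in this abstract is, in its standard form, $\min_x \frac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1$. As a rough illustration of how the $\ell_1$ penalty produces sparse solutions, the following plain-Python sketch solves it with ISTA (iterative soft-thresholding), a generic proximal-gradient solver. It assumes a small dense matrix `A` stored as a list of rows; the helper names `soft_threshold` and `lasso_ista` are hypothetical, and this is not the authors' sliding-window convolutional solver.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau*|.|: shrink the scalar v toward zero by tau."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def lasso_ista(A, y, lam, n_iter=500):
    """Minimize 0.5*||y - A x||^2 + lam*||x||_1 with ISTA (illustrative helper).

    A is a list of m rows of length n; y is a list of length m.
    """
    m, n = len(A), len(A[0])
    # Step size 1/L, where L is the squared Frobenius norm of A: an upper bound
    # on ||A||_2^2, the Lipschitz constant of the gradient of the smooth term.
    L = sum(a * a for row in A for a in row)
    step = 1.0 / L
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual r = A x - y
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        # Gradient of the smooth term: A^T r
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step, then soft-thresholding (the prox of the l1 term)
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x
```

With `A` equal to the identity, the minimizer is the soft-thresholding of `y` by λ, so `lasso_ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)` should return values close to `[2.0, 0.0]`; any coordinate whose magnitude stays below the threshold is set exactly to zero.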

53. Factored couplings in multi-marginal optimal transport via difference of convex programming

Author: Tran, Huy Quang, Janati, Hicham, Redko, Ievgen, Flamary, Rémi, Courty, Nicolas, Environment observation with complex imagery (OBELIX), Université de Bretagne Sud (UBS)-SIGNAUX ET IMAGES NUMÉRIQUES, ROBOTIQUE (IRISA-D5), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS), Centre de Mathématiques Appliquées - Ecole Polytechnique (CMAP), École polytechnique (X)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Hubert Curien (LHC), Institut d'Optique Graduate School (IOGS)-Université Jean Monnet - Saint-Étienne (UJM)-Centre National de la Recherche Scientifique (CNRS), CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA), Laboratoire Hubert Curien [Saint Etienne] (LHC), Université Jean Monnet [Saint-Étienne] (UJM)-Centre National de la Recherche Scientifique (CNRS)-Institut d'Optique Graduate School (IOGS), SIGNAUX ET IMAGES NUMÉRIQUES, ROBOTIQUE (IRISA-D5), and Institut d'Optique Graduate School (IOGS)-Université Jean Monnet [Saint-Étienne] (UJM)-Centre National de la Recherche Scientifique (CNRS)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Statistics - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Machine Learning (stat.ML), [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: Optimal transport (OT) theory underlies many emerging machine learning (ML) methods that solve a wide range of tasks such as generative modeling, transfer learning and information retrieval. These works, however, usually build upon a traditional OT setup with two distributions, leaving the more general multi-marginal OT formulation somewhat unexplored. In this paper, we study the multi-marginal OT (MMOT) problem and unify several popular OT methods under its umbrella by promoting structural information on the coupling. We show that incorporating such structural information into MMOT results in an instance of a difference of convex (DC) programming problem, which allows us to solve it numerically. Despite the high computational cost of this procedure, the solutions provided by DC optimization are usually of comparable quality to those obtained using currently employed optimization schemes. Comment: Revision of notation and proofs
Published: 2021

54. Gradient Descent on Infinitely Wide Neural Networks: Global Convergence and Generalization

Author: Bach, Francis, Chizat, Lénaïc, Statistical Machine Learning and Parsimony (SIERRA), Département d'informatique - ENS Paris (DI-ENS), Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria), Ecole Polytechnique Fédérale de Lausanne (EPFL), ANR-19-P3IA-0001,PRAIRIE,PaRis Artificial Intelligence Research InstitutE(2019), European Project: 724063,ERC-2016-COG,SEQUOIA(2017), École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris, Département d'informatique de l'École normale supérieure (DI-ENS), École normale supérieure - Paris (ENS Paris), and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Optimization and Control (math.OC), [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST], FOS: Mathematics, Mathematics - Statistics Theory, Statistics Theory (math.ST), [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: Many supervised machine learning methods are naturally cast as optimization problems. For prediction models which are linear in their parameters, this often leads to convex problems for which many mathematical guarantees exist. Models which are non-linear in their parameters, such as neural networks, lead to non-convex optimization problems for which guarantees are harder to obtain. In this review paper, we consider two-layer neural networks with homogeneous activation functions where the number of hidden neurons tends to infinity, and show how qualitative convergence guarantees may be derived.
Published: 2021

55. Linear Bandits on Uniformly Convex Sets

Author: Kerdreux, Thomas, Roux, Christophe, d'Aspremont, Alexandre, Pokutta, Sebastian, Département d'informatique de l'École normale supérieure (DI-ENS), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Equipe de droit public de Lyon, Université Jean Moulin - Lyon 3 (UJML), Université de Lyon-Université de Lyon, Laboratoire d'informatique de l'école normale supérieure (LIENS), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS), Statistical Machine Learning and Parsimony (SIERRA), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria), Université Paris sciences et lettres (PSL), Zuse Institute Berlin (ZIB), Research reported in this paper was partially supported through the Research Campus Modal funded by the German Federal Ministry of Education and Research (fund numbers 05M14ZAM, 05M20ZBM) as well as the Deutsche Forschungsgemeinschaft (DFG) through the DFG Cluster of Excellence MATH+. AA is at the département d’informatique de l’École Normale Supérieure, UMR CNRS 8548, PSL Research University, 75005 Paris, France, and INRIA. AA would like to acknowledge support from the ML and Optimisation joint research initiative with the fonds AXA pour la recherche and Kamet Ventures, a Google focused award, as well as funding by the French government under management of Agence Nationale de la Recherche as part of the 'Investissements d’avenir' program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute)., ANR-19-P3IA-0001,PRAIRIE,PaRis Artificial Intelligence Research InstitutE(2019), Département d'informatique - ENS Paris (DI-ENS), École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL), Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris, d'Aspremont, Alexandre, and PaRis Artificial Intelligence Research InstitutE - - PRAIRIE2019 - ANR-19-P3IA-0001 - P3IA - VALID
Subjects: FOS: Computer and information sciences, Computer Science::Machine Learning, Computer Science - Machine Learning, Statistics::Machine Learning, [INFO.INFO-RO] Computer Science [cs]/Operations Research [cs.RO], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], Optimization and Control (math.OC), FOS: Mathematics, Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets $\mathcal{K}\subset\mathbb{R}^n$, and two types of structural assumptions lead to better pseudo-regret bounds. When $\mathcal{K}$ is the simplex or an $\ell_p$ ball with $p\in(1,2]$, there exist bandit algorithms with $\tilde{\mathcal{O}}(\sqrt{nT})$ pseudo-regret bounds. Here, we derive bandit algorithms for some strongly convex sets beyond $\ell_p$ balls that enjoy pseudo-regret bounds of $\tilde{\mathcal{O}}(\sqrt{nT})$, which answers an open question from [BCB12, \S 5.5]. Interestingly, when the action set is uniformly convex but not necessarily strongly convex, we obtain pseudo-regret bounds with a dimension dependency smaller than $\mathcal{O}(\sqrt{n})$. However, this comes at the expense of asymptotic rates in $T$ varying between $\tilde{\mathcal{O}}(\sqrt{T})$ and $\tilde{\mathcal{O}}(T)$.
Published: 2021

56. Asymptotic convergence rates for averaging strategies

Author: Iskander Legheraba, Yann Chevaleyre, Laurent Meunier, Olivier Teytaud, Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision (LAMSADE), Université Paris Dauphine-PSL, Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS), Facebook AI Research [Paris] (FAIR), and Facebook
Subjects: FOS: Computer and information sciences, Design of experiments, Computer Science - Neural and Evolutionary Computing, 0102 computer and information sciences, 02 engineering and technology, Function (mathematics), 01 natural sciences, Evolutionary computation, Random search, Quadratic equation, 010201 computation theory & mathematics, Optimization and Control (math.OC), Black box, Convergence (routing), 0202 electrical engineering, electronic engineering, information engineering, FOS: Mathematics, Applied mathematics, 020201 artificial intelligence & image processing, [INFO]Computer Science [cs], Neural and Evolutionary Computing (cs.NE), [MATH]Mathematics [math], Mathematics - Optimization and Control, Arithmetic mean, Mathematics
Abstract: Parallel black box optimization consists of estimating the optimum of a function $f$ using $\lambda$ parallel evaluations of $f$. Averaging the $\mu$ best individuals among the $\lambda$ evaluations is known to provide better estimates of the optimum than simply picking the best one. In continuous domains, this averaging is typically based on (possibly weighted) arithmetic means. Previous theoretical results considered quadratic objective functions. In this paper, we extend the results to a wide class of functions, including three times continuously differentiable functions with a unique optimum. We prove formal convergence rates and show that they are asymptotically better in $\lambda$ than pure random search. We validate our theoretical findings with experiments on some standard black box functions.
Published: 2021
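The averaging strategy described in this abstract can be illustrated with a minimal sketch: evaluate $f$ at $\lambda$ candidate points, then return the arithmetic mean of the $\mu$ best instead of the single best point. Here the candidates are drawn uniformly in a box (pure random search), which is an assumption made for illustration; the sampling distribution, the choice of $\mu$ and any weighting used in the paper are not reproduced, and the function names are hypothetical.

```python
import random


def mu_best_average(points, f, mu):
    """Coordinate-wise arithmetic mean of the mu points with the lowest f-values."""
    ranked = sorted(points, key=f)  # ascending objective value (stable sort)
    best = ranked[:mu]
    dim = len(points[0])
    return [sum(p[k] for p in best) / mu for k in range(dim)]


def one_parallel_round(f, dim, lam, mu, radius=1.0, seed=0):
    """Draw lam points uniformly in [-radius, radius]^dim (pure random search),
    evaluate f on each, and return (best single point, mu-best average)."""
    rng = random.Random(seed)
    points = [[rng.uniform(-radius, radius) for _ in range(dim)] for _ in range(lam)]
    best = min(points, key=f)
    return best, mu_best_average(points, f, mu)


if __name__ == "__main__":
    optimum = [0.3, -0.2, 0.1]

    def sphere(x):
        # Quadratic test function with its minimum at `optimum`.
        return sum((xi - oi) ** 2 for xi, oi in zip(x, optimum))

    best, avg = one_parallel_round(sphere, dim=3, lam=1000, mu=50)
    print("f(best point):", sphere(best))
    print("f(mu-best average):", sphere(avg))
```

Running the script prints $f$ at the best sampled point and at the $\mu$-best average for one seeded draw. Repeating this for several seeds and values of $\mu$ is an informal way to explore the comparison discussed in the abstract; it does not reproduce the paper's convergence rates.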

57. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Streaming Data

Author: Antoine Godichon-Baggioni, Nicklas Werge, Olivier Wintenberger, Laboratoire de Probabilités, Statistiques et Modélisations (LPSM (UMR_8001)), and Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Université de Paris (UP)
Subjects: FOS: Computer and information sciences, Statistics and Probability, Computer Science - Machine Learning, Machine Learning (stat.ML), stochastic optimization, streaming data, Machine Learning (cs.LG), machine learning, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Optimization and Control (math.OC), Statistics - Machine Learning, large-scale, stochastic approximation, FOS: Mathematics, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Mathematics - Optimization and Control
Abstract: Motivated by the continuously generated high-frequency data streams, real-time learning is becoming increasingly important. These data streams must be processed sequentially, with the possibility that the stream changes over time. In this streaming setting, we propose techniques for minimizing a convex objective through unbiased estimates of its gradients, commonly referred to as stochastic approximation problems. Our methods rely on stochastic approximation algorithms because of their computational advantage, as they only use the previous iterate as a parameter estimate. The approach includes iterate averaging, which guarantees optimal statistical efficiency under classical conditions. Our non-asymptotic analysis shows accelerated convergence when the learning rate is selected according to the expected data streams. We show that the averaged estimate converges optimally and robustly for any data stream rate. In addition, noise reduction can be achieved by processing the data in a specific pattern, which is advantageous for large-scale machine learning. These theoretical results are illustrated for various data streams, showing the effectiveness of the proposed algorithms.
Published: 2021
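Stochastic approximation with iterate averaging, as discussed in this abstract, can be sketched for a scalar parameter: each incoming observation triggers one stochastic gradient step with a decreasing learning rate $\gamma_t = c\,t^{-\alpha}$, and the running mean of the iterates (Polyak-Ruppert averaging) serves as the final estimate. The step-size schedule, the default values `c=1.0` and `alpha=2/3`, and the name `averaged_sgd` are illustrative assumptions; the specific data-processing patterns analyzed in the paper are not modeled.

```python
def averaged_sgd(stream, grad, theta0, c=1.0, alpha=2 / 3):
    """Scalar stochastic approximation with Polyak-Ruppert averaging (a sketch).

    stream : iterable of observations x_t, consumed one at a time
    grad   : grad(theta, x) -> unbiased estimate of the gradient at theta
    Step size gamma_t = c * t**(-alpha), with alpha assumed in (1/2, 1).
    Returns (last iterate, average of the iterates theta_1..theta_t).
    """
    theta = theta0
    theta_bar = 0.0
    for t, x in enumerate(stream, start=1):
        gamma = c * t ** (-alpha)
        theta = theta - gamma * grad(theta, x)  # one SGD step per observation
        theta_bar += (theta - theta_bar) / t  # online mean of the iterates
    return theta, theta_bar


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    data = (rng.gauss(5.0, 1.0) for _ in range(10000))
    # Gradient of 0.5 * (theta - x)^2, whose expectation is minimized at E[x].
    last, avg = averaged_sgd(data, lambda th, x: th - x, theta0=0.0)
    print("last iterate:", last, "averaged estimate:", avg)
```

Because the demo uses the gradient of $\frac{1}{2}(\theta - x)^2$, both estimates should approach the stream mean (5.0); the averaged estimate is typically less noisy than the last iterate, which is the motivation for averaging.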

58. Federated Expectation Maximization with heterogeneity mitigation and variance reduction

Author: Aymeric Dieuleveut, Gersende FORT, Eric Moulines, Geneviève Robin, Département de Mathématiques Appliquées de l'École polytechnique (X-DEP-MATHAPP), École polytechnique (X), Institut de Mathématiques de Toulouse UMR5219 (IMT), Centre National de la Recherche Scientifique (CNRS)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Institut National des Sciences Appliquées - Toulouse (INSA Toulouse), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA), Laboratoire de Mathématiques et Modélisation d'Evry (LaMME), Université d'Évry-Val-d'Essonne (UEVE)-ENSIIE-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), The work of A. Dieuleveut and E. Moulines is partially supported by ANR-19-CHIA-0002-01 /chaire SCAI. The work of G. Fort is partially supported by the Fondation Simone et Cino del Duca under the project OpSiMorE., Institut National des Sciences Appliquées - Toulouse (INSA Toulouse), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Institut National des Sciences Appliquées - Toulouse (INSA Toulouse), Institut National des Sciences Appliquées (INSA)-Université de Toulouse (UT)-Institut National des Sciences Appliquées (INSA)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), and Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Computer Science - Artificial Intelligence, Optimization and Control (math.OC), [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST], FOS: Mathematics, Mathematics - Optimization and Control, Machine Learning (cs.LG), [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Abstract: The Expectation Maximization (EM) algorithm is the default algorithm for inference in latent variable models. As in any other field of machine learning, applications of latent variable models to very large datasets make the use of advanced parallel and distributed architectures mandatory. This paper introduces FedEM, which is the first extension of the EM algorithm to the federated learning context. FedEM is a new communication-efficient method, which handles partial participation of local devices and is robust to heterogeneous distributions of the datasets. To alleviate the communication bottleneck, FedEM compresses appropriately defined complete data sufficient statistics. We also develop and analyze an extension of FedEM to further incorporate a variance reduction scheme. In all cases, we derive finite-time complexity bounds for smooth non-convex problems. Numerical results are presented to support our theoretical findings, as well as an application to federated missing values imputation for biodiversity monitoring.
Published: 2021

59. Privacy Impact on Generalized Nash Equilibrium in Peer-to-Peer Electricity Market

Author: Ana Bušić, Ilia Shilov, Hélène Le Cadre, Dynamics of Geometric Networks (DYOGENE), Département d'informatique - ENS Paris (DI-ENS), École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria), EnergyVille, Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Département d'informatique - ENS Paris (DI-ENS), Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Département d'informatique de l'École normale supérieure (DI-ENS), École normale supérieure - Paris (ENS Paris), and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris)
Subjects: TheoryofComputation_MISCELLANEOUS, FOS: Computer and information sciences, 0209 industrial biotechnology, Computer Science::Computer Science and Game Theory, Computer science, 020209 energy, Peer-to-peer market, 02 engineering and technology, Management Science and Operations Research, Peer-to-peer, computer.software_genre, Industrial and Manufacturing Engineering, 020901 industrial engineering & automation, Computer Science - Computer Science and Game Theory, 0202 electrical engineering, electronic engineering, information engineering, FOS: Mathematics, Generalized nash equilibrium, Electricity market, Communication game, Uniqueness, Private information retrieval, Mathematics - Optimization and Control, Variational equilibrium, [INFO.INFO-GT]Computer Science [cs]/Computer Science and Game Theory [cs.GT], Applied Mathematics, Generalized Nash equilibrium, Optimization and Control (math.OC), Privacy, Bounded function, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Closed-form expression, computer, Random variable, Mathematical economics, Software, Computer Science and Game Theory (cs.GT)
Abstract: We consider a peer-to-peer electricity market, where agents hold private information that they might not want to share. The problem is modeled as a noncooperative communication game, which takes the form of a Generalized Nash Equilibrium Problem, where the agents determine their randomized reports to share with the other market players, while anticipating the form of the peer-to-peer market equilibrium. In the noncooperative game, each agent decides on the deterministic and random parts of the report, such that (a) the distance between the deterministic part of the report and the truthful private information is bounded and (b) the expectation of the privacy loss random variable is bounded. This allows each agent to change her privacy level. We characterize the equilibrium of the game, prove the uniqueness of the Variational Equilibria and provide a closed-form expression of the privacy price. In addition, we provide a closed-form expression to measure the impact of the privacy preservation caused by the inclusion of random noise and deterministic deviation from agents' true values. Numerical illustrations are presented on the IEEE 14-bus network.
Published: 2021

60. Dispatching to Parallel Servers: Solutions of Poisson's Equation for First-Policy Improvement

Author: Olivier Bilenne, Performance analysis and optimization of LARge Infrastructures and Systems (POLARIS), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire d'Informatique de Grenoble (LIG), Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), Academy of Finland, project FQ4BD (Grant No. 296206), and ANR-16-CE33-0004,ORACLESS,Stratégies adaptatives d'allocation des ressources dans les réseaux sans fil dynamiques(2016)
Subjects: FOS: Computer and information sciences, Polynomial, Policy iteration, C.4, Entire function, M/G/1 queue, G.3, Management Science and Operations Research, Poisson equation, 01 natural sciences, 010104 statistics & probability, symbols.namesake, ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.8: Problem Solving, Control Methods, and Search, FOS: Mathematics, Taylor series, C.2.4, Applied mathematics, 0101 mathematics, Mathematics - Optimization and Control, Mathematics, Computer Science - Performance, ACM: C.: Computer Systems Organization/C.2: COMPUTER-COMMUNICATION NETWORKS/C.2.4: Distributed Systems, ACM: D.: Software/D.4: OPERATING SYSTEMS/D.4.8: Performance, Laplace transform, PACS: 02.30.Lt, 02.30.Mv, 02.30.Uu, 02.50.Ga, 02.50.Le, MSC: 40A30, 41A25, 41A50, 41A10, 42A10, 44A10, 60K20, 60K30, 62E20, 90B22, Probability (math.PR), 010102 general mathematics, I.2.8, D.4.8, Function (mathematics), Computer Science Applications, Exponential function, Performance (cs.PF), [MATH.MATH-PR]Mathematics [math]/Probability [math.PR], [INFO.INFO-PF]Computer Science [cs]/Performance [cs.PF], Computational Theory and Mathematics, Optimization and Control (math.OC), Dispatching, symbols, Piecewise, First-policy improvement, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Mathematics - Probability
Abstract: Policy iteration techniques for multiple-server dispatching rely on the computation of value functions. In this context, we consider the continuous-space M/G/1-FCFS queue endowed with an arbitrarily designed cost function for the waiting times of the incoming jobs. The associated relative value function is a solution of Poisson's equation for Markov chains, which in this work we solve in the Laplace transform domain by considering an ancillary, underlying stochastic process extended to (imaginary) negative backlog states. This construction enables us to obtain closed-form relative value functions for polynomial and exponential cost functions and for piecewise compositions of the latter, in turn permitting the derivation of interval bounds for the relative value function in the form of power series or trigonometric sums. We review various cost approximation schemes and assess the convergence of the interval bounds these induce on the relative value function, namely Taylor expansions (divergent, except for a narrow class of entire functions with low orders of growth) and uniform approximation schemes (polynomial, trigonometric), which achieve optimal convergence rates over finite intervals. This study addresses all the steps involved in implementing dispatching policies for systems of parallel servers, from the specification of general cost functions to the computation of interval bounds for the relative value functions and the exact implementation of the first-policy improvement step. Submitted manuscript. 34 pages, including 6 figures and 4 appendices; supplementary material (11 pages) available under 'Ancillary files'.
Published: 2021

61. Making the most of your day: online learning for optimal allocation of time

Author: Boursier, E., Garrec, T., Perchet, V., Scarsini, Marco, CB - Centre Borelli - UMR 9010 (CB), Service de Santé des Armées-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Ecole Normale Supérieure Paris-Saclay (ENS Paris Saclay)-Université de Paris (UP), EDF Lab, Ecole Nationale de la Statistique et de l'Analyse Economique (ENSAE), Ecole Nationale de la Statistique et de l'Analyse Economique, and Libera Università Internazionale degli Studi Sociali Guido Carli [Roma] (LUISS)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Other Statistics (stat.OT), online learning, scheduling, Machine Learning (stat.ML), Machine Learning (cs.LG), Statistics - Other Statistics, Statistics - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, [MATH]Mathematics [math], Mathematics - Optimization and Control
Abstract: We study online learning for optimal allocation when the resource to be allocated is time. Examples of possible applications include job scheduling for a computing server, a driver filling a day with rides, a landlord renting an estate, etc. An agent receives task proposals sequentially according to a Poisson process and can either accept or reject a proposed task. If she accepts the proposal, she is busy for the duration of the task and obtains a reward that depends on the task duration. If she rejects it, she remains on hold until a new task proposal arrives. We study the regret incurred by the agent, first when she knows her reward function but does not know the distribution of the task duration, and then when she knows neither her reward function nor that distribution. This natural setting bears similarities with contextual (one-armed) bandits, but with the crucial difference that the normalized reward associated with a context depends on the whole distribution of contexts. Comment: NeurIPS 2021 camera ready.
Published: 2021
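A minimal sketch of the offline benchmark behind entry 61, assuming (beyond what the abstract states) that both the reward function r and the duration distribution are known: a standard renewal-reward argument suggests accepting a task of duration x whenever r(x) >= c*x, where the optimal reward rate c solves the fixed-point equation c = lam * E[(r(X) - c*X)_+]. The right-hand side minus c is strictly decreasing in c, so bisection finds the root. The helper names `optimal_rate` and `accept`, and the use of sampled durations in place of the true distribution, are illustrative assumptions; this is not the authors' learning algorithm.

```python
def optimal_rate(durations, reward, lam, tol=1e-10):
    """Hypothetical helper: solve c = lam * E[(r(X) - c*X)_+] by bisection.

    durations: sampled task durations X (assumed i.i.d.).
    reward:    reward function r(x) >= 0 (assumed known in this sketch).
    lam:       arrival rate of the Poisson process of task proposals.
    """
    xs = [float(x) for x in durations]
    rs = [reward(x) for x in xs]
    n = len(xs)

    def gap(c):
        # Non-increasing expectation minus c: strictly decreasing, one root.
        return lam * sum(max(r - c * x, 0.0) for r, x in zip(rs, xs)) / n - c

    lo, hi = 0.0, lam * sum(rs) / n  # gap(lo) >= 0 and gap(hi) <= 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def accept(x, reward, c):
    # Accept a proposal iff its reward beats the opportunity cost c * x.
    return reward(x) >= c * x
```

For example, with unit durations, r(x) = x and lam = 1, the rate is 1/2: each cycle spends on average one time unit waiting and one working. In the paper's setting, c cannot be computed this way because the duration distribution (and possibly r) must be learned online.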

62. Survival of the strictest: Stable and unstable equilibria under regularized learning with partial information

Author: Giannou, Angeliki, Vlatakis-Gkaragkounis, Emmanouil, Mertikopoulos, Panayotis, National Technical University of Athens [Athens] (NTUA), Columbia University [New York], Performance analysis and optimization of LARge Infrastructures and Systems (POLARIS), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire d'Informatique de Grenoble (LIG), Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), ANR-16-CE33-0004,ORACLESS,Stratégies adaptatives d'allocation des ressources dans les réseaux sans fil dynamiques(2016), ANR-19-P3IA-0003,MIAI,MIAI @ Grenoble Alpes(2019), ANR-11-LABX-0025,PERSYVAL-lab,Systemes et Algorithmes Pervasifs au confluent des mondes physique et numérique(2011), and ANR-19-CE48-0018,ALIAS,Apprentissage adaptatif multi-agent(2019)
Subjects: FOS: Computer and information sciences, TheoryofComputation_MISCELLANEOUS, Computer Science - Machine Learning, Computer Science::Computer Science and Game Theory, TheoryofComputation_GENERAL, Machine Learning (cs.LG), Optimization and Control (math.OC), Computer Science - Computer Science and Game Theory, FOS: Mathematics, Computer Science - Multiagent Systems, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Mathematics - Optimization and Control, Computer Science and Game Theory (cs.GT), Multiagent Systems (cs.MA)
Abstract: In this paper, we examine the Nash equilibrium convergence properties of no-regret learning in general N-player games. For concreteness, we focus on the archetypal "follow the regularized leader" (FTRL) family of algorithms, and we consider the full spectrum of uncertainty that the players may encounter, from noisy, oracle-based feedback to bandit, payoff-based information. In this general context, we establish a comprehensive equivalence between the stability of a Nash equilibrium and its support: a Nash equilibrium is stable and attracting with arbitrarily high probability if and only if it is strict (i.e., each equilibrium strategy has a unique best response). This equivalence extends existing continuous-time versions of the "folk theorem" of evolutionary game theory to a bona fide algorithmic learning setting, and it provides a clear refinement criterion for the prediction of the day-to-day behavior of no-regret learning in games.
Published: 2021
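To make the "follow the regularized leader" template of entry 62 concrete, here is a minimal sketch of FTRL with the entropic regularizer (which reduces to exponential weights) in a two-player game with full payoff-vector feedback, the least uncertain end of the spectrum the abstract describes. The coordination game, the step size `eta`, the initial scores and the function names are illustrative assumptions, not the paper's setup.

```python
import math


def logit_choice(scores, eta):
    # Entropic FTRL: argmax over the simplex of eta*<scores, x> - sum x_i log x_i
    # has the closed form softmax(eta * scores), i.e. exponential weights.
    m = max(scores)
    w = [math.exp(eta * (s - m)) for s in scores]  # shift for stability
    total = sum(w)
    return [wi / total for wi in w]


def ftrl_two_player(A, B, y1, y2, eta=0.1, steps=500):
    """A, B: payoff matrices (lists of rows) of player 1 (rows) and player 2 (columns).
    y1, y2: initial cumulative payoff scores (an illustrative starting point)."""
    y1, y2 = [float(v) for v in y1], [float(v) for v in y2]
    n_rows, n_cols = len(A), len(A[0])
    for _ in range(steps):
        x1, x2 = logit_choice(y1, eta), logit_choice(y2, eta)
        # Full-information feedback: each player observes its payoff vector.
        v1 = [sum(A[i][j] * x2[j] for j in range(n_cols)) for i in range(n_rows)]
        v2 = [sum(B[i][j] * x1[i] for i in range(n_rows)) for j in range(n_cols)]
        y1 = [a + b for a, b in zip(y1, v1)]
        y2 = [a + b for a, b in zip(y2, v2)]
    return logit_choice(y1, eta), logit_choice(y2, eta)
```

In the coordination game with payoffs [[2, 0], [0, 1]] for both players, both pure profiles are strict equilibria; scores starting at (0, 0) lead both players to the first action, and scores starting at (0, 10) lead them to the second.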

63. Optimal control for parameter estimation in partially observed hypoelliptic stochastic differential equations

Author: Quentin Clairon, Adeline Samson, Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Statistique pour le Vivant et l’Homme (SVH), Laboratoire Jean Kuntzmann (LJK), Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), ANR-19-P3IA-0003,MIAI,MIAI @ Grenoble Alpes(2019), ANR-11-LABX-0025,PERSYVAL-lab,Systemes et Algorithmes Pervasifs au confluent des mondes physique et numérique(2011), The IMI2 Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and the European Federation of Pharmaceutical Industries and Association., and ANR-19-CE40-0024,ChaMaNe,Enjeux mathématiques issus des neurosciences(2019)
Subjects: FOS: Computer and information sciences, Statistics and Probability, Methodology (stat.ME), Computational Mathematics, Optimization and Control (math.OC), Optimal control theory, Hypoellipticity, FOS: Mathematics, Stochastic differential equations, Parameter estimation, Statistics, Probability and Uncertainty, Mathematics - Optimization and Control, [STAT.ME]Statistics [stat]/Methodology [stat.ME], Statistics - Methodology
Abstract: We deal with the problem of parameter estimation in stochastic differential equations (SDEs) in a partially observed framework. We aim to design a method that works for both elliptic and hypoelliptic SDEs, the latter being characterized by degenerate diffusion coefficients. This feature often causes contrast estimators based on the Euler-Maruyama discretization scheme to fail and dramatically impairs the classic stochastic filtering methods used to reconstruct the unobserved states. All of these issues make the estimation problem in hypoelliptic SDEs difficult to solve. To overcome this, we construct a cost function that is well defined regardless of whether the SDEs are elliptic or hypoelliptic. We also bypass the filtering step by adopting a control theory perspective: the unobserved states are estimated by solving deterministic optimal control problems with numerical methods that do not need strong assumptions on the conditioning of the diffusion coefficient. Numerical simulations on different partially observed hypoelliptic SDEs show that our method produces accurate estimates while dramatically reducing the computational cost compared with other methods.
Published: 2021
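Entry 63 contrasts elliptic SDEs with hypoelliptic ones, whose diffusion coefficient is degenerate. A minimal sketch, assuming a damped stochastic oscillator chosen purely for illustration: noise enters only the velocity equation, so the diffusion matrix diag(0, sigma^2) is singular, and the Euler-Maruyama step leaves the position update noise-free. Under this scheme the one-step transition of (X, V) is therefore a degenerate Gaussian in the X direction, which hints at why contrast estimators built on it can fail. The model, parameter names and function name are assumptions, not the authors' method.

```python
import math
import random


def euler_maruyama_oscillator(x0, v0, k, gamma, sigma, dt, n_steps, seed=0):
    """Simulate dX = V dt, dV = (-k X - gamma V) dt + sigma dW.

    The diffusion matrix diag(0, sigma^2) is singular: a hypoelliptic example.
    """
    rng = random.Random(seed)
    xs, vs = [float(x0)], [float(v0)]
    for _ in range(n_steps):
        x, v = xs[-1], vs[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        # Position: no noise term (the degenerate diffusion component).
        xs.append(x + v * dt)
        # Velocity: the only coordinate driven directly by the noise.
        vs.append(v + (-k * x - gamma * v) * dt + sigma * dw)
    return xs, vs
```

For instance, `euler_maruyama_oscillator(1.0, 0.0, 1.0, 0.5, 0.3, 0.01, 1000)` returns two trajectories of length 1001 in which each position increment is exactly v * dt, with no random term.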

64. A heuristic for estimating Nash equilibria in first-price auctions with correlated values

Author: Heymann, Benjamin, Mertikopoulos, Panayotis, Criteo AI Lab, Criteo [Paris], and Centre National de la Recherche Scientifique (CNRS)
Subjects: FOS: Computer and information sciences, TheoryofComputation_MISCELLANEOUS, [INFO.INFO-GT]Computer Science [cs]/Computer Science and Game Theory [cs.GT], Computer Science - Computer Science and Game Theory, Optimization and Control (math.OC), FOS: Mathematics, TheoryofComputation_GENERAL, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Mathematics - Optimization and Control, Computer Science and Game Theory (cs.GT)
Abstract: Our paper concerns the computation of Nash equilibria of first-price auctions with correlated values. While there exist several equilibrium computation methods for auctions with independent values, the correlation of the bidders' values introduces significant complications that render existing methods unsatisfactory in practice. Our contribution is a step towards filling this gap: inspired by the seminal fictitious play process of Brown and Robinson, we present a learning heuristic, which we call fictitious bidding (FB), for estimating Bayes-Nash equilibria of first-price auctions with correlated values, and we assess the performance of this heuristic on several relevant examples.
Published: 2021
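Entry 64 builds on the fictitious play process of Brown and Robinson. As a reminder of that classical process only (the abstract does not specify the fictitious bidding heuristic itself), here is a minimal sketch for a two-player zero-sum matrix game: at each round, each player best-responds to the empirical frequency of the opponent's past actions. Robinson's theorem guarantees that the empirical frequencies approach equilibrium in two-player zero-sum games. The matching-pennies matrix, the arbitrary initial actions and the function name are illustrative assumptions.

```python
def fictitious_play(A, iters):
    """Simultaneous fictitious play in the zero-sum game with row payoffs A.

    The row player maximizes and the column player minimizes x^T A y.
    Returns the empirical action frequencies of both players.
    """
    m, n = len(A), len(A[0])
    row_counts, col_counts = [0] * m, [0] * n
    row_counts[0] += 1  # arbitrary initial actions
    col_counts[0] += 1
    for _ in range(iters):
        # Best responses to the opponent's counts (same argmax as frequencies).
        row_vals = [sum(A[i][j] * col_counts[j] for j in range(n)) for i in range(m)]
        col_vals = [sum(A[i][j] * row_counts[i] for i in range(m)) for j in range(n)]
        i_star = max(range(m), key=lambda i: row_vals[i])
        j_star = min(range(n), key=lambda j: col_vals[j])
        row_counts[i_star] += 1
        col_counts[j_star] += 1
    total = iters + 1
    return [c / total for c in row_counts], [c / total for c in col_counts]
```

In matching pennies (A = [[1, -1], [-1, 1]]) play cycles with growing run lengths, while the empirical frequencies drift towards the mixed equilibrium (1/2, 1/2).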

65. Zeroth-order non-convex learning via hierarchical dual averaging

Author: Héliou, Amélie, Martin, Matthieu, Rahier, Thibaud, Mertikopoulos, Panayotis, Criteo AI Lab, Criteo [Paris], Performance analysis and optimization of LARge Infrastructures and Systems (POLARIS), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire d'Informatique de Grenoble (LIG), Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), ANR-16-CE33-0004,ORACLESS,Stratégies adaptatives d'allocation des ressources dans les réseaux sans fil dynamiques(2016), ANR-19-P3IA-0003,MIAI,MIAI @ Grenoble Alpes(2019), ANR-11-LABX-0025,PERSYVAL-lab,Systemes et Algorithmes Pervasifs au confluent des mondes physique et numérique(2011), and ANR-19-CE48-0018,ALIAS,Apprentissage adaptatif multi-agent(2019)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), Primary 68Q32, 90C56, secondary 90C15, 90C26, FOS: Mathematics, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: We propose a hierarchical version of dual averaging for zeroth-order online non-convex optimization, i.e., learning processes where, at each stage, the optimizer faces an unknown non-convex loss function and only receives the incurred loss as feedback. The proposed class of policies relies on the construction of an online model that aggregates loss information as it arrives, and it consists of two principal components: (a) a regularizer adapted to the Fisher information metric (as opposed to the metric norm of the ambient space); and (b) a principled exploration of the problem's state space based on an adapted hierarchical schedule. This construction enables sharper control of the model's bias and variance, and allows us to derive tight bounds for both the learner's static and dynamic regret, i.e., the regret incurred against the best dynamic policy in hindsight over the horizon of play. 40 pages, 14 figures.
Published: 2021

66. Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and More Depth

Author: Xu, Keyulu, Zhang, Mozhi, Jegelka, Stefanie, Kawaguchi, Kenji, Massachusetts Institute of Technology (MIT), and Harvard University [Cambridge]
Subjects: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), [INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE], Computer Science - Computer Vision and Pattern Recognition, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Machine Learning (stat.ML), [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Machine Learning (cs.LG), Statistics - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Mathematics - Optimization and Control
Abstract: Graph Neural Networks (GNNs) have been studied through the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect the GNNs' training speed. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.
Published: 2021

67. Online A-optimal design and active linear regression

Author: Fontaine, Xavier, Perrault, Pierre, Valko, Michal, Perchet, Vianney, CB - Centre Borelli - UMR 9010 (CB), Service de Santé des Armées-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Ecole Normale Supérieure Paris-Saclay (ENS Paris Saclay)-Université de Paris (UP), Idemia, DeepMind [Paris], Ecole Nationale de la Statistique et de l'Analyse Economique (ENSAE), Ecole Nationale de la Statistique et de l'Analyse Economique, Criteo AI Lab, and Criteo [Paris]
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Statistics - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Machine Learning (stat.ML), Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: We consider in this paper the problem of optimal experiment design where a decision maker can choose which points to sample to obtain an estimate $\hat{\beta}$ of the hidden parameter $\beta^{\star}$ of an underlying linear model. The key challenge of this work lies in the heteroscedasticity assumption that we make, meaning that each covariate has a different and unknown variance. The goal of the decision maker is then to figure out on the fly the optimal way to allocate the total budget of $T$ samples between covariates, as sampling a specific covariate several times reduces the variance of the estimated model around it (at the cost of a possibly higher variance elsewhere). By minimizing the $\ell^2$-loss $\mathbb{E} [\lVert\hat{\beta}-\beta^{\star}\rVert^2]$, the decision maker is in effect minimizing the trace of the covariance matrix of the problem, which corresponds to online A-optimal design. Combining techniques from bandit and convex optimization, we propose a new active sampling algorithm and compare it with existing ones. We provide theoretical guarantees for this algorithm in different settings, including a $\mathcal{O}(T^{-2})$ regret bound in the case where the covariates form a basis of the feature space, generalizing and improving existing results. Numerical experiments validate our theoretical findings. Comment: 29 pages, 5 figures.
Published: 2021
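Entry 67 frames the decision maker's goal as minimizing the trace of the covariance matrix. A minimal sketch of the oracle target in the case the abstract highlights, assuming the covariates are the canonical basis vectors and the variances sigma_k^2 are known (precisely what the paper's algorithm must learn online): with n_k samples of covariate k the trace is sum_k sigma_k^2 / n_k, and a Lagrange-multiplier argument under sum_k n_k = T gives n_k proportional to sigma_k, with minimal trace (sum_k sigma_k)^2 / T. The function names are hypothetical, and integer rounding of n_k is ignored.

```python
def a_optimal_allocation(sigmas, T):
    """Continuous relaxation: n_k = T * sigma_k / sum_j sigma_j (oracle, known sigmas)."""
    total = sum(sigmas)
    return [T * s / total for s in sigmas]


def trace_cov(sigmas, n):
    # Trace of the least-squares covariance for a canonical basis: sum_k sigma_k^2 / n_k.
    return sum(s * s / nk for s, nk in zip(sigmas, n))
```

For sigmas (1, 2, 3) and T = 60, the allocation is (10, 20, 30) with trace 0.6 = 36/60, whereas the uniform allocation (20, 20, 20) gives 0.7.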

68. Revisiting Bayesian Optimization in the light of the COCO benchmark

Author: Victor Picheny, Rodolphe Le Riche, Centre National de la Recherche Scientifique (CNRS), Institut Henri Fayol (FAYOL-ENSMSE), École des Mines de Saint-Étienne (Mines Saint-Étienne MSE), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT), Département Génie mathématique et industriel (FAYOL-ENSMSE), Ecole Nationale Supérieure des Mines de St Etienne (ENSM ST-ETIENNE)-Institut Henri Fayol, Laboratoire d'Informatique, de Modélisation et d'Optimisation des Systèmes (LIMOS), Ecole Nationale Supérieure des Mines de St Etienne (ENSM ST-ETIENNE)-Centre National de la Recherche Scientifique (CNRS)-Université Clermont Auvergne (UCA)-Institut national polytechnique Clermont Auvergne (INP Clermont Auvergne), Université Clermont Auvergne (UCA)-Université Clermont Auvergne (UCA), Institut Mines-Télécom [Paris] (IMT), Université Clermont Auvergne (UCA), Secondmind, Ecole Nationale Supérieure des Mines de St Etienne-Institut Henri Fayol, and Ecole Nationale Supérieure des Mines de St Etienne-Centre National de la Recherche Scientifique (CNRS)-Université Clermont Auvergne (UCA)-Institut national polytechnique Clermont Auvergne (INP Clermont Auvergne)
Subjects: FOS: Computer and information sciences, Mathematical optimization, Control and Optimization, Computer science, 0211 other engineering and technologies, Machine Learning (stat.ML), 02 engineering and technology, Statistics - Computation, symbols.namesake, Statistics - Machine Learning, 0202 electrical engineering, electronic engineering, information engineering, Code (cryptography), FOS: Mathematics, Optimization algorithm benchmark, Global optimization, Gaussian process, Mathematics - Optimization and Control, Computation (stat.CO), Bayesian optimization, 021103 operations research, global optimization, [INFO.INFO-CE]Computer Science [cs]/Computational Engineering, Finance, and Science [cs.CE], Expensive function optimization, Function (mathematics), Computer Graphics and Computer-Aided Design, [INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation, Computer Science Applications, [SPI.MECA.GEME]Engineering Sciences [physics]/Mechanics [physics.med-ph]/Mechanical engineering [physics.class-ph], Control and Systems Engineering, Optimization and Control (math.OC), Kernel (statistics), symbols, Benchmark (computing), 020201 artificial intelligence & image processing, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Engineering design process, Software
Abstract: It is commonly believed that Bayesian optimization (BO) algorithms are highly efficient for optimizing numerically costly functions. However, BO is not often compared to widely different alternatives, and is mostly tested on narrow sets of problems (multimodal, low-dimensional functions), which makes it difficult to assess where (or if) it actually achieves state-of-the-art performance. Moreover, several aspects in the design of these algorithms vary across implementations without a clear recommendation emerging from current practices, and many of these design choices are not substantiated by authoritative test campaigns. This article reports a large investigation of the effects of common and less common design choices on the performance of (Gaussian-process-based) BO. The experiments are carried out with the established COCO (COmparing Continuous Optimizers) software. It is found that a small initial budget, a quadratic trend, and high-quality optimization of the acquisition criterion bring consistent progress. Using the GP mean as an occasional acquisition criterion contributes a negligible additional improvement. Warping degrades performance. The Matérn 5/2 kernel is a good default, but it may be surpassed by the exponential kernel on irregular functions. Overall, the best EGO (Efficient Global Optimization) variants are competitive with or improve over state-of-the-art algorithms in dimensions less than or equal to 5 for multimodal functions. The code developed for this study makes the new version (v2.1.1) of the R package DiceOptim available on CRAN. The structure of the experiments by function groups makes it possible to define priorities for future research on Bayesian optimization.
Published: 2021
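Entry 68 compares the Matérn 5/2 kernel with the exponential kernel. For reference, here is a minimal sketch of both isotropic covariance functions in a common textbook parameterization with variance `sigma2` and length scale `ell`; these names and the isotropic form are assumptions, and the parameterization used by DiceOptim may differ. The exponential kernel is the Matérn kernel with smoothness 1/2, whose rougher sample paths are one plausible reason it can do better on irregular functions.

```python
import math


def _dist(x, y):
    # Euclidean distance between two points given as sequences of floats.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def matern52(x, y, sigma2=1.0, ell=1.0):
    # k(r) = sigma2 * (1 + sqrt(5) r / ell + 5 r^2 / (3 ell^2)) * exp(-sqrt(5) r / ell)
    a = math.sqrt(5.0) * _dist(x, y) / ell
    return sigma2 * (1.0 + a + a * a / 3.0) * math.exp(-a)


def exponential(x, y, sigma2=1.0, ell=1.0):
    # Matern kernel with nu = 1/2: k(r) = sigma2 * exp(-r / ell).
    return sigma2 * math.exp(-_dist(x, y) / ell)


def gram(kernel, X, **params):
    # Covariance matrix K[i][j] = kernel(X[i], X[j]) of a GP surrogate at points X.
    return [[kernel(xi, xj, **params) for xj in X] for xi in X]
```

A GP-based BO loop would use such a Gram matrix to condition the surrogate on the evaluated points before optimizing the acquisition criterion.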

69. Linear support vector regression with linear constraints

Author: Samuel Vaiter, Quentin Klopfenstein, Institut de Mathématiques de Bourgogne [Dijon] (IMB), Centre National de la Recherche Scientifique (CNRS)-Université de Franche-Comté (UFC), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université de Bourgogne (UB), Centre National de la Recherche Scientifique (CNRS), French National Research Agency (ANR) ANR-18-CE40-0005, INSERM Plan cancer 18CP134-00, Projet ANER RAGAG048CVCRB-2018ZZ, and ANR-18-CE40-0005,GraVa,Méthodes variationnelles pour les signaux sur graphe(2018)
Subjects: FOS: Computer and information sciences, Mathematical optimization, Statistics::Theory, Optimization problem, Support vector machine, Computer science, Machine Learning (stat.ML), 02 engineering and technology, Kernel (linear algebra), [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Artificial Intelligence, Statistics - Machine Learning, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, FOS: Mathematics, Isotonic regression, Statistics::Methodology, Sequential minimal optimization, Constrained linear regression, [INFO]Computer Science [cs], Coordinate descent, Mathematics - Optimization and Control, Estimator, Probability vector, Support vector regression, Optimization and Control (math.OC), 020201 artificial intelligence & image processing, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Software
Abstract: This paper studies the addition of linear constraints to Support Vector Regression when the kernel is linear. Adding these constraints to the problem makes it possible to incorporate prior knowledge about the estimator, such as requiring a positive vector, a probability vector, or monotone data. We prove that the related optimization problem remains a semi-definite quadratic problem. We also propose a generalization of the Sequential Minimal Optimization algorithm for solving the optimization problem with linear constraints and prove its convergence. We show that an efficient generalization of this iterative algorithm with closed-form updates can be used to obtain the solution of the underlying optimization problem. The practical performance of this estimator is then demonstrated on simulated and real datasets in different settings: non-negative regression, regression onto the simplex for biomedical data, and isotonic regression for weather forecasting. These experiments show the usefulness of this estimator in comparison to more classical approaches.
Published: 2021
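One of the linear constraint sets in entry 69 is the probability simplex ("regression onto the simplex"). The paper solves the constrained problem with a generalized SMO algorithm that the abstract does not detail; as a related building block only, here is the standard sort-based Euclidean projection onto the probability simplex, which a projected-gradient alternative (not the authors' method) would use to keep a coefficient vector feasible. The function name is hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum_i w_i = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - 1.0) / j
        # The test holds for j = 1, ..., rho and fails afterwards; keep the last hit.
        if uj - t > 0:
            theta = t
    # Shift every coordinate by theta and clip at zero.
    return [max(vi - theta, 0.0) for vi in v]
```

A point already on the simplex is returned unchanged, while, for example, (2, 0) is mapped to the vertex (1, 0).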

70. Regret minimization in stochastic non-convex learning via a proximal-gradient approach

Author: Hallak, Nadav, Mertikopoulos, Panayotis, Cevher, Volkan, Technion - Israel Institute of Technology [Haifa], Performance analysis and optimization of LARge Infrastructures and Systems (POLARIS), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire d'Informatique de Grenoble (LIG), Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), Criteo AI Lab, Criteo [Paris], Ecole Polytechnique Fédérale de Lausanne (EPFL), ANR-16-CE33-0004,ORACLESS,Stratégies adaptatives d'allocation des ressources dans les réseaux sans fil dynamiques(2016), and ANR-19-P3IA-0003,MIAI,MIAI @ Grenoble Alpes(2019)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, ml-ai, Computer Science - Computer Science and Game Theory, Optimization and Control (math.OC), FOS: Mathematics, MathematicsofComputing_NUMERICALANALYSIS, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Mathematics - Optimization and Control, Machine Learning (cs.LG), Computer Science and Game Theory (cs.GT)
Abstract: This paper develops a methodology for regret minimization with stochastic first-order oracle feedback in online, constrained, non-smooth, non-convex problems. In this setting, the minimization of external regret is beyond reach for first-order methods, and there are no gradient-based algorithmic frameworks capable of providing a solution. On that account, we focus on a local regret measure defined via a proximal-gradient mapping, which also encompasses the original notion proposed by Hazan et al. (2017). To achieve no local regret in this setting, we develop a proximal-gradient method based on stochastic first-order feedback, and a simpler method for when access to a perfect first-order oracle is possible. Both methods are order-optimal (in the min-max sense), and we also establish a bound on the number of proximal-gradient queries these methods require. As an important application of our results, we also obtain a link between online and offline non-convex stochastic optimization, manifested as a new proximal-gradient scheme with complexity guarantees matching those obtained via variance reduction techniques.
Published: 2021
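Entry 70 measures local regret through a proximal-gradient mapping. A minimal sketch, assuming the non-smooth part is the indicator of a box [lo, hi]^d so that the proximal operator is a coordinatewise clip (the constraint set, the toy quadratic, the step size `eta` and the function names are illustrative): the mapping G_eta(x) = (x - prox(x - eta * grad f(x))) / eta vanishes exactly at first-order stationary points of the constrained problem, and its norm serves as the stationarity measure.

```python
def prox_box(z, lo, hi):
    # The prox of the indicator of [lo, hi]^d is the Euclidean projection (clip).
    return [min(max(zi, lo), hi) for zi in z]


def gradient_mapping(x, grad_f, eta, lo, hi):
    """G_eta(x) = (x - prox(x - eta * grad_f(x))) / eta."""
    g = grad_f(x)
    p = prox_box([xi - eta * gi for xi, gi in zip(x, g)], lo, hi)
    return [(xi - pi) / eta for xi, pi in zip(x, p)]


def prox_grad_step(x, grad_f, eta, lo, hi):
    # One proximal-gradient iteration: x+ = x - eta * G_eta(x) = prox(x - eta * grad_f(x)).
    G = gradient_mapping(x, grad_f, eta, lo, hi)
    return [xi - eta * gi for xi, gi in zip(x, G)]
```

For f(x) = 0.5 * ||x - c||^2 with c = (2, -3) on the box [-1, 1]^2, the mapping is zero at (1, -1), the projection of c, and nonzero elsewhere.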

71. Statistically efficient, polynomial-time algorithms for combinatorial semi-bandits

Author: Thibaut Cuvelier, Eric Gourdin, Richard Combes, CentraleSupélec, Laboratoire des signaux et systèmes (L2S), Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS), Orange Labs [Issy les Moulineaux], France Télécom, ACM, and Orange Gardens
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computational complexity theory, Computer science, Computer Networks and Communications, 0211 other engineering and technologies, Machine Learning (stat.ML), 02 engineering and technology, Combinatorial Bandits, [INFO.INFO-DM]Computer Science [cs]/Discrete Mathematics [cs.DM], 01 natural sciences, Machine Learning (cs.LG), [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Set (abstract data type), 010104 statistics & probability, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Statistics - Machine Learning, Reinforcement learning, Combinatorial Optimization, [MATH.MATH-CO]Mathematics [math]/Combinatorics [math.CO], FOS: Mathematics, Computer Science (miscellaneous), 0101 mathematics, Safety, Risk, Reliability and Quality, Mathematics - Optimization and Control, Time complexity, Mathematics, Linear function (calculus), 021103 operations research, Approximation algorithm, Regret, Maximization, Function (mathematics), Uncorrelated, Approximation algorithms, Exponential function, Combinatorial optimisation, Optimization and Control (math.OC), Hardware and Architecture, Combinatorial optimization, Bandit algorithms, Bandits, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Algorithm, Software
Abstract: We consider combinatorial semi-bandits over a set $\mathcal{X} \subset \{0,1\}^d$ where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = \mathcal{O}\left(\frac{d (\ln m)^2 \ln T}{\Delta_{\min}}\right)$ after $T$ rounds, where $m = \max_{x \in \mathcal{X}} \mathbf{1}^\top x$. However, ESCB has computational complexity $\mathcal{O}(|\mathcal{X}|)$, which is typically exponential in $d$, and cannot be used in large dimensions. We propose the first algorithm that is both computationally and statistically efficient for this problem, with regret $R(T) = \mathcal{O}\left(\frac{d (\ln m)^2 \ln T}{\Delta_{\min}}\right)$ and computational asymptotic complexity $\mathcal{O}(\delta_T^{-1}\,\mathrm{poly}(d))$, where $\delta_T$ is a function that vanishes arbitrarily slowly. Our approach involves carefully designing AESCB, an approximate version of ESCB with the same regret guarantees. We show that, whenever budgeted linear maximization over $\mathcal{X}$ can be solved up to a given approximation ratio, AESCB is implementable in polynomial time $\mathcal{O}(\delta_T^{-1}\,\mathrm{poly}(d))$ by repeatedly maximizing a linear function over $\mathcal{X}$ subject to a linear budget constraint, and we show how to solve these maximization problems efficiently. Additional algorithms, proofs and numerical experiments are given in the complete version of this work.
Published: 2021

72. Auction-based and Distributed Optimization Approaches for Scheduling Observations in Satellite Constellations with Exclusive Orbit Portions

Author: Gauthier Picard, ONERA / DTIS, Université de Toulouse [Toulouse], and ONERA-PRES Université de Toulouse
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Optimization and Control (math.OC), Computer Science - Artificial Intelligence, [INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA], FOS: Mathematics, Computer Science - Multiagent Systems, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Mathematics - Optimization and Control, Multiagent Systems (cs.MA), [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Abstract: We investigate the use of multi-agent allocation techniques on problems related to Earth observation scenarios with multiple users and satellites. We focus on the problem of coordinating users who have reserved exclusive orbit portions with one central planner whose requests may use some intervals of these exclusive portions. We define this problem as the Earth Observation Satellite Constellation Scheduling Problem (EOSCSP) and map it to a Mixed Integer Linear Program. To solve EOSCSP, we propose market-based techniques and a distributed problem-solving technique based on Distributed Constraint Optimization (DCOP), where agents cooperate to allocate requests without sharing their own schedules. These contributions are experimentally evaluated on randomly generated EOSCSP instances based on real large-scale or highly conflicting observation order books.
Published: 2021

73. Non-asymptotic convergence bounds for Wasserstein approximation using point clouds

Author: Merigot, Quentin, Santambrogio, Filippo, Sarrazin, Clément, Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS), Institut Universitaire de France (IUF), Ministère de l'Education nationale, de l’Enseignement supérieur et de la Recherche (M.E.N.E.S.R.), Institut Camille Jordan [Villeurbanne] (ICJ), École Centrale de Lyon (ECL), Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université Jean Monnet [Saint-Étienne] (UJM)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS), and ANR-16-CE40-0014,MAGA,Monge-Ampère et Géométrie Algorithmique(2016)
Subjects: FOS: Computer and information sciences, [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Optimization and Control (math.OC), Statistics - Machine Learning, Optimal quantization, FOS: Mathematics, Optimal transport, Machine Learning (stat.ML), Wasserstein distance, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Mathematics - Optimization and Control
Abstract: Several issues in machine learning and inverse problems require generating discrete data, as if sampled from a model probability distribution. A common way to do so relies on the construction of a uniform probability distribution over a set of $N$ points which minimizes the Wasserstein distance to the model distribution. This minimization problem, where the unknowns are the positions of the atoms, is non-convex. Yet, in most cases, a suitably adjusted version of Lloyd's algorithm (in which Voronoi cells are replaced by Power cells) leads to configurations with small Wasserstein error. This is surprising, again because of the non-convex nature of the problem, and because of the existence of spurious critical points. We provide explicit upper bounds for the convergence speed of this Lloyd-type algorithm, starting from a cloud of points sufficiently far from each other. This already works after one step of the iteration procedure, and similar bounds can be deduced for the corresponding gradient descent. These bounds naturally lead to a modified Polyak-Łojasiewicz inequality for the Wasserstein distance cost, with an error term depending on the distances between Dirac masses in the discrete distribution.
Published: 2021
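
The Lloyd-type iteration is easiest to see in one dimension, where it fits in a few lines of Python. The sketch assumes the model distribution is represented by equally weighted samples whose count is a multiple of $N$. In 1D the optimal transport plan is monotone, so the equal-mass Power cells of the sorted atoms are consecutive quantile blocks, and a Lloyd step moves each atom to the mean of its block. The function names are hypothetical, and the 1D setting sidesteps the non-convexity and spurious critical points that make the general case studied in the paper interesting.

```python
def lloyd_step_1d(model_samples, n_points):
    """One Lloyd-type step for uniform N-point quantization in 1D.

    Each atom carries mass 1/n_points. In 1D the optimal (monotone) transport
    assigns the i-th sorted atom the i-th quantile block of the model samples,
    so the updated atom is that block's mean. The result therefore does not
    depend on the initial positions of the (distinct) atoms.
    """
    xs = sorted(model_samples)
    block = len(xs) // n_points  # assumes len(xs) % n_points == 0
    return [sum(xs[i * block:(i + 1) * block]) / block for i in range(n_points)]


def w2_squared_1d(model_samples, points):
    """Squared Wasserstein-2 distance between the empirical model measure and
    the uniform measure on `points`, using the optimal sorted coupling."""
    xs = sorted(model_samples)
    ps = sorted(points)
    block = len(xs) // len(ps)
    return sum((x - ps[j // block]) ** 2 for j, x in enumerate(xs)) / len(xs)


samples = [(k + 0.5) / 1000 for k in range(1000)]  # stand-in for Uniform[0, 1]
new_points = lloyd_step_1d(samples, 4)              # approx. [0.125, 0.375, 0.625, 0.875]
print(w2_squared_1d(samples, [0.1, 0.2, 0.3, 0.4]), w2_squared_1d(samples, new_points))
```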

74. Automated Data-Driven Selection of the Hyperparameters for Total-Variation-Based Texture Segmentation

Author: Samuel Vaiter, Patrice Abry, Barbara Pascal, Nelly Pustelnik, Laboratoire de Physique de l'ENS Lyon (Phys-ENS), École normale supérieure - Lyon (ENS Lyon)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon, Centre National de la Recherche Scientifique (CNRS), Institut de Mathématiques de Bourgogne [Dijon] (IMB), Centre National de la Recherche Scientifique (CNRS)-Université de Franche-Comté (UFC), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université de Bourgogne (UB), ANR-19-CE48-0009,Multisc-In,Multiscale estimation and Interface detection(2019), École normale supérieure de Lyon (ENS de Lyon)-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), and Université de Bourgogne (UB)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Centre National de la Recherche Scientifique (CNRS)
Subjects: FOS: Computer and information sciences, Statistics and Probability, Computer Science - Machine Learning, Computer science, proximal algorithms, Machine Learning (stat.ML), 02 engineering and technology, Least squares, Regularization (mathematics), Gaussian noise, Machine Learning (cs.LG), symbols.namesake, Bias of an estimator, Statistics - Machine Learning, Strong convexity, FOS: Mathematics, 0202 electrical engineering, electronic engineering, information engineering, [INFO]Computer Science [cs], Texture, [MATH]Mathematics [math], Mathematics - Optimization and Control, Total variation, Covariance matrix, Applied Mathematics, segmentation, SURE, Estimator, Stein Unbiased Risk Estimator, 68U10, 65K10, 65C20, 62F12, Condensed Matter Physics, Noise, Optimization and Control (math.OC), Modeling and Simulation, Texture segmentation, Hyperparameter optimization, Regularization parameters tuning, Algorithmic differentiation, symbols, 020201 artificial intelligence & image processing, Geometry and Topology, Computer Vision and Pattern Recognition, Algorithm, [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing, Estimation
Abstract: Penalized least squares are widely used in signal and image processing. Yet they suffer from a major limitation: they require fine-tuning of the regularization parameters. Under assumptions on the noise probability distribution, Stein-based approaches provide an unbiased estimator of the quadratic risk. The Generalized Stein Unbiased Risk Estimator is revisited to handle correlated Gaussian noise without requiring inversion of the covariance matrix. Then, in order to avoid an expensive grid search, it is necessary to design an algorithmic scheme that minimizes the quadratic risk with respect to the regularization parameters. This work extends the Stein Unbiased GrAdient estimator of the Risk (SUGAR) of Deledalle et al. to the case of correlated Gaussian noise, deriving a general automatic tuning of regularization parameters. First, the theoretical asymptotic unbiasedness of the gradient estimator is demonstrated in the case of general correlated Gaussian noise. Then, the proposed parameter selection strategy is particularized to fractal texture segmentation, where the problem formulation naturally entails inter-scale and spatially correlated noise. A numerical assessment is provided, as well as a discussion of the practical issues., Submitted to SIAM Imaging Science
Published: 2021
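
As background on Stein-based risk estimation, the Python sketch below shows the classical SURE identity in its simplest setting: i.i.d. Gaussian noise of known level $\sigma$ and a soft-thresholding estimator $f$, for which $\mathrm{SURE}(\lambda) = \|f(y) - y\|^2 - n\sigma^2 + 2\sigma^2\,\mathrm{div} f(y)$ has a closed form. This is only illustrative background under those assumptions. The paper's contributions (a generalized SURE for correlated noise, its gradient with respect to the regularization parameters, and total-variation-based segmentation) are not implemented here, and the function names are hypothetical.

```python
import math


def soft_threshold(y, lam):
    """Componentwise soft thresholding: shrink each entry toward 0 by lam."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in y]


def sure_soft_threshold(y, lam, sigma):
    """Stein's unbiased estimate of E||soft_threshold(y, lam) - x||^2
    for y = x + noise with noise ~ N(0, sigma^2 I) (white noise only)."""
    n = len(y)
    residual = sum(min(v * v, lam * lam) for v in y)  # ||f(y) - y||^2
    divergence = sum(1 for v in y if abs(v) > lam)     # div f(y): number of kept entries
    return residual - n * sigma ** 2 + 2 * sigma ** 2 * divergence


y = [3.0, -0.5, 0.2, 4.0]
print(soft_threshold(y, 1.0))
print(sure_soft_threshold(y, 1.0, 1.0))
```

Because SURE depends only on the observation $y$ and not on the unknown clean signal $x$, it can be evaluated (and, in the paper's setting, differentiated) to tune $\lambda$ without ground truth.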

75. Learning Value Functions in Deep Policy Gradients using Residual Variance

Author: Flet-Berliac, Yannis, Ouhamma, Reda, Maillard, Odalric-Ambrym, Preux, Philippe, Sequential Learning (SEQUEL), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Université de Lille-Ecole Centrale de Lille-Centre National de la Recherche Scientifique (CNRS)-Université de Lille-Ecole Centrale de Lille-Centre National de la Recherche Scientifique (CNRS), Scool (Scool), and Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Optimization and Control (math.OC), Computer Science - Artificial Intelligence, Statistics - Machine Learning, FOS: Mathematics, Machine Learning (stat.ML), Mathematics - Optimization and Control, Machine Learning (cs.LG), [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Abstract: Policy gradient algorithms have proven to be successful in diverse decision-making and control tasks. However, these methods suffer from high sample complexity and instability issues. In this paper, we address these challenges by providing a different approach for training the critic in the actor-critic framework. Our work builds on recent studies indicating that traditional actor-critic algorithms do not succeed in fitting the true value function, which calls for a better objective for the critic. In our method, the critic uses a new state-value (resp. state-action-value) function approximation that learns the value of the states (resp. state-action pairs) relative to their mean value rather than the absolute value as in conventional actor-critic. We prove the theoretical consistency of the new gradient estimator and observe dramatic empirical improvement across a variety of continuous control tasks and algorithms. Furthermore, we validate our method in tasks with sparse rewards, where we provide experimental evidence and theoretical insights., Accepted at ICLR 2021
Published: 2021
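
The critic objective described above can be contrasted with the usual mean-squared error in a few lines of Python. This sketch is an assumption-laden reading of the abstract: it takes "learning values relative to their mean" to mean penalizing the variance of the residuals $\hat V(s) - V^{\text{target}}(s)$ rather than their squared magnitude, so a critic that is off by a constant incurs no loss. The function names are hypothetical; see the paper for the exact estimator and how it enters the policy gradient.

```python
def mse_loss(pred, target):
    """Conventional critic loss: mean squared residual."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def residual_variance_loss(pred, target):
    """Variance of the residuals pred - target: the critic only has to get
    values right up to a common offset (illustrative reading of the abstract)."""
    r = [p - t for p, t in zip(pred, target)]
    mean_r = sum(r) / len(r)
    return sum((x - mean_r) ** 2 for x in r) / len(r)


pred = [2.0, 3.0, 4.0]    # every prediction is off by exactly +1
target = [1.0, 2.0, 3.0]
print(mse_loss(pred, target), residual_variance_loss(pred, target))  # 1.0 0.0
```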

76. Adaptive extra-gradient methods for min-max optimization and games

Author: Antonakopoulos, Kimon, Belmega, Veronica, Mertikopoulos, Panayotis, Performance analysis and optimization of LARge Infrastructures and Systems (POLARIS), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire d'Informatique de Grenoble (LIG), Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), Ecole Nationale Supérieure de l'Electronique et de ses Applications (ENSEA), ANR-16-CE33-0004,ORACLESS,Stratégies adaptatives d'allocation des ressources dans les réseaux sans fil dynamiques(2016), ANR-19-P3IA-0003,MIAI,MIAI @ Grenoble Alpes(2019), Mertikopoulos, Panayotis, Stratégies adaptatives d'allocation des ressources dans les réseaux sans fil dynamiques - - ORACLESS2016 - ANR-16-CE33-0004 - AAPG2016 - VALID, MIAI @ Grenoble Alpes - - MIAI2019 - ANR-19-P3IA-0003 - P3IA - VALID, Laboratoires d'excellence - Systemes et Algorithmes Pervasifs au confluent des mondes physique et numérique - - PERSYVAL-lab2011 - ANR-11-LABX-0025 - LABX - VALID, Apprentissage adaptatif multi-agent - - ALIAS2019 - ANR-19-CE48-0018 - AAPG2019 - VALID, APPEL À PROJETS GÉNÉRIQUE 2018 - Technologies Emergentes pour l'Internet des Objets - - ELIOT2018 - ANR-18-CE40-0030 - AAPG2018 - VALID, Equipes Traitement de l'Information et Systèmes (ETIS - UMR 8051), Ecole Nationale Supérieure de l'Electronique et de ses Applications (ENSEA)-Centre National de la Recherche Scientifique (CNRS)-CY Cergy Paris Université (CY), FAPESP 2018/12579-7, ANR-11-LABX-0025,PERSYVAL-lab,Systemes et Algorithmes Pervasifs au confluent des mondes physique et numérique(2011), ANR-19-CE48-0018,ALIAS,Apprentissage adaptatif multi-agent(2019), and ANR-18-CE40-0030,ELIOT,Technologies Emergentes pour l'Internet des Objets(2018)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computer Science and Game Theory, Optimization and Control (math.OC), Primary 90C47, 91A68, secondary 49J40, 90C33, FOS: Mathematics, [MATH.MATH-OC] Mathematics [math]/Optimization and Control [math.OC], [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Mathematics - Optimization and Control, Computer Science and Game Theory (cs.GT), Machine Learning (cs.LG)
Abstract: We present a new family of min-max optimization algorithms that automatically exploit the geometry of the gradient data observed at earlier iterations to perform more informative extra-gradient steps in later ones. Thanks to this adaptation mechanism, the proposed method automatically detects whether the problem is smooth or not, without requiring any prior tuning by the optimizer. As a result, the algorithm simultaneously achieves order-optimal convergence rates, i.e., it converges to an $\varepsilon$-optimal solution within $\mathcal{O}(1/\varepsilon)$ iterations in smooth problems, and within $\mathcal{O}(1/\varepsilon^2)$ iterations in non-smooth ones. Importantly, these guarantees do not require any of the standard boundedness or Lipschitz continuity conditions that are typically assumed in the literature; in particular, they apply even to problems with singularities (such as resource allocation problems and the like). This adaptation is achieved through the use of a geometric apparatus based on Finsler metrics and a suitably chosen mirror-prox template that allows us to derive sharp convergence rates for the methods at hand., Comment: 28 pages, 5 figures, 1 table
Published: 2021
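
As a rough illustration of the extra-gradient template with an adaptive step size, the Python sketch below runs Euclidean extra-gradient on the bilinear saddle problem $\min_x \max_y xy$, whose unique solution is $(0, 0)$. The step-size rule $\eta_t = 1/\sqrt{1 + \sum_{s<t} \|V(z_{s+1/2}) - V(z_s)\|^2}$ is an assumption chosen in the spirit of the abstract (the step shrinks only when consecutive operator values disagree, so no Lipschitz constant is needed). The paper's Finsler-metric and mirror-prox machinery is not reproduced, and the function names are hypothetical.

```python
import math


def adaptive_extragradient(field, z0, iters=200):
    """Euclidean extra-gradient with an AdaGrad-style step size (illustrative).

    field: the monotone operator V(z) = (grad_x f, -grad_y f)
    z0:    starting point (list of floats)
    """
    z = list(z0)
    acc = 0.0  # running sum of ||V(z_half) - V(z)||^2
    for _ in range(iters):
        eta = 1.0 / math.sqrt(1.0 + acc)
        g = field(z)
        z_half = [zi - eta * gi for zi, gi in zip(z, g)]  # extrapolation (leading) step
        g_half = field(z_half)
        z = [zi - eta * gi for zi, gi in zip(z, g_half)]  # update taken from the base point
        acc += sum((a - b) ** 2 for a, b in zip(g_half, g))
    return z


def bilinear_field(z):
    """V(x, y) = (df/dx, -df/dy) for f(x, y) = x * y."""
    x, y = z
    return [y, -x]


z_final = adaptive_extragradient(bilinear_field, [1.0, 1.0])
print(math.hypot(*z_final))  # distance to the saddle point (0, 0)
```

Plain gradient descent-ascent with a constant step spirals outward on this problem; the extrapolation step is what makes the iterates contract toward the saddle point.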

77. Factor Graph-Based Smoothing Without Matrix Inversion for Highly Precise Localization

Author: Paul Chauchat, Axel Barrau, Silvere Bonnabel, Institut Supérieur de l'Aéronautique et de l'Espace (ISAE-SUPAERO), Safran Tech, SAFRAN Group, Institut de sciences exactes et appliquées (ISEA), Université de la Nouvelle-Calédonie (UNC), Centre de Robotique (CAOR), MINES ParisTech - École nationale supérieure des mines de Paris, and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)
Subjects: FOS: Computer and information sciences, Optimization, 0209 industrial biotechnology, Optimization problem, Computer science, Covariance matrices, Factor graphs, Autonomous vehicles, Systems and Control (eess.SY), 02 engineering and technology, Simultaneous localization and mapping, [INFO.INFO-CG]Computer Science [cs]/Computational Geometry [cs.CG], Electrical Engineering and Systems Science - Systems and Control, 01 natural sciences, Smoothing methods, localization, ill-conditionning, Computer Science::Robotics, Computer Science - Robotics, [SPI]Engineering Sciences [physics], 020901 industrial engineering & automation, Inertial measurement unit, FOS: Electrical engineering, electronic engineering, information engineering, FOS: Mathematics, inertial navigation, [INFO.INFO-RB]Computer Science [cs]/Robotics [cs.RO], Electrical and Electronic Engineering, Mathematics - Optimization and Control, Inertial navigation system, Information filtering system, ComputingMilieux_MISCELLANEOUS, information filter, 010401 analytical chemistry, Kalman filter, 0104 chemical sciences, Optimization and Control (math.OC), Control and Systems Engineering, A priori and a posteriori, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Robotics (cs.RO), Algorithm, Kalman filters, Smoothing
Abstract: We consider the problem of localizing a manned, semi-autonomous, or autonomous vehicle in the environment using information coming from the vehicle's sensors, a problem known as navigation or simultaneous localization and mapping (SLAM) depending on the context. To infer knowledge from sensor measurements while drawing on a priori knowledge about the vehicle's dynamics, modern approaches solve an optimization problem to compute the most likely trajectory given all past observations, an approach known as smoothing. Improving smoothing solvers is an active field of research in the SLAM community. Most work focuses on reducing the computational load by inverting the involved linear system while preserving its sparsity. The present paper raises an issue which, to the knowledge of the authors, has not been addressed yet: standard smoothing solvers require explicitly using the inverse of sensor noise covariance matrices. This means the parameters that reflect the noise magnitude must be sufficiently large for the smoother to function properly. When matrices are close to singular, which is the case when using high-precision modern inertial measurement units (IMU), numerical issues necessarily arise, especially with the 32-bit implementations demanded by most industrial aerospace applications. We discuss these issues and propose a solution that builds upon the Kalman filter to improve smoothing algorithms. We then leverage the results to devise a localization algorithm based on the fusion of IMU and vision sensors. Successful real-world experiments using an actual car equipped with a tactical-grade high-performance IMU and a LiDAR illustrate the relevance of the approach to the field of autonomous vehicles.
Published: 2021
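
The numerical issue raised above can be seen in the scalar case. The Python sketch below compares a covariance-form Kalman measurement update, which only inverts the innovation variance $P + R$, with an information-form update, which needs $R^{-1}$ and therefore breaks down as the sensor noise variance $R$ goes to zero. It assumes a direct scalar observation $z = x + \text{noise}$ and is a minimal illustration of the problem, not the smoothing algorithm proposed by the authors; the function names are hypothetical.

```python
def kalman_update(x_prior, p_prior, z, r):
    """Covariance-form update: only (p_prior + r) is inverted, so r == 0 is fine."""
    k = p_prior / (p_prior + r)  # Kalman gain
    x_post = x_prior + k * (z - x_prior)
    p_post = (1.0 - k) * p_prior
    return x_post, p_post


def information_update(x_prior, p_prior, z, r):
    """Information-form update: needs 1 / r, which fails when r == 0."""
    info_post = 1.0 / p_prior + 1.0 / r
    info_vec = x_prior / p_prior + z / r
    return info_vec / info_post, 1.0 / info_post


print(kalman_update(0.0, 1.0, 2.0, 1.0))       # (1.0, 0.5)
print(information_update(0.0, 1.0, 2.0, 1.0))  # (1.0, 0.5): same result when r > 0
print(kalman_update(0.0, 1.0, 2.0, 0.0))       # (2.0, 0.0): perfect sensor handled
# information_update(0.0, 1.0, 2.0, 0.0) raises ZeroDivisionError
```

In floating point, a tiny but nonzero $R$ does not raise an error; instead $R^{-1}$ becomes huge and dominates the sum, which is roughly where precision is lost, particularly in 32-bit arithmetic.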

78. Revisiting the Role of Euler Numerical Integration on Acceleration and Stability in Convex Optimization

Author: Zhang, Peiyuan, Orvieto, Antonio, Daneshmand, Hadi, Hofmann, Thomas, Smith, Roy, Eidgenössische Technische Hochschule - Swiss Federal Institute of Technology [Zürich] (ETH Zürich), Statistical Machine Learning and Parsimony (SIERRA), Département d'informatique de l'École normale supérieure (DI-ENS), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria), Département d'informatique - ENS Paris (DI-ENS), Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris, Banerjee, Arindam, Fukumizu, Kenji, daneshmand, hadi, École normale supérieure - Paris (ENS-PSL), and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [INFO] Computer Science [cs], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], Machine Learning (cs.LG), [STAT] Statistics [stat], [STAT]Statistics [stat], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Optimization and Control (math.OC), FOS: Mathematics, [INFO]Computer Science [cs], Mathematics - Optimization and Control
Abstract: Viewing optimization methods as numerical integrators for ordinary differential equations (ODEs) provides a thought-provoking modern framework for studying accelerated first-order optimizers. In this literature, acceleration is often supposed to be linked to the quality of the integrator (accuracy, energy preservation, symplecticity). In this work, we propose a novel ordinary differential equation that questions this connection: both the explicit and the semi-implicit (a.k.a. symplectic) Euler discretizations of this ODE lead to an accelerated algorithm for convex programming. Although semi-implicit methods are well known in numerical analysis to enjoy many desirable features for the integration of physical systems, our findings show that these properties do not necessarily relate to acceleration., Proceedings of Machine Learning Research, 130, ISSN:2640-3498, Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021)
Published: 2021
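
To make the distinction between the two discretizations concrete, the Python sketch below applies explicit and semi-implicit (symplectic) Euler to the classical heavy-ball ODE $\ddot x + a\dot x + \nabla f(x) = 0$ on the quadratic $f(x) = x^2/2$. This ODE is an assumption chosen for illustration; it is not the novel ODE introduced in the paper, and the sketch shows only how the two update orders differ, not the acceleration result.

```python
def explicit_euler(grad, x, v, h, a, steps):
    """Explicit Euler: both position and velocity updates use the old state."""
    for _ in range(steps):
        x, v = x + h * v, v - h * (a * v + grad(x))
    return x, v


def semi_implicit_euler(grad, x, v, h, a, steps):
    """Semi-implicit (symplectic) Euler: update velocity, then move with it."""
    for _ in range(steps):
        v = v - h * (a * v + grad(x))
        x = x + h * v
    return x, v


def grad_f(x):
    """Gradient of f(x) = x**2 / 2."""
    return x


print(explicit_euler(grad_f, 1.0, 0.0, h=0.1, a=1.0, steps=500))
print(semi_implicit_euler(grad_f, 1.0, 0.0, h=0.1, a=1.0, steps=500))
```

With this small step size both schemes converge on the quadratic; the only difference in the code is whether the position update sees the old or the new velocity.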

79. Kuhn's Equivalence Theorem for Games in Product Form

Author: Benjamin Heymann, Michel De Lara, Jean-Philippe Chancelier, Criteo AI Lab, Criteo [Paris], Centre d'Enseignement et de Recherche en Mathématiques, Informatique et Calcul Scientifique (CERMICS), and Institut National de Recherche en Informatique et en Automatique (Inria)-École des Ponts ParisTech (ENPC)
Subjects: FOS: Computer and information sciences, perfect recall, Economics and Econometrics, Computer Science::Computer Science and Game Theory, Games with information, [INFO.INFO-GT]Computer Science [cs]/Computer Science and Game Theory [cs.GT], Witsenhausen intrinsic model, ComputingMilieux_PERSONALCOMPUTING, Kuhn's equivalence theorem, Computer Science - Computer Science and Game Theory, Optimization and Control (math.OC), FOS: Mathematics, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Mathematics - Optimization and Control, Finance, Computer Science and Game Theory (cs.GT)
Abstract: We propose an alternative to the tree representation of extensive form games. Games in product form represent information with $\sigma$-fields over a product set, and do not require an explicit description of the play temporality, as opposed to extensive form games on trees. This representation encompasses games with a continuum of actions, randomness and players, as well as games for which the play order cannot be determined in advance. We adapt and prove Kuhn's theorem, regarding equivalence between mixed and behavioral strategies under perfect recall, for games in product form with continuous action sets.
Published: 2021

80. A Semidefinite Optimization-based Branch-and-Bound Algorithm for Several Reactive Optimal Power Flow Problems

Author: Sliwak, Julie, Anjos, Miguel, Létocart, Lucas, Traversi, Emiliano, Département de Mathématiques et de Génie Industriel (MAGI), École Polytechnique de Montréal (EPM), Laboratoire d'Informatique de Paris-Nord (LIPN), Université Sorbonne Paris Cité (USPC)-Institut Galilée-Université Paris 13 (UP13)-Centre National de la Recherche Scientifique (CNRS), Réseau de Transport d'Electricité [Paris] (RTE), School of Mathematics - University of Edinburgh, University of Edinburgh, and Centre National de la Recherche Scientifique (CNRS)-Université Sorbonne Paris Nord
Subjects: FOS: Computer and information sciences, Computer Science - Robotics, Optimization and Control (math.OC), FOS: Mathematics, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], [INFO.INFO-RO]Computer Science [cs]/Operations Research [cs.RO], Robotics (cs.RO), Mathematics - Optimization and Control
Abstract: The Reactive Optimal Power Flow (ROPF) problem consists of computing an optimal power generation dispatch for an alternating current transmission network that respects power flow equations and operational constraints. Some means of action on the voltage are modelled in the ROPF problem, such as the possible activation of shunts, which implies discrete variables. The ROPF problem belongs to the class of nonconvex MINLPs (Mixed-Integer Nonlinear Problems), which are NP-hard. In this paper, we solve three new variants of the ROPF problem by using a semidefinite optimization-based Branch-and-Bound algorithm. We present results on MATPOWER instances and show that this method can solve most instances to global optimality. On the instances not solved to optimality, our algorithm finds solutions with better values than those obtained by a rounding algorithm. We also demonstrate that applying an appropriate clique merging algorithm can significantly speed up the resolution of semidefinite relaxations of large ROPF instances.
Published: 2021

81. Accelerating Block Coordinate Descent for Nonnegative Tensor Factorization

Author: Jeremy E. Cohen, Le Thi Khanh Hien, Andersen Man Shun Ang, Nicolas Gillis, Parcimonie et Nouveaux Algorithmes pour le Signal et la Modélisation Audio (PANAMA), Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-SIGNAUX ET IMAGES NUMÉRIQUES, ROBOTIQUE (IRISA-D5), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT), FNRS, Fonds Wetenschappelijk Onderzoek - Vlaanderen, O005318F-RG47, ANR-20-CE23-0010,LoRAiA,Approximations de rang faible pour l'intelligence artificielle(2020), European Project: 679515,H2020,ERC-2015-STG,COLORAMAP(2016), Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), SIGNAUX ET IMAGES NUMÉRIQUES, ROBOTIQUE (IRISA-D5), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Inria Rennes – Bretagne Atlantique, and Institut National de Recherche en Informatique et en Automatique (Inria)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Heuristic (computer science), Extrapolation, MathematicsofComputing_NUMERICALANALYSIS, Machine Learning (stat.ML), 010103 numerical & computational mathematics, 01 natural sciences, Machine Learning (cs.LG), [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Statistics - Machine Learning, Convergence (routing), FOS: Mathematics, Mathematics - Numerical Analysis, Nonnegative tensor factorization, 0101 mathematics, Coordinate descent, Mathematics - Optimization and Control, Computational budget, Block (data storage), Mathematics, Descent (mathematics), Algebra and Number Theory, Applied Mathematics, Numerical Analysis (math.NA), 010101 applied mathematics, Optimization and Control (math.OC), Algorithm
Abstract: This paper is concerned with improving the empirical convergence speed of block-coordinate descent algorithms for approximate nonnegative tensor factorization (NTF). We propose an extrapolation strategy between block updates, referred to as heuristic extrapolation with restarts (HER). HER significantly accelerates the empirical convergence speed of most existing block-coordinate algorithms for dense NTF, in particular for challenging computational scenarios, while requiring a negligible additional computational budget., 32 pages, 24 figures
Published: 2021

82. Solving Inverse Problems by Joint Posterior Maximization with Autoencoding Prior

Author: Mario González, Andrés Almansa, Pauline Tan, Universidad de la República (UDELAR), Mathématiques Appliquées Paris 5 (MAP5 - UMR 8145), Institut National des Sciences Mathématiques et de leurs Interactions (INSMI)-Centre National de la Recherche Scientifique (CNRS)-Université de Paris (UP), Laboratoire Jacques-Louis Lions (LJLL (UMR_7598)), Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Université de Paris (UP), and ANR-19-CE23-0027,PostProdLEAP,Repenser la post-production d'archives avec des méthodes à patch, variationnelles et par apprentissage(2019)
Subjects: FOS: Computer and information sciences, Bi-convex Optimization, Computer Science - Machine Learning, General Mathematics, Computer Vision and Pattern Recognition (cs.CV), Inverse Problems, Computer Science - Computer Vision and Pattern Recognition, Machine Learning (stat.ML), Machine Learning (cs.LG), [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Statistics - Machine Learning, FOS: Electrical engineering, electronic engineering, information engineering, FOS: Mathematics, Generative Models, Image Restoration, Variational Auto-encoders, Mathematics - Optimization and Control, Applied Mathematics, Image and Video Processing (eess.IV), Electrical Engineering and Systems Science - Image and Video Processing, Optimization and Control (math.OC), [INFO.INFO-TI]Computer Science [cs]/Image Processing [eess.IV], [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing, Bayesian Statistics, [MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA]
Abstract: In this work, we address the problem of solving ill-posed inverse problems in imaging where the prior is a variational autoencoder (VAE). Specifically, we consider the decoupled case where the prior is trained once and can be reused for many different log-concave degradation models without retraining. Whereas previous MAP-based approaches to this problem lead to highly non-convex optimization algorithms, our approach computes the joint (space-latent) MAP that naturally leads to alternate optimization algorithms and to the use of a stochastic encoder to accelerate computations. The resulting technique (JPMAP) performs Joint Posterior Maximization using an Autoencoding Prior. We show theoretical and experimental evidence that the proposed objective function is quite close to bi-convex. Indeed, it satisfies a weak bi-convexity property which is sufficient to guarantee that our optimization scheme converges to a stationary point. We also highlight the importance of correctly training the VAE using a denoising criterion, in order to ensure that the encoder generalizes well to out-of-distribution images, without affecting the quality of the generative model. This simple modification is key to providing robustness to the whole procedure. Finally, we show how our joint MAP methodology relates to more common MAP approaches, and we propose a continuation scheme that makes use of our JPMAP algorithm to provide more robust MAP estimates. Experimental results also show the higher quality of the solutions obtained by our JPMAP approach with respect to other non-convex MAP approaches, which more often get stuck in spurious local optima., Comment: arXiv admin note: text overlap with arXiv:1911.06379
Published: 2021
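The joint (space-latent) MAP described above can be minimized by alternating between the image x and the latent code z. The sketch below illustrates only that alternation. It relies on a hypothetical simplification: a *linear* decoder `W` stands in for the VAE decoder, so each block update is an exact linear solve. With a neural decoder the z-step is non-convex, and the abstract mentions a stochastic encoder to speed it up. The objective `J(x, z)`, the weights `sigma`, `gamma` and all function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def joint_objective(A, y, W, x, z, sigma, gamma):
    # Hypothetical joint objective with a linear decoder W:
    # J(x, z) = ||Ax - y||^2 / (2 sigma^2) + ||x - Wz||^2 / (2 gamma^2) + ||z||^2 / 2
    data = np.sum((A @ x - y) ** 2) / (2 * sigma**2)
    coupling = np.sum((x - W @ z) ** 2) / (2 * gamma**2)
    prior = 0.5 * np.sum(z**2)
    return data + coupling + prior

def alternate_joint_map(A, y, W, sigma=0.1, gamma=0.5, n_iter=50):
    n, k = A.shape[1], W.shape[1]
    z = np.zeros(k)                                   # latent initialization (assumption)
    Hx = A.T @ A / sigma**2 + np.eye(n) / gamma**2    # Hessian of J in x
    Hz = W.T @ W / gamma**2 + np.eye(k)               # Hessian of J in z (linear decoder only)
    history = []
    for _ in range(n_iter):
        # Exact x-step: set grad_x J = 0 with z fixed.
        x = np.linalg.solve(Hx, A.T @ y / sigma**2 + W @ z / gamma**2)
        # Exact z-step: set grad_z J = 0 with x fixed.
        z = np.linalg.solve(Hz, W.T @ x / gamma**2)
        history.append(joint_objective(A, y, W, x, z, sigma, gamma))
    return x, z, history

# Usage on a small synthetic, under-determined (ill-posed) problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 30))
W = rng.standard_normal((30, 5))
y = A @ (W @ rng.standard_normal(5)) + 0.1 * rng.standard_normal(20)
x_hat, z_hat, hist = alternate_joint_map(A, y, W)
```

Each block is minimized exactly, so the recorded objective values cannot increase from one iteration to the next. This monotonicity is what makes alternate schemes attractive when the joint problem is (weakly) bi-convex.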

83. Some Remarks on Replicated Simulated Annealing

Author: Vincent Gripon, Franck Vermet, Matthias Löwe, Département Mathematical and Electrical Engineering (IMT Atlantique - MEE), IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT), Fachbereich Mathematik und Informatik [Münster] = Fachbereich Mathematik und Informatik [Münster] (FB 10), Westfälische Wilhelms-Universität Münster (WWU), Laboratoire de mathématiques de Brest (LM), Université de Brest (UBO)-Institut Brestois du Numérique et des Mathématiques (IBNM), and Université de Brest (UBO)-Centre National de la Recherche Scientifique (CNRS)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer science, Machine Learning (stat.ML), 01 natural sciences, Machine Learning (cs.LG), [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], 010104 statistics & probability, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing, Statistics - Machine Learning, Convergence (routing), FOS: Mathematics, Neural and Evolutionary Computing (cs.NE), 0101 mathematics, Mathematics - Optimization and Control, Mathematical Physics, Ansatz, Artificial neural network, [INFO.INFO-GT]Computer Science [cs]/Computer Science and Game Theory [cs.GT], Replica, Probability (math.PR), 010102 general mathematics, Computer Science - Neural and Evolutionary Computing, Statistical and Nonlinear Physics, Perceptron, Optimization and Control (math.OC), Simulated annealing, Algorithm, Mathematics - Probability
Abstract: Recently, authors have introduced the idea of training neural networks with discrete weights using a mix of classical simulated annealing and a replica ansatz known from the statistical physics literature. Among other points, they claim their method is able to find robust configurations. In this paper, we analyze this so-called “replicated simulated annealing” algorithm. In particular, we give criteria that guarantee its convergence, and study when it successfully samples from configurations. We also perform experiments using synthetic and real databases.
Published: 2021
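To make the idea of "replicated" simulated annealing concrete, here is a minimal sketch under stated assumptions. Several replicas of a binary weight vector in {-1, +1}^n run Metropolis single-flip annealing on a user-supplied energy. An attraction term `-gamma * sum_{a<b} w^a . w^b` pulls the replicas together. The linear schedules for the inverse temperature `beta` and the coupling `gamma`, and every parameter name, are assumptions chosen for illustration. They are not the schedules analyzed in the paper.

```python
import numpy as np

def replicated_sa(energy, n, n_replicas=3, n_sweeps=200,
                  beta_range=(0.1, 5.0), gamma_range=(0.0, 1.0), seed=0):
    """Metropolis annealing on H = sum_a E(w^a) - gamma * sum_{a<b} w^a . w^b (sketch)."""
    rng = np.random.default_rng(seed)
    W = rng.choice(np.array([-1, 1]), size=(n_replicas, n))
    E = np.array([energy(w) for w in W], dtype=float)
    best_idx = int(np.argmin(E))
    best_w, best_e = W[best_idx].copy(), E[best_idx]
    for s in range(n_sweeps):
        frac = s / max(n_sweeps - 1, 1)
        beta = beta_range[0] + frac * (beta_range[1] - beta_range[0])      # assumed linear schedule
        gamma = gamma_range[0] + frac * (gamma_range[1] - gamma_range[0])  # assumed linear schedule
        for a in range(n_replicas):
            for i in rng.permutation(n):
                w_new = W[a].copy()
                w_new[i] = -w_new[i]
                e_new = energy(w_new)
                # Flipping w^a_i changes each overlap w^a . w^b by -2 w^a_i w^b_i,
                # so the coupling term changes by +2 gamma w^a_i sum_{b != a} w^b_i.
                others = W[:, i].sum() - W[a, i]
                delta = (e_new - E[a]) + 2.0 * gamma * W[a, i] * others
                if delta <= 0 or rng.random() < np.exp(-beta * delta):
                    W[a] = w_new
                    E[a] = e_new
                    if e_new < best_e:
                        best_w, best_e = w_new.copy(), e_new
    return best_w, best_e

# Usage: a toy energy (Hamming distance to a planted vector) in place of a network's training loss.
rng = np.random.default_rng(1)
target = rng.choice(np.array([-1, 1]), size=21)
best_w, best_e = replicated_sa(lambda w: int(np.sum(w != target)), n=21)
```

The toy energy is deliberately trivial, so the sketch can be checked. For a discrete-weight perceptron, `energy` would count misclassified training patterns instead.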

84. Bayesian optimization of variable-size design space problems

Author: El-Ghazali Talbi, Mathieu Balesdent, Julien Pelamatti, Yannick Guerin, Loïc Brevault, DTIS, ONERA, Université Paris Saclay [Palaiseau], ONERA-Université Paris-Saclay, Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria), and Centre National d'Etudes Spatiales - Direction Des Lanceurs. (CNES)
Subjects: FOS: Computer and information sciences, Mathematical optimization, Control and Optimization, Optimization problem, Discrete variables, Computer science, 0211 other engineering and technologies, Complex system, Aerospace Engineering, Machine Learning (stat.ML), Variable-size design space optimization problems, 010103 numerical & computational mathematics, 02 engineering and technology, 01 natural sciences, [SPI]Engineering Sciences [physics], Statistics - Machine Learning, Convergence (routing), FOS: Mathematics, 0101 mathematics, Electrical and Electronic Engineering, [MATH]Mathematics [math], Mathematics - Optimization and Control, Civil and Structural Engineering, Variable (mathematics), Bayesian optimization, [PHYS]Physics [physics], 021103 operations research, Mechanical Engineering, Function (mathematics), Covariance, Mixed-variable optimization problems, Optimization and Control (math.OC), Focus (optics), Software
Abstract: Within the framework of complex system design, it is often necessary to solve mixed-variable optimization problems, in which the objective and constraint functions can depend simultaneously on continuous and discrete variables. Additionally, complex system design problems occasionally present a variable-size design space. This results in an optimization problem whose search space varies dynamically (in both the number and type of variables) during the optimization process, as a function of the values of specific discrete decision variables. The number and type of constraints can vary as well. In this paper, two alternative Bayesian optimization-based approaches are proposed to solve this type of optimization problem. The first consists of a budget allocation strategy that focuses the computational budget on the most promising design sub-spaces. The second approach, instead, is based on the definition of a kernel function that computes the covariance between samples characterized by partially different sets of variables. The results obtained on analytical and engineering-related test cases show faster and more consistent convergence of both proposed methods compared with the standard approaches.
Published: 2021

85. Acceleration Methods

Author: D'Aspremont, Alexandre, Scieur, Damien, Taylor, Adrien, Centre National de la Recherche Scientifique (CNRS), Statistical Machine Learning and Parsimony (SIERRA), Département d'informatique de l'École normale supérieure (DI-ENS), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria), Montreal Institute for Learning Algorithms [Montréal] (MILA), Centre de Recherches Mathématiques [Montréal] (CRM), Université de Montréal (UdeM)-Université de Montréal (UdeM), AA would like to acknowledge support from the ML and Optimisation joint research initiative with the fonds AXA pour la recherche and Kamet Ventures, a Google focused award, as well as funding by the French government under management of Agence Nationale de la Recherche as part of the 'Investissements d’avenir' program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute). 
AT acknowledges support from the European Research Council (ERC grant SEQUOIA 724063)., ANR-19-P3IA-0001,PRAIRIE,PaRis Artificial Intelligence Research InstitutE(2019), European Project: 724063,ERC-2016-COG,SEQUOIA(2017), Département d'informatique - ENS Paris (DI-ENS), Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Optimization and Control (math.OC), FOS: Mathematics, Numerical Analysis (math.NA), Mathematics - Numerical Analysis, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Mathematics - Optimization and Control, [MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA], Machine Learning (cs.LG)
Abstract: This monograph covers some recent advances in a range of acceleration techniques frequently used in convex optimization. We first use quadratic optimization problems to introduce two key families of methods, namely momentum and nested optimization schemes. They coincide in the quadratic case to form the Chebyshev method. We discuss momentum methods in detail, starting with the seminal work of Nesterov, and structure convergence proofs using a few master templates, such as that for optimized gradient methods, which provide the key benefit of showing how momentum methods optimize convergence guarantees. We further cover proximal acceleration, at the heart of the Catalyst and Accelerated Hybrid Proximal Extragradient frameworks, using similar algorithmic patterns. Common acceleration techniques rely directly on knowledge of some of the regularity parameters of the problem at hand. We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates while adapting to unobserved regularity parameters. Published in Foundations and Trends in Optimization (see https://www.nowpublishers.com/article/Details/OPT-036)
Published: 2021
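As a small illustration of the momentum methods discussed above, the sketch below compares plain gradient descent with the constant-momentum variant of Nesterov's method for an L-smooth, mu-strongly convex quadratic. That variant uses the momentum coefficient (sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu)). The test problem (a diagonal matrix with eigenvalues in [0.01, 1]) and the function names are assumptions chosen for illustration. The sketch does not reproduce the monograph's optimized gradient methods or restart schemes.

```python
import numpy as np

def gradient_descent(A, b, L, n_iter):
    # Minimize f(x) = 0.5 x^T A x - b^T x with step size 1/L.
    x = np.zeros_like(b)
    for _ in range(n_iter):
        x = x - (A @ x - b) / L
    return x

def nesterov(A, b, L, mu, n_iter):
    # Constant-momentum Nesterov scheme for mu-strongly convex, L-smooth objectives.
    q = np.sqrt(mu / L)
    beta = (1 - q) / (1 + q)
    x = np.zeros_like(b)
    y = x.copy()
    for _ in range(n_iter):
        x_new = y - (A @ y - b) / L          # gradient step at the extrapolated point
        y = x_new + beta * (x_new - x)       # momentum (extrapolation) step
        x = x_new
    return x

eigs = np.linspace(0.01, 1.0, 50)           # mu = 0.01, L = 1, condition number 100
A, b = np.diag(eigs), np.ones(50)
x_star = b / eigs
err_gd = np.linalg.norm(gradient_descent(A, b, L=1.0, n_iter=600) - x_star)
err_nag = np.linalg.norm(nesterov(A, b, L=1.0, mu=0.01, n_iter=600) - x_star)
```

Gradient descent contracts the slowest eigendirection by roughly 1 - mu/L = 0.99 per step. The accelerated scheme contracts it by roughly 1 - sqrt(mu/L) = 0.9, so after 600 iterations the accelerated error is many orders of magnitude smaller.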

86. Expanding boundaries of Gap Safe screening

Author: Cássio Dantas, Soubies, Emmanuel, Fevotte, Cedric, Signal et Communications (IRIT-SC), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, Centre National de la Recherche Scientifique (CNRS), ANR-19-P3IA-0004,ANITI,Artificial and Natural Intelligence Toulouse Institute(2019), European Project: 681839,H2020,ERC-2015-CoG,FACTORY(2016), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), and Université de Toulouse (UT)
Subjects: Signal Processing (eess.SP), FOS: Computer and information sciences, Computer Science - Machine Learning, non-negativity, sparse regression, Machine Learning (cs.LG), Convex optimization, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing, Optimization and Control (math.OC), FOS: Electrical engineering, electronic engineering, information engineering, FOS: Mathematics, safe screening rules, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Electrical Engineering and Systems Science - Signal Processing, Mathematics - Optimization and Control, beta-divergence
Abstract: Sparse optimization problems are ubiquitous in many fields such as statistics, signal/image processing and machine learning. This has led to the development of many iterative algorithms to solve them. A powerful strategy to boost the performance of these algorithms is known as safe screening: it allows the early identification of zero coordinates in the solution, which can then be eliminated to reduce the problem's size and accelerate convergence. In this work, we extend the existing Gap Safe screening framework by relaxing the global strong-concavity assumption on the dual cost function. Instead, we exploit local regularity properties, that is, strong concavity on well-chosen subsets of the domain. The non-negativity constraint is also integrated into the existing framework. Besides making safe screening applicable to a broader class of functions that includes beta-divergences (e.g., the Kullback-Leibler divergence), the proposed approach also improves upon the existing Gap Safe screening rules in previously applicable cases (e.g., logistic regression). The proposed general framework is illustrated on some notable particular cases: the logistic function and the beta = 1.5 and Kullback-Leibler divergences. Finally, we showcase the effectiveness of the proposed screening rules with different solvers (coordinate descent, multiplicative-update and proximal gradient algorithms) and different data sets (binary classification, hyperspectral and count data).
Published: 2021
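For orientation, the sketch below shows the standard Gap Safe rule for the Lasso. The Lasso is the globally strongly concave dual setting that the paper generalizes, and the sketch does not cover the paper's local-regularity extension or beta-divergences. For the primal `0.5*||y - X b||^2 + lam*||b||_1`, the dual is `0.5*||y||^2 - 0.5*lam^2*||theta - y/lam||^2` subject to `||X^T theta||_inf <= 1`. The dual is lam^2-strongly concave, so the duality gap bounds the distance to the dual optimum. Feature j can then be discarded when `|x_j^T theta| + radius*||x_j|| < 1`. The function name and arguments are illustrative.

```python
import numpy as np

def gap_safe_screen(X, y, beta, lam):
    """Gap Safe screening for the Lasso (sketch). Returns (mask of provably-zero coefs, gap)."""
    r = y - X @ beta
    corr = X.T @ r
    # Rescale the residual to obtain a dual-feasible point: ||X^T theta||_inf <= 1.
    theta = r / max(lam, np.max(np.abs(corr)))
    primal = 0.5 * r @ r + lam * np.sum(np.abs(beta))
    dual = 0.5 * y @ y - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
    gap = max(primal - dual, 0.0)           # guard against tiny negative rounding errors
    # Strong concavity (modulus lam^2) turns the gap into a safe ball around theta.
    radius = np.sqrt(2.0 * gap) / lam
    scores = np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0)
    return scores < 1.0, gap

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 100))
y = rng.standard_normal(50)
lam_max = np.max(np.abs(X.T @ y))
# Above lam_max the Lasso solution is 0, and the rule certifies every feature from beta = 0.
mask_all, gap0 = gap_safe_screen(X, y, np.zeros(100), 1.01 * lam_max)
mask_mid, gap_mid = gap_safe_screen(X, y, 0.01 * rng.standard_normal(100), 0.5 * lam_max)
```

In a solver, the rule would be re-evaluated every few iterations. As the gap shrinks, the radius shrinks too, and more coordinates are eliminated.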

87. Denoising modulo samples: k-NN regression and tightness of SDP relaxation

Author: Michaël Fanuel, Hemant Tyagi, Department of Electrical Engineering [KU Leuven] (KU-ESAT), Catholic University of Leuven - Katholieke Universiteit Leuven (KU Leuven), MOdel for Data Analysis and Learning (MODAL), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Paul Painlevé - UMR 8524 (LPP), Centre National de la Recherche Scientifique (CNRS)-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Université de Lille-Evaluation des technologies de santé et des pratiques médicales - ULR 2694 (METRICS), Université de Lille-Centre Hospitalier Régional Universitaire [Lille] (CHRU Lille)-Université de Lille-Centre Hospitalier Régional Universitaire [Lille] (CHRU Lille)-École polytechnique universitaire de Lille (Polytech Lille)-Université de Lille, Sciences et Technologies, Laboratoire Paul Painlevé (LPP), Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Université de Lille, Sciences et Technologies-Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Evaluation des technologies de santé et des pratiques médicales - ULR 2694 (METRICS), Université de Lille-Centre Hospitalier Régional Universitaire [Lille] (CHRU Lille)-Université de Lille-Centre Hospitalier Régional Universitaire [Lille] (CHRU Lille)-École polytechnique universitaire de Lille (Polytech Lille), Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Evaluation des technologies de santé et des pratiques médicales - ULR 2694 (METRICS), Laboratoire Paul Painlevé - UMR 8524 (LPP), and Centre National de la Recherche Scientifique (CNRS)-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Université de Lille-Université de Lille, Sciences et Technologies-Inria Lille - Nord Europe
Subjects: FOS: Computer and information sciences, Statistics and Probability, Modulo, Machine Learning (stat.ML), Mathematics - Statistics Theory, Statistics Theory (math.ST), 01 natural sciences, 030218 nuclear medicine & medical imaging, k-nearest neighbors algorithm, 010309 optics, Combinatorics, 03 medical and health sciences, 0302 clinical medicine, Statistics - Machine Learning, [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST], 0103 physical sciences, FOS: Mathematics, Mathematics - Optimization and Control, Mathematics, Numerical Analysis, Quadratically constrained quadratic program, Smoothness (probability theory), Applied Mathematics, Function (mathematics), Lipschitz continuity, Computational Theory and Mathematics, Optimization and Control (math.OC), Relaxation (approximation), Laplace operator, Analysis
Abstract: Many modern applications involve the acquisition of noisy modulo samples of a function $f$, with the goal being to recover estimates of the original samples of $f$. For a Lipschitz function $f:[0,1]^d \to \mathbb{R}$, suppose we are given the samples $y_i = (f(x_i) + \eta_i) \bmod 1$, $i=1,\dots,n$, where $\eta_i$ denotes noise. Assuming the $\eta_i$ are zero-mean i.i.d. Gaussians and the $x_i$ form a uniform grid, we derive a two-stage algorithm that recovers estimates of the samples $f(x_i)$ with a uniform error rate $O\left(\left(\frac{\log n}{n}\right)^{\frac{1}{d+2}}\right)$ holding with high probability. The first stage embeds the points on the unit complex circle and obtains denoised estimates of $f(x_i) \bmod 1$ via a $k$NN (nearest neighbor) estimator. The second stage applies a sequential unwrapping procedure to the denoised mod $1$ estimates from the first stage. The estimates of the samples $f(x_i)$ can subsequently be used to construct an estimate of the function $f$ with the same uniform error rate. Recently, Cucuringu and Tyagi proposed an alternative way of denoising modulo $1$ data that works with their representation on the unit complex circle. They formulated a smoothness-regularized least squares problem on the product manifold of unit circles, where smoothness is measured with respect to the Laplacian of a proximity graph $G$ involving the $x_i$'s. This is a nonconvex quadratically constrained quadratic program (QCQP), so they proposed solving its semidefinite program (SDP) relaxation. We derive sufficient conditions under which the SDP is a tight relaxation of the QCQP. Hence, under these conditions, the global solution of the QCQP can be obtained in polynomial time. Comment: (i) 38 pages, 6 figures (ii) Revised the manuscript after receiving reviews (iii) Removed Theorem 3(2) and Corollary 3 due to inaccuracies in the proof, main results are unchanged (iv) Added Appendix B (v) Added Section 2.5 for estimating f (Theorem 5)
Published: 2021
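The two-stage pipeline described above can be sketched in one dimension (d = 1, uniform grid). Stage 1 maps each mod-1 sample to the unit complex circle, averages the k nearest grid neighbours there, and reads the angle back. Stage 2 unwraps sequentially along the grid by choosing, at each step, the integer shift that makes the jump between consecutive estimates smallest. The choice d = 1, the window-based neighbour selection and the function names are simplifying assumptions. The sketch omits the paper's parameter choices and error analysis, and recovers f only up to a global integer shift.

```python
import numpy as np

def knn_denoise_mod1(y, k):
    """Stage 1 (sketch, d = 1): kNN averaging of mod-1 samples on the unit circle. Assumes k <= n."""
    n = len(y)
    z = np.exp(2j * np.pi * y)                  # embed each sample on the unit complex circle
    est = np.empty(n)
    for i in range(n):
        lo = min(max(i - k // 2, 0), n - k)     # k consecutive grid points around x_i
        avg = z[lo:lo + k].mean()
        est[i] = (np.angle(avg) / (2 * np.pi)) % 1.0   # map the averaged phase back to [0, 1)
    return est

def unwrap_mod1(g):
    """Stage 2 (sketch): sequential unwrapping along the grid."""
    f = np.empty_like(g)
    f[0] = g[0]
    for i in range(1, len(g)):
        step = g[i] - g[i - 1]
        step -= np.round(step)                  # integer shift giving the smallest jump
        f[i] = f[i - 1] + step
    return f

# Noise-free check: consecutive values of f differ by far less than 1/2, so unwrapping succeeds.
x = np.linspace(0.0, 1.0, 200)
f = 3.0 * np.sin(2 * np.pi * x)
y = np.mod(f, 1.0)
f_hat = unwrap_mod1(knn_denoise_mod1(y, k=1))
```

Averaging on the circle rather than on [0, 1) avoids the bias that a naive mean would suffer near the wrap-around point 0 ≡ 1. With noisy samples, a larger k trades variance for bias.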

88. Distributed stochastic optimization with large delays

Author: Nicholas Bambos, Yinyu Ye, Panayotis Mertikopoulos, Zhengyuan Zhou, Peter W. Glynn, New York University [New York] (NYU), NYU System (NYU), Performance analysis and optimization of LARge Infrastructures and Systems (POLARIS), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire d'Informatique de Grenoble (LIG), Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), Criteo AI Lab, Criteo [Paris], Stanford University, ANR-16-CE33-0004,ORACLESS,Stratégies adaptatives d'allocation des ressources dans les réseaux sans fil dynamiques(2016), ANR-19-P3IA-0003,MIAI,MIAI @ Grenoble Alpes(2019), ANR-11-LABX-0025,PERSYVAL-lab,Systemes et Algorithmes Pervasifs au confluent des mondes physique et numérique(2011), ANR-19-CE48-0018,ALIAS,Apprentissage adaptatif multi-agent(2019), and European Project: GAMENET
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, 0209 industrial biotechnology, Mathematical optimization, Optimization problem, Stochastic approximation, General Mathematics, 0211 other engineering and technologies, 02 engineering and technology, Management Science and Operations Research, Machine Learning (cs.LG), Distributed optimization, Primary 90C15, 90C26, secondary 90C25, 90C06, 020901 industrial engineering & automation, Stochastic gradient descent, Convergence (routing), FOS: Mathematics, Delays, Mathematics - Optimization and Control, Throughput (business), Mathematics, 021103 operations research, Computer Science Applications, Optimization and Control (math.OC), Stochastic optimization, Algorithm design, Node (circuits), [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC]
Abstract: One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent on distributed computing architectures, possibly asynchronously. However, a key obstacle to the efficient implementation of DASGD is the issue of delays: when a computing node contributes a gradient update, the global model parameter may have already been updated by other nodes several times over, rendering this gradient information stale. These delays can quickly add up if the computational throughput of a node is saturated, so the convergence of DASGD may be compromised in the presence of large delays. Our first contribution is to show that, by carefully tuning the algorithm's step size, convergence to the critical set is still achieved in mean square, even if the delays grow unbounded at a polynomial rate. We also establish finer results in a broad class of structured optimization problems (called variationally coherent), where we show that DASGD converges to a global optimum with probability $1$ under the same delay assumptions. Together, these results contribute to the broad landscape of large-scale non-convex stochastic optimization by offering state-of-the-art theoretical guarantees and providing insights for algorithm design. 41 pages, 8 figures; to be published in Mathematics of Operations Research
Published: 2021
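A toy simulation makes the "stale gradient" issue above tangible. At step t, the update uses a noisy gradient evaluated at the iterate from `delay(t)` steps earlier, with `delay(t) = floor(sqrt(t))` growing without bound. A decreasing step size keeps the iteration stable. The quadratic objective, the delay model, the step-size schedule `step0 / (t + 1)**0.75` and all names are hypothetical choices for illustration, not the tuning rules derived in the paper.

```python
import numpy as np

def delayed_sgd(x_star, n_steps=5000, step0=0.5, noise=0.1, seed=0):
    """SGD on f(x) = 0.5 * ||x - x_star||^2 with stale, noisy gradients (sketch)."""
    rng = np.random.default_rng(seed)
    x = np.zeros_like(x_star)
    history = [x.copy()]                        # past iterates, so stale ones can be read back
    for t in range(n_steps):
        delay = int(np.sqrt(t))                 # hypothetical delay growing polynomially in t
        stale = history[t - delay]              # parameter a slow worker read `delay` steps ago
        grad = (stale - x_star) + noise * rng.standard_normal(x.shape)
        step = step0 / (t + 1) ** 0.75          # assumed decreasing step-size schedule
        x = x - step * grad
        history.append(x.copy())
    return x

x_star = np.ones(5)
x_final = delayed_sgd(x_star)
```

Intuitively, stability requires the product of step size and delay to stay small. Here that product decays like t^(-1/4), while the step sizes remain non-summable, so the iterates still reach the optimum.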

89. Local and Global Uniform Convexity Conditions

Author: Kerdreux, Thomas, d'Aspremont, Alexandre, Pokutta, Sebastian, d'Aspremont, Alexandre, PaRis Artificial Intelligence Research InstitutE - - PRAIRIE2019 - ANR-19-P3IA-0001 - P3IA - VALID, Département d'informatique - ENS Paris (DI-ENS), Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Zuse Institute Berlin (ZIB), Technische Universität Berlin (TU), Laboratoire d'informatique de l'école normale supérieure (LIENS), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Centre National de la Recherche Scientifique (CNRS), Statistical Machine Learning and Parsimony (SIERRA), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria), Université Paris sciences et lettres (PSL), Research reported in this paper was partially supported through the Research Campus Modal funded by the German Federal Ministry of Education and Research (fund numbers 05M14ZAM,05M20ZBM) as well as the Deutsche Forschungsgemeinschaft (DFG) through the DFG Cluster of Excellence MATH+. 
AA would like to acknowledge support from the ML and Optimisation joint research initiative with the fonds AXA pour la recherche and Kamet Ventures, a Google focused award, as well as funding by the French government under management of Agence Nationale de la Recherche as part of the 'Investissements d’avenir' program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute)., ANR-19-P3IA-0001,PRAIRIE,PaRis Artificial Intelligence Research InstitutE(2019), Département d'informatique de l'École normale supérieure (DI-ENS), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris, École normale supérieure - Paris (ENS-PSL), Technical University of Berlin / Technische Universität Berlin (TU), and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-RO] Computer Science [cs]/Operations Research [cs.RO], Optimization and Control (math.OC), FOS: Mathematics, [INFO.INFO-RO]Computer Science [cs]/Operations Research [cs.RO], Mathematics - Optimization and Control, [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], Machine Learning (cs.LG)
Abstract: We review various characterizations of uniform convexity and smoothness on norm balls in finite-dimensional spaces and connect results stemming from the geometry of Banach spaces with scaling inequalities used in analysing the convergence of optimization methods. In particular, we establish local versions of these conditions to provide sharper insights into a recent body of complexity results in learning theory, online learning, or offline optimization, which rely on the strong convexity of the feasible set. While they have a significant impact on complexity, these strong convexity or uniform convexity properties of feasible sets are not exploited as thoroughly as their functional counterparts, and this work is an effort to correct this imbalance. We conclude with some practical examples in optimization and machine learning where leveraging these conditions and localized assumptions leads to new complexity results.
Published: 2021

90. The bi-objective multimodal car-sharing problem

Author: Jakob Puchinger, Sophie N. Parragh, Miriam Enzi, Austrian Institute of Technology [Vienna] (AIT), Johannes Kepler University Linz [Linz] (JKU), IRT SystemX (IRT SystemX), Laboratoire Génie Industriel (LGI), and CentraleSupélec-Université Paris-Saclay
Subjects: FOS: Computer and information sciences, Mathematical optimization, Schedule, Binary search algorithm, Computer Science - Artificial Intelligence, Computer science, 0211 other engineering and technologies, Time horizon, 02 engineering and technology, Management Science and Operations Research, 0502 economics and business, 11. Sustainability, FOS: Mathematics, Mathematics - Optimization and Control, 050210 logistics & transportation, Sequence, 021103 operations research, 05 social sciences, Rank (computer programming), Pareto principle, [INFO.INFO-RO]Computer Science [cs]/Operations Research [cs.RO], Constraint (information theory), Artificial Intelligence (cs.AI), Optimization and Control (math.OC), Business, Management and Accounting (miscellaneous), Routing (electronic design automation)
Abstract: The aim of the bi-objective multimodal car-sharing problem (BiO-MMCP) is to determine the optimal assignment of transport modes to trips and to schedule the routes of available cars and users, while minimizing cost and maximizing user satisfaction. We investigate the BiO-MMCP from a user-centred point of view. As user satisfaction is a crucial aspect of shared mobility systems, we consider user preferences in a second objective. Users may choose and rank their preferred modes of transport for different times of the day. In this way, we account for, e.g., different traffic conditions throughout the planning horizon. We study different variants of the problem. In the base problem, the sequence of tasks a user has to fulfil is fixed in advance, and travel times as well as preferences are constant over the planning horizon. In variant 2, time-dependent travel times and preferences are introduced. In variant 3, we examine the challenges of allowing additional routing decisions. Variant 4 integrates variants 2 and 3. For this last variant, we develop a branch-and-cut algorithm, which is embedded in two bi-objective frameworks, namely the $\epsilon$-constraint method and a weighting binary search method. Computational experiments show that the branch-and-cut algorithm outperforms the MIP formulation, and we discuss how solutions change along the Pareto frontier.
Published: 2021
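The ε-constraint method mentioned above can be illustrated on a toy problem. The method minimizes the first objective subject to an upper bound ε on the second, records the solution, and then tightens ε just below the second objective of that solution, repeating until infeasible. In the sketch, a brute-force search over an explicit list of (cost, dissatisfaction) pairs stands in for the branch-and-cut solve. Both objectives are minimized, so satisfaction is assumed to be recast as dissatisfaction. Integer objectives with a step `delta = 1` are also an assumption.

```python
def epsilon_constraint(solutions, delta=1):
    """Enumerate the Pareto front of (f1, f2) pairs, both minimized (brute-force stand-in for a MIP)."""
    front = []
    eps = float("inf")
    while True:
        feasible = [s for s in solutions if s[1] <= eps]    # constraint f2 <= eps
        if not feasible:
            return front
        # Lexicographic tie-break on (f1, f2) keeps the returned point non-dominated.
        best = min(feasible, key=lambda s: (s[0], s[1]))
        front.append(best)
        eps = best[1] - delta                               # tighten the bound on f2

# Hypothetical (cost, dissatisfaction) values of candidate plans.
candidates = [(1, 9), (2, 7), (3, 8), (4, 4), (6, 3), (7, 5), (9, 1)]
pareto = epsilon_constraint(candidates)
```

Here `pareto` is `[(1, 9), (2, 7), (4, 4), (6, 3), (9, 1)]`. The dominated plans (3, 8) and (7, 5) are never returned, because a cheaper plan always satisfies the same ε bound.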

91. Single-step deep reinforcement learning for open-loop control of laminar and turbulent flows

Author: Aurélien Larcher, Hassan Ghraieb, Elie Hachem, P. Meliga, Jonathan Viquerat, Meliga, Philippe, Centre de Mise en Forme des Matériaux (CEMEF), MINES ParisTech - École nationale supérieure des mines de Paris, and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Deep Reinforcement Learning, Neural Networks, Computational Mechanics, [SPI.MECA.MEFL] Engineering Sciences [physics]/Mechanics [physics.med-ph]/Fluids mechanics [physics.class-ph], Computational fluid dynamics, 01 natural sciences, Machine Learning (cs.LG), [SPI.MECA.MEFL]Engineering Sciences [physics]/Mechanics [physics.med-ph]/Fluids mechanics [physics.class-ph], 010305 fluids & plasmas, Cylinder (engine), law.invention, Physics::Fluid Dynamics, [SPI]Engineering Sciences [physics], law, 0103 physical sciences, FOS: Mathematics, Reinforcement learning, Fluidics, 010306 general physics, Mathematics - Optimization and Control, [PHYS]Physics [physics], Fluid Flow and Transfer Processes, Physics, Artificial neural network, Turbulence, Adjoint method, Proximal Policy Optimization, Open-loop controller, Laminar flow, [PHYS.MECA]Physics [physics]/Mechanics [physics], Mechanics, Optimization and Control (math.OC), Drag, Modeling and Simulation, Open-loop flow control
Abstract: This research gauges the ability of deep reinforcement learning (DRL) techniques to assist the optimization and control of fluid mechanical systems. It relies on introducing single-step proximal policy optimization (PPO), a “degenerate” version of the PPO algorithm intended for situations where the optimal policy to be learnt by a neural network does not depend on state, as is notably the case in open-loop control problems. The numerical reward fed to the neural network is computed with an in-house stabilized finite element environment implementing the variational multiscale method. Several prototypical separated flows in two dimensions are used as testbeds. The method is first applied to two relatively simple optimization test cases (maximizing the mean lift of a NACA 0012 airfoil and the fluctuating lift of two side-by-side circular cylinders, both in laminar regimes) to assess convergence and accuracy by comparison with in-house direct numerical simulation (DNS) data. The potential of single-step PPO for reliable black-box optimization of computational fluid dynamics systems is then showcased by tackling several open-loop control problems with parameter spaces large enough to rule out DNS. The approach proves effective at mapping the best positions for placing a small control cylinder in an attempt to reduce drag in laminar and turbulent cylinder flows. All results are consistent with in-house data obtained by the adjoint method, and the drag of a square cylinder at Reynolds numbers of a few thousand is reduced by 30%, which agrees well with reference experimental data available in the literature. The method also successfully reduces the drag of the fluidic pinball, an equilateral triangle arrangement of rotating cylinders immersed in a turbulent stream. Consistent with reference machine learning results from the literature, drag is reduced by almost 60% using a so-called boat-tailing actuation made up of a slowly rotating front cylinder and two downstream cylinders rotating in opposite directions so as to reduce the gap flow between them.
Published: 2021
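
The abstract describes single-step PPO only in words. The following is a rough illustration of the same idea on a toy problem, not the authors' implementation. The policy is a state-independent Gaussian over one scalar action. Every "episode" is a single action followed by a single reward, and the policy is updated with PPO's clipped surrogate objective. The following are assumptions: the function name single_step_ppo, the toy reward, all hyperparameter values, and the Fisher-preconditioned (natural-gradient) step. That step stands in for the neural network and optimizer used in the paper, where the reward comes from a CFD solver.

```python
import math
import random


def single_step_ppo(reward, mu=0.0, log_sigma=0.0, iters=200, batch=32,
                    epochs=10, lr=0.1, clip=0.2, seed=0):
    """Maximise a black-box reward(a) over a scalar action a (hypothetical helper).

    The policy is a state-independent Gaussian N(mu, sigma^2): there is no
    observation, so each episode is one action followed by one reward.
    """
    rng = random.Random(seed)
    for _ in range(iters):
        mu_old, sigma_old = mu, math.exp(log_sigma)
        actions = [rng.gauss(mu_old, sigma_old) for _ in range(batch)]
        rewards = [reward(a) for a in actions]
        # Advantage: reward normalised by the batch mean and standard deviation.
        mean_r = sum(rewards) / batch
        std_r = math.sqrt(sum((r - mean_r) ** 2 for r in rewards) / batch) + 1e-8
        adv = [(r - mean_r) / std_r for r in rewards]
        # Log-density under the policy that generated the batch (constant dropped).
        logp_old = [-0.5 * ((a - mu_old) / sigma_old) ** 2 - math.log(sigma_old)
                    for a in actions]
        for _ in range(epochs):
            sigma = math.exp(log_sigma)
            g_mu = g_ls = 0.0
            for a, A, lp_old in zip(actions, adv, logp_old):
                z = (a - mu) / sigma
                ratio = math.exp(-0.5 * z * z - log_sigma - lp_old)
                # d/dtheta of min(r*A, clip(r)*A) is r*A*dlog(pi)/dtheta while the
                # unclipped term is the active one, and 0 otherwise.
                if (A > 0 and ratio < 1 + clip) or (A < 0 and ratio > 1 - clip):
                    g_mu += ratio * A * z / sigma       # dlog(pi)/dmu
                    g_ls += ratio * A * (z * z - 1.0)   # dlog(pi)/dlog(sigma)
            # Ascent step preconditioned by the inverse Fisher information of
            # the Gaussian (sigma^2 for mu, 1/2 for log sigma).
            mu += lr * sigma ** 2 * g_mu / batch
            log_sigma = max(log_sigma + lr * 0.5 * g_ls / batch, -10.0)
    return mu, math.exp(log_sigma)
```

For example, `single_step_ppo(lambda a: -(a - 2.0) ** 2)` should move the policy mean toward the maximiser a = 2 while sigma shrinks. In the paper's setting, the action would instead be a vector of control parameters, such as a control-cylinder position.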

92. Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD

Author: Bardenet, Remi, Ghosh, Subhro, Lin, Meixia, Centre National de la Recherche Scientifique (CNRS), Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), and National University of Singapore (NUS)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Probability (math.PR), FOS: Physical sciences, Machine Learning (stat.ML), Disordered Systems and Neural Networks (cond-mat.dis-nn), Condensed Matter - Disordered Systems and Neural Networks, Machine Learning (cs.LG), [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Optimization and Control (math.OC), Statistics - Machine Learning, [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST], FOS: Mathematics, Mathematics - Optimization and Control, Mathematics - Probability, ComputingMilieux_MISCELLANEOUS
Abstract: Stochastic gradient descent (SGD) is a cornerstone of machine learning. When the number N of data items is large, SGD relies on constructing an unbiased estimator of the gradient of the empirical risk using a small subset of the original dataset, called a minibatch. Default minibatch construction involves uniformly sampling a subset of the desired size, but alternatives have been explored for variance reduction. In particular, experimental evidence suggests drawing minibatches from determinantal point processes (DPPs), distributions over minibatches that favour diversity among selected items. However, as in recent work on DPPs for coresets, providing a systematic and principled understanding of how and why DPPs help has been difficult. In this work, we contribute an orthogonal polynomial-based DPP paradigm for minibatch sampling in SGD. Our approach leverages the specific data distribution at hand, which endows it with greater sensitivity and power than existing data-agnostic methods. We substantiate our method via a detailed theoretical analysis of its convergence properties, interweaving the discrete data set and the underlying continuous domain. In particular, we show how specific DPPs and a string of controlled approximations can lead to gradient estimators whose variance decays faster with the batch size than under uniform sampling. Coupled with existing finite-time guarantees for SGD on convex objectives, this entails that DPP minibatches lead to a smaller bound on the mean square approximation error than uniform minibatches. Moreover, our estimators are amenable to a recent algorithm that directly samples linear statistics of DPPs (i.e., the gradient estimator) without sampling the underlying DPP (i.e., the minibatch), thereby reducing computational overhead. We provide detailed synthetic as well as real data experiments to substantiate our theoretical claims. Comment: Accepted at NeurIPS 2021 (Spotlight Paper). Authors are listed in alphabetical order.
Published: 2021
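
To make the DPP minibatch idea concrete, here is a simplified one-dimensional, discrete illustration. It is not the paper's construction, which uses multivariate orthogonal polynomials tied to a reference measure and a density-dependent estimator. The sketch rests on three assumptions. First, the kernel is the projection onto the span of the first k discrete orthogonal polynomials evaluated at the data points. Second, the DPP is sampled with the standard sequential (chain-rule) algorithm for projection DPPs. Third, the average is estimated with inverse-inclusion-probability weights. That estimator is unbiased because item i belongs to a DPP sample with probability K_ii. The names poly_features, sample_projection_dpp and dpp_mean_estimate are hypothetical.

```python
import math
import random


def poly_features(xs, k):
    """Rows phi_i in R^k: the first k monomials at x_i, orthonormalised by
    modified Gram-Schmidt so the N x k matrix has orthonormal columns."""
    cols = []
    for j in range(k):
        v = [x ** j for x in xs]
        for c in cols:
            dot = sum(p * q for p, q in zip(v, c))
            v = [p - dot * q for p, q in zip(v, c)]
        norm = math.sqrt(sum(p * p for p in v))
        cols.append([p / norm for p in v])
    return [[c[i] for c in cols] for i in range(len(xs))]


def sample_projection_dpp(rows, rng):
    """Draw k distinct items from the projection DPP with K_ij = <phi_i, phi_j>.
    Each step picks i with probability proportional to the squared norm of
    phi_i projected away from the span of the items chosen so far."""
    k = len(rows[0])
    res = [r[:] for r in rows]
    sample = []
    for _ in range(k):
        weights = [sum(p * p for p in r) for r in res]
        for j in sample:
            weights[j] = 0.0  # guard against round-off re-selection
        i = rng.choices(range(len(res)), weights=weights)[0]
        sample.append(i)
        norm = math.sqrt(weights[i])
        b = [p / norm for p in res[i]]
        # Remove the component along b from every residual row.
        new_res = []
        for r in res:
            d = sum(p * q for p, q in zip(r, b))
            new_res.append([p - d * q for p, q in zip(r, b)])
        res = new_res
    return sample


def dpp_mean_estimate(values, rows, rng):
    """Unbiased estimate of mean(values): P(i in S) = K_ii, so weight by 1/(N K_ii)."""
    n = len(values)
    s = sample_projection_dpp(rows, rng)
    return sum(values[i] / (n * sum(p * p for p in rows[i])) for i in s)
```

In SGD, `values` would be per-example gradients (one coordinate at a time here), and the minibatch size equals k. The sketch does not show the paper's shortcut of sampling the linear statistic directly.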

93. On the Theory of Implicit Deep Learning: Global Convergence with Implicit Layers

Author: Kawaguchi, Kenji, and Harvard University [Cambridge]
Subjects: FOS: Computer and information sciences, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, [INFO.INFO-NE] Computer Science [cs]/Neural and Evolutionary Computing [cs.NE], [INFO.INFO-CV] Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], Machine Learning (stat.ML), [MATH.MATH-OC] Mathematics [math]/Optimization and Control [math.OC], [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Statistics - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Mathematics - Optimization and Control
Abstract: A deep equilibrium model uses implicit layers, which are implicitly defined through an equilibrium point of an infinite sequence of computation. It avoids any explicit computation of the infinite sequence by finding an equilibrium point directly via root-finding and by computing gradients via implicit differentiation. In this paper, we analyze the gradient dynamics of deep equilibrium models with nonlinearity only on weight matrices and non-convex objective functions of weights for regression and classification. Despite non-convexity, convergence to the global optimum at a linear rate is guaranteed without any assumption on the width of the models, allowing the width to be smaller than the output dimension and the number of data points. Moreover, we prove a relation between the gradient dynamics of the deep implicit layer and the dynamics of the trust region Newton method for a shallow explicit layer. This mathematically proven relation, along with our numerical observations, suggests the importance of understanding the implicit bias of implicit layers and points to an open problem on the topic. Our proofs deal with implicit layers, weight tying and nonlinearity on weights, and differ from those in the related literature. Comment: ICLR 2021. Selected for ICLR Spotlight (top 6% of submissions)
Published: 2021
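
The two mechanics named at the start of the abstract, solving for the equilibrium directly and differentiating through it implicitly, can be seen on a scalar toy layer. The layer z = tanh(w z + u x) is an assumption chosen for brevity; the paper's model instead places the nonlinearity on the weights. Plain fixed-point iteration stands in for a root-finder. The helper names are hypothetical.

```python
import math


def solve_fixed_point(w, u, x, tol=1e-14, max_iter=10_000):
    """Equilibrium z* of z = tanh(w*z + u*x), found by fixed-point iteration.
    Assumes |w| < 1, so the map is a contraction."""
    z = 0.0
    for _ in range(max_iter):
        z_next = math.tanh(w * z + u * x)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def implicit_grad_w(w, u, x):
    """dz*/dw by implicit differentiation of z* = tanh(w z* + u x):
    dz = s (z + w dz) with s = 1 - z*^2, hence dz/dw = s z / (1 - w s).
    No iteration is unrolled; only the equilibrium itself is needed."""
    z = solve_fixed_point(w, u, x)
    s = 1.0 - z * z  # tanh' at the equilibrium pre-activation
    return s * z / (1.0 - w * s)
```

A central finite difference of `solve_fixed_point` in `w` agrees with `implicit_grad_w`. The memory cost of the implicit gradient does not grow with the number of solver iterations.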

94. TSSOS: a Julia library to exploit sparsity for large-scale polynomial optimization

Author: Magron, Victor, Wang, Jie, Artificial and Natural Intelligence Toulouse Institute - - ANITI2019 - ANR-19-P3IA-0004 - P3IA - VALID, TREMPLIN-ERC - Optimisation garantie pour la vérification des systèmes cyber-physiques - - COPS2018 - ANR-18-ERC2-0004 - TERC - VALID, Polynomial Optimization, Efficiency through Moments and Algebra - POEMA - - H20202019-01-01 - 2022-12-31 - 813211 - VALID, Institut de Mathématiques de Toulouse UMR5219 (IMT), Institut National des Sciences Appliquées - Toulouse (INSA Toulouse), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS), Équipe Méthodes et Algorithmes en Commande (LAAS-MAC), Laboratoire d'analyse et d'architecture des systèmes (LAAS), Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Institut National des Sciences Appliquées - Toulouse (INSA Toulouse), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, ANR-19-P3IA-0004,ANITI,Artificial and Natural Intelligence Toulouse Institute(2019), ANR-18-ERC2-0004,COPS,Optimisation garantie pour la vérification des systèmes cyber-physiques(2018), European Project: 813211,H2020,POEMA(2019), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Institut National des Sciences Appliquées - Toulouse (INSA Toulouse), Institut National des Sciences Appliquées (INSA)-Université de Toulouse (UT)-Institut National des Sciences Appliquées (INSA)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT), European Project: 813211,H2020-EU.1.3. - EXCELLENT SCIENCE - Marie Skłodowska-Curie Actions (Main Programme), and H2020-EU.1.3.1. - Fostering new skills by means of excellent initial training of researchers ,10.3030/813211,POEMA(2019)
Subjects: FOS: Computer and information sciences, Optimization and Control (math.OC), ComputingMethodologies_SYMBOLICANDALGEBRAICMANIPULATION, FOS: Mathematics, Computer Science - Mathematical Software, [MATH.MATH-OC] Mathematics [math]/Optimization and Control [math.OC], Mathematical Software (cs.MS), Mathematics - Optimization and Control
Abstract: The Julia library TSSOS aims to help polynomial optimizers solve large-scale problems with sparse input data. The underlying algorithmic framework exploits correlative and term sparsity to obtain a new moment-SOS hierarchy involving potentially much smaller positive semidefinite matrices. TSSOS can be applied to numerous problems, ranging from power networks to eigenvalue and trace optimization of noncommutative polynomials, involving up to tens of thousands of variables and constraints. Comment: 10 pages, 2 figures, 2 tables
Published: 2021
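
TSSOS itself is a Julia library. The following Python sketch is not TSSOS code and does not mirror its API. It illustrates only the first ingredient named in the abstract, correlative sparsity. Variables that appear together in a monomial of the objective are linked in a graph. A chordal extension of that graph is built by greedy minimum-degree elimination, which is one common heuristic and is assumed here. The maximal cliques of the extension index the smaller moment/SOS blocks. Constraints and term sparsity are not shown. The function name correlative_cliques is hypothetical.

```python
def correlative_cliques(monomials, n_vars):
    """monomials: exponent tuples of length n_vars (one per term of the polynomial).
    Returns the maximal cliques, as sorted tuples of variable indices, of a
    chordal extension of the correlative sparsity graph."""
    adj = {v: set() for v in range(n_vars)}
    for mono in monomials:
        support = [v for v, e in enumerate(mono) if e > 0]
        for a in support:
            for b in support:
                if a != b:
                    adj[a].add(b)
    cliques = []
    remaining = set(range(n_vars))
    while remaining:
        # Eliminate a vertex of minimum current degree (ties broken by index).
        v = min(remaining, key=lambda u: (len(adj[u]), u))
        nbrs = adj[v]
        cliques.append(frozenset(nbrs | {v}))
        # Fill-in edges: the neighbours of v become pairwise adjacent.
        for a in nbrs:
            adj[a] |= nbrs - {a}
            adj[a].discard(v)
        remaining.remove(v)
        del adj[v]
    maximal = {c for c in cliques if not any(c < d for d in cliques)}
    return sorted(tuple(sorted(c)) for c in maximal)
```

For the chain x0*x1 + x1*x2 + x2*x3, this yields the cliques (0, 1), (1, 2) and (2, 3). A dense relaxation would use one block over all four variables, while here three blocks of two variables each suffice.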

95. Finding Global Minima via Kernel Approximations

Author: Rudi, Alessandro, Marteau-Ferey, Ulysse, Bach, Francis, Université Paris sciences et lettres (PSL), Département d'informatique de l'École normale supérieure (DI-ENS), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Statistical Machine Learning and Parsimony (SIERRA), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria), This work was funded in part by the French government under management of Agence Nationale de la Recherche as part of the 'Investissements d’avenir' program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute). 
We also acknowledge support from the European Research Council (grant SEQUOIA 724063)., ANR-19-P3IA-0001,PRAIRIE,PaRis Artificial Intelligence Research InstitutE(2019), European Project: 724063,ERC-2016-COG,SEQUOIA(2017), Département d'informatique - ENS Paris (DI-ENS), Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris, École normale supérieure - Paris (ENS-PSL), and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Polynomial optimization, Machine Learning (stat.ML), Machine Learning (cs.LG), [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Optimization and Control (math.OC), Statistics - Machine Learning, FOS: Mathematics, Semidefinite programming, Global optimization, Mathematics - Optimization and Control, Sum of squares
Abstract: We consider the global minimization of smooth functions based solely on function evaluations. Algorithms that achieve the optimal number of function evaluations for a given precision level typically rely on explicitly constructing an approximation of the function, which is then minimized with algorithms that have exponential running-time complexity. In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum. This is done by using infinite sums of squares of smooth functions and has strong links with polynomial sum-of-squares hierarchies. Leveraging recent representation properties of reproducing kernel Hilbert spaces, the infinite-dimensional optimization problem can be solved by subsampling in time polynomial in the number of function evaluations, and with theoretical guarantees on the obtained minimum. Given $n$ samples, the computational cost is $O(n^{3.5})$ in time and $O(n^2)$ in space, and we achieve a convergence rate to the global optimum of $O(n^{-m/d + 1/2 + 3/d})$, where $m$ is the degree of differentiability of the function and $d$ the number of dimensions. The rate is nearly optimal in the case of Sobolev functions and, more generally, makes the proposed method particularly suitable for functions that have a large number of derivatives. Indeed, when $m$ is of the order of $d$, the convergence rate to the global optimum does not suffer from the curse of dimensionality, which affects only the worst-case constants (which we track explicitly throughout the paper).
Published: 2020
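
Plugging two values of m into the exponent of the stated rate makes the last claim concrete. The choices m = 2d and m = 2 are illustrative assumptions; only the arithmetic is shown:

$$
-\frac{m}{d} + \frac{1}{2} + \frac{3}{d}
\;\;\xrightarrow{\;m = 2d\;}\;\;
-\frac{3}{2} + \frac{3}{d},
\qquad\qquad
-\frac{m}{d} + \frac{1}{2} + \frac{3}{d}
\;\;\xrightarrow{\;m = 2\;}\;\;
\frac{1}{2} + \frac{1}{d}.
$$

With m = 2d, the bound O(n^{-3/2 + 3/d}) decays at least like n^{-1/2} for every d ≥ 3, and the exponent approaches -3/2 as d grows. With the smoothness fixed at m = 2, the exponent is positive, so the bound gives no decay in high dimension.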

96. Convergence of Online Adaptive and Recurrent Optimization Algorithms

Author: Massé, Pierre-Yves, Ollivier, Yann, Czech Technical University in Prague (CTU), Facebook AI Research [Paris] (FAIR), and Facebook
Subjects: FOS: Computer and information sciences, [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Optimization and Control (math.OC), Statistics - Machine Learning, [MATH.MATH-DS]Mathematics [math]/Dynamical Systems [math.DS], FOS: Mathematics, Machine Learning (stat.ML), Dynamical Systems (math.DS), [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Mathematics - Dynamical Systems, Mathematics - Optimization and Control
Abstract: We prove local convergence of several notable gradient descent algorithms used in machine learning, for which standard stochastic gradient descent theory does not apply directly. This includes, first, online algorithms for recurrent models and dynamical systems, such as Real-time recurrent learning (RTRL) and its computationally lighter approximations NoBackTrack and UORO; second, several adaptive algorithms such as RMSProp, online natural gradient, and Adam with $\beta^2\to 1$. Despite local convergence being a relatively weak requirement for a new optimization algorithm, no local analysis was available for these algorithms, as far as we know. Analysis of these algorithms does not immediately follow from standard stochastic gradient descent (SGD) theory. In fact, Adam has been proved to lack local convergence in some simple situations [j.2018on]. For recurrent models, online algorithms modify the parameter while the model is running, which further complicates the analysis with respect to simple SGD. Local convergence for these various algorithms results from a single, more general set of assumptions, in the setup of learning dynamical systems online. Thus, these results can cover other variants of the algorithms considered. We adopt an "ergodic" rather than probabilistic viewpoint, working with empirical time averages instead of probability distributions. This is more data-agnostic and creates differences with respect to standard SGD theory, especially for the range of possible learning rates. For instance, with cycling or per-epoch reshuffling over a finite dataset instead of pure i.i.d. sampling with replacement, empirical averages of gradients converge at rate $1/T$ instead of $1/\sqrt{T}$ (cycling acts as a variance reduction method), theoretically allowing for larger learning rates than in SGD.
Published: 2020
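
The closing claim about cycling can be checked numerically on plain numbers standing in for gradients. After T cyclic steps over N items, only the last, incomplete pass differs from the exact mean. The error of the running average is therefore at most N (max - min) / T, whereas i.i.d. sampling fluctuates at order 1/sqrt(T). The data, sizes and function names below are illustrative assumptions.

```python
import random


def average_error(data, index_at, T):
    """Absolute gap between the running average of T picks and the data mean.
    index_at(t) returns which item is used at step t."""
    mean = sum(data) / len(data)
    total = sum(data[index_at(t)] for t in range(T))
    return abs(total / T - mean)


def cycling_error_bound(data, T):
    """After T cyclic steps only an incomplete final pass of fewer than N items
    deviates from the mean, so the error is at most N * (max - min) / T."""
    return len(data) * (max(data) - min(data)) / T


rng = random.Random(0)
data = [rng.random() for _ in range(50)]
n = len(data)
for T in (1_000, 10_000, 100_000):
    cyc = average_error(data, lambda t: t % n, T)
    iid = average_error(data, lambda t: rng.randrange(n), T)
    print(T, cyc, cycling_error_bound(data, T), iid)
```

The cycling error vanishes up to rounding whenever T is a multiple of N and otherwise stays under the 1/T bound. The i.i.d. column should shrink only roughly like 1/sqrt(T).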

97. A Shooting Formulation of Deep Learning

Author: Vialard, F.-X., Roland Kwitt, Wei, S., Niethammer, M., Laboratoire d'Informatique Gaspard-Monge (LIGM), Université Paris-Est Marne-la-Vallée (UPEM)-École des Ponts ParisTech (ENPC)-ESIEE Paris-Fédération de Recherche Bézout-Centre National de la Recherche Scientifique (CNRS), University of Salzburg, Department of Mathematics and Statistics [Melbourne], University of Melbourne, Department of Computer Science [Chapel Hill], University of North Carolina [Chapel Hill] (UNC), University of North Carolina System (UNC)-University of North Carolina System (UNC), Laboratoire d'Informatique Gaspard-Monge (ligm), and Centre National de la Recherche Scientifique (CNRS)-Fédération de Recherche Bézout-ESIEE Paris-École des Ponts ParisTech (ENPC)-Université Paris-Est Marne-la-Vallée (UPEM)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Computer Science - Neural and Evolutionary Computing, Neural and Evolutionary Computing (cs.NE), [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], [INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE], Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: Continuous-depth neural networks can be viewed as deep limits of discrete neural networks whose dynamics resemble a discretization of an ordinary differential equation (ODE). Although important steps have been taken to realize the advantages of such continuous formulations, most current techniques are not truly continuous-depth, as they assume identical layers. Indeed, existing works throw into relief the myriad difficulties presented by an infinite-dimensional parameter space in learning a continuous-depth neural ODE. To address this, we introduce a shooting formulation that shifts the perspective from parameterizing a network layer by layer to parameterizing over optimal networks described only by a set of initial conditions. For scalability, we propose a novel particle-ensemble parametrization which fully specifies the optimal weight trajectory of the continuous-depth neural network. Our experiments show that our particle-ensemble shooting formulation can achieve competitive performance, especially on long-range forecasting tasks. Finally, though the current work is inspired by continuous-depth neural networks, the particle-ensemble shooting formulation also applies to discrete-time networks and may lead to a new fertile area of research in deep learning parametrization.
Published: 2020

98. Faster Wasserstein Distance Estimation with the Sinkhorn Divergence

Author: Chizat, Lenaic, Roussillon, Pierre, Léger, Flavien, Vialard, François-Xavier, Peyré, Gabriel, Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Département de Mathématiques et Applications - ENS Paris (DMA), École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS), Université Gustave Eiffel, ANR-19-P3IA-0001,PRAIRIE,PaRis Artificial Intelligence Research InstitutE(2019), European Project: H2020 724175, NORIA, Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11), École normale supérieure - Paris (ENS Paris), and Université Gustave Eiffel (UNIV GUSTAVE EIFFEL)
Subjects: FOS: Computer and information sciences, Entropic regularization, Mathematics - Statistics Theory, Machine Learning (stat.ML), Statistics Theory (math.ST), Empirical distribution, [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Statistics - Machine Learning, Optimization and Control (math.OC), [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST], FOS: Mathematics, Wasserstein distance, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Mathematics - Optimization and Control, Estimation
Abstract: The squared Wasserstein distance is a natural quantity to compare probability distributions in a non-parametric setting. This quantity is usually estimated with the plug-in estimator, defined via a discrete optimal transport problem which can be solved to $\epsilon$-accuracy by adding an entropic regularization of order $\epsilon$ and using, for instance, Sinkhorn's algorithm. In this work, we propose instead to estimate it with the Sinkhorn divergence, which is also built on entropic regularization but includes debiasing terms. We show that, for smooth densities, this estimator has a comparable sample complexity but allows higher regularization levels, of order $\epsilon^{1/2}$, which leads to improved computational complexity bounds and a strong speedup in practice. Our theoretical analysis covers both randomly sampled densities and deterministic discretizations on uniform grids. We also propose and analyze an estimator based on Richardson extrapolation of the Sinkhorn divergence which enjoys improved statistical and computational efficiency guarantees, under a condition on the regularity of the approximation error, which is in particular satisfied for Gaussian densities. Finally, we demonstrate the efficiency of the proposed estimators with numerical experiments.
Published: 2020
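
The debiasing terms can be illustrated with a small log-domain Sinkhorn sketch. It uses uniform one-dimensional point clouds, the quadratic cost and the regularization eps * KL(pi | a x b), and returns OT_eps = <a, f> + <b, g> at convergence. The Sinkhorn divergence is then OT_eps(a, b) - OT_eps(a, a)/2 - OT_eps(b, b)/2. The point clouds, eps, the iteration count and the function names are assumptions for illustration; this is not the paper's code, and Richardson extrapolation is not shown. The test case uses a fact that follows from the quadratic cost. For a translation by t, the cross terms are separable and integrate to zero under any coupling. As a result, the divergence equals t^2 exactly for every eps, while the plug-in value is biased upward by OT_eps(a, a).

```python
import math


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def entropic_ot(x, y, eps, iters=1000):
    """OT_eps between uniform 1-D point clouds x, y with cost |x - y|^2,
    regularised by eps * KL(pi | a (x) b), via log-domain Sinkhorn updates."""
    n, m = len(x), len(y)
    log_a, log_b = -math.log(n), -math.log(m)
    C = [[(xi - yj) ** 2 for yj in y] for xi in x]
    f, g = [0.0] * n, [0.0] * m
    for _ in range(iters):
        f = [-eps * logsumexp([log_b + (g[j] - C[i][j]) / eps for j in range(m)])
             for i in range(n)]
        g = [-eps * logsumexp([log_a + (f[i] - C[i][j]) / eps for i in range(n)])
             for j in range(m)]
    # At convergence the dual objective reduces to <a, f> + <b, g>.
    return sum(f) / n + sum(g) / m


def sinkhorn_divergence(x, y, eps):
    """Debiased estimator: OT_eps(a, b) - OT_eps(a, a)/2 - OT_eps(b, b)/2."""
    return (entropic_ot(x, y, eps)
            - 0.5 * entropic_ot(x, x, eps) - 0.5 * entropic_ot(y, y, eps))
```

With x = [i / 9 for i in range(10)], y = [xi + 0.5 for xi in x] and eps = 0.5, `sinkhorn_divergence` recovers the squared Wasserstein distance 0.25. In contrast, `entropic_ot(x, y, 0.5)` overshoots it.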

99. Throughput-Optimal Topology Design for Cross-Silo Federated Learning

Author: Othmane MARFOQ, Chuan Xu, Giovanni Neglia, Richard Vidal, Network Engineering and Operations (NEO ), Inria Sophia Antipolis - Méditerranée (CRISAM), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Accenture Labs [Sophia Antipolis], Institut National de Recherche en Informatique et en Automatique (Inria), and Accenture
Subjects: Networking and Internet Architecture (cs.NI), FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG), Computer Science - Networking and Internet Architecture, [INFO.INFO-NI]Computer Science [cs]/Networking and Internet Architecture [cs.NI], Computer Science - Distributed, Parallel, and Cluster Computing, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Optimization and Control (math.OC), FOS: Mathematics, Distributed, Parallel, and Cluster Computing (cs.DC), [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], Mathematics - Optimization and Control
Abstract: Federated learning usually employs a client-server architecture where an orchestrator iteratively aggregates model updates from remote clients and pushes a refined model back to them. This approach may be inefficient in cross-silo settings, as close-by data silos with high-speed access links may exchange information faster than with the orchestrator, and the orchestrator may become a communication bottleneck. In this paper we define the problem of topology design for cross-silo federated learning using the theory of max-plus linear systems to compute the system throughput, i.e., the number of communication rounds per time unit. We also propose practical algorithms that, given measurable network characteristics, find a topology with the largest throughput or with provable throughput guarantees. In realistic Internet networks with 10 Gbps access links for silos, our algorithms speed up training by factors of 9 and 1.5 compared with the master-slave architecture and the state-of-the-art MATCHA, respectively. Speedups are even larger with slower access links. Comment: 41 pages, NeurIPS 2020
Published: 2020
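
In max-plus terms, let x_v(k) be the time at which silo v starts round k, and let an arc u -> v carry a delay w. Then x_v(k+1) = max_u (x_u(k) + w). For a strongly connected overlay, standard max-plus theory gives an asymptotic cycle time equal to the maximum cycle mean of the delay graph, and the throughput is its reciprocal. The sketch computes that quantity with Karp's algorithm. The following are assumptions: what a delay contains (for instance, local computation plus link latency), the delay values in the usage note, and the function name. This is not the paper's topology-design algorithm, which also searches over topologies.

```python
import math


def max_cycle_mean(n, edges):
    """Karp's algorithm for the maximum mean weight of a cycle in a strongly
    connected digraph on nodes 0..n-1. edges: (u, v, w) = delay w on u -> v."""
    NEG = -math.inf
    # D[k][v] = largest total delay of a walk with exactly k arcs from node 0 to v.
    D = [[NEG] * n for _ in range(n + 1)]
    D[0][0] = 0.0
    for k in range(1, n + 1):
        for u, v, w in edges:
            if D[k - 1][u] > NEG:
                D[k][v] = max(D[k][v], D[k - 1][u] + w)
    best = NEG
    for v in range(n):
        if D[n][v] == NEG:
            continue
        worst = min((D[n][v] - D[k][v]) / (n - k)
                    for k in range(n) if D[k][v] > NEG)
        best = max(best, worst)
    return best
```

With the made-up delays [(0, 1, 3.0), (1, 0, 1.0), (0, 0, 1.5)], the two-node cycle has mean delay 2, the self-loop has mean 1.5, and the function returns 2.0. That gives a throughput of 0.5 rounds per time unit.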

100. Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Author: Berthier, Raphaël, Bach, Francis, Gaillard, Pierre, Université Paris sciences et lettres (PSL), Statistical Machine Learning and Parsimony (SIERRA), Département d'informatique - ENS Paris (DI-ENS), Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria), Apprentissage de modèles à partir de données massives (Thoth), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), This work was funded in part by the French government under management of Agence Nationale de la Recherche as part of the 'Investissements d’avenir' program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute). 
We acknowledge support from the European Research Council (grant SEQUOIA724063) and from the DGA., ANR-19-P3IA-0001,PRAIRIE,PaRis Artificial Intelligence Research InstitutE(2019), European Project: 724063,ERC-2016-COG,SEQUOIA(2017), Département d'informatique de l'École normale supérieure (DI-ENS), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris, École normale supérieure - Paris (ENS-PSL), and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Kernel regression, Machine Learning (stat.ML), Machine Learning (cs.LG), Stochastic gradient descent, Gossip algorithm, [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Optimization and Control (math.OC), Statistics - Machine Learning, [INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA], FOS: Mathematics, Averaging process, Computer Science - Multiagent Systems, [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC], Mathematics - Optimization and Control, Multiagent Systems (cs.MA), Nonparametric rates
Abstract: In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle \theta_*, \Phi(U) \rangle$ between the random output $Y$ and the random feature vector $\Phi(U)$, a potentially non-linear transformation of the inputs $U$. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model. The convergence of the iterates to the optimum $\theta_*$ and the decay of the generalization error follow polynomial convergence rates with exponents that both depend on the regularities of the optimum $\theta_*$ and of the feature vectors $\Phi(u)$. We interpret our result in the reproducing kernel Hilbert space framework. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from noiseless observations of its value at randomly sampled points; the convergence depends on the Sobolev smoothness of the function and of a chosen kernel. Finally, we apply our analysis beyond the supervised learning setting to obtain convergence rates for the averaging process (a.k.a. gossip algorithm) on a graph, depending on its spectral dimension.
Published: 2020
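
The following is a finite-dimensional sketch of the algorithm analyzed above: single-pass, fixed step-size SGD on the least-squares risk with noiseless outputs. As assumptions, Φ(U) is replaced by an explicit feature vector drawn uniformly from [-1, 1]^d, and θ* and the step size are illustrative. The step size keeps γ‖x‖² < 2, so no step can increase the error. Because the outputs are noiseless, the iterates converge to θ* without a noise floor. The paper's polynomial rates concern the infinite-dimensional case and are not reproduced here.

```python
import random


def sgd_noiseless(theta_star, steps, step_size, seed=0):
    """Single pass of fixed-step SGD on 0.5 * (<theta, x> - y)^2 when
    y = <theta_star, x> exactly. Each step uses a fresh x ~ U([-1, 1]^d)."""
    rng = random.Random(seed)
    d = len(theta_star)
    theta = [0.0] * d
    for _ in range(steps):
        x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        y = sum(t * xi for t, xi in zip(theta_star, x))      # noiseless output
        residual = sum(t * xi for t, xi in zip(theta, x)) - y
        # Stochastic gradient of the squared loss with respect to theta.
        theta = [t - step_size * residual * xi for t, xi in zip(theta, x)]
    return theta
```

For example, `sgd_noiseless([1.0, -2.0, 0.5, 3.0, 0.0], 3000, 0.3)` returns an estimate of θ* whose error has contracted geometrically; each step multiplies the error component along x by 1 - γ‖x‖², which lies in [-0.5, 1] here.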
