17,822 results on '"Mathematics - Optimization and Control"'
Search Results
2. Improving Energy Conserving Descent for Machine Learning: Theory and Practice
- Abstract
We develop the theory of Energy Conserving Descent (ECD) and introduce ECDSep, a gradient-based optimization algorithm able to tackle convex and non-convex optimization problems. The method is based on the novel ECD framework of optimization as physical evolution of a suitable chaotic energy-conserving dynamical system, enabling analytic control of the distribution of results - dominated at low loss - even for generic high-dimensional problems with no symmetries. Compared to previous realizations of this idea, we exploit the theoretical control to improve both the dynamics and chaos-inducing elements, enhancing performance while simplifying the hyper-parameter tuning of the optimization algorithm targeted to different classes of problems. We empirically compare with popular optimization methods such as SGD, Adam and AdamW on a wide range of machine learning problems, finding competitive or improved performance compared to the best among them on each task. We identify limitations in our analysis pointing to possibilities for additional improvements., Comment: 15 pages + appendices, full code available
- Published
- 2023
3. Development of On-Ground Hardware In Loop Simulation Facility for Space Robotics
- Abstract
Over the past couple of decades, space junk has increased rapidly, posing significant threats to satellites operating in LEO. Active Debris Removal (ADR) concepts are continuously evolving for space junk removal. One ADR method is space robotics, whose function is to chase, capture, and de-orbit space junk. This paper presents the development of an on-ground space robotics facility at TCS Research for on-orbit servicing (OOS) experiments such as refueling and debris capture. A Hardware in Loop Simulation (HILS) system will be used for integrated system development, testing, and demonstration of on-orbit docking mechanisms. The HILS test facility of TCS Research Lab will use two UR manipulators: one attached to an RG2 gripper, and the other attached to a force-torque sensor and a scaled mock-up model. The first, a UR5, will be mounted on a 7-axis linear rail and will carry the docking probe. The UR5 and its gripper first have to be interfaced with their control boxes. The grasping algorithm was run through the ROS interface to demonstrate and validate the on-orbit operations. The manipulator will be fitted with a LIDAR and a camera to visualize the mock-up model and estimate the target model's pose and rotational velocity, and with a gripper that will move relative to the target model. The other manipulator, a UR10, provides rotational and random motion to the mock-up, enabling a dynamic simulator fed by force-torque data. The dynamic simulator is coupled with an orbit propagator, which provides the orbital environment for the target model. For the simulation of docking and grasping of the target model, a 6 m linear rail is still in the procurement process.
Once proximity is reached, the grasping algorithm will be launched to capture the target model after reading the random motion of the mock-up model., Comment: 11 pages, 15 figures, Accepted at Small Satellite Conference 2023; Weekday Sessions: Orbital Debris, SSA & STM; Tuesday, 8th Aug 2023
- Published
- 2023
4. Using multiobjective optimization to reconstruct interferometric data (I)
- Abstract
Imaging in radio astronomy is an ill-posed inverse problem. In particular, the Event Horizon Telescope (EHT) Collaboration has convincingly investigated the fidelity of its image reconstructions through large surveys that solve the problem with different optimization parameters. For the existing methods, this strategy faces a limitation when imaging active galactic nuclei (AGN): large and expensive surveys over different optimization parameters are time-consuming. We present a novel nonconvex, multiobjective optimization modeling approach that yields a different type of claim and may provide a pathway to overcome this limitation. To this end, we use a multiobjective version of the genetic algorithm (GA): the Multiobjective Evolutionary Algorithm Based on Decomposition, or MOEA/D. GA strategies explore the objective function by evolutionary operations to find the different local minima and to avoid getting trapped in saddle points. First, we tested our algorithm (MOEA/D) using synthetic data based on the 2017 EHT array and a possible EHT + next-generation EHT (ngEHT) configuration. We successfully recover a fully evolved Pareto front of non-dominated solutions for these examples. The Pareto front divides into clusters of image morphologies representing the full set of locally optimal solutions. We discuss approaches to find the most natural guess among these solutions and demonstrate its performance on synthetic data. Finally, we apply MOEA/D to the 2017 EHT observations of the black hole shadow in Messier 87 (M87). MOEA/D is very flexible, faster than any other Bayesian method, and explores more solutions than Regularized Maximum Likelihood (RML) methods., Comment: to appear in A&A
- Published
- 2023
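As a rough illustration of the decomposition idea behind MOEA/D (not the EHT imaging pipeline itself), the sketch below splits a toy biobjective problem into Tchebycheff-scalarized subproblems and evolves one solution per subproblem with neighborhood replacement; the problem, weights, and evolutionary operators are simplified stand-ins.

```python
import numpy as np

# Toy biobjective problem (Schaffer): f1 = x^2, f2 = (x - 2)^2, x in [-5, 5].
# The Pareto-optimal set is x in [0, 2].
def objectives(x):
    return np.array([x**2, (x - 2.0)**2])

def moead(n_sub=20, n_gen=100, t_neigh=5, seed=0):
    rng = np.random.default_rng(seed)
    weights = np.linspace(0.01, 0.99, n_sub)           # one weight vector per subproblem
    pop = rng.uniform(-5, 5, n_sub)                    # one solution per subproblem
    fvals = np.array([objectives(x) for x in pop])
    z = fvals.min(axis=0)                              # running estimate of the ideal point

    def g(f, i):                                       # Tchebycheff scalarization of subproblem i
        w = np.array([weights[i], 1.0 - weights[i]])
        return np.max(w * np.abs(f - z))

    for _ in range(n_gen):
        for i in range(n_sub):
            # mate within the neighborhood of subproblem i
            lo, hi = max(0, i - t_neigh), min(n_sub, i + t_neigh + 1)
            a, b = rng.choice(np.arange(lo, hi), size=2, replace=False)
            child = np.clip(0.5 * (pop[a] + pop[b]) + rng.normal(0, 0.3), -5, 5)
            fc = objectives(child)
            z = np.minimum(z, fc)                      # update ideal point
            for j in range(lo, hi):                    # replace neighbors the child improves
                if g(fc, j) < g(fvals[j], j):
                    pop[j], fvals[j] = child, fc
    return pop, fvals

pop, fvals = moead()
```

After the run, the per-subproblem solutions approximate a spread of points along the Pareto set, which is the "fully evolved Pareto front of non-dominated solutions" the abstract refers to.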
5. Informative regularization for a multi-layer perceptron RR Lyrae classifier under data shift
- Abstract
In recent decades, machine learning has provided valuable models and algorithms for processing and extracting knowledge from time-series surveys. Different classifiers have been proposed and have performed to an excellent standard. Nevertheless, few papers have tackled the data shift problem in labeled training sets, which occurs when there is a mismatch between the data distribution in the training set and the testing set. This drawback can damage prediction performance on unseen data. Consequently, we propose a scalable and easily adaptable approach based on an informative regularization and an ad-hoc training procedure to mitigate the shift problem during the training of a multi-layer perceptron for RR Lyrae classification. We collect ranges for characteristic features to construct a symbolic representation of prior knowledge, which is used to model the informative regularizer component. Simultaneously, we design a two-step back-propagation algorithm to integrate this knowledge into the neural network, whereby one step is applied in each epoch to minimize classification error, while another is applied to ensure regularization. Our algorithm defines a subset of parameters (a mask) for each loss function. This approach handles the forgetting effect, which stems from a trade-off between these loss functions (learning from data versus learning expert knowledge) during training. Experiments were conducted using recently proposed shifted benchmark sets for RR Lyrae stars, outperforming baseline models by up to 3% through a more reliable classifier. Our method provides a new path to incorporate knowledge from characteristic features into artificial neural networks to manage the underlying data shift problem.
- Published
- 2023
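The two-step, masked update described above can be pictured on a toy model; the sketch below uses a simple regression problem, and the features, prior values, and masks are hypothetical stand-ins for the paper's RR Lyrae features and expert knowledge.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 4 features, linear ground truth standing in for the classifier's task.
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=100)

w = np.zeros(4)
prior = np.array([1.0, -2.0, 0.0, 0.0])   # hypothetical "expert knowledge" values
mask_data = np.array([1.0, 1.0, 1.0, 1.0])  # parameters updated by the data-loss step
mask_reg = np.array([0.0, 0.0, 1.0, 1.0])   # parameters updated by the knowledge step
lr = 0.01
for epoch in range(200):
    # step 1: gradient step on the data-fit loss, restricted by its mask
    grad_data = -2 * X.T @ (y - X @ w) / len(y)
    w -= lr * mask_data * grad_data
    # step 2: gradient step pulling the masked parameters toward the expert prior
    grad_reg = 2 * (w - prior)
    w -= lr * mask_reg * grad_reg
```

The two masked steps per epoch mirror the paper's alternation between learning from data and enforcing the informative regularizer, so neither loss overwrites the progress of the other.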
6. Capacity of Sun-driven Lunar Swingby Sequences and Their Application in Asteroid Retrieval
- Abstract
For deep-space mission design, the gravity of the Sun and the Moon can be considered and utilized first. Their gravity can provide the energy change needed for launching spacecraft and for retrieving spacecraft as well as asteroids. An asteroid retrieval mission can lead to the mitigation of asteroid hazards and to easier exploration and exploitation of the asteroid. This paper discusses the application of the Sun-driven lunar swingby sequence to asteroid missions. Characterizing the capacity of this technique is not only interesting for its dynamical insights but also non-trivial for trajectory design. The capacity of a Sun-driven lunar swingby sequence is elucidated in this paper with the help of the "Swingby-Jacobi" graph. The capacity can be represented by a range of the Jacobi integral that encloses around 660 currently cataloged asteroids. To facilitate trajectory design, a database of Sun-perturbed Moon-to-Moon transfers, including multi-revolution cases, is generated and employed. A large number of trajectory options for spacecraft launch and asteroid capture can then be explored and optimized. Finally, a number of asteroid flyby, rendezvous, sample-return, and retrieval mission options enabled by the proposed technique are obtained.
- Published
- 2023
7. Initial Trajectory Assessment of the RAMSES Mission to (99942) Apophis
- Abstract
(99942) Apophis is a potentially hazardous asteroid that will closely approach the Earth on April 13, 2029. Although the likelihood of an impact has been ruled out, this close encounter represents a unique opportunity for planetary science and defense. By investigating the physical and dynamical changes induced by this interaction, valuable insights into asteroid cohesion, strength, and internal structure can be obtained. In light of these circumstances, a fast mission to Apophis holds great scientific importance and potential for understanding potentially hazardous asteroids. To this aim, ESA proposed the mission RAMSES (Rapid Apophis Mission for SEcurity and Safety) to reach Apophis before its close encounter. In this context, the paper focuses on the reachability analysis of (99942) Apophis, examining thousands of trajectories departing from Earth and reaching the asteroid before the fly-by, using a low-thrust spacecraft. A two-layer approach combining direct sequential convex programming and an indirect method is employed for fast and reliable trajectory optimization. The results reveal multiple feasible launch windows and provide essential information for mission planning and system design.
- Published
- 2023
8. Construction of stationary trajectories for a model of a system of N particles with interaction
- Abstract
For the classical N-body problem, an approach is proposed based on introducing mathematical programming problems, natural in the physical sense, for finding a conditional minimum of the characteristics of the system over the set of its possible states. The solution of these problems then makes it possible to construct families of planar stationary and periodic trajectories of the system, and also to find relationships and estimates for the characteristics of the system on these trajectories. It is shown that when the system moves in a plane on trajectories generated by the global minimum in these optimization problems, the minimum possible size of the system is achieved at any time for each current level of its "cohesion" (or potential energy). Similar optimization problems are considered for finding a conditional minimum of the characteristics of a system in three-dimensional space. It is shown that the solution of these problems can only be achieved on planar trajectories of the system, and is achieved, in particular, on the constructed planar stationary and periodic trajectories. In addition, it is shown that a trajectory of the system in three-dimensional space on which, at least at one point, the minimum possible size of the system is achieved at the current value of its cohesion (or potential energy) can only be planar. Such trajectories are, in particular, the planar stationary and periodic trajectories generated by the global minimum in the considered optimization problems., Comment: 32 pages
- Published
- 2023
9. Parametric Optimization of Low Thrust Orbital Maneuvers
- Abstract
The goal of the present paper is to perform a numerical analysis of parametric optimization of low-thrust orbital maneuvers. An orbital maneuver occurs when it is necessary to modify the orbit of a space vehicle to change its function or to correct the effects of perturbations. A parametric optimization is performed when the thrust is not free to point in any direction, but has to follow some prescribed law, such as a linear or quadratic relation with time.
- Published
- 2023
10. Movement Optimization of Robotic Arms for Energy and Time Reduction using Evolutionary Algorithms
- Abstract
Trajectory optimization of a robot manipulator consists of optimization of both the robot movement and the robot end-effector path. This paper aims to find optimum movement parameters, including movement type, speed, and acceleration, to minimize robot energy. Trajectory optimization by minimizing the energy would increase the longevity of robotic manipulators. We utilized the particle swarm optimization method to find the movement parameters leading to minimum energy consumption. The effectiveness of the proposed method is demonstrated on different trajectories. Experimental results show that a 49% efficiency gain was obtained using a UR5 robotic arm.
- Published
- 2023
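A minimal particle swarm optimization loop of the kind described above; the smooth two-parameter energy function below is a hypothetical stand-in for the measured robot energy, which in the paper comes from UR5 experiments.

```python
import numpy as np

def energy(params):
    # Hypothetical stand-in for measured robot energy: a smooth bowl over
    # (speed, acceleration), minimized at speed=0.6, accel=1.2.
    speed, accel = params
    return (speed - 0.6)**2 + 0.5 * (accel - 1.2)**2 + 0.1

def pso(n_particles=30, n_iter=200, seed=1):
    rng = np.random.default_rng(seed)
    lo, hi = np.array([0.0, 0.0]), np.array([2.0, 3.0])   # parameter bounds
    x = rng.uniform(lo, hi, (n_particles, 2))             # particle positions
    v = np.zeros_like(x)                                  # particle velocities
    pbest = x.copy()                                      # per-particle best positions
    pbest_f = np.array([energy(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()                # global best position
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, 1))
        # inertia + cognitive pull (own best) + social pull (swarm best)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([energy(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, pbest_f.min()

best, fbest = pso()
```

In the paper's setting the same loop would query a robot (or its simulator) instead of `energy`, with movement type handled as an extra discrete parameter.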
11. Achieving Consensus over Compact Submanifolds
- Abstract
We consider the consensus problem in a decentralized network, focusing on a compact submanifold that acts as a nonconvex constraint set. By leveraging the proximal smoothness of the compact submanifold, which encompasses the local singleton property and the local Lipschitz continuity of the projection operator on the manifold, and establishing the connection between the projection operator and general retraction, we show that the Riemannian gradient descent with a unit step size has locally linear convergence if the network has a satisfactory level of connectivity. Moreover, based on the geometry of the compact submanifold, we prove that a convexity-like regularity condition, referred to as the restricted secant inequality, always holds in an explicitly characterized neighborhood around the solution set of the nonconvex consensus problem. By leveraging this restricted secant inequality and imposing a weaker connectivity requirement on the decentralized network, we present a comprehensive analysis of the linear convergence of the Riemannian gradient descent, taking into consideration appropriate initialization and step size. Furthermore, if the network is well connected, we demonstrate that the local Lipschitz continuity endowed by proximal smoothness is a sufficient condition for the restricted secant inequality, thus contributing to the local error bound. We believe that our established results will find further applications in consensus problems over more general proximally smooth sets. Numerical experiments are conducted to validate our theoretical findings., Comment: 25 pages
- Published
- 2023
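A minimal sketch of the consensus iteration on a compact submanifold, here the unit sphere with normalization as the retraction, assuming a ring network with a doubly stochastic mixing matrix; the paper's analysis covers general compact submanifolds and step-size regimes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 3
# Ring network: each agent mixes with its two neighbors (doubly stochastic W).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # agents' points on the unit sphere
X[X[:, 0] < 0] *= -1                            # start in one hemisphere (well-posed consensus)

for _ in range(100):
    Y = W @ X                                    # Euclidean averaging with neighbors
    X = Y / np.linalg.norm(Y, axis=1, keepdims=True)  # retraction: project back to the sphere

# maximum pairwise disagreement after the iterations
spread = max(np.linalg.norm(X[i] - X[j]) for i in range(n) for j in range(n))
```

The average-then-retract step is exactly Riemannian gradient descent with unit step size on the consensus objective in this setting, and the disagreement shrinks linearly as the abstract describes.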
12. Modelling the discretization error of initial value problems using the Wishart distribution
- Abstract
This paper presents a new discretization error quantification method for the numerical integration of ordinary differential equations. The error is modelled by using the Wishart distribution, which enables us to capture the correlation between variables. Error quantification is achieved by solving an optimization problem under the order constraints for the covariance matrices. An algorithm for the optimization problem is also established in a slightly broader context.
- Published
- 2023
13. Bayesian Optimization of Expensive Nested Grey-Box Functions
- Abstract
We consider the problem of optimizing a grey-box objective function, i.e., a nested function composed of both black-box and white-box functions. A general formulation for such grey-box problems is given, which covers the existing grey-box optimization formulations as special cases. We then design an optimism-driven algorithm to solve it. Under certain regularity assumptions, our algorithm achieves a regret bound similar to that of the standard black-box Bayesian optimization algorithm, up to a constant multiplicative term depending on the Lipschitz constants of the functions considered. We further extend our method to the constrained case and discuss several special cases. For the commonly used kernel functions, the regret bounds allow us to derive a convergence rate to the optimal solution. Experimental results show that our grey-box optimization method empirically improves the speed of finding the global optimal solution significantly, as compared to the standard black-box optimization algorithm.
- Published
- 2023
14. $K$-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic Control
- Abstract
We propose a novel $K$-nearest neighbor resampling procedure for estimating the performance of a policy from historical data containing realized episodes of a decision process generated under a different policy. We focus on feedback policies that depend deterministically on the current state in environments with continuous state-action spaces and system-inherent stochasticity effected by chosen actions. Such settings are common in a wide range of high-stake applications and are actively investigated in the context of stochastic control. Our procedure exploits that similar state/action pairs (in a metric sense) are associated with similar rewards and state transitions. This enables our resampling procedure to tackle the counterfactual estimation problem underlying off-policy evaluation (OPE) by simulating trajectories similarly to Monte Carlo methods. Compared to other OPE methods, our algorithm does not require optimization, can be efficiently implemented via tree-based nearest neighbor search and parallelization and does not explicitly assume a parametric model for the environment's dynamics. These properties make the proposed resampling algorithm particularly useful for stochastic control environments. We prove that our method is statistically consistent in estimating the performance of a policy in the OPE setting under weak assumptions and for data sets containing entire episodes rather than independent transitions. To establish the consistency, we generalize Stone's Theorem, a well-known result in nonparametric statistics on local averaging, to include episodic data and the counterfactual estimation underlying OPE. Numerical experiments demonstrate the effectiveness of the algorithm in a variety of stochastic control settings including a linear quadratic regulator, trade execution in limit order books and online stochastic bin packing.
- Published
- 2023
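A minimal sketch of the nearest-neighbor resampling idea on a toy 1-D control problem; the dynamics, policies, and distance metric below are illustrative stand-ins, and a brute-force neighbor search replaces the tree-based search mentioned in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
# Historical data from a random behavior policy in a toy stochastic system:
# next state s' = 0.5*s + a + noise, reward r = -(s^2 + a^2).
N = 5000
S = rng.uniform(-2, 2, N)
A = rng.uniform(-1, 1, N)
R = -(S**2 + A**2)
S_next = 0.5 * S + A + 0.05 * rng.normal(size=N)

def knn_rollout(policy, s0, horizon=10, k=10):
    """Simulate one episode by resampling among the k nearest historical
    state/action pairs, Monte-Carlo style, without any model fitting."""
    s, ret = s0, 0.0
    for _ in range(horizon):
        a = policy(s)
        d = (S - s)**2 + (A - a)**2          # distance in state-action space
        idx = rng.choice(np.argsort(d)[:k])  # resample one of the k nearest transitions
        ret += R[idx]
        s = S_next[idx]
    return ret

policy = lambda s: np.clip(-0.5 * s, -1, 1)  # target policy steering the state toward 0
est = np.mean([knn_rollout(policy, s0) for s0 in rng.uniform(-1, 1, 200)])
```

The estimate `est` averages resampled returns under the target policy even though the data were generated by a different (random) policy, which is the counterfactual estimation task of OPE.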
15. Communication-Efficient Gradient Descent-Ascent Methods for Distributed Variational Inequalities: Unified Analysis and Local Updates
- Abstract
Distributed and federated learning algorithms and techniques are associated primarily with minimization problems. However, with the increase of minimax optimization and variational inequality problems in machine learning, the necessity of designing efficient distributed/federated learning approaches for these problems is becoming more apparent. In this paper, we provide a unified convergence analysis of communication-efficient local training methods for distributed variational inequality problems (VIPs). Our approach is based on a general key assumption on the stochastic estimates that allows us to propose and analyze several novel local training algorithms under a single framework for solving a class of structured non-monotone VIPs. We present the first local gradient descent-ascent algorithms with provable improved communication complexity for solving distributed variational inequalities on heterogeneous data. The general algorithmic framework recovers state-of-the-art algorithms and their sharp convergence guarantees when the setting is specialized to minimization or minimax optimization problems. Finally, we demonstrate the strong performance of the proposed algorithms compared to state-of-the-art methods when solving federated minimax optimization problems.
- Published
- 2023
16. Recovering Simultaneously Structured Data via Non-Convex Iteratively Reweighted Least Squares
- Abstract
We propose a new algorithm for the problem of recovering data that adheres to multiple, heterogeneous low-dimensional structures from linear observations. Focusing on data matrices that are simultaneously row-sparse and low-rank, we propose and analyze an iteratively reweighted least squares (IRLS) algorithm that is able to leverage both structures. In particular, it optimizes a combination of non-convex surrogates for row-sparsity and rank, a balancing of which is built into the algorithm. We prove locally quadratic convergence of the iterates to a simultaneously structured data matrix in a regime of minimal sample complexity (up to constants and a logarithmic factor), which is known to be impossible for a combination of convex surrogates. In experiments, we show that the IRLS method exhibits favorable empirical convergence, identifying simultaneously row-sparse and low-rank matrices from fewer measurements than state-of-the-art methods., Comment: 35 pages, 6 figures
- Published
- 2023
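To picture the IRLS mechanism in a simpler setting, the sketch below applies classic smoothed iteratively reweighted least squares to sparsity alone (no low-rank structure); the paper's algorithm additionally balances a rank surrogate against the row-sparsity surrogate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 64, 28, 4
# Sparse ground truth and Gaussian measurements y = Phi @ x_true.
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.normal(size=s)
Phi = rng.normal(size=(m, n)) / np.sqrt(m)
y = Phi @ x_true

x = np.linalg.lstsq(Phi, y, rcond=None)[0]   # min-norm initialization
eps = 1.0                                    # smoothing parameter, decreased over iterations
for _ in range(60):
    # weights from the smoothed sparsity surrogate sum_j (x_j^2 + eps^2)^(1/2)
    Wi = np.diag(np.sqrt(x**2 + eps**2))     # inverse of the weight matrix
    # weighted minimum-norm solution consistent with the measurements:
    # x = Wi Phi^T (Phi Wi Phi^T)^(-1) y
    x = Wi @ Phi.T @ np.linalg.solve(Phi @ Wi @ Phi.T, y)
    eps = max(0.7 * eps, 1e-6)

err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
```

Each iteration solves a weighted least squares problem whose weights are rebuilt from the current iterate, which is the same reweighting mechanism the paper applies jointly to row-sparsity and rank.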
17. On the Identification and Optimization of Nonsmooth Superposition Operators in Semilinear Elliptic PDEs
- Abstract
We study an infinite-dimensional optimization problem that aims to identify the Nemytskii operator in the nonlinear part of a prototypical semilinear elliptic partial differential equation (PDE) which minimizes the distance between the PDE-solution and a given desired state. In contrast to previous works, we consider this identification problem in a low-regularity regime in which the function inducing the Nemytskii operator is a-priori only known to be an element of $H^1_{loc}(\mathbb{R})$. This makes the studied problem class a suitable point of departure for the rigorous analysis of training problems for learning-informed PDEs in which an unknown superposition operator is approximated by means of a neural network with nonsmooth activation functions (ReLU, leaky-ReLU, etc.). We establish that, despite the low regularity of the controls, it is possible to derive a classical stationarity system for local minimizers and to solve the considered problem by means of a gradient projection method. The convergence of the resulting algorithm is proven in the function space setting. It is also shown that the established first-order necessary optimality conditions imply that locally optimal superposition operators share various characteristic properties with commonly used activation functions: They are always sigmoidal, continuously differentiable away from the origin, and typically possess a distinct kink at zero. The paper concludes with numerical experiments which confirm the theoretical findings.
- Published
- 2023
18. Explainable AI using expressive Boolean formulas
- Abstract
We propose and implement an interpretable machine learning classification model for Explainable AI (XAI) based on expressive Boolean formulas. Potential applications include credit scoring and diagnosis of medical conditions. The Boolean formula defines a rule with tunable complexity (or interpretability), according to which input data are classified. Such a formula can include any operator that can be applied to one or more Boolean variables, thus providing higher expressivity compared to more rigid rule-based and tree-based approaches. The classifier is trained using native local optimization techniques, efficiently searching the space of feasible formulas. Shallow rules can be determined by fast Integer Linear Programming (ILP) or Quadratic Unconstrained Binary Optimization (QUBO) solvers, potentially powered by special purpose hardware or quantum devices. We combine the expressivity and efficiency of the native local optimizer with the fast operation of these devices by executing non-local moves that optimize over subtrees of the full Boolean formula. We provide extensive numerical benchmarking results featuring several baselines on well-known public datasets. Based on the results, we find that the native local rule classifier is generally competitive with the other classifiers. The addition of non-local moves achieves similar results with fewer iterations, and therefore using specialized or quantum hardware could lead to a speedup by fast proposal of non-local moves., Comment: 28 pages, 16 figures, 4 tables
- Published
- 2023
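One way to picture the native local search over expressive Boolean formulas: the sketch below uses a single "at least k of" operator family and single-literal swap moves on a toy dataset, which is far simpler than the paper's full formula space and its ILP/QUBO-powered non-local moves.

```python
import random

random.seed(0)
# Toy dataset: 4 Boolean features, label = "at least 2 of the first 3 features".
data = [tuple(random.randint(0, 1) for _ in range(4)) for _ in range(200)]
labels = [int(x[0] + x[1] + x[2] >= 2) for x in data]

# One expressive operator: "at least k of these literals", where a literal is
# a feature index together with the polarity it must match.
def predict(rule, x):
    k, lits = rule
    return int(sum(x[i] == pol for i, pol in lits) >= k)

def accuracy(rule):
    return sum(predict(rule, x) == y for x, y in zip(data, labels)) / len(data)

# Native local search: propose single-literal swaps, keep strict improvements.
rule = (2, [(0, 1), (3, 1), (2, 1)])    # deliberately imperfect starting formula
for _ in range(300):
    k, lits = rule
    new_lits = list(lits)
    new_lits[random.randrange(len(lits))] = (random.randrange(4), random.randint(0, 1))
    cand = (k, new_lits)
    if accuracy(cand) > accuracy(rule):
        rule = cand

acc = accuracy(rule)
```

The tunable rule size plays the role of the complexity/interpretability knob in the abstract; the paper's non-local moves would re-optimize whole subtrees of such formulas at once.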
19. End-to-End Learning for Stochastic Optimization: A Bayesian Perspective
- Abstract
We develop a principled approach to end-to-end learning in stochastic optimization. First, we show that the standard end-to-end learning algorithm admits a Bayesian interpretation and trains a posterior Bayes action map. Building on the insights of this analysis, we then propose new end-to-end learning algorithms for training decision maps that output solutions of empirical risk minimization and distributionally robust optimization problems, two dominant modeling paradigms in optimization under uncertainty. Numerical results for a synthetic newsvendor problem illustrate the key differences between alternative training schemes. We also investigate an economic dispatch problem based on real data to showcase the impact of the neural network architecture of the decision maps on their test performance., Comment: Accepted at ICML 2023
- Published
- 2023
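For a newsvendor problem like the synthetic example mentioned above, the ERM decision map reduces to an empirical critical quantile; a minimal sketch follows, with illustrative cost coefficients and demand distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
demand = rng.gamma(shape=4.0, scale=5.0, size=2000)   # historical demand samples
c_under, c_over = 4.0, 1.0                            # underage and overage unit costs

def cost(q, d):
    # newsvendor cost of ordering q when demand turns out to be d
    return c_under * np.maximum(d - q, 0) + c_over * np.maximum(q - d, 0)

# Empirical risk minimization over the order quantity via a grid search.
qs = np.linspace(0, 80, 2000)
erm_q = qs[np.argmin([cost(q, demand).mean() for q in qs])]

# Classical result: the minimizer is the critical quantile c_u / (c_u + c_o)
# of the (empirical) demand distribution.
quantile_q = np.quantile(demand, c_under / (c_under + c_over))
```

Here the decision is a single number; the end-to-end approaches in the paper instead train a map from contextual features to such decisions, with the Bayesian interpretation applying to how that map is trained.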
20. Accelerating 128-bit Floating-Point Matrix Multiplication on FPGAs
- Abstract
General Matrix Multiplication (GEMM) is a fundamental operation widely used in scientific computations. Its performance and accuracy significantly impact the performance and accuracy of applications that depend on it. One such application is semidefinite programming (SDP), which often requires binary128 or higher precision arithmetic to solve SDP problems stably. However, only some processors support binary128 arithmetic, which makes SDP solvers generally slow. In this study, we focused on accelerating GEMM with binary128 arithmetic on field-programmable gate arrays (FPGAs) to enable the flexible design of accelerators for the desired computations. Our binary128 GEMM designs on a recent high-performance FPGA achieved approximately 90 GFlops, 147x faster than the computation executed on a recent CPU with 20 threads for large matrices. Using our binary128 GEMM design on the FPGA, we successfully accelerated two numerical applications: LU decomposition and SDP problems, for the first time., Comment: 12 pages, 8 figures
- Published
- 2023
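A tiny illustration of why higher-than-double precision matters in such computations; here Python's `decimal` module stands in for binary128 (the paper targets FPGA hardware, not software emulation, and 34 digits only approximates quad precision).

```python
from decimal import Decimal, getcontext

getcontext().prec = 34   # roughly the decimal digits of binary128 (quad) precision

def gemm(A, B):
    """Naive high-precision GEMM on nested lists of Decimals."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

A = [[Decimal("1e20"), Decimal(1)],
     [Decimal(0),      Decimal(1)]]
B = [[Decimal(1), Decimal("1e-20")],
     [Decimal(1), Decimal(1)]]
C = gemm(A, B)

# C[0][0] = 1e20 * 1 + 1 * 1: the "+1" survives at 34-digit precision,
# whereas in binary64 floating point 1e20 + 1.0 rounds back to 1e20.
```

This loss of small addends is exactly the kind of cancellation that destabilizes interior-point iterations in SDP solvers, motivating the binary128 GEMM in the paper.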
21. Interpreting and Improving Diffusion Models Using the Euclidean Distance Function
- Abstract
Denoising is intuitively related to projection. Indeed, under the manifold hypothesis, adding random noise is approximately equivalent to orthogonal perturbation. Hence, learning to denoise is approximately learning to project. In this paper, we use this observation to reinterpret denoising diffusion models as approximate gradient descent applied to the Euclidean distance function. We then provide a straightforward convergence analysis of the DDIM sampler under simple assumptions on the projection error of the denoiser. Finally, we propose a new sampler based on two simple modifications to DDIM using insights from our theoretical results. In as few as 5-10 function evaluations, our sampler achieves state-of-the-art FID scores on pretrained CIFAR-10 and CelebA models and can generate high quality samples on latent diffusion models., Comment: 18 pages, 6 figures, 2 tables
- Published
- 2023
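The denoising-as-projection observation can be pictured on a toy manifold: for the unit circle, the ideal denoiser is the Euclidean projection, and one "denoising" step is exactly a gradient step on half the squared distance function (a simplified illustration, not the paper's DDIM analysis).

```python
import numpy as np

# Toy data manifold: the unit circle in R^2.
def proj(x):
    # Euclidean projection onto the circle (the ideal denoiser here)
    return x / np.linalg.norm(x)

def denoise_step(x, step):
    # gradient of 0.5 * dist(x, circle)^2 is exactly x - proj(x),
    # so this step moves x toward its projection
    grad = x - proj(x)
    return x - step * grad

rng = np.random.default_rng(0)
x = proj(rng.normal(size=2)) + 0.5 * rng.normal(size=2)  # noisy sample off the manifold
for _ in range(10):
    x = denoise_step(x, step=0.5)

dist = abs(np.linalg.norm(x) - 1.0)   # residual distance to the manifold
```

Each step halves the radial error, mirroring the view of diffusion sampling as approximate gradient descent on the distance function; with `step=1.0` a single step would project exactly.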
22. Comparison of SeDuMi and SDPT3 Solvers for Stability of Continuous-time Linear System
- Abstract
SeDuMi and SDPT3 are two solvers for Semi-definite Programming (SDP) and Linear Matrix Inequality (LMI) problems. A computational performance comparison of the two is undertaken in this paper with regard to the stability of continuous-time linear systems. The comparison mainly focuses on computational time and memory requirements for problems of different scales. To implement and compare the two solvers on a set of well-posed problems, we employ YALMIP, a widely used toolbox for modeling and optimization in MATLAB. The primary goal of this study is to provide an empirical assessment of the relative computational efficiency of SeDuMi and SDPT3 under varying problem conditions. Our evaluation indicates that SDPT3 performs much better in large-scale, high-precision calculations., Comment: 13 pages, 12 figures
- Published
- 2023
23. Online Learning under Adversarial Nonlinear Constraints
- Abstract
In many applications, learning systems are required to process continuous non-stationary data streams. We study this problem in an online learning framework and propose an algorithm that can deal with adversarial time-varying and nonlinear constraints. As we show in our work, the algorithm called Constraint Violation Velocity Projection (CVV-Pro) achieves $\sqrt{T}$ regret and converges to the feasible set at a rate of $1/\sqrt{T}$, despite the fact that the feasible set is slowly time-varying and a priori unknown to the learner. CVV-Pro only relies on local sparse linear approximations of the feasible set and therefore avoids optimizing over the entire set at each iteration, which is in sharp contrast to projected gradients or Frank-Wolfe methods. We also empirically evaluate our algorithm on two-player games, where the players are subjected to a shared constraint.
- Published
- 2023
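The idea of projecting onto a local linear approximation of the feasible set, rather than the set itself, can be sketched as follows; this is a simplified stand-in with a fixed quadratic constraint, not the paper's exact CVV-Pro update for time-varying constraints.

```python
import numpy as np

def halfspace_proj(x, a, b):
    """Project x onto the half-space {z : a^T z <= b}."""
    viol = a @ x - b
    if viol <= 0:
        return x                      # already feasible for the linearization
    return x - viol * a / (a @ a)

g = lambda x: x[0]**2 + x[1]**2 - 1.0   # constraint g(x) <= 0 (unit disk)
grad_g = lambda x: 2 * x
f_grad = lambda x: np.array([1.0, 0.0])  # loss gradient: minimize x[0]

x = np.array([2.0, 0.0])                 # infeasible start
for t in range(1, 200):
    x = x - 0.1 / np.sqrt(t) * f_grad(x)        # gradient step on the loss
    # local linearization g(x) + grad_g(x)^T (z - x) <= 0, i.e. a^T z <= b
    a = grad_g(x)
    b = a @ x - g(x)
    x = halfspace_proj(x, a, b)                 # cheap projection onto the half-space
```

Only a single linear inequality is projected onto per round, which is the kind of cheap local step that lets such methods avoid optimizing over the entire feasible set at each iteration; the iterate is driven to the constrained minimizer (-1, 0).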
25. Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation
- Abstract
Weighted low rank approximation is a fundamental problem in numerical linear algebra, and it has many applications in machine learning. Given a matrix $M \in \mathbb{R}^{n \times n}$, a weight matrix $W \in \mathbb{R}_{\geq 0}^{n \times n}$, and a parameter $k$, the goal is to output two matrices $U, V \in \mathbb{R}^{n \times k}$ such that $\| W \circ (M - U V^\top) \|_F$ is minimized, where $\circ$ denotes the Hadamard product. Such a problem is known to be NP-hard and even hard to approximate [RSW16]. Meanwhile, alternating minimization is a good heuristic solution for approximating weighted low rank approximation. The work [LLR16] shows that, under mild assumptions, alternating minimization does provide provable guarantees. In this work, we develop an efficient and robust framework for alternating minimization. For weighted low rank approximation, this improves the runtime of [LLR16] from $n^2 k^2$ to $n^2 k$. At the heart of our framework is a high-accuracy multiple response regression solver together with a robust analysis of alternating minimization., Comment: arXiv admin note: text overlap with arXiv:2302.11068
- Published
- 2023
26. Convergent Bregman Plug-and-Play Image Restoration for Poisson Inverse Problems
- Abstract
Plug-and-Play (PnP) methods are efficient iterative algorithms for solving ill-posed image inverse problems. PnP methods are obtained by using deep Gaussian denoisers instead of the proximal operator or the gradient-descent step within proximal algorithms. Current PnP schemes rely on data-fidelity terms that have either Lipschitz gradients or closed-form proximal operators, which is not applicable to Poisson inverse problems. Based on the observation that the Gaussian noise is not the adequate noise model in this setting, we propose to generalize PnP using the Bregman Proximal Gradient (BPG) method. BPG replaces the Euclidean distance with a Bregman divergence that can better capture the smoothness properties of the problem. We introduce the Bregman Score Denoiser specifically parametrized and trained for the new Bregman geometry and prove that it corresponds to the proximal operator of a nonconvex potential. We propose two PnP algorithms based on the Bregman Score Denoiser for solving Poisson inverse problems. Extending the convergence results of BPG in the nonconvex setting, we show that the proposed methods converge, targeting stationary points of an explicit global functional. Experimental evaluations conducted on various Poisson inverse problems validate the convergence results and showcase effective restoration performance.
- Published
- 2023
27. A Lightweight Method for Tackling Unknown Participation Probabilities in Federated Averaging
- Abstract
In federated learning (FL), clients usually have diverse participation probabilities that are unknown a priori, which can significantly harm the performance of FL if not handled properly. Existing works aiming at addressing this problem are usually based on global variance reduction, which requires a substantial amount of additional memory in a multiplicative factor equal to the total number of clients. An important open problem is to find a lightweight method for FL in the presence of clients with unknown participation rates. In this paper, we address this problem by adapting the aggregation weights in federated averaging (FedAvg) based on the participation history of each client. We first show that, with heterogeneous participation probabilities, FedAvg with non-optimal aggregation weights can diverge from the optimal solution of the original FL objective, indicating the need to find optimal aggregation weights. However, it is difficult to compute the optimal weights when the participation probabilities are unknown. To address this problem, we present a new algorithm called FedAU, which improves FedAvg by adaptively weighting the client updates based on online estimates of the optimal weights without knowing the probabilities of client participation. We provide a theoretical convergence analysis of FedAU using a novel methodology to connect the estimation error and convergence. Our theoretical results reveal important and interesting insights, while showing that FedAU converges to an optimal solution of the original objective and has desirable properties such as linear speedup. Our experimental results also verify the advantage of FedAU over baseline methods.
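A minimal sketch of the underlying idea, weighting each participating client's update by its inverse estimated participation frequency. The probabilities, scalar client losses, and step sizes below are invented for illustration; this is not the FedAU algorithm itself:

```python
import numpy as np

def simulate_adaptive_fedavg(rounds=3000, seed=0):
    """Toy FedAvg with inverse-participation-frequency weights (illustrative)."""
    rng = np.random.default_rng(seed)
    p = np.array([0.9, 0.5, 0.2, 0.1])        # unknown participation probabilities
    targets = np.array([1.0, 2.0, 3.0, 4.0])  # minimizer of client i's loss (x - t_i)^2 / 2
    counts = np.zeros(4)
    x = 0.0
    for t in range(1, rounds + 1):
        active = rng.random(4) < p
        counts += active
        p_hat = np.maximum(counts / t, 1e-3)  # online participation-frequency estimate
        w = active / p_hat                    # inverse-frequency aggregation weights
        x -= (1.0 / (t + 1)) * (w * (x - targets)).sum() / 4
    return x
```

The weighted aggregate is an unbiased estimate of the full gradient, so the iterate approaches the optimum of the *original* objective, the mean of the client minimizers (2.5 here), rather than the participation-weighted mean that naive averaging over active clients would produce.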
- Published
- 2023
28. The Unscented Kalman Filter for Nonlinear Parameter Identification of Adaptive Cruise Control Systems
- Abstract
This paper develops and investigates a dual unscented Kalman filter (DUKF) for the joint nonlinear state and parameter identification of commercial adaptive cruise control (ACC) systems. Although the core functionality of stock ACC systems, including their proprietary control logic and parameters, is not publicly available, this work considers a car-following scenario with a human-driven vehicle (leader) and an ACC-engaged ego vehicle (follower) that employs a constant time-headway policy (CTHP). The objective of the DUKF is to determine the CTHP parameters of the ACC by using real-time observations of space-gap and relative velocity from the vehicle's onboard sensors. Real-time parameter identification of stock ACC systems is essential for assessing their string stability, large-scale deployment on motorways, and impact on traffic flow and throughput. In this regard, $L_2$ and $L_\infty$ string stability conditions are considered. The observability rank condition for nonlinear systems is adopted to evaluate the ability of the proposed estimation scheme to estimate stock ACC system parameters using empirical data. The proposed filter is evaluated using empirical data collected from the onboard sensors of two 2019 SUV vehicles, namely Hyundai Nexo and SsangYong Rexton, equipped with stock ACC systems; and is compared with batch and recursive least-squares optimization. The set of ACC model parameters obtained from the proposed filter revealed that the commercially implemented ACC system of the considered vehicle (Hyundai Nexo) is neither $L_2$ nor $L_\infty$ string stable., Comment: 11 pages, 3 Figures
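A minimal simulation of a constant time-headway policy follower. The gains `k1`, `k2` and headway `eta` below are illustrative values, not the identified parameters of any stock ACC system:

```python
import numpy as np

def simulate_cthp(k1=0.1, k2=0.3, eta=1.5, v_lead=20.0, T=200.0, dt=0.01):
    """Euler simulation of a CTHP car-following law (illustrative gains)."""
    s, v = 50.0, 15.0          # initial space gap [m] and follower speed [m/s]
    for _ in range(int(T / dt)):
        # CTHP: accelerate toward the desired gap eta*v, damp relative velocity
        a = k1 * (s - eta * v) + k2 * (v_lead - v)
        s += dt * (v_lead - v)  # space-gap dynamics
        v += dt * a
    return s, v
```

At equilibrium the follower matches the leader's speed and holds the time-headway gap $s^* = \eta\, v_{\text{lead}}$, which is what a filter identifying $(k_1, k_2, \eta)$ from space-gap and relative-velocity data would exploit.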
- Published
- 2023
29. A Mirror Descent Perspective on Classical and Quantum Blahut-Arimoto Algorithms
- Abstract
The Blahut-Arimoto algorithm is a well known method to compute classical channel capacities and rate-distortion functions. Recent works have extended this algorithm to compute various quantum analogs of these quantities. In this paper, we show how these Blahut-Arimoto algorithms are special instances of mirror descent, which is a well-studied generalization of gradient descent for constrained convex optimization. Using new convex analysis tools, we show how relative smoothness and strong convexity analysis recovers known sublinear and linear convergence rates for Blahut-Arimoto algorithms. This mirror descent viewpoint allows us to derive related algorithms with similar convergence guarantees to solve problems in information theory for which Blahut-Arimoto-type algorithms are not directly applicable. We apply this framework to compute energy-constrained classical and quantum channel capacities, classical and quantum rate-distortion functions, and approximations of the relative entropy of entanglement, all with provable convergence guarantees., Comment: 30 pages
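The classical Blahut-Arimoto iteration for a discrete channel, which the paper reinterprets as mirror descent, can be sketched as follows; the multiplicative update on the input distribution is exactly an entropic-mirror-descent step:

```python
import numpy as np

def blahut_arimoto(P, iters=500):
    """Capacity (in bits) of a discrete channel with P[x, y] = p(y | x)."""
    n_in = P.shape[0]
    r = np.full(n_in, 1.0 / n_in)           # input distribution, start uniform
    for _ in range(iters):
        q = r[:, None] * P                  # unnormalized joint r(x) p(y|x)
        q /= q.sum(axis=0, keepdims=True)   # posterior q(x | y)
        logq = np.log(np.where(q > 0, q, 1.0))
        r = np.exp((P * logq).sum(axis=1))  # multiplicative (mirror-descent) update
        r /= r.sum()
    q = r[:, None] * P
    q /= q.sum(axis=0, keepdims=True)
    mask = P > 0
    # Mutual information I(X;Y) under the final input distribution r
    return (r[:, None] * P * np.log2(np.where(mask, q / r[:, None], 1.0)))[mask].sum()
```

For a binary symmetric channel with crossover probability $p$, the iteration recovers the closed-form capacity $1 - H_2(p)$.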
- Published
- 2023
30. Integer Programming Games: A Gentle Computational Overview
- Abstract
In this tutorial, we present a computational overview on computing Nash equilibria in Integer Programming Games (IPGs), i.e., how to compute solutions for a class of non-cooperative and nonconvex games where each player solves a mixed-integer optimization problem. IPGs are a broad class of games extending the modeling power of mixed-integer optimization to multi-agent settings. This class of games includes, for instance, any finite game and any multi-agent extension of traditional combinatorial optimization problems. After providing some background motivation and context of applications, we systematically review and classify the state-of-the-art algorithms to compute Nash equilibria. We propose an essential taxonomy of the algorithmic ingredients needed to compute equilibria, and we describe the theoretical and practical challenges associated with equilibria computation. Finally, we quantitatively and qualitatively compare a sequential Stackelberg game with a simultaneous IPG to highlight the different properties of their solutions., Comment: To appear in INFORMS TutORials in Operations Research 2023
- Published
- 2023
31. Provable convergence guarantees for black-box variational inference
- Abstract
While black-box variational inference is widely used, there is no proof that its stochastic optimization succeeds. We suggest this is due to a theoretical gap in existing stochastic optimization proofs: namely, the challenge of gradient estimators with unusual noise bounds, and a composite non-smooth objective. For dense Gaussian variational families, we observe that existing gradient estimators based on reparameterization satisfy a quadratic noise bound and give novel convergence guarantees for proximal and projected stochastic gradient descent using this bound. This provides the first rigorous guarantee that black-box variational inference converges for realistic inference problems., Comment: 32 pages
- Published
- 2023
32. Understanding Progressive Training Through the Framework of Randomized Coordinate Descent
- Abstract
We propose a Randomized Progressive Training algorithm (RPT) -- a stochastic proxy for the well-known Progressive Training method (PT) (Karras et al., 2017). Originally designed to train GANs (Goodfellow et al., 2014), PT was proposed as a heuristic, with no convergence analysis even for the simplest objective functions. In contrast, to the best of our knowledge, RPT is the first PT-type algorithm with rigorous and sound theoretical guarantees for general smooth objective functions. We cast our method into the established framework of Randomized Coordinate Descent (RCD) (Nesterov, 2012; Richt\'arik & Tak\'a\v{c}, 2014), for which (as a by-product of our investigations) we also propose a novel, simple and general convergence analysis encapsulating strongly-convex, convex and nonconvex objectives. We then use this framework to establish a convergence theory for RPT. Finally, we validate the effectiveness of our method through extensive computational experiments.
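For reference, the RCD framework the method is cast into can be sketched on a quadratic, where each coordinate step is an exact minimization because the coordinate-wise Lipschitz constant is just the diagonal entry:

```python
import numpy as np

def rcd(A, b, iters=4000, seed=0):
    """Randomized coordinate descent for f(x) = 0.5 x^T A x - b^T x, A SPD."""
    rng = np.random.default_rng(seed)
    n = len(b)
    L = np.diag(A)                 # coordinate-wise Lipschitz constants
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.integers(n)        # sample a coordinate uniformly
        g = A[i] @ x - b[i]        # partial derivative along coordinate i
        x[i] -= g / L[i]           # exact minimization along that coordinate
    return x
```

Updating a random block of coordinates per step, rather than a single one, is the structural bridge to progressive training, where a growing subset of the model is trained at a time.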
- Published
- 2023
33. Predicting oscillations in relay feedback systems, using fixed points of Poincar\'e maps, and Hopf bifurcations
- Abstract
The relay autotuning method identifies plant parameters, from oscillations of the plant under relay feedback. To predict the presence and nature of such oscillations, we apply the following two approaches: (a) analysis of the switching dynamics, while using an ideal relay, and (b) bifurcation analysis, while using a smooth approximation of the relay. For stable plants with positive DC gains, our analyses predict that: (i) a periodic orbit is guaranteed, for a class of non-minimum phase plants of relative degree one, whose step response starts with an inverse response, and (ii) for a wider class of plants, whose root locus diagrams cross the imaginary axis at complex conjugate values, limit cycles are merely suggested., Comment: submitted to the IEEE transactions on Automatic Control
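A rough simulation of relay feedback on a first-order plant with input delay (the plant parameters below are illustrative, not tied to the paper's plant classes) shows the sustained oscillation that relay autotuning exploits:

```python
import numpy as np

def relay_feedback(K=1.0, T=1.0, L=0.5, h=1.0, dt=0.01, t_end=40.0):
    """Ideal relay u = -h*sign(y) in feedback with T*y' = -y + K*u(t - L)."""
    n = int(t_end / dt)
    d = int(L / dt)                            # delay in samples
    u = np.zeros(n)
    y = np.zeros(n)
    y[0] = 0.1                                 # small perturbation to start switching
    for k in range(1, n):
        u[k] = -h if y[k - 1] > 0 else h       # ideal relay element
        u_delayed = u[k - d] if k >= d else 0.0
        y[k] = y[k - 1] + dt * (-y[k - 1] + K * u_delayed) / T
    return np.arange(n) * dt, y
```

The delay prevents the output from settling at zero, so the loop converges to a limit cycle; autotuning reads the oscillation's amplitude and period off such a trajectory to identify plant parameters.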
- Published
- 2023
34. On the Split Closure of the Periodic Timetabling Polytope
- Abstract
The Periodic Event Scheduling Problem (PESP) is the central mathematical tool for periodic timetable optimization in public transport. PESP can be formulated in several ways as a mixed-integer linear program with typically general integer variables. We investigate the split closure of these formulations and show that split inequalities are identical with the recently introduced flip inequalities. While split inequalities are a general mixed-integer programming technique, flip inequalities are defined in purely combinatorial terms, namely cycles and arc sets of the digraph underlying the PESP instance. It is known that flip inequalities can be separated in pseudo-polynomial time. We prove that this is best possible unless P $=$ NP, but also observe that the complexity becomes linear-time if the cycle defining the flip inequality is fixed. Moreover, introducing mixed-integer-compatible maps, we compare the split closures of different formulations, and show that reformulation or binarization by subdivision do not lead to stronger split closures. Finally, we estimate computationally how much of the optimality gap of the instances of the benchmark library PESPlib can be closed exclusively by split cuts, and provide better dual bounds for five instances.
- Published
- 2023
35. Probabilistic Region-of-Attraction Estimation with Scenario Optimization and Converse Theorems
- Abstract
The region of attraction characterizes well-behaved and safe operation of a nonlinear system and is hence sought after for verification. In this paper, a framework for probabilistic region of attraction estimation is developed that combines scenario optimization and converse theorems. With this approach, the probability of an unstable condition being included in the estimate is independent of the system's complexity, while convergence in probability to the true region of attraction is proven. Numerical examples demonstrate the effectiveness for optimization-based control applications. Combining systems theory and sampling, the complexity of Monte Carlo-based verification techniques can be reduced. The results can be extended to arbitrary level sets of which the defining function can be sampled, such as finite-horizon viability. Thus, the proposed approach is applicable and/or adaptable to verification of a wide range of safety-related properties for nonlinear systems including feedback laws based on optimization or learning., Comment: Submitted to IEEE Transactions of Automatic Control
- Published
- 2023
36. Onsite Job Scheduling by Adaptive Genetic Algorithm
- Abstract
Onsite Job Scheduling is a specialized variant of the Vehicle Routing Problem (VRP) with multiple depots. The objective of this problem is to execute jobs requested by customers belonging to different geographic locations using a limited number of technicians, with minimum travel and overtime for the technicians. Each job is expected to be completed within a specified time limit according to the service level agreement (SLA) with customers. Each technician is assumed to start from a base location, serve several customers, and return to the starting place. Technicians are allotted jobs based on their skill sets, expertise levels in each skill, and availability slots. Although there is a considerable body of literature on the VRP, we are not aware of any explicit work on Onsite Job Scheduling. In this paper we propose an Adaptive Genetic Algorithm to solve the scheduling problem. We found an optimized travel route for a substantial number of jobs and technicians, minimizing travel distance and overtime duration while meeting the SLA constraints.
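A minimal evolutionary loop (truncation selection, Gaussian mutation, elitism) on a toy one-dimensional objective; this only sketches the generic genetic-algorithm machinery, not the paper's job-scheduling encoding or its adaptive operators:

```python
import numpy as np

def genetic_minimize(f, bounds=(-10.0, 10.0), pop=40, gens=120, sigma=0.3, seed=0):
    """Toy GA: truncation selection + Gaussian mutation + elitism."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, pop)                   # random initial population
    for _ in range(gens):
        fit = np.array([f(v) for v in x])
        parents = x[np.argsort(fit)[: pop // 2]]   # keep the better half
        children = rng.choice(parents, pop - 1) + rng.normal(0, sigma, pop - 1)
        # Elitism: always carry the current best individual forward unchanged
        x = np.concatenate(([parents[0]], np.clip(children, lo, hi)))
    return x[np.argmin([f(v) for v in x])]
```

A scheduling GA would replace the real-valued genome with a permutation/assignment encoding and add crossover and repair operators, but the select-mutate-elite loop is the same.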
- Published
- 2023
37. Prescriptive PCA: Dimensionality Reduction for Two-stage Stochastic Optimization
- Abstract
In this paper, we consider the alignment between an upstream dimensionality reduction task of learning a low-dimensional representation of a set of high-dimensional data and a downstream optimization task of solving a stochastic program parameterized by said representation. In this case, standard dimensionality reduction methods (e.g., principal component analysis) may not perform well, as they aim to maximize the amount of information retained in the representation and do not generally reflect the importance of such information in the downstream optimization problem. To address this problem, we develop a prescriptive dimensionality reduction framework that aims to minimize the degree of suboptimality in the optimization phase. For the case where the downstream stochastic optimization problem has an expected value objective, we show that prescriptive dimensionality reduction can be performed via solving a distributionally-robust optimization problem, which admits a semidefinite programming relaxation. Computational experiments based on a warehouse transshipment problem and a vehicle repositioning problem show that our approach significantly outperforms principal component analysis with real and synthetic data sets.
- Published
- 2023
38. Frequency Regulation with Storage: On Losses and Profits
- Abstract
Low-carbon societies will need to store vast amounts of electricity to balance intermittent generation from wind and solar energy, for example, through frequency regulation. Here, we derive an analytical solution to the decision-making problem of storage operators who sell frequency regulation power to grid operators and trade electricity on day-ahead markets. Mathematically, we treat future frequency deviation trajectories as functional uncertainties in a receding horizon robust optimization problem. We constrain the expected terminal state-of-charge to be equal to some target to allow storage operators to make good decisions not only for the present but also the future. Thanks to this constraint, the amount of electricity traded on day-ahead markets is an implicit function of the regulation power sold to grid operators. The implicit function quantifies the amount of power that needs to be purchased to cover the expected energy loss that results from providing frequency regulation. We show how the marginal cost associated with the expected energy loss decreases with roundtrip efficiency and increases with frequency deviation dispersion. We find that the profits from frequency regulation over the lifetime of energy-constrained storage devices are roughly inversely proportional to the length of time for which regulation power must be committed.
- Published
- 2023
39. LibAUC: A Deep Learning Library for X-Risk Optimization
- Abstract
This paper introduces the award-winning deep learning (DL) library called LibAUC for implementing state-of-the-art algorithms towards optimizing a family of risk functions named X-risks. X-risks refer to a family of compositional functions in which the loss function of each data point is defined in a way that contrasts the data point with a large number of others. They have broad applications in AI for solving classical and emerging problems, including but not limited to classification for imbalanced data (CID), learning to rank (LTR), and contrastive learning of representations (CLR). The motivation of developing LibAUC is to address the convergence issues of existing libraries for solving these problems. In particular, existing libraries may not converge or require very large mini-batch sizes in order to attain good performance for these problems, due to the usage of the standard mini-batch technique in the empirical risk minimization (ERM) framework. Our library is for deep X-risk optimization (DXO) that has achieved great success in solving a variety of tasks for CID, LTR and CLR. The contributions of this paper include: (1) It introduces a new mini-batch based pipeline for implementing DXO algorithms, which differs from existing DL pipelines in the design of controlled data samplers and dynamic mini-batch losses; (2) It provides extensive benchmarking experiments for ablation studies and comparison with existing libraries. The LibAUC library features scalable performance for millions of items to be contrasted, faster and better convergence than existing libraries for optimizing X-risks, seamless PyTorch deployment and versatile APIs for various loss optimization. Our library is available to the open source community at https://github.com/Optimization-AI/LibAUC, to facilitate further academic research and industrial applications., Comment: Accepted by KDD2023
- Published
- 2023
40. Stochastic Natural Thresholding Algorithms
- Abstract
Sparse signal recovery is one of the most fundamental problems in various applications, including medical imaging and remote sensing. Many greedy algorithms based on the family of hard thresholding operators have been developed to solve the sparse signal recovery problem. More recently, Natural Thresholding (NT) has been proposed with improved computational efficiency. This paper proposes and discusses convergence guarantees for stochastic natural thresholding (StoNT) algorithms by extending NT from the deterministic version with linear measurements to the stochastic version with a general objective function. We also conduct various numerical experiments on linear and nonlinear measurements to demonstrate the performance of StoNT.
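For context, the deterministic hard-thresholding family that NT improves on can be sketched as iterative hard thresholding (IHT) for least squares; NT replaces the hard-thresholding operator with a cheaper "natural" one, which is not reproduced here:

```python
import numpy as np

def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x, zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]
    out[idx] = x[idx]
    return out

def iht(A, y, s, iters=500):
    """Iterative hard thresholding for min ||A x - y||_2 s.t. x is s-sparse."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L for the least-squares gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = hard_threshold(x + step * A.T @ (y - A @ x), s)
    return x
```

The stochastic versions analyzed in the paper replace the full gradient `A.T @ (y - A @ x)` with a stochastic gradient of a general objective.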
- Published
- 2023
41. Nonlinear Distributionally Robust Optimization
- Abstract
This article focuses on a class of distributionally robust optimization (DRO) problems where, unlike the growing body of the literature, the objective function is potentially non-linear in the distribution. Existing methods to optimize nonlinear functions in probability space use the Fréchet derivatives, which present both theoretical and computational challenges. Motivated by this, we propose an alternative notion for the derivative and corresponding smoothness based on Gateaux (G)-derivative for generic risk measures. These concepts are explained via three running risk measure examples of variance, entropic risk, and risk on finite support sets. We then propose a G-derivative based Frank-Wolfe (FW) algorithm for generic non-linear optimization problems in probability spaces and establish its convergence under the proposed notion of smoothness in a completely norm-independent manner. We use the set-up of the FW algorithm to devise a methodology to compute a saddle point of the non-linear DRO problem. Finally, for the minimum variance portfolio selection problem we analyze the regularity conditions and compute the FW-oracle in various settings, and validate the theoretical results numerically.
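A minimal Frank-Wolfe sketch for the running variance example on a finite support set, minimizing $x^\top \Sigma x$ over the probability simplex, where the linear minimization oracle is simply a vertex:

```python
import numpy as np

def frank_wolfe_simplex(Sigma, iters=5000):
    """Frank-Wolfe for min x^T Sigma x over the probability simplex."""
    n = Sigma.shape[0]
    x = np.full(n, 1.0 / n)
    for k in range(iters):
        g = 2.0 * Sigma @ x
        i = int(np.argmin(g))       # LMO over the simplex: best vertex e_i
        gamma = 2.0 / (k + 2)       # standard open-loop step size
        x = (1 - gamma) * x         # move toward the vertex e_i
        x[i] += gamma
    return x
```

For a diagonal $\Sigma = \operatorname{diag}(d)$, the minimizer puts weight proportional to $1/d_i$, giving the optimal value $1 / \sum_i d_i^{-1}$; the iterates approach it at the usual $O(1/k)$ rate.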
- Published
- 2023
42. Aiming towards the minimizers: fast convergence of SGD for overparametrized problems
- Abstract
Modern machine learning paradigms, such as deep learning, occur in or close to the interpolation regime, wherein the number of model parameters is much larger than the number of data samples. In this work, we propose a regularity condition within the interpolation regime which endows the stochastic gradient method with the same worst-case iteration complexity as the deterministic gradient method, while using only a single sampled gradient (or a minibatch) in each iteration. In contrast, all existing guarantees require the stochastic gradient method to take small steps, thereby resulting in a much slower linear rate of convergence. Finally, we demonstrate that our condition holds when training sufficiently wide feedforward neural networks with a linear output layer.
- Published
- 2023
43. Curvature and complexity: Better lower bounds for geodesically convex optimization
- Abstract
We study the query complexity of geodesically convex (g-convex) optimization on a manifold. To isolate the effect of that manifold's curvature, we primarily focus on hyperbolic spaces. In a variety of settings (smooth or not; strongly g-convex or not; high- or low-dimensional), known upper bounds worsen with curvature. It is natural to ask whether this is warranted, or an artifact. For many such settings, we propose a first set of lower bounds which indeed confirm that (negative) curvature is detrimental to complexity. To do so, we build on recent lower bounds (Hamilton and Moitra, 2021; Criscitiello and Boumal, 2022) for the particular case of smooth, strongly g-convex optimization. Using a number of techniques, we also secure lower bounds which capture dependence on condition number and optimality gap, which was not previously the case. We suspect these bounds are not optimal. We conjecture optimal ones, and support them with a matching lower bound for a class of algorithms which includes subgradient descent, and a lower bound for a related game. Lastly, to pinpoint the difficulty of proving lower bounds, we study how negative curvature influences (and sometimes obstructs) interpolation with g-convex functions., Comment: 45 pages
- Published
- 2023
44. Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking
- Abstract
The backtracking line-search is an effective technique to automatically tune the step-size in smooth optimization. It guarantees similar performance to using the theoretically optimal step-size. Many approaches have been developed to instead tune per-coordinate step-sizes, also known as diagonal preconditioners, but none of the existing methods are provably competitive with the optimal per-coordinate step-sizes. We propose multidimensional backtracking, an extension of the backtracking line-search to find good diagonal preconditioners for smooth convex problems. Our key insight is that the gradient with respect to the step-sizes, also known as hypergradients, yields separating hyperplanes that let us search for good preconditioners using cutting-plane methods. As black-box cutting-plane approaches like the ellipsoid method are computationally prohibitive, we develop an efficient algorithm tailored to our setting. Multidimensional backtracking is provably competitive with the best diagonal preconditioner and requires no manual tuning.
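For reference, the scalar backtracking line-search that the paper generalizes to per-coordinate step-sizes looks like this (standard Armijo sufficient-decrease condition; the constants are conventional defaults, not the paper's):

```python
import numpy as np

def backtracking_gd(f, grad, x0, steps=100, alpha0=1.0, beta=0.5, c=1e-4):
    """Gradient descent with Armijo backtracking on the scalar step-size."""
    x = x0.astype(float)
    for _ in range(steps):
        g = grad(x)
        alpha = alpha0
        # Shrink alpha until the sufficient-decrease condition holds.
        while f(x - alpha * g) > f(x) - c * alpha * (g @ g):
            alpha *= beta
        x = x - alpha * g
    return x
```

Multidimensional backtracking replaces the scalar `alpha` with a diagonal matrix of per-coordinate step-sizes and uses hypergradients to cut down the set of candidate preconditioners instead of simply halving.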
- Published
- 2023
45. Formation Control with Unknown Directions and General Coupling Coefficients
- Abstract
Generally, normal displacement-based formation control has a sensing mode that requires the agent not only to have certain knowledge of its direction, but also to gather its local information characterized by nonnegative coupling coefficients. However, the direction may be unknown in the sensing processes, and the coupling coefficients may also involve negative ones due to some circumstances. This paper introduces these phenomena into a class of displacement-based formation control problems. Then, a geometric approach has been employed to overcome the difficulty of analyzing the introduced phenomena. The purpose of this approach is to construct convex polytopes that contain the effects caused by the unknown direction, and to analyze the non-convexity by admitting negative coupling coefficients in a certain range. Under the actions of these phenomena, the constructed polytopes are shown to be invariant in view of the contractive set method. This means that the convergence of the formation shape can be guaranteed. Subsequently, an example is given to examine the applicability of the derived result.
- Published
- 2023
46. Solving NP-hard Min-max Routing Problems as Sequential Generation with Equity Context
- Abstract
Min-max routing problems aim to minimize the maximum tour length among agents as they collaboratively visit all cities, i.e., the completion time. These problems include impactful real-world applications but are known as NP-hard. Existing methods are facing challenges, particularly in large-scale problems that require the coordination of numerous agents to cover thousands of cities. This paper proposes a new deep-learning framework to solve large-scale min-max routing problems. We model the simultaneous decision-making of multiple agents as a sequential generation process, allowing the utilization of scalable deep-learning models for sequential decision-making. In the sequentially approximated problem, we propose a scalable contextual Transformer model, Equity-Transformer, which generates sequential actions considering an equitable workload among other agents. The effectiveness of Equity-Transformer is demonstrated through its superior performance in two representative min-max routing tasks: the min-max multiple traveling salesman problem (min-max mTSP) and the min-max multiple pick-up and delivery problem (min-max mPDP). Notably, our method achieves significant reductions of runtime, approximately 335 times, and cost values of about 53% compared to a competitive heuristic (LKH3) in the case of 100 vehicles with 1,000 cities of mTSP. We provide reproducible source code: https://github.com/kaist-silab/equity-transformer, Comment: 18 pages, 7 figures
- Published
- 2023
47. A Generalized Alternating Method for Bilevel Learning under the Polyak-{\L}ojasiewicz Condition
- Abstract
Bilevel optimization has recently regained interest owing to its applications in emerging machine learning fields such as hyperparameter optimization, meta-learning, and reinforcement learning. Recent results have shown that simple alternating (implicit) gradient-based algorithms can achieve the same convergence rate as single-level gradient descent (GD) for bilevel problems with a strongly convex lower-level objective. However, it remains unclear whether this result can be generalized to bilevel problems beyond this basic setting. In this paper, we propose a Generalized ALternating mEthod for bilevel opTimization (GALET) with a nonconvex lower-level objective that satisfies the Polyak-{\L}ojasiewicz (PL) condition. We first introduce a stationary metric for the considered bilevel problems, which generalizes the existing metric. We then establish that GALET achieves an $\epsilon$-stationary metric for the considered problem within $\tilde{\cal O}(\epsilon^{-1})$ iterations, which matches the iteration complexity of GD for smooth nonconvex problems.
- Published
- 2023
48. When Decentralized Optimization Meets Federated Learning
- Abstract
Federated learning is a new learning paradigm for extracting knowledge from distributed data. Due to its favorable properties in preserving privacy and saving communication costs, it has been extensively studied and widely applied to numerous data analysis applications. However, most existing federated learning approaches concentrate on the centralized setting, which is vulnerable to a single-point failure. An alternative strategy for addressing this issue is the decentralized communication topology. In this article, we systematically investigate the challenges and opportunities when renovating decentralized optimization for federated learning. In particular, we discuss them from the model, data, and communication sides, respectively, which can deepen our understanding of decentralized federated learning., Comment: Accepted to IEEE Network
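A minimal sketch of the communication primitive underlying decentralized topologies: gossip averaging with a doubly stochastic mixing matrix, shown here on a ring (the topology and weights are illustrative):

```python
import numpy as np

def ring_mixing(n):
    """Lazy random-walk mixing matrix for a ring of n agents (doubly stochastic)."""
    W = np.eye(n) / 2
    for i in range(n):
        W[i, (i - 1) % n] += 0.25
        W[i, (i + 1) % n] += 0.25
    return W

def decentralized_average(x0, W, rounds=300):
    """Repeated neighbor mixing drives all agents to the global average."""
    x = np.array(x0, dtype=float)
    for _ in range(rounds):
        x = W @ x                  # each agent averages with its two neighbors
    return x
```

In decentralized optimization this mixing step replaces the central server's aggregation: each agent interleaves local gradient steps with neighbor averaging, so no single point of failure exists.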
- Published
- 2023
49. Accelerated Quasi-Newton Proximal Extragradient: Faster Rate for Smooth Convex Optimization
- Abstract
In this paper, we propose an accelerated quasi-Newton proximal extragradient (A-QPNE) method for solving unconstrained smooth convex optimization problems. With access only to the gradients of the objective, we prove that our method can achieve a convergence rate of ${O}\bigl(\min\{\frac{1}{k^2}, \frac{\sqrt{d\log k}}{k^{2.5}}\}\bigr)$, where $d$ is the problem dimension and $k$ is the number of iterations. In particular, in the regime where $k = {O}(d)$, our method matches the optimal rate of ${O}(\frac{1}{k^2})$ by Nesterov's accelerated gradient (NAG). Moreover, in the regime where $k = \Omega(d \log d)$, it outperforms NAG and converges at a faster rate of ${O}\bigl(\frac{\sqrt{d\log k}}{k^{2.5}}\bigr)$. To the best of our knowledge, this result is the first to demonstrate a provable gain of a quasi-Newton-type method over NAG in the convex setting. To achieve such results, we build our method on a recent variant of the Monteiro-Svaiter acceleration framework and adopt an online learning perspective to update the Hessian approximation matrices, in which we relate the convergence rate of our method to the dynamic regret of a specific online convex optimization problem in the space of matrices., Comment: 44 pages, 1 figure
- Published
- 2023
50. Does a sparse ReLU network training problem always admit an optimum?
- Abstract
Given a training set, a loss function, and a neural network architecture, it is often taken for granted that optimal network parameters exist, and a common practice is to apply available optimization algorithms to search for them. In this work, we show that the existence of an optimal solution is not always guaranteed, especially in the context of {\em sparse} ReLU neural networks. In particular, we first show that optimization problems involving deep networks with certain sparsity patterns do not always have optimal parameters, and that optimization algorithms may then diverge. Via a new topological relation between sparse ReLU neural networks and their linear counterparts, we derive -- using existing tools from real algebraic geometry -- an algorithm to verify that a given sparsity pattern suffers from this issue. Then, the existence of a global optimum is proved for every concrete optimization problem involving a shallow sparse ReLU neural network of output dimension one. Overall, the analysis is based on the investigation of two topological properties of the space of functions implementable as sparse ReLU neural networks: a best approximation property, and a closedness property, both in the uniform norm. This is studied both for (finite) domains corresponding to practical training on finite training sets, and for more general domains such as the unit cube. This allows us to provide conditions for the guaranteed existence of an optimum given a sparsity pattern. The results apply not only to several sparsity patterns proposed in recent works on network pruning/sparsification, but also to classical dense neural networks, including architectures not covered by existing results.
- Published
- 2023