Global Convergence in Learning Fully-Connected ReLU Networks Via Un-rectifying Based on the Augmented Lagrangian Approach
- Author
Tung, Shih-Shuo, Chung, Ming-Yu, Ho, Jinn, and Hwang, Wen-Liang
- Abstract
Most learning algorithms for deep neural networks (DNNs) employ gradient descent or block coordinate descent, in which the nonlinear activation function is applied directly throughout optimization, so that only the input and output of the activation function are optimized. By contrast, the "un-rectifying" technique expresses a nonlinear point-wise (non-smooth) activation function as a data-dependent variable (an activation variable), so that the activation variable, together with its input and output, can all be employed in optimization. Un-rectifying the ReLU network in this study replaces its activation functions with data-dependent activation variables expressed as equations and equality constraints. The discrete nature of the activation variables associated with un-rectified ReLUs allows deep learning problems to be reformulated as combinatorial optimization problems. We demonstrate, however, that the optimal solution of the combinatorial optimization problem is preserved when the discrete domains of the activation variables are relaxed to closed intervals in the real domain. We also demonstrate that the network can be optimized using the augmented Lagrangian approach, which theoretically yields global convergence in the sense that all limit points are critical points of the learning problem. In experiments and comparisons with other methods on classification and compressed-sensing recovery problems, the proposed approach learns a network to a given accuracy within a smaller number of epochs.
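As an illustration of the un-rectifying idea summarized above, the following is a minimal sketch, not the authors' implementation: it rewrites a single ReLU as an element-wise product with a data-dependent activation variable d, whose discrete domain {0, 1} can be relaxed to [0, 1], and shows an augmented Lagrangian penalty enforcing the resulting equality constraint. The function names and the multiplier/penalty parameters (lam, rho) are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def unrectify(z):
    """Activation variable d such that ReLU(z) == d * z (d in {0, 1})."""
    return (z > 0).astype(z.dtype)

def augmented_lagrangian_penalty(a, d, z, lam, rho):
    """Penalty <lam, a - d*z> + (rho/2)*||a - d*z||^2 for the constraint a = d*z.

    lam (multiplier) and rho (penalty weight) are hypothetical names,
    used only to sketch the augmented Lagrangian mechanism.
    """
    r = a - d * z                       # residual of the equality constraint
    return np.sum(lam * r) + 0.5 * rho * np.sum(r ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal(5)          # pre-activation of one layer
    d = unrectify(z)                    # discrete activation variable in {0, 1}
    a = d * z                           # layer output; equals np.maximum(z, 0)
    assert np.allclose(a, np.maximum(z, 0))

    # Relaxing d to the closed interval [0, 1] keeps a = d * z well defined;
    # at any feasible point the penalty vanishes.
    lam = np.zeros_like(z)
    print(augmented_lagrangian_penalty(a, d, z, lam, rho=1.0))  # prints 0.0
```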
- Published
2024