The Gauss-Newton matrix for Deep Learning models and its applications
- Author
Botev, Aleksandar
- Subjects
006.3
- Abstract
Deep Learning has recently become one of the most widely used techniques in the field of Machine Learning. Optimising these models, however, is difficult, and in order to scale training to large datasets and model sizes, practitioners rely on first-order optimisation methods. A major challenge in using more sophisticated second-order optimisation methods is that the curvature matrices of neural network loss surfaces are usually intractable, which remains an open avenue for research. In this work, we investigate the Gauss-Newton matrix for neural networks and its applications in different areas of Machine Learning. Firstly, we analyse the structure of the Hessian and Gauss-Newton matrices for feed-forward neural networks. Several insightful results are presented, and the relationships between these two matrices and the Fisher matrix are discussed. Based on this analysis, we develop a block-diagonal Kronecker-factored approximation to the Gauss-Newton matrix. The method is experimentally validated in the context of second-order optimisation, where it achieves performance competitive with other approaches on three datasets. In the last part of this work, we investigate the application of the proposed method to constructing an approximation to the posterior distribution over the parameters of a neural network. The approximation is constructed by adapting the well-known Laplace approximation using the Kronecker-factored Gauss-Newton approximation. The method is compared against Dropout, a commonly used technique for uncertainty estimation, and achieves better uncertainty estimates on out-of-distribution data while being less susceptible to adversarial attacks. By combining the Laplace approximation with the Bayesian framework for online learning, we develop a scalable method for overcoming catastrophic forgetting. It achieves significantly better results than other approaches in the literature on several sequential learning tasks. The final chapter discusses potential future research directions that could be of interest to the curious reader.
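As an illustrative sketch of the central quantities mentioned in the abstract (the notation below is chosen for exposition and is not taken verbatim from the thesis): for a network f(x; \theta) with loss L(y, f), the Gauss-Newton matrix is

G = \mathbb{E}_{(x,y)}\big[ J_\theta^{\top} \, H_L \, J_\theta \big], \qquad J_\theta = \frac{\partial f(x;\theta)}{\partial \theta}, \qquad H_L = \frac{\partial^2 L(y,f)}{\partial f \, \partial f^{\top}},

and a block-diagonal Kronecker-factored approximation represents the block for a layer \lambda with input activations a_{\lambda-1} roughly as

G_\lambda \approx \mathbb{E}\big[ a_{\lambda-1} a_{\lambda-1}^{\top} \big] \otimes \mathcal{G}_\lambda,

where \mathcal{G}_\lambda denotes a Gauss-Newton matrix taken with respect to the layer's pre-activations (the exact factorisation and ordering depend on the vectorisation convention). A Laplace approximation of the kind described above would then use this curvature around a trained parameter setting \theta^{*}, approximately

p(\theta \mid \mathcal{D}) \approx \mathcal{N}\big( \theta^{*}, \, (N \bar{G}(\theta^{*}) + \tau I)^{-1} \big),

with N training points, prior precision \tau, and \bar{G} the (approximate) Gauss-Newton matrix averaged over the data.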
- Published
2020