Annealed gradient descent for deep learning
- Authors
- Hui Jiang, Hengyue Pan, Xin Niu, Rongchun Li, and Yong Dou
- Subjects
- Computer science, Cognitive Neuroscience, Artificial Intelligence, Deep learning, Pattern recognition, Convolutional neural networks, Gradient descent, Computer Science Applications
- Abstract
In this paper, we propose a novel annealed gradient descent (AGD) algorithm for deep learning. AGD optimizes a sequence of gradually improving, smoother mosaic functions that approximate the original non-convex objective function according to an annealing schedule during the optimization process. We present a theoretical analysis of AGD's convergence properties and learning speed, and use visualization methods to illustrate its advantages. The proposed AGD algorithm is applied to learn both deep neural networks (DNNs) and convolutional neural networks (CNNs) for a variety of tasks, including image recognition and speech recognition. Experimental results on several widely used databases, such as Switchboard, CIFAR-10, and Pascal VOC 2012, show that AGD yields better classification accuracy than SGD and significantly accelerates the training of DNNs and CNNs.
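For intuition about the annealing idea described in the abstract, the sketch below shows the general pattern of optimizing a sequence of smoothed surrogates of a non-convex objective under a shrinking smoothing schedule. It is not the paper's algorithm: the paper constructs its surrogates as "mosaic functions", whereas this sketch substitutes generic Gaussian smoothing, and the function `agd_minimize` and its parameters (`sigmas`, `steps_per_stage`, `n_samples`) are illustrative assumptions.

```python
import numpy as np

def agd_minimize(grad_f, x0, sigmas=(1.0, 0.3, 0.1, 0.0),
                 steps_per_stage=200, lr=0.05, n_samples=8, seed=0):
    """Anneal through smoothed surrogates of a non-convex objective f,
    here f_sigma(x) = E[f(x + sigma * eps)] with Gaussian noise eps;
    sigma shrinks toward 0 so the surrogate approaches the original f."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for sigma in sigmas:                      # annealing schedule
        for _ in range(steps_per_stage):
            if sigma > 0.0:
                # Monte-Carlo estimate of the smoothed surrogate's gradient
                eps = rng.standard_normal((n_samples, x.size))
                g = np.mean([grad_f(x + sigma * e) for e in eps], axis=0)
            else:
                g = grad_f(x)                 # final stage: plain gradient descent
            x = x - lr * g
    return x

# Toy usage: a 1-D non-convex function with many local minima,
# f(x) = x^2 + 2 sin(5x), whose gradient is 2x + 10 cos(5x).
grad_f = lambda x: 2.0 * x + 10.0 * np.cos(5.0 * x)
print(agd_minimize(grad_f, x0=np.array([3.0])))
```

With a large initial sigma the surrogate washes out the shallow local minima, and each annealing stage starts from the previous stage's solution, which is the coarse-to-fine behavior the abstract attributes to AGD.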
- Published
- 2020