On Centralization and Unitization of Batch Normalization for Deep ReLU Neural Networks
- Source :
- IEEE Transactions on Signal Processing; 2024, Vol. 72, Issue 1, p. 2827-2841, 15p
- Publication Year :
- 2024
Abstract
- Batch normalization (BN) enhances the training of deep ReLU neural networks with a composition of mean centering (centralization) and variance scaling (unitization). Despite the success of BN, a theoretical explanation is still lacking that elaborates the effects of BN on training dynamics and guides the design of normalization methods. In this paper, we elucidate the effects of centralization and unitization in BN on training deep ReLU neural networks. We first reveal that feature centralization in BN stabilizes the correlation coefficients of features in unnormalized ReLU neural networks, achieving feature decorrelation and accelerating convergence in training. We demonstrate that weight centralization, which subtracts means from weight parameters, is equivalent to BN in feature decorrelation and achieves the same linear convergence rate in training. Subsequently, we show that feature unitization in BN enables a dynamic learning rate that varies inversely with the norm of features during training, and we propose an adaptive loss function to emulate feature unitization. Furthermore, we apply these theoretical results to develop an efficient alternative to BN using a simple combination of weight centralization and the proposed adaptive loss function. Extensive experiments show that the proposed method achieves comparable classification accuracy and markedly reduces memory consumption in comparison to BN, and outperforms normalization-free methods in image classification. We further extend weight centralization to enable small-batch training for object detection networks.
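- The two BN components named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the axis conventions (batch axis 0 for features; fan-in axis 1 for a weight matrix of shape `(out, in)`) and the function names are assumptions for illustration only.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """BN as a composition of the two operations the abstract names:
    centralization (mean subtraction) then unitization (variance scaling).
    x has shape (batch, features); statistics are taken over the batch axis.
    """
    centered = x - x.mean(axis=0, keepdims=True)              # centralization
    return centered / np.sqrt(centered.var(axis=0, keepdims=True) + eps)  # unitization

def centralize_weights(w):
    """Weight centralization: subtract the mean from each neuron's incoming
    weights (assumed shape (out, in), mean over the fan-in axis). The abstract
    states this is equivalent to BN in its feature-decorrelation effect.
    """
    return w - w.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4))
y = batch_norm(x)
print(np.allclose(y.mean(axis=0), 0.0, atol=1e-7))  # zero mean per feature
```

- The proposed BN alternative combines `centralize_weights` with an adaptive loss that scales inversely with the feature norm, avoiding the batch statistics (and their memory cost) that `batch_norm` requires.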
Details
- Language :
- English
- ISSN :
- 1053-587X
- Volume :
- 72
- Issue :
- 1
- Database :
- Supplemental Index
- Journal :
- IEEE Transactions on Signal Processing
- Publication Type :
- Periodical
- Accession number :
- ejs66894509
- Full Text :
- https://doi.org/10.1109/TSP.2024.3410291