9 results for "Chmiel, Brian"
Search Results
2. Loss aware post-training quantization
- Author
- Nahshan, Yury, Chmiel, Brian, Baskin, Chaim, Zheltonozhskii, Evgenii, Banner, Ron, Bronstein, Alex M., and Mendelson, Avi
- Published
- 2021
- Full Text
- View/download PDF
3. Optimal Fine-Grained N:M sparsity for Activations and Neural Gradients
- Author
- Chmiel, Brian, Hubara, Itay, Banner, Ron, and Soudry, Daniel
- Subjects
- FOS: Computer and information sciences, Computer Science - Artificial Intelligence (cs.AI), Computer Science - Machine Learning (cs.LG)
- Abstract
In deep learning, fine-grained N:M sparsity reduces the data footprint and bandwidth of a general matrix multiply (GEMM) by 2x and doubles throughput by skipping the computation of zero values. So far it has only been used to prune weights. We examine how the method can also be applied to activations and their gradients (i.e., "neural gradients"). To this end, we first establish a tensor-level optimality criterion. Previous works aimed to minimize the mean-square error (MSE) of each pruned block. We show that while MSE minimization works well for pruning activations, it fails catastrophically for neural gradients. Instead, optimal pruning of the neural gradients requires an unbiased minimum-variance pruning mask. We design such specialized masks and find that in most cases 1:2 sparsity is sufficient for training, and that 2:4 sparsity is usually enough when it is not. We further suggest combining several such methods to potentially speed up training even more. A reference implementation is available at https://github.com/brianchmiel/Act-and-Grad-structured-sparsity.
- Published
- 2022
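The contrast the abstract draws between MSE-optimal pruning of activations and unbiased minimum-variance pruning of neural gradients can be illustrated with a short sketch. This is only an illustration under assumed conventions (a greedy 2:4 magnitude mask and a magnitude-proportional stochastic 1:2 mask); the function names and the specific unbiased scheme are assumptions, not the paper's reference implementation.

    import numpy as np

    def prune_2of4_mse(x):
        """Greedy N:M pruning for activations: in each block of 4, keep the
        2 largest-magnitude entries, which minimizes the per-block MSE.
        Assumes x.size is divisible by 4."""
        blocks = x.reshape(-1, 4)
        order = np.argsort(np.abs(blocks), axis=1)            # ascending by magnitude
        mask = np.ones_like(blocks)
        np.put_along_axis(mask, order[:, :2], 0.0, axis=1)    # zero the 2 smallest
        return (blocks * mask).reshape(x.shape)

    def prune_1of2_unbiased(g, rng=None):
        """Stochastic 1:2 pruning for neural gradients: in each pair keep one
        entry with probability proportional to its magnitude and rescale it by
        1/p, so the pruned tensor equals g in expectation (unbiased)."""
        rng = np.random.default_rng() if rng is None else rng
        pairs = g.reshape(-1, 2)
        mag = np.abs(pairs) + 1e-12
        p0 = mag[:, 0] / mag.sum(axis=1)                      # P(keep first entry)
        keep0 = rng.random(len(pairs)) < p0
        out = np.zeros_like(pairs)
        out[keep0, 0] = pairs[keep0, 0] / p0[keep0]
        out[~keep0, 1] = pairs[~keep0, 1] / (1.0 - p0[~keep0])
        return out.reshape(g.shape)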
4. Logarithmic Unbiased Quantization: Simple 4-bit Training in Deep Learning
- Author
- Chmiel, Brian, Banner, Ron, Hoffer, Elad, Yaacov, Hilla Ben, and Soudry, Daniel
- Subjects
- FOS: Computer and information sciences, Computer Science - Machine Learning (cs.LG)
- Abstract
Quantization of the weights and activations is one of the main methods for reducing the computational footprint of deep neural network (DNN) training. Current methods enable 4-bit quantization of the forward phase. However, this constitutes only a third of the training process. Reducing the computational footprint of the entire training process requires quantization of the neural gradients, i.e., the loss gradients with respect to the outputs of intermediate neural layers. In this work, we examine the importance of having unbiased quantization in quantized neural network training, where to maintain it, and how. Based on this, we suggest a logarithmic unbiased quantization (LUQ) method to quantize both the forward and backward phases to 4 bits, achieving state-of-the-art results in 4-bit training without overhead. For example, for ResNet-50 on ImageNet we achieve a degradation of 1.1%. We further improve this to a degradation of only 0.32% after three epochs of high-precision fine-tuning combined with a variance-reduction method, where both methods add overhead comparable to previously suggested methods.
- Published
- 2021
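The idea the abstract describes, unbiased quantization of the backward phase onto a logarithmic (power-of-two) grid, can be sketched as stochastic rounding in the log domain. This is a minimal sketch, not the paper's LUQ implementation: the function name is hypothetical, and the shared scale, underflow handling, and 4-bit packing a real format needs are omitted.

    import torch

    def log_quant_unbiased(x):
        """Stochastically round |x| to the nearest powers of two so that the
        quantized tensor equals x in expectation (unbiased)."""
        sign = torch.sign(x)
        mag = x.abs().clamp_min(1e-30)
        lo = torch.exp2(torch.floor(torch.log2(mag)))   # lower power-of-two level
        p_up = mag / lo - 1.0                           # this P(round up) keeps E[q] = mag
        q = torch.where(torch.rand_like(mag) < p_up, 2.0 * lo, lo)
        return sign * q * (x != 0).to(x.dtype)          # keep exact zeros at zero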
5. Bimodal-Distributed Binarized Neural Networks.
- Author
- Rozen, Tal, Kimhi, Moshe, Chmiel, Brian, Mendelson, Avi, and Baskin, Chaim
- Subjects
- ARTIFICIAL neural networks, CONVOLUTIONAL neural networks, KURTOSIS
- Abstract
Binary neural networks (BNNs) are an extremely promising method for significantly reducing the complexity and power consumption of deep neural networks. Binarization techniques, however, suffer from non-negligible performance degradation compared with their full-precision counterparts. Prior work has mainly focused on strategies for approximating the sign function during the forward and backward phases to reduce the quantization error of the binarization process. In this work, we propose a bimodal-distributed binarization method (BD-BNN). The technique imposes a bimodal distribution on the network weights through kurtosis regularization. The method includes a teacher-trainer training scheme, termed weight distribution mimicking (WDM), which efficiently imitates the full-precision network's weight distribution in its binary counterpart. Preserving this distribution during binarization-aware training yields robust and informative binary feature maps and thus significantly reduces the generalization error of the BNN. Extensive evaluations on CIFAR-10 and ImageNet demonstrate that BD-BNN outperforms current state-of-the-art schemes.
- Published
- 2022
- Full Text
- View/download PDF
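A kurtosis-regularization term of the kind mentioned in the abstract could look roughly as follows; the target kurtosis and penalty weight are illustrative assumptions, not BD-BNN's values, and the teacher-trainer (WDM) scheme is not shown.

    import torch

    def kurtosis(w, eps=1e-8):
        """Sample kurtosis of a weight tensor: mean of ((w - mu) / sigma)^4."""
        z = (w - w.mean()) / (w.std() + eps)
        return (z ** 4).mean()

    def kurtosis_penalty(model, target=1.8, weight=1e-2):
        """Push each conv/linear weight tensor toward a low kurtosis, which
        favours a two-peaked (bimodal) weight distribution before binarization."""
        reg = 0.0
        for m in model.modules():
            if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear)):
                reg = reg + (kurtosis(m.weight) - target) ** 2
        return weight * reg

    # Hypothetical training step:
    #   loss = criterion(model(x), y) + kurtosis_penalty(model)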
6. Neural gradients are near-lognormal: improved quantized and sparse training
- Author
- Chmiel, Brian, Ben-Uri, Liad, Shkolnik, Moran, Hoffer, Elad, Banner, Ron, and Soudry, Daniel
- Subjects
- FOS: Computer and information sciences, Computer Science - Machine Learning (cs.LG), Computer Science - Computer Vision and Pattern Recognition (cs.CV)
- Abstract
While training can mostly be accelerated by reducing the time needed to propagate neural gradients back through the model, most previous works focus on the quantization or pruning of weights and activations. These methods are often not applicable to neural gradients, which have very different statistical properties. In contrast to weights and activations, we find that the distribution of neural gradients is approximately lognormal. Based on this observation, we suggest two closed-form analytical methods to reduce the computational and memory burdens of neural gradients. The first method optimizes the floating-point format and scale of the gradients. The second method accurately sets sparsity thresholds for gradient pruning. Each method achieves state-of-the-art results on ImageNet. To the best of our knowledge, this paper is the first to (1) quantize the gradients to 6-bit floating-point formats or (2) achieve up to 85% gradient sparsity, in each case without accuracy degradation. A reference implementation accompanies the paper.
- Published
- 2020
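If the gradient magnitudes are approximately lognormal, a pruning threshold for a desired sparsity level can be set in closed form from the fitted parameters, which is the gist of the second method in the abstract. The sketch below only illustrates that calculation; the paper's exact threshold rule and its floating-point format selection are not reproduced.

    import numpy as np
    from statistics import NormalDist

    def lognormal_prune(grad, sparsity=0.85, eps=1e-30):
        """Fit a lognormal distribution to |grad| and zero every entry below the
        magnitude threshold the fitted CDF assigns to the requested sparsity."""
        logmag = np.log(np.abs(grad[grad != 0]) + eps)
        mu, sigma = logmag.mean(), logmag.std()
        # If |g| ~ Lognormal(mu, sigma), then P(|g| < t) = Phi((ln t - mu) / sigma),
        # so the threshold for target sparsity s is t = exp(mu + sigma * Phi^{-1}(s)).
        t = np.exp(mu + sigma * NormalDist().inv_cdf(sparsity))
        return np.where(np.abs(grad) >= t, grad, 0.0)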
7. Colored Noise Injection for Training Adversarially Robust Neural Networks
- Author
- Zheltonozhskii, Evgenii, Baskin, Chaim, Nemcovsky, Yaniv, Chmiel, Brian, Mendelson, Avi, and Bronstein, Alex M.
- Subjects
- FOS: Computer and information sciences, Computer Science - Machine Learning (cs.LG), Statistics - Machine Learning (stat.ML), Computer Science - Computer Vision and Pattern Recognition (cs.CV)
- Abstract
Even though deep learning has shown unmatched performance on various tasks, neural networks have been shown to be vulnerable to small adversarial perturbations of the input that lead to significant performance degradation. In this work, we extend the idea of adding white Gaussian noise to the network weights and activations during adversarial training (PNI) to the injection of colored noise for defense against common white-box and black-box attacks. We show that our approach outperforms PNI and various previous approaches in terms of adversarial accuracy on the CIFAR-10 and CIFAR-100 datasets. In addition, we provide an extensive ablation study of the proposed method that justifies the chosen configurations.
- Published
- 2020
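Injecting correlated ("colored") rather than i.i.d. white Gaussian noise into a layer's weights during training can be sketched as follows. The layer class, the learnable lower-triangular mixing matrix, and the noise scale are illustrative assumptions, not the configuration used in the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ColoredNoiseLinear(nn.Linear):
        """Linear layer whose weights are perturbed at training time by Gaussian
        noise correlated across output units (covariance L L^T), instead of the
        i.i.d. white noise used in PNI."""

        def __init__(self, in_features, out_features, noise_scale=0.1):
            super().__init__(in_features, out_features)
            self.noise_scale = noise_scale
            self.L = nn.Parameter(torch.eye(out_features))   # identity: starts as white noise

        def forward(self, x):
            if self.training:
                white = torch.randn_like(self.weight)        # i.i.d. draws
                colored = torch.tril(self.L) @ white         # correlate across output rows
                w = self.weight + self.noise_scale * colored
            else:
                w = self.weight
            return F.linear(x, w, self.bias)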
8. Smoothed Inference for Adversarially-Trained Models
- Author
- Nemcovsky, Yaniv, Zheltonozhskii, Evgenii, Baskin, Chaim, Chmiel, Brian, Fishman, Maxim, Bronstein, Alex M., and Mendelson, Avi
- Subjects
- FOS: Computer and information sciences, Computer Science - Machine Learning (cs.LG), Statistics - Machine Learning (stat.ML), Computer Science - Computer Vision and Pattern Recognition (cs.CV)
- Abstract
Deep neural networks are known to be vulnerable to adversarial attacks. Current defenses against such attacks are based on either implicit or explicit regularization, e.g., adversarial training. Randomized smoothing, the averaging of the classifier outputs over a random distribution centered at the sample, has been shown to guarantee the performance of a classifier subject to bounded perturbations of the input. In this work, we study the application of randomized smoothing as a way to improve performance on unperturbed data as well as to increase robustness to adversarial attacks. The proposed technique can be applied on top of any existing adversarial defense, but works particularly well with randomized approaches. We examine its performance on common white-box (PGD) and black-box (transfer and NAttack) attacks on CIFAR-10 and CIFAR-100, substantially outperforming the previous state of the art in most scenarios and matching it in others. For example, we achieve 60.4% accuracy under a PGD attack on CIFAR-10 using ResNet-20, outperforming the previous state of the art by 11.7%. Since our method is based on sampling, it lends itself well to trading off inference complexity against performance. A reference implementation of the proposed techniques is provided at https://github.com/yanemcovsky/SIAM.
- Published
- 2019
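Randomized smoothing at inference time, as described in the abstract, amounts to averaging the classifier's outputs over noisy copies of the input. The helper below is a rough sketch under assumed defaults (Gaussian noise, softmax averaging); the paper's smoothing variants and attack setup are not reproduced.

    import torch

    @torch.no_grad()
    def smoothed_predict(model, x, sigma=0.25, n_samples=32):
        """Average the model's softmax outputs over Gaussian perturbations of the
        input and return the class with the highest averaged probability."""
        probs = 0.0
        for _ in range(n_samples):
            noisy = x + sigma * torch.randn_like(x)
            probs = probs + torch.softmax(model(noisy), dim=-1)
        return (probs / n_samples).argmax(dim=-1)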
9. CAT: Compression-Aware Training for Bandwidth Reduction.
- Author
- Baskin, Chaim, Chmiel, Brian, Zheltonozhskii, Evgenii, Banner, Ron, Bronstein, Alex M., and Mendelson, Avi
- Subjects
- CONVOLUTIONAL neural networks, BANDWIDTHS, SENTIMENT analysis, DEEP learning, ENTROPY
- Abstract
One major obstacle hindering the ubiquitous use of CNNs for inference is their relatively high memory bandwidth requirement, which can be the primary energy consumer and throughput bottleneck in hardware accelerators. Inspired by quantization-aware training approaches, we propose a compression-aware training (CAT) method that trains the model to allow better compression of weights and feature maps during neural network deployment. Our method trains the model to produce low-entropy feature maps, enabling efficient compression at inference time using classical transform-coding methods. CAT significantly improves on the state-of-the-art quantization results reported for various vision and NLP tasks, such as image classification (ImageNet), object detection (Pascal VOC), sentiment analysis (CoLA), and textual entailment (MNLI). For example, on ResNet-18 we achieve near-baseline ImageNet accuracy with an average representation of only 1.5 bits per value using 5-bit quantization. Moreover, we show that entropy reduction of weights and activations can be applied together, further improving bandwidth reduction. A reference implementation is available.
- Published
- 2021
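Training for low-entropy feature maps can be sketched as adding a differentiable entropy estimate of the activations to the task loss, so that the learned representations compress well with transform coding. The soft-binning estimator and hyper-parameters below are assumptions for illustration, not CAT's actual estimator.

    import torch

    def soft_entropy(act, n_bins=16, temperature=0.1):
        """Differentiable estimate of the entropy (in bits) of an activation
        tensor, using a soft assignment of values to equally spaced bins."""
        v = act.flatten()
        centers = torch.linspace(v.min().item(), v.max().item(), n_bins, device=v.device)
        logits = -((v[:, None] - centers[None, :]) ** 2) / temperature
        assign = torch.softmax(logits, dim=1)            # soft one-hot per value
        p = assign.mean(dim=0) + 1e-12                   # estimated bin probabilities
        return -(p * torch.log2(p)).sum()

    # Hypothetical use during training:
    #   loss = task_loss + 0.01 * soft_entropy(feature_map)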