Author: "Chmiel, Brian" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Chmiel, Brian"' showing total 23 results

1. EXAQ: Exponent Aware Quantization For LLMs Acceleration

Author: Shkolnik, Moran, Fishman, Maxim, Chmiel, Brian, Ben-Yaacov, Hilla, Banner, Ron, and Levy, Kfir Yehuda
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Performance
Abstract: Quantization has established itself as the primary approach for decreasing the computational and storage expenses associated with Large Language Models (LLMs) inference. The majority of current research emphasizes quantizing weights and activations to enable low-bit general-matrix-multiply (GEMM) operations, with the remaining non-linear operations executed at higher precision. In our study, we discovered that following the application of these techniques, the primary bottleneck in LLMs inference lies in the softmax layer. The softmax operation comprises three phases: exponent calculation, accumulation, and normalization. Our work focuses on optimizing the first two phases. We propose an analytical approach to determine the optimal clipping value for the input to the softmax function, enabling sub-4-bit quantization for LLMs inference. This method accelerates the calculations of both $e^x$ and $\sum(e^x)$ with minimal to no accuracy degradation. For example, in LLaMA1-30B, we achieve baseline performance with 2-bit quantization on the well-known "Physical Interaction: Question Answering" (PIQA) dataset evaluation. This ultra-low bit quantization allows, for the first time, an acceleration of approximately 4x in the accumulation phase. The combination of accelerating both $e^x$ and $\sum(e^x)$ results in a 36.9% acceleration in the softmax operation.
Published: 2024
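
Note on entry 1: the three softmax phases named in the abstract (exponent calculation, accumulation, normalization) can be sketched in a few lines. This is only an illustration, not the paper's method: the function name `softmax_three_phase`, the max-shift step, and the `clip` parameter are assumptions made here, and the clipping value is passed in by hand rather than derived analytically as the abstract describes.

```python
import math

def softmax_three_phase(xs, clip=None):
    """Softmax split into the three phases named in the abstract.

    `clip` is a hypothetical lower bound (a negative number) applied to the
    shifted inputs; the paper derives its value analytically, here it is
    simply supplied by the caller.
    """
    # Shift by the maximum so every input is <= 0 (an assumption of this sketch).
    m = max(xs)
    shifted = [x - m for x in xs]
    if clip is not None:
        shifted = [max(s, clip) for s in shifted]
    exps = [math.exp(s) for s in shifted]   # phase 1: exponent calculation
    total = sum(exps)                        # phase 2: accumulation
    return [e / total for e in exps]         # phase 3: normalization

probs = softmax_three_phase([2.0, 1.0, -50.0], clip=-4.0)
```

With a clip in place, the shifted inputs lie in a bounded range, which is what would make a low-bit lookup of $e^x$ conceivable; the sketch itself still computes in full precision.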

2. Scaling FP8 training to trillion-token LLMs

Author: Fishman, Maxim, Chmiel, Brian, Banner, Ron, and Soudry, Daniel
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: We train, for the first time, large language models using FP8 precision on datasets up to 2 trillion tokens -- a 20-fold increase over previous limits. Through these extended training runs, we uncover critical instabilities in FP8 training that were not observable in earlier works with shorter durations. We trace these instabilities to outlier amplification by the SwiGLU activation function. Interestingly, we show, both analytically and empirically, that this amplification happens only over prolonged training periods, and link it to a SwiGLU weight alignment process. To address this newly identified issue, we introduce Smooth-SwiGLU, a novel modification that ensures stable FP8 training without altering function behavior. We also demonstrate, for the first time, FP8 quantization of both Adam optimizer moments. Combining these innovations, we successfully train a 7B parameter model using FP8 precision on 256 Intel Gaudi2 accelerators, achieving on-par results with the BF16 baseline while delivering up to a $\sim 34 \%$ throughput improvement.
Published: 2024

3. Bimodal Distributed Binarized Neural Networks

Author: Rozen, Tal, Kimhi, Moshe, Chmiel, Brian, Mendelson, Avi, and Baskin, Chaim
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Binary Neural Networks (BNNs) are an extremely promising method to reduce deep neural networks' complexity and power consumption massively. Binarization techniques, however, suffer from non-negligible performance degradation compared to their full-precision counterparts. Prior work mainly focused on strategies for sign function approximation during forward and backward phases to reduce the quantization error during the binarization process. In this work, we propose a Bi-Modal Distributed binarization method (BD-BNN) that imposes a bi-modal distribution of the network weights by kurtosis regularization. The proposed method consists of a training scheme that we call Weight Distribution Mimicking (WDM), which efficiently imitates the full-precision network weight distribution to their binary counterpart. Preserving this distribution during binarization-aware training creates robust and informative binary feature maps and significantly reduces the generalization error of the BNN. Extensive evaluations on CIFAR-10 and ImageNet demonstrate the superiority of our method over current state-of-the-art schemes. Our source code, experimental settings, training logs, and binary models are available at https://github.com/BlueAnon/BD-BNN.
Published: 2022

4. Minimum Variance Unbiased N:M Sparsity for the Neural Gradients

Author: Chmiel, Brian, Hubara, Itay, Banner, Ron, and Soudry, Daniel
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: In deep learning, fine-grained N:M sparsity reduces the data footprint and bandwidth of a General Matrix multiply (GEMM) by up to 2x, and doubles throughput by skipping computation of zero values. So far, it was used mainly to prune weights to accelerate the forward and backward phases. We examine how this method can be used also for the neural gradients (i.e., loss gradients with respect to the intermediate neural layer outputs). To this end, we first establish a tensor-level optimality criterion. Previous works aimed to minimize the mean-square-error (MSE) of each pruned block. We show that while minimization of the MSE works fine for pruning the weights and activations, it catastrophically fails for the neural gradients. Instead, we show that accurate pruning of the neural gradients requires an unbiased minimum-variance pruning mask. We design such specialized masks, and find that in most cases, 1:2 sparsity is sufficient for training, and 2:4 sparsity is usually enough when this is not the case. Further, we suggest combining several such methods together in order to potentially speed up training even more.
Published: 2022
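
Note on entry 4: the contrast the abstract draws between MSE-minimizing pruning and unbiased pruning can be illustrated as below. This is a sketch under stated assumptions, not the paper's implementation: both function names are hypothetical, "n:m" is read here as "keep n values in each block of m" (for 1:2 and 2:4 this coincides with counting zeros), and the 1:2 rule shown is one unbiased construction whose match to the paper's mask is not confirmed by the abstract.

```python
import random

def magnitude_mask(values, n, m):
    """Keep the n largest-magnitude entries in every block of m; zero the rest.
    This is the per-block MSE-minimizing choice."""
    assert len(values) % m == 0
    out = []
    for i in range(0, len(values), m):
        block = values[i:i + m]
        keep = sorted(range(m), key=lambda j: abs(block[j]), reverse=True)[:n]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(block))
    return out

def prune_1_2_unbiased(a, b, rng):
    """Zero one element of the pair at random and rescale the survivor so that
    the expected output equals (a, b)."""
    total = abs(a) + abs(b)
    if total == 0.0:
        return (0.0, 0.0)
    if rng.random() < abs(a) / total:        # keep a with probability |a| / (|a| + |b|)
        return (total if a > 0 else -total, 0.0)
    return (0.0, total if b > 0 else -total)

rng = random.Random(0)
pruned_weights = magnitude_mask([1.0, -3.0, 2.0, 0.5], 2, 4)
pruned_pair = prune_1_2_unbiased(0.3, -0.1, rng)
```

The magnitude mask always discards the small entries, so its error has a consistent direction; the randomized rule has higher per-sample error but averages back to the original values.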

5. Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats

Author: Chmiel, Brian, Banner, Ron, Hoffer, Elad, Yaacov, Hilla Ben, and Soudry, Daniel
Subjects: Computer Science - Machine Learning
Abstract: Quantization of the weights and activations is one of the main methods to reduce the computational footprint of Deep Neural Networks (DNNs) training. Current methods enable 4-bit quantization of the forward phase. However, this constitutes only a third of the training process. Reducing the computational footprint of the entire training process requires the quantization of the neural gradients, i.e., the loss gradients with respect to the outputs of intermediate neural layers. Previous works separately showed that accurate 4-bit quantization of the neural gradients needs to (1) be unbiased and (2) have a log scale. However, no previous work aimed to combine both ideas, as we do in this work. Specifically, we examine the importance of having unbiased quantization in quantized neural network training, where to maintain it, and how to combine it with logarithmic quantization. Based on this, we suggest a logarithmic unbiased quantization (LUQ) method to quantize both the forward and backward phases to 4-bit, achieving state-of-the-art results in 4-bit training without the overhead. For example, in ResNet50 on ImageNet, we achieved a degradation of 1.1%. We further improve this to a degradation of only 0.32% after three epochs of high precision fine-tuning, combined with a variance reduction method -- where both these methods add overhead comparable to previously suggested methods.
Published: 2021
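
Note on entry 5: the two requirements the abstract combines, a logarithmic grid and unbiasedness, can be shown together with stochastic rounding to powers of two. This is a minimal sketch and not the LUQ method itself: the function name `log_round_unbiased` is hypothetical, and the sketch ignores the limited exponent range, underflow handling, and scaling that an actual 4-bit format would need.

```python
import math
import random

def log_round_unbiased(x, rng):
    """Round x to a neighbouring power of two, choosing up or down at random
    so that the expected result equals x."""
    if x == 0.0:
        return 0.0
    mag = abs(x)
    # frexp gives mag = frac * 2**exp with frac in [0.5, 1),
    # so 2**(exp - 1) is the largest power of two <= mag.
    _, exp = math.frexp(mag)
    lo = math.ldexp(1.0, exp - 1)
    p_up = (mag - lo) / lo                  # probability of rounding up to 2 * lo
    rounded = 2.0 * lo if rng.random() < p_up else lo
    return math.copysign(rounded, x)

rng = random.Random(0)
quantized = [log_round_unbiased(g, rng) for g in [0.3, -1.7, 5.0]]
```

For a value between lo and 2·lo, the expected output is lo + p_up·lo = x, which is the unbiasedness property; deterministic round-to-nearest on the same grid would not have it.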

6. Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

Author: Hubara, Itay, Chmiel, Brian, Island, Moshe, Banner, Ron, Naor, Seffi, and Soudry, Daniel
Subjects: Computer Science - Artificial Intelligence
Abstract: Unstructured pruning reduces the memory footprint in deep neural networks (DNNs). Recently, researchers proposed different types of structural pruning intending to reduce also the computation complexity. In this work, we first suggest a new measure called mask-diversity which correlates with the expected accuracy of the different types of structural pruning. We focus on the recently suggested N:M fine-grained block sparsity mask, in which for each block of M weights, we have at least N zeros. While N:M fine-grained block sparsity allows acceleration in actual modern hardware, it can be used only to accelerate the inference phase. In order to allow for similar accelerations in the training phase, we suggest a novel transposable fine-grained sparsity mask, where the same mask can be used for both forward and backward passes. Our transposable mask guarantees that both the weight matrix and its transpose follow the same sparsity pattern; thus, the matrix multiplication required for passing the error backward can also be accelerated. We formulate the problem of finding the optimal transposable-mask as a minimum-cost flow problem. Additionally, to speed up the minimum-cost flow computation, we also introduce a fast linear-time approximation that can be used when the masks dynamically change during training. Our experiments suggest a 2x speed-up in the matrix multiplications with no accuracy degradation over vision and language models. Finally, to solve the problem of switching between different structure constraints, we suggest a method to convert a pre-trained model with unstructured sparsity to an N:M fine-grained block sparsity model with little to no training. A reference implementation can be found at https://github.com/papers-submission/structured_transposable_masks.
Published: 2021

7. Adversarial robustness via noise injection in smoothed models

Author: Nemcovsky, Yaniv, Zheltonozhskii, Evgenii, Baskin, Chaim, Chmiel, Brian, Bronstein, Alex M., and Mendelson, Avi
Published: 2023
Full Text: View/download PDF

8. Neural gradients are near-lognormal: improved quantized and sparse training

Author: Chmiel, Brian, Ben-Uri, Liad, Shkolnik, Moran, Hoffer, Elad, Banner, Ron, and Soudry, Daniel
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: While training can mostly be accelerated by reducing the time needed to propagate neural gradients back throughout the model, most previous works focus on the quantization/pruning of weights and activations. These methods are often not applicable to neural gradients, which have very different statistical properties. Distinguished from weights and activations, we find that the distribution of neural gradients is approximately lognormal. Considering this, we suggest two closed-form analytical methods to reduce the computational and memory burdens of neural gradients. The first method optimizes the floating-point format and scale of the gradients. The second method accurately sets sparsity thresholds for gradient pruning. Each method achieves state-of-the-art results on ImageNet. To the best of our knowledge, this paper is the first to (1) quantize the gradients to 6-bit floating-point formats, or (2) achieve up to 85% gradient sparsity -- in each case without accuracy degradation. Reference implementation accompanies the paper.
Published: 2020

9. Colored Noise Injection for Training Adversarially Robust Neural Networks

Author: Zheltonozhskii, Evgenii, Baskin, Chaim, Nemcovsky, Yaniv, Chmiel, Brian, Mendelson, Avi, and Bronstein, Alex M.
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: Even though deep learning has shown unmatched performance on various tasks, neural networks have been shown to be vulnerable to small adversarial perturbations of the input that lead to significant performance degradation. In this work we extend the idea of adding white Gaussian noise to the network weights and activations during adversarial training (PNI) to the injection of colored noise for defense against common white-box and black-box attacks. We show that our approach outperforms PNI and various previous approaches in terms of adversarial accuracy on CIFAR-10 and CIFAR-100 datasets. In addition, we provide an extensive ablation study of the proposed method justifying the chosen configurations.
Published: 2020

10. Robust Quantization: One Model to Rule Them All

Author: Shkolnik, Moran, Chmiel, Brian, Banner, Ron, Shomron, Gil, Nahshan, Yury, Bronstein, Alex, and Weiser, Uri
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: Neural network quantization methods often involve simulating the quantization process during training, making the trained model highly dependent on the target bit-width and precise way quantization is performed. Robust quantization offers an alternative approach with improved tolerance to different classes of data-types and quantization policies. It opens up new exciting applications where the quantization process is not static and can vary to meet different circumstances and implementations. To address this issue, we propose a method that provides intrinsic robustness to the model against a broad range of quantization processes. Our method is motivated by theoretical arguments and enables us to store a single generic model capable of operating at various bit-widths and quantization policies. We validate our method's effectiveness on different ImageNet models.
Published: 2020

11. Loss Aware Post-training Quantization

Author: Nahshan, Yury, Chmiel, Brian, Baskin, Chaim, Zheltonozhskii, Evgenii, Banner, Ron, Bronstein, Alex M., and Mendelson, Avi
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or above). In this work, we study the effect of quantization on the structure of the loss landscape. Additionally, we show that the structure is flat and separable for mild quantization, enabling straightforward post-training quantization methods to achieve good results. We show that with more aggressive quantization, the loss landscape becomes highly non-separable with steep curvature, making the selection of quantization parameters more challenging. Armed with this understanding, we design a method that quantizes the layer parameters jointly, enabling significant accuracy improvement over current post-training quantization methods. Reference implementation is available at https://github.com/ynahshan/nn-quantization-pytorch/tree/master/lapq
Published: 2019

12. Smoothed Inference for Adversarially-Trained Models

Author: Nemcovsky, Yaniv, Zheltonozhskii, Evgenii, Baskin, Chaim, Chmiel, Brian, Fishman, Maxim, Bronstein, Alex M., and Mendelson, Avi
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: Deep neural networks are known to be vulnerable to adversarial attacks. Current methods of defense from such attacks are based on either implicit or explicit regularization, e.g., adversarial training. Randomized smoothing, the averaging of the classifier outputs over a random distribution centered in the sample, has been shown to guarantee the performance of a classifier subject to bounded perturbations of the input. In this work, we study the application of randomized smoothing as a way to improve performance on unperturbed data as well as to increase robustness to adversarial attacks. The proposed technique can be applied on top of any existing adversarial defense, but works particularly well with the randomized approaches. We examine its performance on common white-box (PGD) and black-box (transfer and NAttack) attacks on CIFAR-10 and CIFAR-100, substantially outperforming previous art for most scenarios and comparable on others. For example, we achieve 60.4% accuracy under a PGD attack on CIFAR-10 using ResNet-20, outperforming previous art by 11.7%. Since our method is based on sampling, it lends itself well for trading-off between the model inference complexity and its performance. A reference implementation of the proposed techniques is provided at https://github.com/yanemcovsky/SIAM
Published: 2019

13. CAT: Compression-Aware Training for bandwidth reduction

Author: Baskin, Chaim, Chmiel, Brian, Zheltonozhskii, Evgenii, Banner, Ron, Bronstein, Alex M., and Mendelson, Avi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving visual processing tasks. One of the major obstacles hindering the ubiquitous use of CNNs for inference is their relatively high memory bandwidth requirements, which can be a main energy consumer and throughput bottleneck in hardware accelerators. Accordingly, an efficient feature map compression method can result in substantial performance gains. Inspired by quantization-aware training approaches, we propose a compression-aware training (CAT) method that involves training the model in a way that allows better compression of feature maps during inference. Our method trains the model to achieve low-entropy feature maps, which enables efficient compression at inference time using classical transform coding methods. CAT significantly improves the state-of-the-art results reported for quantization. For example, on ResNet-34 we achieve 73.1% accuracy (0.2% degradation from the baseline) with an average representation of only 1.79 bits per value. Reference implementation accompanies the paper at https://github.com/CAT-teams/CAT
Published: 2019
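
Note on entry 13: the "bits per value" figure quoted in the abstract is tied to the entropy of the quantized feature maps, since the empirical Shannon entropy is the average code length an ideal entropy coder could approach. The helper below is an illustrative sketch with a hypothetical name (`entropy_bits`); it is not taken from the CAT repository.

```python
import math
from collections import Counter

def entropy_bits(values):
    """Empirical Shannon entropy of a sequence of discrete values, in bits per value."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A feature map dominated by one value needs far fewer bits per value
# than its nominal bit-width suggests.
skewed = [0] * 14 + [1, 2]
uniform = [0, 1, 2, 3] * 4
skewed_bits = entropy_bits(skewed)
uniform_bits = entropy_bits(uniform)
```

Training that pushes activations toward a skewed distribution, as the abstract describes, lowers this quantity even when the quantization bit-width stays the same.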

14. Feature Map Transform Coding for Energy-Efficient CNN Inference

Author: Chmiel, Brian, Baskin, Chaim, Banner, Ron, Zheltonozhskii, Evgenii, Yermolin, Yevgeny, Karbachevsky, Alex, Bronstein, Alex M., and Mendelson, Avi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Convolutional neural networks (CNNs) achieve state-of-the-art accuracy in a variety of tasks in computer vision and beyond. One of the major obstacles hindering the ubiquitous use of CNNs for inference on low-power edge devices is their high computational complexity and memory bandwidth requirements. The latter often dominates the energy footprint on modern hardware. In this paper, we introduce a lossy transform coding approach, inspired by image and video compression, designed to reduce the memory bandwidth due to the storage of intermediate activation calculation results. Our method does not require fine-tuning the network weights and halves the data transfer volumes to the main memory by compressing feature maps, which are highly correlated, with variable length coding. Our method outperforms the previous approach in terms of the number of bits per value with minor accuracy degradation on ResNet-34 and MobileNetV2. We analyze the performance of our approach on a variety of CNN architectures and demonstrate that an FPGA implementation of ResNet-18 with our approach results in a reduction of around 40% in the memory energy footprint, compared to a quantized network, with negligible impact on accuracy. When allowing accuracy degradation of up to 2%, a reduction of 60% is achieved. A reference implementation is available at https://github.com/CompressTeam/TransformCodingInference
Published: 2019
Full Text: View/download PDF

15. Towards Learning of Filter-Level Heterogeneous Compression of Convolutional Neural Networks

Author: Zur, Yochai, Baskin, Chaim, Zheltonozhskii, Evgenii, Chmiel, Brian, Evron, Itay, Bronstein, Alex M., and Mendelson, Avi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
Abstract: Recently, deep learning has become a de facto standard in machine learning with convolutional neural networks (CNNs) demonstrating spectacular success on a wide variety of tasks. However, CNNs are typically very demanding computationally at inference time. One of the ways to alleviate this burden on certain hardware platforms is quantization relying on the use of low-precision arithmetic representation for the weights and the activations. Another popular method is the pruning of the number of filters in each layer. While mainstream deep learning methods train the neural network weights while keeping the network architecture fixed, the emerging neural architecture search (NAS) techniques make the latter also amenable to training. In this paper, we formulate optimal arithmetic bit length allocation and neural network pruning as a NAS problem, searching for the configurations satisfying a computational complexity budget while maximizing the accuracy. We use a differentiable search method based on the continuous relaxation of the search space proposed by Liu et al. (arXiv:1806.09055). We show, by grid search, that heterogeneous quantized networks suffer from a high variance which renders the benefit of the search questionable. For pruning, improvement over homogeneous cases is possible, but it is still challenging to find those configurations with the proposed method. The code is publicly available at https://github.com/yochaiz/Slimmable and https://github.com/yochaiz/darts-UNIQ
Comment: Accepted to ICML Workshop on AutoML 2019
Published: 2019

16. Loss aware post-training quantization

Author: Nahshan, Yury, Chmiel, Brian, Baskin, Chaim, Zheltonozhskii, Evgenii, Banner, Ron, Bronstein, Alex M., and Mendelson, Avi
Published: 2021
Full Text: View/download PDF

17. Bimodal-Distributed Binarized Neural Networks

Author: Rozen, Tal, primary, Kimhi, Moshe, additional, Chmiel, Brian, additional, Mendelson, Avi, additional, and Baskin, Chaim, additional
Published: 2022
Full Text: View/download PDF

18. Adversarial robustness via noise injection in smoothed models

Author: Nemcovsky, Yaniv, primary, Zheltonozhskii, Evgenii, additional, Baskin, Chaim, additional, Chmiel, Brian, additional, Bronstein, Alex M., additional, and Mendelson, Avi, additional
Published: 2022
Full Text: View/download PDF

19. Optimal Fine-Grained N:M sparsity for Activations and Neural Gradients

Author: Chmiel, Brian, Hubara, Itay, Banner, Ron, and Soudry, Daniel
Abstract: In deep learning, fine-grained N:M sparsity reduces the data footprint and bandwidth of a General Matrix multiply (GEMM) by 2x, and doubles throughput by skipping computation of zero values. So far, it was only used to prune weights. We examine how this method can be used also for activations and their gradients (i.e., "neural gradients"). To this end, we first establish a tensor-level optimality criterion. Previous works aimed to minimize the mean-square-error (MSE) of each pruned block. We show that while minimization of the MSE works fine for pruning the activations, it catastrophically fails for the neural gradients. Instead, we show that optimal pruning of the neural gradients requires an unbiased minimum-variance pruning mask. We design such specialized masks, and find that in most cases, 1:2 sparsity is sufficient for training, and 2:4 sparsity is usually enough when this is not the case. Further, we suggest combining several such methods together in order to potentially speed up training even more. A reference implementation is supplied in https://github.com/brianchmiel/Act-and-Grad-structured-sparsity
Comment: Main changes: 1) Experiments (see also experiments in the appendix). 2) Overhead analysis (Tab 3)
Published: 2022

20. Logarithmic Unbiased Quantization: Simple 4-bit Training in Deep Learning

Author: Chmiel, Brian, Banner, Ron, Hoffer, Elad, Yaacov, Hilla Ben, and Soudry, Daniel
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
Abstract: Quantization of the weights and activations is one of the main methods to reduce the computational footprint of Deep Neural Networks (DNNs) training. Current methods enable 4-bit quantization of the forward phase. However, this constitutes only a third of the training process. Reducing the computational footprint of the entire training process requires the quantization of the neural gradients, i.e., the loss gradients with respect to the outputs of intermediate neural layers. In this work, we examine the importance of having unbiased quantization in quantized neural network training, where to maintain it, and how. Based on this, we suggest a logarithmic unbiased quantization (LUQ) method to quantize both the forward and backward phases to 4-bit, achieving state-of-the-art results in 4-bit training without overhead. For example, in ResNet50 on ImageNet, we achieved a degradation of 1.1%. We further improve this to a degradation of only 0.32% after three epochs of high precision fine-tuning, combined with a variance reduction method -- where both these methods add overhead comparable to previously suggested methods.
Main Changes: 1) FNT learning rate (sec 4.2) 2) Implementation details (sec 4.3), including solving data movement bottleneck. 3) Additional experiments
Published: 2021

21. Transient Finite Element Simulation of a Lithium-Ion Battery Pack Thermal Management System Based on Latent Heat System Materials

Author: Tsao, Bang, primary, Chmiel, Brian, additional, Devillier, Amanda, additional, Tsao, Max, additional, McLeod, Michael, additional, Miller, Kathryn, additional, Schimpf, Peter, additional, Grigsby, Addison, additional, and Fellner, Joseph, additional
Published: 2021
Full Text: View/download PDF

22. Feature Map Transform Coding for Energy-Efficient CNN Inference

Author: Chmiel, Brian, primary, Baskin, Chaim, additional, Zheltonozhskii, Evgenii, additional, Banner, Ron, additional, Yermolin, Yevgeny, additional, Karbachevsky, Alex, additional, Bronstein, Alex M., additional, and Mendelson, Avi, additional
Published: 2020
Full Text: View/download PDF

23. CAT: Compression-Aware Training for Bandwidth Reduction.

Author: Baskin, Chaim, Chmiel, Brian, Zheltonozhskii, Evgenii, Banner, Ron, Bronstein, Alex M., and Mendelson, Avi
Subjects: *CONVOLUTIONAL neural networks, *BANDWIDTHS, *SENTIMENT analysis, *DEEP learning, *ENTROPY
Abstract: One major obstacle hindering the ubiquitous use of CNNs for inference is their relatively high memory bandwidth requirements, which can be the primary energy consumer and throughput bottleneck in hardware accelerators. Inspired by quantization-aware training approaches, we propose a compression-aware training (CAT) method that involves training the model to allow better compression of weights and feature maps during neural network deployment. Our method trains the model to achieve low-entropy feature maps, enabling efficient compression at inference time using classical transform coding methods. CAT significantly improves the state-of-the-art results reported for quantization evaluated on various vision and NLP tasks, such as image classification (ImageNet), image detection (Pascal VOC), sentiment analysis (CoLa), and textual entailment (MNLI). For example, on ResNet-18, we achieve near baseline ImageNet accuracy with an average representation of only 1.5 bits per value with 5-bit quantization. Moreover, we show that entropy reduction of weights and activations can be applied together, further improving bandwidth reduction. Reference implementation is available. [ABSTRACT FROM AUTHOR]
Published: 2021
