"Memory footprint" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Memory footprint"' showing total 41,421 results

Start Over "Memory footprint"

41,421 results on '"Memory footprint"'

1. Inverted Activations: Reducing Memory Footprint in Neural Network Training

Author: Novikov, Georgii and Oseledets, Ivan
Subjects: Computer Science - Machine Learning
Abstract: The scaling of neural networks with increasing data and model sizes necessitates the development of more efficient deep learning algorithms. A significant challenge in neural network training is the memory footprint associated with activation tensors, particularly in pointwise nonlinearity layers that traditionally save the entire input tensor for the backward pass, leading to substantial memory consumption. In this paper, we propose a modification to the handling of activation tensors in pointwise nonlinearity layers. Our method involves saving the output tensor instead of the input tensor during the forward pass. Since the subsequent layer typically also saves its input tensor, this approach reduces the total memory required by storing only one tensor between layers instead of two. This optimization is especially beneficial for transformer-based architectures like GPT, BERT, Mistral, and Llama. To enable this approach, we utilize the inverse function of the nonlinearity during the backward pass. As the inverse cannot be computed analytically for most nonlinearities, we construct accurate approximations using simpler functions. Experimental results demonstrate that our method significantly reduces memory usage without affecting training accuracy or computational performance. Our implementation is provided as a drop-in replacement for standard nonlinearity layers in the PyTorch framework, facilitating easy adoption without requiring architectural modifications.
Published: 2024

2. Efficient Continual Learning with Low Memory Footprint For Edge Device

Author: Wang, Zeqing, Cheng, Fei, Ji, Kangye, and Huang, Bohu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Continual learning(CL) is a useful technique to acquire dynamic knowledge continually. Although powerful cloud platforms can fully exert the ability of CL,e.g., customized recommendation systems, similar personalized requirements for edge devices are almost disregarded. This phenomenon stems from the huge resource overhead involved in training neural networks and overcoming the forgetting problem of CL. This paper focuses on these scenarios and proposes a compact algorithm called LightCL. Different from other CL methods bringing huge resource consumption to acquire generalizability among all tasks for delaying forgetting, LightCL compress the resource consumption of already generalized components in neural networks and uses a few extra resources to improve memory in other parts. We first propose two new metrics of learning plasticity and memory stability to seek generalizability during CL. Based on the discovery that lower and middle layers have more generalizability and deeper layers are opposite, we $\textit{Maintain Generalizability}$ by freezing the lower and middle layers. Then, we $\textit{Memorize Feature Patterns}$ to stabilize the feature extracting patterns of previous tasks to improve generalizability in deeper layers. In the experimental comparison, LightCL outperforms other SOTA methods in delaying forgetting and reduces at most $\textbf{6.16$\times$}$ memory footprint, proving the excellent performance of LightCL in efficiency. We also evaluate the efficiency of our method on an edge device, the Jetson Nano, which further proves our method's practical effectiveness.
Published: 2024

3. Augmented Efficiency: Reducing Memory Footprint and Accelerating Inference for 3D Semantic Segmentation through Hybrid Vision

Author: Krishnan, Aditya, Vora, Jayneel, and Mohapatra, Prasant
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Semantic segmentation has emerged as a pivotal area of study in computer vision, offering profound implications for scene understanding and elevating human-machine interactions across various domains. While 2D semantic segmentation has witnessed significant strides in the form of lightweight, high-precision models, transitioning to 3D semantic segmentation poses distinct challenges. Our research focuses on achieving efficiency and lightweight design for 3D semantic segmentation models, similar to those achieved for 2D models. Such a design impacts applications of 3D semantic segmentation where memory and latency are of concern. This paper introduces a novel approach to 3D semantic segmentation, distinguished by incorporating a hybrid blend of 2D and 3D computer vision techniques, enabling a streamlined, efficient process. We conduct 2D semantic segmentation on RGB images linked to 3D point clouds and extend the results to 3D using an extrusion technique for specific class labels, reducing the point cloud subspace. We perform rigorous evaluations with the DeepViewAgg model on the complete point cloud as our baseline by measuring the Intersection over Union (IoU) accuracy, inference time latency, and memory consumption. This model serves as the current state-of-the-art 3D semantic segmentation model on the KITTI-360 dataset. We can achieve heightened accuracy outcomes, surpassing the baseline for 6 out of the 15 classes while maintaining a marginal 1% deviation below the baseline for the remaining class labels. Our segmentation approach demonstrates a 1.347x speedup and about a 43% reduced memory usage compared to the baseline., Comment: 18 pages, 3 figures, 3 tables
Published: 2024

4. On the Performance and Memory Footprint of Distributed Training: An Empirical Study on Transformers

Author: Lu, Zhengxian, Wang, Fangyu, Xu, Zhiwei, Yang, Fei, and Li, Tao
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Transformer models have emerged as potent solutions to a wide array of multidisciplinary challenges. The deployment of Transformer architectures is significantly hindered by their extensive computational and memory requirements, necessitating the reliance on advanced efficient distributed training methodologies. Prior research has delved into the performance bottlenecks associated with distributed training, aiming to unravel these bottlenecks and suggest optimization directions. However, such analyses often overlook three aspects unique to Transformer models: the specialized architecture, the dependency on various distributed strategies, and the requirement to balance computational and memory overhead. This paper aims to bridge this gap by offering a comprehensive examination of the performance bottlenecks inherent in distributed training of Transformer models, leveraging both theoretical analysis and empirical investigation. We propose an analytical framework tailored to these unique aspects of Transformers, facilitating a holistic evaluation of model architectures, distributed strategies, and resource consumption. Based on this analytical framework, we conduct a comparative analysis of theoretical performances and further systematically explore how various distributed training strategies fare in real-world scenarios. Most of the experimental results can be well explained by the analytical outcomes derived from the analytical framework. Notably, our findings suggest an advantage of pipeline parallelism over data parallelism for Transformer models. Moreover, we shed light on some unexpected outcomes, such as the potential for increased total memory overhead due to suboptimal model partitioning within pipeline parallelism. Additionally, we underscore the significance of communication block size and waiting time to further enhance performance.
Published: 2024

5. DNN Memory Footprint Reduction via Post-Training Intra-Layer Multi-Precision Quantization

Author: Ghavami, Behnam, Kamjoo, Amin, Shannon, Lesley, and Wilton, Steve
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: The imperative to deploy Deep Neural Network (DNN) models on resource-constrained edge devices, spurred by privacy concerns, has become increasingly apparent. To facilitate the transition from cloud to edge computing, this paper introduces a technique that effectively reduces the memory footprint of DNNs, accommodating the limitations of resource-constrained edge devices while preserving model accuracy. Our proposed technique, named Post-Training Intra-Layer Multi-Precision Quantization (PTILMPQ), employs a post-training quantization approach, eliminating the need for extensive training data. By estimating the importance of layers and channels within the network, the proposed method enables precise bit allocation throughout the quantization process. Experimental results demonstrate that PTILMPQ offers a promising solution for deploying DNNs on edge devices with restricted memory resources. For instance, in the case of ResNet50, it achieves an accuracy of 74.57\% with a memory footprint of 9.5 MB, representing a 25.49\% reduction compared to previous similar methods, with only a minor 1.08\% decrease in accuracy., Comment: The 25th International Symposium on Quality Electronic Design (ISQED'24)
Published: 2024

6. Reducing the Memory Footprint of 3D Gaussian Splatting

Author: Papantonakis, Panagiotis, Kopanas, Georgios, Kerbl, Bernhard, Lanvin, Alexandre, and Drettakis, George
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: 3D Gaussian splatting provides excellent visual quality for novel view synthesis, with fast training and real-time rendering; unfortunately, the memory requirements of this method for storing and transmission are unreasonably high. We first analyze the reasons for this, identifying three main areas where storage can be reduced: the number of 3D Gaussian primitives used to represent a scene, the number of coefficients for the spherical harmonics used to represent directional radiance, and the precision required to store Gaussian primitive attributes. We present a solution to each of these issues. First, we propose an efficient, resolution-aware primitive pruning approach, reducing the primitive count by half. Second, we introduce an adaptive adjustment method to choose the number of coefficients used to represent directional radiance for each Gaussian primitive, and finally a codebook-based quantization method, together with a half-float representation for further memory reduction. Taken together, these three components result in a 27 reduction in overall size on disk on the standard datasets we tested, along with a 1.7 speedup in rendering speed. We demonstrate our method on standard datasets and show how our solution results in significantly reduced download times when using the method on a mobile device., Comment: Project website: https://repo-sam.inria.fr/fungraph/reduced_3dgs/
Published: 2024
Full Text: View/download PDF

7. F-OAL: Forward-only Online Analytic Learning with Fast Training and Low Memory Footprint in Class Incremental Learning

Author: Zhuang, Huiping, Liu, Yuchen, He, Run, Tong, Kai, Zeng, Ziqian, Chen, Cen, Wang, Yi, and Chau, Lap-Pui
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Online Class Incremental Learning (OCIL) aims to train models incrementally, where data arrive in mini-batches, and previous data are not accessible. A major challenge in OCIL is Catastrophic Forgetting, i.e., the loss of previously learned knowledge. Among existing baselines, replay-based methods show competitive results but requires extra memory for storing exemplars, while exemplar-free (i.e., data need not be stored for replay in production) methods are resource-friendly but often lack accuracy. In this paper, we propose an exemplar-free approach--Forward-only Online Analytic Learning (F-OAL). Unlike traditional methods, F-OAL does not rely on back-propagation and is forward-only, significantly reducing memory usage and computational time. Cooperating with a pre-trained frozen encoder with Feature Fusion, F-OAL only needs to update a linear classifier by recursive least square. This approach simultaneously achieves high accuracy and low resource consumption. Extensive experiments on benchmark datasets demonstrate F-OAL's robust performance in OCIL scenarios. Code is available at https://github.com/liuyuchen-cz/F-OAL.
Published: 2024

8. Network Memory Footprint Compression Through Jointly Learnable Codebooks and Mappings

Author: Yvinec, Edouard, Dapogny, Arnaud, and Bailly, Kevin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The massive interest in deep neural networks (DNNs) for both computer vision and natural language processing has been sparked by the growth in computational power. However, this led to an increase in the memory footprint, to a point where it can be challenging to simply load a model on commodity devices such as mobile phones. To address this limitation, quantization is a favored solution as it maps high precision tensors to a low precision, memory efficient format. In terms of memory footprint reduction, its most effective variants are based on codebooks. These methods, however, suffer from two limitations. First, they either define a single codebook for each tensor, or use a memory-expensive mapping to multiple codebooks. Second, gradient descent optimization of the mapping favors jumps toward extreme values, hence not defining a proximal search. In this work, we propose to address these two limitations. First, we initially group similarly distributed neurons and leverage the re-ordered structure to either apply different scale factors to the different groups, or map weights that fall in these groups to several codebooks, without any mapping overhead. Second, stemming from this initialization, we propose a joint learning of the codebook and weight mappings that bears similarities with recent gradient-based post-training quantization techniques. Third, drawing estimation from straight-through estimation techniques, we introduce a novel gradient update definition to enable a proximal search of the codebooks and their mappings. The proposed jointly learnable codebooks and mappings (JLCM) method allows a very efficient approximation of any DNN: as such, a Llama 7B can be compressed down to 2Go and loaded on 5-year-old smartphones.
Published: 2023

9. Reducing shared memory footprint to leverage high throughput on Tensor Cores and its flexible API extension library

Author: Ootomo, Hiroyuki and Yokota, Rio
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: NVIDIA Tensor Core is a mixed-precision matrix-matrix multiplication and addition computing unit, where the theoretical peak performance is more than 300 TFlop/s on NVIDIA A100 GPU. NVIDIA provides WMMA API for using Tensor Cores in custom kernel functions. The most common way to use Tensor Core is to supply the input matrices from shared memory, which has higher bandwidth than global memory. However, the Bytes-per-Flops (B/F) ratio of the shared memory and Tensor Cores is small since the performance of Tensor Cores is high. Thus, it is important to reduce the shared memory footprint for efficient Tensor Cores usage. In this paper, we analyze the simple matrix-matrix multiplication on Tensor Cores by the roofline model and figure out that the bandwidth of shared memory might be a limitation of the performance when using WMMA API. To alleviate this issue, we provide a WMMA API extension library to boost the throughput of the computation, which has two components. The first one allows for manipulating the array of registers input to Tensor Cores flexibly. We evaluate the performance improvement of this library. The outcome of our evaluation shows that our library reduces the shared memory footprint and speeds up the computation using Tensor Cores. The second one is an API for the SGEMM emulation on Tensor Cores without additional shared memory usage. We have demonstrated that the single-precision emulating batch SGEMM implementation on Tensor Cores using this library achieves 54.2 TFlop/s on A100 GPU, which outperforms the theoretical peak performance of FP32 SIMT Cores while achieving the same level of accuracy as cuBLAS. The achieved throughput can not be achieved without reducing the shared memory footprint done by our library with the same amount of register usage., Comment: HPC Asia 2023
Published: 2023

10. Towards Zero Memory Footprint Spiking Neural Network Training

Author: Lei, Bin, Lin, Sheng, Lin, Pei-Hung, Liao, Chunhua, and Ding, Caiwen
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Artificial Intelligence
Abstract: Biologically-inspired Spiking Neural Networks (SNNs), processing information using discrete-time events known as spikes rather than continuous values, have garnered significant attention due to their hardware-friendly and energy-efficient characteristics. However, the training of SNNs necessitates a considerably large memory footprint, given the additional storage requirements for spikes or events, leading to a complex structure and dynamic setup. In this paper, to address memory constraint in SNN training, we introduce an innovative framework, characterized by a remarkably low memory footprint. We \textbf{(i)} design a reversible SNN node that retains a high level of accuracy. Our design is able to achieve a $\mathbf{58.65\times}$ reduction in memory usage compared to the current SNN node. We \textbf{(ii)} propose a unique algorithm to streamline the backpropagation process of our reversible SNN node. This significantly trims the backward Floating Point Operations Per Second (FLOPs), thereby accelerating the training process in comparison to current reversible layer backpropagation method. By using our algorithm, the training time is able to be curtailed by $\mathbf{23.8\%}$ relative to existing reversible layer architectures.
Published: 2023

11. TinyissimoYOLO: A Quantized, Low-Memory Footprint, TinyML Object Detection Network for Low Power Microcontrollers

Author: Moosmann, Julian, Giordano, Marco, Vogt, Christian, and Magno, Michele
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Hardware Architecture, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: This paper introduces a highly flexible, quantized, memory-efficient, and ultra-lightweight object detection network, called TinyissimoYOLO. It aims to enable object detection on microcontrollers in the power domain of milliwatts, with less than 0.5MB memory available for storing convolutional neural network (CNN) weights. The proposed quantized network architecture with 422k parameters, enables real-time object detection on embedded microcontrollers, and it has been evaluated to exploit CNN accelerators. In particular, the proposed network has been deployed on the MAX78000 microcontroller achieving high frame-rate of up to 180fps and an ultra-low energy consumption of only 196{\mu}J per inference with an inference efficiency of more than 106 MAC/Cycle. TinyissimoYOLO can be trained for any multi-object detection. However, considering the small network size, adding object detection classes will increase the size and memory consumption of the network, thus object detection with up to 3 classes is demonstrated. Furthermore, the network is trained using quantization-aware training and deployed with 8-bit quantization on different microcontrollers, such as STM32H7A3, STM32L4R9, Apollo4b and on the MAX78000's CNN accelerator. Performance evaluations are presented in this paper., Comment: Published In: 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Published: 2023
Full Text: View/download PDF

12. KmerCo: A lightweight K-mer counting technique with a tiny memory footprint

Author: Nayak, Sabuzima and Patgiri, Ripon
Subjects: Computer Science - Databases, 68P05, E.1
Abstract: K-mer counting is a requisite process for DNA assembly because it speeds up its overall process. The frequency of K-mers is used for estimating the parameters of DNA assembly, error correction, etc. The process also provides a list of district K-mers which assist in searching large databases and reducing the size of de Bruijn graphs. Nonetheless, K-mer counting is a data and compute-intensive process. Hence, it is crucial to implement a lightweight data structure that occupies low memory but does fast processing of K-mers. We proposed a lightweight K-mer counting technique, called KmerCo that implements a potent counting Bloom Filter variant, called countBF. KmerCo has two phases: insertion and classification. The insertion phase inserts all K-mers into countBF and determines distinct K-mers. The classification phase is responsible for the classification of distinct K-mers into trustworthy and erroneous K-mers based on a user-provided threshold value. We also proposed a novel benchmark performance metric. We used the Hadoop MapReduce program to determine the frequency of K-mers. We have conducted rigorous experiments to prove the dominion of KmerCo compared to state-of-the-art K-mer counting techniques. The experiments are conducted using DNA sequences of four organisms. The datasets are pruned to generate four different size datasets. KmerCo is compared with Squeakr, BFCounter, and Jellyfish. KmerCo took the lowest memory, highest number of insertions per second, and a positive trustworthy rate as compared with the three above-mentioned methods., Comment: Submitted to the conference for possible publication
Published: 2023

13. An Extension of DNAContainer with a Small Memory Footprint

Author: El-Shaikh, Alex and Seeger, Bernhard
Published: 2023
Full Text: View/download PDF

14. Augmented Efficiency: Reducing Memory Footprint and Accelerating Inference for 3D Semantic Segmentation Through Hybrid Vision.

Author: Aditya Krishnan 0003, Jayneel Vora, and Prasant Mohapatra
Published: 2024
Full Text: View/download PDF

15. Reducing Memory Footprint in Deep Network Training by Gradient Space Reutilization.

Author: Yiming Dong and Zhouchen Lin
Published: 2024
Full Text: View/download PDF

16. Tackling Memory Footprint Expansion During Live Migration of Virtual Machines.

Author: Roja Eswaran, Mingjie Yan, and Kartik Gopalan
Published: 2024
Full Text: View/download PDF

17. Telescope: Telemetry for Gargantuan Memory Footprint Applications.

Author: Alan Nair, Sandeep Kumar, Aravinda Prasad, Ying Huang, Andy Rudoff, and Sreenivas Subramoney
Published: 2024

18. DNN Memory Footprint Reduction via Post-Training Intra-Layer Multi-Precision Quantization.

Author: Behnam Ghavami, Amin Kamjoo, Lesley Shannon, and Steve Wilton
Published: 2024
Full Text: View/download PDF

19. Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction

Author: Andoorveedu, Muralidhar, Zhu, Zhanda, Zheng, Bojian, and Pekhimenko, Gennady
Subjects: Computer Science - Machine Learning, Computer Science - Performance
Abstract: Training deep learning models can be computationally expensive. Prior works have shown that increasing the batch size can potentially lead to better overall throughput. However, the batch size is frequently limited by the accelerator memory capacity due to the activations/feature maps stored for the training backward pass, as larger batch sizes require larger feature maps to be stored. Transformer-based models, which have recently seen a surge in popularity due to their good performance and applicability to a variety of tasks, have a similar problem. To remedy this issue, we propose Tempo, a new approach to efficiently use accelerator (e.g., GPU) memory resources for training Transformer-based models. Our approach provides drop-in replacements for the GELU, LayerNorm, and Attention layers, reducing the memory usage and ultimately leading to more efficient training. We implement Tempo and evaluate the throughput, memory usage, and accuracy/loss on the BERT Large pre-training task. We demonstrate that Tempo enables up to 2x higher batch sizes and 16% higher training throughput over the state-of-the-art baseline. We also evaluate Tempo on GPT2 and RoBERTa models, showing 19% and 26% speedup over the baseline., Comment: Accepted to NeurIPS 2022. Fixed some minor typos and added some small clarifications
Published: 2022

20. A Low Memory Footprint Quantized Neural Network for Depth Completion of Very Sparse Time-of-Flight Depth Maps

Author: Jiang, Xiaowen, Cambareri, Valerio, Agresti, Gianluca, Ugwu, Cynthia Ifeyinwa, Simonetto, Adriano, Cardinaux, Fabien, and Zanuttigh, Pietro
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Sparse active illumination enables precise time-of-flight depth sensing as it maximizes signal-to-noise ratio for low power budgets. However, depth completion is required to produce dense depth maps for 3D perception. We address this task with realistic illumination and sensor resolution constraints by simulating ToF datasets for indoor 3D perception with challenging sparsity levels. We propose a quantized convolutional encoder-decoder network for this task. Our model achieves optimal depth map quality by means of input pre-processing and carefully tuned training with a geometry-preserving loss function. We also achieve low memory footprint for weights and activations by means of mixed precision quantization-at-training techniques. The resulting quantized models are comparable to the state of the art in terms of quality, but they require very low GPU times and achieve up to 14-fold memory size reduction for the weights w.r.t. their floating point counterpart with minimal impact on quality metrics., Comment: In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops 2022. Presented at the 5th Efficient Deep Learning for Computer Vision Workshop
Published: 2022

21. Reducing the Memory Footprint of 3D Gaussian Splatting.

Author: Panagiotis Papantonakis, Georgios Kopanas, Bernhard Kerbl, Alexandre Lanvin, and George Drettakis
Published: 2024
Full Text: View/download PDF

22. FluidKV: Seamlessly Bridging the Gap between Indexing Performance and Memory-Footprint on Ultra-Fast Storage.

Author: Ziyi Lu, Qiang Cao 0001, Hong Jiang 0001, Yuxing Chen, Jie Yao, and Anqun Pan
Published: 2024

23. Rollback-Free Recovery for a High Performance Dense Linear Solver With Reduced Memory Footprint.

Author: Daniela Loreti, Marcello Artioli, and Anna Ciampolini
Published: 2024
Full Text: View/download PDF

24. An MPC Approximation Approach for Adaptive Cruise Control With Reduced Computational Complexity and Low Memory Footprint.

Author: Duc Giap Nguyen, Suyong Park, Jinrak Park, Dohee Kim, Jeong Soo Eo, and Kyoungseok Han
Published: 2024
Full Text: View/download PDF

25. Encrypted Image Classification with Low Memory Footprint Using Fully Homomorphic Encryption.

Author: Lorenzo Rovida and Alberto Leporati
Published: 2024
Full Text: View/download PDF

26. Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction

Author: Novikov, Georgii, Bershatsky, Daniel, Gusak, Julia, Shonenkov, Alex, Dimitrov, Denis, and Oseledets, Ivan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Memory footprint is one of the main limiting factors for large neural network training. In backpropagation, one needs to store the input to each operation in the computational graph. Every modern neural network model has quite a few pointwise nonlinearities in its architecture, and such operation induces additional memory costs which -- as we show -- can be significantly reduced by quantization of the gradients. We propose a systematic approach to compute optimal quantization of the retained gradients of the pointwise nonlinear functions with only a few bits per each element. We show that such approximation can be achieved by computing optimal piecewise-constant approximation of the derivative of the activation function, which can be done by dynamic programming. The drop-in replacements are implemented for all popular nonlinearities and can be used in any existing pipeline. We confirm the memory reduction and the same convergence on several open benchmarks., Comment: Submitted
Published: 2022

27. Encrypted Image Classification with Low Memory Footprint using Fully Homomorphic Encryption

Author: Rovida, L, Leporati, A, Rovida, Lorenzo, Leporati, Alberto, Rovida, L, Leporati, A, Rovida, Lorenzo, and Leporati, Alberto
Abstract: Classifying images has become a straightforward and accessible task, thanks to the advent of Deep Neural Networks. Nevertheless, not much attention is given to the privacy concerns associated with sensitive data contained in images. In this study, we propose a solution to this issue by exploring an intersection between Machine Learning and cryptography. In particular, Fully Homomorphic Encryption (FHE) emerges as a promising solution, as it enables computations to be performed on encrypted data. We therefore propose a Residual Network implementation based on FHE which allows the classification of encrypted images, ensuring that only the user can see the result. We suggest a circuit which reduces the memory requirements by more than 85% compared to the most recent works, while maintaining a high level of accuracy and a short computational time. We implement the circuit using the well-known Cheon-Kim-Kim-Song (CKKS) scheme, which enables approximate encrypted computations. We evaluate the results from three perspectives: memory requirements, computational time and calculations precision. We demonstrate that it is possible to evaluate an encrypted ResNet20 in less than five minutes on a laptop using approximately 15GB of memory, achieving an accuracy of 91.67% on the CIFAR-10 dataset, which is almost equivalent to the accuracy of the plain model (92.60%).
Published: 2024

28. A memory footprint optimization framework for Python applications targeting edge devices

Author: Katsaragakis, Manolis, Papadopoulos, Lazaros, Konijnenburg, Mario, Catthoor, Francky, and Soudris, Dimitrios
Published: 2023
Full Text: View/download PDF

29. Network Memory Footprint Compression Through Jointly Learnable Codebooks and Mappings.

Author: Edouard Yvinec, Arnaud Dapogny, and Kevin Bailly
Published: 2024

30. Reducing Memory Footprint in Deep Network Training by Gradient Space Reutilization

Author: Dong, Yiming, Lin, Zhouchen, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
Published: 2025
Full Text: View/download PDF

31. Generic Merging of Structure from Motion Maps with a Low Memory Footprint

Author: Flood, Gabrielle, Gillsjö, David, Persson, Patrik, Heyden, Anders, and Åström, Kalle
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: With the development of cheap image sensors, the amount of available image data have increased enormously, and the possibility of using crowdsourced collection methods has emerged. This calls for development of ways to handle all these data. In this paper, we present new tools that will enable efficient, flexible and robust map merging. Assuming that separate optimisations have been performed for the individual maps, we show how only relevant data can be stored in a low memory footprint representation. We use these representations to perform map merging so that the algorithm is invariant to the merging order and independent of the choice of coordinate system. The result is a robust algorithm that can be applied to several maps simultaneously. The result of a merge can also be represented with the same type of low-memory footprint format, which enables further merging and updating of the map in a hierarchical way. Furthermore, the method can perform loop closing and also detect changes in the scene between the capture of the different image sequences. Using both simulated and real data - from both a hand held mobile phone and from a drone - we verify the performance of the proposed method., Comment: Accepted at ICPR2020, 9 pages, 8 figures
Published: 2021

32. ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training

Author: Chen, Jianfei, Zheng, Lianmin, Yao, Zhewei, Wang, Dequan, Stoica, Ion, Mahoney, Michael W., and Gonzalez, Joseph E.
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: The increasing size of neural network models has been critical for improvements in their accuracy, but device memory is not growing at the same rate. This creates fundamental challenges for training neural networks within limited memory environments. In this work, we propose ActNN, a memory-efficient training framework that stores randomly quantized activations for back propagation. We prove the convergence of ActNN for general network architectures, and we characterize the impact of quantization on the convergence via an exact expression for the gradient variance. Using our theory, we propose novel mixed-precision quantization strategies that exploit the activation's heterogeneity across feature dimensions, samples, and layers. These techniques can be readily applied to existing dynamic graph frameworks, such as PyTorch, simply by substituting the layers. We evaluate ActNN on mainstream computer vision models for classification, detection, and segmentation tasks. On all these tasks, ActNN compresses the activation to 2 bits on average, with negligible accuracy loss. ActNN reduces the memory footprint of the activation by 12x, and it enables training with a 6.6x to 14x larger batch size., Comment: to be published in ICML 2021
Published: 2021

33. Vectorization and Minimization of Memory Footprint for Linear High-Order Discontinuous Galerkin Schemes

Author: Gallard, Jean-Matthieu, Rannabauer, Leonhard, Reinarz, Anne, and Bader, Michael
Subjects: Computer Science - Mathematical Software
Abstract: We present a sequence of optimizations to the performance-critical compute kernels of the high-order discontinuous Galerkin solver of the hyperbolic PDE engine ExaHyPE -- successively tackling bottlenecks due to SIMD operations, cache hierarchies and restrictions in the software design. Starting from a generic scalar implementation of the numerical scheme, our first optimized variant applies state-of-the-art optimization techniques by vectorizing loops, improving the data layout and using Loop-over-GEMM to perform tensor contractions via highly optimized matrix multiplication functions provided by the LIBXSMM library. We show that memory stalls due to a memory footprint exceeding our L2 cache size hindered the vectorization gains. We therefore introduce a new kernel that applies a sum factorization approach to reduce the kernel's memory footprint and improve its cache locality. With the L2 cache bottleneck removed, we were able to exploit additional vectorization opportunities, by introducing a hybrid Array-of-Structure-of-Array data layout that solves the data layout conflict between matrix multiplications kernels and the point-wise functions to implement PDE-specific terms. With this last kernel, evaluated in a benchmark simulation at high polynomial order, only 2\% of the floating point operations are still performed using scalar instructions and 22.5\% of the available performance is achieved., Comment: PDSEC 2020
Published: 2020

34. Structured Ensembles: an Approach to Reduce the Memory Footprint of Ensemble Methods

Author: Pomponi, Jary, Scardapane, Simone, and Uncini, Aurelio
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: In this paper, we propose a novel ensembling technique for deep neural networks, which is able to drastically reduce the required memory compared to alternative approaches. In particular, we propose to extract multiple sub-networks from a single, untrained neural network by solving an end-to-end optimization task combining differentiable scaling over the original architecture, with multiple regularization terms favouring the diversity of the ensemble. Since our proposal aims to detect and extract sub-structures, we call it Structured Ensemble. On a large experimental evaluation, we show that our method can achieve higher or comparable accuracy to competing methods while requiring significantly less storage. In addition, we evaluate our ensembles in terms of predictive calibration and uncertainty, showing they compare favourably with the state-of-the-art. Finally, we draw a link with the continual learning literature, and we propose a modification of our framework to handle continuous streams of tasks with a sub-linear memory cost. We compare with a number of alternative strategies to mitigate catastrophic forgetting, highlighting advantages in terms of average accuracy and memory., Comment: Article accepted at Neural Networks
Published: 2021
Full Text: View/download PDF

35. An Efficient Logic Operation Scheduler for Minimizing Memory Footprint of In-Memory SIMD Computation.

Author: Xingyue Qian, Zhezhi He, and Weikang Qian
Published: 2024
Full Text: View/download PDF

36. On the Performance and Memory Footprint of Distributed Training: An Empirical Study on Transformers.

Author: Zhengxian Lu, Fangyu Wang, Zhiwei Xu, Fei Yang, and Tao Li
Published: 2024
Full Text: View/download PDF

37. Efficient Continual Learning with Low Memory Footprint For Edge Device.

Author: Zeqing Wang, Fei Cheng, Kangye Ji, and Bohu Huang
Published: 2024
Full Text: View/download PDF

38. Rethinking Generalization in American Sign Language Prediction for Edge Devices with Extremely Low Memory Footprint

Author: Paul, Aditya Jyoti, Mohan, Puranjay, and Sehgal, Stuti
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Human-Computer Interaction, 68T45, 68T10, 68T07, 68U10, I.2.10, I.4.8, I.5.1, J.3, I.4.1, K.4.2
Abstract: Due to the boom in technical compute in the last few years, the world has seen massive advances in artificially intelligent systems solving diverse real-world problems. But a major roadblock in the ubiquitous acceptance of these models is their enormous computational complexity and memory footprint. Hence efficient architectures and training techniques are required for deployment on extremely low resource inference endpoints. This paper proposes an architecture for detection of alphabets in American Sign Language on an ARM Cortex-M7 microcontroller having just 496 KB of framebuffer RAM. Leveraging parameter quantization is a common technique that might cause varying drops in test accuracy. This paper proposes using interpolation as augmentation amongst other techniques as an efficient method of reducing this drop, which also helps the model generalize well to previously unseen noisy data. The proposed model is about 185 KB post-quantization and inference speed is 20 frames per second., Comment: 6 pages, Published in IEEE RAICS 2020, see https://raics.in
Published: 2020
Full Text: View/download PDF

39. Dimensionality reduction to improve search time and memory footprint in content-retrieval tasks: Application to semiconductor inspection images

Author: Vial, Thomas, Dhouib, Farah, Roger, Louison, Blangero, Annabelle, Duvivier, Frédéric, Sayadi, Karim, and Faraggi, Marisa N.
Published: 2022
Full Text: View/download PDF

40. Derandomization with Minimal Memory Footprint.

Author: Dean Doron and Roei Tell
Published: 2023
Full Text: View/download PDF

41. Reducing shared memory footprint to leverage high throughput on Tensor Cores and its flexible API extension library.

Author: Hiroyuki Ootomo and Rio Yokota
Published: 2023
Full Text: View/download PDF

42. Dynamic Allocations in a Hierarchical Parallel Context - A Study on Performance, Memory Footprint, and Portability Using SYCL.

Author: Aymeric Millan, Thomas Padioleau, and Julien Bigot
Published: 2023
Full Text: View/download PDF

43. TinyissimoYOLO: A Quantized, Low-Memory Footprint, TinyML Object Detection Network for Low Power Microcontrollers.

Author: Julian Moosmann, Marco Giordano, Christian Vogt 0002, and Michele Magno
Published: 2023
Full Text: View/download PDF

44. Minimizing Peak Memory Footprint of Inference on IoTs Devices by Efficient Recomputation.

Author: Xiaofeng Sun, Chaonong Xu, and Chao Li
Published: 2023
Full Text: View/download PDF

45. LM-DiskANN: Low Memory Footprint in Disk-Native Dynamic Graph-Based ANN Indexing.

Author: Yu Pan 0007, Jianxin Sun, and Hongfeng Yu 0001
Published: 2023
Full Text: View/download PDF

46. Reducing the Memory Footprint of IFDS-Based Data-Flow Analyses using Fine-Grained Garbage Collection.

Author: Dongjie He, Yujiang Gui, Yaoqing Gao, and Jingling Xue
Published: 2023
Full Text: View/download PDF

47. Few-bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction.

Author: Georgii Sergeevich Novikov, Daniel Bershatsky, Julia Gusak, Alex Shonenkov, Denis Valerievich Dimitrov, and Ivan V. Oseledets
Published: 2023

48. Mixed precision LU factorization on GPU tensor cores: reducing data movement and memory footprint.

Author: Florent Lopez and Théo Mary
Published: 2023
Full Text: View/download PDF

49. Novel Low Memory Footprint DNN Models for Edge Classification of Surgeons' Postures.

Author: Alex Hanneman, Terry Fawden, Marco Branciforte, Maria Celvisia Virzì, Esther Moss, Luciano Ost, and Massimiliano Zecca
Published: 2023
Full Text: View/download PDF

50. An Extension of DNAContainer with a Small Memory Footprint.

Author: Alex El-Shaikh and Bernhard Seeger
Published: 2023
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Category

Publication Type

Journal

Region

Database

Publisher

41,421 results on '"Memory footprint"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources