2,029 results on '"Memory footprint"'
Search Results
2. Utilizing Inherent Bias for Memory Efficient Continual Learning: A Simple and Robust Baseline.
- Author
-
Rahimi, Neela and Shao, Ming
- Subjects
- *
DIGITAL footprint, *MEMORY bias, *ONLINE education, *SOURCE code, *MACHINE learning - Abstract
Learning from continuously evolving data is critical in real-world applications. This type of learning, known as Continual Learning (CL), aims to assimilate new information without compromising performance on prior knowledge. However, learning new information leads to a bias in the network towards recent observations, resulting in a phenomenon known as catastrophic forgetting. The complexity increases in Online Continual Learning (OCL) scenarios, where models are allowed only a single pass over data. Existing OCL approaches that rely on replaying exemplar sets are not only memory-intensive on large-scale datasets but also raise security concerns. While recent dynamic network models address memory concerns, they often present computationally demanding, over-parameterized solutions with limited generalizability. To address this longstanding problem, we propose a novel OCL approach termed "Bias Robust online Continual Learning (BRCL)." BRCL retains all intermediate models generated. These models inherently exhibit a preference for recently learned classes. To leverage this property for enhanced performance, we devise a strategy we describe as 'utilizing bias to counteract bias.' This method involves the development of an inference function that capitalizes on the inherent biases of each model towards the recent tasks. Furthermore, we integrate a model consolidation technique that aligns the first layers of these models, particularly focusing on similar feature representations. This process effectively reduces the memory requirement, ensuring a low memory footprint. Despite the simplicity of the methodology, which guarantees expandability to various frameworks, extensive experiments reveal a notable performance edge over leading methods on key benchmarks, getting continual learning closer to matching offline training. (Source code will be made publicly available upon the publication of this paper.) 
• Innovative Approach to Continual Learning: a novel method that leverages the inherent bias in continual learning models as a guiding metric for model selection during inference. This approach significantly reduces forgetting compared to baseline methods. • Efficient Memory Management: By eliminating the need for a new backbone per task and leveraging distribution similarity within networks, the method sets a new standard in memory optimization for continual learning models. • Expandability and Scalability: The dynamic network in the proposed framework is adaptable and expandable to various backbone networks, and scalable across different dataset sizes. This flexibility ensures broad applicability in diverse machine learning tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. E-prop on SpiNNaker 2: Exploring online learning in spiking RNNs on neuromorphic hardware.
- Author
-
Rostami, Amirhossein, Vogginger, Bernhard, Yan, Yexin, and Mayr, Christian G.
- Subjects
ARTIFICIAL neural networks, ONLINE education, RECURRENT neural networks, BACK propagation, MNEMONICS, SPEECH processing systems, AUTOBIOGRAPHICAL memory - Abstract
Introduction: In recent years, the application of deep learning models at the edge has gained attention. Typically, artificial neural networks (ANNs) are trained on graphics processing units (GPUs) and optimized for efficient execution on edge devices. Training ANNs directly at the edge is the next step with many applications such as the adaptation of models to specific situations like changes in environmental settings or optimization for individuals, e.g., optimization for speakers for speech processing. Also, local training can preserve privacy. Over the last few years, many algorithms have been developed to reduce memory footprint and computation. Methods: A specific challenge to train recurrent neural networks (RNNs) for processing sequential data is the need for the Back Propagation Through Time (BPTT) algorithm to store the network state of all time steps. This limitation is resolved by the biologically-inspired E-prop approach for training Spiking Recurrent Neural Networks (SRNNs). We implement the E-prop algorithm on a prototype of the SpiNNaker 2 neuromorphic system. A parallelization strategy is developed to split and train networks on the ARM cores of SpiNNaker 2 to make efficient use of both memory and compute resources. We trained an SRNN from scratch on SpiNNaker 2 in real-time on the Google Speech Command dataset for keyword spotting. Result: We achieved an accuracy of 91.12% while requiring only 680 KB of memory for training the network with 25 K weights. Compared to other spiking neural networks with equal or better accuracy, our work is significantly more memory-efficient. Discussion: In addition, we performed a memory and time profiling of the E-prop algorithm. This is used on the one hand to discuss whether E-prop or BPTT is better suited for training a model at the edge and on the other hand to explore architecture modifications to SpiNNaker 2 to speed up online learning. 
Finally, energy estimations predict that the SRNN can be trained on SpiNNaker 2 with 12 times less energy than using an NVIDIA V100 GPU. [ABSTRACT FROM AUTHOR]
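The memory advantage described in this abstract comes from E-prop's eligibility traces, which run forward in time and replace BPTT's stored history of network states. A minimal rate-based sketch of the trace bookkeeping (a tanh surrogate with hypothetical sizes and a toy target, not the paper's spiking LIF implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_rec, n_out, T = 4, 8, 2, 100
alpha = 0.9                                 # membrane leak factor

W_in = rng.normal(0, 0.3, (n_rec, n_in))
W_rec = rng.normal(0, 0.3, (n_rec, n_rec))
W_out = rng.normal(0, 0.3, (n_out, n_rec))
B_fb = rng.normal(0, 0.3, (n_rec, n_out))   # fixed random feedback matrix

h = np.zeros(n_rec)                         # "membrane" state
z = np.zeros(n_rec)                         # activations
eps = np.zeros((n_rec, n_rec))              # one eligibility trace per synapse
dW_rec = np.zeros_like(W_rec)

for t in range(T):
    x = rng.normal(size=n_in)
    z_prev = z
    h = alpha * h + W_in @ x + W_rec @ z_prev
    z = np.tanh(h)                          # rate surrogate for the spike function
    # Forward-running eligibility trace: low-pass-filtered presynaptic
    # activity, gated by the postsynaptic local derivative (tanh').
    eps = alpha * eps + z_prev[None, :]
    e = (1 - z**2)[:, None] * eps
    # Online learning signal: readout error fed back through B_fb,
    # so no past network state is ever stored (unlike BPTT).
    err = W_out @ z - x[:n_out]             # toy target: copy the first inputs
    L = B_fb @ err
    dW_rec += L[:, None] * e                # weight update accumulates online
```

The traces cost O(n_rec**2) memory regardless of the sequence length T, which is the property that lets training fit into the small on-chip budget reported above.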
- Published
- 2022
- Full Text
- View/download PDF
4. Lighter and faster simulations on domains with symmetries
- Author
-
Universitat Politècnica de Catalunya. Centre Tecnològic de la Transferència de Calor, Universitat Politècnica de Catalunya. Departament de Màquines i Motors Tèrmics, Universitat Politècnica de Catalunya. CTTC - Centre Tecnològic de Transferència de Calor, Alsalti Baldellou, Àdel, Álvarez Farré, Xavier, Colomer Rey, Guillem, Gorobets, Andrei, Pérez Segarra, Carlos David, Oliva Llena, Asensio, and Trias Miquel, Francesc Xavier
- Abstract
A strategy to improve the performance and reduce the memory footprint of simulations on meshes with spatial reflection symmetries is presented in this work. By using an appropriate mirrored ordering of the unknowns, discrete partial differential operators are represented by matrices with a regular block structure that allows replacing the standard sparse matrix–vector product with a specialised version of the sparse matrix-matrix product, which has a significantly higher arithmetic intensity. Consequently, matrix multiplications are accelerated, whereas their memory footprint is reduced, making massive simulations more affordable. As an example of practical application, we consider the numerical simulation of turbulent incompressible flows using a low-dissipation discretisation on unstructured collocated grids. All the required matrices are classified into three sparsity patterns that correspond to the discrete Laplacian, gradient, and divergence operators. Therefore, the above-mentioned benefits of exploiting spatial reflection symmetries are tested for these three matrices on both CPU and GPU, showing up to 5.0x speed-ups and 8.0x memory savings. Finally, a roofline performance analysis of the symmetry-aware sparse matrix–vector product is presented. A.A.B., X.A.F., G.C., C.D.P.S., A.O. and F.X.T. have been financially supported by two competitive R+D projects: RETOtwin (PDC2021-120970-I00), given by MCIN/AEI/10.13039/501100011033 and European Union Next GenerationEU/PRTR, and FusionCAT (001-P-001722), given by Generalitat de Catalunya RIS3CAT-FEDER. A.A.B. has also been supported by the predoctoral grants DIN2018-010061 and 2019-DI-90, given by MCIN/AEI/10.13039/501100011033 and the Catalan Agency for Management of University and Research Grants (AGAUR), respectively. The numerical experiments have been conducted on the Marenostrum4 supercomputer at the Barcelona Supercomputing Center under the project IM-2022-3-0026.
The authors thankfully acknowledge these institutions.
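The mirrored-ordering trick described in this abstract can be illustrated on a toy matrix with one reflection symmetry: under such an ordering the operator takes the block form A = [[A1, A2], [A2, A1]], so only the two distinct blocks need to be stored, and the matrix–vector product becomes a sparse matrix-matrix product with the two vector halves as columns. A SciPy sketch (toy sizes and densities; the paper's actual solver and storage format differ):

```python
import numpy as np
from scipy.sparse import random as sprandom, bmat, csr_matrix

rng = np.random.default_rng(1)
n = 200  # unknowns per symmetric half

# Distinct blocks of the mirrored ordering: A = [[A1, A2], [A2, A1]].
A1 = csr_matrix(sprandom(n, n, density=0.05, random_state=1))
A2 = csr_matrix(sprandom(n, n, density=0.01, random_state=2))
A_full = bmat([[A1, A2], [A2, A1]], format="csr")  # built only for checking

x = rng.normal(size=2 * n)
x1, x2 = x[:n], x[n:]

# Symmetry-aware product: one SpMM per stored block instead of a
# full-size SpMV, roughly halving the operator's memory footprint.
X = np.column_stack([x1, x2])              # the two halves as columns
Y1, Y2 = A1 @ X, A2 @ X
y = np.concatenate([Y1[:, 0] + Y2[:, 1],   # A1 x1 + A2 x2
                    Y2[:, 0] + Y1[:, 1]])  # A2 x1 + A1 x2
```

The higher arithmetic intensity comes from streaming each stored block through memory once while multiplying it by two columns at a time.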
- Published
- 2024
5. E-prop on SpiNNaker 2: Exploring online learning in spiking RNNs on neuromorphic hardware
- Author
-
Amirhossein Rostami, Bernhard Vogginger, Yexin Yan, and Christian G. Mayr
- Subjects
SpiNNaker 2, E-prop, online learning, training at the edge, parallelism, memory footprint, Neurosciences. Biological psychiatry. Neuropsychiatry, RC321-571 - Abstract
Introduction: In recent years, the application of deep learning models at the edge has gained attention. Typically, artificial neural networks (ANNs) are trained on graphics processing units (GPUs) and optimized for efficient execution on edge devices. Training ANNs directly at the edge is the next step with many applications such as the adaptation of models to specific situations like changes in environmental settings or optimization for individuals, e.g., optimization for speakers for speech processing. Also, local training can preserve privacy. Over the last few years, many algorithms have been developed to reduce memory footprint and computation. Methods: A specific challenge to train recurrent neural networks (RNNs) for processing sequential data is the need for the Back Propagation Through Time (BPTT) algorithm to store the network state of all time steps. This limitation is resolved by the biologically-inspired E-prop approach for training Spiking Recurrent Neural Networks (SRNNs). We implement the E-prop algorithm on a prototype of the SpiNNaker 2 neuromorphic system. A parallelization strategy is developed to split and train networks on the ARM cores of SpiNNaker 2 to make efficient use of both memory and compute resources. We trained an SRNN from scratch on SpiNNaker 2 in real-time on the Google Speech Command dataset for keyword spotting. Result: We achieved an accuracy of 91.12% while requiring only 680 KB of memory for training the network with 25 K weights. Compared to other spiking neural networks with equal or better accuracy, our work is significantly more memory-efficient. Discussion: In addition, we performed a memory and time profiling of the E-prop algorithm. This is used on the one hand to discuss whether E-prop or BPTT is better suited for training a model at the edge and on the other hand to explore architecture modifications to SpiNNaker 2 to speed up online learning. 
Finally, energy estimations predict that the SRNN can be trained on SpiNNaker 2 with 12 times less energy than using an NVIDIA V100 GPU.
- Published
- 2022
- Full Text
- View/download PDF
6. A fractional memory-efficient approach for online continuous-time influence maximization.
- Author
-
Bevilacqua, Glenn S. and Lakshmanan, Laks V. S.
- Abstract
Influence maximization (IM) under a continuous-time diffusion model requires finding a set of initial adopters which when activated lead to the maximum expected number of users becoming activated within a given amount of time. State-of-the-art approximation algorithms applicable to solving this intractable problem use reverse reachability influence samples to approximate the diffusion process. Unfortunately, these algorithms require storing large collections of such samples which can become prohibitive depending on the desired solution quality, properties of the diffusion process and seed set size. To remedy this, we design an algorithm that allows the influence samples to be processed in a streaming manner, avoiding the need to store them. We approach IM using two fractional objectives: a fractional relaxation and a multi-linear extension of the original objective function. We derive a progressively improved upper bound to the optimal solution, which we empirically find to be tighter than the best existing upper bound. This enables instance-dependent solution quality guarantees that are observed to be vastly superior to the theoretical worst case. Leveraging these, we develop an algorithm that delivers solutions with a superior empirical solution quality guarantee at comparable running time with greatly reduced memory usage compared to the state-of-the-art. We demonstrate the superiority of our approach via extensive experiments on five real datasets of varying sizes of up to 41M nodes and 1.5B edges. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
7. Tile Low Rank Cholesky Factorization for Climate/Weather Modeling Applications on Manycore Architectures
- Author
-
Akbudak, Kadir, Ltaief, Hatem, Mikhalev, Aleksandr, Keyes, David, Kunkel, Julian M., editor, Yokota, Rio, editor, Balaji, Pavan, editor, and Keyes, David, editor
- Published
- 2017
- Full Text
- View/download PDF
8. PlaneFusion: Real-Time Indoor Scene Reconstruction With Planar Prior
- Author
-
Feng Xu, Zhiguo Shi, Bingjian Gong, Chenggang Yan, and Zunjie Zhu
- Subjects
Computer science, Plane (geometry), Pipeline (computing), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Graphics processing unit, Iterative reconstruction, Simultaneous localization and mapping, Computer Graphics and Computer-Aided Design, Signal Processing, Memory footprint, Computer vision, Segmentation, Computer Vision and Pattern Recognition, Artificial intelligence, Pose, Software - Abstract
Real-time dense SLAM techniques aim to reconstruct the dense three-dimensional geometry of a scene in real time with an RGB or RGB-D sensor. An indoor scene is an important type of working environment for these techniques. The planar prior can be used in this scenario to improve the reconstruction quality, especially for large low-texture regions that commonly occur in an indoor scene. This article fully explores the planar prior in a dense SLAM pipeline. First, we propose a novel plane detection and segmentation method that runs at 200 Hz on a modern graphics processing unit. Our algorithm for constructing global plane constraints is very efficient; hence, we use it in the process of each input frame for the camera pose estimation while maintaining the real-time performance. Second, we propose herein a plane-based map representation that greatly reduces the memory footprint of plane regions while keeping the geometric details on planes. The experiments reveal that our system yields superior reconstruction results with planar information running at more than 30 fps. Aside from speed and storage improvements, our technique also handles the low-texture problem in plane regions.
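The storage argument behind a plane-based map representation can be seen with back-of-the-envelope arithmetic: a large planar region that a dense map stores point by point collapses to a plane equation plus a coarse 2-D extent. A toy illustration with invented resolutions and sizes, not the paper's actual data structure:

```python
# A 4 m x 3 m wall sampled at 5 mm resolution by a dense map
# (dimensions in millimetres to keep the arithmetic exact).
points = (4000 // 5) * (3000 // 5)          # surfels covering the plane
dense_bytes = points * 3 * 4                # xyz stored as float32

# Plane-based storage: unit normal + offset (n, d) as four floats,
# plus a coarse 2-D occupancy bitmask over the plane (5 cm cells)
# that keeps the in-plane extent at low cost.
grid_cells = (400 // 5) * (300 // 5)        # dimensions in centimetres
plane_bytes = 4 * 4 + grid_cells // 8       # params + 1-bit occupancy mask

ratio = dense_bytes / plane_bytes           # orders-of-magnitude saving
```

Geometric detail on the plane is what the paper's representation preserves on top of this; the sketch only shows why the footprint of flat regions shrinks so drastically.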
- Published
- 2022
- Full Text
- View/download PDF
9. Lighter and faster simulations on domains with symmetries.
- Author
-
Alsalti-Baldellou, Àdel, Álvarez-Farré, Xavier, Colomer, Guillem, Gorobets, Andrey, Pérez-Segarra, Carlos David, Oliva, Assensi, and Trias, F. Xavier
- Subjects
- *
PARTIAL differential operators, *MATRICES (Mathematics), *MATRIX multiplications, *SPARSE matrices, *INCOMPRESSIBLE flow - Abstract
A strategy to improve the performance and reduce the memory footprint of simulations on meshes with spatial reflection symmetries is presented in this work. By using an appropriate mirrored ordering of the unknowns, discrete partial differential operators are represented by matrices with a regular block structure that allows replacing the standard sparse matrix–vector product with a specialised version of the sparse matrix-matrix product, which has a significantly higher arithmetic intensity. Consequently, matrix multiplications are accelerated, whereas their memory footprint is reduced, making massive simulations more affordable. As an example of practical application, we consider the numerical simulation of turbulent incompressible flows using a low-dissipation discretisation on unstructured collocated grids. All the required matrices are classified into three sparsity patterns that correspond to the discrete Laplacian, gradient, and divergence operators. Therefore, the above-mentioned benefits of exploiting spatial reflection symmetries are tested for these three matrices on both CPU and GPU, showing up to 5.0x speed-ups and 8.0x memory savings. Finally, a roofline performance analysis of the symmetry-aware sparse matrix–vector product is presented. • Strategy to accelerate CFD simulations on meshes with spatial reflection symmetries. • Replacement of SpMV with a specialised version of the more compute-intensive SpMM. • Implementation of a lighter sparse matrix storage format accounting for symmetries. • Hierarchical multilevel MPI+OpenMP+OpenCL/CUDA parallelisation. • Numerical tests on CPUs and GPUs show up to 5x speed-ups and 8x memory savings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Runtime Code Polymorphism as a Protection Against Side Channel Attacks
- Author
-
Couroussé, Damien, Barry, Thierno, Robisson, Bruno, Jaillon, Philippe, Potin, Olivier, Lanet, Jean-Louis, Foresti, Sara, editor, and Lopez, Javier, editor
- Published
- 2016
- Full Text
- View/download PDF
11. Vector Maps: A Lightweight and Accurate Map Format for Multi-robot Systems
- Author
-
Baizid, Khelifa, Lozenguez, Guillaume, Fabresse, Luc, Bouraqadi, Noury, Kubota, Naoyuki, editor, Kiguchi, Kazuo, editor, Liu, Honghai, editor, and Obo, Takenori, editor
- Published
- 2016
- Full Text
- View/download PDF
12. Reducing NoC and Memory Contention for Manycores
- Author
-
Chandru, Vishwanathan, Mueller, Frank, Hannig, Frank, editor, Cardoso, João M. P., editor, Pionteck, Thilo, editor, Fey, Dietmar, editor, Schröder-Preikschat, Wolfgang, editor, and Teich, Jürgen, editor
- Published
- 2016
- Full Text
- View/download PDF
13. Code Structuring Concepts
- Author
-
Loder, Wolfgang
- Published
- 2016
- Full Text
- View/download PDF
14. An Overview of the Compiler API
- Author
-
Bock, Jason
- Published
- 2016
- Full Text
- View/download PDF
15. Exploiting spatial symmetries for solving Poisson's equation
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Enginyeria Tèrmica, Universitat Politècnica de Catalunya. Departament de Màquines i Motors Tèrmics, Universitat Politècnica de Catalunya. CTTC - Centre Tecnològic de Transferència de Calor, Alsalti Baldellou, Àdel, Álvarez Farré, Xavier, Trias Miquel, Francesc Xavier, and Oliva Llena, Asensio
- Abstract
This paper presents a strategy to accelerate virtually any Poisson solver by taking advantage of s spatial reflection symmetries. More precisely, we have proved the existence of an inexpensive block diagonalisation that transforms the original Poisson equation into a set of 2^s fully decoupled subsystems then solved concurrently. This block diagonalisation is identical regardless of the mesh connectivity (structured or unstructured) and the geometric complexity of the problem, therefore applying to a wide range of academic and industrial configurations. In fact, it simplifies the task of discretising complex geometries since it only requires meshing a portion of the domain that is then mirrored implicitly by the symmetries’ hyperplanes. Thus, the resulting meshes naturally inherit the exploited symmetries, and their memory footprint becomes 2^s times smaller. Thanks to the subsystems’ better spectral properties, iterative solvers converge significantly faster. Additionally, imposing an adequate grid points’ ordering allows reducing the operators’ footprint and replacing the standard sparse matrix-vector products with the sparse matrix-matrix product, a higher arithmetic intensity kernel. As a result, matrix multiplications are accelerated, and massive simulations become more affordable. Finally, we include numerical experiments based on a turbulent flow simulation and making state-of-the-art solvers exploit a varying number of symmetries. On the one hand, algebraic multigrid and preconditioned Krylov subspace methods require up to 23% and 72% fewer iterations, resulting in up to 1.7x and 5.6x overall speedups, respectively. 
On the other hand, sparse direct solvers’ memory footprint, setup and solution costs are reduced by up to 48%, 58% and 46%, respectively. This work has been financially supported by two competitive R+D projects: RETOtwin (PDC2021-120970-I00), given by MCIN/AEI/10.13039/501100011033 and European Union Next GenerationEU/PRTR, and FusionCAT (001-P-001722), given by Generalitat de Catalunya RIS3CAT-FEDER. Àdel Alsalti-Baldellou has also been supported by the predoctoral grants DIN2018-010061 and 2019-DI-90, given by MCIN/AEI/10.13039/501100011033 and the Catalan Agency for Management of University and Research Grants (AGAUR), respectively.
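The block diagonalisation this abstract refers to can be demonstrated for a single reflection symmetry (s = 1): with mirrored ordering the system is [[B, C], [C, B]] [x1; x2] = [b1; b2], and the change of variables x± = x1 ± x2 decouples it into (B + C) x+ = b1 + b2 and (B − C) x− = b1 − b2, two half-size systems solved independently. A dense NumPy sketch with toy blocks (not a discretised Poisson operator):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50

# Blocks of a system with one mirror symmetry: A = [[B, C], [C, B]].
# Strong diagonal keeps B + C and B - C comfortably invertible.
B = rng.normal(size=(n, n)) + 4 * n * np.eye(n)
C = rng.normal(size=(n, n))
b1, b2 = rng.normal(size=n), rng.normal(size=n)

# Decoupled half-size subsystems (2**s = 2 of them for s = 1):
x_plus = np.linalg.solve(B + C, b1 + b2)    # solves for x1 + x2
x_minus = np.linalg.solve(B - C, b1 - b2)   # solves for x1 - x2
x1 = 0.5 * (x_plus + x_minus)
x2 = 0.5 * (x_plus - x_minus)

# Cross-check against the full coupled system.
A = np.block([[B, C], [C, B]])
x_ref = np.linalg.solve(A, np.concatenate([b1, b2]))
```

Each subsystem involves only half the unknowns, which is where the iteration-count and memory reductions reported above originate; for s symmetries the same construction yields 2^s such subsystems.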
- Published
- 2023
16. Memory Scaling of Cloud-Based Big Data Systems: A Hybrid Approach
- Author
-
Feng Yan, Dongfang Zhao, Ke Wang, Xinying Wang, and Cong Xu
- Subjects
Information Systems and Management, Computer science, Distributed computing, Big data, Cloud computing, Trial and error, Footprint, Virtual memory, Memory footprint, Overhead (computing), Representation (mathematics), Information Systems - Abstract
When deploying memory-intensive applications to public clouds, one important yet challenging question to answer is how to select a specific instance type whose memory capacity is large enough to prevent out-of-memory errors while the cost is minimized without violating performance requirements. The state-of-the-practice solution is trial and error, causing both performance overhead and additional monetary cost. This paper investigates two memory scaling mechanisms in public clouds: physical memory (good performance and high cost) and virtual memory (degraded performance and no additional cost). In order to analyze the trade-off between performance and cost of the two scaling options, a performance-cost model is developed that is driven by a lightweight analytic prediction approach through a compact representation of the memory footprint. In addition, for those scenarios when the footprint is unavailable, a meta-model-based prediction method is proposed using just-in-time migration mechanisms. The proposed techniques have been extensively evaluated with various benchmarks and real-world applications on Amazon Web Services: the performance-cost model is highly accurate and the proposed just-in-time migration approach reduces the monetary cost by up to 66%.
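The trade-off this abstract formalises can be caricatured in a few lines: given a predicted memory footprint, either pay for a larger instance (physical scaling) or accept a swap-induced slowdown on a cheaper one (virtual scaling), and pick whichever minimises the cost of the run. All instance names, prices, and the linear swap-penalty model below are invented for illustration; the paper's model is analytic and prediction-driven.

```python
# Hypothetical instance catalogue: (name, memory in GB, $ per hour).
INSTANCES = [("small", 8, 0.10), ("medium", 16, 0.20), ("large", 32, 0.40)]

def run_cost(footprint_gb, base_hours, mem_gb, price_per_hour, swap_penalty=3.0):
    """Monetary cost of one run on a given instance type."""
    if footprint_gb <= mem_gb:
        hours = base_hours                  # physical scaling: footprint fits
    else:
        spilled = (footprint_gb - mem_gb) / footprint_gb
        hours = base_hours * (1 + swap_penalty * spilled)  # virtual scaling
    return hours * price_per_hour

def cheapest(footprint_gb, base_hours):
    """Pick the instance type minimising predicted cost for this footprint."""
    return min(INSTANCES,
               key=lambda inst: run_cost(footprint_gb, base_hours, inst[1], inst[2]))

best = cheapest(14, base_hours=2.0)   # a 14 GB job on the toy catalogue
```

Replacing trial and error with such a model is exactly the decision the paper automates, with measured footprints and a calibrated performance penalty in place of these made-up constants.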
- Published
- 2022
- Full Text
- View/download PDF
17. Optimizing a Certified Proof Checker for a Large-Scale Computer-Generated Proof
- Author
-
Cruz-Filipe, Luís, Schneider-Kamp, Peter, Kerber, Manfred, editor, Carette, Jacques, editor, Kaliszyk, Cezary, editor, Rabe, Florian, editor, and Sorge, Volker, editor
- Published
- 2015
- Full Text
- View/download PDF
18. Revisiting Volgenant-Jonker for Approximating Graph Edit Distance
- Author
-
Jones, William, Chawdhary, Aziem, King, Andy, Liu, Cheng-Lin, editor, Luo, Bin, editor, Kropatsch, Walter G., editor, and Cheng, Jian, editor
- Published
- 2015
- Full Text
- View/download PDF
19. Profiling and Analysis of Dynamic Applications
- Author
-
Atienza Alonso, David, Mamagkakis, Stylianos, Poucet, Christophe, Peón-Quirós, Miguel, Bartzas, Alexandros, Catthoor, Francky, and Soudris, Dimitrios
- Published
- 2015
- Full Text
- View/download PDF
20. Adaptive Techniques for Minimizing Middleware Memory Footprint for Distributed, Real-Time, Embedded Systems
- Author
-
Panahi, Mark, Harmon, Trevor, and Klefstad, Raymond
- Subjects
distributed systems ,embedded systems ,real-time systems ,CORBA ,middleware ,aspect-oriented programming ,memory footprint - Abstract
In order for middleware to be widely useful for distributed, real-time, and embedded systems, it should provide a full set of services and be easily customizable to meet the memory footprint limitations of embedded systems. In this paper, we examine a variety of techniques used to reduce memory footprint in middleware. We found that combining aspect-oriented programming with code shrinkers and obfuscators reduces the memory footprint of CORBA middleware to
- Published
- 2003
21. AxRLWE: A Multilevel Approximate Ring-LWE Co-Processor for Lightweight IoT Applications
- Author
-
Dur-e-Shahwar Kundi, Maire O'Neill, Weiqiang Liu, Chenghua Wang, Song Bian, and Ayesha Khalid
- Subjects
Coprocessor, Standardization, Computer Networks and Communications, Computer science, Cryptography, Computer Science Applications, CMOS, Application-specific integrated circuit, Computer engineering, Hardware and Architecture, Signal Processing, Memory footprint, NIST, Field-programmable gate array, Information Systems - Abstract
This work presents a multi-level approximation exploration undertaken on the Ring-Learning-with-Errors (R-LWE) based Public-key Cryptographic (PKC) schemes that belong to quantum-resilient cryptography algorithms. Among the various quantum-resilient cryptography schemes proposed in NIST's ongoing Post-quantum Cryptography (PQC) standardization plan, the lattice-based LWE schemes have emerged as the most viable and preferred class for IoT applications due to their compact area and memory footprint compared to other alternatives. However, compared to the classical schemes used today, R-LWE is a much harder challenge to fit on embedded IoT (end-node) devices, due to their stricter resource constraints (lower area, memory, energy budgets) as well as their limited computational capabilities. To the best of our knowledge, this is the first endeavour exploring the inherent approximate nature of the LWE problem to undertake a multi-level Approximate R-LWE (AxRLWE) architecture, with respective security estimates, suited to lightweight IoT devices. Undertaking AxRLWE on Field Programmable Gate Arrays (FPGAs), we benchmarked a 64% area reduction compared to earlier accurate R-LWE designs at the cost of reduced quantum security. For Application Specific Integrated Circuits (ASICs) with 45nm CMOS technology, AxRLWE was benchmarked to fit well within the same area budget as a lightweight ECC processor and to consume a third of the energy of the special class of R-Binary LWE (R-BLWE) designs proposed for IoT, with a better security level.
- Published
- 2022
- Full Text
- View/download PDF
22. Optimized training and scalable implementation of Conditional Deep Neural Networks with early exits for Fog-supported IoT applications.
- Author
-
Baccarelli, Enzo, Scardapane, Simone, Scarpiniti, Michele, Momenzadeh, Alireza, and Uncini, Aurelio
- Subjects
- *
GREEDY algorithms , *BIG data , *VIDEO coding - Abstract
The incoming IoT big data era requires efficient and resource-constrained mining of large sets of distributed data. This paper explores a possible approach to this end, combining the two emerging paradigms of Conditional Neural Networks with early exits and Fog Computing. Apart from describing the general framework, we provide four specific contributions. First, after reviewing the basic architectures of CDNNs with early exits and characterizing their computational capacity, we consider three basic algorithms for their supervised training (namely the End-to-End, Layer-Wise and Classifier-Wise training algorithms), and, then, formally characterize and compare the resulting tradeoffs in a Fog-supported implementation. Second, after presenting a reference architecture for the local classifiers equipping the considered CDNNs, we develop an optimized framework for the parallel and distributed setting of their decision thresholds. Third, we propose a greedy algorithm for placing the early exits efficiently on the considered CDNNs and prove its linear scaling complexity. Fourth, we analytically characterize in closed-form and analyze the energy performance of the optimal CDNN-onto-Fog mapping. Finally, extensive numerical tests are presented, in order to test and compare the energy-vs.-implementation complexity-vs.-accuracy performance of the resulting optimized CDNN-over-Fog platforms under the IoT-oriented SVHN and FER-2013 datasets. [ABSTRACT FROM AUTHOR]
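The early-exit mechanism underlying the considered CDNNs can be sketched independently of the paper's optimized threshold-setting framework: each local classifier emits a prediction, and inference stops at the first exit whose confidence clears that exit's decision threshold, saving the computation of the remaining layers. A toy NumPy version (the logits and thresholds here are arbitrary illustrations, not the paper's optimized values):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())   # shift for numerical stability
    return e / e.sum()

def early_exit_predict(exit_logits, thresholds):
    """Return (class, exit index) from the first sufficiently confident
    local classifier; fall through to the final exit otherwise."""
    for k, (logits, tau) in enumerate(zip(exit_logits, thresholds)):
        p = softmax(logits)
        if p.max() >= tau:                  # confident enough: stop here,
            return int(p.argmax()), k       # skipping all deeper layers
    p = softmax(exit_logits[-1])            # defensive: last exit always answers
    return int(p.argmax()), len(exit_logits) - 1

# An easy input: the first local classifier is already confident.
cls, k = early_exit_predict([np.array([4.0, 0.0]),
                             np.array([1.0, 2.0])], thresholds=[0.9, 0.0])
```

Setting the per-exit thresholds is the part the paper optimizes in a parallel and distributed fashion; this sketch only shows the inference-time control flow that the thresholds govern.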
- Published
- 2020
- Full Text
- View/download PDF
23. Alternating Direction Method of Multipliers for Hierarchical Basis Approximators
- Author
-
Khakhutskyy, Valeriy, Pflüger, Dirk, Garcke, Jochen, editor, and Pflüger, Dirk, editor
- Published
- 2014
- Full Text
- View/download PDF
24. An Automated Approach for Estimating the Memory Footprint of Non-linear Data Objects
- Author
-
Dreßler, Sebastian, Steinke, Thomas, an Mey, Dieter, editor, Alexander, Michael, editor, Bientinesi, Paolo, editor, Cannataro, Mario, editor, Clauss, Carsten, editor, Costan, Alexandru, editor, Kecskemeti, Gabor, editor, Morin, Christine, editor, Ricci, Laura, editor, Sahuquillo, Julio, editor, Schulz, Martin, editor, Scarano, Vittorio, editor, Scott, Stephen L., editor, and Weidendorfer, Josef, editor
- Published
- 2014
- Full Text
- View/download PDF
25. Faster Canvas Picking: Colt 'MainRoach' McAnlis, Developer Advocate, Google
- Author
-
McAnlis, Colt, Lubbers, Petter, Jones, Brandon, Tebbs, Duncan, Manzur, Andrzej, Bennett, Sean, d’Erfurth, Florian, Garcia, Bruno, Lin, Shun, Popelyshev, Ivan, Gauci, Jason, Howard, Jon, Ballantyne, Ian, Freeman, Jesse, Kihira, Takuo, Smith, Tyler, Olmstead, Don, McCutchan, John, Austin, Chad, and Pagella, Andres
- Published
- 2014
- Full Text
- View/download PDF
26. Distributed Redundant Placement for Microservice-based Applications at the Edge
- Author
-
Shuiguang Deng, Zijie Liu, Jianwei Yin, Hailiang Zhao, and Schahram Dustdar
- Subjects
FOS: Computer and information sciences ,020203 distributed computing ,Information Systems and Management ,Edge device ,Computer Networks and Communications ,Computer science ,business.industry ,Distributed computing ,020206 networking & telecommunications ,Cloud computing ,02 engineering and technology ,Computer Science Applications ,Scheduling (computing) ,Software Engineering (cs.SE) ,Computer Science - Software Engineering ,Computer Science - Distributed, Parallel, and Cluster Computing ,Hardware and Architecture ,High availability ,0202 electrical engineering, electronic engineering, information engineering ,Memory footprint ,Redundancy (engineering) ,Stochastic optimization ,Distributed, Parallel, and Cluster Computing (cs.DC) ,business ,Edge computing - Abstract
Multi-access Edge Computing (MEC) is booming as a promising paradigm that pushes computation and communication resources from the cloud to the network edge to provide services and perform computations. With container technologies, mobile devices with a small memory footprint can run composite microservice-based applications without time-consuming backbone transmissions. Service placement at the edge is of importance to put MEC from theory into practice. However, current state-of-the-art research does not sufficiently take the composite property of services into consideration. Besides, although Kubernetes has certain abilities to heal container failures, high availability cannot be ensured due to the heterogeneity and variability of edge sites. To deal with these problems, we propose a distributed redundant placement framework, SAA-RP, and a GA-based Server Selection (GASS) algorithm for microservice-based applications with a sequential combinatorial structure. We formulate a stochastic optimization problem that accounts for the uncertainty of microservice requests, and then decide, for each microservice, how many instances to deploy and on which edge sites to place them. Benchmark policies are implemented in two scenarios, where redundancy is and is not allowed, respectively. Numerical results based on a real-world dataset verify that GASS significantly outperforms all the benchmark policies.
- Published
- 2022
- Full Text
- View/download PDF
27. A 12.1 TOPS/W Quantized Network Acceleration Processor With Effective-Weight-Based Convolution and Error-Compensation-Based Prediction
- Author
-
Wenping Zhu, Ang Li, Shouyi Yin, Wenjing Hu, Leibo Liu, Qiang Li, Huiyu Mo, and Shaojun Wei
- Subjects
Acceleration ,Computer science ,Pipeline (computing) ,Memory footprint ,Multiplication ,Electrical and Electronic Engineering ,Residual ,Algorithm ,Energy (signal processing) ,Efficient energy use ,Convolution - Abstract
In this article, a quantized network acceleration processor (QNAP) is proposed to efficiently accelerate CNN processing by eliminating most unessential operations based on algorithm-hardware co-optimizations. First, an effective-weight-based convolution (EWC) is proposed to distinguish a group of effective weights (EWs) to replace the other unique weights. Therefore, the input activations corresponding to the same EW can be accumulated first and then multiplied by the EW to reduce the number of multiplication operations, which is efficiently supported by the dedicated process elements in QNAP. The experimental results show that energy efficiency is improved by 1.59x-3.20x compared with different UCNN implementations. Second, an error-compensation-based prediction (ECP) method adopts trained compensated values to replace partly unimportant partial sums to further reduce potentially redundant addition operations caused by the ReLU function. Compared with SnaPEA and Pred on AlexNet, 1.23x and 1.75x higher energy efficiencies (TOPS/W) are achieved by ECP, respectively, with marginal accuracy loss. Third, the residual pipeline mode is proposed to efficiently implement residual blocks with a 1.5x lower memory footprint, a 1.18x lower power consumption, and a 13.15% higher hardware utilization on average than existing works. Finally, the QNAP processor is fabricated in the TSMC 28-nm CMOS process with a core area of 1.9 mm². Benchmarked with AlexNet, VGGNet, GoogLeNet, and ResNet on ImageNet at 470 MHz and 0.9 V, the processor achieves 117.4 frames per second with 131.6-mW power consumption on average, which outperforms the state-of-the-art processors by 1.77x-24.20x in energy efficiency.
- Published
- 2022
- Full Text
- View/download PDF
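The effective-weight trick described in the abstract above — accumulate the activations that share a weight value first, then multiply once per unique weight — can be sketched for a single dot product (a simplification; the processor applies it inside full convolutions in hardware):

```python
def ewc_dot(acts, weights):
    """Dot product when weights take few distinct ('effective') values:
    still one addition per element, but only one multiply per unique weight."""
    partial = {}
    for a, w in zip(acts, weights):
        partial[w] = partial.get(w, 0.0) + a   # group activations by shared weight
    return sum(w * s for w, s in partial.items())
```

With n inputs and k unique weights, the multiply count drops from n to k, which is exactly the saving quantization makes possible.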
28. Mixed-Precision Kernel Recursive Least Squares
- Author
-
Dimitrios S. Nikolopoulos, Hans Vandierendonck, and JunKyu Lee
- Subjects
Kernel recursive least squares ,Series (mathematics) ,Computer Networks and Communications ,Computation ,Chaotic ,Online machine learning ,02 engineering and technology ,Computer Science Applications ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,Memory footprint ,020201 artificial intelligence & image processing ,Algorithm ,Nonlinear regression ,Throughput (business) ,Software ,Mathematics - Abstract
Kernel recursive least squares (KRLS) is a widely used online machine learning algorithm for time series prediction. In this article, we present mixed-precision KRLS, which produces prediction accuracy equivalent to double-precision KRLS with higher training throughput and a lower memory footprint. Mixed-precision KRLS applies single-precision arithmetic to the computation components that are not only numerically resilient but also computationally intensive. Our mixed-precision KRLS demonstrates 1.32x, 1.15x, 1.29x, 1.09x, and 1.08x training throughput improvements using 24.95%, 24.74%, 24.89%, 24.48%, and 24.20% less memory, without losing any prediction accuracy compared to double-precision KRLS, for a 3-D nonlinear regression, a Lorenz chaotic time series, a Mackey-Glass chaotic time series, a sunspot number time series, and a sea surface temperature time series, respectively.
- Published
- 2022
- Full Text
- View/download PDF
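The mixed-precision split above can be illustrated in plain Python by simulating single precision with a `struct` round-trip: kernel evaluations (the compute-intensive, numerically resilient part) run in float32, while the learned coefficients stay in double. This is only an illustrative sketch of the precision split, not the paper's KRLS update equations:

```python
import math
import struct

def f32(x):
    """Round a Python double to the nearest IEEE-754 single-precision value."""
    return struct.unpack('f', struct.pack('f', x))[0]

def predict(x, centers, alphas, gamma=1.0):
    """KRLS-style prediction: Gaussian kernel evaluations in simulated
    single precision, coefficient accumulation in double precision."""
    y = 0.0
    for c, a in zip(centers, alphas):
        k = f32(math.exp(-gamma * f32((x - c) ** 2)))  # float32 kernel
        y += a * k                                      # float64 accumulation
    return y
```

The float32 kernel introduces a relative error around 1e-7, far below typical prediction noise, which is why accuracy is preserved.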
29. Lime: Low-Cost and Incremental Learning for Dynamic Heterogeneous Information Networks
- Author
-
Zheng Wang, Lifang He, Renyu Yang, Philip S. Yu, Jianxin Li, Albert Y. Zomaya, Raj Ranjan, and Hao Peng
- Subjects
Exploit ,Computer science ,Node (networking) ,Distributed computing ,02 engineering and technology ,Semantics ,ENCODE ,020202 computer hardware & architecture ,Theoretical Computer Science ,Evolving networks ,Computational Theory and Mathematics ,Hardware and Architecture ,0202 electrical engineering, electronic engineering, information engineering ,Memory footprint ,Anomaly detection ,Representation (mathematics) ,Software - Abstract
Understanding the interconnected relationships of large-scale information networks like social, scholar and Internet of Things networks is vital for tasks like recommendation and fraud detection. The vast majority of real-world networks are inherently heterogeneous and dynamic, containing many different types of nodes and edges, and can change drastically over time. The dynamicity and heterogeneity make it extremely challenging to reason about the network structure. Unfortunately, existing approaches are inadequate in modeling real-life networks, as they require extensive computational resources and do not scale well to large, dynamically evolving networks. We introduce LIME, a better approach for modeling dynamic and heterogeneous information networks. LIME is designed to extract high-quality network representations with significantly lower memory resources and computational time than the state-of-the-art. Unlike prior work that uses a vector to encode each network node, we exploit the semantic relationships among network nodes to encode multiple nodes with similar semantics in shared vectors. We evaluate LIME by applying it to three representative network-based tasks, node classification, node clustering and anomaly detection, on three large-scale datasets. Our extensive experiments demonstrate that LIME not only reduces the memory footprint by over 80% and computational time by over 2x when learning network representations but also delivers comparable performance for downstream processing tasks.
- Published
- 2022
- Full Text
- View/download PDF
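The shared-vector idea above can be sketched as an embedding table keyed by semantic cluster rather than by node, so storage grows with the number of clusters, not the number of nodes. The cluster assignment is assumed given here; discovering it is the part LIME actually contributes:

```python
import random

def shared_embeddings(node_to_cluster, dim, seed=0):
    """One stored vector per semantic cluster; every node maps to its
    cluster's vector, so memory scales with clusters, not nodes."""
    rng = random.Random(seed)
    table = {c: [rng.random() for _ in range(dim)]
             for c in sorted(set(node_to_cluster.values()))}
    return {n: table[c] for n, c in node_to_cluster.items()}, table
```

With millions of nodes collapsing onto thousands of clusters, the >80% memory reduction reported above becomes plausible from the storage arithmetic alone.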
30. Design Prototype and Security Analysis of a Lightweight Joint Compression and Encryption Scheme for Resource-Constrained IoT Devices
- Author
-
Qi Zhang and Gajraj Kuldeep
- Subjects
information security ,Computer Networks and Communications ,Computer science ,Internet of Things ,compressive sensing ,Encryption ,Data security ,Cryptography ,Length measurement ,Energy measurement ,signal reconstruction ,energy efficiency ,Secure channel ,Sensors ,Signal reconstruction ,business.industry ,joint compression and encryption ,sensing matrix generation ,Computer Science Applications ,Hardware and Architecture ,Encoding ,Sparse matrices ,Signal Processing ,Memory footprint ,business ,Energy (signal processing) ,Computer hardware ,Information Systems ,Efficient energy use - Abstract
Compressive sensing (CS) can provide joint compression and encryption, which is promising to address the challenges of massive sensor data and data security in the Internet of Things (IoT). However, as IoT devices have constrained memory, computing power, and energy, in practice the CS-based computationally secure scheme is shown to be vulnerable to ciphertext-only attack for short signal length. Although the CS-based perfectly secure scheme has no such vulnerabilities, its practical realization is challenging. In this paper, we propose an energy concealment (EC) encryption scheme, a practical realization of the perfectly secure scheme by concealing energy, thereby removing the requirement of an additional secure channel. We propose three different methods to generate sensing matrix to improve energy efficiency using linear feedback shift registers and lagged Fibonacci sequences. Leveraging the signal’s maximum energy in the EC scheme, we design a new measure to evaluate reconstructed signal quality without the knowledge of the original signal. Furthermore, a new CS decoding algorithm is designed by incorporating the knowledge of maximum energy at the decoder, which improves the signal reconstruction quality while reducing the number of measurements. Additionally, our comprehensive security analysis shows that the EC scheme is secure against various cryptographic attacks. We implement the EC scheme using the three different ways of generating the sensing matrix in the resource-constrained TelosB mote using the Contiki operating system. The experimental results demonstrate that the EC scheme outperforms AES in terms of code memory footprint and total energy consumption.
- Published
- 2022
- Full Text
- View/download PDF
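One of the paper's themes above is regenerating the sensing matrix from a short seed (e.g., via a linear feedback shift register) instead of storing it, which is what keeps the memory footprint small on a TelosB-class device. A minimal sketch, assuming a Fibonacci-style LFSR whose taps and width are illustrative, not the paper's parameters:

```python
def lfsr_stream(seed, n, taps=(0, 2, 3, 5), width=16):
    """Fibonacci LFSR bit stream; taps/width here are illustrative only."""
    state, bits = seed, []
    for _ in range(n):
        bits.append(state & 1)
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1          # XOR the tapped bits
        state = (state >> 1) | (fb << (width - 1))
    return bits

def bipolar_sensing_matrix(m, n, seed=0xACE1):
    """m x n {-1, +1} sensing matrix regenerated on the fly from a seed,
    so the full matrix never has to be stored on the device."""
    bits = lfsr_stream(seed, m * n)
    return [[1 if bits[i * n + j] else -1 for j in range(n)] for i in range(m)]
```

Sender and receiver only need to share the seed to reconstruct the same matrix, which is also what makes the matrix usable as keying material in the joint compression-encryption setting.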
31. Nonlinear MPC of a Heavy-Duty Diesel Engine With Learning Gaussian Process Regression
- Author
-
Knut Graichen, Jens Niemeyer, Daniel Bergmann, and Karsten Harder
- Subjects
symbols.namesake ,Nonlinear system ,Control and Systems Engineering ,Control theory ,Computer science ,Kriging ,Path (graph theory) ,symbols ,Memory footprint ,Electrical and Electronic Engineering ,Diesel engine ,Gaussian process ,Smoothing - Abstract
This contribution presents a method for modeling and controlling a heavy-duty biturbocharged diesel engine. The modeling scheme can incorporate expert knowledge of the control relevant combustion quantities into Gaussian process models. A nonlinear model predictive controller (MPC) is used to control the engine outputs subject to the gas path dynamics and nonlinear constraints for the emissions and for the sake of engine protection. In addition, an online learning scheme based on Gaussian process regression is used to compensate for model uncertainties due to aging effects and manufacturing tolerances. A consistent model smoothing strategy is derived to preserve the given expert knowledge and to avoid abrupt reactions of the MPC due to the online learning of the models. All parts of the controller are implemented with respect to real-time feasibility and small memory footprint. Experimental results for a real-world heavy-duty engine demonstrate the performance and the online learning ability of the presented nonlinear MPC scheme that may be transferred to various diesel engine applications.
- Published
- 2022
- Full Text
- View/download PDF
32. SPRING: A Sparsity-Aware Reduced-Precision Monolithic 3D CNN Accelerator Architecture for Training and Inference
- Author
-
Ye Yu and Niraj K. Jha
- Subjects
FOS: Computer and information sciences ,Computational complexity theory ,Computer science ,02 engineering and technology ,01 natural sciences ,Convolutional neural network ,Bottleneck ,Hardware Architecture (cs.AR) ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Computer Science (miscellaneous) ,Computer Science - Hardware Architecture ,010302 applied physics ,business.industry ,Deep learning ,Memory bandwidth ,020202 computer hardware & architecture ,Computer Science Applications ,Human-Computer Interaction ,Computer engineering ,Memory footprint ,Hardware acceleration ,Artificial intelligence ,Performance improvement ,business ,Information Systems - Abstract
CNNs outperform traditional machine learning algorithms across a wide range of applications. However, their computational complexity makes it necessary to design efficient hardware accelerators. Most CNN accelerators focus on exploring dataflow styles that exploit computational parallelism. However, the potential performance speedup from sparsity has not been adequately addressed. The computation and memory footprint of CNNs can be significantly reduced if sparsity is exploited in network evaluations. To take advantage of sparsity, some accelerator designs explore sparsity encoding and evaluation on CNN accelerators. However, sparsity encoding is typically performed only on activations or weights, and only during inference. It has been shown that activations and weights also have high sparsity levels during training. Hence, sparsity-aware computation should also be considered in training. To further improve performance and energy efficiency, some accelerators evaluate CNNs with limited precision. However, this is limited to inference, since reduced precision sacrifices network accuracy if used in training. In addition, CNN evaluation is usually memory-intensive, especially in training. In this paper, we propose SPRING, a SParsity-aware Reduced-precision Monolithic 3D CNN accelerator for trainING and inference. SPRING supports both CNN training and inference. It uses a binary mask scheme to encode sparsities in activations and weights. It uses the stochastic rounding algorithm to train CNNs with reduced precision without accuracy loss. To alleviate the memory bottleneck in CNN evaluation, especially in training, SPRING uses an efficient monolithic 3D NVM interface to increase memory bandwidth. Compared to GTX 1080 Ti, SPRING achieves 15.6X, 4.2X and 66.0X improvements in performance, power reduction, and energy efficiency, respectively, for CNN training, and 15.5X, 4.5X and 69.1X improvements for inference.
- Published
- 2022
- Full Text
- View/download PDF
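Stochastic rounding, which the abstract above credits for reduced-precision training without accuracy loss, rounds up with probability equal to the fractional remainder so the quantization error is zero-mean. A minimal sketch of the algorithm itself (SPRING implements it in hardware):

```python
import math
import random

def stochastic_round(x, step=1.0, rng=random):
    """Round x to a multiple of `step`, rounding up with probability equal
    to the fractional remainder; the expected result equals x."""
    q = x / step
    lo = math.floor(q)
    return (lo + (1 if rng.random() < q - lo else 0)) * step
```

Unbiasedness is the key property: small gradient updates that deterministic rounding would always flush to zero still accumulate correctly in expectation.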
33. Energy-Quality Scalable Monocular Depth Estimation on Low-Power CPUs
- Author
-
Valentino Peluso, Fabio Tosi, Stefano Mattoccia, Antonio Cipolletta, Andrea Calimera, Filippo Aleotti, Matteo Poggi, Cipolletta, Antonio, Peluso, Valentino, Calimera, Andrea, Poggi, Matteo, Tosi, Fabio, Aleotti, Filippo, and Mattoccia, Stefano
- Subjects
Computer Networks and Communications ,Computer science ,business.industry ,Deep learning ,Process (computing) ,deep learning ,monocular depth estimation ,Program optimization ,Convolutional neural networks (CNNs) ,Convolutional neural network ,Computer Science Applications ,Computer engineering ,Hardware and Architecture ,Signal Processing ,Scalability ,Memory footprint ,Monocular Depth Estimation, Energy-Quality Scaling, Embedded Systems, Low-Power CPUs, Convolutional Neural Networks, Deep Learning ,embedded systems ,Artificial intelligence ,energy-quality scaling ,low-power CPUs ,business ,Quantization (image processing) ,Information Systems ,Efficient energy use - Abstract
The recent advancements in deep learning have demonstrated that inferring high-quality depth maps from a single image has become feasible and accurate, thanks to convolutional neural networks (CNNs), but how to process such compute- and memory-intensive models on portable and low-power devices remains a concern. Dynamic energy-quality scaling is an interesting yet less explored option in this field. It can improve efficiency through opportunistic computing policies where performances are boosted only when needed, achieving substantial energy savings on average. Implementing such a computing paradigm encompasses the availability of a scalable inference model, which is the target of this work. Specifically, we describe and characterize the design of an energy-quality scalable pyramidal network (EQPyD-Net), a lightweight CNN capable of modulating its computational effort at runtime with minimal memory resources. We describe the architecture of the network and the optimization flow, covering the important aspects that enable the dynamic scaling, namely, the optimized training procedures, the compression stage via fixed-point quantization, and the code optimization for the deployment on commercial low-power CPUs adopted in the edge segment. To assess the effect of the proposed design knobs, we evaluated the prediction quality on the standard KITTI data set and the energy and memory resources on the ARM Cortex-A53 CPU. The collected results demonstrate the flexibility of the proposed network and its energy efficiency. EQPyD-Net can be shifted across five operating points, ranging from a maximum accuracy of 82.2% at 0.4 Frame/J up to 92.6% energy savings at a 6.1% accuracy loss, still keeping a compact memory footprint of 5.2 MB for the weights and 38.3 MB (in the worst case) for the processing.
- Published
- 2022
- Full Text
- View/download PDF
34. Tear the Image Into Strips for Style Transfer
- Author
-
Xiaoyang Zeng, Yujie Huang, Minge Jing, Yibo Fan, and Yuhao Liu
- Subjects
business.industry ,Computer science ,Deep learning ,STRIPS ,Convolutional neural network ,Computer Science Applications ,law.invention ,Transmission (telecommunications) ,Feature (computer vision) ,law ,Signal Processing ,Media Technology ,Memory footprint ,Computer vision ,Artificial intelligence ,Electrical and Electronic Engineering ,Latency (engineering) ,business ,Image resolution - Abstract
Recently, Deep Convolutional Neural Networks (DCNNs) have achieved remarkable progress in the computer vision community, including in style transfer tasks. Normally, most methods feed the full image to the DCNN. Although high-quality results can be achieved in this manner, several underlying problems arise. For one, as image resolution increases, the memory footprint increases dramatically, leading to high latency and massive power consumption. Furthermore, these methods are usually unable to integrate with a commercial image signal processor (ISP), which processes the image in a line-sequential manner. To solve the above problems, we propose a novel ISP-friendly deep learning-based style transfer algorithm: SequentialStyle. A brand new line-sequential processing mode is proposed, where the image is torn into strips and each strip is sequentially processed, contributing to lower memory demand. We further propose a Spatial-Temporal Synergistic (STS) mechanism that decouples the previously simplex 2-D image style transfer into spatial feature processing (in-strip) and temporal correlation transmission (in-between strips). Compared with SOTA style transfer algorithms, experimental results show that our SequentialStyle is competitive. Moreover, SequentialStyle demands less memory, even for images with resolutions of 4K or higher.
- Published
- 2022
- Full Text
- View/download PDF
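The line-sequential mode above can be sketched as a loop that holds only one strip in flight and threads a carried state between strips, a stand-in for the paper's temporal-correlation transmission. The per-strip function `f` is an assumption of this sketch, not the paper's network:

```python
def process_in_strips(image, strip_height, f, state=None):
    """Process an image (list of rows) strip by strip; only one strip is
    resident at a time, and `state` carries context between strips."""
    out = []
    for top in range(0, len(image), strip_height):
        strip = image[top:top + strip_height]
        strip_out, state = f(strip, state)   # f returns (rows, new state)
        out.extend(strip_out)
    return out
```

Peak activation memory is one strip rather than the full frame, which is what makes the scheme compatible with line-sequential ISP pipelines.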
35. Improving motion‐mask segmentation in thoracic CT with multiplanar U‐nets
- Author
-
Nicolas Pinon, Maciej Orkisz, Jean-Christophe Richard, Ludmilla Penarrubia, Eduardo Enrique Dávila Serrano, David Sarrut, Emmanuel Roux, Modeling & analysis for medical imaging and Diagnosis (MYRIAD), Centre de Recherche en Acquisition et Traitement de l'Image pour la Santé (CREATIS), Université Jean Monnet [Saint-Étienne] (UJM)-Hospices Civils de Lyon (HCL)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Centre National de la Recherche Scientifique (CNRS)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Jean Monnet [Saint-Étienne] (UJM)-Hospices Civils de Lyon (HCL)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Centre National de la Recherche Scientifique (CNRS)-Institut National de la Santé et de la Recherche Médicale (INSERM), Service de Réanimation Médicale, Hôpital de la Croix-Rousse [CHU - HCL], Hospices Civils de Lyon (HCL)-Hospices Civils de Lyon (HCL), and Imagerie Tomographique et Radiothérapie
- Subjects
Lung Neoplasms ,Computer science ,Image registration ,030218 nuclear medicine & medical imaging ,03 medical and health sciences ,0302 clinical medicine ,Robustness (computer science) ,Carcinoma, Non-Small-Cell Lung ,[INFO.INFO-IM]Computer Science [cs]/Medical Imaging ,Image Processing, Computer-Assisted ,Humans ,Segmentation ,Generalizability theory ,Computer vision ,Four-Dimensional Computed Tomography ,Artificial neural network ,SARS-CoV-2 ,business.industry ,Deep learning ,Process (computing) ,COVID-19 ,General Medicine ,3. Good health ,030220 oncology & carcinogenesis ,Memory footprint ,Artificial intelligence ,business - Abstract
Purpose. Motion-mask segmentation from thoracic computed tomography (CT) images is the process of extracting the region that encompasses lungs and viscera, where large displacements occur during breathing. It has been shown to help image registration between different respiratory phases. This registration step is, for example, useful for radiotherapy planning or calculating local lung ventilation. Knowing the location of motion discontinuity, that is, sliding motion near the pleura, allows better control of the registration, preventing unrealistic estimates. Nevertheless, existing methods for motion-mask segmentation are not robust enough to be used in clinical routine. This article shows that it is feasible to overcome this lack of robustness by using a lightweight deep-learning approach usable on a standard computer, and this even without data augmentation or advanced model design. Methods. A convolutional neural-network architecture with three 2D U-nets for the three main orientations (sagittal, coronal, axial) was proposed. Predictions generated by the three U-nets were combined by majority voting to provide a single 3D segmentation of the motion mask. The networks were trained on a database of non-small cell lung cancer 4D CT images of 43 patients. Training and evaluation were done with a K-fold cross-validation strategy. Evaluation was based on a visual grading by two experts according to the appropriateness of the segmented motion mask for the registration task, and on a comparison with motion masks obtained by a baseline method using level sets. A second database (76 CT images of patients with early-stage COVID-19), unseen during training, was used to assess the generalizability of the trained neural network. Results.
The proposed approach outperformed the baseline method in terms of quality and robustness: the success rate increased from … to … without producing any failure. It also achieved a speed-up factor of 60 with GPU, or 17 with CPU. The memory footprint was low: less than 5 GB GPU RAM for training and less than 1 GB GPU RAM for inference. When evaluated on a dataset with images differing by several characteristics (CT device, pathology, and field of view), the proposed method improved the success rate from … to …. Conclusion. With 5-s processing time on a mid-range GPU and success rates around …, the proposed approach seems fast and robust enough to be routinely used in clinical practice. The success rate can be further improved by incorporating more diversity in training data via data augmentation and additional annotated images from different scanners and diseases. The code and trained model are publicly available.
- Published
- 2021
- Full Text
- View/download PDF
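The multiplanar combination step described above is a voxel-wise 2-of-3 vote over the sagittal, coronal and axial U-net predictions. A minimal sketch over flattened binary masks:

```python
def majority_vote(sag, cor, ax):
    """Voxel-wise 2-of-3 vote combining three binary segmentations
    (here flattened to 1-D lists) into a single mask."""
    return [int(a + b + c >= 2) for a, b, c in zip(sag, cor, ax)]
```

The vote suppresses orientation-specific errors: a voxel survives only when at least two of the three planar networks agree.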
36. AutoRank: Automated Rank Selection for Effective Neural Network Customization
- Author
-
Mohammad Samragh, Mojan Javaheripi, and Farinaz Koushanfar
- Subjects
Artificial neural network ,Computer science ,business.industry ,Deep learning ,Distributed computing ,Rank (computer programming) ,Inference ,Personalization ,Task (project management) ,Memory footprint ,Decomposition (computer science) ,Artificial intelligence ,Electrical and Electronic Engineering ,business - Abstract
Tensor decomposition is a promising approach for low-power and real-time application of neural networks on resource-constrained embedded devices. This paper proposes AutoRank, an end-to-end framework for customizing neural network decomposition using cross-layer rank selection. For many-layer networks, determining the optimal decomposition ranks is a cumbersome task. To overcome this challenge, we establish a state-action-reward system that effectively absorbs inference accuracy and platform specifications into the rank-selection policy. Our proposed framework brings platform characteristics and performance into the customization loop to enable direct incorporation of hardware cost, e.g., runtime and memory footprint. By means of this hardware-awareness, the AutoRank customization engine delivers highly accurate decomposed deep neural networks with low execution cost. Our framework minimizes the engineering cost associated with rank selection by providing an automated API for AutoRank that is compatible with popular deep learning libraries and can be readily used by developers.
- Published
- 2021
- Full Text
- View/download PDF
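The cost tradeoff behind rank selection above is simple arithmetic: factoring an m x n weight matrix at rank r replaces m*n parameters with r*(m+n). A greedy budget-driven picker, shown here as an illustrative stand-in for AutoRank's learned state-action-reward policy:

```python
def factored_params(m, n, r):
    """Parameter count of an m x n matrix factored as (m x r)(r x n)."""
    return r * (m + n)

def max_rank_under_budget(m, n, budget):
    """Largest rank whose factored form fits the parameter budget --
    a greedy stand-in for a learned rank-selection policy."""
    r = min(m, n)
    while r > 0 and factored_params(m, n, r) > budget:
        r -= 1
    return r
```

The framework's contribution is choosing per-layer ranks jointly against accuracy and hardware cost; this sketch only captures the memory side of that objective.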
37. Non-overlapping geometric shadow map
- Author
-
Erison Miller Santos Mesquita, Joaquim Bento Cavalcante-Neto, Creto Augusto Vidal, and Rafael Fernandes Ivo
- Subjects
business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,General Engineering ,Data structure ,Computer Graphics and Computer-Aided Design ,Rendering (computer graphics) ,Human-Computer Interaction ,Shadow ,Memory footprint ,Preprocessor ,Point (geometry) ,Computer vision ,Artificial intelligence ,business ,Shadow mapping ,Clipping (computer graphics) ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
Real-time rendering of pixel-accurate hard shadows in 3D scenes is still a challenging problem for shadow generation techniques. In this paper, we present the concept of non-overlapping geometric shadow map (NOGSM), which enables the construction of high-quality hard shadows from static lights for any point in the scene. The NOGSM is created in a preprocessing phase, in which the scene’s geometry is viewed from the light’s point of view. All hidden surfaces are removed through total elimination of hidden polygons plus clipping of the partially hidden polygons. The resulting list of non-overlapping projected polygons is then stored in a multi-grid data structure. Our solution generates shadows with far better quality than standard shadow mapping techniques, while maintaining real-time performance and using a smaller memory footprint than compression techniques.
- Published
- 2021
- Full Text
- View/download PDF
38. Q-PPG: Energy-Efficient PPG-Based Heart Rate Monitoring on Wearable Devices
- Author
-
Massimo Poncino, Luca Benini, Enrico Macii, Daniele Jahier Pagliari, Alessio Burrello, Simone Benatti, and Matteo Risso
- Subjects
Signal Processing (eess.SP) ,FOS: Computer and information sciences ,Computer Science - Machine Learning ,deep neural networks ,embedded systems ,healthcare ,Heart rate monitoring ,photoplethysmography ,quantization ,wearable devices ,Algorithms ,Artifacts ,Heart Rate ,Signal Processing, Computer-Assisted ,Photoplethysmography ,Wearable Electronic Devices ,Computer Science - Artificial Intelligence ,Computer science ,Design space exploration ,Real-time computing ,Biomedical Engineering ,Machine Learning (cs.LG) ,Acceleration ,Computer-Assisted ,FOS: Electrical engineering, electronic engineering, information engineering ,Electrical Engineering and Systems Science - Signal Processing ,Electrical and Electronic Engineering ,Wearable technology ,business.industry ,Deep learning ,Energy consumption ,Microcontroller ,Artificial Intelligence (cs.AI) ,Signal Processing ,Memory footprint ,Artificial intelligence ,business ,Efficient energy use
Heart Rate (HR) monitoring is increasingly performed in wrist-worn devices using low-cost photoplethysmography (PPG) sensors. However, Motion Artifacts (MAs) caused by movements of the subject's arm affect the performance of PPG-based HR tracking. This is typically addressed by coupling the PPG signal with acceleration measurements from an inertial sensor. Unfortunately, most standard approaches of this kind rely on hand-tuned parameters, which impair their generalization capabilities and their applicability to real data in the field. In contrast, methods based on deep learning, despite their better generalization, are considered to be too complex to deploy on wearable devices. In this work, we tackle these limitations, proposing a design space exploration methodology to automatically generate a rich family of deep Temporal Convolutional Networks (TCNs) for HR monitoring, all derived from a single "seed" model. Our flow involves a cascade of two Neural Architecture Search (NAS) tools and a hardware-friendly quantizer, whose combination yields both highly accurate and extremely lightweight models. When tested on the PPG-Dalia dataset, our most accurate model sets a new state-of-the-art in Mean Absolute Error. Furthermore, we deploy our TCNs on an embedded platform featuring a STM32WB55 microcontroller, demonstrating their suitability for real-time execution. Our most accurate quantized network achieves 4.41 Beats Per Minute (BPM) of Mean Absolute Error (MAE), with an energy consumption of 47.65 mJ and a memory footprint of 412 kB. At the same time, the smallest network that obtains a MAE < 8 BPM, among those generated by our flow, has a memory footprint of 1.9 kB and consumes just 1.79 mJ per inference.
- Published
- 2021
- Full Text
- View/download PDF
39. Structured Ensembles: An approach to reduce the memory footprint of ensemble methods
- Author
-
Simone Scardapane, Jary Pomponi, and Aurelio Uncini
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,continual learning ,deep learning ,ensemble ,neural networks ,pruning ,structured pruning ,uncertainty ,learning ,computer ,Computer Science - Artificial Intelligence ,Computer science ,Cognitive Neuroscience ,Machine Learning (stat.ML) ,Machine learning ,computer.software_genre ,Regularization (mathematics) ,Machine Learning (cs.LG) ,Statistics - Machine Learning ,Artificial Intelligence ,Learning ,Pruning (decision trees) ,Forgetting ,Artificial neural network ,business.industry ,Deep learning ,Uncertainty ,Ensemble learning ,Task (computing) ,Artificial Intelligence (cs.AI) ,Memory footprint ,Neural Networks, Computer ,Artificial intelligence ,business - Abstract
In this paper, we propose a novel ensembling technique for deep neural networks, which is able to drastically reduce the required memory compared to alternative approaches. In particular, we propose to extract multiple sub-networks from a single, untrained neural network by solving an end-to-end optimization task combining differentiable scaling over the original architecture, with multiple regularization terms favouring the diversity of the ensemble. Since our proposal aims to detect and extract sub-structures, we call it Structured Ensemble. In a large experimental evaluation, we show that our method can achieve higher or comparable accuracy to competing methods while requiring significantly less storage. In addition, we evaluate our ensembles in terms of predictive calibration and uncertainty, showing they compare favourably with the state-of-the-art. Finally, we draw a link with the continual learning literature, and we propose a modification of our framework to handle continuous streams of tasks with a sub-linear memory cost. We compare with a number of alternative strategies to mitigate catastrophic forgetting, highlighting advantages in terms of average accuracy and memory. (Article accepted at Neural Networks.)
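As a toy illustration of the structured-pruning idea behind this record (each ensemble member keeps whole units, so the extracted sub-network is genuinely smaller rather than just sparser), here is a sketch in which random scores stand in for the paper's learned differentiable scaling; all names and sizes are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# One over-parameterized hidden layer of 64 units: the shared starting point.
W = rng.normal(size=(64, 16))            # (hidden_units, input_dim)
scores = rng.random((3, 64))             # per-member unit scores (stand-in for learned scaling)

def extract_member(W, member_scores, keep=16):
    """Keep the `keep` highest-scoring units.

    Whole rows are dropped, so each member can be stored as a dense,
    smaller matrix -- the source of the memory saving.
    """
    idx = np.sort(np.argsort(member_scores)[-keep:])
    return W[idx]

members = [extract_member(W, s) for s in scores]
x = rng.normal(size=16)
# Ensemble prediction: average the members' (here: linear) outputs.
ensemble_pred = np.mean([m @ x for m in members], axis=0)
# Each member stores 16 of the original 64 rows: 25% of a full copy.
assert all(m.shape == (16, 16) for m in members)
```

The real method additionally trains the scaling end-to-end with diversity regularizers; this sketch only shows why structured (unit-level) extraction, unlike unstructured weight masking, reduces actual storage.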
- Published
- 2021
- Full Text
- View/download PDF
40. Computational Neuroscience Breakthroughs through Innovative Data Management
- Author
-
Tauheed, Farhan, Nobari, Sadegh, Biveinis, Laurynas, Heinis, Thomas, Ailamaki, Anastasia, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Catania, Barbara, editor, Guerrini, Giovanna, editor, and Pokorný, Jaroslav, editor
- Published
- 2013
- Full Text
- View/download PDF
41. A Multi GPU Read Alignment Algorithm with Model-Based Performance Optimization
- Author
-
Drozd, Aleksandr, Maruyama, Naoya, Matsuoka, Satoshi, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Daydé, Michel, editor, Marques, Osni, editor, and Nakajima, Kengo, editor
- Published
- 2013
- Full Text
- View/download PDF
42. QMC=Chem: A Quantum Monte Carlo Program for Large-Scale Simulations in Chemistry at the Petascale Level and beyond
- Author
-
Scemama, Anthony, Caffarel, Michel, Oseret, Emmanuel, Jalby, William, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Daydé, Michel, editor, Marques, Osni, editor, and Nakajima, Kengo, editor
- Published
- 2013
- Full Text
- View/download PDF
43. Compiler Help for Binary Manipulation Tools
- Author
-
Ince, Tugrul, Hollingsworth, Jeffrey K., Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Caragiannis, Ioannis, editor, Alexander, Michael, editor, Badia, Rosa Maria, editor, Cannataro, Mario, editor, Costan, Alexandru, editor, Danelutto, Marco, editor, Desprez, Frédéric, editor, Krammer, Bettina, editor, Sahuquillo, Julio, editor, Scott, Stephen L., editor, and Weidendorfer, Josef, editor
- Published
- 2013
- Full Text
- View/download PDF
44. Parallel Implementation of the Sherman-Morrison Matrix Inverse Algorithm
- Author
-
He, Xin, Holm, Marcus, Neytcheva, Maya, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Manninen, Pekka, editor, and Öster, Per, editor
- Published
- 2013
- Full Text
- View/download PDF
45. Reducing the Memory Footprint of Parallel Applications with KSM
- Author
-
Rauschmayr, Nathalie, Streit, Achim, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Keller, Rainer, editor, Kramer, David, editor, and Weiss, Jan-Philipp, editor
- Published
- 2013
- Full Text
- View/download PDF
46. Memory-Efficient Deep Learning on a SpiNNaker 2 Prototype
- Author
-
Chen Liu, Guillaume Bellec, Bernhard Vogginger, David Kappel, Johannes Partzsch, Felix Neumärker, Sebastian Höppner, Wolfgang Maass, Steve B. Furber, Robert Legenstein, and Christian G. Mayr
- Subjects
deep rewiring ,pruning ,sparsity ,SpiNNaker ,memory footprint ,parallelism ,Neurosciences. Biological psychiatry. Neuropsychiatry ,RC321-571 - Abstract
The memory requirement of deep learning algorithms is considered incompatible with the memory restriction of energy-efficient hardware. A low memory footprint can be achieved by pruning obsolete connections or reducing the precision of connection strengths after the network has been trained. Yet, these techniques are not applicable when neural networks have to be trained directly on hardware due to the hard memory constraints. Deep Rewiring (DEEP R) is a training algorithm which continuously rewires the network while preserving very sparse connectivity all along the training procedure. We apply DEEP R to a deep neural network implementation on a prototype chip of the 2nd generation SpiNNaker system. The local memory of a single core on this chip is limited to 64 KB and a deep network architecture is trained entirely within this constraint without the use of external memory. Throughout training, the proportion of active connections is limited to 1.3%. On the handwritten digits dataset MNIST, this extremely sparse network achieves 96.6% classification accuracy at convergence. Utilizing the multi-processor feature of the SpiNNaker system, we found very good scaling in terms of computation time, per-core memory consumption, and energy constraints. When compared to an x86 CPU implementation, neural network training on the SpiNNaker 2 prototype improves power and energy consumption by two orders of magnitude.
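The rewiring loop at the heart of DEEP R can be sketched as follows: a fixed budget of active connections is maintained by pruning parameters that cross zero and regrowing randomly chosen dormant ones. This toy version omits DEEP R's noise and regularization terms, and all sizes (here matching the ~1.3% connectivity mentioned above) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_total, n_active = 1000, 13              # ~1.3% of potential connections are active

theta = np.full(n_total, -1.0)            # dormant connections have theta <= 0
active = rng.choice(n_total, n_active, replace=False)
theta[active] = rng.random(n_active)      # active connections carry positive parameters

def rewire_step(theta, grad, lr=0.1):
    """One DEEP-R-style step: update active params, then regrow to keep the count fixed."""
    act = theta > 0
    theta[act] -= lr * grad[act]                      # gradient step on active connections only
    deficit = n_active - np.count_nonzero(theta > 0)  # how many crossed zero and got pruned
    if deficit > 0:
        dormant = np.flatnonzero(theta <= 0)
        reborn = rng.choice(dormant, deficit, replace=False)
        theta[reborn] = 1e-3                          # regrow at random with a tiny weight
    return theta

for _ in range(50):
    theta = rewire_step(theta, rng.normal(size=n_total))
assert np.count_nonzero(theta > 0) == n_active        # the sparsity level is preserved
```

Because the number of active connections never exceeds the budget, peak memory is known at compile time, which is what makes training inside a 64 KB local memory feasible.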
- Published
- 2018
- Full Text
- View/download PDF
47. Gaussian Mixture Background Modelling Optimisation for Micro-controllers
- Author
-
Salvadori, Claudio, Makris, Dimitrios, Petracca, Matteo, Martinez-del-Rincon, Jesus, Velastin, Sergio, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Bebis, George, editor, Boyle, Richard, editor, Parvin, Bahram, editor, Koracin, Darko, editor, Fowlkes, Charless, editor, Wang, Sen, editor, Choi, Min-Hyung, editor, Mantler, Stephan, editor, Schulze, Jürgen, editor, Acevedo, Daniel, editor, Mueller, Klaus, editor, and Papka, Michael, editor
- Published
- 2012
- Full Text
- View/download PDF
48. Optimization: Memory Footprint
- Author
-
Blunden, Bill and Blunden, Bill
- Published
- 2012
- Full Text
- View/download PDF
49. MSAR-Net: Multi-scale attention based light-weight image super-resolution
- Author
-
Subrahmanyam Murala and Nancy Mehta
- Subjects
Pixel ,Channel (digital image) ,Artificial neural network ,Computer science ,business.industry ,Pattern recognition ,Scale factor ,Artificial Intelligence ,Feature (computer vision) ,Signal Processing ,Benchmark (computing) ,Memory footprint ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Software ,Block (data storage) - Abstract
Recently, single image super-resolution (SISR), which aims to recover the structural and textural information lost in a low-resolution input image, has witnessed huge demand from the video and graphics industries. The exceptional success of convolutional neural networks (CNNs) has revolutionized the field of SISR. However, for most CNN-based SISR methods, excessive memory consumption in terms of parameters and FLOPs hinders their application on low-computing-power devices. Moreover, different state-of-the-art SR methods collect different features, treating all pixels as contributing equally to the performance of the network. In this paper, we take into consideration both performance and reconstruction efficiency, and propose a light-weight multi-scale attention residual network (MSAR-Net) for SISR. The proposed MSAR-Net consists of a stack of multi-scale attention residual (MSAR) blocks for feature refinement, and an up- and down-sampling projection (UDP) block for edge refinement of the extracted multi-scale features. These blocks effectively exploit multi-scale edge information without increasing the number of parameters. Specifically, we design our network in a progressive fashion, substituting large scale factor (×4) combinations with small scale factor (×2) combinations, thus gradually exploiting the hierarchical information. In parallel, channel and spatial attention in the MSAR block modulate the multi-scale features in global and local manners. Visual results and quantitative PSNR and SSIM metrics demonstrate the accuracy of the proposed approach on synthetic benchmark super-resolution datasets. The experimental analysis shows that the proposed approach outperforms existing SISR methods in terms of memory footprint, inference time, and visual quality.
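The channel attention used in blocks of this kind typically follows the familiar squeeze-and-excitation pattern: global-pool each channel, pass the result through a small bottleneck, and gate the channels with the output. The sketch below is a generic version of that pattern, not the paper's exact MSAR block; all shapes and the reduction ratio are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation-style channel attention over a (C, H, W) feature map."""
    squeeze = feat.mean(axis=(1, 2))                # global average pool: one scalar per channel
    hidden = np.maximum(w1 @ squeeze, 0.0)          # bottleneck + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid gate in (0, 1), one per channel
    return feat * gate[:, None, None]               # channels reweighted, shape preserved

feat = rng.normal(size=(8, 4, 4))
w1 = 0.1 * rng.normal(size=(2, 8))                  # reduction ratio 4: 8 channels -> 2 -> 8
w2 = 0.1 * rng.normal(size=(8, 2))
out = channel_attention(feat, w1, w2)
assert out.shape == feat.shape
```

The bottleneck keeps the attention branch cheap (here 2x8 + 8x2 = 32 extra parameters), which is why such gating is popular in light-weight SR networks.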
- Published
- 2021
- Full Text
- View/download PDF
50. Applying Lightweight Soft Error Mitigation Techniques to Embedded Mixed Precision Deep Neural Networks
- Author
-
Jonas Gava, Geancarlo Abich, Ricardo Reis, Rafael Garibotti, and Luciano Ost
- Subjects
ARM architecture ,Soft error ,Computer engineering ,Processor register ,Computer science ,Redundancy (engineering) ,Memory footprint ,Electrical and Electronic Engineering ,Replication (computing) ,Reliability (statistics) ,Register allocation - Abstract
Deep neural networks (DNNs) are being incorporated in resource-constrained IoT devices, which typically rely on reduced memory footprint and low-performance processors. While DNNs’ precision and performance can vary and are essential, it is also vital to deploy trained models that provide high reliability at low cost. To achieve an unyielding reliability and safety level, it is imperative to provide electronic computing systems with appropriate mechanisms to tackle soft errors. This paper, therefore, investigates the relationship between soft errors and model accuracy. In this regard, an extensive soft error assessment of the MobileNet model is conducted considering precision bitwidth variations (2, 4, and 8 bits) running on an Arm Cortex-M processor. In addition, this work promotes the use of a register allocation technique (RAT) that allocates the critical DNN function/layer to a pool of specific general-purpose processor registers. Results obtained from more than 4.5 million fault injections show that RAT gives the best relative performance, memory utilization, and soft error reliability trade-offs w.r.t. a more traditional replication-based approach. Results also show that the MobileNet soft error reliability varies depending on the precision bitwidth of its convolutional layers.
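A soft-error fault-injection campaign of the kind described above boils down to flipping single bits in the (quantized) weights and re-running inference to see whether accuracy degrades. A minimal sketch of the injection step, with invented sizes and a single-event-upset model, might look like this.

```python
import numpy as np

rng = np.random.default_rng(3)

def inject_bit_flip(weights, rng):
    """Flip one random bit in one random int8 weight (a single-event upset model)."""
    w = weights.copy()
    u = w.view(np.uint8).reshape(-1)       # reinterpret the bytes; no value conversion
    i = int(rng.integers(u.size))
    u[i] ^= np.uint8(1 << int(rng.integers(8)))
    return w

weights = rng.integers(-128, 128, size=64, dtype=np.int8)
faulty = inject_bit_flip(weights, rng)
changed = np.flatnonzero(weights != faulty)
assert changed.size == 1                   # exactly one weight corrupted
```

Repeating this millions of times over different weights, bits, and inputs (as in the paper's 4.5 million injections) yields statistics on how often a flip actually changes the model's prediction.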
- Published
- 2021
- Full Text
- View/download PDF