Author: "Silfa Feliz, Franyell Antonio" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Silfa Feliz, Franyell Antonio"' showing total 5 results

Start Over Author "Silfa Feliz, Franyell Antonio"

5 results on '"Silfa Feliz, Franyell Antonio"'

1. Exploiting kernel compression on BNNs

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors, Silfa Feliz, Franyell Antonio, Arnau Montañés, José María, González Colás, Antonio María, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors, Silfa Feliz, Franyell Antonio, Arnau Montañés, José María, and González Colás, Antonio María
Abstract: Binary Neural Networks (BNNs) are showing tremen-dous success on realistic image classification tasks. Notably, their accuracy is similar to the state-of-the-art accuracy obtained by full-precision models tailored to edge devices. In this regard, BNNs are very amenable to edge devices since they employ 1-bit to store the inputs and weights, and thus, their storage requirements are low. Moreover, BNNs computations are mainly done using xnor and pop-counts operations which are implemented very efficiently using simple hardware structures. Nonetheless, supporting BNNs efficiently on mobile CPUs is far from trivial since their benefits are hindered by frequent memory accesses to load weights and inputs. In BNNs, a weight or an input is stored using one bit, and aiming to increase storage and computation efficiency, several of them are packed together as a sequence of bits. In this work, we observe that the number of unique sequences representing a set of weights or inputs is typically low (i.e., 512). Also, we have seen that during the evaluation of a BNN layer, a small group of unique sequences is employed more frequently than others. Accordingly, we propose exploiting this observation by using Huffman Encoding to encode the bit sequences and then using an indirection table to decode them during the BNN evaluation. Also, we propose a clustering-based scheme to identify the most common sequences of bits and replace the less common ones with some similar common sequences. As a result, we decrease the storage requirements and memory accesses since the most common sequences are encoded with fewer bits. In this work, we extend a mobile CPU by adding a small hardware structure that can efficiently cache and decode the compressed sequence of bits. We evaluate our scheme using the ReAacNet model with the Imagenet dataset on an ARM CPU. Our experimental results show that our technique can reduce memory requirement by 1.32x and improve performance by 1.35x., This work has been supported by the CoCoUnit ERC Advanced Grant of the EU’s Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, and the ICREA Academia program., Peer Reviewed, Postprint (author's final draft)
Published: 2023

2. E-BATCH: Energy-efficient and high-throughput RNN batching

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors, Silfa Feliz, Franyell Antonio, Arnau Montañés, José María, González Colás, Antonio María, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors, Silfa Feliz, Franyell Antonio, Arnau Montañés, José María, and González Colás, Antonio María
Abstract: Recurrent Neural Network (RNN) inference exhibits low hardware utilization due to the strict data dependencies across time-steps. Batching multiple requests can increase throughput. However, RNN batching requires a large amount of padding since the batched input sequences may vastly differ in length. Schemes that dynamically update the batch every few time-steps avoid padding. However, they require executing different RNN layers in a short time span, decreasing energy efficiency. Hence, we propose E-BATCH, a low-latency and energy-efficient batching scheme tailored to RNN accelerators. It consists of a runtime system and effective hardware support. The runtime concatenates multiple sequences to create large batches, resulting in substantial energy savings. Furthermore, the accelerator notifies it when the evaluation of an input sequence is done. Hence, a new input sequence can be immediately added to a batch, thus largely reducing the amount of padding. E-BATCH dynamically controls the number of time-steps evaluated per batch to achieve the best trade-off between latency and energy efficiency for the given hardware platform. We evaluate E-BATCH on top of E-PUR and TPU. E-BATCH improves throughput by 1.8× and energy efficiency by 3.6× in E-PUR, whereas in TPU, it improves throughput by 2.1× and energy efficiency by 1.6×, over the state-of-the-art., This work has been supported by the CoCoUnit ERC Advanced Grant of the EU’s Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, and the ICREA Academia program., Peer Reviewed, Postprint (author's final draft)
Published: 2022

3. Boosting LSTM performance through dynamic precision selection

Author: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors, Silfa Feliz, Franyell Antonio, Arnau Montañés, José María, González Colás, Antonio María, Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors, Silfa Feliz, Franyell Antonio, Arnau Montañés, José María, and González Colás, Antonio María
Abstract: The use of low numerical precision is a fundamental optimization included in modern accelerators for Deep Neural Networks (DNNs). The number of bits of the numerical representation is set to the minimum precision that is able to retain accuracy based on an offline profiling, and it is kept constant for DNN inference. In this work, we explore the use of dynamic precision selection during DNN inference. We focus on Long Short Term Memory (LSTM) networks, which represent the state-of-the-art networks for applications such as machine translation and speech recognition. Unlike conventional DNNs, LSTM networks remember information from previous evaluations by storing data in the LSTM cell state. Our key observation is that the cell state determines the amount of precision required: time-steps where the cell state changes significantly require higher precision, whereas time-steps where the cell state is stable can be computed with lower precision without any loss in accuracy. We propose a novel hardware scheme that tracks the evolution of the elements in the LSTM cell state and dynamically selects the appropriate precision on each time-step. For a set of popular LSTM networks, it chooses the lowest precision for 57% of the time, outperforming systems that fix the precision statically. We evaluate our proposal on top of a modern highly-optimized LSTM accelerator, and show that it provides 1.46x speedup and 19.2% energy savings on average without degrading the model accuracy. Our scheme has an overhead of less than 8%., This work has been supported by the CoCoUnit ERC Advanced Grant of the ED's Horizon 2020 program (grant No 833057), the Spanish State Research Agency under grant TIN2016-75344-R (AEI/FEDER, EU), the ICREA Academia program, and the Fundación Carolina and PUCMM by a scholarship., Peer Reviewed, Postprint (author's final draft)
Published: 2020

4. Neuron-level fuzzy memoization in RNNs

Author: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors, Silfa Feliz, Franyell Antonio, Dot Artigas, Gem, Arnau Montañés, José María, González Colás, Antonio María, Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors, Silfa Feliz, Franyell Antonio, Dot Artigas, Gem, Arnau Montañés, José María, and González Colás, Antonio María
Abstract: The final publication is available at ACM via http://dx.doi.org/10.1145/3352460.3358309, Recurrent Neural Networks (RNNs) are a key technology for applications such as automatic speech recognition or machine translation. Unlike conventional feed-forward DNNs, RNNs remember past information to improve the accuracy of future predictions and, therefore, they are very effective for sequence processing problems. For each application run, each recurrent layer is executed many times for processing a potentially large sequence of inputs (words, images, audio frames, etc.). In this paper, we make the observation that the output of a neuron exhibits small changes in consecutive invocations. We exploit this property to build a neuron-level fuzzy memoization scheme, which dynamically caches the output of each neuron and reuses it whenever it is predicted that the current output will be similar to a previously computed result, avoiding in this way the output computations. The main challenge in this scheme is determining whether the new neuron's output for the current input in the sequence will be similar to a recently computed result. To this end, we extend the recurrent layer with a much simpler Bitwise Neural Network (BNN), and show that the BNN and RNN outputs are highly correlated: if two BNN outputs are very similar, the corresponding outputs in the original RNN layer are likely to exhibit negligible changes. The BNN provides a low-cost and effective mechanism for deciding when fuzzy memoization can be applied with a small impact on accuracy. We evaluate our memoization scheme on top of a state-of-the-art accelerator for RNNs, for a variety of different neural networks from multiple application domains. We show that our technique avoids more than 24.2% of computations, resulting in 18.5% energy savings and 1.35x speedup on average., Peer Reviewed, Postprint (author's final draft)
Published: 2019

5. E-PUR: an energy-efficient processing unit for recurrent neural networks

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors, Silfa Feliz, Franyell Antonio, Dot, Gem, Arnau Montañés, José María, González Colás, Antonio María, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors, Silfa Feliz, Franyell Antonio, Dot, Gem, Arnau Montañés, José María, and González Colás, Antonio María
Abstract: Recurrent Neural Networks (RNNs) are a key technology for emerging applications such as automatic speech recognition, machine translation or image description. Long Short Term Memory (LSTM) networks are the most successful RNN implementation, as they can learn long term dependencies to achieve high accuracy. Unfortunately, the recurrent nature of LSTM networks significantly constrains the amount of parallelism and, hence, multicore CPUs and many-core GPUs exhibit poor efficiency for RNN inference. In this paper, we present E-PUR, an energy-efficient processing unit tailored to the requirements of LSTM computation. The main goal of E-PUR is to support large recurrent neural networks for low-power mobile devices. E-PUR provides an efficient hardware implementation of LSTM networks that is flexible to support diverse applications. One of its main novelties is a technique that we call Maximizing Weight Locality (MWL), which improves the temporal locality of the memory accesses for fetching the synaptic weights, reducing the memory requirements by a large extent. Our experimental results show that E-PUR achieves real-time performance for different LSTM networks, while reducing energy consumption by orders of magnitude with respect to general-purpose processors and GPUs, and it requires a very small chip area. Compared to a modern mobile SoC, an NVIDIA Tegra X1, E-PUR provides an average energy reduction of 88x., Peer Reviewed, Postprint (published version)
Published: 2018

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

5 results on '"Silfa Feliz, Franyell Antonio"'

1. Exploiting kernel compression on BNNs

2. E-BATCH: Energy-efficient and high-throughput RNN batching

3. Boosting LSTM performance through dynamic precision selection

4. Neuron-level fuzzy memoization in RNNs

5. E-PUR: an energy-efficient processing unit for recurrent neural networks

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Publication Year Range

Publication Type

Database

Publisher

5 results on '"Silfa Feliz, Franyell Antonio"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources