Neuron-level fuzzy memoization in RNNs
- Source:
- Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO); Recercat. Dipòsit de la Recerca de Catalunya; UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC)
- Publisher:
- Association for Computing Machinery (ACM)
Abstract
- The final publication is available at ACM via http://dx.doi.org/10.1145/3352460.3358309
- Recurrent Neural Networks (RNNs) are a key technology for applications such as automatic speech recognition or machine translation. Unlike conventional feed-forward DNNs, RNNs remember past information to improve the accuracy of future predictions and are therefore very effective for sequence processing problems. For each application run, each recurrent layer is executed many times to process a potentially long sequence of inputs (words, images, audio frames, etc.). In this paper, we make the observation that the output of a neuron exhibits small changes in consecutive invocations. We exploit this property to build a neuron-level fuzzy memoization scheme, which dynamically caches the output of each neuron and reuses it whenever the current output is predicted to be similar to a previously computed result, thereby avoiding the output computation. The main challenge in this scheme is determining whether the neuron's output for the current input in the sequence will be similar to a recently computed result. To this end, we extend the recurrent layer with a much simpler Bitwise Neural Network (BNN), and show that the BNN and RNN outputs are highly correlated: if two BNN outputs are very similar, the corresponding outputs in the original RNN layer are likely to exhibit negligible changes. The BNN provides a low-cost and effective mechanism for deciding when fuzzy memoization can be applied with a small impact on accuracy. We evaluate our memoization scheme on top of a state-of-the-art accelerator for RNNs, for a variety of neural networks from multiple application domains. We show that our technique avoids more than 24.2% of computations, resulting in 18.5% energy savings and a 1.35x speedup on average.
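- The abstract describes the mechanism only in prose; the snippet below is a minimal, illustrative sketch of neuron-level fuzzy memoization in NumPy, not the paper's accelerator implementation. The layer sizes, the threshold THETA, the tanh activation, and the normalization of the bitwise predictor are assumptions made for the example: each neuron reuses its cached output whenever a cheap binarized version of its dot product barely changes between consecutive time steps.

```python
# Minimal sketch (assumed details, not the authors' implementation) of
# neuron-level fuzzy memoization in a toy recurrent layer.
import numpy as np

rng = np.random.default_rng(0)
INPUT, HIDDEN = 32, 64
THETA = 0.05                                      # similarity threshold (assumed)

W = rng.standard_normal((HIDDEN, INPUT)) * 0.1    # input weights (full precision)
U = rng.standard_normal((HIDDEN, HIDDEN)) * 0.1   # recurrent weights
Wb, Ub = np.sign(W), np.sign(U)                   # bitwise copies used as predictor

last_bnn = np.full(HIDDEN, np.inf)   # last BNN output seen per neuron
last_out = np.zeros(HIDDEN)          # memoized full-precision output per neuron

def rnn_step(x, h):
    """One time step with neuron-level fuzzy memoization."""
    xb, hb = np.sign(x), np.sign(h)
    bnn = (Wb @ xb + Ub @ hb) / (INPUT + HIDDEN)   # cheap per-neuron predictor
    recompute = np.abs(bnn - last_bnn) >= THETA    # neurons whose predictor moved
    out = last_out.copy()                          # start from cached outputs
    if recompute.any():
        out[recompute] = np.tanh(W[recompute] @ x + U[recompute] @ h)
        last_bnn[recompute] = bnn[recompute]
        last_out[recompute] = out[recompute]
    return out, 1.0 - recompute.mean()

# Slowly varying inputs keep the predictor stable, so many outputs are reused.
h = np.zeros(HIDDEN)
x = rng.standard_normal(INPUT)
for t in range(5):
    x = x + 0.02 * rng.standard_normal(INPUT)
    h, reused = rnn_step(x, h)
    print(f"step {t}: reused {reused:.0%} of neuron outputs")
```

- In the paper the reuse decision is made per neuron in hardware on top of an RNN accelerator; the vectorized mask above only mimics that behavior at a functional level.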
- Subjects:
- Recurrent neural networks
Long short-term memory
Artificial neural network
Neural networks (Computer science)
Binary networks
Memoization
Fuzzy logic
Automatic speech recognition
Machine learning
Speedup
Sequence
Neuron
Algorithm
Computer science [UPC subject areas]
Telecommunication engineering::Signal processing::Speech and acoustic signal processing [UPC subject areas]
02 engineering and technology
0202 electrical engineering, electronic engineering, information engineering
020202 computer hardware & architecture
Details
- ISBN:
- 978-1-4503-6938-1
- Database:
- OpenAIRE
- Journal:
- Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO); Recercat. Dipòsit de la Recerca de Catalunya; UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC)
- Accession number:
- edsair.doi.dedup.....4d9c9829c03a15c8418c968521cf3872