E-BATCH: Energy-efficient and high-throughput RNN batching
- Source: ACM Transactions on Architecture and Code Optimization
- Publication Year: 2022
Abstract
- Recurrent Neural Network (RNN) inference exhibits low hardware utilization due to the strict data dependencies across time-steps. Batching multiple requests can increase throughput; however, RNN batching requires a large amount of padding, since the batched input sequences may vastly differ in length. Schemes that dynamically update the batch every few time-steps avoid padding, but they require executing different RNN layers in a short time span, which decreases energy efficiency. Hence, we propose E-BATCH, a low-latency and energy-efficient batching scheme tailored to RNN accelerators. It consists of a runtime system and effective hardware support. The runtime concatenates multiple sequences to create large batches, resulting in substantial energy savings. Furthermore, the accelerator notifies the runtime when the evaluation of an input sequence completes, so a new input sequence can be added to the batch immediately, largely reducing the amount of padding. E-BATCH dynamically controls the number of time-steps evaluated per batch to achieve the best trade-off between latency and energy efficiency for the given hardware platform. We evaluate E-BATCH on top of E-PUR and TPU. Over the state-of-the-art, E-BATCH improves throughput by 1.8× and energy efficiency by 3.6× on E-PUR, and improves throughput by 2.1× and energy efficiency by 1.6× on TPU. This work has been supported by the CoCoUnit ERC Advanced Grant of the EU’s Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, and the ICREA Academia program.
- Subjects:
  - FOS: Computer and information sciences
  - Machine Learning (cs.LG)
  - Hardware Architecture (cs.AR)
  - Distributed, Parallel, and Cluster Computing (cs.DC)
  - Batching
  - Recurrent neural network
  - Long short-term memory
  - Hardware accelerators
  - Energy consumption
  - Neural networks (Computer science)
  - Hardware and Architecture
  - Software
  - Information Systems
  - Computer science::Computer architecture [UPC thematic areas]
Details
- Language: English
- Database: OpenAIRE
- Journal: ACM Transactions on Architecture and Code Optimization
- Accession number: edsair.doi.dedup.....a967ba2412097c612a5768a221ee2198