Author: "Mingcong Song" / Language: english - Searchworks@Jio Institute Digital Library Search Results

Author: Chubo Liu, Kenli Li, Mingcong Song, Jiechen Zhao, Keqin Li, Tao Li, and Zihao Zeng
Subjects: ARTIFICIAL intelligence, CLOUD computing, DATA flow computing, ELECTRON accelerators, DATA analysis
Abstract: End-to-end latency is sensitive for user-interactive neural network (NN) services on clouds. For periods of high request load, co-locating multiple NN requests has the potential to reduce end-to-end latency. However, current batch-based accelerators lack request-level parallelism support, leaving the queuing time non-optimized. Meanwhile, naively partitioning resources for simultaneous requests suffers from longer execution time as well as lower resource efficiency because different applications utilize separate resources without sharing. To effectively reduce the end-to-end latency for real-time NN requests, we propose COEXE architecture, equipped with a pipeline implementation of a sparsity-driven real-time co-execution model. By leveraging the non-trivial amount of sparse operations during concurrent NNs execution, the end-to-end latency is decreased by up to 12.3x and 2.4x over Eyeriss-like and SCNN at peak workload mode. Besides, we propose row cross (RC) dataflow to reduce data movement cost and avoid memory duplication. [ABSTRACT FROM AUTHOR]
Published: 2020

Author: Chao Li, Yang Hu, Longjun Liu, Juncheng Gu, Mingcong Song, Xiaoyao Liang, Jingling Yuan, and Tao Li
Published: 2015
