1. Hardware-Enabled Efficient Data Processing With Tensor-Train Decomposition
- Author
- Zheng Qu, Bangyan Wang, Jilan Lin, Lei Deng, Yuan Xie, Ling Liang, Guoqi Li, Hengnu Chen, and Zheng Zhang
- Subjects
Data processing, Speedup, Computer science, Deep learning, Big data, 02 engineering and technology, Computer Graphics and Computer-Aided Design, 020202 computer hardware & architecture, Convolution, Software, Singular value decomposition, 0202 electrical engineering, electronic engineering, information engineering, Artificial intelligence, Electrical and Electronic Engineering, Computer hardware, Curse of dimensionality
- Abstract
In recent years, tensor computation has become a promising tool for big data analysis, machine learning, medical imaging, and EDA problems. To reduce the memory and computation cost of tensor processing, decomposition techniques, especially Tensor-Train Decomposition (TTD), are widely adopted to compress extremely high-dimensional tensor data. Despite TTD's potential to break the curse of dimensionality, researchers have not yet leveraged its full computational potential, mainly for two reasons: (1) executing TTD itself is time- and energy-consuming because of the singular value decomposition (SVD) performed in each TTD iteration; (2) additional software/hardware optimizations are often required to process the resulting TT-format data in certain applications such as deep learning inference. In this paper, we address these challenges with two approaches. First, we propose an algorithm-hardware co-design with a customized architecture, named TTD Engine, to accelerate TTD. We use MRI image compression as a demo application to illustrate the efficacy of the proposed accelerator. Second, we present a case study demonstrating the benefit of TT-format data processing and the efficacy of using TTD Engine. In the case study, we use the TT approach to realize the convolution operation, which is nontrivial for TT-format data. Experimental results show that TTD Engine achieves, on average, 14.9× to 36.9× speedup over CPU implementations and 4.1× to 9.9× speedup over the GPU baseline. Energy efficiency is also improved by at least 14.4× and 5.4× over CPU and GPU, respectively. Moreover, our hardware-enabled TT-format data processing leads to more efficient implementations of complicated operations and applications.
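To make the "SVD inside each TTD iteration" point concrete, the following is a minimal NumPy sketch of the textbook TT-SVD procedure. It is not the TTD Engine design from the paper; the function names `tt_svd` and `tt_to_full` and the `max_rank` truncation parameter are illustrative assumptions.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Textbook TT-SVD sketch (hypothetical helper, not the paper's TTD Engine).

    Returns a list of 3-D cores of shape (r_prev, n_k, r_next),
    with every TT rank capped at `max_rank`.
    """
    shape = tensor.shape
    d = len(shape)
    cores = []
    rank_prev = 1
    # Working matrix: rows pair the previous rank with the current mode.
    mat = tensor.reshape(rank_prev * shape[0], -1)
    for k in range(d - 1):
        # One SVD per iteration: this is the costly step the abstract refers to.
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        rank = min(max_rank, len(s))
        cores.append(u[:, :rank].reshape(rank_prev, shape[k], rank))
        # Carry the truncated remainder forward and fold in the next mode.
        mat = (s[:rank, None] * vt[:rank]).reshape(rank * shape[k + 1], -1)
        rank_prev = rank
    cores.append(mat.reshape(rank_prev, shape[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract the TT cores back into a dense tensor."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape([c.shape[1] for c in cores])

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 5, 6))
cores = tt_svd(a, max_rank=30)   # cap is loose, so no singular values are dropped
err = np.linalg.norm(tt_to_full(cores) - a)
```

With a rank cap that is never reached, the reconstruction matches the input up to floating-point error; lowering `max_rank` trades accuracy for smaller cores, which is the compression effect the abstract describes.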
- Published
- 2022