ReDas: A Lightweight Architecture for Supporting Fine-Grained Reshaping and Multiple Dataflows on Systolic Array

Authors :
Han, Meng
Wang, Liang
Xiao, Limin
Cai, Tianhao
Wang, Zeyu
Xu, Xiangrong
Zhang, Chenhao
Source :
IEEE Transactions on Computers; August 2024, Vol. 73 Issue: 8 p1997-2011, 15p
Publication Year :
2024

Abstract

The systolic accelerator is one of the premier architectural choices for DNN acceleration. However, the conventional systolic architecture suffers from low PE utilization due to the mismatch between the fixed array and diverse DNN workloads. Recent studies have proposed flexible systolic array architectures to adapt to DNN models. However, these designs support only coarse-grained reshaping or significantly increase hardware overhead. In this study, we propose ReDas, a flexible and lightweight systolic array that supports dynamic fine-grained reshaping and multiple dataflows. First, ReDas integrates lightweight and reconfigurable roundabout data paths, which achieve fine-grained reshaping using only short connections between adjacent PEs. Second, we redesign the PE microarchitecture and integrate a set of multi-mode data buffers around the array. The PE structure enables additional data bypassing and flexible data switching. Simultaneously, the multi-mode buffers facilitate fine-grained reallocation of on-chip memory resources, adapting to various dataflow requirements. ReDas can dynamically reconfigure to up to 129 different logical shapes and 3 dataflows for a 128×128 array. Finally, we propose an efficient mapper to generate appropriate configurations for each layer of DNN workloads. Compared to the conventional systolic array, ReDas can achieve about 4.6× speedup and 8.3× energy-delay product (EDP) reduction.
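
The utilization mismatch described in the abstract can be illustrated with a minimal sketch. This is not the ReDas mapper or its cost model: the function names, the tiling model (a workload of m × n outputs is covered by whole passes over the array), and the example logical shapes are all assumptions chosen for illustration. The abstract does not say which logical shapes the array supports or how the mapper scores them.

```python
import math

def utilization(rows, cols, m, n):
    """Fraction of PE slots doing useful work when an m x n workload
    is tiled onto a rows x cols array (hypothetical, simplified model)."""
    # Each pass occupies the whole array, even if the tile is partly empty.
    passes = math.ceil(m / rows) * math.ceil(n / cols)
    return (m * n) / (passes * rows * cols)

def best_shape(shapes, m, n):
    """Pick the logical shape with the highest utilization for this workload."""
    return max(shapes, key=lambda s: utilization(s[0], s[1], m, n))

# Hypothetical logical shapes with the same PE count as a 128 x 128 array.
SHAPES = [(128, 128), (64, 256), (32, 512)]

if __name__ == "__main__":
    m, n = 32, 512  # a skewed workload, assumed for illustration
    for shape in SHAPES:
        print(shape, utilization(shape[0], shape[1], m, n))
    print("best:", best_shape(SHAPES, m, n))
```

Under this simplified model, a skewed 32 × 512 workload fills only a quarter of the PE slots on a fixed 128 × 128 array, while a reshaped 32 × 512 logical array is fully occupied, which is the kind of gap that reshaping is meant to close.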

Details

Language :
English
ISSN :
00189340 and 15579956
Volume :
73
Issue :
8
Database :
Supplemental Index
Journal :
IEEE Transactions on Computers
Publication Type :
Periodical
Accession number :
ejs66946263
Full Text :
https://doi.org/10.1109/TC.2024.3398500