5 results for "Liqiang Lu"
Search Results
2. Morphling: A Reconfigurable Architecture for Tensor Computation
- Author
Liqiang Lu and Yun Liang
- Subjects
Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Software
- Published
- 2022
3. FCNNLib: A Flexible Convolution Algorithm Library for Deep Learning on FPGAs
- Author
Liqiang Lu, Jiaming Xie, Qingcheng Xiao, and Yun Liang
- Subjects
Computer science, Fast Fourier transform, Performance tuning, Control reconfiguration, Computer Graphics and Computer-Aided Design, Convolution, Software, Scalability, Overhead (computing), Electrical and Electronic Engineering, Field-programmable gate array, Algorithm
- Abstract
Convolution is computationally intensive and demands high compute capability. Among hardware platforms, the Field-Programmable Gate Array (FPGA) emerges as a promising solution for its substantial available parallelism and energy efficiency. Moreover, convolution can be implemented with different algorithms, including conventional, GEMM, Winograd, and FFT algorithms, which differ in arithmetic complexity, resource requirements, etc. Different CNN models have different topologies and structures, favoring different convolution algorithms. In response, software libraries such as cuDNN provide a variety of computational primitives to support these algorithms. However, supporting such libraries on FPGAs is challenging. First, multiple algorithms can share FPGA resources spatially as well as temporally, introducing either reconfiguration overhead or resource underutilization. Second, FPGA implementation remains a significant challenge for library developers, as it typically requires specialized hardware knowledge. In this paper, we propose FCNNLib, an efficient and scalable convolution algorithm library for FPGAs. To coordinate multiple convolution algorithms on an FPGA, we develop three schedulings: spatial, temporal, and hybrid, which exhibit different tradeoffs in latency and throughput. We explore these schedulings by balancing reconfiguration overhead, resource utilization, and the optimization objectives of the CNNs. We then provide efficient and tunable algorithm templates that allow performance tuning through performance and resource models. To assist users, FCNNLib exposes a set of interfaces that support high-level application design. We demonstrate the usability of FCNNLib with state-of-the-art CNNs. FCNNLib achieves up to 44.6X and 1.76X better energy efficiency in various scenarios compared with software libraries for CPUs and GPUs, respectively. [An illustrative sketch of temporal scheduling follows this entry.]
- Published
- 2022
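The abstract's temporal scheduling weighs per-layer algorithm gains against the cost of reconfiguring the FPGA between algorithms. The following is a minimal Python sketch of that tradeoff, not code from the paper: the cost table, the `RECONF` constant, and the function name are all hypothetical, and the dynamic program simply charges a fixed penalty whenever consecutive layers use different engines.

```python
# Hypothetical illustration of temporal scheduling: pick one convolution
# algorithm per CNN layer, paying a fixed reconfiguration penalty whenever
# the FPGA engine switches between algorithms. All numbers are made up.

RECONF = 500_000  # assumed reconfiguration cost in cycles (hypothetical)

def temporal_schedule(layer_costs):
    """layer_costs[i][alg] = estimated cycles for layer i under algorithm alg.
    Returns the minimum end-to-end latency via dynamic programming."""
    algs = list(layer_costs[0])
    # best[a] = cheapest way to finish the layers so far, ending on engine a
    best = dict(layer_costs[0])
    for layer in layer_costs[1:]:
        best = {
            a: layer[a] + min(best[p] + (0 if p == a else RECONF) for p in algs)
            for a in algs
        }
    return min(best.values())

# Toy 3-layer network: Winograd wins on layers 0 and 2, GEMM on layer 1,
# but greedy per-layer picks would pay two reconfigurations.
layers = [
    {"winograd": 200_000, "gemm": 350_000, "fft": 400_000},
    {"winograd": 900_000, "gemm": 600_000, "fft": 800_000},
    {"winograd": 250_000, "gemm": 380_000, "fft": 450_000},
]
print(temporal_schedule(layers))  # 1330000: staying on GEMM avoids reconfiguration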
4. OMNI: A Framework for Integrating Hardware and Software Optimizations for Sparse CNNs
- Author
Liqiang Lu, Jiaming Xie, and Yun Liang
- Subjects
Speedup, Artificial neural network, Computer science, Deep learning, Computer Graphics and Computer-Aided Design, Software, Application-specific integrated circuit, Overhead (computing), Hardware acceleration, Artificial intelligence, Electrical and Electronic Engineering, Field-programmable gate array, Computer hardware
- Abstract
Convolutional neural networks (CNNs), one of today's dominant deep learning techniques, lead in various image recognition tasks. As the model size of modern CNNs continues to grow, neural network compression techniques have been proposed to prune redundant neurons and synapses. However, prior techniques disconnect software neural network compression from hardware acceleration and thus fail to balance multiple design parameters, including sparsity, performance, hardware area cost, and efficiency. More concretely, prior unstructured pruning techniques achieve high sparsity at the expense of extra performance overhead, while prior structured pruning techniques, relying on strict sparse patterns, lead to low sparsity and extra hardware cost. In this article, we propose OMNI, a framework for accelerating sparse CNNs on hardware accelerators. The innovation of OMNI stems from using hardware-amenable on-chip memory partition patterns to seamlessly couple software CNN model compression with hardware CNN acceleration. To accelerate the compute-intensive convolution kernel, a promising hardware optimization approach is memory partition, which divides the original weight kernels into several groups so that different hardware processing elements can access the weights simultaneously. We exploit memory partition patterns, including block, cyclic, and hybrid, as CNN compression patterns. Our software CNN model compression balances sparsity across the groups, and our hardware accelerator coordinates its parallelization with the sparse patterns, leading to a desirable compromise between sparsity and performance. We further develop performance models to help designers quickly identify the pattern factors subject to an area constraint. Last, we evaluate our design on application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) platforms. Experiments demonstrate that OMNI achieves a 3.4×–6.2× speedup for modern CNNs over a comparably ideal dense CNN accelerator. OMNI shows a 114.7× energy efficiency improvement compared with a GPU platform. OMNI is also evaluated on Xilinx ZC706 and ZCU102 FPGA platforms, achieving 41.5 GOP/s and 125.3 GOP/s, respectively. [An illustrative sketch of group-balanced pruning follows this entry.]
- Published
- 2021
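As a rough illustration of how pruning can be balanced across memory-partition groups, the numpy sketch below prunes by magnitude independently within each group so that every group keeps the same number of weights. This is not the paper's actual code: the group count, sparsity target, and function names are assumptions; "block" takes contiguous index chunks and "cyclic" interleaves round-robin, matching the pattern names in the abstract.

```python
import numpy as np

def partition_indices(n, groups, scheme="cyclic"):
    """Split indices 0..n-1 into partition groups (assumed semantics):
    block -> contiguous chunks; cyclic -> round-robin interleaving."""
    idx = np.arange(n)
    if scheme == "block":
        return np.array_split(idx, groups)
    return [idx[g::groups] for g in range(groups)]

def group_balanced_prune(weights, groups=4, sparsity=0.75, scheme="cyclic"):
    """Keep the largest-magnitude weights *within each group*, so every
    group ends up equally dense and the PEs stay load-balanced."""
    flat = weights.ravel().copy()
    mask = np.zeros(flat.size, dtype=bool)
    for g in partition_indices(flat.size, groups, scheme):
        keep = int(g.size * (1.0 - sparsity))  # survivors per group (truncating)
        mask[g[np.argsort(-np.abs(flat[g]))[:keep]]] = True
    flat[~mask] = 0.0
    return flat.reshape(weights.shape)

# Toy usage: a 3x3x8 weight kernel pruned to 75% sparsity, cyclic pattern.
w = np.random.randn(3, 3, 8)
print(np.count_nonzero(group_balanced_prune(w)))  # 16: four survivors per group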
5. Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs
- Author
Shengen Yan, Liqiang Lu, Yun Liang, and Qingcheng Xiao
- Subjects
Computer science, Design space exploration, Pipeline (computing), Fast Fourier transform, Reconfigurability, Parallel computing, Filter (signal processing), Computer Graphics and Computer-Aided Design, Convolutional neural network, Residual neural network, Hardware acceleration, Algorithm design, Electrical and Electronic Engineering, Field-programmable gate array, Algorithm, Software, Efficient energy use
- Abstract
In recent years, convolutional neural networks (CNNs) have become widely adopted for computer vision tasks. Field-programmable gate arrays (FPGAs) have been explored as a promising hardware accelerator for CNNs due to their high performance, energy efficiency, and reconfigurability. However, prior FPGA solutions based on the conventional convolution algorithm are often bounded by the computational capability of FPGAs (e.g., the number of DSPs). To address this problem, the feature maps can be transformed to a special domain using fast algorithms to reduce the arithmetic complexity. Winograd and the fast Fourier transform (FFT), as representative fast algorithms, first transform the input data and filter to the Winograd or frequency domain, then perform element-wise multiplication, and finally apply an inverse transformation to obtain the output. In this paper, we propose a novel architecture for implementing fast algorithms on FPGAs. Our design employs a line-buffer structure to effectively reuse feature map data among different tiles. We also pipeline the Winograd/FFT processing element (PE) engine and instantiate multiple PEs through parallelization. Meanwhile, a complex design space must be explored. We propose an analytical model to predict resource usage and performance, and use it to guide a fast design space exploration. Experiments using state-of-the-art CNNs demonstrate the best performance and energy efficiency on FPGAs. We achieve 854.6 and 2479.6 GOP/s for AlexNet and VGG16 on the Xilinx ZCU102 platform using Winograd. We achieve 130.4 GOP/s for ResNet using Winograd and 201.1 GOP/s for YOLO using FFT on the Xilinx ZC706 platform. [A worked Winograd F(2,3) example follows this entry.]
- Published
- 2020
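To make the transform, element-wise multiply, inverse-transform flow concrete, here is a self-contained numpy demo of the smallest 1D Winograd case, F(2,3): two outputs of a 3-tap filter computed with 4 multiplications instead of 6. The B, G, A matrices are the standard minimal-filtering construction; this is a textbook example, not code from the paper.

```python
import numpy as np

# Standard F(2,3) Winograd transform matrices:
# y = A^T [ (G g) * (B^T d) ], element-wise product in the transformed domain.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

d = np.array([1.0, 2.0, 3.0, 4.0])  # one input tile (4 samples)
g = np.array([0.5, 0.25, 0.125])    # 3-tap filter

m = (G @ g) * (BT @ d)  # 4 multiplies in the Winograd domain instead of 6
y = AT @ m              # the two convolution outputs

# Direct computation for comparison: y[i] = sum_k d[i+k] * g[k]
ref = np.array([d[0:3] @ g, d[1:4] @ g])
assert np.allclose(y, ref)
print(y)  # [1.375 2.25]
```

In the 2D tiled form the paper targets, the same identity is applied to 4x4 input tiles streamed out of the line buffer, which is what lets one PE replace a 3x3 multiply-accumulate array with a smaller element-wise multiplier bank.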