FAERY: An FPGA-accelerated Embedding-based Retrieval System

Authors :
Zeng, Chaoliang
Luo, Layong
Ning, Qingsong
Han, Yaodong
Jiang, Yuhang
Tang, Ding
Wang, Zilong
Chen, Kai
Guo, Chuanxiong
Publication Year :
2022

Abstract

Embedding-based retrieval (EBR) is widely used in recommendation systems to retrieve thousands of relevant candidates from a large corpus with millions or more items. A good EBR system needs to achieve both high throughput and low latency, as high throughput usually means cost savings and low latency improves user experience. Unfortunately, the performance of existing CPU- and GPU-based EBR systems is far from optimal due to their inherent architectural limitations. In this paper, we first study how an ideal yet practical EBR system works, and then design FAERY, an FPGA-accelerated EBR system, which achieves the optimal performance of the practically ideal EBR system. FAERY is composed of three key components: high-bandwidth memory (HBM) for memory bandwidth-intensive corpus scanning, a data parallelism approach for similarity calculation, and a pipeline-based approach for K-selection. To further reduce hardware resources, FAERY introduces a filter that drops non-Top-K items early. Experiments show that even a degraded FAERY with the same memory bandwidth as the GPU still achieves 1.21× to 12.27× lower latency and up to 4.29× higher throughput under a latency target of 10 ms than GPU-based EBR. © 2022 by The USENIX Association. All rights reserved.
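
The three stages named in the abstract (corpus scanning, similarity calculation, and K-selection with an early filter) can be illustrated with a minimal software sketch. This is not the FAERY hardware design: the function names `inner_product` and `top_k_retrieve` are hypothetical, inner product is assumed as the similarity metric (the abstract does not state which metric is used), and a min-heap stands in for the pipelined K-selection logic.

```python
import heapq
from typing import List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity between two embeddings (assumed metric: inner product)."""
    return sum(x * y for x, y in zip(a, b))


def top_k_retrieve(
    query: Sequence[float],
    corpus: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[float, int]]:
    """Scan the whole corpus and return the k best (score, item_id) pairs.

    A min-heap of size k holds the current candidates. Its smallest score
    acts as a threshold: any item scoring at or below it cannot be in the
    Top-K and is dropped without touching the heap.
    """
    if k <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # min-heap ordered by score
    for item_id, embedding in enumerate(corpus):  # corpus scanning
        score = inner_product(query, embedding)   # similarity calculation
        if len(heap) < k:
            heapq.heappush(heap, (score, item_id))
        elif score > heap[0][0]:
            # Beats the current k-th best: evict the weakest candidate.
            heapq.heapreplace(heap, (score, item_id))
        # Otherwise the item is filtered out early as a non-Top-K item.
    return sorted(heap, reverse=True)  # best score first


corpus = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
result = top_k_retrieve([1.0, 0.0], corpus, k=2)
```

In this sketch every corpus item is read exactly once, so the scan is bound by memory bandwidth, while the threshold check keeps most items from ever reaching the selection structure.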

Details

Database :
OAIster
Notes :
English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1363085668
Document Type :
Electronic Resource