hXDP: Efficient Software Packet Processing on FPGA NICs.

Authors :: Brunella, Marco Spaziani
Belocchi, Giacomo
Bonola, Marco
Pontarelli, Salvatore
Siracusano, Giuseppe
Bianchi, Giuseppe
Cammarano, Aniello
Palumbo, Alessandro
Petrucci, Luca
Bifulco, Roberto
Source :: Communications of the ACM. Aug 2022, Vol. 65 Issue 8, p91-100. 8p. 5 Diagrams, 2 Charts, 4 Graphs.
Publication Year :: 2022
Abstract: The network interface cards (NICs) of modern computers are changing to adapt to faster data rates and to help with the scaling issues of general-purpose CPU technologies. Among the ongoing innovations, the inclusion of programmable accelerators on the NIC's data path is particularly interesting, since it provides the opportunity to offload some of the CPU's network packet processing tasks to the accelerator. Given the strict latency constraints of packet processing tasks, accelerators are often implemented on platforms such as Field-Programmable Gate Arrays (FPGAs). FPGAs can be reprogrammed after deployment to adapt to changing application requirements, and can achieve both high throughput and low latency when implementing packet processing tasks. However, they have limited resources that may need to be shared among diverse applications, and programming them is difficult and requires hardware design expertise. We present hXDP, a solution that runs software packet processing tasks on FPGAs, where the tasks are described with the eBPF technology and target Linux's eXpress Data Path. hXDP uses only a fraction of the available FPGA resources, while matching the performance of high-end CPUs. The iterative execution model of eBPF is not a good fit for FPGA accelerators. Nonetheless, we show that many of the instructions of an eBPF program can be compressed, parallelized, or completely removed when targeting a purpose-built FPGA design, thereby significantly improving performance. We implement hXDP on an FPGA NIC and evaluate it running real-world unmodified eBPF programs. Our implementation runs at 156.25 MHz and uses about 15% of the FPGA resources. Despite these modest requirements, it can run dynamically loaded programs, achieves the packet processing throughput of a high-end CPU core, and provides a 10× lower packet forwarding latency. [ABSTRACT FROM AUTHOR]