1. Fastmove: A Comprehensive Study of On-Chip DMA and its Demonstration for Accelerating Data Movement in NVM-based Storage Systems.
- Author
- Li, Jiahao; Su, Jingbo; Chen, Luofan; Li, Cheng; Zhang, Kai; Yang, Liang; Noh, Sam; Xu, Yinlong
- Subjects
GRAPH algorithms, SYSTEM integration, DYNAMIC random access memory, ELECTRONIC file management, STORAGE
- Abstract
Data-intensive applications executing on NVM-based storage systems experience serious bottlenecks when moving data between DRAM and NVM. We advocate for the use of the long-existing but recently neglected on-chip DMA to expedite data movement with three contributions. First, we explore new latency-oriented optimization directions, driven by a comprehensive DMA study, to design a high-performance DMA module, which significantly lowers the I/O size threshold to observe benefits. Second, we propose a new data movement engine, Fastmove, that coordinates the use of the DMA along with the CPU with DDIO-aware strategies, judicious scheduling, and load splitting such that the DMA's limitations are compensated and the overall gains are maximized. Finally, with a general kernel-based design, simple APIs, and DAX file system integration, Fastmove allows applications to transparently exploit the DMA and its new features without code change. We run three data-intensive applications, MySQL, GraphWalker, and Filebench, atop NOVA, ext4-DAX, and XFS-DAX, with standard benchmarks like TPC-C and popular graph algorithms like PageRank. Across single- and multi-socket settings, compared to conventional CPU-only NVM accesses, Fastmove delivers 1.13–2.16× speedups in peak throughput for TPC-C with MySQL, reduces the average latency by 17.7–60.8%, and saves 37.1–68.9% of the CPU usage spent in data movement. It also shortens the execution time of graph algorithms with GraphWalker by 39.7–53.4% and delivers 1.01–1.48× throughput speedups for Filebench. [ABSTRACT FROM AUTHOR]
- Published
- 2024