
Accelerated Seeding for Genome Sequence Alignment with Enumerated Radix Trees

Authors :
David Blaauw
Nathan Ozog
Arun Subramaniyan
Satish Narayanasamy
Jack Wadden
Kush Goliya
Xiao Wu
Reetuparna Das
Source :
ISCA
Publication Year :
2021
Publisher :
IEEE, 2021.

Abstract

Read alignment is a time-consuming step in genome sequencing analysis. The most widely used software for read alignment, BWA-MEM, and the recently published faster version BWA-MEM2 are based on the seed-and-extend paradigm for read alignment. The seeding step of read alignment is a major bottleneck contributing ~40% to the overall execution time of BWA-MEM2 when aligning whole human genome reads from the Platinum Genomes dataset. This is because both BWA-MEM and BWA-MEM2 use a compressed index structure called the FMD-Index, which results in high bandwidth requirements, primarily due to its character-by-character processing of reads. For instance, to seed each read (101 DNA base-pairs stored in 37.8 bytes), the FMD-Index solution in BWA-MEM2 requires ~68.5 KB of index data. We propose a novel indexing data structure named Enumerated Radix Tree (ERT) and design a custom seeding accelerator based on it. ERT improves bandwidth efficiency of BWA-MEM2 by 4.5X while guaranteeing 100% identical output to the original software, and the index still fits in 64 GB of DRAM. Overall, the proposed seeding accelerator implemented on AWS F1 FPGA (f1.4xlarge) improves seeding throughput of BWA-MEM2 by 3.3X. When combined with seed-extension accelerators, we observe a 2.1X improvement in overall read alignment throughput over BWA-MEM2. The software implementation of ERT is integrated into BWA-MEM2 (ert branch: https://github.com/bwa-mem2/bwa-mem2/tree/ert) and is open sourced for the benefit of the research community.
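The abstract's figures imply roughly 1,800 bytes of index traffic per byte of read (~68.5 KB against 37.8 bytes). The abstract attributes this to the FMD-Index consuming one character at a time. It does not describe ERT's internal layout, so the following is only a loose, hypothetical illustration of why enumeration helps, not the authors' method. A toy FM-index backward search performs two Occ lookups per pattern character. A table that enumerates every possible k-mer over ACGT resolves k characters with a single lookup. All function names here (`build_fm_index`, `fm_count`, `build_kmer_table`, `kmer_count`) are invented for this sketch. The FM-index is deliberately naive (sorted suffixes, a full Occ table) and does not implement the bidirectional FMD-Index.

```python
from itertools import product


def build_fm_index(text):
    """Toy FM-index: returns the C array, a full Occ table and the BWT length."""
    text += "$"  # sentinel, sorts before A/C/G/T in ASCII
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    alphabet = sorted(set(text))
    # C[c] = number of characters in text strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ, len(bwt)


def fm_count(pattern, C, occ, n):
    """Backward search, one character at a time.

    Returns (number of occurrences, number of Occ lookups performed).
    Each lookup stands in for one random access into the index.
    """
    lo, hi, lookups = 0, n, 0
    for c in reversed(pattern):
        if c not in C:
            return 0, lookups
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        lookups += 2
        if lo >= hi:  # empty suffix-array interval: no match
            return 0, lookups
    return hi - lo, lookups


def build_kmer_table(text, k):
    """Enumerate every possible k-mer over ACGT and record where it occurs."""
    table = {"".join(p): [] for p in product("ACGT", repeat=k)}
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        if kmer in table:  # skips k-mers containing non-ACGT symbols
            table[kmer].append(i)
    return table


def kmer_count(kmer, table):
    """One table lookup resolves all k characters of the k-mer at once."""
    return len(table.get(kmer, [])), 1
```

For example, on the text `"ACGTACGTTACG"` the 3-mer `"ACG"` occurs 3 times. `fm_count` reaches that answer after 6 Occ lookups, while `kmer_count` needs 1. The trade-off is table size: enumerating all 4^k k-mers grows exponentially in k. That is presumably why a practical index must bound k and handle longer matches some other way.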

Details

Database :
OpenAIRE
Journal :
2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA)
Accession number :
edsair.doi...........d4b174b418d4876b6cb2d6f740d67634