51. Fast gap-affine pairwise alignment using the wavefront algorithm
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Marco-Sola, Santiago, Moure López, Juan Carlos, Moretó Planas, Miquel, Espinosa Morales, Antonio, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Marco-Sola, Santiago, Moure López, Juan Carlos, Moretó Planas, Miquel, and Espinosa Morales, Antonio
- Abstract
Motivation Pairwise alignment of sequences is a fundamental method in modern molecular biology, implemented within multiple bioinformatics tools and libraries. Current advances in sequencing technologies press for the development of faster pairwise alignment algorithms that can scale with increasing read lengths and production yields. Results In this paper, we present the wavefront alignment algorithm (WFA), an exact gap-affine algorithm that takes advantage of homologous regions between the sequences to accelerate the alignment process. As opposed to traditional dynamic programming algorithms that run in quadratic time, the WFA runs in time O(ns), proportional to the read length n and the alignment score s, using O(s2) memory. Furthermore, our algorithm exhibits simple data dependencies that can be easily vectorized, even by the automatic features of modern compilers, for different architectures, without the need to adapt the code. We evaluate the performance of our algorithm, together with other state-of-the-art implementations. As a result, we demonstrate that the WFA runs 20-300x faster than other methods aligning short Illumina-like sequences, and 10-100x faster using long noisy reads like those produced by Oxford Nanopore Technologies. Availability The WFA algorithm is implemented within the wavefront-aligner library, and it is publicly available at https://github.com/smarco/WFA, This research has been supported by the European Unions’s Horizon 2020 Framework Programme under the DeepHealth project (grant agreement number 825111), by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of total cost eligible under the DRAC project (grant agreement 001-P-001723), by the MINECO-Spain (contracts TIN2017-84553-C2-1-R and TIN2015-65316-P), and by the Catalan government (contracts 2017-SGR-313, 2017-SGR-1328 and 2017-SGR-1414). M. Moretó has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramón y Cajal fellowship number RYC-2016-21104., Peer Reviewed, Postprint (published version)
- Published
- 2020