1. MegaPath: sensitive and rapid pathogen detection using metagenomic NGS data.
- Author
- Leung CM, Li D, Xin Y, Law WC, Zhang Y, Ting HF, Luo R, and Lam TW
- Subjects
- Algorithms, High-Throughput Nucleotide Sequencing, Humans, Metagenome, Sequence Alignment, Sequence Analysis, DNA, Metagenomics, Software
- Abstract
Background: Next-generation sequencing (NGS) enables unbiased detection of pathogens by mapping the sequencing reads of a patient sample to known reference sequences of bacteria and viruses. However, for a new pathogen without a reference sequence of a close relative, or with a high load of mutations compared to its predecessors, read mapping fails because of low similarity between the pathogen and the reference sequence, which in turn leads to insensitive and inaccurate pathogen detection. Results: We developed MegaPath, which runs fast and provides high sensitivity in detecting new pathogens. In MegaPath, we have implemented and tested a combination of polishing techniques to remove non-informative human reads and spurious alignments. MegaPath applies a global optimization to the read alignments and reassigns reads incorrectly aligned to multiple species to a unique species. The reassignment not only significantly increased the number of reads aligned to distant pathogens, but also significantly reduced incorrect alignments. To run fast, MegaPath implements an enhanced maximum-exact-match prefix seeding strategy and a SIMD-accelerated Smith-Waterman algorithm. Conclusions: In our benchmarks, MegaPath demonstrated superior sensitivity by detecting eight times more reads from a low-similarity pathogen than other tools. Meanwhile, MegaPath ran much faster than the other state-of-the-art alignment-based pathogen detection tools (and was comparable with the less sensitive profile-based pathogen detection tools). The running time of MegaPath is about 20 min on a typical 1 Gb dataset.
- Published
- 2020