SAKE: Strobemer-assisted k-mer extraction.
- Source :
- PloS one [PLoS One] 2023 Nov 29; Vol. 18 (11), pp. e0294415. Date of Electronic Publication: 2023 Nov 29 (Print Publication: 2023).
- Publication Year :
- 2023
Abstract
- K-mer-based analysis plays an important role in many bioinformatics applications, such as de novo assembly, sequencing error correction, and genotyping. To take full advantage of such methods, the k-mer content of a read set must be captured as accurately as possible. Often the use of long k-mers is preferred because they can be uniquely associated with a specific genomic region. Unfortunately, it is not possible to reliably extract long k-mers in high error rate reads with standard exact k-mer counting methods. We propose SAKE, a method to extract long k-mers from high error rate reads by utilizing strobemers and consensus k-mer generation through partial order alignment. Our experiments show that on simulated data with up to 6% error rate, SAKE can extract 97-mers with over 90% recall. Conversely, the recall of DSK, an exact k-mer counter, drops to less than 20%. Furthermore, the precision of SAKE remains similar to DSK. On real bacterial data, SAKE retrieves 97-mers with a recall of over 90% and slightly lower precision than DSK, while the recall of DSK already drops to 50%. We show that SAKE can extract more k-mers from uncorrected high error rate reads compared to exact k-mer counting. However, exact k-mer counters run on corrected reads can extract slightly more k-mers than SAKE run on uncorrected reads.
- Competing Interests: The authors have declared that no competing interests exist.
- (Copyright: © 2023 Leinonen, Salmela. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
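As a rough illustration of why exact counting struggles with long k-mers in noisy reads, the sketch below extracts exact k-mers, scores them against a reference set with precision and recall, and estimates how often a single k-mer occurrence is error-free. It is not SAKE or DSK: the function names and the toy sequences are hypothetical, and the probability estimate assumes independent, uniform per-base errors, which is a simplification not stated in the abstract.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (exact k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def precision_recall(extracted, truth):
    """Precision and recall of an extracted k-mer set against a reference set."""
    true_positives = len(extracted & truth)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def error_free_probability(k, error_rate):
    """Chance that one k-mer occurrence carries no error.

    Assumption (not from the source): errors hit each base independently
    with the same probability.
    """
    return (1.0 - error_rate) ** k


if __name__ == "__main__":
    genome = "ACGTACGGTCA"   # hypothetical reference sequence
    read = "ACGTACGCTCA"     # hypothetical read with one substitution
    p, r = precision_recall(kmers(read, 5), kmers(genome, 5))
    print(f"precision={p:.2f} recall={r:.2f}")
    print(f"P(error-free 97-mer at 6% error) = {error_free_probability(97, 0.06):.4f}")
```

Under that independence assumption, a single 97-mer occurrence at a 6% error rate is error-free with probability 0.94^97, roughly 0.0025. This is a per-occurrence figure, so the recall an exact counter achieves in practice also depends on sequencing coverage.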
Details
- Language :
- English
- ISSN :
- 1932-6203
- Volume :
- 18
- Issue :
- 11
- Database :
- MEDLINE
- Journal :
- PloS one
- Publication Type :
- Academic Journal
- Accession number :
- 38019768
- Full Text :
- https://doi.org/10.1371/journal.pone.0294415