k-nonical space: sketching with reverse complements.
- Source :
- Bioinformatics. Nov 2024, Vol. 40 Issue 11, p1-10. 10p.
- Publication Year :
- 2024
Abstract
- Motivation: Sequences equivalent to their reverse complements (i.e. double-stranded DNA) have no analogue in text analysis and non-biological string algorithms. Despite this striking difference, algorithms designed for computational biology (e.g. sketching algorithms) are designed and tested in the same way as classical string algorithms. Then, as a post-processing step, these algorithms are adapted to work with genomic sequences by folding a k-mer and its reverse complement into a single sequence: the canonical representation (k-nonical space).
- Results: The effect of using the canonical representation with sketching methods is understudied and not understood. As a first step, we use context-free sketching methods to illustrate the potentially detrimental effects of using canonical k-mers with string algorithms not designed to accommodate for them. In particular, we show that large stretches of the genome ("sketching deserts") are undersampled or entirely skipped by context-free sketching methods, effectively making these genomic regions invisible to subsequent algorithms using these sketches. We provide empirical data showing these effects and develop a theoretical framework explaining the appearance of sketching deserts. Finally, we propose two schemes to accommodate for these effects: (i) a new procedure that adapts existing sketching methods to k-nonical space and (ii) an optimization procedure to directly design new sketching methods for k-nonical space.
- Availability and implementation: The code used in this analysis is available under a permissive license at https://github.com/Kingsford-Group/mdsscope. [ABSTRACT FROM AUTHOR]
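
The folding step mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the common convention that the canonical form of a k-mer is the lexicographically smaller of the k-mer and its reverse complement; the abstract does not state which convention the authors use, and the function names below are hypothetical, not taken from the mdsscope repository.

```python
# Complement table for the DNA alphabet (uppercase A, C, G, T only;
# handling of other symbols is left out of this sketch).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Complement every base, then reverse the string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Fold a k-mer and its reverse complement into one representative.

    Assumption: the representative is the lexicographic minimum.
    """
    return min(kmer, reverse_complement(kmer))


def canonical_kmers(seq: str, k: int) -> list[str]:
    """Canonical form of every k-mer in seq, in left-to-right order."""
    return [canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)]
```

Under this convention, reading a sequence from either strand yields the same canonical k-mers in opposite order, which is the property that lets a sketch ignore strand orientation.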
- Subjects :
- *COMPUTATIONAL biology
- *ALGORITHMS
- *TEST design
- *DESERTS
- *GENOMES
Details
- Language :
- English
- ISSN :
- 13674803
- Volume :
- 40
- Issue :
- 11
- Database :
- Academic Search Index
- Journal :
- Bioinformatics
- Publication Type :
- Academic Journal
- Accession number :
- 181152994
- Full Text :
- https://doi.org/10.1093/bioinformatics/btae629