A high-precision genome size estimator based on the k-mer histogram correction.
- Source :
- Frontiers in Genetics; 2024, p1-8, 8p
- Publication Year :
- 2024
Abstract
- Introduction: In the realm of next-generation sequencing datasets, various characteristics can be extracted through k-mer-based analysis. Among these characteristics, genome size (GS) is one that can be estimated with relative ease, yet achieving satisfactory accuracy, especially in the context of heterozygosity, remains a challenge. Methods: In this study, we introduce a high-precision genome size estimator, GSET (Genome Size Estimation Tool), which is based on k-mer histogram correction. Results: We have evaluated GSET on both simulated and real datasets. The experimental results demonstrate that this tool can estimate genome size with greater precision, even surpassing the accuracy of state-of-the-art tools. Notably, GSET also performs satisfactorily on heterozygous datasets, where other tools struggle to produce usable results. Discussion: The processing model of GSET diverges from the popular data fitting models used by similar tools. Instead, it is derived from empirical data and incorporates a correction term to mitigate the impact of sequencing errors on genome size estimation. GSET is freely available for use and can be accessed at the following URL: https://github.com/Xingyu-Liao/GSET. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 16648021
- Database :
- Complementary Index
- Journal :
- Frontiers in Genetics
- Publication Type :
- Academic Journal
- Accession number :
- 179452111
- Full Text :
- https://doi.org/10.3389/fgene.2024.1451730