A high-precision genome size estimator based on the k-mer histogram correction.

Authors :
Xiangyu Liao
Wufei Zhu
Chaoyun Liu
Source :
Frontiers in Genetics; 2024, p1-8, 8p
Publication Year :
2024

Abstract

Introduction: In next-generation sequencing datasets, various characteristics can be extracted through k-mer-based analysis. Among these, genome size (GS) is relatively easy to estimate, yet achieving satisfactory accuracy, especially in the presence of heterozygosity, remains a challenge. Methods: In this study, we introduce a high-precision genome size estimator, GSET (Genome Size Estimation Tool), which is based on k-mer histogram correction. Results: We evaluated GSET on both simulated and real datasets. The experimental results show that it estimates genome size more precisely than state-of-the-art tools. Notably, GSET also performs satisfactorily on heterozygous datasets, where other tools struggle to produce usable results. Discussion: The processing model of GSET diverges from the data-fitting models commonly used by similar tools. Instead, it is derived from empirical data and incorporates a correction term to mitigate the impact of sequencing errors on genome size estimation. GSET is freely available at https://github.com/Xingyu-Liao/GSET. [ABSTRACT FROM AUTHOR]
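
For readers unfamiliar with k-mer histograms, the following minimal Python sketch shows the generic textbook estimate that such tools start from: genome size is approximately the total number of k-mers divided by the depth of the main histogram peak. It is not GSET's method, whose correction model is not described in this record. The function names and the `min_depth` cutoff (a crude stand-in for discarding error-derived low-depth k-mers) are assumptions made for illustration only.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Return a mapping {depth: number of distinct k-mers seen `depth` times}."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(hist, min_depth=2):
    """Naive estimate: total k-mers at depth >= min_depth divided by the peak depth.

    min_depth is a hypothetical cutoff that drops low-depth k-mers, which are
    dominated by sequencing errors; it is not GSET's correction term.
    """
    usable = {d: n for d, n in hist.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth at which the largest number of distinct k-mers occurs.
    peak_depth = max(usable, key=lambda d: usable[d])
    # Total k-mer occurrences retained after the cutoff.
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth
```

A simple cutoff like this tends to break down on heterozygous genomes, where the histogram shows an additional peak at roughly half the main depth; this is the case the abstract identifies as difficult for existing tools.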

Details

Language :
English
ISSN :
1664-8021
Database :
Complementary Index
Journal :
Frontiers in Genetics
Publication Type :
Academic Journal
Accession number :
179452111
Full Text :
https://doi.org/10.3389/fgene.2024.1451730