The optimal metric for viral genome space.

Authors :
Yu H
Yau SS
Source :
Computational and structural biotechnology journal [Comput Struct Biotechnol J] 2024 May 10; Vol. 23, pp. 2083-2096. Date of Electronic Publication: 2024 May 10 (Print Publication: 2024).
Publication Year :
2024

Abstract

Understanding the structural similarity between genomes is pivotal in classification and phylogenetic analysis. As the number of known genomes rockets, alignment-free methods have gained considerable attention. Among these methods, the natural vector method stands out as it represents sequences as vectors using statistical moments, enabling effective clustering based on families in biological taxonomy. However, determining an optimal metric that combines different elements in natural vectors remains challenging due to the absence of a rigorous theoretical framework for weighting different k-mers and orders. In this study, we address this challenge by transforming the determination of optimal weights into an optimization problem and resolving it through gradient-based techniques. Our experimental results underscore the substantial improvement in classification accuracy achieved by employing these optimal weights, reaching an impressive 92.73% on the testing set, surpassing other alignment-free methods. On one hand, our method offers an outstanding metric for virus classification, and on the other hand, it provides valuable insights into feature integration within alignment-free methods.
Competing Interests: None.
(© 2024 Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.)
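
As an illustration of the idea summarized in the abstract, the following is a minimal sketch of a single-nucleotide natural vector (counts, mean positions, and normalized second central moments) together with a weighted Euclidean distance. It assumes the commonly used natural vector definition for k = 1 and order 2; the function names `natural_vector` and `weighted_distance` and the weight values are hypothetical placeholders, not the authors' code or their learned weights, and the gradient-based weight optimization described in the abstract is not reproduced here.

```python
import math

def natural_vector(seq, alphabet="ACGT", max_order=2):
    """Sketch of a 1-mer natural vector: counts, mean positions, normalized moments.

    Assumed definition: for each letter c with positions s_1..s_{n_c} (1-based),
    D_j = sum((s_i - mu_c)**j) / (n_c**(j-1) * n**(j-1)), for j = 2..max_order.
    """
    n = len(seq)
    positions = {c: [] for c in alphabet}
    for i, ch in enumerate(seq.upper(), start=1):
        if ch in positions:          # characters outside the alphabet are skipped
            positions[ch].append(i)

    counts, means, moments = [], [], []
    for c in alphabet:
        pos = positions[c]
        n_c = len(pos)
        mu = sum(pos) / n_c if n_c else 0.0
        counts.append(float(n_c))
        means.append(mu)
        for j in range(2, max_order + 1):
            if n_c:
                d = sum((p - mu) ** j for p in pos) / (n_c ** (j - 1) * n ** (j - 1))
            else:
                d = 0.0              # absent letters contribute zero moments
            moments.append(d)
    return counts + means + moments

def weighted_distance(u, v, weights):
    """Weighted Euclidean distance; `weights` are hypothetical, non-negative values."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))

# Example usage with placeholder uniform weights (not optimized values).
v1 = natural_vector("AACC")
v2 = natural_vector("ACGT")
uniform = [1.0] * len(v1)
dist = weighted_distance(v1, v2, uniform)
```

In this sketch every component gets the same weight; the approach described in the abstract would instead choose the weights by optimizing a classification objective.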

Details

Language :
English
ISSN :
2001-0370
Volume :
23
Database :
MEDLINE
Journal :
Computational and structural biotechnology journal
Publication Type :
Academic Journal
Accession number :
38803517
Full Text :
https://doi.org/10.1016/j.csbj.2024.05.005