1. Rapid Bacterial Species Delineation Based on Parameters Derived From Genome Numerical Representations
- Author
- Denisa Maderankova, Robin Jugas, Martin Vitek, Karel Sedlar, and Helena Skutkova
- Subjects
Similarity (geometry), Computer science, lcsh:Biotechnology, In silico, Biophysics, Bacterial genome, Bacterial genome size, Biochemistry, Genome, Set (abstract data type), 03 medical and health sciences, 0302 clinical medicine, Structural Biology, lcsh:TP248.13-248.65, Genetics, Sensitivity (control systems), 030304 developmental biology, Comparative genomics, 0303 health sciences, Signal processing, business.industry, Pattern recognition, Genomic signal processing, Computer Science Applications, 030220 oncology & carcinogenesis, Species delineation, Artificial intelligence, Short Survey, Numerical representation, business, Biotechnology
- Abstract
Species delineation based on bacterial genomes is an essential part of prokaryote research. In silico genome-to-genome comparison methods are computationally demanding, but much less tedious and error-prone than wet-lab methods. In this paper, we present a novel method for the delineation of bacterial genomes based on genomic signal processing. The proposed method uses two numerical representations of whole bacterial genomes, the phase signal and the cumulated phase signal, from which four parameters are derived for each genome. The parameters characterize a genome, and their calculation is independent of the other genomes in a delineation dataset. The delineation itself is performed by calculating the average similarity of the parameters. The method was statistically verified on 1826 bacterial genomes. A similarity threshold of 96% was set based on the receiver operating characteristic curve, yielding a sensitivity of 99.78% and a specificity of 97.25%. Additionally, a comparative analysis against standard delineation tools was conducted on another 33 bacterial genomes, because these tools were not able to process the dataset of 1826 genomes on a desktop computer. The proposed method achieved comparable or better delineation results than the standard tools. Besides the excellent delineation results, another great advantage of the method is its low computational demand, which enables the delineation of thousands of genomes on a desktop computer. The calculation of the parameters takes tens of minutes for thousands of genomes. Moreover, the parameters can be calculated in advance and stored in a database, so that the delineation itself is completed in a matter of seconds.
Keywords: Bacterial genome, Species delineation, Comparative genomics, Numerical representation, Genomic signal processing
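The workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the nucleotide-to-phase mapping, the four parameters, or the similarity formula. The mapping below is an assumption (a complex-plane representation commonly used in genomic signal processing), and `average_similarity` uses a hypothetical relative-difference formula chosen only for illustration. Only the 96% threshold is taken from the abstract.

```python
import math
from itertools import accumulate

# ASSUMPTION: nucleotide-to-phase mapping; the abstract does not specify it.
PHASE = {
    "A": math.pi / 4,
    "G": 3 * math.pi / 4,
    "C": -3 * math.pi / 4,
    "T": -math.pi / 4,
}

THRESHOLD = 0.96  # similarity threshold reported in the abstract


def phase_signal(sequence):
    """Map each nucleotide to its phase; symbols outside A/C/G/T are skipped."""
    return [PHASE[base] for base in sequence.upper() if base in PHASE]


def cumulated_phase(sequence):
    """Running sum of the phase signal along the sequence."""
    return list(accumulate(phase_signal(sequence)))


def average_similarity(params_a, params_b):
    """Mean per-parameter similarity in [0, 1].

    HYPOTHETICAL formula: 1 - |a - b| / max(|a|, |b|), clamped at 0.
    The parameter vectors stand in for the four per-genome parameters,
    which the abstract does not define.
    """
    sims = []
    for a, b in zip(params_a, params_b):
        scale = max(abs(a), abs(b))
        sims.append(1.0 if scale == 0 else max(0.0, 1.0 - abs(a - b) / scale))
    return sum(sims) / len(sims)


def same_species(params_a, params_b, threshold=THRESHOLD):
    """Assign two genomes to the same species if similarity reaches the threshold."""
    return average_similarity(params_a, params_b) >= threshold
```

Because each parameter vector depends only on its own genome, the vectors can be computed once and stored, and `same_species` then needs only a handful of arithmetic operations per genome pair. This mirrors the precomputed-database idea in the abstract.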
- Published
- 2019