
Whole-Genome k-mer Topic Modeling Associates Bacterial Families.

Authors :
Borrayo-Carbajal E
May-Canche I
Paredes O
Morales JA
Romo-Vázquez R
Vélez-Pérez H
Source :
Genes [Genes (Basel)] 2020 Feb 14; Vol. 11 (2). Date of Electronic Publication: 2020 Feb 14.
Publication Year :
2020

Abstract

Alignment-free k-mer-based algorithms in whole genome sequence comparisons remain an ongoing challenge. Here, we explore the possibility of using topic modeling for organism whole-genome comparisons. We analyzed 30 complete genomes from three bacterial families by topic modeling. For this, each genome was considered as a document and 13-mer nucleotide representations as words. Latent Dirichlet allocation was used as the probabilistic modeling of the corpus. We were able to identify the topic distribution among analyzed genomes, which is highly consistent with traditional hierarchical classification. It is possible that topic modeling may be applied to establish relationships between genome composition and biological phenomena.
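
The "genome as document, k-mer as word" representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. The function names, the toy sequences, and the choice to skip windows containing ambiguous bases are assumptions made for illustration only. The toy call uses k=3 to keep the output small, whereas the study used 13-mers.

```python
from collections import Counter


def kmer_counts(sequence, k=13):
    """Count overlapping k-mers in one genome, treated as a 'document'."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Assumption: skip windows with ambiguous bases (anything outside ACGT).
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def document_term_matrix(genomes, k=13):
    """Build a genomes x vocabulary count matrix from a {name: sequence} dict."""
    per_genome = {name: kmer_counts(seq, k) for name, seq in genomes.items()}
    # The vocabulary is every k-mer seen in at least one genome, in sorted order.
    vocabulary = sorted(set().union(*per_genome.values()))
    matrix = [[per_genome[name][word] for word in vocabulary] for name in genomes]
    return list(genomes), vocabulary, matrix


# Toy sequences (hypothetical, not from the study).
toy_genomes = {"genome_a": "ACGTACGT", "genome_b": "ACGTTTGN"}
names, vocabulary, matrix = document_term_matrix(toy_genomes, k=3)
```

A count matrix of this shape is the usual input to a latent Dirichlet allocation implementation, for example `LatentDirichletAllocation` in scikit-learn. The abstract does not state which implementation or how many topics were used.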

Details

Language :
English
ISSN :
2073-4425
Volume :
11
Issue :
2
Database :
MEDLINE
Journal :
Genes
Publication Type :
Academic Journal
Accession number :
32075081
Full Text :
https://doi.org/10.3390/genes11020197