Machine-learning classification suggests that many alphaproteobacterial prophages may instead be gene transfer agents
- Source :
- Genome Biology and Evolution
- Publication Year :
- 2019
- Publisher :
- Cold Spring Harbor Laboratory, 2019.
Abstract
- Many of the sequenced bacterial and archaeal genomes encode regions of viral provenance. Yet, not all of these regions encode bona fide viruses. Gene transfer agents (GTAs) are thought to be former viruses that are now maintained in genomes of some bacteria and archaea and are hypothesized to enable exchange of DNA within bacterial populations. In Alphaproteobacteria, genes homologous to the ‘head-tail’ gene cluster that encodes structural components of the Rhodobacter capsulatus GTA (RcGTA) are found in many taxa, even if they are only distantly related to Rhodobacter capsulatus. Yet, in most genomes available in GenBank RcGTA-like genes have annotations of typical viral proteins, and therefore are not easily distinguished from their viral homologs without additional analyses. Here, we report a ‘support vector machine’ classifier that quickly and accurately distinguishes RcGTA-like genes from their viral homologs by capturing the differences in the amino acid composition of the encoded proteins. Our open-source classifier is implemented in Python and can be used to scan homologs of the RcGTA genes in newly sequenced genomes. The classifier can also be trained to identify other types of GTAs, or even to detect other elements of viral ancestry. Using the classifier trained on a manually curated set of homologous viruses and GTAs, we detected RcGTA-like ‘head-tail’ gene clusters in 57.5% of the 1,423 examined alphaproteobacterial genomes. We also demonstrated that more than half of the in silico prophage predictions are instead likely to be GTAs, suggesting that in many alphaproteobacterial genomes the RcGTA-like elements remain unrecognized.
- Data deposition: Sequence alignments and phylogenetic trees are available in a FigShare repository at DOI 10.6084/m9.figshare.8796419. The Python source code of the described classifier and additional scripts used in the analyses are available via a GitHub repository at https://github.com/ecg-lab/GTA-Hunter-v1
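The abstract states that the classifier separates RcGTA-like genes from viral homologs by differences in amino acid composition. As a rough illustration of what such a feature could look like, the sketch below turns a protein sequence into a 20-element vector of relative amino acid frequencies. This is not the authors' code: the function name `amino_acid_composition`, the fixed residue order, and the toy sequence are assumptions made for illustration, and the published tool may compute its features differently.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in a fixed order, so every vector lines up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(protein: str) -> List[float]:
    """Return the relative frequency of each standard amino acid.

    Hypothetical helper for illustration; non-standard residue codes
    (for example X or *) are ignored rather than counted.
    """
    seq = protein.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy sequence invented for this example (not a real RcGTA protein).
example = amino_acid_composition("MKKLAAGG")
```

Vectors of this kind, computed for labelled GTA and viral homologs, are the sort of input a support vector machine could be trained on to perform the binary classification the abstract describes.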
- Subjects :
- 0106 biological sciences
Support Vector Machine
Genes, Viral
Prophages
In silico
Computational biology
Biology
ENCODE
010603 evolutionary biology
01 natural sciences
Genome
Rhodobacter capsulatus
03 medical and health sciences
Gene cluster
Genetics
carbon depletion
virus exaptation
Gene
Ecology, Evolution, Behavior and Systematics
Prophage
030304 developmental biology
Alphaproteobacteria
0303 health sciences
Rhodobacter
binary classification
Phylogenetic tree
biology.organism_classification
GTA
Genes, Bacterial
GenBank
Genome, Bacterial
Research Article
- Database :
- OpenAIRE
- Journal :
- Genome Biology and Evolution
- Accession number :
- edsair.doi.dedup.....c732444d9149377e278346d15e1ea696
- Full Text :
- https://doi.org/10.1101/697243