1. ClustAGE: a tool for clustering and distribution analysis of bacterial accessory genomic elements
- Author
- Egon A. Ozer
- Subjects
- 0301 basic medicine; Genomic Islands; 030106 microbiology; Population; Flexible genome; Bacterial genome size; Computational biology; Biology; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; Genome; 03 medical and health sciences; Structural Biology; Phylogenetics; Cluster Analysis; education; Molecular Biology; Gene; lcsh:QH301-705.5; Phylogeny; Comparative genomics; education.field_of_study; Base Sequence; Bacteria; Applied Mathematics; Accessory genome; Phenotype; Computer Science Applications; 030104 developmental biology; lcsh:Biology (General); Genes, Bacterial; Pseudomonas aeruginosa; lcsh:R858-859.7; DNA microarray; Algorithms; Genome, Bacterial; Software
- Abstract
Background: The non-conserved accessory genome of bacteria can be associated with important adaptive characteristics that can contribute to niche specificity or pathogenicity of strains. High degrees of structural and compositional diversity in genomic islands and other elements of the accessory genome can complicate characterization of accessory genome contents among populations of strains. Methods for easily and effectively defining the distributions of discrete elements of the accessory genome among bacterial strains in a population are needed to explore the relationships between the flexible genome and bacterial adaptive traits.
Results: We have developed the open-source software package ClustAGE. This program, written in Perl, uses BLAST to cluster nucleotide accessory genomic elements from the genomes of multiple bacterial strains and to identify their distribution within the study population. The program output can be used in combination with strain phenotype data or other characteristics to detect associations. Optional graphical output is available for visualizing accessory genome gene content and distribution patterns. The capabilities of the software are demonstrated on a collection of 14 Pseudomonas aeruginosa genome sequences.
Conclusions: The ClustAGE software and utilities are effective for identifying characteristics and distributions of accessory genomic elements among groups of bacterial genomes. The ability to easily and effectively characterize the accessory genome of a sequence collection may provide a better understanding of the accessory genome’s contribution to a species’ adaptation and pathogenesis. The ClustAGE source code can be downloaded from https://clustage.sourceforge.io and a limited web-based implementation is available at http://vfsmspineagent.fsm.northwestern.edu/cgi-bin/clustage.cgi.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2154-x) contains supplementary material, which is available to authorized users.
- Published
- 2018