Comparative analysis of assembly algorithms to optimize biosynthetic gene cluster identification in novel marine actinomycete genomes

Authors :
Daniela Tizabi
Tsvetan Bachvaroff
Russell T. Hill
Source :
Frontiers in Marine Science, Vol 9 (2022)
Publication Year :
2022
Publisher :
Frontiers Media S.A., 2022.

Abstract

Many marine sponges harbor dense communities of microbes that aid in the chemical defense of these nonmotile hosts. Metabolites that make up this chemical arsenal can have pharmaceutically relevant activities, including antibacterial, antiviral, antifungal and anticancer properties. Previous investigation of the Caribbean giant barrel sponge Xestospongia muta revealed a microbial community including novel Actinobacteria, a phylum well known for its production of antibiotic compounds. This novel assemblage was investigated, using a bioinformatics approach, for its ability to produce compounds that inhibit Mycobacterium tuberculosis (M. tb). Microbial extracts were tested for their ability to inhibit growth of M. tb, and the genomes of the 11 strains that showed anti-M. tb activity, comprising Micrococcus (n=2), Micromonospora (n=4), Streptomyces (n=3), and Brevibacterium spp. (n=2), were sequenced using Illumina MiSeq. Three assembly algorithms/pipelines (SPAdes, A5-miseq and Shovill) were compared for their ability to construct contigs with minimal gaps, so as to maximize the probability of identifying complete biosynthetic gene clusters (BGCs) present in the genomes. Although A5-miseq and Shovill usually assembled raw reads into the fewest contigs, after the necessary post-assembly filtering SPAdes generally produced the most complete genomes with the fewest contigs. This study revealed the strengths and weaknesses of the different assemblers in terms of their ease of use and how readily their output can be manipulated, depending on its format. None of the assembly methods handles contamination well, and high-quality DNA is a prerequisite. BGCs of compounds with known anti-TB activity were identified in all Micromonospora and Streptomyces strains (genomes > 5 Mb), while no such BGCs were identified in Micrococcus or Brevibacterium strains (genomes < 5 Mb). The majority of the putative BGCs identified were located on contig edges, emphasizing the inability of short-read assemblers to resolve repeat regions and supporting the need for long-read sequencing to fully resolve BGCs.
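
The observation that most putative BGCs sit on contig edges can be illustrated with a minimal Python sketch. It is not the authors' method: the `Region` type, the function names, the `margin` parameter and the toy coordinates are all hypothetical and introduced here only for illustration. The idea is simply that a predicted region touching either end of its contig may be truncated by the assembly break.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A putative BGC region on one contig (0-based start, exclusive end).

    Hypothetical type for illustration; not taken from the study.
    """
    contig: str
    start: int
    end: int


def on_contig_edge(region: Region, contig_length: int, margin: int = 0) -> bool:
    """Return True if the region reaches within `margin` bp of either contig end."""
    if not (0 <= region.start < region.end <= contig_length):
        raise ValueError("region lies outside the contig")
    return region.start <= margin or region.end >= contig_length - margin


def edge_fraction(regions, contig_lengths, margin=0):
    """Fraction of regions flagged as touching a contig edge (0.0 if none given)."""
    if not regions:
        return 0.0
    flagged = sum(
        on_contig_edge(r, contig_lengths[r.contig], margin) for r in regions
    )
    return flagged / len(regions)


# Toy, made-up coordinates (not data from the paper).
toy_lengths = {"contig_1": 100_000, "contig_2": 50_000}
toy_regions = [
    Region("contig_1", 0, 5_000),        # starts at the contig start
    Region("contig_1", 20_000, 30_000),  # interior region
    Region("contig_2", 45_000, 50_000),  # runs to the contig end
]

if __name__ == "__main__":
    print(f"fraction on contig edges: {edge_fraction(toy_regions, toy_lengths):.2f}")
```

A nonzero `margin` would also flag regions that stop just short of a contig end; which margin is appropriate is a judgment call that this sketch does not settle.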

Details

Language :
English
ISSN :
2296-7745
Volume :
9
Database :
Directory of Open Access Journals
Journal :
Frontiers in Marine Science
Publication Type :
Academic Journal
Accession number :
edsdoj.3b991116c2e408298bef46f7a7f0432
Document Type :
article
Full Text :
https://doi.org/10.3389/fmars.2022.914197