Software choice and depth of sequence coverage can impact plastid genome assembly – A case study in the narrow endemic Calligonum bakuense

Authors :
Katja Reichel
Eka Giorgashvili
Thomas Borsch
Calvinna Caswara
Vuqar Kerimov
Michael Gruenstaeudl
Publication Year :
2021
Publisher :
Cold Spring Harbor Laboratory, 2021.

Abstract

Most plastid genome sequences are assembled from short-read whole-genome sequencing data, yet the impact that sequence coverage and the choice of assembly software can have on the accuracy of the resulting assemblies is poorly understood. In this study, we test the impact of both factors on plastid genome assembly in the threatened and rare endemic shrub Calligonum bakuense, which forms a distinct lineage in the genus Calligonum. We aim to characterize the differences across plastid genome assemblies generated by different assembly software tools and levels of sequence coverage and to determine if these differences are large enough to affect the phylogenetic position inferred for C. bakuense. Four assembly software tools (FastPlast, GetOrganelle, IOGA, and NOVOPlasty) and three levels of sequence coverage (original depth, 2,000x, and 500x) are compared in our analyses. The resulting assemblies are evaluated with regard to reproducibility, contig number, gene complement, inverted repeat length, and computation time; the impact of sequence differences on phylogenetic tree inference is also assessed. Our results show that software choice can have a considerable impact on the accuracy and reproducibility of plastid genome assembly and that GetOrganelle produced the most consistent assemblies for C. bakuense. Moreover, we found that a cap in sequence coverage can reduce both the sequence variability across assembly contigs and computation time. While no evidence was found that the sequence variability across assemblies was large enough to affect the phylogenetic position inferred for C. bakuense, differences among the assemblies may influence genotype recognition at the population level.
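The arithmetic behind a coverage cap such as 2,000x or 500x can be sketched as follows. This is an illustrative sketch only: the abstract does not say how coverage was capped, so random subsampling of reads is an assumption here, and the function names and all numbers (read count, read length, genome length) are hypothetical rather than taken from the study.

```python
def mean_depth(n_reads: int, read_length: int, genome_length: int) -> float:
    """Approximate mean depth of coverage: total sequenced bases / genome length."""
    return n_reads * read_length / genome_length


def keep_fraction(n_reads: int, read_length: int, genome_length: int,
                  target_depth: float) -> float:
    """Fraction of reads to keep so that mean depth is about target_depth.

    Returns 1.0 when the data are already at or below the target,
    because subsampling cannot increase coverage.
    """
    depth = mean_depth(n_reads, read_length, genome_length)
    return min(1.0, target_depth / depth)


if __name__ == "__main__":
    # Hypothetical inputs, not values reported in the study:
    # 4 million plastid-derived reads of 150 bp, 160 kb plastid genome.
    n_reads, read_length, genome_length = 4_000_000, 150, 160_000

    print(f"original depth: {mean_depth(n_reads, read_length, genome_length):.0f}x")
    for target in (2000, 500):
        frac = keep_fraction(n_reads, read_length, genome_length, target)
        print(f"cap at {target}x: keep {frac:.3f} of reads")
```

The resulting fraction could then be passed to whichever read-subsampling tool is in use; lowering the read count in this way is one plausible route to the shorter computation times mentioned in the abstract.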

Details

Database :
OpenAIRE
Accession number :
edsair.doi...........90ef15ded607557bbcce71bb03e81145
Full Text :
https://doi.org/10.1101/2021.10.06.463392