Easy and Accurate Reconstruction of Whole HIV Genomes from Short-Read Sequence Data

Authors :
Katrien Fransen
Guido Vanham
Huldrych F. Günthard
Roger D. Kouyos
Ben Berkhout
Daniela Bezemer
Kholoud Porter
François Blanquart
Astrid Gall
Ard van Sighem
Norbert Bannert
Oliver Laeyendecker
Peter Reiss
Jan Albert
Marion Cornelissen
Margreet Bakker
Christophe Fraser
M. Kate Grabowski
Chris Wymant
Kirsi Liitsola
Matti Ristola
Laurence Meyer
Jacques Fellay
Annabelle Gourlay
Paul Kellam
Nicholas J. Croucher
Swee Hoe Ong
Mariska Hillebregt
Tanya Golubchik
Barbara Gunsenheimer-Bartmeyer
Pia Kivelä
Matthew Hall
Publication Year :
2016
Publisher :
Cold Spring Harbor Laboratory, 2016.

Abstract

Next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of rapid between- and within-host evolution may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by effectively aligning the reads to themselves, producing a set of sequences called contigs. However, contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to preprocess reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We use shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read data produced with the Illumina platform, for 65 existing publicly available samples and 50 new samples. We show the systematic superiority of mapping to shiver’s constructed reference over mapping the same reads to the standard reference HXB2: an average of 29 bases per sample are called differently, of which 98.5% are supported by higher coverage. We also provide a practical guide to working with imperfect contigs.

Details

Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....6f64e30970f12c83a9d8e12327c565e5
Full Text :
https://doi.org/10.1101/092916