HIV-1 Full-Genome Phylogenetics of Generalized Epidemics in Sub-Saharan Africa: Impact of Missing Nucleotide Characters in Next-Generation Sequences

Authors :
Ratmann, Oliver
Wymant, Chris
Colijn, Caroline
Danaviah, Siva
Essex, Max
Frost, Simon
Gall, Astrid
Gaseitsiwe, Simani
Grabowski, Mary
Gray, Ronald
Guindon, Stéphane
Von Haeseler, Arndt
Kaleebu, Pontiano
Kendall, Michelle
Kozlov, Alexey
Manasa, Justen
Minh, Bui Quang
Moyo, Sikhulile
Novitsky, Vlad
Nsubuga, Rebecca
Pillay, Sureshnee
Quinn, Thomas
Serwadda, David
Ssemwanga, Deogratius
Stamatakis, Alexandros
Trifinopoulos, Jana
Wawer, Maria
Brown, Andy Leigh
De Oliveira, Tulio
Pillay, Deenan
Fraser, Christophe
Department of Infectious Disease Epidemiology [London] (DIDE)
Imperial College London
Botswana Harvard AIDS Institute Partnership
Méthodes et Algorithmes pour la Bioinformatique (MAB)
Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM)
Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)
Africa Centre for Health and Population Studies
University of KwaZulu-Natal (UKZN)-Medical Research Council of South Africa
Source :
AIDS Research and Human Retroviruses, Mary Ann Liebert, 2017, 33 (11), pp.1083-1098. ⟨10.1089/aid.2017.0061⟩
Publication Year :
2017
Publisher :
HAL CCSD, 2017.

Abstract

To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the “Phylogenetics and Networks for Generalised HIV Epidemics in Africa” consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n = 2,833; MRC/UVRI Uganda, n = 701; Mochudi Prevention Project, n = 359; Africa Health Research Institute Resistance Cohort, n = 92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3′ end of the genome, was not associated with subtype diversity except HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in terms of taxon sampling below which the patchy distribution of missing characters in next-generation sequences (NGS) has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected.

Details

Language :
English
ISSN :
0889-2229
Database :
OpenAIRE
Journal :
AIDS Research and Human Retroviruses, Mary Ann Liebert, 2017, 33 (11), pp.1083-1098. ⟨10.1089/aid.2017.0061⟩
Accession number :
edsair.dedup.wf.001..1eca5e316bee8c8425023e1a2f4c07f3
Full Text :
https://doi.org/10.1089/aid.2017.0061