A penalized regression approach to haplotype reconstruction of viral populations arising in early HIV/SIV infection
- Source :
- Bioinformatics. 33:2455-2463
- Publication Year :
- 2017
- Publisher :
- Oxford University Press (OUP), 2017.
Abstract
- Motivation: Next-generation sequencing (NGS) has been increasingly applied to characterize viral evolution during HIV and SIV infections. In particular, NGS datasets sampled during the initial months of infection are characterized by relatively low levels of diversity as well as convergent evolution at multiple loci dispersed across the viral genome. Consequently, fully characterizing viral evolution from NGS datasets requires haplotype reconstruction across large regions of the viral genome. Existing haplotype reconstruction algorithms were not developed with the particular characteristics of early HIV/SIV infection in mind, raising the possibility that a specifically designed algorithm could achieve better performance.
Results: Here, we introduce a haplotype reconstruction algorithm, RegressHaplo, specifically designed for low-diversity and convergent-evolution regimes. The algorithm uses a penalized regression that balances a data-fitting term with a penalty term that encourages solutions with few haplotypes. The regression covariates are a large set of potential haplotypes, and fitting the regression is made computationally feasible by the low-diversity setting. Using simulated and in vivo datasets, we compare RegressHaplo to PredictHaplo and QuRe, two existing haplotype reconstruction algorithms. RegressHaplo performs better than these algorithms on simulated datasets with relatively low diversity levels. We suggest RegressHaplo as a novel tool for the investigation of early-infection HIV/SIV datasets and, more generally, low-diversity viral NGS datasets.
Availability and Implementation: https://github.com/SLeviyang/RegressHaplo
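The abstract describes the regression only at a high level, so the following is a toy sketch of the general idea, not RegressHaplo's implementation. It assumes that the data are single-site and pairwise (two-site) allele frequencies, and that the candidate haplotypes are every combination of alleles observed at the variable sites. That set grows as the product of per-site allele counts, which is why a low-diversity sample keeps it small. As a stand-in for the paper's sparsity-encouraging penalty, it uses an L1 penalty with a non-negativity constraint, solved by projected gradient descent; this is not necessarily the penalty the paper uses. All function names, parameters (`lam`, `iters`), and data below are hypothetical.

```python
from itertools import combinations, product


def candidate_haplotypes(site_alleles):
    """Enumerate every combination of observed alleles across the variable sites.

    The candidate set grows as the product of per-site allele counts, so a
    low-diversity sample (few variable sites, few alleles per site) keeps it small.
    """
    return list(product(*site_alleles))


def design_rows(site_alleles):
    """One regression row per single-site allele and per allele pair at two sites."""
    n = len(site_alleles)
    rows = [((i, a),) for i in range(n) for a in site_alleles[i]]
    for i, j in combinations(range(n), 2):
        rows += [((i, a), (j, b)) for a in site_alleles[i] for b in site_alleles[j]]
    return rows


def design_matrix(rows, haplotypes):
    """X[r][h] = 1.0 if haplotype h carries every (site, allele) listed in row r."""
    return [[1.0 if all(h[i] == a for i, a in row) else 0.0 for h in haplotypes]
            for row in rows]


def fit_frequencies(X, y, lam=0.01, iters=5000):
    """Projected gradient descent on ||y - Xp||^2 + lam * sum(p), subject to p >= 0.

    The L1 term is a stand-in sparsity penalty chosen for this sketch.
    """
    m, k = len(X), len(X[0])
    # X^T X has nonnegative entries, so its largest row sum bounds its top eigenvalue;
    # twice that bound is a Lipschitz constant for the gradient, giving a safe step.
    xtx = [[sum(X[r][s] * X[r][t] for r in range(m)) for t in range(k)] for s in range(k)]
    step = 1.0 / (2.0 * max(sum(row) for row in xtx))
    p = [1.0 / k] * k  # start from a uniform mixture over all candidates
    for _ in range(iters):
        resid = [sum(X[r][t] * p[t] for t in range(k)) - y[r] for r in range(m)]
        grad = [2.0 * sum(X[r][t] * resid[r] for r in range(m)) + lam for t in range(k)]
        # Gradient step, then project back onto the nonnegative orthant.
        p = [max(0.0, p[t] - step * grad[t]) for t in range(k)]
    return p


if __name__ == "__main__":
    # Hypothetical toy sample: three variable sites, two alleles each.
    site_alleles = [("A", "G"), ("C", "T"), ("G", "A")]
    haps = candidate_haplotypes(site_alleles)
    rows = design_rows(site_alleles)
    X = design_matrix(rows, haps)
    # Simulated "observed" frequencies from a 70/30 mixture of two haplotypes.
    truth = {("A", "C", "G"): 0.7, ("G", "T", "A"): 0.3}
    y = [sum(f for h, f in truth.items() if all(h[i] == a for i, a in row))
         for row in rows]
    p = fit_frequencies(X, y)
    for h, f in sorted(zip(haps, p), key=lambda hf: -hf[1]):
        print("".join(h), round(f, 3))
```

With these toy numbers, the two planted haplotypes should come out at frequencies just under 0.7 and 0.3, each shrunk slightly by the penalty, and the other six candidates at zero. The pairwise rows matter: single-site frequencies alone do not record which alleles sit together on the same genome.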
- Subjects :
- 0301 basic medicine
Statistics and Probability
HIV Infections
Genome, Viral
Computational biology
Biology
Biochemistry
Genome
DNA sequencing
03 medical and health sciences
Convergent evolution
Covariate
Animals
Humans
Molecular Biology
Genetics
030102 biochemistry & molecular biology
Sequence Analysis, RNA
Haplotype
HIV
High-Throughput Nucleotide Sequencing
Reconstruction algorithm
Genomics
Original Papers
Regression
Computer Science Applications
Computational Mathematics
030104 developmental biology
Haplotypes
Computational Theory and Mathematics
Viral evolution
Simian Immunodeficiency Virus
Algorithms
Software
Retroviridae Infections
Details
- ISSN :
- 1367-4811 and 1367-4803
- Volume :
- 33
- Database :
- OpenAIRE
- Journal :
- Bioinformatics
- Accession number :
- edsair.doi.dedup.....d6eb3440dc0927719959cd77fbe0c742
- Full Text :
- https://doi.org/10.1093/bioinformatics/btx187