Author: "Art F Y Poon" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Art F Y Poon"' showing total 154 results

1. Optimized phylogenetic clustering of HIV-1 sequence data for public health applications.

Author: Connor Chato, Yi Feng, Yuhua Ruan, Hui Xing, Joshua Herbeck, Marcia Kalish, and Art F Y Poon
Subjects: Biology (General), QH301-705.5
Abstract: Clusters of genetically similar infections suggest rapid transmission and may indicate priorities for public health action or reveal underlying epidemiological processes. However, clusters often require user-defined thresholds and are sensitive to non-epidemiological factors, such as non-random sampling. Consequently, the ideal threshold for public health applications varies substantially across settings. Here, we present a method that selects optimal thresholds for phylogenetic (subset tree) clustering based on population. We evaluated this method on HIV-1 pol datasets (n = 14,221 sequences) from four sites in USA (Tennessee, Washington), Canada (Northern Alberta) and China (Beijing). Clusters were defined by tips descending from an ancestral node (with a minimum bootstrap support of 95%) through a series of branches, each with a length below a given threshold. Next, we used pplacer to graft new cases to the fixed tree by maximum likelihood. We evaluated the effect of varying branch-length thresholds on cluster growth as a count outcome by fitting two Poisson regression models: a null model that predicts growth from cluster size, and an alternative model that includes mean collection date as an additional covariate. The alternative model was favoured by AIC across most thresholds, with optimal (greatest difference in AIC) thresholds ranging from 0.007 to 0.013 across sites. The range of optimal thresholds was more variable when re-sampling 80% of the data by location (IQR 0.008-0.016, n = 100 replicates). Our results use prospective phylogenetic cluster growth and suggest that there is more variation in effective thresholds for public health than those typically used in clustering studies.
Published: 2022
Full Text: View/download PDF
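
The model comparison described in this abstract can be illustrated with a minimal sketch. It assumes a log-link Poisson regression fitted by Newton-Raphson with step-halving, and it uses made-up cluster data; the function names (solve, log_likelihood, fit_poisson, aic) and all numbers are hypothetical and are not taken from the authors' software. In the study this comparison would be repeated for each branch-length threshold, keeping the threshold with the greatest AIC difference.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def log_likelihood(beta, X, y):
    """Poisson log-likelihood with a log link."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        total += yi * eta - math.exp(eta) - math.lgamma(yi + 1)
    return total

def fit_poisson(X, y, max_iter=100, tol=1e-10):
    """Return (coefficients, maximized log-likelihood).
    Each row of X must start with 1.0 for the intercept."""
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    ll = log_likelihood(beta, X, y)
    for _ in range(max_iter):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu))
                for j in range(p)]
        info = [[sum(row[j] * row[k] * mi for row, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        step = solve(info, grad)
        # Halve the step until the log-likelihood no longer decreases.
        for _halving in range(30):
            candidate = [b + s for b, s in zip(beta, step)]
            cand_ll = log_likelihood(candidate, X, y)
            if cand_ll >= ll:
                break
            step = [s / 2.0 for s in step]
        beta, ll = candidate, cand_ll
        if max(abs(s) for s in step) < tol:
            break
    return beta, ll

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * loglik

# Hypothetical clusters: size, mean collection date (years, offset), new cases.
sizes = [2, 3, 5, 8, 4, 6, 2, 10, 3, 7]
mean_date = [3.0, 0.5, 2.0, 1.0, 3.5, 0.2, 1.5, 2.5, 2.8, 0.4]
growth = [1, 0, 2, 1, 3, 0, 0, 5, 2, 0]

X_null = [[1.0, s] for s in sizes]
X_alt = [[1.0, s, d] for s, d in zip(sizes, mean_date)]
_, ll_null = fit_poisson(X_null, growth)
_, ll_alt = fit_poisson(X_alt, growth)
delta_aic = aic(ll_null, 2) - aic(ll_alt, 3)
print("AIC difference (null - alternative): %.3f" % delta_aic)
```

A positive difference favours the alternative model that includes collection date.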

2. Molecular source attribution.

Author: Elisa Chao, Connor Chato, Reid Vender, Abayomi S Olabode, Roux-Cil Ferreira, and Art F Y Poon
Subjects: Biology (General), QH301-705.5
Published: 2022
Full Text: View/download PDF

3. Using networks to analyze and visualize the distribution of overlapping genes in virus genomes.

Author: Laura Muñoz-Baena and Art F Y Poon
Subjects: Immunologic diseases. Allergy, RC581-607, Biology (General), QH301-705.5
Abstract: Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping open reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs was encoded in the same frame on the opposite strand (-0) tend to be longer. Next, we develop a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.
Published: 2022
Full Text: View/download PDF
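
As a rough illustration of the per-genome measurements mentioned in this abstract, the sketch below computes the overlap length and relative reading-frame shift for a pair of ORFs. The coordinate convention (1-based, inclusive, start <= end), the function name orf_overlap, and the way the shift is labelled are assumptions made for this example; the paper's own conventions, in particular for antisense overlaps, may differ, so opposite-strand pairs are reported here without a shift.

```python
def orf_overlap(a, b):
    """Return (overlap_length, shift) for two ORFs, or None if they do not overlap.

    Each ORF is (start, end, strand) with 1-based inclusive coordinates,
    start <= end, and strand '+' or '-'. The shift is reported as '+0', '+1'
    or '+2' for same-strand pairs and as None for opposite-strand pairs
    (no antisense convention is assumed in this sketch).
    """
    a_start, a_end, a_strand = a
    b_start, b_end, b_strand = b
    length = min(a_end, b_end) - max(a_start, b_start) + 1
    if length <= 0:
        return None
    if a_strand != b_strand:
        return (length, None)
    if a_strand == '+':
        # Forward strand: codons are counted from the start coordinate.
        shift = (b_start - a_start) % 3
    else:
        # Reverse strand: codons are counted from the end coordinate.
        shift = (a_end - b_end) % 3
    return (length, "+%d" % shift)

print(orf_overlap((1, 300, '+'), (201, 401, '+')))
```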

4. Network science inspires novel tree shape statistics.

Author: Leonid Chindelevitch, Maryam Hayati, Art F Y Poon, and Caroline Colijn
Subjects: Medicine, Science
Abstract: The shape of phylogenetic trees can be used to gain evolutionary insights. A tree's shape specifies the connectivity of a tree, while its branch lengths reflect either the time or genetic distance between branching events; well-known measures of tree shape include the Colless and Sackin imbalance, which describe the asymmetry of a tree. In other contexts, network science has become an important paradigm for describing structural features of networks and using them to understand complex systems, ranging from protein interactions to social systems. Network science is thus a potential source of many novel ways to characterize tree shape, as trees are also networks. Here, we tailor tools from network science, including diameter, average path length, and betweenness, closeness, and eigenvector centrality, to summarize phylogenetic tree shapes. We thereby propose tree shape summaries that are complementary to both asymmetry and the frequencies of small configurations. These new statistics can be computed in linear time and scale well to describe the shapes of large trees. We apply these statistics, alongside some conventional tree statistics, to phylogenetic trees from three very different viruses (HIV, dengue fever and measles), from the same virus in different epidemiological scenarios (influenza A and HIV) and from simulation models known to produce trees with different shapes. Using mutual information and supervised learning algorithms, we find that the statistics adapted from network science perform as well as or better than conventional statistics. We describe their distributions and prove some basic results about their extreme values in a tree. We conclude that network science-based tree shape summaries are a promising addition to the toolkit of tree shape features. All our shape summaries, as well as functions to select the most discriminating ones for two sets of trees, are freely available as an R package at http://github.com/Leonardini/treeCentrality.
Published: 2021
Full Text: View/download PDF
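
The two conventional imbalance measures named in this abstract, Sackin and Colless, can be computed in a single pass over a rooted binary tree. The sketch below is only an illustration with a hypothetical nested-tuple tree encoding; it is not the treeCentrality R package and does not cover the network-science statistics the paper proposes.

```python
def tree_imbalance(tree):
    """Return (n_tips, sackin, colless) for a rooted binary tree.

    A tip is any non-tuple value; an internal node is a 2-tuple (left, right).
    Sackin index: sum over tips of the number of edges between tip and root.
    Colless index: sum over internal nodes of |tips on left - tips on right|.
    """
    if not isinstance(tree, tuple):
        return (1, 0, 0)
    left, right = tree
    n_l, s_l, c_l = tree_imbalance(left)
    n_r, s_r, c_r = tree_imbalance(right)
    n = n_l + n_r
    # Every tip below this node is one edge deeper than it was in its subtree.
    sackin = s_l + s_r + n
    colless = c_l + c_r + abs(n_l - n_r)
    return (n, sackin, colless)

balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(tree_imbalance(balanced))     # (4, 8, 0)
print(tree_imbalance(caterpillar))  # (4, 9, 3)
```

Both trees have four tips, but the caterpillar tree scores higher on each index, which is the asymmetry these statistics are meant to capture.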

5. A model-based clustering method to detect infectious disease transmission outbreaks from sequence variation.

Author: Rosemary M McCloskey and Art F Y Poon
Subjects: Biology (General), QH301-705.5
Abstract: Clustering infections by genetic similarity is a popular technique for identifying potential outbreaks of infectious disease, in part because sequences are now routinely collected for clinical management of many infections. A diverse range of nonparametric clustering methods have been developed for this purpose. These methods are generally intuitive, rapid to compute, and readily scale with large data sets. However, we have found that nonparametric clustering methods can be biased towards identifying clusters of diagnosis (where individuals are sampled sooner post-infection) rather than the clusters of rapid transmission that are meant to be potential foci for public health efforts. We develop a fundamentally new approach to genetic clustering based on fitting a Markov-modulated Poisson process (MMPP), which represents the evolution of transmission rates along the tree relating different infections. We evaluated this model-based method alongside five nonparametric clustering methods using both simulated and actual HIV sequence data sets. For simulated clusters of rapid transmission, the MMPP clustering method obtained higher mean sensitivity (85%) and specificity (91%) than the nonparametric methods. When we applied these clustering methods to published sequences from a study of HIV-1 genetic clusters in Seattle, USA, we found that the MMPP method categorized about half (46%) as many individuals to clusters compared to the other methods. Furthermore, the mean internal branch lengths that approximate transmission rates were significantly shorter in clusters extracted using MMPP, but not by other methods. We determined that the computing time for the MMPP method scaled linearly with the size of trees, requiring about 30 seconds for a tree of 1,000 tips and about 20 minutes for 50,000 tips on a single computer. This new approach to genetic clustering has significant implications for the application of pathogen sequence analysis to public health, where it is critical to robustly and accurately identify clusters for the most cost-effective deployment of outbreak management and prevention resources.
Published: 2017
Full Text: View/download PDF

6. Ancestral Reconstruction.

Author: Jeffrey B Joy, Richard H Liang, Rosemary M McCloskey, T Nguyen, and Art F Y Poon
Subjects: Biology (General), QH301-705.5
Published: 2016
Full Text: View/download PDF

7. Genotypic and functional impact of HIV-1 adaptation to its host population during the North American epidemic.

Author: Laura A Cotton, Xiaomei T Kuang, Anh Q Le, Jonathan M Carlson, Benjamin Chan, Denis R Chopera, Chanson J Brumme, Tristan J Markle, Eric Martin, Aniqa Shahid, Gursev Anmole, Philip Mwimanzi, Pauline Nassab, Kali A Penney, Manal A Rahman, M-J Milloy, Martin T Schechter, Martin Markowitz, Mary Carrington, Bruce D Walker, Theresa Wagner, Susan Buchbinder, Jonathan Fuchs, Beryl Koblin, Kenneth H Mayer, P Richard Harrigan, Mark A Brockman, Art F Y Poon, and Zabrina L Brumme
Subjects: Genetics, QH426-470
Abstract: HLA-restricted immune escape mutations that persist following HIV transmission could gradually spread through the viral population, thereby compromising host antiviral immunity as the epidemic progresses. To assess the extent and phenotypic impact of this phenomenon in an immunogenetically diverse population, we genotypically and functionally compared linked HLA and HIV (Gag/Nef) sequences from 358 historic (1979-1989) and 382 modern (2000-2011) specimens from four key cities in the North American epidemic (New York, Boston, San Francisco, Vancouver). Inferred HIV phylogenies were star-like, with approximately two-fold greater mean pairwise distances in modern versus historic sequences. The reconstructed epidemic ancestral (founder) HIV sequence was essentially identical to the North American subtype B consensus. Consistent with gradual diversification of a "consensus-like" founder virus, the median "background" frequencies of individual HLA-associated polymorphisms in HIV (in individuals lacking the restricting HLA[s]) were ∼2-fold higher in modern versus historic HIV sequences, though these remained notably low overall (e.g. in Gag, medians were 3.7% in the 2000s versus 2.0% in the 1980s). HIV polymorphisms exhibiting the greatest relative spread were those restricted by protective HLAs. Despite these increases, when HIV sequences were analyzed as a whole, their total average burden of polymorphisms that were "pre-adapted" to the average host HLA profile was only ∼2% greater in modern versus historic eras. Furthermore, HLA-associated polymorphisms identified in historic HIV sequences were consistent with those detectable today, with none identified that could explain the few HIV codons where the inferred epidemic ancestor differed from the modern consensus. Results are therefore consistent with slow HIV adaptation to HLA, but at a rate unlikely to yield imminent negative implications for cellular immunity, at least in North America. Intriguingly, temporal changes in protein activity of patient-derived Nef (though not Gag) sequences were observed, suggesting functional implications of population-level HIV evolution on certain viral proteins.
Published: 2014
Full Text: View/download PDF

8. 'Deep' sequencing accuracy and reproducibility using Roche/454 technology for inferring co-receptor usage in HIV-1.

Author: David J H F Knapp, Rachel A McGovern, Art F Y Poon, Xiaoyin Zhong, Dennison Chan, Luke C Swenson, Winnie Dong, and P Richard Harrigan
Subjects: Medicine, Science
Abstract: Next generation, "deep", sequencing has increasing applications both clinically and in disparate fields of research. This study investigates the accuracy and reproducibility of "deep" sequencing as applied to co-receptor prediction using the V3 loop of Human Immunodeficiency Virus-1. Despite increasing use in HIV co-receptor prediction, the accuracy and reproducibility of deep sequencing technology, and the factors which can affect it, have received only a limited level of investigation. To accomplish this, repeated deep sequencing results were generated using the Roche GS-FLX (454) from a number of sources including a non-homogeneous clinical sample (N = 47 replicates over 18 deep sequencing runs), and a large clinical cohort from the MOTIVATE and A400129 studies (N = 1521). For repeated measurements of a non-homogeneous clinical sample, increasing input copy number both decreased variance in the measured proportion of non-R5 using virus (p<
Published: 2014
Full Text: View/download PDF

9. Real-time evaluation of signal accuracy in wastewater surveillance of pathogens with high rates of mutation

Author: Ocean Thakali, Élisabeth Mercier, Walaa Eid, Martin Wellman, Julia Brasset-Gorny, Alyssa K. Overton, Jennifer J. Knapp, Douglas Manuel, Trevor C. Charles, Lawrence Goodridge, Eric J. Arts, Art F. Y. Poon, R. Stephen Brown, Tyson E. Graber, Robert Delatolla, Christopher T. DeGroot, and Ontario Wastewater Surveillance Consortium
Subjects: Medicine, Science
Abstract: Wastewater surveillance of coronavirus disease 2019 (COVID-19) commonly applies reverse transcription-quantitative polymerase chain reaction (RT-qPCR) to quantify severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RNA concentrations in wastewater over time. In most applications worldwide, maximal sensitivity and specificity of RT-qPCR has been achieved, in part, by monitoring two or more genomic loci of SARS-CoV-2. In Ontario, Canada, the provincial Wastewater Surveillance Initiative reports the average copies of the CDC N1 and N2 loci normalized to the fecal biomarker pepper mild mottle virus. In November 2021, the emergence of the Omicron variant of concern, harboring a C28311T mutation within the CDC N1 probe region, challenged the accuracy of the consensus between the RT-qPCR measurements of the N1 and N2 loci of SARS-CoV-2. In this study, we developed and applied a novel real-time dual loci quality assurance and control framework based on the relative difference between the loci measurements to the City of Ottawa dataset to identify a loss of sensitivity of the N1 assay in the period from July 10, 2022 to January 31, 2023. Further analysis via sequencing and allele-specific RT-qPCR revealed a high proportion of mutations C28312T and A28330G during the study period, both in the City of Ottawa and across the province. It is hypothesized that nucleotide mutations in the probe region, especially A28330G, led to inefficient annealing, resulting in reduction in sensitivity and accuracy of the N1 assay. This study highlights the importance of implementing quality assurance and control criteria to continually evaluate, in near real-time, the accuracy of the signal produced in wastewater surveillance applications that rely on detection of pathogens whose genomes undergo high rates of mutation.
Published: 2024
Full Text: View/download PDF
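
The abstract describes a quality check based on the relative difference between the N1 and N2 measurements but does not give the formula or the acceptance criterion. The sketch below is therefore only one plausible reading: it assumes the difference is scaled by the mean of the two loci and that a fixed cut-off flags a sample. The function names, the 0.5 threshold and the sample values are all hypothetical.

```python
def relative_difference(n1, n2):
    """Signed difference between two loci, scaled by their mean (assumed form)."""
    mean = (n1 + n2) / 2.0
    if mean == 0:
        return 0.0
    return (n1 - n2) / mean

def flag_divergence(n1_series, n2_series, threshold=0.5):
    """Flag samples where the loci disagree by more than the assumed threshold."""
    return [abs(relative_difference(a, b)) > threshold
            for a, b in zip(n1_series, n2_series)]

# Hypothetical normalized copy numbers for four sampling dates.
n1 = [100.0, 95.0, 50.0, 40.0]
n2 = [100.0, 105.0, 150.0, 160.0]
print(flag_divergence(n1, n2))  # [False, False, True, True]
```

A sustained run of negative flagged values would be consistent with the N1 assay losing sensitivity relative to N2.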

10. Mapping the shapes of phylogenetic trees from human and zoonotic RNA viruses.

Author: Art F Y Poon, Lorne W Walker, Heather Murray, Rosemary M McCloskey, P Richard Harrigan, and Richard H Liang
Subjects: Medicine, Science
Abstract: A phylogeny is a tree-based model of common ancestry that is an indispensable tool for studying biological variation. Phylogenies play a special role in the study of rapidly evolving populations such as viruses, where the proliferation of lineages is constantly being shaped by the mode of virus transmission, by adaptation to immune systems, and by patterns of human migration and contact. These processes may leave an imprint on the shapes of virus phylogenies that can be extracted for comparative study; however, tree shapes are intrinsically difficult to quantify. Here we present a comprehensive study of phylogenies reconstructed from 38 different RNA viruses from 12 taxonomic families that are associated with human pathologies. To accomplish this, we have developed a new procedure for studying phylogenetic tree shapes based on the 'kernel trick', a technique that maps complex objects into a statistically convenient space. We show that our kernel method outperforms nine different tree balance statistics at correctly classifying phylogenies that were simulated under different evolutionary scenarios. Using the kernel method, we observe patterns in the distribution of RNA virus phylogenies in this space that reflect modes of transmission and pathogenesis. For example, viruses that can establish persistent chronic infections (such as HIV and hepatitis C virus) form a distinct cluster. Although the visibly 'star-like' shape characteristic of trees from these viruses has been well-documented, we show that established methods for quantifying tree shape fail to distinguish these trees from those of other viruses. The kernel approach presented here potentially represents an important new tool for characterizing the evolution and epidemiology of RNA viruses.
Published: 2013
Full Text: View/download PDF

11. Reconstructing the dynamics of HIV evolution within hosts from serial deep sequence data.

Author: Art F Y Poon, Luke C Swenson, Evelien M Bunnik, Diana Edo-Matas, Hanneke Schuitemaker, Angélique B van 't Wout, and P Richard Harrigan
Subjects: Biology (General), QH301-705.5
Abstract: At the early stage of infection, human immunodeficiency virus (HIV)-1 predominantly uses the CCR5 coreceptor for host cell entry. The subsequent emergence of HIV variants that use the CXCR4 coreceptor in roughly half of all infections is associated with an accelerated decline of CD4+ T-cells and rate of progression to AIDS. The presence of a 'fitness valley' separating CCR5- and CXCR4-using genotypes is postulated to be a biological determinant of whether the HIV coreceptor switch occurs. Using phylogenetic methods to reconstruct the evolutionary dynamics of HIV within hosts enables us to discriminate between competing models of this process. We have developed a phylogenetic pipeline for the molecular clock analysis, ancestral reconstruction, and visualization of deep sequence data. These data were generated by next-generation sequencing of HIV RNA extracted from longitudinal serum samples (median 7 time points) from 8 untreated subjects with chronic HIV infections (Amsterdam Cohort Studies on HIV-1 infection and AIDS). We used the known dates of sampling to directly estimate rates of evolution and to map ancestral mutations to a reconstructed timeline in units of days. HIV coreceptor usage was predicted from reconstructed ancestral sequences using the geno2pheno algorithm. We determined that the first mutations contributing to CXCR4 use emerged about 16 (per subject range 4 to 30) months before the earliest predicted CXCR4-using ancestor, which preceded the first positive cell-based assay of CXCR4 usage by 10 (range 5 to 25) months. CXCR4 usage arose in multiple lineages within 5 of 8 subjects, and ancestral lineages following alternate mutational pathways before going extinct were common. We observed highly patient-specific distributions and time-scales of mutation accumulation, implying that the role of a fitness valley is contingent on the genotype of the transmitted variant.
Published: 2012
Full Text: View/download PDF

12. Prolonged and substantial discordance in prevalence of raltegravir-resistant HIV-1 in plasma versus PBMC samples revealed by 454 'deep' sequencing.

Author: Guinevere Q Lee, Luke C Swenson, Art F Y Poon, Jeffrey N Martin, Hiroyu Hatano, Steven G Deeks, and P Richard Harrigan
Subjects: Medicine, Science
Abstract: The evolution of drug resistance mutations in plasma samples is relatively well-characterized. However, the viral population and diversity in other body compartments such as peripheral blood mononuclear cells (PBMC) remains poorly understood. Previous studies have mostly focused on protease and reverse transcriptase drug resistance mutations (DRMs). In this study, we used 454 "deep" sequencing technology to observe and quantify longitudinally the prevalence of resistance mutations associated with the integrase inhibitor, raltegravir, in plasma versus PBMC samples from a San Francisco-based cohort. Four heavily treatment-experienced subjects were monitored in this study over a median of 1.2 years since the initiation of raltegravir-containing regimens. We observed a consistent discordance in the prevalence of DRMs, but not resistance pathway(s), in the plasma versus PBMC viral populations. In the final paired samples that were tested while the subjects were on a raltegravir-containing regimen, DRM prevalence reached 100% in plasma but remained 1% in PBMC on day 177 post-therapy in Subject 3180 (Q148H/G140S), 100% in plasma and 36% in PBMC on day 224 in Subject 3242 (N155H), 78% in plasma and 11-12% in PBMC on day 338 in Subject 3501 (Q148H/G140S), and 100% in plasma and 0% in PBMC on day 197 in Subject 3508 (Y143R). Furthermore, absolute sequence homology comparison between the two compartments revealed that 21% - 99% of PBMC sequences had no match in plasma, whereas 14% - 100% of plasma sequences had no match in PBMC. Overall, our observations suggested that plasma and PBMC hosted drastically different HIV-1 populations even after a prolonged exposure to raltegravir selection pressure.
Published: 2012
Full Text: View/download PDF
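
The compartment comparison reported at the end of this abstract amounts to asking what fraction of sequences from one compartment have no identical counterpart in the other. A minimal sketch, assuming exact string matching of already aligned and trimmed reads (the study's actual matching procedure is not described here), with a hypothetical function name and toy sequences:

```python
def unmatched_fraction(query, reference):
    """Fraction of query sequences with no identical sequence in reference."""
    if not query:
        raise ValueError("query must contain at least one sequence")
    reference_set = set(reference)
    missing = sum(1 for seq in query if seq not in reference_set)
    return missing / len(query)

# Toy reads only; real data would be deep-sequencing reads per compartment.
plasma = ["ACGT", "ACGT", "ACGA"]
pbmc = ["ACGT", "TTTT"]
print(unmatched_fraction(pbmc, plasma))   # 0.5
print(unmatched_fraction(plasma, pbmc))
```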

13. HIV-1 nef protein structures associated with brain infection and dementia pathogenesis.

Author: Susanna L Lamers, Art F Y Poon, and Michael S McGrath
Subjects: Medicine, Science
Abstract: The difference between regional rates of HIV-associated dementia (HAD) in patients infected with different subtypes of HIV suggests that genetic determinants exist within HIV that influence the ability of the virus to replicate in the central nervous system (in Uganda, Africa, subtype D HAD rate is 89%, while subtype A HAD rate is 24%). HIV-1 nef is a multifunctional protein with known toxic effects in the brain compartment. The goal of the current study was to identify if specific three-dimensional nef structures may be linked to patients who developed HAD. HIV-1 nef structures were computationally derived for consensus brain and non-brain sequences from a panel of patients infected with subtype B who died due to varied disease pathologies and consensus subtype A and subtype D sequences from Uganda. Site directed mutation analysis identified signatures in brain structures that appear to change binding potentials and could affect folding conformations of brain-associated structures. Despite the large sequence variation between HIV subtypes, structural alignments confirmed that viral structures derived from patients with HAD were more similar to subtype D structures than to structures derived from patient sequences without HAD. Furthermore, structures derived from brain sequences of patients with HAD were more similar to subtype D structures than they were to their own non-brain structures. The potential finding of a brain-specific nef structure indicates that HAD may result from genetic alterations that alter the folding or binding potential of the protein.
Published: 2011
Full Text: View/download PDF

14. Selection in coastal Synechococcus (cyanobacteria) populations evaluated from environmental metagenomes.

Author: Vera Tai, Art F Y Poon, Ian T Paulsen, and Brian Palenik
Subjects: Medicine, Science
Abstract: Environmental metagenomics provides snippets of genomic sequences from all organisms in an environmental sample and is an unprecedented resource of information for investigating microbial population genetics. Current analytical methods, however, are poorly equipped to handle metagenomic data, particularly of short, unlinked sequences. A custom analytical pipeline was developed to calculate dN/dS ratios, a common metric to evaluate the role of selection in the evolution of a gene, from environmental metagenomes sequenced using 454 technology of flow-sorted populations of marine Synechococcus, the dominant cyanobacteria in coastal environments. The large majority of genes (98%) have evolved under purifying selection (dN/dS < 1). [...] Of the genes with dN/dS > 1, 77 out of 83 (93%) were hypothetical. Notable among annotated genes, ribosomal protein L35 appears to be under positive selection in one Synechococcus population. Other annotated genes, in particular a possible porin, a large-conductance mechanosensitive channel, an ATP binding component of an ABC transporter, and a homologue of a pilus retraction protein had regions of the gene with elevated dN/dS. With the increasing use of next-generation sequencing in metagenomic investigations of microbial diversity and ecology, analytical methods need to accommodate the peculiarities of these data streams. By developing a means to analyze population diversity data from these environmental metagenomes, we have provided the first insight into the role of selection in the evolution of Synechococcus, a globally significant primary producer.
Published: 2011
Full Text: View/download PDF

15. Transmitted drug resistance in the CFAR network of integrated clinical systems cohort: prevalence and effects on pre-therapy CD4 and viral load.

Author: Art F Y Poon, Jeannette L Aldous, W Christopher Mathews, Mari Kitahata, James S Kahn, Michael S Saag, Benigno Rodríguez, Stephen L Boswell, Simon D W Frost, and Richard H Haubrich
Subjects: Medicine, Science
Abstract: Human immunodeficiency virus type 1 (HIV-1) genomes often carry one or more mutations associated with drug resistance upon transmission into a therapy-naïve individual. We assessed the prevalence and clinical significance of transmitted drug resistance (TDR) in chronically-infected therapy-naïve patients enrolled in a multi-center cohort in North America. Pre-therapy clinical significance was quantified by plasma viral load (pVL) and CD4+ cell count (CD4) at baseline. Naïve bulk sequences of HIV-1 protease and reverse transcriptase (RT) were screened for resistance mutations as defined by the World Health Organization surveillance list. The overall prevalence of TDR was 14.2%. We used a Bayesian network to identify co-transmission of TDR mutations in clusters associated with specific drugs or drug classes. Aggregate effects of mutations by drug class were estimated by fitting linear models of pVL and CD4 on weighted sums over TDR mutations according to the Stanford HIV Database algorithm. Transmitted resistance to both classes of reverse transcriptase inhibitors was significantly associated with lower CD4, but had opposing effects on pVL. In contrast, position-specific analyses of TDR mutations revealed substantial effects on CD4 and pVL at several residue positions that were being masked in the aggregate analyses, and significant interaction effects as well. Residue positions in RT with predominant effects on CD4 or pVL (D67 and M184) were re-evaluated in causal models using an inverse probability-weighting scheme to address the problem of confounding by other mutations and demographic or risk factors. We found that causal effect estimates of mutations M184V/I (-1.7 log₁₀pVL) and D67N/G (-2.1[³√CD4] and 0.4 log₁₀pVL) were compensated by K103N/S and K219Q/E/N/R. As TDR becomes an increasing dilemma in this modern era of highly-active antiretroviral therapy, these results have immediate significance for the clinical management of HIV-1 infections and our understanding of the ongoing adaptation of HIV-1 to human populations.
Published: 2011
Full Text: View/download PDF

16. An evolutionary model-based algorithm for accurate phylogenetic breakpoint mapping and subtype prediction in HIV-1.

Author: Sergei L Kosakovsky Pond, David Posada, Eric Stawiski, Colombe Chappey, Art F Y Poon, Gareth Hughes, Esther Fearnhill, Mike B Gravenor, Andrew J Leigh Brown, and Simon D W Frost
Subjects: Biology (General), QH301-705.5
Abstract: Genetically diverse pathogens (such as Human Immunodeficiency virus type 1, HIV-1) are frequently stratified into phylogenetically or immunologically defined subtypes for classification purposes. Computational identification of such subtypes is helpful in surveillance, epidemiological analysis and detection of novel variants, e.g., circulating recombinant forms in HIV-1. A number of conceptually and technically different techniques have been proposed for determining the subtype of a query sequence, but there is not a universally optimal approach. We present a model-based phylogenetic method for automatically subtyping an HIV-1 (or other viral or bacterial) sequence, mapping the location of breakpoints and assigning parental sequences in recombinant strains as well as computing confidence levels for the inferred quantities. Our Subtype Classification Using Evolutionary ALgorithms (SCUEAL) procedure is shown to perform very well in a variety of simulation scenarios, runs in parallel when multiple sequences are being screened, and matches or exceeds the performance of existing approaches on typical empirical cases. We applied SCUEAL to all available polymerase (pol) sequences from two large databases, the Stanford Drug Resistance database and the UK HIV Drug Resistance Database. Comparing with subtypes which had previously been assigned revealed that a minor but substantial (approximately 5%) fraction of pure subtype sequences may in fact be within- or inter-subtype recombinants. A free implementation of SCUEAL is provided as a module for the HyPhy package and the Datamonkey web server. Our method is especially useful when an accurate automatic classification of an unknown strain is desired, and is positioned to complement and extend faster but less accurate methods. Given the increasingly frequent use of HIV subtype information in studies focusing on the effect of subtype on treatment, clinical outcome, pathogenicity and vaccine design, the importance of accurate, robust and extensible subtyping procedures is clear.
Published: 2009
Full Text: View/download PDF

17. Parsing social network survey data from hidden populations using stochastic context-free grammars.

Author: Art F Y Poon, Kimberly C Brouwer, Steffanie A Strathdee, Michelle Firestone-Cruz, Remedios M Lozada, Sergei L Kosakovsky Pond, Douglas D Heckathorn, and Simon D W Frost
Subjects: Medicine, Science
Abstract: BACKGROUND: Human populations are structured by social networks, in which individuals tend to form relationships based on shared attributes. Certain attributes that are ambiguous, stigmatized or illegal can create a "hidden" population, so-called because its members are difficult to identify. Many hidden populations are also at an elevated risk of exposure to infectious diseases. Consequently, public health agencies are presently adopting modern survey techniques that traverse social networks in hidden populations by soliciting individuals to recruit their peers, e.g., respondent-driven sampling (RDS). The concomitant accumulation of network-based epidemiological data, however, is rapidly outpacing the development of computational methods for analysis. Moreover, current analytical models rely on unrealistic assumptions, e.g., that the traversal of social networks can be modeled by a Markov chain rather than a branching process. METHODOLOGY/PRINCIPAL FINDINGS: Here, we develop a new methodology based on stochastic context-free grammars (SCFGs), which are well-suited to modeling tree-like structure of the RDS recruitment process. We apply this methodology to an RDS case study of injection drug users (IDUs) in Tijuana, México, a hidden population at high risk of blood-borne and sexually-transmitted infections (i.e., HIV, hepatitis C virus, syphilis). Survey data were encoded as text strings that were parsed using our custom implementation of the inside-outside algorithm in a publicly-available software package (HyPhy), which uses either expectation maximization or direct optimization methods and permits constraints on model parameters for hypothesis testing. We identified significant latent variability in the recruitment process that violates assumptions of Markov chain-based methods for RDS analysis: firstly, IDUs tended to emulate the recruitment behavior of their own recruiter; and secondly, the recruitment of like peers (homophily) was dependent on the number of recruits. CONCLUSIONS: SCFGs provide a rich probabilistic language that can articulate complex latent structure in survey data derived from the traversal of social networks. Such structure that has no representation in Markov chain-based models can interfere with the estimation of the composition of hidden populations if left unaccounted for, raising critical implications for the prevention and control of infectious disease epidemics.
Published: 2009

18. An evolutionary-network model reveals stratified interactions in the V3 loop of the HIV-1 envelope.

Author: Art F Y Poon, Fraser I Lewis, Sergei L Kosakovsky Pond, and Simon D W Frost
Subjects: Biology (General), QH301-705.5
Abstract: The third variable loop (V3) of the human immunodeficiency virus type 1 (HIV-1) envelope is a principal determinant of antibody neutralization and progression to AIDS. Although it is undoubtedly an important target for vaccine research, extensive genetic variation in V3 remains an obstacle to the development of an effective vaccine. Comparative methods that exploit the abundance of sequence data can detect interactions between residues of rapidly evolving proteins such as the HIV-1 envelope, revealing biological constraints on their variability. However, previous studies have relied implicitly on two biologically unrealistic assumptions: (1) that founder effects in the evolutionary history of the sequences can be ignored, and; (2) that statistical associations between residues occur exclusively in pairs. We show that comparative methods that neglect the evolutionary history of extant sequences are susceptible to a high rate of false positives (20%-40%). Therefore, we propose a new method to detect interactions that relaxes both of these assumptions. First, we reconstruct the evolutionary history of extant sequences by maximum likelihood, shifting focus from extant sequence variation to the underlying substitution events. Second, we analyze the joint distribution of substitution events among positions in the sequence as a Bayesian graphical model, in which each branch in the phylogeny is a unit of observation. We perform extensive validation of our models using both simulations and a control case of known interactions in HIV-1 protease, and apply this method to detect interactions within V3 from a sample of 1,154 HIV-1 envelope sequences. Our method greatly reduces the number of false positives due to founder effects, while capturing several higher-order interactions among V3 residues. By mapping these interactions to a structural model of the V3 loop, we find that the loop is stratified into distinct evolutionary clusters. 
We extend our model to detect interactions between the V3 and C4 domains of the HIV-1 envelope, and account for the uncertainty in mapping substitutions to the tree with a parametric bootstrap.
Published: 2007

19. Adaptation to human populations is revealed by within-host polymorphisms in HIV-1 and hepatitis C virus.

Author: Art F Y Poon, Sergei L Kosakovsky Pond, Phil Bennett, Douglas D Richman, Andrew J Leigh Brown, and Simon D W Frost
Subjects: Immunologic diseases. Allergy, RC581-607, Biology (General), QH301-705.5
Abstract: CD8(+) cytotoxic T-lymphocytes (CTLs) perform a critical role in the immune control of viral infections, including those caused by human immunodeficiency virus type 1 (HIV-1) and hepatitis C virus (HCV). As a result, genetic variation at CTL epitopes is strongly influenced by host-specific selection for either escape from the immune response, or reversion due to the replicative costs of escape mutations in the absence of CTL recognition. Under strong CTL-mediated selection, codon positions within epitopes may immediately "toggle" in response to each host, such that genetic variation in the circulating virus population is shaped by rapid adaptation to immune variation in the host population. However, this hypothesis neglects the substantial genetic variation that accumulates in virus populations within hosts. Here, we evaluate this quantity for a large number of HIV-1- (n ≥ 3,000) and HCV-infected patients (n ≥ 2,600) by screening bulk RT-PCR sequences for sequencing "mixtures" (i.e., ambiguous nucleotides), which act as site-specific markers of genetic variation within each host. We find that nonsynonymous mixtures are abundant and significantly associated with codon positions under host-specific CTL selection, which should deplete within-host variation by driving the fixation of the favored variant. Using a simple model, we demonstrate that this apparently contradictory outcome can be explained by the transmission of unfavorable variants to new hosts before they are removed by selection, which occurs more frequently when selection and transmission occur on similar time scales. Consequently, the circulating virus population is shaped by the transmission rate and the disparity in selection intensities for escape or reversion as much as it is shaped by the immune diversity of the host population, with potentially serious implications for vaccine design.
Published: 2007

20. Evolutionary interactions between N-linked glycosylation sites in the HIV-1 envelope.

Author: Art F Y Poon, Fraser I Lewis, Sergei L Kosakovsky Pond, and Simon D W Frost
Subjects: Biology (General), QH301-705.5
Abstract: The addition of asparagine (N)-linked polysaccharide chains (i.e., glycans) to the gp120 and gp41 glycoproteins of human immunodeficiency virus type 1 (HIV-1) envelope is not only required for correct protein folding, but also may provide protection against neutralizing antibodies as a "glycan shield." As a result, strong host-specific selection is frequently associated with codon positions where nonsynonymous substitutions can create or disrupt potential N-linked glycosylation sites (PNGSs). Moreover, empirical data suggest that the individual contribution of PNGSs to the neutralization sensitivity or infectivity of HIV-1 may be critically dependent on the presence or absence of other PNGSs in the envelope sequence. Here we evaluate how glycan-glycan interactions have shaped the evolution of HIV-1 envelope sequences by analyzing the distribution of PNGSs in a large-sequence alignment. Using a "covarion"-type phylogenetic model, we find that the rates at which individual PNGSs are gained or lost vary significantly over time, suggesting that the selective advantage of having a PNGS may depend on the presence or absence of other PNGSs in the sequence. Consequently, we identify specific interactions between PNGSs in the alignment using a new paired-character phylogenetic model of evolution, and a Bayesian graphical model. Despite the fundamental differences between these two methods, several interactions are jointly identified by both. Mapping these interactions onto a structural model of HIV-1 gp120 reveals that negative (exclusive) interactions occur significantly more often between colocalized glycans, while positive (inclusive) interactions are restricted to more distant glycans. Our results imply that the adaptive repertoire of alternative configurations in the HIV-1 glycan shield is limited by functional interactions between the N-linked glycans. 
This represents a potential vulnerability of rapidly evolving HIV-1 populations that may provide useful glycan-based targets for neutralizing antibodies.
Published: 2007

21. Early Warning Measurement of SARS-CoV-2 Variants of Concern in Wastewaters by Mass Spectrometry

Author: Jiaxi Peng, Jianxian Sun, Minqing Ivy Yang, Richard M. Gibson, Eric J. Arts, Abayomi S. Olabode, Art F. Y. Poon, Xianyao Wang, Aaron R. Wheeler, Elizabeth A. Edwards, and Hui Peng
Subjects: Ecology, Health, Toxicology and Mutagenesis, Environmental Chemistry, Pollution, Waste Management and Disposal, Water Science and Technology
Published: 2022

22. Temporary increase in circulating replication-competent latent HIV-infected resting CD4+ T cells after switch to an integrase inhibitor based antiretroviral regimen

Author: Roux-Cil Ferreira, Steven J. Reynolds, Adam A. Capoferri, Owen Baker, Erin E. Brown, Ethan Klock, Jernelle Miller, Jun Lai, Sharada Saraf, Charles Kirby, Briana Lynch, Jada Hackman, Sarah N. Gowanlock, Stephen Tomusange, Samiri Jamiru, Aggrey Anok, Taddeo Kityamuweesi, Paul Buule, Daniel Bruno, Craig Martens, Rebecca Rose, Susanna L. Lamers, Ronald M. Galiwango, Art F. Y. Poon, Thomas C. Quinn, Jessica L. Prodger, and Andrew D. Redd
Subjects: Article
Abstract: The principal barrier to an HIV cure is the presence of a latent viral reservoir (LVR) made up primarily of latently infected resting CD4+ (rCD4) T-cells. Studies in the United States have shown that the LVR decays slowly (half-life=3.8 years), but this rate in African populations has been understudied. This study examined longitudinal changes in the inducible replication competent LVR (RC-LVR) of ART-suppressed Ugandans living with HIV (n=88) from 2015-2020 using the quantitative viral outgrowth assay, which measures infectious units per million (IUPM) rCD4 T-cells. In addition, outgrowth viruses were examined with site-directed next-generation sequencing to assess for possible ongoing viral evolution. During the study period (2018-19), Uganda instituted a nationwide rollout of first-line ART consisting of Dolutegravir (DTG) with two NRTI, which replaced the previous regimen that consisted of one NNRTI and the same two NRTI. Changes in the RC-LVR were analyzed using two versions of a novel Bayesian model that estimated the decay rate over time on ART as a single, linear rate (model A) or allowing for an inflection at time of DTG initiation (model B). Model A estimated the population-level slope of RC-LVR change as a non-significant positive increase. This positive slope was due to a temporary increase in the RC-LVR that occurred 0-12 months post-DTG initiation (p
Author Summary: HIV is a largely incurable infection despite the use of highly successful antiretroviral drugs (ARV) due to the presence of a population of long-living resting CD4+ T cells, which can harbor a complete copy of the virus integrated into the host cell’s DNA. We examined changes in the levels of these cells, referred to as the latent viral reservoir, in a group of ARV-treated Ugandans living with HIV. 
During this examination, Uganda authorities switched the backbone drug used in ARV regimens to a different class of drug that blocks the ability of the virus to integrate into the cell’s DNA. We found that for approximately a year after this switch to the new drug, there was a temporary spike in the size of the latent viral reservoir despite the new drug continuing to completely suppress viral replication with no apparent adverse clinical effects.
Published: 2023

23. Clustering Highly Divergent Homologous Proteins: An Alignment‐Free Method

Author: Laura Muñoz‐Baena and Art F. Y. Poon
Subjects: Medical Laboratory Technology, General Immunology and Microbiology, General Neuroscience, Health Informatics, General Pharmacology, Toxicology and Pharmaceutics, General Biochemistry, Genetics and Molecular Biology
Published: 2023

24. From components to communities: bringing network science to clustering for molecular epidemiology

Author: Molly Liu, Connor Chato, and Art F Y Poon
Subjects: Virology, Microbiology
Abstract: Defining clusters of epidemiologically related infections is a common problem in the surveillance of infectious disease. A popular method for generating clusters is pairwise distance clustering, which assigns pairs of sequences to the same cluster if their genetic distance falls below some threshold. The result is often represented as a network or graph of nodes. A connected component is a set of interconnected nodes in a graph that are not connected to any other node. The prevailing approach to pairwise clustering is to map clusters to the connected components of the graph on a one-to-one basis. We propose that this definition of clusters is unnecessarily rigid. For instance, the connected components can collapse into one cluster by the addition of a single sequence that bridges nodes in the respective components. Moreover, the distance thresholds typically used for viruses like HIV-1 tend to exclude a large proportion of new sequences, making it difficult to train models for predicting cluster growth. These issues may be resolved by revisiting how we define clusters from genetic distances. Community detection is a promising class of clustering methods from the field of network science. A community is a set of nodes that are more densely inter-connected relative to the number of their connections to external nodes. Thus, a connected component may be partitioned into two or more communities. Here we describe community detection methods in the context of genetic clustering for epidemiology, demonstrate how a popular method (Markov clustering) enables us to resolve variation in transmission rates within a giant connected component of HIV-1 sequences, and identify current challenges and directions for further work.
Published: 2023
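
The pairwise distance clustering described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the Hamming-style distance, the toy sequences, the threshold of 0.6, and the function names are all assumptions chosen for illustration. It shows how clusters map to connected components, and how one bridging sequence can collapse two components into one.

```python
from itertools import combinations

def hamming(a, b):
    """Proportion of differing sites between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def connected_components(seqs, threshold):
    """Link pairs whose distance is below threshold; return the components."""
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if hamming(seqs[a], seqs[b]) < threshold:
            parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=sorted)

# Toy data (hypothetical): two well-separated pairs of sequences.
seqs = {
    "A": "AAAAAAAAAA",
    "B": "AAAAAAAAAT",
    "C": "TTTTTTTTTT",
    "D": "TTTTTTTTTA",
}
before = connected_components(seqs, threshold=0.6)

# Adding one intermediate sequence bridges the two components.
bridged = dict(seqs, E="AAAAATTTTT")
after = connected_components(bridged, threshold=0.6)
```

In this toy case `before` holds two components and `after` holds a single one, which is the rigidity that community detection is meant to relax.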

25. bayroot: Bayesian sampling of HIV-1 integration dates by root-to-tip regression

Author: Roux-Cil Ferreira, Emmanuel Wong, and Art F Y Poon
Subjects: Virology, Microbiology
Abstract: The composition of the latent human immunodeficiency virus 1 (HIV-1) reservoir is shaped by when proviruses integrated into host genomes. These integration dates can be estimated by phylogenetic methods like root-to-tip (RTT) regression. However, RTT does not accommodate variation in the number of mutations over time, uncertainty in estimating the molecular clock, or the position of the root in the tree. To address these limitations, we implemented a Bayesian extension of RTT as an R package (bayroot), which enables the user to incorporate prior information about the time of infection and start of antiretroviral therapy. Taking an unrooted maximum likelihood tree as input, we use a Metropolis–Hastings algorithm to sample from the joint posterior distribution of three parameters (the rate of sequence evolution, i.e., molecular clock; the location of the root; and the time associated with the root). Next, we apply rejection sampling to this posterior sample of model parameters to simulate integration dates for HIV proviral sequences. To validate this method, we use the R package treeswithintrees (twt) to simulate time-scaled trees relating samples of actively and latently infected T cells from a single host. We find that bayroot yields significantly more accurate estimates of integration dates than conventional RTT under a range of model settings.
Published: 2022
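
For orientation, conventional root-to-tip (RTT) regression, the baseline that the abstract above extends, can be sketched as an ordinary least-squares fit of divergence on sampling time. This is not the bayroot package or its API: the function names and toy numbers are hypothetical, and the sketch assumes a rooted tree whose root-to-tip distances have already been measured.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_date(divergence, slope, intercept):
    """Invert the fitted clock to date a sequence from its divergence."""
    return (divergence - intercept) / slope

# Toy data (hypothetical): sampling times in years since baseline and
# root-to-tip distances in expected substitutions per site.
times = [0.0, 1.0, 2.0, 3.0]
dists = [0.010, 0.020, 0.030, 0.040]

slope, intercept = fit_line(times, dists)           # clock rate and y-intercept
root_date = estimate_date(0.0, slope, intercept)    # x-intercept dates the root
integration_date = estimate_date(0.025, slope, intercept)
```

The point estimate from this inversion carries no uncertainty about the clock rate or root position, which are the limitations the abstract says the Bayesian extension addresses.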

26. HexSE: Simulating evolution in overlapping reading frames

Author: Laura Muñoz-Baena, Kaitlyn E Wade, and Art F Y Poon
Subjects: Virology, Microbiology
Abstract: Motivation: Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may provide a mechanism to increase the information content of compact genomes. The presence of overlapping reading frames (OvRFs) can skew estimates of selection based on the rates of non-synonymous and synonymous substitutions, since a substitution that is synonymous in one reading frame may be non-synonymous in another, and vice versa. Results: To understand the impact of OvRFs on molecular evolution, we implemented a versatile simulation model of nucleotide sequence evolution along a phylogeny with any distribution of open reading frames in linear or circular genomes. We use a custom data structure to track the substitution rates at every nucleotide site, which is determined by the stationary nucleotide frequencies, transition bias, and the distribution of selection biases (dN/dS) in the respective reading frames. Availability and implementation: Our simulation model is implemented in the Python scripting language. All source code is released under the GNU General Public License (GPL) version 3, and is available at https://github.com/PoonLab/HexSE.
Published: 2022

27. From components to communities: bringing network science to clustering for genomic epidemiology

Author: Molly Liu, Connor Chato, and Art F. Y. Poon
Abstract: Defining clusters of epidemiologically-related infections is a common problem in the surveillance of infectious disease. A popular method for generating clusters is pairwise distance clustering, which assigns pairs of sequences to the same cluster if their genetic distance falls below some threshold. The result is often represented as a network or graph of infections. A connected component is a set of interconnected nodes in a graph that are not connected to any other node. The current approach to pairwise clustering is to map clusters to the connected components of the graph. However, the distance thresholds typically used for viruses like HIV-1 tend to yield components that exclude large numbers of infections as unconnected nodes. This is problematic for public health applications of clustering, such as tracking the growth of clusters over time. We propose that this problem can be addressed with community detection, a class of clustering methods being developed in the field of network science. A community is a set of nodes that are more densely inter-connected relative to the number of connections to external nodes. Thus, a connected component may be partitioned into two or more communities. Here we describe community detection methods in the context of genetic clustering for epidemiology, demonstrate how a popular method (Markov clustering) enables us to resolve variation in transmission rates within a giant connected component of HIV-1 sequences, and identify current challenges and directions for further work.
Published: 2022

28. Genomic epidemiology of the first two waves of SARS-CoV-2 in Canada

Author: Angela McLaughlin, Vincent Montoya, Rachel L Miller, Gideon J Mordecai, Michael Worobey, Art F Y Poon, and Jeffrey B Joy
Subjects: Ontario, General Immunology and Microbiology, SARS-CoV-2, General Neuroscience, COVID-19, Humans, Genomics, General Medicine, Phylogeny, General Biochemistry, Genetics and Molecular Biology
Abstract: Tracking the emergence and spread of SARS-CoV-2 lineages using phylogenetics has proven critical to inform the timing and stringency of COVID-19 public health interventions. We investigated the effectiveness of international travel restrictions at reducing SARS-CoV-2 importations and transmission in Canada in the first two waves of 2020 and early 2021. Maximum likelihood phylogenetic trees were used to infer viruses’ geographic origins, enabling identification of 2263 (95% confidence interval: 2159–2366) introductions, including 680 (658–703) Canadian sublineages, which are international introductions resulting in sampled Canadian descendants, and 1582 (1501–1663) singletons, introductions with no sampled descendants. Of the sublineages seeded during the first wave, 49% (46–52%) originated from the USA and were primarily introduced into Quebec (39%) and Ontario (36%), while in the second wave, the USA was still the predominant source (43%), alongside a larger contribution from India (16%) and the UK (7%). Following implementation of restrictions on the entry of foreign nationals on 21 March 2020, importations declined from 58.5 (50.4–66.5) sublineages per week to 10.3-fold (8.3–15.0) lower within 4 weeks. Despite the drastic reduction in viral importations following travel restrictions, newly seeded sublineages in summer and fall 2020 contributed to the persistence of COVID-19 cases in the second wave, highlighting the importance of sustained interventions to reduce transmission. Importations rebounded further in November, bringing newly emergent variants of concern (VOCs). 
By the end of February 2021, there had been an estimated 30 (19–41) B.1.1.7 sublineages imported into Canada, which increasingly displaced previously circulating sublineages by the end of the second wave. Although viral importations are nearly inevitable when global prevalence is high, with fewer importations there are fewer opportunities for novel variants to spark outbreaks or outcompete previously circulating lineages.
Published: 2022

29. Revisiting the recombinant history of HIV-1 group M with dynamic network community detection

Author: Abayomi S. Olabode, Garway T. Ng, Kaitlyn E. Wade, Mikhail Salnikov, Heather E. Grant, David W. Dick, and Art F. Y. Poon
Subjects: Recombination, Genetic, Multidisciplinary, HIV-1
Abstract: The prevailing abundance of full-length HIV type 1 (HIV-1) genome sequences provides an opportunity to revisit the standard model of HIV-1 group M (HIV-1/M) diversity that clusters genomes into largely nonrecombinant subtypes, which is not consistent with recent evidence of deep recombinant histories for simian immunodeficiency virus (SIV) and other HIV-1 groups. Here we develop an unsupervised nonparametric clustering approach, which does not rely on predefined nonrecombinant genomes, by adapting a community detection method developed for dynamic social network analysis. We show that this method (dynamic stochastic block model [DSBM]) attains a significantly lower mean error rate in detecting recombinant breakpoints in simulated data (quasibinomial generalized linear model (GLM), P8×10−8), compared to other reference-free recombination detection programs (genetic algorithm for recombination detection [GARD], recombination detection program 4 [RDP4], and RDP5). When this method was applied to a representative sample of n = 525 actual HIV-1 genomes, we determined k = 29 as the optimal number of DSBM clusters and used change-point detection to estimate that at least 95% of these genomes are recombinant. Further, we identified both known and undocumented recombination hotspots in the HIV-1 genome and evidence of intersubtype recombination in HIV-1 subtype reference genomes. We propose that clusters generated by DSBM can provide an informative framework for HIV-1 classification.
Published: 2022

30. Validation of a Genotype-Independent Hepatitis C Virus Near-Whole Genome Sequencing Assay

Author: Hope R. Lapointe, P. Richard Harrigan, Chanson J. Brumme, Don Kirkby, Anita Y. M. Howe, Art F. Y. Poon, Winnie Dong, Weiyan Dong, and Conan K. Woods
Subjects: Genotype, Genotyping Techniques, Hepatitis C virus, genotype-independent, Hepacivirus, resistance-associated substitutions, Biology, medicine.disease_cause, Sensitivity and Specificity, Microbiology, Article, chemistry.chemical_compound, Virology, medicine, Humans, Line Probe Assay, NS5A, NS5B, Whole genome sequencing, direct-acting antiviral agent, NS3, Whole Genome Sequencing, virus diseases, Reproducibility of Results, Viral Load, Hepatitis C, digestive system diseases, QR1-502, Infectious Diseases, chemistry, whole-genome sequencing, HCV, RNA, Viral, Viral load
Abstract: Despite the effectiveness of direct-acting antiviral agents in treating hepatitis C virus (HCV), cases of treatment failure have been associated with the emergence of resistance-associated substitutions. To better guide clinical decision-making, we developed and validated a near-whole-genome HCV genotype-independent next-generation sequencing strategy. HCV genotype 1–6 samples from direct-acting antiviral agent treatment-naïve and -treated HCV-infected individuals were included. Viral RNA was extracted using a NucliSens easyMAG and amplified using nested reverse transcription-polymerase chain reaction. Libraries were prepared using Nextera XT and sequenced on the Illumina MiSeq sequencing platform. Data were processed by an in-house pipeline (MiCall). Nucleotide consensus sequences were aligned to reference strain sequences for resistance-associated substitution identification and compared to NS3, NS5a, and NS5b sequence data obtained from a validated in-house assay optimized for HCV genotype 1. Sequencing success rates (defined as achieving > 100-fold read coverage) approaching 90% were observed for most genotypes in samples with a viral load > 5 log10 IU/mL. This genotype-independent sequencing method resulted in > 99.8% nucleotide concordance with the genotype 1-optimized method, and 100% agreement in genotype assignment with paired line probe assay-based genotypes. The assay demonstrated high intra-run repeatability and inter-run reproducibility at detecting substitutions above 2% prevalence. This study highlights the performance of a freely available laboratory and bioinformatic approach for reliable HCV genotyping and resistance-associated substitution detection regardless of genotype.
Published: 2021

31. CoVizu: Rapid analysis and visualization of the global diversity of SARS-CoV-2 genomes

Author: Art F. Y. Poon, Connor Chato, Gopi Gugan, Bonnie Lu, Kaitlyn Wade, Molly Liu, Roux-Cil Ferreira, Abayomi S Olabode, Laura Munoz Baena, and Emmanuel Wong
Subjects: Set (abstract data type), Phylogenetic tree, Phylogenetics, Virology, Lineage (evolution), Computational biology, Biology, Molecular clock, Microbiology, Genome, Distance matrices in phylogeny, Reference genome
Abstract: Phylogenetics has played a pivotal role in the genomic epidemiology of SARS-CoV-2, such as tracking the emergence and global spread of variants, and scientific communication. However, the rapid accumulation of genomic data from around the world, with over two million genomes currently available in the GISAID database, is testing the limits of standard phylogenetic methods. Here, we describe a new approach to rapidly analyze and visualize large numbers of SARS-CoV-2 genomes. Using Python, genomes are filtered for problematic sites, incomplete coverage, and excessive divergence from a strict molecular clock. All differences from the reference genome, including indels, are extracted using minimap2, and compactly stored as a set of features for each genome. For each Pango lineage (https://cov-lineages.org), we collapse genomes with identical features into ‘variants’, generate 100 bootstrap samples of the feature set union to generate weights, and compute the symmetric differences between the weighted feature sets for every pair of variants. The resulting distance matrices are used to generate neighbor-joining trees in RapidNJ and converted into a majority-rule consensus tree for the lineage. Branches with support values below 50% or mean lengths below 0.5 differences are collapsed, and tip labels on affected branches are mapped to internal nodes as directly-sampled ancestral variants. Currently, we process about million genomes in approximately nine hours on 34 cores. The resulting trees are visualized using the JavaScript framework D3.js as ‘beadplots’, in which variants are represented by horizontal line segments, annotated with beads representing samples by collection date. Variants are linked by vertical edges to represent branches in the consensus tree. These visualizations are published at https://filogeneti.ca/CoVizu. All source code was released under an MIT license at https://github.com/PoonLab/covizu.
Published: 2021
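
Two steps from the abstract above, collapsing genomes with identical feature sets into variants and computing symmetric differences between variants, can be sketched as follows. This is a simplified, unweighted illustration and not the CoVizu source code: the bootstrap weighting is omitted, and the feature encoding (type, position, value) and all names are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def collapse_variants(genomes):
    """Group genome labels by their (hashable) set of features."""
    variants = defaultdict(list)
    for name, features in genomes.items():
        variants[frozenset(features)].append(name)
    return variants

def symmetric_distance(a, b):
    """Number of features present in exactly one of the two sets."""
    return len(a ^ b)

# Hypothetical feature sets: ("~", pos, base) marks a substitution and
# ("-", pos, length) marks a deletion relative to the reference genome.
genomes = {
    "g1": {("~", 240, "T"), ("~", 3036, "T")},
    "g2": {("~", 240, "T"), ("~", 3036, "T")},
    "g3": {("~", 240, "T"), ("-", 21764, 6)},
}

variants = collapse_variants(genomes)

# Pairwise distances between variants, keyed by their member genomes.
distances = {
    (tuple(variants[a]), tuple(variants[b])): symmetric_distance(a, b)
    for a, b in combinations(variants, 2)
}
```

Here g1 and g2 collapse into one variant, and its distance to the g3 variant counts the two features they do not share.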

32. Using networks to analyze and visualize the distribution of overlapping reading frames in virus genomes

Author: Art F. Y. Poon and Laura Muñoz-Baena
Subjects: Negative selection, Computer science, Reading (process), media_common.quotation_subject, Reading frame, Frame (artificial intelligence), Adjacency list, Computational biology, Gene, Genome, Frameshift mutation, media_common
Abstract: Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated reading frames in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in ds-DNA viruses. However, the longest overlaps involve no shift in reading frame (+0), increasing the selective burden of the same nucleotide positions within codons, instead of exposing additional sites to purifying selection. Next, we develop a new graph-based representation of the distribution of OvRFs among the reading frames of genomes in a given virus family. In the absence of an unambiguous partition of reading frames by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent reading frames are adjacent in one or more genomes, and (2) that the reading frames overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.
Published: 2021

33. HyPhy 2.5—A Customizable Platform for Evolutionary Hypothesis Testing Using Phylogenies

Author: Brittany Rife Magalis, Stephanie J. Spielman, Ben Murrell, Sadie R Wisotsky, Simon D. W. Frost, Dave Bouvier, Ryan Velazquez, Art F. Y. Poon, Spencer V. Muse, N. Lance Hepler, Sergei L Kosakovsky Pond, Stephen D. Shank, Anton Nekrutenko, and Steven Weaver
Subjects: Natural selection, Estimation theory, business.industry, Stability (learning theory), Usability, Biology, Machine learning, computer.software_genre, Resources, Backward compatibility, Genetic Techniques, Genetics, Statistical inference, Artificial intelligence, business, Molecular Biology, computer, Phylogeny, Software, Ecology, Evolution, Behavior and Systematics, Coevolution, Statistical hypothesis testing
Abstract: HYpothesis testing using PHYlogenies (HyPhy) is a scriptable, open-source package for fitting a broad range of evolutionary models to multiple sequence alignments, and for conducting subsequent parameter estimation and hypothesis testing, primarily in the maximum likelihood statistical framework. It has become a popular choice for characterizing various aspects of the evolutionary process: natural selection, evolutionary rates, recombination, and coevolution. The 2.5 release (available from www.hyphy.org) includes a completely re-engineered computational core and analysis library that introduces new classes of evolutionary models and statistical tests, delivers substantial performance and stability enhancements, improves usability, streamlines end-to-end analysis workflows, makes it easier to develop custom analyses, and is mostly backward compatible with previous HyPhy releases.
Published: 2019

34. Tree shape‐based approaches for the comparative study of cophylogeny

Author: Mathias S. Renaud, Yiying He, Garway T Ng, Art F. Y. Poon, Bradley R Jones, and Mariano Avino
Subjects: 0106 biological sciences, Computer science, cophylogeny, 010603 evolutionary biology, 01 natural sciences, Distance measures, 03 medical and health sciences, Congruence (geometry), lcsh:QH540-549.5, Cluster analysis, Extreme value theory, Ecology, Evolution, Behavior and Systematics, Coevolution, 030304 developmental biology, Nature and Landscape Conservation, Original Research, tree shape, 0303 health sciences, Ecology, Phylogenetic tree, business.industry, Pattern recognition, tree measures, Cospeciation, kernel, coevolution, host switching, Artificial intelligence, lcsh:Ecology, Approximate Bayesian computation, business
Abstract: Cophylogeny is the congruence of phylogenetic relationships between two different groups of organisms due to their long‐term interaction. We investigated the use of tree shape distance measures to quantify the degree of cophylogeny. We implemented a reverse‐time simulation model of pathogen phylogenies within a fixed host tree, given cospeciation probability, host switching, and pathogen speciation rates. We used this model to evaluate 18 distance measures between host and pathogen trees including two kernel distances that we developed for labeled and unlabeled trees, which use branch lengths and accommodate different size trees. Finally, we used these measures to revisit published cophylogenetic studies, where authors described the observed associations as representing a high or low degree of cophylogeny. Our simulations demonstrated that some measures are more informative than others with respect to specific coevolution parameters especially when these did not assume extreme values. For real datasets, trees’ associations projection revealed clustering of high concordance studies suggesting that investigators are describing it in a consistent way. Our results support the hypothesis that measures can be useful for quantifying cophylogeny. This motivates their usage in the field of coevolution and supports the development of simulation‐based methods, i.e., approximate Bayesian computation, to estimate the underlying coevolutionary parameters.
Published: 2019

35. A systematic, deep sequencing-based methodology for identification of mixed-genotype hepatitis C virus infections

Author: P. Richard Harrigan, Chanson J. Brumme, Jason Grebely, Gail V. Matthews, Vincent Montoya, Thuy Nguyen, Tanya L. Applegate, Gregory J. Dore, Art F. Y. Poon, Marianne Martinello, Celia K. Chui, Vera Tai, Andrea D. Olmstead, Winnie Dong, Jeffrey B. Joy, and Anita Y. M. Howe
Subjects: 0301 basic medicine, Microbiology (medical), Genes, Viral, Genotype, Sequence analysis, Hepatitis C virus, 030106 microbiology, Genome, Viral, Hepacivirus, Biology, medicine.disease_cause, Microbiology, Deep sequencing, 03 medical and health sciences, Genetics, medicine, Humans, Molecular Biology, Phylogeny, Ecology, Evolution, Behavior and Systematics, Receiver operating characteristic, Coinfection, Computational Biology, High-Throughput Nucleotide Sequencing, Genomics, Amplicon, medicine.disease, Hepatitis C, Virology, 030104 developmental biology, Infectious Diseases, ROC Curve, Cohort, RNA, Viral
Abstract: Hepatitis C virus (HCV) mixed genotype infections can affect treatment outcomes and may have implications for vaccine design and disease progression. Previous studies demonstrate 0–39% of high-risk, HCV-infected individuals harbor mixed genotypes; however, standardized, sensitive methods of detection are lacking. This study compared PCR amplicon, random primer (RP), and probe enrichment (PE)-based deep sequencing methods coupled with a custom sequence analysis pipeline to detect multiple HCV genotypes. Mixed infection cutoff values, based on HCV read depth and coverage, were identified using receiver operating characteristic curve analysis. The methodology was validated using artificially mixed genotype samples and then applied to two clinical trials of HCV treatment in high-risk individuals (ACTIVATE, 114 samples from 90 individuals; DARE-C II, 26 samples from 18 individuals) and a cohort of HIV/HCV co-infected individuals (Canadian Coinfection Cohort (CCC), 3 samples from 2 individuals with suspected mixed genotype infections). Amplification bias of genotype (G)1b, G2, G3 and G5 was observed in artificially mixed samples using the PCR method while no genotype bias was observed using RP and PE. RP and PE sequencing of 140 ACTIVATE and DARE-C II samples identified the following primary genotypes: 15% (n = 21) G1a, 76% (n = 106) G3, and 9% (n = 13) G2. Sequencing of ACTIVATE and DARE-C II demonstrated, on average, 2% and 1% of HCV reads mapping to a second genotype using RP and PE, respectively; however, none passed the mixed infection cutoff criteria and phylogenetics confirmed no mixed infections. From CCC, one mixed infection was confirmed while the other was determined to be a recombinant genotype. This study underlines the risk for false identification of mixed HCV infections and stresses the need for standardized methods to improve prevalence estimates and to understand the impact of mixed infections for management and elimination of HCV.
Published: 2019

36. Early and ongoing importations of SARS-CoV-2 in Canada

Author: Art F. Y. Poon, Montoya, Angela McLaughlin, Jeffrey B. Joy, Rachel L Miller, Gideon J. Mordecai, and Michael Worobey
Subjects: Transmission (mechanics), Geography, Coronavirus disease 2019 (COVID-19), law, Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), Public health interventions, Quarantine, Pandemic, Context (language use), Socioeconomics, law.invention, Sampling bias
Abstract: Tracking the emergence and spread of SARS-CoV-2 is critical to inform public health interventions. Phylodynamic analyses have quantified SARS-CoV-2 migration on global and local scales [1–5], yet they have not been applied to determine transmission dynamics in Canada. We quantified SARS-CoV-2 migration into, within, and out of Canada in the context of COVID-19 travel restrictions. To minimize sampling bias, global sequences were subsampled with probabilities corrected for their countries’ monthly contribution to global new diagnoses. A time-scaled maximum likelihood tree was used to estimate most likely ancestral geographic locations (country or Canadian province), enabling identification of sublineages, defined as introduction events into Canada resulting in domestic transmission. Of 402 Canadian sublineages identified, the majority likely originated from the USA (54%), followed by Russia (7%), India (6%), Italy (6%), and the UK (5%). International introductions were mostly into Ontario (39%) and Quebec (38%). Among Pango lineages [6], B.1 was imported at least 191 separate times from 11 different countries. Introduction rates peaked in late March then diminished but were not eliminated following national interventions including restrictions on non-essential travel. We further identified 1,380 singleton importations, international importations that did not result in further sampled transmission, whereby representation of lineages and location were comparable to sublineages. Although proportion of international transmission decreased over time, this coincided with exponential growth of within-province transmission – in fact, total number of sampled transmission events from international or interprovincial sources increased from winter 2020 into spring 2020 in many provinces. Ontario, Quebec, and British Columbia acted as sources of transmission more than recipients, within the caveat of higher sequence representation.
We present strong evidence that international introductions and interprovincial transmission of SARS-CoV-2 contributed to the Canadian COVID-19 burden throughout 2020, despite initial reductions mediated by travel restrictions in 2020. More stringent border controls and quarantine measures may have curtailed introductions of SARS-CoV-2 into Canada and may still be warranted. Significance Statement: By analyzing SARS-CoV-2 genomes from Canada in the context of the global pandemic, we illuminate the extent to which the COVID-19 burden in Canada was perpetuated by ongoing international importations and interprovincial transmission throughout 2020. Although travel restrictions enacted in March 2020 reduced the importation rate and proportion of transmission from abroad across all Canadian provinces, SARS-CoV-2 introductions from the USA, India, Russia, and other nations were detectable through the summer and fall of 2020.
Published: 2021

37. Revisiting the recombinant history of HIV-1 group M with dynamic network community detection

Author: Kaitlyn Wade, Garway T Ng, Abayomi S Olabode, David W Dick, Art F. Y. Poon, and Mikhail Salnikov
Subjects: Dynamic network analysis, Breakpoint, Human immunodeficiency virus (HIV), Computational biology, Biology, medicine.disease_cause, Genome, law.invention, law, medicine, Recombinant DNA, Cluster analysis, Recombination, Change detection
Abstract: A new abundance of full-length HIV-1 genome sequences provides an opportunity to revisit the standard model of HIV-1/M diversity that clusters genomes into largely non-recombinant subtypes, which is not consistent with recent evidence of deep recombinant histories for SIV and other HIV-1 groups. Here we develop an unsupervised non-parametric clustering approach, which does not rely on predefined non-recombinant genomes, by adapting a community detection method developed for dynamic social network analysis. We show that this method (DSBM) attains a significantly lower mean error rate in detecting recombinant breakpoints in simulated data (quasibinomial GLM, P < 8 × 10−8), compared to other reference-free recombination detection programs (GARD, RDP4 and RDP5). Applied to a representative sample of n = 525 actual HIV-1 genomes, we determined k = 25 as the optimal number of DSBM clusters, and used change point detection to estimate that at least 95% of these genomes are recombinant. Further, we identified both known and novel recombination hotspots in the HIV-1 genome, and evidence of inter-subtype recombination in HIV-1 subtype reference genomes. We propose that clusters generated by DSBM can provide an informative new framework for HIV-1 classification.
Published: 2021

38. Quantifying the clonality and dynamics of the within-host HIV-1 latent reservoir

Author: Andrew D. Redd, Roux Cil Ferreira, Jessica L. Prodger, and Art F. Y. Poon
Subjects: branching processes, Population, Human immunodeficiency virus (HIV), clonality, Biology, medicine.disease_cause, Microbiology, 03 medical and health sciences, Reflections, 0302 clinical medicine, Virology, medicine, AcademicSubjects/MED00860, Dna viral, education, 030304 developmental biology, 0303 health sciences, education.field_of_study, Phylogenetic tree, Mechanism (biology), Host (biology), within-host evolution, AcademicSubjects/SCI01130, AcademicSubjects/SCI02285, Provirus, 3. Good health, Viral replication, Evolutionary biology, HIV-1 latency, 030217 neurology & neurosurgery
Abstract: Among people living with human immunodeficiency virus type 1 (HIV-1), the long-term persistence of a population of cells carrying transcriptionally silent integrated viral DNA (provirus) remains the primary barrier to developing an effective cure. Ongoing cell division via proliferation is generally considered to be the driving force behind the persistence of this latent HIV-1 reservoir. The contribution of this mechanism (clonal expansion) is supported by the observation that proviral sequences sampled from the reservoir are often identical. This outcome is quantified as the ‘clonality’ of the sample population, e.g. the fraction of provirus sequences observed more than once. However, clonality as a quantitative measure is inconsistently defined and its statistical properties are not well understood. In this Reflections article, we use mathematical and phylogenetic frameworks to formally examine the inherent problems of using clonality to characterize the dynamics and proviral composition of the reservoir. We describe how clonality is not adequate for this task due to the inherent complexity of how infected cells are ‘labeled’ by proviral sequences—the outcome of a sampling process from the evolutionary history of active viral replication before treatment—as well as variation in cell birth and death rates among lineages and over time. Lastly, we outline potential directions in statistical and phylogenetic research to address these issues.
Published: 2021

39. Addressing Ethical Challenges in US-Based HIV Phylogenetic Research

Author: Nanette Benbow, Liza Dawson, Stuart Rennie, Lucia V. Torian, Jeremy Sugarman, Brian Minalga, Patricia Sweeney, Sanjay Mehta, Stephen R. Latham, Thomas Leitner, Faith E. Fletcher, Amy Killelea, Art F. Y. Poon, Susan J. Little, Omar Martinez, Seble Kassaye, Joel O. Wertheim, and Lisa M. Lee
Subjects: Pediatric AIDS, Biomedical Research, HIV Infections, Medical and Health Sciences, 8.3 Policy, 0302 clinical medicine, Public health surveillance, Immunology and Allergy, Public Health Surveillance, 030212 general & internal medicine, Aetiology, Phylogeny, Pediatric, Community engagement, public health, 06 humanities and the arts, Public relations, Biological Sciences, phylogenetics, Scholarship, Infectious Diseases, Research Design, HIV/AIDS, Psychology, Infection, Confidentiality, Health and social care services research, medicine.medical_specialty, Advisory Committees, Data security, 0603 philosophy, ethics and religion, Microbiology, and research governance, 03 medical and health sciences, Major Articles and Brief Reports, Acquired immunodeficiency syndrome (AIDS), Clinical Research, medicine, Humans, Computer Security, Ethical code, Acquired Immunodeficiency Syndrome, business.industry, Diagnostic Tests, Routine, Information Dissemination, Public health, Prevention, Community Participation, HIV, medicine.disease, ethics, United States, Clinical trial, Good Health and Well Being, National Institutes of Health (U.S.), 060301 applied ethics, business, 2.4 Surveillance and distribution
Abstract: In recent years, phylogenetic analysis of HIV sequence data has been used in research studies to investigate transmission patterns between individuals and groups, including analysis of data from HIV prevention clinical trials, in molecular epidemiology, and in public health surveillance programs. Phylogenetic analysis can provide valuable information to inform HIV prevention efforts, but it also has risks, including stigma and marginalization of groups, or potential identification of HIV transmission between individuals. In response to these concerns, an interdisciplinary working group was assembled to address ethical challenges in US-based HIV phylogenetic research. The working group developed recommendations regarding (1) study design; (2) data security, access, and sharing; (3) legal issues; (4) community engagement; and (5) communication and dissemination. The working group also identified areas for future research and scholarship to promote ethical conduct of HIV phylogenetic research.
Published: 2020

40. Tuning intrinsic disorder predictors for virus proteins

Author: Gal Almog, Abayomi S Olabode, and Art F. Y. Poon
Subjects: 0206 medical engineering, Context (language use), 02 engineering and technology, Computational biology, Biology, Intrinsically disordered proteins, ensemble classifier, Microbiology, Genome, Cross-validation, 03 medical and health sciences, Protein structure, Virology, AcademicSubjects/MED00860, 030304 developmental biology, protein disorder prediction, virus proteins, 0303 health sciences, 030302 biochemistry & molecular biology, AcademicSubjects/SCI01130, AcademicSubjects/SCI02285, Matthews correlation coefficient, Random forest, Open reading frame, machine learning, Viral replication, intrinsically disordered proteins, 020602 bioinformatics, Research Article
Abstract: Many virus-encoded proteins have intrinsically disordered regions that lack a stable, folded three-dimensional structure. These disordered proteins often play important functional roles in virus replication, such as down-regulating host defense mechanisms. With the widespread availability of next-generation sequencing, the number of new virus genomes with predicted open reading frames is rapidly outpacing our capacity for directly characterizing protein structures through crystallography. Hence, computational methods for structural prediction play an important role. A large number of predictors focus on the problem of classifying residues into ordered and disordered regions, and these methods tend to be validated on a diverse training set of proteins from eukaryotes, prokaryotes, and viruses. In this study, we investigate whether some predictors outperform others in the context of virus proteins and compared our findings with data from non-viral proteins. We evaluate the prediction accuracy of 21 methods, many of which are only available as web applications, on a curated set of 126 proteins encoded by viruses. Furthermore, we apply a random forest classifier to these predictor outputs. Based on cross-validation experiments, this ensemble approach confers a substantial improvement in accuracy, e.g., a mean 36 per cent gain in Matthews correlation coefficient. Lastly, we apply the random forest predictor to severe acute respiratory syndrome coronavirus 2 ORF6, an accessory gene that encodes a short (61 AA) and moderately disordered protein that inhibits the host innate immune response. We show that disorder prediction methods perform differently for viral and non-viral proteins, and that an ensemble approach can yield more robust and accurate predictions.
Published: 2020

41. A targeted reactivation of latent HIV-1 using an activator vector in patient samples from acute infection

Author: Deborah King, Chanuka N. Wijewardhana, Caroline Foster, Joshua Pankrac, Katja Klein, Rahul Pawa, Jamie F.S. Mann, Eric J. Arts, Robin J. Shattock, Art F. Y. Poon, David H. Canaday, Richard M. Gibson, Paul F. McKay, Yong Gao, J Meyerowitz, Sarah Fidler, Mariano Avino, Imperial College Healthcare NHS Trust- BRC Funding, Medical Research Council (MRC), British HIV Association (BHIVA), and American Foundation for AIDS Research
Subjects: 0301 basic medicine, HIV-1 Latency, Antigenicity, medicine.medical_treatment, Activator vector (ACT-VEC), General Biochemistry, Genetics and Molecular Biology, Virus, 1117 Public Health and Health Services, Transcriptional reactivation, 03 medical and health sciences, 0302 clinical medicine, Antigen, Medicine, Latency (engineering), business.industry, ELISPOT, 1103 Clinical Sciences, General Medicine, Immunotherapy, Provirus, 3. Good health, HIV-1 Cure, 030104 developmental biology, Virus-Like Particles, 030220 oncology & carcinogenesis, Immunology, business, Ex vivo, Research Paper
Abstract: Background During combined anti-retroviral treatment, a latent HIV reservoir persists within resting memory CD4 T cells that initiates viral recrudescence upon treatment interruption. Strategies for HIV-1 cure have largely focused on latency reversing agents (LRAs) capable of reactivating and eliminating this viral reservoir. Previously investigated LRAs have largely failed to achieve a robust latency reversal sufficient for reduction of latent HIV pool or the potential of virus-free remission in the absence of treatment. Methods We utilize a polyvalent virus-like particle (VLP) formulation called Activator Vector (ACT-VEC) to ‘shock’ provirus into transcriptional activity. Ex vivo co-culture experiments were used to evaluate the efficacy of ACT-VEC in relation to other LRAs in individuals diagnosed and treated during the acute stage of infection. IFN-γ ELISpot, qRT-PCR and Illumina MiSeq were used to evaluate antigenicity, latency reversal, and diversity of induced virus respectively. Findings Using samples from HIV+ patients diagnosed and treated at acute/early infection, we demonstrate that ACT-VEC can reverse latency in HIV infected CD4 T cells to a greater extent than other major recall antigens as stimuli or even mitogens such as PMA/Iono. Furthermore, ACT-VEC activates more latent HIV-1 than clinically tested HDAC inhibitors or protein kinase C agonists. Interpretation Taken together, these results show that ACT-VEC can induce HIV reactivation from latently infected CD4 T cells collected from participants on first line combined antiretroviral therapy for at least two years after being diagnosed and treated at acute/early stage of infection. These findings could provide guidance to possible targeted cure strategies and treatments. Funding NIH and CIHR
Published: 2020

42. Public health in genetic spaces: a statistical framework to optimize cluster-based outbreak detection

Author: Marcia L. Kalish, Connor Chato, and Art F. Y. Poon
Subjects: Computer science, Population, HIV prevention, Feature selection, Context (language use), modifiable areal unit problem, Microbiology, molecular epidemiology, genetic clustering, 03 medical and health sciences, 0302 clinical medicine, Virology, Covariate, Statistics, 030212 general & internal medicine, Cluster analysis, education, Selection (genetic algorithm), 030304 developmental biology, Mathematics, virus evolution, 0303 health sciences, education.field_of_study, Null model, 3. Good health, Modifiable areal unit problem, Pairwise comparison, Sample collection, Akaike information criterion, Research Article
Abstract: In infectious disease epidemiology, clustering cases of infection in space and time is a standard method to identify and characterize outbreaks. Clustering cases by genetic similarity is analogous to spatial clustering, and may be more effective for pathogens transmitted at a relatively low rate by intimate contact. However, the statistical properties of genetic clustering in the context of outbreak detection are not well characterized and cluster-defining criteria are generally set to arbitrary values. We describe a new method to optimize the predictive value of a clustering method by optimizing its parameters to maximize the difference in the Akaike information criterion (AIC) between individual-weighted and null models of cluster growth. This approach mirrors solutions to the modifiable areal unit problem (MAUP): the statistical association between covariates and an outcome is contingent on how their spatial distribution is partitioned into units of observation. To evaluate our method, we analyzed the distributions of pairwise Tamura-Nei (TN93) genetic distances from two published sets of anonymized HIV-1 subtype B pol sequence data stratified by collection year. We generated 46 different graphs by varying the pairwise threshold, where an edge in a graph indicates that the TN93 distance between the respective cases is below the corresponding threshold. For each graph, we generated predictions of cluster growth (numbers of new cases with edges to clusters of known cases) under two different Poisson regression models: a null model in which growth is only proportional to cluster size (i.e., no variation among individuals); and a weighted model where the variation associated with individual-level covariates is summed by cluster. Next, we calculated the AIC for each model on the distributions of observed cluster growth in two published HIV-1 pol data sets from Seattle, USA (n = 1,653) and Alberta, Canada (n = 809).
Based on the difference in AICs, we obtained different optimized TN93 thresholds for these data sets (0.014 and 0.011, respectively). We show that selection of this threshold parameter can substantially limit the utility of genetic clusters for public health, and that the optimal parameter for one population can misdirect prevention efforts in another. This statistical framework can potentially be used to optimize any clustering method, and to evaluate it against other methods including those that do not use genetic information.
Published: 2020

43. Accumulation of integrase strand transfer inhibitor resistance mutations confers high-level resistance to dolutegravir in non-B subtype HIV-1 strains from patients failing raltegravir in Uganda

Author: Cissy Kityo, Yue Li, Paul S Reyes, Art F. Y. Poon, Fred Kyeyune, Adam Meadows, Abayomi S Olabode, Miguel E. Quiñones-Mateu, Eric J. Arts, Mariano Avino, Immaculate Nankya, Emmanuel Ndashimye, Christine Tan, and Richard M. Gibson
Subjects: Microbiology (medical), Pyridones, Human immunodeficiency virus (HIV), Integrase inhibitor, HIV Infections, Drug resistance, HIV Integrase, medicine.disease_cause, Virus, Piperazines, chemistry.chemical_compound, Raltegravir Potassium, Drug Resistance, Viral, Oxazines, medicine, Humans, Pharmacology (medical), Uganda, HIV Integrase Inhibitors, Original Research, Pharmacology, biology, Elvitegravir, business.industry, Raltegravir, Virology, Integrase, Infectious Diseases, chemistry, Dolutegravir, Mutation, biology.protein, HIV-1, business, Heterocyclic Compounds, 3-Ring, medicine.drug
Abstract: Background Increasing first-line treatment failures in low- and middle-income countries (LMICs) have led to increased use of integrase strand transfer inhibitors (INSTIs) such as dolutegravir. However, HIV-1 susceptibility to INSTIs in LMICs, especially with previous raltegravir exposure, is poorly understood due to infrequent reporting of INSTI failures and testing for INSTI drug resistance mutations (DRMs). Methods A total of 51 non-subtype B HIV-1 infected patients failing third-line (raltegravir-based) therapy in Uganda were initially selected for the study. DRMs were detected using Sanger and deep sequencing. HIV integrase genes of 13 patients were cloned and replication capacities (RCs) and phenotypic susceptibilities to dolutegravir, raltegravir and elvitegravir were determined with TZM-bl cells. Spearman’s correlation coefficient was used to determine cross-resistance between INSTIs. Results INSTI DRMs were detected in 47% of patients. HIV integrase-recombinant virus carrying one primary INSTI DRM (N155H or Y143R/S) was susceptible to dolutegravir but highly resistant to raltegravir and elvitegravir (>50-fold change). Two patients, one with E138A/G140A/Q148R/G163R and one with E138K/G140A/S147G/Q148K, displayed the highest reported resistance to raltegravir, elvitegravir and even dolutegravir. The former multi-DRM virus had WT RC whereas the latter had lower RCs than WT. Conclusions In HIV-1 subtype A- and D-infected patients failing raltegravir and harbouring INSTI DRMs, there is high-level resistance to elvitegravir and raltegravir. More routine monitoring of INSTI treatment may be advised in LMICs, considering that multiple INSTI DRMs may have accumulated during prolonged exposure to raltegravir during virological failure, leading to high-level INSTI resistance, including dolutegravir resistance.
Published: 2020

44. Genetic Diversity, Compartmentalization, and Age of HIV Proviruses Persisting in CD4 + T Cell Subsets during Long-Term Combination Antiretroviral Therapy

Author: Marianne Harris, Mark A. Brockman, Rémi Fromentin, Zabrina L. Brumme, Bradley R Jones, Art F. Y. Poon, Hanwei Sudderuddin, Natalie N. Kinloch, Aniqa Shahid, Rachel L Miller, Olivia Tsai, Chanson J. Brumme, Bruce Ganase, Jeffrey B. Joy, Nicolas Chomont, and Hawley Rigsby
Subjects: Cart, 0303 health sciences, Genetic diversity, Phylogenetic tree, Effector, T cell, Immunology, Biology, Microbiology, Virology, Genome, 3. Good health, 03 medical and health sciences, Phylogenetic diversity, 0302 clinical medicine, medicine.anatomical_structure, Insect Science, medicine, 030217 neurology & neurosurgery, 030304 developmental biology, Subgenomic mRNA
Abstract: The HIV reservoir, which comprises diverse proviruses integrated into the genomes of infected, primarily CD4+ T cells, is the main barrier to developing an effective HIV cure. Our understanding of the genetics and dynamics of proviruses persisting within distinct CD4+ T cell subsets, however, remains incomplete. Using single-genome amplification, we characterized subgenomic proviral sequences (nef region) from naive, central memory, transitional memory, and effector memory CD4+ T cells from five HIV-infected individuals on long-term combination antiretroviral therapy (cART) and compared these to HIV RNA sequences isolated longitudinally from archived plasma collected prior to cART initiation, yielding HIV data sets spanning a median of 19.5 years (range, 10 to 20 years) per participant. We inferred a distribution of within-host phylogenies for each participant, from which we characterized proviral ages, phylogenetic diversity, and genetic compartmentalization between CD4+ T cell subsets. While three of five participants exhibited some degree of proviral compartmentalization between CD4+ T cell subsets, combined analyses revealed no evidence that any particular CD4+ T cell subset harbored the longest persisting, most genetically diverse, and/or most genetically distinctive HIV reservoir. In one participant, diverse proviruses archived within naive T cells were significantly younger than those in memory subsets, while for three other participants we observed no significant differences in proviral ages between subsets. In one participant, "old" proviruses were recovered from all subsets, and included one sequence, estimated to be 21.5 years old, that dominated (>93%) their effector memory subset. 
HIV eradication strategies will need to overcome within- and between-host genetic complexity of proviral landscapes, possibly via personalized approaches. IMPORTANCE: The main barrier to HIV cure is the ability of a genetically diverse pool of proviruses, integrated into the genomes of infected CD4+ T cells, to persist despite long-term suppressive combination antiretroviral therapy (cART). CD4+ T cells, however, constitute a heterogeneous population due to their maturation across a developmental continuum, and the genetic "landscapes" of latent proviruses archived within them remain incompletely understood. We applied phylogenetic techniques, largely novel to HIV persistence research, to reconstruct within-host HIV evolutionary history and characterize proviral diversity in CD4+ T cell subsets in five individuals on long-term cART. Participants varied widely in terms of proviral burden, genetic diversity, and age distribution between CD4+ T cell subsets, revealing that proviral landscapes can differ between individuals and between infected cell types within an individual. Our findings expose each within-host latent reservoir as unique in its genetic complexity and support personalized strategies for HIV eradication.
Published: 2020

45. Quantifying the Aftermath: Recent Outbreaks Among People Who Inject Drugs and the Utility of Phylodynamics

Author: Art F. Y. Poon and Bethany L. Dearlove
Subjects: 0301 basic medicine, HIV, Outbreak, Biology, Disease Outbreaks, Major Articles and Brief Reports, 03 medical and health sciences, 030104 developmental biology, Infectious Diseases, Viral phylodynamics, Scotland, Environmental health, Humans, Immunology and Allergy, Substance Abuse, Intravenous, Phylogeny
Abstract: BACKGROUND: Harm reduction has dramatically reduced HIV incidence among people who inject drugs (PWID). In Glasgow, Scotland, 90%) is indicative of sharing of injecting equipment. Monitoring the epidemic phylogenetically in real time may accelerate public health action.
Published: 2018

46. Absence of HIV-1 Drug Resistance Mutations Supports the Use of Dolutegravir in Uganda

Author: Peter Mugyenyi, Fred Kyeyune, Eva Nabulime, Miguel E. Quiñones-Mateu, Mariano Avino, Immaculate Nankya, Richard M. Gibson, Art F. Y. Poon, Eric J. Arts, Cissy Kityo, Emmanuel Ndashimye, Global Health, Graduate School, APH - Personalized Medicine, APH - Quality of Care, and AII - Infectious diseases
Subjects: 0301 basic medicine, Male, Genotype, Pyridones, Epidemiology, 030106 microbiology, Immunology, Human immunodeficiency virus (HIV), Mutation, Missense, Integrase inhibitor, HIV Infections, Drug resistance, medicine.disease_cause, Piperazines, 03 medical and health sciences, chemistry.chemical_compound, immune system diseases, Virology, Drug Resistance, Viral, Oxazines, medicine, Humans, Uganda, HIV Integrase Inhibitors, Retrospective Studies, business.industry, virus diseases, Sequence Analysis, DNA, VIROLOGIC FAILURE, Infectious Diseases, chemistry, pol Gene Products, Human Immunodeficiency Virus, Dolutegravir, HIV-1, Female, business, Heterocyclic Compounds, 3-Ring, HIV drug resistance
Abstract: To screen for drug resistance and possible treatment with Dolutegravir (DTG) in treatment-naive patients and those experiencing virologic failure during first-, second-, and third-line combined antiretroviral therapy (cART) in Uganda. Samples from 417 patients in Uganda were analyzed for predicted drug resistance upon failing a first- (N = 158), second- (N = 121), or third-line [all 51 involving Raltegravir (RAL)] treatment regimen. HIV-1 pol gene was amplified and sequenced from plasma samples. Drug susceptibility was interpreted using the Stanford HIV database algorithm and SCUEAL was used for HIV-1 subtyping. Frequency of resistance to nucleoside reverse transcriptase inhibitors (NRTIs) (95%) and non-NRTI (NNRTI, 96%) was high in first-line treatment failures. Despite lack of NNRTI-based treatment for years, NNRTI resistance remained stable in 55% of patients failing second-line or third-line treatment, and was also at 10% in treatment-naive Ugandans. DTG resistance (n = 366) was not observed in treatment-naive individuals or individuals failing first- and second-line cART, and only found in two patients failing third-line cART, while 47% of the latter had RAL- and Elvitegravir-resistant HIV-1. Secondary mutations associated with DTG resistance were found in 2%–10% of patients failing third-line cART. Of 14 drugs currently available for cART in Uganda, resistance was readily observed to all antiretroviral drugs (except for DTG) in Ugandan patients failing first-, second-, or even third-line treatment regimens. The high NNRTI resistance in first-line treatment in Uganda even among treatment-naive patients calls for the use of DTG to reach the UNAIDS 90:90:90 goals.
Published: 2018

47. Workshop: Error correction, noise filtering, and phylogenetic analysis of HIV sequences using the 454 platform.

Author: Konrad Scheffler, N. Lance Hepler, Martin D. Smith, Wayne Delport, Art F. Y. Poon, Douglas D. Richman, and Sergei L. Kosakovsky Pond
Published: 2012

48. Potential for immune-driven viral polymorphisms to compromise antiretroviral-based preexposure prophylaxis for prevention of HIV-1 infection

Author: Gustavo Reyes-Terán, Carlos Mejía-Villatoro, Simon Mallal, Zabrina L. Brumme, Philip J. R. Goulder, Guinevere Q. Lee, Guillermo Porras-Cortés, Santiago Ávila-Ríos, Masafumi Takiguchi, Hiroyuki Gatanaga, Emily Adland, Elsa Palou, Tsunefusa Hayashida, Art F. Y. Poon, Mary Carrington, Takayuki Chikata, Juan Miguel Pascale, Kinh Van Nguyen, Rita I Meza, Shinichi Oka, Jeffrey N. Martin, Marvin Manzanero, Giang Van Tran, Mina John, and Humberto Valenzuela-Ponce
Subjects: 0301 basic medicine, HLA-B18 Antigen, Immunology, Mutation, Missense, HIV Infections, Human leukocyte antigen, Drug resistance, Biology, Global Health, Article, 03 medical and health sciences, chemistry.chemical_compound, Pre-exposure prophylaxis, 0302 clinical medicine, Immune system, Drug Resistance, Viral, Genotype, Humans, Immunology and Allergy, Missense mutation, 030212 general & internal medicine, Immune Evasion, Polymorphism, Genetic, Rilpivirine, Virology, HIV Reverse Transcriptase, Reverse transcriptase, 030104 developmental biology, Infectious Diseases, Anti-Retroviral Agents, chemistry, HIV-1, Pre-Exposure Prophylaxis
Abstract: Objective: Long-acting rilpivirine is a candidate for preexposure prophylaxis (PrEP) for prevention of HIV-1 infection. However, rilpivirine resistance mutations at reverse transcriptase codon 138 (E138X) occur naturally in a minority of HIV-1-infected persons; in particular those expressing human leukocyte antigen (HLA)-B*18 where reverse transcriptase-E138X arises as an immune escape mutation. We investigate the global prevalence, B*18-linkage and replicative cost of reverse transcriptase-E138X and its regional implications for rilpivirine PrEP. Methods: We analyzed linked reverse transcriptase-E138X/HLA data from 7772 antiretroviral-naive patients from 16 cohorts spanning five continents and five HIV-1 subtypes, alongside unlinked global reverse transcriptase-E138X and HLA frequencies from public databases. E138X-containing HIV-1 variants were assessed for in-vitro replication as a surrogate of mutation stability following transmission. Results: Reverse transcriptase-E138X variants, where the most common were rilpivirine resistance-associated mutations E138A/G/K, were significantly enriched in HLA-B*18-positive individuals globally (P = 3.5 × 10⁻²⁰) and in all HIV-1 subtypes except A. Reverse transcriptase-E138X and B*18 frequencies correlated positively in 16 cohorts with linked HIV/HLA genotypes (Spearman's R = 0.75; P = 7.6 × 10⁻⁴) and in unlinked HIV/HLA data from 43 countries (Spearman's R = 0.34, P = 0.02). Notably, reverse transcriptase-E138X frequencies approached (or exceeded) 10% in key epidemic regions (e.g. sub-Saharan Africa, Southeastern Europe) where B*18 is more common. This, along with the observation that reverse transcriptase-E138X variants do not confer in-vitro replicative costs, supports their persistence, and ongoing accumulation in circulation over time. Conclusions: Results illustrate the potential for a natural immune-driven HIV-1 polymorphism to compromise antiretroviral-based prevention, particularly in key epidemic regions. Regional reverse transcriptase-E138X surveillance should be undertaken before use of rilpivirine PrEP.
Published: 2017

49. Promises and pitfalls of Illumina sequencing for HIV resistance genotyping

Author: Art F. Y. Poon and Chanson J. Brumme
Subjects: 0301 basic medicine, Cancer Research, Genotype, Genotyping Techniques, Anti-HIV Agents, HIV Infections, Biology, Genome, DNA sequencing, Virus, 03 medical and health sciences, Virology, Drug Resistance, Viral, Humans, Genotyping, Hiv resistance, Illumina dye sequencing, HIV, High-Throughput Nucleotide Sequencing, virus diseases, Reverse transcriptase, 030104 developmental biology, Infectious Diseases, Mutation, HIV drug resistance
Abstract: Genetic sequencing ("genotyping") plays a critical role in the modern clinical management of HIV infection. This virus evolves rapidly within patients because of its error-prone reverse transcriptase and short generation time. Consequently, HIV variants with mutations that confer resistance to one or more antiretroviral drugs can emerge during sub-optimal treatment. There are now multiple HIV drug resistance interpretation algorithms that take the region of the HIV genome encoding the major drug targets as inputs; expert use of these algorithms can significantly improve clinical outcomes in HIV treatment. Next-generation sequencing has the potential to revolutionize HIV resistance genotyping by lowering the threshold at which rare but clinically significant HIV variants can be detected reproducibly, and by conferring improved cost-effectiveness in high-throughput scenarios. In this review, we discuss the relative merits and challenges of deploying the Illumina MiSeq instrument for clinical HIV genotyping.
Published: 2017

50. Genetic Diversity, Compartmentalization, and Age of HIV Proviruses Persisting in CD4

Author: Bradley R Jones, Rachel L Miller, Natalie N Kinloch, Olivia Tsai, Hawley Rigsby, Hanwei Sudderuddin, Aniqa Shahid, Bruce Ganase, Chanson J Brumme, Marianne Harris, Art F Y Poon, Mark A Brockman, Rémi Fromentin, Nicolas Chomont, Jeffrey B Joy, and Zabrina L Brumme
Subjects: CD4-Positive T-Lymphocytes, Adolescent, Base Sequence, Genetic Variation, HIV Infections, Viral Load, Young Adult, Anti-Retroviral Agents, Proviruses, Genetic Diversity and Evolution, T-Lymphocyte Subsets, DNA, Viral, HIV-1, Humans, Child, Phylogeny
Abstract: The HIV reservoir, which comprises diverse proviruses integrated into the genomes of infected, primarily CD4(+) T cells, is the main barrier to developing an effective HIV cure. Our understanding of the genetics and dynamics of proviruses persisting within distinct CD4(+) T cell subsets, however, remains incomplete. Using single-genome amplification, we characterized subgenomic proviral sequences (nef region) from naive, central memory, transitional memory, and effector memory CD4(+) T cells from five HIV-infected individuals on long-term combination antiretroviral therapy (cART) and compared these to HIV RNA sequences isolated longitudinally from archived plasma collected prior to cART initiation, yielding HIV data sets spanning a median of 19.5 years (range, 10 to 20 years) per participant. We inferred a distribution of within-host phylogenies for each participant, from which we characterized proviral ages, phylogenetic diversity, and genetic compartmentalization between CD4(+) T cell subsets. While three of five participants exhibited some degree of proviral compartmentalization between CD4(+) T cell subsets, combined analyses revealed no evidence that any particular CD4(+) T cell subset harbored the longest persisting, most genetically diverse, and/or most genetically distinctive HIV reservoir. In one participant, diverse proviruses archived within naive T cells were significantly younger than those in memory subsets, while for three other participants we observed no significant differences in proviral ages between subsets. In one participant, “old” proviruses were recovered from all subsets, and included one sequence, estimated to be 21.5 years old, that dominated (>93%) their effector memory subset. HIV eradication strategies will need to overcome within- and between-host genetic complexity of proviral landscapes, possibly via personalized approaches. 
IMPORTANCE: The main barrier to HIV cure is the ability of a genetically diverse pool of proviruses, integrated into the genomes of infected CD4(+) T cells, to persist despite long-term suppressive combination antiretroviral therapy (cART). CD4(+) T cells, however, constitute a heterogeneous population due to their maturation across a developmental continuum, and the genetic “landscapes” of latent proviruses archived within them remain incompletely understood. We applied phylogenetic techniques, largely novel to HIV persistence research, to reconstruct within-host HIV evolutionary history and characterize proviral diversity in CD4(+) T cell subsets in five individuals on long-term cART. Participants varied widely in terms of proviral burden, genetic diversity, and age distribution between CD4(+) T cell subsets, revealing that proviral landscapes can differ between individuals and between infected cell types within an individual. Our findings expose each within-host latent reservoir as unique in its genetic complexity and support personalized strategies for HIV eradication.
Published: 2019
