1. Towards Discovering SARS-CoV-2 Variants of High Consequence Based on Both Surveillance and Electronically Captured Health Data: First Year Experience in Washington State (January 2020-2021)
- Author
- Lue Ping Zhao, Pavitra Roychoudhury, Keith R. Jerome, Peter B. Gilbert, Joshua Schiffer, Terry Lybrand, Thomas H. Payne, April Randhawa, Margaret G. Mills, Alex Greninger, Chul-woo Pyo, Ruihan Wang, Renyu Li, Alexander S. Thomas, Brandon M. Norris, Wyatt C. Nelson, and Daniel E. Geraghty
- Subjects
medicine.medical_specialty, Coronavirus disease 2019 (COVID-19), business.industry, Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), Internal medicine, Cohort, Haplotype, medicine, Clinical significance, Disease, Logistic regression, Institutional review board, business
- Abstract
Background: SARS-CoV-2 is continuously evolving, with the emergence of variants of interest (VOI) and variants of concern (VOC). While variants of high consequence (VOHC) are well defined, no such variants have been formally documented. Here we propose an integrated strategy, and an application of it, towards discovering VOHC. Methods: We used 7,137 viral sequences collected from COVID-19 cases in Washington State from January 19, 2020 to January 31, 2021 to identify genome-wide viral single nucleotide variants (SNVs). Using a non-parametric regression model, we selected a subset of SNVs that showed significant and substantial expansion over the collection period. Further, using unsupervised learning, we identified multiple SNVs forming haplotypes. To evaluate their clinical relevance, we assembled a discovery cohort of COVID-19 cases (388 inpatients and 295 outpatients) to identify SNVs and haplotypes associated with hospitalization status, a proxy for disease severity. A logistic regression model was used to assess associations of SNVs with hospitalization status in the discovery cohort. These results were validated on an independent cohort of 964 genome sequences derived from COVID-19 cases in Washington State from June 1, 2020 to March 31, 2021. Findings: Analysis of the 7,137 sequences identified 107 SNVs that were statistically significant (false positive error rate q-value 0.10). Forty-one SNVs were considered urgent because their proportions persisted or expanded above 10% in January 2021, the last month of the investigation period. When correlated with clinical data, eight SNVs were found to be significantly associated with inpatient status (p-values
- Published
- 2021