Unsupervised genome-wide cluster analysis: nucleotide sequences of the omicron variant of SARS-CoV-2 are similar to sequences from early 2020

Authors :
Georg Hahn
Sanghun Lee
Dmitry Prokopenko
Tanya Novak
Julian Hecker
Surender Khurana
Lindsey R. Baden
Adrienne G. Randolph
Scott T. Weiss
Christoph Lange
Publication Year :
2021
Publisher :
Cold Spring Harbor Laboratory, 2021.

Abstract

The GISAID database contains more than 1,000,000 SARS-CoV-2 genomes, including sequences of the recently discovered omicron variant and of prior SARS-CoV-2 strains collected from patients around the world since the beginning of the pandemic. We applied unsupervised cluster analysis to these genomes, assessing their similarity at a genome-wide level using the Jaccard index and principal component analysis. Our analysis shows that the omicron variant sequences are most similar to sequences submitted early in the pandemic, around January 2020. Furthermore, the omicron sequences in GISAID are spread across the entire range of the first principal component, suggesting that the strain has been in circulation for some time. This observation supports the hypothesis that the omicron strain originated from a long-term infection.
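
As an illustration of the kind of analysis the abstract describes, the following is a minimal sketch of a Jaccard similarity matrix followed by principal component analysis. It is not the authors' code. It assumes each genome has already been encoded as a binary vector (1 where the genome differs from a reference sequence, 0 otherwise), which the abstract does not specify; the function names `jaccard_matrix` and `principal_components` and the toy data are hypothetical.

```python
import numpy as np

def jaccard_matrix(X):
    """Pairwise Jaccard similarity between the rows of a binary matrix X."""
    X = np.asarray(X, dtype=float)
    inter = X @ X.T                          # |A ∩ B| for every pair of rows
    sizes = X.sum(axis=1)                    # |A| for every row
    union = sizes[:, None] + sizes[None, :] - inter
    # Assumption: two all-zero rows are treated as identical (similarity 1).
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, inter / union, 1.0)

def principal_components(S, k=2):
    """Scores of the top-k principal components of the column-centered matrix S."""
    centered = S - S.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :k] * s[:k]

# Toy data (hypothetical): 4 genomes x 6 sites, forming two obvious groups.
toy = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
])

J = jaccard_matrix(toy)
pcs = principal_components(J, k=2)
print(np.round(J, 3))
print(np.round(pcs[:, 0], 3))  # first principal component score per genome
```

In this toy example the first principal component separates the two groups of genomes. With real data, the position of each sequence along that component could be compared against its collection date or variant label.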

Details

Database :
OpenAIRE
Accession number :
edsair.doi...........2fe4882253479d62f1f7a857fe14cd28
Full Text :
https://doi.org/10.1101/2021.12.29.474469