Variant-driven early warning via unsupervised machine learning analysis of spike protein mutations for COVID-19

Authors :
Adele de Hoffer
Shahram Vatani
Corentin Cot
Giacomo Cacciapaglia
Maria Luisa Chiusano
Andrea Cimarelli
Francesco Conventi
Antonio Giannini
Stefan Hohenegger
Francesco Sannino
Source :
Scientific Reports, Vol 12, Iss 1, Pp 1-14 (2022)
Publication Year :
2022
Publisher :
Nature Portfolio, 2022.

Abstract

Never before has such a vast amount of data, including genome sequencing, been collected for a viral pandemic as for COVID-19. This offers the possibility to trace the evolution of the virus and to assess, in real time, the role mutations play in its spread within the population. To this end, we focused on the Spike protein for its central role in mediating viral outbreak and replication in host cells. Employing the Levenshtein distance on the Spike protein sequences, we designed a machine learning algorithm yielding a temporal clustering of the available dataset. From this, we were able to identify and define emerging persistent variants that are in agreement with known evidence. Our novel algorithm allowed us to define persistent variants as chains that remain stable over time and to highlight emerging variants of epidemiological interest as branching events that occur over time. Hence, we determined the relationship and temporal connection between variants of interest and the ensuing passage to dominance of the current variants of concern. Remarkably, the analysis and the relevant tools introduced in our work serve as an early warning for the emergence of new persistent variants once the associated cluster reaches 1% of the time-binned sequence data. We validated our approach and its effectiveness on the onset of the Alpha variant of concern. We further predict that the recently identified lineage AY.4.2 (‘Delta plus’) is giving rise to a new emerging variant. Comparing our findings with the epidemiological data, we demonstrated that each new wave is dominated by a new emerging variant, thus confirming the hypothesis of a strong correlation between the birth of variants and the multi-wave temporal pattern of the pandemic. This allows us to introduce the epidemiology of variants, which we described via the Mutation epidemiological Renormalisation Group framework.
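
The two quantitative ingredients named in the abstract, the Levenshtein distance between sequences and the 1% share of time-binned data used as an early-warning threshold, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the input layout (a mapping from time bin to a list of cluster labels), and the assumption that cluster labels have already been assigned by some clustering step are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def flag_emerging_clusters(labels_by_bin, threshold=0.01):
    """Return, for each cluster label, the first time bin in which its
    share of that bin's sequences reaches the threshold.

    labels_by_bin is a hypothetical layout: {time_bin: [label, ...]},
    with one label per sequence collected in that bin.
    """
    first_flagged = {}
    for time_bin in sorted(labels_by_bin):
        labels = labels_by_bin[time_bin]
        if not labels:
            continue  # an empty bin carries no information
        for label, count in Counter(labels).items():
            share = count / len(labels)
            if label not in first_flagged and share >= threshold:
                first_flagged[label] = time_bin
    return first_flagged


# Toy data, invented for illustration only.
toy_bins = {
    1: ["A"] * 200,
    2: ["A"] * 199 + ["B"],      # B at 0.5%: below the threshold
    3: ["A"] * 197 + ["B"] * 3,  # B at 1.5%: flagged in this bin
}
flagged = flag_emerging_clusters(toy_bins)
```

In this toy example, cluster `B` is first flagged in bin 3, the first bin in which it accounts for at least 1% of the sequences. The pairwise distances from `levenshtein` would, under the same assumptions, serve as input to whatever clustering method produces the labels.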

Subjects

Subjects :
Medicine
Science

Details

Language :
English
ISSN :
2045-2322
Volume :
12
Issue :
1
Database :
Directory of Open Access Journals
Journal :
Scientific Reports
Publication Type :
Academic Journal
Accession number :
edsdoj.48308a15a3a41e3ba643314968e9c98
Document Type :
article
Full Text :
https://doi.org/10.1038/s41598-022-12442-8