Genome-wide identification and prediction of SARS-CoV-2 mutations show an abundance of variants: Integrated study of bioinformatics and deep neural learning

Authors :
Md Shahadat Hossain
A.Q.M. Sala Uddin Pathan
Md Nur Islam
Mahafujul Islam Quadery Tonmoy
Mahmudul Islam Rakib
Md Adnan Munim
Otun Saha
Atqiya Fariha
Hasan Al Reza
Maitreyee Roy
Newaz Mohammed Bahadur
Md Mizanur Rahaman
Source :
Informatics in Medicine Unlocked, Vol 27, Pp 100798- (2021)
Publication Year :
2021
Publisher :
Elsevier, 2021.

Abstract

Genomic data analysis is a fundamental tool for monitoring pathogen evolution and outbreaks of infectious disease. Using bioinformatics and deep learning, this study was designed to identify the genomic variability of SARS-CoV-2 worldwide and to predict future mutation rates. Analysis of 259044 SARS-CoV-2 isolates identified 3334545 mutations, with an average of 14.01 mutations per isolate. Globally, single nucleotide polymorphisms (SNPs) were the most prevalent mutational event. C > T (52.67%) was the predominant substitution worldwide, followed by G > T (14.59%) and A > G (11.13%). Strains from India showed the highest number of mutations (48), followed by strains from Scotland, the USA, the Netherlands, Norway, and France, with up to 36 mutations. D416G, F106F, P314L, UTR:C241T, L93L, A222V, A199A, V30L, and A220V were the most frequent mutations. The missense mutations D1118H, S194L, R262H, M809L, P314L, A8D, S220G, A890D, G1433C, T1456I, R233C, F263S, L111K, A54T, A74V, L183A, A316T, V212F, L46C, V48G, Q57H, W131R, G172V, Q185H, and Y206S were found to substantially decrease the structural stability of the corresponding proteins. Conversely, D3L, L5F, and S97I were found to substantially increase the structural stability of the corresponding proteins. The multi-nucleotide mutations GGG > AAC, CC > TT, TG > CA, and AT > TA also appeared in our analysis and ranked among the top 20 mutations. Mutation rate prediction indicates increases of 17%, 7%, and 3% for C > T, A > G, and A > T, respectively. Conversely, decreases of 7%, 7%, and 6% are estimated for T > C, G > A, and G > T mutations, respectively. T > G/A, C > G/A, and A > T/C are not anticipated in the future. Since SARS-CoV-2 is mutating continuously, our findings will facilitate the tracking of mutations and help map the progression of COVID-19 intensity worldwide.
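
The summary statistics reported in the abstract (mean mutations per isolate and the percentage share of each substitution type such as C > T) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `substitution_shares`, the input layout (a dictionary mapping an isolate identifier to a list of reference/alternate base pairs), and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def substitution_shares(mutations_by_isolate):
    """Return (mean mutations per isolate, {substitution: percent share}).

    `mutations_by_isolate` is a hypothetical structure: a dict mapping an
    isolate ID to a list of (reference_base, alternate_base) tuples.
    """
    counts = Counter()
    total = 0
    for mutations in mutations_by_isolate.values():
        for ref, alt in mutations:
            counts[f"{ref} > {alt}"] += 1
            total += 1

    n_isolates = len(mutations_by_isolate)
    mean = total / n_isolates if n_isolates else 0.0
    shares = {k: 100.0 * v / total for k, v in counts.items()} if total else {}
    return mean, shares


# Toy input invented for illustration only; it is not data from the study.
toy = {
    "isolate_1": [("C", "T"), ("A", "G")],
    "isolate_2": [("C", "T"), ("G", "T")],
}
mean_per_isolate, shares = substitution_shares(toy)
```

On the toy input, `mean_per_isolate` is the total mutation count divided by the number of isolates, and `shares` maps each substitution label to its percentage of all observed substitutions, which is the same kind of figure as the 52.67% reported for C > T.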

Details

Language :
English
ISSN :
23529148
Volume :
27
Issue :
100798-
Database :
Directory of Open Access Journals
Journal :
Informatics in Medicine Unlocked
Publication Type :
Academic Journal
Accession number :
edsdoj.f4520e092ca4e9493c551338bcb7c98
Document Type :
article
Full Text :
https://doi.org/10.1016/j.imu.2021.100798