Virus2Vec: Viral Sequence Classification Using Machine Learning

Authors :
Ali, Sarwan
Bello, Babatunde
Chourasia, Prakash
Punathil, Ria Thazhe
Chen, Pin-Yu
Khan, Imdad Ullah
Patterson, Murray
Publication Year :
2023
Publisher :
arXiv, 2023.

Abstract

Understanding the host-specificity of different families of viruses sheds light on the origin of, e.g., SARS-CoV-2, rabies, and other such zoonotic pathogens in humans. It enables epidemiologists, medical professionals, and policymakers to curb existing epidemics and prevent future ones promptly. In the family Coronaviridae (of which SARS-CoV-2 is a member), it is well known that the spike protein is the point of contact between the virus and the host cell membrane. On the other hand, the two traditional mammalian orders, Carnivora (carnivores) and Chiroptera (bats), are recognized to be responsible for maintaining and spreading the Rabies Lyssavirus (RABV). We propose Virus2Vec, a feature-vector representation for viral (nucleotide or amino acid) sequences that enables vector-space-based machine learning models to identify viral hosts. Virus2Vec generates numerical feature vectors for unaligned sequences, allowing us to forgo the computationally expensive sequence alignment step in the pipeline. Virus2Vec leverages the power of both the minimizer and the position weight matrix (PWM) to generate compact feature vectors. Using several classifiers, we empirically evaluate Virus2Vec on real-world spike sequences of Coronaviridae and rabies virus sequence data to predict the host (identifying the reservoirs of infection). Our results demonstrate that Virus2Vec outperforms the predictive accuracies of baseline and state-of-the-art methods.

Comment: 11 pages, 6 figures. Accepted at the Conference on Health, Inference, and Learning (CHIL) 2023.
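
Illustrative note: the abstract names the minimizer as one of the two ingredients of Virus2Vec but does not define it. The sketch below shows one common, generic definition (for each k-mer window, keep the lexicographically smallest m-mer inside it) and counts the results. It is not the authors' implementation: the function names `minimizers` and `minimizer_counts`, the default values of `k` and `m`, and the use of plain counts are all assumptions made for illustration, and the PWM step is not shown.

```python
from collections import Counter


def minimizers(sequence, k=9, m=3):
    """Return one minimizer per k-mer window of `sequence`.

    Here a minimizer is the lexicographically smallest m-mer found
    inside a k-mer (a generic definition, assumed for illustration).
    """
    if not (0 < m <= k <= len(sequence)):
        raise ValueError("need 0 < m <= k <= len(sequence)")
    result = []
    # Slide a window of length k over the unaligned sequence.
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        # Enumerate every m-mer in the window and keep the smallest.
        result.append(min(kmer[i:i + m] for i in range(k - m + 1)))
    return result


def minimizer_counts(sequence, k=9, m=3):
    """Count how often each minimizer occurs (hypothetical helper)."""
    return Counter(minimizers(sequence, k, m))


if __name__ == "__main__":
    # Toy nucleotide string, not real viral data.
    print(minimizer_counts("ACGTAC", k=4, m=2))
```

Because each window contributes a single short m-mer rather than the full k-mer, the set of distinct features is much smaller than the set of possible k-mers, which is one plausible reading of the "compact feature vectors" the abstract mentions.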

Details

Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....9fff32f1507fe98a9acde2d272136e44
Full Text :
https://doi.org/10.48550/arxiv.2304.12328