1. A Novel Approach of Transcriptomic microRNA Analysis Using Text Mining Methods: An Early Detection of Multiple Sclerosis Disease
- Author
Mai S. Mabrouk, Mohamed Shaheen, Nehal M. Ali, and Mohamed Aborizka
- Subjects
transcriptomic data, General Computer Science, Computer science, business.industry, Dimensionality reduction, Feature extraction, General Engineering, Computational biology, text mining, miRNA analysis, Linear discriminant analysis, KmerFIDF, multiple sclerosis, Fingolimod, Random forest, TK1-9971, Support vector machine, Text mining, machine learning, medicine, General Materials Science, Data pre-processing, Electrical engineering. Electronics. Nuclear engineering, business, medicine.drug
- Abstract
Multiple sclerosis is an autoimmune disease that causes psychological impacts and severe physical disabilities, including motor disabilities and partial blindness. This work introduces an early detection method for multiple sclerosis based on the analysis of transcriptomic microRNA data. By transforming this phenotype classification problem into a text mining problem, multiple sclerosis biomarkers can be obtained. To our knowledge, text mining methods have not previously been applied to transcriptomic data analysis of multiple sclerosis. Hence, this work presents a complete predictive model that combines consecutive transcriptomic data preprocessing procedures with the proposed KmerFIDF method for feature extraction and linear discriminant analysis for dimensionality reduction. Predictive machine learning models are then built on the resulting features. This study describes experimental work on a transcriptomic dataset of noncoding microRNA sequences obtained from relapsing-remitting multiple sclerosis patients before fingolimod treatment and after six consecutive months of treatment. With the proposed model, the predictive methods report sensitivity, specificity, F1-score, and average accuracy of 96.4, 96.47, 95.6, and 97% with random forest; 92.89, 92.78, 93.2, and 94% with support vector machine; and 91.95, 92.2, 93.1, and 94% with logistic regression, respectively. These promising results support the introduced model and the proposed KmerFIDF method in transcriptomic data analysis. Moreover, comparative experiments are conducted with two referenced studies. The results show that the average accuracy of the proposed model exceeds that reported in the referenced studies.
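The abstract does not define KmerFIDF, so the following is only a minimal sketch of the general idea of treating sequences as text: each microRNA sequence is split into overlapping k-mers ("words") and weighted with ordinary TF-IDF. The function names `kmers` and `kmer_tfidf`, the choice of k = 3, and the toy sequences are all assumptions made for illustration; this is not the authors' method or data.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Split a sequence into its overlapping k-mers (the 'words')."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_tfidf(sequences, k=3):
    """Return one {k-mer: weight} dict per sequence using plain TF-IDF.

    Assumption: standard TF-IDF (relative term frequency times
    log(N / document frequency)), which may differ from KmerFIDF.
    """
    docs = [Counter(kmers(s, k)) for s in sequences]
    n_docs = len(docs)
    # Document frequency: number of sequences containing each k-mer.
    df = Counter(kmer for doc in docs for kmer in doc)
    weighted = []
    for doc in docs:
        total = sum(doc.values())
        weighted.append({
            kmer: (count / total) * math.log(n_docs / df[kmer])
            for kmer, count in doc.items()
        })
    return weighted


# Toy sequences invented for illustration only (not from the dataset).
toy = ["UGAGGUAGUAGG", "UGAGGUAGGAGG", "CAUUGCACUUGU"]
vectors = kmer_tfidf(toy, k=3)
```

In a pipeline like the one the abstract outlines, such per-sequence weight vectors would be arranged into a feature matrix before dimensionality reduction and classification. A k-mer that occurs in every sequence receives weight zero under this weighting, which is the usual TF-IDF behavior.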
- Published
- 2021