1. Prediction of protein subcellular localization using machine learning with novel use of generic feature set
- Author
Nawshin Tabassum Tanny, Shahin Akhter, and Paramita Basak Upama
- Subjects
Drug discovery, business.industry, Computer science, Location awareness, Subcellular localization, computer.software_genre, Machine learning, Support vector machine, Statistical classification, ComputingMethodologies_PATTERNRECOGNITION, Then test, Independent set, Artificial intelligence, business, Feature set, computer
- Abstract
Identifying the location of a protein within a cell is called subcellular localization of proteins. This area of research in bioinformatics is pivotal for protein synthesis and for drug discovery targeting several medical conditions and diseases. This paper introduces a new machine learning approach for subcellular localization of proteins, which uses 18 basic and physicochemical features that are novel for such methods. A support vector machine (SVM) model was first trained to learn these properties from proteins found in 6 locations inside a cell, and was then tested on an independent set of protein sequences. The proposed multi-class classification algorithm achieved an accuracy of about 94%. The results show superior performance with minimal computation compared to similar algorithms in the literature.
- Published
- 2020
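
The abstract describes turning each protein sequence into a small set of basic and physicochemical features before classification. The record does not list the 18 features, so the sketch below is only an illustration of that general idea, not the authors' feature set: the function names (`extract_features`, `to_vector`), the five features chosen, and the residue groupings are all assumptions. The hydropathy values are the standard Kyte-Doolittle scale.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed residue groupings (a simplification; histidine is left out of
# the positive group here).
POSITIVE = set("KR")
NEGATIVE = set("DE")
AROMATIC = set("FWY")

# Hypothetical feature order used when building a numeric vector.
FEATURE_ORDER = ["length", "gravy", "frac_positive", "frac_negative", "frac_aromatic"]


def extract_features(sequence: str) -> Dict[str, float]:
    """Compute a few simple sequence-derived features (illustrative only)."""
    # Keep only the 20 standard residues; anything else (X, B, gaps) is skipped.
    residues = [aa for aa in sequence.strip().upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    return {
        "length": float(n),
        # Mean hydropathy over the sequence (often called GRAVY).
        "gravy": sum(KYTE_DOOLITTLE[aa] for aa in residues) / n,
        "frac_positive": sum(aa in POSITIVE for aa in residues) / n,
        "frac_negative": sum(aa in NEGATIVE for aa in residues) / n,
        "frac_aromatic": sum(aa in AROMATIC for aa in residues) / n,
    }


def to_vector(features: Dict[str, float]) -> List[float]:
    """Flatten a feature dictionary into a fixed-order list for a classifier."""
    return [features[name] for name in FEATURE_ORDER]


if __name__ == "__main__":
    print(to_vector(extract_features("MKTAYIAKQRDEFW")))
```

Vectors built this way, one per sequence and paired with a location label, are the kind of input a multi-class SVM implementation such as scikit-learn's `sklearn.svm.SVC` accepts; whether the paper used that library is not stated in this record.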