A consistency-based feature selection method allied with linear SVMs for HIV-1 protease cleavage site prediction
- Source :
- PLoS ONE, Vol 8, Iss 8, p e63145 (2013)
- Publication Year :
- 2013
- Publisher :
- Public Library of Science (PLoS), 2013.
Abstract
- Background: Predicting type-1 Human Immunodeficiency Virus (HIV-1) protease cleavage sites in protein molecules and determining their specificity is an important task that has attracted considerable attention in the research community. Achievements in this area are expected to result in effective drug design (especially for HIV-1 protease inhibitors) against this life-threatening virus. However, some drawbacks (such as the shortage of available training data and the high dimensionality of the feature space) turn this task into a difficult classification problem. Thus, various machine learning techniques, and specifically several classification methods, have been proposed to increase the accuracy of the classification model. In addition, for classification problems characterized by few samples and many features, selecting the most relevant features is a major factor in increasing classification accuracy.
Results: We propose for HIV-1 data a consistency-based feature selection approach in conjunction with recursive feature elimination of support vector machines (SVMs). We used various classifiers to evaluate the results obtained from the feature selection process. We further demonstrated the effectiveness of our proposed method by comparing it with a state-of-the-art feature selection method applied to HIV-1 data, and we evaluated the reported results based on attributes selected from different combinations.
Conclusion: Applying feature selection to training data before carrying out the classification task seems to be a reasonable data-mining process when working with data similar to HIV-1. On HIV-1 data, some feature selection or extraction operations in conjunction with different classifiers have been tested and noteworthy outcomes have been reported. These facts motivate the work presented in this paper.
Software availability: The software is available at http://ozyer.etu.edu.tr/c-fs-svm.rar.
The software can be downloaded at esnag.etu.edu.tr/software/hiv_cleavage_site_prediction.rar; the archive includes a readme file that explains how to set up the software.
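As a rough illustration of what "consistency-based" feature selection usually means, the sketch below computes the standard inconsistency rate of a feature subset (samples that share the same values on the subset but disagree on the class label) and uses it in a simple greedy backward elimination. This is an assumption-laden sketch, not the authors' implementation: the function names, the threshold, and the 4-residue toy peptides are all made up for illustration, and the SVM-based recursive feature elimination step mentioned in the abstract is not shown.

```python
from collections import Counter, defaultdict


def inconsistency_rate(samples, labels, subset):
    """Fraction of samples that disagree with the majority label of
    the group sharing their values on the positions in `subset`."""
    groups = defaultdict(Counter)
    for x, y in zip(samples, labels):
        pattern = tuple(x[i] for i in subset)
        groups[pattern][y] += 1
    # In each group, every sample outside the majority class is inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(samples)


def backward_select(samples, labels, threshold=0.0):
    """Greedy backward elimination (hypothetical helper): drop a position
    whenever the remaining subset stays within the inconsistency threshold."""
    subset = list(range(len(samples[0])))
    for f in list(subset):
        trial = [i for i in subset if i != f]
        if trial and inconsistency_rate(samples, labels, trial) <= threshold:
            subset = trial
    return subset


# Toy data, invented for illustration: 1 = cleaved, 0 = not cleaved.
peptides = ["AFLM", "AFPM", "GFLM", "GYPQ", "AYLQ", "GYLM"]
labels = [1, 0, 1, 0, 1, 1]

selected = backward_select(peptides, labels)
print(selected)
```

In this toy set the label is fully determined by the third residue, so the elimination keeps only that position. Real cleavage-site data would use longer peptides and a numeric encoding of the residues, and a nonzero threshold is typically needed when the data are noisy.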
- Subjects :
- Viral Diseases
Support Vector Machine
Text Mining
Gene Identification and Analysis
lcsh:Medicine
Linear classifier
HIV Infections
02 engineering and technology
Bioinformatics
computer.software_genre
Biochemistry
Software
Computational Chemistry
HIV Protease
Drug Discovery
0202 electrical engineering, electronic engineering, information engineering
Macromolecular Structure Analysis
Biochemical Simulations
Preprocessor
lcsh:Science
Macromolecular Complex Analysis
0303 health sciences
Multidisciplinary
3. Good health
Chemistry
Infectious Diseases
Viral Enzymes
Medicine
020201 artificial intelligence & image processing
Computer Inferencing
Algorithms
Research Article
Feature vector
Feature selection
Biology
Machine learning
Data type
Microbiology
Molecular Genetics
03 medical and health sciences
README
Virology
Humans
030304 developmental biology
business.industry
lcsh:R
Proteins
Computational Biology
HIV
Computing Methods
ComputingMethodologies_PATTERNRECOGNITION
Proteolysis
Computer Science
HIV-1
lcsh:Q
Artificial intelligence
business
Peptides
computer
Details
- Language :
- English
- ISSN :
- 19326203
- Volume :
- 8
- Issue :
- 8
- Database :
- OpenAIRE
- Journal :
- PLoS ONE
- Accession number :
- edsair.doi.dedup.....3368fa5f71dc546c48e8d418fffb8caf