A support vector machine-based method for predicting the propensity of a protein to be soluble or to form inclusion body on overexpression in Escherichia coli
- Source :
- Bioinformatics. 22:278-284
- Publication Year :
- 2005
- Publisher :
- Oxford University Press (OUP), 2005.
Abstract
- Motivation: Inclusion body formation has been a major deterrent for overexpression studies, since a large number of proteins form insoluble inclusion bodies when overexpressed in Escherichia coli. The formation of inclusion bodies is known to be an outcome of improper protein folding; thus the composition and arrangement of amino acids in a protein would be a major factor in determining its aggregation propensity. There is a significant need for a prediction algorithm that would enable the rational identification of both mutants and ideal protein candidates for mutations that would confer higher solubility on overexpression, in place of the trial-and-error procedures presently used.
Results: Six physicochemical properties, together with residue and dipeptide compositions, have been used to develop a support vector machine-based classifier to predict the overexpression status in E. coli. The prediction accuracy is ∼72%, suggesting that it performs reasonably well in predicting the propensity of a protein to be soluble or to form inclusion bodies. The algorithm could also correctly predict the change in solubility for most of the point mutations reported in the literature. This algorithm can be a useful tool in screening protein libraries to identify soluble variants of proteins.
Availability: Software is available on request from the authors.
Contact: balaji@iitcb.ac.in; vk.jayaraman@ncl.res.in
Supplementary information: Supplementary data are available at the Bioinformatics Online web site.
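As an illustration of the residue and dipeptide compositions mentioned in the abstract, the following is a minimal sketch, not the authors' software. It assumes the 20 standard amino acids in alphabetical one-letter order and simply ignores any other characters; the function names are hypothetical. The six physicochemical properties and the support vector machine training step are not shown, because this record does not specify them.

```python
from itertools import product

# Assumption: the 20 standard amino acids, one-letter codes, alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def _clean(sequence):
    """Upper-case the sequence and drop anything that is not a standard residue."""
    return "".join(ch for ch in sequence.upper() if ch in AMINO_ACIDS)


def residue_composition(sequence):
    """Return 20 fractions: the share of each amino acid in the sequence."""
    seq = _clean(sequence)
    if not seq:
        raise ValueError("sequence contains no standard residues")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return 400 fractions: the share of each overlapping residue pair."""
    seq = _clean(sequence)
    if len(seq) < 2:
        raise ValueError("need at least two standard residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    total = len(seq) - 1
    return [counts[dp] / total for dp in DIPEPTIDES]


def feature_vector(sequence):
    """Concatenate both compositions into one 420-dimensional vector."""
    return residue_composition(sequence) + dipeptide_composition(sequence)
```

A vector built this way for each protein, labelled as soluble or inclusion-body forming, could then be passed to any support vector machine implementation for training.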
- Subjects :
- Statistics and Probability
Directed Evolution
Eukaryotic Proteins
Mutant
Computational biology
Biology
medicine.disease_cause
Biochemistry
Inclusion bodies
Pattern Recognition, Automated
Artificial Intelligence
Sequence Analysis, Protein
Catalytic Domain
Escherichia coli
medicine
Fold Recognition
Databases, Protein
Molecular Biology
chemistry.chemical_classification
Escherichia coli Proteins
Gene Expression Profiling
Point mutation
A protein
Gene Expression Regulation, Bacterial
Classification
Recombinant Proteins
Computer Science Applications
Amino acid
Support vector machine
Computational Mathematics
Solubility
Computational Theory and Mathematics
chemistry
Structural Genomics
Protein folding
Mutations
Algorithms
Expression Data
Details
- ISSN :
- 1367-4811 and 1367-4803
- Volume :
- 22
- Database :
- OpenAIRE
- Journal :
- Bioinformatics
- Accession number :
- edsair.doi.dedup.....89083f7d5b99a0c196549d66124f1ef7