Identification of subfamily-specific sites based on active sites modeling and clustering
- Source :
- Bioinformatics (Oxford, England). 26(24)
- Publication Year :
- 2010
Abstract
- Motivation: Current computational approaches to function prediction mostly rely on protein sequence classification and on transferring annotation from known proteins to their closest homologs, under the orthology-based assumption that function is conserved. This approach has a major weakness: annotation reliability depends on global sequence similarity to known proteins, and it performs poorly for enzyme superfamilies whose members catalyze different reactions. Structural biology offers a different strategy for the annotation problem by adding information about protein 3D structures. This information can be used to identify amino acids located in active sites, with a focus on detecting functionally polymorphic residues within an enzyme superfamily. Structural genomics programs are providing novel protein structures at a high-throughput rate. However, there is still a huge gap between the number of sequences and the number of available structures. Computational methods such as homology modeling provide reliable approaches to bridge this gap and could become a precise new tool for annotating protein functions. Results: Here, we present the Active Sites Modeling and Clustering (ASMC) method, a novel unsupervised method that classifies sequences using structural information from protein pockets. ASMC combines homology modeling of family members, structural alignment of the modeled active sites and a subsequent hierarchical conceptual classification. Comparing the profiles obtained from the computed clusters allows the identification of residues correlated with functional divergence between subfamilies, called specificity determining positions. The ASMC method has been validated on a benchmark of 42 Pfam families for which previously resolved holo-structures were available. ASMC was also applied to several families containing known protein structures and comprehensive functional annotations.
We discuss how ASMC improves the annotation and understanding of protein family functions, with illustrative examples from nucleotidyl cyclases, protein kinases and serine proteases. Availability: http://www.genoscope.fr/ASMC/. Contact: raquelcm@dcc.ufmg.br; kbastard@genoscope.cns.fr; artigue@genoscope.cns.fr Supplementary information: Supplementary data are available at Bioinformatics online.
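The profile-comparison step described in the abstract can be illustrated with a small sketch. This is not the ASMC implementation: the function names (`column_profile`, `candidate_sdps`), the `min_conservation` threshold and the toy pocket sequences are all assumptions made for illustration, and the cluster assignments are taken as given rather than computed by hierarchical conceptual classification. The sketch only shows the general idea of flagging aligned active-site positions that are conserved within each cluster but differ between clusters.

```python
from collections import Counter


def column_profile(seqs, pos):
    """Relative residue frequencies at one aligned pocket position."""
    counts = Counter(s[pos] for s in seqs)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def candidate_sdps(clusters, min_conservation=0.8):
    """Return positions that look like specificity determining positions.

    A position is reported when every cluster has a dominant residue with
    frequency >= min_conservation (a hypothetical threshold) and the
    dominant residues are not all identical across clusters.
    """
    length = len(next(iter(clusters.values()))[0])
    hits = []
    for pos in range(length):
        dominant = []
        for seqs in clusters.values():
            profile = column_profile(seqs, pos)
            aa, freq = max(profile.items(), key=lambda kv: kv[1])
            if freq < min_conservation:
                break  # not conserved inside this cluster: skip position
            dominant.append(aa)
        else:
            # Reached only when every cluster passed the conservation test.
            if len(set(dominant)) > 1:
                hits.append(pos)
    return hits


# Toy data (invented): aligned active-site residues, one string per protein,
# already grouped into two clusters.
clusters = {
    "cluster_A": ["GDSAH", "GDSAH", "GDSGH"],
    "cluster_B": ["GETAH", "GETAH", "GETSH"],
}
sdps = candidate_sdps(clusters)
```

With this toy input, positions conserved identically in both clusters and positions that vary inside a cluster are ignored; only positions where each cluster is internally conserved on a different residue are returned as candidates.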
- Subjects :
- Statistics and Probability
Protein family
Structural alignment
Biology
Biochemistry
Models, Biological
Structural genomics
Protein structure
Protein sequencing
Sequence Analysis, Protein
Catalytic Domain
Cluster Analysis
Homology modeling
Molecular Biology
Computational Biology
Proteins
Molecular Sequence Annotation
Computer Science Applications
Enzymes
Computational Mathematics
Computational Theory and Mathematics
Structural biology
Data mining
Phosphorus-Oxygen Lyases
Serine Proteases
Protein Kinases
Sequence Alignment
Details
- ISSN :
- 1367-4811
- Volume :
- 26
- Issue :
- 24
- Database :
- OpenAIRE
- Journal :
- Bioinformatics (Oxford, England)
- Accession number :
- edsair.doi.dedup.....537bde917ea1cfc87628503eeb68ab5c