1. Identification of subfamily-specific sites based on active sites modeling and clustering
- Author
- François Artiguenave, Karine Bastard, and Raquel C. de Melo-Minardi
- Subjects
Statistics and Probability, Protein family, Structural alignment, Biology, Biochemistry, Models, Biological, Structural genomics, Protein structure, Protein sequencing, Sequence Analysis, Protein, Catalytic Domain, Cluster Analysis, Homology modeling, Cluster analysis, Molecular Biology, Computational Biology, Proteins, Molecular Sequence Annotation, Computer Science Applications, Enzymes, Computational Mathematics, Computational Theory and Mathematics, Structural biology, Data mining, Phosphorus-Oxygen Lyases, Serine Proteases, Protein Kinases, Sequence Alignment
- Abstract
Motivation: Current computational approaches to function prediction are mostly based on protein sequence classification and on transferring annotation from known proteins to their closest homologous sequences, relying on the orthology concept of function conservation. This approach has a major weakness: annotation reliability depends on global sequence similarity to known proteins, and it performs poorly for enzyme superfamilies whose members catalyze different reactions. Structural biology offers a different strategy to overcome the annotation problem by adding information about protein 3D structures. This information can be used to identify amino acids located in active sites, focusing on the detection of functionally polymorphic residues in an enzyme superfamily. Structural genomics programs are providing novel protein structures at a high-throughput rate. However, there is still a huge gap between the number of sequences and the number of available structures. Computational methods such as homology modeling provide reliable approaches to bridge this gap and could be a new, precise tool for annotating protein functions. Results: Here, we present the Active Sites Modeling and Clustering (ASMC) method, a novel unsupervised method to classify sequences using structural information of protein pockets. ASMC combines homology modeling of family members, structural alignment of modeled active sites and a subsequent hierarchical conceptual classification. Comparison of profiles obtained from computed clusters allows the identification of residues correlated with subfamily function divergence, called specificity determining positions. The ASMC method has been validated on a benchmark of 42 Pfam families for which previously resolved holo-structures were available. ASMC was also applied to several families containing known protein structures and comprehensive functional annotations.
We discuss how ASMC improves the annotation and understanding of protein family functions, using illustrative examples from nucleotidyl cyclases, protein kinases and serine proteases. Availability: http://www.genoscope.fr/ASMC/. Contact: raquelcm@dcc.ufmg.br; kbastard@genoscope.cns.fr; artigue@genoscope.cns.fr Supplementary information: Supplementary data are available at Bioinformatics online.
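The profile-comparison step described in the abstract can be illustrated with a minimal sketch. This is not the ASMC implementation: it assumes the active-site residues have already been structurally aligned into equal-length strings and that cluster labels are already available, whereas ASMC derives its clusters by hierarchical conceptual classification. The function names, the `min_conservation` threshold and the toy sequences are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def column_profiles(sites, labels):
    """Count residues per aligned position, separately for each cluster."""
    n_cols = len(sites[0])
    profiles = {}
    for site, label in zip(sites, labels):
        cols = profiles.setdefault(label, [Counter() for _ in range(n_cols)])
        for i, residue in enumerate(site):
            cols[i][residue] += 1
    return profiles


def specificity_positions(sites, labels, min_conservation=0.8):
    """Return positions conserved within every cluster but differing between clusters.

    min_conservation is an assumed threshold, not a value taken from the paper.
    """
    profiles = column_profiles(sites, labels)
    positions = []
    for i in range(len(sites[0])):
        consensus = set()
        conserved_everywhere = True
        for cols in profiles.values():
            residue, count = cols[i].most_common(1)[0]
            if count / sum(cols[i].values()) < min_conservation:
                conserved_everywhere = False
                break
            consensus.add(residue)
        # A candidate position needs at least two distinct cluster consensus residues.
        if conserved_everywhere and len(consensus) > 1:
            positions.append(i)
    return positions


# Toy aligned active-site strings (hypothetical data) and assumed cluster labels.
sites = ["GKDSH", "GKDSH", "GKDTH", "GRESH", "GRESH", "GREAH"]
labels = [0, 0, 0, 1, 1, 1]
print(specificity_positions(sites, labels))  # prints [1, 2]
```

In this toy input, positions 1 and 2 are fully conserved inside each cluster but carry different residues in the two clusters, so they are reported as candidate specificity determining positions. Position 3 is excluded because it is not conserved enough within either cluster, and positions 0 and 4 are excluded because both clusters share the same residue.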
- Published
- 2010