1. SAFlex: A structural alphabet extension to integrate protein structural flexibility and missing data information
- Author
Allam, Ikram, Flatters, Delphine, Caumes, Géraldine, Regad, Leslie, Delos, Vincent, Nuel, Gregory, Camproux, Anne-Claude, Mathématiques Appliquées Paris 5 (MAP5 - UMR 8145), Université Paris Descartes - Paris 5 (UPD5)-Institut National des Sciences Mathématiques et de leurs Interactions (INSMI)-Centre National de la Recherche Scientifique (CNRS), UMR-S973, MTi, and Université Paris Diderot - Paris 7 (UPD7)
- Subjects
Protein Structure Comparison; Proteomics; Models, Molecular; Protein Structure; [MATH.MATH-PR] Mathematics [math]/Probability [math.PR]; Protein Conformation; Markov models; Molecular Conformation; lcsh:Medicine; Research and Analysis Methods; Biochemistry; Database and Informatics Methods; Protein Structure Databases; Macromolecular Structure Analysis; Solid State Physics; Hidden Markov models; Amino Acid Sequence; Databases, Protein; lcsh:Science; Molecular Biology; [INFO.INFO-MS] Computer Science [cs]/Mathematical Software [cs.MS]; [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM]; Crystallography; Proteomic Databases; Physics; lcsh:R; Biology and Life Sciences; Proteins; Probability theory; Condensed Matter Physics; Markov Chains; Biological Databases; Physical Sciences; Crystal Structure; lcsh:Q; Structural Proteins; Protein Structure Determination; Mathematics; Algorithms; Research Article
- Abstract
In this paper, we describe SAFlex (Structural Alphabet Flexibility), an extension of an existing structural alphabet (HMM-SA) that makes better use of the growing body of protein three-dimensional structure information by encoding protein conformations even when residues are missing or uncertain. A structural alphabet (SA) reduces the complexity of three-dimensional protein conformations, and of their analysis and comparison, by simplifying any conformation into a series of structural letters. Our methodology presents several novelties. Firstly, it accounts for encoding uncertainty by providing a range of encoding options: the maximum a posteriori, the marginal posterior distribution, and the effective number of letters at each given position. Secondly, our new algorithm deals with missing data in protein structure files (which concerns more than 75% of the proteins in the Protein Data Bank) in a rigorous probabilistic framework. Thirdly, SAFlex can encode, and build a consensus encoding from, different replicates of a single protein, such as several homomer chains. This makes it possible to localize structural differences between chains and to detect structural variability, which is essential for identifying protein flexibility. These improvements are illustrated on different proteins, such as the crystal structure of a eukaryotic small heat shock protein. They are promising for exploring the increasing redundancy in protein structure data and for obtaining a useful quantification of protein flexibility.
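The encoding options named in the abstract can be illustrated on a toy hidden Markov model. The sketch below is not the SAFlex implementation: the three-letter alphabet, the transition matrix, the likelihood values, and the function names `posterior_marginals` and `effective_number` are all hypothetical. It assumes that a missing position can be handled by giving every letter the same emission likelihood, and that the effective number of letters is the exponential of the Shannon entropy of the marginal posterior; neither assumption is confirmed by the abstract.

```python
import numpy as np


def posterior_marginals(pi, A, lik):
    """Forward-backward marginals P(letter at t | all observations).

    lik[t, k] is the likelihood of the observation at position t under
    letter k. A row of NaN marks a missing position (assumption: it is
    treated as uninformative, i.e. likelihood 1 for every letter).
    """
    lik = np.where(np.isnan(lik), 1.0, lik)
    T, K = lik.shape
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))

    # Forward pass, rescaled at each step for numerical stability.
    a = pi * lik[0]
    alpha[0] = a / a.sum()
    for t in range(1, T):
        a = (alpha[t - 1] @ A) * lik[t]
        alpha[t] = a / a.sum()

    # Backward pass, also rescaled.
    for t in range(T - 2, -1, -1):
        b = A @ (lik[t + 1] * beta[t + 1])
        beta[t] = b / b.sum()

    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)


def effective_number(post):
    """exp(Shannon entropy) of each row: 1 = certain, K = uniform."""
    safe = np.clip(post, 1e-300, 1.0)  # avoid log(0)
    return np.exp(-(post * np.log(safe)).sum(axis=1))


# Hypothetical toy model: 3 structural letters, 4 positions.
pi = np.full(3, 1.0 / 3.0)
A = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
lik = np.array([[0.90, 0.05, 0.05],
                [np.nan, np.nan, np.nan],  # missing residue
                [0.05, 0.90, 0.05],
                [0.05, 0.90, 0.05]])

post = posterior_marginals(pi, A, lik)
neq = effective_number(post)
best = post.argmax(axis=1)  # most probable letter at each position
print(np.round(post, 3))
print(np.round(neq, 2))
print(best)
```

Note that `best` takes the most probable letter position by position from the marginals; a maximum a posteriori encoding of the whole sequence would normally be computed jointly (for example with the Viterbi algorithm) and can differ from it.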
- Published
- 2018