101. Small angle X-ray scattering and cross-linking for data assisted protein structure prediction in CASP 12 with prospects for improved accuracy
- Author
Adam Belsom, Juri Rappsilber, Kathryn Burnett, Greg L. Hura, Krzysztof Fidelis, John A. Tainer, Susan E. Tsutakawa, Tadeusz L. Ogorzalek, and Andriy Kryshtafovych
- Subjects
Small Angle; 0301 basic medicine; Models, Molecular; Protein Folding; Computer science; Protein Conformation; Biochemistry; Mathematical Sciences; Mass Spectrometry; Scattering; X-Ray Diffraction; Models; Structural Biology; solution structure; Small-angle X-ray scattering; SAXS; disorder; Biological Sciences; Protein structure prediction; flexibility; Cross-Linking Reagents; unfolded regions; SAS; Protein folding; Algorithm; Algorithms; assembly; experimental restraints; Bioinformatics; combined methods; Bioengineering; Measure (mathematics); Article; Set (abstract data type); 03 medical and health sciences; Information and Computing Sciences; Scattering, Small Angle; prediction accuracy; Humans; unstructured regions; crystallography; CASP; Molecular Biology; 030102 biochemistry & molecular biology; Molecular; Experimental data; Computational Biology; Proteins; modeling; Filter (signal processing); solution scattering; 030104 developmental biology; Generic health relevance
- Abstract
Experimental data offers empowering constraints for structure prediction. These constraints can be used to filter equivalently scored models or, more powerfully, within the optimization functions that drive prediction. In CASP12, Small Angle X-ray Scattering (SAXS) and Cross-Linking Mass Spectrometry (CLMS) data, measured on an exemplary set of novel-fold targets, were provided to the CASP community of protein structure predictors. As solution-based techniques, SAXS and CLMS can efficiently probe the full-length protein in its native solution conformation and assembly state. However, this experimental data did not substantially improve prediction accuracy, as judged by fit to the crystallographic models. One issue, beyond intrinsic limitations of the algorithms, was a disconnect between crystal structures and solution-based measurements. Our analyses show that many targets had substantial percentages of disordered regions (up to 40%), were multimeric, or both. Thus, solution measurements of flexibility and assembly reveal variations that may confound prediction algorithms trained on crystallographic data and expecting globular, fully folded, monomeric proteins. Here, we examine the CLMS and SAXS data collected, the information these solution measurements contain, and the challenges of incorporating them into computational prediction. As improvement opportunities were only partly realized in CASP12, we provide guidance on how data from the full-length biological unit and the solution state can better aid prediction of the folded monomer or subunit. We furthermore describe strategic integrations of solution measurements with computational prediction programs, aimed at substantially improving both foundational knowledge and the accuracy of computational algorithms for predicting biologically relevant structures of proteins in solution.
- Published
2017