Improving the analysis of biological ensembles through extended similarity measures

Authors :
Ramón Alain Miranda-Quintana
Liwei Chang
Alberto Perez
Source :
Physical Chemistry Chemical Physics. 24:444-451
Publication Year :
2022
Publisher :
Royal Society of Chemistry (RSC), 2022.

Abstract

We present new algorithms to classify structural ensembles of macromolecules, based on the recently proposed extended similarity measures. Molecular Dynamics provides a wealth of structural information on systems of biological interest. As computer power increases, we capture larger ensembles and larger conformational transitions between states. Typically, structural clustering provides the statistical mechanics treatment of the system to identify relevant biological states. The key advantage of our approach is that the newly introduced extended similarity indices reduce the computational complexity of assessing the similarity of a set of structures from O(N²) to O(N). Here we take advantage of this favorable cost to develop several highly efficient techniques, including a linear-scaling algorithm to determine the medoid of a set (which we effectively use to select the most representative structure of a cluster). Moreover, we use our extended similarity indices as a linkage criterion in a novel hierarchical agglomerative clustering algorithm. We apply these new metrics to analyze the ensembles of several systems of biological interest, such as folding and binding of macromolecules (peptide, protein, DNA-protein). In particular, we design a new workflow that is capable of identifying the most important conformations contributing to the protein folding process. We show excellent performance in the resulting clusters (surpassing traditional linkage criteria), along with faster performance and an efficient cost function to identify when to merge clusters.
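
The scaling argument in the abstract can be illustrated with a minimal sketch. This is not the authors' extended similarity index: it swaps in plain squared Euclidean distance, for which the identity sum_j ||x_i - x_j||² = N||x_i||² - 2 x_i·S + Q (with S the column-wise sum of all structures and Q the sum of their squared norms) lets the medoid be found from set-level sums rather than from all pairs. The function names `linear_medoid` and `brute_force_medoid` and the feature-matrix layout (one row per structure) are assumptions made for illustration.

```python
import numpy as np

def linear_medoid(X):
    """Index of the row with the smallest summed squared Euclidean
    distance to all rows, computed in O(N) from set-level sums."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    col_sum = X.sum(axis=0)                  # S: column-wise sum over the set
    sq_norms = np.einsum("ij,ij->i", X, X)   # ||x_i||^2 for every row
    total_sq = sq_norms.sum()                # Q: sum of all squared norms
    # sum_j ||x_i - x_j||^2 = n*||x_i||^2 - 2*x_i.S + Q
    cost = n * sq_norms - 2.0 * (X @ col_sum) + total_sq
    return int(np.argmin(cost))

def brute_force_medoid(X):
    """Reference O(N^2) version that builds every pairwise distance."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    return int(np.argmin(d2.sum(axis=1)))

# Hypothetical toy "ensemble": three structures described by two features.
toy = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
medoid_index = linear_medoid(toy)
```

Both functions select the same row; only the first avoids the pairwise distance matrix, which is the kind of saving the abstract attributes to the extended similarity indices.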

Details

ISSN :
1463-9084 and 1463-9076
Volume :
24
Database :
OpenAIRE
Journal :
Physical Chemistry Chemical Physics
Accession number :
edsair.doi.dedup.....48f77a8e3c1df15888c183197d27ffe7
Full Text :
https://doi.org/10.1039/d1cp04019g