Improving the analysis of biological ensembles through extended similarity measures
- Source :
- Physical Chemistry Chemical Physics. 24:444-451
- Publication Year :
- 2022
- Publisher :
- Royal Society of Chemistry (RSC), 2022.
Abstract
- We present new algorithms to classify structural ensembles of macromolecules, based on the recently proposed extended similarity measures. Molecular dynamics (MD) provides a wealth of structural information on systems of biological interest. As computing power increases, we can capture larger ensembles and larger conformational transitions between states. Structural clustering is typically used to provide a statistical-mechanics treatment of the system and to identify the relevant biological states. The key advantage of our approach is that the newly introduced extended similarity indices reduce the computational complexity of assessing the similarity of a set of structures from O(N²) to O(N). Here we take advantage of this favorable cost to develop several highly efficient techniques, including a linear-scaling algorithm to determine the medoid of a set (which we use to select the most representative structure of a cluster). Moreover, we use our extended similarity indices as a linkage criterion in a novel hierarchical agglomerative clustering algorithm. We apply these new metrics to analyze the ensembles of several systems of biological interest, such as the folding and binding of macromolecules (peptide, protein, and DNA-protein systems). In particular, we design a new workflow capable of identifying the most important conformations contributing to the protein folding process. We show excellent performance in the resulting clusters (surpassing traditional linkage criteria), along with faster run times and an efficient cost function for deciding when to merge clusters.
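The abstract's central claim is that the similarity of a whole set can be computed from per-column sums in O(N) rather than from all O(N²) pairs, which in turn makes a linear-scaling medoid search possible. The sketch below illustrates that idea under a loudly stated assumption: instead of the paper's extended similarity indices, it uses the mean squared Euclidean deviation over all pairs of frames, for which the O(N) identity is exact. Frames are assumed to be rows of a coordinate array, and the names `mean_sq_dev` and `linear_medoid` are hypothetical, not taken from the paper or any released package.

```python
import numpy as np


def mean_sq_dev(n: int, col_sum: np.ndarray, sq_sum: float) -> float:
    """Mean squared Euclidean distance over all n*(n-1)/2 pairs of frames.

    Illustrative stand-in (assumption) for an extended similarity index.
    Uses the identity  sum_{i<j} ||x_i - x_j||^2 = n * sum_i ||x_i||^2 - ||sum_i x_i||^2,
    so only the column sums and the total sum of squares are needed: O(n*d), not O(n^2*d).
    """
    pair_total = n * sq_sum - float(col_sum @ col_sum)
    return pair_total / (n * (n - 1) / 2)


def linear_medoid(X: np.ndarray) -> int:
    """Index of the frame with the smallest summed squared distance to all others.

    Hypothetical helper. Requires at least 3 frames. Runs in O(N*d) by
    evaluating the "complementary" spread of the set with each frame left out.
    """
    n = X.shape[0]
    col_sum = X.sum(axis=0)                 # S = sum_j x_j
    row_sq = np.einsum("ij,ij->i", X, X)    # ||x_i||^2 for every frame
    sq_sum = float(row_sq.sum())            # Q = sum_j ||x_j||^2
    # Spread of the remaining n-1 frames after removing frame i, from updated sums.
    comp = np.array([
        mean_sq_dev(n - 1, col_sum - X[i], sq_sum - row_sq[i]) for i in range(n)
    ])
    # Removing the most central frame leaves the most spread-out remainder.
    return int(np.argmax(comp))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.random((200, 30))  # hypothetical: 200 frames, 30 normalized coordinates
    print("medoid frame:", linear_medoid(frames))
```

Because every leave-one-out set has the same number of pairs, maximizing the complementary spread is equivalent to minimizing a frame's summed squared distance to all other frames, so under this particular metric the linear-time result coincides with the brute-force O(N²) medoid. The paper's own indices and medoid criterion may differ in detail.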
- Subjects :
- Computational complexity theory
- Macromolecular Substances
- Computer science
- Proteins
- General Physics and Astronomy
- DNA
- Linkage (mechanical)
- Folding (DSP implementation)
- computer.software_genre
- Medoid
- law.invention
- Set (abstract data type)
- ComputingMethodologies_PATTERNRECOGNITION
- Similarity (network science)
- law
- Cluster (physics)
- Data mining
- Physical and Theoretical Chemistry
- Peptides
- Cluster analysis
- computer
- Algorithms
Details
- ISSN :
- 1463-9084 and 1463-9076
- Volume :
- 24
- Database :
- OpenAIRE
- Journal :
- Physical Chemistry Chemical Physics
- Accession number :
- edsair.doi.dedup.....48f77a8e3c1df15888c183197d27ffe7
- Full Text :
- https://doi.org/10.1039/d1cp04019g