Energy-based clustering: Fast and robust clustering of data with known likelihood functions.

Authors :
Thürlemann, Moritz
Riniker, Sereina
Source :
Journal of Chemical Physics. 7/14/2023, Vol. 159 Issue 2, p1-11. 11p.
Publication Year :
2023

Abstract

Clustering has become an indispensable tool in the presence of increasingly large and complex datasets. Most clustering algorithms depend, either explicitly or implicitly, on the sampled density. However, estimated densities are fragile due to the curse of dimensionality and finite sampling effects, for instance, in molecular dynamics simulations. To avoid the dependence on estimated densities, an energy-based clustering (EBC) algorithm based on the Metropolis acceptance criterion is developed in this work. In the proposed formulation, EBC can be considered a generalization of spectral clustering in the limit of large temperatures. Taking the potential energy of a sample explicitly into account alleviates requirements regarding the distribution of the data. In addition, it permits the subsampling of densely sampled regions, which can result in significant speed-ups and sublinear scaling. The algorithm is validated on a range of test systems including molecular dynamics trajectories of alanine dipeptide and the Trp-cage miniprotein. Our results show that including information about the potential-energy surface can largely decouple clustering from the sampled density. [ABSTRACT FROM AUTHOR]
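
The abstract names the Metropolis acceptance criterion but does not say how EBC turns it into a clustering rule, so the following is only a minimal sketch of the standard criterion itself, not the authors' algorithm. The function name `metropolis_acceptance`, the thermal-energy parameter `kT`, and the example energies are illustrative assumptions. It shows that at very high temperature the acceptance probability between any two samples approaches 1, so the energies stop influencing the result, which is consistent with the high-temperature limit the abstract describes.

```python
import math


def metropolis_acceptance(e_from, e_to, kT):
    """Standard Metropolis acceptance probability for a move e_from -> e_to.

    Hypothetical helper for illustration; energies and kT share the same units.
    """
    if kT <= 0:
        raise ValueError("kT must be positive")
    delta = e_to - e_from
    if delta <= 0:
        # Downhill (or flat) moves are always accepted.
        return 1.0
    # Uphill moves are accepted with Boltzmann-weighted probability.
    return math.exp(-delta / kT)


# Made-up potential energies of three samples (assumed values, arbitrary units).
energies = [0.0, 1.0, 5.0]

for kT in (0.5, 1e6):
    # Pairwise acceptance probabilities between all samples at this temperature.
    matrix = [
        [metropolis_acceptance(e_i, e_j, kT) for e_j in energies]
        for e_i in energies
    ]
    print(f"kT = {kT}")
    for row in matrix:
        print("  " + "  ".join(f"{p:.4f}" for p in row))
```

At low `kT` the matrix is strongly asymmetric, because uphill moves are rarely accepted; at high `kT` every entry is close to 1. How these probabilities would enter a similarity or transition matrix in EBC is described in the full text, not in this record.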

Details

Language :
English
ISSN :
0021-9606
Volume :
159
Issue :
2
Database :
Academic Search Index
Journal :
Journal of Chemical Physics
Publication Type :
Academic Journal
Accession number :
164937926
Full Text :
https://doi.org/10.1063/5.0148735