A statistical view of clustering performance through the theory of U-processes.

Authors :
Clémençon, Stéphan
Source :
Journal of Multivariate Analysis. Feb 2014, Vol. 124, pp. 42-56. 15 pp.
Publication Year :
2014

Abstract

Many clustering techniques aim at optimizing empirical criteria that take the form of a U-statistic of degree two. Given a measure of dissimilarity between pairs of observations, the goal is to minimize the within-cluster point scatter over a class of partitions of the feature space. The purpose of this paper is to define a general statistical framework, relying on the theory of U-processes, for studying the performance of such clustering methods. In this setup, under adequate assumptions on the complexity of the subsets forming the partition candidates, the excess of clustering risk of the empirical minimizer is proved to be of the order $O_P(1/\sqrt{n})$. A lower bound result shows that the rate obtained is optimal in a minimax sense. Based on recent results related to the tail behavior of degenerate U-processes, it is also shown how to establish tighter, and even faster, rate bounds under additional assumptions. Model selection issues, related in particular to the number of clusters forming the data partition, are also considered. Finally, it is explained how the theoretical results developed here can provide statistical guarantees for empirical clustering aggregation. [Copyright Elsevier]
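
To make the criterion in the abstract concrete, the sketch below computes a within-cluster point scatter as a U-statistic of degree two: the average, over all pairs of observations, of the dissimilarity between the two points when both fall in the same cell of the partition, and zero otherwise. The function names, the 2/(n(n-1)) normalization, and the class of one-dimensional threshold partitions are illustrative assumptions, not taken from the record or the paper.

```python
from itertools import combinations


def within_cluster_scatter(points, labels, dissimilarity):
    """Average over all pairs i < j of D(x_i, x_j) * 1{x_i, x_j in same cell}.

    Hypothetical helper: the normalization 2 / (n (n - 1)) is the usual
    one for a U-statistic of degree two and is assumed here.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two observations")
    total = 0.0
    for i, j in combinations(range(n), 2):
        if labels[i] == labels[j]:
            total += dissimilarity(points[i], points[j])
    return 2.0 * total / (n * (n - 1))


def best_threshold_partition(points, thresholds, dissimilarity):
    """Empirical minimizer over a toy class of two-cell partitions of the
    real line, {x <= t} and {x > t} (an assumed candidate class)."""
    best = None
    for t in thresholds:
        labels = [0 if x <= t else 1 for x in points]
        risk = within_cluster_scatter(points, labels, dissimilarity)
        if best is None or risk < best[1]:
            best = (t, risk)
    return best


def squared_distance(a, b):
    return (a - b) ** 2


data = [0.0, 1.0, 10.0, 11.0]
good = within_cluster_scatter(data, [0, 0, 1, 1], squared_distance)
bad = within_cluster_scatter(data, [0, 1, 0, 1], squared_distance)
threshold, risk = best_threshold_partition(data, [0.5, 5.0, 10.5], squared_distance)
```

On this toy sample the partition that separates the two tight groups has a much smaller scatter than the one that mixes them, and the threshold search picks the split at 5.0. The rate results in the abstract concern how far the risk of such an empirical minimizer can be from the best achievable risk over the candidate class.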

Details

Language :
English
ISSN :
0047-259X
Volume :
124
Database :
Academic Search Index
Journal :
Journal of Multivariate Analysis
Publication Type :
Academic Journal
Accession number :
93485901
Full Text :
https://doi.org/10.1016/j.jmva.2013.10.001