Multiple Sample Data Spectroscopic Clustering of Large Datasets Using Nyström Extension.
- Source :
- Journal of Computational & Graphical Statistics. Apr-Jun 2012, Vol. 21 Issue 2, p338-360. 23p.
- Publication Year :
- 2012
Abstract
- In this article, we focus on computational aspects of spectral clustering algorithms that have recently shown promising results in machine learning, statistics, and computer vision. These algorithms cluster observations (of size n) into groups by investigating eigenvectors of an affinity matrix or its Laplacian matrix, both of which are size n×n. However, when the sample size is large, the computation involved in the matrix eigen-decomposition is expensive or even infeasible. To overcome the computation hurdle, subsampling techniques, such as the Nyström extension, have been used to approximate eigenvectors of large matrices. We study statistical properties of this approximation and their influence on the accuracy of various spectral clustering algorithms. We show that the perturbation of the spectrum due to subsampling could lead to a large discrepancy among clustering results. In order to provide accurate and stable results for large datasets, we propose a method to combine multiple subsamples using data spectroscopic clustering and the Nyström extension. In addition, we propose a sparse approximation of the eigenvectors to further speed up computation. Simulation and experiments on real datasets show that our approaches work quickly and provide reasonable results that are more stable across samples than the single sample approach. This article has supplementary material online. [ABSTRACT FROM AUTHOR]
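- Illustrative note (not from the article): the single-subsample Nyström extension mentioned in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' method: the Gaussian kernel, the `bandwidth` parameter, uniform subsampling without replacement, and the function names `gaussian_kernel` and `nystrom_eigenvectors` are all hypothetical choices made for illustration. The article's multiple-subsample combination and sparse approximation are not shown, and the usual rescaling constants for eigenvalues and eigenvectors are omitted.

```python
import numpy as np


def gaussian_kernel(X, Y, bandwidth):
    """Gaussian affinity between rows of X and rows of Y (assumed kernel)."""
    sq = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def nystrom_eigenvectors(X, m, k, bandwidth, rng):
    """Approximate the top-k eigenvectors of the n x n affinity matrix
    using only an m-point subsample (unscaled Nystrom extension)."""
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)  # uniform subsample

    # Eigen-decompose the small m x m affinity matrix.
    K_mm = gaussian_kernel(X[idx], X[idx], bandwidth)
    evals, evecs = np.linalg.eigh(K_mm)  # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]  # keep the k largest
    evals, evecs = evals[order], evecs[:, order]

    # Extend each eigenvector to all n points: u(x) = sum_j K(x, x_j) u_j / lambda.
    K_nm = gaussian_kernel(X, X[idx], bandwidth)
    extended = K_nm @ evecs / evals
    return idx, evals, evecs, extended


# Hypothetical toy data: two well-separated Gaussian groups.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(200, 2)),
    rng.normal(4.0, 0.5, size=(200, 2)),
])
idx, evals, evecs, extended = nystrom_eigenvectors(
    X, m=50, k=2, bandwidth=1.0, rng=rng
)
```

  On the subsampled points the extension reproduces the subsample eigenvectors; elsewhere it interpolates them through the kernel. Re-running with a different seed changes `idx` and therefore the approximate spectrum, which is the sample-to-sample variability the abstract describes.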
Details
- Language :
- English
- ISSN :
- 10618600
- Volume :
- 21
- Issue :
- 2
- Database :
- Academic Search Index
- Journal :
- Journal of Computational & Graphical Statistics
- Publication Type :
- Academic Journal
- Accession number :
- 102882748
- Full Text :
- https://doi.org/10.1080/10618600.2012.672104