1. An integrated K-means – Laplacian cluster ensemble approach for document datasets
- Author
- An Jing, Kung-Sik Chan, Xianfeng Li, Sen Xu, Jun Gao, Xiaopeng Hua, and Xu Xiufang
- Subjects
- Normalization (statistics); Computer science; Cognitive Neuroscience; k-means clustering; 02 engineering and technology; 01 natural sciences; Computer Science Applications; 010104 statistics & probability; ComputingMethodologies_PATTERNRECOGNITION; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; Cluster (physics); Benchmark (computing); 020201 artificial intelligence & image processing; Data mining; 0101 mathematics; Laplacian matrix; Cluster analysis; Laplace operator; Eigendecomposition of a matrix
- Abstract
Cluster ensembles have become an important extension of traditional clustering algorithms, yet the cluster ensemble problem remains challenging because of the inherent difficulty of resolving the label correspondence problem. We adapted the integrated K-means – Laplacian clustering approach to the cluster ensemble problem by exploiting both the attribute information embedded in the cluster labels and the pairwise relations among the objects. The optimal solution of the proposed approach requires computing the pseudoinverse of the normalized Laplacian matrix and the eigenvalue decomposition of a large matrix, both of which can be computationally burdensome for large-scale document datasets. We devised an algebraic transformation method for carrying out these computations efficiently and proposed an integrated K-means – Laplacian cluster ensemble approach (IKLCEA). Experimental results on benchmark document datasets demonstrate that IKLCEA outperforms other cluster ensemble techniques in most cases. In addition, IKLCEA is computationally efficient and can be readily employed in large-scale document applications.
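The ingredients named in the abstract can be illustrated with a small NumPy sketch. This is not the IKLCEA algorithm and does not include its algebraic transformation. It only shows, under assumed conventions, how cluster labels can be encoded as attributes (a stacked one-hot indicator matrix), how pairwise relations can be summarized (a co-association matrix), and the naive dense computation of the normalized Laplacian and its pseudoinverse that the abstract describes as costly at scale. The function names `indicator_matrix` and `normalized_laplacian` and the toy labelings are hypothetical.

```python
import numpy as np

def indicator_matrix(labelings):
    """Stack one-hot encodings of each base clustering side by side.

    Each base clustering contributes one column per cluster, so no label
    correspondence between different clusterings is needed.
    """
    blocks = []
    for labels in labelings:
        labels = np.asarray(labels)
        clusters = np.unique(labels)
        blocks.append((labels[:, None] == clusters[None, :]).astype(float))
    return np.hstack(blocks)

def normalized_laplacian(W):
    """Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

# Toy ensemble (assumed data): three base clusterings of six objects.
labelings = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 0, 2],
    [0, 0, 0, 1, 1, 1],
]

H = indicator_matrix(labelings)      # attribute view of the cluster labels
W = H @ H.T / len(labelings)         # co-association: fraction of clusterings
                                     # that place two objects together
L = normalized_laplacian(W)
L_pinv = np.linalg.pinv(L)           # dense pseudoinverse, cubic cost in n
```

For n objects, the dense pseudoinverse and eigendecomposition both scale roughly cubically in n, which is the burden the abstract refers to for large document collections.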
- Published
- 2016