An Optimized K-Means Algorithm of Reducing Cluster Intra-dissimilarity for Document Clustering.

Authors :
Fan, Wenfei
Wu, Zhaohui
Yang, Jun
Wang, Daling
Yu, Ge
Bao, Yubin
Zhang, Meng
Source :
Advances in Web-Age Information Management; 2005, p785-790, 6p
Publication Year :
2005

Abstract

Due to the high dimensionality and sparseness of documents, clustering similar documents together is a difficult task. K-Means, the most popular document clustering method, suffers from cluster intra-dissimilarity, i.e., a tendency to cluster unrelated documents together. One reason is that all objects (documents) in a cluster exert the same influence on the cluster mean. SOM (Self-Organizing Map) is a method that reduces the dimensionality of data and displays the data in a low-dimensional space, and it has been applied successfully to clustering high-dimensional objects. The scalar factor is an important part of SOM. In this paper, an optimized K-Means algorithm is proposed. It introduces the scalar factor from SOM into the means during the K-Means assignment stage to control how much new objects influence the means. Experiments show that the optimized K-Means algorithm achieves a higher F-Measure and a lower Entropy than the standard K-Means algorithm, thereby effectively reducing the intra-dissimilarity of clusters. [ABSTRACT FROM AUTHOR]
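The abstract does not state the exact update rule, so the following is only a minimal sketch of the general idea, assuming a sequential (online) K-Means: each document, as soon as it is assigned to its most similar mean, moves that mean toward itself by a SOM-style scalar factor alpha(t) that decays linearly over time, instead of by the equal 1/n weight of a plain running average. Cosine similarity on length-normalized term vectors, the linear decay schedule, and the names `normalize`, `cosine`, `scalar_factor`, `kmeans_scalar`, `alpha0`, `epochs`, and `seed` are all hypothetical choices for illustration, not the authors' method.

```python
import math
import random
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a term vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [float(x) for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes both vectors are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def scalar_factor(t: int, total: int, alpha0: float = 0.5) -> float:
    """Hypothetical SOM-style scalar factor alpha(t), decaying linearly from alpha0."""
    return alpha0 * (1.0 - t / total)


def kmeans_scalar(docs: List[Vector], k: int, epochs: int = 10, seed: int = 0) -> List[int]:
    """Sequential K-Means whose mean updates are damped by a scalar factor (assumed rule)."""
    rng = random.Random(seed)
    docs = [normalize(d) for d in docs]
    # Initialize the k means with k distinct randomly chosen documents.
    centroids = [list(d) for d in rng.sample(docs, k)]
    labels = [0] * len(docs)
    total = epochs * len(docs)
    t = 0
    for _ in range(epochs):
        order = list(range(len(docs)))
        rng.shuffle(order)
        for i in order:
            x = docs[i]
            # Assignment step: pick the most similar mean.
            j = max(range(k), key=lambda c: cosine(x, centroids[c]))
            labels[i] = j
            # The step size comes from alpha(t) rather than from 1/n, so the
            # influence of each new document on the mean is controlled explicitly.
            alpha = scalar_factor(t, total)
            moved = [c + alpha * (xi - c) for c, xi in zip(centroids[j], x)]
            centroids[j] = normalize(moved)
            t += 1
    return labels
```

For example, `kmeans_scalar([[3, 0, 0, 1], [2, 0, 0, 1], [0, 3, 2, 0], [0, 4, 3, 0]], k=2)` returns one cluster label per document. The decay schedule is the main design choice here: a large alpha early lets the means move quickly toward their documents, while a small alpha later keeps any single document from dragging an already settled mean toward unrelated content.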

Details

Language :
English
ISBNs :
9783540292272
Database :
Supplemental Index
Journal :
Advances in Web-Age Information Management
Publication Type :
Book
Accession number :
32863065
Full Text :
https://doi.org/10.1007/11563952_81