An Optimized K-Means Algorithm of Reducing Cluster Intra-dissimilarity for Document Clustering.
- Source :
- Advances in Web-Age Information Management; 2005, p785-790, 6p
- Publication Year :
- 2005
Abstract
- Due to the high dimensionality and sparseness of documents, grouping similar documents together is a difficult task. K-Means, the most popular document clustering method, suffers from cluster intra-dissimilarity, i.e. it tends to cluster unrelated documents together. One of the reasons is that all objects (documents) in a cluster exert the same influence on the mean of the cluster. SOM (Self-Organizing Map) is a method that reduces the dimensionality of data and displays the data in a low-dimensional space, and it has been applied successfully to clustering high-dimensional objects. The scalar factor is an important part of SOM. In this paper, an optimized K-Means algorithm is proposed. It introduces the scalar factor from SOM into the mean updates during the K-Means assignment stage to control how strongly new objects influence the means. Experiments show that the optimized K-Means algorithm achieves a higher F-Measure and a lower Entropy of clustering than the standard K-Means algorithm, thereby effectively reducing the intra-dissimilarity of clusters. [ABSTRACT FROM AUTHOR]
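The abstract does not specify the exact form of the scalar factor, so the following is only a minimal Python sketch of the general idea, not the authors' algorithm. It assumes a SOM-style learning rate alpha(t) that decays linearly over time, cosine similarity for the assignment step, and a farthest-first initialization; the names `scalar_factor`, `init_means`, and `kmeans_scalar_factor` and their parameters (`alpha0`, `alpha_min`, `epochs`, `seed`) are hypothetical. Each newly assigned document moves the winning mean by m ← m + alpha(t)(x − m), so documents seen early shift a mean strongly and later ones shift it only slightly.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def scalar_factor(t, t_max, alpha0=0.5, alpha_min=0.01):
    """Hypothetical SOM-style scalar factor: decays linearly from alpha0 to alpha_min."""
    frac = t / max(t_max - 1, 1)
    return alpha0 + (alpha_min - alpha0) * frac


def init_means(docs, k):
    """Assumed farthest-first seeding: start at docs[0], then repeatedly add
    the document least similar to every mean chosen so far."""
    means = [list(docs[0])]
    while len(means) < k:
        nxt = min(docs, key=lambda d: max(cosine(d, m) for m in means))
        means.append(list(nxt))
    return means


def kmeans_scalar_factor(docs, k, epochs=10, seed=0):
    """Online K-Means sketch in which each assignment moves the winning mean
    by the decaying scalar factor instead of giving every document equal weight."""
    rng = random.Random(seed)
    means = init_means(docs, k)
    labels = [0] * len(docs)
    t_max = epochs * len(docs)
    t = 0
    for _ in range(epochs):
        order = list(range(len(docs)))
        rng.shuffle(order)  # present documents in a random order each epoch
        for i in order:
            x = docs[i]
            # Assignment stage: pick the mean with the highest cosine similarity.
            j = max(range(k), key=lambda c: cosine(x, means[c]))
            labels[i] = j
            # Scalar-factor update: m <- m + alpha(t) * (x - m).
            alpha = scalar_factor(t, t_max)
            means[j] = [m + alpha * (xi - m) for m, xi in zip(means[j], x)]
            t += 1
    return labels, means


# Toy term-frequency vectors: two topics over a 4-term vocabulary.
docs = [
    [3, 1, 0, 0], [2, 2, 0, 0], [4, 0, 1, 0],
    [0, 0, 3, 2], [0, 1, 2, 3], [0, 0, 4, 1],
]
labels, means = kmeans_scalar_factor(docs, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

For comparison, replacing `scalar_factor(t, t_max)` with `1 / n_j`, where `n_j` counts the documents assigned to cluster `j` so far, turns the update into an incremental running mean in which every document carries equal weight, which is the behavior the abstract identifies as one cause of intra-dissimilarity.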
Details
- Language :
- English
- ISBNs :
- 9783540292272
- Database :
- Supplemental Index
- Journal :
- Advances in Web-Age Information Management
- Publication Type :
- Book
- Accession number :
- 32863065
- Full Text :
- https://doi.org/10.1007/11563952_81