1. Deep text clustering algorithm via key semantic information complementation.
- Author
- 郑璐依, 黄瑞章, 任丽娜, 白瑞娜, and 林川
- Subjects
- *DOCUMENT clustering, *DATA quality, *DEEP learning, *ALGORITHMS
- Abstract
- Most existing deep text clustering methods use only a traditional autoencoder to learn representations for clustering, and neglect two problems: over-reliance on the quality of the raw data and the loss of key semantic information during feature mapping. This paper proposes a deep document clustering model via key semantic information complementation (DCKSC). First, the DCKSC model enriches the original text data by extracting keyword data. Second, it introduces a key semantic information complementation module that uses the data-enhanced representation to improve the traditional autoencoder and to compensate for the key semantic information lost in the mapping process. Finally, the algorithm combines the clustering loss with the reconstruction loss of the keyword semantic autoencoder, optimizing the cluster label assignment and learning representations suited to clustering. Experimental results show that DCKSC outperforms many mainstream deep document clustering algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2023
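
The last step in the abstract, combining a clustering loss with an autoencoder reconstruction loss, can be illustrated with a minimal NumPy sketch. This is not the DCKSC implementation: the abstract does not give the loss formulas, so the sketch assumes a DEC-style clustering loss (KL divergence between a Student's t soft assignment and a sharpened target distribution), a mean-squared-error reconstruction loss, and a hypothetical weighting parameter `gamma`. All function and parameter names are illustrative.

```python
import numpy as np

def soft_assign(z, centers, alpha=1.0):
    """Soft cluster assignment q_ij from a Student's t kernel (assumed, DEC-style)."""
    # Squared Euclidean distance between every embedding and every center.
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target p_ij that emphasizes confident assignments (assumed)."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def joint_loss(x, x_rec, z, centers, gamma=0.1):
    """Reconstruction loss plus gamma times clustering loss; also returns labels.

    x       : original inputs, shape (n, d)
    x_rec   : autoencoder reconstructions, shape (n, d)
    z       : latent embeddings, shape (n, k_dim)
    centers : cluster centers in latent space, shape (n_clusters, k_dim)
    gamma   : hypothetical trade-off weight, not taken from the paper
    """
    rec = np.mean((x - x_rec) ** 2)                # MSE reconstruction loss
    q = soft_assign(z, centers)
    p = target_distribution(q)
    kl = np.sum(p * np.log(p / q)) / len(z)        # mean KL(P || Q) per sample
    return rec + gamma * kl, q.argmax(axis=1)
```

In a training loop, the encoder, decoder, and cluster centers would be updated to reduce this joint loss, and the returned `argmax` labels would serve as the current cluster assignment.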