Parallel topic model and its application on document clustering

Authors :
Kang An
Shihua Cao
Lidong Wang
Yuhuai Wang
Yun Zhang
Source :
International Journal of Information and Communication Technology. 11:552
Publication Year :
2017
Publisher :
Inderscience Publishers, 2017.

Abstract

This paper presents PLDACOL, our parallel implementation of the LDACOL model, for effectively clustering large-scale document collections. Because phrases carry more semantic information than the sum of their individual words, we use the LDACOL topic model for phrase discovery and Gibbs sampling for parameter inference. PLDACOL overcomes the high computational cost of parameter inference by using a distributed computing framework based on Hadoop. We show that PLDACOL can be applied to clustering large-scale document collections of different sizes and that it yields significant improvements in both effectiveness and efficiency compared with related traditional algorithms.
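The abstract names the two ingredients of PLDACOL, Gibbs sampling for inference and Hadoop for distributing it, but not how the work is divided. As an illustration only, the sketch below assumes an AD-LDA-style scheme: each partition of documents runs one collapsed Gibbs sweep against a snapshot of the shared topic-word counts (the "map" step), and the partitions' count changes are then summed into the global tables (the "reduce" step). It uses plain unigram LDA in place of LDACOL, which additionally samples whether adjacent words form a phrase. All function names and parameters are hypothetical, and the map and reduce steps run sequentially in one process rather than on Hadoop.

```python
import random

# Assumption: plain unigram LDA stands in for LDACOL (no phrase/bigram variables).

def init_counts(docs, num_topics, vocab_size, rng):
    """Randomly assign a topic to every token and build the count tables."""
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    ndk = [[0] * num_topics for _ in docs]                 # document-topic counts
    nkw = [[0] * vocab_size for _ in range(num_topics)]    # topic-word counts
    nk = [0] * num_topics                                  # tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    return z, ndk, nkw, nk

def sample_partition(doc_ids, docs, z, ndk, nkw, nk, alpha, beta, vocab_size, rng):
    """Hypothetical 'map' step: one collapsed Gibbs sweep over a partition,
    using private copies of the global topic-word counts."""
    local_nkw = [row[:] for row in nkw]
    local_nk = nk[:]
    topics = range(len(nk))
    for d in doc_ids:  # z[d] and ndk[d] belong to this partition only
        for i, w in enumerate(docs[d]):
            k = z[d][i]
            # Remove the token's current assignment from the counts.
            ndk[d][k] -= 1
            local_nkw[k][w] -= 1
            local_nk[k] -= 1
            # Collapsed Gibbs conditional p(z = t | everything else), unnormalised.
            weights = [(ndk[d][t] + alpha) * (local_nkw[t][w] + beta)
                       / (local_nk[t] + vocab_size * beta) for t in topics]
            k = rng.choices(topics, weights=weights)[0]
            z[d][i] = k
            ndk[d][k] += 1
            local_nkw[k][w] += 1
            local_nk[k] += 1
    return local_nkw, local_nk

def merge_counts(nkw, nk, partials):
    """Hypothetical 'reduce' step: add each partition's count delta to the globals."""
    new_nkw = [row[:] for row in nkw]
    new_nk = nk[:]
    for local_nkw, local_nk in partials:
        for k in range(len(nk)):
            new_nk[k] += local_nk[k] - nk[k]
            for w in range(len(nkw[k])):
                new_nkw[k][w] += local_nkw[k][w] - nkw[k][w]
    return new_nkw, new_nk

def parallel_lda(docs, num_topics, vocab_size, num_partitions=2,
                 iterations=50, alpha=0.1, beta=0.01, seed=0):
    rng = random.Random(seed)
    z, ndk, nkw, nk = init_counts(docs, num_topics, vocab_size, rng)
    partitions = [list(range(p, len(docs), num_partitions))
                  for p in range(num_partitions)]
    for _ in range(iterations):
        # On a cluster each partition would be a separate task; here they run in turn,
        # and each one reads the same snapshot of nkw/nk until the merge.
        partials = [sample_partition(ids, docs, z, ndk, nkw, nk,
                                     alpha, beta, vocab_size, rng)
                    for ids in partitions]
        nkw, nk = merge_counts(nkw, nk, partials)
    return z, ndk, nkw, nk

def cluster_documents(ndk):
    """Assign each document to its most frequent topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in ndk]

# Toy corpus: word ids 0-2 and 3-5 tend to co-occur.
docs = [[0, 1, 2, 0], [1, 0, 2], [3, 4, 5, 3], [4, 5, 3]]
z, ndk, nkw, nk = parallel_lda(docs, num_topics=2, vocab_size=6)
print(cluster_documents(ndk))
```

Because the partitions own disjoint documents, summing their deltas reproduces the exact global counts after each sweep; the approximation lies only in each partition sampling against counts that are one sweep stale.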

Details

ISSN :
1741-8070 and 1466-6642
Volume :
11
Database :
OpenAIRE
Journal :
International Journal of Information and Communication Technology
Accession number :
edsair.doi.dedup.....5ce13c9c5419a15f92c3f7c7b7e117fe
Full Text :
https://doi.org/10.1504/ijict.2017.10008317