Parallel topic model and its application on document clustering
- Source :
- International Journal of Information and Communication Technology. 11:552
- Publication Year :
- 2017
- Publisher :
- Inderscience Publishers, 2017.
Abstract
- This paper presents PLDACOL, our parallel implementation of the LDACOL model, for effectively clustering large-scale document collections. Since a phrase carries more semantic information than the sum of its individual words, we use the topic model LDACOL for phrase discovery and Gibbs sampling for parameter inference. PLDACOL overcomes the high computational cost of parameter inference by using a distributed computing framework based on Hadoop. We show that PLDACOL can be applied to clustering large-scale document collections of different sizes and produces significant improvements in both effectiveness and efficiency compared with related traditional algorithms.
- Subjects :
- Topic model
Phrase
Computer Networks and Communications
Computer science
Computation
Inference
Document clustering
Computer Science Applications
Data mining
Cluster analysis
Software
Word (computer architecture)
Information Systems
Gibbs sampling
Details
- ISSN :
- 1741-8070 and 1466-6642
- Volume :
- 11
- Database :
- OpenAIRE
- Journal :
- International Journal of Information and Communication Technology
- Accession number :
- edsair.doi.dedup.....5ce13c9c5419a15f92c3f7c7b7e117fe
- Full Text :
- https://doi.org/10.1504/ijict.2017.10008317