WETM: A word embedding-based topic model with modified collapsed Gibbs sampling for short text.

Authors :
Rashid, Junaid
Kim, Jungeun
Hussain, Amir
Naseem, Usman
Source :
Pattern Recognition Letters. Aug 2023, Vol. 172, pp. 158-164 (7 pages).
Publication Year :
2023

Abstract

• A word embedding-based topic model (WETM) for short text documents.
• Sparsity problem removed in short text and discovered structural information for topics and words.
• A modified collapsed Gibbs sampling algorithm to find the parameters for WETM.
• WETM achieved better classification, topic coherence, topic quality, and clustering results.
• The execution time is lower for WETM as compared to baseline topic models.

Short texts are a common source of knowledge, and the extraction of such valuable information is beneficial for several purposes. Traditional topic models are incapable of analyzing the internal structural information of topics. They are mostly based on the co-occurrence of words at the document level and are often unable to extract semantically relevant topics from short text datasets due to their limited length. Although some traditional topic models are sensitive to word order due to the strong sparsity of data, they do not perform well on short texts. In this paper, we propose a novel word embedding-based topic model (WETM) for short text documents to discover the structural information of topics and words and eliminate the sparsity problem. Moreover, a modified collapsed Gibbs sampling algorithm is proposed to strengthen the semantic coherence of topics in short texts. WETM extracts semantically coherent topics from short texts and finds relationships between words. Extensive experimental results on two real-world datasets show that WETM achieves better topic quality, topic coherence, classification, and clustering results. WETM also requires less execution time compared to traditional topic models. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0167-8655
Volume :
172
Database :
Academic Search Index
Journal :
Pattern Recognition Letters
Publication Type :
Academic Journal
Accession number :
169814875
Full Text :
https://doi.org/10.1016/j.patrec.2023.06.007