WETM: A word embedding-based topic model with modified collapsed Gibbs sampling for short text
- Authors
- Rashid, Junaid; Kim, Jungeun; Hussain, Amir; Naseem, Usman
- Subjects
- *GIBBS sampling, *WORD order (Grammar), *DATA mining, *VOCABULARY
- Abstract
• A word embedding-based topic model (WETM) for short text documents.
• Removes the sparsity problem in short text and discovers structural information for topics and words.
• A modified collapsed Gibbs sampling algorithm to estimate the parameters of WETM.
• WETM achieves better classification, topic coherence, topic quality, and clustering results.
• WETM's execution time is lower than that of baseline topic models.

Short texts are a common source of knowledge, and extracting such valuable information is beneficial for several purposes. Traditional topic models are incapable of analyzing the internal structural information of topics. They are mostly based on word co-occurrence at the document level and are often unable to extract semantically relevant topics from short text datasets because of their limited length. Even topic models that are sensitive to word order do not perform well on short texts because of the strong sparsity of the data. In this paper, we propose a novel word embedding-based topic model (WETM) for short text documents to discover the structural information of topics and words and to eliminate the sparsity problem. Moreover, a modified collapsed Gibbs sampling algorithm is proposed to strengthen the semantic coherence of topics in short texts. WETM extracts semantically coherent topics from short texts and finds relationships between words. Extensive experimental results on two real-world datasets show that WETM achieves better topic quality, topic coherence, classification, and clustering results. WETM also requires less execution time than traditional topic models. [ABSTRACT FROM AUTHOR]
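For context on the sampling step the abstract refers to, the following is a minimal sketch of a *standard* collapsed Gibbs sampler for an LDA-style topic model. It is not the paper's modified algorithm or its word-embedding extension; the function name, hyperparameters (`alpha`, `beta`), and toy corpus are all illustrative assumptions.

```python
import random
from collections import defaultdict

def collapsed_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Plain collapsed Gibbs sampler for an LDA-style topic model.

    `docs` is a list of token lists. This illustrates the generic sampling
    step that WETM modifies; it is NOT the paper's algorithm.
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})           # vocabulary size
    # random initial topic assignment for every token
    z = [[rng.randrange(num_topics) for _ in d] for d in docs]
    ndk = [[0] * num_topics for _ in docs]          # doc-topic counts
    nkw = [defaultdict(int) for _ in range(num_topics)]  # topic-word counts
    nk = [0] * num_topics                           # topic totals
    for di, d in enumerate(docs):
        for wi, w in enumerate(d):
            k = z[di][wi]
            ndk[di][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                k = z[di][wi]
                # remove the current assignment from the counts
                ndk[di][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # full conditional p(z = t | rest), up to a constant
                weights = [
                    (ndk[di][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(num_topics)
                ]
                # sample a new topic proportionally to the weights
                r = rng.random() * sum(weights)
                for t, wt in enumerate(weights):
                    r -= wt
                    if r <= 0:
                        k = t
                        break
                z[di][wi] = k
                ndk[di][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return z, nkw
```

The sparsity problem the abstract highlights arises here because short documents contribute very few co-occurrence counts to `ndk` and `nkw`; WETM's contribution is to inject word-embedding information into this inference step.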
- Published
- 2023