A Pseudo-document-based Topical N-grams model for short texts

Authors :
Hao Lin
Guannan Liu
Junjie Wu
Zhiang Wu
Hong Li
Yuan Zuo
Source :
World Wide Web. 23:3001-3023
Publication Year :
2020
Publisher :
Springer Science and Business Media LLC, 2020.

Abstract

In recent years, short text topic modeling has drawn considerable attention from interdisciplinary researchers. Various customized topic models have been proposed to tackle the semantic sparsity of short texts. Most (if not all) of them follow the bag-of-words assumption, which is inadequate because word order and phrases are often critical to capturing the meaning of texts. On the other hand, while some existing topic models are sensitive to word order, they do not perform well on short texts due to severe data sparsity. To address these issues, we propose the Pseudo-document-based Topical N-Grams model (PTNG), which alleviates the data sparsity problem of short texts while remaining sensitive to word order. Extensive experiments on three real-world data sets against state-of-the-art baselines demonstrate that PTNG learns high-quality topics, as measured by UCI coherence scores, and yields more discriminative semantic representations of short texts, as measured by classification results.

Details

ISSN :
1573-1413 and 1386-145X
Volume :
23
Database :
OpenAIRE
Journal :
World Wide Web
Accession number :
edsair.doi...........31defd198695e3bd2f6da17bbbd81a78