A Pseudo-document-based Topical N-grams model for short texts
- Source :
- World Wide Web. 23:3001-3023
- Publication Year :
- 2020
- Publisher :
- Springer Science and Business Media LLC, 2020.
Abstract
- In recent years, short text topic modeling has drawn considerable attention from interdisciplinary researchers. Various customized topic models have been proposed to tackle the semantic sparseness of short texts. Most (if not all) of them follow the bag-of-words assumption, which is inadequate because word order and phrases are often critical to capturing the meaning of texts. On the other hand, while some existing topic models are sensitive to word order, they do not perform well on short texts due to severe data sparseness. To address these issues, we propose the Pseudo-document-based Topical N-Grams model (PTNG), which alleviates the data sparsity problem of short texts while remaining sensitive to word order. Extensive experiments on three real-world data sets against state-of-the-art baselines demonstrate the high quality of the topics learned by PTNG according to UCI coherence scores, and its more discriminative semantic representation of short texts according to classification results.
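The abstract's point about the bag-of-words assumption can be illustrated with a minimal sketch. This is not the PTNG model or any code from the paper; the helper names `bag_of_words` and `bigrams` and the two example sentences are hypothetical, chosen only to show that word counts alone cannot tell apart two texts that differ in word order, while adjacent word pairs can.

```python
from collections import Counter


def bag_of_words(text):
    """Count tokens, ignoring their order (hypothetical helper)."""
    return Counter(text.lower().split())


def bigrams(text):
    """Count adjacent token pairs, which keeps local word order (hypothetical helper)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


# Two short texts with the same words in a different order.
a = "dog bites man"
b = "man bites dog"

# Identical under bag-of-words, distinct once word pairs are counted.
same_bow = bag_of_words(a) == bag_of_words(b)
same_bigrams = bigrams(a) == bigrams(b)

print("same bag-of-words:", same_bow)
print("same bigrams:", same_bigrams)
```

Here `same_bow` is true and `same_bigrams` is false. The sketch also hints at the sparsity side of the trade-off: a three-word text yields only two bigrams, so order-sensitive statistics are thin on short texts.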
- Subjects :
- Topic model
Computer Networks and Communications
business.industry
Computer science
media_common.quotation_subject
02 engineering and technology
computer.software_genre
Discriminative model
Hardware and Architecture
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Semantic representation
Quality (business)
Artificial intelligence
business
computer
Software
Coherence (linguistics)
Natural language processing
media_common
Word order
Meaning (linguistics)
Details
- ISSN :
- 1573-1413 and 1386-145X
- Volume :
- 23
- Database :
- OpenAIRE
- Journal :
- World Wide Web
- Accession number :
- edsair.doi...........31defd198695e3bd2f6da17bbbd81a78