Topic Analysis by Exploring Headline Information
- Source :
- Web Information Systems Engineering – WISE 2020 ISBN: 9783030620073, WISE (2)
- Publication Year :
- 2020
- Publisher :
- Springer International Publishing, 2020.
Abstract
- In standard topic models, the ‘bag of words’ assumption gives every word that appears in a document the same weight. Word-topic assignment therefore leans toward high-frequency words and ignores the influence of low-frequency words, which ultimately degrades the quality of the topic representation. Statistical information obtained from the whole document collection can generally be used to mitigate this problem. In addition, the headlines of some kinds of documents, such as news articles, usually summarize the important elements of the document, so the words in headlines are better suited to representing its topics. However, few previous studies exploit this rich headline information, which is significant for topic modeling. In this paper, we propose a new headline-based topic model in order to obtain well-formed topic descriptions. Experimental results on three widely used datasets show that the proposed headline-based modeling scheme achieves lower perplexity.
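The abstract reports results in terms of perplexity. As a rough illustration only, and not the paper's own evaluation code, the following sketch computes the standard per-token perplexity of a generic topic model, exp(-log-likelihood / number of tokens). The function name `perplexity` and the inputs `doc_topic`, `topic_word`, and `docs` are hypothetical names chosen for this example.

```python
import math


def perplexity(doc_topic, topic_word, docs):
    """Per-token perplexity of a topic model on a set of documents.

    Assumed (hypothetical) inputs:
      doc_topic[d][k]  = p(topic k | document d)
      topic_word[k][w] = p(word w | topic k)
      docs[d]          = list of word ids in document d
    """
    log_likelihood = 0.0
    token_count = 0
    num_topics = len(topic_word)
    for d, words in enumerate(docs):
        for w in words:
            # Marginalize over topics to get p(word | document).
            p = sum(doc_topic[d][k] * topic_word[k][w] for k in range(num_topics))
            log_likelihood += math.log(p)
            token_count += 1
    # Lower values mean the model predicts the tokens better.
    return math.exp(-log_likelihood / token_count)


# Toy usage: a model that is uniform over a 4-word vocabulary.
toy_doc_topic = [[0.5, 0.5]]
toy_topic_word = [[0.25] * 4, [0.25] * 4]
toy_docs = [[0, 1, 2, 3]]
print(perplexity(toy_doc_topic, toy_topic_word, toy_docs))
```

A model that spreads probability uniformly over a vocabulary of V words has perplexity V, so a lower value indicates that the model concentrates probability on the words that actually occur.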
- Subjects :
- Topic model
Scheme (programming language)
Perplexity
Information retrieval
Computer science
Headline
02 engineering and technology
Representation (arts)
Latent Dirichlet allocation
symbols.namesake
Order (exchange)
Bag-of-words model
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
symbols
020201 artificial intelligence & image processing
computer
computer.programming_language
- ISBN :
- 978-3-030-62007-3
- ISBNs :
- 9783030620073
- Database :
- OpenAIRE
- Journal :
- Web Information Systems Engineering – WISE 2020 ISBN: 9783030620073, WISE (2)
- Accession number :
- edsair.doi...........e7f5fc5fd0b8de5bec9346923b336b43