
Topic Analysis by Exploring Headline Information

Authors :
Guanglai Gao
Rong Yan
Source :
Web Information Systems Engineering – WISE 2020 ISBN: 9783030620073, WISE (2)
Publication Year :
2020
Publisher :
Springer International Publishing, 2020.

Abstract

In standard topic models, the topic representation treats every word that appears in a document with the same weight, following the ‘bag of words’ assumption. As a result, word-topic assignments lean toward high-frequency words and ignore the influence of low-frequency words, which ultimately degrades the quality of the topic representation. Statistical information gathered from the whole document collection can generally be used to mitigate this problem. In addition, the headlines of some kinds of documents, such as news articles, usually summarize the document's important elements, so headline words are better suited to representing its topics. However, few previous studies have exploited the rich information in headlines, which is significant for topic modeling. In this paper, we propose a new headline-based topic model that produces well-formed topic descriptions. Experimental results on three widely used datasets show that the proposed headline-based modeling scheme achieves lower perplexity.
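The abstract names the problem (uniform word weights under ‘bag of words’), two remedies (collection-level statistics and headline words) and the evaluation metric (perplexity), but it does not give the model's equations. The Python sketch below is therefore not the authors' model; it only illustrates those ideas under stated assumptions. IDF stands in for "statistical information obtained from the whole document collection", `headline_boost` is a hypothetical multiplier for words that occur in the headline, and `theta` (document-topic) and `phi` (topic-word) are the usual topic-model distributions used to compute perplexity. All function names are illustrative.

```python
import math
from collections import Counter


def idf_table(docs):
    """Collection-level IDF, log(N / df(w)), for every word in the corpus.

    Assumption: docs is a list of token lists, each holding a document's
    headline and body tokens together.
    """
    n_docs = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return {w: math.log(n_docs / c) for w, c in df.items()}


def headline_weighted_counts(body, headline, idf, headline_boost=2.0):
    """Hypothetical weighting (not from the paper): IDF-scaled term counts,
    multiplied by headline_boost for words that also occur in the headline."""
    counts = Counter(body) + Counter(headline)
    in_headline = set(headline)
    return {
        w: c * idf.get(w, 0.0) * (headline_boost if w in in_headline else 1.0)
        for w, c in counts.items()
    }


def perplexity(docs, theta, phi, vocab_index):
    """Perplexity exp(-sum_d sum_w log p(w|d) / N), where
    p(w|d) = sum_k theta[d][k] * phi[k][w] and N is the total token count.
    Assumes every p(w|d) > 0 (e.g. smoothed distributions)."""
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for word in doc:
            v = vocab_index[word]
            p = sum(theta[d][k] * phi[k][v] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


if __name__ == "__main__":
    # Uniform model over a 4-word vocabulary: every p(w|d) = 0.25,
    # so the perplexity equals the vocabulary size.
    vocab = {"a": 0, "b": 1, "c": 2, "d": 3}
    docs = [["a", "b"], ["c", "d", "a"]]
    theta = [[0.5, 0.5], [0.5, 0.5]]
    phi = [[0.25] * 4, [0.25] * 4]
    print(round(perplexity(docs, theta, phi, vocab), 6))  # 4.0
```

Lower perplexity means the model assigns a higher average probability to the observed words, which is the sense in which the abstract reports an improvement. Note that in this sketch a word occurring in every document gets an IDF of zero, so the headline boost only matters for words that are not ubiquitous in the collection.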

Details

ISBN :
978-3-030-62007-3
ISBNs :
9783030620073
Database :
OpenAIRE
Journal :
Web Information Systems Engineering – WISE 2020 ISBN: 9783030620073, WISE (2)
Accession number :
edsair.doi...........e7f5fc5fd0b8de5bec9346923b336b43