Language Model-Driven Topic Clustering and Summarization for News Articles
- Source :
- IEEE Access, Vol 7, Pp 185506-185519 (2019)
- Publication Year :
- 2019
- Publisher :
- IEEE, 2019.
Abstract
- Topic models have been widely used in Topic Detection and Tracking tasks, which aim to detect, track, and describe topics from a stream of broadcast news reports. However, most existing topic models neglect semantic or syntactic information and lack readable topic descriptions. Language Models (LMs) have been applied in many supervised NLP tasks to exploit semantic and syntactic information, but there are still no extensions of LMs for unsupervised topic clustering. Moreover, it is difficult to employ general LMs (e.g., BERT) to produce readable topic summaries because of the mismatch between the pretraining method and the summarization task. In this paper, noting the similarity between content and summary, we first propose a Language Model-based Topic Model (LMTM) for topic clustering, which uses an LM to generate deep contextualized word representations. We then introduce a new method of training a Topic Summarization Model, which not only produces brief topic summaries but also serves as the LM in LMTM for topic clustering. Empirical evaluations on two different datasets show that the proposed LMTM method outperforms four baselines on JC, FMI, precision, recall, and F1-score. Additionally, the generated summaries are readable and reasonable, which supports the soundness of our model components.
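The abstract reports JC, FMI, precision, recall, and F1-score without defining them. The sketch below assumes these are the usual pair-counting clustering measures (JC as the Jaccard coefficient, FMI as the Fowlkes-Mallows index), computed over all pairs of documents; the paper's exact definitions may differ. The function name `pair_counting_scores` and the toy labels are hypothetical and do not come from the paper.

```python
from itertools import combinations
from math import sqrt


def pair_counting_scores(true_labels, pred_labels):
    """Pair-counting clustering scores (assumed definitions, not the paper's code).

    A pair of documents is a true positive if both the reference labels and the
    predicted clustering put the two documents in the same group.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")

    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1  # together in both
        elif same_pred:
            fp += 1  # together only in the prediction
        elif same_true:
            fn += 1  # together only in the reference

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    jc = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    fmi = sqrt(precision * recall)  # geometric mean of precision and recall
    return {"JC": jc, "FMI": fmi, "precision": precision,
            "recall": recall, "F1": f1}


if __name__ == "__main__":
    # Toy example: six articles, two reference topics, one article misplaced.
    reference = [0, 0, 0, 1, 1, 1]
    predicted = [0, 0, 1, 1, 1, 1]
    for name, value in pair_counting_scores(reference, predicted).items():
        print(f"{name}: {value:.3f}")
```

Because the scores depend only on whether pairs are grouped together, they are unaffected by how cluster IDs are numbered, which is why they suit unsupervised topic clustering where predicted IDs carry no meaning.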
- Subjects :
- Topic model
General Computer Science
Computer science
02 engineering and technology
seq2seq
010501 environmental sciences
computer.software_genre
01 natural sciences
Task (project management)
Similarity (psychology)
0202 electrical engineering, electronic engineering, information engineering
General Materials Science
Cluster analysis
0105 earth and related environmental sciences
Thesaurus (information retrieval)
business.industry
General Engineering
020207 software engineering
topic summarization
Automatic summarization
Language model
Artificial intelligence
lcsh:Electrical engineering. Electronics. Nuclear engineering
business
computer
lcsh:TK1-9971
Natural language processing
Details
- Language :
- English
- ISSN :
- 2169-3536
- Volume :
- 7
- Database :
- OpenAIRE
- Journal :
- IEEE Access
- Accession number :
- edsair.doi.dedup.....c17aac2e9484635be5954beef96407c4