Unsupervised-learning-based keyphrase extraction from a single document by the effective combination of the graph-based model and the modified C-value method.

Authors :
Yeom, Hongseon
Ko, Youngjoong
Seo, Jungyun
Source :
Computer Speech & Language. Nov 2019, Vol. 58, pp. 304-318. 15 pp.
Publication Year :
2019

Abstract

Keyphrases represent the main topic of a document and serve as a simple way to represent the document itself. Among unsupervised approaches, statistical and graph-based models have been studied most. Statistical models have difficulty extracting keyphrases from a single document because most of them require statistical information from a large corpus. Graph-based models, on the other hand, can extract keyphrases using only the information in a single document; nevertheless, they have drawbacks of their own. Edge scores can be biased because a single document does not contain sufficient information to score the edges of a graph, and this bias in turn influences the node scores. In this paper, we propose a method that effectively combines a statistical model, the C-value method, with a graph-based model to overcome the drawbacks of each. The graph-based model yields a new scoring method for keyphrase candidates, and the scores it produces are applied to the modified C-value method to estimate the final importance score of each candidate. The proposed model is evaluated on two datasets, SemEval 2010 and Inspec, and it outperforms both the state-of-the-art unsupervised model and the existing graph-based ranking models. [ABSTRACT FROM AUTHOR]
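
For background only: the abstract does not state the authors' modified C-value formula or how the graph-based scores enter it, so the sketch below shows the standard C-value measure as it is commonly defined, not the paper's method. It assumes candidates are given as token tuples with raw frequencies; the function names `is_nested` and `c_value` and the sample counts are hypothetical. A candidate that appears inside longer candidates has the average frequency of those longer candidates subtracted from its own frequency, and the result is weighted by the base-2 logarithm of its length in words.

```python
import math


def is_nested(short, long):
    """Return True if `short` occurs as a contiguous token run inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_value(freq):
    """Standard C-value for each candidate phrase.

    freq maps a candidate (tuple of tokens) to its frequency in the document.
    Note: with log2(length), single-word candidates always score 0.
    """
    scores = {}
    for a, f_a in freq.items():
        # Longer candidates that contain `a` as a nested phrase.
        containers = [b for b in freq if len(b) > len(a) and is_nested(a, b)]
        if containers:
            # Average frequency of the longer candidates containing `a`.
            penalty = sum(freq[b] for b in containers) / len(containers)
        else:
            penalty = 0.0
        scores[a] = math.log2(len(a)) * (f_a - penalty)
    return scores


# Hypothetical counts, for illustration only.
sample = {
    ("keyphrase", "extraction"): 5,
    ("unsupervised", "keyphrase", "extraction"): 2,
    ("graph", "model"): 3,
}
scores = c_value(sample)
```

In this sample, "keyphrase extraction" occurs five times but twice as part of the longer candidate, so its frequency is discounted before the length weight is applied. In the approach the abstract describes, scores from the graph-based model are applied to a modified form of this measure; the details of that modification are in the full text.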

Details

Language :
English
ISSN :
0885-2308
Volume :
58
Database :
Academic Search Index
Journal :
Computer Speech & Language
Publication Type :
Academic Journal
Accession number :
137662632
Full Text :
https://doi.org/10.1016/j.csl.2019.04.008