Back to Search Start Over

Multifeature Fusion Keyword Extraction Algorithm Based on TextRank

Authors :
Wenming Guo
Zihao Wang
Fang Han
Source :
IEEE Access, Vol 10, Pp 71805-71813 (2022)
Publication Year :
2022
Publisher :
IEEE, 2022.

Abstract

Keyword extraction is the predecessor of many tasks, and its results directly affect search, recommendation, classification, and other tasks. In this study, we take Chinese text as the research object and propose a multi-feature fusion keyword extraction algorithm combined with BERT semantics and K-Truss graph(BSKT). The BSKT algorithm is based on the TextRank algorithm, which combines BERT semantic features, K-Truss features, and other features. First, the BSKT algorithm obtains the word vectors from the BERT pretraining model to calculate the semantic difference, which is used to optimize the iterative process of the TextRank word graph. Then, the BSKT algorithm obtains its K-Truss graph by decomposing the TextRank word graph and obtains the truss level feature of the word. Finally, by combining the word IDF and truss level features, the BSKT algorithm scores the words to extract keywords. Experimental results show that the BSKT algorithm achieves better performance than the latest keyword extraction algorithm SCTR in the task of extracting 1–10 keywords. Furthermore, the increment in F1 increased by 11.2% when the BSKT algorithm was used to extract three keywords from the Sensor dataset.

Details

Language :
English
ISSN :
21693536
Volume :
10
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.2e003bd00081478c8e2909d1308f7145
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2022.3188861