Keyword extraction as sequence labeling with classification algorithms.

Authors :: Kılıç Ünlü, Hüma
Çetin, Aydın
Source :: Neural Computing & Applications. Feb2023, Vol. 35 Issue 4, p3413-3422. 10p.
Publication Year :: 2023
Abstract: Keyword extraction is one of the main problems in clustering and linking textual content. In the literature, several machine learning approaches have been proposed for keyword and keyphrase extraction. However, state-of-the-art performance is still below expectations. In this paper, we propose a novel hybrid keyword extraction model, HybridKEM. The proposed model addresses the keyword extraction problem as a sequence labelling task. Naive Bayes (NB), Polynomial Regression (PR), Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and Random Forest (RF) classification algorithms were trained separately in the Token Classification module of the model. Token classification was performed using text, graphic, embedding, and set features. The performance of the model was evaluated on the Inspec, SemEval-2017, and 500N-KPCrowd datasets, which are widely used in the literature, and on two newly collected datasets, TRDizinEn and DergiParkEn. The model achieved an average F1-score of 0.664 across all datasets. The highest F1-score (0.74) was obtained on the TRDizinEn dataset. [ABSTRACT FROM AUTHOR]
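
As a rough illustration of what treating keyword extraction as sequence labelling can look like, the sketch below gives each token a binary label, decodes labeled runs back into phrases, and scores them with F1. It is not the HybridKEM implementation: the binary label scheme, the exact-match scoring, and the helper names `label_tokens`, `decode_keyphrases`, and `f1_score` are all assumptions made for this example, and the sample sentence is invented.

```python
from typing import List, Sequence, Set


def label_tokens(tokens: Sequence[str], keyphrases: Sequence[str]) -> List[int]:
    """Label a token 1 if it lies inside any gold keyphrase, else 0.

    Assumption: a simple binary scheme; the paper's actual label set
    is not given in the abstract.
    """
    lowered = [t.lower() for t in tokens]
    labels = [0] * len(tokens)
    for phrase in keyphrases:
        words = phrase.lower().split()
        n = len(words)
        if n == 0:
            continue
        # Slide a window over the tokens and mark every exact match.
        for start in range(len(lowered) - n + 1):
            if lowered[start:start + n] == words:
                for i in range(start, start + n):
                    labels[i] = 1
    return labels


def decode_keyphrases(tokens: Sequence[str], labels: Sequence[int]) -> Set[str]:
    """Join each maximal run of tokens labeled 1 into one lowercase phrase."""
    phrases: Set[str] = set()
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == 1:
            current.append(token.lower())
        elif current:
            phrases.add(" ".join(current))
            current = []
    if current:
        phrases.add(" ".join(current))
    return phrases


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """Exact-match F1 between predicted and gold keyphrase sets."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented example sentence and gold keyphrases, for illustration only.
tokens = "Keyword extraction is treated as a sequence labelling task".split()
gold = ["keyword extraction", "sequence labelling"]
labels = label_tokens(tokens, gold)
decoded = decode_keyphrases(tokens, labels)
```

In a trained system the labels would come from a classifier such as those named in the abstract rather than from the gold keyphrases; here the gold labels are decoded only to show that labeling and decoding round-trip.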