On Block Classification for Automatic Content Extraction From Chinese Resumes

Authors :: Li-Hui Zhao
Ji Zhang
Qiqiang Xu
Linlin Hou
Xin Wang
Source :: IEEE Access, Vol 12, Pp 181808-181822 (2024)
Publication Year :: 2024
Publisher :: IEEE, 2024.
Abstract: Resume information extraction technologies are crucial for automating the shortlisting and evaluation of resumes, benefiting both enterprises and job seekers. Resume block classification plays a pivotal role in resume information extraction, as it significantly affects the accuracy of the subsequent intra-block information extraction. However, existing research typically classifies resume blocks with traditional, general-purpose text classification models, which fail to consider the intrinsic contextual order relationships among resume blocks. Moreover, few studies consider the transferability of classification models. To address these limitations, we propose a series of methods to enhance the performance of resume block classification. Our approach focuses on three key aspects. Firstly, we introduce a novel sequence encoder that effectively extracts the sequence features among blocks of the same resume, leveraging the intrinsic contextual order relationships to improve classification accuracy. Secondly, considering the large classification variance of existing models across different scenarios, we enhance the sequence encoder with a feature fusion strategy, which combines multiple features to improve the model’s robustness and transferability. Thirdly, from the perspective of ensemble learning, we propose a dynamic weighted hybrid model that dynamically generates a weight for each participating sub-model, enabling adaptive integration of different classification models. Finally, to alleviate the heavy workload of cross-domain relabeling, we develop a transfer learning model specifically designed for the resume block classification task, facilitating the application of our approach across different domains.
To evaluate the effectiveness of our proposed methods, we release three Chinese datasets containing 4,500 resumes at https://github.com/xqqhelloword/resume-block-classification. Experimental results show that our hybrid model achieves 97.6% accuracy, outperforming existing methods and establishing a new state of the art in this field.