An end-to-end framework for information extraction from Italian resumes

Authors :
Alessandro Barducci
Simone Iannaccone
Valerio La Gatta
Vincenzo Moscato
Giancarlo Sperlì
Sergio Zavota
Barducci, A.
Iannaccone, S.
La Gatta, V.
Moscato, V.
Sperlì, G.
Zavota, S.
Source :
Expert Systems with Applications. 210:118487
Publication Year :
2022
Publisher :
Elsevier BV, 2022.

Abstract

Recruitment processes are increasingly automated by intelligent systems that match the best candidates to companies' open positions, and vice versa. However, extracting information from the unstructured documents involved in these processes (e.g. resumes and job descriptions) remains an open challenge because of their high heterogeneity in form and style and the lack of predefined standards across companies and countries. In this paper, we address the resume information extraction problem, focusing on documents from the Italian labor market. We propose an effective and efficient end-to-end framework that provides a complete overview of a candidate, including personal information, skills and work experience. After extracting the raw data from the resume documents, the system segments them into semantically consistent parts using linguistic patterns. Each segment is then processed with a NER algorithm, based on pre-trained language models, to extract the relevant information that an HR specialist could consult to assess a candidate's suitability for a job offer. We collected and labeled a new Italian resume dataset, and our results demonstrate the effectiveness of the proposed method, in particular the advantage our segmentation strategy brings to NER performance compared with standard line-based segmentation approaches. In addition, our system achieves promising performance when combined with modern NLP models.
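
The segmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the linguistic patterns used, so the header regular expressions, the section labels and the function names (`match_header`, `segment_resume`) below are all hypothetical. The sketch only shows the general idea of grouping lines under section headers instead of treating each line in isolation.

```python
import re

# Hypothetical header patterns for common Italian resume sections.
# These are assumptions for illustration; the paper's actual patterns
# are not given in the abstract.
SECTION_PATTERNS = {
    "personal": re.compile(r"^(dati personali|informazioni personali)\b", re.IGNORECASE),
    "experience": re.compile(r"^esperienz[ae] (lavorativ[ae]|professional[ei])\b", re.IGNORECASE),
    "education": re.compile(r"^(istruzione|formazione)\b", re.IGNORECASE),
    "skills": re.compile(r"^(competenze|capacità)\b", re.IGNORECASE),
}


def match_header(line):
    """Return the section label if the line looks like a section header, else None."""
    stripped = line.strip()
    for label, pattern in SECTION_PATTERNS.items():
        if pattern.match(stripped):
            return label
    return None


def segment_resume(raw_text):
    """Group non-empty lines into (label, text) segments delimited by headers.

    Lines before the first recognised header are collected under "other".
    """
    segments = []
    label, lines = "other", []
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        header = match_header(line)
        if header is not None:
            if lines:
                segments.append((label, "\n".join(lines)))
            label, lines = header, []
        else:
            lines.append(line.strip())
    if lines:
        segments.append((label, "\n".join(lines)))
    return segments


# Made-up sample resume text, for illustration only.
sample = """Mario Rossi
mario.rossi@example.com

Esperienze lavorative
2019-2022 Sviluppatore software, Acme S.r.l.

Istruzione e formazione
Laurea in Informatica

Competenze
Python, SQL
"""

segments = segment_resume(sample)
for section, text in segments:
    print(section, "->", text.replace("\n", " | "))
```

In a pipeline of the kind the abstract outlines, each resulting segment, rather than each individual line, would then be passed to a NER model, so that the model sees an entity together with its surrounding section context.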

Details

ISSN :
09574174
Volume :
210
Database :
OpenAIRE
Journal :
Expert Systems with Applications
Accession number :
edsair.doi.dedup.....28a751cd1bd43a4a6f739f85c1ec14e4