Developing a named entity recognition model for text documents in Russian to detect personal data using machine learning methods.
- Source :
- Procedia Computer Science; 2022, Vol. 213, p127-135, 9p
- Publication Year :
- 2022
Abstract
- This paper outlines a study of various approaches to detecting and removing personal data from text documents in Russian. To solve the problem, regular expressions, various supervised learning models, artificial neural networks, and Markov models were used. A labeled dataset with named entities was used to train and validate the models, and suitable preprocessing was performed on the data for each approach. Model performance was measured with three metrics: precision, recall, and F1-score. The best result was obtained using a bidirectional LSTM with a conditional random field layer (BiLSTM-CRF). Options for integrating the developed tool into a software solution, as well as ways to further improve the quality of the models, are proposed. The software tool can be used by government agencies, judicial institutions, or legal entities. [ABSTRACT FROM AUTHOR]
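The three metrics named in the abstract can be illustrated with a minimal sketch. It assumes entity-level exact matching, where each entity is a hypothetical `(start, end, label)` tuple; the abstract does not say whether the authors scored at the token or entity level, and the function name, labels, and sample data below are illustrative only, not taken from the paper.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level precision, recall, and F1 under exact span-and-label matching.

    Assumption: each entity is a (start, end, label) tuple. This layout is
    hypothetical and is not described in the abstract.
    """
    gold_set = set(gold)
    pred_set = set(predicted)
    true_positives = len(gold_set & pred_set)

    # Precision: share of predicted entities that are correct.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    # Recall: share of gold entities that were found.
    recall = true_positives / len(gold_set) if gold_set else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example data: three gold entities, four predictions, two exact matches.
gold = [(0, 2, "PER"), (5, 6, "LOC"), (9, 10, "PHONE")]
predicted = [(0, 2, "PER"), (5, 6, "ORG"), (9, 10, "PHONE"), (12, 13, "PER")]

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

For personal data removal, recall is often the metric to watch most closely, since a missed entity leaves personal data in the document, while a false positive only removes harmless text.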
Details
- Language :
- English
- ISSN :
- 18770509
- Volume :
- 213
- Database :
- Supplemental Index
- Journal :
- Procedia Computer Science
- Publication Type :
- Academic Journal
- Accession number :
- 160438697
- Full Text :
- https://doi.org/10.1016/j.procs.2022.11.047