Candidate word generation for OCR errors using optimization algorithm.

Authors :
Pham, D. T.
Nguyen, D. Q.
Le, A. D.
Phan, M. N.
Kromer, P.
Source :
AIP Conference Proceedings. 2021, Vol. 2406 Issue 1, p1-8. 8p.
Publication Year :
2021

Abstract

OCR post-processing is an important step for improving the accuracy of OCR text. It comprises two main tasks: error detection and error correction. The hill climbing algorithm is a heuristic search method for solving optimization problems. In this paper, we present a novel OCR error correction approach that uses an adapted version of the hill climbing algorithm. Correction candidates for OCR errors are explored through random character edits and evolved with hill climbing. The character edit patterns are obtained from the training data. The proposed model is evaluated on the benchmark dataset of the OCR post-correction competition at the International Conference on Document Analysis and Recognition 2017. Our model outperforms various baseline approaches in the competition. In addition, the randomness of the proposed algorithm is analyzed to verify its stability under different parameter configurations. [ABSTRACT FROM AUTHOR]
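
The candidate search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the fitness function, the form of the edit patterns, or any parameters, so the names EDIT_PATTERNS, LEXICON, fitness, random_edit and hill_climb, the toy pattern list, and the lexicon-similarity scoring are all assumptions made for illustration only.

```python
import random
from difflib import SequenceMatcher

# Hypothetical edit patterns (OCR output -> intended text). The paper learns
# its patterns from training data; this hand-written list is an assumption.
EDIT_PATTERNS = [("rn", "m"), ("1", "l"), ("0", "o"), ("c", "e"), ("vv", "w")]

# Hypothetical toy lexicon, used only to score candidates in this sketch.
LEXICON = {"modern", "hello", "world", "letter"}


def fitness(word):
    """Assumed scoring: best similarity ratio between word and any lexicon entry."""
    return max(SequenceMatcher(None, word, entry).ratio() for entry in LEXICON)


def random_edit(word, rng):
    """Apply one randomly chosen applicable edit pattern at one position."""
    options = []
    for src, dst in EDIT_PATTERNS:
        start = word.find(src)
        while start != -1:
            options.append((start, src, dst))
            start = word.find(src, start + 1)
    if not options:
        return word  # no pattern applies, so the word is left unchanged
    start, src, dst = rng.choice(options)
    return word[:start] + dst + word[start + len(src):]


def hill_climb(error_word, steps=200, seed=0):
    """Keep a random edit only when it strictly improves the fitness score."""
    rng = random.Random(seed)
    current, best = error_word, fitness(error_word)
    for _ in range(steps):
        candidate = random_edit(current, rng)
        score = fitness(candidate)
        if score > best:
            current, best = candidate, score
    return current, best


if __name__ == "__main__":
    print(hill_climb("he11o"))
```

Accepting only strictly improving edits is the plain hill climbing rule; the adapted version used in the paper may differ, for example in how candidates are ranked or how many are kept.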

Details

Language :
English
ISSN :
0094243X
Volume :
2406
Issue :
1
Database :
Academic Search Index
Journal :
AIP Conference Proceedings
Publication Type :
Conference
Accession number :
152533178
Full Text :
https://doi.org/10.1063/5.0066687