Segmentation of low-quality typewritten digits

Authors :
J. Perez
Javier Muguerza
C. Rodriguez
Marisa Navarro
José I. Martín
A. Zarate
Source :
ICPR
Publication Year :
2002
Publisher :
IEEE Comput. Soc, 2002.

Abstract

This work addresses the segmentation of numeric fields in forms whose digits are blurred, broken or touching. In an OCR system, the segmentation phase plays a decisive role in the overall accuracy of the system. Segmentation is usually approached in one of two ways: (a) as an isolated phase of the OCR process, or (b) as a phase that interacts with the recognition of the segmented item. In this work, we take the first approach and develop a robust new cost function that combines vertical projection, the Tsujimoto metric (1991) and background information. Unlike other techniques reported in the literature, ours obtains a near-optimum number of break points in fields containing broken, blurred and touching characters, leading to high accuracy in the global OCR system. Our experiments on a sample of about 11283 numeric fields in 144 forms (more than 50000 digits of that kind) show that 99.74% of the fields were correctly segmented. The new cost function made only 50 errors.
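
As an illustration of the vertical projection component mentioned in the abstract, the following is a minimal sketch and not the authors' cost function. It assumes a binary image stored as a list of rows in which 1 marks an ink pixel, and it proposes a break point at the centre of each low-ink gap between characters. The function names and the `max_ink` threshold are hypothetical; the Tsujimoto metric and the background information used in the paper are not reproduced here.

```python
def vertical_projection(image):
    """Count ink pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def candidate_break_points(projection, max_ink=0):
    """Return the centre column of each interior run of low-ink columns.

    A column is "low-ink" when its count is <= max_ink (hypothetical
    threshold). Runs touching the left or right border are ignored,
    because they are margins rather than gaps between characters.
    """
    breaks = []
    run_start = None
    for x, ink in enumerate(projection):
        if ink <= max_ink:
            if run_start is None:
                run_start = x
        else:
            if run_start is not None and run_start > 0:
                # The run covers columns run_start .. x - 1.
                breaks.append((run_start + x - 1) // 2)
            run_start = None
    return breaks


# Toy field: two strokes separated by three blank columns.
field = [
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
]
projection = vertical_projection(field)
breaks = candidate_break_points(projection)
```

With `max_ink=0` only fully blank columns qualify, so touching digits yield no break point in this sketch; that is the kind of case for which the abstract describes combining the projection with further cues.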

Details

Database :
OpenAIRE
Journal :
Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170)
Accession number :
edsair.doi...........92eaf5a7a9927417a2e2e1d1f59cddc5