
Efficient skew detection and correction in scanned document images through clustering of probabilistic Hough transforms

Authors :
Saeeda Naz
Imran Razzak
Riaz Ahmad
Source :
Pattern Recognition Letters. 152:93-99
Publication Year :
2021
Publisher :
Elsevier BV, 2021.

Abstract

Document scanning is still one of the most widely used document digitization steps; however, skew in scanned documents is inevitable. If this skew is not corrected, extracting regions of interest (RoI) and processing them further, for example for detection and classification, becomes difficult. It has been shown that skew detection and correction significantly improve the accuracy of Optical Character Recognition (OCR) systems. This paper introduces a novel, robust, and straightforward skew detection method for scanned documents. In the first step it uses the Probabilistic Hough Transformation (PHT) for line detection; in the second step it clusters the detected lines based on parallelism. The cluster with the largest number of parallel lines represents the expected skewed lines. The proposed method is tested on real scanned images taken from the Document Image Skew Estimation Contest (DISEC’13), Pashto, and Tobacco800 datasets. It performs well in terms of both accuracy and efficiency, and it is robust to noise. Furthermore, we show that it also works on Arabic and Latin scripts.
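The two-step idea in the abstract (detect line segments, then keep the largest group of parallel ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `segment_angle` and `estimate_skew`, the `bin_width` parameter, and the use of fixed-width angle bins as the clustering rule are all assumptions made for illustration. The input is assumed to be a list of `(x1, y1, x2, y2)` segments such as a probabilistic Hough transform (for example OpenCV's `cv2.HoughLinesP`) would return.

```python
import math
from collections import defaultdict


def segment_angle(x1, y1, x2, y2):
    """Orientation of a segment in degrees, folded into [-90, 90)."""
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    # A line has no direction, so angles 180 degrees apart are equivalent.
    return (angle + 90.0) % 180.0 - 90.0


def estimate_skew(segments, bin_width=1.0):
    """Return the mean angle of the largest group of near-parallel segments.

    Assumption: "parallel" is approximated by falling into the same
    fixed-width angle bin; the paper's actual clustering rule may differ.
    Bins do not wrap around at +/-90 degrees in this sketch.
    """
    clusters = defaultdict(list)
    for seg in segments:
        angle = segment_angle(*seg)
        clusters[round(angle / bin_width)].append(angle)
    if not clusters:
        return 0.0  # no segments detected, so assume no skew
    largest = max(clusters.values(), key=len)
    return sum(largest) / len(largest)
```

The returned angle is measured in image coordinates, where the y axis usually points down, so the sign convention for the correcting rotation depends on the imaging library used.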

Details

ISSN :
0167-8655
Volume :
152
Database :
OpenAIRE
Journal :
Pattern Recognition Letters
Accession number :
edsair.doi...........276bfc1dd024b23ea348da8a0d3ac332