
Application of texture-based features for text non-text classification in printed document images with novel feature selection algorithm

Authors :
Ankur Manna
S. K. Khalid Hassan
Showmik Bhowmik
Ram Sarkar
Ali Hussain Khan
Soulib Ghosh
Source :
Soft Computing. 26:891-909
Publication Year :
2021
Publisher :
Springer Science and Business Media LLC, 2021.

Abstract

Text non-text separation is one of the most essential pre-processing steps for any optical character recognition (OCR) system. Because an OCR engine can only process text, the non-text content of an input document image must be suppressed at the initial stage. Building a complete OCR system therefore requires an efficient text non-text separation module. To this end, we propose a texture-based feature descriptor followed by a novel feature selection technique for region-based text non-text classification. First, we incorporate the rotation invariant property into the local ternary pattern to form a new texture-based feature descriptor, the rotation invariant local ternary pattern (RILTP). Next, we propose a novel feature selection technique, which is a modified version of binary particle swarm optimization (BPSO). To evaluate the proposed text non-text classification method, we first constructed a database of 690 images of text and non-text regions extracted from 70 pages of the RDCL 2015 and 75 pages of the RDCL 2017 page segmentation competition databases. In this database, each class contains 345 samples. The proposed texture-based feature descriptor obtains an accuracy of 97.09% on this database. After applying the modified BPSO, the feature dimension is reduced by approximately 55% while the accuracy rises to 97.5%. Furthermore, a second database is created from Media team document pages to validate the robustness of the method. This second database comprises 100 text and 100 non-text images. The method achieves 96.28% accuracy when it is trained on the first database and tested on the second. The comparative study demonstrates the robustness and strength of the proposed method, as it outperforms many state-of-the-art texture-based features. The proposed feature selection method is also compared with various standard feature selection methods, and it outperforms all of the methods considered for comparison.

Details

ISSN :
1433-7479 and 1432-7643
Volume :
26
Database :
OpenAIRE
Journal :
Soft Computing
Accession number :
edsair.doi...........7b5550fe41482ce4a948c501e17cd6a6