Application of texture-based features for text non-text classification in printed document images with novel feature selection algorithm
- Source :
- Soft Computing. 26:891-909
- Publication Year :
- 2021
- Publisher :
- Springer Science and Business Media LLC, 2021.
Abstract
- Text non-text separation is one of the most essential pre-processing steps for any optical character recognition (OCR) system. Because an OCR engine can only process text, the non-text regions present in an input document image must be suppressed at the initial stage. Therefore, building a complete OCR system requires an efficient text non-text separation module. To this end, we propose a texture-based feature descriptor followed by a novel feature selection technique for region-based text non-text classification. First, we incorporate rotation invariance into the local ternary pattern to form a new texture-based feature descriptor, the rotation invariant local ternary pattern (RILTP). Next, we propose a novel feature selection technique, which is a modified version of binary particle swarm optimization (BPSO). To evaluate the proposed method, we first constructed a database of 690 images of text and non-text regions extracted from 70 pages of the RDCL 2015 and 75 pages of the RDCL 2017 page segmentation competition databases. Each class in this database contains 345 samples. The proposed descriptor obtains an accuracy of 97.09% on this database. After applying BPSO, the feature dimension is reduced by approximately 55% while the accuracy rises to 97.5%. To validate the robustness of the method, a second database comprising 100 text and 100 non-text images was created from MediaTeam document pages. The method achieves 96.28% accuracy when trained on the first database and tested on the second. The comparative study demonstrates the robustness and strength of the proposed method, as it outperforms many state-of-the-art texture-based features. In addition, the proposed feature selection method is compared with various standard feature selection methods and is observed to outperform all of those considered.
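The abstract names the descriptor but does not spell out its construction. As a rough illustration only, the sketch below shows one common way to combine a local ternary pattern with rotation invariance: each 8-neighbour ternary code is split into an "upper" and a "lower" binary pattern, and each pattern is mapped to the minimum of its circular bit rotations before histogramming. The function names, the threshold value, the 3x3 neighbourhood, and the 72-bin layout are all assumptions made for this example and may differ from the paper's actual RILTP.

```python
# Illustrative sketch only: a rotation-invariant local ternary pattern
# histogram built from standard LTP and min-rotation mapping. This is an
# assumed construction, not necessarily the RILTP defined in the paper.

def min_rotation(code, bits=8):
    """Return the smallest value among all circular bit rotations of code."""
    mask = (1 << bits) - 1
    best = code
    for _ in range(bits - 1):
        code = ((code >> 1) | ((code & 1) << (bits - 1))) & mask
        best = min(best, code)
    return best

# Lookup table: 8-bit pattern -> rotation-invariant representative.
ROT_MIN = [min_rotation(c) for c in range(256)]
# Distinct representatives; there are 36 of them for 8-bit patterns.
RI_CLASSES = sorted(set(ROT_MIN))

# The 8 neighbours of a pixel, listed in circular order as (dy, dx).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def riltp_histogram(image, threshold=5):
    """Normalized 72-bin histogram (36 upper + 36 lower pattern classes).

    image is a 2-D list of grey levels; threshold (assumed positive) is the
    LTP tolerance around the centre pixel. Border pixels are skipped.
    """
    h, w = len(image), len(image[0])
    upper_counts = dict.fromkeys(RI_CLASSES, 0)
    lower_counts = dict.fromkeys(RI_CLASSES, 0)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            centre = image[y][x]
            upper = lower = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                p = image[y + dy][x + dx]
                if p >= centre + threshold:      # ternary value +1
                    upper |= 1 << bit
                elif p <= centre - threshold:    # ternary value -1
                    lower |= 1 << bit
            upper_counts[ROT_MIN[upper]] += 1
            lower_counts[ROT_MIN[lower]] += 1
    total = 2 * max((h - 2) * (w - 2), 1)
    return ([upper_counts[k] / total for k in RI_CLASSES] +
            [lower_counts[k] / total for k in RI_CLASSES])
```

Under these assumptions, rotating a region by 90 degrees only shifts each pattern circularly, so the resulting histogram is unchanged. A feature selection step such as the BPSO variant mentioned in the abstract would then operate on a vector of this kind.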
- Subjects :
- Computer science
Computational intelligence
Feature selection
Pattern recognition
Optical character recognition
Theoretical Computer Science
Feature Dimension
Robustness (computer science)
Segmentation
Geometry and Topology
Artificial intelligence
Invariant (mathematics)
Rotation (mathematics)
Software
Details
- ISSN :
- 1433-7479 and 1432-7643
- Volume :
- 26
- Database :
- OpenAIRE
- Journal :
- Soft Computing
- Accession number :
- edsair.doi...........7b5550fe41482ce4a948c501e17cd6a6