LineSeg: line segmentation of scanned newspaper documents.
- Source :
- Pattern Analysis & Applications; Feb 2022, Vol. 25, Issue 1, pp. 189-208, 20 pp.
- Publication Year :
- 2022
- Abstract :
- Segmentation is a significant stage in the recognition of old newspapers. Text-line extraction from documents with very complex layouts, such as newspaper pages, poses a significant challenge. Old newspaper documents printed in Gurumukhi script present several hurdles to segmentation: noise, degradation, ink bleed-through, multiple font styles and sizes, little space between neighboring text lines, overlapping lines, and so on. Because of the low quality and complexity of these documents, automatic text-line segmentation remains an open research problem. Very few studies in the literature address the segmentation of news articles in Gurumukhi script, and this is one of the first attempts to recognize Gurumukhi newspaper text. The goal of this paper is to present a new methodology for text-line extraction that integrates median calculation and strip-height calculation techniques. The unsuitability of existing techniques for segmenting newspaper text lines is also discussed, with results, in the article. The efficiency of the proposed algorithm is demonstrated by experiments on two distinct datasets created by the authors: (a) single-column documents with a headline block and (b) multi-column documents with a headline block. [ABSTRACT FROM AUTHOR]
- Subjects :
- NEWSPAPERS
HEADLINES
SCRIPTS
NOISE
MARKOV random fields
ALGORITHMS
Details
- Language :
- English
- ISSN :
- 1433-7541
- Volume :
- 25
- Issue :
- 1
- Database :
- Complementary Index
- Journal :
- Pattern Analysis & Applications
- Publication Type :
- Academic Journal
- Accession number :
- 155064083
- Full Text :
- https://doi.org/10.1007/s10044-021-01031-6