A two-step framework for text line segmentation in historical Arabic and Latin document images.

Authors :: Mechi, Olfa
Mehri, Maroua
Ingold, Rolf
Essoukri Ben Amara, Najoua
Source :: International Journal on Document Analysis & Recognition; Sep2021, Vol. 24 Issue 3, p197-218, 22p
Publication Year :: 2021
Abstract: One of the most important preliminary tasks in a transcription system for historical document images is text line segmentation. Nevertheless, this task remains complex due to the idiosyncrasies of ancient document images. In this article, we present a complete framework for text line segmentation in historical Arabic or Latin document images. A two-step procedure is described. First, a deep fully convolutional network (FCN) architecture is applied to extract the main area covering the text core. To select the highest-performing FCN architecture, a thorough performance benchmark of the most recent and widely used FCN architectures for segmenting text lines in historical Arabic or Latin document images has been conducted. Then, a post-processing step based on topological structure analysis is introduced to extract complete text lines (including the ascender and descender components). This second step aims to refine the obtained FCN results and to provide sufficient information for text recognition. Our experiments have been carried out using a large number of Arabic and Latin document images collected from the Tunisian national archives as well as other benchmark datasets. Quantitative and qualitative assessments are reported, first to pinpoint the strengths and weaknesses of the different FCN architectures and second to illustrate the effectiveness of the proposed post-processing method. [ABSTRACT FROM AUTHOR]

Subjects :: IMAGE segmentation
TRANSCRIPTION (Linguistics)
CONVOLUTIONAL neural networks
HISTORICAL source material
INFORMATION & communication technologies

Details

Language :: English
ISSN :: 14332833
Volume :: 24
Issue :: 3
Database :: Complementary Index
Journal :: International Journal on Document Analysis & Recognition
Publication Type :: Academic Journal
Accession number :: 152027655
Full Text :: https://doi.org/10.1007/s10032-021-00377-1