
A two-step framework for text line segmentation in historical Arabic and Latin document images.

Authors :
Mechi, Olfa
Mehri, Maroua
Ingold, Rolf
Essoukri Ben Amara, Najoua
Source :
International Journal on Document Analysis & Recognition; Sep2021, Vol. 24 Issue 3, p197-218, 22p
Publication Year :
2021

Abstract

One of the most important preliminary tasks in a transcription system for historical document images is text line segmentation. Nevertheless, this task remains complex due to the idiosyncrasies of ancient document images. In this article, we present a complete framework for text line segmentation in historical Arabic or Latin document images, based on a two-step procedure. First, a deep fully convolutional network (FCN) architecture is applied to extract the main area covering the text core. To select the highest-performing FCN architecture, a thorough benchmark of the most recent and widely used FCN architectures for segmenting text lines in historical Arabic or Latin document images has been conducted. Then, a post-processing step based on topological structure analysis is introduced to extract complete text lines (including the ascender and descender components). This second step aims to refine the FCN results and to provide sufficient information for text recognition. Our experiments have been carried out on a large number of Arabic and Latin document images collected from the Tunisian national archives, as well as on other benchmark datasets. Quantitative and qualitative assessments are reported, first to pinpoint the strengths and weaknesses of the different FCN architectures, and second to illustrate the effectiveness of the proposed post-processing method. [ABSTRACT FROM AUTHOR]
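The abstract names the second step only as "topological structure analysis", so the sketch below is not the authors' algorithm. It is a simplified stand-in that assumes the FCN step yields a binary text-core mask and that a binarized ink mask of the page is available. Each connected ink component (for example an ascender, a descender, or a detached diacritic) is attached to the core region it overlaps most, or to the vertically nearest core when it touches none. The function names `label_components` and `assign_ink_to_lines`, and the nearest-core fallback rule, are hypothetical illustrations.

```python
from collections import deque


def label_components(mask):
    """Label 8-connected foreground regions of a binary mask (list of lists of 0/1).

    Returns (labels, count): labels[y][x] is 0 for background, else 1..count.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:  # breadth-first flood fill over the 8 neighbours
                    cy, cx = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count


def assign_ink_to_lines(core_mask, ink_mask):
    """Hypothetical post-processing: grow text-core regions into full text lines.

    Each ink component goes to the core it overlaps most; components touching
    no core go to the core whose mean row is closest. Returns a grid of line
    ids (0 = background).
    """
    core_labels, n_cores = label_components(core_mask)
    ink_labels, n_ink = label_components(ink_mask)
    h, w = len(ink_mask), len(ink_mask[0])

    row_sum = [0] * (n_cores + 1)            # sum of row indices per core
    px_count = [0] * (n_cores + 1)           # pixel count per core
    overlap = [dict() for _ in range(n_ink + 1)]  # ink comp -> {core: shared pixels}
    comp_rows = [[] for _ in range(n_ink + 1)]    # row indices of each ink comp
    for y in range(h):
        for x in range(w):
            c = core_labels[y][x]
            if c:
                row_sum[c] += y
                px_count[c] += 1
            i = ink_labels[y][x]
            if i:
                comp_rows[i].append(y)
                if c:
                    overlap[i][c] = overlap[i].get(c, 0) + 1
    core_center = [row_sum[c] / px_count[c] if px_count[c] else 0.0 for c in range(n_cores + 1)]

    line_of_comp = [0] * (n_ink + 1)
    for i in range(1, n_ink + 1):
        if overlap[i]:
            line_of_comp[i] = max(overlap[i], key=overlap[i].get)
        elif n_cores:
            center = sum(comp_rows[i]) / len(comp_rows[i])
            line_of_comp[i] = min(range(1, n_cores + 1), key=lambda c: abs(core_center[c] - center))

    return [[line_of_comp[ink_labels[y][x]] for x in range(w)] for y in range(h)]


# Toy page: two text lines, one with an ascender and a detached dot, one with a descender.
core = [[0] * 8 for _ in range(10)]
ink = [[0] * 8 for _ in range(10)]
for x in range(1, 7):
    core[2][x] = ink[2][x] = 1  # text core of line 1
    core[7][x] = ink[7][x] = 1  # text core of line 2
ink[0][2] = ink[1][2] = 1       # ascender attached to line 1
ink[0][5] = 1                   # detached dot above line 1
ink[8][4] = ink[9][4] = 1       # descender attached to line 2
lines = assign_ink_to_lines(core, ink)
print(lines[0][2], lines[0][5], lines[9][4])  # → 1 1 2
```

The overlap-then-nearest-core rule is deliberately simple: it recovers strokes that extend outside the predicted core band, but a real system would also need to handle touching lines and components shared between two cores.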

Details

Language :
English
ISSN :
14332833
Volume :
24
Issue :
3
Database :
Complementary Index
Journal :
International Journal on Document Analysis & Recognition
Publication Type :
Academic Journal
Accession number :
152027655
Full Text :
https://doi.org/10.1007/s10032-021-00377-1