
Information Extraction from Arabic and Latin scanned invoices

Authors :
Mohamed Benjlaiel
Maroua Tounsi
Adel M. Alimi
Najoua Rahal
Source :
ASAR
Publication Year :
2018
Publisher :
IEEE, 2018.

Abstract

Extracting relevant entities from scanned document images is a very challenging task because of highly heterogeneous templates and varied structural layouts. These problems lead to inaccurate OCR recognition of document images. In this paper, we propose an effective solution to these problems, in which the relevant entities are extracted from Arabic and Latin scanned invoices. The input of the system is an invoice image, which is submitted to an OCR engine without layout analysis. Next, the invoice text recognized by the OCR is labeled. By combining the logical and physical structures, a local graph model is built for entity extraction. Finally, we implement a correction module that corrects mislabeling by eliminating the superfluous parts detected in the labeling step. We evaluate the results on 1050 real invoices, as reported in the experimental section.
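The abstract does not specify the labeling rules, how the local graph is built, or what the correction module removes. The sketch below is a hypothetical illustration of how a pipeline of this shape could fit together, not the authors' method. It assumes the OCR returns words with bounding boxes. The keyword lexicon, the nearest-neighbour edge from a keyword to its value (left for Arabic, right for Latin, else directly below), and the regex-based trimming of superfluous characters are all illustrative assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    """One OCR word with its bounding box (assumed OCR output format)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


# Hypothetical lexicon: keywords (Latin and Arabic) mapped to entity types.
KEYWORDS = {"total": "TOTAL", "المجموع": "TOTAL", "date": "DATE", "التاريخ": "DATE"}

# Hypothetical correction patterns: keep only the part of the value that fits the entity.
PATTERNS = {
    "TOTAL": re.compile(r"\d+(?:[.,]\d+)?"),
    "DATE": re.compile(r"\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}"),
}

ARABIC = re.compile(r"[\u0600-\u06FF]")


def label_tokens(tokens):
    """Labeling step (assumed): tag tokens whose text matches a known keyword."""
    return {i: KEYWORDS[t.text.strip(":").lower()]
            for i, t in enumerate(tokens)
            if t.text.strip(":").lower() in KEYWORDS}


def value_neighbour(tokens, i, labels):
    """Local graph edge (assumed): nearest unlabeled token on the same line in
    reading direction (leftwards for Arabic keywords), else nearest token below."""
    k = tokens[i]
    rtl = bool(ARABIC.search(k.text))
    same_line = [j for j, t in enumerate(tokens)
                 if j not in labels and abs(t.y - k.y) < k.h / 2
                 and ((t.x < k.x) if rtl else (t.x > k.x))]
    if same_line:
        return min(same_line, key=lambda j: abs(tokens[j].x - k.x))
    below = [j for j, t in enumerate(tokens)
             if j not in labels and t.y > k.y and abs(t.x - k.x) < k.w]
    return min(below, key=lambda j: tokens[j].y - k.y) if below else None


def correct(entity, text):
    """Correction step (assumed): drop superfluous characters around the value."""
    m = PATTERNS[entity].search(text)
    return m.group(0) if m else None


def extract(tokens):
    """Run labeling, graph linking and correction; return {entity: value}."""
    labels = label_tokens(tokens)
    result = {}
    for i, entity in labels.items():
        j = value_neighbour(tokens, i, labels)
        if j is not None:
            value = correct(entity, tokens[j].text)
            if value is not None:
                result[entity] = value
    return result


if __name__ == "__main__":
    page = [
        Token("Total:", 10, 100, 40, 10),
        Token("125.50TND", 60, 101, 50, 10),   # currency suffix is superfluous
        Token("التاريخ", 300, 20, 40, 10),      # Arabic keyword, value to its left
        Token("12/03/2018", 200, 21, 60, 10),
    ]
    print(extract(page))  # {'TOTAL': '125.50', 'DATE': '12/03/2018'}
```

Linking a keyword to a spatial neighbour is one simple way to combine logical structure (the keyword's meaning) with physical structure (token positions); a real system would need richer edges and rules to cope with the template heterogeneity described above.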

Details

Database :
OpenAIRE
Journal :
2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)
Accession number :
edsair.doi...........8a01335d42c4ebc703ad9b2147c069f1