Information Extraction from Arabic and Latin scanned invoices
- Source :
- ASAR
- Publication Year :
- 2018
- Publisher :
- IEEE, 2018.
Abstract
- Extracting relevant entities from scanned document images is a very challenging task because of highly heterogeneous templates and varied structural layouts. These problems lead to inaccurate OCR recognition of the document image. In this paper, we propose an effective solution to these problems, in which the relevant entities are extracted from Arabic and Latin scanned invoices. The input of the system is an invoice image, which is submitted to an OCR engine without layout analysis. Next, labels are assigned in the text recognized by the OCR. By combining the logical and physical structures, a local graph model is built for entity extraction. Finally, we implement a correction module that corrects mislabeling by eliminating the superfluous parts detected by the labeling step. We evaluate the obtained results on 1050 real invoices, as reported in the experimental section.
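
The stages named in the abstract (labeling of OCR text, a local graph linking labels to nearby values, and a correction step) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not give the labeling rules, the graph construction, or the correction logic, so everything below is an assumption made for illustration. The `Token` class, the `KEYWORDS` patterns, the nearest-neighbour linking rule, and the set of stripped characters are all hypothetical.

```python
import re
from dataclasses import dataclass
from math import hypot


@dataclass
class Token:
    """One OCR word with the centre of its bounding box (hypothetical format)."""
    text: str
    x: float
    y: float


# Hypothetical keyword patterns; the paper's actual label set is not given.
KEYWORDS = {
    "invoice_number": re.compile(r"^(invoice|facture)$", re.IGNORECASE),
    "total": re.compile(r"^total$", re.IGNORECASE),
}
HAS_DIGIT = re.compile(r"\d")


def label(tokens):
    """Tag each token as a keyword ("key"), a candidate value, or other."""
    labels = []
    for tok in tokens:
        kind = next((k for k, p in KEYWORDS.items() if p.match(tok.text)), None)
        if kind is not None:
            labels.append(("key", kind))
        elif HAS_DIGIT.search(tok.text):
            labels.append(("value", None))
        else:
            labels.append(("other", None))
    return labels


def extract(tokens, labels):
    """Link each keyword to its spatially nearest value token.

    Each link acts as one edge of a small local graph: the keyword is the
    logical label, the distance between boxes is the physical structure.
    """
    values = [t for t, (role, _) in zip(tokens, labels) if role == "value"]
    entities = {}
    for tok, (role, kind) in zip(tokens, labels):
        if role == "key" and values:
            nearest = min(values, key=lambda v: hypot(v.x - tok.x, v.y - tok.y))
            entities[kind] = nearest.text
    return entities


def correct(entities):
    """Strip superfluous leading and trailing punctuation from each value."""
    return {k: v.strip(" :#.,;") for k, v in entities.items()}


tokens = [
    Token("Invoice", 10, 10),
    Token("#A-1042:", 60, 10),
    Token("Total", 10, 200),
    Token("1250.00", 60, 200),
    Token("Thanks", 10, 300),
]
result = correct(extract(tokens, label(tokens)))
print(result)
```

A real system would need Arabic keyword patterns, right-to-left neighbourhood rules, and tolerance for OCR errors, none of which this sketch attempts.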
- Subjects :
- Structure (mathematical logic)
Information retrieval
Invoice
Arabic
Computer science
Section (typography)
computer.software_genre
language.human_language
Image (mathematics)
Task (computing)
Information extraction
Logical conjunction
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
language
computer
Details
- Database :
- OpenAIRE
- Journal :
- 2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)
- Accession number :
- edsair.doi...........8a01335d42c4ebc703ad9b2147c069f1