Information extraction in handwritten historical logbooks.

Authors :
Prieto, Jose Ramón
Andrés, José
Granell, Emilio
Sánchez, Joan Andreu
Vidal, Enrique
Source :
Pattern Recognition Letters. Aug 2023, Vol. 172, pp. 128-136. 9 pp.
Publication Year :
2023

Abstract

• Pipeline to extract information from historical tables with different layouts.
• Two information extraction systems are compared at each phase of the pipeline.
• Deal with multispan and classify headers by content.
• Improvements in both systems achieve new state-of-the-art results in HisClima tables.

Document Image Understanding is a demanding Pattern Recognition problem that requires complex recognition models. This problem is even more difficult for document images with complicated layouts like tables, where the reading order is often intrinsically ambiguous, and consequently, the context is generally ambiguous as well. In this paper, we compare two machine learning approaches for extracting information in pre-printed historical tables with handwritten information. We analyze the performance of each approach at each step of the extraction process over different corpora, up to a realistic scenario where documents with different table layouts written by different hands are used. The results are good in general and show that a model based on Multilayer Perceptrons yields better results on more homogeneous documents, while another model based on Graph Neural Networks generalizes better on heterogeneous corpora. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0167-8655
Volume :
172
Database :
Academic Search Index
Journal :
Pattern Recognition Letters
Publication Type :
Academic Journal
Accession number :
169814877
Full Text :
https://doi.org/10.1016/j.patrec.2023.06.008