Information extraction in handwritten historical logbooks.
- Source :
- Pattern Recognition Letters. Aug 2023, Vol. 172, p128-136. 9p.
- Publication Year :
- 2023
Abstract
- Pipeline to extract information from historical tables with different layouts.
- Two information extraction systems are compared at each phase of the pipeline.
- Deal with multispan and classify headers by content.
- Improvements in both systems achieve new state-of-the-art results in HisClima tables.
- Document Image Understanding is a demanding Pattern Recognition problem that requires complex recognition models. This problem is even more difficult for document images with complicated layouts like tables, where the reading order is often intrinsically ambiguous, and consequently, the context is generally ambiguous as well. In this paper, we compare two machine learning approaches for extracting information in pre-printed historical tables with handwritten information. We analyze the performance of each approach at each step of the extraction process over different corpora, up to a realistic scenario where documents with different table layouts written by different hands are used. The results are good in general and show that a model based on Multilayer Perceptrons yields better results on more homogeneous documents, while another model based on Graph Neural Networks generalizes better on heterogeneous corpora. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 01678655
- Volume :
- 172
- Database :
- Academic Search Index
- Journal :
- Pattern Recognition Letters
- Publication Type :
- Academic Journal
- Accession number :
- 169814877
- Full Text :
- https://doi.org/10.1016/j.patrec.2023.06.008