Cnosso, a novel method for business document automation based on open information extraction.

Authors: Scannapieco, Simone; Tomazzoli, Claudio
Source: Expert Systems with Applications. Jul 2024, Vol. 245, pN.PAG-N.PAG. 1p.
Publication Year: 2024

Abstract

The state of the art in automated processing of unstructured business documents has evolved from manual labor to advanced AI systems in the span of mere decades. Such systems rely on learning techniques, rule or clause sets, and neural models, used either alone or in combination, to perform extraction. For example, rule-based processes operate on the perceived layout or positioning of the information, whereas model-based frameworks adopt a semantic, and often uninspectable, approach. Verb-Based Semantic Role Labeling (VBSRL), a novel system presented in a previous paper, uses a hybrid foundation in which a set of rules modeling natural language informs the extraction phase. We propose a new VBSRL-based document processing method, supported by innovative architectural choices, which has been implemented for the Italian language and evaluated with promising results. Even at this early stage, the first implementation of the system outperforms comparable information extraction (IE) solutions, achieving an aggregate average F-measure of nearly 79%.

• Business document analysis is time-consuming in enterprises, so automating it is crucial.
• Classification and information extraction for unstructured documents are hard tasks.
• A document processing method based on pre-processing, normalization, and post-processing.
• Information extraction as Conceptual Dependency Theory plus Semantic Role Labeling.
• Performance in a real-case scenario is better than that of comparable IE solutions.

[ABSTRACT FROM AUTHOR]
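The abstract reports an "aggregate, average F-measure" of nearly 79% but does not say how the average is taken. As a point of reference only, the following minimal sketch computes the standard F-measure (F1, the harmonic mean of precision and recall) for each extracted field and then macro-averages the per-field scores. The `FieldCounts` type, the field names, the counts, and the choice of macro averaging are all hypothetical illustrations, not the authors' evaluation protocol or data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FieldCounts:
    """Hypothetical per-field extraction counts (illustrative only)."""
    tp: int  # values extracted correctly
    fp: int  # values extracted but wrong or spurious
    fn: int  # expected values that were missed


def f_measure(c: FieldCounts, beta: float = 1.0) -> float:
    """F-beta score for one field; beta=1 gives the usual F1."""
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_f_measure(per_field: dict[str, FieldCounts]) -> float:
    """Unweighted mean of per-field F-measures (assumed macro average)."""
    scores = [f_measure(c) for c in per_field.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up counts for two hypothetical invoice fields.
example = {
    "invoice_number": FieldCounts(tp=8, fp=2, fn=2),  # P=0.8, R=0.8
    "total_amount": FieldCounts(tp=6, fp=0, fn=4),    # P=1.0, R=0.6
}
print(round(macro_f_measure(example), 3))
```

With these invented counts the per-field scores are 0.8 and 0.75, giving a macro average of 0.775. A micro average, which pools all counts before computing precision and recall, would weight fields by their frequency instead. Which scheme the paper uses is not stated in this record.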

Details

Language: English
ISSN: 0957-4174
Volume: 245
Database: Academic Search Index
Journal: Expert Systems with Applications
Publication Type: Academic Journal
Accession number: 176151956
Full Text: https://doi.org/10.1016/j.eswa.2023.123038