Automatic classification of company's document stream: Comparison of two solutions.

Authors :
Voerman, Joris
Souleiman Mahamoud, Ibrahim
Coustaty, Mickael
Joseph, Aurélie
Poulain d'Andecy, Vincent
Ogier, Jean-Marc
Source :
Pattern Recognition Letters. Aug2023, Vol. 172, p181-187. 7p.
Publication Year :
2023

Abstract

• A company's document stream strongly challenges classical neural networks because it is imbalanced and incomplete.
• There are multiple solutions for managing imbalance; our best solution is to combine them.
• Incompleteness is more difficult to handle and requires including few-shot learning methods.
• Our best proposal is to use adapted neural networks and few-shot learning methods successively in a cascade.

Documents are essential nowadays and present everywhere. To manage the vast number of documents handled by companies, a first step consists in automatically determining the type of each document (its class). Even though automatic classification has been widely studied in the state of the art, the strongly imbalanced context and industrial constraints bring new challenges that have not been studied until now: how to classify as many documents as possible with the highest precision, in an imbalanced context and with some classes missing during training? To this end, this paper studies two different solutions to address these issues. The first is a multimodal neural network, reinforced by an attention model and an adapted loss function, that is able to classify a great variety of documents. The second is a combination method that uses a cascade of systems to offer a gradual solution for each issue. Both options provide good results in the ideal context as well as in the imbalanced context. This comparison outlines the limitations and the future challenges. [ABSTRACT FROM AUTHOR]
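
The cascade idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the abstract gives no implementation details, so every name here (`cascade_classify`, `nearest_prototype`, `accept_threshold`, `max_distance`) is hypothetical, and a simple nearest-centroid rule is swapped in as a stand-in for a few-shot method. The sketch assumes a first-stage classifier that returns class probabilities, accepts its answer only when it is confident, and otherwise falls back to a stage built from a handful of examples of classes that were missing during training.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def nearest_prototype(x: Vector, prototypes: Dict[str, List[Vector]]) -> Tuple[str, float]:
    """Stand-in few-shot stage: pick the class whose mean example is closest to x."""
    best_label, best_dist = "", math.inf
    for label, examples in prototypes.items():
        dim = len(examples[0])
        # Centroid of the few available examples for this class.
        centroid = [sum(e[i] for e in examples) / len(examples) for i in range(dim)]
        dist = math.dist(x, centroid)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


def cascade_classify(
    x: Vector,
    first_stage: Callable[[Vector], Dict[str, float]],
    prototypes: Dict[str, List[Vector]],
    accept_threshold: float = 0.9,  # hypothetical value, chosen for illustration
    max_distance: float = 1.0,      # hypothetical value, chosen for illustration
) -> Tuple[Optional[str], str]:
    """Return (label, stage). Label is None when both stages decline to answer."""
    probs = first_stage(x)
    label, confidence = max(probs.items(), key=lambda kv: kv[1])
    if confidence >= accept_threshold:
        return label, "stage1"
    # Low confidence: hand the document to the few-shot stage.
    few_label, dist = nearest_prototype(x, prototypes)
    if dist <= max_distance:
        return few_label, "few-shot"
    # Rejecting favors precision over coverage.
    return None, "rejected"


def toy_first_stage(x: Vector) -> Dict[str, float]:
    """Toy stand-in for a trained network: confident only when x[0] is large."""
    if x[0] > 0.5:
        return {"invoice": 0.95, "order": 0.05}
    return {"invoice": 0.5, "order": 0.5}


toy_prototypes = {"payslip": [[0.0, 1.0], [0.0, 0.8]]}
```

Under these assumptions, `cascade_classify([1.0, 0.0], toy_first_stage, toy_prototypes)` is answered by the first stage, while an input near the `payslip` examples falls through to the second stage. The rejection branch reflects the trade-off stated in the abstract between classifying as many documents as possible and keeping precision high.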

Details

Language :
English
ISSN :
01678655
Volume :
172
Database :
Academic Search Index
Journal :
Pattern Recognition Letters
Publication Type :
Academic Journal
Accession number :
169814881
Full Text :
https://doi.org/10.1016/j.patrec.2023.06.012