1. An attention-based row-column encoder-decoder model for text recognition in Japanese historical documents
- Author
Nam Tuan Ly, Cuong Tuan Nguyen, and Masaki Nakagawa
- Subjects
Computer science; Speech recognition; Data_CODINGANDINFORMATIONTHEORY; 02 engineering and technology; Function (mathematics); Residual; Row and column spaces; 01 natural sciences; Image (mathematics); Artificial Intelligence; 0103 physical sciences; Signal Processing; 0202 electrical engineering, electronic engineering, information engineering; Feature (machine learning); 020201 artificial intelligence & image processing; Segmentation; Computer Vision and Pattern Recognition; 010306 general physics; Encoder; Software
- Abstract
This paper presents an attention-based row-column encoder-decoder (ARCED) model for recognizing an input image containing multiple text lines from Japanese historical documents without explicit line segmentation. The recognition system has three main parts: a feature extractor, a row-column encoder, and a decoder. We introduce a row-column BLSTM in the encoder and a residual LSTM network in the decoder. The whole system is trained end-to-end with a standard cross-entropy loss function and requires only document images and their ground-truth text. We experimentally evaluate ARCED on Kana-PRMU, a dataset of Japanese historical documents. The results show that ARCED outperforms state-of-the-art recognition methods on this dataset. Furthermore, we demonstrate that the row-column BLSTM in the encoder and the residual LSTM in the decoder improve the performance of the encoder-decoder model for recognizing Japanese historical documents.
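The abstract names the components but not their equations, so the following is only a toy, pure-Python sketch of how data could flow through a model of this shape. It assumes scalar (1-dimensional) features, hand-picked weights shared by every LSTM, forward and backward outputs that are summed rather than concatenated, dot-product attention, and a decoder whose only input is the attention context (no previous-character embedding). "Residual LSTM" is read here as adding a layer's input to its output, which is one common interpretation. All names (`ScalarLSTM`, `row_column_encode`, `attend`, `residual_decoder_step`, `vocab_w`, ...) are hypothetical; this is not the authors' ARCED implementation, and there is no training loop.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(zs):
    # Subtract the max for numerical stability.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


class ScalarLSTM:
    """Toy LSTM cell with 1-dimensional input and hidden state (illustration only)."""

    def __init__(self, w, u, b):
        # w, u, b: dicts keyed by gate: input 'i', forget 'f', output 'o', candidate 'g'.
        self.w, self.u, self.b = w, u, b

    def step(self, x, h, c):
        pre = {k: self.w[k] * x + self.u[k] * h + self.b[k] for k in "ifog"}
        i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
        g = math.tanh(pre["g"])
        c = f * c + i * g       # update the cell state
        h = o * math.tanh(c)    # expose a gated view of it as the hidden state
        return h, c

    def run(self, xs):
        h, c, outs = 0.0, 0.0, []
        for x in xs:
            h, c = self.step(x, h, c)
            outs.append(h)
        return outs


def blstm(fwd, bwd, xs):
    # Bidirectional pass; forward and backward outputs are summed (a simplification).
    f = fwd.run(xs)
    b = bwd.run(xs[::-1])[::-1]
    return [a + c for a, c in zip(f, b)]


def row_column_encode(grid, row_f, row_b, col_f, col_b):
    # Row pass: a BLSTM along every row of the H x W feature grid.
    rows = [blstm(row_f, row_b, row) for row in grid]
    # Column pass: a BLSTM along every column of the row-encoded grid.
    cols = [blstm(col_f, col_b, list(col)) for col in zip(*rows)]
    # Transpose back so the result is again H x W.
    return [list(r) for r in zip(*cols)]


def attend(state, encoded):
    # Dot-product attention over all H*W encoder positions (1-D, so a plain product).
    flat = [v for row in encoded for v in row]
    weights = softmax([state * v for v in flat])
    context = sum(a * v for a, v in zip(weights, flat))
    return context, weights


def residual_decoder_step(cell, x, h, c):
    # Residual LSTM: the layer output is the LSTM output plus the layer input.
    h, c = cell.step(x, h, c)
    return h + x, h, c


def cross_entropy(logits, target):
    return -math.log(softmax(logits)[target])


# Hypothetical toy weights, reused by every LSTM for brevity.
gates = {"i": 0.5, "f": 0.5, "o": 0.5, "g": 0.5}
cell = ScalarLSTM(gates, gates, {k: 0.0 for k in "ifog"})

grid = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # toy 2 x 3 map standing in for extracted features
enc = row_column_encode(grid, cell, cell, cell, cell)

vocab_w = [1.0, -1.0, 0.5]  # hypothetical output layer for a 3-symbol vocabulary
h = c = 0.0
loss = 0.0
for target in [0, 2]:  # hypothetical ground-truth symbol ids
    context, _ = attend(h, enc)
    out, h, c = residual_decoder_step(cell, context, h, c)
    loss += cross_entropy([w * out for w in vocab_w], target)
print(f"summed cross-entropy over 2 steps: {loss:.4f}")
```

Replacing the scalars with vectors and the hand-set weights with learned parameters, then backpropagating the summed cross-entropy, would correspond to the end-to-end training the abstract describes; in practice that would normally be done in a deep-learning framework rather than by hand.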
- Published
2020