An Attention-Based End-to-End Model for Multiple Text Lines Recognition in Japanese Historical Documents

Authors :: Masaki Nakagawa
Cuong Tuan Nguyen
Nam Tuan Ly
Source :: ICDAR
Publication Year :: 2019
Publisher :: IEEE, 2019.
Abstract: This paper presents an attention-based convolutional sequence-to-sequence (ACseq2seq) model for recognizing an input image of multiple text lines from Japanese historical documents without explicit segmentation of lines. The recognition system has three main parts: a feature extractor using a Convolutional Neural Network (CNN) to extract a feature sequence from an input image; an encoder employing a bidirectional Long Short-Term Memory (BLSTM) network to encode the feature sequence; and a decoder using a unidirectional LSTM with an attention mechanism to generate the final target text based on the attended pertinent features. We also introduce a residual LSTM network between the attention vector and the softmax layer in the decoder. The system can be trained end-to-end with a standard cross-entropy loss function. In the experiments, we evaluate the performance of the ACseq2seq model on the anomalously deformed Kana datasets in the PRMU contest. The results show that the proposed model achieves higher recognition accuracy than the state-of-the-art recognition methods on these datasets.
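
Illustrative note: the decoder described in the abstract attends over the encoded feature sequence at each output step. The sketch below shows one such attention step and the per-character cross-entropy term in plain Python. It is not the authors' implementation: the abstract does not state the scoring function, so dot-product scoring is an assumption, and the names `attend`, `softmax`, and `cross_entropy` are hypothetical helpers chosen for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_outputs):
    """One attention step.

    decoder_state: list of floats (current decoder hidden state).
    encoder_outputs: list of equal-length float lists (encoded features).
    Scoring is a plain dot product, which is an assumption of this sketch.
    Returns (attention weights, context vector).
    """
    scores = [sum(d * h for d, h in zip(decoder_state, enc))
              for enc in encoder_outputs]
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    # Context vector: attention-weighted sum of the encoder outputs.
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_outputs))
               for i in range(dim)]
    return weights, context

def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class for one output step."""
    return -math.log(probs[target_index])
```

In a full model the context vector would be combined with the decoder state and passed through further layers (in the paper, a residual LSTM followed by softmax) to produce the character distribution that `cross_entropy` scores.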

Subjects :: Sequence
Computer science
020206 networking & telecommunications
Pattern recognition
02 engineering and technology
Convolutional neural network
Softmax function
0202 electrical engineering, electronic engineering, information engineering
Feature (machine learning)
020201 artificial intelligence & image processing
Segmentation
Artificial intelligence
Encoder

Database :: OpenAIRE
Journal :: 2019 International Conference on Document Analysis and Recognition (ICDAR)
Accession number :: edsair.doi...........494b1aa34fb14ac5ccc52182bb035a8d
Full Text :: https://doi.org/10.1109/icdar.2019.00106
