NASTER: Non-local Attentional Scene Text Recognizer

Authors :: Yanbin Hao
Richang Hong
Xueliang Liu
Yunjie Ma
Lei Wu
Source :: ICMR
Publication Year :: 2021
Publisher :: ACM, 2021.
Abstract: Scene text recognition has been widely investigated in computer vision. In the literature, the encoder-decoder framework, which first encodes an image into a feature map and then decodes the feature map into the corresponding text sequence, has achieved great success. However, this approach fails on low-quality images, because local visual features extracted from curved or blurred images are difficult to decode into the corresponding text. To address this issue, we propose a new framework for Scene Text Recognition (STR), named Non-Local Attentional Scene Text Recognizer (NASTER). We use a ResNet with Global Context Blocks (GC blocks) to extract global visual features. Global context information is then captured in parallel by a self-attention module and finally decoded by a multi-layer attention decoder with an intermediate supervision module. The proposed method achieves state-of-the-art performance on seven benchmark datasets, demonstrating the effectiveness of our approach.
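The abstract names two building blocks without describing them: the GC block in the ResNet encoder and the self-attention module. Below is a minimal NumPy sketch of both. It assumes the standard Global Context block from GCNet (softmax attention pooling over all spatial positions, a bottleneck transform with LayerNorm and ReLU, then broadcast addition) and single-head scaled dot-product self-attention. It is not NASTER's published implementation: the function names, weight shapes, the bottleneck ratio `r`, and the step that averages over height to form a sequence are all hypothetical illustrations.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize a 1-D bottleneck vector (no learned scale/shift, for brevity).
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def gc_block(feat, w_ctx, w_down, w_up):
    """GCNet-style Global Context block on one (C, H, W) feature map.

    w_ctx:  (C,)      hypothetical 1x1 conv giving one attention logit per position
    w_down: (C//r, C) hypothetical bottleneck 1x1 conv
    w_up:   (C, C//r) hypothetical expansion 1x1 conv
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)               # (C, HW)
    logits = w_ctx @ x                       # (HW,)
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()                       # softmax over all spatial positions
    context = x @ attn                       # (C,) one global context vector
    t = np.maximum(layer_norm(w_down @ context), 0.0)  # LayerNorm + ReLU
    t = w_up @ t                             # (C,)
    return feat + t[:, None, None]           # same vector added at every position

def self_attention(seq, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (T, D) sequence."""
    q, k, v = seq @ w_q, seq @ w_k, seq @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v

# Toy usage with random weights (hypothetical sizes).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 16, 4
feat = rng.standard_normal((C, H, W))
gc_out = gc_block(feat, rng.standard_normal(C),
                  rng.standard_normal((C // r, C)),
                  rng.standard_normal((C, C // r)))
seq = gc_out.mean(axis=1).T   # (W, C): average over height, one token per column
w_q, w_k, w_v = (rng.standard_normal((C, C)) for _ in range(3))
attended = self_attention(seq, w_q, w_k, w_v)
print(gc_out.shape, attended.shape)  # (8, 4, 16) (16, 8)
```

The GC block adds the same context vector to every spatial position. This is how it injects image-wide information into local features cheaply. Self-attention instead lets every sequence position attend to every other position, all computed in parallel.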

Subjects :: Decodes
Computer science
Speech recognition
ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION
Feature (machine learning)
Benchmark (computing)
Context (language use)
Text recognition
Non local
Image (mathematics)
Block (data storage)

Database :: OpenAIRE
Journal :: Proceedings of the 2021 International Conference on Multimedia Retrieval
Accession number :: edsair.doi...........019ab892152d57aca9ce62e34ee233a9
Full Text :: https://doi.org/10.1145/3460426.3463623