
TSRGAN: Real-world text image super-resolution based on adversarial learning and triplet attention

Authors :
Chuantao Fang
Xiaofeng Ling
Yu Zhu
Lei Liao
Source :
Neurocomputing. 455:88-96
Publication Year :
2021
Publisher :
Elsevier BV, 2021.

Abstract

The text in a low-resolution (LR) image is usually hard to read. Super-resolution (SR) is an intuitive solution to this issue. Existing single image super-resolution (SISR) models are mainly trained on synthetic datasets whose LR images are obtained by applying bicubic interpolation or Gaussian blur to high-resolution (HR) images. However, these models can hardly generalize to practical scenarios because real-world LR images are more difficult to super-resolve. The newly proposed TextZoom dataset is the first dataset for real-world text image super-resolution. We propose a new model, termed TSRGAN, trained on this dataset. First, a discriminator is designed to prevent the SR network from generating over-smoothed images. Second, we introduce triplet attention into the SR network for better representational ability. Moreover, besides L2 loss and adversarial loss, a wavelet loss is incorporated to help reconstruct sharper character edges. Since TextZoom provides text labels, the recognition accuracy of a scene text recognition (STR) model can be used to evaluate the quality of SR images. It reflects the performance of text image SR models better than traditional SR evaluation metrics such as PSNR and SSIM. Comprehensive experiments show the superiority of our TSRGAN. Compared with the state-of-the-art method, the proposed TSRGAN improves the average recognition accuracy of ASTER, MORAN and CRNN by 0.8%, 1.5% and 3.2% on TextZoom, respectively.
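The abstract describes a composite objective: L2 loss, adversarial loss, and a wavelet loss for sharper character edges. The following is a minimal NumPy sketch of how such a combination might look, assuming a one-level Haar decomposition for the wavelet term; the loss weights, the choice of wavelet, and the function names are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def haar_decompose(img):
    """One-level 2-D Haar transform into LL, LH, HL, HH sub-bands
    (illustrative; the paper's wavelet choice may differ)."""
    a = img[0::2, 0::2]  # top-left pixels of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 4.0   # low-frequency approximation
    lh = (a - b + c - d) / 4.0   # horizontal detail
    hl = (a + b - c - d) / 4.0   # vertical detail
    hh = (a - b - c + d) / 4.0   # diagonal detail
    return ll, lh, hl, hh

def wavelet_loss(sr, hr):
    """MSE summed over wavelet sub-bands; the detail bands
    penalize blurred character edges more directly than pixel L2."""
    return sum(np.mean((s - h) ** 2)
               for s, h in zip(haar_decompose(sr), haar_decompose(hr)))

def total_loss(sr, hr, adv_term, w_l2=1.0, w_adv=1e-3, w_wav=0.1):
    """Weighted sum of L2, adversarial, and wavelet losses.
    Weights here are placeholders, not the published values."""
    l2 = np.mean((sr - hr) ** 2)
    return w_l2 * l2 + w_adv * adv_term + w_wav * wavelet_loss(sr, hr)
```

In a GAN setting, `adv_term` would come from the discriminator's score on the SR output; here it is just a scalar input so the sketch stays self-contained.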

Details

ISSN :
0925-2312
Volume :
455
Database :
OpenAIRE
Journal :
Neurocomputing
Accession number :
edsair.doi...........df77cd3e98a270088c98f0a750241644
Full Text :
https://doi.org/10.1016/j.neucom.2021.05.060