Author: "Zhu, Zhangzi" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zhu, Zhangzi"' showing total 6 results

1. 1st Place Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: End-to-End Recognition of Out of Vocabulary Words

Author: Zhu, Zhangzi, Xue, Chuhui, Hao, Yu, Zhang, Wenqing, and Bai, Song
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Scene text recognition has attracted increasing interest in recent years due to its wide range of applications in multilingual translation, autonomous driving, etc. In this report, we describe our solution to the Out of Vocabulary Scene Text Understanding (OOV-ST) Challenge, which aims to extract out-of-vocabulary (OOV) words from natural scene images. Our oCLIP-based model achieves 28.59% in h-mean which ranks 1st in end-to-end OOV word recognition track of OOV Challenge in ECCV2022 TiE Workshop.
Comment: Report to ECCV TiE OOV competition
Published: 2022

2. Runner-Up Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: Cropped Word Recognition

Author: Zhu, Zhangzi, Hao, Yu, Zhang, Wenqing, Xue, Chuhui, and Bai, Song
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This report presents our 2nd place solution to ECCV 2022 challenge on Out-of-Vocabulary Scene Text Understanding (OOV-ST): Cropped Word Recognition. This challenge is held in the context of ECCV 2022 workshop on Text in Everything (TiE), which aims to extract out-of-vocabulary words from natural scene images. In the competition, we first pre-train SCATTER on the synthetic datasets, then fine-tune the model on the training set with data augmentations. Meanwhile, two additional models are trained specifically for long and vertical texts. Finally, we combine the output from different models with different layers, different backbones, and different seeds as the final results. Our solution achieves a word accuracy of 59.45% when considering out-of-vocabulary words only.
Published: 2022

3. Improving Image Captioning with Control Signal of Sentence Quality

Author: Zhu, Zhangzi and Qu, Hong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In the dataset of image captioning, each image is aligned with several descriptions. Despite the fact that the quality of these descriptions varies, existing captioning models treat them equally in the training process. In this paper, we propose a new control signal of sentence quality, which is taken as an additional input to the captioning model. By integrating the control signal information, captioning models are aware of the quality level of the target sentences and handle them differently. Moreover, we propose a novel reinforcement training method specially designed for the control signal of sentence quality: Quality-oriented Self-Annotated Training (Q-SAT). Extensive experiments on the MSCOCO dataset show that without extra information from ground truth captions, models controlled by the highest quality level outperform baseline models on accuracy-based evaluation metrics, which validates the effectiveness of our proposed methods.
Comment: Accepted by ICASSP2023
Published: 2022

4. Self-Annotated Training for Controllable Image Captioning

Author: Zhu, Zhangzi, Wang, Tianlei, and Qu, Hong
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: The Controllable Image Captioning (CIC) task aims to generate captions conditioned on designated control signals. Several structure-related control signals are proposed to control the semantic structure of sentences, such as sentence length and Part-of-Speech tag sequences. However, due to the fact that the accuracy-based reward focuses mainly on contents rather than semantic structures, existing reinforcement training methods are not applicable to structure-related CIC models. The lack of reinforcement training leads to exposure bias and the inconsistency between the optimizing function and evaluation metrics. In this paper, we propose a novel reinforcement training method for structure-related control signals: Self-Annotated Training (SAT), to improve both the accuracy and controllability of CIC models. In SAT, a recursive annotation mechanism (RAM) is designed to force the input control signal to match the actual output sentence. Moreover, we propose an extra alignment reward to finetune the CIC model trained after SAT method, which further enhances the controllability of models. On the MSCOCO benchmark, we conduct extensive experiments on different structure-related control signals and on different baseline models, the results of which demonstrate the effectiveness and generalizability of our methods.
Published: 2021

5. Macroscopic Control of Text Generation for Image Captioning

Author: Zhu, Zhangzi, Wang, Tianlei, and Qu, Hong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Despite the fact that image captioning models have been able to generate impressive descriptions for a given image, challenges remain: (1) the controllability and diversity of existing models are still far from satisfactory; (2) models sometimes may produce extremely poor-quality captions. In this paper, two novel methods are introduced to solve the problems respectively. Specifically, for the former problem, we introduce a control signal which can control the macroscopic sentence attributes, such as sentence quality, sentence length, sentence tense and number of nouns, etc. With such a control signal, the controllability and diversity of existing captioning models are enhanced. For the latter problem, we innovatively propose a strategy that an image-text matching model is trained to measure the quality of sentences generated in both forward and backward directions and finally choose the better one. As a result, this strategy can effectively reduce the proportion of poor-quality sentences. Our proposed methods can be easily applied to most image captioning models to improve their overall performance. Based on the Up-Down model, the experimental results show that our methods achieve BLEU-4/CIDEr/SPICE scores of 37.5/120.3/21.5 on MSCOCO Karpathy test split with cross-entropy training, which surpass the results of other state-of-the-art methods trained by cross-entropy loss.
Published: 2021

6. Improving Image Captioning with Control Signal of Sentence Quality

Author: Zhu, Zhangzi, primary, Wang, Shuai, additional, and Qu, Hong, additional
Published: 2023
