
An Accuracy Enhanced Vision Language Grounding Method Fused with Gaze Intention.

Authors :
Zhang, Junqian
Tu, Long
Zhang, Yakun
Xie, Liang
Xu, Minpeng
Ming, Dong
Yan, Ye
Yin, Erwei
Source :
Electronics (2079-9292); Dec 2023, Vol. 12, Issue 24, p5007, 16p
Publication Year :
2023

Abstract

Visual grounding aims to recognize and locate the target in an image according to human intention, offering a new intelligent interaction paradigm for augmented reality (AR) and virtual reality (VR) devices. However, existing vision language grounding relies on the language modality alone, and it performs poorly on images containing multiple similar objects. Gaze interaction is an important interaction mode in AR/VR devices, and it offers a promising remedy for such inaccurate vision language grounding cases. Based on the above issues and analysis, a vision language grounding framework fused with gaze intention is proposed. Firstly, we collect manual gaze annotations using an AR device and construct a novel multi-modal dataset, RefCOCOg-Gaze, in combination with the proposed data augmentation methods. Secondly, an attention-based multi-modal feature fusion model is designed, providing a baseline framework for vision language grounding with gaze intention (VLG-Gaze). Through a series of carefully designed experiments, we analyze the proposed dataset and framework qualitatively and quantitatively. Compared with the state-of-the-art vision language grounding model, our proposed scheme improves accuracy by 5.3%, which indicates the significance of gaze fusion in multi-modal grounding tasks. [ABSTRACT FROM AUTHOR]
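The abstract describes an attention-based multi-modal feature fusion model that injects gaze intention into vision language grounding. The record does not include the authors' code, so the short PyTorch sketch below is only an illustration of one way such a fusion head could be wired up; the class name GazeFusedGroundingHead, the feature dimensions, and the two-stage cross-attention layout are assumptions for exposition, not the authors' implementation.

import torch
import torch.nn as nn

class GazeFusedGroundingHead(nn.Module):
    # Hypothetical sketch: fuse visual tokens with language tokens and gaze
    # fixations via cross-attention, then regress one bounding box.
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.gaze_proj = nn.Linear(2, dim)  # embed 2-D fixation coordinates
        self.lang_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gaze_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.box_head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 4))

    def forward(self, visual_tokens, lang_tokens, gaze_points):
        # visual_tokens: (B, Nv, dim) image patch features
        # lang_tokens:   (B, Nl, dim) referring-expression token features
        # gaze_points:   (B, Ng, 2)   normalized gaze fixation coordinates
        gaze_tokens = self.gaze_proj(gaze_points)                       # (B, Ng, dim)
        x, _ = self.lang_attn(visual_tokens, lang_tokens, lang_tokens)  # language-conditioned visual features
        x, _ = self.gaze_attn(x, gaze_tokens, gaze_tokens)              # re-weight by gaze intention
        return self.box_head(x.mean(dim=1)).sigmoid()                   # box as normalized (cx, cy, w, h)

A quick check with random tensors, e.g. GazeFusedGroundingHead()(torch.randn(1, 196, 256), torch.randn(1, 12, 256), torch.rand(1, 8, 2)), returns a (1, 4) box prediction, showing how gaze enters the pipeline as an extra attention source rather than a replacement for the language query.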

Details

Language :
English
ISSN :
20799292
Volume :
12
Issue :
24
Database :
Complementary Index
Journal :
Electronics (2079-9292)
Publication Type :
Academic Journal
Accession number :
174440492
Full Text :
https://doi.org/10.3390/electronics12245007