
Automatic Findings Generation for Distress Images Using In-Context Few-Shot Learning of Visual Language Model Based on Image Similarity and Text Diversity.

Authors :
Watanabe, Yuto
Ogawa, Naoki
Maeda, Keisuke
Ogawa, Takahiro
Haseyama, Miki
Source :
Journal of Robotics & Mechatronics. Apr2024, Vol. 36 Issue 2, p353-364. 12p.
Publication Year :
2024

Abstract

This study proposes an automatic findings generation method that performs in-context few-shot learning of a visual language model. The automatic generation of findings can reduce the burden of creating inspection records for infrastructure facilities. However, the findings must include the opinions and judgments of engineers, in addition to what is recognized from the image; therefore, the direct generation of findings remains challenging. Against this background, we introduce in-context few-shot learning that focuses on image similarity and text diversity in the visual language model, which enables text output with a highly accurate understanding of both vision and language. Based on this novel in-context few-shot learning strategy, the proposed method comprehensively considers the characteristics of the distress image and diverse findings, and can achieve high accuracy in generating findings. In the experiments, the proposed method outperformed the comparative methods in generating findings for distress images captured during bridge inspections. [ABSTRACT FROM AUTHOR]
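The abstract describes selecting in-context examples that are similar to the query image while keeping their findings texts diverse. A minimal sketch of one way such a selection could work is shown below; it is an assumption, not the paper's actual algorithm. It greedily scores candidates in a maximal-marginal-relevance style, trading off image similarity against redundancy among the texts of already-selected examples. The function name `select_examples` and the use of plain NumPy vectors as stand-ins for embeddings from a visual language model are hypothetical.

```python
import numpy as np

def select_examples(query_img, cand_imgs, cand_txts, k=3, lam=0.5):
    """Greedily pick k in-context examples (hypothetical sketch).

    query_img : embedding of the query distress image
    cand_imgs : candidate image embeddings, shape (n, d)
    cand_txts : embeddings of the candidates' findings texts, shape (n, d)
    lam       : trade-off between image similarity and text diversity
    In practice the embeddings would come from a vision-language model.
    """
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

    # Image similarity of each candidate to the query.
    sims = np.array([cos(query_img, c) for c in cand_imgs])
    selected, remaining = [], list(range(len(cand_imgs)))
    while remaining and len(selected) < k:
        best, best_score = None, -np.inf
        for i in remaining:
            # Redundancy: max text similarity to already-selected examples.
            red = max((cos(cand_txts[i], cand_txts[j]) for j in selected),
                      default=0.0)
            score = lam * sims[i] - (1.0 - lam) * red
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected
```

The first pick is simply the most image-similar candidate (no redundancy term yet); later picks penalize candidates whose findings text echoes an example already chosen, which keeps the few-shot prompt varied.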

Details

Language :
English
ISSN :
09153942
Volume :
36
Issue :
2
Database :
Academic Search Index
Journal :
Journal of Robotics & Mechatronics
Publication Type :
Academic Journal
Accession number :
176721367
Full Text :
https://doi.org/10.20965/jrm.2024.p0353