A fine-tuned multimodal large model for power defect image-text question-answering.
- Source :
- Signal, Image & Video Processing; Dec2024, Vol. 18 Issue 12, p9191-9203, 13p
- Publication Year :
- 2024
Abstract
- In power defect detection, complex scenes and diverse defect types make manual defect identification difficult. To address this, the paper proposes using a multimodal large model that helps power professionals identify power scenes and defects through image-text interaction, thereby improving work efficiency. It presents a fine-tuned multimodal large model for power defect image-text question-answering, addressing challenges such as training difficulties and the lack of image-text knowledge specific to power defects. YOLOv8 is used to create a dataset for multimodal power defect detection, enriching the image-text information available in the power defect domain. The model is fine-tuned by integrating the LoRA and Q-Former methods, which enhances the extraction of visual and semantic features and aligns visual and semantic information. The experimental results show that the proposed model significantly outperforms other popular multimodal models on power defect question-answering. [ABSTRACT FROM AUTHOR]
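The abstract names LoRA as one of the fine-tuning methods but gives no implementation details. As a rough illustration of the general LoRA idea only (not the authors' code), the sketch below keeps a weight matrix frozen and adds a trainable low-rank update, so the layer computes W·x + (alpha / r)·B·A·x. The class name `LoRALinear`, the helper `matvec`, and all sizes and hyperparameters are hypothetical choices made for this example.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Hypothetical minimal LoRA layer: frozen weight plus low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained weight, shape (out_dim, in_dim)
        out_dim, in_dim = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A starts with small random values, B starts at zero, so the
        # adapted layer initially behaves exactly like the frozen layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.weight, x)               # frozen path: W x
        delta = matvec(self.B, matvec(self.A, x))   # low-rank path: B (A x)
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A and B would be updated during fine-tuning.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Example: an 8x8 identity weight adapted with rank 1.
identity = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
layer = LoRALinear(identity, rank=1)
x = [float(i) for i in range(8)]
y = layer.forward(x)
```

With rank 1 on an 8x8 weight, only the entries of A and B (16 values) would be trained instead of all 64 weight entries; this reduction in trainable parameters is the usual motivation for LoRA when full fine-tuning of a large model is hard.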
Details
- Language :
- English
- ISSN :
- 18631703
- Volume :
- 18
- Issue :
- 12
- Database :
- Complementary Index
- Journal :
- Signal, Image & Video Processing
- Publication Type :
- Academic Journal
- Accession number :
- 180654622
- Full Text :
- https://doi.org/10.1007/s11760-024-03539-w