High performance RGB-Thermal Video Object Detection via hybrid fusion with progressive interaction and temporal-modal difference.

Authors :: Wang, Qishun
Tu, Zhengzheng
Li, Chenglong
Tang, Jin
Source :: Information Fusion. Feb2025, Vol. 114, pN.PAG-N.PAG. 1p.
Publication Year :: 2025
Abstract: RGB-Thermal Video Object Detection (RGBT VOD) aims to localize and classify predefined objects in visible and thermal spectrum videos. The key issue in RGBT VOD lies in integrating multi-modal information effectively to improve detection performance. Current multi-modal fusion methods predominantly employ middle fusion strategies, but the inherent modal difference directly influences the effect of multi-modal fusion. Although the early fusion strategy reduces the modality gap in the middle stage of the network, achieving in-depth feature interaction between different modalities remains challenging. In this work, we propose a novel hybrid fusion network called PTMNet, which effectively combines the early fusion strategy with progressive interaction and the middle fusion strategy with the temporal-modal difference, for high performance RGBT VOD. In particular, we take each modality in turn as a master modality and achieve early fusion by progressively interacting with the other modalities as auxiliary information. Such a design not only alleviates the modality gap but also facilitates middle fusion. The temporal-modal difference models temporal information through spatial offsets and utilizes feature erasure between modalities to motivate the network to focus on objects shared by both modalities. The hybrid fusion achieves high detection accuracy using only three input frames, which gives PTMNet a high inference speed. Experimental results show that our approach achieves state-of-the-art performance on the VT-VOD50 dataset while operating at over 70 FPS. The code will be freely released at https://github.com/tzz-ahu for academic purposes.
• A hybrid fusion strategy network for RGB-Thermal video object detection.
• An early fusion strategy for reducing modal disparities.
• A novel differential method for modeling multimodal and temporal information.
• The proposed PTMNet achieves SOTA performance on the VT-VOD50 dataset.
[ABSTRACT FROM AUTHOR]

Subjects :: *VISIBLE spectra
*MODALITY (Linguistics)
*VIDEOS
*MOTIVATION (Psychology)
*SPEED

Details

Language :: English
ISSN :: 15662535
Volume :: 114
Database :: Academic Search Index
Journal :: Information Fusion
Publication Type :: Academic Journal
Accession number :: 180494341
Full Text :: https://doi.org/10.1016/j.inffus.2024.102665
