1. Heterogeneous Dual-Attentional Network for WiFi and Video-Fused Multi-Modal Crowd Counting
- Author
- Hao, Lifei, Huang, Baoqi, Jia, Bing, and Mao, Guoqiang
- Abstract
- Crowd counting aims to estimate the number of individuals in a target area. Mainstream vision-based methods suffer from limited coverage and from the difficulty of multi-camera collaboration, which restricts their scalability, whereas emerging WiFi-based methods yield only coarse results because of signal randomness. To overcome the inherent limitations of unimodal approaches and exploit the advantages of multi-modal ones, this paper presents a WiFi and video-fused multi-modal paradigm built on a heterogeneous dual-attentional network, which jointly models the intra- and inter-modality relationships of global WiFi measurements and local videos to achieve accurate and stable large-scale crowd counting. First, a flexible hybrid sensing network is constructed to capture synchronized multi-modal measurements that characterize the same crowd at different scales and from different perspectives. Second, differential preprocessing, heterogeneous feature extractors, and self-attention mechanisms are applied in sequence to extract and refine modality-independent, crowd-related features. Third, a cross-attention mechanism is employed to deeply fuse and generalize the matching relationships between the two modalities. Extensive real-world experiments demonstrate that, compared with the best WiFi unimodal baseline, the method reduces the error by 26.2%, improves stability by 48.43%, and achieves an accuracy of about 88% in large-scale crowd counting when videos from two cameras are included.
- Published
- 2024
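The abstract describes self-attention within each modality followed by cross-attention between modalities. The following is a minimal sketch of that general pattern only, not the authors' implementation: it uses plain scaled dot-product attention with no learned projection weights, and it omits the differential preprocessing and heterogeneous feature extractors entirely. The names `attention`, `fuse`, `wifi_tokens`, and `video_tokens`, as well as the toy feature values, are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of equal-width vectors.

    Returns the attended outputs and the attention weights.
    Assumption: no learned Q/K/V projections, to keep the sketch short.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]
        )
    return outputs, weights


def fuse(wifi_tokens, video_tokens):
    """Hypothetical fusion step: self-attention per modality, then cross-attention."""
    # Intra-modality: each modality attends to itself.
    wifi_sa, _ = attention(wifi_tokens, wifi_tokens, wifi_tokens)
    video_sa, _ = attention(video_tokens, video_tokens, video_tokens)
    # Inter-modality: WiFi features query the video features.
    return attention(wifi_sa, video_sa, video_sa)


# Toy inputs (made up): three WiFi feature vectors, two video feature vectors.
wifi_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video_tokens = [[0.5, 0.5], [1.0, 0.0]]
fused, cross_weights = fuse(wifi_tokens, video_tokens)
```

In this sketch `fused` holds one vector per WiFi token, each a weighted mix of the video features, and every row of `cross_weights` sums to 1. A real model would add learned projections and a regression head that maps the fused features to a count.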