
CLDE-Net: crowd localization and density estimation based on CNN and transformer network.

Authors :
Hu, Yaocong
Lin, Yuanyuan
Yang, Huicheng
Liu, Bingyou
Wan, Guoyang
Hong, Jinwen
Xie, Chao
Wang, Wei
Lu, Xiaobo
Source :
Multimedia Systems. Jun 2024, Vol. 30 Issue 3, p1-21. 21p.
Publication Year :
2024

Abstract

Given a crowd image, humans can approximate the count in two ways: by exactly locating the head points in each local region, or by directly estimating the total number of people from the whole image. In analogy to human visual perception, CNNs and transformers are the two mainstream models for the challenging task of crowd counting. CNNs are strong at extracting locality-oriented features, while transformers are suited to modeling global dependencies. Based on this observation, the proposed CLDE-Net is the first study to achieve both exact localization and direct estimation with a hybrid CNN-transformer design. Specifically, the CNN searches for all candidate head points in each local region, and the transformer learns the crowd density map with global receptive fields. Furthermore, we adopt two components to further boost crowd counting performance: (1) a cross-layer feature interaction module that facilitates information transmission between the CNN and transformer branches, and (2) a dynamic factor generator that adaptively fuses the results of head point localization and density map estimation. Extensive experiments show that the proposed CLDE-Net framework achieves state-of-the-art performance on multiple crowd counting data sets. [ABSTRACT FROM AUTHOR]
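The abstract describes two count estimates: one from head-point localization (the CNN branch) and one from density-map estimation (the transformer branch). A "dynamic factor generator" combines these two estimates. The sketch below is a hypothetical illustration of that general idea only, not the authors' implementation. It assumes three things: the localization branch yields per-candidate confidence scores that are thresholded at 0.5, the density branch's count is the sum of its map, and fusion is a sigmoid-weighted convex combination of the two counts. The record does not say whether CLDE-Net fuses at the count level or the map level. The function names `localization_count`, `density_count`, `dynamic_factor`, and `fuse_counts`, the threshold, and the scalar logit input are all hypothetical.

```python
import math
from typing import List, Sequence


def localization_count(scores: Sequence[float], threshold: float = 0.5) -> int:
    """Count candidate head points whose confidence exceeds `threshold`.

    Hypothetical stand-in for a point-localization (CNN) branch output;
    the 0.5 threshold is an assumption, not taken from the paper.
    """
    return sum(1 for s in scores if s > threshold)


def density_count(density_map: List[List[float]]) -> float:
    """Estimate the head count by summing (integrating) a 2-D density map."""
    return sum(sum(row) for row in density_map)


def dynamic_factor(logit: float) -> float:
    """Map an unbounded score to (0, 1) with a numerically stable sigmoid.

    In a trained network this logit would presumably be predicted from image
    features; here it is simply passed in (assumption).
    """
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)  # safe: logit < 0, so no overflow
    return e / (1.0 + e)


def fuse_counts(loc: float, dens: float, logit: float) -> float:
    """Convex combination: alpha * localization + (1 - alpha) * density."""
    alpha = dynamic_factor(logit)
    return alpha * loc + (1.0 - alpha) * dens


if __name__ == "__main__":
    loc = localization_count([0.9, 0.8, 0.3, 0.7, 0.1])  # 3 scores above 0.5
    dens = density_count([[0.5, 1.0], [1.5, 2.0]])       # 5.0
    print(fuse_counts(loc, dens, 0.0))                   # equal weighting
```

A convex combination keeps the fused count between the two branch estimates. A large positive logit trusts localization (useful in sparse scenes where heads are distinct), and a large negative logit trusts the density map (useful in dense scenes where individual heads are hard to separate).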

Details

Language :
English
ISSN :
0942-4962
Volume :
30
Issue :
3
Database :
Academic Search Index
Journal :
Multimedia Systems
Publication Type :
Academic Journal
Accession number :
176525206
Full Text :
https://doi.org/10.1007/s00530-024-01318-8