1,261 results on '"Crowd counting"'
Search Results
2. Robust Zero-Shot Crowd Counting and Localization With Adaptive Resolution SAM
- Author
- Wan, Jia, Wu, Qiangqiang, Lin, Wei, Chan, Antoni, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
3. Crowd Counting Using Meta-Test-Time Adaptation.
- Author
- Ma, Chaoqun, Neri, Ferrante, Gu, Li, Wang, Ziqiang, Wang, Jian, Qing, Anyong, and Wang, Yang
- Subjects
- *MACHINE learning, *SUPERVISED learning, *DATA augmentation, *CAPABILITIES approach (Social sciences), *COUNTING
- Abstract
Machine learning algorithms are commonly used for quickly and efficiently counting people from a crowd. Test-time adaptation methods for crowd counting adjust model parameters and employ additional data augmentation to better adapt the model to the specific conditions encountered during testing. The majority of current studies concentrate on unsupervised domain adaptation. These approaches commonly perform hundreds of epochs of training iterations, requiring a sizable number of unannotated data of every new target domain apart from annotated data of the source domain. Unlike these methods, we propose a meta-test-time adaptive crowd counting approach called CrowdTTA, which integrates the concept of test-time adaptation into the meta-learning framework and makes it easier for the counting model to adapt to the unknown test distributions. To facilitate the reliable supervision signal at the pixel level, we introduce uncertainty by inserting the dropout layer into the counting model. The uncertainty is then used to generate valuable pseudo labels, serving as effective supervisory signals for adapting the model. In the context of meta-learning, one image can be regarded as one task for crowd counting. In each iteration, our approach is a dual-level optimization process. In the inner update, we employ a self-supervised consistency loss function to optimize the model so as to simulate the parameters update process that occurs during the test phase. In the outer update, we authentically update the parameters based on the image with ground truth, improving the model's performance and making the pseudo labels more accurate in the next iteration. At test time, the input image is used for adapting the model before testing the image. In comparison to various supervised learning and domain adaptation methods, our results via extensive experiments on diverse datasets showcase the general adaptive capability of our approach across datasets with varying crowd densities and scales. 
[ABSTRACT FROM AUTHOR]
- Published
- 2024
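The CrowdTTA abstract says that dropout-based uncertainty is used to generate pseudo labels, but it does not say how. The sketch below shows one common way to do this, under stated assumptions: several stochastic forward passes are averaged per pixel, and only pixels whose variance across passes is small are kept as supervision. The function name `pseudo_labels`, the variance threshold, and the toy numbers are all hypothetical and are not taken from the paper.

```python
from statistics import mean, pvariance


def pseudo_labels(samples, max_variance):
    """Turn T stochastic predictions into a pseudo label and a confidence mask.

    samples: list of T predictions, each a flat list of per-pixel densities
             (for example, T forward passes with dropout left active).
    Returns (mean_map, mask); mask[i] is True when pixel i varies little
    across passes and is therefore trusted as a pseudo label (assumed rule).
    """
    per_pixel = list(zip(*samples))  # group the T values of each pixel
    mean_map = [mean(values) for values in per_pixel]
    mask = [pvariance(values) <= max_variance for values in per_pixel]
    return mean_map, mask


# Three made-up dropout passes over a 3-pixel density map.
samples = [
    [0.10, 0.50, 0.00],
    [0.10, 0.10, 0.00],
    [0.10, 0.90, 0.00],
]
mean_map, mask = pseudo_labels(samples, max_variance=0.01)
```

In this toy input the middle pixel disagrees strongly across passes, so it is masked out, while the two stable pixels are kept.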
4. Double multi-scale feature fusion network for crowd counting.
- Author
- Liu, Qian, Fang, Jiongtao, Zhong, Yixiong, Wang, Cunbao, and Qi, Youwei
- Subjects
- PYRAMIDS, COUNTING, CROWDS, DENSITY, ATTENTION
- Abstract
Recently, research on crowd counting has attracted increasing attention but still faces many challenges, such as crowded scenes, scale variations and cluttered backgrounds. With the development of deep learning, density maps are widely used for crowd counting, where the quality of density maps plays a crucial role in counting performance. In this paper, we propose a new convolutional network architecture, called double multi-scale feature fusion network (DMFFNet), to generate high-quality density maps and accurate counting estimates. DMFFNet utilizes VGG19 to extract multi-scale feature maps from input images. The receptive fields of the features from the last three scales are further enlarged by three designed dilated feature pyramid modules, and the features are then fused together. Moreover, a feature enhancement module composed of spatial attention and channel-wise attention is presented to weight the fused feature maps for effectively distinguishing between crowd and background. We also design a new dual-scale loss to optimize the network during training. Experimental results show that DMFFNet reduces MAEs by at least 1.5%, 1.5%, 1.2%, 0.6% and 0.5% on the UCF_CC_50, UCF-QNRF, JHU-Crowd++, ShanghaiTech Part A and Part B datasets, and decreases MSEs by at least 1.8% and 0.1% on the JHU-Crowd++ and ShanghaiTech Part B datasets, as compared with the state-of-the-art. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. Crowd Counting Algorithm for Multi-Scale Fusion Based on Dual Branch Feature Extraction.
- Author
- ZENG Yunyun, ZHANG Hongying, and YUAN Mingdong
- Abstract
Crowd counting has important applications in public safety management, public space design, and other visual tasks such as behavior analysis and congestion analysis. However, complex backgrounds and varying head scales result in unsatisfactory crowd counting performance. To address scale changes and background interference in static images, a crowd counting network based on dual-branch intermediate feature extraction is proposed. The network follows the encoder-decoder structure and uses the first 16 layers of the VGG19 convolutional neural network in the encoding stage. To better fuse multi-scale information, it replaces the last 4 convolutions of those 16 layers with dilated convolutions with a dilation rate of 2. The decoding part uses a residual convolutional attention module (RCAM) to suppress background interference, and a dual-branch intermediate feature extraction module (DBFE) is inserted in the middle of the encoder-decoder structure. Branch 1 adopts a pyramid structure and integrates a position attention module to extract multi-scale contextual information; branch 2 follows a pyramid structure and integrates a dual-channel attention mechanism to focus the model on head information of different sizes; finally, a 1×1 convolution generates the density map. Comparison experiments are carried out on the ShanghaiTech PartA, ShanghaiTech PartB and Mall datasets. The mean absolute errors of the model on these datasets are 63.2, 7.1 and 1.80, and the root mean square errors are 99.2, 11.8 and 2.28, respectively. Comparative experimental analysis shows that the model has good counting performance and stability. Ablation experiments on ShanghaiTech PartB verify the effectiveness of each module of the model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
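To make concrete what swapping four convolutions for rate-2 dilated convolutions buys, the sketch below compares the receptive field of four stacked 3×3 layers with and without dilation. It uses the standard effective-kernel formula k + (k - 1)(d - 1). It assumes stride 1 and looks only at the four replaced layers, ignoring the earlier VGG19 layers and pooling; the function names are illustrative.

```python
def effective_kernel(kernel, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def stacked_receptive_field(layers):
    """Receptive field of stride-1 conv layers given as (kernel, dilation) pairs."""
    field = 1
    for kernel, dilation in layers:
        field += effective_kernel(kernel, dilation) - 1
    return field


plain = stacked_receptive_field([(3, 1)] * 4)    # four ordinary 3x3 convolutions
dilated = stacked_receptive_field([(3, 2)] * 4)  # four 3x3 convolutions, dilation 2
```

Under these assumptions the four plain layers see a 9×9 window and the four dilated layers see a 17×17 window, with the same number of weights.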
6. Device-Free Crowd Size Estimation Using Wireless Sensing on Subway Platforms.
- Author
- Janssens, Robin, Mannens, Erik, Berkvens, Rafael, and Denis, Stijn
- Abstract
Featured Application: This work presents the use of device-free wireless sensing for crowd size estimation on subway platforms. Dense urban environments pose significant challenges when it comes to detecting and measuring crowd size due to their nature of being free-flow environments containing many dynamic factors. In this paper, we use a wireless sensor network (WSN) to perform device-free crowd size estimation in a subway station. Our sensing solution uses the change in attenuation of the communication links between sensor nodes to estimate the number of people standing on the platform. In order to achieve this, we use the same attenuation information coming from the WSN to detect the presence of a rail vehicle in the station and compensate for the channel fading caused by the introduced rail vehicle. We make use of two separately trained regression models depending on the presence or absence of a rail vehicle to estimate the people count. The detection of rail vehicles occurred with a near-perfect accuracy. When evaluating the resulting estimation model on our test set, we achieved a mean average error of 3.567 people, which is a significant improvement over 6.192 people when using a single regression model. This demonstrates that device-free sensing technologies can be successfully implemented in dynamic environments by implementing detection techniques and using different regression models depending on the environment's state. [ABSTRACT FROM AUTHOR]
- Published
- 2024
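The idea of switching between two separately trained regression models depending on whether a rail vehicle is present can be sketched as follows. This is only an illustration under assumptions: the abstract does not say which regression model or features are used, so the example fits an ordinary least-squares line on a single made-up attenuation feature, and takes the vehicle-detection result as a given boolean. All names and numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


# Made-up training data: mean link attenuation (dB) versus people on the platform.
# A rail vehicle adds attenuation of its own, so each state gets its own model.
model_without_vehicle = fit_line([1.0, 2.0, 3.0], [5.0, 10.0, 15.0])
model_with_vehicle = fit_line([4.0, 5.0, 6.0], [5.0, 10.0, 15.0])


def estimate_people(attenuation_db, vehicle_present):
    """Pick the regression model that matches the detected platform state."""
    model = model_with_vehicle if vehicle_present else model_without_vehicle
    return predict(model, attenuation_db)
```

With this toy data, `estimate_people(5.0, True)` and `estimate_people(2.0, False)` both return 10 people, whereas a single model would read the extra attenuation from the vehicle as extra people.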
7. Synthetic Data for Video Surveillance Applications of Computer Vision: A Review.
- Author
- Delussu, Rita, Putzu, Lorenzo, and Fumera, Giorgio
- Subjects
- *OBJECT recognition (Computer vision), *COMPUTER vision, *IMAGE analysis, *BEHAVIORAL assessment, *APPLICATION software, *VIDEO surveillance, *DEEP learning
- Abstract
In recent years, there has been a growing interest in synthetic data for several computer vision applications, such as automotive, detection and tracking, surveillance, medical image analysis and robotics. Early use of synthetic data was aimed at performing controlled experiments under the analysis by synthesis approach. Currently, synthetic data are mainly used for training computer vision models, especially deep learning ones, to address well-known issues of real data, such as manual annotation effort, data imbalance and bias, and privacy-related restrictions. In this work, we survey the use of synthetic training data focusing on applications related to video surveillance, whose relevance has rapidly increased in the past few years due to their connection to security: crowd counting, object and pedestrian detection and tracking, behaviour analysis, person re-identification and face recognition. Synthetic training data are even more interesting in this kind of application, to address further, specific issues arising, e.g., from typically unconstrained image or video acquisition conditions and cross-scene application scenarios. We categorise and discuss the existing methods for creating synthetic data, analyse the synthetic data sets proposed in the literature for each of the considered applications, and provide an overview of their effectiveness as training data. We finally discuss whether and to what extent the existing synthetic data sets mitigate the issues of real data, highlight existing open issues, and suggest future research directions in this field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
8. Light-sensitive and adaptive fusion network for RGB-T crowd counting.
- Author
- Huang, Liangjun, Kang, Wencan, Chen, Guangkai, Zhang, Qing, and Zhang, Jianwei
- Subjects
- *FEATURE extraction, *MULTISENSOR data fusion, *CROWDS, *COUNTING
- Abstract
Mainstream RGB-T crowd counting methods use cross-modal complementary information to improve the counting accuracy. However, most of them neglect the effect of lighting variation on cross-modal data fusion. In this paper, we propose a Light-sensitive and Adaptive Fusion Network (LAFNet) for RGB-T crowd counting. Specifically, we present a Modality-specific Feature Extraction Module (MFEM) that fuses the lighting information, and a Light-sensitive and Adaptive Fusion Module (LAFM) that adjusts the fusion strategies of different modalities according to the lighting conditions of the input crowd images. Moreover, we propose an Improved Multi-scale Extraction Module (IMEM) to extract and fuse multi-modal at different scales. We evaluate our method on the RGBT-CC dataset and the experiment results show the validity of the model and its effectiveness in various scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. MRSNet: Multi-Resolution Scale Feature Fusion-Based Universal Density Counting Network.
- Author
- Zhang, Yi, Song, Wei, Shao, Mingyue, and Liu, Xiangchun
- Subjects
- *CONVOLUTIONAL neural networks, *TRANSFORMER models, *QUANTITATIVE research, *COUNTING, *GENERALIZATION
- Abstract
This study focuses on the problem of dense object counting. In dense scenes, variations in object scales and uneven distributions greatly hinder counting accuracy. The current methods, whether CNNs with fixed convolutional kernel sizes or Transformers with fixed attention sizes, struggle to handle such variability effectively. Lower-resolution features are more sensitive to larger objects closer to the camera, while higher-resolution features are more efficient for smaller objects further away. Thus, preserving features that carry the most relevant information at each scale is crucial for improving counting precision. Motivated by this, we propose a multi-resolution scale feature fusion-based universal density counting network (MRSNet). It utilizes independent modules to process high- and low-resolution features, adaptively adjusts receptive field sizes, and incorporates dynamic sparse attention mechanisms to optimize feature information at each resolution, by integrating optimal features across multiple scales into density maps for counting evaluation. Our proposed network effectively mitigates issues caused by large variations in object scales, thereby enhancing counting accuracy. Furthermore, extensive quantitative analyses on six public datasets demonstrate the algorithm's strong generalization ability in handling diverse object scale variations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
10. SPCANet: congested crowd counting via strip pooling combined attention network.
- Author
- Yuan, Zhongyuan
- Subjects
- CONVOLUTIONAL neural networks, COLLECTIVE behavior, BEHAVIORAL assessment, PUBLIC spaces, PUBLIC administration
- Abstract
Crowd counting aims to estimate the number and distribution of the population in crowded places, which is an important research direction in object counting. It is widely used in public place management, crowd behavior analysis, and other scenarios, showing its robust practicality. In recent years, crowd-counting technology has been developing rapidly. However, in highly crowded and noisy scenes, the counting performance of most models is still seriously affected by the distortion of view angle, dense occlusion, and inconsistent crowd distribution. Perspective distortion causes crowds to appear in different sizes and shapes in the image, and dense occlusion and inconsistent crowd distributions result in parts of the crowd not being captured completely. This ultimately results in the imperfect capture of spatial information in the model. To solve such problems, we propose a strip pooling combined attention (SPCANet) network model based on normed-deformable convolution (NDConv). We model long-distance dependencies more efficiently by introducing strip pooling. In contrast to traditional square kernel pooling, strip pooling uses long and narrow kernels (1×N or N×1) to deal with dense crowds, mutual occlusion, and overlap. Efficient channel attention (ECA), a mechanism for learning channel attention using a local cross-channel interaction strategy, is also introduced in SPCANet. This module generates channel attention through a fast 1D convolution to reduce model complexity while improving performance as much as possible. Four mainstream datasets, Shanghai Tech Part A, Shanghai Tech Part B, UCF-QNRF, and UCF CC 50, were used in extensive experiments, where the mean absolute error (MAE) improves on the baseline, reaching 60.9, 7.3, 90.8, and 161.1, respectively, validating the effectiveness of SPCANet. Meanwhile, the mean squared error (MSE) decreases by 5.7% on average over the four datasets, and the robustness is greatly improved. [ABSTRACT FROM AUTHOR]
- Published
- 2024
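The contrast between square pooling and the long, narrow 1×N and N×1 kernels of strip pooling can be illustrated with a minimal sketch. It shows only the pooling step: each row and each column of a feature map is averaged, and the two strips are broadcast back to the full grid and summed. The learned convolutions and gating that normally follow strip pooling are left out, and nothing here is claimed to match SPCANet's actual module; the function name is illustrative.

```python
def strip_pool(feature):
    """Average-pool a 2D feature map with 1xN and Nx1 strips, then combine.

    feature: list of H rows, each a list of W floats.
    Returns an H x W map where cell (i, j) is the mean of row i plus the
    mean of column j, so every cell carries context from its full row and column.
    """
    height, width = len(feature), len(feature[0])
    row_means = [sum(row) / width for row in feature]  # 1xN strips
    col_means = [sum(feature[i][j] for i in range(height)) / height
                 for j in range(width)]                # Nx1 strips
    return [[row_means[i] + col_means[j] for j in range(width)]
            for i in range(height)]


feature = [[1.0, 3.0],
           [5.0, 7.0]]
pooled = strip_pool(feature)
```

For this 2×2 input the row means are 2 and 6 and the column means are 3 and 5, which gives `[[5.0, 7.0], [9.0, 11.0]]`.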
11. Mask focal loss: a unifying framework for dense crowd counting with canonical object detection networks.
- Author
- Zhong, Xiaopin, Wang, Guankun, Liu, Weixiang, Wu, Zongze, and Deng, Yuanlong
- Subjects
- OBJECT recognition (Computer vision), DEEP learning, CROWDS, COUNTING, PUBLIC safety
- Abstract
As a fundamental computer vision task, crowd counting plays an important role in public safety. Currently, deep learning based head detection is a promising method for crowd counting. However, popular object detection networks cannot be readily applied to this problem for three reasons: (1) existing loss functions fail to address sample imbalance in highly dense and complex scenes; (2) canonical object detectors lack spatial coherence in loss calculation, disregarding the relationship between object location and background region; (3) most head detection datasets are annotated only with center points, i.e. without bounding boxes. To overcome these issues, we propose a novel Mask Focal Loss (MFL) based on heatmaps generated with a Gaussian kernel. MFL provides a unifying framework for loss functions based on both heatmap and binary feature map ground truths. Additionally, we introduce GTA_Head, a synthetic dataset with comprehensive annotations, for evaluation and comparison. Extensive experimental results demonstrate the superior performance of our MFL across various detectors and datasets, and it can reduce MAE and RMSE by up to 47.03% and 61.99%, respectively. Therefore, our work presents a strong foundation for advancing crowd counting methods based on density estimation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
12. MLANet: multi-level attention network with multi-scale feature fusion for crowd counting.
- Author
- Xiong, Liyan, Zeng, Yijuan, Huang, Xiaohui, Li, Zhida, and Huang, Peng
- Subjects
- *FEATURE extraction, *COUNTING, *CROWDS, *DENSITY
- Abstract
Estimating the population in a given scene is a process known as crowd counting. The field has recently garnered significant attention, and many innovative methods have emerged. However, intense scale variations and background interference make crowd counting in realistic scenes always challenging. To address these in this paper, a multi-level attention network with multi-scale feature fusion named MLANet is proposed. The network consists of three sections: a multi-level base feature extraction front-end network, a centralized dilated multi-scale feature fusion mid-end network with a global attention module, and a back-end network for the generation of density maps. By incorporating a flexible attention module and multi-scale features, the method can accurately capture crowd information at different scales and achieve accurate counting results. We evaluated the method on four public datasets (UCF_CC_50, ShanghaiTech, WorldExpo'10, and Beijing BRT), and the experimental results demonstrate a significant reduction in counting error when compared with existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
13. TinyCount: an efficient crowd counting network for intelligent surveillance.
- Author
- Lee, Hyeonbeen and Lee, Jangho
- Abstract
Crowd counting, the task of estimating the total number of people in an image, is essential for intelligent surveillance. Integrating a well-trained crowd counting network into edge devices, such as intelligent CCTV systems, enables its application across various domains, including the prevention of crowd collapses and urban planning. For a model to be embedded in edge devices, it requires robust performance, reduced parameter count, and faster response times. This study proposes a lightweight and powerful model called TinyCount, which has only 60k parameters. The proposed TinyCount is a fully convolutional network consisting of a feature extract module (FEM) for robust and rapid feature extraction, a scale perception module (SPM) for scale variation perception and an upsampling module (UM) that adjusts the feature map to the same size as the original image. TinyCount demonstrated competitive performance across three representative crowd counting datasets, despite utilizing approximately 3.33 to 271 times fewer parameters than other crowd counting approaches. The proposed model achieved relatively fast inference times by leveraging the MobileNetV2 architecture with dilated and transposed convolutions. The application of SEblock and findings from existing studies further proved its effectiveness. Finally, we evaluated the proposed TinyCount on multiple edge devices, including the Raspberry Pi 4, NVIDIA Jetson Nano, and NVIDIA Jetson AGX Xavier, to demonstrate its potential for practical applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
14. A three-stream fusion and self-differential attention network for multi-modal crowd counting.
- Author
- Tang, Haihan, Wang, Yi, Lin, Zhiping, Chau, Lap-Pui, and Zhuang, Huiping
- Subjects
- *COUNTING, *CROWDS
- Abstract
Multi-modal crowd counting aims at using multiple types of data, like RGB-Thermal and RGB-Depth, to count the number of people in crowded scenes. Current methods mainly focus on two-stream multi-modal information fusing in the encoder and single-scale semantic features in the decoder. In this paper, we propose an end-to-end three-stream fusion and self-differential attention network to simultaneously address the multi-modal fusion and scale variation problems for multi-modal crowd counting. Specifically, the encoder adopts three-stream fusion to fuse stage-wise modality-paired and modality-specific features. The decoder applies a self-differential attention mechanism on multi-level fused features to extract basic and differential information adaptively, and finally, the counting head predicts the density map. Experimental results on RGB-T and RGB-D benchmarks show the superiority of our proposed method compared with the state-of-the-art multi-modal crowd counting methods. Ablation studies and visualization demonstrate the advantages of the proposed modules in our model.
• We propose a novel multi-modal crowd counting model to address information fusion and scale variation problems.
• The model uses the three-stream fusion encoder with IIM to fuse modality-paired and modality-specific features.
• The model adaptively integrates multi-scale features by SDAM to emphasize discriminative scale information.
• Our method outperforms its counterparts and performs consistently well in the daytime and nighttime.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
15. GTL-ASENet: global to local adaptive spatial encoder network for crowd counting.
- Author
- Liu, Chengming, Hu, Guanzhong, Li, Yinghao, Gao, Yufei, and Shi, Lei
- Subjects
- COUNTING, CROWDS, POPULATION density, PIXELS
- Abstract
Crowd counting from a single image is a challenging task due to perspective distortion and large scale variation in crowd scenes. Many studies focus only on local features to create density maps, which is not effective in handling these challenges. This paper proposes a novel network named global-to-local adaptive spatial encoder network, which uses global features to generate a structural density map of the overall population distribution and then uses local features to refine that map in detail, producing a high-quality density map. To capture global features and local information and to correlate them, we design a contextual module using different kernels with convolution and transposed convolution. To create a density map from global structure to local detail, two branches are designed: the global distribution branch and the local detail branch. The former aims to capture the population distribution region of interest in terms of global structure, and the latter aims to focus on the local details of each unit. Furthermore, to overcome the problem of the pixel-wise MSE loss, this paper proposes an efficient loss function that focuses on perceiving the possible crowd distribution over the whole image. We also apply a new upsampling mechanism that learns to create high-quality density maps on its own. The proposed network can capture the characteristics of pedestrian distribution and predict accurate results. It is evaluated on four crowd counting datasets (ShanghaiTech, NWPU, UCF_QNRF, UCF_CC_50): it obtains an MAE of 67.1 and an MSE of 108.8 on ShanghaiTech, and an MAE of 139.2 and the best MSE of 217.7 on the UCF_CC_50 dataset, and our method shows state-of-the-art performance on all the datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
16. Multiscale aggregation network via smooth inverse map for crowd counting.
- Author
- Guo, Xiangyu, Gao, Mingliang, Zhai, Wenzhe, Li, Qilei, Pan, Jinfeng, and Zou, Guofeng
- Subjects
- CONVOLUTIONAL neural networks, SMART cities, COMPUTER vision, MAPS
- Abstract
Crowd counting is a practical yet essential research topic in computer vision, which has been beneficial to diverse applications in smart city environment safety. The commonly adopted paradigm in most existing methods is to regress a Gaussian density map that works as the learning objective during model training. However, given the unavoidable identity occlusion and scale variation in a crowd image, the corresponding Gaussian density map is degraded, failing to provide reliable supervision for optimization. To address this problem, we propose to replace the traditional Gaussian density map with a better alternative, namely the smooth inverse map (SIM). The proposed SIM can reflect the head location spatially and provide a smooth gradient to stabilize the model learning. Besides, we want the method to learn more discriminative features to cope with the challenge of large scale variations. We introduce a multiscale aggregation (MA) module that adaptively fuses features from different hierarchies to enrich semantic information under diverse receptive fields. The SIM and MA are meant to be complementary modules to guide the model in learning an accurate density map. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed method compared with the state-of-the-art techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
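For readers unfamiliar with the Gaussian density map that this abstract treats as the conventional learning target, here is a minimal sketch of how such a map is typically built from head-point annotations. Each head contributes one Gaussian blob, so the map sums to the number of people. The fixed `sigma` and the per-head renormalisation over the image grid are assumptions of this sketch; the smooth inverse map (SIM) itself is not defined in the abstract and is not attempted here.

```python
import math


def gaussian_density_map(points, height, width, sigma=1.5):
    """Build a density map whose sum equals the number of annotated heads.

    points: list of (row, col) head positions.
    Each head adds a Gaussian blob renormalised to sum to 1 over the grid,
    so blobs near the border still count as exactly one person.
    """
    density = [[0.0] * width for _ in range(height)]
    for head_y, head_x in points:
        kernel = [[math.exp(-((y - head_y) ** 2 + (x - head_x) ** 2)
                            / (2 * sigma ** 2))
                   for x in range(width)] for y in range(height)]
        total = sum(map(sum, kernel))
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / total
    return density


density = gaussian_density_map([(2, 2), (5, 6)], height=8, width=8)
estimated_count = sum(map(sum, density))  # integrates to the head count
```

A counting network trained on such targets is scored by comparing the sum of its predicted map with the true head count.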
17. SA-DCPNet: Scale-aware deep convolutional pyramid network for crowd counting.
- Author
- Tyagi, Bhawana, Nigam, Swati, and Singh, Rajiv
- Subjects
- *STANDARD deviations, *PYRAMIDS, *DEEP learning, *COMPUTER vision, *VISUAL fields
- Abstract
Crowd counting is one of the most complex research topics in the field of computer vision. There are many challenges associated with this task, including severe occlusion, scale variation, and complex background. Multi-column networks are commonly used for crowd counting, but they suffer from scale variation and feature similarity, which leads to poor analysis of crowd sequences. To address these issues, we propose a scale-aware deep convolutional pyramid network for crowd counting. We have introduced a scale-aware deep convolutional pyramid module by integrating message passing and global attention mechanisms into a multi-column network. The proposed network minimizes the problem of scale variation using SA-DPCM and uses a multi-column variance loss function to handle issues with feature similarity. Experiments have been performed over the ShanghaiTech and UCF-CC-50 datasets, which show the better performance of the proposed method in terms of mean absolute error and root mean square error. [ABSTRACT FROM AUTHOR]
- Published
- 2024
18. Lightweight multi-scale network with attention for accurate and efficient crowd counting.
- Author
- Xi, Mengyuan and Yan, Hua
- Subjects
- *COMPUTER vision, *FEATURE extraction, *CROWDS, *COUNTING
- Abstract
Crowd counting is a significant task in computer vision, which aims to estimate the total number of people appearing in images or videos. However, it is still very challenging due to the huge scale variation and uneven density distributions in dense scenes. Moreover, although many works have been presented to tackle these issues, these methods always have a large number of parameters and high computational complexity, which limits their wide application on edge devices. In this work, we propose a lightweight method for accurate and efficient crowd counting, called lightweight multi-scale network with attention. It is mainly composed of four parts: a lightweight extractor, a multi-scale feature extraction module (MFEM), an attention-based fusion module (ABFM), and an efficient density map regressor. We carefully design the MFEM and ABFM to obtain rich scale representations, which is significantly beneficial for improving the counting accuracy. Moreover, a normalized union loss function is proposed to balance the contribution of samples with diverse density distributions. Extensive experiments carried out on six mainstream crowd datasets demonstrate that our proposed method achieves superior performance to other state-of-the-art methods with a small model size and low computational cost. [ABSTRACT FROM AUTHOR]
- Published
- 2024
19. CLDE-Net: crowd localization and density estimation based on CNN and transformer network.
- Author
- Hu, Yaocong, Lin, Yuanyuan, Yang, Huicheng, Liu, Bingyou, Wan, Guoyang, Hong, Jinwen, Xie, Chao, Wang, Wei, and Lu, Xiaobo
- Abstract
Given a crowd image, there are two ways for a human to approximate the count: exactly locating head points in each local region, or directly estimating the total number of people from the whole image. Imitating human visual perception, CNNs and transformers are the two mainstream models for crowd counting: a CNN has a strong ability to extract locality-oriented features, while a transformer is suited to modeling global dependencies. Based on this fact, the proposed CLDE-Net is the first study that performs exact localization and direct estimation with a hybrid of CNN and transformer. Specifically, the CNN searches all candidate head points in each local region, and the transformer learns the crowd density map with global receptive fields. Furthermore, we adopt two mechanisms to further boost crowd counting performance: (1) a cross-layer feature interaction module is employed to facilitate information transmission between the CNN and transformer branches, and (2) a dynamic factor generator is designed to adaptively fuse the results of head point localization and density map estimation. Extensive experiments show that the proposed CLDE-Net framework achieves state-of-the-art performance on multiple datasets for crowd counting. [ABSTRACT FROM AUTHOR]
- Published
- 2024
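The fusion of a localization result with a density-map result can be pictured with a small sketch. The abstract does not give the fusion formula, so the convex combination below is an assumption, and the blending factor `alpha` is passed in as a plain number rather than produced by a learned generator as in the paper. The function name is hypothetical.

```python
def fuse_counts(head_points, density_map, alpha):
    """Blend a localization-based count with a density-based count.

    head_points: list of detected head positions (count = number of points).
    density_map: 2D list of densities (count = sum of all cells).
    alpha:       weight in [0, 1] given to the localization count (assumed form).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    localization_count = len(head_points)
    density_count = sum(map(sum, density_map))
    return alpha * localization_count + (1.0 - alpha) * density_count


# Toy case: 3 heads were localized, while the density map integrates to 4.
fused = fuse_counts([(1, 1), (2, 5), (7, 3)],
                    [[1.0, 1.0], [1.0, 1.0]],
                    alpha=0.25)
```

A low `alpha` leans on the density estimate, which tends to be more reliable in very dense regions where individual heads are hard to separate; a high `alpha` leans on localization.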
20. AI Research on the Collection and Analysis of Floating Population Data Using WiFi Probe Requests.
- Author
- 이우익 and 박대우
- Abstract
Analysis of crowd flow can be used for public policy, corporate strategy, and commercial district analysis. Existing devices are installed on poles, buildings, etc. to detect WiFi signals from personal devices. However, these devices are often not located adjacent to actual sidewalks, do not have power at all times, or have limitations in collecting personal terminal identification information, especially due to restrictions under the strengthened Personal Information Protection Act. This paper designs a WiFi scanner to accurately collect WiFi signals. We also conduct research on a repeater that transmits smartphone MAC information to a cloud server in a way that does not violate the Personal Information Protection Act. In addition, it solves the power problem of existing communication transmitters with a self-generating system using solar power and piezoelectric power generation. The results of this paper will be used in commercial analysis, corporate strategy, and public policy, and will contribute to the development of wireless communication and artificial intelligence technology. [ABSTRACT FROM AUTHOR]
- Published
- 2024
21. Deep Learning Based Efficient Crowd Counting System.
- Author
-
Al-Ghanem, Waleed Khalid, Qazi, Emad Ul Haq, Faheem, Muhammad Hamza, and Quadri, Syed Shah Amanullah
- Subjects
CONVOLUTIONAL neural networks, DEEP learning, CROWDS
- Abstract
Estimating crowd counts is becoming crucial, as it can help in security surveillance, crowd monitoring, and event management. It is challenging to determine the approximate crowd size from an image of the crowd's density. Therefore, in this study, we propose a multi-headed convolutional neural network architecture for crowd counting, which comprises two main components: (i) the convolutional neural network, which extracts features across the whole input image, and (ii) the multi-headed layers, which make it easier to evaluate density maps and estimate the number of people in the input image. We used the public benchmark crowd-counting datasets UCF CC 50 and ShanghaiTech Parts A and B for model training and testing to validate the model's performance. To analyze the results, we used two metrics, Mean Absolute Error (MAE) and Mean Square Error (MSE), and compared the results of the proposed system with state-of-the-art crowd counting models. The results show the superiority of the proposed system. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
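The two metrics named in entry 21 can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names and the sample counts are hypothetical. Some crowd-counting papers report the square root of the squared error under the name "MSE"; the abstract does not say which form is used, so both are shown.

```python
import math

def mae(pred_counts, true_counts):
    """Mean Absolute Error over per-image crowd counts."""
    n = len(true_counts)
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / n

def mse(pred_counts, true_counts):
    """Mean Squared Error over per-image crowd counts."""
    n = len(true_counts)
    return sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts)) / n

def rmse(pred_counts, true_counts):
    """Root of the mean squared error (sometimes also labeled 'MSE')."""
    return math.sqrt(mse(pred_counts, true_counts))

# Hypothetical predicted and ground-truth counts for three test images.
pred = [102.0, 48.0, 310.0]
true = [100.0, 50.0, 300.0]

print(mae(pred, true))   # average absolute miss per image
print(mse(pred, true))   # penalizes the large miss on the third image
print(rmse(pred, true))  # same units as the counts
```

The squared-error variants weight large per-image misses more heavily, which is why they are often read as a robustness indicator alongside MAE.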
22. SPCANet: congested crowd counting via strip pooling combined attention network
- Author
-
Zhongyuan Yuan
- Subjects
Crowd counting, Convolutional neural network, Spatial pooling, Channel attention, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Crowd counting aims to estimate the number and distribution of the population in crowded places, which is an important research direction in object counting. It is widely used in public place management, crowd behavior analysis, and other scenarios, showing its robust practicality. In recent years, crowd-counting technology has been developing rapidly. However, in highly crowded and noisy scenes, the counting effect of most models is still seriously affected by the distortion of view angle, dense occlusion, and inconsistent crowd distribution. Perspective distortion causes crowds to appear in different sizes and shapes in the image, and dense occlusion and inconsistent crowd distributions result in parts of the crowd not being captured completely. This ultimately results in the imperfect capture of spatial information in the model. To solve such problems, we propose a strip pooling combined attention (SPCANet) network model based on normed-deformable convolution (NDConv). We model long-distance dependencies more efficiently by introducing strip pooling. In contrast to traditional square kernel pooling, strip pooling uses long and narrow kernels (1×N or N×1) to deal with dense crowds, mutual occlusion, and overlap. Efficient channel attention (ECA), a mechanism for learning channel attention using a local cross-channel interaction strategy, is also introduced in SPCANet. This module generates channel attention through a fast 1D convolution to reduce model complexity while improving performance as much as possible. Four mainstream datasets, Shanghai Tech Part A, Shanghai Tech Part B, UCF-QNRF, and UCF CC 50, were utilized in extensive experiments, and mean absolute error (MAE) exceeds the baseline, which is 60.9, 7.3, 90.8, and 161.1, validating the effectiveness of SPCANet. Meanwhile, mean squared error (MSE) decreases by 5.7% on average over the four datasets, and the robustness is greatly improved.
- Published
- 2024
- Full Text
- View/download PDF
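The strip pooling idea mentioned in entry 22 (long, narrow 1×N and N×1 kernels instead of square ones) can be illustrated with a small sketch. This is a simplified stand-in under stated assumptions, not SPCANet itself: it takes the strip length to be the full row or column, uses plain averaging, and omits the 1D convolutions, gating, and attention that a real module would apply. The function names and the toy feature map are hypothetical.

```python
def strip_pool(feature):
    """Average a 2D feature map along full-width and full-height strips.

    feature: H x W list of lists of floats.
    Returns (row_avg, col_avg): one value per row and one per column.
    """
    h, w = len(feature), len(feature[0])
    # Horizontal strips: each row is averaged into a single value.
    row_avg = [sum(row) / w for row in feature]
    # Vertical strips: each column is averaged into a single value.
    col_avg = [sum(feature[i][j] for i in range(h)) / h for j in range(w)]
    return row_avg, col_avg

def expand_and_add(row_avg, col_avg):
    """Broadcast both strip summaries back to H x W and add them."""
    return [[r + c for c in col_avg] for r in row_avg]

# Hypothetical 2 x 3 feature map.
fmap = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]

row_avg, col_avg = strip_pool(fmap)
context = expand_and_add(row_avg, col_avg)
print(row_avg, col_avg)
print(context)
```

Each output position now carries information from its entire row and column, which is the sense in which strip kernels capture long-distance dependencies that a small square kernel would miss.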
23. MPRNet: Multi-scale Pointwise Regression Network for Crowd Counting and Localization
- Author
-
Jia, Chenyan, Cheng, Zhitao, Leng, Yanlin, Wang, Junfeng, Tang, Yong, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Huang, De-Shuang, editor, Pan, Yijie, editor, and Guo, Jiayang, editor
- Published
- 2024
- Full Text
- View/download PDF
24. Crowd Counting and People Density Detection: An Overview
- Author
-
Jin, Fuqiang, Zhang, Zhaoguo, Ning, Yi, Lu, Yi, Song, Wei, Qin, Xingguo, Chen, Jinlong, Fournier-Viger, Philippe, Series Editor, Yao, Tang, editor, Chen, Shouchang, editor, Zhang, Zelin, editor, and Yan, Yingchen, editor
- Published
- 2024
- Full Text
- View/download PDF
25. Fuzzy Community Geometry Adaptive Density Map Generation for UAVs Crowd Counting
- Author
-
Yuan, Xin, Wang, Xinghu, Sun, Gen, Ma, Dan, Guo, Qiang, Liu, Zhijun, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Tan, Kay Chen, Series Editor, Qu, Yi, editor, Gu, Mancang, editor, Niu, Yifeng, editor, and Fu, Wenxing, editor
- Published
- 2024
- Full Text
- View/download PDF
26. Learning Transformation Maps for Crowd Analysis
- Author
-
Lian, Yu, Hu, Zhifei, Li, Xin, Zhang, Longxu, Zhang, Zhong, Gao, Song, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Tan, Kay Chen, Series Editor, Wang, Wei, editor, Liu, Xin, editor, Na, Zhenyu, editor, and Zhang, Baoju, editor
- Published
- 2024
- Full Text
- View/download PDF
27. Calculating Bus Occupancy by Deep Learning Algorithms
- Author
-
Yıldırım, Kevser Büşra, Kiraz, Berna, Sahmoud, Shaaban, Xhafa, Fatos, Series Editor, Hemanth, D. Jude, editor, Kose, Utku, editor, Patrut, Bogdan, editor, and Ersoy, Mevlut, editor
- Published
- 2024
- Full Text
- View/download PDF
28. A Deep Learning-Based Method for Classroom Crowd Counting and Localization
- Author
-
Ding, Qin, Yu, Chunyan, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Hong, Wenxing, editor, and Kanaparan, Geetha, editor
- Published
- 2024
- Full Text
- View/download PDF
29. FGENet: Fine-Grained Extraction Network for Congested Crowd Counting
- Author
-
Ma, Hao-Yuan, Zhang, Li, Wei, Xiang-Yi, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Rudinac, Stevan, editor, Hanjalic, Alan, editor, Liem, Cynthia, editor, Worring, Marcel, editor, Jónsson, Björn Þór, editor, Liu, Bei, editor, and Yamakata, Yoko, editor
- Published
- 2024
- Full Text
- View/download PDF
30. A Survey Analysis on Dental Caries Detection from RVG Images Using Deep Learning
- Author
-
Nageswari, P., Pareek, Piyush Kumar, Suresh Kumar, A., Aditya, Pai H., Guru Prasad, M. S., Kandasamy, Manivel, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Tan, Kay Chen, Series Editor, Shetty, N. R., editor, Prasad, N. H., editor, and Nagaraj, H. C., editor
- Published
- 2024
- Full Text
- View/download PDF
31. Computer Vision and Convolutional Neural Network for Dense Crowd Count Detection
- Author
-
Sirisha, D., Sambhu Prasad, S., Kumar, Subodh, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Sharma, Harish, editor, Chakravorty, Antorweep, editor, Hussain, Shahid, editor, and Kumari, Rajani, editor
- Published
- 2024
- Full Text
- View/download PDF
32. HTNet: A Hybrid Model Boosted by Triple Self-attention for Crowd Counting
- Author
-
Li, Yang, Yin, Baoqun, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
33. Semantically Guided Bi-level Adaptation for Cross Domain Crowd Counting
- Author
-
Zhao, Muming, Xu, Weiqing, Zhang, Chongyang, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
34. PVT-Crowd: Bridging Multi-scale Features from Pyramid Vision Transformer for Weakly-Supervised Crowd Counting
- Author
-
Huo, Zhanqiang, Zhang, Kunwei, Luo, Fen, Qiao, Yingxu, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
35. DARN: Crowd Counting Network Guided by Double Attention Refinement
- Author
-
Chang, Shuhan, Zhong, Shan, Zhou, Lifan, Zhou, Xuanyu, Gong, Shengrong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
36. Repdistiller: Knowledge Distillation Scaled by Re-parameterization for Crowd Counting
- Author
-
Ni, Tian, Cao, Yuchen, Liang, Xiaoyu, Hu, Haoji, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
37. Cross-Modal Information Aggregation and Distribution Method for Crowd Counting
- Author
-
Chen, Yin, Zhou, Yuhao, Dong, Tianyang, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sheng, Bin, editor, Bi, Lei, editor, Kim, Jinman, editor, Magnenat-Thalmann, Nadia, editor, and Thalmann, Daniel, editor
- Published
- 2024
- Full Text
- View/download PDF
38. Loss Filtering Factor for Crowd Counting
- Author
-
Chen, Yufeng, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
39. Multiscale Network with Equivalent Large Kernel Attention for Crowd Counting
- Author
-
Wu, Zhiwei, Gong, Wenhui, Chen, Yan, Xia, Xiaofeng, Sang, Jun, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Luo, Biao, editor, Cheng, Long, editor, Wu, Zheng-Guang, editor, Li, Hongyi, editor, and Li, Chaojie, editor
- Published
- 2024
- Full Text
- View/download PDF
40. Progressive Crowd Enhancement De-Background Network for crowd counting
- Author
-
Wang, Lin, Li, Jie, Qi, Chun, Wang, Fengping, and Wang, Pan
- Published
- 2024
- Full Text
- View/download PDF
41. Self-attention Guidance Based Crowd Localization and Counting
- Author
-
Ma, Zhouzhou, Gu, Guanghua, and Zhao, Wenrui
- Published
- 2024
- Full Text
- View/download PDF
42. Counting dense object of multiple types based on feature enhancement.
- Author
-
Qiyan Fu, Weidong Min, Weixiang Sheng, and Chunjiang Peng
- Subjects
FEATURE extraction, COUNTING, PEDESTRIANS
- Abstract
Introduction: Accurately counting the number of dense objects in an image, such as pedestrians or vehicles, is a challenging and practical task. The existing density map regression methods based on CNN are mainly used to count a class of dense objects in a single scene. However, in complex traffic scenes, objects such as vehicles and pedestrians usually exist at the same time, and multiple classes of dense objects need to be counted simultaneously. Methods: To solve the above issues, we propose a new multiple types of dense object counting method based on feature enhancement, which can enhance the features of dense counting objects in complex traffic scenes to realize the classification and regression counting of dense vehicles and people. The counting model consists of the regression subnet and the classification subnet. The regression subnet is primarily used to generate two-channel predicted density maps, mainly including the initial feature layer and the feature enhancement layer, in which the feature enhancement layer can enhance the classification features and regression counting features of dense objects in complex traffic scenes. The classification subnet mainly supervises classifying dense vehicles and people into two feature channels to assist the regression counting task of the regression subnets. Results: Our method is compared on VisDrone+ datasets, ApolloScape+ datasets, and UAVDT+ datasets. The experimental results show that the method counts two kinds of dense objects simultaneously and outputs a high-quality two-channel predicted density map. The counting performance is better than the state-of-the-art counting network in dense people and vehicle counting. Discussion: In future work, we will further improve the feature extraction ability of the model in complex traffic scenes to classify and count a variety of dense objects such as cars, pedestrians, and non-motor vehicles. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
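Entry 42 describes a two-channel predicted density map, one channel per object class. In density-map regression the count for a class is read off by summing its channel. The sketch below shows only that read-out step; the channel names and values are hypothetical, and the regression and classification subnets themselves are not modeled.

```python
def counts_from_density(density_maps):
    """Sum each class's density channel to obtain its estimated count.

    density_maps: dict mapping a class name to an H x W list of floats.
    """
    return {
        name: sum(sum(row) for row in dmap)
        for name, dmap in density_maps.items()
    }

# Hypothetical 2 x 2 density channels for two object classes.
maps = {
    "people":   [[0.25, 0.25],
                 [0.25, 0.25]],
    "vehicles": [[0.5, 0.0],
                 [0.5, 1.0]],
}

counts = counts_from_density(maps)
print(counts)
```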
43. CC-DETR: DETR with Hybrid Context and Multi-Scale Coordinate Convolution for Crowd Counting.
- Author
-
Gu, Yanhong, Zhang, Tao, Hu, Yuxia, and Nian, Fudong
- Subjects
COUNTING, TRANSFORMER models, CROWDS
- Abstract
Prevailing crowd counting approaches primarily rely on density map regression methods. Despite wonderful progress, significant scale variations and complex background interference within the same image remain challenges. To address these issues, in this paper we propose a novel DETR-based crowd counting framework called Crowd Counting DETR (CC-DETR), which aims to extend the state-of-the-art DETR object detection framework to the crowd counting task. In CC-DETR, a DETR-like encoder–decoder structure (Hybrid Context DETR, i.e., HCDETR) is proposed to tackle complex visual information by fusing features from hybrid semantic levels through a transformer. In addition, we design a Coordinate Dilated Convolution Module (CDCM) to effectively employ position-sensitive context information in different scales. Extensive experiments on three challenging crowd counting datasets (ShanghaiTech, UCF-QNRF, and NWPU) demonstrate that our model is effective and competitive when compared against SOTA crowd counting models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Crowd Counting in Diverse Environments Using a Deep Routing Mechanism Informed by Crowd Density Levels.
- Author
-
Alhawsawi, Abdullah N, Khan, Sultan Daud, and Ur Rehman, Faizan
- Subjects
CROWDS, COUNTING, DENSITY, REGRESSION analysis
- Abstract
Automated crowd counting is a crucial aspect of surveillance, especially in the context of mass events attended by large populations. Traditional methods of manually counting the people attending an event are error-prone, necessitating the development of automated methods. Accurately estimating crowd counts across diverse scenes is challenging due to high variations in the sizes of human heads. Regression-based crowd-counting methods often overestimate counts in low-density situations, while detection-based models struggle in high-density scenarios to precisely detect the head. In this work, we propose a unified framework that integrates regression and detection models to estimate the crowd count in diverse scenes. Our approach leverages a routing strategy based on crowd density variations within an image. By classifying image patches into density levels and employing a Patch-Routing Module (PRM) for routing, the framework directs patches to either the Detection or Regression Network to estimate the crowd count. The proposed framework demonstrates superior performance across various datasets, showcasing its effectiveness in handling diverse scenes. By effectively integrating regression and detection models, our approach offers a comprehensive solution for accurate crowd counting in scenarios ranging from low-density to high-density situations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
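The routing strategy in entry 44 can be sketched as a dispatch loop over image patches. Everything concrete here is an assumption for illustration: the abstract does not specify the classifier, the threshold, or the two networks, so toy stand-ins are used. Sending low-density patches to detection and high-density patches to regression is inferred from the abstract's remark that regression tends to overestimate in sparse scenes while detection struggles in dense ones.

```python
def route_and_count(patches, classify, detect_count, regress_count):
    """Route each patch to a detector or a regressor and sum the counts."""
    total = 0.0
    for patch in patches:
        if classify(patch) == "low":
            total += detect_count(patch)   # sparse patch: count detections
        else:
            total += regress_count(patch)  # dense patch: integrate density
    return total

# Toy stand-ins (hypothetical): a patch is a 2D grid of density-like values.
def toy_classify(patch, threshold=2.0):
    return "low" if sum(map(sum, patch)) < threshold else "high"

def toy_detect_count(patch):
    # Pretend every cell with a strong response is one detected head.
    return float(sum(1 for row in patch for v in row if v >= 0.5))

def toy_regress_count(patch):
    # Density-map style estimate: sum of all values in the patch.
    return float(sum(map(sum, patch)))

sparse_patch = [[0.0, 1.0],
                [0.0, 0.0]]
dense_patch = [[0.75, 0.75],
               [0.75, 0.75]]

total = route_and_count(
    [sparse_patch, dense_patch], toy_classify, toy_detect_count, toy_regress_count
)
print(total)
```

Passing the classifier and the two counters as callables keeps the routing logic separate from whichever models are plugged in.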
45. Lightweight Res-Connection Multi-Branch Network for Highly Accurate Crowd Counting and Localization.
- Author
-
Mingze Li, Diwen Zheng, and Shuhua Lu
- Abstract
Crowd counting is a promising hotspot of computer vision involving crowd intelligence analysis, and it has achieved tremendous success recently with the development of deep learning. However, many challenges remain, including multi-scale crowd variations and high network complexity. To tackle these issues, a lightweight Res-connection multi-branch network (LRMBNet) for highly accurate crowd counting and localization is proposed. Specifically, using an improved ShuffleNet V2 as the backbone, a lightweight shallow extractor is designed that employs a channel compression mechanism to greatly reduce the number of network parameters. A light multi-branch structure with convolutions of different expansion rates is used to extract multi-scale features and enlarge receptive fields, where the transmission and fusion of information across features of diverse scales is enhanced via residual concatenation. In addition, a compound loss function is introduced for training the method to improve global context information correlation. The proposed method is evaluated on the SHHA, SHHB, UCF-QNRF and UCF_CC_50 public datasets. The accuracy is better than that of many advanced approaches, while the number of parameters is smaller. The experimental results show that the proposed method achieves a good tradeoff between complexity and accuracy, indicating a lightweight and high-precision method for crowd counting. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. CrowdGraph: Weakly supervised Crowd Counting via Pure Graph Neural Network.
- Author
-
Zhang, Chengyang, Zhang, Yong, Li, Bo, Piao, Xinglin, and Yin, Baocai
- Subjects
CONVOLUTIONAL neural networks, TRANSFORMER models, FEATURE extraction, CROWDS, COUNTING
- Abstract
Most existing weakly supervised crowd counting methods utilize Convolutional Neural Networks (CNN) or Transformer to estimate the total number of individuals in an image. However, both CNN-based (grid-to-count paradigm) and Transformer-based (sequence-to-count paradigm) methods take images as inputs in a regular form. This approach treats all pixels equally but cannot address the uneven distribution problem within human crowds. This challenge would lead to a decline in the counting performance of the model. Compared with grid and sequence, the graph structure could better explore the relationship among features. In this article, we propose a new graph-based crowd counting method named CrowdGraph, which reinterprets the weakly supervised crowd counting problem from a graph-to-count perspective. In the proposed CrowdGraph, each image is constructed as a graph, and a graph-based network is designed to extract features at the graph level. CrowdGraph comprises three main components: a dynamic graph convolutional backbone, a multi-scale dilated graph convolution module, and a regression head. To the best of our knowledge, CrowdGraph is the first method that is completely formulated based on the Graph Neural Network (GNN) for the crowd counting task. Extensive experiments demonstrate that the proposed CrowdGraph outperforms pure CNN-based and pure Transformer-based weakly supervised methods comprehensively and achieves highly competitive counting performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. JMFEEL-Net: a joint multi-scale feature enhancement and lightweight transformer network for crowd counting.
- Author
-
Wang, Mingtao, Zhou, Xin, and Chen, Yuanyuan
- Subjects
TRANSFORMER models, CONVOLUTIONAL neural networks, SENSOR networks, TEXT recognition, COUNTING
- Abstract
Crowd counting based on convolutional neural networks (CNNs) has made significant progress in recent years. However, the limited receptive field of CNNs makes it challenging to capture global features for comprehensive contextual modeling, resulting in insufficient accuracy in count estimation. In comparison, vision transformer (ViT)-based counting networks have demonstrated remarkable performance by exploiting their powerful global contextual modeling capabilities. However, ViT models are associated with higher computational costs and training difficulty. In this paper, we propose a novel network named JMFEEL-Net, which utilizes joint multi-scale feature enhancement and lightweight transformer to improve crowd counting accuracy. Specifically, we use a high-resolution CNN as the backbone network to generate high-resolution feature maps. In the backend network, we propose a multi-scale feature enhancement module to address the problem of low recognition accuracy caused by multi-scale variations, especially when counting small-scale objects in dense scenes. Furthermore, we introduce an improved lightweight ViT encoder to effectively model complex global contexts. We also adopt a multi-density map supervision strategy to learn crowd distribution features from feature maps of different resolutions, thereby improving the quality and training efficiency of the density maps. To validate the effectiveness of the proposed method, we conduct extensive experiments on four challenging datasets, namely ShanghaiTech Part A/B, UCF-QNRF, and JHU-Crowd++, achieving very competitive counting performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
48. SFPANet: Separation and fusion pyramid attention network for crowd counting.
- Author
-
Xiong, Li Yan, Deng, Huizi, Yi, Hu, Huang, Peng, and Zhou, Qiyun
- Abstract
Crowd counting methods have become increasingly mature. However, the problem of dramatic scale variation still exists. For this reason, we propose an efficient separated and fused pyramid attention network, which can extract multiscale features on channels and space and greatly alleviate the problem of dramatic scale variation. First, in order to extract the rich features on the channel, we design a separated and fused channel attention module, which is composed of two 3x3 convolution layers, a separated attention module, and a SE module. Second, we design a spatial contextual feature fusion module to fully extract multiscale features in spatial dimensions. Finally, we conduct comparison experiments with state-of-the-art methods on several challenging datasets, including the ShanghaiTech, UCF_CC_50, and WorldExpo'10 datasets. The experimental results show our method outperforms most of the state-of-the-art methods. We conduct ablation experiments on the ShanghaiTech Part A and Part B datasets to verify the importance of each submodule. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. Transformer-CNN hybrid network for crowd counting.
- Author
-
Yu, Jiamao, Yu, Ying, Qian, Jin, Han, Xing, Zhu, Feng, and Zhu, Zhiliang
- Subjects
TRANSFORMER models, CONVOLUTIONAL neural networks, FEATURE extraction, CROWDS, COUNTING, IMAGE representation
- Abstract
Efficient feature representation is the key to improving crowd counting performance. CNN and Transformer are the two commonly used feature extraction frameworks in the field of crowd counting. CNN excels at hierarchically extracting local features to obtain a multi-scale feature representation of the image, but it struggles with capturing global features. Transformer, on the other hand, could capture global feature representation by utilizing cascaded self-attention to capture remote dependency relationships, but it often overlooks local detail information. Therefore, relying solely on CNN or Transformer for crowd counting has certain limitations. In this paper, we propose the TCHNet crowd counting model by combining the CNN and Transformer frameworks. The model employs the CMT (CNNs Meet Vision Transformers) backbone network as the Feature Extraction Module (FEM) to hierarchically extract local and global features of the crowd using a combination of convolution and self-attention mechanisms. To obtain more comprehensive spatial local information, an improved Progressive Multi-scale Learning Process (PMLP) is introduced into the FEM, guiding the network to learn at different granularity levels. The features from these three different granularity levels are then fed into the Multi-scale Feature Aggregation Module (MFAM) for fusion. Finally, a Multi-Scale Regression Module (MSRM) is designed to handle the multi-scale fused features, resulting in crowd features rich in high-level semantics and low-level detail. Experimental results on five benchmark datasets demonstrate that TCHNet achieves highly competitive performance compared to some popular crowd counting methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. Estimation of Crowd Size Using Multiple People-Counting Methods and Regression Analysis.
- Author
-
今井 龍一, 山本 雄平, 姜 文渊, 中原 匡哉, 神谷 大介, and 野村 圭哉
- Abstract
Copyright of Japanese Journal of JSCE / Doboku Gakkai Ronbunshu is the property of Japan Society of Civil Engineers and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024