1
Zhang Y, Song W, Shao M, Liu X. MRSNet: Multi-Resolution Scale Feature Fusion-Based Universal Density Counting Network. SENSORS (BASEL, SWITZERLAND) 2024; 24:5974. [PMID: 39338718 PMCID: PMC11436111 DOI: 10.3390/s24185974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Revised: 08/18/2024] [Accepted: 09/12/2024] [Indexed: 09/30/2024]
Abstract
This study focuses on the problem of dense object counting. In dense scenes, variations in object scales and uneven distributions greatly hinder counting accuracy. The current methods, whether CNNs with fixed convolutional kernel sizes or Transformers with fixed attention sizes, struggle to handle such variability effectively. Lower-resolution features are more sensitive to larger objects closer to the camera, while higher-resolution features are more efficient for smaller objects further away. Thus, preserving features that carry the most relevant information at each scale is crucial for improving counting precision. Motivated by this, we propose a multi-resolution scale feature fusion-based universal density counting network (MRSNet). It utilizes independent modules to process high- and low-resolution features, adaptively adjusts receptive field sizes, and incorporates dynamic sparse attention mechanisms to optimize feature information at each resolution. The optimal features across multiple scales are then integrated into density maps for counting evaluation. Our proposed network effectively mitigates issues caused by large variations in object scales, thereby enhancing counting accuracy. Furthermore, extensive quantitative analyses on six public datasets demonstrate the algorithm's strong generalization ability in handling diverse object scale variations.
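For readers unfamiliar with density-map counting, the following is a minimal sketch of the general idea only, not MRSNet itself: each annotated point contributes a unit-mass Gaussian, the count is the sum of the map, and a sum-pooled lower-resolution map keeps the same count. The function names, kernel size, and sigma are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def density_map(points, shape, size=7, sigma=1.5):
    """Place one unit-mass Gaussian per annotated (row, col) point.

    Mass that falls outside the image border is cropped away, so points
    near the border contribute slightly less than 1 to the count.
    """
    h, w = shape
    pad = size // 2
    padded = np.zeros((h + 2 * pad, w + 2 * pad))
    k = gaussian_kernel(size, sigma)
    for r, c in points:
        # In padded coordinates the kernel centered on (r, c) spans
        # rows r..r+size-1 and columns c..c+size-1.
        padded[r:r + size, c:c + size] += k
    return padded[pad:pad + h, pad:pad + w]

def sum_pool_2x(dmap: np.ndarray) -> np.ndarray:
    """Halve the resolution by summing 2x2 blocks (the total is preserved)."""
    h, w = dmap.shape
    return dmap.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))

# Hypothetical annotations: three objects in a 64 x 64 image.
points = [(10, 10), (12, 30), (40, 25)]
dmap = density_map(points, (64, 64))
low = sum_pool_2x(dmap)
count_high = dmap.sum()
count_low = low.sum()
```

Summing rather than averaging when downsampling is what keeps the count unchanged across resolutions.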
Affiliation(s)
- Yi Zhang
  - School of Information and Engineering, Minzu University of China, Beijing 100081, China
- Wei Song
  - School of Information and Engineering, Minzu University of China, Beijing 100081, China
  - Language Information Security Research Center, Institute of National Security MUC, Minzu University of China, Beijing 100081, China
  - National Language Resource Monitoring and Research Center of Minority Languages, Minzu University of China, Beijing 100081, China
  - Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing 100081, China
- Mingyue Shao
  - School of Information and Engineering, Minzu University of China, Beijing 100081, China
- Xiangchun Liu
  - School of Information and Engineering, Minzu University of China, Beijing 100081, China
2
Guo X, Gao M, Zou G, Bruno A, Chehri A, Jeon G. Object Counting via Group and Graph Attention Network. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:11884-11895. [PMID: 38051606 DOI: 10.1109/tnnls.2023.3336894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
Object counting, defined as the task of accurately predicting the number of objects in static images or videos, has recently attracted considerable interest. However, the unavoidable presence of background noise prevents counting performance from advancing further. To address this issue, we created a group and graph attention network (GGANet) for dense object counting. GGANet is an encoder-decoder architecture incorporating a group channel attention (GCA) module and a learnable graph attention (LGA) module. The GCA module groups the feature map into several subfeatures, each of which is assigned an attention factor through the identical channel attention. The LGA module views the feature map as a graph structure in which the different channels represent diverse feature vertices, and the responses between channels represent edges. The GCA and LGA modules jointly avoid the interference of irrelevant pixels and suppress the background noise. Experiments are conducted on four crowd-counting datasets, two vehicle-counting datasets, one remote-sensing counting dataset, and one few-shot object-counting dataset. Comparative results prove that the proposed GGANet achieves superior counting performance.
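The grouping idea behind the GCA module can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the form of the shared channel attention, so a squeeze-and-excitation style block (global average pooling, two small linear maps, sigmoid) is assumed here, and the weights `w1` and `w2` are random placeholders rather than learned parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def group_channel_attention(feat, groups, w1, w2):
    """Apply one shared channel attention to each group of channels.

    feat: array of shape (C, H, W); C must be divisible by `groups`.
    w1:   (C // groups, hidden) weights, shared by all groups (assumed form).
    w2:   (hidden, C // groups) weights, shared by all groups (assumed form).
    Returns the reweighted feature map and the per-group attention factors.
    """
    c, h, w = feat.shape
    assert c % groups == 0, "channels must split evenly into groups"
    sub = feat.reshape(groups, c // groups, h, w)   # sub-features
    squeezed = sub.mean(axis=(2, 3))                # (G, C/G) global average pool
    hidden = np.maximum(squeezed @ w1, 0.0)         # ReLU
    factors = sigmoid(hidden @ w2)                  # (G, C/G), each in (0, 1)
    out = sub * factors[:, :, None, None]           # rescale every channel
    return out.reshape(c, h, w), factors

# Hypothetical sizes: 8 channels split into 4 groups of 2.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 2))
w2 = rng.standard_normal((2, 2))
out, factors = group_channel_attention(feat, groups=4, w1=w1, w2=w2)
```

Because the same `w1` and `w2` are reused for every group, the parameter count does not grow with the number of groups.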
3
Chen Y, Wang Q, Yang J, Chen B, Xiong H, Du S. Learning Discriminative Features for Crowd Counting. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2024; 33:3749-3764. [PMID: 38848225 DOI: 10.1109/tip.2024.3408609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2024]
Abstract
Crowd counting models in highly congested areas confront two main challenges: weak localization ability and difficulty in differentiating between foreground and background, leading to inaccurate estimations. The reason is that objects in highly congested areas are normally small, and the high-level features extracted by convolutional neural networks are less discriminative for representing small objects. To address these problems, we propose a learning discriminative features framework for crowd counting, which is composed of a masked feature prediction module (MPM) and a supervised pixel-level contrastive learning module (CLM). The MPM randomly masks feature vectors in the feature map and then reconstructs them, allowing the model to learn what is present in the masked regions and improving the model's ability to localize objects in high-density regions. The CLM pulls targets close to each other and pushes them far away from the background in the feature space, enabling the model to discriminate foreground objects from background. Additionally, the proposed modules can be beneficial in various computer vision tasks, such as crowd counting and object detection, where dense scenes or cluttered environments pose challenges to accurate localization. The two proposed modules are plug-and-play: incorporating them into existing models can potentially boost their performance in these scenarios.
4
Shu W, Wan J, Chan AB. Generalized Characteristic Function Loss for Crowd Analysis in the Frequency Domain. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:2882-2899. [PMID: 37995158 DOI: 10.1109/tpami.2023.3336196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2023]
Abstract
Typical approaches that learn crowd density maps are limited to extracting the supervisory information from the loosely organized spatial information in the crowd dot/density maps. This paper tackles this challenge by performing the supervision in the frequency domain. More specifically, we devise a new loss function for crowd analysis called generalized characteristic function loss (GCFL). This loss carries out two steps: 1) transforming the spatial information in density or dot maps to the frequency domain; 2) calculating a loss value between their frequency contents. For step 1, we establish a series of theoretical foundations by extending the definition of the characteristic function for probability distributions to density maps, as well as proving some vital properties of the extended characteristic function. After taking the characteristic function of the density map, its information in the frequency domain is well-organized and hierarchically distributed, while in the spatial domain it is loosely organized and dispersed everywhere. In step 2, we design a loss function that can fit the information organization in the frequency domain, allowing the exploitation of the well-organized frequency information for the supervision of crowd analysis tasks. The loss function can be adapted to various crowd analysis tasks through the specification of its window functions. In this paper, we demonstrate its power in three tasks: Crowd Counting, Crowd Localization and Noisy Crowd Counting. We show the advantages of our GCFL compared to other SOTA losses and its competitiveness to other SOTA methods by theoretical analysis and empirical results on benchmark datasets. Our codes are available at https://github.com/wbshu/Crowd_Counting_in_the_Frequency_Domain.
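The two steps named in the abstract can be illustrated with a simplified sketch. It is not the GCFL from the paper (see the authors' repository for that): it assumes the characteristic function of a map D is the sum over pixels x of D(x)·exp(i⟨t, x⟩), and it compares two maps with an unweighted mean absolute difference over a small, arbitrarily chosen frequency grid, with no window function. All names below are hypothetical.

```python
import numpy as np

def characteristic_function(dmap, freqs):
    """Evaluate sum_x D(x) * exp(i <t, x>) for every frequency t in `freqs`.

    dmap:  (H, W) density or dot map.
    freqs: (F, 2) array of (row, col) frequencies.
    """
    h, w = dmap.shape
    rows, cols = np.mgrid[0:h, 0:w]
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1)  # (H*W, 2)
    phase = coords @ freqs.T                                 # (H*W, F)
    return (dmap.ravel()[:, None] * np.exp(1j * phase)).sum(axis=0)

def cf_loss(pred, target, freqs):
    """Mean absolute difference between the two characteristic functions."""
    diff = characteristic_function(pred, freqs) - characteristic_function(target, freqs)
    return float(np.abs(diff).mean())

# A 5 x 5 frequency grid; index 12 is the zero frequency t = (0, 0).
ts = np.array([-0.3, -0.15, 0.0, 0.15, 0.3])
freqs = np.array([[a, b] for a in ts for b in ts])

# Hypothetical dot map with two annotated heads, and a prediction
# in which one head is misplaced by two columns.
dot = np.zeros((16, 16))
dot[3, 4] = 1.0
dot[10, 12] = 1.0
shifted = np.zeros((16, 16))
shifted[3, 6] = 1.0
shifted[10, 12] = 1.0

cf_dot = characteristic_function(dot, freqs)
loss_same = cf_loss(dot, dot, freqs)
loss_shifted = cf_loss(shifted, dot, freqs)
```

At the zero frequency the characteristic function reduces to the sum of the map, that is, the count, while the nonzero frequencies carry the positional information that makes `loss_shifted` positive.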
5
Wang J, Gao J, Yuan Y, Wang Q. Crowd Localization from Gaussian Mixture Scoped Knowledge and Scoped Teacher. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2023; PP:1802-1814. [PMID: 37028355 DOI: 10.1109/tip.2023.3251727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Crowd localization aims to predict the head position of each instance in crowd scenarios. Because pedestrians stand at varying distances from the camera, the scales of instances within an image differ tremendously, which is called the intrinsic scale shift. Intrinsic scale shift is one of the most essential issues in crowd localization because it is ubiquitous in crowd scenes and makes the scale distribution chaotic. To this end, the paper concentrates on tackling the chaos of the scale distribution incurred by intrinsic scale shift. We propose Gaussian Mixture Scope (GMS) to regularize the chaotic scale distribution. Concretely, the GMS utilizes a Gaussian mixture distribution to adapt to the scale distribution and decouples the mixture model into sub-normal distributions to regularize the chaos within the sub-distributions. Then, an alignment is introduced to regularize the chaos among sub-distributions. However, although GMS is effective in regularizing the data distribution, it amounts to dislodging the hard samples in the training set, which incurs overfitting. We attribute this to a blockage in transferring the latent knowledge exploited by GMS from data to model. Therefore, a Scoped Teacher is proposed to act as a bridge in this knowledge transfer. Moreover, consistency regularization is introduced to implement the knowledge transfer. To that effect, further constraints are deployed on the Scoped Teacher to derive feature consistency between the teacher and student ends. With the proposed GMS and Scoped Teacher implemented on four mainstream crowd localization datasets, extensive experiments demonstrate the superiority of our work. Moreover, compared with existing crowd locators, our work achieves state-of-the-art F1-measure on all four datasets.
6
Application of improved transformer based on weakly supervised in crowd localization and crowd counting. Sci Rep 2023; 13:1144. [PMID: 36670114 PMCID: PMC9859805 DOI: 10.1038/s41598-022-27299-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2022] [Accepted: 12/29/2022] [Indexed: 01/22/2023]
Abstract
Current crowd localization methods that use pseudo bounding boxes or pre-designed localization maps require complex pre-processing and post-processing to obtain head positions. To address this problem, this work proposes an end-to-end crowd localization framework named WSITrans, which reformulates the weakly-supervised crowd localization problem based on Transformer and implements crowd counting. Specifically, we first perform global maximum pooling (GMP) after each stage of the pure Transformer, which can extract and retain more detail of heads. In addition, we design a binarization module that binarizes the output features of the decoder and fuses the confidence score to obtain a more accurate confidence score. Finally, extensive experiments demonstrate that the proposed method achieves significant improvement on three challenging benchmarks. It is worth mentioning that WSITrans improves F1-measure by 4.0%.
7
Bai H, Mao J, Gary Chan SH. A survey on deep learning-based single image crowd counting: Network design, loss function and supervisory signal. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.08.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
8
Guo X, Gao M, Zhai W, Shang J, Li Q. Spatial-Frequency Attention Network for Crowd Counting. BIG DATA 2022; 10:453-465. [PMID: 35679590 DOI: 10.1089/big.2022.0039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Counting the number of people in crowded scenarios is a crucial task in video surveillance and urban security systems. Widely deployed surveillance cameras provide big data for training a compelling deep learning-based counting network. However, large-scale variations in dense crowds are still not entirely solved. To address this problem, we propose a spatial-frequency attention network (SFANet) for crowd counting in this article. A bottleneck spatial attention module is built to emphasize features in various spatial locations and adaptively select regions containing individuals in the spatial domain. As a complement, in the frequency domain, a multispectral channel attention module is adopted to obtain a more complete set of frequency components for representing each channel. The two attention modules are combined to focus on the discriminative region and suppress misleading information through their mutual promotion. Experimental results on five benchmark crowd data sets demonstrate that the SFANet can achieve state-of-the-art performance in terms of accuracy and robustness.
Affiliation(s)
- Xiangyu Guo
  - School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo, China
- Mingliang Gao
  - School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo, China
- Wenzhe Zhai
  - School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo, China
- Jianrun Shang
  - School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo, China
- Qilei Li
  - School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom
9
MACC Net: Multi-task attention crowd counting network. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03954-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]