1. Ma C, Neri F, Gu L, Wang Z, Wang J, Qing A, Wang Y. Crowd Counting Using Meta-Test-Time Adaptation. Int J Neural Syst 2024; 34:2450061. [PMID: 39252679 DOI: 10.1142/s0129065724500618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
Machine learning algorithms are commonly used for quickly and efficiently counting people from a crowd. Test-time adaptation methods for crowd counting adjust model parameters and employ additional data augmentation to better adapt the model to the specific conditions encountered during testing. The majority of current studies concentrate on unsupervised domain adaptation. These approaches commonly perform hundreds of epochs of training iterations, requiring a sizable number of unannotated data of every new target domain apart from annotated data of the source domain. Unlike these methods, we propose a meta-test-time adaptive crowd counting approach called CrowdTTA, which integrates the concept of test-time adaptation into the meta-learning framework and makes it easier for the counting model to adapt to the unknown test distributions. To facilitate the reliable supervision signal at the pixel level, we introduce uncertainty by inserting the dropout layer into the counting model. The uncertainty is then used to generate valuable pseudo labels, serving as effective supervisory signals for adapting the model. In the context of meta-learning, one image can be regarded as one task for crowd counting. In each iteration, our approach is a dual-level optimization process. In the inner update, we employ a self-supervised consistency loss function to optimize the model so as to simulate the parameters update process that occurs during the test phase. In the outer update, we authentically update the parameters based on the image with ground truth, improving the model's performance and making the pseudo labels more accurate in the next iteration. At test time, the input image is used for adapting the model before testing the image. In comparison to various supervised learning and domain adaptation methods, our results via extensive experiments on diverse datasets showcase the general adaptive capability of our approach across datasets with varying crowd densities and scales.
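The abstract does not say how the dropout-induced uncertainty is turned into pseudo labels. One common reading is Monte Carlo dropout: run several stochastic forward passes, take the per-pixel mean as the pseudo label, and trust only pixels whose variance is low. The sketch below illustrates that reading; the function names, the variance threshold `max_var`, and the toy model are all hypothetical and are not taken from CrowdTTA.

```python
import random
from statistics import mean, pvariance

def mc_dropout_pseudo_label(stochastic_predict, image, passes=20, max_var=0.05):
    """Return (pseudo_label, reliable_mask) from repeated stochastic passes.

    stochastic_predict: any callable mapping an image to a flat density map
    whose output changes between calls (e.g. a model with dropout left on).
    max_var is an assumed threshold, not a value from the paper.
    """
    samples = [stochastic_predict(image) for _ in range(passes)]
    per_pixel = list(zip(*samples))                 # group samples by pixel
    label = [mean(p) for p in per_pixel]            # per-pixel mean prediction
    reliable = [pvariance(p) <= max_var for p in per_pixel]  # low variance = trusted
    return label, reliable

def masked_l2(prediction, label, reliable):
    """Mean squared error computed over reliable pixels only."""
    terms = [(p - y) ** 2 for p, y, ok in zip(prediction, label, reliable) if ok]
    return sum(terms) / len(terms) if terms else 0.0

# Toy stand-in for a counting model (hypothetical): the "density" is the pixel
# value itself, perturbed by inverted dropout with drop probability 0.3.
rng = random.Random(0)

def toy_model(image, drop_p=0.3):
    return [0.0 if rng.random() < drop_p else v / (1.0 - drop_p) for v in image]

label, reliable = mc_dropout_pseudo_label(toy_model, [0.0, 0.1, 0.9], passes=50)
print(label, reliable)
```

In this toy setup the high-density pixel fluctuates most between passes, so it is the one most likely to be masked out of the adaptation loss.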
Affiliation(s)
- Chaoqun Ma: School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, P. R. China
- Ferrante Neri: NICE Group, School of Computer Science and Electronic Engineering, University of Surrey, Guildford, Surrey GU2 7XH, UK
- Li Gu: Department of Computer Science and Software Engineering, Concordia University, Montreal, QC H3H 2L9, Canada
- Ziqiang Wang: Department of Computer Science and Software Engineering, Concordia University, Montreal, QC H3H 2L9, Canada
- Jian Wang: Faculty of Electric Power Engineering, Kunming University of Science and Technology, Kunming 650500, P. R. China
- Anyong Qing: School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, P. R. China
- Yang Wang: Department of Computer Science and Software Engineering, Concordia University, Montreal, QC H3H 2L9, Canada
2. Chen J, Shi X, Zhang H, Li W, Li P, Yao Y, Miyazawa S, Song X, Shibasaki R. MobCovid: Confirmed Cases Dynamics Driven Time Series Prediction of Crowd in Urban Hotspot. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:13397-13410. [PMID: 37200115 DOI: 10.1109/tnnls.2023.3268291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/20/2023]
Abstract
Monitoring crowds in urban hot spots has been an important research topic in the field of urban management and has high social impact. It allows more flexible allocation of public resources, such as adjusting public transportation schedules and deploying police forces. Since 2020, because of the COVID-19 epidemic, public mobility patterns have been deeply affected by the epidemic situation, as close physical contact is the dominant route of infection. In this study, we propose a confirmed-case-driven time-series prediction model of crowds in urban hot spots, named MobCovid. The model is derived from Informer, a popular time-series prediction model proposed in 2021. It takes both the number of people staying downtown at night and the number of confirmed COVID-19 cases as input and predicts both targets. In the current period of COVID, many areas and countries have relaxed lockdown measures on public mobility, and outdoor travel is based on individual decisions. Reports of large numbers of confirmed cases would restrict public visits to crowded downtown areas. Still, governments publish policies to try to intervene in public mobility and control the spread of the virus. For example, in Japan, there are no compulsory measures to force people to stay at home, but there are measures to persuade people to stay away from downtown areas. Therefore, we also merge an encoding of government policies on mobility restriction into the model to improve precision. We use historical data on people staying at night in crowded downtown areas and on confirmed cases in the Tokyo and Osaka areas as a case study. Repeated comparisons with other baselines, including the original Informer model, demonstrate the effectiveness of our proposed method. We believe our work can contribute to current knowledge on forecasting crowd numbers in urban downtown areas during the COVID epidemic.
3. Zhu P, Li J, Cao B, Hu Q. Multi-Task Credible Pseudo-Label Learning for Semi-Supervised Crowd Counting. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:10394-10406. [PMID: 37022812 DOI: 10.1109/tnnls.2023.3241211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
As a widely used semi-supervised learning strategy, self-training generates pseudo-labels to alleviate the labor-intensive and time-consuming annotation problems in crowd counting while boosting the model performance with limited labeled data and massive unlabeled data. However, the noise in the pseudo-labels of the density maps greatly hinders the performance of semi-supervised crowd counting. Although auxiliary tasks, e.g., binary segmentation, are utilized to help improve the feature representation learning ability, they are isolated from the main task, i.e., density map regression and the multi-task relationships are totally ignored. To address the above issues, we develop a multi-task credible pseudo-label learning (MTCP) framework for crowd counting, consisting of three multi-task branches, i.e., density regression as the main task, and binary segmentation and confidence prediction as the auxiliary tasks. Multi-task learning is conducted on the labeled data by sharing the same feature extractor for all three tasks and taking multi-task relations into account. To reduce epistemic uncertainty, the labeled data are further expanded, by trimming the labeled data according to the predicted confidence map for low-confidence regions, which can be regarded as an effective data augmentation strategy. For unlabeled data, compared with the existing works that only use the pseudo-labels of binary segmentation, we generate credible pseudo-labels of density maps directly, which can reduce the noise in pseudo-labels and therefore decrease aleatoric uncertainty. Extensive comparisons on four crowd-counting datasets demonstrate the superiority of our proposed model over the competing methods. The code is available at: https://github.com/ljq2000/MTCP.
4. Dong L, Zhang H, Ma J, Xu X, Yang Y, Wu QMJ. CLRNet: A Cross Locality Relation Network for Crowd Counting in Videos. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:6408-6422. [PMID: 36215378 DOI: 10.1109/tnnls.2022.3209918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
In this article, we propose a new cross locality relation network (CLRNet) to generate high-quality crowd density maps for crowd counting in videos. Specifically, a cross locality relation module (CLRM) is proposed to enhance feature representations by modeling local dependencies of pixels between adjacent frames with an adapted local self-attention mechanism. First, different from the existing methods which measure similarity between pixels by dot product, a new adaptive cosine similarity is advanced to measure the relationship between two positions. Second, the traditional self-attention modules usually integrate the reconstructed features with the same weights for all the positions. However, crowd movement and background changes in a video sequence are uneven in real-life applications. As a consequence, it is inappropriate to treat all the positions in reconstructed features equally. To address this issue, a scene consistency attention map (SCAM) is developed to make CLRM pay more attention to the positions with strong correlations in adjacent frames. Furthermore, CLRM is incorporated into the network in a coarse-to-fine way to further enhance the representational capability of features. Experimental results demonstrate the effectiveness of our proposed CLRNet in comparison to the state-of-the-art methods on four public video datasets. The codes are available at: https://github.com/Amelie01/CLRNet.
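To make the contrast between dot-product and cosine similarity concrete, the following minimal sketch compares the two on toy feature vectors. It shows plain cosine similarity only; the "adaptive" variant proposed in CLRNet is not specified in the abstract and is not reproduced here. The vectors and function names are illustrative assumptions.

```python
import math

def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b, eps=1e-12):
    """Cosine similarity: the dot product normalised by both vector norms.

    eps is a small constant (an assumption) that guards against division by zero.
    """
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + eps)

# Two pixels whose features point the same way but differ in magnitude,
# e.g. the same pattern seen under different brightness (toy values).
dim = [0.1, 0.2]
bright = [10.0, 20.0]

print(dot(dim, dim), dot(dim, bright))        # dot product grows with magnitude
print(cosine(dim, dim), cosine(dim, bright))  # cosine depends on direction only
```

The dot product rewards large feature magnitudes, whereas cosine similarity scores both pairs as equally aligned, which is one plausible motivation for preferring a cosine-based measure between frames.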
5. Lu H, Liu L, Wang H, Cao Z. Counting Crowd by Weighing Counts: A Sequential Decision-Making Perspective. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:5141-5154. [PMID: 36094991 DOI: 10.1109/tnnls.2022.3202652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
We show that crowd counting can be formulated as a sequential decision-making (SDM) problem. Inspired by human counting, we evade one-step estimation mostly executed in existing counting models and decompose counting into sequential sub-decision problems. During implementation, a key insight is to interpret sequential counting as a physical process in reality: scale weighing. This analogy allows us to implement a novel "counting scale" termed LibraNet. Our idea is that, by placing a crowd image on the scale, LibraNet (agent) learns to place appropriate weights to match the count: at each step, one weight (action) is chosen from the weight box (the predefined action pool) conditioned on the image features and the placed weights (state) until the pointer (the agent output) informs balance. We investigate two forms of state definition and explore four types of LibraNet implementations under different learning paradigms, including deep Q-network (DQN), actor-critic (AC), imitation learning (IL), and mixed AC+IL. Experiments show that LibraNet indeed mimics scale weighing, that it outperforms or performs comparably against state-of-the-art approaches on five crowd counting benchmarks, that it can be used as a plug-in to improve off-the-shelf counting models, and particularly that it demonstrates remarkable cross-dataset generalization. Code and models are available at https://git.io/libranet.
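The weighing analogy can be illustrated with a small loop that places one weight per step until the scale balances. This is only a sketch of the state/action/termination structure: the policy below is an oracle that reads the true count, standing in for the learned agent, which in LibraNet is conditioned on image features instead. The contents of `weight_box` are hypothetical and are not the paper's action pool.

```python
def weigh_count(true_count, weight_box=(-10, -5, -2, -1, 1, 2, 5, 10), max_steps=50):
    """Place weights one at a time until the placed total matches the count.

    Oracle policy (an assumption for illustration): at each step pick the
    weight that brings the scale closest to balance.
    """
    placed = []   # state: the weights placed so far
    total = 0     # value currently on the weight pan
    for _ in range(max_steps):
        error = true_count - total   # pointer reading; 0 means balance
        if error == 0:
            break                    # termination: the scale is balanced
        action = min(weight_box, key=lambda w: abs(error - w))
        placed.append(action)
        total += action
    return total, placed

print(weigh_count(23))  # (23, [10, 10, 2, 1])
```

Negative weights let the loop correct an overshoot, so the final count emerges from several small decisions instead of one regression step.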
6. Zhai W, Gao M, Li Q, Jeon G, Anisetti M. FPANet: feature pyramid attention network for crowd counting. Appl Intell 2023. [DOI: 10.1007/s10489-023-04499-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2023]
7. Guo X, Gao M, Zhai W, Shang J, Li Q. Spatial-Frequency Attention Network for Crowd Counting. Big Data 2022; 10:453-465. [PMID: 35679590 DOI: 10.1089/big.2022.0039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Counting the number of people in crowded scenarios is a crucial task in video surveillance and urban security systems. Widely deployed surveillance cameras provide big data for training a compelling deep learning-based counting network. However, large-scale variations in dense crowds are still not entirely solved. To address this problem, we propose a spatial-frequency attention network (SFANet) for crowd counting in this article. A bottleneck spatial attention module is built to emphasize features in various spatial locations and select a region containing individuals adaptively in the spatial domain. As a complement, in the frequency domain, a multispectral channel attention module is adopted to obtain a more complete set of frequency components for representing each channel. The two attention modules are combined to focus on the discriminative region and suppress the misleading information by their mutual promotion. Experimental results on five benchmark crowd data sets demonstrate that the SFANet can achieve state-of-the-art performance in terms of accuracy and robustness.
Affiliation(s)
- Xiangyu Guo: School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo, China
- Mingliang Gao: School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo, China
- Wenzhe Zhai: School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo, China
- Jianrun Shang: School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo, China
- Qilei Li: School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom
8. Tang S, Pan Z, Hu G, Wu Y, Li Y. Meta-Knowledge and Multi-Task Learning-Based Multi-Scene Adaptive Crowd Counting. Sensors 2022; 22:s22093320. [PMID: 35591010 PMCID: PMC9104539 DOI: 10.3390/s22093320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Revised: 04/12/2022] [Accepted: 04/13/2022] [Indexed: 11/16/2022]
Abstract
In this paper, we propose a multi-scene adaptive crowd counting method based on meta-knowledge and multi-task learning. In practice, surveillance cameras are stationarily deployed in various scenes. Considering the extensibility of a surveillance system, the ideal crowd counting method should have a strong generalization capability to be deployed in unknown scenes. On the other hand, given the diversity of scenes, it should also effectively suit each scene for better performance. These two objectives are contradictory, so we propose a coarse-to-fine pipeline including a meta-knowledge network and multi-task learning. Specifically, at the coarse-grained stage, we propose a generic two-stream network for all existing scenes to encode meta-knowledge, especially inter-frame temporal knowledge. At the fine-grained stage, the regression of the crowd density map to the overall number of people in each scene is considered a homogeneous subtask in a multi-task framework. A robust multi-task learning algorithm is applied to effectively learn scene-specific regression parameters for existing and new scenes, which further improves the accuracy for each specific scene. Taking advantage of multi-task learning, the proposed method can be deployed to multiple new scenes without duplicated model training. Compared with two representative methods, namely AMSNet and MAML-counting, the proposed method reduces the MAE by 10.29% and 13.48%, respectively.
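For readers unfamiliar with the metric, the sketch below shows how mean absolute error (MAE) over per-image counts is computed and how a percentage reduction against a baseline is derived, presumably the sense in which the 10.29% and 13.48% figures are reported. All numbers in the example are made up for illustration and are not results from the paper.

```python
def mae(predicted_counts, true_counts):
    """Mean absolute error between predicted and ground-truth crowd counts."""
    errors = [abs(p - t) for p, t in zip(predicted_counts, true_counts)]
    return sum(errors) / len(errors)

def relative_reduction(baseline_mae, new_mae):
    """Percentage by which new_mae is lower than baseline_mae."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae

# Hypothetical counts for three test images (not from the paper).
truth = [12, 18, 33]
baseline_pred = [10, 22, 30]
new_pred = [11, 20, 31]

b = mae(baseline_pred, truth)   # (2 + 4 + 3) / 3 = 3.0
n = mae(new_pred, truth)        # (1 + 2 + 2) / 3
print(b, n, relative_reduction(b, n))
```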
Affiliation(s)
- Siqi Tang: Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China
- Zhisong Pan (corresponding author): Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China
- Guyu Hu: Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China
- Yang Wu: Beijing Information and Communications Technology Research Center, Beijing 100036, China
- Yunbo Li: Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China
9. RDC-SAL: Refine distance compensating with quantum scale-aware learning for crowd counting and localization. Appl Intell 2022. [DOI: 10.1007/s10489-022-03238-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/25/2022]
10. Wang W, Zhang J, Zhai W, Cao Y, Tao D. Robust Object Detection via Adversarial Novel Style Exploration. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society 2022; 31:1949-1962. [PMID: 35100117 DOI: 10.1109/tip.2022.3146017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
Deep object detection models trained on clean images may not generalize well on degraded images due to the well-known domain shift issue. This hinders their application in real-life scenarios such as video surveillance and autonomous driving. Though domain adaptation methods can adapt the detection model from a labeled source domain to an unlabeled target domain, they struggle in dealing with open and compound degradation types. In this paper, we attempt to address this problem in the context of object detection by proposing a robust object Detector via Adversarial Novel Style Exploration (DANSE). Technically, DANSE first disentangles images into domain-irrelevant content representation and domain-specific style representation under an adversarial learning framework. Then, it explores the style space to discover diverse novel degradation styles that are complementary to those of the target domain images by leveraging a novelty regularizer and a diversity regularizer. The clean source domain images are transferred into these discovered styles by using a content-preserving regularizer to ensure realism. These transferred source domain images are combined with the target domain images and used to train a robust degradation-agnostic object detection model via adversarial domain adaptation. Experiments on both synthetic and real benchmark scenarios confirm the superiority of DANSE over state-of-the-art methods.
11. Xie J, Gu L, Li Z, Lyu L. HRANet: Hierarchical region-aware network for crowd counting. Appl Intell 2022; 52:12191-12205. [PMID: 35125656 PMCID: PMC8807383 DOI: 10.1007/s10489-021-03030-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/23/2021] [Indexed: 11/18/2022]
Abstract
Aiming to tackle the most intractable problems of scale variation and complex backgrounds in crowd counting, we present an innovative framework called Hierarchical Region-Aware Network (HRANet) for crowd counting in this paper, which can better focus on crowd regions to accurately predict crowd density. In our implementation, first, we design a Region-Aware Module (RAM) to capture the internal differences within different regions of the feature map, thus adaptively extracting contextual features within different regions. Furthermore, we propose a Region Recalibration Module (RRM) which adopts a novel region-aware attention mechanism (RAAM) to further recalibrate the feature weights of different regions. By the integration of the above two modules, the influence of background regions can be effectively suppressed. Besides, considering the local correlations within different regions of the crowd density map, a Region Awareness Loss (RAL) is designed to reduce false identification while producing the locally consistent density map. Extensive experiments on five challenging datasets demonstrate that the proposed method significantly outperforms existing methods in terms of counting accuracy and quality of the generated density map. In addition, a series of specific experiments in crowd gathering scenes indicate that our method can be practically applied to crowd localization.
12
13. Yuan C, Jiao S, Sun X, Wu QMJ. MFFFLD: A Multi-modal Feature Fusion Based Fingerprint Liveness Detection. IEEE Trans Cogn Dev Syst 2021. [DOI: 10.1109/tcds.2021.3062624] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/10/2022]