1. Liu Z, Feng C, Yu K, Hu J, Yang J. GCReID: Generalized continual person re-identification via meta learning and knowledge accumulation. Neural Netw 2024; 179:106561. [PMID: 39084171 DOI: 10.1016/j.neunet.2024.106561] [Received: 07/16/2023] [Revised: 04/24/2024] [Accepted: 07/20/2024] [Indexed: 08/02/2024]
Abstract
Person re-identification (ReID) has made good progress in stationary domains. The ReID model must be retrained to adapt to new scenarios (domains) as they emerge unexpectedly, which leads to catastrophic forgetting. Continual learning trains the model in the order of domain emergence to alleviate catastrophic forgetting. However, the generalization ability of the model is still limited due to the distribution difference between training and testing domains. To address this problem, we propose the generalized continual person re-identification (GCReID) model to continuously train an anti-forgetting and generalizable model. We endeavor to increase the diversity of samples by prior to simulate unseen domains. Meta-train and meta-test are adopted to enhance the generalization of the model. Universal knowledge extracted from all seen domains and the simulated domains is stored in a set of feature embeddings. The knowledge is continually updated and applied to guide meta-train and meta-test via a graph attention network. Extensive experiments on 12 benchmark datasets and comparisons with 6 representative models demonstrate the effectiveness of GCReID in enhancing generalization performance on unseen domains and alleviating catastrophic forgetting of seen domains. The code will be available at https://github.com/DFLAG-NEU/GCReID if our work is accepted.
Affiliation(s)
- Zhaoshuo Liu
  - School of Computer Science and Engineering, Northeastern University, Shenyang, 110169, Liaoning, China
- Chaolu Feng
  - School of Computer Science and Engineering, Northeastern University, Shenyang, 110169, Liaoning, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Shenyang, 110169, Liaoning, China
- Kun Yu
  - College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110016, Liaoning, China
- Jun Hu
  - Neusoft Reach Automotive Technology Company, Shenyang, 110179, Liaoning, China
- Jinzhu Yang
  - School of Computer Science and Engineering, Northeastern University, Shenyang, 110169, Liaoning, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Shenyang, 110169, Liaoning, China
2. Shu W, Wan J, Chan AB. Generalized Characteristic Function Loss for Crowd Analysis in the Frequency Domain. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:2882-2899. [PMID: 37995158 DOI: 10.1109/tpami.2023.3336196] [Indexed: 11/25/2023]
Abstract
Typical approaches that learn crowd density maps are limited to extracting the supervisory information from the loosely organized spatial information in the crowd dot/density maps. This paper tackles this challenge by performing the supervision in the frequency domain. More specifically, we devise a new loss function for crowd analysis called generalized characteristic function loss (GCFL). This loss carries out two steps: 1) transforming the spatial information in density or dot maps to the frequency domain; 2) calculating a loss value between their frequency contents. For step 1, we establish a series of theoretical foundations by extending the definition of the characteristic function for probability distributions to density maps, as well as proving some vital properties of the extended characteristic function. After taking the characteristic function of the density map, its information in the frequency domain is well organized and hierarchically distributed, while in the spatial domain it is loosely organized and dispersed everywhere. In step 2, we design a loss function that can fit the information organization in the frequency domain, allowing the exploitation of the well-organized frequency information for the supervision of crowd analysis tasks. The loss function can be adapted to various crowd analysis tasks through the specification of its window functions. In this paper, we demonstrate its power in three tasks: Crowd Counting, Crowd Localization and Noisy Crowd Counting. We show the advantages of our GCFL compared to other SOTA losses and its competitiveness with other SOTA methods through theoretical analysis and empirical results on benchmark datasets. Our code is available at https://github.com/wbshu/Crowd_Counting_in_the_Frequency_Domain.
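The two steps named in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the sign convention of the transform, the mean absolute difference used as the loss, and the optional `window` weights are all assumptions made for illustration. The only property relied on is that a characteristic function evaluated at zero frequency equals the total mass of the map, which for a dot map is the head count.

```python
import numpy as np

def characteristic_function(density, freqs):
    """Step 1 (sketch): phi(t) = sum_x density[x] * exp(i * <t, x>).

    density: (H, W) density or dot map.
    freqs:   (F, 2) array of frequency vectors t = (t_row, t_col).
    Returns a complex array of shape (F,).
    """
    h, w = density.shape
    rows, cols = np.mgrid[0:h, 0:w]
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)  # (H*W, 2)
    phase = freqs @ coords.T                                              # (F, H*W)
    return np.exp(1j * phase) @ density.ravel()

def frequency_loss(pred, target, freqs, window=None):
    """Step 2 (sketch): mean absolute difference between the two transforms.

    window: optional (F,) non-negative weights over the sampled frequencies
    (a hypothetical stand-in for the window functions the abstract mentions).
    """
    diff = np.abs(characteristic_function(pred, freqs)
                  - characteristic_function(target, freqs))
    if window is not None:
        diff = diff * window
    return float(diff.mean())

# Usage: a dot map with three annotated heads on an 8x8 grid.
dots = np.zeros((8, 8))
dots[1, 2] = dots[4, 4] = dots[6, 1] = 1.0
axis = np.linspace(-0.3, 0.3, 7)
freqs = np.stack(np.meshgrid(axis, axis), axis=-1).reshape(-1, 2)  # (49, 2)
loss_same = frequency_loss(dots, dots, freqs)
loss_shifted = frequency_loss(np.roll(dots, 1, axis=1), dots, freqs)
```

Restricting `freqs` to a bounded low-frequency grid, as in the usage lines, is one simple way to decide which frequency content is compared; the paper's actual choice may differ.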
3. Wang Y, Yang B, Wang X, Liang C, Chen J. SATCount: A scale-aware transformer-based class-agnostic counting framework. Neural Netw 2024; 172:106126. [PMID: 38244354 DOI: 10.1016/j.neunet.2024.106126] [Received: 04/25/2023] [Revised: 12/07/2023] [Accepted: 01/11/2024] [Indexed: 01/22/2024]
Abstract
This paper studies the class-agnostic counting problem, which aims to count objects regardless of their class and relies only on a limited number of exemplar objects. Existing methods usually extract visual features from query and exemplar images, compute similarity between them using convolution operations, and finally use this information to estimate object counts. However, these approaches often overlook the scale information of the exemplar objects, leading to lower counting accuracy for objects with multi-scale characteristics. Additionally, convolution operations are local linear matching processes that may result in a loss of semantic information, which can limit the performance of the counting algorithm. To address these issues, we devise a new scale-aware transformer-based feature fusion module that integrates visual and scale information of exemplar objects and models similarity between samples and queries using cross-attention. Finally, we propose an object counting algorithm based on a feature extraction backbone, a feature fusion module and a density map regression head, called SATCount. Our experiments on the FSC-147 and CARPK datasets demonstrate that our model outperforms state-of-the-art methods.
Affiliation(s)
- Yutian Wang
  - National Engineering Research Center for Multimedia Software, Wuhan University, China; Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, China; School of Computer Science, Wuhan University, Wuhan, 430072, China
- Bin Yang
  - National Engineering Research Center for Multimedia Software, Wuhan University, China; Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, China; School of Computer Science, Wuhan University, Wuhan, 430072, China
- Xi Wang
  - Hubei Huazhong Electric Power Technology Development Co., Ltd., China
- Chao Liang
  - National Engineering Research Center for Multimedia Software, Wuhan University, China; Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, China; School of Computer Science, Wuhan University, Wuhan, 430072, China
- Jun Chen
  - National Engineering Research Center for Multimedia Software, Wuhan University, China; Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, China; School of Computer Science, Wuhan University, Wuhan, 430072, China
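The SATCount abstract above describes fusing the visual and scale information of exemplars and matching them against the query image with cross-attention. The sketch below shows one plausible reading in plain NumPy, using single-head scaled dot-product attention. It is not the published architecture: the function name, the linear scale projection `w_scale`, and the choice to add the scale embedding to the exemplar tokens are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scale_aware_cross_attention(query_tokens, exemplar_tokens, exemplar_sizes, w_scale):
    """Query-image tokens attend to scale-augmented exemplar tokens.

    query_tokens:    (N, D) features of the query image, one row per location.
    exemplar_tokens: (K, D) visual features of the K exemplar boxes.
    exemplar_sizes:  (K, 2) normalized (width, height) of each exemplar box.
    w_scale:         (2, D) projection of box size into feature space
                     (hypothetical; a learned parameter in a real model).
    Returns (fused, attn) with shapes (N, D) and (N, K).
    """
    # Inject scale: each exemplar token carries its appearance plus its size.
    keys = exemplar_tokens + exemplar_sizes @ w_scale
    d = query_tokens.shape[-1]
    # Similarity of every query location to every exemplar.
    attn = softmax(query_tokens @ keys.T / np.sqrt(d), axis=-1)
    # Each query location becomes a weighted mix of exemplar tokens.
    return attn @ keys, attn

# Usage with random stand-in features.
rng = np.random.default_rng(0)
query = rng.normal(size=(6, 4))
exemplars = rng.normal(size=(3, 4))
sizes = np.array([[0.1, 0.2], [0.3, 0.3], [0.5, 0.4]])
fused, attn = scale_aware_cross_attention(query, exemplars, sizes, rng.normal(size=(2, 4)))
```

Unlike a convolution of the exemplar over the query feature map, every query location here is compared against every exemplar token at once, which is the kind of global matching the abstract contrasts with local linear matching.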
4. Bai H, Mao J, Gary Chan SH. A survey on deep learning-based single image crowd counting: Network design, loss function and supervisory signal. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.08.037] [Indexed: 11/29/2022]
5. Salient double reconstruction-based discriminative projective dictionary pair learning for crowd counting. Appl Intell 2022. [DOI: 10.1007/s10489-022-03607-z] [Indexed: 11/02/2022]
6. Zhou L, Wang P, Li W, Leng J, Lei B. Semantic-refined spatial pyramid network for crowd counting. Pattern Recognit Lett 2022. [DOI: 10.1016/j.patrec.2022.04.029] [Indexed: 10/18/2022]