1. Liu M, Bian Y, Liu Q, Wang X, Wang Y. Weakly Supervised Tracklet Association Learning With Video Labels for Person Re-Identification. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:3595-3607. [PMID: 38133978] [DOI: 10.1109/tpami.2023.3346168]
Abstract
Supervised person re-identification (re-id) methods incur expensive manual labeling costs. Although unsupervised re-id methods reduce the need for labeled datasets, their performance lags behind that of supervised alternatives. Recently, several weakly supervised person re-id methods have been proposed that strike a balance between supervised and unsupervised learning. Nevertheless, most of these models either require an auxiliary fully supervised dataset or ignore the interference of noisy tracklets. To address this problem, we formulate a weakly supervised tracklet association learning (WS-TAL) model that leverages only video labels. Specifically, we first propose an intra-bag tracklet discrimination learning (ITDL) term, which captures the associations between person identities and images by assigning a pseudo label to each person image in a bag. Discriminative features for each person are then learned from the obtained associations after filtering out noisy tracklets. On this basis, a cross-bag tracklet association learning (CTAL) term is presented to explore potential tracklet associations between bags by mining reliable positive tracklet pairs and hard negative pairs. Finally, the two complementary terms are jointly optimized to train our re-id model. Extensive experiments on weakly labeled datasets demonstrate that WS-TAL achieves 88.1% and 90.3% rank-1 accuracy on the MARS and DukeMTMC-VideoReID datasets, respectively. It surpasses state-of-the-art weakly supervised models by a large margin and even outperforms some fully supervised re-id models.
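The abstract describes the ITDL pseudo-label assignment only at a high level. As a rough illustration, a minimal sketch might look as follows, assuming a softmax identity classifier restricted to the identities known to appear in a bag and a confidence threshold tau for filtering noisy tracklets (both are our assumptions, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def assign_bag_pseudo_labels(tracklet_feats, bag_labels, classifier, tau=0.5):
    """tracklet_feats: (T, D) features of the T tracklets in one bag.
    bag_labels: 1-D LongTensor of identity indices in the bag's video label.
    classifier: a module mapping D-dim features to num_identities logits."""
    logits = classifier(tracklet_feats)              # (T, C) identity scores
    probs = F.softmax(logits, dim=1)[:, bag_labels]  # keep only identities in this bag
    conf, idx = probs.max(dim=1)                     # most likely bag identity per tracklet
    pseudo_labels = bag_labels[idx]                  # map back to global identity ids
    keep = conf > tau                                # filter out noisy tracklets
    return pseudo_labels, keep
```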
2. Zhang H, Liu M, Li Y, Yan M, Gao Z, Chang X, Nie L. Attribute-Guided Collaborative Learning for Partial Person Re-Identification. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:14144-14160. [PMID: 37669202] [DOI: 10.1109/tpami.2023.3312302]
Abstract
Partial person re-identification (ReID) aims to solve the spatial misalignment caused by occlusions or out-of-view body parts. Despite significant progress made by introducing additional information such as human pose landmarks, mask maps, and spatial cues, partial person ReID remains challenging due to noisy keypoints and noise-susceptible pedestrian representations. To address these issues, we propose a unified attribute-guided collaborative learning scheme for partial person ReID. Specifically, we introduce an adaptive threshold-guided masked graph convolutional network that dynamically removes untrustworthy edges to suppress the diffusion of noisy keypoints. Furthermore, we incorporate human attributes and devise a cyclic heterogeneous graph convolutional network that fuses cross-modal pedestrian information through intra- and inter-graph interaction, yielding robust pedestrian representations. Finally, to enhance keypoint representation learning, we design a novel part-based similarity constraint that exploits the axial symmetry of the human body. Extensive experiments on multiple public datasets show that our model outperforms state-of-the-art baselines.
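As a rough illustration of the adaptive threshold-guided edge masking described above, here is a minimal single-layer sketch; the per-node threshold rule (the row mean of the adjacency weights) and the layer structure are our assumptions, not the paper's specification:

```python
import torch
import torch.nn as nn

class MaskedGCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        """x: (N, in_dim) keypoint features; adj: (N, N) soft edge weights."""
        thr = adj.mean(dim=1, keepdim=True)         # adaptive per-node threshold
        adj = adj * (adj >= thr).float()            # drop untrustworthy edges
        adj = adj / adj.sum(dim=1, keepdim=True).clamp(min=1e-6)  # renormalize rows
        return torch.relu(self.proj(adj @ x))       # propagate only over trusted edges
```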
3. Yang B, Chen J, Ma X, Ye M. Translation, Association and Augmentation: Learning Cross-Modality Re-Identification From Single-Modality Annotation. IEEE Transactions on Image Processing 2023; 32:5099-5113. [PMID: 37669187] [DOI: 10.1109/tip.2023.3310338]
Abstract
Visible-infrared person re-identification (VI-ReID), which matches daytime visible (RGB) images against night-time infrared (IR) images, is a challenging cross-modality pedestrian retrieval problem. Training a cross-modality ReID model requires plenty of cross-modality (visible-infrared) identity labels, which are more expensive to obtain than single-modality labels. To alleviate this issue, this paper studies the unsupervised domain adaptive visible-infrared person re-identification (UDA-VI-ReID) task, which requires no cross-modality annotation. To transfer knowledge learned from a labelled visible source domain to an unlabelled visible-infrared target domain, we propose a Translation, Association and Augmentation (TAA) framework. Specifically, a modality translator first converts visible images into infrared images, producing generated visible-infrared image pairs for cross-modality supervised training. A Robust Association and Mutual Learning (RAML) module is then designed to exploit the underlying relations between the visible and infrared modalities for label noise modeling. Moreover, a Translation Supervision and Feature Augmentation (TSFA) module enhances discriminability by enriching the supervision with feature augmentation and modality translation. Extensive experimental results demonstrate that our method significantly outperforms current state-of-the-art unsupervised methods under various settings, and even surpasses some supervised counterparts, providing a powerful baseline for UDA-VI-ReID.
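The translation step lends itself to a compact sketch: a generator converts labelled visible images into pseudo-infrared images that inherit the same identity labels, yielding supervised cross-modality pairs. The `translator` interface below is an assumption; the paper's generator is not specified in the abstract:

```python
import torch

def make_cross_modality_pairs(rgb_batch, labels, translator):
    """rgb_batch: (B, 3, H, W) labelled visible images; labels: (B,) identities.
    translator: hypothetical visible-to-infrared generator (assumed interface)."""
    with torch.no_grad():                  # the translator only supplies training data here
        fake_ir = translator(rgb_batch)    # (B, 3, H, W) pseudo-infrared images
    return rgb_batch, fake_ir, labels      # visible/infrared pair sharing identity labels
```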
4. Shen L, Li X, Pan Z, Sun X, Zhang Y, Zheng J. Image2Brain: a cross-modality model for blind stereoscopic image quality ranking. J Neural Eng 2023; 20:046041. [PMID: 37607552] [DOI: 10.1088/1741-2552/acf2c9]
Abstract
Objective. Human beings perceive stereoscopic image quality through the cerebral visual cortex, a complex brain activity. Stereoscopic image quality can therefore be evaluated more accurately by replicating, in a machine, the human perception of image quality captured in electroencephalogram (EEG) signals, in contrast to previous stereoscopic image quality assessment methods that focused only on extracting image features. Approach. Our proposed method is based on a novel image-to-brain (I2B) cross-modality model comprising a spatial-temporal EEG encoder (STEE) and an I2B deep convolutional generative adversarial network (I2B-DCGAN). Specifically, EEG representations are first learned by STEE and serve as real samples for I2B-DCGAN, which extracts both quality and semantic features from stereoscopic images via a semantic-guided image encoder and uses a generator to conditionally create the corresponding EEG features for each image. Finally, the generated EEG features are classified to predict the perceptual quality level of the image. Main results. Extensive experimental results on the collected brain-visual multimodal stereoscopic image quality ranking database demonstrate that the proposed I2B cross-modality model better emulates the visual perception mechanism of the human brain and outperforms the other methods, achieving an average accuracy of 95.95%. Significance. The proposed method converts learned stereoscopic image features into brain representations without requiring EEG signals during testing. Further experiments verify that the method generalizes well to new datasets and has potential for practical applications.
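The conditional generation step can be illustrated with a minimal sketch: a generator maps an image-feature condition plus noise to a synthetic EEG feature vector. All layer sizes and names here are illustrative assumptions; the paper's architecture is not given in the abstract:

```python
import torch
import torch.nn as nn

class I2BGenerator(nn.Module):
    def __init__(self, img_dim=512, noise_dim=128, eeg_dim=256):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(img_dim + noise_dim, 512), nn.ReLU(),
            nn.Linear(512, eeg_dim),
        )

    def forward(self, img_feat):
        # Condition on image features; the noise vector gives sample diversity.
        z = torch.randn(img_feat.size(0), self.noise_dim, device=img_feat.device)
        return self.net(torch.cat([img_feat, z], dim=1))  # synthetic EEG feature
```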
Affiliation(s)
- Lili Shen, Xintong Li, Zhaoqing Pan, Xichun Sun, Yixuan Zhang, Jianpu Zheng: School of Electrical and Information Engineering, Tianjin University, Tianjin, People's Republic of China
5. Wang Y, Su Y, Li W, Sun Z, Wei Z, Nie J, Li X, Liu AA. Rare-aware attention network for image–text matching. Inf Process Manag 2023. [DOI: 10.1016/j.ipm.2023.103280]
6. Wei X, Liu Q, Liu M, Wang Y, Meijering E. 3D Soma Detection in Large-Scale Whole Brain Images via a Two-Stage Neural Network. IEEE Transactions on Medical Imaging 2023; 42:148-157. [PMID: 36103445] [DOI: 10.1109/tmi.2022.3206605]
Abstract
3D soma detection in whole-brain images is a critical step for neuron reconstruction. However, existing soma detection methods are not suitable for whole mouse brain images, which involve huge amounts of data and complex structures. In this paper, we propose a two-stage deep neural network for fast and accurate soma detection in large-scale, high-resolution whole mouse brain images (more than 1 TB). In the first stage, a lightweight Multi-level Cross Classification Network (MCC-Net) filters out images without somas and generates coarse candidate images by combining the feature extraction abilities of multiple convolutional layers, which speeds up soma detection and reduces computational complexity. In the second stage, to obtain accurate soma locations in the whole mouse brain images, a Scale Fusion Segmentation Network (SFS-Net) segments soma regions from the candidate images. Specifically, SFS-Net captures multi-scale context information and establishes a complementary relationship between encoder and decoder by combining an encoder-decoder structure with a 3D Scale-Aware Pyramid Fusion (SAPF) module for better segmentation performance. Experimental results on three whole mouse brain images verify that the proposed method achieves excellent performance and provides beneficial information for neuron reconstruction. Additionally, we have established a public dataset named WBMSD, comprising 798 high-resolution, representative images (256 × 256 × 256 voxels) from three whole mouse brain images, dedicated to soma detection research, which will be released along with this paper.
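The two-stage pipeline itself is straightforward to sketch: a fast classifier rejects sub-volumes without somas, and the segmentation network runs only on the survivors. The block iteration, threshold, and network interfaces below are illustrative assumptions standing in for MCC-Net and SFS-Net:

```python
import torch

def detect_somas(volume_blocks, classifier, segmenter, p_soma=0.5):
    """volume_blocks: iterable of (1, D, H, W) sub-volumes tiled from the brain image.
    classifier: stand-in for MCC-Net, returns one soma-presence logit per block.
    segmenter: stand-in for SFS-Net, returns a per-voxel logit map."""
    masks = []
    for block in volume_blocks:
        block = block.unsqueeze(0)                       # (1, 1, D, H, W)
        if torch.sigmoid(classifier(block)).item() < p_soma:
            masks.append(torch.zeros_like(block, dtype=torch.bool))  # stage 1: reject
        else:
            masks.append(torch.sigmoid(segmenter(block)) > 0.5)      # stage 2: segment
    return masks
```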
7. Li Y, Zhang T, Liu X, Tian Q, Zhang Y, Wu F. Visible-Infrared Person Re-Identification With Modality-Specific Memory Network. IEEE Transactions on Image Processing 2022; 31:7165-7178. [PMID: 36367912] [DOI: 10.1109/tip.2022.3220408]
Abstract
Visible-infrared person re-identification (VI-ReID) is challenging due to the large modality discrepancy between visible and infrared images. Existing methods mainly focus on learning modality-shared representations by embedding images from different modalities into a common feature space, in which some discriminative modality information is discarded. In contrast, we propose a novel Modality-Specific Memory Network (MSMNet) that completes the missing modality information and aggregates visible and infrared modality features into a unified feature space for the VI-ReID task. The proposed model enjoys several merits. First, it exploits the missing modality information to alleviate the modality discrepancy when only a single-modality input is provided. To the best of our knowledge, this is the first work to complete missing modality information and alleviate the modality discrepancy with a memory network. Second, to guide the learning process of the memory network, we design three effective learning strategies: feature consistency, memory representativeness, and structural alignment. By incorporating these strategies in a unified model, the memory network learns to propagate identity-related information between modalities and boost VI-ReID performance. Extensive experimental results on two standard benchmarks (SYSU-MM01 and RegDB) demonstrate that MSMNet performs favorably against state-of-the-art methods.
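A minimal sketch of the memory-read idea follows, assuming a learnable slot memory queried by attention and simple concatenation as the aggregation rule (both our assumptions; the paper's three learning strategies are omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityMemory(nn.Module):
    def __init__(self, dim=256, slots=128):
        super().__init__()
        # Learnable prototypes standing in for the *other* modality.
        self.memory = nn.Parameter(torch.randn(slots, dim))

    def forward(self, feat):
        """feat: (B, dim) features from the observed single modality."""
        attn = F.softmax(feat @ self.memory.t() / feat.size(1) ** 0.5, dim=1)
        recalled = attn @ self.memory              # completed missing-modality feature
        return torch.cat([feat, recalled], dim=1)  # unified representation
```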
8. Application and Evaluation of Image-based Information Acquisition in Railway Transportation. J Intell Robot Syst 2022. [DOI: 10.1007/s10846-022-01652-x]
9. Swin Transformer Based on Two-Fold Loss and Background Adaptation Re-Ranking for Person Re-Identification. Electronics 2022. [DOI: 10.3390/electronics11131941]
Abstract
Person re-identification (Re-ID) aims to identify the same pedestrian across surveillance videos in various scenarios. Existing Re-ID models are biased toward learning background appearances when the pedestrian training set contains many background variations, so pedestrians with the same identity appearing against different backgrounds interfere with Re-ID performance. This paper proposes a Swin Transformer based on two-fold loss (TL-TransNet) that pays more attention to the semantic information of a pedestrian's body while preserving valuable background information, thereby reducing the interference of background appearance. TL-TransNet is supervised by two types of losses (circle loss and instance loss) during training. In the retrieval phase, DeepLabV3+ is applied as a pedestrian background segmentation model to generate body masks for the query and gallery sets, and background-removed images are produced from these masks to filter out interfering background information. Subsequently, a background adaptation re-ranking scheme combines the original information with the background-removed information, mining more positive samples with large background deviation. Extensive experiments on two public person Re-ID datasets show that the proposed method achieves competitive robustness against the background variation problem.
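The re-ranking step can be sketched compactly: distances computed on original images are fused with distances computed on background-removed images before sorting. The linear fusion weight alpha and plain Euclidean distance are illustrative assumptions, not the paper's exact rule:

```python
import torch

def background_adaptive_rank(q_feat, g_feats, q_feat_bg, g_feats_bg, alpha=0.5):
    """q_feat: (D,) query feature; g_feats: (G, D) gallery features, both from
    original images; *_bg: counterparts from background-removed images."""
    d_orig = torch.cdist(q_feat[None], g_feats)[0]      # (G,) original distances
    d_bg = torch.cdist(q_feat_bg[None], g_feats_bg)[0]  # (G,) background-removed
    fused = alpha * d_orig + (1 - alpha) * d_bg         # combine both views
    return fused.argsort()                              # gallery indices, best first
```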
10. An Adaptively Attention-Driven Cascade Part-Based Graph Embedding Framework for UAV Object Re-Identification. Remote Sensing 2022. [DOI: 10.3390/rs14061436]
Abstract
With the rapid development of unmanned aerial vehicles (UAVs), object re-identification (Re-ID) on UAV platforms has attracted increasing attention, and excellent results have been achieved in traditional scenarios. However, object Re-ID in aerial imagery acquired from UAVs remains challenging, mainly because the variable locations and diverse viewpoints of UAV platforms introduce appearance ambiguities both within and between objects. To address these issues, we propose an adaptively attention-driven cascade part-based graph embedding framework (AAD-CPGE) for UAV object Re-ID. AAD-CPGE optimally fuses node features and their topological characteristics over multi-scale structured graphs of object parts, and then adaptively learns the most correlated information to improve Re-ID performance. Specifically, we first apply GCNs to cascaded part-based node-feature graphs and topological-feature graphs to acquire multi-scale structured graph feature representations. We then design a self-attention-based module for adaptive fusion of node and topological features on the constructed hierarchical part-based graphs. Finally, the learned hybrid graph-structured features with the strongest discriminative capability are used for object Re-ID. Experiments on three widely used UAV-based benchmark datasets and comparisons with state-of-the-art object Re-ID approaches validate the effectiveness of the proposed AAD-CPGE framework.
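A minimal sketch of the self-attention fusion of the two branches follows, assuming per-part embeddings of equal dimension and standard multi-head attention over the two branch tokens (our assumptions; the paper's module may differ):

```python
import torch
import torch.nn as nn

class BranchAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, node_feat, topo_feat):
        """node_feat, topo_feat: (B, dim) embeddings from the node-feature
        and topology GCN branches, respectively."""
        tokens = torch.stack([node_feat, topo_feat], dim=1)  # (B, 2, dim)
        fused, _ = self.attn(tokens, tokens, tokens)         # adaptive re-weighting
        return fused.mean(dim=1)                             # (B, dim) hybrid feature
```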
11. Zhao K, Wang Y, Zuo Y, Zhang C. Palletizing Robot Positioning Bolt Detection Based on Improved YOLO-V3. J Intell Robot Syst 2022. [DOI: 10.1007/s10846-022-01580-w]
12. Tang Y, Li B, Liu M, Chen B, Wang Y, Ouyang W. AutoPedestrian: An Automatic Data Augmentation and Loss Function Search Scheme for Pedestrian Detection. IEEE Transactions on Image Processing 2021; 30:8483-8496. [PMID: 34618670] [DOI: 10.1109/tip.2021.3115672]
Abstract
Pedestrian detection is a challenging and active research topic in computer vision, especially for crowded scenes where occlusion happens frequently. In this paper, we propose a novel AutoPedestrian scheme that automatically augments pedestrian data and searches for suitable loss functions, aiming at better pedestrian detection performance, especially in crowded scenes. To the best of our knowledge, this is the first work to jointly search the optimal data augmentation policy and loss function for pedestrian detection. To this end, we first formulate the data augmentation policy and loss function as probability distributions over their hyper-parameters. We then apply a double-loop scheme with importance sampling to efficiently solve the joint optimization over augmentation policies and loss-function types. Comprehensive experiments on the popular CrowdHuman and CityPersons benchmarks show the effectiveness of our method. In particular, we achieve 40.58% MR on CrowdHuman and 11.3% MR on the CityPersons reasonable subset, yielding new state-of-the-art results on these two datasets.
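The search formulation can be sketched as sampling a candidate policy from parameterized distributions; the distribution families below are illustrative assumptions, and the double-loop importance-sampling update is elided:

```python
import torch

def sample_policy(aug_mu, aug_sigma, loss_logits):
    """aug_mu, aug_sigma: per-operation magnitude distribution parameters;
    loss_logits: (L,) unnormalized scores over candidate loss functions."""
    magnitudes = torch.normal(aug_mu, aug_sigma).clamp(0, 1)  # augmentation strengths
    loss_idx = torch.multinomial(torch.softmax(loss_logits, 0), 1).item()
    return magnitudes, loss_idx   # one candidate (augmentation, loss) policy

# Example: three augmentation ops, two candidate loss functions.
mags, loss_id = sample_policy(torch.tensor([0.3, 0.5, 0.2]),
                              torch.tensor([0.1, 0.1, 0.1]),
                              torch.zeros(2))
```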