1. Yang B, Chen J, Ma X, Ye M. Translation, Association and Augmentation: Learning Cross-Modality Re-Identification From Single-Modality Annotation. IEEE Transactions on Image Processing 2023; 32:5099-5113. PMID: 37669187. DOI: 10.1109/TIP.2023.3310338.
Abstract
Visible-infrared person re-identification (VI-ReID), which matches daytime visible (RGB) images against night-time infrared (IR) images, is a challenging cross-modality pedestrian retrieval problem. Training a cross-modality ReID model, however, requires a large number of cross-modality (visible-infrared) identity labels, which are more expensive to obtain than single-modality annotations. To alleviate this issue, this paper studies the unsupervised domain adaptive visible-infrared person re-identification (UDA-VI-ReID) task, which requires no cross-modality annotation. To transfer knowledge learned from the labelled visible source domain to the unlabelled visible-infrared target domain, we propose a Translation, Association and Augmentation (TAA) framework. Specifically, a modality translator is first used to convert visible images into infrared images, forming generated visible-infrared image pairs for cross-modality supervised training. A Robust Association and Mutual Learning (RAML) module is then designed to exploit the underlying relations between the visible and infrared modalities for label noise modeling. Moreover, a Translation Supervision and Feature Augmentation (TSFA) module enhances discriminability by enriching the supervision with feature augmentation and modality translation. Extensive experimental results demonstrate that our method significantly outperforms current state-of-the-art unsupervised methods under various settings, and even surpasses some supervised counterparts, providing a powerful baseline for UDA-VI-ReID.
2. Ruan W, Ye M, Wu Y, Liu W, Chen J, Liang C, Li G, Lin CW. TICNet: A Target-Insight Correlation Network for Object Tracking. IEEE Transactions on Cybernetics 2022; 52:12150-12162. PMID: 34033563. DOI: 10.1109/TCYB.2021.3070677.
Abstract
Recently, the correlation filter (CF) and the Siamese network have become the two most popular frameworks in object tracking. Existing CF trackers, however, are limited in feature learning and context usage, making them sensitive to boundary effects. In contrast, Siamese trackers easily suffer from the interference of semantic distractors. To address these problems, we propose an end-to-end target-insight correlation network (TICNet) for object tracking, which aims at breaking the above limitations on top of a unified network. TICNet is an asymmetric dual-branch network involving a target-background awareness model (TBAM), a spatial-channel attention network (SCAN), and a distractor-aware filter (DAF) for end-to-end learning. Specifically, TBAM distinguishes the target from the background at the pixel level, yielding a target likelihood map based on color statistics to mine distractors for DAF learning. SCAN consists of a basic convolutional network, a channel-attention network, and a spatial-attention network, and generates attentive weights to enhance the representation learning of the tracker. In particular, we formulate a differentiable DAF and employ it as a learnable layer in the network, helping suppress distracting regions in the background. During testing, DAF, together with TBAM, yields a response map for the final target estimation. Extensive experiments on seven benchmarks demonstrate that TICNet outperforms state-of-the-art methods while running at real-time speed.
3. Tang Y, Yang X, Jiang X, Wang N, Gao X. Dually Distribution Pulling Network for Cross-Resolution Person Reidentification. IEEE Transactions on Cybernetics 2022; 52:12016-12027. PMID: 34043523. DOI: 10.1109/TCYB.2021.3077500.
Abstract
Person reidentification (Re-ID) aims at recognizing the same identity across different camera views. However, mixed image resolutions [high resolution (HR) and low resolution (LR)] are unavoidable in realistic scenarios due to the varying distances between cameras and pedestrians of interest, leading to the cross-resolution person Re-ID problem. Most recent cross-resolution person Re-ID methods focus on solving the resolution mismatch problem, while the distribution mismatch between HR and LR images is another factor that significantly impacts Re-ID performance. In this article, we propose a dually distribution pulling network (DDPN) to tackle the distribution mismatch problem. DDPN is composed of two modules, a super-resolution module and a person Re-ID module, which pull the distribution of LR images closer to that of HR images at the image level and the feature level, respectively, by optimizing maximum mean discrepancy losses. Extensive experiments conducted on three benchmark datasets demonstrate the effectiveness of DDPN. Remarkably, DDPN shows a clear advantage over the state-of-the-art methods; for instance, it achieves a rank-1 accuracy of 76.9% on VR-Market1501, outperforming the best existing cross-resolution person Re-ID method by 10%.
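The distribution-pulling objective above rests on the maximum mean discrepancy (MMD). A minimal NumPy sketch of a biased squared-MMD estimate with a Gaussian kernel (the bandwidth, feature dimensions, and synthetic data are illustrative, not the paper's settings):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between rows of x and y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    # Biased estimate of the squared maximum mean discrepancy.
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean())

rng = np.random.default_rng(0)
hr = rng.normal(0.0, 1.0, (200, 16))       # stand-in for HR features
lr = rng.normal(0.5, 1.0, (200, 16))       # stand-in for mismatched LR features
aligned = rng.normal(0.0, 1.0, (200, 16))  # stand-in for "pulled" LR features
```

Minimizing such a term drives the LR feature distribution toward the HR one; here the shifted `lr` sample yields a larger MMD than the matched `aligned` sample.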
4. Lu W, Zhang Q, Luo S, Zhou Y, Huang J, Shi YQ. Robust Estimation of Upscaling Factor on Double JPEG Compressed Images. IEEE Transactions on Cybernetics 2022; 52:10814-10826. PMID: 33878009. DOI: 10.1109/TCYB.2021.3069999.
Abstract
As one of the most important topics in image forensics, resampling detection has developed rapidly in recent years. However, robustness to JPEG compression remains challenging for most classical spectrum-based methods, since JPEG compression severely degrades the image contents and introduces block artifacts at the boundaries of the compression grid. In this article, we propose a method to estimate the upscaling factors of double JPEG compressed images when image upscaling has occurred between the two compressions. We first analyze the spectrum of scaled images and give an overall formulation of how the scaling factors, together with the JPEG compression parameters and image contents, influence the appearance of tampering artifacts. The expected positions of five kinds of characteristic peaks are analytically derived. Then, we analyze the features of double JPEG compressed images in the block discrete cosine transform (BDCT) domain and present an inverse scaling strategy for upscaling factor estimation with a detailed proof. Finally, a fusion method is proposed: a candidate set of upscaling factors is obtained through frequency-domain analysis, and the optimal estimate among the candidates is determined through analysis in the BDCT domain. The experimental results demonstrate that the proposed method outperforms other state-of-the-art methods.
5.
Abstract
Traditional machine learning approaches are susceptible to factors such as object scale and occlusion, leading to low detection efficiency and poor versatility in vehicle detection applications. To tackle this issue, we propose a part-aware refinement network that combines multi-scale training and component confidence generation strategies for vehicle detection. Specifically, we split the original single-valued prediction confidence and use the confidence of the visible part of the vehicle to correct the absolute detection confidence, which reduces the impact of occlusion on detection. Simultaneously, we relabel the KITTI data, adding detailed occlusion information for the vehicles, and then train and test the deep neural network model on the new images. Our method automatically extracts vehicle features and mitigates the large localization errors of traditional approaches. Extensive experimental results on the KITTI dataset show that our method significantly outperforms the state-of-the-art methods while maintaining comparable detection time.
6. Harris EJ, Khoo IH, Demircan E. A Survey of Human Gait-Based Artificial Intelligence Applications. Front Robot AI 2022; 8:749274. PMID: 35047564. PMCID: PMC8762057. DOI: 10.3389/frobt.2021.749274.
Abstract
We performed an electronic database search of published works from 2012 to mid-2021 that focus on human gait studies and apply machine learning techniques. We identified six key applications of machine learning using gait data: 1) gait analysis, where analysis techniques and certain biomechanical factors are improved by utilizing artificial intelligence algorithms; 2) health and wellness, with applications in gait monitoring for abnormal gait detection, recognition of human activities, fall detection, and sports performance; 3) human pose tracking, using one-person or multi-person tracking and localization systems such as OpenPose and Simultaneous Localization and Mapping (SLAM); 4) gait-based biometrics, with applications in person identification, authentication, and re-identification as well as gender and age recognition; 5) “smart gait” applications, ranging from smart socks, shoes, and other wearables to smart homes and smart retail stores that incorporate continuous monitoring and control systems; and 6) animation that reconstructs human motion utilizing gait data, simulation, and machine learning techniques. Our goal is to provide a single broad-based survey of the applications of machine learning technology in gait analysis and to identify areas of potential future study and growth. We discuss the machine learning techniques that have been used, with a focus on the tasks they perform, the problems they attempt to solve, and the trade-offs they navigate.
Affiliation(s)
- Elsa J Harris: Human Performance and Robotics Laboratory, Department of Mechanical and Aerospace Engineering, California State University Long Beach, Long Beach, CA, United States
- I-Hung Khoo: Department of Electrical Engineering and Department of Biomedical Engineering, California State University Long Beach, Long Beach, CA, United States
- Emel Demircan: Human Performance and Robotics Laboratory, Department of Mechanical and Aerospace Engineering, and Department of Biomedical Engineering, California State University Long Beach, Long Beach, CA, United States
7. Li D, Hu R, Huang W, Li D, Wang X, Hu C. Trajectory Association for Person Re-identification. Neural Process Lett 2021. DOI: 10.1007/s11063-021-10540-8.
8. Feng Y, Yuan Y, Lu X. Person Reidentification via Unsupervised Cross-View Metric Learning. IEEE Transactions on Cybernetics 2021; 51:1849-1859. PMID: 31021787. DOI: 10.1109/TCYB.2019.2909480.
Abstract
Person reidentification (Re-ID) aims to match observations of individuals across multiple nonoverlapping camera views. Recently, metric learning-based methods have played important roles in addressing this task. However, metrics are mostly learned in a supervised manner, where performance relies heavily on the quantity and quality of manual annotations. Meanwhile, metric learning-based algorithms generally project person features into a common subspace in which the extracted features are shared by all views. This may result in information loss, since these algorithms neglect view-specific features. Moreover, they assume person samples from different views are drawn from the same distribution, whereas in practice these samples are more likely to obey different distributions due to changes in view conditions. To this end, this paper proposes an unsupervised cross-view metric learning method based on the properties of the data distributions. Specifically, person samples in each view are drawn from a mixture of two distributions: one models common properties among camera views, and the other captures view-specific properties. Based on this, we introduce a shared mapping to explore the shared features. Meanwhile, we construct view-specific mappings to extract and project view-related features into a common subspace. As a result, samples in the transformed subspace follow the same distribution and are equipped with comprehensive representations. These mappings are learned in an unsupervised manner by clustering samples in the projected space. Experimental results on five cross-view datasets validate the effectiveness of the proposed method.
9. Wang X, Chen J, Jiang K, Han Z, Ruan W, Wang Z, Liang C. Single image de-raining via clique recursive feedback mechanism. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.07.083.
10. Zhang J, Niu L, Zhang L. Person Re-Identification With Reinforced Attribute Attention Selection. IEEE Transactions on Image Processing 2020; 30:603-616. PMID: 33186114. DOI: 10.1109/TIP.2020.3036762.
Abstract
Person re-identification (Re-ID) aims to match pedestrian images across various scenes in video surveillance. A few works use attribute information to boost Re-ID performance, typically by introducing auxiliary tasks such as verifying the image-level attributes of two pedestrian images or recognizing identity-level attributes. Identity-level attribute annotations require less manual effort and are better suited to the person re-identification task than image-level attribute annotations. However, identity attribute information may be very noisy due to incorrect annotation or lack of discriminativeness for distinguishing different persons, which is probably unhelpful for the Re-ID task. In this paper, we propose a novel Attribute Attentional Block (AAB), which can be integrated into any backbone network or framework. Our AAB adopts reinforcement learning to drop noisy attributes based on our designed reward, and then utilizes the aggregated attribute attention of the remaining attributes to facilitate the Re-ID task. Experimental results demonstrate that our proposed method achieves state-of-the-art results on three benchmark datasets.
11. Jiang J, Yu Y, Wang Z, Tang S, Hu R, Ma J. Ensemble Super-Resolution With a Reference Dataset. IEEE Transactions on Cybernetics 2020; 50:4694-4708. PMID: 30843812. DOI: 10.1109/TCYB.2018.2890149.
Abstract
By developing sophisticated image priors or designing deep(er) architectures, a variety of image super-resolution (SR) approaches have been proposed recently and have achieved very promising performance. A natural question is whether these methods can be reformulated into a unifying framework and whether this framework assists SR reconstruction. In this paper, we present a simple but effective single-image SR method based on ensemble learning, which produces better performance than any of the component SR methods being ensembled (component super-resolvers). Based on the assumption that a better component super-resolver should have a larger ensemble weight in SR reconstruction, we present a maximum a posteriori (MAP) estimation framework for inferring the optimal ensemble weights. In particular, we introduce a reference dataset, composed of high-resolution (HR) and low-resolution (LR) image pairs, to measure the SR abilities (prior knowledge) of the different component super-resolvers. To obtain the optimal ensemble weights, we incorporate the reconstruction constraint, which states that the degraded HR estimate should match the LR observation, as well as the prior knowledge of the ensemble weights, into the MAP estimation framework. Moreover, the resulting optimization problem admits an analytical solution. We study the performance of the proposed method by comparing it with different competitive approaches, including four state-of-the-art non-deep-learning-based methods, four recent deep learning-based methods, and one ensemble learning-based method, and demonstrate its effectiveness and superiority on general image datasets and face image datasets.
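The weighting idea can be illustrated with a toy fusion. Below, ensemble weights are derived from reference-set errors by a simple softmax, a simplified stand-in for the paper's MAP inference; the `beta` temperature, error values, and constant "outputs" are all hypothetical:

```python
import numpy as np

def ensemble_weights(ref_errors, beta=10.0):
    # Softmax over negative reference-set errors: super-resolvers with
    # lower error on the reference HR/LR pairs receive larger weights.
    # A simplified stand-in for the paper's MAP weight inference.
    w = np.exp(-beta * np.asarray(ref_errors, dtype=float))
    return w / w.sum()

def fuse(outputs, weights):
    # Pixel-wise weighted combination of the component SR outputs.
    return np.tensordot(weights, np.stack(outputs), axes=1)

outputs = [np.full((4, 4), v) for v in (1.0, 2.0, 3.0)]  # toy SR outputs
weights = ensemble_weights([0.30, 0.10, 0.50])           # hypothetical errors
fused = fuse(outputs, weights)
```

The second super-resolver (lowest reference error) dominates the fused estimate, matching the assumption that better component super-resolvers deserve larger weights.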
12. Zhang C, Zhu L, Zhang S, Yu W. PAC-GAN: An effective pose augmentation scheme for unsupervised cross-view person re-identification. Neurocomputing 2020. DOI: 10.1016/j.neucom.2019.12.094.
13. Wang Z, Jiang J, Wu Y, Ye M, Bai X, Satoh S. Learning Sparse and Identity-Preserved Hidden Attributes for Person Re-Identification. IEEE Transactions on Image Processing 2019; 29:2013-2025. PMID: 31634836. DOI: 10.1109/TIP.2019.2946975.
Abstract
Person re-identification (Re-ID) aims at matching person images captured in non-overlapping camera views. When representing person appearance, low-level visual features are sensitive to environmental changes, while high-level semantic attributes, such as "short-hair" or "long-hair", are relatively stable. Hence, researchers have started to design semantic attributes to reduce visual ambiguity. However, training a prediction model for semantic attributes requires a large number of annotations, which are hard to obtain in practical large-scale applications. To alleviate the reliance on annotation efforts, we propose to incrementally generate Deep Hidden Attributes (DHAs) on top of a baseline deep network, without additional annotation effort. In particular, we propose an auto-encoder model that can be plugged into any deep network to mine latent information in an unsupervised manner. To optimize the effectiveness of DHAs, we extend the auto-encoder model with an orthogonal generation module, along with identity-preserving and sparsity constraints. 1) Orthogonal generation: to make DHAs different from each other, Singular Value Decomposition (SVD) is introduced to generate DHAs orthogonally. 2) Identity-preserving constraint: the generated DHAs should be distinct enough to tell different persons apart, so we associate DHAs with person identities. 3) Sparsity constraint: to enhance the discriminability of DHAs, we also introduce a sparsity constraint to restrict the number of effective DHAs for each person. Experiments conducted on public datasets have validated the effectiveness of the proposed network. On two large-scale datasets, i.e., Market-1501 and DukeMTMC-reID, the proposed method outperforms the state-of-the-art methods.
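The orthogonal-generation step can be sketched as follows: given a bank of correlated hidden-attribute directions, SVD yields the nearest orthonormal set (a generic Procrustes-style orthogonalization, illustrating the idea rather than the paper's exact training procedure; the shapes are arbitrary):

```python
import numpy as np

def orthogonalize(attributes):
    # Replace a bank of attribute directions (rows) by the nearest
    # orthonormal set via SVD: A = U S V^T  ->  U V^T.
    u, _, vt = np.linalg.svd(attributes, full_matrices=False)
    return u @ vt

rng = np.random.default_rng(1)
dha = rng.normal(size=(8, 32))      # 8 correlated hidden-attribute directions
dha_orth = orthogonalize(dha)
gram = dha_orth @ dha_orth.T        # should be close to the 8x8 identity
```

After this step the attribute directions are mutually orthogonal, so each DHA captures a distinct component of the representation.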
14. Xu Z, Hu R, Chen J, Chen C, Jiang J, Li J, Li H. Semisupervised Discriminant Multimanifold Analysis for Action Recognition. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:2951-2962. PMID: 30762568. DOI: 10.1109/TNNLS.2018.2886008.
Abstract
Although recent semisupervised approaches have proven their effectiveness when training data are limited, they assume that the samples from different actions lie on a single data manifold in the feature space and try to uncover a common subspace for all samples. However, this assumption ignores both intraclass compactness and interclass separability. We believe that human actions should occupy a multimanifold subspace and therefore model the samples of the same action as one manifold and those of different actions as different manifolds. When seeking the optimal subspace projection matrix, current approaches may be mathematically imprecise owing to badly scaled matrices and improper convergence. To address these issues in unconstrained convex optimization, we introduce a nontrivial spectral projected gradient method and Karush-Kuhn-Tucker conditions without matrix inversion. By maximizing the separability between different classes using labeled data points and estimating the intrinsic geometric structure of the data distributions by exploring unlabeled data points, the proposed algorithm learns global and local consistency and boosts recognition performance. Extensive experiments conducted on realistic video datasets, including JHMDB, HMDB51, UCF50, and UCF101, demonstrate that our algorithm outperforms the compared algorithms, including a deep learning approach, when only a few labeled samples are available.
15. Yu Y, Ji Z, Guo J, Zhang Z. Zero-Shot Learning via Latent Space Encoding. IEEE Transactions on Cybernetics 2019; 49:3755-3766. PMID: 30010606. DOI: 10.1109/TCYB.2018.2850750.
Abstract
Zero-shot learning (ZSL) is typically achieved by resorting to a class semantic embedding space to transfer knowledge from seen classes to unseen ones. Capturing the common semantic characteristics between the visual modality and the class semantic modality (e.g., attributes or word vectors) is key to the success of ZSL. In this paper, we propose a novel encoder-decoder approach, namely latent space encoding (LSE), to connect the semantic relations of different modalities. Instead of requiring a projection function to transfer information across modalities, as in most previous work, LSE performs the interactions of different modalities via a feature-aware latent space, which is learned implicitly. Specifically, different modalities are modeled separately but optimized jointly. For each modality, an encoder-decoder framework learns a feature-aware latent space by jointly maximizing the recoverability of the original space from the latent space and the predictability of the latent space from the original space. To relate the modalities, their features referring to the same concept are enforced to share the same latent codings. In this way, the common semantic characteristics of the modalities are generalized with the latent representations. Another property of the proposed approach is that it is easily extended to more modalities. Extensive experimental results on four benchmark datasets (Animals with Attributes, Caltech-UCSD Birds, aPY, and ImageNet) clearly demonstrate the superiority of the proposed approach on several ZSL tasks, including traditional ZSL, generalized ZSL, and zero-shot retrieval.
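For contrast with the latent-space idea, a minimal linear zero-shot baseline (direct least-squares regression into the semantic space, not the paper's encoder-decoder LSE model) can be sketched in a few lines; all shapes and data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
sem = rng.normal(size=(6, 4))                 # class semantic vectors (6 classes)
seen, unseen = [0, 1, 2, 3], [4, 5]
m = rng.normal(size=(4, 12))                  # hypothetical semantic->visual map
labels = rng.integers(0, 4, size=200)         # samples from seen classes only
vis = sem[labels] @ m + 0.01 * rng.normal(size=(200, 12))

# Visual -> semantic projection learned by least squares on seen classes.
w, *_ = np.linalg.lstsq(vis, sem[labels], rcond=None)

def predict(x, classes):
    # Assign each sample the nearest class embedding in semantic space.
    d = ((x @ w)[:, None, :] - sem[classes][None, :, :]) ** 2
    return np.asarray(classes)[d.sum(-1).argmin(axis=1)]

x_unseen = sem[unseen] @ m                    # one clean sample per unseen class
```

Because the projection is learned only on seen classes yet evaluated on unseen ones, this captures the knowledge-transfer setting that LSE addresses with a shared latent space instead of a single direct mapping.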
16. Gao C, Wang J, Liu L, Yu JG, Sang N. Superpixel-Based Temporally Aligned Representation for Video-Based Person Re-Identification. Sensors 2019; 19:3861. PMID: 31500196. PMCID: PMC6766808. DOI: 10.3390/s19183861.
Abstract
Most existing person re-identification methods focus on matching still person images across non-overlapping camera views. Despite their excellent performance in some circumstances, these methods still suffer from occlusion and from changes in pose, viewpoint, or lighting. Video-based re-id is a natural way to overcome these problems by exploiting space–time information from videos. One of the most challenging problems in video-based person re-identification, in addition to spatial alignment, is temporal alignment. To address this problem, we propose an effective superpixel-based temporally aligned representation for video-based person re-identification, which represents a video sequence using only one walking cycle. In particular, we first build a candidate set of walking cycles by extracting motion information at the superpixel level, which is more robust than motion information at the pixel level. Then, from the candidate set, we propose an effective criterion to select the walking cycle best matching the intrinsic periodicity of walking persons. Finally, we propose a temporally aligned pooling scheme to describe the video data in the selected walking cycle. In addition, to characterize the individual still images in the cycle, we propose a superpixel-based representation to improve spatial alignment. Extensive experimental results on three public datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art approaches.
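The walking-cycle selection can be illustrated with a generic periodicity estimate: pick the lag that maximizes the autocorrelation of a per-frame motion-energy signal. This is a simplified stand-in for the paper's superpixel-level criterion; the period bounds and synthetic signal are illustrative:

```python
import numpy as np

def cycle_length(motion_energy, min_period=10, max_period=40):
    # Estimate the walking-cycle length (in frames) as the lag that
    # maximizes the autocorrelation of a per-frame motion-energy signal.
    x = np.asarray(motion_energy, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags 0..n-1
    lags = np.arange(min_period, min(max_period + 1, len(x)))
    return int(lags[ac[lags].argmax()])

t = np.arange(120)
signal = (np.sin(2 * np.pi * t / 25)                     # true period: 25 frames
          + 0.1 * np.random.default_rng(2).normal(size=120))
period = cycle_length(signal)
```

On this noisy sinusoid the estimate lands near the true 25-frame period; in the paper's setting the signal would come from superpixel motion rather than a synthetic sine.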
Affiliation(s)
- Changxin Gao: Key Laboratory of Ministry of Education for Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
- Jin Wang: Key Laboratory of Ministry of Education for Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
- Leyuan Liu: National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China
- Jin-Gang Yu: School of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, China
- Nong Sang: Key Laboratory of Ministry of Education for Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
17. Liang G, Lan X, Chen X, Zheng K, Wang S, Zheng N. Cross-View Person Identification Based on Confidence-Weighted Human Pose Matching. IEEE Transactions on Image Processing 2019; 28:3821-3835. PMID: 30794171. DOI: 10.1109/TIP.2019.2899782.
Abstract
Cross-view person identification (CVPI) from multiple temporally synchronized videos taken by multiple wearable cameras from different, varying views is a very challenging but important problem that has recently attracted increasing interest. The current state-of-the-art CVPI performance is achieved by matching appearance and motion features across videos, while the matching of pose features does not work effectively given the high inaccuracy of 3D pose estimation on videos and images collected in the wild. To address this problem, we first introduce a new confidence metric for the estimated location of each human-body joint in 3D human pose estimation. Then, a mapping function, which can be hand-crafted or learned directly from the datasets, is proposed to combine the inaccurately estimated human pose and the inferred confidence metric to accomplish CVPI. Specifically, joints with higher confidence are weighted more heavily in the pose matching for CVPI. Finally, the estimated pose information is integrated into the appearance and motion features to boost CVPI performance. In the experiments, we evaluate the proposed method on three wearable-camera video datasets and compare its performance against several other existing CVPI methods. The experimental results show the effectiveness of the proposed confidence metric, and the integration of pose, appearance, and motion produces a new state-of-the-art CVPI performance.
18. Zhang C, Wu L, Wang Y. Crossing generative adversarial networks for cross-view person re-identification. Neurocomputing 2019. DOI: 10.1016/j.neucom.2019.01.093.
19. Multi-Information Flow CNN and Attribute-Aided Reranking for Person Reidentification. Computational Intelligence and Neuroscience 2019; 2019:7028107. PMID: 30881442. PMCID: PMC6381562. DOI: 10.1155/2019/7028107.
Abstract
This paper presents a multi-information flow convolutional neural network (MiF-CNN) model for person reidentification (re-id). It contains several specific multilayer convolutional structures, where the input and output of a convolutional layer are concatenated along the channel dimension. With this idea, the model can go deeper and feature maps can be reused by each subsequent layer. Inspired by image captioning, a person attribute recognition network is proposed based on a long short-term memory network and an attention mechanism. By fusing the identification results of MiF-CNN with attribute recognition, this paper introduces an attribute-aided reranking algorithm to further improve person re-id accuracy. Experiments on the VIPeR, CUHK01, and Market1501 datasets verify that the proposed MiF-CNN can be trained sufficiently on small-scale datasets and obtains outstanding person re-id accuracy. Comparative experiments also confirm the effectiveness of the attribute-aided reranking algorithm.
20. Ye M, Li J, Ma AJ, Zheng L, Yuen PC. Dynamic Graph Co-Matching for Unsupervised Video-based Person Re-Identification. IEEE Transactions on Image Processing 2019; 28:2976-2990. PMID: 30640612. DOI: 10.1109/TIP.2019.2893066.
Abstract
Cross-camera label estimation from a set of unlabelled training data is an extremely important component of unsupervised person re-identification (re-ID) systems. With the estimated labels, existing advanced supervised learning methods can be leveraged to learn discriminative re-ID models. In this paper, we utilize the graph matching technique for accurate label estimation due to its advantages in optimal global matching and intra-camera relationship mining. However, a graph structure constructed with a non-learnt similarity measurement cannot handle large cross-camera variations, which leads to noisy and inaccurate label outputs. This paper designs a Dynamic Graph Matching (DGM) framework, which improves the label estimation process by iteratively refining the graph structure with a better similarity measurement learnt from intermediate estimated labels. In addition, we design a positive re-weighting strategy to refine the intermediate labels, which enhances robustness against inaccurate matching output and noisy initial training data. To fully utilize the abundant video information and reduce false matches, a co-matching strategy is further incorporated into the framework. Comprehensive experiments conducted on three video benchmarks demonstrate that DGM outperforms state-of-the-art unsupervised re-ID methods and yields performance competitive with fully supervised upper bounds.
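One static round of cross-camera label estimation can be sketched as a globally optimal assignment problem; DGM iterates this with a learned similarity and positive re-weighting, whereas the Euclidean cost and toy tracklet features below are purely illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_cameras(feats_a, feats_b):
    # Globally optimal one-to-one matching between tracklet features from
    # two cameras (Hungarian algorithm on a Euclidean cost matrix). This
    # is a single, static round of what DGM refines iteratively.
    cost = np.linalg.norm(feats_a[:, None, :] - feats_b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

rng = np.random.default_rng(3)
ident = rng.normal(size=(5, 8))                        # 5 underlying identities
cam_a = ident + 0.05 * rng.normal(size=(5, 8))         # tracklets in camera A
perm = np.array([2, 0, 4, 1, 3])                       # identity order in camera B
cam_b = ident[perm] + 0.05 * rng.normal(size=(5, 8))
matches = match_cameras(cam_a, cam_b)
```

With noise small relative to identity separation, the assignment recovers the true correspondence: each matched pair `(i, j)` satisfies `perm[j] == i`, i.e., the estimated cross-camera labels are correct.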
21. Zhang D, Ding W, Zhang B, Xie C, Li H, Liu C, Han J. Automatic Modulation Classification Based on Deep Learning for Unmanned Aerial Vehicles. Sensors (Basel, Switzerland) 2018; 18:E924. PMID: 29558434. PMCID: PMC5876703. DOI: 10.3390/s18030924.
Abstract
Deep learning has recently attracted much attention due to its excellent performance in processing audio, image, and video data. However, few such studies address automatic modulation classification (AMC), one of the best-known research topics in communication signal recognition, which remains challenging for traditional methods due to complex disturbance from other sources. This paper proposes a heterogeneous deep model fusion (HDMF) method to solve the problem in a unified framework. The contributions include the following: (1) a convolutional neural network (CNN) and long short-term memory (LSTM) are combined in two different ways without involving prior knowledge; (2) a large database, including eleven types of single-carrier modulation signals with various noises as well as a fading channel, is collected at various signal-to-noise ratios (SNRs) based on a real geographical environment; and (3) experimental results demonstrate that HDMF copes well with the AMC problem and achieves much better performance than the independent networks.
Affiliation(s)
- Duona Zhang: School of Beihang University, Beijing 100083, China
- Wenrui Ding: School of Beihang University, Beijing 100083, China
- Chunyu Xie: School of Beihang University, Beijing 100083, China
- Hongguang Li: School of Beihang University, Beijing 100083, China
- Chunhui Liu: School of Beihang University, Beijing 100083, China
- Jungong Han: School of Computing & Communications, Lancaster University, Lancaster LA1 4WA, UK