1. Liu H, Ma S, Xia D, Li S. SFANet: A Spectrum-Aware Feature Augmentation Network for Visible-Infrared Person Reidentification. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:1958-1971. [PMID: 34464275] [DOI: 10.1109/tnnls.2021.3105702]
Abstract
Visible-infrared person reidentification (VI-ReID) is a challenging matching problem due to large modality variations between visible and infrared images. Existing approaches usually bridge the modality gap with only feature-level constraints, ignoring pixel-level variations. Some methods employ a generative adversarial network (GAN) to generate style-consistent images, but this destroys structure information and introduces considerable noise. In this article, we explicitly consider these challenges and formulate a novel spectrum-aware feature augmentation network, named SFANet, for the cross-modality matching problem. Specifically, we propose to employ grayscale-spectrum images to fully replace RGB images for feature learning. Learning with grayscale-spectrum images, our model can markedly reduce the modality discrepancy and detect inner structure relations across the different modalities, making it robust to color variations. At the feature level, we improve the conventional two-stream network by balancing the number of modality-specific and shared convolutional blocks, which preserves the spatial structure information of features. Additionally, a bidirectional tri-constrained top-push ranking (BTTR) loss is embedded in the proposed network to improve discriminability, further boosting matching accuracy. Meanwhile, we introduce an effective dual-linear identification (ID) embedding with batch normalization to model identity-specific information and to assist the BTTR loss by stabilizing magnitudes. Extensive experiments on the SYSU-MM01 and RegDB datasets demonstrate that each component of our framework contributes indispensably and that the framework achieves very competitive VI-ReID performance.
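Below is a minimal PyTorch sketch of the general top-push ranking idea that the BTTR loss builds on, assuming PK-sampled batches (every identity appears at least twice); the bidirectional and tri-constrained terms of the actual BTTR loss are not specified in the abstract and are omitted here.

```python
import torch
import torch.nn.functional as F

def top_push_loss(feats, labels, margin=0.3):
    """Top-push ranking: for each anchor, the farthest positive must still
    lie closer than the nearest negative by a margin. Assumes a PK-sampled
    batch so every anchor has at least one positive and one negative.
    feats: (B, D) embeddings; labels: (B,) identity labels."""
    dist = torch.cdist(feats, feats, p=2)                    # (B, B) pairwise L2
    same = labels.unsqueeze(0) == labels.unsqueeze(1)        # same-identity mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=feats.device)
    pos = same & ~eye                                        # positives, excluding self
    hardest_pos = dist.masked_fill(~pos, float('-inf')).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float('inf')).min(dim=1).values
    return F.relu(hardest_pos - hardest_neg + margin).mean()
```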
2. Bing Z, Brucker M, Morin FO, Li R, Su X, Huang K, Knoll A. Complex Robotic Manipulation via Graph-Based Hindsight Goal Generation. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:7863-7876. [PMID: 34181552] [DOI: 10.1109/tnnls.2021.3088947]
Abstract
Reinforcement learning algorithms such as hindsight experience replay (HER) and hindsight goal generation (HGG) have been able to solve challenging robotic manipulation tasks in multigoal settings with sparse rewards. HER achieves its training success through hindsight replays of past experience with heuristic goals, but it underperforms in challenging tasks where goals are difficult to explore. HGG enhances HER by selecting intermediate goals that are easy to achieve in the short term and likely to lead to the target goals in the long term. This guided exploration makes HGG applicable to tasks in which target goals are far from the object's initial position. However, vanilla HGG is not applicable to manipulation tasks with obstacles, because the Euclidean metric it uses is not an accurate distance measure in such environments. Although grid-based HGG, guided by a handcrafted distance grid, can solve manipulation tasks with obstacles, a more feasible method that solves such tasks automatically is still in demand. In this article, we propose graph-based hindsight goal generation (G-HGG), an extension of HGG that selects hindsight goals based on shortest distances in an obstacle-avoiding graph, a discrete representation of the environment. We evaluated G-HGG on four challenging manipulation tasks with obstacles, where it shows significant enhancements in both sample efficiency and overall success rate over HGG and HER. Videos can be viewed at https://videoviewsite.wixsite.com/ghgg.
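The core substitution in G-HGG, replacing the Euclidean metric with shortest-path distances on an obstacle-avoiding graph, can be sketched as follows; the grid discretization, 4-connectivity, and unit edge lengths are illustrative assumptions, not the paper's exact graph construction.

```python
import heapq
import numpy as np

def graph_distances(occupancy, source):
    """Dijkstra over a grid graph that skips occupied cells, standing in for
    the obstacle-avoiding graph metric G-HGG uses instead of the Euclidean
    distance of vanilla HGG. occupancy: boolean grid (True = obstacle);
    source: (i, j) cell index. Returns a grid of distances to every cell."""
    h, w = occupancy.shape
    dist = np.full((h, w), np.inf)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, (i, j) = heapq.heappop(heap)
        if d > dist[i, j]:
            continue                                  # stale heap entry
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w and not occupancy[ni, nj]:
                nd = d + 1.0                          # unit edge between neighbors
                if nd < dist[ni, nj]:
                    dist[ni, nj] = nd
                    heapq.heappush(heap, (nd, (ni, nj)))
    return dist  # dist[goal] approximates the obstacle-aware source-to-goal distance
```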
3. Miao J, Wu Y, Yang Y. Identifying Visible Parts via Pose Estimation for Occluded Person Re-Identification. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:4624-4634. [PMID: 33651698] [DOI: 10.1109/tnnls.2021.3059515]
Abstract
We focus on the occlusion problem in person re-identification (re-id), one of the main challenges in real-world person retrieval scenarios. Previous methods for occluded re-id usually assume that only the probes are occluded and remove occlusions by manual cropping. However, this assumption may not always hold in practice. This article relaxes it and investigates a more general occlusion problem, where both the probe and gallery images may be occluded. The key to this challenging problem is suppressing noise by distinguishing bodies from occlusions. We propose to incorporate pose information into the re-id framework, which benefits the model in three aspects. First, pose provides the location of the body. We design a Pose-Masked Feature Branch to make our model focus on the body region only and filter out the noise features caused by occlusions. Second, the estimated pose reveals which body parts are visible, giving a hint for constructing more informative person features. We propose a Pose-Embedded Feature Branch to adaptively recalibrate channel-wise feature responses based on the visible body parts. Third, at test time, the estimated pose indicates which regions are informative and reliable for both probe and gallery images. We explicitly split the extracted spatial feature into parts, and only the features of commonly visible parts are used in retrieval. To better evaluate occluded re-id, we also propose a large-scale occluded re-id dataset with more than 35,000 images, namely Occluded-DukeMTMC. Extensive experiments show that our approach surpasses previous methods on occluded, partial, and non-occluded re-id datasets.
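The test-time scheme in the third point can be sketched directly: compute a distance only over the parts visible in both images. This is a hedged illustration; the part partitioning and the thresholding of visibility flags from pose estimates are assumptions, not the paper's exact procedure.

```python
import torch

def visible_part_distance(probe_parts, gallery_parts, probe_vis, gallery_vis):
    """Retrieval distance restricted to body parts visible in BOTH images.
    probe_parts/gallery_parts: (P, D) per-part features; probe_vis/gallery_vis:
    (P,) boolean flags, assumed thresholded from pose-keypoint confidences."""
    common = probe_vis & gallery_vis                     # parts both images can see
    if not common.any():
        common = torch.ones_like(common)                 # degenerate fallback: use all parts
    d = torch.norm(probe_parts - gallery_parts, dim=1)   # (P,) per-part L2 distances
    return d[common].mean()                              # average over shared parts only
```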
4. Pei J, Zhang J, Ni Z, Zhao Y. A novel video-based pedestrian re-identification method of sequence feature distribution similarity measurement combined with deep learning. Appl Intell 2022. [DOI: 10.1007/s10489-022-04021-1]
5. Zhou C, Xu C, Cui Z, Zhang T, Yang J. Self-Teaching Video Object Segmentation. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:1623-1637. [PMID: 33690125] [DOI: 10.1109/tnnls.2020.3043099]
Abstract
Video object segmentation (VOS) is one of the most fundamental tasks for numerous downstream video applications. The crucial issue in online VOS is the drifting of the segmenter as it is incrementally updated on continuous video frames under unconfident supervision constraints. In this work, we propose a self-teaching VOS (ST-VOS) method that enables the segmenter to learn online adaptation as confidently as possible. In the segmenter learning at each time slice, the segmentation hypothesis and the segmenter update are enclosed in a self-looping optimization circle so that they can mutually improve each other. To reduce error accumulation in this self-looping process, we introduce a metalearning strategy that learns how to perform this optimization within only a few iteration steps. To this end, the learning rates of the segmenter are adaptively derived through metaoptimization in the channel space of the convolutional kernels. Furthermore, to better launch the self-looping process, we compute an initial mask map from part detectors and motion flow to establish a solid foundation for subsequent refinement, which makes the segmenter update robust. Extensive experiments demonstrate that this self-teaching idea boosts the performance of baselines; moreover, ST-VOS achieves encouraging performance on the DAVIS16, YouTube-Objects, DAVIS17, and SegTrackV2 datasets, including an accuracy of 75.7% in the J-mean metric on the multi-instance DAVIS17 dataset.
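A hedged sketch of the channel-wise meta-learned step sizes described above: each output channel of a segmenter convolution gets its own learnable learning rate, trained in the outer meta-loop. The single-layer, single-step setup is an illustrative simplification of the paper's metaoptimization.

```python
import torch
import torch.nn as nn

class ChannelwiseInnerUpdate(nn.Module):
    """Per-output-channel learning rates for one conv layer of the segmenter.
    log_lr lives in the channel space of the kernel and would be trained by
    the outer (meta) loop; inner_step applies one self-looping refinement."""
    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.conv = conv
        # one learnable rate per output channel, broadcast over (in_ch, kH, kW)
        self.log_lr = nn.Parameter(torch.full((conv.out_channels, 1, 1, 1), -4.0))

    def inner_step(self, loss):
        """One refinement of the segmenter weights. A real meta-training loop
        would use a functional (graph-preserving) update so gradients can flow
        back into log_lr; this sketch shows only the update rule itself."""
        grad, = torch.autograd.grad(loss, self.conv.weight)
        with torch.no_grad():
            self.conv.weight -= self.log_lr.exp() * grad   # channel-wise step sizes
```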
6. Chen H, He X, Yang H, Qing L, Teng Q. A Feature-Enriched Deep Convolutional Neural Network for JPEG Image Compression Artifacts Reduction and its Applications. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:430-444. [PMID: 34793307] [DOI: 10.1109/tnnls.2021.3124370]
Abstract
The amount of multimedia data, such as images and videos, has been increasing rapidly with the development of various imaging devices and the Internet, bringing more stress and challenges to information storage and transmission. The redundancy in images can be reduced to decrease data size via lossy compression, such as the most widely used standard, Joint Photographic Experts Group (JPEG). However, decompressed images generally suffer from various artifacts (e.g., blocking, banding, ringing, and blurring) due to the loss of information, especially at high compression ratios. This article presents a feature-enriched deep convolutional neural network for compression artifacts reduction (FeCarNet for short). Taking a dense network as the backbone, FeCarNet enriches features to gain valuable information by introducing multi-scale dilated convolutions, along with efficient 1×1 convolutions that lower both parameter complexity and computation cost. Meanwhile, to make full use of the different levels of features in FeCarNet, a fusion block consisting of attention-based channel recalibration and dimension reduction is developed for local and global feature fusion. Furthermore, short and long residual connections in both the feature and pixel domains are combined to build a multi-level residual structure, benefiting network training and performance. In addition, to further reduce computation complexity, pixel-shuffle-based image downsampling and upsampling layers are arranged at the head and tail of FeCarNet, respectively, which also enlarges the receptive field of the whole network. Experimental results show the superiority of FeCarNet over state-of-the-art compression artifacts reduction approaches in terms of both restoration capacity and model complexity. Applications of FeCarNet to several computer vision tasks, including image deblurring, edge detection, image segmentation, and object detection, further demonstrate its effectiveness.
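The feature-enrichment recipe (parallel dilated convolutions fused by a cheap 1×1 convolution, wrapped in a short residual connection) can be sketched as below; the channel counts and dilation rates are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class MultiScaleDilatedBlock(nn.Module):
    """Parallel 3x3 convolutions with different dilation rates widen the
    receptive field at low cost; a 1x1 convolution fuses the branches, and a
    short residual connection eases training, as the abstract describes."""
    def __init__(self, channels=64, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in rates
        )
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1)  # cheap 1x1 fusion
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        multi = torch.cat([self.act(b(x)) for b in self.branches], dim=1)
        return x + self.fuse(multi)   # short residual connection
```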
7.
Abstract
This study analyses the main challenges, trends, technological approaches, and artificial intelligence methods developed by new researchers and professionals in the field of machine learning, with an emphasis on the most outstanding and relevant works to date. This literature review evaluates the main methodological contributions of artificial intelligence through machine learning. The documents were studied using content analysis; the basic terminology of the study corresponds to machine learning, artificial intelligence, and big data between the years 2017 and 2021. For this study, we selected 181 references, of which 120 are part of the literature review. The conceptual framework includes 12 categories, four groups, and eight subgroups. The study of data management using AI methodologies shows symmetry across the four machine learning groups: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Furthermore, the artificial intelligence methods with the most symmetry across all groups are artificial neural networks, support vector machines, k-means, and Bayesian methods. Finally, five research avenues are presented to improve the prediction of machine learning.
8. Li T, Zhang W, Song R, Li Z, Liu J, Li X, Lu S. PoT-GAN: Pose Transform GAN for Person Image Synthesis. IEEE Transactions on Image Processing 2021; 30:7677-7688. [PMID: 34449357] [DOI: 10.1109/tip.2021.3104183]
Abstract
Pose-based person image synthesis aims to generate a new image containing a person with a target pose, conditioned on a source image containing a person with a specified pose. This is challenging because the target pose is arbitrary and often differs significantly from the source pose, which leads to a large appearance discrepancy between the source and target images. This paper presents the Pose Transform Generative Adversarial Network (PoT-GAN) for person image synthesis, in which the generator explicitly learns the transform between the two poses by manipulating the corresponding multi-scale feature maps. By incorporating the learned pose transform information into the multi-scale feature maps of the source image in a GAN architecture, our method reliably transfers the appearance of the person in the source image to the target pose without any hard-coded spatial information depicting the change of pose. According to both qualitative and quantitative results, the proposed PoT-GAN demonstrates state-of-the-art performance on three publicly available datasets for person image synthesis.
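One plausible reading of the transform-injection mechanism is sketched below: a small network encodes the source and target pose heatmaps into a transform feature that is mixed into the source feature maps at one scale. The concatenate-and-convolve scheme and channel sizes are assumptions; the paper defines its own transform module.

```python
import torch
import torch.nn as nn

class PoseTransformInjection(nn.Module):
    """Illustrative single-scale version of injecting learned pose-transform
    information into source feature maps, per the abstract's description."""
    def __init__(self, feat_ch=128, pose_ch=18):
        super().__init__()
        # encode the (source pose, target pose) heatmap pair into a transform code
        self.pose_enc = nn.Sequential(
            nn.Conv2d(2 * pose_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )
        self.mix = nn.Conv2d(2 * feat_ch, feat_ch, 3, padding=1)

    def forward(self, src_feat, src_pose, tgt_pose):
        # src_pose / tgt_pose: keypoint heatmaps resized to src_feat's H x W
        t = self.pose_enc(torch.cat([src_pose, tgt_pose], dim=1))
        return self.mix(torch.cat([src_feat, t], dim=1))  # transformed feature map
```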
9. Zero-shot policy generation in lifelong reinforcement learning. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.02.058]
11. Zhang W, He X, Yu X, Lu W, Zha Z, Tian Q. A Multi-scale Spatial-temporal Attention Model for Person Re-identification in Videos. IEEE Transactions on Image Processing 2019; 29:3365-3373. [PMID: 31880550] [DOI: 10.1109/tip.2019.2959653]
Abstract
In this paper, we propose a novel deep neural network-based attention model to learn representative local regions from a video sequence for person re-identification. Specifically, we propose a multi-scale spatial-temporal attention (MSTA) model to measure the regions of each frame at different scales from the perspective of the whole video sequence. Compared with traditional temporal attention models, MSTA focuses on exploiting the importance of the local regions of each frame to the whole video representation in both the spatial and temporal domains. A new training strategy is designed for the proposed model by incorporating the image-to-image mode with the video-to-video mode. Extensive experiments on benchmark datasets demonstrate the superiority of the proposed model over state-of-the-art methods.
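A minimal sketch of attention-weighted pooling over local regions, with scores normalized jointly over space and time; the single-scale scoring head is an assumption, since MSTA operates at multiple region scales.

```python
import torch
import torch.nn as nn

class SpatialTemporalAttentionPool(nn.Module):
    """Score every spatial location of every frame with a 1x1 convolution,
    softmax the scores jointly over space AND time, and pool the sequence
    into one clip descriptor by the score-weighted sum."""
    def __init__(self, channels=2048):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, 1)  # per-location importance score

    def forward(self, x):
        # x: (T, C, H, W) frame-level feature maps of one video sequence
        t, c, h, w = x.shape
        a = self.score(x).view(-1)                     # (T*H*W,) raw scores
        a = torch.softmax(a, dim=0).view(t, 1, h, w)   # joint space-time softmax
        return (a * x).sum(dim=(0, 2, 3))              # (C,) clip descriptor
```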
12. Gao C, Wang J, Liu L, Yu JG, Sang N. Superpixel-Based Temporally Aligned Representation for Video-Based Person Re-Identification. Sensors 2019; 19:3861. [PMID: 31500196] [PMCID: PMC6766808] [DOI: 10.3390/s19183861]
Abstract
Most existing person re-identification methods focus on matching still person images across non-overlapping camera views. Despite their excellent performance in some circumstances, these methods still suffer from occlusion and changes of pose, viewpoint, or lighting. Video-based re-id is a natural way to overcome these problems by exploiting space-time information from videos. One of the most challenging problems in video-based person re-identification is temporal alignment, in addition to spatial alignment. To address this problem, we propose an effective superpixel-based temporally aligned representation for video-based person re-identification, which represents a video sequence using only one walking cycle. Particularly, we first build a candidate set of walking cycles by extracting motion information at the superpixel level, which is more robust than extraction at the pixel level. Then, from the candidate set, we propose an effective criterion to select the walking cycle that best matches the intrinsic periodicity of walking persons. Finally, we propose a temporally aligned pooling scheme to describe the video data in the selected walking cycle. In addition, to characterize the individual still images in the cycle, we propose a superpixel-based representation to improve spatial alignment. Extensive experimental results on three public datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art approaches.
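The cycle-selection step can be illustrated with a simple periodicity criterion; autocorrelation of a per-frame motion-energy signal is a common choice and is used here only as a stand-in, not as the paper's actual criterion (which operates on superpixel-level motion).

```python
import numpy as np

def pick_walking_cycle(motion, min_len=10, max_len=40):
    """Given a 1-D per-frame motion-energy signal (assumed precomputed, e.g.
    from superpixel motion), estimate the walking period from the strongest
    autocorrelation lag in a plausible range and return one full cycle.
    Assumes len(motion) comfortably exceeds min_len."""
    m = motion - motion.mean()
    ac = np.correlate(m, m, mode='full')[len(m) - 1:]   # autocorrelation, lags >= 0
    lo, hi = min_len, min(max_len, len(m) - 1)
    period = lo + int(np.argmax(ac[lo:hi]))             # best lag = cycle length
    start = int(np.argmin(motion[:period]))             # anchor at a motion minimum
    return start, start + period                        # frame range of one cycle
```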
Affiliations
- Changxin Gao: Key Laboratory of Ministry of Education for Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China.
- Jin Wang: Key Laboratory of Ministry of Education for Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China.
- Leyuan Liu: National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China.
- Jin-Gang Yu: School of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, China.
- Nong Sang: Key Laboratory of Ministry of Education for Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China.