1
Yin Y, Xu W, Chen L, Wu H. CoT-UNet++: A medical image segmentation method based on contextual transformer and dense connection. Math Biosci Eng 2023; 20:8320-8336. [PMID: 37161200 DOI: 10.3934/mbe.2023364] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/11/2023]
Abstract
Accurate depiction of individual teeth from CBCT images is a critical step in the diagnosis of oral diseases. Traditional manual methods are tedious and laborious, so automatic segmentation of individual teeth in CBCT images is important to assist physicians in diagnosis and treatment. TransUNet, which combines the advantages of the Transformer and the CNN, has achieved success in medical image segmentation tasks. However, the skip connection adopted by TransUNet leads to unnecessarily restrictive fusion and also ignores the rich context between adjacent keys. To solve these problems, this paper proposes a context-transformed TransUNet++ (CoT-UNet++) architecture, which consists of a hybrid encoder, dense connections, and a decoder. Specifically, the hybrid encoder first obtains the contextual information between adjacent keys via CoTNet and the global context encoded by the Transformer. The decoder then upsamples the encoded features through cascaded upsamplers to recover the original resolution. Finally, multi-scale fusion between the encoded and decoded features at different levels is performed by dense concatenation to obtain more accurate location information. In addition, we employ a weighted loss function consisting of focal, Dice, and cross-entropy terms to reduce the training error and achieve pixel-level optimization. Experimental results demonstrate that the proposed CoT-UNet++ outperforms the baseline models and achieves better performance in tooth segmentation.
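The abstract names a weighted combination of focal, Dice, and cross-entropy losses but gives no weights; a minimal numpy sketch of such a combined pixel-wise loss for binary segmentation (the weights `alpha`, `beta`, `gamma` and the focal exponent are illustrative assumptions, not the paper's values) could look like:

```python
import numpy as np

def combined_loss(probs, targets, alpha=0.5, beta=0.3, gamma=0.2,
                  focal_gamma=2.0, eps=1e-7):
    """Weighted sum of focal, Dice, and cross-entropy losses for binary
    segmentation. probs and targets have the same shape; targets are 0/1,
    probs are predicted foreground probabilities. Weights are illustrative."""
    p = np.clip(probs, eps, 1 - eps)
    t = targets.astype(float)
    # pixel-wise binary cross-entropy
    ce = -(t * np.log(p) + (1 - t) * np.log(1 - p)).mean()
    # focal loss down-weights easy pixels via (1 - p_t)^focal_gamma
    p_t = t * p + (1 - t) * (1 - p)
    focal = (-(1 - p_t) ** focal_gamma * np.log(p_t)).mean()
    # soft Dice loss measures region overlap
    inter = (p * t).sum()
    dice = 1 - (2 * inter + eps) / (p.sum() + t.sum() + eps)
    return alpha * focal + beta * dice + gamma * ce
```

A near-perfect prediction drives all three terms toward zero, while a wrong prediction is penalized by each term.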
Affiliation(s)
- Yijun Yin
- School of Information Science and Engineering, Shandong University, Qingdao 266200, China
- Wenzheng Xu
- School of Information Science and Engineering, Shandong University, Qingdao 266200, China
- Lei Chen
- School of Information Science and Engineering, Shandong University, Qingdao 266200, China
- Hao Wu
- Department of Stomatology, the First Medical Centre, Chinese PLA General Hospital, Beijing 100853, China
2
Lu J, Wan H, Li P, Zhao X, Ma N, Gao Y. Exploring High-order Spatio-temporal Correlations from Skeleton for Person Re-identification. IEEE Trans Image Process 2023; PP:949-963. [PMID: 37021861 DOI: 10.1109/tip.2023.3236144] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/19/2023]
Abstract
Person re-identification (Re-ID) has become a hot research topic due to its widespread applications. Conducting person Re-ID in video sequences is a practical requirement, in which the crucial challenge is how to pursue a robust video representation based on spatial and temporal features. However, most previous methods only consider how to integrate part-level features in the spatio-temporal range, while how to model and generate the part correlations has been little explored. In this paper, we propose a skeleton-based dynamic hypergraph framework, the Skeletal Temporal Dynamic Hypergraph Neural Network (ST-DHGNN), for person Re-ID, which models the high-order correlations among various body parts based on a time series of skeletal information. Specifically, multi-shape and multi-scale patches are heuristically cropped from feature maps, constituting spatial representations in different frames. A joint-centered hypergraph and a bone-centered hypergraph are constructed in parallel from multiple body parts (i.e., head, trunk, and legs) with spatio-temporal multi-granularity over the entire video sequence, in which graph vertices represent regional features and hyperedges denote their relationships. Dynamic hypergraph propagation, containing a re-planning module and a hyperedge elimination module, is proposed to better integrate features among vertices. Feature aggregation and attention mechanisms are also adopted to obtain a better video representation for person Re-ID. Experiments show that the proposed method performs significantly better than the state-of-the-art on three video-based person Re-ID datasets: iLIDS-VID, PRID-2011, and MARS.
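The abstract does not spell out the propagation rule; as background, a plain (non-dynamic) hypergraph convolution of the kind such frameworks typically build on can be sketched with an incidence matrix H (the function name and the simple degree normalization are illustrative assumptions, not the paper's formulation):

```python
import numpy as np

def hypergraph_conv(X, H, w, Theta):
    """One plain hypergraph convolution step:
    X' = relu(Dv^-1 (H diag(w)) De^-1 H^T X Theta).
    X: (n_vertices, d) vertex features; H: (n_vertices, n_edges) incidence
    matrix with H[v, e] = 1 if vertex v belongs to hyperedge e;
    w: (n_edges,) hyperedge weights; Theta: (d, d_out) projection."""
    De = H.sum(axis=0)                         # hyperedge degrees
    Dv = (H * w).sum(axis=1)                   # weighted vertex degrees
    edge_feat = (H.T @ X) / De[:, None]        # average vertices into edges
    out = ((H * w) @ edge_feat) / Dv[:, None]  # scatter edge features back
    return np.maximum(out @ Theta, 0.0)        # ReLU
```

Each vertex thus mixes information with every other vertex sharing one of its hyperedges, which is what lets hyperedges encode higher-order (more-than-pairwise) part relationships.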
3
Liu X, Yuan D, Xue K, Li JB, Zhao H, Liu H, Wang T. Diffeomorphic matching with multiscale kernels based on sparse parameterization for cross-view target detection. Appl Intell 2022. [DOI: 10.1007/s10489-022-03668-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
4
OL-JCMSR: A Joint Coding Monitoring Strategy Recommendation Model Based on Operation Log. Mathematics 2022. [DOI: 10.3390/math10132292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
A surveillance system with hundreds of cameras and far fewer monitors relies heavily on manual scheduling and inspection by monitoring personnel. This paper proposes a monitoring method that improves surveillance performance by analyzing and learning from a large number of manual operation logs. Compared to fixed rules or existing computer-vision methods, the proposed method can more effectively learn from operators' behaviors and incorporate their intentions into the monitoring strategy. To the best of our knowledge, this is the first method to apply a monitoring-strategy recommendation model containing a global encoder and a local encoder in monitoring systems. The local encoder adaptively selects important items in the operating sequence to capture the operator's main purpose, while the global encoder summarizes the behavior of the entire sequence. Two experiments are conducted on two data sets. Compared with att-RNN and att-GRU, the joint coding model in experiment 1 improves Recall@20 by 9.4% and 4.6% and MRR@20 by 5.49% and 3.86%, respectively. In experiment 2, it improves Recall@20 by 11.8% and 6.2% and MRR@20 by 7.02% and 5.16%, respectively. The results illustrate the effectiveness of our model in monitoring systems.
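Recall@20 and MRR@20 are standard top-K ranking metrics for recommendation; a minimal sketch of how they are typically computed, assuming one ground-truth item per operating session (function name and data layout are illustrative):

```python
import numpy as np

def recall_mrr_at_k(score_lists, targets, k=20):
    """score_lists: (n_sessions, n_items) predicted scores;
    targets: ground-truth item index per session.
    Returns (Recall@k, MRR@k): the fraction of sessions whose target
    appears in the top-k, and the mean reciprocal rank of the target
    within the top-k (0 contribution when it is absent)."""
    hits, rr = 0, 0.0
    for scores, t in zip(score_lists, targets):
        topk = np.argsort(scores)[::-1][:k]   # indices of k highest scores
        pos = np.where(topk == t)[0]
        if pos.size:
            hits += 1
            rr += 1.0 / (pos[0] + 1)          # rank is 1-based
    n = len(targets)
    return hits / n, rr / n
```

For example, with two sessions where one target is ranked first and the other falls outside the top-k, Recall@k is 0.5 and MRR@k is 0.5.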
5
Wang M, Li P, Shen L, Wang Y, Wang S, Wang W, Zhang X, Chen J, Luo Z. Informative pairs mining based adaptive metric learning for adversarial domain adaptation. Neural Netw 2022; 151:238-249. [PMID: 35447481 DOI: 10.1016/j.neunet.2022.03.031] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/14/2021] [Revised: 02/22/2022] [Accepted: 03/24/2022] [Indexed: 11/16/2022]
Abstract
Adversarial domain adaptation has made remarkable progress in promoting feature transferability, while recent work reveals an unexpected degradation of feature discrimination during the learning of transferable features. This paper proposes informative-pairs-mining-based adaptive metric learning (IPM-AML), in which a novel two-triplet sampling strategy selects informative positive pairs from the same classes and informative negative pairs from different classes, and a metric loss with special weights adaptively pays more attention to the more informative pairs, improving discrimination. We then incorporate IPM-AML into the popular conditional domain adversarial network (CDAN) to learn feature representations that are both transferable and discriminative (IPM-AML-CDAN). To ensure the reliability of pseudo target labels throughout training, we select the more confident target samples whose predicted scores exceed a given threshold T, and also provide theoretical validation for this simple threshold strategy. Extensive experimental results on four cross-domain benchmarks validate that IPM-AML-CDAN achieves competitive results compared with state-of-the-art approaches.
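The threshold strategy for pseudo target labels can be sketched in a few lines (names are illustrative; predicted scores are taken here to be softmax class posteriors):

```python
import numpy as np

def select_confident(probs, T=0.9):
    """Keep only target-domain samples whose maximum predicted class
    probability exceeds threshold T; return their indices and pseudo
    labels. probs: (n_samples, n_classes) softmax outputs."""
    conf = probs.max(axis=1)            # confidence of the top class
    idx = np.where(conf > T)[0]         # samples passing the threshold
    return idx, probs[idx].argmax(axis=1)
```

Raising T trades pseudo-label coverage for reliability, which is the motivation for validating the threshold theoretically.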
Affiliation(s)
- Mengzhu Wang
- National University of Defense Technology, Changsha, Hunan, China
- Paul Li
- Baidu Research, Sunnyvale, CA, USA
- Li Shen
- JD Explore Academy, Beijing, China
- Ye Wang
- National University of Defense Technology, Changsha, Hunan, China
- Wei Wang
- Dalian University of Technology, Dalian, Liaoning, China
- Xiang Zhang
- National University of Defense Technology, Changsha, Hunan, China
- Zhigang Luo
- National University of Defense Technology, Changsha, Hunan, China
6
Ding X, Wang K, Wang C, Lan T, Liu L. Sequential convolutional network for behavioral pattern extraction in gait recognition. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.08.054] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/15/2022]
7
Xu L, Kim P, Wang M, Pan J, Yang X, Gao M. Spatio-temporal joint aberrance suppressed correlation filter for visual tracking. Complex Intell Syst 2021. [DOI: 10.1007/s40747-021-00544-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
Abstract
The discriminative correlation filter (DCF)-based tracking methods have achieved remarkable performance in visual tracking. However, the existing DCF paradigm still suffers from problems such as the boundary effect, filter degradation, and aberrance. To address these problems, we propose a spatio-temporal joint aberrance suppressed regularization (STAR) correlation filter tracker under a unified response-map framework. Specifically, a dynamic spatio-temporal regularizer is introduced into the DCF to alleviate the boundary effect and filter degradation simultaneously. Meanwhile, an aberrance suppressed regularizer is exploited to reduce the interference of background clutter. The proposed STAR model is effectively optimized using the alternating direction method of multipliers (ADMM). Finally, comprehensive experiments on the TC128, OTB2013, OTB2015 and UAV123 benchmarks demonstrate that the STAR tracker achieves compelling performance compared with state-of-the-art (SOTA) trackers.
8
9
Wu H, Tian J, Fu Y, Li B, Li X. Condition-Aware Comparison Scheme for Gait Recognition. IEEE Trans Image Process 2021; 30:2734-2744. [PMID: 33259300 DOI: 10.1109/tip.2020.3039888] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/12/2023]
Abstract
As an important and challenging problem, gait recognition has gained considerable attention. It suffers from confounding conditions; that is, it is sensitive to camera views, dressing types, and so on. Interestingly, it is observed that, under different conditions, local body parts contribute differently to recognition performance. In this paper, we propose a condition-aware comparison scheme that measures the similarity of gait pairs via a novel module named Instructor. We also present a geometry-guided data augmentation approach (Dresser) to enrich dressing conditions. Furthermore, to enhance the gait representation, we propose to model temporal local information from coarse to fine. Our model is evaluated on two popular benchmarks, CASIA-B and OULP. Results show that our method outperforms current state-of-the-art methods, especially in the cross-condition scenario.
10
Abstract
Gait recognition in video surveillance remains challenging because the employed gait features are usually affected by many variations. To overcome this difficulty, this paper presents a novel Deep Large Margin Nearest Neighbor (DLMNN) method for gait recognition. The proposed DLMNN trains a convolutional neural network to project gait features onto a metric subspace in which intra-class gait samples are pulled as close together as possible while inter-class samples are pushed apart by a large margin. We provide an extensive evaluation across various scenarios, namely normal, carrying, clothing, and cross-view conditions, on two widely used gait datasets. Experimental results demonstrate that the proposed DLMNN achieves competitive gait recognition performance and promising computational efficiency.
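The pull/push objective described above is in the spirit of a standard large-margin triplet loss; a minimal sketch on one triplet of embeddings (the margin value is an illustrative assumption, not the paper's setting):

```python
import numpy as np

def large_margin_loss(anchor, positive, negative, margin=1.0):
    """Large-margin objective on one triplet of embedding vectors:
    pull the positive (same identity) toward the anchor and require the
    negative (different identity) to be at least `margin` farther away
    in squared Euclidean distance."""
    d_pos = np.sum((anchor - positive) ** 2)  # intra-class distance
    d_neg = np.sum((anchor - negative) ** 2)  # inter-class distance
    return max(0.0, d_pos - d_neg + margin)   # hinge: zero once margin met
```

The loss is zero once the negative is farther than the positive by the margin, so well-separated identities stop contributing gradient.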
Affiliation(s)
- Wanjiang Xu
- Yancheng Teachers University, Yancheng, China
11
Cheng K, Gao S, Dong W, Yang X, Wang Q, Yu H. Boosting label weighted extreme learning machine for classifying multi-label imbalanced data. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.04.098] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/27/2022]
12
13
Liu F, Xu X, Zhang T, Guo K, Wang L. Exploring privileged information from simple actions for complex action recognition. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.11.020] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/01/2022]
14
Zhang Y, Huang Y, Yu S, Wang L. Cross-view Gait Recognition by Discriminative Feature Learning. IEEE Trans Image Process 2019; 29:1001-1015. [PMID: 31295113 DOI: 10.1109/tip.2019.2926208] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 05/27/2023]
Abstract
Recently, deep learning based cross-view gait recognition has become popular owing to the strong capacity of convolutional neural networks (CNNs). Current deep learning methods often rely on loss functions widely used in face recognition, e.g., contrastive loss and triplet loss, which suffer from the problem of hard negative mining. In this paper, a robust, effective and gait-related loss function, called angle center loss (ACL), is proposed to learn discriminative gait features. The proposed loss function is robust to different local parts and temporal window sizes. Unlike center loss, which learns one center for each identity, the proposed loss function learns multiple sub-centers for each angle of the same identity. Only the largest distance between the anchor feature and the corresponding cross-view sub-centers is penalized, which achieves better intra-subject compactness. We also propose to extract discriminative spatial-temporal features by local feature extractors and a temporal attention model. A simplified spatial transformer network is proposed to localize the suitable horizontal parts of the human body. Local gait features for each horizontal part are extracted and then concatenated as the descriptor. We introduce long short-term memory (LSTM) units as the temporal attention model to learn an attention score for each frame, e.g., focusing more on discriminative frames and less on low-quality frames. The temporal attention model performs better than temporal average pooling or gait energy images (GEI). By combining these three aspects, we achieve state-of-the-art results on several cross-view gait recognition benchmarks.
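The core idea of ACL stated above, penalizing only the largest distance between an anchor feature and its identity's per-angle sub-centers, can be sketched as follows (a simplified illustration of that one step, not the paper's full formulation):

```python
import numpy as np

def angle_center_penalty(feature, sub_centers):
    """sub_centers: (n_angles, d) sub-centers of the anchor's identity,
    one per camera angle. Per the abstract, only the largest
    anchor-to-sub-center distance is penalized, pulling in the worst
    view and encouraging intra-subject compactness across views."""
    dists = np.linalg.norm(sub_centers - feature, axis=1)
    return dists.max()
```

Minimizing the maximum (rather than the sum) means the gradient targets whichever view is currently farthest from the anchor.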