1
Pu Y, Han Y, Wang Y, Feng J, Deng C, Huang G. Fine-Grained Recognition With Learnable Semantic Data Augmentation. IEEE Trans Image Process 2024; 33:3130-3144. [PMID: 38662557] [DOI: 10.1109/tip.2024.3364500]
Abstract
Fine-grained image recognition is a longstanding computer vision challenge that focuses on differentiating objects belonging to multiple subordinate categories within the same meta-category. Since images belonging to the same meta-category usually share similar visual appearances, mining discriminative visual cues is the key to distinguishing fine-grained categories. Although commonly used image-level data augmentation techniques have achieved great success in generic image classification, they are rarely applied in fine-grained scenarios because their random region-editing behavior tends to destroy the discriminative visual cues residing in subtle regions. In this paper, we propose diversifying the training data at the feature level to alleviate the discriminative-region loss problem. Specifically, we produce diversified augmented samples by translating image features along semantically meaningful directions. The semantic directions are estimated with a covariance prediction network, which predicts a sample-wise covariance matrix to adapt to the large intra-class variation inherent in fine-grained images. Furthermore, the covariance prediction network is jointly optimized with the classification network in a meta-learning manner to alleviate the degenerate-solution problem. Experiments on four competitive fine-grained recognition benchmarks (CUB-200-2011, Stanford Cars, FGVC-Aircraft, NABirds) demonstrate that our method significantly improves the generalization performance of several popular classification networks (e.g., ResNets, DenseNets, EfficientNets, RegNets, and ViT). Combined with a recently proposed method, our semantic data augmentation approach achieves state-of-the-art performance on the CUB-200-2011 dataset. Source code is available at https://github.com/LeapLabTHU/LearnableISDA.
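The feature-level augmentation idea — sampling offsets from a per-sample Gaussian and translating deep features along them — can be sketched in a few lines of NumPy. The diagonal covariance, dimensions, and strength `lam` below are illustrative assumptions; the paper learns the covariance with a dedicated prediction network rather than taking it as given.

```python
import numpy as np

def semantic_augment(features, covariances, lam=0.5, n_aug=4, seed=0):
    """Augment each feature vector by sampling offsets from a zero-mean
    Gaussian with a per-sample (here: diagonal) covariance matrix.

    features:    (N, D) deep features
    covariances: (N, D) per-sample diagonal covariances (assumed given)
    returns:     (N, n_aug, D) augmented features
    """
    rng = np.random.default_rng(seed)
    N, D = features.shape
    # offsets ~ N(0, lam * Sigma_i); diagonal Sigma -> scale unit noise
    noise = rng.standard_normal((N, n_aug, D))
    offsets = noise * np.sqrt(lam * covariances)[:, None, :]
    return features[:, None, :] + offsets

feats = np.zeros((2, 8))
covs = np.full((2, 8), 0.01)   # small hypothetical predicted variances
aug = semantic_augment(feats, covs)
print(aug.shape)  # (2, 4, 8)
```

Each of the `n_aug` copies stays close to the original feature because the sampled offsets are scaled by the (small) predicted variances.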
2
Wang D, Guo L, Zhong J, Yu H, Tang Y, Peng L, Cai Q, Qi Y, Zhang D, Lin P. A novel deep-learning based weighted feature fusion architecture for precise classification of pressure injury. Front Physiol 2024; 15:1304829. [PMID: 38455845] [PMCID: PMC10917912] [DOI: 10.3389/fphys.2024.1304829]
Abstract
Introduction: Precise classification plays an important role in the treatment of pressure injury (PI), yet current machine-learning- or deep-learning-based methods of PI classification remain of low accuracy. Methods: In this study, we developed a deep-learning-based weighted feature fusion architecture for fine-grained classification, which combines a top-down and a bottom-up pathway to fuse high-level semantic information with low-level detail representation. We validated it on our established database, which consists of 1,519 images from multi-center clinical cohorts. ResNeXt was set as the backbone network. Results: We increased the accuracy for stage 3 PI from 60.3% to 76.2% by adding a weighted feature pyramid network (wFPN). The accuracies for stage 1, 2, and 4 PI were 0.870, 0.788, and 0.845, respectively. The overall accuracy, precision, recall, and F1-score of our network were 0.815, 0.808, 0.816, and 0.811, respectively. The area under the receiver operating characteristic curve was 0.940. Conclusions: Compared with a previously reported study, our network significantly increased the overall accuracy from 75% to 81.5% and showed strong performance in predicting each stage. Upon further validation, our study will pave the path to the clinical application of our network in PI management.
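As a rough illustration of weighted feature fusion, the sketch below combines two same-shaped feature maps with non-negative normalized weights (fast-normalized fusion in the BiFPN style); the weights and map shapes are assumptions, not the wFPN's actual parameters.

```python
import numpy as np

def weighted_fuse(feat_a, feat_b, w=(1.0, 1.0), eps=1e-4):
    """Fuse two same-shaped feature maps with non-negative learnable
    weights, normalized so that they sum to (almost) one."""
    wa, wb = max(w[0], 0.0), max(w[1], 0.0)
    total = wa + wb + eps
    return (wa * feat_a + wb * feat_b) / total

high = np.ones((4, 4))        # up-sampled high-level semantic map
low = np.full((4, 4), 3.0)    # low-level detail map
fused = weighted_fuse(high, low, w=(2.0, 1.0))
print(fused[0, 0])  # (2*1 + 1*3) / (3 + 1e-4) ≈ 1.6666
```

Learning the scalar weights lets the network decide, per fusion node, whether semantic or detail information should dominate.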
Affiliation(s)
- Dongfang Wang
- Department of Neurosurgery, Wuhan University Renmin Hospital, Wuhan, China
- School of Physics and Technology, Wuhan University, Wuhan, China
- Lirui Guo
- Department of Neurosurgery, Wuhan University Renmin Hospital, Wuhan, China
- Juan Zhong
- Department of Neurosurgery, Wuhan University Renmin Hospital, Wuhan, China
- Huodan Yu
- Department of Neurosurgery, Wuhan University Renmin Hospital, Wuhan, China
- Yadi Tang
- Department of Neurosurgery, Wuhan University Renmin Hospital, Wuhan, China
- Li Peng
- Union Hospital Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Qiuni Cai
- Neurosurgery Department, Zhongshan Hospital Xiamen University, Xiamen, China
- Yangzhi Qi
- Department of Neurosurgery, Wuhan University Renmin Hospital, Wuhan, China
- Dong Zhang
- School of Physics and Technology, Wuhan University, Wuhan, China
- Puxuan Lin
- Department of Neurosurgery, Wuhan University Renmin Hospital, Wuhan, China
3
Zhang Y, Hu J, Jiang R, Lin Z, Chen Z. Fine-Grained Radio Frequency Fingerprint Recognition Network Based on Attention Mechanism. Entropy (Basel) 2023; 26:29. [PMID: 38248155] [PMCID: PMC10814318] [DOI: 10.3390/e26010029]
Abstract
With the rapid development of the Internet of Things (IoT), hundreds of millions of IoT devices, such as smart home appliances, intelligent connected vehicles, and wearable devices, have been connected to the network. The open nature of IoT makes it vulnerable to cybersecurity threats. Traditional cryptography-based encryption methods are not suitable for IoT due to their complexity and high communication overhead. By contrast, RF-fingerprint-based recognition is promising because it is rooted in the inherent, non-reproducible hardware defects of the transmitter. However, it still faces the challenges of low inter-class variation and large intra-class variation among RF fingerprints. Inspired by fine-grained recognition in computer vision, we propose a fine-grained RF fingerprint recognition network (FGRFNet) in this article. The network consists of a top-down feature pathway hierarchy to generate pyramidal features, attention modules to locate discriminative regions, and a fusion module to adaptively integrate features from different scales. Experiments demonstrate that the proposed FGRFNet achieves recognition accuracies of 89.8% on 100 ADS-B devices, 99.5% on 54 Zigbee devices, and 83.0% on 25 LoRa devices.
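The attention-module idea can be illustrated with a squeeze-and-excitation-style channel attention in NumPy; this is a generic stand-in, not FGRFNet's actual module, and all shapes and weights below are made-up assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(fmap, w1, w2):
    """Squeeze-and-excitation-style channel attention: global-average-pool
    each channel, pass through a two-layer bottleneck, and rescale.

    fmap: (C, H, W); w1: (C//r, C); w2: (C, C//r)
    """
    squeeze = fmap.mean(axis=(1, 2))                      # (C,) pooled stats
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))  # (C,) channel gates
    return fmap * excite[:, None, None]

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 4))                 # hypothetical feature map
w1, w2 = rng.standard_normal((2, 8)), rng.standard_normal((8, 2))
out = channel_attention(fmap, w1, w2)
print(out.shape)  # (8, 4, 4)
```

The gates in (0, 1) emphasize channels that respond to discriminative regions and suppress the rest.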
Affiliation(s)
- Jun Hu
- School of Electronics and Communication Engineering, Sun Yat-sen University, Shenzhen 518107, China; (Y.Z.); (R.J.); (Z.L.); (Z.C.)
4
Liu D, Zhang D, Wang L, Wang J. Semantic segmentation of autonomous driving scenes based on multi-scale adaptive attention mechanism. Front Neurosci 2023; 17:1291674. [PMID: 37928734] [PMCID: PMC10620498] [DOI: 10.3389/fnins.2023.1291674]
Abstract
Introduction: Semantic segmentation is a crucial visual representation learning task for autonomous driving systems, as it enables the perception of surrounding objects and road conditions to ensure safe and efficient navigation. Methods: In this paper, we present a novel semantic segmentation approach for autonomous driving scenes using a Multi-Scale Adaptive Attention Mechanism (MSAAM). The proposed method addresses the challenges associated with complex driving environments, including large-scale variations, occlusions, and diverse object appearances. Our MSAAM integrates features from multiple scales and adaptively selects the most relevant ones for precise segmentation. We introduce a novel attention module that incorporates spatial, channel-wise, and scale-wise attention mechanisms to effectively enhance the discriminative power of features. Results: On key objectives of the Cityscapes dataset, the model achieves ClassAvg: 81.13 and mIoU: 71.46. On comprehensive evaluation metrics it achieves AUROC: 98.79, AP: 68.46, and FPR95: 5.72, at a computational cost of 2117.01 GFLOPs and an inference time of 61.06 ms. All results are superior to those of the compared models. Discussion: The proposed method achieves superior performance compared to state-of-the-art techniques on several benchmark datasets, demonstrating its efficacy in addressing the challenges of autonomous driving scene understanding.
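Scale-wise adaptive selection, one of the three attention types described above, can be sketched as a softmax-weighted fusion of features from several scales (already resized to a common shape); the shapes and logits below are illustrative assumptions, not the MSAAM's learned values.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_scale_fusion(scale_feats, scale_logits):
    """Fuse features from S scales (each already resized to a common
    (C, H, W) shape) using softmax scale-attention weights."""
    weights = softmax(scale_logits)                # (S,) scale weights
    stacked = np.stack(scale_feats)                # (S, C, H, W)
    return np.tensordot(weights, stacked, axes=1)  # (C, H, W)

# three hypothetical scales with constant responses 1, 2, 4
feats = [np.full((3, 2, 2), v) for v in (1.0, 2.0, 4.0)]
fused = adaptive_scale_fusion(feats, np.array([0.0, 0.0, 0.0]))
print(fused[0, 0, 0])  # equal logits -> equal weights -> (1+2+4)/3
```

In a trained network the logits would themselves be predicted from the input, so the preferred scale can change per image.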
Affiliation(s)
- Danping Liu
- School of Advanced Manufacturing Engineering, Hefei University, Hefei, China
- Dong Zhang
- State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun, China
- Lei Wang
- School of Advanced Manufacturing Engineering, Hefei University, Hefei, China
- Jun Wang
- School of Advanced Manufacturing Engineering, Hefei University, Hefei, China
5
Zhang J, Jiao L, Ma W, Liu F, Liu X, Li L, Zhu H. RDLNet: A Regularized Descriptor Learning Network. IEEE Trans Neural Netw Learn Syst 2023; 34:5669-5681. [PMID: 34878982] [DOI: 10.1109/tnnls.2021.3130655]
Abstract
Local image descriptor learning has been instrumental in various computer vision tasks. Recent innovations center on similarity measurement of descriptor vectors with metric learning over randomly selected Siamese or triplet patches. Local descriptor learning focuses more on hard samples, since easy samples do not contribute much to optimization. However, few studies address hard samples of image patches from the perspective of loss functions or design appropriate learning algorithms to obtain a more compact descriptor representation. This article proposes a regularized descriptor learning network (RDLNet) that makes the network focus on learning hard samples and compact descriptors with triplet networks. A novel hard-sample mining strategy is designed to select the hardest negative samples in each mini-batch. A batch margin loss concerned with hard samples is then adopted to optimize the distance in extreme cases. Finally, for a more stable network that avoids collapse, orthogonal regularization is designed to constrain convolutional kernels and obtain rich deep features. RDLNet provides a compact, discriminative, low-dimensional representation and can be embedded in other pipelines easily. This article gives extensive experimental results on large benchmarks in multiple scenarios, with significant improvements in generalization for matching applications.
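The two key ingredients — hardest-negative mining within a mini-batch and orthogonal regularization of the kernels — can be sketched in NumPy as follows; the tiny descriptor batch is a made-up example, not data from the paper.

```python
import numpy as np

def hardest_negative_mining(desc, labels):
    """For each anchor, pick the index of the closest descriptor with a
    different label (the hardest negative in the mini-batch)."""
    d2 = ((desc[:, None, :] - desc[None, :, :]) ** 2).sum(-1)
    same = labels[:, None] == labels[None, :]
    d2[same] = np.inf                 # exclude self and positives
    return d2.argmin(axis=1)

def orthogonal_penalty(W):
    """||W W^T - I||_F^2: pushes projection rows toward orthonormality,
    stabilizing training and encouraging richer features."""
    gram = W @ W.T
    return float(((gram - np.eye(W.shape[0])) ** 2).sum())

desc = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [5.0, 0.0]])
labels = np.array([0, 0, 1, 2])
print(hardest_negative_mining(desc, labels))  # [2 2 1 2]
print(orthogonal_penalty(np.eye(3)))          # 0.0 for orthonormal rows
```

The mined indices would feed a margin loss on the anchor-negative distances, while the penalty is simply added to the training objective.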
6
Li X, Jiang A, Qiu Y, Li M, Zhang X, Yan S. TPFR-Net: U-shaped model for lung nodule segmentation based on transformer pooling and dual-attention feature reorganization. Med Biol Eng Comput 2023. [PMID: 37243853] [DOI: 10.1007/s11517-023-02852-9]
Abstract
Accurate segmentation of lung nodules is key to diagnosing the lesion type of a lung nodule. The complex boundaries of lung nodules and their visual similarity to surrounding tissues make precise segmentation challenging. Traditional CNN-based lung nodule segmentation models focus on extracting local features from neighboring pixels and ignore global contextual information, which is prone to incomplete segmentation of lung nodule boundaries. In the U-shaped encoder-decoder structure, variations in image resolution caused by up-sampling and down-sampling result in the loss of feature information, which reduces the reliability of output features. This paper proposes a transformer pooling module and a dual-attention feature reorganization module to effectively remedy these two defects. The transformer pooling module fuses the self-attention layer and pooling layer of the transformer, which compensates for the limitation of the convolution operation, reduces the loss of feature information in the pooling process, and significantly decreases the computational complexity of the transformer. The dual-attention feature reorganization module employs a dual channel-and-spatial attention mechanism to improve sub-pixel convolution, minimizing the loss of feature information during up-sampling. In addition, two convolutional modules are proposed, which together with the transformer pooling module form an encoder that can adequately extract local features and global dependencies. We use a fusion loss function and a deep supervision strategy in the decoder to train the model. The proposed model has been extensively evaluated on the LIDC-IDRI dataset; the highest Dice similarity coefficient is 91.84 and the highest sensitivity is 92.66, indicating that the model's comprehensive capability surpasses the state-of-the-art UTNet. The proposed model thus shows superior segmentation performance for lung nodules and can provide a more in-depth assessment of their shape, size, and other characteristics, which is of clinical significance and application value in assisting physicians with the early diagnosis of lung nodules.
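One plausible reading of a "transformer pooling" step — self-attention followed by strided pooling that halves the token count — can be sketched as follows. This is an illustrative simplification under assumed shapes, not the paper's exact module.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(tokens, stride=2):
    """Single-head self-attention followed by strided average pooling:
    mixes in global context, then halves the token count so later
    attention layers run on fewer tokens (cheaper).

    tokens: (N, D) -> (N // stride, D)
    """
    d = tokens.shape[1]
    attn = softmax(tokens @ tokens.T / np.sqrt(d))   # (N, N) weights
    mixed = attn @ tokens                            # context-mixed tokens
    return mixed.reshape(-1, stride, d).mean(axis=1)

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 16))   # 8 hypothetical patch tokens
y = attention_pool(x)
print(y.shape)  # (4, 16)
```

Pooling after attention, rather than before, means the discarded positions still contribute context to the survivors.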
Affiliation(s)
- Xiaotian Li
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030000, China
- Ailian Jiang
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030000, China
- Yanfang Qiu
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030000, China
- Mengyang Li
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030000, China
- Xinyue Zhang
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030000, China
- Shuotian Yan
- College of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang, 050000, China
7
Li X, Jiang A, Wang S, Li F, Yan S. CTBP-Net: Lung nodule segmentation model based on the cross-transformer and bidirectional pyramid. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104528]
8
Zeng M, Zou B, Kui X, Zhu C, Xiao L, Chen Z, Du J. PA-LBF: Prefix-Based and Adaptive Learned Bloom Filter for Spatial Data. Int J Intell Syst 2023. [DOI: 10.1155/2023/4970776]
Abstract
The recently proposed learned Bloom filter (LBF) opens a new perspective on how to reconstruct Bloom filters with machine learning. However, the LBF has a massive time cost and does not apply to multidimensional spatial data. In this paper, we propose a prefix-based and adaptive learned Bloom filter (PA-LBF) for spatial data, which efficiently supports insertion and deletion. The proposed PA-LBF is divided into three parts: (1) prefix-based classification, in which the Z-order space-filling curve is used to extract, prefix, and classify the data; (2) an adaptive learning process, in which multiple independent adaptive sub-LBFs are designed to train the suffixes of the data, combined with part (1), to reduce the false positive rate (FPR) and the time consumed by the query and learning process; and (3) a backup filter using a counting Bloom filter (CBF), of which two kinds are constructed for situations with different insertion and deletion frequencies. Experimental results prove the validity of the theory and show that the PA-LBF reduces the FPR by 84.87%, 79.53%, and 43.01% with the same memory usage compared with the LBF on three real-world spatial datasets. Moreover, the time consumption of the PA-LBF on the query and learning process is substantially reduced compared with that of the LBF.
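The prefix-based classification step — mapping a 2-D spatial key to a 1-D Morton (Z-order) code and routing by its prefix to a sub-filter — can be sketched in pure Python. The bit widths are assumptions, and the learned sub-LBFs and backup CBF are omitted here.

```python
def z_order(x, y, bits=8):
    """Interleave the bits of (x, y) into a Z-order (Morton) code,
    which maps 2-D spatial keys onto a 1-D key that can be prefixed."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x bit i -> even position
        code |= ((y >> i) & 1) << (2 * i + 1)  # y bit i -> odd position
    return code

def prefix_bucket(code, prefix_bits=4, total_bits=16):
    """The top prefix_bits of the Morton code choose which sub-filter
    (sub-LBF) handles the key's remaining suffix."""
    return code >> (total_bits - prefix_bits)

code = z_order(3, 5)   # x = 011b, y = 101b -> interleaved 100111b = 39
print(code, prefix_bucket(code))
```

Because Z-order preserves spatial locality, nearby points tend to share prefixes and therefore land in the same sub-filter.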
9
Xue P, Lu Y, Chang J, Wei X, Wei Z. IR²Net: information restriction and information recovery for accurate binary neural networks. Neural Comput Appl 2023. [DOI: 10.1007/s00521-023-08495-z]
10
Xu X, Yang CC, Xiao Y, Kong JL. A Fine-Grained Recognition Neural Network with High-Order Feature Maps via Graph-Based Embedding for Natural Bird Diversity Conservation. Int J Environ Res Public Health 2023; 20:4924. [PMID: 36981832] [PMCID: PMC10048992] [DOI: 10.3390/ijerph20064924]
Abstract
The conservation of avian diversity plays a critical role in maintaining ecological balance and ecosystem function, and has a profound impact on human survival and livelihood. With the continuous and rapid decline of species, information and intelligent technology have provided innovative knowledge about how functional biological diversity interacts with environmental changes. Especially in complex natural scenes, identifying bird species accurately and in real time is vital to protecting the ecological environment and monitoring biodiversity changes. Aiming at the fine-grained problem in bird image recognition, this paper proposes a fine-grained detection neural network based on optimizing the YOLOv5 structure via a graph pyramid attention convolution operation. First, the Cross Stage Partial (CSP) structure is introduced into a brand-new backbone classification network (GPA-Net) to significantly reduce the model's parameters. Then, the graph pyramid structure is applied to learn bird image features at different scales, which enhances fine-grained learning ability and embeds high-order features while reducing parameters. Third, YOLOv5 with a soft non-maximum suppression (NMS) strategy is adopted for the detector, improving the detection capability for small targets. Detailed experiments demonstrate that the proposed model achieves better or equivalent accuracy, outperforming current advanced models in bird species identification, and is more stable and suitable for practical applications in biodiversity conservation.
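The soft-NMS strategy mentioned above can be sketched directly: instead of discarding boxes that overlap the current top box, their scores are decayed by a Gaussian of the IoU, which helps keep small or crowded targets. The boxes and σ below are made-up values.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def soft_nms(boxes, scores, sigma=0.5):
    """Gaussian soft-NMS: decay overlapping scores by exp(-iou^2/sigma)
    instead of hard suppression; returns selection order and scores."""
    scores = scores.astype(float).copy()
    order, idx = [], list(range(len(boxes)))
    while idx:
        best = max(idx, key=lambda i: scores[i])
        order.append(best)
        idx.remove(best)
        for i in idx:
            scores[i] *= np.exp(-iou(boxes[best], boxes[i]) ** 2 / sigma)
    return order, scores

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]])
scores = np.array([0.9, 0.8, 0.7])
order, decayed = soft_nms(boxes, scores)
print(order)  # [0, 2, 1]: the overlapping box is demoted, not deleted
```

The heavily overlapping second box survives with a reduced score rather than being removed outright, as hard NMS would do at a typical threshold.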
Affiliation(s)
- Xin Xu
- School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China
- State Environmental Protection Key Laboratory of Food Chain Pollution Control, Beijing Technology and Business University, Beijing 100048, China
- Cheng-Cai Yang
- School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China
- Yang Xiao
- School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China
- Jian-Lei Kong
- School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China
- State Environmental Protection Key Laboratory of Food Chain Pollution Control, Beijing Technology and Business University, Beijing 100048, China
11
Li M, Chen C, Cao Y, Zhou P, Deng X, Liu P, Wang Y, Lv X, Chen C. CIABNet: Category imbalance attention block network for the classification of multi-differentiated types of esophageal cancer. Med Phys 2023; 50:1507-1527. [PMID: 36272103] [DOI: 10.1002/mp.16067]
Abstract
BACKGROUND: Esophageal cancer has become one of the important cancers that seriously threaten human life and health, and its incidence and mortality rates are still among the highest of malignant tumors. Histopathological image analysis is the gold standard for diagnosing the differentiation types of esophageal cancer. PURPOSE: The grading accuracy and interpretability of auxiliary diagnostic models for esophageal cancer are seriously affected by small inter-class differences, imbalanced data distribution, and poor model interpretability. We therefore developed the category imbalance attention block network (CIABNet) model to address these problems. METHODS: First, quantitative metrics and model visualization results are integrated to transfer knowledge from source-domain images to better identify regions of interest (ROI) in the esophageal cancer target domain. Second, to attend to subtle inter-class differences, we propose a concatenate fusion attention block, which simultaneously focuses on contextual local feature relationships and changes in channel attention weights among different regions. Third, we propose a category imbalance attention module, which treats each esophageal cancer differentiation class fairly by aggregating information of different intensities at multiple scales and exploring more representative regional features for each class, effectively mitigating the negative impact of category imbalance. Finally, we use feature map visualization to interpret whether the ROIs are the same or similar between the model and pathologists, thereby improving the interpretability of the model. RESULTS: The experimental results show that the CIABNet model outperforms other state-of-the-art models in classifying the differentiation types of esophageal cancer, with an average classification accuracy of 92.24%, an average precision of 93.52%, an average recall of 90.31%, an average F1 value of 91.73%, and an average AUC value of 97.43%. In addition, the CIABNet model identifies ROIs that are essentially similar or identical to those of pathologists in histopathological images of esophageal cancer. CONCLUSIONS: Our experimental results show that the proposed computer-aided diagnostic algorithm has great potential for histopathological images of multiple differentiation types of esophageal cancer.
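The paper's category imbalance attention module is specific to its architecture; as a generic illustration of counteracting class imbalance in the loss, the sketch below computes class weights from the standard "effective number of samples" re-weighting scheme — a different, well-known technique, not CIABNet's module. The stage counts are made up.

```python
import numpy as np

def effective_number_weights(counts, beta=0.999):
    """Class weights from the effective number of samples,
    (1 - beta^n) / (1 - beta): rare classes get larger weights.
    Shown only to illustrate how imbalance can be counteracted."""
    eff = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    w = 1.0 / eff
    return w * len(counts) / w.sum()   # normalize to mean 1

counts = np.array([900, 300, 60, 20])   # hypothetical imbalanced counts
w = effective_number_weights(counts)
print(w)  # monotonically increasing: rarer class, larger weight
```

Multiplying each class's loss term by its weight keeps the minority classes from being drowned out during training.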
Affiliation(s)
- Min Li
- College of Information Science and Engineering, Xinjiang University, Urumqi, China
- Key Laboratory of Signal Detection and Processing, Xinjiang University, Urumqi, China
- Chen Chen
- College of Information Science and Engineering, Xinjiang University, Urumqi, China
- Xinjiang Cloud Computing Application Laboratory, Karamay, China
- Yanzhen Cao
- Department of Pathology, The Affiliated Tumor Hospital of Xinjiang Medical University, Urumqi, China
- Panyun Zhou
- College of Software, Xinjiang University, Urumqi, China
- Xin Deng
- College of Software, Xinjiang University, Urumqi, China
- Pei Liu
- College of Information Science and Engineering, Xinjiang University, Urumqi, China
- Yunling Wang
- The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
- Xiaoyi Lv
- College of Information Science and Engineering, Xinjiang University, Urumqi, China
- Key Laboratory of Signal Detection and Processing, Xinjiang University, Urumqi, China
- Xinjiang Cloud Computing Application Laboratory, Karamay, China
- College of Software, Xinjiang University, Urumqi, China
- Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, China
- Cheng Chen
- College of Software, Xinjiang University, Urumqi, China
12
Multi-point attention-based semi-supervised learning for diabetic retinopathy classification. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104412]
13
Attention-Based Sentiment Region Importance and Relationship Analysis for Image Sentiment Recognition. Comput Intell Neurosci 2022; 2022:9772714. [DOI: 10.1155/2022/9772714]
Abstract
Image sentiment recognition has attracted considerable attention from academia and industry due to the increasing tendency to express opinions via images and videos online. Previous studies focus on multilevel representation from global and local views to improve recognition performance. However, the importance of visual regions and the relationships among them remain insufficiently studied for image sentiment recognition. This paper proposes an attention-based sentiment region importance and relationship (ASRIR) analysis method, including importance attention and relation attention, for image sentiment recognition. First, we extract spatial region features from the image using a multilevel pyramid network. Second, we design importance attention to explore sentiment-semantic-related regions and relation attention to investigate the relationships between regions. To relieve excessive concentration of attention, we employ a unimodal function as a regularization term in the objective. Finally, the region features weighted by the attention mechanisms are fused and input into a fully connected layer for classification. Extensive experiments on commonly used image sentiment datasets demonstrate that our proposed method outperforms state-of-the-art approaches.
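One way to picture attention fusion with a concentration-discouraging regularizer is sketched below; the entropy-based penalty is an illustrative stand-in for the paper's unimodal regularizer, and all shapes and coefficients are assumptions.

```python
import numpy as np

def region_attention_fuse(region_feats, logits, reg_coeff=0.1):
    """Weight region features by softmax attention and fuse them.
    Returns the fused feature and a regularizer that penalizes
    overly concentrated (low-entropy) attention; this penalty is an
    illustrative stand-in for the paper's unimodal regularizer."""
    e = np.exp(logits - logits.max())
    attn = e / e.sum()                       # (R,) region weights
    fused = attn @ region_feats              # (D,) fused feature
    entropy = -(attn * np.log(attn + 1e-12)).sum()
    penalty = reg_coeff * (np.log(len(attn)) - entropy)  # 0 when uniform
    return fused, penalty

feats = np.eye(4)                 # 4 regions with one-hot 4-dim features
fused, pen = region_attention_fuse(feats, np.zeros(4))
print(fused, pen)                 # uniform attention -> penalty ~ 0
```

A spiky attention distribution would raise the penalty, nudging the model to spread attention over several sentiment-relevant regions.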
14
Fine-grained image recognition via trusted multi-granularity information fusion. Int J Mach Learn Cyb 2022. [DOI: 10.1007/s13042-022-01685-6]
15
Zhao P, Miao Q, Li H, Liu R, Quan Y, Song J. Refined Probability Distribution Module for Fine-Grained Visual Categorization. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.10.004]
16
Bera A, Wharton Z, Liu Y, Bessis N, Behera A. SR-GNN: Spatial Relation-aware Graph Neural Network for Fine-Grained Image Categorization. IEEE Trans Image Process 2022; PP:6017-6031. [PMID: 36103441] [DOI: 10.1109/tip.2022.3205215]
Abstract
Over the past few years, significant progress has been made in deep convolutional neural network (CNN)-based image recognition, mainly due to such networks' strong ability to mine discriminative object pose and part information from texture and shape. This is often insufficient for fine-grained visual classification (FGVC), which exhibits high intra-class and low inter-class variance due to occlusions, deformation, illumination, etc. Thus, an expressive feature representation describing global structural information is key to characterizing an object or scene. To this end, we propose a method that effectively captures subtle changes by aggregating context-aware features from the most relevant image regions and their importance in discriminating fine-grained categories, avoiding bounding-box and/or distinguishable-part annotations. Our approach is inspired by recent advances in self-attention and graph neural networks (GNNs): it includes a simple yet effective relation-aware feature transformation and refines the transformed feature with a context-aware attention mechanism to boost its discriminability in an end-to-end learning process. Our model is evaluated on eight benchmark datasets consisting of fine-grained objects and human-object interactions, and it outperforms state-of-the-art approaches by a significant margin in recognition accuracy.
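The relation-aware feature transformation can be pictured as one message-passing step over a similarity graph of image regions; the sketch below is a generic simplification with made-up dimensions, not SR-GNN's exact update rule.

```python
import numpy as np

def relation_aware_transform(regions, tau=1.0):
    """One message-passing step over image regions: edge weights come
    from softmax-normalized pairwise feature similarity, and each
    region feature is refined by aggregating its neighbors.

    regions: (R, D) region descriptors -> (R, D) refined descriptors
    """
    sim = regions @ regions.T / tau              # (R, R) similarities
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    adj = e / e.sum(axis=1, keepdims=True)       # row-stochastic graph
    return adj @ regions                         # context-aware features

rng = np.random.default_rng(0)
regions = rng.standard_normal((6, 32))           # 6 hypothetical regions
refined = relation_aware_transform(regions)
print(refined.shape)  # (6, 32)
```

Because the graph is built from the features themselves, the relations adapt per image rather than relying on fixed part annotations.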
17
Liu K, Chen K, Jia K. Convolutional Fine-Grained Classification With Self-Supervised Target Relation Regularization. IEEE Trans Image Process 2022; 31:5570-5584. [PMID: 35981063] [DOI: 10.1109/tip.2022.3197931]
Abstract
Fine-grained visual classification can be addressed by deep representation learning under the supervision of manually pre-defined targets (e.g., one-hot or Hadamard codes). Such target coding schemes are less flexible in modeling inter-class correlation and are also sensitive to sparse and imbalanced data distributions. In light of this, this paper introduces a novel target coding scheme, dynamic target relation graphs (DTRG), which, as an auxiliary feature regularization, is a self-generated structural output to be mapped from input images. Specifically, online computation of class-level feature centers is designed to generate cross-category distances in the representation space, which can be depicted by a dynamic graph in a non-parametric manner. Explicitly minimizing intra-class feature variations anchored on those class-level centers encourages the learning of discriminative features. Moreover, by exploiting inter-class dependency, the proposed target graphs can alleviate data sparsity and imbalance in representation learning. Inspired by the recent success of mixup-style data augmentation, this paper introduces randomness into the soft construction of dynamic target relation graphs to further explore the relation diversity of target classes. Experimental results demonstrate the effectiveness of our method on a number of diverse visual classification benchmarks; in particular, it achieves state-of-the-art performance on three popular fine-grained object benchmarks and superior robustness against sparse and imbalanced data. Source code is publicly available at https://github.com/AkonLau/DTRG.
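The two building blocks — class-level feature centers computed from a batch and the distance graph between them — can be sketched in a few lines of NumPy on a toy batch; the features and labels below are made up for illustration.

```python
import numpy as np

def class_centers(feats, labels, n_classes):
    """Per-class feature centers: the anchors of the dynamic graph."""
    return np.stack([feats[labels == c].mean(axis=0)
                     for c in range(n_classes)])

def target_relation_graph(centers):
    """Pairwise Euclidean distances between class centers: a
    non-parametric, dynamically updated inter-class relation graph."""
    diff = centers[:, None, :] - centers[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

feats = np.array([[0.0, 0.0], [0.0, 2.0], [3.0, 0.0], [3.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centers = class_centers(feats, labels, 2)
print(target_relation_graph(centers))  # [[0. 3.] [3. 0.]]
```

During training, the graph is recomputed as the centers drift, so it reflects the current inter-class geometry instead of a fixed code.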
18
Chen J, Li H, Liang J, Su X, Zhai Z, Chai X. Attention-based cropping and erasing learning with coarse-to-fine refinement for fine-grained visual classification. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.06.041]
19
Prediction of Metabolic Characteristics of Cardiovascular and Cerebrovascular Diseases Based on Convolutional Neural Network. Comput Math Methods Med 2022; 2022:3206378. [PMID: 35936374] [PMCID: PMC9348942] [DOI: 10.1155/2022/3206378]
Abstract
As a typical disease class, cardiovascular and cerebrovascular diseases cause great damage to the human body. To address the problem that existing models fail to describe and represent the characteristics of cardiovascular and cerebrovascular indicators, a convolutional neural network was used to analyze the metabolic factors of cardiovascular and cerebrovascular disease. Based on convolutional neural network theory, feature extraction was carried out on the relevant model parameters, and the change trend of different cardiovascular and cerebrovascular indicators was studied through model optimization, theoretical analysis, and experimental verification. Relevant studies show that the neuron value increases slowly at first and then rapidly as the bias term increases, and that the corresponding nonlinear characteristics are gradually reflected as computing time increases; the influence of computing time on neuron results should therefore be considered when selecting the bias term. The gradient changes under different activation functions show typical symmetry, indicating that the effects of the functions on model parameters have certain cyclic characteristics. Among them, the ReLU function has the largest gradient variation range, the tanh function a relatively small one, and the sigmoid function the smallest. Five indicators were selected to describe the metabolic characteristics of the disease through characteristic analysis of cardiovascular and cerebrovascular diseases. The onset signs have the greatest impact on cardiovascular and cerebrovascular diseases, while the corresponding metabolic characteristics have the least. The study showed that the influence of different indicators on the model had typical stage characteristics, and relevant data were used to verify the accuracy of the model. Finally, the optimized convolutional-neural-network model was used to predict the metabolic characteristics of cardiovascular and cerebrovascular diseases, and relevant studies show that it analyzes these characteristics well. This research can provide theoretical support for the application of convolutional neural networks in other fields.
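The abstract's claim about activation-function gradients can be checked directly: the derivative of ReLU spans [0, 1], tanh's derivative spans a slightly narrower range on a bounded interval, and sigmoid's derivative is capped at 0.25. A short sketch (ours, not the paper's code):

```python
import numpy as np

# Compare gradient ranges of the three activations discussed above
# over a bounded input interval.
x = np.linspace(-4, 4, 1001)

relu_grad = (x > 0).astype(float)   # d/dx max(0, x)
tanh_grad = 1 - np.tanh(x) ** 2     # d/dx tanh(x)
sig = 1 / (1 + np.exp(-x))
sigmoid_grad = sig * (1 - sig)      # d/dx sigmoid(x)

# Peak-to-trough spread of each gradient curve on this interval.
for name, g in [("ReLU", relu_grad), ("tanh", tanh_grad), ("sigmoid", sigmoid_grad)]:
    print(name, round(float(g.max() - g.min()), 4))
```

On this interval the spreads are ordered ReLU > tanh > sigmoid, matching the ranking stated in the abstract.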
|
20
|
|
21
|
A Graph-Related High-Order Neural Network Architecture via Feature Aggregation Enhancement for Identification Application of Diseases and Pests. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:4391491. [PMID: 35665281 PMCID: PMC9162821 DOI: 10.1155/2022/4391491] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/23/2022] [Accepted: 04/05/2022] [Indexed: 11/29/2022]
Abstract
Diseases and pests are essential threat factors that affect agricultural production, food security supply, and ecological plant diversity. However, the accurate recognition of various diseases and pests is still challenging for existing advanced information and intelligence technologies. Disease and pest recognition is typically a fine-grained visual classification problem, which easily confuses traditional coarse-grained methods because of the external similarity between different categories and the significant differences among subsamples of the same category. To this end, this paper proposes an effective graph-related high-order network with feature aggregation enhancement (GHA-Net) to handle the fine-grained image recognition of plant pests and diseases. In our approach, an improved CSP-stage backbone network is first formed to offer massive channel-shuffled features at multiple granularities. Secondly, relying on a multilevel attention mechanism, the feature aggregation enhancement module is designed to exploit distinguishable fine-grained features representing different discriminating parts. Meanwhile, the graph convolution module is constructed to analyse the graph-correlated representation of part-specific interrelationships by regularizing semantic features into the high-order tensor space. With the collaborative learning of the three modules, our approach can grasp robust contextual details of diseases and pests for better fine-grained identification. Extensive experiments on several public fine-grained disease and pest datasets demonstrate that the proposed GHA-Net achieves better accuracy and efficiency than several existing models and is more suitable for fine-grained identification applications in complex scenes.
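The "channel-shuffled features" mentioned above refer to the standard channel-shuffle operation popularized by ShuffleNet-style backbones: channels are split into groups and interleaved so information mixes across groups. A minimal sketch (ours; GHA-Net's actual backbone details may differ):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across groups, the standard channel-shuffle
    operation used in CSP/ShuffleNet-style backbones.
    x has shape (batch, channels, height, width)."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channels must divide evenly into groups"
    # Split channels into groups, swap the group and per-group axes,
    # then flatten back: channel order [0, g, 2g, ..., 1, g+1, ...].
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)

x = np.arange(32).reshape(1, 8, 2, 2)
y = channel_shuffle(x, 2)
```

The operation is a pure permutation of channels, so it adds no parameters and preserves every activation value.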
|
22
|
Cheng L, Fang P, Liang Y, Zhang L, Shen C, Wang H. TSGB: Target-Selective Gradient Backprop for Probing CNN Visual Saliency. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:2529-2540. [PMID: 35275820 DOI: 10.1109/tip.2022.3157149] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
The explanation of deep neural networks has drawn extensive attention in the deep learning community over the past few years. In this work, we study visual saliency, a.k.a. visual explanation, to interpret convolutional neural networks. Compared with iteration-based saliency methods, single-backward-pass-based saliency methods are faster and are widely used in downstream visual tasks; we therefore focus on single-backward-pass-based methods. However, existing methods in this category struggle to produce fine-grained saliency maps concentrating on specific target classes. That is, producing faithful saliency maps satisfying both target-selectiveness and fine-grainedness with a single backward pass remains a challenging problem in the field. To mitigate this problem, we revisit the gradient flow inside the network, and find that entangled semantics and the original weights may disturb the propagation of target-relevant saliency. Inspired by these observations, we propose a novel visual saliency method, termed Target-Selective Gradient Backprop (TSGB), which leverages rectification operations to effectively emphasize target classes and efficiently propagate the saliency to the image space, thereby generating target-selective and fine-grained saliency maps. The proposed TSGB consists of two components, namely, TSGB-Conv and TSGB-FC, which rectify the gradients for convolutional layers and fully-connected layers, respectively. Extensive qualitative and quantitative experiments on the ImageNet and Pascal VOC datasets show that the proposed method achieves more accurate and reliable results than other competitive methods. Code is available at https://github.com/123fxdx/CNNvisualizationTSGB.
|
23
|
Self-distribution binary neural networks. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03348-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
|
24
|
Yu J, Li K, Peng J. Reference-guided face inpainting with reference attention network. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-06961-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
25
|
Zhang C, Chen P, Lei T, Wu Y, Meng H. What-Where-When Attention Network for video-based person re-identification. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.10.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
26
|
Liu M, Zhang C, Bai H, Zhang R, Zhao Y. Cross-Part Learning for Fine-Grained Image Classification. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 31:748-758. [PMID: 34928798 DOI: 10.1109/tip.2021.3135477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Recent techniques have achieved remarkable improvements by mining subtle yet distinctive features for fine-grained visual classification (FGVC). While prior works directly combine discriminative features extracted from different parts, we argue that the potential interactions between different parts and their contributions to category prediction should be taken into consideration, so that significant parts contribute more to the sub-category decision. To this end, we present a Cross-Part Convolutional Neural Network (CP-CNN) trained in a weakly supervised manner to explore cross-learning among multi-regional features. Specifically, a context transformer is implemented to encourage joint feature learning across different parts under the guidance of a navigator. The part with the highest confidence is regarded as the navigator and delivers distinguishing characteristics to the parts with lower confidence while the complementary information is retained. To locate discriminative but subtle parts precisely, a part proposal generator (PPG) is designed with feature enhancement blocks, through which complex scale variations caused by viewpoint diversity can be effectively alleviated. Extensive experiments on three benchmark datasets demonstrate that our proposed method consistently outperforms existing state-of-the-art methods.
|
27
|
Xiang X, Zhang Y, Jin L, Li Z, Tang J. Sub-Region Localized Hashing for Fine-Grained Image Retrieval. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 31:314-326. [PMID: 34871171 DOI: 10.1109/tip.2021.3131042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Fine-grained image hashing is challenging due to the difficulty of capturing discriminative local information to generate hash codes. On the one hand, existing methods usually extract local features with a dense attention mechanism that focuses on dense local regions, which cannot capture diverse local information for fine-grained hashing. On the other hand, hash codes of the same class suffer from the large intra-class variation of fine-grained images. To address the above problems, this work proposes a novel sub-Region Localized Hashing (sRLH) to learn intra-class compact and inter-class separable hash codes that also contain diverse subtle local information for efficient fine-grained image retrieval. Specifically, to localize diverse local regions, a sub-region localization module is developed to learn discriminative local features by locating the peaks of non-overlapping sub-regions in the feature map. Different from localizing dense local regions, these peaks guide the sub-region localization module to capture multifarious local discriminative information by paying close attention to dispersive local regions. To mitigate intra-class variations, hash codes of the same class are enforced to approach one common binary center. Meanwhile, Gram-Schmidt orthogonalization is performed on the binary centers to make the hash codes inter-class separable. Extensive experimental results on four widely used fine-grained image retrieval datasets demonstrate the superiority of sRLH over several state-of-the-art methods. The source code of sRLH will be released at https://github.com/ZhangYajie-NJUST/sRLH.git.
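The Gram-Schmidt step mentioned above can be illustrated in isolation: given class-center vectors, orthogonalization makes the centers mutually perpendicular, which pushes codes of different classes apart. This is a generic sketch of classical Gram-Schmidt (ours, not sRLH's code; sRLH's actual centers are binary and learned jointly with the network):

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: return an orthonormal basis for the
    rows of `vectors`, dropping near-dependent rows. Used here to
    illustrate how class centers can be made mutually orthogonal."""
    ortho = []
    for v in vectors.astype(float):
        # Subtract projections onto all previously accepted directions.
        w = v - sum((v @ u) * u for u in ortho)
        norm = np.linalg.norm(w)
        if norm > 1e-10:
            ortho.append(w / norm)
    return np.stack(ortho)

V = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1]])
Q = gram_schmidt(V)
```

After orthogonalization, the inner product between any two distinct centers is zero, so the Hamming-space separation between the corresponding binary codes is maximized up to the code length.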
|
28
|
Xie J, Ma Z, Xue JH, Zhang G, Sun J, Zheng Y, Guo J. DS-UI: Dual-Supervised Mixture of Gaussian Mixture Models for Uncertainty Inference in Image Recognition. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:9208-9219. [PMID: 34739376 DOI: 10.1109/tip.2021.3123555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
This paper proposes a dual-supervised uncertainty inference (DS-UI) framework for improving Bayesian estimation-based UI in DNN-based image recognition. In the DS-UI, we combine the classifier of a DNN, i.e., the last fully-connected (FC) layer, with a mixture of Gaussian mixture models (MoGMM) to obtain an MoGMM-FC layer. Unlike existing UI methods for DNNs, which only calculate the means or modes of the distributions of the DNN outputs, the proposed MoGMM-FC layer acts as a probabilistic interpreter for the features input to the classifier, directly calculating their probabilities for the DS-UI. In addition, we propose a dual-supervised stochastic gradient-based variational Bayes (DS-SGVB) algorithm for MoGMM-FC layer optimization. Unlike conventional SGVB and the optimization algorithms of other UI methods, DS-SGVB not only models the samples of the specific class for each Gaussian mixture model (GMM) in the MoGMM, but also considers the negative samples from other classes, reducing intra-class distances and enlarging inter-class margins simultaneously to enhance the learning ability of the MoGMM-FC layer in the DS-UI. Experimental results show that the DS-UI outperforms state-of-the-art UI methods in misclassification detection. We further evaluate the DS-UI in open-set out-of-domain/-distribution detection and find statistically significant improvements. Visualizations of the feature spaces demonstrate the superiority of the DS-UI. Codes are available at https://github.com/PRIS-CV/DS-UI.
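The probabilistic interpretation described above rests on evaluating feature vectors under a Gaussian mixture. A toy sketch of a diagonal-covariance GMM log-likelihood, the basic quantity such a layer computes (ours; the MoGMM-FC layer's actual parameterization and training are more involved):

```python
import numpy as np

def gmm_log_prob(x, means, variances, weights):
    """Log-likelihood of feature vector x under a diagonal-covariance
    Gaussian mixture, computed with the log-sum-exp trick for
    numerical stability. Parameters here are illustrative."""
    log_comp = []
    for m, v, w in zip(means, variances, weights):
        # Log-density of one diagonal Gaussian component.
        ll = -0.5 * (np.sum((x - m) ** 2 / v) + np.sum(np.log(2 * np.pi * v)))
        log_comp.append(np.log(w) + ll)
    log_comp = np.array(log_comp)
    mx = log_comp.max()
    return mx + np.log(np.exp(log_comp - mx).sum())

# Two identical components collapse to a single Gaussian.
lp = gmm_log_prob(np.array([0.0]),
                  [np.array([0.0]), np.array([0.0])],
                  [np.array([1.0]), np.array([1.0])],
                  [0.5, 0.5])
```

Low log-likelihood under every class's mixture then serves as a signal of predictive uncertainty, which is the quantity misclassification and out-of-distribution detection threshold on.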
|
29
|
Zhang H, Liu J, Yu Z, Wang P. MASG-GAN: A multi-view attention superpixel-guided generative adversarial network for efficient and simultaneous histopathology image segmentation and classification. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.08.039] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
|
30
|
Shen L, You L, Peng B, Zhang C. Group multi-scale attention pyramid network for traffic sign detection. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.04.083] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|