1. Kohler M, Eisenbach M, Gross HM. Few-Shot Object Detection: A Comprehensive Survey. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:11958-11978. [PMID: 37067965] [DOI: 10.1109/tnnls.2023.3265051]
Abstract
Humans are able to learn to recognize new objects even from a few examples. In contrast, training deep-learning-based object detectors requires huge amounts of annotated data. To avoid having to acquire and annotate these huge amounts of data, few-shot object detection (FSOD) aims to learn from a few object instances of new categories in the target domain. In this survey, we provide an overview of the state of the art in FSOD. We categorize approaches according to their training scheme and architectural layout. For each type of approach, we describe the general realization as well as concepts to improve the performance on novel categories. Where appropriate, we give short takeaways on these concepts to highlight the best ideas. Finally, we introduce commonly used datasets and their evaluation protocols and analyze the reported benchmark results. In doing so, we emphasize common challenges in evaluation and identify the most promising current trends in the emerging field of FSOD.
2. Li G, Cheng D, Ding X, Wang N, Li J, Gao X. Weakly Supervised Temporal Action Localization With Bidirectional Semantic Consistency Constraint. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:13032-13045. [PMID: 37134038] [DOI: 10.1109/tnnls.2023.3266062]
Abstract
Weakly supervised temporal action localization (WTAL) aims to classify actions and localize their temporal boundaries in videos given only video-level category labels during training. Owing to the lack of boundary information at training time, existing approaches formulate WTAL as a classification problem, i.e., they generate a temporal class activation map (T-CAM) for localization. With only a classification loss, however, the model is suboptimal: action-related scenes alone are sufficient to distinguish the class labels. Treating other actions in an action-related scene (i.e., a scene shared with positive actions) as co-scene actions, such a suboptimal model misclassifies co-scene actions as positive actions. To address this misclassification, we propose a simple yet effective method, named the bidirectional semantic consistency constraint (Bi-SCC), to discriminate positive actions from co-scene actions. The proposed Bi-SCC first applies a temporal context augmentation to generate an augmented video that breaks the correlation between positive actions and their co-scene actions across videos. A semantic consistency constraint (SCC) is then used to enforce consistency between the predictions for the original and augmented videos, thereby suppressing co-scene actions. However, we find that the augmented video destroys the original temporal context, so naively applying the consistency constraint would compromise the completeness of localized positive actions. Hence, we enforce the SCC bidirectionally, cross-supervising the original and augmented videos, to suppress co-scene actions while preserving the integrity of positive actions. Our Bi-SCC can be plugged into current WTAL approaches and improves their performance. Experimental results show that our approach outperforms state-of-the-art methods on THUMOS14 and ActivityNet. The code is available at https://github.com/lgzlIlIlI/BiSCC.
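To make the consistency idea concrete, here is a minimal sketch of a bidirectional consistency term in PyTorch, assuming snippet-level class scores that are already aligned between the original and augmented videos; the function name and the MSE form are illustrative assumptions, not the authors' implementation.

```python
import torch.nn.functional as F

def bi_scc_loss(scores_orig, scores_aug):
    """Bidirectional consistency sketch: snippet-level class scores ([T, C])
    of the original and context-augmented videos cross-supervise each other,
    with a stop-gradient on the "teacher" side of each direction. Assumes the
    two score sequences are aligned snippet-by-snippet (an assumption)."""
    # Augmented predictions follow the original ones: responses that existed
    # only because of the (now broken) co-scene correlation are suppressed.
    to_aug = F.mse_loss(scores_aug, scores_orig.detach())
    # Original predictions follow the augmented ones: positive actions stay
    # complete despite the destroyed temporal context.
    to_orig = F.mse_loss(scores_orig, scores_aug.detach())
    return to_aug + to_orig
```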
3. Ji Z, An P, Liu X, Gao C, Pang Y, Shao L. Semantic-Aware Dynamic Generation Networks for Few-Shot Human-Object Interaction Recognition. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:12564-12575. [PMID: 37037250] [DOI: 10.1109/tnnls.2023.3263660]
Abstract
Recognizing human-object interactions (HOIs) requires inferring various relationships between actions and objects. Although great progress has been made in HOI, the long-tail and combinatorial explosion problems remain practical challenges. To this end, we formulate HOI as a few-shot task to tackle both challenges and design a novel dynamic generation method to address it. The proposed approach is called semantic-aware dynamic generation networks (SADG-Nets). Specifically, SADG-Net first assigns semantic-aware task representations to different batches of data, from which it generates dynamic parameters. It thereby obtains features that adaptively highlight intercategory discriminability and intracategory commonality. In addition, we design a dual semantic-aware encoder module (DSAE-Module), i.e., verb-aware and noun-aware branches, to yield both action and object prototypes of HOI for each task space, which generalizes to novel combinations by transferring similarities among interactions. Extensive experimental results on two benchmark datasets, i.e., humans interacting with common objects (HICO)-FS and trento universal HOI (TUHOI)-FS, show that SADG-Net outperforms state-of-the-art approaches, demonstrating its effectiveness for few-shot HOI recognition.
4. Qin H, Cai M, Qin H. NABNet: Deep Learning-Based IoT Alert System for Detection of Abnormal Neck Behavior. Sensors (Basel) 2024; 24:5379. [PMID: 39205072] [PMCID: PMC11360098] [DOI: 10.3390/s24165379]
Abstract
The excessive use of electronic devices for prolonged periods has led to problems such as neck pain and pressure injury in sedentary people. If not detected and corrected early, these issues can pose serious risks to physical health. Generic object detectors cannot adequately capture such subtle neck behaviors, resulting in missed detections. In this paper, we explore a deep learning-based solution for detecting abnormal neck behavior and propose a model called NABNet, which combines object detection based on YOLOv5s with pose estimation based on Lightweight OpenPose. NABNet extracts detailed neck behavior characteristics from global to local and detects abnormal behavior by analyzing the angles derived from the pose data. We deployed NABNet on cloud and edge devices to achieve remote monitoring and abnormal behavior alarms. Finally, we applied the resulting NABNet-based IoT system to abnormal behavior detection in order to evaluate its effectiveness. The experimental results show that our system can effectively detect abnormal neck behavior and raise alarms on the cloud platform, with the highest accuracy reaching 94.13%.
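As a rough illustration of the angle-analysis step, the sketch below computes a neck tilt angle from two 2-D keypoints of the kind Lightweight OpenPose produces; the keypoint pair, the threshold, and the function names are assumptions for illustration, not the paper's specification.

```python
import numpy as np

def neck_angle(neck_xy, nose_xy):
    """Angle (degrees) between the neck->nose vector and the vertical image
    axis, from 2-D pose keypoints. Larger angles suggest a stronger tilt."""
    v = np.asarray(nose_xy, float) - np.asarray(neck_xy, float)
    vertical = np.array([0.0, -1.0])          # image y-axis points downward
    cos = v @ vertical / (np.linalg.norm(v) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

ALERT_THRESHOLD_DEG = 30.0                    # hypothetical alert threshold
if neck_angle((120, 200), (150, 160)) > ALERT_THRESHOLD_DEG:
    print("abnormal neck posture detected")   # ~36.9 degrees here
```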
Affiliation(s)
- Hongshuai Qin
- School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China
5. Wang J, Qiao L, Zhou S, Zhou J, Wang J, Li J, Ying S, Chang C, Shi J. Weakly Supervised Lesion Detection and Diagnosis for Breast Cancers With Partially Annotated Ultrasound Images. IEEE Transactions on Medical Imaging 2024; 43:2509-2521. [PMID: 38373131] [DOI: 10.1109/tmi.2024.3366940]
Abstract
Deep learning (DL) has proven highly effective for ultrasound-based computer-aided diagnosis (CAD) of breast cancers. In an automatic CAD system, lesion detection is critical for the subsequent diagnosis. However, existing DL-based methods generally require voluminous manually-annotated region of interest (ROI) labels and class labels to train both the lesion detection and diagnosis models. In clinical practice, the ROI labels, i.e., ground truths, may not always be optimal for the classification task due to the individual experience of sonologists, and this coarse annotation limits the diagnosis performance of a CAD model. To address this issue, a novel Two-Stage Detection and Diagnosis Network (TSDDNet) is proposed based on weakly supervised learning to improve the diagnostic accuracy of ultrasound-based CAD for breast cancers. In particular, all initial ROI-level labels are treated as coarse annotations before model training. In the first training stage, a candidate selection mechanism refines the manual ROIs in the fully annotated images and generates accurate pseudo-ROIs for the partially annotated images under the guidance of class labels. The training set is then updated with more accurate ROI labels for the second training stage, in which a fusion network integrates the detection network and the classification network into a unified end-to-end framework as the final CAD model. A self-distillation strategy is designed on this model for joint optimization to further improve its diagnostic performance. The proposed TSDDNet is evaluated on three B-mode ultrasound datasets, and the experimental results indicate that it achieves the best performance on both the lesion detection and diagnosis tasks, suggesting promising application potential.
6. Shi J, Zhang K, Guo C, Yang Y, Xu Y, Wu J. A survey of label-noise deep learning for medical image analysis. Med Image Anal 2024; 95:103166. [PMID: 38613918] [DOI: 10.1016/j.media.2024.103166]
Abstract
Several factors are associated with the success of deep learning. One of the most important is the availability of large-scale datasets with clean annotations. However, obtaining datasets with accurate labels in the medical imaging domain is challenging: the reliability and consistency of medical labeling are among the issues, and low-quality annotations with label noise are common. Because noisy labels reduce the generalization performance of deep neural networks, learning with noisy labels is becoming an essential task in medical image analysis. Literature on this topic has expanded in volume and scope, but no recent surveys have collected and organized this knowledge, impeding the ability of researchers and practitioners to utilize it. In this work, we present an up-to-date survey of label-noise learning for the medical image domain. We review the extensive literature, illustrate typical methods, and present unified taxonomies in terms of methodological differences. We then compare the methods and discuss their respective advantages and disadvantages. Finally, we discuss new research directions based on the characteristics of medical images. Our survey aims to provide researchers and practitioners with a solid understanding of existing medical label-noise learning, such as the main algorithms developed over the past few years, and thereby help them investigate new methods to combat the negative effects of label noise.
Affiliation(s)
- Jialin Shi
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Kailai Zhang
- Department of Networks, China Mobile Communications Group Co., Ltd., Beijing, China
- Chenyi Guo
- Department of Electronic Engineering, Tsinghua University, Beijing, China
- Yali Xu
- Department of Breast Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Ji Wu
- Department of Electronic Engineering, Tsinghua University, Beijing, China
7. Su L, Fei L, Zhang B, Zhao S, Wen J, Xu Y. Complete Region of Interest for Unconstrained Palmprint Recognition. IEEE Transactions on Image Processing 2024; 33:3662-3675. [PMID: 38837937] [DOI: 10.1109/tip.2024.3407666]
Abstract
Unconstrained palmprint images have shown great potential for recognition applications due to their lower restrictions on hand poses and backgrounds during contactless image acquisition. However, they face two challenges: 1) unclear palm contours and finger-valley points make it difficult to locate the landmarks used to crop the palmprint region of interest (ROI); and 2) large intra-class diversities hinder the learning of intra-class-invariant palmprint features. In this paper, we propose to directly extract the complete palmprint region as the ROI (CROI) using a detection-style CenterNet, without requiring the detection of any landmarks; large intra-class diversities, however, may remain. To address this, we further propose a palmprint feature alignment and learning hybrid network (PalmALNet) for unconstrained palmprint recognition. Specifically, we first exploit and align the multi-scale shallow representation of unconstrained palmprint images via deformable convolution and alignment-aware supervision, such that the pixel gaps between intra-class palmprint CROIs are minimized in the shallow feature space. Then, we develop multiple triple-attention learning modules that integrate spatial, channel, and self-attention operations into convolution to adaptively learn and highlight latent identity-invariant palmprint information, enhancing the overall discriminative power of the palmprint features. Extensive experimental results on four challenging palmprint databases demonstrate the promising effectiveness of both the proposed PalmALNet and the CROI for unconstrained palmprint recognition.
8. Lin Y, Wang Z, Zhang D, Cheng KT, Chen H. BoNuS: Boundary Mining for Nuclei Segmentation With Partial Point Labels. IEEE Transactions on Medical Imaging 2024; 43:2137-2147. [PMID: 38231818] [DOI: 10.1109/tmi.2024.3355068]
Abstract
Nuclei segmentation is a fundamental prerequisite in the digital pathology workflow. The development of automated methods for nuclei segmentation enables quantitative analysis of the wide existence and large variances in nuclei morphometry in histopathology images. However, manual annotation of tens of thousands of nuclei is tedious and time-consuming and requires a significant amount of human effort and domain-specific expertise. To alleviate this problem, we propose a weakly-supervised nuclei segmentation method that requires only partial point labels of nuclei. Specifically, we propose a novel boundary mining framework for nuclei segmentation, named BoNuS, which simultaneously learns nuclei interior and boundary information from the point labels. To achieve this goal, we propose a novel boundary mining loss, which guides the model to learn boundary information by exploring pairwise pixel affinity in a multiple-instance learning manner. We then consider a more challenging setting, i.e., partial point labels, for which we propose a nuclei detection module with curriculum learning to detect the missing nuclei using prior morphological knowledge. The proposed method is validated on three public datasets: MoNuSeg, CPM, and CoNIC. Experimental results demonstrate the superior performance of our method over state-of-the-art weakly-supervised nuclei segmentation methods. Code: https://github.com/hust-linyi/bonus.
9. Wan Y, Zhong Y, Ma A, Wang J, Zhang L. E2SCNet: Efficient Multiobjective Evolutionary Automatic Search for Remote Sensing Image Scene Classification Network Architecture. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:7752-7766. [PMID: 36395135] [DOI: 10.1109/tnnls.2022.3220699]
Abstract
Remote sensing image scene classification methods based on deep learning have been widely studied and discussed. However, most network architectures are borrowed directly from natural image processing and are fixed. A few studies have focused on automatic search mechanisms, but they cannot balance interpretation accuracy against parameter count for practical applications. As a result, automatic global search methods based on multiobjective evolutionary computation have more advantages. However, in the ranking process, network individuals with large parameter counts are easily eliminated, even though they may reach higher accuracy after full training. In addition, evolutionary neural architecture search methods often take several days. In this article, to address the above concerns, we propose an efficient multiobjective evolutionary automatic search framework for remote sensing image scene classification deep learning network architectures (E2SCNet). In E2SCNet, eight kinds of lightweight operators are used to build a diversified search space, and the coding connection mode is flexible. In the search process, a large-model retention mechanism is implemented through two-step multiobjective modeling and evolutionary search, where one step involves the "parameter quantity and accuracy" and the other involves the "parameter quantity and accuracy growth quantity." Moreover, a super network is constructed to share weights during individual network evaluation and speed up the search. The effectiveness of E2SCNet is proven by comparison with several networks designed by human experts and networks obtained by gradient- and evolutionary-computation-based search methods.
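The two-step multiobjective modeling rests on Pareto ranking of candidate networks. Below is a small, self-contained sketch of extracting the non-dominated front for two minimized objectives (parameter count and error rate); the objective pairing is an illustrative assumption, not E2SCNet's exact formulation.

```python
def pareto_front(population):
    """Return the non-dominated (params, error) pairs, both minimized.
    An individual is dominated if another is no worse in both objectives
    and strictly better in at least one."""
    front = []
    for i, a in enumerate(population):
        dominated = any(
            b[0] <= a[0] and b[1] <= a[1] and b != a
            for j, b in enumerate(population) if j != i
        )
        if not dominated:
            front.append(a)
    return front

# Toy candidates: (parameter count, validation error)
print(pareto_front([(1e6, 0.10), (2e6, 0.08), (3e6, 0.09), (2e6, 0.12)]))
# -> [(1000000.0, 0.1), (2000000.0, 0.08)]
```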
10. Tan D, Huang Z, Peng X, Zhong W, Mahalec V. Deep Adaptive Fuzzy Clustering for Evolutionary Unsupervised Representation Learning. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:6103-6117. [PMID: 37027776] [DOI: 10.1109/tnnls.2023.3243666]
Abstract
Cluster assignment of large and complex datasets is a crucial but challenging task in pattern recognition and computer vision. In this study, we explore the possibility of employing fuzzy clustering in a deep neural network framework and present a novel evolutionary unsupervised representation learning model with iterative optimization. It implements a deep adaptive fuzzy clustering (DAFC) strategy that learns a convolutional neural network classifier from only unlabeled data samples. DAFC consists of a deep feature quality-verifying model and a fuzzy clustering model, implementing a deep feature representation learning loss function and embedded fuzzy clustering with weighted adaptive entropy. We couple fuzzy clustering with the deep reconstruction model, in which fuzzy membership is utilized to represent a clear structure of deep cluster assignments and to jointly optimize deep representation learning and clustering. The joint model also evaluates the current clustering performance by inspecting whether data resampled from the estimated bottleneck space have consistent clustering properties, progressively improving the deep clustering model. Experiments on various datasets show that the proposed method achieves substantially better reconstruction and clustering quality than state-of-the-art deep clustering methods, as demonstrated by the in-depth analysis in our extensive experiments.
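For readers unfamiliar with the fuzzy-membership machinery, the sketch below shows the classical fuzzy c-means membership update that underlies the fuzzy clustering component; DAFC's weighted adaptive entropy variant modifies this rule, so treat this as background, not the paper's method.

```python
import numpy as np

def fuzzy_memberships(X, centers, m=2.0):
    """Classical fuzzy c-means update: u[i, k] is the degree to which sample i
    belongs to cluster k, and each row of u sums to one."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1) + 1e-10
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

X = np.random.rand(100, 8)    # e.g., bottleneck features from the encoder
centers = X[np.random.choice(100, 3, replace=False)]
U = fuzzy_memberships(X, centers)
assert np.allclose(U.sum(axis=1), 1.0)
```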
11. Liang Y, Zhu L, Wang X, Yang Y. Penalizing the Hard Example But Not Too Much: A Strong Baseline for Fine-Grained Visual Classification. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:7048-7059. [PMID: 36409807] [DOI: 10.1109/tnnls.2022.3213563]
Abstract
Though significant progress has been achieved on fine-grained visual classification (FGVC), severe overfitting still hinders model generalization. A recent study shows that hard samples in the training set can be fit easily, yet most existing FGVC methods fail to classify some hard examples in the test set: the model overfits the hard examples in the training set but does not learn to generalize to unseen examples. In this article, we propose a moderate hard example modulation (MHEM) strategy to properly modulate the hard examples. MHEM encourages the model not to overfit hard examples and offers better generalization and discrimination. First, we introduce three conditions and formulate a general form of a modulated loss function. Second, we instantiate the loss function and provide a strong baseline for FGVC, with which the performance of a naive backbone can be boosted to be comparable with recent methods. Moreover, we demonstrate that our baseline can be readily incorporated into existing methods and empowers these methods to be more discriminative. Equipped with our strong baseline, we achieve consistent improvements on three typical FGVC datasets, i.e., CUB-200-2011, Stanford Cars, and FGVC-Aircraft. We hope the idea of moderate hard example modulation will inspire future research toward more effective fine-grained visual recognition.
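One way to picture a "moderate" modulation is a hard-example weight that grows as the target probability drops but saturates, so very hard (possibly noisy) examples are not over-penalized. The sketch below is one possible instantiation under these assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def moderated_ce(logits, targets, gamma=1.0, tau=0.5):
    """Cross-entropy with a saturating hard-example weight: hard examples
    (low target probability p) get larger weights, clipped at tau so they are
    penalized 'but not too much'. gamma and tau are illustrative knobs."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    p = torch.softmax(logits, dim=1).gather(1, targets[:, None]).squeeze(1)
    w = torch.clamp((1.0 - p) ** gamma, max=tau) / tau
    return (w * ce).mean()

logits = torch.randn(8, 200)            # CUB-200-2011 has 200 classes
targets = torch.randint(0, 200, (8,))
print(moderated_ce(logits, targets))
```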
12. Chai Z, Luo L, Lin H, Heng PA, Chen H. Deep Omni-Supervised Learning for Rib Fracture Detection From Chest Radiology Images. IEEE Transactions on Medical Imaging 2024; 43:1972-1982. [PMID: 38215335] [DOI: 10.1109/tmi.2024.3353248]
Abstract
Deep learning (DL)-based rib fracture detection has shown promise for playing an important role in preventing mortality and improving patient outcomes. Developing DL-based object detection models normally requires a huge amount of bounding-box annotation. However, annotating medical data is time-consuming and expertise-demanding, making it extremely infeasible to obtain a large amount of fine-grained annotations. This poses a pressing need for label-efficient detection models that alleviate radiologists' labeling burden. To tackle this challenge, the object detection literature has witnessed an increase in weakly-supervised and semi-supervised approaches, yet it still lacks a unified framework that leverages various forms of fully-labeled, weakly-labeled, and unlabeled data. In this paper, we present a novel omni-supervised object detection network, ORF-Netv2, to leverage as much available supervision as possible. Specifically, a multi-branch omni-supervised detection head is introduced, with each branch trained on a specific type of supervision. A co-training-based dynamic label assignment strategy is then proposed to enable flexible and robust learning from the weakly-labeled and unlabeled data. Extensive evaluation was conducted with three rib fracture datasets on both chest CT and X-ray. By leveraging all forms of supervision, ORF-Netv2 achieves mAPs of 34.7, 44.7, and 19.4 on the three datasets, respectively, surpassing the baseline detector, which uses only box annotations, by mAP gains of 3.8, 4.8, and 5.0. Furthermore, ORF-Netv2 consistently outperforms other competitive label-efficient methods over various scenarios, showing a promising framework for label-efficient fracture detection. The code is available at: https://github.com/zhizhongchai/ORF-Net.
13. Zhang D, Guo G, Zeng W, Li L, Han J. Generalized Weakly Supervised Object Localization. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:5395-5406. [PMID: 36129872] [DOI: 10.1109/tnnls.2022.3204337]
Abstract
With the goal of learning to localize specific object semantics using low-cost image-level annotation, weakly supervised object localization (WSOL) has received increasing attention in recent years. Although the existing literature has studied a number of major issues in this field, one important yet challenging scenario, in which the test object semantics may have appeared in the training phase (seen categories) or never been observed before (unseen categories), remains unexplored. We define this scenario as generalized WSOL (GWSOL) and make a pioneering effort to study it in this article. By leveraging attribute vectors to associate seen and unseen categories, we incorporate threefold modeling components, i.e., class-sensitive, semantic-agnostic, and content-aware modeling, into a unified end-to-end learning framework. This design enables our model to recognize and localize unconstrained object semantics, learn compact and discriminative features that can represent the potential unseen categories, and customize content-aware attribute weights to avoid localizing on misleading attribute elements. To advance this research direction, we contribute bounding-box manual annotations for the widely used AwA2 dataset and benchmark GWSOL methods. Comprehensive experiments demonstrate the effectiveness of our proposed learning framework and of each modeling component.
14. Li H, Cao J, You K, Zhang Y, Ye J. Artificial intelligence-assisted management of retinal detachment from ultra-widefield fundus images based on weakly-supervised approach. Front Med (Lausanne) 2024; 11:1326004. [PMID: 38379556] [PMCID: PMC10876892] [DOI: 10.3389/fmed.2024.1326004]
Abstract
Background: Retinal detachment (RD) is a common sight-threatening condition in the emergency department. Early postural intervention based on detachment regions can improve visual prognosis.
Methods: We developed a weakly supervised model with 24,208 ultra-widefield fundus images to localize and coarsely outline the anatomical RD regions; customized preoperative postural guidance was generated for patients accordingly. The localization performance was then compared with a baseline model and an ophthalmologist according to the reference standard established by retina experts.
Results: In the 48-partition lesion detection, our proposed model reached 86.42% (95% confidence interval (CI): 85.81-87.01%) precision and 83.27% (95% CI: 82.62-83.90%) recall, with an average precision (AP) of 0.9132. In contrast, the baseline model achieved 92.67% (95% CI: 92.11-93.19%) precision but a limited recall of 68.07% (95% CI: 67.25-68.88%). Our holistic lesion localization performance was comparable to the ophthalmologist's 89.16% (95% CI: 88.75-89.55%) precision and 83.38% (95% CI: 82.91-83.84%) recall. For four-zone anatomical localization, compared with the ground truth, the unweighted Cohen's κ coefficients were 0.710 (95% CI: 0.659-0.761) and 0.753 (95% CI: 0.702-0.804) for the weakly-supervised model and the general ophthalmologist, respectively.
Conclusion: The proposed weakly-supervised deep learning model performed comparably to the general ophthalmologist in localizing and outlining the RD regions. It could greatly facilitate the management of RD patients, especially medical referral and patient education.
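For reference, the unweighted Cohen's κ reported above can be computed as follows; the toy four-zone labels are illustrative only.

```python
import numpy as np

def cohens_kappa(y1, y2, n_classes=4):
    """Unweighted Cohen's kappa between two raters' zone assignments."""
    cm = np.zeros((n_classes, n_classes))
    for a, b in zip(y1, y2):
        cm[a, b] += 1
    n = cm.sum()
    po = np.trace(cm) / n                        # observed agreement
    pe = (cm.sum(0) * cm.sum(1)).sum() / n ** 2  # chance agreement
    return (po - pe) / (1 - pe)

print(cohens_kappa([0, 1, 2, 3, 1, 0], [0, 1, 2, 2, 1, 0]))  # ~0.769
```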
Affiliation(s)
- Huimin Li
- Eye Center, The Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Jing Cao
- Eye Center, The Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Kun You
- Zhejiang Feitu Medical Imaging Co., Ltd, Hangzhou, Zhejiang, China
- Yuehua Zhang
- Zhejiang Feitu Medical Imaging Co., Ltd, Hangzhou, Zhejiang, China
- Juan Ye
- Eye Center, The Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
15. Bakouri M, Alyami N, Alassaf A, Waly M, Alqahtani T, AlMohimeed I, Alqahtani A, Samsuzzaman M, Ismail HF, Alharbi Y. Sound-Based Localization Using LSTM Networks for Visually Impaired Navigation. Sensors (Basel) 2023; 23:4033. [PMID: 37112374] [PMCID: PMC10145617] [DOI: 10.3390/s23084033]
Abstract
In this work, we developed a prototype that adopts sound-based systems for the localization of visually impaired individuals. The system was implemented on a wireless ultrasound network, which helps blind and visually impaired users navigate and maneuver autonomously. Ultrasonic-based systems use high-frequency sound waves to detect obstacles in the environment and provide location information to the user. Voice recognition and long short-term memory (LSTM) techniques were used to design the algorithms, and Dijkstra's algorithm was used to determine the shortest distance between two places. Assistive hardware tools, including an ultrasonic sensor network, a global positioning system (GPS), and a digital compass, were utilized to implement this method. For the indoor evaluation, three nodes were placed on the doors of different rooms inside the house: the kitchen, the bathroom, and the bedroom. The coordinates (interactive latitude and longitude points) of four outdoor areas (mosque, laundry, supermarket, and home) were identified and stored in a microcomputer's memory to evaluate the outdoor setting. The results show that the root mean square error for the indoor setting after 45 trials is about 0.192. In addition, Dijkstra's algorithm determined the shortest distance between two places with an accuracy of 97%.
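To make the routing step concrete, here is a minimal Dijkstra implementation over a weighted adjacency dictionary of the four stored outdoor locations; the distances (in metres) are made up for illustration, and the sketch assumes the goal is reachable.

```python
import heapq

def dijkstra(graph, start, goal):
    """Shortest path between two stored places via Dijkstra's algorithm."""
    dist, prev, heap = {start: 0.0}, {}, [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [goal], goal
    while node != start:                  # assumes goal is reachable
        node = prev[node]
        path.append(node)
    return dist[goal], path[::-1]

graph = {
    "home": {"mosque": 250, "laundry": 300},
    "mosque": {"home": 250, "supermarket": 300},
    "laundry": {"home": 300, "supermarket": 150},
    "supermarket": {"mosque": 300, "laundry": 150},
}
print(dijkstra(graph, "home", "supermarket"))
# -> (450.0, ['home', 'laundry', 'supermarket'])
```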
Affiliation(s)
- Mohsen Bakouri
- Department of Medical Equipment Technology, College of Applied Medical Science, Majmaah University, Al-Majmaah 11952, Saudi Arabia
- Department of Physics, College of Arts, Fezzan University, Traghen 71340, Libya
- Naif Alyami
- Department of Medical Equipment Technology, College of Applied Medical Science, Majmaah University, Al-Majmaah 11952, Saudi Arabia
- Ahmad Alassaf
- Department of Medical Equipment Technology, College of Applied Medical Science, Majmaah University, Al-Majmaah 11952, Saudi Arabia
- Mohamed Waly
- Department of Medical Equipment Technology, College of Applied Medical Science, Majmaah University, Al-Majmaah 11952, Saudi Arabia
- Tariq Alqahtani
- Department of Medical Equipment Technology, College of Applied Medical Science, Majmaah University, Al-Majmaah 11952, Saudi Arabia
- Ibrahim AlMohimeed
- Department of Medical Equipment Technology, College of Applied Medical Science, Majmaah University, Al-Majmaah 11952, Saudi Arabia
- Abdulrahman Alqahtani
- Department of Medical Equipment Technology, College of Applied Medical Science, Majmaah University, Al-Majmaah 11952, Saudi Arabia
- Department of Biomedical Technology, College of Applied Medical Sciences in Al-Kharj, Prince Sattam Bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia
- Md Samsuzzaman
- Department of Computer and Communication Engineering, Faculty of Computer Science and Engineering, Patuakhali Science and Technology University, Patuakhali 6800, Bangladesh
- Husham Farouk Ismail
- Department of Biomedical Equipment Technology, Inaya Medical College, Riyadh 13541, Saudi Arabia
- Yousef Alharbi
- Department of Biomedical Technology, College of Applied Medical Sciences in Al-Kharj, Prince Sattam Bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia
16. Li K, Qian Z, Han Y, Chang EIC, Wei B, Lai M, Liao J, Fan Y, Xu Y. Weakly supervised histopathology image segmentation with self-attention. Med Image Anal 2023; 86:102791. [PMID: 36933385] [DOI: 10.1016/j.media.2023.102791]
Abstract
Accurate pixel-level segmentation in histopathology images plays a critical role in the digital pathology workflow. The development of weakly supervised methods for histopathology image segmentation liberates pathologists from time-consuming and labor-intensive work, opening up possibilities for further automated quantitative analysis of whole-slide histopathology images. As an effective subgroup of weakly supervised methods, multiple instance learning (MIL) has achieved great success in histopathology images. In this paper, we treat pixels as instances, so that the histopathology image segmentation task is transformed into an instance prediction task in MIL. However, the lack of relations between instances in MIL limits further improvement of segmentation performance. Therefore, we propose a novel weakly supervised method called SA-MIL for pixel-level segmentation in histopathology images. SA-MIL introduces a self-attention mechanism into the MIL framework, which captures global correlation among all instances. In addition, we use deep supervision to make the best use of information from the limited annotations in the weakly supervised setting. Our approach makes up for the independence of instances in MIL by aggregating global contextual information. We demonstrate state-of-the-art results compared to other weakly supervised methods on two histopathology image datasets; the high performance on both tissue and cell datasets shows that our approach generalizes well and holds potential for various applications in medical imaging.
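The global correlation among instances comes from standard scaled dot-product self-attention. The sketch below shows the weight-free, single-head core of that operation over per-pixel instance embeddings; real implementations (including, presumably, SA-MIL's) add learned query/key/value projections, and the dimensions here are illustrative.

```python
import torch

def self_attention(x):
    """Scaled dot-product self-attention over instance embeddings x: [N, D].
    Each instance (pixel) aggregates information from all others, which is
    exactly the inter-instance relation that plain MIL lacks."""
    d = x.shape[-1]
    attn = torch.softmax(x @ x.T / d ** 0.5, dim=-1)  # [N, N] pairwise weights
    return attn @ x

x = torch.randn(1024, 64)       # 1024 pixel-instances, 64-D embeddings
print(self_attention(x).shape)  # torch.Size([1024, 64])
```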
Affiliation(s)
- Kailu Li
- School of Biological Science and Medical Engineering, State Key Laboratory of Software Development Environment, Key Laboratory of Biomechanics, Mechanobiology of Ministry of Education and Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, Beijing 100191, China
- Ziniu Qian
- School of Biological Science and Medical Engineering, State Key Laboratory of Software Development Environment, Key Laboratory of Biomechanics, Mechanobiology of Ministry of Education and Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, Beijing 100191, China
- Yingnan Han
- School of Biological Science and Medical Engineering, State Key Laboratory of Software Development Environment, Key Laboratory of Biomechanics, Mechanobiology of Ministry of Education and Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, Beijing 100191, China
- Maode Lai
- Department of Pathology, School of Medicine, Zhejiang University, Hangzhou 310027, China
- Jing Liao
- Department of Computer Science, City University of Hong Kong, 999077, Hong Kong SAR, China
- Yubo Fan
- School of Biological Science and Medical Engineering, State Key Laboratory of Software Development Environment, Key Laboratory of Biomechanics, Mechanobiology of Ministry of Education and Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, Beijing 100191, China
- Yan Xu
- School of Biological Science and Medical Engineering, State Key Laboratory of Software Development Environment, Key Laboratory of Biomechanics, Mechanobiology of Ministry of Education and Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, Beijing 100191, China; Microsoft Research, Beijing 100080, China
17. Zhao T, Han J, Yang L, Zhang D. Equivalent Classification Mapping for Weakly Supervised Temporal Action Localization. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:3019-3031. [PMID: 35635810] [DOI: 10.1109/tpami.2022.3178957]
Abstract
Weakly supervised temporal action localization is a newly emerging yet widely studied topic. Existing methods follow one of two localization-by-classification pipelines: the pre-classification pipeline and the post-classification pipeline. The pre-classification pipeline first performs classification on each video snippet and then aggregates the snippet-level classification scores to obtain the video-level classification score. In contrast, the post-classification pipeline aggregates the snippet-level features first and then predicts the video-level classification score from the aggregated feature. Although the classifiers in these two pipelines are used in different ways, the role they play is exactly the same: to classify the given features and identify the corresponding action categories. Hence, an ideal classifier can make both pipelines work. This inspires us to learn the two pipelines simultaneously in a unified framework to obtain an effective classifier. Specifically, in the proposed learning framework, we implement two parallel network streams to model the two localization-by-classification pipelines simultaneously and make the two streams share the same classifier, achieving the novel Equivalent Classification Mapping (ECM) mechanism. Moreover, we observe that an ideal classifier should possess two characteristics: 1) the frame-level classification scores obtained from the pre-classification stream and the feature aggregation weights in the post-classification stream should be consistent; and 2) the classification results of the two streams should be identical. Based on these two characteristics, we further introduce a weight-transition module and an equivalent training strategy into the proposed learning framework, which help to thoroughly exploit the equivalence mechanism. Comprehensive experiments are conducted on three benchmarks, and ECM achieves accurate action localization results.
18. Yang W, Chen M, Wu H, Lin Z, Kong D, Xie S, Takamasu K. Deep learning-based weak micro-defect detection on an optical lens surface with micro vision. Optics Express 2023; 31:5593-5608. [PMID: 36823835] [DOI: 10.1364/oe.482389]
Abstract
To overcome the limited efficiency and reliability of current manual quality control in optical lens (OL) production environments, we propose an automatic micro-vision-based inspection system, named MVIS, that captures surface defect images, builds the OL dataset, and performs predictive inference. OL defects are weak owing to their ambiguous morphology and micro size, and at low resolution they are hard to recognize, so existing methods detect them poorly. We therefore propose a deep-learning weak micro-defect detector named ISE-YOLO, which makes the best use of deep layers, utilizes an ISE attention module in the neck, and introduces a novel class loss function to extract richer semantics from the convolutional layers and learn more information. Experimental results on the OL dataset show that ISE-YOLO performs better than YOLOv5, with the mean average precision, recall, and F1 score increasing by 3.62%, 6.12%, and 3.07%, respectively. In addition, compared with YOLOv7, the latest version in the YOLO series, the mean average precision of ISE-YOLO is improved by 2.58%, while the weight size is decreased by more than 30% and the speed is increased by 16%.
19. Kamath V, Renuka A. Deep Learning Based Object Detection for Resource Constrained Devices: Systematic Review, Future Trends and Challenges Ahead. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.02.006]
20. Cardoen B, Wong T, Alan P, Lee S, Matsubara JA, Nabi IR, Hamarneh G. SPECHT: Self-tuning Plausibility based object detection Enables quantification of Conflict in Heterogeneous multi-scale microscopy. PLoS One 2022; 17:e0276726. [PMID: 36580473] [PMCID: PMC9799313] [DOI: 10.1371/journal.pone.0276726]
Abstract
Identification of small objects in fluorescence microscopy is a non-trivial task burdened by parameter-sensitive algorithms, for which there is a clear need for an approach that adapts dynamically to changing imaging conditions. Here, we introduce an adaptive object detection method that, given a microscopy image and an image-level label, uses kurtosis-based matching of the distribution of the image differential to express operator intent in terms of recall or precision. We show how a theoretical upper bound of the statistical distance in feature space enables the application of belief theory to obtain statistical support for each detected object, capturing those aspects of the image that support the label, and to what extent. We validate our method on two datasets: distinguishing sub-diffraction-limit caveolae and scaffold by stimulated emission depletion (STED) super-resolution microscopy; and detecting amyloid-β deposits in confocal microscopy retinal cross-sections of neuropathologically confirmed Alzheimer's disease donor tissue. Our results are consistent with biological ground truth and with previous subcellular object classification results, and add insight into more nuanced class-transition dynamics. We illustrate the novel application of belief theory to object detection in heterogeneous microscopy datasets and the quantification of conflict of evidence in a joint belief function. By applying our method successfully to diffraction-limited confocal imaging of tissue sections and super-resolution microscopy of subcellular structures, we demonstrate multi-scale applicability.
Affiliation(s)
- Ben Cardoen
- Medical Image Analysis Laboratory, School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
- * E-mail: (BC); (IRN); (GH)
- Timothy Wong
- Department of Cellular and Physiological Sciences, Life Sciences Institute, University of British Columbia, Vancouver, British Columbia, Canada
- Parsa Alan
- Department of Cellular and Physiological Sciences, Life Sciences Institute, University of British Columbia, Vancouver, British Columbia, Canada
- Sieun Lee
- Department of Ophthalmology and Visual Sciences, Eye Care Centre, University of British Columbia, Vancouver, British Columbia, Canada
- Mental Health & Clinical Neurosciences, School of Medicine, University of Nottingham, Nottingham, United Kingdom
- Joanne Aiko Matsubara
- Department of Ophthalmology and Visual Sciences, Eye Care Centre, University of British Columbia, Vancouver, British Columbia, Canada
- Ivan Robert Nabi
- Department of Cellular and Physiological Sciences, Life Sciences Institute, University of British Columbia, Vancouver, British Columbia, Canada
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Ghassan Hamarneh
- Medical Image Analysis Laboratory, School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
21. Yang L, Han J, Zhao T, Lin T, Zhang D, Chen J. Background-Click Supervision for Temporal Action Localization. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:9814-9829. [PMID: 34855585] [DOI: 10.1109/tpami.2021.3132058]
Abstract
Weakly supervised temporal action localization aims at learning the instance-level action pattern from the video-level labels, where a significant challenge is action-context confusion. To overcome this challenge, one recent work builds an action-click supervision framework. It requires similar annotation costs but can steadily improve the localization performance when compared to the conventional weakly supervised methods. In this paper, by revealing that the performance bottleneck of the existing approaches mainly comes from the background errors, we find that a stronger action localizer can be trained with labels on the background video frames rather than those on the action frames. To this end, we convert the action-click supervision to the background-click supervision and develop a novel method, called BackTAL. Specifically, BackTAL implements two-fold modeling on the background video frames, i.e., the position modeling and the feature modeling. In position modeling, we not only conduct supervised learning on the annotated video frames but also design a score separation module to enlarge the score differences between the potential action frames and backgrounds. In feature modeling, we propose an affinity module to measure frame-specific similarities among neighboring frames and dynamically attend to informative neighbors when calculating temporal convolution. Extensive experiments on three benchmarks are conducted, which demonstrate the high performance of the established BackTAL and the rationality of the proposed background-click supervision.
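As a hedged sketch of what a score-separation term could look like, the snippet below pushes the mean of the top-k class-activation scores (likely action frames) above the scores at annotated background clicks by a margin; k, the margin, and the hinge form are assumptions for illustration, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def score_separation_loss(cas, bg_idx, k=8, margin=1.0):
    """cas: [T] class-activation sequence for one class; bg_idx: indices of
    background-click frames. Enlarges the gap between potential action
    frames and annotated backgrounds."""
    topk_mean = cas.topk(k).values.mean()   # likely action frames
    bg_mean = cas[bg_idx].mean()            # clicked background frames
    return F.relu(margin - (topk_mean - bg_mean))

cas = torch.randn(100)
print(score_separation_loss(cas, torch.tensor([3, 40, 77])))
```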
22. Zong G, Wei L, Guo S, Wang Y. A cascaded refined RGB-D salient object detection network based on the attention mechanism. Appl Intell 2022. [DOI: 10.1007/s10489-022-04186-9]
23. Weakly Supervised Object Detection with Symmetry Context. Symmetry (Basel) 2022. [DOI: 10.3390/sym14091832]
Abstract
Recently, weakly supervised object detection (WSOD) with image-level annotation has attracted great attention in the field of computer vision. Existing studies often formulate the problem as multiple instance learning, which tends to be trapped by discriminative object parts and fails to localize object boundaries precisely. In this work, we alleviate this problem by exploiting contextual information that can potentially increase object localization accuracy. Specifically, we propose novel context proposal mining strategies and a Symmetry Context Module to leverage the surrounding contextual information of precomputed region proposals. Both naive and Gaussian-based context proposal mining methods are adopted to yield informative context proposals symmetrically surrounding the region proposals. The mined context proposals are then fed into our Symmetry Context Module to encourage the model to select proposals that contain the whole object rather than only its most discriminative parts. Experimental results show that the proposed method achieves a mean Average Precision (mAP) of 52.4% on the PASCAL VOC 2007 dataset, outperforming state-of-the-art methods and demonstrating its effectiveness for weakly supervised object detection.
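The naive mining strategy can be pictured as symmetrically enlarging each region proposal about its centre so that the ring between the two boxes carries the surrounding context. A geometric sketch follows; the enlargement factor is an assumed value, not the paper's setting.

```python
def context_proposal(box, scale=1.8):
    """Enlarge (x1, y1, x2, y2) symmetrically about its centre; the area
    between the original box and the returned one is the context ring."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * scale, (y2 - y1) * scale
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

print(context_proposal((100, 100, 200, 180)))  # (60.0, 68.0, 240.0, 212.0)
```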
24. Li Y, Xue Y, Li L, Zhang X, Qian X. Domain Adaptive Box-Supervised Instance Segmentation Network for Mitosis Detection. IEEE Transactions on Medical Imaging 2022; 41:2469-2485. [PMID: 35389862] [DOI: 10.1109/tmi.2022.3165518]
Abstract
The number of mitotic cells present in histopathological slides is an important predictor of tumor proliferation in the diagnosis of breast cancer. However, current approaches can hardly perform precise pixel-level prediction for mitosis datasets with only weak labels (i.e., labels that provide only the centroid location of mitotic cells), and they take no account of the large domain gap across histopathological slides from different pathology laboratories. In this work, we propose a Domain adaptive Box-supervised Instance segmentation Network (DBIN) to address these issues. In DBIN, we propose a high-performance Box-supervised Instance-Aware (BIA) head built around three redesigned box-supervised mask loss terms. Furthermore, we add a Pseudo-Mask-supervised Semantic (PMS) head to enrich the characteristics extracted from the underlying feature maps. Besides, we align the pixel-level feature distributions between source and target domains with a Cross-Domain Adaptive Module (CDAM), so that a detector learned in one laboratory works well on unlabeled data from another. The proposed method achieves state-of-the-art performance across four mainstream datasets. A series of analyses and experiments show that our proposed BIA and PMS heads accomplish pixel-wise mitosis localization under weak supervision, and that CDAM boosts the generalization ability of our model.
25. Milani F, Pinciroli Vago NO, Fraternali P. Proposals Generation for Weakly Supervised Object Detection in Artwork Images. J Imaging 2022; 8:215. [PMID: 36005458] [PMCID: PMC9410216] [DOI: 10.3390/jimaging8080215]
Abstract
Object detection requires many precise annotations, which are available for natural images but not for many non-natural data sets, such as artwork data sets. One solution is to use Weakly Supervised Object Detection (WSOD) techniques that learn accurate object localization from image-level labels. Studies have demonstrated that state-of-the-art end-to-end architectures may not be suitable for domains in which images or classes differ substantially from those used to pre-train networks. This paper presents a novel two-stage Weakly Supervised Object Detection approach for obtaining accurate bounding boxes on non-natural data sets. The proposed method exploits existing classification knowledge to generate pseudo-ground-truth bounding boxes from Class Activation Maps (CAMs). The automatically generated annotations are used to train a robust Faster R-CNN object detector. Quantitative and qualitative analysis shows that bounding boxes generated from CAMs can compensate for the lack of manually annotated ground truth (GT) and that an object detector trained with such pseudo-GT surpasses end-to-end WSOD state-of-the-art methods on ArtDL 2.0 (≈41.5% mAP) and IconArt (≈17% mAP), two artwork data sets. The proposed solution is a step towards the computer-aided study of non-natural images and opens the way to more advanced tasks, e.g., automatic artwork image captioning for digital archive applications.
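The pseudo-ground-truth step can be reduced to thresholding a CAM and taking the tight box around the surviving region, as in the sketch below; the relative threshold value is an assumption, and real pipelines typically add per-class tuning and multi-blob handling.

```python
import numpy as np

def cam_to_box(cam, rel_thresh=0.4):
    """Threshold a class activation map at a fraction of its maximum and
    return the tight (x1, y1, x2, y2) box around the activated region."""
    ys, xs = np.where(cam >= rel_thresh * cam.max())
    return xs.min(), ys.min(), xs.max(), ys.max()

cam = np.zeros((14, 14))
cam[4:9, 5:11] = 1.0      # toy activation blob
print(cam_to_box(cam))    # (5, 4, 10, 8)
```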
Affiliation(s)
- Federico Milani
- Department of Electronics Information and Bioengineering, Politecnico di Milano, 20133 Milano, Italy
26. Object Localization in Weakly Labeled Remote Sensing Images Based on Deep Convolutional Features. Remote Sensing 2022. [DOI: 10.3390/rs14133230]
Abstract
Object recognition, as one of the most fundamental and challenging problems in high-resolution remote sensing image interpretation, has received increasing attention in recent years. However, most conventional object recognition pipelines aim to recognize instances with bounding boxes in a supervised learning strategy, which requires intensive manual labor for instance annotation. In this paper, we propose a weakly supervised learning method to alleviate this problem. The core idea of our method is to recognize multiple objects in an image using only image-level semantic labels and to indicate the recognized objects with location points instead of box extents. Specifically, a deep convolutional neural network is first trained to perform semantic scene classification, whose result is employed for the categorical determination of objects in an image. Then, by back-propagating the categorical feature from the fully connected layer to the deep convolutional layer, the categorical and spatial information of an image are combined to obtain an object discriminative localization map, which can effectively indicate the salient regions of objects. Next, a dynamic updating method of local response extremum is proposed to further determine the locations of objects in an image. Finally, extensive experiments are conducted to localize aircraft and oiltanks in remote sensing images based on different convolutional neural networks. Experimental results show that the proposed method outperforms state-of-the-art methods, achieving precision, recall, and F1-scores of 94.50%, 88.79%, and 91.56% for aircraft localization and 89.12%, 83.04%, and 85.97% for oiltank localization, respectively. We hope that our work can serve as a basic reference for remote sensing object localization via a weakly supervised strategy and provide new opportunities for further research.
27. Li H, Li Y, Jin Y, Wang T. Object representation enhancement for self-supervised colocalization. Int J Intell Syst 2022. [DOI: 10.1002/int.22938]
Affiliation(s)
- Huifang Li
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
- Yidong Li
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
- Yi Jin
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
- Tao Wang
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
28. Xu X, Sanford T, Turkbey B, Xu S, Wood BJ, Yan P. Shadow-Consistent Semi-Supervised Learning for Prostate Ultrasound Segmentation. IEEE Transactions on Medical Imaging 2022; 41:1331-1345. [PMID: 34971530] [PMCID: PMC9709821] [DOI: 10.1109/tmi.2021.3139999]
Abstract
Prostate segmentation in transrectal ultrasound (TRUS) images is an essential prerequisite for many prostate-related clinical procedures, which, however, is also a long-standing problem due to the challenges caused by low image quality and shadow artifacts. In this paper, we propose a Shadow-consistent Semi-supervised Learning (SCO-SSL) method with two novel mechanisms, namely shadow augmentation (Shadow-AUG) and shadow dropout (Shadow-DROP), to tackle this challenging problem. Specifically, Shadow-AUG enriches the training samples by adding simulated shadow artifacts to the images to make the network robust to shadow patterns. Shadow-DROP forces the segmentation network to infer the prostate boundary from the neighboring shadow-free pixels. Extensive experiments are conducted on two large clinical datasets (a public dataset containing 1,761 TRUS volumes and an in-house dataset containing 662 TRUS volumes). In the fully-supervised setting, a vanilla U-Net equipped with our Shadow-AUG&Shadow-DROP outperforms the state-of-the-art methods with statistical significance. In the semi-supervised setting, even with only 20% labeled training data, our SCO-SSL method still achieves highly competitive performance, suggesting great clinical value in relieving the labor of data annotation. Source code is released at https://github.com/DIAL-RPI/SCO-SSL.
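To illustrate the flavour of Shadow-AUG, the sketch below attenuates a vertical band of a 2-D ultrasound image with a smooth Gaussian profile to imitate an acoustic shadow; the band position, width, strength, and profile shape are illustrative assumptions, not the released implementation.

```python
import numpy as np

def shadow_aug(img, x0, width, strength=0.8):
    """Multiply each column of an (H, W) image by a Gaussian attenuation
    profile centred at column x0, simulating a shadow artifact."""
    cols = np.arange(img.shape[1])
    profile = 1.0 - strength * np.exp(-0.5 * ((cols - x0) / width) ** 2)
    return img * profile[None, :]

img = np.random.rand(256, 256).astype(np.float32)
aug = shadow_aug(img, x0=128, width=20)
print(aug.shape, float(aug[:, 128].mean()) < float(img[:, 128].mean()))
```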
Collapse
|
29
|
Adke S, Li C, Rasheed KM, Maier FW. Supervised and Weakly Supervised Deep Learning for Segmentation and Counting of Cotton Bolls Using Proximal Imagery. SENSORS 2022; 22:s22103688. [PMID: 35632096 PMCID: PMC9147286 DOI: 10.3390/s22103688] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/08/2022] [Revised: 04/28/2022] [Accepted: 05/05/2022] [Indexed: 11/16/2022]
Abstract
The total boll count of a plant is one of the most important phenotypic traits for cotton breeding and an important factor for growers in estimating the final yield. With recent advances in deep learning, many supervised learning approaches have been implemented to measure phenotypic traits from images for various crops, but few studies have counted cotton bolls from field images. Supervised learning models require a vast number of annotated images for training, which has become a bottleneck for machine learning model development. The goal of this study is to develop both fully supervised and weakly supervised deep learning models to segment and count cotton bolls from proximal imagery. A total of 290 RGB images of cotton plants from both potted (indoor and outdoor) and in-field settings were taken with consumer-grade cameras, and the raw images were divided into 4350 image tiles for model training and testing. Two supervised models (Mask R-CNN and S-Count) and two weakly supervised approaches (WS-Count and CountSeg) were compared in terms of boll count accuracy and annotation cost. The results revealed that the weakly supervised counting approaches performed well, with RMSE values of 1.826 and 1.284 for WS-Count and CountSeg, respectively, whereas the fully supervised models achieved RMSE values of 1.181 and 1.175 for S-Count and Mask R-CNN, respectively, when the number of bolls in an image patch was less than 10. In terms of data annotation cost, the weakly supervised approaches were at least 10 times more cost-efficient than the supervised approaches for boll counting. In the future, the deep learning models developed in this study can be extended to other plant organs, such as main stalks, nodes, and primary and secondary branches. Both the supervised and weakly supervised deep learning models for boll counting with low-cost RGB images can be used by cotton breeders, physiologists, and growers alike to improve crop breeding and yield estimation.
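For reference, the count-accuracy metric quoted above is the root-mean-square error between predicted and ground-truth boll counts per image patch; a minimal sketch follows, with made-up counts rather than the study's data.

```python
import numpy as np

def count_rmse(pred: np.ndarray, true: np.ndarray) -> float:
    """Root-mean-square error between predicted and true object counts."""
    return float(np.sqrt(np.mean((pred - true) ** 2)))

# Toy example: three patches, each prediction off by one boll -> RMSE = 1.0
print(count_rmse(np.array([3, 7, 9]), np.array([4, 6, 10])))
```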
Collapse
Affiliation(s)
- Shrinidhi Adke
  - Institute of Artificial Intelligence, University of Georgia, Athens, GA 30602, USA
  - Bio-Sensing and Instrumentation Laboratory, College of Engineering, University of Georgia, Athens, GA 30602, USA
- Changying Li (corresponding author)
  - Institute of Artificial Intelligence, University of Georgia, Athens, GA 30602, USA
  - Bio-Sensing and Instrumentation Laboratory, College of Engineering, University of Georgia, Athens, GA 30602, USA
  - Phenomics and Plant Robotics Center, University of Georgia, Athens, GA 30602, USA
- Khaled M. Rasheed
  - Institute of Artificial Intelligence, University of Georgia, Athens, GA 30602, USA
  - Phenomics and Plant Robotics Center, University of Georgia, Athens, GA 30602, USA
- Frederick W. Maier
  - Institute of Artificial Intelligence, University of Georgia, Athens, GA 30602, USA
Collapse
|
30
|
RSMNet: A Regional Similar Module Network for Weakly Supervised Object Localization. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-10849-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
31
|
Jafari MH, Luong C, Tsang M, Gu AN, Van Woudenberg N, Rohling R, Tsang T, Abolmaesumi P. U-LanD: Uncertainty-Driven Video Landmark Detection. IEEE TRANSACTIONS ON MEDICAL IMAGING 2022; 41:793-804. [PMID: 34705639 DOI: 10.1109/tmi.2021.3123547] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
This paper presents U-LanD, a framework for automatically detecting landmarks on key frames of a video by leveraging the uncertainty of landmark prediction. We tackle a specifically challenging problem in which training labels are noisy and highly sparse. U-LanD builds upon a pivotal observation: a deep Bayesian landmark detector trained solely on key video frames has significantly lower predictive uncertainty on those frames than on other frames in the video. We use this observation as an unsupervised signal to automatically recognize the key frames on which we detect landmarks. As a test-bed for our framework, we use ultrasound imaging videos of the heart, where sparse and noisy clinical labels are available for only a single frame in each video. Using data from 4,493 patients, we demonstrate that U-LanD substantially outperforms the state-of-the-art non-Bayesian counterpart by a noticeable absolute margin of 42% in R2 score, with almost no overhead imposed on the model size.
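A minimal sketch of the core signal described above, assuming a landmark detector with dropout layers that maps a stack of frames (T, C, H, W) to per-frame predictions (T, K): predictive variance across Monte-Carlo dropout passes scores each frame, and the lowest-uncertainty frames are kept as key frames. The number of passes and the selection quantile are illustrative assumptions, not the paper's settings.

```python
import torch

@torch.no_grad()
def frame_uncertainty(model: torch.nn.Module, frames: torch.Tensor,
                      n_passes: int = 10) -> torch.Tensor:
    """frames: (T, C, H, W); returns one predictive-variance score per frame."""
    model.train()  # keep dropout active so each pass samples a different subnet
    preds = torch.stack([model(frames) for _ in range(n_passes)])  # (P, T, K)
    return preds.var(dim=0).mean(dim=-1)  # variance over passes, averaged over outputs

def select_key_frames(uncertainty: torch.Tensor, quantile: float = 0.2) -> torch.Tensor:
    """Keep the indices of the frames in the lowest-uncertainty quantile."""
    return torch.nonzero(uncertainty <= torch.quantile(uncertainty, quantile)).squeeze(-1)
```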
Collapse
|
32
|
Liu P, Zheng G. Handling Imbalanced Data: Uncertainty-guided Virtual Adversarial Training with Batch Nuclear-norm Optimization for Semi-supervised Medical Image Classification. IEEE J Biomed Health Inform 2022; 26:2983-2994. [PMID: 35344500 DOI: 10.1109/jbhi.2022.3162748] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
In many clinical settings, medical image datasets suffer from class imbalance, which biases the predictions of trained models toward the majority classes. Semi-supervised learning (SSL) algorithms trained on such imbalanced datasets are even more problematic, since the pseudo-supervision for unlabeled data is generated from the model's biased predictions. To address these issues, in this work we propose a novel semi-supervised deep learning method, i.e., uncertainty-guided virtual adversarial training (VAT) with batch nuclear-norm (BNN) optimization, for large-scale medical image classification. To effectively exploit useful information from both labeled and unlabeled data, we leverage VAT and BNN optimization to harness the underlying knowledge, which helps to improve the discriminability, diversity, and generalization of the trained models. More concretely, our network is trained by minimizing a combination of four losses: a supervised cross-entropy loss, a BNN loss defined on the output matrix of a labeled data batch (lBNN loss), a negative BNN loss defined on the output matrix of an unlabeled data batch (uBNN loss), and a VAT loss on both labeled and unlabeled data. We additionally use uncertainty estimation to filter out unlabeled samples near the decision boundary when computing the VAT loss. We conduct comprehensive experiments to evaluate the performance of our method on two publicly available datasets and one in-house collected dataset. The experimental results demonstrate that our method achieves better results than state-of-the-art SSL methods.
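A hedged sketch of the BNN terms described above, using torch.linalg.matrix_norm to compute the nuclear norm of the batch softmax matrix. The sign conventions follow the abstract (the labeled-batch nuclear norm enters positively, the unlabeled-batch nuclear norm negatively); the batch-size scaling and the loss weights lam_l, lam_u, lam_vat are illustrative assumptions, and the VAT term itself (with its uncertainty-based filtering) is assumed precomputed.

```python
import torch
import torch.nn.functional as F

def batch_nuclear_norm(logits: torch.Tensor) -> torch.Tensor:
    """Nuclear norm of the (batch, classes) softmax prediction matrix,
    scaled by batch size (the scaling is an assumption, not from the paper)."""
    probs = F.softmax(logits, dim=1)
    return torch.linalg.matrix_norm(probs, ord="nuc") / logits.shape[0]

def combined_loss(labeled_logits, labels, unlabeled_logits, vat_loss,
                  lam_l=1.0, lam_u=1.0, lam_vat=1.0):
    """CE + lBNN - uBNN + VAT, following the abstract's description."""
    ce = F.cross_entropy(labeled_logits, labels)
    l_bnn = batch_nuclear_norm(labeled_logits)     # lBNN loss on the labeled batch
    u_bnn = -batch_nuclear_norm(unlabeled_logits)  # negative uBNN loss on the unlabeled batch
    return ce + lam_l * l_bnn + lam_u * u_bnn + lam_vat * vat_loss
```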
Collapse
|
33
|
The Challenge of Data Annotation in Deep Learning—A Case Study on Whole Plant Corn Silage. SENSORS 2022; 22:s22041596. [PMID: 35214497 PMCID: PMC8879292 DOI: 10.3390/s22041596] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/20/2021] [Revised: 02/14/2022] [Accepted: 02/16/2022] [Indexed: 02/04/2023]
Abstract
Recent advances in computer vision are primarily driven by deep learning, which is known to require large amounts of data, and creating datasets for this purpose is not a trivial task. Larger benchmark datasets often have detailed annotation processes with multiple stages and users in different roles. However, this can be difficult to implement in smaller projects where resources are limited. Therefore, in this work we present our process for creating an image dataset for kernel fragmentation and stover overlengths in Whole Plant Corn Silage, including the guidelines for annotating object instances of the respective classes and statistics of the gathered annotations. Given the challenging image conditions, where objects appear under heavy occlusion and clutter, the datasets appear appropriate for training models. However, we experience annotator inconsistency, which can hamper evaluation. Based on this, we argue for the importance of an evaluation that is independent of the manual annotation, and we evaluate our models with physically based sieving metrics. Additionally, instead of the traditional time-consuming manual annotation approach, we evaluate semi-supervised learning as an alternative, showing competitive results while requiring fewer annotations. Specifically, given a relatively large supervised set of around 1400 images, we can improve the Average Precision by several percentage points. We also show a significantly larger improvement when using an extremely small set of just over 100 images, with over a 3× gain in Average Precision and up to 20 percentage points when estimating quality.
Collapse
|
34
|
Liu Y, Wei YS, Yan H, Li GB, Lin L. Causal Reasoning Meets Visual Representation Learning: A Prospective Study. MACHINE INTELLIGENCE RESEARCH 2022; 19:485-511. [PMCID: PMC9638478 DOI: 10.1007/s11633-022-1362-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/09/2022] [Accepted: 08/01/2022] [Indexed: 09/29/2023]
Abstract
Visual representation learning is ubiquitous in various real-world applications, including visual comprehension, video understanding, multi-modal analysis, human-computer interaction, and urban computing. With the emergence of huge amounts of multimodal heterogeneous spatial/temporal/spatial-temporal data in the big data era, the lack of interpretability, robustness, and out-of-distribution generalization has become a key challenge for existing visual models. Most existing methods fit the original data/variable distributions and ignore the essential causal relations behind multi-modal knowledge, providing no unified guidance or analysis of why modern visual representation learning methods easily collapse into data bias and exhibit limited generalization and cognitive abilities. Inspired by the strong inference ability of human-level agents, recent years have therefore witnessed great effort in developing causal reasoning paradigms that realize robust representation and model learning with good cognitive ability. In this paper, we conduct a comprehensive review of existing causal reasoning methods for visual representation learning, covering fundamental theories, models, and datasets. The limitations of current methods and datasets are also discussed. Moreover, we propose prospective challenges, opportunities, and future research directions for benchmarking causal reasoning algorithms in visual representation learning. This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussion, and highlight the urgency of developing novel causal reasoning methods, publicly available benchmarks, and consensus-building standards for reliable visual representation learning and related real-world applications.
Collapse
Affiliation(s)
- Yang Liu
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Yu-Shen Wei
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Hong Yan
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Guan-Bin Li
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Liang Lin
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
Collapse
|
35
|
Pinciroli Vago NO, Milani F, Fraternali P, da Silva Torres R. Comparing CAM Algorithms for the Identification of Salient Image Features in Iconography Artwork Analysis. J Imaging 2021; 7:106. [PMID: 39080894 PMCID: PMC8321385 DOI: 10.3390/jimaging7070106] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2021] [Revised: 06/16/2021] [Accepted: 06/24/2021] [Indexed: 12/13/2022] Open
Abstract
Iconography studies the visual content of artworks by considering the themes portrayed in them and their representation. Computer vision has been used to identify iconographic subjects in paintings, and convolutional neural networks have enabled the effective classification of characters in Christian art paintings. However, it remains to be demonstrated whether the classification results obtained by CNNs rely on the same iconographic properties that human experts exploit when studying iconography, and whether the architecture of a classifier trained on whole artwork images can be exploited to support the much harder task of object detection. A suitable approach for exposing the classification process of neural models relies on Class Activation Maps (CAMs), which highlight the areas of an image that contribute most to the classification. This work compares state-of-the-art algorithms (CAM, Grad-CAM, Grad-CAM++, and Smooth Grad-CAM++) in terms of their capacity to identify the iconographic attributes that determine the classification of characters in Christian art paintings. Quantitative and qualitative analyses show that Grad-CAM, Grad-CAM++, and Smooth Grad-CAM++ perform similarly, while CAM is less effective. Smooth Grad-CAM++ isolates multiple disconnected image regions and identifies small iconographic symbols well. Grad-CAM produces wider and more contiguous areas that better cover large iconographic symbols. The salient image areas computed by the CAM algorithms were used to estimate object-level bounding boxes, and a quantitative analysis shows that the boxes estimated with Grad-CAM reach 55% average IoU, 61% GT-known localization, and 31% mAP. These results are a step towards the computer-aided study of the positioning and mutual relations of iconographic elements in artworks and open the way to the automatic creation of bounding boxes for training detectors of iconographic symbols in Christian art images.
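To illustrate the bounding-box estimation step, here is a minimal sketch that thresholds a normalized CAM, takes the tightest box around the surviving pixels, and scores it against a ground-truth box with IoU. The 0.2 threshold is an illustrative assumption, not the paper's setting.

```python
import numpy as np

def cam_to_bbox(cam: np.ndarray, thresh: float = 0.2):
    """cam: 2D map normalised to [0, 1]; returns (x1, y1, x2, y2) or None."""
    ys, xs = np.where(cam >= thresh)
    if ys.size == 0:
        return None
    return (xs.min(), ys.min(), xs.max(), ys.max())

def iou(a, b) -> float:
    """Intersection over union for pixel-inclusive (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1 + 1) * max(0, iy2 - iy1 + 1)
    area = lambda r: (r[2] - r[0] + 1) * (r[3] - r[1] + 1)
    return inter / (area(a) + area(b) - inter)
```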
Collapse
Affiliation(s)
- Nicolò Oreste Pinciroli Vago
  - Department of Electronics, Information and Bioengineering, Politecnico di Milano, 20133 Milano, Italy
  - Department of ICT and Engineering, NTNU—Norwegian University of Science and Technology, 6009 Ålesund, Norway
- Federico Milani
  - Department of Electronics, Information and Bioengineering, Politecnico di Milano, 20133 Milano, Italy
- Piero Fraternali
  - Department of Electronics, Information and Bioengineering, Politecnico di Milano, 20133 Milano, Italy
- Ricardo da Silva Torres
  - Department of ICT and Engineering, NTNU—Norwegian University of Science and Technology, 6009 Ålesund, Norway
Collapse
|