1. Borowa A, Rymarczyk D, Żyła M, Kańdula M, Sánchez-Fernández A, Rataj K, Struski Ł, Tabor J, Zieliński B. Decoding phenotypic screening: A comparative analysis of image representations. Comput Struct Biotechnol J 2024;23:1181-1188. PMID: 38510976; PMCID: PMC10951426; DOI: 10.1016/j.csbj.2024.02.022.
Abstract
Biomedical imaging techniques such as high content screening (HCS) are valuable for drug discovery, but high costs limit their use to pharmaceutical companies. To address this issue, the JUMP-CP consortium released a massive open image dataset of chemical and genetic perturbations, providing a valuable resource for deep learning research. In this work, we aim to utilize the JUMP-CP dataset to develop a universal representation model for HCS data, mainly data generated using U2OS cells and the Cell Painting protocol, using supervised and self-supervised learning approaches. We propose an evaluation protocol that assesses model performance on mode-of-action and property prediction tasks using a popular phenotypic screening dataset. Results show that the self-supervised approach that uses data from multiple consortium partners provides representations that are more robust to batch effects whilst simultaneously achieving performance on par with standard approaches. Together with other conclusions, this provides recommendations on the training strategy of a representation model for HCS images.
Affiliation(s)
- Adriana Borowa
- Jagiellonian University, Faculty of Mathematics and Computer Science, Kraków, Poland
- Jagiellonian University, Doctoral School of Exact and Natural Sciences, Kraków, Poland
- Ardigen SA, Kraków, Poland
- Dawid Rymarczyk
- Jagiellonian University, Faculty of Mathematics and Computer Science, Kraków, Poland
- Ardigen SA, Kraków, Poland
- Łukasz Struski
- Jagiellonian University, Faculty of Mathematics and Computer Science, Kraków, Poland
- Jacek Tabor
- Jagiellonian University, Faculty of Mathematics and Computer Science, Kraków, Poland
- Bartosz Zieliński
- Jagiellonian University, Faculty of Mathematics and Computer Science, Kraków, Poland
- Ardigen SA, Kraków, Poland

2. Xie Y, Yang Z, Yang Q, Liu D, Tang S, Yang L, Duan X, Hu C, Lu YJ, Wang J. Identification method of thyroid nodule ultrasonography based on self-supervised learning dual-branch attention learning framework. Health Inf Sci Syst 2024;12:7. PMID: 38261831; PMCID: PMC10794678; DOI: 10.1007/s13755-023-00266-3.
Abstract
Thyroid ultrasound is a widely used diagnostic technique for thyroid nodules in clinical practice. However, due to the characteristics of ultrasonic imaging, such as low image contrast, high noise levels, and heterogeneous features, detecting and identifying nodules remains challenging. In addition, high-quality labeled medical imaging datasets are rare, and thyroid ultrasound images are no exception, posing a significant challenge for machine learning applications in medical image analysis. In this study, we propose a Dual-branch Attention Learning (DBAL) convolutional neural network framework to enhance thyroid nodule detection by capturing contextual information. Leveraging jigsaw puzzles as a pretext task during network training, we improve the network's generalization ability with limited data. Our framework effectively captures intrinsic features in a global-to-local manner. Experiments involved self-supervised pre-training on unlabeled ultrasound images followed by fine-tuning on 1216 clinical ultrasound images from a collaborating hospital. DBAL achieves accurate discrimination of thyroid nodules, with an 88.5% correct diagnosis rate for malignant and benign nodules and a 93.7% area under the ROC curve. This novel approach demonstrates promising potential in clinical applications owing to its accuracy and efficiency.
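As a rough illustration of the jigsaw pretext task mentioned above (not the authors' implementation), the sketch below builds a self-supervision target by shuffling image tiles according to a randomly chosen permutation and asking a network to predict the permutation index. The 3x3 grid and the size of the permutation set are illustrative assumptions.

```python
import itertools
import random
import torch

def make_jigsaw_sample(image, grid=3, permutations=None):
    """Split a (C, H, W) tensor into grid x grid tiles, shuffle them with a
    randomly chosen permutation, and return (shuffled_tiles, permutation_index).
    The permutation index serves as the pretext-task label."""
    if permutations is None:
        # Small illustrative permutation set; jigsaw methods typically use a
        # subset of all (grid*grid)! permutations chosen for large Hamming distance.
        permutations = list(itertools.islice(itertools.permutations(range(grid * grid)), 100))
    c, h, w = image.shape
    th, tw = h // grid, w // grid
    tiles = [image[:, i * th:(i + 1) * th, j * tw:(j + 1) * tw]
             for i in range(grid) for j in range(grid)]
    label = random.randrange(len(permutations))
    shuffled = torch.stack([tiles[k] for k in permutations[label]])  # (grid*grid, C, th, tw)
    return shuffled, label

# Usage: feed each tile through a shared encoder, concatenate the tile
# embeddings, and train a classifier to predict `label` with cross-entropy.
x = torch.rand(1, 96, 96)
tiles, perm_idx = make_jigsaw_sample(x)
print(tiles.shape, perm_idx)
```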
Affiliation(s)
- Yifei Xie
- Guangzhou Panyu Central Hospital, Guangzhou, 510006 Guangdong People’s Republic of China
- Guangdong University of Technology, Guangzhou, 510006 Guangdong People’s Republic of China
- Zhengfei Yang
- Sun Yat-sen Memorial Hospital of Sun Yat-sen University, Guangzhou, 510006 Guangdong People’s Republic of China
- Qiyu Yang
- Guangdong University of Technology, Guangzhou, 510006 Guangdong People’s Republic of China
- Dongning Liu
- Guangdong University of Technology, Guangzhou, 510006 Guangdong People’s Republic of China
- Shuzhuang Tang
- Guangdong University of Technology, Guangzhou, 510006 Guangdong People’s Republic of China
- Lin Yang
- Guangzhou Panyu Central Hospital, Guangzhou, 510006 Guangdong People’s Republic of China
- Xuan Duan
- Guangdong University of Technology, Guangzhou, 510006 Guangdong People’s Republic of China
- Changming Hu
- Guangdong Medical Device Quality Supervision and Inspection Institute, Guangzhou, 510006 Guangdong People’s Republic of China
- Yu-Jing Lu
- Guangdong University of Technology, Guangzhou, 510006 Guangdong People’s Republic of China
- Smart Medical Innovation Technology Center, Guangdong University of Technology, Guangzhou, 510006 Guangdong People’s Republic of China
- Jiaxun Wang
- Guangzhou Panyu Central Hospital, Guangzhou, 510006 Guangdong People’s Republic of China

3. Alasmawi H, Bricker L, Yaqub M. FUSC: Fetal Ultrasound Semantic Clustering of Second-Trimester Scans Using Deep Self-Supervised Learning. Ultrasound Med Biol 2024;50:703-711. PMID: 38350787; DOI: 10.1016/j.ultrasmedbio.2024.01.010.
Abstract
OBJECTIVE The aim of this study was to address the challenges posed by the manual labeling of fetal ultrasound images by introducing an unsupervised approach, the fetal ultrasound semantic clustering (FUSC) method. The primary objective was to automatically cluster a large volume of ultrasound images into various fetal views, reducing or eliminating the need for labor-intensive manual labeling. METHODS The FUSC method was developed using a substantial data set comprising 88,063 images. The methodology involves an unsupervised clustering approach to categorize ultrasound images into diverse fetal views. The method's effectiveness was further evaluated on an additional, unseen data set consisting of 8187 images. The evaluation included assessment of the clustering purity, and the entire process is detailed to provide insights into the method's performance. RESULTS The FUSC method exhibited notable success, achieving >92% clustering purity on the evaluation data set of 8187 images. The results signify the feasibility of automatically clustering fetal ultrasound images without relying on manual labeling. The study showcases the potential of this approach in handling a large volume of ultrasound scans encountered in clinical practice, with implications for improving efficiency and accuracy in fetal ultrasound imaging. CONCLUSION The findings of this investigation suggest that the FUSC method holds significant promise for the field of fetal ultrasound imaging. By automating the clustering of ultrasound images, this approach has the potential to reduce the manual labeling burden, making the process more efficient. The results pave the way for advanced automated labeling solutions, contributing to the enhancement of clinical practices in fetal ultrasound imaging. Our code is available at https://github.com/BioMedIA-MBZUAI/FUSC.
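For reference, clustering purity — the evaluation metric reported above — is the fraction of images whose cluster's majority label matches their own label. A minimal sketch of the standard definition (not the authors' code):

```python
import numpy as np

def cluster_purity(cluster_ids, true_labels):
    """Purity = (1/N) * sum over clusters of the size of each cluster's majority class."""
    cluster_ids = np.asarray(cluster_ids)
    true_labels = np.asarray(true_labels)
    total = 0
    for c in np.unique(cluster_ids):
        members = true_labels[cluster_ids == c]
        _, counts = np.unique(members, return_counts=True)
        total += counts.max()
    return total / len(true_labels)

# Example: 3 clusters over 6 labeled fetal views -> purity of the assignment.
print(cluster_purity([0, 0, 1, 1, 2, 2],
                     ["head", "head", "abdomen", "femur", "femur", "femur"]))
```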
Affiliation(s)
- Hussain Alasmawi
- Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates.
- Leanne Bricker
- Abu Dhabi Health Services Company (SEHA), Abu Dhabi, United Arab Emirates
- Mohammad Yaqub
- Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates

4. Hojjati H, Ho TKK, Armanfard N. Self-supervised anomaly detection in computer vision and beyond: A survey and outlook. Neural Netw 2024;172:106106. PMID: 38232432; DOI: 10.1016/j.neunet.2024.106106.
Abstract
Anomaly detection (AD) plays a crucial role in various domains, including cybersecurity, finance, and healthcare, by identifying patterns or events that deviate from normal behavior. In recent years, significant progress has been made in this field due to the remarkable growth of deep learning models. Notably, the advent of self-supervised learning has sparked the development of novel AD algorithms that outperform the existing state-of-the-art approaches by a considerable margin. This paper aims to provide a comprehensive review of the current methodologies in self-supervised anomaly detection. We present technical details of the standard methods and discuss their strengths and drawbacks. We also compare the performance of these models against each other and other state-of-the-art anomaly detection models. Finally, the paper concludes with a discussion of future directions for self-supervised anomaly detection, including the development of more effective and efficient algorithms and the integration of these techniques with other related fields, such as multi-modal learning.
Affiliation(s)
- Hadi Hojjati
- Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada; Mila - Quebec AI Institute, Montreal, QC, Canada.
- Thi Kieu Khanh Ho
- Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada; Mila - Quebec AI Institute, Montreal, QC, Canada
- Narges Armanfard
- Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada; Mila - Quebec AI Institute, Montreal, QC, Canada

5. Godson L, Alemi N, Nsengimana J, Cook GP, Clarke EL, Treanor D, Bishop DT, Newton-Bishop J, Gooya A, Magee D. Immune subtyping of melanoma whole slide images using multiple instance learning. Med Image Anal 2024;93:103097. PMID: 38325154; DOI: 10.1016/j.media.2024.103097.
Abstract
Determining early-stage prognostic markers and stratifying patients for effective treatment are two key challenges for improving outcomes for melanoma patients. Previous studies have used tumour transcriptome data to stratify patients into immune subgroups, which were associated with differential melanoma specific survival and potential predictive biomarkers. However, acquiring transcriptome data is a time-consuming and costly process. Moreover, it is not routinely used in the current clinical workflow. Here, we attempt to overcome this by developing deep learning models to classify gigapixel haematoxylin and eosin (H&E) stained pathology slides, which are well established in clinical workflows, into these immune subgroups. We systematically assess six different multiple instance learning (MIL) frameworks, using five different image resolutions and three different feature extraction methods. We show that pathology-specific self-supervised models using 10x resolution patches generate superior representations for the classification of immune subtypes. In addition, in a primary melanoma dataset, we achieve a mean area under the receiver operating characteristic curve (AUC) of 0.80 for classifying histopathology images into 'high' or 'low immune' subgroups and a mean AUC of 0.82 in an independent TCGA melanoma dataset. Furthermore, we show that these models are able to stratify patients into 'high' and 'low immune' subgroups with significantly different melanoma specific survival outcomes (log rank test, P< 0.005). We anticipate that MIL methods will allow us to find new biomarkers of high importance, act as a tool for clinicians to infer the immune landscape of tumours and stratify patients, without needing to carry out additional expensive genetic tests.
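To make the MIL setting concrete, the sketch below shows one common way to aggregate patch embeddings from a whole slide image into a single slide-level prediction with attention pooling (in the spirit of attention-based MIL); the embedding size and layer widths are illustrative assumptions, not the configuration evaluated in the paper.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Slide-level classifier over a bag of patch embeddings of shape (N, D)."""
    def __init__(self, dim=512, hidden=128, n_classes=2):
        super().__init__()
        self.attention = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                       nn.Linear(hidden, 1))
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, patch_embeddings):  # (N, D) from a frozen feature extractor
        weights = torch.softmax(self.attention(patch_embeddings), dim=0)  # (N, 1)
        slide_embedding = (weights * patch_embeddings).sum(dim=0)         # (D,)
        return self.classifier(slide_embedding), weights

model = AttentionMIL()
logits, attn = model(torch.randn(1000, 512))  # e.g., 1000 patches from one slide
print(logits.shape, attn.shape)
```

The attention weights also indicate which patches drove the slide-level call, which is one way such models can expose the immune landscape to a clinician.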
Affiliation(s)
- Lucy Godson
- School of Computing, University of Leeds, Woodhouse, Leeds, LS2 9JT, United Kingdom.
- Navid Alemi
- School of Computing, University of Leeds, Woodhouse, Leeds, LS2 9JT, United Kingdom
- Jérémie Nsengimana
- Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, NE1 7RU, United Kingdom
- Graham P Cook
- Leeds Institute of Medical Research, University of Leeds School of Medicine, St. James's University Hospital, Leeds, United Kingdom
- Emily L Clarke
- Department of Histopathology, Leeds Teaching Hospitals NHS Trust, Leeds, United Kingdom; Division of Pathology and Data Analytics, Leeds Institute of Cancer and Pathology, University of Leeds, Beckett Street, Leeds, LS9 7TF, United Kingdom
- Darren Treanor
- Department of Histopathology, Leeds Teaching Hospitals NHS Trust, Leeds, United Kingdom; Division of Pathology and Data Analytics, Leeds Institute of Cancer and Pathology, University of Leeds, Beckett Street, Leeds, LS9 7TF, United Kingdom; Department of Clinical Pathology and Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden; Centre for Medical Image Science and Visualization (CMIV), Linköping University, Linköping, Sweden
- D Timothy Bishop
- Leeds Institute of Medical Research, University of Leeds School of Medicine, St. James's University Hospital, Leeds, United Kingdom
- Julia Newton-Bishop
- Division of Pathology and Data Analytics, Leeds Institute of Cancer and Pathology, University of Leeds, Beckett Street, Leeds, LS9 7TF, United Kingdom
- Ali Gooya
- School of Computing, University of Glasgow, Glasgow, G12 8QQ, United Kingdom
- Derek Magee
- School of Computing, University of Leeds, Woodhouse, Leeds, LS2 9JT, United Kingdom

6. Yang K, Liu Y, Zhao Z, Ding P, Zhao W. Local structure-aware graph contrastive representation learning. Neural Netw 2024;172:106083. PMID: 38182463; DOI: 10.1016/j.neunet.2023.12.037.
Abstract
The traditional Graph Neural Network (GNN), as a graph representation learning method, is constrained by label information. However, Graph Contrastive Learning (GCL) methods, which tackle the label problem effectively, mainly focus on the feature information of the global graph or on small subgraph structures (e.g., the first-order neighborhood). In this paper, we propose a Local Structure-aware Graph Contrastive representation Learning method (LS-GCL) to model the structural information of nodes from multiple views. Specifically, we construct semantic subgraphs that are not limited to the first-order neighbors. For the local view, the semantic subgraph of each target node is input into a shared GNN encoder to obtain the target node embeddings at the subgraph level. Then, we use a pooling function to generate the subgraph-level graph embeddings. For the global view, considering that the original graph preserves indispensable semantic information about nodes, we leverage the shared GNN encoder to learn the target node embeddings at the global graph level. The proposed LS-GCL model is optimized to maximize the common information among similar instances from three different perspectives through a multi-level contrastive loss function. Experimental results on six datasets illustrate that our method outperforms state-of-the-art graph representation learning approaches on both node classification and link prediction tasks.
Affiliation(s)
- Kai Yang
- College of Information Engineering, Yangzhou University, Yangzhou, 225127, China.
- Yuan Liu
- College of Information Engineering, Yangzhou University, Yangzhou, 225127, China
- Zijuan Zhao
- Business School, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Peijin Ding
- College of Information Engineering, Yangzhou University, Yangzhou, 225127, China
- Wenqian Zhao
- College of Information Engineering, Yangzhou University, Yangzhou, 225127, China

7. Tian Q, Zhao M. Generation, division and training: A promising method for source-free unsupervised domain adaptation. Neural Netw 2024;172:106142. PMID: 38281364; DOI: 10.1016/j.neunet.2024.106142.
Abstract
Conventional unsupervised domain adaptation (UDA) methods often presuppose the existence of labeled source domain samples while adapting the source model to the target domain. Nevertheless, this premise is not always tenable in the context of source-free UDA (SFUDA), owing to data privacy considerations. Some existing methods address this challenging SFUDA problem by self-supervised learning, but inaccurate pseudo-labels are unavoidable in these methods and degrade the performance of the target model. Therefore, we propose a promising SFUDA method, namely Generation, Division and Training (GDT), which aims to promote the reliability of pseudo-labels for self-supervised learning and to encourage similar features to have closer predictions than dissimilar ones through contrastive learning. Specifically, in our GDT method we first refine pseudo-labels for target samples with deep clustering and then split them into reliable samples and unreliable samples. After that, we adopt self-supervised learning and information maximization for training on the reliable samples. For the unreliable samples, we conduct contrastive learning from the perspective of similarity and disparity to attract similar samples and repulse dissimilar ones, which pulls similar features closer and pushes dissimilar features apart, leading to efficient feature clustering. Thorough experimentation on three benchmark datasets substantiates the excellence of our proposed approach.
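A rough sketch of the kind of reliable/unreliable split described above: pseudo-labels are assigned by nearest cluster centroid in feature space, and samples far from their centroid are treated as unreliable. The cosine-distance criterion and the threshold value are illustrative assumptions, not the paper's exact rule.

```python
import torch
import torch.nn.functional as F

def split_by_pseudo_label_reliability(features, centroids, threshold=0.3):
    """features: (N, D) target-domain features; centroids: (K, D) class centroids
    obtained e.g. by deep clustering. Returns pseudo-labels and a reliability mask."""
    f = F.normalize(features, dim=1)
    c = F.normalize(centroids, dim=1)
    sim = f @ c.t()                       # cosine similarity to each centroid, (N, K)
    pseudo_labels = sim.argmax(dim=1)
    dist = 1.0 - sim.max(dim=1).values    # distance to the assigned centroid
    reliable = dist < threshold           # small distance -> reliable sample
    return pseudo_labels, reliable

feats = torch.randn(8, 256)
cents = torch.randn(5, 256)
labels, mask = split_by_pseudo_label_reliability(feats, cents)
print(labels, mask)
```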
Affiliation(s)
- Qing Tian
- School of Software, Nanjing University of Information Science and Technology, Nanjing, 210044, China; Wuxi Institute of Technology, Nanjing University of Information Science and Technology, Wuxi, 214000, China; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China.
- Mengna Zhao
- School of Software, Nanjing University of Information Science and Technology, Nanjing, 210044, China

8. Chavan S, Choubey N. Self-supervised category selective attention classifier network for diabetic macular edema classification. Acta Diabetol 2024. PMID: 38521818; DOI: 10.1007/s00592-024-02257-6.
Abstract
AIMS This study aims to develop an advanced model for the classification of Diabetic Macular Edema (DME) using deep learning techniques. Specifically, the objective is to introduce a novel architecture, SSCSAC-Net, that leverages self-supervised learning and category-selective attention mechanisms to improve the precision of DME classification. METHODS The proposed SSCSAC-Net integrates self-supervised learning to effectively utilize unlabeled data for learning robust features related to DME. Additionally, it incorporates a category-specific attention mechanism and a domain-specific layer into the ResNet-152 base architecture. The model is trained using an ensemble of unsupervised and supervised learning techniques. Benchmark datasets are utilized for testing the model's performance, ensuring its robustness and generalizability across different data distributions. RESULTS Evaluation of the SSCSAC-Net on multiple datasets demonstrates its superior performance compared to existing techniques. The model achieves high accuracy, precision, and recall rates, with an accuracy of 98.7%, precision of 98.6%, and recall of 98.8%. Furthermore, the incorporation of self-supervised learning reduces the dependency on extensive labeled data, making the solution more scalable and cost-effective. CONCLUSIONS The proposed SSCSAC-Net represents a significant advancement in automated DME classification. By effectively using self-supervised learning and attention mechanisms, the model offers improved accuracy in identifying DME-related features within retinal images. Its robustness and generalizability across different datasets highlight its potential for clinical applications, providing a valuable tool for clinicians in diagnosing DME effectively.
Affiliation(s)
- Sachin Chavan
- SVKM'S NMIMS, Mukesh Patel School of Technology Management and Engineering, Shirpur, Maharashtra, India.
- Nitin Choubey
- SVKM'S NMIMS, Mukesh Patel School of Technology Management and Engineering, Shirpur, Maharashtra, India

9. Liu M, Lee CI, Tzeng CR, Lai HH, Huang Y, Chang TA. WISE: whole-scenario embryo identification using self-supervised learning encoder in IVF. J Assist Reprod Genet 2024. PMID: 38470553; DOI: 10.1007/s10815-024-03080-2.
Abstract
PURPOSE To study the effectiveness of whole-scenario embryo identification using a self-supervised learning encoder (WISE) in in vitro fertilization (IVF) on time-lapse, cross-device, and cryo-thawed scenarios. METHODS WISE was based on the vision transformer (ViT) architecture and masked autoencoders (MAE), a self-supervised learning (SSL) method. To train WISE, we prepared three datasets: the SSL pre-training dataset, the time-lapse identification dataset, and the cross-device identification dataset. To identify whether pairs of images were from the same embryos in different scenarios in the downstream identification tasks, embryo images, including time-lapse and microscope images, were first pre-processed through object detection, cropping, padding, and resizing, and then fed into WISE to get predictions. RESULTS WISE could accurately identify embryos in the three scenarios. The accuracy was 99.89% on the time-lapse identification dataset and 83.55% on the cross-device identification dataset. In addition, we subdivided a cryo-thawed evaluation set from the cross-device test set to better estimate how WISE performs in the real world, and it reached an accuracy of 82.22%. There were improvements of approximately 10% in the cross-device and cryo-thawed identification tasks after the SSL method was applied. Furthermore, WISE improved on embryologists' accuracy by 9.5%, 12%, and 18% in the three scenarios. CONCLUSION SSL methods can improve embryo identification accuracy even when dealing with cross-device and cryo-thawed paired images. The study is the first to apply SSL in embryo identification, and the results show the promise of WISE for future application in embryo witnessing.
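For intuition, pairwise identification with a self-supervised encoder can be reduced to comparing embeddings, for instance by thresholding cosine similarity. The encoder, preprocessing, and threshold below are illustrative placeholders, not WISE's actual decision rule.

```python
import torch
import torch.nn.functional as F

def same_embryo(encoder, image_a, image_b, threshold=0.8):
    """Return True if two preprocessed embryo images are predicted to show the
    same embryo, based on cosine similarity of their embeddings.
    `encoder` is any module mapping (B, C, H, W) -> (B, D); the threshold is illustrative."""
    with torch.no_grad():
        za = F.normalize(encoder(image_a.unsqueeze(0)), dim=1)
        zb = F.normalize(encoder(image_b.unsqueeze(0)), dim=1)
    return F.cosine_similarity(za, zb).item() > threshold

# Example with a toy encoder standing in for a ViT/MAE backbone.
toy_encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 224 * 224, 128))
print(same_embryo(toy_encoder, torch.rand(3, 224, 224), torch.rand(3, 224, 224)))
```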
Affiliation(s)
- Mark Liu
- Binflux, Inc., 4F.-1, No. 9, Dehui St., Zhongshan Dist., Taipei City, 10461, Taiwan.
- Chun-I Lee
- Institute of Medicine, Chung Shan Medical University, Taichung, Taiwan
- Department of Obstetrics and Gynecology, Chung Shan Medical University, Taichung, Taiwan
- Division of Infertility, Lee Women's Hospital, Taichung, Taiwan
- Hsing-Hua Lai
- Stork Fertility Center, Stork Ladies Clinic, Hsinchu City, Taiwan
- Yulun Huang
- Binflux, Inc., 4F.-1, No. 9, Dehui St., Zhongshan Dist., Taipei City, 10461, Taiwan
- T Arthur Chang
- Department of Obstetrics and Gynecology, University of Texas Health Science Center, San Antonio, TX, USA

10. Imagawa K, Shiomoto K. Evaluation of Effectiveness of Self-Supervised Learning in Chest X-Ray Imaging to Reduce Annotated Images. J Imaging Inform Med 2024. PMID: 38459399; DOI: 10.1007/s10278-024-00975-5.
Abstract
A significant challenge in machine learning-based medical image analysis is the scarcity of medical images. Obtaining a large number of labeled medical images is difficult because annotating medical images is a time-consuming process that requires specialized knowledge. In addition, inappropriate annotation processes can increase model bias. Self-supervised learning (SSL) is a type of unsupervised learning method that extracts image representations. Thus, SSL can be an effective method to reduce the number of labeled images. In this study, we investigated the feasibility of reducing the number of labeled images in a limited set of unlabeled medical images. The unlabeled chest X-ray (CXR) images were pretrained using the SimCLR framework, and then the representations were fine-tuned as supervised learning for the target task. A total of 2000 task-specific CXR images were used to perform binary classification of coronavirus disease 2019 (COVID-19) and normal cases. The results demonstrate that the performance of pretraining on task-specific unlabeled CXR images can be maintained when the number of labeled CXR images is reduced by approximately 40%. In addition, the performance was significantly better than that obtained without pretraining. In contrast, a large number of pretrained unlabeled images are required to maintain performance regardless of task specificity among a small number of labeled CXR images. In summary, to reduce the number of labeled images using SimCLR, we must consider both the number of images and the task-specific characteristics of the target images.
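Since the study builds on the SimCLR framework, a compact sketch of SimCLR's NT-Xent (normalized temperature-scaled cross-entropy) contrastive loss may help; the batch size and temperature below are arbitrary illustration values.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss. z1, z2: (B, D) projections of two augmented views."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D)
    sim = z @ z.t() / temperature                         # pairwise similarities, (2B, 2B)
    n = z1.size(0)
    sim.fill_diagonal_(float("-inf"))                     # exclude self-similarity
    # The positive for sample i is its counterpart from the other view.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

loss = nt_xent_loss(torch.randn(32, 128), torch.randn(32, 128))
print(loss.item())
```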
Affiliation(s)
- Kuniki Imagawa
- Faculty of Information Technology, Tokyo City University, 1-28-1 Tamazutsumi, Setagaya-ku, Tokyo, 158-8557, Japan.
- Kohei Shiomoto
- Faculty of Information Technology, Tokyo City University, 1-28-1 Tamazutsumi, Setagaya-ku, Tokyo, 158-8557, Japan

11. Tutsoy O, Koç GG. Deep self-supervised machine learning algorithms with a novel feature elimination and selection approaches for blood test-based multi-dimensional health risks classification. BMC Bioinformatics 2024;25:103. PMID: 38459463; PMCID: PMC10921629; DOI: 10.1186/s12859-024-05729-2.
Abstract
BACKGROUND Blood tests are extensively performed for screening, diagnosis and surveillance purposes. Although it is possible to automatically evaluate raw blood test data with advanced deep self-supervised machine learning approaches, this has not been profoundly investigated and implemented yet. RESULTS This paper proposes deep machine learning algorithms with multi-dimensional adaptive feature elimination, self-feature weighting and novel feature selection approaches. To classify the health risks based on the processed data with the deep layers, four machine learning algorithms with properties ranging from utterly model-free to gradient-driven are modified. CONCLUSIONS The results show that the proposed deep machine learning algorithms can remove unnecessary features, assign self-importance weights, select the most informative features and classify the health risks automatically from worst-case low to worst-case high values.
Affiliation(s)
- Onder Tutsoy
- Adana Alparslan Turkes Science and Technology University, Adana, Turkey.
- Gizem Gul Koç
- Adana Alparslan Turkes Science and Technology University, Adana, Turkey

12. Tian C, Xiao J, Zhang B, Zuo W, Zhang Y, Lin CW. A self-supervised network for image denoising and watermark removal. Neural Netw 2024;174:106218. PMID: 38518709; DOI: 10.1016/j.neunet.2024.106218.
Abstract
In image watermark removal, popular methods depend on given reference non-watermark images in a supervised way to remove watermarks. However, reference non-watermark images are difficult to obtain in the real world, and they often suffer from the influence of noise when captured by digital devices. To resolve these issues, in this paper we present a self-supervised network for image denoising and watermark removal (SSNet). SSNet uses a parallel network in a self-supervised learning way to remove noise and watermarks. Specifically, each sub-network contains two sub-blocks. The upper sub-network uses the first sub-block to remove noise, according to the noise-to-noise principle. Then, the second sub-block in the upper sub-network is used to remove watermarks, according to the distributions of watermarks. To prevent the loss of important information, the lower sub-network is used to simultaneously learn noise and watermarks in a self-supervised way. Moreover, the two sub-networks interact via attention to extract more complementary salient information. The proposed method does not depend on paired images to learn a blind denoising and watermark removal model, which is very meaningful for real applications. It is also more effective than popular image watermark removal methods on public datasets. Code can be found at https://github.com/hellloxiaotian/SSNet.
Affiliation(s)
- Chunwei Tian
- PAMI Research Group, University of Macau, 999078, Macao Special Administrative Region of China
- Jingyu Xiao
- School of Computer Science, Central South University, Changsha, 410083, China
- Bob Zhang
- PAMI Research Group, University of Macau, 999078, Macao Special Administrative Region of China.
- Wangmeng Zuo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Yudong Zhang
- School of Computing and Mathematics, University of Leicester, Leicester, LE1 7RH, UK
- Chia-Wen Lin
- Department of Electrical Engineering and the Institute of Communications Engineering, National Tsing Hua University, Hsinchu 300, Taiwan

13. Liu Z, Chen Q, Lan W, Lu H, Zhang S. SSLDTI: A novel method for drug-target interaction prediction based on self-supervised learning. Artif Intell Med 2024;149:102778. PMID: 38462280; DOI: 10.1016/j.artmed.2024.102778.
Abstract
Many computational methods have been proposed to identify potential drug-target interactions (DTIs) to expedite drug development. Graph neural network (GNN) methods are considered to be one of the most effective approaches. However, shallow GNN methods can only aggregate local information from nodes. Also, deep GNN methods may result in over-smoothing while obtaining long-distance neighbourhood information. As a result, existing GNN methods struggle to extract the complete features of the graph. Additionally, the number of known DTIs is insufficient, and there are far more unknown drug-target pairs than known DTIs, leading to class imbalance. This article proposes a model that combines graph autoencoder and self-supervised learning to accurately encode multilevel features of graphs using only a small number of labelled samples. We introduce a positive sample compensation coefficient to the objective function to mitigate the impact of class imbalance. Experiments on two datasets demonstrated that our model outperforms the four baseline methods, and the new DTIs predicted by the SSLDTI model were verified by the DrugBank database.
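The "positive sample compensation coefficient" is described only at a high level; one common way to realize this kind of class-imbalance correction is to up-weight the scarce positive drug-target pairs in a binary cross-entropy objective, as sketched below. This is an assumption about the general idea, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def imbalance_compensated_bce(scores, labels):
    """scores: raw interaction logits for drug-target pairs; labels: 0/1 floats.
    Positives are weighted by the negative/positive ratio so that the few known
    interactions are not swamped by the many unlabeled pairs."""
    n_pos = labels.sum().clamp(min=1.0)
    n_neg = (labels.numel() - labels.sum()).clamp(min=1.0)
    pos_weight = n_neg / n_pos  # compensation coefficient (illustrative choice)
    return F.binary_cross_entropy_with_logits(scores, labels, pos_weight=pos_weight)

scores = torch.randn(1000)
labels = (torch.rand(1000) < 0.05).float()  # ~5% known interactions
print(imbalance_compensated_bce(scores, labels).item())
```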
Affiliation(s)
- Zhixian Liu
- School of Electronics and Information Engineering, Beibu Gulf University, Qinzhou, Guangxi, China
- Qingfeng Chen
- School of Computer, Electronic and Information, Guangxi University, Nanning, Guangxi, China.
- Wei Lan
- School of Computer, Electronic and Information, Guangxi University, Nanning, Guangxi, China
- Huihui Lu
- School of Electronics and Information Engineering, Beibu Gulf University, Qinzhou, Guangxi, China
- Shichao Zhang
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China.

14. Wei J, Li Z, Zhuo L, Fu X, Wang M, Li K, Chen C. Enhancing drug-food interaction prediction with precision representations through multilevel self-supervised learning. Comput Biol Med 2024;171:108104. PMID: 38335821; DOI: 10.1016/j.compbiomed.2024.108104.
Abstract
Drug-food interactions (DFIs) crucially impact patient safety and drug efficacy by modifying absorption, distribution, metabolism, and excretion. The application of deep learning for predicting DFIs is promising, yet the development of computational models remains in its early stages. This is mainly due to the complexity of food compounds, challenging dataset developers in acquiring comprehensive ingredient data, often resulting in incomplete or vague food component descriptions. DFI-MS tackles this issue by employing an accurate feature representation method alongside a refined computational model. It innovatively achieves a more precise characterization of food features, a previously daunting task in DFI research. This is accomplished through modules designed for perturbation interactions, feature alignment and domain separation, and inference feedback. These modules extract essential information from features, using a perturbation module and a feature interaction encoder to establish robust representations. The feature alignment and domain separation modules are particularly effective in managing data with diverse frequencies and characteristics. DFI-MS stands out as the first in its field to combine data augmentation, feature alignment, domain separation, and contrastive learning. The flexibility of the inference feedback module allows its application in various downstream tasks. Demonstrating exceptional performance across multiple datasets, DFI-MS represents a significant advancement in food representation technology. Our code and data are available at https://github.com/kkkayle/DFI-MS.
Affiliation(s)
- Jinhang Wei
- Wenzhou University of Technology, Wenzhou, 325000, China
- Zhen Li
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China
- Linlin Zhuo
- Wenzhou University of Technology, Wenzhou, 325000, China.
- Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410006, China.
- Mingjing Wang
- Wenzhou University of Technology, Wenzhou, 325000, China.
- Keqin Li
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410006, China; Department of Computer Science, State University of New York, New York, 12561, USA
- Chengshui Chen
- Department of Pulmonary and Critical Care Medicine, Quzhou People's Hospital, The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou, 324000, China; Key Laboratory of Interventional Pulmonology of Zhejiang Province, Department of Pulmonary and Critical Care Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China.

15. Chelebian E, Avenel C, Ciompi F, Wählby C. DEPICTER: Deep representation clustering for histology annotation. Comput Biol Med 2024;170:108026. PMID: 38308865; DOI: 10.1016/j.compbiomed.2024.108026.
Abstract
Automatic segmentation of histopathology whole-slide images (WSI) usually involves supervised training of deep learning models with pixel-level labels to classify each pixel of the WSI into tissue regions such as benign or cancerous. However, fully supervised segmentation requires large-scale data manually annotated by experts, which can be expensive and time-consuming to obtain. Non-fully supervised methods, ranging from semi-supervised to unsupervised, have been proposed to address this issue and have been successful in WSI segmentation tasks. But these methods have mainly been focused on technical advancements in algorithmic performance rather than on the development of practical tools that could be used by pathologists or researchers in real-world scenarios. In contrast, we present DEPICTER (Deep rEPresentatIon ClusTERing), an interactive segmentation tool for histopathology annotation that produces a patch-wise dense segmentation map at WSI level. The interactive nature of DEPICTER leverages self- and semi-supervised learning approaches to allow the user to participate in the segmentation producing reliable results while reducing the workload. DEPICTER consists of three steps: first, a pretrained model is used to compute embeddings from image patches. Next, the user selects a number of benign and cancerous patches from the multi-resolution image. Finally, guided by the deep representations, label propagation is achieved using our novel seeded iterative clustering method or by directly interacting with the embedding space via feature space gating. We report both real-time interaction results with three pathologists and evaluate the performance on three public cancer classification dataset benchmarks through simulations. The code and demos of DEPICTER are publicly available at https://github.com/eduardchelebian/depicter.
Affiliation(s)
- Eduard Chelebian
- Department of Information Technology and SciLifeLab, Uppsala University, Uppsala, Sweden.
- Christophe Avenel
- Department of Information Technology and SciLifeLab, Uppsala University, Uppsala, Sweden
- Francesco Ciompi
- Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands
- Carolina Wählby
- Department of Information Technology and SciLifeLab, Uppsala University, Uppsala, Sweden

16. Cheng L, Yu T, Khalitov R, Yang Z. Self-supervised Learning for DNA sequences with circular dilated convolutional networks. Neural Netw 2024;171:466-473. PMID: 38150872; DOI: 10.1016/j.neunet.2023.12.002.
Abstract
DNA molecules commonly exhibit wide interactions between the nucleobases. Modeling the interactions is important for obtaining accurate sequence-based inference. Although many deep learning methods have recently been developed for modeling DNA sequences, they still suffer from two major issues: 1) most existing methods can handle only short DNA fragments and fail to capture long-range information; 2) current methods always require massive supervised labels, which are hard to obtain in practice. We propose a new method to address both issues. Our neural network employs circular dilated convolutions as building blocks in the backbone. As a result, our network can take long DNA sequences as input without any condensation. We also incorporate the neural network into a self-supervised learning framework to capture inherent information in DNA without expensive supervised labeling. We have tested our model in two DNA inference tasks, the human variant effect and the open chromatin region of plants, where the experimental results show that our method outperforms five other deep learning models. Our code is available at https://github.com/wiedersehne/cdilDNA.
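PyTorch's Conv1d supports circular padding directly, so a building block in the spirit of the circular dilated convolutions described above can be sketched as follows; the channel count, dilation schedule, and residual connection are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class CircularDilatedBlock(nn.Module):
    """Stack of 1D convolutions with circular padding and growing dilation, so the
    receptive field covers long DNA sequences without condensing the input."""
    def __init__(self, channels=64, kernel_size=3, dilations=(1, 2, 4, 8)):
        super().__init__()
        layers = []
        for d in dilations:
            layers += [nn.Conv1d(channels, channels, kernel_size,
                                 padding=d * (kernel_size - 1) // 2,
                                 dilation=d, padding_mode="circular"),
                       nn.GELU()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):          # x: (batch, channels, sequence_length)
        return self.net(x) + x     # residual connection keeps training stable

# DNA one-hot encodings projected to 64 channels upstream; sequence length 10,000.
block = CircularDilatedBlock()
print(block(torch.randn(2, 64, 10_000)).shape)
```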
Affiliation(s)
- Lei Cheng
- Department of Computer Science, Norwegian University of Science and Technology, Norway
- Tong Yu
- Department of Computer Science, Norwegian University of Science and Technology, Norway
- Ruslan Khalitov
- Department of Computer Science, Norwegian University of Science and Technology, Norway
- Zhirong Yang
- Department of Computer Science, Norwegian University of Science and Technology, Norway; Jinhua Institute of Zhejiang Univerisity, China.

17. Wang J, Yang X, Jia X, Xue W, Chen R, Chen Y, Zhu X, Liu L, Cao Y, Zhou J, Ni D, Gu N. Thyroid ultrasound diagnosis improvement via multi-view self-supervised learning and two-stage pre-training. Comput Biol Med 2024;171:108087. PMID: 38364658; DOI: 10.1016/j.compbiomed.2024.108087.
Abstract
Thyroid nodule classification and segmentation in ultrasound images are crucial for computer-aided diagnosis; however, they face limitations owing to insufficient labeled data. In this study, we proposed a multi-view contrastive self-supervised method to improve thyroid nodule classification and segmentation performance with limited manual labels. Our method aligns the transverse and longitudinal views of the same nodule, thereby enabling the model to focus more on the nodule area. We designed an adaptive loss function that eliminates the limitations of the paired data. Additionally, we adopted a two-stage pre-training to exploit the pre-training on ImageNet and thyroid ultrasound images. Extensive experiments were conducted on a large-scale dataset collected from multiple centers. The results showed that the proposed method significantly improves nodule classification and segmentation performance with limited manual labels and outperforms state-of-the-art self-supervised methods. The two-stage pre-training also significantly exceeded ImageNet pre-training.
Affiliation(s)
- Jian Wang
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166, China
- Xin Yang
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, 518073, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, 518073, China; Medical UltraSound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, 518073, China
- Xiaohong Jia
- Department of Ultrasound, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, 200025, China
- Wufeng Xue
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, 518073, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, 518073, China; Medical UltraSound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, 518073, China
- Rusi Chen
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, 518073, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, 518073, China; Medical UltraSound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, 518073, China
- Yanlin Chen
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, 518073, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, 518073, China; Medical UltraSound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, 518073, China
- Xiliang Zhu
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, 518073, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, 518073, China; Medical UltraSound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, 518073, China
- Lian Liu
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, 518073, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, 518073, China; Medical UltraSound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, 518073, China
- Yan Cao
- Shenzhen RayShape Medical Technology Co., Ltd, Shenzhen, 518051, China
- Jianqiao Zhou
- Department of Ultrasound, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, 200025, China.
- Dong Ni
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, 518073, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, 518073, China; Medical UltraSound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, 518073, China.
- Ning Gu
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166, China; Cardiovascular Disease Research Center, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Medical School, Nanjing University, Nanjing, 210093, China.

18. Yazdani E, Karamzadeh-Ziarati N, Cheshmi SS, Sadeghi M, Geramifar P, Vosoughi H, Jahromi MK, Kheradpisheh SR. Automated segmentation of lesions and organs at risk on [68Ga]Ga-PSMA-11 PET/CT images using self-supervised learning with Swin UNETR. Cancer Imaging 2024;24:30. PMID: 38424612; PMCID: PMC10903052; DOI: 10.1186/s40644-024-00675-x.
Abstract
BACKGROUND Prostate-specific membrane antigen (PSMA) PET/CT imaging is widely used for quantitative image analysis, especially in radioligand therapy (RLT) for metastatic castration-resistant prostate cancer (mCRPC). Unknown features influencing PSMA biodistribution can be explored by analyzing segmented organs at risk (OAR) and lesions. Manual segmentation is time-consuming and labor-intensive, so automated segmentation methods are desirable. Training deep-learning segmentation models is challenging due to the scarcity of high-quality annotated images. Addressing this, we developed shifted windows UNEt TRansformers (Swin UNETR) for fully automated segmentation. Within a self-supervised framework, the model's encoder was pre-trained on unlabeled data. The entire model was fine-tuned, including its decoder, using labeled data. METHODS In this work, 752 whole-body [68Ga]Ga-PSMA-11 PET/CT images were collected from two centers. For self-supervised model pre-training, 652 unlabeled images were employed. The remaining 100 images were manually labeled for supervised training. In the supervised training phase, 5-fold cross-validation was used with 64 images for model training and 16 for validation, from one center. For testing, 20 hold-out images, evenly distributed between two centers, were used. Image segmentation and quantification metrics were evaluated on the test set compared to the ground-truth segmentation conducted by a nuclear medicine physician. RESULTS The model generates high-quality OARs and lesion segmentation in lesion-positive cases, including mCRPC. The results show that self-supervised pre-training significantly improved the average dice similarity coefficient (DSC) for all classes by about 3%. Compared to nnU-Net, a well-established model in medical image segmentation, our approach outperformed with a 5% higher DSC. This improvement was attributed to our model's combined use of self-supervised pre-training and supervised fine-tuning, specifically when applied to PET/CT input. Our best model had the lowest DSC for lesions at 0.68 and the highest for liver at 0.95. CONCLUSIONS We developed a state-of-the-art neural network using self-supervised pre-training on whole-body [68Ga]Ga-PSMA-11 PET/CT images, followed by fine-tuning on a limited set of annotated images. The model generates high-quality OARs and lesion segmentation for PSMA image analysis. The generalizable model holds potential for various clinical applications, including enhanced RLT and patient-specific internal dosimetry.
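For completeness, the Dice similarity coefficient (DSC) reported above compares a predicted segmentation mask with the ground truth; a minimal sketch of the standard definition (not the authors' evaluation code):

```python
import numpy as np

def dice_similarity(pred_mask, true_mask, eps=1e-7):
    """DSC = 2 * |P ∩ G| / (|P| + |G|) for binary masks of any shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred, true).sum()
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)

# Example: DSC between a toy predicted lesion mask and ground truth on a small volume.
pred = np.zeros((4, 4, 4), dtype=bool); pred[1:3, 1:3, 1:3] = True
true = np.zeros((4, 4, 4), dtype=bool); true[1:3, 1:3, :2] = True
print(round(dice_similarity(pred, true), 3))
```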
Affiliation(s)
- Elmira Yazdani
- Medical Physics Department, School of Medicine, Iran University of Medical Sciences, Tehran, 14155-6183, Iran
- Fintech in Medicine Research Center, Iran University of Medical Sciences, Tehran, Iran
- Seyyed Saeid Cheshmi
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
- Mahdi Sadeghi
- Medical Physics Department, School of Medicine, Iran University of Medical Sciences, Tehran, 14155-6183, Iran.
- Fintech in Medicine Research Center, Iran University of Medical Sciences, Tehran, Iran.
- Parham Geramifar
- Research Center for Nuclear Medicine, Tehran University of Medical Sciences, Tehran, Iran.
- Habibeh Vosoughi
- Research Center for Nuclear Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Nuclear Medicine and Molecular Imaging Department, Imam Reza International University, Razavi Hospital, Mashhad, Iran
- Mahmood Kazemi Jahromi
- Medical Physics Department, School of Medicine, Iran University of Medical Sciences, Tehran, 14155-6183, Iran
- Fintech in Medicine Research Center, Iran University of Medical Sciences, Tehran, Iran
- Saeed Reza Kheradpisheh
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran.

19. Unal MO, Ertas M, Yildirim I. Proj2Proj: self-supervised low-dose CT reconstruction. PeerJ Comput Sci 2024;10:e1849. PMID: 38435612; PMCID: PMC10909204; DOI: 10.7717/peerj-cs.1849.
Abstract
In Computed Tomography (CT) imaging, one of the most serious concerns has always been ionizing radiation. Several approaches have been proposed to reduce the dose level without compromising the image quality. With the emergence of deep learning, thanks to the increasing availability of computational power and huge datasets, data-driven methods have recently received a lot of attention. Deep learning based methods have also been applied in various ways to address the low-dose CT reconstruction problem. However, the success of these methods largely depends on the availability of labeled data. On the other hand, recent studies showed that training can be done successfully without the need for labeled datasets. In this study, a training scheme was defined to use low-dose projections as their own training targets. The self-supervision principle was applied in the projection domain. The parameters of a denoiser neural network were optimized through self-supervised training. It was shown that our method outperformed both traditional and compressed sensing-based iterative methods, and deep learning based unsupervised methods, in the reconstruction of analytic CT phantoms and human CT images in low-dose CT imaging. Our method's reconstruction quality is also comparable to a well-known supervised method.
Affiliation(s)
- Mehmet Ozan Unal
- Department of Electronics and Communication Engineering, Istanbul Technical University, Istanbul, Turkey
- Metin Ertas
- Department of Electrical and Electronics Engineering, Istanbul University-Cerrahpasa, Istanbul, Turkey
- Isa Yildirim
- Department of Electronics and Communication Engineering, Istanbul Technical University, Istanbul, Turkey

20. Shi R, Yu G, Huo X, Yang Y. Prediction of chemical reaction yields with large-scale multi-view pre-training. J Cheminform 2024;16:22. PMID: 38403627; PMCID: PMC10895839; DOI: 10.1186/s13321-024-00815-2.
Abstract
Developing machine learning models with high generalization capability for predicting chemical reaction yields is of significant interest and importance. The efficacy of such models depends heavily on the representation of chemical reactions, which has commonly been learned from SMILES or graphs of molecules using deep neural networks. However, the progression of chemical reactions is inherently determined by the molecular 3D geometric properties, which have been recently highlighted as crucial features in accurately predicting molecular properties and chemical reactions. Additionally, large-scale pre-training has been shown to be essential in enhancing the generalization capability of complex deep learning models. Based on these considerations, we propose the Reaction Multi-View Pre-training (ReaMVP) framework, which leverages self-supervised learning techniques and a two-stage pre-training strategy to predict chemical reaction yields. By incorporating multi-view learning with 3D geometric information, ReaMVP achieves state-of-the-art performance on two benchmark datasets. Notably, the experimental results indicate that ReaMVP has a significant advantage in predicting out-of-sample data, suggesting an enhanced generalization ability to predict new reactions. Scientific Contribution: This study presents the ReaMVP framework, which improves the generalization capability of machine learning models for predicting chemical reaction yields. By integrating sequential and geometric views and leveraging self-supervised learning techniques with a two-stage pre-training strategy, ReaMVP achieves state-of-the-art performance on benchmark datasets. The framework demonstrates superior predictive ability for out-of-sample data and enhances the prediction of new reactions.
Collapse
Affiliation(s)
- Runhan Shi
- Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Gufeng Yu
- Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Xiaohong Huo
- Shanghai Key Laboratory for Molecular Engineering of Chiral Drugs, Frontiers Science Center for Transformative Molecules, School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Yang Yang
- Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China.
| |
Collapse
|
21
|
Cho K, Kim KD, Jeong J, Nam Y, Kim J, Choi C, Lee S, Hong GS, Seo JB, Kim N. Approximating Intermediate Feature Maps of Self-Supervised Convolution Neural Network to Learn Hard Positive Representations in Chest Radiography. J Imaging Inform Med 2024:10.1007/s10278-024-01032-x. [PMID: 38381382 DOI: 10.1007/s10278-024-01032-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/31/2023] [Revised: 01/22/2024] [Accepted: 01/24/2024] [Indexed: 02/22/2024]
Abstract
Recent advances in contrastive learning have significantly improved the performance of deep learning models. In contrastive learning of medical images, handling positive representations is sometimes difficult because strong augmentation can disrupt training: augmented positive pairs may differ from other standardized CXRs only subtly, so additional effort is required. In this study, we propose the intermediate feature approximation (IFA) loss, which improves the performance of contrastive convolutional neural networks by focusing more on positive representations of CXRs without additional augmentations. The IFA loss encourages the feature maps of a query image and its positive pair to resemble each other by maximizing the cosine similarity between the intermediate feature outputs of the original data and the positive pairs. We therefore combine the InfoNCE loss, a commonly used loss that addresses negative representations, with the IFA loss, which addresses positive representations, to improve the contrastive network. We evaluated the network on various downstream tasks, including classification, object detection, and a generative adversarial network (GAN) inversion task. The results demonstrate that the IFA loss improves performance by effectively overcoming data imbalance and data scarcity; furthermore, the trained encoder can serve as a perceptual loss for GAN inversion. In addition, we have made our model publicly available to facilitate access and encourage further research and collaboration in the field.
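As a rough illustration of how such a combination might be wired up, the sketch below pairs a standard InfoNCE term with a cosine-similarity penalty on intermediate feature maps. Function names, the weighting factor lam, and the flattening scheme are assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def ifa_loss(feat_query, feat_positive):
    # Negative mean cosine similarity between flattened intermediate feature maps.
    q = F.normalize(feat_query.flatten(1), dim=1)
    p = F.normalize(feat_positive.flatten(1), dim=1)
    return 1.0 - (q * p).sum(dim=1).mean()

def info_nce_loss(z_query, z_keys, temperature=0.07):
    # Standard InfoNCE; the positive for row i of z_query is row i of z_keys.
    q = F.normalize(z_query, dim=1)
    k = F.normalize(z_keys, dim=1)
    logits = q @ k.t() / temperature
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

def combined_loss(z_q, z_k, feat_q, feat_k, lam=0.5):
    # lam balances the negative-oriented and positive-oriented terms (assumed value).
    return info_nce_loss(z_q, z_k) + lam * ifa_loss(feat_q, feat_k)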
Collapse
Affiliation(s)
- Kyungjin Cho
- Department of Bioengineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, 88 Olympic-Ro 43-Gil Songpa-Gu, Seoul, 05505, South Korea
| | - Ki Duk Kim
- Department of Convergence Medicine, University of Ulsan College of Medicine, Asan Medical Center, 88 Olympic-Ro 43-Gil Songpa-Gu, Seoul, 05505, South Korea
| | - Jiheon Jeong
- Department of Bioengineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, 88 Olympic-Ro 43-Gil Songpa-Gu, Seoul, 05505, South Korea
| | - Yujin Nam
- Department of Bioengineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, 88 Olympic-Ro 43-Gil Songpa-Gu, Seoul, 05505, South Korea
| | - Jeeyoung Kim
- Department of Bioengineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, 88 Olympic-Ro 43-Gil Songpa-Gu, Seoul, 05505, South Korea
| | - Changyong Choi
- Department of Bioengineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, 88 Olympic-Ro 43-Gil Songpa-Gu, Seoul, 05505, South Korea
| | - Soyoung Lee
- Department of Bioengineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, 88 Olympic-Ro 43-Gil Songpa-Gu, Seoul, 05505, South Korea
| | - Gil-Sun Hong
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
| | - Joon Beom Seo
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
| | - Namkug Kim
- Department of Convergence Medicine, University of Ulsan College of Medicine, Asan Medical Center, 88 Olympic-Ro 43-Gil Songpa-Gu, Seoul, 05505, South Korea.
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.
| |
Collapse
|
22
|
Ding J, Zhou R, Fang X, Wang F, Wang J, Gan H, Fenster A. An image registration-based self-supervised Su-Net for carotid plaque ultrasound image segmentation. Comput Methods Programs Biomed 2024; 244:107957. [PMID: 38061113 DOI: 10.1016/j.cmpb.2023.107957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/12/2022] [Revised: 11/17/2023] [Accepted: 11/27/2023] [Indexed: 01/26/2024]
Abstract
BACKGROUND AND OBJECTIVES Total Plaque Area (TPA) measurement is critical for early diagnosis and intervention of carotid atherosclerosis in individuals at high risk of stroke. The delineation of the carotid plaques is necessary for TPA measurement, and deep learning methods can automatically segment the plaque and measure TPA from carotid ultrasound images. A large number of labeled images is essential for training a good deep learning model, but it is very difficult to collect such large labeled datasets for carotid image segmentation in clinical practice. Self-supervised learning can provide a possible solution to improve deep-learning models on small labeled training datasets by designing a pretext task to pre-train the models without using the segmentation masks. However, existing self-supervised learning methods do not consider the feature representations of object contours. METHODS In this paper, we propose an image registration-based self-supervised learning method and a stacked U-Net (SSL-SU-Net) for carotid plaque ultrasound image segmentation, which can better exploit the semantic features of carotid plaque contours in self-supervised task training. RESULTS Our network was trained on different numbers of labeled images (n = 10, 33, 50 and 100 subjects) and tested on 44 subjects from the SPARC dataset (n = 144, London, Canada). The network trained on the entire SPARC dataset was then directly applied to an independent dataset collected in Zhongnan hospital (n = 497, Wuhan, China). For the 44 subjects tested on the SPARC dataset, our method yielded a DSC of 80.25-89.18% and produced TPA measurements that were strongly correlated with manual segmentation (r = 0.965-0.995, p < 0.0001). For the Zhongnan dataset, the DSC was 90.3% and algorithm TPAs were strongly correlated with manual TPAs (r = 0.985, p < 0.0001). CONCLUSIONS The results demonstrate that our proposed method yielded excellent performance and good generalization ability when trained on a small labeled dataset, facilitating the use of deep learning in carotid ultrasound image analysis and clinical practice. The code of our algorithm is available at https://github.com/a610lab/Registration-SSL.
Collapse
Affiliation(s)
- Jing Ding
- School of Computer Science, Hubei University of Technology, Wuhan, Hubei 430068, China
| | - Ran Zhou
- School of Computer Science, Hubei University of Technology, Wuhan, Hubei 430068, China.
| | - Xiaoyue Fang
- School of Computer Science, Hubei University of Technology, Wuhan, Hubei 430068, China
| | - Furong Wang
- Liyuan Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Ji Wang
- School of Computer Science, Hubei University of Technology, Wuhan, Hubei 430068, China
| | - Haitao Gan
- School of Computer Science, Hubei University of Technology, Wuhan, Hubei 430068, China.
| | - Aaron Fenster
- Imaging Research Laboratories, Robarts Research Institute, Western University, London N6A 5K8, Ontario, Canada
| |
Collapse
|
23
|
Stegmüller T, Abbet C, Bozorgtabar B, Clarke H, Petignat P, Vassilakos P, Thiran JP. Self-supervised learning-based cervical cytology for the triage of HPV-positive women in resource-limited settings and low-data regime. Comput Biol Med 2024; 169:107809. [PMID: 38113684 DOI: 10.1016/j.compbiomed.2023.107809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2023] [Revised: 11/24/2023] [Accepted: 12/01/2023] [Indexed: 12/21/2023]
Abstract
Screening Papanicolaou test samples has proven to be highly effective in reducing cervical cancer-related mortality. However, the lack of trained cytopathologists hinders its widespread implementation in low-resource settings. Deep learning-assisted telecytology diagnosis emerges as an appealing alternative, but it requires the collection of large annotated training datasets, which is costly and time-consuming. In this paper, we demonstrate that the abundance of unlabeled images that can be extracted from Pap smear test whole slide images presents a fertile ground for self-supervised learning methods, yielding performance improvements compared to off-the-shelf pre-trained models for various downstream tasks. In particular, we propose Cervical Cell Copy-Pasting (C3P) as an effective augmentation method, which enables knowledge transfer from public and labeled single-cell datasets to unlabeled tiles. Not only does C3P outperform naive transfer from single-cell images, but we also demonstrate its advantageous integration into multiple instance learning methods. Importantly, all our experiments are conducted on our introduced in-house dataset comprising liquid-based cytology Pap smear images obtained using low-cost technologies. This aligns with our long-term objective of deep learning-assisted telecytology for diagnosis in low-resource settings.
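To make the copy-pasting idea concrete, here is a hypothetical and heavily simplified augmentation routine that pastes labeled single-cell crops onto an unlabeled tile so that the tile inherits weak labels. The naive placement and absence of blending are assumptions for illustration, not the authors' C3P procedure.

import random

def copy_paste_cells(tile, cell_crops, max_cells=3):
    # tile: HxWx3 uint8 array (unlabeled Pap smear tile).
    # cell_crops: list of (crop, label) pairs from a labeled single-cell dataset.
    tile = tile.copy()
    labels = set()
    h, w = tile.shape[:2]
    for crop, label in random.sample(cell_crops, k=min(max_cells, len(cell_crops))):
        ch, cw = crop.shape[:2]
        if ch >= h or cw >= w:
            continue
        y = random.randint(0, h - ch)
        x = random.randint(0, w - cw)
        tile[y:y + ch, x:x + cw] = crop  # naive paste; a real method may blend edges
        labels.add(label)
    return tile, labels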
Collapse
Affiliation(s)
- Thomas Stegmüller
- Ecole Polytechnique Fédérale de Lausanne, Lausanne, 1015, Switzerland.
| | - Christian Abbet
- Ecole Polytechnique Fédérale de Lausanne, Lausanne, 1015, Switzerland
| | - Behzad Bozorgtabar
- Ecole Polytechnique Fédérale de Lausanne, Lausanne, 1015, Switzerland; Centre Hospitalier Universitaire Vaudois, Lausanne, 1011, Switzerland
| | - Holly Clarke
- Hôpitaux Universitaires de Genève, Genève, 1205, Switzerland
| | | | | | - Jean-Philippe Thiran
- Ecole Polytechnique Fédérale de Lausanne, Lausanne, 1015, Switzerland; Centre Hospitalier Universitaire Vaudois, Lausanne, 1011, Switzerland
| |
Collapse
|
24
|
Zhou Z, Xie P, Dai Z, Wu J. Self-supervised tumor segmentation and prognosis prediction in osteosarcoma using multiparametric MRI and clinical characteristics. Comput Methods Programs Biomed 2024; 244:107974. [PMID: 38154327 DOI: 10.1016/j.cmpb.2023.107974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/08/2022] [Revised: 11/26/2023] [Accepted: 12/07/2023] [Indexed: 12/30/2023]
Abstract
BACKGROUND AND OBJECTIVE Osteosarcoma has a high mortality rate among malignant bone tumors. MRI-based tumor segmentation and prognosis prediction can help doctors detect osteosarcoma, evaluate the patient's status, and improve patient survival. Current intelligent diagnostic approaches focus on segmentation with single-parameter MRI, which ignores the multiparametric nature of MRI and results in poor performance, and they lack a connection to prognosis prediction. Moreover, osteosarcoma is a rare disease, and the scarcity of labeled data may lead to model overfitting. METHODS We propose a three-stage pipeline for segmentation and prognosis prediction of osteosarcoma to assist doctors in diagnosis. First, we propose the Multiparameter Fusion Contrast Learning (MPFCLR) algorithm to share pre-training weights for the segmentation model using unlabeled data. Then, we construct a multiparametric fusion network (MPFNet), which fuses the complementary features from multiparametric MRI (CE-T1WI, T2WI) and automatically segments tumor and necrotic regions. Finally, a fusion nomogram is constructed from the segmentation masks and clinical characteristics (volume, tumor spread) to predict the patient's prognostic status. RESULTS Our experiments used data from 136 patients at the Second Xiangya Hospital in China. In these experiments, MPFNet achieves 84.19% mean DSC and 84.56% mean F1-score in segmenting tumor and necrotic regions, surpassing existing models and single-parameter MRI input for osteosarcoma segmentation. In addition, MPFCLR improves the segmentation performance and convergence speed. In prognosis prediction, our fusion nomogram (C-index: 0.806, 95% CI: 0.758-0.854) outperforms the radiomics (C-index: 0.753, 95% CI: 0.685-0.841) and clinical (C-index: 0.794, 95% CI: 0.735-0.854) nomograms. Among the compared models, ours is closest to the prediction model based on physician annotations, and it can accurately distinguish patients with good prognoses from those with poor prognoses. CONCLUSION Our proposed solution can provide references for clinicians to detect osteosarcoma, evaluate patient status, and make personalized decisions. It can reduce delayed treatment or overtreatment and improve patient survival.
Collapse
Affiliation(s)
- Zhixun Zhou
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Peng Xie
- Department of Spine Surgery, The Second Xiangya Hospital, Central South University, Changsha 410011, China
| | - Zhehao Dai
- Department of Spine Surgery, The Second Xiangya Hospital, Central South University, Changsha 410011, China
| | - Jia Wu
- School of Computer Science and Engineering, Central South University, Changsha 410083, China; Research Center for Artificial Intelligence, Monash University, Melbourne, Clayton VIC 3800, Australia.
| |
Collapse
|
25
|
Yu K, Sun L, Chen J, Reynolds M, Chaudhary T, Batmanghelich K. DrasCLR: A self-supervised framework of learning disease-related and anatomy-specific representation for 3D lung CT images. Med Image Anal 2024; 92:103062. [PMID: 38086236 PMCID: PMC10872608 DOI: 10.1016/j.media.2023.103062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2023] [Revised: 08/24/2023] [Accepted: 12/05/2023] [Indexed: 01/12/2024]
Abstract
Large-scale volumetric medical images with annotation are rare, costly, and time-prohibitive to acquire. Self-supervised learning (SSL) offers a promising pre-training and feature extraction solution for many downstream tasks, as it only uses unlabeled data. Recently, SSL methods based on instance discrimination have gained popularity in the medical imaging domain. However, SSL pre-trained encoders may use many clues in the image to discriminate an instance that are not necessarily disease-related. Moreover, pathological patterns are often subtle and heterogeneous, requiring the desired method to represent anatomy-specific features that are sensitive to abnormal changes in different body parts. In this work, we present a novel SSL framework, named DrasCLR, for 3D lung CT images to overcome these challenges. We propose two domain-specific contrastive learning strategies: one aims to capture subtle disease patterns inside a local anatomical region, and the other aims to represent severe disease patterns that span larger regions. We formulate the encoder using a conditional hyper-parameterized network, in which the parameters are dependent on the anatomical location, to extract anatomically sensitive features. Extensive experiments on large-scale datasets of lung CT scans show that our method improves the performance of many downstream prediction and segmentation tasks. The patient-level representation improves the performance of the patient survival prediction task. We show how our method can detect emphysema subtypes via dense prediction. We demonstrate that fine-tuning the pre-trained model can significantly reduce annotation efforts without sacrificing emphysema detection accuracy. Our ablation study highlights the importance of incorporating anatomical context into the SSL framework. Our codes are available at https://github.com/batmanlab/DrasCLR.
Collapse
Affiliation(s)
- Ke Yu
- School of Computing and Information, University of Pittsburgh, Pittsburgh, USA.
| | - Li Sun
- Department of Electrical and Computer Engineering, Boston University, Boston, USA
| | - Junxiang Chen
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, USA
| | - Maxwell Reynolds
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, USA
| | - Tigmanshu Chaudhary
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, USA
| | - Kayhan Batmanghelich
- Department of Electrical and Computer Engineering, Boston University, Boston, USA
| |
Collapse
|
26
|
Xie Y, Zhong H, Wu J, Zhao W, Hou R, Zhao L, Xu X, Zhang M, Zhao J. Automatic classification of heart failure based on Cine-CMR images. Int J Comput Assist Radiol Surg 2024; 19:355-365. [PMID: 37921964 DOI: 10.1007/s11548-023-03028-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2023] [Accepted: 10/03/2023] [Indexed: 11/05/2023]
Abstract
PURPOSE Heart failure (HF) is a serious and complex syndrome with a high mortality rate. In clinical diagnosis, the correct classification of HF is helpful. In our previous work, we proposed a self-supervised learning framework for HF classification (SSLHF) on cine cardiac magnetic resonance images (Cine-CMR). However, that method does not integrate three-dimensional spatial information with temporal information. This study therefore aims to propose an automatic 4D HF classification algorithm. METHODS To construct a 4D classification model, we propose an extended framework called 4D-SSLHF. It mainly consists of self-supervised image restoration and HF classification. The image restoration proxy task utilizes three image transformation methods to enhance the exploration of spatial and temporal information in the Cine-CMR. In the classification task, we propose a Siamese Conv-LSTM network that combines a Siamese network with a bi-directional Conv-LSTM to integrate features from all four dimensions simultaneously. RESULTS Experimental results on 184 patients from Shanghai Chest Hospital achieved an AUC of 0.8794 and an ACC of 0.8402 in five-fold cross-validation. Compared with our previous work, the improvements in AUC and ACC were 2.89% and 1.94%, respectively. CONCLUSIONS In this study, we proposed a novel self-supervised learning framework named 4D-SSLHF for HF classification based on Cine-CMR. The proposed 4D-SSLHF can effectively mine 3D spatial and temporal information in Cine-CMR images and accurately classify different categories of HF. The good classification results show our method's potential to assist physicians in choosing personalized treatment.
Collapse
Affiliation(s)
- Yuan Xie
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Hai Zhong
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Jiaqi Wu
- Cardiology, Shanghai Chest Hospital, Shanghai, China
| | - Wangyuan Zhao
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Runping Hou
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Lu Zhao
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Xiaowei Xu
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Min Zhang
- Cardiology, Shanghai Chest Hospital, Shanghai, China
| | - Jun Zhao
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
| |
Collapse
|
27
|
Bai Y, Li W, An J, Xia L, Chen H, Zhao G, Gao Z. Masked autoencoders with handcrafted feature predictions: Transformer for weakly supervised esophageal cancer classification. Comput Methods Programs Biomed 2024; 244:107936. [PMID: 38016392 DOI: 10.1016/j.cmpb.2023.107936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/06/2022] [Revised: 10/28/2023] [Accepted: 11/19/2023] [Indexed: 11/30/2023]
Abstract
BACKGROUND AND OBJECTIVE Esophageal cancer is a serious disease with a high prevalence in Eastern Asia. Histopathology tissue analysis stands as the gold standard in diagnosing esophageal cancer. In recent years, there has been a shift towards digitizing histopathological images into whole slide images (WSIs), progressively integrating them into cancer diagnostics. However, the gigapixel sizes of WSIs present significant storage and processing challenges, and they often lack localized annotations. To address this issue, multi-instance learning (MIL) has been introduced for WSI classification, utilizing weakly supervised learning for diagnosis analysis. By applying the principles of MIL to WSI analysis, it is possible to reduce the workload of pathologists by facilitating the generation of localized annotations. Nevertheless, the approach's effectiveness is hindered by the traditional simple aggregation operation and the domain shift resulting from the prevalent use of convolutional feature extractors pretrained on ImageNet. METHODS We propose an MIL-based framework for WSI analysis and cancer classification. In addition, we employ self-supervised learning, which obviates the need for manual annotation and is versatile across tasks, to pretrain the feature extractors. This method enhances the extraction of representative features from esophageal WSIs for MIL, ensuring more robust and accurate performance. RESULTS We build a comprehensive dataset of whole esophageal slide images and conduct extensive experiments on it. The performance on our dataset demonstrates the efficiency of our proposed MIL framework and the pretraining process, with our framework outperforming existing methods and achieving an accuracy of 93.07% and an AUC (area under the curve) of 95.31%. CONCLUSION This work proposes an effective MIL method to classify WSIs of esophageal cancer. The promising results indicate that our cancer classification framework holds great potential in promoting automatic analysis of whole esophageal slide images.
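For readers unfamiliar with MIL aggregation, the snippet below shows one widely used attention-based pooling scheme over a bag of patch features. It is a generic illustration of weakly supervised slide-level classification and is not necessarily the aggregation operation used in this paper.

import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    # Aggregates patch-level features into a slide-level prediction with
    # learned attention weights, which can also serve as weak localization cues.
    def __init__(self, feat_dim=512, hidden_dim=128, n_classes=2):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, bag):                      # bag: (num_patches, feat_dim)
        scores = self.attention(bag)             # (num_patches, 1)
        weights = torch.softmax(scores, dim=0)   # attention over patches
        slide_feat = (weights * bag).sum(dim=0, keepdim=True)
        return self.classifier(slide_feat), weights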
Collapse
Affiliation(s)
- Yunhao Bai
- the School of Electrical and Information Engineering, Tianjin University, Tianjin, China
| | - Wenqi Li
- Department of Pathology, Key Laboratory of Cancer Prevention and Therapy, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin, China
| | - Jianpeng An
- the School of Electrical and Information Engineering, Tianjin University, Tianjin, China
| | - Lili Xia
- the School of Electrical and Information Engineering, Tianjin University, Tianjin, China
| | - Huazhen Chen
- the School of Electrical and Information Engineering, Tianjin University, Tianjin, China
| | - Gang Zhao
- Department of Pathology, Key Laboratory of Cancer Prevention and Therapy, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin, China
| | - Zhongke Gao
- the School of Electrical and Information Engineering, Tianjin University, Tianjin, China.
| |
Collapse
|
28
|
Luo Y, Ma Y, Yang Z. Multi-resolution auto-encoder for anomaly detection of retinal imaging. Phys Eng Sci Med 2024:10.1007/s13246-023-01381-x. [PMID: 38285270 DOI: 10.1007/s13246-023-01381-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2023] [Accepted: 12/27/2023] [Indexed: 01/30/2024]
Abstract
Identifying unknown types of disease is a crucial safety step that precedes retinal image classification and is known as anomaly detection in retinal imaging. However, widely used supervised learning algorithms are not suitable for this problem, since data from the unknown category are unobtainable. Moreover, for retinal imaging with different types of anomalous regions, using a single-resolution input causes information loss. Therefore, we propose an unsupervised auto-encoder model with multi-resolution inputs and outputs. We provide a theoretical understanding of the effectiveness of reconstruction error and of the improvement that self-supervised learning brings to anomaly detection. Our experiments on two widely used retinal imaging datasets show that the proposed method is superior to other methods, and further experiments verify the validity of each part of the proposed method.
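The reconstruction-error criterion can be sketched as follows for a multi-resolution setting: the anomaly score of an image is its reconstruction error averaged over several input scales. The interpolation scheme and the assumption that the autoencoder accepts variable input sizes are illustrative, not details taken from the paper.

import torch
import torch.nn.functional as F

@torch.no_grad()
def multires_anomaly_score(autoencoder, image, scales=(1.0, 0.5, 0.25)):
    # image: (batch, channels, H, W); higher score = more likely anomalous.
    score = 0.0
    for s in scales:
        x = image if s == 1.0 else F.interpolate(
            image, scale_factor=s, mode="bilinear", align_corners=False)
        recon = autoencoder(x)           # assumes the model handles each scale
        score += F.mse_loss(recon, x).item()
    return score / len(scales)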
Collapse
Affiliation(s)
- Yixin Luo
- School of Mathematical Sciences, University of Science and Technology of China, No. 96 Jinzhai Road, Hefei, 230026, Anhui, China
| | - Yangling Ma
- School of Mathematical Sciences, Suzhou University of Science and Technology, No. 99 Xuefu Road, Suzhou, 215009, Jiangsu, China
| | - Zhouwang Yang
- School of Mathematical Sciences, University of Science and Technology of China, No. 96 Jinzhai Road, Hefei, 230026, Anhui, China.
| |
Collapse
|
29
|
Islam NU, Zhou Z, Gehlot S, Gotway MB, Liang J. Seeking an optimal approach for Computer-aided Diagnosis of Pulmonary Embolism. Med Image Anal 2024; 91:102988. [PMID: 37924750 DOI: 10.1016/j.media.2023.102988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2022] [Revised: 09/28/2023] [Accepted: 09/29/2023] [Indexed: 11/06/2023]
Abstract
Pulmonary Embolism (PE) represents a thrombus ("blood clot"), usually originating from a lower extremity vein, that travels to the blood vessels in the lung, causing vascular obstruction and in some patients death. This disorder is commonly diagnosed using Computed Tomography Pulmonary Angiography (CTPA). Deep learning holds great promise for the Computer-aided Diagnosis (CAD) of PE. However, numerous deep learning methods, such as Convolutional Neural Networks (CNN) and Transformer-based models, exist for a given task, causing great confusion regarding the development of CAD systems for PE. To address this confusion, we present a comprehensive analysis of competing deep learning methods applicable to PE diagnosis based on four datasets. First, we use the RSNA PE dataset, which includes (weak) slice-level and exam-level labels, for PE classification and diagnosis, respectively. At the slice level, we compare CNNs with the Vision Transformer (ViT) and the Swin Transformer. We also investigate the impact of self-supervised versus (fully) supervised ImageNet pre-training, and transfer learning over training models from scratch. Additionally, at the exam level, we compare sequence model learning with our proposed transformer-based architecture, Embedding-based ViT (E-ViT). For the second and third datasets, we utilize the CAD-PE Challenge Dataset and Ferdowsi University of Mashad's PE Dataset, where we convert (strong) clot-level masks into slice-level annotations to evaluate the optimal CNN model for slice-level PE classification. Finally, we use our in-house PE-CAD dataset, which contains (strong) clot-level masks. Here, we investigate the impact of our vessel-oriented image representations and self-supervised pre-training on PE false positive reduction at the clot level across image dimensions (2D, 2.5D, and 3D). Our experiments show that (1) transfer learning boosts performance despite differences between photographic images and CTPA scans; (2) self-supervised pre-training can surpass (fully) supervised pre-training; (3) transformer-based models demonstrate comparable performance but slower convergence compared with CNNs for slice-level PE classification; (4) model trained on the RSNA PE dataset demonstrates promising performance when tested on unseen datasets for slice-level PE classification; (5) our E-ViT framework excels in handling variable numbers of slices and outperforms sequence model learning for exam-level diagnosis; and (6) vessel-oriented image representation and self-supervised pre-training both enhance performance for PE false positive reduction across image dimensions. Our optimal approach surpasses state-of-the-art results on the RSNA PE dataset, enhancing AUC by 0.62% (slice-level) and 2.22% (exam-level). On our in-house PE-CAD dataset, 3D vessel-oriented images improve performance from 80.07% to 91.35%, a remarkable 11% gain. Codes are available at GitHub.com/JLiangLab/CAD_PE.
Collapse
Affiliation(s)
- Nahid Ul Islam
- School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA
| | - Zongwei Zhou
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Shiv Gehlot
- Biomedical Informatics Program, Arizona State University, Phoenix, AZ 85054, USA
| | | | - Jianming Liang
- Biomedical Informatics Program, Arizona State University, Phoenix, AZ 85054, USA.
| |
Collapse
|
30
|
Yang Z, Zang D, Li H, Zhang Z, Zhang F, Han R. Self-supervised noise modeling and sparsity guided electron tomography volumetric image denoising. Ultramicroscopy 2024; 255:113860. [PMID: 37844382 DOI: 10.1016/j.ultramic.2023.113860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2023] [Revised: 08/07/2023] [Accepted: 09/27/2023] [Indexed: 10/18/2023]
Abstract
Cryo-Electron Tomography (cryo-ET) is a revolutionary technique for visualizing macromolecular structures in near-native states. However, the physical limitations of imaging instruments lead to cryo-ET volumetric images with a very low Signal-to-Noise Ratio (SNR) and complex noise, which adversely affects downstream analysis of the characteristics of the observed macromolecules. Additionally, existing image denoising methods are difficult to generalize well to the complex noise in cryo-ET volumes. In this work, we propose a self-supervised deep learning model for cryo-ET volumetric image denoising based on noise modeling and sparsity guidance (NMSG), achieved by learning the noise distribution in noisy cryo-ET volumes and introducing sparsity guidance to ensure smoothness. First, a Generative Adversarial Network (GAN) is utilized to learn the noise distribution in cryo-ET volumes and generate noisy volume pairs from a single volume. Then, a new loss function is devised to ensure both the recovery of ultrastructure and local smoothness. Experiments are conducted on five real cryo-ET datasets and three simulated cryo-ET datasets. The comprehensive experimental results demonstrate that our method can perform reliable denoising by training on a single noisy volume, achieving better results than state-of-the-art single-volume-based methods and results competitive with methods trained on large-scale datasets.
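As one concrete example of a sparsity-style smoothness term, the sketch below computes an anisotropic 3D total-variation penalty on a denoised volume. How exactly the paper defines and weights its sparsity guidance is not specified here, so this is only an assumed illustration of the general idea.

import torch

def tv_loss_3d(volume):
    # volume: (batch, 1, D, H, W) denoised volume; anisotropic total variation
    # penalizes large voxel-to-voxel jumps, encouraging local smoothness.
    dz = (volume[:, :, 1:, :, :] - volume[:, :, :-1, :, :]).abs().mean()
    dy = (volume[:, :, :, 1:, :] - volume[:, :, :, :-1, :]).abs().mean()
    dx = (volume[:, :, :, :, 1:] - volume[:, :, :, :, :-1]).abs().mean()
    return dz + dy + dx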
Collapse
Affiliation(s)
- Zhidong Yang
- High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; School of Medical Technology, Beijing Institute of Technology, Beijing, 100081, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Dawei Zang
- High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Hongjia Li
- High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Zhao Zhang
- Research Center for Mathematics and Interdisciplinary Sciences, Frontiers Science Center for Nonlinear Expectations (Ministry of Education), Shandong University, Qingdao 266237, China
| | - Fa Zhang
- School of Medical Technology, Beijing Institute of Technology, Beijing, 100081, China
| | - Renmin Han
- Research Center for Mathematics and Interdisciplinary Sciences, Frontiers Science Center for Nonlinear Expectations (Ministry of Education), Shandong University, Qingdao 266237, China
| |
Collapse
|
31
|
Kim B, Oh Y, Wood BJ, Summers RM, Ye JC. C-DARL: Contrastive diffusion adversarial representation learning for label-free blood vessel segmentation. Med Image Anal 2024; 91:103022. [PMID: 37976870 DOI: 10.1016/j.media.2023.103022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2023] [Revised: 10/06/2023] [Accepted: 10/31/2023] [Indexed: 11/19/2023]
Abstract
Blood vessel segmentation in medical imaging is one of the essential steps for vascular disease diagnosis and interventional planning in a broad spectrum of clinical scenarios in image-based medicine and interventional medicine. Unfortunately, manual annotation of the vessel masks is challenging and resource-intensive due to subtle branches and complex structures. To overcome this issue, this paper presents a self-supervised vessel segmentation method, dubbed the contrastive diffusion adversarial representation learning (C-DARL) model. Our model is composed of a diffusion module and a generation module that learns the distribution of multi-domain blood vessel data by generating synthetic vessel images from diffusion latents. Moreover, we employ contrastive learning through a mask-based contrastive loss so that the model can learn more realistic vessel representations. To validate the efficacy, C-DARL is trained using various vessel datasets, including coronary angiograms, abdominal digital subtraction angiograms, and retinal imaging. Experimental results confirm that our model achieves performance improvement over baseline methods with noise robustness, suggesting the effectiveness of C-DARL for vessel segmentation. Our source code is available at https://github.com/boahK/MEDIA_CDARL.2.
Collapse
Affiliation(s)
- Boah Kim
- Radiology and Imaging Sciences, National Institutes of Health Clinical Center, Bethesda, MD, USA
| | - Yujin Oh
- Kim Jaechul Graduate School of AI, Korea Advanced Institute of Science & Technology (KAIST), Daejeon, Republic of Korea
| | - Bradford J Wood
- Radiology and Imaging Sciences, National Institutes of Health Clinical Center, Bethesda, MD, USA
| | - Ronald M Summers
- Radiology and Imaging Sciences, National Institutes of Health Clinical Center, Bethesda, MD, USA.
| | - Jong Chul Ye
- Kim Jaechul Graduate School of AI, Korea Advanced Institute of Science & Technology (KAIST), Daejeon, Republic of Korea.
| |
Collapse
|
32
|
Wang C, Zhao M, Zhou C, Dong N, Khan ZA, Zhao X, Alaya Cheikh F, Beghdadi A, Chen S. Smoke veil prior regularized surgical field desmoking without paired in-vivo data. Comput Biol Med 2024; 168:107761. [PMID: 38039894 DOI: 10.1016/j.compbiomed.2023.107761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2023] [Revised: 11/16/2023] [Accepted: 11/21/2023] [Indexed: 12/03/2023]
Abstract
Though deep learning-based surgical smoke removal methods have shown significant improvements in effectiveness and efficiency, the lack of paired smoke and smoke-free images in real surgical scenarios limits the performance of these methods. Therefore, methods that can achieve good generalization performance without paired in-vivo data are in high demand. In this work, we propose a smoke veil prior regularized two-stage smoke removal framework based on the physical model of smoke image formation. More precisely, in the first stage, we leverage a reconstruction loss, a consistency loss, and a smoke veil prior-based regularization term to perform fully supervised training on a synthetic paired image dataset. Then a self-supervised training stage is deployed on the real smoke images, where only the consistency loss and the smoke veil prior-based loss are minimized. Experiments show that the proposed method outperforms the state-of-the-art ones on the synthetic dataset. The average PSNR, SSIM and RMSE values are 21.99±2.34, 0.9001±0.0252 and 0.2151±0.0643, respectively. Qualitative visual inspection on the real dataset further demonstrates the effectiveness of the proposed method.
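The physical model referred to here is commonly the atmospheric-scattering formation model I = J·t + A·(1−t), where J is the smoke-free image, t the transmission map, and A the airlight; whether the paper uses exactly this form is an assumption. A toy inversion under that assumption is sketched below, treating t and A as given, whereas in practice such quantities are effectively estimated by the networks.

import numpy as np

def recover_smoke_free(smoky, transmission, airlight, t_min=0.1):
    # smoky: HxWx3 float image in [0, 1]; transmission: HxW map in (0, 1];
    # airlight: scalar or length-3 array. Inverts I = J*t + A*(1-t) for J.
    t = np.clip(transmission, t_min, 1.0)[..., None]
    smoke_free = (smoky - airlight) / t + airlight
    return np.clip(smoke_free, 0.0, 1.0)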
Collapse
Affiliation(s)
- Congcong Wang
- Engineering Research Center of Learning-Based Intelligent System, Ministry of Education, and School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China
| | - Meng Zhao
- Engineering Research Center of Learning-Based Intelligent System, Ministry of Education, and School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China.
| | - Chengguang Zhou
- Engineering Research Center of Learning-Based Intelligent System, Ministry of Education, and School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China
| | - Nanqing Dong
- Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
| | - Zohaib Amjad Khan
- Laboratory of Signals and Systems (L2S), CentraleSupélec, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
| | - Xintong Zhao
- Innovation Institute, Huafeng Meteorological Media Group Co., Ltd, Beijing 100081, China
| | - Faouzi Alaya Cheikh
- Intelligent Systems and Analytics Research Group, Norwegian University of Science and Technology, 2815 Gjøvik, Norway
| | - Azeddine Beghdadi
- Laboratory of Information Processing and Transmission, Institut Galilée, University Sorbonne Paris Nord, 93430 Villetaneuse, France
| | - Shengyong Chen
- Engineering Research Center of Learning-Based Intelligent System, Ministry of Education, and School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China
| |
Collapse
|
33
|
Zhang J, Barbarisi S, Kadkhodamohammadi A, Stoyanov D, Luengo I. Self-knowledge distillation for surgical phase recognition. Int J Comput Assist Radiol Surg 2024; 19:61-68. [PMID: 37340283 DOI: 10.1007/s11548-023-02970-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2023] [Accepted: 05/19/2023] [Indexed: 06/22/2023]
Abstract
PURPOSE Advances in surgical phase recognition are generally led by training deeper networks. Rather than going further with a more complex solution, we believe that current models can be exploited better. We propose a self-knowledge distillation framework that can be integrated into current state-of-the-art (SOTA) models without requiring any extra complexity to the models or annotations. METHODS Knowledge distillation is a framework for network regularization where knowledge is distilled from a teacher network to a student network. In self-knowledge distillation, the student model becomes the teacher such that the network learns from itself. Most phase recognition models follow an encoder-decoder framework. Our framework utilizes self-knowledge distillation in both stages. The teacher model guides the training process of the student model to extract enhanced feature representations from the encoder and build a more robust temporal decoder to tackle the over-segmentation problem. RESULTS We validate our proposed framework on the public dataset Cholec80. Our framework is embedded on top of four popular SOTA approaches and consistently improves their performance. Specifically, our best GRU model boosts performance by [Formula: see text] accuracy and [Formula: see text] F1-score over the same baseline model. CONCLUSION We embed a self-knowledge distillation framework for the first time in the surgical phase recognition training pipeline. Experimental results demonstrate that our simple yet powerful framework can improve performance of existing phase recognition models. Moreover, our extensive experiments show that even with 75% of the training set we still achieve performance on par with the same baseline model trained on the full set.
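A generic self-distillation loss of the kind described, where the teacher signal comes from the model itself (for example an earlier snapshot or an exponential-moving-average copy), might look like the following. The temperature, the weighting alpha, and the way the teacher is obtained are assumptions rather than the paper's recipe.

import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_logits, labels,
                           temperature=2.0, alpha=0.3):
    # Standard cross-entropy on the ground-truth phase labels.
    ce = F.cross_entropy(student_logits, labels)
    # Temperature-scaled KL term pulling the student toward its own teacher copy.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return (1 - alpha) * ce + alpha * kd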
Collapse
Affiliation(s)
- Jinglu Zhang
- Medtronic Digital Surgery, 230 City Road, London, UK
| | | | | | - Danail Stoyanov
- Medtronic Digital Surgery, 230 City Road, London, UK
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
| | - Imanol Luengo
- Medtronic Digital Surgery, 230 City Road, London, UK
| |
Collapse
|
34
|
Xie Y, Zhang J, Liu L, Wang H, Ye Y, Verjans J, Xia Y. ReFs: A hybrid pre-training paradigm for 3D medical image segmentation. Med Image Anal 2024; 91:103023. [PMID: 37956551 DOI: 10.1016/j.media.2023.103023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2022] [Revised: 10/31/2023] [Accepted: 11/02/2023] [Indexed: 11/15/2023]
Abstract
Self-supervised learning (SSL) has achieved remarkable progress in medical image segmentation. The application of an SSL algorithm often follows a two-stage training process: using unlabeled data to perform label-free representation learning and fine-tuning the pre-trained model on the downstream tasks. One issue of this paradigm is that the SSL step is unaware of the downstream task, which may lead to sub-optimal feature representation for a target task. In this paper, we propose a hybrid pre-training paradigm that is driven by both self-supervised and supervised objectives. To achieve this, a supervised reference task is involved in self-supervised learning, aiming to improve the representation quality. Specifically, we employ the off-the-shelf medical image segmentation task as reference, and encourage learning a representation that (1) incurs low prediction loss on both SSL and reference tasks and (2) leads to a similar gradient when updating the feature extractor from either task. In this way, the reference task pilots SSL in the direction beneficial for the downstream segmentation. To this end, we propose a simple but effective gradient matching method to optimize the model towards a consistent direction, thus improving the compatibility of both SSL and supervised reference tasks. We call this hybrid pre-training paradigm reference-guided self-supervised learning (ReFs), and perform it on a large-scale unlabeled dataset and an additional reference dataset. The experimental results demonstrate its effectiveness on seven downstream medical image segmentation benchmarks.
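One way to quantify whether two objectives update a shared encoder in a similar direction is to compare their gradients directly. The sketch below computes a gradient cosine similarity that could be encouraged alongside the two losses; it is a simplified illustration of the idea, not the paper's exact gradient-matching procedure.

import torch
import torch.nn.functional as F

def gradient_cosine(loss_ssl, loss_ref, shared_params):
    # Cosine similarity between the gradients of the two losses w.r.t. the
    # shared encoder parameters; create_graph=True lets this term be optimized.
    g_ssl = torch.autograd.grad(loss_ssl, shared_params, retain_graph=True,
                                create_graph=True, allow_unused=True)
    g_ref = torch.autograd.grad(loss_ref, shared_params, retain_graph=True,
                                create_graph=True, allow_unused=True)
    pairs = [(a, b) for a, b in zip(g_ssl, g_ref)
             if a is not None and b is not None]
    va = torch.cat([a.flatten() for a, _ in pairs])
    vb = torch.cat([b.flatten() for _, b in pairs])
    return F.cosine_similarity(va, vb, dim=0)

# Schematically, the total objective could then be:
#   total = loss_ssl + loss_ref + lam * (1 - gradient_cosine(loss_ssl, loss_ref, encoder_params))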
Collapse
Affiliation(s)
| | - Jianpeng Zhang
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
| | | | - Hu Wang
- University of Adelaide, Australia
| | - Yiwen Ye
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
| | | | - Yong Xia
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China.
| |
Collapse
|
35
|
Tan Z, Yu Y, Meng J, Liu S, Li W. Self-supervised learning with self-distillation on COVID-19 medical image classification. Comput Methods Programs Biomed 2024; 243:107876. [PMID: 37875036 DOI: 10.1016/j.cmpb.2023.107876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/08/2023] [Revised: 10/11/2023] [Accepted: 10/17/2023] [Indexed: 10/26/2023]
Abstract
BACKGROUND AND OBJECTIVE Currently, COVID-19 is a highly infectious disease that can be clinically diagnosed based on diagnostic radiology. Deep learning is capable of mining the rich information implied in inpatient imaging data and accomplishing the classification of different stages of the disease process. However, a large amount of training data is essential to train an excellent deep-learning model. Unfortunately, due to factors such as privacy and labeling difficulties, annotated data for COVID-19 are extremely scarce, which encourages us to propose a more effective deep learning model that can effectively assist specialist physicians in COVID-19 diagnosis. METHODS In this study, we introduce the Masked Autoencoder (MAE) for pre-training and fine-tuning directly on small-scale target datasets. Based on this, we propose Self-Supervised Learning with Self-Distillation on COVID-19 medical image classification (SSSD-COVID). In addition to the reconstruction loss computed on the masked image patches, SSSD-COVID performs self-distillation loss calculations on the latent representations of the encoder and decoder outputs. This additional loss transfers knowledge from the global attention of the decoder to the encoder, which acquires only local attention. RESULTS Our model achieves 97.78% recognition accuracy on the SARS-COV-CT dataset containing 2481 images and is further validated on the COVID-CT dataset containing 746 images, on which it achieves 81.76% recognition accuracy. Further introduction of external knowledge resulted in experimental accuracies of 99.6% and 95.27% on these two datasets, respectively. CONCLUSIONS SSSD-COVID can obtain good results on the target dataset alone, and when external information is introduced, the performance of the model can be further improved to significantly outperform other models. Overall, the experimental results show that our method can effectively mine COVID-19 features from scarce data and can assist professional physicians in decision-making to improve the efficiency of COVID-19 disease detection.
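Schematically, combining an MAE reconstruction objective with such a self-distillation term could look like the snippet below, where the encoder latent is pulled toward the (detached) decoder latent acting as teacher. The cosine form, the weighting beta, and the latent shapes are assumptions, not the paper's exact loss.

import torch.nn.functional as F

def sssd_style_loss(pred_patches, target_patches, mask,
                    enc_latent, dec_latent, beta=0.1):
    # pred_patches/target_patches: (batch, num_patches, patch_dim); mask: (batch, num_patches).
    # Reconstruction loss on masked patches only (standard MAE objective).
    rec = ((pred_patches - target_patches) ** 2).mean(dim=-1)
    rec = (rec * mask).sum() / mask.sum().clamp(min=1)
    # Self-distillation: encoder latent approximates the detached decoder latent.
    distill = 1.0 - F.cosine_similarity(
        enc_latent.flatten(1), dec_latent.detach().flatten(1), dim=1).mean()
    return rec + beta * distill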
Collapse
Affiliation(s)
- Zhiyong Tan
- School of Computer Science and Engineering, Dalian Minzu University, Dalian, Liaoning 116600, China
| | - Yuhai Yu
- School of Computer Science and Engineering, Dalian Minzu University, Dalian, Liaoning 116600, China
| | - Jiana Meng
- School of Computer Science and Engineering, Dalian Minzu University, Dalian, Liaoning 116600, China.
| | - Shuang Liu
- School of Computer Science and Engineering, Dalian Minzu University, Dalian, Liaoning 116600, China
| | - Wei Li
- School of Computer Science and Engineering, Dalian Minzu University, Dalian, Liaoning 116600, China
| |
Collapse
|
36
|
Fedorov A, Geenjaar E, Wu L, Sylvain T, DeRamus TP, Luck M, Misiura M, Mittapalle G, Hjelm RD, Plis SM, Calhoun VD. Self-supervised multimodal learning for group inferences from MRI data: Discovering disorder-relevant brain regions and multimodal links. Neuroimage 2024; 285:120485. [PMID: 38110045 PMCID: PMC10872501 DOI: 10.1016/j.neuroimage.2023.120485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Revised: 11/15/2023] [Accepted: 12/04/2023] [Indexed: 12/20/2023] Open
Abstract
In recent years, deep learning approaches have gained significant attention in predicting brain disorders using neuroimaging data. However, conventional methods often rely on single-modality data and supervised models, which provide only a limited perspective of the intricacies of the highly complex brain. Moreover, the scarcity of accurate diagnostic labels in clinical settings hinders the applicability of the supervised models. To address these limitations, we propose a novel self-supervised framework for extracting multiple representations from multimodal neuroimaging data to enhance group inferences and enable analysis without resorting to labeled data during pre-training. Our approach leverages Deep InfoMax (DIM), a self-supervised methodology renowned for its efficacy in learning representations by estimating mutual information without the need for explicit labels. While DIM has shown promise in predicting brain disorders from single-modality MRI data, its potential for multimodal data remains untapped. This work extends DIM to multimodal neuroimaging data, allowing us to identify disorder-relevant brain regions and explore multimodal links. We present compelling evidence of the efficacy of our multimodal DIM analysis in uncovering disorder-relevant brain regions, including the hippocampus, caudate, and insula, and multimodal links with the thalamus, precuneus, and subthalamus hypothalamus. Our self-supervised representations demonstrate promising capabilities in predicting the presence of brain disorders across a spectrum of Alzheimer's phenotypes. Comparative evaluations against state-of-the-art unsupervised methods based on autoencoders, canonical correlation analysis, and supervised models highlight the superiority of our proposed method in achieving improved classification performance, capturing joint information, and interpretability capabilities. The computational efficiency of the decoder-free strategy enhances its practical utility, as it saves compute resources without compromising performance. This work offers a significant step forward in addressing the challenge of understanding multimodal links in complex brain disorders, with potential applications in neuroimaging research and clinical diagnosis.
Collapse
Affiliation(s)
- Alex Fedorov
- Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State, Georgia Tech, Emory, Atlanta, GA, USA.
| | - Eloy Geenjaar
- Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State, Georgia Tech, Emory, Atlanta, GA, USA
| | - Lei Wu
- Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State, Georgia Tech, Emory, Atlanta, GA, USA
| | | | - Thomas P DeRamus
- Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State, Georgia Tech, Emory, Atlanta, GA, USA
| | - Margaux Luck
- Mila - Quebec AI Institute, Montréal, QC, Canada
| | - Maria Misiura
- Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State, Georgia Tech, Emory, Atlanta, GA, USA
| | - Girish Mittapalle
- Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State, Georgia Tech, Emory, Atlanta, GA, USA
| | - R Devon Hjelm
- Mila - Quebec AI Institute, Montréal, QC, Canada; Apple Machine Learning Research, Seattle, WA, USA
| | - Sergey M Plis
- Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State, Georgia Tech, Emory, Atlanta, GA, USA
| | - Vince D Calhoun
- Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State, Georgia Tech, Emory, Atlanta, GA, USA
| |
Collapse
|
37
|
Ma S, Chen J, Ho JWK. An edge-device-compatible algorithm for valvular heart diseases screening using phonocardiogram signals with a lightweight convolutional neural network and self-supervised learning. Comput Methods Programs Biomed 2024; 243:107906. [PMID: 37950925 DOI: 10.1016/j.cmpb.2023.107906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/22/2022] [Revised: 02/24/2023] [Accepted: 10/27/2023] [Indexed: 11/13/2023]
Abstract
BACKGROUND AND OBJECTIVES Detection and classification of heart murmur using mobile-phone-collected sound is an emerging approach to scaling up screening of valvular heart disease at a population level. Nonetheless, the widespread adoption of artificial intelligence (AI) methods for this type of mobile health (mHealth) application requires highly accurate and lightweight AI models that can be deployed in consumer-grade mobile devices. This study presents a lightweight deep learning model and a self-supervised learning (SSL) method to utilise unlabelled data to improve the accuracy of valvular heart disease classification using phonocardiogram data. METHODS This study proposes a lightweight convolutional neural network (CNN) with ten times fewer parameters than other deep learning models to classify phonocardiogram data. SSL is applied to harness a large collection of unlabelled data for pre-training to enhance the accuracy and robustness of the model and reduce the number of epochs required to converge. A mobile application prototype that encapsulates the model is developed to perform in-device inference and fine-tuning. RESULTS The proposed lightweight model achieves an average accuracy of 98.65% in 10-fold cross-validation. When coupled with SSL using unlabelled data, the pre-trained model can reach an average accuracy higher than 99.4% in 10-fold cross-validation. Furthermore, SSL-trained models show a 4-20% improvement in classification accuracy over non-SSL-trained models when tested with perturbed or noisy data, suggesting that SSL improves the robustness of the model. When deployed on common smartphones, in-device fine-tuning and inference of the model can be completed within 0.03-0.37 s, which is considerably faster than the 0.22-5.7 s required by a standard CNN model that has ten times the number of parameters. Our lightweight model also consumes only a third of the power compared to the larger standard model. CONCLUSION This work presents a lightweight and accurate phonocardiogram classifier that supports near real-time performance on standard mobile devices.
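The abstract does not specify the architecture, but one standard way to cut CNN parameter counts by roughly an order of magnitude for 1D signals such as phonocardiograms is the depthwise-separable convolution. The block below is a generic illustration of that design choice, not the paper's network.

import torch.nn as nn

class DSConv1d(nn.Module):
    # Depthwise-separable 1D convolution: a per-channel (depthwise) convolution
    # followed by a 1x1 (pointwise) mixing convolution, which needs far fewer
    # parameters than a standard Conv1d with the same receptive field.
    def __init__(self, in_ch, out_ch, kernel_size=9):
        super().__init__()
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv1d(in_ch, out_ch, 1)
        self.bn = nn.BatchNorm1d(out_ch)
        self.act = nn.ReLU()

    def forward(self, x):                 # x: (batch, channels, time)
        return self.act(self.bn(self.pointwise(self.depthwise(x))))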
Collapse
Affiliation(s)
- Shichao Ma
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Laboratory of Data Discovery for Health Limited (D24H), Hong Kong Science Park, Hong Kong SAR, China
| | - Junyi Chen
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Laboratory of Data Discovery for Health Limited (D24H), Hong Kong Science Park, Hong Kong SAR, China
| | - Joshua W K Ho
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Laboratory of Data Discovery for Health Limited (D24H), Hong Kong Science Park, Hong Kong SAR, China.
| |
Collapse
|
38
|
Cui W, Akrami H, Zhao G, Joshi AA, Leahy RM. Meta Transfer of Self-Supervised Knowledge: Foundation Model in Action for Post-Traumatic Epilepsy Prediction. ArXiv 2023:arXiv:2312.14204v1. [PMID: 38196751] [PMCID: PMC10775348]
Abstract
Despite the impressive advancements achieved using deep-learning for functional brain activity analysis, the heterogeneity of functional patterns and scarcity of imaging data still pose challenges in tasks such as prediction of future onset of Post-Traumatic Epilepsy (PTE) from data acquired shortly after traumatic brain injury (TBI). Foundation models pre-trained on separate large-scale datasets can improve the performance from scarce and heterogeneous datasets. For functional Magnetic Resonance Imaging (fMRI), while data may be abundantly available from healthy controls, clinical data is often scarce, limiting the ability of foundation models to identify clinically-relevant features. We overcome this limitation by introducing a novel training strategy for our foundation model by integrating meta-learning with self-supervised learning to improve the generalization from normal to clinical features. In this way we enable generalization to other downstream clinical tasks, in our case prediction of PTE. To achieve this, we perform self-supervised training on the control dataset to focus on inherent features that are not limited to a particular supervised task while applying meta-learning, which strongly improves the model's generalizability using bi-level optimization. Through experiments on neurological disorder classification tasks, we demonstrate that the proposed strategy significantly improves task performance on small-scale clinical datasets. To explore the generalizability of the foundation model in downstream applications, we then apply the model to an unseen TBI dataset for prediction of PTE using zero-shot learning. Results further demonstrated the enhanced generalizability of our foundation model.
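As a hedged illustration of combining meta-learning with a self-supervised objective in a bi-level loop, the following first-order (Reptile-style) sketch adapts a copy of the encoder on one task's SSL loss and then moves the shared initialisation toward the adapted weights; the encoder, loss, task sampler and learning rates are placeholders, not the paper's implementation.
```python
# Schematic first-order (Reptile-style) bi-level loop combining meta-learning with a
# self-supervised objective. Encoder, loss and task sampler are placeholders; this is
# not the paper's implementation.
import copy
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 64))

def ssl_loss(model, x):
    """Toy self-supervised objective: match the representation of a masked copy of the input."""
    masked = x * (torch.rand_like(x) > 0.5)
    return ((model(masked) - model(x).detach()) ** 2).mean()

def sample_task(batch_size=32, dim=256):
    return torch.randn(batch_size, dim)            # stands in for one subject/site

meta_lr, inner_lr, inner_steps = 0.1, 0.01, 5
for meta_step in range(100):
    task = sample_task()
    fast = copy.deepcopy(encoder)                   # inner loop adapts a copy
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        opt.zero_grad()
        ssl_loss(fast, task).backward()
        opt.step()
    with torch.no_grad():                           # outer update: move the shared init
        for p, q in zip(encoder.parameters(), fast.parameters()):
            p += meta_lr * (q - p)
```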
Affiliation(s)
- Wenhui Cui: Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles 90089, United States
- Haleh Akrami: Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles 90089, United States
- Ganning Zhao: Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles 90089, United States
- Anand A. Joshi: Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles 90089, United States
- Richard M. Leahy: Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles 90089, United States
39
Fang Y, Wei Y, Liu X, Qin L, Gao Y, Yu Z, Xu X, Cha G, Zhu X, Wang X, Xu L, Cao L, Chen X, Jiang H, Zhang C, Zhou Y, Zhu J. A self-supervised classification model for endometrial diseases. J Cancer Res Clin Oncol 2023; 149:17855-17863. [PMID: 37947870] [PMCID: PMC10725391] [DOI: 10.1007/s00432-023-05467-7]
Abstract
PURPOSE Ultrasound imaging is the preferred method for the early diagnosis of endometrial diseases because of its non-invasive nature, low cost, and real-time imaging features. However, the accurate evaluation of ultrasound images relies heavily on the experience of radiologists. Therefore, a stable and objective computer-aided diagnostic model is crucial to assist radiologists in diagnosing endometrial lesions. METHODS Transvaginal ultrasound images were collected from multiple hospitals in Quzhou city, Zhejiang province. The dataset comprised 1875 images from 734 patients, including cases of endometrial polyps, hyperplasia, and cancer. Here, we propose a self-supervision-based endometrial disease classification model (BSEM) that learns a jointly unified task (raw and self-supervised tasks) and applies self-distillation techniques and ensemble strategies to aid doctors in diagnosing endometrial diseases. RESULTS The performance of BSEM was evaluated using fivefold cross-validation. The experimental results indicated that the BSEM model achieved satisfactory performance across indicators, with scores of 75.1%, 87.3%, 76.5%, 73.4%, and 74.1% for accuracy, area under the curve, precision, recall, and F1 score, respectively. Furthermore, compared to the baseline models ResNet, DenseNet, VGGNet, ConvNeXt, ViT, and CMT, the BSEM model improved accuracy, area under the curve, precision, recall, and F1 score by 3.3-7.9%, 3.2-7.3%, 3.9-8.5%, 3.1-8.5%, and 3.3-9.0%, respectively. CONCLUSION The BSEM model is an auxiliary diagnostic tool for the early detection of endometrial diseases revealed by ultrasound and helps radiologists screen accurately and efficiently for precancerous endometrial lesions.
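A minimal sketch of the joint "raw plus self-supervised" training idea with self-distillation might look like the following, where a shared backbone feeds a disease-classification head and a rotation-prediction head while an exponential-moving-average teacher supplies soft targets; the heads, loss weighting and EMA rate are assumptions, not BSEM's actual design.
```python
# Rough sketch of a joint raw + self-supervised objective with self-distillation.
# Heads, loss weights and the EMA rate are illustrative assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
cls_head, rot_head = nn.Linear(256, 3), nn.Linear(256, 4)      # 3 disease classes, 4 rotations
teacher = copy.deepcopy(nn.Sequential(backbone, cls_head))     # frozen EMA teacher
for p in teacher.parameters():
    p.requires_grad_(False)

def training_step(images, labels, optimizer, ema=0.99):
    k = torch.randint(0, 4, (1,)).item()
    rotated = torch.rot90(images, k, dims=(2, 3))               # self-supervised pretext input
    feats, feats_rot = backbone(images), backbone(rotated)
    loss = (F.cross_entropy(cls_head(feats), labels)                               # raw task
            + F.cross_entropy(rot_head(feats_rot), torch.full_like(labels, k))     # SSL task
            + F.kl_div(F.log_softmax(cls_head(feats), dim=1),
                       F.softmax(teacher(images), dim=1), reduction="batchmean"))  # distillation
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    with torch.no_grad():                                       # EMA update of the teacher
        student = nn.Sequential(backbone, cls_head)
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(ema).add_((1 - ema) * s)
    return loss.item()

params = list(backbone.parameters()) + list(cls_head.parameters()) + list(rot_head.parameters())
opt = torch.optim.SGD(params, lr=1e-3)
training_step(torch.randn(8, 3, 64, 64), torch.randint(0, 3, (8,)), opt)
```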
Affiliation(s)
- Yun Fang: Quzhou People's Hospital, The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou, 324000, Zhejiang, China
- Yanmin Wei: Tianjin Normal University, Tianjin, 300387, China
- Xiaoying Liu: Quzhou People's Hospital, The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou, 324000, Zhejiang, China
- Liufeng Qin: Quzhou People's Hospital, The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou, 324000, Zhejiang, China
- Yunxia Gao: The Second People's Hospital of Quzhou, Quzhou, 324000, Zhejiang, China
- Zhengjun Yu: Kaihua County People's Hospital, Quzhou, 324300, Zhejiang, China
- Xia Xu: Changshan County People's Hospital, Quzhou, 324200, Zhejiang, China
- Guofen Cha: People's Hospital of Quzhou Kecheng, Quzhou, 324000, Zhejiang, China
- Xuehua Zhu: Quzhou Maternal and Child Health Care Hospital, Quzhou, 324000, Zhejiang, China
- Xue Wang: Quzhou People's Hospital, The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou, 324000, Zhejiang, China
- Lijuan Xu: Quzhou People's Hospital, The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou, 324000, Zhejiang, China
- Lulu Cao: Quzhou People's Hospital, The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou, 324000, Zhejiang, China
- Xiangrui Chen: Changshan County People's Hospital, Quzhou, 324200, Zhejiang, China
- Haixia Jiang: Kaihua County People's Hospital, Quzhou, 324300, Zhejiang, China
- Chaozhen Zhang: People's Hospital of Quzhou Kecheng, Quzhou, 324000, Zhejiang, China
- Yuwang Zhou: Quzhou People's Hospital, The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou, 324000, Zhejiang, China
- Jinqi Zhu: Tianjin Normal University, Tianjin, 300387, China
40
Yan Y, Yang T, Zhao X, Jiao C, Yang A, Miao J. DC-SiamNet: Deep contrastive Siamese network for self-supervised MRI reconstruction. Comput Biol Med 2023; 167:107619. [PMID: 37925909] [DOI: 10.1016/j.compbiomed.2023.107619]
Abstract
Reconstruction methods based on deep learning have greatly shortened the data acquisition time of magnetic resonance imaging (MRI). However, these methods typically utilize massive fully sampled data for supervised training, restricting their application in certain clinical scenarios and posing challenges to the reconstruction effect when high-quality MR images are unavailable. Recently, self-supervised methods have been developed in which only undersampled MRI images participate in network training. Nevertheless, due to the lack of complete referable MR image data, self-supervised reconstruction is prone to producing incorrect structural content, such as unnatural texture details and over-smoothed tissue regions. To solve this problem, we propose a self-supervised Deep Contrastive Siamese Network (DC-SiamNet) for fast MR imaging. First, DC-SiamNet performs the reconstruction with a Siamese unrolled structure and obtains visual representations in different iterative phases. Particularly, an attention-weighted average pooling module is employed at the bottleneck layer of the U-shape regularization unit, which can effectively aggregate valuable local information of the underlying feature map in the generated representation vector. Then, a novel hybrid loss function is designed to drive the self-supervised reconstruction and contrastive learning simultaneously by forcing the output consistency across different branches in the frequency domain, the image domain, and the latent space. The proposed method is extensively evaluated with different sampling patterns on the IXI brain dataset and the MRINet knee dataset. Experimental results show that DC-SiamNet can achieve 0.93 in structural similarity and 33.984 dB in peak signal-to-noise ratio on the IXI brain dataset under 8x acceleration. It has better reconstruction accuracy than other methods, and its performance is close to that of the corresponding model trained with full supervision, especially when the sampling rate is low. In addition, generalization experiments verify that our method has a strong cross-domain reconstruction ability for different contrast brain images.
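The hybrid loss can be pictured as enforcing agreement between the two Siamese branches in k-space, in image space and in a latent space; the toy function below illustrates that structure with made-up shapes and weights and is not the DC-SiamNet code.
```python
# Toy hybrid self-supervised loss: consistency in k-space, image space, and a latent space.
# Shapes and weights are made up; this is not the DC-SiamNet implementation.
import torch
import torch.nn.functional as F

def hybrid_loss(img_a, img_b, z_a, z_b, w=(1.0, 1.0, 0.1)):
    # img_a, img_b: reconstructions from the two Siamese branches, (B, H, W)
    # z_a, z_b:     latent representation vectors from the two branches, (B, D)
    k_a, k_b = torch.fft.fft2(img_a), torch.fft.fft2(img_b)
    loss_freq = (k_a - k_b).abs().mean()                              # frequency-domain consistency
    loss_img = F.l1_loss(img_a, img_b)                                # image-domain consistency
    loss_latent = 1 - F.cosine_similarity(z_a, z_b, dim=1).mean()     # latent-space agreement
    return w[0] * loss_freq + w[1] * loss_img + w[2] * loss_latent

loss = hybrid_loss(torch.rand(2, 128, 128), torch.rand(2, 128, 128),
                   torch.randn(2, 64), torch.randn(2, 64))
```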
Affiliation(s)
- Yanghui Yan: School of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, China
- Tiejun Yang: School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou, 450001, China; Key Laboratory of Grain Information Processing and Control (HAUT), Ministry of Education, Zhengzhou, China; Henan Key Laboratory of Grain Photoelectric Detection and Control (HAUT), Zhengzhou, Henan, China
- Xiang Zhao: School of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, China
- Chunxia Jiao: School of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, China
- Aolin Yang: School of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, China
- Jianyu Miao: School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou, 450001, China
41
Naseri A, Tax D, van der Harst P, Reinders M, van der Bilt I. Data-efficient machine learning methods in the ME-TIME study: Rationale and design of a longitudinal study to detect atrial fibrillation and heart failure from wearables. Cardiovasc Digit Health J 2023; 4:165-172. [PMID: 38222103] [PMCID: PMC10787149] [DOI: 10.1016/j.cvdhj.2023.09.001]
Abstract
Background Smartwatches enable continuous and noninvasive time series monitoring of cardiovascular biomarkers such as heart rate (from photoplethysmograms), step count, and skin temperature; as such, they show promise for assisting in the early detection and prevention of cardiovascular disease. Although these biomarkers may not be directly useful to physicians, a machine learning (ML) model could find clinically relevant patterns. Unfortunately, ML models typically need supervised (ie, annotated) data, and labeling of large amounts of continuous data is very labor intensive. Therefore, ML methods that are data efficient, ie, needing a low number of labels, are required to detect potential clinical value in patterns found in wearable data. Objective The primary study objective of the ME-TIME (Machine Learning Enabled Time Series Analysis in Medicine) study is to design an ML model that can detect atrial fibrillation (AF) and heart failure (HF) from wearable data in a data-efficient manner. To achieve this, self-supervised and weakly supervised learning techniques are used. Methods Two hundred subjects (100 reference, 50 AF, and 50 HF) are being invited to wear a Fitbit fitness tracker for 3 months. Interested volunteers are sent a questionnaire to assess their health, in particular their cardiovascular health. Volunteers without any (history of) serious illness are assigned to the reference group. Participants with AF and HF are recruited in the Haga teaching hospital in The Hague, The Netherlands. Results Enrollment commenced on May 1, 2022, and as of the time of this report, 62 subjects have been included in the study. Preliminary analysis of the data reveals significant inter-subject variability. Notably, we identified heart rate recovery curves and time-delayed correlations between heart rate and step count as potential strong indicators for heart disease. Conclusion Using self-supervised and multiple-instance learning techniques, we hypothesize that patterns specific to AF and HF can be found in continuous data obtained from smartwatches.
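One way weak, subject-level labels could be exploited is attention-based multiple-instance learning over a "bag" of daily wearable feature vectors, sketched below; the feature dimension and pooling choice are assumptions rather than the ME-TIME analysis pipeline.
```python
# Small sketch of attention-based multiple-instance learning over daily wearable features.
# Dimensions and the pooling form are illustrative assumptions.
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, in_dim=16, hidden=32):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.classifier = nn.Linear(in_dim, 1)

    def forward(self, bag):                               # bag: (days, in_dim) for one subject
        weights = torch.softmax(self.attn(bag), dim=0)    # which days matter most
        pooled = (weights * bag).sum(dim=0)               # subject-level embedding
        return self.classifier(pooled), weights

model = AttentionMIL()
logit, day_weights = model(torch.randn(90, 16))           # roughly 3 months of daily features
```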
Affiliation(s)
- Arman Naseri: Department of Cardiology, Haga Teaching Hospital, The Hague, The Netherlands; Pattern Recognition and Bioinformatics, Delft University of Technology, Delft, The Netherlands
- David Tax: Pattern Recognition and Bioinformatics, Delft University of Technology, Delft, The Netherlands
- Pim van der Harst: Department of Cardiology, University Medical Center Utrecht, Utrecht, The Netherlands
- Marcel Reinders: Pattern Recognition and Bioinformatics, Delft University of Technology, Delft, The Netherlands
- Ivo van der Bilt: Department of Cardiology, Haga Teaching Hospital, The Hague, The Netherlands; Department of Cardiology, University Medical Center Utrecht, Utrecht, The Netherlands
42
Tian Y, Liu F, Pang G, Chen Y, Liu Y, Verjans JW, Singh R, Carneiro G. Self-supervised pseudo multi-class pre-training for unsupervised anomaly detection and segmentation in medical images. Med Image Anal 2023; 90:102930. [PMID: 37657364] [DOI: 10.1016/j.media.2023.102930]
Abstract
Unsupervised anomaly detection (UAD) methods are trained with normal (or healthy) images only, but during testing, they are able to classify normal and abnormal (or disease) images. UAD is an important medical image analysis (MIA) method to be applied in disease screening problems because the training sets available for those problems usually contain only normal images. However, the exclusive reliance on normal images may result in the learning of ineffective low-dimensional image representations that are not sensitive enough to detect and segment unseen abnormal lesions of varying size, appearance, and shape. Pre-training UAD methods with self-supervised learning, based on computer vision techniques, can mitigate this challenge, but they are sub-optimal because they do not explore domain knowledge for designing the pretext tasks, and their contrastive learning losses do not try to cluster the normal training images, which may result in a sparse distribution of normal images that is ineffective for anomaly detection. In this paper, we propose a new self-supervised pre-training method for MIA UAD applications, named Pseudo Multi-class Strong Augmentation via Contrastive Learning (PMSACL). PMSACL consists of a novel optimisation method that contrasts a normal image class from multiple pseudo classes of synthesised abnormal images, with each class enforced to form a dense cluster in the feature space. In the experiments, we show that our PMSACL pre-training improves the accuracy of SOTA UAD methods on many MIA benchmarks using colonoscopy, fundus screening and Covid-19 Chest X-ray datasets.
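A much-simplified sketch of the pseudo multi-class idea is shown below: several classes of abnormal-looking images are synthesised from normal ones via strong augmentations, and the encoder is trained to separate normal images from each pseudo class; plain cross-entropy is used here as a stand-in for the paper's contrastive objective, and the augmentations and network are illustrative.
```python
# Simplified pseudo multi-class pre-training: synthesise "abnormal" classes from normal
# images via strong augmentations and classify them. Cross-entropy stands in for the
# paper's contrastive loss; augmentations and network are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_pseudo_classes(x):
    """Return (images, labels): 0 = normal, 1 = heavy noise, 2 = intensity-inverted."""
    noisy = (x + 0.3 * torch.randn_like(x)).clamp(0, 1)
    inverted = 1.0 - x
    images = torch.cat([x, noisy, inverted])
    labels = torch.cat([torch.full((len(x),), c) for c in (0, 1, 2)])
    return images, labels.long()

encoder = nn.Sequential(nn.Conv2d(1, 8, 3, 2, 1), nn.ReLU(),
                        nn.Conv2d(8, 16, 3, 2, 1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 3))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

normal_batch = torch.rand(8, 1, 64, 64)            # training uses normal images only
imgs, labels = make_pseudo_classes(normal_batch)
loss = F.cross_entropy(encoder(imgs), labels)
opt.zero_grad(); loss.backward(); opt.step()
```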
Affiliation(s)
- Yu Tian: Harvard Ophthalmology AI Lab, Harvard Medical School, United States of America
- Fengbei Liu: Australian Institute for Machine Learning, University of Adelaide, Australia
- Guansong Pang: School of Computing and Information Systems, Singapore Management University, Singapore
- Yuanhong Chen: Australian Institute for Machine Learning, University of Adelaide, Australia
- Yuyuan Liu: Australian Institute for Machine Learning, University of Adelaide, Australia
- Johan W Verjans: Australian Institute for Machine Learning, University of Adelaide, Australia; South Australian Health and Medical Research Institute, Australia; Faculty of Health and Medical Sciences, University of Adelaide, Australia
- Rajvinder Singh: Faculty of Health and Medical Sciences, University of Adelaide, Australia
- Gustavo Carneiro: Centre for Vision, Speech and Signal Processing, University of Surrey, United Kingdom
43
Qiao X, Ge C, Zhao C, Tosi F, Poggi M, Mattoccia S. Self-supervised depth super-resolution with contrastive multiview pre-training. Neural Netw 2023; 168:223-236. [PMID: 37769459] [DOI: 10.1016/j.neunet.2023.09.023]
Abstract
Many low-level vision tasks, including guided depth super-resolution (GDSR), struggle with the issue of insufficient paired training data. Self-supervised learning is a promising solution, but it remains challenging to upsample depth maps without the explicit supervision of high-resolution target images. To alleviate this problem, we propose a self-supervised depth super-resolution method with contrastive multiview pre-training. Unlike existing contrastive learning methods for classification or segmentation tasks, our strategy can be applied to regression tasks even when trained on a small-scale dataset and can reduce information redundancy by extracting unique features from the guide. Furthermore, we propose a novel mutual modulation scheme that can effectively compute the local spatial correlation between cross-modal features. Exhaustive experiments demonstrate that our method attains superior performance with respect to state-of-the-art GDSR methods and exhibits good generalization to other modalities.
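The cross-modal interaction can be loosely illustrated as gating the guidance injected into the depth branch by a per-pixel similarity between guide and depth features; the toy PyTorch snippet below makes assumptions about the feature extractors and gating form and does not reproduce the paper's mutual modulation scheme.
```python
# Toy cross-modal gating: a per-pixel similarity between guide (RGB) and depth features
# controls how much guidance is injected into the depth branch. Feature extractors and
# the gating form are assumptions, not the paper's mutual modulation scheme.
import torch
import torch.nn as nn
import torch.nn.functional as F

guide_enc = nn.Conv2d(3, 16, 3, padding=1)      # stand-in RGB feature extractor
depth_enc = nn.Conv2d(1, 16, 3, padding=1)      # stand-in depth feature extractor

rgb = torch.rand(1, 3, 64, 64)
depth_lr = F.interpolate(torch.rand(1, 1, 16, 16), size=(64, 64),
                         mode="bilinear", align_corners=False)   # upsampled low-res depth

f_g, f_d = guide_enc(rgb), depth_enc(depth_lr)
similarity = F.cosine_similarity(f_g, f_d, dim=1, eps=1e-6).unsqueeze(1)  # (1, 1, 64, 64)
modulated = f_d + similarity * f_g               # inject guidance where the modalities agree
```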
Affiliation(s)
- Xin Qiao: Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, 710049, China
- Chenyang Ge: Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, 710049, China
- Chaoqiang Zhao: National Key Laboratory of Air-based Information Perception and Fusion, Luoyang, 471000, China; Luoyang Institute of Electro-Optical Equipment of Avic, Luoyang, 471000, China
- Fabio Tosi: Department of Computer Science and Engineering, University of Bologna, Bologna, 40136, Italy
- Matteo Poggi: Department of Computer Science and Engineering, University of Bologna, Bologna, 40136, Italy
- Stefano Mattoccia: Department of Computer Science and Engineering, University of Bologna, Bologna, 40136, Italy
44
Li G, Togo R, Ogawa T, Haseyama M. Self-supervised learning for gastritis detection with gastric X-ray images. Int J Comput Assist Radiol Surg 2023; 18:1841-1848. [PMID: 37040011] [DOI: 10.1007/s11548-023-02891-5]
Abstract
PURPOSE Manual annotation of gastric X-ray images by doctors for gastritis detection is time-consuming and expensive. To solve this, a self-supervised learning method is developed in this study. The effectiveness of the proposed self-supervised learning method in gastritis detection is verified using a few annotated gastric X-ray images. METHODS In this study, we develop a novel method that can perform explicit self-supervised learning and learn discriminative representations from gastric X-ray images. Models trained with the proposed method were fine-tuned on datasets comprising a few annotated gastric X-ray images. Five self-supervised learning methods, i.e., SimSiam, BYOL, PIRL-jigsaw, PIRL-rotation, and SimCLR, were compared with the proposed method. Furthermore, three previous methods, one pretrained on ImageNet, one trained from scratch, and one semi-supervised learning method, were compared with the proposed method. RESULTS The proposed method's harmonic mean scores of sensitivity and specificity after fine-tuning with annotated data from 10, 20, 30, and 40 patients were 0.875, 0.911, 0.915, and 0.931, respectively. The proposed method outperformed all comparative methods, including the five self-supervised learning and three previous methods. Experimental results showed the effectiveness of the proposed method in gastritis detection using a few annotated gastric X-ray images. CONCLUSIONS This paper proposes a novel self-supervised learning method based on a teacher-student architecture for gastritis detection using gastric X-ray images. The proposed method can perform explicit self-supervised learning and learn discriminative representations from gastric X-ray images. The proposed method exhibits potential clinical use in gastritis detection using a few annotated gastric X-ray images.
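A minimal teacher-student self-supervised step in the spirit of the compared methods (BYOL/SimSiam-like) is sketched below: two augmented views are encoded, the student chases the EMA teacher's representation, and the teacher is updated by momentum; the augmentation, network and sizes are assumptions, not the paper's architecture.
```python
# Minimal BYOL-style teacher-student self-supervised step. Augmentation, network and
# sizes are illustrative assumptions, not the paper's architecture.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 256), nn.ReLU(), nn.Linear(256, 128))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)
predictor = nn.Linear(128, 128)
opt = torch.optim.SGD(list(student.parameters()) + list(predictor.parameters()), lr=1e-2)

def augment(x):                                  # crude stand-in for image augmentation
    return (x + 0.1 * torch.randn_like(x)).clamp(0, 1)

x = torch.rand(16, 1, 64, 64)                    # unlabeled batch (synthetic stand-in images)
v1, v2 = augment(x), augment(x)
loss = -F.cosine_similarity(predictor(student(v1)), teacher(v2).detach(), dim=1).mean()
opt.zero_grad(); loss.backward(); opt.step()
with torch.no_grad():                            # momentum (EMA) update of the teacher
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(0.99).add_(0.01 * s)
```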
Affiliation(s)
- Guang Li: Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan
- Ren Togo: Faculty of Information Science and Technology, Hokkaido University, Sapporo, Japan
- Takahiro Ogawa: Faculty of Information Science and Technology, Hokkaido University, Sapporo, Japan
- Miki Haseyama: Faculty of Information Science and Technology, Hokkaido University, Sapporo, Japan
45
Arabzadeh N, Bagheri E. A self-supervised language model selection strategy for biomedical question answering. J Biomed Inform 2023; 146:104486. [PMID: 37722445] [DOI: 10.1016/j.jbi.2023.104486]
Abstract
Large neural-based Pre-trained Language Models (PLM) have recently gained much attention due to their noteworthy performance in many downstream Information Retrieval (IR) and Natural Language Processing (NLP) tasks. PLMs can be categorized as either general-purpose, which are trained on resources such as large-scale Web corpora, or domain-specific, which are trained on in-domain or mixed-domain corpora. While domain-specific PLMs have shown promising performance on domain-specific tasks, they are significantly more computationally expensive than general-purpose PLMs as they have to be either retrained or trained from scratch. The objective of our work in this paper is to explore whether it would be possible to leverage general-purpose PLMs to achieve performance competitive with domain-specific PLMs without the need for expensive retraining of the PLMs for domain-specific tasks. By focusing specifically on the recent BioASQ Biomedical Question Answering task, we show how different general-purpose PLMs exhibit synergistic behaviour in terms of performance, which can lead to overall notable performance improvement when used in tandem with each other. More concretely, given a set of general-purpose PLMs, we propose a self-supervised method for training a classifier that systematically selects the PLM that is most likely to answer the question correctly on a per-input basis. We show that through such a selection strategy, the performance of general-purpose PLMs can become competitive with domain-specific PLMs while remaining computationally light since there is no need to retrain the large language model itself. We run experiments on the BioASQ dataset, which is a large-scale biomedical question-answering benchmark. We show that our proposed selection strategy yields statistically significant performance improvements for general-purpose language models, averaging 16.7% when using only lighter models such as DistilBERT and DistilRoBERTa and 14.2% when using relatively larger models such as BERT and RoBERTa, making their performance competitive with domain-specific large language models such as PubMedBERT.
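The routing idea can be sketched as training a lightweight selector that, per question, predicts which general-purpose model to query; the synthetic example below uses TF-IDF features and logistic regression purely for illustration, whereas the paper's self-supervised labelling procedure and feature set differ.
```python
# Sketch of a per-question model selector. The questions, labels and features here are
# synthetic; the paper's self-supervised labelling and feature set differ.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

questions = ["What gene is associated with cystic fibrosis?",
             "Which drug class treats hypertension?",
             "Is BRCA1 linked to breast cancer risk?",
             "What is the mechanism of action of metformin?"]
# 0 = route to model A (e.g. a DistilBERT-based QA model), 1 = route to model B (RoBERTa-based)
best_model = np.array([0, 1, 0, 1])               # synthetic "which model answered correctly" labels

selector = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
selector.fit(questions, best_model)
print(selector.predict(["Which protein does imatinib inhibit?"]))  # index of the chosen model
```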
46
Lin Y, Qu Z, Chen H, Gao Z, Li Y, Xia L, Ma K, Zheng Y, Cheng KT. Nuclei segmentation with point annotations from pathology images via self-supervised learning and co-training. Med Image Anal 2023; 89:102933. [PMID: 37611532] [DOI: 10.1016/j.media.2023.102933]
Abstract
Nuclei segmentation is a crucial task for whole slide image analysis in digital pathology. Generally, the segmentation performance of fully-supervised learning heavily depends on the amount and quality of the annotated data. However, it is time-consuming and expensive for professional pathologists to provide accurate pixel-level ground truth, while it is much easier to get coarse labels such as point annotations. In this paper, we propose a weakly-supervised learning method for nuclei segmentation that only requires point annotations for training. First, coarse pixel-level labels are derived from the point annotations based on the Voronoi diagram and the k-means clustering method to avoid overfitting. Second, a co-training strategy with an exponential moving average method is designed to refine the incomplete supervision of the coarse labels. Third, a self-supervised visual representation learning method is tailored for nuclei segmentation of pathology images that transforms the hematoxylin component images into the H&E stained images to gain better understanding of the relationship between the nuclei and cytoplasm. We comprehensively evaluate the proposed method using two public datasets. Both visual and quantitative results demonstrate the superiority of our method to the state-of-the-art methods, and its competitive performance compared to the fully-supervised methods. Codes are available at https://github.com/hust-linyi/SC-Net.
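Deriving coarse pixel labels from point annotations can be sketched roughly as follows: a Voronoi-style partition assigns every pixel to its nearest annotated nucleus, and k-means on intensity gives a crude foreground/background split; the two-cluster choice and the darker-is-foreground heuristic are assumptions, not the paper's exact recipe.
```python
# Rough sketch of coarse labels from point annotations: Voronoi partition via nearest
# annotated point, plus k-means on intensity. Heuristics here are assumptions.
import numpy as np
from scipy.spatial import cKDTree
from sklearn.cluster import KMeans

h, w = 128, 128
image = np.random.rand(h, w)                        # stand-in grayscale pathology patch
points = np.array([[32, 40], [90, 100], [64, 20]])  # point annotations (row, col)

# Voronoi-style partition: nearest annotated point for every pixel
rows, cols = np.mgrid[0:h, 0:w]
pixels = np.stack([rows.ravel(), cols.ravel()], axis=1)
nearest = cKDTree(points).query(pixels)[1].reshape(h, w)

# k-means on intensity as a crude foreground/background split
fg_bg = KMeans(n_clusters=2, n_init=10).fit_predict(image.reshape(-1, 1)).reshape(h, w)
means = [image.reshape(-1)[fg_bg.reshape(-1) == c].mean() for c in (0, 1)]
foreground_cluster = int(np.argmin(means))          # nuclei tend to be darker

# coarse label map: Voronoi cell index where k-means says "foreground", 0 = background
coarse = np.where(fg_bg == foreground_cluster, nearest + 1, 0)
```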
Affiliation(s)
- Yi Lin: Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong
- Zhiyong Qu: School of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Hao Chen: Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong; Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong
- Zhongke Gao: School of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Lili Xia: School of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Kai Ma: Tencent Jarvis Lab, Shenzhen, China
- Kwang-Ting Cheng: Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong
47
Lin T, Yu Z, Xu Z, Hu H, Xu Y, Chen CW. SGCL: Spatial guided contrastive learning on whole-slide pathological images. Med Image Anal 2023; 89:102845. [PMID: 37597317] [DOI: 10.1016/j.media.2023.102845]
Abstract
Self-supervised representation learning (SSL) has achieved remarkable success in its application to natural images while falling behind in performance when applied to whole-slide pathological images (WSIs). This is because the inherent characteristics of WSIs in terms of gigapixel resolution and multiple objects in training patches are fundamentally different from natural images. Directly transferring the state-of-the-art (SOTA) SSL methods designed for natural images to WSIs will inevitably compromise their performance. We present a novel scheme SGCL: Spatial Guided Contrastive Learning, to fully explore the inherent properties of WSIs, leveraging the spatial proximity and multi-object priors for stable self-supervision. Beyond the self-invariance of instance discrimination, we expand and propagate the spatial proximity for the intra-invariance from the same WSI and inter-invariance from different WSIs, as well as propose the spatial-guided multi-cropping for inner-invariance within patches. To adaptively explore such spatial information without supervision, we propose a new loss function and conduct a theoretical analysis to validate it. This novel scheme of SGCL is able to achieve additional improvements over the SOTA pre-training methods on diverse downstream tasks across multiple datasets. Extensive ablation studies have been carried out and visualizations of these results have been presented to aid understanding of the proposed SGCL scheme. As open science, all codes and pre-trained models are available at https://github.com/HHHedo/SGCL.
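A toy version of using spatial proximity as self-supervision is shown below: patches cut from nearby locations in the same slide are treated as positives in an InfoNCE-style loss; the encoder, distance threshold and temperature are placeholder choices rather than the SGCL formulation.
```python
# Toy spatial-proximity contrastive loss: patches from nearby WSI locations are positives.
# Encoder, distance threshold and temperature are placeholders, not the SGCL formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
patches = torch.rand(8, 3, 32, 32)                    # patches sampled from one slide
coords = torch.rand(8, 2) * 1000                      # their (x, y) positions in the WSI

z = F.normalize(encoder(patches), dim=1)
sim = z @ z.t() / 0.1                                 # cosine similarities / temperature
dist = torch.cdist(coords, coords)
positives = (dist < 200) & ~torch.eye(8, dtype=torch.bool)   # spatial neighbours

# InfoNCE-style loss: for each patch, pull its spatial neighbours together
logits = sim.masked_fill(torch.eye(8, dtype=torch.bool), float("-inf"))
log_prob = F.log_softmax(logits, dim=1)
loss = -log_prob[positives].mean() if positives.any() else torch.tensor(0.0)
```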
Affiliation(s)
- Tiancheng Lin: Shanghai Key Lab of Digital Media Processing and Transmission, Shanghai Jiao Tong University, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China
- Zhimiao Yu: Shanghai Key Lab of Digital Media Processing and Transmission, Shanghai Jiao Tong University, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China
- Zengchao Xu: Department of Mathematics and Lab for Educational Big Data and Policymaking, Shanghai Normal University, China
- Hongyu Hu: Shanghai Key Lab of Digital Media Processing and Transmission, Shanghai Jiao Tong University, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China
- Yi Xu: Shanghai Key Lab of Digital Media Processing and Transmission, Shanghai Jiao Tong University, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China
48
Salazar González JL, Álvarez-García JA, Rendón-Segador FJ, Carrara F. Conditioned Cooperative training for semi-supervised weapon detection. Neural Netw 2023; 167:489-501. [PMID: 37690211] [DOI: 10.1016/j.neunet.2023.08.043]
Abstract
Violent assaults and homicides occur daily, and the number of victims of mass shootings increases every year. However, this number can be reduced with the help of Closed Circuit Television (CCTV) and weapon detection models, as generic object detectors have become increasingly accurate with more data for training. We present a new semi-supervised learning methodology based on conditioned cooperative student-teacher training with optimal pseudo-label generation via a novel confidence threshold search method, improving both models through conditional knowledge transfer. Furthermore, a novel firearms image dataset of 458,599 images was collected using Instagram hashtags to evaluate our approach and compare the improvements obtained using a specific unsupervised dataset instead of a general one such as ImageNet. We compared our methodology with supervised, semi-supervised and self-supervised learning techniques, outperforming approaches such as YOLOv5m (up to +19.86), YOLOv5l (up to +6.52), Unbiased Teacher (up to +10.5 AP), DETReg (up to +2.8 AP) and UP-DETR (up to +1.22 AP).
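The confidence-threshold search can be illustrated in its simplest form as a sweep over candidate thresholds that keeps the one maximising pseudo-label quality on a small labelled validation set; the grid, metric and synthetic scores below are illustrative, and the paper's procedure is more involved.
```python
# Simplified confidence-threshold search for pseudo-labels: sweep thresholds and keep
# the one with the best F1 on a labelled validation set. Data and grid are synthetic.
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
val_labels = rng.integers(0, 2, size=500)                          # 1 = weapon present
teacher_scores = np.clip(val_labels * 0.6 + rng.normal(0.3, 0.2, 500), 0, 1)

best_threshold, best_f1 = None, -1.0
for threshold in np.linspace(0.05, 0.95, 19):
    pseudo = (teacher_scores >= threshold).astype(int)             # candidate pseudo-labels
    score = f1_score(val_labels, pseudo)
    if score > best_f1:
        best_threshold, best_f1 = threshold, score
print(best_threshold, round(best_f1, 3))    # this threshold would then filter pseudo-labels
```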
Affiliation(s)
- Fabio Carrara: Institute of Information Science and Technologies of the National Research Council of Italy (ISTI-CNR), Pisa, Italy
49
Li J, Gao H, Qiang W, Zheng C. Information theory-guided heuristic progressive multi-view coding. Neural Netw 2023; 167:415-432. [PMID: 37673028] [DOI: 10.1016/j.neunet.2023.08.027]
Abstract
Multi-view representation learning aims to capture comprehensive information from multiple views of a shared context. Recent works intuitively apply contrastive learning to different views in a pairwise manner, which has several shortcomings: view-specific noise is not filtered when learning view-shared representations; fake negative pairs, where the negative terms actually belong to the same class as the positive, are treated the same as real negative pairs; and evenly measuring the similarities between terms might interfere with optimization. Importantly, few works study the theoretical framework of generalized self-supervised multi-view learning, especially for more than two views. To this end, we rethink the existing multi-view learning paradigm from the perspective of information theory and then propose a novel information theoretical framework for generalized multi-view learning. Guided by it, we build a multi-view coding method with a three-tier progressive architecture, namely Information theory-guided heuristic Progressive Multi-view Coding (IPMC). In the distribution-tier, IPMC aligns the distribution between views to reduce view-specific noise. In the set-tier, IPMC constructs self-adjusted contrasting pools, which are adaptively modified by a view filter. Lastly, in the instance-tier, we adopt a purpose-designed unified loss to learn representations and reduce the gradient interference. Theoretically and empirically, we demonstrate the superiority of IPMC over state-of-the-art methods.
Affiliation(s)
- Jiangmeng Li: Science & Technology on Integrated Information System Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Hang Gao: Science & Technology on Integrated Information System Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Wenwen Qiang: Science & Technology on Integrated Information System Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Changwen Zheng: Science & Technology on Integrated Information System Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, China
50
Zhang C, Zheng H, Gu Y. Dive into the details of self-supervised learning for medical image analysis. Med Image Anal 2023; 89:102879. [PMID: 37453236] [DOI: 10.1016/j.media.2023.102879]
Abstract
Self-supervised learning (SSL) has achieved remarkable performance in various medical imaging tasks by dint of priors from massive unlabeled data. However, regarding a specific downstream task, there is still a lack of an instruction book on how to select suitable pretext tasks and implementation details throughout the standard "pretrain-then-finetune" workflow. In this work, we focus on exploiting the capacity of SSL in terms of four realistic and significant issues: (1) the impact of SSL on imbalanced datasets, (2) the network architecture, (3) the applicability of upstream tasks to downstream tasks and (4) the stacking effect of SSL and common policies for deep learning. We provide a large-scale, in-depth and fine-grained study through extensive experiments on predictive, contrastive, generative and multi-SSL algorithms. Based on the results, we have uncovered several insights. Positively, SSL advances class-imbalanced learning mainly by boosting the performance of the rare class, which is of interest to clinical diagnosis. Unfortunately, SSL offers marginal or even negative returns in some cases, including severely imbalanced and relatively balanced data regimes, as well as combinations with common training policies. Our intriguing findings provide practical guidelines for the usage of SSL in the medical context and highlight the need for developing universal pretext tasks to accommodate diverse application scenarios. The code of this paper can be found at https://github.com/EndoluminalSurgicalVision-IMR/Medical-SSL.
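The "pretrain-then-finetune" workflow the study dissects follows a standard pattern, sketched below: load an SSL-pretrained encoder, attach a task head, optionally freeze early layers, and fine-tune on the labelled downstream set; the checkpoint path, architecture and hyper-parameters are placeholders.
```python
# Bare-bones pretrain-then-finetune workflow. Checkpoint path, architecture and
# hyper-parameters are placeholders, not the paper's configurations.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Conv2d(1, 16, 3, 2, 1), nn.ReLU(),
                        nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
# encoder.load_state_dict(torch.load("ssl_pretrained.pt"))   # hypothetical SSL checkpoint

for p in list(encoder.parameters())[:2]:          # optionally freeze the earliest layer
    p.requires_grad_(False)

model = nn.Sequential(encoder, nn.Linear(32, 2))  # attach a 2-class downstream head
opt = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)

x, y = torch.rand(8, 1, 64, 64), torch.randint(0, 2, (8,))     # labelled downstream batch
loss = F.cross_entropy(model(x), y)
opt.zero_grad(); loss.backward(); opt.step()
```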
Affiliation(s)
- Chuyan Zhang: Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China
- Hao Zheng: Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China
- Yun Gu: Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China