1. Wang T, Dai Q, Xiong W. EScarcityS: A framework for enhancing medical image classification performance in scarcity of trainable samples scenarios. Neural Netw 2025;189:107573. [PMID: 40382989] [DOI: 10.1016/j.neunet.2025.107573]
Abstract
In the field of healthcare, the acquisition and annotation of medical images present significant challenges, resulting in a scarcity of trainable samples. This data limitation hinders the performance of deep learning models, creating bottlenecks in clinical applications. To address this issue, we construct a framework (EScarcityS) aimed at enhancing the success rate of disease diagnosis in scenarios where trainable medical images are scarce. Firstly, considering that Transformer-based deep learning networks rely on large amounts of training data, this study takes into account the unique characteristics of pathological regions. By extracting the feature representations of all particles in medical images at different granularities, a multi-granularity Transformer network (MGVit) is designed. This network leverages additional prior knowledge to assist the Transformer network during training, thereby reducing the data requirement to some extent. Next, the importance maps of particles at different granularities, generated by MGVit, are fused to construct disease probability maps corresponding to the images. Based on these maps, a disease probability map-guided diffusion generation model is designed to generate more realistic and interpretable synthetic data. Subsequently, authentic and synthetic data are mixed and used to retrain MGVit, aiming to enhance the accuracy of medical image classification when trainable medical images are scarce. Finally, we conducted detailed experiments on four real medical image datasets to validate the effectiveness of EScarcityS and its specific modules.
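
The retraining step above, pooling authentic scans with diffusion-generated ones before fine-tuning, reduces to a few lines in PyTorch. A minimal sketch, assuming hypothetical data folders and a ResNet-18 stand-in for MGVit (whose implementation is not public):

```python
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, models, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
real = datasets.ImageFolder("data/real", transform=tfm)        # authentic images (hypothetical path)
synth = datasets.ImageFolder("data/synthetic", transform=tfm)  # diffusion-generated images (hypothetical path)
loader = DataLoader(ConcatDataset([real, synth]), batch_size=32, shuffle=True)

model = models.resnet18(num_classes=len(real.classes))  # stand-in for MGVit
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:          # one retraining epoch over the mixed data
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```
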
Affiliation(s)
- Tianxiang Wang
- College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China; Key Laboratory of Brain-Machine Intelligence Technology, Ministry of Education, Nanjing, 211106, China
- Qun Dai
- College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China; Key Laboratory of Brain-Machine Intelligence Technology, Ministry of Education, Nanjing, 211106, China.
- Wei Xiong
- College of Computer Science, China University of Geosciences, Wuhan, 430078, China

2. Ma Y, Al-Aroomi MA, Zheng Y, Ren W, Liu P, Wu Q, Liang Y, Jiang C. Application of Mask R-CNN for automatic recognition of teeth and caries in cone-beam computerized tomography. BMC Oral Health 2025;25:927. [PMID: 40481434] [PMCID: PMC12143100] [DOI: 10.1186/s12903-025-06293-8]
Abstract
OBJECTIVES Deep convolutional neural networks (CNNs) are advancing rapidly in medical research, demonstrating promising results in diagnosis and prediction within radiology and pathology. This study evaluates the efficacy of deep learning algorithms for detecting and diagnosing dental caries on cone-beam computed tomography (CBCT) with the Mask R-CNN architecture, while comparing various hyperparameters to enhance detection. MATERIALS AND METHODS A total of 2,128 CBCT images were divided into training, validation, and test datasets in a 7:1:1 ratio. For verification of tooth recognition, data from the validation set were randomly selected for analysis. Three groups of Mask R-CNN networks were compared: a scratch-trained baseline using randomly initialized weights (group R); a transfer learning approach with models pre-trained on COCO for object detection (group C); and a variant pre-trained on ImageNet (group I). All configurations maintained identical hyperparameter settings to ensure fair comparison. The deep learning models used ResNet-50 as the backbone network and were trained for up to 300 epochs. We assessed training loss, detection and training times, diagnostic accuracy, specificity, positive and negative predictive values, and coverage precision to compare performance across the groups. RESULTS Transfer learning significantly reduced training times compared with the non-transfer-learning approach (p < 0.05). The average detection time for group R was 0.269 ± 0.176 s, whereas groups I (0.323 ± 0.196 s) and C (0.346 ± 0.195 s) exhibited significantly longer detection times (p < 0.05). Group C, trained for 200 epochs, achieved a mean average precision (mAP) of 81.095, outperforming all other groups. The mAP for caries recognition in group R, trained for 300 epochs, was 53.328, with detection times under 0.5 s. Overall, group C demonstrated significantly higher average precision across all epochs (100, 200, and 300) (p < 0.05). CONCLUSION Neural networks pre-trained on COCO exhibit superior annotation accuracy compared with those pre-trained on ImageNet. This suggests that COCO's diverse and richly annotated images offer more relevant features for detecting dental structures and carious lesions. Furthermore, employing ResNet-50 as the backbone architecture enhances the detection of teeth and carious regions, achieving significant improvements with just 200 training epochs, potentially increasing the efficiency of clinical image interpretation.
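
The group C setup (a COCO-pretrained Mask R-CNN with a ResNet-50 FPN backbone, re-headed for the dental classes) can be sketched in torchvision as follows; the three-class label scheme (background, tooth, caries) is an illustrative assumption:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 3  # background, tooth, caries (assumed label scheme)

# COCO-pretrained Mask R-CNN with a ResNet-50 FPN backbone
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Re-head the box branch for the dental classes
in_feat = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, num_classes)

# Re-head the mask branch as well
in_feat_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feat_mask, 256, num_classes)
```
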
Affiliation(s)
- Yujie Ma
- Department of Oral and Maxillofacial Surgery, Center of Stomatology, Xiangya Hospital, Central South University, Changsha, Hunan Province, 410008, China
- Maged Ali Al-Aroomi
- Department of Oral and Maxillofacial Surgery, Center of Stomatology, Xiangya Hospital, Central South University, Changsha, Hunan Province, 410008, China
- Yutian Zheng
- The College of Mechanical and Electrical Engineering, Central South University, Changsha, Hunan Province, China
- Wenjie Ren
- Department of Oral and Maxillofacial Surgery, Center of Stomatology, Xiangya Hospital, Central South University, Changsha, Hunan Province, 410008, China
- Peixuan Liu
- Department of Oral and Maxillofacial Surgery, Center of Stomatology, Xiangya Hospital, Central South University, Changsha, Hunan Province, 410008, China
- Qing Wu
- High Performance Computing Center, Central South University, Changsha, Hunan Province, China
- Ye Liang
- Department of Oral and Maxillofacial Surgery, Center of Stomatology, Xiangya Hospital, Central South University, Changsha, Hunan Province, 410008, China.
- Canhua Jiang
- Department of Oral and Maxillofacial Surgery, Center of Stomatology, Xiangya Hospital, Central South University, Changsha, Hunan Province, 410008, China.

3. Matsubara N, Teramoto A, Takei M, Kitoh Y, Kawakami S. Retaking assessment system based on the inspiratory state of chest X-ray image. Radiol Phys Technol 2025;18:384-398. [PMID: 39969765] [PMCID: PMC12103368] [DOI: 10.1007/s12194-025-00888-0]
Abstract
When taking chest X-rays, the patient is encouraged to take maximum inspiration, and the radiological technologist takes the image at the appropriate time. If the image is not taken at maximum inspiration, a retake is required. However, judgments of whether a retake is necessary vary between operators. We therefore considered that this variation could be reduced by developing a retake assessment system that uses a convolutional neural network (CNN) to evaluate whether a retake is necessary. Training the CNN requires input chest X-ray images and corresponding labels indicating whether a retake is necessary. However, a chest X-ray image alone cannot be labeled as showing sufficient inspiration (no retake needed) or insufficient inspiration (retake required). Therefore, we generated input images and labels from dynamic digital radiography (DDR) and conducted the training. Verification using 18 dynamic chest X-ray cases (5,400 images) and 48 actual chest X-ray cases (96 images) showed that the VGG16-based architecture achieved an assessment accuracy of 82.3% even on actual chest X-ray images. If the proposed method were used in hospitals, it could therefore reduce the variability in judgment between operators.
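
The core of such a retake classifier is a VGG16 backbone with its final layer swapped for a two-way output. A minimal PyTorch sketch, assuming ImageNet initialization and an assumed class coding (the paper trains on DDR-derived labels):

```python
import torch
import torch.nn as nn
from torchvision import models

# VGG16 with the 1000-way ImageNet head replaced by a two-way head:
# class 0 = acceptable inspiration, class 1 = retake required (assumed coding).
model = models.vgg16(weights="IMAGENET1K_V1")
model.classifier[6] = nn.Linear(4096, 2)

x = torch.randn(1, 3, 224, 224)   # stand-in preprocessed chest X-ray
print(model(x).shape)             # torch.Size([1, 2])
```
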
Affiliation(s)
- Naoki Matsubara
- Division of Radiology, Shinshu University Hospital, 3-1-1 Asahi, Matsumoto, Nagano, 390-8621, Japan.
- Atsushi Teramoto
- Faculty of Engineering, Meijo University, 1-501 Shiogamaguchi, Tempaku-ku, Nagoya, 468-8502, Japan
- Manabu Takei
- Division of Radiology, Shinshu University Hospital, 3-1-1 Asahi, Matsumoto, Nagano, 390-8621, Japan
- Yoshihiro Kitoh
- Division of Radiology, Shinshu University Hospital, 3-1-1 Asahi, Matsumoto, Nagano, 390-8621, Japan
- Satoshi Kawakami
- Department of Radiology, Shinshu University School of Medicine, 3-1-1 Asahi, Matsumoto, 390-8621, Japan

4. Harris CE, Liu L, Almeida L, Kassick C, Makrogiannis S. Artificial intelligence in pediatric osteopenia diagnosis: evaluating deep network classification and model interpretability using wrist X-rays. Bone Rep 2025;25:101845. [PMID: 40343188] [PMCID: PMC12059325] [DOI: 10.1016/j.bonr.2025.101845]
Abstract
Osteopenia is a bone disorder that causes low bone density and affects millions of people worldwide. Diagnosis of this condition is commonly achieved through clinical assessment of bone mineral density (BMD). State-of-the-art machine learning (ML) techniques, such as convolutional neural networks (CNNs) and transformer models, have gained increasing popularity in medicine. In this work, we employ six deep networks for osteopenia vs. healthy bone classification using X-ray imaging from the pediatric wrist dataset GRAZPEDWRI-DX. We apply two explainable AI techniques to analyze and interpret visual explanations for network decisions. Experimental results show that deep networks are able to effectively learn osteopenic and healthy bone features, achieving high classification accuracy. Among the six evaluated networks, DenseNet201 with transfer learning yielded the top classification accuracy at 95.2%. Furthermore, visual explanations of CNN decisions provide valuable insight into the black-box inner workings and present interpretable results. Our evaluation highlights the capability of deep networks to accurately differentiate between osteopenic and healthy bones in pediatric wrist X-rays. The combination of high classification accuracy and interpretable visual explanations underscores the promise of incorporating machine learning techniques into clinical workflows for the early and accurate diagnosis of osteopenia.
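
One widely used way to obtain such visual explanations is Grad-CAM; a compact sketch for a DenseNet201 backbone follows. The target layer choice (the last dense block) and the random input are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.densenet201(weights="DEFAULT")
model.eval()

acts, grads = {}, {}
target = model.features.denseblock4  # last dense block as CAM target (a common choice)
target.register_forward_hook(lambda m, i, o: acts.update(v=o))
target.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

x = torch.randn(1, 3, 224, 224)      # stand-in preprocessed wrist X-ray
model(x)[0].max().backward()         # gradient of the top class score

w = grads["v"].mean(dim=(2, 3), keepdim=True)           # channel importance
cam = F.relu((w * acts["v"]).sum(dim=1, keepdim=True))  # weighted activations
cam = F.interpolate(cam, size=(224, 224), mode="bilinear", align_corners=False)[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # saliency map in [0, 1]
```
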
Affiliation(s)
- Chelsea E. Harris
- Division of Physics, Engineering, Mathematics, and Computer Science, Delaware State University, 1200 N. Dupont Hwy., Dover, 19901, DE, USA
- Lingling Liu
- Division of Physics, Engineering, Mathematics, and Computer Science, Delaware State University, 1200 N. Dupont Hwy., Dover, 19901, DE, USA
- Luiz Almeida
- Department of Orthopaedic Surgery, Duke University, 2080 Duke University Road, Durham, 27710, NC, USA
- Carolina Kassick
- Division of Physics, Engineering, Mathematics, and Computer Science, Delaware State University, 1200 N. Dupont Hwy., Dover, 19901, DE, USA
- Sokratis Makrogiannis
- Division of Physics, Engineering, Mathematics, and Computer Science, Delaware State University, 1200 N. Dupont Hwy., Dover, 19901, DE, USA

5. Ashi L, Taurin S. Computational modeling of breast tissue mechanics and machine learning in cancer diagnostics: enhancing precision in risk prediction and therapeutic strategies. Expert Rev Anticancer Ther 2025:1-14. [PMID: 40380913] [DOI: 10.1080/14737140.2025.2508850]
Abstract
INTRODUCTION Breast cancer remains a significant global health issue. Despite advances in detection and treatment, its complexity is driven by genetic, environmental, and structural factors. Computational methods like Finite Element Modeling (FEM) have transformed our understanding of breast cancer risk and progression. AREAS COVERED This review focuses on advanced computational approaches in breast cancer research, with an emphasis on FEM's role in simulating breast tissue mechanics and enhancing precision in therapies such as radiofrequency ablation (RFA). Machine learning (ML), particularly Convolutional Neural Networks (CNNs), has revolutionized the analysis of imaging modalities such as mammography and MRI, improving diagnostic accuracy and early detection. AI applications in analyzing histopathological images have advanced tumor classification and grading, offering consistency and reducing inter-observer variability. Explainability tools like Grad-CAM, SHAP, and LIME enhance the transparency of AI-driven models, facilitating their integration into clinical workflows. EXPERT OPINION Integrating FEM and ML represents a paradigm shift in breast cancer management. FEM offers precise modeling of tissue mechanics, while ML excels in predictive analytics and image analysis. Despite challenges such as data variability and limited standardization, synergizing these approaches promises adaptive, personalized care. These computational methods have the potential to redefine diagnostics, optimize treatment, and improve patient outcomes.
Affiliation(s)
- Layal Ashi
- Department of Molecular Medicine, College of Medicine and Health Sciences, Princess Al-Jawhara Center for Molecular Medicine and Inherited Disorders, Arabian Gulf University, Manama, Kingdom of Bahrain
- Sebastien Taurin
- Department of Molecular Medicine, College of Medicine and Health Sciences, Princess Al-Jawhara Center for Molecular Medicine and Inherited Disorders, Arabian Gulf University, Manama, Kingdom of Bahrain

6. Cai L, Williamson C, Nguyen A, Wittrup E, Najarian K. Adapting segment anything model for hematoma segmentation in traumatic brain injury. Discover Imaging 2025;2:6. [PMID: 40438440] [PMCID: PMC12106135] [DOI: 10.1007/s44352-025-00011-4]
Abstract
Hematoma segmentation in traumatic brain injury (TBI) is critical for accurate diagnosis and effective treatment planning. In this study, we evaluate various automated segmentation models, including state-of-the-art architectures as benchmarks, and compare their performance with our proposed SAM-Adapter method for segmenting hematomas in brain CT scans. By incorporating the adapter into the vanilla SAM model, we address the challenge of very limited annotated datasets in medical imaging, enhancing model performance and efficiency. We also find that domain-specific pre-processing, such as contrast adjustment, reduces the need for extensive pretraining, making the model more streamlined, and that performance benefited further from optimization and hyperparameter tuning. Our results demonstrate that the SAM-Adapter model achieved strong performance and reliability in identifying hematomas, with Dice 72.34%, IoU 59.78%, 95% HD 5.57, sensitivity 75.39%, and specificity 99.73%. Inter-observer variability was assessed, revealing that the model's Dice score (67.20%) was closely aligned with the human expert agreement Dice (63.79%), suggesting its potential clinical utility. External validation on the HemSeg-200 dataset, which contains 222 scans, demonstrates the robustness of our approach across diverse cases. These advancements in automatic segmentation hold promise for improving the accuracy and efficiency of TBI diagnosis, supporting clinical decision-making, and enhancing patient outcomes. Supplementary information: the online version contains supplementary material available at 10.1007/s44352-025-00011-4.
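
The adapter idea, freezing the large pretrained encoder and training only small bottleneck modules inserted into it, can be sketched generically; the layer sizes and placement below are assumptions, not the paper's exact SAM-Adapter design:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter: down-project, nonlinearity, up-project.
    Inserted into a frozen encoder so that only these small modules train."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

tokens = torch.randn(1, 4096, 256)   # stand-in tokens from a ViT image encoder
adapted = Adapter(dim=256)(tokens)
print(adapted.shape)                 # torch.Size([1, 4096, 256])
# In practice the encoder itself is frozen, e.g.:
# for p in sam.image_encoder.parameters(): p.requires_grad = False
```
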
Affiliation(s)
- Lingrui Cai
- Department of Computational Medicine and Bioinformatics, University of Michigan, 2800 Plymouth Road, Ann Arbor, 48109 MI USA
- Craig Williamson
- Department of Neurosurgery and Neurology, University of Michigan, 1500 E. Medical Center Drive, Ann Arbor, 48109 MI USA
- Andrew Nguyen
- Department of Neurosurgery and Neurology, University of Michigan, 1500 E. Medical Center Drive, Ann Arbor, 48109 MI USA
- Emily Wittrup
- Department of Computational Medicine and Bioinformatics, University of Michigan, 2800 Plymouth Road, Ann Arbor, 48109 MI USA
- Kayvan Najarian
- Department of Computational Medicine and Bioinformatics, University of Michigan, 2800 Plymouth Road, Ann Arbor, 48109 MI USA
- Michigan Institute for Data Science, University of Michigan, 500 Church Street, Ann Arbor, 48109 MI USA
- Max Harry Weil Institute for Critical Care Research and Innovation, University of Michigan, 2800 Plymouth Road, Ann Arbor, 48109 MI USA

7. Fang W, Tang S, Yan D, Dai X, Zhang W, Xiong J. Breast cancer pathology image recognition based on convolutional neural network. PLoS One 2025;20:e0311728. [PMID: 40388398] [PMCID: PMC12088023] [DOI: 10.1371/journal.pone.0311728]
Abstract
This study presents a convolutional neural network (CNN)-based method for the classification and recognition of breast cancer pathology images. It aims to address the shortcomings of traditional pathological tissue analysis, which is time-consuming and labour-intensive and can lead to misdiagnosis or missed diagnosis. Using the idea of ensemble learning, each image is divided into four and sixteen equal parts for data augmentation. Then, using the Inception-ResNet V2 neural network model and transfer learning, features are extracted from the pathological images, and a three-layer fully connected neural network is constructed for feature classification. During recognition, the network first classifies each sub-image and then averages the sub-image results to obtain the final classification. The experiments use BreaKHis, a breast cancer pathological image classification dataset containing 7,909 images from 82 patients and covering benign and malignant lesion types. We randomly select 80% of the data as the training set and 20% as the test set, and compare against the Inception-ResNet V2, ResNet101, DenseNet169, MobileNetV3, and EfficientNetV2 models. Experimental results show that, at the four magnifications of the BreaKHis dataset, the method achieves the highest accuracy rates of 99.75%, 98.31%, 98.51%, and 96.69%, substantially higher than those of the other models.
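
The sub-image ensemble at inference, classifying each tile and then averaging the results, can be sketched as follows; resizing tiles to the backbone's input resolution is an assumed detail:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def tiled_prediction(model, image: torch.Tensor, grid: int = 2) -> torch.Tensor:
    """Split an image (C, H, W) into grid x grid tiles, classify each tile,
    and average the per-tile class probabilities."""
    c, h, w = image.shape
    th, tw = h // grid, w // grid
    probs = []
    for i in range(grid):
        for j in range(grid):
            tile = image[:, i*th:(i+1)*th, j*tw:(j+1)*tw].unsqueeze(0)
            tile = F.interpolate(tile, size=(224, 224), mode="bilinear",
                                 align_corners=False)  # backbone input size (assumed)
            probs.append(model(tile).softmax(dim=1))
    return torch.cat(probs).mean(dim=0)  # averaged class probabilities

# Four- and sixteen-part ensembles can then be combined, e.g.:
# final = 0.5 * tiled_prediction(model, img, 2) + 0.5 * tiled_prediction(model, img, 4)
```
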
Affiliation(s)
- Weijian Fang
- Chongqing Three Gorges University, Chongqing, China
- Shuyu Tang
- School of Computer Science and Engineering, School of Three Gorges Artificial Intelligence, and Key Laboratory of Intelligent Information Processing and Control, Chongqing Three Gorges University, Chongqing, China
- Dongfang Yan
- School of Computer Science and Engineering, School of Three Gorges Artificial Intelligence, and Key Laboratory of Intelligent Information Processing and Control, Chongqing Three Gorges University, Chongqing, China
- Xiangguang Dai
- School of Computer Science and Engineering, School of Three Gorges Artificial Intelligence, and Key Laboratory of Intelligent Information Processing and Control, Chongqing Three Gorges University, Chongqing, China
- Wei Zhang
- School of Computer Science and Engineering, School of Three Gorges Artificial Intelligence, and Key Laboratory of Intelligent Information Processing and Control, Chongqing Three Gorges University, Chongqing, China
- Jiang Xiong
- School of Computer Science and Engineering, School of Three Gorges Artificial Intelligence, and Key Laboratory of Intelligent Information Processing and Control, Chongqing Three Gorges University, Chongqing, China

8. Zhang Y, Huang YA, Hu Y, Liu R, Wu J, Huang ZA, Tan KC. CausalMixNet: A mixed-attention framework for causal intervention in robust medical image diagnosis. Med Image Anal 2025;103:103581. [PMID: 40359724] [DOI: 10.1016/j.media.2025.103581]
Abstract
Confounding factors inherent in medical images can significantly impact the causal exploration capabilities of deep learning models, resulting in compromised accuracy and diminished generalization performance. In this paper, we present an innovative methodology named CausalMixNet that employs query-mixed intra-attention and key&value-mixed inter-attention to probe causal relationships between input images and labels. To mitigate unobservable confounding factors, CausalMixNet integrates the non-local reasoning module (NLRM) and the key&value-mixed inter-attention (KVMIA) to conduct a front-door adjustment strategy. Furthermore, CausalMixNet incorporates a patch-masked ranking module (PMRM) and query-mixed intra-attention (QMIA) to enhance mediator learning, thereby facilitating causal intervention. The patch-mixing mechanism applied to query/(key&value) features within QMIA and KVMIA specifically targets lesion-related feature enhancement and average causal effect inference. CausalMixNet consistently outperforms existing methods, achieving superior accuracy and F1-scores across in-domain and out-of-domain scenarios on multiple datasets, with an average improvement of 3% over the closest competitor. Demonstrating robustness against noise, gender bias, and attribute bias, CausalMixNet excels in handling unobservable confounders, maintaining stable performance even in challenging conditions.
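
The patch-mixing idea can be illustrated with a toy function that swaps a random fraction of patch tokens between two feature sequences; this is a loose sketch of the concept only, not the paper's QMIA/KVMIA design:

```python
import torch

def mix_query_patches(q: torch.Tensor, q_other: torch.Tensor, ratio: float = 0.5) -> torch.Tensor:
    """Swap a random fraction of patch tokens between two query feature
    sequences of shape (B, N, D). A toy illustration of patch mixing."""
    swap = torch.rand(q.shape[0], q.shape[1], 1) < ratio   # per-token swap mask
    return torch.where(swap, q_other, q)

q1, q2 = torch.randn(2, 196, 64), torch.randn(2, 196, 64)
mixed = mix_query_patches(q1, q2)
print(mixed.shape)  # torch.Size([2, 196, 64])
```
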
Affiliation(s)
- Yajie Zhang
- Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region of China
- Yu-An Huang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Yao Hu
- Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region of China
- Rui Liu
- Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region of China
- Jibin Wu
- Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region of China; Department of Computing, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region of China; Research Center on Data Sciences and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region of China
- Zhi-An Huang
- Department of Computer Science, City University of Hong Kong (Dongguan), Dongguan, China.
- Kay Chen Tan
- Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region of China; Research Center on Data Sciences and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region of China

9. Sasmal P, Kumar Panigrahi S, Panda SL, Bhuyan MK. Attention-guided deep framework for polyp localization and subsequent classification via polyp local and Siamese feature fusion. Med Biol Eng Comput 2025. [PMID: 40314710] [DOI: 10.1007/s11517-025-03369-z]
Abstract
Colorectal cancer (CRC) is one of the leading causes of death worldwide. This paper proposes an automated diagnostic technique to detect, localize, and classify polyps in colonoscopy video frames. The proposed model adopts the deep YOLOv4 detector and incorporates spatial and contextual information through spatial attention and channel attention blocks, respectively, for better localization of polyps. Finally, leveraging a fusion of deep and handcrafted features, the detected polyps are classified as adenoma or non-adenoma. Polyp shape and texture are essential features for discriminating polyp types. Therefore, the proposed work utilizes a pyramid histogram of oriented gradients (PHOG) and embedding features learned via a triplet Siamese architecture to extract these features. PHOG extracts local shape information from each polyp class, whereas the Siamese network extracts intra-polyp discriminating features. The individual and cross-database performances on two databases suggest the robustness of our method in polyp localization. A competitive analysis based on significant clinical parameters against current state-of-the-art methods confirms that our method can be used for automated polyp localization in both real-time and offline colonoscopic video frames. Our method provides average precisions of 0.8971 and 0.9171 and F1 scores of 0.8869 and 0.8812 on the Kvasir-SEG and SUN databases, respectively. The proposed classification framework for the detected polyps yields a classification accuracy of 96.66% on a publicly available UCI colonoscopy video dataset, with an F1 score of 96.54%, validating the potential of the proposed framework in polyp localization and classification.
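
The PHOG descriptor concatenates orientation histograms computed over successively finer spatial grids. A sketch using scikit-image's hog, with illustrative resolution and level choices:

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def phog_descriptor(gray: np.ndarray, levels: int = 3, bins: int = 9) -> np.ndarray:
    """Concatenate HOG histograms over successively finer grids
    (1x1, 2x2, 4x4 cells), approximating a PHOG shape descriptor."""
    gray = resize(gray, (128, 128))
    feats = []
    for level in range(levels):
        cells = 2 ** level
        feats.append(hog(gray,
                         orientations=bins,
                         pixels_per_cell=(128 // cells, 128 // cells),
                         cells_per_block=(1, 1),
                         feature_vector=True))
    return np.concatenate(feats)  # 9 + 36 + 144 = 189 features

desc = phog_descriptor(np.random.rand(200, 180))  # stand-in polyp crop
print(desc.shape)  # (189,)
```
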
Affiliation(s)
- Pradipta Sasmal
- Department of Electrical Engineering, Indian Institute of Technology, Kharagpur, West Bengal, 721302, India.
- Susant Kumar Panigrahi
- Department of Electrical Engineering, Indian Institute of Technology, Kharagpur, West Bengal, 721302, India
- Swarna Laxmi Panda
- Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela, Odisha, 769008, India
- M K Bhuyan
- Department of Electronics and Electrical Engineering, Indian Institute of Technology, Guwahati, Assam, 781039, India

10. Hosseinzadeh Taher MR, Haghighi F, Gotway MB, Liang J. Large-scale benchmarking and boosting transfer learning for medical image analysis. Med Image Anal 2025;102:103487. [PMID: 40117988] [DOI: 10.1016/j.media.2025.103487]
Abstract
Transfer learning, particularly fine-tuning models pretrained on photographic images for medical images, has proven indispensable for medical image analysis. There are numerous models with distinct architectures pretrained on various datasets using different strategies, but there is a lack of up-to-date, large-scale evaluations of their transferability to medical imaging, posing a challenge for practitioners in selecting the most appropriate pretrained models for their tasks at hand. To fill this gap, we conduct a comprehensive systematic study, focusing on (i) benchmarking numerous conventional and modern convolutional neural network (ConvNet) and vision transformer architectures across various medical tasks; (ii) investigating the impact of fine-tuning data size on the performance of ConvNets compared with vision transformers in medical imaging; (iii) examining the impact of pretraining data granularity on transfer learning performance; (iv) evaluating the transferability of a wide range of recent self-supervised methods with diverse training objectives to a variety of medical tasks across different modalities; and (v) delving into the efficacy of domain-adaptive pretraining on both photographic and medical datasets to develop high-performance models for medical tasks. Our large-scale study (∼5,000 experiments) yields impactful insights: (1) ConvNets demonstrate higher transferability than vision transformers when fine-tuning for medical tasks; (2) ConvNets prove to be more annotation-efficient than vision transformers when fine-tuning for medical tasks; (3) fine-grained representations, rather than high-level semantic features, prove pivotal for fine-grained medical tasks; (4) self-supervised models excel at learning holistic features compared with supervised models; and (5) domain-adaptive pretraining leads to performant models by harnessing knowledge acquired from ImageNet and enhancing it through readily accessible expert annotations associated with medical datasets. As open science, all codes and pretrained models are available at GitHub.com/JLiangLab/BenchmarkTransferLearning (Version 2).
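
The benchmarking setup, many ImageNet-pretrained backbones fine-tuned under one protocol, maps naturally onto the timm library. A minimal sketch; the model names and the two-class head are illustrative:

```python
import timm
import torch

# One ConvNet and one vision transformer, both ImageNet-pretrained,
# re-headed for a two-class medical task and trained identically.
convnet = timm.create_model("resnet50", pretrained=True, num_classes=2)
vit = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2)

x = torch.randn(8, 3, 224, 224)  # stand-in batch of medical images
for name, model in [("ConvNet", convnet), ("ViT", vit)]:
    logits = model(x)
    print(name, logits.shape)    # (8, 2) each; fine-tune both with the same schedule
```
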
Affiliation(s)
- Fatemeh Haghighi
- School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA
- Jianming Liang
- School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA.

11. Xia C, Zuo M, Lin Z, Deng L, Rao Y, Chen W, Chen J, Yao W, Hu M. Multimodal Deep Learning Fusing Clinical and Radiomics Scores for Prediction of Early-Stage Lung Adenocarcinoma Lymph Node Metastasis. Acad Radiol 2025;32:2977-2989. [PMID: 39730249] [DOI: 10.1016/j.acra.2024.12.018]
Abstract
RATIONALE AND OBJECTIVES To develop and validate a multimodal deep learning (DL) model based on computed tomography (CT) images and clinical knowledge to predict lymph node metastasis (LNM) in early lung adenocarcinoma. MATERIALS AND METHODS A total of 724 pathologically confirmed early invasive lung adenocarcinoma patients were retrospectively included from two centers. Clinical and CT semantic features of the patients were collected, and 3D radiomics features were extracted from nonenhanced CT images. We propose a multimodal feature-fusion DL network based on the InceptionResNetV2 architecture, which can effectively extract and integrate image and clinical knowledge to predict LNM. RESULTS A total of 524 lung adenocarcinoma patients from Center 1 were randomly divided into training (n=418) and internal validation (n=106) sets in a 4:1 ratio, while 200 lung adenocarcinoma patients from Center 2 served as the independent test set. Among the 16 collected clinical and imaging features, 8 were selected: gender, serum carcinoembryonic antigen, cytokeratin 19 fragment antigen 21-1, neuron-specific enolase, tumor size, location, density, and centrality. From the 1,595 extracted radiomics features, six key features were identified. The CS-RS-DL fusion model achieved the highest area under the receiver operating characteristic curve in both the internal validation set (0.877) and the independent test set (0.906) compared with other models. DeLong test results for the independent test set indicated that the CS-RS-DL model significantly outperformed the clinical model (0.844), radiomics model (0.850), CS-RS model (0.872), single DL model (0.848), and CS-DL model (0.875) (all P<0.05). Additionally, the CS-RS-DL model exhibited the highest sensitivity (0.941) and average precision (0.642). CONCLUSION The knowledge derived from clinical data, radiomics, and DL is complementary in predicting LNM in lung adenocarcinoma. Integrating clinical and radiomics scores through DL can significantly improve the accuracy of lymph node status assessment.
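
The fusion of a deep image branch with clinical and radiomics scores can be sketched as late concatenation before the classifier head. The sketch below uses a ResNet-18 branch and 14 tabular inputs (8 clinical/semantic plus 6 radiomics features, per the abstract); the paper's InceptionResNetV2-based design is more elaborate:

```python
import torch
import torch.nn as nn
from torchvision import models

class FusionNet(nn.Module):
    """Concatenate a deep image embedding with tabular clinical/radiomics
    scores before the classifier head (generic late-fusion sketch)."""
    def __init__(self, n_tabular: int = 14, n_classes: int = 2):
        super().__init__()
        backbone = models.resnet18(weights="DEFAULT")
        backbone.fc = nn.Identity()          # expose the 512-d image embedding
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(512 + n_tabular, 128), nn.ReLU(),
            nn.Linear(128, n_classes))

    def forward(self, image, tabular):
        z = self.backbone(image)
        return self.head(torch.cat([z, tabular], dim=1))

net = FusionNet()
img = torch.randn(2, 3, 224, 224)   # CT patches (stand-in)
tab = torch.randn(2, 14)            # clinical + radiomics scores (stand-in)
print(net(img, tab).shape)          # torch.Size([2, 2])
```
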
Affiliation(s)
- Chengcheng Xia
- School of Public Health, Jiangxi Medical College, Nanchang University, Nanchang 330006, China (C.X., L.D., W.C., M.H.); Jiangxi Provincial Key Laboratory of Disease Prevention and Public Health, Nanchang University, Nanchang 330006, China (C.X., L.D., W.C., M.H.)
- Minjing Zuo
- Department of Radiology, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang 330006, China (M.Z.); Intelligent Medical Imaging of Jiangxi Key Laboratory, Nanchang 330006, China (M.Z.)
- Ze Lin
- Department of Radiology, Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan 430022, China (Z.L.); Affiliated Hospital of Hubei University of Chinese Medicine, Wuhan 430022, China (Z.L.)
- Libin Deng
- School of Public Health, Jiangxi Medical College, Nanchang University, Nanchang 330006, China (C.X., L.D., W.C., M.H.); Jiangxi Provincial Key Laboratory of Disease Prevention and Public Health, Nanchang University, Nanchang 330006, China (C.X., L.D., W.C., M.H.)
- Yulian Rao
- Wanli District Center for Disease Control and Prevention of Nanchang, Nanchang 330004, China (Y.R.)
- Wenxiang Chen
- School of Public Health, Jiangxi Medical College, Nanchang University, Nanchang 330006, China (C.X., L.D., W.C., M.H.); Jiangxi Provincial Key Laboratory of Disease Prevention and Public Health, Nanchang University, Nanchang 330006, China (C.X., L.D., W.C., M.H.)
- Jinqin Chen
- Jiangxi Medical College, Nanchang University, Nanchang, China (J.C.)
- Weirong Yao
- Department of Oncology, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, China (W.Y.)
- Min Hu
- School of Public Health, Jiangxi Medical College, Nanchang University, Nanchang 330006, China (C.X., L.D., W.C., M.H.); Jiangxi Provincial Key Laboratory of Disease Prevention and Public Health, Nanchang University, Nanchang 330006, China (C.X., L.D., W.C., M.H.).

12. Arnab SP, Campelo dos Santos AL, Fumagalli M, DeGiorgio M. Efficient Detection and Characterization of Targets of Natural Selection Using Transfer Learning. Mol Biol Evol 2025;42:msaf094. [PMID: 40341942] [PMCID: PMC12062966] [DOI: 10.1093/molbev/msaf094]
Abstract
Natural selection leaves detectable patterns of altered spatial diversity within genomes, and identifying affected regions is crucial for understanding species evolution. Recently, machine learning approaches applied to raw population genomic data have been developed to uncover these adaptive signatures. Convolutional neural networks (CNNs) are particularly effective for this task, as they handle large data arrays while maintaining element correlations. However, shallow CNNs may miss complex patterns due to their limited capacity, while deep CNNs can capture these patterns but require extensive data and computational power. Transfer learning addresses these challenges by utilizing a deep CNN pretrained on a large dataset as a feature extraction tool for downstream classification and evolutionary parameter prediction. This approach reduces the requirement for extensive training data generation and computation while maintaining high performance. In this study, we developed TrIdent, a tool that uses transfer learning to enhance detection of adaptive genomic regions from image representations of multilocus variation. We evaluated TrIdent across various genetic, demographic, and adaptive settings, in addition to unphased data and other confounding factors. TrIdent demonstrated improved detection of adaptive regions compared with recent methods using similar data representations. We further explored model interpretability through class activation maps and adapted TrIdent to infer selection parameters for identified adaptive candidates. Using whole-genome haplotype data from European and African populations, TrIdent effectively recapitulated known sweep candidates and identified novel cancer- and other disease-associated genes as potential sweeps.
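
TrIdent's transfer-learning recipe, an ImageNet-pretrained CNN kept frozen as a feature extractor with only a small downstream classifier trained, can be sketched as follows; the ResNet-50 choice and head sizes are assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

# Frozen pretrained CNN as a feature extractor; only the small head trains.
extractor = models.resnet50(weights="DEFAULT")
extractor.fc = nn.Identity()                 # expose 2048-d embeddings
for p in extractor.parameters():
    p.requires_grad = False
extractor.eval()

head = nn.Sequential(nn.Linear(2048, 128), nn.ReLU(), nn.Linear(128, 2))

x = torch.randn(4, 3, 224, 224)              # stand-in multilocus-variation images
with torch.no_grad():
    feats = extractor(x)
logits = head(feats)                         # sweep vs. neutral scores
print(logits.shape)                          # torch.Size([4, 2])
```
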
Affiliation(s)
- Sandipan Paul Arnab
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA
- Matteo Fumagalli
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
- The Alan Turing Institute, London, UK
- Michael DeGiorgio
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA

13. Liang X, Han L, Zhang X, Li X, Sun Y, Tong T, Tan T, Mann R. Singular value decomposition based under-sampling pattern optimization for MRI reconstruction. Med Phys 2025. [PMID: 40296184] [DOI: 10.1002/mp.17860]
Abstract
BACKGROUND Magnetic resonance imaging (MRI) is a crucial medical imaging technique that can determine the structural and functional status of body tissues and organs. However, the prolonged MRI acquisition time increases scanning cost and limits its use in less developed areas. PURPOSE The objective of this study is to design a lightweight, data-driven under-sampling pattern for fast MRI that balances reconstruction quality against sampling time and can also be integrated with deep learning to further improve reconstruction quality. METHODS We establish a connection between k-space and the corresponding MR image through singular value decomposition (SVD). Specifically, we apply SVD to the image to decouple it into multiple components, which are sorted by energy contribution. Then, for each component, the k-space sampling points matching its energy contribution are selected sequentially. Finally, the sampling points obtained from all components are merged to obtain a mask. This mask can be used directly as a sampler or integrated into deep learning as an initial or fixed sampling pattern. RESULTS Experiments were conducted on two public datasets, and the results demonstrate that when the mask generated by our method is used directly as the sampler, MRI reconstruction quality surpasses that of state-of-the-art heuristic samplers. In addition, when integrated into deep learning models, the models converge faster and sampler performance is significantly improved. CONCLUSIONS The proposed lightweight, data-driven sampling approach avoids time-consuming parameter tuning and the construction of complex mathematical models, achieving a balance between reconstruction quality and sampling time.
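
The mask construction can be read as follows: decompose the image by SVD, rank the components by energy, and give each component a share of the sampling budget at its strongest k-space locations. A NumPy sketch under that reading (the paper's exact selection rule may differ):

```python
import numpy as np

def svd_mask(image: np.ndarray, budget: float = 0.25) -> np.ndarray:
    """Allocate a k-space sampling budget across SVD components in
    proportion to their energy, picking each component's strongest
    k-space locations (rounding leaves the budget approximate)."""
    u, s, vt = np.linalg.svd(image, full_matrices=False)
    energy = s**2 / np.sum(s**2)
    n_total = int(budget * image.size)
    mask = np.zeros(image.shape, dtype=bool)
    for k in range(len(s)):
        n_k = int(round(energy[k] * n_total))
        if n_k == 0:
            continue
        comp = s[k] * np.outer(u[:, k], vt[k])            # rank-1 component
        mag = np.abs(np.fft.fftshift(np.fft.fft2(comp)))  # its k-space energy
        mag[mask] = -1.0                                  # skip points already chosen
        idx = np.argpartition(mag.ravel(), -n_k)[-n_k:]
        mask.ravel()[idx] = True
    return mask

mask = svd_mask(np.random.rand(128, 128))
print(mask.mean())  # fraction of k-space sampled, ~0.25
```
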
Affiliation(s)
- Xinglong Liang
- The Department of Radiology and Nuclear Medicine, Radboud University Medical Centre, Nijmegen, The Netherlands
- The Department of Radiology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Luyi Han
- The Department of Radiology and Nuclear Medicine, Radboud University Medical Centre, Nijmegen, The Netherlands
- The Department of Radiology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Xinlin Zhang
- The College of Physics and Information Engineering, Fuzhou University, Fuzhou, China
- Xinnian Li
- Research Center of Space Control and Inertial Technology, Harbin, China
- Yue Sun
- Faculty of Applied Sciences, Macao Polytechnic University, Macao Special Administrative Region of China, Macao, China
- Tong Tong
- The College of Physics and Information Engineering, Fuzhou University, Fuzhou, China
- Tao Tan
- The Department of Radiology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Faculty of Applied Sciences, Macao Polytechnic University, Macao Special Administrative Region of China, Macao, China
- Ritse Mann
- The Department of Radiology and Nuclear Medicine, Radboud University Medical Centre, Nijmegen, The Netherlands
- The Department of Radiology, The Netherlands Cancer Institute, Amsterdam, The Netherlands

14. Aktar M, Tampieri D, Xiao Y, Rivaz H, Kersten-Oertel M. CASCADE-FSL: Few-shot learning for collateral evaluation in ischemic stroke. Comput Med Imaging Graph 2025;123:102550. [PMID: 40250214] [DOI: 10.1016/j.compmedimag.2025.102550]
Abstract
Assessing collateral circulation is essential in determining the best treatment for ischemic stroke patients: good collaterals open up treatment options such as thrombectomy, whereas poor collaterals can adversely affect treatment by leading to excess bleeding and, eventually, death. To reduce inter- and intra-rater variability and save time in radiologist assessments, computer-aided methods, mainly using deep neural networks, have gained popularity. The current literature demonstrates effectiveness when using balanced and extensive datasets in deep learning; however, such datasets are scarce for stroke, and the number of samples for poor collateral cases is often limited compared with those for good collaterals. We propose a novel approach called CASCADE-FSL to distinguish poor collaterals effectively. Using a small, unbalanced dataset, we employ few-shot learning with a 2D ResNet-50 backbone, designating good and intermediate cases as two normal classes and identifying poor collaterals as anomalies relative to them. Our approach achieves an overall accuracy, sensitivity, and specificity of 0.88, 0.88, and 0.89, respectively, demonstrating its effectiveness in addressing the imbalanced dataset challenge and accurately identifying poor collateral circulation cases.
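
Treating poor collaterals as anomalies against two "normal" classes can be sketched as prototype-distance scoring on ResNet-50 embeddings; this is a generic few-shot sketch, not the paper's exact training recipe:

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights="DEFAULT")
backbone.fc = nn.Identity()   # expose 2048-d embeddings
backbone.eval()

@torch.no_grad()
def anomaly_score(query, support_good, support_intermediate):
    """Distance of a query scan's embedding to the nearest 'normal'
    prototype (good or intermediate collaterals); larger scores flag
    poor collaterals as anomalies."""
    zq = backbone(query)                                   # (1, 2048)
    protos = torch.stack([backbone(support_good).mean(0),
                          backbone(support_intermediate).mean(0)])
    return torch.cdist(zq, protos).min().item()

score = anomaly_score(torch.randn(1, 3, 224, 224),
                      torch.randn(5, 3, 224, 224),   # few-shot support sets
                      torch.randn(5, 3, 224, 224))   # (stand-in tensors)
```
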
Affiliation(s)
- Mumu Aktar
- Computer Science and Software Engineering, Concordia University, 1455 De Maisonneuve Blvd, Montreal, H3G 1M8, Quebec, Canada.
- Donatella Tampieri
- Computer Science and Software Engineering, Concordia University, 1455 De Maisonneuve Blvd, Montreal, H3G 1M8, Quebec, Canada
- Yiming Xiao
- Computer Science and Software Engineering, Concordia University, 1455 De Maisonneuve Blvd, Montreal, H3G 1M8, Quebec, Canada
- Hassan Rivaz
- Computer Science and Software Engineering, Concordia University, 1455 De Maisonneuve Blvd, Montreal, H3G 1M8, Quebec, Canada
- Marta Kersten-Oertel
- Computer Science and Software Engineering, Concordia University, 1455 De Maisonneuve Blvd, Montreal, H3G 1M8, Quebec, Canada

15. Li Y, Hui L, Wang X, Zou L, Chua S. Lung nodule detection using a multi-scale convolutional neural network and global channel spatial attention mechanisms. Sci Rep 2025;15:12313. [PMID: 40210738] [PMCID: PMC11986029] [DOI: 10.1038/s41598-025-97187-w]
Abstract
Early detection of lung nodules is crucial for the prevention and treatment of lung cancer. However, current methods face challenges such as missed small nodules, variations in nodule size, and high false positive rates. To address these challenges, we propose a Global Channel Spatial Attention Mechanism (GCSAM) and, building upon it, develop a Candidate Nodule Detection Network (CNDNet) and a False Positive Reduction Network (FPRNet). CNDNet employs Res2Net as its backbone to capture multi-scale features of lung nodules, using GCSAM to fuse global contextual information, adaptively adjust feature weights, and refine processing along the spatial dimension. Additionally, we design a Hierarchical Progressive Feature Fusion (HPFF) module to effectively combine deep semantic information with shallow positional information, enabling high-sensitivity detection of nodules of varying sizes. FPRNet significantly reduces the false positive rate by accurately distinguishing true nodules from similar structures. Experimental results on the LUNA16 dataset demonstrate that our method achieves a competition performance metric (CPM) of 0.929 and a sensitivity of 0.977 at 2 false positives per scan. Compared with existing methods, the proposed method effectively reduces false positives while maintaining high sensitivity, achieving competitive results.
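
A generic channel-then-spatial attention block, in the spirit of GCSAM (the exact global-context design is not reproduced here), looks like this in PyTorch:

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Channel re-weighting followed by spatial re-weighting."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        ca = torch.sigmoid(self.mlp(x.mean(dim=(2, 3)))).view(b, c, 1, 1)
        x = x * ca                                      # channel attention
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa                                   # spatial attention

feat = torch.randn(2, 64, 32, 32)
print(ChannelSpatialAttention(64)(feat).shape)  # torch.Size([2, 64, 32, 32])
```
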
Affiliation(s)
- Yongbin Li
- Faculty of Medical Information Engineering, Zunyi Medical University, 563000, Zunyi, Guizhou, China
- Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300, Kota Samarahan, Sarawak, Malaysia
- Linhu Hui
- Faculty of Medical Information Engineering, Zunyi Medical University, 563000, Zunyi, Guizhou, China
- Xiaohua Wang
- Faculty of Medical Information Engineering, Zunyi Medical University, 563000, Zunyi, Guizhou, China
- Liping Zou
- Faculty of Medical Information Engineering, Zunyi Medical University, 563000, Zunyi, Guizhou, China
- Stephanie Chua
- Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300, Kota Samarahan, Sarawak, Malaysia.

16. Santoro-Fernandes V, Schott B, Weisman AJ, Lokre O, Cho SY, Perlman SB, Perk TG, Jeraj R. Full-Body Tumor Response Heterogeneity of Metastatic Neuroendocrine Tumor Patients Undergoing Peptide Receptor Radiopharmaceutical Therapy. J Nucl Med 2025;66:565-571. [PMID: 39947917] [DOI: 10.2967/jnumed.124.267809]
Abstract
Patients with metastatic neuroendocrine tumors (NETs) can present with hundreds of lesions, and each lesion might have a unique response pattern to peptide receptor radiopharmaceutical therapy (PRRT). This response heterogeneity has been observed but is poorly understood. In this work, we perform a quantitative analysis of longitudinal PET/CT scans to comprehensively characterize the NET response to PRRT. Methods: NET patients treated with [177Lu]Lu-DOTATATE PRRT imaged at baseline, during, and after PRRT with [68Ga]Ga-DOTATATE PET/CT were enrolled in this retrospective single-institutional study. A deep-learning model was used to identify and contour regions of nonphysiological elevated tracer uptake (lesion-regions of interest [ROIs]). An automated analysis was performed to identify, contour, and quantify the individual lesion-ROI uptake, match ROI between time points, and categorize each lesion-ROI as disappearing, decreasing (ΔSUVtotal < -30%), stable (-30% ≤ ΔSUVtotal ≤ 30%), increasing (ΔSUVtotal > 30%), or new. A patient was considered to have response heterogeneity if both new or increasing lesion-ROIs and decreasing or disappearing lesion-ROIs were present after therapy. Results: Eighteen patients who received between 2 and 7 [68Ga]Ga-DOTATATE PET/CT scans were enrolled. In total, 3,289 lesion-ROIs were contoured in the 67 scans acquired (median of 24 lesion-ROIs per image), and 1,459 lesion-ROI tracks, defined as the path that each unique lesion-ROI follows across all time points, were determined by the ROI tracking method (median of 49 tracks per patient). All patients presented with disease response heterogeneity at the first follow-up scan. All 10 patients with more than 1 follow-up scan showed nonmonotonic change in lesion-ROI uptake. Of 129 tracks containing new lesion-ROIs at the first follow-up, 80 (62%) eventually resolved on final follow-up, whereas only 12% (7/60) of the tracks with lesion-ROIs disappearing at the first follow-up scan returned on final follow-up. Conclusion: To the best of our knowledge, this is the first study to evaluate response comprehensively and quantitatively in terms of individual lesion-ROIs. Response heterogeneity was observed in 100% of the patients, which suggests that comprehensive, lesion-level, response assessment is vital for the accurate understanding of the NET response to PRRT.
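
The per-lesion response categories follow directly from the ±30% bands on ΔSUVtotal described above; a small helper makes the rule explicit (handling of zero-uptake edge cases is an assumed detail):

```python
def categorize_lesion(suv_before: float, suv_after: float, band: float = 0.30) -> str:
    """Categorize a matched lesion-ROI by its change in SUVtotal,
    using the +/-30% bands from the study."""
    if suv_before == 0:
        return "new" if suv_after > 0 else "stable"   # zero-to-zero: assumed stable
    if suv_after == 0:
        return "disappearing"
    change = (suv_after - suv_before) / suv_before
    if change < -band:
        return "decreasing"
    if change > band:
        return "increasing"
    return "stable"

print(categorize_lesion(10.0, 5.0))  # 'decreasing' (-50% change)
```
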
Affiliation(s)
- Victor Santoro-Fernandes
- Department of Medical Physics, School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin
- Brayden Schott
- Department of Medical Physics, School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin
- Steve Y Cho
- Section of Nuclear Medicine and Molecular Imaging, Department of Radiology, School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin
- Carbone Cancer Centre, University of Wisconsin, Madison, Wisconsin
- Scott B Perlman
- Section of Nuclear Medicine and Molecular Imaging, Department of Radiology, School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin
- Carbone Cancer Centre, University of Wisconsin, Madison, Wisconsin
- Robert Jeraj
- Department of Medical Physics, School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin
- Carbone Cancer Centre, University of Wisconsin, Madison, Wisconsin

17. Sammad A, Ding Z. Harnessing Multi-Omics: Integrating Radiomics and Pathomics for Predicting Microsatellite Instability in Rectal Cancer. Acad Radiol 2025;32:1946-1948. [PMID: 39955254] [DOI: 10.1016/j.acra.2025.02.015]
Affiliation(s)
- Abdul Sammad
- The Fourth School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou, PR China (A.S., Z.D.); Department of Radiology, Hangzhou First People's Hospital, Hangzhou, PR China (A.S., Z.D.)
- Zhongxiang Ding
- The Fourth School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou, PR China (A.S., Z.D.); Department of Radiology, Hangzhou First People's Hospital, Hangzhou, PR China (A.S., Z.D.).

18. Sekkat H, Khallouqi A, Rhazouani OE, Halimi A. Automated Detection of Hydrocephalus in Pediatric Head Computed Tomography Using VGG 16 CNN Deep Learning Architecture and Based Automated Segmentation Workflow for Ventricular Volume Estimation. J Imaging Inform Med 2025. [PMID: 40108068] [DOI: 10.1007/s10278-025-01482-x]
Abstract
Hydrocephalus, particularly congenital hydrocephalus in infants, remains underexplored in deep learning research. While deep learning has been widely applied to medical image analysis, few studies have specifically addressed automated classification of hydrocephalus. This study proposes a convolutional neural network (CNN) model based on the VGG16 architecture to detect hydrocephalus in infant head CT images. The model integrates an automated method for ventricular volume extraction, applying windowing, histogram equalization, and thresholding to segment the ventricles from surrounding brain structures. Morphological operations refine the segmentation, and contours are extracted for visualization and volume measurement. The dataset consists of 105 head CT scans, each with 60 slices covering the ventricular volume, resulting in 6,300 slices. Manual segmentation by three trained radiologists served as the reference standard. The automated method showed high correlation with manual measurements, with R2 values ranging from 0.94 to 0.99. The mean absolute percentage error (MAPE) ranged from 3.99% to 11.13%, and the relative root mean square error (RRMSE) from 4.56% to 13.74%. To improve model robustness, the dataset was preprocessed, normalized, and augmented with rotation, shifting, zooming, and flipping. The VGG16-based CNN used pre-trained convolutional layers with additional fully connected layers for classification, predicting hydrocephalus or normal labels. Performance evaluation using a multi-split strategy (15 independent splits) achieved a mean accuracy of 90.4% ± 1.2%. This study presents an automated approach for ventricular volume extraction and hydrocephalus detection, offering a promising tool for clinical and research applications with high accuracy and reduced observer bias.
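
The ventricular segmentation pipeline (windowing, histogram equalization, thresholding, morphology, contour extraction) can be sketched with OpenCV; the window bounds and the inverse Otsu threshold (CSF is dark in a brain window) are illustrative, not the paper's calibrated settings:

```python
import cv2
import numpy as np

def segment_ventricles(ct_slice_hu: np.ndarray):
    """Windowing -> histogram equalization -> Otsu threshold -> morphology
    -> contours; returns the mask, contours, and ventricular area in pixels."""
    windowed = np.clip(ct_slice_hu, 0, 80)                     # brain window (assumed 0-80 HU)
    img = cv2.normalize(windowed, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    img = cv2.equalizeHist(img)
    # CSF appears dark in a brain window, hence the inverse threshold
    _, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=2)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    area_px = sum(cv2.contourArea(c) for c in contours)
    return mask, contours, area_px  # volume ~ sum of slice areas x slice thickness
```
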
Affiliation(s)
- Hamza Sekkat
- Sciences and Engineering of Biomedicals, Biophysics and Health Laboratory, Higher Institute of Health Sciences, Hassan 1st University, Settat, 26000, Morocco.
- Department of Radiotherapy, International Clinic of Settat, Settat, Morocco.
- Abdellah Khallouqi
- Sciences and Engineering of Biomedicals, Biophysics and Health Laboratory, Higher Institute of Health Sciences, Hassan 1st University, Settat, 26000, Morocco
- Department of Radiology, Public Hospital of Mediouna, Mediouna, Morocco
- Department of Radiology, Private Clinic Hay Mouhamadi, Casablanca, Morocco
- Omar El Rhazouani
- Sciences and Engineering of Biomedicals, Biophysics and Health Laboratory, Higher Institute of Health Sciences, Hassan 1st University, Settat, 26000, Morocco
- Abdellah Halimi
- Sciences and Engineering of Biomedicals, Biophysics and Health Laboratory, Higher Institute of Health Sciences, Hassan 1st University, Settat, 26000, Morocco

19. Pelcat A, Le Berre A, Ben Hassen W, Debacker C, Charron S, Thirion B, Legrand L, Turc G, Oppenheim C, Benzakoun J. Generative T2*-weighted images as a substitute for true T2*-weighted images on brain MRI in patients with acute stroke. Diagn Interv Imaging 2025:S2211-5684(25)00048-8. [PMID: 40113490] [DOI: 10.1016/j.diii.2025.03.004]
Abstract
PURPOSE The purpose of this study was to validate a deep learning algorithm that generates T2*-weighted images from diffusion-weighted (DW) images and to compare its performance with that of true T2*-weighted images for hemorrhage detection on MRI in patients with acute stroke. MATERIALS AND METHODS This single-center, retrospective study included DW and T2*-weighted images obtained less than 48 hours after symptom onset in consecutive patients admitted for acute stroke. Datasets were divided into training (60%), validation (20%), and test (20%) sets, with stratification by stroke type (hemorrhagic/ischemic). A generative adversarial network was trained to produce generative T2*-weighted images from DW images. Concordance between true and generative T2*-weighted images for hemorrhage detection was independently graded by two readers into three categories (parenchymal hematoma, hemorrhagic infarct, or no hemorrhage), and discordances were resolved by consensus reading. Sensitivity, specificity, and accuracy of generative T2*-weighted images were estimated using true T2*-weighted images as the standard of reference. RESULTS A total of 1,491 MRI sets from 939 patients (487 women, 452 men) with a median age of 71 years (first quartile, 57; third quartile, 81; range: 21-101) were included. In the test set (n = 300), there were no differences between true and generative T2*-weighted images for intraobserver reproducibility (κ = 0.97 [95% CI: 0.95-0.99] vs. 0.95 [95% CI: 0.92-0.97]; P = 0.27) or interobserver reproducibility (κ = 0.93 [95% CI: 0.90-0.97] vs. 0.92 [95% CI: 0.88-0.96]; P = 0.64). After consensus reading, concordance between true and generative T2*-weighted images was excellent (κ = 0.92; 95% CI: 0.91-0.96). Generative T2*-weighted images achieved 90% sensitivity (73/81; 95% CI: 81-96), 97% specificity (213/219; 95% CI: 94-99), and 95% accuracy (286/300; 95% CI: 92-97) for the diagnosis of any cerebral hemorrhage (hemorrhagic infarct or parenchymal hematoma). CONCLUSION Generative and true T2*-weighted images did not differ in diagnostic performance for hemorrhage detection in patients with acute stroke, and generative images may be used to shorten MRI protocols.
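
The agreement statistics reported here (Cohen's κ, sensitivity, specificity) are straightforward to reproduce from per-scan gradings; a sketch with hypothetical labels:

```python
from sklearn.metrics import cohen_kappa_score, confusion_matrix

# Hypothetical per-scan gradings: 0 = no hemorrhage, 1 = hemorrhagic
# infarct, 2 = parenchymal hematoma.
true_t2 = [0, 0, 2, 1, 0, 1, 2, 0]
gen_t2  = [0, 0, 2, 1, 0, 0, 2, 0]

kappa = cohen_kappa_score(true_t2, gen_t2)   # three-category concordance

# Collapse to any-hemorrhage vs. none for sensitivity/specificity
y_true = [int(v > 0) for v in true_t2]
y_pred = [int(v > 0) for v in gen_t2]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity, specificity = tp / (tp + fn), tn / (tn + fp)
print(round(kappa, 2), sensitivity, specificity)
```
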
Collapse
Affiliation(s)
- Antoine Pelcat
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, IMA-BRAIN, 75014 Paris, France
| | - Alice Le Berre
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, IMA-BRAIN, 75014 Paris, France; GHU Paris Psychiatrie et Neurosciences, Hôpital Sainte Anne, Department of Neuroradiology, 75014 Paris, France
| | - Wagih Ben Hassen
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, IMA-BRAIN, 75014 Paris, France; GHU Paris Psychiatrie et Neurosciences, Hôpital Sainte Anne, Department of Neuroradiology, 75014 Paris, France
| | - Clement Debacker
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, IMA-BRAIN, 75014 Paris, France; GHU Paris Psychiatrie et Neurosciences, Hôpital Sainte Anne, Department of Neuroradiology, 75014 Paris, France
| | - Sylvain Charron
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, IMA-BRAIN, 75014 Paris, France
| | - Bertrand Thirion
- INRIA, CEA, Université Paris-Saclay, MIND Team, 91400 Palaiseau, France
| | - Laurence Legrand
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, IMA-BRAIN, 75014 Paris, France; GHU Paris Psychiatrie et Neurosciences, Hôpital Sainte Anne, Department of Neuroradiology, 75014 Paris, France
| | - Guillaume Turc
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, Stroke Team, 75014 Paris, France; GHU Paris Psychiatrie et Neurosciences, Hôpital Sainte Anne, Department of Neurology, 75014 Paris, France
| | - Catherine Oppenheim
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, IMA-BRAIN, 75014 Paris, France; GHU Paris Psychiatrie et Neurosciences, Hôpital Sainte Anne, Department of Neuroradiology, 75014 Paris, France
| | - Joseph Benzakoun
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, IMA-BRAIN, 75014 Paris, France; GHU Paris Psychiatrie et Neurosciences, Hôpital Sainte Anne, Department of Neuroradiology, 75014 Paris, France.
| |
Collapse
|
20
|
Fasihi-Shirehjini O, Babapour-Mofrad F. Effectiveness of ConvNeXt variants in diabetic feet diagnosis using plantar thermal images. QUANTITATIVE INFRARED THERMOGRAPHY JOURNAL 2025; 22:155-172. [DOI: 10.1080/17686733.2024.2310794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/18/2023] [Accepted: 01/23/2024] [Indexed: 10/11/2024]
|
21
|
Arnab SP, Dos Santos ALC, Fumagalli M, DeGiorgio M. Efficient detection and characterization of targets of natural selection using transfer learning. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.03.05.641710. [PMID: 40093065 PMCID: PMC11908262 DOI: 10.1101/2025.03.05.641710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 03/19/2025]
Abstract
Natural selection leaves detectable patterns of altered spatial diversity within genomes, and identifying affected regions is crucial for understanding species evolution. Recently, machine learning approaches applied to raw population genomic data have been developed to uncover these adaptive signatures. Convolutional neural networks (CNNs) are particularly effective for this task, as they handle large data arrays while maintaining element correlations. However, shallow CNNs may miss complex patterns due to their limited capacity, while deep CNNs can capture these patterns but require extensive data and computational power. Transfer learning addresses these challenges by utilizing a deep CNN pre-trained on a large dataset as a feature extraction tool for downstream classification and evolutionary parameter prediction. This approach reduces extensive training data generation requirements and computational needs while maintaining high performance. In this study, we developed TrIdent, a tool that uses transfer learning to enhance detection of adaptive genomic regions from image representations of multilocus variation. We evaluated TrIdent across various genetic, demographic, and adaptive settings, in addition to unphased data and other confounding factors. TrIdent demonstrated improved detection of adaptive regions compared to recent methods using similar data representations. We further explored model interpretability through class activation maps and adapted TrIdent to infer selection parameters for identified adaptive candidates. Using whole-genome haplotype data from European and African populations, TrIdent effectively recapitulated known sweep candidates and identified novel cancer- and other disease-associated genes as potential sweeps.
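The core transfer-learning recipe here — a deep CNN pre-trained on a large generic dataset, frozen, and used as a feature extractor feeding a small trainable head — can be sketched briefly. The ResNet-50/ImageNet backbone and the two-layer head below are illustrative assumptions, not TrIdent's actual configuration.
```python
import torch
import torch.nn as nn
from torchvision import models

# Frozen pre-trained backbone: feature extraction only, no extensive retraining
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = nn.Identity()          # expose the 2048-dim pooled features
for p in backbone.parameters():
    p.requires_grad = False

# Small trainable head: sweep vs. neutral classification (assumed layout)
head = nn.Sequential(
    nn.Linear(2048, 256), nn.ReLU(), nn.Dropout(0.3), nn.Linear(256, 2),
)

x = torch.randn(8, 3, 224, 224)      # image representations of genomic windows
with torch.no_grad():
    feats = backbone(x)
logits = head(feats)                 # only head parameters receive gradients
```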
Collapse
Affiliation(s)
- Sandipan Paul Arnab
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA
| | | | - Matteo Fumagalli
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
- The Alan Turing Institute, London, UK
| | - Michael DeGiorgio
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA
| |
Collapse
|
22
|
Deebani W, Aziz L, Aziz A, Basri WS, Alawad WM, Althubiti SA. Synergistic transfer learning and adversarial networks for breast cancer diagnosis: benign vs. invasive classification. Sci Rep 2025; 15:7461. [PMID: 40032913 PMCID: PMC11876678 DOI: 10.1038/s41598-025-90288-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2024] [Accepted: 02/11/2025] [Indexed: 03/05/2025] Open
Abstract
Current breast cancer diagnosis methods often face limitations such as high cost, time consumption, and inter-observer variability. To address these challenges, this research proposes a novel deep learning framework that leverages generative adversarial networks (GANs) for data augmentation and transfer learning to enhance breast cancer classification using convolutional neural networks (CNNs). The framework uses a two-stage augmentation approach. First, a conditional Wasserstein GAN (cWGAN) generates synthetic breast cancer images based on clinical data, enhancing training stability and enabling targeted feature incorporation. Second, traditional augmentation techniques (e.g., rotation, flipping, cropping) are applied to both original and synthetic images. A multi-scale transfer learning technique is also employed, integrating three pre-trained CNNs (DenseNet-201, NasNetMobile, ResNet-101) with a multi-scale feature enrichment scheme, allowing the model to capture features at various scales. The framework was evaluated on the BreakHis dataset, achieving an accuracy of 99.2% for binary classification and 98.5% for multi-class classification, significantly outperforming existing methods. This framework offers a more efficient, cost-effective, and accurate approach for breast cancer diagnosis. Future work will focus on generalizing the framework to clinical datasets and integrating it into diagnostic workflows.
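The multi-scale transfer-learning component — pooled features from several pre-trained CNNs concatenated before a shared classifier — can be sketched as follows. Two torchvision backbones stand in here; the paper's third network, NASNetMobile, is not shipped by torchvision and is omitted, and the feature-enrichment scheme itself is not reproduced.
```python
import torch
import torch.nn as nn
from torchvision import models

densenet = models.densenet201(weights=models.DenseNet201_Weights.IMAGENET1K_V1)
resnet = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V2)
densenet.classifier = nn.Identity()      # 1920-dim pooled features
resnet.fc = nn.Identity()                # 2048-dim pooled features
for p in list(densenet.parameters()) + list(resnet.parameters()):
    p.requires_grad = False              # use both networks as extractors

classifier = nn.Linear(1920 + 2048, 2)   # benign vs. invasive (assumed head)

x = torch.randn(4, 3, 224, 224)          # histopathology patch batch
with torch.no_grad():
    fused = torch.cat([densenet(x), resnet(x)], dim=1)
logits = classifier(fused)
```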
Collapse
Affiliation(s)
- Wejdan Deebani
- Department of Mathematics, College of Science and Arts, King Abdul Aziz University, 21911, Rabigh, Saudi Arabia
| | - Lubna Aziz
- Department of Artificial Intelligence, FEST, Iqra University Karachi, Karachi, Pakistan.
- Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Johor, Malaysia.
| | - Arshad Aziz
- Department of Artificial Intelligence, FEST, Iqra University Karachi, Karachi, Pakistan
| | - Wael Sh Basri
- College of Business Administration, Management Information System, Northern Border University, Arar, Saudi Arabia
| | - Wedad M Alawad
- Department of Information Technology, College of Computer, Qassim University, Buraydah, 51452, Saudi Arabia
| | - Sara A Althubiti
- Department of Computer Science, College of Computer and Information Sciences, Majmaah University, 11952, Al-Majmaah, Saudi Arabia
| |
Collapse
|
23
|
Han K, Lou Q, Lu F. A semi-supervised domain adaptation method with scale-aware and global-local fusion for abdominal multi-organ segmentation. J Appl Clin Med Phys 2025; 26:e70008. [PMID: 39924943 PMCID: PMC11905256 DOI: 10.1002/acm2.70008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2024] [Revised: 11/02/2024] [Accepted: 11/27/2024] [Indexed: 02/11/2025] Open
Abstract
BACKGROUND Abdominal multi-organ segmentation remains a challenging task. Semi-supervised domain adaptation (SSDA) has emerged as an innovative solution. However, SSDA frameworks based on UNet struggle to capture multi-scale and global information. PURPOSE Our work aimed to propose a novel SSDA method to achieve more accurate abdominal multi-organ segmentation with limited labeled target domain data, which has a superior ability to capture the multi-scale features and integrate local and global information effectively. METHODS The proposed network is based on UNet. In the encoder part, a scale-aware with domain-specific batch normalization (SAD) module is integrated to adaptively extract multi-scale features and to get better generalization across source and target domains. In the bottleneck part, a global-local fusion (GLF) module is utilized for capturing and integrating both local and global information. They are integrated into the framework of self-ensembling mean-teacher (SE-MT) to enhance the model's capability to learn common features across source and target domains. RESULTS To validate the performance of the proposed model, we evaluated it on the public CHAOS and BTCV datasets. For CHAOS, the proposed method obtains an average DSC of 88.97% and ASD of 1.12 mm with only 20% labeled target data. For BTCV, it achieves an average DSC of 88.95% and ASD of 1.13 mm with 20% labeled target data. Compared with the state-of-the-art methods, DSC and ASD increased by at least 0.72% and 0.33 mm on CHAOS, 1.29% and 0.06 mm on BTCV, respectively. Ablation studies were also conducted to verify the contribution of each component of the model. The proposed method achieves a DSC improvement of 3.17% over the baseline with 20% labeled target data. CONCLUSION The proposed SSDA method for abdominal multi-organ segmentation has a powerful ability to extract multi-scale and more global features, significantly improving segmentation accuracy and robustness.
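The self-ensembling mean-teacher (SE-MT) component admits a compact sketch: the teacher is an exponential moving average (EMA) of the student, and a consistency loss aligns their predictions on unlabeled target-domain images. The segmentation network is abstracted to a single convolution here; the SAD and GLF modules are not reproduced.
```python
import copy
import torch
import torch.nn.functional as F

def ema_update(teacher, student, alpha=0.99):
    # Teacher weights track a slow exponential moving average of the student
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(alpha).add_(s, alpha=1 - alpha)

student = torch.nn.Conv2d(1, 5, 3, padding=1)   # stand-in for the UNet variant
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad = False

x = torch.randn(2, 1, 64, 64)                   # unlabeled target-domain batch
noisy = x + 0.1 * torch.randn_like(x)           # perturbed student input
consistency = F.mse_loss(student(noisy).softmax(1), teacher(x).softmax(1))
consistency.backward()                          # gradients flow to student only
ema_update(teacher, student)
```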
Collapse
Affiliation(s)
- Kexin Han
- School of Science, Zhejiang University of Science and Technology, Hangzhou, China
| | - Qiong Lou
- School of Science, Zhejiang University of Science and Technology, Hangzhou, China
| | - Fang Lu
- School of Science, Zhejiang University of Science and Technology, Hangzhou, China
| |
Collapse
|
24
|
Shao X, Niu R. Bridging Artificial Intelligence Models to Clinical Practice: Challenges in Lung Cancer Prediction. Radiol Artif Intell 2025; 7:e250080. [PMID: 40072120 DOI: 10.1148/ryai.250080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/07/2025]
Affiliation(s)
- Xiaonan Shao
- Third Affiliated Hospital of Soochow University, No. 185 Juqian Street, Changzhou 213003, China
| | - Rong Niu
- Third Affiliated Hospital of Soochow University, No. 185 Juqian Street, Changzhou 213003, China
| |
Collapse
|
25
|
Giannakopoulos II, Carluccio G, Keerthivasan MB, Koerzdoerfer G, Lakshmanan K, De Moura HL, Serrallés JEC, Lattanzi R. MR electrical properties mapping using vision transformers and canny edge detectors. Magn Reson Med 2025; 93:1117-1131. [PMID: 39415436 PMCID: PMC11955224 DOI: 10.1002/mrm.30338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2024] [Revised: 09/24/2024] [Accepted: 09/24/2024] [Indexed: 10/18/2024]
Abstract
PURPOSE We developed a 3D vision transformer-based neural network to reconstruct electrical properties (EP) from magnetic resonance measurements. THEORY AND METHODS Our network uses the magnitude of the transmit magnetic field of a birdcage coil, the associated transceive phase, and a Canny edge mask that identifies the object boundaries as inputs to compute the EP maps. We trained our network on a dataset of 10 000 synthetic tissue-mimicking phantoms and fine-tuned it on a dataset of 11 000 realistic head models. We assessed performance on in-distribution simulated data and out-of-distribution head models, with and without synthetic lesions. We further evaluated our network in experiments with an inhomogeneous phantom and a volunteer. RESULTS The conductivity and permittivity maps had an average peak normalized absolute error (PNAE) of 1.3% and 1.7% for the synthetic phantoms, respectively. For the realistic heads, the average PNAE for the conductivity and permittivity was 1.8% and 2.7%, respectively. The location of synthetic lesions was accurately identified, with reconstructed conductivity and permittivity values within 15% and 25% of the ground-truth, respectively. The conductivity and permittivity for the phantom experiment yielded 2.7% and 2.1% average PNAEs with respect to probe-measured values, respectively. The in vivo EP reconstruction truthfully preserved the subject's anatomy with average values over the entire head similar to the expected literature values. CONCLUSION We introduced a new learning-based approach for reconstructing EP from MR measurements obtained with a birdcage coil, marking an important step towards the development of clinically-usable in vivo EP reconstruction protocols.
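The input construction lends itself to a short sketch: the transmit-field magnitude, the transceive phase, and a Canny edge mask marking object boundaries are stacked as channels for the network. The array names, normalization, and Canny thresholds below are illustrative assumptions.
```python
import cv2
import numpy as np

b1_mag = np.random.rand(128, 128).astype(np.float32)   # stand-in |B1+| map
phase = np.random.rand(128, 128).astype(np.float32)    # transceive phase map

# Canny operates on 8-bit images, so rescale the magnitude map first
mag_u8 = cv2.normalize(b1_mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
edges = (cv2.Canny(mag_u8, threshold1=50, threshold2=150) / 255.0).astype(np.float32)

net_input = np.stack([b1_mag, phase, edges], axis=0)
print(net_input.shape)   # (3, 128, 128): channel stack fed to the transformer
```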
Collapse
Affiliation(s)
- Ilias I. Giannakopoulos
- The Bernard and Irene Schwartz Center for Biomedical Imaging and Center for Advanced Imaging Innovation and Research (CAIR), Department of Radiology, New York University Grossman School of Medicine, New York, New York, USA
| | | | | | | | - Karthik Lakshmanan
- The Bernard and Irene Schwartz Center for Biomedical Imaging and Center for Advanced Imaging Innovation and Research (CAIR), Department of Radiology, New York University Grossman School of Medicine, New York, New York, USA
| | - Hector L. De Moura
- The Bernard and Irene Schwartz Center for Biomedical Imaging and Center for Advanced Imaging Innovation and Research (CAIR), Department of Radiology, New York University Grossman School of Medicine, New York, New York, USA
| | - José E. Cruz Serrallés
- The Bernard and Irene Schwartz Center for Biomedical Imaging and Center for Advanced Imaging Innovation and Research (CAIR), Department of Radiology, New York University Grossman School of Medicine, New York, New York, USA
| | - Riccardo Lattanzi
- The Bernard and Irene Schwartz Center for Biomedical Imaging and Center for Advanced Imaging Innovation and Research (CAIR), Department of Radiology, New York University Grossman School of Medicine, New York, New York, USA
| |
Collapse
|
26
|
Buga R, Buzea CG, Agop M, Ochiuz L, Vasincu D, Popa O, Rusu DI, Știrban I, Eva L. Streamlit Application and Deep Learning Model for Brain Metastasis Monitoring After Gamma Knife Treatment. Biomedicines 2025; 13:423. [PMID: 40002836 PMCID: PMC11852629 DOI: 10.3390/biomedicines13020423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2025] [Revised: 02/05/2025] [Accepted: 02/08/2025] [Indexed: 02/27/2025] Open
Abstract
Background/Objective: This study explores the use of AI-powered radiomics to classify and monitor brain metastasis progression and regression following Gamma Knife radiosurgery (GKRS) based on MRI imaging. A clinical decision support application was developed using Streamlit to provide real-time, AI-driven predictions for treatment monitoring. Methods: MRI scans from 60 patients (3194 images) were analyzed using a transfer learning-enhanced AlexNet deep learning model. Class imbalance was mitigated through dynamic class weighting and data augmentation to ensure equitable performance across all classes. Optimized preprocessing pipelines ensured dataset standardization. Model performance was evaluated using accuracy, precision, recall, F1-scores, and AUC, with 95% confidence intervals. Additionally, a comparative analysis of Gamma Knife radiosurgery (GKRS) outcomes and predictive modeling demonstrated strong correlations between tumor volume evolution and treatment response. The AI predictions and visualizations were integrated into a Streamlit-based application to ensure clinical usability and ease of access. The AI-driven approach effectively classified progression and regression patterns, reinforcing its potential for clinical integration. Results: The transfer learning model achieved flawless classification accuracy (100%; 95% CI: 100-100%) along with perfect precision, recall, and F1-scores. The AUC score of 1.0000 (95% CI: 1.0000-1.0000) indicated excellent discrimination between progression and regression cases. Compared to the baseline AlexNet model (99.53% accuracy; 95% CI: 98.90-100.00%), the TL-enhanced model resolved all misclassifications. Tumor volume analysis identified the baseline size as a key predictor of progression (Pearson r = 0.795, p < 0.0001). The training time (420.12 s) was shorter than that of ResNet-50 (443.38 s) and EfficientNet-B0 (439.87 s), while the model achieved equivalent metrics. Despite 100% accuracy, the model requires multi-center validation for generalizability. Conclusions: This study demonstrates that transfer learning with dynamic class weighting provides a highly accurate and reliable framework for monitoring brain metastases post-GKRS. The Streamlit-based AI application enhances clinical decision-making by improving diagnostic precision and reducing variability. Explainable AI techniques, such as Grad-CAM visualizations, improve interpretability and support clinical adoption. These findings emphasize the transformative potential of AI in personalized treatment strategies, extending applications to genomic profiling, survival modeling, and longitudinal follow-ups for brain metastasis management.
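The dynamic class weighting used against class imbalance follows a standard pattern: weights inversely proportional to class frequency are passed to the loss while the pretrained network's final layer is replaced. The sketch below uses the usual torchvision AlexNet head swap; the paper's exact weighting schedule may differ.
```python
import torch
import torch.nn as nn
from torchvision import models

labels = torch.tensor([0] * 900 + [1] * 100)       # imbalanced toy label set
counts = torch.bincount(labels).float()
weights = counts.sum() / (len(counts) * counts)    # inverse-frequency weights

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(4096, 2)           # progression vs. regression
criterion = nn.CrossEntropyLoss(weight=weights)    # rare class counts more

x = torch.randn(4, 3, 224, 224)                    # MRI slices resized for AlexNet
loss = criterion(model(x), torch.tensor([0, 1, 0, 0]))
loss.backward()
```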
Collapse
Affiliation(s)
- Răzvan Buga
- Clinical Emergency Hospital “Prof. Dr. Nicolae Oblu” Iași, 700309 Iași, Romania; (R.B.); (I.Ș.); (L.E.)
| | - Călin Gh. Buzea
- Clinical Emergency Hospital “Prof. Dr. Nicolae Oblu” Iași, 700309 Iași, Romania; (R.B.); (I.Ș.); (L.E.)
- National Institute of Research and Development for Technical Physics, IFT Iași, 700050 Iași, Romania
| | - Maricel Agop
- Physics Department, Technical University “Gheorghe Asachi” Iași, 700050 Iași, Romania;
| | - Lăcrămioara Ochiuz
- Faculty of Medicine, University of Medicine and Pharmacy “Grigore T. Popa” Iași, 700115 Iași, Romania; (L.O.); (D.V.); (O.P.)
| | - Decebal Vasincu
- Faculty of Medicine, University of Medicine and Pharmacy “Grigore T. Popa” Iași, 700115 Iași, Romania; (L.O.); (D.V.); (O.P.)
| | - Ovidiu Popa
- Faculty of Medicine, University of Medicine and Pharmacy “Grigore T. Popa” Iași, 700115 Iași, Romania; (L.O.); (D.V.); (O.P.)
| | - Dragoș Ioan Rusu
- Faculty of Science, University “Vasile Alecsandri” of Bacău, 600115 Bacău, Romania;
| | - Ioana Știrban
- Clinical Emergency Hospital “Prof. Dr. Nicolae Oblu” Iași, 700309 Iași, Romania; (R.B.); (I.Ș.); (L.E.)
| | - Lucian Eva
- Clinical Emergency Hospital “Prof. Dr. Nicolae Oblu” Iași, 700309 Iași, Romania; (R.B.); (I.Ș.); (L.E.)
- Faculty of Medicine, Apollonia University, 700511 Iași, Romania
| |
Collapse
|
27
|
Afzal S, Rauf M, Ashraf S, Bin Md Ayob S, Ahmad Arfeen Z. CART-ANOVA-Based Transfer Learning Approach for Seven Distinct Tumor Classification Schemes with Generalization Capability. Diagnostics (Basel) 2025; 15:378. [PMID: 39941307 PMCID: PMC11816775 DOI: 10.3390/diagnostics15030378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2024] [Revised: 12/31/2024] [Accepted: 01/22/2025] [Indexed: 02/16/2025] Open
Abstract
Background/Objectives: Deep transfer learning, leveraging convolutional neural networks (CNNs), has become a pivotal tool for brain tumor detection. However, key challenges include optimizing hyperparameter selection and enhancing the generalization capabilities of models. This study introduces a novel CART-ANOVA (Cartesian-ANOVA) hyperparameter tuning framework, which differs from traditional optimization methods by systematically integrating statistical significance testing (ANOVA) with the Cartesian product of hyperparameter values. This approach ensures robust and precise parameter tuning by evaluating the interaction effects between hyperparameters, such as batch size and learning rate, rather than relying solely on grid or random search. Additionally, it implements seven distinct classification schemes for brain tumors, aimed at improving diagnostic accuracy and robustness. Methods: The proposed framework employs a ResNet18-based knowledge transfer learning (KTL) model trained on a primary dataset, with 20% allocated for testing. Hyperparameters were optimized using CART-ANOVA analysis, and statistical validation ensured robust parameter selection. The model's generalization and robustness were evaluated on an independent second dataset. Performance metrics, including precision, accuracy, sensitivity, and F1 score, were compared against other pre-trained CNN models. Results: The framework achieved exceptional testing accuracy of 99.65% for four-class classification and 98.05% for seven-class classification on the source 1 dataset. It also maintained high generalization capabilities, achieving accuracies of 98.77% and 96.77% on the source 2 datasets for the same tasks. The incorporation of seven distinct classification schemes further enhanced variability and diagnostic capability, surpassing the performance of other pre-trained models. Conclusions: The CART-ANOVA hyperparameter tuning framework, combined with a ResNet18-based KTL approach, significantly improves brain tumor classification accuracy, robustness, and generalization. These advancements demonstrate strong potential for enhancing diagnostic precision and informing effective treatment strategies, contributing to advancements in medical imaging and AI-driven healthcare solutions.
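The CART-ANOVA idea can be illustrated in a few lines: evaluate the full Cartesian product of hyperparameter values with repeated runs, then use one-way ANOVA to test whether a factor has a statistically significant effect on accuracy. Here train_and_eval is a hypothetical stand-in for an actual training run, and the grids are illustrative.
```python
import itertools
import random
from scipy.stats import f_oneway

def train_and_eval(batch_size, lr):
    # Placeholder returning a noisy accuracy; a real run would train the model
    return 0.9 + 0.02 * random.random() - 0.01 * (lr == 1e-2)

batch_sizes = [8, 16, 32]
learning_rates = [1e-2, 1e-3, 1e-4]

results = {}
for bs, lr in itertools.product(batch_sizes, learning_rates):
    results[(bs, lr)] = [train_and_eval(bs, lr) for _ in range(5)]  # repeats

# Group accuracies by learning rate and test that factor's effect
groups = [
    [acc for (bs, lr), accs in results.items() if lr == value for acc in accs]
    for value in learning_rates
]
f_stat, p_value = f_oneway(*groups)
print(f"learning-rate effect: F = {f_stat:.2f}, p = {p_value:.4f}")
```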
Collapse
Affiliation(s)
- Shiraz Afzal
- Department of Electronic Engineering, Dawood University of Engineering and Technology, Karachi 74800, Pakistan;
| | - Muhammad Rauf
- Department of Electronic Engineering, Dawood University of Engineering and Technology, Karachi 74800, Pakistan;
| | - Shahzad Ashraf
- Department of Computer Science, DHA Suffa University, Karachi 75500, Pakistan
| | - Shahrin Bin Md Ayob
- Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia
| | - Zeeshan Ahmad Arfeen
- Department of Electrical Engineering, The Islamia University of Bahawalpur (IUB), Bahawalpur 63100, Pakistan
| |
Collapse
|
28
|
Rey-Barroso L, Vilaseca M, Royo S, Díaz-Doutón F, Lihacova I, Bondarenko A, Burgos-Fernández FJ. Training State-of-the-Art Deep Learning Algorithms with Visible and Extended Near-Infrared Multispectral Images of Skin Lesions for the Improvement of Skin Cancer Diagnosis. Diagnostics (Basel) 2025; 15:355. [PMID: 39941285 PMCID: PMC11817636 DOI: 10.3390/diagnostics15030355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2024] [Revised: 01/20/2025] [Accepted: 01/21/2025] [Indexed: 02/16/2025] Open
Abstract
An estimated 60,000 people die annually from skin cancer, predominantly melanoma. The diagnosis of skin lesions primarily relies on visual inspection, but around half of lesions pose diagnostic challenges, often necessitating a biopsy. Non-invasive detection methods like Computer-Aided Diagnosis (CAD) using Deep Learning (DL) are becoming more prominent. This study focuses on the use of multispectral (MS) imaging to improve the skin lesion classification performance of DL models. We trained two convolutional neural networks (CNNs) on a dataset of MS images: a simple CNN with six two-dimensional (2D) convolutional layers, and a custom VGG-16 model with three-dimensional (3D) convolutional layers. The dataset included spectral cubes from 327 nevi, 112 melanomas, and 70 basal cell carcinomas (BCCs). We compared the performance of the CNNs trained with full spectral cubes versus only the three spectral bands closest to RGB wavelengths. The custom VGG-16 model achieved a classification accuracy of 71% with full spectral cubes and 45% with RGB-simulated images. The simple CNN achieved an accuracy of 83% with full spectral cubes and 36% with RGB-simulated images, demonstrating the added value of spectral information. These results confirm that MS imaging provides complementary information beyond traditional RGB images, contributing to improved classification performance. Although the dataset size remains a limitation, the findings indicate that MS imaging has significant potential for enhancing skin lesion diagnosis, paving the way for further advancements as larger datasets become available.
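The value of 3D convolutions here is that the spectral dimension is treated as depth, so kernels mix spatial and spectral context instead of seeing each band independently. The toy network below illustrates that layout; it is not the paper's custom VGG-16 variant.
```python
import torch
import torch.nn as nn

cube = torch.randn(2, 1, 10, 224, 224)   # (batch, channel, bands, H, W)

model = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool3d((1, 2, 2)),             # pool spatially, keep spectral bands
    nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    nn.Linear(32, 3),                    # nevus vs. melanoma vs. BCC
)
logits = model(cube)                     # shape (2, 3)
```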
Collapse
Affiliation(s)
- Laura Rey-Barroso
- Centre for Sensors, Instruments and Systems Development, Universitat Politècnica de Catalunya, 08222 Terrassa, Spain; (M.V.); (S.R.); (F.D.-D.); (F.J.B.-F.)
| | - Meritxell Vilaseca
- Centre for Sensors, Instruments and Systems Development, Universitat Politècnica de Catalunya, 08222 Terrassa, Spain; (M.V.); (S.R.); (F.D.-D.); (F.J.B.-F.)
| | - Santiago Royo
- Centre for Sensors, Instruments and Systems Development, Universitat Politècnica de Catalunya, 08222 Terrassa, Spain; (M.V.); (S.R.); (F.D.-D.); (F.J.B.-F.)
| | - Fernando Díaz-Doutón
- Centre for Sensors, Instruments and Systems Development, Universitat Politècnica de Catalunya, 08222 Terrassa, Spain; (M.V.); (S.R.); (F.D.-D.); (F.J.B.-F.)
| | - Ilze Lihacova
- Institute of Atomic Physics and Spectroscopy, University of Latvia, 1004 Riga, Latvia;
| | - Andrey Bondarenko
- Faculty of Computer Science and Information Technology, Riga Technical University, 1048 Riga, Latvia;
| | - Francisco J. Burgos-Fernández
- Centre for Sensors, Instruments and Systems Development, Universitat Politècnica de Catalunya, 08222 Terrassa, Spain; (M.V.); (S.R.); (F.D.-D.); (F.J.B.-F.)
| |
Collapse
|
29
|
Zhang X, Zhao J, Zong D, Ren H, Gao C. Taming vision transformers for clinical laryngoscopy assessment. J Biomed Inform 2025; 162:104766. [PMID: 39827999 DOI: 10.1016/j.jbi.2024.104766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2024] [Revised: 12/09/2024] [Accepted: 12/26/2024] [Indexed: 01/22/2025]
Abstract
OBJECTIVE Laryngoscopy, essential for diagnosing laryngeal cancer (LCA), faces challenges due to high inter-observer variability and the reliance on endoscopist expertise. Distinguishing precancerous from early-stage cancerous lesions is particularly challenging, even for experienced practitioners, given their similar appearances. This study aims to enhance laryngoscopic image analysis to improve early screening/detection of cancer or precancerous conditions. METHODS We propose MedFormer, a laryngeal cancer classification method based on the Vision Transformer (ViT). To address data scarcity, MedFormer employs a customized transfer learning approach that leverages the representational power of pre-trained transformers. This method enables robust out-of-domain generalization by fine-tuning a minimal set of additional parameters. RESULTS MedFormer exhibits sensitivity-specificity values of 98%-89% for identifying precancerous lesions (leukoplakia) and 89%-97% for detecting cancer, significantly surpassing CNN counterparts. Additionally, when compared to the two selected ViT-based models, MedFormer also demonstrates superior performance. It also outperforms physician visual evaluation (PVE) in certain scenarios and at least matches PVE performance in all cases. Visualizations using class activation maps (CAM) and deformable patches demonstrate MedFormer's interpretability, aiding clinicians in understanding the model's predictions. CONCLUSION We highlight the potential of vision transformers in clinical laryngoscopic assessments, presenting MedFormer as an effective method for the early detection of laryngeal cancer.
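Fine-tuning only a minimal set of added parameters on top of a frozen pre-trained ViT can be sketched as below. MedFormer's actual added modules differ; the frozen torchvision ViT and the simple linear head are assumptions standing in for them.
```python
import torch
import torch.nn as nn
from torchvision import models

vit = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
for p in vit.parameters():
    p.requires_grad = False              # keep the pre-trained representation
vit.heads = nn.Linear(768, 3)            # normal / leukoplakia / cancer (assumed)

trainable = [p for p in vit.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

x = torch.randn(2, 3, 224, 224)          # laryngoscopy frames
loss = nn.functional.cross_entropy(vit(x), torch.tensor([0, 2]))
loss.backward()
optimizer.step()
```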
Collapse
Affiliation(s)
- Xinzhu Zhang
- School of Computer Science and Technology, East China Normal University, North Zhongshan Road 3663, Shanghai, 200062, China
| | - Jing Zhao
- School of Computer Science and Technology, East China Normal University, North Zhongshan Road 3663, Shanghai, 200062, China.
| | - Daoming Zong
- School of Computer Science and Technology, East China Normal University, North Zhongshan Road 3663, Shanghai, 200062, China
| | - Henglei Ren
- Eye & ENT Hospital of Fudan University, Fenyang Road 83, Shanghai, 200000, China
| | - Chunli Gao
- Eye & ENT Hospital of Fudan University, Fenyang Road 83, Shanghai, 200000, China.
| |
Collapse
|
30
|
Xu S, Li W, Li Z, Zhao T, Zhang B. Facing Differences of Similarity: Intra- and Inter-Correlation Unsupervised Learning for Chest X-Ray Anomaly Detection. IEEE TRANSACTIONS ON MEDICAL IMAGING 2025; 44:801-814. [PMID: 39283780 DOI: 10.1109/tmi.2024.3461231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/05/2025]
Abstract
Anomaly detection can significantly aid doctors in interpreting chest X-rays. The commonly used strategy involves using a pre-trained network to extract features from normal data to establish feature representations. However, when a pre-trained network is applied to more detailed X-rays, differences of similarity can limit the robustness of these feature representations. Therefore, we propose an intra- and inter-correlation learning framework for chest X-ray anomaly detection. Firstly, to better leverage the similar anatomical structure information in chest X-rays, we introduce the Anatomical-Feature Pyramid Fusion Module for feature fusion. This module aims to obtain fusion features with both local details and global contextual information. These fusion features are initialized by a trainable feature mapper and stored in a feature bank to serve as centers for learning. Furthermore, to address the Facing Differences of Similarity (FDS) problem introduced by the pre-trained network, we propose an intra- and inter-correlation learning strategy: 1) We use intra-correlation learning to establish intra-correlation between mapped features of individual images and semantic centers, thereby initially discovering lesions; 2) We employ inter-correlation learning to establish inter-correlation between mapped features of different images, further mitigating the differences of similarity introduced by the pre-trained network, and achieving effective detection results even in diverse chest disease environments. Finally, a comparison with 18 state-of-the-art methods on three datasets demonstrates the superiority and effectiveness of the proposed method across various scenarios.
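The feature-bank idea at the heart of this framework can be reduced to a nearest-center score: features of normal training images are stored as centers, and a test image far from every center is flagged. The fusion module and the full intra-/inter-correlation losses are beyond this sketch.
```python
import torch
import torch.nn.functional as F

bank = F.normalize(torch.randn(256, 128), dim=1)    # centers from normal data
test_feat = F.normalize(torch.randn(1, 128), dim=1) # mapped test-image feature

sim = test_feat @ bank.T                 # cosine similarity to every center
anomaly_score = 1 - sim.max()            # far from all centers -> likely abnormal
print(float(anomaly_score))
```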
Collapse
|
31
|
Huang GH, Lai WC, Chen TB, Hsu CC, Chen HY, Wu YC, Yeh LR. Deep Convolutional Neural Networks on Multiclass Classification of Three-Dimensional Brain Images for Parkinson's Disease Stage Prediction. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2025:10.1007/s10278-025-01402-z. [PMID: 39849204 DOI: 10.1007/s10278-025-01402-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/30/2024] [Revised: 12/11/2024] [Accepted: 01/01/2025] [Indexed: 01/25/2025]
Abstract
Parkinson's disease (PD), a degenerative disorder of the central nervous system, is commonly diagnosed using functional medical imaging techniques such as single-photon emission computed tomography (SPECT). In this study, we utilized two SPECT data sets (n = 634 and n = 202) from different hospitals to develop a model capable of accurately predicting PD stages, a multiclass classification task. We used the entire three-dimensional (3D) brain images as input and experimented with various model architectures. Initially, we treated the 3D images as sequences of two-dimensional (2D) slices and fed them sequentially into 2D convolutional neural network (CNN) models pretrained on ImageNet, averaging the outputs to obtain the final predicted stage. We also applied 3D CNN models pretrained on Kinetics-400. Additionally, we incorporated an attention mechanism to account for the varying importance of different slices in the prediction process. To further enhance model efficacy and robustness, we simultaneously trained the two data sets using weight sharing, a technique known as cotraining. Our results demonstrated that 2D models pretrained on ImageNet outperformed 3D models pretrained on Kinetics-400, and models utilizing the attention mechanism outperformed both 2D and 3D models. The cotraining technique proved effective in improving model performance when the cotraining data sets were sufficiently large.
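The slice-wise strategy with attention can be sketched compactly: each 2D slice passes through a pretrained 2D CNN, and the slice outputs are combined either by plain averaging or by learned attention weights. The backbone, feature size, and number of PD stages below are illustrative assumptions.
```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()                  # 512-dim feature per slice
attn = nn.Linear(512, 1)                     # scores each slice's relevance
classifier = nn.Linear(512, 4)               # PD stages (count assumed)

volume = torch.randn(1, 40, 3, 224, 224)     # (batch, slices, C, H, W)
b, s = volume.shape[:2]
feats = backbone(volume.flatten(0, 1)).view(b, s, 512)

weights = attn(feats).softmax(dim=1)         # attention over slices
pooled = (weights * feats).sum(dim=1)        # replaces plain feats.mean(dim=1)
logits = classifier(pooled)
```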
Collapse
Affiliation(s)
- Guan-Hua Huang
- Institute of Statistics, National Yang Ming Chiao Tung University, Hsinchu, Taiwan.
| | - Wan-Chen Lai
- Institute of Statistics, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| | - Tai-Been Chen
- Department of Radiological Technology, Faculty of Medical Technology, Teikyo University, Tokyo, Japan
- Infinity Co. Ltd, Taoyuan, Taiwan
- Der Lih Fuh Co. Ltd, Taoyuan, Taiwan
| | - Chien-Chin Hsu
- Department of Nuclear Medicine, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
| | - Huei-Yung Chen
- Department of Nuclear Medicine, E-Da Hospital, I-Shou University, Kaohsiung, Taiwan
| | - Yi-Chen Wu
- Department of Nuclear Medicine, E-Da Hospital, I-Shou University, Kaohsiung, Taiwan
- Department of Medical Imaging and Radiological Sciences, I-Shou University, Kaohsiung, Taiwan
| | - Li-Ren Yeh
- Department of Anesthesiology, E-Da Cancer Hospital, I-Shou University, Kaohsiung, Taiwan
| |
Collapse
|
32
|
Zhang M, Deng Y, Zhou Q, Gao J, Zhang D, Pan X. Advancing micro-nano supramolecular assembly mechanisms of natural organic matter by machine learning for unveiling environmental geochemical processes. ENVIRONMENTAL SCIENCE. PROCESSES & IMPACTS 2025; 27:24-45. [PMID: 39745028 DOI: 10.1039/d4em00662c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/23/2025]
Abstract
The nano-self-assembly of natural organic matter (NOM) profoundly influences the occurrence and fate of NOM and pollutants in large-scale complex environments. Machine learning (ML) offers a promising and robust tool for interpreting and predicting the processes, structures and environmental effects of NOM self-assembly. This review seeks to provide a tutorial-like compilation of data source determination, algorithm selection, model construction, interpretability analyses, applications and challenges for big-data-based ML aimed at elucidating NOM self-assembly mechanisms in the environment. The results from advanced nano-submicron-scale spatial chemical analytical technologies are suggested as input data, as they provide combined information on molecular interactions and structural visualization. Existing ML algorithms need to handle multi-scale and multi-modal data, necessitating the development of new algorithmic frameworks. Interpretable supervised models are crucial owing to their strong capacity for quantifying structure-property-effect relationships and bridging the gap between simply data-driven ML and complicated NOM assembly practice. The necessity of and challenges in adopting ML to understand the geochemical behaviors and bioavailability of pollutants, as well as the elemental cycling processes that result from NOM self-assembly patterns, are then discussed and emphasized. Finally, a research framework integrating ML, experiments and theoretical simulation is proposed for comprehensively and efficiently understanding NOM self-assembly-involved environmental issues.
Collapse
Affiliation(s)
- Ming Zhang
- College of Geoinformatics, Zhejiang University of Technology, Hangzhou, 310014, P. R. China.
| | - Yihui Deng
- College of Environment, Zhejiang University of Technology, Hangzhou, 310014, P. R. China.
| | - Qianwei Zhou
- College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, P. R. China
| | - Jing Gao
- College of Environment, Zhejiang University of Technology, Hangzhou, 310014, P. R. China.
| | - Daoyong Zhang
- College of Geoinformatics, Zhejiang University of Technology, Hangzhou, 310014, P. R. China.
| | - Xiangliang Pan
- College of Environment, Zhejiang University of Technology, Hangzhou, 310014, P. R. China.
| |
Collapse
|
33
|
Qiong L, Chaofan L, Jinnan T, Liping C, Jianxiang S. Medical image segmentation based on frequency domain decomposition SVD linear attention. Sci Rep 2025; 15:2833. [PMID: 39843905 PMCID: PMC11754837 DOI: 10.1038/s41598-025-86315-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2024] [Accepted: 01/09/2025] [Indexed: 01/24/2025] Open
Abstract
Convolutional Neural Networks (CNNs) have achieved remarkable segmentation accuracy in medical image segmentation tasks. However, the Vision Transformer (ViT) model, with its capability of extracting global information, offers a significant advantage in contextual information compared to the limited receptive field of convolutional kernels in CNNs. Despite this, ViT models struggle to fully detect and extract high-frequency signals, such as textures and boundaries, in medical images. These high-frequency features are essential in medical imaging, as targets like tumors and pathological organs exhibit significant differences in texture and boundaries across different stages. Additionally, the high resolution of medical images leads to computational complexity in the self-attention mechanism of ViTs. To address these limitations, we propose a medical image segmentation network framework based on frequency domain decomposition using a Laplacian pyramid. This approach selectively computes attention features for high-frequency signals in the original image to enhance spatial structural information effectively. During attention feature computation, we introduce Singular Value Decomposition (SVD) to extract an effective representation matrix from the original image, which is then applied in the attention computation process for linear projection. This method reduces computational complexity while preserving essential features. We demonstrated the segmentation effectiveness and superiority of our model on the Abdominal Multi-Organ Segmentation dataset and the Dermatological Disease dataset; on the Synapse dataset, our model achieved a Dice score of 82.68 and an HD of 17.23 mm. Experimental results indicate that our model consistently exhibits segmentation effectiveness and improved accuracy across various datasets.
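Two ingredients of the method admit a short sketch: a Laplacian pyramid level isolating the high-frequency band (textures, boundaries), and a truncated SVD providing a low-rank projection matrix. How these feed the linear attention layer is an assumption summarized only in the comments below.
```python
import torch
import torch.nn.functional as F

def gaussian_blur(x):
    # Fixed 3x3 Gaussian kernel applied per channel
    k = torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]]) / 16.0
    k = k.view(1, 1, 3, 3).repeat(x.shape[1], 1, 1, 1)
    return F.conv2d(x, k, padding=1, groups=x.shape[1])

img = torch.randn(1, 1, 64, 64)      # grayscale medical image
low = gaussian_blur(img)
high = img - low                     # Laplacian level: high-frequency band

# Truncated SVD of the high-frequency map yields a low-rank basis, usable
# for the linear projections inside an attention layer
u, s, vh = torch.linalg.svd(high[0, 0], full_matrices=False)
rank = 8
basis = vh[:rank]                    # (rank, 64) representation matrix
projected = high[0, 0] @ basis.T     # (64, rank) reduced features
print(projected.shape)
```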
Collapse
Affiliation(s)
- Liu Qiong
- School of Medical Imaging, Jiangsu Medical College, Yancheng, 224005, Jiangsu, China.
| | - Li Chaofan
- Affiliated Hospital 6 of Nantong University, Yancheng Third People's Hospital, Yancheng, 224001, Jiangsu, China
| | - Teng Jinnan
- Affiliated Hospital 6 of Nantong University, Yancheng Third People's Hospital, Yancheng, 224001, Jiangsu, China
| | - Chen Liping
- Affiliated Hospital 6 of Nantong University, Yancheng Third People's Hospital, Yancheng, 224001, Jiangsu, China
| | - Song Jianxiang
- Affiliated Hospital 6 of Nantong University, Yancheng Third People's Hospital, Yancheng, 224001, Jiangsu, China.
| |
Collapse
|
34
|
Fang X, Chong CF, Wong KL, Simões M, Ng BK. Investigating the key principles in two-step heterogeneous transfer learning for early laryngeal cancer identification. Sci Rep 2025; 15:2146. [PMID: 39820368 PMCID: PMC11739633 DOI: 10.1038/s41598-024-84836-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2024] [Accepted: 12/27/2024] [Indexed: 01/19/2025] Open
Abstract
Data scarcity in medical images makes transfer learning a common approach in computer-aided diagnosis. Some disease classification tasks can rely on large homogeneous public datasets to train the transferred model, while others cannot, e.g., endoscopic laryngeal cancer image identification. Distinguished from most current works, this work pioneers a two-step heterogeneous transfer learning (THTL) framework for laryngeal cancer identification and summarizes the fundamental principles of intermediate domain selection. For heterogeneity and clear vascular representation, diabetic retinopathy images were chosen as THTL's intermediate domain. The experimental results reveal two vital principles of intermediate domain selection for future studies: 1) the size of the intermediate domain alone is not a sufficient condition for improved transfer learning performance; 2) even distinct vascular features in the intermediate domain do not guarantee improved performance in the target domain. We observe that radial vascular patterns benefit benign classification, whereas twisted and tangled patterns align more with malignant classification. Additionally, to compensate for the absence of twisted patterns in the intermediate domain, we propose the Step-Wise Fine-Tuning (SWFT) technique, guided by Layer Class Activation Map (LayerCAM) visualizations, which yields a 20.4% accuracy increase over THTL alone, higher even than fine-tuning all layers.
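Step-Wise Fine-Tuning amounts to progressively unfreezing blocks, starting from the classifier head and moving toward earlier layers, rather than fine-tuning everything at once. The backbone, stage boundaries, and schedule in this sketch are illustrative.
```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(512, 2)                     # benign vs. malignant

stages = [model.fc, model.layer4, model.layer3]  # head first, then deeper
for p in model.parameters():
    p.requires_grad = False

for stage_idx, block in enumerate(stages):
    for p in block.parameters():
        p.requires_grad = True                   # unfreeze one more stage
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"stage {stage_idx}: {trainable} trainable parameters")
    # ...train for a few epochs here before unfreezing the next stage...
```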
Collapse
Affiliation(s)
- Xinyi Fang
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Coimbra, 3000, Portugal
| | - Chak Fong Chong
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Coimbra, 3000, Portugal
| | - Kei Long Wong
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Department of Computer Science and Engineering, University of Bologna, Bologna, 40100, Italy
| | - Marco Simões
- Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Coimbra, 3000, Portugal
| | - Benjamin K Ng
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China.
| |
Collapse
|
35
|
Maruyama S, Mizutani F, Watanabe H. Novel approach for quality control testing of medical displays using deep learning technology. Biomed Phys Eng Express 2025; 11:025004. [PMID: 39773861 DOI: 10.1088/2057-1976/ada6bd] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2024] [Accepted: 01/07/2025] [Indexed: 01/11/2025]
Abstract
Objectives: In digital image diagnosis using medical displays, it is crucial to rigorously manage display devices to ensure appropriate image quality and diagnostic safety. The aim of this study was to develop a model for the efficient quality control (QC) of medical displays, specifically addressing the measurement items of contrast response and maximum luminance as part of constancy testing, and to evaluate its performance. In addition, the study focused on whether these tasks could be addressed using a multitasking strategy. Methods: The model used in this study was constructed by fine-tuning a pretrained model and expanding it to a multioutput configuration that could perform both contrast response classification and maximum luminance regression. QC images displayed on a medical display were captured using a smartphone, and these images served as the input for the model. The performance was evaluated using the area under the receiver operating characteristic curve (AUC) for the classification task. For the regression task, correlation coefficients and Bland-Altman analysis were applied. We investigated the impact of different architectures and verified the performance of multi-task models against single-task models as a baseline. Results: Overall, the classification task achieved a high AUC of approximately 0.9. The correlation coefficients for the regression tasks ranged between 0.6 and 0.7 on average. Although the model tended to underestimate the maximum luminance values, the error margin was consistently within 5% for all conditions. Conclusion: These results demonstrate the feasibility of implementing an efficient QC system for medical displays and the usefulness of a multitask-based method. Thus, this study provides valuable insights into the potential to reduce the workload associated with medical-device management and into the development of QC systems for medical devices, highlighting the importance of future efforts to improve their accuracy and applicability.
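The multioutput configuration described above is a shared backbone with one classification head and one regression head, trained on a summed loss. The backbone choice, feature size, and loss weighting in this sketch are assumptions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

backbone = models.mobilenet_v3_small(
    weights=models.MobileNet_V3_Small_Weights.IMAGENET1K_V1)
backbone.classifier = nn.Identity()      # expose 576-dim pooled features
cls_head = nn.Linear(576, 2)             # contrast response: pass / fail
reg_head = nn.Linear(576, 1)             # maximum luminance (cd/m^2)

x = torch.randn(4, 3, 224, 224)          # smartphone photos of QC test patterns
f = backbone(x)
loss = F.cross_entropy(cls_head(f), torch.tensor([0, 1, 0, 1])) + \
       F.mse_loss(reg_head(f).squeeze(1), torch.tensor([350., 420., 380., 400.]))
loss.backward()                          # both heads train through one backbone
```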
Collapse
Affiliation(s)
- Sho Maruyama
- Department of Radiological Technology, Gunma Prefectural College of Health Sciences, Maebashi, Gunma, Japan
| | - Fumiya Mizutani
- Department of Radiology, Mie University Hospital, Tsu, Mie, Japan
| | - Haruyuki Watanabe
- Department of Radiological Technology, Gunma Prefectural College of Health Sciences, Maebashi, Gunma, Japan
| |
Collapse
|
36
|
Kishor Kumar Reddy C, Kaza VS, Madana Mohana R, Alhameed M, Jeribi F, Alam S, Shuaib M. Detecting anomalies in smart wearables for hypertension: a deep learning mechanism. Front Public Health 2025; 12:1426168. [PMID: 39850864 PMCID: PMC11755415 DOI: 10.3389/fpubh.2024.1426168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2024] [Accepted: 11/25/2024] [Indexed: 01/25/2025] Open
Abstract
Introduction The growing demand for real-time, affordable, and accessible healthcare has underscored the need for advanced technologies that can provide timely health monitoring. One such area is predicting arterial blood pressure (BP) using non-invasive methods, which is crucial for managing cardiovascular diseases. This research aims to address the limitations of current healthcare systems, particularly in remote areas, by leveraging deep learning techniques in Smart Health Monitoring (SHM). Methods This paper introduces a novel neural network architecture, ResNet-LSTM, to predict BP from physiological signals such as electrocardiogram (ECG) and photoplethysmogram (PPG). The combination of ResNet's feature extraction capabilities and LSTM's sequential data processing offers improved prediction accuracy. Comprehensive error analysis was conducted, and the model was validated using Leave-One-Out (LOO) cross-validation and an additional dataset. Results The ResNet-LSTM model showed superior performance, particularly with PPG data, achieving a mean absolute error (MAE) of 6.2 mmHg and a root mean square error (RMSE) of 8.9 mmHg for BP prediction. Despite the higher computational cost (~4,375 FLOPs), the improved accuracy and generalization across datasets demonstrate the model's robustness and suitability for continuous BP monitoring. Discussion The results confirm the potential of integrating ResNet-LSTM into SHM for accurate and non-invasive BP prediction. This approach also highlights the need for accurate anomaly detection in continuous monitoring systems, especially for wearable devices. Future work will focus on enhancing cloud-based infrastructures for real-time analysis and refining anomaly detection models to improve patient outcomes.
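The ResNet-LSTM hybrid can be sketched as convolutional residual blocks extracting local waveform features from a PPG window, followed by an LSTM over the resulting sequence and a regression head for systolic/diastolic BP. All layer sizes here are illustrative, not the paper's configuration.
```python
import torch
import torch.nn as nn

class ResBlock1d(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv1d(ch, ch, 5, padding=2)
        self.conv2 = nn.Conv1d(ch, ch, 5, padding=2)
        self.act = nn.ReLU()

    def forward(self, x):                     # residual connection around convs
        return self.act(x + self.conv2(self.act(self.conv1(x))))

class ResNetLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv1d(1, 32, 7, stride=2, padding=3)
        self.blocks = nn.Sequential(ResBlock1d(32), ResBlock1d(32))
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.head = nn.Linear(64, 2)          # systolic and diastolic BP

    def forward(self, ppg):                   # ppg: (batch, 1, samples)
        feats = self.blocks(self.stem(ppg))   # (batch, 32, time)
        out, _ = self.lstm(feats.transpose(1, 2))
        return self.head(out[:, -1])          # last hidden state -> BP pair

model = ResNetLSTM()
pred = model(torch.randn(8, 1, 1000))         # 8 PPG windows of 1000 samples
```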
Collapse
Affiliation(s)
| | | | - R. Madana Mohana
- Department of Artificial Intelligence and Data Science, Chaithanya Bharathi Institute of Technology, Hyderabad, Telangana, India
| | - Mohammed Alhameed
- Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
| | - Fathe Jeribi
- Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
| | - Shadab Alam
- Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
| | - Mohammed Shuaib
- Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
| |
Collapse
|
37
|
Jiang Y, Ebrahimpour L, Després P, Manem VS. A benchmark of deep learning approaches to predict lung cancer risk using national lung screening trial cohort. Sci Rep 2025; 15:1736. [PMID: 39799226 PMCID: PMC11724919 DOI: 10.1038/s41598-024-84193-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2024] [Accepted: 12/20/2024] [Indexed: 01/15/2025] Open
Abstract
Deep learning (DL) methods have demonstrated remarkable effectiveness in assisting with lung cancer risk prediction tasks using computed tomography (CT) scans. However, the lack of comprehensive comparison and validation of state-of-the-art (SOTA) models in practical settings limits their clinical application. This study aims to review and analyze current SOTA deep learning models for lung cancer risk prediction (malignant-benign classification). To evaluate the models' general performance, we selected 253 out of 467 patients from a subset of the National Lung Screening Trial (NLST) who had CT scans without contrast, which are the most commonly used, and divided them into training and test cohorts. The CT scans were preprocessed into 2D-image and 3D-volume formats according to their nodule annotations. We evaluated ten 3D and eleven 2D SOTA deep learning models, which were pretrained on large-scale general-purpose datasets (Kinetics and ImageNet) and radiological datasets (3DSeg-8, nnUnet and RadImageNet), for their lung cancer risk prediction performance. Our results showed that 3D-based deep learning models generally perform better than 2D models. On the test cohort, the best-performing 3D model achieved an AUROC of 0.86, while the best 2D model reached 0.79. The lowest AUROCs for the 3D and 2D models were 0.70 and 0.62, respectively. Furthermore, pretraining on large-scale radiological image datasets did not show the expected performance advantage over pretraining on general-purpose datasets. Both 2D and 3D deep learning models can handle lung cancer risk prediction tasks effectively, although 3D models generally have superior performance to their 2D competitors. Our findings highlight the importance of carefully selecting pretrained datasets and model architectures for lung cancer risk prediction. Overall, these results have important implications for the development and clinical integration of DL-based tools in lung cancer screening.
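One of the benchmarked configurations — a 3D CNN pretrained on Kinetics-400 and adapted to nodule malignancy prediction — can be sketched with torchvision's video models. The specific backbone, patch size, and channel handling here are assumptions, not the study's exact setup.
```python
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18, R3D_18_Weights

model = r3d_18(weights=R3D_18_Weights.KINETICS400_V1)
model.fc = nn.Linear(model.fc.in_features, 2)   # benign vs. malignant

ct_patch = torch.randn(2, 1, 32, 112, 112)      # (batch, C, depth, H, W) nodule crops
x = ct_patch.repeat(1, 3, 1, 1, 1)              # replicate CT channel to match RGB stem
logits = model(x)
```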
Collapse
Affiliation(s)
- Yifan Jiang
- Centre de recherche du CHU de Québec-Université Laval, Quebec City, Canada
- Département de biologie moléculaire, de biochimie médicale et de pathologie, Université Laval, Quebec City, Canada
- Institute Intelligence and Data, Université Laval, Quebec City, Canada
| | - Leyla Ebrahimpour
- Centre de recherche du CHU de Québec-Université Laval, Quebec City, Canada
- Département de biologie moléculaire, de biochimie médicale et de pathologie, Université Laval, Quebec City, Canada
- Département de physique, de génie physique et d'optique, Université Laval, Quebec City, Canada
- Centre de recherche de l'Institut universitaire de cardiologie et de pneumologie de Québec, Quebec City, Canada
- Institute Intelligence and Data, Université Laval, Quebec City, Canada
| | - Philippe Després
- Département de physique, de génie physique et d'optique, Université Laval, Quebec City, Canada
- Centre de recherche de l'Institut universitaire de cardiologie et de pneumologie de Québec, Quebec City, Canada
- Big Data Research Center, Université Laval, Quebec City, Canada
- Institute Intelligence and Data, Université Laval, Quebec City, Canada
| | - Venkata Sk Manem
- Centre de recherche du CHU de Québec-Université Laval, Quebec City, Canada.
- Département de biologie moléculaire, de biochimie médicale et de pathologie, Université Laval, Quebec City, Canada.
- Cancer Research Center, Université Laval, Quebec City, Canada.
- Big Data Research Center, Université Laval, Quebec City, Canada.
- Institute Intelligence and Data, Université Laval, Quebec City, Canada.
| |
Collapse
|
38
|
Lee H, Cho S, Song J, Kim H, Shin Y. An Enhanced Approach Using AGS Network for Skin Cancer Classification. SENSORS (BASEL, SWITZERLAND) 2025; 25:394. [PMID: 39860766 PMCID: PMC11769443 DOI: 10.3390/s25020394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/25/2024] [Revised: 12/19/2024] [Accepted: 01/10/2025] [Indexed: 01/27/2025]
Abstract
Skin cancer accounts for over 40% of all cancer diagnoses worldwide. However, accurately diagnosing skin cancer remains challenging for dermatologists, as multiple types of skin cancer often appear visually similar. The diagnostic accuracy of dermatologists ranges between 62% and 80%. Although AI models have shown promise in assisting with skin cancer classification in various studies, obtaining the large-scale medical image datasets required for AI model training is not straightforward. To address this limitation, this study proposes the AGS network, designed to overcome the challenges of small datasets and enhance the performance of skin cancer classifiers. The AGS network integrates three key modules: Augmentation (A), GAN (G), and Segmentation (S). It was evaluated using eight deep learning classifiers-GoogLeNet, DenseNet201, ResNet50, MobileNet V3, EfficientNet B0, ViT, EfficientNet V2, and Swin Transformers-on the HAM10000 dataset. Five model configurations were also tested to assess the contribution of each module. The results showed that all eight classifiers demonstrated consistent performance improvements with the AGS network. In particular, EfficientNet V2 + AGS achieved the most significant performance gains over the baseline model, with an increase of +0.1808 in Accuracy and +0.1674 in F1-Score. Among all configurations, ResNet50+AGS achieved the best overall performance, with an Accuracy of 95.87% and an F1-Score of 95.73%. While most previous studies focused on single augmentation methods, this study demonstrates the effectiveness of combining multiple augmentation techniques within an integrated framework. The AGS network demonstrates how integrating diverse methods can improve the performance of skin cancer classification models.
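Of the three AGS stages, the Augmentation (A) stage is the most directly sketchable: several augmentation techniques combined into one pipeline applied to each training image. The specific transforms and parameters below are assumptions; the GAN (G) and Segmentation (S) stages are separate models not reproduced here.
```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=20),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])
# usage: augmented = augment(pil_image) for each HAM10000 training image
```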
Collapse
Affiliation(s)
- Hwanyoung Lee
- Department of Computer Science and Information Engineering, The Catholic University of Korea, Bucheon 14662, Republic of Korea;
| | - Seeun Cho
- Department of Artificial Intelligence, The Catholic University of Korea, Bucheon 14662, Republic of Korea; (S.C.); (J.S.)
| | - Jiyoon Song
- Department of Artificial Intelligence, The Catholic University of Korea, Bucheon 14662, Republic of Korea; (S.C.); (J.S.)
| | - Hoyoung Kim
- Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA;
| | - Youjin Shin
- Department of Data Science, The Catholic University of Korea, Bucheon 14662, Republic of Korea
| |
|
39
|
Li C, Liao Y, Ding C, Ye Z. MDAPT: Multi-Modal Depth Adversarial Prompt Tuning to Enhance the Adversarial Robustness of Visual Language Models. SENSORS (BASEL, SWITZERLAND) 2025; 25:258. [PMID: 39797049 PMCID: PMC11723442 DOI: 10.3390/s25010258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/03/2024] [Revised: 12/23/2024] [Accepted: 01/02/2025] [Indexed: 01/13/2025]
Abstract
Large visual language models like Contrastive Language-Image Pre-training (CLIP), despite their excellent performance, are highly vulnerable to adversarial examples. This work investigates the accuracy and robustness of visual language models (VLMs) from a novel multi-modal perspective. We propose a multi-modal fine-tuning method called Multi-modal Depth Adversarial Prompt Tuning (MDAPT), which guides the generation of visual prompts through text prompts to improve the accuracy and robustness of visual language models. In extensive experiments under a perturbation budget of ϵ = 4/255, the method yielded significant gains on three datasets: compared with traditionally hand-designed prompts, accuracy and robustness increased by an average of 17.84% and 10.85%, respectively. These improvements also hold across different attack methods. Under an efficient tuning setting, average accuracy and robustness improved by 32.16% and 21.00%, respectively, over hand-designed prompts across three different attacks.
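Robustness numbers like those above are typically measured against gradient-based attacks at a fixed L-infinity budget. The sketch below implements standard projected gradient descent (PGD) at the quoted ϵ = 4/255 with a toy stand-in classifier; it illustrates the kind of attack such methods are evaluated against, not the MDAPT prompt-tuning procedure itself.

```python
# Standard PGD attack sketch; the classifier is a toy stand-in.
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=4/255, alpha=1/255, steps=10):
    """Craft adversarial examples inside an L-infinity ball of radius eps."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()           # ascend the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to ball
        x_adv = x_adv.clamp(0, 1)                              # valid pixels
    return x_adv.detach()

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
x_adv = pgd_attack(model, x, y)
robust_acc = (model(x_adv).argmax(1) == y).float().mean()
print(robust_acc)  # "robustness" = accuracy on adversarial inputs
```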
Affiliation(s)
- Chao Li
- School of Computer Science, Hubei University of Technology, Wuhan 430068, China; (C.L.); (Y.L.); (Z.Y.)
| | - Yonghao Liao
- School of Computer Science, Hubei University of Technology, Wuhan 430068, China; (C.L.); (Y.L.); (Z.Y.)
| | - Caichang Ding
- School of Computer and Information Science, Hubei Engineering University, Xiaogan 432000, China
| | - Zhiwei Ye
- School of Computer Science, Hubei University of Technology, Wuhan 430068, China; (C.L.); (Y.L.); (Z.Y.)
| |
|
40
|
Silva-Rodríguez J, Chakor H, Kobbi R, Dolz J, Ben Ayed I. A Foundation Language-Image Model of the Retina (FLAIR): encoding expert knowledge in text supervision. Med Image Anal 2025; 99:103357. [PMID: 39418828 DOI: 10.1016/j.media.2024.103357] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2023] [Revised: 05/06/2024] [Accepted: 09/23/2024] [Indexed: 10/19/2024]
Abstract
Foundation vision-language models are currently transforming computer vision and are on the rise in medical imaging, fueled by their very promising generalization capabilities. However, the initial attempts to transfer this new paradigm to medical imaging have shown less impressive performances than those observed in other domains, due to the significant domain shift and the complex, expert domain knowledge inherent to medical-imaging tasks. Motivated by the need for domain-expert foundation models, we present FLAIR, a pre-trained vision-language model for universal retinal fundus image understanding. To this end, we compiled 38 open-access, mostly categorical fundus imaging datasets from various sources, with up to 101 different target conditions and 288,307 images. We integrate the expert's domain knowledge in the form of descriptive textual prompts, during both pre-training and zero-shot inference, enhancing the less-informative categorical supervision of the data. This textual expert knowledge, compiled from the relevant clinical literature and community standards, describes the fine-grained features of the pathologies as well as the hierarchies and dependencies between them. We report comprehensive evaluations, which illustrate the benefit of integrating expert knowledge and the strong generalization capabilities of FLAIR under difficult scenarios with domain shifts or unseen categories. When adapted with a lightweight linear probe, FLAIR outperforms fully-trained, dataset-focused models, more so in the few-shot regimes. Interestingly, FLAIR outperforms larger-scale generalist image-language models and retina domain-specific self-supervised networks by a wide margin, which emphasizes the potential of embedding experts' domain knowledge and the limitations of generalist models in medical imaging. The pre-trained model is available at: https://github.com/jusiro/FLAIR.
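The prompt-based zero-shot mechanism FLAIR builds on can be summarized in a few lines: normalized image and text embeddings are compared by temperature-scaled cosine similarity, with descriptive expert prompts standing in for bare category names. The encoders and prompts below are random placeholders for illustration, not the released FLAIR weights (see the repository above).

```python
# CLIP-style zero-shot classification sketch with placeholder encoders.
import torch
import torch.nn as nn
import torch.nn.functional as F

image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))
text_encoder = nn.Embedding(1000, 512)  # stand-in for a real text tower

# Expert-knowledge prompts describe findings rather than bare labels.
prompts = [
    "no diabetic retinopathy",
    "mild diabetic retinopathy with microaneurysms",
    "severe diabetic retinopathy with hemorrhages",
]
prompt_ids = torch.arange(len(prompts))  # pretend tokenization: one id each

image = torch.rand(1, 3, 224, 224)
img_emb = F.normalize(image_encoder(image), dim=-1)      # (1, 512)
txt_emb = F.normalize(text_encoder(prompt_ids), dim=-1)  # (3, 512)

logits = img_emb @ txt_emb.t() / 0.07  # temperature-scaled cosine similarity
pred = logits.argmax(dim=-1).item()
print(prompts[pred])
```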
Affiliation(s)
| | | | | | - Jose Dolz
- ÉTS Montréal, Québec, Canada; Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CR-CHUM), Québec, Canada
| | - Ismail Ben Ayed
- ÉTS Montréal, Québec, Canada; Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CR-CHUM), Québec, Canada
| |
|
41
|
Drazinos P, Gatos I, Katsakiori PF, Tsantis S, Syrmas E, Spiliopoulos S, Karnabatidis D, Theotokas I, Zoumpoulis P, Hazle JD, Kagadis GC. Comparison of deep learning schemes in grading non-alcoholic fatty liver disease using B-mode ultrasound hepatorenal window images with liver biopsy as the gold standard. Phys Med 2025; 129:104862. [PMID: 39626614 DOI: 10.1016/j.ejmp.2024.104862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/05/2024] [Revised: 10/11/2024] [Accepted: 11/27/2024] [Indexed: 01/07/2025] Open
Abstract
BACKGROUND/INTRODUCTION To evaluate the performance of pre-trained deep learning schemes (DLS) in hepatic steatosis (HS) grading of Non-Alcoholic Fatty Liver Disease (NAFLD) patients, using as input B-mode US images containing right kidney (RK) cortex and liver parenchyma (LP) areas indicated by an expert radiologist. METHODS A total of 112 consecutively enrolled, biopsy-validated NAFLD patients underwent a regular abdominal B-mode US examination. For each patient, a radiologist obtained a B-mode US image containing the RK cortex and LP and marked a point between the RK and LP, around which a window was automatically cropped. The cropped image dataset was augmented using up-sampling, and the augmented and non-augmented datasets were sorted by HS grade. Each dataset was split into training (70%) and testing (30%) sets and fed separately as input to the InceptionV3, MobileNetV2, ResNet50, DenseNet201, and NASNetMobile pre-trained DLS. A receiver operating characteristic (ROC) analysis of hepatorenal index (HRI) measurements made by the radiologist from the same cropped images was used for comparison with the performance of the DLS. RESULTS On the test data, the DLS reached 89.15%-93.75% accuracy when comparing HS grades S0-S1 vs. S2-S3 and 79.69%-91.21% accuracy for S0 vs. S1 vs. S2 vs. S3 with augmentation, and 80.45%-82.73% accuracy when comparing S0-S1 vs. S2-S3 and 59.54%-63.64% accuracy for S0 vs. S1 vs. S2 vs. S3 without augmentation. The performance of the radiologists' HRI measurement after ROC analysis was 82%, 91.56%, and 96.19% for thresholds of S ≥ S1, S ≥ S2, and S = S3, respectively. CONCLUSION All networks achieved high performance in HS assessment. DenseNet201 with the use of augmented data seems to be the most efficient supplementary tool for NAFLD diagnosis and grading.
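The preprocessing-plus-transfer-learning recipe described above (crop a window around the radiologist's marked point, then fine-tune a pre-trained network on the crops) can be sketched as follows. The window size, the four-grade head, and the offline weights=None choice are illustrative assumptions; in practice ImageNet weights would be loaded for transfer learning.

```python
# Sketch: crop around a marked point, then adapt DenseNet201 for 4 HS grades.
import torch
import torch.nn as nn
from torchvision import models

def crop_window(image, cx, cy, size=128):
    """Crop a size x size patch centred on (cx, cy) from a (C, H, W) tensor."""
    half = size // 2
    _, h, w = image.shape
    x0 = max(0, min(cx - half, w - size))  # clamp the window inside the frame
    y0 = max(0, min(cy - half, h - size))
    return image[:, y0:y0 + size, x0:x0 + size]

us_image = torch.rand(1, 512, 512)          # grayscale B-mode frame (toy)
patch = crop_window(us_image, cx=300, cy=260)
patch = patch.repeat(3, 1, 1).unsqueeze(0)  # 3-channel batch for DenseNet

net = models.densenet201(weights=None)      # use IMAGENET1K_V1 in practice
net.classifier = nn.Linear(net.classifier.in_features, 4)  # grades S0-S3
print(net(patch).shape)  # torch.Size([1, 4])
```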
Affiliation(s)
- Petros Drazinos
- 3DMI Research Group, Department of Medical Physics, School of Medicine, University of Patras, Rion, GR 26504, Greece; Diagnostic Echotomography SA, Kifissia, GR 14561, Greece
| | - Ilias Gatos
- 3DMI Research Group, Department of Medical Physics, School of Medicine, University of Patras, Rion, GR 26504, Greece
| | - Paraskevi F Katsakiori
- 3DMI Research Group, Department of Medical Physics, School of Medicine, University of Patras, Rion, GR 26504, Greece
| | - Stavros Tsantis
- 3DMI Research Group, Department of Medical Physics, School of Medicine, University of Patras, Rion, GR 26504, Greece
| | - Efstratios Syrmas
- 3DMI Research Group, Department of Medical Physics, School of Medicine, University of Patras, Rion, GR 26504, Greece
| | - Stavros Spiliopoulos
- Second Department of Radiology, School of Medicine, University of Athens, Athens, GR 12461, Greece
| | - Dimitris Karnabatidis
- Department of Radiology, School of Medicine, University of Patras, Patras, GR 26504, Greece
| | | | | | - John D Hazle
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - George C Kagadis
- 3DMI Research Group, Department of Medical Physics, School of Medicine, University of Patras, Rion, GR 26504, Greece; Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.
| |
|
42
|
Xu J, Huang K, Zhong L, Gao Y, Sun K, Liu W, Zhou Y, Guo W, Guo Y, Zou Y, Duan Y, Lu L, Wang Y, Chen X, Zhao S. RemixFormer++: A Multi-Modal Transformer Model for Precision Skin Tumor Differential Diagnosis With Memory-Efficient Attention. IEEE TRANSACTIONS ON MEDICAL IMAGING 2025; 44:320-337. [PMID: 39120989 DOI: 10.1109/tmi.2024.3441012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/11/2024]
Abstract
Diagnosing malignant skin tumors accurately at an early stage can be challenging due to the ambiguous and even confusing visual characteristics displayed by various categories of skin tumors. To improve diagnostic precision, all available clinical data from multiple sources, particularly clinical images, dermoscopy images, and medical history, could be considered. Aligning with clinical practice, we propose a novel Transformer model, named RemixFormer++, that consists of a clinical image branch, a dermoscopy image branch, and a metadata branch. Given the unique characteristics inherent in clinical and dermoscopy images, specialized attention strategies are adopted for each type. Clinical images are processed through a top-down architecture, capturing both localized lesion details and global contextual information. Conversely, dermoscopy images undergo bottom-up processing with two-level hierarchical encoders designed to pinpoint fine-grained structural and textural features. A dedicated metadata branch seamlessly integrates non-visual information by encoding relevant patient data. Fusing the features from the three branches substantially boosts disease classification accuracy. RemixFormer++ demonstrates exceptional performance on four single-modality datasets (PAD-UFES-20, ISIC 2017/2018/2019). Compared with the previous best method on the public multi-modal Derm7pt dataset, we achieved an absolute 5.3% increase in averaged F1 and 1.2% in accuracy for the classification of five skin tumors. Furthermore, using a large-scale in-house dataset of 10,351 patients with the twelve most common skin tumors, our method obtained an overall classification accuracy of 92.6%. These promising results, on par with or better than the performance of 191 dermatologists in a comprehensive reader study, clearly indicate the potential clinical usability of our method.
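A schematic of the three-branch fusion described above: separate encoders for the clinical and dermoscopy images, a small MLP for metadata, and a classification head over the concatenated features. The trivial flatten-and-project encoders and the metadata width are placeholders, not the paper's top-down and bottom-up attention branches.

```python
# Schematic three-branch fusion; the branch encoders are simplified stand-ins.
import torch
import torch.nn as nn

class ThreeBranchFusion(nn.Module):
    def __init__(self, num_classes=12, dim=256):
        super().__init__()
        self.clinical = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim))
        self.dermoscopy = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim))
        self.metadata = nn.Sequential(nn.Linear(8, dim), nn.ReLU())
        self.head = nn.Linear(3 * dim, num_classes)

    def forward(self, clin_img, derm_img, meta):
        feats = torch.cat([
            self.clinical(clin_img),     # clinical photo branch
            self.dermoscopy(derm_img),   # dermoscopy branch
            self.metadata(meta),         # non-visual patient data branch
        ], dim=-1)
        return self.head(feats)          # fused features drive classification

model = ThreeBranchFusion()
logits = model(torch.rand(2, 3, 224, 224),  # clinical images
               torch.rand(2, 3, 224, 224),  # dermoscopy images
               torch.rand(2, 8))            # encoded metadata
print(logits.shape)  # torch.Size([2, 12]): twelve common skin tumours
```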
|
43
|
Wang Y, Zhang W, Liu X, Tian L, Li W, He P, Huang S, He F, Pan X. Artificial intelligence in precision medicine for lung cancer: A bibliometric analysis. Digit Health 2025; 11:20552076241300229. [PMID: 39758259 PMCID: PMC11696962 DOI: 10.1177/20552076241300229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2023] [Accepted: 10/28/2024] [Indexed: 01/07/2025] Open
Abstract
Background The increasing body of evidence has been stimulating the application of artificial intelligence (AI) in precision medicine research for lung cancer. This trend necessitates a comprehensive overview of the growing number of publications to facilitate researchers' understanding of this field. Method The bibliometric data for the current analysis were extracted from the Web of Science Core Collection database; CiteSpace, VOSviewer, and an online website were applied to the analysis. Results After the data were filtered, the search yielded 4062 manuscripts, 92.27% of which were published from 2014 onwards. The main contributing countries were China, the United States, India, Japan, and Korea. These publications appeared mainly in the following scientific disciplines: Radiology and Nuclear Medicine, Medical Imaging, Oncology, and Computer Science. Notably, Li Weimin and Aerts Hugo J. W. L. stand out as leading authorities in this domain. In the keyword co-occurrence and co-citation cluster analysis of the publications, the knowledge base was divided into four readily interpretable clusters: screening, diagnosis, treatment, and prognosis. Conclusion This bibliometric study reveals that deep learning frameworks and AI-based radiomics are receiving attention. High-quality and standardized data have the potential to revolutionize lung cancer screening and diagnosis in the era of precision medicine. However, high-quality clinical datasets, the development of new and combined AI models, and their consistent assessment remain prerequisites before current research can be effectively applied in clinical practice.
Affiliation(s)
- Yuchai Wang
- Department of Pharmacy, Hunan University of Chinese Medicine, Changsha, Hunan Province, China
| | - Weilong Zhang
- Department of Pharmacy, Hunan University of Chinese Medicine, Changsha, Hunan Province, China
| | - Xiang Liu
- Department of Pharmacy, Hunan University of Chinese Medicine, Changsha, Hunan Province, China
| | - Li Tian
- Department of Pharmacy, Hunan University of Chinese Medicine, Changsha, Hunan Province, China
| | - Wenjiao Li
- Department of Pharmacy, Hunan University of Chinese Medicine, Changsha, Hunan Province, China
| | - Peng He
- Department of Pharmacy, Hunan University of Chinese Medicine, Changsha, Hunan Province, China
| | - Sheng Huang
- Department of Pharmacy, Hunan University of Chinese Medicine, Changsha, Hunan Province, China
- Jiuzhitang Co., Ltd, Changsha, Hunan Province, China
| | - Fuyuan He
- School of Pharmacy, Hunan University of Chinese Medicine, Changsha, Hunan Province, China
| | - Xue Pan
- School of Pharmacy, Hunan University of Chinese Medicine, Changsha, Hunan Province, China
| |
|
44
|
Pérez-Núñez JR, Rodríguez C, Vásquez-Serpa LJ, Navarro C. The Challenge of Deep Learning for the Prevention and Automatic Diagnosis of Breast Cancer: A Systematic Review. Diagnostics (Basel) 2024; 14:2896. [PMID: 39767257 PMCID: PMC11675111 DOI: 10.3390/diagnostics14242896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2024] [Revised: 11/24/2024] [Accepted: 12/18/2024] [Indexed: 01/11/2025] Open
Abstract
OBJECTIVES This review aims to evaluate several convolutional neural network (CNN) models applied to breast cancer detection, to identify and categorize CNN variants in recent studies, and to analyze their specific strengths, limitations, and challenges. METHODS Using PRISMA methodology, this review examines studies that focus on deep learning techniques, specifically CNN, for breast cancer detection. Inclusion criteria encompassed studies from the past five years, with duplicates and those unrelated to breast cancer excluded. A total of 62 articles from the IEEE, SCOPUS, and PubMed databases were analyzed, exploring CNN architectures and their applicability in detecting this pathology. RESULTS The review found that CNN models with advanced architecture and greater depth exhibit high accuracy and sensitivity in image processing and feature extraction for breast cancer detection. CNN variants that integrate transfer learning proved particularly effective, allowing the use of pre-trained models with less training data required. However, challenges include the need for large, labeled datasets and significant computational resources. CONCLUSIONS CNNs represent a promising tool in breast cancer detection, although future research should aim to create models that are more resource-efficient and maintain accuracy while reducing data requirements, thus improving clinical applicability.
Affiliation(s)
- Jhelly-Reynaluz Pérez-Núñez
- Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional Mayor de San Marcos (UNMSM), Lima 15081, Peru; (C.R.); (L.-J.V.-S.); (C.N.)
| | | | | | | |
|
45
|
Liu G, He J, Li P, Zhao Z, Zhong S. Cross-Modal self-supervised vision language pre-training with multiple objectives for medical visual question answering. J Biomed Inform 2024; 160:104748. [PMID: 39536998 DOI: 10.1016/j.jbi.2024.104748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2024] [Revised: 09/29/2024] [Accepted: 11/03/2024] [Indexed: 11/16/2024]
Abstract
Medical Visual Question Answering (VQA) is a task that aims to provide answers to questions about medical images, utilizing both visual and textual information in the reasoning process. The absence of large-scale annotated medical VQA datasets presents a formidable obstacle to training a medical VQA model from scratch in an end-to-end manner. Existing works have used image captioning datasets in the pre-training stage and fine-tuned to downstream VQA tasks. Following the same paradigm, we use a collection of public medical image captioning datasets to pre-train multimodality models in a self-supervised setup, and fine-tune them on downstream medical VQA tasks. In this work, we propose a method featuring Cross-Modal pre-training with Multiple Objectives (CMMO), which includes masked image modeling, masked language modeling, image-text matching, and image-text contrastive learning. The proposed method is designed to associate the visual features of medical images with corresponding medical concepts in captions, to learn aligned vision and language feature representations and multi-modal interactions. The experimental results reveal that our proposed CMMO method outperforms state-of-the-art methods on three public medical VQA datasets, showing absolute improvements of 2.6%, 0.9%, and 4.0% on the VQA-RAD, PathVQA, and SLAKE datasets, respectively. We also conduct comprehensive ablation studies to validate our method, and visualize the attention maps, which show strong interpretability. The code and pre-trained weights will be released at https://github.com/pengfeiliHEU/CMMO.
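Of the four CMMO objectives, image-text contrastive learning is the one most directly responsible for aligning vision and language representations. Below is a minimal symmetric InfoNCE sketch with random embeddings standing in for encoder outputs; in the paper's setup this term would be summed with the masked image modeling, masked language modeling, and image-text matching losses.

```python
# Symmetric image-text contrastive (InfoNCE) loss sketch.
import torch
import torch.nn.functional as F

def itc_loss(img_emb, txt_emb, temperature=0.07):
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(img.size(0))   # diagonal entries are true pairs
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

img_emb, txt_emb = torch.randn(16, 256), torch.randn(16, 256)
print(itc_loss(img_emb, txt_emb))  # pulls paired embeddings together
```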
Affiliation(s)
- Gang Liu
- College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, Heilongjiang, China.
| | - Jinlong He
- College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, Heilongjiang, China.
| | - Pengfei Li
- College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, Heilongjiang, China.
| | - Zixu Zhao
- College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, Heilongjiang, China.
| | - Shenjun Zhong
- Monash Biomedical Imaging, Monash University, Melbourne, 3800, Victoria, Australia.
| |
|
46
|
Jun E, Jeong S, Heo DW, Suk HI. Medical Transformer: Universal Encoder for 3-D Brain MRI Analysis. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:17779-17789. [PMID: 37738193 DOI: 10.1109/tnnls.2023.3308712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/24/2023]
Abstract
Transfer learning has attracted considerable attention in medical image analysis because of the limited number of annotated 3-D medical datasets available for training data-driven deep learning models in the real world. We propose Medical Transformer, a novel transfer learning framework that effectively models 3-D volumetric images as a sequence of 2-D image slices. To improve the high-level representations in 3-D form by empowering spatial relations, we use a multi-view approach that leverages information from the three planes of the 3-D volume while providing parameter-efficient training. To build a source model generally applicable to various tasks, we pretrain the model using self-supervised learning (SSL), with masked encoding vector prediction as a proxy task, on a large-scale dataset of normal, healthy brain magnetic resonance imaging (MRI) scans. Our pretrained model is evaluated on three downstream tasks that are widely studied in brain MRI research: (1) brain disease diagnosis, (2) brain age prediction, and (3) brain tumor segmentation. Experimental results demonstrate that our Medical Transformer outperforms state-of-the-art (SOTA) transfer learning methods, efficiently reducing the number of parameters by up to approximately 92% for classification and regression tasks and 97% for the segmentation task, while also achieving good performance in scenarios where only a fraction of the training samples is used.
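The core modelling idea, decomposing a 3-D volume into 2-D slice sequences along the axial, coronal, and sagittal planes, can be illustrated in a few lines. The toy volume size, linear slice embedding, and single transformer layer below are assumptions for illustration, not the paper's pretrained encoder.

```python
# Multi-view slicing sketch: 3-D volume -> slice tokens from three planes.
import torch
import torch.nn as nn

volume = torch.rand(96, 96, 96)  # (D, H, W) toy brain MRI volume

views = {
    "axial":    volume,                   # slice along depth
    "coronal":  volume.permute(1, 0, 2),  # slice along height
    "sagittal": volume.permute(2, 0, 1),  # slice along width
}

embed = nn.Linear(96 * 96, 128)  # one token per 2-D slice
tokens = torch.cat(
    [embed(v.reshape(96, -1)) for v in views.values()], dim=0
)                                # (3 x 96, 128) slice-token sequence

encoder = nn.TransformerEncoderLayer(d_model=128, nhead=8, batch_first=True)
out = encoder(tokens.unsqueeze(0))
print(out.shape)  # torch.Size([1, 288, 128])
```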
|
47
|
Rundo L, Militello C. Image biomarkers and explainable AI: handcrafted features versus deep learned features. Eur Radiol Exp 2024; 8:130. [PMID: 39560820 PMCID: PMC11576747 DOI: 10.1186/s41747-024-00529-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2024] [Accepted: 10/16/2024] [Indexed: 11/20/2024] Open
Abstract
Feature extraction and selection from medical data are the basis of radiomics and image biomarker discovery for various architectures, including convolutional neural networks (CNNs). We herein describe the typical radiomics steps and the components of a CNN for both deep feature extraction and end-to-end approaches, and discuss the curse of dimensionality along with dimensionality reduction techniques. Despite the outstanding performance of deep learning (DL) approaches, the use of handcrafted features instead of deep learned features needs to be considered for each specific study. Dataset size is a key factor: large-scale datasets with low sample diversity could lead to overfitting, while limited sample sizes can yield unstable models. The dataset must be representative of all the "facets" of the clinical phenomenon/disease investigated. Access to high-performance computational resources, namely graphics processing units, is another key factor, especially for the training phase of deep architectures. The advantages of multi-institutional federated/collaborative learning are described. When large language models are used, high stability is needed to avoid catastrophic forgetting in complex domain-specific tasks. We highlight that non-DL approaches provide model explainability superior to that of DL approaches; making DL models interpretable gives rise to the need for explainable AI, including post hoc mechanisms. RELEVANCE STATEMENT: This work provides the key concepts for processing imaging features to extract reliable and robust image biomarkers. KEY POINTS: The key concepts for processing imaging features to extract reliable and robust image biomarkers are provided. The main differences between radiomics and representation learning approaches are highlighted. The advantages and disadvantages of handcrafted versus learned features are given without losing sight of the clinical purpose of artificial intelligence models.
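To make the handcrafted-versus-learned distinction concrete, the sketch below computes a few interpretable first-order statistics per region of interest alongside penultimate-layer features from an (untrained) CNN backbone, then applies PCA as one dimensionality-reduction option against the curse of dimensionality. The chosen statistics, backbone, and dimensions are illustrative, not a prescribed radiomics pipeline.

```python
# Handcrafted first-order features vs. deep CNN features, plus PCA reduction.
import torch
from torchvision import models

roi = torch.rand(100, 1, 64, 64)  # 100 region-of-interest patches (toy)

# Handcrafted: directly interpretable per-patch statistics.
flat = roi.flatten(1)
handcrafted = torch.stack(
    [flat.mean(1), flat.std(1), flat.min(1).values, flat.max(1).values],
    dim=1,
)                                          # (100, 4)

# Deep learned: high-dimensional, less interpretable activations.
backbone = models.resnet18(weights=None)   # pretrained weights in practice
backbone.fc = torch.nn.Identity()          # keep 512-d penultimate features
with torch.no_grad():
    deep = backbone(roi.repeat(1, 3, 1, 1))  # (100, 512)

# Dimensionality reduction: project onto the top principal components.
centered = deep - deep.mean(0)
_, _, v = torch.pca_lowrank(centered, q=16)
reduced = centered @ v                     # (100, 16)
print(handcrafted.shape, deep.shape, reduced.shape)
```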
Affiliation(s)
- Leonardo Rundo
- Department of Information and Electrical Engineering and Applied Mathematics (DIEM), University of Salerno, Fisciano, Salerno, Italy.
| | - Carmelo Militello
- High Performance Computing and Networking Institute (ICAR-CNR), Italian National Research Council, Palermo, Italy
| |
|
48
|
Liu K, Zhang J. Development of a Cost-Efficient and Glaucoma-Specialized OD/OC Segmentation Model for Varying Clinical Scenarios. SENSORS (BASEL, SWITZERLAND) 2024; 24:7255. [PMID: 39599032 PMCID: PMC11597940 DOI: 10.3390/s24227255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/01/2024] [Revised: 10/31/2024] [Accepted: 11/11/2024] [Indexed: 11/29/2024]
Abstract
Most existing optic disc (OD) and optic cup (OC) segmentation models are biased toward the dominant size and the easy (normal) class, resulting in suboptimal performance on glaucoma-confirmed samples. Thus, these models are not optimal choices for assisting in tracking glaucoma progression and prognosis. Fully supervised models trained on annotated glaucoma samples can achieve superior performance, but they are restricted by the high cost of collecting and annotating glaucoma samples. Therefore, in this paper, we are dedicated to developing a glaucoma-specialized model by exploiting low-cost annotated normal fundus images, while simultaneously adapting to various common scenarios in clinical practice. We employ a contrastive learning and domain adaptation-based model that exploits shared knowledge from normal samples. To capture glaucoma-related features, we utilize a Gram matrix to encode style information and a domain adaptation strategy to encode domain information, then narrow the style and domain gaps between normal and glaucoma samples by contrastive and adversarial learning, respectively. To validate the efficacy of our proposed model, we conducted experiments on two public datasets mimicking various common scenarios. The results demonstrate the superior performance of our proposed model across multiple scenarios, showcasing its proficiency in both segmentation- and glaucoma-related metrics. In summary, our study makes a concerted effort to target confirmed glaucoma samples, mitigating the inherent bias of most existing models, and proposes an annotation-efficient strategy that exploits low-cost, normal-labeled fundus samples, avoiding the economic and labor burdens of a fully supervised strategy. Our approach also demonstrates adaptability across various scenarios, highlighting its potential utility in assisting both the monitoring of glaucoma progression and the assessment of glaucoma prognosis.
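The Gram-matrix style encoding mentioned above reduces a feature map to its channel-correlation matrix, a standard summary of image "style". The sketch below uses random stand-in features and simply measures the style gap that a contrastive loss would be trained to shrink between the normal and glaucoma domains.

```python
# Gram-matrix style statistics for two domains of feature maps.
import torch

def gram_matrix(features):
    """(B, C, H, W) feature maps -> (B, C, C) channel-correlation matrices."""
    b, c, h, w = features.shape
    f = features.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)  # normalized Gram matrix

normal_feat = torch.rand(4, 64, 32, 32)    # from normal fundus images
glaucoma_feat = torch.rand(4, 64, 32, 32)  # from glaucoma samples

g_normal = gram_matrix(normal_feat)
g_glaucoma = gram_matrix(glaucoma_feat)
style_gap = (g_normal.mean(0) - g_glaucoma.mean(0)).pow(2).mean()
print(style_gap)  # the quantity contrastive training would reduce
```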
Affiliation(s)
- Kai Liu
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100083, China;
- Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, Beijing 100083, China
- Department of Computer Science, City University of Hong Kong, Hong Kong 98121, China
| | - Jicong Zhang
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100083, China;
- Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, Beijing 100083, China
- Hefei Innovation Research Institute, Beihang University, Hefei 230012, China
| |
|
49
|
Gravina M, Maddaluno M, Marrone S, Sansone M, Fusco R, Granata V, Petrillo A, Sansone C. A Physiological-Informed Generative Model for Improving Breast Lesion Classification in Small DCE-MRI Datasets. IEEE J Biomed Health Inform 2024; 28:6764-6777. [PMID: 39141452 DOI: 10.1109/jbhi.2024.3443705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/16/2024]
Abstract
In biomedical image processing, Deep Learning (DL) is increasingly exploited in various forms and for diverse purposes. Despite unprecedented results, the huge number of parameters to learn, which necessitates a substantial number of annotated samples, remains a significant challenge. In medical domains, obtaining high-quality labelled datasets is still a challenging task. In recent years, several works have leveraged data augmentation to face this issue, mostly thanks to the introduction of generative models able to produce artificial samples having the same characteristics as the acquired ones. However, we claim that biological principles must be considered in this process, as all medical imaging techniques exploit one or more physical laws or properties directly associated with the physiological characteristics of the tissues under analysis. A notable example is Dynamic Contrast Enhanced-Magnetic Resonance Imaging (DCE-MRI), in which the kinetics of the contrast agent (CA) highlight both morphological and physiological aspects. In this paper, we introduce a novel generative approach explicitly relying on Physiologically Based Pharmacokinetic (PBPK) modelling and on an Intrinsic Deforming Autoencoder (DAE) to implement a physiologically-aware data augmentation strategy. As a case study, we consider breast DCE-MRI. In particular, we tested our proposal on two private datasets and one public dataset with different acquisition protocols, demonstrating that the proposed method significantly improves the performance of several DL-based lesion classifiers.
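As an example of the kind of physiological prior the paper builds on, the standard Tofts pharmacokinetic model expresses the tissue contrast-agent concentration as C_t(t) = K_trans * integral_0^t c_p(tau) * exp(-k_ep (t - tau)) dtau, with k_ep = K_trans / v_e. The sketch below evaluates it by discrete convolution with an illustrative biexponential arterial input; the constants and parameter values are toy choices, and this generic PBPK illustration is not the paper's generative model.

```python
# Standard Tofts model sketch: tissue CA concentration from a toy AIF.
import numpy as np

t = np.linspace(0, 6, 361)  # minutes, 1 s sampling
dt = t[1] - t[0]

# Biexponential arterial input function with illustrative constants.
cp = 3.99 * np.exp(-0.144 * t) + 4.78 * np.exp(-0.0111 * t)

def tofts(ktrans, ve, cp, t, dt):
    """C_t(t) = Ktrans * conv(cp, exp(-kep t)), with kep = Ktrans / ve."""
    kep = ktrans / ve
    kernel = np.exp(-kep * t)
    return ktrans * np.convolve(cp, kernel)[: len(t)] * dt

# Different (Ktrans, ve) pairs yield distinct enhancement curves, the kind of
# kinetic signature a physiology-aware generator can condition on.
slow_enhancing = tofts(ktrans=0.08, ve=0.30, cp=cp, t=t, dt=dt)
fast_enhancing = tofts(ktrans=0.35, ve=0.45, cp=cp, t=t, dt=dt)
print(slow_enhancing.max(), fast_enhancing.max())
```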
|
50
|
Hao J, Chen S. Language-aware multiple datasets detection pretraining for DETRs. Neural Netw 2024; 179:106506. [PMID: 38996689 DOI: 10.1016/j.neunet.2024.106506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2024] [Revised: 05/17/2024] [Accepted: 07/02/2024] [Indexed: 07/14/2024]
Abstract
Pretraining on large-scale datasets can boost the performance of object detectors, but the annotated datasets for object detection are hard to scale up due to the high labor cost. What we possess instead are numerous isolated field-specific datasets; it is therefore appealing to jointly pretrain models across an aggregation of datasets to enhance data volume and diversity. In this paper, we propose a strong framework for utilizing Multiple datasets to pretrain DETR-like detectors, termed METR, without the need for manual label-space integration. It converts the typical multi-class classification in object detection into binary classification by introducing a pre-trained language model. Specifically, we design a category extraction module for extracting the potential categories involved in an image and assign these categories to different queries by language embeddings; each query is then responsible for predicting only a class-specific object. Besides, to suit our novel detection paradigm, we propose a Class-wise Bipartite Matching strategy that limits ground truths to matching only queries assigned to the same category. Extensive experiments demonstrate that METR achieves extraordinary results under either multi-task joint training or the pretrain-and-finetune paradigm. Notably, our pre-trained models have highly flexible transferability and improve the performance of various DETR-like detectors on the COCO val2017 benchmark. Our code is publicly available at: https://github.com/isbrycee/METR.
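The Class-wise Bipartite Matching strategy can be illustrated with Hungarian assignment in which cross-category query-to-ground-truth pairs receive a prohibitive cost, so each ground truth can only be matched to a query assigned its own category. The random costs below stand in for the usual classification-plus-box matching costs.

```python
# Toy class-wise bipartite matching via the Hungarian algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
query_cls = np.array([0, 0, 1, 1, 2])  # category assigned to each query
gt_cls = np.array([0, 1, 2])           # categories of ground-truth objects

cost = rng.random((len(query_cls), len(gt_cls)))   # stand-in matching costs
BIG = 1e6
cost[query_cls[:, None] != gt_cls[None, :]] = BIG  # forbid cross-category

rows, cols = linear_sum_assignment(cost)
matches = [(q, g) for q, g in zip(rows, cols) if cost[q, g] < BIG]
print(matches)  # each ground truth pairs with a same-category query
```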
Affiliation(s)
- Jing Hao
- VIS, Baidu Inc., Beijing, 100000, China.
| | - Song Chen
- VIS, Baidu Inc., Beijing, 100000, China.
| |
|