1
Li G, Huang Q, Wang W, Liu L. Human visual perception-inspired medical image segmentation network with multi-feature compression. Artif Intell Med 2025;165:103133. [PMID: 40279876] [DOI: 10.1016/j.artmed.2025.103133]
Abstract
Medical image segmentation is crucial for computer-aided diagnosis and treatment planning, directly influencing clinical decision-making. To enhance segmentation accuracy, existing methods typically fuse local, global, and various other features. However, these methods often ignore the negative impact of noise on the results during the feature fusion process. In contrast, certain regions of the human visual system, such as the inferotemporal cortex and parietal cortex, effectively suppress irrelevant noise while integrating multiple features, a capability lacking in current methods. To address this gap, we propose MS-Net, a medical image segmentation network inspired by human visual perception. MS-Net incorporates a multi-feature compression (MFC) module that mimics the human visual system's processing of complex images, first learning various feature types and subsequently filtering out irrelevant ones. Additionally, MS-Net features a segmentation refinement (SR) module that emulates how physicians segment lesions. This module initially performs coarse segmentation to capture the lesion's approximate location and shape, followed by a refinement step to achieve precise boundary delineation. Experimental results demonstrate that MS-Net not only attains state-of-the-art segmentation performance across three public datasets but also significantly reduces the number of parameters compared to existing models. Code is available at https://github.com/guangguangLi/MS-Net.
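To make the "learn many features, then filter out the irrelevant ones" idea concrete, here is a minimal PyTorch sketch of a fuse-then-gate block. It is an illustration under our own assumptions (three input streams, channel-score gating, a fixed keep ratio), not the published MFC module.

```python
import torch
import torch.nn as nn

class MultiFeatureCompression(nn.Module):
    """Fuse several feature streams, then suppress low-relevance channels.
    A sketch of the fuse-then-filter idea only; the published MFC module
    may be structured differently."""
    def __init__(self, channels, n_streams=3, keep_ratio=0.5):
        super().__init__()
        self.fuse = nn.Conv2d(channels * n_streams, channels, 1)
        self.score = nn.Sequential(                # per-channel relevance in [0, 1]
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )
        self.keep_ratio = keep_ratio

    def forward(self, streams):                    # list of (B, C, H, W) tensors
        f = self.fuse(torch.cat(streams, dim=1))
        s = self.score(f)                          # (B, C, 1, 1)
        k = max(1, int(s.shape[1] * self.keep_ratio))
        thresh = s.flatten(1).kthvalue(s.shape[1] - k + 1, dim=1).values
        keep = (s >= thresh.view(-1, 1, 1, 1)).float()  # drop the noisiest channels
        return f * s * keep

# e.g. local, global, and edge feature maps of matching shape:
# out = MultiFeatureCompression(32)([local_f, global_f, edge_f])
```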
Affiliation(s)
- Guangju Li: School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China; School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi'an 710072, China
- Qinghua Huang: School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi'an 710072, China
- Wei Wang: Department of Medical Ultrasonics, Institute of Diagnostic and Interventional Ultrasound, Ultrasomics Artificial Intelligence X-Lab, The First Affiliated Hospital of Sun Yat-Sen University, Guangzhou 510060, China
- Longzhong Liu: Department of Ultrasound, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
2
Yu L, Gou B, Xia X, Yang Y, Yi Z, Min X, He T. BUS-M2AE: Multi-scale Masked Autoencoder for Breast Ultrasound Image Analysis. Comput Biol Med 2025;191:110159. [PMID: 40252289] [DOI: 10.1016/j.compbiomed.2025.110159]
Abstract
Masked AutoEncoder (MAE) has demonstrated significant potential in medical image analysis by reducing the cost of manual annotations. However, MAE and its recent variants are not well-developed for ultrasound images in breast cancer diagnosis, as they struggle to generalize to the task of distinguishing ultrasound breast tumors of varying sizes. This limitation hinders the model's ability to adapt to the diverse morphological characteristics of breast tumors. In this paper, we propose a novel Breast UltraSound Multi-scale Masked AutoEncoder (BUS-M2AE) model to address the limitations of the general MAE. BUS-M2AE incorporates multi-scale masking at both the token level during the image patching stage and the feature level during the feature learning stage. These two masking methods enable flexible strategies that match the explicit masked patches and the implicit features to varying tumor scales, allowing the pre-trained vision transformer to adaptively perceive and accurately distinguish breast tumors of different sizes and improving the model's overall performance on diverse tumor morphologies. Comprehensive experiments demonstrate that BUS-M2AE outperforms recent MAE variants and commonly used supervised learning methods in breast cancer classification and tumor segmentation tasks.
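As an illustration of token-level multi-scale masking, the sketch below samples an MAE-style mask on a coarser grid and upsamples it, so masked patches arrive in groups covering larger or smaller image areas. The block sizes and mask ratio are our assumptions; the paper's token- and feature-level schemes are more elaborate.

```python
import random
import torch

def multiscale_mask(grid=14, mask_ratio=0.75, block=2):
    """Sample a random patch mask on a coarse grid and upsample it so masked
    patches come in (block x block) groups. A toy sketch of token-level
    multi-scale masking, not BUS-M2AE's exact scheme."""
    g = grid // block                              # coarse grid side (assumes divisibility)
    n_mask = int(mask_ratio * g * g)
    coarse = (torch.randperm(g * g) < n_mask).reshape(g, g)
    return coarse.repeat_interleave(block, 0).repeat_interleave(block, 1)

# varying `block` across iterations exposes the encoder to several masking scales:
mask = multiscale_mask(grid=14, mask_ratio=0.75, block=random.choice([1, 2, 7]))
```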
Affiliation(s)
- Le Yu, Zhang Yi, Tao He: College of Computer Science, Sichuan University, Chengdu 610065, China
- Bo Gou: College of Computer Science, Sichuan University, Chengdu 610065, China; School of Clinical Medicine, The First Affiliated Hospital of Chengdu Medical College, Chengdu 610065, China
- Xun Xia: School of Clinical Medicine, The First Affiliated Hospital of Chengdu Medical College, Chengdu 610065, China
- Yujia Yang: Department of Medical Ultrasound, West China Hospital of Sichuan University, Chengdu 610065, China
- Xiangde Min: Department of Radiology, Tongji Hospital of Tongji Medical College of Huazhong University of Science and Technology, Wuhan 430030, China
3
Irfan M, Haq IU, Malik KM, Muhammad K. One-shot learning for generalization in medical image classification across modalities. Comput Med Imaging Graph 2025;122:102507. [PMID: 40049026] [DOI: 10.1016/j.compmedimag.2025.102507]
Abstract
Generalizability is one of the biggest challenges hindering the advancement of medical sensing technologies across multiple imaging modalities. This issue is further exacerbated when the imaging data is limited in scope or of poor quality. To tackle this, we propose a generalized, robust, and lightweight one-shot learning method for medical image classification across various imaging modalities, including X-ray, microscopic, and CT scans. Our model introduces a collaborative one-shot training (COST) approach, incorporating both meta-learning and metric-learning. This approach allows for effective training on only one image per class. To ensure generalization with fewer epochs, we employ gradient generalization at dense and fully connected layers, utilizing a lightweight Siamese network with triplet loss and shared parameters. The proposed model was evaluated on 12 medical image datasets from MedMNIST2D, achieving an average accuracy of 91.5% and an area under the curve (AUC) of 0.89, outperforming state-of-the-art models such as ResNet-50 and AutoML by over 10% on certain datasets. On the OCTMNIST dataset, our model achieved an AUC of 0.91 compared to ResNet-50's 0.77. Ablation studies further validate the superiority of our approach, with the COST method showing significant improvement in convergence speed and accuracy compared to traditional one-shot learning setups. Additionally, our model's lightweight architecture requires only 0.15 million parameters, making it well-suited for deployment on resource-constrained devices.
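A minimal sketch of the metric-learning ingredient: a lightweight shared-weight (Siamese) embedding network trained with a triplet loss. The architecture, input size, and margin below are placeholders of ours, not the paper's COST setup.

```python
import torch
import torch.nn as nn

class SiameseEmbed(nn.Module):
    """Tiny shared-weight embedding net; all three inputs of a triplet pass
    through the same parameters."""
    def __init__(self, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(32, emb_dim),
        )

    def forward(self, x):
        return nn.functional.normalize(self.net(x), dim=1)  # unit-norm embeddings

model = SiameseEmbed()
triplet = nn.TripletMarginLoss(margin=0.5)        # margin is an assumption
anchor, pos, neg = (torch.randn(8, 1, 28, 28) for _ in range(3))
loss = triplet(model(anchor), model(pos), model(neg))
loss.backward()                                    # pulls same-class pairs together
```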
Affiliation(s)
- Muhammad Irfan, Ijaz Ul Haq, Khalid Mahmood Malik: SMILES LAB, College of Innovation & Technology, University of Michigan-Flint, Flint, MI 48502, USA
- Khan Muhammad: VIS2KNOW Lab, Department of Applied Artificial Intelligence, School of Convergence, College of Computing and Informatics, Sungkyunkwan University, Seoul 03063, South Korea
4
Cui H, Duan J, Lin L, Wu Q, Guo W, Zang Q, Zhou M, Fang W, Hu Y, Zou Z. DEMAC-Net: A Dual-Encoder Multiattention Collaborative Network for Cervical Nerve Pathway and Adjacent Anatomical Structure Segmentation. Ultrasound Med Biol 2025:S0301-5629(25)00122-X. [PMID: 40368703] [DOI: 10.1016/j.ultrasmedbio.2025.04.006]
Abstract
OBJECTIVE Currently, cervical anesthesia is performed using three main approaches: superficial cervical plexus block, deep cervical plexus block, and intermediate plexus nerve block. However, each technique carries inherent risks and demands significant clinical expertise. Ultrasound imaging, known for its real-time visualization capabilities and accessibility, is widely used in both diagnostic and interventional procedures. Nevertheless, accurate segmentation of small and irregularly shaped structures such as the cervical and brachial plexuses remains challenging due to image noise, complex anatomical morphology, and limited annotated training data. This study introduces DEMAC-Net, a dual-encoder multiattention collaborative network, to significantly improve the segmentation accuracy of these neural structures. By precisely identifying the cervical nerve pathway (CNP) and adjacent anatomical tissues, DEMAC-Net aims to assist clinicians, especially those less experienced, in effectively guiding anesthesia procedures and accurately identifying optimal needle insertion points. This improvement is expected to enhance clinical safety, reduce procedural risks, and streamline decision-making during ultrasound-guided regional anesthesia. METHODS DEMAC-Net combines a dual-encoder architecture with the Spatial Understanding Convolution Kernel (SUCK) and the Spatial-Channel Attention Module (SCAM) to extract multi-scale features effectively. Additionally, a Global Attention Gate (GAG) and inter-layer fusion modules refine relevant features while suppressing noise. A novel dataset, the Neck Ultrasound Dataset (NUSD), was introduced, containing 1,500 annotated ultrasound images across seven anatomical regions. Extensive experiments were conducted on both NUSD and the public BUSI dataset, comparing DEMAC-Net to state-of-the-art models using metrics such as the Dice Similarity Coefficient (DSC) and Intersection over Union (IoU). RESULTS On the NUSD dataset, DEMAC-Net achieved a mean DSC of 93.3%, outperforming existing models. For external validation on the BUSI dataset, it demonstrated superior generalization, achieving a DSC of 87.2% and a mean IoU of 77.4%, surpassing other advanced methods. Notably, DEMAC-Net displayed consistent segmentation stability across all tested structures. CONCLUSION The proposed DEMAC-Net significantly improves segmentation accuracy for small nerves and complex anatomical structures in ultrasound images, outperforming existing methods in terms of accuracy and computational efficiency. This framework holds great potential for enhancing ultrasound-guided procedures, such as peripheral nerve blocks, by providing more precise anatomical localization, ultimately improving clinical outcomes.
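For flavor, here is a generic spatial-channel attention block in the spirit of the SCAM named above. The reduction factor and the 7x7 spatial kernel are our assumptions; the published module may differ.

```python
import torch
import torch.nn as nn

class SpatialChannelAttention(nn.Module):
    """Channel reweighting followed by spatial reweighting (a sketch in the
    spirit of DEMAC-Net's SCAM, not its exact design)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)                            # reweight channels
        s = torch.cat([x.mean(1, keepdim=True),
                       x.amax(1, keepdim=True)], dim=1)    # avg+max descriptors
        return x * self.spatial(s)                         # reweight locations
```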
Affiliation(s)
- H Cui, J Duan: School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
- L Lin, Q Wu: Department of Anesthesiology, The First Hospital of Putian City, Putian, China
- W Guo, Z Zou: School of Anesthesiology, Naval Medical University, Shanghai, China
- Q Zang: Information Center, The Second Affiliated Hospital of Naval Medical University, No. 415, Fengyang Road, Huangpu District, Shanghai 200003, PR China
- M Zhou: Jiangsu Cancer Hospital, Jiangsu, China
- W Fang: Department of Anesthesiology, Shanghai Ninth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Y Hu: Department of Anesthesiology, Second Affiliated Hospital, Naval Medical University, Shanghai, China
5
Huang Y, Chang A, Dou H, Tao X, Zhou X, Cao Y, Huang R, Frangi AF, Bao L, Yang X, Ni D. Flip Learning: Weakly supervised erase to segment nodules in breast ultrasound. Med Image Anal 2025;102:103552. [PMID: 40179628] [DOI: 10.1016/j.media.2025.103552]
Abstract
Accurate segmentation of nodules in both 2D breast ultrasound (BUS) and 3D automated breast ultrasound (ABUS) is crucial for clinical diagnosis and treatment planning. Therefore, developing an automated system for nodule segmentation can enhance user independence and expedite clinical analysis. Unlike fully-supervised learning, weakly-supervised segmentation (WSS) can streamline the laborious and intricate annotation process. However, current WSS methods face challenges in achieving precise nodule segmentation, as many of them depend on inaccurate activation maps or inefficient pseudo-mask generation algorithms. In this study, we introduce a novel multi-agent reinforcement learning-based WSS framework called Flip Learning, which relies solely on 2D/3D boxes for accurate segmentation. Specifically, multiple agents are employed to erase the target from the box to facilitate classification tag flipping, with the erased region serving as the predicted segmentation mask. The key contributions of this research are as follows: (1) adoption of a superpixel/supervoxel-based approach to encode the standardized environment, capturing boundary priors and expediting the learning process; (2) introduction of three meticulously designed rewards, comprising a classification score reward and two intensity distribution rewards, to steer the agents' erasing process precisely, thereby avoiding both under- and over-segmentation; and (3) implementation of a progressive curriculum learning strategy to enable agents to interact with the environment in a progressively challenging manner, thereby enhancing learning efficiency. Extensively validated on large in-house BUS and ABUS datasets, our Flip Learning method outperforms state-of-the-art WSS methods and foundation models, and achieves performance comparable to fully-supervised learning algorithms.
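The core reward signal ("erase until the classification tag flips") can be sketched in a few lines, assuming a binary nodule classifier that returns a single logit. The agents, superpixel environment, and intensity-distribution rewards of the paper are not reproduced here.

```python
import torch

@torch.no_grad()
def erase_flip_reward(classifier, image, mask, fill=0.0):
    """Classification-score reward for erasing agents (a toy sketch).
    image: (C, H, W); mask: (H, W) bool-like region proposed for erasure.
    A large confidence drop after erasing => the region covered the nodule."""
    erased = image.clone()
    erased[:, mask.bool()] = fill                 # erase the candidate region
    p_before = torch.sigmoid(classifier(image.unsqueeze(0))).item()
    p_after = torch.sigmoid(classifier(erased.unsqueeze(0))).item()
    return p_before - p_after                     # reward the tag flip
```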
Affiliation(s)
- Yuhao Huang, Ao Chang, Xing Tao, Xinrui Zhou, Ruobing Huang, Xin Yang: National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China; Medical UltraSound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, China
- Haoran Dou: Centre for Computational Imaging and Simulation Technologies in Biomedicine (CISTIB), University of Leeds, Leeds, UK; Department of Computer Science, School of Engineering, University of Manchester, Manchester, UK
- Yan Cao: Shenzhen RayShape Medical Technology Co., Ltd, Shenzhen, China
- Alejandro F Frangi: Division of Informatics, Imaging and Data Science, School of Health Sciences, University of Manchester, Manchester, UK; Department of Computer Science, School of Engineering, University of Manchester, Manchester, UK; Medical Imaging Research Center (MIRC), Department of Electrical Engineering, Department of Cardiovascular Sciences, KU Leuven, Belgium; Alan Turing Institute, London, UK; NIHR Manchester Biomedical Research Centre, Manchester Academic Health Science Centre, Manchester, UK
- Lingyun Bao: Department of Ultrasound, Affiliated Hangzhou First People's Hospital, School of Medicine, Westlake University, China
- Dong Ni: National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China; Medical UltraSound Image Computing (MUSIC) Lab, Shenzhen University, Shenzhen, China; Marshall Laboratory of Biomedical Engineering, Shenzhen University, Shenzhen, China; School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
6
Zhang H, Lian J, Ma Y. FET-UNet: Merging CNN and transformer architectures for superior breast ultrasound image segmentation. Phys Med 2025;133:104969. [PMID: 40184647] [DOI: 10.1016/j.ejmp.2025.104969]
Abstract
PURPOSE Breast cancer remains a significant cause of mortality among women globally, highlighting the critical need for accurate diagnosis. Although Convolutional Neural Networks (CNNs) have shown effectiveness in segmenting breast ultrasound images, they often face challenges in capturing long-range dependencies, particularly for lesions with similar intensity distributions, irregular shapes, and blurred boundaries. To overcome these limitations, we introduce FET-UNet, a novel hybrid framework that integrates CNNs and Swin Transformers within a UNet-like architecture. METHODS FET-UNet features parallel branches for feature extraction: one utilizes ResNet34 blocks, and the other employs Swin Transformer blocks. These branches are fused using an advanced feature aggregation module (AFAM), enabling the network to effectively combine local details and global context. Additionally, we include a multi-scale upsampling mechanism in the decoder to ensure precise segmentation outputs. This design enhances the capture of both local details and long-range dependencies. RESULTS Extensive evaluations on the BUSI, UDIAT, and BLUI datasets demonstrate the superior performance of FET-UNet compared to state-of-the-art methods. The model achieves Dice coefficients of 82.9% on BUSI, 88.9% on UDIAT, and 90.1% on BLUI. CONCLUSION FET-UNet shows great potential to advance breast ultrasound image segmentation and support more precise clinical diagnoses. Further research could explore the application of this framework to other medical imaging modalities and its integration into clinical workflows.
Affiliation(s)
- Huaikun Zhang, Yide Ma: School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China
- Jing Lian: School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou, Gansu, China
7
Yang X, Zhang J, Ou Y, Chen Q, Wang L, Wang L. Multilevel perception boundary-guided network for breast lesion segmentation in ultrasound images. Med Phys 2025;52:3117-3134. [PMID: 39887423] [DOI: 10.1002/mp.17647]
Abstract
BACKGROUND Automatic segmentation of breast tumors from ultrasound images is essential for subsequent clinical diagnosis and treatment planning. Although existing deep learning-based methods have achieved considerable progress in automatic segmentation of breast tumors, their performance on tumors with intensity similar to normal tissue is still not satisfactory, especially at the tumor boundaries. PURPOSE To accurately segment non-enhanced lesions with more accurate boundaries, a novel multilevel perception boundary-guided network (PBNet) is proposed to segment breast tumors from ultrasound images. METHODS PBNet consists of a multilevel global perception module (MGPM) and a boundary guided module (BGM). MGPM models long-range spatial dependencies by fusing both intra- and inter-level semantic information to enhance tumor recognition. In BGM, tumor boundaries are extracted from the high-level semantic maps using the dilation and erosion effects of max pooling; these boundaries then guide the fusion of low- and high-level features. Additionally, a multi-level boundary-enhanced segmentation (BS) loss is introduced to improve boundary segmentation performance. To evaluate the effectiveness of the proposed method, we compared it with state-of-the-art methods on two datasets: the publicly available BUSI dataset containing 780 images and an in-house dataset containing 995 images. To verify the robustness of each method, 5-fold cross-validation was used to train and test the models. Dice score (Dice), Jaccard coefficient (Jac), Hausdorff Distance (HD), Sensitivity (Sen), and Specificity (Spe) were used to evaluate segmentation performance quantitatively. The Wilcoxon test and Benjamini-Hochberg false discovery rate (FDR) multi-comparison correction were then performed to assess whether the proposed method shows statistically significant performance differences (p ≤ 0.05) compared with existing methods. In addition, to comprehensively demonstrate the differences between methods, Cohen's d effect size and the compound p-value (c-Pvalue) obtained with Fisher's method were also calculated. RESULTS On the BUSI dataset, the mean Dice and Sen of PBNet increased by 0.93% (p ≤ 0.01) and 1.42% (p ≤ 0.05), respectively, compared with the corresponding suboptimal methods. On the in-house dataset, PBNet improved Dice, Jac, and Spe by approximately 0.86% (p ≤ 0.01), 1.42% (p ≤ 0.01), and 0.1%, respectively, and reduced HD by 1.7% (p ≤ 0.01) compared to the suboptimal model. Overall, in terms of all evaluation metrics, the proposed method significantly (c-Pvalue ≤ 0.05) outperformed the others, although the effect size was smaller than 0.2. Ablation results confirmed that MGPM is effective in distinguishing non-enhanced tumors, while BGM and the BS loss are beneficial for refining tumor segmentation contours. CONCLUSIONS The proposed PBNet segments non-enhanced breast lesions from ultrasound images with more accurate boundaries, providing a valuable means for subsequent clinical applications.
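The boundary-extraction trick described for the BGM (dilation and erosion via max pooling) is easy to sketch; the kernel size below is our assumption. Applied to a sigmoid probability map, `dilated - eroded` is near zero well inside and well outside the region and peaks along its contour.

```python
import torch.nn.functional as F

def boundary_from_mask(prob, k=3):
    """Soft boundary map from a semantic probability map (B, C, H, W),
    using max pooling as morphological dilation/erosion, as described
    for the BGM. The kernel size k is an assumption."""
    dilated = F.max_pool2d(prob, k, stride=1, padding=k // 2)    # grows the region
    eroded = -F.max_pool2d(-prob, k, stride=1, padding=k // 2)   # shrinks it
    return dilated - eroded        # high only near the region's contour
```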
Affiliation(s)
- Xing Yang, Jian Zhang, Yingfeng Ou, Qijian Chen, Li Wang, Lihui Wang: Engineering Research Center of Text Computing & Cognitive Intelligence, Ministry of Education, Key Laboratory of Intelligent Medical Image Analysis and Precise Diagnosis of Guizhou Province, State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
8
Chen Y, Shao X, Shi K, Rominger A, Caobelli F. AI in Breast Cancer Imaging: An Update and Future Trends. Semin Nucl Med 2025;55:358-370. [PMID: 40011118] [DOI: 10.1053/j.semnuclmed.2025.01.008]
Abstract
Breast cancer is one of the most common types of cancer affecting women worldwide. Artificial intelligence (AI) is transforming breast cancer imaging by enhancing diagnostic capabilities across multiple imaging modalities, including mammography, digital breast tomosynthesis, ultrasound, magnetic resonance imaging, and nuclear medicine techniques. AI is being applied to diverse tasks such as breast lesion detection and classification, risk stratification, molecular subtyping, gene mutation status prediction, and treatment response assessment, with emerging research demonstrating performance levels comparable to or potentially exceeding those of radiologists. Large foundation models are showing remarkable potential across breast cancer imaging tasks. Self-supervised learning exploits the inherent correlations in data, and federated learning offers an alternative way to maintain data privacy. While promising results have been obtained so far, data standardization at the source, large-scale annotated multimodal datasets, and extensive prospective clinical trials are still needed to fully explore and validate deep learning's clinical utility and to address the legal and ethical considerations that will ultimately determine its widespread adoption in breast cancer care. We hereby provide a review of the most up-to-date knowledge on AI in breast cancer imaging.
Affiliation(s)
- Yizhou Chen, Kuangyu Shi, Axel Rominger, Federico Caobelli: Department of Nuclear Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Xiaoliang Shao: Department of Nuclear Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland; Department of Nuclear Medicine, The Third Affiliated Hospital of Soochow University, Changzhou, China
9
Hosseinzadeh Taher MR, Haghighi F, Gotway MB, Liang J. Large-scale benchmarking and boosting transfer learning for medical image analysis. Med Image Anal 2025;102:103487. [PMID: 40117988] [DOI: 10.1016/j.media.2025.103487]
Abstract
Transfer learning, particularly fine-tuning models pretrained on photographic images to medical images, has proven indispensable for medical image analysis. There are numerous models with distinct architectures pretrained on various datasets using different strategies. However, there is a lack of up-to-date, large-scale evaluations of their transferability to medical imaging, posing a challenge for practitioners in selecting the most suitable pretrained models for their tasks at hand. To fill this gap, we conduct a comprehensive systematic study, focusing on (i) benchmarking numerous conventional and modern convolutional neural network (ConvNet) and vision transformer architectures across various medical tasks; (ii) investigating the impact of fine-tuning data size on the performance of ConvNets compared with vision transformers in medical imaging; (iii) examining the impact of pretraining data granularity on transfer learning performance; (iv) evaluating the transferability of a wide range of recent self-supervised methods with diverse training objectives to a variety of medical tasks across different modalities; and (v) delving into the efficacy of domain-adaptive pretraining on both photographic and medical datasets to develop high-performance models for medical tasks. Our large-scale study (∼5,000 experiments) yields impactful insights: (1) ConvNets demonstrate higher transferability than vision transformers when fine-tuning for medical tasks; (2) ConvNets prove to be more annotation-efficient than vision transformers when fine-tuning for medical tasks; (3) fine-grained representations, rather than high-level semantic features, prove pivotal for fine-grained medical tasks; (4) self-supervised models excel in learning holistic features compared with supervised models; and (5) domain-adaptive pretraining leads to performant models by harnessing knowledge acquired from ImageNet and enhancing it with the readily accessible expert annotations of medical datasets. As open science, all codes and pretrained models are available at GitHub.com/JLiangLab/BenchmarkTransferLearning (Version 2).
Affiliation(s)
- Fatemeh Haghighi, Jianming Liang: School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA
10
Chen X, Ke J, Zhang Y, Gou J, Shen A, Wan S. Multimodal Distillation Pre-Training Model for Ultrasound Dynamic Images Annotation. IEEE J Biomed Health Inform 2025;29:3124-3136. [PMID: 39102331] [DOI: 10.1109/jbhi.2024.3438254]
Abstract
Ultrasonography has become an important diagnostic method in clinical practice. However, unlike static modalities such as CT and MRI, which have a larger research base, ultrasonography produces dynamic, video-like imagery captured by a real-time moving probe; processing such medical video data and extracting its textual semantics across modalities remains an open research problem. This paper therefore proposes a pre-training model with multimodal distillation and fusion encoding for modeling the semantic relationship between dynamic ultrasound images and text. First, a fusion encoder combines the visual geometric features of tissues and organs in dynamic ultrasound images, overall visual appearance features, and named-entity linguistic features into a unified visual-linguistic representation, giving the model a richer ability to aggregate and align visual and linguistic cues. The pre-training model is then augmented with multimodal knowledge distillation to improve its learning capacity. Experimental results on multiple datasets show that the multimodal distillation pre-training model consistently improves the fusion of the various feature types in dynamic ultrasound images and enables automated, accurate annotation of dynamic ultrasound imagery.
11
Güler O. A Dirichlet Distribution-Based Complex Ensemble Approach for Breast Cancer Classification from Ultrasound Images with Transfer Learning and Multiphase Spaced Repetition Method. J Imaging Inform Med 2025. [PMID: 40301291] [DOI: 10.1007/s10278-025-01515-5]
Abstract
Breast ultrasound is a useful and rapid diagnostic tool for the early detection of breast cancer. Artificial intelligence-supported computer-aided decision systems, which assist expert radiologists and clinicians, provide reliable and rapid results. Deep learning methods and techniques are widely used in healthcare for early diagnosis, abnormality detection, and disease diagnosis. Therefore, in this study, a deep ensemble learning model based on the Dirichlet distribution, built on pre-trained transfer learning models, is proposed for breast cancer classification from ultrasound images. Experiments were conducted using the Breast Ultrasound Images Dataset (BUSI). The dataset, which had an imbalanced class structure, was balanced using data augmentation techniques. DenseNet201, InceptionV3, VGG16, and ResNet152 models were used for transfer learning with fivefold cross-validation. Statistical analyses, including the ANOVA test and Tukey HSD test, were applied to evaluate the model's performance and ensure the reliability of the results. Additionally, Grad-CAM (Gradient-weighted Class Activation Mapping) was used for explainable AI (XAI), providing visual explanations of the deep learning model's decision-making process. The spaced repetition method, commonly used to improve learners' retention in the educational sciences, was adapted to artificial intelligence in this study: the results of training with transfer learning models were used as input for further training, with spaced repetition applied over previously learned information. The use of spaced repetition improved model accuracy and reduced training times. The weights obtained from the trained models were fed into a Dirichlet distribution-based ensemble learning system in different variations. The proposed model achieved 99.60% validation accuracy on the dataset, demonstrating its effectiveness in breast cancer classification.
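A minimal sketch of the Dirichlet-weighted ensembling idea: sample model-weight vectors from a Dirichlet distribution and average the resulting mixtures of per-model class probabilities. The concentration parameter and number of draws are our assumptions; the paper's exact combination rule may differ.

```python
import numpy as np

def dirichlet_ensemble(probs, alpha=1.0, n_draws=200, rng=None):
    """Combine class probabilities from several fine-tuned backbones under
    Dirichlet-sampled weights (a sketch of the idea, not the published rule).
    probs: (n_models, n_samples, n_classes) softmax outputs."""
    rng = np.random.default_rng(rng)
    n_models = probs.shape[0]
    w = rng.dirichlet(np.full(n_models, alpha), size=n_draws)  # (n_draws, n_models)
    mixed = np.einsum("dm,msc->dsc", w, probs)                 # weighted mixtures
    return mixed.mean(axis=0)                                  # (n_samples, n_classes)

# predictions = dirichlet_ensemble(np.stack([p_densenet, p_inception, p_vgg, p_resnet]))
```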
Affiliation(s)
- Osman Güler: Department of Computer Engineering, Çankırı Karatekin University, Çankırı, Turkey
12
Kalkhof J, Ihm N, Köhler T, Gregori B, Mukhopadhyay A. MED-NCA: Bio-inspired medical image segmentation. Med Image Anal 2025;103:103601. [PMID: 40324321] [DOI: 10.1016/j.media.2025.103601]
Abstract
The reliance on computationally intensive U-Net and Transformer architectures significantly limits their accessibility in low-resource environments, creating a technological divide that hinders global healthcare equity, especially in medical diagnostics and treatment planning. This divide is most pronounced in low- and middle-income countries, primary care facilities, and conflict zones. We introduce MED-NCA, a family of Neural Cellular Automata (NCA)-based segmentation models characterized by their low parameter count, robust performance, and inherent quality control mechanisms. These features drastically lower the barriers to high-quality medical image analysis in resource-constrained settings, allowing the models to run efficiently on hardware as minimal as a Raspberry Pi or a smartphone. Building upon the foundation laid by MED-NCA, this paper extends its validation across eight distinct anatomies, including the hippocampus and prostate (MRI, 3D), liver and spleen (CT, 3D), heart and lung (X-ray, 2D), breast tumor (ultrasound, 2D), and skin lesion (image, 2D). Our comprehensive evaluation demonstrates the broad applicability and effectiveness of MED-NCA in various medical imaging contexts, matching the performance of UNet models two orders of magnitude larger. Additionally, we introduce NCA-VIS, a visualization tool that gives insight into the inference process of MED-NCA and allows users to test its robustness by applying various artifacts. This combination of efficiency, broad applicability, and enhanced interpretability makes MED-NCA a transformative solution for medical image analysis, fostering greater global healthcare equity by making advanced diagnostics accessible in even the most resource-limited environments.
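For readers new to NCAs, the sketch below shows a single stochastic cellular update of the kind such models iterate many times. It is a generic NCA step under our own assumptions (channel counts, fire rate), not MED-NCA's two-stage, multi-resolution architecture.

```python
import torch
import torch.nn as nn

class NCAStep(nn.Module):
    """One stochastic neural-cellular-automaton update: perceive the local
    neighborhood, compute a state delta, apply it to a random subset of cells."""
    def __init__(self, channels=16, hidden=64, fire_rate=0.5):
        super().__init__()
        self.perceive = nn.Conv2d(channels, channels * 3, 3, padding=1,
                                  groups=channels, bias=False)  # local perception
        self.update = nn.Sequential(
            nn.Conv2d(channels * 3, hidden, 1), nn.ReLU(),
            nn.Conv2d(hidden, channels, 1),
        )
        self.fire_rate = fire_rate

    def forward(self, state):
        dx = self.update(self.perceive(state))
        fire = (torch.rand(state.shape[0], 1, *state.shape[2:],
                           device=state.device) < self.fire_rate).float()
        return state + dx * fire     # asynchronous, cell-wise update

# segmentation reading: seed the state with the image, iterate the same tiny
# rule (e.g. 64 steps), then threshold a designated mask channel of `state`.
```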
Affiliation(s)
- John Kalkhof, Niklas Ihm, Tim Köhler, Bjarne Gregori: Darmstadt University of Technology, Karolinenplatz 5, 64289 Darmstadt, Germany
13
Saini M, Hassanzadeh S, Musa B, Fatemi M, Alizad A. Variational mode directed deep learning framework for breast lesion classification using ultrasound imaging. Sci Rep 2025;15:14300. [PMID: 40274985] [PMCID: PMC12022294] [DOI: 10.1038/s41598-025-99009-5]
Abstract
Breast cancer is the most prevalent cancer and the second leading cause of cancer-related death among women in the United States. Accurate and early detection of breast cancer can reduce mortality. Recent works explore deep learning techniques with ultrasound for detecting malignant breast lesions. However, the lack of explanatory features, the need for segmentation, and high computational complexity limit their applicability to this detection task. Therefore, we propose a novel ultrasound-based breast lesion classification framework that utilizes two-dimensional variational mode decomposition (2D-VMD), which provides self-explanatory features for guiding a convolutional neural network (CNN) with mixed pooling and attention mechanisms. Visual inspection of these features demonstrates their explainability in terms of discriminative lesion-specific boundaries and texture in the decomposed modes of benign and malignant images, which further guide the deep learning network toward enhanced classification. The proposed framework can classify lesions with accuracies of 98% and 93% on two public breast ultrasound datasets and 89% on an in-house dataset, without requiring lesion segmentation as existing techniques do, while maintaining a favorable trade-off between sensitivity and specificity. 2D-VMD improves the areas under the receiver operating characteristic and precision-recall curves by 5% and 10%, respectively. The proposed method achieves a relative improvement in accuracy of 14.47% (8.42%) (mean (SD)) over state-of-the-art methods on one public dataset, and 5.75% (4.52%) on another public dataset with performance comparable to two existing methods. Further, it is computationally efficient, with a reduction of [Formula: see text] in floating-point operations compared with existing methods.
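One of the ingredients, mixed pooling, can be sketched as a learnable blend of max and average pooling; stacking the 2D-VMD modes as extra input channels would then feed such a CNN. The sigmoid-bounded mixing weight is our assumption.

```python
import torch
import torch.nn as nn

class MixedPool2d(nn.Module):
    """Learnable blend of max and average pooling (a sketch of the mixed-
    pooling ingredient; the VMD-guided pipeline itself is not reproduced)."""
    def __init__(self, kernel=2):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.5))   # learned mixing weight
        self.maxp = nn.MaxPool2d(kernel)
        self.avgp = nn.AvgPool2d(kernel)

    def forward(self, x):
        a = torch.sigmoid(self.alpha)                  # keep the blend in [0, 1]
        return a * self.maxp(x) + (1 - a) * self.avgp(x)
```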
Affiliation(s)
- Manali Saini, Sara Hassanzadeh: Department of Radiology, Mayo Clinic College of Medicine and Science, Rochester, MN 55905, USA
- Bushira Musa, Mostafa Fatemi: Department of Physiology and Biomedical Engineering, Mayo Clinic College of Medicine and Science, Rochester, MN 55905, USA
- Azra Alizad: Department of Radiology and Department of Physiology and Biomedical Engineering, Mayo Clinic College of Medicine and Science, Rochester, MN 55905, USA
14
Chelloug SA, Ba Mahel AS, Alnashwan R, Rafiq A, Ali Muthanna MS, Aziz A. Enhanced breast cancer diagnosis using modified InceptionNet-V3: a deep learning approach for ultrasound image classification. Front Physiol 2025;16:1558001. [PMID: 40330252] [PMCID: PMC12052540] [DOI: 10.3389/fphys.2025.1558001]
Abstract
Introduction Breast cancer (BC) is a malignant neoplasm that originates in the mammary gland's cellular structures and remains one of the most prevalent cancers among women, ranking second in cancer-related mortality after lung cancer. Early and accurate diagnosis is crucial due to the heterogeneous nature of breast cancer and its rapid progression. However, manual detection and classification are often time-consuming and prone to errors, necessitating the development of automated and reliable diagnostic approaches. Methods Recent advancements in deep learning have significantly improved medical image analysis, demonstrating superior predictive performance in breast cancer detection using ultrasound images. Despite these advancements, training deep learning models from scratch can be computationally expensive and data-intensive. Transfer learning, leveraging pre-trained models on large-scale datasets, offers an effective solution to mitigate these challenges. In this study, we investigate and compare multiple deep-learning models for breast cancer classification using transfer learning. The evaluated architectures include modified InceptionV3, GoogLeNet, ShuffleNet, AlexNet, VGG-16, and SqueezeNet. Additionally, we propose a deep neural network model that integrates features from modified InceptionV3 to further enhance classification performance. Results The experimental results demonstrate that the modified InceptionV3 model achieves the highest classification accuracy of 99.10%, with a recall of 98.90%, precision of 99.00%, and an F1-score of 98.80%, outperforming all other evaluated models on the given datasets. Discussion These findings underscore the potential of the proposed approach in enhancing diagnostic precision and confirm the superiority of the modified InceptionV3 model in breast cancer classification tasks.
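A minimal transfer-learning sketch with torchvision's stock InceptionV3 (freeze the backbone, replace the classification head). The paper's "modified InceptionV3" adds further changes not reproduced here, and the three-class head (benign / malignant / normal) is our assumption.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                       # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 3)     # new trainable 3-class head
model.aux_logits = False                          # train-mode forward returns plain logits

model.eval()
x = torch.randn(4, 3, 299, 299)                   # InceptionV3 expects 299x299 RGB
logits = model(x)                                 # (4, 3) class logits
```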
Affiliation(s)
- Samia Allaoua Chelloug, Rana Alnashwan: Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Abduljabbar S. Ba Mahel: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Ahsan Rafiq: Institute of Information Technology and Information Security, Southern Federal University, Taganrog, Russia
- Mohammed Saleh Ali Muthanna: Department of International Business Management, Tashkent State University of Economics, Tashkent, Uzbekistan
- Ahmed Aziz: Department of Computer Science, Faculty of Computer and Artificial Intelligence, Benha University, Benha, Egypt; Engineering School, Central Asian University, Tashkent, Uzbekistan
15
Woerner S, Jaques A, Baumgartner CF. A comprehensive and easy-to-use multi-domain multi-task medical imaging meta-dataset. Sci Data 2025;12:666. [PMID: 40253434] [PMCID: PMC12009356] [DOI: 10.1038/s41597-025-04866-4]
Abstract
While the field of medical image analysis has undergone a transformative shift with the integration of machine learning techniques, a main challenge for these techniques is often the scarcity of large, diverse, and well-annotated datasets. Medical images vary in format, size, and other parameters, and therefore require extensive preprocessing and standardization for use in machine learning. Addressing these challenges, we introduce the Medical Imaging Meta-Dataset (MedIMeta), a novel multi-domain, multi-task meta-dataset. MedIMeta contains 19 medical imaging datasets spanning 10 different domains and encompassing 54 distinct medical tasks, all of which are standardized to the same format and readily usable in PyTorch or other ML frameworks. We perform a technical validation of MedIMeta, demonstrating its utility through fully supervised and cross-domain few-shot learning baselines.
Affiliation(s)
- Stefano Woerner, Arthur Jaques: Cluster of Excellence "Machine Learning: New Perspectives for Science", University of Tübingen, Tübingen, Germany
- Christian F Baumgartner: Cluster of Excellence "Machine Learning: New Perspectives for Science", University of Tübingen, Tübingen, Germany; Faculty of Health Sciences and Medicine, University of Lucerne, Lucerne, Switzerland
16
Lu Y, Sun F, Wang J, Yu K. Automatic joint segmentation and classification of breast ultrasound images via multi-task learning with object contextual attention. Front Oncol 2025;15:1567577. [PMID: 40265029] [PMCID: PMC12011763] [DOI: 10.3389/fonc.2025.1567577]
Abstract
The segmentation and classification of breast ultrasound (BUS) images are crucial for the early diagnosis of breast cancer and remain a key focus in BUS image processing. Numerous machine learning and deep learning algorithms have shown their effectiveness in the segmentation and diagnosis of BUS images. In this work, we propose a multi-task learning network with an object contextual attention module (MTL-OCA) for the segmentation and classification of BUS images. The proposed method utilizes the object contextual attention module to capture pixel-region relationships, enhancing the quality of segmentation masks. For classification, the model leverages high-level features extracted from unenhanced segmentation masks to improve accuracy. Cross-validation on a public BUS dataset demonstrates that MTL-OCA outperforms several current state-of-the-art methods, achieving superior results in both classification and segmentation tasks.
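The multi-task structure (shared encoder, per-pixel segmentation head, image-level classification head, weighted joint loss) can be sketched as follows. The OCA module and the loss weighting are not the paper's; `encoder` stands for any backbone producing a feature map.

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Joint segmentation + classification from a shared encoder (a minimal
    sketch of the multi-task idea; MTL-OCA's OCA module is not reproduced)."""
    def __init__(self, encoder, feat_ch, n_classes=2):
        super().__init__()
        self.encoder = encoder                     # backbone -> (B, feat_ch, H, W)
        self.seg = nn.Conv2d(feat_ch, 1, 1)        # per-pixel mask logits
        self.cls = nn.Linear(feat_ch, n_classes)   # image-level label logits

    def forward(self, x):
        f = self.encoder(x)
        mask_logits = self.seg(f)
        cls_logits = self.cls(f.mean(dim=(2, 3)))  # global average pooling
        return mask_logits, cls_logits

def multitask_loss(mask_logits, mask_gt, cls_logits, label, alpha=0.5):
    # mask_gt: float (B, 1, H, W); label: long (B,); alpha is an assumption
    seg = nn.functional.binary_cross_entropy_with_logits(mask_logits, mask_gt)
    cls = nn.functional.cross_entropy(cls_logits, label)
    return alpha * seg + (1 - alpha) * cls
```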
Affiliation(s)
- Yaling Lu: Department of Medicine Ultrasound, People’s Hospital of Tianfu New Area in Sichuan, Chengdu, Sichuan, China
- Fengyuan Sun: Guangxi Wireless Broadband Communication and Signal Processing Key Laboratory and School of Information and Communication, Guilin University of Electronic Technology, Guilin, Guangxi, China
- Jingyu Wang: School of Software Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Kai Yu: Digital Intelligence Business Department, Beijing Smartmore Intelligent Technology Co., Ltd, Beijing, China
17
|
Pan L, Tang M, Chen X, Du Z, Huang D, Yang M, Chen Y. M 2UNet: Multi-Scale Feature Acquisition and Multi-Input Edge Supplement Based on UNet for Efficient Segmentation of Breast Tumor in Ultrasound Images. Diagnostics (Basel) 2025; 15:944. [PMID: 40310342 PMCID: PMC12025914 DOI: 10.3390/diagnostics15080944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2025] [Revised: 04/03/2025] [Accepted: 04/05/2025] [Indexed: 05/02/2025] Open
Abstract
Background/Objectives: The morphological characteristics of breast tumors play a crucial role in the preliminary diagnosis of breast cancer. However, malignant tumors often exhibit rough, irregular edges and unclear boundaries in ultrasound images. Additionally, variations in tumor size, location, and shape further complicate accurate segmentation of breast tumors from ultrasound images. Methods: To address these difficulties, this paper introduces a breast ultrasound tumor segmentation network comprising a multi-scale feature acquisition (MFA) module and a multi-input edge supplement (MES) module. The MFA module effectively incorporates dilated convolutions of various sizes in a serial-parallel fashion to capture tumor features at diverse scales. The MES module is then employed to enhance the output of each decoder layer by supplementing edge information, improving the overall integrity of tumor boundaries and contributing to more refined segmentation results. Results: The mean Dice (mDice), Pixel Accuracy (PA), Intersection over Union (IoU), Recall, and Hausdorff Distance (HD) of this method on the publicly available breast ultrasound image (BUSI) dataset were 79.43%, 96.84%, 83.00%, 87.17%, and 19.71 mm, respectively, and on the Fujian Cancer Hospital dataset, 90.45%, 97.55%, 90.08%, 93.72%, and 11.02 mm, respectively. On the BUSI dataset, compared to the original UNet, the Dice for malignant tumors increased by 14.59% and the HD decreased by 17.13 mm. Conclusions: Our method accurately segments breast tumors in ultrasound images, providing valuable edge information for the subsequent diagnosis of breast cancer. The experimental results show that our method achieves substantial improvements in accuracy.
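A sketch of the serial-parallel dilated-convolution idea behind the MFA module, with assumed dilation rates of 1, 2, and 4: each stage widens the receptive field in series, while the concatenation keeps every intermediate scale in parallel.

```python
import torch
import torch.nn as nn

class MultiScaleFeature(nn.Module):
    """Serial-parallel dilated convolutions (a sketch of the MFA idea with
    assumed dilation rates; the paper's exact wiring may differ)."""
    def __init__(self, ch):
        super().__init__()
        self.d1 = nn.Conv2d(ch, ch, 3, padding=1, dilation=1)
        self.d2 = nn.Conv2d(ch, ch, 3, padding=2, dilation=2)
        self.d4 = nn.Conv2d(ch, ch, 3, padding=4, dilation=4)
        self.fuse = nn.Conv2d(ch * 3, ch, 1)

    def forward(self, x):
        f1 = torch.relu(self.d1(x))      # serial chain: each stage sees a
        f2 = torch.relu(self.d2(f1))     # progressively wider context...
        f4 = torch.relu(self.d4(f2))
        # ...while the parallel concat keeps every intermediate scale
        return self.fuse(torch.cat([f1, f2, f4], dim=1))
```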
Affiliation(s)
- Lin Pan, Mengshi Tang, Xin Chen, Mingjing Yang: College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
- Zhongshi Du, Danfeng Huang, Yijie Chen: Department of Ultrasound, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou 350014, China
18
Liu X, Su Z, Shi Y, Tong Y, Wang G, Wei GW. Manifold Topological Deep Learning for Biomedical Data. Res Sq [Preprint] 2025:rs.3.rs-6149503. [PMID: 40297704] [PMCID: PMC12036455] [DOI: 10.21203/rs.3.rs-6149503/v1]
Abstract
Recently, topological deep learning (TDL), which integrates algebraic topology with deep neural networks, has achieved tremendous success in processing point-cloud data, emerging as a promising paradigm in data science. However, TDL has not been developed for data on differentiable manifolds, including images, due to the challenges posed by differential topology. We address this challenge by introducing manifold topological deep learning (MTDL) for the first time. To highlight the power of Hodge theory rooted in differential topology, we consider a simple convolutional neural network (CNN) in MTDL. In this novel framework, original images are represented as smooth manifolds with vector fields that are decomposed into three orthogonal components based on Hodge theory. These components are then concatenated to form an input image for the CNN architecture. The performance of MTDL is evaluated using the MedMNIST v2 benchmark database, which comprises 717,287 biomedical images from eleven 2D and six 3D datasets. MTDL significantly outperforms other competing methods, extending TDL to a wide range of data on smooth manifolds.
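The Hodge (Helmholtz) decomposition that MTDL feeds to the CNN can be illustrated for a flat 2D vector field with FFTs: project onto the gradient (curl-free) direction in Fourier space, keep the mean flow as the harmonic part, and take the remainder as the divergence-free part. This is a toy sketch under periodic-boundary assumptions, not the paper's manifold construction.

```python
import numpy as np

def hodge_decompose(vx, vy):
    """Split a 2D vector field into curl-free, divergence-free, and harmonic
    (mean) parts, assuming periodic boundaries so plain FFTs apply."""
    ky, kx = np.meshgrid(np.fft.fftfreq(vx.shape[0]),
                         np.fft.fftfreq(vx.shape[1]), indexing="ij")
    vxh, vyh = np.fft.fft2(vx), np.fft.fft2(vy)
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                               # avoid 0/0 at the DC term
    proj = (kx * vxh + ky * vyh) / k2            # (k . v_hat) / |k|^2
    cx, cy = proj * kx, proj * ky                # curl-free (gradient) part
    cx[0, 0] = cy[0, 0] = 0.0
    hx, hy = np.zeros_like(vxh), np.zeros_like(vyh)
    hx[0, 0], hy[0, 0] = vxh[0, 0], vyh[0, 0]    # harmonic: the mean flow
    dx, dy = vxh - cx - hx, vyh - cy - hy        # divergence-free remainder
    inv = lambda a: np.real(np.fft.ifft2(a))
    return (inv(cx), inv(cy)), (inv(dx), inv(dy)), (inv(hx), inv(hy))

# e.g. decompose the intensity-gradient field of an image `img`:
# gy, gx = np.gradient(img); parts = hodge_decompose(gx, gy)
```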
Affiliation(s)
- Xiang Liu, Zhe Su: Department of Mathematics, Michigan State University, MI 48824, USA
- Yongyi Shi, Ge Wang: Biomedical Imaging Center, Rensselaer Polytechnic Institute, NY 12180, USA
- Yiying Tong: Computer Science and Engineering, Michigan State University, MI 48824, USA
- Guo-Wei Wei: Department of Mathematics; Department of Electrical and Computer Engineering; Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
19
Zhang B, Huang H, Shen Y, Sun M. MM-UKAN++: A Novel Kolmogorov-Arnold Network-Based U-Shaped Network for Ultrasound Image Segmentation. IEEE Trans Ultrason Ferroelectr Freq Control 2025;72:498-514. [PMID: 40031744] [DOI: 10.1109/tuffc.2025.3539262]
Abstract
Ultrasound (US) imaging is an important and commonly used medical imaging modality. Accurate and fast automatic segmentation of regions of interest (ROIs) in US images is essential for enhancing the efficiency of clinical and robot-assisted diagnosis. However, US images suffer from low contrast, fuzzy boundaries, and significant scale variations in ROIs. Existing convolutional neural network (CNN)-based and transformer-based methods struggle with model efficiency and explainability. To address these challenges, we introduce MM-UKAN++, a novel U-shaped network based on Kolmogorov-Arnold networks (KANs). MM-UKAN++ leverages multilevel KAN layers as the encoder and decoder within the U-network architecture and incorporates an innovative multidimensional attention mechanism to refine skip connections by weighting features from frequency-channel and spatial perspectives. In addition, the network effectively integrates multiscale information, fusing outputs from decoders at different scales to generate precise segmentation predictions. MM-UKAN++ achieves higher segmentation accuracy with lower computational cost and outperforms other mainstream methods on several open-source datasets for US image segmentation tasks, achieving 69.42% IoU, 81.30% Dice, and 3.31 mm HD on the BUSI dataset with 3.17 G floating-point operations (FLOPs) and 9.90 M parameters. The excellent performance of our automatic carotid artery US scanning and diagnostic system further demonstrates the speed and accuracy of MM-UKAN++. Its good performance in other medical image segmentation tasks also reveals promising applications of MM-UKAN++. The code is available on GitHub.
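For readers unfamiliar with KAN layers, the sketch below shows the core idea in simplified form: every input-to-output edge applies its own learnable univariate function, here expanded in a fixed radial-basis-function basis (published KANs typically use B-splines); the class name and sizes are illustrative, not from MM-UKAN++.

```python
import torch
import torch.nn as nn

class RBFKANLayer(nn.Module):
    """Toy KAN-style layer: each input->output edge applies a learnable
    univariate function expressed in a fixed radial-basis-function basis."""
    def __init__(self, in_dim, out_dim, n_basis=8, span=2.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(-span, span, n_basis))
        self.gamma = (n_basis / (2 * span)) ** 2
        # one coefficient per (output, input, basis function)
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, n_basis) * 0.1)

    def forward(self, x):                       # x: (batch, in_dim)
        # evaluate all basis functions at every input coordinate
        phi = torch.exp(-self.gamma * (x.unsqueeze(-1) - self.centers) ** 2)
        # sum over basis functions, then over input edges
        return torch.einsum("nib,oib->no", phi, self.coef)

y = RBFKANLayer(16, 4)(torch.randn(2, 16))      # -> (2, 4)
```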
Collapse
|
20
|
Yang T, Huang Q, Cai F, Li J, Jiang L, Xia Y. Vital Characteristics Cellular Neural Network (VCeNN) for Melanoma Lesion Segmentation: A Biologically Inspired Deep Learning Approach. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2025; 38:1147-1164. [PMID: 39284982 PMCID: PMC11950543 DOI: 10.1007/s10278-024-01257-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/01/2024] [Revised: 08/27/2024] [Accepted: 08/31/2024] [Indexed: 03/29/2025]
Abstract
Cutaneous melanoma is a highly lethal form of cancer. Developing a medical image segmentation model capable of accurately delineating melanoma lesions with high robustness and generalization presents a formidable challenge. This study draws inspiration from cellular functional characteristics and natural selection, proposing a novel medical segmentation model named the vital characteristics cellular neural network. This model incorporates vital characteristics observed in multicellular organisms, including memory, adaptation, apoptosis, and division. The memory module enables the network to rapidly adapt to input data during the early stages of training, accelerating model convergence. The adaptation module allows neurons to select the appropriate activation function based on varying environmental conditions. The apoptosis module reduces the risk of overfitting by pruning neurons with low activation values. The division module enhances the network's learning capacity by duplicating neurons with high activation values. Experimental evaluations demonstrate the efficacy of this model in enhancing the performance of neural networks for medical image segmentation. The proposed method achieves outstanding results across numerous publicly available datasets, with an F1 score of 0.901, an Intersection over Union of 0.841, and a Dice coefficient of 0.913, indicating its potential to contribute significantly to the field of medical image analysis and to facilitate accurate and efficient segmentation of medical imagery.
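A toy rendering of the apoptosis/division mechanics on a single nn.Linear layer may clarify the idea: the lowest-activation neurons are overwritten with noisy copies of the strongest ones, and the duplicated outgoing weights are halved so the layer's function is roughly preserved. This fixed-width sketch is a loose simplification for illustration, not the paper's modules.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def apoptosis_division(layer, next_layer, acts, k=1):
    """Prune the k lowest-|activation| neurons of `layer` (apoptosis) by
    replacing them with noisy copies of the k strongest ones (division).
    `acts` is a (batch, hidden) tensor of recorded hidden activations;
    assumes k is small so the two index sets do not overlap."""
    score = acts.abs().mean(dim=0)                     # per-neuron activity
    low = torch.topk(score, k, largest=False).indices
    high = torch.topk(score, k, largest=True).indices
    layer.weight[low] = layer.weight[high] + 0.01 * torch.randn_like(layer.weight[high])
    layer.bias[low] = layer.bias[high]
    # split the outgoing weights so the duplicated pair sums to the original
    next_layer.weight[:, high] *= 0.5
    next_layer.weight[:, low] = next_layer.weight[:, high]
```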
Collapse
Affiliation(s)
- Tongxin Yang
- Chongqing University of Science and Technology, Chongqing, 401331, China
| | - Qilin Huang
- Chongqing University of Science and Technology, Chongqing, 401331, China
| | - Fenglin Cai
- Chongqing University of Science and Technology, Chongqing, 401331, China
| | - Jie Li
- Chongqing University of Science and Technology, Chongqing, 401331, China.
| | - Li Jiang
- The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Yulong Xia
- The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| |
Collapse
|
21
|
Guo S, Liu Z, Yang Z, Lee CH, Lv Q, Shen L. Multi-scale multi-object semi-supervised consistency learning for ultrasound image segmentation. Neural Netw 2025; 184:107095. [PMID: 39754842 DOI: 10.1016/j.neunet.2024.107095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2024] [Revised: 10/18/2024] [Accepted: 12/23/2024] [Indexed: 01/06/2025]
Abstract
Manual annotation of ultrasound images relies on expert knowledge and requires significant time and financial resources. Semi-supervised learning (SSL) exploits large amounts of unlabeled data to improve model performance under limited labeled data. However, it faces two challenges: fusion of contextual information at multiple scales and bias of spatial information between multiple objects. We propose a consistency learning-based multi-scale multi-object (MSMO) semi-supervised framework for ultrasound image segmentation. MSMO addresses these challenges by employing a context-aware encoder coupled with a multi-object semantic calibration and fusion decoder. First, the encoder extracts multi-scale multi-object context-aware features and introduces an attention module to refine the feature map and enhance channel information interaction. Then, the decoder uses HConvLSTM to calibrate the output features of the current object using the hidden state of the previous object, and recursively fuses multi-object semantics at different scales. Finally, MSMO further reduces variation among multiple decoders under different perturbations through consistency constraints, thereby producing consistent predictions for highly uncertain areas. Extensive experiments show that the proposed MSMO outperforms the SSL baseline on four benchmark datasets, for both single-object and multi-object ultrasound image segmentation. MSMO significantly reduces the burden of manual analysis of ultrasound images and holds great potential as a clinical tool. The source code is accessible to the public at: https://github.com/lol88/MSMO.
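The consistency constraint at the core of such SSL frameworks fits in a few lines; this generic sketch (not MSMO's exact multi-decoder variant) penalizes disagreement between predictions on two noise-perturbed views of the same unlabeled batch.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x_unlabeled, noise_std=0.1):
    """Perturbation-consistency term for semi-supervised segmentation:
    two noisy views of the same unlabeled batch should agree."""
    p1 = torch.softmax(model(x_unlabeled + noise_std * torch.randn_like(x_unlabeled)), dim=1)
    p2 = torch.softmax(model(x_unlabeled + noise_std * torch.randn_like(x_unlabeled)), dim=1)
    return F.mse_loss(p1, p2)
```

In practice this term is added to the supervised loss on the labeled subset, usually with a ramp-up weight.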
Collapse
Affiliation(s)
- Saidi Guo
- School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China; School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
| | - Zhaoshan Liu
- Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore 117575, Singapore
| | - Ziduo Yang
- Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore 117575, Singapore; School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China
| | - Chau Hung Lee
- Department of Radiology, Tan Tock Seng Hospital, 11 Jalan Tan Tock Seng, Singapore 308433, Singapore
| | - Qiujie Lv
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China.
| | - Lei Shen
- Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore 117575, Singapore.
| |
Collapse
|
22
|
Zhao L, Wang T, Chen Y, Zhang X, Tang H, Lin F, Li C, Li Q, Tan T, Kang D, Tong T. A novel framework for segmentation of small targets in medical images. Sci Rep 2025; 15:9924. [PMID: 40121297 PMCID: PMC11929788 DOI: 10.1038/s41598-025-94437-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2024] [Accepted: 03/13/2025] [Indexed: 03/25/2025] Open
Abstract
Medical image segmentation represents a pivotal and intricate procedure in the domain of medical image processing and analysis. With the progression of artificial intelligence in recent years, the utilization of deep learning techniques for medical image segmentation has witnessed escalating popularity. Nevertheless, the intricate nature of medical images poses challenges, and the segmentation of diminutive targets remains in its early stages. Current networks encounter difficulties in segmenting exceedingly small targets, especially when the number of training samples is limited. To overcome this constraint, we implemented a proficient strategy to enhance lesion images containing small targets under constrained samples. We introduce a segmentation framework termed STS-Net, specifically designed for small target segmentation. This framework leverages the established capacity of convolutional neural networks to acquire effective image representations. The proposed STS-Net adopts a ResNeXt50-32x4d architecture as the encoder, integrating attention mechanisms during the encoding phase to amplify the feature representation capabilities of the network. We evaluated the proposed network on four publicly available datasets. Experimental results underscore the superiority of our approach in the domain of medical image segmentation, particularly for small target segmentation. The codes are available at https://github.com/zlxokok/STSNet.
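As a sketch of how a ResNeXt50-32x4d backbone can serve as a U-shaped encoder (the paper's attention modules are omitted, and a recent torchvision API is assumed), the stage outputs below would feed a decoder as multi-scale skip features.

```python
import torch
import torch.nn as nn
from torchvision.models import resnext50_32x4d

class ResNeXtEncoder(nn.Module):
    """Expose the four ResNeXt50-32x4d stages so a U-shaped decoder
    can consume them as skip connections."""
    def __init__(self):
        super().__init__()
        m = resnext50_32x4d(weights=None)   # or pretrained weights
        self.stem = nn.Sequential(m.conv1, m.bn1, m.relu, m.maxpool)
        self.stages = nn.ModuleList([m.layer1, m.layer2, m.layer3, m.layer4])

    def forward(self, x):
        x = self.stem(x)
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)                 # strides 4, 8, 16, 32
        return feats

feats = ResNeXtEncoder()(torch.randn(1, 3, 224, 224))
print([tuple(f.shape) for f in feats])      # channel widths 256..2048
```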
Collapse
Affiliation(s)
- Longxuan Zhao
- College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China.
- Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China.
| | - Tao Wang
- College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China
- Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China
| | - Yuanbin Chen
- College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China
- Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China
| | - Xinlin Zhang
- College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China
- Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China
- Imperial Vision Technology, Fuzhou, 350100, China
| | - Hui Tang
- College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China
- Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China
| | - Fuxin Lin
- Department of Neurosurgery, Neurosurgery Research Institute, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
- Department of Neurosurgery, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
- Department of Neurosurgery, Fujian Institute of Brain Disorders and Brain Science, Fujian Clinical Research Center for Neurological Diseases, The First Affiliated Hospital and Neurosurgery Research Institute, Fujian Medical University, Fuzhou, 350100, China
- Fujian Provincial Clinical Research Center for Neurological Diseases, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
- Clinical Research and Translation Center, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
| | - Chunwang Li
- Department of Neurosurgery, Neurosurgery Research Institute, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
- Department of Neurosurgery, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
| | - Qixuan Li
- Department of Neurosurgery, Neurosurgery Research Institute, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
- Department of Neurosurgery, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China
| | - Tao Tan
- Macao Polytechnic University, Macao, 999078, China
| | - Dezhi Kang
- Department of Neurosurgery, Neurosurgery Research Institute, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China.
- Department of Neurosurgery, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China.
- Department of Neurosurgery, Fujian Institute of Brain Disorders and Brain Science, Fujian Clinical Research Center for Neurological Diseases, The First Affiliated Hospital and Neurosurgery Research Institute, Fujian Medical University, Fuzhou, 350100, China.
- Fujian Provincial Clinical Research Center for Neurological Diseases, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China.
- Clinical Research and Translation Center, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350100, China.
| | - Tong Tong
- College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350100, China.
- Fujian Key Lab of Medical Instrumentation and Pharmaceutical Technology, Fuzhou, 350100, China.
- Imperial Vision Technology, Fuzhou, 350100, China.
| |
Collapse
|
23
|
Minocha S, Sharma SR, Singh B, Gandomi AH. Adaptive image encryption approach using an enhanced swarm intelligence algorithm. Sci Rep 2025; 15:9476. [PMID: 40108167 PMCID: PMC11923226 DOI: 10.1038/s41598-025-86569-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2024] [Accepted: 01/13/2025] [Indexed: 03/22/2025] Open
Abstract
Chaos-based encryption methods have gained popularity due to the unique properties of chaos. The performance of chaos-based encryption methods is highly impacted by the values of initial and control parameters. Therefore, this work proposes Iterative Cosine operator-based Hippopotamus Optimization (ICO-HO) to select optimal parameters for chaotic maps, which is further used to design an adaptive image encryption approach. The ICO-HO algorithm improves Hippopotamus Optimization (HO) by integrating a new phase (Phase 4) to update the position of the hippopotamus. ICO-HO updates the positions of hippopotamuses using ICO and opposition-based learning, which enhances the exploration and exploitation capabilities of the HO algorithm. The superior performance of the ICO-HO algorithm is confirmed by the Friedman mean rank test applied to mean values obtained on the CEC-2017 benchmark functions. The ICO-HO algorithm is utilized to optimize the parameters of the PWLCM and PWCM chaotic maps to generate a secret key in the confusion and diffusion phases of image encryption. The performance of the proposed encryption approach is evaluated on grayscale, RGB, and hyperspectral medical images of different modalities, bit depths, and sizes. Different analyses, such as visual analysis, statistical attack analysis, differential attack analysis, and quantitative analysis, have been utilized to assess the effectiveness of the proposed encryption approach. The high NPCR and UACI values, i.e., 99.60% and 33.40%, respectively, ensure security against differential attacks. Furthermore, the proposed encryption approach is compared with five state-of-the-art encryption techniques available in the literature and six similar metaheuristic techniques using NPCR, UACI, entropy, and correlation coefficient. The proposed method exhibits entropy values of 7.9995 and 15.8124 on 8-bit and 16-bit images, respectively, which is better than all other stated methods, resulting in improved image encryption with high randomness.
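To make the chaotic confusion/diffusion step concrete, here is a minimal PWLCM keystream with XOR diffusion and the NPCR metric; x0 and p stand in for the parameters ICO-HO would optimize, and the particular values shown are arbitrary.

```python
import numpy as np

def pwlcm_keystream(x0, p, n):
    """Byte keystream from the piecewise linear chaotic map (PWLCM),
    x0 in (0, 1), p in (0, 0.5). These are the secret parameters the
    paper tunes with its optimizer; here they are plain inputs."""
    x, out = x0, np.empty(n, dtype=np.uint8)
    for i in range(n):
        xx = 1.0 - x if x >= 0.5 else x          # map is symmetric about 0.5
        x = xx / p if xx < p else (xx - p) / (0.5 - p)
        out[i] = int(x * 256) % 256
    return out

def encrypt(img, x0=0.3141, p=0.271):
    """XOR diffusion of a uint8 image with the chaotic keystream."""
    ks = pwlcm_keystream(x0, p, img.size).reshape(img.shape)
    return img ^ ks

def npcr(c1, c2):
    """Number of Pixel Change Rate (%) between two cipher images."""
    return 100.0 * np.mean(c1 != c2)
```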
Collapse
Affiliation(s)
- Sachin Minocha
- School of Computer Science Engineering and Technology, Bennett University, Greater Noida, India
| | - Suvita Rani Sharma
- School of Computer Science Engineering and Technology, Bennett University, Greater Noida, India
| | - Birmohan Singh
- Department of Computer Science and Engineering, Sant Longowal Institute of Engineering and Technology, Longowal, India
| | - Amir H Gandomi
- Faculty of Engineering & IT, University of Technology Sydney, Sydney, Australia.
- University Research and Innovation Center (EKIK), Obuda University, Budapest, Hungary.
- Department of Computer Science, Khazar University, 41 Mahsati, Baku, Azerbaijan.
| |
Collapse
|
24
|
Li Y, Huang J, Zhang Y, Deng J, Zhang J, Dong L, Wang D, Mei L, Lei C. Dual branch segment anything model-transformer fusion network for accurate breast ultrasound image segmentation. Med Phys 2025. [PMID: 40103542 DOI: 10.1002/mp.17751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2024] [Revised: 02/14/2025] [Accepted: 02/16/2025] [Indexed: 03/20/2025] Open
Abstract
BACKGROUND Precise and rapid ultrasound-based breast cancer diagnosis is essential for effective treatment. However, existing ultrasound image segmentation methods often fail to capture both global contextual features and fine-grained boundary details. PURPOSE This study proposes a dual-branch network architecture that combines the Swin Transformer and Segment Anything Model (SAM) to enhance breast ultrasound image (BUSI) segmentation accuracy and reliability. METHODS Our network integrates the global attention mechanism of the Swin Transformer with fine-grained boundary detection from SAM through a multi-stage feature fusion module. We evaluated our method against state-of-the-art methods on two datasets: the Breast Ultrasound Images dataset from Wuhan University (BUSI-WHU), which contains 927 images (560 benign and 367 malignant) with ground truth masks annotated by radiologists, and the public BUSI dataset. Performance was evaluated using mean Intersection-over-Union (mIoU), 95th percentile Hausdorff Distance (HD95), and Dice similarity coefficients, with statistical significance assessed using two-tailed independent t-tests with Holm-Bonferroni correction (α = 0.05). RESULTS On our proposed dataset, the network achieved a mIoU of 90.82% and an HD95 of 23.50 pixels, demonstrating significant improvements over current state-of-the-art methods with effect sizes for mIoU ranging from 0.38 to 0.61 (p < 0.05). On the BUSI dataset, the network achieved a mIoU of 82.83% and an HD95 of 71.13 pixels, demonstrating comparable improvements with effect sizes for mIoU ranging from 0.45 to 0.58 (p < 0.05). CONCLUSIONS Our dual-branch network leverages the complementary strengths of Swin Transformer and SAM through a fusion mechanism, demonstrating superior breast ultrasound segmentation performance. Our code is publicly available at https://github.com/Skylanding/DSATNet.
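Since HD95 is the headline distance metric here and in several entries above, a small reference sketch may help: extract each mask's boundary and take the 95th percentile of the pooled boundary-to-boundary distances. This is a simplified stand-in for library implementations and assumes non-empty masks.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt as edt

def hd95(a, b):
    """95th-percentile Hausdorff distance (pixels) between boolean masks."""
    sa = a & ~binary_erosion(a)          # boundary pixels of a
    sb = b & ~binary_erosion(b)          # boundary pixels of b
    d_ab = edt(~sb)[sa]                  # a-boundary -> b-boundary distances
    d_ba = edt(~sa)[sb]                  # b-boundary -> a-boundary distances
    return np.percentile(np.hstack([d_ab, d_ba]), 95)
```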
Collapse
Affiliation(s)
- Yu Li
- The Institute of Technological Sciences, Wuhan University, Wuhan, China
| | - Jin Huang
- The Institute of Technological Sciences, Wuhan University, Wuhan, China
| | - Yimin Zhang
- The Department of Breast and Thyroid Surgery, Renmin Hospital of Wuhan University, Wuhan, China
| | - Jingwen Deng
- The Department of Breast and Thyroid Surgery, Renmin Hospital of Wuhan University, Wuhan, China
| | - Jingwen Zhang
- The Department of Breast and Thyroid Surgery, Renmin Hospital of Wuhan University, Wuhan, China
| | - Lan Dong
- The Department of Gynecology, Renmin Hospital of Wuhan University, Wuhan, China
| | - Du Wang
- The Institute of Technological Sciences, Wuhan University, Wuhan, China
| | - Liye Mei
- The Institute of Technological Sciences, Wuhan University, Wuhan, China
- School of Computer Science, Hubei University of Technology, Wuhan, China
| | - Cheng Lei
- The Institute of Technological Sciences, Wuhan University, Wuhan, China
- Suzhou Institute of Wuhan University, Suzhou, China
- Shenzhen Institute of Wuhan University, Shenzhen, China
| |
Collapse
|
25
|
Doerrich S, Di Salvo F, Brockmann J, Ledig C. Rethinking model prototyping through the MedMNIST+ dataset collection. Sci Rep 2025; 15:7669. [PMID: 40044786 PMCID: PMC11883007 DOI: 10.1038/s41598-025-92156-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2024] [Accepted: 02/25/2025] [Indexed: 03/09/2025] Open
Abstract
The integration of deep learning-based systems in clinical practice is often impeded by challenges rooted in limited and heterogeneous medical datasets. In addition, the field has increasingly prioritized marginal performance gains on a few, narrowly scoped benchmarks over clinical applicability, slowing down meaningful algorithmic progress. This trend often results in excessive fine-tuning of existing methods on selected datasets rather than fostering clinically relevant innovations. In response, this work introduces a comprehensive benchmark for the MedMNIST+ dataset collection, designed to diversify the evaluation landscape across several imaging modalities, anatomical regions, classification tasks, and sample sizes. We systematically reassess commonly used Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) architectures across distinct medical datasets, training methodologies, and input resolutions to validate and refine existing assumptions about model effectiveness and development. Our findings suggest that computationally efficient training schemes and modern foundation models offer viable alternatives to costly end-to-end training. Additionally, we observe that higher image resolutions do not consistently improve performance beyond a certain threshold. This highlights the potential benefits of using lower resolutions, particularly in prototyping stages, to reduce computational demands without sacrificing accuracy. Notably, our analysis reaffirms the competitiveness of CNNs compared to ViTs, emphasizing the importance of comprehending the intrinsic capabilities of different architectures. Finally, by establishing a standardized evaluation framework, we aim to enhance transparency, reproducibility, and comparability within the MedMNIST+ dataset collection as well as future research. Code is available at (https://github.com/sdoerrich97/rethinking-model-prototyping-MedMNISTPlus).
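One of the computationally efficient schemes contrasted with end-to-end training is linear probing: freeze a pretrained backbone and fit only a linear head. A minimal sketch follows (recent torchvision weights API assumed; the 9-class head is only an example).

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Freeze a pretrained backbone; only the linear head is trained.
backbone = resnet18(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()                    # expose 512-d features
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()

head = nn.Linear(512, 9)                       # e.g. 9 classes (illustrative)
opt = torch.optim.AdamW(head.parameters(), lr=1e-3)

def train_step(x, y):
    with torch.no_grad():
        z = backbone(x)                        # frozen features
    loss = nn.functional.cross_entropy(head(z), y)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```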
Collapse
Affiliation(s)
| | | | - Julius Brockmann
- University of Bamberg, xAILab Bamberg, Bamberg, 96047, Germany
- Ludwig Maximilian University of Munich, Munich, 80539, Germany
| | - Christian Ledig
- University of Bamberg, xAILab Bamberg, Bamberg, 96047, Germany
| |
Collapse
|
26
|
Zheng S, Li J, Qiao L, Gao X. Multi-task interaction learning for accurate segmentation and classification of breast tumors in ultrasound images. Phys Med Biol 2025; 70:065006. [PMID: 39854844 DOI: 10.1088/1361-6560/adae4d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2024] [Accepted: 01/24/2025] [Indexed: 01/27/2025]
Abstract
Objective. In breast diagnostic imaging, the morphological variability of breast tumors and the inherent ambiguity of ultrasound images pose significant challenges. Moreover, multi-task computer-aided diagnosis systems in breast imaging may overlook inherent relationships between pixel-wise segmentation and categorical classification tasks. Approach. In this paper, we propose a multi-task learning network with deep inter-task interactions that exploits the inherent relations between the two tasks. First, we fuse self-task attention and cross-task attention mechanisms to explore the two types of interaction information, location and semantic, between tasks. In addition, a feature aggregation block is developed based on the channel attention mechanism, which reduces the semantic differences between the decoder and the encoder. To exploit inter-task relations further, our network uses a circle training strategy to refine heterogeneous features with the help of segmentation maps obtained from previous training. Main results. The experimental results show that our method achieved excellent performance on the BUSI and BUS-B datasets, with DSCs of 81.95% and 86.41% for segmentation tasks, and F1 scores of 82.13% and 69.01% for classification tasks, respectively. Significance. The proposed multi-task interaction learning not only enhances the performance of all tasks related to breast tumor segmentation and classification but also promotes research in multi-task learning, providing further insights for clinical applications.
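The cross-task attention ingredient can be sketched with a standard multi-head attention block in which classification tokens query segmentation tokens; this is a generic stand-in for the paper's self-/cross-task attention fusion, with illustrative names and shapes.

```python
import torch
import torch.nn as nn

class CrossTaskAttention(nn.Module):
    """Toy cross-task block: classification tokens attend to segmentation
    feature tokens so one branch can borrow the other's context."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cls_tokens, seg_tokens):   # (B, N1, C), (B, N2, C)
        out, _ = self.attn(cls_tokens, seg_tokens, seg_tokens)
        return self.norm(cls_tokens + out)       # residual update

fused = CrossTaskAttention(64)(torch.randn(2, 16, 64), torch.randn(2, 196, 64))
```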
Collapse
Affiliation(s)
- Shenhai Zheng
- College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, People's Republic of China
- Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, People's Republic of China
| | - Jianfei Li
- College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, People's Republic of China
| | - Lihong Qiao
- College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, People's Republic of China
- Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, People's Republic of China
| | - Xi Gao
- College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, People's Republic of China
| |
Collapse
|
27
|
Güneş YC, Cesur T, Çamur E, Karabekmez LG. Evaluating text and visual diagnostic capabilities of large language models on questions related to the Breast Imaging Reporting and Data System Atlas 5th edition. Diagn Interv Radiol 2025; 31:111-129. [PMID: 39248152 PMCID: PMC11880873 DOI: 10.4274/dir.2024.242876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2024] [Accepted: 08/24/2024] [Indexed: 09/10/2024]
Abstract
PURPOSE This study aimed to evaluate the performance of large language models (LLMs) and multimodal LLMs in interpreting the Breast Imaging Reporting and Data System (BI-RADS) categories and providing clinical management recommendations for breast radiology in text-based and visual questions. METHODS This cross-sectional observational study involved two steps. In the first step, we compared ten LLMs (namely ChatGPT 4o, ChatGPT 4, ChatGPT 3.5, Google Gemini 1.5 Pro, Google Gemini 1.0, Microsoft Copilot, Perplexity, Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Opus 200K), general radiologists, and a breast radiologist using 100 text-based multiple-choice questions (MCQs) related to the BI-RADS Atlas 5th edition. In the second step, we assessed the performance of five multimodal LLMs (ChatGPT 4o, ChatGPT 4V, Claude 3.5 Sonnet, Claude 3 Opus, and Google Gemini 1.5 Pro) in assigning BI-RADS categories and providing clinical management recommendations on 100 breast ultrasound images. The comparison of correct answers and accuracy by question type was analyzed using McNemar's and chi-squared tests. Management scores were analyzed using the Kruskal-Wallis and Wilcoxon tests. RESULTS Claude 3.5 Sonnet achieved the highest accuracy in text-based MCQs (90%), followed by ChatGPT 4o (89%), outperforming all other LLMs and general radiologists (78% and 76%) (P < 0.05), except for the Claude 3 Opus models and the breast radiologist (82%) (P > 0.05). Lower-performing LLMs included Google Gemini 1.0 (61%) and ChatGPT 3.5 (60%). Performance across different question categories showed no significant variation among LLMs or radiologists (P > 0.05). For breast ultrasound images, Claude 3.5 Sonnet achieved 59% accuracy, significantly higher than the other multimodal LLMs (P < 0.05). Management recommendations were evaluated using a 3-point Likert scale, with Claude 3.5 Sonnet scoring the highest (mean: 2.12 ± 0.97) (P < 0.05). Accuracy varied significantly across BI-RADS categories, except for Claude 3 Opus (P < 0.05). Gemini 1.5 Pro failed to answer any BI-RADS 5 questions correctly. Similarly, ChatGPT 4V failed to answer any BI-RADS 1 questions correctly, making them the least accurate in these categories (P < 0.05). CONCLUSION Although LLMs such as Claude 3.5 Sonnet and ChatGPT 4o show promise in text-based BI-RADS assessments, their limitations in visual diagnostics suggest they should be used cautiously and under radiologists' supervision to avoid misdiagnoses. CLINICAL SIGNIFICANCE This study demonstrates that while LLMs exhibit strong capabilities in text-based BI-RADS assessments, their visual diagnostic abilities are currently limited, necessitating further development and cautious application in clinical practice.
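For reference, the paired comparison of two answer sets on the same questions reduces to a 2x2 agreement table and McNemar's test; the sketch below uses statsmodels on simulated right/wrong vectors (toy data, not the study's).

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(0)
a = rng.random(100) < 0.90          # model A correct on each of 100 MCQs?
b = rng.random(100) < 0.78          # model B correct on the same MCQs?

# 2x2 agreement table: rows = A correct/incorrect, cols = B correct/incorrect
table = [[np.sum(a & b),  np.sum(a & ~b)],
         [np.sum(~a & b), np.sum(~a & ~b)]]
print(mcnemar(table, exact=True))   # statistic and p-value
```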
Collapse
Affiliation(s)
- Yasin Celal Güneş
- Kırıkkale Yüksek İhtisas Hospital Clinic of Radiology, Kırıkkale, Türkiye
| | - Turay Cesur
- Mamak State Hospital Clinic of Radiology, Ankara, Türkiye
| | - Eren Çamur
- Ankara 29 Mayıs State Hospital Clinic of Radiology, Ankara, Türkiye
| | - Leman Günbey Karabekmez
- Ankara Yıldırım Beyazıt University Faculty of Medicine Department of Radiology, Ankara, Türkiye
| |
Collapse
|
28
|
Lu X, Lu Y, Zhao W, Qi Y, Zhang H, Sun W, Zhang H, Ma P, Guan L, Ma Y. Ultrasound-based deep learning radiomics for multi-stage assisted diagnosis in reducing unnecessary biopsies of BI-RADS 4A lesions. Quant Imaging Med Surg 2025; 15:2512-2528. [PMID: 40160614 PMCID: PMC11948369 DOI: 10.21037/qims-24-580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2024] [Accepted: 12/03/2024] [Indexed: 04/02/2025]
Abstract
Background Even with the Breast Imaging Reporting and Data System (BI-RADS) guiding risk stratification on ultrasound (US) images, inconsistencies in diagnostic accuracy still exist, leading to patients being subjected to unnecessary biopsies in clinical practice. This study investigated the construction of deep learning radiomics (DLR) models to improve diagnostic consistency and reduce unnecessary biopsies for BI-RADS 4A lesions. Methods A total of 746 patients with breast lesions were enrolled in this retrospective study. Two DLR models based on US images and clinical variables were developed to conduct breast lesion risk re-stratification as BI-RADS 3 or lower versus BI-RADS 4A or higher (DLR_LH), while simultaneously identifying BI-RADS 4A lesions with low malignancy probabilities to avoid unnecessary biopsy (DLR_BM). A three-round reader study with a two-stage artificial intelligence (AI)-assisted diagnosis process was performed to verify the assistive capability and practical benefits of the models in clinical applications. Results The DLR_LH model achieved areas under the receiver operating characteristic curve (AUCs) of 0.963 and 0.889, with sensitivities of 92.0% and 83.3%, in the internal and external validation cohorts, respectively. The DLR_BM model exhibited AUCs of 0.977 and 0.942, with sensitivities of 94.1% and 86.4%, respectively. Both models were evaluated using integrated features of US images and clinical variables. Ultimately, 27.7% of BI-RADS 4A lesions avoided unnecessary biopsies. In the three-round reader study, all readers achieved significantly higher diagnostic accuracy and specificity with model assistance, while maintaining outstanding sensitivity comparable to human experts (P<0.05). These findings demonstrate the positive impact of the DLR models in assisting radiologists to enhance their diagnostic capabilities. Conclusions The models performed well in breast US imaging interpretation and BI-RADS risk re-stratification, and demonstrated potential in reducing unnecessary biopsies of BI-RADS 4A lesions, indicating the promising applicability of the DLR models in clinical diagnosis.
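The biopsy-avoidance computation amounts to choosing a score threshold that preserves a sensitivity floor and counting the lesions falling below it; the sklearn sketch below is a hedged illustration with invented names, not the study's pipeline.

```python
import numpy as np
from sklearn.metrics import roc_curve

def biopsy_avoidance(y_true, y_score, min_sensitivity=0.95):
    """Pick the highest threshold that keeps sensitivity at or above a floor
    and report which fraction of lesions would fall below it (i.e., could
    skip biopsy). Assumes at least one threshold meets the floor."""
    fpr, tpr, thr = roc_curve(y_true, y_score)   # thresholds are decreasing
    ok = tpr >= min_sensitivity
    t = thr[ok][0]                               # largest qualifying threshold
    avoided = float(np.mean(y_score < t))        # predicted-benign fraction
    return t, avoided
```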
Collapse
Affiliation(s)
- Xiangyu Lu
- School of Information Science and Engineering, Lanzhou University, Lanzhou, China
| | - Yun Lu
- Department of Ultrasound, Gansu Provincial Cancer Hospital, Lanzhou, China
| | - Wuyuan Zhao
- School of Information Science and Engineering, Lanzhou University, Lanzhou, China
| | | | - Hongjuan Zhang
- School of Information Science and Engineering, Lanzhou University, Lanzhou, China
| | - Wenhao Sun
- School of Information Science and Engineering, Lanzhou University, Lanzhou, China
| | - Huaikun Zhang
- School of Information Science and Engineering, Lanzhou University, Lanzhou, China
| | - Pei Ma
- School of Information Science and Engineering, Lanzhou University, Lanzhou, China
| | - Ling Guan
- Department of Ultrasound, Gansu Provincial Cancer Hospital, Lanzhou, China
| | - Yide Ma
- School of Information Science and Engineering, Lanzhou University, Lanzhou, China
| |
Collapse
|
29
|
Prinzi F, Militello C, Zarcaro C, Bartolotta TV, Gaglio S, Vitabile S. Rad4XCNN: A new agnostic method for post-hoc global explanation of CNN-derived features by means of Radiomics. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2025; 260:108576. [PMID: 39798282 DOI: 10.1016/j.cmpb.2024.108576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/29/2024] [Revised: 12/11/2024] [Accepted: 12/25/2024] [Indexed: 01/15/2025]
Abstract
BACKGROUND AND OBJECTIVE In recent years, machine learning-based clinical decision support systems (CDSS) have played a key role in the analysis of several medical conditions. Despite their promising capabilities, the lack of transparency in AI models poses significant challenges, particularly in medical contexts where reliability is a mandatory aspect. However, explainability often appears to be inversely proportional to accuracy. For this reason, achieving transparency without compromising predictive accuracy remains a key challenge. METHODS This paper presents a novel method, namely Rad4XCNN, to enhance the predictive power of CNN-derived features with the inherent interpretability of radiomic features. Rad4XCNN diverges from conventional saliency-map-based methods by associating intelligible meaning with CNN-derived features by means of Radiomics, offering new perspectives on explanation methods beyond visualization maps. RESULTS Using a breast cancer classification task as a case study, we evaluated Rad4XCNN on ultrasound imaging datasets, including an online dataset and two in-house datasets for internal and external validation. Some key results are: (i) CNN-derived features guarantee more robust accuracy when compared against ViT-derived and radiomic features; (ii) conventional visualization map methods for explanation present several pitfalls; (iii) Rad4XCNN does not sacrifice model accuracy for explainability; (iv) Rad4XCNN provides a global explanation enabling the physician to extract global insights and findings. CONCLUSIONS Our method can mitigate some concerns related to the explainability-accuracy trade-off. This study highlighted the importance of proposing new methods for model explanation without affecting their accuracy.
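A crude way to attach radiomic meaning to CNN-derived features, far simpler than Rad4XCNN itself, is a patient-wise Pearson correlation matrix between the two feature sets; high-|r| columns nominate an intelligible radiomic proxy for a CNN feature.

```python
import numpy as np

def feature_correlation(cnn_feats, radiomics, eps=1e-8):
    """Correlate each CNN-derived feature with each radiomic feature across
    patients. cnn_feats: (n_patients, n_cnn); radiomics: (n_patients, n_rad).
    Returns an (n_cnn, n_rad) matrix of Pearson correlations."""
    X = (cnn_feats - cnn_feats.mean(0)) / (cnn_feats.std(0) + eps)
    Y = (radiomics - radiomics.mean(0)) / (radiomics.std(0) + eps)
    return X.T @ Y / len(X)

r = feature_correlation(np.random.randn(50, 128), np.random.randn(50, 30))
print(np.abs(r).max(axis=1)[:5])   # best radiomic match per CNN feature
```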
Collapse
Affiliation(s)
- Francesco Prinzi
- Department of Biomedicine, Neuroscience and Advanced Diagnostics (BiND), University of Palermo, Palermo, 90127, Italy.
| | - Carmelo Militello
- Institute for High-Performance Computing and Networking (ICAR-CNR), Italian National Research Council, Palermo, 90146, Italy.
| | - Calogero Zarcaro
- Department of Biomedicine, Neuroscience and Advanced Diagnostics (BiND), University of Palermo, Palermo, 90127, Italy.
| | - Tommaso Vincenzo Bartolotta
- Department of Biomedicine, Neuroscience and Advanced Diagnostics (BiND), University of Palermo, Palermo, 90127, Italy.
| | - Salvatore Gaglio
- Institute for High-Performance Computing and Networking (ICAR-CNR), Italian National Research Council, Palermo, 90146, Italy; Department of Engineering, University of Palermo, Palermo, 90128, Italy.
| | - Salvatore Vitabile
- Department of Biomedicine, Neuroscience and Advanced Diagnostics (BiND), University of Palermo, Palermo, 90127, Italy.
| |
Collapse
|
30
|
Tian F, Zhai J, Gong J, Lei W, Chang S, Ju F, Qian S, Zou X. SAM-MedUS: a foundational model for universal ultrasound image segmentation. J Med Imaging (Bellingham) 2025; 12:027001. [PMID: 40028655 PMCID: PMC11865838 DOI: 10.1117/1.jmi.12.2.027001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2024] [Revised: 01/19/2025] [Accepted: 02/04/2025] [Indexed: 03/05/2025] Open
Abstract
Purpose Segmentation of ultrasound images for medical diagnosis, monitoring, and research is crucial, and although existing methods perform well, they are limited to specific organs, tumors, and imaging devices. Applications of the Segment Anything Model (SAM), such as SAM-med2d, use large medical datasets that contain only a small fraction of ultrasound medical images. Approach In this work, we propose SAM-MedUS, a model for generic ultrasound image segmentation that utilizes the latest publicly available ultrasound image datasets to create a diverse dataset containing eight site categories for training and testing. We integrate ConvNeXt V2 and CM blocks in the encoder for better global context extraction. In addition, a boundary loss function is used to improve the segmentation of fuzzy boundaries and low-contrast ultrasound images. Results Experimental results show that SAM-MedUS outperforms recent methods on multiple ultrasound datasets. For easier datasets such as the adult kidney, it achieves 87.93% IoU and 93.58% Dice, whereas for more complex ones such as the infant vein, IoU and Dice reach 62.31% and 78.93%, respectively. Conclusions We collected and collated an ultrasound dataset spanning multiple site types to achieve uniform segmentation of ultrasound images. In addition, the ConvNeXt V2 and CM auxiliary branches enhance the model's ability to extract global information, and the boundary loss allows the model to exhibit robust performance and excellent generalization ability.
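Boundary losses of this kind are commonly formulated as the integral of predicted foreground probability against the signed distance map of the target (negative inside the object); the sketch below implements that common formulation, which may differ in detail from SAM-MedUS.

```python
import numpy as np
import torch
from scipy.ndimage import distance_transform_edt as edt

def signed_distance(mask):
    """Signed distance map of a boolean ground-truth mask:
    positive outside the object, negative inside."""
    return torch.from_numpy(edt(~mask) - edt(mask)).float()

def boundary_loss(fg_probs, sdm):
    """Boundary loss: foreground probabilities integrated against the signed
    distance map; probability mass far outside the target is penalized,
    mass inside is rewarded."""
    return (fg_probs * sdm).mean()

mask = np.zeros((64, 64), dtype=bool); mask[20:40, 20:40] = True
loss = boundary_loss(torch.rand(64, 64), signed_distance(mask))
```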
Collapse
Affiliation(s)
- Feng Tian
- Hunan Normal University, The School of Physics and Electronics, Changsha, China
| | - Jintao Zhai
- Hunan University, College of Computer Science and Electronic Engineering, Changsha, China
| | - Jinru Gong
- Hunan Normal University, The School of Physics and Electronics, Changsha, China
| | - Weirui Lei
- Hunan Normal University, The School of Physics and Electronics, Changsha, China
| | - Shuai Chang
- Hunan Normal University, The School of Physics and Electronics, Changsha, China
| | - Fangfang Ju
- Hunan Normal University, The School of Physics and Electronics, Changsha, China
| | - Shengyou Qian
- Hunan Normal University, The School of Physics and Electronics, Changsha, China
| | - Xiao Zou
- Hunan Normal University, The School of Physics and Electronics, Changsha, China
| |
Collapse
|
31
|
Aumente-Maestro C, Díez J, Remeseiro B. A multi-task framework for breast cancer segmentation and classification in ultrasound imaging. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2025; 260:108540. [PMID: 39647406 DOI: 10.1016/j.cmpb.2024.108540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/24/2024] [Revised: 11/08/2024] [Accepted: 11/28/2024] [Indexed: 12/10/2024]
Abstract
BACKGROUND Ultrasound (US) is a medical imaging modality that plays a crucial role in the early detection of breast cancer. The emergence of numerous deep learning systems has offered promising avenues for the segmentation and classification of breast cancer tumors in US images. However, challenges such as the absence of data standardization, the exclusion of non-tumor images during training, and the narrow view of single-task methodologies have hindered the practical applicability of these systems, often resulting in biased outcomes. This study aims to explore the potential of multi-task systems in enhancing the detection of breast cancer lesions. METHODS To address these limitations, our research introduces an end-to-end multi-task framework designed to leverage the inherent correlations between breast cancer lesion classification and segmentation tasks. Additionally, a comprehensive analysis of a widely utilized public breast cancer ultrasound dataset named BUSI was carried out, identifying its irregularities and devising an algorithm tailored for detecting duplicated images in it. RESULTS Experiments are conducted utilizing the curated dataset to minimize potential biases in outcomes. Our multi-task framework exhibits superior performance in breast cancer analysis relative to single-task approaches, achieving improvements close to 15% in segmentation and classification. Moreover, a comparative analysis against the state-of-the-art reveals statistically significant enhancements across both tasks. CONCLUSION The experimental findings underscore the efficacy of multi-task techniques, showcasing better generalization capabilities when considering all image types: benign, malignant, and non-tumor images. Consequently, our methodology represents an advance towards more general architectures with real clinical applications in the breast cancer field.
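Duplicate-image screening of the kind applied to BUSI can be approximated with perceptual hashing; the sketch below flags near-duplicate pairs by average-hash Hamming distance (an illustrative technique; the paper's algorithm may differ).

```python
import numpy as np
from PIL import Image

def ahash(path, size=8):
    """Average hash: downscale to 8x8 grayscale, threshold at the mean."""
    g = np.asarray(Image.open(path).convert("L").resize((size, size)))
    return (g > g.mean()).flatten()

def near_duplicates(paths, max_bits=3):
    """Return pairs of files whose hashes differ in at most `max_bits` bits."""
    hashes = [ahash(p) for p in paths]
    return [(paths[i], paths[j])
            for i in range(len(paths)) for j in range(i + 1, len(paths))
            if np.sum(hashes[i] != hashes[j]) <= max_bits]
```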
Collapse
Affiliation(s)
| | - Jorge Díez
- Artificial Intelligence Center, Universidad de Oviedo, Gijón, 33204, Spain
| | - Beatriz Remeseiro
- Artificial Intelligence Center, Universidad de Oviedo, Gijón, 33204, Spain.
| |
Collapse
|
32
|
Nguyen-Tat TB, Vo HA, Dang PS. QMaxViT-Unet+: A query-based MaxViT-Unet with edge enhancement for scribble-supervised segmentation of medical images. Comput Biol Med 2025; 187:109762. [PMID: 39919665 DOI: 10.1016/j.compbiomed.2025.109762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2024] [Revised: 01/17/2025] [Accepted: 01/27/2025] [Indexed: 02/09/2025]
Abstract
The deployment of advanced deep learning models for medical image segmentation is often constrained by the requirement for extensively annotated datasets. Weakly-supervised learning, which allows less precise labels, has become a promising solution to this challenge. Building on this approach, we propose QMaxViT-Unet+, a novel framework for scribble-supervised medical image segmentation. This framework is built on the U-Net architecture, with the encoder and decoder replaced by Multi-Axis Vision Transformer (MaxViT) blocks. These blocks enhance the model's ability to learn local and global features efficiently. Additionally, our approach integrates a query-based Transformer decoder to refine features and an edge enhancement module to compensate for the limited boundary information in the scribble label. We evaluate the proposed QMaxViT-Unet+ on four public datasets focused on cardiac structures, colorectal polyps, and breast cancer: ACDC, MS-CMRSeg, SUN-SEG, and BUSI. Evaluation metrics include the Dice similarity coefficient (DSC) and the 95th percentile of Hausdorff distance (HD95). Experimental results show that QMaxViT-Unet+ achieves 89.1% DSC and 1.316 mm HD95 on ACDC, 88.4% DSC and 2.226 mm HD95 on MS-CMRSeg, 71.4% DSC and 4.996 mm HD95 on SUN-SEG, and 69.4% DSC and 50.122 mm HD95 on BUSI. These results demonstrate that our method outperforms existing approaches in terms of accuracy, robustness, and efficiency while remaining competitive with fully-supervised learning approaches. This makes it ideal for medical image analysis, where high-quality annotations are often scarce and require significant effort and expense. The code is available at https://github.com/anpc849/QMaxViT-Unet.
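Scribble supervision typically trains with a partial cross-entropy that scores only the annotated pixels; in PyTorch this falls out of ignore_index, as in the generic sketch below (255 as the unlabeled value is an assumption).

```python
import torch
import torch.nn.functional as F

def partial_ce(logits, scribble, ignore_index=255):
    """Cross-entropy over scribble-annotated pixels only; pixels outside the
    scribbles carry `ignore_index` and contribute nothing to the loss."""
    return F.cross_entropy(logits, scribble, ignore_index=ignore_index)

# logits: (B, C, H, W); scribble: (B, H, W) int64 with 255 = unlabeled
scribble = torch.full((2, 64, 64), 255, dtype=torch.long)
scribble[:, 30:34, 30:34] = 1                  # a small foreground scribble
loss = partial_ce(torch.randn(2, 2, 64, 64), scribble)
```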
Collapse
Affiliation(s)
- Thien B Nguyen-Tat
- University of Information Technology, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam.
| | - Hoang-An Vo
- University of Information Technology, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
| | - Phuoc-Sang Dang
- University of Information Technology, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
| |
Collapse
|
33
|
Wang W, Zhou J, Zhao J, Lin X, Zhang Y, Lu S, Zhao W, Wang S, Tang W, Qu X. Interactively Fusing Global and Local Features for Benign and Malignant Classification of Breast Ultrasound Images. ULTRASOUND IN MEDICINE & BIOLOGY 2025; 51:525-534. [PMID: 39709289 DOI: 10.1016/j.ultrasmedbio.2024.11.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/23/2024] [Revised: 10/17/2024] [Accepted: 11/14/2024] [Indexed: 12/23/2024]
Abstract
OBJECTIVE Breast ultrasound (BUS) is used to classify benign and malignant breast tumors, and its automatic classification can reduce subjectivity. However, current convolutional neural networks (CNNs) face challenges in capturing global features, while vision transformer (ViT) networks have limitations in effectively extracting local features. Therefore, this study aimed to develop a deep learning method that enables the interaction and updating of intermediate features between CNN and ViT to achieve high-accuracy BUS image classification. METHODS This study introduced the CNN and transformer multi-stage fusion network (CTMF-Net) consisting of two branches: a CNN branch and a transformer branch. The CNN branch employs visual geometry group as its backbone, while the transformer branch utilizes ViT as its base network. Both branches were divided into four stages. At the end of each stage, a proposed feature interaction module facilitated feature interaction and fusion between the two branches. Additionally, the convolutional block attention module was employed to enhance relevant features after each stage of the CNN branch. Extensive experiments were conducted using various state-of-the-art deep-learning classification methods on three public breast ultrasound datasets (SYSU, UDIAT and BUSI). RESULTS For the internal validation on SYSU and UDIAT, our proposed method CTMF-Net achieved the highest accuracy of 90.14 ± 0.58% on SYSU and 92.04 ± 4.90% on UDIAT, which showed superior classification performance over other state-of-the-art networks (p < 0.05). Additionally, for external validation on BUSI, CTMF-Net showed outstanding performance, achieving the highest area under the curve score of 0.8704 when trained on SYSU, marking a 0.0126 improvement over the second-best visual geometry group attention ViT method. Similarly, when applied to UDIAT, CTMF-Net achieved an area under the curve score of 0.8505, surpassing the second-best global context ViT method by 0.0130. CONCLUSION Our proposed method, CTMF-Net, outperforms all existing methods and can effectively assist doctors in achieving more accurate classification performance of breast tumors.
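The stage-end interaction between a CNN stream and a transformer stream can be sketched as a pair of 1x1 projections through which each branch updates the other; this is an illustrative reduction of the paper's feature interaction module.

```python
import torch
import torch.nn as nn

class FeatureInteraction(nn.Module):
    """Toy stage-end interaction: project each branch's features and add
    them to the other branch so CNN and ViT streams update each other."""
    def __init__(self, c_cnn, c_vit):
        super().__init__()
        self.to_vit = nn.Conv2d(c_cnn, c_vit, kernel_size=1)
        self.to_cnn = nn.Conv2d(c_vit, c_cnn, kernel_size=1)

    def forward(self, f_cnn, f_vit):      # both (B, C, H, W), matched H, W
        return (f_cnn + self.to_cnn(f_vit),
                f_vit + self.to_vit(f_cnn))
```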
Collapse
Affiliation(s)
- Wenhan Wang
- School of Instrumentation and Optoelectronics Engineering, Beihang University, Beijing, China
| | - Jiale Zhou
- School of Instrumentation and Optoelectronics Engineering, Beihang University, Beijing, China
| | - Jin Zhao
- Breast and Thyroid Surgery, China-Japan Friendship Hospital, Beijing, China
| | - Xun Lin
- School of Computer Science and Engineering, Beihang University, Beijing, China
| | - Yan Zhang
- Department of Gynecology and Obstetrics, Peking University Third Hospital, Beijing, China
| | - Shan Lu
- Department of Gynecology and Obstetrics, Peking University Third Hospital, Beijing, China
| | - Wanchen Zhao
- School of Instrumentation and Optoelectronics Engineering, Beihang University, Beijing, China
| | - Shuai Wang
- School of Computer Science and Engineering, Beihang University, Beijing, China
| | - Wenzhong Tang
- School of Computer Science and Engineering, Beihang University, Beijing, China
| | - Xiaolei Qu
- School of Instrumentation and Optoelectronics Engineering, Beihang University, Beijing, China.
| |
Collapse
|
34
|
Zhao G, Zhu X, Wang X, Yan F, Guo M. Syn-Net: A Synchronous Frequency-Perception Fusion Network for Breast Tumor Segmentation in Ultrasound Images. IEEE J Biomed Health Inform 2025; 29:2113-2124. [PMID: 40030423 DOI: 10.1109/jbhi.2024.3514134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/05/2025]
Abstract
Accurate breast tumor segmentation in ultrasound images is a crucial step in medical diagnosis and locating the tumor region. However, segmentation faces numerous challenges due to the complexity of ultrasound images, similar intensity distributions, variable tumor morphology, and speckle noise. To address these challenges and achieve precise segmentation of breast tumors in complex ultrasound images, we propose a Synchronous Frequency-perception Fusion Network (Syn-Net). Initially, we design a synchronous dual-branch encoder to extract local and global feature information simultaneously from complex ultrasound images. Secondly, we introduce a novel Frequency-perception Cross-Feature Fusion (FrCFusion) Block, which utilizes the Discrete Cosine Transform (DCT) to learn all-frequency features and effectively fuse local and global features while mitigating issues arising from similar intensity distributions. In addition, we develop a Full-Scale Deep Supervision method that not only corrects the influence of speckle noise on segmentation but also effectively guides decoder features towards the ground truth. We conduct extensive experiments on three publicly available ultrasound breast tumor datasets. Comparison with 14 state-of-the-art deep learning segmentation methods demonstrates that our approach exhibits superior robustness to different ultrasound images, variations in tumor size and shape, speckle noise, and similarity in intensity distribution between surrounding tissues and tumors. On the BUSI and Dataset B datasets, our method achieves better Dice scores compared to state-of-the-art methods, indicating superior performance in ultrasound breast tumor segmentation.
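A toy version of DCT-based frequency perception: transform a feature map with the DCT, reweight a low-frequency wedge and the remaining high frequencies separately, and invert. In Syn-Net the weighting is learned end to end; the fixed weights here are purely illustrative.

```python
import numpy as np
from scipy.fft import dctn, idctn

def frequency_reweight(feat, w_low=1.0, w_high=0.5):
    """DCT a 2-D feature map, scale low- and high-frequency bands with
    separate weights, and transform back."""
    F = dctn(feat, norm="ortho")
    h, w = F.shape
    yy, xx = np.mgrid[0:h, 0:w]
    low = (yy + xx) < (h + w) / 4          # crude low-frequency wedge
    F = np.where(low, w_low * F, w_high * F)
    return idctn(F, norm="ortho")

out = frequency_reweight(np.random.randn(32, 32))
```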
Collapse
|
35
|
Cho Y, Misra S, Managuli R, Barr RG, Lee J, Kim C. Attention-based Fusion Network for Breast Cancer Segmentation and Classification Using Multi-modal Ultrasound Images. ULTRASOUND IN MEDICINE & BIOLOGY 2025; 51:568-577. [PMID: 39694743 DOI: 10.1016/j.ultrasmedbio.2024.11.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/28/2024] [Revised: 11/19/2024] [Accepted: 11/21/2024] [Indexed: 12/20/2024]
Abstract
OBJECTIVE Breast cancer is one of the most commonly occurring cancers in women. Thus, early detection and treatment of cancer lead to a better outcome for the patient. Ultrasound (US) imaging plays a crucial role in the early detection of breast cancer, providing a cost-effective, convenient, and safe diagnostic approach. To date, much research has been conducted to facilitate reliable and effective early diagnosis of breast cancer through US image analysis. Recently, with the introduction of machine learning technologies such as deep learning (DL), automated lesion segmentation and classification and the identification of malignant masses in breast US have progressed, and computer-aided diagnosis (CAD) technology is being applied effectively in clinics. Herein, we propose a novel deep learning-based "segmentation + classification" model operating on B- and SE-mode images. METHODS For the segmentation task, we propose a Multi-Modal Fusion U-Net (MMF-U-Net), which segments lesions by mixing B- and SE-mode information through fusion blocks. After segmenting, the lesion area from the B- and SE-mode images is cropped using a predicted segmentation mask. The encoder part of the pre-trained MMF-U-Net model is then used on the cropped B- and SE-mode breast US images to classify benign and malignant lesions. RESULTS The experimental results using the proposed method showed good segmentation and classification scores. The Dice score, intersection over union (IoU), precision, and recall are 78.23%, 68.60%, 82.21%, and 80.58%, respectively, using the proposed MMF-U-Net on real-world clinical data. The classification accuracy is 98.46%. CONCLUSION Our results show that the proposed method will effectively segment the breast lesion area and can reliably classify benign from malignant lesions.
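The crop-by-predicted-mask step between segmentation and classification is straightforward; the sketch below cuts the same padded bounding box out of co-registered B- and SE-mode images (assumes a non-empty mask; names are illustrative).

```python
import numpy as np

def crop_lesion(b_mode, se_mode, mask, margin=8):
    """Crop the same mask-derived bounding box (plus a margin) out of
    co-registered B-mode and SE-mode images before classification."""
    ys, xs = np.nonzero(mask)                  # mask must contain the lesion
    y0 = max(ys.min() - margin, 0)
    y1 = min(ys.max() + margin, mask.shape[0])
    x0 = max(xs.min() - margin, 0)
    x1 = min(xs.max() + margin, mask.shape[1])
    return b_mode[y0:y1, x0:x1], se_mode[y0:y1, x0:x1]
```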
Collapse
Affiliation(s)
- Yoonjae Cho
- Department of Electrical Engineering, Convergence IT Engineering, Mechanical Engineering, Medical Device Innovation Center, and Graduate School of Artificial Intelligence, and Medical Device Innovation Center, Pohang University of Science and Technology, Pohang, Republic of Korea
| | - Sampa Misra
- Department of Electrical Engineering, Convergence IT Engineering, Mechanical Engineering, Medical Device Innovation Center, and Graduate School of Artificial Intelligence, and Medical Device Innovation Center, Pohang University of Science and Technology, Pohang, Republic of Korea
| | - Ravi Managuli
- Department of Bioengineering, University of Washington, Seattle, USA
| | | | - Jeongmin Lee
- Department of Radiology and Center for Imaging Science, Samsung Medical Center, Sungkyunkwan University School of Medicine, Gangnam-gu, Seoul, Republic of Korea
| | - Chulhong Kim
- Department of Electrical Engineering, Convergence IT Engineering, Mechanical Engineering, Medical Device Innovation Center, and Graduate School of Artificial Intelligence, and Medical Device Innovation Center, Pohang University of Science and Technology, Pohang, Republic of Korea; Opticho Inc., Pohang, Republic of Korea.
| |
Collapse
|
36
|
Guo H, Ding Y, Dang H, Liu T, Song X, Zhang G, Yao S, Hou D, Lyu Z. [A joint distillation model for the tumor segmentation using breast ultrasound images]. SHENG WU YI XUE GONG CHENG XUE ZA ZHI = JOURNAL OF BIOMEDICAL ENGINEERING = SHENGWU YIXUE GONGCHENGXUE ZAZHI 2025; 42:148-155. [PMID: 40000187 PMCID: PMC11955334 DOI: 10.7507/1001-5515.202311054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Subscribe] [Scholar Register] [Received: 11/24/2023] [Revised: 12/22/2024] [Indexed: 02/27/2025]
Abstract
The accurate segmentation of breast ultrasound images is an important precondition for lesion determination. Existing segmentation approaches suffer from massive parameter counts, sluggish inference speed, and huge memory consumption. To tackle this problem, we propose T²KD Attention U-Net (dual-Teacher Knowledge Distillation Attention U-Net), a lightweight semantic segmentation method combining double-path joint distillation for breast ultrasound images. Primarily, we designed two teacher models to learn the fine-grained features from each class of images according to the different feature representations and semantic information of benign and malignant breast lesions. Then we leveraged joint distillation to train a lightweight student model. Finally, we constructed a novel weight balance loss to focus on the semantic features of small objects, solving the imbalance problem between tumor and background. Specifically, extensive experiments conducted on Dataset BUSI and Dataset B demonstrated that T²KD Attention U-Net outperformed various knowledge distillation counterparts. Concretely, the accuracy, recall, precision, Dice, and mIoU of the proposed method were 95.26%, 86.23%, 85.09%, 83.59% and 77.78% on Dataset BUSI, respectively, and 97.95%, 92.80%, 88.33%, 88.40% and 82.42% on Dataset B, respectively. Compared with other models, the performance of this model was significantly improved. Meanwhile, compared with the teacher model, the number of parameters, model size, and complexity of the student model were significantly reduced (2.2×10⁶ vs. 106.1×10⁶, 8.4 MB vs. 414 MB, and 16.59 GFLOPs vs. 205.98 GFLOPs, respectively). Indeed, the proposed model maintains performance while greatly decreasing the amount of computation, which provides a new method for deployment in clinical medical scenarios.
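The dual-teacher idea can be sketched as routing each sample to the teacher specialized for its class and distilling with a temperature-scaled KL term; shapes and names are illustrative, and the paper's joint distillation and weight balance loss add further terms.

```python
import torch
import torch.nn.functional as F

def dual_teacher_kd(student_logits, t_benign, t_malig, is_malignant, T=4.0):
    """Distill a student from two class-specialized teachers: each image in
    the batch uses the logits of the teacher matching its lesion class.
    All logit tensors are (B, C, H, W); is_malignant is a (B,) bool tensor."""
    teacher = torch.where(is_malignant[:, None, None, None], t_malig, t_benign)
    p_t = F.softmax(teacher / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    # temperature-scaled KL, rescaled by T^2 as is standard in distillation
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T
```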
Collapse
Affiliation(s)
- Hongjiang Guo
- School of Information Technology, Henan University of Chinese Medicine, Zhengzhou 450046, P. R. China
| | - Youyou Ding
- School of Information Technology, Henan University of Chinese Medicine, Zhengzhou 450046, P. R. China
| | - Hao Dang
- School of Information Technology, Henan University of Chinese Medicine, Zhengzhou 450046, P. R. China
- Zhengzhou Key Laboratory of Intelligent Analysis and Utilization of Traditional Chinese Medicine Information, Zhengzhou 450046, P. R. China
| | - Tongtong Liu
- School of Information Technology, Henan University of Chinese Medicine, Zhengzhou 450046, P. R. China
| | - Xuekun Song
- School of Information Technology, Henan University of Chinese Medicine, Zhengzhou 450046, P. R. China
- Zhengzhou Key Laboratory of Intelligent Analysis and Utilization of Traditional Chinese Medicine Information, Zhengzhou 450046, P. R. China
| | - Ge Zhang
- School of Information Technology, Henan University of Chinese Medicine, Zhengzhou 450046, P. R. China
- Zhengzhou Key Laboratory of Intelligent Analysis and Utilization of Traditional Chinese Medicine Information, Zhengzhou 450046, P. R. China
| | - Shuo Yao
- School of Information Technology, Henan University of Chinese Medicine, Zhengzhou 450046, P. R. China
| | - Daisen Hou
- School of Information Technology, Henan University of Chinese Medicine, Zhengzhou 450046, P. R. China
| | - Zongwang Lyu
- School of Information Technology, Henan University of Chinese Medicine, Zhengzhou 450046, P. R. China
| |
Collapse
|
37
|
Yang X, Wang Y, Sui L. NMTNet: A Multi-task Deep Learning Network for Joint Segmentation and Classification of Breast Tumors. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2025:10.1007/s10278-025-01440-7. [PMID: 39971818 DOI: 10.1007/s10278-025-01440-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/30/2024] [Revised: 01/21/2025] [Accepted: 02/03/2025] [Indexed: 02/21/2025]
Abstract
Segmentation and classification of breast tumors are two critical tasks that provide essential information for computer-aided breast cancer diagnosis. Combining them leverages their intrinsic relevance to enhance performance, but the variability and complexity of tumor characteristics remain challenging. We propose a novel multi-task deep learning network (NMTNet) for joint segmentation and classification of breast tumors, based on a convolutional neural network (CNN) and a U-shaped architecture. It comprises a shared encoder, a multi-scale fusion channel refinement (MFCR) module, a segmentation branch, and a classification branch. First, ResNet18 is used as the backbone in the encoder to strengthen feature representation. The MFCR module is then introduced to enrich feature depth and diversity. In addition, the segmentation branch places a lesion region enhancement (LRE) module between the encoder and decoder, aiming to capture more detailed texture and edge information of irregular tumors and improve segmentation accuracy. The classification branch incorporates a fine-grained classifier that reuses valuable segmentation information to discriminate between benign and malignant tumors. NMTNet is evaluated on both ultrasound and magnetic resonance imaging datasets, achieving segmentation Dice scores of 90.30% and 91.50% and Jaccard indices of 84.70% and 88.10%, respectively, with corresponding classification accuracies of 87.50% and 99.64%. Experimental results demonstrate the superiority of NMTNet over state-of-the-art methods on breast tumor segmentation and classification tasks.
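The following minimal sketch shows the shared-encoder, two-branch pattern the abstract describes: one decoder for segmentation and one head for classification over the same backbone features. NMTNet's MFCR and LRE modules are not reproduced, and every layer size here is an assumption.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class TwoBranchNet(nn.Module):
    """Hedged sketch of a shared-encoder multi-task design: a ResNet18
    backbone feeding a minimal segmentation decoder and a classification
    head. This illustrates the pattern, not NMTNet itself."""
    def __init__(self, n_seg_classes=1, n_cls_classes=2):
        super().__init__()
        backbone = resnet18(weights=None)
        # Drop avgpool and fc: output is (N, 512, H/32, W/32).
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        self.decoder = nn.Sequential(
            nn.Conv2d(512, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=32, mode="bilinear",
                        align_corners=False),
            nn.Conv2d(64, n_seg_classes, 1),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(512, n_cls_classes),
        )

    def forward(self, x):
        feats = self.encoder(x)
        return self.decoder(feats), self.classifier(feats)

# Usage: seg_map, cls_logits = TwoBranchNet()(torch.randn(1, 3, 224, 224))
```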
Collapse
Affiliation(s)
- Xuelian Yang
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
| | - Yuanjun Wang
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China.
| | - Li Sui
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
| |
Collapse
|
38
|
Wen X, Tu H, Zhao B, Zhou W, Yang Z, Li L. Identification of benign and malignant breast nodules on ultrasound: comparison of multiple deep learning models and model interpretation. Front Oncol 2025; 15:1517278. [PMID: 40040727 PMCID: PMC11876547 DOI: 10.3389/fonc.2025.1517278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2024] [Accepted: 01/30/2025] [Indexed: 03/06/2025] Open
Abstract
Background and Purpose Deep learning (DL) algorithms generally require full supervision in the form of annotated regions of interest (ROIs), a process that is both labor-intensive and susceptible to bias. We aimed to develop a weakly supervised algorithm that differentiates benign from malignant breast tumors in ultrasound images without image annotation. Methods We developed and validated the models using two publicly available datasets: the breast ultrasound image (BUSI) and GDPH&SYSUCC breast ultrasound datasets. After removing poor-quality images, a total of 3049 images were included, divided into two classes: benign (N = 1320) and malignant (N = 1729). Weakly supervised DL algorithms were implemented with four networks (DenseNet121, ResNet50, EfficientNetb0, and Vision Transformer) and trained using 2136 unannotated breast ultrasound images; 609 and 304 images were used for the validation and test sets, respectively. Diagnostic performance was quantified as the area under the receiver operating characteristic curve (AUC), and class activation maps were used to interpret the predictions of the weakly supervised DL algorithms. Results The DenseNet121 model, using whole-image inputs without ROI annotations, demonstrated superior diagnostic performance in distinguishing benign from malignant breast nodules compared with the ResNet50, EfficientNetb0, and Vision Transformer models. DenseNet121 achieved the highest AUC, 0.94 on the validation set and 0.93 on the test set, significantly surpassing the other models across both datasets (all P < 0.05). Conclusion The weakly supervised DenseNet121 model developed in this study is feasible for ultrasound diagnosis of breast tumors and shows good differential diagnostic capability. It may help radiologists, especially novice readers, to improve the accuracy of breast tumor diagnosis with ultrasound.
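As a concrete illustration of the interpretation step, the sketch below computes a classic class activation map (Zhou et al., 2016) from a DenseNet121 classifier by projecting the final convolutional feature map through the weights of the last linear layer. It is a generic CAM recipe under assumed preprocessing, not the authors' exact pipeline.

```python
import torch
import torch.nn.functional as F
from torchvision.models import densenet121

def class_activation_map(model, image, class_idx):
    """Hedged sketch of CAM on DenseNet121: weight the last conv feature
    map by the classifier weights of the requested class, then normalise
    to [0, 1]. Input preprocessing is assumed done upstream."""
    model.eval()
    with torch.no_grad():
        feats = F.relu(model.features(image))    # (1, 1024, h, w)
        w = model.classifier.weight[class_idx]   # (1024,)
        cam = torch.einsum("c,chw->hw", w, feats[0])
        cam = F.relu(cam)
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam  # upsample to the image size for overlay as needed

model = densenet121(weights=None, num_classes=2)
heatmap = class_activation_map(model, torch.randn(1, 3, 224, 224),
                               class_idx=1)
```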
Collapse
Affiliation(s)
- Xi Wen
- Department of Ultrasound, The Central Hospital of Enshi Tujia And Miao Autonomous Prefecture (Enshi Clinical College of Wuhan University), Enshi, China
| | - Hao Tu
- Department of Ultrasound, The Central Hospital of Enshi Tujia And Miao Autonomous Prefecture (Enshi Clinical College of Wuhan University), Enshi, China
| | - Bingyang Zhao
- Department of Neurology, China-Japan Union Hospital of Jilin University, Changchun, China
| | - Wenbo Zhou
- Department of Stomatology, China-Japan Union Hospital of Jilin University, Changchun, China
| | - Zhuo Yang
- Department of Ultrasound, The Central Hospital of Enshi Tujia And Miao Autonomous Prefecture (Enshi Clinical College of Wuhan University), Enshi, China
| | - Lijuan Li
- Department of Ultrasound, The Central Hospital of Enshi Tujia And Miao Autonomous Prefecture (Enshi Clinical College of Wuhan University), Enshi, China
| |
Collapse
|
39
|
Vallez N, Bueno G, Deniz O, Rienda MA, Pastor C. BUS-UCLM: Breast ultrasound lesion segmentation dataset. Sci Data 2025; 12:242. [PMID: 39934113 PMCID: PMC11814256 DOI: 10.1038/s41597-025-04562-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2024] [Accepted: 01/30/2025] [Indexed: 02/13/2025] Open
Abstract
This dataset comprises 38 breast ultrasound scans from patients, encompassing a total of 683 images. The scans were acquired with a Siemens ACUSON S2000 ultrasound system between 2022 and 2023. The dataset was created specifically for segmenting breast lesions: identifying the area and contour of each lesion and classifying it as benign or malignant. The images fall into three categories by finding: 419 normal, 174 benign, and 90 malignant. The ground truth is given as RGB segmentation masks in individual files, with black indicating normal breast tissue and green and red indicating benign and malignant lesions, respectively. This dataset enables researchers to build and evaluate machine learning models that distinguish benign from malignant tumours in authentic breast ultrasound images. Segmentation annotations provided by expert radiologists enable accurate model training and evaluation, making this dataset a valuable asset for computer vision and public health research.
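Given the stated mask convention (black = normal tissue, green = benign, red = malignant), a loader might decode the RGB ground truth into integer class labels as sketched below. The channel-dominance thresholding and the tolerance value are assumptions about the encoding, not a documented part of the dataset.

```python
import numpy as np
from PIL import Image

def decode_bus_uclm_mask(path, tol=100):
    """Hedged sketch for reading the described RGB ground truth:
    0 = normal tissue (black), 1 = benign (green), 2 = malignant (red).
    The thresholding scheme is an assumption about the encoding."""
    rgb = np.array(Image.open(path).convert("RGB")).astype(int)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    labels = np.zeros(rgb.shape[:2], dtype=np.uint8)   # 0 = background
    labels[(g > tol) & (g > r) & (g > b)] = 1          # benign
    labels[(r > tol) & (r > g) & (r > b)] = 2          # malignant
    return labels
```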
Collapse
Affiliation(s)
- Noelia Vallez
- VISILAB, E.T.S. Ingeniería Industrial, University of Castilla-La Mancha, Avda. Camilo José Cela s/n, 13005, Ciudad Real, Spain.
| | - Gloria Bueno
- VISILAB, E.T.S. Ingeniería Industrial, University of Castilla-La Mancha, Avda. Camilo José Cela s/n, 13005, Ciudad Real, Spain
| | - Oscar Deniz
- VISILAB, E.T.S. Ingeniería Industrial, University of Castilla-La Mancha, Avda. Camilo José Cela s/n, 13005, Ciudad Real, Spain
| | - Miguel Angel Rienda
- Hospital General Universitario de Ciudad Real, C/ Obispo Rafael Torija s/n, 13005, Ciudad Real, Spain
| | - Carlos Pastor
- Hospital General Universitario de Ciudad Real, C/ Obispo Rafael Torija s/n, 13005, Ciudad Real, Spain
| |
Collapse
|
40
|
Wang X, Lv L, Tang Q, Wang G, Shang E, Zheng H, Zhang L. A feature fusion method based on radiomic features and revised deep features for improving tumor prediction in ultrasound images. Comput Biol Med 2025; 185:109605. [PMID: 39721417 DOI: 10.1016/j.compbiomed.2024.109605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2024] [Revised: 12/01/2024] [Accepted: 12/19/2024] [Indexed: 12/28/2024]
Abstract
BACKGROUND Radiomic features and deep features are both valuable for the accurate prediction of tumor information in breast ultrasound. However, whether integrating the two can improve prediction performance is unclear. METHODS A feature fusion method based on radiomic features and revised deep features was proposed to predict tumor information. Radiomic features were extracted from the tumor region on ultrasound images, and the optimal radiomic features were selected by Gini score. Revised deep features, extracted using revised CNN models that integrate prior information, were combined with the radiomic features to build a logistic regression classifier for tumor prediction. Performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). RESULTS The proposed feature fusion method (AUC = 0.9845) obtained better prediction performance than radiomic features alone (AUC = 0.9796) or deep features alone (AUC = 0.9342). CONCLUSIONS The proposed fusion framework integrating radiomic features and revised deep features is an efficient way to improve the prediction of tumor information.
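A minimal sketch of the fusion idea follows: radiomic features ranked by Gini importance (here via a random forest, one common reading of "Gini score"), the top few concatenated with deep features, and the result fed to a logistic regression scored by AUC. The selector, top_k, and the train/test split are all assumptions, not the authors' protocol.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def fuse_and_score(radiomic, deep, y, top_k=20, seed=0):
    """Hedged sketch: select radiomic features by Gini importance,
    concatenate with deep features, fit logistic regression, report AUC.
    radiomic: (n, p) array; deep: (n, q) array; y: (n,) binary labels."""
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(radiomic, y)
    keep = np.argsort(forest.feature_importances_)[::-1][:top_k]
    fused = np.hstack([radiomic[:, keep], deep])
    Xtr, Xte, ytr, yte = train_test_split(fused, y, test_size=0.3,
                                          stratify=y, random_state=seed)
    clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    return roc_auc_score(yte, clf.predict_proba(Xte)[:, 1])
```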
Collapse
Affiliation(s)
- Xianyang Wang
- School of Computer and Information, Anqing Normal University, Anqing, 246133, People's Republic of China
| | - Linlin Lv
- School of Computer and Information, Anqing Normal University, Anqing, 246133, People's Republic of China
| | - Qingfeng Tang
- School of Computer and Information, Anqing Normal University, Anqing, 246133, People's Republic of China
| | - Guangjun Wang
- School of Computer and Information, Anqing Normal University, Anqing, 246133, People's Republic of China
| | - Enci Shang
- School of Computer and Information, Anqing Normal University, Anqing, 246133, People's Republic of China
| | - Hang Zheng
- School of Computer and Information, Anqing Normal University, Anqing, 246133, People's Republic of China
| | - Liangliang Zhang
- School of Computer and Information, Anqing Normal University, Anqing, 246133, People's Republic of China.
| |
Collapse
|
41
|
Chen H, Cai Y, Wang C, Chen L, Zhang B, Han H, Guo Y, Ding H, Zhang Q. Multi-Organ Foundation Model for Universal Ultrasound Image Segmentation With Task Prompt and Anatomical Prior. IEEE TRANSACTIONS ON MEDICAL IMAGING 2025; 44:1005-1018. [PMID: 39361457 DOI: 10.1109/tmi.2024.3472672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/05/2024]
Abstract
Semantic segmentation of ultrasound (US) images with deep learning has played a crucial role in computer-aided disease screening, diagnosis, and prognosis. However, due to the scarcity of US images and their small field of view, the resulting segmentation models are tailored to a specific single organ and may lack robustness, overlooking correlations among the anatomical structures of multiple organs. To address these challenges, we propose the Multi-Organ FOundation (MOFO) model for universal US image segmentation. MOFO is optimized jointly over multiple organs across various anatomical regions to overcome data scarcity and exploit correlations between organs. It extracts organ-invariant representations from US images; simultaneously, a task prompt refines organ-specific representations for segmentation prediction, and an anatomical prior enhances the consistency of anatomical structures. A multi-organ US database with segmentation labels, comprising 7039 images from 10 organs across various regions of the human body, was established to develop and evaluate the model. Results demonstrate that MOFO outperforms single-organ methods in Dice coefficient, 95% Hausdorff distance, and average symmetric surface distance by statistically significant margins. Our experiments in multi-organ universal segmentation for US images serve as a pioneering exploration of improving segmentation performance by leveraging semantic and anatomical relationships within US images of multiple organs.
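One simple way to realise a "task prompt" over shared features is FiLM-style conditioning on an organ identifier, sketched below. This illustrates the general idea of prompt-refined organ-specific features only; it is not MOFO's actual mechanism, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class TaskPromptFiLM(nn.Module):
    """Hedged sketch: condition shared features on an organ/task prompt
    by predicting per-channel scale and shift (FiLM-style)."""
    def __init__(self, n_tasks, channels):
        super().__init__()
        self.embed = nn.Embedding(n_tasks, 2 * channels)

    def forward(self, feats, task_id):          # feats: (N, C, H, W)
        gamma, beta = self.embed(task_id).chunk(2, dim=-1)
        return (feats * (1 + gamma[..., None, None])
                + beta[..., None, None])

# Usage: TaskPromptFiLM(10, 64)(torch.randn(2, 64, 32, 32),
#                               torch.tensor([3, 7]))
```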
Collapse
|
42
|
Sun F, Zhou Y, Hu L, Li Y, Zhao D, Chen Y, He Y. EDSRNet: An Enhanced Decoder Semantic Recovery Network for 2D Medical Image Segmentation. IEEE J Biomed Health Inform 2025; 29:1113-1124. [PMID: 40030272 DOI: 10.1109/jbhi.2024.3504829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/06/2025]
Abstract
In recent years, with the advancement of medical imaging technology, medical image segmentation has played a key role in assisting diagnosis and treatment planning. Current deep learning-based segmentation methods mainly adopt an encoder-decoder architecture and have received wide attention. However, these methods still have limitations: (1) they are often hampered by a significant semantic gap when supplementing features for the decoder, and (2) they do not simultaneously consider global and local information interaction during decoding, resulting in ineffective semantic recovery. This paper therefore proposes a novel Enhanced Decoder Semantic Recovery Network to address these challenges. First, the Multi-Level Semantic Fusion (MLSF) module effectively fuses low-level features of the original image, encoder features, high-level features of the deepest network layer, and decoder features, assigning weights according to the semantic gaps between them. Second, the Multiscale Spatial Attention (MSSA) and Cross Convolution Channel Attention (CCCA) modules are employed to obtain richer feature information. Finally, the Global-Local Semantic Recovery (GLSR) module is designed to achieve better semantic recovery. Experiments on public datasets including BUSI, CVC-ClinicDB, and Kvasir-SEG show that the proposed model improves IoU over the next-best algorithms by 0.81%, 0.85%, and 1.98%, respectively, significantly enhancing 2D medical image segmentation. This method provides effective technical support for further development in the field of medical image analysis.
Collapse
|
43
|
Han H, Tian Z, Guo Q, Jiang J, Du S, Wang J. HSC-T: B-Ultrasound-to-Elastography Translation via Hierarchical Structural Consistency Learning for Thyroid Cancer Diagnosis. IEEE J Biomed Health Inform 2025; 29:799-806. [PMID: 39495688 DOI: 10.1109/jbhi.2024.3491905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2024]
Abstract
Elastography ultrasound imaging is increasingly important in the diagnosis of thyroid cancer and other diseases, but its reliance on specialized equipment and techniques limits widespread adoption. This paper proposes a novel multimodal ultrasound diagnostic pipeline that expands the application of elastography ultrasound by translating B-ultrasound (BUS) images into elastography images (EUS). To address the limitations of existing image-to-image translation methods, which struggle to model inter-sample variations and to capture regional-scale structural consistency, we propose a BUS-to-EUS translation method based on hierarchical structural consistency. By incorporating domain-level, sample-level, patch-level, and pixel-level constraints, our approach guides the model toward a more precise mapping from BUS to EUS, thereby enhancing diagnostic accuracy. Experimental results demonstrate that the proposed method significantly improves the accuracy of BUS-to-EUS translation on the MTUSI dataset, and that the generated elastography images enhance nodule diagnostic accuracy on the STUSI and BUSI datasets compared with using BUS images alone. This advancement highlights the potential for broader application of elastography in clinical practice.
Collapse
|
44
|
Xin J, Yu Y, Shen Q, Zhang S, Su N, Wang Z. BCT-Net: semantic-guided breast cancer segmentation on BUS. Med Biol Eng Comput 2025:10.1007/s11517-025-03304-2. [PMID: 39883373 DOI: 10.1007/s11517-025-03304-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2024] [Accepted: 01/17/2025] [Indexed: 01/31/2025]
Abstract
Accurately and swiftly segmenting breast tumors is critical for cancer diagnosis and treatment. Ultrasound imaging is one of the most widely employed modalities in clinical practice, but challenges such as low contrast, blurred boundaries, and prevalent shadows in ultrasound images make tumor segmentation a daunting task. In this study, we propose BCT-Net, a network amalgamating CNN and transformer components for breast tumor segmentation. BCT-Net integrates a dual-level attention mechanism to capture more features and redefines the skip connection module. We introduce a classification task as an auxiliary task to impart additional semantic information to the segmentation network, employing supervised contrastive learning, and propose a hybrid objective loss that combines pixel-wise cross-entropy, binary cross-entropy, and supervised contrastive learning losses. Experiments on the BUSI breast ultrasound dataset show that BCT-Net achieves a precision of 86.12% and a Dice similarity coefficient (DSC) of 88.70%, demonstrating high accuracy in breast tumor segmentation.
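The hybrid objective described could be assembled roughly as below: pixel-wise cross-entropy for segmentation, binary cross-entropy for the auxiliary benign/malignant head, and a supervised contrastive term (Khosla et al., 2020) on projected embeddings. The loss weights, temperature, and projection input are illustrative assumptions rather than BCT-Net's published settings.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(seg_logits, seg_target, cls_logits, cls_target,
                proj_embed, lambdas=(1.0, 1.0, 0.1), tau=0.1):
    """Hedged sketch of a CE + BCE + supervised-contrastive objective.
    seg_logits: (N, C, H, W); seg_target: (N, H, W) long;
    cls_logits/cls_target: (N,); proj_embed: (N, D) embeddings."""
    l_seg = F.cross_entropy(seg_logits, seg_target)
    l_cls = F.binary_cross_entropy_with_logits(cls_logits,
                                               cls_target.float())
    # Supervised contrastive loss, simplified: same-class pairs attract.
    z = F.normalize(proj_embed, dim=1)
    sim = z @ z.t() / tau
    mask_pos = (cls_target[:, None] == cls_target[None, :]).float()
    mask_pos.fill_diagonal_(0)
    logits_mask = 1 - torch.eye(len(z), device=z.device)
    log_prob = sim - torch.log(
        (logits_mask * sim.exp()).sum(1, keepdim=True))
    l_con = -(mask_pos * log_prob).sum(1) / mask_pos.sum(1).clamp(min=1)
    return (lambdas[0] * l_seg + lambdas[1] * l_cls
            + lambdas[2] * l_con.mean())
```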
Collapse
Affiliation(s)
- Junchang Xin
- School of Computer Science and Engineering, Northeastern University, Shenyang, 110169, China
| | - Yaqi Yu
- School of Computer Science and Engineering, Northeastern University, Shenyang, 110169, China
| | - Qi Shen
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, China
| | - Shudi Zhang
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, China
| | - Na Su
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, China
| | - Zhiqiong Wang
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, China.
| |
Collapse
|
45
|
Gupta C, Gill NS, Gulia P, Alduaiji N, Shreyas J, Shukla PK. Applying YOLOv6 as an ensemble federated learning framework to classify breast cancer pathology images. Sci Rep 2025; 15:3769. [PMID: 39885198 PMCID: PMC11782635 DOI: 10.1038/s41598-024-80187-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2024] [Accepted: 11/15/2024] [Indexed: 02/01/2025] Open
Abstract
Breast cancer is the most common carcinoma-related cause of death among women. Early detection is crucial, as manual screening may delay diagnosis and treatment and put lives at risk; mammography imaging is advised for routine screening to diagnose breast cancer at an early stage. To improve generalizability, this study examines the use of Federated Learning (FedL) to detect breast cancer and compares its performance to a centralized training technique. Although FedL is well known as a privacy-preserving algorithm, its similarities to ensemble learning methods, such as federated averaging (FedAvg), still need to be thoroughly investigated. This study specifically examines how a YOLOv6 model trained with FedL performs across several clients. A new homomorphic encryption and decryption algorithm is also proposed to preserve data privacy. A novel pruned YOLOv6 model with FedL is introduced in this study to differentiate benign and malignant tissues, trained on the BreakHis breast cancer pathology dataset and the BUSI dataset. The proposed model achieved a validation accuracy of 98% on BreakHis and 97% on BUSI. Compared with the VGG-19, ResNet-50, and InceptionV3 algorithms, the proposed model achieved better results. The tests show that federated learning is feasible, as FedAvg trains models of high quality within a few communication rounds, as demonstrated across a range of model topologies including ResNet50, VGG-19, InceptionV3, and the proposed ensembled FedL YOLOv6.
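For orientation, one round of federated averaging itself reduces to a size-weighted mean of client model weights, as in the sketch below. The paper's homomorphic encryption of updates is omitted here; this is the textbook FedAvg aggregation step, not the authors' full pipeline.

```python
import copy
import torch

def federated_average(client_states, client_sizes):
    """Hedged sketch of FedAvg aggregation: a size-weighted mean of
    client state_dicts for one communication round. Integer buffers
    (e.g. BatchNorm counters) are cast to float for simplicity."""
    total = float(sum(client_sizes))
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(
            state[key].float() * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    return avg  # load into the global model via model.load_state_dict(avg)
```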
Collapse
Affiliation(s)
- Chhaya Gupta
- Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak, India
| | - Nasib Singh Gill
- Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak, India
| | - Preeti Gulia
- Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak, India
| | - Noha Alduaiji
- Department of Computer Science, College of Computer and Information Sciences, Majmaah University, 11952, Al Majmaah, Saudi Arabia
| | - J Shreyas
- Department of Information Technology, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India.
| | - Piyush Kumar Shukla
- Department of Computer Science and Engineering, University Institute of Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya (State Technological University of Madhya Pradesh), Madhya Pradesh, Bhopal, 462033, India
| |
Collapse
|
46
|
Hasan MK, Tasnim J. Reply to Comment on 'CAM-QUS guided self-tuning modular CNNs with multi-loss functions for fully automated breast lesion classification in ultrasound images'. Phys Med Biol 2025; 70:038002. [PMID: 39851263 DOI: 10.1088/1361-6560/ada7bf] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2024] [Accepted: 01/08/2025] [Indexed: 01/26/2025]
Affiliation(s)
- Md Kamrul Hasan
- Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology (BUET), Dhaka 1205, Bangladesh
| | - Jarin Tasnim
- Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology (BUET), Dhaka 1205, Bangladesh
| |
Collapse
|
47
|
Dar MF, Ganivada A. Adaptive ensemble loss and multi-scale attention in breast ultrasound segmentation with UMA-Net. Med Biol Eng Comput 2025:10.1007/s11517-025-03301-5. [PMID: 39847155 DOI: 10.1007/s11517-025-03301-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2024] [Accepted: 01/15/2025] [Indexed: 01/24/2025]
Abstract
The generalization of deep learning (DL) models is critical for accurate lesion segmentation in breast ultrasound (BUS) images. Traditional DL models often struggle to generalize well due to the high frequency and scale variations inherent in BUS images. Moreover, conventional loss functions used in these models frequently result in imbalanced optimization, either prioritizing region overlap or boundary accuracy, which leads to suboptimal segmentation performance. To address these issues, we propose UMA-Net, an enhanced UNet architecture specifically designed for BUS image segmentation. UMA-Net integrates residual connections, attention mechanisms, and a bottleneck with atrous convolutions to effectively capture multi-scale contextual information without compromising spatial resolution. Additionally, we introduce an adaptive ensemble loss function that dynamically balances the contributions of different loss components during training, ensuring optimization across key segmentation metrics. This novel approach mitigates the imbalances found in conventional loss functions. We validate UMA-Net on five diverse BUS datasets-BUET, BUSI, Mendeley, OMI, and UDIAT-demonstrating superior performance. Our findings highlight the importance of addressing frequency and scale variations, confirming UMA-Net as a robust and generalizable solution for BUS image segmentation.
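An adaptively weighted loss ensemble can be sketched with learnable weights balancing a region term (Dice) against a boundary-sensitive term (BCE), in the style of uncertainty weighting (Kendall et al., 2018). This illustrates the dynamic-balancing idea only; it is not UMA-Net's exact formulation, and the two-term choice is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveEnsembleLoss(nn.Module):
    """Hedged sketch: learnable log-variances trade off Dice and BCE
    terms during training instead of fixing their weights by hand."""
    def __init__(self):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(2))

    @staticmethod
    def dice_loss(logits, target, eps=1.0):
        probs = torch.sigmoid(logits)
        inter = (probs * target).sum(dim=(1, 2, 3))
        union = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
        return (1 - (2 * inter + eps) / (union + eps)).mean()

    def forward(self, logits, target):
        # logits, target: (N, 1, H, W); target is a float {0,1} mask.
        losses = torch.stack([
            self.dice_loss(logits, target),
            F.binary_cross_entropy_with_logits(logits, target),
        ])
        weights = torch.exp(-self.log_vars)
        return (weights * losses).sum() + self.log_vars.sum()
```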
Collapse
Affiliation(s)
- Mohsin Furkh Dar
- Artificial Intelligence Lab, School of Computer and Information Sciences, University of Hyderabad, Hyderabad, 500046, India.
| | - Avatharam Ganivada
- Artificial Intelligence Lab, School of Computer and Information Sciences, University of Hyderabad, Hyderabad, 500046, India
| |
Collapse
|
48
|
Wang T, Liu J, Tang J. A Cross-scale Attention-Based U-Net for Breast Ultrasound Image Segmentation. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2025:10.1007/s10278-025-01392-y. [PMID: 39838227 DOI: 10.1007/s10278-025-01392-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/14/2024] [Revised: 12/06/2024] [Accepted: 12/23/2024] [Indexed: 01/23/2025]
Abstract
Breast cancer remains a significant global health concern and is a leading cause of mortality among women. The accuracy of breast cancer diagnosis can be greatly improved with the assistance of automatic segmentation of breast ultrasound images. Research has demonstrated the effectiveness of convolutional neural networks (CNNs) and transformers in segmenting these images. Some studies combine transformers and CNNs, using the transformer's ability to exploit long-distance dependencies to address the limitations inherent in convolutional neural networks. Many of these studies face limitations due to the forced integration of transformer blocks into CNN architectures. This approach often leads to inconsistencies in the feature extraction process, ultimately resulting in suboptimal performance for the complex task of medical image segmentation. This paper presents CSAU-Net, a cross-scale attention-guided U-Net, which is a combined CNN-transformer structure that leverages the local detail depiction of CNNs and the ability of transformers to handle long-distance dependencies. To integrate global context data, we propose a cross-scale cross-attention transformer block that is embedded within the skip connections of the U-shaped architectural network. To further enhance the effectiveness of the segmentation process, we incorporated a gated dilated convolution (GDC) module and a lightweight channel self-attention transformer (LCAT) on the encoder side. Extensive experiments conducted on three open-source datasets demonstrate that our CSAU-Net surpasses state-of-the-art techniques in segmenting ultrasound breast lesions.
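Cross-attention inside a skip connection, with decoder features querying encoder features from another scale, can be sketched as below. The head count, normalization placement, and matching channel dimension are assumptions; this shows the mechanism in general, not CSAU-Net itself.

```python
import torch
import torch.nn as nn

class CrossScaleAttention(nn.Module):
    """Hedged sketch of cross-attention in a U-Net skip connection:
    decoder tokens attend to encoder tokens from another scale."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, dec_feats, enc_feats):
        # (N, C, H, W) -> (N, H*W, C) token sequences; the encoder
        # sequence may have a different length (different scale).
        n, c, h, w = dec_feats.shape
        q = dec_feats.flatten(2).transpose(1, 2)
        kv = enc_feats.flatten(2).transpose(1, 2)
        out, _ = self.attn(self.norm(q), kv, kv)
        return (q + out).transpose(1, 2).reshape(n, c, h, w)

# Usage: CrossScaleAttention(64)(torch.randn(1, 64, 32, 32),
#                                torch.randn(1, 64, 16, 16))
```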
Collapse
Affiliation(s)
- Teng Wang
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, 430081, China
- China & Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, 430065, China
| | - Jun Liu
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, 430081, China.
- China & Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, 430065, China.
| | - Jinshan Tang
- Health Informatics, College of Public Health, George Mason University, Fairfax, VA, 22030, USA.
| |
Collapse
|
49
|
Rai HM, Yoo J, Agarwal S, Agarwal N. LightweightUNet: Multimodal Deep Learning with GAN-Augmented Imaging Data for Efficient Breast Cancer Detection. Bioengineering (Basel) 2025; 12:73. [PMID: 39851348 PMCID: PMC11761908 DOI: 10.3390/bioengineering12010073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2024] [Revised: 01/06/2025] [Accepted: 01/08/2025] [Indexed: 01/26/2025] Open
Abstract
Breast cancer ranks as the second most prevalent cancer globally and is the most frequently diagnosed cancer among women; therefore, early, automated, and precise detection is essential. Most AI-based techniques for breast cancer detection are complex and computationally expensive. To overcome this challenge, we present LightweightUNet, a hybrid deep learning (DL) classifier for accurate breast cancer classification. The proposed model has a low computational cost owing to its shallow architecture, and its adaptive nature stems from its use of depth-wise separable convolutions. We employed a multimodal approach to validate the model, using 13,000 images from two distinct modalities: mammogram imaging (MGI) and ultrasound imaging (USI). The multimodal imaging datasets were collected from seven sources, including the benchmark datasets DDSM, MIAS, INbreast, BrEaST, BUSI, Thammasat, and HMSS. Because the datasets come from various sources, we resized all images to a uniform 256 × 256 pixels and normalized them using the Box-Cox transformation. As the USI dataset is smaller, we applied the StyleGAN3 model to generate 10,000 synthetic ultrasound images. We performed two experiments with 5-fold cross-validation: the first on the real dataset without augmentation, and the second on the real + GAN-augmented dataset using the proposed method. The model obtained good results on the real dataset (87.16% precision, 86.87% recall, 86.84% F1-score, and 86.87% accuracy) without any extra data, and better performance on the real + GAN-augmented dataset (96.36% precision, 96.35% recall, 96.35% F1-score, and 96.35% accuracy). This multimodal approach improves precision by 9.20%, recall by 9.48%, F1-score by 9.51%, and accuracy by 9.48% on the combined dataset. LightweightUNet performs well owing to its compact network design, GAN-based augmentation of the training data, and multimodal training strategy, and these results indicate substantial potential for clinical use.
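The described preprocessing (resize to 256 × 256, Box-Cox intensity normalization) might look like the sketch below. The +1 shift (Box-Cox requires strictly positive inputs), the per-image lambda fit, and the final min-max rescale are assumptions, not the authors' documented steps.

```python
import numpy as np
from PIL import Image
from scipy import stats

def preprocess(path, size=(256, 256)):
    """Hedged sketch: load a grayscale image, resize to 256x256, apply a
    Box-Cox transform to the intensities, and rescale to [0, 1]."""
    img = np.array(Image.open(path).convert("L").resize(size),
                   dtype=np.float64)
    # boxcox fits lambda per image and needs strictly positive values.
    flat, _ = stats.boxcox(img.ravel() + 1.0)
    out = flat.reshape(size)
    return (out - out.min()) / (out.max() - out.min() + 1e-8)
```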
Collapse
Affiliation(s)
- Hari Mohan Rai
- School of Computing, Gachon University, Seongnam 13120, Republic of Korea;
| | - Joon Yoo
- School of Computing, Gachon University, Seongnam 13120, Republic of Korea;
| | - Saurabh Agarwal
- Department of Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
| | - Neha Agarwal
- School of Chemical Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
| |
Collapse
|
50
|
Giner-Miguelez J, Gómez A, Cabot J. On the Readiness of Scientific Data Papers for a Fair and Transparent Use in Machine Learning. Sci Data 2025; 12:61. [PMID: 39805856 PMCID: PMC11730645 DOI: 10.1038/s41597-025-04402-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2024] [Accepted: 01/03/2025] [Indexed: 01/16/2025] Open
Abstract
To ensure the fairness and trustworthiness of machine learning (ML) systems, recent legislative initiatives and relevant research in the ML community have pointed out the need to document the data used to train ML models. In addition, data-sharing practices in many scientific domains have evolved in recent years to support reproducibility, and academic institutions' adoption of these practices has encouraged researchers to publish their data and technical documentation in peer-reviewed venues such as data papers. In this study, we analyze how this broader scientific data documentation meets the needs of the ML community and regulatory bodies for its use in ML technologies. We examine a sample of 4041 data papers from different domains, assessing their coverage of and trends in the requested dimensions and comparing them with papers from an ML-focused venue (NeurIPS D&B) that describe datasets. As a result, we propose a set of recommendation guidelines for data creators and scientific data publishers to increase their data's preparedness for transparent and fairer use in ML technologies.
Collapse
Affiliation(s)
- Joan Giner-Miguelez
- Internet Interdisciplinary Institute (IN3), Universitat Oberta de Catalunya (UOC), Barcelona, Spain.
- Barcelona Supercomputing Center, Plaça Eusebi Güell, 1-3, Barcelona, Spain.
| | - Abel Gómez
- Internet Interdisciplinary Institute (IN3), Universitat Oberta de Catalunya (UOC), Barcelona, Spain
| | - Jordi Cabot
- Luxembourg Institute of Science and Technology, Esch-sur-Alzette, Luxembourg
- University of Luxembourg, Esch-sur-Alzette, Luxembourg
| |
Collapse
|