1. Tong L, Li T, Zhang Q, Zhang Q, Zhu R, Du W, Hu P. LiViT-Net: A U-Net-like, lightweight Transformer network for retinal vessel segmentation. Comput Struct Biotechnol J 2024;24:213-224. PMID: 38572168; PMCID: PMC10987887; DOI: 10.1016/j.csbj.2024.03.003.
Abstract
The intricate task of precisely segmenting retinal vessels from images, which is critical for diagnosing various eye diseases, presents significant challenges for models due to factors such as scale variation, complex anatomical patterns, low contrast, and limitations in training data. Building on these challenges, we offer novel contributions spanning model architecture, loss function design, robustness, and real-time efficacy. To comprehensively address these challenges, a new U-Net-like, lightweight Transformer network for retinal vessel segmentation is presented. By integrating MobileViT+ and a novel local representation in the encoder, our design emphasizes lightweight processing while capturing intricate image structures, enhancing vessel edge precision. A novel joint loss is designed, leveraging the characteristics of weighted cross-entropy and Dice loss to effectively guide the model through the task's challenges, such as foreground-background imbalance and intricate vascular structures. Exhaustive experiments were performed on three prominent retinal image databases. The results underscore the robustness and generalizability of the proposed LiViT-Net, which outperforms other methods in complex scenarios, especially in intricate environments with fine vessels or vessel edges. Importantly, optimized for efficiency, LiViT-Net excels on devices with constrained computational power, as evidenced by its fast performance. To demonstrate the model proposed in this study, a freely accessible and interactive website was established (https://hz-t3.matpool.com:28765?token=aQjYR4hqMI), revealing real-time performance with no login requirements.
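The joint loss above leverages weighted cross-entropy and Dice loss. As a rough, hedged sketch of how such a combination is commonly implemented (not the authors' published formulation), assuming a binary vessel mask and placeholder values for the positive-class weight and mixing factor:

```python
# Minimal sketch of a weighted cross-entropy + Dice joint loss for binary
# vessel segmentation. `pos_weight` and `alpha` are illustrative assumptions.
import torch
import torch.nn.functional as F

def joint_loss(logits, target, pos_weight=10.0, alpha=0.5, eps=1e-6):
    """logits, target: float tensors of shape (N, 1, H, W); target in {0, 1}."""
    # Weighted cross-entropy counteracts the foreground-background imbalance.
    wce = F.binary_cross_entropy_with_logits(
        logits, target,
        pos_weight=torch.tensor(pos_weight, device=logits.device))
    # Dice loss emphasizes overlap with thin vascular structures.
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3))
    denom = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = 1.0 - ((2.0 * inter + eps) / (denom + eps)).mean()
    return alpha * wce + (1.0 - alpha) * dice
```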
Affiliation(s)
- Le Tong: The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, No. 100 Haisi Road, Shanghai, 201418, China
- Tianjiu Li: The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, No. 100 Haisi Road, Shanghai, 201418, China
- Qian Zhang: The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, No. 100 Haisi Road, Shanghai, 201418, China
- Qin Zhang: Ophthalmology Department, Jing'an District Central Hospital, No. 259, Xikang Road, Shanghai, 200040, China
- Renchaoli Zhu: The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, No. 100 Haisi Road, Shanghai, 201418, China
- Wei Du: Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, No. 130 Meilong Road, Shanghai, 200237, China
- Pengwei Hu: The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, 40-1 South Beijing Road, Urumqi, 830011, China
2. Dong W, Liang Z, Wang L, Tian G, Long Q. Unsupervised domain adaptive segmentation algorithm based on two-level category alignment. Neural Netw 2024;177:106399. PMID: 38805794; DOI: 10.1016/j.neunet.2024.106399.
Abstract
To enhance the model's generalization ability in unsupervised domain adaptive segmentation tasks, most approaches have primarily focused on pixel-level local features but neglected the cues carried by category information. This limitation results in the segmentation network learning only global inter-domain invariant features while ignoring category-specific inter-domain invariant features, which degrades segmentation performance. To address this issue, we present an Unsupervised Domain Adaptive algorithm based on two-level Category Alignment in two different spaces for semantic segmentation tasks, denoted as UDAca+. The first level is image-level category alignment based on class activation maps (CAM), and the second is pixel-level category alignment based on pseudo labels. By utilizing category information, UDAca+ can effectively capture domain-invariant yet category-discriminative feature representations to improve segmentation accuracy. In addition, an adversarial learning-based strategy in the mixed domain is designed to train the proposed network. Moreover, a confidence calculation method is introduced to mitigate the misleading issues of negative transfer and over-alignment caused by noise in image-level pseudo labels. UDAca+ achieves state-of-the-art (SOTA) performance on two synthetic-to-real adaptation tasks, verifying its effectiveness for image segmentation.
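Image-level category alignment of this kind builds on class activation maps. The snippet below is a generic CAM computation (global-average-pooled classifier weights applied to the last convolutional features), given only for orientation; it is not the UDAca+ implementation, and the tensor shapes are assumptions.

```python
# Generic class activation map (CAM): weight the final conv feature maps by the
# classifier weights of the chosen class, then normalize to [0, 1].
import torch

def class_activation_map(features, fc_weight, class_idx):
    """features: (C, H, W) last conv features; fc_weight: (num_classes, C)."""
    cam = torch.einsum('c,chw->hw', fc_weight[class_idx], features)
    cam = torch.relu(cam)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam
```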
Affiliation(s)
- Wenyong Dong: School of Computer Science, Wuhan University, Wuhan, 430072, China; School of Information Network Security, Xinjiang University of Political Science and Law, Tumushuke, 843900, China
- Zhixue Liang: School of Computer Science, Wuhan University, Wuhan, 430072, China; School of Computer and Software, Nanyang Institute of Technology, Nanyang, 473000, China
- Liping Wang: School of Computer Science, Wuhan University, Wuhan, 430072, China
- Gang Tian: School of Computer Science, Wuhan University, Wuhan, 430072, China
- Qianhui Long: School of Computer Science, Wuhan University, Wuhan, 430072, China
3. Chen X, Liu Q, Deng HH, Kuang T, Lin HHY, Xiao D, Gateno J, Xia JJ, Yap PT. Improving Image Segmentation with Contextual and Structural Similarity. Pattern Recognition 2024;152:110489. PMID: 38645435; PMCID: PMC11027435; DOI: 10.1016/j.patcog.2024.110489.
Abstract
Deep learning models for medical image segmentation are usually trained with voxel-wise losses, e.g., cross-entropy loss, focusing on unary supervision without considering inter-voxel relationships. This oversight potentially leads to semantically inconsistent predictions. Here, we propose a contextual similarity loss (CSL) and a structural similarity loss (SSL) to explicitly and efficiently incorporate inter-voxel relationships for improved performance. The CSL promotes consistency in predicted object categories for each image sub-region compared to ground truth. The SSL enforces compatibility between the predictions of voxel pairs by computing pair-wise distances between them, ensuring that voxels of the same class are close together whereas those from different classes are separated by a wide margin in the distribution space. The effectiveness of the CSL and SSL is evaluated using a clinical cone-beam computed tomography (CBCT) dataset of patients with various craniomaxillofacial (CMF) deformities and a public pancreas dataset. Experimental results show that the CSL and SSL outperform state-of-the-art regional loss functions in preserving segmentation semantics.
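The structural similarity loss is described as pulling same-class voxel pairs together and pushing different-class pairs apart by a margin. A minimal contrastive-style sketch of that idea follows; the pair sampling, the distance space, and the margin value are assumptions rather than the paper's exact SSL definition.

```python
# Contrastive-style pairwise margin loss over randomly sampled voxel pairs:
# same-class pairs are pulled together, different-class pairs pushed apart.
import torch

def pairwise_margin_loss(probs, labels, n_pairs=1024, margin=1.0):
    """probs: (V, C) per-voxel predictions; labels: (V,) ground-truth class ids."""
    v = probs.shape[0]
    i = torch.randint(0, v, (n_pairs,), device=probs.device)
    j = torch.randint(0, v, (n_pairs,), device=probs.device)
    dist = (probs[i] - probs[j]).pow(2).sum(dim=1).sqrt()
    same = (labels[i] == labels[j]).float()
    # Same-class pairs: minimize distance; different-class pairs: hinge on margin.
    loss = same * dist.pow(2) + (1 - same) * torch.clamp(margin - dist, min=0).pow(2)
    return loss.mean()
```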
Affiliation(s)
- Xiaoyang Chen: Department of Radiology and Biomedical Research Imaging Center, University of North Carolina, Chapel Hill, 27599, NC, USA
- Qin Liu: Department of Computer Science, University of North Carolina, Chapel Hill, 27599, NC, USA
- Hannah H. Deng: Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, 77030, TX, USA
- Tianshu Kuang: Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, 77030, TX, USA
- Henry Hung-Ying Lin: Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, 77030, TX, USA
- Deqiang Xiao: Department of Radiology and Biomedical Research Imaging Center, University of North Carolina, Chapel Hill, 27599, NC, USA
- Jaime Gateno: Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, 77030, TX, USA; Department of Surgery (Oral and Maxillofacial Surgery), Weill Medical College, Cornell University, New York, 10065, NY, USA
- James J. Xia: Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, 77030, TX, USA; Department of Surgery (Oral and Maxillofacial Surgery), Weill Medical College, Cornell University, New York, 10065, NY, USA
- Pew-Thian Yap: Department of Radiology and Biomedical Research Imaging Center, University of North Carolina, Chapel Hill, 27599, NC, USA
4. Yuan W, Cheng J, Gong Y, He L, Zhang J. MACG-Net: Multi-axis cross gating network for deformable medical image registration. Comput Biol Med 2024;178:108673. PMID: 38905891; DOI: 10.1016/j.compbiomed.2024.108673.
Abstract
Deformable image registration is a fundamental yet vital task for preoperative planning, intraoperative information fusion, disease diagnosis and follow-ups. It solves for the non-rigid deformation field that aligns an image pair. Recent approaches such as VoxelMorph and TransMorph compute features from a simple concatenation of moving and fixed images. However, this often leads to weak alignment. Moreover, convolutional neural network (CNN) or hybrid CNN-Transformer backbones are constrained by limited receptive fields and cannot capture long-range relations, while fully Transformer-based approaches are computationally expensive. In this paper, we propose a novel multi-axis cross gating network (MACG-Net) for deformable medical image registration, which combats these limitations. MACG-Net uses a dual-stream multi-axis feature fusion module to capture both long-range and local context relationships from the moving and fixed images. Cross gate blocks are integrated with the dual-stream backbone to consider both independent feature extraction within the moving-fixed image pair and the relationship between features from the image pair. We benchmark our method on several different datasets including 3D atlas-based brain MRI, inter-patient brain MRI and 2D cardiac MRI. The results demonstrate that the proposed method has achieved state-of-the-art performance. The source code has been released at https://github.com/Valeyards/MACG.
Affiliation(s)
- Wei Yuan: College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Jun Cheng: Institute for Infocomm Research, Agency for Science, Technology and Research, 138632, Singapore
- Yuhang Gong: College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Ling He: College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Jing Zhang: College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
5. Pang H, Ma R, Su J, Liu C, Gao Y, Jin Q. Blinding and blurring the multi-object tracker with adversarial perturbations. Neural Netw 2024;176:106331. PMID: 38701599; DOI: 10.1016/j.neunet.2024.106331.
Abstract
Adversarial attacks reveal a potential imperfection of deep models: they are susceptible to being tricked by imperceptible perturbations added to images. Recent deep multi-object trackers combine the functionalities of detection and association, rendering attacks on either the detector or the association component an effective means of deception. Existing attacks focus on increasing the frequency of ID switching, which greatly damages tracking stability but is not enough to make the tracker completely ineffective. To fully explore the potential of adversarial attacks, we propose the Blind-Blur Attack (BBA), a novel attack method based on spatio-temporal motion information to fool multi-object trackers. Specifically, a simple but efficient perturbation generator is trained with the blind-blur loss, simultaneously making the target invisible to the tracker and letting the background be regarded as moving targets. We take TraDeS as our main research tracker and verify our attack method on other strong algorithms (i.e., CenterTrack, FairMOT, and ByteTrack) on MOT-Challenge benchmark datasets (i.e., MOT16, MOT17, and MOT20). The BBA attack reduced the MOTA of TraDeS and ByteTrack from 69.1 and 80.3 to -238.1 and -357.0, respectively, indicating that it is an efficient method with a high degree of transferability.
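BBA trains a dedicated perturbation generator with a blind-blur loss, which is not reproduced here. For context only, the sketch below shows the much simpler classic FGSM step, a different technique that likewise adds an imperceptible, loss-increasing perturbation; the epsilon value and loss function are illustrative assumptions.

```python
# One-step FGSM perturbation: move each pixel in the sign of the loss gradient.
import torch

def fgsm_perturb(model, images, targets, loss_fn, eps=8.0 / 255.0):
    """images in [0, 1]; returns an adversarially perturbed copy."""
    images = images.clone().detach().requires_grad_(True)
    loss = loss_fn(model(images), targets)
    loss.backward()
    # Step in the direction that increases the loss, clamped to a valid image range.
    adv = images + eps * images.grad.sign()
    return adv.clamp(0.0, 1.0).detach()
```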
Affiliation(s)
- Haibo Pang: School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou City, 450003, China
- Rongqi Ma: School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou City, 450003, China
- Jie Su: School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou City, 450003, China
- Chengming Liu: School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou City, 450003, China
- Yufei Gao: School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou City, 450003, China
- Qun Jin: Waseda University, Tokorozawa, 359-1192, Japan
6. Calixto C, Taymourtash A, Karimi D, Snoussi H, Velasco-Annis C, Jaimes C, Gholipour A. Advances in Fetal Brain Imaging. Magn Reson Imaging Clin N Am 2024;32:459-478. PMID: 38944434; PMCID: PMC11216711; DOI: 10.1016/j.mric.2024.03.004.
Abstract
Over the last 20 years, there have been remarkable developments in fetal brain MR imaging analysis methods. This article delves into the specifics of structural imaging, diffusion imaging, functional MR imaging, and spectroscopy, highlighting the latest advancements in motion correction, fetal brain development atlases, and the challenges and innovations. Furthermore, this article explores the clinical applications of these advanced imaging techniques in comprehending and diagnosing fetal brain development and abnormalities.
Affiliation(s)
- Camilo Calixto: Computational Radiology Laboratory, Department of Radiology, Boston Children's Hospital, 401 Park Dr, 7th Floor West, Boston, MA 02215, USA; Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA
- Athena Taymourtash: Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna, Spitalgasse 23, Wien 1090, Austria
- Davood Karimi: Computational Radiology Laboratory, Department of Radiology, Boston Children's Hospital, 401 Park Dr, 7th Floor West, Boston, MA 02215, USA; Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA
- Haykel Snoussi: Computational Radiology Laboratory, Department of Radiology, Boston Children's Hospital, 401 Park Dr, 7th Floor West, Boston, MA 02215, USA; Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA
- Clemente Velasco-Annis: Computational Radiology Laboratory, Department of Radiology, Boston Children's Hospital, 401 Park Dr, 7th Floor West, Boston, MA 02215, USA; Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA
- Camilo Jaimes: Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA; Department of Radiology, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02215, USA
- Ali Gholipour: Computational Radiology Laboratory, Department of Radiology, Boston Children's Hospital, 401 Park Dr, 7th Floor West, Boston, MA 02215, USA; Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA
7. Boneš E, Gergolet M, Bohak C, Lesar Ž, Marolt M. Automatic Segmentation and Alignment of Uterine Shapes from 3D Ultrasound Data. Comput Biol Med 2024;178:108794. PMID: 38941903; DOI: 10.1016/j.compbiomed.2024.108794.
Abstract
BACKGROUND The uterus is the most important organ in the female reproductive system. Its shape plays a critical role in fertility and pregnancy outcomes. Advances in medical imaging, such as 3D ultrasound, have significantly improved the exploration of the female genital tract, thereby enhancing gynecological healthcare. Despite well-documented data for organs like the liver and heart, large-scale studies on the uterus are lacking. Existing classifications, such as VCUAM and ESHRE/ESGE, provide different definitions for normal uterine shapes but are not based on real-world measurements. Moreover, the lack of comprehensive datasets significantly hinders research in this area. Our research, part of the larger NURSE study, aims to fill this gap by establishing the shape of a normal uterus using real-world 3D vaginal ultrasound scans. This will facilitate research into uterine shape abnormalities associated with infertility and recurrent miscarriages. METHODS We developed an automated system for the segmentation and alignment of uterine shapes from 3D ultrasound data, which consists of two steps: automatic segmentation of the uteri in 3D ultrasound scans using deep learning techniques, and alignment of the resulting shapes with standard geometrical approaches, enabling the extraction of the normal shape for future analysis. The system was trained and validated on a comprehensive dataset of 3D ultrasound images from multiple medical centers. Its performance was evaluated by comparing the automated results with manual annotations provided by expert clinicians. RESULTS The presented approach demonstrated high accuracy in segmenting and aligning uterine shapes from 3D ultrasound data. The segmentation achieved an average Dice similarity coefficient (DSC) of 0.90. Our method for aligning uterine shapes showed minimal translation and rotation errors compared to traditional methods, with the preliminary average shape exhibiting characteristics consistent with expert findings of a normal uterus. CONCLUSION We have presented an approach to automatically segment and align uterine shapes from 3D ultrasound data. We trained a deep learning nnU-Net model that achieved high accuracy and proposed an alignment method using a combination of standard geometrical techniques. Additionally, we have created a publicly available dataset of 3D transvaginal ultrasound volumes with manual annotations of uterine cavities to support further research and development in this field. The dataset and the trained models are available at https://github.com/UL-FRI-LGM/UterUS.
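The Dice similarity coefficient (DSC) quoted in the results can be computed from binary masks as in the brief sketch below; input types and the smoothing epsilon are assumptions, and this is an illustration rather than the authors' evaluation code.

```python
# Dice similarity coefficient between two binary masks.
import numpy as np

def dice_coefficient(pred, gt, eps=1e-8):
    """pred, gt: boolean or {0, 1} arrays of identical shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)
```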
Affiliation(s)
- Eva Boneš: University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, 1000, Slovenia
- Marco Gergolet: University of Ljubljana, Faculty of Medicine, Vrazov trg 2, Ljubljana, 1000, Slovenia
- Ciril Bohak: University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, 1000, Slovenia; King Abdullah University of Science and Technology, Visual Computing Center, Thuwal, 23955-6900, Saudi Arabia
- Žiga Lesar: University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, 1000, Slovenia
- Matija Marolt: University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, 1000, Slovenia
8. Liu Z, Kainth K, Zhou A, Deyer TW, Fayad ZA, Greenspan H, Mei X. A review of self-supervised, generative, and few-shot deep learning methods for data-limited magnetic resonance imaging segmentation. NMR Biomed 2024;37:e5143. PMID: 38523402; DOI: 10.1002/nbm.5143.
Abstract
Magnetic resonance imaging (MRI) is a ubiquitous medical imaging technology with applications in disease diagnostics, intervention, and treatment planning. Accurate MRI segmentation is critical for diagnosing abnormalities, monitoring diseases, and deciding on a course of treatment. With the advent of advanced deep learning frameworks, fully automated and accurate MRI segmentation is advancing. Traditional supervised deep learning techniques have advanced tremendously, reaching clinical-level accuracy in the field of segmentation. However, these algorithms still require a large amount of annotated data, which is often unavailable or impractical. One way to circumvent this issue is to utilize algorithms that exploit a limited amount of labeled data. This paper aims to review such state-of-the-art algorithms that use a limited number of annotated samples. We explain the fundamental principles of self-supervised learning, generative models, few-shot learning, and semi-supervised learning and summarize their applications in cardiac, abdominal, and brain MRI segmentation. Throughout this review, we highlight algorithms that can be employed based on the quantity of annotated data available. We also present a comprehensive list of notable publicly available MRI segmentation datasets. To conclude, we discuss possible future directions of the field, including emerging algorithms such as contrastive language-image pretraining and potential combinations across the methods discussed, that can further increase the efficacy of image segmentation with limited labels.
Affiliation(s)
- Zelong Liu: BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Komal Kainth: BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Alexander Zhou: BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Timothy W Deyer: East River Medical Imaging, New York, New York, USA; Department of Radiology, Cornell Medicine, New York, New York, USA
- Zahi A Fayad: BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA; Department of Diagnostic, Molecular, and Interventional Radiology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Hayit Greenspan: BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA; Department of Diagnostic, Molecular, and Interventional Radiology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Xueyan Mei: BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA; Department of Diagnostic, Molecular, and Interventional Radiology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
9. Chen C, Han J, Debattista K. Virtual Category Learning: A Semi-Supervised Learning Method for Dense Prediction With Extremely Limited Labels. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024;46:5595-5611. PMID: 38376969; DOI: 10.1109/tpami.2024.3367416.
Abstract
Due to the costliness of labelled data in real-world applications, semi-supervised learning, underpinned by pseudo labelling, is an appealing solution. However, handling confusing samples is nontrivial: discarding valuable confusing samples would compromise the model generalisation while using them for training would exacerbate the issue of confirmation bias caused by the resulting inevitable mislabelling. To solve this problem, this paper proposes to use confusing samples proactively without label correction. Specifically, a Virtual Category (VC) is assigned to each confusing sample in such a way that it can safely contribute to the model optimisation even without a concrete label. This provides an upper bound for inter-class information sharing capacity, which eventually leads to a better embedding space. Extensive experiments on two mainstream dense prediction tasks - semantic segmentation and object detection, demonstrate that the proposed VC learning significantly surpasses the state-of-the-art, especially when only very few labels are available. Our intriguing findings highlight the usage of VC learning in dense vision tasks.
10. Li W, Ye X, Chen X, Jiang X, Yang Y. A deep learning-based method for the detection and segmentation of breast masses in ultrasound images. Phys Med Biol 2024;69:155027. PMID: 38986480; DOI: 10.1088/1361-6560/ad61b6.
Abstract
Objective. Automated detection and segmentation of breast masses in ultrasound images are critical for breast cancer diagnosis, but remain challenging due to limited image quality and complex breast tissues. This study aims to develop a deep learning-based method that enables accurate breast mass detection and segmentation in ultrasound images. Approach. A novel convolutional neural network-based framework that combines the You Only Look Once (YOLO) v5 network and the Global-Local (GOLO) strategy was developed. First, YOLOv5 was applied to locate the mass regions of interest (ROIs). Second, a Global Local-Connected Multi-Scale Selection (GOLO-CMSS) network was developed to segment the masses. The GOLO-CMSS operated on both the entire images globally and mass ROIs locally, and then integrated the two branches for a final segmentation output. Particularly, in the global branch, CMSS applied Multi-Scale Selection (MSS) modules to automatically adjust the receptive fields, and Multi-Input (MLI) modules to enable fusion of shallow and deep features at different resolutions. The USTC dataset containing 28,477 breast ultrasound images was collected for training and test. The proposed method was also tested on three public datasets, UDIAT, BUSI and TUH. The segmentation performance of GOLO-CMSS was compared with other networks and three experienced radiologists. Main results. YOLOv5 outperformed other detection models with average precisions of 99.41%, 95.15%, 93.69% and 96.42% on the USTC, UDIAT, BUSI and TUH datasets, respectively. The proposed GOLO-CMSS showed superior segmentation performance over other state-of-the-art networks, with Dice similarity coefficients (DSCs) of 93.19%, 88.56%, 87.58% and 90.37% on the USTC, UDIAT, BUSI and TUH datasets, respectively. The mean DSC between GOLO-CMSS and each radiologist was significantly better than that between radiologists (p < 0.001). Significance. Our proposed method can accurately detect and segment breast masses with a performance comparable to radiologists, highlighting its great potential for clinical implementation in breast ultrasound examination.
Affiliation(s)
- Wanqing Li: Department of Engineering and Applied Physics, University of Science and Technology of China, Hefei, Anhui 230026, People's Republic of China
- Xianjun Ye: Department of Ultrasound Medicine, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230001, People's Republic of China
- Xuemin Chen: Health Management Center, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230001, People's Republic of China
- Xianxian Jiang: Graduate School of Bengbu Medical College, Bengbu, Anhui 233030, People's Republic of China
- Yidong Yang: Department of Engineering and Applied Physics, University of Science and Technology of China, Hefei, Anhui 230026, People's Republic of China; Ion Medical Research Institute, University of Science and Technology of China, Hefei, Anhui 230026, People's Republic of China; Department of Radiation Oncology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230001, People's Republic of China
11. Jiao C, Lao Y, Zhang W, Braunstein S, Salans M, Villanueva-Meyer J, Hervey-Jumper SL, Yang B, Morin O, Valdes G, Fan Z, Shiroishi M, Zada G, Sheng K, Yang W. Multi-modal fusion and feature enhancement U-Net coupling with stem cell niches proximity estimation for voxel-wise GBM recurrence prediction. Phys Med Biol 2024;69:155021. PMID: 39019073; DOI: 10.1088/1361-6560/ad64b8.
Abstract
Objective. We aim to develop a Multi-modal Fusion and Feature Enhancement U-Net (MFFE U-Net) coupled with stem cell niche proximity estimation to improve voxel-wise glioblastoma (GBM) recurrence prediction. Approach. Fifty-seven patients with pre- and post-surgery magnetic resonance (MR) scans were retrospectively solicited from 4 databases. Post-surgery MR scans included those acquired two months before the clinical diagnosis of recurrence and on the day of the radiologically confirmed recurrence. The recurrences were manually annotated on the T1ce. The high-risk recurrence region was first determined. Then, a sparse multi-modal feature fusion U-Net was developed. The 50 patients from 3 databases were divided into 70% training, 10% validation, and 20% testing. 7 patients from the 4th institution were used as external testing with transfer learning. Model performance was evaluated by recall, precision, F1-score, and Hausdorff distance at the 95th percentile (HD95). The proposed MFFE U-Net was compared to a support vector machine (SVM) model and two state-of-the-art neural networks. An ablation study was performed. Main results. The MFFE U-Net achieved a precision of 0.79 ± 0.08, a recall of 0.85 ± 0.11, and an F1-score of 0.82 ± 0.09. Statistically significant improvement was observed when comparing MFFE U-Net with the proximity estimation coupled SVM (SVMPE), mU-Net, and Deeplabv3. The HD95 was 2.75 ± 0.44 mm and 3.91 ± 0.83 mm for the 10 patients used in model construction and the 7 patients used for external testing, respectively. The ablation test showed that all five MR sequences contributed to the performance of the final model, with T1ce contributing the most. Convergence analysis, time efficiency analysis, and visualization of the intermediate results further revealed the characteristics of the proposed method. Significance. We present an advanced MFFE learning framework, MFFE U-Net, for effective voxel-wise GBM recurrence prediction. MFFE U-Net performs significantly better than the state-of-the-art networks and can potentially guide early RT intervention for disease recurrence.
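HD95, the 95th-percentile Hausdorff distance reported above, can be sketched as below. For compactness the sketch compares all foreground voxels rather than extracted surfaces and ignores voxel spacing, both of which a faithful millimetre-scale evaluation would need; it is not the authors' implementation.

```python
# 95th-percentile symmetric Hausdorff distance between two binary masks,
# computed on all foreground voxel coordinates (surface extraction omitted).
import numpy as np
from scipy.spatial.distance import cdist

def hd95(pred, gt):
    """pred, gt: non-empty binary masks; returns HD95 in voxel units."""
    p = np.argwhere(pred)                        # predicted foreground voxels
    g = np.argwhere(gt)                          # ground-truth foreground voxels
    d = cdist(p, g)                              # pairwise Euclidean distances
    forward = np.percentile(d.min(axis=1), 95)   # pred -> gt
    backward = np.percentile(d.min(axis=0), 95)  # gt -> pred
    return max(forward, backward)
```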
Affiliation(s)
- Changzhe Jiao: Department of Radiation Oncology, UC San Francisco, San Francisco, CA 94143, United States of America
- Yi Lao: Department of Radiation Oncology, UC Los Angeles, Los Angeles, CA 90095, United States of America
- Wenwen Zhang: Department of Radiation Oncology, UC San Francisco, San Francisco, CA 94143, United States of America
- Steve Braunstein: Department of Radiation Oncology, UC San Francisco, San Francisco, CA 94143, United States of America
- Mia Salans: Department of Radiation Oncology, UC San Francisco, San Francisco, CA 94143, United States of America
- Javier Villanueva-Meyer: Department of Radiology and Biomedical Imaging, UC San Francisco, San Francisco, CA 94143, United States of America
- Shawn L Hervey-Jumper: Department of Neurosurgery, UC San Francisco, San Francisco, CA 94143, United States of America
- Bo Yang: Department of Radiation Oncology, UC San Francisco, San Francisco, CA 94143, United States of America
- Olivier Morin: Department of Radiation Oncology, UC San Francisco, San Francisco, CA 94143, United States of America
- Gilmer Valdes: Department of Radiation Oncology, UC San Francisco, San Francisco, CA 94143, United States of America
- Zhaoyang Fan: Department of Radiology, University of Southern California, Los Angeles, CA 90033, United States of America
- Mark Shiroishi: Department of Radiology, University of Southern California, Los Angeles, CA 90033, United States of America
- Gabriel Zada: Department of Neurosurgery, University of Southern California, Los Angeles, CA 90033, United States of America
- Ke Sheng: Department of Radiation Oncology, UC San Francisco, San Francisco, CA 94143, United States of America
- Wensha Yang: Department of Radiation Oncology, UC San Francisco, San Francisco, CA 94143, United States of America
12. Zheng X, Yang Y, Li D, Deng Y, Xie Y, Yi Z, Ma L, Xu L. Precise Localization for Anatomo-Physiological Hallmarks of the Cervical Spine by Using Neural Memory Ordinary Differential Equation. Int J Neural Syst 2024:2450056. PMID: 39049777; DOI: 10.1142/s0129065724500564.
Abstract
In the evaluation of cervical spine disorders, precise positioning of anatomo-physiological hallmarks is fundamental for calculating diverse measurement metrics. Despite the fact that deep learning has achieved impressive results in the field of keypoint localization, there are still many limitations when facing medical images. First, these methods often encounter limitations when faced with the inherent variability in cervical spine datasets arising from imaging factors. Second, predicting keypoints covering only 4% of the entire X-ray image surface area poses a significant challenge. To tackle these issues, we propose a deep neural network architecture, NF-DEKR, specifically tailored for predicting keypoints in cervical spine physiological anatomy. Leveraging a neural memory ordinary differential equation, with its distinctive memory-learning separation and convergence to a single global attractor, our design effectively mitigates inherent data variability. Simultaneously, we introduce a Multi-Resolution Focus module to preprocess feature maps before they enter the disentangled regression branch and the heatmap branch. Employing a differentiated strategy for feature maps of varying scales, this approach yields more accurate predictions of densely localized keypoints. We construct a medical dataset, SCUSpineXray, comprising X-ray images annotated by orthopedic specialists, and conduct similar experiments on the publicly available UWSpineCT dataset. Experimental results demonstrate that, compared to the baseline DEKR network, our proposed method enhances average precision by 2% to 3%, accompanied by a marginal increase in model parameters and floating-point operations (FLOPs). The code (https://github.com/Zhxyi/NF-DEKR) is available.
Affiliation(s)
- Xi Zheng: Machine Intelligence Laboratory, College of Computer Science, Sichuan University, No. 24 South Section 1, Yihuan Road, Chengdu 610065, P. R. China
- Yi Yang: Department of Orthopedics, Orthopedic Research Institute, West China Hospital, Sichuan University, No. 37 Guo Xue Road, Chengdu 610041, P. R. China
- Dehan Li: Machine Intelligence Laboratory, College of Computer Science, Sichuan University, No. 24 South Section 1, Yihuan Road, Chengdu 610065, P. R. China
- Yi Deng: Department of Orthopedics, Orthopedic Research Institute, West China Hospital, Sichuan University, No. 37 Guo Xue Road, Chengdu 610041, P. R. China
- Yuexiong Xie: Machine Intelligence Laboratory, College of Computer Science, Sichuan University, No. 24 South Section 1, Yihuan Road, Chengdu 610065, P. R. China
- Zhang Yi: Machine Intelligence Laboratory, College of Computer Science, Sichuan University, No. 24 South Section 1, Yihuan Road, Chengdu 610065, P. R. China
- Litai Ma: Department of Orthopedics, Orthopedic Research Institute, West China Hospital, Sichuan University, No. 37 Guo Xue Road, Chengdu 610041, P. R. China
- Lei Xu: Machine Intelligence Laboratory, College of Computer Science, Sichuan University, No. 24 South Section 1, Yihuan Road, Chengdu 610065, P. R. China
13. Cho MJ, Hwang D, Yie SY, Lee JS. Multi-modal co-learning with attention mechanism for head and neck tumor segmentation on 18FDG PET-CT. EJNMMI Phys 2024;11:67. PMID: 39052194; DOI: 10.1186/s40658-024-00670-y.
Abstract
PURPOSE Effective radiation therapy requires accurate segmentation of head and neck cancer, one of the most common types of cancer. With the advancement of deep learning, various methods that use positron emission tomography-computed tomography (PET-CT) to obtain complementary information have been proposed. However, these approaches are computationally expensive because of the separation of feature extraction and fusion functions, and they do not make full use of the high sensitivity of PET. We propose a new deep learning-based approach to alleviate these challenges. METHODS We propose a tumor region attention module that fully exploits the high sensitivity of PET and design a network that learns the correlation between the PET and CT features using squeeze-and-excitation normalization (SE Norm) without separating the feature extraction and fusion functions. In addition, we introduce multi-scale context fusion, which exploits contextual information from different scales. RESULTS The HECKTOR challenge 2021 dataset was used for training and testing. The proposed model outperformed state-of-the-art models for medical image segmentation; in particular, the Dice similarity coefficient increased by 8.78% compared to U-Net. CONCLUSION The proposed network segmented the complex shape of the tumor better than state-of-the-art medical image segmentation methods, accurately distinguishing between tumor and non-tumor regions.
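SE Norm builds on squeeze-and-excitation channel attention. The sketch below is the generic SE block (squeeze by global average pooling, excitation by a two-layer bottleneck), shown only to illustrate the mechanism; the reduction ratio and 3D tensor layout are assumptions, and the paper's SE Norm layer is not reproduced.

```python
# Generic squeeze-and-excitation block for 3D feature maps.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):                      # x: (N, C, D, H, W)
        s = x.mean(dim=(2, 3, 4))              # squeeze: global average pooling
        s = torch.relu(self.fc1(s))
        s = torch.sigmoid(self.fc2(s))         # excitation: per-channel weights
        return x * s.view(*s.shape, 1, 1, 1)   # rescale the feature maps
```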
Affiliation(s)
- Min Jeong Cho: Interdisciplinary Program in Bioengineering, Seoul National University College of Engineering, Seoul, 03080, South Korea; Department of Nuclear Medicine, Seoul National University College of Medicine, 103 Daehak-ro, Jongno-gu, Seoul, 03080, South Korea; Integrated Major in Innovative Medical Science, Seoul National Graduate School, Seoul, South Korea
- Donghwi Hwang: Department of Nuclear Medicine, Seoul National University College of Medicine, 103 Daehak-ro, Jongno-gu, Seoul, 03080, South Korea; Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, 03080, South Korea
- Si Young Yie: Interdisciplinary Program in Bioengineering, Seoul National University College of Engineering, Seoul, 03080, South Korea; Department of Nuclear Medicine, Seoul National University College of Medicine, 103 Daehak-ro, Jongno-gu, Seoul, 03080, South Korea; Integrated Major in Innovative Medical Science, Seoul National Graduate School, Seoul, South Korea
- Jae Sung Lee: Interdisciplinary Program in Bioengineering, Seoul National University College of Engineering, Seoul, 03080, South Korea; Department of Nuclear Medicine, Seoul National University College of Medicine, 103 Daehak-ro, Jongno-gu, Seoul, 03080, South Korea; Integrated Major in Innovative Medical Science, Seoul National Graduate School, Seoul, South Korea; Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, 03080, South Korea; Brightonix Imaging Inc, Seoul, 04782, South Korea
14. Xu G, Jia W, Wu T, Chen L, Gao G. HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation. IEEE Transactions on Image Processing 2024;33:4202-4214. PMID: 39008382; DOI: 10.1109/tip.2024.3425048.
Abstract
Both Convolutional Neural Networks (CNNs) and Transformers have shown great success in semantic segmentation tasks. Efforts have been made to integrate CNNs with Transformer models to capture both local and global context interactions. However, there is still room for enhancement, particularly when considering constraints on computational resources. In this paper, we introduce HAFormer, a model that combines the hierarchical feature extraction ability of CNNs with the global dependency modeling capability of Transformers to tackle lightweight semantic segmentation challenges. Specifically, we design a Hierarchy-Aware Pixel-Excitation (HAPE) module for adaptive multi-scale local feature extraction. During global perception modeling, we devise an Efficient Transformer (ET) module that streamlines the quadratic calculations associated with traditional Transformers. Moreover, a correlation-weighted Fusion (cwF) module selectively merges diverse feature representations, significantly enhancing predictive accuracy. HAFormer achieves high performance with minimal computational overhead and a compact model size, achieving 74.2% mIoU on the Cityscapes and 71.1% mIoU on the CamVid test datasets, with frame rates of 105 FPS and 118 FPS, respectively, on a single 2080Ti GPU. The source codes are available at https://github.com/XU-GITHUB-curry/HAFormer.
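The mIoU figures quoted for Cityscapes and CamVid follow the standard confusion-matrix definition, sketched below for reference; ignore-label handling and the benchmarks' official evaluation details are omitted.

```python
# Mean intersection-over-union (mIoU) from integer label maps via a confusion matrix.
import numpy as np

def mean_iou(pred, gt, num_classes):
    """pred, gt: integer label maps of identical shape with values < num_classes."""
    cm = np.bincount(num_classes * gt.ravel() + pred.ravel(),
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    iou = inter / np.maximum(union, 1)
    return iou[union > 0].mean()   # average over classes present in the data
```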
15. Ke A, Luo J, Cai B. UNet-like network fused swin transformer and CNN for semantic image synthesis. Sci Rep 2024;14:16761. PMID: 39033170; DOI: 10.1038/s41598-024-65585-1.
Abstract
Semantic image synthesis has been dominated by approaches built on Convolutional Neural Networks (CNNs). Due to the limitations of local perception, their performance improvement seems to have plateaued in recent years. To tackle this issue, we propose the SC-UNet model, a UNet-like network that fuses the Swin Transformer and CNN for semantic image synthesis. Photorealistic image synthesis conditioned on a given semantic layout depends on both high-level semantics and low-level positions. To improve the synthesis performance, we design a novel conditional residual fusion module for the model decoder to efficiently fuse the hierarchical feature maps extracted at different scales. Moreover, this module combines an opposition-based learning mechanism and a weight assignment mechanism for enhancing and attending to semantic information. Compared to pure CNN-based models, our SC-UNet combines local and global perception to better extract high- and low-level features and better fuse multi-scale features. We have conducted extensive comparison experiments, both quantitative and qualitative, to validate the effectiveness of the proposed SC-UNet model for semantic image synthesis. The outcomes illustrate that SC-UNet distinctly outperforms state-of-the-art models on three benchmark datasets (Cityscapes, ADE20K, and COCO-Stuff) containing numerous real-scene images.
Affiliation(s)
- Aihua Ke: School of Cyber Science and Engineering, Wuhan University, Wuhan, 430072, China; Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, Wuhan, 430072, China
- Jian Luo: School of Cyber Science and Engineering, Wuhan University, Wuhan, 430072, China; Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, Wuhan, 430072, China
- Bo Cai: School of Cyber Science and Engineering, Wuhan University, Wuhan, 430072, China; Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, Wuhan, 430072, China
16. Oluigbo D, Mathai TS, Santra B, Mukherjee P, Liu J, Jha A, Patel M, Pacak K, Summers RM. Weakly supervised detection of pheochromocytomas and paragangliomas in CT using noisy data. Comput Med Imaging Graph 2024;116:102419. PMID: 39053035; DOI: 10.1016/j.compmedimag.2024.102419.
Abstract
Pheochromocytomas and paragangliomas (PPGLs) are rare adrenal and extra-adrenal tumors that have metastatic potential. Management of patients with PPGLs mainly depends on the makeup of their genetic cluster: SDHx, VHL/EPAS1, kinase, and sporadic. CT is the preferred modality for precise localization of PPGLs, such that their metastatic progression can be assessed. However, the variable size, morphology, and appearance of these tumors in different anatomical regions can pose challenges for radiologists. Since radiologists must routinely track changes across patient visits, manual annotation of PPGLs is quite time-consuming and cumbersome to do across all axial slices in a CT volume. As such, PPGLs are only weakly annotated on axial slices by radiologists in the form of RECIST measurements. To reduce the manual effort spent by radiologists, we propose a method for the automated detection of PPGLs in CT via a proxy segmentation task. Weak 3D annotations (derived from 2D bounding boxes) were used to train both 2D and 3D nnUNet models to detect PPGLs via segmentation. We evaluated our approaches on an in-house dataset comprising chest-abdomen-pelvis CTs of 255 patients with confirmed PPGLs. On a test set of 53 CT volumes, our 3D nnUNet model achieved a detection precision of 70% and sensitivity of 64.1%, and outperformed the 2D model, which obtained a precision of 52.7% and sensitivity of 27.5% (p < 0.05). The SDHx and sporadic genetic clusters achieved the highest precisions of 73.1% and 72.7%, respectively. Our state-of-the-art findings highlight the promising nature of the challenging task of automated PPGL detection.
Affiliation(s)
- David Oluigbo: Clinical Center, National Institutes of Health, Bethesda, MD, USA
- Bikash Santra: Clinical Center, National Institutes of Health, Bethesda, MD, USA
- Pritam Mukherjee: Clinical Center, National Institutes of Health, Bethesda, MD, USA
- Jianfei Liu: Clinical Center, National Institutes of Health, Bethesda, MD, USA
- Abhishek Jha: National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- Mayank Patel: National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- Karel Pacak: National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- Ronald M Summers: Clinical Center, National Institutes of Health, Bethesda, MD, USA
17. Kwon J, Kim J, Park H. Leveraging segmentation-guided spatial feature embedding for overall survival prediction in glioblastoma with multimodal magnetic resonance imaging. Computer Methods and Programs in Biomedicine 2024;255:108338. PMID: 39042996; DOI: 10.1016/j.cmpb.2024.108338.
Abstract
BACKGROUND AND OBJECTIVE Patients with glioblastoma have a five-year relative survival rate of less than 5%. Thus, accurately predicting the overall survival (OS) of patients with glioblastoma is crucial for effective treatment planning. METHODS To fully leverage the imaging characteristics of glioblastomas, we propose a segmentation-guided regression method for predicting the OS of patients with brain tumors using multimodal magnetic resonance imaging. Specifically, a brain tumor segmentation network was first pre-trained without leveraging survival information. Subsequently, the survival regression network was jointly trained with the guidance of brain tumor segmentation, focusing on tumor voxels and suppressing irrelevant backgrounds. RESULTS Our proposed framework, based on the well-known UNETR++ backbone, achieved a Dice score of 0.7910, a Spearman correlation of 0.4112, and a Harrell's concordance index of 0.6488. The model consistently showed promising results compared with baseline methods on two different datasets (BraTS and UCSF-PDGM). Furthermore, ablation studies on our training configurations demonstrated that both the pre-trained segmentation network and the contrastive loss significantly improved all metrics for OS prediction. CONCLUSIONS In this study, we propose a joint learning framework based on a pre-trained segmentation backbone for OS prediction by leveraging a brain tumor segmentation map. By utilizing a spatial feature map, our model can operate in a sliding-window fashion, which accommodates varying matrix sizes and resolutions of the input images.
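Harrell's concordance index reported in the results can be computed naively as in the sketch below (O(n^2), ties in survival time ignored); this is an illustrative metric implementation, not the study's code.

```python
# Naive Harrell's concordance index for right-censored survival data.
import numpy as np

def concordance_index(time, event, risk):
    """time: survival times; event: 1 if death observed, 0 if censored; risk: predicted risk scores."""
    concordant, permissible = 0.0, 0.0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when the shorter time corresponds to an observed event.
            if time[i] < time[j] and event[i] == 1:
                permissible += 1
                if risk[i] > risk[j]:      # higher risk for the shorter survivor
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / permissible
```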
Affiliation(s)
- Junmo Kwon: Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, 16419, South Korea; Center for Neuroscience Imaging Research, Institute for Basic Science, Suwon, 16419, South Korea
- Jonghun Kim: Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, 16419, South Korea
- Hyunjin Park: Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, 16419, South Korea; Center for Neuroscience Imaging Research, Institute for Basic Science, Suwon, 16419, South Korea
18. Naeeni Davarani M, Arian Darestani A, Guillen Cañas V, Azimi H, Havadaragh SH, Hashemi H, Harirchian MH. Efficient segmentation of active and inactive plaques in FLAIR-images using DeepLabV3Plus SE with efficientnetb0 backbone in multiple sclerosis. Sci Rep 2024;14:16304. PMID: 39009636; PMCID: PMC11251059; DOI: 10.1038/s41598-024-67130-6.
Abstract
This research paper introduces an efficient approach for the segmentation of active and inactive plaques within fluid-attenuated inversion recovery (FLAIR) images in multiple sclerosis (MS), employing a convolutional neural network (CNN) model known as DeepLabV3Plus SE with the EfficientNetB0 backbone, and demonstrates its superior performance compared to other CNN architectures. The study encompasses various critical components, including dataset pre-processing techniques, the utilization of the Squeeze and Excitation Network (SE-Block), and the atrous spatial separable pyramid block to enhance segmentation capabilities. Detailed descriptions of pre-processing procedures, such as removing the cranial bone segment, image resizing, and normalization, are provided. This study analyzed a cross-sectional cohort of 100 MS patients with active brain plaques, examining 5000 MRI slices. After filtering, 1500 slices were utilized for labeling and deep learning. The training process adopts the Dice coefficient as the loss function and utilizes Adam optimization. The study evaluated the model's performance using multiple metrics, including intersection over union (IoU), Dice score, precision, recall, and F1-score, and offers a comparative analysis with other CNN architectures. Results demonstrate the superior segmentation ability of the proposed model, as evidenced by an IoU of 69.87, Dice score of 76.24, precision of 88.89, recall of 73.52, and F1-score of 80.47 for the DeepLabV3+SE_EfficientNetB0 model. This research contributes to the advancement of plaque segmentation in FLAIR images and offers a compelling approach with substantial potential for medical image analysis and diagnosis.
Affiliation(s)
- Hossein Azimi: Faculty of Mathematical Sciences and Computer, Kharazmi University, Tehran, Iran
- Sanaz Heydari Havadaragh: Neurology Department, Imam Khomeini Hospital, Tehran University of Medical Sciences, Tehran, Iran
- Hasan Hashemi: Department of Radiology, School of Medicine, Tehran University of Medical Sciences (TUMS), Tehran, Iran
- Mohammad Hossein Harirchian: Iranian Center of Neurological Research, Neuroscience Institute, Tehran University of Medical Sciences, Tehran, Iran
19. Wang MT, Cai YR, Jang V, Meng HJ, Sun LB, Deng LM, Liu YW, Zou WJ. Establishment of a corneal ulcer prognostic model based on machine learning. Sci Rep 2024;14:16154. PMID: 38997339; PMCID: PMC11245505; DOI: 10.1038/s41598-024-66608-7.
Abstract
Corneal infection is a major public health concern worldwide and the most common cause of unilateral corneal blindness. Toxic effects of different microorganisms, such as bacteria and fungi, worsen keratitis, leading to corneal perforation even with optimal drug treatment. The cornea forms the main refractive surface of the eye. Diseases affecting the cornea can cause severe visual impairment. Therefore, it is crucial to analyze the risk of corneal perforation and visual impairment in corneal ulcer patients for making early treatment strategies. The modeling of a fully automated prognostic model system was performed in two parts. In the first part, the dataset contained 4973 slit lamp images of corneal ulcer patients from three centers. A deep learning model was developed and tested for segmenting and classifying five lesions (corneal ulcer, corneal scar, hypopyon, corneal descemetocele, and corneal neovascularization) in the eyes of corneal ulcer patients. Further, hierarchical quantification was carried out based on policy rules. In the second part, the dataset included clinical data (name, gender, age, best corrected visual acuity, and type of corneal ulcer) of 240 patients with corneal ulcers and the corresponding 1010 slit lamp images under two light sources (natural light and cobalt blue light). The slit lamp images were then quantified hierarchically according to the policy rules developed in the first part of the modeling. Combining the above clinical data, the features were used to build the final prognostic model system for corneal ulcer perforation outcome and visual impairment using machine learning algorithms such as XGBoost and LightGBM. The area under the ROC curve (AUC) was used to evaluate model performance. For segmentation of the five lesions, the accuracy rates for hypopyon, descemetocele, corneal ulcer under blue light, and corneal neovascularization were 96.86, 91.64, 90.51, and 93.97, respectively. For corneal scar lesion classification, the accuracy rate of the final model was 69.76. The XGBoost model performed best in predicting the 1-month prognosis of patients, with an AUC of 0.81 (95% CI 0.63-1.00) for ulcer perforation and an AUC of 0.77 (95% CI 0.63-0.91) for visual impairment. In predicting the 3-month prognosis of patients, the XGBoost model achieved the best AUC of 0.97 (95% CI 0.92-1.00) for ulcer perforation, while the LightGBM model achieved the best performance with an AUC of 0.98 (95% CI 0.94-1.00) for visual impairment.
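The prognostic modelling step, gradient-boosted classifiers scored by AUC, can be sketched as below. The train/test split, hyperparameters, and feature matrix layout are placeholder assumptions rather than the study's configuration.

```python
# Fit a gradient-boosted classifier on per-patient features and report ROC AUC.
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def fit_prognostic_model(X, y):
    """X: per-patient features (lesion quantification + clinical data); y: binary outcome."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)
    model = xgb.XGBClassifier(n_estimators=200, max_depth=4,
                              learning_rate=0.05, eval_metric='logloss')
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    return model, auc
```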
Collapse
Affiliation(s)
- Meng-Tong Wang
- Department of Ophthalmology, The First Affiliated Hospital of Guangxi Medical University, 22 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region, China
| | - You-Ran Cai
- Department of Ophthalmology, The First Affiliated Hospital of Guangxi Medical University, 22 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region, China
| | - Vlon Jang
- Qi Dian Fu Liu Technology Co., Ltd., Beijing, China
| | - Hong-Jian Meng
- Department of Ophthalmology, The First Affiliated Hospital of Guangxi University of Chinese Medicine, Nanning, China
| | - Ling-Bo Sun
- Department of Ophthalmology, Ruikang Hospital Affiliated to Guangxi University of Chinese Medicine, Nanning, China
| | - Li-Min Deng
- Department of Ophthalmology, Guangxi Zhuang Autonomous Region People's Hospital, Nanning, China
| | - Yu-Wen Liu
- School of Medicine, Eye Institute of Xiamen University, Xiamen University, Xiamen, Fujian, China
| | - Wen-Jin Zou
- Department of Ophthalmology, The First Affiliated Hospital of Guangxi Medical University, 22 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region, China.
| |
Collapse
|
20
|
Qu S, Cui C, Duan J, Lu Y, Pang Z. Underwater small target detection under YOLOv8-LA model. Sci Rep 2024; 14:16108. [PMID: 38997415 PMCID: PMC11245550 DOI: 10.1038/s41598-024-66950-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2024] [Accepted: 07/05/2024] [Indexed: 07/14/2024] Open
Abstract
In the realm of marine environmental engineering, the swift and accurate detection of underwater targets is of considerable significance. Recently, methods based on convolutional neural networks (CNNs) have been applied to enhance the detection of such targets. However, deep neural networks usually require a large number of parameters, resulting in slow processing speed, and existing methods struggle to detect small and densely arranged underwater targets accurately. To address these issues, we propose a new neural network model, YOLOv8-LA, for improving the detection performance of underwater targets. First, we design a Lightweight Efficient Partial Convolution (LEPC) module to optimize spatial feature extraction by selectively processing input channels, improving efficiency and significantly reducing redundant computation and storage requirements. Second, we develop the AP-FasterNet architecture for the small targets that are commonly found in underwater datasets; by integrating depth-separable convolutions with different expansion rates into FasterNet, AP-FasterNet enhances the model's ability to capture detailed features of small targets. Finally, we integrate the lightweight and efficient content-aware reassembly of features (CARAFE) up-sampling operation into YOLOv8 to enhance model performance by aggregating contextual information over a large receptive field and mitigating information loss during up-sampling. Evaluation results on the URPC2021 dataset show that the YOLOv8-LA model achieves 84.7% mean average precision (mAP) on a single Nvidia GeForce RTX 3090 and operates at 189.3 frames per second (FPS), outperforming existing state-of-the-art methods. This result demonstrates the model's ability to ensure high detection accuracy while maintaining real-time processing capabilities.
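The AP-FasterNet description above centers on depth-separable convolutions with different expansion (dilation) rates. As an illustration only, and not the authors' implementation, a PyTorch sketch of such a multi-rate depthwise-separable block might look like this; channel counts and rates are assumptions.

```python
import torch
import torch.nn as nn

class DilatedDepthwiseSeparableConv(nn.Module):
    """Depthwise-separable 3x3 convolution with a configurable dilation rate."""
    def __init__(self, in_ch, out_ch, dilation=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=dilation, dilation=dilation, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class MultiRateBlock(nn.Module):
    """Parallel branches with different dilation rates, fused by summation."""
    def __init__(self, channels, rates=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            DilatedDepthwiseSeparableConv(channels, channels, r) for r in rates)

    def forward(self, x):
        return sum(branch(x) for branch in self.branches)

x = torch.randn(1, 32, 80, 80)
print(MultiRateBlock(32)(x).shape)  # torch.Size([1, 32, 80, 80])
```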
Collapse
Affiliation(s)
- Shenming Qu
- School of Software, Henan University, Kaifeng, 475004, Henan, China
| | - Can Cui
- School of Software, Henan University, Kaifeng, 475004, Henan, China
| | - Jiale Duan
- School of Software, Henan University, Kaifeng, 475004, Henan, China
| | - Yongyong Lu
- School of Software, Henan University, Kaifeng, 475004, Henan, China
| | - Zilong Pang
- School of Software, Henan University, Kaifeng, 475004, Henan, China.
| |
Collapse
|
21
|
Liu S, Lin Y, Liu D. FreqSNet: a multiaxial integration of frequency and spatial domains for medical image segmentation. Phys Med Biol 2024; 69:145011. [PMID: 38959911 DOI: 10.1088/1361-6560/ad5ef3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2023] [Accepted: 07/03/2024] [Indexed: 07/05/2024]
Abstract
Objective. In recent years, convolutional neural networks, which typically focus on extracting spatial-domain features, have shown limitations in learning global contextual information. The frequency domain, however, can offer a global perspective that spatial-domain methods often struggle to capture. To address this limitation, we propose FreqSNet, which leverages both frequency and spatial features for medical image segmentation. Approach. First, we propose a frequency-space representation aggregation block (FSRAB) to replace conventional convolutions. FSRAB contains three frequency-domain branches to capture global frequency information along different axial combinations, while a convolutional branch is designed to exchange information across channels in local spatial features. Secondly, the multiplex expansion attention block extracts long-range dependency information using dilated convolutional blocks while suppressing irrelevant information via attention mechanisms. Finally, the introduced Feature Integration Block enhances feature representation by integrating semantic features that fuse spatial and channel positional information. Main results. We validated our method on five public datasets: BUSI, CVC-ClinicDB, CVC-ColonDB, ISIC-2018, and Luna16. On these datasets, our method achieved Intersection over Union (IoU) scores of 75.46%, 87.81%, 79.08%, 84.04%, and 96.99%, and Hausdorff distance values of 22.22 mm, 13.20 mm, 13.08 mm, 13.51 mm, and 5.22 mm, respectively. Compared to other state-of-the-art methods, FreqSNet achieves better segmentation results. Significance. Our method effectively combines frequency-domain information with spatial-domain features, enhancing segmentation performance and generalization capability in medical image segmentation tasks.
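As a rough illustration of the frequency/spatial split described for FSRAB, and not the authors' implementation, the sketch below pairs an FFT-based global branch with a local convolutional branch; the channel sizes and the fusion step are assumptions.

```python
import torch
import torch.nn as nn

class FourierSpatialBlock(nn.Module):
    """Illustrative block: a frequency-domain branch (global context via 2D FFT)
    combined with a spatial convolution branch, fused by a 1x1 convolution."""
    def __init__(self, channels):
        super().__init__()
        self.freq_conv = nn.Conv2d(channels * 2, channels * 2, kernel_size=1)
        self.spatial_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # Global processing in the frequency domain (real/imaginary parts stacked as channels).
        freq = torch.fft.rfft2(x, norm="ortho")
        freq = torch.cat([freq.real, freq.imag], dim=1)
        freq = self.freq_conv(freq)
        real, imag = freq.chunk(2, dim=1)
        freq = torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")
        # Local processing in the spatial domain.
        spatial = self.spatial_conv(x)
        return self.fuse(torch.cat([freq, spatial], dim=1))

x = torch.randn(2, 16, 64, 64)
print(FourierSpatialBlock(16)(x).shape)  # torch.Size([2, 16, 64, 64])
```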
Collapse
Affiliation(s)
- Shangwang Liu
- The School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, People's Republic of China
- Engineering Lab of Intelligence Business and Internet of Things, Henan Normal University, Xinxiang 453007, People's Republic of China
| | - Yinghai Lin
- The School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, People's Republic of China
- Engineering Lab of Intelligence Business and Internet of Things, Henan Normal University, Xinxiang 453007, People's Republic of China
| | - Danyang Liu
- The School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, People's Republic of China
- Engineering Lab of Intelligence Business and Internet of Things, Henan Normal University, Xinxiang 453007, People's Republic of China
| |
Collapse
|
22
|
Xie W, Lin W, Li P, Lai H, Wang Z, Liu P, Huang Y, Liu Y, Tang L, Lyu G. Developing a deep learning model for predicting ovarian cancer in Ovarian-Adnexal Reporting and Data System Ultrasound (O-RADS US) Category 4 lesions: A multicenter study. J Cancer Res Clin Oncol 2024; 150:346. [PMID: 38981916 PMCID: PMC11233367 DOI: 10.1007/s00432-024-05872-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2024] [Accepted: 06/27/2024] [Indexed: 07/11/2024]
Abstract
PURPOSE To develop a deep learning (DL) model for differentiating between benign and malignant ovarian tumors in Ovarian-Adnexal Reporting and Data System Ultrasound (O-RADS US) Category 4 lesions, and to validate its diagnostic performance. METHODS We retrospectively analyzed 1619 US images obtained from three centers between December 2014 and March 2023. DeepLabV3 and YOLOv8 were jointly used to segment, classify, and detect ovarian tumors. Precision, recall, and the area under the receiver operating characteristic curve (AUC) were employed to assess model performance. RESULTS A total of 519 patients (269 benign and 250 malignant masses) were enrolled in the study. The numbers of women included in the training, validation, and test cohorts were 426, 46, and 47, respectively. The detection models exhibited an average precision of 98.68% (95% CI: 0.95-0.99) for benign masses and 96.23% (95% CI: 0.92-0.98) for malignant masses. The AUC was 0.96 (95% CI: 0.94-0.97) in the training set, 0.93 (95% CI: 0.89-0.94) in the validation set, and 0.95 (95% CI: 0.91-0.96) in the test set. The sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 0.943, 0.957, 0.951, 0.966, and 0.936 for the training set; 0.905, 0.935, 0.935, 0.919, and 0.931 for the validation set; and 0.925, 0.955, 0.941, 0.956, and 0.927 for the test set, respectively. CONCLUSION The constructed DL model exhibited high diagnostic performance in distinguishing benign and malignant ovarian tumors in O-RADS US Category 4 lesions.
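For reference, the reported classification metrics can be reproduced from binary predictions as in the following sketch; the arrays are toy placeholders, not study data.

```python
# Minimal sketch: sensitivity, specificity, accuracy, PPV, NPV, and AUC from binary predictions.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3])
y_pred = (y_prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("accuracy:   ", (tp + tn) / (tp + tn + fp + fn))
print("PPV:        ", tp / (tp + fp))
print("NPV:        ", tn / (tn + fn))
print("AUC:        ", roc_auc_score(y_true, y_prob))
```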
Collapse
Affiliation(s)
- Wenting Xie
- Department of Ultrasound Medicine, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, 362000, China
- Department of Ultrasound, Fujian Cancer Hospital, Clinical Oncology School of Fujian Medical University, Fuzhou, Fujian Province, 350014, China
| | - Wenjie Lin
- Department of Ultrasound Medicine, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, 362000, China
| | - Ping Li
- Department of Gynecology and Obstetrics, Quanzhou First Hospital Affiliated to Fujian Medical University, Quanzhou, Fujian, 362000, China
| | - Hongwei Lai
- Department of Ultrasound, Fujian Provincial Maternity and Children's Hospital, Fuzhou, Fujian Province, 350014, China
| | - Zhilan Wang
- Department of Ultrasound, Nanping First Hospital Affiliated to Fujian Medical University, Nanping, Fujian Province, 35300, China
| | - Peizhong Liu
- School of Medicine, Huaqiao University, Quanzhou, Fujian Province, 362000, China
| | - Yijun Huang
- Department of Ultrasound, Fujian Cancer Hospital, Clinical Oncology School of Fujian Medical University, Fuzhou, Fujian Province, 350014, China
| | - Yao Liu
- Quanzhou Bolang Technology Group Co., Ltd, Quanzhou, Fujian Province, 362000, China.
| | - Lina Tang
- Department of Ultrasound, Fujian Cancer Hospital, Clinical Oncology School of Fujian Medical University, Fuzhou, Fujian Province, 350014, China.
| | - Guorong Lyu
- Department of Ultrasound Medicine, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, 362000, China.
| |
Collapse
|
23
|
Pham TV, Vu TN, Le HMQ, Pham VT, Tran TT. CapNet: An Automatic Attention-Based with Mixer Model for Cardiovascular Magnetic Resonance Image Segmentation. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024:10.1007/s10278-024-01191-x. [PMID: 38980628 DOI: 10.1007/s10278-024-01191-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/24/2023] [Revised: 05/21/2024] [Accepted: 05/22/2024] [Indexed: 07/10/2024]
Abstract
Deep neural networks have shown excellent performance in medical image segmentation, especially for cardiac images. Transformer-based models, though advantageous over convolutional neural networks because of their ability to learn long-range dependencies, still have shortcomings such as a large number of parameters and high computational cost. Additionally, for better results they are often pretrained on larger datasets, which requires large memory and increases resource expenses. In this study, we propose a new lightweight but efficient model, namely CapNet, based on convolutions and mixing modules for cardiac segmentation from magnetic resonance images (MRI), which can be trained from scratch with a small number of parameters. To handle the varying sizes and shapes that often occur across cardiac systolic and diastolic phases, we propose attention modules for pooling, spatial, and channel information. We also propose a novel loss, the Tversky Shape Power Distance function, based on the shape dissimilarity between labels and predictions, which shows promising performance compared to other losses. Experiments on three public datasets, the ACDC benchmark, the Sunnybrook data, and the MS-CMR challenge, are conducted and compared with other state-of-the-art (SOTA) methods. For binary segmentation, the proposed CapNet obtained Dice similarity coefficients (DSC) of 94% and 95.93% for the endocardium and epicardium regions, respectively, on the Sunnybrook dataset, and 94.49% for the endocardium and 96.82% for the epicardium on the ACDC data. For the multiclass case, the average DSC by CapNet is 93.05% for the ACDC data, and the DSC scores for the MS-CMR are 94.59%, 92.22%, and 93.99% for the bSSFP, T2-SPAIR, and LGE sequences, respectively. Moreover, statistical significance tests (p-value < 0.05) against transformer-based methods and several CNN-based approaches demonstrated that CapNet's improvements, though achieved with fewer training parameters, are statistically significant. The promising evaluation metrics show competitive results in both Dice and IoU indices compared to SOTA CNN-based and Transformer-based architectures.
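As context for the loss design, the sketch below implements only a plain Tversky loss for binary segmentation; the paper's Tversky Shape Power Distance adds a shape-dissimilarity term that is not reproduced here, so this is an assumed baseline form rather than the authors' loss.

```python
import torch

def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-6):
    """Plain Tversky loss. pred: probabilities in [0, 1]; target: binary mask, same shape."""
    pred, target = pred.flatten(), target.flatten()
    tp = (pred * target).sum()                 # true positives (soft)
    fp = (pred * (1 - target)).sum()           # false positives (soft)
    fn = ((1 - pred) * target).sum()           # false negatives (soft)
    tversky = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - tversky

pred = torch.sigmoid(torch.randn(1, 1, 64, 64))
target = (torch.rand(1, 1, 64, 64) > 0.5).float()
print(tversky_loss(pred, target))
```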
Collapse
Affiliation(s)
- Tien Viet Pham
- Department of Automation Engineering, School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam
| | - Tu Ngoc Vu
- Department of Automation Engineering, School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam
| | - Hoang-Minh-Quang Le
- Department of Automation Engineering, School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam
| | - Van-Truong Pham
- Department of Automation Engineering, School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam
| | - Thi-Thao Tran
- Department of Automation Engineering, School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam.
| |
Collapse
|
24
|
Wang YRJ, Wang P, Yan Z, Zhou Q, Gunturkun F, Li P, Hu Y, Wu WE, Zhao K, Zhang M, Lv H, Fu L, Jin J, Du Q, Wang H, Chen K, Qu L, Lin K, Iv M, Wang H, Sun X, Vogel H, Han S, Tian L, Wu F, Gong J. Advancing presurgical non-invasive molecular subgroup prediction in medulloblastoma using artificial intelligence and MRI signatures. Cancer Cell 2024; 42:1239-1257.e7. [PMID: 38942025 DOI: 10.1016/j.ccell.2024.06.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/18/2023] [Revised: 04/25/2024] [Accepted: 06/05/2024] [Indexed: 06/30/2024]
Abstract
Global investigation of medulloblastoma has been hindered by the widespread inaccessibility of molecular subgroup testing and paucity of data. To bridge this gap, we established an international molecularly characterized database encompassing 934 medulloblastoma patients from thirteen centers across China and the United States. We demonstrate how image-based machine learning strategies have the potential to create an alternative pathway for non-invasive, presurgical, and low-cost molecular subgroup prediction in the clinical management of medulloblastoma. Our robust validation strategies-including cross-validation, external validation, and consecutive validation-demonstrate the model's efficacy as a generalizable molecular diagnosis classifier. The detailed analysis of MRI characteristics replenishes the understanding of medulloblastoma through a nuanced radiographic lens. Additionally, comparisons between East Asia and North America subsets highlight critical management implications. We made this comprehensive dataset, which includes MRI signatures, clinicopathological features, treatment variables, and survival data, publicly available to advance global medulloblastoma research.
Collapse
Affiliation(s)
- Yan-Ran Joyce Wang
- Anhui Province Key Laboratory of Biomedical Imaging and Intelligent Processing, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China; School of Medicine, Stanford University, Stanford, CA 94304, USA.
| | - Pengcheng Wang
- Department of Biomedical Engineering, University of Southern California, Los Angeles, CA 90089, USA
| | - Zihan Yan
- Department of Pediatric Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing Neurosurgical Institute, Beijing 100070, China
| | - Quan Zhou
- School of Medicine, Stanford University, Stanford, CA 94304, USA; Department of Neurosurgery, Stanford School of Medicine, Stanford University, Stanford, CA 94304, USA
| | - Fatma Gunturkun
- School of Medicine, Stanford University, Stanford, CA 94304, USA; Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA 94304, USA
| | - Peng Li
- Anhui Province Key Laboratory of Biomedical Imaging and Intelligent Processing, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China; School of Engineering, University of Science and Technology of China, Hefei 230001, China
| | - Yanshen Hu
- School of Engineering, University of Science and Technology of China, Hefei 230001, China
| | - Wei Emma Wu
- School of Medicine, Stanford University, Stanford, CA 94304, USA; Department of Radiology Oncology, Stanford University, Stanford, CA 94305, USA
| | - Kankan Zhao
- Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Michael Zhang
- School of Medicine, Stanford University, Stanford, CA 94304, USA; Department of Neurosurgery, Stanford School of Medicine, Stanford University, Stanford, CA 94304, USA
| | - Haoyi Lv
- School of Engineering, University of Science and Technology of China, Hefei 230001, China
| | - Lehao Fu
- School of Engineering, University of Science and Technology of China, Hefei 230001, China
| | - Jiajie Jin
- School of Engineering, University of Science and Technology of China, Hefei 230001, China
| | - Qing Du
- Anhui Province Key Laboratory of Biomedical Imaging and Intelligent Processing, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China
| | - Haoyu Wang
- School of Engineering, University of Science and Technology of China, Hefei 230001, China
| | - Kun Chen
- The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China
| | - Liangqiong Qu
- The Department of Statistics and Actuarial Science and the Institute of Data Science, The University of Hong Kong, Hong Kong 999077, China
| | - Keldon Lin
- Mayo Clinic Alix School of Medicine, Scottsdale, AZ 85054, USA
| | - Michael Iv
- School of Medicine, Stanford University, Stanford, CA 94304, USA; Department of Neurosurgery, Stanford School of Medicine, Stanford University, Stanford, CA 94304, USA
| | - Hao Wang
- Anhui Province Key Laboratory of Biomedical Imaging and Intelligent Processing, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China; MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
| | - Xiaoyan Sun
- Anhui Province Key Laboratory of Biomedical Imaging and Intelligent Processing, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China; School of Engineering, University of Science and Technology of China, Hefei 230001, China
| | - Hannes Vogel
- School of Medicine, Stanford University, Stanford, CA 94304, USA; Department of Pathology, Stanford School of Medicine, Stanford University, Stanford, CA 94304, USA
| | - Summer Han
- School of Medicine, Stanford University, Stanford, CA 94304, USA; Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA 94304, USA
| | - Lu Tian
- School of Medicine, Stanford University, Stanford, CA 94304, USA; Department of Statistics, Stanford School of Medicine, Stanford University, Stanford, CA 94304, USA
| | - Feng Wu
- School of Engineering, University of Science and Technology of China, Hefei 230001, China
| | - Jian Gong
- Department of Pediatric Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing Neurosurgical Institute, Beijing 100070, China.
| |
Collapse
|
25
|
Qu W, Li X, Jin X. Knowledge enhanced bottom-up affordance grounding for robotic interaction. PeerJ Comput Sci 2024; 10:e2097. [PMID: 38983207 PMCID: PMC11232630 DOI: 10.7717/peerj-cs.2097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2024] [Accepted: 05/13/2024] [Indexed: 07/11/2024]
Abstract
With the rapid advancement of robotics technology, an increasing number of researchers are exploring the use of natural language as a communication channel between humans and robots. In language-conditioned manipulation grounding scenarios, prevailing methods rely heavily on supervised multimodal deep learning. In this paradigm, robots assimilate knowledge from both language instructions and visual input. However, these approaches lack external knowledge for comprehending natural language instructions and are hindered by the substantial demand for paired data, where vision and language are usually linked through manual annotation to create realistic datasets. To address these problems, we propose the knowledge-enhanced bottom-up affordance grounding network (KBAG-Net), which enhances natural language understanding through external knowledge, improving accuracy in object grasping affordance segmentation. In addition, we introduce a semi-automatic data generation method aimed at facilitating the quick establishment of language-following manipulation grounding datasets. Experimental results on two standard datasets demonstrate that our method outperforms existing methods when external knowledge is incorporated. Specifically, our method outperforms the two-stage method by 12.98% and 1.22% mIoU on the two datasets, respectively. For broader community engagement, we will make the semi-automatic data construction method publicly available at https://github.com/wmqu/Automated-Dataset-Construction4LGM.
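Since the comparison above is reported in mIoU, a small sketch of that metric computed from integer label maps is shown below; the inputs are toy placeholders.

```python
# Minimal sketch: mean Intersection-over-Union over the classes present in either map.
import numpy as np

def mean_iou(pred, target, num_classes):
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.random.randint(0, 3, size=(64, 64))
target = np.random.randint(0, 3, size=(64, 64))
print(mean_iou(pred, target, num_classes=3))
```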
Collapse
Affiliation(s)
- Wen Qu
- Computer Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
| | - Xiao Li
- Computer Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
| | - Xiao Jin
- Computer Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
| |
Collapse
|
26
|
Colleoni E, Sanchez Matilla R, Luengo I, Stoyanov D. Guided image generation for improved surgical image segmentation. Med Image Anal 2024; 97:103263. [PMID: 39013205 DOI: 10.1016/j.media.2024.103263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2023] [Revised: 05/30/2024] [Accepted: 06/27/2024] [Indexed: 07/18/2024]
Abstract
The lack of large datasets and high-quality annotated data often limits the development of accurate and robust machine-learning models within the medical and surgical domains. In the machine learning community, generative models have recently demonstrated that it is possible to produce novel and diverse synthetic images that closely resemble reality while controlling their content with various types of annotations. However, generative models have not yet been fully explored in the surgical domain, partially due to the lack of large datasets and to specific challenges of the surgical domain such as its large anatomical diversity. We propose Surgery-GAN, a novel generative model that produces synthetic images from segmentation maps. Our architecture produces surgical images with improved quality compared to earlier generative models thanks to the combination of channel- and pixel-level normalization layers that boost image quality while granting adherence to the input segmentation map. While state-of-the-art generative models often generate overfitted images that lack diversity or contain unrealistic artefacts such as cartooning, our experiments demonstrate that Surgery-GAN is able to generate novel, realistic, and diverse surgical images on three different surgical datasets: cholecystectomy, partial nephrectomy, and radical prostatectomy. In addition, we investigate whether synthetic images used together with real ones can improve the performance of other machine-learning models. Specifically, we use Surgery-GAN to generate large synthetic datasets which we then use to train five different segmentation models. Results demonstrate that using our synthetic images always improves the mean segmentation performance with respect to only using real images. For example, when considering radical prostatectomy, we can boost the mean segmentation performance by up to 5.43%. More interestingly, experimental results indicate that the performance improvement is larger in the set of classes that are under-represented in the training sets, where the performance boost of specific classes reaches up to 61.6%.
Collapse
Affiliation(s)
- Emanuele Colleoni
- Medtronic Digital Surgery, 230 City Rd, EC1V 2QY, London, United Kingdom; Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), 43-45 Foley St, W1W 7TY, London, United Kingdom.
| | - Ricardo Sanchez Matilla
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), 43-45 Foley St, W1W 7TY, London, United Kingdom
| | - Imanol Luengo
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), 43-45 Foley St, W1W 7TY, London, United Kingdom
| | - Danail Stoyanov
- Medtronic Digital Surgery, 230 City Rd, EC1V 2QY, London, United Kingdom; Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), 43-45 Foley St, W1W 7TY, London, United Kingdom
| |
Collapse
|
27
|
Ullah I, An S, Kang M, Chikontwe P, Lee H, Choi J, Park SH. Video domain adaptation for semantic segmentation using perceptual consistency matching. Neural Netw 2024; 179:106505. [PMID: 39002205 DOI: 10.1016/j.neunet.2024.106505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2023] [Revised: 05/01/2024] [Accepted: 07/01/2024] [Indexed: 07/15/2024]
Abstract
Unsupervised domain adaptation (UDA) aims to transfer knowledge from previous, related labeled datasets (sources) to a new unlabeled dataset (target). Despite impressive performance, existing approaches have largely focused on image-based UDA, while video-based UDA remains relatively understudied because adapting diverse modal video features and modeling temporal associations efficiently is difficult. Existing studies use optical flow to capture motion cues between consecutive in-domain frames, but this incurs heavy compute requirements, and modeling flow patterns across diverse domains is equally challenging. In this work, we propose an adversarial domain adaptation approach for video semantic segmentation that aims to align temporally associated pixels in successive source- and target-domain frames without relying on optical flow. Specifically, we introduce a Perceptual Consistency Matching (PCM) strategy that leverages perceptual similarity to identify pixels with high correlation across consecutive frames, and infers that such pixels should correspond to the same class. We can therefore enhance prediction accuracy for video UDA by enforcing consistency not only between in-domain frames but also across domains using PCM objectives during model training. Extensive experiments on public datasets show the benefit of our approach over existing state-of-the-art UDA methods. Our approach not only addresses a crucial task in video domain adaptation but also offers notable performance improvements with faster inference times.
Collapse
Affiliation(s)
- Ihsan Ullah
- Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea; Division of Intelligent Robotics, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea
| | - Sion An
- Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea
| | - Myeongkyun Kang
- Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea
| | - Philip Chikontwe
- Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea
| | - Hyunki Lee
- Division of Intelligent Robotics, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea
| | - Jinwoo Choi
- Department of Computer Science and Engineering, Kyung Hee University, Yongin, South Korea
| | - Sang Hyun Park
- Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea.
| |
Collapse
|
28
|
Kang Y, Zhang H, Wang X, Yang Y, Jia Q. MMDB: Multimodal dual-branch model for multi-functional bioactive peptide prediction. Anal Biochem 2024; 690:115491. [PMID: 38460901 DOI: 10.1016/j.ab.2024.115491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2023] [Revised: 01/21/2024] [Accepted: 02/19/2024] [Indexed: 03/11/2024]
Abstract
Bioactive peptides can hinder oxidative processes and microbial spoilage in foodstuffs and play important roles in treating diverse diseases and disorders. While most methods focus on single-functional bioactive peptides and have obtained promising prediction performance, accurately detecting complex and diverse functions simultaneously remains a significant challenge as the number of multi-functional bioactive peptides grows rapidly. In contrast to previous research on multi-functional bioactive peptide prediction based solely on sequence, we propose a novel multimodal dual-branch (MMDB) lightweight deep learning model that uses two different branches to effectively capture the complementary information of peptide sequence and structural properties. Specifically, a multi-scale dilated convolution with Bi-LSTM branch is presented to effectively model the sequence properties of peptides at different scales, while a multi-layer convolution branch is proposed to capture structural information. To the best of our knowledge, this is the first effective extraction of peptide sequence features using multi-scale dilated convolution without increasing the parameter count. Multimodal features from both branches are integrated via a fully connected layer for multi-label classification. Compared to state-of-the-art methods, our MMDB model exhibits competitive results across metrics, with a 9.1% increase in Coverage and improvements of 5.3% and 3.5% in Precision and Accuracy, respectively.
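To illustrate the sequence branch described above (multi-scale dilated convolution followed by a Bi-LSTM), the following is a hedged sketch under the assumption of one-hot encoded, fixed-length peptide inputs; layer sizes and dilation rates are arbitrary choices, not taken from the paper.

```python
import torch
import torch.nn as nn

class SequenceBranch(nn.Module):
    """Illustrative sequence branch: parallel dilated 1D convolutions at several
    scales, concatenated and fed to a bidirectional LSTM, then mean-pooled."""
    def __init__(self, vocab=20, hidden=64, dilations=(1, 2, 4)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(vocab, hidden, kernel_size=3, padding=d, dilation=d)
            for d in dilations)
        self.bilstm = nn.LSTM(hidden * len(dilations), hidden,
                              batch_first=True, bidirectional=True)

    def forward(self, x):                              # x: (B, vocab, L) one-hot sequence
        feats = torch.cat([torch.relu(c(x)) for c in self.convs], dim=1)
        out, _ = self.bilstm(feats.transpose(1, 2))    # (B, L, 2*hidden)
        return out.mean(dim=1)                         # pooled sequence embedding

x = torch.randn(4, 20, 50)
print(SequenceBranch()(x).shape)  # torch.Size([4, 128])
```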
Collapse
Affiliation(s)
- Yan Kang
- National Pilot School of Software, Yunnan University, Kunming, 650091, Yunnan, China; Yunnan Key Laboratory of Software Engineering, China
| | - Huadong Zhang
- National Pilot School of Software, Yunnan University, Kunming, 650091, Yunnan, China
| | - Xinchao Wang
- National Pilot School of Software, Yunnan University, Kunming, 650091, Yunnan, China
| | - Yun Yang
- National Pilot School of Software, Yunnan University, Kunming, 650091, Yunnan, China; Yunnan Key Laboratory of Software Engineering, China.
| | - Qi Jia
- School of Information Science, Yunnan University, Kunming, 650091, Yunnan, China
| |
Collapse
|
29
|
Stan S, Rostami M. Unsupervised model adaptation for source-free segmentation of medical images. Med Image Anal 2024; 95:103179. [PMID: 38626666 DOI: 10.1016/j.media.2024.103179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2023] [Revised: 04/09/2024] [Accepted: 04/11/2024] [Indexed: 04/18/2024]
Abstract
The recent prevalence of deep neural networks has led semantic segmentation networks to achieve human-level performance in the medical field, provided they are given sufficient training data. However, these networks often fail to generalize when tasked with creating semantic maps for out-of-distribution images, necessitating re-training on new distributions. This labor-intensive process requires expert knowledge for generating training labels. In the medical field, distribution shifts can naturally occur due to the choice of imaging devices, such as MRI or CT scanners. To mitigate the need for labeling images in a target domain after successful model training in a fully annotated source domain with a different data distribution, unsupervised domain adaptation (UDA) can be employed. Most UDA approaches ensure target generalization by generating a shared source/target latent feature space, allowing a source-trained classifier to maintain performance in the target domain. However, such approaches necessitate joint source and target data access, potentially leading to privacy leaks with respect to patient information. We propose a UDA algorithm for medical image segmentation that does not require access to source data during adaptation, thereby preserving patient data privacy. Our method relies on approximating the source latent features at the time of adaptation and creates a joint source/target embedding space by minimizing a distributional distance metric based on optimal transport. We demonstrate that our approach is competitive with recent UDA medical segmentation works, even with the added requirement of privacy.
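The adaptation objective above is described as minimizing an optimal-transport-based distributional distance between approximated source features and target features. As a generic illustration, and not the authors' exact metric, an entropy-regularized Sinkhorn distance between two feature batches can be computed as follows; the regularization strength and iteration count are assumptions.

```python
import torch

def sinkhorn_distance(x, y, eps=0.1, iters=50):
    """Entropy-regularized OT distance between feature batches x: (n, d) and y: (m, d)."""
    cost = torch.cdist(x, y, p=2) ** 2                 # (n, m) squared Euclidean cost
    a = torch.full((x.size(0),), 1.0 / x.size(0))      # uniform source weights
    b = torch.full((y.size(0),), 1.0 / y.size(0))      # uniform target weights
    K = torch.exp(-cost / eps)
    u = torch.ones_like(a)
    for _ in range(iters):                             # Sinkhorn iterations
        v = b / (K.t() @ u)
        u = a / (K @ v)
    plan = torch.diag(u) @ K @ torch.diag(v)           # transport plan
    return (plan * cost).sum()

src = torch.randn(64, 128)
tgt = torch.randn(64, 128) + 0.5
print(sinkhorn_distance(src, tgt))
```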
Collapse
Affiliation(s)
- Serban Stan
- University of Southern California, United States of America
| | | |
Collapse
|
30
|
Yu H, Yang Z, Zhang Z, Wang T, Ran M, Wang Z, Liu L, Liu Y, Zhang Y. Multiple organ segmentation framework for brain metastasis radiotherapy. Comput Biol Med 2024; 177:108637. [PMID: 38824789 DOI: 10.1016/j.compbiomed.2024.108637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2024] [Revised: 04/24/2024] [Accepted: 05/18/2024] [Indexed: 06/04/2024]
Abstract
Radiotherapy is a preferred treatment for brain metastases; it kills cancer cells with high doses of radiation while aiming to spare surrounding healthy cells. Therefore, the delineation of organs-at-risk (OARs) is vital in treatment planning to minimize radiation-induced toxicity. However, the following aspects make OAR delineation a challenging task: extremely imbalanced organ sizes, ambiguous boundaries, and complex anatomical structures. To alleviate these challenges, we imitate how specialized clinicians delineate OARs and present a novel cascaded multi-OAR segmentation framework, called OAR-SegNet. OAR-SegNet comprises two distinct levels of segmentation networks: an Anatomical-Prior-Guided network (APG-Net) and a Point-Cloud-Guided network (PCG-Net). Specifically, APG-Net handles segmentation for all organs, with multi-view segmentation modules and a deep prior loss designed under the guidance of prior knowledge. After APG-Net, PCG-Net refines small organs through mini-segmentation and point-cloud alignment heads. The mini-segmentation head is further equipped with the deep prior feature. Extensive experiments demonstrate the superior performance of the proposed method compared to other state-of-the-art medical segmentation methods.
Collapse
Affiliation(s)
- Hui Yu
- College of Computer Science, Sichuan University, China
| | - Ziyuan Yang
- College of Computer Science, Sichuan University, China
| | | | - Tao Wang
- College of Computer Science, Sichuan University, China
| | - Maoson Ran
- College of Computer Science, Sichuan University, China
| | - Zhiwen Wang
- College of Computer Science, Sichuan University, China
| | - Lunxin Liu
- Department of Neurosurgery, West China Hospital of Sichuan University, China
| | - Yan Liu
- College of Electrical Engineering, Sichuan University, China.
| | - Yi Zhang
- School of Cyber Science and Engineering, Sichuan University, China
| |
Collapse
|
31
|
Xuan P, Chu X, Cui H, Nakaguchi T, Wang L, Ning Z, Ning Z, Li C, Zhang T. Multi-view attribute learning and context relationship encoding enhanced segmentation of lung tumors from CT images. Comput Biol Med 2024; 177:108640. [PMID: 38833798 DOI: 10.1016/j.compbiomed.2024.108640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2023] [Revised: 04/25/2024] [Accepted: 05/18/2024] [Indexed: 06/06/2024]
Abstract
Graph convolutional neural networks (GCNs) have shown promise in medical image segmentation due to the flexibility of representing diverse image regions as graph nodes and propagating knowledge via graph edges. However, existing methods do not fully exploit the various attributes of image nodes or the context relationships among those attributes. We propose a new segmentation method with multi-similarity view enhancement and node attribute context learning (MNSeg). First, multiple views are formed by measuring the similarities among the image nodes, and MNSeg uses a GCN-based multi-view image node attribute learning (MAL) module to integrate the various node attributes learnt from the multiple similarity views. Each similarity view contains the specific similarities among all the image nodes and is integrated with the node attributes from all channels to form the enhanced attributes of the image nodes. Second, the context relationships among the attributes of image nodes are formulated by a transformer-based context relationship encoding (CRE) strategy and propagated across all the image nodes. During the transformer-based learning, the relationships are estimated based on self-attention over all the image nodes and then encoded into the learned node features. Finally, we design an attention mechanism at the attribute category level (ACA) to discriminate and fuse the diverse information learnt from MAL, CRE, and the original node attributes; ACA identifies the more informative attribute categories by adaptively learning their importance. We validate the performance of MNSeg on a public lung tumor CT dataset and an in-house non-small cell lung cancer (NSCLC) dataset collected from the hospital. The segmentation results show that MNSeg outperformed the compared segmentation methods in terms of spatial overlap and shape similarity. Ablation studies demonstrated the effectiveness of MAL, CRE, and ACA, and the generalization ability of MNSeg was confirmed by consistent improvements in segmentation performance with different 3D segmentation backbones.
Collapse
Affiliation(s)
- Ping Xuan
- Department of Computer Science and Technology, Shantou University, Shantou, China; School of Computer Science and Technology, Heilongjiang University, Harbin, China
| | - Xiuqiang Chu
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
| | - Hui Cui
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia
| | - Toshiya Nakaguchi
- Center for Frontier Medical Engineering, Chiba University, Chiba, Japan
| | - Linlin Wang
- Department of Radiation Oncology, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, China
| | - Zhiyuan Ning
- School of Electrical and Information Engineering, The University of Sydney, Sydney, Australia
| | - Zhiyu Ning
- School of Electrical and Information Engineering, The University of Sydney, Sydney, Australia
| | | | - Tiangang Zhang
- School of Computer Science and Technology, Heilongjiang University, Harbin, China; School of Mathematical Science, Heilongjiang University, Harbin, China.
| |
Collapse
|
32
|
Wang X, Lv Q, Chen G, Zhang J, Wei Z, Dong J, Fu H, Zhu Z, Liu J, Jin X. MobileSky: Real-Time Sky Replacement for Mobile AR. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:4304-4320. [PMID: 37030763 DOI: 10.1109/tvcg.2023.3257840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
We present MobileSky, the first automatic method for real-time, high-quality sky replacement in mobile AR applications. The primary challenge of this task is extracting sky regions from the camera feed both quickly and accurately. While the problem of sky replacement is not new, previous methods mainly concern extraction quality rather than efficiency, limiting their applicability to our task. We aim to provide higher-quality, spatially and temporally consistent sky mask maps for all camera frames in real time. To this end, we develop a novel framework that combines a new deep semantic network called FSNet with novel post-processing refinement steps. By leveraging IMU data, we also propose new sky-aware constraints such as temporal consistency, position consistency, and color consistency to help refine the weakly classified parts of the segmentation output. Experiments show that our method achieves an average of around 30 FPS on off-the-shelf smartphones and outperforms state-of-the-art sky replacement methods in terms of execution speed and quality. In the meantime, our mask maps appear visually more stable across frames. Our fast sky replacement method enables several applications, such as AR advertising, art making, generating fantasy celestial objects, visually learning about weather phenomena, and advanced video-based visual effects. To facilitate future research, we also create a new video dataset containing annotated sky regions with IMU data.
Collapse
|
33
|
Li N, Pan Y, Qiu W, Xiong L, Wang Y, Zhang Y. Constantly optimized mean teacher for semi-supervised 3D MRI image segmentation. Med Biol Eng Comput 2024; 62:2231-2245. [PMID: 38514501 DOI: 10.1007/s11517-024-03061-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2023] [Accepted: 02/23/2024] [Indexed: 03/23/2024]
Abstract
The mean teacher model and its variants, as important methods in semi-supervised learning, have demonstrated promising performance in magnetic resonance imaging (MRI) data segmentation. However, the superior performance the teacher model gains through the exponential moving average (EMA) is limited by the unreliability of unlabeled images, resulting in potentially unreliable predictions. In this paper, we propose a framework that optimizes the teacher model with reliable expert-annotated data while preserving the advantages of EMA. To avoid the tight coupling that results from EMA, we leverage data augmentations to provide two distinct perspectives for the teacher and student models. The teacher model adopts weak data augmentation to provide supervision for the student model and optimizes itself with real annotations, while the student uses strong data augmentation to avoid overfitting on noisy information. In addition, a double softmax helps the model resist noise and continue learning meaningful information from the images, which is a key component of the proposed model. Extensive experiments show that the proposed method exhibits competitive performance on the Left Atrium segmentation MRI dataset (LA) and the Brain Tumor Segmentation MRI dataset (BraTS2019). For the LA dataset, we achieved a Dice of 91.02% using only 20% labeled data, which is close to the Dice of 91.14% obtained by the supervised approach using 100% labeled data. For the BraTS2019 dataset, the proposed method achieved improvements of 1.02% and 1.92% on 5% and 10% labeled data, respectively, compared to the best baseline method on this dataset. This study demonstrates that the proposed model can be a potential candidate for medical image segmentation in semi-supervised learning scenarios.
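For readers unfamiliar with the mean-teacher mechanism referenced above, the sketch below shows only the generic EMA teacher update in PyTorch; it omits this paper's augmentation split, double softmax, and annotation-based teacher optimization, and the toy model is a placeholder.

```python
import copy
import torch
import torch.nn as nn

# Placeholder student network; the teacher is an EMA copy with frozen gradients.
student = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(), nn.Conv3d(8, 2, 1))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    """teacher_w <- alpha * teacher_w + (1 - alpha) * student_w"""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1 - alpha)

# Called after each optimizer step on the student:
ema_update(teacher, student)
```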
Collapse
Affiliation(s)
- Ning Li
- School of Computer Science and Technology, Laboratory for Brain Science and Medical Artificial Intelligence, Southwest University of Science and Technology, Mianyang, 621010, People's Republic of China
| | - Yudong Pan
- School of Computer Science and Technology, Laboratory for Brain Science and Medical Artificial Intelligence, Southwest University of Science and Technology, Mianyang, 621010, People's Republic of China
| | - Wei Qiu
- School of Computer Science and Technology, Laboratory for Brain Science and Medical Artificial Intelligence, Southwest University of Science and Technology, Mianyang, 621010, People's Republic of China
| | - Lianjin Xiong
- School of Computer Science and Technology, Laboratory for Brain Science and Medical Artificial Intelligence, Southwest University of Science and Technology, Mianyang, 621010, People's Republic of China
| | - Yaobin Wang
- School of Computer Science and Technology, Laboratory for Brain Science and Medical Artificial Intelligence, Southwest University of Science and Technology, Mianyang, 621010, People's Republic of China
| | - Yangsong Zhang
- School of Computer Science and Technology, Laboratory for Brain Science and Medical Artificial Intelligence, Southwest University of Science and Technology, Mianyang, 621010, People's Republic of China.
- NHC Key Laboratory of Nuclear Technology Medical Transformation (Mianyang Central Hospital), Mianyang, 621000, People's Republic of China.
- Key Laboratory of Testing Technology for Manufacturing Process, Ministry of Education, Southwest University of Science and Technology, Mianyang, 621010, People's Republic of China.
| |
Collapse
|
34
|
Thandiackal K, Piccinelli L, Gupta R, Pati P, Goksel O. Multi-Scale Feature Alignment for Continual Learning of Unlabeled Domains. IEEE TRANSACTIONS ON MEDICAL IMAGING 2024; 43:2599-2609. [PMID: 38381642 DOI: 10.1109/tmi.2024.3368365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/23/2024]
Abstract
Methods for unsupervised domain adaptation (UDA) help to improve the performance of deep neural networks on unseen domains without any labeled data. Especially in medical disciplines such as histopathology, this is crucial since large datasets with detailed annotations are scarce. While the majority of existing UDA methods focus on the adaptation from a labeled source to a single unlabeled target domain, many real-world applications with a long life cycle involve more than one target domain. Thus, the ability to sequentially adapt to multiple target domains becomes essential. In settings where the data from previously seen domains cannot be stored, e.g., due to data protection regulations, the above becomes a challenging continual learning problem. To this end, we propose to use generative feature-driven image replay in conjunction with a dual-purpose discriminator that not only enables the generation of images with realistic features for replay, but also promotes feature alignment during domain adaptation. We evaluate our approach extensively on a sequence of three histopathological datasets for tissue-type classification, achieving state-of-the-art results. We present detailed ablation experiments studying our proposed method components and demonstrate a possible use-case of our continual UDA method for an unsupervised patch-based segmentation task given high-resolution tissue images. Our code is available at: https://github.com/histocartography/multi-scale-feature-alignment.
Collapse
|
35
|
Jiang C, Wang T, Pan Y, Ding Z, Shen D. Real-time diagnosis of intracerebral hemorrhage by generating dual-energy CT from single-energy CT. Med Image Anal 2024; 95:103194. [PMID: 38749304 DOI: 10.1016/j.media.2024.103194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2023] [Revised: 04/20/2024] [Accepted: 05/02/2024] [Indexed: 06/01/2024]
Abstract
Real-time diagnosis of intracerebral hemorrhage after thrombectomy is crucial for follow-up treatment. However, this is difficult to achieve with standard single-energy CT (SECT) due to similar CT values of blood and contrast agents under a single energy spectrum. In contrast, dual-energy CT (DECT) scanners employ two different energy spectra, which allows for real-time differentiation between hemorrhage and contrast extravasation based on energy-related attenuation characteristics. Unfortunately, DECT scanners are not as widely used as SECT scanners due to their high costs. To address this dilemma, in this paper, we generate pseudo DECT images from a SECT image for real-time diagnosis of hemorrhage. More specifically, we propose a SECT-to-DECT Transformer-based Generative Adversarial Network (SDTGAN), which is a 3D transformer-based multi-task learning framework equipped with a shared attention mechanism. In this way, SDTGAN can be guided to focus more on high-density areas (crucial for hemorrhage diagnosis) during the generation. Meanwhile, the introduced multi-task learning strategy and the shared attention mechanism also enable SDTGAN to model dependencies between interconnected generation tasks, improving generation performance while significantly reducing model parameters and computational complexity. In the experiments, we approximate real SECT images using mixed 120kV images from DECT data to address the issue of not being able to obtain the true paired DECT and SECT data. Extensive experiments demonstrate that SDTGAN can generate DECT images better than state-of-the-art methods. The code of our implementation is available at https://github.com/jiang-cw/SDTGAN.
Collapse
Affiliation(s)
- Caiwen Jiang
- School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
| | - Tianyu Wang
- Department of Radiology, Affiliated Hangzhou First People's Hospital, Westlake University School of Medicine, Hangzhou, China; Zhejiang University School of Medicine, Hangzhou, China
| | - Yongsheng Pan
- School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
| | - Zhongxiang Ding
- Department of Radiology, Affiliated Hangzhou First People's Hospital, Westlake University School of Medicine, Hangzhou, China.
| | - Dinggang Shen
- School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China; Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China; Shanghai Clinical Research and Trial Center, Shanghai, 201210, China.
| |
Collapse
|
36
|
Aghapanah H, Rasti R, Kermani S, Tabesh F, Banaem HY, Aliakbar HP, Sanei H, Segars WP. CardSegNet: An adaptive hybrid CNN-vision transformer model for heart region segmentation in cardiac MRI. Comput Med Imaging Graph 2024; 115:102382. [PMID: 38640619 DOI: 10.1016/j.compmedimag.2024.102382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2023] [Revised: 03/08/2024] [Accepted: 04/10/2024] [Indexed: 04/21/2024]
Abstract
Cardiovascular MRI (CMRI) is a non-invasive imaging technique used to assess the structure and function of the blood circulatory system. Precise image segmentation is required to measure cardiac parameters and diagnose abnormalities from CMRI data. Because of anatomical heterogeneity and image variations, cardiac image segmentation is a challenging task. Quantification of cardiac parameters requires high-performance segmentation of the left ventricle (LV), right ventricle (RV), and left ventricular myocardium from the background. Manual segmentation of these regions is time-consuming and error-prone, so many semi- or fully automatic solutions have been proposed recently, among which deep learning-based methods have shown high performance in segmenting regions in CMRI data. In this study, a self-adaptive multi-attention (SMA) module is introduced to adaptively leverage multiple attention mechanisms for better segmentation. The SMA integrates convolution-based position and channel attention mechanisms with a patch-tokenization-based vision transformer (ViT) attention mechanism in a hybrid, end-to-end manner. The CNN- and ViT-based attentions mine short- and long-range dependencies for more precise segmentation. The SMA module is applied in an encoder-decoder structure with a ResNet50 backbone, named CardSegNet. Furthermore, a deep supervision method with multiple loss functions is introduced to the CardSegNet optimizer to reduce overfitting and enhance the model's performance. The proposed model is validated on ACDC2017 (n=100), M&Ms (n=321), and a local dataset (n=22) using 10-fold cross-validation, with promising segmentation results demonstrating that it outperforms its counterparts.
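As a generic illustration of combining channel and position (spatial) attention on a convolutional feature map, and not the authors' SMA module, a minimal PyTorch sketch might read as follows; the reduction ratio and kernel size are assumptions.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Channel re-weighting (squeeze-and-excitation style) followed by a
    single-channel spatial attention map."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)       # re-weight channels
        return x * self.spatial(x)    # re-weight spatial positions

x = torch.randn(2, 64, 56, 56)
print(ChannelSpatialAttention(64)(x).shape)  # torch.Size([2, 64, 56, 56])
```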
Collapse
Affiliation(s)
- Hamed Aghapanah
- School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
| | - Reza Rasti
- Department of Biomedical Engineering, Faculty of Engineering, University of Isfahan, Isfahan, Iran; Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA.
| | - Saeed Kermani
- School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.
| | - Faezeh Tabesh
- Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran
| | - Hossein Yousefi Banaem
- Skull Base Research Center, Loghman Hakim Hospital, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Hamidreza Pour Aliakbar
- Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
| | - Hamid Sanei
- Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran
| | - William Paul Segars
- Department of Biomedical Engineering, Faculty of Engineering, University of Isfahan, Isfahan, Iran
| |
Collapse
|
37
|
Gao L, Wang W, Meng X, Zhang S, Xu J, Ju S, Wang YC. TPA: Two-stage progressive attention segmentation framework for hepatocellular carcinoma on multi-modality MRI. Med Phys 2024; 51:4936-4947. [PMID: 38306473 DOI: 10.1002/mp.16968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2023] [Revised: 01/04/2024] [Accepted: 01/21/2024] [Indexed: 02/04/2024] Open
Abstract
BACKGROUND Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) plays a crucial role in the diagnosis and measurement of hepatocellular carcinoma (HCC). The multi-modality information contained in the multi-phase images of DCE-MRI is important for improving segmentation. However, this remains a challenging task due to the heterogeneity of HCC, which may cause a single HCC lesion to have varied imaging appearance across the phases of DCE-MRI. In particular, inconsistent lesion sizes and boundaries in some phases result in a lack of correlation between modalities and may lead to inaccurate segmentation results. PURPOSE We aim to design a multi-modality segmentation model that can learn meaningful inter-phase correlation for HCC segmentation. METHODS In this study, we propose a two-stage progressive attention segmentation framework (TPA) for HCC based on the transformer and the decision-making process of radiologists. Specifically, the first stage fuses features from multi-phase images to identify HCC and provide a localization region. In the second stage, a multi-modality attention transformer module (MAT) is designed to focus on the features that represent the actual lesion size. RESULTS We conducted training, validation, and testing on a single-center dataset (386 cases), followed by external testing on multi-center datasets (83 cases). Furthermore, we analyzed a subgroup of data with weak inter-phase correlation in the test set. The proposed model achieves Dice coefficients of 0.822 and 0.772 in the internal and external test sets, respectively, and 0.829 and 0.791 in the corresponding subgroups. The experimental results demonstrate that our model outperforms state-of-the-art models, particularly within the subgroup. CONCLUSIONS The proposed TPA provides the best segmentation results, and utilizing clinical prior knowledge for network design is practical and feasible.
Collapse
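The internals of the multi-modality attention transformer (MAT) are not detailed in the abstract; the sketch below only illustrates the general idea of letting features from one DCE-MRI phase attend to the other phases with a standard multi-head attention layer. The class name, token counts, and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class CrossPhaseAttention(nn.Module):
    """Illustrative cross-attention: tokens from a reference phase query tokens
    pooled from the remaining DCE-MRI phases (not the authors' exact MAT design)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, ref_tokens, other_tokens):
        # ref_tokens: [B, N, dim]; other_tokens: [B, M, dim] (all non-reference phases)
        fused, _ = self.attn(ref_tokens, other_tokens, other_tokens)
        return self.norm(ref_tokens + fused)

# Toy usage: 196 tokens per phase, three extra phases concatenated along the token axis.
ref = torch.randn(1, 196, 256)
others = torch.randn(1, 3 * 196, 256)
print(CrossPhaseAttention()(ref, others).shape)  # torch.Size([1, 196, 256])
```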
Affiliation(s)
- Lei Gao
- Institute for AI in Medicine, School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China
| | - Weilang Wang
- Department of Radiology, Zhongda Hospital, Jiangsu Key Laboratory of Molecular and Functional Imaging, School of Medicine, Southeast University, Nanjing, China
| | - Xiangpan Meng
- Department of Radiology, Zhongda Hospital, Jiangsu Key Laboratory of Molecular and Functional Imaging, School of Medicine, Southeast University, Nanjing, China
| | - Shuhang Zhang
- Department of Radiology, Zhongda Hospital, Jiangsu Key Laboratory of Molecular and Functional Imaging, School of Medicine, Southeast University, Nanjing, China
| | - Jun Xu
- Institute for AI in Medicine, School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China
| | - Shenghong Ju
- Institute for AI in Medicine, School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China
- Department of Radiology, Zhongda Hospital, Jiangsu Key Laboratory of Molecular and Functional Imaging, School of Medicine, Southeast University, Nanjing, China
| | - Yuan-Cheng Wang
- Institute for AI in Medicine, School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China
- Department of Radiology, Zhongda Hospital, Jiangsu Key Laboratory of Molecular and Functional Imaging, School of Medicine, Southeast University, Nanjing, China
| |
Collapse
|
38
|
Xu B, Yang J, Hong P, Fan X, Sun Y, Zhang L, Yang B, Xu L, Avolio A. Coronary artery segmentation in CCTA images based on multi-scale feature learning. JOURNAL OF X-RAY SCIENCE AND TECHNOLOGY 2024:XST240093. [PMID: 38943423 DOI: 10.3233/xst-240093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/01/2024]
Abstract
BACKGROUND Coronary artery segmentation is a prerequisite in computer-aided diagnosis of Coronary Artery Disease (CAD). However, segmentation of coronary arteries in Coronary Computed Tomography Angiography (CCTA) images faces several challenges. Current segmentation approaches are unable to address these challenges effectively and suffer from problems such as the need for manual interaction or low segmentation accuracy. OBJECTIVE A Multi-scale Feature Learning and Rectification (MFLR) network is proposed to tackle these challenges and achieve automatic and accurate segmentation of coronary arteries. METHODS The MFLR network introduces a multi-scale feature extraction module in the encoder to effectively capture contextual information under different receptive fields. In the decoder, a feature correction and fusion module is proposed, which employs high-level features containing multi-scale information to correct and guide low-level features, fusing the two levels of features to further improve segmentation performance. RESULTS The MFLR network achieved the best performance in terms of the Dice similarity coefficient, Jaccard index, recall, F1-score, and 95% Hausdorff distance on both in-house and public datasets. CONCLUSION Experimental results demonstrate the superiority and good generalization ability of the MFLR approach. This study contributes to the accurate diagnosis and treatment of CAD, and it also informs other segmentation applications in medicine.
Collapse
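As a rough illustration of a multi-scale feature extraction module that captures context under different receptive fields, the sketch below uses parallel dilated convolutions followed by a 1x1 projection. The dilation rates and channel sizes are placeholders, and the block is a generic sketch rather than the authors' exact design.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Parallel dilated convolutions capture context under different receptive
    fields; branch outputs are concatenated and projected back to `out_ch` channels."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ) for d in dilations
        ])
        self.project = nn.Conv2d(out_ch * len(dilations), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

# Toy usage on a CCTA-sized feature map.
print(MultiScaleBlock(64, 128)(torch.randn(1, 64, 96, 96)).shape)
```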
Affiliation(s)
- Bu Xu
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
| | - Jinzhong Yang
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
| | - Peng Hong
- Software College, Northeastern University, Shenyang, China
| | - Xiaoxue Fan
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
| | - Yu Sun
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
- Department of Radiology, General Hospital of North Theater Command, Shenyang, China
| | - Libo Zhang
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
- Department of Radiology, General Hospital of North Theater Command, Shenyang, China
| | - Benqiang Yang
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
- Department of Radiology, General Hospital of North Theater Command, Shenyang, China
| | - Lisheng Xu
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
- Key Laboratory of Medical Image Computing, Ministry of Education, Shenyang, China
- Engineering Research Center of Medical Imaging and Intelligent Analysis, Ministry of Education, Shenyang, China
| | - Alberto Avolio
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
| |
Collapse
|
39
|
Li Z, Wang X, Liu X, Jiang J. BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2024; 33:3964-3976. [PMID: 38913511 DOI: 10.1109/tip.2024.3416065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/26/2024]
Abstract
Monocular depth estimation (MDE) is a fundamental task in computer vision and has drawn increasing attention. Recently, some methods reformulate it as a classification-regression task to boost model performance, where continuous depth is estimated via a linear combination of predicted probability distributions and discrete bins. In this paper, we present a novel framework called BinsFormer, tailored for classification-regression-based depth estimation. It focuses on two crucial components of this task: 1) proper generation of adaptive bins; and 2) sufficient interaction between probability distributions and bin predictions. Specifically, we employ a Transformer decoder to generate bins, viewing this as a direct set-to-set prediction problem. We further integrate a multi-scale decoder structure to achieve a comprehensive understanding of spatial geometry information and estimate depth maps in a coarse-to-fine manner. Moreover, an extra scene-understanding query is proposed to improve estimation accuracy; it turns out that models can implicitly learn useful information from the auxiliary environment classification task. Extensive experiments on the KITTI, NYU, and SUN RGB-D datasets demonstrate that BinsFormer surpasses state-of-the-art MDE methods by prominent margins. Code and pretrained models are publicly available at https://github.com/zhyever/Monocular-Depth-Estimation-Toolbox/tree/main/configs/binsformer.
Collapse
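The classification-regression formulation described above, in which continuous depth is a linear combination of predicted probabilities and adaptive bins, can be written compactly. The sketch below assumes the network predicts per-pixel bin logits and per-image bin widths; the bin count and depth range are placeholders, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def depth_from_bins(bin_logits, bin_widths, min_depth=1e-3, max_depth=80.0):
    """Convert per-pixel bin logits and predicted bin widths into metric depth.

    bin_logits: [B, N, H, W] similarity of each pixel to each of N bins.
    bin_widths: [B, N] raw widths; normalized to span the depth range.
    Returns depth [B, 1, H, W] as the probability-weighted sum of bin centers.
    """
    widths = F.softmax(bin_widths, dim=1) * (max_depth - min_depth)
    edges = min_depth + torch.cumsum(widths, dim=1)
    centers = edges - 0.5 * widths                      # [B, N]
    probs = F.softmax(bin_logits, dim=1)                # [B, N, H, W]
    depth = (probs * centers[:, :, None, None]).sum(dim=1, keepdim=True)
    return depth

# Toy usage with 64 adaptive bins.
print(depth_from_bins(torch.randn(2, 64, 24, 32), torch.randn(2, 64)).shape)
```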
|
40
|
Li W, Wang Y, Liu Y. DMAF-Net: deformable multi-scale adaptive fusion network for dental structure detection with panoramic radiographs. Dentomaxillofac Radiol 2024; 53:296-307. [PMID: 38518093 PMCID: PMC11211679 DOI: 10.1093/dmfr/twae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2023] [Revised: 03/03/2024] [Accepted: 03/19/2024] [Indexed: 03/24/2024] Open
Abstract
OBJECTIVES Panoramic radiography is one of the most commonly used diagnostic modalities in dentistry, and automatic recognition of panoramic radiographs supports dentists in decision making. To improve the accuracy of detecting dental structural problems in panoramic radiographs, we improved the You Only Look Once (YOLO) network and verified the feasibility of this new method in aiding the detection of dental problems. METHODS We propose a Deformable Multi-scale Adaptive Fusion Net (DMAF-Net) to detect 5 types of dental situations (impacted teeth, missing teeth, implants, crown restorations, and root canal-treated teeth) in panoramic radiographs by improving the YOLO network. In DMAF-Net, we propose different modules to enhance the feature extraction capability of the network and to acquire high-level features at different scales, while using adaptive spatial feature fusion to solve the problem of scale mismatch between feature layers, which effectively improves detection performance. To evaluate detection performance, we compare the experimental results of different models on the test set and select the optimal model by averaging the metrics across categories as the evaluation criterion. RESULTS A total of 1474 panoramic radiographs were divided into training, validation, and test sets in the ratio of 7:2:1. On the test set, the average precision and recall of DMAF-Net are 92.7% and 87.6%, respectively; the mean Average Precision values (mAP0.5 and mAP[0.5:0.95]) are 91.8% and 63.7%, respectively. CONCLUSIONS The proposed DMAF-Net improves on existing deep learning models and achieves automatic detection of tooth structure problems in panoramic radiographs. This new method has great potential for computer-aided diagnostic, teaching, and clinical applications in the future.
Collapse
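The adaptive spatial feature fusion step can be illustrated with per-pixel softmax weights that blend resized feature levels, in the spirit of ASFF; the module below is a generic sketch rather than the authors' implementation, and the channel counts and level sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveSpatialFusion(nn.Module):
    """Fuses three feature levels with per-pixel softmax weights (ASFF-style sketch).
    All inputs share the same channel count; coarser levels are resized to level 1."""
    def __init__(self, channels):
        super().__init__()
        self.weight_conv = nn.Conv2d(channels * 3, 3, kernel_size=1)

    def forward(self, f1, f2, f3):
        size = f1.shape[-2:]
        f2 = F.interpolate(f2, size=size, mode="bilinear", align_corners=False)
        f3 = F.interpolate(f3, size=size, mode="bilinear", align_corners=False)
        w = torch.softmax(self.weight_conv(torch.cat([f1, f2, f3], dim=1)), dim=1)
        return w[:, 0:1] * f1 + w[:, 1:2] * f2 + w[:, 2:3] * f3

# Toy usage: three pyramid levels with the same channel count.
f1, f2, f3 = (torch.randn(1, 128, s, s) for s in (80, 40, 20))
print(AdaptiveSpatialFusion(128)(f1, f2, f3).shape)  # torch.Size([1, 128, 80, 80])
```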
Affiliation(s)
- Wei Li
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
| | - Yuanjun Wang
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
| | - Yu Liu
- Department of Radiology, Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200011, China
| |
Collapse
|
41
|
Miao Y, Sun Y, Zhang Y, Wang J, Zhang X. An efficient point cloud semantic segmentation network with multiscale super-patch transformer. Sci Rep 2024; 14:14581. [PMID: 38918404 PMCID: PMC11199674 DOI: 10.1038/s41598-024-63451-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2024] [Accepted: 05/29/2024] [Indexed: 06/27/2024] Open
Abstract
Efficient semantic segmentation of large-scale point cloud scenes is a fundamental and essential task for perceiving or understanding the surrounding 3D environment. However, due to the vast amount of point cloud data, it is always challenging to train deep neural networks efficiently, and it is also difficult to establish a unified model that represents different shapes effectively because of their variety and the occlusions among scene objects. Taking scene super-patches as the data representation and guided by their contextual information, we propose a novel multiscale super-patch transformer network (MSSPTNet) for point cloud segmentation, which consists of a multiscale super-patch local aggregation (MSSPLA) module and a super-patch transformer (SPT) module. Given large-scale point cloud data as input, a dynamic region-growing algorithm is first adopted to extract scene super-patches with consistent geometric features from the sampled points. Then, the MSSPLA module aggregates local features and the contextual information of adjacent super-patches at different scales. Owing to the self-attention mechanism, the SPT module exploits the similarity among scene super-patches in a high-level feature space. By combining these two modules, MSSPTNet can effectively learn both local and global features from the input point clouds. Finally, interpolating upsampling and multi-layer perceptrons are exploited to generate semantic labels for the original point cloud data. Experimental results on the public S3DIS dataset demonstrate the efficiency of the proposed network for segmenting large-scale point cloud scenes, especially indoor scenes with many repetitive structures: the network training of MSSPTNet is faster than that of other segmentation networks by a factor of tens to hundreds.
Collapse
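The SPT module's use of self-attention to exploit similarity among super-patches can be sketched with a standard attention layer over per-super-patch feature vectors, as below. The dimensions and the small feed-forward head are assumptions; the published module likely adds positional and scale information.

```python
import torch
import torch.nn as nn

class SuperPatchAttention(nn.Module):
    """Self-attention over per-super-patch feature vectors, so each super-patch
    can borrow context from geometrically similar regions elsewhere in the scene."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU())

    def forward(self, patch_feats):
        # patch_feats: [B, P, dim] with P super-patches per scene.
        attended, _ = self.attn(patch_feats, patch_feats, patch_feats)
        return patch_feats + self.ffn(attended)

# Toy usage: 500 super-patches with 128-dimensional features.
print(SuperPatchAttention()(torch.randn(1, 500, 128)).shape)  # [1, 500, 128]
```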
Affiliation(s)
- Yongwei Miao
- School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121, China
| | - Yuliang Sun
- School of Information Science and Technology, Zhejiang Shuren University, Hangzhou, 310015, China
| | - Yimin Zhang
- School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou, 310018, China
| | - Jinrong Wang
- School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121, China
| | - Xudong Zhang
- School of Information Science and Technology, Zhejiang Shuren University, Hangzhou, 310015, China.
| |
Collapse
|
42
|
Yang M, Yang M, Yang L, Wang Z, Ye P, Chen C, Fu L, Xu S. Deep learning for MRI lesion segmentation in rectal cancer. Front Med (Lausanne) 2024; 11:1394262. [PMID: 38983364 PMCID: PMC11231084 DOI: 10.3389/fmed.2024.1394262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2024] [Accepted: 06/14/2024] [Indexed: 07/11/2024] Open
Abstract
Rectal cancer (RC) is a globally prevalent malignant tumor, presenting significant challenges in its management and treatment. Currently, magnetic resonance imaging (MRI) offers superior soft tissue contrast without ionizing radiation for RC patients, making it the most widely used and effective detection method. In early screening, radiologists rely on patients' radiological characteristics and their own extensive clinical experience for diagnosis. However, diagnostic accuracy may be hindered by factors such as limited expertise, visual fatigue, and image clarity issues, resulting in misdiagnosis or missed diagnosis. Moreover, the organs surrounding the rectum are extensively distributed, and some have shapes similar to the tumor but unclear boundaries; these complexities greatly impede doctors' ability to diagnose RC accurately. With recent advancements in artificial intelligence, machine learning techniques such as deep learning (DL) have demonstrated immense potential and broad prospects in medical image analysis. The emergence of this approach has significantly enhanced research in medical image classification, detection, and segmentation, with particular emphasis on segmentation. This review discusses the development of DL segmentation algorithms and their application to lesion segmentation in MRI images of RC, to provide theoretical guidance and support for further advancements in this field.
Collapse
Affiliation(s)
- Mingwei Yang
- Department of General Surgery, Nanfang Hospital Zengcheng Campus, Guangzhou, Guangdong, China
| | - Miyang Yang
- Department of Radiology, Fuzong Teaching Hospital, Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian, China
- Department of Radiology, 900th Hospital of Joint Logistics Support Force, Fuzhou, Fujian, China
| | - Lanlan Yang
- Department of Radiology, Fuzong Teaching Hospital, Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian, China
| | - Zhaochu Wang
- Department of Radiology, Fuzong Teaching Hospital, Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian, China
| | - Peiyun Ye
- Department of Radiology, Fuzong Teaching Hospital, Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian, China
- Department of Radiology, 900th Hospital of Joint Logistics Support Force, Fuzhou, Fujian, China
| | - Chujie Chen
- Department of Radiology, Fuzong Teaching Hospital, Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian, China
- Department of Radiology, 900th Hospital of Joint Logistics Support Force, Fuzhou, Fujian, China
| | - Liyuan Fu
- Department of Radiology, 900th Hospital of Joint Logistics Support Force, Fuzhou, Fujian, China
| | - Shangwen Xu
- Department of Radiology, 900th Hospital of Joint Logistics Support Force, Fuzhou, Fujian, China
| |
Collapse
|
43
|
Xu Z, Wang Z. MCV-UNet: a modified convolution & transformer hybrid encoder-decoder network with multi-scale information fusion for ultrasound image semantic segmentation. PeerJ Comput Sci 2024; 10:e2146. [PMID: 38983210 PMCID: PMC11232629 DOI: 10.7717/peerj-cs.2146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2024] [Accepted: 05/30/2024] [Indexed: 07/11/2024]
Abstract
In recent years, the growing importance of accurate semantic segmentation in ultrasound images has led to numerous advances in deep learning-based techniques. In this article, we introduce a novel hybrid network that synergistically combines convolutional neural networks (CNN) and Vision Transformers (ViT) for ultrasound image semantic segmentation. Our primary contribution is the incorporation of multi-scale CNNs in both the encoder and decoder stages, enhancing feature learning capabilities across multiple scales. Further, the bottleneck of the network leverages the ViT to capture long-range, high-dimensional spatial dependencies, a critical factor often overlooked in conventional CNN-based approaches. We conducted extensive experiments using a public benchmark ultrasound nerve segmentation dataset. Our proposed method was benchmarked against 17 existing baseline methods, and the results underscored its superiority: it outperformed all competing methods, including a 4.6% improvement in Dice over TransUNet, a 13.0% improvement in Dice over Attention UNet, and a 10.5% improvement in precision over UNet. This research offers significant potential for real-world applications in medical imaging, demonstrating the power of blending CNN and ViT in a unified framework.
Collapse
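The CNN-encoder/ViT-bottleneck pattern described above can be sketched by flattening the deepest feature map into tokens, running a small Transformer encoder, and reshaping back; the layer sizes below are placeholders, not the authors' configuration.

```python
import torch
import torch.nn as nn

class ViTBottleneck(nn.Module):
    """Flattens the deepest CNN feature map into tokens, runs a small Transformer
    encoder to model long-range dependencies, and reshapes back to a feature map.
    A sketch of the general pattern, not the authors' exact network."""
    def __init__(self, channels=256, heads=8, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(channels, heads,
                                           dim_feedforward=4 * channels,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)        # [B, H*W, C]
        tokens = self.encoder(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)

print(ViTBottleneck()(torch.randn(1, 256, 16, 16)).shape)  # [1, 256, 16, 16]
```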
Affiliation(s)
- Zihong Xu
- Department of Mechanical Engineering, Columbia University, New York, United States of America
| | - Ziyang Wang
- Department of Computer Science, University of Oxford, Oxford, United Kingdom
| |
Collapse
|
44
|
Qiu H, Ning M, Song Z, Fang W, Chen Y, Sun T, Ma Z, Yuan L, Tian Y. Self-architectural knowledge distillation for spiking neural networks. Neural Netw 2024; 178:106475. [PMID: 38941738 DOI: 10.1016/j.neunet.2024.106475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2023] [Revised: 05/16/2024] [Accepted: 06/17/2024] [Indexed: 06/30/2024]
Abstract
Spiking neural networks (SNNs) have attracted attention due to their biological plausibility and their potential for low-energy applications on neuromorphic hardware. Two mainstream approaches are commonly used to obtain SNNs: ANN-to-SNN conversion methods and directly-trained-SNN methods. However, the former achieve excellent performance at the cost of a large number of time steps (i.e., latency), while the latter exhibit lower latency but suffer from suboptimal performance. To tackle this performance-latency trade-off, we propose Self-Architectural Knowledge Distillation (SAKD), an intuitive and effective method for SNNs leveraging knowledge distillation (KD). We adopt a bilevel teacher-student training strategy in SAKD: level 1 directly transfers pre-trained ANN weights of the same architecture to the SNN, and level 2 encourages the SNN to mimic the ANN's behavior in terms of both final responses and intermediate features. Learning with informative supervision signals fostered by labels and ANNs, our SAKD achieves new state-of-the-art (SOTA) performance with a few time steps on widely used classification benchmark datasets. On ImageNet-1K, with only 4 time steps, our Spiking-ResNet34 model attains a Top-1 accuracy of 70.04%, outperforming previous SOTA methods with the same architecture. Notably, our SEW-ResNet152 model reaches a Top-1 accuracy of 77.30% on ImageNet-1K, setting a new SOTA benchmark for SNNs. Furthermore, we apply SAKD to various dense prediction downstream tasks, such as object detection and semantic segmentation, demonstrating strong generalization ability and superior performance. In conclusion, the proposed SAKD framework presents a promising approach for achieving both high performance and low latency in SNNs, potentially paving the way for future advancements in the field.
Collapse
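A generic distillation objective in the spirit of the level-2 training described above combines hard-label cross-entropy, temperature-scaled KL divergence on the teacher's responses, and feature mimicking; the sketch below uses conventional weights that are assumptions rather than the paper's values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, student_feat, teacher_feat,
                      labels, T=4.0, alpha=0.5, beta=0.1):
    """Illustrative response + feature distillation: hard-label CE, temperature-scaled
    KL to the ANN teacher's logits, and an MSE term on intermediate features.
    T, alpha, and beta are example hyperparameters."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    feat = F.mse_loss(student_feat, teacher_feat)
    return ce + alpha * kd + beta * feat

# Toy usage with 1000-class logits and 512-dimensional features.
s_logits, t_logits = torch.randn(8, 1000), torch.randn(8, 1000)
s_feat, t_feat = torch.randn(8, 512), torch.randn(8, 512)
labels = torch.randint(0, 1000, (8,))
print(distillation_loss(s_logits, t_logits, s_feat, t_feat, labels))
```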
Affiliation(s)
- Haonan Qiu
- Peking University, School of Electronic and Computer Engineering, Shenzhen Graduate School, China.
| | - Munan Ning
- Peking University, School of Electronic and Computer Engineering, Shenzhen Graduate School, China
| | - Zeyin Song
- Peking University, School of Electronic and Computer Engineering, Shenzhen Graduate School, China
| | - Wei Fang
- Peking University, School of Computer Science, China; PengCheng Laboratory, China
| | - Yanqi Chen
- Peking University, School of Computer Science, China; PengCheng Laboratory, China
| | - Tao Sun
- Peking University, School of Electronic and Computer Engineering, Shenzhen Graduate School, China
| | | | - Li Yuan
- Peking University, School of Electronic and Computer Engineering, Shenzhen Graduate School, China; PengCheng Laboratory, China.
| | - Yonghong Tian
- Peking University, School of Electronic and Computer Engineering, Shenzhen Graduate School, China; Peking University, School of Computer Science, China; PengCheng Laboratory, China.
| |
Collapse
|
45
|
Alam MS, Wang D, Sowmya A. AMFP-net: Adaptive multi-scale feature pyramid network for diagnosis of pneumoconiosis from chest X-ray images. Artif Intell Med 2024; 154:102917. [PMID: 38917599 DOI: 10.1016/j.artmed.2024.102917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2023] [Revised: 05/02/2024] [Accepted: 06/17/2024] [Indexed: 06/27/2024]
Abstract
Early detection of pneumoconiosis by routine health screening of workers in the mining industry is critical for preventing the progression of this incurable disease. Automated pneumoconiosis classification in chest X-ray images is challenging due to the low contrast of opacities, inter-class similarity, intra-class variation, and the existence of artifacts. Compared to traditional methods, convolutional neural networks have shown significant improvement in pneumoconiosis classification tasks; however, accurate classification remains challenging, mainly due to the inability to focus on semantically meaningful lesion opacities. Most existing networks focus on high-level abstract information and ignore low-level detailed object information. Unlike natural images, where an object occupies a large area, the classification of pneumoconiosis depends on the density of small opacities inside the lung. To address this issue, we propose a novel two-stage adaptive multi-scale feature pyramid network called AMFP-Net for the diagnosis of pneumoconiosis from chest X-rays. The proposed model consists of 1) an adaptive multi-scale context block to extract rich contextual and discriminative information and 2) a weighted feature fusion module to effectively combine low-level detailed and high-level global semantic information. This two-stage network first segments the lungs to focus on relevant regions by excluding irrelevant parts of the image, and then utilises the segmented lungs to classify pneumoconiosis into different categories. Extensive experiments on public and private datasets demonstrate that the proposed approach outperforms state-of-the-art methods for both segmentation and classification.
Collapse
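The segment-then-classify idea can be illustrated by masking the chest X-ray with the predicted lung segmentation before classification, as in the minimal sketch below; the placeholder classifier, mask, and class count stand in for the actual two-stage network.

```python
import torch
import torch.nn as nn

def classify_masked_lungs(image, lung_mask, classifier):
    """Stage 2 of a segment-then-classify pipeline: zero out everything outside the
    predicted lung mask so the classifier only sees opacities inside the lung fields."""
    masked = image * lung_mask            # [B, 1, H, W] * [B, 1, H, W]
    return classifier(masked)

# Toy usage with a placeholder classifier and a random binary lung mask.
classifier = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 4))
image = torch.rand(2, 1, 256, 256)
lung_mask = (torch.rand(2, 1, 256, 256) > 0.5).float()
print(classify_masked_lungs(image, lung_mask, classifier).shape)  # [2, 4]
```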
Affiliation(s)
- Md Shariful Alam
- School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia.
| | | | - Arcot Sowmya
- School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia
| |
Collapse
|
46
|
Zhang Y, Pu C, Zhang Y, Niu M, Hao L, Wang J. Integrated Circuit Bonding Distance Inspection via Hierarchical Measurement Structure. SENSORS (BASEL, SWITZERLAND) 2024; 24:3933. [PMID: 38931717 PMCID: PMC11207810 DOI: 10.3390/s24123933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/22/2024] [Revised: 05/27/2024] [Accepted: 06/01/2024] [Indexed: 06/28/2024]
Abstract
Bonding distance is defined as the projected distance on the substrate plane between the two solder points of a bonding wire; it directly affects the morphology of the bonding wire and the performance of the connections between the chip's internal components. To inspect the bonding distance, gold wires and solder points must be accurately recognized within the complex imagery of the chip. However, bonding wires at arbitrary angles and small solder points are densely distributed across the complex background of bonding images, which makes it difficult for conventional image detection and deep learning methods to recognize these structures and measure bonding distances effectively. In this paper, we present a novel method for measuring bonding distance using a hierarchical measurement structure. First, we employ an image acquisition device to capture surface images of integrated circuits and use multi-layer convolution to coarsely locate the bonding region and remove redundant background. Second, we apply a multi-branch wire bonding inspection network for detecting bonding spots and segmenting gold wires. This network includes a fine location branch that utilizes low-level features to enhance detection accuracy for small bonding spots and a gold wire segmentation branch that incorporates an edge branch to effectively extract edge information. Finally, we use the bonding distance measurement module to develop four types of gold wire distribution models for bonding spot matching. Together, these modules create a fully automated method for measuring bonding distances in integrated circuits. The effectiveness of the proposed modules and the overall framework has been validated through comprehensive experiments.
Collapse
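The definition of bonding distance in the abstract translates directly into a planar distance computation; the helper below assumes 3D solder-point coordinates in a common unit and simply ignores the height component.

```python
import numpy as np

def bonding_distance(p1, p2):
    """Projected distance on the substrate (x-y) plane between two solder points.
    Points are (x, y, z) coordinates in the same unit (e.g., micrometres); the
    height component is ignored because the definition uses the planar projection."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    return float(np.hypot(p1[0] - p2[0], p1[1] - p2[1]))

# Toy usage: two solder points 300 um apart in x and 400 um apart in y.
print(bonding_distance((0, 0, 20), (300, 400, 35)))  # 500.0
```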
Affiliation(s)
- Yuan Zhang
- College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; (Y.Z.); (J.W.)
| | - Chenghan Pu
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China;
| | - Yanming Zhang
- The 29th Research Institute of China Electronics Technology Group Corporation, Chengdu 610036, China; (Y.Z.); (L.H.)
| | - Muyuan Niu
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China;
| | - Lifeng Hao
- The 29th Research Institute of China Electronics Technology Group Corporation, Chengdu 610036, China; (Y.Z.); (L.H.)
| | - Jun Wang
- College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; (Y.Z.); (J.W.)
| |
Collapse
|
47
|
Guo B, Cao N, Zhang R, Yang P. GETNet: Group Normalization Shuffle and Enhanced Channel Self-Attention Network Based on VT-UNet for Brain Tumor Segmentation. Diagnostics (Basel) 2024; 14:1257. [PMID: 38928672 PMCID: PMC11203032 DOI: 10.3390/diagnostics14121257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2024] [Revised: 06/08/2024] [Accepted: 06/11/2024] [Indexed: 06/28/2024] Open
Abstract
Brain tumors are currently both highly harmful and prevalent. Deep learning technologies, including CNNs, UNet, and Transformers, have been applied to brain tumor segmentation for many years and have achieved some success. However, traditional CNNs and UNet capture insufficient global information, while Transformers cannot provide sufficient local information. Fusing the global information from Transformers with the local information of convolutions is an important step toward improving brain tumor segmentation. We propose the Group Normalization Shuffle and Enhanced Channel Self-Attention Network (GETNet), a network combining a pure Transformer structure with convolution operations based on VT-UNet, which considers both global and local information. The network includes the proposed group normalization shuffle block (GNS) and enhanced channel self-attention block (ECSA). The GNS is used after the VT Encoder Block and before the downsampling block to improve information extraction. An ECSA module is added to the bottleneck layer to effectively utilize the detailed features of the bottom layer. We also conducted experiments on the BraTS2021 dataset to demonstrate the performance of our network. The Dice scores for the whole tumor (WT), tumor core (TC), and enhancing tumor (ET) regions were 91.77, 86.03, and 83.64, respectively. The results show that the proposed model achieves state-of-the-art performance compared with more than eleven benchmarks.
Collapse
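One plausible reading of a group-normalization-plus-shuffle block is group normalization followed by a ShuffleNet-style channel shuffle, which mixes information across the normalization groups. The 2D sketch below is an assumption-laden illustration only; the published GNS operates inside a 3D VT-UNet.

```python
import torch
import torch.nn as nn

class GroupNormShuffle(nn.Module):
    """Group normalization followed by a channel shuffle across groups
    (a 2D illustration of the idea, not necessarily the authors' exact block)."""
    def __init__(self, channels, groups=8):
        super().__init__()
        self.groups = groups
        self.norm = nn.GroupNorm(groups, channels)

    def forward(self, x):
        x = self.norm(x)
        b, c, h, w = x.shape
        # Reshape to [B, groups, C//groups, H, W], swap the two channel axes,
        # and flatten back so channels from different groups become interleaved.
        x = x.view(b, self.groups, c // self.groups, h, w)
        return x.transpose(1, 2).reshape(b, c, h, w)

print(GroupNormShuffle(64)(torch.randn(1, 64, 32, 32)).shape)  # [1, 64, 32, 32]
```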
Affiliation(s)
- Bin Guo
- College of Information Science and Engineering, Hohai University, Nanjing 210098, China;
- College of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi 830052, China; (R.Z.); (P.Y.)
| | - Ning Cao
- College of Information Science and Engineering, Hohai University, Nanjing 210098, China;
| | - Ruihao Zhang
- College of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi 830052, China; (R.Z.); (P.Y.)
| | - Peng Yang
- College of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi 830052, China; (R.Z.); (P.Y.)
| |
Collapse
|
48
|
Urrea C, Garcia-Garcia Y, Kern J. Improving Surgical Scene Semantic Segmentation through a Deep Learning Architecture with Attention to Class Imbalance. Biomedicines 2024; 12:1309. [PMID: 38927516 PMCID: PMC11201157 DOI: 10.3390/biomedicines12061309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2024] [Revised: 06/01/2024] [Accepted: 06/11/2024] [Indexed: 06/28/2024] Open
Abstract
This article addresses the semantic segmentation of laparoscopic surgery images, placing special emphasis on the segmentation of structures with a smaller number of observations. As a result of this study, adjustment parameters are proposed for deep neural network architectures, enabling robust segmentation of all structures in the surgical scene. The U-Net architecture with five encoder-decoders (U-Net5ed), SegNet-VGG19, and DeepLabv3+ employing different backbones are implemented. Three main experiments are conducted, working with the Rectified Linear Unit (ReLU), Gaussian Error Linear Unit (GELU), and Swish activation functions. The applied loss functions include Cross Entropy (CE), Focal Loss (FL), Tversky Loss (TL), Dice Loss (DiL), Cross Entropy Dice Loss (CEDL), and Cross Entropy Tversky Loss (CETL). The performance of the Stochastic Gradient Descent with momentum (SGDM) and Adaptive Moment Estimation (Adam) optimizers is compared. It is qualitatively and quantitatively confirmed that the DeepLabv3+ and U-Net5ed architectures yield the best results. The DeepLabv3+ architecture with the ResNet-50 backbone, Swish activation function, and CETL loss function reports a Mean Accuracy (MAcc) of 0.976 and a Mean Intersection over Union (MIoU) of 0.977. The semantic segmentation of structures with a smaller number of observations, such as the hepatic vein, cystic duct, liver ligament, and blood, shows that the obtained results are very competitive and promising compared to the consulted literature. The proposed parameters were validated in the YOLOv9 architecture, which showed improved semantic segmentation compared to the results obtained with the original architecture.
Collapse
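Among the loss functions listed above, the Tversky loss is the one most directly aimed at class imbalance; a minimal implementation is sketched below, with the alpha and beta values chosen only as examples rather than the paper's settings.

```python
import torch
import torch.nn.functional as F

def tversky_loss(logits, target, alpha=0.7, beta=0.3, eps=1e-6):
    """Tversky loss for imbalanced segmentation: in this implementation, alpha
    weights false negatives and beta weights false positives, so alpha > beta
    favours recall on rare classes.
    logits: [B, C, H, W]; target: [B, H, W] integer labels."""
    num_classes = logits.shape[1]
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)
    tp = (probs * one_hot).sum(dims)
    fn = ((1 - probs) * one_hot).sum(dims)
    fp = (probs * (1 - one_hot)).sum(dims)
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return 1.0 - tversky.mean()

# Toy usage with 13 classes.
print(tversky_loss(torch.randn(1, 13, 64, 64), torch.randint(0, 13, (1, 64, 64))))
```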
Affiliation(s)
- Claudio Urrea
- Electrical Engineering Department, Faculty of Engineering, University of Santiago of Chile, Las Sophoras 165, Estación Central, Santiago 9170020, Chile; (Y.G.-G.); (J.K.)
| | | | | |
Collapse
|
49
|
Dabove P, Daud M, Olivotto L. Revolutionizing urban mapping: deep learning and data fusion strategies for accurate building footprint segmentation. Sci Rep 2024; 14:13510. [PMID: 38866920 PMCID: PMC11169381 DOI: 10.1038/s41598-024-64231-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2023] [Accepted: 06/06/2024] [Indexed: 06/14/2024] Open
Abstract
In the dynamic urban landscape, understanding the distribution of buildings is paramount. Extracting and delineating building footprints from high-resolution images, captured by aerial platforms or satellites, is essential but challenging to accomplish manually due to the abundance of high-resolution data. Automation becomes imperative, yet it introduces complexities related to handling diverse data sources and the computational demands of advanced algorithms. The solution proposed in this paper addresses some of the intricate challenges that occur when integrating deep learning and data fusion on Earth observation imagery. By merging RGB orthophotos with Digital Surface Models derived from the same high-resolution aerial surveys, an integrated, consistent four-band dataset is generated. This unified approach, in which height information is extracted through stereoscopy from a single source, facilitates enhanced pixel-to-pixel data fusion. Employing DeepLabv3, a state-of-the-art semantic segmentation network with multi-scale context, pixel-based segmentation was performed on the integrated dataset, excelling in capturing intricate details, particularly when enhanced by the additional height information from the Digital Surface Models acquired over urban landscapes. Evaluation over a 21 km² area in Turin, Italy, featuring diverse building frameworks, shows how the proposed approach leads to superior accuracy and building boundary refinement. Notably, the methodology discussed in this article significantly reduces training time compared to conventional approaches like U-Net, overcoming inherent challenges in automating high-resolution data processing. By establishing the effectiveness of DeepLabv3 on an integrated dataset for precise building footprint segmentation, this contribution holds promise for applications in 3D modelling, change detection, and urban planning. An approach favouring deep learning strategies on integrated high-resolution datasets can then guide decision-making and facilitate urban management tasks.
Collapse
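The RGB-plus-DSM fusion into a four-band input can be sketched with a simple channel stack; the min-max normalization of the DSM below is an assumption, as the abstract does not specify the preprocessing.

```python
import numpy as np

def fuse_rgb_dsm(rgb, dsm):
    """Stack a co-registered RGB orthophoto and a Digital Surface Model into a
    single four-band array for pixel-to-pixel fusion. The DSM is min-max scaled
    to [0, 1] here; other normalizations are equally valid."""
    rgb = rgb.astype(np.float32) / 255.0                       # [H, W, 3]
    dsm = dsm.astype(np.float32)
    dsm = (dsm - dsm.min()) / (dsm.max() - dsm.min() + 1e-6)   # [H, W]
    return np.concatenate([rgb, dsm[..., None]], axis=-1)      # [H, W, 4]

# Toy usage with random data standing in for a 512x512 tile.
fused = fuse_rgb_dsm(np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8),
                     np.random.rand(512, 512) * 30.0)
print(fused.shape)  # (512, 512, 4)
```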
Affiliation(s)
- P Dabove
- Department of Environment, Land and Infrastructure Engineering, Politecnico di Torino, Turin, Italy.
| | - M Daud
- DigiSky S.R.L., Turin, Italy
| | | |
Collapse
|
50
|
Liu X, Qu L, Xie Z, Zhao J, Shi Y, Song Z. Towards more precise automatic analysis: a systematic review of deep learning-based multi-organ segmentation. Biomed Eng Online 2024; 23:52. [PMID: 38851691 PMCID: PMC11162022 DOI: 10.1186/s12938-024-01238-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2023] [Accepted: 04/11/2024] [Indexed: 06/10/2024] Open
Abstract
Accurate segmentation of multiple organs in the head, neck, chest, and abdomen from medical images is an essential step in computer-aided diagnosis, surgical navigation, and radiation therapy. In the past few years, with a data-driven feature extraction approach and end-to-end training, automatic deep learning-based multi-organ segmentation methods have far outperformed traditional methods and become a new research topic. This review systematically summarizes the latest research in this field. We searched Google Scholar for papers published from January 1, 2016 to December 31, 2023, using keywords "multi-organ segmentation" and "deep learning", resulting in 327 papers. We followed the PRISMA guidelines for paper selection, and 195 studies were deemed to be within the scope of this review. We summarized the two main aspects involved in multi-organ segmentation: datasets and methods. Regarding datasets, we provided an overview of existing public datasets and conducted an in-depth analysis. Concerning methods, we categorized existing approaches into three major classes: fully supervised, weakly supervised and semi-supervised, based on whether they require complete label information. We summarized the achievements of these methods in terms of segmentation accuracy. In the discussion and conclusion section, we outlined and summarized the current trends in multi-organ segmentation.
Collapse
Affiliation(s)
- Xiaoyu Liu
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, People's Republic of China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Shanghai, 200032, China
| | - Linhao Qu
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, People's Republic of China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Shanghai, 200032, China
| | - Ziyue Xie
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, People's Republic of China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Shanghai, 200032, China
| | - Jiayue Zhao
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, People's Republic of China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Shanghai, 200032, China
| | - Yonghong Shi
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, People's Republic of China.
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Shanghai, 200032, China.
| | - Zhijian Song
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, People's Republic of China.
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Shanghai, 200032, China.
| |
Collapse
|