Szentimrey Z, Al-Hayali A, de Ribaupierre S, Fenster A, Ukwatta E. Semi-supervised learning framework with shape encoding for neonatal ventricular segmentation from 3D ultrasound. Med Phys 2024; 51:6134-6148. [PMID: 38857570 DOI: 10.1002/mp.17242]
[Received: 11/02/2023] [Revised: 05/27/2024] [Accepted: 05/27/2024] [Indexed: 06/12/2024]
Abstract
BACKGROUND
Three-dimensional (3D) ultrasound (US) imaging has shown promise for non-invasive monitoring of changes in the lateral brain ventricles of neonates with intraventricular hemorrhage. Because anatomical boundaries are poorly defined and the signal-to-noise ratio is low, fully supervised methods for segmenting the lateral ventricles in 3D US images require a large dataset of images annotated by trained physicians, which is tedious, time-consuming, and expensive to produce. Training fully supervised segmentation methods on a small dataset may lead to overfitting and hence reduce their generalizability. Semi-supervised learning (SSL) methods for 3D US segmentation may be able to address these challenges, but most existing SSL methods have been developed for magnetic resonance or computed tomography (CT) images.
PURPOSE
To develop a fast, lightweight, and accurate SSL method, specifically for 3D US images, that will use unlabeled data towards improving segmentation performance.
METHODS
We propose an SSL framework that leverages the shape-encoding ability of an autoencoder network to enforce complex shape and size constraints on a 3D U-Net segmentation model. The autoencoder created pseudo-labels, based on the 3D U-Net predicted segmentations, that enforce shape constraints. An adversarial discriminator network then determined whether images came from the labeled or unlabeled data distributions. We used 887 3D US images, of which 87 had manually annotated labels and 800 were unlabeled. Training/validation/testing sets of 25/12/50, 25/12/25, and 50/12/25 images were used for model experimentation. The Dice similarity coefficient (DSC), mean absolute surface distance (MAD), and absolute volumetric difference (VD) were used as metrics for comparison with other benchmarks. The baseline benchmark was the fully supervised vanilla 3D U-Net, while the dual task consistency, shape-aware semi-supervised network, correlation-aware mutual learning, and 3D U-Net ensemble models served as state-of-the-art benchmarks. The Wilcoxon signed-rank test was used to test for statistically significant differences between algorithms in DSC and VD, with a threshold of p < 0.05, corrected to p < 0.01 using the Bonferroni correction. The random-access memory (RAM) trace and the number of trainable parameters were used to compare computing efficiency between models.
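As an illustration of two of the evaluation metrics named above, the following is a minimal sketch of how DSC and absolute VD are commonly computed from binary segmentation masks. It is not the authors' implementation: the function names, the `voxel_volume_mm3` parameter, the empty-mask convention, and the toy volumes are all assumptions made for this example.

```python
import numpy as np


def dice_similarity(pred, truth):
    """Dice similarity coefficient (DSC) between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    overlap = np.logical_and(pred, truth).sum()
    return float(2.0 * overlap / denom)


def absolute_volume_difference(pred, truth, voxel_volume_mm3=1.0):
    """Absolute volumetric difference (VD), in units of voxel_volume_mm3."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    voxel_diff = abs(int(pred.sum()) - int(truth.sum()))
    return voxel_diff * voxel_volume_mm3


# Toy 4x4x4 volumes (hypothetical data, not from the study).
truth = np.zeros((4, 4, 4), dtype=bool)
truth[1:3, 1:3, 1:3] = True   # 8 foreground voxels
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:4] = True    # 12 foreground voxels, 8 of them overlapping

print(dice_similarity(pred, truth))             # 2*8 / (12+8) = 0.8
print(absolute_volume_difference(pred, truth))  # |12 - 8| * 1.0 = 4.0
```

In practice the voxel volume would come from the image spacing so that VD is reported in physical units; MAD is omitted here because it requires surface extraction and distance computation.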
RESULTS
Relative to the baseline 3D U-Net model, our shape-encoding SSL method achieved a mean DSC improvement of 6.5%, 7.7%, and 4.1% with a 95% confidence interval of 4.2%, 5.7%, and 2.1% using image data splits of 25/12/50, 25/12/25, and 50/12/25, respectively. Our method required only 1 GB more RAM than the baseline 3D U-Net and less than half the RAM and trainable parameters of the 3D U-Net ensemble method.
CONCLUSIONS
Based on our extensive literature survey, this is one of the first reported works to propose an SSL method designed for segmenting organs in 3D US images and specifically one that incorporates unlabeled data for segmenting neonatal cerebral lateral ventricles. When compared to the state-of-the-art SSL and fully supervised learning methods, our method yielded the highest DSC and lowest VD while being computationally efficient.