1
Li X, Hong Y, Xu Y, Hu M. VerFormer: Vertebrae-Aware Transformer for Automatic Spine Segmentation from CT Images. Diagnostics (Basel) 2024; 14:1859. PMID: 39272643; PMCID: PMC11393940; DOI: 10.3390/diagnostics14171859.
Abstract
The accurate and efficient segmentation of the spine is important in the diagnosis and treatment of spine malfunctions and fractures. However, it remains challenging because of large inter-vertebra variations in shape and the difficulty of localizing the spine across images. In previous methods, convolutional neural networks (CNNs) have been widely applied as a vision backbone for this task. However, these methods struggle to exploit global contextual information across the whole image for accurate spine segmentation because of the inherent locality of the convolution operation. Compared with CNNs, the Vision Transformer (ViT) has been proposed as an alternative vision backbone with a high capacity to capture global contextual information. However, when the ViT is employed for spine segmentation, it treats all input tokens equally, vertebrae-related and non-vertebrae-related alike, and lacks the capability to locate regions of interest, which lowers segmentation accuracy. To address this limitation, we propose a novel Vertebrae-aware Vision Transformer (VerFormer) for automatic spine segmentation from CT images. Our VerFormer is designed by incorporating a novel Vertebrae-aware Global (VG) block into the ViT backbone. In the VG block, vertebrae-related global contextual information is extracted by a Vertebrae-aware Global Query (VGQ) module and then incorporated into the query tokens to highlight vertebrae-related tokens in the multi-head self-attention module. The VG block can therefore leverage global contextual information to locate spines effectively and efficiently across the whole input, improving the segmentation accuracy of VerFormer. Driven by this design, VerFormer demonstrates a solid capacity to capture discriminative dependencies and vertebrae-related context in automatic spine segmentation. Experimental results on two spine CT segmentation tasks demonstrate the effectiveness of our VG block and the superiority of our VerFormer in spine segmentation. Compared with other popular CNN- or ViT-based segmentation models, our VerFormer shows superior segmentation accuracy and generalization.
Affiliation(s)
- Xinchen Li
- Department of Orthopedics, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Yuan Hong
- Department of Orthopedics, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Yang Xu
- Department of Orthopedics, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Mu Hu
- Department of Orthopedics, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
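To make the query-modulation idea concrete, here is a minimal PyTorch sketch of self-attention whose queries are biased by a pooled global context vector. The gated pooling used to build `global_ctx` is a hypothetical stand-in for the paper's VGQ module, whose internals the abstract does not specify.

```python
import torch
import torch.nn as nn

class VertebraeAwareAttention(nn.Module):
    """Self-attention whose queries are biased by a learned global context
    vector, so vertebra-like tokens are highlighted. A sketch of the idea
    only, not the published VerFormer architecture."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Hypothetical stand-in for the VGQ module: a gated token summary.
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        weights = self.gate(x)                                # per-token relevance
        global_ctx = (weights * x).mean(dim=1, keepdim=True)  # (batch, 1, dim)
        q = x + global_ctx                # inject global context into queries
        out, _ = self.attn(q, x, x)      # keys and values stay unmodified
        return out

# Usage: feats = VertebraeAwareAttention(dim=256)(torch.randn(2, 196, 256))
```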
2
Lu S, Liu J, Wang X, Zhou Y. Collaborative Multi-Metadata Fusion to Improve the Classification of Lumbar Disc Herniation. IEEE Trans Med Imaging 2023; 42:3590-3601. PMID: 37432809; DOI: 10.1109/tmi.2023.3294248.
Abstract
Computed tomography (CT) images are the most commonly used radiographic imaging modality for detecting and diagnosing lumbar diseases. Despite many outstanding advances, computer-aided diagnosis (CAD) of lumbar disc disease remains challenging due to the complexity of pathological abnormalities and poor discrimination between different lesions. We therefore propose a Collaborative Multi-Metadata Fusion classification network (CMMF-Net) to address these challenges. The network consists of a feature-selection model and a classification model. We propose a novel Multi-scale Feature Fusion (MFF) module that improves the network's ability to learn the edges of the region of interest (ROI) by fusing features of different scales and dimensions. We also propose a new loss function to improve convergence to the internal and external edges of the intervertebral disc. Subsequently, we use the ROI bounding box from the feature-selection model to crop the original image and compute a distance feature matrix. We then concatenate the cropped CT images, multi-scale fusion features, and distance feature matrices and feed them into the classification network, which outputs the classification results and a class activation map (CAM). Finally, the CAM at the original image size is returned to the feature-selection network during upsampling to achieve collaborative model training. Extensive experiments demonstrate the effectiveness of our method. The model achieves 91.32% accuracy in the lumbar spine disease classification task; in the labelled lumbar disc segmentation task, the Dice coefficient reaches 94.39%; and the classification accuracy on the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset reaches 91.82%.
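The multi-scale fusion step lends itself to a short sketch: encoder features from several stages are resampled to a common resolution, concatenated, and projected. This is a generic fusion pattern under assumed channel counts, not the published MFF design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Fuse encoder features of different scales into one map (illustrative)."""

    def __init__(self, in_channels=(64, 128, 256), out_channels=64):
        super().__init__()
        self.proj = nn.Conv2d(sum(in_channels), out_channels, kernel_size=1)

    def forward(self, feats):
        # feats: list of (B, C_i, H_i, W_i) maps, finest-resolution map first.
        target = feats[0].shape[-2:]  # fuse at the resolution of the first map
        up = [F.interpolate(f, size=target, mode="bilinear", align_corners=False)
              for f in feats]
        return self.proj(torch.cat(up, dim=1))
```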
3
Choi Y, Jang H, Baek J. Chest tomosynthesis deblurring using CNN with deconvolution layer for vertebrae segmentation. Med Phys 2023; 50:7714-7730. PMID: 37401539; DOI: 10.1002/mp.16576.
Abstract
BACKGROUND Limited scan angles cause severe distortions and artifacts in tomosynthesis images reconstructed with the Feldkamp-Davis-Kress (FDK) algorithm, which degrades clinical diagnostic performance. These blurring artifacts are especially harmful in chest tomosynthesis because precise vertebrae segmentation is crucial for various diagnostic analyses, such as early diagnosis, surgical planning, and injury detection. Moreover, because most spinal pathologies are related to vertebral conditions, developing methods for accurate and objective vertebrae segmentation in medical images is an important and challenging research area. PURPOSE Existing point-spread-function (PSF)-based deblurring methods use the same PSF in all sub-volumes without considering the spatially varying property of tomosynthesis images, which increases the PSF estimation error and degrades deblurring performance. The proposed method instead estimates the PSF more accurately with sub-CNNs that contain a deconvolution layer for each sub-system, improving deblurring performance. METHODS To minimize the effect of the spatially varying property, the proposed deblurring network architecture comprises four modules: (1) a block division module, (2) a partial PSF module, (3) a deblurring block module, and (4) an assembling block module. We compared the proposed deep-learning-based method with the FDK algorithm, total-variation iterative reconstruction with GP-BB (TV-IR), 3D U-Net, FBPConvNet, and a two-phase deblurring method. To investigate deblurring performance, we evaluated vertebrae segmentation by comparing the pixel accuracy (PA), intersection-over-union (IoU), and F-score values of reference images with those of the deblurred images. Pixel-based evaluations of the reference and deblurred images were also performed by comparing their root mean squared error (RMSE) and visual information fidelity (VIF) values. In addition, 2D analysis of the deblurred images was performed using the artifact spread function (ASF) and the full width at half maximum (FWHM) of the ASF curve. RESULTS The proposed method recovered the original structure significantly better, further improving image quality, and yielded the best deblurring performance in terms of vertebrae segmentation and similarity. The IoU, F-score, and VIF values of chest tomosynthesis images reconstructed using the proposed spatially varying (SV) method were 53.5%, 28.7%, and 63.2% higher, respectively, than those of images reconstructed using the FDK method, and the RMSE value was 80.3% lower. These quantitative results indicate that the proposed method effectively restores both the vertebrae and the surrounding soft tissue. CONCLUSIONS We propose a chest tomosynthesis deblurring technique for vertebrae segmentation that accounts for the spatially varying property of tomosynthesis systems. Quantitative evaluations indicated that its vertebrae segmentation performance surpasses that of existing deblurring methods.
Affiliation(s)
- Yunsu Choi
- School of Integrated Technology, Yonsei University, Incheon, South Korea
- Hanjoo Jang
- School of Integrated Technology, Yonsei University, Incheon, South Korea
- Jongduk Baek
- Department of Artificial Intelligence, College of Computing, Yonsei University, Incheon, South Korea
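The spatially varying idea can be approximated with the following sketch: the image is split into sub-blocks, each deblurred by its own small CNN ending in a transposed-convolution ("deconvolution") layer, and the outputs are reassembled. Block size, network depth, and the per-position sub-CNN dictionary are assumptions for illustration, not the published design.

```python
import torch
import torch.nn as nn

class SubBlockDeblur(nn.Module):
    """One sub-CNN ending in a deconvolution layer, for a single sub-block."""
    def __init__(self, ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, ch, 4, stride=2, padding=1),  # deconvolution
        )
    def forward(self, x):
        return self.net(x)

def deblur_spatially_varying(image, subnets, block=64):
    """Divide the image into blocks, deblur each with its own trained
    sub-CNN (keyed by block position), and reassemble the result.
    Assumes `block` evenly divides the image size."""
    h, w = image.shape[-2:]
    out = torch.zeros_like(image)
    for i in range(0, h, block):
        for j in range(0, w, block):
            sub = image[..., i:i + block, j:j + block]
            out[..., i:i + block, j:j + block] = subnets[(i, j)](sub)
    return out
```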
4
You X, Gu Y, Liu Y, Lu S, Tang X, Yang J. VerteFormer: A single-staged Transformer network for vertebrae segmentation from CT images with arbitrary field of views. Med Phys 2023; 50:6296-6318. PMID: 37211910; DOI: 10.1002/mp.16467.
Abstract
BACKGROUND Spinal diseases burden an increasing number of patients, and fully automatic vertebrae segmentation from CT images with arbitrary fields of view (FOVs) is fundamental to computer-assisted diagnosis of spinal disease and surgical intervention. Researchers have therefore worked on this challenging task in recent years. PURPOSE The task suffers from intra-vertebra segmentation inconsistency and poor identification of the biterminal vertebrae in CT scans. Existing models also have limitations: they can be difficult to apply to spinal cases with arbitrary FOVs, or they employ multi-stage networks with high computational cost. In this paper, we propose a single-staged model called VerteFormer that effectively addresses these challenges and limitations. METHODS The proposed VerteFormer utilizes the advantage of the Vision Transformer (ViT), which excels at mining global relations in input data. The Transformer- and UNet-based structure effectively fuses global and local features of vertebrae. Besides, we propose an Edge Detection (ED) block, based on convolution and self-attention, to separate neighboring vertebrae with clear boundary lines; it simultaneously promotes more consistent segmentation masks. To better identify vertebral labels, particularly for the biterminal vertebrae, we further introduce global information generated by a Global Information Extraction (GIE) block. RESULTS We evaluate the proposed model on two public datasets: MICCAI Challenge VerSe 2019 and 2020. VerteFormer achieves Dice scores of 86.39% and 86.54% on the public and hidden test sets of VerSe 2019, and 84.53% and 86.86% on VerSe 2020, outperforming other Transformer-based models and single-staged methods specifically designed for the VerSe Challenge. Additional ablation experiments validate the effectiveness of the ViT block, the ED block, and the GIE block. CONCLUSIONS We propose a single-staged Transformer-based model for fully automatic vertebrae segmentation from CT images with arbitrary FOVs. The ViT demonstrates its effectiveness in modeling long-range relations, and the ED and GIE blocks improve segmentation performance. The proposed model can assist physicians in the diagnosis of spinal diseases and in surgical intervention, and is promising for generalization and transfer to other medical imaging applications.
Affiliation(s)
- Xin You
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
- Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China
- Yun Gu
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
- Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China
- Yingying Liu
- Research, Technology and Clinical, Medtronic Technology Center, Shanghai, China
- Steve Lu
- Visualization and Robotics, Medtronic Technology Center, Shanghai, China
- Xin Tang
- Research, Technology and Clinical, Medtronic Technology Center, Shanghai, China
- Jie Yang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
- Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China
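As a rough illustration of an edge block that mixes convolution and self-attention, the sketch below fuses a local convolutional branch with a global attention branch into a boundary map. The additive fusion and layer sizes are assumptions, not the published ED block.

```python
import torch
import torch.nn as nn

class EdgeBlock(nn.Module):
    """Conv branch (local edges) + self-attention branch (global context),
    fused into a per-pixel boundary logit map. Illustrative only."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Conv2d(dim, 1, 1)  # boundary logit per pixel

    def forward(self, x):                 # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, C)
        glob, _ = self.attn(tokens, tokens, tokens)  # global context branch
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return self.head(self.conv(x) + glob)        # fused edge map
```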
5
CT-Based Automatic Spine Segmentation Using Patch-Based Deep Learning. Int J Intell Syst 2023. DOI: 10.1155/2023/2345835.
Abstract
CT vertebral segmentation plays an essential role in various clinical applications, such as computer-assisted surgical interventions and assessment of spinal abnormalities and vertebral compression fractures. Automatic CT vertebral segmentation is challenging due to the overlapping shadows of thoracoabdominal structures such as the lungs, bony structures such as the ribs, and other issues such as ambiguous object borders, complicated spine architecture, patient variability, and fluctuations in image contrast. Deep learning is an emerging technique for disease diagnosis in the medical field. This study proposes a patch-based deep learning approach that extracts discriminative features from unlabeled data using a stacked sparse autoencoder (SSAE). 2D slices from a CT volume are divided into overlapping patches that are fed into the model for training. A random undersampling (RUS) module is applied to balance the training data by selecting a subset of the majority class. The SSAE uses pixel intensities alone to learn high-level features that distinguish image patches. Each image is subjected to a sliding-window operation to express image patches with the autoencoder's high-level features, which are then fed into a sigmoid layer to classify whether each patch is a vertebra or not. We validate our approach on three diverse publicly available datasets: VerSe, CSI-Seg, and the Lumbar CT dataset. After configuration optimization, our proposed method outperformed other models, achieving 89.9% precision, 90.2% recall, 98.9% accuracy, 90.4% F-score, 82.6% intersection over union (IoU), and 90.2% Dice coefficient (DC). The results demonstrate that the model performs consistently across a variety of validation strategies and is flexible, fast, and generalizable, making it suitable for clinical application.
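A compact sketch of the patch pipeline follows: one autoencoder layer learns features from patch intensities (an L1 activity penalty stands in for the usual KL-divergence sparsity term), and a sigmoid layer classifies each encoded patch. The patch size and layer widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

PATCH = 32  # assumed patch size

class SparseAE(nn.Module):
    """One autoencoder layer; stacking two of these gives an SSAE."""
    def __init__(self, n_in=PATCH * PATCH, n_hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.dec = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        h = self.enc(x)          # high-level features from pixel intensities
        return self.dec(h), h

def sparse_loss(recon, x, h, beta=1e-3):
    # Reconstruction error plus an L1 activity penalty standing in for the
    # usual KL-divergence sparsity term.
    return nn.functional.mse_loss(recon, x) + beta * h.abs().mean()

# After unsupervised pretraining, the encoded features of each sliding-window
# patch feed a sigmoid classifier: vertebra vs. non-vertebra.
classifier = nn.Sequential(nn.Linear(256, 1), nn.Sigmoid())
```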
6
Automatic vertebrae localization and segmentation in CT with a two-stage Dense-U-Net. Sci Rep 2021; 11:22156. PMID: 34772972; PMCID: PMC8589948; DOI: 10.1038/s41598-021-01296-1.
Abstract
Automatic vertebrae localization and segmentation in computed tomography (CT) are fundamental for spinal image analysis and computer-assisted spine surgery, but they remain challenging due to the high variation in spinal anatomy among patients. In this paper, we propose a deep-learning approach to automatic CT vertebrae localization and segmentation with a two-stage Dense-U-Net. The first stage uses a 2D-Dense-U-Net to localize vertebrae by detecting vertebrae centroids from dense labels and 2D slices. The second stage segments the specific vertebra within a region of interest identified from the centroid, using a 3D-Dense-U-Net. Finally, each segmented vertebra is merged into a complete spine and resampled to the original resolution. We evaluated our method on the dataset from the CSI 2014 Workshop with six metrics: location error (1.69 ± 0.78 mm) and detection rate (100%) for vertebrae localization; Dice coefficient (0.953 ± 0.014), intersection over union (0.911 ± 0.025), Hausdorff distance (4.013 ± 2.128 mm), and pixel accuracy (0.998 ± 0.001) for vertebrae segmentation. The experimental results demonstrate the efficiency of the proposed method. Furthermore, evaluation on the dataset from the xVertSeg challenge, with location error (4.12 ± 2.31 mm), detection rate (100%), and Dice coefficient (0.877 ± 0.035), shows the generalizability of our method. In summary, our solution localizes vertebrae by detecting their centroids and performs instance segmentation of the vertebrae in the whole spine.
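The two-stage flow reduces to a short orchestration sketch; `net2d` and `net3d` are hypothetical placeholders for the trained 2D- and 3D-Dense-U-Nets, and the ROI half-size, peak threshold, and peak-detection window are assumptions.

```python
import numpy as np
from scipy import ndimage

def find_centroids(heatmap, thr=0.5):
    """Peaks above `thr` that equal the local maximum in a 5-voxel window."""
    peaks = ndimage.maximum_filter(heatmap, size=5) == heatmap
    return np.argwhere(peaks & (heatmap > thr))

def segment_spine(volume, net2d, net3d, half=64):
    """Stage 1: slice-wise centroid heatmaps. Stage 2: per-vertebra 3D
    segmentation inside a centroid-centred ROI, merged into one label map."""
    heatmap = np.stack([net2d(volume[z]) for z in range(volume.shape[0])])
    labels = np.zeros(volume.shape, dtype=np.int16)
    for k, c in enumerate(find_centroids(heatmap), start=1):
        lo = np.maximum(c - half, 0)                  # clamp ROI to volume
        hi = np.minimum(c + half, volume.shape)
        roi = volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
        mask = net3d(roi) > 0.5                       # binary vertebra mask
        region = labels[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
        region[mask] = k                              # merge into whole spine
    return labels
```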
7
Khandelwal P, Collins DL, Siddiqi K. Spine and Individual Vertebrae Segmentation in Computed Tomography Images Using Geometric Flows and Shape Priors. Front Comput Sci 2021. DOI: 10.3389/fcomp.2021.592296.
Abstract
The surgical treatment of injuries to the spine often requires the placement of pedicle screws. To prevent damage to nearby blood vessels and nerves, the individual vertebrae and their surrounding tissue must be precisely localized. To aid surgical planning in this context, we present a clinically applicable geometric-flow-based method to segment the human spinal column from computed tomography (CT) scans. We first apply anisotropic diffusion and flux computation to mitigate the effects of region inhomogeneities and partial volume effects at vertebral boundaries in such data. The first pipeline of our segmentation approach uses a region-based geometric flow, requires only a single manually identified seed point to initiate, and runs efficiently on a multi-core central processing unit (CPU). A shape-prior formulation is employed in a separate second pipeline to segment individual vertebrae, using both region- and boundary-based terms to augment the initial segmentation. We validate our method on four clinical databases, each with a distinct intensity distribution. Our approach obviates the need for manual segmentation, significantly reduces inter- and intra-observer differences, runs in times compatible with a clinical workflow, achieves Dice scores comparable to the state of the art, and yields precise vertebral surfaces well within the 2 mm mark acceptable for surgical interventions.
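For readers unfamiliar with region-based geometric flows, the following is a generic Chan-Vese-style level-set update in NumPy. It only illustrates the class of method; it is not the authors' flow, and the time step and smoothness weight are assumptions (both regions are assumed non-empty).

```python
import numpy as np

def level_set_step(phi, img, dt=0.1, mu=0.2):
    """One explicit step of a region-based geometric flow on level set phi
    (phi > 0 inside the evolving region)."""
    inside, outside = phi > 0, phi <= 0
    c1, c2 = img[inside].mean(), img[outside].mean()   # region means
    # Curvature of the level sets (smoothness / boundary term).
    gy, gx = np.gradient(phi)
    norm = np.sqrt(gx**2 + gy**2) + 1e-8
    div_y, _ = np.gradient(gy / norm)
    _, div_x = np.gradient(gx / norm)
    curvature = div_x + div_y
    # Region force: grow where the pixel matches the inside mean better.
    region = (img - c2) ** 2 - (img - c1) ** 2
    return phi + dt * (mu * curvature + region) * norm
```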
8
Novikov AA, Major D, Wimmer M, Lenis D, Buhler K. Deep Sequential Segmentation of Organs in Volumetric Medical Scans. IEEE Trans Med Imaging 2019; 38:1207-1215. PMID: 30452352; DOI: 10.1109/tmi.2018.2881678.
Abstract
Segmentation in 3-D scans plays an increasingly important role in current clinical practice, supporting diagnosis, tissue quantification, and treatment planning. Current 3-D approaches based on convolutional neural networks usually suffer from at least three main issues caused predominantly by implementation constraints: first, they require resizing the volume to lower-resolution reference dimensions; second, their capacity is very limited due to memory restrictions; and third, all slices of a volume must be available at any given training or testing time. We address these problems with a U-Net-like architecture consisting of bidirectional convolutional long short-term memory and convolutional, pooling, upsampling, and concatenation layers enclosed in time-distributed wrappers. Our network can either process full volumes sequentially or segment slabs of slices on demand. We demonstrate the performance of our architecture on vertebrae and liver segmentation tasks in 3-D computed tomography scans.
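The key ingredients named above map directly onto Keras primitives; the fragment below is a minimal sketch (not the published architecture) showing time-distributed convolutions over a slab of slices and a bidirectional ConvLSTM along the slice axis. The input size and channel counts are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Input: a slab of consecutive slices, shape (time, H, W, 1).
inp = tf.keras.Input(shape=(None, 256, 256, 1))
x = layers.TimeDistributed(layers.Conv2D(32, 3, padding="same",
                                         activation="relu"))(inp)
x = layers.TimeDistributed(layers.MaxPooling2D())(x)
# The bidirectional ConvLSTM propagates context along the slice axis.
x = layers.Bidirectional(
    layers.ConvLSTM2D(32, 3, padding="same", return_sequences=True))(x)
x = layers.TimeDistributed(layers.UpSampling2D())(x)
out = layers.TimeDistributed(layers.Conv2D(1, 1, activation="sigmoid"))(x)
model = tf.keras.Model(inp, out)  # per-slice masks for the whole slab
```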
9
Kainz P, Pfeiffer M, Urschler M. Segmentation and classification of colon glands with deep convolutional neural networks and total variation regularization. PeerJ 2017; 5:e3874. PMID: 29018612; PMCID: PMC5629961; DOI: 10.7717/peerj.3874.
Abstract
Segmentation of histopathology sections is a necessary preprocessing step in digital pathology. Due to the large variability of biological tissue, machine learning techniques have shown superior performance over conventional image processing methods. Here we present our deep neural network-based approach for segmentation and classification of glands in tissue of benign and malignant colorectal cancer, developed to participate in the GlaS@MICCAI2015 colon gland segmentation challenge. We use two distinct deep convolutional neural networks (CNN) for pixel-wise classification of Hematoxylin-Eosin stained images. While the first classifier separates glands from background, the second classifier identifies gland-separating structures. In a subsequent step, a figure-ground segmentation based on weighted total variation produces the final segmentation result by regularizing the CNN predictions. We present both quantitative and qualitative segmentation results on the recently released, publicly available Warwick-QU colon adenocarcinoma dataset associated with the GlaS@MICCAI2015 challenge and compare our approach with the other approaches developed concurrently for the same challenge. On two test sets, we demonstrate our segmentation performance and achieve tissue classification accuracies of 98% and 95%, making use of the inherent capability of our system to distinguish between benign and malignant tissue. Our results show that deep learning approaches can yield highly accurate and reproducible results for biomedical image analysis, with the potential to significantly improve the quality and speed of medical diagnoses.
Affiliation(s)
- Philipp Kainz
- Institute of Biophysics, Center for Physiological Medicine, Medical University of Graz, Graz, Austria
- Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
- Michael Pfeiffer
- Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
- Martin Urschler
- Ludwig Boltzmann Institute for Clinical Forensic Imaging, Graz, Austria
- Institute for Computer Graphics and Vision, Graz University of Technology, Graz, Austria
- BioTechMed-Graz, Graz, Austria
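The final regularization step can be imitated with off-the-shelf total-variation denoising of the CNN probability maps, as sketched below. `denoise_tv_chambolle` is a standard scikit-image routine; the subtraction of the separator map and the weight value are assumptions, since the paper uses a weighted-TV formulation rather than this exact call.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

def regularize_predictions(gland_prob, sep_prob, weight=0.1):
    """Combine gland and gland-separator CNN outputs, then smooth the
    result with total-variation regularization before thresholding."""
    combined = np.clip(gland_prob - sep_prob, 0.0, 1.0)  # suppress separators
    smooth = denoise_tv_chambolle(combined, weight=weight)
    return smooth > 0.5  # final binary segmentation mask
```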
10
Hanaoka S, Masutani Y, Nemoto M, Nomura Y, Miki S, Yoshikawa T, Hayashi N, Ohtomo K, Shimizu A. Landmark-guided diffeomorphic demons algorithm and its application to automatic segmentation of the whole spine and pelvis in CT images. Int J Comput Assist Radiol Surg 2016; 12:413-430. DOI: 10.1007/s11548-016-1507-z.
11
Castro-Mateos I, Pozo JM, Pereañez M, Lekadir K, Lazary A, Frangi AF. Statistical Interspace Models (SIMs): Application to Robust 3D Spine Segmentation. IEEE Trans Med Imaging 2015; 34:1663-1675. PMID: 26080379; DOI: 10.1109/tmi.2015.2443912.
Abstract
Statistical shape models (SSMs) are used to introduce shape priors in the segmentation of medical images. However, such models require large training datasets for multi-object structures, since they must capture not only the individual shape variations but also the relative positions and orientations among objects. A solution to this limitation is to model each individual shape independently, but this approach does not take into account the relative positions, orientations, and shapes among the parts of an articulated object, which may result in unrealistic geometries such as object overlaps. In this article, we propose a new statistical model, the Statistical Interspace Model (SIM), which provides information about the interaction of all the individual structures by modeling the interspace between them. The SIM is described using relative position vectors between pairs of points that belong to different objects facing each other. These vectors are divided into their magnitude and direction, with each group modeled as an independent manifold. The SIM was included in a segmentation framework that contains an SSM per individual object. This framework was tested on three distinct datasets of CT images of the spine. Results show that the SIM completely eliminated inter-process overlap while improving segmentation accuracy.
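A small NumPy sketch of the interspace description: relative position vectors between facing points of two neighboring surfaces are decomposed into magnitudes and unit directions, each of which a SIM would model statistically. The nearest-point pairing is an assumed simplification of the paper's facing-point correspondence.

```python
import numpy as np

def interspace_features(pts_a, pts_b):
    """Relative position vectors between facing points of two objects,
    decomposed into magnitude and direction (modeled separately in a SIM)."""
    # Pair each point on surface A with its nearest point on surface B
    # (a simplification of the paper's facing-point correspondence).
    d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=2)
    vecs = pts_b[d.argmin(axis=1)] - pts_a          # (N, 3) interspace vectors
    mags = np.linalg.norm(vecs, axis=1)             # magnitudes
    dirs = vecs / (mags[:, None] + 1e-12)           # unit directions
    return mags, dirs  # fit separate statistical models to each group
```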