Huang J, Zhong A, Wei Y. A new visual State Space Model for low-dose CT denoising. Med Phys. 2024. PMID: 39231014. DOI: 10.1002/mp.17387.
Received: March 25, 2024; Revised: August 8, 2024; Accepted: August 19, 2024.
Abstract
BACKGROUND
Low-dose computed tomography (LDCT) reduces the potential radiation-related health risks of CT examinations. However, the severe noise and artifacts in LDCT images can impede subsequent clinical diagnosis and analysis. Convolutional neural networks (CNNs) and Transformers are the two most popular backbones for LDCT denoising. Nonetheless, CNNs lack long-range modeling capability, while Transformers are hindered by the high computational complexity of self-attention.
PURPOSE
Our goal in this study is to develop a simple and efficient LDCT denoising model that attends to local spatial context and models long-range dependencies with linear computational complexity.
METHODS
We make the first attempt to apply a State Space Model to LDCT denoising and propose a novel denoising model named the Visual Mamba Encoder-Decoder Network (ViMEDnet). To capture both local and global features efficiently and effectively, we propose the Mixed State Space Module (MSSM), in which depth-wise convolution, max-pooling, and a 2D Selective Scan Module (2DSSM) are coupled through a partial channel-splitting mechanism. The 2DSSM captures global information with linear computational complexity, while the convolution and max-pooling branches learn local signals to facilitate detail restoration. Furthermore, the network is trained with a weighted gradient-sensitive hybrid loss function that preserves image details and improves overall denoising performance.
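The abstract does not give implementation details, so the following is only a minimal PyTorch sketch of the partial channel-splitting idea behind MSSM together with a gradient-sensitive hybrid loss. The split ratio, all layer and class names, the loss weighting, and the 2DSSM internals (replaced here by a 1x1-convolution placeholder) are assumptions, not the authors' actual design.

```python
# Sketch of MSSM-style channel splitting: one channel group goes through a
# global selective-scan branch, the rest through local depth-wise convolution
# and max-pooling branches; outputs are fused and added back to the input.
import torch
import torch.nn as nn


class SelectiveScan2D(nn.Module):
    """Placeholder for the 2D Selective Scan Module (2DSSM).

    A real 2DSSM would scan the feature map along several spatial directions
    with a state space model (linear in sequence length); a 1x1 convolution
    stands in here so the sketch runs end to end.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


class MixedStateSpaceModule(nn.Module):
    """Couples 2DSSM, depth-wise convolution, and max-pooling via a channel split."""

    def __init__(self, channels: int, global_ratio: float = 0.5):
        super().__init__()
        self.c_global = int(channels * global_ratio)  # channels routed to 2DSSM
        self.c_local = channels - self.c_global       # channels kept local
        self.ssm = SelectiveScan2D(self.c_global)
        self.dwconv = nn.Conv2d(self.c_local, self.c_local, kernel_size=3,
                                padding=1, groups=self.c_local)  # depth-wise
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        xg, xl = torch.split(x, [self.c_global, self.c_local], dim=1)
        xg = self.ssm(xg)                     # global, long-range branch
        xl = self.dwconv(xl) + self.pool(xl)  # local detail branch
        return self.fuse(torch.cat([xg, xl], dim=1)) + x  # fuse + residual


class GradientSensitiveHybridLoss(nn.Module):
    """Assumed form of a weighted, gradient-sensitive hybrid loss for
    single-channel CT: an L1 pixel term plus a Sobel-gradient term that
    emphasizes edges. lambda_grad is an assumed hyperparameter.
    """

    def __init__(self, lambda_grad: float = 0.1):
        super().__init__()
        self.lambda_grad = lambda_grad
        kx = torch.tensor([[-1.0, 0.0, 1.0],
                           [-2.0, 0.0, 2.0],
                           [-1.0, 0.0, 1.0]])
        self.register_buffer("kx", kx.view(1, 1, 3, 3))
        self.register_buffer("ky", kx.t().contiguous().view(1, 1, 3, 3))

    def _grad(self, img: torch.Tensor) -> torch.Tensor:
        gx = nn.functional.conv2d(img, self.kx, padding=1)
        gy = nn.functional.conv2d(img, self.ky, padding=1)
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        pixel = nn.functional.l1_loss(pred, target)
        grad = nn.functional.l1_loss(self._grad(pred), self._grad(target))
        return pixel + self.lambda_grad * grad


if __name__ == "__main__":
    x = torch.randn(1, 32, 64, 64)            # (batch, channels, H, W)
    block = MixedStateSpaceModule(channels=32)
    print(block(x).shape)                      # torch.Size([1, 32, 64, 64])
```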
RESULTS
The performance of the proposed ViMEDnet is compared with five state-of-the-art LDCT denoising methods: one iterative algorithm, two CNN-based methods, and two Transformer-based methods. The comparative experiments demonstrate that ViMEDnet achieves better visual quality and quantitative results. In visual evaluation, ViMEDnet effectively removes noise and artifacts and excels at restoring fine structures and low-contrast structural edges, so the denoised images deviate minimally from the normal-dose CT (NDCT) references. In quantitative assessment, ViMEDnet obtains the lowest root mean square error (RMSE) and the highest peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and feature similarity (FSIM) scores, further substantiating its superiority.
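For illustration only, the short script below shows how such a per-image comparison against an NDCT reference could be computed with NumPy and scikit-image. RMSE, PSNR, and SSIM follow the abstract's metric list; FSIM is left out because scikit-image does not provide it, and the data range, array shapes, and synthetic inputs are assumptions rather than the paper's actual evaluation code.

```python
# Compute RMSE, PSNR, and SSIM for one denoised/NDCT image pair.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def evaluate_pair(denoised: np.ndarray, ndct: np.ndarray,
                  data_range: float = 1.0) -> dict:
    """Return RMSE, PSNR, and SSIM of a denoised image against its NDCT reference."""
    rmse = float(np.sqrt(np.mean((denoised - ndct) ** 2)))
    psnr = peak_signal_noise_ratio(ndct, denoised, data_range=data_range)
    ssim = structural_similarity(ndct, denoised, data_range=data_range)
    return {"RMSE": rmse, "PSNR": psnr, "SSIM": ssim}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ndct = rng.random((256, 256)).astype(np.float32)   # stand-in NDCT slice
    denoised = np.clip(ndct + 0.01 * rng.standard_normal((256, 256)), 0.0, 1.0)
    print(evaluate_pair(denoised.astype(np.float32), ndct))
```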
CONCLUSIONS
The proposed ViMEDnet delivers excellent LDCT denoising performance and offers a new alternative to existing CNN- and Transformer-based LDCT denoising models.