1
Xia Z, Wu B, Chan CY, Wu T, Zhou M, Kong LB. Deep-learning-based pyramid-transformer for localized porosity analysis of hot-press sintered ceramic paste. PLoS One 2024; 19:e0306385. [PMID: 39231159] [PMCID: PMC11373816] [DOI: 10.1371/journal.pone.0306385]
Abstract
The scanning electron microscope (SEM) is a crucial tool for studying the microstructure of ceramic materials. However, current practice relies heavily on manual effort to extract porosity from SEM images. To address this issue, we propose PSTNet (Pyramid Segmentation Transformer Net) for grain and pore segmentation in SEM images, which merges multi-scale feature maps through operations such as recombination and upsampling to predict and generate segmentation maps. These maps are used to predict the corresponding porosity at ceramic grain boundaries. To increase segmentation accuracy and minimize loss, we employ several strategies. (1) We train the micro-pore detection and segmentation model on publicly available Al2O3 and custom Y2O3 ceramic SEM images, and calculate the pixel percentage of segmented pores in SEM images to determine the surface porosity at the corresponding locations. (2) We prepared Y2O3 ceramics by high-temperature hot-press sintering, captured SEM images of them, and constructed a Y2O3 ceramic dataset through preprocessing and annotation. (3) We employ a joint loss function composed of a segmentation-penalty cross-entropy loss, a smooth L1 loss, and a structural similarity (SSIM) loss. The segmentation-penalty cross-entropy loss suppresses segmentation loss bias, the smooth L1 loss reduces image noise, and the SSIM term guides the model to better learn structural features of images, significantly improving the accuracy and robustness of semantic segmentation. (4) In the decoder stage, we use an improved multi-head attention (MHA) mechanism for feature fusion, leading to a significant enhancement in model performance.
Our model is trained on a publicly available laser-sintered Al2O3 ceramic dataset and a self-made high-temperature hot-press sintered Y2O3 ceramic dataset, and has been validated on both. Our Pix Acc score improves over the baseline by 12.2% (86.52 vs. 76.01), and the mIoU score improves by 25.5% (69.10 vs. 51.49). The average relative errors on the Y2O3 and Al2O3 datasets were 6.9% and 6.36%, respectively.
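As an illustration of step (1), the surface porosity at a location is just the pixel percentage of the predicted pore mask. A minimal sketch, assuming a binary mask convention (the function name is ours, not the paper's):

```python
import numpy as np

def surface_porosity(pore_mask: np.ndarray) -> float:
    """Surface porosity (%) as the percentage of pixels labeled as pore."""
    return 100.0 * np.count_nonzero(pore_mask) / pore_mask.size

# A 10x10 SEM patch whose predicted mask marks 7 pixels as pore
# corresponds to 7% surface porosity at that location.
mask = np.zeros((10, 10), dtype=np.uint8)
mask[0, :7] = 1
print(surface_porosity(mask))  # 7.0
```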
Affiliation(s)
- Zhongyi Xia
- College of Applied Technology, Shenzhen University, Shenzhen, Guangdong, China
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, Guangdong, China
- Boqi Wu
- Key Laboratory for Comprehensive Energy Saving of Cold Regions Architecture of Ministry of Education, Jilin Jianzhu University, Changchun, Jilin, China
- C Y Chan
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, Guangdong, China
- Tianzhao Wu
- College of Applied Technology, Shenzhen University, Shenzhen, Guangdong, China
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, Guangdong, China
- Man Zhou
- College of Applied Technology, Shenzhen University, Shenzhen, Guangdong, China
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, Guangdong, China
- Ling Bing Kong
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, Guangdong, China
2
Xia Z, Wu T, Wang Z, Zhou M, Wu B, Chan CY, Kong LB. Dense monocular depth estimation for stereoscopic vision based on pyramid transformer and multi-scale feature fusion. Sci Rep 2024; 14:7037. [PMID: 38528098] [DOI: 10.1038/s41598-024-57908-z]
Abstract
Stereoscopic display technology plays a significant role in industries such as film, television, and autonomous driving. The accuracy of depth estimation is crucial for achieving high-quality, realistic stereoscopic display effects. To address the inherent challenges of applying Transformers to depth estimation, we introduce the Stereoscopic Pyramid Transformer-Depth (SPT-Depth). This method uses stepwise downsampling to acquire both shallow and deep semantic information, which are subsequently fused. The training process is divided into fine and coarse convergence stages with distinct training strategies and hyperparameters, resulting in a substantial reduction in both training and validation losses. In the training strategy, a shift- and scale-invariant mean square error function compensates for the Transformers' lack of translational invariance. Additionally, an edge-smoothing function reduces noise in the depth map, enhancing the model's robustness. SPT-Depth achieves a global receptive field while effectively reducing time complexity. Compared with the baseline method on the New York University Depth V2 (NYU Depth V2) dataset, there is a 10% reduction in Absolute Relative Error (Abs Rel) and a 36% decrease in Root Mean Square Error (RMSE). Compared with state-of-the-art methods, there is a 17% reduction in RMSE.
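The shift- and scale-invariant mean square error mentioned above can be sketched as follows: the prediction is first aligned to the target with a least-squares scale and shift, so depth maps differing only by an affine transform incur no penalty. This is a hedged sketch of the standard formulation; the paper's exact loss may differ:

```python
import numpy as np

def ssi_mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Scale- and shift-invariant MSE: min over s, t of mean((s*pred + t - target)^2)."""
    pred, target = pred.ravel(), target.ravel()
    # Solve the 2-parameter least-squares alignment s*pred + t ~= target.
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, target, rcond=None)
    return float(np.mean((s * pred + t - target) ** 2))

# A prediction that differs from the target only by scale and shift
# incurs (numerically) zero loss.
d = np.array([1.0, 2.0, 3.0, 4.0])
print(ssi_mse(2.0 * d + 5.0, d))
```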
Affiliation(s)
- Zhongyi Xia
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, Guangdong, China
- College of Applied Technology, Shenzhen University, Shenzhen, 518000, Guangdong, China
- Tianzhao Wu
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, Guangdong, China
- College of Applied Technology, Shenzhen University, Shenzhen, 518000, Guangdong, China
- Zhuoyan Wang
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, Guangdong, China
- College of Applied Technology, Shenzhen University, Shenzhen, 518000, Guangdong, China
- Man Zhou
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, Guangdong, China
- College of Applied Technology, Shenzhen University, Shenzhen, 518000, Guangdong, China
- Boqi Wu
- Jilin Jianzhu University, Changchun, 130118, Jilin, China
- C Y Chan
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, Guangdong, China
- Ling Bing Kong
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, Guangdong, China
3
Wu Z, Liu M, Pang Y, Deng L, Yang Y, Wu Y. A Comparative Study of Deep Learning Dose Prediction Models for Cervical Cancer Volumetric Modulated Arc Therapy. Technol Cancer Res Treat 2024; 23:15330338241242654. [PMID: 38584413] [PMCID: PMC11005497] [DOI: 10.1177/15330338241242654]
Abstract
Purpose: Deep learning (DL) is widely used for dose prediction in radiation oncology, but comparisons of multiple DL techniques are often lacking in the literature. This study compares the performance of 4 state-of-the-art DL models in predicting the voxel-level dose distribution for cervical cancer volumetric modulated arc therapy (VMAT). Methods and Materials: A total of 261 cervical cancer patients' plans were retrieved in this retrospective study. A three-channel feature map, consisting of a planning target volume (PTV) mask, an organs-at-risk (OARs) mask, and the CT image, was fed into the three-dimensional (3D) U-Net and its 3 variant models. The dataset was randomly divided into 80% for training-validation and 20% for testing. Model performance was evaluated on the 52 testing patients by comparing the generated dose distributions against the clinically approved ground truth (GT) using mean absolute error (MAE), dose map difference (GT minus predicted), clinical dosimetric indices, and dice similarity coefficients (DSC). Results: The 3D U-Net and its 3 variant DL models exhibited promising performance, with a maximum MAE within the PTV of 0.83% ± 0.67% (UNETR). Among the OARs, the maximum MAE occurred in the left femoral head, reaching 6.95% ± 6.55%. For the body, the maximum MAE was 1.19% ± 0.86% (UNETR) and the minimum was 0.94% ± 0.85% (3D U-Net). The average error of the Dmean difference for the OARs is within 2.5 Gy, and the average error of the V40 difference for the bladder and rectum is about 5%. The mean DSC under different isodose volumes was above 90%. Conclusions: DL models can accurately predict the voxel-level dose distribution for cervical cancer VMAT treatment plans. All models demonstrated almost analogous performance for voxel-wise dose prediction maps; considering all voxels within the body, 3D U-Net showed the best performance.
The state-of-the-art DL models are of great significance for further clinical applications of cervical cancer VMAT.
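The per-structure MAE figures above are voxel-wise errors inside a structure mask, expressed as a percentage of the prescription dose. A minimal sketch of that computation (the function name and normalization choice are our assumptions, not the paper's code):

```python
import numpy as np

def region_mae_percent(pred: np.ndarray, gt: np.ndarray,
                       mask: np.ndarray, prescription: float) -> float:
    """Mean absolute dose error inside `mask`, as % of the prescription dose."""
    diff = np.abs(pred[mask] - gt[mask])
    return 100.0 * float(diff.mean()) / prescription

# Toy 3D dose grids: the prediction is uniformly 0.5 Gy off inside the PTV.
gt = np.full((8, 8, 8), 50.0)          # ground-truth dose (Gy)
pred = gt + 0.5                        # predicted dose
ptv = np.zeros_like(gt, dtype=bool)
ptv[2:6, 2:6, 2:6] = True              # PTV mask
print(region_mae_percent(pred, gt, ptv, prescription=50.0))  # 1.0
```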
Affiliation(s)
- Zhe Wu
- Department of Digital Medicine, School of Biomedical Engineering and Medical Imaging, Army Medical University (Third Military Medical University), Chongqing, China
- Department of Radiation Oncology, Zigong Disease Prevention and Control Center Mental Health Center, Zigong First People's Hospital, Zigong, Sichuan, China
- Mujun Liu
- Department of Digital Medicine, School of Biomedical Engineering and Medical Imaging, Army Medical University (Third Military Medical University), Chongqing, China
- Ya Pang
- Department of Radiation Oncology, Zigong Disease Prevention and Control Center Mental Health Center, Zigong First People's Hospital, Zigong, Sichuan, China
- Lihua Deng
- Department of Radiology, The First Affiliated Hospital of the Army Medical University, Chongqing, China
- Yi Yang
- Department of Digital Medicine, School of Biomedical Engineering and Medical Imaging, Army Medical University (Third Military Medical University), Chongqing, China
- Yi Wu
- Department of Digital Medicine, School of Biomedical Engineering and Medical Imaging, Army Medical University (Third Military Medical University), Chongqing, China
4
Wang K, Wang X, Xi Z, Li J, Zhang X, Wang R. Automatic Segmentation and Quantification of Abdominal Aortic Calcification in Lateral Lumbar Radiographs Based on Deep-Learning-Based Algorithms. Bioengineering (Basel) 2023; 10:1164. [PMID: 37892894] [PMCID: PMC10604574] [DOI: 10.3390/bioengineering10101164]
Abstract
To investigate the performance of deep-learning-based algorithms for the automatic segmentation and quantification of abdominal aortic calcification (AAC) in lateral lumbar radiographs, we retrospectively collected 1359 consecutive lateral lumbar radiographs. The data were randomly divided into model development and hold-out test datasets. The model development dataset was used to develop U-shaped fully convolutional network (U-Net) models to segment the landmarks of vertebrae T12-L5, the aorta, and anterior and posterior aortic calcifications. The AAC lengths were calculated, resulting in an automatic Kauppila score output. The vertebral levels, AAC scores, and AAC severity were obtained from clinical reports and analyzed by an experienced expert (reference standard) and the model. Compared with the reference standard, the U-Net model demonstrated good performance in predicting the total AAC score in the hold-out test dataset, with a correlation coefficient of 0.97 (p < 0.001). The overall accuracy for AAC severity was 0.77 for the model and 0.74 for the clinical report. Additionally, the Kendall coefficient of concordance of the total AAC score prediction was 0.89 between the model-predicted score and the reference standard, and 0.88 between the structured clinical report and the reference standard. In conclusion, the U-Net-based deep learning approach demonstrated relatively high performance in automatically segmenting and quantifying AAC.
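The Kauppila AAC-24 score referenced above grades the anterior and posterior aortic wall at each lumbar level L1-L4 from 0 to 3 by the calcified fraction of the wall length, for a total of 0-24. A sketch using the standard thresholds (less than 1/3, 1/3 to 2/3, more than 2/3 of the wall length); the paper's exact implementation is not given in the abstract:

```python
def kauppila_segment_score(calcified_len: float, wall_len: float) -> int:
    """Grade one aortic wall segment 0-3 by the calcified fraction of its length."""
    if wall_len <= 0 or calcified_len <= 0:
        return 0
    frac = calcified_len / wall_len
    if frac < 1 / 3:
        return 1
    if frac <= 2 / 3:
        return 2
    return 3

def total_aac24(segments) -> int:
    """Sum anterior + posterior grades over lumbar levels L1-L4 (range 0-24)."""
    return sum(kauppila_segment_score(c, w) for c, w in segments)

# Eight (calcified_length, wall_length) pairs: 4 levels x 2 walls.
segs = [(0, 10), (2, 10), (5, 10), (8, 10)] * 2
print(total_aac24(segs))  # 12
```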
Affiliation(s)
- Kexin Wang
- Department of Radiology, Peking University First Hospital, Beijing 100034, China
- School of Basic Medical Sciences, Capital Medical University, Beijing 100069, China
- Xiaoying Wang
- Department of Radiology, Peking University First Hospital, Beijing 100034, China
- Zuqiang Xi
- Beijing Smart Tree Medical Technology Co., Ltd., Beijing 102200, China
- Jialun Li
- Beijing Smart Tree Medical Technology Co., Ltd., Beijing 102200, China
- Xiaodong Zhang
- Department of Radiology, Peking University First Hospital, Beijing 100034, China
- Rui Wang
- Department of Radiology, Peking University First Hospital, Beijing 100034, China
5
Aslan MF. A robust semantic lung segmentation study for CNN-based COVID-19 diagnosis. Chemometrics and Intelligent Laboratory Systems 2022; 231:104695. [PMID: 36311473] [PMCID: PMC9595502] [DOI: 10.1016/j.chemolab.2022.104695]
Abstract
This paper aims to diagnose COVID-19 from chest X-ray (CXR) images with a deep-learning-based system. First, the COVID-19 Chest X-Ray Dataset is used to semantically segment the lung regions in CXR images: a DeepLabV3+ architecture is trained on the lung masks in this dataset. The trained architecture is then fed images from the COVID-19 Radiography Database, with several image preprocessing steps applied to improve the outputs. As a result, lung regions are successfully segmented from CXR images. The next step is feature extraction and classification: features are extracted with a modified AlexNet (mAlexNet), and a Support Vector Machine (SVM) performs the classification. The three classes (Normal, Viral Pneumonia, and COVID-19) are classified with 99.8% accuracy. These results show that the proposed method is superior to previous state-of-the-art methods.
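The CNN-features-into-SVM pipeline described can be sketched as follows. This is a toy stand-in: random, well-separated clusters take the place of real mAlexNet activations, and the class layout and feature dimension are illustrative only:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in for mAlexNet features: 3 classes, 40 samples each, 256-d vectors,
# with class means shifted so the clusters are linearly separable.
X = np.concatenate([rng.normal(loc=c, scale=0.5, size=(40, 256)) for c in range(3)])
y = np.repeat([0, 1, 2], 40)  # 0 = Normal, 1 = Viral Pneumonia, 2 = COVID-19

# Linear SVM on the extracted feature vectors.
clf = SVC(kernel="linear").fit(X, y)
print(clf.score(X, y))  # training accuracy; ~1.0 on these separable clusters
```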
Affiliation(s)
- Muhammet Fatih Aslan
- Electrical and Electronics Engineering, Karamanoglu Mehmetbey University, Karaman, Turkey