1
Yang P, Qiu H, Yang X, Wang L, Wang X. SAGL: A self-attention-based graph learning framework for predicting survival of colorectal cancer patients. Comput Methods Programs Biomed 2024; 249:108159. [PMID: 38583291] [DOI: 10.1016/j.cmpb.2024.108159]
Abstract
BACKGROUND AND OBJECTIVE Colorectal cancer (CRC) is one of the most commonly diagnosed cancers worldwide. Accurate survival prediction for CRC patients plays a significant role in the formulation of treatment strategies. Recently, machine learning and deep learning approaches have been increasingly applied in cancer survival prediction. However, most existing methods inadequately represent and leverage the dependencies among features and fail to sufficiently mine and utilize the comorbidity patterns of CRC. To address these issues, we propose a self-attention-based graph learning (SAGL) framework to improve postoperative cancer-specific survival prediction for CRC patients. METHODS We present a novel method for constructing a dependency graph (DG) that reflects two types of dependencies: comorbidity-comorbidity dependencies and the dependencies between features related to patient characteristics and cancer treatments. This graph is subsequently refined by a disease comorbidity network, which offers a holistic view of the comorbidity patterns of CRC. A DG-guided self-attention mechanism is proposed to unearth novel dependencies beyond what the DG offers, thus augmenting CRC survival prediction. Finally, each patient is represented, and these representations are used for survival prediction. RESULTS The experimental results show that SAGL outperforms state-of-the-art methods on a real-world dataset, with the area under the receiver operating characteristic curve for 3- and 5-year survival prediction reaching 0.849±0.002 and 0.895±0.005, respectively. In addition, comparison results with different graph neural network-based variants demonstrate the advantages of our DG-guided self-attention graph learning framework. CONCLUSIONS Our study reveals the potential of DG-guided self-attention in optimizing feature graph learning, which can improve the performance of CRC survival prediction.
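The core mechanism this abstract describes, self-attention guided by a dependency graph, can be illustrated with a minimal sketch. This is a generic single-head formulation with hypothetical names and random projections, not the authors' SAGL implementation: node pairs absent from the dependency graph receive a large negative score before the softmax, so their attention weight is effectively zero.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def graph_guided_attention(X, A, mask_bias=-1e9):
    """Single-head self-attention over feature nodes, where attention
    between nodes i and j is only allowed if A[i, j] == 1 (or i == j).
    X: (n, d) node features; A: (n, n) 0/1 dependency-graph adjacency."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    allowed = (A + np.eye(n)) > 0          # self-loops always allowed
    scores = np.where(allowed, scores, mask_bias)
    return softmax(scores, axis=-1) @ V
```

A learned variant would replace the fixed mask with a bias so the model can still discover dependencies beyond the graph, which is closer in spirit to what "DG-guided" suggests.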
Affiliation(s)
- Ping Yang
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Hang Qiu
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, PR China; Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, 611731, PR China.
- Xulin Yang
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Liya Wang
- Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Xiaodong Wang
- Department of Gastrointestinal Surgery, West China Hospital, Sichuan University, Chengdu, 610041, PR China.
2
Qian Z, Wang Z, Zhang X, Wei B, Lai M, Shou J, Fan Y, Xu Y. MSNSegNet: attention-based multi-shape nuclei instance segmentation in histopathology images. Med Biol Eng Comput 2024; 62:1821-1836. [PMID: 38401007] [DOI: 10.1007/s11517-024-03050-x]
Abstract
In clinical research, the segmentation of irregularly shaped nuclei, particularly in mesenchymal areas such as fibroblasts, is crucial yet often neglected. These irregular nuclei are significant for assessing tissue repair in immunotherapy, a process involving neovascularization and fibroblast proliferation. Proper segmentation of these nuclei is vital for evaluating immunotherapy's efficacy, as it provides insights into pathological features. However, the challenge lies in the pronounced curvature variations of these non-convex nuclei, making their segmentation more difficult than that of regular nuclei. In this work, we introduce a previously undefined task of segmenting nuclei with both regular and irregular morphology, namely multi-shape nuclei segmentation, and propose a proposal-based method to perform it. By leveraging the two-stage structure of the proposal-based method, a powerful refinement module with high computational costs can be selectively deployed only in local regions, improving segmentation accuracy without compromising computational efficiency. In the second stage, we introduce a novel self-attention module to refine features in proposals for both effectiveness and efficiency. The self-attention module improves segmentation performance by capturing long-range dependencies that help distinguish the foreground from the background: similar features receive high attention weights while dissimilar ones receive low attention weights. In the first stage, we introduce a residual attention module and a semantic-aware module to accurately predict candidate proposals; the two modules capture more interpretable features and introduce additional supervision through a semantic-aware loss. In addition, we construct a dataset with a higher proportion of non-convex nuclei than existing nuclei datasets, namely the multi-shape nuclei (MsN) dataset.
Our MSNSegNet method demonstrates notable improvements across various metrics compared to the second-highest-scoring methods. For all nuclei, the Dice score improved by approximately 1.66%, AJI by about 2.15%, and Dice_obj by roughly 0.65%. For non-convex nuclei, which are crucial in clinical applications, our method's AJI improved significantly by approximately 3.86% and Dice_obj by around 2.54%. These enhancements underscore the effectiveness of our approach on multi-shape nuclei segmentation, particularly in challenging scenarios involving irregularly shaped nuclei.
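The intuition stated in this abstract, that similar features get high attention weights while dissimilar ones get low weights, can be made concrete with a projection-free toy version of self-attention. This is an illustrative sketch, not the MSNSegNet module:

```python
import numpy as np

def similarity_attention(X):
    """Projection-free self-attention: attention weights come directly
    from feature similarity, so similar tokens reinforce each other
    (high weights) while dissimilar ones are suppressed (low weights).
    X: (n, d) token features. Returns refined features and the weights."""
    scores = X @ X.T / np.sqrt(X.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    W = np.exp(scores)
    W /= W.sum(axis=1, keepdims=True)
    return W @ X, W
```

With two identical tokens and one orthogonal token, the identical pair attends to each other roughly twice as strongly as to the outlier, which is the foreground/background separation effect the abstract appeals to.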
Affiliation(s)
- Ziniu Qian
- School of Biological Science and Medical Engineering, Beihang University, Haidian District, Beijing, 100191, Beijing, China
- Zihua Wang
- School of Biological Science and Medical Engineering, Beihang University, Haidian District, Beijing, 100191, Beijing, China
- Xin Zhang
- School of Biological Science and Medical Engineering, Beihang University, Haidian District, Beijing, 100191, Beijing, China
- Bingzheng Wei
- Xiaomi Corporation, Haidian District, Beijing, 100085, Beijing, China
- Maode Lai
- Department of Pathology, School of Medicine, Zhejiang Provincial Key Laboratory of Disease Proteomics and Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Zhejiang University, Hangzhou, 310027, Zhejiang, China
- Jianzhong Shou
- Department of Urology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Changyang District, Beijing, 100021, Beijing, China
- Yubo Fan
- School of Biological Science and Medical Engineering, Beihang University, Haidian District, Beijing, 100191, Beijing, China
- Yan Xu
- School of Biological Science and Medical Engineering, Beihang University, Haidian District, Beijing, 100191, Beijing, China.
3
Yang J, Mehta N, Demirci G, Hu X, Ramakrishnan MS, Naguib M, Chen C, Tsai CL. Anomaly-guided weakly supervised lesion segmentation on retinal OCT images. Med Image Anal 2024; 94:103139. [PMID: 38493532] [PMCID: PMC11016376] [DOI: 10.1016/j.media.2024.103139]
Abstract
The availability of big data can transform studies in biomedical research, generating greater scientific insights if expert labeling is available to facilitate supervised learning. However, data annotation can be labor-intensive and cost-prohibitive if pixel-level precision is required. Weakly supervised semantic segmentation (WSSS) with image-level labeling has emerged as a promising solution in medical imaging. However, most existing WSSS methods in the medical domain are designed for single-class segmentation per image, overlooking the complexities arising from the co-existence of multiple classes in a single image. Additionally, multi-class WSSS methods from the natural image domain cannot produce comparable accuracy for medical images, given the challenge of substantial variation in lesion scales and occurrences. To address this issue, we propose a novel anomaly-guided mechanism (AGM) for multi-class segmentation in a single image on retinal optical coherence tomography (OCT) using only image-level labels. AGM leverages anomaly detection and a self-attention approach to integrate weak abnormal signals with global contextual information into the training process. Furthermore, we include an iterative refinement stage to guide the model to focus more on the potential lesions while suppressing less relevant regions. We validate the performance of our model with two public datasets and one challenging private dataset. Experimental results show that our approach achieves a new state-of-the-art performance in WSSS for lesion segmentation on OCT images.
Affiliation(s)
- Jiaqi Yang
- Graduate Center CUNY, 365 5th Ave, NY 10016, USA.
- Nitish Mehta
- New York University Department of Ophthalmology, NYU Langone Health, 222 E. 41st St., 3rd Floor, NY 10017, USA
- Xiaoling Hu
- Stony Brook University, 100 Nicolls Rd, Stony Brook 11794, USA
- Meera S Ramakrishnan
- New York University Department of Ophthalmology, NYU Langone Health, 222 E. 41st St., 3rd Floor, NY 10017, USA
- Mina Naguib
- New York University Department of Ophthalmology, NYU Langone Health, 222 E. 41st St., 3rd Floor, NY 10017, USA
- Chao Chen
- Stony Brook University, 100 Nicolls Rd, Stony Brook 11794, USA
4
Lasantha D, Vidanagamachchi S, Nallaperuma S. CRIECNN: Ensemble convolutional neural network and advanced feature extraction methods for the precise forecasting of circRNA-RBP binding sites. Comput Biol Med 2024; 174:108466. [PMID: 38615462] [DOI: 10.1016/j.compbiomed.2024.108466]
Abstract
Circular RNAs (circRNAs) have surfaced as important non-coding RNA molecules in biology. Understanding interactions between circRNAs and RNA-binding proteins (RBPs) is crucial in circRNA research. Existing prediction models suffer from limited availability and accuracy, necessitating advanced approaches. In this study, we propose CRIECNN (Circular RNA-RBP Interaction predictor using an Ensemble Convolutional Neural Network), a novel ensemble deep learning model that enhances circRNA-RBP binding site prediction accuracy. CRIECNN employs advanced feature extraction methods and evaluates four distinct sequence datasets and encoding techniques (BERT, Doc2Vec, KNF, EIIP). The model consists of an ensemble convolutional neural network, a BiLSTM, and a self-attention mechanism for feature refinement. Our results demonstrate that CRIECNN outperforms state-of-the-art methods in accuracy and performance, effectively predicting circRNA-RBP interactions from both full-length sequences and fragments. This novel strategy represents a substantial advance in the prediction of circRNA-RBP interactions, improving our understanding of circRNAs and their regulatory roles.
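Of the four encodings this abstract lists (BERT, Doc2Vec, KNF, EIIP), EIIP is simple enough to show inline. The sketch below uses the electron-ion interaction pseudopotential values commonly quoted in the bioinformatics literature; it illustrates the encoding idea only and is not the CRIECNN preprocessing code:

```python
# EIIP (electron-ion interaction pseudopotential) values commonly used
# for numeric nucleotide encoding; U shares T's value for RNA.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335, "U": 0.1335}

def eiip_encode(seq):
    """Map an RNA/DNA sequence to its EIIP numeric signal.
    Unknown bases (e.g. N) are encoded as 0.0."""
    return [EIIP.get(base, 0.0) for base in seq.upper()]
```

The resulting 1-D signal can be fed to convolutional layers directly, which is why EIIP is a popular lightweight alternative to learned embeddings.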
Affiliation(s)
- Dilan Lasantha
- Department of Computer Science, University of Ruhuna, Sri Lanka.
- Sam Nallaperuma
- Department of Engineering, University of Cambridge, United Kingdom.
5
Ju Z, Zhou Z, Qi Z, Yi C. H2MaT-Unet: Hierarchical hybrid multi-axis transformer based Unet for medical image segmentation. Comput Biol Med 2024; 174:108387. [PMID: 38613886] [DOI: 10.1016/j.compbiomed.2024.108387]
Abstract
Accurate segmentation and lesion localization are essential for treating diseases in medical images. Although deep learning methods have enhanced segmentation, they remain limited by convolutional neural networks' inability to capture long-range feature dependencies. The self-attention mechanism in Transformers addresses this drawback, but high-resolution images entail prohibitive computational complexity. To combine the strengths of convolution and the Transformer, we propose a hierarchical hybrid multi-axis transformer-based UNet, H2MaT-Unet. This approach fuses hierarchical multi-scale features and applies a multi-axis attention mechanism to their interactions, enabling efficient local and global feature interactions. Furthermore, we introduce a Spatial and Channel Reconstruction Convolution (ScConv) module to enhance feature aggregation. H2MaT-UNet achieves 87.74% Dice in the multi-target segmentation task and 87.88% IoU in the single-target segmentation task, surpassing current popular models and setting a new state of the art. The model synthesizes multi-scale feature information during the layering stage and utilizes a multi-axis attention mechanism to amplify global information interactions in an innovative manner. This research holds value for the practical application of deep learning in clinical settings, allowing healthcare providers to analyze segmented details of medical images more quickly and accurately.
Affiliation(s)
- ZhiYong Ju
- Shanghai University of Science and Technology, School of Optical-Electrical and Computer Engineering, Shanghai 200093, China
- ZhongChen Zhou
- Shanghai University of Science and Technology, School of Optical-Electrical and Computer Engineering, Shanghai 200093, China.
- ZiXiang Qi
- Shanghai University of Science and Technology, School of Optical-Electrical and Computer Engineering, Shanghai 200093, China
- Cheng Yi
- Shanghai University of Science and Technology, School of Optical-Electrical and Computer Engineering, Shanghai 200093, China
6
Boulila W, Ghandorh H, Masood S, Alzahem A, Koubaa A, Ahmed F, Khan Z, Ahmad J. A transformer-based approach empowered by a self-attention technique for semantic segmentation in remote sensing. Heliyon 2024; 10:e29396. [PMID: 38665569] [PMCID: PMC11043938] [DOI: 10.1016/j.heliyon.2024.e29396]
Abstract
Semantic segmentation of Remote Sensing (RS) images involves the classification of each pixel in a satellite image into distinct and non-overlapping regions or segments. This task is crucial in various domains, including land cover classification, autonomous driving, and scene understanding. While deep learning has shown promising results, there is limited research that specifically addresses the challenge of processing fine details in RS images while also considering the high computational demands. To tackle this issue, we propose a novel approach that combines convolutional and transformer architectures. Our design incorporates convolutional layers with a low receptive field to generate fine-grained feature maps for small objects in very high-resolution images. On the other hand, transformer blocks are utilized to capture contextual information from the input. By leveraging convolution and self-attention in this manner, we reduce the need for extensive downsampling and enable the network to work with full-resolution features, which is particularly beneficial for handling small objects. Additionally, our approach eliminates the requirement for vast datasets, which is often necessary for purely transformer-based networks. In our experimental results, we demonstrate the effectiveness of our method in generating local and contextual features using convolutional and transformer layers, respectively. Our approach achieves a mean Dice score of 80.41%, outperforming other well-known techniques such as UNet, the Fully Convolutional Network (FCN), the Pyramid Scene Parsing Network (PSPNet), and the recent Convolutional vision Transformer (CvT) model, which achieved mean Dice scores of 78.57%, 74.57%, 73.45%, and 62.97% respectively, under the same training conditions and using the same training dataset.
Affiliation(s)
- Wadii Boulila
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- RIADI Laboratory, National School of Computer Science, University of Manouba, Manouba 2010, Tunisia
- Hamza Ghandorh
- College of Computer Science and Engineering, Taibah University, Medina 42353, Saudi Arabia
- Sharjeel Masood
- Department of IT and Energy Convergence, Korea National University of Transportation, Chungju, South Korea
- Ayyub Alzahem
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- Anis Koubaa
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- Fawad Ahmed
- Department of Cyber Security, Pakistan Navy Engineering College, NUST, Islamabad 75350, Pakistan
- Zahid Khan
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- Jawad Ahmad
- School of Computing, Engineering and the Built Environment, Edinburgh Napier University, Edinburgh EH10 5DT, United Kingdom
7
Deng F, Liu X, Zhou P, Shen J, Huang Y. Multi-stage progressive detection method for water deficit detection in vertical greenery plants. Sci Rep 2024; 14:9601. [PMID: 38671210] [PMCID: PMC11053074] [DOI: 10.1038/s41598-024-60179-3]
Abstract
Detecting the water deficit status of vertical greenery plants rapidly and accurately is a significant challenge in cultivating and planting greenery plants. Currently, the mainstream method involves utilizing a single target detection algorithm for this task. However, in complex real-world scenarios, detection accuracy is influenced by factors such as image quality and background environment. Therefore, we propose a multi-stage progressive detection method aimed at enhancing detection accuracy by gradually filtering, processing, and detecting images through a multi-stage architecture. Additionally, to reduce the additional computational load brought by multiple stages and improve overall detection efficiency, we introduce a Swin Transformer based on mobile windows and hierarchical representations for feature extraction, along with global feature modeling through a self-attention mechanism. The experimental results demonstrate that our multi-stage detection approach achieves high accuracy in vertical greenery plant detection tasks, with an average precision of 93.5%. This represents an improvement of 19.2%, 17.3%, 13.8%, and 9.2% compared to Mask R-CNN (74.3%), YOLOv7 (76.2%), DETR (79.7%), and Deformable DETR (84.3%), respectively.
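The mobile (shifted) window idea behind the Swin Transformer mentioned in this abstract starts from partitioning a feature map into non-overlapping windows so self-attention is computed per window rather than globally. A minimal sketch of the partition step follows (hypothetical shapes, not the paper's code; window shifting can be emulated by rolling the map before partitioning):

```python
import numpy as np

def window_partition(x, ws):
    """Split a (H, W, C) feature map into non-overlapping ws x ws windows,
    returning (num_windows, ws, ws, C). Computing attention inside each
    window reduces cost from O((H*W)^2) to O(H*W * ws^2)."""
    H, W, C = x.shape
    assert H % ws == 0 and W % ws == 0, "H and W must be divisible by ws"
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws, ws, C)
```

Alternating regular and shifted partitions lets information flow across window borders while keeping the per-layer cost linear in image size.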
Affiliation(s)
- Fei Deng
- College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu, 610059, China
- Xuan Liu
- College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu, 610059, China.
- Peng Zhou
- Sichuan Tianyi Ecological Garden Group Co., Ltd., No. 1 Keyuan South Road, High-tech Zone, Chengdu, 610093, Sichuan, China
- Jianglin Shen
- Sichuan Tianyi Ecological Garden Group Co., Ltd., No. 1 Keyuan South Road, High-tech Zone, Chengdu, 610093, Sichuan, China
- Yuanxiang Huang
- Sichuan Tianyi Ecological Garden Group Co., Ltd., No. 1 Keyuan South Road, High-tech Zone, Chengdu, 610093, Sichuan, China.
8
Fan X, Zhou J, Jiang X, Xin M, Hou L. CSAP-UNet: Convolution and self-attention paralleling network for medical image segmentation with edge enhancement. Comput Biol Med 2024; 172:108265. [PMID: 38461698] [DOI: 10.1016/j.compbiomed.2024.108265]
Abstract
Convolution operates within a local window of the input image, so convolutional neural networks (CNNs) are skilled at obtaining local information. Meanwhile, the self-attention (SA) mechanism extracts features by calculating the correlation between tokens from all positions in the image, which gives it an advantage in obtaining global information. The two modules can therefore complement each other to improve feature extraction ability, and an effective fusion method is a problem worthy of further study. In this paper, we propose a CNN and SA paralleling network, CSAP-UNet, with U-Net as the backbone. The encoder consists of two parallel branches, CNN and Transformer, to extract features from the input image, taking into account both global dependencies and local information. Because medical images come from certain frequency bands within the spectrum, their color channels are not as uniform as those of natural images, and medical segmentation pays more attention to lesion regions in the image. The attention fusion module (AFM) integrates channel attention and spatial attention in series to fuse the output features of the two branches. The medical image segmentation task is essentially to locate the boundary of the object in the image, so the boundary enhancement module (BEM) is designed in the shallow layers of the proposed network to focus more specifically on pixel-level edge details. Experimental results on three public datasets validate that CSAP-UNet outperforms state-of-the-art networks, particularly on the ISIC 2017 dataset. The cross-dataset evaluation on Kvasir and CVC-ClinicDB shows that CSAP-UNet has strong generalization ability, and ablation experiments indicate the effectiveness of the designed modules. The code for training and test is available at https://github.com/zhouzhou1201/CSAP-UNet.git.
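The serial channel-then-spatial design attributed here to the attention fusion module (AFM) can be sketched in a CBAM-like, parameter-free form. This is an illustrative approximation only, not the CSAP-UNet module (which additionally fuses the outputs of two branches with learned weights):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_then_spatial_attention(x):
    """Serial attention: first reweight channels by their global
    average/max response, then reweight spatial positions by the
    cross-channel pooled map. x: (C, H, W) feature tensor."""
    # channel attention from global average- and max-pooled descriptors
    ca = sigmoid(x.mean(axis=(1, 2)) + x.max(axis=(1, 2)))   # (C,)
    x = x * ca[:, None, None]
    # spatial attention from cross-channel average- and max-pooling
    sa = sigmoid(x.mean(axis=0) + x.max(axis=0))             # (H, W)
    return x * sa[None, :, :]
```

Learned versions insert small MLPs/convolutions before each sigmoid; the pooling-then-gating structure is the part the serial design refers to.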
Affiliation(s)
- Xiaodong Fan
- Faculty of Electrical and Control Engineering, Liaoning Technical University, Huludao, 125105, Liaoning, China.
- Jing Zhou
- College of Mathematics, Bohai University, Jinzhou, 121013, Liaoning, China
- Xiaoli Jiang
- College of Mathematics, Bohai University, Jinzhou, 121013, Liaoning, China
- Meizhuo Xin
- College of Mathematics, Bohai University, Jinzhou, 121013, Liaoning, China
- Limin Hou
- Faculty of Electrical and Control Engineering, Liaoning Technical University, Huludao, 125105, Liaoning, China
9
Gao T, Xu CZ, Zhang L, Kong H. GSB: Group superposition binarization for vision transformer with limited training samples. Neural Netw 2024; 172:106133. [PMID: 38266471] [DOI: 10.1016/j.neunet.2024.106133]
Abstract
Vision Transformer (ViT) has performed remarkably in various computer vision tasks. Nonetheless, affected by its massive number of parameters, ViT usually suffers from serious overfitting when training samples are relatively limited. In addition, ViT generally demands heavy computing resources, which limits its deployment on resource-constrained devices. As a type of model-compression method, model binarization is potentially a good choice to solve the above problems. Compared with the full-precision one, a binarized model replaces complex tensor multiplication with simple bit-wise binary operations and represents full-precision model parameters and activations with only 1-bit ones, which potentially solves the problems of computational complexity and model size, respectively. In this paper, we investigate a binarized ViT model. Empirically, we observe that existing binarization technology designed for Convolutional Neural Networks (CNNs) does not migrate well to a ViT's binarization task. We also find that the decline in accuracy of the binary ViT model is mainly due to the information loss of the attention module and the value vector. Therefore, we propose a novel model binarization technique, called Group Superposition Binarization (GSB), to deal with these issues. Furthermore, to further improve the performance of the binarized model, we have investigated the gradient calculation procedure in the binarization process and derived more proper gradient calculation equations for GSB to reduce the influence of gradient mismatch. Then, the knowledge distillation technique is introduced to alleviate the performance degradation caused by model binarization. Analytically, model binarization can limit the parameter search space during parameter updates while training a model.
Therefore, the binarization process can actually play an implicit regularization role and help solve the problem of overfitting in the case of insufficient training data. Experiments on three datasets with limited numbers of training samples demonstrate that the proposed GSB model achieves state-of-the-art performance among the binary quantization schemes and exceeds its full-precision counterpart on some indicators. Code and models are available at: https://github.com/IMRL/GSB-Vision-Transformer.
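The baseline scheme that group superposition binarization builds on, sign weights with a per-tensor scale plus a straight-through estimator for gradients, can be sketched as follows. This shows the generic XNOR-style scheme, not GSB itself:

```python
import numpy as np

def binarize(w):
    """Forward pass of weight binarization: replace full-precision weights
    with sign(w) scaled by the mean absolute value, so tensor products
    reduce to bit-wise operations plus one scale per tensor."""
    alpha = np.abs(w).mean()
    return alpha * np.where(w >= 0, 1.0, -1.0)

def ste_grad(w, grad_out, clip=1.0):
    """Straight-through estimator: sign() has zero gradient almost
    everywhere, so pass the upstream gradient through unchanged where
    |w| <= clip and zero it elsewhere."""
    return grad_out * (np.abs(w) <= clip)
```

GSB's contribution is to superpose several such binary groups (and refine the gradient equations) so the binarized attention module loses less information than a single sign pattern would.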
Affiliation(s)
- Tian Gao
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China.
- Cheng-Zhong Xu
- State Key Laboratory of Internet of Things for Smart City (SKL-IOTSC), University of Macau, 999078, Macao Special Administrative Region of China; Department of Computer and Information Science (CIS), University of Macau, 999078, Macao Special Administrative Region of China.
- Le Zhang
- School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China.
- Hui Kong
- State Key Laboratory of Internet of Things for Smart City (SKL-IOTSC), University of Macau, 999078, Macao Special Administrative Region of China; Department of Computer and Information Science (CIS), University of Macau, 999078, Macao Special Administrative Region of China; Department of Electromechanical Engineering (EME), University of Macau, 999078, Macao Special Administrative Region of China.
10
Jin Y, Liu J, Zhou Y, Chen R, Chen H, Duan W, Chen Y, Zhang XL. CRDet: A circle representation detector for lung granulomas based on multi-scale attention features with center point calibration. Comput Med Imaging Graph 2024; 113:102354. [PMID: 38341946] [DOI: 10.1016/j.compmedimag.2024.102354]
Abstract
Lung granuloma is a very common lung disease, and its specific diagnosis is important for determining the exact cause of the disease as well as the prognosis of the patient. An effective lung granuloma detection model based on computer-aided diagnosis (CAD) can help pathologists localize granulomas, thereby improving the efficiency of specific diagnosis. However, for CAD-based lung granuloma detection models, the significant size differences among granulomas and how to better utilize the morphological features of granulomas are both critical challenges to be addressed. In this paper, we propose an automatic method, CRDet, to localize granulomas in histopathological images and address these challenges. We first introduce a multi-scale feature extraction network with self-attention to extract features at different scales simultaneously. The features are then converted to circle representations of granulomas by circle representation detection heads to achieve the alignment of features and ground truth; in this way, we can also more effectively use the circular morphological features of granulomas. Finally, we propose a center point calibration method at the inference stage to further optimize the circle representation. For model evaluation, we built a lung granuloma circle representation dataset named LGCR, comprising 288 images from 50 subjects. Our method yielded 0.316 mAP and 0.571 mAR, outperforming state-of-the-art object detection methods on our proposed LGCR.
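When objects are represented as circles (center point plus radius) rather than bounding boxes, matching predictions to ground truth naturally uses a circle IoU. A sketch of that geometric computation follows, using the standard circle-intersection formula; it illustrates the representation, not the authors' code:

```python
import math

def circle_iou(c1, c2):
    """IoU of two circles given as (cx, cy, r), the natural matching
    criterion for center-plus-radius object representations."""
    (x1, y1, r1), (x2, y2, r2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)
    if d >= r1 + r2:                       # disjoint circles
        inter = 0.0
    elif d <= abs(r1 - r2):                # one circle contains the other
        inter = math.pi * min(r1, r2) ** 2
    else:                                  # lens-shaped intersection
        a1 = r1 * r1 * math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
        a2 = r2 * r2 * math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
        tri = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2)
                              * (d - r1 + r2) * (d + r1 + r2))
        inter = a1 + a2 - tri
    union = math.pi * (r1 * r1 + r2 * r2) - inter
    return inter / union
```

Compared with box IoU, circle IoU is rotation-invariant, which suits roughly round structures such as granulomas.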
Affiliation(s)
- Yu Jin
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
- Juan Liu
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China.
- Yuanyuan Zhou
- Department of Immunology, TaiKang Medical School (School of Basic Medical Sciences), Wuhan University, Wuhan, China; Hubei Province Key Laboratory of Allergy and Immunology, Wuhan University, Wuhan, China
- Rong Chen
- Wuhan Jinyintan Hospital, Tongji Medical College of Huazhong University of Science and Technology, Wuhan, China
- Hua Chen
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
- Wensi Duan
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
- Yuqi Chen
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
- Xiao-Lian Zhang
- Department of Immunology, TaiKang Medical School (School of Basic Medical Sciences), Wuhan University, Wuhan, China; Hubei Province Key Laboratory of Allergy and Immunology, Wuhan University, Wuhan, China
11
Shao Y, Zhou K, Zhang L. CSSNet: Cascaded spatial shift network for multi-organ segmentation. Comput Biol Med 2024; 170:107955. [PMID: 38215618] [DOI: 10.1016/j.compbiomed.2024.107955]
Abstract
Multi-organ segmentation is vital for clinical diagnosis and treatment. Although CNN and its extensions are popular in organ segmentation, they suffer from a limited local receptive field. In contrast, MultiLayer-Perceptron-based models (e.g., MLP-Mixer) have a global receptive field. However, these MLP-based models employ fully connected layers with many parameters and tend to overfit on sample-deficient medical image datasets. Therefore, we propose a Cascaded Spatial Shift Network, CSSNet, for multi-organ segmentation. Specifically, we design a novel cascaded spatial shift block that reduces the number of model parameters and aggregates feature segments in a cascaded way for efficient and effective feature extraction. Then, we propose a feature refinement network to aggregate multi-scale features with location information and enhance the multi-scale features along the channel and spatial axes to obtain a high-quality feature map. Finally, we employ a self-attention-based fusion strategy to focus on the discriminative feature information for better multi-organ segmentation performance. Experimental results on the Synapse (multiple organs) and LiTS (liver & tumor) datasets demonstrate that our CSSNet achieves promising segmentation performance compared with CNN, MLP, and Transformer models. The source code will be available at https://github.com/zkyseu/CSSNet.
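The spatial shift primitive that such blocks build on comes from spatial-shift MLPs (e.g., S2-MLP): channel groups are displaced by one pixel in different directions so that a subsequent per-pixel MLP mixes information from neighbouring positions without any fully connected spatial layer. A generic single-shift sketch (the cascaded aggregation of feature segments is the paper's contribution and is not reproduced here):

```python
import numpy as np

def spatial_shift(x):
    """Shift channel groups of a feature map (H, W, C) in four directions.

    Channels are split into four groups, shifted by one pixel right, left,
    down and up respectively (zero padding at the borders).
    """
    h, w, c = x.shape
    out = np.zeros_like(x)
    q = c // 4
    out[:, 1:, :q]      = x[:, :-1, :q]       # group 0: shift right
    out[:, :-1, q:2*q]  = x[:, 1:, q:2*q]     # group 1: shift left
    out[1:, :, 2*q:3*q] = x[:-1, :, 2*q:3*q]  # group 2: shift down
    out[:-1, :, 3*q:]   = x[1:, :, 3*q:]      # group 3: shift up
    return out
```

The shift itself is parameter-free, which is why shift-based MLP blocks need far fewer parameters than fully connected spatial mixing.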
Affiliation(s)
- Yeqin Shao
- School of Transportation, Nantong University, Jiangsu, 226019, China.
- Kunyang Zhou
- School of Zhangjian, Nantong University, Jiangsu, 226019, China
- Lichi Zhang
- School of Biomedical Engineering, Shanghai Jiaotong University, Shanghai, 200240, China
12
Zhou Z, Xiao C, Yin J, She J, Duan H, Liu C, Fu X, Cui F, Qi Q, Zhang Z. PSAC-6mA: 6mA site identifier using self-attention capsule network based on sequence-positioning. Comput Biol Med 2024; 171:108129. [PMID: 38342046 DOI: 10.1016/j.compbiomed.2024.108129] [Received: 12/19/2023] [Revised: 02/06/2024] [Accepted: 02/06/2024] [Indexed: 02/13/2024]
Abstract
DNA N6-methyladenine (6mA) modifications play a pivotal role in the regulation of growth, development, and diseases in organisms. As a significant epigenetic marker, 6mA modifications extensively participate in the intricate regulatory networks of the genome. Hence, gaining a profound understanding of how 6mA is intricately involved in these biological processes is imperative for deciphering the gene regulatory networks within organisms. In this study, we propose PSAC-6mA (Position-self-attention Capsule-6mA), a sequence-location-based self-attention capsule network. The positional layer in the model enables positional relationship extraction and independent parameter setting for each base position, avoiding parameter sharing inherent in convolutional approaches. Simultaneously, the self-attention capsule network enhances dimensionality, capturing correlation information between capsules and achieving exceptional results in feature extraction across multiple spatial dimensions within the model. Experimental results demonstrate the superior performance of PSAC-6mA in recognizing 6mA motifs across various species.
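The positional layer's key property, independent parameters for each base position rather than convolution-style weight sharing, can be illustrated with a toy comparison (dimensions and variable names are illustrative, not the authors' configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, in_dim, out_dim = 41, 4, 8       # e.g. a one-hot DNA window of 41 bases

# Convolution-style layer: ONE weight matrix shared by every position.
w_shared = rng.normal(size=(in_dim, out_dim))

# Position-specific layer: an independent weight matrix PER position,
# so positional information is encoded directly in the parameters.
w_positional = rng.normal(size=(seq_len, in_dim, out_dim))

x = rng.normal(size=(seq_len, in_dim))    # encoded sequence

shared_out = x @ w_shared                               # (seq_len, out_dim)
positional_out = np.einsum('ld,ldo->lo', x, w_positional)
```

The position-specific variant has `seq_len` times as many parameters, trading parameter efficiency for the ability to treat each site differently.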
Affiliation(s)
- Zheyu Zhou
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Cuilin Xiao
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Jinfen Yin
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Jiayi She
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Hao Duan
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Chunling Liu
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Xiuhao Fu
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Feifei Cui
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Qi Qi
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Zilong Zhang
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China.
13
Luna M, Chikontwe P, Nam S, Park SH. Attention guided multi-scale cluster refinement with extended field of view for amodal nuclei segmentation. Comput Biol Med 2024; 170:108015. [PMID: 38266467 DOI: 10.1016/j.compbiomed.2024.108015] [Received: 09/18/2023] [Revised: 01/04/2024] [Accepted: 01/19/2024] [Indexed: 01/26/2024]
Abstract
Nuclei segmentation plays a crucial role in disease understanding and diagnosis. In whole slide images, cell nuclei often appear overlapping and densely packed with ambiguous boundaries due to the underlying 3D structure of histopathology samples. Instance segmentation via deep neural networks with object clustering is able to detect individual segments in crowded nuclei but suffers from a limited field of view, and does not support amodal segmentation. In this work, we introduce a dense feature pyramid network with a feature mixing module to increase the field of view of the segmentation model while keeping pixel-level details. We also improve the model output quality by adding a multi-scale self-attention guided refinement module that sequentially adjusts predictions as resolution increases. Finally, we enable clusters to share pixels by separating the instance clustering objective function from other pixel-related tasks, and introduce supervision to occluded areas to guide the learning process. For evaluation of amodal nuclear segmentation, we also update prior metrics used in common modal segmentation to allow the evaluation of overlapping masks and mitigate over-penalization issues via a novel unique matching algorithm. Our experiments demonstrate consistent performance across multiple datasets with significantly improved segmentation quality.
Affiliation(s)
- Miguel Luna
- Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, 42988, South Korea
- Philip Chikontwe
- Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, 42988, South Korea
- Siwoo Nam
- Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, 42988, South Korea
- Sang Hyun Park
- Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, 42988, South Korea; AI Graduate School, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, 42988, South Korea.
14
AlSaad R, Malluhi Q, Abd-Alrazaq A, Boughorbel S. Temporal self-attention for risk prediction from electronic health records using non-stationary kernel approximation. Artif Intell Med 2024; 149:102802. [PMID: 38462292 DOI: 10.1016/j.artmed.2024.102802] [Received: 11/03/2022] [Revised: 09/27/2023] [Accepted: 02/03/2024] [Indexed: 03/12/2024]
Abstract
Effective modeling of patient representation from electronic health records (EHRs) is increasingly becoming a vital research topic. Yet, modeling the non-stationarity in EHR data has received less attention. Most existing studies follow a strong assumption of stationarity in patient representation from EHRs. However, in practice, a patient's visits are irregularly spaced over a relatively long period of time, and disease progression patterns exhibit non-stationarity. Furthermore, the time gaps between patient visits often encapsulate significant domain knowledge, potentially revealing undiscovered patterns that characterize specific medical conditions. To address these challenges, we introduce a new method which combines the self-attention mechanism with non-stationary kernel approximation to capture both contextual information and temporal relationships between patient visits in EHRs. To assess the effectiveness of our proposed approach, we use two real-world EHR datasets, comprising a total of 76,925 patients, for the task of predicting the next diagnosis code for a patient, given their EHR history. The first dataset is a general EHR cohort and consists of 11,451 patients with a total of 3,485 unique diagnosis codes. The second dataset is a disease-specific cohort that includes 65,474 pregnant patients and encompasses a total of 9,782 unique diagnosis codes. Our experimental evaluation involved nine prediction models, categorized into three distinct groups. Group 1 comprises the baselines: original self-attention with positional encoding model, RETAIN model, and LSTM model. Group 2 includes models employing self-attention with stationary kernel approximations, specifically incorporating three variations of Bochner's feature maps. Lastly, Group 3 consists of models utilizing self-attention with non-stationary kernel approximations, including quadratic, cubic, and bi-quadratic polynomials. 
The experimental results demonstrate that non-stationary kernels significantly outperformed baseline methods on the NDCG@10 and Hit@10 metrics in both datasets. The performance boost was more substantial in dataset 1 for the NDCG@10 metric. On the other hand, stationary kernels showed significant but smaller gains over the baselines and were nearly as effective as non-stationary kernels for Hit@10 in dataset 2. These findings robustly validate the efficacy of employing non-stationary kernels for temporal modeling of EHR data and emphasize the importance of modeling non-stationary temporal information in healthcare prediction tasks.
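The Group 2 baselines rely on Bochner's theorem: a stationary kernel is the Fourier transform of a probability measure, so sampling frequencies from that measure yields a finite random feature map whose inner products approximate the kernel. A minimal sketch for the stationary RBF case (bandwidth, feature count, and time values are illustrative; the paper's non-stationary variants are not reproduced here):

```python
import numpy as np

def bochner_features(t, n_features, rng):
    """Random Fourier feature map approximating a stationary RBF kernel.

    Frequencies w are drawn from the kernel's spectral density (standard
    normal for unit-bandwidth RBF); phases b are uniform on [0, 2*pi].
    Then phi(s) . phi(t) ~ k(s - t) = exp(-(s - t)^2 / 2).
    """
    w = rng.normal(size=n_features)
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(np.outer(t, w) + b)

rng = np.random.default_rng(42)
t = np.array([0.0, 0.5, 3.0])                 # e.g. irregular visit times
phi = bochner_features(t, 4000, rng)
approx = phi @ phi.T                          # approximate kernel matrix
exact = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2)   # RBF ground truth
```

The approximation error shrinks as the number of sampled frequencies grows, which is what makes such feature maps practical inside attention layers.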
Affiliation(s)
- Rawan AlSaad
- AI Center for Precision Health, Weill Cornell Medicine-Qatar, Qatar.
- Alaa Abd-Alrazaq
- AI Center for Precision Health, Weill Cornell Medicine-Qatar, Qatar
- Sabri Boughorbel
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Qatar
15
Sun S, Fu C, Xu S, Wen Y, Ma T. GLFNet: Global-local fusion network for the segmentation in ultrasound images. Comput Biol Med 2024; 171:108103. [PMID: 38335822 DOI: 10.1016/j.compbiomed.2024.108103] [Received: 08/28/2023] [Revised: 01/27/2024] [Accepted: 02/04/2024] [Indexed: 02/12/2024]
Abstract
Ultrasound imaging, as a portable and radiation-free modality, presents challenges for accurate segmentation due to the variability of lesions and the similar intensity values of surrounding tissues. Current deep learning approaches leverage convolution for extracting local features and self-attention for handling global dependencies. However, traditional CNNs are spatially local, and Vision Transformers lack image-specific inductive bias and are computationally demanding. In response, we propose the Global-Local Fusion Network (GLFNet), a hybrid structure addressing the limitations of both CNNs and Vision Transformers. The GLFNet, featuring Global-Local Fusion Blocks (GLFBlocks), integrates global semantic information with local details to improve segmentation. Each GLFBlock comprises Global and Local Branches for feature extraction in parallel. Within the Global and Local Branches, we introduce the Self-Attention Convolution Fusion Block (SACFBlock), which includes a Spatial-Attention Module and a Channel-Attention Module. Experimental results show that our proposed GLFNet surpasses its counterparts in the segmentation tasks, achieving the overall best results with an mIoU of 79.58% and a Dice coefficient of 74.62% on the DDTI dataset, an mIoU of 76.61% and a Dice coefficient of 71.04% on the BUSI dataset, and an mIoU of 86.77% and a Dice coefficient of 87.38% on the BUID dataset. The fusion of local and global features contributes to enhanced performance, making GLFNet a promising approach for ultrasound image segmentation.
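The Spatial-Attention and Channel-Attention Modules are not specified in detail in this abstract; the generic gating mechanism such modules belong to (SE/CBAM-style) can be sketched as follows. Note that real modules pass the pooled descriptors through small learned layers before the sigmoid; this sketch keeps only the pooling-and-gating skeleton.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x):
    """Reweight channels of an (H, W, C) map by a global channel descriptor."""
    pooled = x.mean(axis=(0, 1))               # (C,) one scalar per channel
    return x * sigmoid(pooled)                 # gate broadcast over H, W

def spatial_attention(x):
    """Reweight positions by a per-pixel statistic pooled over channels."""
    pooled = x.mean(axis=2, keepdims=True)     # (H, W, 1) spatial descriptor
    return x * sigmoid(pooled)
```

Because the sigmoid gate lies strictly in (0, 1), both modules can only attenuate features, acting as soft masks over channels or positions.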
Affiliation(s)
- Shiyao Sun
- School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
- Chong Fu
- School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang 110819, China; Engineering Research Center of Security Technology of Complex Network System, Ministry of Education, China.
- Sen Xu
- General Hospital of Northern Theatre Command, Shenyang 110016, China
- Yingyou Wen
- School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China; Medical Imaging Research Department, Neusoft Research of Intelligent Healthcare Technology, Co. Ltd., Shenyang, China
- Tao Ma
- Dopamine Group Ltd., Auckland, 1542, New Zealand
16
Çelebi M, Öztürk S, Kaplan K. An emotion recognition method based on EWT-3D-CNN-BiLSTM-GRU-AT model. Comput Biol Med 2024; 169:107954. [PMID: 38183705 DOI: 10.1016/j.compbiomed.2024.107954] [Received: 10/22/2023] [Revised: 12/28/2023] [Accepted: 01/01/2024] [Indexed: 01/08/2024]
Abstract
Emotion recognition has become a significant study area in recent years because of its use in brain-machine interaction (BMI). The robustness problem of emotion classification is one of the most basic concerns in improving the quality of emotion recognition systems. Of the two main branches of approaches to this problem, one extracts features through manual engineering, while the other is the well-known artificial intelligence approach, which infers features from the EEG data itself. This study proposes a novel method that considers the characteristic behavior of EEG recordings and is based on the artificial intelligence approach. The EEG signal is a noisy signal with a non-stationary and non-linear form. Using the Empirical Wavelet Transform (EWT) signal decomposition method, the signal's frequency components are obtained. Then, frequency-based, linear, and non-linear features are extracted. The resulting features are mapped onto a 2-D plane according to the positions of the EEG electrodes, and by merging these 2-D images, 3-D images are constructed. In this way, the frequency content of multichannel EEG recordings and their spatial and temporal relationships are combined. Lastly, a 3-D deep learning framework was constructed, combining a convolutional neural network (CNN), bidirectional long short-term memory (BiLSTM), and a gated recurrent unit (GRU) with self-attention (AT). This model is named EWT-3D-CNN-BiLSTM-GRU-AT. As a result, we have created a framework that cascades handcrafted features into state-of-the-art deep learning models. The framework is evaluated on the DEAP recordings using a person-independent approach. The experimental findings demonstrate that the developed model achieves classification accuracies of 90.57% and 90.59% for the valence and arousal axes, respectively, on the DEAP database. Compared with existing cutting-edge emotion classification models, the proposed framework exhibits superior results for classifying human emotions.
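The mapping of per-electrode features onto a 2-D scalp grid, stacked into a 3-D input volume, can be sketched generically as below. The electrode names, coordinates, and grid size are hypothetical; DEAP's 32-channel 10-20 montage would use a larger grid, and the paper's exact layout is not given in the abstract.

```python
import numpy as np

# Hypothetical coordinates of four electrodes on a 3x3 scalp grid.
ELECTRODE_POS = {'Fp1': (0, 0), 'Fp2': (0, 2), 'O1': (2, 0), 'O2': (2, 2)}

def features_to_3d(feature_maps, grid=(3, 3)):
    """Place per-channel feature values on a 2-D scalp grid, one slice per feature.

    feature_maps: dict mapping electrode name -> 1-D array of k features.
    Returns an array of shape (k, grid_h, grid_w): k stacked 2-D images,
    with unoccupied grid cells left at zero.
    """
    k = len(next(iter(feature_maps.values())))
    vol = np.zeros((k, *grid))
    for name, feats in feature_maps.items():
        r, c = ELECTRODE_POS[name]
        vol[:, r, c] = feats
    return vol
```

Stacking one 2-D slice per extracted feature is what turns the multichannel recording into a 3-D image that a 3-D CNN can consume.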
Affiliation(s)
- Muharrem Çelebi
- Electronics and Communication Engineering, Kocaeli University, Kocaeli, 41001, Turkey.
- Sıtkı Öztürk
- Electronics and Communication Engineering, Kocaeli University, Kocaeli, 41001, Turkey.
- Kaplan Kaplan
- Software Engineering, Kocaeli University, Kocaeli, 41001, Turkey.
17
Yu L, Xu Z, Qiu W, Xiao X. MSDSE: Predicting drug-side effects based on multi-scale features and deep multi-structure neural network. Comput Biol Med 2024; 169:107812. [PMID: 38091725 DOI: 10.1016/j.compbiomed.2023.107812] [Received: 08/07/2023] [Revised: 11/10/2023] [Accepted: 12/03/2023] [Indexed: 02/08/2024]
Abstract
Unexpected side effects may accompany both the research stage and the post-marketing phase of drugs. These accidents lead to drug development failure and even endanger patients' health. Thus, it is essential to recognize unknown drug-side effects. Most existing in silico methods find the answer in the association or similarity networks of drugs while ignoring the drugs' intrinsic attributes; the limitation is that they can only handle drugs in the maturation stage. To be suitable for early drug-side effect screening, we conceive a multi-structural deep learning framework, MSDSE, which synthetically considers the multi-scale features derived from the drug. MSDSE can jointly learn SMILES sequence-based word embeddings, substructure-based molecular fingerprints, and chemical structure-based graph embeddings. In the preprocessing stage of MSDSE, we project all features into an abstract space of the same dimension. MSDSE builds a bi-level channel strategy, including a convolutional neural network module with an Inception structure and a multi-head self-attention module, to learn and integrate multi-modal features from local to global perspectives. Finally, MSDSE regards the prediction of drug-side effects as pair-wise learning and outputs the pair-wise probability of drug-side effects through an inner product operation. MSDSE is evaluated and analyzed on benchmark datasets and performs optimally compared with other baseline models. We also set up an ablation study to explain the rationality of the feature approach and model structure. Moreover, we select partial model prediction results for a case study to reveal its actual capability. The original data are available at http://github.com/yuliyi/MSDSE.
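The final pair-wise scoring step, an inner product between drug and side-effect representations squashed to a probability, can be sketched as follows (embedding dimensions and variable names are illustrative; the upstream multi-modal encoders are the paper's contribution and are not reproduced):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pairwise_scores(drug_emb, se_emb):
    """Probability matrix for every (drug, side-effect) pair.

    drug_emb: (n_drugs, d) fused multi-modal drug representations.
    se_emb:   (n_side_effects, d) side-effect representations.
    The inner product of each pair is squashed to a probability.
    """
    return sigmoid(drug_emb @ se_emb.T)
```

Scoring all pairs as one matrix product is what makes inner-product decoders cheap to evaluate over a full drug-by-side-effect grid.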
Affiliation(s)
- Liyi Yu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Zhaochun Xu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Wangren Qiu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Xuan Xiao
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China.
18
Wang H, Huang T, Wang D, Zeng W, Sun Y, Zhang L. MSCAN: multi-scale self- and cross-attention network for RNA methylation site prediction. BMC Bioinformatics 2024; 25:32. [PMID: 38233745 PMCID: PMC10795237 DOI: 10.1186/s12859-024-05649-1] [Received: 04/24/2023] [Accepted: 01/11/2024] [Indexed: 01/19/2024]
Abstract
BACKGROUND Epi-transcriptome regulation through post-transcriptional RNA modifications is essential for all RNA types. Precise recognition of RNA modifications is critical for understanding their functions and regulatory mechanisms. However, wet experimental methods are often costly and time-consuming, limiting their wide range of applications. Therefore, recent research has focused on developing computational methods, particularly deep learning (DL). Bidirectional long short-term memory (BiLSTM), convolutional neural network (CNN), and Transformer models have demonstrated achievements in modification site prediction. However, BiLSTM cannot be computed in parallel, leading to long training times; CNN cannot learn long-distance dependencies in the sequence; and the Transformer lacks information interaction between sequences at different scales. This insight underscores the necessity for continued research and development in natural language processing (NLP) and DL to devise an enhanced prediction framework that can effectively address these challenges. RESULTS This study presents a multi-scale self- and cross-attention network (MSCAN) that identifies RNA methylation sites using NLP and DL techniques. Experimental results on twelve RNA modification sites (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um) reveal that the area under the receiver operating characteristic curve of MSCAN reaches 98.34%, 85.41%, 97.29%, 96.74%, 99.04%, 79.94%, 76.22%, 65.69%, 92.92%, 92.03%, 95.77%, and 89.66%, respectively, which is better than state-of-the-art prediction models, indicating that the model has strong generalization capabilities. Furthermore, MSCAN reveals a strong association among different types of RNA modifications from an experimental perspective. A user-friendly web server for predicting the twelve widely occurring human RNA modification sites is available at http://47.242.23.141/MSCAN/index.php.
CONCLUSIONS A predictor framework has been developed through binary classification to predict RNA methylation sites.
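Self- and cross-attention share one primitive, scaled dot-product attention; cross-attention simply draws queries from one sequence (here, one scale) and keys/values from another. A minimal single-head sketch without learned projections (sequence lengths and dimensions are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Self-attention: q, k, v all come from the same sequence.
    Cross-attention: q comes from one sequence, k and v from another.
    """
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d))
    return weights @ v, weights
```

Note that the output always has the query sequence's length, which is how cross-attention lets one scale pull in information from a differently sized scale.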
Affiliation(s)
- Honglei Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- School of Information Engineering, Xuzhou College of Industrial Technology, Xuzhou, 221400, China
- Tao Huang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Dong Wang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
- Wenliang Zeng
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Yanjing Sun
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China.
- Lin Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China.
19
Fischer M, Bartler A, Yang B. Prompt tuning for parameter-efficient medical image segmentation. Med Image Anal 2024; 91:103024. [PMID: 37976866 DOI: 10.1016/j.media.2023.103024] [Received: 11/16/2022] [Revised: 07/16/2023] [Accepted: 11/03/2023] [Indexed: 11/19/2023]
Abstract
Neural networks pre-trained on a self-supervision scheme have become the standard when operating in data-rich environments with scarce annotations. As such, fine-tuning a model to a downstream task in a parameter-efficient but effective way, e.g. for a new set of classes in the case of semantic segmentation, is of increasing importance. In this work, we propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets. Relying on the recently popularized prompt tuning approach, we provide a prompt-able UNETR (PUNETR) architecture that is frozen after pre-training but adaptable throughout the network by class-dependent learnable prompt tokens. We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes (contrastive prototype assignment, CPA) of a student-teacher combination. Concurrently, an additional segmentation loss is applied for a subset of classes during pre-training, further increasing the effectiveness of the leveraged prompts in the fine-tuning phase. We demonstrate that the resulting method is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models on CT imaging datasets. To this end, the difference between fully fine-tuned and prompt-tuned variants amounts to 7.81 pp for the TCIA/BTCV dataset, as well as 5.37 and 6.57 pp for subsets of the TotalSegmentator dataset, in the mean Dice Similarity Coefficient (DSC, in %) while only adjusting prompt tokens, corresponding to 0.51% of the pre-trained backbone model with 24.4M frozen parameters. The code for this work is available at https://github.com/marcdcfischer/PUNETR.
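The core prompt-tuning move, prepending a few learnable tokens to a frozen backbone's token sequence so that only those prompts are updated during adaptation, can be sketched as follows (token counts and dimensions are illustrative and far smaller than the paper's 24.4M-parameter backbone):

```python
import numpy as np

def prepend_prompts(tokens, prompt_tokens):
    """Prepend learnable prompt tokens to a frozen model's token sequence.

    tokens:        (n, d) patch/feature tokens produced by the frozen backbone.
    prompt_tokens: (p, d) the ONLY parameters updated during adaptation.
    """
    return np.concatenate([prompt_tokens, tokens], axis=0)

n_tokens, n_prompts, dim = 196, 8, 32
tokens = np.zeros((n_tokens, dim))            # stand-in for image tokens
prompts = 0.02 * np.random.default_rng(0).normal(size=(n_prompts, dim))
seq = prepend_prompts(tokens, prompts)        # (204, 32) extended sequence
```

Because self-attention mixes every token with every other, a handful of trainable prompts can steer a frozen network while keeping the tunable parameter count tiny.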
Affiliation(s)
- Marc Fischer
- Institute of Signal Processing and System Theory, University of Stuttgart, 70550 Stuttgart, Germany.
- Alexander Bartler
- Institute of Signal Processing and System Theory, University of Stuttgart, 70550 Stuttgart, Germany
- Bin Yang
- Institute of Signal Processing and System Theory, University of Stuttgart, 70550 Stuttgart, Germany
20
Cong H, Liu H, Cao Y, Liang C, Chen Y. Protein-protein interaction site prediction by model ensembling with hybrid feature and self-attention. BMC Bioinformatics 2023; 24:456. [PMID: 38053020 DOI: 10.1186/s12859-023-05592-7] [Received: 12/18/2022] [Accepted: 11/30/2023] [Indexed: 12/07/2023]
Abstract
BACKGROUND Protein-protein interactions (PPIs) are crucial in various biological functions and cellular processes. Thus, many computational approaches have been proposed to predict PPI sites. Although significant progress has been made, these methods still have limitations in encoding the characteristics of each amino acid in sequences. Many feature extraction methods rely on the sliding window technique, which simply merges all the features of residues into a vector. The importance of some key residues may be weakened in the feature vector, leading to poor performance. RESULTS We propose a novel sequence-based method for PPI site prediction. The new network model, PPINet, contains multiple feature processing paths. For a residue, PPINet extracts the features of the targeted residue and its context separately. These two types of features are processed by two paths in the network and combined to form a protein representation in which the two types of features are of relatively equal importance. The model ensembling technique is applied to make use of more features: the base models are trained with different features and then ensembled via stacking. In addition, a data balancing strategy is presented, by which our model achieves significant improvement on highly unbalanced data. CONCLUSION The proposed method is evaluated on a fused dataset constructed from Dset186, Dset_72, and PDBset_164, as well as the public Dset_448 dataset. Compared with current state-of-the-art methods, our method performs better; in the most important metrics, such as AUPRC and recall, it surpasses the second-best method on the latter dataset by 6.9% and 4.7%, respectively. We also demonstrated that the improvement is essentially due to using the ensemble model and, especially, the hybrid feature. We share our code for reproducibility and future research at https://github.com/CandiceCong/StackingPPINet.
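The stacking step, training a meta-learner on the base models' predictions, can be sketched generically as below. The paper stacks classifiers; a closed-form least-squares meta-learner is used here only to keep the sketch short, and in practice the base predictions should be out-of-fold to avoid leakage.

```python
import numpy as np

def fit_meta(base_preds, y):
    """Least-squares meta-learner over base-model predictions (stacking sketch).

    base_preds: (n_samples, n_models) probabilities from the base models.
    y:          (n_samples,) binary labels.
    Returns intercept + one weight per base model.
    """
    X = np.column_stack([np.ones(len(y)), base_preds])   # add intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_meta(coef, base_preds):
    X = np.column_stack([np.ones(len(base_preds)), base_preds])
    return X @ coef
```

The meta-learner learns how much to trust each base model: a base model whose predictions track the labels receives a large weight, while an uninformative one is down-weighted.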
Affiliation(s)
- Hanhan Cong
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
- Hong Liu
- School of Information Science and Engineering, Shandong Normal University, Jinan, China.
- Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China.
- Yi Cao
- School of Information Science and Engineering, University of Jinan, Jinan, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
- Cheng Liang
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Yuehui Chen
- School of Information Science and Engineering, University of Jinan, Jinan, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
21
Zhao G, Zhao Z, Gong W, Li F. Radiology report generation with medical knowledge and multilevel image-report alignment: A new method and its verification. Artif Intell Med 2023; 146:102714. [PMID: 38042601 DOI: 10.1016/j.artmed.2023.102714] [Received: 10/31/2022] [Revised: 11/01/2023] [Accepted: 11/01/2023] [Indexed: 12/04/2023]
Abstract
Medical report generation is an integral part of computer-aided diagnosis aimed at reducing the workload of radiologists and physicians and alerting them of misdiagnosis risks. In general, medical report generation is an image captioning task. Since medical reports have long sequences with data bias, the existing medical report generation models lack medical knowledge and ignore the interaction alignment between the two modalities of reports and images. The current paper attempts to mitigate these deficiencies by proposing an approach based on knowledge enhancement with multilevel alignment (MKMIA). To this end, it includes a knowledge enhancement (MKE) module and a multilevel alignment module (MIRA). Specifically, the MKE deals with general medical knowledge (MK) and historical knowledge (HK) obtained via data training. The general knowledge is embedded in the form of a dictionary with characteristic organs (referred to as Key) and organ aliases, disease symptoms, etc. (referred to as Value). It provides explicit exception candidates to mitigate data bias. Historical knowledge ensures the comparison of similar cases to provide a better diagnosis. MIRA furnishes coarse-to-fine multilevel alignment, reducing the gap between image and text features, improving the knowledge enhancement module's performance, and facilitating the generation of lengthy reports. Experimental results on two radiology report datasets (i.e., IU X-ray and MIMIC-CXR) proved the effectiveness of the proposed approach, achieving state-of-the-art performance.
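The general-knowledge dictionary described above (characteristic organs as Key; aliases, disease symptoms, etc. as Value) can be sketched as a plain lookup table that supplies explicit abnormality candidates. The entries below are hypothetical illustrations, not the authors' actual dictionary.

```python
# Hypothetical fragment of an organ-keyed medical-knowledge dictionary:
# Key = organ, Value = aliases and typical abnormal findings.
MEDICAL_KNOWLEDGE = {
    "lung": {
        "aliases": ["pulmonary", "lungs"],
        "findings": ["opacity", "effusion", "pneumothorax"],
    },
    "heart": {
        "aliases": ["cardiac", "cardio"],
        "findings": ["cardiomegaly", "enlarged silhouette"],
    },
}

def candidate_findings(detected_organs):
    """Collect explicit abnormality candidates for the organs seen in an image."""
    out = []
    for organ in detected_organs:
        out.extend(MEDICAL_KNOWLEDGE.get(organ, {}).get("findings", []))
    return out
```

Feeding such explicit candidates to the decoder is one way knowledge injection counteracts the data bias of report corpora, where normal findings dominate.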
Affiliation(s)
- Guosheng Zhao
- School of Control Science and Engineering, Shandong University, Jinan, 250061, China
- Zijian Zhao
- School of Control Science and Engineering, Shandong University, Jinan, 250061, China.
- Wuxian Gong
- Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, 250021, China
- Feng Li
- Department of General Surgery, Qilu Hospital of Shandong University, Jinan, 250012, China
22
Xiao H, Song W, Liu C, Peng B, Zhu M, Jiang B, Liu Z. Reconstruction of central arterial pressure waveform based on CBi-SAN network from radial pressure waveform. Artif Intell Med 2023; 145:102683. [PMID: 37925212 DOI: 10.1016/j.artmed.2023.102683] [Received: 06/12/2022] [Revised: 05/30/2023] [Accepted: 10/06/2023] [Indexed: 11/06/2023]
Abstract
The central arterial pressure (CAP) is an important physiological indicator of the human cardiovascular system, diseases of which are among the greatest threats to human health. Accurate non-invasive detection and reconstruction of CAP waveforms are crucial for the reliable treatment of cardiovascular diseases. However, traditional methods reconstruct the waveform with relatively low accuracy, and some deep learning models also have difficulty extracting features, so these methods leave room for improvement. In this study, we proposed a novel model (CBi-SAN) to learn an end-to-end mapping from the radial artery pressure (RAP) waveform to the CAP waveform; it combines a convolutional neural network (CNN), a bidirectional long short-term memory network (BiLSTM), and a self-attention mechanism to improve CAP reconstruction. Invasively measured CAP and RAP waveforms from 62 patients, recorded before and after medication, were used to develop and validate the CBi-SAN model. We compared it with traditional methods and other deep learning models in terms of mean absolute error (MAE), root mean square error (RMSE), and Spearman correlation coefficient (SCC). The CBi-SAN model performed well on CAP waveform reconstruction (MAE: 2.23 ± 0.11 mmHg; RMSE: 2.21 ± 0.07 mmHg), with the best reconstruction obtained for the central artery systolic pressure (CASP) and central artery diastolic pressure (CADP) (RMSE_CASP: 2.94 ± 0.48 mmHg; RMSE_CADP: 1.96 ± 0.06 mmHg). These results imply that CAP reconstruction with the CBi-SAN model is superior to existing methods and holds promise for clinical application.
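The self-attention component layered on top of the CNN-BiLSTM features can be illustrated with a minimal scaled dot-product attention over a feature sequence. This numpy sketch omits the learned query/key/value projections and is not the paper's implementation, only the core re-weighting idea.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a feature sequence x of shape (T, d).

    Learned Q/K/V weight matrices are omitted (identity projections) to keep
    the sketch minimal; each time step is re-expressed as a softmax-weighted
    mixture of all time steps, so distant waveform samples can interact."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                  # (T, T) pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)   # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # rows sum to 1
    return weights @ x
```

In a waveform model like the one described, `x` would be the per-time-step feature map produced by the CNN/BiLSTM stages rather than the raw pressure samples.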
Affiliation(s)
- Hanguang Xiao
- College of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China.
- Wangwang Song
- College of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China
- Chang Liu
- College of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China
- Bo Peng
- College of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China
- Mi Zhu
- College of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China
- Bin Jiang
- College of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China
- Zhi Liu
- College of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China.

23
Sun H, Jin J, Daly I, Huang Y, Zhao X, Wang X, Cichocki A. Feature learning framework based on EEG graph self-attention networks for motor imagery BCI systems. J Neurosci Methods 2023; 399:109969. [PMID: 37683772 DOI: 10.1016/j.jneumeth.2023.109969] [Citation(s) in RCA: 0] [Received: 05/26/2023] [Revised: 08/18/2023] [Accepted: 09/03/2023] [Indexed: 09/10/2023]
Abstract
Learning distinguishable features from raw EEG signals is crucial for accurate classification of motor imagery (MI) tasks. To incorporate spatial relationships between EEG sources, we developed a feature set based on an EEG graph. In this graph, EEG channels represent the nodes, power spectral density (PSD) features define the node properties, and the edges preserve the spatial information. We designed an EEG-based graph self-attention network (EGSAN) to learn low-dimensional embedding vectors for EEG graphs, which can be used as distinguishable features for motor imagery task classification. We evaluated our EGSAN model on two publicly available MI EEG datasets, each containing different types of motor imagery tasks. Our experiments demonstrate that our proposed model effectively extracts distinguishable features from EEG graphs, achieving significantly higher classification accuracies than existing state-of-the-art methods.
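As a rough illustration of the graph construction described above, the sketch below builds per-channel PSD band-power node features and applies one attention-weighted aggregation over a channel adjacency matrix. The band limits, the adjacency, and the `graph_attention_layer` form are assumptions for illustration, not EGSAN's actual layers.

```python
import numpy as np

def psd_features(signals, fs=250, bands=((8, 12), (13, 30))):
    """Per-channel band power from raw EEG; signals has shape (channels, samples).

    The alpha/beta band choices and sampling rate here are illustrative,
    not the paper's configuration."""
    freqs = np.fft.rfftfreq(signals.shape[1], d=1 / fs)
    power = np.abs(np.fft.rfft(signals, axis=1)) ** 2
    return np.stack(
        [power[:, (freqs >= lo) & (freqs < hi)].mean(axis=1) for lo, hi in bands],
        axis=1,
    )  # (channels, n_bands) node feature matrix

def graph_attention_layer(features, adjacency):
    """One attention-weighted neighbourhood aggregation on the channel graph.

    adjacency is assumed to include self-loops so every row has a neighbour."""
    scores = features @ features.T                     # feature similarity per node pair
    scores = np.where(adjacency > 0, scores, -np.inf)  # attend to graph neighbours only
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ features
```

Stacking a few such layers and pooling the node embeddings yields a low-dimensional graph representation that a classifier head can consume.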
Affiliation(s)
- Hao Sun
- Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China
- Jing Jin
- Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China; Shenzhen Research Institute of East China University of Science and Technology, Shenzhen 518063, China.
- Ian Daly
- Brain-Computer Interfacing and Neural Engineering Laboratory, School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, United Kingdom
- Yitao Huang
- Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China
- Xueqing Zhao
- Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China
- Xingyu Wang
- Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China
- Andrzej Cichocki
- RIKEN Brain Science Institute, Wako 351-0198, Japan; Nicolaus Copernicus University (UMK), 87-100 Torun, Poland

24
Wang Y, Wu Z, Dai J, Morgan TN, Garbens A, Kominsky H, Gahan J, Larson EC. Evaluating robotic-assisted partial nephrectomy surgeons with fully convolutional segmentation and multi-task attention networks. J Robot Surg 2023; 17:2323-2330. [PMID: 37368225 PMCID: PMC10492672 DOI: 10.1007/s11701-023-01657-0] [Citation(s) in RCA: 0] [Received: 05/17/2023] [Accepted: 06/17/2023] [Indexed: 06/28/2023]
Abstract
We use machine learning to evaluate surgical skill from videos of the tumor resection and renorrhaphy steps of a robotic-assisted partial nephrectomy (RAPN). This expands previous work using synthetic tissue to include actual surgeries. We investigate cascaded neural networks for predicting surgical proficiency scores (OSATS and GEARS) from RAPN videos recorded from the da Vinci system. The semantic segmentation task generates a mask and tracks the various surgical instruments. The instrument movements found via semantic segmentation are processed by a scoring network that regresses (predicts) the GEARS and OSATS score for each subcategory. Overall, the model performs well for many subcategories of the GEARS and OSATS rubrics, such as force sensitivity and knowledge of instruments, but can suffer from false positives and negatives that would not be expected of human raters, mainly owing to limited training data variability and sparsity.
Affiliation(s)
- Yihao Wang
- Department of Computer Science, Southern Methodist University, Dallas, USA
- Zhongjie Wu
- Department of Computer Science, Southern Methodist University, Dallas, USA
- Jessica Dai
- Department of Urology, University of Texas Southwestern Medical Center, Dallas, USA
- Tara N. Morgan
- Department of Urology, University of Texas Southwestern Medical Center, Dallas, USA
- Alaina Garbens
- Department of Urology, University of Texas Southwestern Medical Center, Dallas, USA
- Hal Kominsky
- Department of Urology, University of Texas Southwestern Medical Center, Dallas, USA
- Jeffrey Gahan
- Department of Urology, University of Texas Southwestern Medical Center, Dallas, USA
- Eric C. Larson
- Department of Computer Science, Southern Methodist University, Dallas, USA

25
Liu X, Prince JL, Xing F, Zhuo J, Reese T, Stone M, El Fakhri G, Woo J. Attentive continuous generative self-training for unsupervised domain adaptive medical image translation. Med Image Anal 2023; 88:102851. [PMID: 37329854 PMCID: PMC10527936 DOI: 10.1016/j.media.2023.102851] [Citation(s) in RCA: 0] [Received: 11/23/2022] [Revised: 03/28/2023] [Accepted: 05/23/2023] [Indexed: 06/19/2023]
Abstract
Self-training is an important class of unsupervised domain adaptation (UDA) approaches that are used to mitigate the problem of domain shift, when applying knowledge learned from a labeled source domain to unlabeled and heterogeneous target domains. While self-training-based UDA has shown considerable promise on discriminative tasks, including classification and segmentation, through reliable pseudo-label filtering based on the maximum softmax probability, there is a paucity of prior work on self-training-based UDA for generative tasks, including image modality translation. To fill this gap, in this work, we seek to develop a generative self-training (GST) framework for domain adaptive image translation with continuous value prediction and regression objectives. Specifically, we quantify both aleatoric and epistemic uncertainties within our GST using variational Bayes learning to measure the reliability of synthesized data. We also introduce a self-attention scheme that de-emphasizes the background region to prevent it from dominating the training process. The adaptation is then carried out by an alternating optimization scheme with target domain supervision that focuses attention on the regions with reliable pseudo-labels. We evaluated our framework on two cross-scanner/center, inter-subject translation tasks, including tagged-to-cine magnetic resonance (MR) image translation and T1-weighted MR-to-fractional anisotropy translation. Extensive validations with unpaired target domain data showed that our GST yielded superior synthesis performance in comparison to adversarial training UDA methods.
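The reliability-based selection of synthesized pseudo-labels can be caricatured as keeping the target samples whose continuous predictions vary least across repeated stochastic forward passes. The simple variance quantile below is a simplification of the paper's variational-Bayes uncertainty estimate, and `filter_pseudo_labels` is a hypothetical helper, not the authors' API.

```python
import numpy as np

def filter_pseudo_labels(mc_predictions, keep_fraction=0.5):
    """Keep the target samples whose Monte-Carlo prediction variance is lowest.

    mc_predictions: (n_draws, n_samples) continuous predictions from repeated
    stochastic forward passes (e.g. dropout at inference). Low variance across
    draws is treated as a proxy for a reliable pseudo-label; the quantile
    threshold is an illustrative choice."""
    mean = mc_predictions.mean(axis=0)       # pseudo-label value per sample
    var = mc_predictions.var(axis=0)         # spread across stochastic draws
    cutoff = np.quantile(var, keep_fraction)
    mask = var <= cutoff
    return mean[mask], mask
```

In a full GST loop, the retained pseudo-labels would supervise the next round of adaptation while high-variance regions are down-weighted or revisited later.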
Affiliation(s)
- Xiaofeng Liu
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, 02114, USA.
- Jerry L Prince
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Fangxu Xing
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, 02114, USA
- Jiachen Zhuo
- Department of Neural and Pain Sciences, University of Maryland School of Dentistry, Baltimore, MD, USA
- Timothy Reese
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Maureen Stone
- Department of Neural and Pain Sciences, University of Maryland School of Dentistry, Baltimore, MD, USA
- Georges El Fakhri
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, 02114, USA
- Jonghye Woo
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, 02114, USA

26
Chong Y, Xie N, Liu X, Pan S. P-TransUNet: an improved parallel network for medical image segmentation. BMC Bioinformatics 2023; 24:285. [PMID: 37464322 DOI: 10.1186/s12859-023-05409-7] [Citation(s) in RCA: 0] [Received: 03/24/2023] [Accepted: 07/10/2023] [Indexed: 07/20/2023]
Abstract
Deep learning-based medical image segmentation has made great progress over the past decades. Scholars have proposed many novel transformer-based segmentation networks to solve the problems of building long-range dependencies and global context connections in convolutional neural networks (CNNs). However, these methods usually replace CNN-based blocks with improved transformer-based structures, which leads to a lack of local feature extraction ability, and these structures require large amounts of training data. Moreover, these methods pay little attention to edge information, which is essential in medical image segmentation. To address these problems, we proposed a new network structure, called P-TransUNet. This structure combines the designed efficient P-Transformer and a fusion module, which extract distance-related long-range dependencies and local information respectively and produce the fused features. In addition, we introduced an edge loss into training to focus the attention of the network on the edge of the lesion area and thereby improve segmentation performance. Extensive experiments across four medical image segmentation tasks demonstrated the effectiveness of P-TransUNet and showed that our network outperforms other state-of-the-art methods.
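An edge loss of the kind mentioned above can be sketched by up-weighting the pixel-wise loss on mask boundaries. The finite-difference edge map and the weighting scheme below are illustrative assumptions, not P-TransUNet's exact formulation.

```python
import numpy as np

def edge_map(mask):
    """Binary boundary map of a segmentation mask via finite differences."""
    m = mask.astype(float)
    gy = np.abs(np.diff(m, axis=0, prepend=m[:1]))
    gx = np.abs(np.diff(m, axis=1, prepend=m[:, :1]))
    return ((gx + gy) > 0).astype(float)

def edge_weighted_loss(pred, target, edge_weight=4.0, eps=1e-7):
    """Pixel-wise binary cross-entropy up-weighted on lesion boundaries.

    edge_weight controls how strongly boundary pixels dominate the loss;
    the value 4.0 is an arbitrary placeholder, not the paper's setting."""
    w = 1.0 + edge_weight * edge_map(target)
    p = np.clip(pred, eps, 1 - eps)
    bce = -(target * np.log(p) + (1 - target) * np.log(1 - p))
    return float((w * bce).mean())
```

The effect is that errors on boundary pixels cost several times more than errors in homogeneous regions, steering the network's attention toward lesion edges.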
Affiliation(s)
- Yanwen Chong
- The State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China
- Ningdi Xie
- The State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China
- Xin Liu
- The State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China
- Shaoming Pan
- The State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China.

27
Su WT, Hung YC, Yu PJ, Yang SH, Lin CW. Making the Invisible Visible: Toward High-Quality Terahertz Tomographic Imaging via Physics-Guided Restoration. Int J Comput Vis 2023; 131:1-20. [PMID: 37363294 PMCID: PMC10247273 DOI: 10.1007/s11263-023-01812-y] [Citation(s) in RCA: 0] [Received: 02/01/2022] [Accepted: 04/26/2023] [Indexed: 06/28/2023]
Abstract
Terahertz (THz) tomographic imaging has recently attracted significant attention thanks to its non-invasive, non-destructive, non-ionizing, material-classifying, and ultra-fast nature for object exploration and inspection. However, its strong water absorption and low noise tolerance lead to undesired blurs and distortions in reconstructed THz images. The diffraction-limited THz signals highly constrain the performance of existing restoration methods. To address the problem, we propose a novel multi-view Subspace-Attention-guided Restoration Network (SARNet) that fuses multi-view and multi-spectral features of THz images for effective image restoration and 3D tomographic reconstruction. To this end, SARNet uses multi-scale branches to extract intra-view spatio-spectral amplitude and phase features and fuse them via shared subspace projection and self-attention guidance. We then perform inter-view fusion to further improve the restoration of individual views by leveraging the redundancies between neighboring views. Here, we experimentally construct a THz time-domain spectroscopy (THz-TDS) system covering a broad frequency range from 0.1 to 4 THz to build a temporal/spectral/spatial/material THz database of hidden 3D objects. Complementary to a quantitative evaluation, we demonstrate the effectiveness of our SARNet model on 3D THz tomographic reconstruction applications. Supplementary Information: The online version contains supplementary material available at 10.1007/s11263-023-01812-y.
Affiliation(s)
- Weng-Tai Su
- Department of Electrical Engineering, National Tsing Hua University, Kuang-Fu Road, Hsinchu, 30048 Taiwan
- Yi-Chun Hung
- Department of Electrical Engineering, National Tsing Hua University, Kuang-Fu Road, Hsinchu, 30048 Taiwan
- Po-Jen Yu
- Department of Electrical Engineering, National Tsing Hua University, Kuang-Fu Road, Hsinchu, 30048 Taiwan
- Shang-Hua Yang
- Department of Electrical Engineering, National Tsing Hua University, Kuang-Fu Road, Hsinchu, 30048 Taiwan
- Chia-Wen Lin
- Department of Electrical Engineering, National Tsing Hua University, Kuang-Fu Road, Hsinchu, 30048 Taiwan

28
Gilany M, Wilson P, Perera-Ortega A, Jamzad A, To MNN, Fooladgar F, Wodlinger B, Abolmaesumi P, Mousavi P. TRUSformer: improving prostate cancer detection from micro-ultrasound using attention and self-supervision. Int J Comput Assist Radiol Surg 2023:10.1007/s11548-023-02949-4. [PMID: 37217768 DOI: 10.1007/s11548-023-02949-4] [Citation(s) in RCA: 0] [Received: 03/15/2023] [Accepted: 05/02/2023] [Indexed: 05/24/2023]
Abstract
PURPOSE Most previous machine learning methods for ultrasound-based prostate cancer detection classify small regions of interest (ROIs) of ultrasound signals that lie within a larger needle trace corresponding to a prostate tissue biopsy (called the biopsy core). These ROI-scale models suffer from weak labeling, as the histopathology results available for biopsy cores only approximate the distribution of cancer in the ROIs. ROI-scale models also do not take advantage of the contextual information that is normally considered by pathologists, i.e., information about surrounding tissue and larger-scale trends. We aim to improve cancer detection by taking a multi-scale, i.e., ROI-scale and biopsy core-scale, approach. METHODS Our multi-scale approach combines (i) an "ROI-scale" model trained using self-supervised learning to extract features from small ROIs and (ii) a "core-scale" transformer model that processes a collection of extracted features from multiple ROIs in the needle trace region to predict the tissue type of the corresponding core. Attention maps, as a by-product, allow us to localize cancer at the ROI scale. RESULTS We analyze this method using a dataset of micro-ultrasound acquired from 578 patients who underwent prostate biopsy, and compare our model to baseline models and other large-scale studies in the literature. Our model shows consistent and substantial performance improvements compared to ROI-scale-only models. It achieves [Formula: see text] AUROC, a statistically significant improvement over ROI-scale classification. We also compare our method to large studies on prostate cancer detection using other imaging modalities. CONCLUSIONS Taking a multi-scale approach that leverages contextual information improves prostate cancer detection compared to ROI-scale-only models. The proposed model achieves a statistically significant improvement in performance and outperforms other large-scale studies in the literature. Our code is publicly available at www.github.com/med-i-lab/TRUSFormer.
Affiliation(s)
- Mahdi Gilany
- School of Computing, Queen's University, Kingston, Canada.
- Paul Wilson
- School of Computing, Queen's University, Kingston, Canada
- Amoon Jamzad
- School of Computing, Queen's University, Kingston, Canada
- Minh Nguyen Nhat To
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada
- Fahimeh Fooladgar
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada
- Purang Abolmaesumi
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada
- Parvin Mousavi
- School of Computing, Queen's University, Kingston, Canada

29
Hou Z, Lv X, Zhou Y, Bu L, Ma Q, Wang Y, Bu F. A dynamic graph Hawkes process based on linear complexity self-attention for dynamic recommender systems. PeerJ Comput Sci 2023; 9:e1368. [PMID: 37346515 PMCID: PMC10280484 DOI: 10.7717/peerj-cs.1368] [Citation(s) in RCA: 0] [Received: 11/23/2022] [Accepted: 04/04/2023] [Indexed: 06/23/2023]
Abstract
The dynamic recommender system realizes real-time recommendation for users by learning dynamic interest characteristics, which is especially suitable for scenarios with rapid shifts in user interest, such as e-commerce and social media. The dynamic recommendation model mainly depends on the timestamped user-item interaction sequence, whose historical records reflect changes in the true interests of users and the popularity of items. Previous methods usually model interaction sequences to learn dynamic embeddings of users and items. However, these methods cannot directly capture the excitation effects of different historical information on the evolution of both sides of the interaction, i.e., the ability of one event to influence the occurrence of another. In this work, we propose a Dynamic Graph Hawkes Process based on Linear complexity Self-Attention (DGHP-LISA) for dynamic recommender systems, a new framework for modeling the dynamic relationship between users and items simultaneously. Specifically, DGHP-LISA is built on a dynamic graph and uses a Hawkes process to capture the excitation effects between events. In addition, we propose a new self-attention with linear complexity to model the time correlation of different historical events and the dynamic correlation between different update mechanisms, which drives more accurate modeling of the evolution of both sides of the interaction. Extensive experiments on three real-world datasets show that our model achieves consistent improvements over state-of-the-art baselines.
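The excitation effect that a Hawkes process captures, a past interaction temporarily raising the rate of future ones, follows the standard univariate conditional intensity lambda(t) = mu + sum over t_i < t of alpha * exp(-beta * (t - t_i)). The sketch below implements that textbook formula with arbitrary placeholder parameters; it is not DGHP-LISA's dynamic-graph variant.

```python
import numpy as np

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity of a univariate Hawkes process at time t.

    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)):
    mu is the baseline rate, each past event adds an excitation of size
    alpha that decays exponentially at rate beta. Parameter values are
    arbitrary placeholders, not fitted quantities."""
    past = np.asarray([ti for ti in event_times if ti < t], dtype=float)
    return mu + alpha * np.exp(-beta * (t - past)).sum()
```

In a recommender setting, `event_times` would be a user-item pair's past interaction timestamps, so a burst of recent interactions raises the predicted rate of the next one before it decays back toward the baseline.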
Affiliation(s)
- Zhiwen Hou
- School of Information Network Security, People’s Public Security University of China, Beijing, China
- Xiaojun Lv
- Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing, China
- Yuchen Zhou
- School of Information Network Security, People’s Public Security University of China, Beijing, China
- Lingbin Bu
- School of Information Network Security, People’s Public Security University of China, Beijing, China
- Qiming Ma
- School of Information Network Security, People’s Public Security University of China, Beijing, China
- Yifan Wang
- School of Information Network Security, People’s Public Security University of China, Beijing, China
- Fanliang Bu
- School of Information Network Security, People’s Public Security University of China, Beijing, China

30
Li YZ, Wang Y, Huang YH, Xiang P, Liu WX, Lai QQ, Gao YY, Xu MS, Guo YF. RSU-Net: U-net based on residual and self-attention mechanism in the segmentation of cardiac magnetic resonance images. Comput Methods Programs Biomed 2023; 231:107437. [PMID: 36863157 DOI: 10.1016/j.cmpb.2023.107437] [Citation(s) in RCA: 0] [Received: 05/30/2022] [Revised: 11/20/2022] [Accepted: 02/18/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND Automated segmentation techniques for cardiac magnetic resonance imaging (MRI) are beneficial for evaluating cardiac functional parameters in clinical diagnosis. However, cardiac MRI produces images with unclear boundaries and anisotropic resolution, so most existing methods still suffer from intra-class and inter-class uncertainty. In addition, the irregular anatomical shape of the heart and the inhomogeneity of tissue density make the boundaries of its anatomical structures uncertain and discontinuous. Therefore, fast and accurate segmentation of cardiac tissue remains a challenging problem in medical image processing. METHODOLOGY We collected cardiac MRI data from 195 patients as a training set and from 35 patients at different medical centers as an external validation set. We propose a U-net architecture with residual connections and a self-attention mechanism (Residual Self-Attention U-net, RSU-Net). The network builds on the classic U-net, adopting its U-shaped symmetric encoder-decoder architecture, improves the convolution module, introduces skip connections, and strengthens the network's capacity for feature extraction. To overcome the locality of ordinary convolutions and achieve a global receptive field, a self-attention mechanism is introduced at the bottom of the model. The loss function combines Cross Entropy Loss and Dice Loss to jointly guide network training, resulting in more stable training. RESULTS We employ the Hausdorff distance (HD) and the Dice similarity coefficient (DSC) as metrics for assessing segmentation outcomes. Comparison with the segmentation frameworks of other papers shows that our RSU-Net performs better and segments the heart accurately. CONCLUSION Our proposed RSU-Net combines the advantages of residual connections and self-attention: the residual links facilitate network training, while a bottom self-attention block (BSA Block) aggregates global information. The network achieves good segmentation results on the cardiac segmentation dataset and may facilitate the diagnosis of cardiovascular patients in the future.
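The combined Cross Entropy + Dice objective described in the abstract can be written in a few lines. The equal 0.5/0.5 weighting below is an assumption for illustration; the abstract does not state the weights.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for binary masks, with pred in [0, 1]."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def combined_loss(pred, target, w_ce=0.5, w_dice=0.5, eps=1e-7):
    """Weighted sum of binary cross-entropy and Dice loss.

    Cross-entropy gives dense per-pixel gradients while Dice directly targets
    region overlap; mixing the two is a common way to stabilise segmentation
    training. The 0.5/0.5 weights are placeholders, not the paper's values."""
    p = np.clip(pred, eps, 1 - eps)
    ce = -(target * np.log(p) + (1 - target) * np.log(1 - p)).mean()
    return w_ce * ce + w_dice * dice_loss(pred, target)
```

Cross-entropy alone can stall on class-imbalanced cardiac masks where background dominates, which is the usual motivation for adding the overlap-based Dice term.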
Affiliation(s)
- Yuan-Zhe Li
- Department of CT/MRI, The Second Affiliated Hospital of Fujian Medical University, Quanzhou 362000, China
- Yi Wang
- Department of CT/MRI, The Second Affiliated Hospital of Fujian Medical University, Quanzhou 362000, China
- Yin-Hui Huang
- Department of Neurology, Jinjiang Municipal Hospital, Quanzhou 362000, China
- Ping Xiang
- Department of Radiology, The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Traditional Chinese Medicine), Hangzhou 310000, China
- Wen-Xi Liu
- Department of CT/MRI, The Second Affiliated Hospital of Fujian Medical University, Quanzhou 362000, China
- Qing-Quan Lai
- Department of CT/MRI, The Second Affiliated Hospital of Fujian Medical University, Quanzhou 362000, China
- Yi-Yuan Gao
- Department of Radiology, The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Traditional Chinese Medicine), Hangzhou 310000, China
- Mao-Sheng Xu
- Department of Radiology, The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Traditional Chinese Medicine), Hangzhou 310000, China.
- Yi-Fan Guo
- Department of Radiology, The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Traditional Chinese Medicine), Hangzhou 310000, China.

31
Fan Y, Li L, Chu P, Wu Q, Wang Y, Cao W, Li N. Clinical analysis of eye movement-based data in the medical diagnosis of amblyopia. Methods 2023:S1046-2023(23)00045-2. [PMID: 36924866 DOI: 10.1016/j.ymeth.2023.03.003] [Citation(s) in RCA: 0] [Received: 11/10/2022] [Revised: 02/26/2023] [Accepted: 03/11/2023] [Indexed: 03/15/2023]
Abstract
Amblyopia is an abnormal visual processing-induced developmental disorder of the central nervous system that affects static and dynamic vision, as well as binocular visual function. Currently, changes in static vision in one eye are the gold standard for amblyopia diagnosis. However, there have been few comprehensive analyses of changes in dynamic vision, especially eye movement, among children with amblyopia. Here, we proposed an optimization scheme involving a video eye tracker combined with an "artificial eye" for comprehensive examination of eye movement in children with amblyopia; we sought to improve the diagnostic criteria for amblyopia and provide theoretical support for practical treatment. The resulting eye movement data were used to construct a deep learning approach for diagnostic and predictive applications. Through efforts to manage the uncooperativeness of children with strabismus who could not complete the eye movement assessment, this study quantitatively and objectively assessed the clinical implications of eye movement characteristics in children with amblyopia. Our results indicated that an amblyopic eye is always in a state of adjustment, and thus is not "lazy." Additionally, we found that the eye movement parameters of amblyopic eyes and eyes with normal vision are significantly different. Finally, we identified eye movement parameters that can be used to supplement and optimize the diagnostic criteria for amblyopia, providing a diagnostic basis for evaluation of binocular visual function.
32
Li K, Qian Z, Han Y, Chang EIC, Wei B, Lai M, Liao J, Fan Y, Xu Y. Weakly supervised histopathology image segmentation with self-attention. Med Image Anal 2023; 86:102791. [PMID: 36933385 DOI: 10.1016/j.media.2023.102791] [Citation(s) in RCA: 2] [Received: 01/21/2022] [Revised: 01/09/2023] [Accepted: 02/24/2023] [Indexed: 03/13/2023]
Abstract
Accurate pixel-level segmentation of histopathology images plays a critical role in the digital pathology workflow. The development of weakly supervised methods for histopathology image segmentation liberates pathologists from time-consuming and labor-intensive work, opening up possibilities for further automated quantitative analysis of whole-slide histopathology images. As an effective subgroup of weakly supervised methods, multiple instance learning (MIL) has achieved great success on histopathology images. In this paper, we treat pixels as instances, so that the histopathology image segmentation task is transformed into an instance prediction task in MIL. However, the lack of relations between instances in MIL limits further improvement of segmentation performance. Therefore, we propose a novel weakly supervised method called SA-MIL for pixel-level segmentation in histopathology images. SA-MIL introduces a self-attention mechanism into the MIL framework, which captures global correlation among all instances. In addition, we use deep supervision to make the best use of information from the limited annotations available in the weakly supervised setting. Our approach compensates for the independence of instances in MIL by aggregating global contextual information. We demonstrate state-of-the-art results compared to other weakly supervised methods on two histopathology image datasets. The high performance on both tissue and cell histopathology datasets indicates that our approach generalizes well and has potential for various applications in medical images.
Affiliation(s)
- Kailu Li
- School of Biological Science and Medical Engineering, State Key Laboratory of Software Development Environment, Key Laboratory of Biomechanics, Mechanobiology of Ministry of Education and Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, Beijing 100191, China.
- Ziniu Qian
- School of Biological Science and Medical Engineering, State Key Laboratory of Software Development Environment, Key Laboratory of Biomechanics, Mechanobiology of Ministry of Education and Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, Beijing 100191, China.
- Yingnan Han
- School of Biological Science and Medical Engineering, State Key Laboratory of Software Development Environment, Key Laboratory of Biomechanics, Mechanobiology of Ministry of Education and Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, Beijing 100191, China.
- Maode Lai
- Department of Pathology, School of Medicine, Zhejiang University, Hangzhou 310027, China.
- Jing Liao
- Department of Computer Science, City University of Hong Kong, 999077, Hong Kong SAR, China.
- Yubo Fan
- School of Biological Science and Medical Engineering, State Key Laboratory of Software Development Environment, Key Laboratory of Biomechanics, Mechanobiology of Ministry of Education and Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, Beijing 100191, China.
- Yan Xu
- School of Biological Science and Medical Engineering, State Key Laboratory of Software Development Environment, Key Laboratory of Biomechanics, Mechanobiology of Ministry of Education and Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, Beijing 100191, China; Microsoft Research, Beijing 100080, China.
33
Pan S, Liu X, Xie N, Chong Y. EG-TransUNet: a transformer-based U-Net with enhanced and guided models for biomedical image segmentation. BMC Bioinformatics 2023; 24:85. [PMID: 36882688] [PMCID: PMC9989586] [DOI: 10.1186/s12859-023-05196-1]
Abstract
Although various methods based on convolutional neural networks (CNNs) have improved biomedical image segmentation enough to meet the precision requirements of medical imaging tasks, deep learning-based medical image segmentation still needs to solve the following problems: (1) difficulty in extracting discriminative features of the lesion region during encoding, owing to its variable size and shape; and (2) difficulty in effectively fusing spatial and semantic information of the lesion region during decoding, owing to redundant information and the semantic gap. In this paper, we use the attention-based Transformer during the encoder and decoder stages to improve feature discrimination at the level of spatial detail and semantic location through multi-head self-attention. To this end, we propose an architecture called EG-TransUNet, including three modules improved by a Transformer: a progressive enhancement module, channel spatial attention, and semantic guidance attention. The proposed EG-TransUNet architecture captures object variability with improved results on different biomedical datasets. EG-TransUNet outperformed other methods on two popular colonoscopy datasets (Kvasir-SEG and CVC-ClinicDB), achieving mDice of 93.44% and 95.26%, respectively. Extensive experiments and visualization results demonstrate that our method advances performance on five medical segmentation datasets with better generalization ability.
Affiliation(s)
- Shaoming Pan
- The State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China.
- Xin Liu
- The State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China.
- Ningdi Xie
- The State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China.
- Yanwen Chong
- The State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China.
34
Wang J, Yuan M, Li Y, Zhao Z. Hierarchical Attention Master-Slave for heterogeneous multi-agent reinforcement learning. Neural Netw 2023; 162:359-368. [PMID: 36940496] [DOI: 10.1016/j.neunet.2023.02.037]
Abstract
Most multi-agent reinforcement learning (MARL) approaches optimize a strategy through self-improvement while ignoring the limitations of homogeneous agents, which may serve only a single function. In reality, complex tasks tend to require coordinating various types of agents that leverage one another's advantages. How to establish appropriate communication among them and optimize decisions is therefore a vital research issue. To this end, we propose a Hierarchical Attention Master-Slave (HAMS) MARL, where the hierarchical attention balances weight allocation within and among clusters, and the master-slave architecture gives agents independent reasoning and individual guidance. With this design, information fusion, especially among clusters, is implemented effectively, excessive communication is avoided, and selective composed actions optimize decisions. We evaluate the HAMS on both small- and large-scale heterogeneous StarCraft II micromanagement tasks. The proposed algorithm achieves exceptional performance, with win rates above 80% in all evaluation scenarios, including an impressive win rate of over 90% on the largest map, and a maximum improvement in win rate of 47% over the best known algorithm. The results show that our proposal outperforms recent state-of-the-art approaches and provides a novel idea for heterogeneous multi-agent policy optimization.
Affiliation(s)
- Jiao Wang
- College of Information Science and Engineering, Northeastern University, No. 3-11, Wenhua Road, Heping District, Shenyang, 110819, Liaoning, PR China.
- Mingrui Yuan
- College of Information Science and Engineering, Northeastern University, No. 3-11, Wenhua Road, Heping District, Shenyang, 110819, Liaoning, PR China.
- Yun Li
- College of Information Science and Engineering, Northeastern University, No. 3-11, Wenhua Road, Heping District, Shenyang, 110819, Liaoning, PR China.
- Zihui Zhao
- College of Information Science and Engineering, Northeastern University, No. 3-11, Wenhua Road, Heping District, Shenyang, 110819, Liaoning, PR China.
35
Kn BP, Cs A, Mohammed A, Chitta KK, To XV, Srour H, Nasrallah F. An end-end deep learning framework for lesion segmentation on multi-contrast MR images - an exploratory study in a rat model of traumatic brain injury. Med Biol Eng Comput 2023; 61:847-865. [PMID: 36624356] [DOI: 10.1007/s11517-022-02752-4]
Abstract
Traumatic brain injury (TBI) engenders traumatic necrosis and penumbra, areas of secondary neural injury that are crucial targets for therapeutic intervention. Manually segmenting areas of ongoing change such as necrosis, edema, hematoma, and inflammation is tedious, error-prone, and biased. Using multi-parametric MR data from a rodent model study, we demonstrate the effectiveness of an end-to-end deep learning global-attention-based UNet (GA-UNet) framework for automatic segmentation and quantification of TBI lesions. Longitudinal MR scans (2 h and 1, 3, 7, 14, 30, and 60 days) were performed on eight Sprague-Dawley rats after controlled cortical injury. TBI lesion and sub-region segmentation was performed using 3D-UNet and GA-UNet. Dice statistics (DSI) and Hausdorff distance were calculated to assess performance, and MR scan variation-based data augmentation (bias, noise, blur, ghosting) was performed to develop a robust model. The training/validation median DSI for 3D-UNet was 0.9368 with T2w and MPRAGE inputs, whereas GA-UNet reached 0.9537 for the same inputs. Testing accuracies were higher for GA-UNet than for 3D-UNet, with a DSI of 0.8232 for the T2w-MPRAGE inputs. Longitudinally, necrosis remained constant while oligemia and penumbra decreased, and edema appeared around day 3 and increased with time. GA-UNet shows promise for multi-contrast MR image-based segmentation and quantification of TBI in large cohort studies.
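The Dice statistic (DSI) used above measures the overlap between a predicted mask P and a reference mask T, DSI = 2|P ∩ T| / (|P| + |T|). A minimal sketch of the metric itself (not the study's evaluation code):

```python
def dice_coefficient(pred, truth):
    """Dice similarity index between two binary masks given as flat
    sequences of 0/1: DSI = 2*|P & T| / (|P| + |T|)."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Both masks empty: perfect agreement by convention.
    return 2.0 * intersection / total if total else 1.0
```

For example, a prediction matching one of two labeled voxels scores 2*1/(2+1) ≈ 0.667.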
Affiliation(s)
- Bhanu Prakash Kn
- Clinical Data Analytics & Radiomics, Cellular Image Informatics, Bioinformatics Institute, A*STAR, 30 Biopolis St Matrix, Singapore, 138671, Singapore; Cellular Image Informatics, Bioinformatics Institute, A*STAR Horizontal Technology Centers, Singapore, Singapore.
- Arvind Cs
- Clinical Data Analytics & Radiomics, Cellular Image Informatics, Bioinformatics Institute, A*STAR, 30 Biopolis St Matrix, Singapore, 138671, Singapore.
- Abdalla Mohammed
- Queensland Brain Institute, The University of Queensland, Building 79, Upland Road, Saint Lucia, Brisbane, QLD, 4072, Australia.
- Krishna Kanth Chitta
- Signal and Image Processing Group, Laboratory of Molecular Imaging, Singapore Bioimaging Consortium, A*STAR, 02-02 Helios 11, Biopolis Way, Singapore, 138667, Singapore.
- Xuan Vinh To
- Queensland Brain Institute, The University of Queensland, Building 79, Upland Road, Saint Lucia, Brisbane, QLD, 4072, Australia.
- Hussein Srour
- Queensland Brain Institute, The University of Queensland, Building 79, Upland Road, Saint Lucia, Brisbane, QLD, 4072, Australia.
- Fatima Nasrallah
- Queensland Brain Institute, The University of Queensland, Building 79, Upland Road, Saint Lucia, Brisbane, QLD, 4072, Australia.
36
Li J, Sun W, von Deneen KM, Fan X, An G, Cui G, Zhang Y. MG-Net: Multi-level global-aware network for thymoma segmentation. Comput Biol Med 2023; 155:106635. [PMID: 36791547] [DOI: 10.1016/j.compbiomed.2023.106635]
Abstract
BACKGROUND AND OBJECTIVE Automatic thymoma segmentation in preoperative contrast-enhanced computed tomography (CECT) images is of great value for diagnosis. Although convolutional neural networks (CNNs) excel at medical image segmentation, they are challenged by thymomas with various shapes, scales, and textures, owing to the intrinsic locality of convolution operations. To overcome this deficit, we built a deep learning network with enhanced global awareness for thymoma segmentation. METHODS We propose a multi-level global-aware network (MG-Net) for thymoma segmentation, in which multi-level feature interaction and integration are jointly designed to enhance the global awareness of CNNs. In particular, we design a cross-attention block (CAB) to calculate pixel-wise interactions of multi-level features, resulting in the Global Enhanced Convolution Block, which enables the network to handle various thymomas by strengthening the global awareness of the encoder. We further devise the Global Spatial Attention Module to integrate coarse- and fine-grained information, enhancing the semantic consistency between the encoder and decoder with CABs. We also develop an Adaptive Attention Fusion Module to adaptively aggregate features of different semantic scales in the decoder to preserve comprehensive details. RESULTS MG-Net has been evaluated against several state-of-the-art models on a self-collected CECT dataset and the NIH Pancreas-CT dataset. The results suggest that all designed components are effective and that MG-Net has superior segmentation performance and generalization ability over existing models. CONCLUSION Both the qualitative and quantitative experimental results indicate that our globally aware MG-Net achieves accurate thymoma segmentation and generalizes across different tasks. The code is available at: https://github.com/Leejyuan/MGNet.
Affiliation(s)
- Jingyuan Li
- Center for Brain Imaging, School of Life Science and Technology, Xidian University & Engineering Research Center of Molecular and Neuro Imaging, Ministry of Education, Xi'an, Shaanxi, 710126, China; International Joint Research Center for Advanced Medical Imaging and Intelligent Diagnosis and Treatment & Xi'an Key Laboratory of Intelligent Sensing and Regulation of Trans-Scale Life Information, School of Life Science and Technology, Xidian University, Xi'an, Shaanxi, 710126, China.
- Wenfang Sun
- International Joint Research Center for Advanced Medical Imaging and Intelligent Diagnosis and Treatment & Xi'an Key Laboratory of Intelligent Sensing and Regulation of Trans-Scale Life Information, School of Life Science and Technology, Xidian University, Xi'an, Shaanxi, 710126, China; School of Aerospace Science and Technology, Xidian University, Xi'an, Shaanxi, 710126, China.
- Karen M von Deneen
- Center for Brain Imaging, School of Life Science and Technology, Xidian University & Engineering Research Center of Molecular and Neuro Imaging, Ministry of Education, Xi'an, Shaanxi, 710126, China; International Joint Research Center for Advanced Medical Imaging and Intelligent Diagnosis and Treatment & Xi'an Key Laboratory of Intelligent Sensing and Regulation of Trans-Scale Life Information, School of Life Science and Technology, Xidian University, Xi'an, Shaanxi, 710126, China.
- Xiao Fan
- Center for Brain Imaging, School of Life Science and Technology, Xidian University & Engineering Research Center of Molecular and Neuro Imaging, Ministry of Education, Xi'an, Shaanxi, 710126, China; International Joint Research Center for Advanced Medical Imaging and Intelligent Diagnosis and Treatment & Xi'an Key Laboratory of Intelligent Sensing and Regulation of Trans-Scale Life Information, School of Life Science and Technology, Xidian University, Xi'an, Shaanxi, 710126, China.
- Gang An
- Center for Brain Imaging, School of Life Science and Technology, Xidian University & Engineering Research Center of Molecular and Neuro Imaging, Ministry of Education, Xi'an, Shaanxi, 710126, China; International Joint Research Center for Advanced Medical Imaging and Intelligent Diagnosis and Treatment & Xi'an Key Laboratory of Intelligent Sensing and Regulation of Trans-Scale Life Information, School of Life Science and Technology, Xidian University, Xi'an, Shaanxi, 710126, China.
- Guangbin Cui
- Department of Radiology, Tangdu Hospital, Fourth Military Medical University, Xi'an, Shaanxi, 710038, China.
- Yi Zhang
- Center for Brain Imaging, School of Life Science and Technology, Xidian University & Engineering Research Center of Molecular and Neuro Imaging, Ministry of Education, Xi'an, Shaanxi, 710126, China; International Joint Research Center for Advanced Medical Imaging and Intelligent Diagnosis and Treatment & Xi'an Key Laboratory of Intelligent Sensing and Regulation of Trans-Scale Life Information, School of Life Science and Technology, Xidian University, Xi'an, Shaanxi, 710126, China.
37
Li F, Xu Y, Zhang B, Cong F. [Automated detection of sleep-arousal using multi-scale convolution and self-attention mechanism]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi 2023; 40:27-34. [PMID: 36854545] [DOI: 10.7507/1001-5515.202204052]
Abstract
In clinical practice, manual scoring by technicians is the major method for sleep arousal detection; this method is time-consuming and subjective. This study aimed to achieve end-to-end detection of sleep-arousal events by constructing a convolutional neural network based on multi-scale convolutional layers and a self-attention mechanism, using 1-min single-channel electroencephalogram (EEG) signals as input. Compared with the baseline model, the proposed method improved both the mean area under the precision-recall curve and the area under the receiver operating characteristic curve by 7%. Furthermore, we compared the effects of single-modality and multi-modality inputs on model performance. The results revealed the power of single-channel EEG signals for automatic sleep arousal detection, whereas a simple combination of multi-modality signals may be counterproductive to improving model performance. Finally, we explored the scalability of the proposed model by transferring it to the automated sleep staging task on the same dataset. The average accuracy of 73% also suggested the power of the proposed method in task transfer. This study provides a potential solution for the development of portable sleep monitoring and paves the way for automatic sleep data analysis using transfer learning.
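The multi-scale convolutional front end described above can be illustrated by filtering one signal with kernels of several widths and keeping the outputs side by side. This sketch uses fixed moving-average kernels on a toy sequence; the paper's network learns its kernels and sizes:

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (no padding, stride 1)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def multi_scale_features(signal, kernel_sizes=(3, 5, 7)):
    """Apply averaging kernels of several widths so that short and
    long temporal patterns (e.g. in an EEG trace) are captured in
    parallel feature streams."""
    feats = []
    for k in kernel_sizes:
        kernel = [1.0 / k] * k  # placeholder for a learned kernel
        feats.append(conv1d(signal, kernel))
    return feats
```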
Affiliation(s)
- Fan Li
- School of Biomedical Engineering, Dalian University of Technology, Dalian, Liaoning 116024, P. R. China.
- Yan Xu
- Department of Psychiatry, Nanfang Hospital, Southern Medical University, Guangzhou 510515, P. R. China.
- Bin Zhang
- Department of Psychiatry, Nanfang Hospital, Southern Medical University, Guangzhou 510515, P. R. China.
- Fengyu Cong
- School of Biomedical Engineering, Dalian University of Technology, Dalian, Liaoning 116024, P. R. China; School of Artificial Intelligence, Dalian University of Technology, Dalian, Liaoning 116024, P. R. China; Key Laboratory of Integrated Circuit and Biomedical Electronic System, Liaoning Province, Dalian University of Technology, Dalian, Liaoning 116024, P. R. China.
38
Ma Z, Qi Y, Xu C, Zhao W, Lou M, Wang Y, Ma Y. ATFE-Net: Axial Transformer and Feature Enhancement-based CNN for ultrasound breast mass segmentation. Comput Biol Med 2023; 153:106533. [PMID: 36638617] [DOI: 10.1016/j.compbiomed.2022.106533]
Abstract
Breast mass is one of the main clinical symptoms of breast cancer. Recently, many CNN-based methods for breast mass segmentation have been proposed. However, these methods have difficulty capturing long-range dependencies, causing poor segmentation of large-scale breast masses. In this paper, we propose an axial Transformer and feature enhancement-based CNN (ATFE-Net) for ultrasound breast mass segmentation. Specifically, an axial Transformer (Axial-Trans) module and a Transformer-based feature enhancement (Trans-FE) module are proposed to capture long-range dependencies. The Axial-Trans module calculates self-attention only along the width and height directions of the input feature maps, which reduces the complexity of self-attention significantly, from O(n²) to O(n). In addition, the Trans-FE module enhances feature representation by capturing dependencies between different feature layers, since deeper feature layers have richer semantic information and shallower feature layers have more detailed information. The experimental results show that our ATFE-Net achieved better performance than several state-of-the-art methods on two publicly available breast ultrasound datasets, with Dice coefficients of 82.46% for BUSI and 86.78% for UDIAT, respectively.
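The complexity saving claimed for the Axial-Trans module can be illustrated by counting query-key comparisons on an H×W feature map: full self-attention compares every pixel with every pixel, while axial attention restricts each pixel to its own row and column. The counting below is a back-of-the-envelope sketch, not the paper's code:

```python
def full_attention_pairs(h, w):
    """Full self-attention over an h*w feature map:
    every pixel queries every pixel -> (h*w)**2 comparisons."""
    n = h * w
    return n * n

def axial_attention_pairs(h, w):
    """Axial attention: each pixel queries only the w pixels in its
    row plus the h pixels in its column -> (h*w)*(h+w) comparisons."""
    n = h * w
    return n * (h + w)
```

On a 32×32 map this is 1,048,576 versus 65,536 comparisons, a 16x reduction; the gap widens quadratically with resolution.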
Affiliation(s)
- Zhou Ma
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
- Yunliang Qi
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
- Chunbo Xu
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
- Wei Zhao
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
- Meng Lou
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
- Yiming Wang
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
- Yide Ma
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
39
Shen J, Hu Y, Zhang X, Gong Y, Kawasaki R, Liu J. Structure-Oriented Transformer for retinal diseases grading from OCT images. Comput Biol Med 2023; 152:106445. [PMID: 36549031] [DOI: 10.1016/j.compbiomed.2022.106445]
Abstract
Retinal diseases are the leading causes of temporary or permanent vision loss. Precise retinal disease grading is a prerequisite for early intervention and specific therapeutic schedules. Existing works based on Convolutional Neural Networks (CNNs) focus on typical local structures and cannot capture long-range dependencies, yet retinal disease grading relies more on the relationship between local lesions and the whole retina, which is consistent with the self-attention mechanism. Therefore, this paper proposes a novel Structure-Oriented Transformer (SoT) framework to construct the relationship between lesions and retina on clinical datasets. To reduce dependence on the amount of data, we design structure guidance as a model-oriented filter that emphasizes the whole retinal structure and guides relation construction. We then adopt a pre-trained vision transformer that efficiently models the relationships among all feature patches via transfer learning. Besides, to make the best use of all output tokens, a token vote classifier is proposed to obtain the final grading results. We conduct extensive experiments on one clinical neovascular Age-related Macular Degeneration (nAMD) dataset. The experiments demonstrate the effectiveness of the SoT components, which improve the construction of relations between lesion and retina and outperform state-of-the-art methods for nAMD grading. Furthermore, we evaluate our SoT on one publicly available retinal disease dataset, which confirms the algorithm's classification superiority and good generality.
Affiliation(s)
- Junyong Shen
- Research Institute of Trustworthy Autonomous Systems and Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 51805, Guangdong, China.
- Yan Hu
- Research Institute of Trustworthy Autonomous Systems and Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 51805, Guangdong, China.
- Xiaoqing Zhang
- Research Institute of Trustworthy Autonomous Systems and Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 51805, Guangdong, China.
- Yan Gong
- Ningbo Eye Hospital, Ningbo, 315000, Zhejiang, China.
- Ryo Kawasaki
- Osaka University Graduate School of Medicine, Suita, Osaka, Japan.
- Jiang Liu
- Research Institute of Trustworthy Autonomous Systems and Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 51805, Guangdong, China; Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 51805, Guangdong, China.
40
Yang H, Wang L, Xu Y, Liu X. CovidViT: a novel neural network with self-attention mechanism to detect Covid-19 through X-ray images. Int J Mach Learn Cybern 2023; 14:973-987. [PMID: 36274812] [PMCID: PMC9580454] [DOI: 10.1007/s13042-022-01676-7]
Abstract
Since the emergence of the novel coronavirus in December 2019, it has rapidly swept across the globe, with a huge impact on daily life, public health, and economies around the world. There is an urgent need for a rapid and economical Covid-19 detection method. In this study, we used a transformer-based deep learning method to analyze chest X-rays of normal, Covid-19, and viral pneumonia patients. Covid-Vision-Transformer (CovidViT) is proposed to detect Covid-19 cases from X-ray images. CovidViT is based on transformer blocks with the self-attention mechanism. To demonstrate its superiority, it is also compared with other popular deep learning models; the experimental results show that CovidViT outperforms them and achieves 98.0% accuracy on the test set, which means the proposed model is excellent at Covid-19 detection. Besides, an online system for quick Covid-19 diagnosis has been built at http://yanghang.site/covid19.
Affiliation(s)
- Hang Yang
- College of Science, China Agricultural University, Beijing 100083, China.
- Liyang Wang
- School of Clinical Medicine, Tsinghua University, Beijing 100084, China.
- Yitian Xu
- College of Science, China Agricultural University, Beijing 100083, China.
- Xuhua Liu
- College of Science, China Agricultural University, Beijing 100083, China.
41
Yang Z, Wu H, Liu Q, Liu X, Zhang Y, Cao X. A self-attention integrated spatiotemporal LSTM approach to edge-radar echo extrapolation in the Internet of Radars. ISA Trans 2023; 132:155-166. [PMID: 35840413] [DOI: 10.1016/j.isatra.2022.06.046]
Abstract
In recent years, the number of weather-related disasters has increased significantly across the world. As a typical example, short-range extreme precipitation can cause severe flooding and other secondary disasters, and therefore requires accurate prediction of the extent and intensity of precipitation over a relatively short period of time. Based on echo extrapolation from networked weather radars (i.e., the Internet of Radars), different solutions have been presented, ranging from traditional optical-flow methods to recent deep neural networks. However, these existing networks focus on local features of echo variations to model the dynamics of holistic radar echo motion, so they often suffer from inaccurate extrapolation of the radar echo motion trend, trajectory, and intensity. To address this problem, this paper introduces the self-attention mechanism and an extra memory that saves global spatiotemporal features into the original spatiotemporal LSTM (ST-LSTM), forming a self-attention integrated ST-LSTM recurrent unit (SAST-LSTM) that captures both spatially and temporally global features of radar echo motion. Several of these units are stacked to build the radar echo extrapolation network SAST-Net. Comparative experiments show that the proposed model outperforms other recent methods on different real-world radar echo datasets.
Affiliation(s)
- Zhiyun Yang
- School of Computer and Software, Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing, 210044, China.
- Hao Wu
- School of Computer and Software, Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing, 210044, China.
- Qi Liu
- School of Computer and Software, Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing, 210044, China.
- Xiaodong Liu
- School of Computing, Edinburgh Napier University, Edinburgh, EH10 5DT, UK.
- Yonghong Zhang
- School of Automation, Nanjing University of Information Science and Technology, Nanjing, 210044, China.
- Xuefei Cao
- School of Cyber and Information Security, Xidian University, Xi'an, 710071, China.
42
Teng J, Mi C, Liu W, Shi J, Li N. mTBI-DSANet: A deep self-attention model for diagnosing mild traumatic brain injury using multi-level functional connectivity networks. Comput Biol Med 2023; 152:106354. [PMID: 36481760] [DOI: 10.1016/j.compbiomed.2022.106354]
Abstract
The main approach for analyzing resting-state functional magnetic resonance imaging (rs-fMRI) is the low-order functional connectivity network (LoFCN), based on the correlation between pairs of brain regions. Building on the LoFCN, researchers recently proposed the topographical high-order FCN (tHoFCN) and the associated high-order FCN (aHoFCN) to explore high-order interactions among brain regions. In this work, we designed a Deep Self-Attention (DSA) framework called mTBI-DSANet to diagnose mild traumatic brain injury (mTBI) using multi-level FCNs, including the LoFCN, tHoFCN, and aHoFCN. The multilayer perceptron and self-attention mechanism in mTBI-DSANet were designed to capture important features for mTBI diagnosis. We evaluated mTBI-DSANet's performance on a real rs-fMRI dataset, collected by the Third Xiangya Hospital of Central South University from April 2014 to February 2021, and compared it with distinct FCNs and their combinations under 10-fold cross-validation. Based on the LoFCN+aHoFCN combination, mTBI-DSANet achieved the best average accuracy of 0.834, which is significantly better than peer methods. The experiments demonstrate the potential of mTBI-DSANet to assist mTBI diagnosis.
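The LoFCN described above is simply the matrix of pairwise Pearson correlations between regional rs-fMRI time series. A minimal sketch on hypothetical toy data (not the study's pipeline):

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lofcn(series):
    """Low-order FCN: one row/column per brain region, each entry the
    Pearson correlation between that pair of regional time series."""
    return [[pearson(a, b) for b in series] for a in series]
```

The high-order variants (tHoFCN, aHoFCN) then compute further correlations on top of this matrix rather than on the raw signals.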
Affiliation(s)
- Jing Teng
- School of Control and Computer Engineering, North China Electric Power University, Beijing, China.
- Chunlin Mi
- School of Control and Computer Engineering, North China Electric Power University, Beijing, China.
- Wuyi Liu
- School of Control and Computer Engineering, North China Electric Power University, Beijing, China.
- Jian Shi
- Department of Hematology and Critical Care Medicine, The Third Xiangya Hospital of Central South University, Changsha, China.
- Na Li
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, China.
|
43
|
Li Z, Zhang X, Dong Z. TSF-transformer: a time series forecasting model for exhaust gas emission using transformer. APPL INTELL 2022; 53:1-15. [PMID: 36590990 PMCID: PMC9788662 DOI: 10.1007/s10489-022-04326-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/05/2022] [Indexed: 12/24/2022]
Abstract
Monitoring and predicting exhaust gas emissions from heavy trucks is a promising way to address environmental problems. However, emission data acquisition is time-delayed and emission patterns are usually irregular, which makes the emission state very difficult to predict accurately. To deal with these problems, we cast emission prediction as a time series forecasting problem and explore a deep learning model, a time-series forecasting Transformer (TSF-Transformer), for exhaust gas emission prediction. Rather than predicting exhaust emissions directly, the model predicts them indirectly through the temperature and pressure changes of the exhaust pipe while the truck is operating. Our research is based on real-time data feeds from temperature and pressure sensors installed on the exhaust pipes of approximately 12,000 heavy trucks. The forecasting task therefore consists of two key stages: monitoring, in which a server receives the data sent by the sensors in real time, and prediction, in which these data serve as samples for network training and testing. Throughout the prediction process, the network is trained in an unsupervised manner. To visualize the forecast results, we weight the forecast data with the truck trajectories and present them as heatmaps. To the best of our knowledge, this is the first case of using the Transformer as the core component of a model for predicting exhaust emissions from heavy trucks. Experiments show that the model outperforms other state-of-the-art methods in prediction accuracy.
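The monitoring-then-prediction pipeline described here ultimately turns sensor streams into windowed training samples for a forecasting model. A minimal sketch of that windowing step (window lengths and the toy stream are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

def make_windows(series: np.ndarray, n_in: int, n_out: int):
    """Slice a 1-D sensor stream into (input window, target window)
    pairs for training a forecasting model such as a Transformer."""
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])              # observed history
        y.append(series[i + n_in:i + n_in + n_out])  # horizon to predict
    return np.array(X), np.array(y)

# Toy stream standing in for an exhaust-pipe temperature feed.
temps = np.arange(10, dtype=float)
X, y = make_windows(temps, n_in=4, n_out=2)
print(X.shape, y.shape)  # (5, 4) (5, 2)
```

Each row of `X` is a history the model conditions on, and the matching row of `y` is the future segment it learns to forecast.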
Affiliation(s)
- Zhenyu Li
- Logistics Engineering College, Shanghai Maritime University, Shanghai, 200120 China
- School of Mechanical Engineering, Tongji University, Shanghai, 201804 China
- Xikun Zhang
- Logistics Engineering College, Shanghai Maritime University, Shanghai, 200120 China
- Zhenbiao Dong
- School of Mechanical Engineering, Shanghai Institute of Technology, Shanghai, 201418 China
44
Huang X, Chen J, Chen M, Chen L, Wan Y. TDD-UNet:Transformer with double decoder UNet for COVID-19 lesions segmentation. Comput Biol Med 2022; 151:106306. [PMID: 36403357 PMCID: PMC9664702 DOI: 10.1016/j.compbiomed.2022.106306] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2022] [Revised: 10/22/2022] [Accepted: 11/06/2022] [Indexed: 11/09/2022]
Abstract
The outbreak of novel coronavirus pneumonia (COVID-19) has brought severe health risks to the world. Detection of COVID-19 based on the UNet network has attracted widespread attention in medical image segmentation. However, the traditional UNet model struggles to capture long-range dependencies in an image because its convolution kernels have a fixed receptive field. The Transformer encoder overcomes the long-range dependence problem, but Transformer-based segmentation approaches cannot effectively capture fine-grained details. To address this challenge, we propose TDD-UNet, a Transformer with a double-decoder UNet for COVID-19 lesion segmentation. We introduce the multi-head self-attention of the Transformer into the UNet encoding layer to extract global context information. The double-decoder structure improves foreground segmentation by predicting the background and applying deep supervision. We performed quantitative analysis and comparison on four public datasets with different modalities, including CT and CXR, to demonstrate the method's effectiveness and generality in segmenting COVID-19 lesions, and conducted ablation studies on the COVID-19-CT-505 dataset to verify the key components of the model. TDD-UNet achieves higher mean Dice and Jaccard scores and the lowest standard deviation compared to competitors, producing better segmentation results than other state-of-the-art methods.
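The multi-head self-attention that TDD-UNet borrows from the Transformer reduces, per head, to scaled dot-product attention over all spatial tokens, which is what gives the encoder its global receptive field. A single-head numpy sketch (toy shapes and random projections, not the paper's implementation):

```python
import numpy as np

def self_attention(x: np.ndarray, wq, wk, wv) -> np.ndarray:
    """Single-head scaled dot-product self-attention.

    x: (n_tokens, d_model); wq/wk/wv: (d_model, d_k) projections.
    Every token attends to every other token, so distant image
    positions can influence each other in one layer.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable softmax over the token axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(1)
x = rng.standard_normal((6, 8))                      # 6 tokens, d_model=8
w = [rng.standard_normal((8, 4)) for _ in range(3)]  # q, k, v projections
out = self_attention(x, *w)
print(out.shape)  # (6, 4)
```

A multi-head version runs several such heads in parallel and concatenates their outputs before a final projection.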
Affiliation(s)
- Xuping Huang
- Computer School, University of South China, Hengyang 421001, China
- Junxi Chen
- Affiliated Nanhua Hospital, University of South China, Hengyang 421001, China
- Mingzhi Chen
- College of Mechanical and Vehicle Engineering, Hunan University, Changsha 410082, China
- Lingna Chen
- Computer School, University of South China, Hengyang 421001, China (corresponding author)
- Yaping Wan
- Computer School, University of South China, Hengyang 421001, China (corresponding author)
45
Li H, Yue X, Meng L. Enhanced mechanisms of pooling and channel attention for deep learning feature maps. PeerJ Comput Sci 2022; 8:e1161. [PMID: 36532804 PMCID: PMC9748832 DOI: 10.7717/peerj-cs.1161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2022] [Accepted: 10/26/2022] [Indexed: 06/17/2023]
Abstract
The pooling function is vital for deep neural networks (DNNs): it generalizes the representation of feature maps and progressively reduces their spatial size to cut the network's computational cost. It is also the basis of the attention mechanism in computer vision. However, pooling is a down-sampling operation that makes the feature-map representation only approximately invariant to small translations by summarizing the statistics of adjacent pixels, so it inevitably loses some information. In this article, we propose a fused max-average pooling (FMAPooling) operation and an improved channel attention mechanism (FMAttn) that utilize both pooling functions to enhance feature representation in DNNs. The idea is to combine the multi-level features extracted by max pooling and average pooling, respectively. The effectiveness of the proposals is verified with VGG, ResNet, and MobileNetV2 architectures on CIFAR10/100 and ImageNet100. According to the experimental results, FMAPooling brings up to 1.63% accuracy improvement over the baseline model, and FMAttn achieves up to 2.21% accuracy improvement over the previous channel attention mechanism. Furthermore, the proposals are extensible and can easily be embedded into various DNN models or take the place of certain DNN structures, while the computation they introduce is negligible.
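The abstract does not spell out how FMAPooling fuses its two branches, so the sketch below uses a simple weighted sum of the max-pooled and average-pooled summaries as an illustrative assumption; the paper's actual fusion may differ:

```python
import numpy as np

def fused_max_avg_pool(x: np.ndarray, k: int, alpha: float = 0.5):
    """Fuse max pooling and average pooling on a 2-D feature map.

    x: (H, W) with H and W divisible by k. alpha blends the two
    branch outputs (alpha=1 -> pure max, alpha=0 -> pure average).
    NOTE: the weighted-sum fusion is a hypothetical stand-in for
    the paper's FMAPooling, used only to show the two branches.
    """
    H, W = x.shape
    tiles = x.reshape(H // k, k, W // k, k)  # non-overlapping k x k tiles
    mx = tiles.max(axis=(1, 3))              # max-pooling branch
    avg = tiles.mean(axis=(1, 3))            # average-pooling branch
    return alpha * mx + (1 - alpha) * avg

fm = np.arange(16, dtype=float).reshape(4, 4)
pooled = fused_max_avg_pool(fm, k=2)
print(pooled)
```

On the 4x4 ramp above, each 2x2 tile contributes the midpoint between its maximum and its mean, e.g. the top-left tile {0, 1, 4, 5} yields 0.5 * 5 + 0.5 * 2.5 = 3.75.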
Affiliation(s)
- Hengyi Li
- Graduate School of Science and Engineering, Ritsumeikan University, Kusatsu, Shiga, Japan
- Xuebin Yue
- Graduate School of Science and Engineering, Ritsumeikan University, Kusatsu, Shiga, Japan
- Lin Meng
- College of Science and Engineering, Ritsumeikan University, Kusatsu, Shiga, Japan
46
D’Souza G, Reddy NVS, Manjunath KN. Localization of lung abnormalities on chest X-rays using self-supervised equivariant attention. Biomed Eng Lett 2022; 13:21-30. [PMID: 36711159 PMCID: PMC9873849 DOI: 10.1007/s13534-022-00249-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2022] [Revised: 10/02/2022] [Accepted: 10/08/2022] [Indexed: 11/06/2022] Open
Abstract
Chest X-ray (CXR) images provide most anatomical details and abnormalities on a 2D plane, so a 2D view of the 3D anatomy is sometimes sufficient for an initial diagnosis. However, close to fourteen commonly occurring diseases are sometimes difficult to identify by visually inspecting the images, which has driven the development of computer-aided assistive systems to help radiologists. This paper proposes a deep learning model for the classification and localization of chest diseases using image-level annotations. The model consists of a modified ResNet50 backbone for extracting a feature corpus from the images, a classifier, and a pixel correlation module (PCM). During PCM training, the network is a weight-shared siamese architecture: the first branch applies an affine transform to the image before feeding it to the network, while the second applies the same transform to the network's output. The method was evaluated on CXR images from the clinical center, split 70:20 for training and testing. The model was developed and tested on the cloud computing platform Google Colaboratory (NVIDIA Tesla P100 GPU, 16 GB of RAM), and a radiologist subjectively validated the results. Our model, trained with the configurations described in this paper, outperformed benchmark results. Supplementary Information The online version contains supplementary material available at 10.1007/s13534-022-00249-5.
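The siamese PCM training described above enforces equivariance: transforming the input and then applying the network should match applying the network and then transforming its output. A toy numpy check, with a horizontal flip standing in for the affine transform and a fixed symmetric filter standing in for the network (both are illustrative stand-ins, not the paper's components):

```python
import numpy as np

def net(x: np.ndarray) -> np.ndarray:
    """Stand-in 'network': a symmetric horizontal 3-tap average,
    chosen so it is exactly equivariant to the flip used below."""
    pad = np.pad(x, ((0, 0), (1, 1)), mode="edge")
    return (pad[:, :-2] + pad[:, 1:-1] + pad[:, 2:]) / 3.0

def flip(x: np.ndarray) -> np.ndarray:
    """The 'affine transform' applied by the siamese branches."""
    return x[:, ::-1]

img = np.arange(12, dtype=float).reshape(3, 4)
branch1 = net(flip(img))   # branch 1: transform input, then network
branch2 = flip(net(img))   # branch 2: network, then transform output
loss = np.abs(branch1 - branch2).mean()  # equivariance consistency loss
print(loss)  # 0.0 for an exactly equivariant network
```

In the actual method the two branches share weights and this consistency signal is what lets image-level labels produce localization maps.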
Affiliation(s)
- Gavin D’Souza
- Department of Instrumentation and Control Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104 India
- N. V. Subba Reddy
- Department of Information Technology, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal, Karnataka 560064 India
- K. N. Manjunath
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104 India
47
Zhang J, Liu Y, Wu Q, Wang Y, Liu Y, Xu X, Song B. SWTRU: Star-shaped Window Transformer Reinforced U-Net for medical image segmentation. Comput Biol Med 2022; 150:105954. [PMID: 36122443 DOI: 10.1016/j.compbiomed.2022.105954] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2022] [Revised: 07/19/2022] [Accepted: 08/06/2022] [Indexed: 11/03/2022]
Abstract
In the last decade, deep neural networks have been widely applied to medical image segmentation, achieving good results in computer-aided diagnosis tasks. However, segmenting highly complex, low-contrast images of organs and tissues with high accuracy remains a great challenge. To better address it, this paper proposes a novel model, SWTRU (Star-shaped Window Transformer Reinforced U-Net), which combines the U-Net, which performs well in the image segmentation field, with the Transformer, which has a powerful ability to capture global context. Unlike previous methods that import the Transformer into U-Net, SWTRU introduces an improved star-shaped window Transformer into the decoder to enhance the decision-making capability of the whole method. SWTRU uses a redesigned multi-scale skip-connection scheme, which retains the inductive bias of the original FCN structure for images while obtaining fine-grained features and coarse-grained semantic information. Our method also presents the FFIM (Filtering Feature Integration Mechanism) to integrate the fused multi-layer features and reduce their dimensionality, which lowers the computation. SWTRU yields 0.972 DICE on CHLISC for liver and tumor segmentation, 0.897 DICE on LGG for glioma segmentation, and 0.904 DICE on ISIC2018 for skin disease segmentation, achieving substantial improvements over the current state of the art across 9 different medical image segmentation methods. Because SWTRU can combine feature maps from different scales, high-level semantics, and global contextual relationships, the architecture is effective for medical image segmentation, and the experimental findings confirm its superior performance on these tasks.
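The DICE scores reported above are the standard overlap metric for segmentation masks, 2|A∩B| / (|A| + |B|). A minimal sketch of how it is computed on binary masks (the toy masks are illustrative, not the paper's data):

```python
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks.

    eps avoids division by zero when both masks are empty.
    """
    inter = np.logical_and(pred, target).sum()
    return float((2 * inter + eps) / (pred.sum() + target.sum() + eps))

a = np.array([[1, 1, 0], [0, 1, 0]])  # predicted lesion mask
b = np.array([[1, 0, 0], [0, 1, 1]])  # ground-truth mask
print(round(dice(a, b), 3))  # 0.667
```

A score of 1.0 means perfect overlap; the 0.9+ values reported for SWTRU indicate near-complete agreement with the ground-truth masks.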
Affiliation(s)
- Jianyi Zhang
- Qingdao University of Science and Technology, China
- Yong Liu
- Qingdao University of Science and Technology, China
- Qihang Wu
- Qingdao University of Science and Technology, China
- Yongpan Wang
- Qingdao University of Science and Technology, China
- Yuhai Liu
- Dawning International Information Industry Co., Ltd, China; Sugon Nanjing Institute, Co., Ltd, China
- Xianchong Xu
- Qingdao University of Science and Technology, China
- Bo Song
- Qingdao University of Science and Technology, China
48
Zhang ZM, Zhao JP, Wei PJ, Zheng CH. iPromoter-CLA: Identifying promoters and their strength by deep capsule networks with bidirectional long short-term memory. Comput Methods Programs Biomed 2022; 226:107087. [PMID: 36099675 DOI: 10.1016/j.cmpb.2022.107087] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/24/2022] [Revised: 05/14/2022] [Accepted: 08/23/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND AND OBJECTIVE The promoter is a fragment of DNA, a specific sequence with a transcriptional regulation function. Promoters are located upstream of the transcription start site and initiate downstream gene expression. So far, promoter identification has mainly been achieved by biological methods, which often require considerable effort; computational methods have become a more effective way to classify promoters and predict their type. METHODS In this study, we propose a new hybrid model of a capsule network and a recurrent neural network to identify promoters and predict their strength. First, we one-hot encode the DNA sequence. Second, we use three one-dimensional convolutional layers, a one-dimensional convolutional capsule layer, and a digit capsule layer to learn local features. Third, a bidirectional long short-term memory network extracts global features. Finally, we adopt a self-attention mechanism to increase the contribution of the relatively important features, further enhancing the performance of the model. RESULTS Our model attains cross-validation accuracies of 86% and 73.46% in prokaryotic promoter recognition and strength prediction, respectively, outperforming existing approaches in both the first-layer promoter identification and the second-layer promoter strength prediction. CONCLUSIONS Our model not only combines a convolutional neural network with a capsule layer but also uses a self-attention mechanism to better capture hidden features from the sequence perspective. We therefore hope that our model can be widely applied to other genomic elements.
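The first step described in METHODS, one-hot encoding of the DNA sequence, is simple to sketch; the A/C/G/T channel ordering below is an assumption, as the paper's exact encoding layout is not given in the abstract:

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """One-hot encode a DNA sequence into a (len(seq), 4) matrix,
    the kind of input fed to 1-D convolutional layers."""
    idx = {b: i for i, b in enumerate(BASES)}
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        out[pos, idx[base]] = 1.0
    return out

enc = one_hot("ACGTA")
print(enc.shape)        # (5, 4)
print(enc[0].tolist())  # [1.0, 0.0, 0.0, 0.0]
```

Each row has exactly one active channel, so the downstream convolutions see position-wise base identity without imposing any artificial ordering on the four nucleotides.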
Affiliation(s)
- Zhi-Min Zhang
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
- Jian-Ping Zhao
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
- Pi-Jing Wei
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, China
- Chun-Hou Zheng
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China; School of Artificial Intelligence, Anhui University, Hefei, China
49
Lee JRH, Pavlova M, Famouri M, Wong A. Cancer-Net SCa: tailored deep neural network designs for detection of skin cancer from dermoscopy images. BMC Med Imaging 2022; 22:143. [PMID: 35945505 PMCID: PMC9364616 DOI: 10.1186/s12880-022-00871-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2021] [Accepted: 07/26/2022] [Indexed: 11/25/2022] Open
Abstract
Background Skin cancer continues to be the most frequently diagnosed form of cancer in the U.S., with not only significant effects on health and well-being but also significant economic costs associated with treatment. A crucial step to the treatment and management of skin cancer is effective early detection with key screening approaches such as dermoscopy examinations, leading to stronger recovery prognoses. Motivated by the advances of deep learning and inspired by the open source initiatives in the research community, in this study we introduce Cancer-Net SCa, a suite of deep neural network designs tailored for the detection of skin cancer from dermoscopy images that is open source and available to the general public. To the best of the authors’ knowledge, Cancer-Net SCa comprises the first machine-driven design of deep neural network architectures tailored specifically for skin cancer detection, one of which leverages attention condensers for an efficient self-attention design. Results We investigate and audit the behaviour of Cancer-Net SCa in a responsible and transparent manner through explainability-driven performance validation. All the proposed designs achieved improved accuracy when compared to the ResNet-50 architecture while also achieving significantly reduced architectural and computational complexity. In addition, when evaluating the decision making process of the networks, it can be seen that diagnostically relevant critical factors are leveraged rather than irrelevant visual indicators and imaging artifacts. Conclusion The proposed Cancer-Net SCa designs achieve strong skin cancer detection performance on the International Skin Imaging Collaboration (ISIC) dataset, while providing a strong balance between computation and architectural efficiency and accuracy. 
While Cancer-Net SCa is not a production-ready screening solution, the hope is that the release of Cancer-Net SCa in open source, open access form will encourage researchers, clinicians, and citizen data scientists alike to leverage and build upon them.
Affiliation(s)
- James Ren Hou Lee
- Vision and Image Processing Research Group, University of Waterloo, Waterloo, Canada
- Maya Pavlova
- Vision and Image Processing Research Group, University of Waterloo, Waterloo, Canada; DarwinAI Corp, Waterloo, Canada
- Alexander Wong
- Vision and Image Processing Research Group, University of Waterloo, Waterloo, Canada; Waterloo Artificial Intelligence Institute, University of Waterloo, Waterloo, Canada; DarwinAI Corp, Waterloo, Canada
50
Tanzi L, Audisio A, Cirrincione G, Aprato A, Vezzetti E. Vision Transformer for femur fracture classification. Injury 2022; 53:2625-2634. [PMID: 35469638 DOI: 10.1016/j.injury.2022.04.013] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/19/2022] [Revised: 04/01/2022] [Accepted: 04/15/2022] [Indexed: 02/02/2023]
Abstract
INTRODUCTION In recent years, the scientific community has focused on developing Computer-Aided Diagnosis (CAD) tools that could improve clinicians' diagnosis of bone fractures, primarily based on Convolutional Neural Networks (CNNs). However, the accuracy in discerning fracture subtypes was far from optimal. The aims of the study were 1) to evaluate a new CAD system based on Vision Transformers (ViT), a recent and powerful deep learning technique, and 2) to assess whether clinicians' diagnostic accuracy could be improved using this system. MATERIALS AND METHODS 4207 manually annotated images were used, distributed into different fracture types following the AO/OTA classification. The ViT architecture was compared with a classic CNN and a multistage architecture composed of successive CNNs. To demonstrate the reliability of this approach, (1) attention maps were used to visualize the most relevant areas of the images, (2) the performance of a generic CNN and the ViT was compared through unsupervised learning techniques, and (3) 11 clinicians were asked to evaluate and classify 150 proximal femur fracture images with and without the help of the ViT, and the results were compared for potential improvement. RESULTS The ViT correctly predicted 83% of the test images. Precision, recall, and F1-score were 0.77 (CI 0.64-0.90), 0.76 (CI 0.62-0.91), and 0.77 (CI 0.64-0.89), respectively. The clinicians' diagnostic improvement was 29% (accuracy 97%; p = 0.003) when supported by the ViT's predictions, outperforming the algorithm alone. CONCLUSIONS This paper showed the potential of Vision Transformers in bone fracture classification. For the first time, good results were obtained in sub-fracture classification, outperforming the state of the art. Accordingly, the assisted diagnosis yielded the best results, proving the effectiveness of collaborative work between neural networks and clinicians.
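A ViT pipeline like the one evaluated here starts by cutting each radiograph into fixed-size patches that become the Transformer's input tokens. A numpy sketch of that patchification step (image and patch sizes are illustrative assumptions, not the study's configuration):

```python
import numpy as np

def to_patches(img: np.ndarray, p: int) -> np.ndarray:
    """Split an (H, W) grayscale image into flattened p x p patches,
    returning (n_patches, p*p) token vectors for a ViT encoder."""
    H, W = img.shape
    assert H % p == 0 and W % p == 0, "image must tile evenly"
    # Carve into a (rows, cols) grid of p x p tiles, then flatten each.
    patches = img.reshape(H // p, p, W // p, p).swapaxes(1, 2)
    return patches.reshape(-1, p * p)

xray = np.arange(64, dtype=float).reshape(8, 8)  # toy 8x8 "radiograph"
tokens = to_patches(xray, p=4)
print(tokens.shape)  # (4, 16)
```

Each token is then linearly projected, given a positional embedding, and fed through self-attention layers; the attention maps mentioned in the abstract are read off those layers to highlight the image regions driving the classification.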
Affiliation(s)
- Leonardo Tanzi
- DIGEP, Polytechnic University of Turin, Corso Duca degli Abruzzi 24, Torino 10129, Italy
- Andrea Audisio
- School of Medicine, University of Turin, Torino 10133, Italy
- Enrico Vezzetti
- DIGEP, Polytechnic University of Turin, Corso Duca degli Abruzzi 24, Torino 10129, Italy