1. Dong C, Du G. An enhanced real-time human pose estimation method based on modified YOLOv8 framework. Sci Rep 2024;14:8012. PMID: 38580704; PMCID: PMC10997650; DOI: 10.1038/s41598-024-58146-z.
Abstract
Deep-learning-based human pose estimation (HPE) aims to accurately estimate human body posture in images or videos using deep neural networks. However, the accuracy of real-time HPE remains limited by factors such as partial occlusion of body parts and the restricted receptive field of the model. To mitigate the accuracy loss caused by these issues, this paper proposes a real-time HPE model called CCAM-Person, based on the YOLOv8 framework. Specifically, we first improve the backbone and neck of the YOLOv8x-pose real-time HPE model to alleviate feature loss and receptive-field constraints. Second, we introduce the context coordinate attention module (CCAM), which sharpens the model's focus on salient features, reduces background-noise interference, mitigates keypoint regression failures caused by limb occlusion, and improves pose-estimation accuracy. Our approach attains competitive results on multiple metrics of two open-source datasets, MS COCO 2017 and CrowdPose. Compared with the baseline YOLOv8x-pose, CCAM-Person improves average precision by 2.8% and 3.5% on the two datasets, respectively.
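Coordinate attention of the kind CCAM builds on factorizes global pooling into two directional pools, producing separate height-wise and width-wise gates. The following NumPy sketch is our rough illustration (not the authors' implementation; the weight matrices are random stand-ins for learned parameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w_reduce, w_h, w_w):
    """Minimal coordinate-attention sketch for one feature map.

    x: (C, H, W) feature map.
    w_reduce: (C, Cr) shared channel-reduction weights.
    w_h, w_w: (Cr, C) expansion weights for the two directions.
    """
    C, H, W = x.shape
    pool_h = x.mean(axis=2)            # (C, H): pool along width
    pool_w = x.mean(axis=1)            # (C, W): pool along height
    # shared transform on the concatenated positional descriptors
    z = np.concatenate([pool_h, pool_w], axis=1).T @ w_reduce  # (H+W, Cr)
    z = np.maximum(z, 0.0)             # ReLU
    a_h = sigmoid(z[:H] @ w_h).T       # (C, H) gate along height
    a_w = sigmoid(z[H:] @ w_w).T       # (C, W) gate along width
    return x * a_h[:, :, None] * a_w[:, None, :]

rng = np.random.default_rng(0)
C, H, W, Cr = 4, 6, 5, 2
x = rng.standard_normal((C, H, W))
y = coordinate_attention(x,
                         rng.standard_normal((C, Cr)),
                         rng.standard_normal((Cr, C)),
                         rng.standard_normal((Cr, C)))
```

Because both gates lie in (0, 1), the output is an attenuated copy of the input that preserves positional information along each axis.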
Affiliation(s)
- Chengang Dong
- Nanjing University of Aeronautics and Astronautics, Nanjing, 210000, Jiangsu, China
- Guodong Du
- Nanjing University of Aeronautics and Astronautics, Nanjing, 210000, Jiangsu, China.
2. Hussein R, Shin D, Zhao MY, Guo J, Davidzon G, Steinberg G, Moseley M, Zaharchuk G. Turning brain MRI into diagnostic PET: 15O-water PET CBF synthesis from multi-contrast MRI via attention-based encoder-decoder networks. Med Image Anal 2024;93:103072. PMID: 38176356; PMCID: PMC10922206; DOI: 10.1016/j.media.2023.103072.
Abstract
Accurate quantification of cerebral blood flow (CBF) is essential for the diagnosis and assessment of a wide range of neurological diseases. Positron emission tomography (PET) with radiolabeled water (15O-water) is the gold standard for measuring CBF in humans; however, it is not widely available due to its prohibitive costs and the use of short-lived radiopharmaceutical tracers that require onsite cyclotron production. Magnetic resonance imaging (MRI), in contrast, is more accessible and does not involve ionizing radiation. This study presents a convolutional encoder-decoder network with attention mechanisms to predict gold-standard 15O-water PET CBF from multi-contrast MRI scans, thus eliminating the need for radioactive tracers. The model was trained and validated using 5-fold cross-validation in a group of 126 subjects consisting of healthy controls and cerebrovascular disease patients, all of whom underwent simultaneous 15O-water PET/MRI. The results demonstrate that the model can synthesize high-quality PET CBF measurements (with an average SSIM of 0.924 and PSNR of 38.8 dB) and is more accurate than concurrent and previous PET synthesis methods. We also demonstrate the clinical significance of the proposed algorithm by evaluating agreement in identifying vascular territories with impaired CBF. Such methods may enable more widespread and accurate CBF evaluation in larger cohorts that cannot undergo PET imaging due to radiation concerns, lack of access, or logistical challenges.
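The reported 38.8 dB figure follows the standard peak signal-to-noise ratio definition; a minimal illustrative helper (ours, not the authors' evaluation code, with toy inputs):

```python
import numpy as np

def psnr(reference, prediction, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((reference - prediction) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

# a uniform +0.01 error on a unit-range image gives MSE = 1e-4 -> 40 dB
ref = np.linspace(0.0, 1.0, 16).reshape(4, 4)
pred = ref + 0.01
value = psnr(ref, pred)
```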
Affiliation(s)
- Ramy Hussein
- Radiological Sciences Laboratory, Department of Radiology, Stanford University, Stanford, CA 94305, USA.
- David Shin
- Global MR Applications & Workflow, GE Healthcare, Menlo Park, CA 94025, USA
- Moss Y Zhao
- Radiological Sciences Laboratory, Department of Radiology, Stanford University, Stanford, CA 94305, USA; Stanford Cardiovascular Institute, Stanford University, Stanford, CA 94305, USA
- Jia Guo
- Department of Bioengineering, University of California, Riverside, CA 92521, USA
- Guido Davidzon
- Division of Nuclear Medicine, Department of Radiology, Stanford University, Stanford, CA 94305, USA
- Gary Steinberg
- Department of Neurosurgery, Stanford University, Stanford, CA 94304, USA
- Michael Moseley
- Radiological Sciences Laboratory, Department of Radiology, Stanford University, Stanford, CA 94305, USA
- Greg Zaharchuk
- Radiological Sciences Laboratory, Department of Radiology, Stanford University, Stanford, CA 94305, USA
3. Wu Y, Li J, Wang X, Zhang Z, Zhao S. DECIDE: A decoupled semantic and boundary learning network for precise osteosarcoma segmentation by integrating multi-modality MRI. Comput Biol Med 2024;174:108308. PMID: 38581998; DOI: 10.1016/j.compbiomed.2024.108308.
Abstract
Automated Osteosarcoma Segmentation in Multi-modality MRI (AOSMM) holds clinical significance for effective tumor evaluation and treatment planning. However, the precision of AOSMM is challenged by the diverse characteristics of multi-modality MRI and the inherent heterogeneity and boundary ambiguity of osteosarcoma. While numerous methods have made significant strides in automated osteosarcoma segmentation, most have focused on a single MRI modality and overlooked the complementary information available from other modalities. Furthermore, they do not adequately model the long-range dependencies of complex tumor features, which can lead to insufficiently discriminative feature representations. To this end, we propose a decoupled semantic and boundary learning network (DECIDE) that achieves precise AOSMM with three functional modules. The Multi-modality Feature Fusion and Recalibration (MFR) module adaptively fuses and recalibrates multi-modality features by exploiting their channel-wise dependencies to compute low-rank attention weights that aggregate useful information from different MRI modalities, promoting complementary learning across modalities and enabling a more comprehensive tumor characterization. The Lesion Attention Enhancement (LAE) module employs spatial and channel attention mechanisms to capture global contextual dependencies over local features, significantly enhancing the discriminability and representational capacity of intricate tumor features. The Boundary Context Aggregation (BCA) module further enhances semantic representations by utilizing boundary information for effective context aggregation while ensuring intra-class consistency in cases of boundary ambiguity. Extensive experiments demonstrate that DECIDE achieves exceptional performance in osteosarcoma segmentation, surpassing state-of-the-art methods in accuracy and stability.
Affiliation(s)
- Yinhao Wu
- Department of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 518107, China
- Jianqi Li
- The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510080, China
- Xinxin Wang
- Department of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 518107, China
- Zhaohui Zhang
- The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510080, China.
- Shen Zhao
- Department of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 518107, China.
4. Jothi Prakash V, Arul Antran Vijay S, Ganesh Kumar P, Karthikeyan NK. A novel attention-based cross-modal transfer learning framework for predicting cardiovascular disease. Comput Biol Med 2024;170:107977. PMID: 38217974; DOI: 10.1016/j.compbiomed.2024.107977.
Abstract
Cardiovascular disease (CVD) remains a leading cause of death globally, presenting significant challenges in early detection and treatment. The complexity of CVD arises from its multifaceted nature, influenced by a combination of genetic, environmental, and lifestyle factors. Traditional diagnostic approaches often struggle to effectively integrate and interpret the heterogeneous data associated with CVD. Addressing this challenge, we introduce a novel Attention-Based Cross-Modal (ABCM) transfer learning framework. This framework innovatively merges diverse data types, including clinical records, medical imagery, and genetic information, through an attention-driven mechanism. This mechanism adeptly identifies and focuses on the most pertinent attributes from each data source, thereby enhancing the model's ability to discern intricate interrelationships among various data types. Our extensive testing and validation demonstrate that the ABCM framework significantly surpasses traditional single-source models and other advanced multi-source methods in predicting CVD. Specifically, our approach achieves an accuracy of 93.5%, precision of 92.0%, recall of 94.5%, and an impressive area under the curve (AUC) of 97.2%. These results not only underscore the superior predictive capability of our model but also highlight its potential in offering more accurate and early detection of CVD. The integration of cross-modal data through attention-based mechanisms provides a deeper understanding of the disease, paving the way for more informed clinical decision-making and personalized patient care.
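The attention-driven fusion described above can be sketched generically as one modality's tokens attending over another's. The following scaled dot-product illustration is our simplification with made-up dimensions, not the ABCM implementation:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, context_feats, d_k):
    """Query tokens from one modality (e.g. clinical records) attend
    over tokens of another (e.g. imaging embeddings)."""
    scores = query_feats @ context_feats.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)       # each query row sums to 1
    return weights @ context_feats, weights

rng = np.random.default_rng(1)
d = 8
clinical = rng.standard_normal((3, d))   # 3 clinical-feature tokens
imaging = rng.standard_normal((5, d))    # 5 imaging-feature tokens
fused, w = cross_modal_attention(clinical, imaging, d)
```

Each clinical token receives a convex combination of imaging tokens, which is the sense in which attention "focuses on the most pertinent attributes from each data source."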
Affiliation(s)
- Jothi Prakash V
- Karpagam College of Engineering, Myleripalayam Village, Coimbatore, 641032, Tamil Nadu, India.
- Arul Antran Vijay S
- Karpagam College of Engineering, Myleripalayam Village, Coimbatore, 641032, Tamil Nadu, India.
- Ganesh Kumar P
- College of Engineering, Guindy, Anna University, Chennai, 600025, Tamil Nadu, India.
- Karthikeyan N K
- Coimbatore Institute of Technology, Peelamedu, Coimbatore, 641014, Tamil Nadu, India.
5. Lu P, Zhang W, Wu J. AMPCDA: Prediction of circRNA-disease associations by utilizing attention mechanisms on metapaths. Comput Biol Chem 2024;108:107989. PMID: 38016366; DOI: 10.1016/j.compbiolchem.2023.107989.
Abstract
A growing body of experimental evidence in the biomedical field has revealed widespread associations between circRNAs and human diseases. These associations offer a new perspective for elucidating disease etiology and devising innovative therapeutic strategies. In recent years, many computational methods have been introduced to overcome the inefficiency and high cost of conventional laboratory approaches for identifying candidate circRNA-disease associations, but most existing methods still struggle to effectively integrate node embeddings with higher-order neighborhood representations, which can keep predictive accuracy below optimal levels. To overcome these constraints, we propose AMPCDA, a computational technique that harnesses predefined metapaths to predict circRNA-disease associations. Specifically, an association graph is first built from three source databases and two similarity-derivation procedures, and DeepWalk is applied to the graph to obtain initial feature representations. Vector embeddings of metapath instances, concatenated with the initial node features, are then fed through a custom encoder. A self-attention module accumulates metapath-specific contributions to each node, which are combined with the node's intrinsic features and passed to a graph attention module; the resulting representations feed a multilayer perceptron that predicts the final association probability scores. By integrating graph-topology features with the node embeddings themselves, AMPCDA effectively leverages information carried by multiple nodes along paths and exhibits strong predictive performance, achieving AUC values of 0.9623, 0.9675, and 0.9711 under 5-fold cross-validation, 10-fold cross-validation, and leave-one-out cross-validation, respectively. These results represent substantial accuracy improvements over other prediction models. Case studies confirm the method's high accuracy in identifying circRNA-disease connections, highlighting its value in guiding future biological research to reveal new disease mechanisms.
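The metapath-level attention step can be illustrated schematically: score each metapath-instance embedding, normalize with softmax, and concatenate the weighted mix with the node's own features. This toy sketch is our simplification, not the AMPCDA code:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def metapath_attention_pool(node_feat, path_embeddings, attn_vec):
    """Score each metapath instance against a learned attention vector
    and fold the weighted sum back into the node's own features."""
    scores = path_embeddings @ attn_vec          # one score per instance
    weights = softmax(scores)
    context = weights @ path_embeddings          # attention-weighted mix
    return np.concatenate([node_feat, context]), weights

rng = np.random.default_rng(3)
node = rng.standard_normal(4)
paths = rng.standard_normal((5, 4))   # 5 metapath-instance embeddings
attn = rng.standard_normal(4)
enriched, alpha = metapath_attention_pool(node, paths, attn)
```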
Affiliation(s)
- Pengli Lu
- School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China.
- Wenqi Zhang
- School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China.
- Jinkai Wu
- School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China.
6. Wang Z, Yang P, Hu L, Zhang B, Lin C, Lv W, Wang Q. SLAPP: Subgraph-level attention-based performance prediction for deep learning models. Neural Netw 2024;170:285-297. PMID: 38000312; DOI: 10.1016/j.neunet.2023.11.043.
Abstract
The intricacy of the Deep Learning (DL) landscape, brimming with a variety of models, applications, and platforms, poses considerable challenges for the optimal design, optimization, or selection of suitable DL models. One promising avenue for addressing this challenge is the development of accurate performance prediction methods. However, existing methods have critical limitations. Operator-level methods, proficient at predicting the performance of individual operators, often neglect broader graph features, resulting in inaccurate full-network performance predictions. Conversely, graph-level methods excel at overall network prediction by leveraging these graph features but cannot predict the performance of individual operators. To bridge these gaps, we propose SLAPP, a novel subgraph-level performance prediction method. Central to SLAPP is an innovative variant of Graph Neural Networks (GNNs) that we developed, named the Edge Aware Graph Attention Network (EAGAT), which enables superior encoding of both node and edge features. Through this approach, SLAPP effectively captures both graph and operator features, providing precise performance predictions for individual operators and entire networks. Moreover, we introduce a mixed loss design with dynamic weight adjustment to reconcile predictive accuracy between individual operators and entire networks. In our experimental evaluation, SLAPP consistently outperforms traditional approaches in prediction accuracy, including on unseen models. Compared with existing research, our method demonstrates superior predictive performance across multiple DL models.
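Edge-aware attention of the kind EAGAT names can be sketched by letting the attention score depend on source, neighbor, and edge features together. The following single-node illustration uses made-up dimensions and is a generic sketch, not the paper's EAGAT:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def edge_aware_attention(h_i, neighbors, edge_feats, a):
    """Score each neighbor by concatenating source, neighbor, and edge
    features, then aggregate neighbors with softmax-normalized weights."""
    scores = np.array([
        np.concatenate([h_i, h_j, e_ij]) @ a
        for h_j, e_ij in zip(neighbors, edge_feats)
    ])
    alpha = softmax(scores)
    return sum(w * h_j for w, h_j in zip(alpha, neighbors))

rng = np.random.default_rng(4)
h_i = rng.standard_normal(4)                      # source-node features
nbrs = [rng.standard_normal(4) for _ in range(3)]  # neighbor features
edges = [rng.standard_normal(2) for _ in range(3)]  # edge features
a = rng.standard_normal(10)   # matches 4 + 4 + 2 concatenated dims
agg = edge_aware_attention(h_i, nbrs, edges, a)
```

Because the edge features enter the score, two neighbors with identical node features can still receive different weights, which is the point of encoding edges.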
Affiliation(s)
- Zhenyi Wang
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, China; The Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, Xi'an, 710071, China.
- Pengfei Yang
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, China; The Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, Xi'an, 710071, China.
- Linwei Hu
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, China; The Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, Xi'an, 710071, China.
- Bowen Zhang
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, China; The Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, Xi'an, 710071, China.
- Chengmin Lin
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, China; The Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, Xi'an, 710071, China.
- Wenkai Lv
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, China; The Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, Xi'an, 710071, China.
- Quan Wang
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, China; The Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, Xi'an, 710071, China.
7. Luo H, Wei J, Wang Y, Chen J, Li W. An improved lightweight object detection algorithm for YOLOv5. PeerJ Comput Sci 2024;10:e1830. PMID: 38435620; PMCID: PMC10909222; DOI: 10.7717/peerj-cs.1830.
Abstract
Object detection based on deep learning has made great progress in the past decade and has been widely used in many areas of daily life. Model lightweighting is central to deploying object detection models on mobile or edge devices: lightweight models have fewer parameters and lower computational costs, but often at the price of lower detection accuracy. Based on YOLOv5s, this article proposes an improved lightweight object detection model that achieves higher detection accuracy with fewer parameters. First, exploiting the lightweight nature of the Ghost module, we integrated it into the C3 structure and replaced some of the C3 modules after the upsample layer in the neck network, reducing the number of model parameters and speeding up inference. Second, the coordinate attention (CA) mechanism was added to the neck to enhance the model's attention to relevant information and improve detection accuracy. Finally, a more efficient Simplified Spatial Pyramid Pooling-Fast (SimSPPF) module was designed to enhance model stability and shorten training time. To verify the effectiveness of the improved model, experiments were conducted on three datasets with different characteristics. The results show that the number of parameters is reduced by 28% compared with the original model, while mean average precision (mAP) increases by 3.1%, 1.1%, and 1.8% on the three datasets, respectively. The model also outperforms existing lightweight state-of-the-art models in accuracy: on the three datasets, it achieves mAP of 87.2%, 77.8%, and 92.3%, better than YOLOv7-tiny (81.4%, 77.7%, 90.3%), YOLOv8n (84.7%, 77.7%, 90.6%), and other advanced models. By increasing mAP while reducing the parameter count, the improved model provides a useful reference for deploying detection models on mobile or edge devices.
Affiliation(s)
- Hao Luo
- College of Information Engineering, Sichuan Agricultural University, Ya’an, Sichuan, China
- Jiangshu Wei
- College of Information Engineering, Sichuan Agricultural University, Ya’an, Sichuan, China
- Yuchao Wang
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya’an, Sichuan, China
- Jinrong Chen
- College of Information Engineering, Sichuan Agricultural University, Ya’an, Sichuan, China
- Wujie Li
- College of Information Engineering, Sichuan Agricultural University, Ya’an, Sichuan, China
8. Zhao Z, Zhu J, Jiao P, Wang J, Zhang X, Lu X, Zhang Y. Hybrid-FHR: a multi-modal AI approach for automated fetal acidosis diagnosis. BMC Med Inform Decis Mak 2024;24:19. PMID: 38247009; PMCID: PMC10801938; DOI: 10.1186/s12911-024-02423-4.
Abstract
BACKGROUND In clinical medicine, fetal heart rate (FHR) monitoring using cardiotocography (CTG) is one of the most commonly used methods for assessing fetal acidosis. However, because the visual interpretation of CTG depends on the subjective judgment of the clinician, it suffers from high inter-observer and intra-observer variability, making automated diagnostic techniques necessary. METHODS In this study, we propose a computer-aided diagnostic algorithm (Hybrid-FHR) for fetal acidosis to assist physicians in making objective decisions and taking timely interventions. Hybrid-FHR uses multi-modal features, including one-dimensional FHR signals and three types of expert features designed based on prior knowledge (morphological time domain, frequency domain, and nonlinear). To extract the spatiotemporal feature representation of one-dimensional FHR signals, we designed a multi-scale squeeze-and-excitation temporal convolutional network (SE-TCN) backbone based on dilated causal convolution, which can effectively capture the long-term dependence of FHR signals by expanding the receptive field of each layer's convolution kernel while maintaining a relatively small parameter count. In addition, we proposed a cross-modal feature fusion (CMFF) method that uses multi-head attention mechanisms to explore the relationships between different modalities, obtaining more informative feature representations and improving diagnostic accuracy. RESULTS Our ablation experiments show that Hybrid-FHR outperforms previous methods, with average accuracy, specificity, sensitivity, precision, and F1 score of 96.8%, 97.5%, 96.0%, 97.5%, and 96.7%, respectively. CONCLUSIONS Our algorithm enables automated CTG analysis, assisting healthcare professionals in the early identification of fetal acidosis and the prompt implementation of interventions.
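The dilated causal convolution underlying SE-TCN can be illustrated in a few lines. This toy sketch (not the authors' model) shows how output t sees only inputs at t, t-d, t-2d, ..., so stacking layers with doubling dilations grows the receptive field exponentially:

```python
import numpy as np

def dilated_causal_conv1d(x, kernel, dilation):
    """Causal 1-D convolution: y[t] depends only on x[t], x[t-d], ..."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])   # left-pad to stay causal
    return np.array([
        sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

# With kernel size k over L layers of dilations 1, 2, 4, ..., the
# receptive field is 1 + (k - 1) * (2**L - 1) with no extra parameters.
x = np.arange(8, dtype=float)
y = dilated_causal_conv1d(x, kernel=np.array([0.5, 0.5]), dilation=2)
```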
Affiliation(s)
- Zhidong Zhao
- School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China.
- Jiawei Zhu
- College of Electronics and Information Engineering, Hangzhou Dianzi University, Hangzhou, China
- Pengfei Jiao
- School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China
- Jinpeng Wang
- School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China
- Xiaohong Zhang
- College of Electronics and Information Engineering, Hangzhou Dianzi University, Hangzhou, China
- Xinmiao Lu
- College of Electronics and Information Engineering, Hangzhou Dianzi University, Hangzhou, China
- Yefei Zhang
- School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China
9. Jia J, Lv P, Wei X, Qiu W. SNO-DCA: A model for predicting S-nitrosylation sites based on densely connected convolutional networks and attention mechanism. Heliyon 2024;10:e23187. PMID: 38148797; PMCID: PMC10750070; DOI: 10.1016/j.heliyon.2023.e23187.
Abstract
Protein S-nitrosylation is a reversible redox post-translational modification that is widespread in living organisms. S-nitrosylation can regulate protein function and is closely associated with a variety of diseases; thus, identifying S-nitrosylation sites is crucial for revealing protein function and for related drug discovery. Traditional experimental methods are time-consuming and expensive, so it is necessary to explore more efficient computational methods. Deep learning algorithms perform well in bioinformatics site prediction, and many studies show that they outperform existing machine learning algorithms. In this work, we propose a deep learning-based predictor, SNO-DCA, for distinguishing between S-nitrosylated and non-S-nitrosylated sequences. First, one-hot encoding of protein sequences is performed. Second, dense convolutional blocks capture feature information, and an attention module weighs different features to improve the model's predictive ability. Ten-fold cross-validation and independent testing show that SNO-DCA outperforms existing S-nitrosylation site prediction models under imbalanced data. A web server providing online prediction is available at https://sno.cangmang.xyz/SNO-DCA/, and the code is available at https://github.com/peanono/SNO-DCA.
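The one-hot encoding step is straightforward to sketch; the 20-letter residue alphabet and the all-zero row for unknown residues below are our illustrative choices, not necessarily the paper's exact scheme:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues

def one_hot_window(seq):
    """Encode a residue window as a (len, 20) binary matrix; unknown
    residues (e.g. 'X' padding) stay all-zero."""
    idx = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    m = np.zeros((len(seq), len(AMINO_ACIDS)))
    for row, aa in enumerate(seq):
        if aa in idx:
            m[row, idx[aa]] = 1.0
    return m

window = one_hot_window("MKCLLX")   # a toy 6-residue window
```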
Affiliation(s)
- Jianhua Jia
- Computer Department, Jingdezhen Ceramic University, Jingdezhen, 330403, China
- Peinuo Lv
- Computer Department, Jingdezhen Ceramic University, Jingdezhen, 330403, China
- Xin Wei
- Business School, Jiangxi Institute of Fashion Technology, Nanchang, 330201, China
- Wangren Qiu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen, 330403, China
10. Huang H, Chen P, Wen J, Lu X, Zhang N. Multiband seizure type classification based on 3D convolution with attention mechanisms. Comput Biol Med 2023;166:107517. PMID: 37778214; DOI: 10.1016/j.compbiomed.2023.107517.
Abstract
Electroencephalogram (EEG) signals contain important information about abnormal brain activity and have become an important basis for epilepsy diagnosis. Recent epilepsy EEG classification methods mainly extract features from a single domain, which cannot effectively exploit the spatial information in EEG signals. In addition, redundant information in EEG signals increasingly contaminates the learned features as convolution layers and multi-domain features accumulate, resulting in inefficient learning and weakly discriminative features. To tackle these issues, we propose 3D-CBAMNet, an end-to-end 3D convolutional multiband seizure-type classification model based on attention mechanisms. Specifically, a multilevel wavelet decomposition is applied to preprocessed EEG data to obtain joint time-frequency distribution information across multiple frequency bands. This information is then transformed into three-dimensional spatial data according to the electrode configuration. Discriminative joint activity features in the time, frequency, and spatial domains are extracted by a series of parallel 3D convolutional sub-networks, in which 3D channel and spatial attention mechanisms improve the learning of critical global and local information. A multilayer perceptron finally integrates the extracted features and maps them to the classification results. Experimental results on the TUSZ dataset, the world's largest publicly available seizure corpus, show that 3D-CBAMNet significantly outperforms state-of-the-art methods on the seizure-type classification task.
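The channel and spatial attention referenced here (CBAM-style) can be sketched minimally: gate channels by their pooled global response, then gate positions by the channel-averaged response. The pooling-plus-sigmoid gating below is a simplified 2-D illustration, not the paper's exact 3D module:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x):
    """Weight each channel by its global average response."""
    w = sigmoid(x.mean(axis=(1, 2)))          # (C,)
    return x * w[:, None, None]

def spatial_attention(x):
    """Weight each position by the channel-averaged response."""
    w = sigmoid(x.mean(axis=0))               # (H, W)
    return x * w[None, :, :]

rng = np.random.default_rng(2)
x = rng.standard_normal((3, 4, 4))            # toy (C, H, W) feature map
y = spatial_attention(channel_attention(x))   # sequential gating
```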
Affiliation(s)
- Hui Huang
- College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China.
- Peiyu Chen
- College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Jianfeng Wen
- College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Xuzhe Lu
- College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Nan Zhang
- College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
11. Qiu C, Huang Z, Lin C, Zhang G, Ying S. A despeckling method for ultrasound images utilizing content-aware prior and attention-driven techniques. Comput Biol Med 2023;166:107515. PMID: 37839221; DOI: 10.1016/j.compbiomed.2023.107515.
Abstract
The despeckling of ultrasound images enhances image quality and facilitates precise treatment of conditions such as tumors. However, existing methods for eliminating speckle noise can destroy image texture features, impairing clinical judgment; maintaining clear lesion boundaries while eliminating speckle noise is therefore a challenging task. This paper presents an innovative approach for denoising ultrasound images using a novel noise reduction network model called content-aware prior and attention-driven (CAPAD). The model employs a neural network to automatically capture the hidden prior features in ultrasound images to guide denoising, and embeds the denoiser in the optimization module to optimize parameters and noise simultaneously. The model also incorporates a content-aware attention module and a loss function that preserves the structural characteristics of the image, enhancing the network's capacity to capture and retain valuable information. Extensive qualitative evaluation and quantitative analysis on a comprehensive dataset provide compelling evidence of the model's superior denoising capability: it suppresses noise while preserving the underlying structures in the ultrasound images. Compared with other denoising algorithms, it improves PSNR by approximately 5.88% and SSIM by approximately 3.61%. Furthermore, using CAPAD as a preprocessing step for breast tumor segmentation in ultrasound images can greatly improve segmentation accuracy; experiments show a notable 10.43% improvement in AUPRC for breast tumor segmentation.
Affiliation(s)
- Chenghao Qiu, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610000, Sichuan, China.
- Zifan Huang, School of Electronics and Information Engineering, Guangdong Ocean University, Zhanjiang, 524088, China.
- Cong Lin, School of Electronics and Information Engineering, Guangdong Ocean University, Zhanjiang, 524088, China.
- Guodao Zhang, Department of Digital Media Technology, Hangzhou Dianzi University, Hangzhou, 310018, China.
- Shenpeng Ying, Department of Radiotherapy, Taizhou Central Hospital (Taizhou University Hospital), Taizhou, 318000, China.

12
Lu S, Liu M, Yin L, Yin Z, Liu X, Zheng W. The multi-modal fusion in visual question answering: a review of attention mechanisms. PeerJ Comput Sci 2023; 9:e1400. [PMID: 37346665 PMCID: PMC10280591 DOI: 10.7717/peerj-cs.1400] [Received: 08/04/2022] [Accepted: 04/25/2023] [Indexed: 06/23/2023]
Abstract
Visual Question Answering (VQA) is a significant cross-disciplinary problem in computer vision and natural language processing that requires a computer to output a natural language answer given a picture and a question posed about it. This requires the multimodal fusion of text features and visual features, and the key component that determines its success is the attention mechanism. Introducing attention mechanisms allows text features and image features to be integrated into a compact multi-modal representation. It is therefore necessary to clarify the development status of attention mechanisms, understand the most advanced attention-mechanism methods, and look ahead to future development directions. In this article, we first conduct a bibliometric analysis of the field with CiteSpace, from which we find, and reasonably speculate, that the attention mechanism has great development potential in cross-modal retrieval. Secondly, we discuss the classification and application of existing attention mechanisms in VQA tasks, analyze their shortcomings, and summarize current improvement methods. Finally, through the continued exploration of attention mechanisms, we believe that VQA will evolve in a smarter, more human direction.
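The fusion the review describes can be sketched minimally: score image regions against a pooled question vector, then concatenate the attention-weighted visual summary with the text feature. This is a generic single-glimpse scheme under assumed shapes, not any specific architecture surveyed in the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse(question, regions):
    # question: (d,) pooled text feature; regions: (R, d) image-region features.
    # Scaled dot-product scores decide how much each region contributes; the
    # weighted sum is concatenated with the text feature into one compact
    # multimodal vector.
    d = regions.shape[1]
    weights = softmax(regions @ question / np.sqrt(d))   # (R,) attention over regions
    visual = weights @ regions                           # (d,) weighted image summary
    return np.concatenate([question, visual])            # (2d,) joint representation

rng = np.random.default_rng(0)
joint = fuse(rng.standard_normal(16), rng.standard_normal((36, 16)))
```

A classifier head over `joint` would then predict the answer; co-attention variants additionally attend from image to question.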
Affiliation(s)
- Siyu Lu, School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan Province, China.
- Mingzhe Liu, School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China.
- Lirong Yin, Department of Geography and Anthropology, Louisiana State University, Baton Rouge, LA, United States of America.
- Zhengtong Yin, College of Resource and Environment Engineering, Guizhou University, Guiyang, China.
- Xuan Liu, School of Public Affairs and Administration, University of Electronic Science and Technology of China, Chengdu, China.
- Wenfeng Zheng, School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan Province, China.

13
Fan T, Qiu S, Wang Z, Zhao H, Jiang J, Wang Y, Xu J, Sun T, Jiang N. A new deep convolutional neural network incorporating attentional mechanisms for ECG emotion recognition. Comput Biol Med 2023; 159:106938. [PMID: 37119553 DOI: 10.1016/j.compbiomed.2023.106938] [Received: 02/26/2023] [Revised: 03/28/2023] [Accepted: 04/14/2023] [Indexed: 05/01/2023]
Abstract
Using ECG signals captured by wearable devices for emotion recognition is a feasible solution. We propose a deep convolutional neural network incorporating attention mechanisms for ECG emotion recognition. To address the problem of individual differences in emotion recognition tasks, we incorporate an improved Convolutional Block Attention Module (CBAM) into the proposed network. The deep convolutional neural network is responsible for capturing ECG features; channel attention in CBAM adds weight information to ECG features across channels, and spatial attention produces a weighted representation of ECG features across regions within each channel. We used three publicly available datasets, WESAD, DREAMER, and ASCERTAIN, for the ECG emotion recognition task, setting new state-of-the-art results on all three: multi-class results on DREAMER, three-class results on WESAD, and two-category results on ASCERTAIN, respectively. A large number of experiments are performed, providing an interesting analysis of the design of the convolutional structure parameters and the role of the attention mechanism. We propose using large convolutional kernels to enlarge the effective receptive field of the model and thus fully capture ECG signal features, which achieves better performance than the commonly used small kernels. In addition, channel attention and spatial attention were added to the deep convolutional model separately to explore their respective contributions; we found that, in most cases, channel attention contributed more to the model than spatial attention.
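The channel-then-spatial gating described above can be sketched in NumPy. This is a schematic CBAM-style module for a 1-D ECG feature map, not the paper's improved variant: the learned convolution over pooled maps is replaced by a simple sum, and weights and shapes are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w_down, w_up):
    # x: (C, T) ECG feature map. Avg- and max-pooled channel descriptors pass
    # through a shared two-layer MLP; the sigmoid gate reweights channels.
    avg, mx = x.mean(axis=1), x.max(axis=1)
    gate = sigmoid(w_up @ np.maximum(w_down @ avg, 0.0)
                   + w_up @ np.maximum(w_down @ mx, 0.0))
    return x * gate[:, None]

def spatial_attention(x):
    # Pool across channels, then gate each time step (a stand-in for CBAM's
    # learned convolution over the concatenated pooled maps).
    gate = sigmoid(x.mean(axis=0) + x.max(axis=0))
    return x * gate[None, :]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 32))            # 8 channels, 32 time steps
w_down = 0.1 * rng.standard_normal((4, 8))  # reduction ratio 2
w_up = 0.1 * rng.standard_normal((8, 4))
y = spatial_attention(channel_attention(x, w_down, w_up))
```

Because both gates lie in (0, 1), the module can only attenuate features, which is how it suppresses channels and time steps that vary across individuals.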
Affiliation(s)
- Tianqi Fan, Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, Dalian University of Technology, Dalian, China.
- Sen Qiu, Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, Dalian University of Technology, Dalian, China.
- Zhelong Wang, Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, Dalian University of Technology, Dalian, China.
- Hongyu Zhao, Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, Dalian University of Technology, Dalian, China.
- Junhan Jiang, First Affiliated Hospital of China Medical University, Shenyang, China.
- Junnan Xu, Department of Medical Oncology, Cancer Hospital of Dalian University of Technology, Shenyang, China.
- Tao Sun, Department of Medical Oncology, Cancer Hospital of Dalian University of Technology, Shenyang, China.
- Nan Jiang, College of Information Engineering, East China Jiaotong University, Nanchang, China.

14
Zhan B, Song E, Liu H. FSA-Net: Rethinking the attention mechanisms in medical image segmentation from releasing global suppressed information. Comput Biol Med 2023; 161:106932. [PMID: 37230013 DOI: 10.1016/j.compbiomed.2023.106932] [Received: 12/28/2022] [Revised: 03/28/2023] [Accepted: 04/13/2023] [Indexed: 05/27/2023]
Abstract
Attention mechanism-based medical image segmentation methods have developed rapidly in recent years. For attention mechanisms, it is crucial to accurately capture the distribution weights of the effective features contained in the data. To accomplish this, most attention mechanisms use a global squeezing approach. However, this leads to over-focusing on the globally most salient effective features of the region of interest while suppressing secondary salient ones, so that some fine-grained features are discarded outright. To address this issue, we propose a multiple-local perception method to aggregate global effective features, and design a fine-grained medical image segmentation network named FSA-Net. This network consists of two key components: 1) novel Separable Attention Mechanisms, which replace global squeezing with local squeezing to release the suppressed secondary salient effective features; and 2) a Multi-Attention Aggregator (MAA), which fuses multi-level attention to efficiently aggregate task-relevant semantic information. We conduct extensive experimental evaluations on six publicly available medical image segmentation datasets: the MoNuSeg, COVID-19-CT100, GlaS, CVC-ClinicDB, ISIC2018, and DRIVE datasets. Experimental results show that FSA-Net outperforms state-of-the-art methods in medical image segmentation.
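The contrast between global and local squeezing can be illustrated with a toy NumPy sketch. This is not the paper's Separable Attention Mechanism, only the pooling-granularity idea it builds on; the window size and sigmoid gating are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def global_squeeze(x):
    # One gate per channel from a single global average: when one region
    # dominates, secondary salient regions are scaled down with the background.
    return x * sigmoid(x.mean(axis=(1, 2)))[:, None, None]

def local_squeeze(x, k=4):
    # One gate per channel per k x k window, so a secondary salient region can
    # keep a strong gate inside its own window ("releasing" suppressed features).
    y = np.empty_like(x)
    _, h, w = x.shape
    for i in range(0, h, k):
        for j in range(0, w, k):
            win = x[:, i:i + k, j:j + k]
            y[:, i:i + k, j:j + k] = win * sigmoid(win.mean(axis=(1, 2)))[:, None, None]
    return y

rng = np.random.default_rng(0)
feat = rng.standard_normal((3, 8, 8))
```

On a uniform map both squeezes agree; they diverge exactly when salience is unevenly distributed, which is the case FSA-Net targets.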
Affiliation(s)
- Bangcheng Zhan, School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China.
- Enmin Song, School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China.
- Hong Liu, School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China.

15
Zhang J, Chen Y, Zeng P, Liu Y, Diao Y, Liu P. Ultra-Attention: Automatic Recognition of Liver Ultrasound Standard Sections Based on Visual Attention Perception Structures. Ultrasound Med Biol 2023; 49:1007-1017. [PMID: 36681610 DOI: 10.1016/j.ultrasmedbio.2022.12.016] [Received: 09/07/2022] [Revised: 11/12/2022] [Accepted: 12/22/2022] [Indexed: 06/17/2023]
Abstract
Acquisition of a standard section is a prerequisite for ultrasound diagnosis. For a long time, clear definitions of standard liver views have been lacking because they depend on individual physician experience, yet accurate automated scanning of standard liver sections remains one of the most important issues in ultrasonography. In this article, we enrich and expand the classification criteria of standard liver ultrasound sections from clinical practice and propose an Ultra-Attention structured perception strategy to automate the recognition of these sections. Inspired by the attention mechanism in natural language processing, standard liver ultrasound views participate in a global attention algorithm as modular local images, which significantly amplifies small features that would otherwise go unnoticed. In addition to the dropout mechanism, we use a part-transfer learning training approach to fine-tune the model's rate of convergence and increase its robustness. The proposed Ultra-Attention model outperforms various traditional convolutional neural network-based techniques, achieving the best known performance in the field with a classification accuracy of 93.2%. As part of the feature extraction procedure, we also illustrate and compare the convolutional structure and the Ultra-Attention approach; this analysis provides a reasonable basis for future research on local modular feature capture in ultrasound images. By developing a standard scan guideline for liver ultrasound-based disease diagnosis, this work will advance research on automated disease diagnosis guided by standard sections of liver ultrasound.
Affiliation(s)
- Jiansong Zhang, College of Medicine, Huaqiao University, Quanzhou, Fujian Province, China.
- Yongjian Chen, Department of Ultrasound, Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, China.
- Pan Zeng, College of Medicine, Huaqiao University, Quanzhou, Fujian Province, China.
- Yao Liu, College of Science and Engineering, National Quemoy University, Kinmen, Taiwan.
- Yong Diao, College of Medicine, Huaqiao University, Quanzhou, Fujian Province, China.
- Peizhong Liu, College of Medicine, Huaqiao University, Quanzhou, Fujian Province, China; College of Engineering, Huaqiao University, Quanzhou, Fujian Province, China.

16
Gómez S, Mantilla D, Rangel E, Ortiz A, D Vera D, Martínez Carrillo F. A deep supervised cross-attention strategy for ischemic stroke segmentation in MRI studies. Biomed Phys Eng Express 2023; 9. [PMID: 36988115 DOI: 10.1088/2057-1976/acc853] [Received: 12/23/2022] [Accepted: 03/28/2023] [Indexed: 03/30/2023]
Abstract
The key component of stroke diagnosis is the localization and delineation of brain lesions, especially from MRI studies. Nonetheless, manual delineation is time-consuming and biased by expert opinion. The main purpose of this study is to introduce an autoencoder architecture that effectively integrates cross-attention mechanisms, together with hierarchical deep supervision, to delineate lesions under scenarios of marked tissue-class imbalance, challenging lesion geometry, and variable textural representation. This work introduces a cross-attention deep autoencoder that focuses on the lesion shape through a set of convolutional saliency maps, forcing skip connections to preserve the morphology of affected tissue. Moreover, a deep supervision training scheme was adapted to induce the learning of hierarchical lesion details. Besides, a specially weighted loss function emphasizes lesion tissue, alleviating the negative impact of class imbalance. The proposed approach was validated on the public ISLES2017 dataset, outperforming state-of-the-art results with a Dice score of 0.36 and a precision of 0.42. Deeply supervised cross-attention autoencoders, trained to pay more attention to lesion tissue, are better at estimating ischemic lesions in MRI studies. The best architectural configuration was achieved by integrating ADC, TTP, and Tmax sequences. Deeply supervised cross-attention autoencoders better support the discrimination between healthy and lesioned regions, which in turn supports favorable prognosis and follow-up of patients.
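The idea of a loss that emphasizes lesion tissue can be sketched as a weighted binary cross-entropy. The abstract does not specify the paper's exact loss, so the form and the weight below are illustrative assumptions.

```python
import numpy as np

def weighted_bce(pred, target, lesion_weight=10.0, eps=1e-7):
    # Per-pixel binary cross-entropy with the rare lesion class up-weighted,
    # so missing lesion tissue costs more than mislabelling background.
    # (A schematic stand-in for the paper's weighted loss; the weight is illustrative.)
    pred = np.clip(pred, eps, 1.0 - eps)
    weights = np.where(target == 1, lesion_weight, 1.0)
    ce = -(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    return float(np.mean(weights * ce))

target = np.array([0.0, 0.0, 0.0, 1.0])            # one lesion pixel among background
miss_lesion = weighted_bce(np.array([0.1, 0.1, 0.1, 0.1]), target)
false_alarm = weighted_bce(np.array([0.1, 0.1, 0.9, 0.9]), target)
```

With the up-weighting, a missed lesion pixel is penalized far more heavily than a background false positive of the same confidence, which counters the collapse toward an "all background" prediction.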
Affiliation(s)
- Santiago Gómez, Universidad Industrial de Santander, Calle 9 #27, Bucaramanga, 680002, Colombia.
- Daniel Mantilla, Foscal Clinic, Calle 155A #23, Floridablanca, Santander, 681004, Colombia.
- Edgar Rangel, Universidad Industrial de Santander, Calle 9 #27, Bucaramanga, 680002, Colombia.
- Andres Ortiz, Foscal Clinic, Calle 155A #23, Floridablanca, Santander, 681004, Colombia.
- Daniela D Vera, Foscal Clinic, Calle 155A #23, Floridablanca, Santander, 681004, Colombia.

17
Wei J, Liu G, Liu S, Xiao Z. A novel algorithm for small object detection based on YOLOv4. PeerJ Comput Sci 2023; 9:e1314. [PMID: 37346537 PMCID: PMC10280595 DOI: 10.7717/peerj-cs.1314] [Received: 11/30/2022] [Accepted: 03/06/2023] [Indexed: 06/23/2023]
Abstract
Small object detection is one of the difficulties in the development of computer vision, especially with complex image backgrounds, and the accuracy of small object detection still needs to be improved. In this article, we present a small object detection network based on YOLOv4 that overcomes obstacles hindering traditional methods in small object detection tasks in complex road environments, such as few effective features, image noise, and occlusion by large objects, and improves the detection of small objects against complex backgrounds such as drone aerial survey images. The improved architecture reduces the computation and GPU memory consumption of the network by including the cross-stage partial network (CSPNet) structure in the spatial pyramid pooling (SPP) structure of the YOLOv4 network and in the convolutional layers after the concatenation operation. Secondly, accuracy on the small object detection task is improved by adding a detection head better suited to small objects and removing one used for large objects. Then, a new branch is added to extract feature information at a shallow location in the backbone, and this information is fused in the neck to enrich the small-object location information extracted by the model; when fusing feature information from different backbone levels, a weighting mechanism increases the fusion weight of useful information to improve detection performance at each scale. Finally, a coordinate attention (CA) module is embedded at a suitable location in the neck, which enables the model to focus on spatial location relationships and inter-channel relationships and enhances feature representation capability.
The proposed model has been tested on detecting 10 different target objects in drone aerial images and five different road traffic signal signs in images taken from vehicles in a complex road environment. The model's detection speed meets the criteria for real-time detection, its accuracy is better than that of existing state-of-the-art detection models, and it has only 44M parameters. On the drone aerial photography dataset, the mean average precision (mAP) of YOLOv4 and YOLOv5L is 42.79% and 42.10%, respectively, while our model achieves an mAP of 52.76%; on the urban road traffic light dataset, the proposed model achieves an mAP of 96.98%, again better than YOLOv4 (95.32%), YOLOv5L (94.79%), and other advanced models. The current work provides an efficient method for small object detection in complex road environments and can be extended to other scenarios involving small object detection, such as drone cruising and autonomous driving.
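The coordinate attention idea, directional pooling that preserves position along one axis, can be sketched as follows. The learned 1x1 convolutions of the real CA module are omitted, so this is a schematic, not the module as implemented in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x):
    # x: (C, H, W). Unlike a single global pool, pooling along each spatial
    # axis separately keeps positional information in the other axis, which
    # helps localize small objects. (The learned transformations of the real
    # CA module are omitted in this sketch.)
    gate_h = sigmoid(x.mean(axis=2))   # (C, H): one weight per row
    gate_w = sigmoid(x.mean(axis=1))   # (C, W): one weight per column
    return x * gate_h[:, :, None] * gate_w[:, None, :]

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 20, 20))
out = coordinate_attention(feat)
```

A small object lights up a specific (row, column) pair of gates, so its location survives the pooling that a global squeeze would erase.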
18
Yu Z, Liu S, Liu P, Liu Y. Automatic detection and diagnosis of thyroid ultrasound images based on attention mechanism. Comput Biol Med 2023; 155:106468. [PMID: 36841057 DOI: 10.1016/j.compbiomed.2022.106468] [Received: 08/10/2022] [Revised: 11/21/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022]
Abstract
The incidence of thyroid cancer has increased dramatically in recent years; however, early ultrasound diagnosis can reduce morbidity and mortality. Clinical work relies heavily on the subjective experience of the sonographer. Numerous computer-aided diagnostic techniques exist, but most consider only how good the results are, ignoring image acquisition beforehand and usefulness in subsequent clinical practice. To address these issues, this study proposes a computer-aided diagnosis method based on an attention mechanism. Owing to its lightweight design, the model can rapidly identify nodules and distinguish between benign and malignant ones without demanding hardware. The model uses a bounding box to locate each thyroid nodule, determines whether it is benign or malignant, and outputs the diagnostic result for the thyroid nodule ultrasound image. The latest attention mechanisms are used to obtain better results at a fraction of the cost. Additionally, ultrasound images with different features of benign and malignant thyroid nodules were collected following the Thyroid Imaging Reporting and Data System standards. The experimental results show that the approach identifies and classifies thyroid nodules rapidly and effectively: the mAP reached 0.89 overall and 0.94 for malignant nodules, with a detection time of 7 ms per image. Young physicians and small hospitals with limited resources can benefit from this method to assist with thyroid ultrasound examination and diagnosis.
Affiliation(s)
- Zhenggang Yu, College of Medicine, Huaqiao University, Quanzhou, Fujian Province, China.
- Shunlan Liu, Department of Ultrasound, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, 362000, Fujian Province, China.
- Peizhong Liu, College of Medicine, Huaqiao University, Quanzhou, Fujian Province, China; College of Engineering, Huaqiao University, Quanzhou, Fujian Province, China.
- Yao Liu, College of Science and Engineering, National Quemoy University, Kinmen, 89250, Taiwan.

19
Zhang Y, Su L, Liu Z, Tan W, Jiang Y, Cheng C. A semi-supervised learning approach for COVID-19 detection from chest CT scans. Neurocomputing 2022; 503:314-24. [PMID: 35765410 DOI: 10.1016/j.neucom.2022.06.076] [Received: 03/23/2022] [Revised: 05/11/2022] [Accepted: 06/18/2022] [Indexed: 01/17/2023]
Abstract
COVID-19 has spread rapidly all over the world, reaching more than 200 countries and regions. Early screening of suspected infected patients is essential for preventing and combating COVID-19. Computed Tomography (CT) is a fast and efficient tool that can quickly provide chest scan results. To reduce the burden on doctors of reading CTs, in this article a high-precision algorithm for diagnosing COVID-19 from chest CTs is designed for intelligent diagnosis. A semi-supervised learning approach is developed to address the situation where only a small amount of labelled data is available. While following the MixMatch rules to conduct sophisticated data augmentation, we introduce a model training technique to reduce the risk of over-fitting. At the same time, a new data enhancement method is proposed that modifies the regularization term in MixMatch. To further enhance the generalization of the model, a convolutional neural network based on an attention mechanism is then developed to extract multi-scale features from CT scans. The proposed algorithm is evaluated on an independent chest CT dataset for COVID-19 and achieves an area under the receiver operating characteristic curve (AUC) of 0.932, accuracy of 90.1%, sensitivity of 91.4%, specificity of 88.9%, and F1-score of 89.9%. The results show that the proposed algorithm can accurately diagnose whether a chest CT indicates COVID-19, and can help doctors diagnose rapidly in the early stages of an outbreak.
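Two MixMatch ingredients referenced above, pseudo-label sharpening and MixUp-style mixing, have compact forms. The sketch below follows the original MixMatch recipe (temperature sharpening; lam kept at or above 0.5), not the paper's modified version, and the hyperparameters are illustrative.

```python
import numpy as np

def sharpen(p, T=0.5):
    # Temperature sharpening of an averaged pseudo-label distribution
    # (the guessed-label step in MixMatch-style training).
    q = p ** (1.0 / T)
    return q / q.sum()

def mixup(x1, y1, x2, y2, alpha=0.75, rng=None):
    # Convex combination of two examples and their labels; lam is kept >= 0.5
    # so the mixed example stays closer to the first input, as in MixMatch.
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

p = np.array([0.4, 0.35, 0.25])   # averaged prediction over augmentations
q = sharpen(p)
```

Sharpening pushes the model toward low-entropy predictions on unlabelled scans, while the mixing acts as the regularizer the paper modifies.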
20
Guan A, Liu L, Fu X, Liu L. Precision medical image hash retrieval by interpretability and feature fusion. Comput Methods Programs Biomed 2022; 222:106945. [PMID: 35749884 DOI: 10.1016/j.cmpb.2022.106945] [Received: 06/30/2021] [Revised: 04/14/2022] [Accepted: 06/07/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND AND OBJECTIVE To address the low accuracy of medical image retrieval caused by high inter-class similarity and easily missed lesions, a precision medical image hash retrieval method combining interpretability and feature fusion is proposed, taking chest X-ray images as an example. METHODS Firstly, the DenseNet-121 network is pre-trained on a large dataset of medical images without manual annotation using the comparing-to-learn (C2L) method, yielding a backbone model whose weights contain rich medical representations. Then, a global network is constructed that learns from the whole image to produce an interpretable saliency map acting as an attention mechanism, from which a mask crop yields a local discriminant region. Thirdly, the local discriminant regions serve as inputs to a local network to obtain local features, and the global features are fused with the local features along the feature dimension in the pooling layer. Finally, a hash layer is added between the fully connected layer and the classification layer of the backbone network, and classification, quantization, and bit-balance loss functions are defined to generate high-quality hash codes. The final retrieval result is output by computing a similarity metric over the hash codes. RESULTS Experiments on the ChestX-ray8 dataset demonstrate that the proposed interpretable saliency map effectively locates focal regions, the feature fusion avoids information omission, and the combination of the three loss functions generates more accurate hash codes. Compared with current advanced medical image retrieval methods, this method effectively improves retrieval accuracy. CONCLUSIONS The proposed hash retrieval approach combining interpretability and feature fusion can effectively improve the accuracy of medical image retrieval and could be applied in computer-aided diagnosis systems.
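The quantization and bit-balance objectives named above have standard schematic forms in the deep-hashing literature; the sketch below shows plausible versions plus Hamming-distance ranking for the retrieval step. The exact formulations in the paper may differ.

```python
import numpy as np

def quantization_loss(codes):
    # Pushes relaxed, real-valued codes toward the binary vertices {-1, +1}.
    return float(np.mean((np.abs(codes) - 1.0) ** 2))

def bit_balance_loss(codes):
    # Encourages each bit position to be +1 and -1 about equally often across
    # a batch, so every bit carries information.
    return float(np.mean(codes.mean(axis=0) ** 2))

def hamming_rank(query, database):
    # Retrieval step: rank stored codes by Hamming distance to the query code.
    q, db = np.sign(query), np.sign(database)
    return np.argsort((q != db).sum(axis=1), kind="stable")

db = np.array([[ 1.0, -1.0,  1.0,  1.0],
               [-1.0, -1.0, -1.0,  1.0],
               [ 1.0, -1.0,  1.0, -1.0]])
order = hamming_rank(np.array([1.0, -1.0, 1.0, 1.0]), db)
```

At inference only the signs are stored, so retrieval reduces to cheap XOR/pop-count comparisons over the code table.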
Affiliation(s)
- Anna Guan, Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China.
- Li Liu, Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; Computer Technology Application Key Lab of Yunnan Province, Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China.
- Xiaodong Fu, Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; Computer Technology Application Key Lab of Yunnan Province, Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China.
- Lijun Liu, Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; Computer Technology Application Key Lab of Yunnan Province, Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China.

21
Ni J, Wu J, Elazab A, Tong J, Chen Z. DNL-Net: deformed non-local neural network for blood vessel segmentation. BMC Med Imaging 2022; 22:109. [PMID: 35668351 PMCID: PMC9169317 DOI: 10.1186/s12880-022-00836-z] [Received: 03/30/2022] [Accepted: 05/31/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND The non-local module has primarily been used in the literature to capture long-range dependencies. However, it suffers from prohibitive computational complexity and lacks interactions among positions across channels. METHODS We present a deformed non-local neural network (DNL-Net) for medical image segmentation, which has two prominent components: a deformed non-local (DNL) module and multi-scale feature fusion. The former optimizes the structure of the non-local block (NL), significantly reducing excessive computation and memory usage. The latter is derived from attention mechanisms to fuse features of different levels and improve the ability to exchange information across channels. In addition, we introduce a residual squeeze-and-excitation pyramid pooling (RSEP) module, similar to spatial pyramid pooling, to effectively resample features at different scales and enlarge the network's receptive field. RESULTS The proposed method achieved 96.63% and 92.93% for the Dice coefficient and mean intersection over union, respectively, on the intracranial blood vessel dataset. DNL-Net also attained 86.64%, 96.10%, and 98.37% for sensitivity, accuracy, and area under the receiver operating characteristic curve, respectively, on the DRIVE dataset. CONCLUSIONS The overall performance of DNL-Net surpasses other state-of-the-art vessel segmentation methods, indicating that the proposed network is well suited to blood vessel segmentation and of great clinical significance.
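A minimal embedded-Gaussian non-local block, the structure that the DNL module optimizes, can be written in a few lines of NumPy. The learned embeddings (theta, phi, g) are replaced by identity maps here, so this shows the O(N^2) pairwise aggregation pattern rather than the paper's reduced-cost variant.

```python
import numpy as np

def non_local(x):
    # x: (N, C) — N spatial positions, C channels. Embedded-Gaussian form of
    # the non-local (NL) block: each position aggregates every other position,
    # weighted by pairwise similarity, then adds a residual connection.
    # (The learned embeddings theta/phi/g are identity maps in this sketch.)
    sim = x @ x.T                             # (N, N) pairwise affinities
    sim -= sim.max(axis=1, keepdims=True)     # numerical stability
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)   # softmax: rows sum to 1
    return x + attn @ x

rng = np.random.default_rng(0)
out = non_local(rng.standard_normal((6, 4)))
```

The (N, N) affinity matrix is exactly the cost the deformed version attacks: for an H x W feature map N = HW, so memory grows quadratically with resolution.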
Affiliation(s)
- Jiajia Ni, College of Internet of Things Engineering, HoHai University, Changzhou, China; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
- Jianhuang Wu, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
- Ahmed Elazab, School of Biomedical Engineering, Shenzhen University, Shenzhen, China; Computer Science Department, Misr Higher Institute for Commerce and Computers, Mansoura, Egypt.
- Jing Tong, College of Internet of Things Engineering, HoHai University, Changzhou, China.
- Zhengming Chen, College of Internet of Things Engineering, HoHai University, Changzhou, China.

22
Yeung M, Sala E, Schönlieb CB, Rundo L. Focus U-Net: A novel dual attention-gated CNN for polyp segmentation during colonoscopy. Comput Biol Med 2021; 137:104815. [PMID: 34507156 PMCID: PMC8505797 DOI: 10.1016/j.compbiomed.2021.104815] [Received: 06/21/2021] [Revised: 08/26/2021] [Accepted: 08/26/2021] [Indexed: 02/07/2023]
Abstract
BACKGROUND: Colonoscopy remains the gold-standard screening for colorectal cancer. However, significant miss rates for polyps have been reported, particularly when there are multiple small adenomas. This presents an opportunity to leverage computer-aided systems to support clinicians and reduce the number of polyps missed.
METHOD: In this work, we introduce the Focus U-Net, a novel dual attention-gated deep neural network, which combines efficient spatial and channel-based attention into a single Focus Gate module to encourage selective learning of polyp features. The Focus U-Net incorporates several further architectural modifications, including the addition of short-range skip connections and deep supervision. Furthermore, we introduce the Hybrid Focal loss, a new compound loss function based on the Focal loss and Focal Tversky loss, designed to handle class-imbalanced image segmentation. For our experiments, we selected five public datasets containing images of polyps obtained during optical colonoscopy: CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, ETIS-Larib PolypDB and the EndoScene test set. We first perform a series of ablation studies and then evaluate the Focus U-Net on the CVC-ClinicDB and Kvasir-SEG datasets separately, and on a combined dataset of all five public datasets. To evaluate model performance, we use the Dice similarity coefficient (DSC) and Intersection over Union (IoU) metrics.
RESULTS: Our model achieves state-of-the-art results for both CVC-ClinicDB and Kvasir-SEG, with a mean DSC of 0.941 and 0.910, respectively. When evaluated on a combination of five public polyp datasets, our model similarly achieves state-of-the-art results, with a mean DSC of 0.878 and mean IoU of 0.809, a 14% and 15% improvement over the previous state-of-the-art results of 0.768 and 0.702, respectively.
CONCLUSIONS: This study shows the potential for deep learning to provide fast and accurate polyp segmentation results for use during colonoscopy. The Focus U-Net may be adapted for future use in newer non-invasive colorectal cancer screening and, more broadly, to other biomedical image segmentation tasks similarly involving class imbalance and requiring efficiency.
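The DSC and IoU metrics reported above are standard, and the Focal Tversky loss that enters the paper's Hybrid Focal loss has a well-known general form. A minimal NumPy sketch of these quantities follows; the smoothing term `eps` and the `alpha`/`beta`/`gamma` defaults are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """DSC = 2|A ∩ B| / (|A| + |B|) on binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def iou(pred, target, eps=1e-7):
    """IoU = |A ∩ B| / |A ∪ B| on binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)

def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Focal Tversky loss (1 - TI)^gamma on soft predictions in [0, 1];
    alpha and beta trade off false negatives against false positives."""
    tp = (pred * target).sum()
    fn = ((1.0 - pred) * target).sum()
    fp = (pred * (1.0 - target)).sum()
    ti = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return (1.0 - ti) ** gamma
```

The Hybrid Focal loss itself compounds the Focal loss with the Focal Tversky loss; its exact weighting is specified in the original article.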
Affiliation(s)
- Michael Yeung: Department of Radiology, University of Cambridge, Cambridge, CB2 0QQ, United Kingdom; School of Clinical Medicine, University of Cambridge, Cambridge, CB2 0SP, United Kingdom.
- Evis Sala: Department of Radiology, University of Cambridge, Cambridge, CB2 0QQ, United Kingdom; Cancer Research UK Cambridge Centre, University of Cambridge, Cambridge, CB2 0RE, United Kingdom.
- Carola-Bibiane Schönlieb: Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, CB3 0WA, United Kingdom.
- Leonardo Rundo: Department of Radiology, University of Cambridge, Cambridge, CB2 0QQ, United Kingdom; Cancer Research UK Cambridge Centre, University of Cambridge, Cambridge, CB2 0RE, United Kingdom.
23
Sridharan D, Knudsen EI. Selective disinhibition: A unified neural mechanism for predictive and post hoc attentional selection. Vision Res 2015; 116:194-209. [PMID: 25542276 DOI: 10.1016/j.visres.2014.12.010] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 07/29/2014] [Revised: 12/04/2014] [Accepted: 12/11/2014] [Indexed: 11/23/2022]
Abstract
The natural world presents us with a rich and ever-changing sensory landscape containing diverse stimuli that constantly compete for representation in the brain. When the brain selects a stimulus as the highest priority for attention, it differentially enhances the representation of the selected, "target" stimulus and suppresses the processing of other, distracting stimuli. A stimulus may be selected for attention while it is still present in the visual scene (predictive selection) or after it has vanished (post hoc selection). We present a biologically inspired computational model that accounts for the prioritized processing of information about targets that are selected for attention either predictively or post hoc. Central to the model is the neurobiological mechanism of "selective disinhibition" - the selective suppression of inhibition of the representation of the target stimulus. We demonstrate that this mechanism explains major neurophysiological hallmarks of selective attention, including multiplicative neural gain, increased inter-trial reliability (decreased variability), and reduced noise correlations. The same mechanism also reproduces key behavioral hallmarks associated with target-distracter interactions. Selective disinhibition exhibits several distinguishing and advantageous features over alternative mechanisms for implementing target selection, and is capable of explaining the effects of selective attention over a broad range of real-world conditions, involving both predictive and post hoc biasing of sensory competition and decisions.
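The multiplicative-gain claim above can be illustrated with a toy divisive-inhibition unit: reducing the inhibition acting on the target's representation scales its response by the same factor at every drive level. This is a schematic sketch only, not the paper's circuit model; the divisive form and all parameter values (`semi_sat`, the two inhibition levels) are illustrative assumptions.

```python
def response(drive, inhibition, semi_sat=1.0):
    # Divisive inhibition: stronger inhibition divides down the response.
    return drive / (semi_sat + inhibition)

baseline_inh = 2.0      # inhibition on the target before selection
disinhibited_inh = 0.5  # inhibition after selective disinhibition

# The gain factor is identical at every drive level -> multiplicative gain,
# here (1.0 + 2.0) / (1.0 + 0.5) = 2.0 regardless of drive.
gains = [response(d, disinhibited_inh) / response(d, baseline_inh)
         for d in (1.0, 4.0, 10.0)]
```

Because the drive term cancels in the ratio, suppressing inhibition acts like a pure gain knob on the selected channel, which is the signature the abstract attributes to selective disinhibition.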