1
Xu C, Roddan A, Xu H, Giannarou S. FF-ViT: probe orientation regression for robot-assisted endomicroscopy tissue scanning. Int J Comput Assist Radiol Surg 2024. [PMID: 38598141 DOI: 10.1007/s11548-024-03113-2]
Abstract
PURPOSE Probe-based confocal laser endomicroscopy (pCLE) enables visualization of cellular tissue morphology during surgical procedures. To capture high-quality pCLE images during tissue scanning, it is important to maintain close contact between the probe and the tissue while keeping the probe perpendicular to the tissue surface. Existing robotic pCLE tissue scanning systems, which rely on macroscopic vision, struggle to place the probe accurately at the optimal position on the tissue surface. This motivates regressing longitudinal distance and orientation from endomicroscopic vision itself. METHOD This paper introduces a novel method for automatically regressing the orientation between a pCLE probe and the tissue surface during robotic scanning, utilizing the fast Fourier vision transformer (FF-ViT) to extract local frequency representations and using them for probe orientation regression. Additionally, the FF-ViT incorporates a blur mapping attention (BMA) module to refine latent representations, which is combined with the pyramid angle regressor (PAR) to precisely estimate probe orientation. RESULT A first-of-its-kind dataset for pCLE probe-tissue orientation (pCLE-PTO) has been created. The performance evaluation demonstrates that the proposed network surpasses other top regression networks in accuracy, stability, and generalizability, while maintaining low computational complexity (1.8G FLOPs) and high inference speed (90 fps). CONCLUSION The performance evaluation study verifies the clinical value of the proposed framework and its potential for integration into surgical robotic platforms for intraoperative tissue scanning.
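The local frequency representations mentioned in this abstract can be illustrated with a minimal sketch: defocus blur attenuates high spatial frequencies, so per-patch FFT magnitude spectra carry a signal about probe-tissue contact and tilt. Everything below (function name, patch size, feature layout) is an illustrative assumption, not the paper's actual FF-ViT tokenizer.

```python
import numpy as np

def local_frequency_features(img, patch=8):
    """Split an image into non-overlapping patches and take per-patch
    FFT magnitude spectra as frequency tokens (illustrative sketch)."""
    h, w = img.shape
    feats = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            tile = img[i:i + patch, j:j + patch]
            spec = np.abs(np.fft.fft2(tile))  # local magnitude spectrum
            feats.append(spec.ravel())
    return np.stack(feats)  # (num_patches, patch*patch) frequency tokens

# A 32x32 image with 8x8 patches yields 16 tokens of 64 coefficients each.
tokens = local_frequency_features(np.random.rand(32, 32))
```

In a transformer-style regressor, tokens like these would be linearly projected and fed to attention layers in place of raw pixel patches.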
Affiliation(s)
- Chi Xu
- Hamlyn Centre for Robotic Surgery, Department of Surgery and Cancer, Imperial College London, London, SW7 2AZ, UK
- Alfie Roddan
- Hamlyn Centre for Robotic Surgery, Department of Surgery and Cancer, Imperial College London, London, SW7 2AZ, UK
- Haozheng Xu
- Hamlyn Centre for Robotic Surgery, Department of Surgery and Cancer, Imperial College London, London, SW7 2AZ, UK
- Stamatia Giannarou
- Hamlyn Centre for Robotic Surgery, Department of Surgery and Cancer, Imperial College London, London, SW7 2AZ, UK
2
Zeng X, Chen W, Lei B. CAT-DTI: cross-attention and Transformer network with domain adaptation for drug-target interaction prediction. BMC Bioinformatics 2024; 25:141. [PMID: 38566002 DOI: 10.1186/s12859-024-05753-2]
Abstract
Accurate and efficient prediction of drug-target interaction (DTI) is critical to advancing drug development and reducing the cost of drug discovery. Recently, deep learning methods have enhanced DTI prediction precision and efficacy, but several challenges remain. The first challenge lies in efficiently learning drug and protein feature representations alongside their interaction features to enhance DTI prediction. Another important challenge is improving the generalization capability of DTI models in real-world scenarios. To address these challenges, we propose CAT-DTI, a model based on cross-attention and the Transformer, possessing domain adaptation capability. CAT-DTI effectively captures drug-target interactions while adapting to out-of-distribution data. Specifically, we use a convolutional neural network combined with a Transformer to encode the distance relationship between amino acids within protein sequences, and employ a cross-attention module to capture drug-target interaction features. Generalization to new DTI prediction scenarios is achieved by leveraging a conditional domain adversarial network, aligning DTI representations under diverse distributions. Experimental results in in-domain and cross-domain scenarios demonstrate that the CAT-DTI model improves overall DTI prediction performance compared with previous methods.
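The cross-attention fusion described above — one modality's features querying another's — can be sketched as a single attention head in NumPy. The dimensions, random initialisation, and single-head form are illustrative assumptions, not CAT-DTI's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_seq, context_seq, d_k=16, seed=0):
    """Single-head cross-attention: queries from one modality (e.g. drug
    atoms) attend over another (e.g. protein residues). Weights are
    randomly initialised here purely for illustration."""
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((query_seq.shape[-1], d_k)) / np.sqrt(query_seq.shape[-1])
    Wk = rng.standard_normal((context_seq.shape[-1], d_k)) / np.sqrt(context_seq.shape[-1])
    Wv = rng.standard_normal((context_seq.shape[-1], d_k)) / np.sqrt(context_seq.shape[-1])
    Q, K, V = query_seq @ Wq, context_seq @ Wk, context_seq @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (n_query, n_context) weights
    return attn @ V, attn                   # fused features + attention map

drug = np.random.rand(5, 8)      # 5 drug atoms, 8-d features (toy sizes)
protein = np.random.rand(40, 8)  # 40 protein residues
fused, attn = cross_attention(drug, protein)
```

Each row of `attn` is a distribution over residues, which is also what makes such modules inspectable: high-weight residues suggest where the model "looks" for an interaction.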
Affiliation(s)
- Xiaoting Zeng
- School of Computer and Software, Shenzhen University, Shenzhen, 518060, China
- Weilin Chen
- Marshall Laboratory of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, 518055, China
- Baiying Lei
- School of Biomedical Engineering, Shenzhen University, Shenzhen, 518055, China
3
Zhao L, Xue Q, Zhang H, Hao Y, Yi H, Liu X, Pan W, Fu J, Zhang A. CatNet: Sequence-based deep learning with cross-attention mechanism for identifying endocrine-disrupting chemicals. J Hazard Mater 2024; 465:133055. [PMID: 38016311 DOI: 10.1016/j.jhazmat.2023.133055]
Abstract
Endocrine-disrupting chemicals (EDCs) pose significant environmental and health risks due to their potential to interfere with nuclear receptors (NRs), key regulators of physiological processes. Despite the evident risks, most existing research narrows its focus to the interaction between compounds and an individual NR target, neglecting a comprehensive assessment across the entire NR family. In response, this study assembled a comprehensive human NR dataset, capturing 49,244 interactions between 35,467 unique compounds and 42 NRs. We introduced a cross-attention network framework, "CatNet", innovatively integrating compound and protein representations through cross-attention mechanisms. The results showed that the CatNet model achieved excellent performance, with an area under the receiver operating characteristic curve (AUCROC) of 0.916 on the test set, and exhibited reliable generalization on unseen compound-NR pairs. A distinguishing feature of our research is its capacity to extend to novel targets. Beyond its predictive accuracy, CatNet offers a valuable mechanistic perspective on compound-NR interactions through feature visualization. Augmenting the utility of our research, we have also developed a graphical user interface, empowering researchers to predict chemical binding to diverse NRs. Our model enables the prediction of human NR-related EDCs and shows the potential to identify EDCs related to other targets.
Affiliation(s)
- Lu Zhao
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100049, PR China
- Qiao Xue
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China
- Huazhou Zhang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100049, PR China
- Yuxing Hao
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100049, PR China
- Hang Yi
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, PR China
- Xian Liu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China
- Wenxiao Pan
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China
- Jianjie Fu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, PR China; School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310012, PR China
- Aiqian Zhang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100049, PR China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, PR China; School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310012, PR China
4
Zhang W, Chen S, Ma Y, Liu Y, Cao X. ETUNet: Exploring efficient transformer enhanced UNet for 3D brain tumor segmentation. Comput Biol Med 2024; 171:108005. [PMID: 38340437 DOI: 10.1016/j.compbiomed.2024.108005]
Abstract
Medical image segmentation is a crucial topic in medical image processing. Accurately segmenting brain tumor regions from multimodal MRI scans is essential for clinical diagnosis and survival prediction. However, similar intensity distributions, variable tumor shapes, and fuzzy boundaries pose severe challenges for brain tumor segmentation. Traditional segmentation networks based on UNet struggle to establish explicit long-range dependencies in the feature space due to the limited receptive field of CNNs, which is particularly problematic for dense prediction tasks such as brain tumor segmentation. Recent works have incorporated the powerful global modeling capability of the Transformer into UNet to achieve more precise segmentation results. Nevertheless, these methods encounter some issues: (1) global information is often modeled by simply stacking Transformer layers in a specific module, resulting in high computational complexity and underutilization of the UNet architecture's potential; (2) the rich boundary information of tumor subregions in multi-scale features is often overlooked. Motivated by these challenges, we propose an advanced fusion of the Transformer with UNet by reexamining its three core parts (encoder, bottleneck, and skip connections). First, we introduce a CNN-Transformer module in the encoder to replace the traditional CNN module, enabling the capture of deep spatial dependencies from input images. To address high-level semantic information, we incorporate a computationally efficient spatial-channel attention layer in the bottleneck for global interaction, highlighting important semantic features from the encoder path output. For irregular lesions, we fuse the multi-scale features from the encoder output with the decoder features in the skip connections by computing cross-attention. This adaptive querying of valuable information from multi-scale features enhances the boundary localization ability of the decoder path and suppresses redundant features with low correlation. Compared to existing methods, our model further enhances the learning capacity of the overall UNet architecture while maintaining low computational complexity. Experimental results on the BraTS2018 and BraTS2020 brain tumor segmentation datasets demonstrate that our model achieves results comparable or superior to recent CNN- or Transformer-based models, with average DSC/HD95 of 0.854/6.688 on BraTS2018 and 0.862/5.455 on BraTS2020. At the same time, our model achieves optimal segmentation of enhancing tumors, showcasing the effectiveness of our method. Our code will be made publicly available at https://github.com/wzhangck/ETUnet.
Affiliation(s)
- Wang Zhang
- School of Computer and Information Science, SouthWest University, China
- Shanxiong Chen
- School of Computer and Information Science, SouthWest University, China
- Yuqi Ma
- School of Computer and Information Science, SouthWest University, China
- Yu Liu
- School of Electronic Information and Electrical Engineering, TianShui Normal University, China
- Xu Cao
- Department of Radiology, Shifang People's Hospital, China
5
Wang H, Huang T, Wang D, Zeng W, Sun Y, Zhang L. MSCAN: multi-scale self- and cross-attention network for RNA methylation site prediction. BMC Bioinformatics 2024; 25:32. [PMID: 38233745 PMCID: PMC10795237 DOI: 10.1186/s12859-024-05649-1]
Abstract
BACKGROUND Epi-transcriptome regulation through post-transcriptional RNA modifications is essential for all RNA types, and precise recognition of RNA modifications is critical for understanding their functions and regulatory mechanisms. However, wet-lab experimental methods are often costly and time-consuming, limiting their wide application. Therefore, recent research has focused on developing computational methods, particularly deep learning (DL). Bidirectional long short-term memory (BiLSTM), convolutional neural network (CNN), and Transformer models have demonstrated achievements in modification site prediction. However, BiLSTM cannot be parallelized, leading to long training times; CNN cannot learn long-distance dependencies in the sequence; and the Transformer lacks information interaction across sequences at different scales. This underscores the need for continued research in natural language processing (NLP) and DL to devise an enhanced prediction framework that addresses these challenges. RESULTS This study presents a multi-scale self- and cross-attention network (MSCAN) to identify RNA methylation sites using NLP and DL techniques. Experimental results on twelve RNA modification sites (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um) show that MSCAN achieves areas under the receiver operating characteristic curve of 98.34%, 85.41%, 97.29%, 96.74%, 99.04%, 79.94%, 76.22%, 65.69%, 92.92%, 92.03%, 95.77%, and 89.66%, respectively, outperforming state-of-the-art prediction models and indicating strong generalization capability. Furthermore, MSCAN reveals a strong association among different types of RNA modifications from an experimental perspective. A user-friendly web server for predicting these twelve widely occurring human RNA modification sites is available at http://47.242.23.141/MSCAN/index.php.
CONCLUSIONS A predictor framework has been developed through binary classification to predict RNA methylation sites.
Affiliation(s)
- Honglei Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- School of Information Engineering, Xuzhou College of Industrial Technology, Xuzhou, 221400, China
- Tao Huang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Dong Wang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
- Wenliang Zeng
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Yanjing Sun
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Lin Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
6
Amador K, Gutierrez A, Winder A, Fiehler J, Wilms M, Forkert ND. Providing clinical context to the spatio-temporal analysis of 4D CT perfusion to predict acute ischemic stroke lesion outcomes. J Biomed Inform 2024; 149:104567. [PMID: 38096945 DOI: 10.1016/j.jbi.2023.104567]
Abstract
Acute ischemic stroke is a leading cause of mortality and morbidity worldwide. Timely identification of the extent of a stroke is crucial for effective treatment, and spatio-temporal (4D) computed tomography perfusion (CTP) imaging plays a critical role in this process. Recently, the first deep learning-based methods that leverage the full spatio-temporal nature of perfusion imaging for predicting stroke lesion outcomes have been proposed. However, clinical information is typically not integrated into the learning process, even though it may help improve tissue outcome prediction given the known influence of physiological, demographic, and treatment factors on lesion growth. Cross-attention, a multimodal fusion strategy, has been successfully used to combine information from multiple sources, but it has yet to be applied to stroke lesion outcome prediction. Therefore, this work aimed to develop and evaluate a novel multimodal spatio-temporal deep learning model that utilizes cross-attention to combine information from 4D CTP and clinical metadata simultaneously to predict stroke lesion outcomes. The proposed model was evaluated on a dataset of 70 acute ischemic stroke patients, demonstrating significantly improved volume estimates (mean error = 19 ml) compared to a baseline unimodal approach (mean error = 35 ml, p < 0.05). The proposed model allows generating attention maps and counterfactual outcome scenarios to investigate the relevance of clinical variables in predicting stroke lesion outcomes at the patient level, helping to provide a better understanding of the model's decision-making process.
Affiliation(s)
- Kimberly Amador
- Biomedical Engineering Graduate Program, University of Calgary, Calgary, Canada; Department of Radiology, University of Calgary, Calgary, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
- Alejandro Gutierrez
- Biomedical Engineering Graduate Program, University of Calgary, Calgary, Canada; Department of Radiology, University of Calgary, Calgary, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
- Anthony Winder
- Department of Radiology, University of Calgary, Calgary, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, Canada
- Jens Fiehler
- Department of Diagnostic and Interventional Neuroradiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Matthias Wilms
- Hotchkiss Brain Institute, University of Calgary, Calgary, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada; Departments of Pediatrics and Community Health Sciences, University of Calgary, Calgary, Canada
- Nils D Forkert
- Department of Radiology, University of Calgary, Calgary, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada; Department of Clinical Neurosciences, University of Calgary, Calgary, Canada
7
Mukhtar H, Khan MUG. STMMOT: Advancing multi-object tracking through spatiotemporal memory networks and multi-scale attention pyramids. Neural Netw 2023; 168:363-379. [PMID: 37801917 DOI: 10.1016/j.neunet.2023.09.047]
Abstract
Multi-object tracking (MOT) is very important in human surveillance, sports analytics, autonomous driving, and cooperative robotics. Current MOT methods do not perform well under non-uniform movement, occlusion, and appearance-reappearance scenarios. We introduce a comprehensive MOT method that seamlessly merges object detection and identity linkage within an end-to-end trainable framework, designed to maintain object links over long periods. Our proposed model, named STMMOT, is built around four key modules: (1) a candidate proposal creation network, which generates object proposals via a vision-Transformer encoder-decoder architecture; (2) a scale-variant pyramid, a progressive pyramid structure for learning self-scale and cross-scale similarities in multi-scale feature maps; (3) a spatio-temporal memory encoder, which extracts essential information from the memory associated with each tracked object; and (4) a spatio-temporal memory decoder, which simultaneously resolves object detection and identity association for MOT. Our system leverages a robust spatio-temporal memory module that retains extensive historical object state observations and encodes them effectively using an attention-based aggregator. The uniqueness of STMMOT resides in representing objects as dynamic query embeddings that are updated continuously, which enables predicting object states with an attention mechanism and eliminates the need for post-processing. Experimental results show that STMMOT achieves scores of 79.8 and 78.4 for IDF1, 79.3 and 74.1 for MOTA, 73.2 and 69.0 for HOTA, and 61.2 and 61.5 for AssA, and maintains ID switch counts of 1529 and 1264 on MOT17 and MOT20, respectively. Compared with the previous best, TransMOT, STMMOT achieves around 4.58% and 4.25% increases in IDF1 and reduces ID switches by 5.79% and 21.05% on MOT17 and MOT20, respectively.
Affiliation(s)
- Hamza Mukhtar
- Department of Computer Science, University of Engineering and Technology Lahore, G.T. Road, Lahore, 54890, Punjab, Pakistan; Intelligent Criminology Lab, National Center of Artificial Intelligence, AlKhawarizmi Institute of Computer Science, University of Engineering and Technology, G.T. Road, Lahore, 54890, Punjab, Pakistan
- Muhammad Usman Ghani Khan
- Department of Computer Science, University of Engineering and Technology Lahore, G.T. Road, Lahore, 54890, Punjab, Pakistan; Intelligent Criminology Lab, National Center of Artificial Intelligence, AlKhawarizmi Institute of Computer Science, University of Engineering and Technology, G.T. Road, Lahore, 54890, Punjab, Pakistan
8
Cai Z, Zeng T, Lieffrig EV, Zhang J, Chen F, Toyonaga T, You C, Xin J, Zheng N, Lu Y, Duncan JS, Onofrey JA. Cross-Attention for Improved Motion Correction in Brain PET. Mach Learn Clin Neuroimaging 2023; 14312:34-45. [PMID: 38174216 PMCID: PMC10758996 DOI: 10.1007/978-3-031-44858-4_4]
Abstract
Head movement during long scan sessions degrades reconstruction quality in positron emission tomography (PET) and introduces artifacts, which limits clinical diagnosis and treatment. Recent deep learning-based motion correction work utilized raw PET list-mode data and hardware motion tracking (HMT) to learn head motion in a supervised manner. However, motion prediction results were not robust for test subjects outside the training data domain. In this paper, we integrate a cross-attention mechanism into the supervised deep learning network to improve motion correction across test subjects. Specifically, cross-attention learns the spatial correspondence between the reference images and moving images to explicitly focus the model on the most relevant information for motion correction: the head region. We validate our approach on brain PET data from two different scanners: HRRT without time of flight (ToF) and mCT with ToF. Compared with traditional and deep learning benchmarks, our network improved motion correction performance by 58% and 26% in translation and rotation, respectively, in multi-subject testing on HRRT studies. In mCT studies, our approach improved performance by 66% and 64% for translation and rotation, respectively. Our results demonstrate that cross-attention has the potential to improve the quality of brain PET image reconstruction without depending on HMT. All code will be released on GitHub: https://github.com/OnofreyLab/dl_hmc_attention_mlcn2023.
Affiliation(s)
- Zhuotong Cai
- Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, China
- Department of Radiology & Biomedical Imaging, New Haven, CT, USA
- Department of Biomedical Engineering, New Haven, CT, USA
- Tianyi Zeng
- Department of Radiology & Biomedical Imaging, New Haven, CT, USA
- Jiazhen Zhang
- Department of Biomedical Engineering, New Haven, CT, USA
- Fuyao Chen
- Department of Biomedical Engineering, New Haven, CT, USA
- Takuya Toyonaga
- Department of Radiology & Biomedical Imaging, New Haven, CT, USA
- Chenyu You
- Department of Electrical Engineering, New Haven, CT, USA
- Jingmin Xin
- Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, China
- Nanning Zheng
- Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, China
- Yihuan Lu
- United Imaging Healthcare, Shanghai, China
- James S Duncan
- Department of Radiology & Biomedical Imaging, New Haven, CT, USA
- Department of Biomedical Engineering, New Haven, CT, USA
- Department of Electrical Engineering, New Haven, CT, USA
- John A Onofrey
- Department of Radiology & Biomedical Imaging, New Haven, CT, USA
- Department of Biomedical Engineering, New Haven, CT, USA
- Department of Urology, Yale University, New Haven, CT, USA
9
Yang S, Zhang P, Che C, Zhong Z. B-LBConA: a medical entity disambiguation model based on Bio-LinkBERT and context-aware mechanism. BMC Bioinformatics 2023; 24:97. [PMID: 36927359 PMCID: PMC10021986 DOI: 10.1186/s12859-023-05209-z]
Abstract
BACKGROUND The main task of medical entity disambiguation is to link mentions, such as diseases, drugs, or complications, to standard entities in a target knowledge base. To our knowledge, models based on Bidirectional Encoder Representations from Transformers (BERT) have achieved good results in this task. Unfortunately, these models only consider text in the current document, fail to capture dependencies with other documents, and insufficiently mine hidden information in contextual texts. RESULTS We propose B-LBConA, which is based on Bio-LinkBERT and a context-aware mechanism. Specifically, B-LBConA first utilizes Bio-LinkBERT, which is capable of learning cross-document dependencies, to obtain embedding representations of mentions and candidate entities. Then, cross-attention is used to capture the interaction information of mention-to-entity and entity-to-mention. Finally, B-LBConA incorporates disambiguation clues about the relevance between the mention context and candidate entities via the context-aware mechanism. CONCLUSIONS Experimental results on three publicly available datasets, NCBI, ADR, and ShARe/CLEF, show that B-LBConA achieves significantly more accurate performance compared with existing models.
Affiliation(s)
- Siyu Yang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, 116622, Dalian, China
- Peiliang Zhang
- School of Computer Science and Artificial Intelligence, Wuhan University of Technology, 430070, Wuhan, China
- Chao Che
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, 116622, Dalian, China
- Zhaoqian Zhong
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, 116622, Dalian, China
10
Liu S, Wang X, Xiang Y, Xu H, Wang H, Tang B. CATNet: Cross-event attention-based time-aware network for medical event prediction. Artif Intell Med 2022; 134:102440. [PMID: 36462902 DOI: 10.1016/j.artmed.2022.102440]
Abstract
Medical event prediction (MEP) is a fundamental task in the healthcare domain: predicting medical events, including medications, diagnosis codes, laboratory tests, procedures, and outcomes, from patients' historical medical records. Many researchers have tried to build MEP models to overcome the challenges posed by the heterogeneous and irregular temporal characteristics of electronic health record (EHR) data. However, most of them consider heterogeneous and temporal medical events separately and ignore the correlations among different types of medical events, especially the relations between heterogeneous historical medical events and target medical events. In this paper, we propose a novel attention-based neural network called the Cross-event Attention-based Time-aware Network (CATNet) for MEP. It is a time-aware, event-aware, and task-adaptive method with the following advantages: (1) it models heterogeneous information and temporal information in a unified way, considering irregular temporal characteristics both locally and globally; (2) it takes full advantage of correlations among different types of events via cross-event attention. Experiments on two public datasets (MIMIC-III and eICU) show that CATNet outperforms other state-of-the-art methods on various MEP tasks. The source code of CATNet is released at https://github.com/sherry6247/CATNet.git.
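The time-aware weighting idea described above can be sketched very simply: each historical event's contribution to the pooled patient representation is penalized by its time gap to the prediction point. The exponential-decay form, the additive penalty, and the `decay` rate are illustrative assumptions; CATNet's actual time encoding and cross-event attention are more elaborate.

```python
import numpy as np

def time_aware_attention(event_feats, time_deltas, decay=0.1):
    """Toy time-aware pooling over a patient's event sequence.

    event_feats: (n_events, d) feature vectors of historical events.
    time_deltas: (n_events,) elapsed time from each event to the
                 prediction point; staler events get lower weight.
    """
    scores = event_feats.sum(axis=1)       # stand-in relevance score
    scores = scores - decay * time_deltas  # penalize stale events
    w = np.exp(scores - scores.max())      # softmax over events
    w = w / w.sum()
    return w @ event_feats                 # time-weighted patient state
```

With identical event features, the output reduces to the shared feature vector while the weights still favor recent events, which makes the decay's effect easy to verify in isolation.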
11
Xie H, Zeng X, Lei H, Du J, Wang J, Zhang G, Cao J, Wang T, Lei B. Cross-attention multi-branch network for fundus diseases classification using SLO images. Med Image Anal 2021; 71:102031. [PMID: 33798993 DOI: 10.1016/j.media.2021.102031] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2020] [Revised: 01/24/2021] [Accepted: 03/03/2021] [Indexed: 12/23/2022]
Abstract
Fundus disease classification is vital for human health. However, most existing methods detect diseases from single-angle fundus images, which lack pathological information. To address this limitation, this paper proposes a novel deep learning method to perform different fundus disease classification tasks using ultra-wide-field scanning laser ophthalmoscopy (SLO) images, which have an ultra-wide field of view of 180–200°. The proposed deep model consists of a multi-branch network, an atrous spatial pyramid pooling (ASPP) module, a cross-attention module, and a depth-wise attention module. Specifically, the multi-branch network employs the ResNet-34 model as the backbone to extract feature information, where the two-branch ResNet-34 model is followed by the ASPP module, which extracts multi-scale spatial contextual features by setting different dilation rates. The depth-wise attention module provides a global attention map from the multi-branch network, which enables the network to focus on the salient targets of interest. The cross-attention module adopts a cross-fusion mode to fuse the channel and spatial attention maps from the two-branch ResNet-34 model, which enhances the representation ability of the disease-specific features. Extensive experiments on our collected SLO images and two publicly available datasets demonstrate that the proposed method outperforms state-of-the-art methods and achieves promising classification performance on fundus diseases.
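The ASPP idea above, extracting multi-scale context by running parallel convolutions at different dilation rates, can be sketched with a 1-D NumPy toy. This is an illustrative simplification under assumed names (`dilated_conv1d`, `aspp`), not the paper's 2-D implementation; a dilated kernel is formed by inserting zeros between taps, which widens the receptive field without adding parameters.

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """1-D convolution with the kernel dilated by `rate` (zeros between taps)."""
    k = np.zeros((len(kernel) - 1) * rate + 1)
    k[::rate] = kernel                     # inflate kernel: taps spaced `rate` apart
    pad = len(k) // 2
    xp = np.pad(x, pad)                    # 'same' padding keeps the output length
    return np.array([xp[i:i + len(k)] @ k for i in range(len(x))])

def aspp(x, kernel, rates=(1, 2, 4)):
    """Run parallel dilated branches at several rates and fuse them (mean)."""
    branches = [dilated_conv1d(x, kernel, r) for r in rates]
    return np.stack(branches).mean(axis=0)
```

With rate 1 this reduces to an ordinary convolution; larger rates let the same three-tap kernel aggregate context from progressively farther-apart positions, which is the multi-scale effect ASPP exploits.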
Affiliation(s)
- Hai Xie
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China
- Xianlu Zeng
- Shenzhen Eye Hospital, Shenzhen Key Ophthalmic Laboratory, Health Science Center, Shenzhen University, The Second Affiliated Hospital of Jinan University, Shenzhen, China
- Haijun Lei
- Guangdong Province Key Laboratory of Popular High-performance Computers, School of Computer and Software Engineering, Shenzhen University, Shenzhen, China
- Jie Du
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China
- Jiantao Wang
- Shenzhen Eye Hospital, Shenzhen Key Ophthalmic Laboratory, Health Science Center, Shenzhen University, The Second Affiliated Hospital of Jinan University, Shenzhen, China
- Guoming Zhang
- Shenzhen Eye Hospital, Shenzhen Key Ophthalmic Laboratory, Health Science Center, Shenzhen University, The Second Affiliated Hospital of Jinan University, Shenzhen, China
- Jiuwen Cao
- Key Lab for IOT and Information Fusion Technology of Zhejiang, Artificial Intelligence Institute, Hangzhou Dianzi University, Hangzhou, China
- Tianfu Wang
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China
- Baiying Lei
- National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China