1. Dai L, Yang X, Li H, Zhao X, Lin L, Jiang Y, Wang Y, Li Z, Shen H. A clinically actionable and explainable real-time risk assessment framework for stroke-associated pneumonia. Artif Intell Med 2024; 149:102772. [PMID: 38462273 DOI: 10.1016/j.artmed.2024.102772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 12/13/2023] [Accepted: 01/14/2024] [Indexed: 03/12/2024]
Abstract
Current medical practice is more reactive than proactive, despite the widely recognized value of early disease detection, which includes improving the quality of care and reducing medical costs. One cornerstone of early disease detection is clinically actionable prediction, where predictions are expected to be accurate, stable, real-time, and interpretable. Using stroke-associated pneumonia (SAP) as an example, we built a transformer-encoder-based model that analyzes highly heterogeneous electronic health records in real time. The model proved accurate and stable on an independent test set. In addition, it issued at least one warning for 98.6% of SAP patients, and on average its alerts preceded physician diagnoses by 2.71 days. We applied Integrated Gradients to examine the model's reasoning process. Supplementing the risk scores, the model highlighted critical historical events on patients' trajectories, which were shown to have high clinical relevance.
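The attribution step mentioned in the abstract can be illustrated with a minimal sketch of Integrated Gradients. This is not the authors' implementation: the toy logistic model, the weights `w` and `b`, the all-zero baseline, and the function names are all assumptions chosen so the example runs with the standard library alone. Attributions are approximated with a midpoint Riemann sum along the straight path from the baseline to the input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy risk model (assumption): logistic regression over a feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model's output with respect to x."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the averaged gradient by the input-baseline difference.
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]

# Hypothetical weights, input, and baseline for illustration only.
w = [1.5, -2.0, 0.0]
b = -0.5
x = [1.0, 0.5, 3.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, w, b)
# Completeness check: attributions should sum to f(x) - f(baseline).
completeness_gap = abs(sum(attributions) - (model(x, w, b) - model(baseline, w, b)))
```

The completeness check is a useful sanity test in practice: if the attributions do not sum to approximately the difference between the model output at the input and at the baseline, the number of integration steps is probably too small.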
Affiliation(s)
- Lutao Dai
  - Faculty of Business and Economics, The University of Hong Kong, Hong Kong
- Xin Yang
  - China National Clinical Research Center for Neurological Diseases, Center for Healthcare Quality and Research, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, PR China; National Center for Healthcare Quality Management in Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, PR China
- Hao Li
  - China National Clinical Research Center for Neurological Diseases, Center for Big Data Analytics and Artificial Intelligence, Beijing 100070, PR China
- Xingquan Zhao
  - Vascular Neurology, Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, PR China
- Lin Lin
  - Information Management and Data Center, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, PR China
- Yong Jiang
  - China National Clinical Research Center for Neurological Diseases, Center for Big Data Analytics and Artificial Intelligence, Beijing 100070, PR China
- Yongjun Wang
  - China National Clinical Research Center for Neurological Diseases, Center for Healthcare Quality and Research, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, PR China; National Center for Healthcare Quality Management in Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, PR China; Vascular Neurology, Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, PR China; Beijing Key Laboratory of Translational Medicine for Cerebrovascular Disease, Beijing 100070, PR China
- Zixiao Li
  - China National Clinical Research Center for Neurological Diseases, Center for Healthcare Quality and Research, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, PR China; National Center for Healthcare Quality Management in Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, PR China; Vascular Neurology, Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, PR China; Chinese Institute for Brain Research, Beijing 100070, PR China
- Haipeng Shen
  - Faculty of Business and Economics, The University of Hong Kong, Hong Kong
2. Lu K, Tong Y, Yu S, Lin Y, Yang Y, Xu H, Li Y, Yu S. Building a trustworthy AI differential diagnosis application for Crohn's disease and intestinal tuberculosis. BMC Med Inform Decis Mak 2023; 23:160. [PMID: 37582768 PMCID: PMC10426047 DOI: 10.1186/s12911-023-02257-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2022] [Accepted: 08/02/2023] [Indexed: 08/17/2023] [Open access]
Abstract
BACKGROUND Differentiating between Crohn's disease (CD) and intestinal tuberculosis (ITB) with endoscopy is challenging. We aim to make endoscopic diagnosis between CD and ITB more accurate by building a trustworthy AI differential diagnosis application. METHODS Electronic health records (EHRs) of 1271 patients who had undergone colonoscopies at Peking Union Medical College Hospital (PUMCH) and were clinically diagnosed with CD (n = 875) or ITB (n = 396) were used in this study. We built a workflow to make diagnoses from EHRs and mine differential diagnosis features. The workflow involves fine-tuning pretrained language models, distilling them into a light and efficient TextCNN model, interpreting the neural network and selecting differential attribution features, and then manually checking the features and carrying out debiased training. RESULTS The accuracy of the debiased TextCNN on differential diagnosis between CD and ITB was 0.83 (CD F1: 0.87, ITB F1: 0.77), the best among the baselines. On the noisy validation set, its accuracy was 0.70 (CD F1: 0.87, ITB F1: 0.69), which was significantly higher than that of models without debiasing. We also found that the debiased model more easily mines diagnostically significant features. The debiased TextCNN unearthed 39 diagnostic features in the form of phrases, 17 of which were key diagnostic features recognized by the guidelines. CONCLUSION We built a trustworthy AI differential diagnosis application for differentiating between CD and ITB, focusing on accuracy, interpretability, and robustness. The classifiers perform well, and the statistically significant features were in agreement with clinical guidelines.
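The distillation step in the workflow can be sketched with a common soft-target objective. The abstract does not state which distillation loss was used, so the temperature-scaled formulation below, together with the names `distillation_loss`, `temperature`, and `alpha` and the example logits, is an assumption for illustration rather than the authors' method. The student is trained on a weighted sum of the usual cross-entropy on the hard label and the KL divergence between the softened teacher and student distributions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with optional temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and soft-target KL divergence."""
    # Cross-entropy of the student against the ground-truth class index.
    hard = -math.log(softmax(student_logits)[label])
    # KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    # The temperature**2 factor keeps the soft-term gradient scale comparable.
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * kl

# Hypothetical two-class logits (for example, class 0 = CD, class 1 = ITB).
teacher = [2.0, -1.0]
matched = distillation_loss([2.0, -1.0], teacher, label=0)
mismatched = distillation_loss([-1.0, 2.0], teacher, label=0)
```

A student whose logits match the teacher's incurs only the hard-label term, while a student that disagrees with both the teacher and the label is penalized by both terms.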
Affiliation(s)
- Keming Lu
  - Department of Automation, Tsinghua University, Beijing, 100084, China
- Yuanren Tong
  - Department of Gastroenterology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, China
- Si Yu
  - Department of Gastroenterology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, China
- Yucong Lin
  - Center for Statistical Science, Tsinghua University, Beijing, 100084, China
  - Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
- Yingyun Yang
  - Department of Gastroenterology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, China
- Hui Xu
  - Department of Gastroenterology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, China
- Yue Li
  - Department of Gastroenterology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, China
- Sheng Yu
  - Center for Statistical Science, Tsinghua University, Beijing, 100084, China
  - Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
3. Gao S, Rehman J, Dai Y. Assessing comparative importance of DNA sequence and epigenetic modifications on gene expression using a deep convolutional neural network. Comput Struct Biotechnol J 2022; 20:3814-23. [PMID: 35891778 DOI: 10.1016/j.csbj.2022.07.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Revised: 07/05/2022] [Accepted: 07/05/2022] [Indexed: 11/26/2022] [Open access]
Abstract
Gene expression is regulated at both transcriptional and post-transcriptional levels. DNA sequence and epigenetic modifications are key factors that regulate gene transcription. Understanding their complex interactions and their respective contributions to gene expression regulation remains a challenge in biological studies. We have developed iSEGnet, a deep convolutional neural network framework that predicts mRNA abundance using information on DNA sequences as well as epigenetic modifications within genes and their cis-regulatory regions. We demonstrate that our framework outperforms other machine learning models in predicting mRNA abundance using transcriptional and epigenetic profiles from six distinct cell lines/types chosen from ENCODE. Analysis of the learned models also reveals that specific regions around promoters and transcription termination sites are most important for gene expression regulation. Using the method of Integrated Gradients, we identify narrow segments in these regions that are most likely to impact gene expression for a specific epigenetic modification. We further show that these identified segments are enriched in known active regulatory regions by comparing them with transcription factor binding sites obtained via ChIP-seq. Moreover, we demonstrate how iSEGnet can uncover potential transcription factors that have regulatory functions in cancer using two cancer multi-omics datasets.
4. Zhao S, Hamada M. Multi-resBind: a residual network-based multi-label classifier for in vivo RNA binding prediction and preference visualization. BMC Bioinformatics 2021; 22:554. [PMID: 34781902 PMCID: PMC8594109 DOI: 10.1186/s12859-021-04430-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2021] [Accepted: 10/06/2021] [Indexed: 11/10/2022] [Open access]
Abstract
BACKGROUND Protein-RNA interactions play key roles in many processes regulating gene expression. To understand the underlying binding preference, ultraviolet cross-linking and immunoprecipitation (CLIP)-based methods have been used to identify the binding sites for hundreds of RNA-binding proteins (RBPs) in vivo. Using these large-scale experimental data to infer RNA binding preference and predict missing binding sites has become a great challenge. Some existing deep-learning models have demonstrated high prediction accuracy for individual RBPs. However, it remains difficult to avoid significant bias due to the experimental protocol. The DeepRiPe method was recently developed to solve this problem by introducing multi-task or multi-label learning into this field. However, this method has not reached an ideal level of predictive power because of its weak neural network architecture. RESULTS Compared to the DeepRiPe approach, our Multi-resBind method demonstrated substantial improvements on the same large-scale PAR-CLIP dataset, with increases in both the area under the receiver operating characteristic curve and average precision. We conducted extensive experiments to evaluate the impact of various types of input data on the final prediction accuracy. The same approach was used to evaluate the effect of loss functions. Finally, a modified integrated gradient was employed to generate attribution maps. The patterns disentangled from relative contributions according to context offer biological insights into the underlying mechanism of protein-RNA interactions. CONCLUSIONS Here, we propose Multi-resBind as a new multi-label deep-learning approach to infer protein-RNA binding preferences and predict novel interactions. The results clearly demonstrate that Multi-resBind is a promising tool to predict unknown binding sites in vivo and gain biological insights into why the neural network makes a given prediction.
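The two evaluation metrics named in the results can be sketched for the multi-label setting as follows. The abstract does not say how per-label scores were aggregated, so macro-averaging across labels is an assumption here, as are the function names and the tiny example matrices (rows are sequences, columns are hypothetical RBP labels).

```python
def average_precision(labels, scores):
    """Average precision: mean of the precision at each positive, ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_average(metric, label_matrix, score_matrix):
    """Apply a per-label metric to each column and average the results."""
    n_labels = len(label_matrix[0])
    values = []
    for j in range(n_labels):
        column_labels = [row[j] for row in label_matrix]
        column_scores = [row[j] for row in score_matrix]
        values.append(metric(column_labels, column_scores))
    return sum(values) / n_labels

# Hypothetical data: 4 sequences, 2 labels.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.1, 0.8], [0.4, 0.3], [0.6, 0.1]]

macro_ap = macro_average(average_precision, y_true, y_score)
macro_auc = macro_average(auroc, y_true, y_score)
```

The pairwise AUROC is quadratic in the number of examples and is meant only to make the definition explicit; a rank-based computation would be preferable on a large CLIP dataset.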
Affiliation(s)
- Shitao Zhao
  - Waseda Research Institute for Science and Engineering, Waseda University, 3-4-1 Okubo Shinjuku-ku, Tokyo, 169-8555, Japan
- Michiaki Hamada
  - Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 3-4-1 Okubo Shinjuku-ku, Tokyo, 169-8555, Japan; Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology, 3-4-1 Okubo Shinjuku-ku, Tokyo, 169-8555, Japan; Graduate School of Medicine, Nippon Medical School, 1-1-5 Sendagi, Bunkyo-ku, Tokyo, 113-8602, Japan
5. Dutta A, Dalmia A, R A, Singh KK, Anand A. Using the Chou's 5-steps rule to predict splice junctions with interpretable bidirectional long short-term memory networks. Comput Biol Med 2020; 116:103558. [PMID: 31783254 DOI: 10.1016/j.compbiomed.2019.103558] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 09/11/2019] [Revised: 11/17/2019] [Accepted: 11/18/2019] [Indexed: 11/21/2022]
Abstract
Neural models have obtained state-of-the-art performance on several genome sequence-based prediction tasks. Such models take only nucleotide sequences as input and learn relevant features on their own. However, extracting interpretable motifs from the model remains a challenge. This work explores the ability of various existing visualization techniques to infer relevant sequence information learnt by a recurrent neural network (RNN) on the task of splice junction identification. The visualization techniques have been adapted to suit genome sequences as input. The visualizations inspect genomic regions at the level of a single nucleotide as well as a span of consecutive nucleotides. This inspection is based on modifying either the input sequences (perturbation-based) or the embedding space (back-propagation-based). We infer features pertaining to both canonical and non-canonical splicing from a single neural model. Results indicate that the visualization techniques perform comparably for branchpoint detection. However, for canonical donor and acceptor junction motifs, perturbation-based visualizations perform better than back-propagation-based visualizations, and vice versa for non-canonical motifs. The source code of our stand-alone SpliceVisuL tool is available at https://github.com/aaiitggrp/SpliceVisuL.
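The perturbation-based family of visualizations described above can be illustrated with a minimal single-nucleotide substitution sketch. It is not taken from SpliceVisuL: the scoring function `toy_donor_score` is a hypothetical stand-in for a trained network that simply rewards the canonical GT dinucleotide at a donor junction, and the example sequence and junction index are assumptions.

```python
def toy_donor_score(seq, junction):
    """Hypothetical stand-in for a trained model: rewards GT at the junction."""
    score = 0.1
    if seq[junction] == "G":
        score += 0.4
    if seq[junction + 1] == "T":
        score += 0.4
    return score

def perturbation_importance(seq, score_fn):
    """Mean drop in score when each position is replaced by every other nucleotide."""
    base = score_fn(seq)
    importance = []
    for i, original in enumerate(seq):
        drops = []
        for alt in "ACGT":
            if alt == original:
                continue
            # Substitute one nucleotide and re-score the mutated sequence.
            mutated = seq[:i] + alt + seq[i + 1:]
            drops.append(base - score_fn(mutated))
        importance.append(sum(drops) / len(drops))
    return importance

# Hypothetical sequence with GT at positions 3 and 4 (0-based).
sequence = "CAGGTAAGT"
junction = 3
importance = perturbation_importance(sequence, lambda s: toy_donor_score(s, junction))
top_two = sorted(range(len(sequence)), key=lambda i: -importance[i])[:2]
```

With a real model, `score_fn` would wrap the network's predicted junction probability, and the same loop could be extended from single nucleotides to spans of consecutive positions.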