1
Zhu C, Li H, Song Z, Jiang M, Song L, Li L, Wang X, Zheng Q. Jointly constrained group sparse connectivity representation improves early diagnosis of Alzheimer's disease on routinely acquired T1-weighted imaging-based brain network. Health Inf Sci Syst 2024; 12:19. [PMID: 38464465 PMCID: PMC10917732 DOI: 10.1007/s13755-023-00269-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Accepted: 12/27/2023] [Indexed: 03/12/2024] Open
Abstract
Background: Radiomics-based morphological brain networks (radMBN) constructed from routinely acquired structural MRI (sMRI) data have gained attention in Alzheimer's disease (AD). However, the radMBN characterizes AD only to a limited extent, because sMRI captures anatomical changes and is not a direct measure of neuronal pathology or brain activity. Purpose: To establish a group sparse representation of the radMBN under a joint constraint of group-level white matter fiber connectivity and individual-level sMRI regional similarity (JCGS-radMBN). Methods: Two publicly available datasets were adopted: 120 subjects from ADNI with both T1-weighted imaging (T1WI) and diffusion MRI (dMRI) for JCGS-radMBN construction, and 818 subjects from ADNI and 200 subjects solely with T1WI from AIBL for validation in early AD diagnosis. Specifically, the JCGS-radMBN was constructed by jointly estimating non-zero connections among subjects, with the regularization term constrained by group-level white matter fiber connectivity and individual-level sMRI regional similarity. Then, a triplet graph convolutional network was adopted for early AD diagnosis. The discriminative brain connections were identified using a two-sample t-test, and the neurobiological interpretation was validated by correlating the discriminative brain connections with cognitive scores. Results: The JCGS-radMBN exhibited superior classification performance over five brain network construction methods. For the typical NC vs. AD classification, the JCGS-radMBN improved accuracy by 1-30% over the alternatives on ADNI and AIBL. The discriminative brain connections were strongly connected to the hippocampus, parahippocampal gyrus, and basal ganglia, and correlated significantly with MMSE scores. Conclusion: The proposed JCGS-radMBN improved the AD characterization of brain networks built from routinely acquired sMRI.
Supplementary Information The online version of this article (10.1007/s13755-023-00269-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Chuanzhen Zhu
  - School of Computer and Control Engineering, Yantai University, No 30, Qingquan Road, Laishan District, Yantai, 264005 Shandong China
- Honglun Li
  - Departments of Medical Oncology and Radiology, Affiliated Yantai Yuhuangding Hospital of Qingdao University Medical College, Yantai, 264099 China
- Zhiwei Song
  - School of Computer and Control Engineering, Yantai University, No 30, Qingquan Road, Laishan District, Yantai, 264005 Shandong China
- Minbo Jiang
  - School of Computer and Control Engineering, Yantai University, No 30, Qingquan Road, Laishan District, Yantai, 264005 Shandong China
- Limei Song
  - School of Medical Imaging, Weifang Medical University, Weifang, 261000 China
- Lin Li
  - Yantaishan Hospital Affiliated to Binzhou Medical University, Yantai, 264003 China
- Xuan Wang
  - School of Computer and Control Engineering, Yantai University, No 30, Qingquan Road, Laishan District, Yantai, 264005 Shandong China
- Qiang Zheng
  - School of Computer and Control Engineering, Yantai University, No 30, Qingquan Road, Laishan District, Yantai, 264005 Shandong China
2
He Y, Ning Z, Zhu X, Zhang Y, Liu C, Jiang S, Yuan Z, Zhang H. Plant lncRNA-miRNA Interaction Prediction Based on Counterfactual Heterogeneous Graph Attention Network. Interdiscip Sci 2024:10.1007/s12539-024-00652-9. [PMID: 39382820 DOI: 10.1007/s12539-024-00652-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 08/10/2024] [Accepted: 08/12/2024] [Indexed: 10/10/2024]
Abstract
Identifying interactions between long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) provides a new perspective for understanding regulatory relationships in plant life processes. Recently, computational methods based on graph neural networks (GNNs) have been widely employed to predict lncRNA-miRNA interactions (LMIs), compensating for the limitations of biological experiments. However, the low semantic content and the noise of the graphs limit the performance of existing GNN-based methods. In this paper, we develop a novel Counterfactual Heterogeneous Graph Attention Network (CFHAN) to improve robustness against noise and the prediction of plant LMIs. First, we construct a lncRNA-miRNA (L-M) heterogeneous network based on real-world data. Second, CFHAN uses node-level attention, semantic-level attention, and counterfactual links to enhance the learning of node embeddings. Finally, these embeddings are used as inputs to a multilayer perceptron (MLP) that predicts the interactions between lncRNAs and miRNAs. Evaluated on a benchmark dataset of plant LMIs, CFHAN outperforms five state-of-the-art methods and achieves an average AUC of 0.9953 and an average ACC of 0.9733. This demonstrates CFHAN's ability to predict plant LMIs; the method also exhibits promising cross-species prediction ability, offering valuable insights for experimental LMI research.
Affiliation(s)
- Yu He
  - College of Information and Intelligence, Hunan Agricultural University, Changsha, 410128, China
- ZiLan Ning
  - College of Information and Intelligence, Hunan Agricultural University, Changsha, 410128, China
- XingHui Zhu
  - College of Information and Intelligence, Hunan Agricultural University, Changsha, 410128, China
- YinQiong Zhang
  - College of Information and Intelligence, Hunan Agricultural University, Changsha, 410128, China
- ChunHai Liu
  - Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, College of Plant Protection, Hunan Agricultural University, Changsha, 410128, China
- SiWei Jiang
  - College of Information and Intelligence, Hunan Agricultural University, Changsha, 410128, China
- ZheMing Yuan
  - Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, College of Plant Protection, Hunan Agricultural University, Changsha, 410128, China
- HongYan Zhang
  - College of Information and Intelligence, Hunan Agricultural University, Changsha, 410128, China
3
Liu Y, Yoshizawa AC, Ling Y, Okuda S. Insights into predicting small molecule retention times in liquid chromatography using deep learning. J Cheminform 2024; 16:113. [PMID: 39375739 PMCID: PMC11460055 DOI: 10.1186/s13321-024-00905-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 09/13/2024] [Indexed: 10/09/2024] Open
Abstract
In untargeted metabolomics, structures of small molecules are annotated using liquid chromatography-mass spectrometry by leveraging information from the molecular retention time (RT) in the chromatogram and m/z (formerly called "mass-to-charge ratio") in the mass spectrum. However, correct identification of metabolites is challenging due to the vast array of small molecules. Various in silico tools for mass spectrometry peak alignment and compound prediction have therefore been developed, but the list of candidate compounds remains extensive. Accurate RT prediction is important to exclude false candidates and facilitate metabolite annotation. Recent advancements in artificial intelligence (AI) have led to significant breakthroughs in the use of deep learning models in various fields. The release of a large RT dataset has mitigated the bottlenecks limiting the application of deep learning models, thereby improving their application in RT prediction tasks. This review lists the databases that can be used to expand training datasets and addresses the issue of inconsistent molecular representations across datasets. It also discusses the application of AI technology for RT prediction, particularly in the 5 years following the release of the METLIN small molecule RT dataset. This review provides a comprehensive overview of the AI applications used for RT prediction, highlighting the progress and remaining challenges. SCIENTIFIC CONTRIBUTION: This article focuses on the advancements in small molecule retention time prediction in computational metabolomics over the past five years, with a particular emphasis on the application of AI technologies in this field. It reviews the publicly available datasets for small molecule retention time, the molecular representation methods, and the AI algorithms applied in recent studies. Furthermore, it discusses the effectiveness of these models in assisting with the annotation of small molecule structures and the challenges that must be addressed to achieve practical applications.
Affiliation(s)
- Yuting Liu
  - Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
- Akiyasu C Yoshizawa
  - Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
- Yiwei Ling
  - Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
- Shujiro Okuda
  - Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
4
Banerjee A, Kar S, Roy K, Patlewicz G, Charest N, Benfenati E, Cronin MTD. Molecular similarity in chemical informatics and predictive toxicity modeling: from quantitative read-across (q-RA) to quantitative read-across structure-activity relationship (q-RASAR) with the application of machine learning. Crit Rev Toxicol 2024; 54:659-684. [PMID: 39225123 DOI: 10.1080/10408444.2024.2386260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 07/25/2024] [Accepted: 07/25/2024] [Indexed: 09/04/2024]
Abstract
This article aims to provide a comprehensive critical, yet readable, review of general interest to the chemistry community on molecular similarity as applied to chemical informatics and predictive modeling with a special focus on read-across (RA) and read-across structure-activity relationships (RASAR). Molecular similarity-based computational tools, such as quantitative structure-activity relationships (QSARs) and RA, are routinely used to fill the data gaps for a wide range of properties including toxicity endpoints for regulatory purposes. This review will explore the background of RA starting from how structural information has been used through to how other similarity contexts such as physicochemical, absorption, distribution, metabolism, and elimination (ADME) properties, and biological aspects are being characterized. More recent developments of RA's integration with QSAR have resulted in the emergence of novel models such as ToxRead, generalized read-across (GenRA), and quantitative RASAR (q-RASAR). Conventional QSAR techniques have been excluded from this review except where necessary for context.
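To make the idea of similarity-based data-gap filling concrete, the following is a minimal sketch of a quantitative read-across prediction: the endpoint of a query compound is estimated as the similarity-weighted mean of its most similar source compounds. It is a generic illustration, not the q-RA or q-RASAR procedure reviewed in the article; the function names, the fingerprints (sets of "on" bit indices), the endpoint values, and the choice of Tanimoto similarity with k = 2 neighbours are all assumptions made for the example.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def read_across(query: set, sources: list, k: int = 2) -> float:
    """Similarity-weighted mean of the k most similar source compounds.

    `sources` is a list of (fingerprint_bits, measured_value) pairs.
    """
    scored = sorted(
        ((tanimoto(query, fp), value) for fp, value in sources),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        raise ValueError("no source compound shares any feature with the query")
    return sum(sim * value for sim, value in scored) / total


# Hypothetical fingerprints and endpoint values, for illustration only.
sources = [({1, 2, 3, 4}, 2.0), ({1, 2, 5}, 4.0), ({7, 8, 9}, 10.0)]
prediction = read_across({1, 2, 3}, sources, k=2)
```

Here the two nearest neighbours have similarities 0.75 and 0.5, so the dissimilar third compound does not influence the estimate.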
Affiliation(s)
- Arkaprava Banerjee
  - Department of Pharmaceutical Technology, Drug Theoretics and Cheminformatics (DTC) Laboratory, Jadavpur University, Kolkata, India
- Supratik Kar
  - Department of Chemistry and Physics, Chemometrics & Molecular Modeling Laboratory, Kean University, Union, NJ, USA
- Kunal Roy
  - Department of Pharmaceutical Technology, Drug Theoretics and Cheminformatics (DTC) Laboratory, Jadavpur University, Kolkata, India
- Grace Patlewicz
  - Center for Computational Toxicology and Exposure, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Nathaniel Charest
  - Center for Computational Toxicology and Exposure, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Emilio Benfenati
  - Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Mark T D Cronin
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
5
Hausleitner C, Mueller H, Holzinger A, Pfeifer B. Collaborative weighting in federated graph neural networks for disease classification with the human-in-the-loop. Sci Rep 2024; 14:21839. [PMID: 39294334 PMCID: PMC11410954 DOI: 10.1038/s41598-024-72748-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Accepted: 09/10/2024] [Indexed: 09/20/2024] Open
Abstract
The authors introduce a novel framework that integrates federated learning with Graph Neural Networks (GNNs) to classify diseases, incorporating Human-in-the-Loop methodologies. This advanced framework innovatively employs collaborative voting mechanisms on subgraphs within a Protein-Protein Interaction (PPI) network, situated in a federated ensemble-based deep learning context. This methodological approach marks a significant stride in the development of explainable and privacy-aware Artificial Intelligence, significantly contributing to the progression of personalized digital medicine in a responsible and transparent manner.
Affiliation(s)
- Christian Hausleitner
  - Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, 8036, Graz, Austria
- Heimo Mueller
  - Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, 8036, Graz, Austria
- Andreas Holzinger
  - Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, 8036, Graz, Austria
  - Human-Centered AI Lab, Institute of Forest Engineering, Department of Forest and Soil Sciences, University of Natural Resources and Life Sciences Vienna, 1190, Vienna, Austria
  - Alberta Machine Intelligence Institute, Edmonton, T6G 2R3, Canada
- Bastian Pfeifer
  - Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, 8036, Graz, Austria
6
Zeng Z, Yin B, Wang S, Liu J, Yang C, Yao H, Sun X, Sun M, Xie G, Liu Z. ChatMol: interactive molecular discovery with natural language. Bioinformatics 2024; 40:btae534. [PMID: 39222004 DOI: 10.1093/bioinformatics/btae534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 08/24/2024] [Accepted: 08/29/2024] [Indexed: 09/04/2024]
Abstract
MOTIVATION: Natural language is poised to become a key medium for human-machine interactions in the era of large language models. In the field of biochemistry, tasks such as property prediction and molecule mining are critically important yet technically challenging. Bridging molecular expressions in natural language and chemical language can significantly enhance the interpretability and ease of these tasks. Moreover, it can integrate chemical knowledge from various sources, leading to a deeper understanding of molecules. RESULTS: Recognizing these advantages, we introduce the concept of conversational molecular design, a novel task that uses natural language to describe and edit target molecules. To better accomplish this task, we develop ChatMol, a knowledgeable and versatile generative pretrained model. This model is enhanced by incorporating experimental property information, molecular spatial knowledge, and the associations between natural and chemical languages. Several typical solutions, including large language models (e.g. ChatGPT), are evaluated, demonstrating both the difficulty of conversational molecular design and the effectiveness of our knowledge enhancement approach. Case observations and analysis offer insights and directions for further exploration of natural-language interaction in molecular discovery. AVAILABILITY AND IMPLEMENTATION: Code and data are available at https://github.com/Ellenzzn/ChatMol/tree/main.
Affiliation(s)
- Zheni Zeng
  - Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Bangchen Yin
  - Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Jiarui Liu
  - PingAn Technology, Beijing 100027, China
- Cheng Yang
  - School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Maosong Sun
  - Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Zhiyuan Liu
  - Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
7
Yi X, Liu S, Wu Y, McCloskey D, Meng Z. BPP: a platform for automatic biochemical pathway prediction. Brief Bioinform 2024; 25:bbae355. [PMID: 39082653 PMCID: PMC11289738 DOI: 10.1093/bib/bbae355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 05/16/2024] [Accepted: 07/09/2024] [Indexed: 08/03/2024] Open
Abstract
A biochemical pathway consists of a series of interconnected biochemical reactions that accomplish specific life activities. The participating reactants and resultant products of a pathway, including gene fragments, proteins, and small molecules, coalesce to form a complex reaction network. Biochemical pathways play a critical role in the biochemical domain as they can reveal the flow of biochemical reactions in living organisms, making them essential for understanding life processes. Existing studies of biochemical pathway networks rely mainly on experimentation and pathway database analysis, both of which are costly. Inspired by the success of representation learning approaches in biomedicine, we develop the biochemical pathway prediction (BPP) platform, an automatic platform for identifying potential links or attributes within biochemical pathway networks. Our BPP platform incorporates a variety of representation learning models, including the latest hypergraph neural network technology, to model biochemical reactions in pathways. In particular, BPP contains the latest biochemical pathway-based datasets and enables the prediction of potential participants or products of biochemical reactions in biochemical pathways. Additionally, BPP is equipped with a SHAP explainer to explain the predicted results and to calculate the contribution of each participating element. We conduct extensive experiments on our collected biochemical pathway dataset to benchmark the effectiveness of all models available on BPP. Furthermore, our detailed case studies based on the chronological pattern of our dataset demonstrate the effectiveness of our platform. Our BPP web portal, source code and datasets are freely accessible at https://github.com/Glasgow-AI4BioMed/BPP.
Affiliation(s)
- Xinhao Yi
  - School of Computing Science, University of Glasgow, 18 Lilybank Gardens, Glasgow G12 8RZ, United Kingdom
- Siwei Liu
  - Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence, Building 1B, Masdar City, Abu Dhabi 000000, United Arab Emirates
- Yu Wu
  - School of Mathematical Sciences, Fudan University, 220 Handan Rd, Yangpu District, Shanghai 200438, China
- Douglas McCloskey
  - Artificial Intelligence, BioMed X Institute, Im Neuenheimer Feld 515, Heidelberg 69120, Germany
- Zaiqiao Meng
  - School of Computing Science, University of Glasgow, 18 Lilybank Gardens, Glasgow G12 8RZ, United Kingdom
8
Hu X, Sun Z, Nian Y, Wang Y, Dang Y, Li F, Feng J, Yu E, Tao C. Self-Explainable Graph Neural Network for Alzheimer Disease and Related Dementias Risk Prediction: Algorithm Development and Validation Study. JMIR Aging 2024; 7:e54748. [PMID: 38976869 PMCID: PMC11263893 DOI: 10.2196/54748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 03/31/2024] [Accepted: 06/02/2024] [Indexed: 07/10/2024] Open
Abstract
BACKGROUND: Alzheimer disease and related dementias (ADRD) rank as the sixth leading cause of death in the United States, underlining the importance of accurate ADRD risk prediction. While recent advancements in ADRD risk prediction have primarily relied on imaging analysis, not all patients undergo medical imaging before an ADRD diagnosis. Merging machine learning with claims data can reveal additional risk factors and uncover interconnections among diverse medical codes. OBJECTIVE: The study aims to use graph neural networks (GNNs) with claims data for ADRD risk prediction. Addressing the lack of human-interpretable reasons behind these predictions, we introduce an innovative, self-explainable method to evaluate relationship importance and its influence on ADRD risk prediction. METHODS: We used a variationally regularized encoder-decoder GNN (variational GNN [VGNN]) integrated with our proposed relation importance method for estimating ADRD likelihood. This self-explainable method can provide a feature-importance explanation in the context of ADRD risk prediction, leveraging relational information within a graph. Three scenarios with 1-year, 2-year, and 3-year prediction windows were created to assess the model's efficiency. Random forest (RF) and light gradient boost machine (LGBM) were used as baselines. By using this method, we further clarify the key relationships for ADRD risk prediction. RESULTS: In scenario 1, the VGNN model showed area under the receiver operating characteristic (AUROC) scores of 0.7272 and 0.7480 for the small subset and the matched cohort data set. It outperformed RF and LGBM by 10.6% and 9.1%, respectively, on average. In scenario 2, it achieved AUROC scores of 0.7125 and 0.7281, surpassing the other models by 10.5% and 8.9%, respectively. Similarly, in scenario 3, AUROC scores of 0.7001 and 0.7187 were obtained, exceeding the baseline models by 10.1% and 8.5%, respectively. These results demonstrate the superiority of the graph-based approach over the tree-based models (RF and LGBM) in predicting ADRD. Furthermore, the integration of the VGNN model and our relation importance interpretation could provide valuable insight into paired factors that may contribute to or delay ADRD progression. CONCLUSIONS: Using our innovative self-explainable method with claims data enhances ADRD risk prediction and provides insights into the impact of interconnected medical code relationships. This methodology not only enables ADRD risk modeling but also shows potential for other image analysis predictions using claims data.
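For readers less familiar with the AUROC metric reported above, the sketch below computes it directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive case receives the higher score. The function name, labels, and scores are hypothetical and are not taken from the study; this illustrates the metric only, not the authors' evaluation pipeline.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores for six patients (1 = positive outcome).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.2, 0.8]
value = auroc(labels, scores)  # 8 of the 9 pairs are ordered correctly
```

The quadratic pairwise loop is chosen for clarity; a rank-based formulation gives the same value more efficiently on large cohorts.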
Affiliation(s)
- Xinyue Hu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Zenan Sun
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Yi Nian
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Yichen Wang
  - Division of Hospital Medicine at Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA, United States
- Yifang Dang
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Fang Li
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Jingna Feng
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Evan Yu
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Cui Tao
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
9
Chen T, Xu ZQJ. Efficient and Flexible Method for Reducing Moderate-Size Deep Neural Networks with Condensation. Entropy (Basel) 2024; 26:567. [PMID: 39056928 PMCID: PMC11276590 DOI: 10.3390/e26070567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2024] [Revised: 06/20/2024] [Accepted: 06/24/2024] [Indexed: 07/28/2024]
Abstract
Neural networks have been extensively applied to a variety of tasks, achieving astounding results. Applying neural networks in the scientific field is an important research direction that is gaining increasing attention. In scientific applications, neural networks are generally of moderate size, mainly to ensure the speed of inference during application. Additionally, comparing neural networks to traditional algorithms in scientific applications is inevitable. These applications often require rapid computations, making the reduction of neural network size increasingly important. Existing work has found that the powerful capabilities of neural networks are primarily due to their nonlinearity. Theoretical work has discovered that under strong nonlinearity, neurons in the same layer tend to behave similarly, a phenomenon known as condensation. Condensation offers an opportunity to reduce a neural network to a smaller subnetwork with similar performance. In this article, we propose a condensation reduction method to verify the feasibility of this idea in practical problems, thereby validating existing theories. Our reduction method can currently be applied to both fully connected networks and convolutional networks, achieving positive results. In complex combustion acceleration tasks, we reduced the size of the neural network to 41.7% of its original scale while maintaining prediction accuracy. In the CIFAR10 image classification task, we reduced the network size to 11.5% of the original scale, still maintaining a satisfactory validation accuracy. Our method can be applied to most trained neural networks, reducing computational pressure and improving inference speed.
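The intuition behind reducing a condensed network can be sketched for the simplest case, a two-layer ReLU network with scalar output. If two hidden neurons have parallel input weight vectors (bias included), they can be merged exactly, because relu(c·z) = c·relu(z) for c > 0. The abstract does not specify the authors' merging rule, so the cosine-similarity threshold, the greedy grouping, and the function names below are assumptions; the merge is exact only for ReLU and exactly parallel weights, and approximate otherwise.

```python
import math


def forward(neurons, x):
    """Two-layer ReLU net: sum_j a_j * relu(w_j . [x, 1])."""
    xb = list(x) + [1.0]  # append 1 so the bias is the last weight
    return sum(a * max(0.0, sum(wi * xi for wi, xi in zip(w, xb)))
               for w, a in neurons)


def condense(neurons, threshold=0.999):
    """Greedily merge hidden neurons whose input weights are nearly parallel."""
    groups = []  # each entry: [unit_direction, merged_output_weight]
    for w, a in neurons:
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0.0:
            continue  # all-zero weights and bias give relu(0) = 0, so drop it
        unit = [wi / norm for wi in w]
        for group in groups:
            cosine = sum(u * v for u, v in zip(unit, group[0]))
            if cosine >= threshold:
                # relu(c*z) = c*relu(z) for c > 0: fold the norm into the
                # output weight of the group's representative direction
                group[1] += a * norm
                break
        else:
            groups.append([unit, a * norm])
    return [(unit, a) for unit, a in groups]


# Hypothetical network: four hidden neurons, two distinct input directions.
net = [([1.0, 2.0, 0.5], 0.3), ([2.0, 4.0, 1.0], -0.1),
       ([-1.0, 1.0, 0.0], 0.7), ([-3.0, 3.0, 0.0], 0.2)]
small = condense(net)
```

In this toy case the four neurons collapse to two without changing the network's output; in a trained network, neurons are only approximately parallel, so the reduced network would normally be checked against validation data.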
Affiliation(s)
- Zhi-Qin John Xu
  - School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University, Shanghai 200240, China
10
Li H, Jiang L, Yang K, Shang S, Li M, Lv Z. iNP_ESM: Neuropeptide Identification Based on Evolutionary Scale Modeling and Unified Representation Embedding Features. Int J Mol Sci 2024; 25:7049. [PMID: 39000158 PMCID: PMC11240975 DOI: 10.3390/ijms25137049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2024] [Revised: 06/17/2024] [Accepted: 06/25/2024] [Indexed: 07/16/2024] Open
Abstract
Neuropeptides are biomolecules with crucial physiological functions. Accurate identification of neuropeptides is essential for understanding nervous system regulatory mechanisms. However, traditional analysis methods are expensive and laborious, and the development of effective machine learning models continues to be a subject of current research. Hence, in this research, we constructed an SVM-based machine learning neuropeptide predictor, iNP_ESM, by integrating protein language models Evolutionary Scale Modeling (ESM) and Unified Representation (UniRep) for the first time. Our model utilized feature fusion and feature selection strategies to improve prediction accuracy during optimization. In addition, we validated the effectiveness of the optimization strategy with UMAP (Uniform Manifold Approximation and Projection) visualization. iNP_ESM outperforms existing models on a variety of machine learning evaluation metrics, with an accuracy of up to 0.937 in cross-validation and 0.928 in independent testing, demonstrating optimal neuropeptide recognition capabilities. We anticipate improved neuropeptide data in the future, and we believe that the iNP_ESM model will have broader applications in the research and clinical treatment of neurological diseases.
Affiliation(s)
- Honghao Li
  - College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
- Liangzhen Jiang
  - College of Food and Biological Engineering, Chengdu University, Chengdu 610106, China
  - Country Key Laboratory of Coarse Cereal Processing, Ministry of Agriculture and Rural Affairs, Chengdu 610106, China
- Kaixiang Yang
  - College of Software Engineering, Sichuan University, Chengdu 610041, China
- Shulin Shang
  - College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
- Mingxin Li
  - College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
- Zhibin Lv
  - College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
11
Rashid PQ, Türker İ. Lung Disease Detection Using U-Net Feature Extractor Cascaded by Graph Convolutional Network. Diagnostics (Basel) 2024; 14:1313. [PMID: 38928728 PMCID: PMC11202625 DOI: 10.3390/diagnostics14121313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2024] [Revised: 06/17/2024] [Accepted: 06/18/2024] [Indexed: 06/28/2024] Open
Abstract
Computed tomography (CT) scans have recently emerged as a major technique for the fast diagnosis of lung diseases via image classification techniques. In this study, we propose a method for the diagnosis of COVID-19 disease with improved accuracy by utilizing graph convolutional networks (GCN) at various layer formations and kernel sizes to extract features from CT scan images. We apply a U-Net model to aid in segmentation and feature extraction. In contrast with previous research retrieving deep features from convolutional filters and pooling layers, which fail to fully consider the spatial connectivity of the nodes, we employ GCNs for classification and prediction to capture spatial connectivity patterns, which provides a significant association benefit. We handle the extracted deep features to form an adjacency matrix that contains a graph structure and pass it to a GCN along with the original image graph and the largest kernel graph. We combine these graphs to form one block of the graph input and then pass it through a GCN with an additional dropout layer to avoid overfitting. Our findings show that the suggested framework, called the feature-extracted graph convolutional network (FGCN), performs better in identifying lung diseases compared to recently proposed deep learning architectures that are not based on graph representations. The proposed model also outperforms a variety of transfer learning models commonly used for medical diagnosis tasks, highlighting the abstraction potential of the graph representation over traditional methods.
Affiliation(s)
| | - İlker Türker
- Department of Computer Engineering, Karabuk University, 78050 Karabuk, Turkey
|
12
|
Zhang W, Zhang P, Sun W, Xu J, Liao L, Cao Y, Han Y. Improving plant miRNA-target prediction with self-supervised k-mer embedding and spectral graph convolutional neural network. PeerJ 2024; 12:e17396. [PMID: 38799058 PMCID: PMC11122044 DOI: 10.7717/peerj.17396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 04/25/2024] [Indexed: 05/29/2024] Open
Abstract
Deciphering the targets of microRNAs (miRNAs) in plants is crucial for comprehending their function and the variation in phenotype that they cause. Owing to the highly cell-specific nature of miRNA regulation, recent computational approaches usually utilize expression data to identify the most physiologically relevant targets. Although these methods are effective, they typically require a large sample size and high-depth sequencing to detect potential miRNA-target pairs, thereby limiting their applicability in improving plant breeding. In this study, we propose a novel miRNA-target prediction framework named kmerPMTF (k-mer-based prediction framework for plant miRNA-target). Our framework effectively extracts the latent semantic embeddings of sequences by utilizing k-mer splitting and a deep self-supervised neural network. We construct multiple similarity networks based on k-mer embeddings and employ graph convolutional networks to derive deep representations of miRNAs and targets and calculate the probabilities of potential associations. We evaluated the performance of kmerPMTF on four typical plant datasets: Arabidopsis thaliana, Oryza sativa, Solanum lycopersicum, and Prunus persica. The results demonstrate its ability to achieve AUPRC values of 84.9%, 91.0%, 80.1%, and 82.1% in 5-fold cross-validation, respectively. Compared with several state-of-the-art existing methods, our framework achieves better performance on threshold-independent evaluation metrics. Overall, our study provides an efficient and simplified methodology for identifying plant miRNA-target associations, which will contribute to a deeper comprehension of miRNA regulatory mechanisms in plants.
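The k-mer splitting mentioned in this abstract can be illustrated with a short Python sketch. It assumes overlapping k-mers taken with a stride of one, which is a common convention but is not confirmed by the abstract; the function name `split_kmers` and the toy sequence are hypothetical and are not taken from kmerPMTF or its datasets.

```python
from typing import List


def split_kmers(sequence: str, k: int) -> List[str]:
    """Return all overlapping k-mers of `sequence`, moving one base at a time."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical toy RNA fragment used only to show the tokenization.
tokens = split_kmers("UGAGGUAG", 3)
print(tokens)
```

Each k-mer then plays the role of a "word", so that a sequence becomes a sentence-like list of tokens that an embedding model can consume.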
Affiliation(s)
- Weihan Zhang
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design of Chinese Academy of Sciences, Wuhan, Hubei Province, China
- Sino-African Joint Research Center, Chinese Academy of Sciences, Wuhan, Hubei Province, China
| | - Ping Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, Hubei Province, China
| | - Weicheng Sun
- College of Informatics, Huazhong Agricultural University, Wuhan, Hubei Province, China
| | - Jinsheng Xu
- College of Informatics, Huazhong Agricultural University, Wuhan, Hubei Province, China
| | - Liao Liao
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design of Chinese Academy of Sciences, Wuhan, Hubei Province, China
- Sino-African Joint Research Center, Chinese Academy of Sciences, Wuhan, Hubei Province, China
| | - Yunpeng Cao
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design of Chinese Academy of Sciences, Wuhan, Hubei Province, China
- Sino-African Joint Research Center, Chinese Academy of Sciences, Wuhan, Hubei Province, China
| | - Yuepeng Han
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design of Chinese Academy of Sciences, Wuhan, Hubei Province, China
- Sino-African Joint Research Center, Chinese Academy of Sciences, Wuhan, Hubei Province, China
|
13
|
Middleton L, Melas I, Vasavda C, Raies A, Rozemberczki B, Dhindsa RS, Dhindsa JS, Weido B, Wang Q, Harper AR, Edwards G, Petrovski S, Vitsios D. Phenome-wide identification of therapeutic genetic targets, leveraging knowledge graphs, graph neural networks, and UK Biobank data. SCIENCE ADVANCES 2024; 10:eadj1424. [PMID: 38718126 PMCID: PMC11078195 DOI: 10.1126/sciadv.adj1424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 04/04/2024] [Indexed: 05/12/2024]
Abstract
The ongoing expansion of human genomic datasets propels therapeutic target identification; however, extracting gene-disease associations from gene annotations remains challenging. Here, we introduce Mantis-ML 2.0, a framework integrating AstraZeneca's Biological Insights Knowledge Graph and numerous tabular datasets, to assess gene-disease probabilities throughout the phenome. We use graph neural networks, capturing the graph's holistic structure, and train them on hundreds of balanced datasets via a robust semi-supervised learning framework to provide gene-disease probabilities across the human exome. Mantis-ML 2.0 incorporates natural language processing to automate disease-relevant feature selection for thousands of diseases. The enhanced models demonstrate a 6.9% average classification power boost, achieving a median receiver operating characteristic (ROC) area under curve (AUC) score of 0.90 across 5220 diseases from Human Phenotype Ontology, OpenTargets, and Genomics England. Notably, Mantis-ML 2.0 prioritizes associations from an independent UK Biobank phenome-wide association study (PheWAS), providing a stronger form of triaging and mitigating against underpowered PheWAS associations. Results are exposed through an interactive web resource.
Affiliation(s)
- Lawrence Middleton
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Ioannis Melas
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Chirag Vasavda
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Waltham, MA 02451, USA
| | - Arwa Raies
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Benedek Rozemberczki
- Biological Insights Knowledge Graph (BIKG), Research D&A, R&D IT, AstraZeneca, Cambridge, UK
| | - Ryan S. Dhindsa
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Waltham, MA 02451, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, TX 77030, USA
| | - Justin S. Dhindsa
- Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA
| | - Blake Weido
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Quanli Wang
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Waltham, MA 02451, USA
| | - Andrew R. Harper
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Gavin Edwards
- Biological Insights Knowledge Graph (BIKG), Research D&A, R&D IT, AstraZeneca, Cambridge, UK
| | - Slavé Petrovski
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
- Department of Medicine, University of Melbourne, Austin Health, Melbourne, Victoria, Australia
| | - Dimitrios Vitsios
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
|
14
|
Lee Y, Xu Y, Gao P, Chen J. TENET: Triple-enhancement based graph neural network for cell-cell interaction network reconstruction from spatial transcriptomics. J Mol Biol 2024; 436:168543. [PMID: 38508302 DOI: 10.1016/j.jmb.2024.168543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 03/03/2024] [Accepted: 03/13/2024] [Indexed: 03/22/2024]
Abstract
Cellular communication relies on the intricate interplay of signaling molecules, forming the Cell-cell Interaction network (CCI) that coordinates tissue behavior. Researchers have shown the capability of shallow neural networks in reconstructing CCI, given molecules' abundance in the Spatial Transcriptomics (ST) data. However, shallow networks are susceptible to sparse connections in the CCI and to excessive noise, which significantly degrades the accuracy of CCI reconstruction. To reconstruct a more comprehensive and accurate CCI, we propose a novel method named Triple-Enhancement based Graph Neural Network (TENET). In TENET, three progressive enhancement mechanisms build upon each other, creating a cumulative effect. This approach can ensure the ability to capture valuable features in limited data and amplify the noise signal to facilitate the denoising effect. Additionally, the whole architecture guides the decoding reconstruction phase with integrated knowledge, which leverages the accumulated insights from each stage of enhancement to ensure a refined and comprehensive CCI reconstruction. The presented TENET has been implemented and tested on both real and synthetic ST datasets. On average, the CCI reconstruction using TENET achieves a 9.61% improvement in Average Precision (AP) and a 7.32% improvement in Area Under the Receiver Operating Characteristic (AUROC) compared to the existing state-of-the-art (SOTA) method. The source code and data are available at https://github.com/Yujian-Lee/TENET.
Affiliation(s)
- Yujian Lee
- Guangdong Provincial Key Laboratory IRADS, Beijing Normal University-Hong Kong Baptist University United International College, Zhuhai, China; Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region; Beijing Normal University-Hong Kong Baptist University United International College, Zhuhai, China
| | - Yongqi Xu
- Department of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
| | - Peng Gao
- Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region; Beijing Normal University-Hong Kong Baptist University United International College, Zhuhai, China
| | - Jiaxing Chen
- Guangdong Provincial Key Laboratory IRADS, Beijing Normal University-Hong Kong Baptist University United International College, Zhuhai, China; Beijing Normal University-Hong Kong Baptist University United International College, Zhuhai, China.
|
15
|
Rana S, Hosen MJ, Tonni TJ, Rony MAH, Fatema K, Hasan MZ, Rahman MT, Khan RT, Jan T, Whaiduzzaman M. DeepChestGNN: A Comprehensive Framework for Enhanced Lung Disease Identification through Advanced Graphical Deep Features. SENSORS (BASEL, SWITZERLAND) 2024; 24:2830. [PMID: 38732936 PMCID: PMC11086108 DOI: 10.3390/s24092830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Revised: 04/06/2024] [Accepted: 04/16/2024] [Indexed: 05/13/2024]
Abstract
Lung diseases are the third-leading cause of mortality in the world. Due to compromised lung function, respiratory difficulties, and physiological complications, lung disease brought on by toxic substances, pollution, infections, or smoking results in millions of deaths every year. Chest X-ray images pose a challenge for classification due to their visual similarity, leading to confusion among radiologists. To mitigate those issues, we created an automated system with a large data hub that contains 17 datasets of chest X-ray images, 71,096 images in total, and we aim to classify ten different disease classes. Because it combines various sources, our large dataset contains noise, annotations, class imbalances, data redundancy, etc. We applied several image pre-processing techniques, such as resizing, de-annotation, CLAHE, and filtering, to eliminate noise and artifacts from images. The elastic deformation augmentation technique is also used to generate a balanced dataset. Then, we developed DeepChestGNN, a novel medical image classification model utilizing a deep convolutional neural network (DCNN) to extract 100 significant deep features indicative of various lung diseases. This model, incorporating Batch Normalization, MaxPooling, and Dropout layers, achieved a remarkable 99.74% accuracy in extensive trials. By combining graph neural networks (GNNs) with feedforward layers, the architecture is very flexible when it comes to working with graph data for accurate lung disease classification. This study highlights the significant impact of combining advanced research with clinical application potential in diagnosing lung diseases, providing an optimal framework for precise and efficient disease identification and classification.
Affiliation(s)
- Shakil Rana
- Health Informatics Research Laboratory (HIRL), Department of Computer Science and Engineering, Daffodil International University, Dhaka 1207, Bangladesh
| | - Md Jabed Hosen
- Health Informatics Research Laboratory (HIRL), Department of Computer Science and Engineering, Daffodil International University, Dhaka 1207, Bangladesh
| | - Tasnim Jahan Tonni
- Health Informatics Research Laboratory (HIRL), Department of Computer Science and Engineering, Daffodil International University, Dhaka 1207, Bangladesh
| | - Md. Awlad Hossen Rony
- Health Informatics Research Laboratory (HIRL), Department of Computer Science and Engineering, Daffodil International University, Dhaka 1207, Bangladesh
| | - Kaniz Fatema
- Health Informatics Research Laboratory (HIRL), Department of Computer Science and Engineering, Daffodil International University, Dhaka 1207, Bangladesh
| | - Md. Zahid Hasan
- Health Informatics Research Laboratory (HIRL), Department of Computer Science and Engineering, Daffodil International University, Dhaka 1207, Bangladesh
| | - Md. Tanvir Rahman
- School of Health and Rehabilitation Sciences, The University of Queensland, St. Lucia, QLD 4072, Australia
- Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Tangail 1902, Bangladesh
| | - Risala Tasin Khan
- Institute of Information Technology, Jahangirnagar University, Dhaka 1342, Bangladesh
| | - Tony Jan
- Centre for Artificial Intelligence Research and Optimisation (AIRO), Torrens University, Ultimo, NSW 2007, Australia
| | - Md Whaiduzzaman
- Centre for Artificial Intelligence Research and Optimisation (AIRO), Torrens University, Ultimo, NSW 2007, Australia
- School of Information Systems, Queensland University of Technology, Brisbane, QLD 4000, Australia
|
16
|
Wu X, Hou W, Zhao Z, Huang L, Sheng N, Yang Q, Zhang S, Wang Y. MMGAT: a graph attention network framework for ATAC-seq motifs finding. BMC Bioinformatics 2024; 25:158. [PMID: 38643066 PMCID: PMC11031952 DOI: 10.1186/s12859-024-05774-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 04/10/2024] [Indexed: 04/22/2024] Open
Abstract
BACKGROUND Motif finding in Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) data is essential to reveal the intricacies of transcription factor binding sites (TFBSs) and their pivotal roles in gene regulation. Deep learning technologies, including convolutional neural networks (CNNs) and graph neural networks (GNNs), have achieved success in finding ATAC-seq motifs. However, CNN-based methods are limited by the fixed width of the convolutional kernel, which makes it difficult to find multiple transcription factor binding sites with different lengths. GNN-based methods are limited by using the edge weight information directly, which makes it difficult to aggregate the neighboring nodes' information efficiently when representing node embeddings. RESULTS To address this challenge, we developed a novel graph attention network framework named MMGAT, which employs an attention mechanism to adjust the attention coefficients among different nodes. MMGAT then finds multiple ATAC-seq motifs based on the attention coefficients of sequence nodes and k-mer nodes as well as the coexisting probability of k-mers. Our approach achieved better performance on the human ATAC-seq datasets compared to existing tools, as evidenced by the highest scores on the precision, recall, F1_score, ACC, AUC, and PRC metrics, as well as by finding 389 higher-quality motifs. To validate the performance of MMGAT in predicting TFBSs and finding motifs on more datasets, we enlarged the number of the human ATAC-seq datasets to 180 and newly integrated 80 mouse ATAC-seq datasets for multi-species experimental validation. Specifically on the mouse ATAC-seq dataset, MMGAT also achieved the highest scores on six metrics and found 356 higher-quality motifs. To facilitate researchers in utilizing MMGAT, we have also developed a user-friendly web server named MMGAT-S that hosts the MMGAT method and ATAC-seq motif finding results.
CONCLUSIONS The advanced methodology MMGAT provides a robust tool for finding ATAC-seq motifs, and the comprehensive server MMGAT-S makes a significant contribution to genomics research. The open-source code of MMGAT can be found at https://github.com/xiaotianr/MMGAT, and MMGAT-S is freely available at https://www.mmgraphws.com/MMGAT-S/.
Affiliation(s)
- Xiaotian Wu
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- School of Artificial Intelligence, Jilin University, Changchun, 130012, China
| | - Wenju Hou
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Ziqi Zhao
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Lan Huang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Nan Sheng
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Qixing Yang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Shuangquan Zhang
- School of Cyber Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
| | - Yan Wang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
- School of Artificial Intelligence, Jilin University, Changchun, 130012, China.
|
17
|
Yin R, Zhao H, Li L, Yang Q, Zeng M, Yang C, Bian J, Xie M. Gra-CRC-miRTar: The pre-trained nucleotide-to-graph neural networks to identify potential miRNA targets in colorectal cancer. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.15.589599. [PMID: 38659732 PMCID: PMC11042274 DOI: 10.1101/2024.04.15.589599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Colorectal cancer (CRC) is the third most diagnosed cancer and the second deadliest cancer worldwide, representing a major public health problem. In recent years, increasing evidence has shown that microRNA (miRNA) can control the expression of targeted human messenger RNA (mRNA) by reducing their abundance or translation, acting as oncogenes or tumor suppressors in various cancers, including CRC. Due to the significant up-regulation of oncogenic miRNAs in CRC, elucidating the underlying mechanism and identifying dysregulated miRNA targets may provide a basis for improving current therapeutic interventions. In this paper, we proposed Gra-CRC-miRTar, a pre-trained nucleotide-to-graph neural network framework, for identifying potential miRNA targets in CRC. Unlike previous studies, we constructed two pre-trained models to encode RNA sequences and transformed them into de Bruijn graphs. We employed different graph neural networks to learn the latent representations. The embeddings generated from de Bruijn graphs were then fed into a Multilayer Perceptron (MLP) to perform the prediction tasks. Our extensive experiments show that Gra-CRC-miRTar achieves better performance than other deep learning algorithms and existing predictors. In addition, our analyses also successfully revealed 172 out of 201 functional interactions through experimentally validated miRNA-mRNA pairs in CRC. Collectively, our effort provides an accurate and efficient framework to identify potential miRNA targets in CRC, which can also be used to reveal miRNA target interactions in other malignancies, facilitating the development of novel therapeutics.
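As a rough illustration of the de Bruijn graph transformation this abstract refers to, the sketch below builds the textbook form of such a graph in Python: each k-mer of a sequence contributes one directed edge from its (k-1)-base prefix to its (k-1)-base suffix. The abstract does not describe how Gra-CRC-miRTar defines nodes, edges, or node features, so the function name `de_bruijn_edges`, the choice of k, and the toy sequence are all assumptions.

```python
from collections import defaultdict


def de_bruijn_edges(sequence, k):
    """Map each (k-1)-mer node to the list of (k-1)-mer nodes it points to.

    Repeated k-mers produce repeated entries, so edge multiplicity is kept.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # Edge from the k-mer's prefix to its suffix.
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Hypothetical toy RNA fragment; the 3-mers are AUG, UGA, GAU, AUG.
graph = de_bruijn_edges("AUGAUG", 3)
print(graph)
```

The resulting adjacency structure is the kind of input a graph neural network can operate on once each node is given a feature vector.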
Affiliation(s)
- Rui Yin
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
- These authors contributed equally
| | - Hongru Zhao
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
- These authors contributed equally
| | - Lu Li
- Department of Biochemistry and Molecular Biology, University of Florida, Gainesville, FL, USA
| | - Qiang Yang
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
| | - Min Zeng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
| | - Carl Yang
- Department of Computer Science, Emory University, Atlanta, GA, USA
| | - Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
| | - Mingyi Xie
- Department of Biochemistry and Molecular Biology, University of Florida, Gainesville, FL, USA
|
18
|
Li H, Peralta AG, Schoffelen S, Hansen AH, Arnsdorf J, Schinn SM, Skidmore J, Choudhury B, Paulchakrabarti M, Voldborg BG, Chiang AW, Lewis NE. LeGenD: determining N-glycoprofiles using an explainable AI-leveraged model with lectin profiling. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.27.587044. [PMID: 38585977 PMCID: PMC10996628 DOI: 10.1101/2024.03.27.587044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Glycosylation affects many vital functions of organisms. Therefore, its surveillance is critical from basic science to biotechnology, including biopharmaceutical development and clinical diagnostics. However, conventional glycan structure analysis faces challenges with throughput and cost. Lectins offer an alternative approach for analyzing glycans, but they only provide glycan epitopes and not full glycan structure information. To overcome these limitations, we developed LeGenD, a lectin and AI-based approach to predict N-glycan structures and determine their relative abundance in purified proteins based on lectin-binding patterns. We trained the LeGenD model using 309 glycoprofiles from 10 recombinant proteins, produced in 30 glycoengineered CHO cell lines. Our approach accurately reconstructed experimentally measured N-glycoprofiles of bovine Fetuin B and IgG from human sera. Explanatory AI analysis with SHapley Additive exPlanations (SHAP) helped identify the critical lectins for glycoprofile predictions. LeGenD thus offers an alternative method for N-glycan analysis.
Affiliation(s)
- Haining Li
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
| | - Angelo G. Peralta
- Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
| | - Sanne Schoffelen
- National Biologics Facility Department of Biotechnology and Biomedicine, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby Denmark
| | - Anders Holmgaard Hansen
- National Biologics Facility Department of Biotechnology and Biomedicine, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby Denmark
| | - Johnny Arnsdorf
- National Biologics Facility Department of Biotechnology and Biomedicine, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby Denmark
| | - Song-Min Schinn
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
| | - Jonathan Skidmore
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
| | - Biswa Choudhury
- Glycobiology Research and Training Center, University of California, San Diego, La Jolla, CA 92093, USA
| | - Mousumi Paulchakrabarti
- Glycobiology Research and Training Center, University of California, San Diego, La Jolla, CA 92093, USA
| | - Bjorn G. Voldborg
- National Biologics Facility Department of Biotechnology and Biomedicine, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby Denmark
| | - Austin W.T. Chiang
- Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
| | - Nathan E. Lewis
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
|
19
|
Gogoshin G, Rodin AS. Graph Neural Networks in Cancer and Oncology Research: Emerging and Future Trends. Cancers (Basel) 2023; 15:5858. [PMID: 38136405 PMCID: PMC10742144 DOI: 10.3390/cancers15245858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/09/2023] [Accepted: 12/14/2023] [Indexed: 12/24/2023] Open
Abstract
Next-generation cancer and oncology research needs to take full advantage of the multimodal structured, or graph, information, with the graph data types ranging from molecular structures to spatially resolved imaging and digital pathology, biological networks, and knowledge graphs. Graph Neural Networks (GNNs) efficiently combine the graph structure representations with the high predictive performance of deep learning, especially on large multimodal datasets. In this review article, we survey the landscape of recent (2020-present) GNN applications in the context of cancer and oncology research, and delineate six currently predominant research areas. We then identify the most promising directions for future research. We compare GNNs with graphical models and "non-structured" deep learning, and devise guidelines for cancer and oncology researchers or physician-scientists, asking the question of whether they should adopt the GNN methodology in their research pipelines.
Affiliation(s)
- Grigoriy Gogoshin
- Department of Computational and Quantitative Medicine, Beckman Research Institute, and Diabetes and Metabolism Research Institute, City of Hope National Medical Center, 1500 East Duarte Road, Duarte, CA 91010, USA
| | - Andrei S. Rodin
- Department of Computational and Quantitative Medicine, Beckman Research Institute, and Diabetes and Metabolism Research Institute, City of Hope National Medical Center, 1500 East Duarte Road, Duarte, CA 91010, USA
|
20
|
Baran K, Kloskowski A. Graph Neural Networks and Structural Information on Ionic Liquids: A Cheminformatics Study on Molecular Physicochemical Property Prediction. J Phys Chem B 2023; 127:10542-10555. [PMID: 38015981 PMCID: PMC10726349 DOI: 10.1021/acs.jpcb.3c05521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 11/01/2023] [Accepted: 11/16/2023] [Indexed: 11/30/2023]
Abstract
Ionic liquids (ILs) provide a promising solution in many industrial applications, such as solvents, absorbents, electrolytes, catalysts, lubricants, and many others. However, due to the enormous variety of their structures, uncovering or designing those with optimal attributes requires expensive and exhaustive simulations and experiments. For these reasons, searching for an efficient theoretical tool for finding the relationship between the IL structure and properties has been the subject of many research studies. Recently, special attention has been paid to machine learning tools, especially multilayer perceptron and convolutional neural networks, among many other algorithms in the field of artificial neural networks. For the latter, graph neural networks (GNNs) seem to be a powerful cheminformatic tool yet not well enough studied for dual molecular systems such as ILs. In this work, the usage of GNNs in structure-property studies is critically evaluated for predicting the density, viscosity, and surface tension of ILs. The problem of data availability and integrity is discussed to show how well GNNs deal with mislabeled chemical data. Providing more training data is proven to be more important than ensuring that they are immaculate. Great attention is paid to how GNNs process different ions to give graph transformations and electrostatic information. Clues on how GNNs should be applied to predict the properties of ILs are provided. Differences, especially regarding handling mislabeled data, favoring the use of GNNs over classical quantitative structure-property models are discussed.
Affiliation(s)
- Karol Baran
- Department of Physical Chemistry, Faculty of Chemistry, Gdansk University of Technology, Narutowicza Street 11/12, 80-233 Gdansk, Poland
| | - Adam Kloskowski
- Department of Physical Chemistry, Faculty of Chemistry, Gdansk University of Technology, Narutowicza Street 11/12, 80-233 Gdansk, Poland
|
21
|
Marzi SJ, Schilder BM, Nott A, Frigerio CS, Willaime-Morawek S, Bucholc M, Hanger DP, James C, Lewis PA, Lourida I, Noble W, Rodriguez-Algarra F, Sharif JA, Tsalenchuk M, Winchester LM, Yaman Ü, Yao Z, Ranson JM, Llewellyn DJ. Artificial intelligence for neurodegenerative experimental models. Alzheimers Dement 2023; 19:5970-5987. [PMID: 37768001 DOI: 10.1002/alz.13479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 08/11/2023] [Accepted: 08/14/2023] [Indexed: 09/29/2023]
Abstract
INTRODUCTION Experimental models are essential tools in neurodegenerative disease research. However, the translation of insights and drugs discovered in model systems has proven immensely challenging, marred by high failure rates in human clinical trials. METHODS Here we review the application of artificial intelligence (AI) and machine learning (ML) in experimental medicine for dementia research. RESULTS Considering the specific challenges of reproducibility and translation between other species or model systems and human biology in preclinical dementia research, we highlight best practices and resources that can be leveraged to quantify and evaluate translatability. We then evaluate how AI and ML approaches could be applied to enhance both cross-model reproducibility and translation to human biology, while sustaining biological interpretability. DISCUSSION AI and ML approaches in experimental medicine remain in their infancy. However, they have great potential to strengthen preclinical research and translation if based upon adequate, robust, and reproducible experimental data. HIGHLIGHTS There are increasing applications of AI in experimental medicine. We identified issues in reproducibility, cross-species translation, and data curation in the field. Our review highlights data resources and AI approaches as solutions. Multi-omics analysis with AI offers exciting future possibilities in drug discovery.
Affiliation(s)
- Sarah J Marzi
  - UK Dementia Research Institute, Imperial College London, London, UK
  - Department of Brain Sciences, Imperial College London, London, UK
- Brian M Schilder
  - UK Dementia Research Institute, Imperial College London, London, UK
  - Department of Brain Sciences, Imperial College London, London, UK
- Alexi Nott
  - UK Dementia Research Institute, Imperial College London, London, UK
  - Department of Brain Sciences, Imperial College London, London, UK
- Magda Bucholc
  - School of Computing, Engineering & Intelligent Systems, Ulster University, Derry, UK
- Diane P Hanger
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Patrick A Lewis
  - Royal Veterinary College, London, UK
  - Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London, UK
- Wendy Noble
  - Faculty of Health and Life Sciences, University of Exeter, Exeter, UK
- Jalil-Ahmad Sharif
  - UK Dementia Research Institute, Imperial College London, London, UK
  - Department of Brain Sciences, Imperial College London, London, UK
- Maria Tsalenchuk
  - UK Dementia Research Institute, Imperial College London, London, UK
  - Department of Brain Sciences, Imperial College London, London, UK
- Ümran Yaman
  - UK Dementia Research Institute at UCL, London, UK
- David J Llewellyn
  - University of Exeter Medical School, Exeter, UK
  - Alan Turing Institute, London, UK
22
Keles E, Bagci U. The past, current, and future of neonatal intensive care units with artificial intelligence: a systematic review. NPJ Digit Med 2023; 6:220. [PMID: 38012349 PMCID: PMC10682088 DOI: 10.1038/s41746-023-00941-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2023] [Accepted: 10/05/2023] [Indexed: 11/29/2023] Open
Abstract
Machine learning and deep learning are two subsets of artificial intelligence that involve teaching computers to learn and make decisions from any sort of data. Most recent developments in artificial intelligence are coming from deep learning, which has proven revolutionary in almost all fields, from computer vision to health sciences. The effects of deep learning in medicine have changed the conventional ways of clinical application significantly. Although some sub-fields of medicine, such as pediatrics, have been relatively slow in receiving the critical benefits of deep learning, related research in pediatrics has started to accumulate to a significant level, too. Hence, in this paper, we review recently developed machine learning and deep learning-based solutions for neonatology applications. We systematically evaluate the roles of both classical machine learning and deep learning in neonatology applications, define the methodologies, including algorithmic developments, and describe the remaining challenges in the assessment of neonatal diseases by using PRISMA 2020 guidelines. To date, the primary areas of focus in neonatology regarding AI applications have included survival analysis, neuroimaging, analysis of vital parameters and biosignals, and retinopathy of prematurity diagnosis. We have categorically summarized 106 research articles from 1996 to 2022 and discussed their pros and cons, respectively. In this systematic review, we aimed to further enhance the comprehensiveness of the study. We also discuss possible directions for new AI models and the future of neonatology with the rising power of AI, suggesting roadmaps for the integration of AI into neonatal intensive care units.
Affiliation(s)
- Elif Keles
  - Northwestern University, Feinberg School of Medicine, Department of Radiology, Chicago, IL, USA
- Ulas Bagci
  - Northwestern University, Feinberg School of Medicine, Department of Radiology, Chicago, IL, USA
  - Northwestern University, Department of Biomedical Engineering, Chicago, IL, USA
  - Department of Electrical and Computer Engineering, Chicago, IL, USA
23
Przybyszewski J, Malawski M, Lichołai S. GraphTar: applying word2vec and graph neural networks to miRNA target prediction. BMC Bioinformatics 2023; 24:436. [PMID: 37978418 PMCID: PMC10657114 DOI: 10.1186/s12859-023-05564-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/15/2023] [Accepted: 11/09/2023] [Indexed: 11/19/2023] Open
Abstract
BACKGROUND MicroRNAs (miRNAs) are short, non-coding RNA molecules that regulate gene expression by binding to specific mRNAs, inhibiting their translation. They play a critical role in regulating various biological processes and are implicated in many diseases, including cardiovascular, oncological, gastrointestinal diseases, and viral infections. Computational methods that can identify potential miRNA-mRNA interactions from raw data use one-dimensional miRNA-mRNA duplex representations and simple sequence encoding techniques, which may limit their performance. RESULTS We have developed GraphTar, a new target prediction method that uses a novel graph-based representation to reflect the spatial structure of the miRNA-mRNA duplex. Unlike existing approaches, we use the word2vec method to accurately encode RNA sequence information. In conjunction with the novel encoding method, we use a graph neural network classifier that can accurately predict miRNA-mRNA interactions based on graph representation learning. As part of a comparative study, we evaluate three different node embedding approaches within the GraphTar framework and compare them with other state-of-the-art target prediction methods. The results show that the proposed method achieves similar performance to the best methods in the field and outperforms them on one of the datasets. CONCLUSIONS In this study, a novel miRNA target prediction approach called GraphTar is introduced. Results show that GraphTar is as effective as existing methods and even outperforms them in some cases, opening new avenues for further research. However, the expansion of available datasets is critical for advancing the field towards real-world applications.
Affiliation(s)
- Jan Przybyszewski
  - Sano Centre for Computational Medicine, Czarnowiejska 36, 30-054, Cracow, Poland
- Maciej Malawski
  - Sano Centre for Computational Medicine, Czarnowiejska 36, 30-054, Cracow, Poland
- Sabina Lichołai
  - Division of Molecular Biology and Clinical Genetics, Faculty of Medicine, Jagiellonian University Medical College, Skawińska 8, 31-066, Cracow, Poland
24
Lecca P, Lecca M. Graph embedding and geometric deep learning relevance to network biology and structural chemistry. Front Artif Intell 2023; 6:1256352. [PMID: 38035201 PMCID: PMC10687447 DOI: 10.3389/frai.2023.1256352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 10/16/2023] [Indexed: 12/02/2023] Open
Abstract
Graphs have been used as models of complex relationships among data in biological science since the advent of systems biology in the early 2000s. In particular, graph data analysis and graph data mining play an important role in biological interaction networks, where recent techniques of artificial intelligence, usually employed in other types of networks (e.g., social, citation, and trademark networks), aim to implement various data mining tasks including classification, clustering, recommendation, anomaly detection, and link prediction. The commitment and efforts of artificial intelligence research in network biology are motivated by the fact that machine learning techniques are often prohibitively computationally demanding, poorly parallelizable, and ultimately inapplicable, since a biological network of realistic size is a large system characterised by a high density of interactions and often by non-linear dynamics and a non-Euclidean latent geometry. Currently, graph embedding is emerging as the new learning paradigm that shifts the tasks of building complex models for classification, clustering, and link prediction to learning an informative representation of the graph data in a vector space, so that many graph mining and learning tasks can be more easily performed by employing efficient non-iterative traditional models (e.g., a linear support vector machine for the classification task). The great potential of graph embedding is the main reason for the flourishing of studies in this area and, in particular, of artificial intelligence learning techniques. In this mini review, we give a comprehensive summary of the main graph embedding algorithms in light of the recent burgeoning interest in geometric deep learning.
Affiliation(s)
- Paola Lecca
  - Faculty of Engineering, Free University of Bozen-Bolzano, Bolzano, Italy
- Michela Lecca
  - Fondazione Bruno Kessler, Digital Industry Center, Technologies of Vision, Trento, Italy
25
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
- Pavel Kohout
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Faraneh Haddadi
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Anton Bushuiev
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Raman Samusevich
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Jiri Sedlar
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Tomas Pluskal
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Josef Sivic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
26
Cohen S, Schneidman-Duhovny D. A deep learning model for predicting optimal distance range in crosslinking mass spectrometry data. Proteomics 2023; 23:e2200341. [PMID: 37070547 DOI: 10.1002/pmic.202200341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 04/02/2023] [Accepted: 04/03/2023] [Indexed: 04/19/2023]
Abstract
Macromolecular assemblies play an important role in all cellular processes. While there has recently been significant progress in protein structure prediction based on deep learning, large protein complexes cannot be predicted with these approaches. The integrative structure modeling approach characterizes multi-subunit complexes by computational integration of data from fast and accessible experimental techniques. Crosslinking mass spectrometry is one such technique that provides spatial information about the proximity of crosslinked residues. One of the challenges in interpreting crosslinking datasets is designing a scoring function that, given a structure, can quantify how well it fits the data. Most approaches set an upper bound on the distance between Cα atoms of crosslinked residues and calculate a fraction of satisfied crosslinks. However, the distance spanned by the crosslinker greatly depends on the neighborhood of the crosslinked residues. Here, we design a deep learning model for predicting the optimal distance range for a crosslinked residue pair based on the structures of their neighborhoods. We find that our model can predict the distance range with the area under the receiver-operator curve of 0.86 and 0.7 for intra- and inter-protein crosslinks, respectively. Our deep scoring function can be used in a range of structure modeling applications.
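The conventional scoring scheme mentioned in this abstract (a fixed upper bound on the Cα-Cα distance, followed by the fraction of satisfied crosslinks) can be sketched as below. This is a minimal illustration, not the authors' code: the function name `fraction_satisfied`, the dictionary layout of the coordinates, and the 30 Å cutoff are all assumptions chosen for the example, and the learned, neighborhood-dependent distance range proposed in the paper is not reproduced here.

```python
import math

def fraction_satisfied(ca_coords, crosslinks, max_dist=30.0):
    """Return the fraction of crosslinks whose Calpha-Calpha distance is <= max_dist.

    ca_coords  : dict mapping a residue id to its (x, y, z) Calpha position in angstroms
    crosslinks : list of (residue_id_a, residue_id_b) pairs
    max_dist   : fixed upper bound in angstroms (hypothetical value, crosslinker dependent)
    """
    if not crosslinks:
        return 0.0
    satisfied = sum(
        1
        for a, b in crosslinks
        if math.dist(ca_coords[a], ca_coords[b]) <= max_dist
    )
    return satisfied / len(crosslinks)

# Toy structure: three residues on a line, 10 and 40 angstroms from residue 0.
coords = {0: (0.0, 0.0, 0.0), 1: (10.0, 0.0, 0.0), 2: (40.0, 0.0, 0.0)}
score = fraction_satisfied(coords, [(0, 1), (0, 2)])
```

Because the cutoff is the same for every residue pair, this score ignores the local environment of the crosslinked residues, which is the limitation the abstract points out.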
Affiliation(s)
- Shon Cohen
  - The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Dina Schneidman-Duhovny
  - The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
27
Sinha K, Ghosh N, Sil PC. A Review on the Recent Applications of Deep Learning in Predictive Drug Toxicological Studies. Chem Res Toxicol 2023; 36:1174-1205. [PMID: 37561655 DOI: 10.1021/acs.chemrestox.2c00375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2023]
Abstract
Drug toxicity prediction is an important step in ensuring patient safety during drug design studies. While traditional preclinical studies have historically relied on animal models to evaluate toxicity, recent advances in deep-learning approaches have shown great promise in advancing drug safety science and reducing animal use in preclinical studies. However, deep-learning-based approaches also face challenges in handling large biological data sets, model interpretability, and regulatory acceptance. In this review, we provide an overview of recent developments in deep-learning-based approaches for predicting drug toxicity, highlighting their potential advantages over traditional methods and the need to address their limitations. Deep-learning models have demonstrated excellent performance in predicting toxicity outcomes from various data sources such as chemical structures, genomic data, and high-throughput screening assays. The potential of deep learning for automated feature engineering is also discussed. This review emphasizes the need to address ethical concerns related to the use of deep learning in drug toxicity studies, including the reduction of animal use and ensuring regulatory acceptance. Furthermore, emerging applications of deep learning in drug toxicity prediction, such as predicting drug-drug interactions and toxicity in rare subpopulations, are highlighted. The integration of deep-learning-based approaches with traditional methods is discussed as a way to develop more reliable and efficient predictive models for drug safety assessment, paving the way for safer and more effective drug discovery and development. Overall, this review highlights the critical role of deep learning in predictive toxicology and drug safety evaluation, emphasizing the need for continued research and development in this rapidly evolving field. By addressing the limitations of traditional methods, leveraging the potential of deep learning for automated feature engineering, and addressing ethical concerns, deep-learning-based approaches have the potential to revolutionize drug toxicity prediction and improve patient safety in drug discovery and development.
Affiliation(s)
- Krishnendu Sinha
  - Department of Zoology, Jhargram Raj College, Jhargram 721507, West Bengal, India
- Nabanita Ghosh
  - Department of Zoology, Maulana Azad College, Kolkata 700013, West Bengal, India
- Parames C Sil
  - Division of Molecular Medicine, Bose Institute, Kolkata 700054, West Bengal, India
28
Hashemi N, Hao B, Ignatov M, Paschalidis IC, Vakili P, Vajda S, Kozakov D. Improved prediction of MHC-peptide binding using protein language models. Frontiers in Bioinformatics 2023; 3:1207380. [PMID: 37663788 PMCID: PMC10469926 DOI: 10.3389/fbinf.2023.1207380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 08/04/2023] [Indexed: 09/05/2023] Open
Abstract
Major histocompatibility complex Class I (MHC-I) molecules bind to peptides derived from intracellular antigens and present them on the surface of cells, allowing the immune system (T cells) to detect them. Elucidating the process of this presentation is essential for regulation and potential manipulation of the cellular immune system. Predicting whether a given peptide binds to an MHC molecule is an important step in the above process and has motivated the introduction of many computational approaches to address this problem. NetMHCpan, a pan-specific model for predicting binding of peptides to any MHC molecule, is one of the most widely used methods; it addresses this binary classification problem using shallow neural networks. The recent successful results of Deep Learning (DL) methods, especially Natural Language Processing (NLP)-based pretrained models in various applications, including protein structure determination, motivated us to explore their use in this problem. Specifically, we consider the application of deep learning models pretrained on large datasets of protein sequences to predict MHC Class I-peptide binding. Using the standard performance metrics in this area, and the same training and test sets, we show that our models outperform NetMHCpan4.1, currently considered the state of the art.
Affiliation(s)
- Nasser Hashemi
  - Division of Systems Engineering, Boston University, Boston, MA, United States
- Boran Hao
  - Department of Electrical and Computer Engineering, Boston University, Boston, MA, United States
- Mikhail Ignatov
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, United States
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, United States
- Ioannis Ch. Paschalidis
  - Division of Systems Engineering, Boston University, Boston, MA, United States
  - Department of Electrical and Computer Engineering, Boston University, Boston, MA, United States
  - Department of Biomedical Engineering, Boston University, Boston, MA, United States
- Pirooz Vakili
  - Division of Systems Engineering, Boston University, Boston, MA, United States
- Sandor Vajda
  - Division of Systems Engineering, Boston University, Boston, MA, United States
  - Department of Biomedical Engineering, Boston University, Boston, MA, United States
  - Department of Chemistry, Boston University, Boston, MA, United States
- Dima Kozakov
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, United States
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, United States
  - Department of Biomedical Engineering, Boston University, Boston, MA, United States
29
Abstract
Following the widespread use of deep learning for genomics, deep generative modeling is also becoming a viable methodology for the broad field. Deep generative models (DGMs) can learn the complex structure of genomic data and allow researchers to generate novel genomic instances that retain the real characteristics of the original dataset. Aside from data generation, DGMs can also be used for dimensionality reduction by mapping the data space to a latent space, as well as for prediction tasks via exploitation of this learned mapping or supervised/semi-supervised DGM designs. In this review, we briefly introduce generative modeling and two currently prevailing architectures, we present conceptual applications along with notable examples in functional and evolutionary genomics, and we provide our perspective on potential challenges and future directions.
Affiliation(s)
- Burak Yelmen
  - Laboratoire Interdisciplinaire des Sciences du Numérique, CNRS UMR 9015, INRIA, Université Paris-Saclay, Orsay, France
  - Institute of Genomics, University of Tartu, Tartu, Estonia
- Flora Jay
  - Laboratoire Interdisciplinaire des Sciences du Numérique, CNRS UMR 9015, INRIA, Université Paris-Saclay, Orsay, France
30
Latapiat V, Saez M, Pedroso I, Martin AJM. Unraveling patient heterogeneity in complex diseases through individualized co-expression networks: a perspective. Front Genet 2023; 14:1209416. [PMID: 37636264 PMCID: PMC10449456 DOI: 10.3389/fgene.2023.1209416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 07/24/2023] [Indexed: 08/29/2023] Open
Abstract
This perspective highlights the potential of individualized networks as a novel strategy for studying complex diseases through patient stratification, enabling advancements in precision medicine. We emphasize the impact of interpatient heterogeneity resulting from genetic and environmental factors and discuss how individualized networks improve our ability to develop treatments and enhance diagnostics. The integration of systems biology with multimodal information such as genomic and clinical data has reached a tipping point, allowing the inference of biological networks at a single-individual resolution. This approach generates a specific biological network per sample, representing the individual from which the sample originated. The availability of individualized networks enables applications in personalized medicine, such as identifying malfunctions and selecting tailored treatments. In essence, reliable, individualized networks can expedite research progress in understanding drug response variability by modeling heterogeneity among individuals and enabling the personalized selection of pharmacological targets for treatment. Therefore, developing diverse and cost-effective approaches for generating these networks is crucial for widespread application in clinical services.
Affiliation(s)
- Verónica Latapiat
  - Programa de Doctorado en Genómica Integrativa, Vicerrectoría de Investigación, Universidad Mayor, Santiago, Chile
  - Vicerrectoría de Investigación, Universidad Mayor, Santiago, Chile
  - Laboratorio de Redes Biológicas, Centro Científico y Tecnológico de Excelencia Ciencia & Vida, Fundación Ciencia & Vida, Santiago, Chile
- Mauricio Saez
  - Centro de Oncología de Precisión, Facultad de Medicina y Ciencias de la Salud, Universidad Mayor, Santiago, Chile
  - Laboratorio de Investigación en Salud de Precisión, Departamento de Procesos Diagnósticos y Evaluación, Facultad de Ciencias de la Salud, Universidad Católica de Temuco, Temuco, Chile
- Inti Pedroso
  - Vicerrectoría de Investigación, Universidad Mayor, Santiago, Chile
- Alberto J. M. Martin
  - Laboratorio de Redes Biológicas, Centro Científico y Tecnológico de Excelencia Ciencia & Vida, Fundación Ciencia & Vida, Santiago, Chile
  - Escuela de Ingeniería, Facultad de Ingeniería, Arquitectura y Diseño, Universidad San Sebastián, Santiago, Chile
31
Lu W, Lee NA, Buehler MJ. Modeling and design of heterogeneous hierarchical bioinspired spider web structures using deep learning and additive manufacturing. Proc Natl Acad Sci U S A 2023; 120:e2305273120. [PMID: 37487072 PMCID: PMC10401013 DOI: 10.1073/pnas.2305273120] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/01/2023] [Accepted: 06/09/2023] [Indexed: 07/26/2023] Open
Abstract
Spider webs are incredible biological structures, comprising thin but strong silk filament and arranged into complex hierarchical architectures with striking mechanical properties (e.g., lightweight but high strength, achieving diverse mechanical responses). While simple 2D orb webs can easily be mimicked, the modeling and synthesis of 3D-based web structures remain challenging, partly due to the rich set of design features. Here, we provide a detailed analysis of the heterogeneous graph structures of spider webs and use deep learning as a way to model and then synthesize artificial, bioinspired 3D web structures. The generative models are conditioned based on key geometric parameters (including average edge length, number of nodes, average node degree, and others). To identify graph construction principles, we use inductive representation sampling of large experimentally determined spider web graphs, to yield a dataset that is used to train three conditional generative models: 1) an analog diffusion model inspired by nonequilibrium thermodynamics, with sparse neighbor representation; 2) a discrete diffusion model with full neighbor representation; and 3) an autoregressive transformer architecture with full neighbor representation. All three models are scalable, produce complex, de novo bioinspired spider web mimics, and successfully construct graphs that meet the design objectives. We further propose an algorithm that assembles web samples produced by the generative models into larger-scale structures based on a series of geometric design targets, including helical and parametric shapes, mimicking, and extending natural design principles toward integration with diverging engineering objectives. Several webs are manufactured using 3D printing and tested to assess mechanical properties.
Affiliation(s)
- Wei Lu
  - Laboratory for Atomistic and Molecular Mechanics, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Nic A. Lee
  - Laboratory for Atomistic and Molecular Mechanics, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Media Lab, Massachusetts Institute of Technology, Cambridge, MA 02139
- Markus J. Buehler
  - Laboratory for Atomistic and Molecular Mechanics, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
32
Wang Y, Tai S, Zhang S, Sheng N, Xie X. PromGER: Promoter Prediction Based on Graph Embedding and Ensemble Learning for Eukaryotic Sequence. Genes (Basel) 2023; 14:1441. [PMID: 37510345 PMCID: PMC10379012 DOI: 10.3390/genes14071441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 07/04/2023] [Accepted: 07/10/2023] [Indexed: 07/30/2023] Open
Abstract
Promoters are non-coding DNA regions around the transcription start site and are responsible for regulating gene transcription. Due to their key role in gene function and transcriptional activity, the accurate prediction of promoter sequences and their core elements is a crucial research area in bioinformatics. At present, models based on machine learning and deep learning have been developed for promoter prediction. However, these models cannot mine the deeper biological information of promoter sequences or consider the complex relationships among promoter sequences. In this work, we propose a novel prediction model called PromGER to predict eukaryotic promoter sequences. For a promoter sequence, firstly, PromGER utilizes four types of feature-encoding methods to extract local information within promoter sequences. Secondly, according to the potential relationships among promoter sequences, the whole set of promoter sequences is constructed as a graph. Furthermore, three different scales of graph-embedding methods are applied to obtain the global feature information in the graph more comprehensively. Finally, combining local features with global features of sequences, PromGER analyzes and predicts promoter sequences through a tree-based ensemble-learning framework. Compared with seven existing methods, PromGER improved average specificity by 13%, accuracy by 10%, Matthews correlation coefficient by 16%, precision by 4%, F1 score by 6%, and AUC by 9%. In addition, this study interpreted PromGER using the t-distributed stochastic neighbor embedding (t-SNE) method and SHapley Additive exPlanations (SHAP) value analysis, which demonstrates the interpretability of the model.
Affiliation(s)
- Yan Wang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Shiwen Tai
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Shuangquan Zhang
  - School of Cyber Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Nan Sheng
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Xuping Xie
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
33
Kim SY. Personalized Explanations for Early Diagnosis of Alzheimer's Disease Using Explainable Graph Neural Networks with Population Graphs. Bioengineering (Basel) 2023; 10:701. [PMID: 37370632 DOI: 10.3390/bioengineering10060701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Revised: 06/05/2023] [Accepted: 06/06/2023] [Indexed: 06/29/2023] Open
Abstract
Leveraging recent advances in graph neural networks, our study introduces an application of graph convolutional networks (GCNs) within a correlation-based population graph, aiming to enhance Alzheimer's disease (AD) prognosis and illuminate the intricacies of AD progression. This methodological approach leverages the inherent structure and correlations in demographic and neuroimaging data to predict amyloid-beta (Aβ) positivity. To validate our approach, we conducted extensive performance comparisons with conventional machine learning models and a GCN model with randomly assigned edges. The results consistently highlighted the superior performance of the correlation-based GCN model across different sample groups in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, suggesting the importance of accurately reflecting the correlation structure in population graphs for effective pattern recognition and accurate prediction. Furthermore, our exploration of the model's decision-making process using GNNExplainer identified unique sets of biomarkers indicative of Aβ positivity in different groups, shedding light on the heterogeneity of AD progression. This study underscores the potential of our proposed approach for more nuanced AD prognoses, potentially informing more personalized and precise therapeutic strategies. Future research can extend these findings by integrating diverse data sources, employing longitudinal data, and refining the interpretability of the model, which potentially has broad applicability to other complex diseases.
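A correlation-based population graph of the kind described here can be sketched as follows: each subject is a node, and an edge links two subjects whose feature vectors are strongly correlated. The abstract does not state which correlation measure or cutoff was used, so the Pearson coefficient, the `threshold` value, the function names, and the toy feature vectors below are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length feature vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0  # constant vector: correlation is undefined, treat as 0
    return cov / (std_x * std_y)


def population_graph(features, threshold=0.8):
    """Return edges (i, j, r) for subject pairs with correlation r >= threshold."""
    edges = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            r = pearson(features[i], features[j])
            if r >= threshold:
                edges.append((i, j, r))
    return edges


# Toy subjects (hypothetical demographic/imaging features, already scaled).
subjects = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 6.0, 8.2],
    [4.0, 3.0, 2.0, 1.0],
]
edges = population_graph(subjects, threshold=0.8)
print(edges)  # only subjects 0 and 1 are connected
```

A GCN trained on such a graph propagates information between similar subjects, which is the property the abstract contrasts with a graph whose edges are assigned at random.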
Affiliation(s)
- So Yeon Kim
  - Department of Artificial Intelligence, Ajou University, Suwon 16499, Republic of Korea
  - Department of Software and Computer Engineering, Ajou University, Suwon 16499, Republic of Korea
|
34
|
Brunson T, Sanati N, Matthews L, Haw R, Beavers D, Shorser S, Sevilla C, Viteri G, Conley P, Rothfels K, Hermjakob H, Stein L, D’Eustachio P, Wu G. Illuminating Dark Proteins using Reactome Pathways. bioRxiv: The Preprint Server for Biology 2023:2023.06.05.543335. [PMID: 37333417 PMCID: PMC10274615 DOI: 10.1101/2023.06.05.543335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
Limited knowledge about a substantial portion of protein coding genes, known as "dark" proteins, hinders our understanding of their functions and potential therapeutic applications. To address this, we leveraged Reactome, the most comprehensive, open source, open-access pathway knowledgebase, to contextualize dark proteins within biological pathways. By integrating multiple resources and employing a random forest classifier trained on 106 protein/gene pairwise features, we predicted functional interactions between dark proteins and Reactome-annotated proteins. We then developed three scores to measure the interactions between dark proteins and Reactome pathways, utilizing enrichment analysis and fuzzy logic simulations. Correlation analysis of these scores with an independent single-cell RNA sequencing dataset provided supporting evidence for this approach. Furthermore, systematic natural language processing (NLP) analysis of over 22 million PubMed abstracts and manual checking of the literature associated with 20 randomly selected dark proteins reinforced the predicted interactions between proteins and pathways. To enhance the visualization and exploration of dark proteins within Reactome pathways, we developed the Reactome IDG portal, deployed at https://idg.reactome.org, a web application featuring tissue-specific protein and gene expression overlay, as well as drug interactions. Our integrated computational approach, together with the user-friendly web platform, offers a valuable resource for uncovering potential biological functions and therapeutic implications of dark proteins.
Affiliation(s)
- Nasim Sanati
  - Oregon Health & Science University, Portland, OR 97239, USA
- Robin Haw
  - Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Deidre Beavers
  - Oregon Health & Science University, Portland, OR 97239, USA
- Solomon Shorser
  - Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Cristoffer Sevilla
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Guilherme Viteri
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Patrick Conley
  - Oregon Health & Science University, Portland, OR 97239, USA
- Karen Rothfels
  - Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Henning Hermjakob
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Lincoln Stein
  - Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S1A1, Canada
- Guanming Wu
  - Oregon Health & Science University, Portland, OR 97239, USA
|
35
|
Kang H, Hou L, Gu Y, Lu X, Li J, Li Q. Drug-disease association prediction with literature based multi-feature fusion. Front Pharmacol 2023; 14:1205144. [PMID: 37284317 PMCID: PMC10239876 DOI: 10.3389/fphar.2023.1205144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 05/09/2023] [Indexed: 06/08/2023] Open
Abstract
Introduction: Exploring the potential efficacy of a drug is a valid approach for drug development with shorter development times and lower costs. Recently, several computational drug repositioning methods have been introduced that learn multiple features for potential association prediction. However, fully leveraging the vast amount of information in the scientific literature to enhance drug-disease association prediction remains a great challenge. Methods: We constructed a drug-disease association prediction method called Literature Based Multi-Feature Fusion (LBMFF), which effectively integrated known drugs, diseases, side effects and target associations from public databases as well as literature semantic features. Specifically, a pre-trained and fine-tuned BERT model was introduced to extract literature semantic information for similarity assessment. Then, we derived drug and disease embeddings from the constructed fusion similarity matrix using a graph convolutional network with an attention mechanism. Results: LBMFF achieved superior performance in drug-disease association prediction with an AUC value of 0.8818 and an AUPR value of 0.5916. Discussion: LBMFF achieved relative improvements of 31.67% and 16.09%, respectively, over the second-best results, compared to single feature methods and seven existing state-of-the-art prediction methods on the same test datasets. Meanwhile, case studies have verified that LBMFF can discover new associations to accelerate drug development. The proposed benchmark dataset and source code are available at: https://github.com/kang-hongyu/LBMFF.
Affiliation(s)
- Hongyu Kang
  - Department of Biomedical Engineering, School of Life Science, Beijing Institute of Technology, Beijing, China
  - Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Li Hou
  - Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yaowen Gu
  - Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xiao Lu
  - Department of Biomedical Engineering, School of Life Science, Beijing Institute of Technology, Beijing, China
- Jiao Li
  - Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Qin Li
  - Department of Biomedical Engineering, School of Life Science, Beijing Institute of Technology, Beijing, China
|
36
|
Lu H, Uddin S. Disease Prediction Using Graph Machine Learning Based on Electronic Health Data: A Review of Approaches and Trends. Healthcare (Basel) 2023; 11:healthcare11071031. [PMID: 37046958 PMCID: PMC10094099 DOI: 10.3390/healthcare11071031] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/30/2022] [Revised: 03/11/2023] [Accepted: 04/01/2023] [Indexed: 04/07/2023] Open
Abstract
Graph machine-learning (ML) methods have recently attracted great attention and have made significant progress in graph applications. To date, most graph ML approaches have been evaluated on social networks, but they have not been comprehensively reviewed in the health informatics domain. Herein, a review of graph ML methods and their applications in the disease prediction domain based on electronic health data is presented in this study from two levels: node classification and link prediction. Commonly used graph ML approaches for these two levels are shallow embedding and graph neural networks (GNN). This study performs comprehensive research to identify articles that applied or proposed graph ML models on disease prediction using electronic health data. We considered journals and conferences from four digital library databases (i.e., PubMed, Scopus, ACM digital library, and IEEEXplore). Based on the identified articles, we review the present status of and trends in graph ML approaches for disease prediction using electronic health data. Even though GNN-based models have achieved outstanding results compared with the traditional ML methods in a wide range of disease prediction tasks, they still confront interpretability and dynamic graph challenges. Though the disease prediction field using ML techniques is still emerging, GNN-based models have the potential to be an excellent approach for disease prediction, which can be used in medical diagnosis, treatment, and the prognosis of diseases.
Affiliation(s)
- Haohui Lu
  - School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, Sydney, NSW 2037, Australia
- Shahadat Uddin
  - School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, Sydney, NSW 2037, Australia
|
37
|
Zhang H, Li X, Li Z, Huang D, Zhang L. Estimation of Particle Location in Granular Materials Based on Graph Neural Networks. Micromachines 2023; 14:714. [PMID: 37420946 DOI: 10.3390/mi14040714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 03/20/2023] [Accepted: 03/21/2023] [Indexed: 07/09/2023]
Abstract
Particle locations determine the whole structure of a granular system, which is crucial to understanding various anomalous behaviors in glasses and amorphous solids. How to accurately determine the coordinates of each particle in such materials within a short time has always been a challenge. In this paper, we use an improved graph convolutional neural network to estimate the particle locations in two-dimensional photoelastic granular materials purely from the knowledge of the distances for each particle, which can be estimated in advance via a distance estimation algorithm. The robustness and effectiveness of our model are verified by testing other granular systems with different disorder degrees, as well as systems with different configurations. In this study, we attempt to provide a new route to the structural information of granular systems irrelevant to dimensionality, compositions, or other material properties.
Affiliation(s)
- Hang Zhang
  - School of Automation, Central South University, Changsha 410083, China
- Xingqiao Li
  - School of Automation, Central South University, Changsha 410083, China
- Zirui Li
  - School of Automation, Central South University, Changsha 410083, China
- Duan Huang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Ling Zhang
  - School of Automation, Central South University, Changsha 410083, China
|
38
|
Bhadani R, Chen Z, An L. Attention-Based Graph Neural Network for Label Propagation in Single-Cell Omics. Genes (Basel) 2023; 14:506. [PMID: 36833434 PMCID: PMC9957137 DOI: 10.3390/genes14020506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 02/13/2023] [Accepted: 02/13/2023] [Indexed: 02/19/2023] Open
Abstract
Single-cell data analysis has been at the forefront of development in biology and medicine since sequencing data became available. An important challenge in single-cell data analysis is the identification of cell types. Several methods have been proposed for cell-type identification. However, these methods do not capture the higher-order topological relationship between different samples. In this work, we propose an attention-based graph neural network that captures the higher-order topological relationship between different samples and performs transductive learning for predicting cell types. The evaluation of our method on both simulated and publicly available datasets demonstrates the superiority of our method, scAGN, in terms of prediction accuracy. In addition, our method works best for highly sparse datasets in terms of F1 score, precision, recall, and Matthews correlation coefficient. Further, our method's runtime is consistently shorter than that of other methods.
Affiliation(s)
- Rahul Bhadani
  - Department of Electrical & Computer Engineering, The University of Arizona, Tucson, AZ 85721, USA
  - Interdisciplinary Program in Statistics and Data Science, The University of Arizona, Tucson, AZ 85721, USA
- Zhuo Chen
  - Interdisciplinary Program in Statistics and Data Science, The University of Arizona, Tucson, AZ 85721, USA
- Lingling An
  - Interdisciplinary Program in Statistics and Data Science, The University of Arizona, Tucson, AZ 85721, USA
  - Department of Biosystems Engineering, The University of Arizona, Tucson, AZ 85721, USA
  - Department of Epidemiology and Biostatistics, The University of Arizona, Tucson, AZ 85721, USA
|
39
|
Alharbi F, Vakanski A. Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review. Bioengineering (Basel) 2023; 10:bioengineering10020173. [PMID: 36829667 PMCID: PMC9952758 DOI: 10.3390/bioengineering10020173] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 12/22/2022] [Revised: 01/24/2023] [Accepted: 01/26/2023] [Indexed: 01/31/2023] Open
Abstract
Cancer is a term that denotes a group of diseases caused by the abnormal growth of cells that can spread in different parts of the body. According to the World Health Organization (WHO), cancer is the second major cause of death after cardiovascular diseases. Gene expression can play a fundamental role in the early detection of cancer, as it is indicative of the biochemical processes in tissue and cells, as well as the genetic characteristics of an organism. Deoxyribonucleic acid (DNA) microarrays and ribonucleic acid (RNA)-sequencing methods for gene expression data allow quantifying the expression levels of genes and produce valuable data for computational analysis. This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods. Both conventional and deep learning-based approaches are reviewed, with an emphasis on the application of deep learning models due to their comparative advantages for identifying gene patterns that are distinctive for various types of cancers. Relevant works that employ the most commonly used deep neural network architectures are covered, including multi-layer perceptrons, as well as convolutional, recurrent, graph, and transformer networks. This survey also presents an overview of the data collection methods for gene expression analysis and lists important datasets that are commonly used for supervised machine learning for this task. Furthermore, we review pertinent techniques for feature engineering and data preprocessing that are typically used to handle the high dimensionality of gene expression data, caused by a large number of genes present in data samples. The paper concludes with a discussion of future research directions for machine learning-based gene expression analysis for cancer classification.
|
40
|
Liu C, Duan Y, Zhou Q, Wang Y, Gao Y, Kan H, Hu J. A classification method of gastric cancer subtype based on residual graph convolution network. Front Genet 2023; 13:1090394. [PMID: 36685956 PMCID: PMC9845413 DOI: 10.3389/fgene.2022.1090394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2022] [Accepted: 12/09/2022] [Indexed: 01/06/2023] Open
Abstract
Background: Clinical diagnosis and treatment of tumors are greatly complicated by their heterogeneity, and the subtype classification of cancer frequently plays a significant role in the subsequent treatment of tumors. Presently, the majority of studies rely far too heavily on gene expression data, omitting the enormous power of multi-omics fusion data and the potential for patient similarities. Method: In this study, we created a gastric cancer subtype classification model called RRGCN based on residual graph convolutional network (GCN) using multi-omics fusion data and patient similarity network. Given the multi-omics data's high dimensionality, we built an artificial neural network Autoencoder (AE) to reduce the dimensionality of the data and extract hidden layer features. The model is then built using the feature data. In addition, we computed the correlation between patients using the Pearson correlation coefficient, and this relationship between patients forms the edge of the graph structure. Four graph convolutional network layers and two residual networks with skip connections make up RRGCN, which reduces the amount of information lost during transmission between layers and prevents model degradation. Results: The results show that RRGCN significantly outperforms other classification methods with an accuracy as high as 0.87 when compared to four other traditional machine learning methods and deep learning models. Conclusion: In terms of subtype classification, RRGCN excels in all areas and has the potential to offer fresh perspectives on disease mechanisms and disease progression. It has the potential to be used for a broader range of disorders and to aid in clinical diagnosis.
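The idea of a graph convolution with a skip connection, as used in RRGCN, can be illustrated with a single layer. The sketch below assumes the widely used symmetric normalization with self-loops and the form H' = ReLU(Â H W) + H; the abstract does not specify the exact layer arrangement, so the function names, the toy patient graph, and the weights are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_adjacency(adj):
    """Add self-loops, then scale each entry by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def residual_gcn_layer(adj_norm, h, w):
    """One layer: ReLU(adj_norm @ h @ w) + h (w must be square for the skip)."""
    z = matmul(matmul(adj_norm, h), w)
    return [[max(0.0, z[i][j]) + h[i][j] for j in range(len(h[0]))]
            for i in range(len(h))]


# Toy patient similarity network: patient 1 is linked to patients 0 and 2.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical patient features
w_identity = [[1.0, 0.0], [0.0, 1.0]]
w_zero = [[0.0, 0.0], [0.0, 0.0]]

adj_norm = normalized_adjacency(adj)
out_identity = residual_gcn_layer(adj_norm, h, w_identity)
out_zero = residual_gcn_layer(adj_norm, h, w_zero)
print(out_identity)
print(out_zero)  # equals h: the skip connection passes the input through
```

With zero weights the layer returns its input unchanged, which shows how the skip connection limits the information lost between layers when the convolution itself contributes little.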
Affiliation(s)
- Can Liu
  - School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China
  - Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China
- Yuchen Duan
  - School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China
- Qingqing Zhou
  - School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China
- Yongkang Wang
  - School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China
  - Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China
- Yong Gao
  - School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China
  - Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China
- Hongxing Kan
  - School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China
  - Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China
- Jili Hu
  - School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui, China
  - Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China
|
41
|
Li D, Liang H, Qin P, Wang J. A self-training subspace clustering algorithm based on adaptive confidence for gene expression data. Front Genet 2023; 14:1132370. [PMID: 37025450 PMCID: PMC10070828 DOI: 10.3389/fgene.2023.1132370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Accepted: 03/07/2023] [Indexed: 04/08/2023] Open
Abstract
Gene clustering is an important technique for identifying co-expressed gene groups from gene expression data, and it provides a powerful tool for investigating functional relationships among genes in biological processes. Self-training is an important semi-supervised learning method and has performed well on the gene clustering problem. However, the self-training process inevitably suffers from mislabeling, the accumulation of which degrades semi-supervised learning performance on gene expression data. To solve this problem, this paper proposes a self-training subspace clustering algorithm based on adaptive confidence for gene expression data (SSCAC), which combines the low-rank representation of gene expression data with adaptive adjustment of label confidence to better guide the partition of unlabeled data. The superiority of the proposed SSCAC algorithm is mainly reflected in the following aspects. 1) To improve the discriminative property of gene expression data, the low-rank representation with distance penalty is used to mine the potential subspace structure of the data. 2) Considering the problem of mislabeling in self-training, a semi-supervised clustering objective function with label confidence is proposed, and a self-training subspace clustering framework is constructed on this basis. 3) To mitigate the negative impact of mislabeled data, an adaptive adjustment strategy based on the gravitational search algorithm is proposed for label confidence. Compared with a variety of state-of-the-art unsupervised and semi-supervised learning algorithms, the SSCAC algorithm has demonstrated its superiority through extensive experiments on two benchmark gene expression datasets.
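The mislabeling problem described here is easiest to see in a plain self-training loop, where a pseudo-label is accepted only if its confidence clears a cutoff. The sketch below is not SSCAC: it swaps in a nearest-centroid classifier on one-dimensional values and a fixed `threshold` in place of the low-rank representation and the gravitational-search-based adaptive confidence. All names and data are hypothetical, and it assumes at least one labeled example per class.

```python
def self_train(labeled, unlabeled, threshold=0.75, max_rounds=10):
    """Grow a labeled set by accepting only confident pseudo-labels.

    labeled: list of (value, label) pairs with label in {0, 1}
    unlabeled: list of values
    Returns the extended labeled list and the values left unlabeled.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        # Recompute class centroids from everything labeled so far.
        centroid = {}
        for cls in (0, 1):
            values = [x for x, y in labeled if y == cls]
            centroid[cls] = sum(values) / len(values)
        accepted, remaining = [], []
        for x in pool:
            d0, d1 = abs(x - centroid[0]), abs(x - centroid[1])
            if d0 + d1 == 0:
                remaining.append(x)
                continue
            # Confidence of the nearer class: 0.5 is a coin flip, 1.0 is certain.
            confidence = max(d0, d1) / (d0 + d1)
            if confidence >= threshold:
                accepted.append((x, 0 if d0 < d1 else 1))
            else:
                remaining.append(x)
        if not accepted:
            break  # nothing confident enough; stop rather than guess
        labeled.extend(accepted)
        pool = remaining
    return labeled, pool


# Hypothetical expression values: two labeled seeds and three unlabeled genes.
final_labeled, still_unlabeled = self_train(
    labeled=[(0.0, 0), (10.0, 1)],
    unlabeled=[1.0, 9.0, 5.0],
)
print(final_labeled)
print(still_unlabeled)  # the ambiguous midpoint stays unlabeled
```

A fixed cutoff either admits mislabeled points when set too low or stalls when set too high, which is the trade-off that an adaptive confidence strategy such as the one in the abstract is meant to address.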
Affiliation(s)
- Dan Li
  - Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China
- Hongnan Liang
  - Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China
- Pan Qin
  - Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China
- Jia Wang
  - Department of Breast Surgery, The Second Hospital of Dalian Medical University, Dalian, Liaoning, China
*Correspondence: Pan Qin; Jia Wang
|
42
|
Beeche C, Gezer NS, Iyer K, Almetwali O, Yu J, Zhang Y, Dhupar R, Leader JK, Pu J. Assessing retinal vein occlusion based on color fundus photographs using neural understanding network (NUN). Med Phys 2023; 50:449-464. [PMID: 36184848 PMCID: PMC9868057 DOI: 10.1002/mp.16012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2021] [Revised: 09/15/2022] [Accepted: 09/16/2022] [Indexed: 01/26/2023] Open
Abstract
OBJECTIVE To develop and validate a novel deep learning architecture to classify retinal vein occlusion (RVO) on color fundus photographs (CFPs) and reveal the image features contributing to the classification. METHODS The neural understanding network (NUN) is formed by two components: (1) convolutional neural network (CNN)-based feature extraction and (2) graph neural network (GNN)-based feature understanding. The CNN-based image features were transformed into a graph representation to encode and visualize long-range feature interactions to identify the image regions that significantly contributed to the classification decision. A total of 7062 CFPs were classified into three categories: (1) no vein occlusion ("normal"), (2) central RVO, and (3) branch RVO. The area under the receiver operating characteristic (ROC) curve (AUC) was used as the metric to assess the performance of the trained classification models. RESULTS The AUC, accuracy, sensitivity, and specificity for NUN to classify CFPs as normal, central occlusion, or branch occlusion were 0.975 (± 0.003), 0.911 (± 0.007), 0.983 (± 0.010), and 0.803 (± 0.005), respectively, which outperformed available classical CNN models. CONCLUSION The NUN architecture can provide a better classification performance and a straightforward visualization of the results compared to CNNs.
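The AUC used as the performance metric here has a simple rank interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below covers the binary case only; the abstract does not say how AUC was aggregated over the three classes, and the function name, labels, and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Binary AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = occlusion, 0 = normal) and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```

For a three-class task, one common option is to compute this quantity once per class against the other two and average the results, although that is only one of several conventions.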
Affiliation(s)
- Cameron Beeche
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Naciye S Gezer
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Kartik Iyer
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Omar Almetwali
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Juezhao Yu
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Yanchun Zhang
  - Shaan’xi Eye Hospital, Xi’an, Shaanxi, 710004, China
- Rajeev Dhupar
  - Department of Cardiothoracic Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Surgical Services Division, VA Pittsburgh Healthcare System, Pittsburgh, PA 15240
- Joseph K. Leader
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Jiantao Pu
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA 15213, USA
|
43
|
La Rosa M, Fiannaca A, La Paglia L, Urso A. A Graph Neural Network Approach for the Analysis of siRNA-Target Biological Networks. Int J Mol Sci 2022; 23:ijms232214211. [PMID: 36430688 PMCID: PMC9696923 DOI: 10.3390/ijms232214211] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/14/2022] [Revised: 11/10/2022] [Accepted: 11/15/2022] [Indexed: 11/19/2022] Open
Abstract
Many biological systems are characterised by biological entities and the relationships among them. These interaction networks can be modelled as graphs, with nodes representing bio-entities, such as molecules, and edges representing relations among them, such as interactions. Given the huge amount of biological data now available, it is very important to consider in silico analysis methods, based on machine learning for example, that can exploit the inherent graph structure of the data to improve the quality of the results. In this scenario, graph neural networks (GNNs) are recent computational approaches that deal directly with graph-structured data. In this paper, we present a GNN for the analysis of siRNA-mRNA interaction networks. siRNAs are small RNA molecules that are able to bind to target genes and silence them. These events make siRNAs key molecules as RNA interference agents in many biological interaction networks related to severe diseases such as cancer. In particular, our GNN approach allows for the prediction of siRNA efficacy, which measures the siRNA's ability to bind and silence a gene target. Tested on benchmark datasets, our proposed method outperforms other machine learning algorithms, including the state-of-the-art predictor based on a convolutional neural network, reaching a Pearson correlation coefficient of approximately 73.6%. Finally, we present a case study in which the efficacy of a set of siRNAs is predicted for a gene of interest. To the best of our knowledge, this is the first time GNNs have been used in this scenario.
|
44
|
Yaseen A, Amin I, Akhter N, Ben-Hur A, Minhas F. Insights into performance evaluation of compound-protein interaction prediction methods. Bioinformatics 2022; 38:ii75-ii81. [PMID: 36124806 DOI: 10.1093/bioinformatics/btac496] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION Machine-learning-based prediction of compound-protein interactions (CPIs) is important for drug design, screening and repurposing. Despite numerous recent publications with increasing methodological sophistication claiming consistent improvements in predictive accuracy, we have observed a number of fundamental issues in experiment design that produce overoptimistic estimates of model performance. RESULTS We systematically analyze the impact of several factors affecting generalization performance of CPI predictors that are overlooked in existing work: (i) similarity between training and test examples in cross-validation; (ii) synthesizing negative examples in the absence of experimentally verified negative examples; and (iii) alignment of evaluation protocol and performance metrics with real-world use of CPI predictors in screening large compound libraries. Using both state-of-the-art approaches by other researchers and a simple kernel-based baseline, we have found that effective assessment of generalization performance of CPI predictors requires careful control over similarity between training and test examples. We show that, under stringent performance assessment protocols, a simple kernel-based approach can exceed the predictive performance of existing state-of-the-art methods. We also show that random pairing for generating synthetic negative examples for training and performance evaluation results in models with better generalization in comparison to more sophisticated strategies used in existing studies. Our analyses indicate that using the proposed experiment design strategies can offer significant improvements for CPI prediction, leading to effective target compound screening for drug repurposing and discovery of putative chemical ligands of SARS-CoV-2-Spike and Human-ACE2 proteins. AVAILABILITY AND IMPLEMENTATION Code and supplementary material available at https://github.com/adibayaseen/HKRCPI.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
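The two experiment-design ideas named in this abstract, random pairing for synthetic negatives and control over train/test similarity, can be illustrated with a minimal sketch. This is not the authors' implementation (their code is in the linked repository). The function names `sample_random_negatives` and `split_by_group`, and the `group_of` mapping from proteins to similarity groups, are hypothetical and assumed here purely for illustration.

```python
import random


def sample_random_negatives(positives, n_negatives, seed=0):
    """Draw compound-protein pairs uniformly at random, skipping known positives.

    Assumption: any pair not listed as positive is treated as a negative.
    """
    rng = random.Random(seed)
    compounds = sorted({c for c, _ in positives})
    proteins = sorted({p for _, p in positives})
    known = set(positives)
    max_possible = len(compounds) * len(proteins) - len(known)
    if n_negatives > max_possible:
        raise ValueError("not enough unlabeled pairs to sample from")
    negatives = set()
    while len(negatives) < n_negatives:
        pair = (rng.choice(compounds), rng.choice(proteins))
        if pair not in known:
            negatives.add(pair)
    return sorted(negatives)


def split_by_group(pairs, group_of, test_fraction=0.25, seed=0):
    """Hold out whole protein groups so no test group appears in training.

    `group_of` maps each protein to a (hypothetical) similarity group, for
    example a sequence cluster computed elsewhere.
    """
    rng = random.Random(seed)
    groups = sorted({group_of[p] for _, p in pairs})
    rng.shuffle(groups)
    n_test = max(1, round(test_fraction * len(groups)))
    test_groups = set(groups[:n_test])
    train = [x for x in pairs if group_of[x[1]] not in test_groups]
    test = [x for x in pairs if group_of[x[1]] in test_groups]
    return train, test
```

Splitting by group rather than by individual pair is one simple way to keep similar proteins from landing on both sides of the split; the paper's own protocols may differ.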
Affiliation(s)
- Adiba Yaseen
  - Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad 45650, Pakistan
- Imran Amin
  - National Institute for Biotechnology and Genetic Engineering, Faisalabad 38000, Pakistan
- Naeem Akhter
  - Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad 45650, Pakistan
- Asa Ben-Hur
  - Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA
- Fayyaz Minhas
  - Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK
|
45
|
Pfeifer B, Saranti A, Holzinger A. GNN-SubNet: disease subnetwork detection with explainable graph neural networks. Bioinformatics 2022; 38:ii120-ii126. [PMID: 36124793 DOI: 10.1093/bioinformatics/btac478] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION The tremendous success of graph neural networks (GNNs) has already had a major impact on systems biology research. For example, GNNs are currently being used for drug target recognition in protein-drug interaction networks, as well as for cancer gene discovery and more. Important aspects whose practical relevance is often underestimated are comprehensibility, interpretability and explainability. RESULTS In this work, we present a novel graph-based deep learning framework for disease subnetwork detection via explainable GNNs. Each patient is represented by the topology of a protein-protein interaction (PPI) network, and the nodes are enriched with multi-omics features from gene expression and DNA methylation. In addition, we propose a modification of the GNNexplainer that provides model-wide explanations for improved disease subnetwork detection. AVAILABILITY AND IMPLEMENTATION The proposed methods and tools are implemented in the GNN-SubNet Python package, which we have made available on our GitHub for the international research community (https://github.com/pievos101/GNN-SubNet). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
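The patient representation described in this abstract (one shared PPI topology, with per-patient node features from gene expression and DNA methylation) can be pictured with a small data-structure sketch. This is not the GNN-SubNet package's API. The `PatientGraph` class, the `build_patient_graphs` function and the input dictionaries are hypothetical names assumed for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PatientGraph:
    edges: list     # PPI edges (gene_a, gene_b), identical for every patient
    features: dict  # gene -> (expression, methylation) for this patient
    label: int      # class label, e.g. disease status


def build_patient_graphs(ppi_edges, expression, methylation, labels):
    """Build one graph per patient over a shared PPI topology.

    `expression` and `methylation` are assumed to map
    patient id -> gene -> value; `labels` maps patient id -> class.
    """
    genes = sorted({g for edge in ppi_edges for g in edge})
    graphs = []
    for pid in sorted(labels):
        feats = {g: (expression[pid][g], methylation[pid][g]) for g in genes}
        graphs.append(
            PatientGraph(edges=list(ppi_edges), features=feats, label=labels[pid])
        )
    return graphs
```

Because every patient shares the same edge list, only the node features and the label vary between graphs, which is what lets a graph-level classifier and an explainer point to a common subnetwork.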
Affiliation(s)
- Bastian Pfeifer
  - Institute for Medical Informatics Statistics and Documentation, Medical University Graz, Graz, Austria
- Anna Saranti
  - Institute for Medical Informatics Statistics and Documentation, Medical University Graz, Graz, Austria
- Andreas Holzinger
  - Institute for Medical Informatics Statistics and Documentation, Medical University Graz, Graz, Austria
  - Human-Centered AI Lab, Department of Forest- and Soil Sciences, University of Natural Resources and Life Sciences Vienna, Vienna, Austria
  - Alberta Machine Intelligence Institute, University of Alberta, Edmonton, Canada
|
46
|
|
47
|
Yang Z, Yan Y, Gan H, Zhao J, Ye Z. A safe semi-supervised graph convolution network. Mathematical Biosciences and Engineering: MBE 2022; 19:12677-12692. [PMID: 36654017 DOI: 10.3934/mbe.2022592] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/17/2023]
Abstract
In the semi-supervised learning field, the Graph Convolution Network (GCN), a variant of the GNN, has achieved promising results on non-Euclidean data by introducing convolution into GNNs. However, GCN and its variants fail to safely use the information in risky unlabeled data, which degrades the performance of semi-supervised learning. Therefore, we propose a Safe GCN framework (Safe-GCN) to improve the learning performance. In Safe-GCN, we design an iterative process to label the unlabeled data. In each iteration, a GCN and its supervised version (S-GCN) are learned to find the unlabeled data with high confidence. The high-confidence unlabeled data and their pseudo labels are then added to the label set. Finally, both the added unlabeled data and the labeled data are used to train an S-GCN, which achieves safe exploration of the risky unlabeled data and enables safe use of large numbers of unlabeled data. The performance of Safe-GCN is evaluated on three well-known citation network datasets, and the results demonstrate the effectiveness of the proposed framework over several graph-based semi-supervised learning methods.
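The iterative labeling process described in this abstract can be sketched as a generic pseudo-labeling loop. This is not the Safe-GCN implementation. The two models are replaced by a toy nearest-centroid classifier on scalar inputs, and the acceptance rule (both models agree and both confidences reach a threshold) is an assumption, since the abstract does not state the exact criterion. All names below are hypothetical.

```python
def fit_centroid(labeled):
    """Toy stand-in for a trained model: nearest class centroid on scalars."""
    sums, counts = {}, {}
    for x, y in labeled.items():
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}

    def predict(x):
        dists = {y: abs(x - c) for y, c in centroids.items()}
        best = min(dists, key=dists.get)
        total = sum(dists.values())
        # Confidence is high when x is much closer to one centroid than the rest.
        conf = 1.0 if total == 0 else 1.0 - dists[best] / total
        return best, conf

    return predict


def pseudo_label_loop(labeled, unlabeled, fit_a, fit_b, threshold=0.9, max_iter=10):
    """Grow the label set with points both models label confidently and alike."""
    labeled = dict(labeled)  # item -> label
    pool = set(unlabeled)
    for _ in range(max_iter):
        if not pool:
            break
        predict_a = fit_a(labeled)  # retrain both models on the current label set
        predict_b = fit_b(labeled)
        accepted = {}
        for x in pool:
            label_a, conf_a = predict_a(x)
            label_b, conf_b = predict_b(x)
            if label_a == label_b and min(conf_a, conf_b) >= threshold:
                accepted[x] = label_a
        if not accepted:
            break  # nothing confident enough; leave the rest unlabeled
        labeled.update(accepted)
        pool.difference_update(accepted)
    return labeled, pool
```

In the paper's setting, `fit_a` and `fit_b` would correspond to training the GCN and the S-GCN; points left in `pool` are the ones treated as too risky to pseudo-label.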
Affiliation(s)
- Zhi Yang
  - School of Computer Science, Hubei University of Technology, Wuhan 430068, China
  - State Key Laboratory of Biocatalysis and Enzyme Engineering, School of Life Sciences, Hubei University, Wuhan 430062, China
- Yadong Yan
  - School of Computer Science, Hubei University of Technology, Wuhan 430068, China
- Haitao Gan
  - School of Computer Science, Hubei University of Technology, Wuhan 430068, China
  - State Key Laboratory of Biocatalysis and Enzyme Engineering, School of Life Sciences, Hubei University, Wuhan 430062, China
- Jing Zhao
  - State Key Laboratory of Biocatalysis and Enzyme Engineering, School of Life Sciences, Hubei University, Wuhan 430062, China
- Zhiwei Ye
  - School of Computer Science, Hubei University of Technology, Wuhan 430068, China
|
48
|
Alharbi WS, Rashid M. A review of deep learning applications in human genomics using next-generation sequencing data. Hum Genomics 2022; 16:26. [PMID: 35879805 PMCID: PMC9317091 DOI: 10.1186/s40246-022-00396-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/24/2021] [Accepted: 07/12/2022] [Indexed: 12/02/2022] Open
Abstract
Genomics is advancing towards data-driven science. With the advent of high-throughput data-generating technologies in human genomics, we are overwhelmed by the volume of genomic data. To extract knowledge and patterns from these data, artificial intelligence, especially deep learning, has been instrumental. In this review, we address the development and application of deep learning methods and models in different subareas of human genomics. We assess which areas of genomics are over- and under-charted by deep learning techniques. The deep learning algorithms underlying the genomic tools are discussed briefly in the later part of the review. Finally, we briefly discuss recent applications of deep learning tools in genomics. In conclusion, this review is timely for biotechnology and genomics scientists, guiding them on why, when and how to use deep learning methods to analyse human genomic data.
Affiliation(s)
- Wardah S Alharbi
  - Department of AI and Bioinformatics, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdulaziz Medical City, Ministry of National Guard Health Affairs, P.O. Box 22490, Riyadh, 11426, Saudi Arabia
- Mamoon Rashid
  - Department of AI and Bioinformatics, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdulaziz Medical City, Ministry of National Guard Health Affairs, P.O. Box 22490, Riyadh, 11426, Saudi Arabia
|
49
|
Artificial intelligence and machine-learning approaches in structure and ligand-based discovery of drugs affecting central nervous system. Mol Divers 2022; 27:959-985. [PMID: 35819579 DOI: 10.1007/s11030-022-10489-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/27/2022] [Accepted: 06/21/2022] [Indexed: 12/11/2022]
Abstract
CNS disorders are indications with very high unmet medical need, a relatively small number of available drugs, and a subpar satisfaction level among patients and caregivers. Discovery of CNS drugs is an extremely expensive affair with its own unique challenges, leading to extremely high attrition rates and low efficiency. With the explosion of data in the information age, there is hardly any aspect of life that has not been touched by data-driven technologies such as artificial intelligence (AI) and machine learning (ML). Drug discovery is no exception: the emergence of big data via genomic, proteomic, biological, and chemical technologies has driven pharmaceutical giants to collaborate with AI-oriented companies to revolutionise drug discovery, with the goal of increasing the efficiency of the process. In recent years, many examples of innovative applications of AI and ML techniques in CNS drug discovery have been reported. Research on therapeutics for diseases such as schizophrenia, Alzheimer's and Parkinsonism has been given new direction and thrust by these developments. AI and ML have been applied to both ligand-based and structure-based drug discovery and design of CNS therapeutics. In this review, we summarise the general aspects of AI and ML from the perspective of drug discovery, followed by comprehensive coverage of recent developments in the application of AI/ML techniques in CNS drug discovery.
|
50
|
Fang Z, Peltz G. An automated multi-modal graph-based pipeline for mouse genetic discovery. Bioinformatics 2022; 38:3385-3394. [PMID: 35608290 PMCID: PMC9992076 DOI: 10.1093/bioinformatics/btac356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2022] [Revised: 04/18/2022] [Accepted: 05/19/2022] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Our ability to identify causative genetic factors for mouse genetic models of human diseases and biomedical traits has been limited by the difficulties associated with identifying true causative factors, which are often obscured by the many false positive genetic associations produced by a GWAS. RESULTS To accelerate the pace of genetic discovery, we developed a graph neural network (GNN)-based automated pipeline (GNNHap) that could rapidly analyze mouse genetic model data and identify high-probability causal genetic factors for analyzed traits. After assessing the strength of allelic associations with the strain response pattern, this pipeline analyzes 29M published papers to assess candidate gene-phenotype relationships and incorporates the information obtained from a protein-protein interaction network and protein sequence features into the analysis. The GNN model produces markedly improved results relative to those of a simple linear neural network. We demonstrate that GNNHap can identify novel causative genetic factors for murine models of diabetes/obesity and for cataract formation, which were validated by the phenotypes appearing in previously analyzed gene knockout mice. The diabetes/obesity results indicate how characterization of the underlying genetic architecture enables new therapies to be discovered and tested by applying 'precision medicine' principles to murine models. AVAILABILITY AND IMPLEMENTATION The GNNHap source code is freely available at https://github.com/zqfang/gnnhap, and the new version of the HBCGM program is available at https://github.com/zqfang/haplomap. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhuoqing Fang
  - Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Gary Peltz
  - Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
|