1
|
Chowdhury S, Chen Y, Li P, Rajaganapathy S, Wen A, Ma X, Dai Q, Yu Y, Fu S, Jiang X, He Z, Sohn S, Liu X, Bielinski SJ, Chamberlain AM, Cerhan JR, Zong N. Stratifying heart failure patients with graph neural network and transformer using Electronic Health Records to optimize drug response prediction. J Am Med Inform Assoc 2024; 31:1671-1681. [PMID: 38926131 PMCID: PMC11258417 DOI: 10.1093/jamia/ocae137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 05/05/2024] [Accepted: 05/30/2024] [Indexed: 06/28/2024] Open
Abstract
OBJECTIVES Heart failure (HF) impacts millions of patients worldwide, yet the variability in treatment responses remains a major challenge for healthcare professionals. The current treatment strategies, largely derived from population-based evidence, often fail to consider the unique characteristics of individual patients, resulting in suboptimal outcomes. This study aims to develop patient-specific computational models for predicting treatment outcomes by utilizing a large Electronic Health Records (EHR) database. The goal is to improve drug response predictions by identifying specific HF patient subgroups that are likely to benefit from existing HF medications. MATERIALS AND METHODS We developed a novel graph-based model that combines a Graph Neural Network and a Transformer to predict treatment responses. This method differs from conventional approaches by transforming a patient's EHR data into a graph structure. By defining patient subgroups based on this representation via K-Means Clustering, we were able to enhance the performance of drug response predictions. RESULTS Leveraging EHR data from 11 627 Mayo Clinic HF patients, our model significantly outperformed traditional models in predicting drug response using NT-proBNP as an HF biomarker across five medication categories (best RMSE of 0.0043). Four distinct patient subgroups were identified with differential characteristics and outcomes, demonstrating superior predictive capabilities over existing HF subtypes (best mean RMSE of 0.0032). DISCUSSION These results highlight the power of graph-based modeling of EHR in improving HF treatment strategies. The stratification of patients sheds light on particular patient segments that could benefit more significantly from tailored response predictions. CONCLUSIONS Longitudinal EHR data have the potential to enhance personalized prognostic predictions through the application of graph-based AI techniques.
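A minimal sketch of the evaluation metric named in this abstract, assuming predictions have already been made and each patient carries a subgroup label (for example, a K-Means cluster index). The helper names `rmse` and `rmse_by_subgroup` and the sample values are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def rmse_by_subgroup(y_true, y_pred, labels):
    """Group the pairs by subgroup label and compute RMSE within each group."""
    groups = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, labels):
        groups[g][0].append(t)
        groups[g][1].append(p)
    return {g: rmse(ts, ps) for g, (ts, ps) in groups.items()}


# Made-up values for illustration only (not data from the study).
y_true = [0.10, 0.20, 0.30, 0.40]
y_pred = [0.10, 0.25, 0.30, 0.30]
labels = [0, 0, 1, 1]  # hypothetical cluster assignments
per_group = rmse_by_subgroup(y_true, y_pred, labels)
```

Reporting the error per subgroup, rather than only over the whole cohort, is what lets a stratified model be compared against existing subtype definitions.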
Affiliation(s)
- Shaika Chowdhury
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN 55902, United States
- Yongbin Chen
  - Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN 55902, United States
- Pengyang Li
  - Division of Cardiology, Pauley Heart Center, Virginia Commonwealth University, Richmond, VA 23219, United States
- Sivaraman Rajaganapathy
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN 55902, United States
- Andrew Wen
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX 77030, United States
- Xiao Ma
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN 55902, United States
- Qiying Dai
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN 55902, United States
- Yue Yu
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55902, United States
- Sunyang Fu
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX 77030, United States
- Xiaoqian Jiang
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX 77030, United States
- Zhe He
  - School of Information, Florida State University, Tallahassee, FL 32306, United States
- Sunghwan Sohn
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN 55902, United States
- Xiaoke Liu
  - Department of Cardiovascular Medicine, Mayo Clinic, La Crosse, WI 54601, United States
- Suzette J Bielinski
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55902, United States
- Alanna M Chamberlain
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN 55902, United States
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55902, United States
- James R Cerhan
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55902, United States
- Nansu Zong
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN 55902, United States
|
2
|
He SH, Yun L, Yi HC. Accurate prediction of drug combination risk levels based on relational graph convolutional network and multi-head attention. J Transl Med 2024; 22:572. [PMID: 38880914 PMCID: PMC11180398 DOI: 10.1186/s12967-024-05372-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 06/02/2024] [Indexed: 06/18/2024] Open
Abstract
BACKGROUND Accurately identifying the risk level of drug combinations is of great significance in investigating the mechanisms of combination medication and adverse reactions. Most existing methods can only predict whether there is an interaction between two drugs, but cannot directly determine their accurate risk level. METHODS In this study, we propose a multi-class drug combination risk prediction model named AERGCN-DDI, utilizing a relational graph convolutional network with a multi-head attention mechanism. Drug-drug interaction events with varying risk levels are modeled as a heterogeneous information graph. Attribute features of drug nodes and links are learned based on compound chemical structure information. Finally, AERGCN-DDI predicts the drug combination risk level using heterogeneous graph neural network and multi-head attention modules. RESULTS To evaluate the effectiveness of the proposed method, five-fold cross-validation and an ablation study were conducted. Furthermore, we compared its predictive performance with baseline models and other state-of-the-art methods on two benchmark datasets. Empirical studies demonstrated the superior performance of AERGCN-DDI. CONCLUSIONS AERGCN-DDI emerges as a valuable tool for predicting the risk levels of drug combinations, thereby aiding in clinical medication decision-making, mitigating severe drug side effects, and enhancing patient clinical prognosis.
Affiliation(s)
- Shi-Hui He
  - School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
  - Engineering Research Center of Computer Vision and Intelligent Control Technology, Department of Education, Kunming, 650500, China
- Lijun Yun
  - School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
  - Engineering Research Center of Computer Vision and Intelligent Control Technology, Department of Education, Kunming, 650500, China
- Hai-Cheng Yi
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710129, China
|
3
|
Zhang Y, Wang Z, Wei H, Chen M. Exploring potential circRNA biomarkers for cancers based on double-line heterogeneous graph representation learning. BMC Med Inform Decis Mak 2024; 24:159. [PMID: 38844961 PMCID: PMC11157868 DOI: 10.1186/s12911-024-02564-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2024] [Accepted: 06/04/2024] [Indexed: 06/09/2024] Open
Abstract
BACKGROUND Compared with time-consuming and labor-intensive biological validation in vitro or in vivo, computational models can provide high-quality, targeted candidates quickly. Existing computational models face limitations in effectively utilizing sparse local structural information for accurate predictions in circRNA-disease associations. This study addresses this challenge with a proposed method, CDA-DGRL (Prediction of CircRNA-Disease Association based on Double-line Graph Representation Learning), which employs a deep learning framework leveraging graph networks and a dual-line representation model integrating graph node features. METHOD CDA-DGRL comprises several key steps. First, diverse biological information is integrated to compute similarities among circRNAs and diseases, leading to the construction of a heterogeneous network specific to circRNA-disease associations. Second, circRNA and disease node features are derived using sparse autoencoders. Third, a graph convolutional neural network captures the local graph network structure, taking the circRNA-disease heterogeneous network and the node features as input. Fourth, node2vec performs depth-first sampling of the circRNA-disease heterogeneous network to capture the global graph network structure, addressing issues associated with sparse raw data. Finally, the fused local and global graph network structures are fed into an extra trees classifier to identify potential circRNA-disease associations. RESULTS The results, obtained through five-fold cross-validation on the circR2Disease dataset, demonstrate the superiority of CDA-DGRL, with an AUC value of 0.9866 and an AUPR value of 0.9897, compared to existing state-of-the-art models. Notably, the extra trees classifier employed in this model outperforms other machine learning classifiers.
CONCLUSION Thus, CDA-DGRL stands as a promising methodology for reliably identifying circRNA-disease associations, offering potential avenues to alleviate the necessity for extensive traditional biological experiments. The source code and data for this study are available at https://github.com/zywait/CDA-DGRL .
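The graph convolution step mentioned in the METHOD section can be illustrated with a short sketch. It uses the widely known propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W); whether CDA-DGRL uses exactly this variant is an assumption, and the function name `gcn_layer` and its arguments are hypothetical.

```python
import math


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def gcn_layer(adj, features, weights):
    """One graph-convolution step with symmetric normalization and ReLU.

    adj      : n x n adjacency matrix (no self-loops required)
    features : n x d node feature matrix H
    weights  : d x k weight matrix W (assumed to be learned elsewhere)
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: entry (i, j) is divided by sqrt(deg_i * deg_j).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    hidden = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Two connected nodes with one feature each and an identity weight.
out = gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]])
```

Each node's output mixes its own features with those of its direct neighbours, which is why this kind of layer captures local structure while a random-walk method such as node2vec is used for the wider context.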
Affiliation(s)
- Yi Zhang
  - School of Computer Science and Engineering, Guilin University of Technology, Guilin, 541004, China
  - Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin, 541004, China
- ZhenMei Wang
  - School of Big Data, Guangxi Vocational and Technical College, Nanning, 530003, China
- Hanyan Wei
  - Pharmacy School, Guilin Medical University, Guilin, 541004, China
- Min Chen
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421010, China
|
4
|
Liu W, Teng Z, Li Z, Chen J. CVGAE: A Self-Supervised Generative Method for Gene Regulatory Network Inference Using Single-Cell RNA Sequencing Data. Interdiscip Sci 2024:10.1007/s12539-024-00633-y. [PMID: 38778003 DOI: 10.1007/s12539-024-00633-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Revised: 04/07/2024] [Accepted: 04/09/2024] [Indexed: 05/25/2024]
Abstract
Gene regulatory network (GRN) inference based on single-cell RNA sequencing (scRNA-seq) data plays a crucial role in understanding the regulatory mechanisms between genes. Various computational methods have been employed for GRN inference, but their network accuracy and model generalization are unsatisfactory, mainly because of high-dimensional data and network sparsity. In this paper, we propose CVGAE, a self-supervised method for gene regulatory network inference using single-cell RNA sequencing data. CVGAE uses a graph neural network for inductive representation learning, which merges gene expression data and observed topology into a low-dimensional vector space. The trained vectors are used to calculate the distance between genes and, in turn, to predict interactions between genes. In the overall framework, FastICA is applied to relieve the computational complexity caused by high-dimensional data, and CVGAE adopts multiple stacked GraphSAGE layers as an encoder and an improved decoder to overcome network sparsity. CVGAE is evaluated on several single-cell datasets containing four related ground-truth networks, and the results show that CVGAE achieves better performance than the compared methods. To validate its learning and generalization capabilities, CVGAE is applied in a few-shot setting by changing the ratio of the training set to the test set. Under few-shot conditions, CVGAE obtains comparable or superior performance.
Affiliation(s)
- Wei Liu
  - School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Zhijie Teng
  - School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Zejun Li
  - School of Computer Science and Engineering, Hunan Institute of Technology, Hengyang, 412002, China
- Jing Chen
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
|
5
|
Li YC, You ZH, Yu CQ, Wang L, Hu L, Hu PW, Qiao Y, Wang XF, Huang YA. DeepCMI: a graph-based model for accurate prediction of circRNA-miRNA interactions with multiple information. Brief Funct Genomics 2024; 23:276-285. [PMID: 37539561 DOI: 10.1093/bfgp/elad030] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/18/2023] [Revised: 05/25/2023] [Accepted: 07/13/2023] [Indexed: 08/05/2023] Open
Abstract
Recently, the role of competing endogenous RNAs in regulating gene expression through the interaction of microRNAs has been closely associated with the expression of circular RNAs (circRNAs) in various biological processes such as reproduction and apoptosis. While the number of confirmed circRNA-miRNA interactions (CMIs) continues to increase, the conventional in vitro approaches for discovery are expensive, labor intensive, and time consuming. Therefore, there is an urgent need for effective prediction of potential CMIs through appropriate data modeling and prediction based on known information. In this study, we proposed a novel model, called DeepCMI, that utilizes multi-source information on circRNA/miRNA to predict potential CMIs. Comprehensive evaluations on the CMI-9905 and CMI-9589 datasets demonstrated that DeepCMI successfully infers potential CMIs. Specifically, DeepCMI achieved AUC values of 90.54% and 94.8% on the CMI-9905 and CMI-9589 datasets, respectively. These results suggest that DeepCMI is an effective model for predicting potential CMIs and has the potential to significantly reduce the need for downstream in vitro studies. To facilitate the use of our trained model and data, we have constructed a computational platform, which is available at http://120.77.11.78/DeepCMI/. The source code and datasets used in this work are available at https://github.com/LiYuechao1998/DeepCMI.
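The AUC values quoted in this abstract are areas under the ROC curve. As a hedged illustration of what that number measures, the sketch below computes AUC as the fraction of positive/negative pairs that the model ranks correctly, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical and unrelated to the CMI datasets.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive example outranks a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count correctly ordered pairs; a tie contributes half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

This pairwise form is quadratic in the number of examples, so it is meant for explanation rather than for large benchmark sets.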
Affiliation(s)
- Yue-Chao Li
  - School of Information Engineering, Xijing University, Xi'an, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi'an, China
- Lei Wang
  - Guangxi Academy of Sciences, Nanning, China
- Lun Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
- Peng-Wei Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
- Yan Qiao
  - College of Agriculture and Forestry, Longdong University, Qingyang 745000, China
- Xin-Fei Wang
  - School of Information Engineering, Xijing University, Xi'an, China
- Yu-An Huang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
|
6
|
Labarga A, Martínez-Gonzalez J, Barajas M. Integrative Multi-Omics Analysis for Etiology Classification and Biomarker Discovery in Stroke: Advancing towards Precision Medicine. Biology 2024; 13:338. [PMID: 38785820 PMCID: PMC11149453 DOI: 10.3390/biology13050338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2024] [Revised: 05/02/2024] [Accepted: 05/06/2024] [Indexed: 05/25/2024]
Abstract
Recent advancements in high-throughput omics technologies have opened new avenues for investigating stroke at the molecular level and elucidating the intricate interactions among various molecular components. We present a novel approach for multi-omics data integration on knowledge graphs and have applied it to a stroke etiology classification task of 30 stroke patients through the integrative analysis of DNA methylation and mRNA, miRNA, and circRNA. This approach has demonstrated promising performance as compared to other existing single technology approaches.
Affiliation(s)
- Alberto Labarga
  - Health Science Department, Public University of Navarra, 31006 Pamplona, Spain
- Miguel Barajas
  - Health Science Department, Public University of Navarra, 31006 Pamplona, Spain
|
7
|
Zhao YX, Yu CQ, Li LP, Wang DW, Song HF, Wei Y. BJLD-CMI: a predictive circRNA-miRNA interactions model combining multi-angle feature information. Front Genet 2024; 15:1399810. [PMID: 38798699 PMCID: PMC11116695 DOI: 10.3389/fgene.2024.1399810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 04/03/2024] [Indexed: 05/29/2024] Open
Abstract
A growing body of research suggests that circular RNA (circRNA) plays a crucial role in the pathogenesis of complex human diseases by binding to miRNA. Identifying their potential interactions is of paramount importance for the diagnosis and treatment of diseases. However, traditional wet-lab experiments are characterized by long cycles, small scales, and time-consuming processes. Consequently, the use of efficient computational models to forecast the interactions between circRNA and miRNA is gradually becoming mainstream. In this study, we present a new prediction model named BJLD-CMI. The model extracts circRNA and miRNA sequence features using Jaccard similarity and BERT and integrates them to obtain CMI attribute features. It then uses the graph embedding method LINE to extract CMI behavioral features from the known circRNA-miRNA association graph. Finally, we predict potential circRNA-miRNA interactions by fusing the multi-angle feature information, such as attribute and behavior features, through Autoencoder in Autoencoder Networks. BJLD-CMI attained areas under the ROC curve of 94.95% and 90.69% on the CMI-9589 and CMI-9905 datasets, respectively. Compared with existing models, BJLD-CMI exhibits the best overall performance. In the case study, a PubMed literature search confirmed seven of the top 10 predicted CMIs. These results suggest that BJLD-CMI is an effective method for predicting interactions between circRNAs and miRNAs. It provides valuable candidates for wet-lab experiments and can reduce the burden on researchers.
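As an illustration of how a Jaccard measure can be applied to RNA sequences, the sketch below compares two sequences through their sets of overlapping k-mers. The choice of k-mers with k = 3 and the function names `kmers` and `jaccard` are assumptions for this example; the abstract does not say how BJLD-CMI builds the sets it compares.

```python
def kmers(seq, k=3):
    """Return the set of overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(seq_a, seq_b, k=3):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between the k-mer sets of two sequences."""
    set_a, set_b = kmers(seq_a, k), kmers(seq_b, k)
    union = set_a | set_b
    # Two sequences shorter than k share no k-mers; define their similarity as 0.
    return len(set_a & set_b) / len(union) if union else 0.0


# "AUGC" -> {AUG, UGC}; "AUGG" -> {AUG, UGG}; one shared k-mer out of three.
similarity = jaccard("AUGC", "AUGG")
```

The score lies between 0 (no shared k-mers) and 1 (identical k-mer sets), so it can be used directly as a pairwise similarity feature.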
Affiliation(s)
- Yi-Xin Zhao
  - School of Information Engineering, Xijing University, Xi’an, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi’an, China
- Li-Ping Li
  - School of Information Engineering, Xijing University, Xi’an, China
  - College of Grassland and Environment Sciences, Xinjiang Agricultural University, Ürümqi, China
- Deng-Wu Wang
  - School of Information Engineering, Xijing University, Xi’an, China
- Hui-Fan Song
  - School of Information Engineering, Xijing University, Xi’an, China
- Yu Wei
  - School of Information Engineering, Xijing University, Xi’an, China
|
8
|
Wei H, Gao L, Wu S, Jiang Y, Liu B. DiSMVC: a multi-view graph collaborative learning framework for measuring disease similarity. Bioinformatics 2024; 40:btae306. [PMID: 38715444 PMCID: PMC11256965 DOI: 10.1093/bioinformatics/btae306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 04/19/2024] [Accepted: 05/05/2024] [Indexed: 05/30/2024] Open
Abstract
MOTIVATION Exploring potential associations between diseases can help in understanding pathological mechanisms of diseases and facilitating the discovery of candidate biomarkers and drug targets, thereby promoting disease diagnosis and treatment. Some computational methods have been proposed for measuring disease similarity. However, these methods describe diseases without considering their latent multi-molecule regulation and valuable supervision signal, resulting in limited biological interpretability and efficiency to capture association patterns. RESULTS In this study, we propose a new computational method named DiSMVC. Different from existing predictors, DiSMVC designs a supervised graph collaborative framework to measure disease similarity. Multiple bio-entity associations related to genes and miRNAs are integrated via cross-view graph contrastive learning to extract informative disease representation, and then association pattern joint learning is implemented to compute disease similarity by incorporating phenotype-annotated disease associations. The experimental results show that DiSMVC can draw discriminative characteristics for disease pairs, and outperform other state-of-the-art methods. As a result, DiSMVC is a promising method for predicting disease associations with molecular interpretability. AVAILABILITY AND IMPLEMENTATION Datasets and source codes are available at https://github.com/Biohang/DiSMVC.
Affiliation(s)
- Hang Wei
  - School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
- Shuai Wu
  - School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
- Yina Jiang
  - Department of Basic Medicine, Shaanxi University of Chinese Medicine, Xianyang, Shaanxi 712046, China
- Bin Liu
  - Faculty of Engineering, Shenzhen MSU-BIT University, Shenzhen, Guangdong 518172, China
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
|
9
|
Su Y, Liu J, Wu Q, Gao Z, Wang J, Li H, Zheng C. AMPFLDAP: Adaptive Message Passing and Feature Fusion on Heterogeneous Network for LncRNA-Disease Associations Prediction. Interdiscip Sci 2024:10.1007/s12539-024-00610-5. [PMID: 38581626 DOI: 10.1007/s12539-024-00610-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 01/03/2024] [Accepted: 01/03/2024] [Indexed: 04/08/2024]
Abstract
Exploring the connections between long noncoding RNAs (lncRNAs) and diseases, referred to as lncRNA-disease associations (LDAs), plays a pivotal role in unraveling the underlying molecular mechanisms of diseases and devising practical treatment approaches. Computational methods for predicting LDAs help avoid unnecessary experimental effort. Graph-based learning models have gained substantial popularity in predicting these associations, primarily because of their capacity to leverage node attributes and relationships within the network. Nevertheless, there remains much room for enhancing the performance of these techniques by incorporating and harmonizing the node attributes more effectively. In this context, we introduce a novel model, Adaptive Message Passing and Feature Fusion (AMPFLDAP), for predicting LDAs within a heterogeneous network. First, we construct a heterogeneous network involving lncRNAs, microRNAs (miRNAs), and diseases based on established associations, using Gaussian interaction profile kernel similarity as a measure. Then, an adaptive topological message passing mechanism is proposed to handle information aggregation in heterogeneous networks, and the topological features of nodes in the heterogeneous network are extracted with this mechanism. Moreover, an attention mechanism is applied to integrate both topological and semantic information into multimodal features of biomolecules, which are further used to predict potential LDAs. The experimental results demonstrate that the proposed AMPFLDAP outperforms seven state-of-the-art methods. Furthermore, to validate its efficacy in practical scenarios, we conducted detailed case studies involving three distinct diseases, which demonstrated AMPFLDAP's effectiveness in the prediction of LDAs.
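The Gaussian interaction profile (GIP) kernel similarity mentioned in this abstract is commonly defined as K(i, j) = exp(-γ ‖IP(i) - IP(j)‖²), where IP(i) is row i of a binary association matrix and γ is a bandwidth γ' divided by the mean squared norm of the profiles. The sketch below follows that common definition; the function name `gip_kernel`, the default γ' = 1, and the toy matrix are assumptions, and the paper may parameterize the kernel differently.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Pairwise GIP kernel similarity between interaction profiles.

    profiles    : list of equal-length rows, e.g. lncRNA-by-disease associations
    gamma_prime : bandwidth before normalization (assumed default of 1.0)
    """
    norms = [sum(v * v for v in row) for row in profiles]
    mean_norm = sum(norms) / len(norms)
    if mean_norm == 0:
        raise ValueError("all interaction profiles are empty")
    # Normalize the bandwidth by the average squared profile norm.
    gamma = gamma_prime / mean_norm
    n = len(profiles)
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Two entities associated with disjoint partners: squared distance 2, gamma 1.
kernel = gip_kernel([[1, 0], [0, 1]])
```

Entities with identical association profiles receive similarity 1, and the value decays toward 0 as their profiles diverge.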
Affiliation(s)
- Yansen Su
  - Key Laboratory of Intelligent Computing and Signal Processing, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Jingjing Liu
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei, 230088, Anhui, China
- Qingwen Wu
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei, 230088, Anhui, China
- Zhen Gao
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei, 230088, Anhui, China
- Jing Wang
  - Key Laboratory of Intelligent Computing and Signal Processing, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei, 230088, Anhui, China
- Haitao Li
  - Key Laboratory of Intelligent Computing and Signal Processing, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Chunhou Zheng
  - Key Laboratory of Intelligent Computing and Signal Processing, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
|
10
|
Varghese AJ, Bora A, Xu M, Karniadakis GE. TransformerG2G: Adaptive time-stepping for learning temporal graph embeddings using transformers. Neural Netw 2024; 172:106086. [PMID: 38159511 DOI: 10.1016/j.neunet.2023.12.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 12/18/2023] [Accepted: 12/22/2023] [Indexed: 01/03/2024]
Abstract
Dynamic graph embedding has emerged as a very effective technique for addressing diverse temporal graph analytic tasks (e.g., link prediction, node classification, recommender systems, anomaly detection, and graph generation) in various applications. Such temporal graphs exhibit heterogeneous transient dynamics, varying time intervals, and highly evolving node features throughout their evolution. Hence, incorporating long-range dependencies from the historical graph context plays a crucial role in accurately learning their temporal dynamics. In this paper, we develop a graph embedding model with uncertainty quantification, TransformerG2G, by exploiting the advanced transformer encoder to first learn intermediate node representations from a node's current state (t) and previous context (over timestamps [t-1, t-l], where l is the length of the context). Moreover, we employ two projection layers to generate lower-dimensional multivariate Gaussian distributions as each node's latent embedding at timestamp t. We consider diverse benchmarks with varying levels of "novelty" as measured by the TEA (Temporal Edge Appearance) plots. Our experiments demonstrate that the proposed TransformerG2G model outperforms conventional multi-step methods and our prior work (DynG2G) in terms of both link prediction accuracy and computational efficiency, especially for graphs with a high degree of novelty. Furthermore, the learned time-dependent attention weights across multiple graph snapshots reveal the development of an automatic adaptive time stepping enabled by the transformer. Importantly, by examining the attention weights, we can uncover temporal dependencies, identify influential elements, and gain insights into the complex interactions within the graph structure. For example, we identified a strong correlation between attention weights and node degree at the various stages of the graph topology evolution.
Affiliation(s)
- Aniruddha Bora
  - Division of Applied Mathematics, Brown University, Providence, RI 02912, USA
- Mengjia Xu
  - Department of Data Science, New Jersey Institute of Technology, Newark, NJ 07102, USA; Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- George Em Karniadakis
  - School of Engineering, Brown University, Providence, RI 02912, USA; Division of Applied Mathematics, Brown University, Providence, RI 02912, USA; Pacific Northwest National Laboratory, Richland, WA 99354, USA
|
11
|
Tao H, Cao J, Chen L, Sun H, Shi Y, Zhu X. Black-box attacks on dynamic graphs via adversarial topology perturbations. Neural Netw 2024; 171:308-319. [PMID: 38104509 DOI: 10.1016/j.neunet.2023.11.060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 11/13/2023] [Accepted: 11/27/2023] [Indexed: 12/19/2023]
Abstract
Research on and analysis of attacks on dynamic graphs help information systems investigate vulnerabilities and strengthen their ability to resist malicious attacks. Existing attacks on dynamic graphs mainly focus on rewiring original graph structures, which is often infeasible in real-world scenarios. To address this issue, we adopt a novel strategy of injecting both fake nodes and links to attack dynamic graphs. Based on that, we present the first study on attacking dynamic graphs via adversarial topology perturbations in a restricted black-box setting, in which downstream graph learning tasks are unknown. Specifically, we first divide dynamic graph structure perturbations into three sub-tasks and transform them into a sequential decision-making process. Then, we propose a hierarchical reinforcement learning based black-box attack (HRBBA) framework to model the three sub-tasks as attack policies. In addition, an imperceptible perturbation constraint that guarantees the concealment of attacks is incorporated into HRBBA. Finally, HRBBA is optimized based on the actor-critic process. Extensive experiments on four real-world dynamic graphs show that the performance of diverse dynamic graph learning methods (victim methods) on tasks like link prediction, node classification and network clustering can be substantially degraded under HRBBA attack.
Affiliation(s)
- Haicheng Tao
  - College of Information Engineering, Nanjing University of Finance and Economics, 3 Wenyuan Road, Nanjing, 210023, Jiangsu, China
- Jie Cao
  - School of Management, Hefei University of Technology, 193 Tunxi Road, Hefei, 230009, Anhui, China
- Lei Chen
  - College of Information Science and Technology, Nanjing Forestry University, 159 Longpan Road, Nanjing, 210037, Jiangsu, China
- Hongliang Sun
  - College of Information Engineering, Nanjing University of Finance and Economics, 3 Wenyuan Road, Nanjing, 210023, Jiangsu, China
- Yong Shi
  - Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, 19A Yuquan Road, Beijing, 100190, China
- Xingquan Zhu
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, 33431, FL, USA
|
12
|
Guo LX, Wang L, You ZH, Yu CQ, Hu ML, Zhao BW, Li Y. Biolinguistic graph fusion model for circRNA-miRNA association prediction. Brief Bioinform 2024; 25:bbae058. [PMID: 38426324 PMCID: PMC10939421 DOI: 10.1093/bib/bbae058] [Received: 06/06/2023] [Revised: 01/19/2024] [Accepted: 01/27/2024] [Indexed: 03/02/2024] Open
Abstract
Emerging clinical evidence suggests that sophisticated associations with circular ribonucleic acids (RNAs) (circRNAs) and microRNAs (miRNAs) are a critical regulatory factor of various pathological processes and play a critical role in most intricate human diseases. Nonetheless, the above correlations via wet experiments are error-prone and labor-intensive, and the underlying novel circRNA-miRNA association (CMA) has been validated by numerous existing computational methods that rely only on single correlation data. Considering the inadequacy of existing machine learning models, we propose a new model named BGF-CMAP, which combines the gradient boosting decision tree with natural language processing and graph embedding methods to infer associations between circRNAs and miRNAs. Specifically, BGF-CMAP extracts sequence attribute features and interaction behavior features by Word2vec and two homogeneous graph embedding algorithms, large-scale information network embedding and graph factorization, respectively. Multitudinous comprehensive experimental analysis revealed that BGF-CMAP successfully predicted the complex relationship between circRNAs and miRNAs with an accuracy of 82.90% and an area under receiver operating characteristic of 0.9075. Furthermore, 23 of the top 30 miRNA-associated circRNAs of the studies on data were confirmed in relevant experiences, showing that the BGF-CMAP model is superior to others. BGF-CMAP can serve as a helpful model to provide a scientific theoretical basis for the study of CMA prediction.
Affiliation(s)
- Lu-Xiang Guo
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
- Lei Wang
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
  - College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129, China
- Chang-Qing Yu
  - College of Information Engineering, Xijing University, Xi’an 710123, China
- Meng-Lei Hu
  - School of Medicine, Peking University, Beijing, 100091, China
- Bo-Wei Zhao
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Yang Li
  - School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
|
13
|
Zhang C, Zang T, Zhao T. KGE-UNIT: toward the unification of molecular interactions prediction based on knowledge graph and multi-task learning on drug discovery. Brief Bioinform 2024; 25:bbae043. [PMID: 38348746 PMCID: PMC10939374 DOI: 10.1093/bib/bbae043] [Received: 07/04/2023] [Revised: 12/29/2023] [Accepted: 01/23/2024] [Indexed: 02/15/2024] Open
Abstract
The prediction of molecular interactions is vital for drug discovery. Existing methods often focus on individual prediction tasks and overlook the relationships between them. Additionally, certain tasks encounter limitations due to insufficient data availability, resulting in limited performance. To overcome these limitations, we propose KGE-UNIT, a unified framework that combines knowledge graph embedding (KGE) and multi-task learning, for simultaneous prediction of drug-target interactions (DTIs) and drug-drug interactions (DDIs) and enhancing the performance of each task, even when data availability is limited. Via KGE, we extract heterogeneous features from the drug knowledge graph to enhance the structural features of drug and protein nodes, thereby improving the quality of features. Additionally, employing multi-task learning, we introduce an innovative predictor that comprises the task-aware Convolutional Neural Network-based (CNN-based) encoder and the task-aware attention decoder, which can better fuse multimodal features, capture the contextual interactions of molecular tasks and enhance task awareness, leading to improved performance. Experiments on two imbalanced datasets for DTIs and DDIs demonstrate the superiority of KGE-UNIT, achieving high areas under the receiver operating characteristic curve (AUROCs) (0.942, 0.987) and areas under the precision-recall curve (AUPRs) (0.930, 0.980) for DTIs and high AUROCs (0.975, 0.989) and AUPRs (0.966, 0.988) for DDIs. Notably, on the LUO dataset where the data were more limited, KGE-UNIT exhibited a more pronounced improvement, with increases of 4.32% in AUROC and 3.56% in AUPR for DTIs and 6.56% in AUROC and 8.17% in AUPR for DDIs. The scalability of KGE-UNIT is demonstrated through its extension to protein-protein interactions prediction; ablation studies and case studies further validate its effectiveness.
Affiliation(s)
- Chengcheng Zhang
  - Department of Computer Science, Harbin Institute of Technology, Harbin, 150001, China
- Tianyi Zang
  - Department of Computer Science, Harbin Institute of Technology, Harbin, 150001, China
- Tianyi Zhao
  - School of Medicine and Health, Harbin Institute of Technology, Harbin, 150001, China
|
14
|
Ren ZH, Yu CQ, Li LP, You ZH, Li ZW, Zhang SW, Zeng X, Shang YF. SiSGC: A Drug Repositioning Prediction Model Based on Heterogeneous Simplifying Graph Convolution. J Chem Inf Model 2024; 64:238-249. [PMID: 38103039 DOI: 10.1021/acs.jcim.3c01665] [Indexed: 12/17/2023]
Abstract
Drug repositioning plays a key role in disease treatment. With the large-scale chemical data increasing, many computational methods are utilized for drug-disease association prediction. However, most of the existing models neglect the positive influence of non-Euclidean data and multisource information, and there is still a critical issue for graph neural networks regarding how to set the feature diffuse distance. To solve the problems, we proposed SiSGC, which makes full use of the biological knowledge information as initial features and learns the structure information from the constructed heterogeneous graph with the adaptive selection of the information diffuse distance. Then, the structural features are fused with the denoised similarity information and fed to the advanced classifier of CatBoost to make predictions. Three different data sets are used to confirm the robustness and generalization of SiSGC under two splitting strategies. Experiment results demonstrate that the proposed model achieves superior performance compared with the six leading methods and four variants. Our case study on breast neoplasms further indicates that SiSGC is trustworthy and robust yet simple. We also present four drugs for breast cancer treatment with high confidence and further give an explanation for demonstrating the rationality. There is no doubt that SiSGC can be used as a beneficial supplement for drug repositioning.
Affiliation(s)
- Zhong-Hao Ren
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi'an 710123, China
- Li-Ping Li
  - College of Agriculture and Forestry, Longdong University, Qingyang 745000, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Zheng-Wei Li
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Shan-Wen Zhang
  - School of Information Engineering, Xijing University, Xi'an 710123, China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Yi-Fan Shang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
|
15
|
Wang X, Duan M, Li J, Ma A, Xin G, Xu D, Li Z, Liu B, Ma Q. MarsGT: Multi-omics analysis for rare population inference using single-cell graph transformer. Nat Commun 2024; 15:338. [PMID: 38184630 PMCID: PMC10771517 DOI: 10.1038/s41467-023-44570-8] [Received: 08/07/2023] [Accepted: 12/14/2023] [Indexed: 01/08/2024] Open
Abstract
Rare cell populations are key in neoplastic progression and therapeutic response, offering potential intervention targets. However, their computational identification and analysis often lag behind major cell types. To fill this gap, we introduce MarsGT: Multi-omics Analysis for Rare population inference using a Single-cell Graph Transformer. It identifies rare cell populations using a probability-based heterogeneous graph transformer on single-cell multi-omics data. MarsGT outperforms existing tools in identifying rare cells across 550 simulated and four real human datasets. In mouse retina data, it reveals unique subpopulations of rare bipolar cells and a Müller glia cell subpopulation. In human lymph node data, MarsGT detects an intermediate B cell population potentially acting as lymphoma precursors. In human melanoma data, it identifies a rare MAIT-like population impacted by a high IFN-I response and reveals the mechanism of immunotherapy. Hence, MarsGT offers biological insights and suggests potential strategies for early detection and therapeutic intervention of disease.
Affiliation(s)
- Xiaoying Wang
  - School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
  - Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
- Maoteng Duan
  - School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Jingxian Li
  - School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Anjun Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
  - Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
- Gang Xin
  - Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
- Dong Xu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
  - Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
- Zihai Li
  - Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
- Bingqiang Liu
  - School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Qin Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
  - Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
|
16
|
Veleiro U, de la Fuente J, Serrano G, Pizurica M, Casals M, Pineda-Lucena A, Vicent S, Ochoa I, Gevaert O, Hernaez M. GeNNius: an ultrafast drug-target interaction inference method based on graph neural networks. Bioinformatics 2024; 40:btad774. [PMID: 38134424 PMCID: PMC10766589 DOI: 10.1093/bioinformatics/btad774] [Received: 06/19/2023] [Revised: 11/20/2023] [Accepted: 12/21/2023] [Indexed: 12/24/2023] Open
Abstract
MOTIVATION Drug-target interaction (DTI) prediction is a relevant but challenging task in the drug repurposing field. In-silico approaches have drawn particular attention as they can reduce associated costs and time commitment of traditional methodologies. Yet, current state-of-the-art methods present several limitations: existing DTI prediction approaches are computationally expensive, thereby hindering the ability to use large networks and exploit available datasets and, the generalization to unseen datasets of DTI prediction methods remains unexplored, which could potentially improve the development processes of DTI inferring approaches in terms of accuracy and robustness. RESULTS In this work, we introduce GeNNius (Graph Embedding Neural Network Interaction Uncovering System), a Graph Neural Network (GNN)-based method that outperforms state-of-the-art models in terms of both accuracy and time efficiency across a variety of datasets. We also demonstrated its prediction power to uncover new interactions by evaluating not previously known DTIs for each dataset. We further assessed the generalization capability of GeNNius by training and testing it on different datasets, showing that this framework can potentially improve the DTI prediction task by training on large datasets and testing on smaller ones. Finally, we investigated qualitatively the embeddings generated by GeNNius, revealing that the GNN encoder maintains biological information after the graph convolutions while diffusing this information through nodes, eventually distinguishing protein families in the node embedding space. AVAILABILITY AND IMPLEMENTATION GeNNius code is available at https://github.com/ubioinformat/GeNNius.
Affiliation(s)
- Uxía Veleiro
  - CIMA University of Navarra, IdiSNA, 31008 Pamplona, Spain
- Jesús de la Fuente
  - TECNUN, University of Navarra, 20016 San Sebastian, Spain
  - Center for Data Science, New York University, New York, NY 10012, United States
- Guillermo Serrano
  - CIMA University of Navarra, IdiSNA, 31008 Pamplona, Spain
  - TECNUN, University of Navarra, 20016 San Sebastian, Spain
- Marija Pizurica
  - Stanford Center for Biomedical Informatics Research, Department of Medicine and Department Biomedical Data Science, Stanford University, Stanford, CA 94305, United States
  - Internet Technology and Data Science LAB (IDLab), Ghent University, Gent 9052, Belgium
- Mikel Casals
  - TECNUN, University of Navarra, 20016 San Sebastian, Spain
- Silve Vicent
  - CIMA University of Navarra, IdiSNA, 31008 Pamplona, Spain
- Idoia Ochoa
  - TECNUN, University of Navarra, 20016 San Sebastian, Spain
  - Instituto de Ciencia de los Datos e Inteligencia Artificial (DATAI), University of Navarra, 31008 Pamplona, Spain
- Olivier Gevaert
  - Stanford Center for Biomedical Informatics Research, Department of Medicine and Department Biomedical Data Science, Stanford University, Stanford, CA 94305, United States
- Mikel Hernaez
  - CIMA University of Navarra, IdiSNA, 31008 Pamplona, Spain
  - Instituto de Ciencia de los Datos e Inteligencia Artificial (DATAI), University of Navarra, 31008 Pamplona, Spain
|
17
|
Zhao BW, Su XR, Yang Y, Li DX, Li GD, Hu PW, Zhao YG, Hu L. Drug-disease association prediction using semantic graph and function similarity representation learning over heterogeneous information networks. Methods 2023; 220:106-114. [PMID: 37972913 DOI: 10.1016/j.ymeth.2023.10.014] [Received: 06/30/2023] [Revised: 10/13/2023] [Accepted: 10/28/2023] [Indexed: 11/19/2023] Open
Abstract
Discovering new indications for existing drugs is a promising development strategy at various stages of drug research and development. However, most of them complete their tasks by constructing a variety of heterogeneous networks without considering available higher-order connectivity patterns in heterogeneous biological information networks, which are believed to be useful for improving the accuracy of new drug discovering. To this end, we propose a computational-based model, called SFRLDDA, for drug-disease association prediction by using semantic graph and function similarity representation learning. Specifically, SFRLDDA first integrates a heterogeneous information network (HIN) by drug-disease, drug-protein, protein-disease associations, and their biological knowledge. Second, different representation learning strategies are applied to obtain the feature representations of drugs and diseases from different perspectives over semantic graph and function similarity graphs constructed, respectively. At last, a Random Forest classifier is incorporated by SFRLDDA to discover potential drug-disease associations (DDAs). Experimental results demonstrate that SFRLDDA yields a best performance when compared with other state-of-the-art models on three benchmark datasets. Moreover, case studies also indicate that the simultaneous consideration of semantic graph and function similarity of drugs and diseases in the HIN allows SFRLDDA to precisely predict DDAs in a more comprehensive manner.
Affiliation(s)
- Bo-Wei Zhao
  - The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Xiao-Rui Su
  - The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Yue Yang
  - The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Dong-Xu Li
  - The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Guo-Dong Li
  - The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Peng-Wei Hu
  - The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Yong-Gang Zhao
  - Department of Orthopaedic Surgery (hand and foot trauma), People's Hospital of Dongxihu, Wuhan 420100, China
- Lun Hu
  - The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
|
18
|
Wang Y, Liu L, Wang C. Trends in using deep learning algorithms in biomedical prediction systems. Front Neurosci 2023; 17:1256351. [PMID: 38027475 PMCID: PMC10665494 DOI: 10.3389/fnins.2023.1256351] [Received: 07/10/2023] [Accepted: 09/25/2023] [Indexed: 12/01/2023] Open
Abstract
In the domain of using DL-based methods in medical and healthcare prediction systems, the utilization of state-of-the-art deep learning (DL) methodologies assumes paramount significance. DL has attained remarkable achievements across diverse domains, rendering its efficacy particularly noteworthy in this context. The integration of DL with health and medical prediction systems enables real-time analysis of vast and intricate datasets, yielding insights that significantly enhance healthcare outcomes and operational efficiency in the industry. This comprehensive literature review systematically investigates the latest DL solutions for the challenges encountered in medical healthcare, with a specific emphasis on DL applications in the medical domain. By categorizing cutting-edge DL approaches into distinct categories, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), long short-term memory (LSTM) models, support vector machine (SVM), and hybrid models, this study delves into their underlying principles, merits, limitations, methodologies, simulation environments, and datasets. Notably, the majority of the scrutinized articles were published in 2022, underscoring the contemporaneous nature of the research. Moreover, this review accentuates the forefront advancements in DL techniques and their practical applications within the realm of medical prediction systems, while simultaneously addressing the challenges that hinder the widespread implementation of DL in image segmentation within the medical healthcare domains. These discerned insights serve as compelling impetuses for future studies aimed at the progressive advancement of using DL-based methods in medical and health prediction systems. The evaluation metrics employed across the reviewed articles encompass a broad spectrum of features, encompassing accuracy, precision, specificity, F-score, adoptability, adaptability, and scalability.
Affiliation(s)
- Yanbu Wang
  - School of Strength and Conditioning, Beijing Sport University, Beijing, China
- Linqing Liu
  - Department of Physical Education, Peking University, Beijing, China
- Chao Wang
  - Institute of Competitive Sports, Beijing Sport University, Beijing, China
|
19
|
Ye S, Zhao W, Shen X, Jiang X, He T. An effective multi-task learning framework for drug repurposing based on graph representation learning. Methods 2023; 218:48-56. [PMID: 37516260 DOI: 10.1016/j.ymeth.2023.07.008] [Received: 02/20/2023] [Revised: 07/04/2023] [Accepted: 07/20/2023] [Indexed: 07/31/2023] Open
Abstract
Drug repurposing, which typically applies the procedure of drug-disease associations (DDAs) prediction, is a feasible solution to drug discovery. Compared with traditional methods, drug repurposing can reduce the cost and time for drug development and advance the success rate of drug discovery. Although many methods for drug repurposing have been proposed and the obtained results are relatively acceptable, there is still some room for improving the predictive performance, since those methods fail to consider fully the issue of sparseness in known drug-disease associations. In this paper, we propose a novel multi-task learning framework based on graph representation learning to identify DDAs for drug repurposing. In our proposed framework, a heterogeneous information network is first constructed by combining multiple biological datasets. Then, a module consisting of multiple layers of graph convolutional networks is utilized to learn low-dimensional representations of nodes in the constructed heterogeneous information network. Finally, two types of auxiliary tasks are designed to help to train the target task of DDAs prediction in the multi-task learning framework. Comprehensive experiments are conducted on real data and the results demonstrate the effectiveness of the proposed method for drug repurposing.
Affiliation(s)
- Shengwei Ye
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei 430079, PR China
  - School of Computer, Central China Normal University, Wuhan, Hubei 430079, PR China
  - National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan, Hubei 430079, PR China
- Weizhong Zhao
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei 430079, PR China
  - School of Computer, Central China Normal University, Wuhan, Hubei 430079, PR China
  - National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan, Hubei 430079, PR China
- Xianjun Shen
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei 430079, PR China
  - School of Computer, Central China Normal University, Wuhan, Hubei 430079, PR China
  - National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan, Hubei 430079, PR China
- Xingpeng Jiang
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei 430079, PR China
  - School of Computer, Central China Normal University, Wuhan, Hubei 430079, PR China
  - National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan, Hubei 430079, PR China
- Tingting He
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei 430079, PR China
  - School of Computer, Central China Normal University, Wuhan, Hubei 430079, PR China
  - National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan, Hubei 430079, PR China
|
20
|
Zhang R, Wang X, Wang P, Meng Z, Cui W, Zhou Y. HTCL-DDI: a hierarchical triple-view contrastive learning framework for drug-drug interaction prediction. Brief Bioinform 2023; 24:bbad324. [PMID: 37742052 DOI: 10.1093/bib/bbad324] [Received: 05/05/2023] [Revised: 07/26/2023] [Accepted: 08/24/2023] [Indexed: 09/25/2023] Open
Abstract
Drug-drug interaction (DDI) prediction can discover potential risks of drug combinations in advance by detecting drug pairs that are likely to interact with each other, sparking an increasing demand for computational methods of DDI prediction. However, existing computational DDI methods mostly rely on the single-view paradigm, failing to handle the complex features and intricate patterns of DDIs due to the limited expressiveness of the single view. To this end, we propose a Hierarchical Triple-view Contrastive Learning framework for Drug-Drug Interaction prediction (HTCL-DDI), leveraging the molecular, structural and semantic views to model the complicated information involved in DDI prediction. To aggregate the intra-molecular compositional and structural information, we present a dual attention-aware network in the molecular view. Based on the molecular view, to further capture inter-molecular information, we utilize the one-hop neighboring information and high-order semantic relations in the structural view and semantic view, respectively. Then, we introduce contrastive learning to enhance drug representation learning from multifaceted aspects and improve the robustness of HTCL-DDI. Finally, we conduct extensive experiments on three real-world datasets. All the experimental results show the significant improvement of HTCL-DDI over the state-of-the-art methods, which also demonstrates that HTCL-DDI opens new avenues for ensuring medication safety and identifying synergistic drug combinations.
Affiliation(s)
- Ran Zhang
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Xuezhi Wang
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Pengfei Wang
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhen Meng
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Wenjuan Cui
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Yuanchun Zhou
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
|
21
|
Guan S, Zou Q, Wu H, Ding Y. Protein-DNA Binding Residues Prediction Using a Deep Learning Model With Hierarchical Feature Extraction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:2619-2628. [PMID: 35834447 DOI: 10.1109/tcbb.2022.3190933] [Indexed: 06/15/2023]
Abstract
Biologically important effects occur when proteins bind to other substances, of which binding to DNA is a crucial one. Therefore, accurate identification of protein-DNA binding residues is important for further understanding of the protein-DNA interaction mechanism. Although wet-lab methods can accurately obtain the location of bound residues, it requires significant human, financial and time costs. There is thus an urgent need to develop efficient computational-based methods. Most current state-of-the-art methods are two-step approaches: the first step uses a sliding window technique to extract residue features; the second step uses each residue as an input to the model for prediction. This has a negative impact on the efficiency of prediction and ease of use. In this study, we propose a sequence-to-sequence (seq2seq) model that can input the entire protein sequence of variable length and use two modules, Transformer Encoder Block and Feature Extracting Block, for hierarchical feature extraction, where Transformer Encoder Block is used to extract global features, and then Feature Extracting Block is used to extract local features to further improve the recognition capability of the model. The comparison results on two benchmark datasets, namely PDNA-543 and PDNA-41, prove the effectiveness of our method in identifying protein-DNA binding residues.
|
22
|
Connell W, Garcia K, Goodarzi H, Keiser MJ. Learning chemical sensitivity reveals mechanisms of cellular response. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.26.554851. [PMID: 37693536 PMCID: PMC10491110 DOI: 10.1101/2023.08.26.554851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/12/2023]
Abstract
Chemical probes interrogate disease mechanisms at the molecular level by linking genetic changes to observable traits. However, comprehensive chemical screens in diverse biological models are impractical. To address this challenge, we developed ChemProbe, a model that predicts cellular sensitivity to hundreds of molecular probes and drugs by learning to combine transcriptomes and chemical structures. Using ChemProbe, we inferred the chemical sensitivity of cancer cell lines and tumor samples and analyzed how the model makes predictions. We retrospectively evaluated drug response predictions for precision breast cancer treatment and prospectively validated chemical sensitivity predictions in new cellular models, including a genetically modified cell line. Our model interpretation analysis identified transcriptome features reflecting compound targets and protein network modules, identifying genes that drive ferroptosis. ChemProbe is an interpretable in silico screening tool that allows researchers to measure cellular response to diverse compounds, facilitating research into molecular mechanisms of chemical sensitivity.
Affiliation(s)
- William Connell
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Kristle Garcia
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Department of Urology, University of California, San Francisco, San Francisco, CA, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
| | - Hani Goodarzi
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Department of Urology, University of California, San Francisco, San Francisco, CA, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
| | - Michael J. Keiser
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
|
23
|
Yi J, Lee S, Lim S, Cho C, Piao Y, Yeo M, Kim D, Kim S, Lee S. Exploring chemical space for lead identification by propagating on chemical similarity network. Comput Struct Biotechnol J 2023; 21:4187-4195. [PMID: 37680266 PMCID: PMC10480321 DOI: 10.1016/j.csbj.2023.08.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2023] [Revised: 08/08/2023] [Accepted: 08/20/2023] [Indexed: 09/09/2023] Open
Abstract
Motivation Lead identification is a fundamental step to prioritize candidate compounds for downstream drug discovery process. Machine learning (ML) and deep learning (DL) approaches are widely used to identify lead compounds using both chemical property and experimental information. However, ML or DL methods rarely consider compound similarity information directly since ML and DL models use abstract representation of molecules for model construction. Alternatively, data mining approaches are also used to explore chemical space with drug candidates by screening undesirable compounds. A major challenge for data mining approaches is to develop efficient data mining methods that search large chemical space for desirable lead compounds with low false positive rate. Results In this work, we developed a network propagation (NP) based data mining method for lead identification that performs search on an ensemble of chemical similarity networks. We compiled 14 fingerprint-based similarity networks. Given a target protein of interest, we use a deep learning-based drug target interaction model to narrow down compound candidates and then we use network propagation to prioritize drug candidates that are highly correlated with drug activity score such as IC50. In an extensive experiment with BindingDB, we showed that our approach successfully discovered intentionally unlabeled compounds for given targets. To further demonstrate the prediction power of our approach, we identified 24 candidate leads for CLK1. Two out of five synthesizable candidates were experimentally validated in binding assays. In conclusion, our framework can be very useful for lead identification from very large compound databases such as ZINC.
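The network propagation step mentioned in this abstract can be illustrated with a small sketch. It assumes a single similarity network and a random walk with restart, which is one common form of network propagation; the abstract does not state the exact formulation used, and the function name `propagate`, the `restart` parameter, and the toy `path_graph` are hypothetical.

```python
def propagate(adjacency, seeds, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on a weighted similarity network.

    adjacency: square list of lists of non-negative edge weights.
    seeds: indices of compounds with known activity (assumed input).
    Returns one propagated score per node.
    """
    n = len(adjacency)
    # Column-normalise so each node spreads its score over its neighbours.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    weights = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # The initial distribution puts equal mass on every seed node.
    initial = [0.0] * n
    for s in seeds:
        initial[s] = 1.0 / len(seeds)
    scores = initial[:]
    for _ in range(max_iter):
        updated = [
            (1 - restart) * sum(weights[i][j] * scores[j] for j in range(n))
            + restart * initial[i]
            for i in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if change < tol:  # stop once the scores no longer move
            break
    return scores


# Toy network: four compounds connected in a chain 0 - 1 - 2 - 3.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = propagate(path_graph, seeds=[0])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In this toy case, compounds closer to the seed in the similarity network receive higher scores, which is the property a propagation-based prioritization relies on.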
Affiliation(s)
- Jungseob Yi
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
| | - Sangseon Lee
- Institute of Computer Technology, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
| | - Sangsoo Lim
- School of AI Software Convergence, Dongguk University, Pildong-ro 1-gil, Jung-gu, Seoul, South Korea
| | - Changyun Cho
- Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
| | - Yinhua Piao
- Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
| | - Marie Yeo
- PHARMGENSCIENCE CO., LTD., 216, Dongjak-daero, Seocho-gu, Seoul, 06554, South Korea
| | - Dongkyu Kim
- PHARMGENSCIENCE CO., LTD., 216, Dongjak-daero, Seocho-gu, Seoul, 06554, South Korea
| | - Sun Kim
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- AIGENDRUG CO., LTD., Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
| | - Sunho Lee
- AIGENDRUG CO., LTD., Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
|
24
|
Wang X, Duan M, Li J, Ma A, Xu D, Li Z, Liu B, Ma Q. MarsGT: Multi-omics analysis for rare population inference using single-cell graph transformer. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.15.553454. [PMID: 37645917 PMCID: PMC10462017 DOI: 10.1101/2023.08.15.553454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/31/2023]
Abstract
Rare cell populations are key in neoplastic progression and therapeutic response, offering potential intervention targets. However, their computational identification and analysis often lag behind major cell types. To fill this gap, we introduced MarsGT: Multi-omics Analysis for Rare population inference using Single-cell Graph Transformer. It identifies rare cell populations using a probability-based heterogeneous graph transformer on single-cell multi-omics data. MarsGT outperformed existing tools in identifying rare cells across 400 simulated and four real human datasets. In mouse retina data, it revealed unique subpopulations of rare bipolar cells and a Müller glia cell subpopulation. In human lymph node data, MarsGT detected an intermediate B cell population potentially acting as lymphoma precursors. In human melanoma data, it identified a rare MAIT-like population impacted by a high IFN-I response and revealed the mechanism of immunotherapy. Hence, MarsGT offers biological insights and suggests potential strategies for early detection and therapeutic intervention of disease.
Affiliation(s)
- Xiaoying Wang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| | - Maoteng Duan
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Jingxian Li
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Anjun Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
| | - Zihai Li
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| | - Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
|
25
|
Wu Y, Ni X, Wang Z, Feng W. Enhancing drug property prediction with dual-channel transfer learning based on molecular fragment. BMC Bioinformatics 2023; 24:293. [PMID: 37479969 PMCID: PMC10360281 DOI: 10.1186/s12859-023-05413-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2023] [Accepted: 07/13/2023] [Indexed: 07/23/2023] Open
Abstract
BACKGROUND Accurate prediction of molecular property holds significance in contemporary drug discovery and medical research. Recent advances in AI-driven molecular property prediction have shown promising results. Due to the costly annotation of in vitro and in vivo experiments, transfer learning paradigm has been gaining momentum in extracting general self-supervised information to facilitate neural network learning. However, prior pretraining strategies have overlooked the necessity of explicitly incorporating domain knowledge, especially the molecular fragments, into model design, resulting in the under-exploration of the molecular semantic space. RESULTS We propose an effective model with FRagment-based dual-channEL pretraining (FREL). Equipped with molecular fragments, FREL comprehensively employs masked autoencoder and contrastive learning to learn intra- and inter-molecule agreement, respectively. We further conduct extensive experiments on ten public datasets to demonstrate its superiority over state-of-the-art models. Further investigations and interpretations manifest the underlying relationship between molecular representations and molecular properties. CONCLUSIONS Our proposed model FREL achieves state-of-the-art performance on the benchmark datasets, emphasizing the importance of incorporating molecular fragments into model design. The expressiveness of learned molecular representations is also investigated by visualization and correlation analysis. Case studies indicate that the learned molecular representations better capture the drug property variation and fragment semantics.
Affiliation(s)
- Yue Wu
- College of Traditional Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Xinran Ni
- College of Pharmacy, Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Zhihao Wang
- College of Intelligence and Information Engineering, Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Weike Feng
- College of Traditional Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China.
|
26
|
Deng JP, Liu X, Li Y, Ni SH, Sun SN, Ou-Yang XL, Ye XH, Wang LJ, Lu L. Drug vector representation and potential efficacy prediction based on graph representation learning and transcriptome data: Acacetin from traditional Chinese Medicine model. JOURNAL OF ETHNOPHARMACOLOGY 2023; 305:115966. [PMID: 36572325 DOI: 10.1016/j.jep.2022.115966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/04/2022] [Revised: 11/03/2022] [Accepted: 11/22/2022] [Indexed: 06/17/2023]
Abstract
ETHNOPHARMACOLOGICAL RELEVANCE Acacetin is widely distributed in traditional Chinese medicine and traditional herbs and has strong biological activity. It may have many potential effects that have not yet been explored. In drug discovery, mainstream prediction methods focus on chemical structure, and traditional medicine is poorly suited to them because of its complex composition. AIM OF THE STUDY Our aim is to provide a prediction method more suitable for traditional medicine, based on graph representation learning and transcriptome data, and to use this method to predict the efficacy of acacetin. MATERIALS AND METHODS Our method consists of two parts. The first part uses graph representation learning to vectorize drugs into a database. The original data for this part come from transcriptome data in the Gene Expression Omnibus. Graph representation learning is unsupervised, so without prior knowledge to serve as labels the training effect cannot be analyzed. Therefore, we define a standard score, based on the idea of the Jaccard index, to evaluate our results. The second part places the target drug into our database. The potential similarity between drugs is evaluated by the Euclidean distance between their vectors, and the potential efficacy of the target drug is predicted by combining the chemical-disease relationship data in the Comparative Toxicogenomics Database. The target drug in this paper is acacetin. We compared the predicted results with existing reports and experimentally verified one predicted effect, the improvement of insulin resistance. RESULTS The prediction results are relatively consistent with existing reports, which demonstrates that our method has a certain degree of predictive performance. The predicted efficacy in improving insulin resistance was verified experimentally.
CONCLUSIONS We propose a method, based on transcriptome data and graph representation learning, to predict the potential efficacy of drugs; it is well suited to traditional medicine. Using this method, we predicted the efficacy of acacetin, and the results are relatively consistent with current reports. This provides a new way to apply unsupervised learning to medical information.
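The two measures named in this abstract, Euclidean distance between drug vectors and a Jaccard-style overlap score, can be sketched as follows. This is only an illustration: the abstract does not give the exact definition of the authors' standard score, so the plain Jaccard index is used as a stand-in, and the drug names, vectors, and function names are all hypothetical.

```python
import math


def jaccard(a, b):
    """Jaccard index of two collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_drugs(target_vector, database, k=2):
    """Return the k drug names whose vectors are closest to the target.

    database: dict mapping a drug name to its embedding vector
    (toy values here, not real embeddings).
    """
    return sorted(
        database, key=lambda name: math.dist(target_vector, database[name])
    )[:k]


# Toy embedding database with made-up two-dimensional vectors.
toy_database = {
    "drug_a": [0.0, 0.0],
    "drug_b": [1.0, 1.0],
    "drug_c": [5.0, 5.0],
}
neighbours = nearest_drugs([0.9, 1.0], toy_database, k=2)

# Overlap between predicted and reported indications (toy labels).
overlap = jaccard({"disease_x", "disease_y"}, {"disease_y", "disease_z"})
```

Indications known for the nearest neighbours would then be candidate effects for the target drug, and the overlap score gives one way to compare predictions with existing reports.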
Affiliation(s)
- Jian-Ping Deng
- The First Affiliated Hospital, Guangzhou University of Chinese Medicine, Guangzhou, 510407, China; Lingnan Medical Research Center, Guangzhou University of Chinese Medicine, Guangzhou, 510407, China; University Key Laboratory of Traditional Chinese Medicine Prevention and Treatment of Chronic Heart Failure, Guangdong Province, Guangzhou, 510407, China
| | - Xin Liu
- The First Affiliated Hospital, Guangzhou University of Chinese Medicine, Guangzhou, 510407, China; Lingnan Medical Research Center, Guangzhou University of Chinese Medicine, Guangzhou, 510407, China; University Key Laboratory of Traditional Chinese Medicine Prevention and Treatment of Chronic Heart Failure, Guangdong Province, Guangzhou, 510407, China
| | - Yue Li
- The First Affiliated Hospital, Guangzhou University of Chinese Medicine, Guangzhou, 510407, China; Lingnan Medical Research Center, Guangzhou University of Chinese Medicine, Guangzhou, 510407, China; University Key Laboratory of Traditional Chinese Medicine Prevention and Treatment of Chronic Heart Failure, Guangdong Province, Guangzhou, 510407, China
| | - Shi-Hao Ni
- The First Affiliated Hospital, Guangzhou University of Chinese Medicine, Guangzhou, 510407, China; Lingnan Medical Research Center, Guangzhou University of Chinese Medicine, Guangzhou, 510407, China; University Key Laboratory of Traditional Chinese Medicine Prevention and Treatment of Chronic Heart Failure, Guangdong Province, Guangzhou, 510407, China
| | - Shu-Ning Sun
- The First Affiliated Hospital, Guangzhou University of Chinese Medicine, Guangzhou, 510407, China; Lingnan Medical Research Center, Guangzhou University of Chinese Medicine, Guangzhou, 510407, China; University Key Laboratory of Traditional Chinese Medicine Prevention and Treatment of Chronic Heart Failure, Guangdong Province, Guangzhou, 510407, China
| | - Xiao-Lu Ou-Yang
- The First Affiliated Hospital, Guangzhou University of Chinese Medicine, Guangzhou, 510407, China; Lingnan Medical Research Center, Guangzhou University of Chinese Medicine, Guangzhou, 510407, China; University Key Laboratory of Traditional Chinese Medicine Prevention and Treatment of Chronic Heart Failure, Guangdong Province, Guangzhou, 510407, China
| | - Xiao-Han Ye
- Dongguan Hospital, Guangzhou University of Chinese Medicine, Guangzhou, 510407, China.
| | - Ling-Jun Wang
- The First Affiliated Hospital, Guangzhou University of Chinese Medicine, Guangzhou, 510407, China; Lingnan Medical Research Center, Guangzhou University of Chinese Medicine, Guangzhou, 510407, China; University Key Laboratory of Traditional Chinese Medicine Prevention and Treatment of Chronic Heart Failure, Guangdong Province, Guangzhou, 510407, China.
| | - Lu Lu
- The First Affiliated Hospital, Guangzhou University of Chinese Medicine, Guangzhou, 510407, China; Lingnan Medical Research Center, Guangzhou University of Chinese Medicine, Guangzhou, 510407, China; University Key Laboratory of Traditional Chinese Medicine Prevention and Treatment of Chronic Heart Failure, Guangdong Province, Guangzhou, 510407, China.
|
27
|
Jaume-Santero F, Bornet A, Valery A, Naderi N, Vicente Alvarez D, Proios D, Yazdani A, Bournez C, Fessard T, Teodoro D. Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios. J Chem Inf Model 2023; 63:1914-1924. [PMID: 36952584 PMCID: PMC10091402 DOI: 10.1021/acs.jcim.2c01407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/25/2023]
Abstract
The prediction of chemical reaction pathways has been accelerated by the development of novel machine learning architectures based on the deep learning paradigm. In this context, deep neural networks initially designed for language translation have been used to accurately predict a wide range of chemical reactions. Among models suited for the task of language translation, the recently introduced molecular transformer reached impressive performance in terms of forward-synthesis and retrosynthesis predictions. In this study, we first present an analysis of the performance of transformer models for product, reactant, and reagent prediction tasks under different scenarios of data availability and data augmentation. We find that the impact of data augmentation depends on the prediction task and on the metric used to evaluate the model performance. Second, we probe the contribution of different combinations of input formats, tokenization schemes, and embedding strategies to model performance. We find that less stable input settings generally lead to better performance. Lastly, we validate the superiority of round-trip accuracy over simpler evaluation metrics, such as top-k accuracy, using a committee of human experts and show a strong agreement for predictions that pass the round-trip test. This demonstrates the usefulness of more elaborate metrics in complex predictive scenarios and highlights the limitations of direct comparisons to a predefined database, which may include a limited number of chemical reaction pathways.
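The contrast between top-k accuracy and round-trip accuracy described in this abstract can be made concrete with a short sketch. It assumes the usual reading of round-trip accuracy (a predicted reactant set counts as correct if a forward model maps it back to the original product); the two "models" below are hypothetical lookup-table stand-ins, not trained transformers, and the SMILES strings are toy data.

```python
def top_k_accuracy(predictions, references, k):
    """Fraction of cases whose reference appears among the first k predictions."""
    hits = sum(ref in preds[:k] for preds, ref in zip(predictions, references))
    return hits / len(references)


def round_trip_accuracy(products, retro_model, forward_model):
    """Fraction of products recovered after retrosynthesis then forward prediction."""
    hits = sum(forward_model(retro_model(p)) == p for p in products)
    return hits / len(products)


# Toy example: ethyl acetate, with a database entry listing acetic acid + ethanol.
product = "CC(=O)OCC"
reference_reactants = "CC(=O)O.CCO"

# Stand-in retrosynthesis model: proposes acetyl chloride + ethanol instead.
retro_table = {product: "CC(=O)Cl.CCO"}
# Stand-in forward model: both reactant sets lead to the same product.
forward_table = {"CC(=O)O.CCO": product, "CC(=O)Cl.CCO": product}

top1 = top_k_accuracy([[retro_table[product]]], [reference_reactants], k=1)
round_trip = round_trip_accuracy([product], retro_table.get, forward_table.get)
```

Here the proposed route differs from the database entry, so top-1 accuracy scores it as a miss, while the round-trip check accepts it because the forward step reproduces the product. This is the limitation of direct database comparison that the abstract points to.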
Affiliation(s)
- Fernando Jaume-Santero
- Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
- Geneva School of Business Administration, HES-SO University of Applied Sciences and Arts of Western Switzerland, 1227 Geneva, Switzerland
| | - Alban Bornet
- Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
- Geneva School of Business Administration, HES-SO University of Applied Sciences and Arts of Western Switzerland, 1227 Geneva, Switzerland
| | | | - Nona Naderi
- Geneva School of Business Administration, HES-SO University of Applied Sciences and Arts of Western Switzerland, 1227 Geneva, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - David Vicente Alvarez
- Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
- Geneva School of Business Administration, HES-SO University of Applied Sciences and Arts of Western Switzerland, 1227 Geneva, Switzerland
| | - Dimitrios Proios
- Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
| | - Anthony Yazdani
- Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
| | | | | | - Douglas Teodoro
- Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
- Geneva School of Business Administration, HES-SO University of Applied Sciences and Arts of Western Switzerland, 1227 Geneva, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
|
28
|
Molecular Property Prediction by Combining LSTM and GAT. Biomolecules 2023; 13:biom13030503. [PMID: 36979438 PMCID: PMC10046625 DOI: 10.3390/biom13030503] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2023] [Revised: 02/10/2023] [Accepted: 03/06/2023] [Indexed: 03/12/2023] Open
Abstract
Molecular property prediction is an important direction in computer-aided drug design. In this paper, to fully explore the information from SMILES strings and graph data of molecules, we combined the SALSTM and GAT methods in order to mine the feature information of molecules from sequences and graphs. Atom embeddings are first obtained from SMILES strings through SALSTM; they are then combined with graph node features and fed into the GAT to extract the global molecular representation. At the same time, data augmentation is added to enlarge the training dataset and improve the performance of the model. Finally, to enhance the interpretability of the model, the attention layers of both models are fused together to highlight the key atoms. Comparison with other graph-based and sequence-based methods, for multiple datasets, shows that our method can achieve high prediction accuracy with good generalizability.
|
29
|
Sun Z, Wang LE, Sun J. A Multi-scale Graph Embedding Method via Multiple Corpora. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.03.053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/03/2023]
|
30
|
Guarrasi V, Soda P. Multi-objective optimization determines when, which and how to fuse deep networks: An application to predict COVID-19 outcomes. Comput Biol Med 2023; 154:106625. [PMID: 36738713 PMCID: PMC9892294 DOI: 10.1016/j.compbiomed.2023.106625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2022] [Revised: 01/18/2023] [Accepted: 01/28/2023] [Indexed: 02/05/2023]
Abstract
The COVID-19 pandemic has caused millions of cases and deaths, and the AI-related scientific community, after working on detecting COVID-19 signs in medical images, is now directing its efforts towards the development of methods that can predict the progression of the disease. This task is multimodal by its very nature and, recently, baseline results achieved on the publicly available AIforCOVID dataset have shown that chest X-ray scans and clinical information are useful to identify patients at risk of severe outcomes. While deep learning has shown superior performance in several medical fields, in most cases it considers unimodal data only. In this respect, when, which and how to fuse the different modalities is an open challenge in multimodal deep learning. To address these three questions, we present a novel approach that optimizes the setup of a multimodal end-to-end model. It exploits Pareto multi-objective optimization working with a performance metric and the diversity score of multiple candidate unimodal neural networks to be fused. We test our method on the AIforCOVID dataset, attaining state-of-the-art results, not only outperforming the baseline performance but also being robust to external validation. Moreover, exploiting XAI algorithms, we derive a hierarchy among the modalities and extract the features' intra-modality importance, increasing trust in the predictions made by the model.
Affiliation(s)
- Valerio Guarrasi
- Unit of Computer Systems and Bioinformatics, Department of Engineering, University Campus Bio-Medico of Rome, Italy; Department of Computer, Control, and Management Engineering, Sapienza University of Rome, Italy.
| | - Paolo Soda
- Unit of Computer Systems and Bioinformatics, Department of Engineering, University Campus Bio-Medico of Rome, Italy; Department of Radiation Sciences, Radiation Physics, Biomedical Engineering, Umeå University, Umeå, Sweden.
|
31
|
Zhu Y, Zhang F, Zhang S, Yi M. Predicting latent lncRNA and cancer metastatic event associations via variational graph auto-encoder. Methods 2023; 211:1-9. [PMID: 36709790 DOI: 10.1016/j.ymeth.2023.01.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2022] [Revised: 12/05/2022] [Accepted: 01/20/2023] [Indexed: 01/27/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) have been shown to be closely associated with cancer metastatic events (CMEs, e.g., cancer cell invasion, intravasation, extravasation, proliferation) that collaboratively accelerate malignant cancer spread and cause a high mortality rate in patients. Clinical trials may accurately uncover the relationships between lncRNAs and CMEs; however, they are time-consuming and expensive. With the accumulation of data, there is an urgent need to find efficient ways to identify these relationships. Herein, a graph embedding representation-based predictor (VGEA-LCME) for exploring latent lncRNA-CME associations is introduced. In VGEA-LCME, a heterogeneous combined network is constructed by integrating similarity and linkage matrices, which maintains the internal and external characteristics of the networks, and a variational graph auto-encoder serves as a feature generator to represent an arbitrary lncRNA-CME pair. The final robust prediction is obtained by an ensemble classifier strategy via cross-validation. Experimental comparisons and literature verification show the remarkable performance of VGEA-LCME, although the similarities between CMEs are challenging to calculate. In addition, VGEA-LCME can further identify organ-specific CMEs. To the best of our knowledge, this is the first computational attempt to discover the potential relationships between lncRNAs and CMEs. It may provide support and new insight for guiding experimental research of metastatic cancers. The source code and data are available at https://github.com/zhuyuan-cug/VGAE-LCME.
Affiliation(s)
- Yuan Zhu
- School of Automation, China University of Geosciences, 388 Lumo Road, Hongshan District, 430074, Wuhan, Hubei, China; Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, 388 Lumo Road, Hongshan District, 430074, Wuhan, Hubei, China; Engineering Research Center of Intelligent Technology for Geo-Exploration, 388 Lumo Road, Hongshan District, 430074, Wuhan, Hubei, China
| | - Feng Zhang
- School of Mathematics and Physics, China University of Geosciences, 388 Lumo Road, Hongshan District, 430074, Wuhan, Hubei, China
| | - Shihua Zhang
- College of Life Science and Health, Wuhan University of Science and Technology, 974 Heping Avenue, Qingshan District, 430081, Wuhan, Hubei, China.
| | - Ming Yi
- School of Mathematics and Physics, China University of Geosciences, 388 Lumo Road, Hongshan District, 430074, Wuhan, Hubei, China.
|
32
|
Partin A, Brettin TS, Zhu Y, Narykov O, Clyde A, Overbeek J, Stevens RL. Deep learning methods for drug response prediction in cancer: Predominant and emerging trends. Front Med (Lausanne) 2023; 10:1086097. [PMID: 36873878 PMCID: PMC9975164 DOI: 10.3389/fmed.2023.1086097] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2022] [Accepted: 01/23/2023] [Indexed: 02/17/2023] Open
Abstract
Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review gives a better understanding of the current state of the field and identifies major challenges and promising solution paths.
Affiliation(s)
- Alexander Partin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Thomas S. Brettin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Yitan Zhu
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Oleksandr Narykov
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Austin Clyde
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Jamie Overbeek
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
| | - Rick L. Stevens
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Department of Computer Science, The University of Chicago, Chicago, IL, United States
|
33
|
Wei MM, Yu CQ, Li LP, You ZH, Ren ZH, Guan YJ, Wang XF, Li YC. LPIH2V: LncRNA-protein interactions prediction using HIN2Vec based on heterogeneous networks model. Front Genet 2023; 14:1122909. [PMID: 36845392 PMCID: PMC9950107 DOI: 10.3389/fgene.2023.1122909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 01/30/2023] [Indexed: 02/12/2023] Open
Abstract
LncRNA-protein interactions play an important role in the development and treatment of many human diseases. Because experimental approaches to determining lncRNA-protein interactions are expensive and time-consuming, and few computational methods exist, there is an urgent need for efficient and accurate methods to predict lncRNA-protein interactions. In this work, a model for heterogeneous network embedding based on meta-paths, namely LPIH2V, is proposed. The heterogeneous network is composed of lncRNA similarity networks, protein similarity networks, and known lncRNA-protein interaction networks. The behavioral features are extracted from the heterogeneous network using the HIN2Vec method of network embedding. The results showed that LPIH2V obtains an AUC of 0.97 and an ACC of 0.95 in the 5-fold cross-validation test. The model showed superiority and good generalization ability. Compared to other models, LPIH2V not only extracts attribute characteristics by similarity, but also acquires behavior properties by meta-path wandering in heterogeneous networks. LPIH2V would be beneficial in forecasting interactions between lncRNA and protein.
Affiliation(s)
- Meng-Meng Wei
- School of Information Engineering, Xijing University, Xi’an, China
| | - Chang-Qing Yu
- School of Information Engineering, Xijing University, Xi’an, China (corresponding author)
| | - Li-Ping Li
- School of Information Engineering, Xijing University, Xi’an, China; College of Grassland and Environment Sciences, Xinjiang Agricultural University, Urumqi, China (corresponding author)
| | - Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
| | - Zhong-Hao Ren
- School of Information Engineering, Xijing University, Xi’an, China
| | - Yong-Jian Guan
- School of Information Engineering, Xijing University, Xi’an, China
|
34
|
Chowdhury S, Chen Y, Wen A, Ma X, Dai Q, Yu Y, Fu S, Jiang X, Zong N. Predicting Physiological Response in Heart Failure Management: A Graph Representation Learning Approach using Electronic Health Records. medRxiv 2023:2023.01.27.23285129. [PMID: 36747787 PMCID: PMC9901060 DOI: 10.1101/2023.01.27.23285129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/29/2023]
Abstract
Heart failure management is challenging due to the complex and heterogeneous nature of its pathophysiology, which makes conventional treatments based on the "one size fits all" ideology unsuitable. Coupling longitudinal medical data with novel deep learning and network-based analytics will enable identifying distinct patient phenotypic characteristics to help individualize the treatment regimen through accurate prediction of the physiological response. In this study, we develop a graph representation learning framework that integrates the heterogeneous clinical events in the electronic health records (EHR) as graph format data, in which the patient-specific patterns and features are naturally infused for personalized predictions of lab test response. The framework includes a novel Graph Transformer Network equipped with a self-attention mechanism to model the underlying spatial interdependencies among the clinical events characterizing the cardiac physiological interactions in heart failure treatment, and a graph neural network (GNN) layer to incorporate the explicit temporality of each clinical event, which helps summarize the therapeutic effects induced on the physiological variables, and subsequently on the patient's health status, as the heart failure condition progresses over time. We introduce a global attention mask that is computed based on event co-occurrences and is aggregated across all patient records to enhance the guidance of neighbor selection in graph representation learning. We test the feasibility of our model through detailed quantitative and qualitative evaluations on observational EHR data.
Affiliation(s)
- Shaika Chowdhury
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN, USA
| | - Yongbin Chen
- Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN, USA
| | - Andrew Wen
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN, USA
| | - Xiao Ma
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | - Qiying Dai
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | - Yue Yu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Sunyang Fu
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN, USA
| | - Xiaoqian Jiang
- School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, USA
| | - Nansu Zong
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN, USA
|
35
|
Wang T, Yang J, Xiao Y, Wang J, Wang Y, Zeng X, Wang Y, Peng J. DFinder: a novel end-to-end graph embedding-based method to identify drug-food interactions. Bioinformatics 2022; 39:6965015. [PMID: 36579885 PMCID: PMC9828147 DOI: 10.1093/bioinformatics/btac837] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/21/2022] [Revised: 11/07/2022] [Accepted: 12/28/2022] [Indexed: 12/30/2022] Open
Abstract
MOTIVATION Drug-food interactions (DFIs) occur when some constituents of food affect the bioaccessibility or efficacy of a drug by taking part in the drug's pharmacodynamic and/or pharmacokinetic processes. Many computational methods have achieved remarkable results in link prediction tasks between biological entities, which shows the potential of computational methods for discovering novel DFIs. However, few computational approaches address DFI identification, mainly because of the lack of DFI data. In addition, food is generally made up of a variety of chemical substances, and this complexity makes it difficult to generate accurate feature representations for food. Therefore, there is an urgent need for effective computational approaches for learning food feature representations and predicting DFIs. RESULTS In this article, we first collect DFI data from DrugBank and PubMed, respectively, to construct two datasets, named DrugBank-DFI and PubMed-DFI. Based on these two datasets, two DFI networks are constructed. Then, we propose a novel end-to-end graph embedding-based method named DFinder to identify DFIs. DFinder combines node attribute features and topological structure features to learn the representations of drugs and food constituents. In topology space, we adopt a simplified graph convolution network-based method to learn the topological structure features. In feature space, we use a deep neural network to extract attribute features from the original node attributes. The evaluation results indicate that DFinder performs better than other baseline methods. AVAILABILITY AND IMPLEMENTATION The source code is available at https://github.com/23AIBox/23AIBox-DFinder. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tao Wang
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi’an 710072, China
| | - Jinjin Yang
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi’an 710072, China
| | - Yifu Xiao
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi’an 710072, China
| | - Jingru Wang
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi’an 710072, China
| | - Yuxian Wang
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi’an 710072, China
| | - Xi Zeng
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi’an 710072, China
| | - Yongtian Wang
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi’an 710072, China
|
36
|
Cho HN, Ahn I, Gwon H, Kang HJ, Kim Y, Seo H, Choi H, Kim M, Han J, Kee G, Jun TJ, Kim YH. Heterogeneous graph construction and HinSAGE learning from electronic medical records. Sci Rep 2022; 12:21152. [PMID: 36477457 PMCID: PMC9729175 DOI: 10.1038/s41598-022-25693-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 12/02/2022] [Indexed: 12/12/2022] Open
Abstract
Graph representation learning offers a way to effectively construct and learn patient embeddings from electronic medical records. Adopting this integration will support and advance previous methods for predicting patient prognosis in network models. This study aims to address the challenge of handling a complex and highly heterogeneous dataset by (1) demonstrating how to build a multi-attributed and multi-relational graph model and (2) applying the HinSAGE algorithm to a downstream disease prediction task on a patient's prognosis. We present a bipartite graph schema and a graph database construction in detail. The first constructed graph database illustrates a query of a predictive network that provides analytical insights using a graph representation of a patient's journey. Moreover, we demonstrate an alternative bipartite model to which we apply HinSAGE to perform a link prediction task for predicting event occurrence. The performance evaluation indicated that our heterogeneous graph model predicted successfully as a baseline model. Overall, our graph database demonstrated efficient real-time query performance, and the HinSAGE implementation predicted cardiovascular disease event outcomes through supervised link prediction learning.
Affiliation(s)
- Ha Na Cho
- Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea
| | - Imjin Ahn
- Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea
| | - Hansle Gwon
- Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea
| | - Hee Jun Kang
- Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea
| | - Yunha Kim
- Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea
| | - Hyeram Seo
- Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea
| | - Heejung Choi
- Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea
| | - Minkyoung Kim
- Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea
| | - Jiye Han
- Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea
| | - Gaeun Kee
- Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea
| | - Tae Joon Jun
- Big Data Research Center, Asan Institute for Life Sciences, Asan Medical Center, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea
| | - Young-Hak Kim
- Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea
|
37
|
Bi XA, Mao Y, Luo S, Wu H, Zhang L, Luo X, Xu L. A novel generation adversarial network framework with characteristics aggregation and diffusion for brain disease classification and feature selection. Brief Bioinform 2022; 23:6762742. [PMID: 36259367 DOI: 10.1093/bib/bbac454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Revised: 09/01/2022] [Accepted: 09/23/2022] [Indexed: 12/14/2022] Open
Abstract
Imaging genetics provides unique insights into the pathological studies of complex brain diseases by integrating the characteristics of multi-level medical data. However, most current imaging genetics research performs incomplete data fusion. Also, there is a lack of effective deep learning methods to analyze neuroimaging and genetic data jointly. Therefore, this paper first constructs the brain region-gene networks to intuitively represent the association pattern of pathogenetic factors. Second, a novel feature information aggregation model is constructed to accurately describe the information aggregation process among brain region nodes and gene nodes. Finally, a deep learning method called feature information aggregation and diffusion generative adversarial network (FIAD-GAN) is proposed to efficiently classify samples and select features. We focus on improving the generator with the proposed convolution and deconvolution operations, with which the interpretability of the deep learning framework has been dramatically improved. The experimental results indicate that FIAD-GAN can not only achieve superior results in various disease classification tasks but also extract brain regions and genes closely related to AD. This work provides a novel method for intelligent clinical decisions. The relevant biomedical discoveries provide a reliable reference and technical basis for the clinical diagnosis, treatment and pathological analysis of disease.
Affiliation(s)
- Xia-An Bi
- Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, and College of Information Science and Engineering in Hunan Normal University, Changsha, P.R. China
| | - Yuhua Mao
- Department of Computing, School of Information Science and Engineering, Hunan Normal University, Changsha, China
| | - Sheng Luo
- Department of Computing, School of Information Science and Engineering, Hunan Normal University, Changsha, China
| | - Hao Wu
- Department of Computing, School of Information Science and Engineering, Hunan Normal University, Changsha, China
| | - Lixia Zhang
- School of Information Science and Engineering, Hunan Normal University, Changsha, P.R. China
| | - Xun Luo
- College of Information Science and Engineering in Hunan Normal University, Changsha, P.R. China
| | - Luyun Xu
- College of Business in Hunan Normal University, Changsha, P.R. China
|
38
|
Dhillon SK, Ganggayah MD, Sinnadurai S, Lio P, Taib NA. Theory and Practice of Integrating Machine Learning and Conventional Statistics in Medical Data Analysis. Diagnostics (Basel) 2022; 12:2526. [PMID: 36292218 PMCID: PMC9601117 DOI: 10.3390/diagnostics12102526] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/19/2022] [Revised: 09/26/2022] [Accepted: 10/04/2022] [Indexed: 11/16/2022] Open
Abstract
The practice of medical decision making is changing rapidly with the development of innovative computing technologies. The growing interest in data analysis, together with improvements in big data computer processing methods, raises the question of whether machine learning can be integrated with conventional statistics in health research. To help address this knowledge gap, this paper presents a review of the conceptual integration between conventional statistics and machine learning, focusing on health research. The similarities and differences between the two are compared using mathematical concepts and algorithms. The comparison between conventional statistics and machine learning methods indicates that conventional statistics are the fundamental basis of machine learning, where the black box algorithms are derived from basic mathematics, but are advanced in terms of automated analysis, handling big data and providing interactive visualizations. While the two methods differ in nature, they are conceptually similar. Based on our review, we conclude that conventional statistics and machine learning are best integrated to develop automated data analysis tools. We also strongly believe that machine learning could be explored by health researchers to enhance conventional statistics in decision making for added reliable validation measures.
Affiliation(s)
- Sarinder Kaur Dhillon
- Data Science & Bioinformatics Laboratory, Institute of Biological Sciences, Faculty of Science, Universiti Malaya, Kuala Lumpur 50603, Malaysia
| | - Mogana Darshini Ganggayah
- Department of Econometrics and Business Statistics, School of Business, Monash University Malaysia, Kuala Lumpur 47500, Malaysia
| | - Siamala Sinnadurai
- Department of Population Medicine and Lifestyle Disease Prevention, Medical University of Bialystok, 15-269 Bialystok, Poland
| | - Pietro Lio
- Department of Computer Science and Technology, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
| | - Nur Aishah Taib
- Department of Surgery, Faculty of Medicine, Universiti Malaya, Kuala Lumpur 50603, Malaysia
|
39
|
Guo LX, You ZH, Wang L, Yu CQ, Zhao BW, Ren ZH, Pan J. A novel circRNA-miRNA association prediction model based on structural deep neural network embedding. Brief Bioinform 2022; 23:6694810. [DOI: 10.1093/bib/bbac391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 07/14/2022] [Accepted: 08/11/2022] [Indexed: 11/14/2022] Open
Abstract
A growing body of clinical evidence shows that circular ribonucleic acids (RNAs; circRNAs) perform a very important function in complex diseases by participating in the transcription and translation regulation of microRNA (miRNA) target genes. However, discovering associations between circRNA and miRNA through traditional biological experiments, which depend on strict high-throughput techniques and on experimental conditions and environment, is labor-intensive, expensive, time-consuming, and inefficient. In this paper, we propose a novel computational model based on Word2vec, Structural Deep Network Embedding (SDNE), Convolutional Neural Network and Deep Neural Network, called WSCD, which predicts potential circRNA-miRNA associations. Specifically, the WSCD model extracts attribute features and behaviour features by word embedding and graph embedding algorithms, respectively, and ultimately feeds them into a feature fusion model constructed by combining a Convolutional Neural Network and a Deep Neural Network to deduce potential circRNA-miRNA interactions. The proposed method was evaluated on a dataset and obtained a prediction accuracy of 81.61% and an area under the receiver operating characteristic curve of 0.8898, which is much higher than that of the state-of-the-art models and classifier models in prediction. In addition, 23 miRNA-related circRNAs from the top 30 were confirmed by relevant experiments. Taken together, these results suggest that WSCD would be a helpful and reliable supplementary method for predicting potential miRNA-circRNA associations compared to wet laboratory experiments.
Affiliation(s)
- Lu-Xiang Guo
- College of Information Engineering, Xijing University, Xi’an 710123, China
| | - Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
| | - Lei Wang
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China
| | - Chang-Qing Yu
- College of Information Engineering, Xijing University, Xi’an 710123, China
| | - Bo-Wei Zhao
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
| | - Zhong-Hao Ren
- College of Information Engineering, Xijing University, Xi’an 710123, China
| | - Jie Pan
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, College of Life Science, Northwest University, Xi’an 710069, China
|
40
|
Arora V, Sanguinetti G. Challenges for machine learning in RNA-protein interaction prediction. Stat Appl Genet Mol Biol 2022; 21:sagmb-2021-0087. [PMID: 35073469 DOI: 10.1515/sagmb-2021-0087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Accepted: 01/02/2022] [Indexed: 11/15/2022]
Abstract
RNA-protein interactions have long been recognised as crucial regulators of gene expression. Recently, the development of scalable experimental techniques to measure these interactions has revolutionised the field, leading to the production of large-scale datasets which offer both opportunities and challenges for machine learning techniques. In this brief note, we will discuss some of the major stumbling blocks towards the use of machine learning in computational RNA biology, focusing specifically on the problem of predicting RNA-protein interactions from next-generation sequencing data.
Affiliation(s)
- Viplove Arora
- Data Science, Department of Physics, International School for Advanced Studies (SISSA), Trieste 34136, Italy
| | - Guido Sanguinetti
- Data Science, Department of Physics, International School for Advanced Studies (SISSA), Trieste 34136, Italy
|