1
Yao D, Zhang B, Zhan X, Zhang B, Li XK. Predicting lncRNA-Disease Associations Based on a Dual-Path Feature Extraction Network with Multiple Sources of Information Integration. ACS Omega 2024; 9:35100-35112. [PMID: 39157140 PMCID: PMC11325412 DOI: 10.1021/acsomega.4c05365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2024] [Revised: 07/04/2024] [Accepted: 07/22/2024] [Indexed: 08/20/2024]
Abstract
Identifying the associations between long noncoding RNAs (lncRNAs) and diseases is critical for disease prevention, diagnosis, and treatment. However, conducting wet-lab experiments to discover these associations is time-consuming and costly. Therefore, computational modeling for predicting lncRNA-disease associations (LDAs) has become an important alternative. To enhance the accuracy of LDA prediction and alleviate the oversmoothing of node features that arises when graph neural networks are used to explore latent node features, we introduce DPFELDA, a dual-path feature extraction network that integrates information from multiple sources to predict LDAs. Initially, we establish a dual-view structure of lncRNAs and diseases and a heterogeneous network of lncRNA-disease-microRNA (miRNA) interactions. Subsequently, features are extracted using a dual-path feature extraction network. In particular, we employ a combination of a graph convolutional network, a convolutional block attention module, and a node aggregation layer to perform multilayer topology feature extraction for the dual-view structure of lncRNAs and diseases. Additionally, we utilize a Transformer model to construct the node topology feature residual network for obtaining node-specific features in heterogeneous networks. Finally, XGBoost is employed for LDA prediction. The experimental results demonstrate that DPFELDA outperforms the benchmark model on various benchmark data sets. In the course of model exploration, it becomes evident that DPFELDA successfully alleviates the issue of node feature oversmoothing induced by graph-based learning. Ablation experiments confirm the effectiveness of the innovative module, and a case study substantiates the accuracy of the DPFELDA model in predicting novel LDAs for characteristic diseases.
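The node feature oversmoothing mentioned in this abstract can be illustrated with a minimal sketch. This is not the DPFELDA model: it assumes a hypothetical five-node path graph with made-up two-dimensional features, and it applies only the parameter-free, symmetrically normalized propagation step commonly used in graph convolutional networks, with no learned weights or nonlinearity. The names `normalized_adjacency`, `propagate`, and `min_pairwise_cosine` are illustrative.

```python
import math


def normalized_adjacency(edges, n):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list-of-lists matrix."""
    # Start from the identity so that every node has a self-loop.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def propagate(a_hat, x):
    """One propagation step X <- A_hat X (no weights, no activation)."""
    n, d = len(x), len(x[0])
    return [[sum(a_hat[i][k] * x[k][j] for k in range(n)) for j in range(d)]
            for i in range(n)]


def cosine(u, v):
    dot = sum(p * q for p, q in zip(u, v))
    norm_u = math.sqrt(sum(p * p for p in u))
    norm_v = math.sqrt(sum(q * q for q in v))
    return dot / (norm_u * norm_v)


def min_pairwise_cosine(x):
    """Smallest cosine similarity between any two node feature vectors."""
    n = len(x)
    return min(cosine(x[i], x[j]) for i in range(n) for j in range(i + 1, n))


# Hypothetical toy graph: a path 0-1-2-3-4 with arbitrary initial features.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2], [0.1, 0.9]]

a_hat = normalized_adjacency(edges, len(features))
x = features
history = [min_pairwise_cosine(x)]
for _ in range(100):
    x = propagate(a_hat, x)
    history.append(min_pairwise_cosine(x))

# As the number of propagation steps grows, all node vectors point in
# nearly the same direction, so the minimum pairwise cosine approaches 1.
print(f"before: {history[0]:.3f}, after 100 steps: {history[-1]:.6f}")
```

In this toy setting the node vectors become almost parallel after many propagation steps, which is the loss of node-specific information that deeper graph networks have to counteract.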
Affiliation(s)
- Dengju Yao
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
- Binbin Zhang
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
- Xiaojuan Zhan
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
  - College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin 150050, China
- Bo Zhang
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
- Xiang Kui Li
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
2
Wu H, Liu J, Zhang R, Lu Y, Cui G, Cui Z, Ding Y. A review of deep learning methods for ligand based drug virtual screening. Fundamental Research 2024; 4:715-737. [PMID: 39156568 PMCID: PMC11330120 DOI: 10.1016/j.fmre.2024.02.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/10/2024] [Accepted: 02/18/2024] [Indexed: 08/20/2024] Open
Abstract
Drug discovery is costly and time-consuming, and modern drug discovery increasingly relies on computational methods to reduce the time and cost of the process. In particular, the time required for vaccine and drug discovery is prolonged during emergency situations such as the coronavirus disease 2019 (COVID-19) pandemic. Recently, the performance of deep learning methods in drug virtual screening has been particularly prominent. Researchers now face the questions of how to summarize existing deep learning work in drug virtual screening, select appropriate models for different drug screening problems, exploit the advantages of deep learning models, and further improve the capability of deep learning in drug virtual screening. This review first introduces the basic concepts of drug virtual screening, common datasets, and data representation methods. Then, a large number of common deep learning methods for drug virtual screening are compared and analyzed. In addition, datasets of different sizes are constructed independently to evaluate the performance of each deep learning model on the difficult problem of large-scale ligand virtual screening. Finally, the existing challenges and future directions in the field of virtual screening are presented.
Affiliation(s)
- Hongjie Wu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Junkai Liu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Runhua Zhang
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yaoyao Lu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Guozeng Cui
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Zhiming Cui
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
3
Zhou Z, Liao Q, Wei J, Zhuo L, Wu X, Fu X, Zou Q. Revisiting drug-protein interaction prediction: a novel global-local perspective. Bioinformatics 2024; 40:btae271. [PMID: 38648052 PMCID: PMC11087820 DOI: 10.1093/bioinformatics/btae271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 02/09/2024] [Accepted: 04/17/2024] [Indexed: 04/25/2024] Open
Abstract
MOTIVATION: Accurate inference of potential drug-protein interactions (DPIs) aids in understanding drug mechanisms and developing novel treatments. Existing deep learning models, however, struggle with accurate node representation in DPI prediction, limiting their performance. RESULTS: We propose a new computational framework that integrates global and local features of nodes in the drug-protein bipartite graph for efficient DPI inference. Initially, we employ pre-trained models to acquire fundamental knowledge of drugs and proteins and to determine their initial features. Subsequently, the MinHash and HyperLogLog algorithms are utilized to estimate the similarity and set cardinality between drug and protein subgraphs, serving as their local features. Then, an energy-constrained diffusion mechanism is integrated into the transformer architecture, capturing interdependencies between nodes in the drug-protein bipartite graph and extracting their global features. Finally, we fuse the local and global features of nodes and employ multilayer perceptrons to predict the likelihood of potential DPIs. A comprehensive and precise node representation guarantees efficient prediction of unknown DPIs by the model. Various experiments validate the accuracy and reliability of our model, with molecular docking results revealing its capability to identify potential DPIs not present in existing databases. This approach is expected to offer valuable insights for furthering drug repurposing and personalized medicine research. AVAILABILITY AND IMPLEMENTATION: Our code and data are accessible at: https://github.com/ZZCrazy00/DPI.
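The MinHash step named in this abstract can be sketched generically as follows. This is not the authors' implementation (their code is at the repository linked above); it is a minimal illustration, assuming that each node's local subgraph is represented as a plain set of neighbor identifiers. The identifiers, the salted SHA-256 hash family, and the function names are all hypothetical choices made for the example. HyperLogLog, which estimates set cardinality, is not shown.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """One member of a hash family: SHA-256 salted with the seed index."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def minhash_signature(items, num_hashes=128):
    """For each hash function, keep the minimum hash value over the set."""
    return [min(_hash(seed, item) for item in items)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two sets agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, used here only for comparison."""
    return len(a & b) / len(a | b)


# Hypothetical neighbor sets for one drug node and one protein node.
drug_neighbors = {"P1", "P2", "P3", "P4", "P5", "P6"}
protein_neighbors = {"P4", "P5", "P6", "P7", "P8"}

sig_drug = minhash_signature(drug_neighbors)
sig_protein = minhash_signature(protein_neighbors)

print("estimated:", estimate_jaccard(sig_drug, sig_protein))
print("exact:    ", exact_jaccard(drug_neighbors, protein_neighbors))
```

The probability that two sets share the same minimum under a random hash function equals their Jaccard similarity, so the match rate across many hash functions gives an estimate whose error shrinks as `num_hashes` grows. The signatures have a fixed length regardless of set size, which is what makes the comparison cheap on large graphs.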
Affiliation(s)
- Zhecheng Zhou
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
- Qingquan Liao
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
- Jinhang Wei
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
- Linlin Zhuo
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
- Xiaonan Wu
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
- Xiangzheng Fu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611730, China
4
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Developing new drugs is expensive and time-consuming. Accurately predicting the interactions between drugs and targets will likely change how drugs are discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. This paper examines computational methods that use sequence and structure to study protein-ligand interactions. It starts with an overview of the data sets used in this area and the various approaches for representing proteins and ligands. Sequence-based and structure-based classification criteria are then used to categorize and summarize both the classical machine learning models and the deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are discussed. Furthermore, the diverse applications of protein-ligand interaction models in drug research are presented. Lastly, the current challenges and future directions in this field are addressed.
Affiliation(s)
- Yunjiang Zhang
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shuyuan Li
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Kong Meng
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shaorui Sun
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
5
Li X, Yang Q, Luo G, Xu L, Dong W, Wang W, Dong S, Wang K, Xuan P, Gao X. SAGDTI: self-attention and graph neural network with multiple information representations for the prediction of drug-target interactions. Bioinformatics Advances 2023; 3:vbad116. [PMID: 38282612 PMCID: PMC10818136 DOI: 10.1093/bioadv/vbad116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 07/31/2023] [Accepted: 08/24/2023] [Indexed: 01/30/2024]
Abstract
Motivation: Accurate in silico identification of target proteins that interact with drugs is a vital step that can significantly foster drug repurposing and drug discovery. In recent years, numerous deep learning-based methods have been introduced that treat drug-target interaction (DTI) prediction as a classification task whose output is a binary label indicating the absence or presence of an interaction. However, existing studies often (i) neglect the unique molecular attributes when embedding drugs and proteins, and (ii) determine the interaction of drug-target pairs without considering biological interaction information. Results: In this study, we propose an end-to-end attention-derived method based on the self-attention mechanism and graph neural network, termed SAGDTI. The aim of this method is to overcome the aforementioned drawbacks in the identification of DTIs. SAGDTI is the first method to sufficiently consider the unique molecular attribute representations for both drugs and targets in the input form of SMILES sequences and three-dimensional structure graphs. In addition, our method aggregates the feature attributes of biological information between drugs and targets through multi-scale topologies and diverse connections. Experimental results illustrate that SAGDTI outperforms existing prediction models, benefiting from the unique molecular attributes embedded by atom-level attention and the biological interaction information representation aggregated by node-level attention. Moreover, a case study on severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) shows that our model is a powerful tool for identifying DTIs in real life. Availability and implementation: The data and codes underlying this article are available on GitHub at https://github.com/lixiaokun2020/SAGDTI.
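For readers unfamiliar with the self-attention mechanism this abstract builds on, the following is a minimal generic sketch of single-head scaled dot-product self-attention. It is not the SAGDTI architecture: it assumes identity projections (queries, keys, and values are all the raw input vectors, with no learned weights), and the three input vectors are made-up stand-ins for token or atom embeddings.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    Returns the attended vectors and the attention weight matrix.
    A real model would first apply learned linear projections to x.
    """
    d = len(x[0])
    scale = math.sqrt(d)
    weights = []
    for q in x:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in x]
        weights.append(softmax(scores))
    # Each output vector is a weighted average of all value vectors.
    out = [[sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)]
           for w in weights]
    return out, weights


# Hypothetical embeddings for three tokens (for example, atoms in a molecule).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attention = self_attention(embeddings)
for row in attention:
    print([round(w, 3) for w in row])
```

Each row of the weight matrix sums to 1 and indicates how strongly one position draws on every other position when its representation is recomputed.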
Affiliation(s)
- Xiaokun Li
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
  - Postdoctoral Program of Heilongjiang Hengxun Technology Co., Ltd., Harbin 150090, China
- Qiang Yang
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
  - Postdoctoral Program of Heilongjiang Hengxun Technology Co., Ltd., Harbin 150090, China
- Gongning Luo
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Long Xu
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
  - Postdoctoral Program of Heilongjiang Hengxun Technology Co., Ltd., Harbin 150090, China
- Weihe Dong
  - Postdoctoral Program of Heilongjiang Hengxun Technology Co., Ltd., Harbin 150090, China
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Wei Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Suyu Dong
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Kuanquan Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Ping Xuan
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
  - Department of Computer Science, School of Engineering, Shantou University, Shantou 515063, China
- Xin Gao
  - Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal 23955, Saudi Arabia
6
Dong W, Yang Q, Wang J, Xu L, Li X, Luo G, Gao X. Multi-modality attribute learning-based method for drug-protein interaction prediction based on deep neural network. Brief Bioinform 2023; 24:7145903. [PMID: 37114624 DOI: 10.1093/bib/bbad161] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/05/2022] [Revised: 03/19/2023] [Accepted: 04/02/2023] [Indexed: 04/29/2023] Open
Abstract
Identification of active candidate compounds for target proteins, also called drug-protein interaction (DPI) prediction, is an essential but time-consuming and expensive step that advances drug discovery. In recent years, deep network-based learning methods have frequently been proposed for DPI prediction owing to their powerful feature representation capability. However, the performance of existing DPI methods is still limited by insufficiently labeled pharmacological data and neglected intermolecular information. Therefore, overcoming these difficulties to improve DPI prediction performance is an urgent challenge for researchers. In this article, we designed an innovative 'multi-modality attributes' learning-based framework for DPI prediction with molecular transformer and graph convolutional networks, termed multi-modality attributes (MMA)-DPI. Specifically, intermolecular sub-structural information and chemical semantic representations were extracted from biomedical data through an augmented transformer module. A tri-layer graph convolutional neural network module was applied to associate the neighbor topology information and learn the condensed dimensional features by aggregating a heterogeneous network that contains multiple biological representations of drugs, proteins, diseases, and side effects. Then, the learned representations were taken as the input of a fully connected neural network module to further integrate them in molecular and topological space. Finally, the attribute representations were fused with adaptive learning weights to calculate the interaction score for the DPI task. MMA-DPI was evaluated under different experimental conditions, and the results demonstrate that the proposed method achieved higher performance than existing state-of-the-art frameworks.
Affiliation(s)
- Weihe Dong
  - College of Information and Computer Engineering, Northeast Forestry University, Hexing Road, 150040, Harbin, China
- Qiang Yang
  - School of Computer Science and Technology, Heilongjiang University, Xuefu Road, 150080, Harbin, China
  - Postdoctoral Program of Heilongjiang Hengxun Technology Co., Ltd., Xuefu Road, 150080, Harbin, China
- Jian Wang
  - College of Information and Computer Engineering, Northeast Forestry University, Hexing Road, 150040, Harbin, China
- Long Xu
  - School of Computer Science and Technology, Heilongjiang University, Xuefu Road, 150080, Harbin, China
  - Postdoctoral Program of Heilongjiang Hengxun Technology Co., Ltd., Xuefu Road, 150080, Harbin, China
- Xiaokun Li
  - School of Computer Science and Technology, Heilongjiang University, Xuefu Road, 150080, Harbin, China
  - Postdoctoral Program of Heilongjiang Hengxun Technology Co., Ltd., Xuefu Road, 150080, Harbin, China
- Gongning Luo
  - Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal 23955, Saudi Arabia
  - School of Computer Science and Technology, Harbin Institute of Technology, West Dazhi Street, 150001, Harbin, China
- Xin Gao
  - Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal 23955, Saudi Arabia