1
|
Shi W, Yang H, Xie L, Yin XX, Zhang Y. A review of machine learning-based methods for predicting drug-target interactions. Health Inf Sci Syst 2024; 12:30. [PMID: 38617016 PMCID: PMC11014838 DOI: 10.1007/s13755-024-00287-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2023] [Accepted: 03/04/2024] [Indexed: 04/16/2024] Open
Abstract
The prediction of drug-target interactions (DTI) is a crucial preliminary stage in drug discovery and development, given the substantial risk of failure and the prolonged validation period associated with in vitro and in vivo experiments. In the contemporary landscape, various machine learning-based methods have emerged as indispensable tools for DTI prediction. This paper begins by placing emphasis on the data representation employed by these methods, delineating five representations for drugs and four for proteins. The methods are then categorized into traditional machine learning-based approaches and deep learning-based ones, with a discussion of representative approaches in each category and the introduction of a novel taxonomy for deep neural network models in DTI prediction. Additionally, we present a synthesis of commonly used datasets and evaluation metrics to facilitate practical implementation. In conclusion, we address current challenges and outline potential future directions in this research field.
Collapse
Affiliation(s)
- Wen Shi
- Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004 China
| | - Hong Yang
- Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
| | - Linhai Xie
- State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Beijing, 102206 China
| | - Xiao-Xia Yin
- Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
| | - Yanchun Zhang
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004 China
- Department of New Networks, Peng Cheng Laboratory, Shenzhen, 518000 China
| |
Collapse
|
2
|
Xu C, Zheng L, Fan Q, Liu Y, Zeng C, Ning X, Liu H, Du K, Lu T, Chen Y, Zhang Y. Progress in the application of artificial intelligence in molecular generation models based on protein structure. Eur J Med Chem 2024; 277:116735. [PMID: 39098131 DOI: 10.1016/j.ejmech.2024.116735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2024] [Revised: 07/12/2024] [Accepted: 07/30/2024] [Indexed: 08/06/2024]
Abstract
The molecular generation models based on protein structures represent a cutting-edge research direction in artificial intelligence-assisted drug discovery. This article aims to comprehensively summarize the research methods and developments by analyzing a series of novel molecular generation models predicated on protein structures. Initially, we categorize the molecular generation models based on protein structures and highlight the architectural frameworks utilized in these models. Subsequently, we detail the design and implementation of protein structure-based molecular generation models by introducing different specific examples. Lastly, we outline the current opportunities and challenges encountered in this field, intending to offer guidance and a referential framework for developing and studying new models in related fields in the future.
Collapse
Affiliation(s)
- Chengcheng Xu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Lidan Zheng
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Qing Fan
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Yingxu Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Chen Zeng
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Xiangzhen Ning
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Haichun Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Ke Du
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Tao Lu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China; State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China.
| | - Yadong Chen
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China.
| | - Yanmin Zhang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China.
| |
Collapse
|
3
|
E U, T M, A V G, D P. A comprehensive survey of drug-target interaction analysis in allopathy and siddha medicine. Artif Intell Med 2024; 157:102986. [PMID: 39326289 DOI: 10.1016/j.artmed.2024.102986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2023] [Revised: 08/13/2024] [Accepted: 09/18/2024] [Indexed: 09/28/2024]
Abstract
Effective drug delivery is the cornerstone of modern healthcare, ensuring therapeutic compounds reach their intended targets efficiently. This paper explores the potential of personalized and holistic healthcare, driven by the synergy between traditional and allopathic medicine systems, with a specific focus on the vast reservoir of medicinal compounds found in plants rooted in the historical legacy of traditional medicine. Motivated by the desire to unlock the therapeutic potential of medicinal plants and bridge the gap between traditional and allopathic medicine, this survey delves into in-silico computational approaches for studying Drug-Target Interactions (DTI) within the contexts of allopathy and siddha medicine. The contributions of this survey are multifaceted: it offers a comprehensive overview of in-silico methods for DTI analysis in both systems, identifies common challenges in DTI studies, provides insights into future directions to advance DTI analysis, and includes a comparative analysis of DTI in allopathy and siddha medicine. The findings of this survey highlight the pivotal role of in-silico computational approaches in advancing drug research and development in both allopathy and siddha medicine, emphasizing the importance of integrating these methods to drive the future of personalized healthcare.
Collapse
Affiliation(s)
- Uma E
- Department of Information Science and Technology, College of Engineering Guindy, Chennai, India.
| | - Mala T
- Department of Information Science and Technology, College of Engineering Guindy, Chennai, India
| | - Geetha A V
- Department of Information Science and Technology, College of Engineering Guindy, Chennai, India
| | - Priyanka D
- Department of Information Science and Technology, College of Engineering Guindy, Chennai, India
| |
Collapse
|
4
|
Liu Y, Xia X, Gong Y, Song B, Zeng X. SSR-DTA: Substructure-aware multi-layer graph neural networks for drug-target binding affinity prediction. Artif Intell Med 2024; 157:102983. [PMID: 39321746 DOI: 10.1016/j.artmed.2024.102983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2023] [Revised: 09/10/2024] [Accepted: 09/13/2024] [Indexed: 09/27/2024]
Abstract
Accurate prediction of drug-target binding affinity (DTA) is essential in the field of drug discovery. Recently, scientists have been attempting to utilize artificial intelligence prediction to screen out a significant number of ineffective compounds, thereby mitigating labor and financial losses. While graph neural networks (GNNs) have been applied to DTA, existing GNNs have limitations in effectively extracting substructural features across various sizes. Functional groups play a crucial role in modulating molecular properties, but existing GNNs struggle with feature extraction from certain motifs due to scale mismatches. Additionally, sequence-based models for target proteins lack the integration of structural information. To address these limitations, we present SSR-DTA, a multi-layer graph network capable of adapting to diverse structural sizes, which can extract richer biological features, thereby improving the robustness and accuracy of predictions. Multi-layer GNNs enable the capture of molecular motifs across different scales, ranging from atomic to macrocyclic motifs. Furthermore, we introduce BiGNN to simultaneously learn sequence and structural information. Sequence information corresponds to the primary structure of proteins, while graph information represents the tertiary structure. BiGNN assimilates richer information compared to sequence-based methods while mitigating the impact of errors from predicted structures, resulting in more accurate predictions. Through rigorous experimental evaluations conducted on four benchmark datasets, we demonstrate the superiority of SSR-DTA over state-of-the-art models. Particularly, in comparison to state-of-the-art models, SSR-DTA demonstrates an impressive 20% reduction in mean squared error on the Davis dataset and a 5% reduction on the KIBA dataset, underscoring its potential as a valuable tool for advancing DTA prediction.
Collapse
Affiliation(s)
- Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410086, Hunan, China; Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Anhui University, Hefei, 230601, Anhui, China
| | - Xinyan Xia
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410086, Hunan, China
| | - Yongshun Gong
- School of Software, Shandong University, Jinan, 250100, Shandong, China
| | - Bosheng Song
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410086, Hunan, China
| | - Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410086, Hunan, China.
| |
Collapse
|
5
|
Tang X, Ma W, Yang M, Li W. MFF-DTA: Multi-scale feature fusion for drug-target affinity prediction. Methods 2024; 231:1-7. [PMID: 39218169 DOI: 10.1016/j.ymeth.2024.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2024] [Revised: 07/19/2024] [Accepted: 08/27/2024] [Indexed: 09/04/2024] Open
Abstract
Accurately predicting drug-target affinity is crucial in expediting the discovery and development of new drugs, which is a complex and risky process. Identifying these interactions not only aids in screening potential compounds but also guides further optimization. To address this, we propose a multi-perspective feature fusion model, MFF-DTA, which integrates chemical structure, biological sequence, and other data to comprehensively capture drug-target affinity features. The MFF-DTA model incorporates multiple feature learning components, each of which is capable of extracting drug molecular features and protein target information, respectively. These components are able to obtain key information from both global and local perspectives. Then, these features from different perspectives are efficiently combined using specific splicing strategies to create a comprehensive representation. Finally, the model uses the fused features to predict drug-target affinity. Comparative experiments show that MFF-DTA performs optimally on the Davis and KIBA data sets. Ablation experiments demonstrate that removing specific components results in the loss of unique information, thus confirming the effectiveness of the MFF-DTA design. Improvements in DTA prediction methods will decrease costs and time in drug development, enhancing industry efficiency and ultimately benefiting patients.
Collapse
Affiliation(s)
- Xiwei Tang
- School of Computer Science, Hunan First Normal University, Changsha, Hunan, China.
| | - Wanjun Ma
- Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology Changsha, Hunan, China
| | - Mengyun Yang
- School of Computer Science, Hunan First Normal University, Changsha, Hunan, China
| | - Wenjun Li
- Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology Changsha, Hunan, China
| |
Collapse
|
6
|
Quan L, Wu J, Jiang Y, Pan D, Qiang L. DTA-GTOmega: Enhancing Drug-Target Binding Affinity Prediction with Graph Transformers Using OmegaFold Protein Structures. J Mol Biol 2024:168843. [PMID: 39481634 DOI: 10.1016/j.jmb.2024.168843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2024] [Revised: 10/05/2024] [Accepted: 10/24/2024] [Indexed: 11/02/2024]
Abstract
Understanding drug-protein interactions is crucial for elucidating drug mechanisms and optimizing drug development. However, existing methods have limitations in representing the three-dimensional structure of targets and capturing the complex relationships between drugs and targets. This study proposes a new method, DTA-GTOmega, for predicting drug-target binding affinity. DTA-GTOmega utilizes OmegaFold to predict protein three-dimensional structure and construct target graphs, while processing drug SMILES sequences with RDKit to generate drug graphs. By employing multi-layer graph transformer modules and co-attention modules, this method effectively integrates atomic-level features of drugs and residue-level features of targets, accurately modeling the complex interactions between drugs and targets, thereby significantly improving the accuracy of binding affinity predictions. Our method outperforms existing techniques on benchmark datasets such as KIBA, Davis, and BindingDB_Kd under cold-start setting. Moreover, DTA-GTOmega demonstrates competitive performance in real-world DTI scenarios involving DrugBank data and drug-target interactions related to cardiovascular and nervous system-related diseases, highlighting its robust generalization capabilities. Additionally, the introduced DTI evaluation metrics further validate DTA-GTOmega's potential in handling imbalanced data.
Collapse
Affiliation(s)
- Lijun Quan
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu 210000, China
| | - Jian Wu
- China Mobile (Suzhou) Software Technology Co., Ltd., Suzhou 215000, China
| | - Yelu Jiang
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
| | - Deng Pan
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
| | - Lyu Qiang
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu 210000, China.
| |
Collapse
|
7
|
Sun X, Huang J, Fang Y, Jin Y, Wu J, Wang G, Jia J. MREDTA: A BERT and transformer-based molecular representation encoder for predicting drug-target binding affinity. FASEB J 2024; 38:e70083. [PMID: 39373982 DOI: 10.1096/fj.202401254r] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2024] [Revised: 09/05/2024] [Accepted: 09/18/2024] [Indexed: 10/08/2024]
Abstract
Drug-target binding affinity (DTA) prediction is vital for drug repositioning. The accuracy and generalizability of DTA models remain a major challenge. Here, we develop a model composed of BERT-Trans Block, Multi-Trans Block, and DTI Learning modules, referred to as Molecular Representation Encoder-based DTA prediction (MREDTA). MREDTA has three advantages: (1) extraction of both local and global molecular features simultaneously through skip connections; (2) improved sensitivity to molecular structures through the Multi-Trans Block; (3) enhanced generalizability through the introduction of BERT. Compared with 12 advanced models, benchmark testing of KIBA and Davis datasets demonstrated optimal performance of MREDTA. In case study, we applied MREDTA to 2034 FDA-approved drugs for treating non-small-cell lung cancer (NSCLC), all of which act on mutant EGFRT790M protein. The corresponding molecular docking results demonstrated the robustness of MREDTA.
Collapse
Affiliation(s)
- Xu Sun
- Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
| | - Juanjuan Huang
- Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
- State Key Laboratory for Diagnosis and Treatment of Severe Zoonotic Infectious Diseases, Key Laboratory for Zoonosis Research of the Ministry of Education, College of Basic Medicine, Jilin University, Changchun, China
| | - Yabo Fang
- Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
| | - Yixuan Jin
- Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
| | - Jiageng Wu
- Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
| | - Guoqing Wang
- State Key Laboratory for Diagnosis and Treatment of Severe Zoonotic Infectious Diseases, Key Laboratory for Zoonosis Research of the Ministry of Education, College of Basic Medicine, Jilin University, Changchun, China
| | - Jiwei Jia
- Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
- Jilin National Applied Mathematical Center, Jilin University, Changchun, China
| |
Collapse
|
8
|
Tang X, Zhou Y, Yang M, Li W. TC-DTA: Predicting Drug-Target Binding Affinity With Transformer and Convolutional Neural Networks. IEEE Trans Nanobioscience 2024; 23:572-578. [PMID: 39133595 DOI: 10.1109/tnb.2024.3441590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/16/2024]
Abstract
Bioinformatics is a rapidly evolving field that applies computational methods to analyze and interpret biological data. A key task in bioinformatics is identifying novel drug-target interactions (DTIs), which plays a crucial role in drug discovery. Most computational approaches treat DTI prediction as a binary classification problem, determining whether drug-target pairs interact. However, with the growing availability of drug-target binding affinity data, this binary task can be reframed as a regression problem focused on drug-target affinity (DTA). DTA quantifies the strength of drug-target binding, offering more detailed insights than DTI and serving as a valuable tool for virtual screening in drug discovery. Accurately predicting compound interactions with targets can accelerate the drug development process. In this study, we introduce a deep learning model named TC-DTA for DTA prediction, leveraging convolutional neural networks (CNN) and the encoder module of the transformer architecture. We begin by extracting raw drug SMILES strings and protein amino acid sequences from the dataset, which are then represented using various encoding methods. Subsequently, we employ CNN and the transformer's encoder module to extract features from the drug SMILES strings and protein sequences, respectively. Finally, the feature information is concatenated and input into a multi-layer perceptron to predict binding affinity scores. We evaluated our model on two benchmark DTA datasets, Davis and KIBA, comparing it with methods such as KronRLS, SimBoost, and DeepDTA. Our model, TC-DTA, outperformed these baseline methods based on evaluation metrics like Mean Squared Error (MSE), Concordance Index (CI), and Regression towards the Mean Index ( rm2 ). These results highlight the effectiveness of the Transformer's encoder and CNN in extracting meaningful representations from sequences, thereby enhancing DTA prediction accuracy. This deep learning model can accelerate drug discovery by identifying drug candidates with high binding affinity to specific targets. Compared to traditional methods, machine learning technology offers a more effective and efficient approach to drug discovery.
Collapse
|
9
|
Huang J, Sun C, Li M, Tang R, Xie B, Wang S, Wei JM. Structure-inclusive similarity based directed GNN: a method that can control information flow to predict drug-target binding affinity. Bioinformatics 2024; 40:btae563. [PMID: 39292540 PMCID: PMC11474107 DOI: 10.1093/bioinformatics/btae563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2024] [Revised: 05/21/2024] [Accepted: 09/17/2024] [Indexed: 09/20/2024] Open
Abstract
MOTIVATION Exploring the association between drugs and targets is essential for drug discovery and repurposing. Comparing with the traditional methods that regard the exploration as a binary classification task, predicting the drug-target binding affinity can provide more specific information. Many studies work based on the assumption that similar drugs may interact with the same target. These methods constructed a symmetric graph according to the undirected drug similarity or target similarity. Although these similarities can measure the difference between two molecules, it is unable to analyze the inclusion relationship of their substructure. For example, if drug A contains all the substructures of drug B, then in the message-passing mechanism of the graph neural network, drug A should acquire all the properties of drug B, while drug B should only obtain some of the properties of A. RESULTS To this end, we proposed a structure-inclusive similarity (SIS) which measures the similarity of two drugs by considering the inclusion relationship of their substructures. Based on SIS, we constructed a drug graph and a target graph, respectively, and predicted the binding affinities between drugs and targets by a graph convolutional network-based model. Experimental results show that considering the inclusion relationship of the substructure of two molecules can effectively improve the accuracy of the prediction model. The performance of our SIS-based prediction method outperforms several state-of-the-art methods for drug-target binding affinity prediction. The case studies demonstrate that our model is a practical tool to predict the binding affinity between drugs and targets. AVAILABILITY AND IMPLEMENTATION Source codes and data are available at https://github.com/HuangStomach/SISDTA.
Collapse
Affiliation(s)
- Jipeng Huang
- Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
- College of Computer Science, Nankai University, Tianjin 300071, China
- Tianjin Key Laboratory of Network and Data Security, Tianjin 300350, China
| | - Chang Sun
- Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
- College of Computer Science, Nankai University, Tianjin 300071, China
- Tianjin Key Laboratory of Network and Data Security, Tianjin 300350, China
| | - Minglei Li
- Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
- College of Computer Science, Nankai University, Tianjin 300071, China
- Tianjin Key Laboratory of Network and Data Security, Tianjin 300350, China
| | - Rong Tang
- Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
- College of Computer Science, Nankai University, Tianjin 300071, China
- Tianjin Key Laboratory of Network and Data Security, Tianjin 300350, China
| | - Bin Xie
- College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
| | - Shuqin Wang
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin, Xi Qing District 300387, China
| | - Jin-Mao Wei
- Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
- College of Computer Science, Nankai University, Tianjin 300071, China
| |
Collapse
|
10
|
Durant G, Boyles F, Birchall K, Deane CM. The future of machine learning for small-molecule drug discovery will be driven by data. NATURE COMPUTATIONAL SCIENCE 2024; 4:735-743. [PMID: 39407003 DOI: 10.1038/s43588-024-00699-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/01/2024] [Accepted: 09/03/2024] [Indexed: 10/25/2024]
Abstract
Many studies have prophesied that the integration of machine learning techniques into small-molecule therapeutics development will help to deliver a true leap forward in drug discovery. However, increasingly advanced algorithms and novel architectures have not always yielded substantial improvements in results. In this Perspective, we propose that a greater focus on the data for training and benchmarking these models is more likely to drive future improvement, and explore avenues for future research and strategies to address these data challenges.
Collapse
Affiliation(s)
- Guy Durant
- Department of Statistics, University of Oxford, Oxford, UK
| | - Fergus Boyles
- Department of Statistics, University of Oxford, Oxford, UK
| | | | | |
Collapse
|
11
|
Zhao L, Wang H, Shi S. PocketDTA: an advanced multimodal architecture for enhanced prediction of drug-target affinity from 3D structural data of target binding pockets. Bioinformatics 2024; 40:btae594. [PMID: 39365726 PMCID: PMC11502498 DOI: 10.1093/bioinformatics/btae594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2024] [Revised: 09/20/2024] [Accepted: 10/02/2024] [Indexed: 10/06/2024] Open
Abstract
MOTIVATION Accurately predicting the drug-target binding affinity (DTA) is crucial to drug discovery and repurposing. Although deep learning has been widely used in this field, it still faces challenges with insufficient generalization performance, inadequate use of 3D information, and poor interpretability. RESULTS To alleviate these problems, we developed the PocketDTA model. This model enhances the generalization performance by pre-trained models ESM-2 and GraphMVP. It ingeniously handles the first 3 (top-3) target binding pockets and drug 3D information through customized GVP-GNN Layers and GraphMVP-Decoder. In addition, it uses a bilinear attention network to enhance interpretability. Comparative analysis with state-of-the-art (SOTA) methods on the optimized Davis and KIBA datasets reveals that the PocketDTA model exhibits significant performance advantages. Further, ablation studies confirm the effectiveness of the model components, whereas cold-start experiments illustrate its robust generalization capabilities. In particular, the PocketDTA model has shown significant advantages in identifying key drug functional groups and amino acid residues via molecular docking and literature validation, highlighting its strong potential for interpretability. AVAILABILITY AND IMPLEMENTATION Code and data are available at: https://github.com/zhaolongNCU/PocketDTA.
Collapse
Affiliation(s)
- Long Zhao
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China
| | - Hongmei Wang
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China
| | - Shaoping Shi
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China
- Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang 330031, China
| |
Collapse
|
12
|
Ye Q, Sun Y. Graph neural pre-training based drug-target affinity prediction. Front Genet 2024; 15:1452339. [PMID: 39350770 PMCID: PMC11439641 DOI: 10.3389/fgene.2024.1452339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2024] [Accepted: 08/27/2024] [Indexed: 10/04/2024] Open
Abstract
Computational drug-target affinity prediction has the potential to accelerate drug discovery. Currently, pre-training models have achieved significant success in various fields due to their ability to train the model using vast amounts of unlabeled data. However, given the scarcity of drug-target interaction data, pre-training models can only be trained separately on drug and target data, resulting in features that are insufficient for drug-target affinity prediction. To address this issue, in this paper, we design a graph neural pre-training-based drug-target affinity prediction method (GNPDTA). This approach comprises three stages. In the first stage, two pre-training models are utilized to extract low-level features from drug atom graphs and target residue graphs, leveraging a large number of unlabeled training samples. In the second stage, two 2D convolutional neural networks are employed to combine the extracted drug atom features and target residue features into high-level representations of drugs and targets. Finally, in the third stage, a predictor is used to predict the drug-target affinity. This approach fully utilizes both unlabeled and labeled training samples, enhancing the effectiveness of pre-training models for drug-target affinity prediction. In our experiments, GNPDTA outperforms other deep learning methods, validating the efficacy of our approach.
Collapse
Affiliation(s)
- Qing Ye
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
| | - Yaxin Sun
- School of Computer Science and Technology (School of Artificial Intelligence), Zhejiang Normal University, Jinhua, China
- Zhejiang Aerospace Hengjia Data Technology Co. Ltd., Jiaxing, China
| |
Collapse
|
13
|
Chen G, He H, Lv Q, Zhao L, Chen CYC. MMFA-DTA: Multimodal Feature Attention Fusion Network for Drug-Target Affinity Prediction for Drug Repurposing Against SARS-CoV-2. J Chem Theory Comput 2024. [PMID: 39269697 DOI: 10.1021/acs.jctc.4c00663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/15/2024]
Abstract
The continuous emergence of novel infectious diseases poses a significant threat to global public health security, necessitating the development of small-molecule inhibitors that directly target pathogens. The RNA-dependent RNA polymerase (RdRp) and main protease (Mpro) of SARS-CoV-2 have been validated as potential key antiviral drug targets for the treatment of COVID-19. However, the conventional new drug R&D cycle takes 10-15 years, failing to meet the urgent needs during epidemics. Here, we propose a general multimodal deep learning framework for drug repurposing, MMFA-DTA, to enable rapid virtual screening of known drugs and significantly improve discovery efficiency. By extracting graph topological and sequence features from both small molecules and proteins, we design attention mechanisms to achieve dynamic fusion across modalities. Results demonstrate the superior performance of MMFA-DTA in drug-target affinity prediction over several state-of-the-art baseline methods on Davis and KIBA data sets, validating the benefits of heterogeneous information integration for representation learning and interaction modeling. Further fine-tuning on COVID-19-relevant bioactivity data enhances model predictions for critical SARS-CoV-2 enzymes. Case studies screening the FDA-approved drug library successfully identify etacrynic acid as the potential lead compound against both RdRp and Mpro. Molecular dynamics simulations further confirm the stability and binding affinity of etacrynic acid to these targets. This study proves the great potential and advantages of deep learning and drug repurposing strategies in supporting antiviral drug discovery. The proposed general and rapid response computational framework holds significance for preparedness against future public health events.
Collapse
Affiliation(s)
- Guanxing Chen
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China
| | - Haohuai He
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China
| | - Qiujie Lv
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan 450001, China
| | - Lu Zhao
- State Key Laboratory of Chemical Oncogenomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
| | - Calvin Yu-Chian Chen
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China
- State Key Laboratory of Chemical Oncogenomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- AI for Science (AI4S)-Preferred Program, School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
- Guangdong L-Med Biotechnology Co., Ltd, Meizhou, Guangdong 514699, China
| |
Collapse
|
14
|
Ji W, She S, Qiao C, Feng Q, Rui M, Xu X, Feng C. A general prediction model for compound-protein interactions based on deep learning. Front Pharmacol 2024; 15:1465890. [PMID: 39295942 PMCID: PMC11408283 DOI: 10.3389/fphar.2024.1465890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2024] [Accepted: 08/20/2024] [Indexed: 09/21/2024] Open
Abstract
Background The identification of compound-protein interactions (CPIs) is crucial for drug discovery and understanding mechanisms of action. Accurate CPI prediction can elucidate drug-target-disease interactions, aiding in the discovery of candidate compounds and effective synergistic drugs, particularly from traditional Chinese medicine (TCM). Existing in silico methods face challenges in prediction accuracy and generalization due to compound and target diversity and the lack of largescale interaction datasets and negative datasets for model learning. Methods To address these issues, we developed a computational model for CPI prediction by integrating the constructed large-scale bioactivity benchmark dataset with a deep learning (DL) algorithm. To verify the accuracy of our CPI model, we applied it to predict the targets of compounds in TCM. An herb pair of Astragalus membranaceus and Hedyotis diffusaas was used as a model, and the active compounds in this herb pair were collected from various public databases and the literature. The complete targets of these active compounds were predicted by the CPI model, resulting in an expanded target dataset. This dataset was next used for the prediction of synergistic antitumor compound combinations. The predicted multi-compound combinations were subsequently examined through in vitro cellular experiments. Results Our CPI model demonstrated superior performance over other machine learning models, achieving an area under the Receiver Operating Characteristic curve (AUROC) of 0.98, an area under the precision-recall curve (AUPR) of 0.98, and an accuracy (ACC) of 93.31% on the test set. The model's generalization capability and applicability were further confirmed using external databases. Utilizing this model, we predicted the targets of compounds in the herb pair of Astragalus membranaceus and Hedyotis diffusaas, yielding an expanded target dataset. Then, we integrated this expanded target dataset to predict effective drug combinations using our drug synergy prediction model DeepMDS. Experimental assay on breast cancer cell line MDA-MB-231 proved the efficacy of the best predicted multi-compound combinations: Combination I (Epicatechin, Ursolic acid, Quercetin, Aesculetin and Astragaloside IV) exhibited a half-maximal inhibitory concentration (IC50) value of 19.41 μM, and a combination index (CI) value of 0.682; and Combination II (Epicatechin, Ursolic acid, Quercetin, Vanillic acid and Astragaloside IV) displayed a IC50 value of 23.83 μM and a CI value of 0.805. These results validated the ability of our model to make accurate predictions for novel CPI data outside the training dataset and evaluated the reliability of the predictions, showing good applicability potential in drug discovery and in the elucidation of the bioactive compounds in TCM. Conclusion Our CPI prediction model can serve as a useful tool for accurately identifying potential CPI for a wide range of proteins, and is expected to facilitate drug research, repurposing and support the understanding of TCM.
Collapse
Affiliation(s)
- Wei Ji
- School of Pharmacy, Jiangsu University, Zhenjiang, China
- School of Medicine, Jiangsu University, Zhenjiang, China
| | - Shengnan She
- School of Pharmacy, Jiangsu University, Zhenjiang, China
| | - Chunxue Qiao
- School of Pharmacy, Jiangsu University, Zhenjiang, China
| | - Qiuqi Feng
- School of Pharmacy, Jiangsu University, Zhenjiang, China
| | - Mengjie Rui
- School of Pharmacy, Jiangsu University, Zhenjiang, China
| | - Ximing Xu
- School of Pharmacy, Jiangsu University, Zhenjiang, China
| | - Chunlai Feng
- School of Pharmacy, Jiangsu University, Zhenjiang, China
| |
Collapse
|
15
|
Chen J, Yang X, Wu H. A Multibranch Neural Network for Drug-Target Affinity Prediction Using Similarity Information. ACS OMEGA 2024; 9:35978-35989. [PMID: 39184467 PMCID: PMC11339836 DOI: 10.1021/acsomega.4c05607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/15/2024] [Revised: 08/03/2024] [Accepted: 08/06/2024] [Indexed: 08/27/2024]
Abstract
Predicting drug-target affinity (DTA) is beneficial for accelerating drug discovery. In recent years, graph structure-based deep learning models have garnered significant attention in this field. However, these models typically handle drug or target protein in isolation and only extract the molecular structure information on the drug or protein itself. To address this limitation, existing network-based models represent drug-target interactions or affinities as a knowledge graph to capture the interaction information. In this study, we propose a novel solution. Specifically, we introduce drug similarity information and protein similarity information into the field of DTA prediction. Moreover, we propose a network framework that autonomously extracts similarity information, avoiding reliance on knowledge graphs. Based on this framework, we design a multibranch neural network called GASI-DTA. This network integrates similarity information, sequence information, and molecular structure information. Comprehensive experimental results conducted on two benchmark data sets and three cold-start scenarios demonstrate that our model outperforms state-of-the-art graph structure-based methods in nearly all metrics. Furthermore, it exhibits significant advantages over existing network-based models, outperforming the best of them in the majority of metrics. Our study's code and data are openly accessible at http://github.com/XiaoLin-Yang-S/GASI-DTA.
Collapse
Affiliation(s)
- Jing Chen
- School
of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
- Jiangsu
Provincial Engineering Laboratory of Pattern Recognition and Computing
Intelligence, Jiangnan University, Wuxi 214122, China
| | - Xiaolin Yang
- School
of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
| | - Haoyu Wu
- School
of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
| |
Collapse
|
16
|
Bernett J, Blumenthal DB, Grimm DG, Haselbeck F, Joeres R, Kalinina OV, List M. Guiding questions to avoid data leakage in biological machine learning applications. Nat Methods 2024; 21:1444-1453. [PMID: 39122953 DOI: 10.1038/s41592-024-02362-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2024] [Accepted: 06/26/2024] [Indexed: 08/12/2024]
Abstract
Machine learning methods for extracting patterns from high-dimensional data are very important in the biological sciences. However, in certain cases, real-world applications cannot confirm the reported prediction performance. One of the main reasons for this is data leakage, which can be seen as the illicit sharing of information between the training data and the test data, resulting in performance estimates that are far better than the performance observed in the intended application scenario. Data leakage can be difficult to detect in biological datasets due to their complex dependencies. With this in mind, we present seven questions that should be asked to prevent data leakage when constructing machine learning models in biological domains. We illustrate the usefulness of our questions by applying them to nontrivial examples. Our goal is to raise awareness of potential data leakage problems and to promote robust and reproducible machine learning-based research in biology.
Collapse
Affiliation(s)
- Judith Bernett
- TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - David B Blumenthal
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
| | - Dominik G Grimm
- TUM Campus Straubing for Biotechnology and Sustainability, Technical University of Munich, Straubing, Germany.
- Bioinformatics, Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany.
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
| | - Florian Haselbeck
- TUM Campus Straubing for Biotechnology and Sustainability, Technical University of Munich, Straubing, Germany
- Bioinformatics, Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany
- Smart Farming, Weihenstephan-Triesdorf University of Applied Sciences, Freising, Germany
| | - Roman Joeres
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
| | - Olga V Kalinina
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany.
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany.
- Medical Faculty, Saarland University, Homburg, Germany.
| | - Markus List
- TUM School of Life Sciences, Technical University of Munich, Freising, Germany.
- Munich Data Science Institute (MDSI), Technical University of Munich, Garching, Germany.
| |
Collapse
|
17
|
Yu H, Xu WX, Tan T, Liu Z, Shi JY. Prediction of drug-target binding affinity based on multi-scale feature fusion. Comput Biol Med 2024; 178:108699. [PMID: 38870725 DOI: 10.1016/j.compbiomed.2024.108699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2024] [Revised: 05/05/2024] [Accepted: 06/01/2024] [Indexed: 06/15/2024]
Abstract
Accurate prediction of drug-target binding affinity (DTA) plays a pivotal role in drug discovery and repositioning. Although deep learning methods are widely used in DTA prediction, two significant challenges persist: (i) how to effectively represent the complex structural information of proteins and drugs; (ii) how to precisely model the mutual interactions between protein binding sites and key drug substructures. To address these challenges, we propose a MSFFDTA (Multi-scale feature fusion for predicting drug target affinity) model, in which multi-scale encoders effectively capture multi-level structural information of drugs and proteins are designed. And then a Selective Cross Attention (SCA) mechanism is developed to filter out the trivial interactions between drug-protein substructure pairs and retain the important ones, which will make the proposed model better focusing on these key interactions and offering insights into their underlying mechanism. Experimental results on two benchmark datasets demonstrate that MSFFDTA is superior to several state-of-the-art methods across almost all comparison metrics. Finally, we provide the ablation and case studies with visualizations to verify the effectiveness and the interpretability of MSFFDTA. The source code is freely available at https://github.com/whitehat32/MSFF-DTA/.
Collapse
Affiliation(s)
- Hui Yu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.
| | - Wen-Xin Xu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.
| | - Tian Tan
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.
| | - Zun Liu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.
| | - Jian-Yu Shi
- School of Life Sciences, Northwestern Polytechnical University, Xi'an, 710072, China.
| |
Collapse
|
18
|
Yang X, Yang G, Chu J. GraphCL-DTA: A Graph Contrastive Learning With Molecular Semantics for Drug-Target Binding Affinity Prediction. IEEE J Biomed Health Inform 2024; 28:4544-4552. [PMID: 38190664 DOI: 10.1109/jbhi.2024.3350666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2024]
Abstract
Drug-target binding affinity prediction plays an important role in the early stages of drug discovery, which can infer the strength of interactions between new drugs and new targets. However, the performance of previous computational models is limited by the following drawbacks. The learning of drug representation relies only on supervised data without considering the information in the molecular graph itself. Moreover, most previous studies tended to design complicated representation learning modules, while uniformity used to measure representation quality is ignored. In this study, we propose GraphCL-DTA, a graph contrastive learning with molecular semantics for drug-target binding affinity prediction. This graph contrastive learning framework replaces the dropout-based data augmentation strategy by performing data augmentation in the embedding space, thereby better preserving the semantic information of the molecular graph. A more essential and effective drug representation can be learned through this graph contrastive framework without additional supervised data. Next, we design a new loss function that can be directly used to adjust the uniformity of drug and target representations. By directly optimizing the uniformity of representations, the representation quality of drugs and targets can be improved. The effectiveness of the above innovative elements is verified on two real datasets, KIBA and Davis. Compared with the GraphDTA model, the relative improvement of the GraphCL-DTA model on the two datasets is 2.7% and 4.5%. The graph contrastive learning framework and uniformity function in the GraphCL-DTA model can be embedded into other computational models as independent modules to improve their generalization capability.
Collapse
|
19
|
Liu S, Yu J, Ni N, Wang Z, Chen M, Li Y, Xu C, Ding Y, Zhang J, Yao X, Liu H. Versatile Framework for Drug-Target Interaction Prediction by Considering Domain-Specific Features. J Chem Inf Model 2024; 64:5646-5656. [PMID: 38976879 DOI: 10.1021/acs.jcim.4c00403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/10/2024]
Abstract
Predicting drug-target interactions (DTIs) is one of the crucial tasks in drug discovery, but traditional wet-lab experiments are costly and time-consuming. Recently, deep learning has emerged as a promising tool for accelerating DTI prediction due to its powerful performance. However, the models trained on limited known DTI data struggle to generalize effectively to novel drug-target pairs. In this work, we propose a strategy to train an ensemble of models by capturing both domain-generic and domain-specific features (E-DIS) to learn diverse domain features and adapt them to out-of-distribution data. Multiple experts were trained on different domains to capture and align domain-specific information from various distributions without accessing any data from unseen domains. E-DIS provides a comprehensive representation of proteins and ligands by capturing diverse features. Experimental results on four benchmark data sets in both in-domain and cross-domain settings demonstrated that E-DIS significantly improved model performance and domain generalization compared to existing methods. Our approach presents a significant advancement in DTI prediction by combining domain-generic and domain-specific features, enhancing the generalization ability of the DTI prediction model.
Collapse
Affiliation(s)
- Shuo Liu
- School of Pharmacy, Lanzhou University, Gansu 730000, China
- Huawei Technologies Co., Ltd., Hangzhou 310000, China
| | - Jialiang Yu
- Huawei Technologies Co., Ltd., Hangzhou 310000, China
| | - Ningxi Ni
- Huawei Technologies Co., Ltd., Hangzhou 310000, China
| | - Zidong Wang
- Huawei Technologies Co., Ltd., Hangzhou 310000, China
| | - Mengyun Chen
- Huawei Technologies Co., Ltd., Hangzhou 310000, China
| | - Yuquan Li
- College of Chemistry and Chemical Engineering, Lanzhou University, Gansu 730000, China
| | - Chen Xu
- Huawei Technologies Co., Ltd., Hangzhou 310000, China
| | - Yahao Ding
- Huawei Technologies Co., Ltd., Hangzhou 310000, China
| | - Jun Zhang
- Changping Laboratory, Beijing 102200, China
| | - Xiaojun Yao
- Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China
| | - Huanxiang Liu
- Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China
| |
Collapse
|
20
|
Ranjan A, Bess A, Alvin C, Mukhopadhyay S. MDF-DTA: A Multi-Dimensional Fusion Approach for Drug-Target Binding Affinity Prediction. J Chem Inf Model 2024; 64:4980-4990. [PMID: 38888163 PMCID: PMC11234358 DOI: 10.1021/acs.jcim.4c00310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2024] [Revised: 05/15/2024] [Accepted: 05/29/2024] [Indexed: 06/20/2024]
Abstract
Drug-target affinity (DTA) prediction is an important task in the early stages of drug discovery. Traditional biological approaches are time-consuming, effort-consuming, and resource-consuming due to the large size of genomic and chemical spaces. Computational approaches using machine learning have emerged to narrow down the drug candidate search space. However, most of these prediction models focus on single feature encoding of drugs and targets, ignoring the importance of integrating different dimensions of these features. We propose a deep learning-based approach called Multi-Dimensional Fusion for Drug Target Affinity Prediction (MDF-DTA) incorporating different dimensional features. Our model fuses 1D, 2D, and 3D representations obtained from different pretrained models for both drugs and targets. We evaluated MDF-DTA on two standard benchmark data sets: DAVIS and KIBA. Experimental results show that MDF-DTA outperforms many state-of-the-art techniques in the DTA task across both data sets. Through ablation studies and performance evaluation metrics, we evaluate the importance of individual representations and the impact of each representation on MDF-DTA.
Collapse
Affiliation(s)
- Amit Ranjan
- Department
of Environmental Sciences, Louisiana State
University, Baton
Rouge, Louisiana 70803, United States
| | - Adam Bess
- Department
of Environmental Sciences, Louisiana State
University, Baton
Rouge, Louisiana 70803, United States
| | - Chris Alvin
- Department
of Computer Science, Furman University, Greenville, South Carolina 29613, United States
| | - Supratik Mukhopadhyay
- Department
of Environmental Sciences, Louisiana State
University, Baton
Rouge, Louisiana 70803, United States
| |
Collapse
|
21
|
Han L, Kang L, Guo Q. ImageDTA: A Simple Model for Drug-Target Binding Affinity Prediction. ACS OMEGA 2024; 9:28485-28493. [PMID: 38973881 PMCID: PMC11223229 DOI: 10.1021/acsomega.4c02308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/12/2024] [Revised: 06/12/2024] [Accepted: 06/13/2024] [Indexed: 07/09/2024]
Abstract
Predicting the drug-target binding affinity (DTA) is crucial in drug discovery, and an increasing number of researchers are using artificial intelligence techniques to make such predictions. Many effective deep neural network prediction models have been proposed. However, current methods need improvement in accuracy, complexity, and efficiency. In this study, we propose a method based on a multiscale 2-dimensional convolutional neural network (CNN), namely ImageDTA. Many studies have shown that CNN achieves good learning effects with limited data. Therefore, we take a unique perspective by treating the word vector encoded with a simplified molecular input line entry system (SMILES) string as an "image" and processing it like handling images, fully leveraging the efficient processing capabilities of CNN for image data. Furthermore, we show that ImageDTA has higher training and inference efficiency than pretrained large models and outperforms attention-based graph neural network models in accuracy and interpretability. We also use visualization techniques to select appropriate convolutional kernel sizes, thereby increasing the network's interpretability.
Collapse
Affiliation(s)
- Li Han
- Software
and Big Data Technology Department, Dalian
Neusoft University of Information, Dalian, Liaoning 116023, China
| | - Ling Kang
- Neusoft
Research Institute, Dalian Neusoft University
of Information, Dalian, Liaoning 116023, China
| | - Quan Guo
- Neusoft
Research Institute, Dalian Neusoft University
of Information, Dalian, Liaoning 116023, China
| |
Collapse
|
22
|
Wu H, Liu J, Zhang R, Lu Y, Cui G, Cui Z, Ding Y. A review of deep learning methods for ligand based drug virtual screening. FUNDAMENTAL RESEARCH 2024; 4:715-737. [PMID: 39156568 PMCID: PMC11330120 DOI: 10.1016/j.fmre.2024.02.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2023] [Revised: 01/10/2024] [Accepted: 02/18/2024] [Indexed: 08/20/2024] Open
Abstract
Drug discovery is costly and time consuming, and modern drug discovery endeavors are progressively reliant on computational methodologies, aiming to mitigate temporal and financial expenditures associated with the process. In particular, the time required for vaccine and drug discovery is prolonged during emergency situations such as the coronavirus 2019 pandemic. Recently, the performance of deep learning methods in drug virtual screening has been particularly prominent. It has become a concern for researchers how to summarize the existing deep learning in drug virtual screening, select different models for different drug screening problems, exploit the advantages of deep learning models, and further improve the capability of deep learning in drug virtual screening. This review first introduces the basic concepts of drug virtual screening, common datasets, and data representation methods. Then, large numbers of common deep learning methods for drug virtual screening are compared and analyzed. In addition, a dataset of different sizes is constructed independently to evaluate the performance of each deep learning model for the difficult problem of large-scale ligand virtual screening. Finally, the existing challenges and future directions in the field of virtual screening are presented.
Collapse
Affiliation(s)
- Hongjie Wu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
| | - Junkai Liu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
| | - Runhua Zhang
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
| | - Yaoyao Lu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
| | - Guozeng Cui
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
| | - Zhiming Cui
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
| |
Collapse
|
23
|
Zhang Z, He X, Long D, Luo G, Chen S. Enhancing generalizability and performance in drug-target interaction identification by integrating pharmacophore and pre-trained models. Bioinformatics 2024; 40:i539-i547. [PMID: 38940179 PMCID: PMC11211825 DOI: 10.1093/bioinformatics/btae240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION In drug discovery, it is crucial to assess the drug-target binding affinity (DTA). Although molecular docking is widely used, computational efficiency limits its application in large-scale virtual screening. Deep learning-based methods learn virtual scoring functions from labeled datasets and can quickly predict affinity. However, there are three limitations. First, existing methods only consider the atom-bond graph or one-dimensional sequence representations of compounds, ignoring the information about functional groups (pharmacophores) with specific biological activities. Second, relying on limited labeled datasets fails to learn comprehensive embedding representations of compounds and proteins, resulting in poor generalization performance in complex scenarios. Third, existing feature fusion methods cannot adequately capture contextual interaction information. RESULTS Therefore, we propose a novel DTA prediction method named HeteroDTA. Specifically, a multi-view compound feature extraction module is constructed to model the atom-bond graph and pharmacophore graph. The residue concat graph and protein sequence are also utilized to model protein structure and function. Moreover, to enhance the generalization capability and reduce the dependence on task-specific labeled data, pre-trained models are utilized to initialize the atomic features of the compounds and the embedding representations of the protein sequence. A context-aware nonlinear feature fusion method is also proposed to learn interaction patterns between compounds and proteins. Experimental results on public benchmark datasets show that HeteroDTA significantly outperforms existing methods. In addition, HeteroDTA shows excellent generalization performance in cold-start experiments and superiority in the representation learning ability of drug-target pairs. Finally, the effectiveness of HeteroDTA is demonstrated in a real-world drug discovery study. AVAILABILITY AND IMPLEMENTATION The source code and data are available at https://github.com/daydayupzzl/HeteroDTA.
Collapse
Affiliation(s)
- Zuolong Zhang
- School of Software, Henan University, Kaifeng, Henan Province 475000, China
| | - Xin He
- School of Software, Henan University, Kaifeng, Henan Province 475000, China
- Henan International Joint Laboratory of Intelligent Network Theory and Key Technology, Henan University, Kaifeng, Henan Province 475000, China
| | - Dazhi Long
- Department of Urology, Ji’an Third People’s Hospital, Ji’an, Jiangxi Province 343000, China
| | - Gang Luo
- School of Mathematics and Computer Science, Nanchang University, Nanchang, Jiangxi Province 330031, China
| | - Shengbo Chen
- Henan Engineering Research Center of Intelligent Technology and Application, Henan University, Kaifeng, Henan Province 475000, China
| |
Collapse
|
24
|
Huang D, Xie J. EMPDTA: An End-to-End Multimodal Representation Learning Framework with Pocket Online Detection for Drug-Target Affinity Prediction. Molecules 2024; 29:2912. [PMID: 38930976 PMCID: PMC11206982 DOI: 10.3390/molecules29122912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2024] [Revised: 06/15/2024] [Accepted: 06/17/2024] [Indexed: 06/28/2024] Open
Abstract
Accurately predicting drug-target interactions is a critical yet challenging task in drug discovery. Traditionally, pocket detection and drug-target affinity prediction have been treated as separate aspects of drug-target interaction, with few methods combining these tasks within a unified deep learning system to accelerate drug development. In this study, we propose EMPDTA, an end-to-end framework that integrates protein pocket prediction and drug-target affinity prediction to provide a comprehensive understanding of drug-target interactions. The EMPDTA framework consists of three main modules: pocket online detection, multimodal representation learning for affinity prediction, and multi-task joint training. The performance and potential of the proposed framework have been validated across diverse benchmark datasets, achieving robust results in both tasks. Furthermore, the visualization results of the predicted pockets demonstrate accurate pocket detection, confirming the effectiveness of our framework.
Collapse
Affiliation(s)
| | - Jiang Xie
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China;
| |
Collapse
|
25
|
Nguyen VTD, Hy TS. Multimodal pretraining for unsupervised protein representation learning. Biol Methods Protoc 2024; 9:bpae043. [PMID: 38983679 PMCID: PMC11233121 DOI: 10.1093/biomethods/bpae043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2024] [Revised: 05/30/2024] [Accepted: 06/12/2024] [Indexed: 07/11/2024] Open
Abstract
Proteins are complex biomolecules essential for numerous biological processes, making them crucial targets for advancements in molecular biology, medical research, and drug design. Understanding their intricate, hierarchical structures, and functions is vital for progress in these fields. To capture this complexity, we introduce Multimodal Protein Representation Learning (MPRL), a novel framework for symmetry-preserving multimodal pretraining that learns unified, unsupervised protein representations by integrating primary and tertiary structures. MPRL employs Evolutionary Scale Modeling (ESM-2) for sequence analysis, Variational Graph Auto-Encoders (VGAE) for residue-level graphs, and PointNet Autoencoder (PAE) for 3D point clouds of atoms, each designed to capture the spatial and evolutionary intricacies of proteins while preserving critical symmetries. By leveraging Auto-Fusion to synthesize joint representations from these pretrained models, MPRL ensures robust and comprehensive protein representations. Our extensive evaluation demonstrates that MPRL significantly enhances performance in various tasks such as protein-ligand binding affinity prediction, protein fold classification, enzyme activity identification, and mutation stability prediction. This framework advances the understanding of protein dynamics and facilitates future research in the field. Our source code is publicly available at https://github.com/HySonLab/Protein_Pretrain.
Collapse
Affiliation(s)
| | - Truong Son Hy
- FPT Software AI Center, HCMC, Hanoi, Vietnam
- Department of Mathematics and Computer Science, Indiana State University, Terre Haute, IN, 47809, United States
| |
Collapse
|
26
|
Amorim AM, Piochi LF, Gaspar AT, Preto A, Rosário-Ferreira N, Moreira IS. Advancing Drug Safety in Drug Development: Bridging Computational Predictions for Enhanced Toxicity Prediction. Chem Res Toxicol 2024; 37:827-849. [PMID: 38758610 PMCID: PMC11187637 DOI: 10.1021/acs.chemrestox.3c00352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2023] [Revised: 04/29/2024] [Accepted: 05/07/2024] [Indexed: 05/19/2024]
Abstract
The attrition rate of drugs in clinical trials is generally quite high, with estimates suggesting that approximately 90% of drugs fail to make it through the process. The identification of unexpected toxicity issues during preclinical stages is a significant factor contributing to this high rate of failure. These issues can have a major impact on the success of a drug and must be carefully considered throughout the development process. These late-stage rejections or withdrawals of drug candidates significantly increase the costs associated with drug development, particularly when toxicity is detected during clinical trials or after market release. Understanding drug-biological target interactions is essential for evaluating compound toxicity and safety, as well as predicting therapeutic effects and potential off-target effects that could lead to toxicity. This will enable scientists to predict and assess the safety profiles of drug candidates more accurately. Evaluation of toxicity and safety is a critical aspect of drug development, and biomolecules, particularly proteins, play vital roles in complex biological networks and often serve as targets for various chemicals. Therefore, a better understanding of these interactions is crucial for the advancement of drug development. The development of computational methods for evaluating protein-ligand interactions and predicting toxicity is emerging as a promising approach that adheres to the 3Rs principles (replace, reduce, and refine) and has garnered significant attention in recent years. In this review, we present a thorough examination of the latest breakthroughs in drug toxicity prediction, highlighting the significance of drug-target binding affinity in anticipating and mitigating possible adverse effects. In doing so, we aim to contribute to the development of more effective and secure drugs.
Collapse
Affiliation(s)
- Ana M.
B. Amorim
- Department
of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CNC-UC—Center
for Neuroscience and Cell Biology, University
of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CIBB—Centre
for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- PhD
Programme in Biosciences, Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- PURR.AI,
Rua Pedro Nunes, IPN Incubadora, Ed C, 3030-199 Coimbra, Portugal
| | - Luiz F. Piochi
- Department
of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CNC-UC—Center
for Neuroscience and Cell Biology, University
of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CIBB—Centre
for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
| | - Ana T. Gaspar
- Department
of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CNC-UC—Center
for Neuroscience and Cell Biology, University
of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CIBB—Centre
for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
| | - António
J. Preto
- CNC-UC—Center
for Neuroscience and Cell Biology, University
of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CIBB—Centre
for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- PhD Programme
in Experimental Biology and Biomedicine, Institute for Interdisciplinary
Research (IIIUC), University of Coimbra, Casa Costa Alemão, 3030-789 Coimbra, Portugal
| | - Nícia Rosário-Ferreira
- CNC-UC—Center
for Neuroscience and Cell Biology, University
of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CIBB—Centre
for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
| | - Irina S. Moreira
- Department
of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CNC-UC—Center
for Neuroscience and Cell Biology, University
of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
- CIBB—Centre
for Innovative Biomedicine and Biotechnology, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
| |
Collapse
|
27
|
Tian T, Li S, Zhang Z, Chen L, Zou Z, Zhao D, Zeng J. Benchmarking compound activity prediction for real-world drug discovery applications. Commun Chem 2024; 7:127. [PMID: 38834746 DOI: 10.1038/s42004-024-01204-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2024] [Accepted: 05/16/2024] [Indexed: 06/06/2024] Open
Abstract
Identifying active compounds for target proteins is fundamental in early drug discovery. Recently, data-driven computational methods have demonstrated promising potential in predicting compound activities. However, there lacks a well-designed benchmark to comprehensively evaluate these methods from a practical perspective. To fill this gap, we propose a Compound Activity benchmark for Real-world Applications (CARA). Through carefully distinguishing assay types, designing train-test splitting schemes and selecting evaluation metrics, CARA can consider the biased distribution of current real-world compound activity data and avoid overestimation of model performances. We observed that although current models can make successful predictions for certain proportions of assays, their performances varied across different assays. In addition, evaluation of several few-shot training strategies demonstrated different performances related to task types. Overall, we provide a high-quality dataset for developing and evaluating compound activity prediction models, and the analyses in this work may inspire better applications of data-driven models in drug discovery.
Collapse
Affiliation(s)
- Tingzhong Tian
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Shuya Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Ziting Zhang
- Department of Automation, Tsinghua University, Beijing, China
- MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing, China
| | - Lin Chen
- Silexon AI Technology Co., Ltd., Nanjing, Jiangsu Province, China
| | - Ziheng Zou
- Silexon AI Technology Co., Ltd., Nanjing, Jiangsu Province, China
| | - Dan Zhao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China.
| | - Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China.
- School of Engineering, Westlake University, Hangzhou, Zhejiang Province, China.
| |
Collapse
|
28
|
Chen X, Huang J, Shen T, Zhang H, Xu L, Yang M, Xie X, Yan Y, Yan J. DEAttentionDTA: protein-ligand binding affinity prediction based on dynamic embedding and self-attention. Bioinformatics 2024; 40:btae319. [PMID: 38897656 PMCID: PMC11193059 DOI: 10.1093/bioinformatics/btae319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2023] [Revised: 03/23/2024] [Accepted: 06/17/2024] [Indexed: 06/21/2024] Open
Abstract
MOTIVATION Predicting protein-ligand binding affinity is crucial in new drug discovery and development. However, most existing models rely on acquiring 3D structures of elusive proteins. Combining amino acid sequences with ligand sequences and better highlighting active sites are also significant challenges. RESULTS We propose an innovative neural network model called DEAttentionDTA, based on dynamic word embeddings and a self-attention mechanism, for predicting protein-ligand binding affinity. DEAttentionDTA takes the 1D sequence information of proteins as input, including the global sequence features of amino acids, local features of the active pocket site, and linear representation information of the ligand molecule in the SMILE format. These three linear sequences are fed into a dynamic word-embedding layer based on a 1D convolutional neural network for embedding encoding and are correlated through a self-attention mechanism. The output affinity prediction values are generated using a linear layer. We compared DEAttentionDTA with various mainstream tools and achieved significantly superior results on the same dataset. We then assessed the performance of this model in the p38 protein family. AVAILABILITY AND IMPLEMENTATION The resource codes are available at https://github.com/whatamazing1/DEAttentionDTA.
Collapse
Affiliation(s)
- Xiying Chen
- Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Jinsha Huang
- Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Tianqiao Shen
- Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Houjin Zhang
- Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Li Xu
- Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Min Yang
- Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Xiaoman Xie
- Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Yunjun Yan
- Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Jinyong Yan
- Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
| |
Collapse
|
29
|
Ma J, Zhao Z, Li T, Liu Y, Ma J, Zhang R. GraphsformerCPI: Graph Transformer for Compound-Protein Interaction Prediction. Interdiscip Sci 2024; 16:361-377. [PMID: 38457109 DOI: 10.1007/s12539-024-00609-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2023] [Revised: 01/01/2024] [Accepted: 01/08/2024] [Indexed: 03/09/2024]
Abstract
Accurately predicting compound-protein interactions (CPI) is a critical task in computer-aided drug design. In recent years, the exponential growth of compound activity and biomedical data has highlighted the need for efficient and interpretable prediction approaches. In this study, we propose GraphsformerCPI, an end-to-end deep learning framework that improves prediction performance and interpretability. GraphsformerCPI treats compounds and proteins as sequences of nodes with spatial structures, and leverages novel structure-enhanced self-attention mechanisms to integrate semantic and graph structural features within molecules for deep molecule representations. To capture the vital association between compound atoms and protein residues, we devise a dual-attention mechanism to effectively extract relational features through .cross-mapping. By extending the powerful learning capabilities of Transformers to spatial structures and extensively utilizing attention mechanisms, our model offers strong interpretability, a significant advantage over most black-box deep learning methods. To evaluate GraphsformerCPI, extensive experiments were conducted on benchmark datasets including human, C. elegans, Davis and KIBA datasets. We explored the impact of model depth and dropout rate on performance and compared our model against state-of-the-art baseline models. Our results demonstrate that GraphsformerCPI outperforms baseline models in classification datasets and achieves competitive performance in regression datasets. Specifically, on the human dataset, GraphsformerCPI achieves an average improvement of 1.6% in AUC, 0.5% in precision, and 5.3% in recall. On the KIBA dataset, the average improvement in Concordance index (CI) and mean squared error (MSE) is 3.3% and 7.2%, respectively. Molecular docking shows that our model provides novel insights into the intrinsic interactions and binding mechanisms. Our research holds practical significance in effectively predicting CPIs and binding affinities, identifying key atoms and residues, enhancing model interpretability.
Collapse
Affiliation(s)
- Jun Ma
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China.
- School of Information Engineering, Lanzhou University of Finance and Economics, Lanzhou, 730020, China.
| | - Zhili Zhao
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
| | - Tongfeng Li
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- Computer College, Qinghai Normal University, Xi'ning, 810016, China
| | - Yunwu Liu
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
| | - Jun Ma
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
| | - Ruisheng Zhang
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China.
| |
Collapse
|
30
|
Zhang S, Tian X, Chen C, Su Y, Huang W, Lv X, Chen C, Li H. AIGO-DTI: Predicting Drug-Target Interactions Based on Improved Drug Properties Combined with Adaptive Iterative Algorithms. J Chem Inf Model 2024; 64:4373-4384. [PMID: 38743013 DOI: 10.1021/acs.jcim.4c00584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/16/2024]
Abstract
Artificial intelligence-based methods for predicting drug-target interactions (DTIs) aim to explore reliable drug candidate targets rapidly and cost-effectively to accelerate the drug development process. However, current methods are often limited by the topological regularities of drug molecules, making them difficult to generalize to a broader chemical space. Additionally, the use of similarity to measure DTI network links often introduces noise, leading to false DTI relationships and affecting the prediction accuracy. To address these issues, this study proposes an Adaptive Iterative Graph Optimization (AIGO)-DTI prediction framework. This framework integrates atomic cluster information and enhances molecular features through the design of functional group prompts and graph encoders, optimizing the construction of DTI association networks. Furthermore, the optimization of graph structure is transformed into a node similarity learning problem, utilizing multihead similarity metric functions to iteratively update the network structure to improve the quality of DTI information. Experimental results demonstrate the outstanding performance of AIGO-DTI on multiple public data sets and label reversal data sets. Case studies, molecular docking, and existing research validate its effectiveness and reliability. Overall, the method proposed in this study can construct comprehensive and reliable DTI association network information, providing new graphing and optimization strategies for DTI prediction, which contribute to efficient drug development and reduce target discovery costs.
Collapse
Affiliation(s)
- Sizhe Zhang
- College of Software, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Xuecong Tian
- College of Information Science and Engineering, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Chen Chen
- College of Information Science and Engineering, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Ying Su
- College of Information Science and Engineering, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Wanhua Huang
- College of Information Science and Engineering, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Xiaoyi Lv
- College of Software, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Cheng Chen
- College of Software, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Hongyi Li
- Xinjiang University, Urumqi, 830046 Xinjiang, China
| |
Collapse
|
31
|
Backenköhler M, Groß J, Wolf V, Volkamer A. Guided Docking as a Data Generation Approach Facilitates Structure-Based Machine Learning on Kinases. J Chem Inf Model 2024; 64:4009-4020. [PMID: 38751014 DOI: 10.1021/acs.jcim.4c00055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/28/2024]
Abstract
Drug discovery pipelines nowadays rely on machine learning models to explore and evaluate large chemical spaces. While including 3D structural information is considered beneficial, structural models are hindered by the availability of protein-ligand complex structures. Exemplified for kinase drug discovery, we address this issue by generating kinase-ligand complex data using template docking for the kinase compound subset of available ChEMBL assay data. To evaluate the benefit of the created complex data, we use it to train a structure-based E(3)-invariant graph neural network. Our evaluation shows that binding affinities can be predicted with significantly higher precision by models that take synthetic binding poses into account compared to ligand- or drug-target interaction models alone.
Collapse
Affiliation(s)
- Michael Backenköhler
- Data Driven Drug Design, Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
| | - Joschka Groß
- Modeling and Simulation, Saarland University, Saarbrücken 66123, Germany
| | - Verena Wolf
- Modeling and Simulation, Saarland University, Saarbrücken 66123, Germany
| | - Andrea Volkamer
- Data Driven Drug Design, Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
- Structural Bioinformatics and in Silico Toxicology Institute of Physiology, Universitätsmedizin Berlin, Berlin 10117, Germany
| |
Collapse
|
32
|
Zhou G, Qin Y, Hong Q, Li H, Chen H, Shen J. GEMF: a novel geometry-enhanced mid-fusion network for PLA prediction. Brief Bioinform 2024; 25:bbae333. [PMID: 38980371 PMCID: PMC11232467 DOI: 10.1093/bib/bbae333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2024] [Revised: 06/04/2024] [Accepted: 06/26/2024] [Indexed: 07/10/2024] Open
Abstract
Accurate prediction of protein-ligand binding affinity (PLA) is important for drug discovery. Recent advances in applying graph neural networks have shown great potential for PLA prediction. However, existing methods usually neglect the geometric information (i.e. bond angles), leading to difficulties in accurately distinguishing different molecular structures. In addition, these methods also pose limitations in representing the binding process of protein-ligand complexes. To address these issues, we propose a novel geometry-enhanced mid-fusion network, named GEMF, to learn comprehensive molecular geometry and interaction patterns. Specifically, the GEMF consists of a graph embedding layer, a message passing phase, and a multi-scale fusion module. GEMF can effectively represent protein-ligand complexes as graphs, with graph embeddings based on physicochemical and geometric properties. Moreover, our dual-stream message passing framework models both covalent and non-covalent interactions. In particular, the edge-update mechanism, which is based on line graphs, can fuse both distance and angle information in the covalent branch. In addition, the communication branch consisting of multiple heterogeneous interaction modules is developed to learn intricate interaction patterns. Finally, we fuse the multi-scale features from the covalent, non-covalent, and heterogeneous interaction branches. The extensive experimental results on several benchmarks demonstrate the superiority of GEMF compared with other state-of-the-art methods.
Collapse
Affiliation(s)
- Guoqiang Zhou
- School of Computer Science, Nanjing University of Posts and Telecommunications, No.9 Wenyuan Road, Jiangsu 210023, China
| | - Yuke Qin
- School of Computer Science, Nanjing University of Posts and Telecommunications, No.9 Wenyuan Road, Jiangsu 210023, China
| | - Qiansen Hong
- School of Computer Science, Nanjing University of Posts and Telecommunications, No.9 Wenyuan Road, Jiangsu 210023, China
| | - Haoran Li
- School of Computing and Information Technology, University of Wollongong, Northfields Avenue, NSW 2522, Australia
| | - Huaming Chen
- School of Electrical and Computer Engineering, University of Sydney, Camperdown, NSW 2050, Australia
| | - Jun Shen
- School of Computing and Information Technology, University of Wollongong, Northfields Avenue, NSW 2522, Australia
| |
Collapse
|
33
|
Rovenchak A, Druchok M. Machine learning-assisted search for novel coagulants: When machine learning can be efficient even if data availability is low. J Comput Chem 2024; 45:937-952. [PMID: 38174834 DOI: 10.1002/jcc.27292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2023] [Revised: 12/04/2023] [Accepted: 12/10/2023] [Indexed: 01/05/2024]
Abstract
Design of new drugs is a challenging process: a candidate molecule should satisfy multiple conditions to act properly and make the least side-effect-perfect candidates selectively attach to and influence only targets, leaving off-targets intact. The amount of experimental data about various properties of molecules constantly grows, promoting data-driven approaches. However, the applicability of typical predictive machine learning techniques can be substantially limited by a lack of experimental data about a particular target. For example, there are many known Thrombin inhibitors (acting as anticoagulants), but a very limited number of known Protein C inhibitors (coagulants). In this study, we present our approach to suggest new inhibitor candidates by building an effective representation of chemical space. For this aim, we developed a deep learning model-autoencoder, trained on a large set of molecules in the SMILES format to map the chemical space. Further, we applied different sampling strategies to generate novel coagulant candidates. Symmetrically, we tested our approach on anticoagulant candidates, where we were able to predict their inhibition towards Thrombin. We also compare our approach with MegaMolBART-another deep learning generative model, but exploiting similar principles of navigation in a chemical space.
Collapse
Affiliation(s)
- Andrij Rovenchak
- SoftServe, Inc., Lviv, Ukraine
- Professor Ivan Vakarchuk Department for Theoretical Physics, Ivan Franko National University of Lviv, Lviv, Ukraine
| | - Maksym Druchok
- SoftServe, Inc., Lviv, Ukraine
- Institute for Condensed Matter Physics, Lviv, Ukraine
| |
Collapse
|
34
|
Tang X, Lei X, Zhang Y. Prediction of Drug-Target Affinity Using Attention Neural Network. Int J Mol Sci 2024; 25:5126. [PMID: 38791165 PMCID: PMC11121300 DOI: 10.3390/ijms25105126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2024] [Revised: 04/19/2024] [Accepted: 04/27/2024] [Indexed: 05/26/2024] Open
Abstract
Studying drug-target interactions (DTIs) is the foundational and crucial phase in drug discovery. Biochemical experiments, while being the most reliable method for determining drug-target affinity (DTA), are time-consuming and costly, making it challenging to meet the current demands for swift and efficient drug development. Consequently, computational DTA prediction methods have emerged as indispensable tools for this research. In this article, we propose a novel deep learning algorithm named GRA-DTA, for DTA prediction. Specifically, we introduce Bidirectional Gated Recurrent Unit (BiGRU) combined with a soft attention mechanism to learn target representations. We employ Graph Sample and Aggregate (GraphSAGE) to learn drug representation, especially to distinguish the different features of drug and target representations and their dimensional contributions. We merge drug and target representations by an attention neural network (ANN) to learn drug-target pair representations, which are fed into fully connected layers to yield predictive DTA. The experimental results showed that GRA-DTA achieved mean squared error of 0.142 and 0.225 and concordance index reached 0.897 and 0.890 on the benchmark datasets KIBA and Davis, respectively, surpassing the most state-of-the-art DTA prediction algorithms.
Collapse
Affiliation(s)
- Xin Tang
- School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
| | - Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
| | - Yuchen Zhang
- College of Information Engineering, Northwest A&F University, Xianyang 712199, China;
| |
Collapse
|
35
|
Qiu X, Wang H, Tan X, Fang Z. G-K BertDTA: A graph representation learning and semantic embedding-based framework for drug-target affinity prediction. Comput Biol Med 2024; 173:108376. [PMID: 38552281 DOI: 10.1016/j.compbiomed.2024.108376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2023] [Revised: 03/21/2024] [Accepted: 03/24/2024] [Indexed: 04/17/2024]
Abstract
Developing new drugs is costly, time-consuming, and risky. Drug-target affinity (DTA), indicating the binding capability between drugs and target proteins, is a crucial indicator for drug development. Accurately predicting interaction strength between new drug-target pairs by analyzing previous experiments aids in screening potential drug molecules, repurposing them, and developing safe and effective medicines. Existing computational models for DTA prediction rely on strings or single-graph neural networks, lacking consideration of protein structure and molecular semantic information, leading to limited accuracy. Our experiments demonstrate that string-based methods may overlook protein conformations, causing a high root mean square error (RMSE) of 3.584 in affinity due to a lack of spatial context. Single graph networks also underperform on topology features, with a 6% lower confidence interval (CI) for activity classification. Absent semantic information also limits generalization across diverse compounds, resulting in 18% increment in RMSE and 5% in misclassifications within quantifications study, restricting potential drug discovery. To address these limitations, we propose G-K BertDTA, a novel framework for accurate DTA prediction incorporating protein features, molecular semantic features, and molecular structural information. In this proposed model, we represent drugs as graphs, with a GIN employed to learn the molecular topological information. For the extraction of protein structural features, we utilize a DenseNet architecture. A knowledge-based BERT semantic model is incorporated to obtain rich pre-trained semantic embeddings, thereby enhancing the feature information. We extensively evaluated our proposed approach on the publicly available benchmark datasets (i.e., KIBA and Davis), and experimental results demonstrate the promising performance of our method, which consistently outperforms previous state-of-the-art approaches. Code is available at https://github.com/AmbitYuki/G-K-BertDTA.
Collapse
Affiliation(s)
- Xihe Qiu
- School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
| | - Haoyu Wang
- School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
| | - Xiaoyu Tan
- INF Technology (Shanghai) Co., Ltd., Shanghai, China
| | - Zhijun Fang
- School of Computer Science and Technology, Donghua University, Shanghai, China.
| |
Collapse
|
36
|
Gao M, Zhang D, Chen Y, Zhang Y, Wang Z, Wang X, Li S, Guo Y, Webb GI, Nguyen ATN, May L, Song J. GraphormerDTI: A graph transformer-based approach for drug-target interaction prediction. Comput Biol Med 2024; 173:108339. [PMID: 38547658 DOI: 10.1016/j.compbiomed.2024.108339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2023] [Revised: 03/05/2024] [Accepted: 03/17/2024] [Indexed: 04/17/2024]
Abstract
The application of Artificial Intelligence (AI) to screen drug molecules with potential therapeutic effects has revolutionized the drug discovery process, with significantly lower economic cost and time consumption than the traditional drug discovery pipeline. With the great power of AI, it is possible to rapidly search the vast chemical space for potential drug-target interactions (DTIs) between candidate drug molecules and disease protein targets. However, only a small proportion of molecules have labelled DTIs, consequently limiting the performance of AI-based drug screening. To solve this problem, a machine learning-based approach with great ability to generalize DTI prediction across molecules is desirable. Many existing machine learning approaches for DTI identification failed to exploit the full information with respect to the topological structures of candidate molecules. To develop a better approach for DTI prediction, we propose GraphormerDTI, which employs the powerful Graph Transformer neural network to model molecular structures. GraphormerDTI embeds molecular graphs into vector-format representations through iterative Transformer-based message passing, which encodes molecules' structural characteristics by node centrality encoding, node spatial encoding and edge encoding. With a strong structural inductive bias, the proposed GraphormerDTI approach can effectively infer informative representations for out-of-sample molecules and as such, it is capable of predicting DTIs across molecules with an exceptional performance. GraphormerDTI integrates the Graph Transformer neural network with a 1-dimensional Convolutional Neural Network (1D-CNN) to extract the drugs' and target proteins' representations and leverages an attention mechanism to model the interactions between them. To examine GraphormerDTI's performance for DTI prediction, we conduct experiments on three benchmark datasets, where GraphormerDTI achieves a superior performance than five state-of-the-art baselines for out-of-molecule DTI prediction, including GNN-CPI, GNN-PT, DeepEmbedding-DTI, MolTrans and HyperAttentionDTI, and is on a par with the best baseline for transductive DTI prediction. The source codes and datasets are publicly accessible at https://github.com/mengmeng34/GraphormerDTI.
Collapse
Affiliation(s)
- Mengmeng Gao
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Daokun Zhang
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Melbourne, Australia.
| | - Yi Chen
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.
| | - Yiwen Zhang
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
| | - Zhikang Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
| | - Xiaoyu Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
| | - Shanshan Li
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
| | - Yuming Guo
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
| | - Geoffrey I Webb
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Melbourne, Australia
| | - Anh T N Nguyen
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, Australia
| | - Lauren May
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, Australia
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia.
| |
Collapse
|
37
|
Zhang H, Liu X, Cheng W, Wang T, Chen Y. Prediction of drug-target binding affinity based on deep learning models. Comput Biol Med 2024; 174:108435. [PMID: 38608327 DOI: 10.1016/j.compbiomed.2024.108435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2024] [Revised: 04/05/2024] [Accepted: 04/07/2024] [Indexed: 04/14/2024]
Abstract
The prediction of drug-target binding affinity (DTA) plays an important role in drug discovery. Computerized virtual screening techniques have been used for DTA prediction, greatly reducing the time and economic costs of drug discovery. However, these techniques have not succeeded in reversing the low success rate of new drug development. In recent years, the continuous development of deep learning (DL) technology has brought new opportunities for drug discovery through the DTA prediction. This shift has moved the prediction of DTA from traditional machine learning methods to DL. The DL frameworks used for DTA prediction include convolutional neural networks (CNN), graph convolutional neural networks (GCN), and recurrent neural networks (RNN), and reinforcement learning (RL), among others. This review article summarizes the available literature on DTA prediction using DL models, including DTA quantification metrics and datasets, and DL algorithms used for DTA prediction (including input representation of models, neural network frameworks, valuation indicators, and model interpretability). In addition, the opportunities, challenges, and prospects of the application of DL frameworks for DTA prediction in the field of drug discovery are discussed.
Collapse
Affiliation(s)
- Hao Zhang
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China
| | - Xiaoqian Liu
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China
| | - Wenya Cheng
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China
| | - Tianshi Wang
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China
| | - Yuanyuan Chen
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China.
| |
Collapse
|
38
|
Brocidiacono M, Francoeur P, Aggarwal R, Popov KI, Koes DR, Tropsha A. BigBind: Learning from Nonstructural Data for Structure-Based Virtual Screening. J Chem Inf Model 2024; 64:2488-2495. [PMID: 38113513 DOI: 10.1021/acs.jcim.3c01211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2023]
Abstract
Deep learning methods that predict protein-ligand binding have recently been used for structure-based virtual screening. Many such models have been trained using protein-ligand complexes with known crystal structures and activities from the PDBBind data set. However, because PDBbind only includes 20K complexes, models typically fail to generalize to new targets, and model performance is on par with models trained with only ligand information. Conversely, the ChEMBL database contains a wealth of chemical activity information but includes no information about binding poses. We introduce BigBind, a data set that maps ChEMBL activity data to proteins from the CrossDocked data set. BigBind comprises 583 K ligand activities and includes 3D structures of the protein binding pockets. Additionally, we augmented the data by adding an equal number of putative inactives for each target. Using this data, we developed Banana (basic neural network for binding affinity), a neural network-based model to classify active from inactive compounds, defined by a 10 μM cutoff. Our model achieved an AUC of 0.72 on BigBind's test set, while a ligand-only model achieved an AUC of 0.59. Furthermore, Banana achieved competitive performance on the LIT-PCBA benchmark (median EF1% 1.81) while running 16,000 times faster than molecular docking with Gnina. We suggest that Banana, as well as other models trained on this data set, will significantly improve the outcomes of prospective virtual screening tasks.
Collapse
Affiliation(s)
- Michael Brocidiacono
- Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Paul Francoeur
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Rishal Aggarwal
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Konstantin I Popov
- Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - David Ryan Koes
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Alexander Tropsha
- Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| |
Collapse
|
39
|
Gorantla R, Kubincová A, Weiße AY, Mey ASJS. From Proteins to Ligands: Decoding Deep Learning Methods for Binding Affinity Prediction. J Chem Inf Model 2024; 64:2496-2507. [PMID: 37983381 PMCID: PMC11005465 DOI: 10.1021/acs.jcim.3c01208] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2023] [Revised: 10/26/2023] [Accepted: 10/27/2023] [Indexed: 11/22/2023]
Abstract
Accurate in silico prediction of protein-ligand binding affinity is important in the early stages of drug discovery. Deep learning-based methods exist but have yet to overtake more conventional methods such as giga-docking largely due to their lack of generalizability. To improve generalizability, we need to understand what these models learn from input protein and ligand data. We systematically investigated a sequence-based deep learning framework to assess the impact of protein and ligand encodings on predicting binding affinities for commonly used kinase data sets. The role of proteins is studied using convolutional neural network-based encodings obtained from sequences and graph neural network-based encodings enriched with structural information from contact maps. Ligand-based encodings are generated from graph-neural networks. We test different ligand perturbations by randomizing node and edge properties. For proteins, we make use of 3 different protein contact generation methods (AlphaFold2, Pconsc4, and ESM-1b) and compare these with a random control. Our investigation shows that protein encodings do not substantially impact the binding predictions, with no statistically significant difference in binding affinity for KIBA in the investigated metrics (concordance index, Pearson's R Spearman's Rank, and RMSE). Significant differences are seen for ligand encodings with random ligands and random ligand node properties, suggesting a much bigger reliance on ligand data for the learning tasks. Using different ways to combine protein and ligand encodings did not show a significant change in performance.
Collapse
Affiliation(s)
- Rohan Gorantla
- School
of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, U.K.
- EaStCHEM
School of Chemistry, University of Edinburgh, Edinburgh, EH9 3FJ, U.K.
| | | | - Andrea Y. Weiße
- School
of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, U.K.
- School
of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3FF, U.K.
| | | |
Collapse
|
40
|
Oliveira PF, Guedes RC, Falcao AO. Inferring molecular inhibition potency with AlphaFold predicted structures. Sci Rep 2024; 14:8252. [PMID: 38589418 PMCID: PMC11001998 DOI: 10.1038/s41598-024-58394-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2023] [Accepted: 03/28/2024] [Indexed: 04/10/2024] Open
Abstract
Even though in silico drug ligand-based methods have been successful in predicting interactions with known target proteins, they struggle with new, unassessed targets. To address this challenge, we propose an approach that integrates structural data from AlphaFold 2 predicted protein structures into machine learning models. Our method extracts 3D structural protein fingerprints and combines them with ligand structural data to train a single machine learning model. This model captures the relationship between ligand properties and the unique structural features of various target proteins, enabling predictions for never before tested molecules and protein targets. To assess our model, we used a dataset of 144 Human G-protein Coupled Receptors (GPCRs) with over 140,000 measured inhibition constants (Ki) values. Results strongly suggest that our approach performs as well as state-of-the-art ligand-based methods. In a second modeling approach that used 129 targets for training and a separate test set of 15 different protein targets, our model correctly predicted interactions for 73% of targets, with explained variances exceeding 0.50 in 22% of cases. Our findings further verified that the usage of experimentally determined protein structures produced models that were statistically indistinct from the Alphafold synthetic structures. This study presents a proteo-chemometric drug screening approach that uses a simple and scalable method for extracting protein structural information for usage in machine learning models capable of predicting protein-molecule interactions even for orphan targets.
Collapse
Affiliation(s)
- Pedro F Oliveira
- Lasige, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
| | - Rita C Guedes
- Research Institute for Medicines (iMed.ULisboa), Faculdade de Farmácia, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003 Lisboa, Portugal
| | - Andre O Falcao
- Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal.
| |
Collapse
|
41
|
Zeng X, Li SJ, Lv SQ, Wen ML, Li Y. A comprehensive review of the recent advances on predicting drug-target affinity based on deep learning. Front Pharmacol 2024; 15:1375522. [PMID: 38628639 PMCID: PMC11019008 DOI: 10.3389/fphar.2024.1375522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2024] [Accepted: 03/21/2024] [Indexed: 04/19/2024] Open
Abstract
Accurate calculation of drug-target affinity (DTA) is crucial for various applications in the pharmaceutical industry, including drug screening, design, and repurposing. However, traditional machine learning methods for calculating DTA often lack accuracy, posing a significant challenge in accurately predicting DTA. Fortunately, deep learning has emerged as a promising approach in computational biology, leading to the development of various deep learning-based methods for DTA prediction. To support researchers in developing novel and highly precision methods, we have provided a comprehensive review of recent advances in predicting DTA using deep learning. We firstly conducted a statistical analysis of commonly used public datasets, providing essential information and introducing the used fields of these datasets. We further explored the common representations of sequences and structures of drugs and targets. These analyses served as the foundation for constructing DTA prediction methods based on deep learning. Next, we focused on explaining how deep learning models, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Transformer, and Graph Neural Networks (GNNs), were effectively employed in specific DTA prediction methods. We highlighted the unique advantages and applications of these models in the context of DTA prediction. Finally, we conducted a performance analysis of multiple state-of-the-art methods for predicting DTA based on deep learning. The comprehensive review aimed to help researchers understand the shortcomings and advantages of existing methods, and further develop high-precision DTA prediction tool to promote the development of drug discovery.
Collapse
Affiliation(s)
- Xin Zeng
- College of Mathematics and Computer Science, Dali University, Dali, China
| | - Shu-Juan Li
- Yunnan Institute of Endemic Diseases Control and Prevention, Dali, China
| | - Shuang-Qing Lv
- Institute of Surveying and Information Engineering West Yunnan University of Applied Science, Dali, China
| | - Meng-Liang Wen
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China
| | - Yi Li
- College of Mathematics and Computer Science, Dali University, Dali, China
| |
Collapse
|
42
|
Chen S, Li M, Semenov I. MFA-DTI: Drug-target interaction prediction based on multi-feature fusion adopted framework. Methods 2024; 224:79-92. [PMID: 38430967 DOI: 10.1016/j.ymeth.2024.02.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2023] [Revised: 02/16/2024] [Accepted: 02/23/2024] [Indexed: 03/05/2024] Open
Abstract
The identification of drug-target interactions (DTI) is a valuable step in the drug discovery and repositioning process. However, traditional laboratory experiments are time-consuming and expensive. Computational methods have streamlined research to determine DTIs. The application of deep learning methods has significantly improved the prediction performance for DTIs. Modern deep learning methods can leverage multiple sources of information, including sequence data that contains biological structural information, and interaction data. While useful, these methods cannot be effectively applied to each type of information individually (e.g., chemical structure and interaction network) and do not take into account the specificity of DTI data such as low- or zero-interaction biological entities. To overcome these limitations, we propose a method called MFA-DTI (Multi-feature Fusion Adopted framework for DTI). MFA-DTI consists of three modules: an interaction graph learning module that processes the interaction network to generate interaction vectors, a chemical structure learning module that extracts features from the chemical structure, and a fusion module that combines these features for the final prediction. To validate the performance of MFA-DTI, we conducted experiments on six public datasets under different settings. The results indicate that the proposed method is highly effective in various settings and outperforms state-of-the-art methods.
Collapse
Affiliation(s)
- Siqi Chen
- School of Information Science and Engineering, Chongqing Jiaotong University, Chongqing, 400074, China.
| | - Minghui Li
- Beidahuang Industry Group General Hospital, Harbin, 150006, China
| | - Ivan Semenov
- College of Intelligence and Computing, Tianjin University, Tianjin, 300072, China
| |
Collapse
|
43
|
Ghandikota SK, Jegga AG. Application of artificial intelligence and machine learning in drug repurposing. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2024; 205:171-211. [PMID: 38789178 DOI: 10.1016/bs.pmbts.2024.03.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/26/2024]
Abstract
The purpose of drug repurposing is to leverage previously approved drugs for a particular disease indication and apply them to another disease. It can be seen as a faster and more cost-effective approach to drug discovery and a powerful tool for achieving precision medicine. In addition, drug repurposing can be used to identify therapeutic candidates for rare diseases and phenotypic conditions with limited information on disease biology. Machine learning and artificial intelligence (AI) methodologies have enabled the construction of effective, data-driven repurposing pipelines by integrating and analyzing large-scale biomedical data. Recent technological advances, especially in heterogeneous network mining and natural language processing, have opened up exciting new opportunities and analytical strategies for drug repurposing. In this review, we first introduce the challenges in repurposing approaches and highlight some success stories, including those during the COVID-19 pandemic. Next, we review some existing computational frameworks in the literature, organized on the basis of the type of biomedical input data analyzed and the computational algorithms involved. In conclusion, we outline some exciting new directions that drug repurposing research may take, as pioneered by the generative AI revolution.
Collapse
Affiliation(s)
- Sudhir K Ghandikota
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
| | - Anil G Jegga
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, United States.
| |
Collapse
|
44
|
Liu Y, Xing L, Zhang L, Cai H, Guo M. GEFormerDTA: drug target affinity prediction based on transformer graph for early fusion. Sci Rep 2024; 14:7416. [PMID: 38548825 PMCID: PMC10979032 DOI: 10.1038/s41598-024-57879-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2023] [Accepted: 03/22/2024] [Indexed: 04/01/2024] Open
Abstract
Predicting the interaction affinity between drugs and target proteins is crucial for rapid and accurate drug discovery and repositioning. Therefore, more accurate prediction of DTA has become a key area of research in the field of drug discovery and drug repositioning. However, traditional experimental methods have disadvantages such as long operation cycles, high manpower requirements, and high economic costs, making it difficult to predict specific interactions between drugs and target proteins quickly and accurately. Some methods mainly use the SMILES sequence of drugs and the primary structure of proteins as inputs, ignoring the graph information such as bond encoding, degree centrality encoding, spatial encoding of drug molecule graphs, and the structural information of proteins such as secondary structure and accessible surface area. Moreover, previous methods were based on protein sequences to learn feature representations, neglecting the completeness of information. To address the completeness of drug and protein structure information, we propose a Transformer graph-based early fusion research approach for drug-target affinity prediction (GEFormerDTA). Our method reduces prediction errors caused by insufficient feature learning. Experimental results on Davis and KIBA datasets showed a better prediction of drugtarget affinity than existing affinity prediction methods.
Collapse
Affiliation(s)
- Youzhi Liu
- Department of Computer Science and Technology, Shandong University of Technology, Zibo, 255000, China
| | - Linlin Xing
- Department of Computer Science and Technology, Shandong University of Technology, Zibo, 255000, China.
| | - Longbo Zhang
- Department of Computer Science and Technology, Shandong University of Technology, Zibo, 255000, China
| | - Hongzhen Cai
- Department of Agricultural Engineering and Food Science, Shandong University of Technology, Zibo, 255000, China
| | - Maozu Guo
- Department of Electrical and Information Engineering, Beijing University of Architecture, Beijing, 102616, China
| |
Collapse
|
45
|
Zhao W, Yu Y, Liu G, Liang Y, Xu D, Feng X, Guan R. MSI-DTI: predicting drug-target interaction based on multi-source information and multi-head self-attention. Brief Bioinform 2024; 25:bbae238. [PMID: 38762789 PMCID: PMC11102638 DOI: 10.1093/bib/bbae238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2024] [Revised: 04/09/2024] [Accepted: 05/03/2024] [Indexed: 05/20/2024] Open
Abstract
Identifying drug-target interactions (DTIs) holds significant importance in drug discovery and development, playing a crucial role in various areas such as virtual screening, drug repurposing and identification of potential drug side effects. However, existing methods commonly exploit only a single type of feature from drugs and targets, suffering from miscellaneous challenges such as high sparsity and cold-start problems. We propose a novel framework called MSI-DTI (Multi-Source Information-based Drug-Target Interaction Prediction) to enhance prediction performance, which obtains feature representations from different views by integrating biometric features and knowledge graph representations from multi-source information. Our approach involves constructing a Drug-Target Knowledge Graph (DTKG), obtaining multiple feature representations from diverse information sources for SMILES sequences and amino acid sequences, incorporating network features from DTKG and performing an effective multi-source information fusion. Subsequently, we employ a multi-head self-attention mechanism coupled with residual connections to capture higher-order interaction information between sparse features while preserving lower-order information. Experimental results on DTKG and two benchmark datasets demonstrate that our MSI-DTI outperforms several state-of-the-art DTIs prediction methods, yielding more accurate and robust predictions. The source codes and datasets are publicly accessible at https://github.com/KEAML-JLU/MSI-DTI.
Collapse
Affiliation(s)
- Wenchuan Zhao
- Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, Jilin, China
| | - Yufeng Yu
- Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, Jilin, China
| | - Guosheng Liu
- Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, Jilin, China
| | - Yanchun Liang
- Zhuhai Laboratory of the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Zhuhai College of Science and Technology, Zhuhai 519041, China
| | - Dong Xu
- Department of Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
| | - Xiaoyue Feng
- Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, Jilin, China
| | - Renchu Guan
- Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, Jilin, China
| |
Collapse
|
46
|
Pogány D, Antal P. Towards explainable interaction prediction: Embedding biological hierarchies into hyperbolic interaction space. PLoS One 2024; 19:e0300906. [PMID: 38512848 PMCID: PMC10956837 DOI: 10.1371/journal.pone.0300906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2023] [Accepted: 03/06/2024] [Indexed: 03/23/2024] Open
Abstract
Given the prolonged timelines and high costs associated with traditional approaches, accelerating drug development is crucial. Computational methods, particularly drug-target interaction prediction, have emerged as efficient tools, yet the explainability of machine learning models remains a challenge. Our work aims to provide more interpretable interaction prediction models using similarity-based prediction in a latent space aligned to biological hierarchies. We investigated integrating drug and protein hierarchies into a joint-embedding drug-target latent space via embedding regularization by conducting a comparative analysis between models employing traditional flat Euclidean vector spaces and those utilizing hyperbolic embeddings. Besides, we provided a latent space analysis as an example to show how we can gain visual insights into the trained model with the help of dimensionality reduction. Our results demonstrate that hierarchy regularization improves interpretability without compromising predictive performance. Furthermore, integrating hyperbolic embeddings, coupled with regularization, enhances the quality of the embedded hierarchy trees. Our approach enables a more informed and insightful application of interaction prediction models in drug discovery by constructing an interpretable hyperbolic latent space, simultaneously incorporating drug and target hierarchies and pairing them with available interaction information. Moreover, compatible with pairwise methods, the approach allows for additional transparency through existing explainable AI solutions.
Collapse
Affiliation(s)
- Domonkos Pogány
- Department of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary
| | - Péter Antal
- Department of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary
| |
Collapse
|
47
|
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/23/2024]
Abstract
Developing new drugs is too expensive and time -consuming. Accurately predicting the interaction between drugs and targets will likely change how the drug is discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. In this paper, computational methods, focusing on sequence and structure to study protein-ligand interactions, are examined. Therefore, this paper starts by presenting an overview of the data sets applied in this area, as well as the various approaches applied for representing proteins and ligands. Then, sequence-based and structure-based classification criteria are subsequently utilized to categorize and summarize both the classical machine learning models and deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are proposed. Furthermore, delving into the diverse applications of protein-ligand interaction models in drug research is presented. Lastly, the current challenges and future directions in this field are addressed.
Collapse
Affiliation(s)
- Yunjiang Zhang
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shuyuan Li
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Kong Meng
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shaorui Sun
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| |
Collapse
|
48
|
Huang D, Ye X, Sakurai T. Multi-party collaborative drug discovery via federated learning. Comput Biol Med 2024; 171:108181. [PMID: 38428094 DOI: 10.1016/j.compbiomed.2024.108181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2023] [Revised: 01/28/2024] [Accepted: 02/18/2024] [Indexed: 03/03/2024]
Abstract
In the field of drug discovery and pharmacology research, precise and rapid prediction of drug-target binding affinity (DTA) and drug-drug interaction (DDI) are essential for drug efficacy and safety. However, pharmacological data are often distributed across different institutions. Moreover, due to concerns regarding data privacy and intellectual property, the sharing of pharmacological data is often restricted. It is difficult for institutions to achieve the desired performance by solely utilizing their data. This urgent challenge calls for a solution that not only enhances collaboration between multiple institutions to improve prediction accuracy but also safeguards data privacy. In this study, we propose a novel federated learning (FL) framework to advance the prediction of DTA and DDI, namely FL-DTA and FL-DDI. The proposed framework enables multiple institutions to collaboratively train a predictive model without the need to share their local data. Moreover, to ensure data privacy, we employ secure multi-party computation (MPC) during the federated learning model aggregation phase. We evaluated the proposed method on two DTA and one DDI benchmark datasets and compared them with centralized learning and local learning. The experimental results indicate that the proposed method performs closely to centralized learning, and significantly outperforms local learning. Moreover, the proposed framework ensures data security while promoting collaboration among institutions, thereby accelerating the drug discovery process.
Collapse
Affiliation(s)
- Dong Huang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
| | - Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan.
| | - Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
| |
Collapse
|
49
|
Huang Z, Xiao Q, Xiong T, Shi W, Yang Y, Li G. Predicting Drug-Protein Interactions through Branch-Chain Mining and multi-dimensional attention network. Comput Biol Med 2024; 171:108127. [PMID: 38350397 DOI: 10.1016/j.compbiomed.2024.108127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2023] [Revised: 01/26/2024] [Accepted: 02/06/2024] [Indexed: 02/15/2024]
Abstract
Identifying drug-protein interactions (DPIs) is crucial in drug discovery and repurposing. Computational methods for precise DPI identification can expedite development timelines and reduce expenses compared with conventional experimental methods. Lately, deep learning techniques have been employed for predicting DPIs, enhancing these processes. Nevertheless, the limitations observed in prior studies, where many extract features from complete drug and protein entities, overlooking the crucial theoretical foundation that pharmacological responses are often correlated with specific substructures, can lead to poor predictive performance. Furthermore, certain substructure-focused research confines its exploration to a solitary fragment category, such as a functional group. In this study, addressing these constraints, we present an end-to-end framework termed BCMMDA for predicting DPIs. The framework considers various substructure types, including branch chains, common substructures, and specific fragments. We designed a specific feature learning module by combining our proposed multi-dimensional attention mechanism with convolutional neural networks (CNNs). Deep CNNs assist in capturing the synergistic effects among these fragment sets, enabling the extraction of relevant features of drugs and proteins. Meanwhile, the multi-dimensional attention mechanism refines the relationship between drug and protein features by assigning attention vectors to each drug compound and amino acid. This mechanism empowers the model to further concentrate on pivotal substructures and elements, thereby improving its ability to identify essential interactions in DPI prediction. We evaluated the performance of BCMMDA on four well-known benchmark datasets. The results indicated that BCMMDA outperformed state-of-the-art baseline models, demonstrating significant improvement in performance.
Collapse
Affiliation(s)
- Zhuo Huang
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
| | - Qiu Xiao
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China; MOE-LCSM, School of Mathematics and Statistics, Hunan Normal University, Changsha, 410081, China; College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China.
| | - Tuo Xiong
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
| | - Wanwan Shi
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
| | - Yide Yang
- Key Laboratory of Molecular Epidemiology of Hunan Province, School of Medicine, Hunan Normal University, Changsha, 410006, China.
| | - Guanghui Li
- School of Information Engineering, East China Jiaotong University, Nanchang, 330013, China.
| |
Collapse
|
50
|
Qi H, Yu T, Yu W, Liu C. Drug-target affinity prediction with extended graph learning-convolutional networks. BMC Bioinformatics 2024; 25:75. [PMID: 38365583 PMCID: PMC10874073 DOI: 10.1186/s12859-024-05698-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2024] [Accepted: 02/12/2024] [Indexed: 02/18/2024] Open
Abstract
BACKGROUND High-performance computing plays a pivotal role in computer-aided drug design, a field that holds significant promise in pharmaceutical research. The prediction of drug-target affinity (DTA) is a crucial stage in this process, potentially accelerating drug development through rapid and extensive preliminary compound screening, while also minimizing resource utilization and costs. Recently, the incorporation of deep learning into DTA prediction and the enhancement of its accuracy have emerged as key areas of interest in the research community. Drugs and targets can be characterized through various methods, including structure-based, sequence-based, and graph-based representations. Despite the progress in structure and sequence-based techniques, they tend to provide limited feature information. Conversely, graph-based approaches have risen to prominence, attracting considerable attention for their comprehensive data representation capabilities. Recent studies have focused on constructing protein and drug molecular graphs using sequences and SMILES, subsequently deriving representations through graph neural networks. However, these graph-based approaches are limited by the use of a fixed adjacent matrix of protein and drug molecular graphs for graph convolution. This limitation restricts the learning of comprehensive feature representations from intricate compound and protein structures, consequently impeding the full potential of graph-based feature representation in DTA prediction. This, in turn, significantly impacts the models' generalization capabilities in the complex realm of drug discovery. RESULTS To tackle these challenges, we introduce GLCN-DTA, a model specifically designed for proficiency in DTA tasks. GLCN-DTA innovatively integrates a graph learning module into the existing graph architecture. This module is designed to learn a soft adjacent matrix, which effectively and efficiently refines the contextual structure of protein and drug molecular graphs. This advancement allows for learning richer structural information from protein and drug molecular graphs via graph convolution, specifically tailored for DTA tasks, compared to the conventional fixed adjacent matrix approach. A series of experiments have been conducted to validate the efficacy of the proposed GLCN-DTA method across diverse scenarios. The results demonstrate that GLCN-DTA possesses advantages in terms of robustness and high accuracy. CONCLUSIONS The proposed GLCN-DTA model enhances DTA prediction performance by introducing a novel framework that synergizes graph learning operations with graph convolution operations, thereby achieving richer representations. GLCN-DTA does not distinguish between different protein classifications, including structurally ordered and intrinsically disordered proteins, focusing instead on improving feature representation. Therefore, its applicability scope may be more effective in scenarios involving structurally ordered proteins, while potentially being limited in contexts with intrinsically disordered proteins.
Collapse
Affiliation(s)
- Haiou Qi
- Nursing Department, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, 310016, China
| | - Ting Yu
- Operating Room Department, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, 310016, China.
| | - Wenwen Yu
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
| | - Chenxi Liu
- School of Medicine and Health Management, Tongji Medical School, Huazhong University of Science and Technology, Wuhan, 430030, China
| |
Collapse
|