1. Choi D, Park S. Improving binding affinity prediction by emphasizing local features of drug and protein. Comput Biol Chem 2025; 115:108310. [PMID: 39674048 DOI: 10.1016/j.compbiolchem.2024.108310] [Received: 03/21/2024] [Revised: 10/10/2024] [Accepted: 12/04/2024] [Indexed: 12/16/2024]
Abstract
Binding affinity prediction is a fundamental task in drug discovery. Despite much effort to improve its accuracy, prior work has considered only macro-level features that represent the overall architecture of a drug and a target protein, so features from the local structure of the drug and the protein tend to be lost. In this paper, we propose a deep learning model that comprehensively extracts the local features of both a drug and a target protein for accurate binding affinity prediction. The proposed model consists of two components, Multi-Stream CNN and Multi-Stream GCN, which capture micro-level characteristics, or local features, from subsequences of a target protein sequence and subgraphs of a drug molecule, respectively. Because each component has multiple streams with different numbers of layers, it can compute and preserve local features through a stream consisting of a single layer. Our evaluation on two popular datasets, Davis and KIBA, demonstrates that the proposed model outperforms all baseline models that use global features, implying that local features play a significant role in binding affinity prediction.
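One way to see why a single-layer stream keeps local features is to look at the receptive field of stacked stride-1 convolutions: each additional layer with kernel size k widens the input window by k - 1 positions. The sketch below is illustrative only; the kernel size and the stream depths are assumptions and are not taken from the paper.

```python
def receptive_field(num_layers: int, kernel_size: int) -> int:
    """Receptive field of stacked stride-1, undilated 1D convolutions."""
    if num_layers < 1 or kernel_size < 1:
        raise ValueError("num_layers and kernel_size must be positive")
    return num_layers * (kernel_size - 1) + 1


# Hypothetical streams of depth 1, 2 and 3, all with kernel size 3 (assumed).
streams = {depth: receptive_field(depth, kernel_size=3) for depth in (1, 2, 3)}
print(streams)  # {1: 3, 2: 5, 3: 7}
```

Under these assumptions the single-layer stream sees only three neighbouring residues per output position, while deeper streams mix information from progressively wider windows.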
Affiliation(s)
- Daejin Choi: Department of Computer Science and Engineering, Incheon National University, Incheon, Republic of Korea.
- Sangjun Park: Department of Artificial Intelligence, Korea University, Seoul, Republic of Korea.
2. Chen M, Gong X, Pan S, Wu J, Lin F, Du B, Hu W. Unified Knowledge-Guided Molecular Graph Encoder with multimodal fusion and multi-task learning. Neural Netw 2025; 184:107068. [PMID: 39732065 DOI: 10.1016/j.neunet.2024.107068] [Received: 10/09/2024] [Revised: 12/02/2024] [Accepted: 12/17/2024] [Indexed: 12/30/2024]
Abstract
The remarkable success of Graph Neural Networks underscores their formidable capacity to assimilate multimodal inputs, markedly enhancing performance across a broad spectrum of domains. In the context of molecular modeling, considerable efforts have been made to enrich molecular representations by integrating data from diverse aspects. Nevertheless, current methodologies frequently compartmentalize geometric and semantic components, resulting in a fragmented approach that impairs the holistic integration of molecular attributes. This constrained scope limits the generalizability and efficacy of such models in downstream applications. A pivotal challenge lies in harmonizing heterogeneous data sources, particularly in addressing the inherent inconsistencies and sparsity within multimodal molecular datasets. To overcome these limitations, we present the Unified Knowledge-Guided Molecular Graph Encoder (UKGE), a groundbreaking framework that leverages heterogeneous graphs to unify the representation of diverse molecular modalities. Unlike prior methods, UKGE reconciles geometric and semantic features through the use of elemental knowledge graphs (KGs) and meta-path definitions by constructing Unified Molecular Graphs, enabling comprehensive and unified molecular representations. It employs an innovative Meta-Path Aware Message Passing mechanism within its molecular encoder, enhancing the integration of multimodal data. Additionally, a multi-task learning strategy balances data from different modalities, further enriching UKGE's capability to embed complex biological insights. Empirical evaluations highlight UKGE's excellence across tasks: DDI prediction achieves 96.91% ACC and 99.14% AUC in warm-start settings, with 83.15% ACC in cold-start scenarios. For CPI prediction, it reaches 0.644 CI on Davis and 0.659 on KIBA. In LBDD, it achieves 99.3% validity, 98.4% uniqueness, and 98.9% novelty, establishing UKGE as a state-of-the-art molecular modeling framework.
Affiliation(s)
- Mukun Chen: School of Computer Science, Wuhan University, Luojiashan Road, Wuchang District, Wuhan, 430072, Hubei Province, China.
- Xiuwen Gong: University of Technology Sydney, 15 Broadway Ultimo, NSW 2007, Sydney, 2007, Australia.
- Shirui Pan: School of Information and Communication Technology, Griffith University, 170 Kessels Road, Nathan Qld 4111, Queensland, 4111, Australia.
- Jia Wu: School of Computing, Macquarie University, Balaclava Rd, Macquarie Park NSW 2109, Sydney, 2109, Australia.
- Fu Lin: School of Computer Science, Wuhan University, Luojiashan Road, Wuchang District, Wuhan, 430072, Hubei Province, China.
- Bo Du: School of Computer Science, Wuhan University, Luojiashan Road, Wuchang District, Wuhan, 430072, Hubei Province, China.
- Wenbin Hu: School of Computer Science, Wuhan University, Luojiashan Road, Wuchang District, Wuhan, 430072, Hubei Province, China; Hubei Key Laboratory of Digital Finance Innovation, Hubei University of Economics, No. 8, Yangqiaohu Avenue, Zanglong Island Development Zone, Jiangxia District, Wuhan, 2007, Hubei Province, China.
3. Zhang W, Hu F, Yin P, Cai Y. A transferability-guided protein-ligand interaction prediction method. Methods 2025; 235:64-70. [PMID: 39920915 DOI: 10.1016/j.ymeth.2025.01.019] [Received: 10/29/2024] [Revised: 01/19/2025] [Accepted: 01/21/2025] [Indexed: 02/10/2025] Open
Abstract
Accurate prediction of protein-ligand interaction (PLI) is crucial for drug discovery and development. However, existing methods often struggle with effectively integrating heterogeneous protein and ligand data modalities and optimizing knowledge transfer from pretraining to the target task. This paper proposes a novel transferability-guided PLI prediction method that maximizes knowledge transfer by deeply integrating protein and ligand representations through a cross-attention mechanism and incorporating transferability metrics to guide fine-tuning. The cross-attention mechanism facilitates interactive information exchange between modalities, enabling the model to capture intricate interdependencies. Meanwhile, the transferability-guided strategy quantifies transferability from pretraining tasks and incorporates it into the training objective, ensuring the effective utilization of beneficial knowledge while mitigating negative transfer. Extensive experiments demonstrate significant and consistent improvements over traditional fine-tuning, validated by statistical tests. Ablation studies highlight the pivotal role of cross-attention, and quantitative analysis reveals the method's ability to reduce harmful transfer. Our guided strategy provides a paradigm for more comprehensive utilization of pretraining knowledge, offering prospects for enhancing other PLI prediction approaches. This method advances PLI prediction via innovative modality fusion and guided knowledge transfer, paving the way for accelerated drug discovery pipelines. Code and data are freely available at https://github.com/brian-zZZ/Guided-PLI.
Affiliation(s)
- Weihong Zhang: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100049, China.
- Fan Hu: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
- Peng Yin: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
- Yunpeng Cai: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
4. Heo R, Lee D, Kim BJ, Seo S, Park S, Park C. KNU-DTI: KNowledge United Drug-Target Interaction prediction. Comput Biol Med 2025; 189:109927. [PMID: 40024184 DOI: 10.1016/j.compbiomed.2025.109927] [Received: 10/01/2024] [Revised: 01/17/2025] [Accepted: 02/24/2025] [Indexed: 03/04/2025]
Abstract
MOTIVATION Accurately predicting drug-target protein interactions (DTI) is a cornerstone of drug discovery, enabling the identification of potential therapeutic compounds. Sequence-based prediction models, despite their simplicity, hold great promise in extracting essential information directly from raw sequences. However, the focus in recent DTI studies has increasingly shifted toward enhancing algorithmic complexity, often at the expense of fully leveraging robust sequence representation learning methods. This shift has led to the underestimation and gradual neglect of methodologies aimed at effectively capturing discriminative features from sequences. Our work seeks to address this oversight by emphasizing the value of well-constructed sequence representation algorithms, demonstrating that accurate DTI models can be achieved even with simple interaction-mapping techniques. By prioritizing meaningful information extraction over excessive model complexity, we aim to advance the development of practical and generalizable DTI prediction frameworks. RESULTS We developed the KNowledge Uniting DTI model (KNU-DTI), which retrieves structural information and unites it. Protein structural properties were obtained using structural property sequence (SPS). Extended-connectivity fingerprint (ECFP) was used to estimate the structure-activity relationship in molecules. Including these two features, a total of five latent vectors were derived from protein and molecule via various neural networks and integrated by element-wise addition to predict binding interactions or affinity. Using four test concepts to evaluate the model, we show that the model outperforms recently published competitors. Finally, a case study indicated that our model has a competitive edge over existing docking simulations in some cases.
Affiliation(s)
- Ryong Heo: Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University, Chuncheon-si, 24341, Gangwon-do, Republic of Korea; UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea.
- Dahyeon Lee: Department of Data Science, Kangwon National University, Republic of Korea.
- Byung Ju Kim: UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea.
- Sangmin Seo: Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea.
- Sanghyun Park: Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea.
- Chihyun Park: Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University, Chuncheon-si, 24341, Gangwon-do, Republic of Korea; Department of Data Science, Kangwon National University, Republic of Korea; UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea; Department of Computer Science and Engineering, Kangwon National University, Republic of Korea.
5. Qiu X, Shao S, Wang H, Tan X. Bio-K-Transformer: A pre-trained transformer-based sequence-to-sequence model for adverse drug reactions prediction. Computer Methods and Programs in Biomedicine 2025; 260:108524. [PMID: 39667145 DOI: 10.1016/j.cmpb.2024.108524] [Received: 06/27/2024] [Revised: 10/20/2024] [Accepted: 11/19/2024] [Indexed: 12/14/2024]
Abstract
BACKGROUND AND OBJECTIVE Adverse drug reactions (ADRs) pose a serious threat to patient health, potentially resulting in severe consequences, including mortality. Accurate prediction of ADRs before drug market release is crucial for early prevention. Traditional ADR detection, relying on clinical trials and voluntary reporting, has inherent limitations. Clinical trials face challenges in capturing rare and long-term reactions due to scale and time constraints, while voluntary reporting tends to neglect mild and common reactions. Consequently, drugs on the market may carry unknown risks, leading to an increasing demand for more accurate predictions of ADRs before their commercial release. This study aims to develop a more accurate prediction model for ADRs prior to drug market release. METHODS We frame the ADR prediction task as a sequence-to-sequence problem and propose the Bio-K-Transformer, which integrates the transformer model with pre-trained models (i.e., Bio_ClinicalBERT and K-bert), to forecast potential ADRs. We enhance the attention mechanism of the Transformer encoder structure and adjust embedding layers to model diverse relationships between drug adverse reactions. Additionally, we employ a masking technique to handle target data. Experimental findings demonstrate a notable improvement in predicting potential adverse reactions, achieving a predictive accuracy of 90.08%. The model significantly exceeds current state-of-the-art baseline models and even the fine-tuned Llama-3.1-8B and Llama3-Aloe-8B-Alpha models, while being cost-effective. The results highlight the model's efficacy in identifying potential adverse reactions with high precision, sensitivity, and specificity. CONCLUSION The Bio-K-Transformer significantly enhances the prediction of ADRs, offering a cost-effective method with strong potential for improving pre-market safety evaluations of pharmaceuticals.
Affiliation(s)
- Xihe Qiu: School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China.
- Siyue Shao: School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China.
- Haoyu Wang: School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China.
- Xiaoyu Tan: INF Technology (Shanghai) Co., Ltd., Shanghai, China.
6. Hu R, Ge R, Deng G, Fan J, Tang B, Wang C. MultiKD-DTA: Enhancing Drug-Target Affinity Prediction Through Multiscale Feature Extraction. Interdiscip Sci 2025:10.1007/s12539-025-00697-4. [PMID: 40019659 DOI: 10.1007/s12539-025-00697-4] [Received: 09/29/2024] [Revised: 02/05/2025] [Accepted: 02/07/2025] [Indexed: 03/01/2025]
Abstract
The discovery and development of novel pharmaceutical agents is characterized by high costs, lengthy timelines, and significant safety concerns. Traditional drug discovery involves pharmacologists manually screening drug molecules against protein targets, focusing on binding within protein cavities. However, this manual process is slow and inherently limited. Given these constraints, the use of deep learning techniques to predict drug-target interaction (DTI) affinities is both significant and promising for future applications. This paper introduces an innovative deep learning architecture designed to enhance the prediction of DTI affinities. The model ingeniously combines graph neural networks, pre-trained large-scale protein models, and attention mechanisms to improve performance. In this framework, molecular structures are represented as graphs and processed through graph neural networks and multiscale convolutional networks to facilitate feature extraction. Simultaneously, protein sequences are encoded using pre-trained ESM-2 large models and processed with bidirectional long short-term memory networks. Subsequently, the molecular and protein embeddings derived from these processes are integrated within a fusion module to compute affinity scores. Experimental results demonstrate that our proposed model outperforms existing methods on two publicly available datasets.
Affiliation(s)
- Riqian Hu: Hangzhou Dianzi University, Hangzhou, 310018, China; University of California, San Diego, La Jolla, 92093, USA.
- Ruiquan Ge: Hangzhou Dianzi University, Hangzhou, 310018, China.
- Guojian Deng: Hangzhou Dianzi University, Hangzhou, 310018, China.
- Jin Fan: Hangzhou Dianzi University, Hangzhou, 310018, China.
- Bowen Tang: MindRank AI Ltd., Hangzhou, 310018, China.
- Changmiao Wang: Shenzhen Research Institute of Big Data, Shenzhen, 518172, China.
7. Liu Y, Liu Y, Yang H, Zhang L, Che K, Xing L. NTMFF-DTA: Prediction of Drug-Target Affinity Based on Network Topology and Multi-feature Fusion. Interdiscip Sci 2025:10.1007/s12539-025-00692-9. [PMID: 39998589 DOI: 10.1007/s12539-025-00692-9] [Received: 10/28/2024] [Revised: 01/20/2025] [Accepted: 01/21/2025] [Indexed: 02/27/2025]
Abstract
Predicting drug-target binding affinity (DTA) is an important step in the complex process of drug discovery or drug repositioning. A large number of computational methods proposed for the task of DTA prediction utilize single features of proteins to measure drug-protein or protein-protein interactions, ignoring multi-feature fusion between protein-related features (e.g., solvent accessibility, protein pockets, secondary structures, and distance maps). To address these constraints, we propose a new network topology and multi-feature fusion based approach for DTA prediction (NTMFF-DTA), which deeply mines multiple types of protein data and propagates drug information across domains. Data in drug-target interactions are often sparse, and multi-feature fusion can enrich the data by integrating multiple features, thus overcoming the data sparsity problem to some extent. The proposed approach offers two main contributions: (1) constructing a relationship-aware GAT that selectively focuses on the connections between nodes and edges in the molecular graph to capture the more central roles of nodes and edges in DTA prediction and (2) constructing an information propagation channel between different feature domains of drug proteins to achieve the sharing of the importance weight of drug atoms and edges, and combining with a multi-head self-attention mechanism to capture residue-enhancing features. The NTMFF-DTA model was comparatively tested against several leading baseline technologies on commonly used datasets. Experimental results show that NTMFF-DTA can effectively and accurately predict DTA and outperforms existing comparative models.
Affiliation(s)
- Yuandong Liu: Computer Science and Technology, Shandong University of Technology, Mashang, Zibo, 255000, China.
- Youzhi Liu: Computer Science and Technology, Shandong University of Technology, Mashang, Zibo, 255000, China.
- Haoqin Yang: Department of Mechanical Engineering, Shandong University of Technology, Mashang, Zibo, 255000, China.
- Longbo Zhang: Computer Science and Technology, Shandong University of Technology, Mashang, Zibo, 255000, China.
- Kai Che: Xi'an Aeronautics Computing Technique Research Institute, AVIC, Xi'an, 710065, China; School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.
- Linlin Xing: Computer Science and Technology, Shandong University of Technology, Mashang, Zibo, 255000, China.
8. Michels J, Bandarupalli R, Ahangar Akbari A, Le T, Xiao H, Li J, Hom EFY. Natural Language Processing Methods for the Study of Protein-Ligand Interactions. J Chem Inf Model 2025. [PMID: 39993834 DOI: 10.1021/acs.jcim.4c01907] [Indexed: 02/26/2025]
Abstract
Natural Language Processing (NLP) has revolutionized the way computers are used to study and interact with human languages and is increasingly influential in the study of protein and ligand binding, which is critical for drug discovery and development. This review examines how NLP techniques have been adapted to decode the "language" of proteins and small molecule ligands to predict protein-ligand interactions (PLIs). We discuss how methods such as long short-term memory (LSTM) networks, transformers, and attention mechanisms can leverage different protein and ligand data types to identify potential interaction patterns. Significant challenges are highlighted including the scarcity of high-quality negative data, difficulties in interpreting model decisions, and sampling biases in existing data sets. We argue that focusing on improving data quality, enhancing model robustness, and fostering both collaboration and competition could catalyze future advances in machine-learning-based predictions of PLIs.
Affiliation(s)
- James Michels: Department of Computer and Information Science, University of Mississippi, University, Mississippi 38677, United States.
- Ramya Bandarupalli: Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, Mississippi 38677, United States.
- Amin Ahangar Akbari: Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, Mississippi 38677, United States.
- Thai Le: Department of Computer Science, Indiana University, Bloomington, Indiana 47408, United States.
- Hong Xiao: Department of Computer and Information Science and Institute for Data Science, University of Mississippi, University, Mississippi 38677, United States.
- Jing Li: Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, Mississippi 38677, United States.
- Erik F Y Hom: Department of Biology and Center for Biodiversity and Conservation Research, University of Mississippi, University, Mississippi 38677, United States.
9. Hu S, Hu J, Zhang X, Jin S, Xu X. Drug target affinity prediction based on multi-scale gated power graph and multi-head linear attention mechanism. PLoS One 2025; 20:e0315718. [PMID: 39982887 PMCID: PMC11844845 DOI: 10.1371/journal.pone.0315718] [Received: 08/12/2024] [Accepted: 11/30/2024] [Indexed: 02/23/2025] Open
Abstract
For the purpose of developing new drugs and repositioning existing ones, accurate drug-target affinity (DTA) prediction is essential. While graph neural networks are frequently utilized for DTA prediction, it is difficult for existing single-scale graph neural networks to access the global structure of compounds. We propose a novel DTA prediction model in this study, MAPGraphDTA, which uses an approach based on a multi-head linear attention mechanism that aggregates global features based on the attention weights and a multi-scale gated power graph that captures multi-hop connectivity relationships of graph nodes. In order to accurately extract drug target features, we provide a gated skip-connection approach in multiscale graph neural networks, which is used to fuse multiscale features to produce a rich representation of feature information. We experimented on the Davis, KIBA, Metz, and DTC datasets, and we evaluated the proposed method against other relevant models. According to the experimental results, MAPGraphDTA outperforms the other models on all evaluation metrics. We also performed cold-start experiments on the Davis dataset, which showed that our model has good prediction ability for unseen drugs, unseen proteins, and cases where neither the drug nor the protein has been seen.
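The three cold-start settings named in the abstract (unseen drugs, unseen proteins, and both unseen) can be illustrated with a small data-splitting sketch. This is a generic illustration, not the authors' protocol: the function name, the hold-out fraction, and the toy identifiers and affinity values are all assumptions, and the input list is assumed to be non-empty.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str, float]  # (drug_id, protein_id, affinity)


def cold_start_split(pairs: List[Pair], holdout_frac: float = 0.2,
                     seed: int = 0) -> Dict[str, List[Pair]]:
    """Hold out a subset of drugs and proteins, then bucket each pair by novelty."""
    rng = random.Random(seed)
    drugs = sorted({d for d, _, _ in pairs})
    proteins = sorted({p for _, p, _ in pairs})
    held_drugs = set(rng.sample(drugs, max(1, int(len(drugs) * holdout_frac))))
    held_prots = set(rng.sample(proteins, max(1, int(len(proteins) * holdout_frac))))

    split: Dict[str, List[Pair]] = {
        "train": [], "unseen_drug": [], "unseen_protein": [], "unseen_both": [],
    }
    for d, p, y in pairs:
        new_drug, new_prot = d in held_drugs, p in held_prots
        if new_drug and new_prot:
            key = "unseen_both"
        elif new_drug:
            key = "unseen_drug"
        elif new_prot:
            key = "unseen_protein"
        else:
            key = "train"
        split[key].append((d, p, y))
    return split


# Toy 5 x 5 grid with placeholder affinities (not real measurements).
pairs = [(f"D{i}", f"P{j}", float(i + j)) for i in range(5) for j in range(5)]
split = cold_start_split(pairs)
print({name: len(items) for name, items in split.items()})
```

Because the held-out drugs and proteins never appear in the training bucket, test performance on the other three buckets measures generalization to new entities rather than memorization of seen ones.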
Affiliation(s)
- Shuo Hu: School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China.
- Jing Hu: School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, China; Institute of Big Data Science and Engineering, Wuhan University of Science and Technology, Wuhan, Hubei, China.
- Xiaolong Zhang: School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, China; Institute of Big Data Science and Engineering, Wuhan University of Science and Technology, Wuhan, Hubei, China.
- Shuting Jin: School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China.
- Xin Xu: School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China.
10. Kalemati M, Zamani Emani M, Koohi S. InceptionDTA: Predicting drug-target binding affinity with biological context features and inception networks. Heliyon 2025; 11:e42476. [PMID: 40007773 PMCID: PMC11850134 DOI: 10.1016/j.heliyon.2025.e42476] [Received: 07/03/2024] [Revised: 01/23/2025] [Accepted: 02/04/2025] [Indexed: 02/27/2025] Open
Abstract
Predicting drug-target binding affinity via in silico methods is crucial in drug discovery. Traditional machine learning relies on manually engineered features from limited data, leading to suboptimal performance. In contrast, deep learning excels at extracting features from raw sequences but often overlooks essential biological context features, hindering effective binding prediction. Additionally, these models struggle to capture global and local feature distributions efficiently in protein sequences and drug SMILES. Previous state-of-the-art models, like transformers and graph-based approaches, face scalability and resource efficiency challenges. Transformers struggle with scalability, while graph-based methods have difficulty handling large datasets and complex molecular structures. In this paper, we introduce InceptionDTA, a novel drug-target binding affinity prediction model that leverages CharVec, an enhanced variant of Prot2Vec, to incorporate both biological context and categorical features into protein sequence encoding. InceptionDTA utilizes a multi-scale convolutional architecture based on the Inception network to capture features at various spatial resolutions, enabling the extraction of both local and global features from protein sequences and drug SMILES. We evaluate InceptionDTA across a range of benchmark datasets commonly used in drug-target binding affinity prediction. Our results demonstrate that InceptionDTA outperforms various sequence-based, transformer-based, and graph-based deep learning approaches across warm-start, refined, and cold-start splitting settings. In addition to using CharVec, which demonstrates greater accuracy in absolute predictions, InceptionDTA also includes a version that employs simple label encoding and excels in ranking and predicting relative binding affinities. This versatility highlights how InceptionDTA can effectively adapt to various predictive requirements. 
These results emphasize the promise of our approach in expediting drug repurposing initiatives, enabling the discovery of new drugs, and contributing to advancements in disease treatment.
Affiliation(s)
- Mahmood Kalemati: Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.
- Mojtaba Zamani Emani: Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.
- Somayyeh Koohi: Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.
11. Luo J, Zhu Z, Xu Z, Xiao C, Wei J, Shen J. GS-DTA: integrating graph and sequence models for predicting drug-target binding affinity. BMC Genomics 2025; 26:105. [PMID: 39905318 PMCID: PMC11792192 DOI: 10.1186/s12864-025-11234-4] [Received: 08/14/2024] [Accepted: 01/10/2025] [Indexed: 02/06/2025] Open
Abstract
BACKGROUND Drug-target binding affinity (DTA) prediction is vital in drug discovery and repositioning, and a growing number of researchers are focusing on it. Many effective methods have been proposed. However, some current methods have shortcomings in focusing on important nodes in drug molecular graphs and in dealing with structurally complex molecules. In particular, when considering important nodes and complex substructures in molecules, they may not be able to fully explore the potential relationships between different parts. In addition, when dealing with protein structures, some methods ignore the connections between amino acid fragments that are far apart in sequence but may work synergistically in function. RESULTS In this paper, we propose a new method, called GS-DTA, for predicting DTA based on graph and sequence models. GS-DTA takes the simplified molecular input line entry system (SMILES) string of the drug and the protein amino acid sequence as input. First, each drug is modeled as a graph, in which a vertex is an atom and an edge represents an interaction between atoms. Then GATv2-GCN and three-layer GCN networks are used to extract the features of the drug. GATv2-GCN enhances the model's ability to focus on important nodes by assigning dynamic attention scores, which improves the learning of the graph structure's intricate patterns. Furthermore, the three-layer GCN captures hierarchical features of the drug through deeper propagation and feature transformation. Meanwhile, for each protein, a framework combining CNN, Bi-LSTM, and Transformer is used to extract the contextual and structural information of the protein amino acid sequence, and this combination helps to build a comprehensive and detailed representation of the protein. Finally, the obtained drug and protein feature vectors are combined to predict DTA through the fully connected layer. The source code can be downloaded from https://github.com/zhuziguang/GS-DTA.
CONCLUSIONS The results show that GS-DTA achieves good performance in terms of MSE, CI, and r2m on the Davis and KIBA datasets, improving the accuracy of DTA prediction.
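For readers unfamiliar with the metrics named above, the following sketch shows how MSE and the concordance index (CI) are commonly computed for affinity regression. It is a plain-Python illustration written for this listing, not code from the GS-DTA repository; the function names and the toy values are assumptions, and the r2m metric is omitted.

```python
from itertools import combinations
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between measured and predicted affinities."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of comparable pairs whose predicted order matches the true order."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have equal length")
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # pairs with tied true affinities are not comparable
        comparable += 1
        if p_i == p_j:
            concordant += 0.5  # tied predictions count as half-correct
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy values for illustration only (not taken from Davis or KIBA).
y_true = [5.0, 6.0, 7.0, 8.0]
y_pred = [5.1, 6.5, 6.4, 8.2]
print(mse(y_true, y_pred), concordance_index(y_true, y_pred))
```

A CI of 1.0 means every comparable pair is ranked correctly and 0.5 corresponds to random ordering, so CI complements MSE by measuring ranking quality rather than absolute error.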
Collapse
Affiliation(s)
- Junwei Luo
- School of Software, Henan Polytechnic University, Jiaozuo, 454000, China
| | - Ziguang Zhu
- School of Software, Henan Polytechnic University, Jiaozuo, 454000, China
| | - Zhenhan Xu
- School of Software, Henan Polytechnic University, Jiaozuo, 454000, China
| | - Chuanle Xiao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, 510060, China
| | - Jingjing Wei
- College of Chemical and Environmental Engineering, Anyang Institute of Technology, Anyang, 455000, China
| | - Jiquan Shen
- School of Software, Henan Polytechnic University, Jiaozuo, 454000, China.
- College of Chemical and Environmental Engineering, Anyang Institute of Technology, Anyang, 455000, China.
|
12
|
Wang X, Zhao Q, Wang J. FedKD-CPI: Combining the federated knowledge distillation technique to accomplish synergistic compound-protein interaction prediction. Methods 2025; 234:275-283. [PMID: 39824374 DOI: 10.1016/j.ymeth.2024.12.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Revised: 12/20/2024] [Accepted: 12/31/2024] [Indexed: 01/20/2025] Open
Abstract
Compound-protein interaction (CPI) prediction is critical in the early stages of drug discovery, narrowing the search space for CPIs and reducing the cost and time required for traditional high-throughput screening. However, CPI-related data are usually distributed across different institutions and their sharing is restricted because of data privacy and intellectual property rights. Constructing a scheme that enhances multi-institutional collaboration to improve prediction accuracy while protecting data privacy is essential. To this end, we propose FedKD-CPI, the first framework based on federated knowledge distillation, to effectively facilitate multi-party CPI collaborative prediction and ensure data privacy and security. FedKD-CPI uses knowledge distillation technology to extract the updated knowledge of all client models and train the model on the server to achieve knowledge aggregation, which can effectively utilize the knowledge contained in public and private data. We evaluate FedKD-CPI on three benchmark datasets and compare it with four baselines. The results show that FedKD-CPI is very close to centralized learning and significantly better than localized learning. Furthermore, FedKD-CPI outperforms federated learning-based baselines on independent and identically distributed data and non-independent and identically distributed data. Overall, FedKD-CPI improves the CPI prediction while ensuring data security and promoting institutions' collaboration to accelerate drug discovery.
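The knowledge-aggregation step described above can be pictured with a small sketch: each client scores the same public compound-protein pairs, the server averages those scores into soft labels, and the server model is trained against them. The abstract does not state the aggregation rule or the loss, so the plain average and the binary cross-entropy below, along with both function names, are assumptions for illustration only.

```python
import math


def aggregate_client_knowledge(client_probs):
    """Average the interaction probabilities that each client reports
    for the same public compound-protein pairs (assumed aggregation rule)."""
    n_clients = len(client_probs)
    return [sum(column) / n_clients for column in zip(*client_probs)]


def distillation_loss(server_probs, soft_labels, eps=1e-12):
    """Binary cross-entropy of server predictions against averaged soft labels."""
    total = 0.0
    for s, t in zip(server_probs, soft_labels):
        s = min(max(s, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(s) + (1.0 - t) * math.log(1.0 - s)
    return total / len(server_probs)
```

Only predictions on public data leave each client in this sketch, which is the property that lets private CPI data stay local.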
Affiliation(s)
- Xuetao Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
| | - Qichang Zhao
- School of Computer Science and Engineering, Central South University, Changsha 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China.
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
|
13
|
Li C, Li G. DynHeter-DTA: Dynamic Heterogeneous Graph Representation for Drug-Target Binding Affinity Prediction. Int J Mol Sci 2025; 26:1223. [PMID: 39940990 PMCID: PMC11818550 DOI: 10.3390/ijms26031223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2024] [Revised: 01/27/2025] [Accepted: 01/28/2025] [Indexed: 02/16/2025] Open
Abstract
In drug development, drug-target affinity (DTA) prediction is a key indicator for assessing the drug's efficacy and safety. Despite significant progress in deep learning-based affinity prediction approaches in recent years, there are still limitations in capturing the complex interactions between drugs and target receptors. To address this issue, a dynamic heterogeneous graph prediction model, DynHeter-DTA, is proposed in this paper, which fully leverages the complex relationships between drug-drug, protein-protein, and drug-protein interactions, allowing the model to adaptively learn the optimal graph structures. Specifically, (1) in the data processing layer, to better utilize the similarities and interactions between drugs and proteins, the model dynamically adjusts the connection strengths between drug-drug, protein-protein, and drug-protein pairs, constructing a variable heterogeneous graph structure, which significantly improves the model's expressive power and generalization performance; (2) in the model design layer, considering that the quantity of protein nodes significantly exceeds that of drug nodes, an approach leveraging Graph Isomorphism Networks (GIN) and Self-Attention Graph Pooling (SAGPooling) is proposed to enhance prediction efficiency and accuracy. Comprehensive experiments on the Davis, KIBA, and Human public datasets demonstrate that DynHeter-DTA exceeds the performance of previous models in drug-target interaction forecasting, providing an innovative solution for drug-target affinity prediction.
Affiliation(s)
- Changli Li
- School of Artificial Intelligence, Nanjing University of Information Science & Technology, Nanjing 210044, China
|
14
|
He H, Chen G, Tang Z, Chen CYC. Dual modality feature fused neural network integrating binding site information for drug target affinity prediction. NPJ Digit Med 2025; 8:67. [PMID: 39875637 PMCID: PMC11775287 DOI: 10.1038/s41746-025-01464-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Accepted: 01/15/2025] [Indexed: 01/30/2025] Open
Abstract
Accurately predicting binding affinities between drugs and targets is crucial for drug discovery but remains challenging due to the complexity of modeling interactions between small drugs and large targets. This study proposes DMFF-DTA, a dual-modality neural network model that integrates sequence and graph structure information from drugs and proteins for drug-target affinity prediction. The model introduces a binding site-focused graph construction approach to extract binding information, enabling more balanced and efficient modeling of drug-target interactions. Comprehensive experiments demonstrate that DMFF-DTA outperforms state-of-the-art methods with significant improvements. The model exhibits excellent generalization capabilities on completely unseen drugs and targets, achieving an improvement of over 8% compared to existing methods. Model interpretability analysis validates the biological relevance of the model. A case study in pancreatic cancer drug repurposing demonstrates its practical utility. This work provides an interpretable, robust approach to integrating multi-view drug and protein features for advancing computational drug discovery.
Affiliation(s)
- Haohuai He
- State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Genomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 510275, China
| | - Guanxing Chen
- State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Genomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 510275, China
| | - Zhenchao Tang
- State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Genomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 510275, China
| | - Calvin Yu-Chian Chen
- State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Genomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen, 518055, China.
- Department of Medical Research, China Medical University Hospital, Taichung, 40447, Taiwan.
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung, 41354, Taiwan.
|
15
|
Xu J, Ci L, Zhu B, Zhang G, Jiang L, Ye-Lehmann S, Long W. MMSG-DTA: A Multimodal, Multiscale Model Based on Sequence and Graph Modalities for Drug-Target Affinity Prediction. J Chem Inf Model 2025; 65:981-996. [PMID: 39772628 DOI: 10.1021/acs.jcim.4c01828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2025]
Abstract
Drug-Target Affinity (DTA) prediction is a cornerstone of drug discovery and development, providing critical insights into the intricate interactions between candidate drugs and their biological targets. Despite its importance, existing methodologies often face significant limitations in capturing comprehensive global features from molecular graphs, which are essential for accurately characterizing drug properties. Furthermore, protein feature extraction is predominantly restricted to 1D amino acid sequences, which fail to adequately represent the spatial structures and complex functional regions of proteins. These shortcomings impede the development of models capable of fully elucidating the mechanisms underlying drug-target interactions. To overcome these challenges, we propose a multimodal, multiscale model based on Sequence and Graph Modalities for Drug-Target Affinity (MMSG-DTA) Prediction. The model combines graph neural networks with Transformers to effectively capture both local node-level features and global structural features of molecular graphs. Additionally, a graph-based modality is employed to improve the extraction of protein features from amino acid sequences. To further enhance the model's performance, an attention-based feature fusion module is incorporated to integrate diverse feature types, thereby strengthening its representation capacity and robustness. We evaluated MMSG-DTA on three public benchmark data sets (Davis, KIBA, and Metz), and the experimental results demonstrate that the proposed model outperforms several state-of-the-art methods in DTA prediction. These findings highlight the effectiveness of MMSG-DTA in advancing the accuracy and robustness of drug-target interaction modeling.
Affiliation(s)
- Jiahao Xu
- School of Information Engineering, Huzhou University, Huzhou 313000, China
- Hangzhou Institute of Technology, Xidian University, Hangzhou 311231, China
| | - Lei Ci
- School of Information Engineering, Huzhou University, Huzhou 313000, China
| | - Bo Zhu
- School of Information Engineering, Huzhou University, Huzhou 313000, China
| | - Guanhua Zhang
- School of Information Engineering, Huzhou University, Huzhou 313000, China
- Hangzhou Institute of Technology, Xidian University, Hangzhou 311231, China
| | - Linhua Jiang
- School of Information Engineering, Huzhou University, Huzhou 313000, China
- Hangzhou Institute of Technology, Xidian University, Hangzhou 311231, China
| | - Shixin Ye-Lehmann
- Hangzhou Institute of Technology, Xidian University, Hangzhou 311231, China
- Faculty of Medicine, University Paris-Saclay, Paris 94276, France
| | - Wei Long
- School of Information Engineering, Huzhou University, Huzhou 313000, China
|
16
|
Li Z, Zeng Y, Jiang M, Wei B. Deep Drug-Target Binding Affinity Prediction Base on Multiple Feature Extraction and Fusion. ACS OMEGA 2025; 10:2020-2032. [PMID: 39866608 PMCID: PMC11755178 DOI: 10.1021/acsomega.4c08048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2024] [Revised: 12/25/2024] [Accepted: 01/03/2025] [Indexed: 01/28/2025]
Abstract
Accurate drug-target binding affinity (DTA) prediction is crucial in drug discovery. Recently, deep learning methods for DTA prediction have made significant progress. However, two challenges remain: (1) recent models always ignore the correlations in drug and target data during drug/target representation, and (2) the interaction learning of drug-target pairs is always performed by simple concatenation, which is insufficient to explore their fusion. To overcome these challenges, we propose an end-to-end sequence-based model called BTDHDTA. In the feature extraction process, the bidirectional gated recurrent unit (GRU), transformer encoder, and dilated convolution are employed to extract global patterns, local patterns, and their correlations from the drug and target input. Additionally, a module combining convolutional neural networks with a Highway connection is introduced to fuse deep drug and protein features. We evaluate the performance of BTDHDTA on three benchmark data sets (Davis, KIBA, and Metz), demonstrating its superiority over several current state-of-the-art methods in key metrics such as Mean Squared Error (MSE), Concordance Index (CI), and Regression toward the mean (R_m^2). The results indicate that our method achieves better performance in DTA prediction. In the case study, we use the BTDHDTA model to predict the binding affinities between 3137 FDA-approved drugs and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) replication-related proteins, validating the model's effectiveness in practical scenarios.
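For readers unfamiliar with Highway connections, the sketch below shows the generic gating rule, gate * H(x) + (1 - gate) * x, applied element-wise. It is a minimal illustration of the standard Highway idea and not the BTDHDTA fusion module itself; the function names and the choice to pass pre-computed transform outputs and gate logits are assumptions.

```python
import math


def sigmoid(z):
    """Logistic function used to squash gate logits into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def highway(x, transformed, gate_logits):
    """Element-wise highway mix of an input and its transformed version.

    x           : original features
    transformed : H(x), e.g. the output of a convolutional layer
    gate_logits : pre-sigmoid gate values, one per feature
    """
    out = []
    for x_i, h_i, g_i in zip(x, transformed, gate_logits):
        gate = sigmoid(g_i)
        out.append(gate * h_i + (1.0 - gate) * x_i)
    return out
```

When the gate is near 1 the transformed features pass through; when it is near 0 the input is carried over unchanged, which is what lets deep fusion stacks keep an identity path.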
Affiliation(s)
- Zepeng Li
- School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
| | - Yuni Zeng
- School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
| | - Mingfeng Jiang
- School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
| | - Bo Wei
- School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Longgang Research Institute, Zhejiang Sci-Tech University, Longgang 325000, Zhejiang, China
|
17
|
Deng M, Wang J, Zhao Y, Zhao Y, Cao H, Wang Z. Predicting drug and target interaction with dilated reparameterize convolution. Sci Rep 2025; 15:2579. [PMID: 39833385 PMCID: PMC11747116 DOI: 10.1038/s41598-025-86918-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Accepted: 01/15/2025] [Indexed: 01/22/2025] Open
Abstract
Predicting drug-target interaction (DTI) stands as a pivotal and formidable challenge in pharmaceutical research. Many existing deep learning methods only learn the high-dimensional representation of ligands and targets on a small scale. However, it is difficult for the model to obtain the potential law of combining pockets or multiple binding sites on a large scale. To address this lacuna, we designed a large-kernel convolutional block for extracting large-scale sequence information and proposed a novel DTI prediction framework, named Rep-ConvDTI. The reparameterization method is introduced to help large-kernel convolutions capture small-scale information. We have also developed a gated attention mechanism to more efficiently characterize the interaction of drugs and targets. Extensive experiments demonstrate that Rep-ConvDTI achieves the most competitive performance against state-of-the-art baselines on the three benchmark datasets. Furthermore, we validated the potential of Rep-ConvDTI as a drug screening tool through model interpretative studies and drug screening experiments with cystathionine-β-synthase.
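The reparameterization idea mentioned above can be illustrated in one dimension: a large-kernel branch and a parallel small-kernel branch used during training can be folded into a single large kernel for inference, because convolution is linear. The sketch below assumes non-dilated, odd-length kernels with zero padding; the actual Rep-ConvDTI block may differ (its title refers to dilated branches), and all names here are hypothetical.

```python
def conv1d_same(x, kernel):
    """Cross-correlation with zero padding so the output has len(x) entries.

    Assumes an odd-length kernel.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def merge_kernels(large, small):
    """Zero-pad the small kernel to the large kernel's size and add the two."""
    offset = (len(large) - len(small)) // 2
    merged = list(large)
    for j, k in enumerate(small):
        merged[offset + j] += k
    return merged


# Usage: the two-branch output equals the single merged-kernel output.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
large_kernel = [1.0, 0.0, 2.0, 0.0, -1.0, 0.0, 3.0]
small_kernel = [1.0, -2.0, 1.0]
two_branch = [
    a + b
    for a, b in zip(conv1d_same(signal, large_kernel),
                    conv1d_same(signal, small_kernel))
]
single_branch = conv1d_same(signal, merge_kernels(large_kernel, small_kernel))
```

The small branch helps the large kernel learn fine-scale patterns during training, while inference pays for only one convolution.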
Affiliation(s)
- Moping Deng
- Shenyang Institute of Automation, Chinese Academy of Science, Shenyang, 110016, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Jian Wang
- Shenyang Institute of Automation, Chinese Academy of Science, Shenyang, 110016, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Yiming Zhao
- Shenyang Institute of Automation, Chinese Academy of Science, Shenyang, 110016, China
| | - Yongjia Zhao
- Shenyang Institute of Automation, Chinese Academy of Science, Shenyang, 110016, China
| | - Hao Cao
- School of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang, 110016, Liaoning Province, China
| | - Zhuo Wang
- Shenyang Institute of Automation, Chinese Academy of Science, Shenyang, 110016, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
|
18
|
Li VOK, Han Y, Kaistha T, Zhang Q, Downey J, Gozes I, Lam JCK. DeepDrug as an expert guided and AI driven drug repurposing methodology for selecting the lead combination of drugs for Alzheimer's disease. Sci Rep 2025; 15:2093. [PMID: 39814937 PMCID: PMC11735786 DOI: 10.1038/s41598-025-85947-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2024] [Accepted: 01/07/2025] [Indexed: 01/18/2025] Open
Abstract
Alzheimer's Disease (AD) significantly undermines human dignity and quality of life. While newly approved amyloid immunotherapy has been reported, effective AD drugs remain to be identified. Here, we propose a novel AI-driven drug-repurposing method, DeepDrug, to identify a lead combination of approved drugs to treat AD patients. DeepDrug advances drug-repurposing methodology in four aspects. Firstly, it incorporates expert knowledge to extend candidate targets to include long genes, immunological and aging pathways, and somatic mutation markers that are associated with AD. Secondly, it incorporates a signed directed heterogeneous biomedical graph encompassing a rich set of nodes and edges, and node/edge weighting to capture crucial pathways associated with AD. Thirdly, it encodes the weighted biomedical graph through a Graph Neural Network into a new embedding space to capture the granular relationships across different nodes. Fourthly, it systematically selects the high-order drug combinations via diminishing return-based thresholds. A five-drug lead combination, consisting of Tofacitinib, Niraparib, Baricitinib, Empagliflozin, and Doxercalciferol, has been selected from the top drug candidates based on DeepDrug scores to achieve the maximum synergistic effect. These five drugs target neuroinflammation, mitochondrial dysfunction, and glucose metabolism, which are all related to AD pathology. DeepDrug offers a novel AI-and-big-data, expert-guided mechanism for new drug combination discovery and drug-repurposing across AD and other neurodegenerative diseases, with immediate clinical applications.
Affiliation(s)
- Victor O K Li
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China.
| | - Yang Han
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China
| | - Tushar Kaistha
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China
| | - Qi Zhang
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China
| | - Jocelyn Downey
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China
| | - Illana Gozes
- Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel-Aviv, Israel
| | - Jacqueline C K Lam
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China.
|
19
|
Hu J, Hu S, Xia M, Zheng K, Zhang X. Drug-target binding affinity prediction based on power graph and word2vec. BMC Med Genomics 2025; 18:9. [PMID: 39806396 PMCID: PMC11730168 DOI: 10.1186/s12920-024-02073-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Accepted: 12/13/2024] [Indexed: 01/16/2025] Open
Abstract
BACKGROUND Drugs and their protein targets affect the physiological functions and metabolic effects of the body through binding, and accurate prediction of drug-protein target interactions is crucial for drug development. To shorten the drug development cycle and reduce costs, machine learning methods are playing an increasingly important role in the field of drug-target interactions. RESULTS Compared with other methods, regression-based drug-target affinity is more representative of the binding ability. Accurate prediction of drug-target affinity can effectively reduce the time and cost of drug retargeting and new drug development. In this paper, a drug-target affinity prediction model (WPGraphDTA) based on power graph and word2vec is proposed. CONCLUSIONS In this model, the drug molecular features in the power graph module are extracted by a graph neural network, and the protein features are obtained by the Word2vec method. After feature fusion, they are fed into three fully connected layers to obtain the predicted drug-target affinity. We conducted experiments on the Davis and KIBA datasets, and the experimental results showed that WPGraphDTA exhibited good prediction performance.
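The abstract does not define "power graph". One common graph-theoretic reading is the k-th power of a graph, in which two atoms are joined whenever they lie within k bonds of each other. The sketch below follows that reading as an assumption; the function name and the adjacency-list input format are hypothetical and not taken from WPGraphDTA.

```python
from collections import deque


def graph_power(adjacency, k):
    """Return adjacency lists of the k-th power of an undirected graph.

    Two distinct vertices are neighbours in the result when their
    shortest-path distance in the original graph is at most k.
    """
    powered = {}
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == k:
                continue  # do not expand beyond k hops
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        powered[source] = sorted(v for v in dist if v != source)
    return powered
```

Under this reading, a graph neural network running on the powered graph aggregates information from atoms several bonds away in a single message-passing step.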
Affiliation(s)
- Jing Hu
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, 430065, Hubei, China.
- Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, China.
- Institute of Big Data Science and Engineering, Wuhan University of Science and Technology, Wuhan, Hubei, China.
| | - Shuo Hu
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, 430065, Hubei, China
| | - Minghao Xia
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, 430065, Hubei, China
| | - Kangxing Zheng
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, 430065, Hubei, China
| | - Xiaolong Zhang
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, 430065, Hubei, China.
- Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, China.
- Institute of Big Data Science and Engineering, Wuhan University of Science and Technology, Wuhan, Hubei, China.
|
20
|
Zhang Z, Luo G, Ma Y, Wu Z, Peng S, Chen S, Wu Y. GraphkmerDTA: integrating local sequence patterns and topological information for drug-target binding affinity prediction and applications in multi-target anti-Alzheimer's drug discovery. Mol Divers 2025:10.1007/s11030-024-11065-7. [PMID: 39792322 DOI: 10.1007/s11030-024-11065-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2024] [Accepted: 11/22/2024] [Indexed: 01/12/2025]
Abstract
Identifying drug-target binding affinity (DTA) plays a critical role in early-stage drug discovery. Despite the availability of various existing methods, there are still two limitations. Firstly, sequence-based methods often extract features from fixed length protein sequences, requiring truncation or padding, which can result in information loss or the introduction of unwanted noise. Secondly, structure-based methods prioritize extracting topological information but struggle to effectively capture sequence features. To address these challenges, we propose a novel deep learning model named GraphkmerDTA, which integrates Kmer features with structural topology. Specifically, GraphkmerDTA utilizes graph neural networks to extract topological features from both molecules and proteins, while fully connected networks learn local sequence patterns from the Kmer features of proteins. Experimental results indicate that GraphkmerDTA outperforms existing methods on benchmark datasets. Furthermore, a case study on lung cancer demonstrates the effectiveness of GraphkmerDTA, as it successfully identifies seven known EGFR inhibitors from a screening library of over two thousand compounds. To further assess the practical utility of GraphkmerDTA, we integrated it with network pharmacology to investigate the mechanisms underlying the therapeutic effects of Lonicera japonica flower in treating Alzheimer's disease. Through this interdisciplinary approach, three potential compounds were identified and subsequently validated through molecular docking studies. In conclusion, we present not only a novel AI model for the DTA task but also demonstrate its practical application in drug discovery by integrating modern AI approaches with traditional drug discovery methodologies.
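To see why Kmer features sidestep the truncation and padding problem described above, the sketch below maps a protein sequence of any length to a fixed-length vector of normalised k-mer frequencies. The 20-letter alphabet, the default k=2, the normalisation, and the function name are assumptions for illustration; the abstract does not give GraphkmerDTA's exact settings.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)


def kmer_frequencies(sequence, k=2):
    """Return a vector of length 20**k with normalised k-mer frequencies.

    K-mers containing letters outside AMINO_ACIDS are ignored.
    """
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[word] for word in vocabulary)
    return [counts[word] / total if total else 0.0 for word in vocabulary]
```

Because the vector length depends only on k, a fully connected network can consume it directly regardless of how long the protein is.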
Affiliation(s)
- Zuolong Zhang
- School of Software, Henan University, Kaifeng, 475000, Henan, China
| | - Gang Luo
- School of Mathematics and Computer Science, Nanchang University, Nanchang, 330031, Jiangxi, China
| | - Yixuan Ma
- Key Laboratory of Prevention and Treatment of Cardiovascular and Cerebrovascular Diseases Ministry of Education, Jiangxi Province Key Laboratory of Biomaterials and Biofabrication for Tissue Engineering, Gannan Medical University, Ganzhou, 341000, Jiangxi, China
| | - Zhaoqi Wu
- School of Basic Medicine Sciences, Gannan Medical University, Ganzhou, 341000, Jiangxi, China
| | - Shuo Peng
- Department of Computer Science, Jinggangshan University, Ji'an, 343009, Jiangxi, China
| | - Shengbo Chen
- Henan Engineering Research Center of Intelligent Technology and Application, Henan University, Kaifeng, 475000, Henan, China.
- School of Software, Nanchang University, Nanchang, 330031, Jiangxi, China.
| | - Yi Wu
- Key Laboratory of Prevention and Treatment of Cardiovascular and Cerebrovascular Diseases Ministry of Education, Jiangxi Province Key Laboratory of Biomaterials and Biofabrication for Tissue Engineering, Gannan Medical University, Ganzhou, 341000, Jiangxi, China.
|
21
|
Ye Q, Sun Y. Improving drug-target affinity prediction by adaptive self-supervised learning. PeerJ Comput Sci 2025; 11:e2622. [PMID: 39896027 PMCID: PMC11784864 DOI: 10.7717/peerj-cs.2622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Accepted: 12/02/2024] [Indexed: 02/04/2025]
Abstract
Computational drug-target affinity prediction is important for drug screening and discovery. Currently, self-supervised learning methods face two major challenges in drug-target affinity prediction. The first difficulty lies in the phenomenon of sample mismatch: self-supervised learning processes drug and target samples independently, while actual prediction requires the integration of drug-target pairs. Another challenge is the mismatch between the broadness of self-supervised learning objectives and the precision of biological mechanisms of drug-target affinity (i.e., the induced-fit principle). The former focuses on global feature extraction, while the latter emphasizes the importance of local precise matching. To address these issues, an adaptive self-supervised learning-based drug-target affinity prediction (ASSLDTA) was designed. ASSLDTA integrates a novel adaptive self-supervised learning (ASSL) module with a high-level feature learning network to extract the feature. The ASSL leverages a large amount of unlabeled training data to effectively capture low-level features of drugs and targets. Its goal is to maximize the retention of original feature information, thereby bridging the objective gap between self-supervised learning and drug-target affinity prediction and alleviating the sample mismatch problem. The high-level feature learning network, on the other hand, focuses on extracting effective high-level features for affinity prediction through a small amount of labeled data. Through this two-stage feature extraction design, each stage undertakes specific tasks, fully leveraging the advantages of each model while efficiently integrating information from different data sources, providing a more accurate and comprehensive solution for drug-target affinity prediction. 
In our experiments, ASSLDTA outperforms other deep learning methods, and its results improve significantly when it learns adaptive self-supervised learning-based features, which validates the effectiveness of ASSLDTA.
Affiliation(s)
- Qing Ye
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
| | - Yaxin Sun
- School of Computer Science and Technology (School of Artificial Intelligence), Zhejiang Normal University, Jinhua, China
- Department of Algorithm, Zhejiang Aerospace Hengjia Data Technology Co. Ltd., Jiaxing, China
|
22
|
Tanoli Z, Schulman A, Aittokallio T. Validation guidelines for drug-target prediction methods. Expert Opin Drug Discov 2025; 20:31-45. [PMID: 39568436 DOI: 10.1080/17460441.2024.2430955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Accepted: 11/14/2024] [Indexed: 11/22/2024]
Abstract
INTRODUCTION Mapping the interactions between pharmaceutical compounds and their molecular targets is a fundamental aspect of drug discovery and repurposing. Drug-target interactions are important for elucidating mechanisms of action and optimizing drug efficacy and safety profiles. Several computational methods have been developed to systematically predict drug-target interactions. However, computational and experimental validation of the drug-target predictions varies greatly across studies. AREAS COVERED Through a PubMed query, a corpus comprising 3,286 articles on drug-target interaction prediction published within the past decade was covered. Natural language processing was used for automated abstract classification to study the evolution of computational methods, validation strategies and performance assessment metrics in the 3,286 articles. Additionally, a manual analysis of 259 studies that performed experimental validation of computational predictions revealed prevalent experimental protocols. EXPERT OPINION Starting from 2014, there has been a noticeable increase in articles focusing on drug-target interaction prediction. Docking and regression stand out as the most commonly used techniques among computational methods, and cross-validation is frequently employed as the computational validation strategy. Testing the predictions using multiple, orthogonal validation strategies is recommended and should be reported for the specific target prediction applications. Experimental validation remains relatively rare and should be performed more routinely to evaluate the biological relevance of predictions.
Affiliation(s)
- Ziaurrehman Tanoli
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- iCAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
| | - Aron Schulman
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
| | - Tero Aittokallio
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- iCAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Institute for Cancer Research, Department of Cancer Genetics, Oslo University Hospital, Oslo, Norway
- Oslo Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Oslo, Norway
|
23
|
Sun J, Wang H, Mi J, Wan J, Gao J. MTAF-DTA: multi-type attention fusion network for drug-target affinity prediction. BMC Bioinformatics 2024; 25:375. [PMID: 39639198 PMCID: PMC11622562 DOI: 10.1186/s12859-024-05984-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2024] [Accepted: 11/11/2024] [Indexed: 12/07/2024] Open
Abstract
BACKGROUND The development of drug-target binding affinity (DTA) prediction tasks significantly drives the drug discovery process forward. Leveraging the rapid advancement of artificial intelligence, DTA prediction tasks have undergone a transformative shift from wet lab experimentation to machine learning-based prediction. This transition enables a more expedient exploration of potential interactions between drugs and targets, leading to substantial savings in time and funding resources. However, existing methods still face several challenges, such as drug information loss, lack of calculation of the contribution of each modality, and lack of simulation regarding the drug-target binding mechanisms. RESULTS We propose MTAF-DTA, a method for drug-target binding affinity prediction to solve the above problems. The drug representation module extracts three modalities of features from drugs and uses an attention mechanism to update their respective contribution weights. Additionally, we design a Spiral-Attention Block (SAB) as a drug-target feature fusion module based on multi-type attention mechanisms, facilitating a triple fusion process between them. The SAB, to some extent, simulates the interactions between drugs and targets, thereby enabling outstanding performance in the DTA task. Our regression task on the Davis and KIBA datasets demonstrates the predictive capability of MTAF-DTA, with CI and MSE metrics showing respective improvements of 1.1% and 9.2% over the state-of-the-art (SOTA) method in the novel target settings. Downstream tasks further validate MTAF-DTA's superiority in DTA prediction. CONCLUSIONS Experimental results and a case study demonstrate the superior performance of our approach in DTA prediction tasks, showing its potential in practical applications such as drug discovery and disease treatment.
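The CI and MSE metrics reported above can be illustrated with a minimal sketch. It assumes the common definitions used for DTA regression (CI as the fraction of comparable pairs whose predicted order matches the measured order, with tied predictions counted as half); the function names are hypothetical and the code is not taken from MTAF-DTA.

```python
from itertools import combinations


def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs ranked in the same order as the labels.

    Pairs with identical measured affinity are not comparable and are skipped;
    pairs with tied predictions contribute 0.5 (an assumed convention).
    """
    concordant, comparable = 0.0, 0
    for (t1, p1), (t2, p2) in combinations(zip(y_true, y_pred), 2):
        if t1 == t2:
            continue
        comparable += 1
        if p1 == p2:
            concordant += 0.5
        elif (p1 > p2) == (t1 > t2):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs: all measured affinities are equal")
    return concordant / comparable
```

A lower MSE and a higher CI both indicate better predictions, which is why the two improvements in the abstract point in opposite numerical directions.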
Affiliation(s)
- Jinghong Sun
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
| | - Han Wang
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
| | - Jia Mi
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
| | - Jing Wan
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China.
| | - Jingyang Gao
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China.
|
24
|
Hönig SMN, Gutermuth T, Ehrt C, Lemmen C, Rarey M. Combining crystallographic and binding affinity data towards a novel dataset of small molecule overlays. J Comput Aided Mol Des 2024; 39:2. [PMID: 39630291 PMCID: PMC11618164 DOI: 10.1007/s10822-024-00581-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2024] [Accepted: 11/13/2024] [Indexed: 12/08/2024]
Abstract
Although small molecule superposition is a standard technique in drug discovery, a rigorous performance assessment of the corresponding methods is currently challenging. Datasets in this field are sparse, small, tailored to specific applications, unavailable, or outdated. The newly developed LOBSTER set described herein offers a publicly available and method-independent dataset for benchmarking and method optimization. LOBSTER stands for "Ligand Overlays from Binding SiTe Ensemble Representatives". All ligands were derived from the PDB in a fully automated workflow, including a ligand efficiency filter. So-called ligand ensembles were assembled by aligning identical binding sites. Thus, the ligands within the ensembles are superimposed according to their experimentally determined binding orientation and conformation. Overall, 671 representative ligand ensembles comprise 3583 ligands from 3521 proteins. Altogether, 72,734 ligand pairs based on the ensembles were grouped into ten distinct subsets based on their volume overlap, for the benefit of introducing different degrees of difficulty for evaluating superposition methods. Statistics on the physicochemical properties of the compounds indicate that the dataset represents drug-like compounds. Consensus Diversity Plots show predominantly high Bemis-Murcko scaffold diversity and low median MACCS fingerprint similarity for each ensemble. An analysis of the underlying protein classes further demonstrates the heterogeneity within our dataset. The LOBSTER set offers a variety of applications like benchmarking multiple as well as pairwise alignments, generating training and test sets, for example based on time splits, or empirical software performance evaluation studies. The LOBSTER set is publicly available at https://doi.org/10.5281/zenodo.12658320 , representing a stable and versioned data resource. 
The Python scripts are open-source, available at https://github.com/rareylab/LOBSTER, and allow for updating or recreating superposition sets with different data sources.
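The grouping of ligand pairs into ten subsets by volume overlap can be sketched as a simple binning step. The abstract does not say how the subset boundaries are chosen, so this example assumes a normalized overlap in [0, 1] and equal-width bins; both function names are hypothetical and are not part of the LOBSTER scripts.

```python
from collections import defaultdict


def overlap_bin(overlap, n_bins=10):
    """Map a normalized volume overlap in [0, 1] to a bin index 0..n_bins-1."""
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must lie in [0, 1]")
    # An overlap of exactly 1.0 is placed in the last bin.
    return min(int(overlap * n_bins), n_bins - 1)


def group_pairs(pair_overlaps, n_bins=10):
    """Group ligand pairs (dict: pair -> overlap) into subsets by overlap bin."""
    subsets = defaultdict(list)
    for pair, overlap in pair_overlaps.items():
        subsets[overlap_bin(overlap, n_bins)].append(pair)
    return dict(subsets)
```

Under this assumption, low-index subsets contain pairs with little shared volume, which would be the harder cases for a superposition method.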
Affiliation(s)
- Sophia M N Hönig
- BioSolveIT, An der Ziegelei 79, 53757, Sankt Augustin, Germany
- University of Hamburg, ZBH - Center for Bioinformatics, Albert-Einstein-Ring 8-10, 22761, Hamburg, Germany
| | - Torben Gutermuth
- University of Hamburg, ZBH - Center for Bioinformatics, Albert-Einstein-Ring 8-10, 22761, Hamburg, Germany
| | - Christiane Ehrt
- University of Hamburg, ZBH - Center for Bioinformatics, Albert-Einstein-Ring 8-10, 22761, Hamburg, Germany
| | | | - Matthias Rarey
- University of Hamburg, ZBH - Center for Bioinformatics, Albert-Einstein-Ring 8-10, 22761, Hamburg, Germany.
|
25
|
Yang B, Liu Y, Wu J, Bai F, Zheng M, Zheng J. GENNDTI: Drug-Target Interaction Prediction Using Graph Neural Network Enhanced by Router Nodes. IEEE J Biomed Health Inform 2024; 28:7588-7598. [PMID: 40030413 DOI: 10.1109/jbhi.2024.3402529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Identifying drug-target interactions (DTI) is crucial in drug discovery and repurposing, and in silico techniques for DTI predictions are becoming increasingly important for reducing time and cost. Most interaction-based DTI models rely on the guilt-by-association principle that "similar drugs can interact with similar targets". However, such methods utilize precomputed similarity matrices and cannot dynamically discover intricate correlations. Meanwhile, some methods enrich DTI networks by incorporating additional networks like DDI and PPI networks, enriching biological signals to enhance DTI prediction. While these approaches have achieved promising performance in DTI prediction, such coarse-grained association data do not explain the specific biological mechanisms underlying DTIs. In this work, we propose GENNDTI, which constructs biologically meaningful routers to represent and integrate the salient properties of drugs and targets. Similar drugs or targets connect to more of the same router nodes, capturing property sharing. In addition, heterogeneous encoders are designed to distinguish different types of interactions, modeling both real and constructed interactions. This strategy enriches graph topology and enhances prediction efficiency as well. We evaluate the proposed method on benchmark datasets, demonstrating comparative performance over existing methods. We specifically analyze router nodes to validate their efficacy in improving predictions and providing biological explanations.
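The router-node idea can be pictured with a small sketch: each property becomes a node, every drug or target links to the routers for its properties, and similar entities end up sharing routers. This is only an illustration under assumed inputs; the property labels and function names are hypothetical, and GENNDTI's actual router construction and encoders are not reproduced here.

```python
def build_router_edges(properties):
    """Link each entity to one router node per property it has.

    `properties` maps an entity name to a set of property labels
    (hypothetical labels; the source does not list the real ones).
    """
    edges = []
    for entity, props in properties.items():
        for prop in sorted(props):
            edges.append((entity, f"router:{prop}"))
    return edges


def shared_routers(edges, a, b):
    """Return the router nodes that entities `a` and `b` both connect to."""
    routers_a = {router for entity, router in edges if entity == a}
    routers_b = {router for entity, router in edges if entity == b}
    return routers_a & routers_b
```

Counting shared routers gives a rough, learned-free proxy for the property sharing that the abstract describes; in the paper this structure is fed to a graph neural network rather than compared directly.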
|
26
|
Chen JH, Tu HJ, Lin TE, Peng ZX, Wu YW, Yen SC, Sung TY, Hsieh JH, Lee HY, Pan SL, HuangFu WC, Hsu KC. Discovery of dual-specificity tyrosine-phosphorylation-regulated kinase 1A (DYRK1A) inhibitors using an artificial intelligence model and their effects on tau and tubulin dynamics. Biomed Pharmacother 2024; 181:117688. [PMID: 39591664 DOI: 10.1016/j.biopha.2024.117688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Revised: 11/08/2024] [Accepted: 11/12/2024] [Indexed: 11/28/2024] Open
Abstract
The dual-specificity tyrosine-phosphorylation-regulated kinase 1A (DYRK1A) presents a promising therapeutic target for neurological diseases. However, current inhibitors lack selectivity, which can lead to unexpected side effects and increase the difficulty of studying DYRK1A. Therefore, identifying selective inhibitors targeting DYRK1A is essential for reducing side effects and facilitating neurological disease research. This study aimed to discover DYRK1A inhibitors through a screening pipeline incorporating a deep neural network (DNN) model. Herein, we report an optimized model with an accuracy of 0.93 on a testing set. The pipeline was then applied to identify potential DYRK1A inhibitors from the National Cancer Institute (NCI) library. Four novel DYRK1A inhibitors were identified, and compounds NSC657702 and NSC31059 were noteworthy for their potent inhibition, with IC50 values of 50.9 and 39.5 nM, respectively. NSC31059 exhibited exceptional selectivity across 70 kinases. The compounds also significantly reduced DYRK1A-induced tau phosphorylation at key sites associated with the pathology of neurodegenerative diseases. Moreover, they promoted tubulin polymerization, suggesting a role in microtubule stabilization. Cytotoxicity assessments further confirmed the neuronal safety of the compounds. Together, the results demonstrated a promising screening pipeline and novel DYRK1A inhibitors as candidates for further optimization and development.
Affiliation(s)
- Jun-Hong Chen
- Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
| | - Huang-Ju Tu
- Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
| | - Tony Eight Lin
- Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan; Ph.D. Program for Cancer Molecular Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
| | - Zhao-Xiang Peng
- Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
| | - Yi-Wen Wu
- Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
| | - Shih-Chung Yen
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Guangdong, China
| | - Tzu-Ying Sung
- Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
| | - Jui-Hua Hsieh
- Division of Translational Toxicology, National Institute of Environmental Health Sciences, National Institutes of Health, Durham, NC, USA
| | - Hsueh-Yun Lee
- School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei, Taiwan
| | - Shiow-Lin Pan
- Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan; Ph.D. Program for Cancer Molecular Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan; TMU Research Center of Cancer Translational Medicine, Taipei Medical University, Taipei, Taiwan
| | - Wei-Chun HuangFu
- Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan; Ph.D. Program for Cancer Molecular Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan; TMU Research Center of Cancer Translational Medicine, Taipei Medical University, Taipei, Taiwan.
| | - Kai-Cheng Hsu
- Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan; Ph.D. Program for Cancer Molecular Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan; TMU Research Center of Cancer Translational Medicine, Taipei Medical University, Taipei, Taiwan; Cancer Center, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan.
|
27
|
Shi W, Yang H, Xie L, Yin XX, Zhang Y. A review of machine learning-based methods for predicting drug-target interactions. Health Inf Sci Syst 2024; 12:30. [PMID: 38617016 PMCID: PMC11014838 DOI: 10.1007/s13755-024-00287-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 03/04/2024] [Indexed: 04/16/2024] Open
Abstract
The prediction of drug-target interactions (DTI) is a crucial preliminary stage in drug discovery and development, given the substantial risk of failure and the prolonged validation period associated with in vitro and in vivo experiments. In the contemporary landscape, various machine learning-based methods have emerged as indispensable tools for DTI prediction. This paper begins by placing emphasis on the data representation employed by these methods, delineating five representations for drugs and four for proteins. The methods are then categorized into traditional machine learning-based approaches and deep learning-based ones, with a discussion of representative approaches in each category and the introduction of a novel taxonomy for deep neural network models in DTI prediction. Additionally, we present a synthesis of commonly used datasets and evaluation metrics to facilitate practical implementation. In conclusion, we address current challenges and outline potential future directions in this research field.
Affiliation(s)
- Wen Shi
- Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004 China
| | - Hong Yang
- Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
| | - Linhai Xie
- State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Beijing, 102206 China
| | - Xiao-Xia Yin
- Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006 China
| | - Yanchun Zhang
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004 China
- Department of New Networks, Peng Cheng Laboratory, Shenzhen, 518000 China
|
28
|
Paendong GG, Ngnamsie Njimbouom S, Zonyfar C, Kim J. ERL-ProLiGraph: Enhanced representation learning on protein-ligand graph structured data for binding affinity prediction. Mol Inform 2024; 43:e202400044. [PMID: 39404190 PMCID: PMC11639045 DOI: 10.1002/minf.202400044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 06/03/2024] [Accepted: 06/21/2024] [Indexed: 12/14/2024]
Abstract
Predicting Protein-Ligand Binding Affinity (PLBA) is pivotal in drug development, as accurate estimations of PLBA expedite the identification of promising drug candidates for specific targets, thereby accelerating the drug discovery process. Despite substantial advancements in PLBA prediction, developing an efficient and more accurate method remains non-trivial. Unlike previous computer-aided PLBA studies, which primarily use ligand SMILES and protein sequences represented as strings, this research introduces a Deep Learning-based method, the Enhanced Representation Learning on Protein-Ligand Graph Structured data for Binding Affinity Prediction (ERL-ProLiGraph). The unique aspect of this method is the use of graph representations for both proteins and ligands, intending to learn structural information from both to enhance the accuracy of PLBA predictions. In these graphs, nodes represent atomic structures, while edges depict chemical bonds and spatial relationships. The proposed model, leveraging deep-learning algorithms, effectively learns to correlate these graphical representations with binding affinities. This graph-based representation approach enhances the model's ability to capture the complex molecular interactions critical in PLBA. This work represents a promising advancement in computational techniques for protein-ligand binding prediction, offering a potential path toward more efficient and accurate predictions in drug development. Comparative analysis indicates that the proposed ERL-ProLiGraph outperforms previous models, showcasing notable efficacy and providing a more suitable approach for accurate PLBA predictions.
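A graph in which nodes are atoms and edges encode either chemical bonds or spatial proximity can be sketched as follows. The 4.0 Å distance cutoff, the edge labels, and the function name are assumptions made for illustration; the abstract does not specify how ERL-ProLiGraph defines spatial edges.

```python
import math


def build_graph_edges(coords, bonds, cutoff=4.0):
    """Return a dict mapping undirected atom-index pairs to an edge type.

    coords: list of (x, y, z) atom positions (assumed to be in angstroms).
    bonds:  list of (i, j) index pairs for covalent bonds.
    cutoff: assumed distance threshold for adding a non-bonded spatial edge.
    """
    edges = {}
    for i, j in bonds:
        edges[(min(i, j), max(i, j))] = "bond"
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            # Bonded pairs keep their "bond" label; only unbonded close pairs are added.
            if (i, j) not in edges and math.dist(coords[i], coords[j]) <= cutoff:
                edges[(i, j)] = "spatial"
    return edges
```

Keeping the two edge types distinct lets a downstream graph model treat covalent and through-space contacts differently, which is one plausible reading of "chemical bonds and spatial relationships".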
Affiliation(s)
- Gloria Geine Paendong
- Department of Computer Science and Electronics Engineering, Sun Moon University, Chungcheongnam-do, Korea
| | | | - Candra Zonyfar
- Department of Computer Science and Electronics Engineering, Sun Moon University, Chungcheongnam-do, Korea
| | - Jeong‐Dong Kim
- Department of Computer Science and Electronics Engineering, Sun Moon University, Chungcheongnam-do, Korea
- Department of Computer Science and Engineering, Sun Moon University, Chungcheongnam-do, Korea
- Genome-Based Bio IT Convergence Institute, Sun Moon University, Chungcheongnam-do, Korea
|
29
|
Kumar R, Romano JD, Ritchie MD. CASTER-DTA: Equivariant Graph Neural Networks for Predicting Drug-Target Affinity. bioRxiv 2024:2024.11.25.625281. [PMID: 39651302 PMCID: PMC11623579 DOI: 10.1101/2024.11.25.625281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/11/2024]
Abstract
Accurately determining the binding affinity of a ligand with a protein is important for drug design, development, and screening. With the advent of accessible protein structure prediction methods such as AlphaFold, several approaches have been developed that make use of information determined from the 3D structure for a variety of downstream tasks. However, methods for predicting binding affinity that do consider protein structure generally do not take full advantage of such 3D structural protein information, often using such information only to define nearest-neighbor graphs based on inter-residue or inter-atomic distances. Here, we present a joint architecture that we call CASTER-DTA (Cross-Attention with Structural Target Equivariant Representations for Drug-Target Affinity) that makes use of an SE(3)-equivariant graph neural network to learn more robust protein representations alongside a standard graph neural network to learn molecular representations, and we further augment these representations by incorporating an attention-based mechanism by which individual residues in a protein can attend to atoms in a ligand and vice-versa to improve interpretability. In this manner, we show that using equivariant graph neural networks in our architecture enables CASTER-DTA to approach and exceed state-of-the-art performance in predicting drug-target affinity without the inclusion of external information, such as protein language model embeddings. We do so on the Davis and KIBA datasets, common benchmarks for predicting drug-target affinity. We also discuss future steps to further improve performance.
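The attention step in which residues attend to ligand atoms can be sketched with plain scaled dot-product cross-attention. This is a generic illustration, not CASTER-DTA's implementation: it omits the learned query, key and value projections, uses the atom features as both keys and values, and all names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention of each query over all keys.

    queries: residue feature vectors; keys/values: atom feature vectors.
    Returns (outputs, weights); weights[i][j] is how strongly residue i
    attends to atom j.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights
```

Swapping the roles of the two inputs gives the reverse direction (atoms attending to residues), and inspecting the weight matrix is what makes this kind of mechanism useful for interpretability.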
Affiliation(s)
- Rachit Kumar
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
| | - Joseph D Romano
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
| | - Marylyn D Ritchie
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
|
30
|
Luo Z, Wu W, Sun Q, Wang J. Accurate and transferable drug-target interaction prediction with DrugLAMP. Bioinformatics 2024; 40:btae693. [PMID: 39570605 PMCID: PMC11629708 DOI: 10.1093/bioinformatics/btae693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 10/29/2024] [Accepted: 11/14/2024] [Indexed: 11/22/2024] Open
Abstract
MOTIVATION Accurate prediction of drug-target interactions (DTIs), especially for novel targets or drugs, is crucial for accelerating drug discovery. Recent advances in pretrained language models (PLMs) and multi-modal learning present new opportunities to enhance DTI prediction by leveraging vast unlabeled molecular data and integrating complementary information from multiple modalities. RESULTS We introduce DrugLAMP (PLM-assisted multi-modal prediction), a PLM-based multi-modal framework for accurate and transferable DTI prediction. DrugLAMP integrates molecular graph and protein sequence features extracted by PLMs and traditional feature extractors. We introduce two novel multi-modal fusion modules: (i) pocket-guided co-attention (PGCA), which uses protein pocket information to guide the attention mechanism on drug features, and (ii) paired multi-modal attention (PMMA), which enables effective cross-modal interactions between drug and protein features. These modules work together to enhance the model's ability to capture complex drug-protein interactions. Moreover, the contrastive compound-protein pre-training (2C2P) module enhances the model's generalization to real-world scenarios by aligning features across modalities and conditions. Comprehensive experiments demonstrate DrugLAMP's state-of-the-art performance on both standard benchmarks and challenging settings simulating real-world drug discovery, where test drugs/targets are unseen during training. Visualizations of attention maps and application to predict cryptic pockets and drug side effects further showcase DrugLAMP's strong interpretability and generalizability. Ablation studies confirm the contributions of the proposed modules. AVAILABILITY AND IMPLEMENTATION Source code and datasets are freely available at https://github.com/Lzcstan/DrugLAMP. All data originate from public sources.
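The evaluation setting in which test drugs and targets are unseen during training can be sketched as a cold-start split. The function below is a hypothetical illustration, not DrugLAMP's data pipeline: it holds out a random subset of drugs and of targets, keeps for testing only pairs where both are held out, and discards pairs where exactly one side is held out.

```python
import random


def cold_start_split(pairs, test_fraction=0.2, seed=0):
    """Split (drug, target, label) triples so test drugs and targets are unseen.

    Pairs that mix a held-out entity with a training entity are dropped,
    which is one of several possible conventions (an assumption here).
    """
    rng = random.Random(seed)
    drugs = sorted({d for d, _, _ in pairs})
    targets = sorted({t for _, t, _ in pairs})
    test_drugs = set(rng.sample(drugs, max(1, round(len(drugs) * test_fraction))))
    test_targets = set(rng.sample(targets, max(1, round(len(targets) * test_fraction))))
    train = [p for p in pairs if p[0] not in test_drugs and p[1] not in test_targets]
    test = [p for p in pairs if p[0] in test_drugs and p[1] in test_targets]
    return train, test
```

Because no drug or target appears on both sides, a model cannot score well on the test set by memorizing entities, which is what makes this setting closer to real-world screening than a random pair split.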
Affiliation(s)
- Zhengchao Luo
- Department of Big Data and Biomedical AI, College of Future Technology, Peking University, Beijing 100871, China
| | - Wei Wu
- Department of Big Data and Biomedical AI, College of Future Technology, Peking University, Beijing 100871, China
| | - Qichen Sun
- School of Mathematical Sciences, Peking University, Beijing 100871, China
| | - Jinzhuo Wang
- Department of Big Data and Biomedical AI, College of Future Technology, Peking University, Beijing 100871, China
|
31
|
Ru X, Zhao S, Zou Q, Xu L. Identify potential drug candidates within a high-quality compound search space. Brief Bioinform 2024; 26:bbaf024. [PMID: 39853109 PMCID: PMC11758506 DOI: 10.1093/bib/bbaf024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Revised: 12/10/2024] [Accepted: 01/14/2025] [Indexed: 01/26/2025] Open
Abstract
The identification of potentially effective drug candidates is a fundamental step in new drug discovery, with profound implications for pharmaceutical research and the healthcare sector. While many computational methods have been developed for such predictions and have yielded promising results, two challenges persist: (i) the cold start problem of new drugs, which increases the difficulty of prediction due to the lack of historical data or prior knowledge; and (ii) the vastness of the compound search space for potential drug candidates. In this study, we present a promising method that not only enhances the accuracy of identifying potential novel drug candidates but also refines the search space. Drawing inspiration from solutions to the cold start problem in recommender systems, we apply 'learning to rank' techniques to the field of new drug discovery. Furthermore, we propose using three similarity metrics to condense the compound search space into compact yet high-quality spaces, allowing for more efficient screening of potential drug candidates. Experimental results from two widely used datasets demonstrate that our method outperforms other state-of-the-art approaches in the new drug cold-start scenario. Additionally, we have verified that it is feasible to identify potential drug candidates within these high-quality compound search spaces. To our knowledge, this study is the first to address the drug cold-start problem in such a confined space, potentially providing valuable insights and guidance for drug screening.
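Condensing a compound search space with a similarity metric can be sketched as a threshold filter. The abstract does not name its three metrics, so this example substitutes Tanimoto similarity over fingerprint bit sets as an assumed stand-in; the threshold value and function names are also hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def condense_search_space(candidates, references, threshold=0.5):
    """Keep candidates whose best similarity to any reference meets the threshold.

    candidates: dict mapping compound name -> set of fingerprint bits.
    references: list of fingerprint bit sets for known compounds of interest.
    Returns a dict mapping each kept compound to its best similarity score.
    """
    kept = {}
    for name, bits in candidates.items():
        best = max((tanimoto(bits, ref) for ref in references), default=0.0)
        if best >= threshold:
            kept[name] = best
    return kept
```

Raising the threshold shrinks the retained space at the cost of possibly discarding structurally novel candidates, which is the trade-off a ranking model would then operate within.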
Affiliation(s)
- Xiaoqing Ru
- The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital, No. 100, Minjiang Avenue, Smart New Town, Quzhou, Zhejiang Province, 324000, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No. 1, Chengdian Road, Kecheng District, Quzhou, Zhejiang Province, 324003, China
| | - Shulin Zhao
- The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital, No. 100, Minjiang Avenue, Smart New Town, Quzhou, Zhejiang Province, 324000, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No. 1, Chengdian Road, Kecheng District, Quzhou, Zhejiang Province, 324003, China
| | - Lifeng Xu
- The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital, No. 100, Minjiang Avenue, Smart New Town, Quzhou, Zhejiang Province, 324000, China
|
32
|
Xu C, Zheng L, Fan Q, Liu Y, Zeng C, Ning X, Liu H, Du K, Lu T, Chen Y, Zhang Y. Progress in the application of artificial intelligence in molecular generation models based on protein structure. Eur J Med Chem 2024; 277:116735. [PMID: 39098131 DOI: 10.1016/j.ejmech.2024.116735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2024] [Revised: 07/12/2024] [Accepted: 07/30/2024] [Indexed: 08/06/2024]
Abstract
The molecular generation models based on protein structures represent a cutting-edge research direction in artificial intelligence-assisted drug discovery. This article aims to comprehensively summarize the research methods and developments by analyzing a series of novel molecular generation models predicated on protein structures. Initially, we categorize the molecular generation models based on protein structures and highlight the architectural frameworks utilized in these models. Subsequently, we detail the design and implementation of protein structure-based molecular generation models by introducing different specific examples. Lastly, we outline the current opportunities and challenges encountered in this field, intending to offer guidance and a referential framework for developing and studying new models in related fields in the future.
Affiliation(s)
- Chengcheng Xu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Lidan Zheng
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Qing Fan
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Yingxu Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Chen Zeng
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Xiangzhen Ning
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Haichun Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Ke Du
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Tao Lu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China; State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China.
| | - Yadong Chen
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China.
| | - Yanmin Zhang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China.
|
33
|
E U, T M, A V G, D P. A comprehensive survey of drug-target interaction analysis in allopathy and siddha medicine. Artif Intell Med 2024; 157:102986. [PMID: 39326289 DOI: 10.1016/j.artmed.2024.102986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 08/13/2024] [Accepted: 09/18/2024] [Indexed: 09/28/2024]
Abstract
Effective drug delivery is the cornerstone of modern healthcare, ensuring therapeutic compounds reach their intended targets efficiently. This paper explores the potential of personalized and holistic healthcare, driven by the synergy between traditional and allopathic medicine systems, with a specific focus on the vast reservoir of medicinal compounds found in plants rooted in the historical legacy of traditional medicine. Motivated by the desire to unlock the therapeutic potential of medicinal plants and bridge the gap between traditional and allopathic medicine, this survey delves into in-silico computational approaches for studying Drug-Target Interactions (DTI) within the contexts of allopathy and siddha medicine. The contributions of this survey are multifaceted: it offers a comprehensive overview of in-silico methods for DTI analysis in both systems, identifies common challenges in DTI studies, provides insights into future directions to advance DTI analysis, and includes a comparative analysis of DTI in allopathy and siddha medicine. The findings of this survey highlight the pivotal role of in-silico computational approaches in advancing drug research and development in both allopathy and siddha medicine, emphasizing the importance of integrating these methods to drive the future of personalized healthcare.
Affiliation(s)
- Uma E
  - Department of Information Science and Technology, College of Engineering Guindy, Chennai, India.
- Mala T
  - Department of Information Science and Technology, College of Engineering Guindy, Chennai, India
- Geetha A V
  - Department of Information Science and Technology, College of Engineering Guindy, Chennai, India
- Priyanka D
  - Department of Information Science and Technology, College of Engineering Guindy, Chennai, India
34
Zhang Q, Wei Y, Liao B, Liu L, Zhang S. MMD-DTA: A Multi-Modal Deep Learning Framework for Drug-Target Binding Affinity and Binding Region Prediction. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:2200-2211. [PMID: 39208057 DOI: 10.1109/tcbb.2024.3451985] [Indexed: 09/04/2024]
Abstract
The prediction of drug-target affinity (DTA) plays a crucial role in drug development and the identification of potential drug targets. In recent years, computer-assisted DTA prediction has emerged as a significant approach in this field. In this study, we propose a multi-modal deep learning framework called MMD-DTA for predicting drug-target binding affinity and binding regions. The model can predict DTA while simultaneously learning the binding regions of drug-target interactions through unsupervised learning. To achieve this, MMD-DTA first uses graph neural networks and target structural feature extraction network to extract multi-modal information from the sequences and structures of drugs and targets. It then utilizes the feature interaction and fusion modules to generate interaction descriptors for predicting DTA and interaction strength for binding region prediction. Our experimental results demonstrate that MMD-DTA outperforms existing models based on key evaluation metrics. Furthermore, external validation results indicate that MMD-DTA enhances the generalization capability of the model by integrating sequence and structural information of drugs and targets. The model trained on the benchmark dataset can effectively generalize to independent virtual screening tasks. The visualization of drug-target binding region prediction showcases the interpretability of MMD-DTA, providing valuable insights into the functional regions of drug molecules that interact with proteins.
35
Liu Y, Xia X, Gong Y, Song B, Zeng X. SSR-DTA: Substructure-aware multi-layer graph neural networks for drug-target binding affinity prediction. Artif Intell Med 2024; 157:102983. [PMID: 39321746 DOI: 10.1016/j.artmed.2024.102983] [Received: 09/12/2023] [Revised: 09/10/2024] [Accepted: 09/13/2024] [Indexed: 09/27/2024]
Abstract
Accurate prediction of drug-target binding affinity (DTA) is essential in the field of drug discovery. Recently, scientists have been attempting to utilize artificial intelligence prediction to screen out a significant number of ineffective compounds, thereby mitigating labor and financial losses. While graph neural networks (GNNs) have been applied to DTA, existing GNNs have limitations in effectively extracting substructural features across various sizes. Functional groups play a crucial role in modulating molecular properties, but existing GNNs struggle with feature extraction from certain motifs due to scale mismatches. Additionally, sequence-based models for target proteins lack the integration of structural information. To address these limitations, we present SSR-DTA, a multi-layer graph network capable of adapting to diverse structural sizes, which can extract richer biological features, thereby improving the robustness and accuracy of predictions. Multi-layer GNNs enable the capture of molecular motifs across different scales, ranging from atomic to macrocyclic motifs. Furthermore, we introduce BiGNN to simultaneously learn sequence and structural information. Sequence information corresponds to the primary structure of proteins, while graph information represents the tertiary structure. BiGNN assimilates richer information compared to sequence-based methods while mitigating the impact of errors from predicted structures, resulting in more accurate predictions. Through rigorous experimental evaluations conducted on four benchmark datasets, we demonstrate the superiority of SSR-DTA over state-of-the-art models. Particularly, in comparison to state-of-the-art models, SSR-DTA demonstrates an impressive 20% reduction in mean squared error on the Davis dataset and a 5% reduction on the KIBA dataset, underscoring its potential as a valuable tool for advancing DTA prediction.
Affiliation(s)
- Yuansheng Liu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410086, Hunan, China; Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Anhui University, Hefei, 230601, Anhui, China
- Xinyan Xia
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410086, Hunan, China
- Yongshun Gong
  - School of Software, Shandong University, Jinan, 250100, Shandong, China
- Bosheng Song
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410086, Hunan, China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410086, Hunan, China.
36
Tang X, Ma W, Yang M, Li W. MFF-DTA: Multi-scale feature fusion for drug-target affinity prediction. Methods 2024; 231:1-7. [PMID: 39218169 DOI: 10.1016/j.ymeth.2024.08.008] [Received: 05/26/2024] [Revised: 07/19/2024] [Accepted: 08/27/2024] [Indexed: 09/04/2024] [Open Access]
Abstract
Accurately predicting drug-target affinity is crucial in expediting the discovery and development of new drugs, which is a complex and risky process. Identifying these interactions not only aids in screening potential compounds but also guides further optimization. To address this, we propose a multi-perspective feature fusion model, MFF-DTA, which integrates chemical structure, biological sequence, and other data to comprehensively capture drug-target affinity features. The MFF-DTA model incorporates multiple feature learning components, each of which is capable of extracting drug molecular features and protein target information, respectively. These components are able to obtain key information from both global and local perspectives. Then, these features from different perspectives are efficiently combined using specific splicing strategies to create a comprehensive representation. Finally, the model uses the fused features to predict drug-target affinity. Comparative experiments show that MFF-DTA performs optimally on the Davis and KIBA data sets. Ablation experiments demonstrate that removing specific components results in the loss of unique information, thus confirming the effectiveness of the MFF-DTA design. Improvements in DTA prediction methods will decrease costs and time in drug development, enhancing industry efficiency and ultimately benefiting patients.
Affiliation(s)
- Xiwei Tang
  - School of Computer Science, Hunan First Normal University, Changsha, Hunan, China.
- Wanjun Ma
  - Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha, Hunan, China
- Mengyun Yang
  - School of Computer Science, Hunan First Normal University, Changsha, Hunan, China
- Wenjun Li
  - Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha, Hunan, China
37
Quan L, Wu J, Jiang Y, Pan D, Qiang L. DTA-GTOmega: Enhancing Drug-Target Binding Affinity Prediction with Graph Transformers Using OmegaFold Protein Structures. J Mol Biol 2024:168843. [PMID: 39481634 DOI: 10.1016/j.jmb.2024.168843] [Received: 07/14/2024] [Revised: 10/05/2024] [Accepted: 10/24/2024] [Indexed: 11/02/2024]
Abstract
Understanding drug-protein interactions is crucial for elucidating drug mechanisms and optimizing drug development. However, existing methods have limitations in representing the three-dimensional structure of targets and capturing the complex relationships between drugs and targets. This study proposes a new method, DTA-GTOmega, for predicting drug-target binding affinity. DTA-GTOmega utilizes OmegaFold to predict protein three-dimensional structure and construct target graphs, while processing drug SMILES sequences with RDKit to generate drug graphs. By employing multi-layer graph transformer modules and co-attention modules, this method effectively integrates atomic-level features of drugs and residue-level features of targets, accurately modeling the complex interactions between drugs and targets, thereby significantly improving the accuracy of binding affinity predictions. Our method outperforms existing techniques on benchmark datasets such as KIBA, Davis, and BindingDB_Kd under cold-start setting. Moreover, DTA-GTOmega demonstrates competitive performance in real-world DTI scenarios involving DrugBank data and drug-target interactions related to cardiovascular and nervous system-related diseases, highlighting its robust generalization capabilities. Additionally, the introduced DTI evaluation metrics further validate DTA-GTOmega's potential in handling imbalanced data.
Affiliation(s)
- Lijun Quan
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu 210000, China
- Jian Wu
  - China Mobile (Suzhou) Software Technology Co., Ltd., Suzhou 215000, China
- Yelu Jiang
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
- Deng Pan
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
- Lyu Qiang
  - School of Computer Science and Technology, Soochow University, Jiangsu 215006, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu 210000, China.
38
Sun X, Huang J, Fang Y, Jin Y, Wu J, Wang G, Jia J. MREDTA: A BERT and transformer-based molecular representation encoder for predicting drug-target binding affinity. FASEB J 2024; 38:e70083. [PMID: 39373982 DOI: 10.1096/fj.202401254r] [Received: 06/05/2024] [Revised: 09/05/2024] [Accepted: 09/18/2024] [Indexed: 10/08/2024]
Abstract
Drug-target binding affinity (DTA) prediction is vital for drug repositioning. The accuracy and generalizability of DTA models remain a major challenge. Here, we develop a model composed of BERT-Trans Block, Multi-Trans Block, and DTI Learning modules, referred to as Molecular Representation Encoder-based DTA prediction (MREDTA). MREDTA has three advantages: (1) extraction of both local and global molecular features simultaneously through skip connections; (2) improved sensitivity to molecular structures through the Multi-Trans Block; (3) enhanced generalizability through the introduction of BERT. In benchmark testing on the KIBA and Davis datasets, MREDTA achieved the best performance among the 12 advanced models it was compared with. In a case study, we applied MREDTA to 2034 FDA-approved drugs for treating non-small-cell lung cancer (NSCLC), all of which act on the mutant EGFR T790M protein. The corresponding molecular docking results demonstrated the robustness of MREDTA.
Affiliation(s)
- Xu Sun
  - Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
- Juanjuan Huang
  - Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
  - State Key Laboratory for Diagnosis and Treatment of Severe Zoonotic Infectious Diseases, Key Laboratory for Zoonosis Research of the Ministry of Education, College of Basic Medicine, Jilin University, Changchun, China
- Yabo Fang
  - Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
- Yixuan Jin
  - Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
- Jiageng Wu
  - Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
- Guoqing Wang
  - State Key Laboratory for Diagnosis and Treatment of Severe Zoonotic Infectious Diseases, Key Laboratory for Zoonosis Research of the Ministry of Education, College of Basic Medicine, Jilin University, Changchun, China
- Jiwei Jia
  - Department of Computational Mathematics, School of Mathematics, Jilin University, Changchun, China
  - Jilin National Applied Mathematical Center, Jilin University, Changchun, China
39
Tang X, Zhou Y, Yang M, Li W. TC-DTA: Predicting Drug-Target Binding Affinity With Transformer and Convolutional Neural Networks. IEEE Trans Nanobioscience 2024; 23:572-578. [PMID: 39133595 DOI: 10.1109/tnb.2024.3441590] [Indexed: 10/16/2024]
Abstract
Bioinformatics is a rapidly evolving field that applies computational methods to analyze and interpret biological data. A key task in bioinformatics is identifying novel drug-target interactions (DTIs), which plays a crucial role in drug discovery. Most computational approaches treat DTI prediction as a binary classification problem, determining whether drug-target pairs interact. However, with the growing availability of drug-target binding affinity data, this binary task can be reframed as a regression problem focused on drug-target affinity (DTA). DTA quantifies the strength of drug-target binding, offering more detailed insights than DTI and serving as a valuable tool for virtual screening in drug discovery. Accurately predicting compound interactions with targets can accelerate the drug development process. In this study, we introduce a deep learning model named TC-DTA for DTA prediction, leveraging convolutional neural networks (CNN) and the encoder module of the transformer architecture. We begin by extracting raw drug SMILES strings and protein amino acid sequences from the dataset, which are then represented using various encoding methods. Subsequently, we employ CNN and the transformer's encoder module to extract features from the drug SMILES strings and protein sequences, respectively. Finally, the feature information is concatenated and input into a multi-layer perceptron to predict binding affinity scores. We evaluated our model on two benchmark DTA datasets, Davis and KIBA, comparing it with methods such as KronRLS, SimBoost, and DeepDTA. Our model, TC-DTA, outperformed these baseline methods based on evaluation metrics like Mean Squared Error (MSE), Concordance Index (CI), and Regression towards the Mean Index (r_m^2). These results highlight the effectiveness of the Transformer's encoder and CNN in extracting meaningful representations from sequences, thereby enhancing DTA prediction accuracy.
This deep learning model can accelerate drug discovery by identifying drug candidates with high binding affinity to specific targets. Compared to traditional methods, machine learning technology offers a more effective and efficient approach to drug discovery.
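Two of the evaluation metrics named in this abstract, MSE and the concordance index (CI), can be illustrated with a minimal Python sketch. It uses the commonly stated pairwise definition of CI and is not taken from the TC-DTA code; the function names and the toy affinity values are assumptions made up for illustration.

```python
from itertools import combinations


def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs that the predictions rank in the right order.

    Pairs with equal measured affinity are skipped; tied predictions count 0.5.
    """
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # no defined order for this pair
        comparable += 1
        if p_i == p_j:
            concordant += 0.5
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy values, invented for illustration only (not taken from Davis or KIBA).
measured = [5.0, 6.2, 7.1, 8.4]
predicted = [5.4, 6.0, 7.5, 7.3]

print(mean_squared_error(measured, predicted))
print(concordance_index(measured, predicted))
```

On these toy values, five of the six comparable pairs are ranked correctly, so the CI is 5/6. A lower MSE and a higher CI both indicate better predictions.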
40
Huang J, Sun C, Li M, Tang R, Xie B, Wang S, Wei JM. Structure-inclusive similarity based directed GNN: a method that can control information flow to predict drug-target binding affinity. Bioinformatics 2024; 40:btae563. [PMID: 39292540 PMCID: PMC11474107 DOI: 10.1093/bioinformatics/btae563] [Received: 03/19/2024] [Revised: 05/21/2024] [Accepted: 09/17/2024] [Indexed: 09/20/2024] [Open Access]
Abstract
Motivation: Exploring the association between drugs and targets is essential for drug discovery and repurposing. Compared with traditional methods that treat this exploration as a binary classification task, predicting the drug-target binding affinity provides more specific information. Many studies rely on the assumption that similar drugs may interact with the same target. These methods construct a symmetric graph from an undirected drug similarity or target similarity. Although these similarities can measure the difference between two molecules, they cannot capture the inclusion relationship between their substructures. For example, if drug A contains all the substructures of drug B, then in the message-passing mechanism of the graph neural network, drug A should acquire all the properties of drug B, while drug B should only obtain some of the properties of A.
Results: To this end, we proposed a structure-inclusive similarity (SIS), which measures the similarity of two drugs by considering the inclusion relationship of their substructures. Based on SIS, we constructed a drug graph and a target graph, respectively, and predicted the binding affinities between drugs and targets with a graph convolutional network-based model. Experimental results show that considering the inclusion relationship between the substructures of two molecules can effectively improve the accuracy of the prediction model. Our SIS-based prediction method outperforms several state-of-the-art methods for drug-target binding affinity prediction. The case studies demonstrate that our model is a practical tool for predicting the binding affinity between drugs and targets.
Availability and implementation: Source codes and data are available at https://github.com/HuangStomach/SISDTA.
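The drug A / drug B example in this abstract can be illustrated with a small Python sketch of an asymmetric, inclusion-based score. This is only one plausible reading, assuming each drug is represented as a set of substructure labels; the function names, the labels, and the set-ratio formula are hypothetical and are not the SIS definition used in the paper or its repository.

```python
def inclusion_score(container, contained):
    """Share of `contained`'s substructures that also occur in `container`."""
    if not contained:
        return 0.0
    return len(container & contained) / len(contained)


def directed_edge_weights(substructures):
    """Weight of the message sent from each source drug to each target drug.

    The weight is the fraction of the source's substructures found in the
    target, so a target that fully contains the source receives weight 1.0.
    """
    weights = {}
    for source, source_set in substructures.items():
        for target, target_set in substructures.items():
            if source != target:
                weights[(source, target)] = inclusion_score(target_set, source_set)
    return weights


# Hypothetical substructure sets: drug A contains every substructure of drug B.
drugs = {
    "A": {"benzene", "amide", "pyridine", "hydroxyl"},
    "B": {"benzene", "amide"},
}

weights = directed_edge_weights(drugs)
print(weights[("B", "A")])  # message from B to A
print(weights[("A", "B")])  # message from A to B
```

Here the edge from B to A gets weight 1.0 and the edge from A to B gets 0.5, mirroring the abstract's point that A should receive all of B's properties while B receives only part of A's. A symmetric measure such as Tanimoto similarity would assign both directions the same value.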
Affiliation(s)
- Jipeng Huang
  - Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
  - College of Computer Science, Nankai University, Tianjin 300071, China
  - Tianjin Key Laboratory of Network and Data Security, Tianjin 300350, China
- Chang Sun
  - Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
  - College of Computer Science, Nankai University, Tianjin 300071, China
  - Tianjin Key Laboratory of Network and Data Security, Tianjin 300350, China
- Minglei Li
  - Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
  - College of Computer Science, Nankai University, Tianjin 300071, China
  - Tianjin Key Laboratory of Network and Data Security, Tianjin 300350, China
- Rong Tang
  - Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
  - College of Computer Science, Nankai University, Tianjin 300071, China
  - Tianjin Key Laboratory of Network and Data Security, Tianjin 300350, China
- Bin Xie
  - College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
- Shuqin Wang
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin, Xi Qing District 300387, China
- Jin-Mao Wei
  - Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
  - College of Computer Science, Nankai University, Tianjin 300071, China
41
Durant G, Boyles F, Birchall K, Deane CM. The future of machine learning for small-molecule drug discovery will be driven by data. Nat Comput Sci 2024; 4:735-743. [PMID: 39407003 DOI: 10.1038/s43588-024-00699-0] [Received: 03/01/2024] [Accepted: 09/03/2024] [Indexed: 10/25/2024]
Abstract
Many studies have prophesied that the integration of machine learning techniques into small-molecule therapeutics development will help to deliver a true leap forward in drug discovery. However, increasingly advanced algorithms and novel architectures have not always yielded substantial improvements in results. In this Perspective, we propose that a greater focus on the data for training and benchmarking these models is more likely to drive future improvement, and explore avenues for future research and strategies to address these data challenges.
Affiliation(s)
- Guy Durant
  - Department of Statistics, University of Oxford, Oxford, UK
- Fergus Boyles
  - Department of Statistics, University of Oxford, Oxford, UK
42
Zhao L, Wang H, Shi S. PocketDTA: an advanced multimodal architecture for enhanced prediction of drug-target affinity from 3D structural data of target binding pockets. Bioinformatics 2024; 40:btae594. [PMID: 39365726 PMCID: PMC11502498 DOI: 10.1093/bioinformatics/btae594] [Received: 06/27/2024] [Revised: 09/20/2024] [Accepted: 10/02/2024] [Indexed: 10/06/2024] [Open Access]
Abstract
Motivation: Accurately predicting the drug-target binding affinity (DTA) is crucial to drug discovery and repurposing. Although deep learning has been widely used in this field, it still faces challenges with insufficient generalization performance, inadequate use of 3D information, and poor interpretability.
Results: To alleviate these problems, we developed the PocketDTA model. This model enhances generalization performance by using the pre-trained models ESM-2 and GraphMVP. It handles the top-3 target binding pockets and drug 3D information through customized GVP-GNN Layers and GraphMVP-Decoder. In addition, it uses a bilinear attention network to enhance interpretability. Comparative analysis with state-of-the-art (SOTA) methods on the optimized Davis and KIBA datasets reveals that the PocketDTA model exhibits significant performance advantages. Further, ablation studies confirm the effectiveness of the model components, whereas cold-start experiments illustrate its robust generalization capabilities. In particular, the PocketDTA model has shown significant advantages in identifying key drug functional groups and amino acid residues via molecular docking and literature validation, highlighting its strong potential for interpretability.
Availability and implementation: Code and data are available at: https://github.com/zhaolongNCU/PocketDTA.
Affiliation(s)
- Long Zhao
  - Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China
- Hongmei Wang
  - Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China
- Shaoping Shi
  - Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China
  - Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang 330031, China
43
Ye Q, Sun Y. Graph neural pre-training based drug-target affinity prediction. Front Genet 2024; 15:1452339. [PMID: 39350770 PMCID: PMC11439641 DOI: 10.3389/fgene.2024.1452339] [Received: 06/20/2024] [Accepted: 08/27/2024] [Indexed: 10/04/2024] [Open Access]
Abstract
Computational drug-target affinity prediction has the potential to accelerate drug discovery. Currently, pre-training models have achieved significant success in various fields due to their ability to train the model using vast amounts of unlabeled data. However, given the scarcity of drug-target interaction data, pre-training models can only be trained separately on drug and target data, resulting in features that are insufficient for drug-target affinity prediction. To address this issue, in this paper, we design a graph neural pre-training-based drug-target affinity prediction method (GNPDTA). This approach comprises three stages. In the first stage, two pre-training models are utilized to extract low-level features from drug atom graphs and target residue graphs, leveraging a large number of unlabeled training samples. In the second stage, two 2D convolutional neural networks are employed to combine the extracted drug atom features and target residue features into high-level representations of drugs and targets. Finally, in the third stage, a predictor is used to predict the drug-target affinity. This approach fully utilizes both unlabeled and labeled training samples, enhancing the effectiveness of pre-training models for drug-target affinity prediction. In our experiments, GNPDTA outperforms other deep learning methods, validating the efficacy of our approach.
Affiliation(s)
- Qing Ye
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
- Yaxin Sun
  - School of Computer Science and Technology (School of Artificial Intelligence), Zhejiang Normal University, Jinhua, China
  - Zhejiang Aerospace Hengjia Data Technology Co. Ltd., Jiaxing, China
44
Chen G, He H, Lv Q, Zhao L, Chen CYC. MMFA-DTA: Multimodal Feature Attention Fusion Network for Drug-Target Affinity Prediction for Drug Repurposing Against SARS-CoV-2. J Chem Theory Comput 2024. [PMID: 39269697 DOI: 10.1021/acs.jctc.4c00663] [Indexed: 09/15/2024]
Abstract
The continuous emergence of novel infectious diseases poses a significant threat to global public health security, necessitating the development of small-molecule inhibitors that directly target pathogens. The RNA-dependent RNA polymerase (RdRp) and main protease (Mpro) of SARS-CoV-2 have been validated as potential key antiviral drug targets for the treatment of COVID-19. However, the conventional new drug R&D cycle takes 10-15 years, failing to meet the urgent needs during epidemics. Here, we propose a general multimodal deep learning framework for drug repurposing, MMFA-DTA, to enable rapid virtual screening of known drugs and significantly improve discovery efficiency. By extracting graph topological and sequence features from both small molecules and proteins, we design attention mechanisms to achieve dynamic fusion across modalities. Results demonstrate the superior performance of MMFA-DTA in drug-target affinity prediction over several state-of-the-art baseline methods on Davis and KIBA data sets, validating the benefits of heterogeneous information integration for representation learning and interaction modeling. Further fine-tuning on COVID-19-relevant bioactivity data enhances model predictions for critical SARS-CoV-2 enzymes. Case studies screening the FDA-approved drug library successfully identify etacrynic acid as the potential lead compound against both RdRp and Mpro. Molecular dynamics simulations further confirm the stability and binding affinity of etacrynic acid to these targets. This study proves the great potential and advantages of deep learning and drug repurposing strategies in supporting antiviral drug discovery. The proposed general and rapid response computational framework holds significance for preparedness against future public health events.
Affiliation(s)
- Guanxing Chen
  - Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China
- Haohuai He
  - Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China
- Qiujie Lv
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan 450001, China
- Lu Zhao
  - State Key Laboratory of Chemical Oncogenomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Calvin Yu-Chian Chen
  - Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China
  - State Key Laboratory of Chemical Oncogenomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
  - AI for Science (AI4S)-Preferred Program, School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
  - Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
  - Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
  - Guangdong L-Med Biotechnology Co., Ltd, Meizhou, Guangdong 514699, China
45
Ji W, She S, Qiao C, Feng Q, Rui M, Xu X, Feng C. A general prediction model for compound-protein interactions based on deep learning. Front Pharmacol 2024; 15:1465890. [PMID: 39295942 PMCID: PMC11408283 DOI: 10.3389/fphar.2024.1465890] [Received: 07/17/2024] [Accepted: 08/20/2024] [Indexed: 09/21/2024] [Open Access]
Abstract
Background: The identification of compound-protein interactions (CPIs) is crucial for drug discovery and understanding mechanisms of action. Accurate CPI prediction can elucidate drug-target-disease interactions, aiding in the discovery of candidate compounds and effective synergistic drugs, particularly from traditional Chinese medicine (TCM). Existing in silico methods face challenges in prediction accuracy and generalization due to compound and target diversity and the lack of large-scale interaction datasets and negative datasets for model learning.
Methods: To address these issues, we developed a computational model for CPI prediction by integrating the constructed large-scale bioactivity benchmark dataset with a deep learning (DL) algorithm. To verify the accuracy of our CPI model, we applied it to predict the targets of compounds in TCM. An herb pair of Astragalus membranaceus and Hedyotis diffusa was used as a model, and the active compounds in this herb pair were collected from various public databases and the literature. The complete targets of these active compounds were predicted by the CPI model, resulting in an expanded target dataset. This dataset was next used for the prediction of synergistic antitumor compound combinations. The predicted multi-compound combinations were subsequently examined through in vitro cellular experiments.
Results: Our CPI model demonstrated superior performance over other machine learning models, achieving an area under the Receiver Operating Characteristic curve (AUROC) of 0.98, an area under the precision-recall curve (AUPR) of 0.98, and an accuracy (ACC) of 93.31% on the test set. The model's generalization capability and applicability were further confirmed using external databases. Utilizing this model, we predicted the targets of compounds in the herb pair of Astragalus membranaceus and Hedyotis diffusa, yielding an expanded target dataset. Then, we integrated this expanded target dataset to predict effective drug combinations using our drug synergy prediction model DeepMDS. An experimental assay on the breast cancer cell line MDA-MB-231 proved the efficacy of the best predicted multi-compound combinations: Combination I (Epicatechin, Ursolic acid, Quercetin, Aesculetin and Astragaloside IV) exhibited a half-maximal inhibitory concentration (IC50) value of 19.41 μM and a combination index (CI) value of 0.682; and Combination II (Epicatechin, Ursolic acid, Quercetin, Vanillic acid and Astragaloside IV) displayed an IC50 value of 23.83 μM and a CI value of 0.805. These results validated the ability of our model to make accurate predictions for novel CPI data outside the training dataset and evaluated the reliability of the predictions, showing good applicability potential in drug discovery and in the elucidation of the bioactive compounds in TCM.
Conclusion: Our CPI prediction model can serve as a useful tool for accurately identifying potential CPI for a wide range of proteins, and is expected to facilitate drug research, repurposing and support the understanding of TCM.
Affiliation(s)
- Wei Ji
  - School of Pharmacy, Jiangsu University, Zhenjiang, China
  - School of Medicine, Jiangsu University, Zhenjiang, China
- Shengnan She
  - School of Pharmacy, Jiangsu University, Zhenjiang, China
- Chunxue Qiao
  - School of Pharmacy, Jiangsu University, Zhenjiang, China
- Qiuqi Feng
  - School of Pharmacy, Jiangsu University, Zhenjiang, China
- Mengjie Rui
  - School of Pharmacy, Jiangsu University, Zhenjiang, China
- Ximing Xu
  - School of Pharmacy, Jiangsu University, Zhenjiang, China
- Chunlai Feng
  - School of Pharmacy, Jiangsu University, Zhenjiang, China
46
Chen J, Yang X, Wu H. A Multibranch Neural Network for Drug-Target Affinity Prediction Using Similarity Information. ACS Omega 2024; 9:35978-35989. [PMID: 39184467 PMCID: PMC11339836 DOI: 10.1021/acsomega.4c05607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2024] [Revised: 08/03/2024] [Accepted: 08/06/2024] [Indexed: 08/27/2024]
Abstract
Predicting drug-target affinity (DTA) is beneficial for accelerating drug discovery. In recent years, graph structure-based deep learning models have garnered significant attention in this field. However, these models typically handle drug or target protein in isolation and only extract the molecular structure information on the drug or protein itself. To address this limitation, existing network-based models represent drug-target interactions or affinities as a knowledge graph to capture the interaction information. In this study, we propose a novel solution. Specifically, we introduce drug similarity information and protein similarity information into the field of DTA prediction. Moreover, we propose a network framework that autonomously extracts similarity information, avoiding reliance on knowledge graphs. Based on this framework, we design a multibranch neural network called GASI-DTA. This network integrates similarity information, sequence information, and molecular structure information. Comprehensive experimental results conducted on two benchmark data sets and three cold-start scenarios demonstrate that our model outperforms state-of-the-art graph structure-based methods in nearly all metrics. Furthermore, it exhibits significant advantages over existing network-based models, outperforming the best of them in the majority of metrics. Our study's code and data are openly accessible at http://github.com/XiaoLin-Yang-S/GASI-DTA.
Affiliation(s)
- Jing Chen
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
  - Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computing Intelligence, Jiangnan University, Wuxi 214122, China
- Xiaolin Yang
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
- Haoyu Wu
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
47
Bernett J, Blumenthal DB, Grimm DG, Haselbeck F, Joeres R, Kalinina OV, List M. Guiding questions to avoid data leakage in biological machine learning applications. Nat Methods 2024; 21:1444-1453. [PMID: 39122953 DOI: 10.1038/s41592-024-02362-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/10/2024] [Accepted: 06/26/2024] [Indexed: 08/12/2024]
Abstract
Machine learning methods for extracting patterns from high-dimensional data are very important in the biological sciences. However, in certain cases, real-world applications cannot confirm the reported prediction performance. One of the main reasons for this is data leakage, which can be seen as the illicit sharing of information between the training data and the test data, resulting in performance estimates that are far better than the performance observed in the intended application scenario. Data leakage can be difficult to detect in biological datasets due to their complex dependencies. With this in mind, we present seven questions that should be asked to prevent data leakage when constructing machine learning models in biological domains. We illustrate the usefulness of our questions by applying them to nontrivial examples. Our goal is to raise awareness of potential data leakage problems and to promote robust and reproducible machine learning-based research in biology.
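As a rough illustration of the kind of leakage this abstract describes, the sketch below splits drug-target records so that no protein appears in both the training and the test set. It is not taken from the paper and does not reproduce its seven questions; the function name `cold_target_split`, the record layout, and the toy identifiers and values are all assumptions made for this example. It also guards only against identical proteins being shared, not against highly similar ones.

```python
import random


def cold_target_split(pairs, test_fraction=0.2, seed=0):
    """Split (drug, protein, affinity) records by protein.

    Hypothetical helper: every record of a given protein goes to exactly
    one side, so the test set contains only proteins unseen in training.
    """
    proteins = sorted({protein for _, protein, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(proteins)
    # Hold out at least one protein, even for very small datasets.
    n_test = max(1, round(len(proteins) * test_fraction))
    test_proteins = set(proteins[:n_test])
    train = [rec for rec in pairs if rec[1] not in test_proteins]
    test = [rec for rec in pairs if rec[1] in test_proteins]
    return train, test


# Toy records with made-up identifiers and affinity values.
records = [
    ("drugA", "P1", 5.0), ("drugB", "P1", 6.1),
    ("drugA", "P2", 7.2), ("drugC", "P2", 5.5),
    ("drugB", "P3", 8.0), ("drugC", "P3", 6.3),
    ("drugA", "P4", 5.9), ("drugB", "P4", 7.7),
]

train, test = cold_target_split(records, test_fraction=0.25)
shared = {p for _, p, _ in train} & {p for _, p, _ in test}
print(len(train), len(test), shared)
```

A random split over individual records would typically place the same protein on both sides, which is one way the optimistic performance estimates mentioned above can arise.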
Affiliation(s)
- Judith Bernett
  - TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- David B Blumenthal
  - Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Dominik G Grimm
  - TUM Campus Straubing for Biotechnology and Sustainability, Technical University of Munich, Straubing, Germany
  - Bioinformatics, Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany
  - TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Florian Haselbeck
  - TUM Campus Straubing for Biotechnology and Sustainability, Technical University of Munich, Straubing, Germany
  - Bioinformatics, Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany
  - Smart Farming, Weihenstephan-Triesdorf University of Applied Sciences, Freising, Germany
- Roman Joeres
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
  - Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
  - Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Olga V Kalinina
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
  - Center for Bioinformatics, Saarland University, Saarbrücken, Germany
  - Medical Faculty, Saarland University, Homburg, Germany
- Markus List
  - TUM School of Life Sciences, Technical University of Munich, Freising, Germany
  - Munich Data Science Institute (MDSI), Technical University of Munich, Garching, Germany
48
Yu H, Xu WX, Tan T, Liu Z, Shi JY. Prediction of drug-target binding affinity based on multi-scale feature fusion. Comput Biol Med 2024; 178:108699. [PMID: 38870725 DOI: 10.1016/j.compbiomed.2024.108699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 05/05/2024] [Accepted: 06/01/2024] [Indexed: 06/15/2024]
Abstract
Accurate prediction of drug-target binding affinity (DTA) plays a pivotal role in drug discovery and repositioning. Although deep learning methods are widely used in DTA prediction, two significant challenges persist: (i) how to effectively represent the complex structural information of proteins and drugs; (ii) how to precisely model the mutual interactions between protein binding sites and key drug substructures. To address these challenges, we propose MSFFDTA (Multi-scale feature fusion for predicting drug target affinity), a model in which multi-scale encoders are designed to effectively capture multi-level structural information of drugs and proteins. A Selective Cross Attention (SCA) mechanism is then developed to filter out the trivial interactions between drug-protein substructure pairs and retain the important ones, which helps the proposed model focus on these key interactions and offers insights into their underlying mechanisms. Experimental results on two benchmark datasets demonstrate that MSFFDTA is superior to several state-of-the-art methods across almost all comparison metrics. Finally, we provide ablation and case studies with visualizations to verify the effectiveness and interpretability of MSFFDTA. The source code is freely available at https://github.com/whitehat32/MSFF-DTA/.
Affiliation(s)
- Hui Yu
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.
- Wen-Xin Xu
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.
- Tian Tan
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.
- Zun Liu
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.
- Jian-Yu Shi
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, 710072, China.
49
Yang X, Yang G, Chu J. GraphCL-DTA: A Graph Contrastive Learning With Molecular Semantics for Drug-Target Binding Affinity Prediction. IEEE J Biomed Health Inform 2024; 28:4544-4552. [PMID: 38190664 DOI: 10.1109/jbhi.2024.3350666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2024]
Abstract
Drug-target binding affinity prediction plays an important role in the early stages of drug discovery, as it can infer the strength of interactions between new drugs and new targets. However, the performance of previous computational models is limited by the following drawbacks. The learning of drug representations relies only on supervised data, without considering the information in the molecular graph itself. Moreover, most previous studies tended to design complicated representation learning modules while ignoring uniformity, which is used to measure representation quality. In this study, we propose GraphCL-DTA, a graph contrastive learning framework with molecular semantics for drug-target binding affinity prediction. This framework replaces the dropout-based data augmentation strategy by performing data augmentation in the embedding space, thereby better preserving the semantic information of the molecular graph. A more essential and effective drug representation can be learned through this graph contrastive framework without additional supervised data. Next, we design a new loss function that can be directly used to adjust the uniformity of drug and target representations. By directly optimizing the uniformity of representations, the representation quality of drugs and targets can be improved. The effectiveness of these elements is verified on two real datasets, KIBA and Davis. Compared with the GraphDTA model, the relative improvement of the GraphCL-DTA model on the two datasets is 2.7% and 4.5%. The graph contrastive learning framework and uniformity function in the GraphCL-DTA model can be embedded into other computational models as independent modules to improve their generalization capability.
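The abstract does not give the form of its uniformity loss. As a hedged illustration only, the sketch below computes one uniformity measure commonly used in the contrastive-learning literature: the log of the mean Gaussian potential over all pairs of L2-normalized embeddings. Whether GraphCL-DTA uses this exact formula is an assumption; the function names, the temperature `t=2.0`, and the toy embeddings are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def uniformity(embeddings, t=2.0):
    """Log of the mean pairwise Gaussian potential on the unit sphere.

    Lower values mean the embeddings are spread more evenly.
    Assumed formula for illustration, not confirmed by the abstract.
    """
    points = [l2_normalize(v) for v in embeddings]
    potentials = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            potentials.append(math.exp(-t * sq_dist))
    return math.log(sum(potentials) / len(potentials))


# Toy 2-D embeddings: one set collapsed onto a single direction,
# one set spread around the unit circle.
collapsed = [[3.0, 4.0], [3.0, 4.0], [3.0, 4.0], [3.0, 4.0]]
spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]

print(uniformity(collapsed), uniformity(spread))
```

Under this measure, collapsed representations score higher (worse) than well-spread ones, which is the property a uniformity term in a training loss would penalize.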
50
Liu S, Yu J, Ni N, Wang Z, Chen M, Li Y, Xu C, Ding Y, Zhang J, Yao X, Liu H. Versatile Framework for Drug-Target Interaction Prediction by Considering Domain-Specific Features. J Chem Inf Model 2024; 64:5646-5656. [PMID: 38976879 DOI: 10.1021/acs.jcim.4c00403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Predicting drug-target interactions (DTIs) is one of the crucial tasks in drug discovery, but traditional wet-lab experiments are costly and time-consuming. Recently, deep learning has emerged as a promising tool for accelerating DTI prediction due to its powerful performance. However, the models trained on limited known DTI data struggle to generalize effectively to novel drug-target pairs. In this work, we propose a strategy to train an ensemble of models by capturing both domain-generic and domain-specific features (E-DIS) to learn diverse domain features and adapt them to out-of-distribution data. Multiple experts were trained on different domains to capture and align domain-specific information from various distributions without accessing any data from unseen domains. E-DIS provides a comprehensive representation of proteins and ligands by capturing diverse features. Experimental results on four benchmark data sets in both in-domain and cross-domain settings demonstrated that E-DIS significantly improved model performance and domain generalization compared to existing methods. Our approach presents a significant advancement in DTI prediction by combining domain-generic and domain-specific features, enhancing the generalization ability of the DTI prediction model.
Affiliation(s)
- Shuo Liu
  - School of Pharmacy, Lanzhou University, Gansu 730000, China
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Jialiang Yu
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Ningxi Ni
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Zidong Wang
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Mengyun Chen
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Yuquan Li
  - College of Chemistry and Chemical Engineering, Lanzhou University, Gansu 730000, China
- Chen Xu
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Yahao Ding
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Jun Zhang
  - Changping Laboratory, Beijing 102200, China
- Xiaojun Yao
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China
- Huanxiang Liu
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China