1
Liu Y, Xia X, Gong Y, Song B, Zeng X. SSR-DTA: Substructure-aware multi-layer graph neural networks for drug-target binding affinity prediction. Artif Intell Med 2024; 157:102983. [PMID: 39321746 DOI: 10.1016/j.artmed.2024.102983] [Received: 09/12/2023] [Revised: 09/10/2024] [Accepted: 09/13/2024] [Indexed: 09/27/2024]
Abstract
Accurate prediction of drug-target binding affinity (DTA) is essential in the field of drug discovery. Recently, scientists have been attempting to utilize artificial intelligence prediction to screen out a significant number of ineffective compounds, thereby mitigating labor and financial losses. While graph neural networks (GNNs) have been applied to DTA, existing GNNs have limitations in effectively extracting substructural features across various sizes. Functional groups play a crucial role in modulating molecular properties, but existing GNNs struggle with feature extraction from certain motifs due to scale mismatches. Additionally, sequence-based models for target proteins lack the integration of structural information. To address these limitations, we present SSR-DTA, a multi-layer graph network capable of adapting to diverse structural sizes, which can extract richer biological features, thereby improving the robustness and accuracy of predictions. Multi-layer GNNs enable the capture of molecular motifs across different scales, ranging from atomic to macrocyclic motifs. Furthermore, we introduce BiGNN to simultaneously learn sequence and structural information. Sequence information corresponds to the primary structure of proteins, while graph information represents the tertiary structure. BiGNN assimilates richer information compared to sequence-based methods while mitigating the impact of errors from predicted structures, resulting in more accurate predictions. Through rigorous experimental evaluations conducted on four benchmark datasets, we demonstrate the superiority of SSR-DTA over state-of-the-art models. Particularly, in comparison to state-of-the-art models, SSR-DTA demonstrates an impressive 20% reduction in mean squared error on the Davis dataset and a 5% reduction on the KIBA dataset, underscoring its potential as a valuable tool for advancing DTA prediction.
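The reported improvements are relative reductions in mean squared error (MSE). As a minimal sketch of how such a figure is computed, the snippet below uses hypothetical affinity values; none of the numbers come from the paper, and the function names are illustrative.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between measured and predicted affinities."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_mse, new_mse):
    """Fractional improvement of new_mse over baseline_mse (0.20 means 20% lower)."""
    return (baseline_mse - new_mse) / baseline_mse


# Hypothetical affinities, for illustration only (not values from the paper).
measured = [5.0, 6.2, 7.1, 8.4]
baseline_pred = [5.5, 6.0, 7.9, 8.0]
improved_pred = [5.3, 6.1, 7.6, 8.2]

baseline = mean_squared_error(measured, baseline_pred)
improved = mean_squared_error(measured, improved_pred)
print(f"baseline MSE={baseline:.4f}, improved MSE={improved:.4f}, "
      f"reduction={relative_reduction(baseline, improved):.1%}")
```

A 20% reduction in this sense means the new MSE is 0.8 times the baseline MSE, not that the MSE dropped by 0.2 in absolute terms.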
Affiliation(s)
- Yuansheng Liu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410086, Hunan, China; Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Anhui University, Hefei, 230601, Anhui, China
- Xinyan Xia
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410086, Hunan, China
- Yongshun Gong
  - School of Software, Shandong University, Jinan, 250100, Shandong, China
- Bosheng Song
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410086, Hunan, China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410086, Hunan, China.

2
Zeng X, Feng PK, Li SJ, Lv SQ, Wen ML, Li Y. GNN-DDAS: Drug discovery for identifying anti-schistosome small molecules based on graph neural network. J Comput Chem 2024. [PMID: 39189298 DOI: 10.1002/jcc.27490] [Received: 06/21/2024] [Revised: 08/06/2024] [Accepted: 08/09/2024] [Indexed: 08/28/2024]
Abstract
Schistosomiasis is a tropical disease that poses a significant risk to hundreds of millions of people, yet often goes unnoticed. While praziquantel, a widely used anti-schistosome drug, has a low cost and a high cure rate, it has several drawbacks. These include ineffectiveness against schistosome larvae, reduced efficacy in young children, and emerging drug resistance. Discovering new and active anti-schistosome small molecules is therefore critical, but this process presents the challenge of low accuracy in computer-aided methods. To address this issue, we proposed GNN-DDAS, a novel deep learning framework based on graph neural networks (GNN), designed for drug discovery to identify active anti-schistosome (DDAS) small molecules. Initially, a multi-layer perceptron was used to derive sequence features from various representations of small molecule SMILES. Next, a GNN was employed to extract structural features from molecular graphs. Finally, the extracted sequence and structural features were concatenated and fed into a fully connected network to predict active anti-schistosome small molecules. Experimental results showed that GNN-DDAS exhibited superior performance compared to the benchmark methods on both benchmark and real-world application datasets. Additionally, the use of the GNNExplainer model allowed us to analyze the key substructure features of small molecules, providing insight into the effectiveness of GNN-DDAS. Overall, GNN-DDAS provided a promising solution for discovering new and active anti-schistosome small molecules.
Affiliation(s)
- Xin Zeng
  - College of Mathematics and Computer Science, Dali University, Dali, China
- Peng-Kun Feng
  - College of Mathematics and Computer Science, Dali University, Dali, China
- Shu-Juan Li
  - Department of Endemic Diseases, Yunnan Institute of Endemic Diseases Control and Prevention, Dali, China
- Shuang-Qing Lv
  - Institute of Surveying and Information Engineering, West Yunnan University of Applied Science, Dali, China
- Meng-Liang Wen
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China
- Yi Li
  - College of Mathematics and Computer Science, Dali University, Dali, China

3
Zeng X, Zhong KY, Meng PY, Li SJ, Lv SQ, Wen ML, Li Y. MvGraphDTA: multi-view-based graph deep model for drug-target affinity prediction by introducing the graphs and line graphs. BMC Biol 2024; 22:182. [PMID: 39183297 PMCID: PMC11346193 DOI: 10.1186/s12915-024-01981-3] [Received: 05/23/2024] [Accepted: 08/13/2024] [Indexed: 08/27/2024] [Open Access]
Abstract
BACKGROUND Accurately identifying drug-target affinity (DTA) plays a pivotal role in drug screening, design, and repurposing in the pharmaceutical industry. It not only reduces the time, labor, and economic costs associated with biological experiments but also expedites the drug development process. However, achieving the desired level of computational accuracy for DTA identification methods remains a significant challenge. RESULTS We proposed a novel multi-view-based graph deep model known as MvGraphDTA for DTA prediction. MvGraphDTA employed a graph convolutional network (GCN) to extract the structural features from the original graphs of drugs and targets, respectively. It went a step further by constructing line graphs with edges as vertices based on the original graphs of drugs and targets. GCN was also used to extract the relationship features within their line graphs. To enhance the complementarity between the extracted features from original graphs and line graphs, MvGraphDTA fused the extracted multi-view features of drugs and targets, respectively. Finally, these fused features were concatenated and passed through a fully connected (FC) network to predict DTA. CONCLUSIONS During the experiments, we performed data augmentation on all the training sets used. Experimental results showed that MvGraphDTA outperformed the competitive state-of-the-art methods on benchmark datasets for DTA prediction. Additionally, we evaluated the universality and generalization performance of MvGraphDTA on additional datasets. Experimental outcomes revealed that MvGraphDTA exhibited good universality and generalization capability, making it a reliable tool for drug-target interaction prediction.
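The abstract describes line graphs in which the edges of the original graph become vertices. The sketch below shows the standard line-graph construction on a toy bond list; it is a generic illustration and not the authors' code, and the function name and atom indices are hypothetical.

```python
from itertools import combinations


def line_graph(edges):
    """Return (nodes, edges) of the line graph of an undirected graph.

    Each original edge becomes a node; two such nodes are connected when the
    original edges share an endpoint.
    """
    # Normalise each undirected edge so (a, b) and (b, a) are the same node.
    nodes = sorted({tuple(sorted(e)) for e in edges})
    lg_edges = [
        (e1, e2)
        for e1, e2 in combinations(nodes, 2)
        if set(e1) & set(e2)
    ]
    return nodes, lg_edges


# Toy three-atom chain 0-1-2 (atom indices are illustrative).
bonds = [(0, 1), (1, 2)]
lg_nodes, lg_edges = line_graph(bonds)
print(lg_nodes)
print(lg_edges)
```

In the toy chain, the two bonds share atom 1, so the line graph has two nodes joined by a single edge. A GCN run on this graph passes messages between bonds rather than between atoms.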
Affiliation(s)
- Xin Zeng
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China
- Kai-Yang Zhong
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China
- Pei-Yan Meng
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China
- Shu-Juan Li
  - Yunnan Institute of Endemic Diseases Control & Prevention, Dali, 671000, China
- Shuang-Qing Lv
  - Institute of Surveying and Information Engineering, West Yunnan University of Applied Science, Dali, 671000, China
- Meng-Liang Wen
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, 650000, China
- Yi Li
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China.

4
Chen J, Yang X, Wu H. A Multibranch Neural Network for Drug-Target Affinity Prediction Using Similarity Information. ACS Omega 2024; 9:35978-35989. [PMID: 39184467 PMCID: PMC11339836 DOI: 10.1021/acsomega.4c05607] [Received: 06/15/2024] [Revised: 08/03/2024] [Accepted: 08/06/2024] [Indexed: 08/27/2024]
Abstract
Predicting drug-target affinity (DTA) is beneficial for accelerating drug discovery. In recent years, graph structure-based deep learning models have garnered significant attention in this field. However, these models typically handle drug or target protein in isolation and only extract the molecular structure information on the drug or protein itself. To address this limitation, existing network-based models represent drug-target interactions or affinities as a knowledge graph to capture the interaction information. In this study, we propose a novel solution. Specifically, we introduce drug similarity information and protein similarity information into the field of DTA prediction. Moreover, we propose a network framework that autonomously extracts similarity information, avoiding reliance on knowledge graphs. Based on this framework, we design a multibranch neural network called GASI-DTA. This network integrates similarity information, sequence information, and molecular structure information. Comprehensive experimental results conducted on two benchmark data sets and three cold-start scenarios demonstrate that our model outperforms state-of-the-art graph structure-based methods in nearly all metrics. Furthermore, it exhibits significant advantages over existing network-based models, outperforming the best of them in the majority of metrics. Our study's code and data are openly accessible at http://github.com/XiaoLin-Yang-S/GASI-DTA.
Affiliation(s)
- Jing Chen
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
  - Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computing Intelligence, Jiangnan University, Wuxi 214122, China
- Xiaolin Yang
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
- Haoyu Wu
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China

5
Lavecchia A. Advancing drug discovery with deep attention neural networks. Drug Discov Today 2024; 29:104067. [PMID: 38925473 DOI: 10.1016/j.drudis.2024.104067] [Received: 05/17/2024] [Revised: 06/10/2024] [Accepted: 06/19/2024] [Indexed: 06/28/2024]
Abstract
In the dynamic field of drug discovery, deep attention neural networks are revolutionizing our approach to complex data. This review explores the attention mechanism and its extended architectures, including graph attention networks (GATs), transformers, bidirectional encoder representations from transformers (BERT), generative pre-trained transformers (GPTs) and bidirectional and auto-regressive transformers (BART). Delving into their core principles and multifaceted applications, we uncover their pivotal roles in catalyzing de novo drug design, predicting intricate molecular properties and deciphering elusive drug-target interactions. Despite challenges, these attention-based architectures hold unparalleled promise to drive transformative breakthroughs and accelerate progress in pharmaceutical research.
Affiliation(s)
- Antonio Lavecchia
  - Drug Discovery Laboratory, Department of Pharmacy, University of Napoli Federico II, I-80131 Naples, Italy.

6
Zhang Z, He X, Long D, Luo G, Chen S. Enhancing generalizability and performance in drug-target interaction identification by integrating pharmacophore and pre-trained models. Bioinformatics 2024; 40:i539-i547. [PMID: 38940179 PMCID: PMC11211825 DOI: 10.1093/bioinformatics/btae240] [Indexed: 06/29/2024] [Open Access]
Abstract
MOTIVATION In drug discovery, it is crucial to assess the drug-target binding affinity (DTA). Although molecular docking is widely used, computational efficiency limits its application in large-scale virtual screening. Deep learning-based methods learn virtual scoring functions from labeled datasets and can quickly predict affinity. However, there are three limitations. First, existing methods only consider the atom-bond graph or one-dimensional sequence representations of compounds, ignoring the information about functional groups (pharmacophores) with specific biological activities. Second, relying on limited labeled datasets fails to learn comprehensive embedding representations of compounds and proteins, resulting in poor generalization performance in complex scenarios. Third, existing feature fusion methods cannot adequately capture contextual interaction information. RESULTS Therefore, we propose a novel DTA prediction method named HeteroDTA. Specifically, a multi-view compound feature extraction module is constructed to model the atom-bond graph and pharmacophore graph. The residue concat graph and protein sequence are also utilized to model protein structure and function. Moreover, to enhance the generalization capability and reduce the dependence on task-specific labeled data, pre-trained models are utilized to initialize the atomic features of the compounds and the embedding representations of the protein sequence. A context-aware nonlinear feature fusion method is also proposed to learn interaction patterns between compounds and proteins. Experimental results on public benchmark datasets show that HeteroDTA significantly outperforms existing methods. In addition, HeteroDTA shows excellent generalization performance in cold-start experiments and superiority in the representation learning ability of drug-target pairs. Finally, the effectiveness of HeteroDTA is demonstrated in a real-world drug discovery study. 
AVAILABILITY AND IMPLEMENTATION The source code and data are available at https://github.com/daydayupzzl/HeteroDTA.
Affiliation(s)
- Zuolong Zhang
  - School of Software, Henan University, Kaifeng, Henan Province 475000, China
- Xin He
  - School of Software, Henan University, Kaifeng, Henan Province 475000, China
  - Henan International Joint Laboratory of Intelligent Network Theory and Key Technology, Henan University, Kaifeng, Henan Province 475000, China
- Dazhi Long
  - Department of Urology, Ji’an Third People’s Hospital, Ji’an, Jiangxi Province 343000, China
- Gang Luo
  - School of Mathematics and Computer Science, Nanchang University, Nanchang, Jiangxi Province 330031, China
- Shengbo Chen
  - Henan Engineering Research Center of Intelligent Technology and Application, Henan University, Kaifeng, Henan Province 475000, China

7
Zhou G, Qin Y, Hong Q, Li H, Chen H, Shen J. GEMF: a novel geometry-enhanced mid-fusion network for PLA prediction. Brief Bioinform 2024; 25:bbae333. [PMID: 38980371 PMCID: PMC11232467 DOI: 10.1093/bib/bbae333] [Received: 02/07/2024] [Revised: 06/04/2024] [Accepted: 06/26/2024] [Indexed: 07/10/2024] [Open Access]
Abstract
Accurate prediction of protein-ligand binding affinity (PLA) is important for drug discovery. Recent advances in applying graph neural networks have shown great potential for PLA prediction. However, existing methods usually neglect the geometric information (i.e. bond angles), leading to difficulties in accurately distinguishing different molecular structures. In addition, these methods also pose limitations in representing the binding process of protein-ligand complexes. To address these issues, we propose a novel geometry-enhanced mid-fusion network, named GEMF, to learn comprehensive molecular geometry and interaction patterns. Specifically, the GEMF consists of a graph embedding layer, a message passing phase, and a multi-scale fusion module. GEMF can effectively represent protein-ligand complexes as graphs, with graph embeddings based on physicochemical and geometric properties. Moreover, our dual-stream message passing framework models both covalent and non-covalent interactions. In particular, the edge-update mechanism, which is based on line graphs, can fuse both distance and angle information in the covalent branch. In addition, the communication branch consisting of multiple heterogeneous interaction modules is developed to learn intricate interaction patterns. Finally, we fuse the multi-scale features from the covalent, non-covalent, and heterogeneous interaction branches. The extensive experimental results on several benchmarks demonstrate the superiority of GEMF compared with other state-of-the-art methods.
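The abstract singles out bond angles as geometric information that connectivity alone does not capture. As a minimal sketch of that quantity, the function below computes the angle at a central atom from three 3D coordinates; the coordinates are hypothetical and the code is not taken from GEMF.

```python
import math


def bond_angle(a, b, c):
    """Angle at atom b (in degrees) formed by the bonds b-a and b-c."""
    v1 = [ai - bi for ai, bi in zip(a, b)]
    v2 = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(x * x for x in v2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("atoms must not coincide")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cos_theta))


# Hypothetical coordinates in angstroms: two bonds at a right angle.
print(bond_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
```

Two conformations with identical bonds but different angles produce the same atom-bond graph, which is why an angle feature of this kind can help a model tell them apart.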
Affiliation(s)
- Guoqiang Zhou
  - School of Computer Science, Nanjing University of Posts and Telecommunications, No.9 Wenyuan Road, Jiangsu 210023, China
- Yuke Qin
  - School of Computer Science, Nanjing University of Posts and Telecommunications, No.9 Wenyuan Road, Jiangsu 210023, China
- Qiansen Hong
  - School of Computer Science, Nanjing University of Posts and Telecommunications, No.9 Wenyuan Road, Jiangsu 210023, China
- Haoran Li
  - School of Computing and Information Technology, University of Wollongong, Northfields Avenue, NSW 2522, Australia
- Huaming Chen
  - School of Electrical and Computer Engineering, University of Sydney, Camperdown, NSW 2050, Australia
- Jun Shen
  - School of Computing and Information Technology, University of Wollongong, Northfields Avenue, NSW 2522, Australia

8
Zhang H, Liu X, Cheng W, Wang T, Chen Y. Prediction of drug-target binding affinity based on deep learning models. Comput Biol Med 2024; 174:108435. [PMID: 38608327 DOI: 10.1016/j.compbiomed.2024.108435] [Received: 01/29/2024] [Revised: 04/05/2024] [Accepted: 04/07/2024] [Indexed: 04/14/2024]
Abstract
The prediction of drug-target binding affinity (DTA) plays an important role in drug discovery. Computerized virtual screening techniques have been used for DTA prediction, greatly reducing the time and economic costs of drug discovery. However, these techniques have not succeeded in reversing the low success rate of new drug development. In recent years, the continuous development of deep learning (DL) technology has brought new opportunities for drug discovery through DTA prediction. This shift has moved the prediction of DTA from traditional machine learning methods to DL. The DL frameworks used for DTA prediction include convolutional neural networks (CNN), graph convolutional neural networks (GCN), recurrent neural networks (RNN), and reinforcement learning (RL), among others. This review article summarizes the available literature on DTA prediction using DL models, including DTA quantification metrics and datasets, and DL algorithms used for DTA prediction (including input representation of models, neural network frameworks, evaluation indicators, and model interpretability). In addition, the opportunities, challenges, and prospects of the application of DL frameworks for DTA prediction in the field of drug discovery are discussed.
Affiliation(s)
- Hao Zhang
  - College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Xiaoqian Liu
  - College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Wenya Cheng
  - College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Tianshi Wang
  - College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Yuanyuan Chen
  - College of Science, Nanjing Agricultural University, Nanjing, 210095, China.

9
Zhong KY, Wen ML, Meng FF, Li X, Jiang B, Zeng X, Li Y. MMDTA: A Multimodal Deep Model for Drug-Target Affinity with a Hybrid Fusion Strategy. J Chem Inf Model 2024; 64:2878-2888. [PMID: 37610162 DOI: 10.1021/acs.jcim.3c00866] [Indexed: 08/24/2023]
Abstract
The prediction of the drug-target affinity (DTA) plays an important role in evaluating molecular druggability. Although deep learning-based models for DTA prediction have been extensively attempted, there are few reports on multimodal models that leverage various fusion strategies to exploit heterogeneous information from multiple different modalities of drugs and targets. In this study, we proposed a multimodal deep model named MMDTA, which integrated the heterogeneous information from various modalities of drugs and targets using a hybrid fusion strategy to enhance DTA prediction. To achieve this, MMDTA first employed convolutional neural networks (CNNs) and graph convolutional networks (GCNs) to extract diverse heterogeneous information from the sequences and structures of drugs and targets. It then utilized a hybrid fusion strategy to combine and complement the extracted heterogeneous information, resulting in the fused modal information for predicting drug-target affinity through the fully connected (FC) layers. Experimental results demonstrated that MMDTA outperformed the competitive state-of-the-art deep learning models on the widely used benchmark data sets, particularly with a significantly improved key evaluation metric, Root Mean Square Error (RMSE). Furthermore, MMDTA exhibited excellent generalization and practical application performance on multiple different data sets. These findings highlighted MMDTA's accuracy and reliability in predicting the drug-target binding affinity. The source data and code are accessible at http://github.com/dldxzx/MMDTA.
Affiliation(s)
- Kai-Yang Zhong
  - College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Meng-Liang Wen
  - State Key Laboratory for Conservation and Utilization of Bio-Resource in Yunnan, Yunnan University, Kunming 650000, China
- Fan-Fang Meng
  - College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Xin Li
  - College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Bei Jiang
  - Yunnan Key Laboratory of Screening and Research on Anti-pathogenic Plant Resources from Western Yunnan, Dali University, Dali 671000, China
- Xin Zeng
  - College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Yi Li
  - College of Mathematics and Computer Science, Dali University, Dali 671003, China

10
Zeng X, Li SJ, Lv SQ, Wen ML, Li Y. A comprehensive review of the recent advances on predicting drug-target affinity based on deep learning. Front Pharmacol 2024; 15:1375522. [PMID: 38628639 PMCID: PMC11019008 DOI: 10.3389/fphar.2024.1375522] [Received: 01/24/2024] [Accepted: 03/21/2024] [Indexed: 04/19/2024] [Open Access]
Abstract
Accurate calculation of drug-target affinity (DTA) is crucial for various applications in the pharmaceutical industry, including drug screening, design, and repurposing. However, traditional machine learning methods for calculating DTA often lack accuracy, posing a significant challenge in accurately predicting DTA. Fortunately, deep learning has emerged as a promising approach in computational biology, leading to the development of various deep learning-based methods for DTA prediction. To support researchers in developing novel and high-precision methods, we have provided a comprehensive review of recent advances in predicting DTA using deep learning. We first conducted a statistical analysis of commonly used public datasets, providing essential information and introducing the used fields of these datasets. We further explored the common representations of sequences and structures of drugs and targets. These analyses served as the foundation for constructing DTA prediction methods based on deep learning. Next, we focused on explaining how deep learning models, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Transformers, and Graph Neural Networks (GNNs), were effectively employed in specific DTA prediction methods. We highlighted the unique advantages and applications of these models in the context of DTA prediction. Finally, we conducted a performance analysis of multiple state-of-the-art methods for predicting DTA based on deep learning. The comprehensive review aimed to help researchers understand the shortcomings and advantages of existing methods, and further develop high-precision DTA prediction tools to promote the development of drug discovery.
Affiliation(s)
- Xin Zeng
  - College of Mathematics and Computer Science, Dali University, Dali, China
- Shu-Juan Li
  - Yunnan Institute of Endemic Diseases Control and Prevention, Dali, China
- Shuang-Qing Lv
  - Institute of Surveying and Information Engineering, West Yunnan University of Applied Science, Dali, China
- Meng-Liang Wen
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China
- Yi Li
  - College of Mathematics and Computer Science, Dali University, Dali, China

11
Lee J, Jun DW, Song I, Kim Y. DLM-DTI: a dual language model for the prediction of drug-target interaction with hint-based learning. J Cheminform 2024; 16:14. [PMID: 38297330 PMCID: PMC10832108 DOI: 10.1186/s13321-024-00808-1] [Received: 09/09/2023] [Accepted: 01/22/2024] [Indexed: 02/02/2024] [Open Access]
Abstract
The drug discovery process is demanding and time-consuming, and machine learning-based research is increasingly proposed to enhance efficiency. A significant challenge in this field is predicting whether a drug molecule's structure will interact with a target protein. A recent study attempted to address this challenge by utilizing an encoder that leverages prior knowledge of molecular and protein structures, resulting in notable improvements in the prediction performance of the drug-target interactions task. Nonetheless, the target encoders employed in previous studies exhibit computational complexity that increases quadratically with the input length, thereby limiting their practical utility. To overcome this challenge, we adopt a hint-based learning strategy to develop a compact and efficient target encoder. With the adaptation parameter, our model can blend general knowledge and target-oriented knowledge to build features of the protein sequences. This approach yielded considerable performance enhancements and improved learning efficiency on three benchmark datasets: BIOSNAP, DAVIS, and BindingDB. Furthermore, our methodology boasts the merit of necessitating only a minimal Video RAM (VRAM) allocation, specifically 7.7GB, during the training phase (16.24% of the previous state-of-the-art model). This ensures the feasibility of training and inference even with constrained computational resources.
Affiliation(s)
- Jonghyun Lee
  - Department of Medical and Digital Engineering, Hanyang University College of Engineering, 222, Wangsimni-ro, Seongdong-gu, Seoul, 04763, Korea
- Dae Won Jun
  - Department of Medical and Digital Engineering, Hanyang University College of Engineering, 222, Wangsimni-ro, Seongdong-gu, Seoul, 04763, Korea
  - Department of Internal Medicine, Hanyang University College of Medicine, 222, Wangsimni-ro, Seongdong-gu, Seoul, 04763, Korea
- Ildae Song
  - Department of Pharmaceutical Science and Technology, Kyungsung University, 309, Suyeong-ro, Nam-gu, Busan, 48434, Korea
- Yun Kim
  - College of Pharmacy, Deagu Catholic University, 13-13, Hayang-ro, Hayang-eup, Gyeongsan-si, 38430, Gyeongsangbuk-do, Korea.

12
Zhu Z, Yao Z, Zheng X, Qi G, Li Y, Mazur N, Gao X, Gong Y, Cong B. Drug-target affinity prediction method based on multi-scale information interaction and graph optimization. Comput Biol Med 2023; 167:107621. [PMID: 37907030 DOI: 10.1016/j.compbiomed.2023.107621] [Received: 07/18/2023] [Revised: 10/16/2023] [Accepted: 10/23/2023] [Indexed: 11/02/2023]
Abstract
Drug-target affinity (DTA) prediction as an emerging and effective method is widely applied to explore the strength of drug-target interactions in drug development research. By predicting these interactions, researchers can assess the potential efficacy and safety of candidate drugs at an early stage, narrowing down the search space for therapeutic targets and accelerating the discovery and development of new drugs. However, existing DTA prediction models mainly use graphical representations of drug molecules, which lack information on interactions between individual substructures, thus affecting prediction accuracy and model interpretability. Therefore, transformer and diffusion on drug graphs in DTA prediction (TDGraphDTA) are introduced to predict drug-target interactions using multi-scale information interaction and graph optimization. An interactive module is integrated into feature extraction of drug and target features at different granularity levels. A diffusion model-based graph optimization module is proposed to improve the representation of molecular graph structures and enhance the interpretability of graph representations while obtaining optimal feature representations. In addition, TDGraphDTA improves the accuracy and reliability of predictions by capturing relationships and contextual information between molecular substructures. The performance of the proposed TDGraphDTA in DTA prediction was verified on three publicly available benchmark datasets (Davis, Metz, and KIBA). Compared with state-of-the-art baseline models, it achieved better results in terms of consistency index, R-squared, etc. Furthermore, compared with some existing methods, the proposed TDGraphDTA is demonstrated to have better structure capturing capabilities by visualizing the feature capturing capabilities of the model using Grad-AAM toxicity labels in the ToxCast dataset. The corresponding source codes are available at https://github.com/Lamouryz/TDGraph.
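The abstract reports results in terms of "consistency index" and R-squared. Assuming the former refers to the concordance index commonly reported on DTA benchmarks, and the latter to the ordinary coefficient of determination (DTA papers sometimes use other variants), a minimal sketch of both metrics follows. The function names are illustrative and the code is not the authors' evaluation script.

```python
def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true affinity are skipped; tied predictions count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue
            comparable += 1
            true_diff = y_true[i] - y_true[j]
            pred_diff = y_pred[i] - y_pred[j]
            if pred_diff == 0:
                concordant += 0.5
            elif (true_diff > 0) == (pred_diff > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true is constant")
    return 1.0 - ss_res / ss_tot
```

The concordance index only measures ranking quality, so a model can score well on it while being poorly calibrated in absolute affinity; this is one reason the two metrics are usually reported together.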
Affiliation(s)
- Zhiqin Zhu
  - College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Zheng Yao
  - College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Xin Zheng
  - College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Guanqiu Qi
  - Computer Information Systems Department, State University of New York at Buffalo State, Buffalo, NY 14222, USA.
- Yuanyuan Li
  - College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Neal Mazur
  - Computer Information Systems Department, State University of New York at Buffalo State, Buffalo, NY 14222, USA.
- Xinbo Gao
  - College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Yifei Gong
  - Faculty of Applied Science & Engineering, The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto at Toronto, ON M5S, Canada.
- Baisen Cong
  - Diagnostics Digital, DH(Shanghai) Diagnostics Co, Ltd, a Danaher company, Shanghai, 200335, China.
|
13
|
Meng C, Yuan Y, Zhao H, Pei Y, Li Z. IIFS: An improved incremental feature selection method for protein sequence processing. Comput Biol Med 2023; 167:107654. [PMID: 37944304 DOI: 10.1016/j.compbiomed.2023.107654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 10/09/2023] [Accepted: 10/31/2023] [Indexed: 11/12/2023]
Abstract
MOTIVATION: Discrete features can be obtained from protein sequences using a feature extraction method. These features are the basis of downstream processing of protein data, but because they generally contain redundant data, the important features must be screened and selected from them. RESULT: Here, we report IIFS, an improved incremental feature selection method that exploits a new subset search strategy to find the optimal feature set. IIFS combines nonadjacent sorting features to avoid the drawbacks of data explosion and excessive reliance on feature sorting results. The comparative experimental results on 27 sets of feature sorting data show that IIFS can find more accurate and important features compared to existing methods. The IIFS approach also handles data redundancy more efficiently and finds more representative and discriminatory features while ensuring minimal feature dimensionality and good evaluation metrics. Moreover, we have wrapped this method and deployed it on a web server, accessible at http://112.124.26.17:8005/.
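For context, the sketch below shows plain incremental feature selection (IFS), the baseline that IIFS sets out to improve. It is not the IIFS search strategy, which the abstract does not describe in detail. The function name and the `evaluate` callback are hypothetical; in practice `evaluate` would train and score a classifier on the given feature subset.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Classic IFS: grow the subset one feature at a time in ranked order
    and keep the prefix with the best evaluation score."""
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])  # top-k features by rank
        score = evaluate(subset)
        if score > best_score:  # strict: ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy evaluator (illustration only): reward useful features and
# penalize subset size.
def toy_score(subset: List[str]) -> float:
    useful = {"f1", "f3"}
    return sum(1.0 for f in subset if f in useful) - 0.1 * len(subset)


selected, score = incremental_feature_selection(["f1", "f2", "f3", "f4"], toy_score)
```

In this toy run the selected subset contains the unhelpful feature `f2` only because it was ranked ahead of `f3`, which illustrates the reliance on feature sorting results that the abstract mentions.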
Affiliation(s)
- Chaolu Meng
  - College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China; Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, China
- Ye Yuan
  - Beidahuang Industry Group General Hospital, Harbin, 150001, China
- Haiyan Zhao
  - College of Integration of Traditional Chinese and Western Medicine to Southwest Medical University, Luzhou, Sichuan, 646000, China
- Yue Pei
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100190, China
- Zhi Li
  - Department of Spleen and Stomach Diseases, The Affiliated Traditional Chinese Medicine Hospital of Southwest Medical University, Luzhou, Sichuan, 646000, China.
|
14
|
Zhang L, Wang CC, Zhang Y, Chen X. GPCNDTA: Prediction of drug-target binding affinity through cross-attention networks augmented with graph features and pharmacophores. Comput Biol Med 2023; 166:107512. [PMID: 37788507 DOI: 10.1016/j.compbiomed.2023.107512] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/08/2023] [Revised: 08/28/2023] [Accepted: 09/19/2023] [Indexed: 10/05/2023]
Abstract
Drug-target affinity prediction is a challenging task in drug discovery. The latest computational models have limitations in mining edge information in molecule graphs, accessing knowledge in pharmacophores, integrating multimodal data of the same biomolecule, and realizing effective interactions between two different biomolecules. To solve these problems, we proposed a method called Graph features and Pharmacophores augmented Cross-attention Networks based Drug-Target binding Affinity prediction (GPCNDTA). First, we utilized the GNN module, the linear projection unit and the self-attention layer to extract features of drugs and proteins. Second, we devised intramolecular and intermolecular cross-attention to fuse and interact features of drugs and proteins, respectively. Finally, the linear projection unit was applied to obtain the final features of drugs and proteins, and the Multi-Layer Perceptron was employed to predict drug-target binding affinity. The three major innovations of GPCNDTA are as follows: (i) developing the residual CensNet and the residual EW-GCN to extract features of drug and protein graphs, respectively, (ii) regarding pharmacophores as a new type of prior to improve drug-target affinity prediction performance, and (iii) devising intramolecular and intermolecular cross-attention, in which the intramolecular cross-attention realizes the effective fusion of different modal data related to the same biomolecule, and the intermolecular cross-attention fulfills the information interaction between two different biomolecules in attention space. The test results on five benchmark datasets imply that GPCNDTA achieves the best performance compared with state-of-the-art computational models. In addition, through ablation experiments, we demonstrated the effectiveness of the GNN modules, pharmacophores and the two cross-attention strategies in improving the prediction accuracy, stability and reliability of GPCNDTA. In case studies, we applied GPCNDTA to predict binding affinities between 3C-like proteinase and 185 drugs, and observed that most binding affinities predicted by GPCNDTA are close to the corresponding experimental measurements.
Affiliation(s)
- Li Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Chun-Chun Wang
  - School of Science, Jiangnan University, Wuxi, 214122, China
- Yong Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Xing Chen
  - School of Science, Jiangnan University, Wuxi, 214122, China.
|
15
|
Fang K, Zhang Y, Du S, He J. ColdDTA: Utilizing data augmentation and attention-based feature fusion for drug-target binding affinity prediction. Comput Biol Med 2023; 164:107372. [PMID: 37597410 DOI: 10.1016/j.compbiomed.2023.107372] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/06/2023] [Revised: 07/26/2023] [Accepted: 08/12/2023] [Indexed: 08/21/2023]
Abstract
Accurate prediction of drug-target affinity (DTA) plays a crucial role in drug discovery and development. Recently, deep learning methods have shown excellent predictive performance on randomly split public datasets. However, it still needs to be verified whether this splitting method reflects real-world problems in practical applications. In a cold-start experimental setup, where drugs or proteins in the test set do not appear in the training set, the performance of deep learning models often decreases significantly. This indicates that improving the generalization ability of the models remains a challenge. To this end, in this study, we propose ColdDTA, which uses data augmentation and attention-based feature fusion to improve the generalization ability of drug-target binding affinity prediction. Specifically, ColdDTA generates new drug-target pairs by removing subgraphs of drugs. The attention-based feature fusion module is also used to better capture the drug-target interactions. We conduct cold-start experiments on three benchmark datasets, and the consistency index (CI) and mean square error (MSE) results on the Davis and KIBA datasets show that ColdDTA outperforms the five state-of-the-art baseline methods. Meanwhile, the area under the receiver operating characteristic curve (ROC-AUC) results on the BindingDB dataset show that ColdDTA also performs better on the classification task. Furthermore, visualizing the model weights allows for interpretable insights. Overall, ColdDTA can better solve the realistic DTA prediction problem. The code has been made available to the public.
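The two regression metrics named in this abstract can be sketched as follows. The CI is assumed here to be the pairwise concordance index commonly used in the DTA literature (the abstract calls it the consistency index); the function names are hypothetical, and the quadratic loop is for clarity rather than speed.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true
    order. Pairs with equal true values are skipped; tied predictions
    count as half-concordant."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # no defined order for this pair
            comparable += 1
            diff_true = y_true[i] - y_true[j]
            diff_pred = y_pred[i] - y_pred[j]
            if diff_pred == 0:
                concordant += 0.5
            elif (diff_true > 0) == (diff_pred > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Under this definition a CI of 1.0 means every pair is ranked correctly, 0.5 corresponds to uninformative predictions, and lower MSE is better.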
Affiliation(s)
- Kejie Fang
  - Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, 315211, China
- Yiming Zhang
  - Engineering Laboratory of Advanced Energy Materials, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, 315201, China
- Shiyu Du
  - Engineering Laboratory of Advanced Energy Materials, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, 315201, China; School of Materials Science and Engineering and School of Computer Science, China University of Petroleum (East China), Qingdao, 266580, China.
- Jian He
  - State Key Laboratory of Systems Medicine for Cancer, Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
|
16
|
Zhang Y, Liu P, Tang LJ, Lin PM, Li R, Luo HR, Luo P. Basing on the machine learning model to analyse the coronary calcification score and the coronary flow reserve score to evaluate the degree of coronary artery stenosis. Comput Biol Med 2023; 163:107130. [PMID: 37329614 DOI: 10.1016/j.compbiomed.2023.107130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2023] [Revised: 05/23/2023] [Accepted: 06/01/2023] [Indexed: 06/19/2023]
Abstract
AIM: To obtain the coronary artery calcium score (CACS) for each branch in coronary computed tomography angiography (CCTA) examination, combine it with the fractional flow reserve (FFR) of each coronary artery branch detected by CT, and apply machine learning (ML) models to analyse and predict the severity of coronary artery stenosis. METHODS: All patients who underwent CCTA from January 2019 to April 2022 in the HOSPITAL (T.C.M) AFFILIATED TO SOUTHWEST MEDICAL UNIVERSITY) were retrospectively screened, and their sex, age, characteristics of lipid-containing lesions, CACS and CT-FFR values were collected. Five machine learning models, random forest (RF), k-nearest neighbour algorithm (KNN), kernel logistic regression, support vector machine (SVM) and radial basis function neural network (RBFNN), were used as predictive models to evaluate the severity of coronary stenosis. RESULTS: Among the five machine learning models, the SVM model achieved the best prediction performance, and the prediction accuracy for mild stenosis was up to 90%. Second, age and male sex were important influencing factors of increasing CACS and decreasing CT-FFR. Moreover, a critical CACS value of >200.70 for myocardial ischemia was calculated. CONCLUSION: Through machine learning model analysis, we demonstrate the importance of CACS and FFR in predicting coronary stenosis, especially with the support vector machine model, which promotes the application of artificial intelligence and machine learning methods in the field of medical analysis.
Affiliation(s)
- Ying Zhang
  - State Key Laboratories for Quality Research in Chinese Medicines, Faculty of Pharmacy, Macau University of Science and Technology, Macau; Department of Anaesthesiology, HOSPITAL (T.C.M) AFFILIATED TO SOUTHWEST MEDICAL UNIVERSITY), Lu Zhou, (646000), Sichuan, China.
- Ping Liu
  - Department of Anaesthesiology, HOSPITAL (T.C.M) AFFILIATED TO SOUTHWEST MEDICAL UNIVERSITY), Lu Zhou, (646000), Sichuan, China.
- Li-Jia Tang
  - Department of Anaesthesiology, HOSPITAL (T.C.M) AFFILIATED TO SOUTHWEST MEDICAL UNIVERSITY), Lu Zhou, (646000), Sichuan, China.
- Pei-Min Lin
  - Department of Anaesthesiology, HOSPITAL (T.C.M) AFFILIATED TO SOUTHWEST MEDICAL UNIVERSITY), Lu Zhou, (646000), Sichuan, China.
- Run Li
  - Department of Anaesthesiology, HOSPITAL (T.C.M) AFFILIATED TO SOUTHWEST MEDICAL UNIVERSITY), Lu Zhou, (646000), Sichuan, China.
- Huai-Rong Luo
  - State Key Laboratories for Quality Research in Chinese Medicines, Faculty of Pharmacy, Macau University of Science and Technology, Macau.
- Pei Luo
  - State Key Laboratories for Quality Research in Chinese Medicines, Faculty of Pharmacy, Macau University of Science and Technology, Macau.
|
17
|
Zulfiqar H, Ahmed Z, Kissanga Grace-Mercure B, Hassan F, Zhang ZY, Liu F. Computational prediction of promotors in Agrobacterium tumefaciens strain C58 by using the machine learning technique. Front Microbiol 2023; 14:1170785. [PMID: 37125199 PMCID: PMC10133480 DOI: 10.3389/fmicb.2023.1170785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 03/17/2023] [Indexed: 05/02/2023]
Abstract
Promoters are genomic regions upstream of genes that are bound by RNA polymerase to start gene transcription. Because the promoter is the most critical element of gene expression, the recognition of promoters is crucial to understanding the regulation of gene expression. This study aimed to develop a machine learning-based model to predict promoters in Agrobacterium tumefaciens (A. tumefaciens) strain C58. In the model, promoter sequences were encoded by three different kinds of feature descriptors, namely, accumulated nucleotide frequency, k-mer nucleotide composition, and binary encodings. The obtained features were optimized by using correlation and the mRMR-based algorithm. These optimized features were input into a random forest (RF) classifier to discriminate promoter sequences from non-promoter sequences in A. tumefaciens strain C58. Ten-fold cross-validation showed that the proposed model could yield an overall accuracy of 0.837. This model will help in the study of promoters in A. tumefaciens strain C58.
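Of the three descriptors listed in this abstract, k-mer nucleotide composition is the simplest to illustrate. The sketch below assumes the usual definition (normalized frequency of each length-k substring over the A/C/G/T alphabet); the function name, the default `k`, and the handling of ambiguous bases are assumptions, not details taken from the paper.

```python
from itertools import product


def kmer_composition(sequence, k=2, alphabet="ACGT"):
    """Return the normalized frequency of every possible k-mer, in
    lexicographic order of the alphabet. Windows containing characters
    outside the alphabet (e.g. 'N') are skipped."""
    seq = sequence.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)  # sequence shorter than k or all ambiguous
    return [counts[m] / total for m in kmers]
```

For `k=2` this yields a 16-dimensional vector per sequence, which could then be passed to feature selection and a classifier such as the random forest mentioned above.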
Affiliation(s)
- Hasan Zulfiqar
  - Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zahoor Ahmed
  - Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
- Bakanina Kissanga Grace-Mercure
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Farwa Hassan
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zhao-Yue Zhang
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Fen Liu
  - Department of Radiation Oncology, Peking University Cancer Hospital (Inner Mongolia Campus), Affiliated Cancer Hospital of Inner Mongolia Medical University, Inner Mongolia Cancer Hospital, Hohhot, China
|