1
Lu Q, Zhou Z, Wang Q. Multi-layer graph attention neural networks for accurate drug-target interaction mapping. Sci Rep 2024; 14:26119. [PMID: 39478027 PMCID: PMC11525987 DOI: 10.1038/s41598-024-75742-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Accepted: 10/08/2024] [Indexed: 11/02/2024] Open
Abstract
In drug discovery and repurposing, precise prediction of drug-target interactions (DTIs) is paramount. This study introduces a DTI prediction approach, the Multi-Layer Graph Attention Neural Network (MLGANN), a computational framework that uses multi-source information to improve prediction accuracy. MLGANN constructs a multi-layer DTI network that captures both the direct interactions between drugs and targets and their multi-level information, and it combines Graph Convolutional Networks (GCN) with a self-attention mechanism to integrate diverse data sources. In comparative experiments, the method significantly outperformed existing approaches, underscoring its potential to improve the efficiency and accuracy of DTI prediction. More importantly, this study highlights the importance of considering multi-source data and network heterogeneity in the drug discovery process, offering new perspectives and tools for future pharmaceutical research.
Affiliation(s)
- Qianwen Lu
  - SDU-ANU Joint Science College, Shandong University, Weihai, 264209, Shandong, China
- Zhiheng Zhou
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, 100190, China
- Qi Wang
  - College of Science, China Agricultural University, Beijing, 100083, China
2
Tao W, Lin X, Liu Y, Zeng L, Ma T, Cheng N, Jiang J, Zeng X, Yuan S. Bridging chemical structure and conceptual knowledge enables accurate prediction of compound-protein interaction. BMC Biol 2024; 22:248. [PMID: 39468510 PMCID: PMC11520867 DOI: 10.1186/s12915-024-02049-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Accepted: 10/17/2024] [Indexed: 10/30/2024] Open
Abstract
BACKGROUND Accurate prediction of compound-protein interaction (CPI) plays a crucial role in drug discovery. Existing data-driven methods aim to learn from the chemical structures of compounds and proteins yet ignore the conceptual knowledge that is the interrelationships among the fundamental elements in the biomedical knowledge graph (KG). Knowledge graphs provide a comprehensive view of entities and relationships beyond individual compounds and proteins. They encompass a wealth of information like pathways, diseases, and biological processes, offering a richer context for CPI prediction. This contextual information can be used to identify indirect interactions, infer potential relationships, and improve prediction accuracy. In real-world applications, the prevalence of knowledge-missing compounds and proteins is a critical barrier for injecting knowledge into data-driven models. RESULTS Here, we propose BEACON, a data and knowledge dual-driven framework that bridges chemical structure and conceptual knowledge for CPI prediction. The proposed BEACON learns the consistent representations by maximizing the mutual information between chemical structure and conceptual knowledge and predicts the missing representations by minimizing their conditional entropy. BEACON achieves state-of-the-art performance on multiple datasets compared to competing methods, notably with 5.1% and 6.6% performance gain on the BIOSNAP and DrugBank datasets, respectively. Moreover, BEACON is the only approach capable of effectively predicting knowledge representations for knowledge-lacking compounds and proteins. CONCLUSIONS Overall, our work provides a general approach for directly injecting conceptual knowledge to enhance the performance of CPI prediction.
Affiliation(s)
- Wen Tao
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
- Xuan Lin
  - School of Computer Science, Xiangtan University, Xiangtan, 411105, Hunan, China
  - Laboratory of Intelligent Computing and Information Processing, Ministry of Education (Xiangtan University), Xiangtan, 411105, Hunan, China
- Yuansheng Liu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
  - Laboratory of Intelligent Computing and Information Processing, Ministry of Education (Xiangtan University), Xiangtan, 411105, Hunan, China
- Li Zeng
  - Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai, 201109, Shanghai, China
- Tengfei Ma
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
- Ning Cheng
  - School of Informatics, Hunan University of Chinese Medicine, Changsha, 410208, Hunan, China
- Jing Jiang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
- Sisi Yuan
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, 28223, NC, USA
3
Abdollahi S, Schaub DP, Barroso M, Laubach NC, Hutwelker W, Panzer U, Gersting SW, Bonn S. A comprehensive comparison of deep learning-based compound-target interaction prediction models to unveil guiding design principles. J Cheminform 2024; 16:118. [PMID: 39468635 PMCID: PMC11520803 DOI: 10.1186/s13321-024-00913-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Accepted: 10/10/2024] [Indexed: 10/30/2024] Open
Abstract
The evaluation of compound-target interactions (CTIs) is at the heart of drug discovery efforts. Given the substantial time and monetary costs of classical experimental screening, significant efforts have been dedicated to developing deep learning-based models that can accurately predict CTIs. A comprehensive comparison of these models on a large, curated CTI dataset is, however, still lacking. Here, we perform an in-depth comparison of 12 state-of-the-art deep learning architectures that use different protein and compound representations. The models were selected for their reported performance and architectures. To reliably compare model performance, we curated over 300 thousand binding and non-binding CTIs and established several gold-standard datasets of varying size and information. Based on our findings, DeepConv-DTI consistently outperforms other models in CTI prediction performance across the majority of datasets. It achieves an MCC of 0.6 or higher for most of the datasets and is one of the fastest models in training and inference. These results indicate that utilizing convolutional-based windows as in DeepConv-DTI to traverse trainable embeddings is a highly effective approach for capturing informative protein features. We also observed that physicochemical embeddings of targets increased model performance. We therefore modified DeepConv-DTI to include normalized physicochemical properties, which resulted in the overall best performing model Phys-DeepConv-DTI. This work highlights how the systematic evaluation of input features of compounds and targets, as well as their corresponding neural network architectures, can serve as a roadmap for the future development of improved CTI models.
Scientific contribution: This work features comprehensive CTI datasets to allow for the objective comparison and benchmarking of CTI prediction algorithms. Based on this dataset, we gained insights into which embeddings of compounds and targets and which deep learning-based algorithms perform best, providing a blueprint for the future development of CTI algorithms. Using the insights gained from this screen, we provide a novel CTI algorithm with state-of-the-art performance.
Affiliation(s)
- Sina Abdollahi
  - Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, 20251, Germany
- Darius P Schaub
  - Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, 20251, Germany
  - III. Department of Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, 20251, Germany
- Madalena Barroso
  - University Children's Research, UCR@Kinder-UKE, University Medical Center Hamburg-Eppendorf, Hamburg, 20251, Germany
- Nora C Laubach
  - University Children's Research, UCR@Kinder-UKE, University Medical Center Hamburg-Eppendorf, Hamburg, 20251, Germany
- Wiebke Hutwelker
  - University Children's Research, UCR@Kinder-UKE, University Medical Center Hamburg-Eppendorf, Hamburg, 20251, Germany
- Ulf Panzer
  - III. Department of Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, 20251, Germany
  - Hamburg Center for Translational Immunology (HCTI), University Medical Center Hamburg-Eppendorf, Hamburg, 20251, Germany
- Søren W Gersting
  - University Children's Research, UCR@Kinder-UKE, University Medical Center Hamburg-Eppendorf, Hamburg, 20251, Germany
- Stefan Bonn
  - Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, 20251, Germany
  - Hamburg Center for Translational Immunology (HCTI), University Medical Center Hamburg-Eppendorf, Hamburg, 20251, Germany
  - Center for Biomedical AI, University Medical Center Hamburg-Eppendorf, Hamburg, 20251, Germany
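The abstract of Abdollahi et al. reports model quality as a Matthews correlation coefficient (MCC). As a reminder of what that number measures, here is a minimal sketch of MCC for binary binding/non-binding labels. The function name and the toy labels are illustrative assumptions and are not taken from the paper or its code.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels (1 = binding, 0 = non-binding).

    Illustrative helper; returns 0.0 when the denominator is zero,
    which is a common convention for degenerate confusion matrices.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical labels for eight compound-target pairs.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(matthews_corrcoef(y_true, y_pred))  # 0.5 (TP=3, TN=3, FP=1, FN=1)
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when binding and non-binding pairs are imbalanced.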
4
Zuo Y, Wu X, Ge F, Yan H, Fei S, Liang J, Deng Z. Research Progress on Drug-Target Interactions in the Last Five Years. Anal Biochem 2024:115691. [PMID: 39455038 DOI: 10.1016/j.ab.2024.115691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 10/06/2024] [Accepted: 10/16/2024] [Indexed: 10/28/2024]
Abstract
The identification of drug-target interactions (DTIs) is an important step in drug discovery and drug repositioning. However, the high cost of experimental validation limits large-scale identification. In contrast, computation-based approaches are both economical and efficient. This review first synthesizes existing chemical genomic approaches, provides a comprehensive summary of prevalent databases for predicting DTIs, and categorizes the feature encodings from recent years. This is followed by an overview and brief description of the methods currently in use for predicting DTIs. The strengths and weaknesses of prediction methods proposed in the last five years (2020-2024), including those based on network representation learning and graph neural networks, are then discussed in detail, with an evaluation of the performance of the different methods on a wide range of datasets. Finally, this review explores potential directions for future DTI research, emphasizing how to improve prediction accuracy and efficiency by combining big data and emerging computing technologies.
Affiliation(s)
- Yun Zuo
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China
- Xubin Wu
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China
- Fei Ge
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China
- Hongjin Yan
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China
- Sirui Fei
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China
- Jingwen Liang
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China
- Zhaohong Deng
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China
5
Xu K, Wang M, Zou X, Liu J, Wei A, Chen J, Tang C. HSTrans: Homogeneous substructures transformer for predicting frequencies of drug-side effects. Neural Netw 2024; 181:106779. [PMID: 39488108 DOI: 10.1016/j.neunet.2024.106779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 08/29/2024] [Accepted: 10/01/2024] [Indexed: 11/04/2024]
Abstract
Identifying the frequencies of drug-side effects is crucial for assessing drug risk-benefit. However, accurately determining these frequencies remains challenging due to the limitations of time and scale in clinical randomized controlled trials. As a result, several computational methods have been proposed to address these issues. Nonetheless, two primary problems still persist. Firstly, most of these methods face challenges in generating accurate predictions for novel drugs, as they heavily depend on the interaction graph between drugs and side effects (SEs) within their modeling framework. Secondly, some previous methods often simply concatenate the features of drugs and SEs, which fails to effectively capture their underlying association. In this work, we present HSTrans, a novel approach that treats drugs and SEs as sets of substructures, leveraging a transformer encoder for unified substructure embedding and incorporating an interaction module for association capture. Specifically, HSTrans extracts drug substructures through a specialized algorithm and identifies effective substructures for each SE by employing an indicator that measures the importance of each substructure and SE. Additionally, HSTrans applies convolutional neural network (CNN) in the interaction module to capture complex relationships between drugs and SEs. Experimental results on datasets from Galeano et al.'s study demonstrate that the proposed method outperforms other state-of-the-art approaches. The demo codes for HSTrans are available at https://github.com/Dtdtxuky/HSTrans/tree/master.
Affiliation(s)
- Kaiyi Xu
  - School of Computer Science, China University of Geosciences, Wuhan 430074, China
- Minhui Wang
  - Department of Pharmacy, Lianshui People's Hospital Affiliated to Kangda College of Nanjing Medical University, Huai'an 223300, China
- Xin Zou
  - School of Computer Science, China University of Geosciences, Wuhan 430074, China
- Jingjing Liu
  - Department of Cardiac Surgery, Tianjin Chest Hospital, Tianjin 300222, China
- Ao Wei
  - Department of Cardiology, Tianjin Chest Hospital, Tianjin 300222, China
- Jiajia Chen
  - Department of Pharmacy, The Affiliated Huai'an Hospital of Xuzhou Medical University and The Second People's Hospital of Huai'an, Huai'an 223002, China
- Chang Tang
  - School of Computer Science, China University of Geosciences, Wuhan 430074, China
6
Li Y, Zhang X, Chen Z, Yang H, Liu Y, Wang H, Yan T, Xiang J, Wang B. Accurate prediction of drug-target interactions in Chinese and western medicine by the CWI-DTI model. Sci Rep 2024; 14:25054. [PMID: 39443630 PMCID: PMC11499656 DOI: 10.1038/s41598-024-76367-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Accepted: 10/14/2024] [Indexed: 10/25/2024] Open
Abstract
Accurate prediction of drug-target interactions (DTIs) is crucial for advancing drug discovery and repurposing. Computational methods have significantly improved the efficiency of experimental predictions for drug-target interactions in Western medicine. However, accurately predicting the complex relationships between Chinese medicine ingredients and targets remains a formidable challenge due to the vast number and high heterogeneity of these ingredients. In this study, we introduce the CWI-DTI method, which achieves high-accuracy prediction of DTIs using a large dataset of interactive relationships of drug ingredients or candidate targets. Moreover, we present a novel dataset to evaluate the prediction accuracy of both Chinese and Western medicine. Through meticulous collection and preprocessing of data on ingredients and targets, we employ an innovative autoencoder framework to fuse multiple drug (target) topological similarity matrices. Additionally, we employ denoising blocks, sparse blocks, and stacked blocks to extract crucial features from the similarity matrix, reducing noise and enhancing accuracy across diverse datasets. Our results indicate that the CWI-DTI model shows improved performance compared to several existing state-of-the-art methods on the datasets tested in both Western and Chinese medicine databases. The findings of this study hold immense promise for advancing DTI prediction in Chinese and Western medicine, thus fostering more efficient drug discovery and repurposing endeavors. Our model is available at https://github.com/WANG-BIN-LAB/CWIDTI .
Affiliation(s)
- Ying Li
  - Department of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China
- Xingyu Zhang
  - Department of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China
- Zhuo Chen
  - Department of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China
- Hongye Yang
  - Department of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China
- Yuhui Liu
  - Department of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China
- Huiqing Wang
  - Department of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China
- Ting Yan
  - Department of Pathology, Shanxi Key Laboratory of Carcinogenesis and Translational Research on Esophageal Cancer, Shanxi Medical University, Taiyuan, China
- Jie Xiang
  - Department of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China
- Bin Wang
  - Department of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China
7
Michels J, Bandarupalli R, Akbari AA, Le T, Xiao H, Li J, Hom EFY. Natural Language Processing Methods for the Study of Protein-Ligand Interactions. ARXIV 2024:arXiv:2409.13057v2. [PMID: 39483353 PMCID: PMC11527106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2024]
Abstract
Natural Language Processing (NLP) has revolutionized the way computers are used to study and interact with human languages and is increasingly influential in the study of protein and ligand binding, which is critical for drug discovery and development. This review examines how NLP techniques have been adapted to decode the "language" of proteins and small molecule ligands to predict protein-ligand interactions (PLIs). We discuss how methods such as long short-term memory (LSTM) networks, transformers, and attention mechanisms can leverage different protein and ligand data types to identify potential interaction patterns. Significant challenges are highlighted, including the scarcity of high-quality negative data, difficulties in interpreting model decisions, and sampling biases of existing datasets. We argue that focusing on improving data quality, enhancing model robustness, and fostering both collaboration and competition could catalyze future advances in machine-learning-based predictions of PLIs.
Affiliation(s)
- James Michels
  - Department of Computer Science, University of Mississippi, University, MS
- Ramya Bandarupalli
  - Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, MS
- Amin Ahangar Akbari
  - Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, MS
- Thai Le
  - Department of Computer Science, Indiana University, Bloomington, IN
- Hong Xiao
  - Department of Computer Science, University of Mississippi, University, MS
- Jing Li
  - Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, MS
- Erik F Y Hom
  - Department of Biology and Center for Biodiversity and Conservation Research, University of Mississippi, University, MS
8
Liu G, Seal S, Arevalo J, Liang Z, Carpenter AE, Jiang M, Singh S. Learning Molecular Representation in a Cell. ARXIV 2024:arXiv:2406.12056v3. [PMID: 38947938 PMCID: PMC11213146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Predicting drug efficacy and safety in vivo requires information on biological responses (e.g., cell morphology and gene expression) to small molecule perturbations. However, current molecular representation learning methods do not provide a comprehensive view of cell states under these perturbations and struggle to remove noise, hindering model generalization. We introduce the Information Alignment (InfoAlign) approach to learn molecular representations through the information bottleneck method in cells. We integrate molecules and cellular response data as nodes into a context graph, connecting them with weighted edges based on chemical, biological, and computational criteria. For each molecule in a training batch, InfoAlign optimizes the encoder's latent representation with a minimality objective to discard redundant structural information. A sufficiency objective decodes the representation to align with different feature spaces from the molecule's neighborhood in the context graph. We demonstrate that the proposed sufficiency objective for alignment is tighter than existing encoder-based contrastive methods. Empirically, we validate representations from InfoAlign in two downstream applications: molecular property prediction against up to 27 baseline methods across four datasets, plus zero-shot molecule-morphology matching.
9
Tang X, Zhou Y, Yang M, Li W. TC-DTA: Predicting Drug-Target Binding Affinity With Transformer and Convolutional Neural Networks. IEEE Trans Nanobioscience 2024; 23:572-578. [PMID: 39133595 DOI: 10.1109/tnb.2024.3441590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2024]
Abstract
Bioinformatics is a rapidly evolving field that applies computational methods to analyze and interpret biological data. A key task in bioinformatics is identifying novel drug-target interactions (DTIs), which plays a crucial role in drug discovery. Most computational approaches treat DTI prediction as a binary classification problem, determining whether drug-target pairs interact. However, with the growing availability of drug-target binding affinity data, this binary task can be reframed as a regression problem focused on drug-target affinity (DTA). DTA quantifies the strength of drug-target binding, offering more detailed insights than DTI and serving as a valuable tool for virtual screening in drug discovery. Accurately predicting compound interactions with targets can accelerate the drug development process. In this study, we introduce a deep learning model named TC-DTA for DTA prediction, leveraging convolutional neural networks (CNN) and the encoder module of the transformer architecture. We begin by extracting raw drug SMILES strings and protein amino acid sequences from the dataset, which are then represented using various encoding methods. Subsequently, we employ CNN and the transformer's encoder module to extract features from the drug SMILES strings and protein sequences, respectively. Finally, the feature information is concatenated and input into a multi-layer perceptron to predict binding affinity scores. We evaluated our model on two benchmark DTA datasets, Davis and KIBA, comparing it with methods such as KronRLS, SimBoost, and DeepDTA. Our model, TC-DTA, outperformed these baseline methods based on evaluation metrics like Mean Squared Error (MSE), Concordance Index (CI), and Regression towards the Mean Index (r_m^2). These results highlight the effectiveness of the Transformer's encoder and CNN in extracting meaningful representations from sequences, thereby enhancing DTA prediction accuracy. This deep learning model can accelerate drug discovery by identifying drug candidates with high binding affinity to specific targets. Compared to traditional methods, machine learning technology offers a more effective and efficient approach to drug discovery.
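The TC-DTA abstract evaluates affinity regression with Mean Squared Error (MSE) and the Concordance Index (CI). The sketch below shows one common way to compute both for a small set of predictions. The function names and the affinity values are hypothetical, and the tie handling (0.5 credit for tied predictions) is a widely used convention that the abstract does not specify.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true affinity are skipped; pairs with equal
    predictions receive half credit (an assumed convention).
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # not comparable: no true ordering to recover
            comparable += 1
            diff_true = y_true[i] - y_true[j]
            diff_pred = y_pred[i] - y_pred[j]
            if diff_pred == 0:
                concordant += 0.5
            elif (diff_true > 0) == (diff_pred > 0):
                concordant += 1.0
    return concordant / comparable if comparable else 0.0


# Hypothetical measured and predicted affinities for four drug-target pairs.
measured = [5.0, 6.2, 7.1, 8.4]
predicted = [5.3, 6.0, 7.5, 7.2]
print(mean_squared_error(measured, predicted))
print(concordance_index(measured, predicted))  # 5 of 6 pairs are ordered correctly
```

MSE penalizes the size of each error, while CI only checks ranking, which is why the two are usually reported together: a model can rank candidates well for virtual screening even when its absolute affinity estimates are off.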
10
Qian Y, Li X, Wu J, Zhang Q. MMCL-CPI: A multi-modal compound-protein interaction prediction model incorporating contrastive learning pre-training. Comput Biol Chem 2024; 112:108137. [PMID: 39079285 DOI: 10.1016/j.compbiolchem.2024.108137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 05/31/2024] [Accepted: 06/20/2024] [Indexed: 09/13/2024]
Abstract
MOTIVATION Compound-protein interaction (CPI) prediction plays a crucial role in drug discovery and drug repositioning. Early researchers relied on time-consuming and labor-intensive wet laboratory experiments. However, the advent of deep learning has significantly accelerated this process. Most existing deep learning methods utilize deep neural networks to extract compound features from sequences and graphs, either separately or in combination. Our team's previous research has demonstrated that compound images contain valuable information that can be leveraged for the CPI task. However, there is a scarcity of multimodal methods that effectively combine sequence and image representations of compounds in CPI. Currently, the use of text-image pairs for contrastive language-image pre-training is a popular approach in the multimodal field. Further research is needed to explore how the integration of sequence and image representations can enhance the accuracy of the CPI task. RESULTS This paper presents a novel method called MMCL-CPI, which has two key highlights: 1) We propose extracting compound features from two modalities: one-dimensional SMILES and two-dimensional images. This approach enables us to capture both sequence and spatial features, enhancing the prediction accuracy for CPI. Based on this, we design a novel multimodal model. 2) We introduce a multimodal pre-training strategy that leverages contrastive learning on a large-scale unlabeled dataset to establish the correspondence between a compound's SMILES string and its image. This pre-training approach significantly improves compound feature representations for the downstream CPI task. Our method has shown competitive results on multiple datasets.
Affiliation(s)
- Ying Qian
  - School of Computer Science and Technology, Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, East China Normal University, Shanghai, China
- Xinyi Li
  - School of Computer Science and Technology, Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, East China Normal University, Shanghai, China
- Jian Wu
  - School of Computer Science and Technology, Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, East China Normal University, Shanghai, China
- Qian Zhang
  - School of Computer Science and Technology, Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, East China Normal University, Shanghai, China
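The MMCL-CPI abstract describes pre-training that matches each compound's SMILES representation with its image, in the spirit of contrastive language-image pre-training. The abstract does not give the loss, so the following is only a sketch of a generic symmetric contrastive (InfoNCE-style) objective under that assumption. The function names, the temperature value, and the two-dimensional toy embeddings are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(smiles_emb, image_emb, temperature=0.1):
    """Average of SMILES-to-image and image-to-SMILES contrastive losses.

    Row i of each input is assumed to describe the same compound.
    """
    s = [l2_normalize(v) for v in smiles_emb]
    g = [l2_normalize(v) for v in image_emb]
    logits = [
        [sum(a * b for a, b in zip(si, gj)) / temperature for gj in g]
        for si in s
    ]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


# Toy batch of two compounds with 2-D embeddings (illustrative values only).
smiles_batch = [[1.0, 0.0], [0.0, 1.0]]
matched_images = [[1.0, 0.0], [0.0, 1.0]]
swapped_images = [[0.0, 1.0], [1.0, 0.0]]
print(symmetric_contrastive_loss(smiles_batch, matched_images))
print(symmetric_contrastive_loss(smiles_batch, swapped_images))
```

The loss is small when each SMILES embedding is closest to its own compound's image embedding and large when the pairing is swapped, which is the signal that pulls the two modalities into a shared space during pre-training.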
11
Sun D, Macedonia C, Chen Z, Chandrasekaran S, Najarian K, Zhou S, Cernak T, Ellingrod VL, Jagadish HV, Marini B, Pai M, Violi A, Rech JC, Wang S, Li Y, Athey B, Omenn GS. Can Machine Learning Overcome the 95% Failure Rate and Reality that Only 30% of Approved Cancer Drugs Meaningfully Extend Patient Survival? J Med Chem 2024; 67:16035-16055. [PMID: 39253942 DOI: 10.1021/acs.jmedchem.4c01684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
Despite implementing hundreds of strategies, cancer drug development suffers from a 95% failure rate over 30 years, with only 30% of approved cancer drugs extending patient survival beyond 2.5 months. Adding more criteria without eliminating nonessential ones is impractical and may fall into the "survivorship bias" trap. Machine learning (ML) models may enhance efficiency by saving time and cost. Yet, they may not improve success rate without identifying the root causes of failure. We propose a "STAR-guided ML system" (structure-tissue/cell selectivity-activity relationship) to enhance success rate and efficiency by addressing three overlooked interdependent factors: potency/specificity to the on/off-targets determining efficacy in tumors at clinical doses, on/off-target-driven tissue/cell selectivity influencing adverse effects in the normal organs at clinical doses, and optimal clinical doses balancing efficacy/safety as determined by potency/specificity and tissue/cell selectivity. STAR-guided ML models can directly predict clinical dose/efficacy/safety from five features to design/select the best drugs, enhancing success and efficiency of cancer drug development.
Affiliation(s)
- Zhigang Chen
  - LabBotics.ai, Palo Alto, California 94303, United States
- Simon Zhou
  - Aurinia Pharmaceuticals Inc., Rockville, Maryland 20850, United States
- Yan Li
  - Translational Medicine and Clinical Pharmacology, Bristol Myers Squibb, Summit, New Jersey 07901, United States
12
E U, T M, A V G, D P. A comprehensive survey of drug-target interaction analysis in allopathy and siddha medicine. Artif Intell Med 2024; 157:102986. [PMID: 39326289 DOI: 10.1016/j.artmed.2024.102986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 08/13/2024] [Accepted: 09/18/2024] [Indexed: 09/28/2024]
Abstract
Effective drug delivery is the cornerstone of modern healthcare, ensuring therapeutic compounds reach their intended targets efficiently. This paper explores the potential of personalized and holistic healthcare, driven by the synergy between traditional and allopathic medicine systems, with a specific focus on the vast reservoir of medicinal compounds found in plants rooted in the historical legacy of traditional medicine. Motivated by the desire to unlock the therapeutic potential of medicinal plants and bridge the gap between traditional and allopathic medicine, this survey delves into in-silico computational approaches for studying Drug-Target Interactions (DTI) within the contexts of allopathy and siddha medicine. The contributions of this survey are multifaceted: it offers a comprehensive overview of in-silico methods for DTI analysis in both systems, identifies common challenges in DTI studies, provides insights into future directions to advance DTI analysis, and includes a comparative analysis of DTI in allopathy and siddha medicine. The findings of this survey highlight the pivotal role of in-silico computational approaches in advancing drug research and development in both allopathy and siddha medicine, emphasizing the importance of integrating these methods to drive the future of personalized healthcare.
Affiliation(s)
- Uma E
  - Department of Information Science and Technology, College of Engineering Guindy, Chennai, India
- Mala T
  - Department of Information Science and Technology, College of Engineering Guindy, Chennai, India
- Geetha A V
  - Department of Information Science and Technology, College of Engineering Guindy, Chennai, India
- Priyanka D
  - Department of Information Science and Technology, College of Engineering Guindy, Chennai, India

13
Guichaoua G, Pinel P, Hoffmann B, Azencott CA, Stoven V. Drug-Target Interactions Prediction at Scale: The Komet Algorithm with the LCIdb Dataset. J Chem Inf Model 2024; 64:6938-6956. [PMID: 39237105 PMCID: PMC11423346 DOI: 10.1021/acs.jcim.4c00422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2024]
Abstract
Drug-target interactions (DTIs) prediction algorithms are used at various stages of the drug discovery process. In this context, specific problems such as deorphanization of a new therapeutic target or target identification of a drug candidate arising from phenotypic screens require large-scale predictions across the protein and molecule spaces. DTI prediction heavily relies on supervised learning algorithms that use known DTIs to learn associations between molecule and protein features, allowing for the prediction of new interactions based on learned patterns. The algorithms must be broadly applicable to enable reliable predictions, even in regions of the protein or molecule spaces where data may be scarce. In this paper, we address two key challenges to fulfill these goals: building large, high-quality training datasets and designing prediction methods that can scale, in order to be trained on such large datasets. First, we introduce LCIdb, a curated, large-sized dataset of DTIs, offering extensive coverage of both the molecule and druggable protein spaces. Notably, LCIdb contains a much higher number of molecules than publicly available benchmarks, expanding coverage of the molecule space. Second, we propose Komet (Kronecker Optimized METhod), a DTI prediction pipeline designed for scalability without compromising performance. Komet leverages a three-step framework, incorporating efficient computation choices tailored for large datasets and involving the Nyström approximation. Specifically, Komet employs a Kronecker interaction module for (molecule, protein) pairs, which efficiently captures determinants in DTIs, and whose structure allows for reduced computational complexity and quasi-Newton optimization, ensuring that the model can handle large training sets, without compromising on performance. Our method is implemented in open-source software, leveraging GPU parallel computation for efficiency. 
We demonstrate the interest of our pipeline on various datasets, showing that Komet displays superior scalability and prediction performance compared to state-of-the-art deep learning approaches. Additionally, we illustrate the generalization properties of Komet by showing its performance on an external dataset, and on the publicly available LH benchmark designed for scaffold hopping problems. Komet is available open source at https://komet.readthedocs.io and all datasets, including LCIdb, can be found at https://zenodo.org/records/10731712.
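The Kronecker interaction module described in this abstract builds on a standard identity: the inner product of two Kronecker-product feature vectors factorizes into the product of the molecule inner product and the protein inner product, so the large pairwise vector never has to be materialized. The following is a minimal sketch of that identity only, not the Komet implementation; the feature values and the helper names `kron` and `dot` are illustrative assumptions.

```python
from itertools import product


def kron(u, v):
    """Kronecker product of two vectors, returned as a flat list."""
    return [a * b for a, b in product(u, v)]


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical low-dimensional features (illustrative values, not real data).
mol_a, prot_a = [1.0, 2.0], [0.5, -1.0, 3.0]
mol_b, prot_b = [0.0, 4.0], [2.0, 1.0, 1.0]

# Explicit route: build both pair vectors (size 2 * 3 = 6), then take the dot product.
explicit = dot(kron(mol_a, prot_a), kron(mol_b, prot_b))

# Factored route: never build the pair vectors at all.
factored = dot(mol_a, mol_b) * dot(prot_a, prot_b)
```

Both routes give the same value, but the factored one costs the sum of the two feature dimensions rather than their product, which is the kind of saving that makes Kronecker-structured models attractive for large training sets.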
Affiliation(s)
- Gwenn Guichaoua
  - Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
  - Institut Curie, Université PSL, 75005 Paris, France
  - INSERM U900, 75005 Paris, France
- Philippe Pinel
  - Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
  - Institut Curie, Université PSL, 75005 Paris, France
  - INSERM U900, 75005 Paris, France
  - Iktos SAS, 75017 Paris, France
- Chloé-Agathe Azencott
  - Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
  - Institut Curie, Université PSL, 75005 Paris, France
  - INSERM U900, 75005 Paris, France
- Véronique Stoven
  - Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
  - Institut Curie, Université PSL, 75005 Paris, France
  - INSERM U900, 75005 Paris, France

14
Hashemi M, Zabihian A, Hajsaeedi M, Hooshmand M. Antivirals for monkeypox virus: Proposing an effective machine/deep learning framework. PLoS One 2024; 19:e0299342. [PMID: 39264896 DOI: 10.1371/journal.pone.0299342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Accepted: 07/07/2024] [Indexed: 09/14/2024] Open
Abstract
Monkeypox virus (MPXV) is an infectious virus that has caused morbidity and mortality in recent years. Despite its danger to public health, there is no approved drug to treat MPXV. Drug repurposing, which uses computational methods to screen approved drugs at low cost for emerging diseases and viruses, is therefore a promising approach to suggesting approved drugs for MPXV. This paper proposes a computational framework for MPXV antiviral prediction. To do this, we have generated a new virus-antiviral dataset. Moreover, we applied several machine learning methods and one deep learning method for virus-antiviral prediction. The drugs suggested by the learning methods have been investigated using docking studies. The target protein structure is modeled using homology modeling and then refined and validated. To the best of our knowledge, this is the first work to study deep learning methods for the prediction of MPXV antivirals. The screening results confirm that Tilorone, Valacyclovir, Ribavirin, Favipiravir, and Baloxavir marboxil are effective drugs for MPXV treatment.
Affiliation(s)
- Morteza Hashemi
  - Department of Computer Science, Institute for Advanced Studies in Basic Sciences, Zanjan, Iran
- Arash Zabihian
  - Department of QA, Kimia Zist Parsian Pharmaceutical Company, Zanjan, Iran
- Masih Hajsaeedi
  - Department of Computer Science, Institute for Advanced Studies in Basic Sciences, Zanjan, Iran
- Mohsen Hooshmand
  - Department of Computer Science, Institute for Advanced Studies in Basic Sciences, Zanjan, Iran

15
Majidifar S, Zabihian A, Hooshmand M. Combination therapy synergism prediction for virus treatment using machine learning models. PLoS One 2024; 19:e0309733. [PMID: 39231124 PMCID: PMC11373828 DOI: 10.1371/journal.pone.0309733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2024] [Accepted: 08/16/2024] [Indexed: 09/06/2024] Open
Abstract
Combining different drugs synergistically is an essential aspect of developing effective treatments. Although there is a plethora of research on computational prediction for new combination therapies, there is limited to no research on combination therapies in the treatment of viral diseases. This paper proposes AI-based models for predicting novel antiviral combinations to treat virus diseases synergistically. To do this, we assembled a comprehensive dataset comprising information on viral strains, drug compounds, and their known interactions. As far as we know, this is the first dataset and learning model on combination therapy for viruses. Our proposal includes using a random forest model, an SVM model, and a deep model to train viral combination therapy. The machine learning models showed the highest performance, and the predicted values were validated by a t-test, indicating the effectiveness of the proposed methods. One of the predicted combinations of acyclovir and ribavirin has been experimentally confirmed to have a synergistic antiviral effect against herpes simplex type-1 virus, as described in the literature.
Affiliation(s)
- Shayan Majidifar
  - Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran
- Arash Zabihian
  - Department of QA, Kimia Zist Parsian Pharmaceutical Company, Zanjan, Iran
- Mohsen Hooshmand
  - Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran

16
Ahmed KT, Ansari MI, Zhang W. DTI-LM: language model powered drug-target interaction prediction. Bioinformatics 2024; 40:btae533. [PMID: 39221997 PMCID: PMC11520403 DOI: 10.1093/bioinformatics/btae533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 08/05/2024] [Accepted: 08/29/2024] [Indexed: 09/04/2024] Open
Abstract
MOTIVATION The identification and understanding of drug-target interactions (DTIs) play a pivotal role in the drug discovery and development process. Sequence representations of drugs and proteins in computational models offer advantages such as their widespread availability, easier input quality control, and reduced computational resource requirements. These make them efficient and accessible tools for various computational biology and drug discovery applications. Many sequence-based DTI prediction methods have been developed over the years. Despite the advancement in methodology, cold start DTI prediction involving an unknown drug or protein remains a challenging task, particularly for sequence-based models. Introducing DTI-LM, a novel framework leveraging advanced pretrained language models, we harness their exceptional context-capturing abilities along with neighborhood information to predict DTIs. DTI-LM is specifically designed to rely solely on sequence representations for drugs and proteins, aiming to bridge the gap between warm start and cold start predictions. RESULTS Large-scale experiments on four datasets show that DTI-LM can achieve state-of-the-art performance on DTI predictions. Notably, it excels in overcoming the common challenges faced by sequence-based models in cold start predictions for proteins, yielding impressive results. The incorporation of neighborhood information through a graph attention network further enhances prediction accuracy. Nevertheless, a disparity persists between cold start predictions for proteins and drugs. A detailed examination of DTI-LM reveals that language models exhibit contrasting capabilities in capturing similarities between drugs and proteins. AVAILABILITY AND IMPLEMENTATION Source code is available at: https://github.com/compbiolabucf/DTI-LM.
Affiliation(s)
- Khandakar Tanvir Ahmed
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
  - Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, United States
- Md Istiaq Ansari
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
  - Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, United States
- Wei Zhang
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
  - Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, United States

17
Zhang L, Zeng W, Chen J, Chen J, Li K. ParaCPI: A Parallel Graph Convolutional Network for Compound-Protein Interaction Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1565-1578. [PMID: 38787671 DOI: 10.1109/tcbb.2024.3404889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2024]
Abstract
Identifying compound-protein interactions (CPIs) is critical in drug discovery, as accurate prediction of CPIs can remarkably reduce the time and cost of new drug development. The rapid growth of existing biological knowledge has opened up possibilities for leveraging known biological knowledge to predict unknown CPIs. However, existing CPI prediction models still fall short of meeting the needs of practical drug discovery applications. A novel parallel graph convolutional network model for CPI prediction (ParaCPI) is proposed in this study. This model constructs feature representation of compounds using a unique approach to predict unknown CPIs from known CPI data more effectively. Experiments are conducted on five public datasets, and the results are compared with current state-of-the-art (SOTA) models under three different experimental settings to evaluate the model's performance. In the three cold-start settings, ParaCPI achieves an average performance gain of 26.75%, 23.84%, and 14.68% in terms of area under the curve compared with the other SOTA models. In addition, the results of the experiments in the case study show ParaCPI's superior ability to predict unknown CPIs based on known data, with higher accuracy and stronger generalization compared with the SOTA models. Researchers can leverage ParaCPI to accelerate the drug discovery process.

18
Zhao L, Zhu Y, Wen N, Wang C, Wang J, Yuan Y. Drug-Target Binding Affinity Prediction in a Continuous Latent Space Using Variational Autoencoders. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1458-1467. [PMID: 38767996 DOI: 10.1109/tcbb.2024.3402661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Accurate prediction of Drug-Target binding Affinity (DTA) is a daunting yet pivotal task in the sphere of drug discovery. Over the years, a plethora of deep learning-based DTA models have emerged, rendering promising results in predicting the binding affinities between drugs and their target proteins. However, in contrast to the conventional approach of modeling binding affinity in vector spaces, we propose a more nuanced modeling process in a continuous space to account for the diversity of input samples. Initially, the drug is encoded using the Simplified Molecular Input Line Entry System (SMILES), while the target sequences are characterized via a pretrained language model. Subsequently, highly correlative information is extracted utilizing residual gated convolutional neural networks. In a departure from existing deep learning-based models, our model learns the hidden representations of the drugs and targets jointly. Instead of employing two vectors, our hidden representations consist of two Gaussian distributions. To validate the effectiveness of our proposal, we conducted evaluations on commonly utilized benchmark datasets. The experimental outcomes corroborated that our method surpasses the state-of-the-art vectorial representation methods in terms of performance. This approach, therefore, offers potential enhancements in the precision of DTA predictions, potentially contributing to more efficient drug discovery processes.
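Representing a drug or target as a Gaussian distribution rather than a single vector is the usual variational autoencoder setup: an encoder outputs a mean and a log-variance per latent dimension, a latent sample is drawn with the reparameterization trick, and a KL term keeps the distribution close to a standard normal. The sketch below shows those two textbook steps only. The function names and the diagonal-Gaussian assumption are illustrative; the abstract does not specify the paper's actual encoder or loss.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence from a diagonal Gaussian N(mu, sigma^2) to N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical encoder outputs for one drug (illustrative values only).
drug_mu, drug_log_var = [0.3, -0.7], [0.0, -1.0]
z_drug = sample_latent(drug_mu, drug_log_var, rng)
kl_drug = kl_to_standard_normal(drug_mu, drug_log_var)
```

A distribution with zero mean and unit variance incurs zero KL penalty, while any shift in the mean or change in the variance is penalized.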

19
Peng L, Liu X, Chen M, Liao W, Mao J, Zhou L. MGNDTI: A Drug-Target Interaction Prediction Framework Based on Multimodal Representation Learning and the Gating Mechanism. J Chem Inf Model 2024; 64:6684-6698. [PMID: 39137398 DOI: 10.1021/acs.jcim.4c00957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2024]
Abstract
Drug-Target Interaction (DTI) prediction facilitates acceleration of drug discovery and promotes drug repositioning. Most existing deep learning-based DTI prediction methods can better extract discriminative features for drugs and proteins, but they rarely consider multimodal features of drugs. Moreover, learning the interaction representations between drugs and targets needs further exploration. Here, we proposed a simple Multi-modal Gating Network for DTI prediction, MGNDTI, based on multimodal representation learning and the gating mechanism. MGNDTI first learns the sequence representations of drugs and targets using different retentive networks. Next, it extracts molecular graph features of drugs through a graph convolutional network. Subsequently, it devises a multimodal gating network to obtain the joint representations of drugs and targets. Finally, it builds a fully connected network for computing the interaction probability. MGNDTI was benchmarked against seven state-of-the-art DTI prediction models (CPI-GNN, TransformerCPI, MolTrans, BACPI, CPGL, GIFDTI, and FOTF-CPI) using four data sets (i.e., Human, C. elegans, BioSNAP, and BindingDB) under four different experimental settings. Through evaluation with AUROC, AUPRC, accuracy, F1 score, and MCC, MGNDTI significantly outperformed the above seven methods. MGNDTI is a powerful tool for DTI prediction, showcasing its superior robustness and generalization ability on diverse data sets and different experimental settings. It is freely available at https://github.com/plhhnu/MGNDTI.
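A gating mechanism for fusing two modalities can be illustrated generically: a sigmoid gate between 0 and 1 is computed from both inputs and decides, per dimension, how much of each modality enters the joint representation. The sketch below is a simplified element-wise version under assumed names (`gated_fusion`, `w_seq`, `w_graph`, `bias`); it is not taken from the MGNDTI code, whose gating network may be structured differently.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(seq_feat, graph_feat, w_seq, w_graph, bias):
    """Blend two feature vectors with a per-dimension learned gate."""
    fused = []
    for s, g, ws, wg, b in zip(seq_feat, graph_feat, w_seq, w_graph, bias):
        gate = sigmoid(ws * s + wg * g + b)  # gate value lies in (0, 1)
        fused.append(gate * s + (1.0 - gate) * g)
    return fused


# Hypothetical sequence and molecular-graph features for one drug.
seq_feat = [0.2, -1.0, 3.0]
graph_feat = [1.0, 0.5, -2.0]
zeros = [0.0, 0.0, 0.0]

# With all-zero parameters the gate is 0.5, so the output is a plain average.
even_mix = gated_fusion(seq_feat, graph_feat, zeros, zeros, zeros)

# A large positive bias pushes the gate toward 1, favoring the sequence features.
seq_heavy = gated_fusion(seq_feat, graph_feat, zeros, zeros, [20.0, 20.0, 20.0])
```

In a trained model the weights and bias would be learned, letting the network decide per feature which modality is more informative.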
Affiliation(s)
- Lihong Peng
  - College of Life Science and Chemistry, Hunan University of Technology, Zhuzhou, Hunan 412007, China
- Xin Liu
  - College of Life Science and Chemistry, Hunan University of Technology, Zhuzhou, Hunan 412007, China
- Min Chen
  - School of Computer Science and Engineering, Hunan Institute of Technology, Hengyang, Hunan 421002, China
- Wen Liao
  - School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan 412007, China
- Jiale Mao
  - School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan 412007, China
- Liqian Zhou
  - College of Life Science and Chemistry, Hunan University of Technology, Zhuzhou, Hunan 412007, China

20
Zeng X, Zhong KY, Meng PY, Li SJ, Lv SQ, Wen ML, Li Y. MvGraphDTA: multi-view-based graph deep model for drug-target affinity prediction by introducing the graphs and line graphs. BMC Biol 2024; 22:182. [PMID: 39183297 PMCID: PMC11346193 DOI: 10.1186/s12915-024-01981-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Accepted: 08/13/2024] [Indexed: 08/27/2024] Open
Abstract
BACKGROUND Accurately identifying drug-target affinity (DTA) plays a pivotal role in drug screening, design, and repurposing in pharmaceutical industry. It not only reduces the time, labor, and economic costs associated with biological experiments but also expedites drug development process. However, achieving the desired level of computational accuracy for DTA identification methods remains a significant challenge. RESULTS We proposed a novel multi-view-based graph deep model known as MvGraphDTA for DTA prediction. MvGraphDTA employed a graph convolutional network (GCN) to extract the structural features from original graphs of drugs and targets, respectively. It went a step further by constructing line graphs with edges as vertices based on original graphs of drugs and targets. GCN was also used to extract the relationship features within their line graphs. To enhance the complementarity between the extracted features from original graphs and line graphs, MvGraphDTA fused the extracted multi-view features of drugs and targets, respectively. Finally, these fused features were concatenated and passed through a fully connected (FC) network to predict DTA. CONCLUSIONS During the experiments, we performed data augmentation on all the training sets used. Experimental results showed that MvGraphDTA outperformed the competitive state-of-the-art methods on benchmark datasets for DTA prediction. Additionally, we evaluated the universality and generalization performance of MvGraphDTA on additional datasets. Experimental outcomes revealed that MvGraphDTA exhibited good universality and generalization capability, making it a reliable tool for drug-target interaction prediction.
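The line-graph construction mentioned in this abstract, where edges become vertices, follows a standard definition: each edge of the original graph is a node of the line graph, and two such nodes are connected when the original edges share an endpoint. A minimal sketch, assuming an undirected graph given as an edge list without duplicates; the `line_graph` helper and the toy bond list are illustrative and not taken from MvGraphDTA.

```python
from itertools import combinations


def line_graph(edges):
    """Return the line graph as {edge: set of adjacent edges}."""
    nodes = [tuple(sorted(e)) for e in edges]  # normalize (u, v) ordering
    adjacency = {n: set() for n in nodes}
    for e1, e2 in combinations(nodes, 2):
        if set(e1) & set(e2):  # the two original edges share an endpoint
            adjacency[e1].add(e2)
            adjacency[e2].add(e1)
    return adjacency


# Toy molecular graph: atom 1 is bonded to atoms 0, 2, and 3.
star_bonds = [(0, 1), (1, 2), (1, 3)]
star_lg = line_graph(star_bonds)

# Toy chain 0-1-2-3: only consecutive bonds share an atom.
chain_bonds = [(0, 1), (1, 2), (2, 3)]
chain_lg = line_graph(chain_bonds)
```

In the star example all three bonds meet at atom 1, so the line graph is a triangle; in the chain, the first and last bonds are not adjacent. A graph convolution run on this structure passes messages between bonds rather than between atoms.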
Affiliation(s)
- Xin Zeng
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China
- Kai-Yang Zhong
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China
- Pei-Yan Meng
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China
- Shu-Juan Li
  - Yunnan Institute of Endemic Diseases Control & Prevention, Dali, 671000, China
- Shuang-Qing Lv
  - Institute of Surveying and Information Engineering, West Yunnan University of Applied Science, Dali, 671000, China
- Meng-Liang Wen
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, 650000, China
- Yi Li
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China

21
Guevara-Barrientos D, Kaundal R. Malivhu: A Comprehensive Bioinformatics Resource for Filtering SARS and MERS Virus Proteins by Their Classification, Family and Species, and Prediction of Their Interactions Against Human Proteins. Bioinform Biol Insights 2024; 18:11779322241263671. [PMID: 39148721 PMCID: PMC11325310 DOI: 10.1177/11779322241263671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Accepted: 06/04/2024] [Indexed: 08/17/2024] Open
Abstract
The COVID-19 pandemic is still ongoing, having taken more than 6 million human lives with it, and it seems that the world will have to learn how to live with the virus around. In consequence, there is a need to develop different treatments against it, not only with vaccines, but also new medicines. To do this, human-virus protein-protein interactions (PPIs) play a key part in drug-target discovery, but finding them experimentally can be either costly or sometimes unreliable. Therefore, computational methods arose as a powerful alternative to predict these interactions, reducing costs and helping researchers confirm only certain interactions instead of trying all possible combinations in the laboratory. Malivhu is a tool that predicts human-virus PPIs through a 4-phase process using machine learning models, where phase 1 filters ssRNA(+) class virus proteins, phase 2 filters Coronaviridae family proteins, phase 3 filters severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) species proteins, and phase 4 predicts human-SARS-CoV/SARS-CoV-2/MERS protein-protein interactions. The performance of the models was measured with Matthews correlation coefficient, F1-score, specificity, sensitivity, and accuracy scores, getting accuracies of 99.07%, 99.83%, and 100% for the first 3 phases, respectively, and 94.24% for human-SARS-CoV PPI, 94.50% for human-SARS-CoV-2 PPI, and 95.45% for human-MERS PPI on independent testing. All the prediction models developed for each of the 4 phases were implemented as a web server, which is freely available at https://kaabil.net/malivhu/.
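The five evaluation measures named in this abstract can all be derived from a binary confusion matrix. The sketch below uses the standard definitions; the counts are invented for illustration and are not results from the Malivhu paper, and the function name `binary_metrics` is an assumption.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for one interaction classifier (not from the paper).
scores = binary_metrics(tp=45, fp=5, tn=40, fn=10)
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it more informative when interacting and non-interacting pairs are imbalanced.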
Affiliation(s)
- David Guevara-Barrientos
  - Department of Computer Science, College of Science, Utah State University, Logan, UT, USA
  - Bioinformatics Facility, Center for Integrated BioSystems, Utah State University, Logan, UT, USA
- Rakesh Kaundal
  - Department of Computer Science, College of Science, Utah State University, Logan, UT, USA
  - Bioinformatics Facility, Center for Integrated BioSystems, Utah State University, Logan, UT, USA
  - Department of Plants, Soils & Climate, College of Agriculture and Applied Sciences, Utah State University, Logan, UT, USA

22
Schulman A, Rousu J, Aittokallio T, Tanoli Z. Attention-based approach to predict drug-target interactions across seven target superfamilies. Bioinformatics 2024; 40:btae496. [PMID: 39115379 PMCID: PMC11520408 DOI: 10.1093/bioinformatics/btae496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 06/12/2024] [Accepted: 08/06/2024] [Indexed: 08/29/2024] Open
Abstract
MOTIVATION Drug-target interactions (DTIs) hold a pivotal role in drug repurposing and elucidation of drug mechanisms of action. While single-targeted drugs have demonstrated clinical success, they often exhibit limited efficacy against complex diseases, such as cancers, whose development and treatment are dependent on several biological processes. Therefore, a comprehensive understanding of primary, secondary and even inactive targets becomes essential in the quest for effective and safe treatments for cancer and other indications. The human proteome offers over a thousand druggable targets, yet most FDA-approved drugs bind to only a small fraction of these targets. RESULTS This study introduces an attention-based method (called MMAtt-DTA) to predict drug-target bioactivities across human proteins within seven superfamilies. We meticulously examined nine different descriptor sets to identify optimal signature descriptors for predicting novel DTIs. Our testing results demonstrated Spearman correlations exceeding 0.72 (P < 0.001) for six out of seven superfamilies. The proposed method outperformed fourteen state-of-the-art machine learning, deep learning and graph-based methods and maintained relatively high performance for most target superfamilies when tested with independent bioactivity data sources. We computationally validated 185 676 drug-target pairs from ChEMBL-V33 that were not available during model training, achieving a reasonable performance with Spearman correlation >0.57 (P < 0.001) for most superfamilies. This underscores the robustness of the proposed method for predicting novel DTIs. Finally, we applied our method to predict missing bioactivities among 3492 approved molecules in ChEMBL-V33, offering a valuable tool for advancing drug mechanism discovery and repurposing existing drugs for new indications. AVAILABILITY AND IMPLEMENTATION https://github.com/AronSchulman/MMAtt-DTA.
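The Spearman correlations reported in this abstract measure how well predicted bioactivities preserve the ranking of measured ones. Spearman's coefficient is the Pearson correlation of the two rank vectors, with tied values sharing their average rank. A minimal pure-Python sketch of that definition follows; the sample values are invented and the helper names are assumptions, not code from the MMAtt-DTA repository.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical measured vs. predicted bioactivities (illustrative values).
measured = [5.1, 6.3, 7.0, 8.2]
predicted = [4.0, 4.5, 9.0, 30.0]  # nonlinear but order-preserving
rho = spearman(measured, predicted)
```

Because only ranks matter, a prediction that distorts the scale but keeps the ordering still scores a perfect correlation.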
Affiliation(s)
- Aron Schulman
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, 00014, Finland
- Juho Rousu
  - Department of Computer Science, Aalto University, Espoo, 02150, Finland
- Tero Aittokallio
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, 00014, Finland
  - iCAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital, Helsinki, 00014, Finland
  - Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, 0379, Norway
  - Oslo Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Oslo, 0372, Norway
- Ziaurrehman Tanoli
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, 00014, Finland
  - iCAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital, Helsinki, 00014, Finland
  - Drug Discovery and Chemical Biology (DDCB) Consortium, Biocenter, Helsinki, 00014, Finland
  - BioICAWtech, Helsinki, Helsinki, 00410, Finland

23
Wei J, Zhu Y, Zhuo L, Liu Y, Fu X, Li F. Efficient Deep Model Ensemble Framework for Drug-Target Interaction Prediction. J Phys Chem Lett 2024; 15:7681-7693. [PMID: 39038219 DOI: 10.1021/acs.jpclett.4c01509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/24/2024]
Abstract
Accurate prediction of Drug-Target Interactions (DTI) is crucial for drug development. Current state-of-the-art deep learning methods have significantly advanced the field; however, these methods exhibit limitations in predictive performance and the propensity for false negatives. Therefore, we propose EADTN, a simple and efficient ensemble model. We have designed an innovative feature adaptation technique to automatically extract local weights of drugs and targets, and we utilize clustering-enhanced parameter fine-tuning to overcome the issue of false negatives, thereby enhancing its reliability in drug discovery. Based on EADTN, we also propose a Shapley value-based method for identifying key drug substructures, effectively enhancing the model's interpretability. Additionally, we utilized EADTN to reveal potential interactions between NQO1 targets and the drugs SIRT-IN-1 and LY2183240, which were subsequently validated through wet-lab experiments. Experimental evidence demonstrates that EADTN consistently outperforms existing best-performing models across various data sets, promising significant benefits in fields such as drug repositioning.
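A Shapley value-based attribution, as named in this abstract, assigns each substructure its average marginal contribution to the model score over all orders in which substructures could be added. The sketch below computes exact Shapley values from the textbook formula on a toy scoring function. The substructure names, the scores, and the `toy_score` function are invented for illustration; the abstract does not describe how EADTN defines its value function or approximates the sum.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a score."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Weight of every coalition of this size that excludes p.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_score(substructures):
    """Hypothetical model score: additive terms plus one pairwise synergy."""
    base = {"ring": 1.0, "amide": 2.0, "methyl": 0.0}
    score = sum(base[s] for s in substructures)
    if {"ring", "amide"} <= substructures:
        score += 2.0  # bonus only when both substructures are present
    return score


phi = shapley_values(["ring", "amide", "methyl"], toy_score)
```

Here the synergy bonus is split evenly between the two substructures that create it, the inert one receives zero, and the values sum to the full score. The exact sum grows exponentially with the number of substructures, so practical methods typically rely on sampling or other approximations.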
Affiliation(s)
- Jinhang Wei
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325000, China
- Yangbin Zhu
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325000, China
- Linlin Zhuo
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325000, China
- Yang Liu
  - Strait Institute of Flexible Electronics (SIFE, Future Technologies), Fujian Key Laboratory of Flexible Electronics, Fujian Normal University and Strait Laboratory of Flexible Electronics (SLoFE), Fuzhou 350002, China
- Xiangzheng Fu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Fushan Li
  - Institute of Optoelectronic Technology, Fuzhou University, Fuzhou 350002, China
  - Fujian Science & Technology Innovation Laboratory for Optoelectronic Information of China, Fuzhou 350116, China

24
Menichetti G, Barabási AL, Loscalzo J. Decoding the Foodome: Molecular Networks Connecting Diet and Health. Annu Rev Nutr 2024; 44:257-288. [PMID: 39207880 DOI: 10.1146/annurev-nutr-062322-030557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/04/2024]
Abstract
Diet, a modifiable risk factor, plays a pivotal role in most diseases, from cardiovascular disease to type 2 diabetes mellitus, cancer, and obesity. However, our understanding of the mechanistic role of the chemical compounds found in food remains incomplete. In this review, we explore the "dark matter" of nutrition, going beyond the macro- and micronutrients documented by national databases to unveil the exceptional chemical diversity of food composition. We also discuss the need to explore the impact of each compound in the presence of associated chemicals and relevant food sources and describe the tools that will allow us to do so. Finally, we discuss the role of network medicine in understanding the mechanism of action of each food molecule. Overall, we illustrate the important role of network science and artificial intelligence in our ability to reveal nutrition's multifaceted role in health and disease.
Affiliation(s)
- Giulia Menichetti
  - Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Network Science Institute and Department of Physics, Northeastern University, Boston, Massachusetts, USA
  - Harvard Data Science Initiative, Harvard University, Boston, Massachusetts, USA
- Albert-László Barabási
  - Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Network Science Institute and Department of Physics, Northeastern University, Boston, Massachusetts, USA
  - Department of Network and Data Science, Central European University, Budapest, Hungary
- Joseph Loscalzo
  - Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA

25
Zabihian A, Asghari J, Hooshmand M, Gharaghani S. A comparative analysis of computational drug repurposing approaches: proposing a novel tensor-matrix-tensor factorization method. Mol Divers 2024; 28:2177-2196. [PMID: 38683487 DOI: 10.1007/s11030-024-10851-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Accepted: 03/18/2024] [Indexed: 05/01/2024]
Abstract
Efficient drug discovery relies on drug repurposing, an important and open research field. This work presents a novel factorization method and a practical comparison of different approaches for drug repurposing. First, we propose a novel tensor-matrix-tensor (TMT) formulation as a new data array method with a gradient-based factorization procedure. Second, this paper examines and contrasts four computational drug repurposing approaches: factorization-based methods, machine learning methods, deep learning methods, and graph neural networks. We test the strategies on two datasets and assess each approach's performance, drawbacks, problems, and benefits based on the results. The results demonstrate that deep learning techniques work better than other strategies and that their results might be more reliable. Ultimately, graph neural methods need to be applied in an inductive manner to yield reliable predictions.
Affiliation(s)
- Arash Zabihian
  - Department of Bioinformatics, Kish International Campus, University of Tehran, Kish, Iran
- Javad Asghari
  - Department of Computer Science and Information Technology, Institute of Advanced Studies in Basic Sciences, Zanjan, Iran
- Mohsen Hooshmand
  - Department of Computer Science and Information Technology, Institute of Advanced Studies in Basic Sciences, Zanjan, Iran
- Sajjad Gharaghani
  - Laboratory of Bioinformatics and Drug Design, University of Tehran, Tehran, Iran
26
Nerella S, Bandyopadhyay S, Zhang J, Contreras M, Siegel S, Bumin A, Silva B, Sena J, Shickel B, Bihorac A, Khezeli K, Rashidi P. Transformers and large language models in healthcare: A review. Artif Intell Med 2024; 154:102900. [PMID: 38878555 DOI: 10.1016/j.artmed.2024.102900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 05/28/2024] [Accepted: 05/30/2024] [Indexed: 08/09/2024]
Abstract
With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformer neural network architecture is rapidly changing many applications. The Transformer is a type of deep learning architecture initially developed to solve general-purpose Natural Language Processing (NLP) tasks and has subsequently been adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of healthcare data, including clinical NLP, medical imaging, structured Electronic Health Records (EHR), social media, bio-physiological signals, and biomolecular sequences. Furthermore, we have also included articles that used the transformer architecture for generating surgical instructions and predicting adverse outcomes after surgeries under the umbrella of critical care. Under diverse settings, these models have been used for clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. Finally, we also discuss the benefits and limitations of using transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
Affiliation(s)
- Subhash Nerella
  - Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Jiaqing Zhang
  - Department of Electrical and Computer Engineering, University of Florida, Gainesville, United States
- Miguel Contreras
  - Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Scott Siegel
  - Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Aysegul Bumin
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
- Brandon Silva
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
- Jessica Sena
  - Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Benjamin Shickel
  - Department of Medicine, University of Florida, Gainesville, United States
- Azra Bihorac
  - Department of Medicine, University of Florida, Gainesville, United States
- Kia Khezeli
  - Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Parisa Rashidi
  - Department of Biomedical Engineering, University of Florida, Gainesville, United States
27
Li X, Zhao X, Yu X, Zhao J, Fang X. Construction of a multi-tissue compound-target interaction network of Qingfei Paidu decoction in COVID-19 treatment based on deep learning and transcriptomic analysis. J Bioinform Comput Biol 2024; 22:2450016. [PMID: 39036847 DOI: 10.1142/s0219720024500161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/23/2024]
Abstract
The Qingfei Paidu decoction (QFPDD) is a widely acclaimed therapeutic formula employed nationwide for the clinical management of coronavirus disease 2019 (COVID-19). QFPDD exerts a synergistic therapeutic effect, characterized by its multi-component, multi-target, and multi-pathway action. However, the intricate interactions among the ingredients and targets within QFPDD and their systematic effects in multiple tissues remain undetermined. To address this, we qualitatively characterized the chemical components of QFPDD. We integrated multi-tissue transcriptomic analysis with GraphDTA, a deep learning model, to screen for potential compound-target interactions of QFPDD in multiple tissues. We predicted 13 key active compounds, 127 potential targets and 27 pathways associated with QFPDD across six different tissues. Notably, oleanolic acid-AXL exhibited leading affinity in the heart, blood, and liver. Molecular docking and molecular dynamics simulation confirmed their strong binding affinity. The robust interaction between oleanolic acid and the AXL receptor suggests that AXL is a promising target for developing clinical intervention strategies. Through the construction of a multi-tissue compound-target interaction network, our study further elucidated the mechanisms through which QFPDD effectively combats COVID-19 in multiple tissues. Our work also establishes a framework for future investigations into the systemic effects of other Traditional Chinese Medicine (TCM) formulas in disease treatment.
Affiliation(s)
- Xia Li
  - Third Clinical College, Shanxi Provincial Integrated TCM and WM Hospital, Shanxi University of Chinese Medicine, Jinzhong, Shanxi, P. R. China
- Xuetong Zhao
  - National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, P. R. China
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, P. R. China
  - University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Xinjian Yu
  - Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, TX 77030, USA
- Jianping Zhao
  - Third Clinical College, Shanxi Provincial Integrated TCM and WM Hospital, Shanxi University of Chinese Medicine, Jinzhong, Shanxi, P. R. China
- Xiangdong Fang
  - National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, P. R. China
  - University of Chinese Academy of Sciences, Beijing 100049, P. R. China
28
Lavecchia A. Advancing drug discovery with deep attention neural networks. Drug Discov Today 2024; 29:104067. [PMID: 38925473 DOI: 10.1016/j.drudis.2024.104067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 06/10/2024] [Accepted: 06/19/2024] [Indexed: 06/28/2024]
Abstract
In the dynamic field of drug discovery, deep attention neural networks are revolutionizing our approach to complex data. This review explores the attention mechanism and its extended architectures, including graph attention networks (GATs), transformers, bidirectional encoder representations from transformers (BERT), generative pre-trained transformers (GPTs) and bidirectional and auto-regressive transformers (BART). Delving into their core principles and multifaceted applications, we uncover their pivotal roles in catalyzing de novo drug design, predicting intricate molecular properties and deciphering elusive drug-target interactions. Despite challenges, these attention-based architectures hold unparalleled promise to drive transformative breakthroughs and accelerate progress in pharmaceutical research.
Affiliation(s)
- Antonio Lavecchia
  - Drug Discovery Laboratory, Department of Pharmacy, University of Napoli Federico II, I-80131 Naples, Italy
29
Hao Y, Li B, Huang D, Wu S, Wang T, Fu L, Liu X. Developing a Semi-Supervised Approach Using a PU-Learning-Based Data Augmentation Strategy for Multitarget Drug Discovery. Int J Mol Sci 2024; 25:8239. [PMID: 39125808 PMCID: PMC11312053 DOI: 10.3390/ijms25158239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 07/26/2024] [Accepted: 07/26/2024] [Indexed: 08/12/2024] Open
Abstract
Multifactorial diseases demand therapeutics that can modulate multiple targets for enhanced safety and efficacy, yet the clinical approval of multitarget drugs remains rare. The integration of machine learning (ML) and deep learning (DL) in drug discovery has revolutionized virtual screening. This study investigates the synergy between ML/DL methodologies, molecular representations, and data augmentation strategies. Notably, we found that SVM can match or even surpass the performance of state-of-the-art DL methods. However, conventional data augmentation often involves a trade-off between the true positive rate and false positive rate. To address this, we introduce Negative-Augmented PU-bagging (NAPU-bagging) SVM, a novel semi-supervised learning framework. By leveraging ensemble SVM classifiers trained on resampled bags containing positive, negative, and unlabeled data, our approach is capable of managing false positive rates while maintaining high recall rates. We applied this method to the identification of multitarget-directed ligands (MTDLs), where high recall rates are critical for compiling a list of interaction candidate compounds. Case studies demonstrate that NAPU-bagging SVM can identify structurally novel MTDL hits for ALK-EGFR with favorable docking scores and binding modes, as well as pan-agonists for dopamine receptors. The NAPU-bagging SVM methodology should serve as a promising avenue to virtual screening, especially for the discovery of MTDLs.
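The bagging idea summarized in this abstract (ensemble classifiers trained on resampled bags of positive, negative, and unlabeled data) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' NAPU-bagging implementation: a nearest-centroid scorer stands in for the SVM base learner, unlabeled samples drawn into a bag are provisionally treated as negatives (a common PU-bagging convention, assumed here), and every function name and the toy data are hypothetical.

```python
import random


def centroid(points):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_bag(pos, neg):
    """Stand-in base learner (NOT an SVM): keeps one centroid per class."""
    return centroid(pos), centroid(neg)


def score(model, x):
    """Positive when x lies closer to the positive centroid than the negative one."""
    pos_c, neg_c = model
    return sq_dist(x, neg_c) - sq_dist(x, pos_c)


def pu_bagging_scores(positives, negatives, unlabeled, n_bags=50, seed=0):
    """Average out-of-bag score for every unlabeled sample."""
    rng = random.Random(seed)
    bag_size = max(1, min(len(positives), len(unlabeled) // 2))
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_bags):
        in_bag = set(rng.sample(range(len(unlabeled)), bag_size))
        # Known negatives join every bag; the drawn unlabeled samples are
        # provisionally treated as negatives for this bag only.
        bag_neg = negatives + [unlabeled[i] for i in in_bag]
        model = fit_bag(positives, bag_neg)
        # Only samples left out of the bag are scored by this bag's model.
        for i, x in enumerate(unlabeled):
            if i not in in_bag:
                totals[i] += score(model, x)
                counts[i] += 1
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]


# Hypothetical 2-D toy data: the first unlabeled point is a hidden positive.
POSITIVES = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
NEGATIVES = [[-1.0, -1.0], [-1.2, -0.9]]
UNLABELED = [[0.9, 1.1], [-1.0, -0.8], [-0.9, -1.2], [-1.1, -1.0]]

for point, s in zip(UNLABELED, pu_bagging_scores(POSITIVES, NEGATIVES, UNLABELED)):
    print(point, round(s, 3))
```

Ranking unlabeled compounds by their averaged out-of-bag score is what favors recall in this kind of scheme; swapping `fit_bag` and `score` for a real SVM would be the natural next step.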
Affiliation(s)
- Yang Hao
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZX, UK
- Bo Li
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZX, UK
- Daiyun Huang
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - School of Life Sciences, Fudan University, Shanghai 200092, China
- Sijin Wu
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Tianjun Wang
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZX, UK
- Lei Fu
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Xin Liu
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
30
Liu S, Yu J, Ni N, Wang Z, Chen M, Li Y, Xu C, Ding Y, Zhang J, Yao X, Liu H. Versatile Framework for Drug-Target Interaction Prediction by Considering Domain-Specific Features. J Chem Inf Model 2024; 64:5646-5656. [PMID: 38976879 DOI: 10.1021/acs.jcim.4c00403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Predicting drug-target interactions (DTIs) is one of the crucial tasks in drug discovery, but traditional wet-lab experiments are costly and time-consuming. Recently, deep learning has emerged as a promising tool for accelerating DTI prediction due to its powerful performance. However, the models trained on limited known DTI data struggle to generalize effectively to novel drug-target pairs. In this work, we propose a strategy to train an ensemble of models by capturing both domain-generic and domain-specific features (E-DIS) to learn diverse domain features and adapt them to out-of-distribution data. Multiple experts were trained on different domains to capture and align domain-specific information from various distributions without accessing any data from unseen domains. E-DIS provides a comprehensive representation of proteins and ligands by capturing diverse features. Experimental results on four benchmark data sets in both in-domain and cross-domain settings demonstrated that E-DIS significantly improved model performance and domain generalization compared to existing methods. Our approach presents a significant advancement in DTI prediction by combining domain-generic and domain-specific features, enhancing the generalization ability of the DTI prediction model.
Affiliation(s)
- Shuo Liu
  - School of Pharmacy, Lanzhou University, Gansu 730000, China
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Jialiang Yu
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Ningxi Ni
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Zidong Wang
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Mengyun Chen
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Yuquan Li
  - College of Chemistry and Chemical Engineering, Lanzhou University, Gansu 730000, China
- Chen Xu
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Yahao Ding
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Jun Zhang
  - Changping Laboratory, Beijing 102200, China
- Xiaojun Yao
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China
- Huanxiang Liu
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China
31
Wu H, Liu J, Zhang R, Lu Y, Cui G, Cui Z, Ding Y. A review of deep learning methods for ligand based drug virtual screening. Fundamental Research 2024; 4:715-737. [PMID: 39156568 PMCID: PMC11330120 DOI: 10.1016/j.fmre.2024.02.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/10/2024] [Accepted: 02/18/2024] [Indexed: 08/20/2024] Open
Abstract
Drug discovery is costly and time consuming, and modern drug discovery endeavors are progressively reliant on computational methodologies, aiming to mitigate temporal and financial expenditures associated with the process. In particular, the time required for vaccine and drug discovery is prolonged during emergency situations such as the coronavirus 2019 pandemic. Recently, the performance of deep learning methods in drug virtual screening has been particularly prominent. It has become a concern for researchers how to summarize the existing deep learning in drug virtual screening, select different models for different drug screening problems, exploit the advantages of deep learning models, and further improve the capability of deep learning in drug virtual screening. This review first introduces the basic concepts of drug virtual screening, common datasets, and data representation methods. Then, large numbers of common deep learning methods for drug virtual screening are compared and analyzed. In addition, a dataset of different sizes is constructed independently to evaluate the performance of each deep learning model for the difficult problem of large-scale ligand virtual screening. Finally, the existing challenges and future directions in the field of virtual screening are presented.
Affiliation(s)
- Hongjie Wu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Junkai Liu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Runhua Zhang
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yaoyao Lu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Guozeng Cui
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Zhiming Cui
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
32
Li Y, Liang W, Peng L, Zhang D, Yang C, Li KC. Predicting Drug-Target Interactions Via Dual-Stream Graph Neural Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:948-958. [PMID: 36074878 DOI: 10.1109/tcbb.2022.3204188] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
Drug-target interaction prediction is a crucial stage in drug discovery. However, brute-force search over a compound database is financially infeasible. Recent years have seen a growing number of measured drug-target interaction records, and the rich drug/protein-related information allows the use of graph machine learning. Despite the advances in deep learning-enabled drug-target interaction prediction, there are still open challenges: (1) the rich and complex relationships between drugs and proteins can be explored further; (2) the intermediate node is not calibrated in the heterogeneous graph. To tackle these issues, this paper proposes a framework named DSG-DTI. Specifically, DSG-DTI comprises a heterogeneous graph autoencoder and heterogeneous attention network-based matrix completion. Our framework ensures that the known types of nodes (e.g., drug, target, side effects, diseases) are precisely embedded into high-dimensional space with our pretraining techniques. Also, the attention-based heterogeneous graph-based matrix completion achieves highly competitive results via effective extraction of long-range dependencies. We verify our model on two public benchmarks. The results show that the proposed scheme effectively predicts drug-target interactions, can generalize to newly registered drugs and targets with slight performance degradation, and outperforms the other baselines in accuracy.
33
Nguyen VTD, Hy TS. Multimodal pretraining for unsupervised protein representation learning. Biol Methods Protoc 2024; 9:bpae043. [PMID: 38983679 PMCID: PMC11233121 DOI: 10.1093/biomethods/bpae043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 05/30/2024] [Accepted: 06/12/2024] [Indexed: 07/11/2024] Open
Abstract
Proteins are complex biomolecules essential for numerous biological processes, making them crucial targets for advancements in molecular biology, medical research, and drug design. Understanding their intricate, hierarchical structures, and functions is vital for progress in these fields. To capture this complexity, we introduce Multimodal Protein Representation Learning (MPRL), a novel framework for symmetry-preserving multimodal pretraining that learns unified, unsupervised protein representations by integrating primary and tertiary structures. MPRL employs Evolutionary Scale Modeling (ESM-2) for sequence analysis, Variational Graph Auto-Encoders (VGAE) for residue-level graphs, and PointNet Autoencoder (PAE) for 3D point clouds of atoms, each designed to capture the spatial and evolutionary intricacies of proteins while preserving critical symmetries. By leveraging Auto-Fusion to synthesize joint representations from these pretrained models, MPRL ensures robust and comprehensive protein representations. Our extensive evaluation demonstrates that MPRL significantly enhances performance in various tasks such as protein-ligand binding affinity prediction, protein fold classification, enzyme activity identification, and mutation stability prediction. This framework advances the understanding of protein dynamics and facilitates future research in the field. Our source code is publicly available at https://github.com/HySonLab/Protein_Pretrain.
Affiliation(s)
- Truong Son Hy
  - FPT Software AI Center, HCMC, Hanoi, Vietnam
  - Department of Mathematics and Computer Science, Indiana State University, Terre Haute, IN, 47809, United States
34
Luong KD, Singh A. Application of Transformers in Cheminformatics. J Chem Inf Model 2024; 64:4392-4409. [PMID: 38815246 PMCID: PMC11167597 DOI: 10.1021/acs.jcim.3c02070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 04/05/2024] [Accepted: 05/06/2024] [Indexed: 06/01/2024]
Abstract
By accelerating time-consuming processes with high efficiency, computing has become an essential part of many modern chemical pipelines. Machine learning is a class of computing methods that can discover patterns within chemical data and utilize this knowledge for a wide variety of downstream tasks, such as property prediction or substance generation. The complex and diverse chemical space requires complex machine learning architectures with great learning power. Recently, learning models based on transformer architectures have revolutionized multiple domains of machine learning, including natural language processing and computer vision. Naturally, there have been ongoing endeavors in adapting these techniques to the chemical domain, resulting in a surge of publications within a short period. The diversity of chemical structures, use cases, and learning models necessitates a comprehensive summarization of existing works. In this paper, we review recent innovations in adapting transformers to solve learning problems in chemistry. Because chemical data is diverse and complex, we structure our discussion based on chemical representations. Specifically, we highlight the strengths and weaknesses of each representation, the current progress of adapting transformer architectures, and future directions.
Affiliation(s)
- Kha-Dinh Luong
  - Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA 93106, United States
- Ambuj Singh
  - Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA 93106, United States
35
Tian T, Li S, Zhang Z, Chen L, Zou Z, Zhao D, Zeng J. Benchmarking compound activity prediction for real-world drug discovery applications. Commun Chem 2024; 7:127. [PMID: 38834746 DOI: 10.1038/s42004-024-01204-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 05/16/2024] [Indexed: 06/06/2024] Open
Abstract
Identifying active compounds for target proteins is fundamental in early drug discovery. Recently, data-driven computational methods have demonstrated promising potential in predicting compound activities. However, a well-designed benchmark for comprehensively evaluating these methods from a practical perspective is lacking. To fill this gap, we propose a Compound Activity benchmark for Real-world Applications (CARA). Through carefully distinguishing assay types, designing train-test splitting schemes and selecting evaluation metrics, CARA can account for the biased distribution of current real-world compound activity data and avoid overestimating model performance. We observed that although current models can make successful predictions for certain proportions of assays, their performance varied across different assays. In addition, evaluation of several few-shot training strategies showed that their performance depended on the task type. Overall, we provide a high-quality dataset for developing and evaluating compound activity prediction models, and the analyses in this work may inspire better applications of data-driven models in drug discovery.
Affiliation(s)
- Tingzhong Tian
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Shuya Li
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Ziting Zhang
  - Department of Automation, Tsinghua University, Beijing, China
  - MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing, China
- Lin Chen
  - Silexon AI Technology Co., Ltd., Nanjing, Jiangsu Province, China
- Ziheng Zou
  - Silexon AI Technology Co., Ltd., Nanjing, Jiangsu Province, China
- Dan Zhao
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Jianyang Zeng
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
  - School of Engineering, Westlake University, Hangzhou, Zhejiang Province, China
36
Zhang Q, Zuo L, Ren Y, Wang S, Wang W, Ma L, Zhang J, Xia B. FMCA-DTI: a fragment-oriented method based on a multihead cross attention mechanism to improve drug-target interaction prediction. Bioinformatics 2024; 40:btae347. [PMID: 38810106 PMCID: PMC11256963 DOI: 10.1093/bioinformatics/btae347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 04/23/2024] [Accepted: 05/28/2024] [Indexed: 05/31/2024] Open
Abstract
MOTIVATION: Identifying drug-target interactions (DTI) is crucial in drug discovery. Fragments are less complex and can accurately characterize local features, which is important in DTI prediction. Recently, deep learning (DL)-based methods have predicted DTI more efficiently. However, two challenges remain in existing DL-based methods: (i) some methods directly encode drugs and proteins into integers, ignoring the substructure representation; (ii) some methods learn the features of the drugs and proteins separately instead of considering their interactions. RESULTS: In this article, we propose a fragment-oriented method based on a multihead cross attention mechanism for predicting DTI, named FMCA-DTI. FMCA-DTI obtains multiple types of fragments of drugs and proteins by branch chain mining and category fragment mining. Importantly, FMCA-DTI utilizes the shared-weight-based multihead cross attention mechanism to learn the complex interaction features between different fragments. Experiments on three benchmark datasets show that FMCA-DTI achieves significantly improved performance compared with four state-of-the-art baselines. AVAILABILITY AND IMPLEMENTATION: The code for this workflow is available at: https://github.com/jacky102022/FMCA-DTI.
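To make the cross attention idea in this abstract concrete, the sketch below computes plain scaled dot-product cross attention for a single head, with drug-fragment vectors as queries and protein-fragment vectors as keys and values. It is an illustrative assumption, not the FMCA-DTI architecture: there are no learned projection matrices, no weight sharing, and only one head, and the function names and toy embeddings are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    Each query (e.g. a drug fragment) attends over all keys (e.g. protein
    fragments) and returns a weighted mix of the matching values.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(logits)
        weights.append(w)
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Hypothetical 2-D fragment embeddings.
DRUG_FRAGMENTS = [[1.0, 0.0], [0.0, 1.0]]
PROTEIN_FRAGMENTS = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]

attended, attn = cross_attention(DRUG_FRAGMENTS, PROTEIN_FRAGMENTS, PROTEIN_FRAGMENTS)
for row in attn:
    print([round(w, 3) for w in row])
```

In a multihead variant, the embedding would be split into several lower-dimensional heads, each running this computation in parallel before the results are concatenated.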
Affiliation(s)
- Qi Zhang
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Le Zuo
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Ying Ren
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Siyuan Wang
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Wenfa Wang
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Lerong Ma
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Jing Zhang
  - Medical College of Yan'an University, Yan'an University, Yan'an 716000, China
  - Medical Research and Experimental Center, The Second Affiliated Hospital of Xi'an Medical University, Xi'an 710021, China
- Bisheng Xia
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
37
Yang Z, Liu J, Yang F, Zhang X, Zhang Q, Zhu X, Jiang P. Advancing Drug-Target Interaction prediction with BERT and subsequence embedding. Comput Biol Chem 2024; 110:108058. [PMID: 38593480 DOI: 10.1016/j.compbiolchem.2024.108058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 02/01/2024] [Accepted: 03/12/2024] [Indexed: 04/11/2024]
Abstract
Exploring the relationship between proteins and drugs plays a significant role in discovering new synthetic drugs. Drug-Target Interaction (DTI) prediction is a fundamental task in studying the relationship between proteins and drugs. Rather than encoding proteins by individual amino acids, we encode them by amino acid subsequences, which better simulates the biological process of DTI. To this end, we propose a novel deep learning framework based on Bidirectional Encoder Representations from Transformers (BERT), which integrates high-frequency subsequence embedding and transfer learning methods to complete the DTI prediction task. As the first key module, subsequence embedding makes it possible to explore the functional interaction units in drug and protein sequences and thereby contributes to finding DTI modules. As the second key module, transfer learning helps the model learn the common DTI features from protein and drug sequences in a large dataset. Overall, the BERT-based model can learn two kinds of features through the multi-head self-attention mechanism: the internal features of each sequence and the interaction features between proteins and drugs. Compared with other methods, BERT-based methods enable more DTI-related features to be discovered by means of attention scores associated with tokenized protein/drug subsequences. We conducted extensive experiments for the DTI prediction task on three different benchmark datasets. The experimental results show that the model achieves higher average prediction metrics than most baseline methods. To verify the importance of transfer learning, we conducted an ablation study on the datasets, and the results show the superiority of transfer learning. In addition, we test the scalability of the model on unseen drugs and proteins, and the experimental results show that its scalability is acceptable.
Affiliation(s)
- Zhihui Yang
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, Hubei province, China
| | - Juan Liu
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, Hubei province, China.
| | - Feng Yang
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, Hubei province, China
| | - Xiaolei Zhang
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, Hubei province, China
| | - Qiang Zhang
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, Hubei province, China
| | - Xuekai Zhu
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, Hubei province, China
| | - Peng Jiang
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, Hubei province, China
|
38
|
Shah HA, Liu J, Yang Z. Gtie-Rt: A comprehensive graph learning model for predicting drugs targeting metabolic pathways in human. J Bioinform Comput Biol 2024; 22:2450010. [PMID: 39030668 DOI: 10.1142/s0219720024500100] [Indexed: 07/21/2024]
Abstract
Drugs often target specific metabolic pathways to produce a therapeutic effect. However, these pathways are complex and interconnected, making it challenging to predict a drug's potential effects on an organism's overall metabolism. Mapping drugs to the metabolic pathways they target in an organism can provide a more complete understanding of the metabolic effects of a drug and help to identify potential drug-drug interactions. In this study, we propose a hybrid machine learning model, Graph Transformer Integrated Encoder (GTIE-RT), for mapping drugs to their target metabolic pathways in humans. The proposed model combines a Graph Convolution Network (GCN) for graph embedding with a transformer encoder for the attention mechanism. The output of the transformer encoder is then fed into an Extremely Randomized Trees classifier to predict target metabolic pathways. The evaluation of GTIE-RT on a drug dataset demonstrates excellent performance metrics, including accuracy (>95%), recall (>92%), precision (>93%) and F1-score (>92%). Compared to other variants and machine learning methods, GTIE-RT consistently shows more reliable results.
Affiliation(s)
- Hayat Ali Shah
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, P. R. China
| | - Juan Liu
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, P. R. China
| | - Zhihui Yang
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, P. R. China
|
39
|
Feng BM, Zhang YY, Zhou XC, Wang JL, Feng YF. MolLoG: A Molecular Level Interpretability Model Bridging Local to Global for Predicting Drug Target Interactions. J Chem Inf Model 2024; 64:4348-4358. [PMID: 38709146 DOI: 10.1021/acs.jcim.4c00171] [Indexed: 05/07/2024]
Abstract
Developing new pharmaceuticals is a costly and time-consuming endeavor fraught with significant safety risks. A critical aspect of drug research and disease therapy is discerning the existence of interactions between drugs and proteins. The evolution of deep learning (DL) in computer science has remarkably aided this effort in recent years. Yet, two challenges remain: (i) balancing the extraction of deep, locally cohesive characteristics while avoiding vanishing gradients and (ii) globally representing and understanding the interactions between the drug and target local attributes, which is vital for delivering molecular level insights indispensable to drug development. In response to these challenges, we propose a DL network structure, MolLoG, primarily comprising two modules: local feature encoders (LFE) and global interactive learning (GIL). Within the LFE module, graph convolution networks and leap blocks capture the local features of drug and protein molecules, respectively. The GIL module enables the efficient amalgamation of feature information, facilitating the global learning of feature structural semantics and procuring multihead attention weights for abstract features stemming from two modalities, providing biologically pertinent explanations for black-box results. Finally, predictive outcomes are achieved by decoding the unified representation via a multilayer perceptron. Our experimental analysis reveals that MolLoG outperforms several cutting-edge baselines across four data sets, delivering superior overall performance and providing satisfactory results when elucidating various facets of drug-target interaction predictions.
Affiliation(s)
- Bao-Ming Feng
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
| | - Yuan-Yuan Zhang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
| | - Xiao-Chen Zhou
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
| | - Jin-Long Wang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
| | - Yin-Fei Feng
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
|
40
|
Zhang S, Tian X, Chen C, Su Y, Huang W, Lv X, Chen C, Li H. AIGO-DTI: Predicting Drug-Target Interactions Based on Improved Drug Properties Combined with Adaptive Iterative Algorithms. J Chem Inf Model 2024; 64:4373-4384. [PMID: 38743013 DOI: 10.1021/acs.jcim.4c00584] [Indexed: 05/16/2024]
Abstract
Artificial intelligence-based methods for predicting drug-target interactions (DTIs) aim to explore reliable drug candidate targets rapidly and cost-effectively to accelerate the drug development process. However, current methods are often limited by the topological regularities of drug molecules, making them difficult to generalize to a broader chemical space. Additionally, the use of similarity to measure DTI network links often introduces noise, leading to false DTI relationships and affecting the prediction accuracy. To address these issues, this study proposes an Adaptive Iterative Graph Optimization (AIGO)-DTI prediction framework. This framework integrates atomic cluster information and enhances molecular features through the design of functional group prompts and graph encoders, optimizing the construction of DTI association networks. Furthermore, the optimization of graph structure is transformed into a node similarity learning problem, utilizing multihead similarity metric functions to iteratively update the network structure to improve the quality of DTI information. Experimental results demonstrate the outstanding performance of AIGO-DTI on multiple public data sets and label reversal data sets. Case studies, molecular docking, and existing research validate its effectiveness and reliability. Overall, the method proposed in this study can construct comprehensive and reliable DTI association network information, providing new graphing and optimization strategies for DTI prediction, which contribute to efficient drug development and reduce target discovery costs.
Affiliation(s)
- Sizhe Zhang
- College of Software, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Xuecong Tian
- College of Information Science and Engineering, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Chen Chen
- College of Information Science and Engineering, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Ying Su
- College of Information Science and Engineering, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Wanhua Huang
- College of Information Science and Engineering, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Xiaoyi Lv
- College of Software, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Cheng Chen
- College of Software, Xinjiang University, Urumqi, 830046 Xinjiang, China
| | - Hongyi Li
- Xinjiang University, Urumqi, 830046 Xinjiang, China
|
41
|
Kalemati M, Zamani Emani M, Koohi S. DCGAN-DTA: Predicting drug-target binding affinity with deep convolutional generative adversarial networks. BMC Genomics 2024; 25:411. [PMID: 38724911 PMCID: PMC11080241 DOI: 10.1186/s12864-024-10326-x] [Received: 02/15/2024] [Accepted: 04/19/2024] [Indexed: 05/13/2024] Open
Abstract
BACKGROUND In recent years, there has been a growing interest in utilizing computational approaches to predict drug-target binding affinity, aiming to expedite the early drug discovery process. To address the limitations of experimental methods, such as cost and time, several machine learning-based techniques have been developed. However, these methods encounter certain challenges, including the limited availability of training data, reliance on human intervention for feature selection and engineering, and a lack of validation approaches for robust evaluation in real-life applications. RESULTS To mitigate these limitations, in this study, we propose a method for drug-target binding affinity prediction based on deep convolutional generative adversarial networks. Additionally, we conducted a series of validation experiments and implemented adversarial control experiments using straw models. These experiments serve to demonstrate the robustness and efficacy of our predictive models. We conducted a comprehensive evaluation of our method by comparing it to baselines and state-of-the-art methods. Two recently updated datasets, namely BindingDB and PDBBind, were used for this purpose. Our findings indicate that our method outperforms the alternative methods in terms of three performance measures when using warm-start data splitting settings. Moreover, when considering physicochemical-based cold-start data splitting settings, our method demonstrates superior predictive performance, particularly in terms of the concordance index. CONCLUSION The results of our study affirm the practical value of our method and its superiority over alternative approaches in predicting drug-target binding affinity across multiple validation sets. This highlights the potential of our approach in accelerating drug repurposing efforts, facilitating novel drug discovery, and ultimately enhancing disease treatment.
The data and source code for this study were deposited in the GitHub repository, https://github.com/mojtabaze7/DCGAN-DTA. Furthermore, the web server for our method is accessible at https://dcgan.shinyapps.io/bindingaffinity/.
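The concordance index named in this abstract is a ranking metric: over all pairs of samples with different true affinities, it measures the fraction whose predicted affinities are ordered the same way, with prediction ties counted as half. The sketch below shows the standard pairwise definition; it is not taken from the DCGAN-DTA repository, and the function name and the toy affinity values are hypothetical.

```python
def concordance_index(y_true, y_pred):
    """Pairwise concordance index for affinity predictions.

    Pairs with equal true values are skipped; ties in the prediction
    contribute 0.5, correctly ordered pairs contribute 1.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            diff_true = y_true[i] - y_true[j]
            if diff_true == 0:
                continue  # not a comparable pair
            comparable += 1
            diff_pred = y_pred[i] - y_pred[j]
            if diff_pred == 0:
                concordant += 0.5
            elif (diff_true > 0) == (diff_pred > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs (all true values are equal)")
    return concordant / comparable


# Hypothetical affinities: the predictions preserve the true ranking.
true_affinity = [5.0, 6.2, 7.1, 8.4]
pred_affinity = [4.9, 6.0, 7.5, 8.0]
ci = concordance_index(true_affinity, pred_affinity)
```

This quadratic-time loop is fine for small validation sets; larger benchmarks typically use a vectorized or sort-based variant.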
Affiliation(s)
- Mahmood Kalemati
- Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
| | - Mojtaba Zamani Emani
- Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
| | - Somayyeh Koohi
- Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.
|
42
|
Xia Y, Pan X, Shen HB. Heterogeneous sampled subgraph neural networks with knowledge distillation to enhance double-blind compound-protein interaction prediction. Structure 2024; 32:611-620.e4. [PMID: 38447575 DOI: 10.1016/j.str.2024.02.004] [Received: 09/29/2023] [Revised: 12/18/2023] [Accepted: 02/08/2024] [Indexed: 03/08/2024]
Abstract
Identifying binding compounds against a target protein is crucial for large-scale virtual screening in drug development. Recently, network-based methods have been developed for compound-protein interaction (CPI) prediction. However, they are difficult to apply to unseen (i.e., never-seen-before) proteins and compounds. In this study, we propose SgCPI, which incorporates local known interaction networks to predict CPIs. SgCPI randomly samples the local CPI network of the query compound-protein pair as a subgraph and applies a heterogeneous graph neural network (HGNN) to embed the active/inactive message of the subgraph. For unseen compounds and proteins, SgCPI-KD takes SgCPI as the teacher model and distills its knowledge by estimating the potential neighbors. Experimental results indicate: (1) the sampled subgraphs of the CPI network introduce efficient knowledge for unseen molecular prediction with the HGNNs, and (2) the knowledge distillation strategy is beneficial to the double-blind interaction prediction by estimating molecular neighbors and distilling knowledge.
Affiliation(s)
- Ying Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
|
43
|
Pan F, Yin C, Liu SQ, Huang T, Bian Z, Yuen PC. BindingSiteDTI: differential-scale binding site modelling for drug-target interaction prediction. Bioinformatics 2024; 40:btae308. [PMID: 38730554 PMCID: PMC11256917 DOI: 10.1093/bioinformatics/btae308] [Received: 11/14/2023] [Revised: 03/06/2024] [Accepted: 05/09/2024] [Indexed: 05/13/2024] Open
Abstract
MOTIVATION Enhanced by contemporary computational advances, the prediction of drug-target interactions (DTIs) has become crucial in developing de novo and effective drugs. Existing deep learning approaches to DTI prediction are frequently beleaguered by a tendency to overfit specific molecular representations, which significantly impedes their predictive reliability and utility in novel drug discovery contexts. Furthermore, existing DTI networks often disregard the molecular size variance between macro molecules (targets) and micro molecules (drugs) by treating them at an equivalent scale, which undermines the accurate elucidation of their interaction. RESULTS We propose a novel DTI network, named BindingSiteDTI, with a differential-scale scheme that models the binding site to enhance DTI prediction. It explicitly extracts multiscale substructures from targets with different scales of molecular size and fixed-scale substructures from drugs, facilitating the identification of structurally similar substructural tokens, and models the concealed relationships at the substructural level to construct interaction features. Experiments conducted on popular benchmarks, including DUD-E, human, and BindingDB, show that BindingSiteDTI yields significant improvements compared with recent DTI prediction methods. AVAILABILITY AND IMPLEMENTATION The source code of BindingSiteDTI can be accessed at https://github.com/MagicPF/BindingSiteDTI.
Affiliation(s)
- Feng Pan
- Department of Computer Science, Hong Kong Baptist University, Kowloon, 999077, Hong Kong
| | - Chong Yin
- Department of Computer Science, Hong Kong Baptist University, Kowloon, 999077, Hong Kong
| | - Si-Qi Liu
- Department of Computer Science, Hong Kong Baptist University, Kowloon, 999077, Hong Kong
- Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong (Shenzhen), 518172, China
| | - Tao Huang
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon, 999077, Hong Kong
| | - Zhaoxiang Bian
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon, 999077, Hong Kong
| | - Pong Chi Yuen
- Department of Computer Science, Hong Kong Baptist University, Kowloon, 999077, Hong Kong
|
44
|
Zhou Z, Liao Q, Wei J, Zhuo L, Wu X, Fu X, Zou Q. Revisiting drug-protein interaction prediction: a novel global-local perspective. Bioinformatics 2024; 40:btae271. [PMID: 38648052 PMCID: PMC11087820 DOI: 10.1093/bioinformatics/btae271] [Received: 12/21/2023] [Revised: 02/09/2024] [Accepted: 04/17/2024] [Indexed: 04/25/2024] Open
Abstract
MOTIVATION Accurate inference of potential drug-protein interactions (DPIs) aids in understanding drug mechanisms and developing novel treatments. Existing deep learning models, however, struggle with accurate node representation in DPI prediction, limiting their performance. RESULTS We propose a new computational framework that integrates global and local features of nodes in the drug-protein bipartite graph for efficient DPI inference. Initially, we employ pre-trained models to acquire fundamental knowledge of drugs and proteins and to determine their initial features. Subsequently, the MinHash and HyperLogLog algorithms are utilized to estimate the similarity and set cardinality between drug and protein subgraphs, serving as their local features. Then, an energy-constrained diffusion mechanism is integrated into the transformer architecture, capturing interdependencies between nodes in the drug-protein bipartite graph and extracting their global features. Finally, we fuse the local and global features of nodes and employ multilayer perceptrons to predict the likelihood of potential DPIs. A comprehensive and precise node representation guarantees efficient prediction of unknown DPIs by the model. Various experiments validate the accuracy and reliability of our model, with molecular docking results revealing its capability to identify potential DPIs not present in existing databases. This approach is expected to offer valuable insights for furthering drug repurposing and personalized medicine research. AVAILABILITY AND IMPLEMENTATION Our code and data are accessible at: https://github.com/ZZCrazy00/DPI.
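The MinHash step mentioned in this abstract estimates the Jaccard similarity of two sets from short signatures: the fraction of hash functions on which both sets share the same minimum value approximates |A ∩ B| / |A ∪ B|. The sketch below illustrates only that generic idea (HyperLogLog cardinality estimation is not shown) and is not the code from the authors' repository; the seeded BLAKE2b hashing scheme, the function names, and the toy neighbor sets are assumptions.

```python
import hashlib


def _seeded_hash(item, seed):
    """64-bit hash of an item under a given seed (one 'hash function' per seed)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_perm=256):
    """Signature = minimum hash value of the set under each seeded hash."""
    items = list(items)
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_seeded_hash(x, seed) for x in items) for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two sets agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Hypothetical neighbor sets of two nodes in a drug-protein bipartite graph.
neighbors_a = {f"P{i}" for i in range(60)}
neighbors_b = {f"P{i}" for i in range(30, 90)}  # true Jaccard = 30 / 90

sig_a = minhash_signature(neighbors_a)
sig_b = minhash_signature(neighbors_b)
estimated = estimate_jaccard(sig_a, sig_b)
```

The estimate's standard error shrinks roughly as one over the square root of `num_perm`, so longer signatures trade memory for accuracy.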
Affiliation(s)
- Zhecheng Zhou
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
| | - Qingquan Liao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
| | - Jinhang Wei
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
| | - Linlin Zhuo
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
| | - Xiaonan Wu
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
| | - Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611730, China
|
45
|
Qiu X, Wang H, Tan X, Fang Z. G-K BertDTA: A graph representation learning and semantic embedding-based framework for drug-target affinity prediction. Comput Biol Med 2024; 173:108376. [PMID: 38552281 DOI: 10.1016/j.compbiomed.2024.108376] [Received: 12/21/2023] [Revised: 03/21/2024] [Accepted: 03/24/2024] [Indexed: 04/17/2024]
Abstract
Developing new drugs is costly, time-consuming, and risky. Drug-target affinity (DTA), indicating the binding capability between drugs and target proteins, is a crucial indicator for drug development. Accurately predicting interaction strength between new drug-target pairs by analyzing previous experiments aids in screening potential drug molecules, repurposing them, and developing safe and effective medicines. Existing computational models for DTA prediction rely on strings or single-graph neural networks, lacking consideration of protein structure and molecular semantic information, leading to limited accuracy. Our experiments demonstrate that string-based methods may overlook protein conformations, causing a high root mean square error (RMSE) of 3.584 in affinity due to a lack of spatial context. Single graph networks also underperform on topology features, with a 6% lower confidence interval (CI) for activity classification. The absence of semantic information also limits generalization across diverse compounds, resulting in an 18% increase in RMSE and a 5% increase in misclassifications in the quantification study, restricting potential drug discovery. To address these limitations, we propose G-K BertDTA, a novel framework for accurate DTA prediction incorporating protein features, molecular semantic features, and molecular structural information. In this proposed model, we represent drugs as graphs, with a GIN employed to learn the molecular topological information. For the extraction of protein structural features, we utilize a DenseNet architecture. A knowledge-based BERT semantic model is incorporated to obtain rich pre-trained semantic embeddings, thereby enhancing the feature information. We extensively evaluated our proposed approach on the publicly available benchmark datasets (i.e., KIBA and Davis), and experimental results demonstrate the promising performance of our method, which consistently outperforms previous state-of-the-art approaches.
Code is available at https://github.com/AmbitYuki/G-K-BertDTA.
Affiliation(s)
- Xihe Qiu
- School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
| | - Haoyu Wang
- School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
| | - Xiaoyu Tan
- INF Technology (Shanghai) Co., Ltd., Shanghai, China
| | - Zhijun Fang
- School of Computer Science and Technology, Donghua University, Shanghai, China.
|
46
|
Gao M, Zhang D, Chen Y, Zhang Y, Wang Z, Wang X, Li S, Guo Y, Webb GI, Nguyen ATN, May L, Song J. GraphormerDTI: A graph transformer-based approach for drug-target interaction prediction. Comput Biol Med 2024; 173:108339. [PMID: 38547658 DOI: 10.1016/j.compbiomed.2024.108339] [Received: 11/19/2023] [Revised: 03/05/2024] [Accepted: 03/17/2024] [Indexed: 04/17/2024]
Abstract
The application of Artificial Intelligence (AI) to screen drug molecules with potential therapeutic effects has revolutionized the drug discovery process, with significantly lower economic cost and time consumption than the traditional drug discovery pipeline. With the great power of AI, it is possible to rapidly search the vast chemical space for potential drug-target interactions (DTIs) between candidate drug molecules and disease protein targets. However, only a small proportion of molecules have labelled DTIs, consequently limiting the performance of AI-based drug screening. To solve this problem, a machine learning-based approach with great ability to generalize DTI prediction across molecules is desirable. Many existing machine learning approaches for DTI identification failed to exploit the full information with respect to the topological structures of candidate molecules. To develop a better approach for DTI prediction, we propose GraphormerDTI, which employs the powerful Graph Transformer neural network to model molecular structures. GraphormerDTI embeds molecular graphs into vector-format representations through iterative Transformer-based message passing, which encodes molecules' structural characteristics by node centrality encoding, node spatial encoding and edge encoding. With a strong structural inductive bias, the proposed GraphormerDTI approach can effectively infer informative representations for out-of-sample molecules and as such, it is capable of predicting DTIs across molecules with an exceptional performance. GraphormerDTI integrates the Graph Transformer neural network with a 1-dimensional Convolutional Neural Network (1D-CNN) to extract the drugs' and target proteins' representations and leverages an attention mechanism to model the interactions between them. 
To examine GraphormerDTI's performance for DTI prediction, we conduct experiments on three benchmark datasets, where GraphormerDTI outperforms five state-of-the-art baselines for out-of-molecule DTI prediction, including GNN-CPI, GNN-PT, DeepEmbedding-DTI, MolTrans and HyperAttentionDTI, and is on a par with the best baseline for transductive DTI prediction. The source code and datasets are publicly accessible at https://github.com/mengmeng34/GraphormerDTI.
Affiliation(s)
- Mengmeng Gao
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Daokun Zhang
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Melbourne, Australia.
| | - Yi Chen
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.
| | - Yiwen Zhang
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
| | - Zhikang Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
| | - Xiaoyu Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
| | - Shanshan Li
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
| | - Yuming Guo
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
| | - Geoffrey I Webb
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Melbourne, Australia
| | - Anh T N Nguyen
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, Australia
| | - Lauren May
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, Australia
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia.
|
47
|
Hong Q, Zhou G, Qin Y, Shen J, Li H. SadNet: a novel multimodal fusion network for protein-ligand binding affinity prediction. Phys Chem Chem Phys 2024; 26:12880-12891. [PMID: 38625412 DOI: 10.1039/d3cp05664c] [Indexed: 04/17/2024]
Abstract
Protein-ligand binding affinity prediction plays an important role in the field of drug discovery. Existing deep learning-based approaches have significantly improved the efficiency of protein-ligand binding affinity prediction through their excellent inductive bias capability. However, these methods only focus on fragmented three-dimensional data, which truncates the integrity of pocket data, leading to the neglect of potential long-range interactions. In this paper, we propose a dual-stream framework, with amino acid sequence assisting the atomic data fusion for graph neural network (termed SadNet), to fuse both 3D atomic data and sequence data for more accurate prediction results. In detail, SadNet consists of a pocket module and a sequence module. The sequence module expands the "receptive field" of the pocket module through a mid-term virtual node fusion. To better integrate sequence-level information from the sequence module and 3D structural information from the pocket module, we incorporate structural information for each amino acid within the sequence module. Besides, to better understand the intrinsic relationship between sequences and 3D atomic information, our SadNet utilizes information stacking from both the early stage and later stage. Experimental results on publicly available benchmark datasets demonstrate the superiority of the proposed dual-stream approach over the state-of-the-art alternatives. The code of this work is available online at https://github.com/wardhong/SadNet.
Affiliation(s)
- Qiansen Hong
- Nanjing University of Posts and Telecommunications, NanJing, China.
| | - Guoqiang Zhou
- Nanjing University of Posts and Telecommunications, NanJing, China.
| | - Yuke Qin
- Nanjing University of Posts and Telecommunications, NanJing, China.
| | - Jun Shen
- University of Wollongong, Australia
|
48
|
Du M, Xie X, Luo J, Li J. Meta-learning-based Inductive logistic matrix completion for prediction of kinase inhibitors. J Cheminform 2024; 16:44. [PMID: 38627866 PMCID: PMC11301988 DOI: 10.1186/s13321-024-00838-9] [Received: 06/17/2023] [Accepted: 03/31/2024] [Indexed: 08/09/2024] Open
Abstract
Protein kinases have become an important source of potential drug targets, and developing new, efficient, and safe small-molecule kinase inhibitors is an important topic in drug research and development. In contrast with traditional wet-lab experiments, which are time-consuming and expensive, machine learning-based approaches for predicting small-molecule inhibitors of protein kinases are time-saving and cost-effective, which makes them highly desirable. However, sample scarcity (known active and inactive compounds are usually limited for most kinases) poses a challenge to the development of machine learning-based methods for predicting kinase inhibitor activity. To alleviate the data scarcity problem in the prediction of kinase inhibitors, in this study we present a novel Meta-learning-based inductive logistic matrix completion method for the Prediction of Kinase Inhibitors (MetaILMC). MetaILMC adopts a meta-learning framework to learn a well-generalized model from tasks with sufficient samples, which can quickly adapt to new tasks with limited samples. Because MetaILMC allows the effective transfer of prior knowledge learned from kinases with sufficient samples to kinases with a small number of samples, the proposed model can produce accurate predictions for kinases with limited data. Experimental results show that MetaILMC performs excellently on few-shot kinase prediction tasks and is significantly superior to state-of-the-art multi-task learning across performance metrics such as AUC and AUPR. Case studies predicting kinase inhibitory scores for two drugs are also provided, further validating the proposed method's effectiveness and feasibility.
SCIENTIFIC CONTRIBUTION: Considering the potential correlation between activity prediction tasks for different kinases, we propose a novel meta-learning algorithm, MetaILMC, which learns a prior with strong generalization capacity during meta-training from tasks with sufficient training samples, such that it can be easily and quickly adapted to new tasks for kinases with scarce data during meta-testing. Thus, MetaILMC can effectively alleviate the data scarcity problem in the prediction of kinase inhibitors.
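The abstract does not give the model equations, so the following is only a minimal sketch of the general idea, assuming the standard inductive logistic matrix completion form in which an interaction probability is the sigmoid of a low-rank bilinear score between compound and kinase features. The names `predict` and `adapt`, the feature vectors, and the plain gradient-descent adaptation from a prior are all illustrative assumptions, not the authors' algorithm.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def project(M, x):
    """Return M^T x for a d-by-r matrix M (list of rows) and a length-d vector x."""
    return [sum(M[i][j] * x[i] for i in range(len(x))) for j in range(len(M[0]))]

def predict(W, H, x, k):
    """Probability that compound features x inhibit kinase features k.

    Assumed score: sigmoid((W^T x) . (H^T k)), a rank-r bilinear form.
    """
    u, v = project(W, x), project(H, k)
    return sigmoid(sum(a * b for a, b in zip(u, v)))

def adapt(W, H, support, lr=0.5, steps=20):
    """Adapt prior parameters (W, H) to a small support set.

    Hypothetical stand-in for few-shot adaptation: plain stochastic gradient
    descent on the logistic loss, starting from the prior. The inputs are
    copied, so the prior itself is left unchanged.
    """
    W = [row[:] for row in W]
    H = [row[:] for row in H]
    for _ in range(steps):
        for x, k, y in support:
            u, v = project(W, x), project(H, k)
            g = sigmoid(sum(a * b for a, b in zip(u, v))) - y  # dLoss/dScore
            for i in range(len(x)):
                for j in range(len(v)):
                    W[i][j] -= lr * g * x[i] * v[j]
            for i in range(len(k)):
                for j in range(len(u)):
                    H[i][j] -= lr * g * k[i] * u[j]
    return W, H

# Toy prior and a two-example support set for one "new" kinase (made-up data).
W0 = [[0.1, 0.0], [0.0, 0.1]]
H0 = [[0.1, 0.0], [0.0, 0.1]]
support = [([1.0, 0.0], [1.0, 0.0], 1), ([0.0, 1.0], [1.0, 0.0], 0)]
W1, H1 = adapt(W0, H0, support)
```

In this toy setting, a few gradient steps from the prior raise the predicted probability for the labeled active compound and lower it for the labeled inactive one. This is the behavior a well-chosen prior is meant to make possible from very few samples.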
Affiliation(s)
- Ming Du
  - School of Software, Yunnan University, Kunming, 650091, China
- XingRan Xie
  - School of Software, Yunnan University, Kunming, 650091, China
- Jing Luo
  - State Key Laboratory for Conservation and Utilization of Bio-Resource, School of Ecology and Environment and School of Life Sciences, Yunnan University, Kunming, 650091, Yunnan, China
- Jin Li
  - School of Software, Yunnan University, Kunming, 650091, China
  - The Key Laboratory of Software Engineering of Yunnan Province, Kunming, 650091, China
  - The Cloud Computing Engineering Research Center of Yunnan Province, Kunming, 650091, China
|
49
|
Zeng X, Chen W, Lei B. CAT-DTI: cross-attention and Transformer network with domain adaptation for drug-target interaction prediction. BMC Bioinformatics 2024; 25:141. [PMID: 38566002 PMCID: PMC11264959 DOI: 10.1186/s12859-024-05753-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 03/19/2024] [Indexed: 04/04/2024]
Abstract
Accurate and efficient prediction of drug-target interaction (DTI) is critical to advance drug development and reduce the cost of drug discovery. Recently, deep learning methods have improved the precision and efficacy of DTI prediction, but several challenges remain. The first lies in efficiently learning drug and protein feature representations alongside their interaction features. Another important challenge is improving the generalization capability of DTI models in real-world scenarios. To address these challenges, we propose CAT-DTI, a model based on cross-attention and Transformer that possesses domain adaptation capability. CAT-DTI effectively captures drug-target interactions while adapting to out-of-distribution data. Specifically, we use a convolutional neural network combined with a Transformer to encode the distance relationship between amino acids within protein sequences and employ a cross-attention module to capture the drug-target interaction features. Generalization to new DTI prediction scenarios is achieved by leveraging a conditional domain adversarial network, which aligns DTI representations under diverse distributions. Experimental results in in-domain and cross-domain scenarios demonstrate that the CAT-DTI model improves overall DTI prediction performance compared with previous methods.
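To make the cross-attention step concrete, here is a minimal sketch of scaled dot-product cross-attention in which drug token embeddings act as queries over protein residue embeddings (keys and values). It is a generic illustration, not the CAT-DTI module: the function name `cross_attention`, the toy embeddings, and the omission of learned projection matrices and multiple heads are all simplifying assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention of each query over all keys.

    queries: drug token embeddings, shape (n_q, d)
    keys, values: protein residue embeddings, shapes (n_k, d) and (n_k, d_v)
    Returns (outputs, weights): the attended vectors and the attention rows.
    Learned Q/K/V projections are omitted here (identity is assumed).
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([
            sum(w[j] * values[j][c] for j in range(len(values)))
            for c in range(len(values[0]))
        ])
    return outputs, weights

# Toy embeddings (made-up numbers): 2 drug tokens, 3 protein residues, d = 2.
drug = [[1.0, 0.0], [0.0, 1.0]]
protein = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attn = cross_attention(drug, protein, protein)
```

Each row of `attn` is a probability distribution over protein residues, so every drug token receives a protein-aware summary vector. A full model would typically also attend in the other direction and pass the result to a prediction head.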
Affiliation(s)
- Xiaoting Zeng
  - School of Computer and Software, Shenzhen University, Shenzhen, 518060, China
- Weilin Chen
  - Marshall Laboratory of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, 518055, China
- Baiying Lei
  - School of Biomedical Engineering, Shenzhen University, Shenzhen, 518055, China
|
50
|
Chen S, Li M, Semenov I. MFA-DTI: Drug-target interaction prediction based on multi-feature fusion adopted framework. Methods 2024; 224:79-92. [PMID: 38430967 DOI: 10.1016/j.ymeth.2024.02.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Revised: 02/16/2024] [Accepted: 02/23/2024] [Indexed: 03/05/2024]
Abstract
The identification of drug-target interactions (DTI) is a valuable step in the drug discovery and repositioning process. However, traditional laboratory experiments are time-consuming and expensive. Computational methods have streamlined research to determine DTIs. The application of deep learning methods has significantly improved the prediction performance for DTIs. Modern deep learning methods can leverage multiple sources of information, including sequence data that contains biological structural information, and interaction data. While useful, these methods cannot be effectively applied to each type of information individually (e.g., chemical structure and interaction network) and do not take into account the specificity of DTI data such as low- or zero-interaction biological entities. To overcome these limitations, we propose a method called MFA-DTI (Multi-feature Fusion Adopted framework for DTI). MFA-DTI consists of three modules: an interaction graph learning module that processes the interaction network to generate interaction vectors, a chemical structure learning module that extracts features from the chemical structure, and a fusion module that combines these features for the final prediction. To validate the performance of MFA-DTI, we conducted experiments on six public datasets under different settings. The results indicate that the proposed method is highly effective in various settings and outperforms state-of-the-art methods.
Affiliation(s)
- Siqi Chen
  - School of Information Science and Engineering, Chongqing Jiaotong University, Chongqing, 400074, China
- Minghui Li
  - Beidahuang Industry Group General Hospital, Harbin, 150006, China
- Ivan Semenov
  - College of Intelligence and Computing, Tianjin University, Tianjin, 300072, China
|