1. Choi D, Park S. Improving binding affinity prediction by emphasizing local features of drug and protein. Comput Biol Chem 2024; 115:108310. [PMID: 39674048 DOI: 10.1016/j.compbiolchem.2024.108310] [Received: 03/21/2024] [Revised: 10/10/2024] [Accepted: 12/04/2024] [Indexed: 12/16/2024]
Abstract
Binding affinity prediction is considered a fundamental task in drug discovery. Despite much effort to improve the accuracy of binding affinity prediction, prior work considered only macro-level features that represent the characteristics of the whole architecture of a drug and a target protein, so features from the local structure of the drug and the protein tend to be lost. In this paper, we propose a deep learning model that comprehensively extracts the local features of both a drug and a target protein for accurate binding affinity prediction. The proposed model consists of two components, named Multi-Stream CNN and Multi-Stream GCN, which capture micro-level characteristics or local features from subsequences of a target protein sequence and subgraphs of a drug molecule, respectively. Because each component has multiple streams with different numbers of layers, it can compute and preserve local features through a stream consisting of a single layer. Our evaluation on two popular datasets, Davis and KIBA, demonstrates that the proposed model outperforms all the baseline models that use global features, implying that local features play significant roles in binding affinity prediction.
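A toy, dependency-free sketch of the multi-stream idea follows. It is not the authors' Multi-Stream CNN: the numeric sequence encoding, the single shared kernel, ReLU and global max pooling are assumptions chosen for illustration. With a kernel of width 3, streams of depth 1, 2 and 3 see windows of 3, 5 and 7 positions, so the single-layer stream keeps features from the shortest local subsequences.

```python
from typing import List, Sequence


def conv1d(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' 1D convolution (cross-correlation, no padding) followed by ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(x) - k + 1):
        s = sum(x[i + j] * kernel[j] for j in range(k))
        out.append(max(0.0, s))
    return out


def stream(x: Sequence[float], kernel: Sequence[float], depth: int) -> float:
    """Stack `depth` conv layers, then global max pooling over positions."""
    h = list(x)
    for _ in range(depth):
        h = conv1d(h, kernel)
    return max(h) if h else 0.0  # 0.0 if the receptive field exceeds the input


def multi_stream_features(x, kernel, depths=(1, 2, 3)) -> List[float]:
    """Concatenate the pooled outputs of streams with different depths."""
    return [stream(x, kernel, d) for d in depths]


# Hypothetical numeric encoding of a short protein subsequence.
seq = [0.5, 1.0, -0.5, 2.0, 0.0, 1.5, 1.0, -1.0]
kernel = [1.0, -1.0, 0.5]
print(multi_stream_features(seq, kernel))  # [2.75, 3.875, 5.25]
```

Concatenating all streams lets a downstream regressor see narrow and wide context side by side; a real model would use learned multi-channel kernels per layer rather than one fixed kernel.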
Affiliation(s)
- Daejin Choi
- Department of Computer Science and Engineering, Incheon National University, Incheon, Republic of Korea.
- Sangjun Park
- Department of Artificial Intelligence, Korea University, Seoul, Republic of Korea.

2. Schifferstein J, Bernatavicius A, Janssen APA. Docking-Informed Machine Learning for Kinome-wide Affinity Prediction. J Chem Inf Model 2024. [PMID: 39657274 DOI: 10.1021/acs.jcim.4c01260] [Indexed: 12/17/2024]
Abstract
Kinase inhibitors are an important class of anticancer drugs, with 80 inhibitors clinically approved and >100 in active clinical testing. Most bind competitively in the ATP-binding site, leading to challenges with selectivity for a specific kinase, resulting in risks for toxicity and general off-target effects. Assessing the binding of an inhibitor for the entire kinome is experimentally possible but expensive. A reliable and interpretable computational prediction of kinase selectivity would greatly benefit the inhibitor discovery and optimization process. Here, we use machine learning on docked poses to address this need. To this end, we aggregated all known inhibitor-kinase affinities and generated the complete accompanying 3D interactome by docking all inhibitors to the respective high-quality X-ray structures. We then used this resource to train a neural network as a kinase-specific scoring function, which achieved an overall performance (R2) of 0.63-0.74 on unseen inhibitors across the kinome. The entire pipeline from molecule to 3D-based affinity prediction has been fully automated and wrapped in a freely available package. This has a graphical user interface that is tightly integrated with PyMOL to allow immediate adoption in the medicinal chemistry practice.
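The abstract reports performance as R². Assuming this is the coefficient of determination (it could also denote a squared Pearson correlation; the abstract does not say), it is computed as 1 - SS_res / SS_tot. The affinity values below are made up for illustration.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when all true values are equal.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted affinities.
print(r_squared([5.0, 6.0, 7.0, 8.0], [5.5, 6.0, 6.5, 8.0]))  # ≈ 0.9
```

Unlike a squared correlation, this value can become negative when predictions are worse than always predicting the mean of the measured values.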
Affiliation(s)
- Jordy Schifferstein
- Department of Molecular Physiology, Leiden Institute of Chemistry, Leiden University, Leiden 2333CC, The Netherlands
- Oncode Institute, Utrecht 3521AL, The Netherlands
- Andrius Bernatavicius
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden 2333CC, The Netherlands
- Antonius P A Janssen
- Department of Molecular Physiology, Leiden Institute of Chemistry, Leiden University, Leiden 2333CC, The Netherlands
- Oncode Institute, Utrecht 3521AL, The Netherlands

3. Liu M, Meng X, Mao Y, Li H, Liu J. ReduMixDTI: Prediction of Drug-Target Interaction with Feature Redundancy Reduction and Interpretable Attention Mechanism. J Chem Inf Model 2024; 64:8952-8962. [PMID: 39570771 DOI: 10.1021/acs.jcim.4c01554] [Indexed: 12/10/2024]
Abstract
Identifying drug-target interactions (DTIs) is essential for drug discovery and development. Existing deep learning approaches to DTI prediction often employ powerful feature encoders to represent drugs and targets holistically, which usually introduces significant redundancy and noise because the restricted binding regions are neglected. Furthermore, many previous DTI networks ignore or simplify the complex intermolecular interaction process involving diverse binding types, which significantly limits both predictive ability and interpretability. We propose ReduMixDTI, an end-to-end model that addresses feature redundancy and explicitly captures complex local interactions for DTI prediction. In this study, drug and target features are encoded using graph neural networks and convolutional neural networks, respectively. These features are refined from channel and spatial perspectives to enhance the representations. The proposed attention mechanism explicitly models pairwise interactions between drug and target substructures, improving the model's understanding of binding processes. In extensive comparisons with seven state-of-the-art methods, ReduMixDTI demonstrates superior performance across three benchmark data sets and external test sets reflecting real-world scenarios. Additionally, we perform comprehensive ablation studies and visualize protein attention weights to enhance interpretability. The results confirm that ReduMixDTI serves as a robust and interpretable model for reducing feature redundancy, contributing to advances in DTI prediction.
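The abstract only says that features are refined "from channel and spatial perspectives". A common way to do this is CBAM-style gating, a channel gate followed by a spatial (per-position) gate. The parameter-free sketch below assumes that design and replaces the learned layers of a real module with a plain sum of pooled descriptors, so it shows only the data flow, not ReduMixDTI's actual module.

```python
import math
from typing import List


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def refine(features: List[List[float]]) -> List[List[float]]:
    """features[p][c] = value at position p (residue or atom), channel c.

    Applies a channel gate, then a spatial gate. A learned CBAM-style module
    would pass the pooled descriptors through small networks; here the
    average- and max-pooled values are simply added before the sigmoid.
    """
    n_pos, n_ch = len(features), len(features[0])
    # Channel gate: pool each channel over all positions.
    ch_gate = []
    for c in range(n_ch):
        col = [features[p][c] for p in range(n_pos)]
        ch_gate.append(sigmoid(sum(col) / n_pos + max(col)))
    x = [[features[p][c] * ch_gate[c] for c in range(n_ch)] for p in range(n_pos)]
    # Spatial gate: pool each position over all channels.
    sp_gate = [sigmoid(sum(row) / n_ch + max(row)) for row in x]
    return [[v * sp_gate[p] for v in x[p]] for p in range(n_pos)]


# Hypothetical 3-position x 2-channel feature map.
print(refine([[1.0, -2.0], [0.5, 3.0], [0.0, 1.0]]))
```

Because both gates lie in (0, 1), the refinement can only down-weight channels and positions, which is how such modules suppress redundant or noisy features.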
Affiliation(s)
- Mingqing Liu
- National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, Anhui, China
- Center for Advanced Interdisciplinary Science and Biomedicine of IHM, Division of Life Sciences and Medicine, University of Science and Technology of China MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei 230026, Anhui, China
- Xuechun Meng
- National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, Anhui, China
- Center for Advanced Interdisciplinary Science and Biomedicine of IHM, Division of Life Sciences and Medicine, University of Science and Technology of China MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei 230026, Anhui, China
- Yiyang Mao
- National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, Anhui, China
- Center for Advanced Interdisciplinary Science and Biomedicine of IHM, Division of Life Sciences and Medicine, University of Science and Technology of China MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei 230026, Anhui, China
- Hongqi Li
- Department of Geriatrics, The First Affiliated Hospital of USTC, University of Science and Technology of China, Hefei 230026, Anhui, China
- Ji Liu
- National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, Anhui, China
- Center for Advanced Interdisciplinary Science and Biomedicine of IHM, Division of Life Sciences and Medicine, University of Science and Technology of China MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei 230026, Anhui, China
- MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei 230026, Anhui, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230026, Anhui, China

4. Paendong GG, Ngnamsie Njimbouom S, Zonyfar C, Kim J. ERL-ProLiGraph: Enhanced representation learning on protein-ligand graph structured data for binding affinity prediction. Mol Inform 2024; 43:e202400044. [PMID: 39404190 PMCID: PMC11639045 DOI: 10.1002/minf.202400044] [Received: 02/01/2024] [Revised: 06/03/2024] [Accepted: 06/21/2024] [Indexed: 12/14/2024]
Abstract
Predicting Protein-Ligand Binding Affinity (PLBA) is pivotal in drug development, as accurate estimates of PLBA expedite the identification of promising drug candidates for specific targets, thereby accelerating the drug discovery process. Despite substantial advances in PLBA prediction, developing an efficient and more accurate method remains non-trivial. Unlike previous computer-aided PLBA studies, which primarily used ligand SMILES and protein sequences represented as strings, this research introduces a deep learning-based method, Enhanced Representation Learning on Protein-Ligand Graph Structured data for Binding Affinity Prediction (ERL-ProLiGraph). The distinguishing aspect of this method is the use of graph representations for both proteins and ligands, with the aim of learning structural information from both to improve the accuracy of PLBA predictions. In these graphs, nodes represent atomic structures, while edges depict chemical bonds and spatial relationships. The proposed model, leveraging deep learning algorithms, effectively learns to correlate these graph representations with binding affinities. This graph-based representation approach enhances the model's ability to capture the complex molecular interactions critical in PLBA. This work represents a promising advancement in computational techniques for protein-ligand binding prediction, offering a potential path toward more efficient and accurate predictions in drug development. Comparative analysis indicates that the proposed ERL-ProLiGraph outperforms previous models, showing notable efficacy and providing a more suitable approach for accurate PLBA prediction.
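A minimal sketch of building such a graph, assuming 3D coordinates are available and that "spatial relationship" edges connect protein and ligand atoms within a distance cutoff. The 4.0 Å cutoff, the tuple layout and the `build_graph` helper are hypothetical choices, not taken from ERL-ProLiGraph.

```python
import math
from typing import List, Set, Tuple

# (atom id, molecule label, xyz coordinates in ångström)
Atom = Tuple[str, str, Tuple[float, float, float]]
Edge = Tuple[str, str, str]


def build_graph(atoms: List[Atom], bonds: List[Tuple[str, str]],
                cutoff: float = 4.0) -> Set[Edge]:
    """Covalent 'bond' edges plus 'spatial' edges between atoms of
    different molecules that lie within `cutoff` of each other."""
    edges: Set[Edge] = set()
    for a, b in bonds:
        edges.add((min(a, b), max(a, b), "bond"))
    for i, (id_i, mol_i, xyz_i) in enumerate(atoms):
        for id_j, mol_j, xyz_j in atoms[i + 1:]:
            if mol_i != mol_j and math.dist(xyz_i, xyz_j) <= cutoff:
                edges.add((min(id_i, id_j), max(id_i, id_j), "spatial"))
    return edges


atoms = [
    ("L1", "ligand", (0.0, 0.0, 0.0)),
    ("L2", "ligand", (1.5, 0.0, 0.0)),
    ("P1", "protein", (4.5, 0.0, 0.0)),
    ("P2", "protein", (10.0, 0.0, 0.0)),
]
print(sorted(build_graph(atoms, bonds=[("L1", "L2")])))
# [('L1', 'L2', 'bond'), ('L2', 'P1', 'spatial')]
```

The pairwise loop is quadratic in the number of atoms; for whole proteins a spatial index (or restricting to pocket atoms) would normally be used instead.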
Affiliation(s)
- Gloria Geine Paendong
- Department of Computer Science and Electronics Engineering, Sun Moon University, Chungcheongnam-do, Korea
- Candra Zonyfar
- Department of Computer Science and Electronics Engineering, Sun Moon University, Chungcheongnam-do, Korea
- Jeong‐Dong Kim
- Department of Computer Science and Electronics Engineering, Sun Moon University, Chungcheongnam-do, Korea
- Department of Computer Science and Engineering, Sun Moon University, Chungcheongnam-do, Korea
- Genome-Based Bio IT Convergence Institute, Sun Moon University, Chungcheongnam-do, Korea

5. Tanoli Z, Schulman A, Aittokallio T. Validation guidelines for drug-target prediction methods. Expert Opin Drug Discov 2024:1-15. [PMID: 39568436 DOI: 10.1080/17460441.2024.2430955] [Received: 04/30/2024] [Accepted: 11/14/2024] [Indexed: 11/22/2024]
Abstract
INTRODUCTION Mapping the interactions between pharmaceutical compounds and their molecular targets is a fundamental aspect of drug discovery and repurposing. Drug-target interactions are important for elucidating mechanisms of action and optimizing drug efficacy and safety profiles. Several computational methods have been developed to systematically predict drug-target interactions. However, computational and experimental validation of drug-target predictions varies greatly across studies. AREAS COVERED Through a PubMed query, a corpus of 3,286 articles on drug-target interaction prediction published within the past decade was assembled. Natural language processing was used for automated abstract classification to study the evolution of computational methods, validation strategies and performance assessment metrics in the 3,286 articles. Additionally, a manual analysis of 259 studies that performed experimental validation of computational predictions revealed prevalent experimental protocols. EXPERT OPINION Since 2014, there has been a noticeable increase in articles focusing on drug-target interaction prediction. Docking and regression stand out as the most commonly used computational techniques, and cross-validation is frequently employed as the computational validation strategy. Testing the predictions using multiple, orthogonal validation strategies is recommended and should be reported for the specific target prediction applications. Experimental validation remains relatively rare and should be performed more routinely to evaluate the biological relevance of predictions.
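How cross-validation folds are formed matters for drug-target pairs: if the same compound appears in both training and test folds, the scores may overstate performance on compounds never seen in training. The sketch below shows one possible splitting scheme, grouping all pairs of a compound into a single fold; the helper name and the toy pairs are hypothetical, and the article does not prescribe this exact procedure.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (compound id, target id)


def compound_grouped_folds(pairs: List[Pair], k: int) -> List[List[Pair]]:
    """Put every pair of a given compound into the same fold, so a test
    fold never contains compounds seen during training."""
    compounds = sorted({c for c, _ in pairs})
    # Round-robin over sorted compound ids; fold sizes (in pairs) may be uneven.
    fold_of: Dict[str, int] = {c: i % k for i, c in enumerate(compounds)}
    folds: List[List[Pair]] = [[] for _ in range(k)]
    for pair in pairs:
        folds[fold_of[pair[0]]].append(pair)
    return folds


pairs = [("d1", "t1"), ("d1", "t2"), ("d2", "t1"),
         ("d3", "t3"), ("d3", "t1"), ("d4", "t2")]
folds = compound_grouped_folds(pairs, k=2)
for i, test in enumerate(folds):
    train = [p for j, fold in enumerate(folds) if j != i for p in fold]
    overlap = {c for c, _ in test} & {c for c, _ in train}
    print(f"fold {i}: {len(train)} train / {len(test)} test pairs, "
          f"shared compounds: {overlap or 'none'}")
```

The same idea applies to targets (grouping by target id) or to both at once, which gives progressively harder and more realistic estimates than a random split over pairs.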
Affiliation(s)
- Ziaurrehman Tanoli
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- iCAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Aron Schulman
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Tero Aittokallio
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- iCAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Institute for Cancer Research, Department of Cancer Genetics, Oslo University Hospital, Oslo, Norway
- Oslo Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Oslo, Norway

6. Theisen R, Wang T, Ravikumar B, Rahman R, Cichońska A. Leveraging multiple data types for improved compound-kinase bioactivity prediction. Nat Commun 2024; 15:7596. [PMID: 39217147 PMCID: PMC11365929 DOI: 10.1038/s41467-024-52055-5] [Received: 03/08/2024] [Accepted: 08/21/2024] [Indexed: 09/04/2024]
Abstract
Machine learning provides efficient ways to map compound-kinase interactions. However, diverse bioactivity data types, including single-dose and multi-dose-response assay results, present challenges. Traditional models utilize only multi-dose data, overlooking information contained in single-dose measurements. Here, we propose a machine learning methodology for compound-kinase activity prediction that leverages both single-dose and dose-response data. We demonstrate that our two-stage approach yields accurate activity predictions and significantly improves model performance compared to training solely on dose-response labels. This superior performance is consistent across five diverse machine learning methods. Using the best performing model, we carried out extensive experimental profiling on a total of 347 selected compound-kinase pairs, achieving a high hit rate of 40% and a negative predictive value of 78%. We show that these rates can be improved further by incorporating model uncertainty estimates into the compound selection process. By integrating multiple activity data types, we demonstrate that our approach holds promise for facilitating the development of training activity datasets in a more efficient and cost-effective way.
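Assuming the usual definitions, the hit rate is the share of tested pairs predicted active that were confirmed active, and the negative predictive value (NPV) is the share of tested pairs predicted inactive that were confirmed inactive. The abstract does not give the underlying counts, so the numbers below are purely hypothetical and do not reproduce the study's 40% and 78%.

```python
def hit_rate(true_pos: int, false_pos: int) -> float:
    """Share of tested pairs predicted active that were confirmed active."""
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(true_neg: int, false_neg: int) -> float:
    """Share of tested pairs predicted inactive that were confirmed inactive."""
    return true_neg / (true_neg + false_neg)


# Hypothetical assay outcomes, not the counts from the study.
print(hit_rate(true_pos=30, false_pos=20))                   # 0.6
print(negative_predictive_value(true_neg=45, false_neg=5))   # 0.9
```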
Affiliation(s)
- Ryan Theisen
- Harmonic Discovery Inc., New York City, NY, USA.

7. Schulman A, Rousu J, Aittokallio T, Tanoli Z. Attention-based approach to predict drug-target interactions across seven target superfamilies. Bioinformatics 2024; 40:btae496. [PMID: 39115379 PMCID: PMC11520408 DOI: 10.1093/bioinformatics/btae496] [Received: 03/21/2024] [Revised: 06/12/2024] [Accepted: 08/06/2024] [Indexed: 08/29/2024]
Abstract
MOTIVATION Drug-target interactions (DTIs) play a pivotal role in drug repurposing and the elucidation of drug mechanisms of action. While single-targeted drugs have demonstrated clinical success, they often exhibit limited efficacy against complex diseases, such as cancers, whose development and treatment depend on several biological processes. Therefore, a comprehensive understanding of primary, secondary and even inactive targets becomes essential in the quest for effective and safe treatments for cancer and other indications. The human proteome offers over a thousand druggable targets, yet most FDA-approved drugs bind to only a small fraction of these targets. RESULTS This study introduces an attention-based method (called MMAtt-DTA) to predict drug-target bioactivities across human proteins within seven superfamilies. We meticulously examined nine different descriptor sets to identify optimal signature descriptors for predicting novel DTIs. Our testing results demonstrated Spearman correlations exceeding 0.72 (P < 0.001) for six out of seven superfamilies. The proposed method outperformed fourteen state-of-the-art machine learning, deep learning and graph-based methods and maintained relatively high performance for most target superfamilies when tested with independent bioactivity data sources. We computationally validated 185 676 drug-target pairs from ChEMBL-V33 that were not available during model training, achieving reasonable performance with Spearman correlation >0.57 (P < 0.001) for most superfamilies. This underscores the robustness of the proposed method for predicting novel DTIs. Finally, we applied our method to predict missing bioactivities among 3492 approved molecules in ChEMBL-V33, offering a valuable tool for advancing drug mechanism discovery and repurposing existing drugs for new indications. AVAILABILITY AND IMPLEMENTATION https://github.com/AronSchulman/MMAtt-DTA.
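The Spearman correlations reported above compare predicted and measured bioactivities by rank rather than by value. A plain-Python sketch with average ranks for ties follows; the values are invented, and the P-values quoted in the abstract would need a separate significance test that is not shown here.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


measured = [5.1, 6.3, 7.0, 4.2]    # hypothetical bioactivities
predicted = [5.0, 7.5, 6.0, 4.8]   # one pair of neighbours swapped in rank
print(spearman(measured, predicted))  # 0.8
```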
Affiliation(s)
- Aron Schulman
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, 00014, Finland
- Juho Rousu
- Department of Computer Science, Aalto University, Espoo, 02150, Finland
- Tero Aittokallio
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, 00014, Finland
- iCAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital, Helsinki, 00014, Finland
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, 0379, Norway
- Oslo Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Oslo, 0372, Norway
- Ziaurrehman Tanoli
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, 00014, Finland
- iCAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital, Helsinki, 00014, Finland
- Drug Discovery and Chemical Biology (DDCB) Consortium, Biocenter, Helsinki, 00014, Finland
- BioICAWtech, Helsinki, Helsinki, 00410, Finland

8. Wu H, Liu J, Zhang R, Lu Y, Cui G, Cui Z, Ding Y. A review of deep learning methods for ligand based drug virtual screening. Fundam Res 2024; 4:715-737. [PMID: 39156568 PMCID: PMC11330120 DOI: 10.1016/j.fmre.2024.02.011] [Received: 10/30/2023] [Revised: 01/10/2024] [Accepted: 02/18/2024] [Indexed: 08/20/2024]
Abstract
Drug discovery is costly and time-consuming, and modern drug discovery increasingly relies on computational methods to reduce the time and cost of the process. In particular, the long time required for vaccine and drug discovery is a pressing problem during emergencies such as the coronavirus disease 2019 (COVID-19) pandemic. Recently, deep learning methods have performed particularly well in drug virtual screening. Researchers therefore face the question of how to summarize existing deep learning methods for drug virtual screening, select suitable models for different screening problems, exploit the advantages of deep learning models, and further improve their capability in drug virtual screening. This review first introduces the basic concepts of drug virtual screening, common datasets, and data representation methods. Then, a large number of common deep learning methods for drug virtual screening are compared and analyzed. In addition, datasets of different sizes are constructed independently to evaluate the performance of each deep learning model on the difficult problem of large-scale ligand virtual screening. Finally, the existing challenges and future directions in the field of virtual screening are presented.
Affiliation(s)
- Hongjie Wu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Junkai Liu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Runhua Zhang
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yaoyao Lu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Guozeng Cui
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Zhiming Cui
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China

9. Zhang H, Liu X, Cheng W, Wang T, Chen Y. Prediction of drug-target binding affinity based on deep learning models. Comput Biol Med 2024; 174:108435. [PMID: 38608327 DOI: 10.1016/j.compbiomed.2024.108435] [Received: 01/29/2024] [Revised: 04/05/2024] [Accepted: 04/07/2024] [Indexed: 04/14/2024]
Abstract
The prediction of drug-target binding affinity (DTA) plays an important role in drug discovery. Computerized virtual screening techniques have been used for DTA prediction, greatly reducing the time and economic costs of drug discovery. However, these techniques have not succeeded in reversing the low success rate of new drug development. In recent years, the continuous development of deep learning (DL) technology has brought new opportunities for drug discovery through DTA prediction. This shift has moved DTA prediction from traditional machine learning methods to DL. The DL frameworks used for DTA prediction include convolutional neural networks (CNN), graph convolutional neural networks (GCN), recurrent neural networks (RNN), and reinforcement learning (RL), among others. This review article summarizes the available literature on DTA prediction using DL models, including DTA quantification metrics and datasets, and DL algorithms used for DTA prediction (including input representations of models, neural network frameworks, evaluation indicators, and model interpretability). In addition, the opportunities, challenges, and prospects of applying DL frameworks to DTA prediction in drug discovery are discussed.
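Two evaluation quantities that recur in DTA papers using benchmarks such as Davis and KIBA are sketched below as an illustration, not as the review's own definitions: converting a dissociation constant in nM to pKd = -log10(Kd / 10^9), and the concordance index (CI), the fraction of comparable pairs whose predicted affinities are ordered the same way as the measured ones, with tied predictions counting one half. The example values are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def pkd_from_nm(kd_nm: float) -> float:
    """Convert a dissociation constant in nM to pKd = -log10(Kd in mol/L)."""
    return -math.log10(kd_nm / 1e9)


def concordance_index(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of comparable pairs ranked in the right order.

    Pairs with equal measured affinity are skipped; tied predictions count 0.5.
    """
    num, den = 0.0, 0
    for i, j in combinations(range(len(y_true)), 2):
        if y_true[i] == y_true[j]:
            continue
        den += 1
        hi, lo = (i, j) if y_true[i] > y_true[j] else (j, i)
        if y_pred[hi] > y_pred[lo]:
            num += 1.0
        elif y_pred[hi] == y_pred[lo]:
            num += 0.5
    if den == 0:
        raise ValueError("no comparable pairs")
    return num / den


print(pkd_from_nm(10_000.0))  # ≈ 5.0
print(concordance_index([5.0, 6.0, 7.0], [5.2, 6.5, 6.1]))  # 2 of 3 pairs in order
```

This direct version is quadratic in the number of samples, which is acceptable for test sets of a few tens of thousands of pairs but slow beyond that.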
Affiliation(s)
- Hao Zhang
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Xiaoqian Liu
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Wenya Cheng
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Tianshi Wang
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China
- Yuanyuan Chen
- College of Science, Nanjing Agricultural University, Nanjing, 210095, China

10. Pogány D, Antal P. Towards explainable interaction prediction: Embedding biological hierarchies into hyperbolic interaction space. PLoS One 2024; 19:e0300906. [PMID: 38512848 PMCID: PMC10956837 DOI: 10.1371/journal.pone.0300906] [Received: 12/11/2023] [Accepted: 03/06/2024] [Indexed: 03/23/2024]
Abstract
Given the prolonged timelines and high costs associated with traditional approaches, accelerating drug development is crucial. Computational methods, particularly drug-target interaction prediction, have emerged as efficient tools, yet the explainability of machine learning models remains a challenge. Our work aims to provide more interpretable interaction prediction models using similarity-based prediction in a latent space aligned to biological hierarchies. We investigated integrating drug and protein hierarchies into a joint-embedding drug-target latent space via embedding regularization by conducting a comparative analysis between models employing traditional flat Euclidean vector spaces and those utilizing hyperbolic embeddings. In addition, we provided a latent space analysis as an example of how visual insights into the trained model can be gained with the help of dimensionality reduction. Our results demonstrate that hierarchy regularization improves interpretability without compromising predictive performance. Furthermore, integrating hyperbolic embeddings, coupled with regularization, enhances the quality of the embedded hierarchy trees. Our approach enables a more informed and insightful application of interaction prediction models in drug discovery by constructing an interpretable hyperbolic latent space, simultaneously incorporating drug and target hierarchies and pairing them with available interaction information. Moreover, because it is compatible with pairwise methods, the approach allows for additional transparency through existing explainable AI solutions.
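The abstract does not name the hyperbolic model; the Poincaré ball is a common choice and is assumed here. Its distance, d(u, v) = arcosh(1 + 2‖u - v‖² / ((1 - ‖u‖²)(1 - ‖v‖²))), grows rapidly near the boundary of the ball, which is the usual argument for why tree-like hierarchies embed well. The points below are hypothetical.

```python
import math
from typing import Sequence


def _sq_norm(x: Sequence[float]) -> float:
    return sum(a * a for a in x)


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points inside the unit Poincare ball."""
    nu, nv = _sq_norm(u), _sq_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = _sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))


# The same Euclidean step (0.1) costs far more near the boundary than near the origin.
print(poincare_distance((0.0, 0.0), (0.1, 0.0)))  # ≈ 0.20
print(poincare_distance((0.8, 0.0), (0.9, 0.0)))  # ≈ 0.75
```

In a hierarchy embedding, general concepts (e.g. protein families) tend to sit near the origin and specific members near the boundary, where there is exponentially more room.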
Affiliation(s)
- Domonkos Pogány
- Department of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary
- Péter Antal
- Department of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary

11. Qiu W, Liang Q, Yu L, Xiao X, Qiu W, Lin W. LSTM-SAGDTA: Predicting Drug-target Binding Affinity with an Attention Graph Neural Network and LSTM Approach. Curr Pharm Des 2024; 30:468-476. [PMID: 38323613 PMCID: PMC11071654 DOI: 10.2174/0113816128282837240130102817] [Received: 10/18/2023] [Revised: 01/14/2024] [Accepted: 01/19/2024] [Indexed: 02/08/2024]
Abstract
INTRODUCTION Drug development is a challenging and costly process, yet it plays a crucial role in improving healthcare outcomes. Drug development requires extensive research and testing to meet the demands for economic efficiency, cures, and pain relief. METHODS Drug development is a vital research area that necessitates innovation and collaboration to achieve significant breakthroughs. Computer-aided drug design provides a promising avenue for drug discovery and development by reducing costs and improving the efficiency of drug design and testing. RESULTS In this study, a novel model, LSTM-SAGDTA, capable of accurately predicting drug-target binding affinity, was developed. We employed SeqVec to characterize the protein and graph neural networks to capture information on drug molecules. By introducing self-attentive graph pooling, the model achieved greater accuracy and efficiency in predicting drug-target binding affinity. CONCLUSION Moreover, LSTM-SAGDTA achieved superior accuracy over current state-of-the-art methods while using less training time. The experimental results suggest that this method is a high-precision solution for DTA prediction.
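Self-attention graph pooling, as it is usually formulated (SAGPool), scores each node with a one-channel graph convolution, keeps the top-scoring fraction of nodes, and gates their features by the tanh score. The sketch assumes that formulation with a fixed, hand-picked weight vector `theta` in place of learned parameters and a hypothetical 4-atom chain; it is not LSTM-SAGDTA's code.

```python
import math
from typing import List, Tuple


def sag_pool(x: List[List[float]], adj: List[List[int]], theta: List[float],
             ratio: float = 0.5) -> Tuple[List[int], List[List[float]], List[List[int]]]:
    """Score nodes with a one-channel GCN layer, keep the top ceil(ratio*n)
    nodes, and gate their features by tanh(score)."""
    n = len(x)
    # A + I (self-loops) and node degrees for symmetric normalisation.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    proj = [sum(f * t for f, t in zip(row, theta)) for row in x]  # X @ theta
    scores = [
        math.tanh(sum(a_hat[i][j] * proj[j] / math.sqrt(deg[i] * deg[j]) for j in range(n)))
        for i in range(n)
    ]
    k = max(1, math.ceil(ratio * n))
    keep = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    x_pooled = [[f * scores[i] for f in x[i]] for i in keep]
    adj_pooled = [[adj[i][j] for j in keep] for i in keep]
    return keep, x_pooled, adj_pooled


# Hypothetical 4-atom chain 0-1-2-3 with 2-dimensional atom features.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
keep, x_pooled, adj_pooled = sag_pool(x, adj, theta=[1.0, 0.5])
print(keep, adj_pooled)  # [0, 1] [[0, 1], [1, 0]]
```

Because the score comes from a graph convolution, a node's survival depends on its neighbours as well as its own features, which is what distinguishes this from plain top-k selection on node features.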
Affiliation(s)
- Wenjing Qiu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Qianle Liang
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Liyi Yu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Xuan Xiao
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Wangren Qiu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China
- Weizhong Lin
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333000, China

12. Luo Y, Liu Y, Peng J. Calibrated geometric deep learning improves kinase-drug binding predictions. Nat Mach Intell 2023; 5:1390-1401. [PMID: 38962391 PMCID: PMC11221792 DOI: 10.1038/s42256-023-00751-0] [Received: 12/20/2022] [Accepted: 09/29/2023] [Indexed: 07/05/2024]
Abstract
Protein kinases regulate various cellular functions and hold significant pharmacological promise in cancer and other diseases. Although kinase inhibitors are one of the largest groups of approved drugs, much of the human kinome remains unexplored but potentially druggable. Computational approaches, such as machine learning, offer efficient solutions for exploring kinase-compound interactions and uncovering novel binding activities. Despite the increasing availability of three-dimensional (3D) protein and compound structures, existing methods predominantly focus on exploiting local features from one-dimensional protein sequences and two-dimensional molecular graphs to predict binding affinities, overlooking the 3D nature of the binding process. Here we present KDBNet, a deep learning algorithm that incorporates 3D protein and molecule structure data to predict binding affinities. KDBNet uses graph neural networks to learn structure representations of protein binding pockets and drug molecules, capturing the geometric and spatial characteristics of binding activity. In addition, we introduce an algorithm to quantify and calibrate the uncertainties of KDBNet's predictions, enhancing its utility in model-guided discovery in chemical or protein space. Experiments demonstrated that KDBNet outperforms existing deep learning models in predicting kinase-drug binding affinities. The uncertainties estimated by KDBNet are informative and well-calibrated with respect to prediction errors. When integrated with a Bayesian optimization framework, KDBNet enables data-efficient active learning and accelerates the exploration and exploitation of diverse high-binding kinase-drug pairs.
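The abstract does not spell out KDBNet's calibration algorithm or its acquisition function. As a generic stand-in, the sketch below takes uncertainty from the spread of an ensemble of predictions, rescales it with one factor fit on validation residuals (a simple variance-scaling recalibration), and ranks candidates by an upper confidence bound (mean + beta * std). All names, values and the choice of UCB are assumptions for illustration.

```python
import math
import statistics
from typing import Dict, List, Sequence


def fit_variance_scale(residuals: Sequence[float], sigmas: Sequence[float]) -> float:
    """Scale s minimising the Gaussian NLL of validation residuals under
    std s * sigma; closed form s^2 = mean((r / sigma)^2)."""
    return math.sqrt(sum((r / sd) ** 2 for r, sd in zip(residuals, sigmas)) / len(residuals))


def ucb_select(candidates: Dict[str, List[float]], scale: float,
               beta: float, k: int) -> List[str]:
    """Rank candidates by mean + beta * calibrated std of ensemble predictions."""
    scored = []
    for name, preds in candidates.items():
        mu = statistics.mean(preds)
        sd = statistics.pstdev(preds) * scale
        scored.append((mu + beta * sd, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical ensemble predictions (e.g. pKd) for three kinase-drug pairs.
candidates = {
    "pair_A": [7.0, 7.2, 6.8],
    "pair_B": [7.5, 7.5, 7.5],
    "pair_C": [6.0, 7.0, 8.0],
}
print(fit_variance_scale(residuals=[0.5, -1.0], sigmas=[0.5, 0.5]))  # ≈ 1.58
print(ucb_select(candidates, scale=1.0, beta=0.0, k=1))  # ['pair_B'] (pure exploitation)
print(ucb_select(candidates, scale=1.0, beta=1.0, k=1))  # ['pair_C'] (rewards uncertainty)
```

Calibration matters for this loop: if the raw spread is systematically too small or too large, beta no longer means what it is meant to, and exploration is under- or over-weighted.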
Affiliation(s)
- Yunan Luo
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- These authors contributed equally: Yunan Luo, Yang Liu
- Yang Liu
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA
- These authors contributed equally: Yunan Luo, Yang Liu
- Jian Peng
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA
13
Zhang L, Wang CC, Zhang Y, Chen X. GPCNDTA: Prediction of drug-target binding affinity through cross-attention networks augmented with graph features and pharmacophores. Comput Biol Med 2023; 166:107512. [PMID: 37788507 DOI: 10.1016/j.compbiomed.2023.107512] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/08/2023] [Revised: 08/28/2023] [Accepted: 09/19/2023] [Indexed: 10/05/2023]
Abstract
Drug-target affinity prediction is a challenging task in drug discovery. The latest computational models have limitations in mining edge information in molecule graphs, accessing knowledge in pharmacophores, integrating multimodal data of the same biomolecule, and realizing effective interactions between two different biomolecules. To solve these problems, we proposed a method called Graph features and Pharmacophores augmented Cross-attention Networks based Drug-Target binding Affinity prediction (GPCNDTA). First, we utilized the GNN module, the linear projection unit and the self-attention layer to extract the corresponding features of drugs and proteins. Second, we devised intramolecular and intermolecular cross-attention to fuse features within, and exchange information between, drugs and proteins, respectively. Finally, the linear projection unit was applied to obtain the final features of drugs and proteins, and a Multi-Layer Perceptron was employed to predict drug-target binding affinity. The three major innovations of GPCNDTA are as follows: (i) developing the residual CensNet and the residual EW-GCN to extract features of drug and protein graphs, respectively, (ii) regarding pharmacophores as a new type of prior to improve drug-target affinity prediction performance, and (iii) devising intramolecular and intermolecular cross-attention, in which the intramolecular cross-attention effectively fuses different modal data related to the same biomolecule, and the intermolecular cross-attention enables information interaction between two different biomolecules in attention space. The test results on five benchmark datasets imply that GPCNDTA achieves the best performance compared with state-of-the-art computational models. In addition, ablation experiments demonstrated the effectiveness of the GNN modules, pharmacophores and the two cross-attention strategies in improving the prediction accuracy, stability and reliability of GPCNDTA.
In case studies, we applied GPCNDTA to predict binding affinities between 3C-like proteinase and 185 drugs, and observed that most binding affinities predicted by GPCNDTA are close to corresponding experimental measurements.
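The intermolecular cross-attention described above can be illustrated with a minimal sketch of scaled dot-product attention in which each drug-atom feature vector attends over protein-residue feature vectors. This is not the GPCNDTA implementation: the function names and toy feature vectors are hypothetical, and the learned query/key/value projections and multiple attention heads a real model would use are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: every query row (e.g. a drug atom)
    returns a softmax-weighted average of the value rows (e.g. protein
    residues), weighted by its similarity to the corresponding key rows."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out: Matrix = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


if __name__ == "__main__":
    drug_atoms = [[1.0, 0.0], [0.0, 1.0]]            # hypothetical atom features
    residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical residue features
    fused = cross_attention(drug_atoms, residues, residues)
    print(fused)  # one 2-d vector per drug atom
```

Each output row is a convex combination of residue features, so a drug atom's fused representation is pulled toward the residues it is most similar to.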
Affiliation(s)
- Li Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Chun-Chun Wang
- School of Science, Jiangnan University, Wuxi, 214122, China
- Yong Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Xing Chen
- School of Science, Jiangnan University, Wuxi, 214122, China.
14
Kusuma WA, Fadli A, Fatriani R, Sofyantoro F, Yudha DS, Lischer K, Nuringtyas TR, Putri WA, Purwestri YA, Swasono RT. Prediction of the interaction between Calloselasma rhodostoma venom-derived peptides and cancer-associated hub proteins: A computational study. Heliyon 2023; 9:e21149. [PMID: 37954374 PMCID: PMC10637925 DOI: 10.1016/j.heliyon.2023.e21149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 09/04/2023] [Accepted: 10/17/2023] [Indexed: 11/14/2023]
Abstract
The use of peptide drugs to treat cancer is gaining popularity because of their efficacy, fewer side effects, and several other advantages. Identifying the peptides that interact with cancer proteins is crucial in drug discovery. Several approaches for predicting peptide-protein interactions have been developed; however, progress is limited by high resource and time costs and by the small number of studies. This study predicts peptide-protein interactions using Random Forest, XGBoost, and SAE-DNN. Features are extracted from proteins and peptides using intrinsic disorder, amino acid sequences, physicochemical properties, position-specific scoring matrices, amino acid composition, and dipeptide composition. Results show that all algorithms perform comparably in predicting interactions between venom-derived peptides and cancer-associated target proteins. However, XGBoost produces the best results, with accuracy, precision, and area under the receiver operating characteristic curve of 0.859, 0.663, and 0.697, respectively. The enrichment analysis revealed that peptides from the Calloselasma rhodostoma venom targeted several cancer-related proteins (ESR1, GOPC, and BRD4).
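Two of the simpler sequence features listed above, amino acid composition (AAC) and dipeptide composition (DPC), can be computed as in the following sketch. It assumes the 20 standard residues and normalizes counts to fractions; the exact preprocessing used in the study is not stated in the abstract, and the function names are illustrative.

```python
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq: str) -> Dict[str, float]:
    """20 features: fraction of the sequence made up of each standard residue."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq: str) -> Dict[str, float]:
    """400 features: fraction of the len(seq) - 1 overlapping residue pairs
    that equal each ordered pair of standard residues."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
    total = len(seq) - 1
    return {pair: c / total for pair, c in counts.items()}


if __name__ == "__main__":
    peptide = "GLFDIVKK"  # arbitrary example sequence
    aac = amino_acid_composition(peptide)
    dpc = dipeptide_composition(peptide)
    print(len(aac), len(dpc))  # 20 400
```

Concatenating such fixed-length vectors for a peptide and a protein gives an input that tree ensembles such as Random Forest or XGBoost can consume directly.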
Affiliation(s)
- Wisnu Ananta Kusuma
- Department of Computer Science, Faculty of Mathematics and Natural Sciences, IPB University, Bogor, 16680, Indonesia
- Tropical Biopharmaca Research Center, IPB University, Bogor, 16128, Indonesia
- Aulia Fadli
- Department of Computer Science, Faculty of Mathematics and Natural Sciences, IPB University, Bogor, 16680, Indonesia
- Rizka Fatriani
- Tropical Biopharmaca Research Center, IPB University, Bogor, 16128, Indonesia
- Fajar Sofyantoro
- Faculty of Biology, Universitas Gadjah Mada, Yogyakarta, 55281, Indonesia
- Donan Satria Yudha
- Faculty of Biology, Universitas Gadjah Mada, Yogyakarta, 55281, Indonesia
- Kenny Lischer
- Faculty of Engineering, University of Indonesia, Jakarta, 16424, Indonesia
- Tri Rini Nuringtyas
- Faculty of Biology, Universitas Gadjah Mada, Yogyakarta, 55281, Indonesia
- Research Center for Biotechnology, Universitas Gadjah Mada, Yogyakarta, 55281, Indonesia
- Yekti Asih Purwestri
- Faculty of Biology, Universitas Gadjah Mada, Yogyakarta, 55281, Indonesia
- Research Center for Biotechnology, Universitas Gadjah Mada, Yogyakarta, 55281, Indonesia
- Respati Tri Swasono
- Department of Chemistry, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, 55281, Indonesia
15
Zhai H, Hou H, Luo J, Liu X, Wu Z, Wang J. DGDTA: dynamic graph attention network for predicting drug-target binding affinity. BMC Bioinformatics 2023; 24:367. [PMID: 37777712 PMCID: PMC10543834 DOI: 10.1186/s12859-023-05497-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/25/2023] [Accepted: 09/23/2023] [Indexed: 10/02/2023]
Abstract
BACKGROUND Obtaining accurate drug-target binding affinity (DTA) information is important for drug discovery and drug repositioning. Although some methods have been proposed for predicting DTA, the features of proteins and drugs still need further analysis. Deep learning has recently been used successfully in many fields; hence, designing a more effective deep learning method for predicting DTA remains attractive. RESULTS This paper proposes dynamic graph DTA (DGDTA), which combines a dynamic graph attention network with a bidirectional long short-term memory (Bi-LSTM) network to predict DTA. DGDTA takes as input the drug's simplified molecular input line entry system (SMILES) string and the protein's amino acid sequence. First, each drug is treated as a graph of atoms connected by edges, and dynamic attention scores are used to determine which atoms and edges in the drug are most important for predicting DTA. Then, a Bi-LSTM is used to better extract contextual features from protein amino acid sequences. Finally, the drug and protein feature vectors are combined and the DTA is predicted by a fully connected layer. The source code is available from GitHub at https://github.com/luojunwei/DGDTA . CONCLUSIONS The experimental results show that DGDTA can predict DTA more accurately than some other methods.
Affiliation(s)
- Haixia Zhai
- School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
- Hongli Hou
- School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
- Junwei Luo
- School of Software, Henan Polytechnic University, Jiaozuo, 454003, China.
- Xiaoyan Liu
- School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
- Zhengjiang Wu
- School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
- Junfeng Wang
- School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
16
Pei Q, Wu L, Zhu J, Xia Y, Xie S, Qin T, Liu H, Liu TY, Yan R. Breaking the barriers of data scarcity in drug-target affinity prediction. Brief Bioinform 2023; 24:bbad386. [PMID: 37903413 DOI: 10.1093/bib/bbad386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 09/14/2023] [Accepted: 10/05/2023] [Indexed: 11/01/2023]
Abstract
Accurate prediction of drug-target affinity (DTA) is of vital importance in early-stage drug discovery, facilitating the identification of drugs that can effectively interact with specific targets and regulate their activities. While wet experiments remain the most reliable method, they are time-consuming and resource-intensive, resulting in limited data availability that poses challenges for deep learning approaches. Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue. To overcome this challenge, we present the Semi-Supervised Multi-task training (SSM) framework for DTA prediction, which incorporates three simple yet highly effective strategies: (1) A multi-task training approach that combines DTA prediction with masked language modeling using paired drug-target data. (2) A semi-supervised training method that leverages large-scale unpaired molecules and proteins to enhance drug and target representations. This approach differs from previous methods that only employed molecules or proteins in pre-training. (3) The integration of a lightweight cross-attention module to improve the interaction between drugs and targets, further enhancing prediction accuracy. Through extensive experiments on benchmark datasets such as BindingDB, DAVIS and KIBA, we demonstrate the superior performance of our framework. Additionally, we conduct case studies on specific drug-target binding activities, virtual screening experiments, drug feature visualizations and real-world applications, all of which showcase the significant potential of our work. In conclusion, our proposed SSM-DTA framework addresses the data limitation challenge in DTA prediction and yields promising results, paving the way for more efficient and accurate drug discovery processes.
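The masked language modeling objective in strategy (1) trains the model to recover tokens hidden in its input. A minimal sketch of the corruption step for a tokenized drug or protein sequence is shown below. It always substitutes a hypothetical `[MASK]` token (omitting the random-token and keep-original variants used in BERT-style training), tokenizes SMILES character by character for simplicity, and is not the SSM-DTA code.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"  # hypothetical mask token


def mask_tokens(tokens: List[str], mask_prob: float = 0.15,
                seed: Optional[int] = None) -> Tuple[List[str], List[Optional[str]]]:
    """Return (corrupted tokens, labels). labels[i] holds the original token
    at masked positions and None elsewhere, so only masked positions
    contribute to the masked language modeling loss."""
    rng = random.Random(seed)
    corrupted: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            labels.append(tok)   # the model must recover this token
        else:
            corrupted.append(tok)
            labels.append(None)  # position ignored by the loss
    return corrupted, labels


if __name__ == "__main__":
    smiles_tokens = list("CC(=O)OC1=CC=CC=C1")  # phenyl acetate, char-level tokens
    inputs, targets = mask_tokens(smiles_tokens, mask_prob=0.15, seed=0)
    print(inputs)
    print(targets)
```

In a multi-task setup of the kind described, this reconstruction loss would be added to the affinity regression loss computed on the same paired drug-target example.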
Affiliation(s)
- Qizhi Pei
- Gaoling School of Artificial Intelligence, Renmin University of China, No.59, Zhong Guan Cun Avenue, Haidian District, 100872, Beijing, China
- Lijun Wu
- Microsoft Research AI4Science, No.5, Dan Ling Street, Haidian District, 100080, Beijing, China
- Jinhua Zhu
- CAS Key Laboratory of GIPAS, EEIS Department, University of Science and Technology of China, No.96, JinZhai Road, Baohe District, 230026, Hefei, Anhui Province, China
- Yingce Xia
- Microsoft Research AI4Science, No.5, Dan Ling Street, Haidian District, 100080, Beijing, China
- Shufang Xie
- Gaoling School of Artificial Intelligence, Renmin University of China, No.59, Zhong Guan Cun Avenue, Haidian District, 100872, Beijing, China
- Tao Qin
- Engineering Research Center of Next-Generation Intelligent Search and Recommendation, Ministry of Education
- Haiguang Liu
- Microsoft Research AI4Science, No.5, Dan Ling Street, Haidian District, 100080, Beijing, China
- Tie-Yan Liu
- Microsoft Research AI4Science, No.5, Dan Ling Street, Haidian District, 100080, Beijing, China
- Rui Yan
- Beijing Key Laboratory of Big Data Management and Analysis Methods
17
Ong WJG, Kirubakaran P, Karanicolas J. Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors. bioRxiv 2023:2023.09.04.556234. [PMID: 37732243 PMCID: PMC10508770 DOI: 10.1101/2023.09.04.556234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023]
Abstract
The extreme surge of interest over the past decade surrounding the use of neural networks has inspired many groups to deploy them for predicting binding affinities of drug-like molecules to their receptors. A model that can accurately make such predictions has the potential to screen large chemical libraries and help streamline the drug discovery process. However, despite reports of models that accurately predict quantitative inhibition using protein kinase sequences and inhibitors' SMILES strings, it is still unclear whether these models can generalize to previously unseen data. Here, we build a Convolutional Neural Network (CNN) analogous to those previously reported and evaluate the model over four datasets commonly used for inhibitor/kinase predictions. We find that the model performs comparably to those previously reported, provided that the individual data points are randomly split between the training set and the test set. However, model performance is dramatically deteriorated when all data for a given inhibitor is placed together in the same training/testing fold, implying that information leakage underlies the models' performance. Through comparison to simple models in which the SMILES strings are tokenized, or in which test set predictions are simply copied from the closest training set data points, we demonstrate that there is essentially no generalization whatsoever in this model. In other words, the model has not learned anything about molecular interactions, and does not provide any benefit over much simpler and more transparent models. These observations strongly point to the need for richer structure-based encodings, to obtain useful prospective predictions of not-yet-synthesized candidate inhibitors.
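The contrast drawn above, between randomly splitting individual data points and keeping all data for a given inhibitor in the same fold, can be sketched as follows. The record layout `(inhibitor, kinase, affinity)` and the function names are hypothetical; the sketch only illustrates the two splitting schemes and a check for inhibitors shared between training and test sets, not the authors' pipeline.

```python
import random
from typing import List, Set, Tuple

Record = Tuple[str, str, float]  # (inhibitor SMILES, kinase ID, affinity)


def random_split(records: List[Record], test_fraction: float,
                 seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Split individual measurements at random; the same inhibitor can
    then appear in both the training and the test set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def split_by_inhibitor(records: List[Record], test_fraction: float,
                       seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Assign whole inhibitors to folds so that every measurement for a
    given inhibitor lands in the same fold."""
    inhibitors = sorted({r[0] for r in records})
    random.Random(seed).shuffle(inhibitors)
    n_test = max(1, round(len(inhibitors) * test_fraction))
    test_inhibitors = set(inhibitors[:n_test])
    train = [r for r in records if r[0] not in test_inhibitors]
    test = [r for r in records if r[0] in test_inhibitors]
    return train, test


def shared_inhibitors(train: List[Record], test: List[Record]) -> Set[str]:
    """Inhibitors present in both sets, a source of information leakage."""
    return {r[0] for r in train} & {r[0] for r in test}


if __name__ == "__main__":
    data = [(f"inhibitor{i}", f"kinase{j}", float(i + j))
            for i in range(10) for j in range(5)]
    tr, te = random_split(data, 0.2)
    print(len(shared_inhibitors(tr, te)))  # typically greater than 0
    tr, te = split_by_inhibitor(data, 0.2)
    print(len(shared_inhibitors(tr, te)))  # always 0 by construction
```

Under the grouped split every test inhibitor is unseen during training, which is the setting in which the abstract reports that performance deteriorates.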
Affiliation(s)
- Wern Juin Gabriel Ong
- Cancer Signaling & Microenvironment Program, Fox Chase Cancer Center, Philadelphia, PA 19111
- Bowdoin College, Brunswick, ME 04011
- Palani Kirubakaran
- Cancer Signaling & Microenvironment Program, Fox Chase Cancer Center, Philadelphia, PA 19111
- John Karanicolas
- Cancer Signaling & Microenvironment Program, Fox Chase Cancer Center, Philadelphia, PA 19111
18
Kalia A, Krishnan D, Hassoun S. CSI: Contrastive data Stratification for Interaction prediction and its application to compound-protein interaction prediction. Bioinformatics 2023; 39:btad456. [PMID: 37490457 PMCID: PMC10423023 DOI: 10.1093/bioinformatics/btad456] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/15/2022] [Revised: 05/10/2023] [Accepted: 07/24/2023] [Indexed: 07/27/2023]
Abstract
MOTIVATION Accurately predicting the likelihood of interaction between two objects (compound-protein sequence, user-item, author-paper, etc.) is a fundamental problem in computer science. Current deep-learning models rely on learning accurate representations of the interacting objects. Importantly, relationships between the interacting objects, or features of the interaction, offer an opportunity to partition the data to create multi-views of the interacting objects. The resulting congruent and non-congruent views can then be exploited via contrastive learning techniques to learn enhanced representations of the objects. RESULTS We present a novel method, Contrastive Stratification for Interaction Prediction (CSI), to stratify (partition) a dataset in a manner that can be exploited via Contrastive Multiview Coding to learn embeddings that maximize the mutual information across congruent data views. CSI assigns a key and multiple views to each data point, where data partitions under a particular key form congruent views of the data. We showcase the effectiveness of CSI by applying it to the compound-protein sequence interaction prediction problem, a pressing problem whose solution promises to expedite drug delivery (drug-protein interaction prediction), metabolic engineering, and synthetic biology (compound-enzyme interaction prediction) applications. Comparing CSI with a baseline model that does not use data stratification and contrastive learning, we show gains in average precision ranging from 13.7% to 39% using compounds and sequences as keys across multiple drug-target and enzymatic datasets, and gains ranging from 16.9% to 63% using reaction features as keys across enzymatic datasets. AVAILABILITY AND IMPLEMENTATION Code and dataset available at https://github.com/HassounLab/CSI.
Affiliation(s)
- Apurva Kalia
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Soha Hassoun
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Department of Chemical and Biological Engineering, Tufts University, Medford, MA 02155, United States
19
Oršolić D, Šmuc T. Dynamic applicability domain (dAD): compound-target binding affinity estimates with local conformal prediction. Bioinformatics 2023; 39:btad465. [PMID: 37594752 PMCID: PMC10457664 DOI: 10.1093/bioinformatics/btad465] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/30/2022] [Revised: 04/26/2023] [Accepted: 08/17/2023] [Indexed: 08/19/2023]
Abstract
MOTIVATION Increasing efforts are being made in the field of machine learning to advance the learning of robust and accurate models from experimentally measured data and enable more efficient drug discovery processes. The prediction of binding affinity is one of the most frequent tasks of compound bioactivity modelling. Learned models for binding affinity prediction are assessed by their average performance on unseen samples, but point predictions are typically not provided with a rigorous confidence assessment. Approaches such as the conformal predictor framework equip conventional models with a more rigorous assessment of confidence for individual point predictions. In this article, we extend the inductive conformal prediction framework to interaction data, in particular the compound-target binding affinity prediction task. The new framework is based on dynamically defined calibration sets that are specific to each testing pair and provides prediction assessment in the context of calibration pairs from its compound-target neighbourhood, enabling improved estimates based on the local properties of the prediction model. RESULTS The effectiveness of the approach is benchmarked on several publicly available datasets and tested in realistic use-case scenarios with increasing levels of difficulty on a complex compound-target binding affinity space. We demonstrate that in such scenarios the novel approach, which combines the applicability domain paradigm with the conformal prediction framework, produces superior confidence assessment with valid and more informative prediction regions compared to other 'state-of-the-art' conformal prediction approaches. AVAILABILITY AND IMPLEMENTATION Dataset and the code are available on GitHub (https://github.com/mlkr-rbi/dAD).
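For readers unfamiliar with inductive (split) conformal prediction, the following sketch computes a prediction interval from absolute-residual nonconformity scores on a held-out calibration set. `local_interval` restricts calibration to pairs sharing the test pair's compound or target, a hypothetical stand-in for the compound-target neighbourhood: the abstract does not specify how dAD defines its dynamic calibration sets, so this illustrates the general idea rather than the published method.

```python
import math
from typing import List, Sequence, Tuple

CalibrationPair = Tuple[str, str, float, float]  # (compound, target, y_true, y_pred)


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """The ceil((n + 1)(1 - alpha))-th smallest nonconformity score, which
    gives marginal coverage of at least 1 - alpha under exchangeability."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: interval is unbounded
    return sorted(scores)[k - 1]


def global_interval(y_pred: float, calibration: List[CalibrationPair],
                    alpha: float = 0.1) -> Tuple[float, float]:
    """Standard inductive conformal interval using every calibration pair."""
    scores = [abs(t - p) for _, _, t, p in calibration]
    q = conformal_quantile(scores, alpha)
    return y_pred - q, y_pred + q


def local_interval(compound: str, target: str, y_pred: float,
                   calibration: List[CalibrationPair],
                   alpha: float = 0.1) -> Tuple[float, float]:
    """Hypothetical local variant: calibrate only on pairs that share the
    test pair's compound or target."""
    neighbours = [c for c in calibration if c[0] == compound or c[1] == target]
    scores = [abs(t - p) for _, _, t, p in neighbours]
    q = conformal_quantile(scores, alpha)
    return y_pred - q, y_pred + q
```

The local variant trades calibration-set size for relevance: a small neighbourhood yields wider (or unbounded) intervals, while a large, well-matched one can yield tighter intervals than the global version.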
Affiliation(s)
- Davor Oršolić
- Division of Electronics, Ruđer Bošković Institute, Bijenička cesta 54, Zagreb 10000, Croatia
- Tomislav Šmuc
- Division of Electronics, Ruđer Bošković Institute, Bijenička cesta 54, Zagreb 10000, Croatia
20
Yousefi N, Yazdani-Jahromi M, Tayebi A, Kolanthai E, Neal CJ, Banerjee T, Gosai A, Balasubramanian G, Seal S, Ozmen Garibay O. BindingSite-AugmentedDTA: enabling a next-generation pipeline for interpretable prediction models in drug repurposing. Brief Bioinform 2023; 24:7140297. [PMID: 37096593 DOI: 10.1093/bib/bbad136] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/11/2022] [Revised: 03/02/2022] [Accepted: 03/16/2023] [Indexed: 04/26/2023]
Abstract
While research into drug-target interaction (DTI) prediction is fairly mature, generalizability and interpretability are not always addressed in the existing works in this field. In this paper, we propose a deep learning (DL)-based framework, called BindingSite-AugmentedDTA, which improves drug-target affinity (DTA) predictions by reducing the search space of potential binding sites of the protein, thus making binding affinity prediction more efficient and accurate. BindingSite-AugmentedDTA is highly generalizable, as it can be integrated with any DL-based regression model while significantly improving its prediction performance. Also, unlike many existing models, our model is highly interpretable due to its architecture and self-attention mechanism, which can provide a deeper understanding of its underlying prediction mechanism by mapping attention weights back to protein-binding sites. The computational results confirm that our framework can enhance the prediction performance of seven state-of-the-art DTA prediction algorithms in terms of four widely used evaluation metrics: concordance index, mean squared error, modified squared correlation coefficient ($r^2_m$) and the area under the precision-recall curve. We also extend three benchmark drug-target interaction datasets with additional information on the 3D structure of all the proteins they contain; these include the two most commonly used datasets, KIBA and Davis, as well as the data from the IDG-DREAM drug-kinase binding prediction challenge. Furthermore, we experimentally validate the practical potential of our proposed framework through in-lab experiments. The relatively high agreement between computationally predicted and experimentally observed binding interactions supports the potential of our framework as the next-generation pipeline for prediction models in drug repurposing.
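The concordance index (CI) listed among the evaluation metrics measures how often a model orders two drug-target pairs the same way as their measured affinities. A straightforward O(n²) sketch follows, using the common convention that pairs with equal true affinities are skipped and tied predictions count as half-concordant; the function name is illustrative, and implementations used in published DTA benchmarks may differ in tie handling.

```python
from typing import Sequence


def concordance_index(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of comparable pairs (different true affinities) whose
    predictions are ordered the same way; tied predictions score 0.5."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # not comparable
            comparable += 1
            # Orient the pair so that `hi` has the larger measured affinity.
            hi, lo = (i, j) if y_true[i] > y_true[j] else (j, i)
            if y_pred[hi] > y_pred[lo]:
                concordant += 1.0
            elif y_pred[hi] == y_pred[lo]:
                concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable


if __name__ == "__main__":
    measured = [5.0, 6.2, 7.1, 8.4]
    predicted = [5.3, 6.0, 7.5, 7.4]
    # 5 of the 6 comparable pairs are ordered correctly.
    print(concordance_index(measured, predicted))
```

A CI of 0.5 corresponds to random ordering and 1.0 to a perfect ranking, which is why CI is reported alongside an error metric such as MSE.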
Affiliation(s)
- Niloofar Yousefi
- Industrial Engineering and Management Systems, University of Central Florida, 32816, 4000 Central Florida Blvd., Orlando, FL, USA
- Mehdi Yazdani-Jahromi
- Computer Science, University of Central Florida, 32816, 4000 Central Florida Blvd., Orlando, FL, USA
- Aida Tayebi
- Industrial Engineering and Management Systems, University of Central Florida, 32816, 4000 Central Florida Blvd., Orlando, FL, USA
- Elayaraja Kolanthai
- College of Medicine, Bionix Cluster, University of Central Florida, 4000 Central Florida Blvd., Orlando 32816, FL, USA
- Craig J Neal
- College of Medicine, Bionix Cluster, University of Central Florida, 4000 Central Florida Blvd., Orlando 32816, FL, USA
- Tanumoy Banerjee
- Department of Mechanical Engineering and Mechanics, Lehigh University, Bethlehem 18015, PA, USA
- Ganesh Balasubramanian
- Department of Mechanical Engineering and Mechanics, Lehigh University, Bethlehem 18015, PA, USA
- Sudipta Seal
- College of Medicine, Bionix Cluster, University of Central Florida, 4000 Central Florida Blvd., Orlando 32816, FL, USA
- Advanced Materials Processing and Analysis Center, Department of Materials Science and Engineering, University of Central Florida, 4000 Central Florida Blvd., Orlando 32816, FL, USA
- Ozlem Ozmen Garibay
- Industrial Engineering and Management Systems, University of Central Florida, 32816, 4000 Central Florida Blvd., Orlando, FL, USA
21
Hu Z, Liu W, Zhang C, Huang J, Zhang S, Yu H, Xiong Y, Liu H, Ke S, Hong L. SAM-DTA: a sequence-agnostic model for drug-target binding affinity prediction. Brief Bioinform 2023; 24:6955272. [PMID: 36545795 DOI: 10.1093/bib/bbac533] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/19/2022] [Revised: 10/05/2022] [Accepted: 11/07/2022] [Indexed: 12/24/2022]
Abstract
Drug-target binding affinity prediction is a fundamental task for drug discovery and has been studied for decades. Most methods follow the canonical paradigm of processing the protein (target) and ligand (drug) inputs separately and then combining them. In this study we demonstrate, surprisingly, that a model can achieve even better performance without access to any protein-sequence-related information. Instead, a protein is characterized entirely by the ligands it interacts with. Specifically, we treat different proteins separately and train them jointly in a multi-head manner, so as to learn a robust and universal representation of ligands that generalizes across proteins. Empirical evidence shows that the novel paradigm outperforms its competitive sequence-based counterpart: compared with DeepAffinity, it achieves a Mean Squared Error (MSE) of 0.4261 versus 0.7612 and an R-Square of 0.7984 versus 0.6570. We also investigate the transfer learning scenario, in which unseen proteins are encountered after the initial training, and a cross-dataset evaluation for prospective studies. The results reveal the robustness of the proposed model in generalizing to unseen proteins as well as in predicting future data. Source codes and data are available at https://github.com/huzqatpku/SAM-DTA.
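The two metrics quoted above, mean squared error (MSE) and the coefficient of determination (R-Square), can be computed as in the sketch below. It uses the standard definition R² = 1 - SS_res / SS_tot; the function names are illustrative and the values in the usage example are made up.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between measured and predicted affinities."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot. A model that always
    predicts the mean of y_true scores 0; a perfect model scores 1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-Square is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    measured = [5.0, 6.0, 7.0, 8.0]   # made-up affinities
    predicted = [5.5, 6.0, 6.5, 8.0]
    print(mse(measured, predicted), r_squared(measured, predicted))
```

Because SS_res / SS_tot equals MSE divided by the (population) variance of the measured values, the two metrics move together on a fixed test set, but R-Square is easier to compare across datasets whose affinity ranges differ.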
Affiliation(s)
- Wenfeng Liu
- Shanghai Matwings Technology Co., Ltd., Shanghai, 200240, China
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Jiawen Huang
- Shanghai Matwings Technology Co., Ltd., Shanghai, 200240, China
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Shaoting Zhang
- SenseTime Research, Shanghai, 201103, China
- Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
- Huiqun Yu
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Yi Xiong
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Hao Liu
- Shanghai Matwings Technology Co., Ltd., Shanghai, 200240, China
- Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Song Ke
- Shanghai Matwings Technology Co., Ltd., Shanghai, 200240, China
- Liang Hong
- School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China
- School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai 200240, China
- Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
22
Bae H, Nam H. GraphATT-DTA: Attention-Based Novel Representation of Interaction to Predict Drug-Target Binding Affinity. Biomedicines 2022; 11:biomedicines11010067. [PMID: 36672575 PMCID: PMC9855982 DOI: 10.3390/biomedicines11010067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 12/06/2022] [Accepted: 12/20/2022] [Indexed: 12/29/2022]
Abstract
Drug-target binding affinity (DTA) prediction is an essential step in drug discovery. Drug-target protein binding occurs at specific regions between the protein and drug, rather than the entire protein and drug. However, existing deep-learning DTA prediction methods do not consider the interactions between drug substructures and protein sub-sequences. This work proposes GraphATT-DTA, a DTA prediction model that constructs the essential regions for determining interaction affinity between compounds and proteins, modeled with an attention mechanism for interpretability. We make the model consider the local-to-global interactions with the attention mechanism between compound and protein. As a result, GraphATT-DTA shows an improved prediction of DTA performance and interpretability compared with state-of-the-art models. The model is trained and evaluated with the Davis dataset, the human kinase dataset; an external evaluation is achieved with the independently proposed human kinase dataset from the BindingDB dataset.
Affiliation(s)
- Haelee Bae
- AI Graduate School, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-gu, Gwangju 61005, Republic of Korea
- Hojung Nam
- AI Graduate School, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-gu, Gwangju 61005, Republic of Korea
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-gu, Gwangju 61005, Republic of Korea
- Center for AI-Applied High Efficiency Drug Discovery (AHEDD), Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-gu, Gwangju 61005, Republic of Korea
23
Nguyen MT, Nguyen T, Tran T. Learning to discover medicines. Int J Data Sci Anal 2022; 16:1-16. [PMID: 36440369 PMCID: PMC9676887 DOI: 10.1007/s41060-022-00371-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 11/05/2022] [Indexed: 11/19/2022]
Abstract
Discovering new medicines is the hallmark of the human endeavor to live a better and longer life. Yet the pace of discovery has slowed, as we must venture further into unexplored biomedical space to find a medicine that meets today's high standards. Modern AI, enabled by powerful computing, large biomedical databases, and breakthroughs in deep learning, offers new hope for breaking this loop, as AI is rapidly maturing and is ready to make a major impact in the area. In this paper, we review recent advances in AI methodologies that aim to crack this challenge. We organize the vast and rapidly growing literature on AI for drug discovery into three relatively stable sub-areas: (a) representation learning over molecular sequences and geometric graphs; (b) data-driven reasoning, where we predict molecular properties and their binding, optimize existing compounds, generate de novo molecules, and plan the synthesis of target molecules; and (c) knowledge-based reasoning, where we discuss the construction of and reasoning over biomedical knowledge graphs. We also identify open challenges and chart possible research directions for the years to come.
Affiliation(s)
- Minh-Tri Nguyen
- Applied Artificial Intelligence Institute, Deakin University, Burwood, VIC Australia
- Thin Nguyen
- Applied Artificial Intelligence Institute, Deakin University, Burwood, VIC Australia
- Truyen Tran
- Applied Artificial Intelligence Institute, Deakin University, Burwood, VIC Australia
24
Nguyen TM, Nguyen T, Tran T. Mitigating cold-start problems in drug-target affinity prediction with interaction knowledge transferring. Brief Bioinform 2022; 23:bbac269. [PMID: 35788823 PMCID: PMC9353967 DOI: 10.1093/bib/bbac269] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/10/2022] [Revised: 05/20/2022] [Accepted: 06/08/2022] [Indexed: 12/04/2022]
Abstract
Predicting drug-target interactions is crucial for drug discovery as well as drug repurposing. Machine learning is commonly used for the drug-target affinity (DTA) problem. However, machine learning models face the cold-start problem: performance drops when predicting the interaction of a novel drug or target. Previous work tries to solve the cold-start problem by learning drug or target representations with unsupervised learning. Although such representations can be learned in an unsupervised manner, they still lack interaction information, which is critical for drug-target interaction. To incorporate interaction information into the drug and protein representations, we propose transfer learning from chemical-chemical interaction (CCI) and protein-protein interaction (PPI) tasks to the drug-target interaction task. Because these tasks are similar in nature, the representations learned on the CCI and PPI tasks transfer smoothly to the DTA task. Results on DTA datasets show that our proposed method has advantages over other pre-training methods for the DTA task.
Affiliation(s)
- Tri Minh Nguyen
- Applied Artificial Intelligence Institute, Deakin University, Victoria, Australia
- Thin Nguyen
- Applied Artificial Intelligence Institute, Deakin University, Victoria, Australia
- Truyen Tran
- Applied Artificial Intelligence Institute, Deakin University, Victoria, Australia
25
Luo H, Xiang Y, Fang X, Lin W, Wang F, Wu H, Wang H. BatchDTA: implicit batch alignment enhances deep learning-based drug-target affinity estimation. Brief Bioinform 2022; 23:6632927. [PMID: 35794723 DOI: 10.1093/bib/bbac260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/14/2022] [Revised: 05/23/2022] [Accepted: 06/03/2022] [Indexed: 11/14/2022]
Abstract
Candidate compounds with high binding affinities toward a target protein are likely to be developed as drugs. Deep neural networks (DNNs) have attracted increasing attention for drug-target affinity (DTA) estimation owing to their efficiency. However, the negative impact of batch effects caused by measurement metrics, system technologies and other assay information is seldom discussed when training a DNN model for DTA. Because of the data deviation caused by batch effects, DNN models can only be trained on a small amount of 'clean' data, which makes it challenging for them to provide precise and consistent estimates. We design a batch-sensitive training framework, BatchDTA, to train DNN models. BatchDTA implicitly aligns multiple batches measured against the same protein by learning the order of candidate compounds within each batch, alleviating the impact of batch effects on the DNN models. Extensive experiments demonstrate that BatchDTA helps four mainstream DNN models improve their performance and robustness on multiple DTA datasets (BindingDB, Davis and KIBA), with a relative improvement of 4.0% in the models' average concordance index. A case study shows that BatchDTA can successfully learn the ranking order of compounds from multiple batches. In addition, BatchDTA can be applied to data fused from multiple sources to achieve further improvement.
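The abstract describes aligning batches by learning the order of compounds rather than their raw affinity values. One way to express that idea, sketched below in NumPy as an illustration and not as BatchDTA's actual loss, is a pairwise logistic ranking loss that compares only compounds measured in the same batch. The function name `within_batch_ranking_loss` and the `batch_id` labels are hypothetical.

```python
import numpy as np

def within_batch_ranking_loss(pred, affinity, batch_id):
    """Mean pairwise logistic ranking loss over compound pairs from the same batch.

    pred, affinity: model scores and measured affinities, one per sample
    batch_id:       label of the (protein, assay) batch each sample came from
    Pairs from different batches are skipped, so a constant offset between
    batches does not affect the loss.
    """
    pred = np.asarray(pred, dtype=float)
    affinity = np.asarray(affinity, dtype=float)
    batch_id = np.asarray(batch_id)
    losses = []
    for b in np.unique(batch_id):
        idx = np.flatnonzero(batch_id == b)
        for i in idx:
            for j in idx:
                if affinity[i] > affinity[j]:
                    # log(1 + exp(-(s_i - s_j))) is small when i is scored above j
                    losses.append(np.log1p(np.exp(-(pred[i] - pred[j]))))
    return float(np.mean(losses)) if losses else 0.0

# Hypothetical data: two batches measured on different scales.
pred = [3.0, 2.0, 1.0, 0.5]
affinity = [7.0, 6.0, 9.0, 8.0]
batch_id = [0, 0, 1, 1]
loss = within_batch_ranking_loss(pred, affinity, batch_id)
```

Because only pairs within a batch are compared, adding a constant offset to every label in one batch (a crude model of a batch effect) leaves this loss unchanged.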
Affiliation(s)
- Hongyu Luo
- PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
- Yingfei Xiang
- PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
- Xiaomin Fang
- PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
- Wei Lin
- PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
- Fan Wang
- PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
- Hua Wu
- Baidu Inc., 100000, Beijing, China
26
DeepMHADTA: Prediction of Drug-Target Binding Affinity Using Multi-Head Self-Attention and Convolutional Neural Network. Curr Issues Mol Biol 2022; 44:2287-2299. [PMID: 35678684 PMCID: PMC9164023 DOI: 10.3390/cimb44050155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 05/08/2022] [Accepted: 05/14/2022] [Indexed: 11/17/2022]
Abstract
Drug-target interactions provide insight into drug side effects and drug repositioning. However, wet-lab biochemical experiments are time-consuming and labor-intensive, and cannot meet the pressing demand of drug research and development. With the rapid advancement of deep learning, computational methods are increasingly applied to screen drug-target interactions. Many methods treat this problem as a binary classification task (binding or not) and ignore the quantitative binding affinity. In this paper, we propose a new end-to-end deep learning method called DeepMHADTA, which uses the multi-head self-attention mechanism in a deep residual network to predict drug-target binding affinity. On two benchmark datasets, our method outperformed several current state-of-the-art methods on multiple performance measures, including mean squared error (MSE), concordance index (CI), rm2, and area under the precision-recall curve (AUPR). These results demonstrate that our method achieves better performance in predicting drug–target binding affinity.
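The concordance index (CI) listed above measures how well predicted affinities preserve the ranking of the measured ones. The Python sketch below follows the definition commonly used in DTA benchmarks: over all pairs whose true affinities differ, it counts the fraction that the predictions order the same way, scoring tied predictions as 0.5. It is an O(n^2) reference version written for clarity, not the evaluation code of any cited paper; the function name is ours.

```python
from itertools import combinations

def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true affinity are not comparable and are skipped;
    tied predictions on a comparable pair count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(y_true)), 2):
        if y_true[i] == y_true[j]:
            continue
        comparable += 1
        if y_true[i] < y_true[j]:
            i, j = j, i  # orient the pair so that i has the larger true affinity
        diff = y_pred[i] - y_pred[j]
        if diff > 0:
            concordant += 1.0
        elif diff == 0:
            concordant += 0.5
    # undefined when no pair is comparable (e.g. all true values equal)
    return concordant / comparable if comparable else float("nan")
```

A CI of 1.0 means the predictions rank every comparable pair correctly, and 0.5 is what random scores achieve on average.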
27
Liu S, Wang Y, Deng Y, He L, Shao B, Yin J, Zheng N, Liu TY, Wang T. Improved drug-target interaction prediction with intermolecular graph transformer. Brief Bioinform 2022; 23:6581433. [PMID: 35514186 DOI: 10.1093/bib/bbac162] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/10/2022] [Revised: 03/28/2022] [Accepted: 04/11/2022] [Indexed: 11/13/2022]
Abstract
The identification of active binding drugs for target proteins (referred to as drug-target interaction prediction) is the key challenge in virtual screening, which plays an essential role in drug discovery. Although recent deep learning-based approaches achieve better performance than molecular docking, existing models often neglect topological or spatial intermolecular information, which hinders prediction performance. We recognize this problem and propose a novel approach called the Intermolecular Graph Transformer (IGT), which employs a dedicated attention mechanism to model intermolecular information with a three-way Transformer-based architecture. IGT outperforms the second-best state-of-the-art (SoTA) approach by 9.1% and 20.5% for binding activity and binding pose prediction, respectively, and generalizes better to unseen receptor proteins than SoTA approaches. Furthermore, IGT shows promising drug screening ability against severe acute respiratory syndrome coronavirus 2, identifying 83.1% of the active drugs validated by wet-lab experiments, with near-native predicted binding poses. Source code and datasets are available at https://github.com/microsoft/IGT-Intermolecular-Graph-Transformer.
Affiliation(s)
- Siyuan Liu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China; Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, 510006, China; Microsoft Research Asia, Beijing, 100080, China
- Yusong Wang
- Microsoft Research Asia, Beijing, 100080, China; Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, 710049, China
- Yifan Deng
- Microsoft Research Asia, Beijing, 100080, China
- Liang He
- Microsoft Research Asia, Beijing, 100080, China; School of Computer Science, Fudan University, Shanghai, 200433, China
- Bin Shao
- Microsoft Research Asia, Beijing, 100080, China
- Jian Yin
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, China; Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, 510006, China
- Nanning Zheng
- Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, 710049, China
- Tie-Yan Liu
- Microsoft Research Asia, Beijing, 100080, China
- Tong Wang
- Microsoft Research Asia, Beijing, 100080, China
28
Chen Y, Wang ZZ, Hao GF, Song BA. Web support for the more efficient discovery of kinase inhibitors. Drug Discov Today 2022; 27:2216-2225. [DOI: 10.1016/j.drudis.2022.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 02/16/2022] [Accepted: 04/01/2022] [Indexed: 11/24/2022]
29
Nguyen TM, Nguyen T, Le TM, Tran T. GEFA: Early Fusion Approach in Drug-Target Affinity Prediction. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:718-728. [PMID: 34197324 DOI: 10.1109/tcbb.2021.3094217] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Indexed: 06/13/2023]
Abstract
Predicting the interaction between a compound and a target is crucial for rapid drug repurposing. Deep learning has been successfully applied to the drug-target affinity (DTA) problem. However, previous deep learning-based methods do not model the direct interactions between the drug and protein residues. This can lead to inaccurate learning of the target representation, which may change due to drug binding effects. In addition, previous DTA methods learn protein representations solely from the small number of protein sequences in DTA datasets, neglecting proteins outside those datasets. We propose GEFA (Graph Early Fusion Affinity), a novel graph-in-graph neural network with an attention mechanism that addresses the changes in target representation caused by binding effects. Specifically, a drug is modeled as a graph of atoms, which then serves as a node in a larger graph of the residue-drug complex. The resulting model is an expressive, deep nested graph neural network. We also use pre-trained protein representations obtained from recent work on learning contextualized protein representations. Experiments are conducted under different settings to evaluate scenarios such as novel drugs or targets. The results demonstrate the effectiveness of the pre-trained protein embeddings and the advantages of GEFA's nested graph for modeling drug-target interaction.
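To make the graph-in-graph idea concrete, the NumPy sketch below collapses a drug's atom features into one vector and appends it as an extra node of a residue graph. It is a structural illustration only, not GEFA's implementation: mean pooling stands in for GEFA's drug-level graph network, and linking the drug node to a chosen set of residues (`contact_residues`) is an assumption. All names are hypothetical.

```python
import numpy as np

def drug_as_residue_node(atom_feats, residue_feats, residue_adj, contact_residues):
    """Append a pooled drug node to a residue graph (structural sketch).

    atom_feats:       (n_atoms, d) drug atom features
    residue_feats:    (n_res, d) residue features (same width d, assumed)
    residue_adj:      (n_res, n_res) symmetric 0/1 residue adjacency
    contact_residues: indices of residues linked to the drug node (assumption)
    Returns node features (n_res + 1, d) and adjacency (n_res + 1, n_res + 1).
    """
    drug_vec = atom_feats.mean(axis=0, keepdims=True)  # atom graph -> one node (mean-pooling stand-in)
    feats = np.vstack([residue_feats, drug_vec])       # drug node is the last row
    n = residue_adj.shape[0]
    adj = np.zeros((n + 1, n + 1))
    adj[:n, :n] = residue_adj
    adj[n, contact_residues] = 1.0                     # drug -> residue edges
    adj[contact_residues, n] = 1.0                     # keep the graph undirected
    return feats, adj

# Hypothetical toy complex: 3 drug atoms, 5 residues in a chain, drug touches residues 1 and 3.
atoms = np.ones((3, 4))
residues = np.zeros((5, 4))
chain = np.eye(5, k=1) + np.eye(5, k=-1)
feats, adj = drug_as_residue_node(atoms, residues, chain, [1, 3])
```

A nested model would then run message passing over `adj`, so that residue embeddings can depend on the bound drug, which is the effect the abstract targets.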
30
Ding Y, Tang J, Guo F, Zou Q. Identification of drug-target interactions via multiple kernel-based triple collaborative matrix factorization. Brief Bioinform 2022; 23:6520305. [PMID: 35134117 DOI: 10.1093/bib/bbab582] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 10/21/2021] [Revised: 12/02/2021] [Accepted: 12/19/2021] [Indexed: 12/15/2022]
Abstract
Targeted drugs have been applied to the treatment of cancer on a large scale, with therapeutic benefit in some patients. Detecting drug-target interactions (DTIs) through biochemical experiments is time-consuming. Machine learning (ML) is now widely applied in large-scale drug screening; however, few methods fuse multiple sources of information. We propose a multiple kernel-based triple collaborative matrix factorization (MK-TCMF) method to predict DTIs. Multiple kernel matrices (containing chemical, biological and clinical information) are integrated via a multi-kernel learning (MKL) algorithm, and the original DTI adjacency matrix is decomposed into three matrices: the latent feature matrix of the drug space, the latent feature matrix of the target space, and a bi-projection matrix that joins the two feature spaces. To improve prediction performance, the MKL algorithm adjusts the weight of each kernel matrix according to the prediction error; the kernels derived from drug side effects and target sequences receive the highest weights. Compared with other computational methods, our model performs better on four test datasets.
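The kernel fusion and three-matrix decomposition described above can be summarized roughly as follows. This is a sketch under common MKL conventions that the abstract does not spell out (nonnegative kernel weights, a shared latent dimension $k$); the symbols are ours, not the paper's notation.

```latex
\begin{aligned}
K_d &= \sum_{i=1}^{P} w^{d}_{i}\, K_d^{(i)}, \qquad
K_t = \sum_{j=1}^{Q} w^{t}_{j}\, K_t^{(j)}, \qquad
w^{d}_{i},\ w^{t}_{j} \ge 0,\\
Y &\approx A\,\Theta\,B^{\top}, \qquad
Y \in \{0,1\}^{n_d \times n_t},\;
A \in \mathbb{R}^{n_d \times k},\;
\Theta \in \mathbb{R}^{k \times k},\;
B \in \mathbb{R}^{n_t \times k}.
\end{aligned}
```

Here $K_d^{(i)}$ and $K_t^{(j)}$ are drug and target similarity kernels (for example chemical structure, side effects, sequence), $A$ and $B$ are the drug and target latent feature matrices, and $\Theta$ is the bi-projection matrix joining the two spaces. How MK-TCMF couples the fused kernels to $A$ and $B$ is not stated in the abstract and is left out.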
Affiliation(s)
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, P.R.China
- Jijun Tang
- Department of Computational Science and Engineering, University of South Carolina, Columbia, USA
- Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha, P.R.China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, P.R.China
31
Viljanen M, Airola A, Pahikkala T. Generalized vec trick for fast learning of pairwise kernel models. Mach Learn 2022. [DOI: 10.1007/s10994-021-06127-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Pairwise learning corresponds to the supervised learning setting where the goal is to make predictions for pairs of objects. Prominent applications include predicting drug-target or protein-protein interactions, or customer-product preferences. In this work, we present a comprehensive review of pairwise kernels that have been proposed for incorporating prior knowledge about the relationship between the objects. Specifically, we consider the standard, symmetric and anti-symmetric Kronecker product kernels, metric-learning, Cartesian, ranking, as well as linear, polynomial and Gaussian kernels. Recently, an $O(nm+nq)$ time generalized vec trick algorithm, where $n$, $m$, and $q$ denote the number of pairs, drugs and targets, was introduced for training kernel methods with the Kronecker product kernel. This was a significant improvement over previous $O(n^2)$ training methods, since in most real-world applications $m, q \ll n$. In this work we show how all the reviewed kernels can be expressed as sums of Kronecker products, allowing the use of the generalized vec trick for speeding up their computation. In the experiments, we demonstrate how the introduced approach allows scaling pairwise kernels to much larger data sets than previously feasible, and provide an extensive comparison of the kernels on a number of biological interaction prediction tasks.
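The speed-up rests on the Kronecker-product identity $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(B X A^{\top})$, where vec stacks matrix columns; it lets a matrix-vector product with $A \otimes B$ be computed without ever forming the large Kronecker matrix. The NumPy sketch below checks this basic, dense vec trick. The generalized variant reviewed here additionally restricts the computation to the $n$ observed drug-target pairs, which this example does not implement. The function name `kron_matvec` is ours.

```python
import numpy as np

def kron_matvec(A: np.ndarray, B: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Compute (A ⊗ B) @ x without forming the Kronecker product.

    Uses (A ⊗ B) vec(X) = vec(B X A^T), with vec() stacking columns
    (column-major, i.e. NumPy order="F"). A: (p, q), B: (r, s), x: (q*s,).
    Cost is O(rsq + rqp) instead of O(pr * qs) for the dense product.
    """
    q = A.shape[1]
    s = B.shape[1]
    X = x.reshape((s, q), order="F")             # undo vec(): X has shape (s, q)
    return (B @ X @ A.T).reshape(-1, order="F")  # vec of the (r, p) result

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((2, 5))
x = rng.standard_normal(4 * 5)
fast = kron_matvec(A, B, x)
slow = np.kron(A, B) @ x                         # forms the full (6, 20) matrix
```

For kernel methods, $A$ and $B$ would be drug and target kernel matrices, so the saving grows with the number of drugs and targets.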
32
Polypharmacology: The science of multi-targeting molecules. Pharmacol Res 2022; 176:106055. [PMID: 34990865 DOI: 10.1016/j.phrs.2021.106055] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 11/15/2021] [Revised: 12/23/2021] [Accepted: 12/31/2021] [Indexed: 12/28/2022]
Abstract
Polypharmacology is a concept in which a molecule interacts with two or more targets simultaneously. It offers many advantages over conventional single-targeting molecules. A multi-targeting drug can be much more efficacious because of its cumulative efficacy at all of its individual targets, which makes it particularly effective in complex, multifactorial diseases such as cancer, where multiple proteins and pathways are involved in the onset and development of the disease. To be polypharmacologic, a molecule needs promiscuity, the ability to interact with multiple targets, while avoiding binding to antitargets, which would otherwise cause off-target adverse effects. Certain structural features and physicochemical properties can help researchers predict whether a designed molecule will be promiscuous. Promiscuity can also be identified with advanced state-of-the-art computational methods. In this review, we also describe methods by which one can intentionally incorporate promiscuity into molecules and make them polypharmacologic. The polypharmacology paradigm of "one drug-multiple targets" has numerous applications, especially in drug repurposing, where an already established drug is redeveloped for a new indication. Although designing a polypharmacological drug is much more difficult than designing a single-targeting drug, current technologies and information about different diseases and chemical functional groups make it plausible for researchers to intentionally design polypharmacological drugs and unlock their advantages.
33
Identification of drug-target interactions via multi-view graph regularized link propagation model. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.100] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 02/06/2023]
34
Lei Y, Li S, Liu Z, Wan F, Tian T, Li S, Zhao D, Zeng J. A deep-learning framework for multi-level peptide-protein interaction prediction. Nat Commun 2021; 12:5465. [PMID: 34526500 PMCID: PMC8443569 DOI: 10.1038/s41467-021-25772-4] [Citation(s) in RCA: 79] [Impact Index Per Article: 26.3] [Received: 01/12/2021] [Accepted: 08/27/2021] [Indexed: 12/12/2022]
Abstract
Peptide-protein interactions are involved in various fundamental cellular functions and their identification is crucial for designing efficacious peptide therapeutics. Recently, a number of computational methods have been developed to predict peptide-protein interactions. However, most of the existing prediction approaches heavily depend on high-resolution structure data. Here, we present a deep learning framework for multi-level peptide-protein interaction prediction, called CAMP, including binary peptide-protein interaction prediction and corresponding peptide binding residue identification. Comprehensive evaluation demonstrated that CAMP can successfully capture the binary interactions between peptides and proteins and identify the binding residues along the peptides involved in the interactions. In addition, CAMP outperformed other state-of-the-art methods on binary peptide-protein interaction prediction. CAMP can serve as a useful tool in peptide-protein interaction prediction and identification of important binding residues in the peptides, which can thus facilitate the peptide drug discovery process.
Affiliation(s)
- Yipin Lei
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
- Shuya Li
- Machine Learning Department, Silexon AI Technology Co., Ltd., Nanjing, China
- Ziyi Liu
- Machine Learning Department, Silexon AI Technology Co., Ltd., Nanjing, China
- Fangping Wan
- Machine Learning Department, Silexon AI Technology Co., Ltd., Nanjing, China
- Tingzhong Tian
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
- Shao Li
- Institute of TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRist, Department of Automation, Tsinghua University, Beijing, 100084, China
- Dan Zhao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
35
Shaker B, Ahmad S, Lee J, Jung C, Na D. In silico methods and tools for drug discovery. Comput Biol Med 2021; 137:104851. [PMID: 34520990 DOI: 10.1016/j.compbiomed.2021.104851] [Citation(s) in RCA: 161] [Impact Index Per Article: 53.7] [Received: 07/06/2021] [Revised: 09/05/2021] [Accepted: 09/05/2021] [Indexed: 12/28/2022]
Abstract
In the past, conventional drug discovery strategies have been successfully employed to develop new drugs, but the process from lead identification to clinical trials takes more than 12 years and costs approximately US$1.8 billion on average. Recently, in silico approaches have attracted considerable interest because of their potential to accelerate drug discovery in terms of time, labor, and costs. Many new drug compounds have been successfully developed using computational methods. In this review, we briefly introduce computational drug discovery strategies and outline up-to-date tools for performing them, as well as available knowledge bases for those who develop their own computational models. Finally, we present successful examples of anti-bacterial, anti-viral, and anti-cancer drug discoveries made using computational methods.
Affiliation(s)
- Bilal Shaker
- Department of Biomedical Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
- Sajjad Ahmad
- Department of Health and Biological Sciences, Abasyn University, Peshawar, 25000, Pakistan
- Jingyu Lee
- Department of Biomedical Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
- Chanjin Jung
- Department of Biomedical Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
- Dokyun Na
- Department of Biomedical Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
36
Tanoli Z, Aldahdooh J, Alam F, Wang Y, Seemab U, Fratelli M, Pavlis P, Hajduch M, Bietrix F, Gribbon P, Zaliani A, Hall MD, Shen M, Brimacombe K, Kulesskiy E, Saarela J, Wennerberg K, Vähä-Koskela M, Tang J. Minimal information for chemosensitivity assays (MICHA): a next-generation pipeline to enable the FAIRification of drug screening experiments. Brief Bioinform 2021; 23:6361039. [PMID: 34472587 PMCID: PMC8769689 DOI: 10.1093/bib/bbab350] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/11/2021] [Revised: 08/03/2021] [Accepted: 08/02/2021] [Indexed: 12/29/2022]
Abstract
Chemosensitivity assays are commonly used for preclinical drug discovery and clinical trial optimization. However, data from independent assays are often discordant, largely because of uncharacterized variation in the experimental materials and protocols. We report here the launch of Minimal Information for Chemosensitivity Assays (MICHA), accessible via https://micha-protocol.org. Unlike existing efforts, which often lack support from data integration tools, MICHA can automatically extract publicly available information to facilitate assay annotation, including 1) compounds, 2) samples, 3) reagents and 4) data processing methods. For example, MICHA provides an integrative web server and database to obtain compound annotations, including chemical structures, targets and disease indications. In addition, the annotation of cell line samples, assay protocols and literature references is greatly eased by retrieving manually curated catalogues. Once the annotation is complete, MICHA can export a report that conforms to the FAIR principles (Findable, Accessible, Interoperable and Reusable) for drug screening studies. To demonstrate the utility of MICHA, we provide FAIRified protocols from five major cancer drug screening studies as well as six recently conducted COVID-19 studies. With the MICHA web server and database, we envisage wider adoption of a community-driven effort to improve open access to drug sensitivity assays.
Affiliation(s)
- Ziaurrehman Tanoli
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Jehad Aldahdooh
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Farhan Alam
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Yinyin Wang
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Umair Seemab
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Petr Pavlis
- Institute of Molecular and Translational Medicine, Czech
- Marian Hajduch
- Institute of Molecular and Translational Medicine, Czech
- Philip Gribbon
- Fraunhofer Institute for Molecular Biology and Applied Ecology, Germany
- Andrea Zaliani
- Fraunhofer Institute for Molecular Biology and Applied Ecology, Germany
- Matthew D Hall
- National Center for Advancing Translational Sciences, USA
- Min Shen
- National Center for Advancing Translational Sciences, USA
- Evgeny Kulesskiy
- Institute for Molecular Medicine Finland, University of Helsinki, Finland
- Jani Saarela
- Institute for Molecular Medicine Finland, University of Helsinki, Finland
- Krister Wennerberg
- Biotech Research & Innovation Centre (BRIC), University of Copenhagen, Denmark
- Jing Tang
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
37
Zhang S, Jiang M, Wang S, Wang X, Wei Z, Li Z. SAG-DTA: Prediction of Drug-Target Affinity Using Self-Attention Graph Network. Int J Mol Sci 2021; 22:ijms22168993. [PMID: 34445696 PMCID: PMC8396496 DOI: 10.3390/ijms22168993] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 07/07/2021] [Revised: 08/14/2021] [Accepted: 08/17/2021] [Indexed: 11/16/2022]
Abstract
The prediction of drug–target affinity (DTA) is a crucial step in drug screening and discovery. In this study, a new graph-based prediction model named SAG-DTA (self-attention graph drug–target affinity) was implemented. Unlike previous graph-based methods, the proposed model uses self-attention mechanisms on the drug molecular graph to obtain effective drug representations for DTA prediction. The features of each atom node in the molecular graph were weighted by an attention score before being aggregated into the molecule representation. Various self-attention scoring methods were compared in this study. In addition, two pooling architectures, namely global and hierarchical architectures, were presented and evaluated on benchmark datasets. Comparative experiments on both regression and binary classification tasks showed that SAG-DTA was superior to previous sequence-based and graph-based methods and exhibited good generalization ability.
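The attention-weighted aggregation described above can be illustrated with a small NumPy sketch. Here the score of an atom is a linear function of its neighbourhood-averaged features, normalized with a softmax over atoms, and the molecule vector is the attention-weighted sum of atom features. This scoring function is an assumption chosen for illustration; SAG-DTA compares several scoring methods and pooling architectures, and its exact formulation is not reproduced here. The function name and the vector `w` are hypothetical.

```python
import numpy as np

def attention_readout(node_feats: np.ndarray, adj: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Attention-weighted graph readout (simplified sketch).

    node_feats: (n_atoms, d) atom features; adj: (n_atoms, n_atoms) 0/1 bonds;
    w: (d,) scoring vector (learned in practice, fixed here).
    """
    a_hat = adj + np.eye(adj.shape[0])                               # include each atom itself
    neigh = (a_hat @ node_feats) / a_hat.sum(axis=1, keepdims=True)  # mean over neighbourhood
    scores = neigh @ w                                               # one attention logit per atom
    alpha = np.exp(scores - scores.max())                            # numerically stable softmax
    alpha /= alpha.sum()
    return alpha @ node_feats                                        # weighted sum -> (d,) molecule vector

# Hypothetical 3-atom chain with 2-dimensional atom features.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
mol = attention_readout(X, adj, np.array([0.5, -0.2]))
```

With an all-zero scoring vector every atom gets the same weight and the readout reduces to plain mean pooling, which makes the attention step easy to compare against a non-attentive baseline.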
Affiliation(s)
- Shugang Zhang
- College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
- Mingjian Jiang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266033, China
- Shuang Wang
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
- Zhiqiang Wei
- College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
- Zhen Li
- College of Computer Science and Technology, Qingdao University, Qingdao 266071, China
- Correspondence: Tel./Fax: +86-532-85953086
38
Recent advances in drug repurposing using machine learning. Curr Opin Chem Biol 2021; 65:74-84. [PMID: 34274565 DOI: 10.1016/j.cbpa.2021.06.001] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 05/11/2021] [Revised: 05/28/2021] [Accepted: 06/01/2021] [Indexed: 12/11/2022]
Abstract
Drug repurposing aims to find new uses for already existing and approved drugs. We provide a brief overview of recent developments in drug repurposing using machine learning, alongside other computational approaches for comparison. We also highlight several applications: kinase inhibitors in cancer, Alzheimer's disease, and COVID-19.
39
Tanoli Z, Aldahdooh J, Alam F, Wang Y, Seemab U, Fratelli M, Pavlis P, Hajduch M, Bietrix F, Gribbon P, Zaliani A, Hall MD, Shen M, Brimacombe K, Kulesskiy E, Saarela J, Wennerberg K, Vähä-Koskela M, Tang J. Minimal information for Chemosensitivity assays (MICHA): A next-generation pipeline to enable the FAIRification of drug screening experiments. bioRxiv 2021:2020.12.03.409409. [PMID: 33300000 PMCID: PMC7724669 DOI: 10.1101/2020.12.03.409409] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/24/2022]
Abstract
Chemosensitivity assays are commonly used for preclinical drug discovery and clinical trial optimization. However, data from independent assays are often discordant, largely because of uncharacterized variation in the experimental materials and protocols. We report here the launch of MICHA (Minimal Information for Chemosensitivity Assays), accessible via https://micha-protocol.org. Unlike existing efforts, which often lack support from data integration tools, MICHA can automatically extract publicly available information to facilitate assay annotation, including 1) compounds, 2) samples, 3) reagents, and 4) data processing methods. For example, MICHA provides an integrative web server and database to obtain compound annotations, including chemical structures, targets, and disease indications. In addition, the annotation of cell line samples, assay protocols and literature references is greatly eased by retrieving manually curated catalogues. Once the annotation is complete, MICHA can export a report that conforms to the FAIR principles (Findable, Accessible, Interoperable and Reusable) for drug screening studies. To demonstrate the utility of MICHA, we provide FAIRified protocols from five major cancer drug screening studies, as well as six recently conducted COVID-19 studies. With the MICHA web server and database, we envisage wider adoption of a community-driven effort to improve open access to drug sensitivity assays.
Affiliation(s)
- Ziaurrehman Tanoli
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Jehad Aldahdooh
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Farhan Alam
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Yinyin Wang
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Umair Seemab
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Petr Pavlis
- Institute of Molecular and Translational Medicine, Czech
- Marian Hajduch
- Institute of Molecular and Translational Medicine, Czech
- Philip Gribbon
- Fraunhofer Institute for Translational Medicine and Pharmacology, Hamburg, Germany
- Andrea Zaliani
- Fraunhofer Institute for Translational Medicine and Pharmacology, Hamburg, Germany
- Min Shen
- National Center for Advancing Translational Sciences, USA
- Evgeny Kulesskiy
- Institute for Molecular Medicine Finland, University of Helsinki, Finland
- Jani Saarela
- Institute for Molecular Medicine Finland, University of Helsinki, Finland
- Krister Wennerberg
- Biotech Research & Innovation Centre (BRIC), University of Copenhagen, Denmark
- Jing Tang
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
40
Nguyen T, Le H, Quinn TP, Nguyen T, Le TD, Venkatesh S. GraphDTA: predicting drug-target binding affinity with graph neural networks. Bioinformatics 2021; 37:1140-1147. [PMID: 33119053 DOI: 10.1093/bioinformatics/btaa921] [Citation(s) in RCA: 319] [Impact Index Per Article: 106.3] [Received: 07/06/2020] [Revised: 10/01/2020] [Accepted: 10/15/2020] [Indexed: 12/21/2022]
Abstract
SUMMARY The development of new drugs is costly, time-consuming and often accompanied by safety issues. Drug repurposing can avoid the expensive and lengthy process of drug development by finding new uses for already approved drugs. To repurpose drugs effectively, it is useful to know which proteins are targeted by which drugs. Computational models that estimate the interaction strength of new drug-target pairs have the potential to expedite drug repurposing. Several models have been proposed for this task. However, these models represent drugs as strings, which is not a natural way to represent molecules. We propose a new model called GraphDTA that represents drugs as graphs and uses graph neural networks to predict drug-target affinity. We show that graph neural networks not only predict drug-target affinity better than non-deep learning models, but also outperform competing deep learning methods. Our results confirm that deep learning models are appropriate for drug-target binding affinity prediction, and that representing drugs as graphs can lead to further improvements. AVAILABILITY AND IMPLEMENTATION The proposed models are implemented in Python. Related data, pre-trained models and source code are publicly available at https://github.com/thinng/GraphDTA. All scripts and data needed to reproduce the post hoc statistical analysis are available from https://doi.org/10.5281/zenodo.3603523. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
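GraphDTA's abstract states only that drugs are encoded as graphs and processed by graph neural networks. As a hedged illustration of what one such layer can look like, the sketch below implements a single graph convolution of the widely used form H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) in plain Python, plus element-wise max pooling as an assumed graph-level readout. It is not GraphDTA's code; the function names `gcn_layer` and `max_pool` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj:      n x n 0/1 adjacency matrix (atoms = nodes, bonds = edges)
    features: n x d atom feature matrix H
    weight:   d x m weight matrix W (learned in a real model)
    """
    n = len(adj)
    # Add self-loops so each atom keeps its own features when aggregating.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Node degrees in A + I; always >= 1 because of the self-loop.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation D^-1/2 (A + I) D^-1/2.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    d = len(features[0])
    # Aggregate normalised neighbour features (norm @ H).
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(d)]
           for i in range(n)]
    m = len(weight[0])
    # Linear projection (agg @ W) followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(d))) for o in range(m)]
            for i in range(n)]


def max_pool(node_embeddings: Matrix) -> List[float]:
    """Graph-level readout: element-wise maximum over all nodes."""
    return [max(column) for column in zip(*node_embeddings)]
```

For a toy two-atom molecule with one bond, `gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]])` averages the two atoms' features, giving `[[2.0], [2.0]]`; stacking several such layers before pooling lets each atom's embedding reflect a progressively larger neighbourhood.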
Affiliation(s)
- Thin Nguyen
- Applied Artificial Intelligence Institute, Deakin University, Geelong, VIC, 3216, Australia
- Hang Le
- Faculty of Information Technology, Nha Trang University, Nha Trang, Khanh Hoa, Viet Nam
- Thomas P Quinn
- Applied Artificial Intelligence Institute, Deakin University, Geelong, VIC, 3216, Australia
- Tri Nguyen
- Applied Artificial Intelligence Institute, Deakin University, Geelong, VIC, 3216, Australia
- Thuc Duy Le
- School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, SA, 5095, Australia
- Svetha Venkatesh
- Applied Artificial Intelligence Institute, Deakin University, Geelong, VIC, 3216, Australia
41
Cichońska A, Ravikumar B, Allaway RJ, Wan F, Park S, Isayev O, Li S, Mason M, Lamb A, Tanoli Z, Jeon M, Kim S, Popova M, Capuzzi S, Zeng J, Dang K, Koytiger G, Kang J, Wells CI, Willson TM, Oprea TI, Schlessinger A, Drewry DH, Stolovitzky G, Wennerberg K, Guinney J, Aittokallio T. Crowdsourced mapping of unexplored target space of kinase inhibitors. Nat Commun 2021; 12:3307. [PMID: 34083538 PMCID: PMC8175708 DOI: 10.1038/s41467-021-23165-1] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 06/20/2020] [Accepted: 04/15/2021] [Indexed: 12/31/2022]
Abstract
Despite decades of intensive searching for compounds that modulate the activity of particular protein targets, a large proportion of the human kinome remains undrugged. Effective approaches are therefore required to map the massive space of unexplored compound-kinase interactions for novel and potent activities. Here, we carry out a crowdsourced benchmarking of predictive algorithms for kinase inhibitor potencies across multiple kinase families, tested on unpublished bioactivity data. We find that the top-performing predictions are based on various models, including kernel learning, gradient boosting and deep learning, and that their ensemble achieves a predictive accuracy exceeding that of single-dose kinase activity assays. We design experiments based on the model predictions and identify unexpected activities even for under-studied kinases, thereby accelerating experimental mapping efforts. The open-source prediction algorithms, together with the bioactivities between 95 compounds and 295 kinases, provide a resource for benchmarking prediction algorithms and for extending the druggable kinome.
Affiliation(s)
- Anna Cichońska
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Department of Computer Science, Helsinki Institute for Information Technology (HIIT), Aalto University, Espoo, Finland
- Department of Computing, University of Turku, Turku, Finland
- Balaguru Ravikumar
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Fangping Wan
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Sungjoon Park
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
- Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
- Shuya Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Michael Mason
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
- Andrew Lamb
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
- Ziaurrehman Tanoli
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Minji Jeon
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
- Sunkyu Kim
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
- Mariya Popova
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
- Stephen Capuzzi
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Kristen Dang
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
- Jaewoo Kang
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
- Carrow I Wells
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Timothy M Willson
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Tudor I Oprea
- Translational Informatics Division and Comprehensive Cancer Center, University of New Mexico School of Medicine, Albuquerque, NM, USA
- Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- David H Drewry
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Krister Wennerberg
- Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark
- Justin Guinney
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
- Tero Aittokallio
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Department of Computer Science, Helsinki Institute for Information Technology (HIIT), Aalto University, Espoo, Finland
- Department of Mathematics and Statistics, University of Turku, Turku, Finland
- Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Oslo Centre for Biostatistics and Epidemiology (OCBE), University of Oslo, Oslo, Norway
42
Kharkar PS. Computational Approaches for the Design of (Mutant-)Selective Tyrosine Kinase Inhibitors: State-of-the-Art and Future Prospects. Curr Top Med Chem 2021; 20:1564-1575. [PMID: 32357816 DOI: 10.2174/1568026620666200502005853] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/30/2020] [Revised: 03/10/2020] [Accepted: 03/26/2020] [Indexed: 02/08/2023]
Abstract
Kinases remain among the most attractive therapeutic targets for a large number of indications, such as cancer, rheumatoid arthritis, cardiac failure and many others. The design and development of kinase inhibitors (ATP-competitive, allosteric or covalent) is a clinically validated and successful strategy in the pharmaceutical industry. These benefits come with limitations, particularly the development of resistance to highly potent and selective inhibitors. When this happens, the cycle must be repeated, i.e., kinase inhibitors active against the mutated forms must be designed and developed. The complexity of the tumor milieu makes it extremely difficult for these molecularly targeted therapies to work. Every year, newer and better versions of these agents are introduced into the clinic. Several computational approaches, such as structure-based, ligand-based or hybrid ones, continue to live up to their potential in discovering novel kinase inhibitors. New schools of thought in this area continue to emerge, e.g., the development of dual-target kinase inhibitors. But there are fundamental issues with this approach: it is difficult to selectively optimize binding at two entirely different or related kinases. In addition to the conventional strategies, modern technologies (machine learning, deep learning, artificial intelligence, etc.) have started yielding results and building success stories. Computational tools have played a critical role in catalysing the phenomenal progress in the kinase drug discovery field. The present review summarizes the progress in utilizing computational methods and tools for discovering (mutant-)selective tyrosine kinase inhibitor drugs over the last three years (2017-2019). Representative investigations are discussed, while others are merely listed. The author believes that the enthusiastic reader will be inspired to explore the cited literature extensively to appreciate the progress made so far and the future prospects of the field.
Affiliation(s)
- Prashant S Kharkar
- Department of Pharmaceutical Sciences and Technology, Institute of Chemical Technology, Nathalal Parekh Marg, Matunga, Mumbai 400 019, India
43
Wang K, Zhou R, Li Y, Li M. DeepDTAF: a deep learning method to predict protein-ligand binding affinity. Brief Bioinform 2021; 22:6214647. [PMID: 33834190 DOI: 10.1093/bib/bbab072] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Received: 11/17/2020] [Revised: 01/27/2021] [Accepted: 02/14/2021] [Indexed: 01/10/2023]
Abstract
Biomolecular recognition between ligand and protein plays an essential role in drug discovery and development. However, determining protein-ligand binding affinity experimentally is extremely time- and resource-consuming. At present, many computational methods have been proposed to predict binding affinity, most of which require protein 3D structures, which are often unavailable. Therefore, new methods that can fully exploit sequence-level features are greatly needed to predict protein-ligand binding affinity and accelerate the drug discovery process. We developed a novel deep learning approach, named DeepDTAF, to predict protein-ligand binding affinity. DeepDTAF was constructed by integrating local and global contextual features. More specifically, the protein-binding pocket, which possesses special properties for directly binding the ligand, was first used as the local input feature for protein-ligand binding affinity prediction. Furthermore, dilated convolution was used to capture multiscale long-range interactions. We compared DeepDTAF with recent state-of-the-art methods and analyzed the effectiveness of different parts of our model; the significant improvement in accuracy showed that DeepDTAF is a reliable tool for affinity prediction. The source code and data are available at https://github.com/KailiWang1/DeepDTAF.
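DeepDTAF's abstract credits dilated convolution with capturing multiscale long-range interactions. The minimal single-channel sketch below, written for illustration and not taken from DeepDTAF, shows the mechanism under the assumptions of no padding, stride 1, and cross-correlation order as in common deep learning libraries: tap j of the kernel reads the input `j * dilation` positions after the window start, so stacking layers with growing dilation widens the receptive field without adding weights. The names `dilated_conv1d` and `receptive_field` are hypothetical.

```python
from typing import List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int = 1) -> List[float]:
    """Single-channel 1D dilated convolution, 'valid' mode (no padding), stride 1.

    Implemented as a cross-correlation (kernel not flipped), as in most
    deep-learning libraries. Tap j reads signal[start + j * dilation].
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    k = len(kernel)
    span = (k - 1) * dilation + 1  # input positions covered by one output value
    return [
        sum(kernel[j] * signal[start + j * dilation] for j in range(k))
        for start in range(len(signal) - span + 1)
    ]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of a stack of stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

With kernel size 3, three stacked layers with dilations 1, 2 and 4 cover 15 consecutive residues (`receptive_field(3, [1, 2, 4])`), versus 7 for three undilated layers, at the same parameter count; doubling the dilation per layer makes the covered span grow exponentially with depth.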
Affiliation(s)
- Renyi Zhou
- School of Computer Science and Engineering, Central South University, China
- Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, USA
- Min Li
- School of Computer Science and Engineering, Central South University, China
44
Naveed H, Reglin C, Schubert T, Gao X, Arold ST, Maitland ML. Identifying Novel Drug Targets by iDTPnd: A Case Study of Kinase Inhibitors. GENOMICS PROTEOMICS & BIOINFORMATICS 2021; 19:986-997. [PMID: 33794377 PMCID: PMC9403029 DOI: 10.1016/j.gpb.2020.05.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/01/2018] [Revised: 01/08/2020] [Accepted: 05/11/2020] [Indexed: 11/16/2022]
Abstract
Current FDA-approved kinase inhibitors cause diverse adverse effects, some of which are due to the mechanism-independent effects of these drugs. Identifying these mechanism-independent interactions could improve drug safety and support drug repurposing. Here, we develop iDTPnd (integrated Drug Target Predictor with negative dataset), a computational approach for large-scale discovery of novel targets for known drugs. For a given drug, we construct a positive structural signature as well as a negative structural signature that captures the weakly conserved structural features of drug-binding sites. To facilitate assessment of unintended targets, iDTPnd also provides a docking-based interaction score and its statistical significance. We confirm the interactions of sorafenib, imatinib, dasatinib, sunitinib, and pazopanib with their known targets at a sensitivity of 52% and a specificity of 55%. We also validate 10 predicted novel targets by using in vitro experiments. Our results suggest that proteins other than kinases, such as nuclear receptors, cytochrome P450, and MHC class I molecules, can also be physiologically relevant targets of kinase inhibitors. Our method is general and broadly applicable for the identification of protein–small molecule interactions, when sufficient drug–target 3D data are available. The code for constructing the structural signatures is available at https://sfb.kaust.edu.sa/Documents/iDTP.zip.
Affiliation(s)
- Hammad Naveed
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA; Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad 44000, Pakistan
- Xin Gao
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955, Saudi Arabia
- Stefan T Arold
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Biological and Environmental Sciences and Engineering (BESE) Division, Thuwal 23955, Saudi Arabia
- Michael L Maitland
- Inova Center for Personalized Health and Schar Cancer Institute, Falls Church, VA 22042, USA; University of Virginia Cancer Center, Annandale, Virginia 22003, USA
45
Yılmaz S, Ayati M, Schlatzer D, Çiçek AE, Chance MR, Koyutürk M. Robust inference of kinase activity using functional networks. Nat Commun 2021; 12:1177. [PMID: 33608514 PMCID: PMC7895941 DOI: 10.1038/s41467-021-21211-6] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 05/01/2020] [Accepted: 01/15/2021] [Indexed: 12/17/2022]
Abstract
Mass spectrometry enables high-throughput screening of phosphoproteins across a broad range of biological contexts. When complemented by computational algorithms, phosphoproteomic data allow the inference of kinase activity, facilitating the identification of dysregulated kinases in various diseases, including cancer, Alzheimer’s disease and Parkinson’s disease. To enhance the reliability of kinase activity inference, we present a network-based framework, RoKAI, that integrates various sources of functional information to capture coordinated changes in signaling. Through computational experiments, we show that phosphorylation of sites in the functional neighborhood of a kinase is significantly predictive of its activity. Incorporating this knowledge into RoKAI consistently enhances the accuracy of kinase activity inference methods while making them more robust to missing annotations and quantifications. This enables the identification of understudied kinases and will likely lead to the development of novel kinase inhibitors for targeted therapy of many diseases. RoKAI is available as a web-based tool at http://rokai.io. Kinases drive fundamental changes in cell state, but predicting kinase activity based on substrate-level changes can be challenging. Here the authors introduce a computational framework that utilizes similarities between substrates to robustly infer kinase activity.
Affiliation(s)
- Serhan Yılmaz
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH, USA
- Marzieh Ayati
- Department of Computer Science, University of Texas Rio Grande Valley, Edinburg, TX, USA
- Daniela Schlatzer
- Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH, USA
- A Ercüment Çiçek
- Department of Computer Engineering, Bilkent University, Ankara, Turkey; Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Mark R Chance
- Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH, USA; Department of Nutrition, Case Western Reserve University, Cleveland, OH, USA
- Mehmet Koyutürk
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH, USA; Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH, USA
46
Tanoli Z, Vähä-Koskela M, Aittokallio T. Artificial intelligence, machine learning, and drug repurposing in cancer. Expert Opin Drug Discov 2021; 16:977-989. [PMID: 33543671 DOI: 10.1080/17460441.2021.1883585] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Indexed: 02/07/2023]
Abstract
Introduction: Drug repurposing provides a cost-effective strategy to re-use approved drugs for new medical indications. Several machine learning (ML) and artificial intelligence (AI) approaches have been developed for systematic identification of drug repurposing leads based on big data resources, further accelerating and de-risking the drug development process by computational means. Areas covered: The authors focus on supervised ML and AI methods that make use of publicly available databases and information resources. While most of the example applications are in the field of anticancer drug therapies, the methods and resources reviewed are also widely applicable to other indications, including COVID-19 treatment. Particular emphasis is placed on the use of comprehensive target activity profiles that enable a systematic repurposing process by extending the target profile of drugs to include potent off-targets with therapeutic potential for a new indication. Expert opinion: The scarcity of clinical patient data and the current focus on genetic aberrations as primary drug targets may limit the performance of anticancer drug repurposing approaches that rely solely on genomics-based information. Functional testing of cancer patient cells exposed to a large number of targeted therapies and their combinations provides an additional source of repurposing information for tissue-aware AI approaches.
Affiliation(s)
- Ziaurrehman Tanoli
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLife), University of Helsinki, Helsinki, Finland
- Markus Vähä-Koskela
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLife), University of Helsinki, Helsinki, Finland
- Tero Aittokallio
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLife), University of Helsinki, Helsinki, Finland; Institute for Cancer Research, Department of Cancer Genetics, Oslo University Hospital, Oslo, Norway; Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Oslo, Norway
47
Kabir ER, Mustafa N, Nausheen N, Sharif Siam MK, Syed EU. Exploring existing drugs: proposing potential compounds in the treatment of COVID-19. Heliyon 2021; 7:e06284. [PMID: 33655082 PMCID: PMC7906017 DOI: 10.1016/j.heliyon.2021.e06284] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/22/2020] [Revised: 12/13/2020] [Accepted: 02/10/2021] [Indexed: 01/08/2023]
Abstract
The COVID-19 situation escalated into an unprecedented global crisis in just a few weeks. On 30 January 2020, the World Health Organization officially declared the COVID-19 epidemic a public health emergency of international concern. Confirmed cases were reported to exceed 105,856,046 globally, with a death toll above 2,311,048, according to the Johns Hopkins University dashboard on 7 February 2021, though the actual figures may be much higher. Conserved regions of the South Asian strains were used to construct a phylogenetic tree to explore the evolutionary relationships of the novel virus. Off-target similarities to other previously reported microorganisms were searched using the Basic Local Alignment Search Tool (BLAST). The conserved regions did not match any previously reported microorganisms or viruses, which confirmed the novelty of SARS-CoV-2. Currently there is no approved drug for the prevention and treatment of COVID-19, but researchers globally are attempting to develop one or more soon. Therapeutic strategies need to be addressed urgently to combat COVID-19. Drug repurposing uses old, safe drugs, is time-effective and requires lower development costs, and was thus chosen for this study. Molecular docking was used to repurpose drugs from our own comprehensive database of approximately 300 highly characterized existing drugs with known safety profiles, to identify compounds that could inhibit the chosen molecular targets: SARS-CoV-2, ACE2 and TMPRSS2. The study has identified and proposed twenty-seven candidates for further in vitro and in vivo studies for the treatment of SARS-CoV-2 infection.
48
Leveraging multi-way interactions for systematic prediction of pre-clinical drug combination effects. Nat Commun 2020; 11:6136. [PMID: 33262326 PMCID: PMC7708835 DOI: 10.1038/s41467-020-19950-z] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 04/16/2020] [Accepted: 11/05/2020] [Indexed: 12/12/2022]
Abstract
We present comboFM, a machine learning framework for predicting the responses of drug combinations in pre-clinical studies, such as those based on cell lines or patient-derived cells. comboFM models the cell context-specific drug interactions through higher-order tensors, and efficiently learns latent factors of the tensor using powerful factorization machines. This approach enables comboFM to leverage information from previous experiments performed on similar drugs and cells when predicting responses of new combinations in as yet untested cells; thereby, it achieves highly accurate predictions despite sparsely populated data tensors. We demonstrate high predictive performance of comboFM in various prediction scenarios using data from cancer cell line pharmacogenomic screens. Subsequent experimental validation of a set of previously untested drug combinations further supports the practical and robust applicability of comboFM. For instance, we confirm a novel synergy between the anaplastic lymphoma kinase (ALK) inhibitor crizotinib and the proteasome inhibitor bortezomib in lymphoma cells. Overall, our results demonstrate that comboFM provides an effective means for systematic pre-screening of drug combinations to support precision oncology applications. Combinatorial treatments have become a standard of care for various complex diseases, including cancers. Here, the authors show that combinatorial responses of two anticancer drugs can be accurately predicted using factorization machines trained on large-scale pharmacogenomic data for guiding precision oncology studies.
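The abstract states that comboFM learns latent factors with factorization machines to model higher-order drug-drug-cell interactions. As a simplified, hedged illustration only, the sketch below scores one sample with the standard second-order factorization machine rather than comboFM's higher-order model: each pairwise interaction weight is the dot product of two per-feature latent vectors, and an algebraic identity avoids looping over all feature pairs. The function names `fm_predict` and `fm_predict_naive` are hypothetical, and the parameters would be learned from data in practice.

```python
from typing import Sequence


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               V: Sequence[Sequence[float]]) -> float:
    """Second-order factorization machine score in O(n * k) time.

    y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j,
    where row v_i of V (length k) is the latent vector of feature i.
    """
    n = len(x)
    k = len(V[0]) if n else 0
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        # sum_{i<j} a_i a_j = ((sum_i a_i)^2 - sum_i a_i^2) / 2, with a_i = v_{i,f} * x_i
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x: Sequence[float], w0: float, w: Sequence[float],
                     V: Sequence[Sequence[float]]) -> float:
    """Reference O(n^2 * k) version that enumerates every feature pair."""
    n = len(x)
    total = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total
```

In a drug-combination setting one might, hypothetically, one-hot encode three drug slots followed by two cell-line slots, so `x = [1, 0, 1, 0, 1]` means drugs 1 and 3 tested in cell line 2; both functions return the same score for any such input, and only the pairs of active features contribute interaction terms.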
49
Identification of Drug–Target Interactions via Dual Laplacian Regularized Least Squares with Multiple Kernel Fusion. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106254] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Indexed: 12/29/2022]
50
Trigueiro-Louro J, Correia V, Figueiredo-Nunes I, Gíria M, Rebelo-de-Andrade H. Unlocking COVID therapeutic targets: A structure-based rationale against SARS-CoV-2, SARS-CoV and MERS-CoV Spike. Comput Struct Biotechnol J 2020; 18:2117-2131. [PMID: 32913581 PMCID: PMC7452956 DOI: 10.1016/j.csbj.2020.07.017] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 06/04/2020] [Revised: 07/20/2020] [Accepted: 07/22/2020] [Indexed: 12/11/2022]
Abstract
There are no approved targeted therapeutics against SARS-CoV-2 or other beta-CoVs. The beta-CoV Spike protein is a promising target, given its critical role in viral infection and pathogenesis and its surface-exposed features. We applied a structure-based strategy targeting highly conserved druggable regions identified through a comprehensive large-scale sequence analysis and structural characterization of Spike domains across SARSr- and MERSr-CoVs. We identified 28 main consensus druggable pockets within the Spike. The RBD and SD1 (S1 subunit) and the CR, HR1 and CH (S2 subunit) represent the most promising conserved druggable regions. Additionally, we identified 181 new potential hot spot residues for the hSARSr-CoVs and 72 new hot spot residues for the SARSr- and MERSr-CoVs, which have not previously been described in the literature. These sites/residues exhibit advantageous structural features for targeted molecular and pharmacological modulation. This study establishes the Spike as a promising anti-CoV target using an approach with potentially higher resilience to resistance development, directed at a broad spectrum of beta-CoVs, including the new SARS-CoV-2 responsible for COVID-19. This research also provides a structure-based rationale for the design and discovery of chemical inhibitors, antibodies or other therapeutic modalities successfully targeting the beta-CoV Spike protein.
Key Words
- ACE2, angiotensin-converting enzyme 2
- Bat-SL-CoVs, bat SARS-like coronavirus
- Beta-CoVs, betacoronavirus
- Betacoronavirus
- CC, conserved cluster
- CD, connector domain
- CDP, consensus druggable pocket
- CDR, consensus druggable residue
- CH, central helix
- CP, cytoplasmic domain
- CR, connecting region
- CS, conservation score
- CoVs, coronavirus
- Coronavirus disease
- DGSS, DoGSiteScorer
- DPP4, dipeptidyl peptidase-4
- Druggability prediction
- FP, fusion peptide
- HR1, heptad repeat 1
- HR2, heptad repeat 2
- MERS-CoVs, Middle East respiratory syndrome coronavirus
- MERSr-CoVs, Middle East respiratory syndrome-related coronavirus
- MSA, multiple sequence alignment
- NTD, N-terminal domain
- Novel antiviral targets
- PDB, Protein Data Bank
- PDS, PockDrug-Server
- RBD, Receptor-Binding Domain
- S, Spike
- SARS-CoV-2
- SARS-CoV-2, severe acute respiratory syndrome coronavirus 2
- SARS-CoVs, severe acute respiratory syndrome coronavirus
- SARSr-CoVs, severe acute respiratory syndrome-related coronavirus
- SD1, subdomain 1
- SD2, subdomain 2
- SF, SiteFinder from MOE
- SP, small pocket
- Sequence conservation
- Spike protein
- Sv, shorter variant
- T-RHS, top-ranked hot spots
- TMPRSS2, transmembrane protease serine 2
- aa, amino acid
- hSARSr-CoVs, human severe acute respiratory syndrome-related coronavirus
- nts, nucleotides
Affiliation(s)
- João Trigueiro-Louro
- Antiviral Resistance Lab, Research & Development Unit, Infectious Diseases Department, Instituto Nacional de Saúde Doutor Ricardo Jorge, IP, Av. Padre Cruz, 1649-016 Lisbon, Portugal
- Host-Pathogen Interaction Unit, Research Institute for Medicines (iMed.ULisboa), Faculty of Pharmacy, Universidade de Lisboa, Av. Professor Gama Pinto, 1649-003 Lisbon, Portugal
- Vanessa Correia
- Antiviral Resistance Lab, Research & Development Unit, Infectious Diseases Department, Instituto Nacional de Saúde Doutor Ricardo Jorge, IP, Av. Padre Cruz, 1649-016 Lisbon, Portugal
- Inês Figueiredo-Nunes
- Host-Pathogen Interaction Unit, Research Institute for Medicines (iMed.ULisboa), Faculty of Pharmacy, Universidade de Lisboa, Av. Professor Gama Pinto, 1649-003 Lisbon, Portugal
- Marta Gíria
- Host-Pathogen Interaction Unit, Research Institute for Medicines (iMed.ULisboa), Faculty of Pharmacy, Universidade de Lisboa, Av. Professor Gama Pinto, 1649-003 Lisbon, Portugal
- Helena Rebelo-de-Andrade
- Antiviral Resistance Lab, Research & Development Unit, Infectious Diseases Department, Instituto Nacional de Saúde Doutor Ricardo Jorge, IP, Av. Padre Cruz, 1649-016 Lisbon, Portugal
- Host-Pathogen Interaction Unit, Research Institute for Medicines (iMed.ULisboa), Faculty of Pharmacy, Universidade de Lisboa, Av. Professor Gama Pinto, 1649-003 Lisbon, Portugal