1. Ong WJG, Kirubakaran P, Karanicolas J. Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors. bioRxiv 2023:2023.09.04.556234. [PMID: 37732243 PMCID: PMC10508770 DOI: 10.1101/2023.09.04.556234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023]
Abstract
The extreme surge of interest over the past decade surrounding the use of neural networks has inspired many groups to deploy them for predicting binding affinities of drug-like molecules to their receptors. A model that can accurately make such predictions has the potential to screen large chemical libraries and help streamline the drug discovery process. However, despite reports of models that accurately predict quantitative inhibition using protein kinase sequences and inhibitors' SMILES strings, it is still unclear whether these models can generalize to previously unseen data. Here, we build a Convolutional Neural Network (CNN) analogous to those previously reported and evaluate the model over four datasets commonly used for inhibitor/kinase predictions. We find that the model performs comparably to those previously reported, provided that the individual data points are randomly split between the training set and the test set. However, model performance deteriorates dramatically when all data for a given inhibitor are placed together in the same training/testing fold, implying that information leakage underlies the models' performance. Through comparison to simple models in which the SMILES strings are tokenized, or in which test set predictions are simply copied from the closest training set data points, we demonstrate that there is essentially no generalization whatsoever in this model. In other words, the model has not learned anything about molecular interactions, and does not provide any benefit over much simpler and more transparent models. These observations strongly point to the need for richer structure-based encodings to obtain useful prospective predictions of not-yet-synthesized candidate inhibitors.
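The contrast between a random per-data-point split and a split that keeps all data for an inhibitor in one fold can be illustrated with a minimal Python sketch. It assumes each data point is a hypothetical `(inhibitor_smiles, kinase, affinity)` tuple with made-up toy values; the helper names (`random_split`, `grouped_split`, `leaked_inhibitors`, `nearest_neighbor_predict`) are illustrative and are not the authors' code. The "copy the closest training point" baseline here uses character-bigram Jaccard similarity on SMILES strings, which is an assumed, deliberately crude stand-in for whatever similarity measure the paper actually uses.

```python
import random

# Hypothetical toy data: (inhibitor SMILES, kinase, affinity). Values are illustrative only.
records = [
    ("CCO", "K1", 6.1), ("CCO", "K2", 5.4),
    ("CCN", "K1", 6.3), ("CCN", "K2", 5.0),
    ("c1ccccc1", "K1", 7.2), ("c1ccccc1", "K2", 7.9),
    ("CC(=O)O", "K1", 4.8), ("CC(=O)O", "K2", 5.1),
    ("CCCl", "K1", 5.9), ("CCCl", "K2", 6.0),
]

def random_split(records, test_frac=0.2, seed=0):
    """Assign individual data points at random; one inhibitor may land in both folds."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

def grouped_split(records, test_frac=0.2, seed=0):
    """Place every data point for a given inhibitor in the same fold."""
    rng = random.Random(seed)
    inhibitors = sorted({r[0] for r in records})
    rng.shuffle(inhibitors)
    n_test = max(1, int(len(inhibitors) * test_frac))
    test_inhibitors = set(inhibitors[:n_test])
    train = [r for r in records if r[0] not in test_inhibitors]
    test = [r for r in records if r[0] in test_inhibitors]
    return train, test

def leaked_inhibitors(train, test):
    """Inhibitors seen in both folds: a source of information leakage."""
    return {r[0] for r in train} & {r[0] for r in test}

def bigrams(smiles):
    """Set of adjacent character pairs in a SMILES string."""
    return {smiles[i:i + 2] for i in range(len(smiles) - 1)}

def similarity(a, b):
    """Jaccard similarity of SMILES character bigrams (assumed stand-in measure)."""
    x, y = bigrams(a), bigrams(b)
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def nearest_neighbor_predict(train, inhibitor, kinase):
    """Copy the affinity of the most similar training inhibitor, preferring the same kinase."""
    candidates = [r for r in train if r[1] == kinase] or list(train)
    best = max(candidates, key=lambda r: similarity(r[0], inhibitor))
    return best[2]

if __name__ == "__main__":
    for name, split in (("random", random_split), ("grouped", grouped_split)):
        train, test = split(records)
        print(name, "split, leaked inhibitors:", sorted(leaked_inhibitors(train, test)))
        for inh, kin, measured in test:
            predicted = nearest_neighbor_predict(train, inh, kin)
            print(f"  {inh} / {kin}: copied {predicted}, measured {measured}")
```

Under the random split, an inhibitor in the test fold typically also has data points in the training fold, so a copy-the-neighbor baseline can find a near-identical match; the grouped split removes that shortcut by construction, which is the kind of leakage the abstract describes.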
Affiliation(s)
- Wern Juin Gabriel Ong
- Cancer Signaling & Microenvironment Program, Fox Chase Cancer Center, Philadelphia, PA 19111
- Bowdoin College, Brunswick, ME 04011
- Palani Kirubakaran
- Cancer Signaling & Microenvironment Program, Fox Chase Cancer Center, Philadelphia, PA 19111
- John Karanicolas
- Cancer Signaling & Microenvironment Program, Fox Chase Cancer Center, Philadelphia, PA 19111

2. Yu P, Ericksen S, Gitter A, Newton MA. Bayes optimal informer sets for early-stage drug discovery. Biometrics 2022. [PMID: 35165892 DOI: 10.1111/biom.13637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2020] [Accepted: 02/08/2022] [Indexed: 11/26/2022]
Abstract
An important experimental design problem in early-stage drug discovery is how to prioritize available compounds for testing when very little is known about the target protein. Informer based ranking (IBR) methods address the prioritization problem when the compounds have provided bioactivity data on other potentially relevant targets. An IBR method selects an informer set of compounds, and then prioritizes the remaining compounds on the basis of new bioactivity experiments performed with the informer set on the target. We formalize the problem as a two-stage decision problem and introduce the Bayes Optimal Informer SEt (BOISE) method for its solution. BOISE leverages a flexible model of the initial bioactivity data, a relevant loss function, and effective computational schemes to resolve the two-step design problem. We evaluate BOISE and compare it to other IBR strategies in two retrospective studies, one on protein-kinase inhibition and the other on anti-cancer drug sensitivity. In both empirical settings BOISE exhibits better predictive performance than available methods. It also behaves well with missing data, where methods that use matrix completion show worse predictive performance.
Affiliation(s)
- Peng Yu
- University of Wisconsin-Madison
- Anthony Gitter
- University of Wisconsin-Madison; Morgridge Institute for Research

3. Piroozmand F, Mohammadipanah F, Sajedi H. Spectrum of deep learning algorithms in drug discovery. Chem Biol Drug Des 2021; 96:886-901. [PMID: 33058458 DOI: 10.1111/cbdd.13674] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/31/2019] [Revised: 02/11/2020] [Accepted: 02/19/2020] [Indexed: 12/16/2022]
Abstract
Deep learning (DL) algorithms are a subset of machine learning algorithms that aim to model complex mappings between a set of elements and their classes. In parallel with advances in revealing the molecular bases of diseases, DL has been applied to data and library management, reaction optimization, differentiating uncertainties, molecule construction, creating metrics from qualitative results, and predicting structures or interactions. From source identification to lead discovery and medicinal chemistry of the drug candidate, drug delivery, and modification, these challenges can be addressed with artificial intelligence algorithms that aid in generating and interpreting data. Both discovery and design approaches demand automation, large-scale data management, and data fusion as high-throughput methods advance. The application of DL can accelerate the exploration of drug mechanisms, finding novel indications for existing drugs (drug repositioning), drug development, and preclinical and clinical studies. This review highlights the impact of DL on the workflow of drug discovery and design and on their complementary tools. Additionally, the types of DL algorithms used for this purpose, their pros and cons, and the dominant directions of future research are presented.
Affiliation(s)
- Firoozeh Piroozmand
- Pharmaceutical Biotechnology Lab, Department of Microbiology, School of Biology and Center of Excellence in Phylogeny of Living Organisms, College of Science, University of Tehran, Tehran, Iran
- Fatemeh Mohammadipanah
- Pharmaceutical Biotechnology Lab, Department of Microbiology, School of Biology and Center of Excellence in Phylogeny of Living Organisms, College of Science, University of Tehran, Tehran, Iran
- Hedieh Sajedi
- Department of Computer Science, School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran

4. Clemons PA, Bittker JA, Wagner FF, Hands A, Dančík V, Schreiber SL, Choudhary A, Wagner BK. The Use of Informer Sets in Screening: Perspectives on an Efficient Strategy to Identify New Probes. SLAS Discovery 2021; 26:855-861. [PMID: 34130532 DOI: 10.1177/24725552211019410] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 02/04/2023]
Abstract
Small-molecule discovery typically involves large-scale screening campaigns, spanning multiple compound collections. However, such activities can be cost- or time-prohibitive, especially when using complex assay systems, limiting the number of compounds tested. Further, low hit rates can make the process inefficient. Sparse coverage of chemical structure or biological activity space can lead to limited success in a primary screen and represents a missed opportunity by virtue of selecting the "wrong" compounds to test. Thus, the choice of screening collections becomes of paramount importance. In this perspective, we discuss the utility of generating "informer sets" for small-molecule discovery, and how this strategy can be leveraged to prioritize probe candidates. While many researchers may assume that informer sets are focused on particular targets (e.g., kinases) or processes (e.g., autophagy), efforts to assemble informer sets based on historical bioactivity or successful human exposure (e.g., repurposing collections) have shown promise as well. Another method for generating informer sets is based on chemical structure, particularly when the compounds have unknown activities and targets. We describe our efforts to screen an informer set representing a collection of 100,000 small molecules synthesized through diversity-oriented synthesis (DOS). This process enables researchers to identify activity early and more extensively screen only a few chemical scaffolds, rather than the entire collection. This elegant and economical outcome is a goal of the informer set approach. Here, we aim not only to shed light on this process, but also to promote the use of informer sets more widely in small-molecule discovery projects.
Affiliation(s)
- Paul A Clemons
- Chemical Biology and Therapeutics Science Program, Broad Institute, Cambridge, MA, USA
- Joshua A Bittker
- Center for the Development of Therapeutics, Broad Institute, Cambridge, MA, USA; Vertex Pharmaceuticals, Boston, MA, USA
- Florence F Wagner
- Center for the Development of Therapeutics, Broad Institute, Cambridge, MA, USA
- Allison Hands
- Center for the Development of Therapeutics, Broad Institute, Cambridge, MA, USA
- Vlado Dančík
- Chemical Biology and Therapeutics Science Program, Broad Institute, Cambridge, MA, USA
- Stuart L Schreiber
- Chemical Biology and Therapeutics Science Program, Broad Institute, Cambridge, MA, USA
- Amit Choudhary
- Chemical Biology and Therapeutics Science Program, Broad Institute, Cambridge, MA, USA
- Bridget K Wagner
- Chemical Biology and Therapeutics Science Program, Broad Institute, Cambridge, MA, USA

5. Wlodarchak N, Feltenberger JB, Ye Z, Beczkiewicz J, Procknow R, Yan G, King TM, Golden JE, Striker R. Engineering Selectivity for Reduced Toxicity of Bacterial Kinase Inhibitors Using Structure-Guided Medicinal Chemistry. ACS Med Chem Lett 2021; 12:228-235. [PMID: 35035774 PMCID: PMC8757511 DOI: 10.1021/acsmedchemlett.0c00580] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/02/2020] [Accepted: 01/08/2021] [Indexed: 01/15/2023] [Open access]
Abstract
Tuberculosis is a major global public health concern, and new drugs are needed to combat both the typical form and the increasingly common drug-resistant form of this disease. The essential tuberculosis kinase PknB is an attractive drug development target because of its central importance in several critical signaling cascades. A major hurdle in kinase inhibitor development is the reduction of toxicity due to nonspecific kinase activity in host cells. Here a novel class of PknB inhibitors was developed from hit aminopyrimidine 1 (GW779439X), which was originally designed for human CDK4 but failed to progress clinically because of high toxicity and low specificity. Replacing the pyrazolopyridazine headgroup of the original hit with substituted pyridine or phenyl headgroups resulted in a reduction of Cdk activity and a 3-fold improvement in specificity over the human kinome while maintaining PknB activity. This also resulted in improved microbiological activity and reduced toxicity in THP-1 cells and zebrafish.
Affiliation(s)
- Nathan Wlodarchak
- William S. Middleton Veterans Hospital, 2500 Overlook Terrace, Madison, Wisconsin 53705, United States; Department of Medicine, University of Wisconsin-Madison, 1550 Linden Drive, Madison, Wisconsin 53706, United States
- John B Feltenberger
- University of Wisconsin-Madison Medicinal Chemistry Center, School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States
- Zhengqing Ye
- University of Wisconsin-Madison Medicinal Chemistry Center, School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States
- Jeffrey Beczkiewicz
- Department of Medicine, University of Wisconsin-Madison, 1550 Linden Drive, Madison, Wisconsin 53706, United States
- Rebecca Procknow
- Department of Medicine, University of Wisconsin-Madison, 1550 Linden Drive, Madison, Wisconsin 53706, United States
- Gang Yan
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States
- Troy M King
- Department of Medicine, University of Wisconsin-Madison, 1550 Linden Drive, Madison, Wisconsin 53706, United States
- Jennifer E Golden
- University of Wisconsin-Madison Medicinal Chemistry Center, School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States; Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States
- Rob Striker
- William S. Middleton Veterans Hospital, 2500 Overlook Terrace, Madison, Wisconsin 53705, United States; Department of Medicine, University of Wisconsin-Madison, 1550 Linden Drive, Madison, Wisconsin 53706, United States

6. Raschka S, Kaufman B. Machine learning and AI-based approaches for bioactive ligand discovery and GPCR-ligand recognition. Methods 2020; 180:89-110. [PMID: 32645448 PMCID: PMC8457393 DOI: 10.1016/j.ymeth.2020.06.016] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 01/17/2020] [Revised: 06/23/2020] [Accepted: 06/23/2020] [Indexed: 02/06/2023] [Open access]
Abstract
In the last decade, machine learning and artificial intelligence applications have received a significant boost in performance and attention in both academic research and industry. The success behind most of the recent state-of-the-art methods can be attributed to the latest developments in deep learning. When applied to various scientific domains that are concerned with the processing of non-tabular data, for example, image or text, deep learning has been shown to outperform not only conventional machine learning but also highly specialized tools developed by domain experts. This review aims to summarize AI-based research for GPCR bioactive ligand discovery with a particular focus on the most recent achievements and research trends. To make this article accessible to a broad audience of computational scientists, we provide instructive explanations of the underlying methodology, including overviews of the most commonly used deep learning architectures and feature representations of molecular data. We highlight the latest AI-based research that has led to the successful discovery of GPCR bioactive ligands. However, an equal focus of this review is on the discussion of machine learning-based technology that has been applied to ligand discovery in general and has the potential to pave the way for successful GPCR bioactive ligand discovery in the future. This review concludes with a brief outlook highlighting the recent research trends in deep learning, such as active learning and semi-supervised learning, which have great potential for advancing bioactive ligand discovery.
Affiliation(s)
- Sebastian Raschka
- University of Wisconsin-Madison, Department of Statistics, United States
- Benjamin Kaufman
- University of Wisconsin-Madison, Department of Biostatistics and Medical Informatics, United States

7. Vijay S, Gujral TS. Non-linear Deep Neural Network for Rapid and Accurate Prediction of Phenotypic Responses to Kinase Inhibitors. iScience 2020; 23:101129. [PMID: 32434142 PMCID: PMC7235637 DOI: 10.1016/j.isci.2020.101129] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/13/2020] [Revised: 04/04/2020] [Accepted: 04/29/2020] [Indexed: 12/19/2022] [Open access]
Abstract
Protein kinase inhibitors are among the most successful targeted therapies to date. Despite this progress, additional kinase inhibitors are needed to expand the target space and to overcome the drug resistance that has emerged in the clinical setting. Here, we developed KiDNN (Kinase inhibitor prediction using Deep Neural Networks). KiDNN uses a non-linear, multilayer feedforward network that mimics complex and dynamic kinase-driven signaling pathways. We used KiDNN to predict the effect of ∼200 kinase inhibitors on migration of breast and liver cancer cells. We show that the prediction accuracy of KiDNN exceeds that of other prediction tools based on linear models. We validated that an inhibitor of receptor tyrosine kinases and an inhibitor of Src family kinases decreased migration of triple-negative breast cancer cells, consistent with the role of these kinases in driving motility. Overall, we show that non-linear, DNN-based models provide a powerful approach to screening hundreds of kinase inhibitors in silico.
Highlights:
- Deep Neural Networks mimic non-linear, complex intracellular signaling pathways
- Multi-phase grid search identified the best networks with less computation time
- Prediction accuracy of KiDNN outperformed linear models
- KiDNN can accelerate drug discovery and development efforts
Affiliation(s)
- Siddharth Vijay
- Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Taranjit S Gujral
- Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; Department of Pharmacology, University of Washington, Seattle, WA, USA