1
Rodríguez-Pérez R, Miljković F, Bajorath J. Assessing the information content of structural and protein-ligand interaction representations for the classification of kinase inhibitor binding modes via machine learning and active learning. J Cheminform 2020; 12:36. [PMID: 33431025 PMCID: PMC7245824 DOI: 10.1186/s13321-020-00434-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/21/2019] [Accepted: 04/27/2020] [Indexed: 12/27/2022] Open
Abstract
For kinase inhibitors, X-ray crystallography has revealed different types of binding modes. Currently, more than 2000 kinase inhibitors with known binding modes are available, which makes it possible to derive and test machine learning models for the prediction of inhibitors with different binding modes. We have addressed this prediction task to evaluate and compare the information content of distinct molecular representations including protein–ligand interaction fingerprints (IFPs) and compound structure-based structural fingerprints (i.e., atom environment/fragment fingerprints). IFPs were designed to capture binding mode-specific interaction patterns at different resolution levels. Accurate predictions of kinase inhibitor binding modes were achieved with random forests using both representations. The performance of IFPs was consistently superior to atom environment fingerprints, albeit only by less than 10%. An active learning strategy applying information entropy-based selection of training instances was applied as a diagnostic approach to assess the relative information content of distinct representations. IFPs were found to capture more binding mode-relevant information than atom environment fingerprints, leading to highly predictive models even when training instances were randomly selected. By contrast, for atom environment fingerprints, the derivation of accurate models via active learning depended on entropy-based selection of informative training compounds. Notably, higher information content of IFPs confirmed by active learning only resulted in small improvements in global prediction accuracy compared to models derived using atom environment fingerprints. For practical applications, prediction of binding modes of new kinase inhibitors on the basis of chemical structure is highly attractive.
Affiliation(s)
- Raquel Rodríguez-Pérez
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, 53115, Bonn, Germany
- Filip Miljković
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, 53115, Bonn, Germany
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, 53115, Bonn, Germany
2
Lin X, Li X, Lin X. A Review on Applications of Computational Methods in Drug Screening and Design. Molecules 2020; 25:E1375. [PMID: 32197324 PMCID: PMC7144386 DOI: 10.3390/molecules25061375] [Citation(s) in RCA: 235] [Impact Index Per Article: 58.8] [Received: 01/30/2020] [Revised: 03/16/2020] [Accepted: 03/16/2020] [Indexed: 12/27/2022] Open
Abstract
Drug development is one of the most significant processes in the pharmaceutical industry. Various computational methods have dramatically reduced the time and cost of drug discovery. In this review, we first discussed the roles of multiscale biomolecular simulations in identifying drug binding sites on the target macromolecule and elucidating drug action mechanisms. Then, virtual screening methods (e.g., molecular docking, pharmacophore modeling, and QSAR) as well as structure- and ligand-based classical/de novo drug design were introduced and discussed. Last, we explored the development of machine learning methods and their applications in the aforementioned computational methods to speed up the drug discovery process. Also, several application examples of combining various methods were discussed. A combination of different methods to jointly solve the tough problem at different scales and dimensions will be an inevitable trend in drug screening and design.
Affiliation(s)
- Xiaoqian Lin
  - Institute of Single Cell Engineering, Beijing Advanced Innovation Center for Biomedical Engineering, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Xiu Li
  - School of Chemistry and Material Science, Shanxi Normal University, Linfen 041004, China
- Xubo Lin
  - Institute of Single Cell Engineering, Beijing Advanced Innovation Center for Biomedical Engineering, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
3
4
Pogodin PV, Lagunin AA, Rudik AV, Filimonov DA, Druzhilovskiy DS, Nicklaus MC, Poroikov VV. How to Achieve Better Results Using PASS-Based Virtual Screening: Case Study for Kinase Inhibitors. Front Chem 2018; 6:133. [PMID: 29755970 PMCID: PMC5935003 DOI: 10.3389/fchem.2018.00133] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 01/29/2018] [Accepted: 04/09/2018] [Indexed: 12/16/2022] Open
Abstract
Discovery of new pharmaceutical substances is currently boosted by the possibility of utilization of the Synthetically Accessible Virtual Inventory (SAVI) library, which includes about 283 million molecules, each annotated with a proposed synthetic one-step route from commercially available starting materials. The SAVI database is well-suited for ligand-based methods of virtual screening to select molecules for experimental testing. In this study, we compare the performance of three approaches for the analysis of structure-activity relationships that differ in their criteria for selecting the "active" and "inactive" compounds included in the training sets. PASS (Prediction of Activity Spectra for Substances), which is based on a modified Naïve Bayes algorithm, was applied since it had been shown to be robust and to provide good predictions of many biological activities based on just the structural formula of a compound even if the information in the training set is incomplete. We used different subsets of kinase inhibitors for this case study because many data are currently available on this important class of drug-like molecules. Based on the subsets of kinase inhibitors extracted from the ChEMBL 20 database we performed the PASS training, and then applied the model to ChEMBL 23 compounds not yet present in ChEMBL 20 to identify novel kinase inhibitors. As one may expect, the best prediction accuracy was obtained if only the experimentally confirmed active and inactive compounds for distinct kinases were used in the training procedure. However, for some kinases, reasonable results were obtained even if we used merged training sets, in which we designated as inactives the compounds not tested against the particular kinase. Thus, depending on the availability of data for a particular biological activity, one may choose the first or the second approach for creating ligand-based computational tools to achieve the best possible results in virtual screening.
Affiliation(s)
- Pavel V. Pogodin
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
- Alexey A. Lagunin
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
  - Department of Bioinformatics, Medical-Biological Department, Pirogov Russian National Research Medical University, Moscow, Russia
- Anastasia V. Rudik
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
- Dmitry A. Filimonov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
- Mark C. Nicklaus
  - Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, NCI-Frederick, Frederick, MD, United States
5
Hu J, Li Y, Zhang Y, Yu DJ. ATPbind: Accurate Protein-ATP Binding Site Prediction by Combining Sequence-Profiling and Structure-Based Comparisons. J Chem Inf Model 2018; 58:501-510. [PMID: 29361215 DOI: 10.1021/acs.jcim.7b00397] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Indexed: 12/29/2022]
Abstract
Protein-ATP interactions are ubiquitous in a wide variety of biological processes. Correctly locating ATP binding sites from protein information is an important but challenging task for protein function annotation and drug discovery. However, there is no method that can optimally identify ATP binding sites for different proteins. In this study, we report a new composite predictor, ATPbind, for ATP binding sites by integrating the outputs of two template-based predictors (i.e., S-SITE and TM-SITE) and three discriminative sequence-driven features of proteins: position specific scoring matrix, predicted secondary structure, and predicted solvent accessibility. In ATPbind, we assembled multiple support vector machines (SVMs) based on a random undersampling technique to cope with the serious imbalance phenomenon between the numbers of ATP binding sites and of non-ATP binding sites. We also constructed a new gold-standard benchmark data set consisting of 429 ATP binding proteins from the PDB database to evaluate and compare the proposed ATPbind with other existing predictors. Starting from a query sequence and predicted I-TASSER models, ATPbind can achieve an average accuracy of 72%, covering 62% of all ATP binding sites while achieving a Matthews correlation coefficient value that is significantly higher than that of other state-of-the-art predictors.
Affiliation(s)
- Jun Hu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, 210094, P. R. China
  - Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, Michigan 48109-2218, United States
- Yang Li
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, 210094, P. R. China
  - Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, Michigan 48109-2218, United States
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, Michigan 48109-2218, United States
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, 210094, P. R. China
6
Jacoby E, Wroblowski B, Buyck C, Neefs JM, Meyer C, Cummings MD, van Vlijmen H. Protocols for the Design of Kinase-focused Compound Libraries. Mol Inform 2017; 37:e1700119. [PMID: 29116686 DOI: 10.1002/minf.201700119] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 09/26/2017] [Accepted: 10/20/2017] [Indexed: 01/12/2023]
Abstract
Protocols for the design of kinase-focused compound libraries are presented. Kinase-focused compound libraries can be differentiated based on the design goal. Depending on whether the library should be a discovery library specific for one particular kinase, a general discovery library for multiple distinct kinase projects, or even phenotypic screening, there exists today a variety of in silico methods to design candidate compound libraries. We address the following scenarios: 1) Datamining of SAR databases and kinase focused vendor catalogues; 2) Predictions and virtual screening; 3) Structure-based design of combinatorial kinase inhibitors; 4) Design of covalent kinase inhibitors; 5) Design of macrocyclic kinase inhibitors; and 6) Design of allosteric kinase inhibitors and activators.
Affiliation(s)
- Edgar Jacoby
  - Janssen Research & Development, Turnhoutseweg 30, 2340, Beerse, Belgium
- Christophe Buyck
  - Janssen Research & Development, Turnhoutseweg 30, 2340, Beerse, Belgium
- Jean-Marc Neefs
  - Janssen Research & Development, Turnhoutseweg 30, 2340, Beerse, Belgium
- Maxwell D Cummings
  - Janssen Research & Development, 1400 McKean Rd, Spring House, PA 19477, USA
7
Computational-experimental approach to drug-target interaction mapping: A case study on kinase inhibitors. PLoS Comput Biol 2017; 13:e1005678. [PMID: 28787438 PMCID: PMC5560747 DOI: 10.1371/journal.pcbi.1005678] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Received: 02/03/2017] [Revised: 08/17/2017] [Accepted: 07/11/2017] [Indexed: 01/09/2023] Open
Abstract
Due to relatively high costs and labor required for experimental profiling of the full target space of chemical compounds, various machine learning models have been proposed as cost-effective means to advance this process in terms of predicting the most potent compound-target interactions for subsequent verification. However, most of the model predictions lack direct experimental validation in the laboratory, making their practical benefits for drug discovery or repurposing applications largely unknown. Here, we therefore introduce and carefully test a systematic computational-experimental framework for the prediction and pre-clinical verification of drug-target interactions using a well-established kernel-based regression algorithm as the prediction model. To evaluate its performance, we first predicted unmeasured binding affinities in a large-scale kinase inhibitor profiling study, and then experimentally tested 100 compound-kinase pairs. The relatively high correlation of 0.77 (p < 0.0001) between the predicted and measured bioactivities supports the potential of the model for filling the experimental gaps in existing compound-target interaction maps. Further, we subjected the model to a more challenging task of predicting target interactions for such a new candidate drug compound that lacks prior binding profile information. As a specific case study, we used tivozanib, an investigational VEGF receptor inhibitor with currently unknown off-target profile. Among 7 kinases with high predicted affinity, we experimentally validated 4 new off-targets of tivozanib, namely the Src-family kinases FRK and FYN A, the non-receptor tyrosine kinase ABL1, and the serine/threonine kinase SLK. Our subsequent experimental validation protocol effectively avoids any possible information leakage between the training and validation data, and therefore enables rigorous model validation for practical applications.
These results demonstrate that the kernel-based modeling approach offers practical benefits for probing novel insights into the mode of action of investigational compounds, and for the identification of new target selectivities for drug repurposing applications.
8
Sorgenfrei FA, Fulle S, Merget B. Kinome-Wide Profiling Prediction of Small Molecules. ChemMedChem 2017; 13:495-499. [PMID: 28544552 DOI: 10.1002/cmdc.201700180] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 03/23/2017] [Revised: 05/20/2017] [Indexed: 12/21/2022]
Abstract
Extensive kinase profiling data, covering more than half of the human kinome, are available nowadays and allow the construction of activity prediction models of high practical utility. Proteochemometric (PCM) approaches use compound and protein descriptors, which enables the extrapolation of bioactivity values to thus far unexplored kinases. In this study, the potential of PCM to make large-scale predictions on the entire kinome is explored, considering the applicability on novel compounds and kinases, including clinically relevant mutants. A rigorous validation indicates high predictive power on left-out kinases and superiority over individual kinase QSAR models for new compounds. Furthermore, external validation on clinically relevant mutant kinases reveals an excellent predictive power for mutations spread across the ATP binding site.
Affiliation(s)
- Frieda A Sorgenfrei
  - BioMed X Innovation Center, Im Neuenheimer Feld 515, 69120, Heidelberg, Germany
- Simone Fulle
  - BioMed X Innovation Center, Im Neuenheimer Feld 515, 69120, Heidelberg, Germany
- Benjamin Merget
  - BioMed X Innovation Center, Im Neuenheimer Feld 515, 69120, Heidelberg, Germany
9
Small Random Forest Models for Effective Chemogenomic Active Learning. JOURNAL OF COMPUTER AIDED CHEMISTRY 2017. [DOI: 10.2751/jcac.18.124] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/08/2022]