1
Lien ST, Lin TE, Hsieh JH, Sung TY, Chen JH, Hsu KC. Establishment of extensive artificial intelligence models for kinase inhibitor prediction: Identification of novel PDGFRB inhibitors. Comput Biol Med 2023; 156:106722. [PMID: 36878123 DOI: 10.1016/j.compbiomed.2023.106722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Revised: 02/16/2023] [Accepted: 02/26/2023] [Indexed: 03/06/2023]
Abstract
Identifying hit compounds is an important step in drug development. Unfortunately, this process continues to be a challenging task. Several machine learning models have been generated to aid in simplifying and improving the prediction of candidate compounds. Models tuned for predicting kinase inhibitors have been established. However, an effective model can be limited by the size of the chosen training dataset. In this study, we tested several machine learning models to predict potential kinase inhibitors. A dataset was curated from a number of publicly available repositories. This resulted in a comprehensive dataset covering more than half of the human kinome. More than 2,000 kinase models were established using different model approaches. The performances of the models were compared, and the Keras-MLP model was determined to be the best performing model. The model was then used to screen a chemical library for potential inhibitors targeting platelet-derived growth factor receptor-β (PDGFRB). Several PDGFRB candidates were selected, and in vitro assays confirmed four compounds with PDGFRB inhibitory activity and IC50 values in the nanomolar range. These results show the effectiveness of machine learning models trained on the reported dataset. This report would aid in the establishment of machine learning models as well as in the discovery of novel kinase inhibitors.
Affiliation(s)
- Ssu-Ting Lien
  - Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Tony Eight Lin
  - Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan; Ph.D. Program for Cancer Molecular Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Jui-Hua Hsieh
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, National Institutes of Health, Durham, NC, USA
- Tzu-Ying Sung
  - Biomedical Translation Research Center, Academia Sinica, Taipei, Taiwan
- Jun-Hong Chen
  - Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Kai-Cheng Hsu
  - Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan; Ph.D. Program for Cancer Molecular Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan; Ph.D. Program in Drug Discovery and Development Industry, College of Pharmacy, Taipei Medical University, Taipei, Taiwan; Cancer Center, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan; TMU Research Center of Cancer Translational Medicine, Taipei Medical University, Taipei, Taiwan; TMU Research Center of Drug Discovery, Taipei Medical University, Taipei, Taiwan
2
Gagic Z, Ruzic D, Djokovic N, Djikic T, Nikolic K. In silico Methods for Design of Kinase Inhibitors as Anticancer Drugs. Front Chem 2020; 7:873. [PMID: 31970149 PMCID: PMC6960140 DOI: 10.3389/fchem.2019.00873] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 09/28/2019] [Accepted: 12/04/2019] [Indexed: 12/11/2022] [Open access]
Abstract
Rational drug design implies usage of molecular modeling techniques such as pharmacophore modeling, molecular dynamics, virtual screening, and molecular docking to explain the activity of biomolecules, define molecular determinants for interaction with the drug target, and design more efficient drug candidates. Kinases play an essential role in cell function and therefore are extensively studied targets in drug design and discovery. Kinase inhibitors are clinically very important and widely used antineoplastic drugs. In this review, computational methods used in rational drug design of kinase inhibitors are discussed and compared, considering some representative case studies.
Affiliation(s)
- Zarko Gagic
  - Department of Pharmaceutical Chemistry, Faculty of Medicine, University of Banja Luka, Banja Luka, Bosnia and Herzegovina
- Dusan Ruzic
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Belgrade, Belgrade, Serbia
- Nemanja Djokovic
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Belgrade, Belgrade, Serbia
- Teodora Djikic
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Belgrade, Belgrade, Serbia
- Katarina Nikolic
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Belgrade, Belgrade, Serbia
3
Exploring the Potential of Spherical Harmonics and PCVM for Compounds Activity Prediction. Int J Mol Sci 2019; 20:ijms20092175. [PMID: 31052500 PMCID: PMC6539940 DOI: 10.3390/ijms20092175] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/21/2019] [Revised: 04/14/2019] [Accepted: 04/29/2019] [Indexed: 01/11/2023] [Open access]
Abstract
Biologically active chemical compounds may provide remedies for several diseases. Meanwhile, Machine Learning techniques applied to Drug Discovery, which are cheaper and faster than wet-lab experiments, have the capability to more effectively identify molecules with the expected pharmacological activity. Therefore, it is urgent and essential to develop more representative descriptors and reliable classification methods to accurately predict molecular activity. In this paper, we investigate the potential of a novel representation based on Spherical Harmonics fed into a Probabilistic Classification Vector Machines classifier, namely SHPCVM, for the compound activity prediction task. We make use of representation learning to acquire the features which describe the molecules as precisely as possible. To verify the performance of SHPCVM, ten-fold cross-validation tests are performed on twenty-one G protein-coupled receptors (GPCRs). Experimental outcomes (accuracy of 0.86), assessed by classification accuracy, precision, recall, Matthews' Correlation Coefficient and Cohen's kappa, reveal that our relatively short Spherical Harmonics-based representation, combined with Probabilistic Classification Vector Machines, can achieve very satisfactory performance for GPCRs.
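The evaluation measures named in this abstract can all be derived from a binary confusion matrix. The following is a minimal sketch using the textbook definitions, not the authors' code; the function name `classification_metrics` and the toy labels are assumptions introduced only for illustration.

```python
import math


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, MCC and Cohen's kappa for 0/1 labels.

    Hypothetical helper for illustration; uses the standard textbook formulas.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    n = tp + tn + fp + fn
    if n == 0:
        raise ValueError("no samples given")

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mcc": mcc,
        "kappa": kappa,
    }


# Toy example (made-up labels, not data from the paper).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
```

Unlike plain accuracy, MCC and kappa stay near zero for a classifier that only reproduces the class proportions, which is presumably why they are reported alongside accuracy on activity datasets that may be imbalanced.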
4
Afolabi LT, Saeed F, Hashim H, Petinrin OO. Ensemble learning method for the prediction of new bioactive molecules. PLoS One 2018; 13:e0189538. [PMID: 29329334 PMCID: PMC5766097 DOI: 10.1371/journal.pone.0189538] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 07/24/2017] [Accepted: 11/27/2017] [Indexed: 12/31/2022] [Open access]
Abstract
Pharmacologically active molecules can provide remedies for a range of different illnesses and infections. Therefore, the search for such bioactive molecules has been an enduring mission. As such, there is a need to employ a more suitable, reliable, and robust classification method for enhancing the prediction of new bioactive molecules. In this paper, we adopt a recently developed combination of different boosting methods (AdaBoost) for the prediction of new bioactive molecules. We conducted the research experiments utilizing the widely used MDL Drug Data Report (MDDR) database. The proposed boosting method generated better results than other machine learning methods. This finding suggests that the method is suitable for inclusion among the in silico tools for use in cheminformatics, computational chemistry and molecular biology.
Affiliation(s)
- Faisal Saeed
  - College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia
  - Information Systems Department, Faculty of Computing, Universiti Teknologi Malaysia, Skudai, Johor, Malaysia
- Haslinda Hashim
  - Information Systems Department, Faculty of Computing, Universiti Teknologi Malaysia, Skudai, Johor, Malaysia
  - Kolej Yayasan Pelajaran Johor, KM16, Jalan Kulai-Kota Tinggi, Kota Tinggi, Johor, Malaysia
5
Skoraczyński G, Dittwald P, Miasojedow B, Szymkuć S, Gajewska EP, Grzybowski BA, Gambin A. Predicting the outcomes of organic reactions via machine learning: are current descriptors sufficient? Sci Rep 2017; 7:3582. [PMID: 28620199 PMCID: PMC5472585 DOI: 10.1038/s41598-017-02303-0] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Received: 12/16/2016] [Accepted: 04/06/2017] [Indexed: 11/09/2022] [Open access]
Abstract
As machine learning/artificial intelligence algorithms are defeating chess masters and, most recently, Go champions, there is interest - and hope - that they will prove equally useful in assisting chemists in predicting outcomes of organic reactions. This paper demonstrates, however, that the applicability of machine learning to the problems of chemical reactivity over diverse types of chemistries remains limited - in particular, with the currently available chemical descriptors, fundamental mathematical theorems impose upper bounds on the accuracy with which reaction yields and times can be predicted. Improving the performance of machine-learning methods calls for the development of fundamentally new chemical descriptors.
Affiliation(s)
- G Skoraczyński
  - Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, 02-097, Warsaw, Poland
- P Dittwald
  - DARPA Make-It Program & the Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- B Miasojedow
  - Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, 02-097, Warsaw, Poland
- S Szymkuć
  - DARPA Make-It Program & the Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- E P Gajewska
  - DARPA Make-It Program & the Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- B A Grzybowski
  - DARPA Make-It Program & the Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland; Center for Soft and Living Matter of Korea's Institute for Basic Science (IBS), Department of Chemistry, Ulsan National Institute of Science and Technology, Ulsan, South Korea
- A Gambin
  - Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, 02-097, Warsaw, Poland
6
Brown SA, Nhola L, Herrmann J. Cardiovascular Toxicities of Small Molecule Tyrosine Kinase Inhibitors: An Opportunity for Systems-Based Approaches. Clin Pharmacol Ther 2016; 101:65-80. [DOI: 10.1002/cpt.552] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 09/30/2016] [Revised: 10/31/2016] [Accepted: 10/31/2016] [Indexed: 12/12/2022]
Affiliation(s)
- S-A Brown
  - Department of Cardiovascular Diseases, Mayo Clinic, Rochester, Minnesota, USA
- L Nhola
  - Department of Cardiovascular Diseases, Mayo Clinic, Rochester, Minnesota, USA
- J Herrmann
  - Department of Cardiovascular Diseases, Mayo Clinic, Rochester, Minnesota, USA
7
Bora A, Avram S, Ciucanu I, Raica M, Avram S. Predictive Models for Fast and Effective Profiling of Kinase Inhibitors. J Chem Inf Model 2016; 56:895-905. [DOI: 10.1021/acs.jcim.5b00646] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 12/22/2022]
Affiliation(s)
- Alina Bora
  - Department of Chemistry, Faculty of Chemistry-Biology-Geography, West University of Timisoara, 16 Pestalozzi Str., 300115, Timisoara, Romania
  - Department of Computational Chemistry, Institute of Chemistry, Timisoara of Romanian Academy, 24 Mihai Viteazu Avenue, Timisoara, 300223, Romania
- Sorin Avram
  - Department of Computational Chemistry, Institute of Chemistry, Timisoara of Romanian Academy, 24 Mihai Viteazu Avenue, Timisoara, 300223, Romania
- Ionel Ciucanu
  - Department of Chemistry, Faculty of Chemistry-Biology-Geography, West University of Timisoara, 16 Pestalozzi Str., 300115, Timisoara, Romania
8
Abstract
This chapter presents an introduction to data mining with machine learning. It gives an overview of various types of machine learning, along with some examples. It explains how to download, install, and run the WEKA data mining toolkit on a simple data set, then proceeds to explain how one might approach a bioinformatics problem. Finally, it includes a brief summary of machine learning algorithms for other types of data mining problems, and provides suggestions about where to find additional information.
9
Wang J, Zuo Y, Liu L, Man Y, Tadesse MG, Ressom HW. Identification of functional modules by integration of multiple data sources using a Bayesian network classifier. Circ Cardiovasc Genet 2015; 7:206-17. [PMID: 24736851 DOI: 10.1161/circgenetics.113.000087] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/15/2022]
Abstract
BACKGROUND: Prediction of functional modules is indispensable for detecting protein deregulation in human complex diseases such as cancer. Bayesian network is one of the most commonly used models to integrate heterogeneous data from multiple sources such as protein domain, interactome, functional annotation, genome-wide gene expression, and the literature.
METHODS AND RESULTS: In this article, we present a Bayesian network classifier that is customized to (1) increase the ability to integrate diverse information from different sources, (2) effectively predict protein-protein interactions, (3) infer aberrant networks with scale-free and small-world properties, and (4) group molecules into functional modules or pathways based on the primary function and biological features. Application of this model in discovering protein biomarkers of hepatocellular carcinoma leads to the identification of functional modules that provide insights into the mechanism of the development and progression of hepatocellular carcinoma. These functional modules include cell cycle deregulation, increased angiogenesis (eg, vascular endothelial growth factor, blood vessel morphogenesis), oxidative metabolic alterations, and aberrant activation of signaling pathways involved in cellular proliferation, survival, and differentiation.
CONCLUSIONS: The discoveries and conclusions derived from our customized Bayesian network classifier are consistent with previously published results. The proposed approach for determining Bayesian network structure facilitates the integration of heterogeneous data from multiple sources to elucidate the mechanisms of complex diseases.
Affiliation(s)
- Jinlian Wang
  - Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC
10
Sugaya N. Ligand efficiency-based support vector regression models for predicting bioactivities of ligands to drug target proteins. J Chem Inf Model 2014; 54:2751-63. [PMID: 25220713 DOI: 10.1021/ci5003262] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 02/03/2023]
Abstract
The concept of ligand efficiency (LE) indices is widely accepted throughout the drug design community and is frequently used in a retrospective manner in the process of drug development. For example, LE indices are used to investigate LE optimization processes of already-approved drugs and to re-evaluate hit compounds obtained from structure-based virtual screening methods and/or high-throughput experimental assays. However, LE indices could also be applied in a prospective manner to explore drug candidates. Here, we describe the construction of machine learning-based regression models in which LE indices are adopted as an end point and show that LE-based regression models can outperform regression models based on pIC50 values. In addition to pIC50 values traditionally used in machine learning studies based on chemogenomics data, three representative LE indices (ligand lipophilicity efficiency (LLE), binding efficiency index (BEI), and surface efficiency index (SEI)) were adopted, then used to create four types of training data. We constructed regression models by applying a support vector regression (SVR) method to the training data. In cross-validation tests of the SVR models, the LE-based SVR models showed higher correlations between the observed and predicted values than the pIC50-based models. Application tests to new data displayed that, generally, the predictive performance of SVR models follows the order SEI > BEI > LLE > pIC50. Close examination of the distributions of the activity values (pIC50, LLE, BEI, and SEI) in the training and validation data implied that the performance order of the SVR models may be ascribed to the much higher diversity of the LE-based training and validation data. In the application tests, the LE-based SVR models can offer better predictive performance of compound-protein pairs with a wider range of ligand potencies than the pIC50-based models. This finding strongly suggests that LE-based SVR models are better than pIC50-based models at predicting bioactivities of compounds that could exhibit a much higher (or lower) potency.
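The four end points compared in this abstract can be illustrated with a short calculation. The sketch below assumes the commonly cited definitions (BEI as pIC50 per kDa of molecular weight, SEI as pIC50 per 100 Å² of polar surface area, LLE as pIC50 minus logP); the abstract does not state the formulas, so the paper's exact definitions may differ. The function names and the example compound are hypothetical.

```python
import math


def pic50_from_nm(ic50_nm):
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(x * 1e-9) = 9 - log10(x).
    return 9.0 - math.log10(ic50_nm)


def ligand_efficiency_indices(ic50_nm, mol_weight_da, polar_surface_area_a2, log_p):
    """Return pIC50 and three ligand efficiency indices (assumed definitions)."""
    p = pic50_from_nm(ic50_nm)
    return {
        "pIC50": p,
        # Binding efficiency index: potency per kDa of molecular weight.
        "BEI": p / (mol_weight_da / 1000.0),
        # Surface efficiency index: potency per 100 square angstroms of polar surface area.
        "SEI": p / (polar_surface_area_a2 / 100.0),
        # Ligand lipophilicity efficiency: potency corrected for lipophilicity.
        "LLE": p - log_p,
    }


# Hypothetical compound: IC50 = 10 nM, MW = 400 Da, PSA = 80 square angstroms, logP = 3.
example = ligand_efficiency_indices(
    ic50_nm=10.0, mol_weight_da=400.0, polar_surface_area_a2=80.0, log_p=3.0
)
```

Because each index divides or offsets potency by a different molecular property, two compounds with the same pIC50 can receive different LE values, which is consistent with the abstract's remark that LE-based training data are more diverse than pIC50-based data.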
Affiliation(s)
- Nobuyoshi Sugaya
  - Drug Discovery Department, Research & Development Division, PharmaDesign, Inc., Hatchobori 2-19-8, Chuo-ku, Tokyo 104-0032, Japan
11
Sugaya N. Training based on ligand efficiency improves prediction of bioactivities of ligands and drug target proteins in a machine learning approach. J Chem Inf Model 2013; 53:2525-37. [PMID: 24020509 DOI: 10.1021/ci400240u] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/28/2022]
Abstract
Machine learning methods based on ligand-protein interaction data in bioactivity databases are one of the current strategies for efficiently finding novel lead compounds as the first step in the drug discovery process. Although previous machine learning studies have succeeded in predicting novel ligand-protein interactions with high performance, all of the previous studies to date have been heavily dependent on the simple use of raw bioactivity data of ligand potencies measured by IC50, EC50, K(i), and K(d) deposited in databases. ChEMBL provides us with a unique opportunity to investigate whether a machine-learning-based classifier created by reflecting ligand efficiency other than the IC50, EC50, K(i), and K(d) values can also offer high predictive performance. Here we report that classifiers created from training data based on ligand efficiency show higher performance than those from data based on IC50 or K(i) values. Utilizing GPCRSARfari and KinaseSARfari databases in ChEMBL, we created IC50- or K(i)-based training data and binding efficiency index (BEI) based training data then constructed classifiers using support vector machines (SVMs). The SVM classifiers from the BEI-based training data showed slightly higher area under curve (AUC), accuracy, sensitivity, and specificity in the cross-validation tests. Application of the classifiers to the validation data demonstrated that the AUCs and specificities of the BEI-based classifiers dramatically increased in comparison with the IC50- or K(i)-based classifiers. The improvement of the predictive power by the BEI-based classifiers can be attributed to (i) the more separated distributions of positives and negatives, (ii) the higher diversity of negatives in the BEI-based training data in a feature space of SVMs, and (iii) a more balanced number of positives and negatives in the BEI-based training data. These results strongly suggest that training data based on ligand efficiency as well as data based on classical IC50, EC50, K(d), and K(i) values are important when creating a classifier using a machine learning approach based on bioactivity data.
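The AUC, sensitivity and specificity reported for these classifiers can be computed from classifier scores as follows. This is a minimal sketch with made-up scores, assuming the rank-based (Mann-Whitney) formulation of AUC; it is not taken from the paper, and the function names are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Equals the probability that a random positive is scored above a random
    negative, with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy scores (illustrative only): two actives and two inactives.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.45)
```

AUC is threshold-independent, whereas sensitivity and specificity depend on the chosen cutoff, so the measures can move separately when the balance of positives and negatives in the training data changes, as point (iii) of the abstract suggests.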
Affiliation(s)
- Nobuyoshi Sugaya
  - Drug Discovery Department, Research & Development Division, PharmaDesign, Inc., Hatchobori 2-19-8, Chuo-ku, Tokyo 104-0032, Japan