1
Manen-Freixa L, Antolin AA. Polypharmacology prediction: the long road toward comprehensively anticipating small-molecule selectivity to de-risk drug discovery. Expert Opin Drug Discov 2024:1-27. [PMID: 39004919 DOI: 10.1080/17460441.2024.2376643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 07/02/2024] [Indexed: 07/16/2024]
Abstract
INTRODUCTION: Small molecules often bind to multiple targets, a behavior termed polypharmacology. Anticipating polypharmacology is essential for drug discovery since unknown off-targets can modulate safety and efficacy, profoundly affecting drug discovery success. Unfortunately, experimental methods to assess selectivity present significant limitations and drugs still fail in the clinic due to unanticipated off-targets. Computational methods are a cost-effective, complementary approach to predict polypharmacology. AREAS COVERED: This review aims to provide a comprehensive overview of the state of polypharmacology prediction and discuss its strengths and limitations, covering both classical cheminformatics methods and bioinformatic approaches. The authors review available data sources, paying close attention to their different coverage. The authors then discuss major algorithms grouped by the types of data that they exploit using selected examples. EXPERT OPINION: Polypharmacology prediction has made impressive progress over the last decades and contributed to identifying many off-targets. However, data incompleteness currently prevents most approaches from comprehensively predicting selectivity. Moreover, our limited agreement on model assessment challenges the identification of the best algorithms, which at present show modest performance in prospective real-world applications. Despite these limitations, the exponential increase of multidisciplinary Big Data and AI holds much potential to improve polypharmacology prediction and de-risk drug discovery.
Affiliation(s)
- Leticia Manen-Freixa
  - Oncobell Division, Bellvitge Biomedical Research Institute (IDIBELL) and ProCURE Department, Catalan Institute of Oncology (ICO), Barcelona, Spain
- Albert A Antolin
  - Oncobell Division, Bellvitge Biomedical Research Institute (IDIBELL) and ProCURE Department, Catalan Institute of Oncology (ICO), Barcelona, Spain
  - Center for Cancer Drug Discovery, The Division of Cancer Therapeutics, The Institute of Cancer Research, London, UK
2
An Y, Lim J, Glavatskikh M, Wang X, Norris-Drouin J, Hardy PB, Leisner TM, Pearce KH, Kireev D. In silico fragment-based discovery of CIB1-directed anti-tumor agents by FRASE-bot. Nat Commun 2024; 15:5564. [PMID: 38956119 PMCID: PMC11219766 DOI: 10.1038/s41467-024-49892-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2023] [Accepted: 06/19/2024] [Indexed: 07/04/2024] Open
Abstract
Chemical probes are an indispensable tool for translating biological discoveries into new therapies, though they are increasingly difficult to identify since novel therapeutic targets are often hard-to-drug proteins. We introduce the FRASE-based hit-finding robot (FRASE-bot) to expedite drug discovery for unconventional therapeutic targets. FRASE-bot mines available 3D structures of ligand-protein complexes to create a database of FRAgments in Structural Environments (FRASE). The FRASE database can be screened to identify structural environments similar to those in the target protein and seed the target structure with relevant ligand fragments. A neural network model is used to retain fragments with the highest likelihood of being native binders. The seeded fragments then inform ultra-large-scale virtual screening of commercially available compounds. We apply FRASE-bot to identify ligands for Calcium and Integrin Binding protein 1 (CIB1), a promising drug target implicated in triple negative breast cancer. FRASE-based virtual screening identifies a small-molecule CIB1 ligand (with binding confirmed in a TR-FRET assay) showing specific cell-killing activity in CIB1-dependent cancer cells, but not in CIB1-depletion-insensitive cells.
Affiliation(s)
- Yi An
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Jiwoong Lim
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Marta Glavatskikh
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Xiaowen Wang
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
  - Chemistry department, University of Missouri, Columbia, Columbia, MO, 65211, USA
- Jacqueline Norris-Drouin
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- P Brian Hardy
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Tina M Leisner
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Kenneth H Pearce
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
- Dmitri Kireev
  - Center for Integrative Chemical Biology and Drug Discovery, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27513, USA
  - Chemistry department, University of Missouri, Columbia, Columbia, MO, 65211, USA
3
Schuh M, Boldini D, Sieber SA. Synergizing Chemical Structures and Bioassay Descriptions for Enhanced Molecular Property Prediction in Drug Discovery. J Chem Inf Model 2024; 64:4640-4650. [PMID: 38836773 PMCID: PMC11200265 DOI: 10.1021/acs.jcim.4c00765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Revised: 05/23/2024] [Accepted: 05/23/2024] [Indexed: 06/06/2024]
Abstract
The precise prediction of molecular properties can greatly accelerate the development of new drugs. However, in silico molecular property prediction approaches have been limited so far to assays for which large amounts of data are available. In this study, we develop a new computational approach leveraging both the textual description of the assay of interest and the chemical structure of target compounds. By combining these two sources of information via self-supervised learning, our tool can provide accurate predictions for assays where no measurements are available. Remarkably, our approach achieves state-of-the-art performance on the FS-Mol benchmark for zero-shot prediction, outperforming a wide variety of deep learning approaches. Additionally, we demonstrate how our tool can be used for tailoring screening libraries for the assay of interest, showing promising performance in a retrospective case study on a high-throughput screening campaign. By accelerating the early identification of active molecules in drug discovery and development, this method has the potential to streamline the identification of novel therapeutics.
Affiliation(s)
- Maximilian G. Schuh
  - TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, 85748 Garching bei München, Germany
- Davide Boldini
  - TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, 85748 Garching bei München, Germany
- Stephan A. Sieber
  - TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, 85748 Garching bei München, Germany
4
Oliveira PF, Guedes RC, Falcao AO. Inferring molecular inhibition potency with AlphaFold predicted structures. Sci Rep 2024; 14:8252. [PMID: 38589418 PMCID: PMC11001998 DOI: 10.1038/s41598-024-58394-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 03/28/2024] [Indexed: 04/10/2024] Open
Abstract
Even though in silico drug ligand-based methods have been successful in predicting interactions with known target proteins, they struggle with new, unassessed targets. To address this challenge, we propose an approach that integrates structural data from AlphaFold 2 predicted protein structures into machine learning models. Our method extracts 3D structural protein fingerprints and combines them with ligand structural data to train a single machine learning model. This model captures the relationship between ligand properties and the unique structural features of various target proteins, enabling predictions for never-before-tested molecules and protein targets. To assess our model, we used a dataset of 144 human G protein-coupled receptors (GPCRs) with over 140,000 measured inhibition constant (Ki) values. Results strongly suggest that our approach performs as well as state-of-the-art ligand-based methods. In a second modeling approach that used 129 targets for training and a separate test set of 15 different protein targets, our model correctly predicted interactions for 73% of targets, with explained variances exceeding 0.50 in 22% of cases. Our findings further verified that the usage of experimentally determined protein structures produced models that were statistically indistinguishable from those built on the AlphaFold synthetic structures. This study presents a proteo-chemometric drug screening approach that uses a simple and scalable method for extracting protein structural information for use in machine learning models capable of predicting protein-molecule interactions even for orphan targets.
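The second evaluation setting in this abstract (training on some targets and testing on entirely different ones) can be illustrated with a grouped split. The following is a minimal sketch only, not the authors' code: the function name `split_by_target`, the `(target_id, ligand_id, value)` record layout, and the toy identifiers are all hypothetical.

```python
import random


def split_by_target(records, n_test_targets, seed=0):
    """Hold out whole targets so that test proteins never appear in training.

    `records` is a list of (target_id, ligand_id, value) tuples; this layout
    is an assumption made for the sketch.
    """
    targets = sorted({target for target, _, _ in records})
    rng = random.Random(seed)
    test_targets = set(rng.sample(targets, n_test_targets))
    train = [r for r in records if r[0] not in test_targets]
    test = [r for r in records if r[0] in test_targets]
    return train, test, test_targets


# Toy data with hypothetical target and ligand identifiers.
records = [
    ("T1", "L1", 6.2), ("T1", "L2", 7.1),
    ("T2", "L1", 5.4), ("T2", "L3", 8.0),
    ("T3", "L2", 6.9), ("T3", "L4", 5.1),
]
train, test, held_out = split_by_target(records, n_test_targets=1)
```

Splitting by target rather than by individual measurement is what makes the evaluation an "orphan target" test: a ligand may appear on both sides, but no protein does.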
Affiliation(s)
- Pedro F Oliveira
  - Lasige, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- Rita C Guedes
  - Research Institute for Medicines (iMed.ULisboa), Faculdade de Farmácia, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003 Lisboa, Portugal
- Andre O Falcao
  - Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
5
Viljanen M, Minnema J, Wassenaar PNH, Rorije E, Peijnenburg W. What is the ecotoxicity of a given chemical for a given aquatic species? Predicting interactions between species and chemicals using recommender system techniques. SAR and QSAR in Environmental Research 2023; 34:765-788. [PMID: 37670728 DOI: 10.1080/1062936x.2023.2254225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Accepted: 08/27/2023] [Indexed: 09/07/2023]
Abstract
Ecotoxicological safety assessment of chemicals requires toxicity data on multiple species, despite the general desire to minimize animal testing. Predictive models, specifically machine learning (ML) methods, are one of the tools capable of solving this apparent contradiction as they allow toxicity patterns to be generalized across chemicals and species. However, despite the availability of large public toxicity datasets, the data is highly sparse, complicating model development. The aim of this study is to provide insights into how ML can predict toxicity using a large but sparse dataset. We developed models to predict LC50-values, based on experimental LC50-data covering 2431 organic chemicals and 1506 aquatic species from the ECOTOX-database. Several well-known ML techniques were evaluated and a new ML model was developed, inspired by recommender systems. This new model involves a simple linear model that learns low-rank interactions between species and chemicals using factorization machines. We evaluated the predictive performances of the developed models based on two validation settings: 1) predicting unseen chemical-species pairs, and 2) predicting unseen chemicals. The results of this study show that ML models can accurately predict LC50-values in both validation settings. Moreover, we show that the novel factorization machine approach can match well-tuned, complex ML approaches.
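The low-rank species-chemical interaction mentioned in this abstract can be illustrated with the prediction rule of a second-order factorization machine. The sketch below assumes the only input features are one-hot indicators for the species and the chemical, in which case the prediction reduces to a global bias, two linear terms, and one dot product of latent vectors. It is not the authors' model; the function name, parameter names, identifiers, and all numeric values are hypothetical and not fitted to any data.

```python
def fm_predict(species, chemical, bias, w_species, w_chemical, v_species, v_chemical):
    """Factorization-machine prediction for one (species, chemical) pair.

    Assumes one-hot species and chemical indicators only, so the pairwise
    term is the dot product of the two latent vectors.
    """
    interaction = sum(
        a * b for a, b in zip(v_species[species], v_chemical[chemical])
    )
    return bias + w_species[species] + w_chemical[chemical] + interaction


# Hypothetical, hand-picked parameters (not learned from ECOTOX data).
bias = 1.0
w_species = {"species_a": 0.5}
w_chemical = {"chem_a": -0.25}
v_species = {"species_a": [1.0, 2.0]}
v_chemical = {"chem_a": [0.5, -0.5]}

pred = fm_predict("species_a", "chem_a", bias, w_species, w_chemical,
                  v_species, v_chemical)
print(pred)  # 1.0 + 0.5 - 0.25 + (0.5 - 1.0) = 0.75
```

Because each species and each chemical carries its own latent vector, a pair that was never measured together still receives a prediction, which is how this family of models copes with sparse species-chemical matrices.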
Affiliation(s)
- M Viljanen
  - Department of Statistics, Data Science and Modelling, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- J Minnema
  - Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- P N H Wassenaar
  - Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- E Rorije
  - Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- W Peijnenburg
  - Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
  - Institute of Environmental Sciences (CML), Leiden University, Leiden, The Netherlands
6
Kanev GK, Zhang Y, Kooistra AJ, Bender A, Leurs R, Bailey D, Würdinger T, de Graaf C, de Esch IJP, Westerman BA. Predicting the target landscape of kinase inhibitors using 3D convolutional neural networks. PLoS Comput Biol 2023; 19:e1011301. [PMID: 37669273 PMCID: PMC10508635 DOI: 10.1371/journal.pcbi.1011301] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/15/2022] [Revised: 09/19/2023] [Accepted: 06/25/2023] [Indexed: 09/07/2023] Open
Abstract
Many therapies in clinical trials are based on single drug-single target relationships. To further extend this concept to multi-target approaches using multi-targeted drugs, we developed a machine learning pipeline to unravel the target landscape of kinase inhibitors. This pipeline, which we call 3D-KINEssence, uses a new type of protein fingerprints (3D FP) based on the structure of kinases generated through a 3D convolutional neural network (3D-CNN). These 3D-CNN kinase fingerprints were matched to molecular Morgan fingerprints to predict the targets of each respective kinase inhibitor based on available bioactivity data. The performance of the pipeline was evaluated on two test sets: a sparse drug-target set where each drug is matched in most cases to a single target and also on a densely-covered drug-target set where each drug is matched to most if not all targets. This latter set is more challenging to train, given its non-exclusive character. Our model's root-mean-square error (RMSE) based on the two datasets was 0.68 and 0.8, respectively. These results indicate that 3D FP can predict the target landscape of kinase inhibitors at around 0.8 log units of bioactivity. Our strategy can be utilized in proteochemometric or chemogenomic workflows by consolidating the target landscape of kinase inhibitors.
Affiliation(s)
- Georgi K. Kanev
  - Division of Medicinal Chemistry, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  - Department of Neurosurgery, Amsterdam University Medical Centers, Cancer Center Amsterdam, Brain Tumor Center Amsterdam, Amsterdam, The Netherlands
- Yaran Zhang
  - Department of Neurosurgery, Amsterdam University Medical Centers, Cancer Center Amsterdam, Brain Tumor Center Amsterdam, Amsterdam, The Netherlands
- Albert J. Kooistra
  - Division of Medicinal Chemistry, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  - Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
- Andreas Bender
  - Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Rob Leurs
  - Division of Medicinal Chemistry, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- David Bailey
  - The WINDOW consortium, www.window-consortium.org
  - IOTA Pharmaceuticals Ltd, St Johns Innovation Centre, Cambridge, United Kingdom
- Thomas Würdinger
  - Department of Neurosurgery, Amsterdam University Medical Centers, Cancer Center Amsterdam, Brain Tumor Center Amsterdam, Amsterdam, The Netherlands
  - The WINDOW consortium, www.window-consortium.org
- Chris de Graaf
  - Division of Medicinal Chemistry, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Iwan J. P. de Esch
  - Division of Medicinal Chemistry, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Bart A. Westerman
  - Department of Neurosurgery, Amsterdam University Medical Centers, Cancer Center Amsterdam, Brain Tumor Center Amsterdam, Amsterdam, The Netherlands
  - The WINDOW consortium, www.window-consortium.org
7
D’Souza S, Prema KV, Balaji S, Shah R. Deep Learning-Based Modeling of Drug–Target Interaction Prediction Incorporating Binding Site Information of Proteins. Interdisciplinary Sciences: Computational Life Sciences 2023; 15:306-315. [PMID: 36967455 PMCID: PMC10148762 DOI: 10.1007/s12539-023-00557-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 02/22/2023] [Accepted: 02/22/2023] [Indexed: 03/29/2023]
Abstract
Chemogenomics, also known as proteochemometrics, covers various computational methods for predicting interactions between related drugs and targets on large-scale data. Chemogenomics is used in the early stages of drug discovery to predict the off-target effects of proteins against therapeutic candidates. This study aims to predict unknown ligand–target interactions using one-dimensional SMILES as inputs for ligands and binding site residues for proteins in a computationally efficient manner. We first formulate a deep learning CNN model using one-dimensional SMILES for drugs and motif-rich binding pocket subsequences of proteins as inputs. We evaluate and compare the proposed deep learning model trained on expert-based features against shallow feature-based machine learning methods. The proposed method achieved better or similar performance on the MSE and AUPR metrics than the shallow methods. Additionally, we show that our deep learning model, DeepPS, is computationally more efficient than the deep learning model trained on full-length raw sequences of proteins. We conclude that a beneficial research approach would be to integrate structural information of proteins for modeling drug-target interaction prediction of large datasets for more interpretability, high throughput, and broad applicability.
Affiliation(s)
- Sofia D’Souza
  - Department of Computer Science and Engineering, Manipal Academy of Higher Education, Manipal, India
- K. V. Prema
  - Department of Computer Science and Engineering, Manipal Academy of Higher Education, Bengaluru, India
- S. Balaji
  - Department of Biotechnology, Manipal Academy of Higher Education, Manipal, India
- Ronak Shah
  - Department of Computer Science and Engineering, Manipal Academy of Higher Education, Manipal, India
8
Bongers BJ, Sijben HJ, Hartog PBR, Tarnovskiy A, IJzerman AP, Heitman LH, van Westen GJP. Proteochemometric Modeling Identifies Chemically Diverse Norepinephrine Transporter Inhibitors. J Chem Inf Model 2023; 63:1745-1755. [PMID: 36926886 PMCID: PMC10052348 DOI: 10.1021/acs.jcim.2c01645] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 03/18/2023]
Abstract
Solute carriers (SLCs) are relatively underexplored compared to other prominent protein families such as kinases and G protein-coupled receptors. However, proteins from the SLC family play an essential role in various diseases. One such SLC is the high-affinity norepinephrine transporter (NET/SLC6A2). In contrast to most other SLCs, the NET has been relatively well studied. However, the chemical space of known ligands has a low chemical diversity, making it challenging to identify chemically novel ligands. Here, a computational screening pipeline was developed to find new NET inhibitors. The approach increases the chemical space to model for NETs using the chemical space of related proteins that were selected utilizing similarity networks. Prior proteochemometric models added data from related proteins, but here we use a data-driven approach to select the optimal proteins to add to the modeled data set. After optimizing the data set, the proteochemometric model was optimized using stepwise feature selection. The final model was created using a two-step approach combining several proteochemometric machine learning models through stacking. This model was applied to the extensive virtual compound database of Enamine, from which the top predicted 22,000 of the 600 million virtual compounds were clustered to end up with 46 chemically diverse candidates. A subselection of 32 candidates was synthesized and subsequently tested using an impedance-based assay. There were five hit compounds identified (hit rate 16%) with sub-micromolar inhibitory potencies toward NET, which are promising for follow-up experimental research. This study demonstrates a data-driven approach to diversify known chemical space to identify novel ligands and is to our knowledge the first to select this set based on the sequence similarity of related targets.
Affiliation(s)
- Brandon J Bongers
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands
- Huub J Sijben
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands
- Peter B R Hartog
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands
- Adriaan P IJzerman
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands
- Laura H Heitman
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands
  - Oncode Institute, Jaarbeursplein 6, Utrecht 3521 AL, The Netherlands
- Gerard J P van Westen
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands
9
Naga D, Muster W, Musvasva E, Ecker GF. Off-targetP ML: an open source machine learning framework for off-target panel safety assessment of small molecules. J Cheminform 2022; 14:27. [PMID: 35525988 PMCID: PMC9077900 DOI: 10.1186/s13321-022-00603-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/29/2021] [Accepted: 03/26/2022] [Indexed: 11/10/2022] Open
Abstract
Unpredicted drug safety issues constitute the majority of failures in the pharmaceutical industry according to several studies. Some of these preclinical safety issues could be attributed to the non-selective binding of compounds to targets other than their intended therapeutic target, causing undesired adverse events. Consequently, pharmaceutical companies routinely run in-vitro safety screens to detect off-target activities prior to preclinical and clinical studies. Here we present an open source machine learning framework aimed at predicting the activities of our in-house panel of 50 off-targets for ~4000 compounds, directly from their structure. This framework is intended to guide chemists in the drug design process prior to synthesis and to accelerate drug discovery. We also present a set of ML approaches that require minimum programming experience for deployment. The workflow incorporates different ML approaches such as deep learning and automated machine learning. It also accommodates common issues faced in bioactivity predictions, such as data imbalance, inter-target duplicated measurements and duplicated public compound identifiers. Throughout the workflow development, we explore and compare the capability of Neural Networks and AutoML in constructing prediction models for fifty off-targets of different protein classes, different dataset sizes, and high class imbalance. Outcomes from different methods are compared in terms of efficiency and efficacy. The most important challenges and factors impacting model construction and performance, in addition to suggestions on how to overcome such challenges, are also discussed.
Affiliation(s)
- Doha Naga
  - Roche Pharma Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
  - Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria
- Wolfgang Muster
  - Roche Pharma Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Eunice Musvasva
  - Roche Pharma Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Gerhard F Ecker
  - Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria
10
Yang Z, Zhong W, Zhao L, Yu-Chian Chen C. MGraphDTA: deep multiscale graph neural network for explainable drug-target binding affinity prediction. Chem Sci 2022; 13:816-833. [PMID: 35173947 PMCID: PMC8768884 DOI: 10.1039/d1sc05180f] [Citation(s) in RCA: 74] [Impact Index Per Article: 37.0] [Received: 09/18/2021] [Accepted: 12/17/2021] [Indexed: 12/22/2022] Open
Abstract
Predicting drug-target affinity (DTA) is beneficial for accelerating drug discovery. Graph neural networks (GNNs) have been widely used in DTA prediction. However, existing shallow GNNs are insufficient to capture the global structure of compounds. Besides, the interpretability of graph-based DTA models relies heavily on the graph attention mechanism, which cannot reveal the global relationship between each atom of a molecule. In this study, we proposed a deep multiscale graph neural network based on chemical intuition for DTA prediction (MGraphDTA). We introduced a dense connection into the GNN and built a super-deep GNN with 27 graph convolutional layers to capture the local and global structure of the compound simultaneously. We also developed a novel visual explanation method, gradient-weighted affinity activation mapping (Grad-AAM), to analyze a deep learning model from the chemical perspective. We evaluated our approach using seven benchmark datasets and compared the proposed method to the state-of-the-art deep learning (DL) models. MGraphDTA outperforms other DL-based approaches significantly on various datasets. Moreover, we show that Grad-AAM creates explanations that are consistent with pharmacologists, which may help us gain chemical insights directly from data beyond human perception. These advantages demonstrate that the proposed method improves the generalization and interpretation capability of DTA prediction modeling.
Affiliation(s)
- Ziduo Yang
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen 510275 China +862039332153
- Weihe Zhong
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen 510275 China +862039332153
- Lu Zhao
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen 510275 China +862039332153
  - Department of Clinical Laboratory, The Sixth Affiliated Hospital, Sun Yat-sen University Guangzhou 510655 China
- Calvin Yu-Chian Chen
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen 510275 China +862039332153
  - Department of Medical Research, China Medical University Hospital Taichung 40447 Taiwan
  - Department of Bioinformatics and Medical Engineering, Asia University Taichung 41354 Taiwan
11
Unsupervised Representation Learning for Proteochemometric Modeling. Int J Mol Sci 2021; 22:ijms222312882. [PMID: 34884688 PMCID: PMC8657702 DOI: 10.3390/ijms222312882] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/15/2021] [Revised: 11/25/2021] [Accepted: 11/26/2021] [Indexed: 11/18/2022] Open
Abstract
In silico protein–ligand binding prediction is an ongoing area of research in computational chemistry and machine learning based drug discovery, as an accurate predictive model could greatly reduce the time and resources necessary for the detection and prioritization of possible drug candidates. Proteochemometric modeling (PCM) attempts to create an accurate model of the protein–ligand interaction space by combining explicit protein and ligand descriptors. This requires the creation of information-rich, uniform and computer interpretable representations of proteins and ligands. Previous studies in PCM modeling rely on pre-defined, handcrafted feature extraction methods, and many methods use protein descriptors that require alignment or are otherwise specific to a particular group of related proteins. However, recent advances in representation learning have shown that unsupervised machine learning can be used to generate embeddings that outperform complex, human-engineered representations. Several different embedding methods for proteins and molecules have been developed based on various language-modeling methods. Here, we demonstrate the utility of these unsupervised representations and compare three protein embeddings and two compound embeddings in a fair manner. We evaluate performance on various splits of a benchmark dataset, as well as on an internal dataset of protein–ligand binding activities and find that unsupervised-learned representations significantly outperform handcrafted representations.
12
Thomas M, Boardman A, Garcia-Ortegon M, Yang H, de Graaf C, Bender A. Applications of Artificial Intelligence in Drug Design: Opportunities and Challenges. Methods in Molecular Biology (Clifton, N.J.) 2021; 2390:1-59. [PMID: 34731463 DOI: 10.1007/978-1-0716-1787-8_1] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 10/19/2022]
Abstract
Artificial intelligence (AI) has undergone rapid development in recent years and has been successfully applied to real-world problems such as drug design. In this chapter, we review recent applications of AI to problems in drug design including virtual screening, computer-aided synthesis planning, and de novo molecule generation, with a focus on the limitations of the application of AI therein and opportunities for improvement. Furthermore, we discuss the broader challenges imposed by AI in translating theoretical practice to real-world drug design, including quantifying prediction uncertainty and explaining model behavior.
Affiliation(s)
- Morgan Thomas
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Andrew Boardman
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Miguel Garcia-Ortegon
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK.,Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, UK
| | - Hongbin Yang
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | | | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK.
|
13
|
Ye Q, Hsieh CY, Yang Z, Kang Y, Chen J, Cao D, He S, Hou T. A unified drug-target interaction prediction framework based on knowledge graph and recommendation system. Nat Commun 2021; 12:6775. [PMID: 34811351 PMCID: PMC8635420 DOI: 10.1038/s41467-021-27137-3] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Received: 06/29/2021] [Accepted: 11/05/2021] [Indexed: 02/06/2023]
Abstract
Prediction of drug-target interactions (DTI) plays a vital role in drug development in various areas, such as virtual screening, drug repurposing and identification of potential drug side effects. Although extensive efforts have been invested in perfecting DTI prediction, existing methods still suffer from the high sparsity of DTI datasets and the cold start problem. Here, we develop KGE_NFM, a unified framework for DTI prediction that combines a knowledge graph (KG) and a recommendation system. This framework first learns a low-dimensional representation for various entities in the KG, and then integrates the multimodal information via a neural factorization machine (NFM). KGE_NFM is evaluated under three realistic scenarios, and achieves accurate and robust predictions on four benchmark datasets, especially in the scenario of the cold start for proteins. Our results indicate that KGE_NFM provides valuable insight to integrate KG and recommendation system-based techniques into a unified framework for novel DTI discovery.
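Editorial illustration: the "cold start for proteins" scenario named in this abstract can be pictured as a data split in which every protein in the test set is absent from training. The Python sketch below is a generic illustration under that assumption, not the KGE_NFM evaluation protocol; the function name `cold_start_protein_split`, its `test_fraction` and `seed` parameters, and the drug and protein identifiers are all hypothetical.

```python
import random


def cold_start_protein_split(pairs, test_fraction=0.2, seed=0):
    """Split (drug, protein) pairs so that test proteins never occur in training.

    Hypothetical helper for illustration only.
    """
    # Collect the distinct proteins and shuffle them reproducibly.
    proteins = sorted({protein for _, protein in pairs})
    rng = random.Random(seed)
    rng.shuffle(proteins)

    # Hold out a fraction of the proteins (at least one) for testing.
    n_test = max(1, int(len(proteins) * test_fraction))
    held_out = set(proteins[:n_test])

    # Every pair follows its protein into either the train or the test set.
    train = [(d, p) for d, p in pairs if p not in held_out]
    test_pairs = [(d, p) for d, p in pairs if p in held_out]
    return train, test_pairs


# Made-up interaction pairs: (drug identifier, protein identifier).
pairs = [
    ("D1", "P1"), ("D2", "P1"), ("D1", "P2"), ("D3", "P3"),
    ("D2", "P4"), ("D4", "P4"), ("D3", "P5"), ("D5", "P5"),
]
train, test_pairs = cold_start_protein_split(pairs, test_fraction=0.2, seed=0)
print(len(train), "training pairs;", len(test_pairs), "test pairs")
```

Splitting by protein rather than by pair is what makes the scenario a cold start: a model scored on `test_pairs` has seen no interaction data for those proteins.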
Affiliation(s)
- Qing Ye
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.,College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China.,State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, China
| | - Chang-Yu Hsieh
- Tencent Quantum Laboratory, Shenzhen, 518057, Guangdong, China
| | - Ziyi Yang
- Tencent Quantum Laboratory, Shenzhen, 518057, Guangdong, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Jiming Chen
- College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, China
| | - Shibo He
- College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.,State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, China
|
14
|
Khan MKA, Akhtar S. Novel drug design and bioinformatics: an introduction. Phys Sci Rev 2021. [DOI: 10.1515/psr-2018-0158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In the current era of high-throughput technology, enormous amounts of biological data are generated daily by sequencing projects, and a staggering number of biological targets have been deciphered as a result. The discovery of new chemical entities and bioisosteres of relatively low molecular weight has been gaining momentum in the pharmacopoeia, alongside traditional combinatorial design, in which a chemical structure serves as an initial template for enhancing efficacy, pharmacokinetic, and selectivity properties. Once a compound is identified, it undergoes ADMET filtration to check whether it has toxic or mutagenic properties. A compound with neither toxicity nor mutagenicity is considered a potential lead molecule. Understanding how lead molecules interact with various biological targets is imperative for drug discovery and development. Although drug discovery is a tedious and costly process, taking around 10–15 years and costing around $4 billion, the combined approaches of bioinformatics and computational biology, namely structure-based drug design (SBDD) and ligand-based drug design (LBDD), which respectively do and do not rely on the availability of the 3D structure of the target biomacromolecule, have made this process easier and more approachable. SBDD encompasses homology modelling, ligand docking, fragment-based drug design and molecular dynamics, while LBDD deals with pharmacophore mapping, QSAR, and similarity search. All the computational methods discussed herein, whether for target identification or novel ligand discovery, continuously evolve and facilitate cost-effective and reliable outcomes in an era of overwhelming data.
Affiliation(s)
- Mohammad Kalim Ahmad Khan
- Department of Bioengineering, Faculty of Engineering , Integral University , Lucknow , Uttar Pradesh , 226026 , India
| | - Salman Akhtar
- Department of Bioengineering, Faculty of Engineering , Integral University , Lucknow , Uttar Pradesh , 226026 , India
|
15
|
Recent Advances in In Silico Target Fishing. Molecules 2021; 26:5124. [PMID: 34500568 PMCID: PMC8433825 DOI: 10.3390/molecules26175124] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/30/2021] [Revised: 08/14/2021] [Accepted: 08/18/2021] [Indexed: 12/24/2022]
Abstract
In silico target fishing, whose aim is to identify possible protein targets for a query molecule, is an emerging approach used in drug discovery due to its wide variety of applications. This strategy allows the clarification of the mechanism of action and biological activities of compounds whose target is still unknown. Moreover, target fishing can be employed for the identification of off-targets of drug candidates, thus recognizing and preventing their possible adverse effects. For these reasons, target fishing has increasingly become a key approach for polypharmacology, drug repurposing, and the identification of new drug targets. While experimental target fishing can be lengthy and difficult to implement, due to the plethora of interactions that may occur for a single small molecule with different protein targets, an in silico approach can be quicker, less expensive, more efficient for specific protein structures, and thus easier to employ. Moreover, the possibility of using it in combination with docking and virtual screening studies, as well as the increasing number of web-based tools that have recently been developed, makes target fishing a more appealing method for drug discovery. It is especially worth underlining the increasing implementation of machine learning in this field, both as a main target fishing approach and as a further development of already applied strategies. This review reports on the main in silico target fishing strategies, belonging to both ligand-based and receptor-based approaches, developed and applied in recent years, with particular attention to the different web tools freely accessible to the scientific community for performing target fishing studies.
|
16
|
Prajapati R, Park SE, Seong SH, Paudel P, Fauzi FM, Jung HA, Choi JS. Monoamine Oxidase Inhibition by Major Tanshinones from Salvia miltiorrhiza and Selective Muscarinic Acetylcholine M4 Receptor Antagonism by Tanshinone I. Biomolecules 2021; 11:1001. [PMID: 34356625 PMCID: PMC8301926 DOI: 10.3390/biom11071001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/20/2021] [Revised: 06/30/2021] [Accepted: 07/05/2021] [Indexed: 11/23/2022]
Abstract
Monoamine oxidases (MAOs) and muscarinic acetylcholine receptors (mAChRs) are considered important therapeutic targets for Parkinson's disease (PD). Lipophilic tanshinones are major phytoconstituents in the dried roots of Salvia miltiorrhiza that have demonstrated neuroprotective effects against dopaminergic neurotoxins and the inhibition of MAO-A. Since MAO-B inhibition is considered an effective therapeutic strategy for PD, we tested the inhibitory activities of three abundant tanshinone congeners against recombinant human MAO (hMAO) isoenzymes through in vitro experiments. In our study, tanshinone I (1) exhibited the highest potency against hMAO-A, followed by tanshinone IIA and cryptotanshinone, with an IC50 less than 10 µM. They also suppressed hMAO-B activity, with an IC50 below 25 µM. Although tanshinones are known to inhibit hMAO-A, their enzyme inhibition mechanism and binding sites have yet to be investigated. Enzyme kinetics and molecular docking studies have revealed the mode of inhibition and interactions of tanshinones during enzyme inhibition. Proteochemometric modeling predicted mAChRs as possible pharmacological targets of 1, and in vitro functional assays confirmed the selective M4 antagonist nature of 1 (56.1% ± 2.40% inhibition of control agonist response at 100 µM). These findings indicate that 1 is a potential therapeutic molecule for managing the motor dysfunction and depression associated with PD.
Affiliation(s)
- Ritu Prajapati
- Department of Food and Life Science, Pukyong National University, Busan 48513, Korea; (R.P.); (S.E.P.); (S.H.S.); (P.P.)
| | - Se Eun Park
- Department of Food and Life Science, Pukyong National University, Busan 48513, Korea; (R.P.); (S.E.P.); (S.H.S.); (P.P.)
- Department of Biomedical Science, Asan Medical Institute of Convergence Science and Technology, University of Ulsan, Seoul 05505, Korea
| | - Su Hui Seong
- Department of Food and Life Science, Pukyong National University, Busan 48513, Korea; (R.P.); (S.E.P.); (S.H.S.); (P.P.)
- Natural Product Research Division, Honam National Institute of Biological Resource, Mokpo 58762, Korea
| | - Pradeep Paudel
- Department of Food and Life Science, Pukyong National University, Busan 48513, Korea; (R.P.); (S.E.P.); (S.H.S.); (P.P.)
- National Center for Natural Products Research, Research Institute of Pharmaceutical Science, The University of Mississippi, Oxford, MS 38677, USA
| | - Fazlin Mohd Fauzi
- Department of Pharmacology and Chemistry, Faculty of Pharmacy, Universiti Teknologi MARA, Puncak Alam 42300, Malaysia;
| | - Hyun Ah Jung
- Department of Food Science and Human Nutrition, Jeonbuk National University, Jeonju 54896, Korea
| | - Jae Sue Choi
- Department of Food and Life Science, Pukyong National University, Busan 48513, Korea; (R.P.); (S.E.P.); (S.H.S.); (P.P.)
|
17
|
Kimber TB, Chen Y, Volkamer A. Deep Learning in Virtual Screening: Recent Applications and Developments. Int J Mol Sci 2021; 22:4435. [PMID: 33922714 PMCID: PMC8123040 DOI: 10.3390/ijms22094435] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Received: 03/18/2021] [Revised: 04/13/2021] [Accepted: 04/14/2021] [Indexed: 01/03/2023]
Abstract
Drug discovery is a cost and time-intensive process that is often assisted by computational methods, such as virtual screening, to speed up and guide the design of new compounds. For many years, machine learning methods have been successfully applied in the context of computer-aided drug discovery. Recently, thanks to the rise of novel technologies as well as the increasing amount of available chemical and bioactivity data, deep learning has gained a tremendous impact in rational active compound discovery. Herein, recent applications and developments of machine learning, with a focus on deep learning, in virtual screening for active compound design are reviewed. This includes introducing different compound and protein encodings, deep learning techniques as well as frequently used bioactivity and benchmark data sets for model training and testing. Finally, the present state-of-the-art, including the current challenges and emerging problems, are examined and discussed.
Affiliation(s)
| | | | - Andrea Volkamer
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité-Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany; (T.B.K.); (Y.C.)
|
18
|
Abstract
Introduction: Artificial Intelligence (AI) has become a component of our everyday lives, with applications ranging from recommendations on what to buy to the analysis of radiology images. Many of the techniques originally developed for other fields such as language translation and computer vision are now being applied in drug discovery. AI has enabled multiple aspects of drug discovery including the analysis of high content screening data, and the design and synthesis of new molecules. Areas covered: This perspective provides an overview of the application of AI in several areas relevant to drug discovery including property prediction, molecule generation, image analysis, and organic synthesis planning. Expert opinion: While a variety of machine learning methods are now being routinely used to predict biological activity and ADME properties, methods of representing molecules continue to evolve. Molecule generation methods are relatively new and unproven but hold the potential to access new, unexplored areas of chemical space. The application of AI in drug discovery will continue to benefit from dedicated research, as well as AI developments in other fields. With this pairing algorithmic advancements and high-quality data, the impact of AI in drug discovery will continue to grow in the coming years.
Affiliation(s)
| | - Regina Barzilay
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
|
19
|
Liu X, IJzerman AP, van Westen GJP. Computational Approaches for De Novo Drug Design: Past, Present, and Future. Methods Mol Biol 2021; 2190:139-165. [PMID: 32804364 DOI: 10.1007/978-1-0716-0826-5_6] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 12/13/2022]
Abstract
Drug discovery is time- and resource-consuming. Computational approaches applied in de novo drug design therefore play an important role in improving the efficiency and decreasing the costs of developing novel drugs. Over several decades, a variety of methods have been proposed and applied in practice. Traditionally, drug design problems have been treated as combinatorial optimization in discrete chemical space. Hence optimization methods were exploited to search for new drug molecules to meet multiple objectives. With the accumulation of data and the development of machine learning methods, computational drug design methods have gradually shifted to a new paradigm. There has been particular interest in the potential application of deep learning methods to drug design. In this chapter, we will give a brief description of these two different de novo methods, compare their application scopes and discuss their possible development in the future.
Affiliation(s)
- Xuhan Liu
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
| | - Adriaan P IJzerman
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
| | - Gerard J P van Westen
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands.
|
20
|
Mansoldo FRP, Carta F, Angeli A, Cardoso VDS, Supuran CT, Vermelho AB. Chagas Disease: Perspectives on the Past and Present and Challenges in Drug Discovery. Molecules 2020; 25:E5483. [PMID: 33238613 PMCID: PMC7700143 DOI: 10.3390/molecules25225483] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/06/2020] [Revised: 11/19/2020] [Accepted: 11/20/2020] [Indexed: 12/20/2022]
Abstract
Chagas disease still has no effective treatment option for all of its phases despite being discovered more than 100 years ago. The development of commercial drugs has been stagnating since the 1960s, a fact that sheds light on the question of how drug discovery research has progressed and taken advantage of technological advances. Could it be that technological advances have not yet been sufficient to resolve this issue or is there a lack of protocol, validation and standardization of the data generated by different research teams? This work presents an overview of commercial drugs and those that have been evaluated in studies and clinical trials so far. A brief review is made of recent target-based and phenotypic studies based on the search for molecules with anti-Trypanosoma cruzi action. It also discusses how proteochemometric (PCM) modeling and microcrystal electron diffraction (MicroED) can help in the case of the lack of a 3D protein structure; more specifically, Trypanosoma cruzi carbonic anhydrase.
Affiliation(s)
- Felipe Raposo Passos Mansoldo
- BIOINOVAR-Biocatalysis, Bioproducts and Bioenergy, Institute of Microbiology Paulo de Góes, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro 21941-902, Brazil; (F.R.P.M.); (V.d.S.C.)
| | - Fabrizio Carta
- Neurofarba Department, Università degli Studi di Firenze, Sezione di Scienze Farmaceutiche, Via Ugo Schiff 6, 50019 Sesto Fiorentino (Florence), Italy; (F.C.); (A.A.)
| | - Andrea Angeli
- Neurofarba Department, Università degli Studi di Firenze, Sezione di Scienze Farmaceutiche, Via Ugo Schiff 6, 50019 Sesto Fiorentino (Florence), Italy; (F.C.); (A.A.)
- Centre of Advanced Research in Bionanoconjugates and Biopolymers Department, “Petru Poni” Institute of Macromolecular Chemistry, 700487 Iasi, Romania
| | - Veronica da Silva Cardoso
- BIOINOVAR-Biocatalysis, Bioproducts and Bioenergy, Institute of Microbiology Paulo de Góes, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro 21941-902, Brazil; (F.R.P.M.); (V.d.S.C.)
| | - Claudiu T. Supuran
- Neurofarba Department, Università degli Studi di Firenze, Sezione di Scienze Farmaceutiche, Via Ugo Schiff 6, 50019 Sesto Fiorentino (Florence), Italy; (F.C.); (A.A.)
| | - Alane Beatriz Vermelho
- BIOINOVAR-Biocatalysis, Bioproducts and Bioenergy, Institute of Microbiology Paulo de Góes, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro 21941-902, Brazil; (F.R.P.M.); (V.d.S.C.)
|
21
|
Karasev D, Sobolev B, Lagunin A, Filimonov D, Poroikov V. Prediction of Protein-ligand Interaction Based on Sequence Similarity and Ligand Structural Features. Int J Mol Sci 2020; 21:8152. [PMID: 33142754 PMCID: PMC7663273 DOI: 10.3390/ijms21218152] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/30/2020] [Revised: 10/28/2020] [Accepted: 10/29/2020] [Indexed: 01/09/2023]
Abstract
Computational prediction of protein–ligand interactions has three main directions: the search for new target proteins for ligands, the search for new ligands for targets, and the prediction of interactions between new proteins and new ligands. We proposed an approach that provides a fuzzy classification of protein sequences based on ligand structural features in order to analyze the last, most complicated case. We tested our approach on five protein groups, which represent promising targets for drug-like ligands and differ in functional peculiarities. The training sets were built with an original procedure that overcomes data ambiguity. Our study showed effective prediction of new targets for ligands, with an average accuracy of 0.96. The prediction of new ligands for targets had an average accuracy of 0.95; accuracy estimates were close to our previous results and comparable to or better than those of other methods. Using fuzzy coefficients reflecting target-to-ligand specificity, we predicted interactions for new proteins and new ligands; the obtained accuracy values, from 0.89 to 0.99, were acceptable for such a sophisticated task. The protein kinase family case demonstrated the ability to account for subtle features of proteins and ligands required for the specificity of protein–ligand interaction.
Affiliation(s)
- Dmitry Karasev
- Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia; (B.S.); (A.L.); (D.F.); (V.P.)
| | - Boris Sobolev
- Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia; (B.S.); (A.L.); (D.F.); (V.P.)
| | - Alexey Lagunin
- Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia; (B.S.); (A.L.); (D.F.); (V.P.)
- Department of Bioinformatics, Russian National Research Medical University, Moscow 117997, Russia
| | - Dmitry Filimonov
- Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia; (B.S.); (A.L.); (D.F.); (V.P.)
| | - Vladimir Poroikov
- Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia; (B.S.); (A.L.); (D.F.); (V.P.)
|
22
|
Yang S, Ye Q, Ding J, Yin, Lu A, Chen X, Hou T, Cao D. Current advances in ligand‐based target prediction. WIREs Comput Mol Sci 2020. [DOI: 10.1002/wcms.1504] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/25/2022]
Affiliation(s)
- Su‐Qing Yang
- Xiangya School of Pharmaceutical Sciences Central South University Changsha Hunan China
| | - Qing Ye
- College of Pharmaceutical Sciences Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University Hangzhou, Zhejiang China
| | - Jun‐Jie Ding
- Beijing Institute of Pharmaceutical Chemistry Beijing China
| | - Yin
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital Central South University Changsha Hunan China
| | - Ai‐Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong China
| | - Xiang Chen
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital Central South University Changsha Hunan China
| | - Ting‐Jun Hou
- College of Pharmaceutical Sciences Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University Hangzhou, Zhejiang China
| | - Dong‐Sheng Cao
- Xiangya School of Pharmaceutical Sciences Central South University Changsha Hunan China
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong China
|
23
|
Burggraaff L, Lenselink EB, Jespers W, van Engelen J, Bongers BJ, González MG, Liu R, Hoos HH, van Vlijmen HWT, IJzerman AP, van Westen GJP. Successive Statistical and Structure-Based Modeling to Identify Chemically Novel Kinase Inhibitors. J Chem Inf Model 2020; 60:4283-4295. [PMID: 32343143 PMCID: PMC7525794 DOI: 10.1021/acs.jcim.9b01204] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/20/2022]
Abstract
Kinases are frequently studied in the context of anticancer drugs. Their involvement in cell responses, such as proliferation, differentiation, and apoptosis, makes them interesting subjects in multitarget drug design. In this study, a workflow is presented that models the bioactivity spectra for two panels of kinases: (1) inhibition of RET, BRAF, SRC, and S6K, while avoiding inhibition of MKNK1, TTK, ERK8, PDK1, and PAK3, and (2) inhibition of AURKA, PAK1, FGFR1, and LKB1, while avoiding inhibition of PAK3, TAK1, and PIK3CA. Both statistical and structure-based models were included, which were thoroughly benchmarked and optimized. A virtual screening was performed to test the workflow for one of the main targets, RET kinase. This resulted in 5 novel and chemically dissimilar RET inhibitors with remaining RET activity of <60% (at a concentration of 10 μM) and similarities with known RET inhibitors from 0.18 to 0.29 (Tanimoto, ECFP6). The four more potent inhibitors were assessed in a concentration range and proved to be modestly active with a pIC50 value of 5.1 for the most active compound. The experimental validation of inhibitors for RET strongly indicates that the multitarget workflow is able to detect novel inhibitors for kinases, and hence, this workflow can potentially be applied in polypharmacology modeling. We conclude that this approach can identify new chemical matter for existing targets. Moreover, this workflow can easily be applied to other targets as well.
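Editorial illustration: two quantities in this abstract have simple definitions that a short Python sketch can make concrete. pIC50 is the negative base-10 logarithm of the IC50 in mol/L, and the Tanimoto coefficient between two fingerprints is the number of shared "on" bits divided by the number of bits set in either. The fingerprints below are made-up sets of bit indices, not real ECFP6 fingerprints of the reported compounds, and the function names are hypothetical.

```python
def pic50_to_ic50_molar(pic50: float) -> float:
    """Convert pIC50 to IC50 in mol/L, using pIC50 = -log10(IC50)."""
    return 10.0 ** (-pic50)


def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


# The most active compound is reported with pIC50 = 5.1.
ic50_micromolar = pic50_to_ic50_molar(5.1) * 1e6
print(f"IC50 is about {ic50_micromolar:.1f} uM")

# Hypothetical bit sets: 2 shared bits out of 9 distinct bits.
fp_query = {1, 5, 9, 12, 20}
fp_known = {5, 12, 33, 47, 51, 60}
similarity = tanimoto(fp_query, fp_known)
print(f"Tanimoto similarity = {similarity:.2f}")
```

A pIC50 of 5.1 thus corresponds to an IC50 of roughly 8 µM, consistent with the abstract's description of the compounds as modestly active, and the toy similarity of about 0.22 falls inside the 0.18 to 0.29 range quoted for the novel inhibitors.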
Affiliation(s)
- Lindsey Burggraaff
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
| | - Eelke B Lenselink
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
| | - Willem Jespers
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands.,Department of Cell and Molecular Biology, Uppsala University, Uppsala 75124, Sweden
| | - Jesper van Engelen
- Department of Computer Science, Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
| | - Brandon J Bongers
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
| | - Marina Gorostiola González
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
| | - Rongfang Liu
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
| | - Holger H Hoos
- Department of Computer Science, Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
| | - Herman W T van Vlijmen
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands.,Janssen Research & Development, Turnhoutseweg 30, 2340 Beerse, Belgium
| | - Adriaan P IJzerman
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
| | - Gerard J P van Westen
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
|
24
|
Sosnina EA, Sosnin S, Nikitina AA, Nazarov I, Osolodkin DI, Fedorov MV. Recommender Systems in Antiviral Drug Discovery. ACS Omega 2020; 5:15039-15051. [PMID: 32632398 PMCID: PMC7315437 DOI: 10.1021/acsomega.0c00857] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/26/2020] [Accepted: 06/03/2020] [Indexed: 06/11/2023]
Abstract
Recommender systems (RSs), which underwent rapid development and had an enormous impact on e-commerce, have the potential to become useful tools for drug discovery. In this paper, we applied RS methods for the prediction of the antiviral activity class (active/inactive) for compounds extracted from ChEMBL. Two main RS approaches were applied: collaborative filtering (Surprise implementation) and content-based filtering (sparse-group inductive matrix completion (SGIMC) method). The effectiveness of RS approaches was investigated for prediction of antiviral activity classes ("interactions") for compounds and viruses, for which some of their interactions with other viruses or compounds are known, and for prediction of interaction profiles for new compounds. Both approaches achieved relatively good prediction quality for binary classification of individual interactions and compound profiles, as quantified by cross-validation and external validation receiver operating characteristic (ROC) score >0.9. Thus, even simple recommender systems may serve as an effective tool in antiviral drug discovery.
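Editorial illustration: the ROC score used in this abstract to judge active/inactive predictions can be computed as the fraction of (active, inactive) pairs in which the active compound receives the higher score, with ties counted as one half. The Python sketch below applies that standard definition to made-up labels and scores; it is not the authors' evaluation code, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one active and one inactive example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # active ranked above inactive
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up data: 1 = active, 0 = inactive, with predicted scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"ROC AUC = {auc:.3f}")
```

In this toy case eight of the nine active/inactive pairs are ordered correctly, giving an AUC of about 0.889; a score above 0.9, as reported in the abstract, means more than nine in ten such pairs are ranked correctly.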
Affiliation(s)
- Ekaterina A. Sosnina
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
- Institute of Physiologically Active Compounds, RAS, Severniy pr. 1, Chernogolovka 142432, Russia
| | - Sergey Sosnin
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
- Syntelly LLC, Skolkovo Innovation Center, Bolshoy Boulevard 30, Moscow 121205, Russia
| | - Anastasia A. Nikitina
- Department of Chemistry, Lomonosov Moscow State University, Leninskie Gory 1 bd. 3, Moscow 119991, Russia
- FSBSI “Chumakov FSC R&D IBP RAS”, Poselok Instituta Poliomielita 8 bd. 1, Poselenie Moskovsky, Moscow 108819, Russia
| | - Ivan Nazarov
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
| | - Dmitry I. Osolodkin
- FSBSI “Chumakov FSC R&D IBP RAS”, Poselok Instituta Poliomielita 8 bd. 1, Poselenie Moskovsky, Moscow 108819, Russia
- Institute of Translational Medicine and Biotechnology, Sechenov First Moscow State Medical University, Trubetskaya Ulitsa 8, Moscow 119991, Russia
| | - Maxim V. Fedorov
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
- Syntelly LLC, Skolkovo Innovation Center, Bolshoy Boulevard 30, Moscow 121205, Russia
- Physics John Anderson Building, University of Strathclyde, 107 Rottenrow East, Glasgow G4 0NG, U.K.
| |
Collapse
|
25
|
Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, Isayev O, Curtarolo S, Fourches D, Cohen Y, Aspuru-Guzik A, Winkler DA, Agrafiotis D, Cherkasov A, Tropsha A. QSAR without borders. Chem Soc Rev 2020; 49:3525-3564. [PMID: 32356548 PMCID: PMC8008490 DOI: 10.1039/d0cs00098a] [Citation(s) in RCA: 319] [Impact Index Per Article: 79.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure-activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modelling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.
Affiliation(s)
- Eugene N Muratov
  - UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
|
26
|
|
27
|
Playe B, Stoven V. Evaluation of deep and shallow learning methods in chemogenomics for the prediction of drugs specificity. J Cheminform 2020; 12:11. [PMID: 33431042 PMCID: PMC7011501 DOI: 10.1186/s13321-020-0413-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2019] [Accepted: 01/27/2020] [Indexed: 01/09/2023] Open
Abstract
Chemogenomics, also called proteochemometrics, covers a range of computational methods that can be used to predict protein–ligand interactions at large scales in the protein and chemical spaces. They differ from more classical ligand-based methods (also called QSAR) that predict ligands for a given protein receptor. In the context of drug discovery process, chemogenomics allows to tackle the question of predicting off-target proteins for drug candidates, one of the main causes of undesirable side-effects and failure within drugs development processes. The present study compares shallow and deep machine-learning approaches for chemogenomics, and explores data augmentation techniques for deep learning algorithms in chemogenomics. Shallow machine-learning algorithms rely on expert-based chemical and protein descriptors, while recent developments in deep learning algorithms enable to learn abstract numerical representations of molecular graphs and protein sequences, in order to optimise the performance of the prediction task. We first propose a formulation of chemogenomics with deep learning, called the chemogenomic neural network (CN), as a feed-forward neural network taking as input the combination of molecule and protein representations learnt by molecular graph and protein sequence encoders. We show that, on large datasets, the deep learning CN model outperforms state-of-the-art shallow methods, and competes with deep methods with expert-based descriptors. However, on small datasets, shallow methods present better prediction performance than deep learning methods. Then, we evaluate data augmentation techniques, namely multi-view and transfer learning, to improve the prediction performance of the chemogenomic neural network. We conclude that a promising research direction is to integrate heterogeneous sources of data such as auxiliary tasks for which large datasets are available, or independently, multiple molecule and protein attribute views.
Affiliation(s)
- Benoit Playe
  - Center for Computational Biology, Mines ParisTech, PSL Research University, 60 Bd Saint-Michel, 75006, Paris, France
  - Institut Curie, 75248, Paris, France
  - INSERM U900, 75248, Paris, France
- Veronique Stoven
  - Center for Computational Biology, Mines ParisTech, PSL Research University, 60 Bd Saint-Michel, 75006, Paris, France
  - Institut Curie, 75248, Paris, France
  - INSERM U900, 75248, Paris, France
|
28
|
Bagherian M, Sabeti E, Wang K, Sartor MA, Nikolovska-Coleska Z, Najarian K. Machine learning approaches and databases for prediction of drug-target interaction: a survey paper. Brief Bioinform 2020; 22:247-269. [PMID: 31950972 PMCID: PMC7820849 DOI: 10.1093/bib/bbz157] [Citation(s) in RCA: 148] [Impact Index Per Article: 37.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2019] [Revised: 11/01/2019] [Accepted: 11/07/2019] [Indexed: 12/12/2022] Open
Abstract
The task of predicting the interactions between drugs and targets plays a key role in the process of drug discovery. There is a need to develop novel and efficient prediction approaches in order to avoid costly and laborious yet not-always-deterministic experiments to determine drug–target interactions (DTIs) by experiments alone. These approaches should be capable of identifying the potential DTIs in a timely manner. In this article, we describe the data required for the task of DTI prediction followed by a comprehensive catalog consisting of machine learning methods and databases, which have been proposed and utilized to predict DTIs. The advantages and disadvantages of each set of methods are also briefly discussed. Lastly, the challenges one may face in prediction of DTI using machine learning approaches are highlighted and we conclude by shedding some lights on important future research directions.
Affiliation(s)
- Maryam Bagherian
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Elyas Sabeti
  - Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI, 48109, USA
- Kai Wang
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
- Maureen A Sartor
  - Department of Pathology, University of Michigan, Ann Arbor, MI, 48109, USA
- Kayvan Najarian
  - Department of Electrical Engineering and Computer Science, College of Engineering, University of Michigan, Ann Arbor, MI, 48109, USA
|
29
|
Bongers BJ, IJzerman AP, Van Westen GJP. Proteochemometrics - recent developments in bioactivity and selectivity modeling. DRUG DISCOVERY TODAY. TECHNOLOGIES 2019; 32-33:89-98. [PMID: 33386099 DOI: 10.1016/j.ddtec.2020.08.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/28/2020] [Revised: 08/18/2020] [Accepted: 08/28/2020] [Indexed: 06/12/2023]
Abstract
Proteochemometrics is a machine learning based modeling approach relying on a combination of ligand and protein descriptors. With ongoing developments in machine learning and increases in public data the technique is more frequently applied in early drug discovery, typically in ligand-target binding prediction. Common applications include improvements to single target quantitative structure-activity relationship models, protein selectivity and promiscuity modeling, and large-scale deep learning approaches. The increase in predictive power using proteochemometrics is observed in multi-target bioactivity modeling, opening the door to more extensive studies covering whole protein families. On top of that, with deep learning fueling more complex and larger scale models, proteochemometrics allows faster and higher quality computational models supporting the design, make, test cycle.
Affiliation(s)
- Brandon J Bongers
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
- Adriaan P IJzerman
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
- Gerard J P Van Westen
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
|
30
|
Moumbock AF, Li J, Mishra P, Gao M, Günther S. Current computational methods for predicting protein interactions of natural products. Comput Struct Biotechnol J 2019; 17:1367-1376. [PMID: 31762960 PMCID: PMC6861622 DOI: 10.1016/j.csbj.2019.08.008] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2019] [Revised: 08/09/2019] [Accepted: 08/23/2019] [Indexed: 01/08/2023] Open
Abstract
Natural products (NPs) are an indispensable source of drugs and they have a better coverage of the pharmacological space than synthetic compounds, owing to their high structural diversity. The prediction of their interaction profiles with druggable protein targets remains a major challenge in modern drug discovery. Experimental (off-)target predictions of NPs are cost- and time-consuming, whereas computational methods, on the other hand, are much faster and cheaper. As a result, computational predictions are preferentially used in the first instance for NP profiling, prior to experimental validations. This review covers recent advances in computational approaches which have been developed to aid the annotation of unknown drug-target interactions (DTIs), by focusing on three broad classes, namely: ligand-based, target-based, and target-ligand-based (hybrid) approaches. Computational DTI prediction methods have the potential to significantly advance the discovery and development of novel selective drugs exhibiting minimal side effects. We highlight some inherent caveats of these methods which must be overcome to enable them to realize their full potential, and a future outlook is given.
Affiliation(s)
- Stefan Günther
  - Institute of Pharmaceutical Sciences, Research Group Pharmaceutical Bioinformatics, Albert-Ludwigs-Universität Freiburg, Germany
|
31
|
Parca L, Pepe G, Pietrosanto M, Galvan G, Galli L, Palmeri A, Sciandrone M, Ferrè F, Ausiello G, Helmer-Citterich M. Modeling cancer drug response through drug-specific informative genes. Sci Rep 2019; 9:15222. [PMID: 31645597 PMCID: PMC6811538 DOI: 10.1038/s41598-019-50720-0] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2019] [Accepted: 09/06/2019] [Indexed: 12/18/2022] Open
Abstract
Recent advances in pharmacogenomics have generated a wealth of data of different types whose analysis have helped in the identification of signatures of different cellular sensitivity/resistance responses to hundreds of chemical compounds. Among the different data types, gene expression has proven to be the more successful for the inference of drug response in cancer cell lines. Although effective, the whole transcriptome can introduce noise in the predictive models, since specific mechanisms are required for different drugs and these realistically involve only part of the proteins encoded in the genome. We analyzed the pharmacogenomics data of 961 cell lines tested with 265 anti-cancer drugs and developed different machine learning approaches for dissecting the genome systematically and predict drug responses using both drug-unspecific and drug-specific genes. These methodologies reach better response predictions for the vast majority of the screened drugs using tens to few hundreds genes specific to each drug instead of the whole genome, thus allowing a better understanding and interpretation of drug-specific response mechanisms which are not necessarily restricted to the drug known targets.
Affiliation(s)
- Luca Parca
  - Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Gerardo Pepe
  - Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Marco Pietrosanto
  - Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Giulio Galvan
  - Department of Information Engineering, University of Florence, Florence, Italy
- Leonardo Galli
  - Department of Information Engineering, University of Florence, Florence, Italy
- Antonio Palmeri
  - Department of Biology, University of Rome "Tor Vergata", Rome, Italy
  - Celgene Institute for Translational Research Europe, Sevilla, Spain
- Marco Sciandrone
  - Department of Information Engineering, University of Florence, Florence, Italy
- Fabrizio Ferrè
  - Department of Pharmacy and Biotechnology, University of Bologna Alma Mater, Bologna, Italy
- Gabriele Ausiello
  - Department of Biology, University of Rome "Tor Vergata", Rome, Italy
|
32
|
Rifaioglu AS, Atas H, Martin MJ, Cetin-Atalay R, Atalay V, Doğan T. Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief Bioinform 2019; 20:1878-1912. [PMID: 30084866 PMCID: PMC6917215 DOI: 10.1093/bib/bby061] [Citation(s) in RCA: 223] [Impact Index Per Article: 44.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2018] [Revised: 05/25/2018] [Indexed: 01/16/2023] Open
Abstract
The identification of interactions between drugs/compounds and their targets is crucial for the development of new drugs. In vitro screening experiments (i.e. bioassays) are frequently used for this purpose; however, experimental approaches are insufficient to explore novel drug-target interactions, mainly because of feasibility problems, as they are labour intensive, costly and time consuming. A computational field known as 'virtual screening' (VS) has emerged in the past decades to aid experimental drug discovery studies by statistically estimating unknown bio-interactions between compounds and biological targets. These methods use the physico-chemical and structural properties of compounds and/or target proteins along with the experimentally verified bio-interaction information to generate predictive models. Lately, sophisticated machine learning techniques are applied in VS to elevate the predictive performance. The objective of this study is to examine and discuss the recent applications of machine learning techniques in VS, including deep learning, which became highly popular after giving rise to epochal developments in the fields of computer vision and natural language processing. The past 3 years have witnessed an unprecedented amount of research studies considering the application of deep learning in biomedicine, including computational drug discovery. In this review, we first describe the main instruments of VS methods, including compound and protein features (i.e. representations and descriptors), frequently used libraries and toolkits for VS, bioactivity databases and gold-standard data sets for system training and benchmarking. We subsequently review recent VS studies with a strong emphasis on deep learning applications. Finally, we discuss the present state of the field, including the current challenges and suggest future directions. 
We believe that this survey will provide insight to the researchers working in the field of computational drug discovery in terms of comprehending and developing novel bio-prediction methods.
Affiliation(s)
- Ahmet Sureyya Rifaioglu
  - Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
  - Department of Computer Engineering, İskenderun Technical University, Hatay, Turkey
- Heval Atas
  - Cancer System Biology Laboratory (CanSyL), Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Maria Jesus Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Cambridge, Hinxton, UK
- Rengul Cetin-Atalay
  - Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
- Volkan Atalay
  - Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
- Tunca Doğan
  - Cancer System Biology Laboratory (CanSyL), Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Cambridge, Hinxton, UK
|
33
|
Oranje P, Gouka R, Burggraaff L, Vermeer M, Chalet C, Duchateau G, van der Pijl P, Geldof M, de Roo N, Clauwaert F, Vanpaeschen T, Nicolaï J, de Bruyn T, Annaert P, IJzerman AP, van Westen GJP. Novel natural and synthetic inhibitors of solute carriers SGLT1 and SGLT2. Pharmacol Res Perspect 2019; 7:e00504. [PMID: 31384471 PMCID: PMC6664820 DOI: 10.1002/prp2.504] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2019] [Revised: 06/05/2019] [Accepted: 06/06/2019] [Indexed: 12/12/2022] Open
Abstract
Selective analogs of the natural glycoside phloridzin are marketed drugs that reduce hyperglycemia in diabetes by inhibiting the active sodium glucose cotransporter SGLT2 in the kidneys. In addition, intestinal SGLT1 is now recognized as a target for glycemic control. To expand available type 2 diabetes remedies, we aimed to find novel SGLT1 inhibitors beyond the chemical space of glycosides. We screened a bioactive compound library for SGLT1 inhibitors and tested primary hits and additional structurally similar molecules on SGLT1 and SGLT2 (SGLT1/2). Novel SGLT1/2 inhibitors were discovered in separate chemical clusters of natural and synthetic compounds. These have IC50-values in the 10-100 μmol/L range. The most potent identified novel inhibitors from different chemical clusters are (SGLT1-IC50 Mean ± SD, SGLT2-IC50 Mean ± SD): (+)-pteryxin (12 ± 2 μmol/L, 9 ± 4 μmol/L), (+)-ε-viniferin (58 ± 18 μmol/L, 110 μmol/L), quinidine (62 μmol/L, 56 μmol/L), cloperastine (9 ± 3 μmol/L, 9 ± 7 μmol/L), bepridil (10 ± 5 μmol/L, 14 ± 12 μmol/L), trihexyphenidyl (12 ± 1 μmol/L, 20 ± 13 μmol/L) and bupivacaine (23 ± 14 μmol/L, 43 ± 29 μmol/L). The discovered natural inhibitors may be further investigated as new potential (prophylactic) agents for controlling dietary glucose uptake. The new diverse structure activity data can provide a starting point for the optimization of novel SGLT1/2 inhibitors and support the development of virtual SGLT1/2 inhibitor screening models.
Affiliation(s)
- Paul Oranje
  - Unilever Research & Development, Vlaardingen, The Netherlands
- Robin Gouka
  - Unilever Research & Development, Vlaardingen, The Netherlands
- Lindsey Burggraaff
  - Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- Mario Vermeer
  - Unilever Research & Development, Vlaardingen, The Netherlands
- Clément Chalet
  - Unilever Research & Development, Vlaardingen, The Netherlands
- Guus Duchateau
  - Unilever Research & Development, Vlaardingen, The Netherlands
- Marian Geldof
  - Unilever Research & Development, Vlaardingen, The Netherlands
- Niels de Roo
  - Unilever Research & Development, Vlaardingen, The Netherlands
- Fenja Clauwaert
  - Drug Delivery and Disposition, Department of Pharmaceutical and Pharmacological Sciences, KU Leuven, Leuven, Belgium
- Toon Vanpaeschen
  - Drug Delivery and Disposition, Department of Pharmaceutical and Pharmacological Sciences, KU Leuven, Leuven, Belgium
- Johan Nicolaï
  - Drug Delivery and Disposition, Department of Pharmaceutical and Pharmacological Sciences, KU Leuven, Leuven, Belgium
- Tom de Bruyn
  - Drug Delivery and Disposition, Department of Pharmaceutical and Pharmacological Sciences, KU Leuven, Leuven, Belgium
- Pieter Annaert
  - Drug Delivery and Disposition, Department of Pharmaceutical and Pharmacological Sciences, KU Leuven, Leuven, Belgium
- Adriaan P. IJzerman
  - Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- Gerard J. P. van Westen
  - Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
|
34
|
Lee M, Kim H, Joe H, Kim HG. Multi-channel PINN: investigating scalable and transferable neural networks for drug discovery. J Cheminform 2019; 11:46. [PMID: 31289963 PMCID: PMC6617572 DOI: 10.1186/s13321-019-0368-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2018] [Accepted: 07/02/2019] [Indexed: 12/19/2022] Open
Abstract
Analysis of compound–protein interactions (CPIs) has become a crucial prerequisite for drug discovery and drug repositioning. In vitro experiments are commonly used in identifying CPIs, but it is not feasible to discover the molecular and proteomic space only through experimental approaches. Machine learning’s advances in predicting CPIs have made significant contributions to drug discovery. Deep neural networks (DNNs), which have recently been applied to predict CPIs, performed better than other shallow classifiers. However, such techniques commonly require a considerable volume of dense data for each training target. Although the number of publicly available CPI data has grown rapidly, public data is still sparse and has a large number of measurement errors. In this paper, we propose a novel method, Multi-channel PINN, to fully utilize sparse data in terms of representation learning. With representation learning, Multi-channel PINN can utilize three approaches of DNNs which are a classifier, a feature extractor, and an end-to-end learner. Multi-channel PINN can be fed with both low and high levels of representations and incorporates each of them by utilizing all approaches within a single model. To fully utilize sparse public data, we additionally explore the potential of transferring representations from training tasks to test tasks. As a proof of concept, Multi-channel PINN was evaluated on fifteen combinations of feature pairs to investigate how they affect the performance in terms of highest performance, initial performance, and convergence speed. The experimental results obtained indicate that the multi-channel models using protein features performed better than single-channel models or multi-channel models using compound features. Therefore, Multi-channel PINN can be advantageous when used with appropriate representations. 
Additionally, we pretrained models on a training task then finetuned them on a test task to figure out whether Multi-channel PINN can capture general representations for compounds and proteins. We found that there were significant differences in performance between pretrained models and non-pretrained models.
Affiliation(s)
- Munhwan Lee
  - Biomedical Knowledge Engineering Laboratory, Seoul National University, 1 Gwanak-ro, Seoul, Republic of Korea
- Hyeyeon Kim
  - Biomedical Knowledge Engineering Laboratory, Seoul National University, 1 Gwanak-ro, Seoul, Republic of Korea
- Hyunwhan Joe
  - Biomedical Knowledge Engineering Laboratory, Seoul National University, 1 Gwanak-ro, Seoul, Republic of Korea
- Hong-Gee Kim
  - Biomedical Knowledge Engineering Laboratory, Seoul National University, 1 Gwanak-ro, Seoul, Republic of Korea
|
35
|
Türková A, Zdrazil B. Current Advances in Studying Clinically Relevant Transporters of the Solute Carrier (SLC) Family by Connecting Computational Modeling and Data Science. Comput Struct Biotechnol J 2019; 17:390-405. [PMID: 30976382 PMCID: PMC6438991 DOI: 10.1016/j.csbj.2019.03.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2018] [Revised: 02/28/2019] [Accepted: 03/01/2019] [Indexed: 01/18/2023] Open
Abstract
Organic anion and cation transporting proteins (OATs, OATPs, and OCTs), as well as the Multidrug and Toxin Extrusion (MATE) transporters of the Solute Carrier (SLC) family are playing a pivotal role in the discovery and development of new drugs due to their involvement in drug disposition, drug-drug interactions, adverse drug effects and related toxicity. Computational methods to understand and predict clinically relevant transporter interactions can provide useful guidance at early stages in drug discovery and design, especially if they include contemporary data science approaches. In this review, we summarize the current state-of-the-art of computational approaches for exploring ligand interactions and selectivity for these drug (uptake) transporters. The computational methods discussed here by highlighting interesting examples from the current literature are ranging from semiautomatic data mining and integration, to ligand-based methods (such as quantitative structure-activity relationships, and combinatorial pharmacophore modeling), and finally structure-based methods (such as comparative modeling, molecular docking, and molecular dynamics simulations). We are focusing on promising computational techniques such as fold-recognition methods, proteochemometric modeling or techniques for enhanced sampling of protein conformations used in the context of these ADMET-relevant SLC transporters with a special focus on methods useful for studying ligand selectivity.
Affiliation(s)
- Alžběta Türková
  - Department of Pharmaceutical Chemistry, Division of Drug Design and Medicinal Chemistry, University of Vienna, Althanstraße 14, A-1090 Vienna, Austria
- Barbara Zdrazil
  - Department of Pharmaceutical Chemistry, Division of Drug Design and Medicinal Chemistry, University of Vienna, Althanstraße 14, A-1090 Vienna, Austria
|
36
|
Sydow D, Burggraaff L, Szengel A, van Vlijmen HWT, IJzerman AP, van Westen GJP, Volkamer A. Advances and Challenges in Computational Target Prediction. J Chem Inf Model 2019; 59:1728-1742. [DOI: 10.1021/acs.jcim.8b00832] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Affiliation(s)
- Dominique Sydow
  - In silico Toxicology, Institute of Physiology, Charité − Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Lindsey Burggraaff
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
- Angelika Szengel
  - In silico Toxicology, Institute of Physiology, Charité − Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Herman W. T. van Vlijmen
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
  - Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, B-2340 Beerse, Belgium
- Adriaan P. IJzerman
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
- Gerard J. P. van Westen
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
- Andrea Volkamer
  - In silico Toxicology, Institute of Physiology, Charité − Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
|
37
|
Lopez-Del Rio A, Nonell-Canals A, Vidal D, Perera-Lluna A. Evaluation of Cross-Validation Strategies in Sequence-Based Binding Prediction Using Deep Learning. J Chem Inf Model 2019; 59:1645-1657. [PMID: 30730731 DOI: 10.1021/acs.jcim.8b00663] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]
Abstract
Binding prediction between targets and drug-like compounds through deep neural networks has generated promising results in recent years, outperforming traditional machine learning-based methods. However, the generalization capability of these classification models is still an issue to be addressed. In this work, we explored how different cross-validation strategies applied to data from different molecular databases affect to the performance of binding prediction proteochemometrics models. These strategies are (1) random splitting, (2) splitting based on K-means clustering (both of actives and inactives), (3) splitting based on source database, and (4) splitting based both in the clustering and in the source database. These schemas are applied to a deep learning proteochemometrics model and to a simple logistic regression model to be used as baseline. Additionally, two different ways of describing molecules in the model are tested: (1) by their SMILES and (2) by three fingerprints. The classification performance of our deep learning-based proteochemometrics model is comparable to the state of the art. Our results show that the lack of generalization of these models is due to a bias in public molecular databases and that a restrictive cross-validation schema based on compound clustering leads to worse but more robust and credible results. Our results also show better performance when representing molecules by their fingerprints.
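The contrast between random splitting and cluster-based splitting described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cluster_folds`, the greedy balancing rule, and the toy cluster labels are all assumptions made for illustration. The only idea taken from the abstract is that compounds from the same cluster are kept together, so a model is never validated on near-duplicates of its training compounds.

```python
from collections import defaultdict


def cluster_folds(cluster_ids, n_folds):
    """Assign sample indices to folds so that no cluster is split across folds."""
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    folds = [[] for _ in range(n_folds)]
    # Greedy balancing (an assumption): place the largest clusters first,
    # each into the fold that currently holds the fewest samples.
    for cid in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[cid])
    return folds


# Hypothetical cluster labels for eight compounds (for example, from K-means
# on molecular fingerprints); these labels are invented for illustration.
clusters = ["a", "a", "a", "b", "b", "c", "c", "d"]
folds = cluster_folds(clusters, n_folds=2)

for i, fold in enumerate(folds):
    print(f"fold {i}: samples {fold}, clusters {sorted({clusters[j] for j in fold})}")
```

A random split would instead shuffle the eight indices freely, which typically places members of cluster "a" in both folds and so tends to give more optimistic scores.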
Affiliation(s)
- Angela Lopez-Del Rio
  - B2SLab, Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, 08028 Barcelona, Spain
  - Mind the Byte S.L., 08007 Barcelona, Spain
  - Networking Biomedical Research Centre in the subject area of Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), 28029 Madrid, Spain
  - Department of Biomedical Engineering, Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, Esplugues de Llobregat, 08950 Barcelona, Spain
- David Vidal
  - Mind the Byte S.L., 08007 Barcelona, Spain
- Alexandre Perera-Lluna
  - B2SLab, Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, 08028 Barcelona, Spain
  - Networking Biomedical Research Centre in the subject area of Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), 28029 Madrid, Spain
  - Department of Biomedical Engineering, Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, Esplugues de Llobregat, 08950 Barcelona, Spain
|
38
|
Burggraaff L, Oranje P, Gouka R, van der Pijl P, Geldof M, van Vlijmen HWT, IJzerman AP, van Westen GJP. Identification of novel small molecule inhibitors for solute carrier SGLT1 using proteochemometric modeling. J Cheminform 2019; 11:15. [PMID: 30767155 PMCID: PMC6689890 DOI: 10.1186/s13321-019-0337-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/12/2018] [Accepted: 02/08/2019] [Indexed: 01/18/2023] Open
Abstract
Sodium-dependent glucose co-transporter 1 (SGLT1) is a solute carrier responsible for active glucose absorption. SGLT1 is present in both the renal tubules and small intestine. In contrast, the closely related sodium-dependent glucose co-transporter 2 (SGLT2), a protein that is targeted in the treatment of diabetes type II, is only expressed in the renal tubules. Although dual inhibitors for both SGLT1 and SGLT2 have been developed, no drugs on the market are targeted at decreasing dietary glucose uptake by SGLT1 in the gastrointestinal tract. Here we aim at identifying SGLT1 inhibitors in silico by applying a machine learning approach that does not require structural information, which is absent for SGLT1. We applied proteochemometrics by implementation of compound- and protein-based information into random forest models. We obtained a predictive model with a sensitivity of 0.64 ± 0.06, specificity of 0.93 ± 0.01, positive predictive value of 0.47 ± 0.07, negative predictive value of 0.96 ± 0.01, and Matthews correlation coefficient of 0.49 ± 0.05. Subsequent to model training, we applied our model in virtual screening to identify novel SGLT1 inhibitors. Of the 77 tested compounds, 30 were experimentally confirmed for SGLT1-inhibiting activity in vitro, leading to a hit rate of 39% with activities in the low micromolar range. Moreover, the hit compounds included novel molecules, which is reflected by the low similarity of these compounds with the training set (< 0.3). Conclusively, proteochemometric modeling of SGLT1 is a viable strategy for identifying active small molecules. Therefore, this method may also be applied in detection of novel small molecules for other transporter proteins.
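For readers less familiar with the reported statistics, the sketch below shows how sensitivity, specificity, positive and negative predictive value, and the Matthews correlation coefficient follow from a confusion matrix, and how the 39% hit rate follows from 30 confirmed hits out of 77 tested compounds. The confusion-matrix counts are hypothetical and chosen only for easy arithmetic; they are not the study's data.

```python
import math

def confusion_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes are present and
    both labels are predicted at least once).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "mcc": (tp * tn - fp * fn) / denom,
    }

# Hypothetical counts, not taken from the paper.
print(confusion_metrics(tp=8, fp=2, tn=18, fn=2))

# Hit rate from the abstract: 30 confirmed hits out of 77 tested compounds.
hit_rate = 30 / 77
print(f"{hit_rate:.0%}")  # 39%
```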
Affiliation(s)
- Lindsey Burggraaff
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Paul Oranje
- Unilever Research & Development, Olivier van Noortlaan 120, 3133 AT, Vlaardingen, The Netherlands
- Robin Gouka
- Unilever Research & Development, Olivier van Noortlaan 120, 3133 AT, Vlaardingen, The Netherlands
- Pieter van der Pijl
- Unilever Research & Development, Olivier van Noortlaan 120, 3133 AT, Vlaardingen, The Netherlands
- Marian Geldof
- Unilever Research & Development, Olivier van Noortlaan 120, 3133 AT, Vlaardingen, The Netherlands
- Herman W T van Vlijmen
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands; Janssen Research & Development, Turnhoutseweg 30, 2340, Beerse, Belgium
- Adriaan P IJzerman
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Gerard J P van Westen
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
|
39
|
Srinivas R, Klimovich PV, Larson EC. Implicit-descriptor ligand-based virtual screening by means of collaborative filtering. J Cheminform 2018; 10:56. [PMID: 30467684 PMCID: PMC6755561 DOI: 10.1186/s13321-018-0310-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/26/2018] [Accepted: 11/13/2018] [Indexed: 12/20/2022] Open
Abstract
Current ligand-based machine learning methods in virtual screening rely heavily on molecular fingerprinting for preprocessing, i.e., explicit description of ligands’ structural and physicochemical properties in a vectorized form. Of particular importance to current methods are the extent to which molecular fingerprints describe a particular ligand and what metric sufficiently captures similarity among ligands. In this work, we propose and evaluate methods that do not require explicit feature vectorization through fingerprinting, but, instead, provide implicit descriptors based only on other known assays. Our methods are based upon well known collaborative filtering algorithms used in recommendation systems. Our implicit descriptor method does not require any fingerprint similarity search, which makes the method free of the bias arising from the empirical nature of the fingerprint models. We show that implicit methods significantly outperform traditional machine learning methods, and the main strengths of implicit methods are their resilience to target-ligand sparsity and high potential for spotting promiscuous ligands.
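The idea of implicit descriptors can be illustrated with a small neighbourhood-based collaborative filter, in which a ligand is described only by its known assay outcomes and a missing outcome is estimated from ligands with similar profiles. This is a generic sketch of the technique family, not the specific algorithms evaluated in the paper; the ligand and target names, the +1/-1 activity coding, and the functions `cosine_on_shared` and `predict` are all assumptions made for illustration.

```python
import math

def cosine_on_shared(a, b):
    """Cosine similarity of two sparse activity profiles (dicts),
    computed only over the targets both ligands were tested on."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = math.sqrt(sum(b[t] ** 2 for t in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict(profiles, ligand, target):
    """Similarity-weighted average of other ligands' outcomes on `target`.

    Only positively correlated neighbours contribute; returns None when
    no such neighbour has been tested on the target.
    """
    weighted_sum = total_weight = 0.0
    for other, profile in profiles.items():
        if other == ligand or target not in profile:
            continue
        sim = cosine_on_shared(profiles[ligand], profile)
        if sim > 0:
            weighted_sum += sim * profile[target]
            total_weight += sim
    return weighted_sum / total_weight if total_weight else None

# Hypothetical assay matrix: +1 = active, -1 = inactive, missing = untested.
profiles = {
    "L1": {"T1": 1, "T2": -1},
    "L2": {"T1": 1, "T2": -1, "T3": 1},
    "L3": {"T1": -1, "T2": 1, "T3": -1},
}
print(predict(profiles, "L1", "T3"))
```

No fingerprint or structural descriptor appears anywhere in the calculation; the prediction for L1 on T3 is driven entirely by L2, whose known assay outcomes match those of L1.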
Affiliation(s)
- Raghuram Srinivas
- Department of Computer Science and Engineering, Bobby B. Lyle School of Engineering, Southern Methodist University, 3145 Dyer Street, Dallas, TX, 75205, USA; DataScience@SMU, Dallas, 75205, TX, USA
- Pavel V Klimovich
- Department of Computer Science and Engineering, Bobby B. Lyle School of Engineering, Southern Methodist University, 3145 Dyer Street, Dallas, TX, 75205, USA; The Dedman College Interdisciplinary Institute, 3225 Daniel Avenue, Dallas, TX, 75205, USA
- Eric C Larson
- Department of Computer Science and Engineering, Bobby B. Lyle School of Engineering, Southern Methodist University, 3145 Dyer Street, Dallas, TX, 75205, USA
|
40
|
Janssen APA, Grimm SH, Wijdeven RHM, Lenselink EB, Neefjes J, van Boeckel CAA, van Westen GJP, van der Stelt M. Drug Discovery Maps, a Machine Learning Model That Visualizes and Predicts Kinome-Inhibitor Interaction Landscapes. J Chem Inf Model 2018; 59:1221-1229. [PMID: 30372617 PMCID: PMC6437696 DOI: 10.1021/acs.jcim.8b00640] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 12/19/2022]
Abstract
The interpretation of high-dimensional structure-activity data sets in drug discovery to predict ligand-protein interaction landscapes is a challenging task. Here we present Drug Discovery Maps (DDM), a machine learning model that maps the activity profile of compounds across an entire protein family, as illustrated here for the kinase family. DDM is based on the t-distributed stochastic neighbor embedding (t-SNE) algorithm to generate a visualization of molecular and biological similarity. DDM maps chemical and target space and predicts the activities of novel kinase inhibitors across the kinome. The model was validated using independent data sets and in a prospective experimental setting, where DDM predicted new inhibitors for FMS-like tyrosine kinase 3 (FLT3), a therapeutic target for the treatment of acute myeloid leukemia. Compounds were resynthesized, yielding highly potent, cellularly active FLT3 inhibitors. Biochemical assays confirmed most of the predicted off-targets. DDM is further unique in that it is completely open-source and available as a ready-to-use executable to facilitate broad and easy adoption.
Affiliation(s)
- Antonius P A Janssen
- Molecular Physiology, Leiden Institute of Chemistry, Leiden University, 2333 CC Leiden, The Netherlands
- Sebastian H Grimm
- Molecular Physiology, Leiden Institute of Chemistry, Leiden University, 2333 CC Leiden, The Netherlands
- Ruud H M Wijdeven
- Department of Cell and Chemical Biology, Leiden University Medical Centre, 2333 ZC Leiden, The Netherlands
- Eelke B Lenselink
- Drug and Target Discovery, Leiden Academic Centre for Drug Research, Leiden University, 2333 CC Leiden, The Netherlands
- Jacques Neefjes
- Department of Cell and Chemical Biology, Leiden University Medical Centre, 2333 ZC Leiden, The Netherlands
- Gerard J P van Westen
- Drug and Target Discovery, Leiden Academic Centre for Drug Research, Leiden University, 2333 CC Leiden, The Netherlands
- Mario van der Stelt
- Molecular Physiology, Leiden Institute of Chemistry, Leiden University, 2333 CC Leiden, The Netherlands
|
41
|
Giblin KA, Hughes SJ, Boyd H, Hansson P, Bender A. Prospectively Validated Proteochemometric Models for the Prediction of Small-Molecule Binding to Bromodomain Proteins. J Chem Inf Model 2018; 58:1870-1888. [DOI: 10.1021/acs.jcim.8b00400] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Affiliation(s)
- Kathryn A. Giblin
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
- Samantha J. Hughes
- Computational Chemistry, Oncology, IMED Biotech Unit, AstraZeneca, Cambridge CB10 1XL, U.K.
- Helen Boyd
- Discovery Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg 431 50 SE, Sweden
- Pia Hansson
- Discovery Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg 431 50 SE, Sweden
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
|
42
|
Seal A, Wild DJ. Netpredictor: R and Shiny package to perform drug-target network analysis and prediction of missing links. BMC Bioinformatics 2018; 19:265. [PMID: 30012095 PMCID: PMC6047136 DOI: 10.1186/s12859-018-2254-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 01/25/2018] [Accepted: 06/18/2018] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Netpredictor is an R package for prediction of missing links in any given unipartite or bipartite network. The package provides utilities to compute missing links in bipartite as well as unipartite networks using the Random Walk with Restart and Network Inference algorithms and a combination of both. The package also allows computation of bipartite network properties, visualization of communities for two different sets of nodes, and calculation of significant interactions between two sets of nodes using permutation-based testing. The application can also be used to search for top-K shortest paths within an interactome and to run enrichment analysis for disease, pathway and ontology. The R standalone package (including detailed introductory vignettes) and the associated R Shiny web application are available under the GPL-2 Open Source license and are freely available to download. RESULTS We compared the performance of different algorithms on different small datasets and found that random walk outperforms the rest of the algorithms. The package is developed to perform network-based prediction on unipartite and bipartite networks and to use the results to understand the functionality of proteins in an interactome using enrichment analysis. CONCLUSION Rapid application development environments like Shiny help non-programmers to develop rich visualization apps quickly, and we believe Shiny will continue to grow in the future with further enhancements. We plan to update the algorithms in the package in the near future and help scientists to analyse data in a much more streamlined fashion.
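As an illustration of the Random Walk with Restart idea named in the abstract, the sketch below iterates the walk on a tiny drug-target graph until the visiting probabilities stabilise. It is a generic Python rendering of the algorithm, not the Netpredictor R API; the function name, the restart probability of 0.3, and the toy graph are assumptions. The code also assumes every node has at least one neighbour.

```python
def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that, at each step,
    jumps back to `seed` with probability `restart` and otherwise moves
    to a uniformly chosen neighbour of its current node."""
    nodes = sorted(adjacency)
    start = {n: 1.0 if n == seed else 0.0 for n in nodes}
    current = dict(start)
    for _ in range(max_iter):
        updated = {n: restart * start[n] for n in nodes}
        for node in nodes:
            share = (1 - restart) * current[node] / len(adjacency[node])
            for neighbour in adjacency[node]:
                updated[neighbour] += share
        if max(abs(updated[n] - current[n]) for n in nodes) < tol:
            return updated
        current = updated
    return current

# Hypothetical bipartite graph: drugs d1, d2 and targets t1, t2, t3.
graph = {
    "d1": ["t1", "t2"],
    "d2": ["t2", "t3"],
    "t1": ["d1"],
    "t2": ["d1", "d2"],
    "t3": ["d2"],
}
scores = random_walk_with_restart(graph, seed="d1")
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, round(score, 4))
```

In this toy graph d1 has no known link to t3, yet t3 receives a non-zero score through the shared target t2; ranking unlinked targets by such scores is how a missing-link candidate list is produced.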
Affiliation(s)
- Abhik Seal
- School of Informatics and Computing, Indiana University Bloomington, Informatics West, Bloomington, 47408, Indiana, USA
- David J Wild
- School of Informatics and Computing, Indiana University Bloomington, Informatics West, Bloomington, 47408, Indiana, USA
|
43
|
Nazarshodeh E, Sheikhpour R, Gharaghani S, Sarram MA. A novel proteochemometrics model for predicting the inhibition of nine carbonic anhydrase isoforms based on supervised Laplacian score and k-nearest neighbour regression. SAR and QSAR in Environmental Research 2018; 29:419-437. [PMID: 29882433 DOI: 10.1080/1062936x.2018.1447995] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/19/2017] [Accepted: 02/28/2018] [Indexed: 06/08/2023]
Abstract
Carbonic anhydrases (CAs) are essential enzymes in biological processes. Prediction of the activity of compounds towards CA isoforms could be evaluated by computational techniques to discover a novel therapeutic inhibitor. Studies such as quantitative structure-activity relationships (QSARs), molecular docking and pharmacophore modelling have been carried out to design potent inhibitors. Unfortunately, QSAR does not consider the information of target space in the model. We successfully developed an in silico proteochemometrics model that simultaneously uses target and ligand descriptors to predict the activities of CA inhibitors. Herein, a strong predictive model was built for the prediction of protein-ligand binding affinity between nine human CA isoforms and 549 ligands. We applied descriptors obtained from the PROFEAT webserver for the proteins. Ligands were encoded by descriptors from PaDEL-Descriptor software. Supervised Laplacian score (SLS) and particle swarm optimization were used for feature selection. Models were derived using k-nearest neighbour (KNN) regression and a kernel smoother model. The predictive ability of the models was evaluated by an external validation test. Statistical results (Q2ext = 0.7806, r2test = 0.7811 and RMSEtest = 0.5549) showed that the model generated using SLS and KNN regression outperformed the other models. Consequently, the selectivity of compounds towards these enzymes will be predicted prior to synthesis.
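The core of a proteochemometric k-nearest-neighbour regressor can be sketched in a few lines: each ligand-target pair is represented by concatenating ligand and protein descriptors, and the activity of a new pair is estimated as the mean activity of its k closest training pairs. The sketch below is a simplified illustration under stated assumptions: the descriptor values and activities are invented, Euclidean distance is assumed, and the feature-selection steps described in the abstract (supervised Laplacian score, particle swarm optimization) are omitted.

```python
import math

def pcm_features(ligand_descriptors, protein_descriptors):
    """Describe a ligand-target pair by concatenating both descriptor vectors."""
    return list(ligand_descriptors) + list(protein_descriptors)

def knn_regress(train_x, train_y, query, k=3):
    """Predict the mean activity of the k training pairs closest to `query`
    (Euclidean distance in the concatenated descriptor space)."""
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical descriptors and activities (e.g. pKi), for illustration only.
train_x = [
    pcm_features([0.0, 0.0], [1.0]),
    pcm_features([0.1, 0.0], [1.0]),
    pcm_features([5.0, 5.0], [0.0]),
]
train_y = [6.0, 7.0, 3.0]
query = pcm_features([0.05, 0.0], [1.0])
print(knn_regress(train_x, train_y, query, k=2))  # 6.5
```

Because the protein descriptors are part of every feature vector, the same ligand paired with a different isoform yields a different query point, which is what lets a single model address selectivity across isoforms.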
Affiliation(s)
- E Nazarshodeh
- Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- R Sheikhpour
- Department of Computer Engineering, Yazd University, Yazd, Iran
- S Gharaghani
- Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- M A Sarram
- Department of Computer Engineering, Yazd University, Yazd, Iran
|
44
|
Qiu T, Wu D, Qiu J, Cao Z. Finding the molecular scaffold of nuclear receptor inhibitors through high-throughput screening based on proteochemometric modelling. J Cheminform 2018; 10:21. [PMID: 29651663 PMCID: PMC5897275 DOI: 10.1186/s13321-018-0275-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 09/27/2017] [Accepted: 04/02/2018] [Indexed: 02/10/2023] Open
Abstract
Nuclear receptors (NR) are a class of proteins that are responsible for sensing steroid and thyroid hormones and certain other molecules. NR can therefore regulate the expression of specific genes and are associated with various diseases, which makes them essential drug targets. Approaches that can predict the inhibition ability of compounds for different NR targets should be particularly helpful for drug development. In this study, proteochemometric modelling was introduced to analyse the bioactivity between chemical compounds and NR targets. Results illustrated the ability of our PCM model for high-throughput NR-inhibitor screening after evaluation on both internal (AUC > 0.870) and external (AUC > 0.746) validation sets. Moreover, in silico predicted bioactive compounds were clustered according to structure similarity, and a series of representative molecular scaffolds can be derived for five major NR targets. Through scaffold analysis, the essential bioactive scaffolds of different NR targets can be detected and compared. Generally, the methods and molecular scaffolds proposed in this article can not only help the screening of potential therapeutic NR inhibitors but also guide future NR-related drug discovery.
Affiliation(s)
- Tianyi Qiu
- School of Life Sciences and Technology, Shanghai 10th People's Hospital, Tongji University, No. 1239 SiPing Road, Shanghai, China; The Institute of Biomedical Sciences, Fudan University, No. 138 Medical College Road, Shanghai, China
- Dingfeng Wu
- School of Life Sciences and Technology, Shanghai 10th People's Hospital, Tongji University, No. 1239 SiPing Road, Shanghai, China
- Jingxuan Qiu
- School of Life Sciences and Technology, Shanghai 10th People's Hospital, Tongji University, No. 1239 SiPing Road, Shanghai, China; School of Medical Instrument and Food Engineering, University of Shanghai for Science and Technology, No. 516 JunGong Road, Shanghai, China
- Zhiwei Cao
- School of Life Sciences and Technology, Shanghai 10th People's Hospital, Tongji University, No. 1239 SiPing Road, Shanghai, China
|
45
|
Jing Y, Bian Y, Hu Z, Wang L, Xie XQ. Deep Learning for Drug Design: an Artificial Intelligence Paradigm for Drug Discovery in the Big Data Era. AAPS J 2018; 20:58. [PMID: 29603063 PMCID: PMC6608578 DOI: 10.1208/s12248-018-0210-0] [Citation(s) in RCA: 128] [Impact Index Per Article: 21.3] [Received: 11/12/2017] [Accepted: 02/22/2018] [Indexed: 12/22/2022] Open
Abstract
Over the last decade, deep learning (DL) methods have been extremely successful and widely used to develop artificial intelligence (AI) in almost every domain, especially after DL achieved its proud record on computational Go. Compared to traditional machine learning (ML) algorithms, DL methods still have a long way to go to achieve recognition in small-molecule drug discovery and development, and there is still much work to do for the popularization and application of DL for research purposes, e.g., for small-molecule drug research and development. In this review, we mainly discuss several of the most powerful and mainstream architectures, including the convolutional neural network (CNN), recurrent neural network (RNN), and deep auto-encoder networks (DAENs), for supervised and unsupervised learning; summarize most of the representative applications in small-molecule drug design; and briefly introduce how DL methods were used in those applications. The pros and cons of DL methods, as well as the main challenges we need to tackle, are also emphasized.
Affiliation(s)
- Yankang Jing
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 335 Sutherland Drive, 206 Salk Pavilion, Pittsburgh, Pennsylvania, 15261, USA
- NIH National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, USA
- Drug Discovery Institute, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, USA
- Yuemin Bian
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 335 Sutherland Drive, 206 Salk Pavilion, Pittsburgh, Pennsylvania, 15261, USA
- NIH National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, USA
- Drug Discovery Institute, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, USA
- Ziheng Hu
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 335 Sutherland Drive, 206 Salk Pavilion, Pittsburgh, Pennsylvania, 15261, USA
- NIH National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, USA
- Drug Discovery Institute, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, USA
- Lirong Wang
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 335 Sutherland Drive, 206 Salk Pavilion, Pittsburgh, Pennsylvania, 15261, USA
- NIH National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, USA
- Drug Discovery Institute, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, USA
- Xiang-Qun Xie
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 335 Sutherland Drive, 206 Salk Pavilion, Pittsburgh, Pennsylvania, 15261, USA
- NIH National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, USA
- Drug Discovery Institute, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, USA
- Departments of Computational Biology and Structural Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, USA
|
46
|
Paricharak S, Méndez-Lucio O, Chavan Ravindranath A, Bender A, IJzerman AP, van Westen GJP. Data-driven approaches used for compound library design, hit triage and bioactivity modeling in high-throughput screening. Brief Bioinform 2018; 19:277-285. [PMID: 27789427 PMCID: PMC6018726 DOI: 10.1093/bib/bbw105] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/19/2016] [Revised: 09/26/2016] [Indexed: 12/25/2022] Open
Abstract
High-throughput screening (HTS) campaigns are routinely performed in pharmaceutical companies to explore activity profiles of chemical libraries for the identification of promising candidates for further investigation. With the aim of improving hit rates in these campaigns, data-driven approaches have been used to design relevant compound screening collections, enable effective hit triage and perform activity modeling for compound prioritization. Remarkable progress has been made in the activity modeling area since the recent introduction of large-scale bioactivity-based compound similarity metrics. This is evidenced by increased hit rates in iterative screening strategies and novel insights into compound mode of action obtained through activity modeling. Here, we provide an overview of the developments in data-driven approaches, elaborate on novel activity modeling techniques and screening paradigms explored and outline their significance in HTS.
Affiliation(s)
- Shardul Paricharak
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, United Kingdom
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden University, RA Leiden, The Netherlands
- Oscar Méndez-Lucio
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, United Kingdom
- Facultad de Química, Departamento de Farmacia, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City, Mexico
- Aakash Chavan Ravindranath
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, United Kingdom
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, United Kingdom
- Adriaan P IJzerman
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden University, RA Leiden, The Netherlands
- Gerard J P van Westen
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden University, RA Leiden, The Netherlands
|
47
|
Rakers C, Najnin RA, Polash AH, Takeda S, Brown J. Chemogenomic Active Learning's Domain of Applicability on Small, Sparse qHTS Matrices: A Study Using Cytochrome P450 and Nuclear Hormone Receptor Families. ChemMedChem 2018; 13:511-521. [DOI: 10.1002/cmdc.201700677] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/31/2017] [Revised: 12/04/2017] [Indexed: 01/21/2023]
Affiliation(s)
- Christin Rakers
- Institute of Transformative bio-Molecules, WPI-ITbM; Nagoya University; Furo-cho Chikusa-ku Nagoya 464-8602 Japan
- Rifat Ara Najnin
- Department of Radiation Genetics; Kyoto University Graduate School of Medicine; Sakyo, Yoshida-konoemachi Building D, 3F Kyoto 606-8501 Japan
- Ahsan Habib Polash
- Department of Radiation Genetics; Kyoto University Graduate School of Medicine; Sakyo, Yoshida-konoemachi Building D, 3F Kyoto 606-8501 Japan
- Shunichi Takeda
- Department of Radiation Genetics; Kyoto University Graduate School of Medicine; Sakyo, Yoshida-konoemachi Building D, 3F Kyoto 606-8501 Japan
- J.B. Brown
- Laboratory for Molecular Biosciences; Kyoto University Graduate School of Medicine; Yoshida-konoemachi Building E 606-8501 Kyoto Sakyo Japan
|
48
|
Abstract
High-throughput and high-content screening campaigns have resulted in the creation of large chemogenomic matrices. These matrices form the training data which is used to build ligand-target interaction models for pharmacological and chemical biology research. While academic, government, and industrial efforts continuously add to the ligand-target data pairs available for modeling, major research efforts are devoted to improving machine learning techniques to cope with the sparseness, heterogeneity, and size of available datasets as well as inherent noise and bias. This "race of arms" has led to the creation of algorithms to generate highly complex models with high prediction performance at the cost of training efficiency as well as interpretability. In contrast, recent studies have challenged the necessity for "big data" in chemogenomic modeling and found that models built on larger numbers of examples do not necessarily result in better predictive abilities. Automated adaptive selection of the training data (ligand-target instances) used for model creation can result in considerably smaller training sets that retain prediction performance on par with training using hundreds of thousands of data points. In this chapter, we describe the protocols used for one such iterative chemogenomic selection technique, including model construction and update as well as possible techniques for evaluations of constructed models and analysis of the iterative model construction.
Affiliation(s)
- Daniel Reker
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- J B Brown
- Life Science Informatics Research Unit, Laboratory of Molecular Biosciences, Kyoto University Graduate School of Medicine, Kyoto, Japan
|
49
|
Tresadern G, Trabanco AA, Pérez-Benito L, Overington JP, van Vlijmen HWT, van Westen GJP. Identification of Allosteric Modulators of Metabotropic Glutamate 7 Receptor Using Proteochemometric Modeling. J Chem Inf Model 2017; 57:2976-2985. [PMID: 29172488 PMCID: PMC5755953 DOI: 10.1021/acs.jcim.7b00338] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 06/06/2017] [Indexed: 01/07/2023]
Abstract
Proteochemometric modeling (PCM) is a computational approach that can be considered an extension of quantitative structure-activity relationship (QSAR) modeling, where a single model incorporates information for a family of targets and all the associated ligands instead of modeling activity versus one target. This is especially useful for situations where bioactivity data exists for similar proteins but is scarce for the protein of interest. Here we demonstrate the application of PCM to identify allosteric modulators of metabotropic glutamate (mGlu) receptors. Given our long-running interest in modulating mGlu receptor function, we compiled a matrix of compound-target bioactivity data. Some members of the mGlu family are well explored both internally and in the public domain, while there are far fewer examples of ligands for other targets such as the mGlu7 receptor. Using a PCM approach, mGlu7 receptor hits were found. In comparison to conventional single-target modeling, the identified hits were more diverse, had a better confirmation rate, and provide starting points for further exploration. We conclude that the robust structure-activity relationship from well-explored target family members translated to better quality hits for PCM compared to virtual screening (VS) based on a single target.
Affiliation(s)
- Gary Tresadern
- Computational Chemistry and Neuroscience Medicinal Chemistry, Janssen Research & Development, Janssen-Cilag S.A., Jarama 75A, 45007 Toledo, Spain
- Andres A. Trabanco
- Computational Chemistry and Neuroscience Medicinal Chemistry, Janssen Research & Development, Janssen-Cilag S.A., Jarama 75A, 45007 Toledo, Spain
- Laura Pérez-Benito
- Computational Chemistry and Neuroscience Medicinal Chemistry, Janssen Research & Development, Janssen-Cilag S.A., Jarama 75A, 45007 Toledo, Spain
- John P. Overington
- ChEMBL Group, EMBL-EBI, Wellcome Trust Genome Campus, CB10 1SD Hinxton, United Kingdom
- Herman W. T. van Vlijmen
- Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, B-2340 Beerse, Belgium
|
50
|
Learning epistatic interactions from sequence-activity data to predict enantioselectivity. J Comput Aided Mol Des 2017; 31:1085-1096. [DOI: 10.1007/s10822-017-0090-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/10/2017] [Accepted: 12/04/2017] [Indexed: 10/18/2022]
|