1. Juárez-Mercado KE, Gómez-Hernández MA, Salinas-Trujano J, Córdova-Bahena L, Espitia C, Pérez-Tapia SM, Medina-Franco JL, Velasco-Velázquez MA. Identification of SARS-CoV-2 Main Protease Inhibitors Using Chemical Similarity Analysis Combined with Machine Learning. Pharmaceuticals (Basel) 2024; 17:240. [PMID: 38399455 PMCID: PMC10892746 DOI: 10.3390/ph17020240] [Received: 01/13/2024] [Revised: 02/05/2024] [Accepted: 02/08/2024] [Indexed: 02/25/2024]
Abstract
SARS-CoV-2 Main Protease (Mpro) is an enzyme that cleaves the viral polyproteins translated from the viral genome, a step critical for viral replication, which makes Mpro a target for anti-SARS-CoV-2 drug development. Herein, we performed a large-scale virtual screening by comparing multiple structural descriptors of reference molecules with reported anti-coronavirus activity against a library of >17 million compounds. Further filtering with two machine learning algorithms identified eighteen computational hits as anti-SARS-CoV-2 compounds with high structural diversity and drug-like properties. The effects of twelve compounds on Mpro enzymatic activity were evaluated by fluorescence resonance energy transfer (FRET) assays. Compound 13 (ZINC13878776) significantly inhibited SARS-CoV-2 Mpro activity and was used as the reference for an experimental hit expansion. Its structural analogues 13a (ZINC4248385), 13b (ZINC13523222), and 13c (ZINC4248365) were tested as Mpro inhibitors and reduced the enzymatic activity of recombinant Mpro with the following order of potency: 13c > 13 > 13b > 13a. Their anti-SARS-CoV-2 activities were then evaluated in plaque reduction assays using Vero CCL81 cells. At subtoxic concentrations, compounds 13a, 13b, and 13c displayed in vitro antiviral activity with IC50 values in the mid-micromolar range. Compounds 13a-c could become lead compounds for the development of new Mpro inhibitors with improved activity against SARS-CoV-2.
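The similarity-based screening step summarized above can be illustrated with a minimal sketch. It assumes each molecule has already been encoded as the set of "on" bit positions of a binary fingerprint, and it scores library compounds by Tanimoto similarity with max-fusion over the reference set. The fingerprints, the 0.5 threshold, and the function names `tanimoto` and `screen_library` are hypothetical; the study itself compared multiple structural descriptors and is not reproduced here.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| between two sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: no evidence of similarity
    return len(a & b) / union


def screen_library(
    references: List[Set[int]],
    library: Dict[str, Set[int]],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Keep compounds whose best similarity to any reference is >= threshold.

    Uses max-fusion: each compound is scored by its single most similar
    reference. `references` must be non-empty.
    """
    if not references:
        raise ValueError("at least one reference fingerprint is required")
    hits = []
    for compound_id, fp in library.items():
        best = max(tanimoto(fp, ref) for ref in references)
        if best >= threshold:
            hits.append((compound_id, best))
    # Highest similarity first; ties broken by identifier so the order is stable.
    hits.sort(key=lambda item: (-item[1], item[0]))
    return hits


if __name__ == "__main__":
    refs = [{1, 2, 3, 4}, {10, 11, 12}]  # hypothetical reference fingerprints
    lib = {
        "cmpd_A": {1, 2, 3, 5},  # 3 shared bits out of 5 with the first reference
        "cmpd_B": {10, 11, 12},  # identical to the second reference
        "cmpd_C": {20, 21},      # unrelated to both references
    }
    print(screen_library(refs, lib))  # → [('cmpd_B', 1.0), ('cmpd_A', 0.6)]
```

Max-fusion is only one way to combine scores against several references; mean similarity or per-descriptor consensus are common alternatives, and in practice fingerprints would come from a cheminformatics toolkit rather than hand-written bit sets.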
Affiliation(s)
- Milton Abraham Gómez-Hernández
- School of Medicine, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Graduate Program in Biomedical Sciences, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Juana Salinas-Trujano
- Research and Development in Biotherapeutics Unit (UDIBI), National School of Biological Sciences, Instituto Politécnico Nacional, Mexico City 11350, Mexico
- National Laboratory for Specialized Services of Investigation, Development and Innovation (I+D+i) for Pharma Chemicals and Biotechnological Products, LANSEIDI-FarBiotech-CONACHyT, Mexico City 11350, Mexico
- Luis Córdova-Bahena
- School of Medicine, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- National Council of Humanities, Science and Technology (CONAHCYT), Mexico City 03940, Mexico
- Clara Espitia
- Immunology Department, Institute for Biomedical Research, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Sonia Mayra Pérez-Tapia
- Research and Development in Biotherapeutics Unit (UDIBI), National School of Biological Sciences, Instituto Politécnico Nacional, Mexico City 11350, Mexico
- National Laboratory for Specialized Services of Investigation, Development and Innovation (I+D+i) for Pharma Chemicals and Biotechnological Products, LANSEIDI-FarBiotech-CONACHyT, Mexico City 11350, Mexico
- Immunology Department, National School of Biological Sciences, Instituto Politécnico Nacional, Mexico City 11350, Mexico
- José L. Medina-Franco
- DIFACQUIM Research Group, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
2. Boldini D, Grisoni F, Kuhn D, Friedrich L, Sieber SA. Practical guidelines for the use of gradient boosting for molecular property prediction. J Cheminform 2023; 15:73. [PMID: 37641120 PMCID: PMC10464382 DOI: 10.1186/s13321-023-00743-7] [Received: 03/31/2023] [Accepted: 08/09/2023] [Indexed: 08/31/2023]
Abstract
Decision tree ensembles are among the most robust, high-performing, and computationally efficient machine learning approaches for quantitative structure-activity relationship (QSAR) modeling. Among them, gradient boosting has recently garnered particular attention for its performance in data science competitions, virtual screening campaigns, and bioactivity prediction. However, several variants of gradient boosting exist, the most popular being XGBoost, LightGBM, and CatBoost. Our study provides the first comprehensive comparison of these approaches for QSAR. To this end, we trained 157,590 gradient boosting models, which were evaluated on 16 datasets and 94 endpoints, comprising 1.4 million compounds in total. Our results show that XGBoost generally achieves the best predictive performance, while LightGBM requires the least training time, especially for larger datasets. In terms of feature importance, the models surprisingly rank molecular features differently, reflecting differences in regularization techniques and decision tree structures. Thus, expert knowledge must always be employed when evaluating data-driven explanations of bioactivity. Furthermore, our results show that the relevance of each hyperparameter varies greatly across datasets and that it is crucial to optimize as many hyperparameters as possible to maximize predictive performance. In conclusion, our study provides the first set of guidelines for cheminformatics practitioners to effectively train, optimize, and evaluate gradient boosting models for virtual screening and QSAR applications.
Affiliation(s)
- Davide Boldini
- Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, Garching bei Munich, Germany
- Francesca Grisoni
- Department of Biomedical Engineering, Institute for Complex Molecular Sciences, Eindhoven University of Technology, Eindhoven, The Netherlands
- Centre for Living Technologies, Alliance TU/E, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
- Stephan A Sieber
- Department of Bioscience, Center for Functional Protein Assemblies (CPA), Technical University of Munich, Garching bei Munich, Germany
3. Mukherjee G, Braka A, Wu S. Quantifying Functional-Group-like Structural Fragments in Molecules and Its Applications in Drug Design. J Chem Inf Model 2023; 63:2073-2083. [PMID: 36881497 DOI: 10.1021/acs.jcim.3c00050] [Indexed: 03/08/2023]
Abstract
A functional group in a molecule is a structural fragment, consisting of a single atom or a few atoms, that imparts reactivity to the molecule. Defining functional groups is therefore crucial in chemistry for predicting the properties and reactivities of molecules. However, the literature offers no established method for defining functional groups based on reactivity parameters. In this work, we addressed this issue by designing a set of predefined structural fragments that incorporate reactivity parameters such as electron conjugation and ring strain. The approach uses bond orders and atom connectivities to quantify the presence of these fragments within an organic molecule from its input molecular coordinates. To assess its effectiveness, we performed a case study showing the benefits of using these newly designed structural fragments, instead of traditional fingerprint-based methods, to group potential COX1/COX2 inhibitors by screening an approved-drug library against the aspirin molecule. For ternary classification of rat oral LD50, the structural fragment-based model performed similarly to the fingerprint-based models. For regression of aqueous solubility (log S), our approach outperformed the fingerprint-based model.
Affiliation(s)
- Goutam Mukherjee
- R&D Center, PharmCADD Co. Ltd., 12F, 331, Jungang-daero, Dong-gu, Busan 48792, Republic of Korea
- Abdennour Braka
- R&D Center, PharmCADD Co. Ltd., 12F, 331, Jungang-daero, Dong-gu, Busan 48792, Republic of Korea
- Sangwook Wu
- R&D Center, PharmCADD Co. Ltd., 12F, 331, Jungang-daero, Dong-gu, Busan 48792, Republic of Korea; Department of Physics, Pukyong National University, Busan 48513, Republic of Korea
4. Ferreira LLG, Andricopulo AD. Editorial: Chemoinformatics Approaches to Structure- and Ligand-Based Drug Design, Volume II. Front Pharmacol 2022; 13:945747. [PMID: 35847004 PMCID: PMC9277505 DOI: 10.3389/fphar.2022.945747] [Received: 05/16/2022] [Accepted: 06/14/2022] [Indexed: 12/03/2022]
5. Lovrić M, Đuričić T, Tran HTN, Hussain H, Lacić E, Rasmussen MA, Kern R. Should We Embed in Chemistry? A Comparison of Unsupervised Transfer Learning with PCA, UMAP, and VAE on Molecular Fingerprints. Pharmaceuticals (Basel) 2021; 14:758. [PMID: 34451855 PMCID: PMC8400160 DOI: 10.3390/ph14080758] [Received: 06/30/2021] [Revised: 07/21/2021] [Accepted: 07/22/2021] [Indexed: 02/07/2023]
Abstract
Methods for dimensionality reduction are contributing significantly to knowledge generation in high-dimensional modeling scenarios across many disciplines. A lower-dimensional representation (also called an embedding) requires fewer computing resources in downstream machine learning tasks, leading to faster training, lower complexity, and statistical flexibility. In this work, we investigate the utility of three prominent unsupervised embedding techniques (principal component analysis [PCA], uniform manifold approximation and projection [UMAP], and variational autoencoders [VAEs]) for solving classification tasks in the domain of toxicology. To this end, we compare these embedding techniques against a set of molecular fingerprint-based models that use no additional preprocessing of features. Inspired by the success of transfer learning in several fields, we further study the performance of the embedders when trained on an external dataset of chemical compounds. To better understand their characteristics, we evaluate the embedders with different embedding dimensionalities and with different sizes of the external dataset. Our findings show that the recently popularized UMAP approach can be used alongside established techniques such as PCA and VAE as a pre-compression technique in the toxicology domain. Nevertheless, the VAE generative model shows an advantage in pre-compressing the data with respect to classification accuracy.
Affiliation(s)
- Mario Lovrić
- Know-Center, Inffeldgasse 13, 8010 Graz, Austria
- Centre for Applied Bioanthropology, Institute for Anthropological Research, 10000 Zagreb, Croatia
- Tomislav Đuričić
- Know-Center, Inffeldgasse 13, 8010 Graz, Austria
- Institute of Interactive Systems and Data Science, Graz University of Technology, Inffeldgasse 16C, 8010 Graz, Austria
- Han T. N. Tran
- Know-Center, Inffeldgasse 13, 8010 Graz, Austria
- Hussain Hussain
- Know-Center, Inffeldgasse 13, 8010 Graz, Austria
- Institute of Interactive Systems and Data Science, Graz University of Technology, Inffeldgasse 16C, 8010 Graz, Austria
- Emanuel Lacić
- Know-Center, Inffeldgasse 13, 8010 Graz, Austria
- Morten A. Rasmussen
- Copenhagen Studies on Asthma in Childhood, Herlev-Gentofte Hospital, University of Copenhagen, Ledreborg Alle 34, 2820 Gentofte, Denmark
- Department of Food Science, University of Copenhagen, Rolighedsvej 26, 1958 Frederiksberg, Denmark
- Roman Kern
- Know-Center, Inffeldgasse 13, 8010 Graz, Austria
- Institute of Interactive Systems and Data Science, Graz University of Technology, Inffeldgasse 16C, 8010 Graz, Austria
6. Bannigan P, Aldeghi M, Bao Z, Häse F, Aspuru-Guzik A, Allen C. Machine learning directed drug formulation development. Adv Drug Deliv Rev 2021; 175:113806. [PMID: 34019959 DOI: 10.1016/j.addr.2021.05.016] [Received: 01/13/2021] [Revised: 03/31/2021] [Accepted: 05/14/2021] [Indexed: 12/12/2022]
Abstract
Machine learning (ML) has enabled ground-breaking advances in the healthcare and pharmaceutical sectors, from improvements in cancer diagnosis to the identification of novel drugs and drug targets and the prediction of protein structures. Drug formulation is an essential stage in the discovery and development of new medicines. Through the design of drug formulations, pharmaceutical scientists can engineer important properties of new medicines, such as improved bioavailability and targeted delivery. The traditional approach to drug formulation development relies on iterative trial and error, requiring many resource-intensive and time-consuming in vitro and in vivo experiments. This review introduces the basic concepts of ML-directed workflows and discusses how these tools can aid the development of various types of drug formulations. ML-directed drug formulation development offers unparalleled opportunities to fast-track development efforts, uncover new materials and innovative formulations, and generate new knowledge in drug formulation science. The review also highlights the latest artificial intelligence (AI) technologies, such as generative models, Bayesian deep learning, reinforcement learning, and self-driving laboratories, which have been gaining momentum in drug discovery and chemistry and have potential in drug formulation development.
Affiliation(s)
- Pauric Bannigan
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON M5S 3M2, Canada
- Matteo Aldeghi
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON M5S 3H6, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 3H6, Canada; Vector Institute for Artificial Intelligence, Toronto, ON M5S 1M1, Canada
- Zeqing Bao
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON M5S 3M2, Canada
- Florian Häse
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON M5S 3H6, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 3H6, Canada; Vector Institute for Artificial Intelligence, Toronto, ON M5S 1M1, Canada
- Alán Aspuru-Guzik
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON M5S 3H6, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 3H6, Canada; Vector Institute for Artificial Intelligence, Toronto, ON M5S 1M1, Canada; Lebovic Fellow, Canadian Institute for Advanced Research, Toronto, ON M5S 1M1, Canada
- Christine Allen
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON M5S 3M2, Canada
7. Daley SK, Cordell GA. Alkaloids in Contemporary Drug Discovery to Meet Global Disease Needs. Molecules 2021; 26:3800. [PMID: 34206470 PMCID: PMC8270272 DOI: 10.3390/molecules26133800] [Received: 04/12/2021] [Revised: 06/05/2021] [Accepted: 06/14/2021] [Indexed: 12/15/2022]
Abstract
An overview is presented of the well-established role of alkaloids in drug discovery, the application of more sustainable chemical and biological approaches, and the implementation of information systems to address the current challenges in meeting global disease needs. The review discusses the need for a new international paradigm for natural product discovery and development for the treatment of multidrug-resistant organisms and of rare and neglected tropical diseases in the era of the Fourth Industrial Revolution and the Quintuple Helix.
Affiliation(s)
- Geoffrey A. Cordell
- Natural Products Inc., Evanston, IL 60202, USA
- Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, FL 32610, USA
8. Moreira-Filho JT, Silva AC, Dantas RF, Gomes BF, Souza Neto LR, Brandao-Neto J, Owens RJ, Furnham N, Neves BJ, Silva-Junior FP, Andrade CH. Schistosomiasis Drug Discovery in the Era of Automation and Artificial Intelligence. Front Immunol 2021; 12:642383. [PMID: 34135888 PMCID: PMC8203334 DOI: 10.3389/fimmu.2021.642383] [Received: 12/15/2020] [Accepted: 04/30/2021] [Indexed: 12/20/2022]
Abstract
Schistosomiasis is a parasitic disease caused by trematode worms of the genus Schistosoma and affects over 200 million people worldwide. The control and treatment of this neglected tropical disease rely on a single drug, praziquantel, which raises concerns about the development of drug resistance. This, together with the lack of efficacy of praziquantel against juvenile worms, highlights the urgent need for new antischistosomal therapies. In this review, we focus on innovative approaches to identifying antischistosomal drug candidates, including automated assays, fragment-based screening, and computer-aided and artificial intelligence-based computational methods. We highlight current developments that may help optimize research outputs and lead to more effective drugs for this highly prevalent disease through a more cost-effective drug discovery effort.
Affiliation(s)
- José T. Moreira-Filho
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
- Arthur C. Silva
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
- Rafael F. Dantas
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Barbara F. Gomes
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Lauro R. Souza Neto
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Jose Brandao-Neto
- Diamond Light Source Ltd., Didcot, United Kingdom
- Research Complex at Harwell, Didcot, United Kingdom
- Raymond J. Owens
- The Rosalind Franklin Institute, Harwell, United Kingdom
- Division of Structural Biology, The Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Nicholas Furnham
- Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Bruno J. Neves
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
- Floriano P. Silva-Junior
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Carolina H. Andrade
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
9. Nuzzo A, Saha S, Berg E, Jayawickreme C, Tocker J, Brown JR. Expanding the drug discovery space with predicted metabolite-target interactions. Commun Biol 2021; 4:288. [PMID: 33674782 PMCID: PMC7935942 DOI: 10.1038/s42003-021-01822-x] [Received: 09/02/2020] [Accepted: 02/05/2021] [Indexed: 02/07/2023]
Abstract
Metabolites produced in the human gut are known modulators of host immunity. However, large-scale identification of metabolite-host receptor interactions remains a daunting challenge. Here, we used computational approaches to identify 983 potential metabolite-target interactions using the Inflammatory Bowel Disease (IBD) cohort dataset of the Human Microbiome Project 2 (HMP2). Using a consensus of multiple machine learning methods, we ranked metabolites by their importance to IBD, then applied virtual ligand-based screening to identify possible human targets and added evidence from compound assays, differential gene expression, pathway enrichment, and genome-wide association studies. We confirmed known metabolite-target pairs such as nicotinic acid-GPR109a and linoleoyl ethanolamide-GPR119 and inferred interactions of interest, including oleanolic acid-GABRG2 and alpha-CEHC-THRB. Eleven metabolites were tested for bioactivity in vitro using human primary cell types. By expanding the universe of possible microbial metabolite-host protein interactions, we provide multiple drug targets for potential immunotherapies.
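The abstract does not say how the consensus of machine learning methods was formed. One common and simple scheme, shown here purely as an assumption, is mean-rank (Borda-style) aggregation: each method's importance scores are converted to ranks, and metabolites are ordered by their average rank. The function name `consensus_rank`, the method names, and all scores below are hypothetical.

```python
from typing import Dict, List


def consensus_rank(scores_by_method: Dict[str, Dict[str, float]]) -> List[str]:
    """Order items by their mean rank across methods (rank 1 = most important).

    Each inner dict maps an item (e.g. a metabolite name) to an importance
    score where higher means more important. Every method must score the
    same set of items.
    """
    if not scores_by_method:
        return []
    items = set(next(iter(scores_by_method.values())))
    rank_sums = {item: 0 for item in items}
    for method, scores in scores_by_method.items():
        if set(scores) != items:
            raise ValueError(f"method {method!r} scores a different set of items")
        # Descending score; alphabetical tie-break keeps ranks deterministic.
        ordered = sorted(items, key=lambda it: (-scores[it], it))
        for rank, item in enumerate(ordered, start=1):
            rank_sums[item] += rank
    n_methods = len(scores_by_method)
    mean_rank = {item: total / n_methods for item, total in rank_sums.items()}
    # Lowest mean rank first; alphabetical tie-break again for stability.
    return sorted(items, key=lambda it: (mean_rank[it], it))


if __name__ == "__main__":
    # Hypothetical importance scores from three hypothetical models.
    scores = {
        "random_forest": {"m1": 0.9, "m2": 0.5, "m3": 0.1},
        "lasso": {"m1": 0.2, "m2": 0.8, "m3": 0.1},
        "svm": {"m1": 0.7, "m2": 0.6, "m3": 0.3},
    }
    print(consensus_rank(scores))  # → ['m1', 'm2', 'm3']
```

Working on ranks rather than raw scores sidesteps the fact that different models report importances on incomparable scales; the trade-off is that the magnitude of a large score gap is discarded.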
Affiliation(s)
- Andrea Nuzzo
- GlaxoSmithKline Pharma R&D, 1250 S. Collegeville Rd, Collegeville, PA, 19426-0989, USA
- Somdutta Saha
- GlaxoSmithKline Pharma R&D, 1250 S. Collegeville Rd, Collegeville, PA, 19426-0989, USA
- EMD Serono Research & Development Institute, Inc. 45A Middlesex Turnpike, Billerica, MA, 01821, USA
- Ellen Berg
- Eurofins Discovery, 111 Anza Boulevard, Burlingame, CA, 94010, USA
- Channa Jayawickreme
- GlaxoSmithKline Pharma R&D, 1250 S. Collegeville Rd, Collegeville, PA, 19426-0989, USA
- Joel Tocker
- GlaxoSmithKline Pharma R&D, 1250 S. Collegeville Rd, Collegeville, PA, 19426-0989, USA
- James R Brown
- GlaxoSmithKline Pharma R&D, 1250 S. Collegeville Rd, Collegeville, PA, 19426-0989, USA
- Kaleido Biosciences, Inc. 65 Hayden Avenue, Lexington, MA, 02421, USA
10. Piras A, Ehlert C, Gryn'ova G. Sensing and sensitivity: Computational chemistry of graphene-based sensors. WIREs Computational Molecular Science 2021. [DOI: 10.1002/wcms.1526] [Indexed: 12/11/2022]
Affiliation(s)
- Anna Piras
- Heidelberg Institute for Theoretical Studies (HITS gGmbH) and Interdisciplinary Center for Scientific Computing (IWR) Heidelberg University Heidelberg Germany
- Christopher Ehlert
- Heidelberg Institute for Theoretical Studies (HITS gGmbH) and Interdisciplinary Center for Scientific Computing (IWR) Heidelberg University Heidelberg Germany
- Ganna Gryn'ova
- Heidelberg Institute for Theoretical Studies (HITS gGmbH) and Interdisciplinary Center for Scientific Computing (IWR) Heidelberg University Heidelberg Germany
11. Bjerrum EJ, Thakkar A, Engkvist O. Artificial applicability labels for improving policies in retrosynthesis prediction. Machine Learning: Science and Technology 2021. [DOI: 10.1088/2632-2153/abcf90] [Indexed: 11/11/2022]
Abstract
Automated retrosynthetic planning is a research area of increasing importance. Automated reaction-template extraction from large datasets, combined with neural-network-enhanced tree-search algorithms, can find plausible routes to target compounds in seconds. However, the current method for training neural networks to predict suitable templates for a given target product yields many predictions that are not applicable in silico: most of the top 50 suggested templates cannot be applied to the target molecule to perform the virtual reaction. Here, we describe how to generate data for, and train, a neural network policy that predicts whether templates are applicable. First, we generate a massive training dataset by applying each retrosynthetic template to each product in our reaction database. Second, we train a neural network that achieves near-perfect prediction of the applicability labels on a held-out test set. The trained network is then combined with a policy model trained to predict and prioritize templates using the labels from the original dataset. The combined model outperformed the policy model alone in a route-finding task using 1700 compounds from our internal drug-discovery projects.
12. Gupta A, Choudhary M, Mohanty SK, Mittal A, Gupta K, Arya A, Kumar S, Katyayan N, Dixit NK, Kalra S, Goel M, Sahni M, Singhal V, Mishra T, Sengupta D, Ahuja G. Machine-OlF-Action: A unified framework for developing and interpreting machine-learning models for chemosensory research. Bioinformatics 2021; 37:1769-1771. [PMID: 33416866 DOI: 10.1093/bioinformatics/btaa1104] [Received: 07/23/2020] [Revised: 12/25/2020] [Accepted: 12/29/2020] [Indexed: 12/15/2022]
Abstract
Machine learning-based techniques are emerging as state-of-the-art methods in chemoinformatics for selectively, effectively, and speedily identifying biologically relevant molecules in large databases. A multitude of such techniques have been proposed, but because of their sparse availability and their dependence on advanced computational literacy, their wider adoption faces challenges, at least in the context of G-protein-coupled receptor (GPCR)-associated chemosensory research. Here we report Machine-OlF-Action (MOA), a user-friendly, open-source computational framework that uses user-supplied SMILES (simplified molecular-input line-entry system) strings of chemicals, along with their activation status, to build classification models. MOA integrates a number of popular chemical databases collectively harboring ∼103 million chemical moieties. MOA also facilitates customized screening of user-supplied chemical datasets. A key feature of MOA is its ability to embed molecules based on the similarity of their local neighborhood, using the state-of-the-art model interpretability framework LIME. We demonstrate the utility of MOA in identifying previously unreported agonists for the human and mouse olfactory receptors OR1A1 and MOR174-9 by leveraging the chemical features of their known agonists and non-agonists. In summary, we develop an ML-powered software playground for performing supervised learning tasks involving chemical compounds. AVAILABILITY AND IMPLEMENTATION: MOA is available for Windows, Mac, and Linux operating systems. It is accessible at https://ahuja-lab.in/. Source code, a user manual, a step-by-step guide, and support are available on GitHub (https://github.com/the-ahuja-lab/Machine-Olf-Action). For results, reproducibility, and hyperparameters, refer to the Supplementary Notes.
Affiliation(s)
- Anku Gupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Mohit Choudhary
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Sanjay Kumar Mohanty
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Aayushi Mittal
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Krishan Gupta
- Department of Computer Science and Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Aditya Arya
- Pathfinder Research and Training Foundation, 30/7 and 8, Knowledge Park III, Greater Noida, Uttar Pradesh - 201308, India
- Suvendu Kumar
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Nikhil Katyayan
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Nilesh Kumar Dixit
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Siddhant Kalra
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Manshi Goel
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Megha Sahni
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Vrinda Singhal
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
- Tripti Mishra
- Pathfinder Research and Training Foundation, 30/7 and 8, Knowledge Park III, Greater Noida, Uttar Pradesh - 201308, India
- Debarka Sengupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India; Department of Computer Science and Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India; Centre for Artificial Intelligence, Indraprastha Institute of Information Technology, Okhla Phase III, New Delhi, 110020, India; Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Australia
- Gaurav Ahuja
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
13. Borah P, Hazarika S, Deka S, Venugopala KN, Nair AB, Attimarad M, Sreeharsha N, Mailavaram RP. Application of Advanced Technologies in Natural Product Research: A Review with Special Emphasis on ADMET Profiling. Curr Drug Metab 2020; 21:751-767. [PMID: 32664837 DOI: 10.2174/1389200221666200714144911] [Received: 03/31/2020] [Revised: 05/12/2020] [Accepted: 06/17/2020] [Indexed: 12/14/2022]
Abstract
The successful conversion of natural products (NPs) into lead compounds and novel pharmacophores has emboldened researchers to pursue drug discovery with much more enthusiasm. However, the loss of bioactive NPs resulting from an overabundance of metabolites and their wide dynamic range has created a bottleneck in NP research. Similarly, multidimensional challenges, including the evaluation of pharmacokinetics, pharmacodynamics, and safety parameters, remain a concern. Technological advances have moved traditional natural product research toward computer-based assessment, with encouraging claims about its efficiency in drug discovery. Early attention to the quality of NPs, through parallel ADMET profiling, may reduce the attrition rate of drug candidates. This article reviews the status, challenges, opportunities, and integration of advanced technologies in natural product research. Emphasis is laid on current and future directions in the application of newer technologies to early-stage ADMET profiling of bioactive moieties from natural sources. Combinatorial approaches to ADMET profiling can be expected to strengthen natural product-based drug discovery in the near future.
Affiliation(s)
- Pobitra Borah
- Pratiksha Institute of Pharmaceutical Sciences, Chandrapur Road, Panikhaiti, Guwahati-26, Assam, India
- Sangeeta Hazarika
- Department of Pharmaceutical Engineering & Technology, Indian Institute of Technology (Banaras Hindu University), Varanasi, Uttar Pradesh-221005, India
- Satyendra Deka
- Pratiksha Institute of Pharmaceutical Sciences, Chandrapur Road, Panikhaiti, Guwahati-26, Assam, India
- Katharigatta N Venugopala
- Department of Pharmaceutical Sciences, College of Clinical Pharmacy, King Faisal University, Al-Ahsa-31982, Saudi Arabia
- Anroop B Nair
- Department of Pharmaceutical Sciences, College of Clinical Pharmacy, King Faisal University, Al-Ahsa-31982, Saudi Arabia
- Mahesh Attimarad
- Department of Pharmaceutical Sciences, College of Clinical Pharmacy, King Faisal University, Al-Ahsa-31982, Saudi Arabia
- Nagaraja Sreeharsha
- Department of Pharmaceutical Sciences, College of Clinical Pharmacy, King Faisal University, Al-Ahsa-31982, Saudi Arabia
- Raghu P Mailavaram
- Department of Pharmaceutical Chemistry, Shri Vishnu College of Pharmacy, Vishnupur (Affiliated to Andhra University), Bhimavaram, W.G. Dist., Andhra Pradesh, India
14
Vázquez J, López M, Gibert E, Herrero E, Luque FJ. Merging Ligand-Based and Structure-Based Methods in Drug Discovery: An Overview of Combined Virtual Screening Approaches. Molecules 2020; 25:E4723. [PMID: 33076254 PMCID: PMC7587536 DOI: 10.3390/molecules25204723] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Received: 09/07/2020] [Revised: 10/06/2020] [Accepted: 10/11/2020] [Indexed: 12/20/2022]
Abstract
Virtual screening (VS) is an outstanding cornerstone in the drug discovery pipeline. A variety of computational approaches, generally classified as ligand-based (LB) and structure-based (SB) techniques, exploit key structural and physicochemical properties of ligands and targets to enable the screening of virtual libraries in the search for active compounds. Though LB and SB methods have found widespread application in the discovery of novel drug-like candidates, their complementary natures have stimulated continued efforts toward the development of hybrid strategies that combine LB and SB techniques, integrating them in a holistic computational framework that exploits the available information on both ligand and target to enhance the success of drug discovery projects. In this review, we analyze the main strategies and concepts that have emerged in recent years for defining hybrid LB + SB computational schemes in VS studies. In particular, attention is focused on the combination of molecular similarity and docking, illustrated with selected applications taken from the literature.
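As a rough illustration of combining ligand-based and structure-based evidence, the sketch below fuses a similarity ranking with a docking ranking by averaging each compound's two ranks. This is an assumption chosen for brevity, not a method taken from the review: rank averaging is only one of several possible fusion schemes, and the compound IDs and scores are hypothetical.

```python
from typing import Dict, List


def ranks(scores: Dict[str, float], higher_is_better: bool) -> Dict[str, int]:
    """Map each compound ID to its 1-based rank under a single scoring method."""
    ordered = sorted(scores, key=lambda cid: scores[cid], reverse=higher_is_better)
    return {cid: position + 1 for position, cid in enumerate(ordered)}


def rank_fusion(similarity: Dict[str, float], docking: Dict[str, float]) -> List[str]:
    """Order compounds by the mean of their ligand-based and structure-based ranks.

    similarity: ligand-based score, higher is better (e.g. Tanimoto to a known active).
    docking: structure-based score, lower is better (e.g. an estimated binding energy).
    Only compounds scored by both methods are ranked; ties are broken by ID.
    """
    lb = ranks(similarity, higher_is_better=True)
    sb = ranks(docking, higher_is_better=False)
    shared = lb.keys() & sb.keys()
    return sorted(shared, key=lambda cid: ((lb[cid] + sb[cid]) / 2, cid))


# Hypothetical scores for four library compounds.
similarity = {"cmpd_A": 0.82, "cmpd_B": 0.45, "cmpd_C": 0.67, "cmpd_D": 0.30}
docking = {"cmpd_A": -8.9, "cmpd_B": -9.4, "cmpd_C": -8.8, "cmpd_D": -6.0}
print(rank_fusion(similarity, docking))  # ['cmpd_A', 'cmpd_B', 'cmpd_C', 'cmpd_D']
```

Working with ranks avoids having to put similarity coefficients and docking energies on a common scale. Weighted score sums, or a sequential filter that applies similarity first and docking second, are common alternatives.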
Affiliation(s)
- Javier Vázquez
- Pharmacelera, Plaça Pau Vila, 1, Sector C 2a, Edificio Palau de Mar, 08039 Barcelona, Spain
- Department of Nutrition, Food Science and Gastronomy, Faculty of Pharmacy and Food Sciences, Institute of Biomedicine (IBUB), and Institute of Theoretical and Computational Chemistry (IQTC-UB), University of Barcelona, Av. Prat de la Riba 171, E-08921 Santa Coloma de Gramenet, Spain
- Manel López
- AB Science, Parc Scientifique de Luminy, Zone Luminy Enterprise, Case 922, 163 Av. de Luminy, 13288 Marseille, France
- Enric Gibert
- Pharmacelera, Plaça Pau Vila, 1, Sector C 2a, Edificio Palau de Mar, 08039 Barcelona, Spain
- Enric Herrero
- Pharmacelera, Plaça Pau Vila, 1, Sector C 2a, Edificio Palau de Mar, 08039 Barcelona, Spain
- F. Javier Luque
- Department of Nutrition, Food Science and Gastronomy, Faculty of Pharmacy and Food Sciences, Institute of Biomedicine (IBUB), and Institute of Theoretical and Computational Chemistry (IQTC-UB), University of Barcelona, Av. Prat de la Riba 171, E-08921 Santa Coloma de Gramenet, Spain
15
Arús-Pous J, Patronov A, Bjerrum EJ, Tyrchan C, Reymond JL, Chen H, Engkvist O. SMILES-based deep generative scaffold decorator for de-novo drug design. J Cheminform 2020; 12:38. [PMID: 33431013 PMCID: PMC7260788 DOI: 10.1186/s13321-020-00441-8] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 02/04/2020] [Accepted: 05/16/2020] [Indexed: 12/21/2022]
Abstract
Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e., partially built molecules with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This approach is possible thanks to a new molecular set pre-processing algorithm that exhaustively slices all possible combinations of acyclic bonds of every molecule, combinatorially obtaining a large number of scaffolds with their respective decorations. Moreover, the algorithm serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described. First, models were trained with a training set obtained from a small set of Dopamine Receptor D2 (DRD2) active modulators and were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). In this case, the resulting scaffolds with decorations were filtered to keep only those that included fragment-like decorations. This filtering process allowed models trained with this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were able to decorate molecules using specific knowledge without the need to add it through other techniques, such as reinforcement learning. We envision that this architecture will become a useful addition to the existing architectures for de novo molecular generation.
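The slicing step described in this abstract can be pictured on a molecular graph. The sketch below is a simplified illustration, not the authors' code. It treats a molecule as a hypothetical graph of integer atom indices, finds the bonds that lie outside every ring (removing one splits the graph), and, for every combination of up to `max_cuts` such bonds, returns the resulting fragments. Deciding which fragment is the scaffold and which are decorations, labelling attachment points, and RECAP-style filtering are all omitted. A real pipeline would operate on molecules through a cheminformatics toolkit.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Bond = Tuple[int, int]


def components(atoms: Set[int], bonds: Set[Bond]) -> List[FrozenSet[int]]:
    """Connected components (fragments) of a molecular graph."""
    neighbours: Dict[int, Set[int]] = {atom: set() for atom in atoms}
    for u, v in bonds:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen: Set[int] = set()
    fragments: List[FrozenSet[int]] = []
    for start in sorted(atoms):
        if start in seen:
            continue
        stack, fragment = [start], set()
        while stack:  # iterative depth-first search
            atom = stack.pop()
            if atom not in fragment:
                fragment.add(atom)
                stack.extend(neighbours[atom] - fragment)
        seen |= fragment
        fragments.append(frozenset(fragment))
    return fragments


def acyclic_bonds(atoms: Set[int], bonds: Set[Bond]) -> List[Bond]:
    """Bonds outside every ring: removing one increases the number of fragments."""
    base = len(components(atoms, bonds))
    return sorted(b for b in bonds if len(components(atoms, bonds - {b})) > base)


def slice_all(atoms: Set[int], bonds: Set[Bond], max_cuts: int = 2):
    """Cut every combination of 1..max_cuts acyclic bonds and collect the fragments."""
    cuttable = acyclic_bonds(atoms, bonds)
    results = []
    for k in range(1, max_cuts + 1):
        for cut in combinations(cuttable, k):
            results.append((cut, components(atoms, bonds - set(cut))))
    return results


# Hypothetical toy graph: ring 0-1-2 with chain 0-3-5 and substituent 1-4.
atoms = {0, 1, 2, 3, 4, 5}
bonds = {(0, 1), (1, 2), (0, 2), (0, 3), (1, 4), (3, 5)}
for cut, fragments in slice_all(atoms, bonds):
    print(cut, [sorted(f) for f in fragments])
```

Ring bonds are never cut, so each combination of k acyclic bonds yields k + 1 fragments. The number of slicings grows combinatorially with the number of acyclic bonds, which is why the abstract describes the procedure as a data augmentation technique.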
Affiliation(s)
- Josep Arús-Pous
- Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Atanas Patronov
- Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Esben Jannik Bjerrum
- Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Christian Tyrchan
- Medicinal Chemistry, Respiratory Inflammation, and Autoimmune (RIA), BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Hongming Chen
- Chemistry and Chemical Biology Centre, Guangzhou Regenerative Medicine and Health-Guangdong Laboratory, Guangzhou, China
- Ola Engkvist
- Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
16
Freitas e Silva KS, Silva LC, Gonçales RA, Neves BJ, Soares CM, Pereira M. Setting New Routes for Antifungal Drug Discovery Against Pathogenic Fungi. Curr Pharm Des 2020; 26:1509-1520. [DOI: 10.2174/1381612826666200317125956] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/12/2019] [Accepted: 02/11/2020] [Indexed: 01/08/2023]
Abstract
Fungal diseases are life-threatening and responsible for millions of deaths around the world. Fungal pathogens cause high morbidity and mortality. Current antifungal treatment comprises drugs such as azoles, echinocandins, and polyenes, and cure is not guaranteed. In addition, such drugs are associated with severe side effects, and treatment lasts for an extended period. Thus, setting new routes for the discovery of effective and safe antifungal drugs should be a priority within the health care system. The discovery of alternative, efficient antifungal drugs with fewer side effects is time-consuming and remains a challenge. Natural products can be a source of antifungals and can be used in combinatorial therapy. The most important natural products are antifungal peptides, antifungal lectins, antifungal plants, and fungal secondary metabolites. Several proteins, enzymes, and metabolic pathways could be targets for the discovery of efficient inhibitor compounds; recently, heat shock proteins, calcineurin, salinomycin, the trehalose biosynthetic pathway, and the glyoxylate cycle have been investigated in several fungal species. HSP inhibitors and echinocandins have been shown to have a fungicidal effect against azole-resistant fungal strains. Transcriptomic and proteomic approaches have advanced antifungal drug discovery and pointed to important new pathogen-specific targets. Certain enzymes, such as those of the glyoxylate cycle, have been targeted by antifungal compounds in several fungal species. Natural and synthetic compounds inhibited the activity of such enzymes and reduced the ability of fungal cells to transition from mycelium to yeast, proving to be promising antifungal agents. Finally, computational biology has developed effective approaches that set new routes for early antifungal drug discovery, since conventional approaches take several years from discovery to clinical use. Thus, the development of new antifungal strategies might reduce therapeutic time and increase patients' quality of life.
Affiliation(s)
- Kleber S. Freitas e Silva
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Brazil
- Lívia C. Silva
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Brazil
- Relber A. Gonçales
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Brazil
- Bruno J. Neves
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, GO, 74605-510, Brazil
- Célia M.A. Soares
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Brazil
- Maristela Pereira
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Brazil
17
Singh N, Chaput L, Villoutreix BO. Virtual screening web servers: designing chemical probes and drug candidates in the cyberspace. Brief Bioinform 2020; 22:1790-1818. [PMID: 32187356 PMCID: PMC7986591 DOI: 10.1093/bib/bbaa034] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Indexed: 12/13/2022]
Abstract
The interplay between life sciences and advancing technology drives a continuous cycle of chemical data growth; these data are most often stored in open or partially open databases. In parallel, many different types of algorithms are being developed to manipulate these chemical objects and associated bioactivity data. Virtual screening methods are among the most popular computational approaches in pharmaceutical research. Today, user-friendly web-based tools are available to help scientists perform virtual screening experiments. This article provides an overview of internet resources enabling and supporting chemical biology and early drug discovery, with a main emphasis on web servers dedicated to virtual ligand screening and small-molecule docking. This survey first introduces some key concepts, then presents recent and easily accessible virtual screening and related target-fishing tools, and briefly discusses case studies enabled by some of these web services. Although further improvements are still needed, the web-based tools already available not only contribute to the design of bioactive molecules and assist drug repositioning but also help to generate new ideas and explore different hypotheses in a timely fashion, while contributing to teaching in the field of drug development.
Affiliation(s)
- Natesh Singh
- Univ. Lille, Inserm, Institut Pasteur de Lille, U1177 Drugs and Molecules for Living Systems, F-59000 Lille, France
- Ludovic Chaput
- Univ. Lille, Inserm, Institut Pasteur de Lille, U1177 Drugs and Molecules for Living Systems, F-59000 Lille, France
- Bruno O Villoutreix
- Univ. Lille, Inserm, Institut Pasteur de Lille, U1177 Drugs and Molecules for Living Systems, F-59000 Lille, France
18
Prykhodko O, Johansson SV, Kotsias PC, Arús-Pous J, Bjerrum EJ, Engkvist O, Chen H. A de novo molecular generation method using latent vector based generative adversarial network. J Cheminform 2019; 11:74. [PMID: 33430938 PMCID: PMC6892210 DOI: 10.1186/s13321-019-0397-9] [Citation(s) in RCA: 149] [Impact Index Per Article: 29.8] [Received: 09/13/2019] [Accepted: 11/23/2019] [Indexed: 02/06/2023]
Abstract
Deep learning methods applied to drug discovery have been used to generate novel structures. In this study, we propose a new deep learning architecture, LatentGAN, which combines an autoencoder and a generative adversarial neural network for de novo molecular design. We applied the method in two scenarios: one to generate random drug-like compounds and another to generate target-biased compounds. Our results show that the method works well in both cases. Sampled compounds from the trained model can largely occupy the same chemical space as the training set and also generate a substantial fraction of novel compounds. Moreover, the drug-likeness score of compounds sampled from LatentGAN is also similar to that of the training set. Lastly, generated compounds differ from those obtained with a Recurrent Neural Network-based generative model approach, indicating that both methods can be used complementarily.
Affiliation(s)
- Oleksii Prykhodko
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Simon Viet Johansson
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Josep Arús-Pous
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland
- Esben Jannik Bjerrum
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Ola Engkvist
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Hongming Chen
- Hit Discovery, Discovery Sciences, Biopharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Chemistry and Chemical Biology Centre, Guangzhou Regenerative Medicine and Health-Guangdong Laboratory, Science Park, Guangzhou, China
19
Maltarollo VG, Kronenberger T, Espinoza GZ, Oliveira PR, Honorio KM. Advances with support vector machines for novel drug discovery. Expert Opin Drug Discov 2018; 14:23-33. [PMID: 30488731 DOI: 10.1080/17460441.2019.1549033] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Indexed: 12/31/2022]
Abstract
Introduction: Novel drug discovery remains an enormous challenge, and various computer-aided drug design (CADD) approaches have been widely employed for this purpose. CADD can employ machine learning techniques, among which support vector machines (SVMs) are commonly used. SVMs and their variations offer numerous drug discovery applications, ranging from the classification of substances (as active or inactive) to the construction of regression models and the ranking/virtual screening of database compounds. Areas covered: Herein, the authors consider some of the applications of SVMs in medicinal chemistry, illustrating their main advantages and disadvantages, as well as trends in their utilization, via the available published literature. The aim of this review is to provide an up-to-date account of recent applications of SVMs in drug discovery, highlighting their strengths, weaknesses, and future challenges. Expert opinion: Techniques based on SVMs are considered powerful approaches in early drug discovery. The ability of SVMs to classify compounds as active or inactive has enabled the prioritization of substances for virtual screening. Indeed, one of the main advantages of SVMs is their potential in the analysis of nonlinear problems. However, despite successes in employing SVMs, the challenge of improving accuracy remains.
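As a minimal illustration of the active/inactive classification described in this abstract, the sketch below trains a linear SVM on hypothetical binary fingerprint-like vectors. Training uses Pegasos-style sub-gradient steps on the hinge loss with an L2 penalty. The data, the regularization strength `lam`, the number of epochs, and the linear kernel are all assumptions. The nonlinear problems the review mentions would call for a kernel SVM, for example scikit-learn's `sklearn.svm.SVC`.

```python
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.1, epochs: int = 200) -> List[float]:
    """Train a linear SVM (hinge loss + L2 penalty) with Pegasos-style sub-gradient steps.

    Labels are +1 (active) or -1 (inactive). A constant 1.0 feature is appended so
    the last weight acts as a bias (regularized here, for simplicity). The optional
    projection step of the original Pegasos algorithm is omitted.
    """
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, label in data:  # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # step from the L2 penalty
            if margin < 1.0:  # hinge loss is active for this compound
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (predicted active) or -1 (predicted inactive)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical 4-bit fingerprints: bits 0-1 enriched in actives, bits 2-3 in inactives.
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, -1, -1]
```

The learned weights indicate which fingerprint bits push a compound toward the active class. In a virtual screening setting, the raw score (before taking the sign) could be used to rank a library rather than only to classify it.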
Affiliation(s)
- Vinicius Gonçalves Maltarollo
- Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Thales Kronenberger
- Department of Internal Medicine VIII, University Hospital of Tübingen, Tübingen, Germany
- Gabriel Zarzana Espinoza
- Escola de Artes, Ciências e Humanidades, Universidade de São Paulo (USP), São Paulo, Brazil
- Patricia Rufino Oliveira
- Escola de Artes, Ciências e Humanidades, Universidade de São Paulo (USP), São Paulo, Brazil
- Kathia Maria Honorio
- Escola de Artes, Ciências e Humanidades, Universidade de São Paulo (USP), São Paulo, Brazil
- Centro de Ciências Naturais e Humanas, Universidade Federal do ABC, Santo André, Brazil
20
Bjerrum EJ, Sattarov B. Improving Chemical Autoencoder Latent Space and Molecular De Novo Generation Diversity with Heteroencoders. Biomolecules 2018; 8:E131. [PMID: 30380783 PMCID: PMC6316879 DOI: 10.3390/biom8040131] [Citation(s) in RCA: 88] [Impact Index Per Article: 14.7] [Received: 09/23/2018] [Revised: 10/22/2018] [Accepted: 10/23/2018] [Indexed: 11/16/2022]
Abstract
Chemical autoencoders are attractive models as they combine chemical space navigation with possibilities for de novo molecule generation in areas of interest. This enables them to produce focused chemical libraries around a single lead compound for use early in a drug discovery project. Here, it is shown that the choice of chemical representation, such as strings from the simplified molecular-input line-entry system (SMILES), has a large influence on the properties of the latent space. The study further explores to what extent translating between different chemical representations influences the latent space similarity to the SMILES strings or circular fingerprints. By employing SMILES enumeration for either the encoder or decoder, it is found that the decoder has the larger influence on the properties of the latent space. Training a sequence-to-sequence heteroencoder based on recurrent neural networks (RNNs) with long short-term memory (LSTM) cells to predict different enumerated SMILES strings from the same canonical SMILES string gives the largest similarity between latent space distance and molecular similarity measured as circular fingerprint similarity. Using the output from the code layer in quantitative structure-activity relationship (QSAR) models of five molecular datasets shows that heteroencoder-derived vectors markedly outperform autoencoder-derived vectors as well as models built using ECFP4 fingerprints, underlining the increased chemical relevance of the latent space. However, the use of enumeration during training of the decoder leads to a marked increase in the rate at which molecules are decoded to structures different from those encoded, a tendency that can be counteracted with more complex network architectures.
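The abstract compares latent-space distance with molecular similarity measured on circular fingerprints. The sketch below shows one way such a comparison could be set up; it is not the study's procedure. It assumes fingerprints are already available as sets of on-bit indices (for example, ECFP4-style bits produced by a cheminformatics toolkit). It also assumes Tanimoto similarity for fingerprints, Euclidean distance in the latent space, and a Pearson correlation over all molecule pairs. The molecules, bits, and latent vectors are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention for two empty fingerprints
    return len(a & b) / len(a | b)


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either list is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def latent_vs_fingerprint(fps: Dict[str, FrozenSet[int]],
                          latents: Dict[str, Sequence[float]]) -> float:
    """Correlate latent-space distance with fingerprint (1 - Tanimoto) distance over all pairs."""
    latent_d, fp_d = [], []
    for i, j in combinations(sorted(fps), 2):
        latent_d.append(euclidean(latents[i], latents[j]))
        fp_d.append(1.0 - tanimoto(fps[i], fps[j]))
    return pearson(latent_d, fp_d)


# Hypothetical fingerprints (on-bit sets) and 2-D latent vectors for three molecules.
fps = {"m1": frozenset({1, 2, 3, 4}), "m2": frozenset({1, 2, 3, 5}), "m3": frozenset({7, 8, 9})}
latents = {"m1": (0.0, 0.0), "m2": (0.5, 0.0), "m3": (3.0, 0.0)}
print(round(tanimoto(fps["m1"], fps["m2"]), 2))  # 0.6
print(round(latent_vs_fingerprint(fps, latents), 3))  # 0.982
```

A correlation close to 1 means that molecules placed near each other in the latent space also share many fingerprint bits. This is the property the abstract reports as strongest for the enumerated-SMILES heteroencoder.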
Affiliation(s)
- Esben Jannik Bjerrum
- Wildcard Pharmaceutical Consulting, Zeaborg Science Center, Frødings Allé 41, 2860 Søborg, Denmark
- Boris Sattarov
- Science Data Software LLC, 14914 Bradwill Court, Rockville, MD 20850, USA
21
Protein structure and computational drug discovery. Biochem Soc Trans 2018; 46:1367-1379. [DOI: 10.1042/bst20180202] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 06/27/2018] [Revised: 08/08/2018] [Accepted: 08/13/2018] [Indexed: 12/12/2022]
Abstract
The first protein structures revealed a complex web of weak interactions stabilising the three-dimensional shape of the molecule. Small molecule ligands were then found to exploit these same weak binding events to modulate protein function or act as substrates in enzymatic reactions. As the understanding of ligand–protein binding grew, it became possible to firstly predict how and where a particular small molecule might interact with a protein, and then to identify putative ligands for a specific protein site. Computer-aided drug discovery, based on the structure of target proteins, is now a well-established technique that has produced several marketed drugs. We present here an overview of the various methodologies being used for structure-based computer-aided drug discovery and comment on possible future developments in the field.
22
De Majo F, De Windt LJ. RNA therapeutics for heart disease. Biochem Pharmacol 2018; 155:468-478. [DOI: 10.1016/j.bcp.2018.07.037] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 06/08/2018] [Accepted: 07/25/2018] [Indexed: 12/20/2022]