1
Adamczewski M, Nisius B, Kausch-Busies N. Derisking Future Agrochemicals before They Are Made: Large-Scale In Vitro Screening for In Silico Modeling of Thyroid Peroxidase Inhibition. Chem Res Toxicol 2024; 37:1698-1711. [PMID: 39303287 DOI: 10.1021/acs.chemrestox.4c00248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2024]
Abstract
Inhibition of thyroid peroxidase (TPO) is a known molecular initiating event for thyroid hormone dysregulation and thyroid toxicity. Consequently, TPO is a critical off-target for the design of safer agrochemicals. To date, fewer than 500 structurally characterized TPO inhibitors are known, and the most comprehensive result set generated under identical conditions encompasses approximately 1000 compounds from a subset of the ToxCast compound collection. Here we describe a collaboration between wet lab and data scientists combining a large in vitro screen and the subsequent development of an in silico model for predicting TPO inhibition. The screen encompassed more than 100,000 diverse drug-like agrochemical compounds and yielded more than 6000 structurally novel TPO inhibitors. On this foundation, we applied different machine learning techniques and compared their performance. We discuss use cases for in silico TPO models in agrochemical research and explain that model recall is of particular importance when selecting compounds from large virtual compound collections. Furthermore, we show that due to the higher structural diversity of our training data, our final model allowed better generalization than models trained on the ToxCast data set. We now have a tool to predict TPO inhibition even for molecules that are only available virtually, such as hits from virtual screenings, or compounds under consideration for inclusion in our screening collection. Structures and activity data for 34,524 compounds are provided. This data set includes almost all inhibitors, including more than 3000 proprietary structures, and a large proportion of the inactives.
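The abstract's point about recall can be made concrete. When a model filters a large virtual collection, every missed TPO inhibitor (a false negative) passes through unflagged. For that reason recall, TP / (TP + FN), matters more here than precision, TP / (TP + FP). The minimal Python sketch below uses made-up labels (1 = inhibitor). It is an illustration only, not the authors' evaluation code.

```python
from typing import Sequence, Tuple

def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for binary labels where 1 marks a TPO inhibitor."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels: 4 true inhibitors among 10 virtual compounds.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# A cautious model flags 6 compounds and catches every inhibitor.
cautious = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
# A strict model flags only 2 compounds, both correct, but misses 2 inhibitors.
strict = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

print(precision_recall(y_true, cautious))  # precision ~0.67, recall 1.0
print(precision_recall(y_true, strict))    # precision 1.0, recall 0.5
```

The strict model is never wrong when it flags a compound. Even so, it would let two of the four hypothetical inhibitors into the selection, which is exactly the failure a derisking filter should avoid.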
Affiliation(s)
- Martin Adamczewski
- Bayer AG, Division CropScience, Alfred-Nobel-Str 50, Monheim 40789, Germany
- Britta Nisius
- Bayer AG, Division CropScience, Alfred-Nobel-Str 50, Monheim 40789, Germany
- Nina Kausch-Busies
- Bayer AG, Division CropScience, Alfred-Nobel-Str 50, Monheim 40789, Germany
2
Hall B, Keiser MJ. Retrieval Augmented Docking Using Hierarchical Navigable Small Worlds. J Chem Inf Model 2024; 64:7398-7408. [PMID: 39360680 PMCID: PMC11480973 DOI: 10.1021/acs.jcim.4c00683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Revised: 09/17/2024] [Accepted: 09/18/2024] [Indexed: 10/04/2024]
Abstract
Make-on-demand chemical libraries have drastically increased the reach of molecular docking, with the enumerated ready-to-dock ZINC-22 library approaching 6.4 billion molecules (July 2024). While ever-growing libraries result in better-scoring molecules, the computational resources required to dock all of ZINC-22 make this endeavor infeasible for most. Here, we organize and traverse chemical space with hierarchical navigable small-world graphs, a method we term retrieval augmented docking (RAD). RAD recovers most virtual actives, despite docking only a fraction of the library. Furthermore, RAD is protein-agnostic, supporting additional docking campaigns without additional computational overhead. In depth, we assess RAD on published large-scale docking campaigns against D4 and AmpC spanning 99.5 million and 138 million molecules, respectively. RAD recovers 95% of DOCK virtual actives for both targets after evaluating only 10% of the libraries. In breadth, RAD shows widespread applicability against 43 DUDE-Z proteins, evaluating 50.3 million associations. On average, RAD recovers 87% of virtual actives while docking 10% of the library without sacrificing chemical diversity.
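The idea of docking only part of a library by walking a neighbor graph can be illustrated with a simplified Python sketch. This is a stand-in built on stated assumptions and is not RAD itself:

- It uses a single-layer neighbor graph rather than the multi-layer HNSW index.
- It uses a hypothetical `dock` scoring callable, where lower is better (as with DOCK energies).
- It defines "virtual actives" as the top-k molecules by full-library score.

The `recovery` helper computes the fraction of those actives reached within the docking budget. This is the kind of metric behind figures such as "95% recovered after evaluating 10%".

```python
import heapq
from typing import Callable, Dict, List, Sequence, Set, Tuple

def score_guided_traversal(
    graph: Dict[int, List[int]],
    dock: Callable[[int], float],
    seeds: Sequence[int],
    budget: int,
) -> List[int]:
    """Dock molecules best-first over a neighbor graph until `budget` have been docked.

    Lower scores are better. Returns molecule ids in the order they were docked.
    """
    docked: List[int] = []
    seen: Set[int] = set()
    frontier: List[Tuple[float, int]] = []  # min-heap keyed on docking score

    def visit(node: int) -> None:
        seen.add(node)
        docked.append(node)
        heapq.heappush(frontier, (dock(node), node))

    for s in seeds:
        if s not in seen and len(docked) < budget:
            visit(s)
    # Always expand the neighbors of the best-scoring docked molecule next.
    while frontier and len(docked) < budget:
        _, node = heapq.heappop(frontier)
        for nb in graph.get(node, []):
            if len(docked) >= budget:
                break
            if nb not in seen:
                visit(nb)
    return docked

def recovery(docked: Sequence[int], full_scores: Dict[int, float], top_k: int) -> float:
    """Fraction of the top_k best-scoring molecules (the 'virtual actives') that were docked."""
    actives = set(sorted(full_scores, key=full_scores.get)[:top_k])
    return len(actives & set(docked)) / len(actives)

# Toy library: ten molecules on a chain graph, with scores improving toward node 9.
scores = {i: -float(i) for i in range(10)}
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
order = score_guided_traversal(chain, scores.__getitem__, seeds=[5], budget=5)
print(order, recovery(order, scores, top_k=3))  # [5, 4, 6, 7, 8] with recovery 2/3
```

The graph is built from the molecules alone, so a new target only requires swapping `dock`. This appears to be the sense in which the abstract calls RAD protein-agnostic.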
Affiliation(s)
- Brendan W. Hall
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
- Program in Biophysics, University of California, San Francisco, San Francisco, California 94158, United States
- Michael J. Keiser
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, California 94158, United States
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California 94158, United States
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94158, United States
3
Orsi M, Reymond JL. Navigating a 1E+60 Chemical Space of Peptide/Peptoid Oligomers. Mol Inform 2024:e202400186. [PMID: 39390672 DOI: 10.1002/minf.202400186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Revised: 08/21/2024] [Accepted: 08/27/2024] [Indexed: 10/12/2024]
Abstract
Herein we report a virtual library of 1E+60 members, a common estimate for the size of the drug-like chemical space. The library consists of linear or cyclic oligomers forming molecules within the size range of peptide drugs. We demonstrate ligand-based virtual screening using a genetic algorithm.
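A genetic algorithm searches a space of this size by evolving a population of candidates rather than enumerating it. The Python sketch below is a generic illustration built on two assumptions:

- Molecules are encoded as fixed-length sequences of hypothetical building-block indices.
- The fitness is a placeholder (the fraction of positions matching a query sequence). It stands in for the fingerprint similarity a real ligand-based screen would use.

With 20 hypothetical building blocks and length 12, the space already holds 20**12 (about 4.1 × 10^15) sequences. The loop below creates well under a thousand distinct candidates per run.

```python
import random
from typing import List, Tuple

Seq = Tuple[int, ...]

def fitness(candidate: Seq, query: Seq) -> float:
    """Placeholder similarity: fraction of positions whose building block matches the query."""
    return sum(a == b for a, b in zip(candidate, query)) / len(query)

def evolve(query: Seq, n_blocks: int, pop_size: int = 30, generations: int = 40,
           mutation_rate: float = 0.1, seed: int = 0) -> List[float]:
    """Run a simple elitist GA and return the best fitness after each generation."""
    rng = random.Random(seed)
    length = len(query)
    population = [tuple(rng.randrange(n_blocks) for _ in range(length)) for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        # Rank by fitness and carry the better half over unchanged (elitism).
        population.sort(key=lambda c: fitness(c, query), reverse=True)
        parents = population[: pop_size // 2]
        children: List[Seq] = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # single-point crossover
            child = list(a[:cut] + b[cut:])
            for i in range(length):  # point mutation: swap in a random building block
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_blocks)
            children.append(tuple(child))
        population = parents + children
        history.append(max(fitness(c, query) for c in population))
    return history

query = (3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8)
history = evolve(query, n_blocks=20)
print(history[0], history[-1])
```

Elitism keeps the best candidate in the population, so the best fitness can never decrease from one generation to the next.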
Affiliation(s)
- Markus Orsi
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Jean-Louis Reymond
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
4
Muegge I, Bentzien J, Ge Y. Perspectives on current approaches to virtual screening in drug discovery. Expert Opin Drug Discov 2024; 19:1173-1183. [PMID: 39132881 DOI: 10.1080/17460441.2024.2390511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2024] [Accepted: 08/06/2024] [Indexed: 08/13/2024]
Abstract
INTRODUCTION: For the past two decades, virtual screening (VS) has been an efficient hit finding approach for drug discovery. Today, billions of commercially accessible compounds are routinely screened, and many successful examples of VS have been reported. VS methods continue to evolve, including machine learning and physics-based methods.
AREAS COVERED: The authors examine recent examples of VS in drug discovery and discuss prospective hit finding results from the critical assessment of computational hit-finding experiments (CACHE) challenge. The authors also highlight the cost considerations and open-source options for conducting VS and examine chemical space coverage and library selections for VS.
EXPERT OPINION: The advancement of sophisticated VS approaches, including the use of machine learning techniques and increased computer resources as well as the ease of access to synthetically available chemical spaces, and commercial and open-source VS platforms allow for interrogating ultra-large libraries (ULL) of billions of molecules. An impressive number of prospective ULL VS campaigns have generated potent and structurally novel hits across many target classes. Nonetheless, many successful contemporary VS approaches still use considerably smaller focused libraries. This apparent dichotomy illustrates that VS is best conducted in a fit-for-purpose way choosing an appropriate chemical space. Better methods need to be developed to tackle more challenging targets.
Affiliation(s)
- Ingo Muegge
- Research department, Alkermes, Inc, Waltham, MA, USA
- Jörg Bentzien
- Research department, Alkermes, Inc, Waltham, MA, USA
- Yunhui Ge
- Research department, Alkermes, Inc, Waltham, MA, USA
5
Weller J, Rohs R. Structure-Based Drug Design with a Deep Hierarchical Generative Model. J Chem Inf Model 2024; 64:6450-6463. [PMID: 39058534 PMCID: PMC11350878 DOI: 10.1021/acs.jcim.4c01193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Revised: 07/16/2024] [Accepted: 07/17/2024] [Indexed: 07/28/2024]
Abstract
Recently, the remarkable growth of available crystal structure data and libraries of commercially available or readily synthesizable molecules have unlocked previously inaccessible regions of chemical space for drug development. Paired with improvements in virtual ligand screening methods, these expanded libraries are having a notable impact on early drug design efforts. Yet screening-based methods still face scalability limits, due to computational constraints and the sheer scale of drug-like space. Machine learning approaches are overcoming these limitations by learning the fundamental intra- and intermolecular relationships in drug-target systems from existing data. Here, we introduce DrugHIVE, a deep hierarchical variational autoencoder that outperforms state-of-the-art autoregressive and diffusion-based methods in both speed and performance on common generative benchmarks. DrugHIVE's hierarchical design enables improved control over molecular generation. Its capabilities include dramatically increasing virtual screening efficiency and accelerating a wide range of common drug design tasks, including de novo generation, molecular optimization, scaffold hopping, linker design, and high-throughput pattern replacement. Our highly scalable method can even be applied to receptors with high-confidence AlphaFold-predicted structures, extending the ability to generate high-quality drug-like molecules to a majority of the unsolved human proteome.
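A variational autoencoder is trained with a reconstruction term plus a Kullback-Leibler (KL) penalty that keeps each latent distribution close to its prior. In a hierarchical VAE, lower latent levels typically use a prior conditioned on the level above rather than a fixed standard normal.

The pure-Python sketch below shows only that generic building block: the closed-form KL between diagonal Gaussians and the reparameterization trick. It is not DrugHIVE's architecture, and the numeric values are arbitrary.

```python
import math
import random
from typing import List, Sequence

def kl_diag_gaussians(mu_q: Sequence[float], logvar_q: Sequence[float],
                      mu_p: Sequence[float], logvar_p: Sequence[float]) -> float:
    """KL(q || p) for diagonal Gaussians given means and log-variances."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0
    return 0.5 * total

def reparameterize(mu: Sequence[float], logvar: Sequence[float], rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, 1).

    In an autodiff framework this form lets gradients flow to mu and logvar.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

# Top level: posterior versus a standard-normal prior (mean 0, log-variance 0).
top_kl = kl_diag_gaussians([1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
z_top = reparameterize([1.0, 0.0], [0.0, 0.0], random.Random(0))
# Lower level: a network fed z_top would normally predict the prior's mean and
# log-variance; fixed illustrative numbers are used here instead.
low_kl = kl_diag_gaussians([0.5], [-1.0], [0.5], [-1.0])
print(top_kl, low_kl)  # 0.5 0.0
```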
Affiliation(s)
- Jesse A. Weller
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, United States
- Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, United States
- Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, United States
- Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, United States
- Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, California 90089, United States
6
Sultan A, Sieg J, Mathea M, Volkamer A. Transformers for Molecular Property Prediction: Lessons Learned from the Past Five Years. J Chem Inf Model 2024; 64:6259-6280. [PMID: 39136669 DOI: 10.1021/acs.jcim.4c00747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Molecular Property Prediction (MPP) is vital for drug discovery, crop protection, and environmental science. Over the last decades, diverse computational techniques have been developed, from using simple physical and chemical properties and molecular fingerprints in statistical models and classical machine learning to advanced deep learning approaches. In this review, we aim to distill insights from current research on employing transformer models for MPP. We analyze the currently available models and explore key questions that arise when training and fine-tuning a transformer model for MPP. These questions encompass the choice and scale of the pretraining data, optimal architecture selections, and promising pretraining objectives. Our analysis highlights areas not yet covered in current research, inviting further exploration to enhance the field's understanding. Additionally, we address the challenges in comparing different models, emphasizing the need for standardized data splitting and robust statistical analysis.
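One common response to the data-splitting problem the review raises is a scaffold split. Every molecule sharing a core scaffold stays on the same side of the train/test boundary, so test performance reflects generalization to new chemotypes.

The Python sketch below assumes scaffold strings have already been computed, for example with RDKit's `MurckoScaffold` module. It then assigns whole scaffold groups, largest first, to the training set until a target fraction is reached. This greedy rule is one common convention, not a standard prescribed by the review, and the molecule ids are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def scaffold_split(scaffold_of: Dict[str, str], train_frac: float = 0.8) -> Tuple[List[str], List[str]]:
    """Split molecule ids so that no scaffold appears in both train and test."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for mol_id, scaffold in scaffold_of.items():
        groups[scaffold].append(mol_id)
    # Largest scaffold groups first; ties broken by scaffold string for determinism.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    n_train_target = train_frac * len(scaffold_of)
    train: List[str] = []
    test: List[str] = []
    for _, members in ordered:
        if len(train) + len(members) <= n_train_target:
            train.extend(members)
        else:
            test.extend(members)
    return train, test

# Hypothetical molecules mapped to precomputed scaffold SMILES.
mols = {"m1": "c1ccccc1", "m2": "c1ccccc1", "m3": "c1ccccc1",
        "m4": "c1ccncc1", "m5": "c1ccncc1", "m6": "C1CCCCC1",
        "m7": "C1CCOC1", "m8": "c1ccoc1", "m9": "c1ccsc1", "m10": "c1cc[nH]c1"}
train, test = scaffold_split(mols, train_frac=0.8)
print(sorted(train), sorted(test))
```

Unlike a random split, this split can give noticeably different train/test sizes from the requested fraction when a few scaffolds dominate.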
Affiliation(s)
- Afnan Sultan
- Data Driven Drug Design, Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
- Andrea Volkamer
- Data Driven Drug Design, Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
7
Moreira-Filho JT, Ranganath D, Conway M, Schmitt C, Kleinstreuer N, Mansouri K. Democratizing cheminformatics: interpretable chemical grouping using an automated KNIME workflow. J Cheminform 2024; 16:101. [PMID: 39152469 PMCID: PMC11330086 DOI: 10.1186/s13321-024-00894-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Accepted: 08/06/2024] [Indexed: 08/19/2024]
Abstract
With the increased availability of chemical data in public databases, innovative techniques and algorithms have emerged for the analysis, exploration, visualization, and extraction of information from these data. One such technique is chemical grouping, where chemicals with common characteristics are categorized into distinct groups based on physicochemical properties, use, biological activity, or a combination. However, existing tools for chemical grouping often require specialized programming skills or the use of commercial software packages. To address these challenges, we developed a user-friendly chemical grouping workflow implemented in KNIME, a free, open-source, low/no-code, data analytics platform. The workflow serves as an all-encompassing tool, expertly incorporating a range of processes such as molecular descriptor calculation, feature selection, dimensionality reduction, hyperparameter search, and supervised and unsupervised machine learning methods, enabling effective chemical grouping and visualization of results. Furthermore, we implemented tools for interpretation, identifying key molecular descriptors for the chemical groups, and using natural language summaries to clarify the rationale behind these groupings. The workflow was designed to run seamlessly in both the KNIME local desktop version and KNIME Server WebPortal as a web application. It incorporates interactive interfaces and guides to assist users in a step-by-step manner. We demonstrate the utility of this workflow through a case study using an eye irritation and corrosion dataset.
Scientific contributions: This work presents a novel, comprehensive chemical grouping workflow in KNIME, enhancing accessibility by integrating a user-friendly graphical interface that eliminates the need for extensive programming skills.
This workflow uniquely combines several features such as automated molecular descriptor calculation, feature selection, dimensionality reduction, and machine learning algorithms (both supervised and unsupervised), with hyperparameter optimization to refine chemical grouping accuracy. Moreover, we have introduced an innovative interpretative step and natural language summaries to elucidate the underlying reasons for chemical groupings, significantly advancing the usability of the tool and interpretability of the results.
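The unsupervised grouping step of such a workflow can be sketched in a few lines. The Python example below runs plain k-means on already-scaled descriptor vectors. The descriptor values are invented, and initialization simply takes the first k points, whereas practical tools usually use smarter schemes such as k-means++. The sketch is not a reproduction of the KNIME workflow's nodes.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Assign each point to one of k groups by alternating assignment and centroid updates."""
    centroids = [list(p) for p in points[:k]]  # naive initialization: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical scaled descriptors (e.g., logP and TPSA) for six chemicals.
descriptors = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9), (0.15, 0.15), (0.85, 0.85)]
print(kmeans(descriptors, k=2))  # [0, 1, 0, 1, 0, 1]
```

The interpretation step described in the abstract would then ask which descriptors best separate the resulting groups.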
Affiliation(s)
- José T Moreira-Filho
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Dhruv Ranganath
- University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Mike Conway
- National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Charles Schmitt
- Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Nicole Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Kamel Mansouri
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
8
Pinzi L, Belluti S, Piccinini I, Imbriano C, Rastelli G. Searching for Novel HDAC6/Hsp90 Dual Inhibitors with Anti-Prostate Cancer Activity: In Silico Screening and In Vitro Evaluation. Pharmaceuticals (Basel) 2024; 17:1072. [PMID: 39204176 PMCID: PMC11357446 DOI: 10.3390/ph17081072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2024] [Revised: 08/10/2024] [Accepted: 08/13/2024] [Indexed: 09/03/2024]
Abstract
Prostate cancer (PCA) is one of the most prevalent types of male cancers. While current treatments for early-stage PCA are available, their efficacy is limited in advanced PCA, mainly due to drug resistance or low efficacy. In this context, novel valuable therapeutic opportunities may arise from the combined inhibition of histone deacetylase 6 (HDAC6) and heat shock protein 90 (Hsp90). These targets are mutually involved in the regulation of several processes in cancer cells, and their combined inhibition has been shown to provide synergistic effects against PCA. On these premises, we performed an extensive in silico virtual screening campaign on commercial compounds in search of dual inhibitors of HDAC6 and Hsp90. In vitro tests against recombinant enzymes and PCA cells with different levels of aggressiveness allowed the identification of a subset of compounds with inhibitory activity against HDAC6 and antiproliferative effects towards LNCaP and PC-3 cells. None of the candidates showed appreciable Hsp90 inhibition. However, the discovered compounds have low molecular weight and a chemical structure similar to that of potent Hsp90 blockers. This provides an opportunity for structural and medicinal chemistry optimization in order to obtain HDAC6/Hsp90 dual modulators with antiproliferative effects against prostate cancer. These findings are discussed in detail in the study.
Affiliation(s)
- Giulio Rastelli
- Department of Life Sciences, University of Modena and Reggio Emilia, Via Giuseppe Campi 103, 41125 Modena, Italy (also the affiliation of L.P., S.B., I.P., and C.I.)
9
Gryn'ova G, Bereau T, Müller C, Friederich P, Wade RC, Nunes-Alves A, Soares TA, Merz K. EDITORIAL: Chemical Compound Space Exploration by Multiscale High-Throughput Screening and Machine Learning. J Chem Inf Model 2024; 64:5737-5738. [PMID: 39129448 DOI: 10.1021/acs.jcim.4c01300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]
Affiliation(s)
- Ganna Gryn'ova
- School of Chemistry, University of Birmingham, Birmingham B15 2TT, United Kingdom
- Tristan Bereau
- Institute for Theoretical Physics, Heidelberg University, Heidelberg 69120, Germany
- Carolin Müller
- Computer-Chemistry-Center, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nägelsbachstraße 25, Erlangen 91052, Germany
- Pascal Friederich
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Kaiserstr. 12, Karlsruhe 76131, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Kaiserstr. 12, Karlsruhe 76131, Germany
- Rebecca C Wade
- Molecular and Cellular Modeling Group, Heidelberg Institute for Theoretical Studies (HITS), Schloss-Wolfsbrunnenweg 35, Heidelberg 69118, Germany
- Center for Molecular Biology of Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, Heidelberg University, Im Neuenheimer Feld 329, Heidelberg 69120, Germany
- Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Im Neuenheimer Feld 205, Heidelberg 69120, Germany
- Ariane Nunes-Alves
- Institute of Chemistry, Technische Universität Berlin, Berlin 10623, Germany
- Thereza A Soares
- Department of Chemistry, FFCLRP, University of São Paulo, Ribeirão Preto 14040-901, Brazil
- Hylleraas Centre for Quantum Molecular Sciences, University of Oslo, Oslo 0315, Norway
- Kenneth Merz
- Department of Chemistry, Michigan State University, Michigan 48824, United States
10
Zhao Y, Tian Y, Pang X, Li G, Shi S, Yan A. Classification of FLT3 inhibitors and SAR analysis by machine learning methods. Mol Divers 2024; 28:1995-2011. [PMID: 37142889 DOI: 10.1007/s11030-023-10640-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Accepted: 03/17/2023] [Indexed: 05/06/2023]
Abstract
FMS-like tyrosine kinase 3 (FLT3) is a type III receptor tyrosine kinase, which is an important target for anti-cancer therapy. In this work, we conducted a structure-activity relationship (SAR) study on 3867 FLT3 inhibitors we collected. MACCS fingerprints, ECFP4 fingerprints, and TT fingerprints were used to represent the inhibitors in the dataset. A total of 36 classification models were built based on support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and deep neural networks (DNN) algorithms. Model 3D_3 built by deep neural networks (DNN) and TT fingerprints performed best on the test set with the highest prediction accuracy of 85.83% and Matthews correlation coefficient (MCC) of 0.72 and also performed well on the external test set. In addition, we clustered 3867 inhibitors into 11 subsets by the K-Means algorithm to figure out the structural characteristics of the reported FLT3 inhibitors. Finally, we analyzed the SAR of FLT3 inhibitors by RF algorithm based on ECFP4 fingerprints. The results showed that 2-aminopyrimidine, 1-ethylpiperidine, 2,4-bis(methylamino)pyrimidine, amino-aromatic heterocycle, [(2E)-but-2-enyl]dimethylamine, but-2-enyl, and alkynyl were typical fragments among highly active inhibitors. Moreover, three scaffolds in Subset_A (Subset 4), Subset_B, and Subset_C showed a significant relationship to inhibition activity targeting FLT3.
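Accuracy and the Matthews correlation coefficient (MCC) are both computed from the binary confusion matrix. MCC, (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), stays informative when the active and inactive classes are imbalanced. The minimal Python sketch below uses hypothetical counts, not the paper's test-set numbers.

```python
import math

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# A reasonably balanced hypothetical test set.
print(accuracy(50, 40, 5, 5), round(mcc(50, 40, 5, 5), 3))  # 0.9 0.798
# An imbalanced set where the model calls everything active: high accuracy, no skill.
print(accuracy(95, 0, 5, 0), mcc(95, 0, 5, 0))  # 0.95 0.0
```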
Affiliation(s)
- Yunyang Zhao
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P.O. Box 53, Beijing, 100029, People's Republic of China
- Yujia Tian
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P.O. Box 53, Beijing, 100029, People's Republic of China
- Xiaoyang Pang
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P.O. Box 53, Beijing, 100029, People's Republic of China
- Guo Li
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P.O. Box 53, Beijing, 100029, People's Republic of China
- Shenghui Shi
- College of Information Science and Technology, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China
- Aixia Yan
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P.O. Box 53, Beijing, 100029, People's Republic of China
11
Venkatraman V, Gaiser J, Demekas D, Roy A, Xiong R, Wheeler TJ. Do Molecular Fingerprints Identify Diverse Active Drugs in Large-Scale Virtual Screening? (No). Pharmaceuticals (Basel) 2024; 17:992. [PMID: 39204097 PMCID: PMC11356940 DOI: 10.3390/ph17080992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2024] [Revised: 07/18/2024] [Accepted: 07/23/2024] [Indexed: 09/03/2024]
Abstract
Computational approaches for small-molecule drug discovery now regularly scale to the consideration of libraries containing billions of candidate small molecules. One promising approach to increase the speed of evaluating billion-molecule libraries is to develop succinct representations of each molecule that enable the rapid identification of molecules with similar properties. Molecular fingerprints are thought to provide a mechanism for producing such representations. Here, we explore the utility of commonly used fingerprints in the context of predicting similar molecular activity. We show that fingerprint similarity provides little discriminative power between active and inactive molecules for a target protein based on a known active: while fingerprints may sometimes provide some enrichment for active molecules in a drug screen, a screened data set will still be dominated by inactive molecules. We also demonstrate that high-similarity actives appear to share a scaffold with the query active, meaning that they could more easily be identified by structural enumeration. Furthermore, even when limited to only active molecules, fingerprint similarity values do not correlate with compound potency. In sum, these results highlight the need for a new wave of molecular representations that will improve the capacity to detect biologically active molecules based on their similarity to other such molecules.
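Fingerprint similarity in this setting usually means the Tanimoto (Jaccard) coefficient between sets of on-bits: shared bits divided by the union of bits. The Python sketch below ranks a small hypothetical library against a query active. This is the nearest-neighbor search whose discriminative power the study questions. Hand-written bit sets stand in for real fingerprints, which in practice would come from a toolkit such as RDKit.

```python
from typing import Dict, FrozenSet, List, Tuple

def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """|A ∩ B| / |A ∪ B| for two sets of fingerprint on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def rank_by_similarity(query: FrozenSet[int],
                       library: Dict[str, FrozenSet[int]]) -> List[Tuple[str, float]]:
    """Return (id, similarity) pairs, most similar first."""
    scored = [(mol_id, tanimoto(query, fp)) for mol_id, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

query_active = frozenset({1, 4, 7, 9, 12})
library = {
    "cand_a": frozenset({1, 4, 7, 9, 13}),  # shares most bits with the query
    "cand_b": frozenset({1, 4, 20, 21}),
    "cand_c": frozenset({30, 31, 32}),      # shares no bits
}
# cand_a (4/6) ranks above cand_b (2/7), which ranks above cand_c (0).
print(rank_by_similarity(query_active, library))
```

As the abstract stresses, a high score says only that the bit patterns overlap. It guarantees neither activity nor any relationship to potency.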
Affiliation(s)
- Vishwesh Venkatraman
- Department of Chemistry, Norwegian University of Science and Technology, 7034 Trondheim, Norway
- Jeremiah Gaiser
- School of Information, University of Arizona, Tucson, AZ 85721, USA
- Daphne Demekas
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ 85721, USA
- Amitava Roy
- Rocky Mountain Laboratories, Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Hamilton, MT 59840, USA
- Department of Biomedical and Pharmaceutical Sciences, University of Montana, Missoula, MT 59812, USA
- Rui Xiong
- Department of Pharmacology & Toxicology, University of Arizona, Tucson, AZ 85721, USA
- Travis J. Wheeler
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ 85721, USA
12
Bedart C, Shimokura G, West FG, Wood TE, Batey RA, Irwin JJ, Schapira M. The Pan-Canadian Chemical Library: A Mechanism to Open Academic Chemistry to High-Throughput Virtual Screening. Sci Data 2024; 11:597. [PMID: 38844472 PMCID: PMC11156877 DOI: 10.1038/s41597-024-03443-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 05/29/2024] [Indexed: 06/09/2024]
Abstract
Computationally screening chemical libraries to discover molecules with desired properties is a common technique used in early-stage drug discovery. Recent progress in the field now enables the efficient exploration of billions of molecules within days or hours, but this exploration remains confined within the boundaries of the accessible chemistry space. While the number of commercially available compounds grows rapidly, it remains a limited subset of all druglike small molecules that could be synthesized. Here, we present a workflow where chemical reactions typically developed in academia and unconventional in drug discovery are exploited to dramatically expand the chemistry space accessible to virtual screening. We use this process to generate a first version of the Pan-Canadian Chemical Library, a collection of nearly 150 billion diverse compounds that does not overlap with other ultra-large libraries such as Enamine REAL or SAVI and could be a resource of choice for protein targets where other libraries have failed to deliver bioactive molecules.
Affiliation(s)
- Corentin Bedart
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario, M5G 1L7, Canada
- Univ. Lille, Inserm, CHU Lille, U1286 - INFINITE - Institute for Translational Research in Inflammation, F-59000, Lille, France
- Grace Shimokura
- Davenport Research Laboratories, Dept. of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Frederick G West
- Department of Chemistry, University of Alberta, Edmonton, AB, T6G 2G2, Canada
- Tabitha E Wood
- Department of Chemistry, The University of Winnipeg, 515 Portage Avenue, Winnipeg, MB, R3B 2E9, Canada
- Robert A Batey
- Davenport Research Laboratories, Dept. of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Acceleration Consortium, University of Toronto, Toronto, ON, M5S 3H6, Canada
- John J Irwin
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, 94143, USA
- Matthieu Schapira
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario, M5G 1L7, Canada
- Department of Pharmacology and Toxicology, University of Toronto, Toronto, Ontario, M5S 1A1, Canada
13
Bedart C, Simoben CV, Schapira M. Emerging structure-based computational methods to screen the exploding accessible chemical space. Curr Opin Struct Biol 2024; 86:102812. [PMID: 38603987 DOI: 10.1016/j.sbi.2024.102812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 03/15/2024] [Accepted: 03/16/2024] [Indexed: 04/13/2024]
Abstract
Structure-based virtual screening can be a valuable approach to computationally select hit candidates based on their predicted interaction with a protein of interest. The recent explosion in the size of chemical libraries increases the chances of hitting high-quality compounds during virtual screening exercises but also poses new challenges as the number of chemically accessible molecules grows faster than the computing power necessary to screen them. We review here two novel approaches rapidly gaining in popularity to address this problem: machine learning-accelerated and synthon-based library screening. We summarize the results from seminal proof-of-concept studies, highlight the latest developments, and discuss limitations and future directions.
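Much of the appeal of synthon-based screening comes down to arithmetic. Consider a two-component reaction with building-block sets A and B:

- Full enumeration scores |A|·|B| products.
- Scoring each synthon separately, then combining only the best n from each side, costs about |A| + |B| + n² evaluations.

The Python sketch below counts those evaluations using hypothetical scoring callables, where lower is better. It illustrates the general idea only. Like these methods in some form, it assumes that good products tend to be built from individually good-scoring synthons.

```python
from typing import Callable, List, Sequence, Tuple

def synthon_screen(
    synthons_a: Sequence[str],
    synthons_b: Sequence[str],
    score_synthon: Callable[[str], float],
    score_product: Callable[[str, str], float],
    top_n: int,
) -> Tuple[List[Tuple[float, str, str]], int]:
    """Score synthons individually, then enumerate only the top_n x top_n combinations.

    Returns (products ranked best first, number of scoring calls made).
    """
    calls = 0

    def best(synthons: Sequence[str]) -> List[str]:
        nonlocal calls
        scored = []
        for s in synthons:
            scored.append((score_synthon(s), s))
            calls += 1
        scored.sort()
        return [s for _, s in scored[:top_n]]

    best_a, best_b = best(synthons_a), best(synthons_b)
    products = []
    for a in best_a:  # enumerate and score only the promising combinations
        for b in best_b:
            products.append((score_product(a, b), a, b))
            calls += 1
    products.sort()
    return products, calls

# Hypothetical building blocks: "A0".."A999" and "B0".."B999"; a smaller index scores better.
A = [f"A{i}" for i in range(1000)]
B = [f"B{i}" for i in range(1000)]
products, calls = synthon_screen(
    A, B,
    score_synthon=lambda s: float(s[1:]),
    score_product=lambda a, b: float(a[1:]) + float(b[1:]),
    top_n=10,
)
print(calls, len(A) * len(B), products[0])  # 2100 1000000 (0.0, 'A0', 'B0')
```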
Collapse
Affiliation(s)
- Corentin Bedart
- Univ. Lille, Inserm, CHU Lille, U1286 - INFINITE - Institute for Translational Research in Inflammation, F-59000, Lille, France
| | - Conrad Veranso Simoben
- Structural Genomics Consortium, University of Toronto, 101 College Street, MaRS South Tower, Suite 700, Toronto, Ontario M5G 1L7, Canada
| | - Matthieu Schapira
- Structural Genomics Consortium, University of Toronto, 101 College Street, MaRS South Tower, Suite 700, Toronto, Ontario M5G 1L7, Canada; Department of Pharmacology and Toxicology, University of Toronto, 1 King's College Circle, Toronto, Ontario M5S 1A8, Canada.
14
Kallert E, Almena Rodriguez L, Husmann JÅ, Blatt K, Kersten C. Structure-based virtual screening of unbiased and RNA-focused libraries to identify new ligands for the HCV IRES model system. RSC Med Chem 2024; 15:1527-1538. [PMID: 38784459 PMCID: PMC11110755 DOI: 10.1039/d3md00696d] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/08/2023] [Accepted: 03/16/2024] [Indexed: 05/25/2024]
Abstract
Targeting RNA including viral RNAs with small molecules is an emerging field. The hepatitis C virus internal ribosome entry site (HCV IRES) is a potential target for translation inhibitor development to raise drug resistance mutation preparedness. Using RNA-focused and unbiased molecule libraries, a structure-based virtual screening (VS) by molecular docking and pharmacophore analysis was performed against the HCV IRES subdomain IIa. VS hits were validated by a microscale thermophoresis (MST) binding assay and a Förster resonance energy transfer (FRET) assay elucidating ligand-induced conformational changes. Ten hit molecules were identified with potencies in the high to medium micromolar range, proving the suitability of structure-based virtual screenings against RNA targets. Hit compounds from a 2-guanidino-quinazoline series, like the strongest binder, compound 8b with an EC50 of 61 μM, show low molecular weight, moderate lipophilicity and reduced basicity compared to previously reported IRES ligands. Therefore, it can be considered as a potential starting point for further optimization by chemical derivatization.
Affiliation(s)
- Elisabeth Kallert
- Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University, Staudingerweg 5, 55128 Mainz, Germany
- Laura Almena Rodriguez
- Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University, Staudingerweg 5, 55128 Mainz, Germany
- Jan-Åke Husmann
- Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University, Staudingerweg 5, 55128 Mainz, Germany
- Kathrin Blatt
- Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University, Staudingerweg 5, 55128 Mainz, Germany
- Christian Kersten
- Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University, Staudingerweg 5, 55128 Mainz, Germany
- Institute for Quantitative and Computational Biosciences, Johannes Gutenberg-University, BioZentrum I, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
15
Song RX, Nicklaus MC, Tarasova NI. Correlation of protein binding pocket properties with hits' chemistries used in generation of ultra-large virtual libraries. J Comput Aided Mol Des 2024; 38:22. [PMID: 38753096 PMCID: PMC11098933 DOI: 10.1007/s10822-024-00562-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Accepted: 04/22/2024] [Indexed: 05/19/2024]
Abstract
Although the size of virtual libraries of synthesizable compounds is growing rapidly, we are still enumerating only tiny fractions of the drug-like chemical universe. Our capability to mine these newly generated libraries also lags behind their growth. That is why fragment-based approaches that utilize on-demand virtual combinatorial libraries are gaining popularity in drug discovery. These à la carte libraries utilize synthetic blocks found to be effective binders in parts of target protein pockets and a variety of reliable chemistries to connect them. There is, however, no data on the potential impact of the chemistries used for making on-demand libraries on the hit rates during virtual screening. There are also no rules to guide in the selection of these synthetic methods for production of custom libraries. We have used the SAVI (Synthetically Accessible Virtual Inventory) library, constructed using 53 reliable reaction types (transforms), to evaluate the impact of these chemistries on docking hit rates for 40 well-characterized protein pockets. The data shows that the virtual hit rates differ significantly for different chemistries, with cross-coupling reactions such as Sonogashira, Suzuki-Miyaura, Hiyama and Liebeskind-Srogl coupling producing the highest hit rates. Virtual hit rates appear to depend not only on the property of the formed chemical bond but also on the diversity of available building blocks and the scope of the reaction. The data identifies reactions that deserve wider use through increasing the number of corresponding building blocks and suggests the reactions that are more effective for pockets with certain physical and hydrogen bond-forming properties.
Affiliation(s)
- Robert X Song
- Cancer Innovation Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, 21702, USA
- Marc C Nicklaus
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Frederick, MD, 21702, USA
- Nadya I Tarasova
- Cancer Innovation Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, 21702, USA.
16
Raush E, Abagyan R, Totrov M. Efficient Generation of Conformer Ensembles Using Internal Coordinates and a Generative Directional Graph Convolution Neural Network. J Chem Theory Comput 2024; 20:4054-4063. [PMID: 38669307 DOI: 10.1021/acs.jctc.4c00280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2024]
Abstract
We present a neural-network-based high-throughput molecular conformer-generation algorithm. A chemical graph-convolutional network is trained to predict low-energy conformers in internal coordinate representation (bond lengths, bond, and torsion angles), starting from two-dimensional (2D) chemical topology. Generative neural network (NN) architecture performs denoising from torsion space, producing conformer ensembles with populations that are well correlated with torsion energy profiles. Short force-field-based energy minimization is applied to refine final conformers. All computation-intensive stages of the algorithm are GPU-optimized. The procedure (termed GINGER) is benchmarked on a commonly used test set of bioactive three-dimensional (3D) conformers from the PDB. We demonstrate highly competitive results in conformer recovery and throughput rates suitable for giga-scale compound library processing. A web server that allows interactive conformer ensemble generation by GINGER and their viewing is made freely available at https://www.molsoft.com/gingerdemo.html.
Affiliation(s)
- Eugene Raush
- Molsoft L.L.C., 11199 Sorrento Valley Road, S209, San Diego, California 92121, United States
- Ruben Abagyan
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Maxim Totrov
- Molsoft L.L.C., 11199 Sorrento Valley Road, S209, San Diego, California 92121, United States
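The GINGER abstract says the network predicts conformers in internal coordinates (bond lengths, bond angles, torsions) rather than as Cartesian positions. Turning such a prediction into 3D coordinates is a standard geometric step; the sketch below uses the common NeRF (natural extension reference frame) construction for an unbranched chain. It illustrates the internal-to-Cartesian conversion under that assumption and is not Molsoft's code; the function names `place_atom`, `dihedral` and `build_chain` are hypothetical.

```python
import numpy as np


def place_atom(a, b, c, bond, angle, torsion):
    """Position of atom d from reference atoms a-b-c (NeRF construction).

    bond    : |c-d| distance
    angle   : bond angle b-c-d in radians
    torsion : dihedral a-b-c-d in radians
    """
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n = n / np.linalg.norm(n)          # normal of the a-b-c plane
    m = np.cross(n, bc)                # completes a right-handed local frame
    local = np.array([
        -bond * np.cos(angle),
        bond * np.sin(angle) * np.cos(torsion),
        bond * np.sin(angle) * np.sin(torsion),
    ])
    return c + local[0] * bc + local[1] * m + local[2] * n


def dihedral(a, b, c, d):
    """Signed dihedral a-b-c-d in radians (used to check the round trip)."""
    a, b, c, d = (np.asarray(p, dtype=float) for p in (a, b, c, d))
    b0, b1, b2 = a - b, (c - b) / np.linalg.norm(c - b), d - c
    v = b0 - np.dot(b0, b1) * b1       # projections perpendicular to b-c
    w = b2 - np.dot(b2, b1) * b1
    return np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w))


def build_chain(bonds, angles, torsions):
    """Cartesian coordinates of a chain from internal coordinates.

    bonds[i]    : length of bond between atoms i and i+1
    angles[i]   : angle at atom i+1 (atoms i, i+1, i+2)
    torsions[i] : dihedral of atoms i, i+1, i+2, i+3
    """
    n_atoms = len(bonds) + 1
    x = np.zeros((n_atoms, 3))
    x[1] = [bonds[0], 0.0, 0.0]
    if n_atoms > 2:
        # Third atom in the xy-plane, at the requested angle around atom 1
        x[2] = x[1] + bonds[1] * np.array([-np.cos(angles[0]), np.sin(angles[0]), 0.0])
    for k in range(3, n_atoms):
        x[k] = place_atom(x[k - 3], x[k - 2], x[k - 1],
                          bonds[k - 1], angles[k - 2], torsions[k - 3])
    return x
```

For example, `build_chain([1.54] * 3, [np.radians(111.0)] * 2, [np.radians(60.0)])` gives a gauche, butane-like carbon skeleton. Changing only the torsion list moves the chain between conformers while bond lengths and angles stay fixed, which is why torsion space is a compact place to generate ensembles.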
17
Luginina AP, Khnykin AN, Khorn PA, Moiseeva OV, Safronova NA, Pospelov VA, Dashevskii DE, Belousov AS, Borschevskiy VI, Mishin AV. Rational Design of Drugs Targeting G-Protein-Coupled Receptors: Ligand Search and Screening. Biochemistry (Mosc) 2024; 89:958-972. [PMID: 38880655 DOI: 10.1134/s0006297924050158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 02/22/2024] [Accepted: 02/23/2024] [Indexed: 06/18/2024]
Abstract
G protein-coupled receptors (GPCRs) are transmembrane proteins that participate in many physiological processes and represent major pharmacological targets. Recent advances in structural biology of GPCRs have enabled the development of drugs based on the receptor structure (structure-based drug design, SBDD). SBDD utilizes information about the receptor-ligand complex to search for suitable compounds, thus expanding the chemical space of possible receptor ligands without the need for experimental screening. The review describes the use of structure-based virtual screening (SBVS) for GPCR ligands and approaches for the functional testing of potential drug compounds, as well as discusses recent advances and successful examples in the application of SBDD for the identification of GPCR ligands.
Affiliation(s)
- Aleksandra P Luginina
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141701, Russia
- Andrey N Khnykin
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141701, Russia
- Polina A Khorn
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141701, Russia
- Olga V Moiseeva
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141701, Russia
- Skryabin Institute of Biochemistry and Physiology of Microorganisms, Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia
- Nadezhda A Safronova
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141701, Russia
- Vladimir A Pospelov
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141701, Russia
- Dmitrii E Dashevskii
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141701, Russia
- Anatolii S Belousov
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141701, Russia
- Valentin I Borschevskiy
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141701, Russia.
- Frank Laboratory of Neutron Physics, Joint Institute for Nuclear Research, Dubna, Moscow Region, 141980, Russia
- Alexey V Mishin
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141701, Russia.
18
Mahjour BA, Coley CW. RDCanon: A Python Package for Canonicalizing the Order of Tokens in SMARTS Queries. J Chem Inf Model 2024; 64:2948-2954. [PMID: 38488634 DOI: 10.1021/acs.jcim.4c00138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
SMARTS is a widely used language in cheminformatics for defining substructural queries for database lookups, reaction templates for chemical transformations, and other applications. As an extension to SMILES, many SMARTS patterns can represent the same query. Despite this, no canonicalization algorithm invariant of the line notation sequence or atomic numbering is publicly available. Here, we introduce RDCanon, an open-source Python package that can be used to standardize SMARTS queries. RDCanon is designed to ensure that the sequence of atomic queries remains consistent for all graphs representing the same substructure query and to ensure a canonical sequence of primitives within each individual atom query; furthermore, the algorithm can be applied to canonicalize the order of reactants, agents, and products and their atom map numbers in reaction SMARTS templates. As part of its canonicalization algorithm, RDCanon provides a mechanism in which the canonicalized SMARTS is optimized for speed against specific molecular databases. Several case studies are provided to showcase improved efficiency in substructure matching and retrosynthetic analysis.
Affiliation(s)
- Babak A Mahjour
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
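One sub-problem the RDCanon abstract names is giving the primitives inside a single atom query a canonical order. Because `;` in SMARTS is the low-precedence logical AND, its operands can be sorted without changing the meaning of the query. The toy function below (hypothetical name `sort_low_precedence_and`, plain-Python regex, no RDKit) does only that; it is not RDCanon's algorithm and leaves atom order, bonds, ring closures and reaction templates untouched.

```python
import re

# A bracket atom whose body contains no nested brackets, e.g. "[N;H1;+0]".
_BRACKET_ATOM = re.compile(r"\[([^\[\]]*)\]")


def sort_low_precedence_and(smarts):
    """Toy canonicalization: sort the ';'-separated terms inside each bracket atom.

    ';' binds more loosely than ',' and '&', so each ';' term is a complete
    sub-expression and reordering the terms preserves the query's meaning.
    Atoms containing recursive SMARTS ($) or atom-map numbers (:) are skipped.
    """

    def _sort(match):
        body = match.group(1)
        if "$" in body or ":" in body:
            return match.group(0)
        return "[" + ";".join(sorted(body.split(";"))) + "]"

    return _BRACKET_ATOM.sub(_sort, smarts)
```

With this, `"[N;H1;+0]C"` and `"[+0;N;H1]C"` both become `"[+0;H1;N]C"`, so two spellings of the same atom query compare equal as strings; making atom order itself invariant needs a graph-level canonical ranking, which is what the package described above addresses.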
19
Marin E, Kovaleva M, Kadukova M, Mustafin K, Khorn P, Rogachev A, Mishin A, Guskov A, Borshchevskiy V. Regression-Based Active Learning for Accessible Acceleration of Ultra-Large Library Docking. J Chem Inf Model 2024; 64:2612-2623. [PMID: 38157481 PMCID: PMC11005039 DOI: 10.1021/acs.jcim.3c01661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 11/28/2023] [Accepted: 12/04/2023] [Indexed: 01/03/2024]
Abstract
Structure-based drug discovery is a process for both hit finding and optimization that relies on a validated three-dimensional model of a target biomolecule, used to rationalize the structure-function relationship for this particular target. An ultralarge virtual screening approach has emerged recently for rapid discovery of high-affinity hit compounds, but it requires substantial computational resources. This study shows that active learning with simple linear regression models can accelerate virtual screening, retrieving up to 90% of the top-1% of the docking hit list after docking just 10% of the ligands. The results demonstrate that it is unnecessary to use complex models, such as deep learning approaches, to predict the imprecise results of ligand docking with a low sampling depth. Furthermore, we explore active learning meta-parameters and find that constant batch size models with a simple ensembling method provide the best ligand retrieval rate. Finally, our approach is validated on the ultralarge size virtual screening data set, retrieving 70% of the top-0.05% of ligands after screening only 2% of the library. Altogether, this work provides a computationally accessible approach for accelerated virtual screening that can serve as a blueprint for the future design of low-compute agents for exploration of the chemical space via large-scale accelerated docking. With recent breakthroughs in protein structure prediction, this method can significantly increase accessibility for the academic community and aid in the rapid discovery of high-affinity hit compounds for various targets.
Affiliation(s)
- Egor Marin
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Margarita Kovaleva
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Maria Kadukova
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- University Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
- Khalid Mustafin
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Polina Khorn
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Andrey Rogachev
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Joint Institute for Nuclear Research, Dubna 141980, Russian Federation
- Alexey Mishin
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Albert Guskov
- Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
- Valentin Borshchevskiy
- Research Center for Molecular Mechanisms of Aging and Age-related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Joint Institute for Nuclear Research, Dubna 141980, Russian Federation
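The loop in the Marin et al. abstract (dock a batch, fit a simple linear regressor, dock the ligands it predicts to score best, repeat) can be sketched as follows. Assumptions: ligands are described by a numeric feature matrix, lower docking scores are better, the budget is split into equal constant-size batches, and the `dock` callable stands in for a real docking run. The paper's ensembling and feature choices are not reproduced, and `active_learning_screen` is a hypothetical name.

```python
import numpy as np


def active_learning_screen(features, dock, budget_frac=0.10, n_rounds=5, seed=0):
    """Greedy active learning with a linear surrogate; lower score = better.

    features : (n_ligands, n_features) array describing every ligand
    dock     : callable taking an index array and returning docking scores
    Returns a boolean mask of docked ligands and the scores obtained.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    batch = int(budget_frac * n / n_rounds)           # constant batch size
    X = np.hstack([features, np.ones((n, 1))])         # add intercept column
    docked = np.zeros(n, dtype=bool)
    scores = np.full(n, np.nan)

    picks = rng.choice(n, size=batch, replace=False)   # round 0: random batch
    for round_no in range(n_rounds):
        scores[picks] = dock(picks)
        docked[picks] = True
        if round_no == n_rounds - 1:
            break
        # Ordinary least squares on everything docked so far
        coef, *_ = np.linalg.lstsq(X[docked], scores[docked], rcond=None)
        predicted = X @ coef
        predicted[docked] = np.inf                     # never re-dock a ligand
        picks = np.argsort(predicted)[:batch]          # best predicted next
    return docked, scores


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    feats = rng.normal(size=(20_000, 16))
    # Synthetic stand-in for docking scores: linear signal plus noise
    true_scores = feats @ rng.normal(size=16) + rng.normal(scale=0.5, size=20_000)
    mask, _ = active_learning_screen(feats, lambda idx: true_scores[idx])
    top = np.argsort(true_scores)[:200]                # true top 1%
    print(f"docked {mask.sum()}, recall of top 1%: {mask[top].mean():.2f}")
```

On this synthetic, nearly linear score, recall of the top 1% after docking 10% is far above the roughly 10% expected from random picking. Real docking scores are much noisier, so figures such as the 90% reported in the abstract cannot be inferred from this toy.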
20
Roggia M, Natale B, Amendola G, Di Maro S, Cosconati S. Streamlining Large Chemical Library Docking with Artificial Intelligence: the PyRMD2Dock Approach. J Chem Inf Model 2024; 64:2143-2149. [PMID: 37552222 PMCID: PMC11005044 DOI: 10.1021/acs.jcim.3c00647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Indexed: 08/09/2023]
Abstract
The present contribution introduces a novel computational protocol called PyRMD2Dock, which combines the Ligand-Based Virtual Screening (LBVS) tool PyRMD with the popular docking software AutoDock-GPU (AD4-GPU) to enhance the throughput of virtual screening campaigns for drug discovery. By implementing PyRMD2Dock, we demonstrate that it is possible to rapidly screen massive chemical databases and identify the compounds with the highest predicted binding affinity to a target protein. Our benchmarking and screening experiments illustrate the predictive power and speed of PyRMD2Dock and highlight its potential to accelerate the discovery of novel drug candidates. Overall, this study showcases the value of combining AI-powered LBVS tools with docking software to enable effective and high-throughput virtual screening of ultralarge molecular databases in drug discovery. PyRMD and the PyRMD2Dock protocol are freely available on GitHub (https://github.com/cosconatilab/PyRMD) as an open-source tool.
Affiliation(s)
- Michele Roggia
- DiSTABiF, University of Campania Luigi Vanvitelli, Via Vivaldi 43, 81100 Caserta, Italy
- Benito Natale
- DiSTABiF, University of Campania Luigi Vanvitelli, Via Vivaldi 43, 81100 Caserta, Italy
- Giorgio Amendola
- DiSTABiF, University of Campania Luigi Vanvitelli, Via Vivaldi 43, 81100 Caserta, Italy
- Salvatore Di Maro
- DiSTABiF, University of Campania Luigi Vanvitelli, Via Vivaldi 43, 81100 Caserta, Italy
- Sandro Cosconati
- DiSTABiF, University of Campania Luigi Vanvitelli, Via Vivaldi 43, 81100 Caserta, Italy
21
Vogt M. Chemoinformatic approaches for navigating large chemical spaces. Expert Opin Drug Discov 2024; 19:403-414. [PMID: 38300511 DOI: 10.1080/17460441.2024.2313475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 01/30/2024] [Indexed: 02/02/2024]
Abstract
INTRODUCTION: Large chemical spaces (CSs) include traditional large compound collections, combinatorial libraries covering billions to trillions of molecules, DNA-encoded chemical libraries comprising complete combinatorial CSs in a single mixture, and virtual CSs explored by generative models. The diverse nature of these types of CSs requires different chemoinformatic approaches for navigation. AREAS COVERED: An overview of different types of large CSs is provided. Molecular representations and similarity metrics suitable for large CS exploration are discussed. A summary of navigation of CSs in generative models is provided. Methods for characterizing and comparing CSs are discussed. EXPERT OPINION: The size of large CSs might restrict navigation to specialized algorithms and limit it to considering neighborhoods of structurally similar molecules. Efficient navigation of large CSs not only requires methods that scale with size but also requires smart approaches that focus on better but not necessarily larger molecule selections. Deep generative models aim to provide such approaches by implicitly learning features relevant for targeted biological properties. It is unclear whether these models can fulfill this ideal, as validation is difficult as long as the covered CSs remain mainly virtual without experimental verification.
Affiliation(s)
- Martin Vogt
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
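Similarity metrics are one of the topics the Vogt review covers. The most widely used one for binary fingerprints is the Tanimoto (Jaccard) coefficient, |A ∩ B| / |A ∪ B|. The sketch below stores a fingerprint as a Python integer bitset (an assumption made for brevity; real pipelines use toolkit fingerprint objects) and adds a brute-force nearest-neighbor scan. The function names are hypothetical.

```python
import heapq


def popcount(bits):
    """Number of set bits (portable alternative to int.bit_count on Python < 3.10)."""
    return bin(bits).count("1")


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A & B| / |A | B| for integer bitsets (0.0 if both are empty)."""
    union = popcount(fp_a | fp_b)
    return popcount(fp_a & fp_b) / union if union else 0.0


def nearest_neighbors(query, library, k=3):
    """Brute-force k-NN: one similarity per library member, so cost is O(len(library))."""
    scored = ((tanimoto(query, fp), idx) for idx, fp in enumerate(library))
    return heapq.nlargest(k, scored)
```

The linear scan is exactly the part that stops being practical for combinatorial spaces of billions to trillions of molecules, which is why the review distinguishes methods that scale with size from exhaustive comparison.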
22
Sindt F, Seyller A, Eguida M, Rognan D. Protein Structure-Based Organic Chemistry-Driven Ligand Design from Ultralarge Chemical Spaces. ACS Cent Sci 2024; 10:615-627. [PMID: 38559302 PMCID: PMC10979501 DOI: 10.1021/acscentsci.3c01521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 01/25/2024] [Accepted: 01/29/2024] [Indexed: 04/04/2024]
Abstract
Ultralarge chemical spaces describing several billion compounds are revolutionizing hit identification in early drug discovery. Because of their size, such chemical spaces cannot be fully enumerated and require ad-hoc computational tools to navigate them and pick potentially interesting hits. We here propose a structure-based approach to ultralarge chemical space screening in which commercial chemical reagents are first docked to the target of interest and then directly connected according to organic chemistry and topological rules, to enumerate drug-like compounds under three-dimensional constraints of the target. When applied to bespoke chemical spaces of different sizes and chemical complexity targeting two receptors of pharmaceutical interest (estrogen β receptor, dopamine D3 receptor), the computational method was able to quickly enumerate hits that were either known ligands (or very close analogs) of the targeted receptors or chemically novel candidates that could be experimentally confirmed by in vitro binding assays. The proposed approach is generic, can be applied to any docking algorithm, and requires few computational resources to prioritize easily synthesizable hits from billion-sized chemical spaces.
Affiliation(s)
- François Sindt
- Laboratoire d’innovation thérapeutique, UMR7200 CNRS-Université de Strasbourg, Illkirch 67400, France
- Anthony Seyller
- Laboratoire d’innovation thérapeutique, UMR7200 CNRS-Université de Strasbourg, Illkirch 67400, France
- Didier Rognan
- Laboratoire d’innovation thérapeutique, UMR7200 CNRS-Université de Strasbourg, Illkirch 67400, France
23
Weller JA, Rohs R. DrugHIVE: Target-specific spatial drug design and optimization with a hierarchical generative model. bioRxiv 2024:2023.12.22.573155. [PMID: 38187658 PMCID: PMC10769420 DOI: 10.1101/2023.12.22.573155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Rapid advancement in the computational methods of structure-based drug design has led to their widespread adoption as key tools in the early drug development process. Recently, the remarkable growth of available crystal structure data and libraries of commercially available or readily synthesizable molecules have unlocked previously inaccessible regions of chemical space for drug development. Paired with improvements in virtual ligand screening methods, these expanded libraries are having a significant impact on the success of early drug design efforts. However, screening-based methods are limited in their scalability due to computational limits and the sheer scale of drug-like space. An approach within the quickly evolving field of artificial intelligence (AI), deep generative modeling, is extending the reach of molecular design beyond classical methods by learning the fundamental intra- and inter-molecular relationships in drug-target systems from existing data. In this work we introduce DrugHIVE, a deep hierarchical structure-based generative model that enables fine-grained control over molecular generation. Our model outperforms state-of-the-art autoregressive and diffusion-based methods on common benchmarks and in speed of generation. Here, we demonstrate DrugHIVE's capacity to accelerate a wide range of common drug design tasks such as de novo generation, molecular optimization, scaffold hopping, linker design, and high throughput pattern replacement. Our method is highly scalable and can be applied to high confidence AlphaFold predicted receptors, extending our ability to generate high quality drug-like molecules to a majority of the unsolved human proteome.
24
Hönig SMN, Flachsenberg F, Ehrt C, Neumann A, Schmidt R, Lemmen C, Rarey M. SpaceGrow: efficient shape-based virtual screening of billion-sized combinatorial fragment spaces. J Comput Aided Mol Des 2024; 38:13. [PMID: 38493240 PMCID: PMC10944417 DOI: 10.1007/s10822-024-00551-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 02/13/2024] [Indexed: 03/18/2024]
Abstract
The growing size of make-on-demand chemical libraries is posing new challenges to cheminformatics. These ultra-large chemical libraries have become too large for exhaustive enumeration. Using a combinatorial approach instead, the resource requirement scales approximately with the number of synthons instead of the number of molecules. This gives access to billions or trillions of compounds as so-called chemical spaces with moderate hardware and in a reasonable time frame. While extremely performant ligand-based 2D methods exist in this context, 3D methods still largely rely on exhaustive enumeration and therefore fail to apply. Here, we present SpaceGrow: a novel shape-based 3D approach for ligand-based virtual screening of billions of compounds within hours on a single CPU. Compared to a conventional superposition tool, SpaceGrow shows comparable pose reproduction capacity based on RMSD and superior ranking performance while being orders of magnitude faster. Result assessment of two differently sized subsets of the eXplore space reveals a higher probability of finding superior results in larger spaces, highlighting the potential of searching in ultra-large spaces. Furthermore, the application of SpaceGrow in a drug discovery workflow was investigated in four examples involving G protein-coupled receptors (GPCRs) with the aim to identify compounds with similar binding capabilities and molecular novelty.
Affiliation(s)
- Sophia M N Hönig
- BioSolveIT, An der Ziegelei 79, 53757, Sankt Augustin, Germany
- Universität Hamburg, ZBH - Center for Bioinformatics, Albert-Einstein-Ring 8-10, 22761, Hamburg, Germany
- Christiane Ehrt
- Universität Hamburg, ZBH - Center for Bioinformatics, Albert-Einstein-Ring 8-10, 22761, Hamburg, Germany
- Robert Schmidt
- BioSolveIT, An der Ziegelei 79, 53757, Sankt Augustin, Germany
- Matthias Rarey
- Universität Hamburg, ZBH - Center for Bioinformatics, Albert-Einstein-Ring 8-10, 22761, Hamburg, Germany.
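The SpaceGrow abstract notes that a combinatorial treatment makes cost scale roughly with the number of synthons rather than the number of products. A toy illustration, under the strong and hypothetical assumption that a product's score is the sum of independent per-synthon scores: score every synthon once, shortlist the best few per component, and enumerate only the shortlisted combinations. Real 3D shape scores are not additive, which is why SpaceGrow and SASS need more elaborate search schemes; `top_products_synthon_space` is a hypothetical name.

```python
import heapq
import itertools


def top_products_synthon_space(synthon_scores, n_best=5):
    """Top-n products of a combinatorial space without enumerating it.

    synthon_scores : one list of scores per reaction component
    Assumes (hypothetically) that a product's score is the SUM of its
    synthons' scores. Then every top-n product uses only synthons that are
    themselves in the top n of their component, so shortlisting is exact.
    """
    # Step 1: cost grows with the total number of synthons
    shortlists = [
        heapq.nlargest(n_best, range(len(scores)), key=scores.__getitem__)
        for scores in synthon_scores
    ]
    # Step 2: enumerate at most n_best ** n_components combinations
    candidates = (
        (sum(s[i] for s, i in zip(synthon_scores, combo)), combo)
        for combo in itertools.product(*shortlists)
    )
    return heapq.nlargest(n_best, candidates)
```

For three components with 10,000 synthons each, step 1 scores 30,000 synthons and step 2 at most 5^3 = 125 combinations, instead of enumerating 10^12 products.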
25
Cheng C, Beroza P. Shape-Aware Synthon Search (SASS) for Virtual Screening of Synthon-Based Chemical Spaces. J Chem Inf Model 2024; 64:1251-1260. [PMID: 38335044 DOI: 10.1021/acs.jcim.3c01865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2024]
Abstract
Virtual screening of large-scale chemical libraries has become increasingly useful for identifying high-quality candidates for drug discovery. While it is possible to exhaustively screen chemical spaces that number on the order of billions, indirect combinatorial approaches are needed to efficiently navigate larger, synthon-based virtual spaces. We describe Shape-Aware Synthon Search (SASS), a synthon-based virtual screening method that carries out shape similarity searches in the synthon space instead of the enumerated product space. SASS can replicate results from exhaustive searches in ultralarge, combinatorial spaces with high recall on a variety of query molecules while only scoring a small subspace of possible enumerated products, thereby significantly accelerating large-scale, shape-based virtual screening.
Affiliation(s)
- Chen Cheng
- Discovery Chemistry, Genentech, South San Francisco, California 94080, United States
- Paul Beroza
- Discovery Chemistry, Genentech, South San Francisco, California 94080, United States
26
Klarich K, Goldman B, Kramer T, Riley P, Walters WP. Thompson Sampling─An Efficient Method for Searching Ultralarge Synthesis on Demand Databases. J Chem Inf Model 2024; 64:1158-1171. [PMID: 38316125 PMCID: PMC10900287 DOI: 10.1021/acs.jcim.3c01790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 01/22/2024] [Accepted: 01/23/2024] [Indexed: 02/07/2024]
Abstract
Over the last five years, virtual screening of ultralarge synthesis on-demand libraries has emerged as a powerful tool for hit identification in drug discovery programs. As these libraries have grown to tens of billions of molecules, we have reached a point where it is no longer cost-effective to screen every molecule virtually. To address these challenges, several groups have developed heuristic search methods to rapidly identify the best molecules in a virtual screen. This article describes the application of Thompson sampling (TS), an active learning approach that streamlines the virtual screening of large combinatorial libraries by performing a probabilistic search in the reagent space, thereby never requiring the full enumeration of the library. TS is a general technique that can be applied to various virtual screening modalities, including 2D and 3D similarity search, docking, and application of machine-learning models. In an illustrative example, we show that TS can identify more than half of the top 100 molecules from a docking-based virtual screen of 335 million molecules by evaluating 1% of the data set.
Affiliation(s)
- Kathryn Klarich
- ReNAgade Therapeutics, 640 Memorial Drive, Cambridge, Massachusetts 02139, United States
- Brian Goldman
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02141, United States
- Trevor Kramer
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02141, United States
- Patrick Riley
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02141, United States
- W. Patrick Walters
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02141, United States
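Thompson sampling, as the Klarich et al. abstract describes it, searches reagent space instead of the enumerated library. A minimal sketch under assumptions that are ours, not necessarily the paper's: each reagent carries an independent Gaussian belief about the score of products containing it, beliefs are updated with a known-variance conjugate rule, higher scores are better, and a product that has already been scored is simply skipped. `thompson_sampling_search` and the toy additive score used in the tests are hypothetical.

```python
import numpy as np


def thompson_sampling_search(score_fn, n_reagents, n_iter=1000, prior_mean=0.0,
                             prior_var=1.0, noise_var=1.0, seed=0):
    """Thompson sampling over the reagents of a combinatorial library (maximization).

    score_fn   : callable mapping a tuple of reagent indices to a product score
    n_reagents : number of reagents in each reaction component
    Returns a dict {product tuple: score} of every product actually evaluated.
    """
    rng = np.random.default_rng(seed)
    means = [np.full(k, prior_mean, dtype=float) for k in n_reagents]
    variances = [np.full(k, prior_var, dtype=float) for k in n_reagents]
    evaluated = {}
    for _ in range(n_iter):
        # One posterior draw per reagent; the argmax in each component wins
        product = tuple(
            int(np.argmax(rng.normal(m, np.sqrt(v))))
            for m, v in zip(means, variances)
        )
        if product in evaluated:
            continue  # simplification: repeats are skipped, not re-scored
        y = score_fn(product)
        evaluated[product] = y
        for comp, r in enumerate(product):
            # Conjugate Gaussian update with known observation noise
            precision = 1.0 / variances[comp][r] + 1.0 / noise_var
            means[comp][r] = (means[comp][r] / variances[comp][r] + y / noise_var) / precision
            variances[comp][r] = 1.0 / precision
    return evaluated
```

In practice `score_fn` would wrap a docking run or a 2D/3D similarity calculation on the enumerated product; only the products actually sampled are ever built, which is where the savings over full enumeration come from.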
27
Woodhead AJ, Erlanson DA, de Esch IJP, Holvey RS, Jahnke W, Pathuri P. Fragment-to-Lead Medicinal Chemistry Publications in 2022. J Med Chem 2024; 67:2287-2304. [PMID: 38289623 DOI: 10.1021/acs.jmedchem.3c02070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2024]
Abstract
This Perspective is the eighth in an annual series that summarizes successful fragment-to-lead (F2L) case studies published each year. A tabulated summary of relevant articles published in 2022 is provided, and features such as target class, screening methods, and ligand efficiency are discussed both for the 2022 examples and for the combined examples over the years 2015-2022. In addition, trends and new developments in the field are summarized. In 2022, 18 publications described successful fragment-to-lead studies, including the development of three clinical compounds (MTRX1719, MK-8189, and BI-823911).
Affiliation(s)
- Andrew J Woodhead
- Astex Pharmaceuticals, 436 Cambridge Science Park, Milton Road, Cambridge CB4 0QA, United Kingdom
- Daniel A Erlanson
- Frontier Medicines, 151 Oyster Point Blvd., South San Francisco, California 94080, United States
- Iwan J P de Esch
- Division of Medicinal Chemistry, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, De Boelelaan 1108, 1081 HZ Amsterdam, The Netherlands
- Rhian S Holvey
- Astex Pharmaceuticals, 436 Cambridge Science Park, Milton Road, Cambridge CB4 0QA, United Kingdom
- Wolfgang Jahnke
- Novartis Biomedical Research, Discovery Sciences, 4002 Basel, Switzerland
- Puja Pathuri
- Astex Pharmaceuticals, 436 Cambridge Science Park, Milton Road, Cambridge CB4 0QA, United Kingdom
28
Tropsha A, Isayev O, Varnek A, Schneider G, Cherkasov A. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat Rev Drug Discov 2024; 23:141-155. [PMID: 38066301 DOI: 10.1038/s41573-023-00832-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/21/2023] [Indexed: 02/08/2024]
Abstract
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design.
Affiliation(s)
- Artem Cherkasov
- University of British Columbia, Vancouver, BC, Canada.
- Photonic Inc., Coquitlam, BC, Canada.
29
Olmedo DA, Durant-Archibold AA, López-Pérez JL, Medina-Franco JL. Design and Diversity Analysis of Chemical Libraries in Drug Discovery. Comb Chem High Throughput Screen 2024; 27:502-515. [PMID: 37409545 DOI: 10.2174/1386207326666230705150110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 05/30/2023] [Accepted: 05/30/2023] [Indexed: 07/07/2023]
Abstract
Chemical libraries and compound data sets are among the main inputs for starting the drug discovery process at universities, research institutes, and the pharmaceutical industry. The approach used to design compound libraries, the chemical information they contain, and the representation of structures play a fundamental role in studies in chemoinformatics, food informatics, in silico pharmacokinetics, computational toxicology, bioinformatics, and molecular modeling that generate computational hits for further optimization into drug candidates. A few years ago, chemical, biotechnological, and pharmaceutical companies began to expand their drug discovery and development processes by integrating computational tools with artificial intelligence methodologies. This integration is expected to increase the number of drugs approved by regulatory agencies in the near future.
Affiliation(s)
- Dionisio A Olmedo
- Centro de Investigaciones Farmacognósticas de la Flora Panameña (CIFLORPAN), Facultad de Farmacia, Universidad de Panamá, Ciudad de Panamá, Apartado, 0824-00178, Panamá
- Sistema Nacional de Investigación (SNI), Secretaria Nacional de Ciencia, Tecnología e Innovación (SENACYT), Ciudad del Saber, Clayton, Panamá
- Armando A Durant-Archibold
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Apartado, 0843-01103, Panamá
- Departamento de Bioquímica, Facultad de Ciencias Naturales, Exactas y Tecnología, Universidad de Panamá, Ciudad de Panamá, Panamá
- José Luis López-Pérez
- CESIFAR, Departamento de Farmacología, Facultad de Medicina, Universidad de Panamá, Ciudad de Panamá, Panamá
- Departamento de Ciencias Farmacéuticas, Facultad de Farmacia, Universidad de Salamanca, Avda. Campo Charro s/n, 37071 Salamanca, España
- José Luis Medina-Franco
- DIFACQUIM Grupo de Investigación, Departamento de Farmacia, Escuela de Química, Universidad Nacional Autónoma de México, Ciudad de México, Apartado, 04510, México
30
John L, Nagamani S, Mahanta HJ, Vaikundamani S, Kumar N, Kumar A, Jamir E, Priyadarsinee L, Sastry GN. Molecular Property Diagnostic Suite Compound Library (MPDS-CL): a structure-based classification of the chemical space. Mol Divers 2023:10.1007/s11030-023-10752-1. [PMID: 37902900 DOI: 10.1007/s11030-023-10752-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2023] [Accepted: 10/17/2023] [Indexed: 11/01/2023]
Abstract
Molecular Property Diagnostic Suite Compound Library (MPDS-CL) is an open-source, Galaxy-based cheminformatics web portal that presents a structure-based classification of molecules. A structure-based classification has been performed on nearly 150 million unique compounds, obtained from 42 publicly available databases and curated for redundancy removal through 97 hierarchically well-defined, atom composition-based portions. These are further subjected to a 56-bit fingerprint-based classification algorithm, which led to the formation of 56 structurally well-defined classes. The classes thus obtained were further divided into clusters based on molecular weight. Thus, the entire set of molecules was placed into 56 classes and 625 clusters, and each of the 149,169,443 molecules was assigned a unique ID, named MPDS-AadharID. MPDS-AadharID is akin to the unique number given to citizens in India (similar to the SSN in the US and the NINO in the UK). The unique features of MPDS-CL are (a) several search options, such as exact structure search, substructure search, property-based search, and fingerprint-based search, using SMILES, InChIKey, and key-in; (b) automatic generation of information for processing in MPDS and other Galaxy tools; (c) the class and cluster of each molecule, which make it easier and faster to search for similar molecules; and (d) information on the presence of the molecules in multiple databases. The MPDS-CL can be accessed at https://mpds.neist.res.in:8086/ .
Affiliation(s)
- Lijo John
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Selvaraman Nagamani
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Hridoy Jyoti Mahanta
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- S Vaikundamani
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Nandan Kumar
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Asheesh Kumar
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Esther Jamir
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Lipsa Priyadarsinee
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- G Narahari Sastry
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
31
Buehler Y, Reymond JL. Expanding Bioactive Fragment Space with the Generated Database GDB-13s. J Chem Inf Model 2023; 63:6239-6248. [PMID: 37722101 PMCID: PMC10598793 DOI: 10.1021/acs.jcim.3c01096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Indexed: 09/20/2023]
Abstract
Identifying innovative fragments for drug design can help medicinal chemistry address new targets and overcome the limitations of the classical molecular series. By deconstructing molecules into ring fragments (RFs, consisting of ring atoms plus ring-adjacent atoms) and acyclic fragments (AFs, consisting of only acyclic atoms), we find that public databases of molecules (i.e., ZINC and PubChem) and natural products (i.e., COCONUT) contain mostly RFs and AFs of up to 13 atoms. We also find that many RFs and AFs are enriched in bioactive vs inactive compounds from ChEMBL. We then analyze the generated database GDB-13s, which enumerates 99 million possible molecules of up to 13 atoms, for RFs and AFs resembling ChEMBL bioactive RFs and AFs. This analysis reveals a large number of novel RFs and AFs that are structurally simple, have favorable synthetic accessibility scores, and represent opportunities for synthetic chemistry to contribute to drug innovation in the context of fragment-based drug discovery.
Affiliation(s)
- Ye Buehler
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- Jean-Louis Reymond
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
32
Stuart DD, Guzman-Perez A, Brooijmans N, Jackson EL, Kryukov GV, Friedman AA, Hoos A. Precision Oncology Comes of Age: Designing Best-in-Class Small Molecules by Integrating Two Decades of Advances in Chemistry, Target Biology, and Data Science. Cancer Discov 2023; 13:2131-2149. [PMID: 37712571 PMCID: PMC10551669 DOI: 10.1158/2159-8290.cd-23-0280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 04/27/2023] [Accepted: 07/28/2023] [Indexed: 09/16/2023]
Abstract
Small-molecule drugs have enabled the practice of precision oncology for genetically defined patient populations since the first approval of imatinib in 2001. Scientific and technology advances over this 20-year period have driven the evolution of cancer biology, medicinal chemistry, and data science. Collectively, these advances provide tools to more consistently design best-in-class small-molecule drugs against known, previously undruggable, and novel cancer targets. The integration of these tools and their customization in the hands of skilled drug hunters will be necessary to enable the discovery of transformational therapies for patients across a wider spectrum of cancers.
SIGNIFICANCE: Target-centric small-molecule drug discovery necessitates the consideration of multiple approaches to identify chemical matter that can be optimized into drug candidates. To do this successfully and consistently, drug hunters require a comprehensive toolbox to avoid following the "law of instrument" or Maslow's hammer concept where only one tool is applied regardless of the requirements of the task. Combining our ever-increasing understanding of cancer and cancer targets with the technological advances in drug discovery described below will accelerate the next generation of small-molecule drugs in oncology.
Affiliation(s)
- Axel Hoos
- Scorpion Therapeutics, Boston, Massachusetts
33
Sivula T, Yetukuri L, Kalliokoski T, Käsnänen H, Poso A, Pöhner I. Machine Learning-Boosted Docking Enables the Efficient Structure-Based Virtual Screening of Giga-Scale Enumerated Chemical Libraries. J Chem Inf Model 2023; 63:5773-5783. [PMID: 37655823 PMCID: PMC10523430 DOI: 10.1021/acs.jcim.3c01239] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/06/2023] [Indexed: 09/02/2023]
Abstract
The emergence of ultra-large screening libraries, filled to the brim with billions of readily available compounds, poses a growing challenge for docking-based virtual screening. Machine learning (ML)-boosted strategies like the tool HASTEN combine rapid ML prediction with the brute-force docking of small fractions of such libraries to increase screening throughput and take on giga-scale libraries. In our case study of an anti-bacterial chaperone and an anti-viral kinase, we first generated a brute-force docking baseline for 1.56 billion compounds in the Enamine REAL lead-like library with the fast Glide high-throughput virtual screening protocol. With HASTEN, we observed robust recall of 90% of the true 1000 top-scoring virtual hits in both targets when docking only 1% of the entire library. This reduction of the required docking experiments by 99% significantly shortens the screening time. In the kinase target, the employment of a hydrogen bonding constraint resulted in a major proportion of unsuccessful docking attempts and hampered ML predictions. We demonstrate the optimization potential in the treatment of failed compounds when performing ML-boosted screening and benchmark and showcase HASTEN as a fast and robust tool in a growing arsenal of approaches to unlock the chemical space covered by giga-scale screening libraries for everyday drug discovery campaigns.
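The abstract above reports recall of the true 1000 top-scoring virtual hits after docking only 1% of the library, measured against a brute-force baseline. A minimal sketch of that kind of top-k recall metric follows; it is not code from HASTEN. It assumes a dictionary of baseline scores keyed by a compound ID and a set of IDs that the accelerated strategy actually docked; `top_k_recall` and its parameters are hypothetical names. Lower scores are treated as better by default, as with Glide docking scores.

```python
def top_k_recall(true_scores, selected, k, lower_is_better=True):
    """Fraction of the true top-k compounds that a screening strategy recovered.

    true_scores: dict of compound ID -> score from a brute-force baseline.
    selected: IDs the strategy actually docked (or otherwise retrieved).
    lower_is_better: True for docking scores where more negative is better.
    """
    if not 0 < k <= len(true_scores):
        raise ValueError("k must be between 1 and the number of compounds")
    # Rank the full baseline once; ties keep their original dictionary order.
    ranked = sorted(true_scores, key=true_scores.get, reverse=not lower_is_better)
    true_top_k = set(ranked[:k])
    return len(true_top_k & set(selected)) / k
```

With baseline scores `{"m0": -10.0, "m1": -9.0, "m2": -8.0, "m3": -7.0}` and a strategy that docked `{"m0", "m2", "m3"}`, the top-2 recall is 0.5. The same function, applied to a full docking baseline, expresses statements such as "90% of the top 1000 recovered" as a single number that can be compared against the fraction of the library docked.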
Affiliation(s)
- Toni Sivula
- School of Pharmacy, University of Eastern Finland, Kuopio FI-70211, Finland
- Tuomo Kalliokoski
- Computational Medicine Design, Orion Pharma, Orionintie 1A, Espoo FI-02101, Finland
- Heikki Käsnänen
- Computational Medicine Design, Orion Pharma, Orionintie 1A, Espoo FI-02101, Finland
- Antti Poso
- School of Pharmacy, University of Eastern Finland, Kuopio FI-70211, Finland
- Department of Pharmaceutical and Medicinal Chemistry, Institute of Pharmaceutical Sciences, Eberhard Karls University, Tübingen DE-72076, Germany
- Cluster of Excellence iFIT (EXC 2180) “Image-Guided and Functionally Instructed Tumor Therapies”, University of Tübingen, Tübingen DE-72076, Germany
- Tübingen Center for Academic Drug Discovery & Development (TüCAD2), Tübingen DE-72076, Germany
- Ina Pöhner
- School of Pharmacy, University of Eastern Finland, Kuopio FI-70211, Finland
34
Gonzalez-Ponce K, Horta Andrade C, Hunter F, Kirchmair J, Martinez-Mayorga K, Medina-Franco JL, Rarey M, Tropsha A, Varnek A, Zdrazil B. School of cheminformatics in Latin America. J Cheminform 2023; 15:82. [PMID: 37726809 PMCID: PMC10507835 DOI: 10.1186/s13321-023-00758-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/27/2023] [Accepted: 09/10/2023] [Indexed: 09/21/2023]
Abstract
We report the major highlights of the School of Cheminformatics in Latin America, Mexico City, November 24-25, 2022. Six lectures, one workshop, and one roundtable with four editors were presented during an online public event with speakers from academia, big pharma, and public research institutions. One thousand one hundred eighty-one students and academics from seventy-nine countries registered for the meeting. As part of the meeting, advances in enumeration and visualization of chemical space, applications in natural product-based drug discovery, drug discovery for neglected diseases, toxicity prediction, and general guidelines for data analysis were discussed. Experts from ChEMBL presented a workshop on how to use the resources of this major compounds database used in cheminformatics. The school also included a round table with editors of cheminformatics journals. The full program of the meeting and the recordings of the sessions are publicly available at https://www.youtube.com/@SchoolChemInfLA/featured .
Affiliation(s)
- Karla Gonzalez-Ponce
- Institute of Chemistry, Campus Merida, National Autonomous University of Mexico, Merida‑Tetiz Highway, Km. 4.5, Ucu, Yucatan, Mexico
- Carolina Horta Andrade
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmacia, Universidade Federal de Goias, Goiania, GO, Brazil
- Fiona Hunter
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridgeshire, UK
- Johannes Kirchmair
- Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 2D 303, 1090, Vienna, Austria
- Karina Martinez-Mayorga
- Institute of Chemistry, Campus Merida, National Autonomous University of Mexico, Merida‑Tetiz Highway, Km. 4.5, Ucu, Yucatan, Mexico
- Institute for Applied Mathematics and Systems, Merida Research Unit, National Autonomous University of Mexico, Sierra Papacal, Merida, Yucatan, Mexico
- José L Medina-Franco
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Avenida Universidad 3000, 04510, Mexico City, Mexico
- Matthias Rarey
- ZBH - Center for Bioinformatics, Universität Hamburg, Bundesstraße 43, 20146, Hamburg, Germany
- Alexander Tropsha
- Molecular Modeling Laboratory, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Alexandre Varnek
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4, Rue B. Pascal, 67000, Strasbourg, France
- Barbara Zdrazil
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridgeshire, UK
35
Alnammi M, Liu S, Ericksen SS, Ananiev GE, Voter AF, Guo S, Keck JL, Hoffmann FM, Wildman SA, Gitter A. Evaluating Scalable Supervised Learning for Synthesize-on-Demand Chemical Libraries. J Chem Inf Model 2023; 63:5513-5528. [PMID: 37625010 PMCID: PMC10538940 DOI: 10.1021/acs.jcim.3c00912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Indexed: 08/27/2023]
Abstract
Traditional small-molecule drug discovery is a time-consuming and costly endeavor. High-throughput chemical screening can only assess a tiny fraction of drug-like chemical space. The strong predictive power of modern machine-learning methods for virtual chemical screening enables training models on known active and inactive compounds and extrapolating to much larger chemical libraries. However, there has been limited experimental validation of these methods in practical applications on large commercially available or synthesize-on-demand chemical libraries. Through a prospective evaluation with the bacterial protein-protein interaction PriA-SSB, we demonstrate that ligand-based virtual screening can identify many active compounds in large commercial libraries. We use cross-validation to compare different types of supervised learning models and select a random forest (RF) classifier as the best model for this target. When predicting the activity of more than 8 million compounds from Aldrich Market Select, the RF substantially outperforms a naïve baseline based on chemical structure similarity. 48% of the RF's 701 selected compounds are active. The RF model easily scales to score one billion compounds from the synthesize-on-demand Enamine REAL database. We tested 68 chemically diverse top predictions from Enamine REAL and observed 31 hits (46%), including one with an IC50 value of 1.3 μM.
Affiliation(s)
- Moayad Alnammi
- Department of Computer Sciences, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
- Morgridge Institute for Research, Madison, Wisconsin 53715, United States
- Department of Information and Computer Science, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia
- Shengchao Liu
- Department of Computer Sciences, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
- Morgridge Institute for Research, Madison, Wisconsin 53715, United States
- Spencer S. Ericksen
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
- Gene E. Ananiev
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
- Andrew F. Voter
- Department of Biomolecular Chemistry, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
- Song Guo
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
- James L. Keck
- Department of Biomolecular Chemistry, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
- F. Michael Hoffmann
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
- McArdle Laboratory for Cancer Research, University of Wisconsin−Madison, Madison, Wisconsin 53705, United States
- Scott A. Wildman
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
- Anthony Gitter
- Department of Computer Sciences, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
- Morgridge Institute for Research, Madison, Wisconsin 53715, United States
- Department of Biostatistics and Medical Informatics, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
36
López-Pérez K, López-López E, Medina-Franco JL, Miranda-Quintana RA. Sampling and Mapping Chemical Space with Extended Similarity Indices. Molecules 2023; 28:6333. [PMID: 37687162 PMCID: PMC10489020 DOI: 10.3390/molecules28176333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 08/24/2023] [Accepted: 08/26/2023] [Indexed: 09/10/2023]
Abstract
Visualization of the chemical space is useful in many aspects of chemistry, including compound library design, diversity analysis, and exploring structure-property relationships, to name a few. Examples of notable research areas where the visualization of chemical space has strong applications are drug discovery and natural product research. However, the sheer volume of even comparatively small sub-sections of chemical space implies that we need to use approximations at the time of navigating through chemical space. ChemMaps is a visualization methodology that approximates the distribution of compounds in large datasets based on the selection of satellite compounds that yield a similar mapping of the whole dataset when principal component analysis on a similarity matrix is performed. Here, we show how the recently proposed extended similarity indices can help find regions that are relevant to sample satellites and reduce the amount of high-dimensional data needed to describe a library's chemical space.
Affiliation(s)
- Kenneth López-Pérez
- Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL 32611, USA
- Edgar López-López
- DIFACQUIM Research Group, Department of Pharmacy, National Autonomous University of Mexico, Mexico City 04510, Mexico
- Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico City 07000, Mexico
- José L. Medina-Franco
- DIFACQUIM Research Group, Department of Pharmacy, National Autonomous University of Mexico, Mexico City 04510, Mexico
37
Lyu J, Irwin JJ, Shoichet BK. Modeling the expansion of virtual screening libraries. Nat Chem Biol 2023; 19:712-718. [PMID: 36646956 PMCID: PMC10243288 DOI: 10.1038/s41589-022-01234-w] [Citation(s) in RCA: 46] [Impact Index Per Article: 46.0] [Received: 08/01/2022] [Accepted: 11/22/2022] [Indexed: 01/17/2023]
Abstract
Recently, 'tangible' virtual libraries have made billions of molecules readily available. Prioritizing these molecules for synthesis and testing demands computational approaches, such as docking. Their success may depend on library diversity, their similarity to bio-like molecules and how receptor fit and artifacts change with library size. We compared a library of 3 million 'in-stock' molecules with billion-plus tangible libraries. The bias toward bio-like molecules in the tangible library decreases 19,000-fold versus those 'in-stock'. Similarly, thousands of high-ranking molecules, including experimental actives, from five ultra-large-library docking campaigns are also dissimilar to bio-like molecules. Meanwhile, better-fitting molecules are found as the library grows, with the score improving log-linearly with library size. Finally, as library size increases, so too do rare molecules that rank artifactually well. Although the nature of these artifacts changes from target to target, the expectation of their occurrence does not, and simple strategies can minimize their impact.
Affiliation(s)
- Jiankun Lyu
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
- John J Irwin
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
- Brian K Shoichet
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
38
Bonilla PA, Hoop CL, Stefanisko K, Tarasov SG, Sinha S, Nicklaus MC, Tarasova NI. Virtual screening of ultra-large chemical libraries identifies cell-permeable small-molecule inhibitors of a "non-druggable" target, STAT3 N-terminal domain. Front Oncol 2023; 13:1144153. [PMID: 37182134 PMCID: PMC10167007 DOI: 10.3389/fonc.2023.1144153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2023] [Accepted: 03/23/2023] [Indexed: 05/16/2023]
Abstract
STAT3 N-terminal domain is a promising molecular target for cancer treatment and modulation of immune responses. However, STAT3 is localized in the cytoplasm, mitochondria, and nuclei, and thus, is inaccessible to therapeutic antibodies. Its N-terminal domain lacks deep pockets on the surface and represents a typical "non-druggable" protein. In order to successfully identify potent and selective inhibitors of the domain, we have used virtual screening of billion structure-sized virtual libraries of make-on-demand screening samples. The results suggest that the expansion of accessible chemical space by cutting-edge ultra-large virtual compound databases can lead to successful development of small molecule drugs for hard-to-target intracellular proteins.
Affiliation(s)
- Pedro Andrade Bonilla
- Cancer Innovation Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, United States
- Cody L. Hoop
- Cancer Innovation Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, United States
- Karen Stefanisko
- Cancer Innovation Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, United States
- Sergey G. Tarasov
- Center for Structural Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, United States
- Marc C. Nicklaus
- Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health (NIH), Frederick, MD, United States
- Nadya I. Tarasova
- Cancer Innovation Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, United States
39
Johnston RC, Yao K, Kaplan Z, Chelliah M, Leswing K, Seekins S, Watts S, Calkins D, Chief Elk J, Jerome SV, Repasky MP, Shelley JC. Epik: pKa and Protonation State Prediction through Machine Learning. J Chem Theory Comput 2023; 19:2380-2388. [PMID: 37023332 DOI: 10.1021/acs.jctc.3c00044] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Indexed: 04/08/2023]
Abstract
Epik version 7 is a software program that uses machine learning for predicting the pKa values and protonation state distribution of complex, druglike molecules. Using an ensemble of atomic graph convolutional neural networks (GCNNs) trained on over 42,000 pKa values across broad chemical space from both experimental and computed origins, the model predicts pKa values with 0.42 and 0.72 pKa unit median absolute and root mean square errors, respectively, across seven test sets. Epik version 7 also generates protonation states and recovers 95% of the most populated protonation states compared to previous versions. Requiring on average only 47 ms per ligand, Epik version 7 is rapid and accurate enough to evaluate protonation states for crucial molecules and prepare ultra-large libraries of compounds to explore vast regions of chemical space. The simplicity and time required for the training allow for the generation of highly accurate models customized to a program's specific chemistry.
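The abstract above links predicted pKa values to protonation state distributions and reports median absolute and root mean square errors in pKa units. For a single ionizable site, the textbook Henderson-Hasselbalch relation converts a pKa into the protonated fraction at a given pH; the sketch below shows that relation together with the two error metrics. This is not Epik's method: molecules with several interacting sites or tautomers need a microstate treatment that this single-site formula does not capture. The function names are hypothetical.

```python
import math
import statistics


def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of a single ionizable site in its protonated form at a given pH.

    Henderson-Hasselbalch for one site: [protonated]/[deprotonated] = 10**(pKa - pH).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def median_absolute_error(predicted, observed):
    """Median of |predicted - observed|, e.g. in pKa units."""
    return statistics.median(abs(p - o) for p, o in zip(predicted, observed))


def rmse(predicted, observed):
    """Root mean square error, e.g. in pKa units."""
    errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(errors) / len(errors))
```

At pH 7.4, a site with pKa 7.4 is half protonated, while a site with pKa 9.4 is about 99% protonated. Because the fraction depends on 10 raised to the pKa error, a 0.42-unit median error shifts the predicted ratio by a factor of about 2.6, which is why small pKa errors still matter when enumerating protonation states for library preparation.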
Affiliation(s)
- Ryne C Johnston
- Schrödinger, Inc., 101 SW Main Street, Suite 1300, Portland, Oregon 97204, United States
- Kun Yao
- Schrödinger, Inc., 1540 Broadway Street, 24th Floor, New York, New York 10036, United States
- Zachary Kaplan
- Schrödinger, Inc., 1540 Broadway Street, 24th Floor, New York, New York 10036, United States
- Monica Chelliah
- Schrödinger, Inc., 1540 Broadway Street, 24th Floor, New York, New York 10036, United States
- Karl Leswing
- Schrödinger, Inc., 1540 Broadway Street, 24th Floor, New York, New York 10036, United States
- Sean Seekins
- Schrödinger, Inc., 101 SW Main Street, Suite 1300, Portland, Oregon 97204, United States
- Shawn Watts
- Schrödinger, Inc., 101 SW Main Street, Suite 1300, Portland, Oregon 97204, United States
- David Calkins
- Schrödinger, Inc., 101 SW Main Street, Suite 1300, Portland, Oregon 97204, United States
- Jackson Chief Elk
- Schrödinger, Inc., 101 SW Main Street, Suite 1300, Portland, Oregon 97204, United States
- Steven V Jerome
- Schrödinger, Inc., 9171 Towne Centre Drive, San Diego, California 92122, United States
- Matthew P Repasky
- Schrödinger, Inc., 101 SW Main Street, Suite 1300, Portland, Oregon 97204, United States
- John C Shelley
- Schrödinger, Inc., 101 SW Main Street, Suite 1300, Portland, Oregon 97204, United States
| |
Collapse
|
40
|
Korn M, Ehrt C, Ruggiu F, Gastreich M, Rarey M. Navigating large chemical spaces in early-phase drug discovery. Curr Opin Struct Biol 2023; 80:102578. [PMID: 37019067 DOI: 10.1016/j.sbi.2023.102578] [Received: 12/24/2022] [Revised: 01/28/2023] [Accepted: 02/26/2023]
Abstract
The size of actionable chemical spaces is surging, owing to a variety of novel techniques, both computational and experimental. As a consequence, novel molecular matter is now at our fingertips that cannot and should not be neglected in early-phase drug discovery. Huge, combinatorial, make-on-demand chemical spaces with high probability of synthetic success rise exponentially in content, generative machine learning models go hand in hand with synthesis prediction, and DNA-encoded libraries offer new ways of hit structure discovery. These technologies make it possible to search for new chemical matter in a much broader and deeper manner with less effort and fewer financial resources. These transformational developments require new cheminformatics approaches to make huge chemical spaces searchable and analyzable with low resources, and with as little energy consumption as possible. Substantial progress has been made in the past years with respect to computation as well as organic synthesis. First examples of bioactive compounds resulting from the successful use of these novel technologies demonstrate their power to contribute to tomorrow's drug discovery programs. This article gives a compact overview of the state of the art.
Affiliation(s)
- Malte Korn
- Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstr. 43, 20146 Hamburg, Germany
- Christiane Ehrt
- Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstr. 43, 20146 Hamburg, Germany
- Fiorella Ruggiu
- insitro, 279 E Grand Ave., CA 94608, South San Francisco, USA
- Marcus Gastreich
- BioSolveIT GmbH, An der Ziegelei 79, 53757 Sankt Augustin, Germany
- Matthias Rarey
- Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstr. 43, 20146 Hamburg, Germany

41
Sadybekov AV, Katritch V. Computational approaches streamlining drug discovery. Nature 2023; 616:673-685. [PMID: 37100941 DOI: 10.1038/s41586-023-05905-z] [Received: 04/21/2022] [Accepted: 03/01/2023]
Abstract
Computer-aided drug discovery has been around for decades, although the past few years have seen a tectonic shift towards embracing computational technologies in both academia and pharma. This shift is largely defined by the flood of data on ligand properties and binding to therapeutic targets and their 3D structures, abundant computing capacities and the advent of on-demand virtual libraries of drug-like small molecules in their billions. Taking full advantage of these resources requires fast computational methods for effective ligand screening. This includes structure-based virtual screening of gigascale chemical spaces, further facilitated by fast iterative screening approaches. Highly synergistic are developments in deep learning predictions of ligand properties and target activities in lieu of receptor structure. Here we review recent advances in ligand discovery technologies, their potential for reshaping the whole process of drug discovery and development, as well as the challenges they encounter. We also discuss how the rapid identification of highly diverse, potent, target-selective and drug-like ligands to protein targets can democratize the drug discovery process, presenting new opportunities for the cost-effective development of safer and more effective small-molecule treatments.
Affiliation(s)
- Anastasiia V Sadybekov
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Center for New Technologies in Drug Discovery and Development, Bridge Institute, Michelson Center for Convergent Biosciences, University of Southern California, Los Angeles, CA, USA
- Vsevolod Katritch
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Center for New Technologies in Drug Discovery and Development, Bridge Institute, Michelson Center for Convergent Biosciences, University of Southern California, Los Angeles, CA, USA
- Department of Chemistry, University of Southern California, Los Angeles, CA, USA

42
Yoo J, Kim TY, Joung I, Song SO. Industrializing AI/ML during the end-to-end drug discovery process. Curr Opin Struct Biol 2023; 79:102528. [PMID: 36736243 DOI: 10.1016/j.sbi.2023.102528] [Received: 09/15/2022] [Revised: 12/16/2022] [Accepted: 12/20/2022]
Abstract
Drug discovery aims to select proper targets and drug candidates to address unmet clinical needs. The end-to-end drug discovery process includes all stages of drug discovery from target identification to drug candidate selection. Recently, several artificial intelligence and machine learning (AI/ML)-based drug discovery companies have attempted to build data-driven platforms spanning the end-to-end drug discovery process. The ability to identify elusive targets essentially leads to the diversification of discovery pipelines, thereby increasing the ability to address unmet needs. Modern ML technologies are complementing traditional computer-aided drug discovery by accelerating candidate optimization in innovative ways. This review summarizes recent developments in AI/ML methods from target identification to molecule optimization, and concludes with an overview of current industrial trends in end-to-end AI/ML platforms.
Affiliation(s)
- Jiho Yoo
- Standigm Inc., 3F, 70 Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, South Korea, 06234 +82.2.501.8118
- Tae Yong Kim
- Standigm Inc., 3F, 70 Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, South Korea, 06234 +82.2.501.8118
- InSuk Joung
- Standigm Inc., 3F, 70 Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, South Korea, 06234 +82.2.501.8118
- Sang Ok Song
- Standigm Inc., 3F, 70 Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, South Korea, 06234 +82.2.501.8118

43
Jung S, Vatheuer H, Czodrowski P. VSFlow: an open-source ligand-based virtual screening tool. J Cheminform 2023; 15:40. [PMID: 37004101 PMCID: PMC10064649 DOI: 10.1186/s13321-023-00703-1] [Received: 05/24/2022] [Accepted: 02/18/2023]
Abstract
Ligand-based virtual screening is a widespread method in modern drug design. It allows rapid screening of large compound databases to identify similar structures. Here we report an open-source command-line tool that supports substructure-, fingerprint- and shape-based virtual screening. Most of the implemented features rely fully on the RDKit cheminformatics framework. VSFlow accepts a wide range of input file formats and is highly customizable. Additionally, screening results can be quickly visualized as PDF and/or PyMOL files.
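Fingerprint-based screening of this kind generally ranks database molecules by their Tanimoto similarity to a query fingerprint. The sketch below shows only that ranking step in plain Python. It does not use VSFlow's command-line interface or RDKit; the fingerprints (stored as sets of on-bit indices), compound names, threshold and top-N values are hypothetical placeholders.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for binary fingerprints stored as on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen(query, database, threshold=0.5, top_n=10):
    """Return (name, similarity) pairs at or above threshold, most similar first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top_n]


# Hypothetical query and library fingerprints.
query_fp = {1, 5, 9, 12, 40}
library = {
    "cmpd_A": {1, 5, 9, 12, 41},
    "cmpd_B": {2, 6, 10},
    "cmpd_C": {1, 5, 9, 12, 40, 77},
}

for name, sim in screen(query_fp, library):
    print(f"{name}\t{sim:.2f}")
```

A real tool would generate the fingerprints from structures (for example with RDKit) and typically use packed bit vectors rather than Python sets for speed.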
Affiliation(s)
- Sascha Jung
- Department of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227 Dortmund, Germany
- Helge Vatheuer
- Department of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227 Dortmund, Germany
- Paul Czodrowski
- Department of Chemistry, Johannes Gutenberg University Mainz, Duesbergweg 10-14, 55128 Mainz, Germany

44
Petinrin OO, Saeed F, Toseef M, Liu Z, Basurra S, Muyide IO, Li X, Lin Q, Wong KC. Machine learning in metastatic cancer research: Potentials, possibilities, and prospects. Comput Struct Biotechnol J 2023; 21:2454-2470. [PMID: 37077177 PMCID: PMC10106342 DOI: 10.1016/j.csbj.2023.03.046] [Received: 01/17/2023] [Revised: 03/26/2023] [Accepted: 03/27/2023]
Abstract
Cancer has received extensive recognition for its high mortality rate, with metastatic cancer being the top cause of cancer-related deaths. Metastatic cancer involves the spread of the primary tumor to other body organs. As much as the early detection of cancer is essential, the timely detection of metastasis, the identification of biomarkers, and treatment choice are valuable for improving the quality of life for metastatic cancer patients. This study reviews the existing studies on classical machine learning (ML) and deep learning (DL) in metastatic cancer research. Since the majority of metastatic cancer research data are collected in the formats of PET/CT and MRI image data, deep learning techniques are heavily involved. However, its black-box nature and expensive computational cost are notable concerns. Furthermore, existing models could be overestimated for their generality due to the non-diverse population in clinical trial datasets. Therefore, research gaps are itemized; follow-up studies should be carried out on metastatic cancer using machine learning and deep learning tools with data in a symmetric manner.
Affiliation(s)
- Faisal Saeed
- DAAI Research Group, Department of Computing and Data Science, School of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK
- Muhammad Toseef
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong SAR
- Zhe Liu
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong SAR
- Shadi Basurra
- DAAI Research Group, Department of Computing and Data Science, School of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK
- Xiangtao Li
- School of Artificial Intelligence, Jilin University, Jilin, China
- Qiuzhen Lin
- School of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong SAR
- Hong Kong Institute for Data Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong SAR

45
Zimmermann RA, Fischer TR, Schwickert M, Nidoieva Z, Schirmeister T, Kersten C. Chemical Space Virtual Screening against Hard-to-Drug RNA Methyltransferases DNMT2 and NSUN6. Int J Mol Sci 2023; 24:ijms24076109. [PMID: 37047081 PMCID: PMC10094593 DOI: 10.3390/ijms24076109] [Received: 01/20/2023] [Revised: 02/20/2023] [Accepted: 03/22/2023]
Abstract
Targeting RNA methyltransferases with small molecules as inhibitors or tool compounds is an emerging field of interest in epitranscriptomics and medicinal chemistry. For two challenging RNA methyltransferases that introduce the 5-methylcytosine (m5C) modification in different tRNAs, namely DNMT2 and NSUN6, an ultra-large commercially available chemical space was virtually screened by physicochemical property filtering, molecular docking, and clustering to identify new ligands for those enzymes. Novel chemotypes binding to DNMT2 and NSUN6 with affinities down to KD,app = 37 µM and KD,app = 12 µM, respectively, were identified using a microscale thermophoresis (MST) binding assay. These compounds represent the first molecules with a distinct structure from the cofactor SAM and have the potential to be developed into activity-based probes for these enzymes. Additionally, the challenges and strategies of chemical space docking screens with special emphasis on library focusing and diversification are discussed.
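The filter, dock and cluster workflow described above can be outlined in a short sketch. Docking itself is out of scope here, so a hypothetical `score` field stands in for a docking score (lower is better). The property thresholds and the greedy leader (sphere-exclusion) clustering on Tanimoto similarity are illustrative assumptions, not the settings or clustering method used in the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mol_weight: float
    clogp: float
    score: float      # hypothetical docking score; lower = better
    fp: frozenset     # fingerprint as a set of on-bit indices


def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def passes_filter(c, max_mw=400.0, max_clogp=3.5):
    """Illustrative physicochemical filter; both thresholds are assumptions."""
    return c.mol_weight <= max_mw and c.clogp <= max_clogp


def leader_cluster(cands, cutoff=0.6):
    """Greedy leader clustering: the best-scoring unassigned compound seeds a new cluster."""
    leaders, clusters = [], {}
    for c in sorted(cands, key=lambda c: c.score):
        for lead in leaders:
            if tanimoto(c.fp, lead.fp) >= cutoff:
                clusters[lead.name].append(c.name)
                break
        else:  # no sufficiently similar leader found
            leaders.append(c)
            clusters[c.name] = [c.name]
    return leaders, clusters


# Hypothetical docked compounds.
library = [
    Candidate("L1", 320.0, 2.1, -9.8, frozenset({1, 2, 3, 4})),
    Candidate("L2", 335.0, 2.4, -9.1, frozenset({1, 2, 3, 4, 5})),
    Candidate("L3", 510.0, 4.9, -11.0, frozenset({7, 8, 9})),
    Candidate("L4", 290.0, 1.2, -8.7, frozenset({20, 21, 22})),
]

filtered = [c for c in library if passes_filter(c)]
leaders, clusters = leader_cluster(filtered)
print([c.name for c in leaders], clusters)
```

Sorting by score before clustering means each cluster is represented by its best-scoring member, which is one common way to keep a diverse, top-ranked subset for purchase and testing.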
Affiliation(s)
- Robert A Zimmermann
- Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University, Staudingerweg 5, 55128 Mainz, Germany
- Tim R Fischer
- Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University, Staudingerweg 5, 55128 Mainz, Germany
- Marvin Schwickert
- Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University, Staudingerweg 5, 55128 Mainz, Germany
- Zarina Nidoieva
- Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University, Staudingerweg 5, 55128 Mainz, Germany
- Tanja Schirmeister
- Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University, Staudingerweg 5, 55128 Mainz, Germany
- Christian Kersten
- Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University, Staudingerweg 5, 55128 Mainz, Germany

46
Sala D, Batebi H, Ledwitch K, Hildebrand PW, Meiler J. Targeting in silico GPCR conformations with ultra-large library screening for hit discovery. Trends Pharmacol Sci 2023; 44:150-161. [PMID: 36669974 PMCID: PMC9974811 DOI: 10.1016/j.tips.2022.12.006] [Received: 11/21/2022] [Revised: 12/23/2022] [Accepted: 12/27/2022]
Abstract
The use of deep machine learning (ML) in protein structure prediction has made it possible to easily access a large number of annotated conformations that can potentially compensate for missing experimental structures in structure-based drug discovery (SBDD). However, it is still unclear whether the accuracy of these predicted conformations is sufficient for screening chemical compounds that will effectively interact with a protein target for pharmacological purposes. In this opinion article, we examine the potential benefits and limitations of using state-annotated conformations for ultra-large library screening (ULLS) in light of the growing size of ultra-large libraries (ULLs). We believe that targeting different conformational states of common drug targets like G-protein-coupled receptors (GPCRs), which can regulate human physiology by switching between different conformations, can offer multiple advantages.
Affiliation(s)
- D Sala
- Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany
- H Batebi
- Institute of Medical Physics and Biophysics, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany
- K Ledwitch
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37240, USA; Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
- P W Hildebrand
- Institute of Medical Physics and Biophysics, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany
- J Meiler
- Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany; Center for Structural Biology, Vanderbilt University, Nashville, TN 37240, USA; Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA

47
Tingle B, Tang KG, Castanon M, Gutierrez JJ, Khurelbaatar M, Dandarchuluun C, Moroz YS, Irwin JJ. ZINC-22─A Free Multi-Billion-Scale Database of Tangible Compounds for Ligand Discovery. J Chem Inf Model 2023; 63:1166-1176. [PMID: 36790087 PMCID: PMC9976280 DOI: 10.1021/acs.jcim.2c01253] [Received: 10/07/2022]
Abstract
Purchasable chemical space has grown rapidly into the tens of billions of molecules, providing unprecedented opportunities for ligand discovery but straining the tools that might exploit these molecules at scale. We have therefore developed ZINC-22, a database of commercially accessible small molecules derived from multi-billion-scale make-on-demand libraries. The new database and tools enable analog searching in this vast new space via a facile GUI, CartBlanche, drawing on similarity methods that scale sublinearly in the number of molecules. The new library also uses data organization methods, enabling rapid lookup of molecules and their physical properties, including conformations, partial atomic charges, cLogP values, and solvation energies, all crucial for molecule docking, which had become slow with older database organizations in previous versions of ZINC. As the libraries have continued to grow, we have been interested in finding whether molecular diversity has suffered, for instance, because certain scaffolds have come to dominate via easy analoging. This has not occurred thus far, and chemical diversity continues to grow with database size, with a log increase in Bemis-Murcko scaffolds for every two-log unit increase in database size. Most new scaffolds come from compounds with the highest heavy atom count. Finally, we consider the implications for databases like ZINC as the libraries grow toward and beyond the trillion-molecule range. ZINC is freely available to everyone and may be accessed at cartblanche22.docking.org, via Globus, and in the Amazon AWS and Oracle OCI clouds.
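The scaffold-growth statement above (one log unit of Bemis-Murcko scaffolds per two log units of database size) can be restated as a square-root power law. This is only a rewriting of the reported trend, not a fit performed here; N (number of molecules), S (number of scaffolds) and the constants c and k are notation introduced for this example.

```latex
\log_{10} S = \tfrac{1}{2}\,\log_{10} N + c
\quad\Longrightarrow\quad
S = k\,\sqrt{N}, \qquad k = 10^{c}
```

Read this way, growing a library 100-fold (for example from 10^10 to 10^12 molecules) would be expected to increase the scaffold count roughly 10-fold, provided the trend continues to hold at that scale.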
Affiliation(s)
- Benjamin I. Tingle
- Department of Pharmaceutical Chemistry, University of California San Francisco, 1700 4th St, Mailcode 2550, San Francisco, California 94158-2330, United States
- Khanh G. Tang
- Department of Pharmaceutical Chemistry, University of California San Francisco, 1700 4th St, Mailcode 2550, San Francisco, California 94158-2330, United States
- Mar Castanon
- Department of Pharmaceutical Chemistry, University of California San Francisco, 1700 4th St, Mailcode 2550, San Francisco, California 94158-2330, United States
- John J. Gutierrez
- Department of Pharmaceutical Chemistry, University of California San Francisco, 1700 4th St, Mailcode 2550, San Francisco, California 94158-2330, United States
- Munkhzul Khurelbaatar
- Department of Pharmaceutical Chemistry, University of California San Francisco, 1700 4th St, Mailcode 2550, San Francisco, California 94158-2330, United States
- Chinzorig Dandarchuluun
- Department of Pharmaceutical Chemistry, University of California San Francisco, 1700 4th St, Mailcode 2550, San Francisco, California 94158-2330, United States
- Yurii S. Moroz
- Taras Shevchenko National University of Kyïv, 60 Volodymyrska Street, Kyïv 01601, Ukraine
- Chemspace LLC, 85 Chervonotkatska Street, Kyïv 02094, Ukraine
- John J. Irwin
- Department of Pharmaceutical Chemistry, University of California San Francisco, 1700 4th St, Mailcode 2550, San Francisco, California 94158-2330, United States

48
Buehler Y, Reymond JL. Molecular Framework Analysis of the Generated Database GDB-13s. J Chem Inf Model 2023; 63:484-492. [PMID: 36533982 PMCID: PMC9875802 DOI: 10.1021/acs.jcim.2c01107] [Received: 09/02/2022]
Abstract
The generated databases (GDBs) list billions of possible molecules from systematic enumeration following simple rules of chemical stability and synthetic feasibility. To assess the originality of GDB molecules, we compared their Bemis and Murcko molecular frameworks (MFs) with those in public databases. MFs result from molecules by converting all atoms to carbons, all bonds to single bonds, and removing terminal atoms iteratively until none remain. We compared GDB-13s (99,394,177 molecules up to 13 atoms containing simplified functional groups, 22,130 MFs) with ZINC (885,905,524 screening compounds, 1,016,597 MFs), PubChem50 (100,852,694 molecules up to 50 atoms, 1,530,189 MFs), and COCONUT (401,624 natural products, 42,734 MFs). While MFs in public databases mostly contained linker bonds and six-membered rings, GDB-13s MFs had diverse ring sizes and ring systems without linker bonds. Most GDB-13s MFs were exclusive to this database, and many were relatively simple, representing attractive targets for synthetic chemistry aiming at innovative molecules.
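The framework definition quoted above (all atoms to carbon, all bonds to single, then iterative removal of terminal atoms) is simple enough to sketch on a plain graph. The function below is a hypothetical stand-alone illustration, not the authors' code: a molecule is given as a dict of atom indices to element symbols plus a list of (i, j, order) bond tuples, whereas in practice a toolkit such as RDKit would parse the structures. Repeatedly deleting atoms with at most one neighbor leaves the graph's 2-core, so acyclic molecules reduce to an empty framework.

```python
from collections import Counter


def molecular_framework(atoms, bonds):
    """Reduce a molecular graph to its framework as defined in the abstract.

    atoms: dict mapping atom index -> element symbol
    bonds: iterable of (i, j, bond_order) tuples
    Returns (framework_atoms, framework_bonds) with every atom 'C'
    and every bond order 1.
    """
    # 1. Convert all atoms to carbon.
    kept = {idx: "C" for idx in atoms}
    # 2. Convert all bonds to single bonds (the original order is discarded).
    edges = {frozenset((i, j)) for i, j, _order in bonds}
    # 3. Remove terminal atoms (degree <= 1) repeatedly until none remain.
    while True:
        degree = Counter(a for edge in edges for a in edge)
        terminal = {a for a in kept if degree[a] <= 1}
        if not terminal:
            break
        for a in terminal:
            del kept[a]
        edges = {edge for edge in edges if not edge & terminal}
    framework_bonds = sorted((min(e), max(e), 1) for e in edges)
    return kept, framework_bonds


# Diphenylmethanol: two benzene rings joined by a CH(OH) linker (atom 6),
# with the hydroxyl oxygen as atom 13. Kekulé bond orders are used.
ring_a = [(i, (i + 1) % 6, 2 if i % 2 == 0 else 1) for i in range(6)]
ring_b = [(7 + i, 7 + (i + 1) % 6, 2 if i % 2 == 0 else 1) for i in range(6)]
atoms = {i: "C" for i in range(13)}
atoms[13] = "O"
bonds = ring_a + ring_b + [(0, 6, 1), (6, 7, 1), (6, 13, 1)]

fw_atoms, fw_bonds = molecular_framework(atoms, bonds)
print(len(fw_atoms), len(fw_bonds))  # 13 14: the OH is pruned, the linker carbon stays
```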
Affiliation(s)
- Ye Buehler
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- Jean-Louis Reymond
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland

49
The 'Big Bang' of the chemical universe. Nat Chem Biol 2023; 19:667-668. [PMID: 36646955 DOI: 10.1038/s41589-022-01233-x]

50
Perebyinis M, Rognan D. Overlap of On-demand Ultra-large Combinatorial Spaces with On-the-shelf Drug-like Libraries. Mol Inform 2023; 42:e2200163. [PMID: 36072995 DOI: 10.1002/minf.202200163] [Received: 07/14/2022] [Accepted: 09/07/2022]
Abstract
On-demand combinatorial spaces are shifting paradigms in early drug discovery by considerably increasing the searchable chemical space to several billion compounds while securing their synthetic accessibility. Here we systematically compared the on-the-shelf available drug-like chemical space (9 million compounds) to three on-demand ultra-large (ODUL) combinatorial fragment spaces (REAL, CHEMriya, GalaXi) covering 32 billion readily accessible molecules. Surprisingly, only one space (REAL) overlaps almost entirely with the currently available drug-like space, suggesting that it is the only ODUL space widely suitable for in-stock hit expansion. Of course, expanding a preliminary ODUL hit in the same chemical space is the best possible strategy to rapidly generate structure-activity relationships. All three spaces remain well suited to early hit-finding initiatives since they all provide numerous unique scaffolds that are not described by on-the-shelf collections.
Affiliation(s)
- Mariana Perebyinis
- Laboratoire d'Innovation Thérapeutique, UMR7200 CNRS-Université de Strasbourg, 74 route du Rhin, F-67400, Illkirch, France
- Didier Rognan
- Laboratoire d'Innovation Thérapeutique, UMR7200 CNRS-Université de Strasbourg, 74 route du Rhin, F-67400, Illkirch, France