1
Vogt M. Chemoinformatic approaches for navigating large chemical spaces. Expert Opin Drug Discov 2024; 19:403-414. [PMID: 38300511 DOI: 10.1080/17460441.2024.2313475] [Received: 12/12/2023] [Accepted: 01/30/2024]
Abstract
INTRODUCTION Large chemical spaces (CSs) include traditional large compound collections, combinatorial libraries covering billions to trillions of molecules, DNA-encoded chemical libraries comprising complete combinatorial CSs in a single mixture, and virtual CSs explored by generative models. The diverse nature of these types of CSs requires different chemoinformatic approaches for navigation. AREAS COVERED An overview of different types of large CSs is provided. Molecular representations and similarity metrics suitable for large CS exploration are discussed. Navigation of CSs by generative models is summarized. Methods for characterizing and comparing CSs are discussed. EXPERT OPINION The size of large CSs might restrict navigation to specialized algorithms and limit it to neighborhoods of structurally similar molecules. Efficient navigation of large CSs requires not only methods that scale with size but also smart approaches that focus on better, not necessarily larger, molecule selections. Deep generative models aim to provide such approaches by implicitly learning features relevant to targeted biological properties. It is unclear whether these models can fulfill this ideal, because validation remains difficult as long as the covered CSs are mainly virtual and lack experimental verification.
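The abstract notes that the size of large CSs may restrict navigation to neighborhoods of structurally similar molecules. As a minimal illustration of what such a neighborhood query computes, the sketch below ranks library members by Tanimoto similarity to a query. It assumes fingerprints are already available as sets of "on" bit indices (in practice they would come from a cheminformatics toolkit) and uses a brute-force linear scan, which is precisely the part that stops scaling at billions of molecules; the specialized algorithms the review discusses are not shown. The names `tanimoto` and `neighborhood` are illustrative only.

```python
from typing import FrozenSet, List, Sequence, Tuple

Fingerprint = FrozenSet[int]  # indices of the "on" bits (assumed representation)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two bit sets."""
    union = len(a | b)
    if union == 0:
        return 1.0  # assumed convention: two empty fingerprints count as identical
    return len(a & b) / union


def neighborhood(query: Fingerprint, library: Sequence[Fingerprint],
                 threshold: float = 0.7) -> List[Tuple[int, float]]:
    """Brute-force scan: (index, similarity) of members at or above threshold, best first."""
    hits = [(i, tanimoto(query, fp)) for i, fp in enumerate(library)]
    return sorted((h for h in hits if h[1] >= threshold),
                  key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    query = frozenset({1, 2, 3, 4})
    library = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}), frozenset({7, 8})]
    print(neighborhood(query, library, threshold=0.5))
```

The threshold defines the "neighborhood"; lowering it widens the search but, with a linear scan, does not change its cost.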
Affiliation(s)
- Martin Vogt
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
2
Tropsha A, Isayev O, Varnek A, Schneider G, Cherkasov A. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat Rev Drug Discov 2024; 23:141-155. [PMID: 38066301 DOI: 10.1038/s41573-023-00832-0] [Accepted: 10/21/2023]
Abstract
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade since the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications, and on the need for open-source and democratized resources to support computer-aided drug design.
Affiliation(s)
- Artem Cherkasov
- University of British Columbia, Vancouver, BC, Canada.
- Photonic Inc., Coquitlam, BC, Canada.
3
Pikalyova R, Zabolotna Y, Horvath D, Marcou G, Varnek A. Meta-GTM: Visualization and Analysis of the Chemical Library Space. J Chem Inf Model 2023; 63:5571-5582. [PMID: 37602843 DOI: 10.1021/acs.jcim.3c00719]
Abstract
In chemical library analysis, it may be useful to describe libraries as individual items rather than collections of compounds. This is particularly true for ultra-large, non-cherry-pickable compound mixtures such as DNA-encoded libraries (DELs). In this sense, the chemical library space (CLS) is useful for managing a portfolio of libraries, just as chemical space (CS) helps manage a portfolio of molecules. Several possible CLSs were previously defined using vectorial library representations obtained from generative topographic mapping (GTM). Given the steadily growing number of DEL designs, the CLS becomes "crowded" and requires analysis tools beyond pairwise library comparison. Therefore, herein, we investigate the cartography of CLS on meta-GTMs (μGTMs), where "meta" is a reminder that these are maps of the CLS, which is itself based on responsibility vectors produced by regular CS GTMs. 2.5K DELs and ChEMBL (as the reference) were projected onto the μGTM, producing landscapes of library-specific properties. These describe both interlibrary similarity and intrinsic library characteristics in the same view, thereby facilitating the selection of the best project-specific libraries.
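The GTM-based entries in this list rely on responsibility vectors. Assuming a GTM has already been trained (node images `Y` in descriptor space and noise precision `beta` known; training itself is not shown), the responsibility of node k for molecule n is its Gaussian likelihood exp(-beta/2 * ||t_n - y_k||^2), normalized over all nodes under uniform node priors. The sketch below computes this with NumPy, together with the usual 2D map position (the responsibility-weighted mean of the latent grid coordinates). It is a generic illustration, not the authors' implementation, and the function names are hypothetical.

```python
import numpy as np


def responsibilities(T: np.ndarray, Y: np.ndarray, beta: float) -> np.ndarray:
    """GTM responsibilities R[n, k] of K map nodes for N molecules.

    T    : (N, D) molecular descriptor vectors
    Y    : (K, D) images of the map nodes in descriptor space (from an already trained map)
    beta : inverse variance of the isotropic Gaussian noise model
    Uniform node priors are assumed, so the Gaussian normalization constant cancels.
    """
    # Squared Euclidean distance between every molecule and every node image: (N, K)
    d2 = ((T[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    log_p = -0.5 * beta * d2
    # Subtract each row's maximum before exponentiating to avoid underflow
    log_p -= log_p.max(axis=1, keepdims=True)
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


def map_positions(R: np.ndarray, X: np.ndarray) -> np.ndarray:
    """2D map coordinates: responsibility-weighted mean of the latent grid points X (K, 2)."""
    return R @ X


if __name__ == "__main__":
    Y = np.array([[0.0, 0.0], [10.0, 10.0]])  # toy node images in a 2D descriptor space
    X = np.array([[0.0, 0.0], [1.0, 1.0]])    # matching latent grid coordinates
    T = np.array([[5.0, 5.0], [0.0, 0.0]])
    R = responsibilities(T, Y, beta=1.0)
    print(R.round(3))
    print(map_positions(R, X).round(3))
```

Each row of `R` sums to 1, so it can be read as a fuzzy assignment of one molecule to the map nodes; this is the vector that library-level representations build on.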
Affiliation(s)
- Regina Pikalyova
- Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Yuliana Zabolotna
- Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Dragos Horvath
- Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Gilles Marcou
- Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Alexandre Varnek
- Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
4
Pikalyova R, Zabolotna Y, Horvath D, Marcou G, Varnek A. Chemical Library Space: Definition and DNA-Encoded Library Comparison Study Case. J Chem Inf Model 2023. [PMID: 37368824 DOI: 10.1021/acs.jcim.3c00520]
Abstract
The development of DNA-encoded library (DEL) technology introduced new challenges for the analysis of chemical libraries. It is often useful to consider a chemical library as a stand-alone chemoinformatic object, represented both as a collection of independent molecules and as an individual entity, in particular when libraries are inseparable mixtures such as DELs. Herein, we introduce the concept of chemical library space (CLS), in which resident items are individual chemical libraries. We define and compare four vectorial library representations obtained using generative topographic mapping. These allow for an effective comparison of libraries, with the ability to tune and chemically interpret the similarity relationships. In particular, property-tuned CLS encodings enable simultaneous comparison of libraries with respect to both property and chemotype distributions. We apply the various CLS encodings to the problem of selecting DELs that optimally "match" a reference collection (here ChEMBL28), showing how the choice of CLS descriptors may help fine-tune the "matching" (overlap) criteria. Hence, the proposed CLS may represent a new, efficient way to perform polyvalent analysis of thousands of chemical libraries. Selection of an easily accessible compound collection for drug discovery, as a substitute for a difficult-to-produce reference library, can be tuned for either primary or target-focused screening, also considering property distributions of compounds. Alternatively, selection of libraries covering novel regions of chemical space with respect to a reference compound subspace may serve for library portfolio enrichment.
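One simple way to turn a library into a single vector, assumed here for illustration, is to average the GTM responsibility vectors of its members; libraries can then be compared and ranked against a reference such as ChEMBL. The paper defines four GTM-based representations and its own similarity criteria, which this sketch does not reproduce: the mean-responsibility vector and cosine similarity are convenient stand-ins, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

import numpy as np


def library_vector(R: np.ndarray) -> np.ndarray:
    """Collapse an (N, K) responsibility matrix into one K-dim vector (mean over molecules)."""
    return R.mean(axis=0)


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def rank_libraries(candidates: Dict[str, np.ndarray],
                   reference: np.ndarray) -> List[Tuple[str, float]]:
    """Rank candidate libraries by similarity of their library vectors to the reference's."""
    ref = library_vector(reference)
    scores = {name: cosine(library_vector(R), ref) for name, R in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy responsibilities on a 3-node map (rows = molecules)
    reference = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    candidates = {
        "DEL_A": np.array([[1.0, 0.0, 0.0]]),
        "DEL_B": np.array([[0.0, 0.0, 1.0]]),
        "DEL_C": np.array([[0.5, 0.5, 0.0]]),
    }
    print(rank_libraries(candidates, reference))
```

Because the mean is normalized by library size, a small library and a large one with the same chemotype distribution receive the same vector; a summed (cumulated) vector would instead reward size.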
Affiliation(s)
- Regina Pikalyova
- Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Yuliana Zabolotna
- Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Dragos Horvath
- Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Gilles Marcou
- Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
- Alexandre Varnek
- Laboratory of Chemoinformatics, University of Strasbourg, 4, rue B. Pascal, Strasbourg 67081, France
5
Bort W, Mazitov D, Horvath D, Bonachera F, Lin A, Marcou G, Baskin I, Madzhidov T, Varnek A. Inverse QSAR: Reversing Descriptor-Driven Prediction Pipeline Using Attention-Based Conditional Variational Autoencoder. J Chem Inf Model 2022; 62:5471-5484. [PMID: 36332178 DOI: 10.1021/acs.jcim.2c01086]
Abstract
In order to better formalize it, the notorious inverse-QSAR problem (finding structures with given QSAR-predicted properties) is considered in this paper as a two-step process: (i) finding "seed" descriptor vectors corresponding to user-constrained QSAR model output values and (ii) identifying the chemical structures that best match the "seed" vectors. The main development effort here focused on the latter stage, proposing a new attention-based conditional variational autoencoder neural-network architecture based on recent developments in attention-based methods. The obtained results show that this workflow was capable of generating compounds predicted to display the desired activity while being completely novel compared to the training database (ChEMBL). Moreover, the generated compounds show acceptable druglikeness and synthetic accessibility. Both pharmacophore and docking studies were carried out as "orthogonal" in silico validation methods, proving that some of the de novo structures, beyond being predicted active by 2D-QSAR models, are clearly able to match binding 3D pharmacophores and bind the protein pocket.
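Step (i) of this two-step formulation can be illustrated with a deliberately simple stand-in: sample descriptor vectors within assumed bounds and keep those whose QSAR prediction falls inside the user-specified window. The linear model in the usage example is hypothetical, plain random search is our choice rather than the paper's procedure, and step (ii), decoding seeds into structures with the attention-based conditional VAE, is not shown.

```python
from typing import Callable, Sequence

import numpy as np


def find_seeds(predict: Callable[[np.ndarray], np.ndarray],
               lower: Sequence[float], upper: Sequence[float],
               target_lo: float, target_hi: float,
               n_samples: int = 10_000, seed: int = 0) -> np.ndarray:
    """Random search for descriptor vectors whose prediction lies in [target_lo, target_hi].

    lower/upper bound each descriptor; `predict` maps an (n, d) array to (n,) predictions.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    rng = np.random.default_rng(seed)
    # Uniform samples inside the descriptor box (bounds broadcast across rows)
    X = rng.uniform(lower, upper, size=(n_samples, lower.size))
    y = predict(X)
    return X[(y >= target_lo) & (y <= target_hi)]


if __name__ == "__main__":
    # Hypothetical linear QSAR model: y = 1.0 * d1 - 0.5 * d2
    w = np.array([1.0, -0.5])
    seeds = find_seeds(lambda X: X @ w, lower=[0, 0], upper=[1, 1],
                       target_lo=0.5, target_hi=1.0)
    print(len(seeds), "seed vectors found")
```

Random search wastes most samples when the target window is narrow or the descriptor space is high-dimensional; any optimizer over the QSAR model could replace it without changing the interface.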
Affiliation(s)
- William Bort
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Daniyar Mazitov
- Laboratory of Chemoinformatics and Molecular Modeling, A. M. Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
- Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Fanny Bonachera
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Arkadii Lin
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Igor Baskin
- Department of Material Science and Engineering, Technion-Israel Institute of Technology, 3200003 Haifa, Israel
- Timur Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, A. M. Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
- Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
6
Medina‐Franco JL, Chávez‐Hernández AL, López‐López E, Saldívar‐González FI. Chemical Multiverse: An Expanded View of Chemical Space. Mol Inform 2022; 41:e2200116. [PMID: 35916110 PMCID: PMC9787733 DOI: 10.1002/minf.202200116] [Received: 05/19/2022] [Accepted: 08/01/2022]
Abstract
Technological advances and practical applications of the chemical space concept in drug discovery, natural product research, and other research areas have attracted the scientific community's attention. Large and ultra-large chemical spaces are associated with the significant increase in the number of compounds that exist or could potentially be made, and with the growing number of experimental and calculated descriptors that encode the molecular structure and/or property aspects of the molecules. Due to the importance and continued evolution of compound libraries, herein we discuss definitions of chemical space proposed in the literature and emphasize the convenience, also discussed in the literature, of using complementary descriptors to obtain a comprehensive view of the chemical space of compound data sets. In this regard, we introduce the term chemical multiverse to refer to the comprehensive analysis of compound data sets through several chemical spaces, each defined by a different set of chemical representations. The chemical multiverse is contrasted with a related idea: consensus chemical space.
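To make the "multiverse" idea concrete: the same compounds described by different representations can have different nearest neighbors, so each representation defines its own chemical space. The sketch below, with hypothetical representation names and toy bit sets standing in for real descriptors, reports the nearest neighbor of a query separately in each representation. It illustrates the concept only and is not a method taken from the paper.

```python
from typing import Dict, FrozenSet

# compound name -> {representation name -> feature set}; all names are hypothetical
Compounds = Dict[str, Dict[str, FrozenSet[int]]]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two feature sets (1.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def nearest_per_representation(query: str, compounds: Compounds) -> Dict[str, str]:
    """For each representation, the name of the compound most similar to `query`."""
    result = {}
    for rep, q_feats in compounds[query].items():
        candidates = [(name, tanimoto(q_feats, feats[rep]))
                      for name, feats in compounds.items() if name != query]
        result[rep] = max(candidates, key=lambda c: c[1])[0]
    return result


if __name__ == "__main__":
    data = {
        "query": {"fp_A": frozenset({1, 2, 3}), "fp_B": frozenset({10, 11})},
        "cpd_1": {"fp_A": frozenset({1, 2, 3}), "fp_B": frozenset({20})},
        "cpd_2": {"fp_A": frozenset({5}), "fp_B": frozenset({10, 11})},
    }
    print(nearest_per_representation("query", data))
```

Here the query's nearest neighbor is `cpd_1` in one representation and `cpd_2` in the other, which is the kind of disagreement a multiverse analysis keeps visible instead of averaging away.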
Affiliation(s)
- José L. Medina‐Franco
- DIFACQUIM research group, Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City 04510, Mexico
- Ana L. Chávez‐Hernández
- DIFACQUIM research group, Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City 04510, Mexico
- Edgar López‐López
- Department of Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV), Mexico City 07360, Mexico
- Fernanda I. Saldívar‐González
- DIFACQUIM research group, Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City 04510, Mexico
7
Zabolotna Y, Bonachera F, Horvath D, Lin A, Marcou G, Klimchuk O, Varnek A. Chemspace Atlas: Multiscale Chemography of Ultralarge Libraries for Drug Discovery. J Chem Inf Model 2022; 62:4537-4548. [DOI: 10.1021/acs.jcim.2c00509]
Affiliation(s)
- Yuliana Zabolotna
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Fanny Bonachera
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Dragos Horvath
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Arkadii Lin
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Gilles Marcou
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Olga Klimchuk
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Alexandre Varnek
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
8
Pikalyova R, Zabolotna Y, Volochnyuk D, Horvath D, Marcou G, Varnek A. Exploration of the chemical space of DNA-encoded libraries. Mol Inform 2022; 41:e2100289. [PMID: 34981643 DOI: 10.1002/minf.202100289] [Received: 10/23/2021] [Accepted: 01/03/2022]
Abstract
DNA-Encoded Library (DEL) technology has emerged as an alternative method for the discovery of bioactive molecules in medicinal chemistry. It enables the simple synthesis and screening of compound libraries of enormous size. Even though it gains popularity every day, there are almost no reports of chemoinformatics analysis of DEL chemical space. Therefore, in this project, we aimed to generate and analyze the ultra-large chemical space of DELs. Around 2500 DELs were designed using commercially available building blocks (BBs), resulting in 2.5B DEL compounds that were compared to biologically relevant compounds from ChEMBL using Generative Topographic Mapping. This allowed the selection of several optimal DELs that cover the chemical space of ChEMBL to the highest extent and thus contain the maximum possible percentage of biologically relevant chemotypes. Different combinations of DELs were also analyzed to identify a set of mutually complementary libraries that attain even higher coverage of ChEMBL than is possible with a single DEL.
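Selecting mutually complementary DELs can be framed as a coverage problem. Assuming each library is summarized by the set of GTM nodes it populates, and the reference (ChEMBL) by its own node set, a greedy procedure adds at each step the library that covers the most not-yet-covered reference nodes. Greedy maximum coverage is our illustrative choice; the abstract does not state which selection procedure the authors used, and the library names below are made up.

```python
from typing import Dict, List, Optional, Set, Tuple


def coverage(nodes: Set[int], reference: Set[int]) -> float:
    """Fraction of reference-populated map nodes that a library also populates."""
    return len(nodes & reference) / len(reference)


def greedy_complementary(libraries: Dict[str, Set[int]], reference: Set[int],
                         k: int) -> Tuple[List[str], float]:
    """Pick up to k libraries, each time adding the one that covers most new reference nodes."""
    chosen: List[str] = []
    covered: Set[int] = set()
    for _ in range(k):
        best: Optional[str] = None
        best_gain = 0
        for name, nodes in libraries.items():
            if name in chosen:
                continue
            gain = len((nodes & reference) - covered)
            if gain > best_gain:  # ties keep the first library encountered
                best, best_gain = name, gain
        if best is None:  # no remaining library adds new reference nodes
            break
        chosen.append(best)
        covered |= libraries[best] & reference
    return chosen, len(covered) / len(reference)


if __name__ == "__main__":
    reference = set(range(1, 11))  # toy: nodes populated by the reference set
    libraries = {"DEL_A": {1, 2, 3, 4, 5, 6}, "DEL_B": {5, 6, 7},
                 "DEL_C": {7, 8, 9, 10, 11}, "DEL_D": {1, 2}}
    print(coverage(libraries["DEL_A"], reference))          # best single library
    print(greedy_complementary(libraries, reference, k=2))  # complementary pair
```

In the toy data the best single library covers 60% of the reference nodes, while the greedy pair reaches full coverage, mirroring the abstract's point that combinations can outperform any single DEL.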
9
Zabolotna Y, Volochnyuk DM, Ryabukhin SV, Horvath D, Gavrilenko KS, Marcou G, Moroz YS, Oksiuta O, Varnek A. A Close-up Look at the Chemical Space of Commercially Available Building Blocks for Medicinal Chemistry. J Chem Inf Model 2021; 62:2171-2185. [PMID: 34928600 DOI: 10.1021/acs.jcim.1c00811]
Abstract
The ability to efficiently synthesize desired compounds can be a limiting factor for chemical space exploration in drug discovery. This ability depends not only on the existence of well-studied synthetic protocols but also on the availability of the corresponding reagents, so-called building blocks (BBs). In this work, we present a detailed analysis of the chemical space of 400 000 purchasable BBs. The chemical space was defined by the corresponding synthons, i.e., the fragments that BBs contribute to the final molecules upon reaction. Synthons allow an analysis of BB physicochemical properties and diversity that is unbiased by the leaving and protecting groups present in actual reagents. The main classes of BBs were analyzed in terms of their availability, quality as defined by the rule of two, and diversity. Available BBs were eventually compared to a reference set of biologically relevant synthons derived from ChEMBL fragmentation, to illustrate how well they cover actual medicinal chemistry needs. This was performed on a newly constructed universal generative topographic map of synthon chemical space that enables visualization of both libraries and analysis of their overlapping and library-specific regions.
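A rule-of-two quality check reduces to threshold tests on a few computed properties. The limits below (MW ≤ 200 Da, cLogP ≤ 2, HBD ≤ 2, HBA ≤ 4) are an assumption based on a commonly cited formulation of the rule of two for building blocks and should be checked against the paper; property values are assumed to be computed elsewhere, and the `Synthon` record is hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Synthon:
    """Hypothetical record of precomputed properties for one synthon or building block."""
    name: str
    mw: float     # molecular weight, Da
    clogp: float  # calculated logP
    hbd: int      # hydrogen-bond donors
    hba: int      # hydrogen-bond acceptors


# Assumed rule-of-two limits; verify against the paper before relying on them.
MAX_MW, MAX_CLOGP, MAX_HBD, MAX_HBA = 200.0, 2.0, 2, 4


def passes_ro2(s: Synthon) -> bool:
    """True if every property is within its (assumed) rule-of-two limit."""
    return (s.mw <= MAX_MW and s.clogp <= MAX_CLOGP
            and s.hbd <= MAX_HBD and s.hba <= MAX_HBA)


def fraction_passing(items: Iterable[Synthon]) -> float:
    """Share of a collection that passes the filter (0.0 for an empty collection)."""
    items = list(items)
    return sum(passes_ro2(s) for s in items) / len(items) if items else 0.0


if __name__ == "__main__":
    batch = [Synthon("s1", 150.0, 1.0, 1, 2), Synthon("s2", 250.0, 1.0, 1, 2)]
    print(fraction_passing(batch))
```

Computing the filter on synthons rather than on the raw reagents is what keeps it unbiased by leaving and protecting groups, as the abstract explains.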
Affiliation(s)
- Yuliana Zabolotna
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Dmitriy M Volochnyuk
- Institute of Organic Chemistry, National Academy of Sciences of Ukraine, Murmanska Street 5, Kyiv 02660, Ukraine; Enamine Ltd., 78 Chervonotkatska str., 02660 Kiev, Ukraine
- Sergey V Ryabukhin
- The Institute of High Technologies, Kyiv National Taras Shevchenko University, 64 Volodymyrska Street, Kyiv 01601, Ukraine; Enamine Ltd., 78 Chervonotkatska str., 02660 Kiev, Ukraine
- Dragos Horvath
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Konstantin S Gavrilenko
- Research-And-Education ChemBioCenter, National Taras Shevchenko University of Kyiv, Chervonotkatska str., 61, 03022 Kiev, Ukraine; Enamine Ltd., 78 Chervonotkatska str., 02660 Kiev, Ukraine
- Gilles Marcou
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Yurii S Moroz
- Research-And-Education ChemBioCenter, National Taras Shevchenko University of Kyiv, Chervonotkatska str., 61, 03022 Kiev, Ukraine; Chemspace, Chervonotkatska Street 78, 02094 Kyiv, Ukraine
- Oleksandr Oksiuta
- Institute of Organic Chemistry, National Academy of Sciences of Ukraine, Murmanska Street 5, Kyiv 02660, Ukraine; Chemspace, Chervonotkatska Street 78, 02094 Kyiv, Ukraine
- Alexandre Varnek
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France; Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
10
Muratov EN, Amaro R, Andrade CH, Brown N, Ekins S, Fourches D, Isayev O, Kozakov D, Medina-Franco JL, Merz KM, Oprea TI, Poroikov V, Schneider G, Todd MH, Varnek A, Winkler DA, Zakharov AV, Cherkasov A, Tropsha A. A critical overview of computational approaches employed for COVID-19 drug discovery. Chem Soc Rev 2021; 50:9121-9151. [PMID: 34212944 PMCID: PMC8371861 DOI: 10.1039/d0cs01065k] [Received: 01/30/2021]
Abstract
COVID-19 has resulted in huge numbers of infections and deaths worldwide and brought the most severe disruptions to societies and economies since the Great Depression. A massive experimental and computational research effort to understand and characterize the disease and to rapidly develop diagnostics, vaccines, and drugs has emerged in response to this devastating pandemic, and more than 130 000 COVID-19-related research papers have been published in peer-reviewed journals or deposited on preprint servers. Much of the research effort has focused on the discovery of novel drug candidates or the repurposing of existing drugs against COVID-19, and many such projects have been either exclusively computational or computer-aided experimental studies. Herein, we provide an expert overview of the key computational methods and their applications for the discovery of COVID-19 small-molecule therapeutics that have been reported in the research literature. We further outline that, after the first year of the COVID-19 pandemic, it appears that drug repurposing has not produced rapid and global solutions. However, several known drugs have been used in the clinic to treat COVID-19 patients, and a few repurposed drugs continue to be considered in clinical trials, along with several novel clinical candidates. We posit that truly impactful computational tools must deliver actionable, experimentally testable hypotheses enabling the discovery of novel drugs and drug combinations, and that open science and rapid sharing of research results are critical to accelerate the development of novel, much-needed therapeutics for COVID-19.
Affiliation(s)
- Eugene N. Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Rommie Amaro
- University of California in San Diego, San Diego, CA, USA
- Sean Ekins
- Collaborations Pharmaceuticals, Raleigh, NC, USA
- Denis Fourches
- Department of Chemistry, North Carolina State University, Raleigh, NC, USA
- Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
- Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA
- Kenneth M. Merz
- Department of Chemistry, Michigan State University, East Lansing, MI, USA
- Tudor I. Oprea
- Department of Internal Medicine and UNM Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM, USA
- Department of Rheumatology and Inflammation Research, Gothenburg University, Sweden
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark
- Gisbert Schneider
- Institute of Pharmaceutical Sciences, Swiss Federal Institute of Technology, Zurich, Switzerland
- Alexandre Varnek
- Department of Chemistry, University of Strasbourg, Strasbourg, France
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, Japan
- David A. Winkler
- Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, VIC, Australia
- School of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Bundoora, Australia
- School of Pharmacy, University of Nottingham, Nottingham, UK
- Artem Cherkasov
- Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada
- Alexander Tropsha
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
11
Zabolotna Y, Ertl P, Horvath D, Bonachera F, Marcou G, Varnek A. NP Navigator: A New Look at the Natural Product Chemical Space. Mol Inform 2021; 40:e2100068. [PMID: 34170632 DOI: 10.1002/minf.202100068] [Received: 03/15/2021] [Accepted: 05/15/2021]
Abstract
Natural products (NPs), having been evolutionarily selected over millions of years to bind to biological macromolecules, have remained an important source of inspiration for medicinal chemists even after the advent of efficient drug discovery technologies such as combinatorial chemistry and high-throughput screening. Thus, there is a strong demand for efficient and user-friendly computational tools for analyzing large libraries of NPs. In this context, we introduce NP Navigator, a freely available, intuitive online tool for visualization of and navigation through the chemical space of NPs and NP-like molecules. It is based on a hierarchical ensemble of generative topographic maps, featuring NPs from the COlleCtion of Open NatUral producTs (COCONUT), bioactive compounds from ChEMBL and commercially available molecules from ZINC. NP Navigator allows efficient analysis of different aspects of NPs: chemotype distribution, physicochemical properties, biological activity and commercial availability. The latter concerns not only purchasable NPs but also their close analogs, which can be considered synthetic mimetics of NPs or pseudo-NPs.
Affiliation(s)
- Yuliana Zabolotna
- University of Strasbourg, Laboratory of Chemoinformatics, 4, rue B. Pascal, 67081, Strasbourg, France
- Peter Ertl
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4056, Basel, Switzerland
- Dragos Horvath
- University of Strasbourg, Laboratory of Chemoinformatics, 4, rue B. Pascal, 67081, Strasbourg, France
- Fanny Bonachera
- University of Strasbourg, Laboratory of Chemoinformatics, 4, rue B. Pascal, 67081, Strasbourg, France
- Gilles Marcou
- University of Strasbourg, Laboratory of Chemoinformatics, 4, rue B. Pascal, 67081, Strasbourg, France
- Alexandre Varnek
- University of Strasbourg, Laboratory of Chemoinformatics, 4, rue B. Pascal, 67081, Strasbourg, France; Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
12
Serafim MSM, Dos Santos Júnior VS, Gertrudes JC, Maltarollo VG, Honorio KM. Machine learning techniques applied to the drug design and discovery of new antivirals: a brief look over the past decade. Expert Opin Drug Discov 2021; 16:961-975. [PMID: 33957833 DOI: 10.1080/17460441.2021.1918098]
Abstract
Introduction: Drug design and discovery of new antivirals will always be extremely important in medicinal chemistry, given both known viral diseases and new ones yet to come. While machine learning (ML) has been shown to improve predictions of the biological potential of chemicals and to accelerate drug discovery over the past decade, new methods and their combinations have further improved its performance and opened promising perspectives for ML in the search for new antivirals. Areas covered: The authors consider several areas that deal with different ML techniques applied to antivirals. Recent innovative studies on ML and antivirals were selected and analyzed in detail. The authors also take a brief look from the past to the present to identify advances and bottlenecks in the area. Expert opinion: Classical ML techniques have made it possible to boost the search for antivirals. However, with the emergence of new algorithms and the improvement of older approaches, promising results will be achieved every day, as observed in the case of SARS-CoV-2. Recent experience has shown that it is possible to use ML to discover new antiviral candidates through virtual screening and drug repurposing.
Affiliation(s)
- Mateus Sá Magalhães Serafim
- Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
- Jadson Castro Gertrudes
- Departamento de Computação, Instituto de Ciências Exatas e Biológicas, Universidade Federal de Ouro Preto (UFOP), Ouro Preto, Brazil
- Vinícius Gonçalves Maltarollo
- Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
- Kathia Maria Honorio
- Escola de Artes, Ciências e Humanidades, Universidade de São Paulo (USP), São Paulo, Brazil; Centro de Ciências Naturais e Humanas, Universidade Federal do ABC (UFABC), Santo André, Brazil
13
Shibayama S, Funatsu K. Industrial Case Study: Identification of Important Substructures and Exploration of Monomers for the Rapid Design of Novel Network Polymers with Distributed Representation. Bull Chem Soc Jpn 2021. [DOI: 10.1246/bcsj.20200220]
Affiliation(s)
- Shojiro Shibayama
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Kimito Funatsu
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
14
Horvath D, Marcou G, Varnek A. Trustworthiness, the Key to Grid-Based Map-Driven Predictive Model Enhancement and Applicability Domain Control. J Chem Inf Model 2020; 60:6020-6032. [DOI: 10.1021/acs.jcim.0c00998]
Affiliation(s)
- Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
15
Zabolotna Y, Lin A, Horvath D, Marcou G, Volochnyuk DM, Varnek A. Chemography: Searching for Hidden Treasures. J Chem Inf Model 2020; 61:179-188. [PMID: 33334102 DOI: 10.1021/acs.jcim.0c00936]
Abstract
The days when medicinal chemistry was limited to a few series of compounds of therapeutic interest are long gone. Nowadays, no human can acquire a complete overview of the more than a billion existing or feasible compounds among which potential "blockbuster drugs" are well hidden and yet only a few mouse clicks away. To reach these "hidden treasures", we adapted the generative topographic mapping method to enable efficient navigation through chemical space, from a global overview down to structural pattern detection, covering, for the first time, the complete ZINC library of purchasable compounds relative to 1.6 million biologically relevant ChEMBL molecules. About 40 000 hierarchical maps of the chemical space were constructed. Structural motifs inherent to only one library were identified. Roughly 20 000 off-market ChEMBL compound families represent incentives to enrich commercial catalogs. Conversely, 125 000 ZINC-specific compound classes, absent from structure-activity databases, are novel paths to explore in medicinal chemistry. The complete list of these chemotypes can be downloaded using the link https://forms.gle/B6bUJj82t9EfmttV6.
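Library-specific chemotypes correspond to map regions populated by one library but essentially empty for the other. A minimal single-map sketch, assuming responsibility matrices for both libraries on the same GTM and hypothetical density thresholds, flags such nodes from cumulated responsibilities. The paper works with about 40 000 hierarchical maps and compound families, which this sketch does not attempt; with libraries of very different sizes, densities would likely need normalizing first.

```python
import numpy as np


def node_density(R: np.ndarray) -> np.ndarray:
    """Cumulated responsibilities per node: column sums of an (N, K) responsibility matrix."""
    return R.sum(axis=0)


def library_specific_nodes(R_a: np.ndarray, R_b: np.ndarray,
                           min_density: float = 1.0, max_other: float = 0.01) -> np.ndarray:
    """Indices of nodes well populated by library A but (nearly) empty for library B.

    Both matrices must refer to the same map; the two thresholds are hypothetical.
    """
    d_a, d_b = node_density(R_a), node_density(R_b)
    return np.flatnonzero((d_a >= min_density) & (d_b <= max_other))


if __name__ == "__main__":
    R_zinc = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.5, 0.5]])  # toy data
    R_chembl = np.array([[0.0, 1.0, 0.0]])
    print(library_specific_nodes(R_zinc, R_chembl))   # nodes specific to the first library
    print(library_specific_nodes(R_chembl, R_zinc))   # and the reverse direction
```

Running the test in both directions corresponds to the two kinds of results in the abstract: motifs found only in the reference collection and motifs found only in the purchasable one.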
Affiliation(s)
- Yuliana Zabolotna
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081 France
- Arkadii Lin
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081 France
- Dragos Horvath
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081 France
- Gilles Marcou
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081 France
- Dmitriy M Volochnyuk
- Institute of Organic Chemistry, National Academy of Sciences of Ukraine, Murmanska Street 5, Kyiv 02660, Ukraine
- Enamine Ltd., Chervonotkatska Street 78, Kyiv 02094, Ukraine
- Alexandre Varnek
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081 France
16
Horvath D, Orlov A, Osolodkin DI, Ishmukhametov AA, Marcou G, Varnek A. A Chemographic Audit of anti-Coronavirus Structure-activity Information from Public Databases (ChEMBL). Mol Inform 2020; 39:e2000080. [PMID: 32363750 PMCID: PMC7267182 DOI: 10.1002/minf.202000080] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/22/2020] [Accepted: 04/26/2020] [Indexed: 01/30/2023]
Abstract
Discovery of drugs against newly emerged pathogenic agents like the SARS-CoV-2 coronavirus (CoV) must be based on previous research against related species. Scientists need to become acquainted with, and develop a global overview of, the molecules tested so far. Chemography (here, in particular, Generative Topographic Mapping) places structures on a human-readable 2D map, obtained by dimensionality reduction of the chemical space of molecular descriptors, and is thus well suited for such an audit. The goal is to map medicinal chemistry efforts so far targeted against CoVs. This includes comparing libraries tested against various virus species/genera, predicting their polypharmacological profiles, and highlighting frequently encountered chemotypes. The maps are challenged to provide predictive activity landscapes against viral proteins. Defining "anti-CoV" map zones led to the selection of 380 potential anti-CoV agents residing therein, out of a vast pool of 800 M organic compounds.
Affiliation(s)
- Dragos Horvath
- Chemoinformatics Laboratory, UMR 7140 CNRS/University of Strasbourg, 4, rue Blaise Pascal, 67000 Strasbourg
- Alexey Orlov
- Chemoinformatics Laboratory, UMR 7140 CNRS/University of Strasbourg, 4, rue Blaise Pascal, 67000 Strasbourg
- FSBSI “Chumakov FSC R&D IBP RAS”, Poselok Instituta Poliomielita 8 bd. 1, Poselenie Moskovsky, Moscow 108819, Russia
- Dmitry I. Osolodkin
- FSBSI “Chumakov FSC R&D IBP RAS”, Poselok Instituta Poliomielita 8 bd. 1, Poselenie Moskovsky, Moscow 108819, Russia
- Institute of Translational Medicine and Biotechnology, Sechenov First Moscow State Medical University, Trubetskaya ul. 8, Moscow 119991, Russia
- Aydar A. Ishmukhametov
- FSBSI “Chumakov FSC R&D IBP RAS”, Poselok Instituta Poliomielita 8 bd. 1, Poselenie Moskovsky, Moscow 108819, Russia
- Institute of Translational Medicine and Biotechnology, Sechenov First Moscow State Medical University, Trubetskaya ul. 8, Moscow 119991, Russia
- Gilles Marcou
- Chemoinformatics Laboratory, UMR 7140 CNRS/University of Strasbourg, 4, rue Blaise Pascal, 67000 Strasbourg
- Alexandre Varnek
- Chemoinformatics Laboratory, UMR 7140 CNRS/University of Strasbourg, 4, rue Blaise Pascal, 67000 Strasbourg
17
Lin A, Baskin II, Marcou G, Horvath D, Beck B, Varnek A. Parallel Generative Topographic Mapping: An Efficient Approach for Big Data Handling. Mol Inform 2020; 39:e2000009. [PMID: 32347666 PMCID: PMC7757192 DOI: 10.1002/minf.202000009] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/16/2020] [Accepted: 04/10/2020] [Indexed: 11/12/2022]
Abstract
Generative Topographic Mapping (GTM) can be efficiently used to visualize, analyze and model large chemical data. The GTM manifold needs to span the chemical space deemed relevant for a given problem. Therefore, the Frame set (FS) of compounds used for the manifold construction must cover the given chemical space well. Intuitively, the FS size must grow with the size and diversity of the target library. At the same time, GTM training can be very slow or even become technically impossible at FS sizes of the order of 10⁵ compounds, which is a very small number compared to today's commercially accessible compounds and, especially, to the theoretically feasible molecules. To solve this problem, we propose a Parallel GTM algorithm based on merging "intermediate" manifolds constructed in parallel for different subsets of molecules. An ensemble of these subsets forms a FS for the "final" manifold. To assess the efficiency of the new algorithm, 80 GTMs were built on FSs of different sizes, ranging from 10 to 1.8 M compounds selected from the ChEMBL database. Each GTM was challenged to build classification models for up to 712 biological activities (depending on the FS size). With the novel parallel GTM procedure, we could thus cover the entire spectrum of possible FS sizes, whereas previous studies were forced to rely on the working hypothesis that FSs of a few thousand compounds are sufficient to describe the ChEMBL chemical space. In fact, this study formally proves this to be true: a FS containing only 5000 randomly picked compounds is sufficient to represent the entire ChEMBL collection (1.8 M molecules), in the sense that a further increase in the number of FS compounds has no beneficial impact on the predictive performance of the above-mentioned 712 activity classification models.
Parallel GTM may, however, be required to generate maps based on very large FSs, which might improve the chemical space cartography of big commercial and virtual libraries approaching billions of compounds.
Affiliation(s)
- Arkadii Lin
- University of Strasbourg, Laboratory of Chemoinformatics, Faculty of Chemistry, 4, Blaise Pascal str., 67081 Strasbourg, France
- Igor I. Baskin
- Faculty of Physics, Lomonosov Moscow State University, 1/2, Leninskie Gory str., 119991 Moscow, Russia
- Gilles Marcou
- University of Strasbourg, Laboratory of Chemoinformatics, Faculty of Chemistry, 4, Blaise Pascal str., 67081 Strasbourg, France
- Dragos Horvath
- University of Strasbourg, Laboratory of Chemoinformatics, Faculty of Chemistry, 4, Blaise Pascal str., 67081 Strasbourg, France
- Bernd Beck
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, 65, Birkendorfer str., 88397 Biberach an der Riss, Germany
- Alexandre Varnek
- University of Strasbourg, Laboratory of Chemoinformatics, Faculty of Chemistry, 4, Blaise Pascal str., 67081 Strasbourg, France
18
Horvath D, Marcou G, Varnek A. Generative topographic mapping in drug design. Drug Discov Today Technol 2019; 32-33:99-107. [PMID: 33386101 DOI: 10.1016/j.ddtec.2020.06.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/07/2020] [Revised: 06/10/2020] [Accepted: 06/18/2020] [Indexed: 06/12/2023]
Abstract
This is a review of Generative Topographic Mapping (GTM), a non-linear dimensionality reduction technique producing generative 2D maps of high-dimensional vector spaces, and of its specific applications in drug design (chemical space cartography, compound library design and analysis, virtual screening, pharmacological profiling, de novo drug design, conformational space & docking interaction cartography, etc.). Written by chemoinformaticians for potential users among medicinal chemists and biologists, the article purposely avoids all underlying mathematics. First, the GTM concept is intuitively explained, based on its strong analogies with the rather popular Self-Organizing Maps (SOMs), which are well-established library analysis tools. GTM is basically a fuzzy-logic-based generalization of SOMs. In the second part of the review, some published GTM applications in drug design are briefly revisited.
Affiliation(s)
- Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
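The review's description of GTM as a fuzzy generalization of SOMs can be illustrated by how a trained map projects a single compound: instead of assigning it to one winning node as a SOM does, GTM computes a responsibility (posterior probability) for every node and places the compound at the responsibility-weighted mean of the node coordinates. The Python sketch below assumes an already-trained map, i.e. the 2D node coordinates, their images in descriptor space and the inverse variance `beta` are simply given. Training them (in standard GTM, by expectation-maximization) is not shown, and the function names and toy numbers are hypothetical illustrations, not code from the cited works.

```python
import math
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Vector = Sequence[float]


def responsibilities(t: Vector, node_images: Sequence[Vector], beta: float) -> List[float]:
    """Posterior probability that each map node 'generated' descriptor vector t.

    Assumes an isotropic Gaussian of inverse variance beta centred on each
    node image, with equal prior weight for every node (hypothetical toy setup).
    """
    # Log of the unnormalized Gaussian for each node: -beta/2 * ||t - y_k||^2
    logs = [-0.5 * beta * sum((ti - yi) ** 2 for ti, yi in zip(t, y)) for y in node_images]
    top = max(logs)  # subtract the maximum before exponentiating to avoid underflow
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def mean_position(t: Vector, nodes: Sequence[Point2D],
                  node_images: Sequence[Vector], beta: float) -> Point2D:
    """Place t on the 2D map at the responsibility-weighted mean of node coordinates."""
    r = responsibilities(t, node_images, beta)
    x = sum(rk * node[0] for rk, node in zip(r, nodes))
    y = sum(rk * node[1] for rk, node in zip(r, nodes))
    return (x, y)


if __name__ == "__main__":
    # Hypothetical toy map: a 2x2 latent grid whose nodes map into a 3D descriptor space.
    nodes = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]
    images = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
    print(responsibilities((0.5, 0.5, 0.0), images, beta=1.0))
    print(mean_position((1.0, 0.0, 0.0), nodes, images, beta=100.0))
```

In this toy map, a compound equidistant from all four node images receives a responsibility of 0.25 from each node and lands at the map centre, while with a large `beta` a compound sitting on one node image is placed essentially on that node. A SOM would report only the winning node; the soft weighting is what lets GTM position compounds between nodes.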
19
Lin A, Beck B, Horvath D, Marcou G, Varnek A. Diversifying chemical libraries with generative topographic mapping. J Comput Aided Mol Des 2019; 34:805-815. [DOI: 10.1007/s10822-019-00215-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/10/2019] [Accepted: 07/15/2019] [Indexed: 01/28/2023]