1
Heinzke AL, Pahl A, Zdrazil B, Leach AR, Waldmann H, Young RJ, Leeson PD. Occurrence of "Natural Selection" in Successful Small Molecule Drug Discovery. J Med Chem 2024; 67:11226-11241. [PMID: 38949112 PMCID: PMC11247505 DOI: 10.1021/acs.jmedchem.4c00811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 06/08/2024] [Accepted: 06/13/2024] [Indexed: 07/02/2024]
Abstract
Published compounds from ChEMBL version 32 are used to seek evidence for the occurrence of "natural selection" in drug discovery. Three measures of natural product (NP) character were applied, to compare time- and target-matched compounds reaching the clinic (clinical compounds in phase 1-3 development and approved drugs) with background compounds (reference compounds). Pseudo-NPs (PNPs), containing NP fragments combined in ways inaccessible by nature, are increasing over time, reaching 67% of clinical compounds first disclosed since 2010. PNPs are 54% more likely to be found in post-2008 clinical versus reference compounds. The majority of target classes show increased clinical compound NP character versus their reference compounds. Only 176 NP fragments appear in >1000 clinical compounds published since 2008, yet these make up on average 63% of the clinical compounds' core scaffolds. There is untapped potential awaiting exploitation by applying nature's building blocks ("natural intelligence") to drug design.
Affiliation(s)
- A. Lina Heinzke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, Cambridgeshire, U.K.
- Axel Pahl
- Compound Management and Screening Center, Max-Planck-Institute of Molecular Physiology, Otto-Hahn-Straße 11, 44227 Dortmund, Germany
- Barbara Zdrazil
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, Cambridgeshire, U.K.
- Andrew R. Leach
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, Cambridgeshire, U.K.
- Herbert Waldmann
- Department of Chemical Biology, Max-Planck-Institute of Molecular Physiology, Otto-Hahn-Straße 11, 44227 Dortmund, Germany
- Faculty of Chemistry and Chemical Biology, Technical University Dortmund, Otto-Hahn-Straße 6, 44227 Dortmund, Germany
- Paul D. Leeson
- Paul Leeson Consulting Ltd., Nuneaton CV13 6LZ, Warwickshire, U.K.
2
Dobbelaere MR, Lengyel I, Stevens CV, Van Geem KM. Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices. J Cheminform 2024; 16:37. [PMID: 38553720 PMCID: PMC10980627 DOI: 10.1186/s13321-024-00834-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 03/23/2024] [Indexed: 04/02/2024] Open
Abstract
The challenge of devising pathways for organic synthesis remains a central issue in the field of medicinal chemistry. Over the span of six decades, computer-aided synthesis planning has given rise to a plethora of potent tools for formulating synthetic routes. Nevertheless, a significant expert task still looms: determining the appropriate solvent, catalyst, and reagents when provided with a set of reactants to achieve and optimize the desired product for a specific step in the synthesis process. Typically, chemists identify key functional groups and rings that exert crucial influences at the reaction center, classify reactions into categories, and may assign them names. This research introduces Rxn-INSIGHT, an open-source algorithm based on the bond-electron matrix approach, with the purpose of automating this endeavor. Rxn-INSIGHT not only streamlines the process but also facilitates extensive querying of reaction databases, effectively replicating the thought processes of an organic chemist. The core functions of the algorithm encompass the classification and naming of reactions, extraction of functional groups, rings, and scaffolds from the involved chemical entities. The provision of reaction condition recommendations based on the similarity and prevalence of reactions eventually arises as a side application. The performance of our rule-based model has been rigorously assessed against a carefully curated benchmark dataset, exhibiting an accuracy rate exceeding 90% in reaction classification and surpassing 95% in reaction naming. Notably, it has been discerned that a pivotal factor in selecting analogous reactions lies in the analysis of ring structures participating in the reactions. An examination of ring structures within the USPTO chemical reaction database reveals that with just 35 unique rings, a remarkable 75% of all rings found in nearly 1 million products can be encompassed. 
Furthermore, Rxn-INSIGHT is proficient in suggesting appropriate choices for solvents, catalysts, and reagents in entirely novel reactions, all within the span of a second, utilizing nothing more than an everyday laptop.
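The ring-coverage observation in the abstract (a small number of unique rings accounting for a large share of all ring occurrences) amounts to a cumulative-frequency calculation. A minimal sketch follows, assuming ring occurrences have already been counted per unique ring; the function name, the ring labels, and the toy counts are hypothetical and are not taken from Rxn-INSIGHT or the USPTO data.

```python
from collections import Counter


def rings_needed_for_coverage(ring_counts, target_fraction):
    """Return how many of the most frequent rings cover target_fraction of all occurrences."""
    total = sum(ring_counts.values())
    if total == 0:
        raise ValueError("ring_counts is empty")
    covered = 0
    # Walk the rings from most to least frequent, accumulating occurrences.
    for rank, (_ring, count) in enumerate(ring_counts.most_common(), start=1):
        covered += count
        if covered >= target_fraction * total:
            return rank
    return len(ring_counts)


# Hypothetical toy counts (ring label -> occurrences), for illustration only.
toy_counts = Counter({
    "benzene": 60, "pyridine": 15, "piperidine": 10,
    "thiophene": 5, "indole": 5, "azetidine": 3, "oxetane": 2,
})
print(rings_needed_for_coverage(toy_counts, 0.75))
```

With these toy counts the two most frequent rings already reach 75 of 100 occurrences, so the call prints 2. Applied to real counts, the same loop would give the kind of "N rings cover X%" figure quoted in the abstract.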
Affiliation(s)
- Maarten R Dobbelaere
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
- István Lengyel
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
- ChemInsights LLC, Dover, DE, 19901, USA
- Christian V Stevens
- SynBioC Research Group, Department of Green Chemistry and Technology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Kevin M Van Geem
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium.
3
Ertl P. Database of 4 Million Medicinal Chemistry-Relevant Ring Systems. J Chem Inf Model 2024; 64:1245-1250. [PMID: 38311838 DOI: 10.1021/acs.jcim.3c01812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2024]
Abstract
Central ring systems are the most important part of bioactive molecules. They determine molecule shape, keep substituents in their proper positions, and also influence global molecular properties. In the present study, a database of 4 million medicinal chemistry-relevant ring systems has been created, not by crude random enumeration but by applying a set of rules derived by analyzing rings present in bioactive molecules. The aromatic properties and tautomer stability of generated rings have also been considered to ensure that the rings in the database are stable and chemically reasonable. 99.2% of these rings are novel and not included in molecules in the ChEMBL or PubChem databases. This large database of ring systems has been created with the goal to provide support for bioisosteric design and scaffold hopping as well as to be used in generative chemistry applications. The complete set of created rings is available for download in the SMILES format from https://peter-ertl.com/molecular/data/.
Affiliation(s)
- Peter Ertl
- Global Discovery Chemistry, Biomedical Research, Novartis CH-4056 Basel, Switzerland
4
Tandi M, Tripathi N, Gaur A, Gopal B, Sundriyal S. Curation and cheminformatics analysis of a Ugi-reaction derived library (URDL) of synthetically tractable small molecules for virtual screening application. Mol Divers 2024; 28:37-50. [PMID: 36574164 DOI: 10.1007/s11030-022-10588-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Accepted: 12/17/2022] [Indexed: 12/28/2022]
Abstract
Virtual screening (VS) is an important approach in drug discovery and relies on the availability of a virtual library of synthetically tractable molecules. The Ugi reaction (UR) is an important multi-component reaction (MCR) that reliably produces a peptidomimetic scaffold. Recent literature shows that a tactically assembled Ugi adduct can be subjected to further chemical modifications to yield a variety of rings and scaffolds, thus renewing the interest in this old reaction. Given the reliability and efficiency of UR, we collated a UR-derived library (URDL) of small molecules (total = 5773) for VS. The synthesis of the majority of URDL molecules may be carried out in 1-2 pots in a time- and cost-effective manner. The detailed analysis of the average property and chemical space of URDL was also carried out using the open-source Datawarrior program. The comparison with FDA-approved oral drugs and inhibitors of protein-protein interactions (iPPIs) suggests URDL molecules are 'clean', drug-like, and conform to a structurally distinct space from the other two categories. The average physicochemical properties of compounds in the URDL library lie closer to iPPI molecules than oral drugs, thus suggesting that the URDL resource can be applied to discover novel iPPI molecules. The URDL molecules consist of diverse ring systems, many of which have not been exploited yet for drug design. Thus, URDL represents a small virtual library of drug-like molecules with unexplored chemical space designed for VS. The structures of all molecules of URDL, oral drugs, and iPPI compounds are being made freely accessible as supplementary information for broader application.
Affiliation(s)
- Mukesh Tandi
- Department of Pharmacy, Birla Institute of Technology and Science Pilani, Pilani Campus, Rajasthan, 333031, India
- Nancy Tripathi
- Department of Pharmacy, Birla Institute of Technology and Science Pilani, Pilani Campus, Rajasthan, 333031, India
- Animesh Gaur
- Department of Pharmacy, Birla Institute of Technology and Science Pilani, Pilani Campus, Rajasthan, 333031, India
- Sandeep Sundriyal
- Department of Pharmacy, Birla Institute of Technology and Science Pilani, Pilani Campus, Rajasthan, 333031, India.
5
Buehler Y, Reymond JL. Molecular Framework Analysis of the Generated Database GDB-13s. J Chem Inf Model 2023; 63:484-492. [PMID: 36533982 PMCID: PMC9875802 DOI: 10.1021/acs.jcim.2c01107] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/02/2022] [Indexed: 12/23/2022]
Abstract
The generated databases (GDBs) list billions of possible molecules from systematic enumeration following simple rules of chemical stability and synthetic feasibility. To assess the originality of GDB molecules, we compared their Bemis and Murcko molecular frameworks (MFs) with those in public databases. MFs result from molecules by converting all atoms to carbons, all bonds to single bonds, and removing terminal atoms iteratively until none remain. We compared GDB-13s (99,394,177 molecules up to 13 atoms containing simplified functional groups, 22,130 MFs) with ZINC (885,905,524 screening compounds, 1,016,597 MFs), PubChem50 (100,852,694 molecules up to 50 atoms, 1,530,189 MFs), and COCONUT (401,624 natural products, 42,734 MFs). While MFs in public databases mostly contained linker bonds and six-membered rings, GDB-13s MFs had diverse ring sizes and ring systems without linker bonds. Most GDB-13s MFs were exclusive to this database, and many were relatively simple, representing attractive targets for synthetic chemistry aiming at innovative molecules.
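The framework definition in the abstract (all atoms to carbon, all bonds to single bonds, terminal atoms removed iteratively) reduces to pruning a plain graph. The sketch below illustrates that idea only: it assumes the molecule is given as a list of atom-index pairs, so element and bond-order information is already discarded, and the function name and example molecules are hypothetical. It is not the authors' code and does not use a cheminformatics toolkit.

```python
def molecular_framework(bonds):
    """Prune terminal atoms until none remain; return the surviving adjacency map.

    bonds: iterable of (i, j) atom-index pairs. Elements and bond orders are
    ignored, which mirrors "all atoms to carbon, all bonds to single bonds".
    """
    neighbors = {}
    for i, j in bonds:
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)
    # Atoms with at most one neighbor are terminal and get stripped.
    terminal = [atom for atom, nbrs in neighbors.items() if len(nbrs) <= 1]
    while terminal:
        atom = terminal.pop()
        if atom not in neighbors:
            continue  # already removed via an earlier queue entry
        for nbr in neighbors.pop(atom):
            neighbors[nbr].discard(atom)
            if len(neighbors[nbr]) <= 1:
                terminal.append(nbr)  # removal exposed a new terminal atom
    return neighbors


# Hypothetical example: two six-membered rings joined by a one-atom linker,
# with a two-atom side chain on the second ring.
ring_a = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
linker = [(0, 6), (6, 7)]
ring_b = [(7, 8), (8, 9), (9, 10), (10, 11), (11, 12), (12, 7)]
side_chain = [(10, 13), (13, 14)]
framework = molecular_framework(ring_a + linker + ring_b + side_chain)
print(sorted(framework))
```

The side chain (atoms 13 and 14) is pruned while the rings and the linker atom survive; a purely acyclic molecule prunes down to an empty framework.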
Affiliation(s)
- Ye Buehler
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- Jean-Louis Reymond
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
6
Shearer J, Castro JL, Lawson ADG, MacCoss M, Taylor RD. Rings in Clinical Trials and Drugs: Present and Future. J Med Chem 2022; 65:8699-8712. [PMID: 35730680 PMCID: PMC9289879 DOI: 10.1021/acs.jmedchem.2c00473] [Citation(s) in RCA: 110] [Impact Index Per Article: 55.0] [Indexed: 11/28/2022]
Abstract
We present a comprehensive analysis of all ring systems (both heterocyclic and nonheterocyclic) in clinical trial compounds and FDA-approved drugs. We show 67% of small molecules in clinical trials comprise only ring systems found in marketed drugs, which mirrors previously published findings for newly approved drugs. We also show there are approximately 450 000 unique ring systems derived from 2.24 billion molecules currently available in synthesized chemical space, and molecules in clinical trials utilize only 0.1% of this available pool. Moreover, there are fewer ring systems in drugs compared with those in clinical trials, but this is balanced by the drug ring systems being reused more often. Furthermore, systematic changes of up to two atoms on existing drug and clinical trial ring systems give a set of 3902 future clinical trial ring systems, which are predicted to cover approximately 50% of the novel ring systems entering clinical trials.
Affiliation(s)
- Malcolm MacCoss
- Bohicket Pharma Consulting Limited Liability Company, 2556 Seabrook Island Road, Seabrook Island, South Carolina 29455, United States
7
Warr WA, Nicklaus MC, Nicolaou CA, Rarey M. Exploration of Ultralarge Compound Collections for Drug Discovery. J Chem Inf Model 2022; 62:2021-2034. [PMID: 35421301 DOI: 10.1021/acs.jcim.2c00224] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Indexed: 02/07/2023]
Abstract
Designing new medicines more cheaply and quickly is tightly linked to the quest of exploring chemical space more widely and efficiently. Chemical space is monumentally large, but recent advances in computer software and hardware have enabled researchers to navigate virtual chemical spaces containing billions of chemical structures. This review specifically concerns collections of many millions or even billions of enumerated chemical structures as well as even larger chemical spaces that are not fully enumerated. We present examples of chemical libraries and spaces and the means used to construct them, and we discuss new technologies for searching huge libraries and for searching combinatorially in chemical space. We also cover space navigation techniques and consider new approaches to de novo drug design and the impact of the "autonomous laboratory" on synthesis of designed compounds. Finally, we summarize some other challenges and opportunities for the future.
Affiliation(s)
- Wendy A Warr
- Wendy Warr & Associates, 6 Berwick Court, Holmes Chapel, Crewe, Cheshire CW4 7HZ, United Kingdom
- Marc C Nicklaus
- NCI, NIH, CADD Group, NCI-Frederick, Frederick, Maryland 21702, United States
- Christos A Nicolaou
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
- Matthias Rarey
- Universität Hamburg, ZBH Center for Bioinformatics, 20146 Hamburg, Germany
8
Hernández‐Lladó P, Garrec K, Schmitt DC, Burton JW. Transition Metal‐Free, Visible Light‐Mediated Radical Cyclisation of Malonyl Radicals onto 5‐Ring Heteroaromatics. Adv Synth Catal 2022. [DOI: 10.1002/adsc.202101451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Pol Hernández‐Lladó
- Department of Chemistry Chemistry Research Laboratory University of Oxford Mansfield Road Oxford OX1 3TA UK
- Kilian Garrec
- Department of Chemistry Chemistry Research Laboratory University of Oxford Mansfield Road Oxford OX1 3TA UK
- Daniel C. Schmitt
- Medicine Design Pfizer Worldwide Research Development and Medical Groton Connecticut 06340 United States
- Jonathan W. Burton
- Department of Chemistry Chemistry Research Laboratory University of Oxford Mansfield Road Oxford OX1 3TA UK
9
Thomas M, Boardman A, Garcia-Ortegon M, Yang H, de Graaf C, Bender A. Applications of Artificial Intelligence in Drug Design: Opportunities and Challenges. Methods in Molecular Biology (Clifton, N.J.) 2021; 2390:1-59. [PMID: 34731463 DOI: 10.1007/978-1-0716-1787-8_1] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 10/19/2022]
Abstract
Artificial intelligence (AI) has undergone rapid development in recent years and has been successfully applied to real-world problems such as drug design. In this chapter, we review recent applications of AI to problems in drug design including virtual screening, computer-aided synthesis planning, and de novo molecule generation, with a focus on the limitations of the application of AI therein and opportunities for improvement. Furthermore, we discuss the broader challenges imposed by AI in translating theoretical practice to real-world drug design; including quantifying prediction uncertainty and explaining model behavior.
Affiliation(s)
- Morgan Thomas
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Andrew Boardman
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Miguel Garcia-Ortegon
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, UK
- Hongbin Yang
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK.
10
Keller SM, Samarin M, Arend Torres F, Wieser M, Roth V. Learning Extremal Representations with Deep Archetypal Analysis. Int J Comput Vis 2021; 129:805-820. [PMID: 34720403 PMCID: PMC8550171 DOI: 10.1007/s11263-020-01390-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/09/2020] [Accepted: 09/28/2020] [Indexed: 11/24/2022]
Abstract
Archetypes represent extreme manifestations of a population with respect to specific characteristic traits or features. In linear feature space, archetypes approximate the data convex hull allowing all data points to be expressed as convex mixtures of archetypes. As mixing of archetypes is performed directly on the input data, linear Archetypal Analysis requires additivity of the input, which is a strong assumption unlikely to hold e.g. in case of image data. To address this problem, we propose learning an appropriate latent feature space while simultaneously identifying suitable archetypes. We thus introduce a generative formulation of the linear archetype model, parameterized by neural networks. By introducing the distance-dependent archetype loss, the linear archetype model can be integrated into the latent space of a deep variational information bottleneck and an optimal representation, together with the archetypes, can be learned end-to-end. Moreover, the information bottleneck framework allows for a natural incorporation of arbitrarily complex side information during training. As a consequence, learned archetypes become easily interpretable as they derive their meaning directly from the included side information. Applicability of the proposed method is demonstrated by exploring archetypes of female facial expressions while using multi-rater based emotion scores of these expressions as side information. A second application illustrates the exploration of the chemical space of small organic molecules. By using different kinds of side information we demonstrate how identified archetypes, along with their interpretation, largely depend on the side information provided. Supplementary Information The online version contains supplementary material available at 10.1007/s11263-020-01390-3.
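The statement that data points are convex mixtures of archetypes can be written out explicitly. The block below gives the standard formulation of linear archetypal analysis; the symbols (data points x_i, archetypes z_j, weights a_ij and b_ji, sample count n, archetype count k) are notation assumed here for illustration and do not appear in the abstract.

```latex
x_i \approx \sum_{j=1}^{k} a_{ij}\, z_j ,
\qquad a_{ij} \ge 0 , \quad \sum_{j=1}^{k} a_{ij} = 1 ,
\qquad\text{and}\qquad
z_j = \sum_{i=1}^{n} b_{ji}\, x_i ,
\qquad b_{ji} \ge 0 , \quad \sum_{i=1}^{n} b_{ji} = 1 .
```

Each point is a convex combination of the archetypes, and each archetype is itself a convex combination of the data, which is what ties the archetypes to the convex hull. Mixing directly on the x_i is the additivity assumption that the abstract proposes to avoid by working in a learned latent space.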
Affiliation(s)
- Sebastian Mathias Keller
- Department of Mathematics and Computer Science, University of Basel, Spiegelgasse 1, 4051 Basel, Switzerland
- Maxim Samarin
- Department of Mathematics and Computer Science, University of Basel, Spiegelgasse 1, 4051 Basel, Switzerland
- Fabricio Arend Torres
- Department of Mathematics and Computer Science, University of Basel, Spiegelgasse 1, 4051 Basel, Switzerland
- Mario Wieser
- Department of Mathematics and Computer Science, University of Basel, Spiegelgasse 1, 4051 Basel, Switzerland
- Volker Roth
- Department of Mathematics and Computer Science, University of Basel, Spiegelgasse 1, 4051 Basel, Switzerland
11
Ertl P. Magic Rings: Navigation in the Ring Chemical Space Guided by the Bioactive Rings. J Chem Inf Model 2021; 62:2164-2170. [PMID: 34445865 DOI: 10.1021/acs.jcim.1c00761] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/30/2022]
Abstract
The large majority of bioactive molecules contain a more or less complex ring system as a central structural element. This central core determines the basic molecule shape, keeps substituents in their proper positions, and often also contributes to the biological activity itself. In this study, the ring systems extracted from one billion molecules are processed and differences between rings from bioactive molecules and common synthetic molecules are analyzed. The bioactive rings seem to be distributed throughout a large portion of chemical space, but not uniformly; one can see several denser regions, where the bioactive rings often appear in small clusters, as well as empty areas. A web tool offering interactive navigation in the ring chemical space and supporting identification of bioisosteric ring analogs, available at https://bit.ly/magicrings, is also described.
Affiliation(s)
- Peter Ertl
- Novartis Institutes for BioMedical Research, CH-4056 Basel, Switzerland
12
Yu P, Sterling AJ, Hein J. A Novel Automated Screening Method for Combinatorially Generated Small Molecules. J Chem Inf Model 2021; 61:1637-1646. [PMID: 33844913 DOI: 10.1021/acs.jcim.0c01462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
A main challenge in the enumeration of small-molecule chemical spaces for drug design is to quickly and accurately differentiate between possible and impossible molecules. Current approaches for screening enumerated molecules (e.g., 2D heuristics and 3D force fields) have not been able to achieve a balance between accuracy and speed. We have developed a new automated approach for fast and high-quality screening of small molecules, with the following steps: (1) for each molecule in the set, an ensemble of 2D descriptors as feature encoding is computed; (2) on a random small subset, classification (feasible/infeasible) targets via a 3D-based approach are generated; (3) a classification dataset with the computed features and targets is formed and a machine learning model for predicting the 3D approach's decisions is trained; and (4) the trained model is used to screen the remainder of the enumerated set. Our approach is ≈8× (7.96× to 8.84×) faster than screening via 3D simulations without significantly sacrificing accuracy; while compared to 2D-based pruning rules, this approach is more accurate, with better coverage of known feasible molecules. Once the topological features and 3D conformer evaluation methods are established, the process can be fully automated, without any additional chemistry expertise.
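The four steps listed in the abstract follow a general surrogate-model pattern: label a small random subset with the expensive method, fit a cheap model on inexpensive features, then apply that model to everything else. The sketch below shows only that control flow. Everything in it is an assumption made for illustration: the toy string "molecules", the two-number feature vector, the stand-in for the 3D check, and the 1-nearest-neighbour lookup used in place of a trained machine learning model. None of it reproduces the authors' descriptors, 3D evaluation, or classifier.

```python
import random


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def screen_with_surrogate(molecules, cheap_features, expensive_check, sample_size, seed=0):
    """Return {molecule: feasible?}, calling expensive_check on only sample_size molecules."""
    rng = random.Random(seed)
    # Step 1: cheap 2D-style features for every molecule in the set.
    features = {m: cheap_features(m) for m in molecules}
    # Step 2: classification targets from the expensive method, on a random subset only.
    labelled = rng.sample(molecules, sample_size)
    targets = {m: expensive_check(m) for m in labelled}

    # Step 3: "train" the surrogate. A 1-nearest-neighbour lookup stands in for the model.
    def predict(m):
        nearest = min(labelled, key=lambda t: squared_distance(features[m], features[t]))
        return targets[nearest]

    # Step 4: screen the remainder of the set with the surrogate.
    decisions = dict(targets)
    for m in molecules:
        if m not in decisions:
            decisions[m] = predict(m)
    return decisions


# Hypothetical toy inputs: strings standing in for enumerated molecules.
def toy_features(mol):
    return (len(mol), mol.count("#"))


def toy_expensive_check(mol):
    # Stand-in for a slow 3D evaluation: "feasible" if at most one triple bond.
    return mol.count("#") <= 1


toy_molecules = ["CCC", "CC#C", "C#CC#C", "CCCCC", "C#C", "C#CC#CC#C"]
print(screen_with_surrogate(toy_molecules, toy_features, toy_expensive_check, sample_size=3))
```

The cost saving comes from calling the expensive check `sample_size` times rather than once per molecule; how much accuracy survives depends on the features and model, which is what the paper evaluates.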
Affiliation(s)
- Pingshi Yu
- Department of Statistics, University of Oxford, 29 St Giles', Oxford OX1 2JD, U.K.
- Department of Computer Science, University of Oxford, 15 Parks Road, Oxford OX1 3QD, U.K.
- Alistair J Sterling
- Department of Chemistry, University of Oxford, Mansfield Road, Oxford OX1 3TA, U.K.
- Jotun Hein
- Department of Statistics, University of Oxford, 29 St Giles', Oxford OX1 2JD, U.K.
13
Meier K, Arús‐Pous J, Reymond J. A Potent and Selective Janus Kinase Inhibitor with a Chiral 3D‐Shaped Triquinazine Ring System from Chemical Space. Angew Chem Int Ed Engl 2021. [DOI: 10.1002/ange.202012049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Kris Meier
- Department of Chemistry and Biochemistry University of Bern Freiestrasse 3 3012 Bern Switzerland
- Josep Arús‐Pous
- Department of Chemistry and Biochemistry University of Bern Freiestrasse 3 3012 Bern Switzerland
- Jean‐Louis Reymond
- Department of Chemistry and Biochemistry University of Bern Freiestrasse 3 3012 Bern Switzerland
14
Shan J, Pan X, Wang X, Xiao X, Ji C. FragRep: A Web Server for Structure-Based Drug Design by Fragment Replacement. J Chem Inf Model 2020; 60:5900-5906. [PMID: 33275427 DOI: 10.1021/acs.jcim.0c00767] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 11/28/2022]
Abstract
The design of efficient computational tools for structure-guided ligand design is essential for the drug discovery process. We hereby present FragRep, a new web server for structure-based ligand design by fragment replacement. The input is a protein and a ligand structure, either from the Protein Data Bank or from molecular docking. Users can choose specific substructures they want to modify. The server tries to find suitable fragments that not only meet the geometric requirements of the remaining part of the ligand but also fit well with local protein environments. FragRep is a powerful computational tool for the rapid generation of ligand design ideas, either in scaffold hopping or bioisosteric replacement. The FragRep Server is freely available to researchers and can be accessed at http://xundrug.cn/fragrep.
Affiliation(s)
- Jinwen Shan
- Shanghai Engineering Research Center for Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062 China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062 China
- Xiaolin Pan
- Shanghai Engineering Research Center for Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062 China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062 China
- Xingyu Wang
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062 China
- Xudong Xiao
- Shanghai Engineering Research Center for Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062 China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062 China
- Changge Ji
- Shanghai Engineering Research Center for Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062 China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062 China
15
Meier K, Arús‐Pous J, Reymond J. A Potent and Selective Janus Kinase Inhibitor with a Chiral 3D‐Shaped Triquinazine Ring System from Chemical Space. Angew Chem Int Ed Engl 2020; 60:2074-2077. [DOI: 10.1002/anie.202012049] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/03/2020] [Revised: 09/25/2020] [Indexed: 01/31/2023]
Affiliation(s)
- Kris Meier
- Department of Chemistry and Biochemistry University of Bern Freiestrasse 3 3012 Bern Switzerland
- Josep Arús‐Pous
- Department of Chemistry and Biochemistry University of Bern Freiestrasse 3 3012 Bern Switzerland
- Jean‐Louis Reymond
- Department of Chemistry and Biochemistry University of Bern Freiestrasse 3 3012 Bern Switzerland
16
Thakkar A, Selmi N, Reymond JL, Engkvist O, Bjerrum EJ. "Ring Breaker": Neural Network Driven Synthesis Prediction of the Ring System Chemical Space. J Med Chem 2020; 63:8791-8808. [PMID: 32352286 DOI: 10.1021/acs.jmedchem.9b01919] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 01/02/2023]
Abstract
Ring systems in pharmaceuticals, agrochemicals, and dyes are ubiquitous chemical motifs. While the synthesis of common ring systems is well described and novel ring systems can be readily and computationally enumerated, the synthetic accessibility of unprecedented ring systems remains a challenge. "Ring Breaker" uses a data-driven approach to enable the prediction of ring-forming reactions, for which we have demonstrated its utility on frequently found and unprecedented ring systems, in agreement with literature syntheses. We demonstrate the performance of the neural network on a range of ring fragments from the ZINC and DrugBank databases and highlight its potential for incorporation into computer aided synthesis planning tools. These approaches to ring formation and retrosynthetic disconnection offer opportunities for chemists to explore and select more efficient syntheses/synthetic routes.
Affiliation(s)
- Amol Thakkar
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg 431 50, Sweden
- Department of Chemistry and Biochemistry, University of Bern, Bern CH-3012, Switzerland
- Nidhal Selmi
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg 431 50, Sweden
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Bern CH-3012, Switzerland
- Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg 431 50, Sweden
17
Bühlmann S, Reymond JL. ChEMBL-Likeness Score and Database GDBChEMBL. Front Chem 2020; 8:46. [PMID: 32117874 PMCID: PMC7010641 DOI: 10.3389/fchem.2020.00046] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 10/25/2019] [Accepted: 01/15/2020] [Indexed: 01/02/2023] Open
Abstract
The generated database GDB17 enumerates 166.4 billion molecules up to 17 atoms of C, N, O, S and halogens following simple rules of chemical stability and synthetic feasibility. However, most molecules in GDB17 are too complex to be considered for chemical synthesis. To address this limitation, we report GDBChEMBL as a subset of GDB17 featuring 10 million molecules selected according to a ChEMBL-likeness score (CLscore) calculated from the frequency of occurrence of circular substructures in ChEMBL, followed by uniform sampling across molecular size, stereocenters and heteroatoms. Compared to the previously reported subsets FDB17 and GDBMedChem selected from GDB17 by fragment-likeness, respectively, medicinal chemistry criteria, our new subset features molecules with higher synthetic accessibility and possibly bioactivity yet retains a broad and continuous coverage of chemical space typical of the entire GDB17. GDBChEMBL is accessible at http://gdb.unibe.ch for download and for browsing using an interactive chemical space map at http://faerun.gdb.tools.
Affiliation(s)
- Sven Bühlmann
  - Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland
- Jean-Louis Reymond
  - Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland
18
Arús-Pous J, Johansson SV, Prykhodko O, Bjerrum EJ, Tyrchan C, Reymond JL, Chen H, Engkvist O. Randomized SMILES strings improve the quality of molecular generative models. J Cheminform 2019; 11:71. [PMID: 33430971 PMCID: PMC6873550 DOI: 10.1186/s13321-019-0393-0] [Citation(s) in RCA: 126] [Impact Index Per Article: 25.2] [Received: 07/30/2019] [Accepted: 11/09/2019] [Indexed: 12/22/2022]
Abstract
Recurrent Neural Networks (RNNs) trained with a set of molecules represented as unique (canonical) SMILES strings have shown the capacity to create large chemical spaces of valid and meaningful structures. Herein we perform an extensive benchmark on models trained with subsets of GDB-13 of different sizes (1 million, 10,000 and 1000), with different SMILES variants (canonical, randomized and DeepSMILES), with two different recurrent cell types (LSTM and GRU) and with different hyperparameter combinations. To guide the benchmarks, new metrics were developed that define how well a model has generalized the training set. The generated chemical space is evaluated with respect to its uniformity, closedness and completeness. Results show that models that use LSTM cells trained with 1 million randomized SMILES, a non-unique molecular string representation, are able to generalize to larger chemical spaces than the other approaches, and they represent the target chemical space more accurately. Specifically, a model trained with randomized SMILES was able to generate almost all molecules from GDB-13 with a quasi-uniform probability. Models trained with smaller samples show an even bigger improvement when trained with randomized SMILES. Additionally, models were trained on molecules obtained from ChEMBL and illustrate again that training with randomized SMILES leads to models having a better representation of the drug-like chemical space. Namely, the model trained with randomized SMILES was able to generate at least double the number of unique molecules with the same distribution of properties compared to one trained with canonical SMILES.
Affiliation(s)
- Josep Arús-Pous
  - Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Oleksii Prykhodko
  - Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden
- Christian Tyrchan
  - Medicinal Chemistry, BioPharmaceuticals Early RIA, R&D, AstraZeneca Gothenburg, Mölndal, Sweden
- Jean-Louis Reymond
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Hongming Chen
  - Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden
- Ola Engkvist
  - Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden
19
Ivanenkov YA, Zagribelnyy BA, Aladinskiy VA. Are We Opening the Door to a New Era of Medicinal Chemistry or Being Collapsed to a Chemical Singularity? J Med Chem 2019; 62:10026-10043. [PMID: 31188596 DOI: 10.1021/acs.jmedchem.9b00004] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Indexed: 12/29/2022]
Abstract
The paradigm of "drug-like-ness" dramatically altered the behavior of the medicinal chemistry community for a long time. In recent years, scientists have empirically found a significant increase in key properties of drugs that have moved structures closer to the periphery or the outside of the rule-of-five "cage". Herein, we show that for the past decade, the number of molecules claimed in patent records by major pharmaceutical companies has dramatically decreased, which may lead to a "chemical singularity". New compounds containing fragments with increased 3D complexity are generally larger, slightly more lipophilic, and more polar. A core difference between this study and recently published papers is that we consider the nature and quality of sp3-rich frameworks rather than sp3 count. We introduce the original descriptor MCE-18, which stands for medicinal chemistry evolution, 2018, and this measure can effectively score molecules by novelty in terms of their cumulative sp3 complexity.
Affiliation(s)
- Yan A Ivanenkov
  - Insilico Medicine Hong Kong Limited (previously Insilico Medicine, Inc.), Unit 307A, Core Building 1, 1 Science Park East Avenue, Hong Kong Science Park, Pak Shek Kok, Hong Kong
  - Institute of Biochemistry and Genetics Russian Academy of Science (IBG RAS) Ufa Scientific Centre, Oktyabrya Prospekt 71, Ufa 450054, Russian Federation
  - Moscow Institute of Physics and Technology (State University), 9 Institutskiy Lane, Dolgoprudny, Moscow 141700, Russian Federation
  - Chemistry Department, Lomonosov Moscow State University, Leninskie Gory, Building 1/3, GSP-1, Moscow 119991, Russian Federation
- Bogdan A Zagribelnyy
  - Insilico Medicine Hong Kong Limited (previously Insilico Medicine, Inc.), Unit 307A, Core Building 1, 1 Science Park East Avenue, Hong Kong Science Park, Pak Shek Kok, Hong Kong
  - Chemistry Department, Lomonosov Moscow State University, Leninskie Gory, Building 1/3, GSP-1, Moscow 119991, Russian Federation
- Vladimir A Aladinskiy
  - Insilico Medicine Hong Kong Limited (previously Insilico Medicine, Inc.), Unit 307A, Core Building 1, 1 Science Park East Avenue, Hong Kong Science Park, Pak Shek Kok, Hong Kong
  - Moscow Institute of Physics and Technology (State University), 9 Institutskiy Lane, Dolgoprudny, Moscow 141700, Russian Federation
20
Awale M, Sirockin F, Stiefl N, Reymond JL. Medicinal Chemistry Aware Database GDBMedChem. Mol Inform 2019; 38:e1900031. [PMID: 31169974 DOI: 10.1002/minf.201900031] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/26/2019] [Accepted: 05/21/2019] [Indexed: 12/17/2022]
Abstract
The generated database GDB17 enumerates 166.4 billion possible molecules up to 17 atoms of C, N, O, S and halogens following simple chemical stability and synthetic feasibility rules; however, medicinal chemistry criteria are not taken into account. Here we applied rules inspired by medicinal chemistry to exclude problematic functional groups and complex molecules from GDB17, and sampled the resulting subset uniformly across molecular size, stereochemistry and polarity to form GDBMedChem as a compact collection of 10 million small molecules. This collection has reduced complexity and better synthetic accessibility than the entire GDB17 but retains higher sp3-carbon fraction and natural product likeness scores compared to known drugs. GDBMedChem molecules are more diverse and very different from known molecules in terms of substructures and represent an unprecedented source of diversity for drug design. GDBMedChem is available for 3D-visualization, similarity searching and for download at http://gdb.unibe.ch.
Affiliation(s)
- Mahendra Awale
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Finton Sirockin
  - Novartis Institutes for Biomedical Research, Basel, Switzerland
- Nikolaus Stiefl
  - Novartis Institutes for Biomedical Research, Basel, Switzerland
- Jean-Louis Reymond
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
21
Arús-Pous J, Blaschke T, Ulander S, Reymond JL, Chen H, Engkvist O. Exploring the GDB-13 chemical space using deep generative models. J Cheminform 2019; 11:20. [PMID: 30868314 PMCID: PMC6419837 DOI: 10.1186/s13321-019-0341-z] [Citation(s) in RCA: 80] [Impact Index Per Article: 16.0] [Received: 10/19/2018] [Accepted: 02/26/2019] [Indexed: 11/15/2022]
Abstract
Recent applications of recurrent neural networks (RNNs) enable training models that sample the chemical space. In this study we train RNNs on molecular string representations (SMILES) using a subset of the enumerated database GDB-13 (975 million molecules). We show that a model trained with 1 million structures (0.1% of the database) reproduces 68.9% of the entire database after training, when sampling 2 billion molecules. We also developed a method to assess the quality of the training process using negative log-likelihood plots. Furthermore, we use a mathematical model based on the “coupon collector problem” that compares the trained model to an upper bound, and thus we are able to quantify how much it has learned. We also suggest that this method can be used as a tool to benchmark the learning capabilities of any molecular generative model architecture. Additionally, an analysis of the generated chemical space was performed, which shows that, mostly due to the syntax of SMILES, complex molecules with many rings and heteroatoms are more difficult to sample.
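The "coupon collector" comparison mentioned in this abstract can be illustrated with a short calculation. The sketch below assumes an idealized sampler that draws every GDB-13 molecule uniformly and with replacement; the function name `expected_coverage` and the closed-form expression are illustrative assumptions, not the authors' code.

```python
import math


def expected_coverage(n_molecules: int, n_samples: int) -> float:
    """Expected fraction of distinct items seen after uniform sampling with replacement.

    A single draw misses a given item with probability 1 - 1/n, so after k
    independent draws the item is still unseen with probability (1 - 1/n) ** k.
    """
    # log1p/expm1 keep precision when 1/n is tiny, as it is for ~1e9 molecules.
    log_miss = n_samples * math.log1p(-1.0 / n_molecules)
    return -math.expm1(log_miss)


if __name__ == "__main__":
    # Figures taken from the abstract: 975 million molecules, 2 billion samples.
    ideal = expected_coverage(975_000_000, 2_000_000_000)
    reported = 0.689  # coverage reported in the abstract for the trained model
    print(f"ideal uniform sampler: {ideal:.3f}")
    print(f"reported model / ideal: {reported / ideal:.3f}")
```

Under the uniform-sampling assumption, the ideal sampler would cover roughly 87% of the database, which gives the reported 68.9% a reference point.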
Affiliation(s)
- Josep Arús-Pous
  - Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Pepparedsleden 1, 43183, Mölndal, Sweden
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Thomas Blaschke
  - Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Pepparedsleden 1, 43183, Mölndal, Sweden
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19C, 53115, Bonn, Germany
- Silas Ulander
  - Medicinal Chemistry, Cardiovascular, Renal and Metabolism, IMED Biotech Unit, AstraZeneca, Gothenburg, Pepparedsleden 1, 43183, Mölndal, Sweden
- Jean-Louis Reymond
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Hongming Chen
  - Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Pepparedsleden 1, 43183, Mölndal, Sweden
- Ola Engkvist
  - Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Pepparedsleden 1, 43183, Mölndal, Sweden
22
Capecchi A, Awale M, Probst D, Reymond JL. PubChem and ChEMBL beyond Lipinski. Mol Inform 2019; 38:e1900016. [PMID: 30844149 DOI: 10.1002/minf.201900016] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 01/30/2019] [Accepted: 02/18/2019] [Indexed: 12/13/2022]
Abstract
Seven million of the currently 94 million entries in the PubChem database break at least one of the four Lipinski constraints for oral bioavailability, 183,185 of which are also found in the ChEMBL database. These non-Lipinski PubChem (NLP) and ChEMBL (NLC) subsets are interesting because they contain new modalities that can display biological properties not accessible to small molecule drugs. Unfortunately, the current search tools in PubChem and ChEMBL are designed for small molecules and are not well suited to explore these subsets, which therefore remain poorly appreciated. Herein we report MXFP (macromolecule extended atom-pair fingerprint), a 217-D fingerprint tailored to analyze large molecules in terms of molecular shape and pharmacophores. We implement MXFP in two web-based applications, the first one to visualize NLP and NLC interactively using Faerun (http://faerun.gdb.tools/), the second one to perform MXFP nearest neighbor searches in NLP and NLC (http://similaritysearch.gdb.tools/). We show that these tools provide a meaningful insight into the diversity of large molecules in NLP and NLC. The interactive tools presented here are publicly available at http://gdb.unibe.ch and can be used freely to explore and better understand the diversity of non-Lipinski molecules in PubChem and ChEMBL.
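The "at least one of the four Lipinski constraints" criterion can be sketched as a simple filter. The abstract does not list the thresholds, so the limits below are the commonly cited rule-of-five values, and the `Descriptors` container, function names and example numbers are hypothetical; how each descriptor is computed is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one molecule (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    checks = (
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    )
    return sum(checks)


def is_non_lipinski(d: Descriptors) -> bool:
    """True if the molecule breaks at least one of the four constraints."""
    return lipinski_violations(d) >= 1


if __name__ == "__main__":
    # Illustrative values only, not taken from PubChem or ChEMBL.
    small = Descriptors(mol_weight=180.2, logp=1.2, h_bond_donors=1, h_bond_acceptors=4)
    large = Descriptors(mol_weight=1200.0, logp=3.0, h_bond_donors=5, h_bond_acceptors=12)
    print(is_non_lipinski(small), is_non_lipinski(large))
```

Applying a filter of this kind to a whole database is what would yield subsets analogous to NLP and NLC.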
Affiliation(s)
- Alice Capecchi
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Mahendra Awale
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Daniel Probst
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Jean-Louis Reymond
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
23
Boström J, Brown DG, Young RJ, Keserü GM. Expanding the medicinal chemistry synthetic toolbox. Nat Rev Drug Discov 2018; 17:709-727. [DOI: 10.1038/nrd.2018.116] [Citation(s) in RCA: 267] [Impact Index Per Article: 44.5] [Indexed: 11/09/2022]
24
Pereira F, Aires-de-Sousa J. Computational Methodologies in the Exploration of Marine Natural Product Leads. Mar Drugs 2018; 16:md16070236. [PMID: 30011882 PMCID: PMC6070892 DOI: 10.3390/md16070236] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 06/14/2018] [Revised: 07/02/2018] [Accepted: 07/06/2018] [Indexed: 12/18/2022]
Abstract
Computational methodologies are assisting the exploration of marine natural products (MNPs) to make the discovery of new leads more efficient, to repurpose known MNPs, to target new metabolites on the basis of genome analysis, to reveal mechanisms of action, and to optimize leads. In silico efforts in drug discovery of NPs have mainly focused on two tasks: dereplication and prediction of bioactivities. The exploration of new chemical spaces and the application of predicted spectral data must be included in new approaches to select species, extracts, and growth conditions with maximum probabilities of medicinal chemistry novelty. In this review, the most relevant current computational dereplication methodologies are highlighted. Structure-based (SB) and ligand-based (LB) chemoinformatics approaches have become essential tools for the virtual screening of NPs either in small datasets of isolated compounds or in large-scale databases. The most common LB techniques include Quantitative Structure–Activity Relationships (QSAR), estimation of drug likeness, prediction of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties, similarity searching, and pharmacophore identification. Analogously, molecular dynamics, docking and binding cavity analysis have been used in SB approaches. Their significance and achievements are the main focus of this review.
Affiliation(s)
- Florbela Pereira
  - LAQV and REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal
- Joao Aires-de-Sousa
  - LAQV and REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal