1
McKay BD, Yirik MA, Steinbeck C. Surge: a fast open-source chemical graph generator. J Cheminform 2022; 14:24. [PMID: 35461261 PMCID: PMC9034616 DOI: 10.1186/s13321-022-00604-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/06/2021] [Accepted: 04/03/2022] [Indexed: 11/10/2022] Open
Abstract
Chemical structure generators are used in cheminformatics to produce or enumerate virtual molecules based on a set of boundary conditions. The results can then be tested for properties of interest, such as agreement with measured data or suitability as drugs. The starting point can be a potentially fuzzy set of fragments or a molecular formula. In the latter case, the generator produces the set of constitutional isomers of the given input formula. Here we present the novel constitutional isomer generator surge, which is based on the canonical generation path method. Surge uses the nauty package to compute automorphism groups of graphs. We outline the working principles of surge and present benchmarking results which show that surge is currently the fastest structure generator. Surge is available under a liberal open-source license.
Affiliation(s)
- Brendan D McKay
  - School of Computing, Australian National University, Canberra, ACT, 2601, Australia.
- Mehmet Aziz Yirik
  - Institute of Inorganic and Analytical Chemistry, Friedrich-Schiller-University, Lessingstr. 8, 07743, Jena, Germany.
- Christoph Steinbeck
  - Institute of Inorganic and Analytical Chemistry, Friedrich-Schiller-University, Lessingstr. 8, 07743, Jena, Germany.
2
Abstract
Chemical graph generators are software packages that generate computer representations of chemical structures adhering to certain boundary conditions. Their development is a research topic in cheminformatics. Chemical graph generators are used in areas such as virtual library generation in drug design, molecular design with specified properties (called inverse QSAR/QSPR), organic synthesis design, retrosynthesis, and systems for computer-assisted structure elucidation (CASE). CASE systems have regained interest for the structure elucidation of unknowns in computational metabolomics, a current area of computational biology.
Affiliation(s)
- Mehmet Aziz Yirik
  - Friedrich Schiller Universität Jena, Institute for Inorganic and Analytical Chemistry, Jena, Germany.
- Christoph Steinbeck
  - Friedrich Schiller Universität Jena, Institute for Inorganic and Analytical Chemistry, Jena, Germany.
3
Inoue T, Tanaka K, Kotera M, Funatsu K. Improvement of the Structure Generator DAECS with Respect to Structural Diversity. Mol Inform 2020; 40:e2000225. [PMID: 33237627 DOI: 10.1002/minf.202000225] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/26/2020] [Accepted: 10/29/2020] [Indexed: 11/11/2022]
Abstract
The development of novel organic compounds with desired properties is time consuming and costly. Thus, the quantitative structure-property relationship (QSPR) model is used widely for efficiently discovering compounds with the desired properties. Novel structures can be generated from a variety of input structures in silico by structure generators. We previously developed the structure generator DAECS to yield highly active drug-like structures. However, the structural diversity of the structures generated by DAECS was still too small for practical applications such as drug discovery. In this paper, we present structure modification rules and an algorithm to output more diverse structures through the DAECS workflow. Two new types of structural modification rules, bond contraction and ring mergence, were added. The new algorithm, which restricts the search area and subsequently clusters structures on a two-dimensional map generated by generative topographic mapping, was implemented for the repetitive selection of seed structures. A case study was conducted to evaluate our method using ligand structures for the histamine H1 receptor. The results showed improved structural diversity compared with the previous method.
Affiliation(s)
- Takahiro Inoue
  - Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan.
- Kenichi Tanaka
  - Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan.
- Masaaki Kotera
  - Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan.
- Kimito Funatsu
  - Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan.
4
Wu S, Lambard G, Liu C, Yamada H, Yoshida R. iQSPR in XenonPy: A Bayesian Molecular Design Algorithm. Mol Inform 2020; 39:e1900107. [PMID: 31841276 PMCID: PMC7050509 DOI: 10.1002/minf.201900107] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/16/2019] [Accepted: 10/14/2019] [Indexed: 01/10/2023]
Abstract
iQSPR is an inverse molecular design algorithm based on Bayesian inference that was developed in our previous study. Here, the algorithm is integrated in Python as a new module called iQSPR-X in the all-in-one materials informatics platform XenonPy. Our new software provides a flexible, easy-to-use, and extensible platform for users to build customized molecular design algorithms using pre-set modules and a pre-trained model library in XenonPy. In this paper, we describe key features of iQSPR-X and provide guidance on its use, illustrated by an application to a polymer design that targets a specific range of bandgap and dielectric constant.
Affiliation(s)
- Stephen Wu
  - The Institute of Statistical Mathematics, Research Organization of Information and Systems, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan.
  - The Graduate University for Advanced Studies, SOKENDAI, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan.
- Guillaume Lambard
  - Center for Materials Research by Information Integration (CMI), Research and Services Division of Materials Data and Integrated System (MaDIS), National Institute for Materials Science (NIMS), 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan.
- Chang Liu
  - The Institute of Statistical Mathematics, Research Organization of Information and Systems, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan.
- Hironao Yamada
  - The Institute of Statistical Mathematics, Research Organization of Information and Systems, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan.
  - School of Pharmacy, Tokyo University of Pharmacy and Life Sciences, 1432-1 Horinouchi, Hachioji, Tokyo 192-0392, Japan.
- Ryo Yoshida
  - The Institute of Statistical Mathematics, Research Organization of Information and Systems, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan.
  - The Graduate University for Advanced Studies, SOKENDAI, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan.
  - Center for Materials Research by Information Integration (CMI), Research and Services Division of Materials Data and Integrated System (MaDIS), National Institute for Materials Science (NIMS), 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan.
5
Gantzer P, Creton B, Nieto-Draghi C. Inverse-QSPR for de novo Design: A Review. Mol Inform 2019; 39:e1900087. [PMID: 31682079 DOI: 10.1002/minf.201900087] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/17/2019] [Accepted: 11/04/2019] [Indexed: 11/09/2022]
Abstract
The use of computer tools to solve chemistry-related problems has given rise to a large and increasing number of publications in recent decades. This new field of science is now well recognized and labelled Chemoinformatics. Among all chemoinformatics techniques, the use of statistics-based approaches for property prediction has been the subject of numerous studies reflecting both new developments and many applications. The resulting predictive models, which relate a property to molecular features (descriptors), are gathered under the acronym QSPR, for Quantitative Structure Property Relationships. Apart from the obvious use of such models to predict property values for new compounds, their use to virtually synthesize new molecules (de novo design) is currently a high-interest subject. Inverse-QSPR (i-QSPR) methods have hence been developed to accelerate the discovery of new materials that meet a set of specifications. In this manuscript, we review existing i-QSPR methodologies published in the open literature so as to highlight the developments, applications, improvements and limitations of each.
Affiliation(s)
- Philippe Gantzer
  - IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852, Rueil-Malmaison, France.
- Benoit Creton
  - IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852, Rueil-Malmaison, France.
- Carlos Nieto-Draghi
  - IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852, Rueil-Malmaison, France.
6
Grisoni F, Merk D, Byrne R, Schneider G. Scaffold-Hopping from Synthetic Drugs by Holistic Molecular Representation. Sci Rep 2018; 8:16469. [PMID: 30405170 PMCID: PMC6220272 DOI: 10.1038/s41598-018-34677-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 07/12/2018] [Accepted: 10/16/2018] [Indexed: 12/31/2022] Open
Abstract
The discovery of novel ligand chemotypes allows the exploration of uncharted regions in chemical space, thereby potentially improving synthetic accessibility, potency, and the drug-likeness of molecules. Here, we demonstrate the scaffold-hopping ability of the new Weighted Holistic Atom Localization and Entity Shape (WHALES) molecular descriptors compared to seven state-of-the-art molecular representations on 30,000 compounds and 182 biological targets. In a prospective application, we apply WHALES to the discovery of novel retinoid X receptor (RXR) modulators. WHALES descriptors identified four agonists with innovative molecular scaffolds, populating uncharted regions of the chemical space. One of the agonists, possessing a rare non-acidic chemotype, revealed high selectivity on 12 nuclear receptors and efficacy comparable to bexarotene on induction of ATP-binding cassette transporter A1, angiopoietin-like protein 4 and apolipoprotein E. The outcome of this research supports WHALES as an innovative tool to explore novel regions of the chemical space and to detect novel bioactive chemotypes by straightforward similarity searching.
Affiliation(s)
- Francesca Grisoni
  - Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093, Zurich, Switzerland.
  - Milano Chemometrics & QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, IT-20126, Milano, Italy.
- Daniel Merk
  - Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093, Zurich, Switzerland.
- Ryan Byrne
  - Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093, Zurich, Switzerland.
- Gisbert Schneider
  - Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093, Zurich, Switzerland.
7
Grisoni F, Consonni V, Todeschini R. Impact of Molecular Descriptors on Computational Models. Methods Mol Biol 2018; 1825:171-209. [PMID: 30334206 DOI: 10.1007/978-1-4939-8639-2_5] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 01/13/2023]
Abstract
Molecular descriptors encode a wide variety of molecular information and have become the support of many contemporary chemoinformatic and bioinformatic applications. They grasp specific molecular features (e.g., geometry, shape, pharmacophores, or atomic properties) and directly affect computational models, in terms of outcome, performance, and applicability. This chapter aims to illustrate the impact of different molecular descriptors on the structural information captured and on the perceived chemical similarity among molecules. After introducing the fundamental concepts of molecular descriptor theory and application, a step-by-step retrospective virtual screening procedure guides users through the fundamental processing steps and discusses the impact of different types of molecular descriptors.
Affiliation(s)
- Francesca Grisoni
  - Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy.
- Viviana Consonni
  - Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy.
- Roberto Todeschini
  - Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy.
8
Nakagawa T, Miyao T, Funatsu K. Identification of Bioactive Scaffolds Based on QSAR Models. Mol Inform 2017; 37. [DOI: 10.1002/minf.201700103] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/10/2017] [Accepted: 10/02/2017] [Indexed: 11/10/2022]
Affiliation(s)
- Tomoki Nakagawa
  - Department of Chemical System Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan.
- Tomoyuki Miyao
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany.
- Kimito Funatsu
  - Department of Chemical System Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan.
9
Ochi S, Miyao T, Funatsu K. Structure Modification toward Applicability Domain of a QSAR/QSPR Model Considering Activity/Property. Mol Inform 2017; 36. [PMID: 28815921 DOI: 10.1002/minf.201700076] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/04/2017] [Accepted: 08/01/2017] [Indexed: 11/11/2022]
Abstract
In drug and material design, the activity and property values of the designed chemical structures can be predicted by quantitative structure-activity and structure-property relationship (QSAR/QSPR) models. When a QSAR/QSPR model is applied to chemical structures, its applicability domain (AD) must be considered. The predicted activity/property values are only reliable for chemical structures inside the AD. Chemical structures outside the AD are usually neglected, as the predicted values are unreliable. The purpose of this study is to develop a methodology for obtaining novel chemical structures with the desired activity or property based on a QSAR/QSPR model by making use of the neglected structures. We propose a structure modification strategy for the AD that considers the activity and property simultaneously. The AD is defined by a one-class support vector machine and the structure modification is guided by a partial derivative of the AD model and matched molecular pairs analysis. Three proof-of-concept case studies generate novel chemical structures inside the AD that exhibit preferable activity/property values according to the QSAR/QSPR model.
Affiliation(s)
- Shoki Ochi
  - Department of Chemical Systems Engineering, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan.
- Tomoyuki Miyao
  - Department of Chemical Systems Engineering, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan.
- Kimito Funatsu
  - Department of Chemical Systems Engineering, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan.
10
Abstract
Inverse quantitative structure-activity relationship (QSAR) modeling encompasses the generation of compound structures from values of descriptors corresponding to high activity predicted with a given QSAR model. Structure generation proceeds from descriptor coordinates optimized for activity prediction. Herein, we concentrate on the first phase of the inverse QSAR process and introduce a new methodology for coordinate optimization, termed differential evolution (DE), that originated from computer science and engineering. Using simulation and compound activity data, we demonstrate that DE in combination with support vector regression (SVR) yields effective and robust predictions of optimized coordinates satisfying model constraints and requirements. For different compound activity classes, optimized coordinates are obtained that exclusively map to regions of high activity in feature space, represent novel positions for structure generation, and are chemically meaningful.
Affiliation(s)
- Tomoyuki Miyao
  - Department of Chemical System Engineering, School of Engineering, The University of Tokyo, Tokyo, 113-8656, Japan.
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, D-53113, Germany.
- Kimito Funatsu
  - Department of Chemical System Engineering, School of Engineering, The University of Tokyo, Tokyo, 113-8656, Japan.
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, D-53113, Germany.
11
Miyao T, Funatsu K. Finding Chemical Structures Corresponding to a Set of Coordinates in Chemical Descriptor Space. Mol Inform 2017; 36. [DOI: 10.1002/minf.201700030] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 02/13/2017] [Accepted: 04/04/2017] [Indexed: 11/10/2022]
Affiliation(s)
- Tomoyuki Miyao
  - Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan.
- Kimito Funatsu
  - Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan.
12
Mok NY, Brown N. Applications of Systematic Molecular Scaffold Enumeration to Enrich Structure-Activity Relationship Information. J Chem Inf Model 2016; 57:27-35. [PMID: 27990817 PMCID: PMC6152611 DOI: 10.1021/acs.jcim.6b00386] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 01/02/2023]
Abstract
Establishing structure–activity relationships (SARs) in hit identification during early stage drug discovery is important in accelerating hit confirmation and expansion. We describe the development of EnCore, a systematic molecular scaffold enumeration protocol using single atom mutations, to enhance the application of objective scaffold definitions and to enrich SAR information from analysis of high-throughput screening output. A list of 43 literature medicinal chemistry compound series, each containing a minimum of 100 compounds, published in the Journal of Medicinal Chemistry was collated to validate the protocol. Analysis using the top representative Level 1 scaffolds of this list of literature compound series demonstrated that EnCore could mimic the scaffold exploration conducted when establishing SAR. When EnCore was applied to analyze an HTS library containing over 200,000 compounds, we observed that over 70% of the molecular scaffolds matched extant scaffolds within the library after enumeration. In particular, over 60% of the singleton scaffolds with only one representative compound were found to have structurally related compounds after enumeration. These results illustrate the potential of EnCore to enrich SAR information. A case study using literature cyclooxygenase-2 inhibitors further demonstrates the advantage of EnCore application in establishing SAR from structurally related scaffolds. EnCore complements literature enumeration methods in enabling changes to the physicochemical properties of molecular scaffolds and structural modifications to aliphatic rings and linkers. The enumerated scaffold clusters generated would constitute a comprehensive collection of scaffolds for scaffold morphing and hopping.
Affiliation(s)
- N Yi Mok
  - Cancer Research UK Cancer Therapeutics Unit, Division of Cancer Therapeutics, The Institute of Cancer Research, London, SM2 5NG, U.K.
- Nathan Brown
  - Cancer Research UK Cancer Therapeutics Unit, Division of Cancer Therapeutics, The Institute of Cancer Research, London, SM2 5NG, U.K.