1
Srinivasan K, Puliyanda A, Prasad V. Identification of Reaction Network Hypotheses for Complex Feedstocks from Spectroscopic Measurements with Minimal Human Intervention. J Phys Chem A 2024; 128:4714-4729. [PMID: 38836378 DOI: 10.1021/acs.jpca.4c01592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2024]
Abstract
In this work, we detail an automated reaction network hypothesis generation protocol for processes involving complex feedstocks where information about the species and reactions involved is unknown. Our methodology is process agnostic and can be utilized in any reactive process with spectroscopic measurements that provide information on the evolution of the components in the mixture. We decompose the mixture spectra to obtain spectroscopic signatures of the individual components and use a 1-D convolutional neural network to automatically identify functional groups indicated by them. We employ atom-atom mapping to automatically recover reaction rules that are applied on candidate molecules identified from chemistry databases through fingerprint similarity. The method is tested on synthetic data and on spectroscopic measurements of lab-scale batch hydrothermal liquefaction (HTL) of biomass to determine the accuracy of prediction across datasets of varying complexities. Our methodology is able to identify reaction network hypotheses containing reaction networks close to the ground truth in the case of synthetic data, and we are also able to recover candidate molecules and reaction networks close to the ones reported in the previous literature studies for biomass pyrolysis.
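The candidate-retrieval step mentioned in this abstract (picking database molecules by fingerprint similarity) can be illustrated with a minimal sketch. The abstract does not say which fingerprint or similarity metric the authors use, so Tanimoto similarity over sets of on-bit indices is an assumption here, and the fingerprints, candidate names, and the `rank_candidates` helper are all hypothetical.

```python
from __future__ import annotations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score 0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_candidates(query: set[int], database: dict[str, set[int]], top_k: int = 3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: each fingerprint is the set of bits that are switched on.
query_fp = {1, 4, 7, 9}
toy_database = {
    "candidate_A": {1, 4, 7, 9, 12},  # 4 shared bits, 5 in the union
    "candidate_B": {1, 4},            # 2 shared bits, 4 in the union
    "candidate_C": {20, 21},          # no shared bits
}

ranking = rank_candidates(query_fp, toy_database)
```

In practice the fingerprints would come from a cheminformatics toolkit, and the query would be derived from the functional groups predicted for each resolved spectral component.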
Affiliation(s)
- Karthik Srinivasan
  - Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211, 116st NW, Edmonton T6G 1H9, AB, Canada
- Anjana Puliyanda
  - Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211, 116st NW, Edmonton T6G 1H9, AB, Canada
- Vinay Prasad
  - Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211, 116st NW, Edmonton T6G 1H9, AB, Canada
2
Dobbelaere MR, Lengyel I, Stevens CV, Van Geem KM. Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices. J Cheminform 2024; 16:37. [PMID: 38553720 PMCID: PMC10980627 DOI: 10.1186/s13321-024-00834-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 03/23/2024] [Indexed: 04/02/2024] [Open Access]
Abstract
The challenge of devising pathways for organic synthesis remains a central issue in the field of medicinal chemistry. Over the span of six decades, computer-aided synthesis planning has given rise to a plethora of potent tools for formulating synthetic routes. Nevertheless, a significant expert task still looms: determining the appropriate solvent, catalyst, and reagents when provided with a set of reactants to achieve and optimize the desired product for a specific step in the synthesis process. Typically, chemists identify key functional groups and rings that exert crucial influences at the reaction center, classify reactions into categories, and may assign them names. This research introduces Rxn-INSIGHT, an open-source algorithm based on the bond-electron matrix approach, with the purpose of automating this endeavor. Rxn-INSIGHT not only streamlines the process but also facilitates extensive querying of reaction databases, effectively replicating the thought processes of an organic chemist. The core functions of the algorithm encompass the classification and naming of reactions, extraction of functional groups, rings, and scaffolds from the involved chemical entities. The provision of reaction condition recommendations based on the similarity and prevalence of reactions eventually arises as a side application. The performance of our rule-based model has been rigorously assessed against a carefully curated benchmark dataset, exhibiting an accuracy rate exceeding 90% in reaction classification and surpassing 95% in reaction naming. Notably, it has been discerned that a pivotal factor in selecting analogous reactions lies in the analysis of ring structures participating in the reactions. An examination of ring structures within the USPTO chemical reaction database reveals that with just 35 unique rings, a remarkable 75% of all rings found in nearly 1 million products can be encompassed. 
Furthermore, Rxn-INSIGHT is proficient in suggesting appropriate choices for solvents, catalysts, and reagents in entirely novel reactions, all within the span of a second, utilizing nothing more than an everyday laptop.
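As a rough illustration of the bond-electron (BE) matrix idea this abstract builds on, the sketch below uses the classical Dugundji-Ugi formalism: off-diagonal entries hold bond orders, diagonal entries hold free valence electrons, and the reaction matrix is the product matrix minus the reactant matrix. This is not Rxn-INSIGHT's code or API; the example reaction (H2 + Cl2 giving 2 HCl), the atom ordering, and the helper names are assumptions made for illustration.

```python
# Atom order (hypothetical indexing): H(0), H(1), Cl(2), Cl(3)
# Off-diagonal = bond order, diagonal = free (non-bonding) valence electrons.
reactants_be = [
    [0, 1, 0, 0],  # H0 bonded to H1
    [1, 0, 0, 0],
    [0, 0, 6, 1],  # Cl2 bonded to Cl3, three lone pairs each
    [0, 0, 1, 6],
]
products_be = [
    [0, 0, 1, 0],  # H0 bonded to Cl2
    [0, 0, 0, 1],  # H1 bonded to Cl3
    [1, 0, 6, 0],
    [0, 1, 0, 6],
]


def reaction_matrix(be_reactants, be_products):
    """R = E - B: element-wise difference of product and reactant BE matrices."""
    n = len(be_reactants)
    return [
        [be_products[i][j] - be_reactants[i][j] for j in range(n)]
        for i in range(n)
    ]


def bond_changes(r_matrix):
    """List (i, j, delta) for every atom pair i < j whose bond order changes."""
    n = len(r_matrix)
    return [
        (i, j, r_matrix[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if r_matrix[i][j] != 0
    ]


r = reaction_matrix(reactants_be, products_be)
changes = bond_changes(r)
total_electron_change = sum(sum(row) for row in r)
```

The non-zero rows of `r` mark the reaction centre, which is the kind of information a rule-based classifier can then match against known reaction classes; the entries of `r` summing to zero reflects conservation of valence electrons.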
Affiliation(s)
- Maarten R Dobbelaere
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
- István Lengyel
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
  - ChemInsights LLC, Dover, DE, 19901, USA
- Christian V Stevens
  - SynBioC Research Group, Department of Green Chemistry and Technology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Kevin M Van Geem
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
3
Heid E, Probst D, Green WH, Madsen GKH. EnzymeMap: curation, validation and data-driven prediction of enzymatic reactions. Chem Sci 2023; 14:14229-14242. [PMID: 38098707 PMCID: PMC10718068 DOI: 10.1039/d3sc02048g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 11/21/2023] [Indexed: 12/17/2023] [Open Access]
Abstract
Enzymatic reactions are an ecofriendly, selective, and versatile addition to, and sometimes even an alternative to, organic reactions for the synthesis of chemical compounds such as pharmaceuticals or fine chemicals. To identify suitable reactions, computational models to predict the activity of enzymes on non-native substrates, to perform retrosynthetic pathway searches, or to predict the outcomes of reactions including regio- and stereoselectivity are becoming increasingly important. However, current approaches are substantially hindered by the limited amount of available data, especially if balanced and atom-mapped reactions are needed and if the models feature machine learning components. We therefore constructed a high-quality dataset (EnzymeMap) by developing a large set of correction and validation algorithms for recorded reactions in the literature and showcase its significant positive impact on machine learning models of retrosynthesis, forward prediction, and regioselectivity prediction, outperforming previous approaches by a large margin. Our dataset allows for deep learning models of enzymatic reactions with unprecedented accuracy, and is freely available online.
Affiliation(s)
- Esther Heid
  - Institute of Materials Chemistry, TU Wien 1060 Vienna Austria
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
- William H Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
4
Oh KK, Gupta H, Ganesan R, Sharma SP, Won SM, Jeong JJ, Lee SB, Cha MG, Kwon GH, Jeong MK, Min BH, Hyun JY, Eom JA, Park HJ, Yoon SJ, Choi MR, Kim DJ, Suk KT. The seamless integration of dietary plant-derived natural flavonoids and gut microbiota may ameliorate non-alcoholic fatty liver disease: a network pharmacology analysis. Artificial Cells, Nanomedicine, and Biotechnology 2023; 51:217-232. [PMID: 37129458 DOI: 10.1080/21691401.2023.2203734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
We examined metabolites of gut microbiota (GM; endogenous species) and dietary plant-derived natural flavonoids (DPDNFs; exogenous species), both known as potent effectors against non-alcoholic fatty liver disease (NAFLD), via network pharmacology (NP). The crucial targets against NAFLD were identified via GM and DPDNFs. The protein-protein interaction (PPI) network, bubble chart, and networks of the GM or natural products-metabolites-targets-key signalling (GNMTK) pathway were generated with an R package. Furthermore, a molecular docking test (MDT) was performed to verify the affinity between metabolite(s) and target(s) on a key signalling pathway. On the GNMTK networks, Enterococcus sp. 45, Escherichia sp. 12, Escherichia sp. 33 and Bacterium MRG-PMF-1 as key microbiota; flavonoid-rich products as key natural resources; luteolin and myricetin as key metabolites (or dietary flavonoids); and AKT Serine/Threonine Kinase 1 (AKT1), CF Transmembrane conductance Regulator (CFTR) and PhosphoInositide-3-Kinase, Regulatory subunit 1 (PIK3R1) as key targets are promising components to treat NAFLD by suppressing the cyclic Adenosine MonoPhosphate (cAMP) signalling pathway. This study shows that these components (microbiota, metabolites, targets and a key signalling pathway) and DPDNFs can exert combinatorial pharmacological effects against NAFLD. Overall, the integrated pharmacological approach sheds light on the relationships between GM and DPDNFs.
Affiliation(s)
- Ki-Kwang Oh
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
- Haripriya Gupta
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
- Raja Ganesan
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
- Satya Priya Sharma
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
- Sung-Min Won
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
- Jin-Ju Jeong
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
- Su-Been Lee
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
- Min-Gi Cha
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
- Goo-Hyun Kwon
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
- Min-Kyo Jeong
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
- Byeong-Hyun Min
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
- Ji-Ye Hyun
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
- Jung-A Eom
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
- Hee-Jin Park
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
- Sang-Jun Yoon
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
- Mi-Ran Choi
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
- Dong Joon Kim
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
- Ki-Tae Suk
  - Center for Microbiome, Institute for Liver and Digestive Diseases, Hallym University Medical Center, Chuncheon, Korea
5
Lim PK, Julca I, Mutwil M. Redesigning plant specialized metabolism with supervised machine learning using publicly available reactome data. Comput Struct Biotechnol J 2023; 21:1639-1650. [PMID: 36874159 PMCID: PMC9976193 DOI: 10.1016/j.csbj.2023.01.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 01/12/2023] [Accepted: 01/12/2023] [Indexed: 01/19/2023] [Open Access]
Abstract
The immense structural diversity of products and intermediates of plant specialized metabolism (specialized metabolites) makes them rich sources of therapeutic medicine, nutrients, and other useful materials. With the rapid accumulation of reactome data accessible in biological and chemical databases, along with recent advances in machine learning, this review outlines how supervised machine learning can be used to design new compounds and pathways by exploiting the wealth of said data. We first examine the various sources from which reactome data can be obtained, then explain the different machine learning encoding methods for reactome data. We then discuss current supervised machine learning developments that can be employed in various aspects to help redesign plant specialized metabolism.
Affiliation(s)
- Peng Ken Lim
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Irene Julca
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Marek Mutwil
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
6
Ismail I, Chantreau Majerus R, Habershon S. Graph-Driven Reaction Discovery: Progress, Challenges, and Future Opportunities. J Phys Chem A 2022; 126:7051-7069. [PMID: 36190262 PMCID: PMC9574932 DOI: 10.1021/acs.jpca.2c06408] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 11/29/2022]
Abstract
Graph-based descriptors, such as bond-order matrices and adjacency matrices, offer a simple and compact way of categorizing molecular structures; furthermore, such descriptors can be readily used to catalog chemical reactions (i.e., bond-making and -breaking). As such, a number of graph-based methodologies have been developed with the goal of automating the process of generating chemical reaction network models describing the possible mechanistic chemistry in a given set of reactant species. Here, we outline the evolution of these graph-based reaction discovery schemes, with particular emphasis on more recent methods incorporating graph-based methods with semiempirical and ab initio electronic structure calculations, minimum-energy path refinements, and transition state searches. Using representative examples from homogeneous catalysis and interstellar chemistry, we highlight how these schemes increasingly act as "virtual reaction vessels" for interrogating mechanistic questions. Finally, we highlight where challenges remain, including issues of chemical accuracy and calculation speeds, as well as the inherent challenge of dealing with the vast size of accessible chemical reaction space.
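The idea of using adjacency matrices to enumerate bond-making and bond-breaking events can be sketched as a one-step neighbour search over molecular graphs. This is only a toy under stated assumptions: bonds are treated as single (0/1 entries), the `MAX_VALENCE` table and the function name are hypothetical, and none of the energetic screening (electronic structure calculations, path refinement, transition state searches) discussed in the review is attempted.

```python
from itertools import combinations

# Assumed maximum number of single bonds per element (illustrative only).
MAX_VALENCE = {"H": 1, "O": 2}


def single_step_neighbours(atoms, adjacency):
    """Enumerate graphs reachable by breaking or forming exactly one bond."""
    degree = [sum(row) for row in adjacency]
    neighbours = []
    for i, j in combinations(range(len(atoms)), 2):
        new_adjacency = [row[:] for row in adjacency]  # copy before editing
        if adjacency[i][j] == 1:
            # Existing bond: breaking it is always allowed in this toy model.
            new_adjacency[i][j] = new_adjacency[j][i] = 0
            neighbours.append((("break", i, j), new_adjacency))
        elif degree[i] < MAX_VALENCE[atoms[i]] and degree[j] < MAX_VALENCE[atoms[j]]:
            # No bond yet and both atoms have spare valence: form one.
            new_adjacency[i][j] = new_adjacency[j][i] = 1
            neighbours.append((("form", i, j), new_adjacency))
    return neighbours


# Hypothetical starting graph: an OH fragment (H0-O2) plus a free H atom (H1).
atoms = ["H", "H", "O"]
start = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 0],
]

moves = [move for move, _ in single_step_neighbours(atoms, start)]
```

Applying such a step repeatedly from each new graph grows a reaction network; the practical difficulty highlighted in the abstract is that the number of reachable graphs grows very quickly, so each candidate step still needs to be screened.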
Affiliation(s)
- Idil Ismail
  - Department of Chemistry, University of Warwick, Coventry CV4 7AL, United Kingdom
- Scott Habershon
  - Department of Chemistry, University of Warwick, Coventry CV4 7AL, United Kingdom
7
Oh L, Ji Y, Li W, Varki A, Chen X, Wang LP. O-Acetyl Migration within the Sialic Acid Side Chain: A Mechanistic Study Using the Ab Initio Nanoreactor. Biochemistry 2022; 61:2007-2013. [PMID: 36054099 DOI: 10.1021/acs.biochem.2c00343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Many disease-causing viruses target sialic acids on the surface of host cells. Some viruses bind preferentially to sialic acids with O-acetyl modification at the hydroxyl group of C7, C8, or C9 on the glycerol-like side chain. Studies of proteins binding to sialosides containing O-acetylated sialic acids are crucial in understanding the related diseases but experimentally difficult due to the lability of the ester group. We recently showed that O-acetyl migration among hydroxyl groups of C7, C8, and C9 in sialic acids occurs in all directions in a pH-dependent manner. In the current study, we elucidate a full mechanistic pathway for the migration of O-acetyl among C7, C8, and C9. We used an ab initio nanoreactor to explore potential reaction pathways and density functional theory, pKa calculations, and umbrella sampling to investigate elementary steps of interest. We found that when a base is present, migration is easy in any direction and involves three key steps: deprotonation of the hydroxyl group, cyclization between the two carbons, and the migration of the O-acetyl group. This dynamic equilibrium may play a defensive role against pathogens that evolve to gain entry to the cell by binding selectively to one acetylation state.
Affiliation(s)
- Lisa Oh
  - Department of Chemistry, University of California, Davis, California 95616, United States
- Yang Ji
  - Glycobiology Research and Training Center, Departments of Medicine and Cellular and Molecular Medicine, University of California, San Diego, California 92093, United States
- Wanqing Li
  - Department of Chemistry, University of California, Davis, California 95616, United States
- Ajit Varki
  - Glycobiology Research and Training Center, Departments of Medicine and Cellular and Molecular Medicine, University of California, San Diego, California 92093, United States
- Xi Chen
  - Department of Chemistry, University of California, Davis, California 95616, United States
- Lee-Ping Wang
  - Department of Chemistry, University of California, Davis, California 95616, United States
8
9
10
Sridharan B, Goel M, Priyakumar UD. Modern Machine Learning for Tackling Inverse Problems in Chemistry: Molecular Design to Realization. Chem Commun (Camb) 2022; 58:5316-5331. [DOI: 10.1039/d1cc07035e] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/21/2022]
Abstract
The discovery of new molecules and materials helps expand the horizons of novel and innovative real-life applications. In the pursuit of finding molecules with desired properties, chemists have traditionally relied...
11
Szymkuć S, Badowski T, Grzybowski BA. Is Organic Chemistry Really Growing Exponentially? Angew Chem Int Ed Engl 2021. [DOI: 10.1002/ange.202111540] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Affiliation(s)
- Sara Szymkuć
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
- Tomasz Badowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
- Bartosz A. Grzybowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
  - IBS Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea
12
Szymkuć S, Badowski T, Grzybowski BA. Is Organic Chemistry Really Growing Exponentially? Angew Chem Int Ed Engl 2021; 60:26226-26232. [PMID: 34558168 DOI: 10.1002/anie.202111540] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/25/2021] [Indexed: 11/05/2022]
Abstract
In terms of molecules and specific reaction examples, organic chemistry features an impressive, exponential growth. However, new reaction classes/types that fuel this growth are being discovered at a much slower and only linear (or even sublinear) rate. The proportion of newly discovered reaction types to all reactions being performed keeps decreasing, suggesting that synthetic chemistry is becoming more reliant on reusing well-known methods. The newly discovered chemistries are more complex than decades ago and allow for the rapid construction of complex scaffolds in fewer steps. We study these and other trends as a function of time, reaction-type popularity and complexity based on an algorithm that extracts generalized reaction class templates. These analyses are useful in the context of computer-assisted synthesis, machine learning (to estimate the numbers of models with sufficient reaction statistics), and identifying erroneous entries in reaction databases.
Affiliation(s)
- Sara Szymkuć
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
- Tomasz Badowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
- Bartosz A Grzybowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
  - IBS Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea
13
Sharma S, Arya A, Cruz R, Cleaves II HJ. Automated Exploration of Prebiotic Chemical Reaction Space: Progress and Perspectives. Life (Basel) 2021; 11:1140. [PMID: 34833016 PMCID: PMC8624352 DOI: 10.3390/life11111140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/07/2021] [Revised: 10/15/2021] [Accepted: 10/18/2021] [Indexed: 12/12/2022] [Open Access]
Abstract
Prebiotic chemistry often involves the study of complex systems of chemical reactions that form large networks with a large number of diverse species. Such complex systems may have given rise to emergent phenomena that ultimately led to the origin of life on Earth. The environmental conditions and processes involved in this emergence may not be fully recapitulable, making it difficult for experimentalists to study prebiotic systems in laboratory simulations. Computational chemistry offers efficient ways to study such chemical systems and identify the ones most likely to display complex properties associated with life. Here, we review tools and techniques for modelling prebiotic chemical reaction networks and outline possible ways to identify self-replicating features that are central to many origin-of-life models.
Affiliation(s)
- Siddhant Sharma
  - Blue Marble Space Institute of Science, Seattle, WA 98154, USA
  - Department of Biochemistry, Deshbandhu College, University of Delhi, New Delhi 110019, India
  - Department of Chemistry and Chemical Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
- Aayush Arya
  - Blue Marble Space Institute of Science, Seattle, WA 98154, USA
  - Department of Physics, Lovely Professional University, Jalandhar-Delhi GT Road, Phagwara 144001, India
- Romulo Cruz
  - Blue Marble Space Institute of Science, Seattle, WA 98154, USA
  - Big Data Laboratory, Information and Communications Technology Center (CTIC), National University of Engineering, Amaru 210, Lima 15333, Peru
- Henderson James Cleaves II
  - Blue Marble Space Institute of Science, Seattle, WA 98154, USA
  - Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo 152-8550, Japan
14
Heid E, Goldman S, Sankaranarayanan K, Coley CW, Flamm C, Green WH. EHreact: Extended Hasse Diagrams for the Extraction and Scoring of Enzymatic Reaction Templates. J Chem Inf Model 2021; 61:4949-4961. [PMID: 34587449 PMCID: PMC8549070 DOI: 10.1021/acs.jcim.1c00921] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/30/2021] [Indexed: 11/29/2022]
Abstract
Data-driven computer-aided synthesis planning utilizing organic or biocatalyzed reactions from large databases has gained increasing interest in the last decade, sparking the development of numerous tools to extract, apply, and score general reaction templates. The generation of reaction rules for enzymatic reactions is especially challenging since substrate promiscuity varies between enzymes, causing the optimal levels of rule specificity and optimal number of included atoms to differ between enzymes. This complicates an automated extraction from databases and has promoted the creation of manually curated reaction rule sets. Here, we present EHreact, a purely data-driven open-source software tool, to extract and score reaction rules from sets of reactions known to be catalyzed by an enzyme at appropriate levels of specificity without expert knowledge. EHreact extracts and groups reaction rules into tree-like structures, Hasse diagrams, based on common substructures in the imaginary transition structures. Each diagram can be utilized to output a single or a set of reaction rules, as well as calculate the probability of a new substrate to be processed by the given enzyme by inferring information about the reactive site of the enzyme from the known reactions and their grouping in the template tree. EHreact heuristically predicts the activity of a given enzyme on a new substrate, outperforming current approaches in accuracy and functionality.
Affiliation(s)
- Esther Heid
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Samuel Goldman
  - Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Karthik Sankaranarayanan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Connor W. Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Christoph Flamm
  - Department of Theoretical Chemistry, University of Vienna, 1090 Vienna, Austria
- William H. Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
15
Gimadiev TR, Lin A, Afonina VA, Batyrshin D, Nugmanov RI, Akhmetshin T, Sidorov P, Duybankova N, Verhoeven J, Wegner J, Ceulemans H, Gedich A, Madzhidov TI, Varnek A. Reaction Data Curation I: Chemical Structures and Transformations Standardization. Mol Inform 2021; 40:e2100119. [PMID: 34427989 DOI: 10.1002/minf.202100119] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 05/04/2021] [Accepted: 08/13/2021] [Indexed: 12/11/2022]
Abstract
The quality of experimental data for chemical reactions is a critical consideration for any reaction-driven study. However, the curation of reaction data has not been extensively discussed in the literature so far. Here, we suggest a four-step protocol that includes the curation of individual structures (reactants and products), chemical transformations, reaction conditions, and endpoints. Its implementation in Python3 using the CGRTools toolkit has been used to clean three popular reaction databases: Reaxys, USPTO, and Pistachio. The curated USPTO database is available in the GitHub repository (Laboratoire-de-Chemoinformatique/Reaction_Data_Cleaning).
Affiliation(s)
- Timur R Gimadiev
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan
- Arkadii Lin
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
- Valentina A Afonina
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Dinar Batyrshin
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Ramil I Nugmanov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Tagir Akhmetshin
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Pavel Sidorov
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan
- Jonas Verhoeven
  - Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
- Joerg Wegner
  - Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
- Hugo Ceulemans
  - Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
- Andrey Gedich
  - Arcadia Inc., Bol'shoy Sampsoniyevskiy Prospekt, 28, building 2, 194044, St Petersburg, Russia
- Timur I Madzhidov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Alexandre Varnek
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
16
Gupta U, Vlachos DG. Learning Chemistry of Complex Reaction Systems via a Python First-Principles Reaction Rule Stencil (pReSt) Generator. J Chem Inf Model 2021; 61:3431-3441. [PMID: 34265203 DOI: 10.1021/acs.jcim.1c00297] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/30/2022]
Abstract
Complex reaction networks can be generated with automated network generators from initial reactants and reaction rules. Reaction rule specification is central to network generation. These reaction rules are, at present, user-defined based on (intuitive) expert knowledge of chemistry and are often transferred from gas-phase to surface processes. The catalyst active site geometry is usually left out but is often responsible for selectivity. We propose a first-principles-based reaction mechanism generation framework using density functional theory (DFT) data of published reaction mechanisms. The framework "learns the chemistry" from published mechanisms. It can generate reaction networks not studied before, "flag" previously unseen reactions for further DFT convergence tests, and easily reconcile differences between catalysts and reactants that may introduce new pathways. As such, it can be a diagnostic tool for data (mechanism) quality assessment and for discovering novel pathways to new molecules. A software package, the Python Reaction Stencil (pReSt), was developed for this purpose to wrap around automatic mechanism generation software. Multiple catalytic chemistries are considered to show the efficacy of the proposed framework.
Affiliation(s)
- Udit Gupta
- Department of Chemical and Biomolecular Engineering, Rapid Advancement in Process Intensification Deployment (RAPID) Institute, Delaware Energy Institute, University of Delaware, Newark, Delaware 19716, United States
| | - Dionisios G Vlachos
- Department of Chemical and Biomolecular Engineering, Rapid Advancement in Process Intensification Deployment (RAPID) Institute, Delaware Energy Institute, University of Delaware, Newark, Delaware 19716, United States
| |
|
17
|
Borges R, Colby SM, Das S, Edison AS, Fiehn O, Kind T, Lee J, Merrill AT, Merz KM, Metz TO, Nunez JR, Tantillo DJ, Wang LP, Wang S, Renslow RS. Quantum Chemistry Calculations for Metabolomics. Chem Rev 2021; 121:5633-5670. [PMID: 33979149 PMCID: PMC8161423 DOI: 10.1021/acs.chemrev.0c00901] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 08/24/2020] [Indexed: 02/07/2023]
Abstract
A primary goal of metabolomics studies is to fully characterize the small-molecule composition of complex biological and environmental samples. However, despite advances in analytical technologies over the past two decades, the majority of small molecules in complex samples are not readily identifiable due to the immense structural and chemical diversity present within the metabolome. Current gold-standard identification methods rely on reference libraries built using authentic chemical materials ("standards"), which are not available for most molecules. Computational quantum chemistry methods, which can be used to calculate chemical properties that are then measured by analytical platforms, offer an alternative route for building reference libraries, i.e., in silico libraries for "standards-free" identification. In this review, we cover the major roadblocks currently facing metabolomics and discuss applications where quantum chemistry calculations offer a solution. Several successful examples for nuclear magnetic resonance spectroscopy, ion mobility spectrometry, infrared spectroscopy, and mass spectrometry methods are reviewed. Finally, we consider current best practices, sources of error, and provide an outlook for quantum chemistry calculations in metabolomics studies. We expect this review will inspire researchers in the field of small-molecule identification to accelerate adoption of in silico methods for generation of reference libraries and to add quantum chemistry calculations as another tool at their disposal to characterize complex samples.
Affiliation(s)
- Ricardo M. Borges
- Walter Mors Institute of Research on Natural Products, Federal University of Rio de Janeiro, Rio de Janeiro 21941-901, Brazil
| | - Sean M. Colby
- Biological Science Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Susanta Das
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
| | - Arthur S. Edison
- Departments of Genetics and Biochemistry and Molecular Biology, Complex Carbohydrate Research Center and Institute of Bioinformatics, University of Georgia, Athens, Georgia 30602, United States
| | - Oliver Fiehn
- West Coast Metabolomics Center for Compound Identification, UC Davis Genome Center, University of California, Davis, California 95616, United States
| | - Tobias Kind
- West Coast Metabolomics Center for Compound Identification, UC Davis Genome Center, University of California, Davis, California 95616, United States
| | - Jesi Lee
- West Coast Metabolomics Center for Compound Identification, UC Davis Genome Center, University of California, Davis, California 95616, United States
- Department of Chemistry, University of California, Davis, California 95616, United States
| | - Amy T. Merrill
- Department of Chemistry, University of California, Davis, California 95616, United States
| | - Kenneth M. Merz
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
| | - Thomas O. Metz
- Biological Science Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Jamie R. Nunez
- Biological Science Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Dean J. Tantillo
- Department of Chemistry, University of California, Davis, California 95616, United States
| | - Lee-Ping Wang
- Department of Chemistry, University of California, Davis, California 95616, United States
| | - Shunyang Wang
- West Coast Metabolomics Center for Compound Identification, UC Davis Genome Center, University of California, Davis, California 95616, United States
- Department of Chemistry, University of California, Davis, California 95616, United States
| | - Ryan S. Renslow
- Biological Science Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| |
|
18
|
Abstract
As more data are introduced in the building of models of chemical reactivity, the mechanistic component can be reduced until 'big data' applications are reached. These methods no longer depend on underlying mechanistic hypotheses, potentially learning them implicitly through extensive data training. Reactivity models often focus on reaction barriers, but can also be trained to directly predict lab-relevant properties, such as yields or conditions. Calculations with a quantum-mechanical component are still preferred for quantitative predictions of reactivity. Although big data applications tend to be more qualitative, they have the advantage of being broadly applicable to different kinds of reactions. There is a continuum of methods in between these extremes, such as methods that use quantum-derived data or descriptors in machine learning models. Here, we present an overview of the recent machine learning applications in the field of chemical reactivity from a mechanistic perspective. Starting with a summary of how reactivity questions are addressed by quantum-mechanical methods, we discuss methods that augment or replace quantum-based modelling with faster alternatives relying on machine learning.
|
19
|
Finnigan W, Hepworth LJ, Flitsch SL, Turner NJ. RetroBioCat as a computer-aided synthesis planning tool for biocatalytic reactions and cascades. Nat Catal 2021; 4:98-104. [PMID: 33604511 PMCID: PMC7116764 DOI: 10.1038/s41929-020-00556-z] [Citation(s) in RCA: 87] [Impact Index Per Article: 29.0] [Indexed: 01/08/2023]
Abstract
As the enzyme toolbox for biocatalysis has expanded, so has the potential for the construction of powerful enzymatic cascades for efficient and selective synthesis of target molecules. Additionally, recent advances in computer-aided synthesis planning are revolutionising synthesis design in both synthetic biology and organic chemistry. However, the potential for biocatalysis is not well captured by tools currently available in either field. Here we present RetroBioCat, an intuitive and accessible tool for computer-aided design of biocatalytic cascades, freely available at retrobiocat.com. Our approach uses a set of expertly encoded reaction rules encompassing the enzyme toolbox for biocatalysis, and a system for identifying literature precedent for enzymes with the correct substrate specificity where this is available. Applying these rules for automated biocatalytic retrosynthesis, we show our tool to be capable of identifying promising biocatalytic pathways to target molecules, validated using a test-set of recent cascades described in the literature.
Affiliation(s)
- William Finnigan
- Department of Chemistry, University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, M1 7DN, Manchester, UK
| | - Lorna J Hepworth
- Department of Chemistry, University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, M1 7DN, Manchester, UK
| | - Sabine L Flitsch
- Department of Chemistry, University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, M1 7DN, Manchester, UK
| | - Nicholas J Turner
- Department of Chemistry, University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, M1 7DN, Manchester, UK
| |
|
20
|
Duigou T, du Lac M, Carbonell P, Faulon JL. RetroRules: a database of reaction rules for engineering biology. Nucleic Acids Res 2019; 47:D1229-D1235. [PMID: 30321422 PMCID: PMC6323975 DOI: 10.1093/nar/gky940] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 08/14/2018] [Accepted: 10/09/2018] [Indexed: 01/03/2023] Open
Abstract
RetroRules is a database of reaction rules for metabolic engineering (https://retrorules.org). Reaction rules are generic descriptions of chemical reactions that can be used in retrosynthesis workflows in order to enumerate all possible biosynthetic routes connecting a target molecule to its precursors. The use of such rules is becoming increasingly important in the context of synthetic biology applied to de novo pathway discovery and in systems biology to discover underground metabolism due to enzyme promiscuity. Here, we provide for the first time a complete set containing >400 000 stereochemistry-aware reaction rules extracted from public databases and expressed in the community-standard SMARTS (SMIRKS) format, augmented by a rule representation at different levels of specificity (the atomic environment around the reaction center). Such numerous representations of reactions expand natural chemical diversity by predicting de novo reactions of promiscuous enzymes.
Affiliation(s)
- Thomas Duigou
- Micalis Institute, INRA, AgroParisTech, Université Paris-Saclay, 78350 Jouy-en-Josas, France
| | - Melchior du Lac
- Micalis Institute, INRA, AgroParisTech, Université Paris-Saclay, 78350 Jouy-en-Josas, France
| | - Pablo Carbonell
- SYNBIOCHEM Centre, Manchester Institute of Biotechnology, University of Manchester, Manchester M1 7DN, UK
| | - Jean-Loup Faulon
- Micalis Institute, INRA, AgroParisTech, Université Paris-Saclay, 78350 Jouy-en-Josas, France; SYNBIOCHEM Centre, Manchester Institute of Biotechnology, University of Manchester, Manchester M1 7DN, UK; CNRS-UMR8030/Laboratoire iSSB, Université Paris-Saclay, Évry 91000, France
| |
|
21
|
Nicolaou CA, Watson IA, LeMasters M, Masquelin T, Wang J. Context Aware Data-Driven Retrosynthetic Analysis. J Chem Inf Model 2020; 60:2728-2738. [DOI: 10.1021/acs.jcim.9b01141] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 12/20/2022]
Affiliation(s)
- Christos A. Nicolaou
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Ian A. Watson
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Mark LeMasters
- Research Chemistry IT, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Thierry Masquelin
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Jibo Wang
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| |
|
22
|
Rudik AV, Dmitriev AV, Lagunin AA, Filimonov DA, Poroikov VV. PASS-based prediction of metabolites detection in biological systems. SAR QSAR Environ Res 2019; 30:751-758. [PMID: 31542944 DOI: 10.1080/1062936x.2019.1665099] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/08/2019] [Accepted: 09/04/2019] [Indexed: 06/10/2023]
Abstract
Metabolite identification is an essential part of the drug discovery and development process. Experimental methods allow identifying metabolites and estimating their relative amount, but they require cost-intensive and time-consuming techniques. Computational methods for metabolite prediction are devoid of these shortcomings and may be applied at the early stage of drug discovery. In this study, we investigated the possibility of creating SAR models for the prediction of the qualitative metabolite yield ("major", "minor", "trace" and "negligible") depending on species and biological experimental systems. In addition, we have created models for prediction of xenobiotic excretion depending on its administration route for different species. The prediction is based on a naïve Bayes classifier algorithm implemented in PASS software. The average accuracy of prediction was 0.91 for qualitative metabolite yield prediction and 0.89 for prediction of xenobiotic excretion. The created models were included as a component of the MetaTox web application, which allows predicting the xenobiotic metabolism pathways (http://www.way2drug.com/mg).
Affiliation(s)
- A V Rudik
- Department for Bioinformatics, Institute of Biomedical Chemistry (IBMC), Moscow, Russia
| | - A V Dmitriev
- Department for Bioinformatics, Institute of Biomedical Chemistry (IBMC), Moscow, Russia
| | - A A Lagunin
- Department for Bioinformatics, Institute of Biomedical Chemistry (IBMC), Moscow, Russia
- Medico-biological Faculty, Pirogov Russian National Research Medical University, Moscow, Russia
| | - D A Filimonov
- Department for Bioinformatics, Institute of Biomedical Chemistry (IBMC), Moscow, Russia
| | - V V Poroikov
- Department for Bioinformatics, Institute of Biomedical Chemistry (IBMC), Moscow, Russia
| |
|
23
|
Watson IA, Wang J, Nicolaou CA. A retrosynthetic analysis algorithm implementation. J Cheminform 2019; 11:1. [PMID: 30604073 PMCID: PMC6689887 DOI: 10.1186/s13321-018-0323-6] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 09/15/2018] [Accepted: 12/20/2018] [Indexed: 11/30/2022] Open
Abstract
The need for synthetic route design arises frequently in discovery-oriented chemistry organizations. While traditionally finding solutions to this problem has been the domain of human experts, several computational approaches, aided by algorithmic advances and the availability of large reaction collections, have recently been reported. Herein we present our own implementation of a retrosynthetic analysis method and demonstrate its capabilities in an attempt to identify synthetic routes for a collection of approved drugs. Our results indicate that the method, leveraging reaction transformation rules learned from a large patent reaction dataset, can identify multiple theoretically feasible synthetic routes and, thus, support research chemists' everyday efforts.
Affiliation(s)
- Ian A Watson
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, IN, 46285, USA
| | - Jibo Wang
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, IN, 46285, USA
| | - Christos A Nicolaou
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, IN, 46285, USA
| |
|