1. Guo J, Yu C, Li K, Zhang Y, Wang G, Li S, Dong H. Retrosynthesis Zero: Self-Improving Global Synthesis Planning Using Reinforcement Learning. J Chem Theory Comput 2024; 20:4921-4938. [PMID: 38747149 DOI: 10.1021/acs.jctc.4c00071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2024]
Abstract
The field of computer-aided synthesis planning (CASP) has witnessed significant growth in recent years. Still, many CASP programs rely on large data sets to train neural networks, resulting in limitations due to the data quality and prior knowledge from chemists. In response, we propose Retrosynthesis Zero (ReSynZ), a reaction template-based method that combines Monte Carlo Tree Search with reinforcement learning inspired by AlphaGo Zero. Unlike other single-step reaction template-based CASP methods, ReSynZ takes complete synthesis paths for complex molecules, determined by reaction rules, as input for training the neural network. ReSynZ enables neural networks trained with relatively small reaction data sets (tens of thousands of data) to generate multiple synthesis pathways for a target molecule and suggest possible reaction conditions. On multiple data sets of molecular retrosynthesis, ReSynZ demonstrates excellent predictive performance compared to existing algorithms. The advantages, such as self-improving model features, flexible reward settings, the potential to surpass human limitations in chemical synthesis route planning, and others, make ReSynZ a valuable tool in chemical synthesis design.
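The abstract above describes combining Monte Carlo Tree Search with AlphaGo Zero-style reinforcement learning over reaction templates. As a rough illustration of that family of methods (not the ReSynZ implementation), the sketch below shows a PUCT-style selection step in which a policy prior over templates is balanced against the running value estimate. The names `Node`, `puct_select`, `c_puct`, and the template labels are hypothetical, and the `+ 1` under the square root is a common variant assumed here so that priors matter before any visit.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One search-tree node; `prior` would come from a policy network."""
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def q(self) -> float:
        # Mean value of the simulations that passed through this node.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(node: Node, c_puct: float = 1.5) -> str:
    """Return the key of the child maximizing Q + U (PUCT-style rule)."""
    total = sum(child.visits for child in node.children.values())

    def score(item):
        _, child = item
        u = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + u

    return max(node.children.items(), key=score)[0]


# Hypothetical example: two candidate reaction templates at the root.
root = Node(prior=1.0)
root.children = {"template_A": Node(prior=0.7), "template_B": Node(prior=0.3)}
first_pick = puct_select(root)

# After template_A has been explored with poor returns, the rule shifts.
root.children["template_A"].visits = 10
root.children["template_A"].value_sum = 1.0
second_pick = puct_select(root)
```

With no visits the higher-prior template is chosen; once it has been visited often with low value, the exploration term favors the unvisited alternative.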
Affiliation(s)
- Jiasheng Guo
  - Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China
- Chenning Yu
  - Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China
- Kenan Li
  - Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China
- Yijian Zhang
  - Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China
- Guoqiang Wang
  - School of Chemistry and Chemical Engineering, Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, Nanjing University, Nanjing 210023, China
- Shuhua Li
  - School of Chemistry and Chemical Engineering, Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, Nanjing University, Nanjing 210023, China
- Hao Dong
  - Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China
  - State Key Laboratory of Analytical Chemistry for Life Science, Chemistry and Biomedicine Innovation Center (ChemBIC), Institute for Brain Sciences, Nanjing University, Nanjing 210023, China
2. Fromer JC, Coley CW. An algorithmic framework for synthetic cost-aware decision making in molecular design. Nature Computational Science 2024; 4:440-450. [PMID: 38886590 DOI: 10.1038/s43588-024-00639-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 05/07/2024] [Indexed: 06/20/2024]
Abstract
Small molecules exhibiting desirable property profiles are often discovered through an iterative process of designing, synthesizing and testing sets of molecules. The selection of molecules to synthesize from all possible candidates is a complex decision-making process that typically relies on expert chemist intuition. Here we propose a quantitative decision-making framework, SPARROW, that prioritizes molecules for evaluation by balancing expected information gain and synthetic cost. SPARROW integrates molecular design, property prediction and retrosynthetic planning to balance the utility of testing a molecule with the cost of batch synthesis. We demonstrate, through three case studies, that the developed algorithm captures the non-additive costs inherent to batch synthesis, leverages common reaction steps and intermediates, and scales to hundreds of molecules.
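The abstract above notes that batch synthesis costs are non-additive because routes share reaction steps and intermediates. A toy sketch of that idea follows; it is an exhaustive search over a handful of candidates, not SPARROW's actual formulation, and every name and number in it (`batch_cost`, `best_batch`, the utilities, the step costs) is a made-up assumption for illustration.

```python
from itertools import combinations


def batch_cost(selected, routes, step_cost):
    """Cost of a batch: each distinct reaction step is paid for only once."""
    steps = set()
    for mol in selected:
        steps |= routes[mol]
    return sum(step_cost[s] for s in steps)


def best_batch(utility, routes, step_cost, weight=1.0):
    """Brute-force the subset maximizing total utility minus weighted cost."""
    mols = sorted(utility)
    best, best_score = (), 0.0  # the empty batch scores zero
    for r in range(1, len(mols) + 1):
        for subset in combinations(mols, r):
            gain = sum(utility[m] for m in subset)
            score = gain - weight * batch_cost(subset, routes, step_cost)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score


# Hypothetical data: A and B share the expensive step r1; C stands alone.
utility = {"A": 5, "B": 5, "C": 4}
routes = {"A": {"r1", "r2"}, "B": {"r1", "r3"}, "C": {"r4"}}
step_cost = {"r1": 4, "r2": 2, "r3": 2, "r4": 5}

chosen, score = best_batch(utility, routes, step_cost)
```

In this example no single molecule is worth making on its own, but A and B together are, because the shared step r1 is counted once. Exhaustive search only works for a few candidates; scaling to hundreds of molecules, as the abstract reports, would need a proper optimization formulation.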
Affiliation(s)
- Jenna C Fromer
  - Department of Chemical Engineering, MIT, Cambridge, MA, USA
- Connor W Coley
  - Department of Chemical Engineering, MIT, Cambridge, MA, USA
  - Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
3. Saigiridharan L, Hassen AK, Lai H, Torren-Peraire P, Engkvist O, Genheden S. AiZynthFinder 4.0: developments based on learnings from 3 years of industrial application. J Cheminform 2024; 16:57. [PMID: 38778382 PMCID: PMC11112899 DOI: 10.1186/s13321-024-00860-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 05/15/2024] [Indexed: 05/25/2024] Open
Abstract
We present an updated overview of the AiZynthFinder package for retrosynthesis planning. Since the first version was released in 2020, we have added a substantial number of new features based on user feedback. Feature enhancements include policies for filter reactions, support for any one-step retrosynthesis model, a scoring framework and several additional search algorithms. To exemplify the typical use-cases of the software and highlight some learnings, we perform a large-scale analysis on several hundred thousand target molecules from diverse sources. This analysis looks at for instance route shape, stock usage and exploitation of reaction space, and points out strengths and weaknesses of our retrosynthesis approach. The software is released as open-source for educational purposes as well as to provide a reference implementation of the core algorithms for synthesis prediction. We hope that releasing the software as open-source will further facilitate innovation in developing novel methods for synthetic route prediction. AiZynthFinder is a fast, robust and extensible open-source software and can be downloaded from https://github.com/MolecularAI/aizynthfinder .
Affiliation(s)
- Alan Kai Hassen
  - Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
- Helen Lai
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Paula Torren-Peraire
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Zentrum München, Neuherberg, Germany
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Samuel Genheden
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
4. Strieth-Kalthoff F, Szymkuć S, Molga K, Aspuru-Guzik A, Glorius F, Grzybowski BA. Artificial Intelligence for Retrosynthetic Planning Needs Both Data and Expert Knowledge. J Am Chem Soc 2024. [PMID: 38598363 DOI: 10.1021/jacs.4c00338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2024]
Abstract
Rapid advancements in artificial intelligence (AI) have enabled breakthroughs across many scientific disciplines. In organic chemistry, the challenge of planning complex multistep chemical syntheses should conceptually be well-suited for AI. Yet, the development of AI synthesis planners trained solely on reaction-example-data has stagnated and is not on par with the performance of "hybrid" algorithms combining AI with expert knowledge. This Perspective examines possible causes of these shortcomings, extending beyond the established reasoning of insufficient quantities of reaction data. Drawing attention to the intricacies and data biases that are specific to the domain of synthetic chemistry, we advocate augmenting the unique capabilities of AI with the knowledge base and the reasoning strategies of domain experts. By actively involving synthetic chemists, who are the end users of any synthesis planning software, into the development process, we envision to bridge the gap between computer algorithms and the intricate nature of chemical synthesis.
Affiliation(s)
- Felix Strieth-Kalthoff
  - University of Toronto, Department of Chemistry and Department of Computer Science, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
  - University of Toronto, Department of Computer Science, 10 King's College Road, Toronto, Ontario M5S 3G4, Canada
- Sara Szymkuć
  - Allchemy, 2145 45th Street #201, Highland, Indiana 46322, United States
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
- Karol Molga
  - Allchemy, 2145 45th Street #201, Highland, Indiana 46322, United States
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
- Alán Aspuru-Guzik
  - University of Toronto, Department of Chemistry and Department of Computer Science, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
  - University of Toronto, Department of Computer Science, 10 King's College Road, Toronto, Ontario M5S 3G4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave., Toronto, Ontario M5G 1M1, Canada
  - University of Toronto, Department of Chemical Engineering and Applied Chemistry, 200 College St., Toronto, Ontario M5S 3E5, Canada
  - University of Toronto, Department of Materials Science and Engineering, 184 College St., Toronto, Ontario M5S 3E4, Canada
- Frank Glorius
  - Universität Münster, Organisch-Chemisches Institut, Corrensstr. 36, 48149 Münster, Germany
- Bartosz A Grzybowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
  - IBS Center for Algorithmic and Robotized Synthesis, CARS, UNIST 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan 689-798, South Korea
  - Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan 689-798, South Korea
5. Zhang B, Lin J, Du L, Zhang L. Harnessing Data Augmentation and Normalization Preprocessing to Improve the Performance of Chemical Reaction Predictions of Data-Driven Model. Polymers (Basel) 2023; 15:2224. [PMID: 37177370 PMCID: PMC10180765 DOI: 10.3390/polym15092224] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/31/2023] [Revised: 05/03/2023] [Accepted: 05/03/2023] [Indexed: 05/15/2023] Open
Abstract
As a template-free, data-driven methodology, the molecular transformer model provides an alternative by which to predict the outcome of chemical reactions and design the route of the retrosynthetic plane in the field of organic synthesis and polymer chemistry. However, in consideration of the small datasets of chemical reactions, the data-driven model suffers from the difficulty of low accuracy in the prediction tasks of chemical reactions. In this contribution, we integrate the molecular transformer model with the strategies of data augmentation and normalization preprocessing to accomplish the three tasks of chemical reactions, including the forward predictions of chemical reactions, and single-step retrosynthetic predictions with and without the reaction classes. It is clearly demonstrated that the prediction accuracy of the molecular transformer model can be significantly raised by the use of proposed strategies for the three tasks of chemical reactions. Notably, after the introduction of the 40-level data augmentation and normalization preprocessing, the top-1 accuracy of the forward prediction increases markedly from 71.6% to 84.2% and the top-1 accuracy of the single-step retrosynthetic prediction with additional reaction class increases from 53.2% to 63.4%. Furthermore, it is found that the superior performance of the data-driven model originates from the correction of the grammatical errors of the SMILES strings, especially for the case of the reaction classes with small datasets.
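The abstract above reports results as top-1 accuracy. For readers unfamiliar with the metric, here is a minimal sketch of how top-k accuracy is typically computed from ranked predictions. The function name `top_k_accuracy` and the example SMILES strings are illustrative assumptions, and the sketch compares strings exactly, so it presumes predictions and references were already canonicalized in the same way.

```python
def top_k_accuracy(ranked_predictions, references, k=1):
    """Fraction of cases whose reference appears among the first k predictions."""
    if len(ranked_predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    hits = sum(
        1
        for preds, ref in zip(ranked_predictions, references)
        if ref in preds[:k]
    )
    return hits / len(references)


# Hypothetical ranked outputs for three reactions (best guess first).
ranked = [["CCO", "CCN"], ["CC", "CO"], ["C=O", "CO"]]
truth = ["CCO", "CO", "CN"]

top1 = top_k_accuracy(ranked, truth, k=1)
top2 = top_k_accuracy(ranked, truth, k=2)
```

Here only the first case is correct at rank 1, and the second is recovered at rank 2, so the two scores differ.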
Affiliation(s)
- Boyu Zhang
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Jiaping Lin
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Lei Du
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Liangshun Zhang
  - Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
6. Grzybowski BA, Badowski T, Molga K, Szymkuć S. Network search algorithms and scoring functions for advanced-level computerized synthesis planning. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1630] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/11/2022]
Affiliation(s)
- Bartosz A. Grzybowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
  - Center for Soft and Living Matter, Institute for Basic Science (IBS), Ulsan, Republic of Korea
  - Department of Chemistry, Ulsan National Institute of Science and Technology (UNIST), Ulsan, Republic of Korea
- Tomasz Badowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Karol Molga
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Sara Szymkuć
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
7. Urbina F, Lowden CT, Culberson JC, Ekins S. MegaSyn: Integrating Generative Molecular Design, Automated Analog Designer, and Synthetic Viability Prediction. ACS Omega 2022; 7:18699-18713. [PMID: 35694522 PMCID: PMC9178760 DOI: 10.1021/acsomega.2c01404] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/08/2022] [Accepted: 05/11/2022] [Indexed: 05/04/2023]
Abstract
Generative machine learning models have become widely adopted in drug discovery and other fields to produce new molecules and explore molecular space, with the goal of discovering novel compounds with optimized properties. These generative models are frequently combined with transfer learning or scoring of the physicochemical properties to steer generative design, yet often, they are not capable of addressing a wide variety of potential problems, as well as converge into similar molecular space when combined with a scoring function for the desired properties. In addition, these generated compounds may not be synthetically feasible, reducing their capabilities and limiting their usefulness in real-world scenarios. Here, we introduce a suite of automated tools called MegaSyn representing three components: a new hill-climb algorithm, which makes use of SMILES-based recurrent neural network (RNN) generative models, analog generation software, and retrosynthetic analysis coupled with fragment analysis to score molecules for their synthetic feasibility. We show that by deconstructing the targeted molecules and focusing on substructures, combined with an ensemble of generative models, MegaSyn generally performs well for the specific tasks of generating new scaffolds as well as targeted analogs, which are likely synthesizable and druglike. We now describe the development, benchmarking, and testing of this suite of tools and propose how they might be used to optimize molecules or prioritize promising lead compounds using these RNN examples provided by multiple test case examples.
Affiliation(s)
- Fabio Urbina
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Christopher T. Lowden
  - Workflow Informatics Corporation, 9316 Bramden Court, Wake Forest, North Carolina 27587, United States
- J. Christopher Culberson
  - Workflow Informatics Corporation, 9316 Bramden Court, Wake Forest, North Carolina 27587, United States
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
8. Venkatasubramanian V, Mann V. Artificial intelligence in reaction prediction and chemical synthesis. Curr Opin Chem Eng 2022. [DOI: 10.1016/j.coche.2021.100749] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 12/20/2022]
9. Jia P, Pei J, Wang G, Pan X, Zhu Y, Wu Y, Ouyang L. The roles of computer-aided drug synthesis in drug development. Green Synthesis and Catalysis 2021. [DOI: 10.1016/j.gresc.2021.11.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 02/08/2023] Open
10. Dong J, Zhao M, Liu Y, Su Y, Zeng X. Deep learning in retrosynthesis planning: datasets, models and tools. Brief Bioinform 2021; 23:6375056. [PMID: 34571535 DOI: 10.1093/bib/bbab391] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 07/06/2021] [Revised: 08/16/2021] [Accepted: 08/30/2021] [Indexed: 12/29/2022] Open
Abstract
In recent years, synthesizing drugs powered by artificial intelligence has brought great convenience to society. Since retrosynthetic analysis occupies an essential position in synthetic chemistry, it has received broad attention from researchers. In this review, we comprehensively summarize the development process of retrosynthesis in the context of deep learning. This review covers all aspects of retrosynthesis, including datasets, models and tools. Specifically, we report representative models from academia, in addition to a detailed description of the available and stable platforms in the industry. We also discuss the disadvantages of the existing models and provide potential future trends, so that more abecedarians will quickly understand and participate in the family of retrosynthesis planning.
Affiliation(s)
- Jingxin Dong
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
- Mingyi Zhao
  - Department of Pediatrics, Third Xiangya Hospital, Central South University, 400013, Hunan, China
- Yuansheng Liu
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
- Yansen Su
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 230601, Hefei, China
- Xiangxiang Zeng
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
11. Genheden S, Engkvist O, Bjerrum E. Clustering of Synthetic Routes Using Tree Edit Distance. J Chem Inf Model 2021; 61:3899-3907. [PMID: 34342428 DOI: 10.1021/acs.jcim.1c00232] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Abstract
We present a novel algorithm to compute the distance between synthetic routes based on tree edit distances. Such distances can be used to cluster synthesis routes generated using a retrosynthesis prediction tool. We show that the clustering of selected routes from a retrosynthesis analysis is performed in less than 10 s on average and only constitutes seven percent of the total time (prediction + clustering). Furthermore, we are able to show that representative routes from each cluster can be used to reduce the set of predicted routes. Finally, we show with a number of examples that the algorithm gives intuitive clusters that can be easily rationalized and that the routes in a cluster tend to use similar chemistry. The algorithm is included in the latest version of open-source AiZynthFinder software (https://github.com/MolecularAI/aizynthfinder) and as a separate package (https://github.com/MolecularAI/route-distances).
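The abstract above describes clustering predicted routes once pairwise route distances are available. As a simplified stand-in (computing tree edit distances is out of scope here, and the paper's actual clustering method is not stated in this listing), the sketch below groups routes by single-linkage thresholding over a precomputed distance matrix using union-find. The function name `cluster_by_threshold`, the matrix values, and the threshold are all hypothetical.

```python
def cluster_by_threshold(dist, threshold):
    """Group items whose pairwise distance is <= threshold (single linkage)."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster representative, compressing paths.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical symmetric distances between four predicted routes.
route_dist = [
    [0, 1, 6, 7],
    [1, 0, 5, 6],
    [6, 5, 0, 2],
    [7, 6, 2, 0],
]
clusters = cluster_by_threshold(route_dist, threshold=2)
```

One representative per cluster (for instance the first index in each group) could then stand in for the whole cluster, in the spirit of the route-set reduction the abstract mentions.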
Affiliation(s)
- Samuel Genheden
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca Gothenburg, SE-431 83 Mölndal, Sweden
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca Gothenburg, SE-431 83 Mölndal, Sweden
- Esben Bjerrum
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca Gothenburg, SE-431 83 Mölndal, Sweden
12. Moskal M, Beker W, Szymkuć S, Grzybowski BA. Scaffold-Directed Face Selectivity Machine-Learned from Vectors of Non-covalent Interactions. Angew Chem Int Ed Engl 2021. [DOI: 10.1002/ange.202101986] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 02/05/2023]
Affiliation(s)
- Martyna Moskal
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
- Wiktor Beker
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
- Sara Szymkuć
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
- Bartosz A. Grzybowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
  - IBS Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea
13. Moskal M, Beker W, Szymkuć S, Grzybowski BA. Scaffold-Directed Face Selectivity Machine-Learned from Vectors of Non-covalent Interactions. Angew Chem Int Ed Engl 2021; 60:15230-15235. [PMID: 33876554 DOI: 10.1002/anie.202101986] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/08/2021] [Revised: 03/29/2021] [Indexed: 11/06/2022]
Abstract
This work describes a method to vectorize and Machine-Learn, ML, non-covalent interactions responsible for scaffold-directed reactions important in synthetic chemistry. Models trained on this representation predict correct face of approach in ca. 90 % of Michael additions or Diels-Alder cycloadditions. These accuracies are significantly higher than those based on traditional ML descriptors, energetic calculations, or intuition of experienced synthetic chemists. Our results also emphasize the importance of ML models being provided with relevant mechanistic knowledge; without such knowledge, these models cannot easily "transfer-learn" and extrapolate to previously unseen reaction mechanisms.
Affiliation(s)
- Martyna Moskal
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
- Wiktor Beker
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
- Sara Szymkuć
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
- Bartosz A Grzybowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, Ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
  - Allchemy, Inc., Highland, IN, USA
  - IBS Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea
14. Matsubara S. Digitization of Organic Synthesis — How Synthetic Organic Chemists Use AI Technology —. Chem Lett 2021. [DOI: 10.1246/cl.200802] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Affiliation(s)
- Seijiro Matsubara
  - Department of Material Chemistry, Graduate School of Engineering, Kyoto University, Kyodai-Katsura, Nishikyo, Nishikyo-ku, Kyoto 615-8501, Japan
15. Molga K, Szymkuć S, Grzybowski BA. Chemist Ex Machina: Advanced Synthesis Planning by Computers. Acc Chem Res 2021; 54:1094-1106. [PMID: 33423460 DOI: 10.1021/acs.accounts.0c00714] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 01/05/2023]
Abstract
Teaching computers to plan multistep syntheses of arbitrary target molecules, including natural products, has been one of the oldest challenges in chemistry, dating back to the 1960s. This Account recapitulates two decades of our group's work on the software platform called Chematica, which very recently achieved this long-sought objective and has been shown capable of planning synthetic routes to complex natural products, several of which were validated in the laboratory. For the machine to plan syntheses at an expert level, it must know the rules describing chemical reactions and use these rules to expand and search the networks of synthetic options. The rules must be of high quality: They must delineate accurately the scope of admissible substituents, capture all relevant stereochemical information, detect potential reactivity conflicts, and protection requirements. They should yield only those synthons that are chemically stable and energetically allowed (e.g., not too strained) and should be able to extrapolate beyond examples already published in the literature. In parallel, the network-search algorithms must be able to assign meaningful scores to the sets of synthons they encounter, make judicious choices which of the network's branches to expand, and when to withdraw from unpromising ones. They must be able to strategize over multiple steps to resolve intermittent reactivity conflicts, exchange functional groups, or overcome local maxima of molecular complexity. Meeting all these requirements makes the problem of computer-driven retrosynthesis very multifaceted, combining expert and AI approaches further supplemented by quantum-mechanical and molecular-mechanics calculations. Development of Chematica has been a very long and gradual process because all these components are needed. Any shortcuts (for example, reliance on only expert or only data-based approaches) yield chemically naïve and often erroneous syntheses, especially for complex targets. On the bright side, once all the requisite algorithms are implemented (as they now are), they not only streamline conventional synthetic planning but also enable completely new modalities that would challenge any human chemist, for example, synthesis with multiple constraints imposed simultaneously or library-wide syntheses in which the machine constructs "global plans" leading to multiple targets and benefiting from the use of common intermediates. These types of analyses will have profound impact on the practice of chemical industry, designing more economical, more green, and less hazardous pathways.
Affiliation(s)
- Karol Molga
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
- Sara Szymkuć
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
- Bartosz A. Grzybowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224, Warsaw, Poland
  - Center for Soft and Living Matter, Institute for Basic Science (IBS), Ulsan 44919, Republic of Korea
  - Department of Chemistry, Ulsan National Institute of Science and Technology (UNIST), 50 UNIST-gil, Ulsan 44919, Republic of Korea
16. Blau SM, Patel HD, Spotte-Smith EWC, Xie X, Dwaraknath S, Persson KA. A chemically consistent graph architecture for massive reaction networks applied to solid-electrolyte interphase formation. Chem Sci 2021; 12:4931-4939. [PMID: 34163740 PMCID: PMC8179555 DOI: 10.1039/d0sc05647b] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 10/13/2020] [Accepted: 02/23/2021] [Indexed: 01/09/2023] Open
Abstract
Modeling reactivity with chemical reaction networks could yield fundamental mechanistic understanding that would expedite the development of processes and technologies for energy storage, medicine, catalysis, and more. Thus far, reaction networks have been limited in size by chemically inconsistent graph representations of multi-reactant reactions (e.g. A + B → C) that cannot enforce stoichiometric constraints, precluding the use of optimized shortest-path algorithms. Here, we report a chemically consistent graph architecture that overcomes these limitations using a novel multi-reactant representation and iterative cost-solving procedure. Our approach enables the identification of all low-cost pathways to desired products in massive reaction networks containing reactions of any stoichiometry, allowing for the investigation of vastly more complex systems than previously possible. Leveraging our architecture, we construct the first ever electrochemical reaction network from first-principles thermodynamic calculations to describe the formation of the Li-ion solid electrolyte interphase (SEI), which is critical for passivation of the negative electrode. Using this network comprised of nearly 6000 species and 4.5 million reactions, we interrogate the formation of a key SEI component, lithium ethylene dicarbonate. We automatically identify previously proposed mechanisms as well as multiple novel pathways containing counter-intuitive reactions that have not, to our knowledge, been reported in the literature. We envision that our framework and data-driven methodology will facilitate efforts to engineer the composition-related properties of the SEI - or of any complex chemical process - through selective control of reactivity.
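The abstract above points out that multi-reactant reactions such as A + B → C do not fit ordinary graph shortest-path searches, because reaching C requires every reactant to be available. The sketch below is a generic fixed-point relaxation over such reactions, offered only to illustrate the difficulty; it is not the authors' graph architecture or cost-solving procedure. It assumes non-negative reaction costs, and the function name `species_costs` and the example network are hypothetical.

```python
import math


def species_costs(reactions, starting):
    """Cheapest known cost to reach each species from the starting set.

    `reactions` is a list of (reactants, products, cost) tuples. A reaction
    can fire only once all of its reactants have a finite cost.
    """
    cost = {s: 0.0 for s in starting}
    changed = True
    while changed:  # repeat until no estimate improves (fixed point)
        changed = False
        for reactants, products, rxn_cost in reactions:
            if all(r in cost for r in reactants):
                total = rxn_cost + sum(cost[r] for r in reactants)
                for p in products:
                    if total < cost.get(p, math.inf):
                        cost[p] = total
                        changed = True
    return cost


# Hypothetical network: C can form directly from A (expensive)
# or from A + B, where B must itself first be made from A.
reactions = [
    (("A",), ("B",), 1.0),
    (("A", "B"), ("C",), 2.0),
    (("A",), ("C",), 5.0),
]
costs = species_costs(reactions, starting={"A"})
```

Summing reactant costs can count a shared precursor more than once, which is one reason a chemically consistent treatment of stoichiometry needs more care than this sketch provides.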
Affiliation(s)
- Samuel M Blau
  - Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Hetal D Patel
  - Department of Materials Science and Engineering, University of California, Berkeley, CA 94720, USA
  - Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Evan Walter Clark Spotte-Smith
  - Department of Materials Science and Engineering, University of California, Berkeley, CA 94720, USA
  - Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Xiaowei Xie
  - Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  - College of Chemistry, University of California, Berkeley, CA 94720, USA
- Shyam Dwaraknath
  - Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Kristin A Persson
  - Department of Materials Science and Engineering, University of California, Berkeley, CA 94720, USA
  - Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
17. Orlova Y, Gambardella AA, Kryven I, Keune K, Iedema PD. Generative Algorithm for Molecular Graphs Uncovers Products of Oil Oxidation. J Chem Inf Model 2021; 61:1457-1469. [PMID: 33615781 PMCID: PMC7988456 DOI: 10.1021/acs.jcim.0c01163] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022]
Abstract
![]()
The autoxidation of triglyceride (or triacylglycerol, TAG) is a poorly understood complex system. It is known from mass spectrometry measurements that, although initiated by a single molecule, this system involves an abundance of intermediate species and a complex network of reactions. For this reason, the attribution of the mass peaks to exact molecular structures is difficult without additional information about the system. We provide such information using a graph theory-based algorithm. Our algorithm performs an automatic discovery of the chemical reaction network that is responsible for the complexity of the mass spectra in drying oils. This knowledge is then applied to match experimentally measured mass spectra with computationally predicted molecular graphs. We demonstrate this methodology on the autoxidation of triolein as measured by electrospray ionization-mass spectrometry (ESI-MS). Our protocol can be readily applied to investigate other oils and their mixtures.
Affiliation(s)
- Yuliia Orlova
  - Van't Hoff Institute for Molecular Sciences, University of Amsterdam, Amsterdam 1098 XH, The Netherlands
- Ivan Kryven
  - Mathematical Institute, Utrecht University, Utrecht 3584 CD, The Netherlands
  - Centre for Complex Systems Studies, Utrecht 3584 CE, The Netherlands
- Piet D Iedema
  - Van't Hoff Institute for Molecular Sciences, University of Amsterdam, Amsterdam 1098 XH, The Netherlands

18
Wang Z, Zhao W, Hao G, Song B. Mapping the resources and approaches facilitating computer-aided synthesis planning. Org Chem Front 2021. [DOI: 10.1039/d0qo00946f] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/21/2023]
Abstract
Computer-aided synthesis planning could facilitate organic synthesis study and relieve chemists of manual tasks. Artificial intelligence and deep learning would be useful for the development of computer-aided synthesis planning.
Affiliation(s)
- Zheng Wang
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University
- Wei Zhao
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University
- Gefei Hao
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University
- Baoan Song
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University

19
Kell DB, Samanta S, Swainston N. Deep learning and generative methods in cheminformatics and chemical biology: navigating small molecule space intelligently. Biochem J 2020; 477:4559-4580. [PMID: 33290527 PMCID: PMC7733676 DOI: 10.1042/bcj20200781] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 10/02/2020] [Revised: 11/11/2020] [Accepted: 11/12/2020] [Indexed: 12/15/2022]
Abstract
The number of 'small' molecules that may be of interest to chemical biologists - chemical space - is enormous, but the fraction that have ever been made is tiny. Most strategies are discriminative, i.e. have involved 'forward' problems (have molecule, establish properties). However, we normally wish to solve the much harder generative or inverse problem (describe desired properties, find molecule). 'Deep' (machine) learning based on large-scale neural networks underpins technologies such as computer vision, natural language processing, driverless cars, and world-leading performance in games such as Go; it can also be applied to the solution of inverse problems in chemical biology. In particular, recent developments in deep learning admit the in silico generation of candidate molecular structures and the prediction of their properties, thereby allowing one to navigate (bio)chemical space intelligently. These methods are revolutionary but require an understanding of both (bio)chemistry and computer science to be exploited to best advantage. We give a high-level (non-mathematical) background to the deep learning revolution, and set out the crucial issue for chemical biology and informatics as a two-way mapping from the discrete nature of individual molecules to the continuous but high-dimensional latent representation that may best reflect chemical space. A variety of architectures can do this; we focus on a particular type known as variational autoencoders. We then provide some examples of recent successes of these kinds of approach, and a look towards the future.
Affiliation(s)
- Douglas B. Kell
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, U.K.
  - Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
- Soumitra Samanta
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, U.K.
- Neil Swainston
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, U.K.

20
Mo Y, Guan Y, Verma P, Guo J, Fortunato ME, Lu Z, Coley CW, Jensen KF. Evaluating and clustering retrosynthesis pathways with learned strategy. Chem Sci 2020; 12:1469-1478. [PMID: 34163910 PMCID: PMC8179211 DOI: 10.1039/d0sc05078d] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 01/09/2023] [Open Access]
Abstract
With recent advances in the computer-aided synthesis planning (CASP) powered by data science and machine learning, modern CASP programs can rapidly identify thousands of potential pathways for a given target molecule. However, the lack of a holistic pathway evaluation mechanism makes it challenging to systematically prioritize strategic pathways except for using some simple heuristics. Herein, we introduce a data-driven approach to evaluate the relative strategic levels of retrosynthesis pathways using a dynamic tree-structured long short-term memory (tree-LSTM) model. We first curated a retrosynthesis pathway database, containing 238k patent-extracted pathways along with ∼55 M artificial pathways generated from an open-source CASP program, ASKCOS. The tree-LSTM model was trained to differentiate patent-extracted and artificial pathways with the same target molecule in order to learn the strategic relationship among single-step reactions within the patent-extracted pathways. The model achieved a top-1 ranking accuracy of 79.1% to recognize patent-extracted pathways. In addition, the trained tree-LSTM model learned to encode pathway-level information into a representative latent vector, which can facilitate clustering similar pathways to help illustrate strategically diverse pathways generated from CASP programs.
Tree-structured long short-term memory neural model learns to understand the retrosynthesis design strategies from patent-extracted retrosynthetic pathway data.
Affiliation(s)
- Yiming Mo
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
  - College of Chemical and Biological Engineering, Zhejiang University Hangzhou Zhejiang Province 310007 China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center Hangzhou Zhejiang Province 311215 China
- Yanfei Guan
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
- Pritha Verma
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
- Jiang Guo
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
- Mike E Fortunato
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
- Zhaohong Lu
  - Department of Chemistry, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
- Connor W Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
- Klavs F Jensen
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA

21
Jethava KP, Fine J, Chen Y, Hossain A, Chopra G. Accelerated Reactivity Mechanism and Interpretable Machine Learning Model of N-Sulfonylimines toward Fast Multicomponent Reactions. Org Lett 2020; 22:8480-8486. [PMID: 33074678 DOI: 10.1021/acs.orglett.0c03083] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 11/28/2022]
Abstract
We introduce chemical reactivity flowcharts to help chemists interpret reaction outcomes using statistically robust machine learning models trained on a small number of reactions. We developed fast N-sulfonylimine multicomponent reactions for understanding reactivity and to generate training data. Accelerated reactivity mechanisms were investigated using density functional theory. Intuitive chemical features learned by the model accurately predicted heterogeneous reactivity of N-sulfonylimine with different carboxylic acids. Validation of the predictions shows that reaction outcome interpretation is useful for human chemists.

22
Computational planning of the synthesis of complex natural products. Nature 2020; 588:83-88. [PMID: 33049755 DOI: 10.1038/s41586-020-2855-y] [Citation(s) in RCA: 88] [Impact Index Per Article: 22.0] [Received: 07/25/2020] [Accepted: 10/06/2020] [Indexed: 12/27/2022]
Abstract
Training algorithms to computationally plan multistep organic syntheses has been a challenge for more than 50 years [1-7]. However, the field has progressed greatly since the development of early programs such as LHASA [1,7], for which reaction choices at each step were made by human operators. Multiple software platforms [6,8-14] are now capable of completely autonomous planning. But these programs 'think' only one step at a time and have so far been limited to relatively simple targets, the syntheses of which could arguably be designed by human chemists within minutes, without the help of a computer. Furthermore, no algorithm has yet been able to design plausible routes to complex natural products, for which much more far-sighted, multistep planning is necessary [15,16] and closely related literature precedents cannot be relied on. Here we demonstrate that such computational synthesis planning is possible, provided that the program's knowledge of organic chemistry and data-based artificial intelligence routines are augmented with causal relationships [17,18], allowing it to 'strategize' over multiple synthetic steps. Using a Turing-like test administered to synthesis experts, we show that the routes designed by such a program are largely indistinguishable from those designed by humans. We also successfully validated three computer-designed syntheses of natural products in the laboratory. Taken together, these results indicate that expert-level automated synthetic planning is feasible, pending continued improvements to the reaction knowledge base and further code optimization.

23
Wang X, Qian Y, Gao H, Coley CW, Mo Y, Barzilay R, Jensen KF. Towards efficient discovery of green synthetic pathways with Monte Carlo tree search and reinforcement learning. Chem Sci 2020; 11:10959-10972. [PMID: 34094345 PMCID: PMC8162445 DOI: 10.1039/d0sc04184j] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 07/30/2020] [Accepted: 09/11/2020] [Indexed: 12/25/2022] [Open Access]
Abstract
Computer aided synthesis planning of synthetic pathways with green process conditions has become of increasing importance in organic chemistry, but the large search space inherent in synthesis planning and the difficulty in predicting reaction conditions make it a significant challenge. We introduce a new Monte Carlo Tree Search (MCTS) variant that promotes balance between exploration and exploitation across the synthesis space. Together with a value network trained from reinforcement learning and a solvent-prediction neural network, our algorithm is comparable to the best MCTS variant (PUCT, similar to Google's Alpha Go) in finding valid synthesis pathways within a fixed searching time, and superior in identifying shorter routes with greener solvents under the same search conditions. In addition, with the same root compound visit count, our algorithm outperforms the PUCT MCTS by 16% in terms of determining successful routes. Overall the success rate is improved by 19.7% compared to the upper confidence bound applied to trees (UCT) MCTS method. Moreover, we improve 71.4% of the routes proposed by the PUCT MCTS variant in pathway length and choices of green solvents. The approach generally enables including Green Chemistry considerations in computer aided synthesis planning with potential applications in process development for fine chemicals or pharmaceuticals.
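The abstract above names the UCT and PUCT selection rules without stating them. As a rough illustration only, the sketch below uses the commonly cited textbook forms of both scores; it is not the new MCTS variant the paper introduces, and the function names, the exploration constant `c`, and the toy node statistics are all assumptions made for this example.

```python
import math

def uct_score(q, n_parent, n_child, c=1.4):
    """Textbook UCT: mean value plus a visit-count exploration bonus."""
    if n_child == 0:
        return math.inf  # unvisited children are always tried first
    return q + c * math.sqrt(math.log(n_parent) / n_child)

def puct_score(q, prior, n_parent, n_child, c=1.4):
    """AlphaGo-style PUCT: the exploration bonus is scaled by a policy prior."""
    return q + c * prior * math.sqrt(n_parent) / (1 + n_child)

# Hypothetical children of one search node:
# name -> (mean value, policy prior, visit count)
children = {
    "template_a": (0.60, 0.80, 20),
    "template_b": (0.40, 0.05, 2),
    "template_c": (0.55, 0.15, 8),
}
n_parent = sum(visits for _, _, visits in children.values())

best_uct = max(
    children,
    key=lambda k: uct_score(children[k][0], n_parent, children[k][2]),
)
best_puct = max(
    children,
    key=lambda k: puct_score(children[k][0], children[k][1], n_parent, children[k][2]),
)
print(best_uct, best_puct)
```

With these toy numbers, UCT selects the rarely visited `template_b`, while PUCT stays with `template_a` because its prior is high. The difference comes from how each rule weights exploration: UCT uses visit counts alone, and PUCT also uses the learned prior.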
Affiliation(s)
- Xiaoxue Wang
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
  - Department of Chemical and Biomolecular Engineering, The Ohio State University Columbus Ohio 43210 USA
- Yujie Qian
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
- Hanyu Gao
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
- Connor W Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
- Yiming Mo
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
- Regina Barzilay
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
- Klavs F Jensen
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA

24
Szymkuć S, Gajewska EP, Molga K, Wołos A, Roszak R, Beker W, Moskal M, Dittwald P, Grzybowski BA. Computer-generated "synthetic contingency" plans at times of logistics and supply problems: scenarios for hydroxychloroquine and remdesivir. Chem Sci 2020; 11:6736-6744. [PMID: 33033595 PMCID: PMC7500088 DOI: 10.1039/d0sc01799j] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/28/2020] [Accepted: 06/02/2020] [Indexed: 01/21/2023] [Open Access]
Abstract
A computer program for retrosynthetic planning helps develop multiple "synthetic contingency" plans for hydroxychloroquine and also routes leading to remdesivir, both promising but yet unproven medications against COVID-19. These plans are designed to navigate, as much as possible, around known and patented routes and to commence from inexpensive and diverse starting materials, so as to ensure supply in case of anticipated market shortages of commonly used substrates. Looking beyond the current COVID-19 pandemic, development of similar contingency syntheses is advocated for other already-approved medications, in case such medications become urgently needed in mass quantities to face other public-health emergencies.
Affiliation(s)
- Sara Szymkuć
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
- Ewa P Gajewska
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
- Karol Molga
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
- Agnieszka Wołos
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
- Rafał Roszak
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
- Wiktor Beker
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
- Martyna Moskal
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
- Piotr Dittwald
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
- Bartosz A Grzybowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 02-224, Poland
  - IBS Center for Soft and Living Matter, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, 689-798, South Korea
  - Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, 689-798, South Korea

25
Lin K, Xu Y, Pei J, Lai L. Automatic retrosynthetic route planning using template-free models. Chem Sci 2020; 11:3355-3364. [PMID: 34122843 PMCID: PMC8152431 DOI: 10.1039/c9sc03666k] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 07/24/2019] [Accepted: 03/02/2020] [Indexed: 01/05/2023] [Open Access]
Abstract
Retrosynthetic route planning can be considered a rule-based reasoning procedure. The possibilities for each transformation are generated based on collected reaction rules, and then potential reaction routes are recommended by various optimization algorithms. Although there has been much progress in computer-assisted retrosynthetic route planning and reaction prediction, fully data-driven automatic retrosynthetic route planning remains challenging. Here we present a template-free approach that is independent of reaction templates, rules, or atom mapping, to implement automatic retrosynthetic route planning. We treated each reaction prediction task as a data-driven sequence-to-sequence problem using the multi-head attention-based Transformer architecture, which has demonstrated power in machine translation tasks. Using reactions from the United States patent literature, our end-to-end models naturally incorporate the global chemical environments of molecules and achieve remarkable performance in top-1 predictive accuracy (63.0%, with the reaction class provided) and top-1 molecular validity (99.6%) in one-step retrosynthetic tasks. Inspired by the success rate of the one-step reaction prediction, we further carried out iterative, multi-step retrosynthetic route planning for four case products, which was successful. We then constructed an automatic data-driven end-to-end retrosynthetic route planning system (AutoSynRoute) using Monte Carlo tree search with a heuristic scoring function. AutoSynRoute successfully reproduced published synthesis routes for the four case products. The end-to-end model for reaction task prediction can be easily extended to larger or customer-requested reaction databases. Our study presents an important step in realizing automatic retrosynthetic route planning.
Affiliation(s)
- Kangjie Lin
  - BNLMS, Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University Beijing 100871 PR China
- Youjun Xu
  - BNLMS, Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University Beijing 100871 PR China
- Jianfeng Pei
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University Beijing 100871 PR China
- Luhua Lai
  - BNLMS, Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University Beijing 100871 PR China
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University Beijing 100871 PR China

26
Godfrey AG, Michael SG, Sittampalam GS, Zahoránszky-Köhalmi G. A Perspective on Innovating the Chemistry Lab Bench. Front Robot AI 2020; 7:24. [PMID: 33501193 PMCID: PMC7805875 DOI: 10.3389/frobt.2020.00024] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/23/2019] [Accepted: 02/12/2020] [Indexed: 12/16/2022] [Open Access]
Abstract
Innovating on the design and function of the chemical bench remains a quintessential challenge of the ages. It requires a deep understanding of the important role chemistry plays in scientific discovery as well as a first-principles approach to addressing the gaps in how work gets done at the bench. This perspective examines how one might explore designing and creating a sustainable new standard for advancing the automated chemistry bench itself. We propose how this might be done by leveraging recent advances in laboratory automation, whereby integrating the latest synthetic, analytical and information technologies and AI/ML algorithms within a standardized framework maximizes the value of the data generated and the broader utility of such systems. Although this perspective focuses on the design and advancement of molecules of potential therapeutic value, it would not be a stretch to contemplate how such systems could be applied to other applied disciplines like advanced materials, foodstuffs, or agricultural product development.
Affiliation(s)
- Alexander G. Godfrey
  - National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, United States

27
Badowski T, Gajewska EP, Molga K, Grzybowski BA. Synergy Between Expert and Machine‐Learning Approaches Allows for Improved Retrosynthetic Planning. Angew Chem Int Ed Engl 2019. [DOI: 10.1002/ange.201912083] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/07/2022]
Affiliation(s)
- Tomasz Badowski
  - Institute of Organic Chemistry Polish Academy of Sciences Ul. Kasprzaka 44/52 01-224 Warsaw Poland
- Ewa P. Gajewska
  - Institute of Organic Chemistry Polish Academy of Sciences Ul. Kasprzaka 44/52 01-224 Warsaw Poland
- Karol Molga
  - Institute of Organic Chemistry Polish Academy of Sciences Ul. Kasprzaka 44/52 01-224 Warsaw Poland
- Bartosz A. Grzybowski
  - Institute of Organic Chemistry Polish Academy of Sciences Ul. Kasprzaka 44/52 01-224 Warsaw Poland
  - IBS Center for Soft and Living Matter and Department of Chemistry UNIST 50, UNIST-gil, Eonyang-eup, Ulju-gun Ulsan South Korea

28
Badowski T, Gajewska EP, Molga K, Grzybowski BA. Synergy Between Expert and Machine‐Learning Approaches Allows for Improved Retrosynthetic Planning. Angew Chem Int Ed Engl 2019; 59:725-730. [DOI: 10.1002/anie.201912083] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 09/21/2019] [Indexed: 12/27/2022]
Affiliation(s)
- Tomasz Badowski
  - Institute of Organic Chemistry Polish Academy of Sciences Ul. Kasprzaka 44/52 01-224 Warsaw Poland
- Ewa P. Gajewska
  - Institute of Organic Chemistry Polish Academy of Sciences Ul. Kasprzaka 44/52 01-224 Warsaw Poland
- Karol Molga
  - Institute of Organic Chemistry Polish Academy of Sciences Ul. Kasprzaka 44/52 01-224 Warsaw Poland
- Bartosz A. Grzybowski
  - Institute of Organic Chemistry Polish Academy of Sciences Ul. Kasprzaka 44/52 01-224 Warsaw Poland
  - IBS Center for Soft and Living Matter and Department of Chemistry UNIST 50, UNIST-gil, Eonyang-eup, Ulju-gun Ulsan South Korea

29
Molga K, Dittwald P, Grzybowski BA. Computational design of syntheses leading to compound libraries or isotopically labelled targets. Chem Sci 2019; 10:9219-9232. [PMID: 32055308 PMCID: PMC6979321 DOI: 10.1039/c9sc02678a] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/02/2019] [Accepted: 08/09/2019] [Indexed: 01/08/2023] [Open Access]
Abstract
Although computer programs for retrosynthetic planning have shown improved and in some cases quite satisfactory performance in designing routes leading to specific, individual targets, no algorithms capable of planning syntheses of entire target libraries - important in modern drug discovery - have yet been reported. This study describes how network-search routines underlying existing retrosynthetic programs can be adapted and extended to multi-target design operating on one common search graph, benefitting from the use of common intermediates and reducing the overall synthetic cost. Implementation in the Chematica platform illustrates the usefulness of such algorithms in the syntheses of either (i) all members of a user-defined library, or (ii) the most synthetically accessible members of this library. In the latter case, algorithms are also readily adapted to the identification of the most facile syntheses of isotopically labelled targets. These examples are industrially relevant in the context of hit-to-lead optimization and syntheses of isotopomers of various bioactive molecules.
Affiliation(s)
- Karol Molga
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
- Piotr Dittwald
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
- Bartosz A Grzybowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
  - IBS Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, 689-798, South Korea

30
de Almeida AF, Moreira R, Rodrigues T. Synthetic organic chemistry driven by artificial intelligence. Nat Rev Chem 2019. [DOI: 10.1038/s41570-019-0124-0] [Citation(s) in RCA: 111] [Impact Index Per Article: 22.2] [Indexed: 12/14/2022]

31
Lin GM, Warden-Rothman R, Voigt CA. Retrosynthetic design of metabolic pathways to chemicals not found in nature. Curr Opin Syst Biol 2019. [DOI: 10.1016/j.coisb.2019.04.004] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Indexed: 12/23/2022]