51
A Review on Artificial Intelligence Enabled Design, Synthesis, and Process Optimization of Chemical Products for Industry 4.0. Processes (Basel) 2023. [DOI: 10.3390/pr11020330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/20/2023]
Abstract
With the development of Industry 4.0, artificial intelligence (AI) is gaining increasing attention for its performance in solving particularly complex problems in industrial chemistry and chemical engineering. Therefore, this review provides an overview of the application of AI techniques, in particular machine learning, in chemical design, synthesis, and process optimization over the past years. In this review, the focus is on the application of AI for structure-function relationship analysis, synthetic route planning, and automated synthesis. Finally, we discuss the challenges and future of AI in making chemical products.
52
Skoraczyński G, Kitlas M, Miasojedow B, Gambin A. Critical assessment of synthetic accessibility scores in computer-assisted synthesis planning. J Cheminform 2023; 15:6. [PMID: 36641473 PMCID: PMC9840255 DOI: 10.1186/s13321-023-00678-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 11/06/2022] [Accepted: 01/04/2023] [Indexed: 01/15/2023]
Abstract
Modern computer-assisted synthesis planning tools provide strong support for the synthesis planning problem. However, they are still limited by computational complexity. This limitation may be overcome by scoring the synthetic accessibility as a pre-retrosynthesis heuristic. A wide range of machine learning scoring approaches is available; however, their applicability and correctness have been studied only to a limited extent. Moreover, there is a lack of critical assessment of synthetic accessibility scores under common test conditions. In the present work, we assess whether synthetic accessibility scores can reliably predict the outcomes of retrosynthesis planning. Using a specially prepared compound database, we examine the outcomes of the retrosynthetic tool AiZynthFinder. We test whether the synthetic accessibility scores SAscore, SYBA, SCScore, and RAscore accurately predict the results of retrosynthesis planning. Furthermore, we investigate whether synthetic accessibility scores can speed up retrosynthesis planning by better prioritizing explored partial synthetic routes and thus reducing the size of the search space. For that purpose, we analyze the AiZynthFinder partial-solution search trees, their structure, and complexity parameters such as the number of nodes or treewidth. We confirm that synthetic accessibility scores in most cases discriminate well between feasible and infeasible molecules and can be potential boosters of retrosynthesis planning tools. Moreover, we show the current challenges of designing computer-assisted synthesis planning tools. We conclude that hybrid machine learning and human intuition-based synthetic accessibility scores can efficiently boost the effectiveness of computer-assisted retrosynthesis planning; however, they need to be carefully crafted for retrosynthesis planning algorithms. The source code of this work is publicly available at https://github.com/grzsko/ASAP.
Affiliation(s)
- Grzegorz Skoraczyński
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, Warsaw, Poland (GRID: grid.12847.38; ISNI: 0000 0004 1937 1290)
- Mateusz Kitlas
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, Warsaw, Poland (GRID: grid.12847.38; ISNI: 0000 0004 1937 1290)
- Błażej Miasojedow
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, Warsaw, Poland (GRID: grid.12847.38; ISNI: 0000 0004 1937 1290)
- Anna Gambin
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, Warsaw, Poland (GRID: grid.12847.38; ISNI: 0000 0004 1937 1290)
53
Moret M, Pachon Angona I, Cotos L, Yan S, Atz K, Brunner C, Baumgartner M, Grisoni F, Schneider G. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nat Commun 2023; 14:114. [PMID: 36611029 PMCID: PMC9825622 DOI: 10.1038/s41467-022-35692-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 10/26/2021] [Accepted: 12/19/2022] [Indexed: 01/09/2023]
Abstract
Generative chemical language models (CLMs) can be used for de novo molecular structure generation by learning from a textual representation of molecules. Here, we show that hybrid CLMs can additionally leverage the bioactivity information available for the training compounds. To computationally design ligands of phosphoinositide 3-kinase gamma (PI3Kγ), a collection of virtual molecules was created with a generative CLM. This virtual compound library was refined using a CLM-based classifier for bioactivity prediction. This second hybrid CLM was pretrained with patented molecular structures and fine-tuned with known PI3Kγ ligands. Several of the computer-generated molecular designs were commercially available, enabling fast prescreening and preliminary experimental validation. A new PI3Kγ ligand with sub-micromolar activity was identified, highlighting the method's scaffold-hopping potential. Chemical synthesis and biochemical testing of two of the top-ranked de novo designed molecules and their derivatives corroborated the model's ability to generate PI3Kγ ligands with medium to low nanomolar activity for hit-to-lead expansion. The most potent compounds led to pronounced inhibition of PI3K-dependent Akt phosphorylation in a medulloblastoma cell model, demonstrating efficacy of PI3Kγ ligands in PI3K/Akt pathway repression in human tumor cells. The results positively advocate hybrid CLMs for virtual compound screening and activity-focused molecular design.
Affiliation(s)
- Michael Moret
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Irene Pachon Angona
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Leandro Cotos
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Shen Yan
- University of Zurich, University Children's Hospital, Children's Research Center, Pediatric Molecular Neuro-Oncology Research, Lengghalde 5, 8008, Zurich, Switzerland
- Kenneth Atz
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Cyrill Brunner
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Martin Baumgartner
- University of Zurich, University Children's Hospital, Children's Research Center, Pediatric Molecular Neuro-Oncology Research, Lengghalde 5, 8008, Zurich, Switzerland
- Francesca Grisoni
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Eindhoven University of Technology, Institute for Complex Molecular Systems and Eindhoven Artificial Intelligence Systems Institute, Department of Biomedical Engineering, Groene Loper 7, 5612AZ, Eindhoven, The Netherlands
- Center for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, 3584 CB, The Netherlands
- Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- ETH Singapore SEC Ltd, 1 CREATE Way, #06-01 CREATE Tower, Singapore, 138602, Singapore
54
Tu Z, Stuyver T, Coley CW. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem Sci 2023; 14:226-244. [PMID: 36743887 PMCID: PMC9811563 DOI: 10.1039/d2sc05089g] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 09/12/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022]
Abstract
The field of predictive chemistry relates to the development of models able to describe how molecules interact and react. It encompasses the long-standing task of computer-aided retrosynthesis, but is more far-reaching and ambitious in its goals. In this review, we summarize several areas where predictive chemistry models hold the potential to accelerate the deployment, development, and discovery of organic reactions and advance synthetic chemistry.
Affiliation(s)
- Zhengkai Tu
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Connor W Coley
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
55
Lim PK, Julca I, Mutwil M. Redesigning plant specialized metabolism with supervised machine learning using publicly available reactome data. Comput Struct Biotechnol J 2023; 21:1639-1650. [PMID: 36874159 PMCID: PMC9976193 DOI: 10.1016/j.csbj.2023.01.013] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/27/2022] [Revised: 01/12/2023] [Accepted: 01/12/2023] [Indexed: 01/19/2023]
Abstract
The immense structural diversity of products and intermediates of plant specialized metabolism (specialized metabolites) makes them rich sources of therapeutic medicine, nutrients, and other useful materials. With the rapid accumulation of reactome data accessible in biological and chemical databases, along with recent advances in machine learning, this review outlines how supervised machine learning can be used to design new compounds and pathways by exploiting the wealth of these data. We first examine the various sources from which reactome data can be obtained, then explain the different machine learning encoding methods for reactome data. We then discuss current supervised machine learning developments that can be employed to help redesign plant specialized metabolism.
Affiliation(s)
- Peng Ken Lim
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Irene Julca
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Marek Mutwil
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
56
Andronov M, Voinarovska V, Andronova N, Wand M, Clevert DA, Schmidhuber J. Reagent prediction with a molecular transformer improves reaction data quality. Chem Sci 2023; 14:3235-3246. [PMID: 36970100 PMCID: PMC10034139 DOI: 10.1039/d2sc06798f] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/09/2022] [Accepted: 02/12/2023] [Indexed: 03/05/2023]
Abstract
A molecular transformer predicts reagents for organic reactions. It is also able to replace questionable reagents in reaction data, e.g. USPTO, to enable better product prediction models to be trained on these new data.
Affiliation(s)
- Mikhail Andronov
- IDSIA, USI, SUPSI, 6900 Lugano, Switzerland
- Machine Learning Research, Pfizer Worldwide Research Development and Medical, Linkstr. 10, Berlin, Germany
- Varvara Voinarovska
- Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich – Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), 85764 Neuherberg, Germany
- Michael Wand
- IDSIA, USI, SUPSI, 6900 Lugano, Switzerland
- Institute for Digital Technologies for Personalized Healthcare, SUPSI, 6900 Lugano, Switzerland
- Djork-Arné Clevert
- Machine Learning Research, Pfizer Worldwide Research Development and Medical, Linkstr. 10, Berlin, Germany
57
Wen M, Spotte-Smith EWC, Blau SM, McDermott MJ, Krishnapriyan AS, Persson KA. Chemical reaction networks and opportunities for machine learning. Nat Comput Sci 2023; 3:12-24. [PMID: 38177958 DOI: 10.1038/s43588-022-00369-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 06/24/2022] [Accepted: 11/08/2022] [Indexed: 01/06/2024]
Abstract
Chemical reaction networks (CRNs), defined by sets of species and possible reactions between them, are widely used to interrogate chemical systems. To capture increasingly complex phenomena, CRNs can be leveraged alongside data-driven methods and machine learning (ML). In this Perspective, we assess the diverse strategies available for CRN construction and analysis in pursuit of a wide range of scientific goals, discuss ML techniques currently being applied to CRNs and outline future CRN-ML approaches, presenting scientific and technical challenges to overcome.
Affiliation(s)
- Mingjian Wen
- Chemical and Biomolecular Engineering, University of Houston, Houston, TX, USA
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Evan Walter Clark Spotte-Smith
- Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
- Samuel M Blau
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Matthew J McDermott
- Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
- Aditi S Krishnapriyan
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, CA, USA
- Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, USA
- Kristin A Persson
- Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
- Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
58
Merging enzymatic and synthetic chemistry with computational synthesis planning. Nat Commun 2022; 13:7747. [PMID: 36517480 PMCID: PMC9750992 DOI: 10.1038/s41467-022-35422-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 11/30/2022] [Indexed: 12/15/2022]
Abstract
Synthesis planning programs trained on chemical reaction data can design efficient routes to new molecules of interest, but are limited in their ability to leverage rare chemical transformations. This challenge is acute for enzymatic reactions, which are valuable due to their selectivity and sustainability but are few in number. We report a retrosynthetic search algorithm using two neural network models for retrosynthesis (one covering 7984 enzymatic transformations and one covering 163,723 synthetic transformations) that balances the exploration of enzymatic and synthetic reactions to identify hybrid synthesis plans. This approach extends the space of retrosynthetic moves by thousands of uniquely enzymatic one-step transformations, discovers routes to molecules for which synthetic or enzymatic searches find none, and designs shorter routes for others. Application to (-)-Δ9 tetrahydrocannabinol (THC) (dronabinol) and R,R-formoterol (arformoterol) illustrates how our strategy facilitates the replacement of metal catalysis, high step counts, or costly enantiomeric resolution with more elegant hybrid proposals.
59
Fleitmann L, Gertig C, Scheffczyk J, Schilling J, Leonhard K, Bardow A. From Molecules to Heat‐Integrated Processes: Computer‐Aided Design of Solvents and Processes Using Quantum Chemistry. Chem Ing Tech 2022. [DOI: 10.1002/cite.202200098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
Affiliation(s)
- Lorenz Fleitmann
- ETH Zürich, Department of Mechanical and Process Engineering, Energy and Process Systems Engineering, Tannenstrasse 3, 8092 Zürich, Switzerland
- RWTH Aachen University, Institute of Technical Thermodynamics, Schinkelstraße 8, 52062 Aachen, Germany
- Christoph Gertig
- RWTH Aachen University, Institute of Technical Thermodynamics, Schinkelstraße 8, 52062 Aachen, Germany
- Jan Scheffczyk
- RWTH Aachen University, Institute of Technical Thermodynamics, Schinkelstraße 8, 52062 Aachen, Germany
- Johannes Schilling
- ETH Zürich, Department of Mechanical and Process Engineering, Energy and Process Systems Engineering, Tannenstrasse 3, 8092 Zürich, Switzerland
- Kai Leonhard
- RWTH Aachen University, Institute of Technical Thermodynamics, Schinkelstraße 8, 52062 Aachen, Germany
- André Bardow
- ETH Zürich, Department of Mechanical and Process Engineering, Energy and Process Systems Engineering, Tannenstrasse 3, 8092 Zürich, Switzerland
- RWTH Aachen University, Institute of Technical Thermodynamics, Schinkelstraße 8, 52062 Aachen, Germany
- Forschungszentrum Jülich GmbH, Institute of Energy and Climate Research (IEK-10), Wilhelm-Johnen-Straße, 52425 Jülich, Germany
60
Lavigne C, Gomes G, Pollice R, Aspuru-Guzik A. Guided discovery of chemical reaction pathways with imposed activation. Chem Sci 2022; 13:13857-13871. [PMID: 36544742 PMCID: PMC9710306 DOI: 10.1039/d2sc05135d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 11/09/2022] [Indexed: 11/12/2022]
Abstract
Computational power and quantum chemical methods have improved immensely since computers were first applied to the study of reactivity, but the de novo prediction of chemical reactions has remained challenging. We show that complex reaction pathways can be efficiently predicted in a guided manner using chemical activation imposed by geometrical constraints of specific reactive modes, which we term imposed activation (IACTA). Our approach is demonstrated on realistic and challenging chemistry, such as a triple cyclization cascade involved in the total synthesis of a natural product, a water-mediated Michael addition, and several oxidative addition reactions of complex drug-like molecules. Notably and in contrast with traditional hand-guided computational chemistry calculations, our method requires minimal human involvement and no prior knowledge of the products or the associated mechanisms. We believe that IACTA will be a transformational tool to screen for chemical reactivity and to study both by-product formation and decomposition pathways in a guided way.
Affiliation(s)
- Cyrille Lavigne
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario, M5T 3A1, Canada
- Gabe Gomes
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario, M5T 3A1, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St George St, Toronto, Ontario, M5S 3H6, Canada
- Robert Pollice
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario, M5T 3A1, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St George St, Toronto, Ontario, M5S 3H6, Canada
- Alán Aspuru-Guzik
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario, M5T 3A1, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St George St, Toronto, Ontario, M5S 3H6, Canada
- Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St., Ontario, M5S 3E5, Canada
- Department of Materials Science & Engineering, University of Toronto, 184 College St., Ontario, M5S 3E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario, M5G 1M1, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 661 University Ave, Toronto, Ontario, M5G, Canada
61
Reiser P, Neubert M, Eberhard A, Torresi L, Zhou C, Shao C, Metni H, van Hoesel C, Schopmans H, Sommer T, Friederich P. Graph neural networks for materials science and chemistry. Commun Mater 2022; 3:93. [PMID: 36468086 PMCID: PMC9702700 DOI: 10.1038/s43246-022-00315-6] [Citation(s) in RCA: 65] [Impact Index Per Article: 32.5] [Received: 05/10/2022] [Accepted: 11/07/2022] [Indexed: 05/14/2023]
Abstract
Machine learning plays an increasingly important role in many areas of chemistry and materials science, being used to predict materials properties, accelerate simulations, design new structures, and predict synthesis routes of new materials. Graph neural networks (GNNs) are one of the fastest growing classes of machine learning models. They are of particular relevance for chemistry and materials science, as they directly work on a graph or structural representation of molecules and materials and therefore have full access to all relevant information required to characterize materials. In this Review, we provide an overview of the basic principles of GNNs, widely used datasets, and state-of-the-art architectures, followed by a discussion of a wide range of recent applications of GNNs in chemistry and materials science, and concluding with a road-map for the further development and application of GNNs.
Affiliation(s)
- Patrick Reiser
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Marlen Neubert
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- André Eberhard
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Luca Torresi
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Chen Zhou
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Chen Shao
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Present Address: Institute for Applied Informatics and Formal Description Systems, Karlsruhe Institute of Technology, Kaiserstr. 89, 76133 Karlsruhe, Germany
- Houssam Metni
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- ECPM, Université de Strasbourg, 25 Rue Becquerel, 67087 Strasbourg, France
- Clint van Hoesel
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Department of Applied Physics, Eindhoven University of Technology, Groene Loper 19, 5612 AP Eindhoven, The Netherlands
- Henrik Schopmans
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Timo Sommer
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute for Theory of Condensed Matter, Karlsruhe Institute of Technology, Wolfgang-Gaede-Str. 1, 76131 Karlsruhe, Germany
- Present Address: School of Chemistry, Trinity College Dublin, College Green, Dublin 2, Ireland
- Pascal Friederich
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
62
Yarish D, Garkot S, Grygorenko OO, Radchenko DS, Moroz YS, Gurbych O. Advancing molecular graphs with descriptors for the prediction of chemical reaction yields. J Comput Chem 2022; 44:76-92. [PMID: 36264601 DOI: 10.1002/jcc.27016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/05/2022] [Revised: 08/31/2022] [Accepted: 09/05/2022] [Indexed: 11/08/2022]
Abstract
Chemical yield is the percentage of the reactants converted to the desired products. Chemists use predictive algorithms to select high-yielding reactions and score synthesis routes, saving time and reagents. This study proposes a novel graph neural network architecture for chemical yield prediction. The network combines structural information about the participants of the transformation with molecular and reaction-level descriptors. It works with incomplete chemical reactions and generates reactant-product atom mapping. We show that the network benefits from this additional information by comparing it with several machine learning models and molecular representations. Models included logistic regression, support vector machine, CatBoost, and Bidirectional Encoder Representations from Transformers. Molecular representations included extended-connectivity fingerprints, Morgan fingerprints, SMILESVec embeddings, and textual representations. Classification and regression objectives were assessed for each model and feature set. The goal of each classification model was to separate zero- and non-zero-yielding reactions. The models were trained and evaluated on a proprietary dataset of 10 reaction types. The models were also benchmarked on two public single-reaction-type datasets. The study was supplemented with an analysis of data, results, and errors, as well as the impact of steric factors, side reactions, isolation, and purification efficiency. The supplementary code is available at https://github.com/SoftServeInc/yield-paper.
Affiliation(s)
- Sofiya Garkot
- SoftServe, Inc., Lviv, Ukraine
- Ukrainian Catholic University, Lviv, Ukraine
- Oleksandr O Grygorenko
- Enamine Ltd., Kyiv, Ukraine
- Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
- Dmytro S Radchenko
- Enamine Ltd., Kyiv, Ukraine
- Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
- Yurii S Moroz
- Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
- Chemspace LLC, Kyiv, Ukraine
- Oleksandr Gurbych
- Lviv Polytechnic National University, Lviv, Ukraine
- Blackthorn AI, Ltd., London, UK
63
Melnyk N, Iribarren I, Mates‐Torres E, Trujillo C. Theoretical Perspectives in Organocatalysis. Chemistry 2022; 28:e202201570. [PMID: 35792702 PMCID: PMC9804221 DOI: 10.1002/chem.202201570] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/20/2022] [Indexed: 01/05/2023]
Abstract
The field of organocatalysis has expanded continuously over the last decades. With increasing computational capacity and new techniques, computational methods have provided a more economical approach to exploring different chemical systems. This review offers a broad yet concise overview of current state-of-the-art studies that have employed novel strategies for catalyst design. The evolution of the different theoretical approaches most commonly used within organocatalysis is discussed, from the traditional, manually driven approach to the most recent, machine-driven one.
Affiliation(s)
- Nika Melnyk
- School of Chemistry, Trinity College Dublin, College Green, Dublin 2, Ireland
- Iñigo Iribarren
- School of Chemistry, Trinity College Dublin, College Green, Dublin 2, Ireland
- Eric Mates‐Torres
- School of Chemistry, Trinity College Dublin, College Green, Dublin 2, Ireland
- Cristina Trujillo
- School of Chemistry, Trinity College Dublin, College Green, Dublin 2, Ireland
64
Krenn M, Ai Q, Barthel S, Carson N, Frei A, Frey NC, Friederich P, Gaudin T, Gayle AA, Jablonka KM, Lameiro RF, Lemm D, Lo A, Moosavi SM, Nápoles-Duarte JM, Nigam A, Pollice R, Rajan K, Schatzschneider U, Schwaller P, Skreta M, Smit B, Strieth-Kalthoff F, Sun C, Tom G, Falk von Rudorff G, Wang A, White AD, Young A, Yu R, Aspuru-Guzik A. SELFIES and the future of molecular string representations. Patterns (N Y) 2022; 3:100588. [PMID: 36277819 PMCID: PMC9583042 DOI: 10.1016/j.patter.2022.100588] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Indexed: 11/06/2022]
Abstract
Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings; most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELF-referencing embedded strings (SELFIES). SELFIES has since simplified and enabled numerous new applications in chemistry. In this perspective, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete future projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.
Affiliation(s)
- Mario Krenn (corresponding author)
- Max Planck Institute for the Science of Light (MPL), Erlangen, Germany
- Qianxiang Ai
- Department of Chemistry, Fordham University, The Bronx, NY, USA
- Senja Barthel
- Department of Mathematics, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Nessa Carson
- Syngenta Jealott’s Hill International Research Centre, Bracknell, Berkshire, UK
- Angelo Frei
- Department of Chemistry, Imperial College London, Molecular Sciences Research Hub, White City Campus, Wood Lane, London, UK
- Nathan C. Frey
- Massachusetts Institute of Technology, Cambridge, MA, USA
- Pascal Friederich
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
- Théophile Gaudin
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- IBM Research Europe, Zürich, Switzerland
- Kevin Maik Jablonka
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Sion, Valais, Switzerland
- Rafael F. Lameiro
- Medicinal and Biological Chemistry Group, São Carlos Institute of Chemistry, University of São Paulo, São Paulo, Brazil
- Dominik Lemm
- Faculty of Physics, University of Vienna, Vienna, Austria
- Alston Lo
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Seyed Mohamad Moosavi
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- AkshatKumar Nigam
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Robert Pollice
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Kohulan Rajan
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller Universität Jena, Jena, Germany
- Ulrich Schatzschneider
- Institut für Anorganische Chemie, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
- Philippe Schwaller
- IBM Research Europe, Zürich, Switzerland
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Marta Skreta
- Department of Computer Science, University of Toronto, Toronto, ON, Canada,Vector Institute for Artificial Intelligence, Toronto, ON, Canada
| | - Berend Smit
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Sion, Valais, Switzerland
| | - Felix Strieth-Kalthoff
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | - Chong Sun
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Gary Tom
- Department of Computer Science, University of Toronto, Toronto, ON, Canada,Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | | | - Andrew Wang
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada,Solar Fuels Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | - Andrew D. White
- Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
| | - Adamo Young
- Department of Computer Science, University of Toronto, Toronto, ON, Canada,Vector Institute for Artificial Intelligence, Toronto, ON, Canada
| | - Rose Yu
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
| | - Alán Aspuru-Guzik
- Department of Computer Science, University of Toronto, Toronto, ON, Canada,Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada,Vector Institute for Artificial Intelligence, Toronto, ON, Canada,Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada,Department of Materials Science, University of Toronto, Toronto, ON, Canada,Canadian Institute for Advanced Research (CIFAR) Lebovic Fellow, Toronto, ON, Canada,Corresponding author
|
65
|
de Crécy-lagard V, Amorin de Hegedus R, Arighi C, Babor J, Bateman A, Blaby I, Blaby-Haas C, Bridge AJ, Burley SK, Cleveland S, Colwell LJ, Conesa A, Dallago C, Danchin A, de Waard A, Deutschbauer A, Dias R, Ding Y, Fang G, Friedberg I, Gerlt J, Goldford J, Gorelik M, Gyori BM, Henry C, Hutinet G, Jaroch M, Karp PD, Kondratova L, Lu Z, Marchler-Bauer A, Martin MJ, McWhite C, Moghe GD, Monaghan P, Morgat A, Mungall CJ, Natale DA, Nelson WC, O’Donoghue S, Orengo C, O’Toole KH, Radivojac P, Reed C, Roberts RJ, Rodionov D, Rodionova IA, Rudolf JD, Saleh L, Sheynkman G, Thibaud-Nissen F, Thomas PD, Uetz P, Vallenet D, Carter EW, Weigele PR, Wood V, Wood-Charlson EM, Xu J. A roadmap for the functional annotation of protein families: a community perspective. Database (Oxford) 2022; 2022:6663924. [PMID: 35961013 PMCID: PMC9374478 DOI: 10.1093/database/baac062] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2022] [Revised: 06/28/2022] [Accepted: 08/03/2022] [Indexed: 12/23/2022]
Abstract
Over the last 25 years, biology has entered the genomic era and is becoming a science of ‘big data’. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and, thereby, is standing in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue funded by the National Science Foundation was held during 3–4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues that were obstructing the field were identified and discussed, which ultimately allowed for the proposal of solutions on how to move forward.
Affiliation(s)
- Valérie de Crécy-lagard
- Department of Microbiology and Cell Sciences, University of Florida , Gainesville, FL 32611, USA
| | | | - Cecilia Arighi
- Department of Computer and Information Sciences, University of Delaware , Newark, DE 19713, USA
| | - Jill Babor
- Department of Microbiology and Cell Sciences, University of Florida , Gainesville, FL 32611, USA
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus , Hinxton CB10 1SD, UK
| | - Ian Blaby
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory , Berkeley, CA 94720, USA
| | - Crysten Blaby-Haas
- Biology Department, Brookhaven National Laboratory , Upton, NY 11973, USA
| | - Alan J Bridge
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire , Geneva 4 CH-1211, Switzerland
| | - Stephen K Burley
- RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey , Piscataway, NJ 08854, USA
| | - Stacey Cleveland
- Department of Microbiology and Cell Sciences, University of Florida , Gainesville, FL 32611, USA
| | - Lucy J Colwell
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
| | - Ana Conesa
- Spanish National Research Council, Institute for Integrative Systems Biology , Paterna, Valencia 46980, Spain
| | - Christian Dallago
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology , i12, Boltzmannstr. 3, Garching/Munich 85748, Germany
| | - Antoine Danchin
- School of Biomedical Sciences, Li KaShing Faculty of Medicine, The University of Hong Kong , 21 Sassoon Road, Pokfulam, SAR Hong Kong 999077, China
| | - Anita de Waard
- Research Collaboration Unit, Elsevier , Jericho, VT 05465, USA
| | - Adam Deutschbauer
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory , Berkeley, CA 94720, USA
| | - Raquel Dias
- Department of Microbiology and Cell Sciences, University of Florida , Gainesville, FL 32611, USA
| | - Yousong Ding
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida , Gainesville, FL 32610, USA
| | - Gang Fang
- NYU-Shanghai , Shanghai 200120, China
| | - Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University , Ames, IA 50011, USA
| | - John Gerlt
- Institute for Genomic Biology and Departments of Biochemistry and Chemistry, University of Illinois at Urbana-Champaign , Urbana, IL 61801, USA
| | - Joshua Goldford
- Physics of Living Systems, Massachusetts Institute of Technology , Cambridge, MA 02139, USA
| | - Mark Gorelik
- Department of Microbiology and Cell Sciences, University of Florida , Gainesville, FL 32611, USA
| | - Benjamin M Gyori
- Laboratory of Systems Pharmacology, Harvard Medical School , Boston, MA 02115, USA
| | - Christopher Henry
- Mathematics and Computer Science Division, Argonne National Laboratory , Argonne, IL 60439, USA
| | - Geoffrey Hutinet
- Department of Microbiology and Cell Sciences, University of Florida , Gainesville, FL 32611, USA
| | - Marshall Jaroch
- Department of Microbiology and Cell Sciences, University of Florida , Gainesville, FL 32611, USA
| | - Peter D Karp
- Bioinformatics Research Group, SRI International , Menlo Park, CA 94025, USA
| | | | - Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH) , 8600 Rockville Pike, Bethesda, MD 20817, USA
| | - Aron Marchler-Bauer
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH) , 8600 Rockville Pike, Bethesda, MD 20817, USA
| | - Maria-Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus , Hinxton CB10 1SD, UK
| | - Claire McWhite
- Lewis-Sigler Institute for Integrative Genomics, Princeton University , Princeton, NJ 08540, USA
| | - Gaurav D Moghe
- Plant Biology Section, School of Integrative Plant Science, Cornell University , Ithaca, NY 14853, USA
| | - Paul Monaghan
- Department of Agricultural Education and Communication, University of Florida , Gainesville, FL 32611, USA
| | - Anne Morgat
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire , Geneva 4 CH-1211, Switzerland
| | - Christopher J Mungall
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory , Berkeley, CA 94720, USA
| | - Darren A Natale
- Georgetown University Medical Center , Washington, DC 20007, USA
| | - William C Nelson
- Biological Sciences Division, Pacific Northwest National Laboratories , Richland, WA 99354, USA
| | - Seán O’Donoghue
- School of Biotechnology and Biomolecular Sciences, University of NSW , Sydney, NSW 2052, Australia
| | - Christine Orengo
- Department of Structural and Molecular Biology, University College London , London WC1E 6BT, UK
| | | | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University , Boston, MA 02115, USA
| | - Colbie Reed
- Department of Microbiology and Cell Sciences, University of Florida , Gainesville, FL 32611, USA
| | | | - Dmitri Rodionov
- Sanford Burnham Prebys Medical Discovery Institute , La Jolla, CA 92037, USA
| | - Irina A Rodionova
- Department of Bioengineering, Division of Engineering, University of California at San Diego , La Jolla, CA 92093-0412, USA
| | - Jeffrey D Rudolf
- Department of Chemistry, University of Florida , Gainesville, FL 32611, USA
| | - Lana Saleh
- New England Biolabs , Ipswich, MA 01938, USA
| | - Gloria Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia , Charlottesville, VA, USA
| | - Francoise Thibaud-Nissen
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH) , 8600 Rockville Pike, Bethesda, MD 20817, USA
| | - Paul D Thomas
- Department of Population and Public Health Sciences, University of Southern California , Los Angeles, CA 90033, USA
| | - Peter Uetz
- Center for Biological Data Science, Virginia Commonwealth University , Richmond, VA 23284, USA
| | - David Vallenet
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS , Evry 91057, France
| | - Erica Watson Carter
- Department of Plant Pathology, University of Florida Citrus Research and Education Center , 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
| | | | - Valerie Wood
- Department of Biochemistry, University of Cambridge , Cambridge CB2 1GA, UK
| | - Elisha M Wood-Charlson
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory , Berkeley, CA 94720, USA
| | - Jin Xu
- Department of Plant Pathology, University of Florida Citrus Research and Education Center , 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
|
66
|
Spiekermann K, Pattanaik L, Green WH. High accuracy barrier heights, enthalpies, and rate coefficients for chemical reactions. Sci Data 2022; 9:417. [PMID: 35851390 PMCID: PMC9293986 DOI: 10.1038/s41597-022-01529-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2022] [Accepted: 06/30/2022] [Indexed: 12/13/2022] Open
Abstract
Quantitative chemical reaction data, including activation energies and reaction rates, are crucial for developing detailed kinetic mechanisms and accurately predicting reaction outcomes. However, such data are often difficult to find, and high-quality datasets are especially rare. Here, we use CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP to obtain high-quality single point calculations for nearly 22,000 unique stable species and transition states. We report the results from these quantum chemistry calculations and extract the barrier heights and reaction enthalpies to create a kinetics dataset of nearly 12,000 gas-phase reactions. These reactions involve H, C, N, and O, contain up to seven heavy atoms, and have cleaned atom-mapped SMILES. Our higher-accuracy coupled-cluster barrier heights differ significantly (RMSE of ∼5 kcal mol⁻¹) relative to those calculated at ωB97X-D3/def2-TZVP. We also report accurate transition state theory rate coefficients $k_{\infty}(T)$ between 300 K and 2000 K and the corresponding Arrhenius parameters for a subset of rigid reactions. We believe this data will accelerate development of automated and reliable methods for quantitative reaction prediction.
Measurement(s): Barrier Heights • Enthalpies • Rate Coefficients
Technology Type(s): ab initio quantum chemistry computational method
Affiliation(s)
- Kevin Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, 02139, USA
| | - Lagnajit Pattanaik
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, 02139, USA
| | - William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, 02139, USA.
|
67
|
Xu Z, Mahadevan R. Efficient Enumeration of Branched Novel Biochemical Pathways Using a Probabilistic Technique. Ind Eng Chem Res 2022. [DOI: 10.1021/acs.iecr.1c02211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Zhiqing Xu
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
| | - Radhakrishnan Mahadevan
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
- Institute of Biomedical Engineering, University of Toronto, Toronto, Ontario M5S 3G9, Canada
|
68
|
Gao H, Zhu LT, Luo ZH, Fraga MA, Hsing IM. Machine Learning and Data Science in Chemical Engineering. Ind Eng Chem Res 2022. [DOI: 10.1021/acs.iecr.2c01788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Hanyu Gao
- Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, People’s Republic of China
| | - Li-Tao Zhu
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China
| | - Zheng-Hong Luo
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China
| | - Marco A. Fraga
- Instituto Nacional de Tecnologia − INT, Av. Venezuela, 82/518, Rio de Janeiro, RJ 20081-312, Brazil
| | - I-Ming Hsing
- Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, People’s Republic of China
|
69
|
|
70
|
Grzybowski BA, Badowski T, Molga K, Szymkuć S. Network search algorithms and scoring functions for advanced‐level computerized synthesis planning. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1630] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Affiliation(s)
- Bartosz A. Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences Warsaw Poland
- Center for Soft and Living Matter, Institute for Basic Science (IBS) Ulsan Republic of Korea
- Department of Chemistry Ulsan National Institute of Science and Technology (UNIST) Ulsan Republic of Korea
| | - Tomasz Badowski
- Institute of Organic Chemistry, Polish Academy of Sciences Warsaw Poland
| | - Karol Molga
- Institute of Organic Chemistry, Polish Academy of Sciences Warsaw Poland
| | - Sara Szymkuć
- Institute of Organic Chemistry, Polish Academy of Sciences Warsaw Poland
|
71
|
Bender A, Schneider N, Segler M, Patrick Walters W, Engkvist O, Rodrigues T. Evaluation guidelines for machine learning tools in the chemical sciences. Nat Rev Chem 2022; 6:428-442. [PMID: 37117429 DOI: 10.1038/s41570-022-00391-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/13/2022] [Indexed: 02/07/2023]
Abstract
Machine learning (ML) promises to tackle the grand challenges in chemistry and speed up the generation, improvement and/or ordering of research hypotheses. Despite the overarching applicability of ML workflows, one usually finds diverse evaluation study designs. The current heterogeneity in evaluation techniques and metrics leads to difficulty in (or the impossibility of) comparing and assessing the relevance of new algorithms. Ultimately, this may delay the digitalization of chemistry at scale and confuse method developers, experimentalists, reviewers and journal editors. In this Perspective, we critically discuss a set of method development and evaluation guidelines for different types of ML-based publications, emphasizing supervised learning. We provide a diverse collection of examples from various authors and disciplines in chemistry. While taking into account varying accessibility across research groups, our recommendations focus on reporting completeness and standardizing comparisons between tools. We aim to further contribute to improved ML transparency and credibility by suggesting a checklist of retro-/prospective tests and dissecting their importance. We envisage that the wide adoption and continuous update of best practices will encourage an informed use of ML on real-world problems related to the chemical sciences.
|
72
|
Rankine CD, Penfold TJ. Accurate, affordable, and generalizable machine learning simulations of transition metal x-ray absorption spectra using the XANESNET deep neural network. J Chem Phys 2022; 156:164102. [PMID: 35490005 DOI: 10.1063/5.0087255] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open
Abstract
The affordable, accurate, and generalizable prediction of spectroscopic observables plays a key role in the analysis of increasingly complex experiments. In this article, we develop and deploy a deep neural network, XANESNET, for predicting the lineshape of first-row transition metal K-edge x-ray absorption near-edge structure (XANES) spectra. XANESNET predicts the spectral intensities using only information about the local coordination geometry of the transition metal complexes encoded in a feature vector of weighted atom-centered symmetry functions. We address in detail the calibration of the feature vector for the particularities of the problem at hand, and we explore the individual feature importance to reveal the physical insight that XANESNET obtains at the Fe K-edge. XANESNET relies on only a few judiciously selected features: radial information on the first and second coordination shells suffices along with angular information sufficient to separate satisfactorily key coordination geometries. The feature importance is found to reflect the XANES spectral window under consideration and is consistent with the expected underlying physics. We subsequently apply XANESNET at nine first-row transition metal (Ti-Zn) K-edges. It can be optimized in as little as a minute, predicts instantaneously, and provides K-edge XANES spectra with an average accuracy of ∼±2%-4% in which the positions of prominent peaks are matched with a >90% hit rate to sub-eV (∼0.8 eV) error.
Affiliation(s)
- C D Rankine
- Chemistry-School of Natural and Environmental Sciences, Newcastle University, Newcastle Upon Tyne NE1 7RU, United Kingdom
| | - T J Penfold
- Chemistry-School of Natural and Environmental Sciences, Newcastle University, Newcastle Upon Tyne NE1 7RU, United Kingdom
|
73
|
Liu CH, Korablyov M, Jastrzębski S, Włodarczyk-Pruszyński P, Bengio Y, Segler M. RetroGNN: Fast Estimation of Synthesizability for Virtual Screening and De Novo Design by Learning from Slow Retrosynthesis Software. J Chem Inf Model 2022; 62:2293-2300. [PMID: 35452226 DOI: 10.1021/acs.jcim.1c01476] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
De novo molecule design algorithms often result in chemically unfeasible or synthetically inaccessible molecules. A natural idea to mitigate this problem is to bias these algorithms toward more easily synthesizable molecules using a proxy score for synthetic accessibility. However, using currently available proxies can still result in highly unrealistic compounds. Here, we propose a novel approach, RetroGNN, to estimate synthesizability. First, we search for routes using synthesis planning software for a large number of random molecules. This information is then used to train a graph neural network to predict the outcome of the synthesis planner given the target molecule, in which the regression task can be used as a synthesizability scorer. We highlight how RetroGNN can be used in generative molecule-discovery pipelines together with other scoring functions. We evaluate our approach on several QSAR-based molecule design benchmarks, for which we find synthesizable molecules with state-of-the-art scores. Compared to the virtual screening of 5 million existing molecules from the ZINC database, using RetroGNNScore with a simple fragment-based de novo design algorithm finds molecules predicted to be more likely to possess the desired activity exponentially faster, while maintaining good druglike properties and being easier to synthesize. Importantly, our deep neural network can successfully filter out hard-to-synthesize molecules while achieving a 10⁵ times speedup over using retrosynthesis planning software.
Affiliation(s)
- Cheng-Hao Liu
- Mila and Université de Montréal, 6666 St-Urbain Street, Montreal, Canada H2S 3H1.,Department of Chemistry, McGill University, 801 Sherbooke Street W, Montreal, Canada H3A 0B8
| | - Maksym Korablyov
- Mila and Université de Montréal, 6666 St-Urbain Street, Montreal, Canada H2S 3H1
| | - Stanisław Jastrzębski
- Molecule.one, Warsaw 00-815, Poland.,Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
| | | | - Yoshua Bengio
- Mila and Université de Montréal, 6666 St-Urbain Street, Montreal, Canada H2S 3H1
| | - Marwin Segler
- Institute of Organic Chemistry and Center for Multiscale Theory and Computation, Westfälische Wilhelms-Universität Münster, 48149 Münster, Germany.,Microsoft Research, 21 Station Road, Cambridge, U.K. CB1 2FB
|
74
|
Zahoránszky-Kőhalmi G, Lysov N, Vorontcov I, Wang J, Soundararajan J, Metaxotos D, Mathew B, Sarosh R, Michael SG, Godfrey AG. Algorithm for the Pruning of Synthesis Graphs. J Chem Inf Model 2022; 62:2226-2238. [PMID: 35438992 PMCID: PMC9093600 DOI: 10.1021/acs.jcim.1c01202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Synthesis route planning is in the core of chemical intelligence that will power the autonomous chemistry platforms. In this task, we rely on algorithms to generate possible synthesis routes with the help of retro- and forward-synthetic approaches. Generated synthesis routes can be merged into a synthesis graph which represents theoretical pathways to the target molecule. However, it is often required to modify a synthesis graph due to typical constraints. These constraints might include "undesirable substances", e.g., an intermediate that the chemist does not favor or substances that might be toxic. Consequently, we need to prune the synthesis graph by the elimination of such undesirable substances. Synthesis graphs can be represented as directed (not necessarily acyclic) bipartite graphs, and the pruning of such graphs in the light of a set of undesirable substances has been an open question. In this study, we present the Synthesis Graph Pruning (SGP) algorithm that addresses this question. The input to the SGP algorithm is a synthesis graph and a set of undesirable substances. Furthermore, information for substances is provided as metadata regarding their availability from the inventory. The SGP algorithm operates with a simple local rule set, in order to determine which nodes and edges need to be eliminated from the synthesis graph. In this study, we present the SGP algorithm in detail and provide several case studies that demonstrate the operation of the SGP algorithm. We believe that the SGP algorithm will be an essential component of computer aided synthesis planning.
Affiliation(s)
| | - Nikita Lysov
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States
| | - Ilia Vorontcov
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States
| | - Jeffrey Wang
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States
| | - Jeyaraman Soundararajan
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States
| | - Dimitrios Metaxotos
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States
| | - Biju Mathew
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States
| | - Rafat Sarosh
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States
| | - Samuel G Michael
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States
| | - Alexander G Godfrey
- National Center for Advancing Translational Sciences, Rockville, Maryland 20850, United States
|
75
|
Xu Y, Huang X, Li C, Wei Z, Wang M. Predicting Structure‐dependent Properties Directly from the 3D Molecular Images via Convolutional Neural Networks. AIChE J 2022. [DOI: 10.1002/aic.17721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Affiliation(s)
- Yunhao Xu
- School of Chemistry and Chemical Engineering Chongqing University Chongqing 400044 China
| | - Xun Huang
- School of Chemistry and Chemical Engineering Chongqing University Chongqing 400044 China
| | - Cunpu Li
- School of Chemistry and Chemical Engineering Chongqing University Chongqing 400044 China
| | - Zidong Wei
- School of Chemistry and Chemical Engineering Chongqing University Chongqing 400044 China
| | - Meng Wang
- School of Chemistry and Chemical Engineering Chongqing University Chongqing 400044 China
|
76
|
Autonomous design of new chemical reactions using a variational autoencoder. Commun Chem 2022; 5:40. [PMID: 36697652 PMCID: PMC9814385 DOI: 10.1038/s42004-022-00647-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2021] [Accepted: 02/16/2022] [Indexed: 01/28/2023] Open
Abstract
Artificial intelligence based chemistry models are a promising method of exploring chemical reaction design spaces. However, training datasets based on experimental synthesis are typically reported only for the optimal synthesis reactions. This leads to an inherited bias in the model predictions. Therefore, robust datasets that span the entirety of the solution space are necessary to remove inherited bias and permit complete training of the space. In this study, an artificial intelligence model based on a Variational AutoEncoder (VAE) has been developed and investigated to synthetically generate continuous datasets. The approach involves sampling the latent space to generate new chemical reactions. This developed technique is demonstrated by generating over 7,000,000 new reactions from a training dataset containing only 7,000 reactions. The generated reactions include molecular species that are larger and more diverse than the training set.
|
77
|
Lin MH, Tu Z, Coley CW. Improving the performance of models for one-step retrosynthesis through re-ranking. J Cheminform 2022; 14:15. [PMID: 35292121 PMCID: PMC8922884 DOI: 10.1186/s13321-022-00594-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2022] [Accepted: 02/26/2022] [Indexed: 12/03/2022] Open
Abstract
Retrosynthesis is at the core of organic chemistry. Recently, the rapid growth of artificial intelligence (AI) has spurred a variety of novel machine learning approaches for data-driven synthesis planning. These methods learn complex patterns from reaction databases in order to predict, for a given product, sets of reactants that can be used to synthesise that product. However, their performance as measured by the top-N accuracy in matching published reaction precedents still leaves room for improvement. This work aims to enhance these models by learning to re-rank their reactant predictions. Specifically, we design and train an energy-based model to re-rank, for each product, the published reaction as the top suggestion and the remaining reactant predictions as lower-ranked. We show that re-ranking can improve one-step models significantly using the standard USPTO-50k benchmark dataset, such as RetroSim, a similarity-based method, from 35.7 to 51.8% top-1 accuracy and NeuralSym, a deep learning method, from 45.7 to 51.3%, and also that re-ranking the union of two models’ suggestions can lead to better performance than either alone. However, the state-of-the-art top-1 accuracy is not improved by this method.
Supplementary Information The online version contains supplementary material available at 10.1186/s13321-022-00594-8.
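The two ideas in this abstract, top-N accuracy and re-ranking the union of two models' suggestions, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the `energy` callable is a hypothetical stand-in for a trained energy-based model, and "lower energy ranks higher" is an assumed convention, not a detail taken from the paper.

```python
def top_k_accuracy(ranked_lists, ground_truths, k):
    """Fraction of products whose published reactants appear among the top k suggestions."""
    hits = sum(
        1 for ranked, truth in zip(ranked_lists, ground_truths) if truth in ranked[:k]
    )
    return hits / len(ground_truths)

def rerank_union(candidates_a, candidates_b, energy):
    """Merge two candidate lists, drop duplicates, and sort by ascending energy.

    `energy` is a hypothetical scoring function (assumption: lower is better).
    """
    merged = list(dict.fromkeys(candidates_a + candidates_b))
    return sorted(merged, key=energy)

# Toy example with made-up reactant strings and made-up energies.
toy_energy = {"A.B": 0.2, "C.D": 1.5, "E.F": 0.9}
model_a = ["C.D", "A.B"]   # model A ranks the published answer second
model_b = ["E.F", "A.B"]   # model B also ranks it second
reranked = rerank_union(model_a, model_b, toy_energy.get)

before = top_k_accuracy([model_a], ["A.B"], k=1)
after = top_k_accuracy([reranked], ["A.B"], k=1)
```

Here the published reactants `"A.B"` move to the top after re-ranking, so the toy top-1 accuracy rises from `before` to `after`; real gains depend entirely on how well the energy model is trained.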
Affiliation(s)
- Min Htoo Lin
- Division of Chemistry and Biological Chemistry, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, 637371, Singapore
| | - Zhengkai Tu
- Computational Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA, 02139, USA
| | - Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA, 02139, USA.
|
78
|
Lu J, Zhang Y. Unified Deep Learning Model for Multitask Reaction Predictions with Explanation. J Chem Inf Model 2022; 62:1376-1387. [PMID: 35266390 PMCID: PMC8960360 DOI: 10.1021/acs.jcim.1c01467] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 11/29/2022]
Abstract
There is significant interest in developing robust machine learning models to assist organic chemistry synthesis. Typically, task-specific machine learning models have been developed for distinct reaction prediction tasks. In this work, we develop a unified deep learning model, T5Chem, for a variety of chemical reaction prediction tasks by adapting the "Text-to-Text Transfer Transformer" (T5) framework from natural language processing (NLP). On the basis of self-supervised pretraining with PubChem molecules, the T5Chem model achieves state-of-the-art performance on four distinct types of reaction prediction tasks using four different open-source data sets: reaction type classification on USPTO_TPL, forward reaction prediction on USPTO_MIT, single-step retrosynthesis on USPTO_50k, and reaction yield prediction on high-throughput C-N coupling reactions. We also introduce a new unified multitask reaction prediction data set, USPTO_500_MT, which can be used to train and test five different types of reaction tasks, including the above four as well as a new reagent suggestion task. Our results show that models trained with multiple tasks are more robust and can benefit from mutual learning on related tasks. Furthermore, we demonstrate the use of SHAP (SHapley Additive exPlanations) to explain T5Chem predictions at the functional group level, which provides a way to demystify sequence-based deep learning models in chemistry. T5Chem is accessible through https://yzhang.hpc.nyu.edu/T5Chem.
Affiliation(s)
- Jieyu Lu
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States.,NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
|
79
|
Ishida S, Terayama K, Kojima R, Takasu K, Okuno Y. AI-Driven Synthetic Route Design Incorporated with Retrosynthesis Knowledge. J Chem Inf Model 2022; 62:1357-1367. [PMID: 35258953 PMCID: PMC8965881 DOI: 10.1021/acs.jcim.1c01074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Computer-aided synthesis planning (CASP) aims to assist chemists in performing retrosynthetic analysis for which they utilize their experiments, intuition, and knowledge. Recent breakthroughs in machine learning (ML) techniques, including deep neural networks, have significantly improved data-driven synthetic route designs without human intervention. However, learning chemical knowledge by ML for practical synthesis planning has not yet been adequately achieved and remains a challenging problem. In this study, we developed a data-driven CASP application integrated with various portions of retrosynthesis knowledge called "ReTReK" that introduces the knowledge as adjustable parameters into the evaluation of promising search directions. The experimental results showed that ReTReK successfully searched synthetic routes based on the specified retrosynthesis knowledge, indicating that the synthetic routes searched with the knowledge were preferred to those without the knowledge. The concept of integrating retrosynthesis knowledge as adjustable parameters into a data-driven CASP application is expected to enhance the performance of both existing data-driven CASP applications and those under development.
Affiliation(s)
- Shoichi Ishida
- Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshidashimo-Adachicho, Sakyo-ku 606-8501, Kyoto, Japan
| | - Kei Terayama
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29, Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Kanagawa, Japan.,Graduate School of Medicine, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku 606-8507, Kyoto, Japan
| | - Ryosuke Kojima
- Graduate School of Medicine, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku 606-8507, Kyoto, Japan
| | - Kiyosei Takasu
- Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshidashimo-Adachicho, Sakyo-ku 606-8501, Kyoto, Japan
| | - Yasushi Okuno
- Graduate School of Medicine, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku 606-8507, Kyoto, Japan.,HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, 7-1-26, Minatojima-minami-machi, Chuo-ku, Kobe 650-0047, Hyogo, Japan
|
80
|
Ucak UV, Ashyrmamatov I, Ko J, Lee J. Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nat Commun 2022; 13:1186. [PMID: 35246540 PMCID: PMC8897428 DOI: 10.1038/s41467-022-28857-w] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 08/26/2021] [Accepted: 02/10/2022] [Indexed: 11/09/2022] Open
Abstract
Designing efficient synthetic routes for a target molecule remains a major challenge in organic synthesis. Atom environments are ideal, stand-alone, chemically meaningful building blocks providing a high-resolution molecular representation. Our approach mimics chemical reasoning and predicts reactant candidates by learning the changes of atom environments associated with the chemical reaction. Through careful inspection of reactant candidates, we demonstrate atom environments as promising descriptors for studying reaction route prediction and discovery. Here, we present a new single-step retrosynthesis prediction method, RetroTRAE, which is free from all SMILES-based translation issues and yields a top-1 accuracy of 58.3% on the USPTO test dataset. The top-1 accuracy reaches 61.6% with the inclusion of highly similar analogs, outperforming other state-of-the-art neural machine translation-based methods. Our methodology introduces a novel scheme for fragmental and topological descriptors to be used as natural inputs for retrosynthetic prediction tasks.
Affiliation(s)
- Umit V Ucak
- Department of Chemistry, Division of Chemistry and Biochemistry, Kangwon National University, Chuncheon, 24341, Republic of Korea
| | - Islambek Ashyrmamatov
- Department of Chemistry, Division of Chemistry and Biochemistry, Kangwon National University, Chuncheon, 24341, Republic of Korea
| | - Junsu Ko
- Arontier co., Seoul, Republic of Korea
| | - Juyong Lee
- Department of Chemistry, Division of Chemistry and Biochemistry, Kangwon National University, Chuncheon, 24341, Republic of Korea.
- Arontier co., Seoul, Republic of Korea.
|
81
|
|
82
|
|
83
|
Bai J, Cao L, Mosbach S, Akroyd J, Lapkin AA, Kraft M. From Platform to Knowledge Graph: Evolution of Laboratory Automation. JACS Au 2022; 2:292-309. [PMID: 35252980 PMCID: PMC8889618 DOI: 10.1021/jacsau.1c00438] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 10/05/2021] [Indexed: 05/19/2023]
Abstract
High-fidelity computer-aided experimentation is becoming more accessible with the development of computing power and artificial intelligence tools. The advancement of experimental hardware also empowers researchers to reach a level of accuracy that was not possible in the past. Marching toward the next generation of self-driving laboratories, the orchestration of both resources lies at the focal point of autonomous discovery in chemical science. To achieve such a goal, algorithmically accessible data representations and standardized communication protocols are indispensable. In this perspective, we recategorize the recently introduced approach based on Materials Acceleration Platforms into five functional components and discuss recent case studies that focus on the data representation and exchange scheme between different components. Emerging technologies for interoperable data representation and multi-agent systems are also discussed with their recent applications in chemical automation. We hypothesize that knowledge graph technology, orchestrating semantic web technologies and multi-agent systems, will be the driving force to bring data to knowledge, evolving our way of automating the laboratory.
Affiliation(s)
- Jiaru Bai
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
| | - Liwei Cao
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
| | - Sebastian Mosbach
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), CREATE Tower #05-05, 1 Create Way, 138602 Singapore
| | - Jethro Akroyd
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), CREATE Tower #05-05, 1 Create Way, 138602 Singapore
| | - Alexei A. Lapkin
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), CREATE Tower #05-05, 1 Create Way, 138602 Singapore
| | - Markus Kraft
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), CREATE Tower #05-05, 1 Create Way, 138602 Singapore
- School of Chemical and Biomedical Engineering, Nanyang Technological University, 62 Nanyang Drive, 637459 Singapore
- The Alan Turing Institute, London NW1 2DB, United Kingdom
|
84
|
Abramov YA, Sun G, Zeng Q. Emerging Landscape of Computational Modeling in Pharmaceutical Development. J Chem Inf Model 2022; 62:1160-1171. [PMID: 35226809 DOI: 10.1021/acs.jcim.1c01580] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 02/06/2023]
Abstract
Computational chemistry applications have become an integral part of the drug discovery workflow over the past 35 years. However, computational modeling in support of drug development has remained relatively uncharted territory for a significant part of both the academic and industrial communities. This review considers the computational modeling workflows for three key components of preclinical and clinical drug development, namely, process chemistry, analytical research and development, and drug product and formulation development. An overview of the computational support for each step of the respective workflows is presented. Additionally, in the context of solid form design, special consideration is given to modern physics-based virtual screening methods. This covers rational approaches to polymorph, coformer, counterion, and solvent virtual screening in support of solid form selection and design.
Affiliation(s)
- Yuriy A Abramov
- XtalPi, Inc., 245 Main St., Cambridge, Massachusetts 02142, United States.,Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
| | - Guangxu Sun
- XtalPi, Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 Hongliu road, Fubao Community, Fubao Street, Futian District, Shenzhen 518100, China
| | - Qun Zeng
- XtalPi, Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 Hongliu road, Fubao Community, Fubao Street, Futian District, Shenzhen 518100, China
|
85
|
Probst D, Manica M, Nana Teukam YG, Castrogiovanni A, Paratore F, Laino T. Biocatalysed synthesis planning using data-driven learning. Nat Commun 2022; 13:964. [PMID: 35181654 PMCID: PMC8857209 DOI: 10.1038/s41467-022-28536-w] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 08/30/2021] [Accepted: 01/25/2022] [Indexed: 01/30/2023] Open
Abstract
Enzyme catalysts are an integral part of green chemistry strategies towards a more sustainable and resource-efficient chemical synthesis. However, the use of biocatalysed reactions in retrosynthetic planning clashes with the difficulties in predicting the enzymatic activity on unreported substrates and enzyme-specific stereo- and regioselectivity. As of now, only rule-based systems support retrosynthetic planning using biocatalysis, while initial data-driven approaches are limited to forward predictions. Here, we extend the data-driven forward reaction as well as retrosynthetic pathway prediction models based on the Molecular Transformer architecture to biocatalysis. The enzymatic knowledge is learned from an extensive data set of publicly available biochemical reactions with the aid of a new class token scheme based on the enzyme commission classification number, which captures catalysis patterns among different enzymes belonging to the same hierarchy. The forward reaction prediction model (top-1 accuracy of 49.6%), the retrosynthetic pathway (top-1 single-step round-trip accuracy of 39.6%) and the curated data set are made publicly available to facilitate the adoption of enzymatic catalysis in the design of greener chemistry processes.
Affiliation(s)
- Daniel Probst
- IBM Research Europe, CH-8803, Rüschlikon, Switzerland.,National Center for Competence in Research-Catalysis (NCCR-Catalysis), Rüschlikon, Switzerland.
| | - Matteo Manica
- IBM Research Europe, CH-8803, Rüschlikon, Switzerland
| | | | - Alessandro Castrogiovanni
- IBM Research Europe, CH-8803, Rüschlikon, Switzerland.,National Center for Competence in Research-Catalysis (NCCR-Catalysis), Rüschlikon, Switzerland
| | | | - Teodoro Laino
- IBM Research Europe, CH-8803, Rüschlikon, Switzerland.,National Center for Competence in Research-Catalysis (NCCR-Catalysis), Rüschlikon, Switzerland
|
86
|
Wang W, Liu Q, Zhang L, Dong Y, Du J. RetroSynX: A retrosynthetic analysis framework using hybrid reaction templates and group contribution-based thermodynamic models. Chem Eng Sci 2022. [DOI: 10.1016/j.ces.2021.117208] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
|
87
|
Genheden S, Engkvist O, Bjerrum E. Fast prediction of distances between synthetic routes with deep learning. Machine Learning: Science and Technology 2022. [DOI: 10.1088/2632-2153/ac4a91] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022] Open
Abstract
We expand the recent work on clustering of synthetic routes and train a deep learning model to predict the distances between arbitrary routes. The model is based on a long short-term memory representation of a synthetic route and is trained as a twin network to reproduce the tree edit distance (TED) between two routes. The machine learning approach is approximately two orders of magnitude faster than the TED approach and enables clustering many more routes from a retrosynthesis route prediction. The clusters have a high degree of similarity to the clusters given by the TED-based approach and are accordingly intuitive and explainable. We provide the developed model as open-source.
|
88
|
Xu J, Zhang Y, Han J, Su A, Qiao H, Zhang C, Tang J, Shen X, Sun B, Yu W, Zhai S, Wang X, Wu Y, Su W, Duan H. Providing direction for mechanistic inferences in radical cascade cyclization using Transformer model. Org Chem Front 2022. [DOI: 10.1039/d2qo00188h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Even in modern organic chemistry, predicting or proposing a reaction mechanism and speculating on reaction intermediates remains challenging. For example, it is challenging to predict the regioselectivity of radical attraction...
|
89
|
Sridharan B, Goel M, Priyakumar UD. Modern Machine Learning for Tackling Inverse Problems in Chemistry: Molecular Design to Realization. Chem Commun (Camb) 2022; 58:5316-5331. [DOI: 10.1039/d1cc07035e] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/21/2022]
Abstract
The discovery of new molecules and materials helps expand the horizons of novel and innovative real-life applications. In the pursuit of finding molecules with desired properties, chemists have traditionally relied...
|
90
|
|
91
|
Mann V, Venkatasubramanian V. Retrosynthesis prediction using grammar-based neural machine translation: An information-theoretic approach. Comput Chem Eng 2021. [DOI: 10.1016/j.compchemeng.2021.107533] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/24/2022]
|
92
|
Thomas M, Boardman A, Garcia-Ortegon M, Yang H, de Graaf C, Bender A. Applications of Artificial Intelligence in Drug Design: Opportunities and Challenges. Methods in Molecular Biology (Clifton, N.J.) 2021; 2390:1-59. [PMID: 34731463 DOI: 10.1007/978-1-0716-1787-8_1] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 10/19/2022]
Abstract
Artificial intelligence (AI) has undergone rapid development in recent years and has been successfully applied to real-world problems such as drug design. In this chapter, we review recent applications of AI to problems in drug design including virtual screening, computer-aided synthesis planning, and de novo molecule generation, with a focus on the limitations of the application of AI therein and opportunities for improvement. Furthermore, we discuss the broader challenges imposed by AI in translating theoretical practice to real-world drug design, including quantifying prediction uncertainty and explaining model behavior.
Affiliation(s)
- Morgan Thomas
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Andrew Boardman
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Miguel Garcia-Ortegon
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK.,Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, UK
| | - Hongbin Yang
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | | | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK.
|
93
|
Caramelli D, Granda J, Mehr SHM, Cambié D, Henson AB, Cronin L. Discovering New Chemistry with an Autonomous Robotic Platform Driven by a Reactivity-Seeking Neural Network. ACS Central Science 2021; 7:1821-1830. [PMID: 34849401 PMCID: PMC8620554 DOI: 10.1021/acscentsci.1c00435] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/07/2021] [Indexed: 05/04/2023]
Abstract
We present a robotic chemical discovery system capable of navigating a chemical space based on a learned general association between molecular structures and reactivity, while incorporating a neural network model that can process data from online analytics and assess reactivity without knowing the identity of the reagents. Working in conjunction with this learned knowledge, our robotic platform is able to autonomously explore a large number of potential reactions and assess the reactivity of mixtures, including unknown chemical spaces, regardless of the identity of the starting materials. Through the system, we identified a range of chemical reactions and products, some of which were well-known, some new but predictable from known pathways, and some unpredictable reactions that yielded new molecules. The validation of the system was done within a budget of 15 inputs combined in 1018 reactions, further analysis of which allowed us to discover not only a new photochemical reaction but also a new reactivity mode for a well-known reagent (p-toluenesulfonylmethyl isocyanide, TosMIC). This involved the reaction of 6 equiv of TosMIC in a "multistep, single-substrate" cascade reaction yielding a trimeric product in high yield (47% unoptimized) with the formation of five new C-C bonds involving sp-sp² and sp-sp³ carbon centers. An analysis reveals that this transformation is intrinsically unpredictable, demonstrating the possibility of a reactivity-first robotic discovery of unknown reaction methodologies without requiring human input.
|
94
|
Wang Z, Zhang W, Liu B. Computational Analysis of Synthetic Planning: Past and Future. Chinese J Chem 2021. [DOI: 10.1002/cjoc.202100273] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Affiliation(s)
- Zhuang Wang
- Key Laboratory of Green Chemistry & Technology of Ministry of Education, College of Chemistry, Sichuan University, 29 Wangjiang Rd., Chengdu, Sichuan 610064 (China); Center for Molecular Discovery, Department of Chemistry, Boston University, 590 Commonwealth Ave., Boston, Massachusetts 02215, United States; Current Address: One Amgen Center Dr., Amgen Inc., Thousand Oaks, California 91320, United States
| | - Wenhan Zhang
- Key Laboratory of Green Chemistry & Technology of Ministry of Education, College of Chemistry, Sichuan University, 29 Wangjiang Rd., Chengdu, Sichuan 610064 (China); Center for Molecular Discovery, Department of Chemistry, Boston University, 590 Commonwealth Ave., Boston, Massachusetts 02215, United States; Current Address: One Amgen Center Dr., Amgen Inc., Thousand Oaks, California 91320, United States
| | - Bo Liu
- Key Laboratory of Green Chemistry & Technology of Ministry of Education, College of Chemistry, Sichuan University, 29 Wangjiang Rd., Chengdu, Sichuan 610064 (China); Center for Molecular Discovery, Department of Chemistry, Boston University, 590 Commonwealth Ave., Boston, Massachusetts 02215, United States; Current Address: One Amgen Center Dr., Amgen Inc., Thousand Oaks, California 91320, United States
|
95
|
Machine learning modelling of chemical reaction characteristics: yesterday, today, tomorrow. Mendeleev Communications 2021. [DOI: 10.1016/j.mencom.2021.11.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/23/2022]
|
96
|
Sharma S, Arya A, Cruz R, Cleaves II HJ. Automated Exploration of Prebiotic Chemical Reaction Space: Progress and Perspectives. Life (Basel) 2021; 11:1140. [PMID: 34833016 PMCID: PMC8624352 DOI: 10.3390/life11111140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/07/2021] [Revised: 10/15/2021] [Accepted: 10/18/2021] [Indexed: 12/12/2022] Open
Abstract
Prebiotic chemistry often involves the study of complex systems of chemical reactions that form large networks with a large number of diverse species. Such complex systems may have given rise to emergent phenomena that ultimately led to the origin of life on Earth. The environmental conditions and processes involved in this emergence may not be fully recapitulable, making it difficult for experimentalists to study prebiotic systems in laboratory simulations. Computational chemistry offers efficient ways to study such chemical systems and identify the ones most likely to display complex properties associated with life. Here, we review tools and techniques for modelling prebiotic chemical reaction networks and outline possible ways to identify self-replicating features that are central to many origin-of-life models.
Affiliation(s)
- Siddhant Sharma
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA; (S.S.); (A.A.); (R.C.)
- Department of Biochemistry, Deshbandhu College, University of Delhi, New Delhi 110019, India
- Department of Chemistry and Chemical Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
| | - Aayush Arya
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA; (S.S.); (A.A.); (R.C.)
- Department of Physics, Lovely Professional University, Jalandhar-Delhi GT Road, Phagwara 144001, India
| | - Romulo Cruz
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA; (S.S.); (A.A.); (R.C.)
- Big Data Laboratory, Information and Communications Technology Center (CTIC), National University of Engineering, Amaru 210, Lima 15333, Peru
| | - Henderson James Cleaves II
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA; (S.S.); (A.A.); (R.C.)
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo 152-8550, Japan
|
97
|
Chen S, Jung Y. Deep Retrosynthetic Reaction Prediction using Local Reactivity and Global Attention. JACS Au 2021; 1:1612-1620. [PMID: 34723264 PMCID: PMC8549044 DOI: 10.1021/jacsau.1c00246] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 06/02/2021] [Indexed: 05/10/2023]
Abstract
As a fundamental problem in chemistry, retrosynthesis aims at designing reaction pathways and intermediates for a target compound. The goal of artificial intelligence (AI)-aided retrosynthesis is to automate this process by learning from previous chemical reactions to make new predictions. Although several models have demonstrated their potential for automated retrosynthesis, there is still a significant need to further enhance the prediction accuracy to a more practical level. Here we propose a local retrosynthesis framework called LocalRetro, motivated by the chemical intuition that molecular changes occur mostly locally during chemical reactions. This differs from nearly all existing retrosynthesis methods, which suggest reactants based on the global structures of the molecules, often containing fine details not directly relevant to the reactions. This local concept yields local reaction templates involving atom and bond edits. Because remote functional groups can also affect the overall reaction path as a secondary aspect, the proposed locally encoded retrosynthesis model is further refined to account for the nonlocal effects of chemical reactions through a global attention mechanism. Our model shows a promising 89.5 and 99.2% round-trip accuracy at top-1 and top-5 predictions for the USPTO-50K dataset containing 50 016 reactions. We further demonstrate the validity of LocalRetro on a large dataset containing 479 035 reactions (USPTO-MIT) with comparable round-trip top-1 and top-5 accuracy of 87.0 and 97.4%, respectively. The practical application of the model is also demonstrated by correctly predicting the synthesis pathways of five drug candidate molecules from various literature.
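Round-trip accuracy can be illustrated with a short sketch. The reading assumed here is that a target counts as a hit when at least one of its top-k predicted reactant sets is mapped back to the original product by a forward reaction model; papers differ on the exact definition, so this is an assumption rather than the authors' metric. The `forward_model` callable and the toy strings are hypothetical.

```python
def round_trip_accuracy(products, predictions, forward_model, k):
    """Fraction of products recovered by the forward model from any top-k candidate.

    products:      list of target product strings
    predictions:   list of ranked reactant candidate lists, one per product
    forward_model: hypothetical callable mapping reactants to a predicted product
    """
    hits = 0
    for product, candidates in zip(products, predictions):
        # In practice both sides would be canonicalized SMILES before comparison.
        if any(forward_model(c) == product for c in candidates[:k]):
            hits += 1
    return hits / len(products)

# Toy forward model: a lookup table standing in for a trained predictor.
toy_forward = {"A.B": "P1", "C.D": "P1", "E.F": "P2", "G.H": "X"}.get

products = ["P1", "P2"]
predictions = [["A.B", "C.D"], ["G.H", "E.F"]]

top1 = round_trip_accuracy(products, predictions, toy_forward, k=1)
top2 = round_trip_accuracy(products, predictions, toy_forward, k=2)
```

Unlike exact-match accuracy, this metric credits `"C.D"` as well as `"A.B"` for product `"P1"`, since both routes are judged viable by the forward model; its reliability is therefore bounded by the forward model's own accuracy.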
|
98
|
Luan B, Huynh T. Crystal-structures-guided design of fragment-based drugs for inhibiting the main protease of SARS-CoV-2. Proteins 2021; 90:1081-1089. [PMID: 34636446 PMCID: PMC8661981 DOI: 10.1002/prot.26260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/23/2021] [Revised: 08/10/2021] [Accepted: 09/16/2021] [Indexed: 01/29/2023]
Abstract
Since the beginning of the COVID-19 pandemic, scientists across the globe have been racing to find a cure for the highly contagious infectious disease caused by the SARS-CoV-2 virus. Despite much promising ongoing progress, there is currently no FDA-approved drug to treat infected patients. Recently, the crowdsourcing of drug discovery for inhibiting the main protease (Mpro) of SARS-CoV-2 has yielded plenty of drug fragments resolved inside the active site of Mpro via crystallography. Following the principle of fragment-based drug design (FBDD), we are motivated to design a potent drug candidate (named B19) by merging three fragments: JFM, U0P, and HWH. Through extensive all-atom molecular dynamics simulation and molecular docking, we found that, among all designed candidates, B19 is the most stable inside the Mpro active site, and that the binding free energy of B19 is comparable to or even slightly better than that of a native protein ligand processed by Mpro. Our promising results suggest that B19 and its derivatives can potentially be efficacious drug candidates for COVID-19.
Affiliation(s)
- Binquan Luan
- Computational Biological Center, IBM Thomas J. Watson Research, New York, New York, USA
| | - Tien Huynh
- Computational Biological Center, IBM Thomas J. Watson Research, New York, New York, USA
|
99
|
Dong J, Zhao M, Liu Y, Su Y, Zeng X. Deep learning in retrosynthesis planning: datasets, models and tools. Brief Bioinform 2021; 23:6375056. [PMID: 34571535 DOI: 10.1093/bib/bbab391] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 07/06/2021] [Revised: 08/16/2021] [Accepted: 08/30/2021] [Indexed: 12/29/2022] Open
Abstract
In recent years, drug synthesis powered by artificial intelligence has brought great convenience to society. Since retrosynthetic analysis occupies an essential position in synthetic chemistry, it has received broad attention from researchers. In this review, we comprehensively summarize the development of retrosynthesis in the context of deep learning. This review covers all aspects of retrosynthesis, including datasets, models and tools. Specifically, we report representative models from academia, in addition to a detailed description of the available and stable platforms in industry. We also discuss the disadvantages of the existing models and outline potential future trends, so that more newcomers can quickly understand and participate in retrosynthesis planning.
Affiliation(s)
- Jingxin Dong
- College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
- Mingyi Zhao
- Department of Pediatrics, Third Xiangya Hospital, Central South University, 400013, Hunan, China
- Yuansheng Liu
- College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
- Yansen Su
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 230601, Hefei, China
- Xiangxiang Zeng
- College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
|
100
|
Kondo M. Developing a Generative Model Utilizing Self-attention Networks: Application to Materials/Drug Discovery. Mol Inform 2021; 40:e2100102. [PMID: 34432953 DOI: 10.1002/minf.202100102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2021] [Accepted: 08/09/2021] [Indexed: 11/11/2022]
Abstract
A new generative model is developed in which the Variational Autoencoder network is combined with the Transformer architecture. The proposed model, the Variational Autoencoding Transformer (VAT), is applied to the task of generating molecules. With proper training, the VAT model can not only reproduce molecules similar to the input ones with high accuracy but also generate new molecules from a predefined prior almost perfectly. A desirable aspect of the VAT is that no heuristic setting is necessary for optimal performance, which suggests that the model can readily be applied to a variety of datasets. As practical directions toward materials/drug discovery, two strategies are demonstrated: a fine-tuning method for directed molecular generation and a method of mixing molecules in the latent space.
Affiliation(s)
- Masakazu Kondo
- Ichihara Research Laboratories, JNC Petrochemical Corporation, 5-1, Goi Kaigan, 290-8551, Ichihara, Chiba, Japan
|