1
Fromer JC, Coley CW. An algorithmic framework for synthetic cost-aware decision making in molecular design. Nature Computational Science 2024; 4:440-450. [PMID: 38886590 DOI: 10.1038/s43588-024-00639-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 05/07/2024] [Indexed: 06/20/2024]
Abstract
Small molecules exhibiting desirable property profiles are often discovered through an iterative process of designing, synthesizing and testing sets of molecules. The selection of molecules to synthesize from all possible candidates is a complex decision-making process that typically relies on expert chemist intuition. Here we propose a quantitative decision-making framework, SPARROW, that prioritizes molecules for evaluation by balancing expected information gain and synthetic cost. SPARROW integrates molecular design, property prediction and retrosynthetic planning to balance the utility of testing a molecule with the cost of batch synthesis. We demonstrate, through three case studies, that the developed algorithm captures the non-additive costs inherent to batch synthesis, leverages common reaction steps and intermediates, and scales to hundreds of molecules.
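The trade-off described in this abstract can be illustrated with a toy sketch. This is not SPARROW's actual formulation: the candidate names, utility scores, reaction steps, step costs, and the `cost_weight` parameter are all hypothetical, and brute-force enumeration stands in for whatever optimizer the authors use. It only shows why batch cost is non-additive when routes share steps.

```python
from itertools import combinations

# Hypothetical toy data: each candidate has a utility score and one route,
# given as the set of reaction steps it needs. All names are made up.
candidates = {
    "mol_A": {"utility": 0.9, "steps": {"r1", "r2"}},
    "mol_B": {"utility": 0.8, "steps": {"r1", "r3"}},
    "mol_C": {"utility": 0.7, "steps": {"r4", "r5", "r6"}},
}
step_cost = {"r1": 1.0, "r2": 0.5, "r3": 0.5, "r4": 1.0, "r5": 1.0, "r6": 1.0}


def batch_cost(selection):
    """Cost of a batch: each distinct reaction step is paid for only once."""
    steps = set().union(*(candidates[m]["steps"] for m in selection))
    return sum(step_cost[s] for s in steps)


def score(selection, cost_weight):
    """Total utility of the batch minus its weighted synthetic cost."""
    utility = sum(candidates[m]["utility"] for m in selection)
    return utility - cost_weight * batch_cost(selection)


def best_batch(cost_weight):
    """Enumerate every subset of candidates and return the best-scoring one."""
    names = sorted(candidates)
    subsets = (c for k in range(len(names) + 1) for c in combinations(names, k))
    return max(subsets, key=lambda s: score(s, cost_weight))


if __name__ == "__main__":
    print(best_batch(cost_weight=0.3))
```

In this toy data, `mol_A` and `mol_B` share step `r1`, so making them together costs less than the sum of their individual costs. Exhaustive enumeration is only workable for a handful of candidates; scaling to hundreds of molecules, as the abstract reports, would need a proper optimization formulation.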
Affiliation(s)
- Jenna C Fromer
  - Department of Chemical Engineering, MIT, Cambridge, MA, USA
- Connor W Coley
  - Department of Chemical Engineering, MIT, Cambridge, MA, USA
  - Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
2
Bran AM, Cox S, Schilter O, Baldassari C, White AD, Schwaller P. Augmenting large language models with chemistry tools. Nat Mach Intell 2024; 6:525-535. [PMID: 38799228 PMCID: PMC11116106 DOI: 10.1038/s42256-024-00832-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 03/27/2024] [Indexed: 05/29/2024]
Abstract
Large language models (LLMs) have shown strong performance in tasks across domains but struggle with chemistry-related problems. These models also lack access to external knowledge sources, limiting their usefulness in scientific applications. We introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery and materials design. By integrating 18 expert-designed tools and using GPT-4 as the LLM, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our agent autonomously planned and executed the syntheses of an insect repellent and three organocatalysts and guided the discovery of a novel chromophore. Our evaluation, including both LLM and expert assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Our work not only aids expert chemists and lowers barriers for non-experts but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.
Affiliation(s)
- Andres M. Bran
  - Laboratory of Artificial Chemical Intelligence (LIAC), ISIC, EPFL, Lausanne, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, EPFL, Lausanne, Switzerland
- Sam Cox
  - Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
  - FutureHouse, San Francisco, CA, USA
- Oliver Schilter
  - Laboratory of Artificial Chemical Intelligence (LIAC), ISIC, EPFL, Lausanne, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, EPFL, Lausanne, Switzerland
  - Accelerated Discovery, IBM Research – Europe, Rüschlikon, Switzerland
- Carlo Baldassari
  - Accelerated Discovery, IBM Research – Europe, Rüschlikon, Switzerland
- Andrew D. White
  - Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
  - FutureHouse, San Francisco, CA, USA
- Philippe Schwaller
  - Laboratory of Artificial Chemical Intelligence (LIAC), ISIC, EPFL, Lausanne, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, EPFL, Lausanne, Switzerland
3
Strieth-Kalthoff F, Szymkuć S, Molga K, Aspuru-Guzik A, Glorius F, Grzybowski BA. Artificial Intelligence for Retrosynthetic Planning Needs Both Data and Expert Knowledge. J Am Chem Soc 2024. [PMID: 38598363 DOI: 10.1021/jacs.4c00338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2024]
Abstract
Rapid advancements in artificial intelligence (AI) have enabled breakthroughs across many scientific disciplines. In organic chemistry, the challenge of planning complex multistep chemical syntheses should conceptually be well-suited for AI. Yet, the development of AI synthesis planners trained solely on reaction-example-data has stagnated and is not on par with the performance of "hybrid" algorithms combining AI with expert knowledge. This Perspective examines possible causes of these shortcomings, extending beyond the established reasoning of insufficient quantities of reaction data. Drawing attention to the intricacies and data biases that are specific to the domain of synthetic chemistry, we advocate augmenting the unique capabilities of AI with the knowledge base and the reasoning strategies of domain experts. By actively involving synthetic chemists, who are the end users of any synthesis planning software, into the development process, we envision to bridge the gap between computer algorithms and the intricate nature of chemical synthesis.
Affiliation(s)
- Felix Strieth-Kalthoff
  - University of Toronto, Department of Chemistry and Department of Computer Science, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
  - University of Toronto, Department of Computer Science, 10 King's College Road, Toronto, Ontario M5S 3G4, Canada
- Sara Szymkuć
  - Allchemy, 2145 45th Street #201, Highland, Indiana 46322, United States
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
- Karol Molga
  - Allchemy, 2145 45th Street #201, Highland, Indiana 46322, United States
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
- Alán Aspuru-Guzik
  - University of Toronto, Department of Chemistry and Department of Computer Science, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
  - University of Toronto, Department of Computer Science, 10 King's College Road, Toronto, Ontario M5S 3G4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave., Toronto, Ontario M5G 1M1, Canada
  - University of Toronto, Department of Chemical Engineering and Applied Chemistry, 200 College St., Toronto, Ontario M5S 3E5, Canada
  - University of Toronto, Department of Materials Science and Engineering, 184 College St., Toronto, Ontario M5S 3E4, Canada
- Frank Glorius
  - Universität Münster, Organisch-Chemisches Institut, Corrensstr. 36, 48149 Münster, Germany
- Bartosz A Grzybowski
  - Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
  - IBS Center for Algorithmic and Robotized Synthesis, CARS, UNIST 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan 689-798, South Korea
  - Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan 689-798, South Korea
4
Pasquini M, Stenta M. LinChemIn: Route Arithmetic - Operations on Digital Synthetic Routes. J Chem Inf Model 2024; 64:1765-1771. [PMID: 38480486 DOI: 10.1021/acs.jcim.3c01819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024]
Abstract
Computational tools are revolutionizing our understanding and prediction of chemical reactivity by combining traditional data analysis techniques with new predictive models. These tools extract additional value from the reaction data corpus, but to effectively convert this value into actionable knowledge, domain specialists need to interact easily with the computer-generated output. In this application note, we demonstrate the capabilities of the open-source Python toolkit LinChemIn, which simplifies the manipulation of reaction networks and provides advanced functionality for working with synthetic routes. LinChemIn ensures chemical consistency when merging, editing, mining, and analyzing reaction networks. Its flexible input interface can process routes from various sources, including predictive models and expert input. The toolkit also efficiently extracts individual routes from the combined synthetic tree, identifying alternative paths and reaction combinations. By reducing the operational barrier to accessing and analyzing synthetic routes from multiple sources, LinChemIn facilitates a constructive interplay between artificial intelligence and human expertise.
Affiliation(s)
- Marta Pasquini
  - Syngenta Crop Protection AG, Schaffhauserstrasse, 4332 Stein, AG, Switzerland
- Marco Stenta
  - Syngenta Crop Protection AG, Schaffhauserstrasse, 4332 Stein, AG, Switzerland
5
Pham TT, Guo Z, Li B, Lapkin AA, Yan N. Synthesis of Pyrrole-2-Carboxylic Acid from Cellulose- and Chitin-Based Feedstocks Discovered by the Automated Route Search. ChemSusChem 2024; 17:e202300538. [PMID: 37792551 DOI: 10.1002/cssc.202300538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 10/02/2023] [Accepted: 10/04/2023] [Indexed: 10/06/2023]
Abstract
The shift towards sustainable feedstocks for platform chemicals requires new routes to access functional molecules that contain heteroatoms, but there are limited bio-derived feedstocks that lead to heteroatoms in platform chemicals. Combining renewable molecules of different origins could be a solution to optimize the use of atoms from renewable sources. However, the lack of retrosynthetic tools makes it challenging to examine the extensive reaction networks of various platform molecules focusing on multiple bio-based feedstocks. In this study, a protocol was developed to identify potential transformation pathways that allow for the use of feedstocks from different origins. By analyzing existing knowledge on chemical reactions in large databases, several promising synthetic routes were shortlisted, with the reaction of D-glucosamine and pyruvic acid being the most interesting to make pyrrole-2-carboxylic acid (PCA). The optimized synthetic conditions resulted in 50 % yield of PCA, with insights gained from temperature variant NMR studies. The use of substrates obtained from two different bio-feedstock bases, namely cellulose and chitin, allowed for the establishment of a PCA-based chemical space.
Affiliation(s)
- Thuy Trang Pham
  - Department of Chemical and Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, 117585, Singapore City, Singapore
- Zhen Guo
  - Cambridge Centre for Advanced Research and Education in Singapore (CARES Ltd), 1 CREATE Way, #05-05 Create Tower, 138602, Singapore City, Singapore
  - Chemical Data Intelligence (CDI) Pte Ltd, Robinson Road #02-00, 068898, Singapore City, Singapore
- Bing Li
  - Department of Chemical and Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, 117585, Singapore City, Singapore
- Alexei A Lapkin
  - Cambridge Centre for Advanced Research and Education in Singapore (CARES Ltd), 1 CREATE Way, #05-05 Create Tower, 138602, Singapore City, Singapore
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, CB3 0AS, UK
- Ning Yan
  - Department of Chemical and Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, 117585, Singapore City, Singapore
6
Türtscher PL, Reiher M. Pathfinder - Navigating and Analyzing Chemical Reaction Networks with an Efficient Graph-Based Approach. J Chem Inf Model 2023; 63:147-160. [PMID: 36515968 PMCID: PMC9832502 DOI: 10.1021/acs.jcim.2c01136] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 12/15/2022]
Abstract
While the field of first-principles explorations into chemical reaction space has been continuously growing, the development of strategies for analyzing resulting chemical reaction networks (CRNs) is lagging behind. A CRN consists of compounds linked by reactions. Analyzing how these compounds are transformed into one another based on kinetic modeling is a nontrivial task. Here, we present the graph-optimization-driven algorithm and program Pathfinder to allow for such an analysis of a CRN. The CRN for this work has been obtained with our open-source Chemoton reaction network exploration software. Chemoton probes reactive combinations of compounds for elementary steps and sorts them into reactions. Encoding these reactions of the CRN as a graph consisting of compound and reaction vertices and adding information about activation barriers as well as required reagents to the edges of the graph yields a complete graph-theoretical representation of the CRN. Since the probabilities of the formation of compounds depend on the starting conditions, the consumption of any compound during a reaction must be accounted for to reflect the availability of reagents. To account for this, we introduce compound costs to reflect compound availability. Simultaneously, the determined compound costs rank the compounds in the CRN in terms of their probability to be formed. This ranking then allows us to probe easily accessible compounds in the CRN first for further explorations into yet unexplored terrain. We first illustrate the working principle on an abstract small CRN. Afterward, Pathfinder is demonstrated in the example of the disproportionation of iodine with water and the comproportionation of iodic acid and hydrogen iodide. Both processes are analyzed within the same CRN, which we construct with our autonomous first-principles CRN exploration software Chemoton [Unsleber, J. P.; J. Chem. Theory Comput. 2022, 18, 5393-5409] guided by Pathfinder.
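The idea of compound costs can be sketched on a toy network. This is a simplified stand-in, not Pathfinder's algorithm or API: the reaction list, the `weight` values (standing in for barrier-derived edge information), the starting costs, and the rule "product cost = reaction weight + sum of reactant costs" are all assumptions made for illustration. Weights are assumed non-negative so that the relaxation loop terminates.

```python
import math

# Hypothetical toy network: each reaction lists reactants, products and a
# non-negative weight standing in for barrier-derived edge information.
reactions = [
    {"reactants": ["A", "B"], "products": ["C"], "weight": 1.0},
    {"reactants": ["C"], "products": ["D"], "weight": 2.0},
    {"reactants": ["A"], "products": ["D"], "weight": 5.0},
]


def compound_costs(reactions, start_costs):
    """Relax costs until stable: a product costs the reaction weight plus
    the summed costs of all reactants the reaction consumes."""
    costs = dict(start_costs)
    changed = True
    while changed:
        changed = False
        for rxn in reactions:
            # Skip reactions whose reactants are not yet reachable.
            if any(r not in costs for r in rxn["reactants"]):
                continue
            total = rxn["weight"] + sum(costs[r] for r in rxn["reactants"])
            for product in rxn["products"]:
                if total < costs.get(product, math.inf):
                    costs[product] = total
                    changed = True
    return costs


if __name__ == "__main__":
    costs = compound_costs(reactions, {"A": 1.0, "B": 1.0})
    # Lower cost means easier to reach from the starting compounds.
    for name in sorted(costs, key=lambda c: (costs[c], c)):
        print(name, costs[name])
```

Sorting compounds by the resulting cost gives a ranking of accessibility in the spirit of the abstract: here `D` is reached more cheaply through `C` than directly from `A`, even though the direct route has fewer steps.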
7
Wen M, Spotte-Smith EWC, Blau SM, McDermott MJ, Krishnapriyan AS, Persson KA. Chemical reaction networks and opportunities for machine learning. Nature Computational Science 2023; 3:12-24. [PMID: 38177958 DOI: 10.1038/s43588-022-00369-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 06/24/2022] [Accepted: 11/08/2022] [Indexed: 01/06/2024]
Abstract
Chemical reaction networks (CRNs), defined by sets of species and possible reactions between them, are widely used to interrogate chemical systems. To capture increasingly complex phenomena, CRNs can be leveraged alongside data-driven methods and machine learning (ML). In this Perspective, we assess the diverse strategies available for CRN construction and analysis in pursuit of a wide range of scientific goals, discuss ML techniques currently being applied to CRNs and outline future CRN-ML approaches, presenting scientific and technical challenges to overcome.
Affiliation(s)
- Mingjian Wen
  - Chemical and Biomolecular Engineering, University of Houston, Houston, TX, USA
  - Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Evan Walter Clark Spotte-Smith
  - Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
- Samuel M Blau
  - Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Matthew J McDermott
  - Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
- Aditi S Krishnapriyan
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, CA, USA
  - Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, USA
- Kristin A Persson
  - Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
  - Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
8
Seifrid M, Pollice R, Aguilar-Granda A, Morgan Chan Z, Hotta K, Ser CT, Vestfrid J, Wu TC, Aspuru-Guzik A. Autonomous Chemical Experiments: Challenges and Perspectives on Establishing a Self-Driving Lab. Acc Chem Res 2022; 55:2454-2466. [PMID: 35948428 PMCID: PMC9454899 DOI: 10.1021/acs.accounts.2c00220] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 04/11/2022] [Indexed: 01/19/2023]
Abstract
We must accelerate the pace at which we make technological advancements to address climate change and disease risks worldwide. This swifter pace of discovery requires faster research and development cycles enabled by better integration between hypothesis generation, design, experimentation, and data analysis. Typical research cycles take months to years. However, data-driven automated laboratories, or self-driving laboratories, can significantly accelerate molecular and materials discovery. Recently, substantial advancements have been made in the areas of machine learning and optimization algorithms that have allowed researchers to extract valuable knowledge from multidimensional data sets. Machine learning models can be trained on large data sets from the literature or databases, but their performance can often be hampered by a lack of negative results or metadata. In contrast, data generated by self-driving laboratories can be information-rich, containing precise details of the experimental conditions and metadata. Consequently, much larger amounts of high-quality data are gathered in self-driving laboratories. When placed in open repositories, this data can be used by the research community to reproduce experiments, for more in-depth analysis, or as the basis for further investigation. Accordingly, high-quality open data sets will increase the accessibility and reproducibility of science, which is sorely needed.
In this Account, we describe our efforts to build a self-driving lab for the development of a new class of materials: organic semiconductor lasers (OSLs). Since they have only recently been demonstrated, little is known about the molecular and material design rules for thin-film, electrically-pumped OSL devices as compared to other technologies such as organic light-emitting diodes or organic photovoltaics.
To realize high-performing OSL materials, we are developing a flexible system for automated synthesis via iterative Suzuki-Miyaura cross-coupling reactions. This automated synthesis platform is directly coupled to the analysis and purification capabilities. Subsequently, the molecules of interest can be transferred to an optical characterization setup. We are currently limited to optical measurements of the OSL molecules in solution. However, material properties are ultimately most important in the solid state (e.g., as a thin-film device). To that end and for a different scientific goal, we are developing a self-driving lab for inorganic thin-film materials focused on the oxygen evolution reaction.
While the future of self-driving laboratories is very promising, numerous challenges still need to be overcome. These challenges can be split into cognition and motor function. Generally, the cognitive challenges are related to optimization with constraints or unexpected outcomes for which general algorithmic solutions have yet to be developed. A more practical challenge that could be resolved in the near future is that of software control and integration because few instrument manufacturers design their products with self-driving laboratories in mind. Challenges in motor function are largely related to handling heterogeneous systems, such as dispensing solids or performing extractions. As a result, it is critical to understand that adapting experimental procedures that were designed for human experimenters is not as simple as transferring those same actions to an automated system, and there may be more efficient ways to achieve the same goal in an automated fashion. Accordingly, for self-driving laboratories, we need to carefully rethink the translation of manual experimental protocols.
Affiliation(s)
- Martin Seifrid
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Robert Pollice
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Zamyla Morgan Chan
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
  - Acceleration Consortium, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Kazuhiro Hotta
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
  - Science & Innovation Center, Mitsubishi Chemical Corporation, 1000 Kamoshidacho, Aoba, Yokohama, Kanagawa 227-8502, Japan
- Cher Tian Ser
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Jenya Vestfrid
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Tony C. Wu
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Alán Aspuru-Guzik
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3H6, Canada
  - Department of Chemical Engineering & Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
  - Department of Materials Science, University of Toronto, Toronto, Ontario M5S 3E4, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5S 1M1, Canada
  - Lebovic Fellow, Canadian Institute for Advanced Research, Toronto, Ontario M5S 1M1, Canada