1
Kulichenko M, Nebgen B, Lubbers N, Smith JS, Barros K, Allen AEA, Habib A, Shinkle E, Fedik N, Li YW, Messerly RA, Tretiak S. Data Generation for Machine Learning Interatomic Potentials and Beyond. Chem Rev 2024; 124:13681-13714. [PMID: 39572011 DOI: 10.1021/acs.chemrev.4c00572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/26/2024]
Abstract
The field of data-driven chemistry is undergoing an evolution, driven by innovations in machine learning models for predicting molecular properties and behavior. Recent strides in ML-based interatomic potentials have paved the way for accurate modeling of diverse chemical and structural properties at the atomic level. The key determinant defining MLIP reliability remains the quality of the training data. A paramount challenge lies in constructing training sets that capture specific domains in the vast chemical and structural space. This Review navigates the intricate landscape of essential components and integrity of training data that ensure the extensibility and transferability of the resulting models. We delve into the details of active learning, discussing its various facets and implementations. We outline different types of uncertainty quantification applied to atomistic data acquisition and the correlations between estimated uncertainty and true error. The role of atomistic data samplers in generating diverse and informative structures is highlighted. Furthermore, we discuss data acquisition via modified and surrogate potential energy surfaces as an innovative approach to diversify training data. The Review also provides a list of publicly available data sets that cover essential domains of chemical space.
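One common form of the uncertainty quantification mentioned in this abstract is disagreement within an ensemble of models: structures on which the members disagree are sent for reference calculations. The following is a minimal sketch under that assumption; the function name, data layout, threshold, and toy energies are hypothetical and are not taken from the Review.

```python
import statistics


def select_for_labeling(ensemble_predictions, threshold):
    """Return ids of structures whose ensemble disagreement exceeds `threshold`.

    `ensemble_predictions` maps a structure id to the energies predicted by
    each ensemble member (hypothetical layout, same units as `threshold`).
    """
    selected = []
    for structure_id, energies in ensemble_predictions.items():
        # Sample standard deviation across members as the uncertainty proxy.
        if statistics.stdev(energies) > threshold:
            selected.append(structure_id)
    return selected


# Toy predictions from a three-member ensemble (made-up numbers).
toy_predictions = {
    "near_training_data": [-1.00, -1.01, -0.99],
    "far_from_training_data": [-0.20, -0.90, -1.60],
}
to_label = select_for_labeling(toy_predictions, threshold=0.1)
```

In an active learning loop, the selected structures would be labeled with the reference method, added to the training set, and the ensemble retrained before the next sampling round.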
Affiliation(s)
- Maksim Kulichenko
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Benjamin Nebgen
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Nicholas Lubbers
  - Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Justin S Smith
  - NVIDIA Corporation, Santa Clara, California 95051, United States
- Kipton Barros
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Alice E A Allen
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Adela Habib
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Emily Shinkle
  - Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Nikita Fedik
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Ying Wai Li
  - Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Richard A Messerly
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Sergei Tretiak
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
2
Li XT, Mi S, Xu Y, Li BW, Zhu T, Zhang JZH. Discovery of New Synthetic Routes of Amino Acids in Prebiotic Chemistry. JACS Au 2024; 4:4757-4768. [PMID: 39735912 PMCID: PMC11672127 DOI: 10.1021/jacsau.4c00685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Revised: 11/07/2024] [Accepted: 11/08/2024] [Indexed: 12/31/2024]
Abstract
The origin of life on Earth remains one of the most perplexing challenges in biochemistry. While numerous bottom-up experiments under prebiotic conditions have provided valuable insights into the spontaneous chemical genesis of life, there remains a significant gap in the theoretical understanding of the complex reaction processes involved. In this study, we propose a novel approach using a roto-translationally invariant potential (RTIP) formulated with pristine Cartesian coordinates to facilitate the simulation of chemical reactions. By employing RTIP pathway sampling to explore the reactivity of primitive molecules, we identified several low-energy reaction mechanisms, such as two-hydrogen-transfer hydrogenation and HCOOH-catalyzed hydration and amination. This led to the construction of a comprehensive reaction network, illustrating the synthesis pathways for glycine, serine, and alanine. Further thermodynamic analysis highlights the pivotal role of formaldimine as a key precursor in amino acid synthesis, owing to its more favorable reactivity in coupling reactions compared to the traditionally recognized hydrogen cyanide. Our study demonstrates that the RTIP methodology, coupled with a divide-and-conquer strategy, provides new insights into the simulation of complex reaction processes, offering promising applications for advancing organic design and synthesis.
Affiliation(s)
- Xiao-Tian Li
  - Faculty of Synthetic Biology, Shenzhen University of Advanced Technology, Shenzhen 518055, China
- Sixuan Mi
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- Yuzhi Xu
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- Bo-Wen Li
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- Tong Zhu
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
  - Shanghai Innovation Institute, Shanghai 200003, China
- John Z. H. Zhang
  - Faculty of Synthetic Biology, Shenzhen University of Advanced Technology, Shenzhen 518055, China
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan 030006, Shanxi, China
3
Lin X, Chang X, Zhang Y, Gao Z, Chi X. Automatic construction of Petri net models for computational simulations of molecular interaction network. NPJ Syst Biol Appl 2024; 10:131. [PMID: 39521772 PMCID: PMC11550427 DOI: 10.1038/s41540-024-00464-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Accepted: 10/30/2024] [Indexed: 11/16/2024] Open
Abstract
Petri nets are commonly applied in modeling biological systems. However, constructing a Petri net model for a complex biological system is often time-consuming and requires expertise in the research area, which limits their application. To address this challenge, we developed GINtoSPN, an R package that automates the conversion of multi-omics molecular interaction networks extracted from the Global Integrative Network (GIN) into Petri nets in GraphML format. These GraphML files can be directly used for Signaling Petri Net (SPN) simulation. To demonstrate the utility of this tool, we built a Petri net model for neurofibromatosis type I. Simulation of NF1 gene knockout, compared to normal skin fibroblast cells, revealed persistent accumulation of Ras-GTPs as expected. Additionally, we identified several other genes substantially affected by the loss of NF1's function, exhibiting individual-specific variability. These results highlight the effectiveness of GINtoSPN in streamlining the modeling and simulation of complex biological systems.
Affiliation(s)
- Xuefei Lin
  - Department of Dermatology and Venereal Disease, Xuan Wu Hospital, Beijing, China
- Xiao Chang
  - Department of Dermatology and Venereal Disease, Xuan Wu Hospital, Beijing, China
- Yizheng Zhang
  - China National Center for Bioinformation, Beijing, China
  - CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhanyu Gao
  - China National Center for Bioinformation, Beijing, China
  - CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
  - HKU Li Ka Shing Faculty of Medicine, Hong Kong, China
- Xu Chi
  - China National Center for Bioinformation, Beijing, China
  - CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
4
Stuyver T. TS-tools: Rapid and automated localization of transition states based on a textual reaction SMILES input. J Comput Chem 2024; 45:2308-2317. [PMID: 38850166 DOI: 10.1002/jcc.27374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 03/08/2024] [Accepted: 03/20/2024] [Indexed: 06/10/2024]
Abstract
Here, TS-tools is presented, a Python package facilitating the automated localization of transition states (TS) based on a textual reaction SMILES input. TS searches can be performed at either the xTB or the DFT level of theory, with the former yielding guesses at marginal computational cost, and the latter directly yielding accurate structures at greater expense. On a benchmarking dataset of mono- and bimolecular reactions, TS-tools reaches an excellent success rate of 95% already at the xTB level of theory. For tri- and multimolecular reaction pathways - which are typically not benchmarked when developing new automated TS search approaches, yet are relevant for various types of reactivity, cf. solvent- and autocatalysis and enzymatic reactivity - TS-tools retains its ability to identify TS geometries, though a DFT treatment becomes essential in many cases. Throughout the presented applications, a particular emphasis is placed on solvation-induced mechanistic changes, another issue that has received limited attention in the automated TS search literature so far.
Affiliation(s)
- Thijs Stuyver
  - Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, Institute of Chemistry for Life and Health Sciences, Paris, France
5
Stulajter MM, Rappoport D. Reaction Networks Resemble Low-Dimensional Regular Lattices. J Chem Theory Comput 2024. [PMID: 39236261 DOI: 10.1021/acs.jctc.4c00810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2024]
Abstract
The computational exploration, manipulation, and design of complex chemical reactions face fundamental challenges related to the high-dimensional nature of potential energy surfaces (PESs) that govern reactivity. Accurately modeling complex reactions is crucial for understanding the chemical processes involved in, for example, organocatalysis, autocatalytic cycles, and one-pot molecular assembly. Our prior research demonstrated that discretizing PESs using heuristics based on bond breaking and bond formation produces a reaction network representation with a low-dimensional structure (metric space). We now find that these stoichiometry-preserving reaction networks possess additional, though approximate, structure and resemble low-dimensional regular lattices with a small amount of random edge rewiring. The heuristics-based discretization thus generates a nonlinear dimensionality reduction by a factor of 10 with an a posteriori error measure (probability of random rewiring). The structure becomes evident through a comparative analysis of CHNO reaction networks of varying stoichiometries against a panel of size-matched generative network models, taking into account their local, metric, and global properties. The generative models include random networks (Erdős-Rényi and bipartite random networks), regular lattices (periodic and nonperiodic), and network models with a tunable level of "randomness" (Watts-Strogatz graphs and regular lattices with random rewiring). The CHNO networks are simultaneously closely matched in all these properties by 3-4-dimensional regular lattices with 10% or less of edges randomly rewired. The effective dimensionality reduction is found to be independent of the system size, stoichiometry, and ruleset, suggesting that search and sampling algorithms for PESs of complex chemical reactions can be effectively leveraged.
Affiliation(s)
- Miko M Stulajter
  - Department of Chemistry, University of California Irvine, Irvine, California 92697, United States
  - Computational Science Research Center, San Diego State University, San Diego, California 92182, United States
- Dmitrij Rappoport
  - Department of Chemistry, University of California Irvine, Irvine, California 92697, United States
6
Laplaza R, Wodrich MD, Corminboeuf C. Overcoming the Pitfalls of Computing Reaction Selectivity from Ensembles of Transition States. J Phys Chem Lett 2024; 15:7363-7370. [PMID: 38990895 DOI: 10.1021/acs.jpclett.4c01657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]
Abstract
The prediction of reaction selectivity is a challenging task for computational chemistry, not only because many molecules adopt multiple conformations but also due to the exponential relationship between effective activation energies and rate constants. To account for molecular flexibility, an increasing number of methods exist that generate conformational ensembles of transition state (TS) structures. Typically, these TS ensembles are Boltzmann weighted and used to compute selectivity assuming Curtin-Hammett conditions. This strategy, however, can lead to erroneous predictions if the appropriate filtering of the conformer ensembles is not conducted. Here, we demonstrate how any possible selectivity can be obtained by processing the same sets of TS ensembles for a model reaction. To address the burdensome filtering task in a consistent and automated way, we introduce marc, a tool for the modular analysis of representative conformers that aids in avoiding human errors while minimizing the number of reoptimization computations needed to obtain correct reaction selectivity.
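The Boltzmann-weighting step described in this abstract can be illustrated with a short sketch. It assumes Curtin-Hammett conditions and relative TS free energies in kcal/mol; the function name and the energies are made up for illustration, and the code is not the marc tool.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def selectivity(ts_energies_a, ts_energies_b, temperature=298.15):
    """Fraction of product A under Curtin-Hammett conditions.

    Each argument lists relative TS free energies (kcal/mol) for the
    conformers leading to one product. Hypothetical helper, not part of marc.
    """
    rt = R_KCAL * temperature
    # Shift by the global minimum so the exponentials stay well-scaled.
    e_min = min(ts_energies_a + ts_energies_b)
    weight_a = sum(math.exp(-(e - e_min) / rt) for e in ts_energies_a)
    weight_b = sum(math.exp(-(e - e_min) / rt) for e in ts_energies_b)
    return weight_a / (weight_a + weight_b)


# Made-up ensembles: a duplicated conformer in B inflates its weight,
# which is the kind of filtering problem the abstract warns about.
clean = selectivity([0.0, 1.0], [0.5])
with_duplicate = selectivity([0.0, 1.0], [0.5, 0.5])
```

Comparing `clean` with `with_duplicate` shows how an unfiltered duplicate shifts the predicted selectivity even though no energy has changed.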
Affiliation(s)
- Ruben Laplaza
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Matthew D Wodrich
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Clemence Corminboeuf
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
7
Gilkes J, Storr MT, Maurer RJ, Habershon S. Predicting Long-Time-Scale Kinetics under Variable Experimental Conditions with Kinetica.jl. J Chem Theory Comput 2024; 20:5196-5214. [PMID: 38829777 PMCID: PMC11209948 DOI: 10.1021/acs.jctc.4c00333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 05/13/2024] [Accepted: 05/13/2024] [Indexed: 06/05/2024]
Abstract
Predicting the degradation processes of molecules over long time scales is a key aspect of industrial materials design. However, it is made computationally challenging by the need to construct large networks of chemical reactions that are relevant to the experimental conditions that kinetic models must mirror, with every reaction requiring accurate kinetic data. Here, we showcase Kinetica.jl, a new software package for constructing large-scale chemical reaction networks in a fully automated fashion by exploring chemical reaction space with a kinetics-driven algorithm; coupled to efficient machine-learning models of activation energies for sampled elementary reactions, we show how this approach readily enables generation and kinetic characterization of networks containing ∼10³ chemical species and ≃10⁴-10⁵ reactions. Symbolic-numeric modeling of the generated reaction networks is used to allow for flexible, efficient computation of kinetic profiles under experimentally realizable conditions such as continuously variable temperature regimes, enabling direct connection between bottom-up reaction networks and experimental observations. Highly efficient propagation of long-time-scale kinetic profiles is required for automated reaction network refinement and is enabled here by a new discrete kinetic approximation. The resulting Kinetica.jl simulation package therefore enables automated generation, characterization, and long-time-scale modeling of complex chemical reaction systems. We demonstrate this for hydrocarbon pyrolysis simulated over time scales of seconds, using transient temperature profiles representing those of tubular flow reactor experiments.
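To make the idea of kinetics under a continuously variable temperature concrete, here is a minimal sketch for a single first-order reaction with an Arrhenius rate constant. It is written in Python rather than Julia, it does not use Kinetica.jl or reproduce its discrete kinetic approximation, and the prefactor, activation energy, and temperature ramp are made-up values.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)


def arrhenius(prefactor, activation_energy, temperature):
    """Arrhenius rate constant k = A * exp(-Ea / (R * T))."""
    return prefactor * math.exp(-activation_energy / (R * temperature))


def integrate_first_order(c0, prefactor, activation_energy,
                          temperature_at, t_end, steps=10000):
    """Integrate dc/dt = -k(T(t)) * c for a time-dependent temperature.

    Within each small step the rate constant is frozen at the midpoint
    temperature and the exact exponential decay is applied.
    """
    dt = t_end / steps
    c = c0
    for i in range(steps):
        t_mid = (i + 0.5) * dt
        k = arrhenius(prefactor, activation_energy, temperature_at(t_mid))
        c *= math.exp(-k * dt)
    return c


# Hypothetical parameters: A = 1e13 1/s, Ea = 200 kJ/mol, 1 s simulation.
A, EA, T_END = 1.0e13, 200.0e3, 1.0
isothermal = integrate_first_order(1.0, A, EA, lambda t: 1000.0, T_END)
ramped = integrate_first_order(1.0, A, EA, lambda t: 1000.0 + 100.0 * t, T_END)
```

Because the ramp only raises the temperature above the isothermal case, `ramped` leaves less reactant than `isothermal`.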
Affiliation(s)
- Joe Gilkes
  - Department of Chemistry, University of Warwick, Gibbet Hill Road, CV4 7AL Coventry, U.K.
  - EPSRC HetSys Centre for Doctoral Training, University of Warwick, Gibbet Hill Rd, CV4 7AL Coventry, U.K.
- Reinhard J. Maurer
  - Department of Chemistry, University of Warwick, Gibbet Hill Road, CV4 7AL Coventry, U.K.
  - Department of Physics, University of Warwick, Gibbet Hill Road, CV4 7AL Coventry, U.K.
- Scott Habershon
  - Department of Chemistry, University of Warwick, Gibbet Hill Road, CV4 7AL Coventry, U.K.
8
Csizi KS, Steiner M, Reiher M. Nanoscale chemical reaction exploration with a quantum magnifying glass. Nat Commun 2024; 15:5320. [PMID: 38909029 PMCID: PMC11193806 DOI: 10.1038/s41467-024-49594-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 06/04/2024] [Indexed: 06/24/2024] Open
Abstract
Nanoscopic systems exhibit diverse molecular substructures by which they facilitate specific functions. Theoretical models of them, which aim at describing, understanding, and predicting these capabilities, are difficult to build. Viable quantum-classical hybrid models come with specific challenges regarding atomistic structure construction and quantum region selection. Moreover, if their dynamics are mapped onto a state-to-state mechanism such as a chemical reaction network, its exhaustive exploration will be impossible due to the combinatorial explosion of the reaction space. Here, we introduce a "quantum magnifying glass" that allows one to interactively manipulate nanoscale structures at the quantum level. The quantum magnifying glass seamlessly combines autonomous model parametrization, ultra-fast quantum mechanical calculations, and automated reaction exploration. It represents an approach to investigate complex reaction sequences in a physically consistent manner with unprecedented effortlessness in real time. We demonstrate these features for reactions in bio-macromolecules and metal-organic frameworks, diverse systems that highlight general applicability.
Affiliation(s)
- Katja-Sophia Csizi
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland
- Miguel Steiner
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland
  - ETH Zurich, NCCR Catalysis, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland
- Markus Reiher
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland
  - ETH Zurich, NCCR Catalysis, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland
9
Steiner M, Reiher M. A human-machine interface for automatic exploration of chemical reaction networks. Nat Commun 2024; 15:3680. [PMID: 38693117 PMCID: PMC11063077 DOI: 10.1038/s41467-024-47997-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 04/15/2024] [Indexed: 05/03/2024] Open
Abstract
Autonomous reaction network exploration algorithms offer a systematic approach to explore mechanisms of complex chemical processes. However, the resulting reaction networks are so vast that an exploration of all potentially accessible intermediates is computationally too demanding. This renders brute-force explorations unfeasible, while explorations with completely pre-defined intermediates or hard-wired chemical constraints, such as element-specific coordination numbers, are not flexible enough for complex chemical systems. Here, we introduce a STEERING WHEEL to guide an otherwise unbiased automated exploration. The STEERING WHEEL algorithm is intuitive, generally applicable, and enables one to focus on specific regions of an emerging network. It also allows for guiding automated data generation in the context of mechanism exploration, catalyst design, and other chemical optimization challenges. The algorithm is demonstrated for reaction mechanism elucidation of transition metal catalysts. We highlight how to explore catalytic cycles in a systematic and reproducible way. The exploration objectives are fully adjustable, allowing one to harness the STEERING WHEEL for both structure-specific (accurate) calculations as well as for broad high-throughput screening of possible reaction intermediates.
Affiliation(s)
- Miguel Steiner
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland
  - ETH Zurich, NCCR Catalysis, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland
- Markus Reiher
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland
  - ETH Zurich, NCCR Catalysis, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland
10
Hwang J, Zhao Q, Ahmed M, Yakisan AC, Espenship MF, Laskin J, Savoie BM, Mei J. Reductive Doping Inhibits the Formation of Isomerization-Derived Structural Defects in N-doped Poly(benzodifurandione) (n-PBDF). Angew Chem Int Ed Engl 2024; 63:e202401465. [PMID: 38346013 DOI: 10.1002/anie.202401465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Indexed: 03/28/2024]
Abstract
Recently, solution-processable n-doped poly(benzodifurandione) (n-PBDF) has been made through in-situ oxidative polymerization and reductive doping, and it exhibits exceptionally high electrical conductivity and optical transparency. The discovery of n-PBDF is considered a breakthrough in the field of organic semiconductors. In the initial report, the possibility of structural defect formation in n-PBDF was proposed, based on the observation of structural isomerization from (E)-2H,2'H-[3,3'-bibenzofuranylidene]-2,2'-dione (isoxindigo) to chromeno[4,3-c]chromene-5,11-dione (dibenzonaphthyrone) in the dimer model reactions. In this study, we present clear evidence that structural isomerization is inhibited during polymerization. We reveal that the dimer (BFD1) and the trimer (BFD2) can be reductively doped by several mechanisms, including hydride transfer, forming charge transfer complexes (CTC) or undergoing an integer charge transfer (ICT) with reactants available during polymerization. Once the hydride transfer adducts, the CTC, or the ICT product forms, structural isomerization can be effectively prevented even at elevated temperatures. Our findings provide a mechanistic understanding of why isomerization-derived structural defects are absent in the n-PBDF backbone. This lays a solid foundation for the future development of n-PBDF as a benchmark polymer for organic electronics and beyond.
Affiliation(s)
- Jinhyo Hwang
  - Department of Chemistry, Purdue University, 47907, West Lafayette, IN, USA
- Qiyuan Zhao
  - Davidson School of Chemical Engineering, Purdue University, 47907, West Lafayette, IN, USA
- Mustafa Ahmed
  - Department of Chemistry, Purdue University, 47907, West Lafayette, IN, USA
- Julia Laskin
  - Department of Chemistry, Purdue University, 47907, West Lafayette, IN, USA
- Brett M Savoie
  - Davidson School of Chemical Engineering, Purdue University, 47907, West Lafayette, IN, USA
- Jianguo Mei
  - Department of Chemistry, Purdue University, 47907, West Lafayette, IN, USA
11
Vadaddi SM, Zhao Q, Savoie BM. Graph to Activation Energy Models Easily Reach Irreducible Errors but Show Limited Transferability. J Phys Chem A 2024; 128:2543-2555. [PMID: 38517281 DOI: 10.1021/acs.jpca.3c07240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Abstract
Activation energy characterization of competing reactions is a costly but crucial step for understanding the kinetic relevance of distinct reaction pathways, product yields, and myriad other properties of reacting systems. The standard methodology for activation energy characterization has historically been a transition state search using the highest level of theory that can be afforded. However, recently, several groups have popularized the idea of predicting activation energies directly based on nothing more than the reactant and product graphs, a sufficiently complex neural network, and a broad enough data set. Here, we have revisited this task using the recently developed Reaction Graph Depth 1 (RGD1) transition state data set and several newly developed graph attention architectures. All of these new architectures achieve similar state-of-the-art results of ∼4 kcal/mol mean absolute error on withheld testing sets of reactions but poor performance on external testing sets composed of reactions with differing mechanisms, reaction molecularity, or reactant size distribution. Limited transferability is also shown to be shared by other contemporary graph to activation energy architectures through a series of case studies. We conclude that an array of standard graph architectures can already achieve results comparable to the irreducible error of available reaction data sets but that out-of-distribution performance remains poor.
Affiliation(s)
- Sai Mahit Vadaddi
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
- Qiyuan Zhao
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Brett M Savoie
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
12
Zhang R, Mahjour B, Outlaw A, McGrath A, Hopper T, Kelley B, Walters WP, Cernak T. Exploring the combinatorial explosion of amine-acid reaction space via graph editing. Commun Chem 2024; 7:22. [PMID: 38310120 PMCID: PMC10838272 DOI: 10.1038/s42004-024-01101-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Accepted: 01/08/2024] [Indexed: 02/05/2024] Open
Abstract
Amines and carboxylic acids are abundant chemical feedstocks that are nearly exclusively united via the amide coupling reaction. The disproportionate use of the amide coupling leaves a large section of unexplored reaction space between amines and acids: two of the most common chemical building blocks. Herein we conduct a thorough exploration of amine-acid reaction space via systematic enumeration of reactions involving a simple amine-carboxylic acid pair. This approach to chemical space exploration investigates the coarse and fine modulation of physicochemical properties and molecular shapes. With the invention of reaction methods becoming increasingly automated and bringing conceptual reactions into reality, our map provides an entirely new axis of chemical space exploration for rational property design.
Affiliation(s)
- Rui Zhang
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, USA
- Babak Mahjour
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Andrew Outlaw
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Andrew McGrath
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Tim Cernak
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, USA
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
13
Kopp WA, Huang C, Zhao Y, Yu P, Schmalz F, Krep L, Leonhard K. Automatic Potential Energy Surface Exploration by Accelerated Reactive Molecular Dynamics Simulations: From Pyrolysis to Oxidation Chemistry. J Phys Chem A 2023; 127:10681-10692. [PMID: 38059461 DOI: 10.1021/acs.jpca.3c05253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/08/2023]
Abstract
Automatic potential energy surface (PES) exploration is important to a better understanding of reaction mechanisms. Existing automatic PES mapping tools usually rely on predefined knowledge or computationally expensive on-the-fly quantum-chemical calculations. In this work, we have developed the PESmapping algorithm for discovering novel reaction pathways and automatically mapping out the PES when merely one starting species is present. The algorithm explores the unknown PES by iteratively spawning new reactive molecular dynamics (RMD) simulations for species that it has detected within previous RMD simulations. We have therefore extended the RMD simulation tool ChemTraYzer2.1 (Chemical Trajectory Analyzer, CTY) for this PESmapping algorithm. It can generate new seed species, automatically start replica simulations for new pathways, and stop the simulation when a reaction is found, reducing the computational cost of the algorithm. To explore PESs with low-temperature reactions, we applied the acceleration method collective variable (CV)-driven hyperdynamics. This involved the development of tailored CV templates, which are discussed in this study. We validate our approach for known pathways in various pyrolysis and oxidation systems: hydrocarbon isomerization and dissociation (C4H7 and C8H7 PES), mostly dominant at high temperatures, and low-temperature oxidation of n-butane (C4H9O2 PES) and cyclohexane (C6H11O2 PES). As a result, in addition to new pathways showing up in the simulations, common isomerization and dissociation pathways were found very quickly: for example, 44 reactions of butenyl radicals including major isomerizations and decompositions within about 30 min wall time and low-temperature chemistry such as the internal H-shift of RO2 → QO2H within 1 day wall time. Last, we applied PESmapping to the oxidation of the recently proposed biohybrid fuel 1,3-dioxane and validated that the tool could be used to discover new reaction pathways of larger molecules that are of practical use.
Affiliation(s)
- Wassja A Kopp
- Institute of Technical Thermodynamics, RWTH Aachen University, 52062 Aachen, Germany
- Can Huang
- Institute of Technical Thermodynamics, RWTH Aachen University, 52062 Aachen, Germany
- Yuqing Zhao
- Institute of Technical Thermodynamics, RWTH Aachen University, 52062 Aachen, Germany
- Peiyang Yu
- Institute of Technical Thermodynamics, RWTH Aachen University, 52062 Aachen, Germany
- Felix Schmalz
- Institute of Technical Thermodynamics, RWTH Aachen University, 52062 Aachen, Germany
- Lukas Krep
- Institute of Technical Thermodynamics, RWTH Aachen University, 52062 Aachen, Germany
- Kai Leonhard
- Institute of Technical Thermodynamics, RWTH Aachen University, 52062 Aachen, Germany
14
Duan C, Du Y, Jia H, Kulik HJ. Accurate transition state generation with an object-aware equivariant elementary reaction diffusion model. Nat Comput Sci 2023; 3:1045-1055. [PMID: 38177724 DOI: 10.1038/s43588-023-00563-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/12/2023] [Accepted: 11/03/2023] [Indexed: 01/06/2024]
Abstract
Transition state search is key in chemistry for elucidating reaction mechanisms and exploring reaction networks. The search for accurate 3D transition state structures, however, requires numerous computationally intensive quantum chemistry calculations due to the complexity of potential energy surfaces. Here we developed an object-aware SE(3) equivariant diffusion model that satisfies all physical symmetries and constraints for generating sets of structures (reactant, transition state and product) in an elementary reaction. Provided reactant and product, this model generates a transition state structure in seconds instead of the hours typically required when performing quantum-chemistry-based optimizations. The generated transition state structures achieve a median of 0.08 Å root mean square deviation compared to the true transition state. With a confidence scoring model for uncertainty quantification, we approach an accuracy required for reaction barrier estimation (2.6 kcal mol⁻¹) by only performing quantum chemistry-based optimizations on 14% of the most challenging reactions. We envision usefulness for our approach in constructing large reaction networks with unknown mechanisms.
Affiliation(s)
- Chenru Duan
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, US
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, US
- Yuanqi Du
- Department of Computer Science, Cornell University, Ithaca, NY, US
- Haojun Jia
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, US
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, US
- Heather J Kulik
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, US
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, US
15
Zhang Y, Xu C, Lan Z. Automated Exploration of Reaction Networks and Mechanisms Based on Metadynamics Nanoreactor Simulations. J Chem Theory Comput 2023. [PMID: 38031422 DOI: 10.1021/acs.jctc.3c00752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2023]
Abstract
We developed an automated approach to construct a complex reaction network and explore the reaction mechanisms for numerous reactant molecules by integrating several theoretical approaches. Nanoreactor-type molecular dynamics was used to generate possible chemical reactions, in which metadynamics was used to overcome the reaction barriers and the semiempirical GFN2-xTB method was used to reduce the computational cost. Reaction events were identified from trajectories using a hidden Markov model based on the evolution of the molecular connectivity. This provided the starting points for further transition-state searches at the density functional theory level of electronic structure to obtain the reaction mechanism. Finally, the entire reaction network containing multiple pathways was built. The feasibility and efficiency of the automated construction of the reaction network were investigated using the HCHO and NH3 bimolecular reaction and the reaction network for a multispecies system comprising dozens of HCN and H2O molecules. The results indicated that the proposed approach provides a valuable and effective tool for the automated exploration of reaction networks.
Affiliation(s)
- Yutai Zhang
- Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety and MOE Key Laboratory of Environmental Theoretical Chemistry, SCNU Environmental Research Institute, School of Environment, South China Normal University, Guangzhou 510006, P. R. China
- Chao Xu
- Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety and MOE Key Laboratory of Environmental Theoretical Chemistry, SCNU Environmental Research Institute, School of Environment, South China Normal University, Guangzhou 510006, P. R. China
- Zhenggang Lan
- Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety and MOE Key Laboratory of Environmental Theoretical Chemistry, SCNU Environmental Research Institute, School of Environment, South China Normal University, Guangzhou 510006, P. R. China
16
Hayashi H, Maeda S, Mita T. Quantum chemical calculations for reaction prediction in the development of synthetic methodologies. Chem Sci 2023; 14:11601-11616. [PMID: 37920348 PMCID: PMC10619630 DOI: 10.1039/d3sc03319h] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/29/2023] [Accepted: 09/29/2023] [Indexed: 11/04/2023] [Open Access]
Abstract
Quantum chemical calculations have been used in the development of synthetic methodologies to analyze the reaction mechanisms of the developed reactions. Their ability to estimate chemical reaction pathways, including transition state energies and connected equilibria, has led researchers to embrace their use in predicting unknown reactions. This perspective highlights strategies that leverage quantum chemical calculations for the prediction of reactions in the discovery of new methodologies. Selected examples demonstrate how computation has driven the development of unknown reactions, catalyst design, and the exploration of synthetic routes to complex molecules prior to often laborious, costly, and time-consuming experimental investigations.
Affiliation(s)
- Hiroki Hayashi
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo, Hokkaido 001-0021, Japan
- JST-ERATO, Maeda Artificial Intelligence in Chemical Reaction Design and Discovery Project, Kita 10, Nishi 8, Kita-ku, Sapporo, Hokkaido 060-0810, Japan
- Satoshi Maeda
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo, Hokkaido 001-0021, Japan
- JST-ERATO, Maeda Artificial Intelligence in Chemical Reaction Design and Discovery Project, Kita 10, Nishi 8, Kita-ku, Sapporo, Hokkaido 060-0810, Japan
- Department of Chemistry, Faculty of Science, Hokkaido University, Kita 10, Nishi 8, Kita-ku, Sapporo, Hokkaido 060-0810, Japan
- Tsuyoshi Mita
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo, Hokkaido 001-0021, Japan
- JST-ERATO, Maeda Artificial Intelligence in Chemical Reaction Design and Discovery Project, Kita 10, Nishi 8, Kita-ku, Sapporo, Hokkaido 060-0810, Japan
17
Liu Z, Moroz YS, Isayev O. The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions. Chem Sci 2023; 14:10835-10846. [PMID: 37829036 PMCID: PMC10566507 DOI: 10.1039/d3sc03902a] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/27/2023] [Accepted: 09/12/2023] [Indexed: 10/14/2023] [Open Access]
Abstract
Accurate prediction of reaction yield is the holy grail for computer-assisted synthesis prediction, but current models have failed to generalize to large literature datasets. To understand the causes and inspire future design, we systematically benchmarked the yield prediction task. We carefully curated and augmented a literature dataset of 41,239 amide coupling reactions, each with information on reactants, products, intermediates, yields, and reaction contexts, and provided 3D structures for the molecules. We calculated molecular features related to 2D and 3D structure information, as well as physical and electronic properties. These descriptors were paired with 4 categories of machine learning methods (linear, kernel, ensemble, and neural network), yielding valuable benchmarks about feature and model performance. Despite the excellent performance on a high-throughput experiment (HTE) dataset (R² around 0.9), no method gave satisfactory results on the literature data. The best performance was an R² of 0.395 ± 0.020 using the stack technique. Error analysis revealed that reactivity cliffs and yield uncertainty are among the main reasons for incorrect predictions. Removing reactivity cliffs and uncertain reactions boosted the R² to 0.457 ± 0.006. These results highlight that yield prediction models must be sensitive to reactivity changes caused by subtle structural variance, as well as robust to the uncertainty associated with yield measurements.
Affiliation(s)
- Zhen Liu
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yurii S Moroz
- Enamine Ltd, Kyïv 02660, Ukraine
- Chemspace LLC, Kyïv 02094, Ukraine
- Taras Shevchenko National University of Kyïv, Kyïv 01601, Ukraine
- Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
18
Unsleber JP. Accelerating Reaction Network Explorations with Automated Reaction Template Extraction and Application. J Chem Inf Model 2023; 63:3392-3403. [PMID: 37216641 PMCID: PMC10268957 DOI: 10.1021/acs.jcim.3c00102] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/21/2023] [Indexed: 05/24/2023]
Abstract
Autonomously exploring chemical reaction networks with first-principles methods can generate vast amounts of data. In particular, autonomous explorations without tight constraints risk getting trapped in regions of reaction networks that are not of interest. In many cases, these regions of the networks are only exited once fully searched. Consequently, the required human time for analysis and computer time for data generation can make these investigations infeasible. Here, we show how simple reaction templates can facilitate the transfer of chemical knowledge from expert input or existing data into new explorations. This process significantly accelerates reaction network explorations and improves cost-effectiveness. We discuss the definition of the reaction templates and their generation based on molecular graphs. The resulting simple filtering mechanism for autonomous reaction network investigations is exemplified with a polymerization reaction.
Affiliation(s)
- Jan P. Unsleber
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
19
Zhao Q, Garimella SS, Savoie BM. Thermally Accessible Prebiotic Pathways for Forming Ribonucleic Acid and Protein Precursors from Aqueous Hydrogen Cyanide. J Am Chem Soc 2023; 145:6135-6143. [PMID: 36883252 DOI: 10.1021/jacs.2c11857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2023]
Abstract
The search for prebiotic chemical pathways to biologically relevant molecules is a long-standing puzzle that has generated a menagerie of competing hypotheses with limited experimental prospects for falsification. However, the advent of computational network exploration methodologies has created the opportunity to compare the kinetic plausibility of various channels and even propose new pathways. Here, the space of organic molecules that can be formed within four polar or pericyclic reactions from water and hydrogen cyanide (HCN), two established prebiotic candidates for generating biological precursors, was comprehensively explored with a state-of-the-art exploration algorithm. A surprisingly diverse reactivity landscape was revealed within just a few steps of these simple molecules. Reaction pathways to several biologically relevant molecules were discovered involving lower activation energies and fewer reaction steps compared with recently proposed alternatives. Accounting for water-catalyzed reactions qualitatively affects the interpretation of the network kinetics. The case-study also highlights omissions of simpler and lower barrier reaction pathways to certain products by other algorithms that qualitatively affect the interpretation of HCN reactivity.
Affiliation(s)
- Qiyuan Zhao
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
- Sanjay S Garimella
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
- Brett M Savoie
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
20
Zhao Q, Vaddadi SM, Woulfe M, Ogunfowora LA, Garimella SS, Isayev O, Savoie BM. Comprehensive exploration of graphically defined reaction spaces. Sci Data 2023; 10:145. [PMID: 36935430 PMCID: PMC10025260 DOI: 10.1038/s41597-023-02043-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/17/2022] [Accepted: 02/27/2023] [Indexed: 03/21/2023] [Open Access]
Abstract
Existing reaction transition state (TS) databases are comparatively small and lack chemical diversity. Here, this data gap has been addressed using the concept of a graphically-defined model reaction to comprehensively characterize a reaction space associated with C, H, O, and N containing molecules with up to 10 heavy (non-hydrogen) atoms. The resulting dataset is composed of 176,992 organic reactions possessing at least one validated TS, activation energy, heat of reaction, reactant and product geometries, frequencies, and atom-mapping. For 33,032 reactions, more than one TS was discovered by conformational sampling, allowing conformational errors in TS prediction to be assessed. Data are supplied at the GFN2-xTB and B3LYP-D3/TZVP levels of theory. A subset of reactions was recalculated at the CCSD(T)-F12/cc-pVDZ-F12 and ωB97X-D2/def2-TZVP levels to establish relative errors. The resulting collection of reactions and properties is called the Reaction Graph Depth 1 (RGD1) dataset. RGD1 represents the largest and most chemically diverse TS dataset published to date and should find immediate use in developing novel machine learning models for predicting reaction properties.
Affiliation(s)
- Qiyuan Zhao
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN, 47906, USA
- Sai Mahit Vaddadi
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN, 47906, USA
- Michael Woulfe
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN, 47906, USA
- Lawal A Ogunfowora
- Department of Chemistry, Purdue University, West Lafayette, IN, 47906, USA
- Sanjay S Garimella
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN, 47906, USA
- Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Brett M Savoie
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN, 47906, USA
21
Tu Z, Stuyver T, Coley CW. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem Sci 2023; 14:226-244. [PMID: 36743887 PMCID: PMC9811563 DOI: 10.1039/d2sc05089g] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 09/12/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022] [Open Access]
Abstract
The field of predictive chemistry relates to the development of models able to describe how molecules interact and react. It encompasses the long-standing task of computer-aided retrosynthesis, but is much more far-reaching and ambitious in its goals. In this review, we summarize several areas where predictive chemistry models hold the potential to accelerate the deployment, development, and discovery of organic reactions and advance synthetic chemistry.
Affiliation(s)
- Zhengkai Tu
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Connor W Coley
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
22
Wen M, Spotte-Smith EWC, Blau SM, McDermott MJ, Krishnapriyan AS, Persson KA. Chemical reaction networks and opportunities for machine learning. Nat Comput Sci 2023; 3:12-24. [PMID: 38177958 DOI: 10.1038/s43588-022-00369-z] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 06/24/2022] [Accepted: 11/08/2022] [Indexed: 01/06/2024]
Abstract
Chemical reaction networks (CRNs), defined by sets of species and possible reactions between them, are widely used to interrogate chemical systems. To capture increasingly complex phenomena, CRNs can be leveraged alongside data-driven methods and machine learning (ML). In this Perspective, we assess the diverse strategies available for CRN construction and analysis in pursuit of a wide range of scientific goals, discuss ML techniques currently being applied to CRNs and outline future CRN-ML approaches, presenting scientific and technical challenges to overcome.
Affiliation(s)
- Mingjian Wen
- Chemical and Biomolecular Engineering, University of Houston, Houston, TX, USA
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Evan Walter Clark Spotte-Smith
- Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
- Samuel M Blau
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Matthew J McDermott
- Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
- Aditi S Krishnapriyan
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, CA, USA
- Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, USA
- Kristin A Persson
- Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
- Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
23
Zhao Q, Savoie BM. Algorithmic Explorations of Unimolecular and Bimolecular Reaction Spaces. Angew Chem Int Ed Engl 2022; 61:e202210693. [PMID: 36074520 PMCID: PMC9827825 DOI: 10.1002/anie.202210693] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/20/2022] [Indexed: 01/12/2023]
Abstract
Algorithmic reaction exploration based on transition state searches has already made inroads into many niche applications, but its potential as a general-purpose tool is still largely unrealized. Computational cost and the absence of benchmark problems involving larger molecules remain obstacles to further progress. Here an ultra-low cost exploration algorithm is implemented and used to explore the reactivity of unimolecular and bimolecular reactants, comprising a total of 581 reactions involving 51 distinct reactants. The algorithm discovers all established reaction pathways, where such comparisons are possible, while also revealing a much richer reactivity landscape, including lower barrier reaction pathways and a strong dependence of the apparent barriers of the reported reactions on reaction conformation. The diversity of these benchmarks illustrates that reaction exploration algorithms are approaching general-purpose capability.
Affiliation(s)
- Qiyuan Zhao
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN 47906, USA
- Brett M. Savoie
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN 47906, USA
24
Ramos-Sánchez P, Harvey JN, Gámez JA. An automated method for graph-based chemical space exploration and transition state finding. J Comput Chem 2022; 44:27-42. [PMID: 36239971 DOI: 10.1002/jcc.27011] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/01/2022] [Revised: 07/28/2022] [Accepted: 09/05/2022] [Indexed: 12/24/2022]
Abstract
Algorithms that automatically explore chemical space have been limited to chemical systems with a low number of atoms due to the expensive quantum calculations involved and the large number of possible reaction pathways. The method described here presents a novel solution to the problem of chemical exploration by generating reaction networks with heuristics based on chemical theory. First, an initial version of the reaction network is determined through molecular graph transformations acting upon functional groups of the reacting species. Only transformations that break two chemical bonds and form two new ones are considered, leading to a significant performance enhancement compared to previously presented algorithms. Second, energy barriers for this reaction network are estimated through quantum chemical calculations by a growing string method, which can also identify non-octet species missed during the previous step and further define the reaction network. The proposed algorithm has been successfully applied to five different chemical reactions, in all cases identifying the most important reaction pathways.
Affiliation(s)
- Pablo Ramos-Sánchez
- Digital R&D, Covestro Deutschland AG, Leverkusen, Germany
- Department of Chemistry, KU Leuven, Leuven, Belgium
- José A Gámez
- Digital R&D, Covestro Deutschland AG, Leverkusen, Germany
25
Ismail I, Chantreau Majerus R, Habershon S. Graph-Driven Reaction Discovery: Progress, Challenges, and Future Opportunities. J Phys Chem A 2022; 126:7051-7069. [PMID: 36190262 PMCID: PMC9574932 DOI: 10.1021/acs.jpca.2c06408] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/07/2022] [Revised: 09/22/2022] [Indexed: 11/29/2022]
Abstract
Graph-based descriptors, such as bond-order matrices and adjacency matrices, offer a simple and compact way of categorizing molecular structures; furthermore, such descriptors can be readily used to catalog chemical reactions (i.e., bond-making and -breaking). As such, a number of graph-based methodologies have been developed with the goal of automating the process of generating chemical reaction network models describing the possible mechanistic chemistry in a given set of reactant species. Here, we outline the evolution of these graph-based reaction discovery schemes, with particular emphasis on more recent methods incorporating graph-based methods with semiempirical and ab initio electronic structure calculations, minimum-energy path refinements, and transition state searches. Using representative examples from homogeneous catalysis and interstellar chemistry, we highlight how these schemes increasingly act as "virtual reaction vessels" for interrogating mechanistic questions. Finally, we highlight where challenges remain, including issues of chemical accuracy and calculation speeds, as well as the inherent challenge of dealing with the vast size of accessible chemical reaction space.
Affiliation(s)
- Idil Ismail
- Department of Chemistry, University of Warwick, Coventry CV4 7AL, United Kingdom
- Scott Habershon
- Department of Chemistry, University of Warwick, Coventry CV4 7AL, United Kingdom
26
Unsleber JP, Grimmel SA, Reiher M. Chemoton 2.0: Autonomous Exploration of Chemical Reaction Networks. J Chem Theory Comput 2022; 18:5393-5409. [PMID: 35926118 PMCID: PMC11516015 DOI: 10.1021/acs.jctc.2c00193] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 02/25/2022] [Indexed: 11/28/2022]
Abstract
Fueled by advances in hardware and algorithm design, large-scale automated explorations of chemical reaction space have become possible. Here, we present our approach to an open-source, extensible framework for explorations of chemical reaction mechanisms based on the first principles of quantum mechanics. It is intended to facilitate reaction network explorations for diverse chemical problems with a wide range of goals such as mechanism elucidation, reaction path optimization, retrosynthetic path validation, reagent design, and microkinetic modeling. The stringent first-principles basis of all algorithms in our framework is key to its general applicability, which avoids any restriction to specific chemical systems. Such an agile framework requires multiple specialized software components, of which we present three modules in this work. The key module, Chemoton, drives the exploration of reaction networks. For the exploration itself, we introduce two new algorithms for elementary-step searches that are based on Newton trajectories. The performance of these algorithms is assessed for a variety of reactions characterized by a broad chemical diversity in terms of bonding patterns and chemical elements. Chemoton successfully recovers the vast majority of these. We provide the resulting data, including large numbers of reactions that were not included in our reference set, to be used as a starting point for further explorations and for future reference.
Affiliation(s)
- Jan P. Unsleber
- Laboratorium für Physikalische Chemie, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Stephanie A. Grimmel
- Laboratorium für Physikalische Chemie, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Markus Reiher
- Laboratorium für Physikalische Chemie, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
27
Zhao Q, Xu Y, Greeley J, Savoie BM. Deep reaction network exploration at a heterogeneous catalytic interface. Nat Commun 2022; 13:4860. [PMID: 35982057 PMCID: PMC9388529 DOI: 10.1038/s41467-022-32514-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/13/2022] [Accepted: 08/03/2022] [Indexed: 11/09/2022] [Open Access]
Abstract
Characterizing the reaction energies and barriers of reaction networks is central to catalyst development. However, heterogeneous catalytic surfaces pose several unique challenges to automatic reaction network characterization, including large sizes and open-ended reactant sets, that make ad hoc network construction the current state-of-the-art. Here, we show how automated network exploration algorithms can be adapted to the constraints of heterogeneous systems using ethylene oligomerization on silica-supported single-site Ga³⁺ as a model system. Using only graph-based rules for exploring the network and elementary constraints based on activation energy and size for identifying network terminations, a comprehensive reaction network is generated and validated against standard methods. The algorithm (re)discovers the Ga-alkyl-centered Cossee-Arlman mechanism that is hypothesized to drive major product formation while also predicting several new pathways for producing alkanes and coke precursors. These results demonstrate that automated reaction exploration algorithms are rapidly maturing towards general purpose capability for exploratory catalytic applications.
Affiliation(s)
- Qiyuan Zhao
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN, 47906, USA
- Yinan Xu
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN, 47906, USA
- Jeffrey Greeley
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN, 47906, USA
- Brett M Savoie
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN, 47906, USA
28
Kang PL, Shi YF, Shang C, Liu ZP. Artificial intelligence pathway search to resolve catalytic glycerol hydrogenolysis selectivity. Chem Sci 2022; 13:8148-8160. [PMID: 35919423 PMCID: PMC9278456 DOI: 10.1039/d2sc02107b] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/13/2022] [Accepted: 06/20/2022] [Indexed: 11/29/2022] [Open Access]
Abstract
The complex interaction between molecules and catalyst surfaces leads to great difficulties in understanding and predicting the activity and selectivity in heterogeneous catalysis. Here we develop an end-to-end artificial intelligence framework for the activity prediction of heterogeneous catalytic systems (AI-Cat method), which takes simple inputs from names of molecules and metal catalysts and outputs the reaction energy profile from the input molecule to low energy pathway products. The AI-Cat method combines two neural network models, one for predicting reaction patterns and the other for providing the reaction barrier and energy, with a Monte Carlo tree search to resolve the low energy pathways in a reaction network. We then apply AI-Cat to resolve the reaction network of glycerol hydrogenolysis on Cu surfaces, which is a typical selective C-O bond activation system and of key significance for biomass-derived polyol utilization. We show that glycerol hydrogenolysis features a huge reaction network of relevant candidates, containing 420 reaction intermediates and 2467 elementary reactions. Among them, the surface-mediated enol-keto tautomeric resonance is a key step that facilitates the primary C-OH bond breaking and thus selects 1,2-propanediol as the major product on Cu catalysts. 1,3-Propanediol can only be produced under strongly acidic conditions and high surface H coverage by following a hydrogenation-dehydration pathway. AI-Cat further discovers six low-energy reaction patterns for C-O bond activation on metals that are of general significance to polyol catalysis. Our results demonstrate that reaction prediction for complex heterogeneous catalysis is now feasible with AI-based atomic simulation and a Monte Carlo tree search.
Affiliation(s)
- Pei-Lin Kang
  - Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
- Yun-Fei Shi
  - Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
- Cheng Shang
  - Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
  - Shanghai Qi Zhi Institution, Shanghai 200030, China
- Zhi-Pan Liu
  - Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
  - Shanghai Qi Zhi Institution, Shanghai 200030, China
  - Key Laboratory of Synthetic and Self-Assembly Chemistry for Organic Functional Molecules, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 200032, China
29
Zhao Q, Hsu HH, Savoie BM. Conformational Sampling for Transition State Searches on a Computational Budget. J Chem Theory Comput 2022; 18:3006-3016. [PMID: 35403426 DOI: 10.1021/acs.jctc.2c00081] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 01/17/2023]
Affiliation(s)
- Qiyuan Zhao
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
- Hsuan-Hao Hsu
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
- Brett M. Savoie
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
30
Steiner M, Reiher M. Autonomous Reaction Network Exploration in Homogeneous and Heterogeneous Catalysis. Top Catal 2022; 65:6-39. [PMID: 35185305 PMCID: PMC8816766 DOI: 10.1007/s11244-021-01543-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Accepted: 11/17/2021] [Indexed: 12/11/2022]
Abstract
Autonomous computations that rely on automated reaction network elucidation algorithms may pave the way to putting computational catalysis on a par with experimental research in the field. Several advantages of this approach are key to catalysis: (i) Automation allows one to consider orders of magnitude more structures in a systematic and open-ended fashion than would be accessible by manual inspection. Eventually, full resolution in terms of structural varieties and conformations as well as with respect to the type and number of potentially important elementary reaction steps (including decomposition reactions that determine turnover numbers) may be achieved. (ii) Fast electronic structure methods with uncertainty quantification warrant high efficiency and reliability in order to not only deliver results quickly, but also to allow for predictive work. (iii) A high degree of autonomy reduces the amount of manual human work, processing errors, and human bias. Although inherently unbiased, an autonomous exploration is still steerable with respect to specific regions of an emerging network and with respect to the addition of new reactant species. This allows for a high fidelity of the formalization of a catalytic process and for surprising in silico discoveries. In this work, we first review the state of the art in computational catalysis to embed autonomous explorations into the general field from which they draw their ingredients. We then elaborate on the specific conceptual issues that arise in the context of autonomous computational procedures, some of which we discuss using an example catalytic system.
Affiliation(s)
- Miguel Steiner
  - Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Markus Reiher
  - Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland