51
Kulichenko M, Barros K, Lubbers N, Li YW, Messerly R, Tretiak S, Smith JS, Nebgen B. Uncertainty-driven dynamics for active learning of interatomic potentials. NATURE COMPUTATIONAL SCIENCE 2023; 3:230-239. [PMID: 38177878 PMCID: PMC10766548 DOI: 10.1038/s43588-023-00406-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 09/27/2022] [Accepted: 01/24/2023] [Indexed: 01/06/2024]
Abstract
Machine learning (ML) models, if trained to data sets of high-fidelity quantum simulations, produce accurate and efficient interatomic potentials. Active learning (AL) is a powerful tool to iteratively generate diverse data sets. In this approach, the ML model provides an uncertainty estimate along with its prediction for each new atomic configuration. If the uncertainty estimate passes a certain threshold, then the configuration is included in the data set. Here we develop a strategy to more rapidly discover configurations that meaningfully augment the training data set. The approach, uncertainty-driven dynamics for active learning (UDD-AL), modifies the potential energy surface used in molecular dynamics simulations to favor regions of configuration space for which there is large model uncertainty. The performance of UDD-AL is demonstrated for two AL tasks: sampling the conformational space of glycine and sampling the promotion of proton transfer in acetylacetone. The method is shown to efficiently explore the chemically relevant configuration space, which may be inaccessible using regular dynamical sampling at target temperature conditions.
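The selection step described in this abstract (keep a configuration when its uncertainty estimate passes a threshold) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the uncertainty is the spread of energy predictions across a small ensemble of models, and the function names, the threshold value, and the toy numbers are all hypothetical.

```python
from statistics import mean, pstdev


def ensemble_uncertainty(predictions):
    """Return (mean, population standard deviation) of per-model predictions.

    Assumption: ensemble disagreement stands in for the model's
    uncertainty estimate; the abstract does not fix the estimator.
    """
    return mean(predictions), pstdev(predictions)


def select_for_labeling(ensemble_predictions, threshold):
    """Return indices of configurations whose uncertainty exceeds threshold."""
    selected = []
    for idx, preds in enumerate(ensemble_predictions):
        _, sigma = ensemble_uncertainty(preds)
        if sigma > threshold:
            selected.append(idx)
    return selected


# Toy data: three configurations, each predicted by three ensemble members.
predictions = [
    [-1.00, -1.01, -0.99],  # members agree: low uncertainty
    [-0.50, -0.80, -0.20],  # members disagree: high uncertainty
    [0.30, 0.31, 0.30],     # members agree: low uncertainty
]
picked = select_for_labeling(predictions, threshold=0.05)
print(picked)
```

Only the second configuration is flagged here, so only it would be sent for a reference quantum calculation and added to the training set. The bias that UDD-AL adds to the potential energy surface is not reproduced in this sketch.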
Affiliation(s)
- Maksim Kulichenko
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Kipton Barros
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
  - Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA
- Nicholas Lubbers
  - Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Ying Wai Li
  - Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Richard Messerly
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Sergei Tretiak
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
  - Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA
  - Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, NM, USA
- Justin S Smith
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
  - Nvidia Corporation, Santa Clara, CA, USA
- Benjamin Nebgen
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
52
Choi S. Prediction of transition state structures of gas-phase chemical reactions via machine learning. Nat Commun 2023; 14:1168. [PMID: 36859495 PMCID: PMC9977841 DOI: 10.1038/s41467-023-36823-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 09/19/2022] [Accepted: 02/15/2023] [Indexed: 03/03/2023]
Abstract
The elucidation of transition state (TS) structures is essential for understanding the mechanisms of chemical reactions and exploring reaction networks. Despite significant advances in computational approaches, TS searching remains a challenging problem owing to the difficulty of constructing an initial structure and heavy computational costs. In this paper, a machine learning (ML) model for predicting the TS structures of general organic reactions is proposed. The proposed model derives the interatomic distances of a TS structure from atomic pair features reflecting reactant, product, and linearly interpolated structures. The model exhibits excellent accuracy, particularly for atomic pairs in which bond formation or breakage occurs. The predicted TS structures yield a high success ratio (93.8%) for quantum chemical saddle point optimizations, and 88.8% of the optimization results have energy errors of less than 0.1 kcal mol⁻¹. Additionally, as a proof of concept, the exploration of multiple reaction paths of an organic reaction is demonstrated based on ML inferences. I envision that the proposed approach will aid in the construction of initial geometries for TS optimization and reaction path exploration.
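The abstract mentions atomic pair features built from reactant, product, and linearly interpolated structures. A rough sketch of what a linearly interpolated structure and its interatomic distances look like is given below. It assumes that both geometries list the same atoms in the same order and are already aligned in a common frame; the function names and the one-dimensional toy geometry are hypothetical, and this is not the feature construction used in the paper.

```python
from itertools import combinations
from math import dist


def interpolate(reactant, product, t=0.5):
    """Linearly interpolate Cartesian coordinates atom by atom.

    t = 0 returns the reactant geometry and t = 1 the product geometry.
    """
    return [
        tuple(r + t * (p - r) for r, p in zip(r_atom, p_atom))
        for r_atom, p_atom in zip(reactant, product)
    ]


def pair_distances(coords):
    """Return a dict mapping each atom pair (i, j), i < j, to its distance."""
    return {
        (i, j): dist(coords[i], coords[j])
        for i, j in combinations(range(len(coords)), 2)
    }


# Toy three-atom system on a line: the middle atom moves from atom 0 to atom 2.
reactant = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
product = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

midpoint = interpolate(reactant, product, t=0.5)
distances = pair_distances(midpoint)
print(distances)
```

In this toy case the interpolated structure places the moving atom halfway between its two partners, which is the kind of crude starting guess that a learned model would then refine.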
Affiliation(s)
- Sunghwan Choi
  - Division of National Supercomputing, Korea Institute of Science and Technology Information, 245 Daehak-ro, Yuseong-gu, 34141, Daejeon, Republic of Korea
53
Chen Y, Ou Y, Zheng P, Huang Y, Ge F, Dral PO. Benchmark of general-purpose machine learning-based quantum mechanical method AIQM1 on reaction barrier heights. J Chem Phys 2023; 158:074103. [PMID: 36813722 DOI: 10.1063/5.0137101] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 02/01/2023]
Abstract
Artificial intelligence-enhanced quantum mechanical method 1 (AIQM1) is a general-purpose method that was shown to achieve high accuracy for many applications with a speed close to its baseline semiempirical quantum mechanical (SQM) method ODM2*. Here, we evaluate the hitherto unknown performance of out-of-the-box AIQM1 without any refitting for reaction barrier heights on eight datasets, including a total of ∼24 thousand reactions. This evaluation shows that AIQM1's accuracy strongly depends on the type of transition state and ranges from excellent for rotation barriers to poor for, e.g., pericyclic reactions. AIQM1 clearly outperforms its baseline ODM2* method and, even more so, a popular universal potential, ANI-1ccx. Overall, however, AIQM1 accuracy largely remains similar to SQM methods (and B3LYP/6-31G* for most reaction types) suggesting that it is desirable to focus on improving AIQM1 performance for barrier heights in the future. We also show that the built-in uncertainty quantification helps in identifying confident predictions. The accuracy of confident AIQM1 predictions is approaching the level of popular density functional theory methods for most reaction types. Encouragingly, AIQM1 is rather robust for transition state optimizations, even for the type of reactions it struggles with the most. Single-point calculations with high-level methods on AIQM1-optimized geometries can be used to significantly improve barrier heights, which cannot be said for its baseline ODM2* method.
Affiliation(s)
- Yuxinxin Chen
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yanchi Ou
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Peikun Zheng
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yaohuang Huang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Fuchun Ge
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Pavlo O Dral
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
54
Villard J, Kılıç M, Rothlisberger U. Surrogate Based Genetic Algorithm Method for Efficient Identification of Low-Energy Peptide Structures. J Chem Theory Comput 2023; 19:1080-1097. [PMID: 36692853 PMCID: PMC9933449 DOI: 10.1021/acs.jctc.2c01078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Indexed: 01/25/2023]
Abstract
Identification of the most stable structure(s) of a system is a prerequisite for the calculation of any of its properties from first-principles. However, even for relatively small molecules, exhaustive explorations of the potential energy surface (PES) are severely hampered by the dimensionality bottleneck. In this work, we address the challenging task of efficiently sampling realistic low-lying peptide coordinates by resorting to a surrogate based genetic algorithm (GA)/density functional theory (DFT) approach (sGADFT) in which promising candidates provided by the GA are ultimately optimized with DFT. We provide a benchmark of several computational methods (GAFF, AMOEBApro13, PM6, PM7, DFTB3-D3(BJ)) as possible prescanning surrogates and apply sGADFT to two test case systems that are (i) two isomer families of the protonated Gly-Pro-Gly-Gly tetrapeptide (Masson, A.; J. Am. Soc. Mass Spectrom. 2015, 26, 1444-1454) and (ii) the doubly protonated cyclic decapeptide gramicidin S (Nagornova, N. S.; J. Am. Chem. Soc. 2010, 132, 4040-4041). We show that our GA procedure can correctly identify low-energy minima in as little as a few hours. Subsequent refinement of surrogate low-energy structures within a given energy threshold (≤10 kcal/mol (i), ≤5 kcal/mol (ii)) via DFT relaxation invariably led to the identification of the most stable structures as determined from high-resolution infrared (IR) spectroscopy at low temperature. The sGADFT method therefore constitutes a highly efficient route for the screening of realistic low-lying peptide structures in the gas phase as needed for instance for the interpretation and assignment of experimental IR spectra.
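The refinement step in this abstract keeps only surrogate structures within an energy threshold of the lowest one before DFT relaxation. A minimal sketch of such an energy-window filter follows. It assumes the window is measured relative to the lowest surrogate energy in the candidate pool; the function name, the candidate labels, and the energies are hypothetical, and neither the genetic algorithm nor the DFT step is shown.

```python
def within_energy_window(candidates, window_kcal_mol):
    """Return names of candidates within the window above the lowest energy.

    candidates: list of (name, energy) pairs, energies in kcal/mol.
    Assumption: the window is relative to the minimum energy in the list.
    """
    e_min = min(energy for _, energy in candidates)
    return [
        name
        for name, energy in candidates
        if energy - e_min <= window_kcal_mol
    ]


# Hypothetical surrogate energies (kcal/mol) for four conformers.
candidates = [("A", -120.0), ("B", -117.5), ("C", -108.0), ("D", -114.9)]

print(within_energy_window(candidates, 5.0))
print(within_energy_window(candidates, 10.0))
```

With the 5 kcal/mol window only A and B survive; widening the window to 10 kcal/mol also admits D, while C (12 kcal/mol above the minimum) is still discarded. The surviving structures would then be passed on for the more expensive relaxation.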
Affiliation(s)
- Justin Villard
  - Laboratory of Computational Chemistry and Biochemistry, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
- Murat Kılıç
  - Laboratory of Computational Chemistry and Biochemistry, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
- Ursula Rothlisberger
  - Laboratory of Computational Chemistry and Biochemistry, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
55
Chmiela S, Vassilev-Galindo V, Unke OT, Kabylda A, Sauceda HE, Tkatchenko A, Müller KR. Accurate global machine learning force fields for molecules with hundreds of atoms. SCIENCE ADVANCES 2023; 9:eadf0873. [PMID: 36630510 PMCID: PMC9833674 DOI: 10.1126/sciadv.adf0873] [Citation(s) in RCA: 40] [Impact Index Per Article: 40.0] [Received: 09/29/2022] [Accepted: 11/28/2022] [Indexed: 05/25/2023]
Abstract
Global machine learning force fields, with the capacity to capture collective interactions in molecular systems, now scale up to a few dozen atoms due to considerable growth of model complexity with system size. For larger molecules, locality assumptions are introduced, with the consequence that nonlocal interactions are not described. Here, we develop an exact iterative approach to train global symmetric gradient domain machine learning (sGDML) force fields (FFs) for several hundred atoms, without resorting to any potentially uncontrolled approximations. All atomic degrees of freedom remain correlated in the global sGDML FF, allowing the accurate description of complex molecules and materials that present phenomena with far-reaching characteristic correlation lengths. We assess the accuracy and efficiency of sGDML on a newly developed MD22 benchmark dataset containing molecules from 42 to 370 atoms. The robustness of our approach is demonstrated in nanosecond path-integral molecular dynamics simulations for supramolecular complexes in the MD22 dataset.
Affiliation(s)
- Stefan Chmiela
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data – BIFOLD, Germany
- Valentin Vassilev-Galindo
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Oliver T. Unke
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - Google Research, Brain Team, Berlin, Germany
- Adil Kabylda
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Huziel E. Sauceda
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data – BIFOLD, Germany
  - Departamento de Materia Condensada, Instituto de Física, Universidad Nacional Autónoma de México, Cd. de México C.P. 04510, Mexico
  - BASLEARN - TU Berlin/BASF Joint Lab for Machine Learning, Technische Universität Berlin, 10587 Berlin, Germany
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data – BIFOLD, Germany
  - Google Research, Brain Team, Berlin, Germany
  - Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
  - Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
56
Xia S, Zhang D, Zhang Y. Multitask Deep Ensemble Prediction of Molecular Energetics in Solution: From Quantum Mechanics to Experimental Properties. J Chem Theory Comput 2023; 19:10.1021/acs.jctc.2c01024. [PMID: 36607141 PMCID: PMC10323048 DOI: 10.1021/acs.jctc.2c01024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2023]
Abstract
The past few years have witnessed significant advances in developing machine learning methods for molecular energetics predictions, including calculated electronic energies with high-level quantum mechanical methods and experimental properties, such as solvation free energy and logP. Typically, task-specific machine learning models are developed for distinct prediction tasks. In this work, we present a multitask deep ensemble model, sPhysNet-MT-ens5, which can simultaneously and accurately predict electronic energies of molecules in gas, water, and octanol phases, as well as transfer free energies at both calculated and experimental levels. On the calculated data set Frag20-solv-678k, which is developed in this work and contains 678,916 molecular conformations, up to 20 heavy atoms, and their properties calculated at B3LYP/6-31G* level of theory with continuum solvent models, sPhysNet-MT-ens5 predicts density functional theory (DFT)-level electronic energies directly from force field-optimized geometry within chemical accuracy. On the experimental data sets, sPhysNet-MT-ens5 achieves state-of-the-art performances, which predict both experimental hydration free energy with a RMSE of 0.620 kcal/mol on the FreeSolv data set and experimental logP with a RMSE of 0.393 on the PHYSPROP data set. Furthermore, sPhysNet-MT-ens5 also provides a reasonable estimation of model uncertainty which shows correlations with prediction error. Finally, by analyzing the atomic contributions of its predictions, we find that the developed deep learning model is aware of the chemical environment of each atom by assigning reasonable atomic contributions consistent with our chemical knowledge.
Affiliation(s)
- Song Xia
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Dongdong Zhang
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Yingkai Zhang
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
57
Combining machine‐learning and molecular‐modeling methods for drug‐target affinity predictions. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2022]
58
Schreiner M, Bhowmik A, Vegge T, Busk J, Winther O. Transition1x - a dataset for building generalizable reactive machine learning potentials. Sci Data 2022; 9:779. [PMID: 36566281 PMCID: PMC9789978 DOI: 10.1038/s41597-022-01870-w] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 09/23/2022] [Accepted: 11/16/2022] [Indexed: 12/25/2022]
Abstract
Machine Learning (ML) models have, in contrast to their usefulness in molecular dynamics studies, had limited success as surrogate potentials for reaction barrier search. This is primarily because available datasets for training ML models on small molecular systems almost exclusively contain configurations at or near equilibrium. In this work, we present the dataset Transition1x containing 9.6 million Density Functional Theory (DFT) calculations of forces and energies of molecular configurations on and around reaction pathways at the ωB97x/6-31G(d) level of theory. The data was generated by running Nudged Elastic Band (NEB) with DFT on 10k organic reactions of various types while saving intermediate calculations. We train equivariant graph message-passing neural network models on Transition1x and cross-validate on the popular ANI1x and QM9 datasets. We show that ML models cannot learn features in transition state regions solely by training on hitherto popular benchmark datasets. Transition1x is a new challenging benchmark that will provide an important step towards developing next-generation ML force fields that also work far away from equilibrium configurations and reactive systems.
Affiliation(s)
- Mathias Schreiner
  - DTU Compute, Technical University of Denmark (DTU), 2800, Lyngby, Denmark
- Arghya Bhowmik
  - DTU Energy, Technical University of Denmark, 2800, Lyngby, Denmark
- Tejs Vegge
  - DTU Energy, Technical University of Denmark, 2800, Lyngby, Denmark
- Jonas Busk
  - DTU Energy, Technical University of Denmark, 2800, Lyngby, Denmark
- Ole Winther
  - DTU Compute, Technical University of Denmark (DTU), 2800, Lyngby, Denmark
  - Department of Biology, University of Copenhagen (UCph), 2700, Copenhagen N, Denmark
  - Genomic Medicine, Copenhagen University Hospital, Rigshospitalet, 2100, Copenhagen Ø, Denmark
59
Browning NJ, Faber FA, Anatole von Lilienfeld O. GPU-accelerated approximate kernel method for quantum machine learning. J Chem Phys 2022; 157:214801. [PMID: 36511559 DOI: 10.1063/5.0108967] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/15/2022]
Abstract
We introduce Quantum Machine Learning (QML)-Lightning, a PyTorch package containing graphics processing unit (GPU)-accelerated approximate kernel models, which can yield trained models within seconds. QML-Lightning includes a cost-efficient GPU implementation of FCHL19, which together can provide energy and force predictions with competitive accuracy on a microsecond per atom timescale. Using modern GPU hardware, we report learning curves of energies and forces as well as timings as numerical evidence for select legacy benchmarks from atomistic simulation including QM9, MD-17, and 3BPA.
Affiliation(s)
- Nicholas J Browning
  - Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials, Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Felix A Faber
  - Department of Physics, University of Cambridge, Cambridge, United Kingdom
60
Reiser P, Neubert M, Eberhard A, Torresi L, Zhou C, Shao C, Metni H, van Hoesel C, Schopmans H, Sommer T, Friederich P. Graph neural networks for materials science and chemistry. COMMUNICATIONS MATERIALS 2022; 3:93. [PMID: 36468086 PMCID: PMC9702700 DOI: 10.1038/s43246-022-00315-6] [Citation(s) in RCA: 68] [Impact Index Per Article: 34.0] [Received: 05/10/2022] [Accepted: 11/07/2022] [Indexed: 05/14/2023]
Abstract
Machine learning plays an increasingly important role in many areas of chemistry and materials science, being used to predict materials properties, accelerate simulations, design new structures, and predict synthesis routes of new materials. Graph neural networks (GNNs) are one of the fastest growing classes of machine learning models. They are of particular relevance for chemistry and materials science, as they directly work on a graph or structural representation of molecules and materials and therefore have full access to all relevant information required to characterize materials. In this Review, we provide an overview of the basic principles of GNNs, widely used datasets, and state-of-the-art architectures, followed by a discussion of a wide range of recent applications of GNNs in chemistry and materials science, and concluding with a road-map for the further development and application of GNNs.
Affiliation(s)
- Patrick Reiser
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
  - Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Marlen Neubert
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- André Eberhard
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Luca Torresi
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Chen Zhou
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Chen Shao
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
  - Present Address: Institute for Applied Informatics and Formal Description Systems, Karlsruhe Institute of Technology, Kaiserstr. 89, 76133 Karlsruhe, Germany
- Houssam Metni
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
  - ECPM, Université de Strasbourg, 25 Rue Becquerel, 67087 Strasbourg, France
- Clint van Hoesel
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
  - Department of Applied Physics, Eindhoven University of Technology, Groene Loper 19, 5612 AP Eindhoven, The Netherlands
- Henrik Schopmans
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
  - Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Timo Sommer
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
  - Institute for Theory of Condensed Matter, Karlsruhe Institute of Technology, Wolfgang-Gaede-Str. 1, 76131 Karlsruhe, Germany
  - Present Address: School of Chemistry, Trinity College Dublin, College Green, Dublin 2, Ireland
- Pascal Friederich
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
  - Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
61
Wang W, Liu Y, Wang Z, Hao G, Song B. The way to AI-controlled synthesis: how far do we need to go? Chem Sci 2022; 13:12604-12615. [PMID: 36519036 PMCID: PMC9645373 DOI: 10.1039/d2sc04419f] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/08/2022] [Accepted: 09/26/2022] [Indexed: 09/08/2024]
Abstract
Chemical synthesis always plays an irreplaceable role in chemical, materials, and pharmacological fields. Meanwhile, artificial intelligence (AI) is causing a rapid technological revolution in many fields by replacing manual chemical synthesis and has exhibited a much more economical and time-efficient manner. However, the rate-determining step of AI-controlled synthesis systems is rarely mentioned, which makes it difficult to apply them in general laboratories. Here, the history of developing AI-aided synthesis has been overviewed and summarized. We propose that the hardware of AI-controlled synthesis systems should be more adaptive to execute reactions with different phase reagents and under different reaction conditions, and the software of AI-controlled synthesis systems should have richer kinds of reaction prediction modules. An updated system will better address more different kinds of syntheses. Our viewpoint could help scientists advance the revolution that combines AI and synthesis to achieve more progress in complicated systems.
Affiliation(s)
- Wei Wang
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Research and Development Center for Fine Chemicals, Guizhou University Guiyang 550025 P. R. China
- Yingwei Liu
  - State Key Laboratory of Public Big Data, Guizhou University Guiyang 550025 P. R. China
- Zheng Wang
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Research and Development Center for Fine Chemicals, Guizhou University Guiyang 550025 P. R. China
- Gefei Hao
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Research and Development Center for Fine Chemicals, Guizhou University Guiyang 550025 P. R. China
- Baoan Song
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Research and Development Center for Fine Chemicals, Guizhou University Guiyang 550025 P. R. China
62
Schmitz N, Müller KR, Chmiela S. Algorithmic Differentiation for Automated Modeling of Machine Learned Force Fields. J Phys Chem Lett 2022; 13:10183-10189. [PMID: 36279418 PMCID: PMC9639201 DOI: 10.1021/acs.jpclett.2c02632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 10/20/2022] [Indexed: 05/09/2023]
Abstract
Reconstructing force fields (FFs) from atomistic simulation data is a challenge since accurate data can be highly expensive. Here, machine learning (ML) models can help to be data economic as they can be successfully constrained using the underlying symmetry and conservation laws of physics. However, so far, every descriptor newly proposed for an ML model has required a cumbersome and mathematically tedious remodeling. We therefore propose using modern techniques from algorithmic differentiation within the ML modeling process, effectively enabling the usage of novel descriptors or models fully automatically at an order of magnitude higher computational efficiency. This paradigmatic approach enables not only a versatile usage of novel representations and the efficient computation of larger systems (all of high value to the FF community) but also the simple inclusion of further physical knowledge, such as higher-order information (e.g., Hessians, more complex partial differential equations constraints etc.), even beyond the presented FF domain.
Affiliation(s)
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - BIFOLD - Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Seongbuk-gu, Seoul 02841, Korea
  - Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
  - Google Research, Brain Team, 10117 Berlin, Germany
- Stefan Chmiela
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - BIFOLD - Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
63
Schnake T, Eberle O, Lederer J, Nakajima S, Schutt KT, Muller KR, Montavon G. Higher-Order Explanations of Graph Neural Networks via Relevant Walks. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:7581-7596. [PMID: 34559639 DOI: 10.1109/tpami.2021.3115452] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Indexed: 06/13/2023]
Abstract
Graph Neural Networks (GNNs) are a popular approach for predicting graph structured data. As GNNs tightly entangle the input graph into the neural network structure, common explainable AI approaches are not applicable. To a large extent, GNNs have remained black-boxes for the user so far. In this paper, we show that GNNs can in fact be naturally explained using higher-order expansions, i.e., by identifying groups of edges that jointly contribute to the prediction. Practically, we find that such explanations can be extracted using a nested attribution scheme, where existing techniques such as layer-wise relevance propagation (LRP) can be applied at each step. The output is a collection of walks into the input graph that are relevant for the prediction. Our novel explanation method, which we denote by GNN-LRP, is applicable to a broad range of graph neural networks and lets us extract practically relevant insights on sentiment analysis of text data, structure-property relationships in quantum chemistry, and image classification.
64
Tu C, Huang W, Liang S, Wang K, Tian Q, Yan W. Combining machine learning and quantum chemical calculations for high-throughput virtual screening of thermally activated delayed fluorescence molecular materials: the impact of selection strategy and structural mutations. RSC Adv 2022; 12:30962-30975. [PMID: 36349007 PMCID: PMC9619240 DOI: 10.1039/d2ra05643g] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/07/2022] [Accepted: 10/09/2022] [Indexed: 11/23/2022]
Abstract
In view of the theoretical importance and huge application potential of Thermally Activated Delayed Fluorescence (TADF) materials, it is of great significance to conduct High-Throughput Virtual Screening (HTVS) on compound libraries to find TADF candidate molecules. This research focuses on the computational design of pure organic TADF molecules. By combining machine learning and quantum chemical calculations, using cheminformatics tools, and introducing the concepts of selection and mutation from evolutionary theory, we have designed a computational program for HTVS of TADF molecular materials; in particular, the impact of selection strategy and structural mutations on the results of HTVS was explored. An initial compound library (size = 103) constructed by enumeration of typical donors and acceptors was evolved by successively applying selection and 10 different structural mutations. A group fingerprint similarity (ΔMSPR) index was proposed to account for the similarity between two compound libraries of comparable size. Based on the computed data, we found that mixing selection and mutations into the evolution map has a great impact on the HTVS results: (a) Except for the fast mutation Sub2, all the mutations can effectively concentrate 'good' molecules in a compound library, and hence give large material abundance (typically >0.8) for high mutation generations (n_g ≥ 6). (b) The mean energy gap can exhibit a fast convergent trend toward very low values, hence the studied mutations (except Sub2) can cooperate very well with the studied DA substrates to generate optimal molecules, and the group fingerprint similarity can retain high enough values for large n_g, which can be associated with the apparent convergence in molecular skeletons as n_g increases. (c) The distribution of skeleton frequencies for a specific mutation is generally uneven, with one dominant skeleton. The overall numbers of common and generic cores for all mutations are 11 and 7 at n_g = 9. Hence, in a sense, the 'optimal' skeletons seem unique and useful in realizing low energy gaps. With these observations and the development of related HTVS software, we expect to provide insight and tools to the research community of HTVS of molecular (TADF) materials.
Affiliation(s)
- Chunyun Tu
- School of Chemistry and Materials Engineering, Guiyang University Guiyang 550005 P. R. China +86-180-9605-0905
| | - Weijiang Huang
- School of Chemistry and Materials Engineering, Guiyang University Guiyang 550005 P. R. China +86-180-9605-0905
| | - Sheng Liang
- School of Mathematics and Information Science, Guiyang University Guiyang 550005 P. R. China
| | - Kui Wang
- School of Chemistry and Materials Engineering, Guiyang University Guiyang 550005 P. R. China +86-180-9605-0905
| | - Qin Tian
- School of Chemistry and Materials Engineering, Guiyang University Guiyang 550005 P. R. China +86-180-9605-0905
| | - Wei Yan
- School of Chemistry and Materials Engineering, Guiyang University Guiyang 550005 P. R. China +86-180-9605-0905
|
65
|
Krenn M, Ai Q, Barthel S, Carson N, Frei A, Frey NC, Friederich P, Gaudin T, Gayle AA, Jablonka KM, Lameiro RF, Lemm D, Lo A, Moosavi SM, Nápoles-Duarte JM, Nigam A, Pollice R, Rajan K, Schatzschneider U, Schwaller P, Skreta M, Smit B, Strieth-Kalthoff F, Sun C, Tom G, Falk von Rudorff G, Wang A, White AD, Young A, Yu R, Aspuru-Guzik A. SELFIES and the future of molecular string representations. PATTERNS (NEW YORK, N.Y.) 2022; 3:100588. [PMID: 36277819 PMCID: PMC9583042 DOI: 10.1016/j.patter.2022.100588] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Indexed: 11/06/2022]
Abstract
Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings: most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELF-referencing embedded strings (SELFIES). SELFIES has since simplified and enabled numerous new applications in chemistry. In this perspective, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete future projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.
Affiliation(s)
- Mario Krenn
- Max Planck Institute for the Science of Light (MPL), Erlangen, Germany
| | - Qianxiang Ai
- Department of Chemistry, Fordham University, The Bronx, NY, USA
| | - Senja Barthel
- Department of Mathematics, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
| | - Nessa Carson
- Syngenta Jealott’s Hill International Research Centre, Bracknell, Berkshire, UK
| | - Angelo Frei
- Department of Chemistry, Imperial College London, Molecular Sciences Research Hub, White City Campus, Wood Lane, London, UK
| | - Nathan C. Frey
- Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Pascal Friederich
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
| | - Théophile Gaudin
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- IBM Research Europe, Zürich, Switzerland
| | | | - Kevin Maik Jablonka
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Sion, Valais, Switzerland
| | - Rafael F. Lameiro
- Medicinal and Biological Chemistry Group, São Carlos Institute of Chemistry, University of São Paulo, São Paulo, Brazil
| | - Dominik Lemm
- Faculty of Physics, University of Vienna, Vienna, Austria
| | - Alston Lo
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Seyed Mohamad Moosavi
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
| | | | - AkshatKumar Nigam
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Robert Pollice
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | - Kohulan Rajan
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller Universität Jena, Jena, Germany
| | - Ulrich Schatzschneider
- Institut für Anorganische Chemie, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
| | - Philippe Schwaller
- IBM Research Europe, Zürich, Switzerland
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Marta Skreta
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
| | - Berend Smit
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Sion, Valais, Switzerland
| | - Felix Strieth-Kalthoff
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | - Chong Sun
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Gary Tom
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | | | - Andrew Wang
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Solar Fuels Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | - Andrew D. White
- Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
| | - Adamo Young
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
| | - Rose Yu
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
| | - Alán Aspuru-Guzik
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Materials Science, University of Toronto, Toronto, ON, Canada
- Canadian Institute for Advanced Research (CIFAR) Lebovic Fellow, Toronto, ON, Canada
|
66
|
Sumita M, Terayama K, Tamura R, Tsuda K. QCforever: A Quantum Chemistry Wrapper for Everyone to Use in Black-Box Optimization. J Chem Inf Model 2022; 62:4427-4434. [PMID: 36074116 PMCID: PMC9518232 DOI: 10.1021/acs.jcim.2c00812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Indexed: 11/29/2022]
Abstract
To obtain observable physical or molecular properties such as ionization potential and fluorescent wavelength with quantum chemical (QC) computation, multi-step computation manipulated by a human is required. Hence, automating the multi-step computational process and making it a black box that can be handled by anybody are important for effective database construction and fast realistic material design through the framework of black-box optimization, where machine learning algorithms are introduced as a predictor. Here, we propose a Python library, QCforever, to automate the computation of some molecular properties and chemical phenomena induced by molecules. The tool requires only a molecule file as input; it automates the computation of molecular properties (ionization potential, fluorescence, etc.) and the output analysis, returning multiple values for evaluating a molecule. By incorporating the tool in black-box optimization, we can explore molecules that have the desired properties within the limitations of QC computation.
Affiliation(s)
- Masato Sumita
- RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan
- International Center for Materials Nanoarchitectonics (WPI-MANA), National Institute for Materials Science, Tsukuba 305-0044, Japan
| | - Kei Terayama
- RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan
- Graduate School of Medical Life Science, Yokohama City University, Tsurumi-ku, Yokohama 230-0045, Japan
| | - Ryo Tamura
- RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan
- International Center for Materials Nanoarchitectonics (WPI-MANA), National Institute for Materials Science, Tsukuba 305-0044, Japan
- Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa 277-8561, Japan
- Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, Tsukuba 305-0047, Japan
| | - Koji Tsuda
- RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan
- Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa 277-8561, Japan
- Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, Tsukuba 305-0047, Japan
|
67
|
Shmilovich K, Willmott D, Batalov I, Kornbluth M, Mailoa J, Kolter JZ. Orbital Mixer: Using Atomic Orbital Features for Basis-Dependent Prediction of Molecular Wavefunctions. J Chem Theory Comput 2022; 18:6021-6030. [PMID: 36122312 DOI: 10.1021/acs.jctc.2c00555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Leveraging ab initio data at scale has enabled the development of machine learning models capable of extremely accurate and fast molecular property prediction. A central paradigm of many previous studies focuses on generating predictions for only a fixed set of properties. Recent lines of research instead aim to explicitly learn the electronic structure via molecular wavefunctions, from which other quantum chemical properties can be directly derived. While previous methods generate predictions as a function of only the atomic configuration, in this work we present an alternate approach that directly purposes basis-dependent information to predict molecular electronic structure. Our model, Orbital Mixer, is composed entirely of multi-layer perceptrons (MLPs) using MLP-Mixer layers within a simple, intuitive, and scalable architecture that achieves competitive Hamiltonian and molecular orbital energy and coefficient prediction accuracies compared to the state-of-the-art.
Affiliation(s)
- Kirill Shmilovich
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | - Devin Willmott
- Bosch Center for Artificial Intelligence, Pittsburgh, Pennsylvania 15222, United States
| | - Ivan Batalov
- Bosch Center for Artificial Intelligence, Pittsburgh, Pennsylvania 15222, United States
| | - Mordechai Kornbluth
- Bosch Research and Technology Center, Cambridge, Massachusetts 02139, United States
| | - Jonathan Mailoa
- Tencent Quantum Laboratory, Shenzhen, Guangdong 518057, China
| | - J Zico Kolter
- Bosch Center for Artificial Intelligence, Pittsburgh, Pennsylvania 15222, United States
- Carnegie Mellon University, Pittsburgh, Pennsylvania 15222, United States
|
68
|
Abstract
Machine-learning force fields have become increasingly popular because of their balance of accuracy and speed. However, a significant limitation is the use of element-specific features, leading to poor scalability with the number of elements. This work introduces the Gaussian multipole (GMP) featurization scheme that utilizes physically relevant multipole expansions of the electron density around atoms to yield feature vectors that interpolate between element types and have a fixed dimension regardless of the number of elements present. We combine GMP with neural networks and apply these models to the MD17 and QM9 data sets, revealing high computational efficiency, systematically improvable accuracy, and the ability to make reasonable predictions on elements not included in the training set. Finally, we test GMP-based models for the OCP data set, demonstrating comparable performance to graph-convolutional models. The results indicate that this featurization scheme fills a critical gap in the construction of efficient and transferable machine-learned force fields.
Affiliation(s)
- Xiangyun Lei
- School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
| | - Andrew J Medford
- School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
|
69
|
Fedik N, Zubatyuk R, Kulichenko M, Lubbers N, Smith JS, Nebgen B, Messerly R, Li YW, Boldyrev AI, Barros K, Isayev O, Tretiak S. Extending machine learning beyond interatomic potentials for predicting molecular properties. Nat Rev Chem 2022; 6:653-672. [PMID: 37117713 DOI: 10.1038/s41570-022-00416-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Accepted: 07/15/2022] [Indexed: 11/09/2022]
Abstract
Machine learning (ML) is becoming a method of choice for modelling complex chemical processes and materials. ML provides a surrogate model trained on a reference dataset that can be used to establish a relationship between a molecular structure and its chemical properties. This Review highlights developments in the use of ML to evaluate chemical properties such as partial atomic charges, dipole moments, spin and electron densities, and chemical bonding, as well as to obtain a reduced quantum-mechanical description. We overview several modern neural network architectures, their predictive capabilities, generality and transferability, and illustrate their applicability to various chemical properties. We emphasize that learned molecular representations resemble quantum-mechanical analogues, demonstrating the ability of the models to capture the underlying physics. We also discuss how ML models can describe non-local quantum effects. Finally, we conclude by compiling a list of available ML toolboxes, summarizing the unresolved challenges and presenting an outlook for future development. The observed trends demonstrate that this field is evolving towards physics-based models augmented by ML, which is accompanied by the development of new methods and the rapid growth of user-friendly ML frameworks for chemistry.
|
70
|
Ebisawa S, Tsutsumi T, Taketsugu T. Extension of Natural Reaction Orbital Approach to Multiconfigurational Wavefunctions. J Chem Phys 2022; 157:084118. [DOI: 10.1063/5.0098230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
Abstract
Recently, we proposed a new orbital analysis method, natural reaction orbital (NRO), which automatically extracts orbital pairs that characterize electron transfer in reaction processes by singular value decomposition (SVD) of the first-order orbital response matrix to the nuclear coordinate displacements (S. Ebisawa, M. Hasebe, T. Tsutsumi, T. Tsuneda, and T. Taketsugu, Phys. Chem. Chem. Phys. 24, 3532 (2022)). NRO analysis along the intrinsic reaction coordinate (IRC) for several typical chemical reactions demonstrated that electron transfer occurs mainly in the vicinity of transition states and in regions where the energy profile along the IRC shows shoulder features, allowing the reaction mechanism to be explained in terms of electron motion. However, its application has been limited to single configuration theories such as Hartree-Fock theory and density functional theory (DFT). In this work, the concept of NRO is extended to multiconfigurational wavefunctions and formulated as the multiconfiguration NRO (MC-NRO). The MC-NRO method is applicable to various types of electronic structure theories, including multiconfigurational theory and linear response theory, and is expected to be a practical tool for extracting the qualitative essence of a broad range of chemical reactions, including covalent bond dissociation and chemical reactions in electronically excited states. In this paper, we calculate the IRC for five basic chemical reaction processes at the level of the complete active space self-consistent field (CASSCF) theory and discuss the electron transfer by performing MC-NRO analysis along each IRC. Finally, issues and future prospects of the MC-NRO method are discussed.
|
71
|
Golze D, Hirvensalo M, Hernández-León P, Aarva A, Etula J, Susi T, Rinke P, Laurila T, Caro MA. Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW. CHEMISTRY OF MATERIALS : A PUBLICATION OF THE AMERICAN CHEMICAL SOCIETY 2022; 34:6240-6254. [PMID: 35910537 PMCID: PMC9330771 DOI: 10.1021/acs.chemmater.1c04279] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 12/13/2021] [Revised: 06/30/2022] [Indexed: 06/15/2023]
Abstract
We present a quantitatively accurate machine-learning (ML) model for the computational prediction of core-electron binding energies, from which X-ray photoelectron spectroscopy (XPS) spectra can be readily obtained. Our model combines density functional theory (DFT) with GW and uses kernel ridge regression for the ML predictions. We apply the new approach to disordered materials and small molecules containing carbon, hydrogen, and oxygen and obtain qualitative and quantitative agreement with experiment, resolving spectral features within 0.1 eV of reference experimental spectra. The method only requires the user to provide a structural model for the material under study to obtain an XPS prediction within seconds. Our new tool is freely available online through the XPS Prediction Server.
Affiliation(s)
- Dorothea Golze
- Faculty of Chemistry and Food Chemistry, Technische Universität Dresden, 01062 Dresden, Germany
- Department of Applied Physics, Aalto University, 02150 Espoo, Finland
| | - Markus Hirvensalo
- Department of Applied Physics, Aalto University, 02150 Espoo, Finland
| | | | - Anja Aarva
- Department of Electrical Engineering and Automation, Aalto University, 02150 Espoo, Finland
| | - Jarkko Etula
- Department of Chemistry and Materials Science, Aalto University, 02150 Espoo, Finland
| | - Toma Susi
- University of Vienna, Faculty of Physics, Boltzmanngasse 5, 1090 Vienna, Austria
| | - Patrick Rinke
- Department of Applied Physics, Aalto University, 02150 Espoo, Finland
| | - Tomi Laurila
- Department of Electrical Engineering and Automation, Aalto University, 02150 Espoo, Finland
- Department of Chemistry and Materials Science, Aalto University, 02150 Espoo, Finland
| | - Miguel A. Caro
- Department of Electrical Engineering and Automation, Aalto University, 02150 Espoo, Finland
|
72
|
Weinreich J, Lemm D, von Rudorff GF, von Lilienfeld OA. Ab initio machine learning of phase space averages. J Chem Phys 2022; 157:024303. [PMID: 35840379 DOI: 10.1063/5.0095674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
Equilibrium structures determine material properties and biochemical functions. We here propose to machine learn phase space averages, conventionally obtained by ab initio or force-field-based molecular dynamics (MD) or Monte Carlo (MC) simulations. In analogy to ab initio MD, our ab initio machine learning (AIML) model does not require bond topologies and, therefore, enables a general machine learning pathway to obtain ensemble properties throughout the chemical compound space. We demonstrate AIML for predicting Boltzmann averaged structures after training on hundreds of MD trajectories. The AIML output is subsequently used to train machine learning models of free energies of solvation using experimental data and to reach competitive prediction errors (mean absolute error ∼ 0.8 kcal/mol) for out-of-sample molecules, within milliseconds. As such, AIML effectively bypasses the need for MD or MC-based phase space sampling, enabling exploration campaigns of Boltzmann averages throughout the chemical compound space at a much accelerated pace. We contextualize our findings by comparison to state-of-the-art methods resulting in a Pareto plot for the free energy of solvation predictions in terms of accuracy and time.
Affiliation(s)
- Jan Weinreich
- Faculty of Physics, University of Vienna, Kolingasse 14-16, AT-1090 Wien, Austria
| | - Dominik Lemm
- Faculty of Physics, University of Vienna, Kolingasse 14-16, AT-1090 Wien, Austria
|
73
|
AI-based prediction of new binding site and virtual screening for the discovery of novel P2X3 receptor antagonists. Eur J Med Chem 2022; 240:114556. [DOI: 10.1016/j.ejmech.2022.114556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2022] [Revised: 06/15/2022] [Accepted: 06/18/2022] [Indexed: 11/17/2022]
|
74
|
Sauceda HE, Gálvez-González LE, Chmiela S, Paz-Borbón LO, Müller KR, Tkatchenko A. BIGDML-Towards accurate quantum machine learning force fields for materials. Nat Commun 2022; 13:3733. [PMID: 35768400 PMCID: PMC9243122 DOI: 10.1038/s41467-022-31093-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 06/17/2021] [Accepted: 06/01/2022] [Indexed: 12/16/2022] Open
Abstract
Machine-learning force fields (MLFF) should be accurate, computationally and data efficient, and applicable to molecules, materials, and interfaces thereof. Currently, MLFFs often introduce tradeoffs that restrict their practical applicability to small subsets of chemical space or require exhaustive datasets for training. Here, we introduce the Bravais-Inspired Gradient-Domain Machine Learning (BIGDML) approach and demonstrate its ability to construct reliable force fields using a training set with just 10-200 geometries for materials including pristine and defect-containing 2D and 3D semiconductors and metals, as well as chemisorbed and physisorbed atomic and molecular adsorbates on surfaces. The BIGDML model employs the full relevant symmetry group for a given material, does not assume artificial atom types or localization of atomic interactions and exhibits high data efficiency and state-of-the-art energy accuracies (errors substantially below 1 meV per atom) for an extended set of materials. Extensive path-integral molecular dynamics carried out with BIGDML models demonstrate the counterintuitive localization of benzene-graphene dynamics induced by nuclear quantum effects and their strong contributions to the hydrogen diffusion coefficient in a Pd crystal for a wide range of temperatures.
Affiliation(s)
- Huziel E Sauceda
- Departamento de Materia Condensada, Instituto de Física, Universidad Nacional Autónoma de México, Cd. de México C.P., 04510, Mexico.
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany.
- BASLEARN - TU Berlin/BASF Joint Lab for Machine Learning, Technische Universität Berlin, 10587, Berlin, Germany.
| | - Luis E Gálvez-González
- Programa de Doctorado en Ciencias (Física), División de Ciencias Exactas y Naturales, Universidad de Sonora, Blvd. Luis Encinas & Rosales, Hermosillo, C.P., 83000, Mexico
| | - Stefan Chmiela
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
| | - Lauro Oliver Paz-Borbón
- Departamento de Física Química, Instituto de Física, Universidad Nacional Autónoma de México, Cd. de México C.P., 04510, Mexico
| | - Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany.
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany.
- Google Research, Brain team, Berlin, Germany.
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, 02841, Seoul, Korea.
- Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123, Saarbrücken, Germany.
| | - Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
|
75
|
Xiouras C, Cameli F, Quilló GL, Kavousanakis ME, Vlachos DG, Stefanidis GD. Applications of Artificial Intelligence and Machine Learning Algorithms to Crystallization. Chem Rev 2022; 122:13006-13042. [PMID: 35759465 DOI: 10.1021/acs.chemrev.2c00141] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022]
Abstract
Artificial intelligence and specifically machine learning applications are nowadays used in a variety of scientific applications and cutting-edge technologies, where they have a transformative impact. Such an assembly of statistical and linear algebra methods making use of large data sets is becoming more and more integrated into chemistry and crystallization research workflows. This review aims to present, for the first time, a holistic overview of machine learning and cheminformatics applications as a novel, powerful means to accelerate the discovery of new crystal structures, predict key properties of organic crystalline materials, simulate, understand, and control the dynamics of complex crystallization process systems, as well as contribute to high-throughput automation of chemical process development involving crystalline materials. We critically review the advances in these new, rapidly emerging research areas, raising awareness of issues such as the bridging of machine learning models with first-principles mechanistic models, data set size, structure, and quality, as well as the selection of appropriate descriptors. At the same time, we propose future research at the interface of applied mathematics, chemistry, and crystallography. Overall, this review aims to increase the adoption of such methods and tools by chemists and scientists across industry and academia.
Affiliation(s)
- Christos Xiouras
- Chemical Process R&D, Crystallization Technology Unit, Janssen R&D, Turnhoutseweg 30, 2340 Beerse, Belgium
| | - Fabio Cameli
- Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Newark, Delaware 19716, United States
| | - Gustavo Lunardon Quilló
- Chemical Process R&D, Crystallization Technology Unit, Janssen R&D, Turnhoutseweg 30, 2340 Beerse, Belgium
- Chemical and BioProcess Technology and Control, Department of Chemical Engineering, Faculty of Engineering Technology, KU Leuven, Gebroeders de Smetstraat 1, 9000 Ghent, Belgium
| | - Mihail E Kavousanakis
- School of Chemical Engineering, National Technical University of Athens, Heroon Polytechniou 9, 15780 Zografou, Greece
| | - Dionisios G Vlachos
- Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Newark, Delaware 19716, United States
| | - Georgios D Stefanidis
- School of Chemical Engineering, National Technical University of Athens, Heroon Polytechniou 9, 15780 Zografou, Greece
- Laboratory for Chemical Technology, Ghent University; Tech Lane Ghent Science Park 125, B-9052 Ghent, Belgium
|
76
|
Lee S, Ermanis K, Goodman JM. MolE8: finding DFT potential energy surface minima values from force-field optimised organic molecules with new machine learning representations. Chem Sci 2022; 13:7204-7214. [PMID: 35799803 PMCID: PMC9214916 DOI: 10.1039/d1sc06324c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2021] [Accepted: 05/23/2022] [Indexed: 11/21/2022] Open
Abstract
The use of machine learning techniques in computational chemistry has gained significant momentum since large molecular databases are now readily available. Predictions of molecular properties using machine learning have advantages over traditional quantum mechanics calculations because they can be computationally cheaper without losing accuracy. We present a new extrapolatable and explainable molecular representation based on bonds, angles and dihedrals that can be used to train machine learning models. The trained models can accurately predict the electronic energy and the free energy of small organic molecules with atom types C, H, N and O, with a mean absolute error of 1.2 kcal mol⁻¹. The models can be extrapolated to larger organic molecules with an average error of less than 3.7 kcal mol⁻¹ for 10 or fewer heavy atoms, which represent a chemical space two orders of magnitude larger. Rapid energy predictions for multiple molecules, up to 7 times faster than previous ML models of similar accuracy, have been achieved by sampling geometries around the potential energy surface minima. Therefore, the input geometries do not have to be located precisely on the minima, and we show that accurate density functional theory energy predictions can be made from force-field optimised geometries with a mean absolute error of 2.5 kcal mol⁻¹.
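As a rough illustration of the kind of internal coordinates such a representation is built from, the sketch below computes a bond length, a bond angle and a dihedral angle from Cartesian coordinates. It is not the MolE8 featurisation itself; the function names and the toy coordinates are assumptions made for this example only.

```python
import numpy as np

def bond_length(p0, p1):
    """Distance between two atoms (same units as the input coordinates)."""
    return float(np.linalg.norm(np.asarray(p1, float) - np.asarray(p0, float)))

def bond_angle(p0, p1, p2):
    """Angle p0-p1-p2 in degrees, with p1 at the vertex."""
    v0 = np.asarray(p0, float) - np.asarray(p1, float)
    v1 = np.asarray(p2, float) - np.asarray(p1, float)
    cos_theta = np.dot(v0, v1) / (np.linalg.norm(v0) * np.linalg.norm(v1))
    # Clip guards against round-off pushing the cosine slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

def dihedral_angle(p0, p1, p2, p3):
    """Signed dihedral p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

# Hypothetical four-atom chain used only to exercise the functions.
atoms = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
length = bond_length(atoms[0], atoms[1])
angle = bond_angle(atoms[0], atoms[1], atoms[2])
torsion = dihedral_angle(*atoms)
```

Collecting such values over all bonded pairs, triples and quadruples of a molecule gives a geometry description that does not depend on the molecule's position or orientation.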
Affiliation(s)
- Sanha Lee
- Yusuf Hamied Department of Chemistry, University of Cambridge Lensfield Road Cambridge CB2 1EW UK
| | | | - Jonathan M Goodman
- Yusuf Hamied Department of Chemistry, University of Cambridge Lensfield Road Cambridge CB2 1EW UK
|
77
|
Isert C, Atz K, Jiménez-Luna J, Schneider G. QMugs, quantum mechanical properties of drug-like molecules. Sci Data 2022; 9:273. [PMID: 35672335 PMCID: PMC9174255 DOI: 10.1038/s41597-022-01390-7] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 08/15/2021] [Accepted: 05/17/2022] [Indexed: 12/16/2022] Open
Abstract
Machine learning approaches in drug discovery, as well as in other areas of the chemical sciences, benefit from curated datasets of physical molecular properties. However, there currently is a lack of data collections featuring large bioactive molecules alongside first-principle quantum chemical information. The open-access QMugs (Quantum-Mechanical Properties of Drug-like Molecules) dataset fills this void. The QMugs collection comprises quantum mechanical properties of more than 665 k biologically and pharmacologically relevant molecules extracted from the ChEMBL database, totaling ~2 M conformers. QMugs contains optimized molecular geometries and thermodynamic data obtained via the semi-empirical method GFN2-xTB. Atomic and molecular properties are provided on both the GFN2-xTB and on the density-functional levels of theory (DFT, ωB97X-D/def2-SVP). QMugs features molecules of significantly larger size than previously-reported collections and comprises their respective quantum mechanical wave functions, including DFT density and orbital matrices. This dataset is intended to facilitate the development of models that learn from molecular data on different levels of theory while also providing insight into the corresponding relationships between molecular structure and biological activity.
Affiliation(s)
- Clemens Isert
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8093, Zurich, Switzerland
| | - Kenneth Atz
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8093, Zurich, Switzerland
| | - José Jiménez-Luna
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8093, Zurich, Switzerland.
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397, Biberach an der Riss, Germany.
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8093, Zurich, Switzerland.
- ETH Singapore SEC Ltd, 1 CREATE Way, #06-01 CREATE Tower, Singapore, 138602, Singapore.
|
78
|
Winkler L, Müller KR, Sauceda HE. High-fidelity molecular dynamics trajectory reconstruction with bi-directional neural networks. MACHINE LEARNING: SCIENCE AND TECHNOLOGY 2022. [DOI: 10.1088/2632-2153/ac6ec6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022] Open
Abstract
Molecular dynamics (MD) simulations are a cornerstone in science, enabling the investigation of a system’s thermodynamics all the way to analyzing intricate molecular interactions. In general, creating extended molecular trajectories can be a computationally expensive process, for example, when running ab-initio simulations. Hence, repeating such calculations to either obtain more accurate thermodynamics or to get a higher resolution in the dynamics generated by a fine-grained quantum interaction can be time- and computational resource-consuming. In this work, we explore different machine learning methodologies to increase the resolution of MD trajectories on-demand within a post-processing step. As a proof of concept, we analyse the performance of bi-directional neural networks (NNs) such as neural ODEs, Hamiltonian networks, recurrent NNs and long short-term memories, as well as the uni-directional variants as a reference, for MD simulations (here: the MD17 dataset). We have found that Bi-LSTMs are the best performing models; by utilizing the local time-symmetry of thermostated trajectories they can even learn long-range correlations and display high robustness to noisy dynamics across molecular complexity. Our models can reach accuracies of up to 10⁻⁴ Å in trajectory interpolation, which leads to the faithful reconstruction of several unseen high-frequency molecular vibration cycles. This renders the learned and reference trajectories indistinguishable. The results reported in this work can serve (1) as a baseline for larger systems, as well as (2) for the construction of better MD integrators.
|
79
|
Atz K, Isert C, Böcker MNA, Jiménez-Luna J, Schneider G. Δ-Quantum machine-learning for medicinal chemistry. Phys Chem Chem Phys 2022; 24:10775-10783. [PMID: 35470831 PMCID: PMC9093086 DOI: 10.1039/d2cp00834c] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 02/19/2022] [Accepted: 04/05/2022] [Indexed: 11/21/2022]
Abstract
Many molecular design tasks benefit from fast and accurate calculations of quantum-mechanical (QM) properties. However, the computational cost of QM methods applied to drug-like molecules currently renders large-scale applications of quantum chemistry challenging. Aiming to mitigate this problem, we developed DelFTa, an open-source toolbox for the prediction of electronic properties of drug-like molecules at the density functional (DFT) level of theory, using Δ-machine-learning. Δ-Learning corrects the prediction error (Δ) of a fast but inaccurate property calculation. DelFTa employs state-of-the-art three-dimensional message-passing neural networks trained on a large dataset of QM properties. It provides access to a wide array of quantum observables on the molecular, atomic and bond levels by predicting approximations to DFT values from a low-cost semiempirical baseline. Δ-Learning outperformed its direct-learning counterpart for most of the considered QM endpoints. The results suggest that predictions for non-covalent intra- and intermolecular interactions can be extrapolated to larger biomolecular systems. The software is fully open-sourced and features documented command-line and Python APIs.
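The Δ-learning idea described above (learn the correction to a cheap baseline, then add it back) can be sketched in a few lines. This is a minimal illustration and not the DelFTa API: a linear least-squares fit stands in for the message-passing neural network, and the function names and synthetic data are assumptions for this example.

```python
import numpy as np

def fit_delta_model(features, low_level, high_level):
    """Fit a linear model to the correction (high_level - low_level)."""
    design = np.column_stack([features, np.ones(len(features))])
    delta = high_level - low_level
    coef, *_ = np.linalg.lstsq(design, delta, rcond=None)
    return coef

def predict_high_level(coef, features, low_level):
    """Baseline value plus the learned correction."""
    design = np.column_stack([features, np.ones(len(features))])
    return low_level + design @ coef

# Synthetic stand-ins: "low_level" plays the semiempirical baseline,
# "high_level" the DFT reference, and the true correction is linear.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 3))
low_level = rng.normal(size=50)
high_level = low_level + features @ np.array([0.5, -0.2, 0.1]) + 0.3

coef = fit_delta_model(features, low_level, high_level)
predicted = predict_high_level(coef, features, low_level)
```

The design choice is that the model only has to capture the usually smaller and smoother difference between the two levels of theory, rather than the full property.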
Affiliation(s)
- Kenneth Atz
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland.
| | - Clemens Isert
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland.
| | - Markus N A Böcker
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland.
| | - José Jiménez-Luna
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland.
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland.
- ETH Singapore SEC Ltd., 1 CREATE Way, #06-01 CREATE Tower, Singapore 138602, Singapore
|
80
|
Mirzoev AA, Gelchinski BR, Rempel AA. Neural Network Prediction of Interatomic Interaction in Multielement Substances and High-Entropy Alloys: A Review. DOKLADY PHYSICAL CHEMISTRY 2022. [DOI: 10.1134/s0012501622700026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/29/2022]
|
81
|
Rankine CD, Penfold TJ. Accurate, affordable, and generalizable machine learning simulations of transition metal x-ray absorption spectra using the XANESNET deep neural network. J Chem Phys 2022; 156:164102. [PMID: 35490005 DOI: 10.1063/5.0087255] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 12/21/2022] Open
Abstract
The affordable, accurate, and generalizable prediction of spectroscopic observables plays a key role in the analysis of increasingly complex experiments. In this article, we develop and deploy a deep neural network, XANESNET, for predicting the lineshape of first-row transition metal K-edge x-ray absorption near-edge structure (XANES) spectra. XANESNET predicts the spectral intensities using only information about the local coordination geometry of the transition metal complexes encoded in a feature vector of weighted atom-centered symmetry functions. We address in detail the calibration of the feature vector for the particularities of the problem at hand, and we explore the individual feature importance to reveal the physical insight that XANESNET obtains at the Fe K-edge. XANESNET relies on only a few judiciously selected features: radial information on the first and second coordination shells suffices, along with angular information sufficient to satisfactorily separate key coordination geometries. The feature importance is found to reflect the XANES spectral window under consideration and is consistent with the expected underlying physics. We subsequently apply XANESNET at nine first-row transition metal (Ti-Zn) K-edges. It can be optimized in as little as a minute, predicts instantaneously, and provides K-edge XANES spectra with an average accuracy of ∼±2%-4% in which the positions of prominent peaks are matched with a >90% hit rate to sub-eV (∼0.8 eV) error.
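For readers unfamiliar with atom-centered symmetry functions, the following sketch evaluates one radial, element-weighted symmetry function around an absorbing atom. It uses the common Gaussian-times-cosine-cutoff form and weights each neighbour by its atomic number, which is an assumption about the weighting; the parameter names (`eta`, `r_shift`, `r_cut`) are illustrative and not taken from XANESNET.

```python
import numpy as np

def cosine_cutoff(r, r_cut):
    """Smoothly decays from 1 at r = 0 to 0 at r = r_cut, and is 0 beyond."""
    r = np.asarray(r, float)
    return np.where(r < r_cut, 0.5 * (np.cos(np.pi * r / r_cut) + 1.0), 0.0)

def weighted_radial_sf(distances, atomic_numbers, eta, r_shift, r_cut):
    """Sum over neighbours of Z_j * exp(-eta * (r_j - r_shift)^2) * f_c(r_j)."""
    r = np.asarray(distances, float)
    z = np.asarray(atomic_numbers, float)
    gaussians = np.exp(-eta * (r - r_shift) ** 2)
    return float(np.sum(z * gaussians * cosine_cutoff(r, r_cut)))

# Hypothetical environment: one oxygen at 2.0 Å and one nitrogen beyond the cutoff.
value = weighted_radial_sf([2.0, 5.0], [8, 7], eta=1.0, r_shift=2.0, r_cut=4.0)
```

Evaluating the function for several `r_shift` values yields a fixed-length vector that probes successive coordination shells.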
Affiliation(s)
- C D Rankine
- Chemistry-School of Natural and Environmental Sciences, Newcastle University, Newcastle Upon Tyne NE1 7RU, United Kingdom
| | - T J Penfold
- Chemistry-School of Natural and Environmental Sciences, Newcastle University, Newcastle Upon Tyne NE1 7RU, United Kingdom
|
82
|
Zheng P, Yang W, Wu W, Isayev O, Dral PO. Toward Chemical Accuracy in Predicting Enthalpies of Formation with General-Purpose Data-Driven Methods. J Phys Chem Lett 2022; 13:3479-3491. [PMID: 35416675 DOI: 10.1021/acs.jpclett.2c00734] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 06/14/2023]
Abstract
Enthalpies of formation and reaction are important thermodynamic properties that have a crucial impact on the outcome of chemical transformations. Here we implement the calculation of enthalpies of formation with a general-purpose ANI-1ccx neural network atomistic potential. We demonstrate on a wide range of benchmark sets that both ANI-1ccx and our other general-purpose data-driven method AIQM1 approach the coveted chemical accuracy of 1 kcal/mol with the speed of semiempirical quantum mechanical methods (AIQM1) or faster (ANI-1ccx). Remarkably, this is achieved without specifically training the machine learning parts of ANI-1ccx or AIQM1 on formation enthalpies. Importantly, we show that these data-driven methods provide statistical means for uncertainty quantification of their predictions, which we use to detect and eliminate outliers and revise reference experimental data. Uncertainty quantification may also help in the systematic improvement of such data-driven methods.
Affiliation(s)
- Peikun Zheng
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Wudi Yang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Wei Wu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
|
83
|
Abstract
We propose to relax geometries throughout chemical compound space (CCS) using alchemical perturbation density functional theory (APDFT). APDFT refers to perturbation theory involving changes in nuclear charges within approximate solutions to Schrödinger's equation. We give an analytical formula to calculate the mixed second-order energy derivatives with respect to both nuclear charges and nuclear positions (named "alchemical force") within the restricted Hartree-Fock case. We have implemented and studied the formula for its use in geometry relaxation of various reference and target molecules. We have also analysed the convergence of the alchemical force perturbation series, as well as basis set effects. Interpolating alchemically predicted energies, forces, and Hessian to a Morse potential yields more accurate geometries and equilibrium energies than performing a standard Newton-Raphson step. Our numerical predictions for small molecules including BF, CO, N₂, CH₄, NH₃, H₂O, and HF yield mean absolute errors of equilibrium energies and bond lengths smaller than 10 mHa and 0.01 Bohr, respectively, for 4th-order APDFT predictions. Our alchemical geometry relaxation still preserves the combinatorial efficiency of APDFT: based on a single coupled perturbed Hartree-Fock derivative for benzene, we provide numerical predictions of equilibrium energies and relaxed structures of all 17 iso-electronic charge-neutral BN-doped mutants with averaged absolute deviations of ∼27 mHa and ∼0.12 Bohr, respectively.
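To see why a single Newton-Raphson step can be a poor estimate of the equilibrium bond length on an anharmonic curve, the toy sketch below takes one such step on a one-dimensional Morse potential. It does not implement APDFT or the authors' Morse interpolation; the parameter values are arbitrary and chosen only for illustration.

```python
import math

def morse_derivatives(r, d_e, a, r_e):
    """Energy, first and second derivative of V(r) = d_e * (1 - exp(-a (r - r_e)))^2."""
    u = math.exp(-a * (r - r_e))
    energy = d_e * (1.0 - u) ** 2
    gradient = 2.0 * d_e * a * u * (1.0 - u)
    hessian = 2.0 * d_e * a * a * u * (2.0 * u - 1.0)
    return energy, gradient, hessian

def newton_step(r, gradient, hessian):
    """One Newton-Raphson update towards a stationary point."""
    return r - gradient / hessian

# Arbitrary illustrative parameters: well depth 1, width 1, minimum at r_e = 1.
d_e, a, r_e = 1.0, 1.0, 1.0
r_start = 1.3
_, grad, hess = morse_derivatives(r_start, d_e, a, r_e)
r_newton = newton_step(r_start, grad, hess)
```

With these values the step overshoots the minimum and still leaves an error of roughly 0.24 in the bond length, because the quadratic model ignores the anharmonicity that a Morse form captures by construction.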
|
84
|
Fabregat R, Fabrizio A, Engel EA, Meyer B, Juraskova V, Ceriotti M, Corminboeuf C. Local Kernel Regression and Neural Network Approaches to the Conformational Landscapes of Oligopeptides. J Chem Theory Comput 2022; 18:1467-1479. [PMID: 35179897 PMCID: PMC8908737 DOI: 10.1021/acs.jctc.1c00813] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/11/2021] [Indexed: 11/30/2022]
Abstract
The application of machine learning to theoretical chemistry has made it possible to combine the accuracy of quantum chemical energetics with the thorough sampling of finite-temperature fluctuations. To reach this goal, a diverse set of methods has been proposed, ranging from simple linear models to kernel regression and highly nonlinear neural networks. Here we apply two widely different approaches to the same, challenging problem: the sampling of the conformational landscape of polypeptides at finite temperature. We develop a local kernel regression (LKR) coupled with a supervised sparsity method and compare it with a more established approach based on Behler-Parrinello type neural networks. In the context of the LKR, we discuss how the supervised selection of the reference pool of environments is crucial to achieve accurate potential energy surfaces at a competitive computational cost and leverage the locality of the model to infer which chemical environments are poorly described by the DFTB baseline. We then discuss the relative merits of the two frameworks and perform Hamiltonian-reservoir replica-exchange Monte Carlo sampling and metadynamics simulations, respectively, to demonstrate that both frameworks can achieve converged and transferable sampling of the conformational landscape of complex and flexible biomolecules with comparable accuracy and computational cost.
Affiliation(s)
- Raimon Fabregat
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
| | - Alberto Fabrizio
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
| | - Edgar A. Engel
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| | - Benjamin Meyer
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
| | - Veronika Juraskova
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
| | - Michele Ceriotti
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| | - Clemence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
|
85
|
Gebauer NWA, Gastegger M, Hessmann SSP, Müller KR, Schütt KT. Inverse design of 3d molecular structures with conditional generative neural networks. Nat Commun 2022; 13:973. [PMID: 35190542 PMCID: PMC8861047 DOI: 10.1038/s41467-022-28526-y] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Received: 09/10/2021] [Accepted: 01/28/2022] [Indexed: 11/09/2022] Open
Abstract
The rational design of molecules with desired properties is a long-standing challenge in chemistry. Generative neural networks have emerged as a powerful approach to sample novel molecules from a learned distribution. Here, we propose a conditional generative neural network for 3d molecular structures with specified chemical and structural properties. This approach is agnostic to chemical bonding and enables targeted sampling of novel molecules from conditional distributions, even in domains where reference calculations are sparse. We demonstrate the utility of our method for inverse design by generating molecules with specified motifs or composition, discovering particularly stable molecules, and jointly targeting multiple electronic properties beyond the training regime.
Affiliation(s)
- Niklas W A Gebauer
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany.
- Berlin Institute for the Foundations of Learning and Data, 10587, Berlin, Germany.
- BASLEARN-TU Berlin/BASF Joint Lab for Machine Learning, Technische Universität Berlin, 10587, Berlin, Germany.
| | - Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- BASLEARN-TU Berlin/BASF Joint Lab for Machine Learning, Technische Universität Berlin, 10587, Berlin, Germany
| | - Stefaan S P Hessmann
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- Berlin Institute for the Foundations of Learning and Data, 10587, Berlin, Germany
| | - Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- Berlin Institute for the Foundations of Learning and Data, 10587, Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul, 02841, Korea
- Max-Planck-Institut für Informatik, 66123, Saarbrücken, Germany
| | - Kristof T Schütt
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany.
- Berlin Institute for the Foundations of Learning and Data, 10587, Berlin, Germany.
|
86
|
Kalikadien AV, Pidko EA, Sinha V. ChemSpaX: exploration of chemical space by automated functionalization of molecular scaffold. DIGITAL DISCOVERY 2022; 1:8-25. [PMID: 35340336 PMCID: PMC8887922 DOI: 10.1039/d1dd00017a] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/02/2021] [Accepted: 12/23/2021] [Indexed: 12/19/2022]
Abstract
Exploration of the local chemical space of molecular scaffolds by post-functionalization (PF) is a promising route to discover novel molecules with desired structure and function. PF with rationally chosen substituents based on known electronic and steric properties is a commonly used experimental and computational strategy in screening, design and optimization of catalytic scaffolds. Automated generation of reasonably accurate geometric representations of post-functionalized molecular scaffolds is highly desirable for data-driven applications. However, automated PF of transition metal (TM) complexes remains challenging. In this work a Python-based workflow, ChemSpaX, that is aimed at automating the PF of a given molecular scaffold with special emphasis on TM complexes, is introduced. In three representative applications of ChemSpaX by comparing with DFT and DFT-B calculations, we show that the generated structures have a reasonable quality for use in computational screening applications. Furthermore, we show that ChemSpaX generated geometries can be used in machine learning applications to accurately predict DFT computed HOMO-LUMO gaps for transition metal complexes. ChemSpaX is open-source and aims to bolster and democratize the efforts of the scientific community towards data-driven chemical discovery.
Affiliation(s)
- Adarsh V Kalikadien
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology Van der Maasweg 9 2629 HZ Delft The Netherlands
| | - Evgeny A Pidko
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology Van der Maasweg 9 2629 HZ Delft The Netherlands
| | - Vivek Sinha
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology Van der Maasweg 9 2629 HZ Delft The Netherlands
|
87
|
Wen M, Blau SM, Xie X, Dwaraknath S, Persson KA. Improving machine learning performance on small chemical reaction data with unsupervised contrastive pretraining. Chem Sci 2022; 13:1446-1458. [PMID: 35222929 PMCID: PMC8809395 DOI: 10.1039/d1sc06515g] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/22/2021] [Accepted: 01/09/2022] [Indexed: 11/21/2022] Open
Abstract
Machine learning (ML) methods have great potential to transform chemical discovery by accelerating the exploration of chemical space and drawing scientific insights from data. However, modern chemical reaction ML models, such as those based on graph neural networks (GNNs), must be trained on a large amount of labelled data in order to avoid overfitting the data and thus possessing low accuracy and transferability. In this work, we propose a strategy to leverage unlabelled data to learn accurate ML models for small labelled chemical reaction data. We focus on an old and prominent problem, classifying reactions into distinct families, and build a GNN model for this task. We first pretrain the model on unlabelled reaction data using unsupervised contrastive learning and then fine-tune it on a small number of labelled reactions. The contrastive pretraining learns by making the representations of two augmented versions of a reaction similar to each other but distinct from other reactions. We propose chemically consistent reaction augmentation methods that protect the reaction center and find they are the key for the model to extract relevant information from unlabelled data to aid the reaction classification task. The transfer learned model outperforms a supervised model trained from scratch by a large margin. Further, it consistently performs better than models based on traditional rule-driven reaction fingerprints, which have long been the default choice for small datasets, as well as those based on reaction fingerprints derived from masked language modelling. In addition to reaction classification, the effectiveness of the strategy is tested on regression datasets; the learned GNN-based reaction fingerprints can also be used to navigate the chemical reaction space, which we demonstrate by querying for similar reactions. The strategy can be readily applied to other predictive reaction problems to uncover the power of unlabelled data for learning better models with a limited supply of labels.
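The contrastive objective described above (pull two augmented views of the same reaction together, push other reactions away) can be sketched with a SimCLR-style NT-Xent loss. The abstract does not name the exact loss, so this choice, the temperature value and the random embeddings standing in for GNN outputs are all assumptions for illustration.

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss for paired embeddings; row i of z_a and z_b are two views of item i."""
    n = len(z_a)
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity via unit vectors
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                      # an embedding is never its own negative
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), positives].mean())

# Random vectors stand in for the embeddings of augmented reactions.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(8, 4))
aligned_loss = nt_xent_loss(view_a, view_a + 0.01 * rng.normal(size=(8, 4)))
unrelated_loss = nt_xent_loss(view_a, rng.normal(size=(8, 4)))
```

The loss is lower when the two views of each item are close and distinct from the rest of the batch, which is the signal the pretraining stage optimises before fine-tuning on labelled reactions.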
Affiliation(s)
- Mingjian Wen
- Energy Technologies Area, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
| | - Samuel M Blau
- Energy Technologies Area, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
| | - Xiaowei Xie
- College of Chemistry, University of California Berkeley CA 94720 USA
- Materials Science Division, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
| | | | - Kristin A Persson
- Department of Materials Science and Engineering, University of California Berkeley CA 94720 USA
- Molecular Foundry, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
|
88
|
Pereyaslavets L, Kamath G, Butin O, Illarionov A, Olevanov M, Kurnikov I, Sakipov S, Leontyev I, Voronina E, Gannon T, Nawrocki G, Darkhovskiy M, Ivahnenko I, Kostikov A, Scaranto J, Kurnikova MG, Banik S, Chan H, Sternberg MG, Sankaranarayanan SKRS, Crawford B, Potoff J, Levitt M, Kornberg RD, Fain B. Accurate determination of solvation free energies of neutral organic compounds from first principles. Nat Commun 2022; 13:414. [PMID: 35058472 PMCID: PMC8776904 DOI: 10.1038/s41467-022-28041-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 06/18/2021] [Accepted: 01/03/2022] [Indexed: 12/28/2022] Open
Abstract
The main goal of molecular simulation is to accurately predict experimental observables of molecular systems. Another long-standing goal is to devise models for arbitrary neutral organic molecules with little or no reliance on experimental data. While separately these goals have been met to various degrees, for an arbitrary system of molecules they have not been achieved simultaneously. For biophysical ensembles that exist at room temperature and pressure, and where the entropic contributions are on par with interaction strengths, it is the free energies that are both most important and most difficult to predict. We compute the free energies of solvation for a diverse set of neutral organic compounds using a polarizable force field fitted entirely to ab initio calculations. The mean absolute errors (MAE) of hydration, cyclohexane solvation, and corresponding partition coefficients are 0.2 kcal/mol, 0.3 kcal/mol and 0.22 log units, i.e. within chemical accuracy. The model (ARROW FF) is multipolar, polarizable, and its accompanying simulation stack includes nuclear quantum effects (NQE). The simulation tools' computational efficiency is on a par with current state-of-the-art packages. The construction of a wide-coverage molecular modelling toolset from first principles, together with its excellent predictive ability in the liquid phase is a major advance in biomolecular simulation.
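The link between the two solvation free energies and the partition coefficient quoted above follows from the standard relation log P = -ΔG_transfer / (RT ln 10), where ΔG_transfer is the free energy of moving the solute from water to cyclohexane. The sketch below applies it; the function name and the default temperature of 298.15 K are assumptions for this example, not details from the paper.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872041  # kcal mol^-1 K^-1

def log_partition_coefficient(dg_hydration, dg_cyclohexane, temperature=298.15):
    """Cyclohexane/water log P from solvation free energies given in kcal/mol."""
    dg_transfer = dg_cyclohexane - dg_hydration  # water -> cyclohexane
    return -dg_transfer / (GAS_CONSTANT_KCAL * temperature * math.log(10.0))

# Free energy equivalent of one log unit at the assumed temperature.
kcal_per_log_unit = GAS_CONSTANT_KCAL * 298.15 * math.log(10.0)

# Hypothetical solute that is 1.0 kcal/mol better solvated in cyclohexane.
example_log_p = log_partition_coefficient(dg_hydration=-3.0, dg_cyclohexane=-4.0)
```

At 298.15 K one log unit corresponds to about 1.36 kcal/mol, so errors of a few tenths of a kcal/mol in the free energies translate into errors of a few tenths of a log unit in log P.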
Affiliation(s)
| | - Ganesh Kamath
- InterX Inc, 805 Allston Way, Berkeley, CA, 94710, USA
| | - Oleg Butin
- InterX Inc, 805 Allston Way, Berkeley, CA, 94710, USA
| | | | - Michael Olevanov
- InterX Inc, 805 Allston Way, Berkeley, CA, 94710, USA
- Faculty of Physics, Lomonosov Moscow State University, Moscow, 119991, Russia
| | - Igor Kurnikov
- InterX Inc, 805 Allston Way, Berkeley, CA, 94710, USA
| | | | - Igor Leontyev
- InterX Inc, 805 Allston Way, Berkeley, CA, 94710, USA
| | - Ekaterina Voronina
- InterX Inc, 805 Allston Way, Berkeley, CA, 94710, USA
- Faculty of Physics, Lomonosov Moscow State University, Moscow, 119991, Russia
| | - Tyler Gannon
- InterX Inc, 805 Allston Way, Berkeley, CA, 94710, USA
| | | | | | | | | | - Jessica Scaranto
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
| | - Maria G Kurnikova
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
| | - Suvo Banik
- Center for Nanoscale Materials, Argonne National Lab, Argonne, IL, 60439, USA
- Department of Mechanical and Industrial Engineering, University of Illinois, Chicago, IL, 60607, USA
| | - Henry Chan
- Center for Nanoscale Materials, Argonne National Lab, Argonne, IL, 60439, USA
- Department of Mechanical and Industrial Engineering, University of Illinois, Chicago, IL, 60607, USA
| | - Michael G Sternberg
- Center for Nanoscale Materials, Argonne National Lab, Argonne, IL, 60439, USA
| | - Subramanian K R S Sankaranarayanan
- Center for Nanoscale Materials, Argonne National Lab, Argonne, IL, 60439, USA
- Department of Mechanical and Industrial Engineering, University of Illinois, Chicago, IL, 60607, USA
| | - Brad Crawford
- Department of Chemical Engineering and Materials Science, Wayne State University, Detroit, MI, 48202, USA
| | - Jeffrey Potoff
- Department of Chemical Engineering and Materials Science, Wayne State University, Detroit, MI, 48202, USA
| | - Michael Levitt
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA, 94304, USA
| | - Roger D Kornberg
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA, 94304, USA
| | - Boris Fain
- InterX Inc, 805 Allston Way, Berkeley, CA, 94710, USA.
| |
|
89
|
Steiner M, Reiher M. Autonomous Reaction Network Exploration in Homogeneous and Heterogeneous Catalysis. Top Catal 2022; 65:6-39. [PMID: 35185305 PMCID: PMC8816766 DOI: 10.1007/s11244-021-01543-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Accepted: 11/17/2021] [Indexed: 12/11/2022]
Abstract
Autonomous computations that rely on automated reaction network elucidation algorithms may pave the way to putting computational catalysis on a par with experimental research in the field. Several advantages of this approach are key to catalysis: (i) Automation allows one to consider orders of magnitude more structures in a systematic and open-ended fashion than what would be accessible by manual inspection. Eventually, full resolution in terms of structural varieties and conformations as well as with respect to the type and number of potentially important elementary reaction steps (including decomposition reactions that determine turnover numbers) may be achieved. (ii) Fast electronic structure methods with uncertainty quantification warrant high efficiency and reliability in order to not only deliver results quickly, but also to allow for predictive work. (iii) A high degree of autonomy reduces the amount of manual human work, processing errors, and human bias. Although inherently unbiased, the approach is still steerable with respect to specific regions of an emerging network and with respect to the addition of new reactant species. This allows for a high fidelity of the formalization of some catalytic process and for surprising in silico discoveries. In this work, we first review the state of the art in computational catalysis to embed autonomous explorations into the general field from which it draws its ingredients. We then elaborate on the specific conceptual issues that arise in the context of autonomous computational procedures, some of which we discuss using an example catalytic system.
Affiliation(s)
- Miguel Steiner
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
| | - Markus Reiher
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
| |
|
90
|
Li S, Liu Y, Chen D, Jiang Y, Nie Z, Pan F. Encoding the atomic structure for machine learning in materials science. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1558] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022]
Affiliation(s)
- Shunning Li
- School of Advanced Materials Peking University, Shenzhen Graduate School Shenzhen China
| | - Yuanji Liu
- School of Advanced Materials Peking University, Shenzhen Graduate School Shenzhen China
| | - Dong Chen
- School of Advanced Materials Peking University, Shenzhen Graduate School Shenzhen China
| | - Yi Jiang
- School of Advanced Materials Peking University, Shenzhen Graduate School Shenzhen China
| | - Zhiwei Nie
- School of Advanced Materials Peking University, Shenzhen Graduate School Shenzhen China
| | - Feng Pan
- School of Advanced Materials Peking University, Shenzhen Graduate School Shenzhen China
| |
|
91
|
Landeros-Rivera B, Gallegos M, Munarriz J, Laplaza R, Contreras García J. New venues in electron density analysis. Phys Chem Chem Phys 2022; 24:21538-21548. [DOI: 10.1039/d2cp01517j] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]
Abstract
We provide a comprehensive overview of the chemical information within the electron density: how to extract information, but also how to obtain and how to assess the quality of the...
|
92
|
Karthikeyan A, Priyakumar UD. Artificial intelligence: machine learning for chemical sciences. J CHEM SCI 2021; 134:2. [PMID: 34955617 PMCID: PMC8691161 DOI: 10.1007/s12039-021-01995-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/09/2021] [Revised: 09/08/2021] [Accepted: 09/14/2021] [Indexed: 12/05/2022]
Abstract
Research in molecular sciences witnessed the rise and fall of Artificial Intelligence (AI)/Machine Learning (ML) methods, especially artificial neural networks, a few decades ago. However, we have seen a major resurgence in the use of modern ML methods in scientific research during the last few years. These methods have had phenomenal success in the areas of computer vision, speech recognition, natural language processing (NLP), etc. This has inspired chemists and biologists to apply these algorithms to problems in natural sciences. The availability of high-performance Graphics Processing Unit (GPU) accelerators, large datasets, new algorithms, and libraries has enabled this surge. ML algorithms have been successfully applied to various domains in molecular sciences by providing much faster and sometimes more accurate solutions compared to traditional methods such as Quantum Mechanical (QM) calculations, Density Functional Theory (DFT) or Molecular Mechanics (MM) based methods. Some of the areas where ML methods have been shown to be effective are drug design, prediction of high-level quantum mechanical energies, molecular design, molecular dynamics materials, and retrosynthesis of organic compounds. This article intends to conceptually introduce various modern ML methods and their relevance and applications in computational natural sciences. Synopsis: The recent surge in the application of machine learning (ML) methods in fundamental sciences has led to a perspective that these methods may become important tools in chemical science. This perspective provides an overview of the modern ML methods and their successful applications in chemistry during the last few years.
Affiliation(s)
- Akshaya Karthikeyan
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500 032 India
| | - U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500 032 India
| |
|
93
|
Jeong W, Gaggioli CA, Gagliardi L. Active Learning Configuration Interaction for Excited-State Calculations of Polycyclic Aromatic Hydrocarbons. J Chem Theory Comput 2021; 17:7518-7530. [PMID: 34787422 PMCID: PMC8675132 DOI: 10.1021/acs.jctc.1c00769] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/31/2021] [Indexed: 11/30/2022]
Abstract
We present the active learning configuration interaction (ALCI) method for multiconfigurational calculations based on large active spaces. ALCI leverages the use of an active learning procedure to find important electronic configurations among the full configurational space generated within an active space. We tested it for the calculation of singlet-singlet excited states of acenes and pyrene using different machine learning algorithms. The ALCI method yields excitation energies within 0.2-0.3 eV from those obtained by traditional complete active-space configuration interaction (CASCI) calculations (affordable for active spaces up to 16 electrons in 16 orbitals) by including only a small fraction of the CASCI configuration space in the calculations. For larger active spaces (we tested up to 26 electrons in 26 orbitals), not affordable with traditional CI methods, ALCI captures the trends of experimental excitation energies. Overall, ALCI provides satisfactory approximations to large active-space wave functions with up to 10 orders of magnitude fewer determinants for the systems presented here. These ALCI wave functions are promising and affordable starting points for the subsequent second-order perturbation theory or pair-density functional theory calculations.
Affiliation(s)
- WooSeok Jeong
- Department of Chemistry, Nanoporous Materials Genome Center, Chemical Theory Center, and Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Carlo Alberto Gaggioli
- Department of Chemistry, Pritzker School of Molecular Engineering, James Franck Institute, Chicago Center for Theoretical Chemistry, University of Chicago, Chicago, Illinois 60637, United States
| | - Laura Gagliardi
- Department of Chemistry, Pritzker School of Molecular Engineering, James Franck Institute, Chicago Center for Theoretical Chemistry, University of Chicago, Chicago, Illinois 60637, United States
- Argonne National Laboratory, Lemont, Illinois 60439, United States
| |
|
94
|
Unke OT, Chmiela S, Gastegger M, Schütt KT, Sauceda HE, Müller KR. SpookyNet: Learning force fields with electronic degrees of freedom and nonlocal effects. Nat Commun 2021; 12:7273. [PMID: 34907176 PMCID: PMC8671403 DOI: 10.1038/s41467-021-27504-0] [Citation(s) in RCA: 107] [Impact Index Per Article: 35.7] [Received: 07/12/2021] [Accepted: 11/16/2021] [Indexed: 01/12/2023] Open
Abstract
Machine-learned force fields combine the accuracy of ab initio methods with the efficiency of conventional force fields. However, current machine-learned force fields typically ignore electronic degrees of freedom, such as the total charge or spin state, and assume chemical locality, which is problematic when molecules have inconsistent electronic states, or when nonlocal effects play a significant role. This work introduces SpookyNet, a deep neural network for constructing machine-learned force fields with explicit treatment of electronic degrees of freedom and nonlocality, modeled via self-attention in a transformer architecture. Chemically meaningful inductive biases and analytical corrections built into the network architecture allow it to properly model physical limits. SpookyNet improves upon the current state-of-the-art (or achieves similar performance) on popular quantum chemistry data sets. Notably, it is able to generalize across chemical and conformational space and can leverage the learned chemical insights, e.g. by predicting unknown spin states, thus helping to close a further important remaining gap for today's machine learning models in quantum chemistry.
Affiliation(s)
- Oliver T Unke
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany.
- DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623, Berlin, Germany.
| | - Stefan Chmiela
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
| | - Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623, Berlin, Germany
| | - Kristof T Schütt
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
| | - Huziel E Sauceda
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- BASLEARN, BASF-TU joint Lab, Technische Universität Berlin, 10587, Berlin, Germany
| | - Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany.
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul, 02841, Korea.
- Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123, Saarbrücken, Germany.
- BIFOLD-Berlin Institute for the Foundations of Learning and Data, Berlin, Germany.
- Google Research, Brain team, Berlin, Germany.
| |
|
95
|
Powell D, Hansen KR, Flannery L, Whittaker-Brooks L. Traversing Excitonic and Ionic Landscapes: Reduced-Dimensionality-Inspired Design of Organometal Halide Semiconductors for Energy Applications. Acc Chem Res 2021; 54:4371-4382. [PMID: 34841870 DOI: 10.1021/acs.accounts.1c00492] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/15/2022]
Abstract
At the very heart of the global semiconductor industry lies the omnipresent push for new materials discovery. New materials constantly rise and fall out of fashion in the scientific literature, with those passing an initial phase of research scrutiny becoming hotbeds of characterization and optimization efforts. Yet, innumerable hours of painstaking research have been devoted to materials that have ultimately fallen by the wayside after crossing over an indefinable threshold, whereupon historical optimism is met with newfound skepticism. Materials have to perform well, and they have to do it quickly. In the past decade, metal-halide perovskites (MHPs) have garnered widespread attention. The hegemonic view in both academic and industrial circles is that these materials could be engineered to meet the demands of the semiconductor industry. Their promise as inexpensive solar cell devices is highly attractive, and it has been nothing short of remarkable that efficiencies have risen from 3.8% in 2009 to more than 25.5% in 2021. Moreover, MHPs are poised to be revolutionary materials in more ways than one. The highest MHP LED efficiency was recently reported (23.4%), and MHPs have demonstrated promise in photodetectors, memristors, and transistors. However, the many excellent properties of MHPs are contrasted by longstanding stability and reproducibility limitations that have hindered their commercialization. Overcoming the limitations of MHPs is ultimately a materials engineering problem, which should be solved by mapping more precise relationships between structure, composition, and device performance. In 1958, Francis Crick famously developed the central dogma of molecular biology which describes the unidirectional flow of information in biological systems. In the words of Crick, "nature has devised a unique instrument in which an underlying simplicity is used to express great subtlety and versatility." 
In this Account, taking inspiration from the hierarchical organization of nature, we describe a hierarchical approach to materials engineering of organic metal-halide semiconductors. We demonstrate that organo-metal halide semiconductors' dimensionality, composition, and morphology dictate their optoelectronic properties and can be exploited in defining more explicit relationships between structure and function. Here, we traverse three-dimensional (3D), two-dimensional (2D), and one-dimensional (1D) organo-metal halide semiconductors, detailing the morphological and compositional differences in each and the implications that can be drawn within each domain on the engineering process. Control over ion migration pathways via morphology engineering as well as control over charge formation in organic-inorganic semiconductors is demonstrated. Fundamental insights into the amount of static and dynamic disorder in the MHP lattice are provided, which can be continuously tuned as a function of composition and morphology. Using electroabsorption spectroscopy on 2D MHPs, a disorder-induced dipole moment in the exciton proportional to the summed value of static and dynamic disorder is measured. Spectroscopic isolation of exciton features in 2D MHP electroabsorption spectra allows us to obtain precise, model-independent measurements of exciton binding energies to study the effect of chemical substitutions, such as Sn2+ → Pb2+, on the value of the exciton binding energy. Finally, we conclude that this multidimensional platform, with the aid of machine learning and robotics, will be foundational in accurately predicting structure-property-device relationships in organo-metal halide semiconductors in the future.
Affiliation(s)
- Daniel Powell
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
| | - Kameron R. Hansen
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
| | - Laura Flannery
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
| | | |
|
96
|
Zheng P, Zubatyuk R, Wu W, Isayev O, Dral PO. Artificial intelligence-enhanced quantum chemical method with broad applicability. Nat Commun 2021; 12:7022. [PMID: 34857738 PMCID: PMC8640006 DOI: 10.1038/s41467-021-27340-2] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 07/20/2021] [Accepted: 11/10/2021] [Indexed: 11/08/2022] Open
Abstract
High-level quantum mechanical (QM) calculations are indispensable for accurate explanation of natural phenomena on the atomistic level. Their staggering computational cost, however, poses great limitations, which luckily can be lifted to a great extent by exploiting advances in artificial intelligence (AI). Here we introduce the general-purpose, highly transferable artificial intelligence-quantum mechanical method 1 (AIQM1). It approaches the accuracy of the gold-standard coupled cluster QM method with the high computational speed of approximate low-level semiempirical QM methods for neutral, closed-shell species in the ground state. AIQM1 can provide accurate ground-state energies for diverse organic compounds as well as geometries close to experiment, even for challenging systems such as large conjugated compounds (fullerene C60). This opens an opportunity to investigate chemical compounds with previously unattainable speed and accuracy, as we demonstrate by determining geometries of polyyne molecules, a task difficult for both experiment and theory. Notably, our method's accuracy is also good for ions and excited-state properties, although the neural network part of AIQM1 was never fitted to these properties.
Affiliation(s)
- Peikun Zheng
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Roman Zubatyuk
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
| | - Wei Wu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.
| | - Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China.
| |
|
97
|
Xu Y, Liu X, Cao X, Huang C, Liu E, Qian S, Liu X, Wu Y, Dong F, Qiu CW, Qiu J, Hua K, Su W, Wu J, Xu H, Han Y, Fu C, Yin Z, Liu M, Roepman R, Dietmann S, Virta M, Kengara F, Zhang Z, Zhang L, Zhao T, Dai J, Yang J, Lan L, Luo M, Liu Z, An T, Zhang B, He X, Cong S, Liu X, Zhang W, Lewis JP, Tiedje JM, Wang Q, An Z, Wang F, Zhang L, Huang T, Lu C, Cai Z, Wang F, Zhang J. Artificial intelligence: A powerful paradigm for scientific research. Innovation (N Y) 2021; 2:100179. [PMID: 34877560 PMCID: PMC8633405 DOI: 10.1016/j.xinn.2021.100179] [Citation(s) in RCA: 103] [Impact Index Per Article: 34.3] [Received: 08/29/2021] [Accepted: 10/26/2021] [Indexed: 12/18/2022] Open
Abstract
Artificial intelligence (AI) coupled with promising machine learning (ML) techniques well known from computer science is broadly affecting many aspects of various fields including science and technology, industry, and even our day-to-day life. The ML techniques have been developed to analyze high-throughput data with a view to obtaining useful insights, categorizing, predicting, and making evidence-based decisions in novel ways, which will promote the growth of novel applications and fuel the sustainable booming of AI. This paper undertakes a comprehensive survey on the development and application of AI in different aspects of fundamental sciences, including information science, mathematics, medical science, materials science, geoscience, life science, physics, and chemistry. The challenges that each discipline of science meets, and the potentials of AI techniques to handle these challenges, are discussed in detail. Moreover, we shed light on new research trends entailing the integration of AI into each scientific discipline. The aim of this paper is to provide a broad research guideline on fundamental sciences with potential infusion of AI, to help motivate researchers to deeply understand the state-of-the-art applications of AI-based fundamental sciences, and thereby to help promote the continuous development of these fundamental sciences.
Affiliation(s)
- Yongjun Xu
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xin Liu
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xin Cao
- Zhongshan Hospital Institute of Clinical Science, Fudan University, Shanghai 200032, China
| | - Changping Huang
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Enke Liu
- Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China
- Songshan Lake Materials Laboratory, Dongguan, Guangdong 523808, China
| | - Sen Qian
- Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
| | - Xingchen Liu
- Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan 030001, China
| | - Yanjun Wu
- Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Fengliang Dong
- National Center for Nanoscience and Technology, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Cheng-Wei Qiu
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117583, Singapore
| | - Junjun Qiu
- Department of Gynaecology, Obstetrics and Gynaecology Hospital, Fudan University, Shanghai 200011, China
- Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Shanghai 200011, China
| | - Keqin Hua
- Department of Gynaecology, Obstetrics and Gynaecology Hospital, Fudan University, Shanghai 200011, China
- Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Shanghai 200011, China
| | - Wentao Su
- School of Food Science and Technology, Dalian Polytechnic University, Dalian 116034, China
| | - Jian Wu
- Second Affiliated Hospital School of Medicine, and School of Public Health, Zhejiang University, Hangzhou 310058, China
| | - Huiyu Xu
- Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing 100191, China
| | - Yong Han
- Zhejiang Provincial People’s Hospital, Hangzhou 310014, China
| | - Chenguang Fu
- School of Materials Science and Engineering, Zhejiang University, Hangzhou 310027, China
| | - Zhigang Yin
- Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences, Fuzhou 350002, China
| | - Miao Liu
- Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China
- Songshan Lake Materials Laboratory, Dongguan, Guangdong 523808, China
| | - Ronald Roepman
- Medical Center, Radboud University, 6500 Nijmegen, the Netherlands
| | - Sabine Dietmann
- Institute for Informatics, Washington University School of Medicine, St. Louis, MO 63110, USA
| | - Marko Virta
- Department of Microbiology, University of Helsinki, 00014 Helsinki, Finland
| | - Fredrick Kengara
- School of Pure and Applied Sciences, Bomet University College, Bomet 20400, Kenya
| | - Ze Zhang
- Agriculture College of Shihezi University, Xinjiang 832000, China
| | - Lifu Zhang
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- Agriculture College of Shihezi University, Xinjiang 832000, China
| | - Taolan Zhao
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
| | - Ji Dai
- The Brain Cognition and Brain Disease Institute, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Shenzhen-Hong Kong Institute of Brain Science-Shenzhen Fundamental Research Institutions, Shenzhen 518055, China
| | | | - Liang Lan
- Department of Communication Studies, Hong Kong Baptist University, Hong Kong, China
| | - Ming Luo
- South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou 510650, China
| | - Zhaofeng Liu
- Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Tao An
- Shanghai Astronomical Observatory, Chinese Academy of Sciences, Shanghai 200030, China
| | - Bin Zhang
- Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan 030001, China
| | - Xiao He
- Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
| | - Shan Cong
- Suzhou Institute of Nano-Tech and Nano-Bionics, Chinese Academy of Sciences, Suzhou 215123, China
| | - Xiaohong Liu
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
| | - Wei Zhang
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
| | - James P. Lewis
- Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan 030001, China
| | - James M. Tiedje
- Center for Microbial Ecology, Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, USA
| | - Qi Wang
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zhejiang Lab, Hangzhou 311121, China
| | - Zhulin An
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Fei Wang
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Libo Zhang
- Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China
| | - Chuan Lu
- Department of Computer Science, Aberystwyth University, Aberystwyth, Ceredigion SY23 3FL, UK
| | - Zhipeng Cai
- Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
| | - Fang Wang
- Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jiabao Zhang
- Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
98
|
Chen X, Liu X, Shen X, Zhang Q. Applying Machine Learning to Rechargeable Batteries: From the Microscale to the Macroscale. Angew Chem Int Ed Engl 2021. [DOI: 10.1002/ange.202107369] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/09/2023]
Affiliation(s)
- Xiang Chen
- Beijing Key Laboratory of Green Chemical Reaction Engineering and Technology Department of Chemical Engineering Tsinghua University Beijing 100084 China
| | - Xinyan Liu
- Beijing Key Laboratory of Green Chemical Reaction Engineering and Technology Department of Chemical Engineering Tsinghua University Beijing 100084 China
- Institute of Fundamental and Frontier Sciences University of Electronic Science and Technology of China Chengdu 611731 Sichuan China
| | - Xin Shen
- Beijing Key Laboratory of Green Chemical Reaction Engineering and Technology Department of Chemical Engineering Tsinghua University Beijing 100084 China
| | - Qiang Zhang
- Beijing Key Laboratory of Green Chemical Reaction Engineering and Technology Department of Chemical Engineering Tsinghua University Beijing 100084 China
| |
|
99
|
Gastegger M, Schütt KT, Müller KR. Machine learning of solvent effects on molecular spectra and reactions. Chem Sci 2021; 12:11473-11483. [PMID: 34567501 PMCID: PMC8409491 DOI: 10.1039/d1sc02742e] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 05/19/2021] [Accepted: 07/22/2021] [Indexed: 01/13/2023] Open
Abstract
Fast and accurate simulation of complex chemical systems in environments such as solutions is a long standing challenge in theoretical chemistry. In recent years, machine learning has extended the boundaries of quantum chemistry by providing highly accurate and efficient surrogate models of electronic structure theory, which previously have been out of reach for conventional approaches. Those models have long been restricted to closed molecular systems without accounting for environmental influences, such as external electric and magnetic fields or solvent effects. Here, we introduce the deep neural network FieldSchNet for modeling the interaction of molecules with arbitrary external fields. FieldSchNet offers access to a wealth of molecular response properties, enabling it to simulate a wide range of molecular spectra, such as infrared, Raman and nuclear magnetic resonance. Beyond that, it is able to describe implicit and explicit molecular environments, operating as a polarizable continuum model for solvation or in a quantum mechanics/molecular mechanics setup. We employ FieldSchNet to study the influence of solvent effects on molecular spectra and a Claisen rearrangement reaction. Based on these results, we use FieldSchNet to design an external environment capable of lowering the activation barrier of the rearrangement reaction significantly, demonstrating promising venues for inverse chemical design.
Affiliation(s)
- Michael Gastegger
- Machine Learning Group, Technische Universität Berlin 10587 Berlin Germany
- Kristof T Schütt
- Machine Learning Group, Technische Universität Berlin 10587 Berlin Germany
- Berlin Institute for the Foundations of Learning and Data 10587 Berlin Germany
- Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin 10587 Berlin Germany
- Berlin Institute for the Foundations of Learning and Data 10587 Berlin Germany
- Department of Artificial Intelligence, Korea University Anam-dong, Seongbuk-gu Seoul 02841 Korea
- Max-Planck-Institut für Informatik 66123 Saarbrücken Germany
100
Knijff L, Zhang C. Machine learning inference of molecular dipole moment in liquid water. MACHINE LEARNING: SCIENCE AND TECHNOLOGY 2021. [DOI: 10.1088/2632-2153/ac0123] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022] Open
Abstract
The molecular dipole moment in liquid water is an intriguing property, partly because there is no unique way to partition the total electron density into individual molecular contributions. The prevailing method to circumvent this problem is to use maximally localized Wannier functions, which perform a unitary transformation of the occupied molecular orbitals by minimizing the spread function of Boys. Here we revisit this problem using a data-driven approach satisfying two physical constraints, namely: (a) the displacement of the atomic charges is proportional to the Berry phase polarization; (b) each water molecule has a formal charge of zero. It turns out that the distribution of molecular dipole moments in liquid water inferred from latent variables is surprisingly similar to that obtained from maximally localized Wannier functions. Apart from putting a maximum-likelihood footnote to the established method, this work highlights the capability of graph-convolution-based charge models and the importance of physical constraints in improving model interpretability.