1
Seppäläinen L, Björklund A, Besel V, Puolamäki K. Using slisemap to interpret physical data. PLoS One 2024; 19:e0297714. [PMID: 38271355 PMCID: PMC10810528 DOI: 10.1371/journal.pone.0297714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Accepted: 01/10/2024] [Indexed: 01/27/2024] Open
Abstract
Manifold visualisation techniques are commonly used to visualise high-dimensional datasets in physical sciences. In this paper, we apply a recently introduced manifold visualisation method, slisemap, on datasets from physics and chemistry. slisemap combines manifold visualisation with explainable artificial intelligence. Explainable artificial intelligence investigates the decision processes of black box machine learning models and complex simulators. With slisemap, we find an embedding such that data items with similar local explanations are grouped together. Hence, slisemap gives us an overview of the different behaviours of a black box model, where the patterns in the embedding reflect a target property. In this paper, we show how slisemap can be used and evaluated on physical data and that it is helpful in finding meaningful information on classification and regression models trained on these datasets.

2
Zhang S, He X, Xia X, Xiao P, Wu Q, Zheng F, Lu Q. Machine-Learning-Enabled Framework in Engineering Plastics Discovery: A Case Study of Designing Polyimides with Desired Glass-Transition Temperature. ACS APPLIED MATERIALS & INTERFACES 2023; 15:37893-37902. [PMID: 37490394 DOI: 10.1021/acsami.3c05376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2023]
Abstract
Great and continuous efforts have been made to discover high-performance engineering plastics with specific properties to replace traditional engineering materials in many fields. The utilization of machine learning (ML) has brought more opportunities for the discovery of high-performing engineering plastics. However, hindered by either the relatively small database or a lack of accurate structure descriptors with clear physical and chemical meanings relating to polymer properties, the current ML studies show some flaws in the accuracy and efficiency in polymer development. Herein, we collected a dataset of 878 polyimides (PI), one of the best engineering plastics, with experimentally measured glass-transition temperature (Tg) values, and developed a rapid and accurate ML approach to design PI candidates with the desired Tg value. After the conversion from PI structures into "mechanically identifiable" SMILES (Simplified molecular input line entry system) language, the eight most critical descriptors were ultimately obtained by multiple analysis methods. The physicochemical meaning of the key descriptors was further analyzed carefully to translate the implicit "machine language" to chemical knowledge. The artificial neural network (ANN)-based model gave the most accurate results with a root-mean-square error of ∼11 K among the studied ML methods. More importantly, three potential PI candidates with desired Tg (DPIs) were designed according to the chemical insight of the key descriptors, which were then verified by experiments. The experimental and predicted Tg values of DPIs have an acceptable average deviation of ca. 3.66%. This accuracy has reached the level of the traditional molecular simulation, but the time consumption and computing resources required are tremendously reduced. Furthermore, the current ML approach could offer a scalable and adaptable framework in future engineering plastics innovation.
Affiliation(s)
- Songyang Zhang
  - School of Chemical Science and Engineering, Tongji University, Shanghai 200092, China
- Xiaojie He
  - School of Chemical Science and Engineering, Tongji University, Shanghai 200092, China
- Xuejian Xia
  - School of Chemical Science and Engineering, Tongji University, Shanghai 200092, China
- Peng Xiao
  - School of Chemical Science and Engineering, Tongji University, Shanghai 200092, China
- Qi Wu
  - Shanghai Key Lab of Electrical & Thermal Aging, School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Feng Zheng
  - School of Chemical Science and Engineering, Tongji University, Shanghai 200092, China
- Qinghua Lu
  - Shanghai Key Lab of Electrical & Thermal Aging, School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

3
Besel V, Todorović M, Kurtén T, Rinke P, Vehkamäki H. Atomic structures, conformers and thermodynamic properties of 32k atmospheric molecules. Sci Data 2023; 10:450. [PMID: 37438370 DOI: 10.1038/s41597-023-02366-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/17/2023] [Accepted: 07/05/2023] [Indexed: 07/14/2023] Open
Abstract
Low-volatile organic compounds (LVOCs) drive key atmospheric processes, such as new particle formation (NPF) and growth. Machine learning tools can accelerate studies of these phenomena, but extensive and versatile LVOC datasets relevant for the atmospheric research community are lacking. We present the GeckoQ dataset with atomic structures of 31,637 atmospherically relevant molecules resulting from the oxidation of α-pinene, toluene and decane. For each molecule, we performed comprehensive conformer sampling with the COSMOconf program and calculated thermodynamic properties with density functional theory (DFT) using the Conductor-like Screening Model (COSMO). Our dataset contains the geometries of the 7 million conformers we found and their corresponding structural and thermodynamic properties, including saturation vapor pressures (pSat), chemical potentials and free energies. The pSat values were compared to values calculated with the group contribution method SIMPOL. To validate the dataset, we explored the relationship between structural and thermodynamic properties, and then demonstrated a first machine-learning application with Gaussian process regression.
Affiliation(s)
- Vitus Besel
  - University of Helsinki, Institute for Atmospheric and Earth System Research, Helsinki, 00014, Finland
- Milica Todorović
  - University of Turku, Dept. Mechanical and Materials Engineering, Turku, FI-20014, Finland
- Theo Kurtén
  - University of Helsinki, Institute for Atmospheric and Earth System Research, Helsinki, 00014, Finland
- Patrick Rinke
  - Aalto University, Dept. of Applied Physics, P.O. Box 11100, FI-00076 Aalto, Espoo, Finland
- Hanna Vehkamäki
  - University of Helsinki, Institute for Atmospheric and Earth System Research, Helsinki, 00014, Finland

4
Chen K, Kunkel C, Cheng B, Reuter K, Margraf JT. Physics-inspired machine learning of localized intensive properties. Chem Sci 2023; 14:4913-4922. [PMID: 37181767 PMCID: PMC10171074 DOI: 10.1039/d3sc00841j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 04/10/2023] [Indexed: 05/16/2023] Open
Abstract
Machine learning (ML) has been widely applied to chemical property prediction, most prominently for the energies and forces in molecules and materials. The strong interest in predicting energies in particular has led to a 'local energy'-based paradigm for modern atomistic ML models, which ensures size-extensivity and a linear scaling of computational cost with system size. However, many electronic properties (such as excitation energies or ionization energies) do not necessarily scale linearly with system size and may even be spatially localized. Using size-extensive models in these cases can lead to large errors. In this work, we explore different strategies for learning intensive and localized properties, using HOMO energies in organic molecules as a representative test case. In particular, we analyze the pooling functions that atomistic neural networks use to predict molecular properties, and suggest an orbital weighted average (OWA) approach that enables the accurate prediction of orbital energies and locations.
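The contrast between size-extensive and intensive pooling described in this abstract can be illustrated with a minimal sketch. The per-atom values and scores below are made-up numbers, and the softmax weighting is only an assumed stand-in for learned orbital weights; it is not the authors' OWA implementation.

```python
import math

def sum_pool(atomic_values):
    """Size-extensive pooling: the result grows with the number of atoms."""
    return sum(atomic_values)

def mean_pool(atomic_values):
    """Intensive pooling: every atom contributes equally."""
    return sum(atomic_values) / len(atomic_values)

def weighted_average_pool(atomic_values, atomic_scores):
    """Intensive pooling with softmax-normalised per-atom weights.

    The scores are hypothetical; in a real model they would be predicted.
    """
    shift = max(atomic_scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in atomic_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = sum(w * v for w, v in zip(weights, atomic_values))
    return pooled, weights

# Toy per-atom contributions and scores (illustrative numbers only).
values = [-6.0, -7.0, -9.0]
scores = [2.0, 0.0, 0.0]

# Duplicating the system doubles the sum but leaves both averages unchanged.
doubled_values = values * 2
doubled_scores = scores * 2

print(sum_pool(values), sum_pool(doubled_values))
print(mean_pool(values), mean_pool(doubled_values))
print(weighted_average_pool(values, scores)[0],
      weighted_average_pool(doubled_values, doubled_scores)[0])
```

Because the weights are normalised to one, the weighted average stays intensive while still letting a few atoms dominate the prediction, which is the property a spatially localized quantity needs.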
Affiliation(s)
- Ke Chen
  - Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, D-14195 Berlin, Germany
  - Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstraße 4, D-85747 Garching, Germany
  - Institute of Science and Technology, Am Campus 1, 3400 Klosterneuburg, Austria
- Christian Kunkel
  - Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, D-14195 Berlin, Germany
- Bingqing Cheng
  - Institute of Science and Technology, Am Campus 1, 3400 Klosterneuburg, Austria
- Karsten Reuter
  - Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, D-14195 Berlin, Germany
  - Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstraße 4, D-85747 Garching, Germany
- Johannes T Margraf
  - Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, D-14195 Berlin, Germany

5
An Integrated Method of Bayesian Optimization and D-Optimal Design for Chemical Experiment Optimization. Processes (Basel) 2022. [DOI: 10.3390/pr11010087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/31/2022] Open
Abstract
The smart chemical laboratory has recently emerged as a promising trend for future chemical research, where experiment optimization is of vital importance. The traditional Bayesian optimization (BO) algorithm focuses on exploring the dependent variable space while overlooking the independent variable space. Consequently, the BO algorithm suffers from becoming stuck at local optima, which severely deteriorates the optimization performance, especially with bad-quality initial points. Herein, we propose a novel stochastic framework of Bayesian optimization with D-optimal design (BODO) by integrating BO with D-optimal design. BODO can balance the exploitation in the dependent variable space and the exploration in the independent variable space. We highlight the excellent performance of BODO even with poor initial points on the benchmark alpine2 function. Meanwhile, BODO demonstrates a better average objective function value than BO on the benchmark Summit SnAr chemical process, showing its advantage in chemical experiment optimization and potential application in future chemical experiments.
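The D-optimality criterion mentioned in this abstract selects design points that maximise the determinant of the information matrix X^T X. The sketch below shows only that criterion, applied greedily to a small candidate grid; it is not the BODO algorithm. The first-order feature map, the candidate grid, and the small ridge term are all assumptions made for illustration.

```python
def information_matrix(rows):
    """Return X^T X for a list of feature rows."""
    p = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]

def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def features(x):
    """Hypothetical first-order model: intercept plus the two raw settings."""
    return [1.0, x[0], x[1]]

def greedy_d_optimal(candidates, n_select, ridge=1e-6):
    """Greedily add the candidate that maximises det(X^T X + ridge * I)."""
    chosen = []
    remaining = list(candidates)
    for _ in range(n_select):
        def score(x):
            m = information_matrix([features(c) for c in chosen + [x]])
            for i in range(len(m)):
                m[i][i] += ridge  # keeps the matrix non-singular for small designs
            return determinant(m)
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Candidate experimental settings on a 3 x 3 grid over the unit square.
candidates = [(x1 / 2, x2 / 2) for x1 in range(3) for x2 in range(3)]
design = greedy_d_optimal(candidates, 4)
print(design)
```

A design chosen this way spreads points across the input space, which is the exploration of the independent variables that the abstract contrasts with plain BO.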

6
Bhat V, Sornberger P, Pokuri BSS, Duke R, Ganapathysubramanian B, Risko C. Electronic, redox, and optical property prediction of organic π-conjugated molecules through a hierarchy of machine learning approaches. Chem Sci 2022; 14:203-213. [PMID: 36605753 PMCID: PMC9769113 DOI: 10.1039/d2sc04676h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2022] [Accepted: 11/16/2022] [Indexed: 11/18/2022] Open
Abstract
Accelerating the development of π-conjugated molecules for applications such as energy generation and storage, catalysis, sensing, pharmaceuticals, and (semi)conducting technologies requires rapid and accurate evaluation of the electronic, redox, or optical properties. While high-throughput computational screening has proven to be a tremendous aid in this regard, machine learning (ML) and other data-driven methods can further enable orders of magnitude reduction in time while at the same time providing dramatic increases in the chemical space that is explored. However, the lack of benchmark datasets containing the electronic, redox, and optical properties that characterize the diverse, known chemical space of organic π-conjugated molecules limits ML model development. Here, we present a curated dataset containing 25k molecules with density functional theory (DFT) and time-dependent DFT (TDDFT) evaluated properties that include frontier molecular orbitals, ionization energies, relaxation energies, and low-lying optical excitation energies. Using the dataset, we train a hierarchy of ML models, ranging from classical models such as ridge regression to sophisticated graph neural networks, with molecular SMILES representation as input. We observe that graph neural networks augmented with contextual information allow for significantly better predictions across a wide array of properties. Our best-performing models also provide an uncertainty quantification for the predictions. To democratize access to the data and trained models, an interactive web platform has been developed and deployed.
Affiliation(s)
- Vinayak Bhat
  - Department of Chemistry and Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, USA
- Parker Sornberger
  - Department of Chemistry and Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, USA
- Rebekah Duke
  - Department of Chemistry and Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, USA
- Chad Risko
  - Department of Chemistry and Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, USA

7
Mazouin B, Schöpfer AA, von Lilienfeld OA. Selected machine learning of HOMO-LUMO gaps with improved data-efficiency. MATERIALS ADVANCES 2022; 3:8306-8316. [PMID: 36561279 PMCID: PMC9662596 DOI: 10.1039/d2ma00742h] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/27/2022] [Accepted: 09/12/2022] [Indexed: 06/17/2023]
Abstract
Despite their relevance for organic electronics, quantum machine learning (QML) models of molecular electronic properties, such as HOMO-LUMO gaps, often struggle to achieve satisfying data-efficiency as measured by decreasing prediction errors for increasing training set sizes. We demonstrate that partitioning training sets into different chemical classes prior to training results in independently trained QML models with overall reduced training data needs. For organic molecules drawn from the previously published QM7 and QM9 data sets we have identified and exploited three relevant classes corresponding to compounds containing either aromatic rings and carbonyl groups, or single unsaturated bonds, or saturated bonds. The selected QML models of band-gaps (considered at GW and hybrid DFT levels of theory) reach mean absolute prediction errors of ∼0.1 eV for up to an order of magnitude fewer training molecules than for QML models trained on randomly selected molecules. Comparison to Δ-QML models of band-gaps indicates that selected QML models exhibit superior data-efficiency. Our findings suggest that selected QML, e.g. based on simple classifications prior to training, could help to successfully tackle challenging quantum property screening tasks of large libraries with high fidelity and low computational burden.
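The partition-then-train idea in this abstract can be sketched in a few lines: samples are grouped by class label and one independent model is fitted per group. Everything below is an assumption made for illustration: the data are synthetic and contrived so that the two classes follow different trends, the single descriptor is hypothetical, and a least-squares line stands in for a QML model. Errors are evaluated on the training data, so the numbers say nothing about the paper's results.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b (stand-in for a QML model)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def fit_selected(samples):
    """Train one independent model per class label."""
    by_class = defaultdict(list)
    for label, x, y in samples:
        by_class[label].append((x, y))
    return {
        label: fit_line([x for x, _ in pts], [y for _, y in pts])
        for label, pts in by_class.items()
    }

def predict_selected(models, label, x):
    """Route the query to the model trained on its class."""
    a, b = models[label]
    return a * x + b

def mean_abs_error(pairs):
    return sum(abs(p - y) for p, y in pairs) / len(pairs)

# Synthetic toy data: (class label, descriptor, target).
samples = [("aromatic", x, 2.0 + 0.5 * x) for x in (0.0, 1.0, 2.0, 3.0)]
samples += [("saturated", x, 8.0 - 1.0 * x) for x in (0.0, 1.0, 2.0, 3.0)]

selected = fit_selected(samples)
pooled = fit_line([x for _, x, _ in samples], [y for _, _, y in samples])

selected_mae = mean_abs_error(
    [(predict_selected(selected, c, x), y) for c, x, y in samples]
)
pooled_mae = mean_abs_error(
    [(pooled[0] * x + pooled[1], y) for _, x, y in samples]
)
print(selected_mae, pooled_mae)
```

The design choice is that the class label acts as a router: each model only has to describe one regime, so it needs fewer examples than a single model covering all regimes.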
Affiliation(s)
- Bernard Mazouin
  - University of Vienna, Faculty of Physics and Vienna Doctoral School in Physics, Kolingasse 14-16, 1090 Vienna, Austria
- O Anatole von Lilienfeld
  - Departments of Chemistry, Materials Science and Engineering, and Physics, University of Toronto, St. George Campus, Toronto, ON, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON M5S 1M1, Canada
  - Machine Learning Group, Technische Universität Berlin and Institute for the Foundations of Learning and Data, 10587 Berlin, Germany

8
Sun Q, Xiang Y, Liu Y, Xu L, Leng T, Ye Y, Fortunelli A, Goddard WA, Cheng T. Machine Learning Predicts the X-ray Photoelectron Spectroscopy of the Solid Electrolyte Interface of Lithium Metal Battery. J Phys Chem Lett 2022; 13:8047-8054. [PMID: 35994432 DOI: 10.1021/acs.jpclett.2c02222] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/15/2023]
Abstract
X-ray photoelectron spectroscopy (XPS) is a powerful surface analysis technique widely applied in characterizing the solid electrolyte interphase (SEI) of lithium metal batteries. However, experimental XPS measurements alone fail to provide atomic structures from a deeply buried SEI, leaving vital details missing. By combining hybrid ab initio and reactive molecular dynamics (HAIR) and machine learning (ML) models, we present an artificial intelligence ab initio (AI-ai) framework to predict the XPS of an SEI. A localized high-concentration electrolyte with a Li metal anode is simulated with a HAIR scheme for ∼3 ns. Taking the local many-body tensor representation as a descriptor, four ML models are utilized to predict the core level shifts. Overall, extreme gradient boosting exhibits the highest accuracy and lowest variance (with errors ≤ 0.05 eV). Such an AI-ai model enables the XPS predictions of ten thousand frames with marginal cost.
Affiliation(s)
- Qintao Sun
  - Institute of Functional Nano & Soft Materials (FUNSOM), Jiangsu Key Laboratory for Carbon-Based Functional Materials & Devices, Soochow University, 199 Ren'ai Road, Suzhou, Jiangsu 215123, P. R. China
- Yan Xiang
  - School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yue Liu
  - Institute of Functional Nano & Soft Materials (FUNSOM), Jiangsu Key Laboratory for Carbon-Based Functional Materials & Devices, Soochow University, 199 Ren'ai Road, Suzhou, Jiangsu 215123, P. R. China
- Liang Xu
  - Institute of Functional Nano & Soft Materials (FUNSOM), Jiangsu Key Laboratory for Carbon-Based Functional Materials & Devices, Soochow University, 199 Ren'ai Road, Suzhou, Jiangsu 215123, P. R. China
- Tianle Leng
  - Materials and Process Simulation Center, California Institute of Technology, Pasadena, California 91125, United States
- Yifan Ye
  - National Synchrotron Radiation Laboratory, University of Science and Technology of China, Hefei, Anhui 230026, China
- William A Goddard
  - Materials and Process Simulation Center, California Institute of Technology, Pasadena, California 91125, United States
- Tao Cheng
  - Institute of Functional Nano & Soft Materials (FUNSOM), Jiangsu Key Laboratory for Carbon-Based Functional Materials & Devices, Soochow University, 199 Ren'ai Road, Suzhou, Jiangsu 215123, P. R. China

9
Sajjan M, Li J, Selvarajan R, Sureshbabu SH, Kale SS, Gupta R, Singh V, Kais S. Quantum machine learning for chemistry and physics. Chem Soc Rev 2022; 51:6475-6573. [PMID: 35849066 DOI: 10.1039/d2cs00203e] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 12/12/2022]
Abstract
Machine learning (ML) has emerged as a formidable force for identifying hidden but pertinent patterns within a given data set with the objective of subsequent generation of automated predictive behavior. In recent years, it is safe to conclude that ML and its close cousin, deep learning (DL), have ushered in unprecedented developments in all areas of physical sciences, especially chemistry. Not only classical variants of ML but also those trainable on near-term quantum hardware have been developed with promising outcomes. Such algorithms have revolutionized materials design and performance of photovoltaics, electronic structure calculations of ground and excited states of correlated matter, computation of force-fields and potential energy surfaces informing chemical reaction dynamics, reactivity-inspired rational strategies of drug design and even classification of phases of matter with accurate identification of emergent criticality. In this review we shall explicate a subset of such topics and delineate the contributions made by both classical and quantum computing enhanced machine learning algorithms over the past few years. We shall not only present a brief overview of the well-known techniques but also highlight their learning strategies using statistical physical insight. The objective of the review is not only to foster exposition of the aforesaid techniques but also to empower and promote cross-pollination among future research in all areas of chemistry which can benefit from ML and in turn can potentially accelerate the growth of such algorithms.
Affiliation(s)
- Manas Sajjan
  - Department of Chemistry, Purdue University, West Lafayette, IN-47907, USA
  - Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
- Junxu Li
  - Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
  - Department of Physics and Astronomy, Purdue University, West Lafayette, IN-47907, USA
- Raja Selvarajan
  - Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
  - Department of Physics and Astronomy, Purdue University, West Lafayette, IN-47907, USA
- Shree Hari Sureshbabu
  - Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
  - Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN-47907, USA
- Sumit Suresh Kale
  - Department of Chemistry, Purdue University, West Lafayette, IN-47907, USA
  - Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
- Rishabh Gupta
  - Department of Chemistry, Purdue University, West Lafayette, IN-47907, USA
  - Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
- Vinit Singh
  - Department of Chemistry, Purdue University, West Lafayette, IN-47907, USA
  - Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
- Sabre Kais
  - Department of Chemistry, Purdue University, West Lafayette, IN-47907, USA
  - Purdue Quantum Science and Engineering Institute, Purdue University, West Lafayette, Indiana 47907, USA
  - Department of Physics and Astronomy, Purdue University, West Lafayette, IN-47907, USA
  - Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN-47907, USA

10
Golze D, Hirvensalo M, Hernández-León P, Aarva A, Etula J, Susi T, Rinke P, Laurila T, Caro MA. Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW. CHEMISTRY OF MATERIALS : A PUBLICATION OF THE AMERICAN CHEMICAL SOCIETY 2022; 34:6240-6254. [PMID: 35910537 PMCID: PMC9330771 DOI: 10.1021/acs.chemmater.1c04279] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 12/13/2021] [Revised: 06/30/2022] [Indexed: 06/15/2023]
Abstract
We present a quantitatively accurate machine-learning (ML) model for the computational prediction of core-electron binding energies, from which X-ray photoelectron spectroscopy (XPS) spectra can be readily obtained. Our model combines density functional theory (DFT) with GW and uses kernel ridge regression for the ML predictions. We apply the new approach to disordered materials and small molecules containing carbon, hydrogen, and oxygen and obtain qualitative and quantitative agreement with experiment, resolving spectral features within 0.1 eV of reference experimental spectra. The method only requires the user to provide a structural model for the material under study to obtain an XPS prediction within seconds. Our new tool is freely available online through the XPS Prediction Server.
Affiliation(s)
- Dorothea Golze
  - Faculty of Chemistry and Food Chemistry, Technische Universität Dresden, 01062 Dresden, Germany
  - Department of Applied Physics, Aalto University, 02150 Espoo, Finland
- Markus Hirvensalo
  - Department of Applied Physics, Aalto University, 02150 Espoo, Finland
- Anja Aarva
  - Department of Electrical Engineering and Automation, Aalto University, 02150 Espoo, Finland
- Jarkko Etula
  - Department of Chemistry and Materials Science, Aalto University, 02150 Espoo, Finland
- Toma Susi
  - University of Vienna, Faculty of Physics, Boltzmanngasse 5, 1090 Vienna, Austria
- Patrick Rinke
  - Department of Applied Physics, Aalto University, 02150 Espoo, Finland
- Tomi Laurila
  - Department of Electrical Engineering and Automation, Aalto University, 02150 Espoo, Finland
  - Department of Chemistry and Materials Science, Aalto University, 02150 Espoo, Finland
- Miguel A. Caro
  - Department of Electrical Engineering and Automation, Aalto University, 02150 Espoo, Finland

11
G S V, V S H. Prediction of Bus Passenger Traffic using Gaussian Process Regression. JOURNAL OF SIGNAL PROCESSING SYSTEMS 2022; 95:281-292. [PMID: 35692285 PMCID: PMC9166211 DOI: 10.1007/s11265-022-01774-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Revised: 04/28/2022] [Accepted: 05/05/2022] [Indexed: 06/15/2023]
Abstract
The paper summarizes the design and implementation of a passenger traffic prediction model, based on Gaussian Process Regression (GPR). Passenger traffic analysis is the present day requirement for proper bus scheduling and traffic management to improve the efficiency and passenger comfort. Bayesian analysis uses statistical modelling to recursively estimate new data from existing data. GPR is a fully Bayesian process model, which is developed using PyMC3 with Theano as backend. The passenger data is modelled as a Poisson process so that the prior for designing the GP regression model is a Gamma distributed function. It is observed that the proposed GP based regression method outperforms the existing methods like Student-t process model and Kernel Ridge Regression (KRR) process.
Affiliation(s)
- Vidya G S
  - Department of Electronics, College of Engineering Chengannur, A P J Abdul Kalam Technological University, Kerala 689121 Thiruvananthapuram, India
- Hari V S
  - Department of Electronics, College of Engineering Chengannur, A P J Abdul Kalam Technological University, Kerala 689121 Thiruvananthapuram, India

12
Rankine CD, Penfold TJ. Accurate, affordable, and generalizable machine learning simulations of transition metal x-ray absorption spectra using the XANESNET deep neural network. J Chem Phys 2022; 156:164102. [PMID: 35490005 DOI: 10.1063/5.0087255] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 12/21/2022] Open
Abstract
The affordable, accurate, and generalizable prediction of spectroscopic observables plays a key role in the analysis of increasingly complex experiments. In this article, we develop and deploy a deep neural network, XANESNET, for predicting the lineshape of first-row transition metal K-edge x-ray absorption near-edge structure (XANES) spectra. XANESNET predicts the spectral intensities using only information about the local coordination geometry of the transition metal complexes encoded in a feature vector of weighted atom-centered symmetry functions. We address in detail the calibration of the feature vector for the particularities of the problem at hand, and we explore the individual feature importance to reveal the physical insight that XANESNET obtains at the Fe K-edge. XANESNET relies on only a few judiciously selected features: radial information on the first and second coordination shells suffices, along with angular information sufficient to separate satisfactorily key coordination geometries. The feature importance is found to reflect the XANES spectral window under consideration and is consistent with the expected underlying physics. We subsequently apply XANESNET at nine first-row transition metal (Ti-Zn) K-edges. It can be optimized in as little as a minute, predicts instantaneously, and provides K-edge XANES spectra with an average accuracy of ∼±2%-4% in which the positions of prominent peaks are matched with a >90% hit rate to sub-eV (∼0.8 eV) error.
Affiliation(s)
- C D Rankine
  - Chemistry-School of Natural and Environmental Sciences, Newcastle University, Newcastle Upon Tyne NE1 7RU, United Kingdom
- T J Penfold
  - Chemistry-School of Natural and Environmental Sciences, Newcastle University, Newcastle Upon Tyne NE1 7RU, United Kingdom

13
Herzog B, Chagas da Silva M, Casier B, Badawi M, Pascale F, Bučko T, Lebègue S, Rocca D. Assessing the Accuracy of Machine Learning Thermodynamic Perturbation Theory: Density Functional Theory and Beyond. J Chem Theory Comput 2022; 18:1382-1394. [PMID: 35191699 DOI: 10.1021/acs.jctc.1c01034] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 02/08/2023]
Abstract
Machine learning thermodynamic perturbation theory (MLPT) is a promising approach to compute finite temperature properties when the goal is to compare several different levels of ab initio theory and/or to apply highly expensive computational methods. Indeed, starting from a production molecular dynamics trajectory, this method can estimate properties at one or more target levels of theory from only a small number of additional fixed-geometry calculations, which are used to train a machine learning model. However, as MLPT is based on thermodynamic perturbation theory (TPT), inaccuracies might arise when the starting point trajectory samples a configurational space which has a small overlap with that of the target approximations of interest. By considering case studies of molecules adsorbed in zeolites and several different density functional theory approximations, in this work we assess the accuracy of MLPT for ensemble total energies and enthalpies of adsorption. It is shown that problematic cases can be detected even without knowing reference results and that even in these situations it is possible to recover target level results within chemical accuracy by applying a machine-learning-based Monte Carlo (MLMC) resampling. Finally, on the basis of the ideas developed in this work, we assess and confirm the accuracy of recently published MLPT-based enthalpies of adsorption at the random phase approximation level, whose high computational cost would completely hinder a direct molecular dynamics simulation.
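The reweighting step behind thermodynamic perturbation theory can be sketched as follows: configurations sampled at the production level receive weights w_i proportional to exp(-β ΔE_i), where ΔE_i is the target-minus-production energy difference, and target-level averages are weighted averages over the same trajectory. The effective sample size used here as an overlap indicator is an assumed, generic diagnostic and not necessarily the one used by the authors; all energies are made-up numbers in eV.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def tpt_weights(delta_e, temperature):
    """Normalised weights w_i proportional to exp(-beta * delta_e_i)."""
    beta = 1.0 / (K_B_EV * temperature)
    shift = min(delta_e)  # a constant shift cancels on normalisation
    raw = [math.exp(-beta * (d - shift)) for d in delta_e]
    total = sum(raw)
    return [r / total for r in raw]

def reweighted_average(observable, weights):
    """Estimate of the target-level ensemble average."""
    return sum(a * w for a, w in zip(observable, weights))

def effective_sample_size(weights):
    """Kish effective sample size; values near 1 signal poor overlap."""
    return 1.0 / sum(w * w for w in weights)

observable = [1.0, 2.0, 3.0, 4.0]

# Nearly constant energy differences: the two levels sample similar regions.
delta_e_good = [0.010, 0.012, 0.011, 0.009]
# Widely spread energy differences: one configuration dominates.
delta_e_poor = [0.0, 0.5, 0.6, 0.7]

w_good = tpt_weights(delta_e_good, 300.0)
w_poor = tpt_weights(delta_e_poor, 300.0)

print(reweighted_average(observable, w_good), effective_sample_size(w_good))
print(reweighted_average(observable, w_poor), effective_sample_size(w_poor))
```

When the effective sample size collapses towards one, the estimate rests on a single configuration, which is the small-overlap situation the abstract warns about.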
Collapse
Affiliation(s)
- Basile Herzog
- Université de Lorraine and CNRS, Laboratoire de Physique et Chimie Théorique, UMR 7019, 54506 Vandœuvre-lès-Nancy, France
| | - Maurício Chagas da Silva
- Université de Lorraine and CNRS, Laboratoire de Physique et Chimie Théorique, UMR 7019, 54506 Vandœuvre-lès-Nancy, France
| | - Bastien Casier
- Université de Lorraine and CNRS, Laboratoire de Physique et Chimie Théorique, UMR 7019, 54506 Vandœuvre-lès-Nancy, France
| | - Michael Badawi
- Université de Lorraine and CNRS, Laboratoire de Physique et Chimie Théorique, UMR 7019, 54506 Vandœuvre-lès-Nancy, France
| | - Fabien Pascale
- Université de Lorraine and CNRS, Laboratoire de Physique et Chimie Théorique, UMR 7019, 54506 Vandœuvre-lès-Nancy, France
| | - Tomáš Bučko
- Department of Physical and Theoretical Chemistry, Faculty of Natural Sciences, Comenius University in Bratislava, Mlynská Dolina, Ilkovičova 6, SK-84215 Bratislava, Slovakia; Institute of Inorganic Chemistry, Slovak Academy of Sciences, Dúbravská cesta 9, SK-84236 Bratislava, Slovakia
| | - Sébastien Lebègue
- Université de Lorraine and CNRS, Laboratoire de Physique et Chimie Théorique, UMR 7019, 54506 Vandœuvre-lès-Nancy, France
| | - Dario Rocca
- Université de Lorraine and CNRS, Laboratoire de Physique et Chimie Théorique, UMR 7019, 54506 Vandœuvre-lès-Nancy, France
| |
|
14
|
Gallegos M, Guevara-Vela JM, Pendás ÁM. NNAIMQ: A neural network model for predicting QTAIM charges. J Chem Phys 2022; 156:014112. [PMID: 34998318 DOI: 10.1063/5.0076896] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open
Abstract
Atomic charges provide crucial information about the electronic structure of a molecular system. Among the different definitions of these descriptors, the one proposed by the Quantum Theory of Atoms in Molecules (QTAIM) is particularly attractive given its invariance against orbital transformations although the computational cost associated with their calculation limits its applicability. Given that Machine Learning (ML) techniques have been shown to accelerate orders of magnitude the computation of a number of quantum mechanical observables, in this work, we take advantage of ML knowledge to develop an intuitive and fast neural network model (NNAIMQ) for the computation of QTAIM charges for C, H, O, and N atoms with high accuracy. Our model has been trained and tested using data from quantum chemical calculations in more than 45 000 molecular environments of the near-equilibrium CHON chemical space. The reliability and performance of NNAIMQ have been analyzed in a variety of scenarios, from equilibrium geometries to molecular dynamics simulations. Altogether, NNAIMQ yields remarkably small prediction errors, well below the 0.03 electron limit in the general case, while accelerating the calculation of QTAIM charges by several orders of magnitude.
Affiliation(s)
- Miguel Gallegos
- Depto. Química Física y Analítica, Universidad de Oviedo, 33006 Oviedo, Spain
| | - José Manuel Guevara-Vela
- Institute of Chemistry, National Autonomous University of Mexico, Circuito Exterior, Ciudad Universitaria, Delegación Coyoacán, Mexico City C.P. 04510, Mexico
| | - Ángel Martín Pendás
- Depto. Química Física y Analítica, Universidad de Oviedo, 33006 Oviedo, Spain
| |
|
15
|
Tsubaki M, Mizoguchi T. Quantum Deep Descriptor: Physically Informed Transfer Learning from Small Molecules to Polymers. J Chem Theory Comput 2021; 17:7814-7821. [PMID: 34846893 DOI: 10.1021/acs.jctc.1c00568] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Abstract
In this study, we propose a physically informed transfer learning approach for materials informatics (MI) using a quantum deep descriptor (QDD) obtained from the quantum deep field (QDF). The QDF is a machine learning model based on density functional theory (DFT) and can be trained with a large database of molecular properties. The pre-trained QDF model can provide an effective molecular descriptor that encodes the fundamental quantum-chemical characteristics (i.e., the wave function or orbital, electron density, and energies of a molecule) learned from the large database; we refer to this descriptor as a QDD. We show that a QDD pre-trained with certain properties of small molecules can predict different properties (e.g., the band gap and dielectric constant) of polymers compared with some existing descriptors. We believe that our DFT-based, physically informed transfer learning approach will not only be useful for practical applications in MI but will also provide quantum-chemical insights into materials in the future. All codes used in this study are available at https://github.com/masashitsubaki.
Affiliation(s)
- Masashi Tsubaki
- National Institute of Advanced Industrial Science and Technology, Tokyo 135-0064, Japan
| | - Teruyasu Mizoguchi
- Institute of Industrial Science, The University of Tokyo, Tokyo 113-0033, Japan
| |
|
16
|
Westermayr J, Marquetand P. Machine Learning for Electronically Excited States of Molecules. Chem Rev 2021; 121:9873-9926. [PMID: 33211478 PMCID: PMC8391943 DOI: 10.1021/acs.chemrev.0c00749] [Citation(s) in RCA: 162] [Impact Index Per Article: 54.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2020] [Indexed: 12/11/2022]
Abstract
Electronically excited states of molecules are at the heart of photochemistry, photophysics, as well as photobiology and also play a role in material science. Their theoretical description requires highly accurate quantum chemical calculations, which are computationally expensive. In this review, we focus on not only how machine learning is employed to speed up such excited-state simulations but also how this branch of artificial intelligence can be used to advance this exciting research field in all its aspects. Discussed applications of machine learning for excited states include excited-state dynamics simulations, static calculations of absorption spectra, as well as many others. In order to put these studies into context, we discuss the promises and pitfalls of the involved machine learning techniques. Since the latter are mostly based on quantum chemistry calculations, we also provide a short introduction into excited-state electronic structure methods and approaches for nonadiabatic dynamics simulations and describe tricks and problems when using them in machine learning for excited states of molecules.
Affiliation(s)
- Julia Westermayr
- Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
| | - Philipp Marquetand
- Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
- Vienna Research Platform on Accelerating Photoreaction Discovery, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
- Data Science @ Uni Vienna, University of Vienna, Währinger Strasse 29, 1090 Vienna, Austria
| |
|
17
|
Abstract
Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first-principles based virtual sampling of this space, for example, in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest subsets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an ab initio view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.
Affiliation(s)
- Bing Huang
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
| | - O. Anatole von Lilienfeld
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, 4056 Basel, Switzerland
| |
|
19
|
Westermayr J, Maurer RJ. Physically inspired deep learning of molecular excitations and photoemission spectra. Chem Sci 2021; 12:10755-10764. [PMID: 34447563 PMCID: PMC8372319 DOI: 10.1039/d1sc01542g] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2021] [Accepted: 06/29/2021] [Indexed: 12/29/2022] Open
Abstract
Modern functional materials consist of large molecular building blocks with significant chemical complexity which limits spectroscopic property prediction with accurate first-principles methods. Consequently, a targeted design of materials with tailored optoelectronic properties by high-throughput screening is bound to fail without efficient methods to predict molecular excited-state properties across chemical space. In this work, we present a deep neural network that predicts charged quasiparticle excitations for large and complex organic molecules with a rich elemental diversity and a size well out of reach of accurate many body perturbation theory calculations. The model exploits the fundamental underlying physics of molecular resonances as eigenvalues of a latent Hamiltonian matrix and is thus able to accurately describe multiple resonances simultaneously. The performance of this model is demonstrated for a range of organic molecules across chemical composition space and configuration space. We further showcase the model capabilities by predicting photoemission spectra at the level of the GW approximation for previously unseen conjugated molecules.
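The idea of obtaining several resonances at once as eigenvalues of a latent Hamiltonian can be illustrated with a short sketch. It assumes, purely for illustration, that some model emits the n(n+1)/2 independent entries of a symmetric matrix; the function name `resonances_from_latent` and the toy numbers are hypothetical and do not reproduce the authors' network.

```python
import numpy as np


def resonances_from_latent(upper_triangle, n):
    """Build a symmetric n x n matrix from its upper triangle and diagonalize it.

    `upper_triangle` stands in for a model output (an assumption of this
    sketch). Eigenvalues are returned in ascending order.
    """
    h = np.zeros((n, n))
    h[np.triu_indices(n)] = upper_triangle
    # Mirror the upper triangle so the matrix is symmetric.
    h = h + h.T - np.diag(np.diag(h))
    return np.linalg.eigvalsh(h)


# Toy "model output" for a 2 x 2 latent matrix: entries (0,0), (0,1), (1,1).
levels = resonances_from_latent(np.array([1.0, 0.5, 2.0]), 2)
print(levels)
```

Because the matrix is symmetric by construction, its eigenvalues are real and sorted, and they do not depend on the ordering of the latent basis.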
Affiliation(s)
- Julia Westermayr
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK
| | - Reinhard J Maurer
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK
| |
|
20
|
Musil F, Grisafi A, Bartók AP, Ortner C, Csányi G, Ceriotti M. Physics-Inspired Structural Representations for Molecules and Materials. Chem Rev 2021; 121:9759-9815. [PMID: 34310133 DOI: 10.1021/acs.chemrev.1c00021] [Citation(s) in RCA: 135] [Impact Index Per Article: 45.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
The first step in the construction of a regression model or a data-driven analysis, aiming to predict or elucidate the relationship between the atomic-scale structure of matter and its properties, involves transforming the Cartesian coordinates of the atoms into a suitable representation. The development of atomic-scale representations has played, and continues to play, a central role in the success of machine-learning methods for chemistry and materials science. This review summarizes the current understanding of the nature and characteristics of the most commonly used structural and chemical descriptions of atomistic structures, highlighting the deep underlying connections between different frameworks and the ideas that lead to computationally efficient and universally applicable models. It emphasizes the link between properties, structures, their physical chemistry, and their mathematical description, provides examples of recent applications to a diverse set of chemical and materials science problems, and outlines the open questions and the most promising research directions in the field.
Affiliation(s)
- Felix Musil
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland; National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| | - Andrea Grisafi
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| | - Albert P Bartók
- Department of Physics and Warwick Centre for Predictive Modelling, School of Engineering, University of Warwick, Coventry CV4 7AL, United Kingdom
| | - Christoph Ortner
- University of British Columbia, Vancouver, British Columbia V6T 1Z2, Canada
| | - Gábor Csányi
- Engineering Laboratory, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, United Kingdom
| | - Michele Ceriotti
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland; National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| |
|
21
|
Stuke A, Rinke P, Todorović M. Efficient hyperparameter tuning for kernel ridge regression with Bayesian optimization. MACHINE LEARNING: SCIENCE AND TECHNOLOGY 2021. [DOI: 10.1088/2632-2153/abee59] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open
Abstract
Machine learning methods usually depend on internal parameters—so called hyperparameters—that need to be optimized for best performance. Such optimization poses a burden on machine learning practitioners, requiring expert knowledge, intuition or computationally demanding brute-force parameter searches. We here assess three different hyperparameter selection methods: grid search, random search and an efficient automated optimization technique based on Bayesian optimization (BO). We apply these methods to a machine learning problem based on kernel ridge regression in computational chemistry. Two different descriptors are employed to represent the atomic structure of organic molecules, one of which introduces its own set of hyperparameters to the method. We identify optimal hyperparameter configurations and infer entire prediction error landscapes in hyperparameter space that serve as visual guides for the hyperparameter performance. We further demonstrate that for an increasing number of hyperparameters, BO and random search become significantly more efficient in computational time than an exhaustive grid search, while delivering an equivalent or even better accuracy.
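The contrast between grid search and random search can be made concrete with a small kernel ridge regression example. This is a sketch under stated assumptions: the toy 1D sine data, the Gaussian kernel, the search ranges, and the helper names `krr_val_mae` and `score` are all chosen here for illustration and are not taken from the paper. Bayesian optimization is not implemented; it would replace the random draws with points proposed by a surrogate model.

```python
import numpy as np


def krr_val_mae(x_tr, y_tr, x_va, y_va, alpha, gamma):
    """Fit Gaussian-kernel ridge regression; return the validation MAE."""
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * d2)
    coef = np.linalg.solve(kernel(x_tr, x_tr) + alpha * np.eye(len(x_tr)), y_tr)
    return float(np.abs(kernel(x_va, x_tr) @ coef - y_va).mean())


# Toy 1D regression data (illustrative only, not molecular descriptors).
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(120, 1))
y = np.sin(x[:, 0]) + 0.05 * rng.normal(size=120)
split = (x[:80], y[:80], x[80:], y[80:])


def score(params):
    """Validation MAE for one (alpha, gamma) pair."""
    return krr_val_mae(*split, *params)


# Grid search: every combination on a log-spaced grid (7 x 6 = 42 fits).
grid = [(a, g) for a in np.logspace(-6, 0, 7) for g in np.logspace(-3, 2, 6)]
grid_best = min(grid, key=score)

# Random search: 20 log-uniform draws from the same ranges.
random_draws = [(10 ** rng.uniform(-6, 0), 10 ** rng.uniform(-3, 2)) for _ in range(20)]
random_best = min(random_draws, key=score)

print("grid  :", len(grid), "fits, best MAE", round(score(grid_best), 4))
print("random:", len(random_draws), "fits, best MAE", round(score(random_best), 4))
```

The number of grid fits is the product of the per-axis resolutions, so it grows quickly with each added hyperparameter, whereas the random-search budget is set independently of the dimension.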
|
22
|
Abstract
Theoretical simulations of electronic excitations and associated processes in molecules are indispensable for fundamental research and technological innovations. However, such simulations are notoriously challenging to perform with quantum mechanical methods. Advances in machine learning open many new avenues for assisting molecular excited-state simulations. In this Review, we track such progress, assess the current state of the art and highlight the critical issues to solve in the future. We overview a broad range of machine learning applications in excited-state research, which include the prediction of molecular properties, improvements of quantum mechanical methods for the calculations of excited-state properties and the search for new materials. Machine learning approaches can help us understand hidden factors that influence photo-processes, leading to a better control of such processes and new rules for the design of materials for optoelectronic applications.
|
23
|
Rahaman O, Gagliardi A. Deep Learning Total Energies and Orbital Energies of Large Organic Molecules Using Hybridization of Molecular Fingerprints. J Chem Inf Model 2020; 60:5971-5983. [PMID: 33118351 DOI: 10.1021/acs.jcim.0c00687] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
Abstract
The ability to predict material properties without the need for resource-consuming experimental efforts can immensely accelerate material and drug discovery. Although ab initio methods can be reliable and accurate in making such predictions, they are computationally too expensive on a large scale. The recent advancements in artificial intelligence and machine learning as well as the availability of large quantum mechanics derived datasets enable us to train models on these datasets as a benchmark and to make fast predictions on much larger datasets. The success of these machine learning models highly depends on the machine-readable fingerprints of the molecules that capture their chemical properties as well as topological information. In this work, we propose a common deep learning-based framework to combine different types of molecular fingerprints to enhance prediction accuracy. A graph neural network (GNN), many-body tensor representation (MBTR), and a set of simple molecular descriptors (MD) were used to predict the total energies, highest occupied molecular orbital (HOMO) energies, and lowest unoccupied molecular orbital (LUMO) energies of a dataset containing ∼62k large organic molecules with complex aromatic rings and remarkably diverse functional groups. The results demonstrate that a combination of best performing molecular fingerprints can produce better results than the individual ones. The simple and flexible deep learning framework developed in this work can be easily adapted to incorporate other types of molecular fingerprints.
Affiliation(s)
- Obaidur Rahaman
- Technische Universität München, Karlstr. 45, 80333 Munich, Germany
| | | |
|
24
|
Pinheiro GA, Mucelini J, Soares MD, Prati RC, Da Silva JLF, Quiles MG. Machine Learning Prediction of Nine Molecular Properties Based on the SMILES Representation of the QM9 Quantum-Chemistry Dataset. J Phys Chem A 2020; 124:9854-9866. [DOI: 10.1021/acs.jpca.0c05969] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]
Affiliation(s)
- Gabriel A. Pinheiro
- Associate Laboratory for Computing and Applied Mathematics, National Institute for Space Research, PO BOX 515, 12227-010, São José dos Campos, SP, Brazil
| | - Johnatan Mucelini
- São Carlos Institute of Chemistry, University of São Paulo, PO Box 780, 13560-970, São Carlos, SP, Brazil
| | - Marinalva D. Soares
- Institute of Science and Technology, Federal University of São Paulo (Unifesp), 12247-014, São José dos Campos, SP, Brazil
| | - Ronaldo C. Prati
- Center of Mathematics, Computation and Cognition, Federal University of ABC, Av. Dos Estados, 5001, 09210−580, Santo André, SP, Brazil
| | - Juarez L. F. Da Silva
- São Carlos Institute of Chemistry, University of São Paulo, PO Box 780, 13560-970, São Carlos, SP, Brazil
| | - Marcos G. Quiles
- Institute of Science and Technology, Federal University of São Paulo, 12247-014, São José dos Campos, SP, Brazil
| |
|
25
|
Stocker S, Csányi G, Reuter K, Margraf JT. Machine learning in chemical reaction space. Nat Commun 2020; 11:5505. [PMID: 33127879 PMCID: PMC7603480 DOI: 10.1038/s41467-020-19267-x] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2020] [Accepted: 10/01/2020] [Indexed: 12/29/2022] Open
Abstract
Chemical compound space refers to the vast set of all possible chemical compounds, estimated to contain 10^60 molecules. While intractable as a whole, modern machine learning (ML) is increasingly capable of accurately predicting molecular properties in important subsets. Here, we therefore engage in the ML-driven study of even larger reaction space. Central to chemistry as a science of transformations, this space contains all possible chemical reactions. As an important basis for 'reactive' ML, we establish a first-principles database (Rad-6) containing closed and open-shell organic molecules, along with an associated database of chemical reaction energies (Rad-6-RE). We show that the special topology of reaction spaces, with central hub molecules involved in multiple reactions, requires a modification of existing compound space ML-concepts. Showcased by the application to methane combustion, we demonstrate that the learned reaction energies offer a non-empirical route to rationally extract reduced reaction networks for detailed microkinetic analyses.
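A reaction energy is the stoichiometry-weighted sum of product energies minus that of the reactants, which is why one molecule's energy can enter many reactions. The sketch below illustrates this bookkeeping for the overall methane combustion reaction. The function name `reaction_energy` is hypothetical, and the energies are arbitrary placeholders, not values from Rad-6 or from any calculation.

```python
def reaction_energy(energies, reactants, products):
    """Return sum(products) - sum(reactants), weighted by stoichiometry."""
    def total(side):
        return sum(coeff * energies[name] for name, coeff in side.items())
    return total(products) - total(reactants)


# Placeholder per-molecule energies in arbitrary units (NOT real data).
energies = {"CH4": -1.0, "O2": -2.0, "CO2": -4.0, "H2O": -1.5}

# CH4 + 2 O2 -> CO2 + 2 H2O
delta_e = reaction_energy(
    energies,
    reactants={"CH4": 1, "O2": 2},
    products={"CO2": 1, "H2O": 2},
)
print(delta_e)
```

Any error in the predicted energy of a frequently occurring molecule, such as O2 in this toy example, propagates to every reaction that contains it. This relates to the hub-molecule topology mentioned in the abstract.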
Affiliation(s)
- Sina Stocker
- Chair of Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Garching, Germany
| | - Gábor Csányi
- Engineering Laboratory, University of Cambridge, Cambridge, CB2 1PZ, UK
| | - Karsten Reuter
- Chair of Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Garching, Germany
- Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin, Germany
| | - Johannes T Margraf
- Chair of Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Garching, Germany.
| |
|
26
|
Saraceni N, Cantori S, Pilati S. Scalable neural networks for the efficient learning of disordered quantum systems. Phys Rev E 2020; 102:033301. [PMID: 33075937 DOI: 10.1103/physreve.102.033301] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2020] [Accepted: 08/11/2020] [Indexed: 12/20/2022]
Abstract
Supervised machine learning is emerging as a powerful computational tool to predict the properties of complex quantum systems at a limited computational cost. In this article, we quantify how accurately deep neural networks can learn the properties of disordered quantum systems as a function of the system size. We implement a scalable convolutional network that can address arbitrary system sizes. This network is compared with a recently introduced extensive convolutional architecture [Mills et al., Chem. Sci. 10, 4129 (2019), DOI: 10.1039/C8SC04578J] and with conventional dense networks with all-to-all connectivity. The networks are trained to predict the exact ground-state energies of various disordered systems, namely, a continuous-space single-particle Hamiltonian for cold-atoms in speckle disorder, and different setups of a quantum Ising chain with random couplings, including one with only short-range interactions and one augmented with a long-range term. In all testbeds we consider, the scalable network retains high accuracy as the system size increases. Furthermore, we demonstrate that the network scalability enables a transfer-learning protocol, whereby a pretraining performed on small systems drastically accelerates the learning of large-system properties, allowing reaching high accuracy with small training sets. In fact, with the scalable network one can even extrapolate to sizes larger than those included in the training set, accurately reproducing the results of state-of-the-art quantum Monte Carlo simulations.
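One way a convolutional model can accept arbitrary system sizes is to share a local filter across all sites and then pool over them. The sketch below assumes a single periodic 1D convolution followed by a ReLU and a sum over sites; the function name `scalable_energy`, the filter values, and the pooling choice are assumptions of this illustration, not the architecture used in the paper.

```python
import numpy as np


def scalable_energy(field, kernel, bias):
    """One periodic 1D convolution, ReLU, then a sum over all sites.

    The same `kernel` and `bias` apply to any system size, so the number of
    parameters does not depend on len(field).
    """
    n, k = len(field), len(kernel)
    # Row i holds the k periodic neighbours starting at site i.
    idx = (np.arange(n)[:, None] + np.arange(k)[None, :]) % n
    features = np.maximum(field[idx] @ kernel + bias, 0.0)
    return float(features.sum())


rng = np.random.default_rng(1)
kernel = np.array([0.5, -1.0, 0.25])   # hypothetical filter weights
disorder = rng.normal(size=16)         # toy disorder realization
e_small = scalable_energy(disorder, kernel, 0.1)
e_large = scalable_energy(np.tile(disorder, 2), kernel, 0.1)
print(e_small, e_large)
```

Doubling the system by repeating the same disorder pattern doubles the output, so this toy model is extensive by construction. The same weights can therefore be evaluated on system sizes that were never seen during training.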
Affiliation(s)
- N Saraceni
- School of Science and Technology, Physics Division, Università di Camerino, 62032 Camerino (MC), Italy
| | - S Cantori
- School of Science and Technology, Physics Division, Università di Camerino, 62032 Camerino (MC), Italy
| | - S Pilati
- School of Science and Technology, Physics Division, Università di Camerino, 62032 Camerino (MC), Italy
| |
|
27
|
Westermayr J, Marquetand P. Machine learning and excited-state molecular dynamics. MACHINE LEARNING-SCIENCE AND TECHNOLOGY 2020. [DOI: 10.1088/2632-2153/ab9c3e] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
|
28
|
Low K, Kobayashi R, Izgorodina EI. The effect of descriptor choice in machine learning models for ionic liquid melting point prediction. J Chem Phys 2020; 153:104101. [DOI: 10.1063/5.0016289] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open
Affiliation(s)
- Kaycee Low
- Monash Computational Chemistry Group, Monash University, 17 Rainforest Walk, Clayton, VIC 3800, Australia
| | - Rika Kobayashi
- ANU Supercomputer Facility, Leonard Huxley Building 56, Mills Road, Canberra, ACT 2601, Australia
| | - Ekaterina I. Izgorodina
- Monash Computational Chemistry Group, Monash University, 17 Rainforest Walk, Clayton, VIC 3800, Australia
| |
|
29
|
Stuke A, Kunkel C, Golze D, Todorović M, Margraf JT, Reuter K, Rinke P, Oberhofer H. Atomic structures and orbital energies of 61,489 crystal-forming organic molecules. Sci Data 2020; 7:58. [PMID: 32071311 PMCID: PMC7029047 DOI: 10.1038/s41597-020-0385-y] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2019] [Accepted: 01/21/2020] [Indexed: 12/14/2022] Open
Abstract
Data science and machine learning in materials science require large datasets of technologically relevant molecules or materials. Currently, publicly available molecular datasets with realistic molecular geometries and spectral properties are rare. We here supply a diverse benchmark spectroscopy dataset of 61,489 molecules extracted from organic crystals in the Cambridge Structural Database (CSD), denoted OE62. Molecular equilibrium geometries are reported at the Perdew-Burke-Ernzerhof (PBE) level of density functional theory (DFT) including van der Waals corrections for all 62 k molecules. For these geometries, OE62 supplies total energies and orbital eigenvalues at the PBE and the PBE hybrid (PBE0) functional level of DFT for all 62 k molecules in vacuum as well as at the PBE0 level for a subset of 30,876 molecules in (implicit) water. For 5,239 molecules in vacuum, the dataset provides quasiparticle energies computed with many-body perturbation theory in the G0W0 approximation with a PBE0 starting point (denoted GW5000 in analogy to the GW100 benchmark set (M. van Setten et al. J. Chem. Theory Comput. 12, 5076 (2016))).
Affiliation(s)
- Annika Stuke
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto, FI-00076, Finland.
| | - Christian Kunkel
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstraße 4, D-85747, Garching, Germany
| | - Dorothea Golze
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto, FI-00076, Finland
| | - Milica Todorović
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto, FI-00076, Finland
| | - Johannes T Margraf
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstraße 4, D-85747, Garching, Germany
| | - Karsten Reuter
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstraße 4, D-85747, Garching, Germany
| | - Patrick Rinke
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto, FI-00076, Finland
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstraße 4, D-85747, Garching, Germany
| | - Harald Oberhofer
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstraße 4, D-85747, Garching, Germany
| |
|
30
|
Jung H, Stocker S, Kunkel C, Oberhofer H, Han B, Reuter K, Margraf JT. Size‐Extensive Molecular Machine Learning with Global Representations. CHEMSYSTEMSCHEM 2020. [DOI: 10.1002/syst.201900052] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Affiliation(s)
- Hyunwook Jung
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstraße 4, D-85747 Garching, Germany
- Department of Chemical and Biomolecular Engineering, Yonsei University, Seoul 03722, Republic of Korea
| | - Sina Stocker
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstraße 4, D-85747 Garching, Germany
| | - Christian Kunkel
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstraße 4, D-85747 Garching, Germany
| | - Harald Oberhofer
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstraße 4, D-85747 Garching, Germany
| | - Byungchan Han
- Department of Chemical and Biomolecular Engineering, Yonsei University, Seoul 03722, Republic of Korea
| | - Karsten Reuter
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstraße 4, D-85747 Garching, Germany
| | - Johannes T. Margraf
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstraße 4, D-85747 Garching, Germany
| |
|
31
|
Himanen L, Geurts A, Foster AS, Rinke P. Data-Driven Materials Science: Status, Challenges, and Perspectives. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2019; 6:1900808. [PMID: 31728276 PMCID: PMC6839624 DOI: 10.1002/advs.201900808] [Citation(s) in RCA: 141] [Impact Index Per Article: 28.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/08/2019] [Revised: 06/20/2019] [Indexed: 05/06/2023]
Abstract
Data-driven science is heralded as a new paradigm in materials science. In this field, data is the new resource, and knowledge is extracted from materials datasets that are too big or complex for traditional human reasoning, typically with the intent to discover new or improved materials or materials phenomena. Multiple factors, including the open science movement, national funding, and progress in information technology, have fueled its development. Such related tools as materials databases, machine learning, and high-throughput methods are now established as parts of the materials research toolset. However, there are a variety of challenges that impede progress in data-driven materials science: data veracity, integration of experimental and computational data, data longevity, standardization, and the gap between industrial interests and academic efforts. In this perspective article, the historical development and current state of data-driven materials science, building from the early evolution of open science to the rapid expansion of materials data infrastructures are discussed. Key successes and challenges so far are also reviewed, providing a perspective on the future development of the field.
Affiliation(s)
- Lauri Himanen
- Department of Applied Physics, Aalto University, P.O. Box 11100, 00076 Aalto, Espoo, Finland
| | - Amber Geurts
- Department of Applied Physics, Aalto University, P.O. Box 11100, 00076 Aalto, Espoo, Finland
- Department of Management Studies, Aalto University, P.O. Box 11100, 00076 Aalto, Espoo, Finland
- TNO, Netherlands Organization for Applied Scientific Research, Expertise Center for Strategy and Policy, Anna van Beurenplein 1, DA 2595, The Hague, Netherlands
| | - Adam Stuart Foster
- Department of Applied Physics, Aalto University, P.O. Box 11100, 00076 Aalto, Espoo, Finland
- Graduate School Materials Science in Mainz, Staudinger Weg 9, 55128 Mainz, Germany
- WPI Nano Life Science Institute (WPI-NanoLSI), Kanazawa University, Kakuma-machi, Kanazawa 920-1192, Japan
| | - Patrick Rinke
- Department of Applied Physics, Aalto University, P.O. Box 11100, 00076 Aalto, Espoo, Finland
- Theoretical Chemistry and Catalysis Research Centre, Technische Universität München, Lichtenbergstr. 4, D-85747 Garching, Germany
| |
|