1
Hu X, Lenz-Himmer MO, Baldauf C. Better force fields start with better data: A data set of cation dipeptide interactions. Sci Data 2022; 9:327. [PMID: 35715420 PMCID: PMC9205945 DOI: 10.1038/s41597-022-01297-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/15/2021] [Accepted: 03/18/2022] [Indexed: 11/08/2022] Open
Abstract
We present a data set from a first-principles study of amino-methylated and acetylated (capped) dipeptides of the 20 proteinogenic amino acids, including alternative possible side chain protonation states and their interactions with selected divalent cations (Ca²⁺, Mg²⁺ and Ba²⁺). The data covers 21,909 stationary points on the respective potential-energy surfaces in a wide relative energy range of up to 4 eV (390 kJ/mol). Relevant properties of interest, like partial charges, were derived for the conformers. The motivation was to provide a solid data basis for force field parameterization and further applications like machine learning or benchmarking. In particular the process of creating all this data on the same first-principles footing, i.e. density-functional theory calculations employing the generalized gradient approximation with a van der Waals correction, makes this data suitable for first-principles data-driven force field development. To make the data accessible across domain borders and to machines, we formalized the metadata in an ontology.
Affiliation(s)
- Xiaojuan Hu
  - Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, 14195, Berlin, Germany.
- Carsten Baldauf
  - Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, 14195, Berlin, Germany.
2
Musil F, Grisafi A, Bartók AP, Ortner C, Csányi G, Ceriotti M. Physics-Inspired Structural Representations for Molecules and Materials. Chem Rev 2021; 121:9759-9815. [PMID: 34310133 DOI: 10.1021/acs.chemrev.1c00021] [Citation(s) in RCA: 173] [Impact Index Per Article: 43.3] [Indexed: 12/11/2022]
Abstract
The first step in the construction of a regression model or a data-driven analysis, aiming to predict or elucidate the relationship between the atomic-scale structure of matter and its properties, involves transforming the Cartesian coordinates of the atoms into a suitable representation. The development of atomic-scale representations has played, and continues to play, a central role in the success of machine-learning methods for chemistry and materials science. This review summarizes the current understanding of the nature and characteristics of the most commonly used structural and chemical descriptions of atomistic structures, highlighting the deep underlying connections between different frameworks and the ideas that lead to computationally efficient and universally applicable models. It emphasizes the link between properties, structures, their physical chemistry, and their mathematical description, provides examples of recent applications to a diverse set of chemical and materials science problems, and outlines the open questions and the most promising research directions in the field.
Affiliation(s)
- Felix Musil
  - Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Andrea Grisafi
  - Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Albert P Bartók
  - Department of Physics and Warwick Centre for Predictive Modelling, School of Engineering, University of Warwick, Coventry CV4 7AL, United Kingdom
- Christoph Ortner
  - University of British Columbia, Vancouver, British Columbia V6T 1Z2, Canada
- Gábor Csányi
  - Engineering Laboratory, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, United Kingdom
- Michele Ceriotti
  - Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
3
Paleico ML, Behler J. A bin and hash method for analyzing reference data and descriptors in machine learning potentials. Machine Learning: Science and Technology 2021. [DOI: 10.1088/2632-2153/abe663] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/27/2022] Open
Abstract
In recent years the development of machine learning potentials (MLPs) has become a very active field of research. Numerous approaches have been proposed, which allow one to perform extended simulations of large systems at a small fraction of the computational costs of electronic structure calculations. The key to the success of modern MLPs is the close-to first principles quality description of the atomic interactions. This accuracy is reached by using very flexible functional forms in combination with high-level reference data from electronic structure calculations. These data sets can include up to hundreds of thousands of structures covering millions of atomic environments to ensure that all relevant features of the potential energy surface are well represented. The handling of such large data sets is nowadays becoming one of the main challenges in the construction of MLPs. In this paper we present a method, the bin-and-hash (BAH) algorithm, to overcome this problem by enabling the efficient identification and comparison of large numbers of multidimensional vectors. Such vectors emerge in multiple contexts in the construction of MLPs. Examples are the comparison of local atomic environments to identify and avoid unnecessary redundant information in the reference data sets that is costly in terms of both the electronic structure calculations as well as the training process, the assessment of the quality of the descriptors used as structural fingerprints in many types of MLPs, and the detection of possibly unreliable data points. The BAH algorithm is illustrated for the example of high-dimensional neural network potentials using atom-centered symmetry functions for the geometrical description of the atomic environments, but the method is general and can be combined with any current type of MLP.
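The binning-and-hashing idea named in this abstract can be illustrated with a minimal sketch. This is not the authors' BAH implementation: the function name `bin_and_hash`, the uniform `bin_width`, and the toy descriptor vectors are all assumptions made for illustration. Each vector is quantized component-wise into integer bins, and the resulting tuple serves as a hash key, so vectors that land in the same bins are grouped without any pairwise comparison.

```python
import math
from collections import defaultdict


def bin_and_hash(vectors, bin_width):
    """Group vectors whose components all fall into the same bins.

    Hypothetical helper for illustration only; the published algorithm
    may choose bins and keys differently.
    """
    groups = defaultdict(list)
    for index, vec in enumerate(vectors):
        # Quantize every component to an integer bin index.
        key = tuple(math.floor(x / bin_width) for x in vec)
        # The tuple of bin indices is hashable and acts as the dictionary key.
        groups[key].append(index)
    return groups


# Toy descriptor vectors (made-up numbers, not real symmetry functions).
descriptors = [
    (0.11, 0.52, 0.93),
    (0.12, 0.51, 0.94),  # nearly identical to the first vector
    (0.80, 0.10, 0.33),
]

groups = bin_and_hash(descriptors, bin_width=0.05)
# Bins holding more than one vector flag candidate redundant environments.
duplicates = [ids for ids in groups.values() if len(ids) > 1]
```

A fixed grid like this can separate two close vectors that straddle a bin boundary, so in practice the bin width (or several shifted grids) would need to be chosen with that in mind.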
4
Fukutani T, Miyazawa K, Iwata S, Satoh H. G-RMSD: Root Mean Square Deviation Based Method for Three-Dimensional Molecular Similarity Determination. Bulletin of the Chemical Society of Japan 2021. [DOI: 10.1246/bcsj.20200258] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 02/02/2023]
Affiliation(s)
- Tomonori Fukutani
  - Department of Mathematical Informatics, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan
- Kohei Miyazawa
  - Department of Mathematical Informatics, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan
- Satoru Iwata
  - Department of Mathematical Informatics, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan
- Hiroko Satoh
  - Department of Chemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
  - Research Organization of Information and Systems (ROIS), 4-3-13 Toranomon, Minato-ku, Tokyo 105-0001, Japan
5
Helfrecht BA, Cersonsky RK, Fraux G, Ceriotti M. Structure-property maps with Kernel principal covariates regression. Machine Learning: Science and Technology 2020. [DOI: 10.1088/2632-2153/aba9ef] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/29/2022]
6
Building Nonparametric n-Body Force Fields Using Gaussian Process Regression. Machine Learning Meets Quantum Physics 2020. [DOI: 10.1007/978-3-030-40245-7_5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/03/2023]
7
Basdogan Y, Groenenboom MC, Henderson E, De S, Rempe SB, Keith JA. Machine Learning-Guided Approach for Studying Solvation Environments. J Chem Theory Comput 2019; 16:633-642. [PMID: 31809056 DOI: 10.1021/acs.jctc.9b00605] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Indexed: 12/19/2022]
Abstract
Molecular-level understanding and characterization of solvation environments are often needed across chemistry, biology, and engineering. Toward practical modeling of local solvation effects of any solute in any solvent, we report a static and all-quantum mechanics-based cluster-continuum approach for calculating single-ion solvation free energies. This approach uses a global optimization procedure to identify low-energy molecular clusters with different numbers of explicit solvent molecules and then employs the smooth overlap for atomic positions learning kernel to quantify the similarity between different low-energy solute environments. From these data, we use sketch maps, a nonlinear dimensionality reduction algorithm, to obtain a two-dimensional visual representation of the similarity between solute environments in differently sized microsolvated clusters. After testing this approach on different ions having charges 2+, 1+, 1-, and 2-, we find that the solvation environment around each ion can be seen to usually become more similar in hand with its calculated single-ion solvation free energy. Without needing either dynamics simulations or an a priori knowledge of local solvation structure of the ions, this approach can be used to calculate solvation free energies within 5% of experimental measurements for most cases, and it should be transferable for the study of other systems where dynamics simulations are not easily carried out.
Affiliation(s)
- Yasemin Basdogan
  - Department of Chemical and Petroleum Engineering Swanson School of Engineering, University of Pittsburgh, Pittsburgh 15261, Pennsylvania, United States
- Mitchell C Groenenboom
  - Department of Chemical and Petroleum Engineering Swanson School of Engineering, University of Pittsburgh, Pittsburgh 15261, Pennsylvania, United States
- Ethan Henderson
  - Department of Chemical and Petroleum Engineering Swanson School of Engineering, University of Pittsburgh, Pittsburgh 15261, Pennsylvania, United States
- Sandip De
  - Laboratory of Computational Science and Modelling, Institute of Materials, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
- Susan B Rempe
  - Department of Nanobiology, Sandia National Laboratories, Albuquerque 87185, New Mexico, United States
- John A Keith
  - Department of Chemical and Petroleum Engineering Swanson School of Engineering, University of Pittsburgh, Pittsburgh 15261, Pennsylvania, United States
8
Hofstetter A, Balodis M, Paruzzo FM, Widdifield CM, Stevanato G, Pinon AC, Bygrave PJ, Day GM, Emsley L. Rapid Structure Determination of Molecular Solids Using Chemical Shifts Directed by Unambiguous Prior Constraints. J Am Chem Soc 2019; 141:16624-16634. [PMID: 31117663 PMCID: PMC7540916 DOI: 10.1021/jacs.9b03908] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Indexed: 01/27/2023]
Abstract
NMR-based crystallography approaches involving the combination of crystal structure prediction methods, ab initio calculated chemical shifts and solid-state NMR experiments are powerful methods for crystal structure determination of microcrystalline powders. However, currently structural information obtained from solid-state NMR is usually included only after a set of candidate crystal structures has already been independently generated, starting from a set of single-molecule conformations. Here, we show with the case of ampicillin that this can lead to failure of structure determination. We propose a crystal structure determination method that includes experimental constraints during conformer selection. In order to overcome the problem that experimental measurements on the crystalline samples are not obviously translatable to restrict the single-molecule conformational space, we propose constraints based on the analysis of absent cross-peaks in solid-state NMR correlation experiments. We show that these absences provide unambiguous structural constraints on both the crystal structure and the gas-phase conformations, and therefore can be used for unambiguous selection. The approach is parametrized on the crystal structure determination of flutamide, flufenamic acid, and cocaine, where we reduce the computational cost by around 50%. Most importantly, the method is then shown to correctly determine the crystal structure of ampicillin, which would have failed using current methods because it adopts a high-energy conformer in its crystal structure. The average positional RMSE on the NMR powder structure is ⟨rav⟩ = 0.176 Å, which corresponds to an average equivalent displacement parameter Ueq = 0.0103 Å².
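The two numbers quoted at the end of this abstract can be cross-checked with a short calculation. The sketch below assumes the common isotropic convention in which the equivalent displacement parameter is one third of the mean-square positional deviation, Ueq = ⟨r²⟩ / 3; the abstract does not state which relation the authors used, so this is an assumption, and the variable names are illustrative.

```python
# Average positional RMSE quoted in the abstract, in angstroms.
rmse_angstrom = 0.176

# Assumed isotropic relation: U_eq = <r^2> / 3, with <r^2> taken as RMSE squared.
u_eq = rmse_angstrom ** 2 / 3

# Round to four decimals to compare with the quoted value in square angstroms.
u_eq_rounded = round(u_eq, 4)
print(u_eq_rounded)
```

Under that assumption the rounded result agrees with the 0.0103 quoted in the abstract.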
Affiliation(s)
- Albert Hofstetter
  - Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Martins Balodis
  - Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Federico M Paruzzo
  - Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Cory M Widdifield
  - Department of Chemistry, Mathematics and Science Center, Oakland University, 146 Library Drive, Rochester, Michigan 48309-4479, United States
- Gabriele Stevanato
  - Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Arthur C Pinon
  - Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Peter J Bygrave
  - School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
- Graeme M Day
  - School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
- Lyndon Emsley
  - Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
9
Stuke A, Todorović M, Rupp M, Kunkel C, Ghosh K, Himanen L, Rinke P. Chemical diversity in molecular orbital energy predictions with kernel ridge regression. J Chem Phys 2019; 150:204121. [DOI: 10.1063/1.5086105] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Indexed: 02/06/2023] Open
Affiliation(s)
- Annika Stuke
  - Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
- Milica Todorović
  - Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
- Matthias Rupp
  - Fritz Haber Institute of the Max Planck Society, Faradayweg 4-6, 14195 Berlin, Germany
- Christian Kunkel
  - Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
  - Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstr. 4, 85747 Garching, Germany
- Kunal Ghosh
  - Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
  - Department of Computer Science, Aalto University, P.O. Box 15400, Aalto FI-00076, Finland
- Lauri Himanen
  - Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
- Patrick Rinke
  - Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
  - Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstr. 4, 85747 Garching, Germany
10
Ceriotti M. Unsupervised machine learning in atomistic simulations, between predictions and understanding. J Chem Phys 2019; 150:150901. [PMID: 31005087 DOI: 10.1063/1.5091842] [Citation(s) in RCA: 85] [Impact Index Per Article: 14.2] [Indexed: 12/22/2022] Open
Abstract
Automated analyses of the outcome of a simulation have been an important part of atomistic modeling since the early days, addressing the need of linking the behavior of individual atoms and the collective properties that are usually the final quantity of interest. Methods such as clustering and dimensionality reduction have been used to provide a simplified, coarse-grained representation of the structure and dynamics of complex systems from proteins to nanoparticles. In recent years, the rise of machine learning has led to an even more widespread use of these algorithms in atomistic modeling and to consider different classification and inference techniques as part of a coherent toolbox of data-driven approaches. This perspective briefly reviews some of the unsupervised machine-learning methods (that are geared toward classification and coarse-graining of molecular simulations) seen in relation to the fundamental mathematical concepts that underlie all machine-learning techniques. It discusses the importance of using concise yet complete representations of atomic structures as the starting point of the analyses and highlights the risk of introducing preconceived biases when using machine learning to rationalize and understand structure-property relations. Supervised machine-learning techniques that explicitly attempt to predict the properties of a material given its structure are less susceptible to such biases. Current developments in the field suggest that using these two classes of approaches side-by-side and in a fully integrated mode, while keeping in mind the relations between the data analysis framework and the fundamental physical principles, will be key to realizing the full potential of machine learning to help understand the behavior of complex molecules and materials.
Affiliation(s)
- Michele Ceriotti
  - Laboratory of Computational Science and Modeling, Institut des Matériaux, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
11
Veit M, Jain SK, Bonakala S, Rudra I, Hohl D, Csányi G. Equation of State of Fluid Methane from First Principles with Machine Learning Potentials. J Chem Theory Comput 2019; 15:2574-2586. [DOI: 10.1021/acs.jctc.8b01242] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Indexed: 01/06/2023]
Affiliation(s)
- Max Veit
  - Engineering Laboratory, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, United Kingdom
- Indranil Rudra
  - Shell India Markets Pvt. Ltd., Bengaluru 562149, Karnataka, India
- Detlef Hohl
  - Shell Global Solutions International BV, Grasweg 31, 1031 HW Amsterdam, The Netherlands
- Gábor Csányi
  - Engineering Laboratory, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, United Kingdom
12
Li X, Curtis FS, Rose T, Schober C, Vazquez-Mayagoitia A, Reuter K, Oberhofer H, Marom N. Genarris: Random generation of molecular crystal structures and fast screening with a Harris approximation. J Chem Phys 2018; 148:241701. [PMID: 29960303 DOI: 10.1063/1.5014038] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 01/03/2023] Open
Abstract
We present Genarris, a Python package that performs configuration space screening for molecular crystals of rigid molecules by random sampling with physical constraints. For fast energy evaluations, Genarris employs a Harris approximation, whereby the total density of a molecular crystal is constructed via superposition of single molecule densities. Dispersion-inclusive density functional theory is then used for the Harris density without performing a self-consistency cycle. Genarris uses machine learning for clustering, based on a relative coordinate descriptor developed specifically for molecular crystals, which is shown to be robust in identifying packing motif similarity. In addition to random structure generation, Genarris offers three workflows based on different sequences of successive clustering and selection steps: the "Rigorous" workflow is an exhaustive exploration of the potential energy landscape, the "Energy" workflow produces a set of low energy structures, and the "Diverse" workflow produces a maximally diverse set of structures. The latter is recommended for generating initial populations for genetic algorithms. Here, the implementation of Genarris is reported and its application is demonstrated for three test cases.
Affiliation(s)
- Xiayue Li
  - Department of Materials Science and Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Farren S Curtis
  - Department of Physics, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Timothy Rose
  - Department of Materials Science and Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Christoph Schober
  - Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstr. 4, D-85747 Garching, Germany
- Karsten Reuter
  - Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstr. 4, D-85747 Garching, Germany
- Harald Oberhofer
  - Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstr. 4, D-85747 Garching, Germany
- Noa Marom
  - Department of Materials Science and Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
13
Nguyen TT, Székely E, Imbalzano G, Behler J, Csányi G, Ceriotti M, Götz AW, Paesani F. Comparison of permutationally invariant polynomials, neural networks, and Gaussian approximation potentials in representing water interactions through many-body expansions. J Chem Phys 2018; 148:241725. [DOI: 10.1063/1.5024577] [Citation(s) in RCA: 118] [Impact Index Per Article: 16.9] [Indexed: 01/02/2023] Open
Affiliation(s)
- Thuong T. Nguyen
  - Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, California 92093, USA
  - San Diego Supercomputer Center, University of California, San Diego, La Jolla, California 92093, USA
- Eszter Székely
  - Engineering Department, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, United Kingdom
- Giulio Imbalzano
  - Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Jörg Behler
  - Universität Göttingen, Institut für Physikalische Chemie, Theoretische Chemie, Tammannstr. 6, 37077 Göttingen, Germany
- Gábor Csányi
  - Engineering Department, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, United Kingdom
- Michele Ceriotti
  - Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Andreas W. Götz
  - San Diego Supercomputer Center, University of California, San Diego, La Jolla, California 92093, USA
- Francesco Paesani
  - Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, California 92093, USA
  - San Diego Supercomputer Center, University of California, San Diego, La Jolla, California 92093, USA
14
Imbalzano G, Anelli A, Giofré D, Klees S, Behler J, Ceriotti M. Automatic selection of atomic fingerprints and reference configurations for machine-learning potentials. J Chem Phys 2018; 148:241730. [DOI: 10.1063/1.5024611] [Citation(s) in RCA: 163] [Impact Index Per Article: 23.3] [Indexed: 01/04/2023] Open
Affiliation(s)
- Giulio Imbalzano
  - Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Andrea Anelli
  - Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Daniele Giofré
  - Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Sinja Klees
  - Lehrstuhl für Theoretische Chemie, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Jörg Behler
  - Lehrstuhl für Theoretische Chemie, Ruhr-Universität Bochum, 44801 Bochum, Germany
  - Universität Göttingen, Institut für Physikalische Chemie, Theoretische Chemie, Tammannstr. 6, 37077 Göttingen, Germany
- Michele Ceriotti
  - Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
15
Vannay L, Meyer B, Petraglia R, Sforazzini G, Ceriotti M, Corminboeuf C. Analyzing Fluxional Molecules Using DORI. J Chem Theory Comput 2018; 14:2370-2379. [PMID: 29570294 DOI: 10.1021/acs.jctc.7b01176] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Abstract
The Density Overlap Region Indicator (DORI) is a density-based scalar field that reveals covalent bonding patterns and noncovalent interactions in the same value range. This work goes beyond the traditional static quantum chemistry use of scalar fields and illustrates the suitability of DORI for analyzing geometrical and electronic signatures in highly fluxional molecular systems. Examples include a dithiocyclophane, which possesses multiple local minima with differing extents of π-stacking interactions and a temperature dependent rotation of a molecular rotor, where the descriptor is employed to capture fingerprints of CH-π and π-π interactions. Finally, DORI serves to examine the fluctuating π-conjugation pathway of a photochromic torsional switch (PTS). Attention is also placed on postprocessing the large amount of generated data and juxtaposing DORI with a data-driven low-dimensional representation of the structural landscape.
16
Musil F, De S, Yang J, Campbell JE, Day GM, Ceriotti M. Machine learning for the structure-energy-property landscapes of molecular crystals. Chem Sci 2018; 9:1289-1300. [PMID: 29675175 PMCID: PMC5887104 DOI: 10.1039/c7sc04665k] [Citation(s) in RCA: 98] [Impact Index Per Article: 14.0] [Received: 10/27/2017] [Accepted: 12/11/2017] [Indexed: 12/18/2022] Open
Abstract
Molecular crystals play an important role in several fields of science and technology. They frequently crystallize in different polymorphs with substantially different physical properties. To help guide the synthesis of candidate materials, atomic-scale modelling can be used to enumerate the stable polymorphs and to predict their properties, as well as to propose heuristic rules to rationalize the correlations between crystal structure and materials properties. Here we show how a recently-developed machine-learning (ML) framework can be used to achieve inexpensive and accurate predictions of the stability and properties of polymorphs, and a data-driven classification that is less biased and more flexible than typical heuristic rules. We discuss, as examples, the lattice energy and property landscapes of pentacene and two azapentacene isomers that are of interest as organic semiconductor materials. We show that we can estimate force field or DFT lattice energies with sub-kJ mol⁻¹ accuracy, using only a few hundred reference configurations, and reduce by a factor of ten the computational effort needed to predict charge mobility in the crystal structures. The automatic structural classification of the polymorphs reveals a more detailed picture of molecular packing than that provided by conventional heuristics, and helps disentangle the role of hydrogen bonded and π-stacking interactions in determining molecular self-assembly. This observation demonstrates that ML is not just a black-box scheme to interpolate between reference calculations, but can also be used as a tool to gain intuitive insights into structure-property relations in molecular crystal engineering.
Affiliation(s)
- Félix Musil
  - National Center for Computational Design and Discovery of Novel Materials (MARVEL), Laboratory of Computational Science and Modelling, Institute of Materials, Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland
- Sandip De
  - National Center for Computational Design and Discovery of Novel Materials (MARVEL), Laboratory of Computational Science and Modelling, Institute of Materials, Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland
- Jack Yang
  - School of Chemistry, University of Southampton, Highfield, Southampton, UK
- Joshua E Campbell
  - School of Chemistry, University of Southampton, Highfield, Southampton, UK
- Graeme M Day
  - School of Chemistry, University of Southampton, Highfield, Southampton, UK
- Michele Ceriotti
  - National Center for Computational Design and Discovery of Novel Materials (MARVEL), Laboratory of Computational Science and Modelling, Institute of Materials, Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland
17
Gasparotto P, Meißner RH, Ceriotti M. Recognizing Local and Global Structural Motifs at the Atomic Scale. J Chem Theory Comput 2018; 14:486-498. [DOI: 10.1021/acs.jctc.7b00993] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Indexed: 11/28/2022]
Affiliation(s)
- Piero Gasparotto
  - Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Robert Horst Meißner
  - Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Michele Ceriotti
  - Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
18
Grisafi A, Wilkins DM, Csányi G, Ceriotti M. Symmetry-Adapted Machine Learning for Tensorial Properties of Atomistic Systems. Physical Review Letters 2018; 120:036002. [PMID: 29400528 DOI: 10.1103/physrevlett.120.036002] [Citation(s) in RCA: 149] [Impact Index Per Article: 21.3] [Received: 09/11/2017] [Revised: 11/30/2017] [Indexed: 05/28/2023]
Abstract
Statistical learning methods show great promise in providing an accurate prediction of materials and molecular properties, while minimizing the need for computationally demanding electronic structure calculations. The accuracy and transferability of these models are increased significantly by encoding into the learning procedure the fundamental symmetries of rotational and permutational invariance of scalar properties. However, the prediction of tensorial properties requires that the model respects the appropriate geometric transformations, rather than invariance, when the reference frame is rotated. We introduce a formalism that extends existing schemes and makes it possible to perform machine learning of tensorial properties of arbitrary rank, and for general molecular geometries. To demonstrate it, we derive a tensor kernel adapted to rotational symmetry, which is the natural generalization of the smooth overlap of atomic positions kernel commonly used for the prediction of scalar properties at the atomic scale. The performance and generality of the approach is demonstrated by learning the instantaneous response to an external electric field of water oligomers of increasing complexity, from the isolated molecule to the condensed phase.
Affiliation(s)
- Andrea Grisafi
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| | - David M Wilkins
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| | - Gábor Csányi
- Engineering Laboratory, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, United Kingdom
| | - Michele Ceriotti
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| |
|
19
|
Willatt MJ, Musil F, Ceriotti M. Feature optimization for atomistic machine learning yields a data-driven construction of the periodic table of the elements. Phys Chem Chem Phys 2018; 20:29661-29668. [DOI: 10.1039/c8cp05921g] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Indexed: 12/16/2022]
Abstract
By representing elements as points in a low-dimensional chemical space it is possible to improve the performance of a machine-learning model for a chemically-diverse dataset. The resulting coordinates are reminiscent of the main groups of the periodic table.
Affiliation(s)
- Michael J. Willatt
- National Center for Computational Design and Discovery of Novel Materials (MARVEL), Laboratory of Computational Science and Modelling, Institute of Materials, Ecole Polytechnique Federale de Lausanne, Lausanne
| | - Félix Musil
- National Center for Computational Design and Discovery of Novel Materials (MARVEL), Laboratory of Computational Science and Modelling, Institute of Materials, Ecole Polytechnique Federale de Lausanne, Lausanne
| | - Michele Ceriotti
- National Center for Computational Design and Discovery of Novel Materials (MARVEL), Laboratory of Computational Science and Modelling, Institute of Materials, Ecole Polytechnique Federale de Lausanne, Lausanne
| |
|
20
|
Schneider M, Masellis C, Rizzo T, Baldauf C. Kinetically Trapped Liquid-State Conformers of a Sodiated Model Peptide Observed in the Gas Phase. J Phys Chem A 2017; 121:6838-6844. [DOI: 10.1021/acs.jpca.7b06431] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/09/2023]
Affiliation(s)
- Markus Schneider
- Theory Department, Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, D-14195 Berlin, Germany
| | - Chiara Masellis
- Laboratoire de Chimie Physique Moléculaire, EPFL SB ISIC LCPM, Ecole Polytechnique Fédérale de Lausanne, Station 6, CH-1015 Lausanne, Switzerland
| | - Thomas Rizzo
- Laboratoire de Chimie Physique Moléculaire, EPFL SB ISIC LCPM, Ecole Polytechnique Fédérale de Lausanne, Station 6, CH-1015 Lausanne, Switzerland
| | - Carsten Baldauf
- Theory Department, Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, D-14195 Berlin, Germany
| |
|