151
Spectral Properties of Effective Dynamics from Conditional Expectations. ENTROPY 2021; 23:e23020134. [PMID: 33494443 PMCID: PMC7912208 DOI: 10.3390/e23020134] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/15/2020] [Accepted: 01/18/2021] [Indexed: 01/03/2023]
Abstract
The reduction of high-dimensional systems to effective models on a smaller set of variables is an essential task in many areas of science. For stochastic dynamics governed by diffusion processes, a general procedure to find effective equations is the conditioning approach. In this paper, we are interested in the spectrum of the generator of the resulting effective dynamics, and how it compares to the spectrum of the full generator. We prove a new relative error bound in terms of the eigenfunction approximation error for reversible systems. We also present numerical examples indicating that, if Kramers–Moyal (KM) type approximations are used to compute the spectrum of the reduced generator, it seems largely insensitive to the time window used for the KM estimators. We analyze the implications of these observations for systems driven by underdamped Langevin dynamics, and show how meaningful effective dynamics can be defined in this setting.
152
Abstract
Four decades of molecular theory and computation have helped form the modern understanding of the physical chemistry of organic semiconductors. Whereas these efforts have historically centered around characterizations of electronic structure at the single-molecule or dimer scale, emerging trends in noncrystalline molecular and polymeric semiconductors are motivating the need for modeling techniques capable of morphological and electronic structure predictions at the mesoscale. Given the challenges associated with these prediction tasks, the community has begun to evolve a computational toolkit for organic semiconductors incorporating techniques from the fields of soft matter, coarse-graining, and machine learning. Here, we highlight recent advances in coarse-grained methodologies aimed at the multiscale characterization of noncrystalline organic semiconductors. As organic semiconductor performance is dependent on the interplay of mesoscale morphology and molecular electronic structure, specific emphasis is placed on coarse-grained modeling approaches capable of both structural and electronic predictions without recourse to all-atom representations.
Affiliation(s)
- Nicholas E Jackson
  - Department of Chemistry, University of Illinois, Urbana-Champaign, Urbana, Illinois 61801, United States
153
Gabriel JJ, Paulson NH, Duong TC, Tavazza F, Becker CA, Chaudhuri S, Stan M. Uncertainty Quantification in Atomistic Modeling of Metals and Its Effect on Mesoscale and Continuum Modeling: A Review. JOM (WARRENDALE, PA. : 1989) 2021; 73:10.1007/s11837-020-04436-6. [PMID: 34511862 PMCID: PMC8431950 DOI: 10.1007/s11837-020-04436-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/27/2020] [Accepted: 10/05/2020] [Indexed: 06/13/2023]
Abstract
The design of next-generation alloys through the integrated computational materials engineering (ICME) approach relies on multiscale computer simulations to provide thermodynamic properties when experiments are difficult to conduct. Atomistic methods such as density functional theory (DFT) and molecular dynamics (MD) have been successful in predicting properties of never before studied compounds or phases. However, uncertainty quantification (UQ) of DFT and MD results is rarely reported due to computational and UQ methodology challenges. Over the past decade, studies that mitigate this gap have emerged. These advances are reviewed in the context of thermodynamic modeling and information exchange with mesoscale methods such as the phase-field method (PFM) and calculation of phase diagrams (CALPHAD). The importance of UQ is illustrated using properties of metals, with aluminum as an example, and highlighting deterministic, frequentist, and Bayesian methodologies. Challenges facing routine uncertainty quantification and an outlook on addressing them are also presented.
Affiliation(s)
- Joshua J. Gabriel
  - Applied Materials Division, Argonne National Laboratory, Lemont, IL 60439, USA
- Noah H. Paulson
  - Applied Materials Division, Argonne National Laboratory, Lemont, IL 60439, USA
- Thien C. Duong
  - Energy and Global Security, Argonne National Laboratory, Lemont, IL 60439, USA
- Francesca Tavazza
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA
- Chandler A. Becker
  - Office of Data and Informatics, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA
- Santanu Chaudhuri
  - Manufacturing Science and Engineering, Energy and Global Security, Argonne National Laboratory, Lemont, IL 60439, USA
  - Civil, Materials, and Environmental Engineering, University of Illinois at Chicago, Chicago, IL 60607, USA
- Marius Stan
  - Applied Materials Division, Argonne National Laboratory, Lemont, IL 60439, USA
154
Chen M. Collective variable-based enhanced sampling and machine learning. THE EUROPEAN PHYSICAL JOURNAL. B 2021; 94:211. [PMID: 34697536 PMCID: PMC8527828 DOI: 10.1140/epjb/s10051-021-00220-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/02/2021] [Accepted: 10/03/2021] [Indexed: 05/14/2023]
Abstract
Collective variable-based enhanced sampling methods have been widely used to study thermodynamic properties of complex systems. Efficiency and accuracy of these enhanced sampling methods are affected by two factors: constructing appropriate collective variables for enhanced sampling and generating accurate free energy surfaces. Recently, many machine learning techniques have been developed to improve the quality of collective variables and the accuracy of free energy surfaces. Although machine learning has achieved great successes in improving enhanced sampling methods, there are still many challenges and open questions. In this perspective, we shall review recent developments on integrating machine learning techniques and collective variable-based enhanced sampling approaches. We also discuss challenges and future research directions including generating kinetic information, exploring high-dimensional free energy surfaces, and efficiently sampling all-atom configurations.
Affiliation(s)
- Ming Chen
  - Department of Chemistry, Purdue University, West Lafayette, IN 47907, USA
155
Balcells D, Skjelstad BB. tmQM Dataset-Quantum Geometries and Properties of 86k Transition Metal Complexes. J Chem Inf Model 2020; 60:6135-6146. [PMID: 33166143 PMCID: PMC7768608 DOI: 10.1021/acs.jcim.0c01041] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 09/03/2020] [Indexed: 12/19/2022]
Abstract
We report the transition metal quantum mechanics (tmQM) data set, which contains the geometries and properties of a large transition metal-organic compound space. tmQM comprises 86,665 mononuclear complexes extracted from the Cambridge Structural Database, including Werner, bioinorganic, and organometallic complexes based on a large variety of organic ligands and 30 transition metals (the 3d, 4d, and 5d from groups 3 to 12). All complexes are closed-shell, with a formal charge in the range {+1, 0, -1}e. The tmQM data set provides the Cartesian coordinates of all metal complexes optimized at the GFN2-xTB level, and their molecular size, stoichiometry, and metal node degree. The quantum properties were computed at the DFT(TPSSh-D3BJ/def2-SVP) level and include the electronic and dispersion energies, highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energies, HOMO/LUMO gap, dipole moment, and natural charge of the metal center; GFN2-xTB polarizabilities are also provided. Pairwise representations showed the low correlation between these properties, providing nearly continuous maps with unusual regions of the chemical space, for example, complexes combining large polarizabilities with wide HOMO/LUMO gaps and complexes combining low-energy HOMO orbitals with electron-rich metal centers. The tmQM data set can be exploited in the data-driven discovery of new metal complexes, including predictive models based on machine learning. These models may have a strong impact on the fields in which transition metal chemistry plays a key role, for example, catalysis, organic synthesis, and materials science. tmQM is an open data set that can be downloaded free of charge from https://github.com/bbskjelstad/tmqm.
Affiliation(s)
- David Balcells
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, 0315 Oslo, Norway
- Bastian Bjerkem Skjelstad
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo 001-0021, Japan
156
Gui S, Chen Z, Lu B, Chen M. Molecular Sparse Representation by a 3D Ellipsoid Radial Basis Function Neural Network via L1 Regularization. J Chem Inf Model 2020; 60:6054-6064. [PMID: 33180488 DOI: 10.1021/acs.jcim.0c00585] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
The three-dimensional structures and shapes of biomolecules provide essential information about their interactions and functions. Unfortunately, the computational cost of biomolecular shape representation remains an active challenge, and it increases rapidly as the number of atoms increases. Recent developments in sparse representation and deep learning have shown significant improvements in terms of time and space. A sparse representation of molecular shape is also useful in various other applications, such as molecular structure alignment, docking, and coarse-grained molecular modeling. We have developed an ellipsoid radial basis function neural network (ERBFNN) and an algorithm for sparsely representing molecular shape. To evaluate a sparse representation model of molecular shape, the Gaussian density map of the molecule is approximated using ERBFNN with a relatively small number of neurons. The deep learning models were trained by optimizing a nonlinear loss function with L1 regularization. Experimental results reveal that our algorithm can represent the original molecular shape with relatively higher accuracy and a smaller-scale ERBFNN. Our network is in principle applicable to the multiresolution sparse representation of molecular shape and coarse-grained molecular modeling. Executable files are available at https://github.com/SGUI-LSEC/SparseGaussianMolecule. The program was implemented in PyTorch and was run on Linux.
Affiliation(s)
- Sheng Gui
  - State Key Laboratory of Scientific and Engineering Computing, National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
  - Department of Mathematics, Soochow University, Suzhou 215006, China
- Zhaodi Chen
  - Department of Mathematics, Soochow University, Suzhou 215006, China
- Benzhuo Lu
  - State Key Laboratory of Scientific and Engineering Computing, National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Minxin Chen
  - Department of Mathematics, Soochow University, Suzhou 215006, China
157
Gao W, Mahajan SP, Sulam J, Gray JJ. Deep Learning in Protein Structural Modeling and Design. PATTERNS (NEW YORK, N.Y.) 2020; 1:100142. [PMID: 33336200 PMCID: PMC7733882 DOI: 10.1016/j.patter.2020.100142] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Indexed: 12/13/2022]
Abstract
Deep learning is catalyzing a scientific revolution fueled by big data, accessible toolkits, and powerful computational resources, impacting many fields, including protein structural modeling. Protein structural modeling, such as predicting structure from amino acid sequence and evolutionary information, designing proteins toward desirable functionality, or predicting properties or behavior of a protein, is critical to understand and engineer biological systems at the molecular level. In this review, we summarize the recent advances in applying deep learning techniques to tackle problems in protein structural modeling and design. We dissect the emerging approaches using deep learning techniques for protein structural modeling and discuss advances and challenges that must be addressed. We argue for the central importance of structure, following the "sequence → structure → function" paradigm. This review is directed to help both computational biologists to gain familiarity with the deep learning methods applied in protein modeling, and computer scientists to gain perspective on the biologically meaningful problems that may benefit from deep learning techniques.
Affiliation(s)
- Wenhao Gao
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Sai Pooja Mahajan
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Jeremias Sulam
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Jeffrey J. Gray
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
158
Rudzinski JF, Bereau T. Coarse-grained conformational surface hopping: Methodology and transferability. J Chem Phys 2020; 153:214110. [DOI: 10.1063/5.0031249] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022] Open
Affiliation(s)
- Tristan Bereau
  - Max Planck Institute for Polymer Research, 55128 Mainz, Germany
  - Van ’t Hoff Institute for Molecular Sciences and Informatics Institute, University of Amsterdam, Amsterdam 1098 XH, The Netherlands
159
Abella JR, Antunes D, Jackson K, Lizée G, Clementi C, Kavraki LE. Markov state modeling reveals alternative unbinding pathways for peptide-MHC complexes. Proc Natl Acad Sci U S A 2020; 117:30610-30618. [PMID: 33184174 PMCID: PMC7720115 DOI: 10.1073/pnas.2007246117] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/11/2023] Open
Abstract
Peptide binding to major histocompatibility complexes (MHCs) is a central component of the immune system, and understanding the mechanism behind stable peptide-MHC binding will aid the development of immunotherapies. While MHC binding is mostly influenced by the identity of the so-called anchor positions of the peptide, secondary interactions from nonanchor positions are known to play a role in complex stability. However, current MHC-binding prediction methods lack an analysis of the major conformational states and might underestimate the impact of secondary interactions. In this work, we present an atomically detailed analysis of peptide-MHC binding that can reveal the contributions of any interaction toward stability. We propose a simulation framework that uses both umbrella sampling and adaptive sampling to generate a Markov state model (MSM) for a coronavirus-derived peptide (QFKDNVILL), bound to one of the most prevalent MHC receptors in humans (HLA-A*24:02). While our model reaffirms the importance of the anchor positions of the peptide in establishing stable interactions, our model also reveals the underestimated importance of position 4 (p4), a nonanchor position. We confirmed our results by simulating the impact of specific peptide mutations and validated these predictions through competitive binding assays. By comparing the MSM of the wild-type system with those of the D4A and D4P mutations, our modeling reveals stark differences in unbinding pathways. The analysis presented here can be applied to any peptide-MHC complex of interest with a structural model as input, representing an important step toward comprehensive modeling of the MHC class I pathway.
Affiliation(s)
- Jayvee R Abella
  - Department of Computer Science, Rice University, Houston, TX 77005
- Dinler Antunes
  - Department of Computer Science, Rice University, Houston, TX 77005
- Kyle Jackson
  - Department of Melanoma Medical Oncology-Research, The University of Texas MD Anderson Cancer Center, Houston, TX 77030
- Gregory Lizée
  - Department of Melanoma Medical Oncology-Research, The University of Texas MD Anderson Cancer Center, Houston, TX 77030
- Cecilia Clementi
  - Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
  - Department of Chemistry, Rice University, Houston, TX 77005
- Lydia E Kavraki
  - Department of Computer Science, Rice University, Houston, TX 77005
160
Husic BE, Charron NE, Lemm D, Wang J, Pérez A, Majewski M, Krämer A, Chen Y, Olsson S, de Fabritiis G, Noé F, Clementi C. Coarse graining molecular dynamics with graph neural networks. J Chem Phys 2020; 153:194101. [PMID: 33218238 PMCID: PMC7671749 DOI: 10.1063/5.0026133] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 08/21/2020] [Accepted: 10/27/2020] [Indexed: 11/14/2022] Open
Abstract
Coarse graining enables the investigation of molecular dynamics for larger systems and at longer timescales than is possible at an atomic resolution. However, a coarse graining model must be formulated such that the conclusions we draw from it are consistent with the conclusions we would draw from a model at a finer level of detail. It has been proved that a force matching scheme defines a thermodynamically consistent coarse-grained model for an atomistic system in the variational limit. Wang et al. [ACS Cent. Sci. 5, 755 (2019)] demonstrated that the existence of such a variational limit enables the use of a supervised machine learning framework to generate a coarse-grained force field, which can then be used for simulation in the coarse-grained space. Their framework, however, requires the manual input of molecular features to machine learn the force field. In the present contribution, we build upon the advance of Wang et al. and introduce a hybrid architecture for the machine learning of coarse-grained force fields that learn their own features via a subnetwork that leverages continuous filter convolutions on a graph neural network architecture. We demonstrate that this framework succeeds at reproducing the thermodynamics for small biomolecular systems. Since the learned molecular representations are inherently transferable, the architecture presented here sets the stage for the development of machine-learned, coarse-grained force fields that are transferable across molecular systems.
Affiliation(s)
- Dominik Lemm
  - Computational Science Laboratory, Universitat Pompeu Fabra, PRBB, C/Dr. Aiguader 88, Barcelona, Spain
- Adrià Pérez
  - Computational Science Laboratory, Universitat Pompeu Fabra, PRBB, C/Dr. Aiguader 88, Barcelona, Spain
- Maciej Majewski
  - Computational Science Laboratory, Universitat Pompeu Fabra, PRBB, C/Dr. Aiguader 88, Barcelona, Spain
- Andreas Krämer
  - Department of Mathematics and Computer Science, Freie Universität, Berlin, Germany
- Simon Olsson
  - Department of Mathematics and Computer Science, Freie Universität, Berlin, Germany
161
Gao P, Zhang J, Sun Y, Yu J. Toward Accurate Predictions of Atomic Properties via Quantum Mechanics Descriptors Augmented Graph Convolutional Neural Network: Application of This Novel Approach in NMR Chemical Shifts Predictions. J Phys Chem Lett 2020; 11:9812-9818. [PMID: 33151693 DOI: 10.1021/acs.jpclett.0c02654] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 06/11/2023]
Abstract
In this study, an augmented Graph Convolutional Network (GCN) with quantum mechanics (QM) descriptors was reported for its accurate predictions of NMR chemical shifts with respect to experimental values. The prediction errors of 13C/1H NMR chemical shifts can be as small as 2.14/0.11 ppm. There are two crucial characteristics for this modified GCN: in one aspect, such a novel neural network could efficiently extract the overall molecule structure information; in another aspect, it could accurately solve the chemical environment of the target atom. As there exists an imperfect linear regression between the experimental NMR chemical shifts (δ) and the density functional theory (DFT) calculated isotropic shielding constants (σ), the inclusion of QM descriptors within GCN can largely improve its performance. Moreover, few-shot learning also becomes feasible with these descriptors. The success of this novel GCN in chemical shifts predictions also indicates its potential applicability for other computational studies.
Affiliation(s)
- Peng Gao
  - School of Chemistry and Molecular Bioscience, University of Wollongong, Wollongong, NSW 2500, Australia
- Jie Zhang
  - Centre of Chemistry and Chemical Biology, Bioland Laboratory (Guangzhou Regenerative Medicine and Health-Guangdong Laboratory), Guangzhou 53000, China
  - School of Chemical Engineering, East China University of Science and Technology, Shanghai 200237, China
- Yuzhu Sun
  - School of Chemical Engineering, East China University of Science and Technology, Shanghai 200237, China
- Jianguo Yu
  - School of Chemical Engineering, East China University of Science and Technology, Shanghai 200237, China
162
Crabb E, France-Lanord A, Leverick G, Stephens R, Shao-Horn Y, Grossman JC. Importance of Equilibration Method and Sampling for Ab Initio Molecular Dynamics Simulations of Solvent–Lithium-Salt Systems in Lithium-Oxygen Batteries. J Chem Theory Comput 2020; 16:7255-7266. [DOI: 10.1021/acs.jctc.0c00833] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
Affiliation(s)
- Emily Crabb
  - Department of Physics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Arthur France-Lanord
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Graham Leverick
  - Department of Mechanical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Ryan Stephens
  - Shell International Exploration & Production Inc., Houston, Texas 77082, United States
- Yang Shao-Horn
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
  - Department of Mechanical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Jeffrey C. Grossman
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
163
Zhang J, Lei YK, Yang YI, Gao YQ. Deep learning for variational multiscale molecular modeling. J Chem Phys 2020; 153:174115. [DOI: 10.1063/5.0026836] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022] Open
Affiliation(s)
- Jun Zhang
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518055 Shenzhen, China
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Yao-Kun Lei
  - Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China
- Yi Isaac Yang
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518055 Shenzhen, China
- Yi Qin Gao
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518055 Shenzhen, China
  - Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China
  - Beijing Advanced Innovation Center for Genomics, Peking University, 100871 Beijing, China
  - Biomedical Pioneering Innovation Center, Peking University, 100871 Beijing, China
164
Ruza J, Wang W, Schwalbe-Koda D, Axelrod S, Harris WH, Gómez-Bombarelli R. Temperature-transferable coarse-graining of ionic liquids with dual graph convolutional neural networks. J Chem Phys 2020; 153:164501. [DOI: 10.1063/5.0022431] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 01/09/2023] Open
Affiliation(s)
- Jurgis Ruza
  - Materials Science and Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Wujie Wang
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Daniel Schwalbe-Koda
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Simon Axelrod
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, USA
- William H. Harris
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Rafael Gómez-Bombarelli
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
165
Kojima T, Washio T, Hara S, Koishi M. Synthesis of computer simulation and machine learning for achieving the best material properties of filled rubber. Sci Rep 2020; 10:18127. [PMID: 33093549 PMCID: PMC7581745 DOI: 10.1038/s41598-020-75038-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/27/2020] [Accepted: 10/09/2020] [Indexed: 12/04/2022] Open
Abstract
Molecular dynamics (MD) simulation is used to analyze the mechanical properties of polymerized and nanoscale filled rubber. Unfortunately, a single simulation can require several months of computing time, because the interactions of thousands of filler particles must be calculated. To alleviate this problem, we introduce a surrogate convolutional neural network model to achieve faster and more accurate predictions. The major difficulty when employing machine-learning-based surrogate models is the shortage of training data, owing to the huge simulation costs. To derive a highly accurate surrogate model using only a small amount of training data, we increase the number of training instances by dividing the large-scale simulation results into 3D images of middle-scale filler morphologies and corresponding regional stresses. The images include fringe regions to reflect the influence of the filler constituents outside the core regions. The resultant surrogate model provides higher prediction accuracy than that trained only by images of the entire region. Afterwards, we extract the fillers that dominate the mechanical properties using the surrogate model and we confirm their validity using MD.
Affiliation(s)
- Takashi Kojima
  - Research and Advanced Development Division, The Yokohama Rubber Co., Ltd., 2-1 Oiwake, Hiratsuka, Kanagawa, 254-8601, Japan
  - Department of Reasoning for Intelligence, The Institute of Scientific and Industrial Research, Osaka University, 8-1, Mihogaoka, Ibarakishi, Osaka, 567-0047, Japan
- Takashi Washio
  - Department of Reasoning for Intelligence, The Institute of Scientific and Industrial Research, Osaka University, 8-1, Mihogaoka, Ibarakishi, Osaka, 567-0047, Japan
- Satoshi Hara
  - Department of Reasoning for Intelligence, The Institute of Scientific and Industrial Research, Osaka University, 8-1, Mihogaoka, Ibarakishi, Osaka, 567-0047, Japan
- Masataka Koishi
  - Research and Advanced Development Division, The Yokohama Rubber Co., Ltd., 2-1 Oiwake, Hiratsuka, Kanagawa, 254-8601, Japan
166
Nicholas TC, Goodwin AL, Deringer VL. Understanding the geometric diversity of inorganic and hybrid frameworks through structural coarse-graining. Chem Sci 2020; 11:12580-12587. [PMID: 34123235 PMCID: PMC8162807 DOI: 10.1039/d0sc03287e] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/12/2020] [Accepted: 10/16/2020] [Indexed: 12/13/2022] Open
Abstract
Much of our understanding of complex structures is based on simplification: for example, metal-organic frameworks are often discussed in the context of "nodes" and "linkers", allowing for a qualitative comparison with simpler inorganic structures. Here we show how such an understanding can be obtained in a systematic and quantitative framework, combining atom-density based similarity (kernel) functions and unsupervised machine learning with the long-standing idea of "coarse-graining" atomic structure. We demonstrate how the latter enables a comparison of vastly different chemical systems, and we use it to create a unified, two-dimensional structure map of experimentally known tetrahedral AB2 networks - including clathrate hydrates, zeolitic imidazolate frameworks (ZIFs), and diverse inorganic phases. The structural relationships that emerge can then be linked to microscopic properties of interest, which we exemplify for structural heterogeneity and tetrahedral density.
Affiliation(s)
- Thomas C Nicholas
  - Department of Chemistry, Inorganic Chemistry Laboratory, University of Oxford, Oxford OX1 3QR, UK
- Andrew L Goodwin
  - Department of Chemistry, Inorganic Chemistry Laboratory, University of Oxford, Oxford OX1 3QR, UK
- Volker L Deringer
  - Department of Chemistry, Inorganic Chemistry Laboratory, University of Oxford, Oxford OX1 3QR, UK
167
Webb MA, Jackson NE, Gil PS, de Pablo JJ. Targeted sequence design within the coarse-grained polymer genome. Sci Adv 2020; 6:eabc6216. [PMID: 33087352 PMCID: PMC7577717 DOI: 10.1126/sciadv.abc6216] [Citation(s) in RCA: 66] [Impact Index Per Article: 16.5] [Received: 05/04/2020] [Accepted: 09/02/2020] [Indexed: 05/05/2023]
Abstract
The chemical design of polymers with target structural and/or functional properties represents a grand challenge in materials science. While data-driven design approaches are promising, success with polymers has been limited, largely due to limitations in data availability. Here, we demonstrate the targeted sequence design of single-chain structure in polymers by combining coarse-grained modeling, machine learning, and model optimization. Nearly 2000 unique coarse-grained polymers are simulated to construct and analyze machine learning models. We find that deep neural networks inexpensively and reliably predict structural properties with limited sequence information as input. By coupling trained ML models with sequential model-based optimization, polymer sequences are proposed to exhibit globular, swollen, or rod-like behaviors, which are verified by explicit simulations. This work highlights the promising integration of coarse-grained modeling with data-driven design and represents a necessary and crucial step toward more complex polymer design efforts.
Affiliation(s)
- Michael A Webb
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60615, USA
- Nicholas E Jackson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60615, USA
- Center for Molecular Engineering and Materials Science Division, Argonne National Laboratory, Lemont, IL 60439, USA
- Phwey S Gil
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60615, USA
- Juan J de Pablo
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60615, USA
- Center for Molecular Engineering and Materials Science Division, Argonne National Laboratory, Lemont, IL 60439, USA
|
168
|
Sauceda HE, Gastegger M, Chmiela S, Müller KR, Tkatchenko A. Molecular force fields with gradient-domain machine learning (GDML): Comparison and synergies with classical force fields. J Chem Phys 2020; 153:124109. [DOI: 10.1063/5.0023005] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022]
Affiliation(s)
- Huziel E. Sauceda
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg, Luxembourg
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
- Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- Stefan Chmiela
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 136-713, South Korea
- Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
- Google Research, Brain Team, Berlin, Germany
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg, Luxembourg
|
169
|
Wang S, Ma Z, Pan W. Data-driven coarse-grained modeling of polymers in solution with structural and dynamic properties conserved. Soft Matter 2020; 16:8330-8344. [PMID: 32785383 DOI: 10.1039/d0sm01019g] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 06/11/2023]
Abstract
We present data-driven coarse-grained (CG) modeling for polymers in solution, which conserves the dynamic as well as structural properties of the underlying atomistic system. The CG modeling is built upon the framework of the generalized Langevin equation (GLE). The key is to determine each term in the GLE by directly linking it to atomistic data. In particular, we propose a two-stage Gaussian process-based Bayesian optimization method to infer the non-Markovian memory kernel from the data of the velocity autocorrelation function (VACF). Considering that the long-time behaviors of the VACF and memory kernel for polymer solutions can exhibit hydrodynamic scaling (algebraic decay with time), we further develop an active learning method to determine the emergence of hydrodynamic scaling, which can accelerate the inference process of the memory kernel. The proposed methods do not rely on how the mean force or CG potential in the GLE is constructed. Thus, we also compare two methods for constructing the CG potential: a deep learning method and the iterative Boltzmann inversion method. With the memory kernel and CG potential determined, the GLE is mapped onto an extended Markovian process to circumvent the expensive cost of directly solving the GLE. The accuracy and computational efficiency of the proposed CG modeling are assessed in a model star-polymer solution system at three representative concentrations. By comparing with the reference atomistic simulation results, we demonstrate that the proposed CG modeling can robustly and accurately reproduce the dynamic and structural properties of polymers in solution.
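The velocity autocorrelation function (VACF) named in this abstract is the quantity the memory kernel is inferred from. As a rough illustration only, and not the authors' code, the sketch below computes an unnormalized VACF from a stored velocity trajectory by averaging over time origins and particles. The function name `vacf`, the nested-list data layout, and the toy trajectory are assumptions made for this example.

```python
def vacf(velocities, max_lag):
    """Velocity autocorrelation C(lag) = <v(t0) . v(t0 + lag)>.

    velocities[t][i] is the velocity vector (a sequence of floats) of
    particle i at frame t. The average runs over all time origins t0 and
    all particles. Returns a list with entries for lags 0..max_lag-1.
    """
    n_frames = len(velocities)
    if not 0 < max_lag <= n_frames:
        raise ValueError("max_lag must be between 1 and the number of frames")
    result = []
    for lag in range(max_lag):
        total = 0.0
        count = 0
        for t0 in range(n_frames - lag):
            # Pair each particle's velocity at t0 with its velocity at t0 + lag.
            for v0, v1 in zip(velocities[t0], velocities[t0 + lag]):
                total += sum(a * b for a, b in zip(v0, v1))
                count += 1
        result.append(total / count)
    return result


# Toy trajectory (hypothetical): one particle in 1D whose velocity
# flips sign every frame, so the VACF alternates in sign with the lag.
traj = [[[(-1.0) ** t]] for t in range(6)]
c = vacf(traj, max_lag=3)
```

For production trajectories an FFT-based estimator is usually preferred for speed; the direct double loop is kept here because it mirrors the definition.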
Affiliation(s)
- Shu Wang
- Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- Zhan Ma
- Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- Wenxiao Pan
- Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
|
170
|
Gao P, Yang X, Tartakovsky AM. Learning Coarse-Grained Potentials for Binary Fluids. J Chem Inf Model 2020; 60:3731-3745. [PMID: 32668158 DOI: 10.1021/acs.jcim.0c00337] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
For a multiple-fluid system, CG models capable of accurately predicting the interfacial properties as a function of curvature are still lacking. In this work, we propose a new probabilistic machine learning (ML) model for learning CG potentials for binary fluids. The water-hexane mixture is selected as a typical immiscible binary liquid-liquid system. We develop a new CG force field (FF) using the Shinoda-DeVane-Klein (SDK) FF framework and compute parameters in this CG FF using the proposed probabilistic ML method. It is shown that a standard response-surface approach does not provide a unique set of parameters, as it results in a loss function with multiple shallow minima. To address this challenge, we develop a probabilistic ML approach where we compute the probability density function (PDF) of parameters that minimize the loss function. The PDF has a well-defined peak corresponding to a unique set of parameters in the CG FF that reproduces the desired properties of a liquid-liquid interface. We compare the performance of the new CG FF with several existing FFs for the water-hexane mixture, including two atomistic and three CG FFs with respect to modeling the interface structure and thermodynamic properties. It is demonstrated that the new FF significantly improves the CG model prediction of both the interfacial tension and structure for the water-hexane mixture.
Affiliation(s)
- Peiyuan Gao
- Advanced Computing, Mathematics, and Data Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Xiu Yang
- Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Alexandre M Tartakovsky
- Advanced Computing, Mathematics, and Data Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
|
171
|
Murillo MS, Marciante M, Stanton LG. Machine Learning Discovery of Computational Model Efficacy Boundaries. Phys Rev Lett 2020; 125:085503. [PMID: 32909767 DOI: 10.1103/physrevlett.125.085503] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/28/2019] [Accepted: 07/30/2020] [Indexed: 06/11/2023]
Abstract
Computational models are formulated in hierarchies of variable fidelity, often with no quantitative rule for defining the fidelity boundaries. We have constructed a dataset from a wide range of atomistic computational models to reveal the accuracy boundary between higher-fidelity models and a simple, lower-fidelity model. The symbolic decision boundary is discovered by optimizing a support vector machine on the data through iterative feature engineering. This data-driven approach reveals two important results: (i) a symbolic rule emerges that is independent of the algorithm, and (ii) the symbolic rule provides a deeper understanding of the fidelity boundary. Specifically, our dataset is composed of radial distribution functions from seven high-fidelity methods that cover wide ranges in the features (element, density, and temperature); high-fidelity results are compared with a simple pair-potential model to discover the nonlinear combination of the features, and the machine learning approach directly reveals the central role of atomic physics in determining accuracy.
Affiliation(s)
- Michael S Murillo
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan 48824, USA
- Liam G Stanton
- Department of Mathematics and Statistics, San José State University, San José, California 95192, USA
|
172
|
Gkeka P, Stoltz G, Barati Farimani A, Belkacemi Z, Ceriotti M, Chodera JD, Dinner AR, Ferguson AL, Maillet JB, Minoux H, Peter C, Pietrucci F, Silveira A, Tkatchenko A, Trstanova Z, Wiewiora R, Lelièvre T. Machine Learning Force Fields and Coarse-Grained Variables in Molecular Dynamics: Application to Materials and Biological Systems. J Chem Theory Comput 2020; 16:4757-4775. [PMID: 32559068 PMCID: PMC8312194 DOI: 10.1021/acs.jctc.0c00355] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Indexed: 12/21/2022]
Abstract
Machine learning encompasses tools and algorithms that are now becoming popular in almost all scientific and technological fields. This is true for molecular dynamics as well, where machine learning offers promises of extracting valuable information from the enormous amounts of data generated by simulation of complex systems. We provide here a review of our current understanding of goals, benefits, and limitations of machine learning techniques for computational studies on atomistic systems, focusing on the construction of empirical force fields from ab initio databases and the determination of reaction coordinates for free energy computation and enhanced sampling.
Affiliation(s)
- Paraskevi Gkeka
- Integrated Drug Discovery, Sanofi R&D, 91385 Chilly-Mazarin, France
- Gabriel Stoltz
- CERMICS, Ecole des Ponts, Marne-la-Vallée, France
- Matherials Project-Team, Inria Paris, 75012 Paris, France
- Zineb Belkacemi
- Integrated Drug Discovery, Sanofi R&D, 91385 Chilly-Mazarin, France
- CERMICS, Ecole des Ponts, Marne-la-Vallée, France
- Michele Ceriotti
- Laboratory of Computational Science and Modelling, Institute of Materials, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- John D Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Aaron R Dinner
- Department of Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, United States
- Hervé Minoux
- Integrated Drug Discovery, Sanofi R&D, 94403 Vitry-sur-Seine, France
- Fabio Pietrucci
- UMR CNRS 7590, MNHN, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, Sorbonne Université, 75005 Paris, France
- Ana Silveira
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Zofia Trstanova
- School of Mathematics, The University of Edinburgh, Edinburgh EH9 3FD, U.K.
- Rafal Wiewiora
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Tony Lelièvre
- CERMICS, Ecole des Ponts, Marne-la-Vallée, France
- Matherials Project-Team, Inria Paris, 75012 Paris, France
|
173
|
Li W, Burkhart C, Polińska P, Harmandaris V, Doxastakis M. Backmapping coarse-grained macromolecules: An efficient and versatile machine learning approach. J Chem Phys 2020; 153:041101. [DOI: 10.1063/5.0012320] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 11/14/2022]
Affiliation(s)
- Wei Li
- Department of Chemical and Biomolecular Engineering, University of Tennessee, Knoxville, Tennessee 37996, USA
- Craig Burkhart
- The Goodyear Tire and Rubber Company, Akron, Ohio 44305, USA
- Patrycja Polińska
- Goodyear Innovation Center Luxembourg, Avenue Gordon Smith, L-7750 Colmar-Berg, Luxembourg
- Vagelis Harmandaris
- Department of Applied Mathematics, University of Crete, and IACM FORTH, GR-71110 Heraklion, Greece
- Computation-Based Science and Technology Research Center, The Cyprus Institute, Nicosia 2121, Cyprus
- Manolis Doxastakis
- Department of Chemical and Biomolecular Engineering, University of Tennessee, Knoxville, Tennessee 37996, USA
|
174
|
Zhang J, Lei YK, Zhang Z, Chang J, Li M, Han X, Yang L, Yang YI, Gao YQ. A Perspective on Deep Learning for Molecular Modeling and Simulations. J Phys Chem A 2020; 124:6745-6763. [PMID: 32786668 DOI: 10.1021/acs.jpca.0c04473] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 12/14/2022]
Abstract
Deep learning is transforming many areas in science, and it has great potential in modeling molecular systems. However, unlike the mature deployment of deep learning in computer vision and natural language processing, its development in molecular modeling and simulations is still at an early stage, largely because the inductive biases of molecules are completely different from those of images or texts. Footed on these differences, we first reviewed the limitations of traditional deep learning models from the perspective of molecular physics and wrapped up some relevant technical advancement at the interface between molecular modeling and deep learning. We do not focus merely on the ever more complex neural network models; instead, we introduce various useful concepts and ideas brought by modern deep learning. We hope that transacting these ideas into molecular modeling will create new opportunities. For this purpose, we summarized several representative applications, ranging from supervised to unsupervised and reinforcement learning, and discussed their connections with the emerging trends in deep learning. Finally, we give an outlook for promising directions which may help address the existing issues in the current framework of deep molecular modeling.
Affiliation(s)
- Jun Zhang
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518055 Shenzhen, China
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Yao-Kun Lei
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China
- Zhen Zhang
- Department of Physics, Tangshan Normal University, 063000 Tangshan, China
- Junhan Chang
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China
- Maodong Li
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518055 Shenzhen, China
- Xu Han
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China
- Lijiang Yang
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China
- Yi Isaac Yang
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518055 Shenzhen, China
- Yi Qin Gao
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518055 Shenzhen, China
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China
- Beijing Advanced Innovation Center for Genomics, Peking University, 100871 Beijing, China
- Biomedical Pioneering Innovation Center, Peking University, 100871 Beijing, China
|
175
|
Rauer C, Bereau T. Hydration free energies from kernel-based machine learning: Compound-database bias. J Chem Phys 2020; 153:014101. [DOI: 10.1063/5.0012230] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 11/14/2022]
Affiliation(s)
- Clemens Rauer
- Max Planck Institute for Polymer Research, 55128 Mainz, Germany
- Tristan Bereau
- Max Planck Institute for Polymer Research, 55128 Mainz, Germany
- Van ’t Hoff Institute for Molecular Sciences and Informatics Institute, University of Amsterdam, Amsterdam 1098 XH, The Netherlands
|
176
|
Gao P, Zhang J, Peng Q, Zhang J, Glezakou VA. General Protocol for the Accurate Prediction of Molecular 13C/1H NMR Chemical Shifts via Machine Learning Augmented DFT. J Chem Inf Model 2020; 60:3746-3754. [DOI: 10.1021/acs.jcim.0c00388] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 11/28/2022]
Affiliation(s)
- Peng Gao
- School of Chemistry and Molecular Bioscience, University of Wollongong, Wollongong, NSW 2500, Australia
- Jun Zhang
- Physical Sciences Division, Pacific Northwest National Laboratory (PNNL), Richland, Washington 99352, United States
- Qian Peng
- State Key Laboratory of Elemento-Organic Chemistry, College of Chemistry, Nankai University, Tianjin 300071, China
- Jie Zhang
- Centre of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health-Guangdong Laboratory, Science Park, Guangzhou 510530, China
- Vassiliki-Alexandra Glezakou
- Physical Sciences Division, Pacific Northwest National Laboratory (PNNL), Richland, Washington 99352, United States
|
177
|
Coley CW, Eyke NS, Jensen KF. Autonomous Discovery in the Chemical Sciences Part I: Progress. Angew Chem Int Ed Engl 2020; 59:22858-22893. [DOI: 10.1002/anie.201909987] [Citation(s) in RCA: 100] [Impact Index Per Article: 25.0] [Received: 08/06/2019] [Indexed: 01/05/2023]
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Natalie S. Eyke
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Klavs F. Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
178
|
Predicting optical spectra for optoelectronic polymers using coarse-grained models and recurrent neural networks. Proc Natl Acad Sci U S A 2020; 117:13945-13948. [PMID: 32513725 DOI: 10.1073/pnas.1918696117] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/18/2022]
Abstract
Coarse-grained modeling of conjugated polymers has become an increasingly popular route to investigate the physics of organic optoelectronic materials. While ultraviolet (UV)-vis spectroscopy remains one of the key experimental methods for the interrogation of these materials, a rigorous bridge between simulated coarse-grained structures and spectroscopy has not been established. Here, we address this challenge by developing a method that can predict spectra of conjugated polymers directly from coarse-grained representations while avoiding repetitive procedures such as ad hoc back-mapping from coarse-grained to atomistic representations followed by spectral computation using quantum chemistry. Our approach is based on a generative deep-learning model: the long-short-term memory recurrent neural network (LSTM-RNN). The latter is suggested by the apparent similarity between natural languages and the mathematical structure of perturbative expansions of, in our case, excited-state energies perturbed by conformational fluctuations. We also use this model to explore the level of sensitivity of spectra to the coarse-grained representation back-mapping protocol. Our approach presents a tool uniquely suited for improving postsimulation analysis protocols, as well as, potentially, for including spectral data as input in the refinement of coarse-grained potentials.
|
179
|
Coley CW, Eyke NS, Jensen KF. Autonome Entdeckung in den chemischen Wissenschaften, Teil I: Fortschritt [Autonomous Discovery in the Chemical Sciences, Part I: Progress]. Angew Chem Int Ed Engl 2020. [DOI: 10.1002/ange.201909987] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/14/2022]
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Natalie S. Eyke
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Klavs F. Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
180
|
Wang J, Chmiela S, Müller KR, Noé F, Clementi C. Ensemble learning of coarse-grained molecular dynamics force fields with a kernel approach. J Chem Phys 2020; 152:194106. [DOI: 10.1063/5.0007276] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 02/05/2023]
Affiliation(s)
- Jiang Wang
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Department of Chemistry, Rice University, Houston, Texas 77005, USA
- Stefan Chmiela
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, South Korea
- Max Planck Institute for Informatics, Saarbrücken 66123, Germany
- Frank Noé
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Department of Chemistry, Rice University, Houston, Texas 77005, USA
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Department of Physics, Freie Universität Berlin, Arnimallee 14, 14195 Berlin, Germany
- Cecilia Clementi
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Department of Chemistry, Rice University, Houston, Texas 77005, USA
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Department of Physics, Freie Universität Berlin, Arnimallee 14, 14195 Berlin, Germany
- Department of Physics, Rice University, Houston, Texas 77005, USA
|
181
|
Roel-Touris J, Bonvin AM. Coarse-grained (hybrid) integrative modeling of biomolecular interactions. Comput Struct Biotechnol J 2020; 18:1182-1190. [PMID: 32514329 PMCID: PMC7264466 DOI: 10.1016/j.csbj.2020.05.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/27/2020] [Revised: 04/23/2020] [Accepted: 05/06/2020] [Indexed: 12/23/2022]
Abstract
The computational modeling field has vastly evolved over the past decades. The early developments of simplified protein systems represented a stepping stone towards establishing more efficient approaches to sample intricated conformational landscapes. Downscaling the level of resolution of biomolecules to coarser representations allows for studying protein structure, dynamics and interactions that are not accessible by classical atomistic approaches. The combination of different resolutions, namely hybrid modeling, has also been proved as an alternative when mixed levels of details are required. In this review, we provide an overview of coarse-grained/hybrid models focusing on their applicability in the modeling of biomolecular interactions. We give a detailed list of ready-to-use modeling software for studying biomolecular interactions allowing various levels of coarse-graining and provide examples of complexes determined by integrative coarse-grained/hybrid approaches in combination with experimental information.
|
182
|
Friederich P, Dos Passos Gomes G, De Bin R, Aspuru-Guzik A, Balcells D. Machine learning dihydrogen activation in the chemical space surrounding Vaska's complex. Chem Sci 2020; 11:4584-4601. [PMID: 33224459 PMCID: PMC7659707 DOI: 10.1039/d0sc00445f] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 01/23/2020] [Accepted: 04/06/2020] [Indexed: 12/15/2022]
Abstract
A machine learning exploration of the chemical space surrounding Vaska's complex.
Homogeneous catalysis using transition metal complexes is ubiquitously used for organic synthesis, as well as technologically relevant in applications such as water splitting and CO2 reduction. The key steps underlying homogeneous catalysis require a specific combination of electronic and steric effects from the ligands bound to the metal center. Finding the optimal combination of ligands is a challenging task due to the exceedingly large number of possibilities and the non-trivial ligand–ligand interactions. The classic example of Vaska's complex, trans-[Ir(PPh3)2(CO)(Cl)], illustrates this scenario. The ligands of this species activate iridium for the oxidative addition of hydrogen, yielding the dihydride cis-[Ir(H)2(PPh3)2(CO)(Cl)] complex. Despite the simplicity of this system, thousands of derivatives can be formulated for the activation of H2, with a limited number of ligands belonging to the same general categories found in the original complex. In this work, we show how DFT and machine learning (ML) methods can be combined to enable the prediction of reactivity within large chemical spaces containing thousands of complexes. In a space of 2574 species derived from Vaska's complex, data from DFT calculations are used to train and test ML models that predict the H2-activation barrier. In contrast to experiments and calculations requiring several days to be completed, the ML models were trained and used on a laptop on a time-scale of minutes. As a first approach, we combined Bayesian-optimized artificial neural networks (ANN) with features derived from autocorrelation and deltametric functions. The resulting ANNs achieved high accuracies, with mean absolute errors (MAE) between 1 and 2 kcal mol–1, depending on the size of the training set. By using a Gaussian process (GP) model trained with a set of selected features, including fingerprints, accuracy was further enhanced. 
Remarkably, this GP model minimized the MAE below 1 kcal mol–1, by using only 20% or less of the data available for training. The gradient boosting (GB) method was also used to assess the relevance of the features, which was used for both feature selection and model interpretation purposes. Features accounting for chemical composition, atom size and electronegativity were found to be the most determinant in the predictions. Further, the ligand fragments with the strongest influence on the H2-activation barrier were identified.
Affiliation(s)
- Pascal Friederich
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
- Gabriel Dos Passos Gomes
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
- Riccardo De Bin
- Department of Mathematics, University of Oslo, P. O. Box 1053, Blindern, N-0316, Oslo, Norway
- Alán Aspuru-Guzik
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, Ontario M5G 1M1, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 661 University Ave, Toronto, ON M5G 1M1, Canada
- David Balcells
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P. O. Box 1033, Blindern, N-0315, Oslo, Norway
|
183
|
Scherer C, Scheid R, Andrienko D, Bereau T. Kernel-Based Machine Learning for Efficient Simulations of Molecular Liquids. J Chem Theory Comput 2020; 16:3194-3204. [PMID: 32282206 PMCID: PMC7304872 DOI: 10.1021/acs.jctc.9b01256] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 12/18/2019] [Indexed: 11/29/2022]
Abstract
Current machine learning (ML) models aimed at learning force fields are plagued by their high computational cost at every integration time step. We describe a number of practical and computationally efficient strategies to parametrize traditional force fields for molecular liquids from ML: the particle decomposition ansatz to two- and three-body force fields, the use of kernel-based ML models that incorporate physical symmetries, the incorporation of switching functions close to the cutoff, and the use of covariant meshing to boost the training set size. Results are presented for model molecular liquids: pairwise Lennard-Jones, three-body Stillinger-Weber, and bottom-up coarse-graining of water. Here, covariant meshing proves to be an efficient strategy to learn canonically averaged instantaneous forces. We show that molecular dynamics simulations with tabulated two- and three-body ML potentials are computationally efficient and recover two- and three-body distribution functions. Many-body representations, decomposition, and kernel regression schemes are all implemented in the open-source software package VOTCA.
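One of the strategies listed in this abstract is a switching function close to the cutoff, which takes a pair potential smoothly to zero. The abstract does not give the functional form, so the sketch below assumes a common cosine switch purely for illustration. The names `cosine_switch`, `lennard_jones`, and `switched_lj`, and the values `r_on = 2.0` and `r_cut = 2.5`, are hypothetical and are not taken from the paper or from VOTCA.

```python
import math


def cosine_switch(r, r_on, r_cut):
    """Smoothly go from 1 (r <= r_on) to 0 (r >= r_cut) with a half cosine."""
    if r_on >= r_cut:
        raise ValueError("r_on must be smaller than r_cut")
    if r <= r_on:
        return 1.0
    if r >= r_cut:
        return 0.0
    x = (r - r_on) / (r_cut - r_on)  # 0 at r_on, 1 at r_cut
    return 0.5 * (1.0 + math.cos(math.pi * x))


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Standard 12-6 Lennard-Jones pair energy in reduced units."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def switched_lj(r, r_on=2.0, r_cut=2.5):
    """Pair energy that is unmodified below r_on and exactly zero beyond r_cut."""
    return lennard_jones(r) * cosine_switch(r, r_on, r_cut)


# Energies at a few separations: inside, within, and beyond the switching region.
energies = [switched_lj(r) for r in (1.5, 2.25, 3.0)]
```

Because the switch multiplies the energy, the corresponding force picks up an extra term from the derivative of the switch inside the switching region; a tabulated potential would need to include that term.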
Affiliation(s)
- Christoph Scherer
- Max Planck Institute for Polymer Research, Ackermannweg 10, 55128 Mainz, Germany
| | - René Scheid
- Max Planck Institute for Polymer Research, Ackermannweg 10, 55128 Mainz, Germany
| | - Denis Andrienko
- Max Planck Institute for Polymer Research, Ackermannweg 10, 55128 Mainz, Germany
| | - Tristan Bereau
- Max Planck Institute for Polymer Research, Ackermannweg 10, 55128 Mainz, Germany
|
184
|
Zhai Y, Caruso A, Gao S, Paesani F. Active learning of many-body configuration space: Application to the Cs+–water MB-nrg potential energy function as a case study. J Chem Phys 2020; 152:144103. [DOI: 10.1063/5.0002162] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 01/09/2023]
Affiliation(s)
- Yaoguang Zhai
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
| | - Alessandro Caruso
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, USA
| | - Sicun Gao
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
| | - Francesco Paesani
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, USA
- Materials Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California 92093, USA
|
185
|
Bryan JS, Sgouralis I, Pressé S. Inferring effective forces for Langevin dynamics using Gaussian processes. J Chem Phys 2020; 152:124106. [PMID: 32241120 PMCID: PMC7096241 DOI: 10.1063/1.5144523] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/07/2020] [Accepted: 02/27/2020] [Indexed: 11/14/2022]
Abstract
Effective forces derived from experimental or in silico molecular dynamics time traces are critical in developing reduced and computationally efficient descriptions of otherwise complex dynamical problems. This helps motivate why it is important to develop methods to efficiently learn effective forces from time series data. A number of methods already exist to do this when data are plentiful but otherwise fail for sparse datasets or datasets where some regions of phase space are undersampled. In addition, any method developed to learn effective forces from time series data should be minimally a priori committal as to the shape of the effective force profile, exploit every data point without reducing data quality through any form of binning or pre-processing, and provide full credible intervals (error bars) about the prediction for the entirety of the effective force curve. Here, we propose a generalization of the Gaussian process, a key tool in Bayesian nonparametric inference and machine learning, which meets all of the above criteria in learning effective forces for the first time.
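To make the idea of a nonparametric force estimate with credible intervals concrete, here is a minimal sketch of standard Gaussian process regression in NumPy. It is not the generalized Gaussian process proposed in the paper: it assumes noisy force values are already available at sampled positions, and the RBF kernel, the hyperparameters, and the synthetic harmonic force `f(x) = -x` are all illustrative assumptions.

```python
import numpy as np


def rbf_kernel(a, b, length=1.0, variance=1.0):
    """Squared-exponential kernel between two 1D arrays of positions."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)


def gp_posterior(x_train, f_train, x_test, noise=0.1, length=1.0, variance=1.0):
    """Posterior mean and standard deviation of the force at x_test."""
    K = rbf_kernel(x_train, x_train, length, variance)
    K += noise**2 * np.eye(len(x_train))          # observation noise on the diagonal
    Ks = rbf_kernel(x_train, x_test, length, variance)
    Kss = rbf_kernel(x_test, x_test, length, variance)
    L = np.linalg.cholesky(K)                      # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    cov = Kss - v.T @ v
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std


# Synthetic data (assumption): noisy samples of a harmonic force f(x) = -x.
rng = np.random.default_rng(0)
x_train = rng.uniform(-2.0, 2.0, 40)
f_train = -x_train + 0.1 * rng.normal(size=40)

# One test point inside the sampled region, one far outside it.
x_test = np.array([0.0, 5.0])
mean, std = gp_posterior(x_train, f_train, x_test)
```

The posterior standard deviation is small where data are dense and grows toward the prior value in unsampled regions, which is the behavior the abstract asks of the error bars.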
Affiliation(s)
- J. Shepard Bryan
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona 85287, USA
| | - Ioannis Sgouralis
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona 85287, USA
| | - Steve Pressé
- Author to whom correspondence should be addressed:
|
186
|
Exploring Successful Parameter Region for Coarse-Grained Simulation of Biomolecules by Bayesian Optimization and Active Learning. Biomolecules 2020; 10:biom10030482. [PMID: 32245275 PMCID: PMC7175118 DOI: 10.3390/biom10030482] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/15/2020] [Revised: 03/11/2020] [Accepted: 03/19/2020] [Indexed: 11/19/2022]
Abstract
As advances in structural biology reveal a growing number of biomolecular structures, the molecular dynamics (MD) approach, especially coarse-grained (CG) MD suitable for macromolecules, is becoming increasingly important for elucidating their dynamics and behavior. In fact, CG-MD simulation has succeeded in qualitatively reproducing numerous biological processes for various biomolecules such as conformational changes and protein folding with reasonable calculation costs. However, CG-MD simulations strongly depend on various parameters, and selecting an appropriate parameter set is necessary to reproduce a particular biological process. Because exhaustive examination of all candidate parameters is inefficient, it is important to identify successful parameters. Furthermore, the successful region, in which the desired process is reproducible, is essential for describing the detailed mechanics of functional processes and environmental sensitivity and robustness. We propose an efficient search method for identifying the successful region by using two machine learning techniques, Bayesian optimization and active learning. We evaluated its performance using F1-ATPase, a biological rotary motor, with CG-MD simulations. We successfully identified the successful region with lower computational costs (12.3% in the best case) without sacrificing accuracy compared to exhaustive search. This method can accelerate not only parameter search but also biological discussion of the detailed mechanics of functional processes and environmental sensitivity based on MD simulation studies.
|
187
|
Sidky H, Chen W, Ferguson AL. Machine learning for collective variable discovery and enhanced sampling in biomolecular simulation. Mol Phys 2020. [DOI: 10.1080/00268976.2020.1737742] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 12/13/2022]
Affiliation(s)
- Hythem Sidky
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL, USA
| | - Wei Chen
- Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Andrew L. Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL, USA
|
188
|
Liu S, Xiang X, Gao X, Liu H. Neighborhood Preference of Amino Acids in Protein Structures and its Applications in Protein Structure Assessment. Sci Rep 2020; 10:4371. [PMID: 32152349 PMCID: PMC7062742 DOI: 10.1038/s41598-020-61205-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/19/2019] [Accepted: 02/24/2020] [Indexed: 12/02/2022]
Abstract
Amino acids form protein 3D structures in unique manners such that the folded structure is stable and functional under physiological conditions. Non-specific and non-covalent interactions between amino acids exhibit neighborhood preferences. Based on structural information from the protein data bank, a statistical energy function was derived to quantify amino acid neighborhood preferences. The neighborhood of one amino acid is defined by its contacting residues, and the energy function is determined by the neighboring residue types and relative positions. The neighborhood preference of amino acids was exploited to facilitate structural quality assessment, which was implemented in the neighborhood preference program NEPRE. The source codes are available via https://github.com/LiuLab-CSRC/NePre.
Affiliation(s)
- Siyuan Liu
- Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
- School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
| | - Xilun Xiang
- Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
- School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
| | - Xiang Gao
- Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
- School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
| | - Haiguang Liu
- Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China.
- Physics Department, Beijing Normal University, Haidian, Beijing, 100875, China.
|
189
|
Abstract
Machine learning (ML) is transforming all areas of science. The complex and time-consuming calculations in molecular simulations are particularly suitable for an ML revolution and have already been profoundly affected by the application of existing ML methods. Here we review recent ML methods for molecular simulation, with particular focus on (deep) neural networks for the prediction of quantum-mechanical energies and forces, on coarse-grained molecular dynamics, on the extraction of free energy surfaces and kinetics, and on generative network approaches to sample molecular equilibrium structures and compute thermodynamics. To explain these methods and illustrate open methodological problems, we review some important principles of molecular physics and describe how they can be incorporated into ML structures. Finally, we identify and describe a list of open challenges for the interface between ML and molecular simulation.
Affiliation(s)
- Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Department of Physics, Freie Universität Berlin, 14195 Berlin, Germany
- Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
| | - Alexandre Tkatchenko
- Physics and Materials Science Research Unit, University of Luxembourg, 1511 Luxembourg, Luxembourg
| | - Klaus-Robert Müller
- Department of Computer Science, Technical University Berlin, 10587 Berlin, Germany
- Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany
- Department of Brain and Cognitive Engineering, Korea University, Seoul 136-713, South Korea
| | - Cecilia Clementi
- Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Department of Physics, Rice University, Houston, Texas 77005, USA
|
190
|
Machine learning for protein folding and dynamics. Curr Opin Struct Biol 2020; 60:77-84. [DOI: 10.1016/j.sbi.2019.12.005] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 09/13/2019] [Revised: 11/21/2019] [Accepted: 12/05/2019] [Indexed: 12/17/2022]
|
191
|
Gasparotto P, Bochicchio D, Ceriotti M, Pavan GM. Identifying and Tracking Defects in Dynamic Supramolecular Polymers. J Phys Chem B 2020; 124:589-599. [PMID: 31888337 DOI: 10.1021/acs.jpcb.9b11015] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Indexed: 12/28/2022]
Abstract
A central paradigm of self-assembly is to create ordered structures starting from molecular monomers that spontaneously recognize and interact with each other via noncovalent interactions. In recent years, great efforts have been directed toward perfecting the design of a variety of supramolecular polymers and materials with different architectures. The resulting structures are often thought of as ideally perfect, defect-free supramolecular fibers, micelles, vesicles, etc., having an intrinsic dynamic character, which are typically studied at the level of statistical ensembles to assess their average properties. However, molecular simulations recently demonstrated that local defects that may be present or may form in these assemblies, and which are poorly captured by conventional approaches, are key to controlling their dynamic behavior and properties. The study of these defects poses considerable challenges, as the flexible/dynamic nature of these soft systems makes it difficult to identify what effectively constitutes a defect and to characterize its stability and evolution. Here, we demonstrate the power of unsupervised machine-learning techniques to systematically identify and compare defects in supramolecular polymer variants in different conditions, using as a benchmark 5 Å resolution coarse-grained molecular simulations of a family of supramolecular polymers. We show that this approach allows a complete data-driven characterization of the internal structure and dynamics of these complex assemblies and of the dynamic pathways for defects formation and resorption. This provides a useful, generally applicable approach to unambiguously identify defects in these dynamic self-assembled materials and to classify them based on their structure, stability, and dynamics.
Affiliation(s)
- Piero Gasparotto
- Laboratory of Computational Science and Modeling, Institute des Materiaux, Ecole polytechnique fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Thomas Young Centre and Department of Physics and Astronomy, University College London, Gower Street, London WC1E 6BT, United Kingdom
| | - Davide Bochicchio
- Department of Innovative Technologies, University of Applied Sciences and Arts of Southern Switzerland, Galleria 2, Via Cantonale 2c, CH-6928 Manno, Switzerland
| | - Michele Ceriotti
- Laboratory of Computational Science and Modeling, Institute des Materiaux, Ecole polytechnique fédérale de Lausanne, CH-1015 Lausanne, Switzerland
| | - Giovanni M Pavan
- Department of Innovative Technologies, University of Applied Sciences and Arts of Southern Switzerland, Galleria 2, Via Cantonale 2c, CH-6928 Manno, Switzerland
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
|
192
|
Khot A, Shiring SB, Savoie BM. Evidence of information limitations in coarse-grained models. J Chem Phys 2020; 151:244105. [PMID: 31893900 DOI: 10.1063/1.5129398] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 11/14/2022]
Abstract
Developing accurate coarse-grained (CG) models is critical for addressing long time and length scale phenomena with molecular simulations. Here, we distinguish and quantify two sources of error that are relevant to CG models in order to guide further methods development: "representability" errors, which result from the finite basis associated with the chosen functional form of the CG model and mapping operator, and "information" errors, which result from the limited kind and quantity of data supplied to the CG parameterization algorithm. We have performed a systematic investigation of these errors by generating all possible CG models of three liquids (butane, 1-butanol, and 1,3-propanediol) that conserve a set of chemically motivated locality and topology relationships. In turn, standard algorithms (iterative Boltzmann inversion, IBI, and multiscale coarse-graining, MSCG) were used to parameterize the models and the CG predictions were compared with atomistic results. For off-target properties, we observe a strong correlation between the accuracy and the resolution of the CG model, which suggests that the approximations represented by MSCG and IBI deteriorate with decreasing resolution. Conversely, on-target properties exhibit an extremely weak resolution dependence that suggests a limited role of representability errors in model accuracy. Taken together, these results suggest that simple CG models are capable of utilizing more information than is provided by standard parameterization algorithms, and that model accuracy can be improved by algorithm development rather than resorting to more complicated CG models.
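For readers unfamiliar with iterative Boltzmann inversion (IBI), one of the two parameterization algorithms named above, the standard update corrects the pair potential by the log-ratio of the current and target radial distribution functions, U_{n+1}(r) = U_n(r) + kT ln(g_n(r) / g_target(r)). The sketch below shows only this update rule; the function names, the small regularizer `eps`, and the toy arrays are assumptions for illustration, and a real workflow would recompute g_n(r) from a CG simulation at every iteration.

```python
import numpy as np


def ibi_initial_guess(g_target, kT=1.0, eps=1e-12):
    """Potential of mean force, U_0(r) = -kT ln g_target(r), used as the starting potential."""
    g_target = np.asarray(g_target, dtype=float)
    return -kT * np.log(g_target + eps)


def ibi_update(u_current, g_current, g_target, kT=1.0, eps=1e-12):
    """One IBI step: raise U where the CG model over-populates r, lower it where it under-populates."""
    g_current = np.asarray(g_current, dtype=float)
    g_target = np.asarray(g_target, dtype=float)
    # eps keeps the ratio finite where both RDFs vanish (inside the core).
    ratio = (g_current + eps) / (g_target + eps)
    return np.asarray(u_current, dtype=float) + kT * np.log(ratio)


# Toy RDF values on a coarse grid (illustrative numbers, not simulation data).
g_target = np.array([0.0, 0.5, 1.2, 1.0])
g_current = np.array([0.0, 0.7, 1.0, 1.0])
u0 = ibi_initial_guess(g_target)
u1 = ibi_update(u0, g_current, g_target)
```

When the current and target distributions agree, the correction vanishes, so the target pair distribution is a fixed point of the iteration.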
Affiliation(s)
- Aditi Khot
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, USA
| | - Stephen B Shiring
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, USA
| | - Brett M Savoie
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, USA
|
193
|
Sauceda HE, Chmiela S, Poltavsky I, Müller KR, Tkatchenko A. Construction of Machine Learned Force Fields with Quantum Chemical Accuracy: Applications and Chemical Insights. MACHINE LEARNING MEETS QUANTUM PHYSICS 2020. [DOI: 10.1007/978-3-030-40245-7_14] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/02/2023]
|
194
|
Accurate Molecular Dynamics Enabled by Efficient Physically Constrained Machine Learning Approaches. MACHINE LEARNING MEETS QUANTUM PHYSICS 2020. [DOI: 10.1007/978-3-030-40245-7_7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/03/2022]
|
195
|
Jackson NE, Bowen AS, de Pablo JJ. Efficient Multiscale Optoelectronic Prediction for Conjugated Polymers. Macromolecules 2019. [DOI: 10.1021/acs.macromol.9b02020] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 01/29/2023]
Affiliation(s)
- Nicholas E. Jackson
- Center for Molecular Engineering and Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | - Alec S. Bowen
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | - Juan J. de Pablo
- Center for Molecular Engineering and Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
|
196
|
Cova TFGG, Pais AACC. Deep Learning for Deep Chemistry: Optimizing the Prediction of Chemical Patterns. Front Chem 2019; 7:809. [PMID: 32039134 PMCID: PMC6988795 DOI: 10.3389/fchem.2019.00809] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 07/16/2019] [Accepted: 11/11/2019] [Indexed: 12/14/2022]
Abstract
Computational Chemistry is currently a synergistic assembly between ab initio calculations, simulation, machine learning (ML) and optimization strategies for describing, solving and predicting chemical data and related phenomena. These include accelerated literature searches, analysis and prediction of physical and quantum chemical properties, transition states, chemical structures, chemical reactions, and also new catalysts and drug candidates. The generalization of scalability to larger chemical problems, rather than specialization, is now the main principle for transforming chemical tasks in multiple fronts, for which systematic and cost-effective solutions have benefited from ML approaches, including those based on deep learning (e.g. quantum chemistry, molecular screening, synthetic route design, catalysis, drug discovery). The latter class of ML algorithms is capable of combining raw input into layers of intermediate features, enabling bench-to-bytes designs with the potential to transform several chemical domains. In this review, the most exciting developments concerning the use of ML in a range of different chemical scenarios are described. A range of different chemical problems and respective rationalization, that have hitherto been inaccessible due to the lack of suitable analysis tools, is thus detailed, evidencing the breadth of potential applications of these emerging multidimensional approaches. Focus is given to the models, algorithms and methods proposed to facilitate research on compound design and synthesis, materials design, prediction of binding, molecular activity, and soft matter behavior. 
The information produced by pairing Chemistry and ML, through data-driven analyses, neural network predictions and monitoring of chemical systems, allows (i) prompting the ability to understand the complexity of chemical data, (ii) streamlining and designing experiments, (iii) discovering new molecular targets and materials, and also (iv) planning or rethinking forthcoming chemical challenges. In fact, optimization engulfs all these tasks directly.
Affiliation(s)
- Tânia F. G. G. Cova
- Coimbra Chemistry Centre, CQC, Department of Chemistry, Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
| | - Alberto A. C. C. Pais
- Coimbra Chemistry Centre, CQC, Department of Chemistry, Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
|
197
|
In Silico Insights towards the Identification of NLRP3 Druggable Hot Spots. Int J Mol Sci 2019; 20:ijms20204974. [PMID: 31600880 PMCID: PMC6834175 DOI: 10.3390/ijms20204974] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 09/16/2019] [Revised: 09/30/2019] [Accepted: 10/07/2019] [Indexed: 11/25/2022]
Abstract
NLRP3 (NOD-like receptor family, pyrin domain-containing protein 3) activation has been linked to several chronic pathologies, including atherosclerosis, type-II diabetes, fibrosis, rheumatoid arthritis, and Alzheimer’s disease. Therefore, NLRP3 represents an appealing target for the development of innovative therapeutic approaches. A few companies are currently working on the discovery of selective modulators of NLRP3 inflammasome. Unfortunately, limited structural data are available for this target. To date, MCC950 represents one of the most promising noncovalent NLRP3 inhibitors. Recently, a possible region for the binding of MCC950 to the NLRP3 protein was described but no details were disclosed regarding the key interactions. In this communication, we present an in silico multiple approach as an insight useful for the design of novel NLRP3 inhibitors. In detail, combining different computational techniques, we propose consensus-retrieved protein residues that seem to be essential for the binding process and for the stabilization of the protein–ligand complex.
|
198
|
Durumeric AEP, Voth GA. Adversarial-residual-coarse-graining: Applying machine learning theory to systematic molecular coarse-graining. J Chem Phys 2019; 151:124110. [PMID: 31575201 DOI: 10.1063/1.5097559] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 01/06/2023]
Abstract
We utilize connections between molecular coarse-graining (CG) approaches and implicit generative models in machine learning to describe a new framework for systematic molecular CG. Focus is placed on the formalism encompassing generative adversarial networks. The resulting method enables a variety of model parameterization strategies, some of which show similarity to previous CG methods. We demonstrate that the resulting framework can rigorously parameterize CG models containing CG sites with no prescribed connection to the reference atomistic system (termed virtual sites); however, this advantage is offset by the lack of a closed-form expression for the CG Hamiltonian at the resolution obtained after integration over the virtual CG sites. Computational examples are provided for cases in which these methods ideally return identical parameters as relative entropy minimization CG but where traditional relative entropy minimization CG optimization equations are not applicable.
Affiliation(s)
- Aleksander E P Durumeric
- Department of Chemistry, James Franck Institute, Institute for Biophysical Dynamics, and Computation Institute, The University of Chicago, Chicago, Illinois 60637, USA
| | - Gregory A Voth
- Department of Chemistry, James Franck Institute, Institute for Biophysical Dynamics, and Computation Institute, The University of Chicago, Chicago, Illinois 60637, USA
|
199
|
Noé F, Olsson S, Köhler J, Wu H. Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning. Science 2019; 365:eaaw1147. [DOI: 10.1126/science.aaw1147] [Citation(s) in RCA: 205] [Impact Index Per Article: 41.0] [Received: 11/29/2018] [Accepted: 07/19/2019] [Indexed: 02/01/2023]
Abstract
Computing equilibrium states in condensed-matter many-body systems, such as solvated proteins, is a long-standing challenge. Lacking methods for generating statistically independent equilibrium samples in “one shot,” vast computational effort is invested for simulating these systems in small steps, e.g., using molecular dynamics. Combining deep learning and statistical mechanics, we developed Boltzmann generators, which are shown to generate unbiased one-shot equilibrium samples of representative condensed-matter systems and proteins. Boltzmann generators use neural networks to learn a coordinate transformation of the complex configurational equilibrium distribution to a distribution that can be easily sampled. Accurate computation of free-energy differences and discovery of new configurations are demonstrated, providing a statistical mechanics tool that can avoid rare events during sampling without prior knowledge of reaction coordinates.
|
200
|
Pervaje AK, Walker CC, Santiso EE. Molecular simulation of polymers with a SAFT-γ Mie approach. MOLECULAR SIMULATION 2019. [DOI: 10.1080/08927022.2019.1645331] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/18/2023]
Affiliation(s)
- Amulya K. Pervaje
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
| | - Christopher C. Walker
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
| | - Erik E. Santiso
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
|