1. Patel RA, Webb MA. Data-Driven Design of Polymer-Based Biomaterials: High-throughput Simulation, Experimentation, and Machine Learning. ACS Appl Bio Mater 2024; 7:510-527. [PMID: 36701125 DOI: 10.1021/acsabm.2c00962] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 01/27/2023]
Abstract
Polymers, with the capacity to tunably alter properties and response based on manipulation of their chemical characteristics, are attractive components in biomaterials. Nevertheless, their potential as functional materials is also inhibited by their complexity, which complicates rational or brute-force design and realization. In recent years, machine learning has emerged as a useful tool for facilitating materials design via efficient modeling of structure-property relationships in the chemical domain of interest. In this Spotlight, we discuss the emergence of data-driven design of polymers that can be deployed in biomaterials with particular emphasis on complex copolymer systems. We outline recent developments, as well as our own contributions and takeaways, related to high-throughput data generation for polymer systems, methods for surrogate modeling by machine learning, and paradigms for property optimization and design. Throughout this discussion, we highlight key aspects of successful strategies and other considerations that will be relevant to the future design of polymer-based biomaterials with target properties.
Affiliation(s)
- Roshan A Patel
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States
- Michael A Webb
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States
2. Christians LF, Halingstad EV, Kram E, Okolovitch EM, Pak AJ. Formalizing Coarse-Grained Representations of Anisotropic Interactions at Multimeric Protein Interfaces Using Virtual Sites. J Phys Chem B 2024; 128:1394-1406. [PMID: 38316012 DOI: 10.1021/acs.jpcb.3c07023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2024]
Abstract
Molecular simulations of biomacromolecules that assemble into multimeric complexes remain a challenge due to computationally inaccessible length and time scales. Low-resolution and implicit-solvent coarse-grained modeling approaches using traditional nonbonded interactions (both pairwise and spherically isotropic) have been able to partially address this gap. However, these models may fail to capture the complex anisotropic interactions present at macromolecular interfaces unless higher-order interaction potentials are incorporated at the expense of the computational cost. In this work, we introduce an alternate and systematic approach to represent directional interactions at protein-protein interfaces by using virtual sites restricted to pairwise interactions. We show that virtual site interaction parameters can be optimized within a relative entropy minimization framework by using only information from known statistics between coarse-grained sites. We compare our virtual site models to traditional coarse-grained models using two case studies of multimeric protein assemblies and find that the virtual site models predict pairwise correlations with higher fidelity and, more importantly, assembly behavior that is morphologically consistent with experiments. Our study underscores the importance of anisotropic interaction representations and paves the way for more accurate yet computationally efficient coarse-grained simulations of macromolecular assembly in future research.
Affiliation(s)
- Luc F Christians
- Department of Chemical and Biological Engineering, Colorado School of Mines, Golden, Colorado 80401, United States
- Ethan V Halingstad
- Department of Chemical and Biological Engineering, Colorado School of Mines, Golden, Colorado 80401, United States
- Emiel Kram
- Department of Chemical and Biological Engineering, Colorado School of Mines, Golden, Colorado 80401, United States
- Evan M Okolovitch
- Department of Chemical and Biological Engineering, Colorado School of Mines, Golden, Colorado 80401, United States
- Alexander J Pak
- Department of Chemical and Biological Engineering, Colorado School of Mines, Golden, Colorado 80401, United States
- Quantitative Biosciences and Engineering Program, Colorado School of Mines, Golden, Colorado 80401, United States
- Materials Science Program, Colorado School of Mines, Golden, Colorado 80401, United States
3. Kidder KM, Shell MS, Noid WG. Surveying the energy landscape of coarse-grained mappings. J Chem Phys 2024; 160:054105. [PMID: 38310476 DOI: 10.1063/5.0182524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Accepted: 12/28/2023] [Indexed: 02/05/2024] Open
Abstract
Simulations of soft materials often adopt low-resolution coarse-grained (CG) models. However, the CG representation is not unique and its impact upon simulated properties is poorly understood. In this work, we investigate the space of CG representations for ubiquitin, which is a typical globular protein with 72 amino acids. We employ Monte Carlo methods to ergodically sample this space and to characterize its landscape. By adopting the Gaussian network model as an analytically tractable atomistic model for equilibrium fluctuations, we exactly assess the intrinsic quality of each CG representation without introducing any approximations in sampling configurations or in modeling interactions. We focus on two metrics, the spectral quality and the information content, that quantify the extent to which the CG representation preserves low-frequency, large-amplitude motions and configurational information, respectively. The spectral quality and information content are weakly correlated among high-resolution representations but become strongly anticorrelated among low-resolution representations. Representations with maximal spectral quality appear consistent with physical intuition, while low-resolution representations with maximal information content do not. Interestingly, quenching studies indicate that the energy landscape of mapping space is very smooth and highly connected. Moreover, our study suggests a critical resolution below which a "phase transition" qualitatively distinguishes good and bad representations.
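The Gaussian network model named in this abstract is built on a connectivity (Kirchhoff) matrix over the sites of the structure. The following is a minimal sketch of that construction only, not the authors' workflow: the function name `kirchhoff_matrix`, the toy coordinates, and the cutoff of 7.0 (in the same length unit as the coordinates) are all illustrative assumptions.

```python
import math

def kirchhoff_matrix(coords, cutoff=7.0):
    """Build a Gaussian-network-model Kirchhoff matrix from site coordinates.

    Off-diagonal entries are -1 for pairs within `cutoff` and 0 otherwise;
    each diagonal entry counts the contacts of that site.
    """
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma

# Hypothetical toy chain: four sites spaced 3.8 apart along x,
# so only nearest neighbours fall inside the assumed cutoff.
toy_coords = [(3.8 * k, 0.0, 0.0) for k in range(4)]
gamma = kirchhoff_matrix(toy_coords)
```

Every row of the resulting matrix sums to zero, which reflects the translational invariance of the model; the equilibrium fluctuations discussed in the abstract would follow from the pseudo-inverse of this matrix, which the sketch does not compute.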
Affiliation(s)
- Katherine M Kidder
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- M Scott Shell
- Department of Chemical Engineering, University of California, Santa Barbara, California 93106, USA
- W G Noid
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
4. Maier JC, Wang CI, Jackson NE. Distilling coarse-grained representations of molecular electronic structure with continuously gated message passing. J Chem Phys 2024; 160:024109. [PMID: 38193551 DOI: 10.1063/5.0179253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Accepted: 12/14/2023] [Indexed: 01/10/2024] Open
Abstract
Bottom-up methods for coarse-grained (CG) molecular modeling are critically needed to establish rigorous links between atomistic reference data and reduced molecular representations. For a target molecule, the ideal reduced CG representation is a function of both the conformational ensemble of the system and the target physical observable(s) to be reproduced at the CG resolution. However, there is an absence of algorithms for selecting CG representations of molecules from which complex properties, including molecular electronic structure, can be accurately modeled. We introduce continuously gated message passing (CGMP), a graph neural network (GNN) method for atomically decomposing molecular electronic structure sampled over conformational ensembles. CGMP integrates 3D-invariant GNNs and a novel gated message passing system to continuously reduce the atomic degrees of freedom accessible for electronic predictions, resulting in a one-shot importance ranking of atoms contributing to a target molecular property. Moreover, CGMP provides the first approach by which to quantify the degeneracy of "good" CG representations conditioned on specific prediction targets, facilitating the development of more transferable CG representations. We further show how CGMP can be used to highlight multiatom correlations, illuminating a path to developing CG electronic Hamiltonians in terms of interpretable collective variables for arbitrarily complex molecules.
Affiliation(s)
- J Charlie Maier
- Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Chun-I Wang
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Nicholas E Jackson
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
5. Wu J, Xue W, Voth GA. K-Means Clustering Coarse-Graining (KMC-CG): A Next Generation Methodology for Determining Optimal Coarse-Grained Mappings of Large Biomolecules. J Chem Theory Comput 2023; 19:8987-8997. [PMID: 37957028 PMCID: PMC10720621 DOI: 10.1021/acs.jctc.3c01053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 10/25/2023] [Accepted: 10/27/2023] [Indexed: 11/21/2023]
Abstract
Coarse-grained (CG) molecular dynamics (MD) has become a method of choice for simulating various large scale biomolecular processes; therefore, the systematic definition of the CG mappings for biomolecules remains an important topic. Appropriate CG mappings can significantly enhance the representability of a CG model and improve its ability to capture critical features of large biomolecules. In this work, we present a systematic and more generalized method called K-means clustering coarse-graining (KMC-CG), which builds on the earlier approach of essential dynamics coarse-graining (ED-CG). KMC-CG removes the sequence-dependent constraints of ED-CG, allowing it to explore a more extensive space and thus enabling the discovery of more physically optimal CG mappings. Furthermore, the implementation of the K-means clustering algorithm can variationally optimize the CG mapping with efficiency and stability. This new method is tested in three cases: ATP-bound G-actin, the HIV-1 CA pentamer, and the Arp2/3 complex. In these examples, the CG models generated by KMC-CG are seen to better capture the structural, dynamic, and functional domains. KMC-CG therefore provides a robust and consistent approach to generating CG models of large biomolecules that can then be more accurately parametrized by either bottom-up or top-down CG force fields.
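To illustrate the general idea of assigning atoms to CG sites by clustering, the sketch below runs plain Lloyd's k-means on static coordinates. This is a swapped-in simplification, not the KMC-CG objective described in the abstract (which builds on essential dynamics coarse-graining); the function name `kmeans_mapping`, the deterministic initialisation, and the toy coordinates are assumptions made for the example.

```python
import math

def kmeans_mapping(coords, k, n_iter=50):
    """Assign each atom to one of k CG sites with plain Lloyd's k-means.

    Returns (labels, centers): labels[i] is the CG site index of atom i.
    """
    # Deterministic initialisation (an assumption of this sketch):
    # pick k atoms evenly spaced along the atom index.
    step = len(coords) / k
    centers = [tuple(coords[int(i * step)]) for i in range(k)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest center for every atom.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in coords
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(coords, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster in place
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return labels, centers

# Hypothetical toy "molecule": two well-separated groups of three atoms.
toy_atoms = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 0, 0), (11, 0, 0), (10, 1, 0)]
labels, centers = kmeans_mapping(toy_atoms, k=2)
```

Unlike a sequence-based partition, nothing here forces atoms in the same CG site to be contiguous in the atom index, which mirrors the removal of sequence-dependent constraints mentioned in the abstract.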
Affiliation(s)
- Gregory A. Voth
- Department of Chemistry, Chicago Center for Theoretical Chemistry, The James Franck Institute, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637, United States
6. Lederer J, Gastegger M, Schütt KT, Kampffmeyer M, Müller KR, Unke OT. Automatic identification of chemical moieties. Phys Chem Chem Phys 2023; 25:26370-26379. [PMID: 37750554 PMCID: PMC10548786 DOI: 10.1039/d3cp03845a] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/11/2023] [Accepted: 08/18/2023] [Indexed: 09/27/2023]
Abstract
In recent years, the prediction of quantum mechanical observables with machine learning methods has become increasingly popular. Message-passing neural networks (MPNNs) solve this task by constructing atomic representations, from which the properties of interest are predicted. Here, we introduce a method to automatically identify chemical moieties (molecular building blocks) from such representations, enabling a variety of applications beyond property prediction, which otherwise rely on expert knowledge. The required representation can either be provided by a pretrained MPNN, or be learned from scratch using only structural information. Beyond the data-driven design of molecular fingerprints, the versatility of our approach is demonstrated by enabling the selection of representative entries in chemical databases, the automatic construction of coarse-grained force fields, as well as the identification of reaction coordinates.
Affiliation(s)
- Jonas Lederer
- Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Michael Gastegger
- Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Kristof T Schütt
- Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Michael Kampffmeyer
- Department of Physics and Technology, UiT The Arctic University of Norway, 9019 Tromsø, Norway
- Klaus-Robert Müller
- Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Google Deepmind, Germany
- Department of Artificial Intelligence, Korea University, Seoul 136-713, Korea
- Max Planck Institut für Informatik, 66123 Saarbrücken, Germany
- Oliver T Unke
- Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Google Deepmind, Germany
7. Schneider L, de Pablo JJ. Entanglements via Slip Springs with Soft, Coarse-Grained Models for Systems Having Explicit Liquid-Vapor Interfaces. Macromolecules 2023; 56:7445-7453. [PMID: 37781215 PMCID: PMC10538480 DOI: 10.1021/acs.macromol.3c00960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 08/07/2023] [Indexed: 10/03/2023]
Abstract
Recent advances in nano-rheology require that new techniques and models be developed to precisely describe the equilibrium and non-equilibrium characteristics of entangled polymeric materials and their interfaces at a molecular level. In this study, a slip-spring (SLSP) model is proposed to capture the dynamics of entangled polymers at interfaces, including those between liquids, liquids and vapors, and liquids and solids. The SLSP model employs a highly coarse-grained approach, which allows for comprehensive simulations of entire nano-rheological characterization systems using a particle-level description. The model relies on many-body dissipative particle dynamics (MDPD) non-bonded interactions, which permit explicit description of liquid-vapor interfaces; a compensating potential is introduced to ensure an unbiased representation of the shape of the liquid-vapor interface within the SLSP model. The usefulness of the proposed MDPD + SLSP approach is illustrated by simulating a capillary breakup rheometer (CaBR) experiment, in which a liquid droplet splits into two segments under the influence of capillary forces. We find that the predictions of the MDPD + SLSP model are consistent with experimental measurements and theoretical predictions. The proposed model is also verified by comparison to the results of explicit molecular dynamics simulations of an entangled polymer melt using a Kremer-Grest chain representation, both at equilibrium and far from equilibrium. Taken together, the model and methods presented in this study provide a reliable framework for molecular-level interpretation of high-polymer dynamics in the presence of interfaces.
Affiliation(s)
- Ludwig Schneider
- Pritzker School of Molecular Engineering, University of Chicago, 5740 S. Ellis Avenue, Chicago, Illinois 60637-1403, United States
- Juan J. de Pablo
- Pritzker School of Molecular Engineering, University of Chicago, 5740 S. Ellis Avenue, Chicago, Illinois 60637-1403, United States
- Argonne National Laboratory, 9700 S. Cass Avenue, Lemont, IL 60439, United States
8. Mahajan S, Tang T. Automated Parameterization of Coarse-Grained Polyethylenimine under a Martini Framework. J Chem Inf Model 2023; 63:4328-4341. [PMID: 37424081 DOI: 10.1021/acs.jcim.3c00103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
As a versatile polymer in many applications, synthesized polyethylenimine (PEI) is polydisperse with diverse branched structures that attain pH-dependent protonation states. Understanding the structure-function relationship of PEI is necessary for enhancing its efficacy in various applications. Coarse-grained (CG) simulations can be performed at length and time scales directly comparable with experimental data while maintaining the molecular perspective. However, manually developing CG forcefields for complex PEI structures is time-consuming and prone to human errors. This article presents a fully automated algorithm that can coarse-grain any branched architecture of PEI from its all-atom (AA) simulation trajectories and topology. The algorithm is demonstrated by coarse-graining a branched 2 kDa PEI, which can replicate the AA diffusion coefficient, radius of gyration, and end-to-end distance of the longest linear chain. Commercially available 25 and 2 kDa Millipore-Sigma PEIs are used for experimental validation. Specifically, branched PEI architectures are proposed, coarse-grained using the automated algorithm, and then simulated at different mass concentrations. The CG PEIs can reproduce existing experimental data on PEI's diffusion coefficient and Stokes-Einstein radius at infinite dilution as well as its intrinsic viscosity. This suggests a strategy where probable chemical structures of synthetic PEIs can be inferred computationally using the developed algorithm. The coarse-graining methodology presented here can also be extended to other polymers.
Affiliation(s)
- Subhamoy Mahajan
- Department of Mechanical Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
- Tian Tang
- Department of Mechanical Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
9. Izvekov S, Rice BM. Hierarchical Machine Learning of Low-Resolution Coarse-Grained Free Energy Potentials. J Chem Theory Comput 2023. [PMID: 37256918 DOI: 10.1021/acs.jctc.3c00128] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/02/2023]
Abstract
A force-matching-based method for supervised machine learning (ML) of coarse-grained (CG) free energy (FE) potentials, known as multiscale coarse-graining via force-matching (MSCG/FM), is an efficient method to develop microscopically informed CG models that are thermodynamically and statistically equivalent to the reference microscopic models. For low-resolution models, when the coarse-graining is at supramolecular scales, objective-oriented clustering of nonbonded particles is required and the reduced description becomes a function of the clustering algorithm. In the present work, we explore the dependence of the ML of the CG Helmholtz FE potential on the clustering algorithm. We consider coarse-graining based on partitional (k-means, leading to Voronoi diagram) and hierarchical agglomerative (bottom-up) clustering algorithms common in unsupervised ML and develop theory connecting the MSCG/FM learned CG Helmholtz potential and the clustering statistics. By combining the agglomerative clustering and the MSCG/FM learning in a recursive manner, we propose an efficient ML methodology to develop the fine-to-low resolution hierarchies of the CG models. The methodology does not suffer from degrading accuracy or increased computational cost to construct larger hierarchies and as such does not impose an upper size limitation of the CG particles resulting from the extended hierarchies. The utility of the methodology is demonstrated by obtaining the bottom-up agglomerative hierarchy for liquid nitromethane from all-atom molecular dynamics (MD) simulations. For agglomerative hierarchies, we prove the existence of renormalization group transformations that indicate self-similarity and allow for learning the low-resolution MSCG/FM potentials at low computational cost by rescaling and renormalizing certain finer-resolution members of the hierarchy. The hierarchies of the CG models can be used to carry out simulations under constant-pressure conditions.
Affiliation(s)
- Sergei Izvekov
- U.S. Army DEVCOM Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States
- Betsy M Rice
- U.S. Army DEVCOM Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States
10. Pei HW, Zhu YL, Lu ZY, Li JP, Sun ZY. Automatic Multiscale Method of Building up a Cross-linked Polymer Reaction System: Bridging SMILES to the Multiscale Molecular Dynamics Simulation. J Phys Chem B 2023. [PMID: 37200472 DOI: 10.1021/acs.jpcb.3c01555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/20/2023]
Abstract
An automatic method is introduced to generate the initial configuration and input file from SMILES for multiscale molecular dynamics (MD) simulation of cross-linked polymer reaction systems. Inputs are a modified version of SMILES of all the components and conditions of coarse-grained (CG) and all-atom (AA) simulations. The overall process comprises the following steps: (1) Modified SMILES inputs of all the components are converted to 3-dimensional coordinates of molecular structures. (2) Molecular structures are mapped to the coarse-grained scale, followed by a CG reaction simulation. (3) CG beads are backmapped to the atomic scale after the CG reaction. (4) An AA production run is finally performed to analyze volume shrinkage, glass transition, and atomic detail of network structure. The method is applied to two common epoxy resin reactions, that is, the cross-linking process of DGEVA (diglycidyl ether of vanillyl alcohol) and DHAVA (dihydroxyaminopropane of vanillyl alcohol) and that of DGEBA (diglycidyl ether of bisphenol A) and DETA (diethylenetriamine). These components form network structures after the CG cross-linking reaction and are then backmapped to calculate properties at the atomic scale. The results demonstrate that the method can accurately predict volume shrinkage, glass transition, and all-atom structure of cross-linked polymers. The method bridges from SMILES to MD simulation trajectories in an automatic way, which shortens the time needed to build a cross-linked polymer reaction model and makes the method suitable for high-throughput computations.
Affiliation(s)
- Han-Wen Pei
- State Key Laboratory of Polymer Physics and Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- School of Applied Chemistry and Engineering, University of Science and Technology of China, Hefei 230026, People's Republic of China
- You-Liang Zhu
- College of Chemistry, Jilin University, Changchun 130012, People's Republic of China
- Zhong-Yuan Lu
- College of Chemistry, Jilin University, Changchun 130012, People's Republic of China
- Jun-Peng Li
- State Key Laboratory of Advanced Technologies for Comprehensive Utilization of Platinum Metals, Sino-Platinum Metals Company, Limited, Kunming 650106, People's Republic of China
- Zhao-Yan Sun
- State Key Laboratory of Polymer Physics and Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- School of Applied Chemistry and Engineering, University of Science and Technology of China, Hefei 230026, People's Republic of China
11. Bhat V, Callaway CP, Risko C. Computational Approaches for Organic Semiconductors: From Chemical and Physical Understanding to Predicting New Materials. Chem Rev 2023. [PMID: 37141497 DOI: 10.1021/acs.chemrev.2c00704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/06/2023]
Abstract
While a complete understanding of organic semiconductor (OSC) design principles remains elusive, computational methods, ranging from techniques based in classical and quantum mechanics to more recent data-enabled models, can complement experimental observations and provide deep physicochemical insights into OSC structure-processing-property relationships, offering new capabilities for in silico OSC discovery and design. In this Review, we trace the evolution of these computational methods and their application to OSCs, beginning with early quantum-chemical methods to investigate resonance in benzene and building to recent machine-learning (ML) techniques and their application to ever more sophisticated OSC scientific and engineering challenges. Along the way, we highlight the limitations of the methods and how sophisticated physical and mathematical frameworks have been created to overcome those limitations. We illustrate applications of these methods to a range of specific challenges in OSCs derived from π-conjugated polymers and molecules, including predicting charge-carrier transport, modeling chain conformations and bulk morphology, estimating thermomechanical properties, and describing phonons and thermal transport, to name a few. Through these examples, we demonstrate how advances in computational methods accelerate the deployment of OSCs in wide-ranging technologies, such as organic photovoltaics (OPVs), organic light-emitting diodes (OLEDs), organic thermoelectrics, organic batteries, and organic (bio)sensors. We conclude by providing an outlook for the future development of computational techniques to discover and assess the properties of high-performing OSCs with greater accuracy.
Affiliation(s)
- Vinayak Bhat
- Department of Chemistry & Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506-0055, United States
- Connor P Callaway
- Department of Chemistry & Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506-0055, United States
- Chad Risko
- Department of Chemistry & Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506-0055, United States
12. Krämer A, Durumeric AEP, Charron NE, Chen Y, Clementi C, Noé F. Statistically Optimal Force Aggregation for Coarse-Graining Molecular Dynamics. J Phys Chem Lett 2023; 14:3970-3979. [PMID: 37079800 DOI: 10.1021/acs.jpclett.3c00444] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 05/03/2023]
Abstract
Machine-learned coarse-grained (CG) models have the potential for simulating large molecular complexes beyond what is possible with atomistic molecular dynamics. However, training accurate CG models remains a challenge. A widely used methodology for learning bottom-up CG force fields maps forces from all-atom molecular dynamics to the CG representation and matches them with a CG force field on average. We show that there is flexibility in how to map all-atom forces to the CG representation and that the most commonly used mapping methods are statistically inefficient and potentially even incorrect in the presence of constraints in the all-atom simulation. We define an optimization statement for force mappings and demonstrate that substantially improved CG force fields can be learned from the same simulation data when using optimized force maps. The method is demonstrated on the miniproteins chignolin and tryptophan cage and published as open-source code.
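As a concrete picture of what "mapping all-atom forces to the CG representation" can mean, the sketch below applies one simple linear force map: each bead receives the sum of the forces on the atoms assigned to it. It assumes a hypothetical partition in which every atom belongs to exactly one bead, and it does not reproduce the optimized force maps proposed in the paper; the function name `aggregate_forces` and the toy numbers are illustrative.

```python
def aggregate_forces(atom_forces, bead_of_atom, n_beads):
    """Sum all-atom forces into per-bead forces.

    atom_forces  : list of (fx, fy, fz) tuples, one per atom
    bead_of_atom : list of bead indices, one per atom (assumed partition)
    n_beads      : number of CG beads
    """
    bead_forces = [[0.0, 0.0, 0.0] for _ in range(n_beads)]
    for force, bead in zip(atom_forces, bead_of_atom):
        for axis in range(3):
            bead_forces[bead][axis] += force[axis]
    return bead_forces

# Hypothetical frame: four atoms grouped into two beads.
atom_forces = [(1.0, 0.0, 0.0), (0.5, -1.0, 0.0), (0.0, 2.0, 1.0), (-1.0, 0.0, 0.0)]
bead_of_atom = [0, 0, 1, 1]
bead_forces = aggregate_forces(atom_forces, bead_of_atom, n_beads=2)
```

In a force-matching setup, mapped forces such as `bead_forces` would serve as the regression targets for the CG force field; the abstract's point is that other valid choices of map can give lower-variance targets from the same simulation data.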
Affiliation(s)
- Andreas Krämer
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Aleksander E P Durumeric
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Nicholas E Charron
- Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77251, United States
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Yaoyi Chen
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- International Max Planck Research School for Biology and Computation (IMPRS-BAC), Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Cecilia Clementi
- Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77251, United States
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Microsoft Research AI4Science, Karl-Liebknecht Straße 32, 10178 Berlin, Germany
13. Chennakesavalu S, Toomer DJ, Rotskoff GM. Ensuring thermodynamic consistency with invertible coarse-graining. J Chem Phys 2023; 158:124126. [PMID: 37003724 DOI: 10.1063/5.0141888] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 04/03/2023] Open
Abstract
Coarse-grained models are a core computational tool in theoretical chemistry and biophysics. A judicious choice of a coarse-grained model can yield physical insights by isolating the essential degrees of freedom that dictate the thermodynamic properties of a complex, condensed-phase system. The reduced complexity of the model typically leads to lower computational costs and more efficient sampling compared with atomistic models. Designing "good" coarse-grained models is an art. Generally, the mapping from fine-grained configurations to coarse-grained configurations itself is not optimized in any way; instead, the energy function associated with the mapped configurations is. In this work, we explore the consequences of optimizing the coarse-grained representation alongside its potential energy function. We use a graph machine learning framework to embed atomic configurations into a low-dimensional space to produce efficient representations of the original molecular system. Because the representation we obtain is no longer directly interpretable as a real-space representation of the atomic coordinates, we also introduce an inversion process and an associated thermodynamic consistency relation that allows us to rigorously sample fine-grained configurations conditioned on the coarse-grained sampling. We show that this technique is robust, recovering the first two moments of the distribution of several observables in proteins such as chignolin and alanine dipeptide.
Affiliation(s)
- David J Toomer
  - Department of Chemistry, Stanford University, Stanford, California 94305, USA
- Grant M Rotskoff
  - Department of Chemistry, Stanford University, Stanford, California 94305, USA
14
Ricci E, Vergadou N. Integrating Machine Learning in the Coarse-Grained Molecular Simulation of Polymers. J Phys Chem B 2023; 127:2302-2322. [PMID: 36888553 DOI: 10.1021/acs.jpcb.2c06354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2023]
Abstract
Machine learning (ML) is having an increasing impact on the physical sciences, engineering, and technology and its integration into molecular simulation frameworks holds great potential to expand their scope of applicability to complex materials and facilitate fundamental knowledge and reliable property predictions, contributing to the development of efficient materials design routes. The application of ML in materials informatics in general, and polymer informatics in particular, has led to interesting results, however great untapped potential lies in the integration of ML techniques into the multiscale molecular simulation methods for the study of macromolecular systems, specifically in the context of Coarse Grained (CG) simulations. In this Perspective, we aim at presenting the pioneering recent research efforts in this direction and discussing how these new ML-based techniques can contribute to critical aspects of the development of multiscale molecular simulation methods for bulk complex chemical systems, especially polymers. Prerequisites for the implementation of such ML-integrated methods and open challenges that need to be met toward the development of general systematic ML-based coarse graining schemes for polymers are discussed.
Affiliation(s)
- Eleonora Ricci
  - Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
  - Institute of Informatics and Telecommunications, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
- Niki Vergadou
  - Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
15
Yang W, Templeton C, Rosenberger D, Bittracher A, Nüske F, Noé F, Clementi C. Slicing and Dicing: Optimal Coarse-Grained Representation to Preserve Molecular Kinetics. ACS CENTRAL SCIENCE 2023; 9:186-196. [PMID: 36844497 PMCID: PMC9951291 DOI: 10.1021/acscentsci.2c01200] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/10/2022] [Indexed: 05/05/2023]
Abstract
The aim of molecular coarse-graining approaches is to recover relevant physical properties of the molecular system via a lower-resolution model that can be more efficiently simulated. Ideally, the lower resolution still accounts for the degrees of freedom necessary to recover the correct physical behavior. The selection of these degrees of freedom has often relied on the scientist's chemical and physical intuition. In this article, we make the argument that in soft matter contexts desirable coarse-grained models accurately reproduce the long-time dynamics of a system by correctly capturing the rare-event transitions. We propose a bottom-up coarse-graining scheme that correctly preserves the relevant slow degrees of freedom, and we test this idea for three systems of increasing complexity. We show that in contrast to this method existing coarse-graining schemes such as those from information theory or structure-based approaches are not able to recapitulate the slow time scales of the system.
Affiliation(s)
- Wangfei Yang
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Graduate Program in Systems, Synthetic and Physical Biology, Rice University, Houston, Texas 77005, United States
- Clark Templeton
  - Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- David Rosenberger
  - Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Andreas Bittracher
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Feliks Nüske
  - Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstrasse 1, 39106 Magdeburg, Germany
- Frank Noé
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Cecilia Clementi
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Department of Physics, Rice University, Houston, Texas 77005, United States
16
Jin J, Pak AJ, Durumeric AEP, Loose TD, Voth GA. Bottom-up Coarse-Graining: Principles and Perspectives. J Chem Theory Comput 2022; 18:5759-5791. [PMID: 36070494 PMCID: PMC9558379 DOI: 10.1021/acs.jctc.2c00643] [Citation(s) in RCA: 72] [Impact Index Per Article: 36.0] [Received: 06/20/2022] [Indexed: 01/14/2023]
Abstract
Large-scale computational molecular models provide scientists a means to investigate the effect of microscopic details on emergent mesoscopic behavior. Elucidating the relationship between variations on the molecular scale and macroscopic observable properties facilitates an understanding of the molecular interactions driving the properties of real world materials and complex systems (e.g., those found in biology, chemistry, and materials science). As a result, discovering an explicit, systematic connection between microscopic nature and emergent mesoscopic behavior is a fundamental goal for this type of investigation. The molecular forces critical to driving the behavior of complex heterogeneous systems are often unclear. More problematically, simulations of representative model systems are often prohibitively expensive from both spatial and temporal perspectives, impeding straightforward investigations over possible hypotheses characterizing molecular behavior. While the reduction in resolution of a study, such as moving from an atomistic simulation to that of the resolution of large coarse-grained (CG) groups of atoms, can partially ameliorate the cost of individual simulations, the relationship between the proposed microscopic details and this intermediate resolution is nontrivial and presents new obstacles to study. Small portions of these complex systems can be realistically simulated. Alone, these smaller simulations likely do not provide insight into collectively emergent behavior. However, by proposing that the driving forces in both smaller and larger systems (containing many related copies of the smaller system) have an explicit connection, systematic bottom-up CG techniques can be used to transfer CG hypotheses discovered using a smaller scale system to a larger system of primary interest. 
The proposed connection between different CG systems is prescribed by (i) the CG representation (mapping) and (ii) the functional form and parameters used to represent the CG energetics, which approximate potentials of mean force (PMFs). As a result, the design of CG methods that facilitate a variety of physically relevant representations, approximations, and force fields is critical to moving the frontier of systematic CG forward. Crucially, the proposed connection between the system used for parametrization and the system of interest is orthogonal to the optimization used to approximate the potential of mean force present in all systematic CG methods. The empirical efficacy of machine learning techniques on a variety of tasks provides strong motivation to consider these approaches for approximating the PMF and analyzing these approximations.
Affiliation(s)
- Jaehyeok Jin
  - Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
- Alexander J. Pak
  - Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
- Aleksander E. P. Durumeric
  - Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
- Timothy D. Loose
  - Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
- Gregory A. Voth
  - Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
17
Liang H, Yoshimoto K, Kitabata M, Yamamoto U, de Pablo JJ. Multiscale rheology model for entangled Nylon 6 melts. JOURNAL OF POLYMER SCIENCE 2022. [DOI: 10.1002/pol.20220434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Affiliation(s)
- Heyi Liang
  - Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois, USA
- Kenji Yoshimoto
  - Advanced Materials Research Laboratories, Toray Industries Inc., Otsu, Shiga, Japan
- Masahiro Kitabata
  - Advanced Materials Research Laboratories, Toray Industries Inc., Otsu, Shiga, Japan
- Umi Yamamoto
  - Advanced Materials Research Laboratories, Toray Industries Inc., Otsu, Shiga, Japan
- Juan J. de Pablo
  - Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois, USA
18
Hollborn KU, Schneider L, Müller M. Effect of Slip-Spring Parameters on the Dynamics and Rheology of Soft, Coarse-Grained Polymer Models. J Phys Chem B 2022; 126:6725-6739. [PMID: 36037428 DOI: 10.1021/acs.jpcb.2c03983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Highly coarse-grained (hCG) linear polymer models allow for accessing long time and length scales by dissipative particle dynamics (DPD). This top-down strategy exploits the universal equilibrium behavior of long, flexible macromolecules by accounting only for the relevant interactions, such as molecular connectivity, and by parametrizing their strength via coarse-grained invariants, such as the mean-squared end-to-end distance. The description of the dynamics of long, entangled polymers, however, poses a challenge because (i) the noncrossability of the molecular backbones is not enforced by the soft interactions of an hCG model and (ii) the rheology involves multiple time and length scales, such as the Rouse-like dynamics on short scales and the reptation dynamics on long scales. One popular technique to effectively mimic the effect of entanglements in linear polymer melts via hCG models is slip-springs, and quantitative agreement with simulations that explicitly account for the noncrossability of molecular contours, experiments, and theoretical predictions has been achieved by identifying the time, length, and energy scales of the hCG model and adjusting the number of slip-springs per macromolecule. In the present work, we study how the spatial extent and the mobility of slip-springs affect the dynamics and discuss their implications in the choice of the degree of coarse-graining in computationally efficient hCG models.
Affiliation(s)
- Kai-Uwe Hollborn
  - Institute for Theoretical Physics, Georg-August Universität Göttingen, Friedrich-Hund-Platz 1, 37077 Göttingen, Germany
- Ludwig Schneider
  - Institute for Theoretical Physics, Georg-August Universität Göttingen, Friedrich-Hund-Platz 1, 37077 Göttingen, Germany
  - Pritzker School of Molecular Engineering, University of Chicago, 5640 Ellis Avenue, Chicago, Illinois 60637, United States
- Marcus Müller
  - Institute for Theoretical Physics, Georg-August Universität Göttingen, Friedrich-Hund-Platz 1, 37077 Göttingen, Germany
19
Liang H, Yoshimoto K, Gil P, Kitabata M, Yamamoto U, de Pablo JJ. Bottom-Up Multiscale Approach to Estimate Viscoelastic Properties of Entangled Polymer Melts with High Glass Transition Temperature. Macromolecules 2022. [DOI: 10.1021/acs.macromol.1c02044] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Affiliation(s)
- Heyi Liang
  - Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois 60637, United States
- Kenji Yoshimoto
  - Toray Industries Inc., 3-2-1 Sonoyama, Otsu, Shiga 520-0842, Japan
- Phwey Gil
  - Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois 60637, United States
- Umi Yamamoto
  - Toray Industries Inc., 3-2-1 Sonoyama, Otsu, Shiga 520-0842, Japan
- Juan J. de Pablo
  - Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois 60637, United States
20
Nguyen D, Tao L, Li Y. Integration of Machine Learning and Coarse-Grained Molecular Simulations for Polymer Materials: Physical Understandings and Molecular Design. Front Chem 2022; 9:820417. [PMID: 35141207 PMCID: PMC8819075 DOI: 10.3389/fchem.2021.820417] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/24/2021] [Accepted: 12/31/2021] [Indexed: 12/21/2022] Open
Abstract
In recent years, the synthesis of monomer sequence-defined polymers has expanded into broad-spectrum applications in biomedical, chemical, and materials science fields. Pursuing the characterization and inverse design of these polymer systems requires our fundamental understanding not only at the individual monomer level, but also considering the chain scales, such as polymer configuration, self-assembly, and phase separation. However, our accessibility to this field is still rudimentary due to the limitations of traditional design approaches, the complexity of chemical space along with the burdened cost and time issues that prevent us from unveiling the underlying monomer sequence-structure-property relationships. Fortunately, thanks to the recent advancements in molecular dynamics simulations and machine learning (ML) algorithms, the bottlenecks in the tasks of establishing the structure-function correlation of the polymer chains can be overcome. In this review, we will discuss the applications of the integration between ML techniques and coarse-grained molecular dynamics (CGMD) simulations to solve the current issues in polymer science at the chain level. In particular, we focus on the case studies in three important topics—polymeric configuration characterization, feed-forward property prediction, and inverse design—in which CGMD simulations are leveraged to generate training datasets to develop ML-based surrogate models for specific polymer systems and designs. By doing so, this computational hybridization allows us to well establish the monomer sequence-functional behavior relationship of the polymers as well as guide us toward the best polymer chain candidates for the inverse design in undiscovered chemical space with reasonable computational cost and time. 
Even though there are still limitations and challenges ahead in this field, we finally conclude that this CGMD/ML integration is very promising, not only in the attempt of bridging the monomeric and macroscopic characterizations of polymer materials, but also enabling further tailored designs for sequence-specific polymers with superior properties in many practical applications.
Affiliation(s)
- Danh Nguyen
  - Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
- Lei Tao
  - Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
- Ying Li
  - Department of Mechanical Engineering, University of Connecticut, Mansfield, CT, United States
  - Polymer Program, Institute of Materials Science, University of Connecticut, Mansfield, CT, United States
21
Abstract
Polymer science is one of the few fundamental research fields where the results can be transferred into real-life products almost immediately. Industries need collaborations with the best researchers (universities or national laboratories) to elevate the field and favor the development of new materials, which will boost the chemical and materials business economy and ensure that innovative and sustainable polymer products are constantly being brought to the market. The mechanisms to ensure a seamless and fruitful collaboration are numerous, but few approaches really manage to incorporate the full range of polymer research from a molecular understanding to a macroscopic control of properties. We review some of the main components of standard industry-academia collaborations and propose to develop polymer open centers that put the business development objective as the starting point of the collaboration and allow those to gather and focus on different scientific fields toward a common objective.
22
Sivaraman G, Jackson NE. Coarse-Grained Density Functional Theory Predictions via Deep Kernel Learning. J Chem Theory Comput 2022; 18:1129-1141. [PMID: 35020388 DOI: 10.1021/acs.jctc.1c01001] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 12/19/2022]
Abstract
Scalable electronic predictions are critical for soft materials design. Recently, the Electronic Coarse-Graining (ECG) method was introduced to renormalize all-atom quantum chemical (QC) predictions to coarse-grained (CG) resolutions using deep neural networks (DNNs). While DNNs can learn complex representations that prove challenging for kernel-based methods, they are susceptible to overfitting and the overconfidence of uncertainty estimations. Here, we develop ECG within a GPU-accelerated Deep Kernel Learning (DKL) framework to enable CG QC predictions using range-separated hybrid density functional theory (DFT), obtaining a 10^7 speedup relative to naive all-atom QC. By treating the predicted electronic properties as random Gaussian Processes, DKL incorporates CG mapping degeneracy by learning the distribution of electronic energies as a function of CG configuration. DKL-ECG accurately reproduces molecular orbital energies from range-separated DFT while facilitating efficient training via active learning using the uncertainties provided by DKL. We show that while active learning algorithms enable efficient sampling of a more diverse configurational space relative to random sampling, all explored query methods exhibit comparable performance for the examined system. We attribute this result to the significant overlap of the feature space and output property distributions across multiple temperatures.
Affiliation(s)
- Ganesh Sivaraman
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Nicholas E Jackson
  - Department of Chemistry, University of Illinois at Urbana-Champaign, 505 South Mathews Avenue, Urbana, Illinois 61801, United States
23
Duong VT, Diessner EM, Grazioli G, Martin RW, Butts CT. Neural Upscaling from Residue-Level Protein Structure Networks to Atomistic Structures. Biomolecules 2021; 11:1788. [PMID: 34944432 PMCID: PMC8698800 DOI: 10.3390/biom11121788] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/30/2021] [Revised: 11/11/2021] [Accepted: 11/19/2021] [Indexed: 01/01/2023] Open
Abstract
Coarse-graining is a powerful tool for extending the reach of dynamic models of proteins and other biological macromolecules. Topological coarse-graining, in which biomolecules or sets thereof are represented via graph structures, is a particularly useful way of obtaining highly compressed representations of molecular structures, and simulations operating via such representations can achieve substantial computational savings. A drawback of coarse-graining, however, is the loss of atomistic detail—an effect that is especially acute for topological representations such as protein structure networks (PSNs). Here, we introduce an approach based on a combination of machine learning and physically-guided refinement for inferring atomic coordinates from PSNs. This “neural upscaling” procedure exploits the constraints implied by PSNs on possible configurations, as well as differences in the likelihood of observing different configurations with the same PSN. Using a 1 μs atomistic molecular dynamics trajectory of Aβ1–40, we show that neural upscaling is able to effectively recapitulate detailed structural information for intrinsically disordered proteins, being particularly successful in recovering features such as transient secondary structure. These results suggest that scalable network-based models for protein structure and dynamics may be used in settings where atomistic detail is desired, with upscaling employed to impute atomic coordinates from PSNs.
Affiliation(s)
- Vy T. Duong
  - Department of Chemistry, University of California, Irvine, CA 92697, USA
- Elizabeth M. Diessner
  - Department of Chemistry, University of California, Irvine, CA 92697, USA
- Gianmarc Grazioli
  - Department of Chemistry, San Jose State University, San Jose, CA 95192, USA
- Rachel W. Martin
  - Department of Chemistry, University of California, Irvine, CA 92697, USA
  - Department of Molecular Biology & Biochemistry, University of California, Irvine, CA 92697, USA
- Carter T. Butts
  - Departments of Sociology, Statistics and Electrical Engineering & Computer Science, University of California, Irvine, CA 92697, USA
24
Dhamankar S, Webb MA. Chemically specific coarse‐graining of polymers: Methods and prospects. JOURNAL OF POLYMER SCIENCE 2021. [DOI: 10.1002/pol.20210555] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/14/2022]
Affiliation(s)
- Satyen Dhamankar
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey, USA
- Michael A. Webb
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey, USA
25
Potter T, Barrett EL, Miller MA. Automated Coarse-Grained Mapping Algorithm for the Martini Force Field and Benchmarks for Membrane-Water Partitioning. J Chem Theory Comput 2021; 17:5777-5791. [PMID: 34472843 PMCID: PMC8444346 DOI: 10.1021/acs.jctc.1c00322] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/01/2021] [Indexed: 01/08/2023]
Abstract
With a view to high-throughput simulations, we present an automated system for mapping and parameterizing organic molecules for use with the coarse-grained Martini force field. The method scales to larger molecules and a broader chemical space than existing schemes. The core of the mapping process is a graph-based analysis of the molecule's bonding network, which has the advantages of being fast, general, and preserving symmetry. The parameterization process pays special attention to coarse-grained beads in aromatic rings. It also includes a method for building efficient and stable frameworks of constraints for molecules with structural rigidity. The performance of the method is tested on a diverse set of 87 neutral organic molecules and the ability of the resulting models to capture octanol-water and membrane-water partition coefficients. In the latter case, we introduce an adaptive method for extracting partition coefficients from free-energy profiles to take into account the interfacial region of the membrane. We also use the models to probe the response of membrane-water partitioning to the cholesterol content of the membrane.
Affiliation(s)
- Thomas D. Potter
  - Department of Chemistry, Durham University, South Road, Durham DH1 3LE, United Kingdom
- Elin L. Barrett
  - Unilever Safety and Environmental Assurance Centre, Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, United Kingdom
- Mark A. Miller
  - Department of Chemistry, Durham University, South Road, Durham DH1 3LE, United Kingdom
26
Fite S, Nitecki O, Gross Z. Custom Tokenization Dictionary, CUSTODI: A General, Fast, and Reversible Data-Driven Representation and Regressor. J Chem Inf Model 2021; 61:3285-3291. [PMID: 34180231 DOI: 10.1021/acs.jcim.1c00563] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/20/2023]
Abstract
Custom tokenization dictionary (CUSTODI) is introduced as a novel way for tackling the problem of molecular representations, and especially the challenge of molecular property prediction. Herein, the motivational theory and the actual representation and model are presented and shown to have performance that is in line with benchmark methodologies. The uniqueness of CUSTODI is its applicability on small training sets and the developed theory suggests its possible use for a-priori estimation of future fit quality on any given dataset, regardless of the method used for fitting.
Affiliation(s)
- Shachar Fite
  - Schulich Faculty of Chemistry, Technion-Israel Institute of Technology, Haifa 32000, Israel
- Omri Nitecki
  - Schulich Faculty of Chemistry, Technion-Israel Institute of Technology, Haifa 32000, Israel
- Zeev Gross
  - Schulich Faculty of Chemistry, Technion-Israel Institute of Technology, Haifa 32000, Israel
27
Cao X, Tian P. "Dividing and Conquering" and "Caching" in Molecular Modeling. Int J Mol Sci 2021; 22:5053. [PMID: 34068835 PMCID: PMC8126232 DOI: 10.3390/ijms22095053] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/03/2021] [Revised: 04/26/2021] [Accepted: 04/27/2021] [Indexed: 11/17/2022] Open
Abstract
Molecular modeling is widely utilized in subjects including but not limited to physics, chemistry, biology, materials science and engineering. Impressive progress has been made in development of theories, algorithms and software packages. To divide and conquer, and to cache intermediate results have been long standing principles in development of algorithms. Not surprisingly, most important methodological advancements in more than half century of molecular modeling are various implementations of these two fundamental principles. In the mainstream classical computational molecular science, tremendous efforts have been invested on two lines of algorithm development. The first is coarse graining, which is to represent multiple basic particles in higher resolution modeling as a single larger and softer particle in lower resolution counterpart, with resulting force fields of partial transferability at the expense of some information loss. The second is enhanced sampling, which realizes "dividing and conquering" and/or "caching" in configurational space with focus either on reaction coordinates and collective variables as in metadynamics and related algorithms, or on the transition matrix and state discretization as in Markov state models. For this line of algorithms, spatial resolution is maintained but results are not transferable. Deep learning has been utilized to realize more efficient and accurate ways of "dividing and conquering" and "caching" along these two lines of algorithmic research. We proposed and demonstrated the local free energy landscape approach, a new framework for classical computational molecular science. This framework is based on a third class of algorithm that facilitates molecular modeling through partially transferable in resolution "caching" of distributions for local clusters of molecular degrees of freedom. 
Differences, connections and potential interactions among these three algorithmic directions are discussed, with the hope to stimulate development of more elegant, efficient and reliable formulations and algorithms for "dividing and conquering" and "caching" in complex molecular systems.
Affiliation(s)
- Xiaoyong Cao
  - School of Life Sciences, Jilin University, Changchun 130012, China
- Pu Tian
  - School of Life Sciences, Jilin University, Changchun 130012, China
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China
28
Errica F, Giulini M, Bacciu D, Menichetti R, Micheli A, Potestio R. A Deep Graph Network-Enhanced Sampling Approach to Efficiently Explore the Space of Reduced Representations of Proteins. Front Mol Biosci 2021; 8:637396. [PMID: 33996896 PMCID: PMC8116519 DOI: 10.3389/fmolb.2021.637396] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/03/2020] [Accepted: 02/17/2021] [Indexed: 12/12/2022] Open
Abstract
The limits of molecular dynamics (MD) simulations of macromolecules are steadily pushed forward by the relentless development of computer architectures and algorithms. The consequent explosion in the number and extent of MD trajectories induces the need for automated methods to rationalize the raw data and make quantitative sense of them. Recently, an algorithmic approach was introduced by some of us to identify the subset of a protein's atoms, or mapping, that enables the most informative description of the system. This method relies on the computation, for a given reduced representation, of the associated mapping entropy, that is, a measure of the information loss due to such simplification; albeit relatively straightforward, this calculation can be time-consuming. Here, we describe the implementation of a deep learning approach aimed at accelerating the calculation of the mapping entropy. We rely on Deep Graph Networks, which provide extreme flexibility in handling structured input data and whose predictions prove to be accurate and remarkably efficient. The trained network produces a speedup factor as large as 10^5 with respect to the algorithmic computation of the mapping entropy, enabling the reconstruction of its landscape by means of the Wang-Landau sampling scheme. Applications of this method reach much further than this, as the proposed pipeline is easily transferable to the computation of arbitrary properties of a molecular structure.
Affiliation(s)
- Federico Errica
- Department of Computer Science, University of Pisa, Pisa, Italy
| | - Marco Giulini
- Physics Department, University of Trento, Trento, Italy
- INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, Trento, Italy
| | - Davide Bacciu
- Department of Computer Science, University of Pisa, Pisa, Italy
| | - Roberto Menichetti
- Physics Department, University of Trento, Trento, Italy
- INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, Trento, Italy
| | - Alessio Micheli
- Department of Computer Science, University of Pisa, Pisa, Italy
| | - Raffaello Potestio
- Physics Department, University of Trento, Trento, Italy
- INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, Trento, Italy
| |
|
29
|
Upadhya R, Kosuri S, Tamasi M, Meyer TA, Atta S, Webb MA, Gormley AJ. Automation and data-driven design of polymer therapeutics. Adv Drug Deliv Rev 2021; 171:1-28. [PMID: 33242537 PMCID: PMC8127395 DOI: 10.1016/j.addr.2020.11.009] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 09/30/2020] [Revised: 11/10/2020] [Accepted: 11/12/2020] [Indexed: 01/01/2023]
Abstract
Polymers are uniquely suited for drug delivery and biomaterial applications due to tunable structural parameters such as length, composition, architecture, and valency. To facilitate designs, researchers may explore combinatorial libraries in a high throughput fashion to correlate structure to function. However, traditional polymerization reactions including controlled living radical polymerization (CLRP) and ring-opening polymerization (ROP) require inert reaction conditions and extensive expertise to implement. With the advent of air-tolerance and automation, several polymerization techniques are now compatible with well plates and can be carried out at the benchtop, making high throughput synthesis and high throughput screening (HTS) possible. To avoid HTS pitfalls often described as "fishing expeditions," it is crucial to employ intelligent and big data approaches to maximize experimental efficiency. This is where the disruptive technologies of machine learning (ML) and artificial intelligence (AI) will likely play a role. In fact, ML and AI are already impacting small molecule drug discovery and showing signs of emerging in drug delivery. In this review, we present state-of-the-art research in drug delivery, gene delivery, antimicrobial polymers, and bioactive polymers alongside data-driven developments in drug design and organic synthesis. From this insight, important lessons are revealed for the polymer therapeutics community including the value of a closed loop design-build-test-learn workflow. This is an exciting time as researchers will gain the ability to fully explore the polymer structural landscape and establish quantitative structure-property relationships (QSPRs) with biological significance.
Affiliation(s)
| | | | | | | | - Supriya Atta
- Rutgers, The State University of New Jersey, USA
| | - Michael A Webb
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08540, USA
| | | |
|
30
|
Ye H, Xian W, Li Y. Machine Learning of Coarse-Grained Models for Organic Molecules and Polymers: Progress, Opportunities, and Challenges. ACS OMEGA 2021; 6:1758-1772. [PMID: 33521417 PMCID: PMC7841771 DOI: 10.1021/acsomega.0c05321] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 10/31/2020] [Accepted: 01/04/2021] [Indexed: 05/02/2023]
Abstract
Machine learning (ML) has emerged as one of the most powerful tools transforming all areas of science and engineering. The nature of molecular dynamics (MD) simulations, complex and time-consuming calculations, makes them particularly suitable for ML research. This review article focuses on recent advancements in developing efficient and accurate coarse-grained (CG) models using various ML methods, in terms of regulating the coarse-graining process, constructing adequate descriptors/features, generating representative training data sets, and optimization of the loss function. Two classes of the CG models are introduced: bottom-up and top-down CG methods. To illustrate these methods and demonstrate the open methodological questions, we survey several important principles in constructing CG models and how these are incorporated into ML methods and improved with specific learning techniques. Finally, we discuss some key aspects of developing machine-learned CG models with high accuracy and efficiency. Besides, we describe how these aspects are tackled in state-of-the-art methods and which remain to be addressed in the near future. We expect that these machine-learned CG models can address thermodynamic consistent, transferable, and representative issues in classical CG models.
Affiliation(s)
- Huilin Ye
- Department of Mechanical Engineering, University of Connecticut, Storrs, Connecticut 06269, United States
| | - Weikang Xian
- Department of Mechanical Engineering, University of Connecticut, Storrs, Connecticut 06269, United States
| | - Ying Li
- Department of Mechanical Engineering, University of Connecticut, Storrs, Connecticut 06269, United States
- Polymer Program, Institute of Materials Science, University of Connecticut, Storrs, Connecticut 06269, United States
- E-mail: . Phone: +1 860 4867110. Fax: +1 860 4865088
| |
|
31
|
Abstract
Four decades of molecular theory and computation have helped form the modern understanding of the physical chemistry of organic semiconductors. Whereas these efforts have historically centered around characterizations of electronic structure at the single-molecule or dimer scale, emerging trends in noncrystalline molecular and polymeric semiconductors are motivating the need for modeling techniques capable of morphological and electronic structure predictions at the mesoscale. Provided the challenges associated with these prediction tasks, the community has begun to evolve a computational toolkit for organic semiconductors incorporating techniques from the fields of soft matter, coarse-graining, and machine learning. Here, we highlight recent advances in coarse-grained methodologies aimed at the multiscale characterization of noncrystalline organic semiconductors. As organic semiconductor performance is dependent on the interplay of mesoscale morphology and molecular electronic structure, specific emphasis is placed on coarse-grained modeling approaches capable of both structural and electronic predictions without recourse to all-atom representations.
Affiliation(s)
- Nicholas E Jackson
- Department of Chemistry, University of Illinois, Urbana-Champaign, Urbana, Illinois 61801, United States
| |
|
32
|
Menichetti R, Giulini M, Potestio R. A journey through mapping space: characterising the statistical and metric properties of reduced representations of macromolecules. THE EUROPEAN PHYSICAL JOURNAL. B 2021; 94:204. [PMID: 34720709 PMCID: PMC8550479 DOI: 10.1140/epjb/s10051-021-00205-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/15/2021] [Accepted: 09/13/2021] [Indexed: 05/04/2023]
Abstract
A mapping of a macromolecule is a prescription to construct a simplified representation of the system in which only a subset of its constituent atoms is retained. As the specific choice of the mapping affects the analysis of all-atom simulations as well as the construction of coarse-grained models, the characterisation of the mapping space has recently attracted increasing attention. We here introduce a notion of scalar product and distance between reduced representations, which allows the study of the metric and topological properties of their space in a quantitative manner. Making use of a Wang-Landau enhanced sampling algorithm, we exhaustively explore such space, and examine the qualitative features of mappings in terms of their squared norm. A one-to-one correspondence with an interacting lattice gas on a finite volume leads to the emergence of discontinuous phase transitions in mapping space, which mark the boundaries between qualitatively different reduced representations of the same molecule.
Affiliation(s)
- Roberto Menichetti
- Physics Department, University of Trento, via Sommarive, 14, 38123 Trento, Italy
- INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, via Sommarive, 14, 38123 Trento, Italy
| | - Marco Giulini
- Physics Department, University of Trento, via Sommarive, 14, 38123 Trento, Italy
- INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, via Sommarive, 14, 38123 Trento, Italy
| | - Raffaello Potestio
- Physics Department, University of Trento, via Sommarive, 14, 38123 Trento, Italy
- INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, via Sommarive, 14, 38123 Trento, Italy
| |
|
33
|
Zuo YY, Uspal WE, Wei T. Airborne Transmission of COVID-19: Aerosol Dispersion, Lung Deposition, and Virus-Receptor Interactions. ACS NANO 2020; 14:16502-16524. [PMID: 33236896 PMCID: PMC7724984 DOI: 10.1021/acsnano.0c08484] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 10/11/2020] [Accepted: 11/19/2020] [Indexed: 05/02/2023]
Abstract
Coronavirus disease 2019 (COVID-19), due to infection by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is now causing a global pandemic. Aerosol transmission of COVID-19, although plausible, has not been confirmed by the World Health Organization (WHO) as a general transmission route. Considering the rapid spread of SARS-CoV-2, especially nosocomial outbreaks and other superspreading events, there is an urgent need to study the possibility of airborne transmission and its impact on the lung, the primary body organ attacked by the virus. Here, we review the complete pathway of airborne transmission of SARS-CoV-2 from aerosol dispersion in air to subsequent biological uptake after inhalation. In particular, we first review the aerodynamic and colloidal mechanisms by which aerosols disperse and transmit in air and deposit onto surfaces. We then review the fundamental mechanisms that govern regional deposition of micro- and nanoparticles in the lung. Focus is given to biophysical interactions between particles and the pulmonary surfactant film, the initial alveolar-capillary barrier and first-line host defense system against inhaled particles and pathogens. Finally, we summarize the current understanding about the structural dynamics of the SARS-CoV-2 spike protein and its interactions with receptors at the atomistic and molecular scales, primarily as revealed by molecular dynamics simulations. This review provides urgent and multidisciplinary knowledge toward understanding the airborne transmission of SARS-CoV-2 and its health impact on the respiratory system.
Affiliation(s)
- Yi Y. Zuo
- Department of Mechanical Engineering, University of Hawaii at Manoa, Honolulu, Hawaii 96822, United States
- Department of Pediatrics, John A. Burns School of Medicine, University of Hawaii, Honolulu, Hawaii 96826, United States
| | - William E. Uspal
- Department of Mechanical Engineering, University of Hawaii at Manoa, Honolulu, Hawaii 96822, United States
| | - Tao Wei
- Chemical Engineering Department, Howard University, Washington, DC 20059, United States
| |
|
34
|
Giulini M, Menichetti R, Shell MS, Potestio R. An Information-Theory-Based Approach for Optimal Model Reduction of Biomolecules. J Chem Theory Comput 2020; 16:6795-6813. [PMID: 33108737 PMCID: PMC7659038 DOI: 10.1021/acs.jctc.0c00676] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 06/29/2020] [Indexed: 02/06/2023]
Abstract
In theoretical modeling of a physical system, a crucial step consists of the identification of those degrees of freedom that enable a synthetic yet informative representation of it. While in some cases this selection can be carried out on the basis of intuition and experience, straightforward discrimination of the important features from the negligible ones is difficult for many complex systems, most notably heteropolymers and large biomolecules. We here present a thermodynamics-based theoretical framework to gauge the effectiveness of a given simplified representation by measuring its information content. We employ this method to identify those reduced descriptions of proteins, in terms of a subset of their atoms, that retain the largest amount of information from the original model; we show that these highly informative representations share common features that are intrinsically related to the biological properties of the proteins under examination, thereby establishing a bridge between protein structure, energetics, and function.
Affiliation(s)
- Marco Giulini
- Physics Department, University of Trento, via Sommarive 14, I-38123 Trento, Italy
- INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, I-38123 Trento, Italy
| | - Roberto Menichetti
- Physics Department, University of Trento, via Sommarive 14, I-38123 Trento, Italy
- INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, I-38123 Trento, Italy
| | - M Scott Shell
- Department of Chemical Engineering, University of California Santa Barbara, Santa Barbara, California 93106, United States
| | - Raffaello Potestio
- Physics Department, University of Trento, via Sommarive 14, I-38123 Trento, Italy
- INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, I-38123 Trento, Italy
| |
|
35
|
Bedolla E, Padierna LC, Castañeda-Priego R. Machine learning for condensed matter physics. JOURNAL OF PHYSICS. CONDENSED MATTER : AN INSTITUTE OF PHYSICS JOURNAL 2020; 33:053001. [PMID: 32932243 DOI: 10.1088/1361-648x/abb895] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 05/30/2020] [Accepted: 09/15/2020] [Indexed: 06/11/2023]
Abstract
Condensed matter physics (CMP) seeks to understand the microscopic interactions of matter at the quantum and atomistic levels, and describes how these interactions result in both mesoscopic and macroscopic properties. CMP overlaps with many other important branches of science, such as chemistry, materials science, statistical physics, and high-performance computing. With the advancements in modern machine learning (ML) technology, a keen interest in applying these algorithms to further CMP research has created a compelling new area of research at the intersection of both fields. In this review, we aim to explore the main areas within CMP, which have successfully applied ML techniques to further research, such as the description and use of ML schemes for potential energy surfaces, the characterization of topological phases of matter in lattice systems, the prediction of phase transitions in off-lattice and atomistic simulations, the interpretation of ML theories with physics-inspired frameworks and the enhancement of simulation methods with ML algorithms. We also discuss in detail the main challenges and drawbacks of using ML methods on CMP problems, as well as some perspectives for future developments.
Affiliation(s)
- Edwin Bedolla
- División de Ciencias e Ingenierías, Universidad de Guanajuato, Loma del Bosque 103, 37150 León, Mexico
| | - Luis Carlos Padierna
- División de Ciencias e Ingenierías, Universidad de Guanajuato, Loma del Bosque 103, 37150 León, Mexico
| | - Ramón Castañeda-Priego
- División de Ciencias e Ingenierías, Universidad de Guanajuato, Loma del Bosque 103, 37150 León, Mexico
| |
|
36
|
Webb MA, Jackson NE, Gil PS, de Pablo JJ. Targeted sequence design within the coarse-grained polymer genome. SCIENCE ADVANCES 2020; 6:eabc6216. [PMID: 33087352 PMCID: PMC7577717 DOI: 10.1126/sciadv.abc6216] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 05/04/2020] [Accepted: 09/02/2020] [Indexed: 05/05/2023]
Abstract
The chemical design of polymers with target structural and/or functional properties represents a grand challenge in materials science. While data-driven design approaches are promising, success with polymers has been limited, largely due to limitations in data availability. Here, we demonstrate the targeted sequence design of single-chain structure in polymers by combining coarse-grained modeling, machine learning, and model optimization. Nearly 2000 unique coarse-grained polymers are simulated to construct and analyze machine learning models. We find that deep neural networks inexpensively and reliably predict structural properties with limited sequence information as input. By coupling trained ML models with sequential model-based optimization, polymer sequences are proposed to exhibit globular, swollen, or rod-like behaviors, which are verified by explicit simulations. This work highlights the promising integration of coarse-grained modeling with data-driven design and represents a necessary and crucial step toward more complex polymer design efforts.
Affiliation(s)
- Michael A Webb
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60615, USA
| | - Nicholas E Jackson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60615, USA
- Center for Molecular Engineering and Materials Science Division, Argonne National Laboratory, Lemont, IL 60439, USA
| | - Phwey S Gil
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60615, USA
| | - Juan J de Pablo
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60615, USA
- Center for Molecular Engineering and Materials Science Division, Argonne National Laboratory, Lemont, IL 60439, USA
| |
|
37
|
Foley TT, Kidder KM, Shell MS, Noid WG. Exploring the landscape of model representations. Proc Natl Acad Sci U S A 2020; 117:24061-24068. [PMID: 32929015 PMCID: PMC7533877 DOI: 10.1073/pnas.2000098117] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 11/18/2022]
Abstract
The success of any physical model critically depends upon adopting an appropriate representation for the phenomenon of interest. Unfortunately, it remains generally challenging to identify the essential degrees of freedom or, equivalently, the proper order parameters for describing complex phenomena. Here we develop a statistical physics framework for exploring and quantitatively characterizing the space of order parameters for representing physical systems. Specifically, we examine the space of low-resolution representations that correspond to particle-based coarse-grained (CG) models for a simple microscopic model of protein fluctuations. We employ Monte Carlo (MC) methods to sample this space and determine the density of states for CG representations as a function of their ability to preserve the configurational information, I, and large-scale fluctuations, Q, of the microscopic model. These two metrics are uncorrelated in high-resolution representations but become anticorrelated at lower resolutions. Moreover, our MC simulations suggest an emergent length scale for coarse-graining proteins, as well as a qualitative distinction between good and bad representations of proteins. Finally, we relate our work to recent approaches for clustering graphs and detecting communities in networks.
Affiliation(s)
- Thomas T Foley
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802
- Department of Physics, The Pennsylvania State University, University Park, PA 16802
| | - Katherine M Kidder
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802
| | - M Scott Shell
- Department of Chemical Engineering, University of California, Santa Barbara, CA 93106
| | - W G Noid
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802
| |
|
38
|
Li Z, Wellawatte GP, Chakraborty M, Gandhi HA, Xu C, White AD. Graph neural network based coarse-grained mapping prediction. Chem Sci 2020; 11:9524-9531. [PMID: 34123175 PMCID: PMC8161155 DOI: 10.1039/d0sc02458a] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 01/25/2023]
Abstract
The selection of coarse-grained (CG) mapping operators is a critical step for CG molecular dynamics (MD) simulation. What is optimal for this choice remains an open question, and there is a need for theory. The current state-of-the-art method is mapping operators manually selected by experts. In this work, we demonstrate an automated approach by viewing this problem as supervised learning where we seek to reproduce the mapping operators produced by experts. We present a graph neural network based CG mapping predictor called Deep Supervised Graph Partitioning Model (DSGPM) that treats mapping operators as a graph segmentation problem. DSGPM is trained on a novel dataset, Human-annotated Mappings (HAM), consisting of 1180 molecules with expert annotated mapping operators. HAM can be used to facilitate further research in this area. Our model uses a novel metric learning objective to produce high-quality atomic features that are used in spectral clustering. The results show that the DSGPM outperforms state-of-the-art methods in the field of graph segmentation. Finally, we find that predicted CG mapping operators indeed result in good CG MD models when used in simulation. We propose a scalable graph neural network-based method for automating coarse-grained mapping prediction for molecules.
Affiliation(s)
- Zhiheng Li
- Department of Computer Science, University of Rochester USA
| | | | | | - Heta A Gandhi
- Department of Chemical Engineering, University of Rochester USA
| | - Chenliang Xu
- Department of Computer Science, University of Rochester USA
| | - Andrew D White
- Department of Chemical Engineering, University of Rochester USA
| |
|
39
|
Khot A, Shiring SB, Savoie BM. Evidence of information limitations in coarse-grained models. J Chem Phys 2020; 151:244105. [PMID: 31893900 DOI: 10.1063/1.5129398] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 11/14/2022]
Abstract
Developing accurate coarse-grained (CG) models is critical for addressing long time and length scale phenomena with molecular simulations. Here, we distinguish and quantify two sources of error that are relevant to CG models in order to guide further methods development: "representability" errors, which result from the finite basis associated with the chosen functional form of the CG model and mapping operator, and "information" errors, which result from the limited kind and quantity of data supplied to the CG parameterization algorithm. We have performed a systematic investigation of these errors by generating all possible CG models of three liquids (butane, 1-butanol, and 1,3-propanediol) that conserve a set of chemically motivated locality and topology relationships. In turn, standard algorithms (iterative Boltzmann inversion, IBI, and multiscale coarse-graining, MSCG) were used to parameterize the models and the CG predictions were compared with atomistic results. For off-target properties, we observe a strong correlation between the accuracy and the resolution of the CG model, which suggests that the approximations represented by MSCG and IBI deteriorate with decreasing resolution. Conversely, on-target properties exhibit an extremely weak resolution dependence that suggests a limited role of representability errors in model accuracy. Taken together, these results suggest that simple CG models are capable of utilizing more information than is provided by standard parameterization algorithms, and that model accuracy can be improved by algorithm development rather than resorting to more complicated CG models.
Affiliation(s)
- Aditi Khot
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, USA
| | - Stephen B Shiring
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, USA
| | - Brett M Savoie
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, USA
| |
|
40
|
Chakraborty M, Xu J, White AD. Is preservation of symmetry necessary for coarse-graining? Phys Chem Chem Phys 2020; 22:14998-15005. [DOI: 10.1039/d0cp02309d] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 01/08/2023]
Abstract
This work investigates if preserving the symmetry of the underlying molecular graph of a given molecule when choosing a coarse-grained (CG) mapping significantly affects the CG model accuracy.
Affiliation(s)
| | - Jinyu Xu
- Department of Chemical Engineering, University of Rochester, Rochester, USA
| | - Andrew D. White
- Department of Chemical Engineering, University of Rochester, Rochester, USA
| |
|
41
|
Wan M, Song J, Li W, Gao L, Fang W. Development of Coarse‐Grained Force Field by Combining Multilinear Interpolation Technique and Simplex Algorithm. J Comput Chem 2019; 41:814-829. [DOI: 10.1002/jcc.26131] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/20/2019] [Revised: 11/07/2019] [Accepted: 12/05/2019] [Indexed: 12/23/2022]
Affiliation(s)
- Mingwei Wan
- Key Laboratory of Theoretical and Computational Photochemistry, Ministry of Education, College of Chemistry, Beijing Normal University, 19 Xin‐Jie‐Kou‐Wai Street, Beijing 100875, China
- Institution of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, China
| | - Junjie Song
- Key Laboratory of Theoretical and Computational Photochemistry, Ministry of Education, College of Chemistry, Beijing Normal University, 19 Xin‐Jie‐Kou‐Wai Street, Beijing 100875, China
| | - Wenli Li
- Key Laboratory of Theoretical and Computational Photochemistry, Ministry of Education, College of Chemistry, Beijing Normal University, 19 Xin‐Jie‐Kou‐Wai Street, Beijing 100875, China
| | - Lianghui Gao
- Key Laboratory of Theoretical and Computational Photochemistry, Ministry of Education, College of Chemistry, Beijing Normal University, 19 Xin‐Jie‐Kou‐Wai Street, Beijing 100875, China
| | - Weihai Fang
- Key Laboratory of Theoretical and Computational Photochemistry, Ministry of Education, College of Chemistry, Beijing Normal University, 19 Xin‐Jie‐Kou‐Wai Street, Beijing 100875, China
- Institution of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, China
| |
|
42
|
Pireddu G, Pazzona FG, Demontis P, Załuska-Kotur MA. Scaling-Up Simulations of Diffusion in Microporous Materials. J Chem Theory Comput 2019; 15:6931-6943. [PMID: 31604017 DOI: 10.1021/acs.jctc.9b00801] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
Abstract
We introduce and demonstrate the coarse-graining of static and dynamical properties of host-guest systems constituted by methane in two different microporous materials. The reference systems are mapped to occupancy-based pore-scale lattice models. Each coarse-grained model is equipped with an appropriate coarse-grained potential and a local dynamical operator, which represents the probability of interpore molecular jumps between different cages. Coarse-grained thermodynamics and dynamics are both defined based on small-scale atomistic simulations of the reference systems. We considered two host materials: the widely studied ITQ-29 zeolite and the LTA-zeolite-templated carbon, which was recently theorized. Our method allows for representing with satisfactory accuracy and a considerably reduced computational effort the reference systems while providing new interesting physical insights in terms of static and diffusive properties.
Affiliation(s)
- Giovanni Pireddu
- Dipartimento di Chimica e Farmacia, Università degli Studi di Sassari, Via Vienna 2, 01700 Sassari, Italy
- Institute of Physics, Polish Academy of Sciences, Al. Lotników 32/46, 02-668 Warsaw, Poland
| | - Federico G Pazzona
- Dipartimento di Chimica e Farmacia, Università degli Studi di Sassari, Via Vienna 2, 01700 Sassari, Italy
| | - Pierfranco Demontis
- Dipartimento di Chimica e Farmacia, Università degli Studi di Sassari, Via Vienna 2, 01700 Sassari, Italy
| | | |
|
43
|
Cao Y, Li X, Xiong J, Wang L, Yan LT, Ge J. Investigating the origin of high efficiency in confined multienzyme catalysis. NANOSCALE 2019; 11:22108-22117. [PMID: 31720641 DOI: 10.1039/c9nr07381g] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 06/10/2023]
Abstract
Biomimetic strategies have successfully been applied to confine multiple enzymes on scaffolds to obtain higher catalytic efficiency of enzyme cascades than freely distributed enzymes. However, the origin of high efficiency is poorly understood. We developed a coarse-grained, particle-based model to understand the origin of high efficiency. We found that a reaction intermediate is the key in affecting reaction kinetics. In the case of unstable intermediates, the confinement of multiple enzymes in clusters enhanced the catalytic efficiency and a shorter distance between enzymes resulted in a higher reaction rate and yield. This understanding was verified by co-encapsulating multiple enzymes in metal-organic framework (MOF) nanocrystals as artificially confined multienzyme complexes. The activity enhancement of multiple enzymes in MOFs depended on the distance between enzymes, when the decay of intermediates existed. The finding of this study is useful for designing in vitro synthetic biology systems based on artificial multienzyme complexes.
Affiliation(s)
- Yufei Cao
- Key Lab for Industrial Biocatalysis, Ministry of Education, Department of Chemical Engineering, Tsinghua University, Beijing 100084, China
| | | | | | | | | | | |
|
44
|
Rosenberger D, van der Vegt NFA. Relative entropy indicates an ideal concentration for structure-based coarse graining of binary mixtures. Phys Rev E 2019; 99:053308. [PMID: 31212527 DOI: 10.1103/physreve.99.053308] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/31/2019] [Indexed: 06/09/2023]
Abstract
Many methodological approaches have been proposed to improve systematic or bottom-up coarse-graining techniques to enhance the representability and transferability of the derived interaction potentials. Transferability describes the ability of a coarse-grained (CG) model to be predictive, i.e., to describe a system at state points different from those chosen for parametrization, whereas representability characterizes the accuracy of a CG model in reproducing target properties of the underlying reference or fine-grained model at a given state point. In this article, we shift the focus away from methodological aspects and instead ask whether we can overcome the disadvantages of a given method in terms of representability and transferability by systematically selecting the state point at which the CG model is parametrized. We answer this question by applying the inverse Monte Carlo (IMC) approach, a structure-based coarse-graining method, to derive effective interactions for binary mixtures of simple Lennard-Jones (LJ) particles, which are different in size. For such simple systems we indeed can identify a concentration where the derived potentials show the best performance in terms of structural representability and transferability. This specific concentration is identified by computing the relative entropy, which quantifies the information loss between different IMC models and the reference LJ model at varying mixture compositions. Further, we show that an IMC model for mixtures of n-hexane and n-perfluorohexane shows the same trend in transferability as the IMC models for the LJ system. All derived models are more transferable in the direction of increasing concentration of the larger-sized compound.
Affiliation(s)
- David Rosenberger
  - Eduard Zintl Institut für Anorganische und Physikalische Chemie, Technische Universität Darmstadt, Darmstadt, 64287, Germany
- Nico F A van der Vegt
  - Eduard Zintl Institut für Anorganische und Physikalische Chemie, Technische Universität Darmstadt, Darmstadt, 64287, Germany
|
45
|
Jackson NE, Bowen AS, Antony LW, Webb MA, Vishwanath V, de Pablo JJ. Electronic structure at coarse-grained resolutions from supervised machine learning. SCIENCE ADVANCES 2019; 5:eaav1190. [PMID: 30915396 PMCID: PMC6430626 DOI: 10.1126/sciadv.aav1190] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 08/15/2018] [Accepted: 02/04/2019] [Indexed: 05/21/2023]
Abstract
Computational studies aimed at understanding conformationally dependent electronic structure in soft materials require a combination of classical and quantum-mechanical simulations, for which the sampling of conformational space can be particularly demanding. Coarse-grained (CG) models provide a means of accessing relevant time scales, but CG configurations must be back-mapped into atomistic representations to perform quantum-chemical calculations, which is computationally intensive and inconsistent with the spatial resolution of the CG models. A machine learning approach, denoted as artificial neural network electronic coarse graining (ANN-ECG), is presented here in which the conformationally dependent electronic structure of a molecule is mapped directly to CG pseudo-atom configurations. By averaging over decimated degrees of freedom, ANN-ECG accelerates simulations by eliminating backmapping and repeated quantum-chemical calculations. The approach is accurate, consistent with the CG spatial resolution, and can be used to identify computationally optimal CG resolutions.
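The phrase "averaging over decimated degrees of freedom" can be illustrated with a small sketch. A regressor trained with a least-squares loss on (CG configuration, electronic property) pairs tends toward the conditional mean of the property over all atomistic configurations that share a CG configuration. The code below is not the ANN-ECG implementation: a simple binned average stands in for the neural network, and the function name `cg_conditional_mean`, the one-dimensional CG coordinate, and the sample values are hypothetical.

```python
from collections import defaultdict

def cg_conditional_mean(samples, bin_width):
    """Average a property over all atomistic samples that fall in the same CG bin.

    samples: iterable of (cg_coordinate, property_value) pairs, where each pair
    comes from one atomistic configuration mapped to a 1D CG coordinate.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cg_coord, value in samples:
        key = int(cg_coord // bin_width)  # index of the CG bin
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical samples (assumed for illustration): several atomistic
# configurations map to the same CG bin but carry different property values.
samples = [(0.1, 1.0), (0.2, 3.0), (1.1, 5.0), (1.4, 7.0), (1.9, 9.0)]
means = cg_conditional_mean(samples, bin_width=1.0)
print(means)
```

Each bin's mean is the single value a CG-resolution model can assign to that configuration, which is why no back-mapping is needed at prediction time.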
Affiliation(s)
- Nicholas E. Jackson
  - Institute for Molecular Engineering, Argonne National Laboratory, Lemont, IL 60439, USA
  - Institute for Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
- Alec S. Bowen
  - Institute for Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
- Lucas W. Antony
  - Institute for Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
- Michael A. Webb
  - Institute for Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
- Venkatram Vishwanath
  - Argonne Leadership Computing Facility, Argonne National Laboratory, Lemont, IL 60439, USA
- Juan J. de Pablo
  - Institute for Molecular Engineering, Argonne National Laboratory, Lemont, IL 60439, USA
  - Institute for Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
  - Corresponding author.
|
46
|
Jackson NE, Webb MA, de Pablo JJ. Recent advances in machine learning towards multiscale soft materials design. Curr Opin Chem Eng 2019. [DOI: 10.1016/j.coche.2019.03.005] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Indexed: 01/01/2023]
|