1
Zhao Y, Zhang W, Li T. EPR-Net: constructing a non-equilibrium potential landscape via a variational force projection formulation. Natl Sci Rev 2024; 11:nwae052. [PMID: 38883298 PMCID: PMC11173252 DOI: 10.1093/nsr/nwae052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/07/2024] [Accepted: 01/29/2024] [Indexed: 06/18/2024] Open
Abstract
We present EPR-Net, a novel and effective deep learning approach that tackles a crucial challenge in biophysics: constructing potential landscapes for high-dimensional non-equilibrium steady-state systems. EPR-Net leverages a nice mathematical fact that the desired negative potential gradient is simply the orthogonal projection of the driving force of the underlying dynamics in a weighted inner-product space. Remarkably, our loss function has an intimate connection with the steady entropy production rate (EPR), enabling simultaneous landscape construction and EPR estimation. We introduce an enhanced learning strategy for systems with small noise, and extend our framework to include dimensionality reduction and the state-dependent diffusion coefficient case in a unified fashion. Comparative evaluations on benchmark problems demonstrate the superior accuracy, effectiveness and robustness of EPR-Net compared to existing methods. We apply our approach to challenging biophysical problems, such as an eight-dimensional (8D) limit cycle and a 52D multi-stability problem, which provide accurate solutions and interesting insights on constructed landscapes. With its versatility and power, EPR-Net offers a promising solution for diverse landscape construction problems in biophysics.
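The projection statement in this abstract can be made concrete in the simplest setting. The derivation below is a sketch under assumptions the abstract does not state: overdamped dynamics $dX_t = F(X_t)\,dt + \sqrt{2D}\,dW_t$ with a constant scalar $D$, a steady-state density $p_{ss}$ with flux $J_{ss}$, boundary terms that vanish, and the landscape defined as $U = -D\ln p_{ss}$. The notation is illustrative and may differ from the paper's.

```latex
\begin{aligned}
J_{ss} &= F\,p_{ss} - D\,\nabla p_{ss}, \qquad \nabla\cdot J_{ss} = 0,\\
F &= -\nabla U + \frac{J_{ss}}{p_{ss}}, \qquad U = -D\ln p_{ss},\\
\int \nabla W\cdot\frac{J_{ss}}{p_{ss}}\;p_{ss}\,dx &= -\int W\,\nabla\cdot J_{ss}\,dx = 0
  \quad\text{for any smooth test function } W,\\
\min_{V}\int \lvert F+\nabla V\rvert^{2}\,p_{ss}\,dx &= \int \frac{\lvert J_{ss}\rvert^{2}}{p_{ss}}\,dx,
  \quad\text{attained at } V = U \text{ up to an additive constant.}
\end{aligned}
```

Under these assumptions, $-\nabla U$ is the orthogonal projection of $F$ onto gradient fields in the $p_{ss}$-weighted inner product. The residual equals $D$ times the steady entropy production rate in the convention $\mathrm{EPR} = \int \lvert J_{ss}\rvert^{2}/(D\,p_{ss})\,dx$.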
Affiliation(s)
- Yue Zhao
  - Center for Data Science, Peking University, Beijing 100871, China
- Wei Zhang
  - Zuse Institute Berlin, Berlin 14195, Germany
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany
- Tiejun Li
  - Center for Data Science, Peking University, Beijing 100871, China
  - Laboratory of Mathematics and Applied Mathematics (LMAM) and School of Mathematical Sciences, Peking University, Beijing 100871, China
  - Center for Machine Learning Research, Peking University, Beijing 100871, China
2
Izvekov S, Kroonblawd MP, Larentzos JP, Brennan JK, Rice BM. Maximum Entropy Theory of Multiscale Coarse-Graining via Matching Thermodynamic Forces: Application to a Molecular Crystal (TATB). J Phys Chem B 2024. [PMID: 38489758 DOI: 10.1021/acs.jpcb.3c07078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2024]
Abstract
The MSCG/FM (multiscale coarse-graining via force-matching) approach is an efficient supervised machine learning method to develop microscopically informed coarse-grained (CG) models. We present a theory based on the principle of maximum entropy (PME) enveloping the existing MSCG/FM approaches. This theory views the MSCG/FM method as a special case of matching the thermodynamic forces from the extended ensemble described by the set of thermodynamic (relevant) system coordinates. This set may include CG coordinates, the stress tensor, applied external fields, and so forth, and may be characterized by nonequilibrium conditions. Following the presentation of the theory, we discuss the consistent matching of both bonded and nonbonded interactions. The proposed PME formulation is used as a starting point to extend the MSCG/FM method to the constant strain ensemble, which together with the explicit matching of the bonded forces is better suited for coarse-graining anisotropic media at a submolecular resolution. The theory is demonstrated by performing the fine coarse-graining of crystalline 1,3,5-triamino-2,4,6-trinitrobenzene (TATB), a well-known insensitive molecular energetic material, which exhibits highly anisotropic mechanical properties.
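Force matching is commonly posed as a least-squares fit of a parametrized CG force to reference forces mapped from the fine-grained model. The sketch below shows only that generic fit on synthetic data; it is not the maximum-entropy or constant-strain formulation of this paper. The function name, the two-term basis, and the data are hypothetical.

```python
import numpy as np

def fit_force_matching(basis_forces, reference_forces):
    """Return coefficients c minimizing ||basis_forces @ c - reference_forces||^2.

    basis_forces: (n_samples, n_basis) basis force terms evaluated per sample.
    reference_forces: (n_samples,) reference forces mapped to the CG sites.
    """
    coeffs, *_ = np.linalg.lstsq(basis_forces, reference_forces, rcond=None)
    return coeffs

# Synthetic illustration (all values are assumptions, not data from the paper).
rng = np.random.default_rng(0)
r = rng.uniform(0.8, 2.0, size=500)             # hypothetical pair distances
basis = np.stack([r**-7, r**-13], axis=1)       # hypothetical Lennard-Jones-like force basis
true_coeffs = np.array([-6.0, 12.0])
reference = basis @ true_coeffs + rng.normal(scale=0.01, size=r.size)

coeffs = fit_force_matching(basis, reference)   # should recover true_coeffs closely
```

Because the CG force is linear in the coefficients, the fit reduces to one linear least-squares solve. Neural-network force fields replace the linear basis with a nonlinear model and minimize the same squared force residual by gradient descent.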
Affiliation(s)
- Sergei Izvekov
  - U.S. Army DEVCOM Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States
- Matthew P Kroonblawd
  - Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- James P Larentzos
  - U.S. Army DEVCOM Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States
- John K Brennan
  - U.S. Army DEVCOM Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States
- Betsy M Rice
  - U.S. Army DEVCOM Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States
3
Chennakesavalu S, Rotskoff GM. Data-Efficient Generation of Protein Conformational Ensembles with Backbone-to-Side-Chain Transformers. J Phys Chem B 2024; 128:2114-2123. [PMID: 38394363 DOI: 10.1021/acs.jpcb.3c08195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2024]
Abstract
Excitement at the prospect of using data-driven generative models to sample configurational ensembles of biomolecular systems stems from the extraordinary success of these models on a diverse set of high-dimensional sampling tasks. Unlike image generation or even the closely related problem of protein structure prediction, there are currently no data sources with sufficient breadth to parametrize generative models for conformational ensembles. To enable discovery, a fundamentally different approach to building generative models is required: models should be able to propose rare, albeit physical, conformations that may not arise in even the largest data sets. Here we introduce a modular strategy to generate conformations based on "backmapping" from a fixed protein backbone that (1) maintains conformational diversity of the side chains and (2) couples the side-chain fluctuations using global information about the protein conformation. Our model combines simple statistical models of side-chain conformations based on rotamer libraries with the now ubiquitous transformer architecture to sample with atomistic accuracy. Together, these ingredients provide a strategy for rapid data acquisition and hence a crucial ingredient for scalable physical simulation with generative neural networks.
Affiliation(s)
- Grant M Rotskoff
  - Department of Chemistry, Stanford University, Stanford, California 94305, United States
  - Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
4
Christians LF, Halingstad EV, Kram E, Okolovitch EM, Pak AJ. Formalizing Coarse-Grained Representations of Anisotropic Interactions at Multimeric Protein Interfaces Using Virtual Sites. J Phys Chem B 2024; 128:1394-1406. [PMID: 38316012 DOI: 10.1021/acs.jpcb.3c07023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2024]
Abstract
Molecular simulations of biomacromolecules that assemble into multimeric complexes remain a challenge due to computationally inaccessible length and time scales. Low-resolution and implicit-solvent coarse-grained modeling approaches using traditional nonbonded interactions (both pairwise and spherically isotropic) have been able to partially address this gap. However, these models may fail to capture the complex anisotropic interactions present at macromolecular interfaces unless higher-order interaction potentials are incorporated at the expense of the computational cost. In this work, we introduce an alternate and systematic approach to represent directional interactions at protein-protein interfaces by using virtual sites restricted to pairwise interactions. We show that virtual site interaction parameters can be optimized within a relative entropy minimization framework by using only information from known statistics between coarse-grained sites. We compare our virtual site models to traditional coarse-grained models using two case studies of multimeric protein assemblies and find that the virtual site models predict pairwise correlations with higher fidelity and, more importantly, assembly behavior that is morphologically consistent with experiments. Our study underscores the importance of anisotropic interaction representations and paves the way for more accurate yet computationally efficient coarse-grained simulations of macromolecular assembly in future research.
Affiliation(s)
- Luc F Christians
  - Department of Chemical and Biological Engineering, Colorado School of Mines, Golden, Colorado 80401, United States
- Ethan V Halingstad
  - Department of Chemical and Biological Engineering, Colorado School of Mines, Golden, Colorado 80401, United States
- Emiel Kram
  - Department of Chemical and Biological Engineering, Colorado School of Mines, Golden, Colorado 80401, United States
- Evan M Okolovitch
  - Department of Chemical and Biological Engineering, Colorado School of Mines, Golden, Colorado 80401, United States
- Alexander J Pak
  - Department of Chemical and Biological Engineering, Colorado School of Mines, Golden, Colorado 80401, United States
  - Quantitative Biosciences and Engineering Program, Colorado School of Mines, Golden, Colorado 80401, United States
  - Materials Science Program, Colorado School of Mines, Golden, Colorado 80401, United States
5
Madanchi A, Kilgour M, Zysk F, Kühne TD, Simine L. Simulations of disordered matter in 3D with the morphological autoregressive protocol (MAP) and convolutional neural networks. J Chem Phys 2024; 160:024101. [PMID: 38189615 DOI: 10.1063/5.0174615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 12/08/2023] [Indexed: 01/09/2024] Open
Abstract
Disordered molecular systems, such as amorphous catalysts, organic thin films, electrolyte solutions, and water, are at the cutting edge of computational exploration at present. Traditional simulations of such systems at length scales relevant to experiments in practice require a compromise between model accuracy and quality of sampling. To address this problem, we have developed an approach based on generative machine learning called the Morphological Autoregressive Protocol (MAP), which provides computational access to mesoscale disordered molecular configurations at linear cost at generation for materials in which structural correlations decay sufficiently rapidly. The algorithm is implemented using an augmented PixelCNN deep learning architecture that, as we previously demonstrated, produces excellent results in 2 dimensions (2D) for mono-elemental molecular systems. Here, we extend our implementation to multi-elemental 3D and demonstrate performance using water as our test system in two scenarios: (1) liquid water and (2) samples conditioned on the presence of pre-selected motifs. We trained the model on small-scale samples of liquid water produced using path-integral molecular dynamics simulations, including nuclear quantum effects under ambient conditions. MAP-generated water configurations are shown to accurately reproduce the properties of the training set and to produce stable trajectories when used as initial conditions in quantum dynamics simulations. We expect our approach to perform equally well on other disordered molecular systems in which structural correlations decay sufficiently fast while offering unique advantages in situations when the disorder is quenched rather than equilibrated.
Affiliation(s)
- Ata Madanchi
  - Department of Physics, McGill University, 3600 University St., Montreal, Quebec H3A 2T8, Canada
- Michael Kilgour
  - Department of Chemistry, McGill University, 801 Sherbrooke St. W, Montreal, Quebec H3A 0B8, Canada
- Frederik Zysk
  - Dynamics of Condensed Matter and Center for Sustainable Systems Design, Chair of Theoretical Chemistry, University of Paderborn, Paderborn 33098, Germany
- Thomas D Kühne
  - Dynamics of Condensed Matter and Center for Sustainable Systems Design, Chair of Theoretical Chemistry, University of Paderborn, Paderborn 33098, Germany
- Lena Simine
  - Department of Chemistry, McGill University, 801 Sherbrooke St. W, Montreal, Quebec H3A 0B8, Canada
6
Del Razo MJ, Crommelin D, Bolhuis PG. Data-driven dynamical coarse-graining for condensed matter systems. J Chem Phys 2024; 160:024108. [PMID: 38193550 DOI: 10.1063/5.0177553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Accepted: 12/05/2023] [Indexed: 01/10/2024] Open
Abstract
Simulations of condensed matter systems often focus on the dynamics of a few distinguished components but require integrating the full system. A prime example is a molecular dynamics simulation of a (macro)molecule in a solution, where the molecule(s) and the solvent dynamics need to be integrated, rendering the simulations computationally costly and often unfeasible for physically/biologically relevant time scales. Standard coarse graining approaches can reproduce equilibrium distributions and structural features but do not properly include the dynamics. In this work, we develop a general data-driven coarse-graining methodology inspired by the Mori-Zwanzig formalism, which shows that macroscopic systems with a large number of degrees of freedom can be described by a few relevant variables and additional noise and memory terms. Our coarse-graining method consists of numerical integrators for the distinguished components, where the noise and interaction terms with other system components are substituted by a random variable sampled from a data-driven model. The model is parameterized using data from multiple short-time full-system simulations, and then, it is used to run long-time simulations. Applying our methodology to three systems (a distinguished particle under a harmonic potential, the same under a bistable potential, and a dimer with two metastable configurations), we find that the resulting coarse-grained models are capable of reproducing not only the equilibrium distributions but also the dynamic behavior due to temporal correlations and memory effects. Remarkably, our method even reproduces the transition dynamics between metastable states, which is challenging to capture correctly. Our approach is not constrained to specific dynamics and can be extended to systems beyond Langevin dynamics, and, in principle, even to non-equilibrium dynamics.
Affiliation(s)
- Mauricio J Del Razo
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
  - Van't Hoff Institute for Molecular Sciences, University of Amsterdam, PO Box 94157, 1090GD Amsterdam, The Netherlands
  - Korteweg-de Vries Institute for Mathematics, University of Amsterdam, PO Box 94248, 1090GD Amsterdam, The Netherlands
  - Dutch Institute for Emergent Phenomena, University of Amsterdam, Amsterdam, The Netherlands
- Daan Crommelin
  - Korteweg-de Vries Institute for Mathematics, University of Amsterdam, PO Box 94248, 1090GD Amsterdam, The Netherlands
  - Centrum Wiskunde & Informatica, 1098 XG Amsterdam, The Netherlands
- Peter G Bolhuis
  - Van't Hoff Institute for Molecular Sciences, University of Amsterdam, PO Box 94157, 1090GD Amsterdam, The Netherlands
7
Maier JC, Wang CI, Jackson NE. Distilling coarse-grained representations of molecular electronic structure with continuously gated message passing. J Chem Phys 2024; 160:024109. [PMID: 38193551 DOI: 10.1063/5.0179253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Accepted: 12/14/2023] [Indexed: 01/10/2024] Open
Abstract
Bottom-up methods for coarse-grained (CG) molecular modeling are critically needed to establish rigorous links between atomistic reference data and reduced molecular representations. For a target molecule, the ideal reduced CG representation is a function of both the conformational ensemble of the system and the target physical observable(s) to be reproduced at the CG resolution. However, there is an absence of algorithms for selecting CG representations of molecules from which complex properties, including molecular electronic structure, can be accurately modeled. We introduce continuously gated message passing (CGMP), a graph neural network (GNN) method for atomically decomposing molecular electronic structure sampled over conformational ensembles. CGMP integrates 3D-invariant GNNs and a novel gated message passing system to continuously reduce the atomic degrees of freedom accessible for electronic predictions, resulting in a one-shot importance ranking of atoms contributing to a target molecular property. Moreover, CGMP provides the first approach by which to quantify the degeneracy of "good" CG representations conditioned on specific prediction targets, facilitating the development of more transferable CG representations. We further show how CGMP can be used to highlight multiatom correlations, illuminating a path to developing CG electronic Hamiltonians in terms of interpretable collective variables for arbitrarily complex molecules.
Affiliation(s)
- J Charlie Maier
  - Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Chun-I Wang
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Nicholas E Jackson
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
8
Zong T, Liu X, Zhang X, Yang Q. Efficient characterization of double-cross-linked networks in hydrogels using data-inspired coarse-grained molecular dynamics model. J Chem Phys 2024; 160:024115. [PMID: 38197443 DOI: 10.1063/5.0180847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Accepted: 12/14/2023] [Indexed: 01/11/2024] Open
Abstract
The network structure within polymers significantly influences their mechanical properties, including their strength, toughness, and fatigue resistance. All-atom molecular dynamics (AAMD) simulations offer a method to investigate the energy dissipation mechanism within polymers during deformation and fracture. Such an approach is, however, computationally inefficient when used to analyze polymers with complex network structures, such as the common chemically double-networked hydrogels. Alternatively, coarse-grained molecular dynamics (CGMD) models, which reduce the computational degrees of freedom by concentrating a set of adjacent atoms into a coarse-grained bead, can be employed. In CGMD simulations, a coarse-grained force field (CGFF) is a critical factor affecting the simulation accuracy. In this paper, we proposed a data-based method for predicting the CGFF parameters to improve the simulation efficiency of complex cross-linked networks in polymers. Here, we utilized a typical chemically double-networked hydrogel as an example. An artificial neural network was selected, and it was trained with the tensile stress-strain data from the CGMD simulations using different CGFF parameters. The CGMD simulations using the predicted CGFF parameters show good agreement with the AAMD simulations and are almost fifty times faster. The data-inspired CGMD model presented here broadens the applicability of molecular dynamics simulations to cross-linked polymers and has the potential to provide insights that will aid the design of polymers with desirable mechanical properties.
Affiliation(s)
- Ting Zong
  - Beijing University of Technology, Beijing 100124, China
- Xia Liu
  - Beijing University of Technology, Beijing 100124, China
- Xingyu Zhang
  - Beijing University of Technology, Beijing 100124, China
9
Coste A, Slejko E, Zavadlav J, Praprotnik M. Developing an Implicit Solvation Machine Learning Model for Molecular Simulations of Ionic Media. J Chem Theory Comput 2024; 20:411-420. [PMID: 38118122 PMCID: PMC10782447 DOI: 10.1021/acs.jctc.3c00984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 12/04/2023] [Accepted: 12/04/2023] [Indexed: 12/22/2023]
Abstract
Molecular dynamics (MD) simulations of biophysical systems require accurate modeling of their native environment, i.e., aqueous ionic solution, as it critically impacts the structure and function of biomolecules. On the other hand, the models should be computationally efficient to enable simulations of large spatiotemporal scales. Here, we present the deep implicit solvation model for sodium chloride solutions that satisfies both requirements. Owing to the use of the neural network potential, the model can capture the many-body potential of mean force, while the implicit water treatment renders the model inexpensive. We demonstrate our approach first for pure ionic solutions with concentrations ranging from physiological to 2 M. We then extend the model to capture the effective ion interactions in the vicinity and far away from a DNA molecule. In both cases, the structural properties are in good agreement with all-atom MD, showcasing a general methodology for the efficient and accurate modeling of ionic media.
Affiliation(s)
- Amaury Coste
  - Laboratory for Molecular Modeling, National Institute of Chemistry, Ljubljana SI-1001, Slovenia
- Ema Slejko
  - Laboratory for Molecular Modeling, National Institute of Chemistry, Ljubljana SI-1001, Slovenia
  - Department of Physics, Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana SI-1000, Slovenia
- Julija Zavadlav
  - Professorship of Multiscale Modeling of Fluid Materials, TUM School of Engineering and Design, Technical University of Munich, Garching Near Munich DE-85748, Germany
- Matej Praprotnik
  - Laboratory for Molecular Modeling, National Institute of Chemistry, Ljubljana SI-1001, Slovenia
  - Department of Physics, Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana SI-1000, Slovenia
10
Kim J, Jeong Y, Kim WJ, Lee EK, Choi IS. MolNet_Equi: A Chemically Intuitive, Rotation-Equivariant Graph Neural Network. Chem Asian J 2024; 19:e202300684. [PMID: 37953530 DOI: 10.1002/asia.202300684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Revised: 11/11/2023] [Accepted: 11/12/2023] [Indexed: 11/14/2023]
Abstract
Although deep-learning (DL) models suggest unprecedented prediction capabilities in tackling various chemical problems, their demonstrated tasks have so far been limited to the scalar properties including the magnitude of vectorial properties, such as molecular dipole moments. A rotation-equivariant MolNet_Equi model, proposed in this paper, understands and recognizes the molecular rotation in the 3D Euclidean space, and exhibits the ability to predict directional dipole moments in the rotation-sensitive mode, as well as showing superior performance for the prediction of scalar properties. Three consecutive operations of molecular rotation $R(M)$, dipole-moment prediction $\phi_{\mu}(R(M))$, and dipole-moment inverse rotation $R^{-1}(\phi_{\mu}(R(M)))$ do not alter the original prediction of the total dipole moment of a molecule, $\phi_{\mu}(M)$, assuring the rotational equivariance of MolNet_Equi. Furthermore, MolNet_Equi faithfully predicts the absolute direction of dipole moments given molecular poses, even though the model has been trained only with the information on dipole-moment magnitudes, not directions. This work highlights the potential of incorporating fundamental yet crucial chemical rules and concepts into DL models, leading to the development of chemically intuitive models.
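The equivariance condition stated in this abstract, $R^{-1}(\phi_{\mu}(R(M))) = \phi_{\mu}(M)$, can be checked numerically for any vector-valued predictor. The sketch below uses a toy point-charge dipole as a stand-in for $\phi_{\mu}$. It is not the MolNet_Equi network, and the function names, geometry, and charges are hypothetical.

```python
import numpy as np

def toy_dipole_model(positions, charges):
    """Hypothetical stand-in for phi_mu: point-charge dipole sum_i q_i * r_i."""
    return charges @ positions

def rotation_about_z(angle):
    """3x3 rotation matrix for a rotation by `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def equivariance_gap(model, positions, charges, rotation):
    """Norm of R^{-1} phi(R(M)) - phi(M); zero for a rotation-equivariant model."""
    rotated_positions = positions @ rotation.T        # apply R to every atom
    rotated_prediction = model(rotated_positions, charges)
    back_rotated = rotation.T @ rotated_prediction    # R^{-1} = R^T for rotations
    return np.linalg.norm(back_rotated - model(positions, charges))

# Illustrative three-atom geometry and partial charges (assumed values).
positions = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
charges = np.array([-0.8, 0.4, 0.4])
gap = equivariance_gap(toy_dipole_model, positions, charges, rotation_about_z(0.7))
```

The same `equivariance_gap` check applies to a trained network by passing its prediction function as `model`. A gap well above floating-point round-off would indicate that the model is not rotation-equivariant.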
Affiliation(s)
- Jihoo Kim
  - Department of Chemistry, KAIST, Daejeon, 34141, Korea
- Yoonho Jeong
  - Department of Chemistry, KAIST, Daejeon, 34141, Korea
- Won June Kim
  - Department of Biology and Chemistry, Changwon National University, Changwon, 51140, Korea
- Eok Kyun Lee
  - Department of Chemistry, KAIST, Daejeon, 34141, Korea
- Insung S Choi
  - Department of Chemistry, KAIST, Daejeon, 34141, Korea
11
Airas J, Ding X, Zhang B. Transferable Implicit Solvation via Contrastive Learning of Graph Neural Networks. ACS Cent Sci 2023; 9:2286-2297. [PMID: 38161379 PMCID: PMC10755853 DOI: 10.1021/acscentsci.3c01160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 10/26/2023] [Accepted: 10/31/2023] [Indexed: 01/03/2024]
Abstract
Implicit solvent models are essential for molecular dynamics simulations of biomolecules, striking a balance between computational efficiency and biological realism. Efforts are underway to develop accurate and transferable implicit solvent models and coarse-grained (CG) force fields in general, guided by a bottom-up approach that matches the CG energy function with the potential of mean force (PMF) defined by the finer system. However, practical challenges arise due to the lack of analytical expressions for the PMF and algorithmic limitations in parameterizing CG force fields. To address these challenges, a machine learning-based approach is proposed, utilizing graph neural networks (GNNs) to represent the solvation free energy and potential contrasting for parameter optimization. We demonstrate the effectiveness of the approach by deriving a transferable GNN implicit solvent model using 600,000 atomistic configurations of six proteins obtained from explicit solvent simulations. The GNN model provides solvation free energy estimations much more accurately than state-of-the-art implicit solvent models, reproducing configurational distributions of explicit solvent simulations. We also demonstrate the reasonable transferability of the GNN model outside of the training data. Our study offers valuable insights for deriving systematically improvable implicit solvent models and CG force fields from a bottom-up perspective.
Affiliation(s)
- Justin Airas
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, United States
- Xinqiang Ding
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, United States
- Bin Zhang
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, United States
12
Loose T, Sahrmann PG, Qu TS, Voth GA. Coarse-Graining with Equivariant Neural Networks: A Path Toward Accurate and Data-Efficient Models. J Phys Chem B 2023; 127:10564-10572. [PMID: 38033234 PMCID: PMC10726966 DOI: 10.1021/acs.jpcb.3c05928] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/01/2023] [Revised: 10/30/2023] [Accepted: 11/09/2023] [Indexed: 12/02/2023]
Abstract
Machine learning has recently entered into the mainstream of coarse-grained (CG) molecular modeling and simulation. While a variety of methods for incorporating deep learning into these models exist, many of them involve training neural networks to act directly as the CG force field. This has several benefits, of which the most significant is accuracy. Neural networks can inherently incorporate multibody effects during the calculation of CG forces, and a well-trained neural network force field outperforms pairwise basis sets generated from essentially any methodology. However, this comes at a significant cost. First, these models are typically slower than pairwise force fields, even when accounting for specialized hardware, which accelerates the training and integration of such networks. The second cost, and the focus of this paper, is the need for a considerable amount of data to train such force fields. It is common to use tens of microseconds of molecular dynamics data to train a single CG model, which approaches the point of eliminating the CG model's usefulness in the first place. As we investigate in this work, this "data-hunger" trap from neural networks for predicting molecular energies and forces can be remediated in part by incorporating equivariant convolutional operations. We demonstrate that, for CG water, networks that incorporate equivariant convolutional operations can produce functional models using data sets as small as a single frame of reference data, while networks without these operations cannot.
Affiliation(s)
- Thomas S. Qu
  - Department of Chemistry, Chicago Center for Theoretical Chemistry, James Franck Institute, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637, United States
- Gregory A. Voth
  - Department of Chemistry, Chicago Center for Theoretical Chemistry, James Franck Institute, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637, United States
13
Jones MS, Shmilovich K, Ferguson AL. DiAMoNDBack: Diffusion-Denoising Autoregressive Model for Non-Deterministic Backmapping of Cα Protein Traces. J Chem Theory Comput 2023; 19:7908-7923. [PMID: 37906711 DOI: 10.1021/acs.jctc.3c00840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Coarse-grained molecular models of proteins permit access to length and time scales unattainable by all-atom models and the simulation of processes that occur on long time scales, such as aggregation and folding. The reduced resolution realizes computational accelerations, but an atomistic representation can be vital for a complete understanding of mechanistic details. Backmapping is the process of restoring all-atom resolution to coarse-grained molecular models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping) as an autoregressive denoising diffusion probability model to restore all-atom details to coarse-grained protein representations retaining only Cα coordinates. The autoregressive generation process proceeds from the protein N-terminus to C-terminus in a residue-by-residue fashion conditioned on the Cα trace and previously backmapped backbone and side-chain atoms within the local neighborhood. The local and autoregressive nature of our model makes it transferable between proteins. The stochastic nature of the denoising diffusion process means that the model generates a realistic ensemble of backbone and side-chain all-atom configurations consistent with the coarse-grained Cα trace. We train DiAMoNDBack over 65k+ structures from the Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set, intrinsically disordered protein structures from the Protein Ensemble Database (PED), molecular dynamics simulations of fast-folding mini-proteins from DE Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art reconstruction performance in terms of correct bond formation, avoidance of side-chain clashes, and the diversity of the generated side-chain configurational states. We make the DiAMoNDBack model publicly available as a free and open-source Python package.
Affiliation(s)
- Michael S Jones
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | - Kirill Shmilovich
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | - Andrew L Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| |
|
14
|
Navarro C, Majewski M, De Fabritiis G. Top-Down Machine Learning of Coarse-Grained Protein Force Fields. J Chem Theory Comput 2023; 19:7518-7526. [PMID: 37874270 PMCID: PMC10777392 DOI: 10.1021/acs.jctc.3c00638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Indexed: 10/25/2023]
Abstract
Developing accurate and efficient coarse-grained representations of proteins is crucial for understanding their folding, function, and interactions over extended time scales. Our methodology involves simulating proteins with molecular dynamics and utilizing the resulting trajectories to train a neural network potential through differentiable trajectory reweighting. Remarkably, this method requires only the native conformation of proteins, eliminating the need for labeled data derived from extensive simulations or memory-intensive end-to-end differentiable simulations. Once trained, the model can be employed to run parallel molecular dynamics simulations and sample folding events for proteins both within and beyond the training distribution, showcasing its extrapolation capabilities. By applying Markov state models, native-like conformations of the simulated proteins can be predicted from the coarse-grained simulations. Owing to its theoretical transferability and ability to use solely experimental static structures as training data, we anticipate that this approach will prove advantageous for developing new protein force fields and further advancing the study of protein dynamics, folding, and interactions.
Affiliation(s)
- Carles Navarro
- Acellera Labs, Doctor Trueta 183, 08005 Barcelona, Spain
| | | | - Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
- Acellera Ltd., Devonshire House 582, Middlesex HA7 1JS, United Kingdom
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain
| |
|
15
|
Peng Y, Pak AJ, Durumeric AEP, Sahrmann PG, Mani S, Jin J, Loose TD, Beiter J, Voth GA. OpenMSCG: A Software Tool for Bottom-Up Coarse-Graining. J Phys Chem B 2023; 127:8537-8550. [PMID: 37791670 PMCID: PMC10577682 DOI: 10.1021/acs.jpcb.3c04473] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/03/2023] [Revised: 09/05/2023] [Indexed: 10/05/2023]
Abstract
The "bottom-up" approach to coarse-graining, for building accurate and efficient computational models to simulate large-scale and complex phenomena and processes, is an important approach in computational chemistry, biophysics, and materials science. As one example, the Multiscale Coarse-Graining (MS-CG) approach to developing CG models can be rigorously derived using statistical mechanics applied to fine-grained, i.e., all-atom simulation data for a given system. Under a number of circumstances, a systematic procedure, such as MS-CG modeling, is particularly valuable. Here, we present the development of the OpenMSCG software, a modularized open-source software that provides a collection of successful and widely applied bottom-up CG methods, including Boltzmann Inversion (BI), Force-Matching (FM), Ultra-Coarse-Graining (UCG), Relative Entropy Minimization (REM), Essential Dynamics Coarse-Graining (EDCG), and Heterogeneous Elastic Network Modeling (HeteroENM). OpenMSCG is a high-performance and comprehensive toolset that can be used to derive CG models from large-scale fine-grained simulation data in file formats from common molecular dynamics (MD) software packages, such as GROMACS, LAMMPS, and NAMD. OpenMSCG is modularized in the Python programming framework, which allows users to create and customize modeling "recipes" for reproducible results, thus greatly improving the reliability, reproducibility, and sharing of bottom-up CG models and their applications.
Affiliation(s)
- Yuxing Peng
- NVIDIA Corporation, 2788 San Tomas Expressway, Santa Clara, California 95051, United States
| | - Alexander J. Pak
- Department of Chemical and Biological Engineering, Colorado School of Mines, Golden, Colorado 80401, United States
| | | | - Patrick G. Sahrmann
- Department of Chemistry, Chicago Center for Theoretical Chemistry, James Franck Institute, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637, United States
| | - Sriramvignesh Mani
- Department of Chemistry, Chicago Center for Theoretical Chemistry, James Franck Institute, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637, United States
| | - Jaehyeok Jin
- Department of Chemistry, Chicago Center for Theoretical Chemistry, James Franck Institute, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637, United States
| | - Timothy D. Loose
- Department of Chemistry, Chicago Center for Theoretical Chemistry, James Franck Institute, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637, United States
| | - Jeriann Beiter
- Department of Chemistry, Chicago Center for Theoretical Chemistry, James Franck Institute, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637, United States
| | - Gregory A. Voth
- Department of Chemistry, Chicago Center for Theoretical Chemistry, James Franck Institute, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637, United States
| |
|
16
|
Ivanov M, Posysoev M, Lyubartsev AP. Coarse-Grained Modeling Using Neural Networks Trained on Structural Data. J Chem Theory Comput 2023; 19:6704-6717. [PMID: 37712507 PMCID: PMC10569054 DOI: 10.1021/acs.jctc.3c00516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Indexed: 09/16/2023]
Abstract
We propose a method of bottom-up coarse-graining, in which interactions within a coarse-grained model are determined by an artificial neural network trained on structural data obtained from multiple atomistic simulations. The method uses ideas of the inverse Monte Carlo approach, relating changes in the neural network weights with changes in average structural properties, such as radial distribution functions. As a proof of concept, we demonstrate the method on a system interacting by a Lennard-Jones potential modeled by a simple linear network and a single-site coarse-grained model of methanol-water solutions. In the latter case, we implement a nonlinear neural network with intermediate layers trained by atomistic simulations carried out at different methanol concentrations. We show that such a network acts as a transferable potential at the coarse-grained resolution for a wide range of methanol concentrations, including those not included in the training set.
Affiliation(s)
- Mikhail Ivanov
- Department of Materials and Environmental Chemistry, Stockholm University, SE-106 91 Stockholm, Sweden
| | - Maksim Posysoev
- Department of Materials and Environmental Chemistry, Stockholm University, SE-106 91 Stockholm, Sweden
| | - Alexander P. Lyubartsev
- Department of Materials and Environmental Chemistry, Stockholm University, SE-106 91 Stockholm, Sweden
| |
|
17
|
Lederer J, Gastegger M, Schütt KT, Kampffmeyer M, Müller KR, Unke OT. Automatic identification of chemical moieties. Phys Chem Chem Phys 2023; 25:26370-26379. [PMID: 37750554 PMCID: PMC10548786 DOI: 10.1039/d3cp03845a] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/11/2023] [Accepted: 08/18/2023] [Indexed: 09/27/2023]
Abstract
In recent years, the prediction of quantum mechanical observables with machine learning methods has become increasingly popular. Message-passing neural networks (MPNNs) solve this task by constructing atomic representations, from which the properties of interest are predicted. Here, we introduce a method to automatically identify chemical moieties (molecular building blocks) from such representations, enabling a variety of applications beyond property prediction, which otherwise rely on expert knowledge. The required representation can either be provided by a pretrained MPNN, or be learned from scratch using only structural information. Beyond the data-driven design of molecular fingerprints, the versatility of our approach is demonstrated by enabling the selection of representative entries in chemical databases, the automatic construction of coarse-grained force fields, as well as the identification of reaction coordinates.
Affiliation(s)
- Jonas Lederer
- Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
| | - Michael Gastegger
- Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
| | - Kristof T Schütt
- Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
| | - Michael Kampffmeyer
- Department of Physics and Technology, UiT The Arctic University of Norway, 9019 Tromsø, Norway
| | - Klaus-Robert Müller
- Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Google Deepmind, Germany
- Department of Artificial Intelligence, Korea University, Seoul 136-713, Korea
- Max Planck Institut für Informatik, 66123 Saarbrücken, Germany
| | - Oliver T Unke
- Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Google Deepmind, Germany
| |
|
18
|
Arts M, Garcia Satorras V, Huang CW, Zügner D, Federici M, Clementi C, Noé F, Pinsler R, van den Berg R. Two for One: Diffusion Models and Force Fields for Coarse-Grained Molecular Dynamics. J Chem Theory Comput 2023; 19:6151-6159. [PMID: 37688551 DOI: 10.1021/acs.jctc.3c00702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2023]
Abstract
Coarse-grained (CG) molecular dynamics enables the study of biological processes at temporal and spatial scales that would be intractable at an atomistic resolution. However, accurately learning a CG force field remains a challenge. In this work, we leverage connections between score-based generative models, force fields, and molecular dynamics to learn a CG force field without requiring any force inputs during training. Specifically, we train a diffusion generative model on protein structures from molecular dynamics simulations, and we show that its score function approximates a force field that can directly be used to simulate CG molecular dynamics. While having a vastly simplified training setup compared to previous work, we demonstrate that our approach leads to improved performance across several protein simulations for systems up to 56 amino acids, reproducing the CG equilibrium distribution and preserving the dynamics of all-atom simulations such as protein folding events.
Affiliation(s)
- Marloes Arts
- Department of Computer Science, University of Copenhagen, Universitetsparken 1, Copenhagen 2100, Denmark
| | - Victor Garcia Satorras
- AI4Science, Microsoft Research, Evert van de Beekstraat 354, Amsterdam 1118 CZ, The Netherlands
| | - Chin-Wei Huang
- AI4Science, Microsoft Research, Evert van de Beekstraat 354, Amsterdam 1118 CZ, The Netherlands
| | - Daniel Zügner
- AI4Science, Microsoft Research, Karl-Liebknecht-Straße 32, Berlin 10178, Germany
| | - Marco Federici
- Informatics Institute, University of Amsterdam, Science Park 904, Amsterdam 1098 XH, The Netherlands
| | - Cecilia Clementi
- AI4Science, Microsoft Research, Karl-Liebknecht-Straße 32, Berlin 10178, Germany
- Department of Physics, Freie Universität Berlin, Arnimallee 12, Berlin 14195, Germany
| | - Frank Noé
- AI4Science, Microsoft Research, Karl-Liebknecht-Straße 32, Berlin 10178, Germany
| | - Robert Pinsler
- AI4Science, Microsoft Research, 21 Station Road, Cambridge CB1 2FB, U.K
| | - Rianne van den Berg
- AI4Science, Microsoft Research, Evert van de Beekstraat 354, Amsterdam 1118 CZ, The Netherlands
| |
|
19
|
Majewski M, Pérez A, Thölke P, Doerr S, Charron NE, Giorgino T, Husic BE, Clementi C, Noé F, De Fabritiis G. Machine learning coarse-grained potentials of protein thermodynamics. Nat Commun 2023; 14:5739. [PMID: 37714883 PMCID: PMC10504246 DOI: 10.1038/s41467-023-41343-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 06/02/2023] [Accepted: 08/29/2023] [Indexed: 09/17/2023] Open
Abstract
A generalized understanding of protein dynamics is an unsolved scientific problem, the solution of which is critical to the interpretation of the structure-function relationships that govern essential biological processes. Here, we approach this problem by constructing coarse-grained molecular potentials based on artificial neural networks and grounded in statistical mechanics. For training, we build a unique dataset of unbiased all-atom molecular dynamics simulations of approximately 9 ms for twelve different proteins with multiple secondary structure arrangements. The coarse-grained models are capable of accelerating the dynamics by more than three orders of magnitude while preserving the thermodynamics of the systems. Coarse-grained simulations identify relevant structural states in the ensemble with comparable energetics to the all-atom systems. Furthermore, we show that a single coarse-grained potential can integrate all twelve proteins and can capture experimental structural features of mutated proteins. These results indicate that machine learning coarse-grained potentials could provide a feasible approach to simulate and understand protein dynamics.
Affiliation(s)
- Maciej Majewski
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003, Barcelona, Spain
- Acellera Labs, Doctor Trueta 183, 08005, Barcelona, Spain
| | - Adrià Pérez
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003, Barcelona, Spain
- Acellera Labs, Doctor Trueta 183, 08005, Barcelona, Spain
| | - Philipp Thölke
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Stefan Doerr
- Acellera Labs, Doctor Trueta 183, 08005, Barcelona, Spain
| | - Nicholas E Charron
- Department of Physics, Rice University, Houston, TX, 77005, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX, 77005, USA
- Department of Physics, FU Berlin, Arnimallee 12, 14195, Berlin, Germany
| | - Toni Giorgino
- Biophysics Institute, National Research Council (CNR-IBF), 20133, Milan, Italy
| | - Brooke E Husic
- Department of Mathematics and Computer Science, FU Berlin, Arnimallee 12, 14195, Berlin, Germany
- Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, 08540, USA
- Princeton Center for Theoretical Science, Princeton University, Princeton, NJ, 08540, USA
- Center for the Physics of Biological Function, Princeton University, Princeton, NJ, 08540, USA
| | - Cecilia Clementi
- Department of Physics, Rice University, Houston, TX, 77005, USA.
- Center for Theoretical Biological Physics, Rice University, Houston, TX, 77005, USA.
- Department of Physics, FU Berlin, Arnimallee 12, 14195, Berlin, Germany.
- Department of Chemistry, Rice University, Houston, TX, 77005, USA.
| | - Frank Noé
- Department of Physics, FU Berlin, Arnimallee 12, 14195, Berlin, Germany.
- Department of Mathematics and Computer Science, FU Berlin, Arnimallee 12, 14195, Berlin, Germany.
- Department of Chemistry, Rice University, Houston, TX, 77005, USA.
- Microsoft Research AI4Science, Karl-Liebknecht Str. 32, 10178, Berlin, Germany.
| | - Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003, Barcelona, Spain.
- Acellera Labs, Doctor Trueta 183, 08005, Barcelona, Spain.
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010, Barcelona, Spain.
| |
|
20
|
Danielsson A, Samsonov SA, Liwo A, Sieradzan AK. Extension of the SUGRES-1P Coarse-Grained Model of Polysaccharides to Heparin. J Chem Theory Comput 2023; 19:6023-6036. [PMID: 37587433 PMCID: PMC10500997 DOI: 10.1021/acs.jctc.3c00511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Indexed: 08/18/2023]
Abstract
Heparin is an unbranched periodic polysaccharide composed of negatively charged monomers and involved in key biological processes, including anticoagulation, angiogenesis, and inflammation. Its structure and dynamics have been studied extensively using experimental as well as theoretical approaches. The conventional approach of computational chemistry applied to the analysis of biomolecules is all-atom molecular dynamics, which captures the interactions of individual atoms by solving Newton's equation of motion. An alternative is molecular dynamics simulations using coarse-grained models of biomacromolecules, which offer a reduction of the representation and consequently enable us to extend the time and size scale of simulations by orders of magnitude. In this work, we extend the UNIfied COarse-gRaiNed (UNICORN) model of biological macromolecules developed in our laboratory to heparin. We carried out extensive tests to estimate the optimal weights of energy terms of the effective energy function as well as the optimal Debye-Hückel screening factor for electrostatic interactions. We applied the model to study unbound heparin molecules of polymerization degree ranging from 6 to 68 residues. We compare the obtained coarse-grained heparin conformations with models obtained from X-ray diffraction studies of heparin. The SUGRES-1P force field was able to accurately predict the general shape and global characteristics of heparin molecules.
Affiliation(s)
- Annemarie Danielsson
- Faculty of Chemistry, University of Gdansk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
| | - Sergey A. Samsonov
- Faculty of Chemistry, University of Gdansk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
| | - Adam Liwo
- Faculty of Chemistry, University of Gdansk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
| | - Adam K. Sieradzan
- Faculty of Chemistry, University of Gdansk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
| |
|
21
|
Liu S, Wang C, Latham AP, Ding X, Zhang B. OpenABC enables flexible, simplified, and efficient GPU accelerated simulations of biomolecular condensates. PLoS Comput Biol 2023; 19:e1011442. [PMID: 37695778 PMCID: PMC10513381 DOI: 10.1371/journal.pcbi.1011442] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/21/2023] [Revised: 09/21/2023] [Accepted: 08/19/2023] [Indexed: 09/13/2023] Open
Abstract
Biomolecular condensates are important structures in various cellular processes but are challenging to study using traditional experimental techniques. In silico simulations with residue-level coarse-grained models strike a balance between computational efficiency and chemical accuracy. They could offer valuable insights by connecting the emergent properties of these complex systems with molecular sequences. However, existing coarse-grained models often lack easy-to-follow tutorials and are implemented in software that is not optimal for condensate simulations. To address these issues, we introduce OpenABC, a software package that greatly simplifies the setup and execution of coarse-grained condensate simulations with multiple force fields using Python scripting. OpenABC seamlessly integrates with the OpenMM molecular dynamics engine, enabling efficient simulations with performance on a single GPU that rivals the speed achieved by hundreds of CPUs. We also provide tools that convert coarse-grained configurations to all-atom structures for atomistic simulations. We anticipate that OpenABC will significantly facilitate the adoption of in silico simulations by a broader community to investigate the structural and dynamical properties of condensates.
Affiliation(s)
- Shuming Liu
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Cong Wang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Andrew P. Latham
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, California, United States of America
| | - Xinqiang Ding
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Bin Zhang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| |
|
22
|
Wellawatte GP, Hocky GM, White AD. Neural potentials of proteins extrapolate beyond training data. J Chem Phys 2023; 159:085103. [PMID: 37642255 PMCID: PMC10474891 DOI: 10.1063/5.0147240] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/20/2023] [Accepted: 07/31/2023] [Indexed: 08/31/2023] Open
Abstract
We evaluate neural network (NN) coarse-grained (CG) force fields compared to traditional CG molecular mechanics force fields. We conclude that NN force fields are able to extrapolate and sample from unseen regions of the free energy surface when trained with limited data. Our results come from 88 NN force fields trained on different combinations of clustered free energy surfaces from four protein mapped trajectories. We used a statistical measure named total variation similarity to assess the agreement between reference free energy surfaces from mapped atomistic simulations and CG simulations from trained NN force fields. Our conclusions support the hypothesis that NN CG force fields trained with samples from one region of the proteins' free energy surface can, indeed, extrapolate to unseen regions. Additionally, the force matching error was found to only be weakly correlated with a force field's ability to reconstruct the correct free energy surface.
Affiliation(s)
- Geemi P. Wellawatte
- Department of Chemistry, University of Rochester, Rochester, New York 14627, USA
| | - Glen M. Hocky
- Department of Chemistry, Simons Center for Computational Physical Chemistry, New York University, New York, New York 10003, USA
| | - Andrew D. White
- Department of Chemical Engineering, University of Rochester, Rochester, New York 14627, USA
| |
|
23
|
Zaporozhets I, Clementi C. Multibody Terms in Protein Coarse-Grained Models: A Top-Down Perspective. J Phys Chem B 2023; 127:6920-6927. [PMID: 37499123 DOI: 10.1021/acs.jpcb.3c04493] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/29/2023]
Abstract
Coarse-grained models allow computational investigation of biomolecular processes occurring on long time and length scales, intractable with atomistic simulation. Traditionally, many coarse-grained models rely mostly on pairwise interaction potentials. However, the decimation of degrees of freedom should, in principle, lead to a complex many-body effective interaction potential. In this work, we use experimental data on mutant stability to parametrize coarse-grained models for two proteins with and without many-body terms. We demonstrate that many-body terms are necessary to reproduce quantitatively the effects of point mutations on protein stability, particularly to implicitly take into account the effect of the solvent.
Affiliation(s)
- Iryna Zaporozhets
- Department of Chemistry, Rice University, 6100 Main Street, Houston, Texas 77005, United States
- Center for Theoretical Biological Physics, Rice University, 6100 Main Street, Houston, Texas 77005, United States
- Department of Physics, Freie Universität, Arnimallee 12, Berlin 14195, Germany
| | - Cecilia Clementi
- Department of Chemistry, Rice University, 6100 Main Street, Houston, Texas 77005, United States
- Center for Theoretical Biological Physics, Rice University, 6100 Main Street, Houston, Texas 77005, United States
- Department of Physics, Freie Universität, Arnimallee 12, Berlin 14195, Germany
| |
|
24
|
Sahrmann P, Loose TD, Durumeric AEP, Voth GA. Utilizing Machine Learning to Greatly Expand the Range and Accuracy of Bottom-Up Coarse-Grained Models through Virtual Particles. J Chem Theory Comput 2023; 19:4402-4413. [PMID: 36802592 PMCID: PMC10373655 DOI: 10.1021/acs.jctc.2c01183] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/22/2022] [Indexed: 02/22/2023]
Abstract
Coarse-grained (CG) models parametrized using atomistic reference data, i.e., "bottom up" CG models, have proven useful in the study of biomolecules and other soft matter. However, the construction of highly accurate, low resolution CG models of biomolecules remains challenging. We demonstrate in this work how virtual particles, CG sites with no atomistic correspondence, can be incorporated into CG models within the context of relative entropy minimization (REM) as latent variables. The methodology presented, variational derivative relative entropy minimization (VD-REM), enables optimization of virtual particle interactions through a gradient descent algorithm aided by machine learning. We apply this methodology to the challenging case of a solvent-free CG model of a 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC) lipid bilayer and demonstrate that introduction of virtual particles captures solvent-mediated behavior and higher-order correlations which REM alone cannot capture in a more standard CG model based only on the mapping of collections of atoms to the CG sites.
Affiliation(s)
- Patrick G. Sahrmann
- Department of Chemistry, Chicago Center for Theoretical Chemistry, James Franck Institute, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637, United States
| | - Timothy D. Loose
- Department of Chemistry, Chicago Center for Theoretical Chemistry, James Franck Institute, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637, United States
| | - Aleksander E. P. Durumeric
- Department of Chemistry, Chicago Center for Theoretical Chemistry, James Franck Institute, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637, United States
| | - Gregory A. Voth
- Department of Chemistry, Chicago Center for Theoretical Chemistry, James Franck Institute, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637, United States
| |
|
25
|
Conev A, Rigo MM, Devaurs D, Fonseca AF, Kalavadwala H, de Freitas MV, Clementi C, Zanatta G, Antunes DA, Kavraki LE. EnGens: a computational framework for generation and analysis of representative protein conformational ensembles. Brief Bioinform 2023; 24:bbad242. [PMID: 37418278 PMCID: PMC10359083 DOI: 10.1093/bib/bbad242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Revised: 05/23/2023] [Accepted: 06/10/2023] [Indexed: 07/08/2023] Open
Abstract
Proteins are dynamic macromolecules that perform vital functions in cells. A protein structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understand their mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insights into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in the number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing representative protein conformational ensembles. In this work, we: (1) provide an overview of existing methods and tools for representative protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package, and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; (3) test our pipeline on a few canonical examples from the literature. Representative ensembles produced by EnGens can be used for many downstream tasks such as protein-ligand ensemble docking, Markov state modeling of protein dynamics and analysis of the effect of single-point mutations.
Affiliation(s)
- Anja Conev
- Department of Computer Science, Rice University, Houston 77005, TX, USA
| | | | - Didier Devaurs
- MRC Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK
| | | | - Hussain Kalavadwala
- Department of Biology and Biochemistry, University of Houston, Houston 77004, TX, USA
| | | | - Cecilia Clementi
- Department of Physics, Freie Universität Berlin, Berlin 14195, Germany
| | - Geancarlo Zanatta
- Department of Biophysics, Institute of Biosciences, Federal University of Rio Grande do Sul, Porto Alegre 91501-970, Brazil
| | - Dinler Amaral Antunes
- Department of Biology and Biochemistry, University of Houston, Houston 77004, TX, USA
| | - Lydia E Kavraki
- Department of Computer Science, Rice University, Houston 77005, TX, USA
| |
|
26
|
Bhatia H, Aydin F, Carpenter TS, Lightstone FC, Bremer PT, Ingólfsson HI, Nissley DV, Streitz FH. The confluence of machine learning and multiscale simulations. Curr Opin Struct Biol 2023; 80:102569. [PMID: 36966691 DOI: 10.1016/j.sbi.2023.102569] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/14/2022] [Revised: 01/31/2023] [Accepted: 02/08/2023] [Indexed: 06/04/2023]
Abstract
Multiscale modeling has a long history of use in structural biology, as computational biologists strive to overcome the time- and length-scale limits of atomistic molecular dynamics. Contemporary machine learning techniques, such as deep learning, have promoted advances in virtually every field of science and engineering and are revitalizing the traditional notions of multiscale modeling. Deep learning has found success in various approaches for distilling information from fine-scale models, such as building surrogate models and guiding the development of coarse-grained potentials. However, perhaps its most powerful use in multiscale modeling is in defining latent spaces that enable efficient exploration of conformational space. This confluence of machine learning and multiscale simulation with modern high-performance computing promises a new era of discovery and innovation in structural biology.
Affiliation(s)
- Harsh Bhatia
- Computing Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA. https://twitter.com/@harshbhatia85
| | - Fikret Aydin
- Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
| | - Timothy S Carpenter
- Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
| | - Felice C Lightstone
- Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
| | - Peer-Timo Bremer
- Computing Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
| | - Helgi I Ingólfsson
- Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
| | - Dwight V Nissley
- RAS Initiative, The Cancer Research Technology Program, Frederick National Laboratory, Frederick, MD, 21701, USA.
| | - Frederick H Streitz
- Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA.
| |
|
27
|
Conev A, Rigo MM, Devaurs D, Fonseca AF, Kalavadwala H, de Freitas MV, Clementi C, Zanatta G, Antunes DA, Kavraki L. EnGens: a computational framework for generation and analysis of representative protein conformational ensembles. bioRxiv 2023:2023.04.24.538094. [PMID: 37163076 PMCID: PMC10168271 DOI: 10.1101/2023.04.24.538094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Proteins are dynamic macromolecules that perform vital functions in cells. A protein structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understand their mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insights into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in the number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing protein conformational ensembles. In this work, we: (1) provide an overview of existing methods and tools for protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package, and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; (3) test our pipeline on a few canonical examples found in the literature. Representative ensembles produced by EnGens can be used for many downstream tasks such as protein-ligand ensemble docking, Markov state modeling of protein dynamics and analysis of the effect of single-point mutations.
|
28
|
Liu S, Wang C, Latham A, Ding X, Zhang B. OpenABC Enables Flexible, Simplified, and Efficient GPU Accelerated Simulations of Biomolecular Condensates. bioRxiv 2023:2023.04.19.537533. [PMID: 37131742 PMCID: PMC10153273 DOI: 10.1101/2023.04.19.537533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Biomolecular condensates are important structures in various cellular processes but are challenging to study using traditional experimental techniques. In silico simulations with residue-level coarse-grained models strike a balance between computational efficiency and chemical accuracy. They could offer valuable insights by connecting the emergent properties of these complex systems with molecular sequences. However, existing coarse-grained models often lack easy-to-follow tutorials and are implemented in software that is not optimal for condensate simulations. To address these issues, we introduce OpenABC, a software package that greatly simplifies the setup and execution of coarse-grained condensate simulations with multiple force fields using Python scripting. OpenABC seamlessly integrates with the OpenMM molecular dynamics engine, enabling efficient simulations whose performance on a single GPU rivals the speed achieved by hundreds of CPUs. We also provide tools that convert coarse-grained configurations to all-atom structures for atomistic simulations. We anticipate that OpenABC will significantly facilitate the adoption of in silico simulations by a broader community to investigate the structural and dynamical properties of condensates. OpenABC is available at https://github.com/ZhangGroup-MITChemistry/OpenABC.
Affiliation(s)
- Shuming Liu
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Cong Wang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Andrew Latham
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Xinqiang Ding
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Bin Zhang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
| |
|
29
|
Krämer A, Durumeric AEP, Charron NE, Chen Y, Clementi C, Noé F. Statistically Optimal Force Aggregation for Coarse-Graining Molecular Dynamics. J Phys Chem Lett 2023; 14:3970-3979. [PMID: 37079800 DOI: 10.1021/acs.jpclett.3c00444] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 05/03/2023]
Abstract
Machine-learned coarse-grained (CG) models have the potential for simulating large molecular complexes beyond what is possible with atomistic molecular dynamics. However, training accurate CG models remains a challenge. A widely used methodology for learning bottom-up CG force fields maps forces from all-atom molecular dynamics to the CG representation and matches them with a CG force field on average. We show that there is flexibility in how to map all-atom forces to the CG representation and that the most commonly used mapping methods are statistically inefficient and potentially even incorrect in the presence of constraints in the all-atom simulation. We define an optimization statement for force mappings and demonstrate that substantially improved CG force fields can be learned from the same simulation data when using optimized force maps. The method is demonstrated on the miniproteins chignolin and tryptophan cage and published as open-source code.
Affiliation(s)
- Andreas Krämer
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
| | - Aleksander E P Durumeric
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
| | - Nicholas E Charron
- Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77251, United States
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
| | - Yaoyi Chen
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- International Max Planck Research School for Biology and Computation (IMPRS-BAC), Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
| | - Cecilia Clementi
- Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77251, United States
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
| | - Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Microsoft Research AI4Science, Karl-Liebknecht Straße 32, 10178 Berlin, Germany
| |
|
30
|
Bryer AJ, Rey JS, Perilla JR. Performance efficient macromolecular mechanics via sub-nanometer shape based coarse graining. Nat Commun 2023; 14:2014. [PMID: 37037809 PMCID: PMC10086035 DOI: 10.1038/s41467-023-37801-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/26/2022] [Accepted: 03/30/2023] [Indexed: 04/12/2023] Open
Abstract
Dimensionality reduction via coarse grain modeling is a valuable tool in biomolecular research. For large assemblies, ultra coarse models are often knowledge-based, relying on a priori information to parameterize models, which hinders general predictive capability. Here, we present substantial advances to the shape based coarse graining (SBCG) method, which we refer to as SBCG2. SBCG2 utilizes a revitalized formulation of the topology representing network, which makes high-granularity modeling possible, preserving atomistic details that maintain assembly characteristics. Further, we present a method of granularity selection based on charge density Fourier Shell Correlation and have additionally developed a refinement method to optimize, adjust and validate high-granularity models. We demonstrate our approach with the conical HIV-1 capsid and heteromultimeric cofilin-2 bound actin filaments. Our approach is available in the Visual Molecular Dynamics (VMD) software suite, and employs a CHARMM-compatible Hamiltonian that enables high-performance simulation in the GPU-resident NAMD3 molecular dynamics engine.
Affiliation(s)
- Alexander J Bryer
- Department of Chemistry and Biochemistry, University of Delaware, Newark, DE, 19716, USA
| | - Juan S Rey
- Department of Chemistry and Biochemistry, University of Delaware, Newark, DE, 19716, USA
| | - Juan R Perilla
- Department of Chemistry and Biochemistry, University of Delaware, Newark, DE, 19716, USA.
| |
|
31
|
Chennakesavalu S, Toomer DJ, Rotskoff GM. Ensuring thermodynamic consistency with invertible coarse-graining. J Chem Phys 2023; 158:124126. [PMID: 37003724 DOI: 10.1063/5.0141888] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 04/03/2023] Open
Abstract
Coarse-grained models are a core computational tool in theoretical chemistry and biophysics. A judicious choice of a coarse-grained model can yield physical insights by isolating the essential degrees of freedom that dictate the thermodynamic properties of a complex, condensed-phase system. The reduced complexity of the model typically leads to lower computational costs and more efficient sampling compared with atomistic models. Designing "good" coarse-grained models is an art. Generally, the mapping from fine-grained configurations to coarse-grained configurations itself is not optimized in any way; instead, the energy function associated with the mapped configurations is. In this work, we explore the consequences of optimizing the coarse-grained representation alongside its potential energy function. We use a graph machine learning framework to embed atomic configurations into a low-dimensional space to produce efficient representations of the original molecular system. Because the representation we obtain is no longer directly interpretable as a real-space representation of the atomic coordinates, we also introduce an inversion process and an associated thermodynamic consistency relation that allows us to rigorously sample fine-grained configurations conditioned on the coarse-grained sampling. We show that this technique is robust, recovering the first two moments of the distribution of several observables in proteins such as chignolin and alanine dipeptide.
Affiliation(s)
| | - David J Toomer
- Department of Chemistry, Stanford University, Stanford, California 94305, USA
| | - Grant M Rotskoff
- Department of Chemistry, Stanford University, Stanford, California 94305, USA
| |
|
32
|
Ricci E, Vergadou N. Integrating Machine Learning in the Coarse-Grained Molecular Simulation of Polymers. J Phys Chem B 2023; 127:2302-2322. [PMID: 36888553 DOI: 10.1021/acs.jpcb.2c06354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2023]
Abstract
Machine learning (ML) is having an increasing impact on the physical sciences, engineering, and technology, and its integration into molecular simulation frameworks holds great potential to expand their scope of applicability to complex materials and facilitate fundamental knowledge and reliable property predictions, contributing to the development of efficient materials design routes. The application of ML in materials informatics in general, and polymer informatics in particular, has led to interesting results; however, great untapped potential lies in the integration of ML techniques into the multiscale molecular simulation methods for the study of macromolecular systems, specifically in the context of coarse-grained (CG) simulations. In this Perspective, we aim to present the pioneering recent research efforts in this direction and to discuss how these new ML-based techniques can contribute to critical aspects of the development of multiscale molecular simulation methods for bulk complex chemical systems, especially polymers. Prerequisites for the implementation of such ML-integrated methods and open challenges that need to be met toward the development of general systematic ML-based coarse graining schemes for polymers are discussed.
Affiliation(s)
- Eleonora Ricci
- Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
- Institute of Informatics and Telecommunications, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
| | - Niki Vergadou
- Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
| |
|
33
|
Guidarelli Mattioli F, Sciortino F, Russo J. A neural network potential with self-trained atomic fingerprints: A test with the mW water potential. J Chem Phys 2023; 158:104501. [PMID: 36922151 DOI: 10.1063/5.0139245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023] Open
Abstract
We present a neural network (NN) potential based on a new set of atomic fingerprints built upon two- and three-body contributions that probe distances and local orientational order, respectively. Compared with the existing NN potentials, the atomic fingerprints depend on a small set of tunable parameters that are trained together with the NN weights. In addition to simplifying the selection of the atomic fingerprints, this strategy can also considerably increase the overall accuracy of the network representation. To tackle the simultaneous training of the atomic fingerprint parameters and NN weights, we adopt an annealing protocol that progressively cycles the learning rate, significantly improving the accuracy of the NN potential. We test the performance of the network potential against the mW model of water, which is a classical three-body potential that captures the anomalies of the liquid phase well. Trained on just three state points, the NN potential is able to reproduce the mW model in a very wide range of densities and temperatures, from negative pressures to several GPa, capturing the transition from an open random tetrahedral network to a dense interpenetrated network. The NN potential also reproduces very well the properties for which it was not explicitly trained, such as dynamical properties and the structure of the stable crystalline phases of mW.
Affiliation(s)
| | | | - John Russo
- Sapienza University of Rome, Piazzale Aldo Moro 2, 00185 Rome, Italy
| |
|
34
|
Raabe D, Mianroodi JR, Neugebauer J. Accelerating the design of compositionally complex materials via physics-informed artificial intelligence. Nat Comput Sci 2023; 3:198-209. [PMID: 38177883 DOI: 10.1038/s43588-023-00412-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 08/11/2022] [Accepted: 02/07/2023] [Indexed: 01/06/2024]
Abstract
The chemical space for designing materials is practically infinite. This makes disruptive progress by traditional physics-based modeling alone challenging. Yet, training data for identifying composition-structure-property relations by artificial intelligence are sparse. We discuss opportunities to discover new chemically complex materials by hybrid methods where physics laws are combined with artificial intelligence.
Affiliation(s)
- Dierk Raabe
- Max-Planck-Institut für Eisenforschung, Düsseldorf, Germany.
| | | | - Jörg Neugebauer
- Max-Planck-Institut für Eisenforschung, Düsseldorf, Germany.
| |
|
35
|
Yang W, Templeton C, Rosenberger D, Bittracher A, Nüske F, Noé F, Clementi C. Slicing and Dicing: Optimal Coarse-Grained Representation to Preserve Molecular Kinetics. ACS Cent Sci 2023; 9:186-196. [PMID: 36844497 PMCID: PMC9951291 DOI: 10.1021/acscentsci.2c01200] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/10/2022] [Indexed: 05/05/2023]
Abstract
The aim of molecular coarse-graining approaches is to recover relevant physical properties of the molecular system via a lower-resolution model that can be more efficiently simulated. Ideally, the lower resolution still accounts for the degrees of freedom necessary to recover the correct physical behavior. The selection of these degrees of freedom has often relied on the scientist's chemical and physical intuition. In this article, we make the argument that, in soft matter contexts, desirable coarse-grained models accurately reproduce the long-time dynamics of a system by correctly capturing the rare-event transitions. We propose a bottom-up coarse-graining scheme that correctly preserves the relevant slow degrees of freedom, and we test this idea for three systems of increasing complexity. We show that, in contrast to this method, existing coarse-graining schemes, such as those from information theory or structure-based approaches, are not able to recapitulate the slow time scales of the system.
Affiliation(s)
- Wangfei Yang
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Graduate Program in Systems, Synthetic and Physical Biology, Rice University, Houston, Texas 77005, United States
| | - Clark Templeton
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
| | - David Rosenberger
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
| | - Andreas Bittracher
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
| | - Feliks Nüske
- Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstrasse 1, 39106 Magdeburg, Germany
| | - Frank Noé
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
| | - Cecilia Clementi
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Department of Physics, Rice University, Houston, Texas 77005, United States
| |
|
36
|
Lalmansingh JM, Keeley AT, Ruff KM, Pappu RV, Holehouse AS. SOURSOP: A Python package for the analysis of simulations of intrinsically disordered proteins. bioRxiv 2023:2023.02.16.528879. [PMID: 36824878 PMCID: PMC9949127 DOI: 10.1101/2023.02.16.528879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2023]
Abstract
Conformational heterogeneity is a defining hallmark of intrinsically disordered proteins and protein regions (IDRs). The functions of IDRs and the emergent cellular phenotypes they control are associated with sequence-specific conformational ensembles. Simulations of conformational ensembles that are based on atomistic and coarse-grained models are routinely used to uncover the sequence-specific interactions that may contribute to IDR functions. These simulations are performed either independently or in conjunction with data from experiments. Functionally relevant features of IDRs can span a range of length scales. Extracting these features requires analysis routines that quantify a range of properties. Here, we describe a new analysis suite SOURSOP, an object-oriented and open-source toolkit designed for the analysis of simulated conformational ensembles of IDRs. SOURSOP implements several analysis routines motivated by principles in polymer physics, offering a unique collection of simple-to-use functions to characterize IDR ensembles. As an extendable framework, SOURSOP supports the development and implementation of new analysis routines that can be easily packaged and shared.
|
37
|
Köhler J, Chen Y, Krämer A, Clementi C, Noé F. Flow-Matching: Efficient Coarse-Graining of Molecular Dynamics without Forces. J Chem Theory Comput 2023; 19:942-952. [PMID: 36668906 DOI: 10.1021/acs.jctc.3c00016] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Indexed: 01/21/2023]
Abstract
Coarse-grained (CG) molecular simulations have become a standard tool to study molecular processes on time and length scales inaccessible to all-atom simulations. Parametrizing CG force fields to match all-atom simulations has mainly relied on force-matching or relative entropy minimization, which require many samples from costly simulations with all-atom or CG resolutions, respectively. Here we present flow-matching, a new training method for CG force fields that combines the advantages of both methods by leveraging normalizing flows, a generative deep learning method. Flow-matching first trains a normalizing flow to represent the CG probability density, which is equivalent to minimizing the relative entropy without requiring iterative CG simulations. Subsequently, the flow generates samples and forces according to the learned distribution in order to train the desired CG free energy model via force-matching. Even without requiring forces from the all-atom simulations, flow-matching outperforms classical force-matching by an order of magnitude in terms of data efficiency and produces CG models that can capture the folding and unfolding transitions of small proteins.
Affiliation(s)
- Jonas Köhler
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
| | - Yaoyi Chen
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
| | - Andreas Krämer
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
| | - Cecilia Clementi
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Department of Physics, Rice University, Houston, Texas 77005, United States
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
| | - Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Microsoft Research AI4Science, Karl-Liebknecht Strasse 32, 10178 Berlin, Germany
| |
|
38
|
Direct generation of protein conformational ensembles via machine learning. Nat Commun 2023; 14:774. [PMID: 36774359 PMCID: PMC9922302 DOI: 10.1038/s41467-023-36443-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 06/18/2022] [Accepted: 02/01/2023] [Indexed: 02/13/2023] Open
Abstract
Dynamics and conformational sampling are essential for linking protein structure to biological function. Because protein dynamics are challenging to probe experimentally, computer simulations are widely used to describe them, but at significant computational costs that continue to limit the systems that can be studied. Here, we demonstrate that machine learning can be trained with simulation data to directly generate physically realistic conformational ensembles of proteins without the need for any sampling and at negligible computational cost. As a proof-of-principle, we train a generative adversarial network based on a transformer architecture with self-attention on coarse-grained simulations of intrinsically disordered peptides. The resulting model, idpGAN, can predict sequence-dependent coarse-grained ensembles for sequences that are not present in the training set, demonstrating that transferability can be achieved beyond the limited training data. We also retrain idpGAN on atomistic simulation data to show that the approach can be extended in principle to higher-resolution conformational ensemble generation.
|
39
|
Thaler S, Stupp M, Zavadlav J. Deep coarse-grained potentials via relative entropy minimization. J Chem Phys 2022; 157:244103. [PMID: 36586977 DOI: 10.1063/5.0124538] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/30/2022] Open
Abstract
Neural network (NN) potentials are a natural choice for coarse-grained (CG) models. Their many-body capacity allows highly accurate approximations of the potential of mean force, promising CG simulations of unprecedented accuracy. CG NN potentials trained bottom-up via force matching (FM), however, suffer from finite data effects: They rely on prior potentials for physically sound predictions outside the training data domain, and the corresponding free energy surface is sensitive to errors in the transition regions. The standard alternative to FM for classical potentials is relative entropy (RE) minimization, which has not yet been applied to NN potentials. In this work, we demonstrate, for benchmark problems of liquid water and alanine dipeptide, that RE training is more data efficient because it accesses the CG distribution during training, resulting in improved free energy surfaces and reduced sensitivity to prior potentials. In addition, RE learns to correct time integration errors, allowing larger time steps in CG molecular dynamics simulation while maintaining accuracy. Thus, our findings support the use of training objectives beyond FM as a promising direction for improving the accuracy and reliability of CG NN potentials.
Affiliation(s)
- Stephan Thaler
- Multiscale Modeling of Fluid Materials, Department of Engineering Physics and Computation, TUM School of Engineering and Design, Technical University of Munich, Munich, Germany
| | - Maximilian Stupp
- Multiscale Modeling of Fluid Materials, Department of Engineering Physics and Computation, TUM School of Engineering and Design, Technical University of Munich, Munich, Germany
| | - Julija Zavadlav
- Multiscale Modeling of Fluid Materials, Department of Engineering Physics and Computation, TUM School of Engineering and Design, Technical University of Munich, Munich, Germany
| |
|
40
|
Shmilovich K, Stieffenhofer M, Charron NE, Hoffmann M. Temporally Coherent Backmapping of Molecular Trajectories From Coarse-Grained to Atomistic Resolution. J Phys Chem A 2022; 126:9124-9139. [PMID: 36417670 PMCID: PMC9743211 DOI: 10.1021/acs.jpca.2c07716] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/24/2022]
Abstract
Coarse-graining offers a means to extend the achievable time and length scales of molecular dynamics simulations beyond what is practically possible in the atomistic regime. Sampling molecular configurations of interest can be done efficiently using coarse-grained simulations, from which meaningful physicochemical information can be inferred if the corresponding all-atom configurations are reconstructed. However, this procedure of backmapping to reintroduce the lost atomistic detail into coarse-grain structures has proven a challenging task due to the many feasible atomistic configurations that can be associated with one coarse-grain structure. Existing backmapping methods are strictly frame-based, relying on either heuristics to replace coarse-grain particles with atomic fragments and subsequent relaxation or parametrized models to propose atomic coordinates separately and independently for each coarse-grain structure. These approaches neglect information from previous trajectory frames that is critical to ensuring temporal coherence of the backmapped trajectory, while also offering information potentially helpful to producing higher-fidelity atomic reconstructions. In this work, we present a deep learning-enabled data-driven approach for temporally coherent backmapping that explicitly incorporates information from preceding trajectory structures. Our method trains a conditional variational autoencoder to nondeterministically reconstruct atomistic detail conditioned on both the target coarse-grain configuration and the previously reconstructed atomistic configuration. We demonstrate our backmapping approach on two exemplar biomolecular systems: alanine dipeptide and the miniprotein chignolin. We show that our backmapped trajectories accurately recover the structural, thermodynamic, and kinetic properties of the atomistic trajectory data.
Affiliation(s)
- Kirill Shmilovich
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | - Nicholas E. Charron
- Weiss School of Natural Sciences, Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
- Department of Physics, Freie Universität Berlin, Berlin 14195, Germany
| | - Moritz Hoffmann
- Fachbereich Mathematik und Informatik, Freie Universität Berlin, Berlin 14195, Germany
|
41
|
Reiser P, Neubert M, Eberhard A, Torresi L, Zhou C, Shao C, Metni H, van Hoesel C, Schopmans H, Sommer T, Friederich P. Graph neural networks for materials science and chemistry. Communications Materials 2022; 3:93. [PMID: 36468086 PMCID: PMC9702700 DOI: 10.1038/s43246-022-00315-6] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Received: 05/10/2022] [Accepted: 11/07/2022] [Indexed: 05/14/2023]
Abstract
Machine learning plays an increasingly important role in many areas of chemistry and materials science, being used to predict materials properties, accelerate simulations, design new structures, and predict synthesis routes of new materials. Graph neural networks (GNNs) are one of the fastest growing classes of machine learning models. They are of particular relevance for chemistry and materials science, as they directly work on a graph or structural representation of molecules and materials and therefore have full access to all relevant information required to characterize materials. In this Review, we provide an overview of the basic principles of GNNs, widely used datasets, and state-of-the-art architectures, followed by a discussion of a wide range of recent applications of GNNs in chemistry and materials science, and concluding with a road-map for the further development and application of GNNs.
Affiliation(s)
- Patrick Reiser
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
| | - Marlen Neubert
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
| | - André Eberhard
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
| | - Luca Torresi
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
| | - Chen Zhou
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
| | - Chen Shao
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Present Address: Institute for Applied Informatics and Formal Description Systems, Karlsruhe Institute of Technology, Kaiserstr. 89, 76133 Karlsruhe, Germany
| | - Houssam Metni
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- ECPM, Université de Strasbourg, 25 Rue Becquerel, 67087 Strasbourg, France
| | - Clint van Hoesel
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Department of Applied Physics, Eindhoven University of Technology, Groene Loper 19, 5612 AP Eindhoven, The Netherlands
| | - Henrik Schopmans
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
| | - Timo Sommer
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute for Theory of Condensed Matter, Karlsruhe Institute of Technology, Wolfgang-Gaede-Str. 1, 76131 Karlsruhe, Germany
- Present Address: School of Chemistry, Trinity College Dublin, College Green, Dublin 2, Ireland
| | - Pascal Friederich
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
|
42
|
Abstract
Coarse-grained models have proven helpful for simulating complex systems over long time scales to provide molecular insights into various processes. Methodologies for systematic parametrization of the underlying energy function or force field that describes the interactions among different components of the system are of great interest for ensuring simulation accuracy. We present a new method, potential contrasting, to enable efficient learning of force fields that can accurately reproduce the conformational distribution produced with all-atom simulations. Potential contrasting generalizes the noise contrastive estimation method with umbrella sampling to better learn the complex energy landscape of molecular systems. When applied to the Trp-cage protein, we found that the technique produces force fields that thoroughly capture the thermodynamics of the folding process despite the use of only α-carbons in the coarse-grained model. We further showed that potential contrasting could be applied over large data sets that combine the conformational ensembles of many proteins to improve force field transferability. We anticipate potential contrasting as a powerful tool for building general-purpose coarse-grained force fields.
Affiliation(s)
- Xinqiang Ding
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Bin Zhang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
|
43
|
Jin J, Pak AJ, Durumeric AEP, Loose TD, Voth GA. Bottom-up Coarse-Graining: Principles and Perspectives. J Chem Theory Comput 2022; 18:5759-5791. [PMID: 36070494 PMCID: PMC9558379 DOI: 10.1021/acs.jctc.2c00643] [Citation(s) in RCA: 72] [Impact Index Per Article: 36.0] [Received: 06/20/2022] [Indexed: 01/14/2023]
Abstract
Large-scale computational molecular models provide scientists a means to investigate the effect of microscopic details on emergent mesoscopic behavior. Elucidating the relationship between variations on the molecular scale and macroscopic observable properties facilitates an understanding of the molecular interactions driving the properties of real world materials and complex systems (e.g., those found in biology, chemistry, and materials science). As a result, discovering an explicit, systematic connection between microscopic nature and emergent mesoscopic behavior is a fundamental goal for this type of investigation. The molecular forces critical to driving the behavior of complex heterogeneous systems are often unclear. More problematically, simulations of representative model systems are often prohibitively expensive from both spatial and temporal perspectives, impeding straightforward investigations over possible hypotheses characterizing molecular behavior. While the reduction in resolution of a study, such as moving from an atomistic simulation to that of the resolution of large coarse-grained (CG) groups of atoms, can partially ameliorate the cost of individual simulations, the relationship between the proposed microscopic details and this intermediate resolution is nontrivial and presents new obstacles to study. Small portions of these complex systems can be realistically simulated. Alone, these smaller simulations likely do not provide insight into collectively emergent behavior. However, by proposing that the driving forces in both smaller and larger systems (containing many related copies of the smaller system) have an explicit connection, systematic bottom-up CG techniques can be used to transfer CG hypotheses discovered using a smaller scale system to a larger system of primary interest. 
The proposed connection between different CG systems is prescribed by (i) the CG representation (mapping) and (ii) the functional form and parameters used to represent the CG energetics, which approximate potentials of mean force (PMFs). As a result, the design of CG methods that facilitate a variety of physically relevant representations, approximations, and force fields is critical to moving the frontier of systematic CG forward. Crucially, the proposed connection between the system used for parametrization and the system of interest is orthogonal to the optimization used to approximate the potential of mean force present in all systematic CG methods. The empirical efficacy of machine learning techniques on a variety of tasks provides strong motivation to consider these approaches for approximating the PMF and analyzing these approximations.
Affiliation(s)
- Jaehyeok Jin
- Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
| | - Alexander J. Pak
- Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
| | - Aleksander E. P. Durumeric
- Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
| | - Timothy D. Loose
- Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
| | - Gregory A. Voth
- Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
|
44
|
Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:biom12091246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022] Open
Abstract
Machine learning (ML) has been an important tool in computational biology, used to elucidate protein function for decades. With the recent burgeoning of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology dealing with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction, protein engineering using sequence modifications to achieve stability and druggability characteristics, molecular docking in terms of protein–ligand binding, including allosteric effects, protein–protein interactions and protein-centric drug discovery. To quantify the mechanisms underlying protein function, a holistic approach that takes structure, flexibility, stability, and dynamics into account is required, as these aspects become inseparable through their interdependence. Another key component of protein function is conformational dynamics, which often manifest as protein kinetics. Computational methods that use ML to generate representative conformational ensembles and quantify differences in conformational ensembles important for function are included in this review. Future opportunities are highlighted for each of these topics.
|
45
|
Miszta P, Pasznik P, Niewieczerzał S, Młynarczyk K, Filipek S. COGRIMEN: Coarse-Grained Method for Modeling of Membrane Proteins in Implicit Environments. J Chem Theory Comput 2022; 18:5145-5156. [PMID: 35998323 PMCID: PMC9476660 DOI: 10.1021/acs.jctc.2c00140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
The presented methodology is based on coarse-grained representation of biomolecules in implicit environments and is designed for the molecular dynamics simulations of membrane proteins and their complexes. The membrane proteins are not only found in the cell membrane but also in all membranous compartments of the cell: Golgi apparatus, mitochondria, endosomes and lysosomes, and they usually form large complexes. To investigate such systems the methodology is proposed based on two independent approaches combining the coarse-grained MARTINI model for proteins and the effective energy function to mimic the water/membrane environments. The latter is based on the implicit environment developed for all-atom simulations in the IMM1 method. The force field solvation parameters for COGRIMEN were initially calculated from IMM1 all-atom parameters and then optimized using Genetic Algorithms. The new methodology was tested on membrane proteins, their complexes and oligomers. COGRIMEN method is implemented as a patch for NAMD program and can be useful for fast and brief studies of large membrane protein complexes.
Affiliation(s)
- Przemysław Miszta
- Faculty of Chemistry, Biological and Chemical Research Centre, University of Warsaw, Warsaw 02-093, Poland
| | - Paweł Pasznik
- Faculty of Chemistry, Biological and Chemical Research Centre, University of Warsaw, Warsaw 02-093, Poland
| | - Szymon Niewieczerzał
- Faculty of Chemistry, Biological and Chemical Research Centre, University of Warsaw, Warsaw 02-093, Poland
| | - Krzysztof Młynarczyk
- Faculty of Chemistry, Biological and Chemical Research Centre, University of Warsaw, Warsaw 02-093, Poland
| | - Sławomir Filipek
- Faculty of Chemistry, Biological and Chemical Research Centre, University of Warsaw, Warsaw 02-093, Poland
|
46
|
Yang H, Xiong Z, Zonta F. Construction of a Deep Neural Network Energy Function for Protein Physics. J Chem Theory Comput 2022; 18:5649-5658. [PMID: 35939398 PMCID: PMC9476656 DOI: 10.1021/acs.jctc.2c00069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
The traditional approach of computational biology consists of calculating molecule properties by using approximate classical potentials. Interactions between atoms are described by an energy function derived from physical principles or fitted to experimental data. Their functional form is usually limited to pairwise interactions between atoms and does not consider complex multibody effects. More recently, neural networks have emerged as an alternative way of describing the interactions between biomolecules. In this approach, the energy function does not have an explicit functional form and is learned bottom-up from simulations at the atomistic or quantum level. In this study, we attempt a top-down approach and use deep learning methods to obtain an energy function by exploiting the large amount of experimental data acquired over the years in the field of structural biology. The energy function is represented by a probability density model learned from a large repertoire of building blocks representing local clusters of amino acids paired with their sequence signature. We demonstrated the feasibility of this approach by generating a neural network energy function and testing its validity on several applications such as discriminating decoys, assessing qualities of structural models, sampling structural conformations, and designing new protein sequences. We foresee that, in the future, our methodology could exploit the continuously increasing availability of experimental data and simulations and provide a new method for the parametrization of protein energy functions.
Affiliation(s)
- Huan Yang
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China
| | - Zhaoping Xiong
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China
| | - Francesco Zonta
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China
|
47
|
Marrink SJ, Monticelli L, Melo MN, Alessandri R, Tieleman DP, Souza PCT. Two decades of Martini: Better beads, broader scope. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1620] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Affiliation(s)
- Siewert J. Marrink
- Groningen Biomolecular Sciences and Biotechnology Institute & Zernike Institute for Advanced Materials, University of Groningen, Groningen, The Netherlands
| | - Luca Monticelli
- Molecular Microbiology and Structural Biochemistry (MMSB ‐ UMR 5086), CNRS & University of Lyon, Lyon, France
| | - Manuel N. Melo
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Oeiras, Portugal
| | - Riccardo Alessandri
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois, USA
| | - D. Peter Tieleman
- Centre for Molecular Simulation and Department of Biological Sciences, University of Calgary, Alberta, Canada
| | - Paulo C. T. Souza
- Molecular Microbiology and Structural Biochemistry (MMSB ‐ UMR 5086), CNRS & University of Lyon, Lyon, France
|
48
|
Hudson PS, Aviat F, Meana-Pañeda R, Warrensford L, Pollard BC, Prasad S, Jones MR, Woodcock HL, Brooks BR. Obtaining QM/MM binding free energies in the SAMPL8 drugs of abuse challenge: indirect approaches. J Comput Aided Mol Des 2022; 36:263-277. [PMID: 35597880 PMCID: PMC9148874 DOI: 10.1007/s10822-022-00443-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2021] [Accepted: 02/17/2022] [Indexed: 11/28/2022]
Abstract
Accurately predicting free energy differences is essential in realizing the full potential of rational drug design. Unfortunately, high levels of accuracy often require computationally expensive QM/MM Hamiltonians. Fortuitously, the cost of employing QM/MM approaches in rigorous free energy simulation can be reduced through the use of the so-called “indirect” approach to QM/MM free energies, in which the need for QM/MM simulations is avoided via a QM/MM “correction” at the classical endpoints of interest. Herein, we focus on the computation of QM/MM binding free energies in the context of the SAMPL8 Drugs of Abuse host–guest challenge. Of the 5 QM/MM correction coupled with force-matching submissions, PM6-D3H4/MM ranked submission proved the best overall QM/MM entry, with an RMSE from experimental results of 2.43 kcal/mol (best in ranked submissions), a Pearson’s correlation of 0.78 (second-best in ranked submissions), and a Kendall τ correlation of 0.52 (best in ranked submissions).
Affiliation(s)
- Phillip S Hudson
- Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20852, USA
| | - Félix Aviat
- Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20852, USA
| | - Rubén Meana-Pañeda
- Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20852, USA
| | - Luke Warrensford
- Department of Chemistry, University of South Florida, Tampa, FL, 33620, USA
| | - Benjamin C Pollard
- Department of Chemistry, University of South Florida, Tampa, FL, 33620, USA
| | - Samarjeet Prasad
- Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20852, USA
| | - Michael R Jones
- Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20852, USA
| | - H Lee Woodcock
- Department of Chemistry, University of South Florida, Tampa, FL, 33620, USA
| | - Bernard R Brooks
- Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20852, USA
|
49
|
Bai Q, Liu S, Tian Y, Xu T, Banegas‐Luna AJ, Pérez‐Sánchez H, Huang J, Liu H, Yao X. Application advances of deep learning methods for de novo drug design and molecular dynamics simulation. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1581] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/30/2022]
Affiliation(s)
- Qifeng Bai
- Key Lab of Preclinical Study for New Drugs of Gansu Province, Institute of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu, China
| | - Shuo Liu
- School of Pharmacy, Lanzhou University, Lanzhou, Gansu, China
| | - Yanan Tian
- School of Pharmacy, Lanzhou University, Lanzhou, Gansu, China
| | - Tingyang Xu
- Tencent AI Lab, Shenzhen Tencent Computer Ltd, Shenzhen, China
| | - Antonio Jesús Banegas‐Luna
- Structural Bioinformatics and High Performance Computing Research Group (BIO‐HPC), Computer Engineering Department, UCAM Universidad Católica de Murcia, Murcia, Spain
| | - Horacio Pérez‐Sánchez
- Structural Bioinformatics and High Performance Computing Research Group (BIO‐HPC), Computer Engineering Department, UCAM Universidad Católica de Murcia, Murcia, Spain
| | - Junzhou Huang
- Tencent AI Lab, Shenzhen Tencent Computer Ltd, Shenzhen, China
| | - Huanxiang Liu
- School of Pharmacy, Lanzhou University, Lanzhou, Gansu, China
| | - Xiaojun Yao
- College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, Gansu, China
|
50
|
Ghorbani M, Prasad S, Klauda J, Brooks B. GraphVAMPNet, using graph neural networks and variational approach to Markov processes for dynamical modeling of biomolecules. J Chem Phys 2022; 156:184103. [PMID: 35568532 PMCID: PMC9094994 DOI: 10.1063/5.0085607] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 11/14/2022] Open
Abstract
Finding low dimensional representation of data from long-timescale trajectories of biomolecular processes such as protein-folding or ligand-receptor binding is of fundamental importance and kinetic models such as Markov modeling have proven useful in describing the kinetics of these systems. Recently, an unsupervised machine learning technique called VAMPNet was introduced to learn the low dimensional representation and linear dynamical model in an end-to-end manner. VAMPNet is based on variational approach to Markov processes (VAMP) and relies on neural networks to learn the coarse-grained dynamics. In this contribution, we combine VAMPNet and graph neural networks to generate an end-to-end framework to efficiently learn high-level dynamics and metastable states from the long-timescale molecular dynamics trajectories. This method bears the advantages of graph representation learning and uses graph message passing operations to generate an embedding for each datapoint which is used in the VAMPNet to generate a coarse-grained representation. This type of molecular representation results in a higher resolution and more interpretable Markov model than the standard VAMPNet enabling a more detailed kinetic study of the biomolecular processes. Our GraphVAMPNet approach is also enhanced with an attention mechanism to find the important residues for classification into different metastable states.
Affiliation(s)
- Mahdi Ghorbani
- University of Maryland at College Park, United States of America
| | - Samarjeet Prasad
- National Heart Lung and Blood Institute, United States of America
| | - Jeffery Klauda
- Chemical and Biomolecular Engineering, University of Maryland at College Park, United States of America
| | - Bernard Brooks
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, United States of America
|