1
Chennakesavalu S, Rotskoff GM. Data-Efficient Generation of Protein Conformational Ensembles with Backbone-to-Side-Chain Transformers. J Phys Chem B 2024; 128:2114-2123. [PMID: 38394363 DOI: 10.1021/acs.jpcb.3c08195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2024]
Abstract
Excitement at the prospect of using data-driven generative models to sample configurational ensembles of biomolecular systems stems from the extraordinary success of these models on a diverse set of high-dimensional sampling tasks. Unlike image generation or even the closely related problem of protein structure prediction, there are currently no data sources with sufficient breadth to parametrize generative models for conformational ensembles. To enable discovery, a fundamentally different approach to building generative models is required: models should be able to propose rare, albeit physical, conformations that may not arise in even the largest data sets. Here we introduce a modular strategy to generate conformations based on "backmapping" from a fixed protein backbone that (1) maintains conformational diversity of the side chains and (2) couples the side-chain fluctuations using global information about the protein conformation. Our model combines simple statistical models of side-chain conformations based on rotamer libraries with the now ubiquitous transformer architecture to sample with atomistic accuracy. Together, these ingredients provide a strategy for rapid data acquisition and hence a crucial ingredient for scalable physical simulation with generative neural networks.
Affiliation(s)
- Grant M Rotskoff
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
2
Palma Banos M, Popov AV, Hernandez R. Representability and Dynamical Consistency in Coarse-Grained Models. J Phys Chem B 2024; 128:1506-1514. [PMID: 38315661 DOI: 10.1021/acs.jpcb.3c08054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2024]
Abstract
We address the challenge of representativity and dynamical consistency when unbonded fine-grained particles are collected together into coarse-grained particles. We implement a hybrid procedure for identifying and tracking the underlying fine-grained particles (e.g., atoms or molecules) by exchanging them between the coarse-grained particles periodically at a characteristic time. The exchange involves a back-mapping of the coarse-grained particles into fine-grained particles and a subsequent reassignment to coarse-grained particles conserving total mass and momentum. We find that an appropriate choice of the characteristic exchange time can lead to the correct effective diffusion rate of the fine-grained particles when simulated in hybrid coarse-grained dynamics. In the compressed (supercritical) fluid regime, without the exchange term, fine-grained particles remain associated with a given coarse-grained particle, leading to substantially lower diffusion rates than seen in all-atom molecular dynamics of the fine-grained particles. Thus, this work confirms the need for addressing the representativity of fine-grained particles within coarse-grained particles and offers a simple exchange mechanism so as to retain dynamical consistency between the fine- and coarse-grained scales.
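The bookkeeping behind such an exchange can be illustrated with a minimal sketch. It assumes that each coarse-grained bead carries the summed mass and momentum of its assigned fine-grained particles and that an exchange is a plain swap of two assignments; the function names, the random toy data, and the swap rule are hypothetical and are not taken from the paper.

```python
import numpy as np

def coarse_grain(masses, velocities, labels, n_beads):
    """Sum fine-grained masses and momenta into their assigned CG beads."""
    bead_mass = np.bincount(labels, weights=masses, minlength=n_beads)
    bead_momentum = np.zeros((n_beads, velocities.shape[1]))
    # Accumulate m * v of every fine-grained particle into its bead.
    np.add.at(bead_momentum, labels, masses[:, None] * velocities)
    return bead_mass, bead_momentum

def exchange(labels, i, j):
    """Swap the bead assignments of fine-grained particles i and j (hypothetical rule)."""
    new_labels = labels.copy()
    new_labels[i], new_labels[j] = labels[j], labels[i]
    return new_labels

# Hypothetical toy data: 8 fine-grained particles, 2 per bead.
rng = np.random.default_rng(0)
masses = rng.uniform(1.0, 2.0, size=8)
velocities = rng.normal(size=(8, 3))
labels = np.repeat(np.arange(4), 2)

m0, p0 = coarse_grain(masses, velocities, labels, n_beads=4)
m1, p1 = coarse_grain(masses, velocities, exchange(labels, 0, 7), n_beads=4)
```

In this sketch the per-bead masses and momenta change after the swap, while their totals over all beads do not, which is the conservation property the abstract refers to.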
Affiliation(s)
- Manuel Palma Banos
- Department of Chemistry, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Alexander V Popov
- Department of Chemistry, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Rigoberto Hernandez
- Department of Chemistry, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Department of Chemical & Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Department of Materials Science & Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
3
Kidder KM, Shell MS, Noid WG. Surveying the energy landscape of coarse-grained mappings. J Chem Phys 2024; 160:054105. [PMID: 38310476 DOI: 10.1063/5.0182524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Accepted: 12/28/2023] [Indexed: 02/05/2024]
Abstract
Simulations of soft materials often adopt low-resolution coarse-grained (CG) models. However, the CG representation is not unique and its impact upon simulated properties is poorly understood. In this work, we investigate the space of CG representations for ubiquitin, which is a typical globular protein with 72 amino acids. We employ Monte Carlo methods to ergodically sample this space and to characterize its landscape. By adopting the Gaussian network model as an analytically tractable atomistic model for equilibrium fluctuations, we exactly assess the intrinsic quality of each CG representation without introducing any approximations in sampling configurations or in modeling interactions. We focus on two metrics, the spectral quality and the information content, that quantify the extent to which the CG representation preserves low-frequency, large-amplitude motions and configurational information, respectively. The spectral quality and information content are weakly correlated among high-resolution representations but become strongly anticorrelated among low-resolution representations. Representations with maximal spectral quality appear consistent with physical intuition, while low-resolution representations with maximal information content do not. Interestingly, quenching studies indicate that the energy landscape of mapping space is very smooth and highly connected. Moreover, our study suggests a critical resolution below which a "phase transition" qualitatively distinguishes good and bad representations.
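For readers unfamiliar with the Gaussian network model (GNM) mentioned in this abstract, the following is a minimal sketch of the standard construction: a Kirchhoff (connectivity) matrix built from a distance cutoff, whose pseudo-inverse gives the equilibrium covariance of fluctuations up to a constant prefactor. The 7.0 Å cutoff and the toy helical coordinates are assumptions for illustration, not ubiquitin data, and the sketch does not implement the paper's spectral quality or information content metrics.

```python
import numpy as np

def kirchhoff_matrix(coords, cutoff=7.0):
    """Build the GNM Kirchhoff matrix for an (N, 3) array of site coordinates."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise distances between all sites.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Off-diagonal entries are -1 for pairs within the cutoff, 0 otherwise.
    gamma = -(dists <= cutoff).astype(float)
    np.fill_diagonal(gamma, 0.0)
    # Each diagonal entry is the number of contacts of that site.
    np.fill_diagonal(gamma, -gamma.sum(axis=1))
    return gamma

def gnm_covariance(gamma):
    """Covariance of fluctuations, up to a factor of k_B T / spring constant."""
    return np.linalg.pinv(gamma)

# Hypothetical toy input: 10 sites along a helix (assumed geometry).
t = np.arange(10)
toy_coords = np.stack(
    [2.3 * np.cos(1.75 * t), 2.3 * np.sin(1.75 * t), 1.5 * t], axis=1
)
gamma = kirchhoff_matrix(toy_coords)
cov = gnm_covariance(gamma)
msf = np.diag(cov)  # relative mean-square fluctuation of each site
```

Because the model is Gaussian, quantities such as the covariance of any linear coarse-grained mapping follow from `cov` by matrix algebra, which is what makes exact comparisons between mappings tractable.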
Affiliation(s)
- Katherine M Kidder
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- M Scott Shell
- Department of Chemical Engineering, University of California, Santa Barbara, California 93106, USA
- W G Noid
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
4
Maier JC, Wang CI, Jackson NE. Distilling coarse-grained representations of molecular electronic structure with continuously gated message passing. J Chem Phys 2024; 160:024109. [PMID: 38193551 DOI: 10.1063/5.0179253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Accepted: 12/14/2023] [Indexed: 01/10/2024]
Abstract
Bottom-up methods for coarse-grained (CG) molecular modeling are critically needed to establish rigorous links between atomistic reference data and reduced molecular representations. For a target molecule, the ideal reduced CG representation is a function of both the conformational ensemble of the system and the target physical observable(s) to be reproduced at the CG resolution. However, there is an absence of algorithms for selecting CG representations of molecules from which complex properties, including molecular electronic structure, can be accurately modeled. We introduce continuously gated message passing (CGMP), a graph neural network (GNN) method for atomically decomposing molecular electronic structure sampled over conformational ensembles. CGMP integrates 3D-invariant GNNs and a novel gated message passing system to continuously reduce the atomic degrees of freedom accessible for electronic predictions, resulting in a one-shot importance ranking of atoms contributing to a target molecular property. Moreover, CGMP provides the first approach by which to quantify the degeneracy of "good" CG representations conditioned on specific prediction targets, facilitating the development of more transferable CG representations. We further show how CGMP can be used to highlight multiatom correlations, illuminating a path to developing CG electronic Hamiltonians in terms of interpretable collective variables for arbitrarily complex molecules.
Affiliation(s)
- J Charlie Maier
- Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Chun-I Wang
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Nicholas E Jackson
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
5
Jones MS, Shmilovich K, Ferguson AL. DiAMoNDBack: Diffusion-Denoising Autoregressive Model for Non-Deterministic Backmapping of Cα Protein Traces. J Chem Theory Comput 2023; 19:7908-7923. [PMID: 37906711 DOI: 10.1021/acs.jctc.3c00840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Coarse-grained molecular models of proteins permit access to length and time scales unattainable by all-atom models and the simulation of processes that occur on long time scales, such as aggregation and folding. The reduced resolution realizes computational accelerations, but an atomistic representation can be vital for a complete understanding of mechanistic details. Backmapping is the process of restoring all-atom resolution to coarse-grained molecular models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping) as an autoregressive denoising diffusion probability model to restore all-atom details to coarse-grained protein representations retaining only Cα coordinates. The autoregressive generation process proceeds from the protein N-terminus to C-terminus in a residue-by-residue fashion conditioned on the Cα trace and previously backmapped backbone and side-chain atoms within the local neighborhood. The local and autoregressive nature of our model makes it transferable between proteins. The stochastic nature of the denoising diffusion process means that the model generates a realistic ensemble of backbone and side-chain all-atom configurations consistent with the coarse-grained Cα trace. We train DiAMoNDBack over 65k+ structures from the Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set, intrinsically disordered protein structures from the Protein Ensemble Database (PED), molecular dynamics simulations of fast-folding mini-proteins from DE Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art reconstruction performance in terms of correct bond formation, avoidance of side-chain clashes, and the diversity of the generated side-chain configurational states. We make the DiAMoNDBack model publicly available as a free and open-source Python package.
Affiliation(s)
- Michael S Jones
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Kirill Shmilovich
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
6
Boulougouris GC. Accessible Molecular System Creator: Building Molecular Configurations Based on the Inaccessible Molecular Volume and Accessible Molecular Surface via Static Monte Carlo Sampling. J Phys Chem B 2023; 127:9520-9531. [PMID: 37883744 DOI: 10.1021/acs.jpcb.3c03670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2023]
Abstract
Monte Carlo (MC) stochastic sampling is a powerful tool in classical molecular simulations that directly connects the observable macroscopic properties of matter and the underlying atomistic interactions. This connection operates within the framework of the statistical mechanics proposed by Gibbs. Most MC simulations are "dynamic," creating statistical ensembles of microstates via a Markovian chain, where each microstate in the ensemble depends only on its previous microstate. Herein, we re-examine an alternative form of MC that generates ensemble members through a "static" approach, building molecular systems stepwise. The basic theory for such an approach traces back to Rosenbluth and Rosenbluth, who proposed "static" stepwise sampling of a polymeric chain. It is almost as old as the Metropolis importance sampling approach used in dynamic MC, although the latter has been considerably more popular than the former. Herein, we address the main obstacle in static MC that has hindered the widespread adoption of Rosenbluth-based approaches in atomistic simulations. The obstacle lies in mapping the molecular accessible volume for adding a molecule in a Rosenbluth-like static sampling of atomistic configurations. We demonstrate a breakthrough by leveraging the ability to analytically map the inaccessible molecular volume and the accessible molecular surface owing to interatomically excluded volume interactions. This advance substantially enhances the ability to create molecular samples using a Rosenbluth-like static building process. The proposed approach can be used as a tool for creating initial configurations in MC or molecular dynamics simulations, a field where Rosenbluth-like static building has been applied. Additionally, this approach can be used as the first step in a perturbation scheme that accurately estimates free energy differences by estimating the chemical work related to molecule addition, removal, or reinsertion within the context of free energy perturbation schemes employed in molecular simulations.
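The "static" stepwise idea attributed above to Rosenbluth and Rosenbluth can be sketched in its classic textbook form: growing a self-avoiding walk on a square lattice and accumulating a weight equal to the product of the number of open sites at each step. This is only an illustration of the general technique; it is not the continuous-space, analytically mapped accessible-volume method described in the abstract, and the function and variable names are hypothetical.

```python
import random

# The four nearest-neighbor moves on a square lattice.
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def rosenbluth_walk(n_steps, rng):
    """Grow one self-avoiding walk step by step; return (sites, weight)."""
    sites = [(0, 0)]
    occupied = {(0, 0)}
    weight = 1.0
    for _ in range(n_steps):
        x, y = sites[-1]
        free = [(x + dx, y + dy) for dx, dy in MOVES
                if (x + dx, y + dy) not in occupied]
        if not free:
            return sites, 0.0   # dead end: this configuration gets zero weight
        weight *= len(free)     # Rosenbluth weight: product of open choices
        nxt = rng.choice(free)  # pick uniformly among the accessible sites
        sites.append(nxt)
        occupied.add(nxt)
    return sites, weight

def weighted_mean_r2(n_steps, n_samples, seed=0):
    """Weighted estimate of the mean squared end-to-end distance."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n_samples):
        sites, w = rosenbluth_walk(n_steps, rng)
        x, y = sites[-1]
        num += w * (x * x + y * y)
        den += w
    return num / den

walk, weight = rosenbluth_walk(3, random.Random(1))
```

Each configuration is built independently rather than derived from a previous one, and the accumulated weight corrects for the bias introduced by choosing only among accessible sites.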
Affiliation(s)
- Georgios C Boulougouris
- Laboratory of Computational Physical Chemistry, Department of Molecular Biology and Genetics, University of Thrace, GR 681 00 Alexandroupolis, Greece
7
Krämer A, Durumeric AEP, Charron NE, Chen Y, Clementi C, Noé F. Statistically Optimal Force Aggregation for Coarse-Graining Molecular Dynamics. J Phys Chem Lett 2023; 14:3970-3979. [PMID: 37079800 DOI: 10.1021/acs.jpclett.3c00444] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 05/03/2023]
Abstract
Machine-learned coarse-grained (CG) models have the potential for simulating large molecular complexes beyond what is possible with atomistic molecular dynamics. However, training accurate CG models remains a challenge. A widely used methodology for learning bottom-up CG force fields maps forces from all-atom molecular dynamics to the CG representation and matches them with a CG force field on average. We show that there is flexibility in how to map all-atom forces to the CG representation and that the most commonly used mapping methods are statistically inefficient and potentially even incorrect in the presence of constraints in the all-atom simulation. We define an optimization statement for force mappings and demonstrate that substantially improved CG force fields can be learned from the same simulation data when using optimized force maps. The method is demonstrated on the miniproteins chignolin and tryptophan cage and published as open-source code.
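The flexibility in force mapping that this abstract describes can be illustrated with a small toy example. The sketch assumes a linear configurational map M and uses the usual consistency condition for a linear force map W, namely W Mᵀ = I; the four-atom system, the synthetic forces, and all names are hypothetical, and the code compares two hand-picked maps rather than performing the optimization reported in the paper.

```python
import numpy as np

# Hypothetical toy system: 4 atoms in 1D, grouped into 2 beads.
# The configurational map keeps atoms 0 and 2 as the bead positions.
config_map = np.array([[1.0, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0, 0.0]])

# Two candidate force maps.
slice_map = config_map.copy()                     # force on the kept atom only
aggregate_map = np.array([[1.0, 1.0, 0.0, 0.0],
                          [0.0, 0.0, 1.0, 1.0]])  # sum over each bead's atoms

def is_consistent(force_map, config_map):
    """Check the linear consistency condition W M^T = I."""
    identity = np.eye(config_map.shape[0])
    return np.allclose(force_map @ config_map.T, identity)

# Synthetic forces: a slow part on every atom plus a stiff intra-bead part
# that is equal and opposite on the two atoms of each bead.
rng = np.random.default_rng(0)
n_frames = 5000
slow = rng.normal(scale=1.0, size=(n_frames, 4))
stiff = rng.normal(scale=10.0, size=(n_frames, 2))
forces = slow.copy()
forces[:, 0] += stiff[:, 0]
forces[:, 1] -= stiff[:, 0]
forces[:, 2] += stiff[:, 1]
forces[:, 3] -= stiff[:, 1]

def mean_square_force(force_map, forces):
    """Mean squared norm of the mapped CG forces over all frames."""
    mapped = forces @ force_map.T
    return float(np.mean(np.sum(mapped**2, axis=1)))

msf_slice = mean_square_force(slice_map, forces)
msf_aggregate = mean_square_force(aggregate_map, forces)
```

Both maps satisfy the consistency condition, but in this construction the stiff contributions cancel under the aggregated map, so its mapped forces are far less noisy. That difference in noise is the kind of statistical inefficiency a force-matching fit would otherwise have to average out.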
Affiliation(s)
- Andreas Krämer
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Aleksander E P Durumeric
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Nicholas E Charron
- Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77251, United States
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Yaoyi Chen
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- International Max Planck Research School for Biology and Computation (IMPRS-BAC), Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Cecilia Clementi
- Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77251, United States
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Microsoft Research AI4Science, Karl-Liebknecht Straße 32, 10178 Berlin, Germany