1
Jones MS, Shmilovich K, Ferguson AL. Tutorial on Molecular Latent Space Simulators (LSSs): Spatially and Temporally Continuous Data-Driven Surrogate Dynamical Models of Molecular Systems. J Phys Chem A 2024; 128:10299-10317. [PMID: 39540914 DOI: 10.1021/acs.jpca.4c05389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2024]
Abstract
The inherently serial nature and requirement for short integration time steps in the numerical integration of molecular dynamics (MD) calculations place strong limitations on the accessible simulation time scales and statistical uncertainties in sampling slowly relaxing dynamical modes and rare events. Molecular latent space simulators (LSSs) are a data-driven approach to learning a surrogate dynamical model of the molecular system from modest MD training trajectories that can generate synthetic trajectories at a fraction of the computational cost. The training data may comprise single long trajectories or multiple short, discontinuous trajectories collected over, for example, distributed computing resources. Provided the training data provide sufficient sampling of the relevant thermodynamic states and dynamical transitions to robustly learn the underlying microscopic propagator, an LSS furnishes a global model of the dynamics capable of producing temporally and spatially continuous molecular trajectories. Trained LSS models have produced simulation trajectories at up to 6 orders of magnitude lower cost than standard MD to enable dense sampling of molecular phase space and large reduction of the statistical errors in structural, thermodynamic, and kinetic observables. The LSS employs three deep learning architectures to solve three independent learning problems over the training data: (i) an encoding of the high-dimensional MD into a low-dimensional slow latent space using state-free reversible VAMPnets (SRVs), (ii) a propagator of the microscopic dynamics within the low-dimensional latent space using mixture density networks (MDNs), and (iii) a generative decoding of the low-dimensional latent coordinates back to the original high-dimensional molecular configuration space using conditional Wasserstein generative adversarial networks (cWGANs) or denoising diffusion probability models (DDPMs). 
In this software tutorial, we introduce the mathematical and numerical background and theory of LSS and present example applications of a user-friendly Python package software implementation to alanine dipeptide and a 28-residue beta-beta-alpha (BBA) protein within simple Google Colab notebooks.
Affiliation(s)
- Michael S Jones
  - Pritzker School of Molecular Engineering, The University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, United States
- Kirill Shmilovich
  - Pritzker School of Molecular Engineering, The University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, United States
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, The University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, United States
2
Koehl P, Navaza R, Tekpinar M, Delarue M. MinActionPath2: path generation between different conformations of large macromolecular assemblies by action minimization. Nucleic Acids Res 2024; 52:W256-W263. [PMID: 38783081 PMCID: PMC11223808 DOI: 10.1093/nar/gkae421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 04/25/2024] [Accepted: 05/07/2024] [Indexed: 05/25/2024]
Abstract
Recent progress in solving macromolecular structures and assemblies by cryogenic electron microscopy techniques enables sampling of their conformations in different states that are relevant to their biological function. Knowing the transition path between these conformations would provide new avenues for drug discovery. While the experimental study of transition paths is intrinsically difficult, in-silico methods can be used to generate an initial guess for those paths. The Elastic Network Model (ENM), along with a coarse-grained representation (CG) of the structures are among the most popular models to explore such possible paths. Here we propose an update to our software platform MinActionPath that generates non-linear transition paths based on ENM and CG models, using action minimization to solve the equations of motion. The new website enables the study of large structures such as ribosomes or entire virus envelopes. It provides direct visualization of the trajectories along with quantitative analyses of their behaviors at http://dynstr.pasteur.fr/servers/minactionpath/minactionpath2_submission.
Affiliation(s)
- Patrice Koehl
  - Department of Computer Science and Genome Centre, University of California, Davis, CA 95616, USA
- Rafael Navaza
  - Plateforme de Cristallographie, C2RT, Institut Pasteur, Université Paris Cité, UMR 3528 du CNRS, 75015 Paris, France
- Mustafa Tekpinar
  - Unité Architecture et Dynamique des Macromolécules Biologiques, Institut Pasteur, Université Paris Cité, UMR 3528 du CNRS, 75015 Paris, France
- Marc Delarue
  - Unité Architecture et Dynamique des Macromolécules Biologiques, Institut Pasteur, Université Paris Cité, UMR 3528 du CNRS, 75015 Paris, France
3
Palma Banos M, Popov AV, Hernandez R. Representability and Dynamical Consistency in Coarse-Grained Models. J Phys Chem B 2024; 128:1506-1514. [PMID: 38315661 DOI: 10.1021/acs.jpcb.3c08054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2024]
Abstract
We address the challenge of representativity and dynamical consistency when unbonded fine-grained particles are collected together into coarse-grained particles. We implement a hybrid procedure for identifying and tracking the underlying fine-grained particles (e.g., atoms or molecules) by exchanging them between the coarse-grained particles periodically at a characteristic time. The exchange involves a back-mapping of the coarse-grained particles into fine-grained particles and a subsequent reassignment to coarse-grained particles conserving total mass and momentum. We find that an appropriate choice of the characteristic exchange time can lead to the correct effective diffusion rate of the fine-grained particles when simulated in hybrid coarse-grained dynamics. In the compressed (supercritical) fluid regime, without the exchange term, fine-grained particles remain associated with a given coarse-grained particle, leading to substantially lower diffusion rates than seen in all-atom molecular dynamics of the fine-grained particles. Thus, this work confirms the need for addressing the representativity of fine-grained particles within coarse-grained particles and offers a simple exchange mechanism so as to retain dynamical consistency between the fine- and coarse-grained scales.
Affiliation(s)
- Manuel Palma Banos
  - Department of Chemistry, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Alexander V Popov
  - Department of Chemistry, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Rigoberto Hernandez
  - Department of Chemistry, Johns Hopkins University, Baltimore, Maryland 21218, United States
  - Department of Chemical & Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
  - Department of Materials Science & Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
4
Jones MS, Shmilovich K, Ferguson AL. DiAMoNDBack: Diffusion-Denoising Autoregressive Model for Non-Deterministic Backmapping of Cα Protein Traces. J Chem Theory Comput 2023; 19:7908-7923. [PMID: 37906711 DOI: 10.1021/acs.jctc.3c00840] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/02/2023]
Abstract
Coarse-grained molecular models of proteins permit access to length and time scales unattainable by all-atom models and the simulation of processes that occur on long time scales, such as aggregation and folding. The reduced resolution realizes computational accelerations, but an atomistic representation can be vital for a complete understanding of mechanistic details. Backmapping is the process of restoring all-atom resolution to coarse-grained molecular models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping) as an autoregressive denoising diffusion probability model to restore all-atom details to coarse-grained protein representations retaining only Cα coordinates. The autoregressive generation process proceeds from the protein N-terminus to C-terminus in a residue-by-residue fashion conditioned on the Cα trace and previously backmapped backbone and side-chain atoms within the local neighborhood. The local and autoregressive nature of our model makes it transferable between proteins. The stochastic nature of the denoising diffusion process means that the model generates a realistic ensemble of backbone and side-chain all-atom configurations consistent with the coarse-grained Cα trace. We train DiAMoNDBack over 65k+ structures from the Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set, intrinsically disordered protein structures from the Protein Ensemble Database (PED), molecular dynamics simulations of fast-folding mini-proteins from DE Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art reconstruction performance in terms of correct bond formation, avoidance of side-chain clashes, and the diversity of the generated side-chain configurational states. We make the DiAMoNDBack model publicly available as a free and open-source Python package.
Affiliation(s)
- Michael S Jones
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Kirill Shmilovich
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
5
Ricci E, Vergadou N. Integrating Machine Learning in the Coarse-Grained Molecular Simulation of Polymers. J Phys Chem B 2023; 127:2302-2322. [PMID: 36888553 DOI: 10.1021/acs.jpcb.2c06354] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 03/09/2023]
Abstract
Machine learning (ML) is having an increasing impact on the physical sciences, engineering, and technology and its integration into molecular simulation frameworks holds great potential to expand their scope of applicability to complex materials and facilitate fundamental knowledge and reliable property predictions, contributing to the development of efficient materials design routes. The application of ML in materials informatics in general, and polymer informatics in particular, has led to interesting results, however great untapped potential lies in the integration of ML techniques into the multiscale molecular simulation methods for the study of macromolecular systems, specifically in the context of Coarse Grained (CG) simulations. In this Perspective, we aim at presenting the pioneering recent research efforts in this direction and discussing how these new ML-based techniques can contribute to critical aspects of the development of multiscale molecular simulation methods for bulk complex chemical systems, especially polymers. Prerequisites for the implementation of such ML-integrated methods and open challenges that need to be met toward the development of general systematic ML-based coarse graining schemes for polymers are discussed.
Affiliation(s)
- Eleonora Ricci
  - Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
  - Institute of Informatics and Telecommunications, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
- Niki Vergadou
  - Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", GR-15341 Agia Paraskevi, Athens, Greece
6
Shmilovich K, Stieffenhofer M, Charron NE, Hoffmann M. Temporally Coherent Backmapping of Molecular Trajectories From Coarse-Grained to Atomistic Resolution. J Phys Chem A 2022; 126:9124-9139. [PMID: 36417670 PMCID: PMC9743211 DOI: 10.1021/acs.jpca.2c07716] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/24/2022]
Abstract
Coarse-graining offers a means to extend the achievable time and length scales of molecular dynamics simulations beyond what is practically possible in the atomistic regime. Sampling molecular configurations of interest can be done efficiently using coarse-grained simulations, from which meaningful physicochemical information can be inferred if the corresponding all-atom configurations are reconstructed. However, this procedure of backmapping to reintroduce the lost atomistic detail into coarse-grain structures has proven a challenging task due to the many feasible atomistic configurations that can be associated with one coarse-grain structure. Existing backmapping methods are strictly frame-based, relying on either heuristics to replace coarse-grain particles with atomic fragments and subsequent relaxation or parametrized models to propose atomic coordinates separately and independently for each coarse-grain structure. These approaches neglect information from previous trajectory frames that is critical to ensuring temporal coherence of the backmapped trajectory, while also offering information potentially helpful to producing higher-fidelity atomic reconstructions. In this work, we present a deep learning-enabled data-driven approach for temporally coherent backmapping that explicitly incorporates information from preceding trajectory structures. Our method trains a conditional variational autoencoder to nondeterministically reconstruct atomistic detail conditioned on both the target coarse-grain configuration and the previously reconstructed atomistic configuration. We demonstrate our backmapping approach on two exemplar biomolecular systems: alanine dipeptide and the miniprotein chignolin. We show that our backmapped trajectories accurately recover the structural, thermodynamic, and kinetic properties of the atomistic trajectory data.
Affiliation(s)
- Kirill Shmilovich
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Nicholas E. Charron
  - Weiss School of Natural Sciences, Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
  - Department of Physics, Freie Universität Berlin, Berlin 14195, Germany
- Moritz Hoffmann
  - Fachbereich Mathematik und Informatik, Freie Universität Berlin, Berlin 14195, Germany