1
Gupta A, Ma H, Ramanathan A, Zerze GH. A Deep Learning-Driven Sampling Technique to Explore the Phase Space of an RNA Stem-Loop. J Chem Theory Comput 2024; 20:9178-9189. [PMID: 39374435 DOI: 10.1021/acs.jctc.4c00669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/09/2024]
Abstract
The folding and unfolding of RNA stem-loops are critical biological processes; however, their computational studies are often hampered by the ruggedness of their folding landscape, necessitating long simulation times at the atomistic scale. Here, we adapted DeepDriveMD (DDMD), an advanced deep learning-driven sampling technique originally developed for protein folding, to address the challenges of RNA stem-loop folding. Although tempering- and order parameter-based techniques are commonly used for similar rare-event problems, the computational costs or the need for a priori knowledge about the system often present a challenge in their effective use. DDMD overcomes these challenges by adaptively learning from an ensemble of running MD simulations using generic contact maps as the raw input. DeepDriveMD enables on-the-fly learning of a low-dimensional latent representation and guides the simulation toward the undersampled regions while optimizing the resources to explore the relevant parts of the phase space. We showed that DDMD estimates the free energy landscape of the RNA stem-loop reasonably well at room temperature. Our simulation framework runs at a constant temperature without external biasing potential, hence preserving the information on transition rates, with a computational cost much lower than that of the simulations performed with external biasing potentials. We also introduced a reweighting strategy for obtaining unbiased free energy surfaces and presented a qualitative analysis of the latent space. This analysis showed that the latent space captures the relevant slow degrees of freedom for the RNA folding problem of interest. Finally, throughout the manuscript, we outlined how different parameters are selected and optimized to adapt DDMD for this system. We believe this compendium of decision-making processes will help new users adapt this technique for the rare-event sampling problems of their interest.
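The abstract names generic contact maps as the raw input to the learning step. As a minimal sketch of what such an input could look like, the snippet below builds a binary contact map from Cartesian coordinates with NumPy. The function name `contact_map`, the 8.0 cutoff, and the toy coordinates are illustrative assumptions, not values or code taken from the paper.

```python
import numpy as np

def contact_map(coords, cutoff=8.0):
    """Return a binary contact map for an (N, 3) array of site coordinates.

    Entry (i, j) is 1 when sites i and j are closer than `cutoff`
    (the cutoff value is an assumption chosen for illustration).
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise displacement vectors, shape (N, N, 3).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist < cutoff).astype(np.uint8)

# Toy example: three sites on a line (hypothetical coordinates).
coords = np.array([[0.0, 0.0, 0.0],
                   [5.0, 0.0, 0.0],
                   [20.0, 0.0, 0.0]])
cmap = contact_map(coords)
print(cmap)
```

In practice one such map per trajectory frame would be stacked into the training set; the choice of sites and cutoff is system-dependent.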
Affiliation(s)
- Ayush Gupta: William A. Brookshire Department of Chemical and Biomolecular Engineering, University of Houston, Houston, Texas 77204, United States
- Heng Ma: Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Arvind Ramanathan: Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Gül H Zerze: William A. Brookshire Department of Chemical and Biomolecular Engineering, University of Houston, Houston, Texas 77204, United States
2
Rydzewski J. Spectral Map for Slow Collective Variables, Markovian Dynamics, and Transition State Ensembles. J Chem Theory Comput 2024; 20. [PMID: 39265157 PMCID: PMC11428138 DOI: 10.1021/acs.jctc.4c00428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Revised: 08/14/2024] [Accepted: 08/14/2024] [Indexed: 09/14/2024]
Abstract
Understanding the behavior of complex molecular systems is a fundamental problem in physical chemistry. To describe the long-time dynamics of such systems, which is responsible for their most informative characteristics, we can identify a few slow collective variables (CVs) while treating the remaining fast variables as thermal noise. This enables us to simplify the dynamics and treat it as diffusion in a free-energy landscape spanned by slow CVs, effectively rendering the dynamics Markovian. Our recent statistical learning technique, spectral map [Rydzewski, J. J. Phys. Chem. Lett. 2023, 14(22), 5216-5220], explores this strategy to learn slow CVs by maximizing a spectral gap of a transition matrix. In this work, we introduce several advancements into our framework, using a high-dimensional reversible folding process of a protein as an example. We implement an algorithm for coarse-graining Markov transition matrices to partition the reduced space of slow CVs kinetically and use it to define a transition state ensemble. We show that slow CVs learned by spectral map closely approach the Markovian limit for an overdamped diffusion. We demonstrate that coordinate-dependent diffusion coefficients only slightly affect the constructed free-energy landscapes. Finally, we present how spectral maps can be used to quantify the importance of features and compare slow CVs with structural descriptors commonly used in protein folding. Overall, we demonstrate that a single slow CV learned by spectral map can be used as a physical reaction coordinate to capture essential characteristics of protein folding.
Affiliation(s)
- Jakub Rydzewski: Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Toruń, Poland
3
Mehdi S, Smith Z, Herron L, Zou Z, Tiwary P. Enhanced Sampling with Machine Learning. Annu Rev Phys Chem 2024; 75:347-370. [PMID: 38382572 PMCID: PMC11213683 DOI: 10.1146/annurev-physchem-083122-125941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Molecular dynamics (MD) enables the study of physical systems with excellent spatiotemporal resolution but suffers from severe timescale limitations. To address this, enhanced sampling methods have been developed to improve the exploration of configurational space. However, implementing these methods is challenging and requires domain expertise. In recent years, integration of machine learning (ML) techniques into different domains has shown promise, prompting their adoption in enhanced sampling as well. Although ML is often employed in various fields primarily due to its data-driven nature, its integration with enhanced sampling is more natural with many common underlying synergies. This review explores the merging of ML and enhanced MD by presenting different shared viewpoints. It offers a comprehensive overview of this rapidly evolving field, which can be difficult to stay updated on. We highlight successful strategies such as dimensionality reduction, reinforcement learning, and flow-based methods. Finally, we discuss open problems at the exciting ML-enhanced MD interface.
Affiliation(s)
- Shams Mehdi: Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, USA; Biophysics Program, University of Maryland, College Park, Maryland, USA
- Zachary Smith: Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, USA; Biophysics Program, University of Maryland, College Park, Maryland, USA
- Lukas Herron: Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, USA; Biophysics Program, University of Maryland, College Park, Maryland, USA
- Ziyue Zou: Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, USA
- Pratyush Tiwary: Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, USA; Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, USA
4
Rydzewski J, Gökdemir T. Learning Markovian dynamics with spectral maps. J Chem Phys 2024; 160:091102. [PMID: 38436438 DOI: 10.1063/5.0189241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 02/05/2024] [Indexed: 03/05/2024]
Abstract
The long-time behavior of many complex molecular systems can often be described by Markovian dynamics in a slow subspace spanned by a few reaction coordinates referred to as collective variables (CVs). However, determining CVs poses a fundamental challenge in chemical physics. Depending on intuition or trial and error to construct CVs can lead to non-Markovian dynamics with long memory effects, hindering analysis. To address this problem, we continue to develop a recently introduced deep-learning technique called spectral map [J. Rydzewski, J. Phys. Chem. Lett. 14, 5216-5220 (2023)]. Spectral map learns slow CVs by maximizing a spectral gap of a Markov transition matrix describing anisotropic diffusion. Here, to represent heterogeneous and multiscale free-energy landscapes with spectral map, we implement an adaptive algorithm to estimate transition probabilities. Through a Markov state model analysis, we validate that spectral map learns slow CVs related to the dominant relaxation timescales and discerns between long-lived metastable states.
Affiliation(s)
- Jakub Rydzewski: Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Toruń, Poland
- Tuğçe Gökdemir: Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Toruń, Poland
5
Fu H, Bian H, Shao X, Cai W. Collective Variable-Based Enhanced Sampling: From Human Learning to Machine Learning. J Phys Chem Lett 2024; 15:1774-1783. [PMID: 38329095 DOI: 10.1021/acs.jpclett.3c03542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Enhanced-sampling algorithms relying on collective variables (CVs) are extensively employed to study complex (bio)chemical processes that are not amenable to brute-force molecular simulations. The selection of appropriate CVs characterizing the slow movement modes is of paramount importance for reliable and efficient enhanced-sampling simulations. In this Perspective, we first review the application and limitations of CVs obtained from chemical and geometrical intuition. We also introduce path-sampling algorithms, which can identify path-like CVs in a high-dimensional free-energy space. Machine-learning algorithms offer a viable approach to finding suitable CVs by analyzing trajectories from preliminary simulations. We discuss both the performance of machine-learning-derived CVs in enhanced-sampling simulations of experimental models and the challenges involved in applying these CVs to realistic, complex molecular assemblies. Moreover, we provide a prospective view of the potential advancements of machine-learning algorithms for the development of CVs in the field of enhanced-sampling simulations.
Affiliation(s)
- Haohao Fu: Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China; Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Hengwei Bian: Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China; Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao: Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China; Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai: Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China; Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
6
Hradiská H, Kurečka M, Beránek J, Tedeschi G, Višňovský V, Křenek A, Spiwok V. Acceleration of Molecular Simulations by Parametric Time-Lagged tSNE Metadynamics. J Phys Chem B 2024; 128:903-913. [PMID: 38237064 PMCID: PMC10839826 DOI: 10.1021/acs.jpcb.3c05669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Revised: 12/22/2023] [Accepted: 12/28/2023] [Indexed: 02/02/2024]
Abstract
The potential of molecular simulations is limited by their computational costs. There is often a need to accelerate simulations using some of the enhanced sampling methods. Metadynamics applies a history-dependent bias potential that disfavors previously visited states. To apply metadynamics, it is necessary to select a few properties of the system, collective variables (CVs), that can be used to define the bias potential. Over the past few years, there have been emerging opportunities for machine learning and, in particular, artificial neural networks within this domain. In this broad context, a specific unsupervised machine learning method was utilized, namely, parametric time-lagged t-distributed stochastic neighbor embedding (ptltSNE) to design CVs. The approach was tested on a Trp-cage trajectory (tryptophan cage) from the literature. The trajectory was used to generate a map of conformations, distinguish fast conformational changes from slow ones, and design CVs. Then, metadynamics simulations were performed. To accelerate the formation of the α-helix, we added the α-RMSD collective variable. This simulation led to one folding event in a 350 ns metadynamics simulation. To accelerate degrees of freedom not addressed by CVs, we performed parallel tempering metadynamics. This simulation led to 10 folding events in a 200 ns simulation with 32 replicas.
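To make the idea of a history-dependent bias concrete, here is a minimal one-dimensional sketch of plain (non-tempered) metadynamics on a toy double-well potential. Everything specific in it is an assumption for illustration: the potential, the overdamped Langevin integrator, and the parameter values (`height`, `sigma`, `stride`, `kT`, `dt`) are not taken from the paper, which uses learned ptltSNE CVs and an established MD engine.

```python
import numpy as np

def bias(s, centers, height, sigma):
    """History-dependent bias: sum of Gaussian hills at previously visited CV values."""
    c = np.asarray(centers, dtype=float)
    return float(np.sum(height * np.exp(-(s - c) ** 2 / (2.0 * sigma ** 2))))

def bias_gradient(s, centers, height, sigma):
    """Derivative of the bias with respect to the CV value s."""
    c = np.asarray(centers, dtype=float)
    hills = height * np.exp(-(s - c) ** 2 / (2.0 * sigma ** 2))
    return float(np.sum(-(s - c) / sigma ** 2 * hills))

# Toy settings (all values are illustrative assumptions).
height, sigma, stride = 0.2, 0.2, 100
kT, dt, n_steps = 0.5, 1e-3, 2000
rng = np.random.default_rng(0)

x = -1.0          # start in the left well of U(x) = (x^2 - 1)^2
centers = []      # CV values where hills have been deposited
for step in range(1, n_steps + 1):
    # Total force: physical force minus the gradient of the accumulated bias.
    force = -4.0 * x * (x ** 2 - 1.0) - bias_gradient(x, centers, height, sigma)
    x = x + dt * force + np.sqrt(2.0 * kT * dt) * rng.standard_normal()
    if step % stride == 0:
        centers.append(float(x))  # deposit a new hill at the current position

print(len(centers), "hills deposited")
```

Because each hill raises the energy where the walker has already been, revisiting those values becomes progressively less favorable, which is the mechanism the abstract describes.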
Affiliation(s)
- Helena Hradiská: Department of Biochemistry and Microbiology, University of Chemistry and Technology Prague, Technická 3, Prague 6 166 28, Czech Republic
- Martin Kurečka: Institute of Computer Science, Masaryk University, Šumavská 416/15, Brno 602 00, Czech Republic
- Jan Beránek: Department of Biochemistry and Microbiology, University of Chemistry and Technology Prague, Technická 3, Prague 6 166 28, Czech Republic
- Guglielmo Tedeschi: Department of Biochemistry and Microbiology, University of Chemistry and Technology Prague, Technická 3, Prague 6 166 28, Czech Republic
- Vladimír Višňovský: Institute of Computer Science, Masaryk University, Šumavská 416/15, Brno 602 00, Czech Republic
- Aleš Křenek: Institute of Computer Science, Masaryk University, Šumavská 416/15, Brno 602 00, Czech Republic
- Vojtěch Spiwok: Department of Biochemistry and Microbiology, University of Chemistry and Technology Prague, Technická 3, Prague 6 166 28, Czech Republic
7
Stevensson B, Edén M. Improved reweighting protocols for variationally enhanced sampling simulations with multiple walkers. Phys Chem Chem Phys 2023; 25:22063-22078. [PMID: 37560777 DOI: 10.1039/d2cp04009c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/11/2023]
Abstract
In molecular dynamics simulations utilizing enhanced-sampling techniques, reweighting is a central component for recovering the targeted ensemble averages of the "unbiased" system by calculating and applying a bias-correction function c(t). We present enhanced reweighting protocols for variationally enhanced sampling (VES) simulations by exploiting a recent reweighting method, originally introduced in the metadynamics framework [Giberti et al. J. Chem. Theory Comput., 2020, 16, 100-107], which was modified and extended to multiple-walker simulations: these may be implemented either as "independent" walkers (associated with one unique correction function per walker) or "cooperative" ones that all share one correction function, which is the hitherto only explored option. When each case is combined with the two possibilities of determining c(t) by time integration up to either t or over the entire simulation period, altogether four reweighting options result. Their relative merits were assessed by well-tempered VES simulations of two model problems: locating the free-energy difference between two metastable molecular conformations of the N-acetyl-L-alanine methylamide dipeptide, and the recovery of an a priori known distribution when one water molecule in the liquid phase is perturbed by a periodic free-energy function. The most rapid convergence occurred for large cooperative walkers, regardless of the upper integration limit, but integrating up to t proved advantageous for small walker ensembles. That novel reweighting method compared favorably to the standard VES reweighting, as well as to current state-of-the-art reweighting options introduced for metadynamics simulations that estimate c(t) by integration over the collective variables. For further gains in computational speed and accuracy, we also introduce analytical solutions for c(t), as well as offering further insight into its features by approximative analytical expressions in the "high-temperature" regime.
Affiliation(s)
- Baltzar Stevensson: Department of Materials and Environmental Chemistry, Stockholm University, SE-106 91 Stockholm, Sweden
- Mattias Edén: Department of Materials and Environmental Chemistry, Stockholm University, SE-106 91 Stockholm, Sweden
8
Chen H, Roux B, Chipot C. Discovering Reaction Pathways, Slow Variables, and Committor Probabilities with Machine Learning. J Chem Theory Comput 2023; 19:4414-4426. [PMID: 37224455 PMCID: PMC11372462 DOI: 10.1021/acs.jctc.3c00028] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Indexed: 05/26/2023]
Abstract
A significant challenge faced by atomistic simulations is the difficulty, and often impossibility, to sample the transitions between metastable states of the free-energy landscape associated with slow molecular processes. Importance-sampling schemes represent an appealing option to accelerate the underlying dynamics by smoothing out the relevant free-energy barriers, but require the definition of suitable reaction-coordinate (RC) models expressed in terms of compact low-dimensional sets of collective variables (CVs). While most computational studies of slow molecular processes have traditionally relied on educated guesses based on human intuition to reduce the dimensionality of the problem at hand, a variety of machine-learning (ML) algorithms have recently emerged as powerful alternatives to discover meaningful CVs capable of capturing the dynamics of the slowest degrees of freedom. Considering a simple paradigmatic situation in which the long-time dynamics is dominated by the transition between two known metastable states, we compare two variational data-driven ML methods based on Siamese neural networks aimed at discovering a meaningful RC model: the slowest decorrelating CV of the molecular process, and the committor probability to first reach one of the two metastable states. One method is the state-free reversible variational approach for Markov processes networks (VAMPnets), or SRVs; the other, inspired by the transition path theory framework, is the variational committor-based neural networks, or VCNs. The relationship and the ability of these methodologies to discover the relevant descriptors of the slow molecular process of interest are illustrated with a series of simple model systems. We also show that both strategies are amenable to importance-sampling schemes through an appropriate reweighting algorithm that approximates the kinetic properties of the transition.
Affiliation(s)
- Haochuan Chen: Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, B.P. 70239, 54506 Vandœuvre-lès-Nancy cedex, France
- Benoît Roux: Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, 60637, United States
- Christophe Chipot: Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, B.P. 70239, 54506 Vandœuvre-lès-Nancy cedex, France; Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, 60637, United States; NIH Center for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
9
Rydzewski J. Spectral Map: Embedding Slow Kinetics in Collective Variables. J Phys Chem Lett 2023; 14:5216-5220. [PMID: 37260045 PMCID: PMC10258851 DOI: 10.1021/acs.jpclett.3c01101] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/24/2023] [Accepted: 05/26/2023] [Indexed: 06/02/2023]
Abstract
The dynamics of physical systems that require high-dimensional representation can often be captured in a few meaningful degrees of freedom called collective variables (CVs). However, identifying CVs is challenging and constitutes a fundamental problem in physical chemistry. This problem is even more pronounced when CVs need to provide information about slow kinetics related to rare transitions between long-lived metastable states. To address this issue, we propose an unsupervised deep-learning method called spectral map. Our method constructs slow CVs by maximizing the spectral gap between slow and fast eigenvalues of a transition matrix estimated by an anisotropic diffusion kernel. We demonstrate our method in several high-dimensional reversible folding processes.
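The quantity being maximized, a spectral gap of a kernel-based transition matrix, can be illustrated on fixed data. The sketch below is only a stand-in under stated assumptions: it uses a plain isotropic Gaussian kernel rather than the anisotropic diffusion kernel of the paper, it evaluates the gap for given one-dimensional samples instead of training a neural network, and the names `spectral_gap`, `bandwidth`, and `n_slow` are hypothetical.

```python
import numpy as np

def spectral_gap(x, bandwidth, n_slow=1):
    """Gap between the last slow and first fast eigenvalue of a Markov matrix.

    The row-stochastic matrix P = D^-1 K shares its eigenvalues with the
    symmetric matrix D^-1/2 K D^-1/2, which is diagonalized here.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-sq_dist / bandwidth)        # isotropic Gaussian kernel (assumption)
    degree = kernel.sum(axis=1)
    sym = kernel / np.sqrt(np.outer(degree, degree))
    evals = np.linalg.eigvalsh(sym)[::-1]         # sorted in descending order
    # evals[0] is the stationary eigenvalue (equal to 1); the next n_slow are "slow".
    return evals[n_slow] - evals[n_slow + 1], evals

# Two well-separated clusters mimic two metastable states (toy data).
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2.0, 0.2, 20), rng.normal(2.0, 0.2, 20)])
gap, evals = spectral_gap(samples, bandwidth=0.5)
print("leading eigenvalues:", evals[:3], "gap:", gap)
```

A large gap after the first nontrivial eigenvalue indicates one slow process that is well separated in timescale from the rest, which is the property a good slow CV is meant to expose.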
Affiliation(s)
- Jakub Rydzewski: Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Toruń, Poland
10
Bhatia H, Aydin F, Carpenter TS, Lightstone FC, Bremer PT, Ingólfsson HI, Nissley DV, Streitz FH. The confluence of machine learning and multiscale simulations. Curr Opin Struct Biol 2023; 80:102569. [PMID: 36966691 DOI: 10.1016/j.sbi.2023.102569] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/14/2022] [Revised: 01/31/2023] [Accepted: 02/08/2023] [Indexed: 06/04/2023]
Abstract
Multiscale modeling has a long history of use in structural biology, as computational biologists strive to overcome the time- and length-scale limits of atomistic molecular dynamics. Contemporary machine learning techniques, such as deep learning, have promoted advances in virtually every field of science and engineering and are revitalizing the traditional notions of multiscale modeling. Deep learning has found success in various approaches for distilling information from fine-scale models, such as building surrogate models and guiding the development of coarse-grained potentials. However, perhaps its most powerful use in multiscale modeling is in defining latent spaces that enable efficient exploration of conformational space. This confluence of machine learning and multiscale simulation with modern high-performance computing promises a new era of discovery and innovation in structural biology.
Affiliation(s)
- Harsh Bhatia: Computing Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA. https://twitter.com/@harshbhatia85
- Fikret Aydin: Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
- Timothy S Carpenter: Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
- Felice C Lightstone: Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
- Peer-Timo Bremer: Computing Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
- Helgi I Ingólfsson: Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
- Dwight V Nissley: RAS Initiative, The Cancer Research Technology Program, Frederick National Laboratory, Frederick, MD, 21701, USA
- Frederick H Streitz: Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
11
Rydzewski J. Selecting High-Dimensional Representations of Physical Systems by Reweighted Diffusion Maps. J Phys Chem Lett 2023; 14:2778-2783. [PMID: 36897996 PMCID: PMC10041639 DOI: 10.1021/acs.jpclett.3c00265] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/30/2023] [Accepted: 03/08/2023] [Indexed: 06/18/2023]
Abstract
Constructing reduced representations of high-dimensional systems is a fundamental problem in physical chemistry. Many unsupervised machine learning methods can automatically find such low-dimensional representations. However, an often overlooked problem is what high-dimensional representation should be used to describe systems before dimensionality reduction. Here, we address this issue using a recently developed method called the reweighted diffusion map [J. Chem. Theory Comput. 2022, 18, 7179-7192]. We show how high-dimensional representations can be quantitatively selected by exploring the spectral decomposition of Markov transition matrices built from data obtained from standard or enhanced sampling atomistic simulations. We demonstrate the performance of the method in several high-dimensional examples.
12
Dutta P, Sengupta N. Efficient Interrogation of the Kinetic Barriers Demarcating Catalytic States of a Tyrosine Kinase with Optimal Physical Descriptors and Mixture Models. Chemphyschem 2023; 24:e202200595. [PMID: 36394126 DOI: 10.1002/cphc.202200595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 11/16/2022] [Accepted: 11/16/2022] [Indexed: 11/18/2022]
Abstract
Computer simulations are increasingly used to access thermo-kinetic information underlying structural transformation of protein kinases. Such information is necessary to probe their roles in disease progression and interactions with drug targets. However, the investigations are frequently challenged by forbiddingly high computational expense, and by the lack of standard protocols for the design of low dimensional physical descriptors that encode system features important for transitions. Here, we consider the demarcating characteristics of the different states of Abelson tyrosine kinase associated with distinct catalytic activity to construct a set of physically meaningful, orthogonal collective variables that preserve the slow modes of the system. Independent sampling of each metastable state is followed by the estimation of global partition function along the appropriate physical descriptors using the modified Expectation Maximized Molecular Dynamics method. The resultant free energy barriers are in excellent agreement with experimentally known rate-limiting dynamics and activation energy computed with conventional enhanced sampling methods. We discuss possible directions for further development and applications.
Affiliation(s)
- Pallab Dutta: Department of Biological Sciences, Indian Institute of Science Education and Research (IISER) Kolkata, Mohanpur, 741246, India
- Neelanjana Sengupta: Department of Biological Sciences, Indian Institute of Science Education and Research (IISER) Kolkata, Mohanpur, 741246, India
13
Rydzewski J, Chen M, Ghosh TK, Valsson O. Reweighted Manifold Learning of Collective Variables from Enhanced Sampling Simulations. J Chem Theory Comput 2022; 18:7179-7192. [PMID: 36367826 DOI: 10.1021/acs.jctc.2c00873] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 11/13/2022]
Abstract
Enhanced sampling methods are indispensable in computational chemistry and physics, where atomistic simulations cannot exhaustively sample the high-dimensional configuration space of dynamical systems due to the sampling problem. A class of such enhanced sampling methods works by identifying a few slow degrees of freedom, termed collective variables (CVs), and enhancing the sampling along these CVs. Selecting CVs to analyze and drive the sampling is not trivial and often relies on chemical intuition. Despite routinely circumventing this issue using manifold learning to estimate CVs directly from standard simulations, such methods cannot provide mappings to a low-dimensional manifold from enhanced sampling simulations, as the geometry and density of the learned manifold are biased. Here, we address this crucial issue and provide a general reweighting framework based on anisotropic diffusion maps for manifold learning that takes into account that the learning data set is sampled from a biased probability distribution. We consider manifold learning methods based on constructing a Markov chain describing transition probabilities between high-dimensional samples. We show that our framework reverts the biasing effect, yielding CVs that correctly describe the equilibrium density. This advancement enables the construction of low-dimensional CVs using manifold learning directly from the data generated by enhanced sampling simulations. We call our framework reweighted manifold learning. We show that it can be used in many manifold learning techniques on data from both standard and enhanced sampling simulations.
Affiliation(s)
- Jakub Rydzewski: Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Toruń, Poland
- Ming Chen: Department of Chemistry, Purdue University, West Lafayette, Indiana 47907, United States
- Tushar K Ghosh: Department of Chemistry, Purdue University, West Lafayette, Indiana 47907, United States
- Omar Valsson: Department of Chemistry, University of North Texas, Denton, Texas 76201, United States
14
Rydzewski J, Walczewska-Szewc K, Czach S, Nowak W, Kuczera K. Enhancing the Inhomogeneous Photodynamics of Canonical Bacteriophytochrome. J Phys Chem B 2022; 126:2647-2657. [PMID: 35357137 PMCID: PMC9014414 DOI: 10.1021/acs.jpcb.2c00131] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
The ability of phytochromes to act as photoswitches in plants and microorganisms depends on interactions between a bilin-like chromophore and a host protein. The interconversion occurs between the spectrally distinct red (Pr) and far-red (Pfr) conformers. This conformational change is triggered by the photoisomerization of the chromophore D-ring pyrrole. In this study, as a representative example of a phytochrome-bilin system, we consider biliverdin IXα (BV) bound to bacteriophytochrome (BphP) from Deinococcus radiodurans. In the absence of light, we use an enhanced sampling molecular dynamics (MD) method to overcome the photoisomerization energy barrier. We find that the calculated free energy (FE) barriers between essential metastable states agree with spectroscopic results. We show that the enhanced dynamics of the BV chromophore in BphP contributes to triggering nanometer-scale conformational movements that propagate by two experimentally determined signal transduction pathways. Most importantly, we describe how the metastable states enable a thermal transition known as the dark reversion between Pfr and Pr, through a previously unknown intermediate state of Pfr. We present the heterogeneity of temperature-dependent Pfr states at the atomistic level. This work paves a way toward understanding the complete mechanism of the photoisomerization of a bilin-like chromophore in phytochromes.
Affiliation(s)
- Jakub Rydzewski
- Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100, Torun, Poland
| | - Katarzyna Walczewska-Szewc
- Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100, Torun, Poland
| | - Sylwia Czach
- Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100, Torun, Poland
| | - Wieslaw Nowak
- Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100, Torun, Poland
| | - Krzysztof Kuczera
- Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas 66047, United States.,Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States
| |
|
15
|
Raucci U, Rizzi V, Parrinello M. Discover, Sample, and Refine: Exploring Chemistry with Enhanced Sampling Techniques. J Phys Chem Lett 2022; 13:1424-1430. [PMID: 35119863 DOI: 10.1021/acs.jpclett.1c03993] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/14/2023]
Abstract
Over the last few decades, enhanced sampling methods have been continuously improved. Here, we exploit this progress and propose a modular workflow for blind reaction discovery and determination of reaction paths. In a three-step strategy, at first we use a collective variable derived from spectral graph theory in conjunction with the explore variant of the on-the-fly probability enhanced sampling method to drive reaction discovery runs. Once different chemical products are determined, we construct an ad-hoc neural network-based collective variable to improve sampling, and finally we refine the results using the free energy perturbation theory and a more accurate Hamiltonian. We apply this strategy to both intramolecular and intermolecular reactions. Our workflow requires minimal user input and extends the power of ab initio molecular dynamics to explore and characterize the reaction space.
Affiliation(s)
- Umberto Raucci
- Italian Institute of Technology, Via E. Melen 83, 16152, Genova, Italy
| | - Valerio Rizzi
- Italian Institute of Technology, Via E. Melen 83, 16152, Genova, Italy
| | | |
|
16
|
Chen H, Liu H, Feng H, Fu H, Cai W, Shao X, Chipot C. MLCV: Bridging Machine-Learning-Based Dimensionality Reduction and Free-Energy Calculation. J Chem Inf Model 2021; 62:1-8. [PMID: 34939790 DOI: 10.1021/acs.jcim.1c01010] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/28/2022]
Abstract
Importance-sampling algorithms leaning on the definition of a model reaction coordinate (RC) are widely employed to probe processes relevant to chemistry and biology alike, spanning time scales not amenable to common, brute-force molecular dynamics (MD) simulations. In practice, the model RC often consists of a handful of collective variables (CVs) chosen on the basis of chemical intuition. However, constructing manually a low-dimensional RC model to describe an intricate geometrical transformation for the purpose of free-energy calculations and analyses remains a daunting challenge due to the inherent complexity of the conformational transitions at play. To solve this issue, remarkable progress has been made in employing machine-learning techniques, such as autoencoders, to extract the low-dimensional RC model from a large set of CVs. Implementation of the differentiable, nonlinear machine-learned CVs in common MD engines to perform free-energy calculations is, however, particularly cumbersome. To address this issue, we present here a user-friendly tool (called MLCV) that facilitates the use of machine-learned CVs in importance-sampling simulations through the popular Colvars module. Our approach is critically probed with three case examples consisting of small peptides, showcasing that through hard-coded neural network in Colvars, deep-learning and enhanced-sampling can be effectively bridged with MD simulations. The MLCV code is versatile, applicable to all the CVs available in Colvars, and can be connected to any kind of dense neural networks. We believe that MLCV provides an effective, powerful, and user-friendly platform accessible to experts and nonexperts alike for machine-learning (ML)-guided CV discovery and enhanced-sampling simulations to unveil the molecular mechanisms underlying complex biochemical processes.
Affiliation(s)
- Haochuan Chen
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin 300071, China.,Tianjin Key Laboratory of Biosensing and Molecular Recognition, Tianjin 300071, China.,State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
| | - Han Liu
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin 300071, China.,Tianjin Key Laboratory of Biosensing and Molecular Recognition, Tianjin 300071, China.,State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
| | - Heying Feng
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin 300071, China.,Tianjin Key Laboratory of Biosensing and Molecular Recognition, Tianjin 300071, China.,State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
| | - Haohao Fu
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin 300071, China.,Tianjin Key Laboratory of Biosensing and Molecular Recognition, Tianjin 300071, China.,State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
| | - Wensheng Cai
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin 300071, China.,Tianjin Key Laboratory of Biosensing and Molecular Recognition, Tianjin 300071, China.,State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
| | - Xueguang Shao
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin 300071, China.,Tianjin Key Laboratory of Biosensing and Molecular Recognition, Tianjin 300071, China.,State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
| | - Christophe Chipot
- Laboratoire International Associé CNRS and University of Illinois at Urbana-Champaign, UMR no. 7019, Université de Lorraine, BP 70239, F-54506 Vandœuvre-lès-Nancy, France
| |
|
17
|
Tsai ST, Smith Z, Tiwary P. SGOOP-d: Estimating Kinetic Distances and Reaction Coordinate Dimensionality for Rare Event Systems from Biased/Unbiased Simulations. J Chem Theory Comput 2021; 17:6757-6765. [PMID: 34662516 DOI: 10.1021/acs.jctc.1c00431] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 12/29/2022]
Abstract
Understanding kinetics including reaction pathways and associated transition rates is an important yet difficult problem in numerous chemical and biological systems, especially in situations with multiple competing pathways. When these high-dimensional systems are projected on low-dimensional coordinates, which are often needed for enhanced sampling or for interpretation of simulations and experiments, one can end up losing the kinetic connectivity of the underlying high-dimensional landscape. Thus, in the low-dimensional projection, metastable states might appear closer or further than they actually are. To deal with this issue, in this work, we develop a formalism that learns a multidimensional yet minimally complex reaction coordinate (RC) for generic high-dimensional systems. When projected along this RC, all possible kinetically relevant pathways can be demarcated and the true high-dimensional connectivity is maintained. One of the defining attributes of our method lies in that it can work on long unbiased simulations as well as biased simulations often needed for rare event systems. We demonstrate the utility of the method by studying a range of model systems including conformational transitions in a small peptide Ace-Ala3-Nme, where we show how two-dimensional and three-dimensional RCs found by our previously published spectral gap optimization method "SGOOP" [Tiwary, P. and Berne, B. J. Proc. Natl. Acad. Sci. 2016, 113, 2839] can capture the kinetics for 23 and all 28 out of the 28 dominant state-to-state transitions, respectively.
Affiliation(s)
- Sun-Ting Tsai
- Department of Physics and Institute for Physical Science and Technology, University of Maryland, College Park 20742, Maryland, United States
| | - Zachary Smith
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park 20742, Maryland, United States
| | - Pratyush Tiwary
- Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park 20742, Maryland, United States
| |
|
18
|
Bonati L, Piccini G, Parrinello M. Deep learning the slow modes for rare events sampling. Proc Natl Acad Sci U S A 2021; 118:e2113533118. [PMID: 34706940 PMCID: PMC8612227 DOI: 10.1073/pnas.2113533118] [Citation(s) in RCA: 90] [Impact Index Per Article: 30.0] [Accepted: 09/19/2021] [Indexed: 02/08/2023] Open
Abstract
The development of enhanced sampling methods has greatly extended the scope of atomistic simulations, allowing long-time phenomena to be studied with accessible computational resources. Many such methods rely on the identification of an appropriate set of collective variables. These are meant to describe the system's modes that most slowly approach equilibrium under the action of the sampling algorithm. Once identified, the equilibration of these modes is accelerated by the enhanced sampling method of choice. An attractive way of determining the collective variables is to relate them to the eigenfunctions and eigenvalues of the transfer operator. Unfortunately, this requires knowing the long-term dynamics of the system beforehand, which is generally not available. However, we have recently shown that it is indeed possible to determine efficient collective variables starting from biased simulations. In this paper, we bring the power of machine learning and the efficiency of the recently developed on the fly probability-enhanced sampling method to bear on this approach. The result is a powerful and robust algorithm that, given an initial enhanced sampling simulation performed with trial collective variables or generalized ensembles, extracts transfer operator eigenfunctions using a neural network ansatz and then accelerates them to promote sampling of rare events. To illustrate the generality of this approach, we apply it to several systems, ranging from the conformational transition of a small molecule to the folding of a miniprotein and the study of materials crystallization.
Affiliation(s)
- Luigi Bonati
- Department of Physics, Eidgenössische Technische Hochschule (ETH) Zürich, 8092 Zürich, Switzerland;
- Atomistic Simulations, Italian Institute of Technology, 16163 Genova, Italy
| | | | - Michele Parrinello
- Atomistic Simulations, Italian Institute of Technology, 16163 Genova, Italy;
| |
|
19
|
Rydzewski J, Valsson O. Multiscale Reweighted Stochastic Embedding: Deep Learning of Collective Variables for Enhanced Sampling. J Phys Chem A 2021; 125:6286-6302. [PMID: 34213915 PMCID: PMC8389995 DOI: 10.1021/acs.jpca.1c02869] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 03/30/2021] [Revised: 06/17/2021] [Indexed: 12/29/2022]
Abstract
Machine learning methods provide a general framework for automatically finding and representing the essential characteristics of simulation data. This task is particularly crucial in enhanced sampling simulations. There we seek a few generalized degrees of freedom, referred to as collective variables (CVs), to represent and drive the sampling of the free energy landscape. In theory, these CVs should separate different metastable states and correspond to the slow degrees of freedom of the studied physical process. To this aim, we propose a new method that we call multiscale reweighted stochastic embedding (MRSE). Our work builds upon a parametric version of stochastic neighbor embedding. The technique automatically learns CVs that map a high-dimensional feature space to a low-dimensional latent space via a deep neural network. We introduce several new advancements to stochastic neighbor embedding methods that make MRSE especially suitable for enhanced sampling simulations: (1) weight-tempered random sampling as a landmark selection scheme to obtain training data sets that strike a balance between equilibrium representation and capturing important metastable states lying higher in free energy; (2) a multiscale representation of the high-dimensional feature space via a Gaussian mixture probability model; and (3) a reweighting procedure to account for training data from a biased probability distribution. We show that MRSE constructs low-dimensional CVs that can correctly characterize the different metastable states in three model systems: the Müller-Brown potential, alanine dipeptide, and alanine tetrapeptide.
Affiliation(s)
- Jakub Rydzewski
- Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland
| | - Omar Valsson
- Max Planck Institute for Polymer Research, Ackermannweg 10, Mainz D-55128, Germany
| |
|