1
Wu Y, Cao S, Qiu Y, Huang X. Tutorial on how to build non-Markovian dynamic models from molecular dynamics simulations for studying protein conformational changes. J Chem Phys 2024; 160:121501. [PMID: 38516972 DOI: 10.1063/5.0189429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 02/20/2024] [Indexed: 03/23/2024] Open
Abstract
Protein conformational changes play crucial roles in their biological functions. In recent years, the Markov State Model (MSM) constructed from extensive Molecular Dynamics (MD) simulations has emerged as a powerful tool for modeling complex protein conformational changes. In MSMs, dynamics are modeled as a sequence of Markovian transitions among metastable conformational states at discrete time intervals (called lag time). A major challenge for MSMs is that the lag time must be long enough to allow transitions among states to become memoryless (or Markovian). However, this lag time is constrained by the length of individual MD simulations available to track these transitions. To address this challenge, we have recently developed Generalized Master Equation (GME)-based approaches, encoding non-Markovian dynamics using a time-dependent memory kernel. In this Tutorial, we introduce the theory behind two recently developed GME-based non-Markovian dynamic models: the quasi-Markov State Model (qMSM) and the Integrative Generalized Master Equation (IGME). We subsequently outline the procedures for constructing these models and provide a step-by-step tutorial on applying qMSM and IGME to study two peptide systems: alanine dipeptide and villin headpiece. This Tutorial is available at https://github.com/xuhuihuang/GME_tutorials. The protocols detailed in this Tutorial aim to be accessible for non-experts interested in studying the biomolecular dynamics using these non-Markovian dynamic models.
Affiliation(s)
- Yue Wu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Siqin Cao
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Yunrui Qiu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Xuhui Huang
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
  - Data Science Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
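The Markov state model described in the abstract of entry 1 rests on a transition matrix estimated at a chosen lag time. The following is a minimal sketch of that idea, not the tutorial's own code: the function names, the toy trajectory, and the use of NumPy are assumptions made for illustration. It counts transitions in a discretized trajectory and computes implied timescales, whose leveling off with increasing lag time is the usual check that the dynamics have become memoryless.

```python
import numpy as np

def estimate_transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition counts at a given lag (in frames)."""
    dtraj = np.asarray(dtraj)
    counts = np.zeros((n_states, n_states))
    # Sliding-window count of transitions i -> j separated by `lag` frames.
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed counts stay zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def implied_timescales(T, lag):
    """Implied timescales -lag / ln|lambda_i| for the non-stationary eigenvalues."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(eigvals[1:])

# Hypothetical two-state discrete trajectory, for illustration only.
dtraj = [0, 0, 1, 1, 0, 0, 1, 1]
T = estimate_transition_matrix(dtraj, n_states=2, lag=1)
print(T)
print(implied_timescales(T, lag=1))
```

In practice the same estimate would be repeated over a range of lag times; the non-Markovian models in this entry are motivated by cases where the implied timescales converge only at lag times longer than the available trajectories.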
2
Lorpaiboon C, Guo SC, Strahan J, Weare J, Dinner AR. Accurate estimates of dynamical statistics using memory. J Chem Phys 2024; 160:084108. [PMID: 38391020 PMCID: PMC10898919 DOI: 10.1063/5.0187145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Accepted: 01/29/2024] [Indexed: 02/24/2024] Open
Abstract
Many chemical reactions and molecular processes occur on time scales that are significantly longer than those accessible by direct simulations. One successful approach to estimating dynamical statistics for such processes is to use many short time series of observations of the system to construct a Markov state model, which approximates the dynamics of the system as memoryless transitions between a set of discrete states. The dynamical Galerkin approximation (DGA) is a closely related framework for estimating dynamical statistics, such as committors and mean first passage times, by approximating solutions to their equations with a projection onto a basis. Because the projected dynamics are generally not memoryless, the Markov approximation can result in significant systematic errors. Inspired by quasi-Markov state models, which employ the generalized master equation to encode memory resulting from the projection, we reformulate DGA to account for memory and analyze its performance on two systems: a two-dimensional triple well and the AIB9 peptide. We demonstrate that our method is robust to the choice of basis and can decrease the time series length required to obtain accurate kinetics by an order of magnitude.
Affiliation(s)
- Chatipat Lorpaiboon
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, USA
- Spencer C. Guo
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, USA
- John Strahan
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, USA
- Jonathan Weare
  - Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, USA
- Aaron R. Dinner
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, USA
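The committor mentioned in the abstract of entry 2 has a simple closed form in the memoryless (Markov) limit: for intermediate states it solves a linear system built from the transition matrix. The sketch below shows only that Markovian baseline, not the memory-corrected DGA of the paper; the function name and the three-state toy matrix are hypothetical.

```python
import numpy as np

def committor(T, A, B):
    """Forward committor q_i = P(reach B before A | start in i) for a row-stochastic T."""
    n = T.shape[0]
    A, B = list(A), list(B)
    # Intermediate states: everything outside the reactant set A and product set B.
    I = [i for i in range(n) if i not in A and i not in B]
    q = np.zeros(n)          # q = 0 on A by definition
    q[B] = 1.0               # q = 1 on B by definition
    if I:
        # Solve (Id - T_II) q_I = T_IB 1 for the intermediate states.
        lhs = np.eye(len(I)) - T[np.ix_(I, I)]
        rhs = T[np.ix_(I, B)].sum(axis=1)
        q[I] = np.linalg.solve(lhs, rhs)
    return q

# Hypothetical symmetric three-state chain: state 0 is A, state 2 is B.
T3 = np.array([[0.90, 0.10, 0.00],
               [0.25, 0.50, 0.25],
               [0.00, 0.10, 0.90]])
print(committor(T3, A=[0], B=[2]))
```

When the projected dynamics retain memory, as the abstract argues they generally do, this Markovian estimate can carry systematic error, which is the gap the memory-based reformulation targets.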
3
Guo SC, Shen R, Roux B, Dinner AR. Dynamics of activation in the voltage-sensing domain of Ciona intestinalis phosphatase Ci-VSP. Nat Commun 2024; 15:1408. [PMID: 38360718 PMCID: PMC10869754 DOI: 10.1038/s41467-024-45514-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 01/25/2024] [Indexed: 02/17/2024] Open
Abstract
The Ciona intestinalis voltage-sensing phosphatase (Ci-VSP) is a membrane protein containing a voltage-sensing domain (VSD) that is homologous to VSDs from voltage-gated ion channels responsible for cellular excitability. Previously published crystal structures of Ci-VSD in putative resting and active conformations suggested a helical-screw voltage sensing mechanism in which the S4 helix translocates and rotates to enable exchange of salt-bridge partners, but the microscopic details of the transition between the resting and active conformations remained unknown. Here, by combining extensive molecular dynamics simulations with a recently developed computational framework based on dynamical operators, we elucidate the microscopic mechanism of the resting-active transition at physiological membrane potential. Sparse regression reveals a small set of coordinates that distinguish intermediates that are hidden from electrophysiological measurements. The intermediates arise from a noncanonical helical-screw mechanism in which translocation, rotation, and side-chain movement of the S4 helix are only loosely coupled. These results provide insights into existing experimental and computational findings on voltage sensing and suggest ways of further probing its mechanism.
Affiliation(s)
- Spencer C Guo
  - Department of Chemistry, The University of Chicago, Chicago, IL, 60637, USA
  - James Franck Institute, The University of Chicago, Chicago, IL, 60637, USA
- Rong Shen
  - Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, IL, 60637, USA
- Benoît Roux
  - Department of Chemistry, The University of Chicago, Chicago, IL, 60637, USA
  - Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, IL, 60637, USA
  - Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL, 60637, USA
- Aaron R Dinner
  - Department of Chemistry, The University of Chicago, Chicago, IL, 60637, USA
  - James Franck Institute, The University of Chicago, Chicago, IL, 60637, USA
  - Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL, 60637, USA
4
Cao S, Qiu Y, Kalin ML, Huang X. Integrative generalized master equation: A method to study long-timescale biomolecular dynamics via the integrals of memory kernels. J Chem Phys 2023; 159:134106. [PMID: 37787134 PMCID: PMC11005468 DOI: 10.1063/5.0167287] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/11/2023] [Accepted: 09/18/2023] [Indexed: 10/04/2023] Open
Abstract
The generalized master equation (GME) provides a powerful approach to study biomolecular dynamics via non-Markovian dynamic models built from molecular dynamics (MD) simulations. Previously, we have implemented the GME, namely the quasi Markov State Model (qMSM), where we explicitly calculate the memory kernel and propagate dynamics using a discretized GME. qMSM can be constructed with much shorter MD trajectories than the MSM. However, since qMSM needs to explicitly compute the time-dependent memory kernels, it is heavily affected by the numerical fluctuations of simulation data when applied to study biomolecular conformational changes. This can lead to numerical instability of predicted long-time dynamics, greatly limiting the applicability of qMSM in complicated biomolecules. We present a new method, the Integrative GME (IGME), in which we analytically solve the GME under the condition when the memory kernels have decayed to zero. Our IGME overcomes the challenges of the qMSM by using the time integrations of memory kernels, thereby avoiding the numerical instability caused by explicit computation of time-dependent memory kernels. Using our solutions of the GME, we have developed a new approach to compute long-time dynamics based on MD simulations in a numerically stable, accurate and efficient way. To demonstrate its effectiveness, we have applied the IGME in three biomolecules: the alanine dipeptide, FIP35 WW-domain, and Taq RNA polymerase. In each system, the IGME achieves significantly smaller fluctuations for both memory kernels and long-time dynamics compared to the qMSM. We anticipate that the IGME can be widely applied to investigate biomolecular conformational changes.
Affiliation(s)
- Siqin Cao
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Yunrui Qiu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Michael L. Kalin
  - Biophysics Graduate Program, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Xuhui Huang
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
5
Conev A, Rigo MM, Devaurs D, Fonseca AF, Kalavadwala H, de Freitas MV, Clementi C, Zanatta G, Antunes DA, Kavraki LE. EnGens: a computational framework for generation and analysis of representative protein conformational ensembles. Brief Bioinform 2023; 24:bbad242. [PMID: 37418278 PMCID: PMC10359083 DOI: 10.1093/bib/bbad242] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/11/2023] [Revised: 05/23/2023] [Accepted: 06/10/2023] [Indexed: 07/08/2023] Open
Abstract
Proteins are dynamic macromolecules that perform vital functions in cells. A protein structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understand their mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insights into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in the number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing representative protein conformational ensembles. In this work, we: (1) provide an overview of existing methods and tools for representative protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package, and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; (3) test our pipeline on a few canonical examples from the literature. Representative ensembles produced by EnGens can be used for many downstream tasks such as protein-ligand ensemble docking, Markov state modeling of protein dynamics and analysis of the effect of single-point mutations.
Affiliation(s)
- Anja Conev
  - Department of Computer Science, Rice University, Houston 77005, TX, USA
- Didier Devaurs
  - MRC Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK
- Hussain Kalavadwala
  - Department of Biology and Biochemistry, University of Houston, Houston 77004, TX, USA
- Cecilia Clementi
  - Department of Physics, Freie Universität Berlin, Berlin 14195, Germany
- Geancarlo Zanatta
  - Department of Biophysics, Institute of Biosciences, Federal University of Rio Grande do Sul, Porto Alegre 91501-970, Brazil
- Dinler Amaral Antunes
  - Department of Biology and Biochemistry, University of Houston, Houston 77004, TX, USA
- Lydia E Kavraki
  - Department of Computer Science, Rice University, Houston 77005, TX, USA
6
Strahan J, Guo SC, Lorpaiboon C, Dinner AR, Weare J. Inexact iterative numerical linear algebra for neural network-based spectral estimation and rare-event prediction. J Chem Phys 2023; 159:014110. [PMID: 37409704 PMCID: PMC10328561 DOI: 10.1063/5.0151309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 06/02/2023] [Indexed: 07/07/2023] Open
Abstract
Understanding dynamics in complex systems is challenging because there are many degrees of freedom, and those that are most important for describing events of interest are often not obvious. The leading eigenfunctions of the transition operator are useful for visualization, and they can provide an efficient basis for computing statistics, such as the likelihood and average time of events (predictions). Here, we develop inexact iterative linear algebra methods for computing these eigenfunctions (spectral estimation) and making predictions from a dataset of short trajectories sampled at finite intervals. We demonstrate the methods on a low-dimensional model that facilitates visualization and a high-dimensional model of a biomolecular system. Implications for the prediction problem in reinforcement learning are discussed.
Affiliation(s)
- John Strahan
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, USA
- Spencer C. Guo
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, USA
- Chatipat Lorpaiboon
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, USA
- Aaron R. Dinner
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, USA
- Jonathan Weare
  - Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, USA
7
Conev A, Rigo MM, Devaurs D, Fonseca AF, Kalavadwala H, de Freitas MV, Clementi C, Zanatta G, Antunes DA, Kavraki L. EnGens: a computational framework for generation and analysis of representative protein conformational ensembles. bioRxiv 2023:2023.04.24.538094. [PMID: 37163076 PMCID: PMC10168271 DOI: 10.1101/2023.04.24.538094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Proteins are dynamic macromolecules that perform vital functions in cells. A protein structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understand their mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insights into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing protein conformational ensembles. In this work we: (1) provide an overview of existing methods and tools for protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package, and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; (3) test our pipeline on a few canonical examples found in the literature. Representative ensembles produced by EnGens can be used for many downstream tasks such as protein-ligand ensemble docking, Markov state modeling of protein dynamics and analysis of the effect of single-point mutations.
8
Dominic AJ, Cao S, Montoya-Castillo A, Huang X. Memory Unlocks the Future of Biomolecular Dynamics: Transformative Tools to Uncover Physical Insights Accurately and Efficiently. J Am Chem Soc 2023; 145:9916-9927. [PMID: 37104720 DOI: 10.1021/jacs.3c01095] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 04/29/2023]
Abstract
Conformational changes underpin function and encode complex biomolecular mechanisms. Gaining atomic-level detail of how such changes occur has the potential to reveal these mechanisms and is of critical importance in identifying drug targets, facilitating rational drug design, and enabling bioengineering applications. While the past two decades have brought Markov state model techniques to the point where practitioners can regularly use them to glimpse the long-time dynamics of slow conformations in complex systems, many systems are still beyond their reach. In this Perspective, we discuss how including memory (i.e., non-Markovian effects) can reduce the computational cost to predict the long-time dynamics in these complex systems by orders of magnitude and with greater accuracy and resolution than state-of-the-art Markov state models. We illustrate how memory lies at the heart of successful and promising techniques, ranging from the Fokker-Planck and generalized Langevin equations to deep-learning recurrent neural networks and generalized master equations. We delineate how these techniques work, identify insights that they can offer in biomolecular systems, and discuss their advantages and disadvantages in practical settings. We show how generalized master equations can enable the investigation of, for example, the gate-opening process in RNA polymerase II and demonstrate how our recent advances tame the deleterious influence of statistical underconvergence of the molecular dynamics simulations used to parameterize these techniques. This represents a significant leap forward that will enable our memory-based techniques to interrogate systems that are currently beyond the reach of even the best Markov state models. We conclude by discussing some current challenges and future prospects for how exploiting memory will open the door to many exciting opportunities.
Affiliation(s)
- Anthony J Dominic
  - Department of Chemistry, University of Colorado Boulder, Boulder, Colorado 80309, USA
- Siqin Cao
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Xuhui Huang
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
9
Chen H, Chipot C. Chasing collective variables using temporal data-driven strategies. QRB Discovery 2023; 4:e2. [PMID: 37564298 PMCID: PMC10411323 DOI: 10.1017/qrd.2022.23] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/11/2022] [Revised: 12/21/2022] [Accepted: 12/29/2022] [Indexed: 01/09/2023] Open
Abstract
The convergence of free-energy calculations based on importance sampling depends heavily on the choice of collective variables (CVs), which in principle, should include the slow degrees of freedom of the biological processes to be investigated. Autoencoders (AEs), as emerging data-driven dimension reduction tools, have been utilised for discovering CVs. AEs, however, are often treated as black boxes, and what AEs actually encode during training, and whether the latent variables from encoders are suitable as CVs for further free-energy calculations remains unknown. In this contribution, we review AEs and their time-series-based variants, including time-lagged AEs (TAEs) and modified TAEs, as well as the closely related model variational approach for Markov processes networks (VAMPnets). We then show through numerical examples that AEs learn the high-variance modes instead of the slow modes. In stark contrast, time series-based models are able to capture the slow modes. Moreover, both modified TAEs with extensions from slow feature analysis and the state-free reversible VAMPnets (SRVs) can yield orthogonal multidimensional CVs. As an illustration, we employ SRVs to discover the CVs of the isomerizations of N-acetyl-N'-methylalanylamide and trialanine by iterative learning with trajectories from biased simulations. Last, through numerical experiments with anisotropic diffusion, we investigate the potential relationship of time-series-based models and committor probabilities.
Affiliation(s)
- Haochuan Chen
  - Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
- Christophe Chipot
  - Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
  - Theoretical and Computational Biophysics Group, Beckman Institute, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA
10
Hsu WT, Ramirez DA, Sammakia T, Tan Z, Shirts MR. Identifying signatures of proteolytic stability and monomeric propensity in O-glycosylated insulin using molecular simulation. J Comput Aided Mol Des 2022; 36:313-328. [PMID: 35507105 DOI: 10.1007/s10822-022-00453-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2021] [Accepted: 04/06/2022] [Indexed: 11/24/2022]
Abstract
Insulin has been commonly adopted as a peptide drug to treat diabetes as it facilitates the uptake of glucose from the blood. The development of oral insulin remains elusive over decades owing to its susceptibility to the enzymes in the gastrointestinal tract and poor permeability through the intestinal epithelium upon dimerization. Recent experimental studies have revealed that certain O-linked glycosylation patterns could enhance insulin's proteolytic stability and reduce its dimerization propensity, but understanding such phenomena at the molecular level is still difficult. To address this challenge, we proposed and tested several structural determinants that could potentially influence insulin's proteolytic stability and dimerization propensity. We used these metrics to assess the properties of interest from [Formula: see text] aggregate molecular dynamics of each of 12 targeted insulin glyco-variants from multiple wild-type crystal structures. We found that glycan-involved hydrogen bonds and glycan-dimer occlusion were useful metrics predicting the proteolytic stability and dimerization propensity of insulin, respectively, as was in part the solvent-accessible surface area of proteolytic sites. However, other plausible metrics were not generally predictive. This work helps better explain how O-linked glycosylation influences the proteolytic stability and monomeric propensity of insulin, illuminating a path towards rational molecular design of insulin glycoforms.
Affiliation(s)
- Wei-Tse Hsu
  - Department of Chemical & Biological Engineering, University of Colorado Boulder, Boulder, CO, 80309, USA
- Dominique A Ramirez
  - Department of Biochemistry, University of Colorado Boulder, Boulder, CO, 80309, USA
- Tarek Sammakia
  - Department of Chemistry, University of Colorado Boulder, Boulder, CO, 80309, USA
- Zhongping Tan
  - Institute of Materia Medica, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, 100050, China
- Michael R Shirts
  - Department of Chemical & Biological Engineering, University of Colorado Boulder, Boulder, CO, 80309, USA
11
Abstract
The kinetics of a dynamical system dominated by two metastable states is examined from the perspective of the activated-dynamics reactive flux formalism, Markov state eigenvalue spectral decomposition, and committor-based transition path theory. Analysis shows that the different theoretical formulations are consistent, clarifying the significance of the inherent microscopic lag-times that are implicated, and that the most meaningful one-dimensional reaction coordinate in the region of the transition state is along the gradient of the committor in the multidimensional subspace of collective variables. It is shown that the familiar reactive flux activated dynamics formalism provides an effective route to calculate the transition rate in the case of a narrow sharp barrier but much less so in the case of a broad flat barrier. In this case, the standard reactive flux correlation function decays very slowly to the plateau value that corresponds to the transmission coefficient. Treating the committor function as a reaction coordinate does not alleviate all issues caused by the slow relaxation of the reactive flux correlation function. A more efficient activated dynamics simulation algorithm may be achieved from a modified reactive flux weighted by the committor. Simulation results on simple systems are used to illustrate the various conceptual points.
Affiliation(s)
- Benoît Roux
  - Department of Biochemistry and Molecular Biology, Department of Chemistry, The University of Chicago, 5735 S Ellis Ave., Chicago, Illinois 60637, USA
12
Busto-Moner L, Feng CJ, Antoszewski A, Tokmakoff A, Dinner AR. Structural Ensemble of the Insulin Monomer. Biochemistry 2021; 60:3125-3136. [PMID: 34637307 PMCID: PMC8552439 DOI: 10.1021/acs.biochem.1c00583] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/30/2021] [Revised: 09/21/2021] [Indexed: 11/29/2022]
Abstract
Experimental evidence suggests that monomeric insulin exhibits significant conformational heterogeneity, and modifications of apparently disordered regions affect both biological activity and the longevity of pharmaceutical formulations, presumably through receptor binding and fibrillation/degradation, respectively. However, a microscopic understanding of conformational heterogeneity has been lacking. Here, we integrate all-atom molecular dynamics simulations with an analysis pipeline to investigate the structural ensemble of human insulin monomers. We find that 60% of the structures present at least one of the following elements of disorder: melting of the A-chain N-terminal helix, detachment of the B-chain N-terminus, and detachment of the B-chain C-terminus. We also observe partial melting and extension of the B-chain helix and significant conformational heterogeneity in the region containing the B-chain β-turn. We then estimate hydrogen-exchange protection factors for the sampled ensemble and find them in line with experimental results for KP-insulin, although the simulations underestimate the importance of unfolded states. Our results help explain the ready exchange of specific amide sites that appear to be protected in crystal structures. Finally, we discuss the implications for insulin function and stability.
Affiliation(s)
- Luis Busto-Moner
  - Department of Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
- Chi-Jui Feng
  - Department of Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
- Adam Antoszewski
  - Department of Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
- Andrei Tokmakoff
  - Department of Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
  - James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
  - Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637, United States
- Aaron R. Dinner
  - Department of Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
  - James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
  - Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637, United States
13
Thomas T, Roux B. Tyrosine kinases: complex molecular systems challenging computational methodologies. Eur Phys J B 2021; 94:203. [PMID: 36524055 PMCID: PMC9749240 DOI: 10.1140/epjb/s10051-021-00207-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/22/2021] [Accepted: 09/14/2021] [Indexed: 05/28/2023]
Abstract
Classical molecular dynamics (MD) simulations based on atomic models play an increasingly important role in a wide range of applications in physics, biology, and chemistry. Nonetheless, generating genuine knowledge about biological systems using MD simulations remains challenging. Protein tyrosine kinases are important cellular signaling enzymes that regulate cell growth, proliferation, metabolism, differentiation, and migration. Due to the large conformational changes and long timescales involved in their function, these kinases present particularly challenging problems to modern computational and theoretical frameworks aimed at elucidating the dynamics of complex biomolecular systems. Markov state models have achieved limited success in tackling the broader conformational ensemble and biased methods are often employed to examine specific long timescale events. Recent advances in machine learning continue to push the limitations of current methodologies and provide notable improvements when integrated with the existing frameworks. A broad perspective is drawn from a critical review of recent studies.
|
14
|
Abstract
The kinetics of a dynamical system comprising two metastable states is formulated in terms of a finite-time propagator in phase space (position and velocity) adapted to the underdamped Langevin equation. Dimensionality reduction to a subspace of collective variables yields familiar expressions for the propagator, committor, and steady-state flux. A quadratic expression for the steady-state flux between the two metastable states can serve as a robust variational principle to determine an optimal approximate committor expressed in terms of a set of collective variables. The theoretical formulation is exploited to clarify the foundation of the string method with swarms-of-trajectories, which relies on the mean drift of short trajectories to determine the optimal transition pathway. It is argued that the conditions for Markovity within a subspace of collective variables may not be satisfied with an arbitrarily short time step and that proper kinetic behaviors appear only when considering the effective propagator for longer lag times. The effective propagator with finite lag time is amenable to an eigenvalue-eigenvector spectral analysis, as elaborated previously in the context of position-based Markov models. The time-correlation functions calculated by swarms-of-trajectories along the string pathway constitute a natural extension of these developments. The present formulation provides a powerful theoretical framework to characterize the optimal pathway between two metastable states of a system.
Affiliation(s)
- Benoît Roux
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, Illinois 60637, United States
- Department of Chemistry, The University of Chicago, 5735 S. Ellis Avenue, Chicago, Illinois 60637, United States
| |
|
15
|
Ferguson AL, Hachmann J, Miller TF, Pfaendtner J. The Journal of Physical Chemistry A/B/C Virtual Special Issue on Machine Learning in Physical Chemistry. J Phys Chem A 2021; 124:9113-9118. [PMID: 33147969 DOI: 10.1021/acs.jpca.0c09205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
16
|
Strahan J, Antoszewski A, Lorpaiboon C, Vani BP, Weare J, Dinner AR. Long-Time-Scale Predictions from Short-Trajectory Data: A Benchmark Analysis of the Trp-Cage Miniprotein. J Chem Theory Comput 2021; 17:2948-2963. [PMID: 33908762 DOI: 10.1021/acs.jctc.0c00933] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/28/2022]
Abstract
Elucidating physical mechanisms with statistical confidence from molecular dynamics simulations can be challenging owing to the many degrees of freedom that contribute to collective motions. To address this issue, we recently introduced a dynamical Galerkin approximation (DGA) [Thiede, E. H. J. Chem. Phys., 150, 2019, 244111], in which chemical kinetic statistics that satisfy equations of dynamical operators are represented by a basis expansion. Here, we reformulate this approach, clarifying (and reducing) the dependence on the choice of lag time. We present a new projection of the reactive current onto collective variables and provide improved estimators for rates and committors. We also present simple procedures for constructing suitable smoothly varying basis functions from arbitrary molecular features. To evaluate estimators and basis sets numerically, we generate and carefully validate a data set of short trajectories for the unfolding and folding of the trp-cage miniprotein, a well-studied system. Our analysis demonstrates a comprehensive strategy for characterizing reaction pathways quantitatively.
Affiliation(s)
- John Strahan
- Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States
| | - Adam Antoszewski
- Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States
| | - Chatipat Lorpaiboon
- Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States
| | - Bodhi P Vani
- Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States
| | - Jonathan Weare
- Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, United States
| | - Aaron R Dinner
- Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States
| |
|
17
|
Webber RJ, Thiede EH, Dow D, Dinner AR, Weare J. Error Bounds for Dynamical Spectral Estimation. SIAM J Math Data Sci 2021; 3:225-252. [PMID: 34355137 PMCID: PMC8336423 DOI: 10.1137/20m1335984] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Dynamical spectral estimation is a well-established numerical approach for estimating eigenvalues and eigenfunctions of the Markov transition operator from trajectory data. Although the approach has been widely applied in biomolecular simulations, its error properties remain poorly understood. Here we analyze the error of a dynamical spectral estimation method called "the variational approach to conformational dynamics" (VAC). We bound the approximation error and estimation error for VAC estimates. Our analysis establishes VAC's convergence properties and suggests new strategies for tuning VAC to improve accuracy.
Affiliation(s)
- Robert J Webber
- Courant Institute of Mathematical Sciences, New York University, New York, NY 10012 USA
| | - Erik H Thiede
- Department of Chemistry, University of Chicago, Chicago, IL 60637 USA
| | - Douglas Dow
- Department of Mathematics, University of Chicago, Chicago, IL 60637 USA
| | - Aaron R Dinner
- Department of Chemistry, University of Chicago, Chicago, IL 60637 USA
| | - Jonathan Weare
- Courant Institute of Mathematical Sciences, New York University, New York, NY 10012 USA
| |
|
18
|
Ferguson AL, Hachmann J, Miller TF, Pfaendtner J. The Journal of Physical Chemistry A/B/C Virtual Special Issue on Machine Learning in Physical Chemistry. J Phys Chem B 2021; 124:9767-9772. [PMID: 33147970 DOI: 10.1021/acs.jpcb.0c09206] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
|