1
Beyerle ER, Mehdi S, Tiwary P. Quantifying Energetic and Entropic Pathways in Molecular Systems. J Phys Chem B 2022; 126:3950-3960. [PMID: 35605180 DOI: 10.1021/acs.jpcb.2c01782] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
When examining dynamics occurring at nonzero temperatures, both energy and entropy must be taken into account to describe activated barrier crossing events. Furthermore, good reaction coordinates need to be constructed to describe different metastable states and the transition mechanisms between them. Here we use a physics-based machine learning method called state predictive information bottleneck (SPIB) to find nonlinear reaction coordinates for three systems of varying complexity. SPIB is able to correctly predict an entropic bottleneck for an analytical flat-energy double-well system and identify the entropy- and energy-dominated pathways for an analytical four-well system. Finally, for a simulation of benzoic acid permeation through a lipid bilayer, SPIB is able to discover the entropic and energetic barriers to the permeation process. Given these results, we thus establish that SPIB is a reasonable and robust method for finding the important entropy, energy, and enthalpy barriers in physical systems, which can then be used to enhance the understanding and sampling of different activated mechanisms.
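A worked example makes the energy/entropy distinction concrete. The sketch below is not SPIB. It only illustrates how a one-dimensional free-energy profile F(x) = E(x) - kT ln w(x) splits into energetic and entropic parts through S = -∂F/∂T and E = F + TS, using a hypothetical flat-energy channel whose width w(x) narrows at x = 0. The width function, the reduced units (k_B = 1), and the finite-difference step are assumptions chosen for illustration.

```python
import numpy as np

KB = 1.0  # Boltzmann constant in reduced units (assumption)


def width(x):
    """Hypothetical channel width that narrows to a bottleneck at x = 0."""
    return 0.2 + x**2


def energy(x):
    """Flat potential energy, so any barrier along x is purely entropic."""
    return np.zeros_like(x)


def free_energy(x, T):
    """F(x) = E(x) - kT ln w(x), up to an additive constant.

    Integrating the Boltzmann weight over the transverse coordinate of a
    channel of width w(x) contributes the -kT ln w(x) term.
    """
    return energy(x) - KB * T * np.log(width(x))


def decompose(x, T, dT=1e-3):
    """Split F into energy and entropy: S = -dF/dT, E = F + T S."""
    S = -(free_energy(x, T + dT) - free_energy(x, T - dT)) / (2 * dT)
    E = free_energy(x, T) + T * S
    return E, S


x = np.linspace(-1.0, 1.0, 201)
F = free_energy(x, T=1.0)
E, S = decompose(x, T=1.0)
print("barrier top at x =", x[np.argmax(F)])
```

On this flat landscape E(x) is zero everywhere, so the whole barrier at x = 0 comes from the -TS(x) term. Making `energy()` depend on x would mix the two contributions.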
Affiliation(s)
- Eric R Beyerle
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20740, United States
- Shams Mehdi
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
- Pratyush Tiwary
- Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
2
Mitxelena I, López X, de Sancho D. Markov state models from hierarchical density-based assignment. J Chem Phys 2021; 155:054102. [PMID: 34364321 DOI: 10.1063/5.0056748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
Markov state models (MSMs) have become one of the preferred methods for the analysis and interpretation of molecular dynamics (MD) simulations of conformational transitions in biopolymers. While there is great variation in terms of implementation, a well-defined workflow involving multiple steps is often adopted. Typically, molecular coordinates are first subjected to dimensionality reduction and then clustered into small "microstates," which are subsequently lumped into "macrostates" using the information from the slowest eigenmodes. However, the microstate dynamics is often non-Markovian, and long lag times are required to converge the relevant slow dynamics in the MSM. Here, we propose a variation on this typical workflow, taking advantage of hierarchical density-based clustering. When applied to simulation data, this type of clustering separates high-population regions of conformational space from others that are rarely visited. In this way, density-based clustering naturally implements assignment of the data based on transitions between metastable states, resulting in a core-set MSM. As a result, the state definition becomes more consistent with the assumption of Markovianity, and the timescales of the slow dynamics of the system are recovered more effectively. We present results of this simplified workflow for a model potential and MD simulations of the alanine dipeptide and the FiP35 WW domain.
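The core-set assignment step can be sketched in a few lines. The sketch assumes that clustering has already labelled each frame with a core index, or with -1 for low-density noise (the convention used by common HDBSCAN implementations). Each noise frame inherits the label of the last core visited, so short excursions out of a core are not counted as transitions. The helper names and the example labels are hypothetical; this is not the authors' workflow code.

```python
import numpy as np

NOISE = -1  # label for low-density frames (assumed HDBSCAN-style convention)


def coreset_assign(labels):
    """Assign each noise frame to the last core visited (milestoning).

    Frames before the first core visit keep the NOISE label.
    """
    labels = np.asarray(labels)
    assigned = np.empty_like(labels)
    last_core = NOISE
    for i, lab in enumerate(labels):
        if lab != NOISE:
            last_core = lab
        assigned[i] = last_core
    return assigned


def count_matrix(dtraj, n_states, lag=1):
    """Count transitions i -> j at the given lag, skipping unassigned frames."""
    counts = np.zeros((n_states, n_states), dtype=int)
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        if a != NOISE and b != NOISE:
            counts[a, b] += 1
    return counts


# Hypothetical cluster labels for one short trajectory
labels = [NOISE, 0, 0, NOISE, NOISE, 1, 1, NOISE, 0]
dtraj = coreset_assign(labels)
C = count_matrix(dtraj, n_states=2)
T = C / C.sum(axis=1, keepdims=True)  # row-normalized transition matrix
```

In this example the excursion 0 → noise → noise → 1 is counted as a single 0 → 1 transition, rather than being split across an extra, short-lived state.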
Affiliation(s)
- Ion Mitxelena
- Polimero eta Material Aurreratuak: Fisika, Kimika eta Teknologia, Kimika Fakultatea, UPV/EHU & Donostia International Physics Center (DIPC), PK 1072, 20018 Donostia-San Sebastian, Euskadi, Spain
- Xabier López
- Polimero eta Material Aurreratuak: Fisika, Kimika eta Teknologia, Kimika Fakultatea, UPV/EHU & Donostia International Physics Center (DIPC), PK 1072, 20018 Donostia-San Sebastian, Euskadi, Spain
- David de Sancho
- Polimero eta Material Aurreratuak: Fisika, Kimika eta Teknologia, Kimika Fakultatea, UPV/EHU & Donostia International Physics Center (DIPC), PK 1072, 20018 Donostia-San Sebastian, Euskadi, Spain
3
Suárez E, Wiewiora RP, Wehmeyer C, Noé F, Chodera JD, Zuckerman DM. What Markov State Models Can and Cannot Do: Correlation versus Path-Based Observables in Protein-Folding Models. J Chem Theory Comput 2021; 17:3119-3133. [PMID: 33904312 PMCID: PMC8127341 DOI: 10.1021/acs.jctc.0c01154] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Indexed: 11/28/2022]
Abstract
Markov state models (MSMs) have been widely applied to study the kinetics and pathways of protein conformational dynamics based on statistical analysis of molecular dynamics (MD) simulations. These MSMs coarse-grain both configuration space and time in ways that limit what kinds of observables they can reproduce with high fidelity over different spatial and temporal resolutions. Despite their popularity, there is still limited understanding of which biophysical observables can be computed from these MSMs in a robust and unbiased manner, and which suffer from the space-time coarse-graining intrinsic to the MSM. Most theoretical arguments and practical validity tests for MSMs rely on long-time equilibrium kinetics, such as the slowest relaxation time scales and experimentally observable time-correlation functions. Here, we perform an extensive assessment of the ability of well-validated protein folding MSMs to accurately reproduce path-based observables such as mean first-passage times (MFPTs) and transition path mechanisms compared to a direct trajectory analysis. We also assess a recently proposed class of history-augmented MSMs (haMSMs) that exploit additional information not accounted for in standard MSMs. We conclude with some practical guidance on the use of MSMs to study various problems in conformational dynamics of biomolecules. In brief, MSMs can accurately reproduce correlation functions slower than the lag time, but path-based observables can only be reliably reproduced if the lifetimes of states exceed the lag time, which is a much stricter requirement. Even in the presence of short-lived states, we find that haMSMs reproduce path-based observables more reliably.
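Mean first-passage times are a typical path-based observable. Suppose an MSM transition matrix T has been estimated at lag time τ. The MFPT m_i from state i into a target set B then satisfies m_i = 0 for i in B and m_i = τ + Σ_j T_ij m_j otherwise, which is a linear system. In the sketch below the 3-state matrix is made up and the helper name is hypothetical. Per the abstract, such MSM estimates are trustworthy only when state lifetimes exceed the lag time.

```python
import numpy as np


def mfpt_to_target(T, target, lag=1.0):
    """Mean first-passage times from every state into the set `target`.

    Solves m_i = lag + sum_j T_ij m_j for states i outside the target,
    with m_i = 0 for states inside the target.
    """
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    target_set = set(target)
    others = np.array([i for i in range(n) if i not in target_set])
    A = np.eye(len(others)) - T[np.ix_(others, others)]
    m = np.zeros(n)
    m[others] = np.linalg.solve(A, lag * np.ones(len(others)))
    return m


# Hypothetical 3-state chain; times are in units of the lag time
T = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
m = mfpt_to_target(T, target=[2])  # m is approximately [30, 20, 0]
```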
Affiliation(s)
- Ernesto Suárez
- Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD 21702
- Rafal P. Wiewiora
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
- John D. Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
- Daniel M. Zuckerman
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239
4
Guarnera E, Tan ZW, Berezovsky IN. Three-dimensional chromatin ensemble reconstruction via stochastic embedding. Structure 2021; 29:622-634.e3. [PMID: 33567266 DOI: 10.1016/j.str.2021.01.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/03/2020] [Revised: 11/17/2020] [Accepted: 01/13/2021] [Indexed: 01/04/2023]
Abstract
We propose a comprehensive method for reconstructing the whole-genome chromatin ensemble from the Hi-C data. The procedure starts from Markov state modeling (MSM), delineating the structural hierarchy of chromatin organization with partitioning and effective interactions archetypal for corresponding levels of hierarchy. The stochastic embedding procedure introduced in this work provides the 3D ensemble reconstruction, using effective interactions obtained by the MSM as the input. As a result, we obtain the structural ensemble of a genome, allowing one to model the functional and the cell-type variability in the chromatin structure. The whole-genome reconstructions performed on the human B lymphoblastoid (GM12878) and lung fibroblast (IMR90) Hi-C data unravel distinctions in their morphologies and in the spatial arrangement of intermingling chromosomal territories, paving the way to studies of chromatin dynamics, developmental changes, and conformational transitions taking place in normal cells and during potential pathological developments.
Affiliation(s)
- Enrico Guarnera
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Singapore
- Zhen Wah Tan
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Singapore
- Igor N Berezovsky
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Singapore; Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, Singapore 117597, Singapore
5
Cao S, Montoya-Castillo A, Wang W, Markland TE, Huang X. On the advantages of exploiting memory in Markov state models for biomolecular dynamics. J Chem Phys 2020; 153:014105. [PMID: 32640825 DOI: 10.1063/5.0010787] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Indexed: 01/24/2023]
Abstract
Biomolecular dynamics play an important role in numerous biological processes. Markov State Models (MSMs) provide a powerful approach to study these dynamic processes by predicting long time scale dynamics based on many short molecular dynamics (MD) simulations. In an MSM, protein dynamics are modeled as a kinetic process consisting of a series of Markovian transitions between different conformational states at discrete time intervals (called "lag time"). To achieve this, a master equation must be constructed with a sufficiently long lag time to allow interstate transitions to become truly Markovian. This poses a major challenge for MSM studies of proteins, since the lag time is bounded by the length of the relatively short MD simulations available to estimate the frequency of transitions. Here, we show how one can employ the generalized master equation formalism to obtain an exact description of protein conformational dynamics both at short and long time scales without the time resolution restrictions imposed by the MSM lag time. Using a simple kinetic model, alanine dipeptide, and WW domain, we demonstrate that it is possible to construct these quasi-Markov State Models (qMSMs) using MD simulations that are 5-10 times shorter than those required by MSMs. These qMSMs only contain a handful of metastable states and, thus, can greatly facilitate the interpretation of mechanisms associated with protein dynamics. A qMSM opens the door to the study of conformational changes of complex biomolecules where a Markovian model with a few states is often difficult to construct due to the limited length of available MD simulations.
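The lag-time requirement is commonly checked with implied timescales. If transitions between the chosen states are Markovian, t_2(τ) = -τ / ln|λ_2(τ)| becomes independent of τ. The sketch below applies this standard MSM diagnostic to a synthetic, exactly Markovian two-state chain with hypothetical transition probabilities. It is not the qMSM/generalized-master-equation construction proposed in the paper, which targets the regime where this check fails at short lag times.

```python
import numpy as np


def sample_chain(T, n_steps, rng, start=0):
    """Draw a discrete trajectory from the transition matrix T."""
    cum = np.cumsum(T, axis=1)
    traj = np.empty(n_steps, dtype=int)
    traj[0] = start
    for t in range(1, n_steps):
        idx = np.searchsorted(cum[traj[t - 1]], rng.random(), side="right")
        traj[t] = min(int(idx), len(T) - 1)  # guard against round-off in cum
    return traj


def implied_timescales(dtraj, n_states, lags):
    """t_2(tau) = -tau / ln|lambda_2(tau)| from MSMs built at each lag."""
    its = []
    for lag in lags:
        C = np.zeros((n_states, n_states))
        np.add.at(C, (dtraj[:-lag], dtraj[lag:]), 1.0)
        T = C / C.sum(axis=1, keepdims=True)
        ev = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
        its.append(-lag / np.log(ev[1]))
    return np.array(its)


rng = np.random.default_rng(0)
T_true = np.array([[0.95, 0.05],
                   [0.05, 0.95]])  # hypothetical, exactly Markovian
dtraj = sample_chain(T_true, 100_000, rng)
its = implied_timescales(dtraj, 2, lags=[1, 2, 5, 10])
# For this chain every entry should be close to -1 / ln(0.9), about 9.5 steps
```

With real MD data projected onto a few metastable states, memory effects usually show up as t_2 still rising with τ at the shortest lags. That regime is the one the abstract's memory-kernel approach is meant to handle.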
Affiliation(s)
- Siqin Cao
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Wei Wang
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Thomas E Markland
- Department of Chemistry, Stanford University, Stanford, California 94305, USA
- Xuhui Huang
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
6
Wolfe DK, Persichetti JR, Sharma AK, Hudson PS, Woodcock HL, O'Brien EP. Hierarchical Markov State Model Building to Describe Molecular Processes. J Chem Theory Comput 2020; 16:1816-1826. [PMID: 32011146 DOI: 10.1021/acs.jctc.9b00955] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/28/2022]
Abstract
Markov state models can describe ensembles of pathways via kinetic networks but are difficult to create when large free-energy barriers limit unbiased sampling. Chain-of-states simulations allow sampling over large free-energy barriers but are often constructed using a single pathway that is unlikely to thermodynamically average over orthogonal degrees of freedom in complex systems. Here, we combine the advantages of these two approaches in the form of a Markov state model of Markov state models, which we call a Hierarchical Markov state model. In this approach, independent Markov models are constructed in regions of configuration space that are locally well sampled but are separated by large free-energy barriers from other regions. A string method is used to construct an ensemble of pathways connecting the states of these different local Markov models, and the rate through each pathway is then estimated. These rates are then combined with the rate information from the local Markov models in a master equation to predict global rates, fluxes, and populations. By applying this hierarchical approach to tractable systems (a toy potential and dipeptides), we demonstrate that it is more accurate than the conventional single-pathway description. The advantages of this approach are that it (i) is more realistic than the conventional chain-of-states approach, as an ensemble of pathways rather than a single pathway is used to describe processes in high-dimensional systems, and (ii) resolves the issue of poor sampling in Markov state model building when large free-energy barriers are present. The divide-and-conquer strategy inherent to this approach should make this procedure straightforward to apply to more complex systems.
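The final step of the scheme, combining rates in a master equation, can be sketched under one assumption: the inter-state rates (from the local MSMs and the string-method pathways) are already available. Here the rates are hypothetical numbers for three macrostates. The code builds the rate matrix K for dp/dt = pK and solves for the stationary populations. How the paper estimates and aggregates pathway rates is not reproduced.

```python
import numpy as np


def rate_matrix(rates, n_states):
    """Build K from a dict {(i, j): k_ij} of i -> j rates.

    Off-diagonal K_ij = k_ij; diagonal entries make each row sum to zero,
    so dp/dt = p K for a row vector of populations p.
    """
    K = np.zeros((n_states, n_states))
    for (i, j), k in rates.items():
        K[i, j] = k
    K[np.diag_indices(n_states)] = -K.sum(axis=1)
    return K


def stationary_populations(K):
    """Solve p K = 0 subject to sum(p) = 1 (least squares on the stacked system)."""
    n = K.shape[0]
    A = np.vstack([K.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p


# Hypothetical rates between macrostates A = 0, B = 1, C = 2
rates = {(0, 1): 1.0, (1, 0): 2.0, (1, 2): 0.5, (2, 1): 0.25}
K = rate_matrix(rates, 3)
p_eq = stationary_populations(K)  # approximately [0.4, 0.2, 0.4]
```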
Affiliation(s)
- Ajeet K Sharma
- Department of Physics, Indian Institute of Technology, Jammu 181221, India
- Phillip S Hudson
- Department of Chemistry, University of South Florida, Tampa, Florida 33620, United States
- H Lee Woodcock
- Department of Chemistry, University of South Florida, Tampa, Florida 33620, United States
7
Bacci M, Caflisch A, Vitalis A. On the removal of initial state bias from simulation data. J Chem Phys 2019; 150:104105. [PMID: 30876362 DOI: 10.1063/1.5063556] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/30/2022]
Abstract
Classical atomistic simulations of biomolecules play an increasingly important role in molecular life science. The structure of current computing architectures favors methods that run multiple trajectories at once without requiring extensive communication between them. Many advanced sampling strategies in the field fit this mold. These approaches often rely on an adaptive logic and create ensembles of comparatively short trajectories whose starting points are not distributed according to the correct Boltzmann weights. This type of bias is notoriously difficult to remove, and Markov state models (MSMs) are one of the few strategies available for recovering the correct kinetics and thermodynamics from these ensembles of trajectories. In this contribution, we analyze the performance of MSMs in the thermodynamic reweighting task for a hierarchical set of systems. We show that MSMs can be rigorous tools to recover the correct equilibrium distribution for systems of sufficiently low dimensionality. This is conditional upon not tampering with local flux imbalances found in the data. For a real-world application, we find that a pure likelihood-based inference of the transition matrix produces the best results. The removal of the bias is incomplete, however, and for this system, all tested MSMs are outperformed by an alternative, albeit less general, approach rooted in the ideas of statistical resampling. We conclude by formulating some recommendations for how to address the reweighting issue in practice.
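The reweighting task can be illustrated on a toy system. Many short trajectories all started in one state give a biased histogram. The stationary distribution of a transition matrix estimated from their transitions can still recover the equilibrium populations. The abstract reports that a pure likelihood-based estimate performed best, so the sketch uses the non-reversible maximum-likelihood estimator (row-normalized counts), which imposes no detailed-balance constraint on the data. The three-state system and the sampling protocol are hypothetical.

```python
import numpy as np


def sample_chain(T, start, n_frames, rng):
    """Draw one discrete trajectory of n_frames from transition matrix T."""
    cum = np.cumsum(T, axis=1)
    traj = [start]
    for _ in range(n_frames - 1):
        idx = np.searchsorted(cum[traj[-1]], rng.random(), side="right")
        traj.append(min(int(idx), len(T) - 1))
    return traj


def mle_transition_matrix(trajs, n_states, lag=1):
    """Non-reversible maximum-likelihood estimate: row-normalized counts."""
    C = np.zeros((n_states, n_states))
    for traj in trajs:
        for a, b in zip(traj[:-lag], traj[lag:]):
            C[a, b] += 1
    return C / C.sum(axis=1, keepdims=True)


def stationary_distribution(T):
    """Left eigenvector of T for the eigenvalue closest to 1, normalized."""
    evals, evecs = np.linalg.eig(T.T)
    v = np.real(evecs[:, np.argmax(np.real(evals))])
    return v / v.sum()


rng = np.random.default_rng(1)
T_true = np.array([[0.8, 0.2, 0.0],
                   [0.1, 0.8, 0.1],
                   [0.0, 0.2, 0.8]])  # equilibrium populations (0.25, 0.5, 0.25)
# Biased ensemble: 5000 short trajectories, all started in state 0
trajs = [sample_chain(T_true, 0, 20, rng) for _ in range(5000)]
naive = np.bincount(np.concatenate(trajs), minlength=3) / (5000 * 20)
pi_msm = stationary_distribution(mle_transition_matrix(trajs, 3))
```

The raw histogram `naive` over-weights the starting state, while `pi_msm` should land close to the true equilibrium populations.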
Affiliation(s)
- Marco Bacci
- University of Zurich, Department of Biochemistry, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
- Amedeo Caflisch
- University of Zurich, Department of Biochemistry, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
- Andreas Vitalis
- University of Zurich, Department of Biochemistry, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
8
Swenson DWH, Prinz JH, Noé F, Chodera JD, Bolhuis PG. OpenPathSampling: A Python Framework for Path Sampling Simulations. 1. Basics. J Chem Theory Comput 2018; 15:813-836. [PMID: 30336030 PMCID: PMC6374749 DOI: 10.1021/acs.jctc.8b00626] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Indexed: 01/08/2023]
Abstract
Transition path sampling techniques allow molecular dynamics simulations of complex systems to focus on rare dynamical events, providing insight into mechanisms and the ability to calculate rates inaccessible by ordinary dynamics simulations. While path sampling algorithms are conceptually as simple as importance sampling Monte Carlo, the technical complexity of their implementation has kept these techniques out of reach of the broad community. Here, we introduce an easy-to-use Python framework called OpenPathSampling (OPS) that facilitates path sampling for (bio)molecular systems with minimal effort and yet is still extensible. Interfaces to OpenMM and an internal dynamics engine for simple models are provided in the initial release, but new molecular simulation packages can easily be added. Multiple ready-to-use transition path sampling methodologies are implemented, including standard transition path sampling (TPS) between reactant and product states and transition interface sampling (TIS) and its replica exchange variant (RETIS), as well as recent multistate and multiset extensions of transition interface sampling (MSTIS, MISTIS). In addition, tools are provided to facilitate the implementation of new path sampling schemes built on basic path sampling components. In this paper, we give an overview of the design of this framework and illustrate the simplicity of applying the available path sampling algorithms to a variety of benchmark problems.
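To show what a single path sampling move does, here is a toy fixed-length transition path sampling loop with one-way shooting on a one-dimensional double well. It is written in plain NumPy rather than with OpenPathSampling, whose API is not described here. The potential, state boundaries, time step, and path length are assumptions. With stochastic dynamics, a fixed path length, and a uniformly chosen shooting frame, a trial path is accepted if it still starts in A and ends in B. Backward segments are generated with forward dynamics and then reversed. This relies on microscopic reversibility, which holds only approximately for the Euler-Maruyama integrator used here.

```python
import numpy as np

BETA, DT = 3.0, 1e-3  # inverse temperature and time step, reduced units (assumed)


def force(x):
    """Force for the hypothetical double well V(x) = (x^2 - 1)^2."""
    return -4.0 * x * (x**2 - 1.0)


def propagate(x0, n_steps, rng):
    """Overdamped Langevin dynamics (Euler-Maruyama, unit friction)."""
    traj = np.empty(n_steps + 1)
    traj[0] = x0
    noise = rng.normal(scale=np.sqrt(2.0 * DT / BETA), size=n_steps)
    for i in range(n_steps):
        traj[i + 1] = traj[i] + force(traj[i]) * DT + noise[i]
    return traj


def in_a(x):
    return x < -0.8  # hypothetical reactant state A


def in_b(x):
    return x > 0.8  # hypothetical product state B


def shooting_move(path, rng):
    """One-way shooting: regrow the path forward or backward from a random frame."""
    n = len(path) - 1
    k = int(rng.integers(0, n + 1))
    if rng.random() < 0.5:
        # Forward shot: keep frames 0..k-1, regrow frames k..n
        trial = np.concatenate([path[:k], propagate(path[k], n - k, rng)])
    else:
        # Backward shot: regrow frames 0..k with forward dynamics, then reverse
        trial = np.concatenate([propagate(path[k], k, rng)[::-1], path[k + 1:]])
    # Fixed-length TPS acceptance: the trial path must still connect A and B
    if in_a(trial[0]) and in_b(trial[-1]):
        return trial, True
    return path, False


rng = np.random.default_rng(2)
path = np.linspace(-1.0, 1.0, 1001)  # crude, non-dynamical initial A -> B path
n_accepted = 0
for _ in range(200):
    path, accepted = shooting_move(path, rng)
    n_accepted += accepted
```

In practice the first accepted paths would be discarded as decorrelation from the artificial initial path, and path ensembles would be analysed only after that.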
Affiliation(s)
- David W H Swenson
- van 't Hoff Institute for Molecular Sciences, University of Amsterdam, P.O. Box 94157, 1090 GD Amsterdam, The Netherlands; Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Jan-Hendrik Prinz
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States; Department of Mathematics and Computer Science, Arnimallee 6, Freie Universität Berlin, 14195 Berlin, Germany
- Frank Noé
- Department of Mathematics and Computer Science, Arnimallee 6, Freie Universität Berlin, 14195 Berlin, Germany
- John D Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Peter G Bolhuis
- van 't Hoff Institute for Molecular Sciences, University of Amsterdam, P.O. Box 94157, 1090 GD Amsterdam, The Netherlands
9
Tan ZW, Guarnera E, Berezovsky IN. Exploring chromatin hierarchical organization via Markov State Modelling. PLoS Comput Biol 2018; 14:e1006686. [PMID: 30596637 PMCID: PMC6355033 DOI: 10.1371/journal.pcbi.1006686] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 05/15/2018] [Revised: 01/31/2019] [Accepted: 11/27/2018] [Indexed: 01/02/2023]
Abstract
We propose a new computational method for exploring chromatin structural organization based on Markov State Modelling of Hi-C data represented as an interaction network between genomic loci. A Markov process describes the random walk of a traveling probe in the corresponding energy landscape, mimicking the motion of a biomolecule involved in chromatin function. By studying the metastability of the associated Markov State Model upon annealing, the hierarchical structure of individual chromosomes is observed, and a corresponding set of structural partitions is identified at each level of hierarchy. Then, the notion of effective interaction between partitions is derived, delineating the overall topology and architecture of chromosomes. Mapping epigenetic data on the graphs of intra-chromosomal effective interactions helps in understanding how chromosome organization facilitates its function. A sketch of whole-genome interactions obtained from the analysis of 539 partitions from all 23 chromosomes, complemented by distributions of gene expression regulators and epigenetic factors, sheds light on the structure-function relationships in chromatin, delineating chromosomal territories, as well as structural partitions analogous to topologically associating domains and active/passive epigenomic compartments. In addition to the overall genome architecture shown by effective interactions, the affinity between partitions of different chromosomes was analyzed as an indicator of the degree of association between partitions in functionally relevant genomic interactions. The overall static picture of whole-genome interactions obtained with the method presented in this work provides a foundation for chromatin structural reconstruction, for the modelling of chromatin dynamics, and for exploring the regulation of genome function. The algorithms used in this study are implemented in a freely available Python package ChromaWalker (https://bitbucket.org/ZhenWahTan/chromawalker).
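The random-walk picture can be sketched with a toy contact map. As an assumption, the walker's transition matrix is taken to be the row-normalized contact matrix raised to a power β that plays the role of an inverse temperature. This parameterization and the two-domain contact map are illustrative only and are not ChromaWalker's interface. A second eigenvalue near 1 signals two metastable partitions, and lowering the temperature (raising β) makes that metastability more pronounced.

```python
import numpy as np


def walk_matrix(contacts, beta=1.0):
    """Random-walk transition matrix with P_ij proportional to C_ij**beta.

    beta acts like an inverse temperature (hypothetical annealing scheme):
    larger beta keeps the walker longer inside strongly interacting regions.
    """
    W = np.asarray(contacts, dtype=float) ** beta
    return W / W.sum(axis=1, keepdims=True)


def second_eigenvalue(P):
    """Second-largest eigenvalue; values near 1 indicate metastability."""
    ev = np.sort(np.real(np.linalg.eigvals(P)))[::-1]
    return ev[1]


# Hypothetical contact map: two domains of two loci each
C = np.array([[10, 10, 1, 1],
              [10, 10, 1, 1],
              [1, 1, 10, 10],
              [1, 1, 10, 10]])
lam_hot = second_eigenvalue(walk_matrix(C, beta=1.0))   # 18/22, about 0.82
lam_cold = second_eigenvalue(walk_matrix(C, beta=2.0))  # 198/202, about 0.98
```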
Affiliation(s)
- Zhen Wah Tan
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Matrix, Singapore
- Enrico Guarnera
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Matrix, Singapore
- Igor N. Berezovsky
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Matrix, Singapore
- Department of Biological Sciences (DBS), National University of Singapore (NUS), Singapore
10
Kots ED, Khrenova MG, Nemukhin AV. Allosteric Control of N-Acetyl-Aspartate Hydrolysis by the Y231C and F295S Mutants of Human Aspartoacylase. J Chem Inf Model 2018; 59:2299-2308. [DOI: 10.1021/acs.jcim.8b00666] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/26/2022]
Affiliation(s)
- Ekaterina D. Kots
- Department of Chemistry, Lomonosov Moscow State University, Leninskie Gory 1/3, Moscow, 119991, Russian Federation
- Emanuel Institute of Biochemical Physics, Russian Academy of Sciences, Kosygina 4, Moscow, 119334, Russian Federation
- Maria G. Khrenova
- Department of Chemistry, Lomonosov Moscow State University, Leninskie Gory 1/3, Moscow, 119991, Russian Federation
- Federal Research Center of Biotechnology, Bach Institute of Biochemistry, Russian Academy of Sciences, Leninskiy Prospect 33, 119071 Moscow, Russian Federation
- Alexander V. Nemukhin
- Department of Chemistry, Lomonosov Moscow State University, Leninskie Gory 1/3, Moscow, 119991, Russian Federation
- Emanuel Institute of Biochemical Physics, Russian Academy of Sciences, Kosygina 4, Moscow, 119334, Russian Federation
11
Noé F, Clementi C. Collective variables for the study of long-time kinetics from molecular trajectories: theory and methods. Curr Opin Struct Biol 2017; 43:141-147. [PMID: 28327454 DOI: 10.1016/j.sbi.2017.02.006] [Citation(s) in RCA: 98] [Impact Index Per Article: 14.0] [Received: 12/12/2016] [Accepted: 02/20/2017] [Indexed: 12/23/2022]
Abstract
Collective variables are an important concept to study high-dimensional dynamical systems, such as molecular dynamics of macromolecules, liquids, or polymers, in particular to define relevant metastable states and state transitions or phase transitions. Over the past decade, a rigorous mathematical theory has been formulated to define optimal collective variables to characterize slow dynamical processes. Here we review recent developments, including a variational principle to find optimal approximations to slow collective variables from simulation data, and algorithms such as the time-lagged independent component analysis. Using these concepts, a distance metric can be defined that quantifies how slowly molecular conformations interconvert. Extensions and open questions are discussed.
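A minimal TICA sketch follows. It assumes mean-free features and symmetrized covariance estimates, which is one common choice among several estimators. The slow collective variables are the leading solutions of the generalized eigenproblem C(τ)v = λC(0)v, and each eigenvalue maps to an implied timescale -τ / ln λ. The synthetic data mix a slow and a fast AR(1) process, so the leading component should recover the slow one with λ near 0.99^τ.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.signal import lfilter


def tica(X, lag):
    """Time-lagged independent component analysis (sketch).

    Solves C(tau) v = lambda C(0) v using symmetrized covariance estimates
    and returns eigenvalues in descending order with matching eigenvectors.
    """
    X = X - X.mean(axis=0)
    X0, Xt = X[:-lag], X[lag:]
    n = len(X0)
    C0 = (X0.T @ X0 + Xt.T @ Xt) / (2 * n)
    Ct = (X0.T @ Xt + Xt.T @ X0) / (2 * n)
    evals, evecs = eigh(Ct, C0)  # generalized symmetric eigenproblem
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]


rng = np.random.default_rng(3)
noise = rng.normal(size=(2, 50_000))
slow = lfilter([1.0], [1.0, -0.99], noise[0])  # AR(1) with coefficient 0.99
fast = lfilter([1.0], [1.0, -0.10], noise[1])  # AR(1) with coefficient 0.10
# Mix the processes so that neither raw feature is the slow coordinate
X = np.column_stack([slow + fast, slow - fast])
evals, evecs = tica(X, lag=10)
timescale = -10 / np.log(evals[0])  # implied timescale of the slowest component
```

Because slow = (x1 + x2) / 2 in this toy data, the leading eigenvector should have nearly equal components.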
Affiliation(s)
- Frank Noé
- Department of Mathematics and Computer Science, FU Berlin, Arnimallee 6, 14195 Berlin, Germany.
- Cecilia Clementi
- Center for Theoretical Biological Physics, and Department of Chemistry, Rice University, 6100 Main Street, Houston, TX 77005, United States.
12
Lemke O, Keller BG. Density-based cluster algorithms for the identification of core sets. J Chem Phys 2016; 145:164104. [DOI: 10.1063/1.4965440] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Indexed: 01/12/2023]
Affiliation(s)
- Oliver Lemke
- Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Takustraße 3, D-14195 Berlin, Germany
- Bettina G. Keller
- Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Takustraße 3, D-14195 Berlin, Germany