1. Wu Y, Cao S, Qiu Y, Huang X. Tutorial on how to build non-Markovian dynamic models from molecular dynamics simulations for studying protein conformational changes. J Chem Phys 2024; 160:121501. [PMID: 38516972 DOI: 10.1063/5.0189429] [Received: 11/28/2023] [Accepted: 02/20/2024]
Abstract
Protein conformational changes play crucial roles in protein function. In recent years, the Markov State Model (MSM) constructed from extensive Molecular Dynamics (MD) simulations has emerged as a powerful tool for modeling complex protein conformational changes. In MSMs, dynamics are modeled as a sequence of Markovian transitions among metastable conformational states at discrete time intervals (called the lag time). A major challenge for MSMs is that the lag time must be long enough for transitions among states to become memoryless (Markovian). However, this lag time is constrained by the length of the individual MD simulations available to track these transitions. To address this challenge, we recently developed Generalized Master Equation (GME)-based approaches that encode non-Markovian dynamics in a time-dependent memory kernel. In this Tutorial, we introduce the theory behind two recently developed GME-based non-Markovian dynamic models: the quasi-Markov State Model (qMSM) and the Integrative Generalized Master Equation (IGME). We then outline the procedures for constructing these models and provide a step-by-step tutorial on applying qMSM and IGME to two peptide systems: alanine dipeptide and villin headpiece. This Tutorial is available at https://github.com/xuhuihuang/GME_tutorials. The protocols detailed in this Tutorial aim to be accessible to non-experts interested in studying biomolecular dynamics with these non-Markovian dynamic models.
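The MSM picture described in this abstract (Markovian transitions among discrete states at a fixed lag time) can be made concrete with a minimal sketch. It assumes an already discretized trajectory (the toy `dtraj` below is hypothetical), estimates the transition matrix by row-normalizing sliding-window transition counts (a simple non-reversible estimator, not the reversible estimators typically used in practice), and propagates a probability vector by repeated multiplication. It is not code from the GME_tutorials repository.

```python
def estimate_transition_matrix(dtraj, lag, n_states):
    """Row-normalized MSM transition matrix T(lag) from one discrete trajectory.

    dtraj: list of integer state labels, one per saved MD frame.
    lag:   lag time in frames.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for t in range(len(dtraj) - lag):
        counts[dtraj[t]][dtraj[t + lag]] += 1  # sliding-window counting
    T = []
    for row in counts:
        total = sum(row)
        # States never visited as a starting point get an all-zero row here.
        T.append([c / total if total else 0.0 for c in row])
    return T


def propagate(p0, T, n_steps):
    """Chapman-Kolmogorov propagation p(k*lag) = p(0) T^k (row-vector convention)."""
    p = list(p0)
    n = len(p)
    for _ in range(n_steps):
        p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
    return p


# Hypothetical toy trajectory over two states.
dtraj = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
T = estimate_transition_matrix(dtraj, lag=1, n_states=2)
p = propagate([1.0, 0.0], T, n_steps=10)
```

Rebuilding `T` at several lag times and checking whether the resulting dynamics stop changing is a common way to probe Markovianity; the GME-based models in this Tutorial aim to relax how long that lag time has to be.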
Affiliation(s)
- Yue Wu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Siqin Cao
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Yunrui Qiu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Xuhui Huang
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Data Science Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA

2. Cao S, Qiu Y, Kalin ML, Huang X. Integrative generalized master equation: A method to study long-timescale biomolecular dynamics via the integrals of memory kernels. J Chem Phys 2023; 159:134106. [PMID: 37787134 PMCID: PMC11005468 DOI: 10.1063/5.0167287] [Received: 07/11/2023] [Accepted: 09/18/2023]
Abstract
The generalized master equation (GME) provides a powerful approach to study biomolecular dynamics via non-Markovian dynamic models built from molecular dynamics (MD) simulations. Previously, we implemented the GME in the quasi-Markov State Model (qMSM), where we explicitly calculate the memory kernel and propagate dynamics using a discretized GME. A qMSM can be constructed with much shorter MD trajectories than an MSM. However, because the qMSM must explicitly compute time-dependent memory kernels, it is heavily affected by numerical fluctuations in the simulation data when applied to biomolecular conformational changes. This can make the predicted long-time dynamics numerically unstable, greatly limiting the applicability of the qMSM to complex biomolecules. We present a new method, the Integrative GME (IGME), in which we analytically solve the GME under the condition that the memory kernels have decayed to zero. The IGME overcomes the challenges of the qMSM by using the time integrals of memory kernels, thereby avoiding the numerical instability caused by explicit computation of time-dependent memory kernels. Using our solutions of the GME, we have developed a new approach to compute long-time dynamics from MD simulations in a numerically stable, accurate, and efficient way. To demonstrate its effectiveness, we have applied the IGME to three biomolecules: alanine dipeptide, the FIP35 WW domain, and Taq RNA polymerase. In each system, the IGME achieves significantly smaller fluctuations in both memory kernels and long-time dynamics than the qMSM. We anticipate that the IGME can be widely applied to investigate biomolecular conformational changes.
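In IGME, as we understand it, the GME solution beyond the memory-kernel lifetime is written in the form T(t) = A·T̂^t, so ln T(t) is linear in t and its parameters can be obtained by a least-squares fit over a window of lag times. The sketch below shows only a scalar analogue of that fit, applied to a single synthetic relaxation mode (for example, the second eigenvalue of a two-state transition matrix). The data, the memory lifetime `tau_k`, and the window length are hypothetical, and this is not the matrix IGME estimator from the paper.

```python
import math


def fit_log_linear(times, values):
    """Least-squares fit of ln(v) = ln(a) + t * ln(r); returns (a, r)."""
    ys = [math.log(v) for v in values]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    ln_r = sxy / sxx                 # slope: scalar analogue of ln T-hat
    ln_a = y_mean - ln_r * t_mean    # intercept: scalar analogue of ln A
    return math.exp(ln_a), math.exp(ln_r)


# Hypothetical relaxation mode: a fast transient (standing in for memory
# effects at short times) on top of the long-time form a * r**t.
a_true, r_true = 0.9, 0.95


def mode(t):
    return a_true * r_true ** t + 0.05 * math.exp(-2.0 * t)


tau_k = 10   # hypothetical memory-kernel lifetime, in frames
window = 20  # hypothetical number of lag times used in the fit
times = list(range(tau_k, tau_k + window))
a_fit, r_fit = fit_log_linear(times, [mode(t) for t in times])
implied_timescale = -1.0 / math.log(r_fit)  # in frames
```

Starting the window before the transient has died out (for example, `tau_k = 0`) biases both fitted parameters, which illustrates why the fit should only use times after the memory kernel has decayed.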
Affiliation(s)
- Siqin Cao
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Yunrui Qiu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Michael L. Kalin
- Biophysics Graduate Program, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Xuhui Huang
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA

3. Liu B, Xue M, Qiu Y, Konovalov KA, O’Connor MS, Huang X. GraphVAMPnets for uncovering slow collective variables of self-assembly dynamics. J Chem Phys 2023; 159:094901. [PMID: 37655771 PMCID: PMC11005469 DOI: 10.1063/5.0158903] [Received: 05/18/2023] [Accepted: 08/11/2023]
Abstract
Uncovering slow collective variables (CVs) of self-assembly dynamics is important for elucidating its numerous kinetic assembly pathways and for driving the bottom-up design of novel structures for advanced materials. However, identifying the CVs for self-assembly presents several challenges. First, self-assembly systems often consist of identical monomers, so the feature representations should be invariant to permutations and rotations. Physical coordinates, such as aggregate size, lack high-resolution detail, while common geometric coordinates, such as pairwise distances, are hindered by permutation and rotational symmetry. Second, self-assembly is usually a downhill process, and the trajectories often suffer from insufficient sampling of backward transitions that correspond to the dissociation of self-assembled structures. Popular dimensionality reduction methods, such as time-structure-based independent component analysis (tICA), impose detailed balance constraints, potentially obscuring the true dynamics of self-assembly. In this work, we employ GraphVAMPnets, which combines graph neural networks with the variational approach for Markov processes (VAMP) to identify the slow CVs of self-assembly processes. First, GraphVAMPnets inherits the advantages of graph neural networks: the graph embeddings can represent self-assembly structures at high resolution while remaining invariant to permutations and rotations. Second, it is built upon VAMP theory, which describes Markov processes without enforcing detailed balance and thus addresses the out-of-equilibrium nature of self-assembly. We demonstrate GraphVAMPnets for identifying slow CVs of self-assembly kinetics in two systems: the aggregation of two hydrophobic molecules and the self-assembly of patchy particles. We expect that GraphVAMPnets can be widely applied to molecular self-assembly.
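GraphVAMPnets obtains permutation- and rotation-invariant representations from learned graph embeddings. As a much simpler, hand-crafted illustration of the same two invariances (not the GraphVAMPnets architecture), the sketch below uses sorted pairwise distances between identical monomers; the random coordinates are hypothetical. Sorting discards which distance belongs to which pair, a loss of detail that is one reason learned graph embeddings are attractive for this problem.

```python
import math
import random


def sorted_pair_distances(coords):
    """Descriptor invariant to relabeling identical particles and to rigid rotations.

    Pairwise distances do not change under rotation or translation, and sorting
    removes any dependence on how the identical particles are labeled.
    """
    dists = [math.dist(coords[i], coords[j])
             for i in range(len(coords)) for j in range(i + 1, len(coords))]
    return sorted(dists)


def rotate_z(coords, angle):
    """Rigidly rotate 3D points about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


rng = random.Random(0)
monomers = [(rng.random(), rng.random(), rng.random()) for _ in range(5)]
relabeled = monomers[:]
rng.shuffle(relabeled)              # permute identical monomers
moved = rotate_z(relabeled, 0.7)    # then rotate the whole configuration

d_ref = sorted_pair_distances(monomers)
d_new = sorted_pair_distances(moved)
max_diff = max(abs(a - b) for a, b in zip(d_ref, d_new))  # round-off level only
```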
Affiliation(s)
- Bojun Liu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Mingyi Xue
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Yunrui Qiu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Kirill A. Konovalov
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Michael S. O’Connor
- Biophysics Graduate Program, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Xuhui Huang
- Author to whom correspondence should be addressed:

4. Qiu Y, O’Connor MS, Xue M, Liu B, Huang X. An Efficient Path Classification Algorithm Based on Variational Autoencoder to Identify Metastable Path Channels for Complex Conformational Changes. J Chem Theory Comput 2023; 19:4728-4742. [PMID: 37382437 PMCID: PMC11042546 DOI: 10.1021/acs.jctc.3c00318]
Abstract
Conformational changes (i.e., dynamic transitions between pairs of conformational states) play important roles in many chemical and biological processes. Constructing a Markov state model (MSM) from extensive molecular dynamics (MD) simulations is an effective approach to dissect the mechanisms of conformational changes. When combined with transition path theory (TPT), an MSM can be applied to elucidate the ensemble of kinetic pathways connecting pairs of conformational states. However, applying TPT to complex conformational changes often yields a vast number of kinetic pathways with comparable fluxes. This obstacle is particularly pronounced in heterogeneous self-assembly and aggregation processes. The large number of kinetic pathways makes it challenging to comprehend the molecular mechanisms underlying the conformational changes of interest. To address this challenge, we have developed a path classification algorithm named latent-space path clustering (LPC) that efficiently lumps parallel kinetic pathways into distinct metastable path channels, making them easier to comprehend. In our algorithm, MD conformations are first projected onto a low-dimensional space containing a small set of collective variables (CVs) by time-structure-based independent component analysis (tICA) with kinetic mapping. Then, an MSM is constructed and TPT is applied to obtain the ensemble of pathways, and a deep learning architecture named the variational autoencoder (VAE) is used to learn the spatial distributions of kinetic pathways in the continuous CV space. Based on the trained VAE model, the TPT-generated ensemble of kinetic pathways can be embedded into a latent space, where the classification becomes clear. We show that LPC can efficiently and accurately identify the metastable path channels in three systems: a 2D potential, the aggregation of two hydrophobic particles in water, and the folding of the FIP35 WW domain.
Using the 2D potential, we further demonstrate that our LPC algorithm outperforms the previous path-lumping algorithms by making substantially fewer incorrect assignments of individual pathways to four path channels. We expect that LPC can be widely applied to identify the dominant kinetic pathways underlying complex conformational changes.
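LPC starts from the TPT pathway ensemble of an MSM. The two TPT quantities that pathway fluxes are built from, the forward committor and the net reactive flux, can be sketched for a small toy chain as below. The sketch assumes detailed balance (so the backward committor is 1 − q+), uses a plain Gauss-Seidel solve, and does not implement the tICA projection, the pathway decomposition, or the VAE-based clustering of LPC.

```python
def forward_committor(T, source, sink, tol=1e-12, max_iter=100_000):
    """Forward committor q+ for a discrete Markov chain with row-stochastic T.

    q+[i] is the probability of reaching `sink` before `source` from state i.
    It satisfies q+[i] = sum_j T[i][j] * q+[j] for intermediate states,
    with q+ = 0 on source states and q+ = 1 on sink states.
    """
    n = len(T)
    q = [1.0 if i in sink else 0.0 for i in range(n)]
    for _ in range(max_iter):
        delta = 0.0
        for i in range(n):
            if i in source or i in sink:
                continue  # boundary conditions stay fixed
            new = sum(T[i][j] * q[j] for j in range(n))
            delta = max(delta, abs(new - q[i]))
            q[i] = new
        if delta < tol:
            break
    return q


def total_reactive_flux(T, pi, q_plus, source):
    """Net reactive flux out of `source`, assuming detailed balance (q- = 1 - q+)."""
    n = len(T)
    q_minus = [1.0 - qi for qi in q_plus]
    f = [[pi[i] * q_minus[i] * T[i][j] * q_plus[j] if i != j else 0.0
          for j in range(n)] for i in range(n)]
    return sum(max(0.0, f[i][j] - f[j][i]) for i in source for j in range(n))


# Hypothetical 4-state chain; state 0 is the source, state 3 the sink.
T = [[0.50, 0.50, 0.00, 0.00],
     [0.25, 0.50, 0.25, 0.00],
     [0.00, 0.25, 0.50, 0.25],
     [0.00, 0.00, 0.50, 0.50]]
pi = [1/6, 1/3, 1/3, 1/6]  # stationary distribution of this T
source, sink = {0}, {3}
q = forward_committor(T, source, sink)
F = total_reactive_flux(T, pi, q, source)
```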
Affiliation(s)
- Yunrui Qiu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Michael S. O’Connor
- Biophysics Graduate Program, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Mingyi Xue
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Bojun Liu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Xuhui Huang
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Biophysics Graduate Program, University of Wisconsin-Madison, Madison, WI, 53706, USA

5. Damjanovic J, Murphy JM, Lin YS. CATBOSS: Cluster Analysis of Trajectories Based on Segment Splitting. J Chem Inf Model 2021; 61:5066-5081. [PMID: 34608796 PMCID: PMC8549068 DOI: 10.1021/acs.jcim.1c00598]
Abstract
Molecular dynamics (MD) simulations are an exceedingly and increasingly potent tool for molecular behavior prediction and analysis. However, the enormous wealth of data generated by these simulations can be difficult to process and render in a human-readable fashion. Cluster analysis is a commonly used way to partition data into structurally distinct states. We present a method that improves on the state of the art by taking advantage of the temporal information of MD trajectories to enable more accurate clustering at a lower memory cost. To date, cluster analysis of MD simulations has generally treated simulation snapshots as a mere collection of independent data points and attempted to separate them into different clusters based on structural similarity. This new method, cluster analysis of trajectories based on segment splitting (CATBOSS), applies density-peak-based clustering to classify trajectory segments learned by change detection. Applying the method to a synthetic toy model as well as four real-life data sets (trajectories of MD simulations of alanine dipeptide and valine dipeptide as well as two fast-folding proteins), we find CATBOSS to be robust and highly performant, yielding natural-looking cluster boundaries and greatly improving clustering resolution. Because the classification of points into segments emphasizes density gaps in the data by grouping them close to the state means, CATBOSS applied to the valine dipeptide system is even able to account for a degree of freedom deliberately omitted from the input data set. We also demonstrate the potential utility of CATBOSS in distinguishing metastable states from transition segments, as well as promising applications to cases where there is little or no advance knowledge of intrinsic coordinates, making it a highly versatile analysis tool.
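CATBOSS first splits trajectories into segments by change detection and then clusters the segments. A generic illustration of the first stage, assuming a 1D observable and a mean-shift model, is binary segmentation with a squared-error cost and a penalty. This is a textbook change-point method chosen for illustration, not necessarily the detector CATBOSS uses, and the density-peak clustering of segments is not shown; the toy series is hypothetical.

```python
def sse(xs):
    """Sum of squared deviations from the segment mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def best_split(xs, min_size):
    """Split index k minimizing SSE(xs[:k]) + SSE(xs[k:]), and that cost."""
    best_k, best_cost = None, float("inf")
    for k in range(min_size, len(xs) - min_size + 1):
        cost = sse(xs[:k]) + sse(xs[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


def binary_segmentation(xs, penalty, min_size=3, offset=0):
    """Recursively split a 1D series at mean shifts.

    A split is kept only if it lowers the total SSE by more than `penalty`.
    Returns sorted change-point indices into the original series.
    """
    if len(xs) < 2 * min_size:
        return []
    k, cost = best_split(xs, min_size)
    if k is None or sse(xs) - cost <= penalty:
        return []
    left = binary_segmentation(xs[:k], penalty, min_size, offset)
    right = binary_segmentation(xs[k:], penalty, min_size, offset + k)
    return left + [offset + k] + right


series = [0.0] * 10 + [5.0] * 10 + [1.0] * 10  # toy 1D "trajectory"
cps = binary_segmentation(series, penalty=1.0)
bounds = [0] + cps + [len(series)]
segment_means = [sum(series[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]
```

Each segment, summarized here by its mean, would then be a single object for the clustering stage, which is where working with segments rather than individual frames can reduce memory cost.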
Affiliation(s)
- Jovan Damjanovic
- Department of Chemistry, Tufts University, Medford, Massachusetts 02155, United States
- James M Murphy
- Department of Mathematics, Tufts University, Medford, Massachusetts 02155, United States
- Yu-Shan Lin
- Department of Chemistry, Tufts University, Medford, Massachusetts 02155, United States

6. Cao S, Montoya-Castillo A, Wang W, Markland TE, Huang X. On the advantages of exploiting memory in Markov state models for biomolecular dynamics. J Chem Phys 2020; 153:014105. [PMID: 32640825 DOI: 10.1063/5.0010787]
Abstract
Biomolecular dynamics play an important role in numerous biological processes. Markov State Models (MSMs) provide a powerful approach to study these dynamic processes by predicting long time scale dynamics from many short molecular dynamics (MD) simulations. In an MSM, protein dynamics are modeled as a kinetic process consisting of a series of Markovian transitions between different conformational states at discrete time intervals (called the "lag time"). To achieve this, a master equation must be constructed with a sufficiently long lag time for interstate transitions to become truly Markovian. This poses a major challenge for MSM studies of proteins, since the lag time is bounded by the length of the relatively short MD simulations available to estimate transition frequencies. Here, we show how one can employ the generalized master equation formalism to obtain an exact description of protein conformational dynamics at both short and long time scales without the time resolution restrictions imposed by the MSM lag time. Using a simple kinetic model, alanine dipeptide, and the WW domain, we demonstrate that it is possible to construct these quasi-Markov State Models (qMSMs) using MD simulations that are 5-10 times shorter than those required by MSMs. These qMSMs contain only a handful of metastable states and can thus greatly facilitate the interpretation of mechanisms associated with protein dynamics. The qMSM opens the door to studying conformational changes of complex biomolecules for which a Markovian model with a few states is often difficult to construct due to the limited length of available MD simulations.
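The qMSM idea of computing the memory kernel from short-time data and then propagating with a kernel truncated at its lifetime can be illustrated with a generic rectangle-rule discretization of the matrix GME, dT(t)/dt = T'(0) T(t) - ∫_0^t K(s) T(t - s) ds. The matrix-product ordering, the discretization scheme, and the toy 2-state matrices below are assumptions for illustration; the published qMSM uses its own discretization and estimators. The example round-trips a known kernel: it propagates with it, then recovers it from the resulting T(nΔt).

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def axpy(A, B, alpha=1.0):
    """Elementwise A + alpha * B."""
    return [[a + alpha * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(A, s):
    return [[s * a for a in row] for row in A]


def propagate_gme(T1, K, dt, n_steps):
    """T_{n+1} = T_n + dt * (D T_n - dt * sum_{m=1..n} K_m T_{n-m}), T_0 = I.

    D = (T_1 - I) / dt approximates T'(0); kernels beyond len(K) - 1 count as zero.
    """
    n = len(T1)
    D = scale(axpy(T1, identity(n), -1.0), 1.0 / dt)
    Ts = [identity(n), T1]
    for step in range(1, n_steps):
        memory = [[0.0] * n for _ in range(n)]
        for m in range(1, min(step, len(K) - 1) + 1):
            memory = axpy(memory, matmul(K[m], Ts[step - m]))
        deriv = axpy(matmul(D, Ts[step]), memory, -dt)
        Ts.append(axpy(Ts[step], deriv, dt))
    return Ts


def extract_kernel(Ts, dt):
    """Invert the same discretization: recover K_1..K_{N-1} from T_0..T_N."""
    n = len(Ts[0])
    D = scale(axpy(Ts[1], identity(n), -1.0), 1.0 / dt)
    K = [None]  # K[0] is not used by this discretization
    for step in range(1, len(Ts) - 1):
        deriv = scale(axpy(Ts[step + 1], Ts[step], -1.0), 1.0 / dt)
        # rhs = sum_{m=1..step} K_m T_{step-m}; peel off the already-known terms.
        rhs = scale(axpy(matmul(D, Ts[step]), deriv, -1.0), 1.0 / dt)
        for m in range(1, step):
            rhs = axpy(rhs, matmul(K[m], Ts[step - m]), -1.0)
        K.append(rhs)  # the remaining term is K_step @ T_0 = K_step
    return K


# Hypothetical 2-state example with a kernel that vanishes after two steps.
dt = 1.0
T1 = [[0.9, 0.1], [0.2, 0.8]]
K_true = [None, [[0.01, -0.01], [-0.02, 0.02]], [[0.005, -0.005], [-0.01, 0.01]]]
Ts = propagate_gme(T1, K_true, dt, n_steps=8)
K_rec = extract_kernel(Ts, dt)
```

In a qMSM-style workflow, `Ts` would instead be estimated from MD at short times, the extracted kernel truncated where it has decayed, and `propagate_gme` called with the truncated list to reach long times.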
Affiliation(s)
- Siqin Cao
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Wei Wang
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Thomas E Markland
- Department of Chemistry, Stanford University, Stanford, California 94305, USA
- Xuhui Huang
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong

7. GPU-based fast clustering via K-Centres and k-NN mode seeking for geospatial industry applications. Comput Ind 2020. [DOI: 10.1016/j.compind.2020.103260]

8. Target search and recognition mechanisms of glycosylase AlkD revealed by scanning FRET-FCS and Markov state models. Proc Natl Acad Sci U S A 2020; 117:21889-21895. [PMID: 32820079 PMCID: PMC7486748 DOI: 10.1073/pnas.2002971117]
Abstract
DNA glycosylase repairs DNA damage to maintain genome integrity and is thus essential for the survival of all organisms. However, it remains a long-standing puzzle how glycosylases diffuse along genomic DNA to locate sparse and aberrant lesion sites efficiently and accurately in a genome containing numerous base pairs. Previously, only the high-speed, low-accuracy search mode had been characterized experimentally, while the low-speed, high-accuracy mode remained undetectable. Here, we observed the low-speed translocation mode of glycosylase AlkD and further dissected its molecular mechanisms. To achieve this, we developed an integrated platform combining scanning FRET-FCS with Markov state models. We expect that this platform can be widely applied to investigate other glycosylases and DNA-binding proteins. DNA glycosylase is responsible for repairing DNA damage to maintain genome stability and integrity. However, how glycosylase can efficiently and accurately recognize DNA lesions across the enormous genome remains elusive. It has been hypothesized that glycosylase translocates along DNA by alternating between a fast but low-accuracy diffusion mode and a slow but high-accuracy mode when searching for DNA lesions. However, the slow mode has not been successfully characterized because of the limited spatial and temporal resolution of current experimental techniques. Using a newly developed scanning fluorescence resonance energy transfer (FRET)–fluorescence correlation spectroscopy (FCS) platform, we were able to observe both slow and fast modes of glycosylase AlkD translocating on double-stranded DNA (dsDNA), reaching microsecond temporal resolution and subnanometer spatial resolution. The underlying molecular mechanism of the slow mode was further elucidated by a Markov state model built from extensive all-atom molecular dynamics simulations.
We found that in the slow mode, AlkD follows an asymmetric diffusion pathway, i.e., rotation followed by translation. Furthermore, the essential role of Y27 in AlkD diffusion dynamics was identified both experimentally and computationally. Our results provide mechanistic insights into how the conformational dynamics of the AlkD–dsDNA complex coordinate different diffusion modes to accomplish the search for DNA lesions with high efficiency and accuracy. We anticipate that the mechanism adopted by AlkD to search for DNA lesions could be a general one utilized by other glycosylases and DNA-binding proteins.

9. Ligand-bound glutamine binding protein assumes multiple metastable binding sites with different binding affinities. Commun Biol 2020; 3:419. [PMID: 32747735 PMCID: PMC7400645 DOI: 10.1038/s42003-020-01149-z] [Received: 03/09/2020] [Accepted: 07/14/2020]
Abstract
Protein dynamics plays key roles in ligand binding. However, a microscopic description of conformational-dynamics-coupled ligand binding remains a challenge. In this study, we integrate molecular dynamics simulations, Markov state model (MSM) analysis, and experimental methods to characterize the conformational dynamics of ligand-bound glutamine binding protein (GlnBP). We show that ligand-bound GlnBP has high conformational flexibility and additional metastable binding sites, presenting a more complex energy landscape than in the absence of ligand. The diverse conformations of GlnBP exhibit different binding affinities and entail complex transition kinetics, implicating a concerted ligand binding mechanism. Single-molecule fluorescence resonance energy transfer measurements and mutagenesis experiments are performed to validate our MSM-derived structural ensemble as well as the binding mechanism. Collectively, our study provides deeper insights into protein-dynamics-coupled ligand binding, revealing an intricate regulatory network underlying the apparent binding affinity. Zhang, Wu, Feng et al. show that ligand-bound glutamine binding protein assumes multiple metastable binding sites, presenting a more dynamic energy landscape than its ligand-free form. This study provides insights into the ligand-binding mechanisms coupled with protein dynamics that underlie the apparent binding affinity.

10. Pei HW, Laaksonen A. Feature vector clustering molecular pairs in computer simulations. J Comput Chem 2019; 40:2539-2549. [PMID: 31313339 DOI: 10.1002/jcc.26028] [Received: 04/11/2019] [Revised: 06/18/2019] [Accepted: 06/22/2019]
Abstract
A clustering framework is introduced to analyze the microscopic structural organization of molecular pairs in liquids and solutions. A molecular pair is represented by a representative vector (RV). To obtain the RV, intermolecular atomic distances within the pair are extracted from the simulation trajectory as the components of a key feature vector (KFV). A scheme is then proposed to transform the KFV into the RV by removing the influence of permutational molecular symmetry, since the predicted clusters should be independent of possible permutations of identical atoms in the pair. After the RVs of the pairs are obtained, a clustering technique is used to classify all RVs of molecular pairs into clusters. The framework is applied to analyze trajectories from molecular dynamics simulations of an ionic liquid (trihexyltetradecylphosphonium bis(oxalato)borate ([P6,6,6,14][BOB])). The molecular pairs are successfully categorized into physically meaningful clusters, and their effectiveness is evaluated by computing the product moment correlation coefficient (PMCC) (Willett, Winterman, and Bawden, J. Chem. Inf. Comput. Sci. 1986, 26, 109-118; Downs, Willett, and Fisanick, J. Chem. Inf. Comput. Sci. 1994, 34, 1094-1102). The representative configurations of two clusters are found to correspond, respectively, to two local-minimum-energy structures optimized by density functional theory (DFT) calculations. Several widely used nonhierarchical (k-means) and hierarchical clustering algorithms are also evaluated and compared with each other. The proposed KFV technique efficiently reveals local molecular pair structures in the simulated complex liquid. The method is particularly useful for liquids and solutions with strong intermolecular interactions. © 2019 Wiley Periodicals, Inc.
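The KFV-to-RV step and the subsequent clustering can be illustrated with a toy pair containing two identical atoms A1 and A2, whose KFV holds the distances (d(A1, X), d(A2, X)) to a third atom X. Swapping the identical atoms swaps the two components, so one simple canonicalization (assumed here, not necessarily the paper's scheme) keeps the lexicographically smallest reordering over all allowed permutations. The canonical vectors are then clustered with k-means, one of the algorithms the abstract compares; the deterministic farthest-first initialization and the data are illustrative choices.

```python
def canonicalize(kfv, permutations):
    """Smallest reordering of a feature vector over symmetry-allowed permutations.

    Every equivalent labeling of identical atoms maps to the same output tuple.
    """
    return min(tuple(kfv[i] for i in perm) for perm in permutations)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous center if a cluster ends up empty.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical KFVs (d(A1, X), d(A2, X)); swapping A1 and A2 swaps the components.
kfvs = [(1.0, 3.1), (3.0, 1.1), (1.1, 2.9), (2.9, 1.0), (2.0, 2.1), (2.1, 2.0), (1.9, 2.0)]
perms = [(0, 1), (1, 0)]
rvs = [canonicalize(v, perms) for v in kfvs]
labels, centers = kmeans(rvs, k=2)
```

Without canonicalization, (1.0, 3.1) and (3.0, 1.1) would look like different structures even though they describe the same pair geometry with the identical atoms relabeled.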
Affiliation(s)
- Han-Wen Pei
- Department of Materials and Environmental Chemistry, Arrhenius Laboratory, Stockholm University, SE-106 91, Stockholm, Sweden
- System and Component Design, Department of Machine Design, KTH Royal Institute of Technology, SE-100 44, Stockholm, Sweden
- Aatto Laaksonen
- Department of Materials and Environmental Chemistry, Arrhenius Laboratory, Stockholm University, SE-106 91, Stockholm, Sweden
- State Key Laboratory of Materials-Oriented and Chemical Engineering, Nanjing Tech University, Nanjing, 210009, China
- Centre of Advanced Research in Bionanoconjugates and Biopolymers, Petru Poni Institute of Macromolecular Chemistry, Aleea Grigore Ghica-Voda, 41A, 700487, Lasi, Romania

11. Porter JR, Zimmerman MI, Bowman GR. Enspara: Modeling molecular ensembles with scalable data structures and parallel computing. J Chem Phys 2019; 150:044108. [PMID: 30709308 DOI: 10.1063/1.5063794]
Abstract
Markov state models (MSMs) are quantitative models of protein dynamics that are useful for uncovering the structural fluctuations that proteins undergo, as well as the mechanisms of these conformational changes. Given the vast size of conformational space, there has been ongoing interest in identifying a small number of states that capture the essential features of a protein. Generally, this is achieved by making assumptions about the properties of relevant features, for example, that the most important features are those that change slowly. An alternative strategy is to keep as many degrees of freedom as possible and subsequently learn from the model which features are most important. In these larger models, however, traditional approaches quickly become computationally intractable. In this paper, we present enspara, a library for working with MSMs that provides several novel algorithms and specialized data structures that dramatically improve the scalability of traditional MSM methods. These include ragged arrays for minimizing memory requirements, message passing interface (MPI)-parallelized implementations of compute-intensive operations, and a flexible framework for model construction and analysis.
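The abstract mentions ragged arrays only at a high level. The idea behind such a structure (rows of different lengths, such as MD trajectories, stored back-to-back in one flat buffer with per-row start offsets and no padding) can be sketched in plain Python as below; `SimpleRagged` is a hypothetical illustration, not enspara's class or API.

```python
class SimpleRagged:
    """Variable-length rows stored in one flat list plus start offsets.

    Row i occupies data[starts[i]:starts[i] + lengths[i]]. No padding is stored,
    so memory scales with the total number of frames rather than
    n_rows * max_row_length.
    """

    def __init__(self, rows):
        self.data = []
        self.starts = []
        self.lengths = []
        for row in rows:
            self.starts.append(len(self.data))
            self.lengths.append(len(row))
            self.data.extend(row)

    def __len__(self):
        return len(self.lengths)

    def __getitem__(self, i):
        start = self.starts[i]
        return self.data[start:start + self.lengths[i]]

    def frame(self, i, j):
        """Frame j of row i, checked against that row's own length."""
        if not 0 <= j < self.lengths[i]:
            raise IndexError(f"frame {j} out of range for row {i}")
        return self.data[self.starts[i] + j]


# Hypothetical discretized trajectories of different lengths.
trajs = [[0, 1, 1], [2], [0, 0, 2, 1]]
ra = SimpleRagged(trajs)
```

A padded 2D array for the same data would need 3 × 4 = 12 slots instead of 8, and the gap grows when one trajectory is much longer than the rest.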
Affiliation(s)
- J R Porter
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 South Euclid Avenue, St. Louis, Missouri 63110, USA
- M I Zimmerman
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 South Euclid Avenue, St. Louis, Missouri 63110, USA
- G R Bowman
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 South Euclid Avenue, St. Louis, Missouri 63110, USA

12. Wang W, Liang T, Sheong FK, Fan X, Huang X. An efficient Bayesian kinetic lumping algorithm to identify metastable conformational states via Gibbs sampling. J Chem Phys 2018; 149:072337. [PMID: 30134698 DOI: 10.1063/1.5027001]
Abstract
The Markov State Model (MSM) has become a popular approach to study the conformational dynamics of complex biological systems in recent years. Built upon a large number of short molecular dynamics simulation trajectories, an MSM is able to predict the long time scale dynamics of complex systems. However, to achieve Markovianity, an MSM often contains hundreds or thousands of states (microstates), hindering human interpretation of the underlying system mechanism. One way to reduce the number of states is to lump kinetically similar states together and thus coarse-grain the microstates into macrostates. In this work, we introduce a probabilistic lumping algorithm, the Gibbs lumping algorithm, which uses Bayesian inference to assign a probability to any given kinetic lumping. In our algorithm, the transitions among kinetically distinct macrostates are modeled by Poisson processes, which reflect the separation of time scales in the underlying free energy landscape of biomolecules. Furthermore, to facilitate the search for the optimal kinetic lumping (i.e., the lumped model with the highest probability), a Gibbs sampling algorithm is introduced. To demonstrate the power of our new method, we apply it to three systems: a 2D potential, alanine dipeptide, and a WW protein domain. In comparison with six other popular lumping algorithms, we show that our method consistently produces the lumped macrostate model with the highest probability as well as the largest metastability. We anticipate that our Gibbs lumping algorithm holds great promise for investigating conformational changes in biological macromolecules.
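The abstract compares lumpings by metastability. The sketch below does not implement the Bayesian model or Gibbs sampling; it only shows how a candidate lumping turns a microstate transition matrix into a macrostate one (weighting microstates by their stationary probabilities) and how that result can be scored. Metastability is taken here as the sum of macrostate self-transition probabilities, a common convention assumed for illustration, and the 4-state matrix is hypothetical.

```python
def lump_transition_matrix(T, pi, assignment, n_macro):
    """Coarse-grain a microstate transition matrix onto macrostates.

    T_macro[I][J] = sum_{i in I} pi_i * sum_{j in J} T[i][j] / sum_{i in I} pi_i,
    i.e. the stationary-probability-weighted probability of moving from I into J.
    """
    weights = [0.0] * n_macro
    flux = [[0.0] * n_macro for _ in range(n_macro)]
    for i, row in enumerate(T):
        I = assignment[i]
        weights[I] += pi[i]
        for j, tij in enumerate(row):
            flux[I][assignment[j]] += pi[i] * tij
    return [[f / weights[I] for f in flux[I]] for I in range(n_macro)]


def metastability(T_macro):
    """Sum of macrostate self-transition probabilities (larger = more metastable)."""
    return sum(T_macro[I][I] for I in range(len(T_macro)))


# Hypothetical symmetric (hence doubly stochastic) 4-microstate matrix with two blocks.
T = [[0.80, 0.18, 0.01, 0.01],
     [0.18, 0.80, 0.01, 0.01],
     [0.01, 0.01, 0.80, 0.18],
     [0.01, 0.01, 0.18, 0.80]]
pi = [0.25, 0.25, 0.25, 0.25]  # stationary distribution of this T

good = metastability(lump_transition_matrix(T, pi, [0, 0, 1, 1], 2))
bad = metastability(lump_transition_matrix(T, pi, [0, 1, 0, 1], 2))
```

A search over lumpings, such as the Gibbs sampler described above, would generate many `assignment` vectors; the kinetically sensible grouping `[0, 0, 1, 1]` scores higher than the one that mixes the two blocks.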
Affiliation(s)
- Wei Wang
- HKUST-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China
- Tong Liang
- Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Fu Kit Sheong
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Xiaodan Fan
- Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Xuhui Huang
- HKUST-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China

13. Peng JH, Wang W, Yu YQ, Gu HL, Huang X. Clustering algorithms to analyze molecular dynamics simulation trajectories for complex chemical and biological systems. Chinese J Chem Phys 2018. [DOI: 10.1063/1674-0068/31/cjcp1806147]
Affiliation(s)
- Jun-hui Peng
- HKUST-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Wei Wang
- HKUST-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Ye-qing Yu
- HKUST-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Han-lin Gu
- Department of Mathematics, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Xuhui Huang
- HKUST-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
14
Wang W, Cao S, Zhu L, Huang X. Constructing Markov State Models to elucidate the functional conformational changes of complex biomolecules. Wiley Interdiscip Rev Comput Mol Sci 2017. [DOI: 10.1002/wcms.1343] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Indexed: 01/08/2023]
Affiliation(s)
- Wei Wang
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Siqin Cao
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Lizhe Zhu
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Xuhui Huang
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Hong Kong Branch of Chinese National Engineering Research Center for Tissue Restoration & Reconstruction, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- HKUST-Shenzhen Research Institute, Shenzhen, China
15
Meng L, Sheong FK, Zeng X, Zhu L, Huang X. Path lumping: An efficient algorithm to identify metastable path channels for conformational dynamics of multi-body systems. J Chem Phys 2017; 147:044112. [DOI: 10.1063/1.4995558] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 02/07/2023]
Affiliation(s)
- Luming Meng
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Fu Kit Sheong
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Xiangze Zeng
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Lizhe Zhu
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Xuhui Huang
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Hong Kong Branch of Chinese National Engineering Research Center for Tissue Restoration and Reconstruction, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- HKUST-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China
16
Liu S, Zhu L, Sheong FK, Wang W, Huang X. Adaptive partitioning by local density-peaks: An efficient density-based clustering algorithm for analyzing molecular dynamics trajectories. J Comput Chem 2016; 38:152-160. [PMID: 27868222 DOI: 10.1002/jcc.24664] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 07/16/2016] [Revised: 10/09/2016] [Accepted: 10/26/2016] [Indexed: 12/11/2022]
Abstract
We present an efficient density-based adaptive-resolution clustering method, APLoD, for analyzing large-scale molecular dynamics (MD) trajectories. APLoD performs a k-nearest-neighbor search to estimate the density of MD conformations locally, which allows it to group MD conformations in the same high-density region into a cluster. APLoD greatly improves on the popular density peaks algorithm, reducing running time and memory usage by 2-3 orders of magnitude for systems ranging from alanine dipeptide to a 370-residue maltose-binding protein. In addition, we demonstrate that APLoD can produce clusters of various sizes that adapt to the underlying density (i.e., larger clusters in low-density regions and smaller clusters in high-density regions), which is a clear advantage over other popular clustering algorithms including k-centers and k-medoids. We anticipate that APLoD can be widely applied to split ultra-large MD datasets containing millions of conformations for the subsequent construction of Markov State Models. © 2016 Wiley Periodicals, Inc.
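To make the idea of grouping conformations by local k-nearest-neighbor density concrete, here is a simplified Python sketch. It is not APLoD itself: it assumes Euclidean distances on feature vectors, uses a brute-force neighbor search, takes the local density as the inverse mean distance to the k nearest neighbors, and links each point uphill to its densest neighbor so that every chain ends at a local density peak, which labels the cluster. The function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def knn(points: Sequence[Sequence[float]], k: int) -> List[List[Tuple[float, int]]]:
    """Brute-force k-nearest neighbours as sorted (distance, index) pairs."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def knn_density_peak_clusters(points: Sequence[Sequence[float]], k: int = 3) -> List[int]:
    """Label each point by the local density peak its uphill path reaches."""
    neighbours = knn(points, k)
    # Local density: inverse of the mean distance to the k nearest neighbours.
    density = [1.0 / (sum(d for d, _ in nb) / k + 1e-12) for nb in neighbours]
    parent = []
    for i, nb in enumerate(neighbours):
        densest = max(nb, key=lambda pair: density[pair[1]])[1]
        # Link uphill only; a point with no denser neighbour is a local peak.
        parent.append(densest if density[densest] > density[i] else i)
    labels = []
    for i in range(len(points)):
        j = i
        while parent[j] != j:  # density strictly increases, so this terminates
            j = parent[j]
        labels.append(j)
    return labels


blob_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
blob_b = [(x + 5.0, y + 5.0) for x, y in blob_a]
print(knn_density_peak_clusters(blob_a + blob_b, k=3))
# -> [4, 4, 4, 4, 4, 9, 9, 9, 9, 9]  (each blob labelled by its centre point)
```

Because neighbors are only searched locally, the cost is dominated by the k-nearest-neighbor search; a production code would replace the brute-force loop with a spatial index.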
Affiliation(s)
- Song Liu
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Lizhe Zhu
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Fu Kit Sheong
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Wei Wang
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Xuhui Huang
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
17
Zhang L, Pardo-Avila F, Unarta IC, Cheung PPH, Wang G, Wang D, Huang X. Elucidation of the Dynamics of Transcription Elongation by RNA Polymerase II using Kinetic Network Models. Acc Chem Res 2016; 49:687-94. [PMID: 26991064 DOI: 10.1021/acs.accounts.5b00536] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Indexed: 01/22/2023]
Abstract
RNA polymerase II (Pol II) is an essential enzyme that catalyzes transcription with high efficiency and fidelity in eukaryotic cells. During transcription elongation, Pol II catalyzes the nucleotide addition cycle (NAC) to synthesize mRNA using DNA as the template. The transitions between the states of the NAC require conformational changes of both the protein and nucleotides. Although X-ray structures are available for most of these states, the dynamics of the transitions between states are largely unknown. Molecular dynamics (MD) simulations can predict structure-based molecular details and shed light on the mechanisms of these dynamic transitions. However, applying MD simulations to a macromolecule as large as Pol II is challenging because individual simulations typically reach tens to hundreds of nanoseconds, whereas biologically relevant timescales are tens of microseconds or even longer. To overcome this challenge, kinetic network models (KNMs), such as Markov State Models (MSMs), have become a popular approach to access long-timescale conformational changes using many short MD simulations. We describe here our application of KNMs to characterize the molecular mechanisms of the NAC of Pol II. First, we introduce the general background of MSMs and explain procedures for the construction and validation of MSMs, providing some technical details. Next, we review our previous studies in which we applied MSMs to investigate the individual steps of the NAC, including translocation and pyrophosphate ion release. In particular, we describe in detail how we prepared the initial conformations of the Pol II elongation complex, performed MD simulations, extracted MD conformations to construct MSMs, and further validated them. We also summarize our major findings on the molecular mechanisms of Pol II elongation based on these MSMs.
In addition, we discuss various key points and challenges in applying MSMs to systems as large as the Pol II elongation complex. Finally, to study the overall NAC, we combine the individual steps of the NAC into a five-state KNM based on a nonbranched Brownian ratchet scheme to explain single-molecule optical tweezers experimental data. These studies complement experimental observations and provide molecular mechanisms for the transcription elongation cycle. In the long term, incorporating sequence-dependent kinetic parameters into KNMs has great potential for identifying error-prone sequences and predicting transcription dynamics in genome-wide transcriptomes.
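The MSM construction and validation steps mentioned in this abstract can be illustrated with a minimal Python sketch, assuming the MD trajectories have already been discretized into integer state labels. It counts transitions at a lag time with a sliding window, row-normalizes the counts (no detailed-balance constraint), and runs a Chapman-Kolmogorov comparison of T(τ)^n against T(nτ), a common MSM validation check. The helper names and the synthetic two-state chain are hypothetical and do not represent the authors' pipeline.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def count_matrix(dtrajs: Sequence[Sequence[int]], n_states: int, lag: int) -> List[List[int]]:
    """Sliding-window transition counts at the given lag time (in frames)."""
    C = [[0] * n_states for _ in range(n_states)]
    for traj in dtrajs:
        for t in range(len(traj) - lag):
            C[traj[t]][traj[t + lag]] += 1
    return C


def transition_matrix(dtrajs, n_states: int, lag: int) -> Matrix:
    """Row-normalized counts (no detailed-balance constraint)."""
    T = []
    for row in count_matrix(dtrajs, n_states, lag):
        total = sum(row)
        T.append([c / total if total else 0.0 for c in row])
    return T


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matpow(T: Matrix, n: int) -> Matrix:
    R = [[float(i == j) for j in range(len(T))] for i in range(len(T))]
    for _ in range(n):
        R = matmul(R, T)
    return R


def ck_max_error(dtrajs, n_states: int, lag: int, n: int) -> float:
    """Largest |[T(lag)^n]_ij - [T(n*lag)]_ij|; small values support Markovianity."""
    predicted = matpow(transition_matrix(dtrajs, n_states, lag), n)
    estimated = transition_matrix(dtrajs, n_states, n * lag)
    return max(abs(p - e) for prow, erow in zip(predicted, estimated)
               for p, e in zip(prow, erow))


# Toy two-state Markov chain that stays put with probability 0.95 per frame.
rng = random.Random(0)
traj = [0]
for _ in range(100_000):
    traj.append(traj[-1] if rng.random() < 0.95 else 1 - traj[-1])
print(transition_matrix([traj], 2, lag=1))
print(ck_max_error([traj], 2, lag=1, n=5))
```

For a chain that is Markovian by construction the Chapman-Kolmogorov error stays at the level of statistical noise; for real MD data, a large error at short lag times is the usual signal to increase the lag time or refine the state decomposition.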
Affiliation(s)
- Lu Zhang
- Department of Chemistry and State Key Laboratory of Molecular Neuroscience, Center for System Biology and Human Health, School of Science, and IAS, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Fátima Pardo-Avila
- Department of Chemistry and State Key Laboratory of Molecular Neuroscience, Center for System Biology and Human Health, School of Science, and IAS, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Ilona Christy Unarta
- Department of Chemistry and State Key Laboratory of Molecular Neuroscience, Center for System Biology and Human Health, School of Science, and IAS, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Peter Pak-Hang Cheung
- Department of Chemistry and State Key Laboratory of Molecular Neuroscience, Center for System Biology and Human Health, School of Science, and IAS, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Guo Wang
- Department of Chemistry and State Key Laboratory of Molecular Neuroscience, Center for System Biology and Human Health, School of Science, and IAS, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Dong Wang
- Department of Cellular and Molecular Medicine, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California 92093, United States
- Xuhui Huang
- Department of Chemistry and State Key Laboratory of Molecular Neuroscience, Center for System Biology and Human Health, School of Science, and IAS, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
18
Zhang L, Jiang H, Sheong F, Pardo-Avila F, Cheung PH, Huang X. Constructing Kinetic Network Models to Elucidate Mechanisms of Functional Conformational Changes of Enzymes and Their Recognition with Ligands. Methods Enzymol 2016; 578:343-71. [DOI: 10.1016/bs.mie.2016.05.026] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/31/2022]
19
20
Sheong FK, Silva DA, Meng L, Zhao Y, Huang X. Automatic state partitioning for multibody systems (APM): an efficient algorithm for constructing Markov state models to elucidate conformational dynamics of multibody systems. J Chem Theory Comput 2014; 11:17-27. [PMID: 26574199 DOI: 10.1021/ct5007168] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 12/16/2022]
Abstract
The conformational dynamics of multibody systems play crucial roles in many important problems. Markov state models (MSMs) are powerful kinetic network models that can predict long-timescale dynamics using many short molecular dynamics simulations. Although MSMs have been successfully applied to conformational changes of individual proteins, the analysis of multibody systems remains a challenge because their dynamics occur on a mixture of drastically different timescales. In this work, we have developed a new algorithm, automatic state partitioning for multibody systems (APM), for constructing MSMs to elucidate the conformational dynamics of multibody systems. The APM algorithm effectively addresses the different timescales in multibody systems by directly incorporating dynamics into geometric clustering when identifying the metastable conformational states. We have applied the APM algorithm to a 2D potential that mimics a protein-ligand binding system and to the aggregation of two hydrophobic particles in water, and have shown that it can greatly improve both the computational efficiency of MSM construction and the accuracy of the models.
Affiliation(s)
- Fu Kit Sheong
- HKUST-Shenzhen Research Institute, Nanshan, Shenzhen 518057, China
- Luming Meng
- HKUST-Shenzhen Research Institute, Nanshan, Shenzhen 518057, China
- Xuhui Huang
- HKUST-Shenzhen Research Institute, Nanshan, Shenzhen 518057, China
21
Korb O, Finn PW, Jones G. The cloud and other new computational methods to improve molecular modelling. Expert Opin Drug Discov 2014; 9:1121-31. [DOI: 10.1517/17460441.2014.941800] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 11/05/2022]
22
Application of Markov State Models to Simulate Long Timescale Dynamics of Biological Macromolecules. Adv Exp Med Biol 2014; 805:29-66. [DOI: 10.1007/978-3-319-02970-2_2] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 12/12/2022]
23
Yao Y, Cui RZ, Bowman GR, Silva DA, Sun J, Huang X. Hierarchical Nyström methods for constructing Markov state models for conformational dynamics. J Chem Phys 2013; 138:174106. [PMID: 23656113 DOI: 10.1063/1.4802007] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Indexed: 01/08/2023]
Abstract
Markov state models (MSMs) have become a popular approach for investigating the conformational dynamics of proteins and other biomolecules. MSMs are typically built from numerous molecular dynamics simulations by dividing the sampled configurations into a large number of microstates based on geometric criteria. The resulting microstate model can then be coarse-grained into a more interpretable macrostate model by lumping together rapidly mixing microstates into larger, metastable aggregates. However, finite sampling often results in many poorly sampled microstates. During coarse-graining, these states are mistakenly identified as kinetically important because transitions to and from them appear to be slow. In this paper, we propose a formalism based on an algebraic principle for matrix approximation, the Nyström method, to deal with such poorly sampled microstates. Our scheme builds a hierarchy of microstates from high to low populations and progressively applies spectral clustering to sets of microstates within each level of the hierarchy. This helps spectral clustering identify metastable aggregates built from highly populated microstates rather than being distracted by low-population states. We demonstrate the ability of this algorithm to discover the major metastable states in two model systems, alanine dipeptide and the trpzip2 peptide.
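The core Nyström formula can be sketched in Python as follows. This is only the plain one-level approximation K ≈ C W^-1 C^T, where C holds the columns of a symmetric matrix K for a set of landmark states and W is the landmark-landmark block. Choosing the landmarks as the most populated microstates mirrors the high-to-low population ordering described in the abstract, but the hierarchy and the spectral clustering step are not reproduced. The function names and toy matrix are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def invert(W: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting; W must be nonsingular."""
    n = len(W)
    A = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(W)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("landmark block is singular")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def nystrom(K: Matrix, populations: Sequence[float], m: int) -> Matrix:
    """Approximate symmetric K as C W^-1 C^T using the m most populated states."""
    n = len(K)
    landmarks = sorted(range(n), key=lambda i: -populations[i])[:m]
    C = [[K[i][l] for l in landmarks] for i in range(n)]               # n x m
    W_inv = invert([[K[a][b] for b in landmarks] for a in landmarks])  # m x m
    CW = [[sum(C[i][k] * W_inv[k][j] for k in range(m)) for j in range(m)]
          for i in range(n)]
    return [[sum(CW[i][k] * C[j][k] for k in range(m)) for j in range(n)]
            for i in range(n)]


# Rank-2 toy matrix K = X X^T: two independent landmarks reconstruct it exactly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
K = [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]
K_approx = nystrom(K, populations=[0.1, 0.2, 0.4, 0.3], m=2)
print(max(abs(K[i][j] - K_approx[i][j]) for i in range(4) for j in range(4)))
```

The approximation only needs the m landmark columns, which is why well-sampled (highly populated) states are a natural choice of landmarks when many other states are poorly sampled.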
Affiliation(s)
- Yuan Yao
- School of Mathematical Sciences, LMAM-LMEQF-LMPR, Peking University, Beijing 100871, China.
24
Thorpe IF. Efficiently Refining a Transition Path Using Clustering. Biophys J 2013; 105:545-6. [DOI: 10.1016/j.bpj.2013.06.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2013] [Accepted: 06/12/2013] [Indexed: 11/30/2022]
25
Faccioli P, Pederiva F. Microscopically computing free-energy profiles and transition path time of rare macromolecular transitions. Phys Rev E Stat Nonlin Soft Matter Phys 2012; 86:061916. [PMID: 23367984 DOI: 10.1103/physreve.86.061916] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/12/2012] [Indexed: 06/01/2023]
Abstract
We introduce a rigorous method to microscopically compute the observables that characterize the thermodynamics and kinetics of rare macromolecular transitions for which a slow reaction coordinate can be identified a priori. To sample the ensemble of statistically significant reaction pathways, we define a biased molecular dynamics (MD) in which barrier-crossing transitions are accelerated without introducing any unphysical external force. In contrast to other biased MD methods, in the present approach the systematic errors generated to accelerate the transition can be calculated analytically and can therefore be corrected for. This allows a computationally efficient reconstruction of the free-energy profile as a function of the reaction coordinate and the calculation of the corresponding diffusion coefficient. The transition path time can then be readily evaluated within the dominant reaction pathways approach. We illustrate and test this method by characterizing a thermally activated transition on a two-dimensional energy surface and the folding of a small protein fragment within a coarse-grained model.
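For orientation, the quantity being reconstructed is the free-energy profile F(q) = -k_B T ln P(q) along the reaction coordinate q. The Python sketch below estimates it from a histogram of unbiased samples only; it does not implement the paper's biased dynamics or its analytical correction of the bias, and the function name, bin width, and data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def free_energy_profile(q_samples: Sequence[float], bin_width: float,
                        kT: float = 1.0) -> Dict[float, float]:
    """F(q) = -kT ln P(q) from a histogram of unbiased samples, min shifted to 0."""
    counts = Counter(math.floor(q / bin_width) for q in q_samples)
    total = sum(counts.values())
    F = {b: -kT * math.log(c / total) for b, c in counts.items()}
    f_min = min(F.values())
    # Report each bin at its centre; empty bins are simply absent (F = +inf).
    return {(b + 0.5) * bin_width: f - f_min for b, f in sorted(F.items())}


samples = [0.1] * 8 + [1.1] * 2  # 80% of samples in bin [0, 1), 20% in [1, 2)
print(free_energy_profile(samples, bin_width=1.0))
# expected: 0.0 at q = 0.5 and about 1.386 (= ln 4) at q = 1.5, in units of kT
```

With biased sampling, the histogram no longer reflects P(q) directly; that is exactly the systematic error the abstract says its method computes analytically and removes before reconstructing F(q).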
Affiliation(s)
- P Faccioli
- Physics Department, University of Trento, Via Sommarive 14, Povo, I-38129 Trento, Italy