1
Rydzewski J, Gökdemir T. Learning Markovian dynamics with spectral maps. J Chem Phys 2024; 160:091102. [PMID: 38436438 DOI: 10.1063/5.0189241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 02/05/2024] [Indexed: 03/05/2024]
Abstract
The long-time behavior of many complex molecular systems can often be described by Markovian dynamics in a slow subspace spanned by a few reaction coordinates referred to as collective variables (CVs). However, determining CVs poses a fundamental challenge in chemical physics. Depending on intuition or trial and error to construct CVs can lead to non-Markovian dynamics with long memory effects, hindering analysis. To address this problem, we continue to develop a recently introduced deep-learning technique called spectral map [J. Rydzewski, J. Phys. Chem. Lett. 14, 5216-5220 (2023)]. Spectral map learns slow CVs by maximizing a spectral gap of a Markov transition matrix describing anisotropic diffusion. Here, to represent heterogeneous and multiscale free-energy landscapes with spectral map, we implement an adaptive algorithm to estimate transition probabilities. Through a Markov state model analysis, we validate that spectral map learns slow CVs related to the dominant relaxation timescales and discerns between long-lived metastable states.
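As a rough illustration of the quantity being maximized, the sketch below builds a row-stochastic Markov transition matrix from a Gaussian kernel over CV samples and evaluates its spectral gap. It is a minimal sketch under stated assumptions, not the authors' implementation: the fixed `bandwidth`, the function names, and the choice of the gap between the `n_states`-th eigenvalue and the next one are illustrative, and the adaptive estimation of transition probabilities described in the abstract is not reproduced.

```python
import numpy as np


def transition_matrix(z, bandwidth=0.5):
    """Row-stochastic matrix from a Gaussian kernel (fixed bandwidth is an assumption)."""
    z = np.asarray(z, dtype=float)
    z = z.reshape(len(z), -1)  # accept 1D or multidimensional CV samples
    # Squared pairwise distances between all samples in CV space.
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Normalize each row so that it sums to one.
    return kernel / kernel.sum(axis=1, keepdims=True)


def spectral_gap(matrix, n_states=2):
    """Gap between the n_states-th largest eigenvalue and the next one."""
    eigvals = np.sort(np.linalg.eigvals(matrix).real)[::-1]
    return eigvals[n_states - 1] - eigvals[n_states]


# Hypothetical 1D CV samples: two tight clusters versus an even spread.
clustered = np.concatenate([np.linspace(-0.2, 0.2, 20), np.linspace(9.8, 10.2, 20)])
spread = np.linspace(0.0, 10.0, 40)
print(spectral_gap(transition_matrix(clustered)))
print(spectral_gap(transition_matrix(spread)))
```

In this toy setting, a CV that separates two metastable states cleanly gives a larger gap than one along which the samples are spread evenly, which is the intuition behind using the gap as a training objective.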
Affiliation(s)
- Jakub Rydzewski
  - Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Toruń, Poland
- Tuğçe Gökdemir
  - Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Toruń, Poland

2
Ishizone T, Matsunaga Y, Fuchigami S, Nakamura K. Representation of Protein Dynamics Disentangled by Time-Structure-Based Prior. J Chem Theory Comput 2024; 20:436-450. [PMID: 38151233 DOI: 10.1021/acs.jctc.3c01025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2023]
Abstract
Representation learning (RL) is a universal technique for deriving low-dimensional disentangled representations from high-dimensional observations, aiding in a multitude of downstream tasks. RL has been extensively applied to various data types, including images and natural language. Here, we analyze molecular dynamics (MD) simulation data of biomolecules in terms of RL. Currently, state-of-the-art RL techniques, mainly motivated by the variational principle, try to capture slow motions in the representation (latent) space. Here, we propose two methods based on an alternative perspective on the disentanglement in the latent space. By disentanglement, we here mean the separation of underlying factors in the simulation data, aiding in detecting physically important coordinates for conformational transitions. The proposed methods introduce a simple prior that imposes temporal constraints in the latent space, serving as a regularization term to facilitate the capture of disentangled representations of dynamics. Comparison with other methods via the analysis of MD simulation trajectories for alanine dipeptide and chignolin validates that the proposed methods construct Markov state models (MSMs) whose implied time scales are comparable to those of the state-of-the-art methods. Using a measure based on total variation, we quantitatively evaluated that the proposed methods successfully disentangle physically important coordinates, aiding the interpretation of folding/unfolding transitions of chignolin. Overall, our methods provide good representations of complex biomolecular dynamics for downstream tasks, allowing for better interpretations of the conformational transitions.
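The implied time scales used in this comparison follow from the eigenvalues λ_i of an MSM transition matrix estimated at lag time τ, through t_i = -τ / ln|λ_i|. The sketch below is a generic illustration, not the authors' pipeline; the function names and the simple row-normalized count estimator (no reversibility constraint, every state assumed to be visited) are assumptions.

```python
import numpy as np


def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition counts at the given lag (assumes every state is visited)."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)


def implied_timescales(matrix, lag=1.0):
    """t_i = -lag / ln|lambda_i| for every eigenvalue except the stationary one."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(matrix)))[::-1]
    return -lag / np.log(eigvals[1:])


# Hypothetical two-state model; times are in units of the lag time.
example = np.array([[0.9, 0.1], [0.2, 0.8]])
print(implied_timescales(example, lag=1.0))
```

Comparing these time scales across lag times, and across the representations used to define the states, is the usual way to judge whether a learned latent space supports a Markovian model.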
Affiliation(s)
- Tsuyoshi Ishizone
  - Mathematical Sciences Program, Graduate School of Advanced Mathematical Sciences, Meiji University, Nakano 4-21-1, Nakano-ku, Tokyo 164-8525, Japan
- Yasuhiro Matsunaga
  - Graduate School of Science and Engineering, Saitama University, Shimo-Okubo 255, Sakura-ku, Saitama-shi, Saitama 338-8570, Japan
- Sotaro Fuchigami
  - Physical Biochemistry Laboratory, Division of Pharmaceutical Sciences, School of Pharmaceutical Sciences, University of Shizuoka, 52-1 Yada, Suruga-ku, Shizuoka 422-8526, Japan
- Kazuyuki Nakamura
  - Department of Mathematical Sciences Based on Modeling and Analysis, School of Interdisciplinary Mathematical Sciences, Meiji University, Nakano 4-21-1, Nakano-ku, Tokyo 164-8525, Japan

3
Semelak JA, Zeida A, Foglia NO, Estrin DA. Minimum Free Energy Pathways of Reactive Processes with Nudged Elastic Bands. J Chem Theory Comput 2023; 19:6273-6293. [PMID: 37647166 DOI: 10.1021/acs.jctc.3c00366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
The determination of minimum free energy pathways (MFEP) is one of the most widely used strategies to study reactive processes. For chemical reactions in complex environments, the combination of quantum mechanics (QM) with a molecular mechanics (MM) representation is usually necessary in a hybrid QM/MM framework. However, even within the QM/MM approximation, the affordable sampling of the phase space is, in general, quite restricted. To reduce drastically the computational cost of the simulations, several methods such as umbrella sampling require performing a priori a selection of a reaction coordinate. The quality of the computed results, in an affordable computational time, is intimately related to the reaction coordinate election which is, in general, a nontrivial task. In this work, we provide an approach to model reactive processes in complex environments that does not require the a priori selection of a reaction coordinate. The proposed methodology combines QM/MM simulations with an extrapolation of the nudged elastic bands (NEB) method to the free energy surface (FENEB). We present and apply our own FENEB scheme to optimize MFEP in different reactive processes, using QM/MM frameworks at semiempirical and density functional theory levels. Our implementation is based on performing the FENEB optimization by uncoupling the optimization of the band in a perpendicular and tangential direction. In each step, a full optimization with the spring force is performed, which guarantees that the images remain evenly distributed. The robustness of the method and the influence of sampling on the quality of the optimized MFEP and its associated free energy barrier are studied. We show that the FENEB method provides a good estimation of the reaction barrier even with relatively short simulation times, supporting that its combination with QM/MM frameworks provides an adequate tool to study chemical processes in complex environments.
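For orientation, the force decomposition that standard NEB applies to each interior image can be sketched as follows: the gradient acts only perpendicular to the band, and the spring force acts only along it. This is a minimal sketch of the textbook NEB projection, not the FENEB scheme of the paper; `gradients` stands in for whatever energy or free-energy gradient estimate is available, and the function name, the central-difference tangent, and `spring_k` are illustrative assumptions.

```python
import numpy as np


def neb_forces(images, gradients, spring_k=1.0):
    """Standard NEB forces on interior images; end points are kept fixed (zero force)."""
    images = np.asarray(images, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    forces = np.zeros_like(images)
    for i in range(1, len(images) - 1):
        # Unit tangent from the two neighbours (simple central difference).
        tangent = images[i + 1] - images[i - 1]
        tangent = tangent / np.linalg.norm(tangent)
        # Keep only the gradient component perpendicular to the band.
        grad_perp = gradients[i] - np.dot(gradients[i], tangent) * tangent
        # The spring force acts along the tangent and evens out image spacing.
        stretch = (np.linalg.norm(images[i + 1] - images[i])
                   - np.linalg.norm(images[i] - images[i - 1]))
        forces[i] = -grad_perp + spring_k * stretch * tangent
    return forces


# Hypothetical straight band along x with a constant gradient on every image.
band = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
grads = [[1.0, 2.0]] * 3
print(neb_forces(band, grads))
```

Because the two components are orthogonal, they can be relaxed separately, which is the kind of perpendicular and tangential uncoupling the abstract refers to.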
Affiliation(s)
- Jonathan A Semelak
  - Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Buenos Aires C1428EHA, Argentina
  - Instituto de Química Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), CONICET-Universidad de Buenos Aires, Buenos Aires C1428EHA, Argentina
- Ari Zeida
  - Departamento de Bioquímica, Facultad de Medicina, Universidad de la República, Montevideo 11800, Uruguay
  - Centro de Investigaciones Biomédicas (CEINBIO), Universidad de la República, Montevideo 11800, Uruguay
- Nicolás O Foglia
  - Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, Mülheim an der Ruhr 45470, Germany
- Darío A Estrin
  - Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Buenos Aires C1428EHA, Argentina
  - Instituto de Química Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), CONICET-Universidad de Buenos Aires, Buenos Aires C1428EHA, Argentina

4
Dietschreit JCB, Diestler DJ, Gómez-Bombarelli R. Entropy and Energy Profiles of Chemical Reactions. J Chem Theory Comput 2023; 19:5369-5379. [PMID: 37535443 DOI: 10.1021/acs.jctc.3c00448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/05/2023]
Abstract
The description of chemical processes at the molecular level is often facilitated by the use of reaction coordinates or collective variables (CVs). The CV measures the progress of the reaction and allows the construction of profiles that track how specific properties evolve as the reaction progresses. Whereas CVs are routinely used, especially alongside enhanced sampling techniques, the links among reaction profiles, thermodynamic state functions, and reaction rate constants are not rigorously exploited. Here, we report a unified treatment of such reaction profiles. Tractable expressions are derived for the free-energy, internal-energy, and entropy profiles as functions of only the CV. We demonstrate the ability of this treatment to extract quantitative insight from the entropy and internal-energy profiles of various real-world physicochemical processes, including intramolecular organic reactions, ionic transport in superionic electrolytes, and molecular transport in nanoporous materials.
Affiliation(s)
- Johannes C B Dietschreit
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Dennis J Diestler
  - University of Nebraska-Lincoln, Lincoln, Nebraska 68583, United States
- Rafael Gómez-Bombarelli
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States

5
Naleem N, Abreu CRA, Warmuz K, Tong M, Kirmizialtin S, Tuckerman ME. An exploration of machine learning models for the determination of reaction coordinates associated with conformational transitions. J Chem Phys 2023; 159:034102. [PMID: 37458344 DOI: 10.1063/5.0147597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 06/23/2023] [Indexed: 07/20/2023]
Abstract
Determining collective variables (CVs) for conformational transitions is crucial to understanding their dynamics and targeting them in enhanced sampling simulations. Often, CVs are proposed based on intuition or prior knowledge of a system. However, the problem of systematically determining a proper reaction coordinate (RC) for a specific process in terms of a set of putative CVs can be achieved using committor analysis (CA). Identifying essential degrees of freedom that govern such transitions using CA remains elusive because of the high dimensionality of the conformational space. Various schemes exist to leverage the power of machine learning (ML) to extract an RC from CA. Here, we extend these studies and compare the ability of 17 different ML schemes to identify accurate RCs associated with conformational transitions. We tested these methods on an alanine dipeptide in vacuum and on a sarcosine dipeptoid in an implicit solvent. Our comparison revealed that the light gradient boosting machine method outperforms other methods. In order to extract key features from the models, we employed Shapley Additive exPlanations analysis and compared its interpretation with the "feature importance" approach. For the alanine dipeptide, our methodology identifies ϕ and θ dihedrals as essential degrees of freedom in the C7ax to C7eq transition. For the sarcosine dipeptoid system, the dihedrals ψ and ω are the most important for the cisαD to transαD transition. We further argue that analysis of the full dynamical pathway, and not just endpoint states, is essential for identifying key degrees of freedom governing transitions.
Affiliation(s)
- Nawavi Naleem
  - Chemistry Program, Science Division, New York University, Abu Dhabi, UAE
- Charlles R A Abreu
  - Chemical Engineering Department, Escola de Química, Universidade Federal do Rio de Janeiro, 21941-909 Rio de Janeiro, RJ, Brazil
- Krzysztof Warmuz
  - Computer Science Program, Science Division, New York University, Abu Dhabi, UAE
- Muchen Tong
  - Department of Chemistry, New York University (NYU), New York, New York 10003, USA
- Serdal Kirmizialtin
  - Chemistry Program, Science Division, New York University, Abu Dhabi, UAE
  - Department of Chemistry, New York University (NYU), New York, New York 10003, USA
  - Center for Smart Engineering Materials, New York University, Abu Dhabi, UAE
- Mark E Tuckerman
  - Department of Chemistry, New York University (NYU), New York, New York 10003, USA
  - Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, USA
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, 3663 Zhongshan Rd. North, Shanghai 200062, China
  - Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, USA

6
Chen H, Roux B, Chipot C. Discovering Reaction Pathways, Slow Variables, and Committor Probabilities with Machine Learning. J Chem Theory Comput 2023. [PMID: 37224455 DOI: 10.1021/acs.jctc.3c00028] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 05/26/2023]
Abstract
A significant challenge faced by atomistic simulations is the difficulty, and often impossibility, to sample the transitions between metastable states of the free-energy landscape associated with slow molecular processes. Importance-sampling schemes represent an appealing option to accelerate the underlying dynamics by smoothing out the relevant free-energy barriers, but require the definition of suitable reaction-coordinate (RC) models expressed in terms of compact low-dimensional sets of collective variables (CVs). While most computational studies of slow molecular processes have traditionally relied on educated guesses based on human intuition to reduce the dimensionality of the problem at hand, a variety of machine-learning (ML) algorithms have recently emerged as powerful alternatives to discover meaningful CVs capable of capturing the dynamics of the slowest degrees of freedom. Considering a simple paradigmatic situation in which the long-time dynamics is dominated by the transition between two known metastable states, we compare two variational data-driven ML methods based on Siamese neural networks aimed at discovering a meaningful RC model: the slowest decorrelating CV of the molecular process, and the committor probability to first reach one of the two metastable states. One method is the state-free reversible variational approach for Markov processes networks (VAMPnets), or SRVs; the other, inspired by the transition path theory framework, is the variational committor-based neural networks, or VCNs. The relationship and the ability of these methodologies to discover the relevant descriptors of the slow molecular process of interest are illustrated with a series of simple model systems. We also show that both strategies are amenable to importance-sampling schemes through an appropriate reweighting algorithm that approximates the kinetic properties of the transition.
Affiliation(s)
- Haochuan Chen
  - Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, B.P. 70239, 54506 Vandœuvre-lès-Nancy cedex, France
- Benoît Roux
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, 60637, United States
- Christophe Chipot
  - Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, B.P. 70239, 54506 Vandœuvre-lès-Nancy cedex, France
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, 60637, United States
  - NIH Center for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States

7
Dominic AJ, Cao S, Montoya-Castillo A, Huang X. Memory Unlocks the Future of Biomolecular Dynamics: Transformative Tools to Uncover Physical Insights Accurately and Efficiently. J Am Chem Soc 2023; 145:9916-9927. [PMID: 37104720 DOI: 10.1021/jacs.3c01095] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 04/29/2023]
Abstract
Conformational changes underpin function and encode complex biomolecular mechanisms. Gaining atomic-level detail of how such changes occur has the potential to reveal these mechanisms and is of critical importance in identifying drug targets, facilitating rational drug design, and enabling bioengineering applications. While the past two decades have brought Markov state model techniques to the point where practitioners can regularly use them to glimpse the long-time dynamics of slow conformations in complex systems, many systems are still beyond their reach. In this Perspective, we discuss how including memory (i.e., non-Markovian effects) can reduce the computational cost to predict the long-time dynamics in these complex systems by orders of magnitude and with greater accuracy and resolution than state-of-the-art Markov state models. We illustrate how memory lies at the heart of successful and promising techniques, ranging from the Fokker-Planck and generalized Langevin equations to deep-learning recurrent neural networks and generalized master equations. We delineate how these techniques work, identify insights that they can offer in biomolecular systems, and discuss their advantages and disadvantages in practical settings. We show how generalized master equations can enable the investigation of, for example, the gate-opening process in RNA polymerase II and demonstrate how our recent advances tame the deleterious influence of statistical underconvergence of the molecular dynamics simulations used to parameterize these techniques. This represents a significant leap forward that will enable our memory-based techniques to interrogate systems that are currently beyond the reach of even the best Markov state models. We conclude by discussing some current challenges and future prospects for how exploiting memory will open the door to many exciting opportunities.
Affiliation(s)
- Anthony J Dominic
  - Department of Chemistry, University of Colorado Boulder, Boulder, Colorado 80309, USA
- Siqin Cao
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Xuhui Huang
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA

8
Chen H, Chipot C. Chasing collective variables using temporal data-driven strategies. QRB Discovery 2023; 4:e2. [PMID: 37564298 PMCID: PMC10411323 DOI: 10.1017/qrd.2022.23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Revised: 12/21/2022] [Accepted: 12/29/2022] [Indexed: 01/09/2023]
Abstract
The convergence of free-energy calculations based on importance sampling depends heavily on the choice of collective variables (CVs), which in principle, should include the slow degrees of freedom of the biological processes to be investigated. Autoencoders (AEs), as emerging data-driven dimension reduction tools, have been utilised for discovering CVs. AEs, however, are often treated as black boxes, and what AEs actually encode during training, and whether the latent variables from encoders are suitable as CVs for further free-energy calculations remains unknown. In this contribution, we review AEs and their time-series-based variants, including time-lagged AEs (TAEs) and modified TAEs, as well as the closely related model variational approach for Markov processes networks (VAMPnets). We then show through numerical examples that AEs learn the high-variance modes instead of the slow modes. In stark contrast, time series-based models are able to capture the slow modes. Moreover, both modified TAEs with extensions from slow feature analysis and the state-free reversible VAMPnets (SRVs) can yield orthogonal multidimensional CVs. As an illustration, we employ SRVs to discover the CVs of the isomerizations of N-acetyl-N'-methylalanylamide and trialanine by iterative learning with trajectories from biased simulations. Last, through numerical experiments with anisotropic diffusion, we investigate the potential relationship of time-series-based models and committor probabilities.
Affiliation(s)
- Haochuan Chen
  - Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
- Christophe Chipot
  - Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
  - Theoretical and Computational Biophysics Group, Beckman Institute, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA

9
Köhs L, Kukovetz K, Rauh O, Koeppl H. Nonparametric Bayesian inference for meta-stable conformational dynamics. Phys Biol 2022; 19. [PMID: 35944548 DOI: 10.1088/1478-3975/ac885e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 08/09/2022] [Indexed: 11/11/2022]
Abstract
Analyses of structural dynamics of biomolecules hold great promise to deepen the understanding of and ability to construct complex molecular systems. To this end, both experimental and computational means are available, such as fluorescence quenching experiments or molecular dynamics simulations, respectively. We argue that while seemingly disparate, both fields of study have to deal with the same type of data about the same underlying phenomenon of conformational switching. Two central challenges typically arise in both contexts: (i) the amount of obtained data is large, and (ii) it is often unknown how many distinct molecular states underlie these data. In this study, we build on the established idea of Markov state modeling and propose a generative, Bayesian nonparametric hidden Markov state model that addresses these challenges. Utilizing hierarchical Dirichlet processes, we treat different meta-stable molecule conformations as distinct Markov states, the number of which we then do not have to set a priori. In contrast to existing approaches to both experimental as well as simulation data that are based on the same idea, we leverage a mean-field variational inference approach, enabling scalable inference on large amounts of data. Furthermore, we specify the model also for the important case of angular data, which however proves to be computationally intractable. Addressing this issue, we propose a computationally tractable approximation to the angular model. We demonstrate the method on synthetic ground truth data and apply it to known benchmark problems as well as electrophysiological experimental data from a conformation-switching ion channel to highlight its practical utility.
Affiliation(s)
- Lukas Köhs
  - Centre for Synthetic Biology, Technische Universität Darmstadt, Rundeturmstrasse 12, Darmstadt, 64283, Germany
- Kerri Kukovetz
  - Biology Department, Technische Universität Darmstadt, Schnittspahnstrasse 3, Darmstadt, 64287, Germany
- Oliver Rauh
  - Biology Department, Technische Universität Darmstadt, Schnittspahnstrasse 3, Darmstadt, 64287, Germany
- Heinz Koeppl
  - Centre for Synthetic Biology, Technische Universität Darmstadt, Rundeturmstrasse 12, Darmstadt, 64283, Germany

10
Novelli P, Bonati L, Pontil M, Parrinello M. Characterizing Metastable States with the Help of Machine Learning. J Chem Theory Comput 2022; 18:5195-5202. [PMID: 35920063 DOI: 10.1021/acs.jctc.2c00393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Present-day atomistic simulations generate long trajectories of ever more complex systems. Analyzing these data, discovering metastable states, and uncovering their nature are becoming increasingly challenging. In this paper, we first use the variational approach to conformation dynamics to discover the slowest dynamical modes of the simulations. This allows the different metastable states of the system to be located and organized hierarchically. The physical descriptors that characterize metastable states are discovered by means of a machine learning method. We show in the cases of two proteins, chignolin and bovine pancreatic trypsin inhibitor, how such analysis can be effortlessly performed in a matter of seconds. Another strength of our approach is that it can be applied to the analysis of both unbiased and biased simulations.
Affiliation(s)
- Pietro Novelli
  - Computational Statistics and Machine Learning, Italian Institute of Technology, Via Enrico Melen 83, 16142 Genoa, Italy
- Luigi Bonati
  - Atomistic Simulations, Italian Institute of Technology, Via Enrico Melen 83, 16142 Genoa, Italy
- Massimiliano Pontil
  - Computational Statistics and Machine Learning, Italian Institute of Technology, Via Enrico Melen 83, 16142 Genoa, Italy
  - Department of Computer Science, University College London, London WC1E 6BT, United Kingdom
- Michele Parrinello
  - Atomistic Simulations, Italian Institute of Technology, Via Enrico Melen 83, 16142 Genoa, Italy

11
Krivov SV. Additive eigenvectors as optimal reaction coordinates, conditioned trajectories, and time-reversible description of stochastic processes. J Chem Phys 2022; 157:014108. [DOI: 10.1063/5.0088061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
A fundamental way to analyze complex multidimensional stochastic dynamics is to describe it as diffusion on a free energy landscape—free energy as a function of reaction coordinates (RCs). For such a description to be quantitatively accurate, the RC should be chosen in an optimal way. The committor function is a primary example of an optimal RC for the description of equilibrium reaction dynamics between two states. Here, additive eigenvectors (addevs) are considered as optimal RCs to address the limitations of the committor. An addev master equation for a Markov chain is derived. A stationary solution of the equation describes a sub-ensemble of trajectories conditioned on having the same optimal RC for the forward and time-reversed dynamics in the sub-ensemble. A collection of such sub-ensembles of trajectories, called stochastic eigenmodes, can be used to describe/approximate the stochastic dynamics. A non-stationary solution describes the evolution of the probability distribution. However, in contrast to the standard master equation, it provides a time-reversible description of stochastic dynamics. It can be integrated forward and backward in time. The developed framework is illustrated on two model systems—unidirectional random walk and diffusion.
Affiliation(s)
- Sergei V. Krivov
  - University of Leeds, Astbury Center for Structural Molecular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, United Kingdom

12
Diez G, Nagel D, Stock G. Correlation-Based Feature Selection to Identify Functional Dynamics in Proteins. J Chem Theory Comput 2022; 18:5079-5088. [PMID: 35793551 DOI: 10.1021/acs.jctc.2c00337] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 11/28/2022]
Abstract
To interpret molecular dynamics simulations of biomolecular systems, systematic dimensionality reduction methods are commonly employed. Among others, this includes principal component analysis (PCA) and time-lagged independent component analysis (TICA), which aim to maximize the variance and the time scale of the first components, respectively. A crucial first step of such an analysis is the identification of suitable and relevant input coordinates (the so-called features), such as backbone dihedral angles and interresidue distances. As typically only a small subset of those coordinates is involved in a specific biomolecular process, it is important to discard the remaining uncorrelated motions or weakly correlated noise coordinates. This is because they may exhibit large amplitudes or long time scales and therefore will be erroneously considered important by PCA and TICA, respectively. To discriminate collective motions underlying functional dynamics from uncorrelated motions, the correlation matrix of the input coordinates is block-diagonalized by a clustering method. This strategy avoids possible bias due to presumed functional observables and conformational states or variation principles that maximize variance or time scales. Considering several linear and nonlinear correlation measures and various clustering algorithms, it is shown that the combination of linear correlation and the Leiden community detection algorithm yields excellent results for all considered model systems. These include the functional motion of T4 lysozyme to demonstrate the successful identification of collective motion, as well as the folding of the villin headpiece to highlight the physical interpretation of the correlated motions in terms of a functional mechanism.
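The contrast drawn above between PCA (maximum variance) and TICA (maximum time scale) can be made concrete on a toy trajectory with one slow, low-variance coordinate and one fast, high-variance coordinate. This is a minimal NumPy sketch, not the authors' workflow; the synthetic data, the function names, and the lag of 10 steps are assumptions, and TICA is implemented here by whitening with the instantaneous covariance and diagonalizing the symmetrized time-lagged covariance.

```python
import numpy as np


def make_toy_trajectory(n_steps=5000, seed=0):
    """Column 0: slow AR(1) process with small variance. Column 1: fast large-amplitude noise."""
    rng = np.random.default_rng(seed)
    slow = np.zeros(n_steps)
    for t in range(1, n_steps):
        slow[t] = 0.99 * slow[t - 1] + 0.1 * rng.normal()
    fast = 3.0 * rng.normal(size=n_steps)
    return np.column_stack([slow, fast])


def pca_leading_vector(data):
    """Direction of maximum variance."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / len(centered)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1]


def tica_leading_vector(data, lag=10):
    """Direction of maximum autocorrelation at the given lag."""
    centered = data - data.mean(axis=0)
    x0, xt = centered[:-lag], centered[lag:]
    c0 = (x0.T @ x0 + xt.T @ xt) / (2 * len(x0))  # instantaneous covariance
    ct = (x0.T @ xt + xt.T @ x0) / (2 * len(x0))  # symmetrized lagged covariance
    s, u = np.linalg.eigh(c0)
    whitener = u @ np.diag(s ** -0.5) @ u.T
    _, eigvecs = np.linalg.eigh(whitener @ ct @ whitener)
    vec = whitener @ eigvecs[:, -1]  # back to the original coordinates
    return vec / np.linalg.norm(vec)


traj = make_toy_trajectory()
print("PCA :", pca_leading_vector(traj))
print("TICA:", tica_leading_vector(traj))
```

In this toy case PCA is expected to pick the noisy coordinate and TICA the slow one. Either method can be misled by an irrelevant input coordinate with large amplitude or a long time scale, which is the motivation the abstract gives for selecting features first.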
Affiliation(s)
- Georg Diez
  - Biomolecular Dynamics, Institute of Physics, Albert-Ludwigs-Universität, 79104 Freiburg, Germany
- Daniel Nagel
  - Biomolecular Dynamics, Institute of Physics, Albert-Ludwigs-Universität, 79104 Freiburg, Germany
- Gerhard Stock
  - Biomolecular Dynamics, Institute of Physics, Albert-Ludwigs-Universität, 79104 Freiburg, Germany

13
Abstract
The kinetics of a dynamical system dominated by two metastable states is examined from the perspective of the activated-dynamics reactive flux formalism, Markov state eigenvalue spectral decomposition, and committor-based transition path theory. Analysis shows that the different theoretical formulations are consistent, clarifying the significance of the inherent microscopic lag-times that are implicated, and that the most meaningful one-dimensional reaction coordinate in the region of the transition state is along the gradient of the committor in the multidimensional subspace of collective variables. It is shown that the familiar reactive flux activated dynamics formalism provides an effective route to calculate the transition rate in the case of a narrow sharp barrier but much less so in the case of a broad flat barrier. In this case, the standard reactive flux correlation function decays very slowly to the plateau value that corresponds to the transmission coefficient. Treating the committor function as a reaction coordinate does not alleviate all issues caused by the slow relaxation of the reactive flux correlation function. A more efficient activated dynamics simulation algorithm may be achieved from a modified reactive flux weighted by the committor. Simulation results on simple systems are used to illustrate the various conceptual points.
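For a discrete Markov chain, the committor discussed above can be obtained from a linear system: q_i = 0 in the reactant set A, q_i = 1 in the product set B, and q_i = Σ_j T_ij q_j for every other state. The sketch below is a generic illustration under assumed names (`committor`, `states_a`, `states_b`), not code from the paper.

```python
import numpy as np


def committor(transition_matrix, states_a, states_b):
    """Probability of reaching set B before set A, for each state of a Markov chain."""
    t = np.asarray(transition_matrix, dtype=float)
    states_a, states_b = list(states_a), list(states_b)
    boundary = set(states_a) | set(states_b)
    interior = [i for i in range(t.shape[0]) if i not in boundary]
    # Interior states satisfy (I - T_II) q_I = T_IB 1.
    lhs = np.eye(len(interior)) - t[np.ix_(interior, interior)]
    rhs = t[np.ix_(interior, states_b)].sum(axis=1)
    q = np.zeros(t.shape[0])
    q[states_b] = 1.0
    q[interior] = np.linalg.solve(lhs, rhs)
    return q


# Hypothetical symmetric random walk on five states with reflecting ends.
walk = np.array([
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 0.5],
])
print(committor(walk, states_a=[0], states_b=[4]))
```

For this barrier-free walk the committor rises linearly from A to B. On a landscape with a barrier it changes most rapidly near the transition state, which is why its gradient there is singled out as the most meaningful one-dimensional reaction coordinate.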
Affiliation(s)
- Benoît Roux
  - Department of Biochemistry and Molecular Biology, Department of Chemistry, The University of Chicago, 5735 S Ellis Ave., Chicago, Illinois 60637, USA

14
Hénin J, Lopes LJS, Fiorin G. Human Learning for Molecular Simulations: The Collective Variables Dashboard in VMD. J Chem Theory Comput 2022; 18:1945-1956. [PMID: 35143194 DOI: 10.1021/acs.jctc.1c01081] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
The Collective Variables Dashboard is a software tool for real-time, seamless exploration of molecular structures and trajectories in a customizable space of collective variables. The Dashboard arises from the integration of the Collective Variables Module (also known as Colvars) with the visualization software VMD, augmented with a fully discoverable graphical interface offering interactive workflows for the design and analysis of collective variables. Typical use cases include a priori design of collective variables for enhanced sampling and free energy simulations as well as analysis of any type of simulation or collection of structures in a collective variable space. A combination of those cases commonly occurs when preliminary simulations, biased or unbiased, reveal that an optimized set of collective variables is necessary to improve sampling in further simulations. Then the Dashboard provides an efficient way to intuitively explore the space of likely collective variables, validate them on existing data, and use the resulting collective variable definitions directly in further biased simulations using the Collective Variables Module. Visualization of biasing energies and forces is proposed to help analyze or plan biased simulations. We illustrate the use of the Dashboard on two applications: discovering coordinates to describe ligand unbinding from a protein binding site and designing volume-based variables to bias the hydration of a transmembrane pore.
Affiliation(s)
- Jérôme Hénin
  - Laboratoire de Biochimie Théorique UPR 9080, CNRS, Université de Paris, 75005 Paris, France
  - Institut de Biologie Physico-Chimique-Fondation Edmond de Rothschild, PSL Research University, 75005 Paris, France
- Laura J S Lopes
  - Theoretical Division T-1, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Giacomo Fiorin
  - National Institute of Neurological Disorders and Stroke (NINDS) and National Heart, Lung and Blood Institute (NHLBI), Bethesda, Maryland 20892, United States

15
Paul TK, Taraphder S. Nonlinear Reaction Coordinate of an Enzyme Catalyzed Proton Transfer Reaction. J Phys Chem B 2022; 126:1413-1425. [PMID: 35138854 DOI: 10.1021/acs.jpcb.1c08760] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
Abstract
We present an in-depth study on the theoretical calculation of an optimum reaction coordinate as a linear or nonlinear combination of important collective variables (CVs) sampled from an ensemble of reactive transition paths for an intramolecular proton transfer reaction catalyzed by the enzyme human carbonic anhydrase (HCA) II. The linear models are optimized by likelihood maximization for a given number of CVs. The nonlinear models are based on an artificial neural network with the same number of CVs and optimized by minimizing the root-mean-square error in comparison to a training set of committor estimators generated for the given transition. The nonlinear reaction coordinate thus obtained yields the free energy of activation and rate constant as 9.46 kcal mol⁻¹ and 1.25 × 10⁶ s⁻¹, respectively. These estimates are found to be in quantitative agreement with the known experimental results. We have also used an extended autoencoder model to show that a similar analysis can be carried out using a single CV only. The resultant free energies and kinetics of the reaction slightly overestimate the experimental data. The implications of these results are discussed using a detailed microkinetic scheme of the proton transfer reaction catalyzed by HCA II.
Affiliation(s)
- Tanmoy Kumar Paul
  - Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Srabani Taraphder
  - Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur 721302, India

16
González-Fernández C, Bringas E, Oostenbrink C, Ortiz I. In silico investigation and surmounting of Lipopolysaccharide barrier in Gram-Negative Bacteria: How far has molecular dynamics Come? Comput Struct Biotechnol J 2022; 20:5886-5901. [PMID: 36382192 PMCID: PMC9636410 DOI: 10.1016/j.csbj.2022.10.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Revised: 10/24/2022] [Accepted: 10/24/2022] [Indexed: 11/29/2022] Open
Abstract
Lipopolysaccharide (LPS), a main component of the outer membrane of Gram-negative bacteria, has crucial implications on both antibiotic resistance and the overstimulation of the host innate immune system. Fighting against these global concerns calls for the molecular understanding of the barrier function and immunostimulatory ability of LPS. Molecular dynamics (MD) simulations have become an invaluable tool for uncovering important findings in LPS research. While the reach of MD simulations for investigating the immunostimulatory ability of LPS has been already outlined, little attention has been paid to the role of MD simulations for exploring its barrier function and synthesis. Herein, we give an overview about the impact of MD simulations on gaining insight into the shield role and synthesis pathway of LPS, which have attracted considerable attention to discover molecules able to surmount antibiotic resistance, either circumventing LPS defenses or disrupting its synthesis. We specifically focus on the enhanced sampling and free energy calculation methods that have been combined with MD simulations to address such research. We also highlight the use of special-purpose MD supercomputers, the importance of appropriate LPS and ions parameterization to obtain reliable results, and the complementary views that MD and wet-lab experiments provide. Thereby, this work, which covers the last five years of research, apart from outlining the phenomena and strategies that are being explored, evidences the valuable insights that are gained by MD, which may be useful to advance antibiotic design, and what the prospects of this in silico method could be in LPS research.
Affiliation(s)
- Cristina González-Fernández
  - Department of Chemical and Biomolecular Engineering, ETSIIT, University of Cantabria, Avda. Los Castros s/n, 39005 Santander, Spain
- Eugenio Bringas
  - Department of Chemical and Biomolecular Engineering, ETSIIT, University of Cantabria, Avda. Los Castros s/n, 39005 Santander, Spain
- Chris Oostenbrink
  - Institute for Molecular Modeling and Simulation, BOKU – University of Natural Resources and Life Sciences, Muthgasse 18, 1190 Vienna, Austria
- Inmaculada Ortiz
  - Department of Chemical and Biomolecular Engineering, ETSIIT, University of Cantabria, Avda. Los Castros s/n, 39005 Santander, Spain
  - Corresponding author

17
A quantitative paradigm for water-assisted proton transport through proteins and other confined spaces. Proc Natl Acad Sci U S A 2021; 118:2113141118. [PMID: 34857630 DOI: 10.1073/pnas.2113141118] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Accepted: 10/15/2021] [Indexed: 11/18/2022] Open
Abstract
Water-assisted proton transport through confined spaces influences many phenomena in biomolecular and nanomaterial systems. In such cases, the water molecules that fluctuate in the confined pathways provide the environment and the medium for the hydrated excess proton migration via Grotthuss shuttling. However, a definitive collective variable (CV) that accurately couples the hydration and the connectivity of the proton wire with the proton translocation has remained elusive. To address this important challenge, and thus to define a quantitative paradigm for facile proton transport in confined spaces, a CV is derived in this work from graph theory, which is verified to accurately describe water wire formation and breakage coupled to the proton translocation in carbon nanotubes and the Cl⁻/H⁺ antiporter protein, ClC-ec1. Significant alterations in the conformations and thermodynamics of water wires are uncovered after introducing an excess proton into them. Large barriers in the proton translocation free-energy profiles are found when water wires are defined to be disconnected according to the new CV, even though the pertinent confined space is still reasonably well hydrated and, by the simple measure of the mere existence of a water structure, the proton transport would have been predicted to be facile via that oversimplified measure. In this paradigm, however, the simple presence of water is not sufficient for inferring proton translocation, since an excess proton itself is able to drive hydration, and additionally, the water molecules themselves must be adequately connected to facilitate any successful proton transport.

18
Bonati L, Piccini G, Parrinello M. Deep learning the slow modes for rare events sampling. Proc Natl Acad Sci U S A 2021; 118:e2113533118. [PMID: 34706940 PMCID: PMC8612227 DOI: 10.1073/pnas.2113533118] [Citation(s) in RCA: 75] [Impact Index Per Article: 25.0] [Accepted: 09/19/2021] [Indexed: 02/08/2023] Open
Abstract
The development of enhanced sampling methods has greatly extended the scope of atomistic simulations, allowing long-time phenomena to be studied with accessible computational resources. Many such methods rely on the identification of an appropriate set of collective variables. These are meant to describe the system's modes that most slowly approach equilibrium under the action of the sampling algorithm. Once identified, the equilibration of these modes is accelerated by the enhanced sampling method of choice. An attractive way of determining the collective variables is to relate them to the eigenfunctions and eigenvalues of the transfer operator. Unfortunately, this requires knowing the long-term dynamics of the system beforehand, which is generally not available. However, we have recently shown that it is indeed possible to determine efficient collective variables starting from biased simulations. In this paper, we bring the power of machine learning and the efficiency of the recently developed on the fly probability-enhanced sampling method to bear on this approach. The result is a powerful and robust algorithm that, given an initial enhanced sampling simulation performed with trial collective variables or generalized ensembles, extracts transfer operator eigenfunctions using a neural network ansatz and then accelerates them to promote sampling of rare events. To illustrate the generality of this approach, we apply it to several systems, ranging from the conformational transition of a small molecule to the folding of a miniprotein and the study of materials crystallization.
Affiliation(s)
- Luigi Bonati
  - Department of Physics, Eidgenössische Technische Hochschule (ETH) Zürich, 8092 Zürich, Switzerland
  - Atomistic Simulations, Italian Institute of Technology, 16163 Genova, Italy
- Michele Parrinello
  - Atomistic Simulations, Italian Institute of Technology, 16163 Genova, Italy

19
Konovalov K, Unarta IC, Cao S, Goonetilleke EC, Huang X. Markov State Models to Study the Functional Dynamics of Proteins in the Wake of Machine Learning. JACS AU 2021; 1:1330-1341. [PMID: 34604842 PMCID: PMC8479766 DOI: 10.1021/jacsau.1c00254] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 06/07/2021] [Indexed: 05/19/2023]
Abstract
Markov state models (MSMs) based on molecular dynamics (MD) simulations are routinely employed to study protein folding; however, their application to functional conformational changes of biomolecules is still limited. In the past few years, the field of computational chemistry has experienced a surge of advancements stemming from machine learning algorithms, and MSMs have not been left out. Unlike global processes, such as protein folding, the application of MSMs to functional conformational changes is challenging because they mostly consist of localized structural transitions. Therefore, it is critical to properly select a subset of structural features that can describe the slowest dynamics of these functional conformational changes. To address this challenge, we recommend several automatic feature selection methods such as Spectral-OASIS. To identify states in MSMs, the chosen features can be subject to dimensionality reduction methods such as TICA or deep learning based VAMPNets to project MD conformations onto a few collective variables for subsequent clustering. Another challenge for the application of MSMs to the study of functional conformational changes is the ability to comprehend their biophysical mechanisms, as MSMs built for these processes often require a large number of states. We recommend the recently developed quasi-MSMs (qMSMs) to address this issue. Compared to MSMs, qMSMs encode the non-Markovian dynamics via the generalized master equation and can significantly reduce the number of states. As a result, qMSMs can be built with a handful of states to facilitate the interpretation of functional conformational changes. In the wake of machine learning, we believe that the rapid advancement in the MSM methodology will lead to their wider application in studying functional conformational changes of biomolecules.
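To make the basic MSM workflow mentioned in this abstract concrete, the following sketch estimates a transition matrix from a discrete state trajectory and converts its eigenvalues into implied timescales via t_i = -τ / ln λ_i. It is a toy illustration under stated assumptions, not the authors' pipeline: the helper names `estimate_msm` and `implied_timescales` are hypothetical, the count matrix is simply symmetrized as a crude way to enforce detailed balance, and the two-state trajectory is synthetic. Feature selection, clustering, and qMSMs are outside its scope.

```python
import numpy as np

def estimate_msm(dtraj, n_states, lag):
    """Row-stochastic transition matrix from transition counts at a given lag."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a, b] += 1
    counts = 0.5 * (counts + counts.T)  # crude reversible estimate
    # Assumes every state was visited; otherwise a row sum would be zero
    return counts / counts.sum(axis=1, keepdims=True)

def implied_timescales(T, lag):
    """Relaxation timescales from the non-stationary eigenvalues of T.

    Assumes those eigenvalues are real and lie strictly between 0 and 1.
    """
    eigvals = np.sort(np.linalg.eigvals(T).real)[::-1]
    return -lag / np.log(eigvals[1:])

# Synthetic two-state trajectory generated from a known transition matrix
T_true = np.array([[0.9, 0.1],
                   [0.1, 0.9]])
rng = np.random.default_rng(0)
n_steps = 20000
u = rng.random(n_steps)
dtraj = np.zeros(n_steps, dtype=int)
for t in range(1, n_steps):
    # Inverse-CDF sampling of the next state from the current row
    dtraj[t] = np.searchsorted(np.cumsum(T_true[dtraj[t - 1]]), u[t])

T_est = estimate_msm(dtraj, n_states=2, lag=1)
timescales_true = implied_timescales(T_true, lag=1)
timescales_est = implied_timescales(T_est, lag=1)
```

In practice the implied timescales are recomputed for several lag times, and a lag is chosen where they level off, which is the usual check that the discretized dynamics is approximately Markovian.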
Affiliation(s)
- Kirill A. Konovalov
  - Department of Chemistry, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
  - Hong Kong Center for Neurodegenerative Diseases, Hong Kong Science Park, Hong Kong
- Ilona Christy Unarta
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
  - Hong Kong Center for Neurodegenerative Diseases, Hong Kong Science Park, Hong Kong
- Siqin Cao
  - Department of Chemistry, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
  - Hong Kong Center for Neurodegenerative Diseases, Hong Kong Science Park, Hong Kong
- Eshani C. Goonetilleke
  - Department of Chemistry, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
  - Hong Kong Center for Neurodegenerative Diseases, Hong Kong Science Park, Hong Kong
- Xuhui Huang
  - Department of Chemistry, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
  - Hong Kong Center for Neurodegenerative Diseases, Hong Kong Science Park, Hong Kong

20
Abstract
We extend the nonparametric framework of reaction coordinate optimization to nonequilibrium ensembles of (short) trajectories. For example, we show how, starting from such an ensemble, one can obtain an equilibrium free-energy profile along the committor, which can be used to determine important properties of the dynamics exactly. A new adaptive sampling approach, the transition-state ensemble enrichment, is suggested, which samples the configuration space by "growing" committor segments toward each other starting from the boundary states. This framework is suggested as a general tool, alternative to the Markov state models, for a rigorous and accurate analysis of simulations of large biomolecular systems, as it has the following attractive properties. It is immune to the curse of dimensionality, does not require system-specific information, can approximate arbitrary reaction coordinates with high accuracy, and has sensitive and rigorous criteria to test optimality and convergence. The approaches are illustrated on a 50-dimensional model system and a realistic protein folding trajectory.
Affiliation(s)
- Sergei V Krivov
  - Astbury Center for Structural Molecular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, U.K

21
Abstract
The kinetics of a dynamical system comprising two metastable states is formulated in terms of a finite-time propagator in phase space (position and velocity) adapted to the underdamped Langevin equation. Dimensionality reduction to a subspace of collective variables yields familiar expressions for the propagator, committor, and steady-state flux. A quadratic expression for the steady-state flux between the two metastable states can serve as a robust variational principle to determine an optimal approximate committor expressed in terms of a set of collective variables. The theoretical formulation is exploited to clarify the foundation of the string method with swarms-of-trajectories, which relies on the mean drift of short trajectories to determine the optimal transition pathway. It is argued that the conditions for Markovity within a subspace of collective variables may not be satisfied with an arbitrarily short time-step and that proper kinetic behaviors appear only when considering the effective propagator for longer lag times. The effective propagator with finite lag time is amenable to an eigenvalue-eigenvector spectral analysis, as elaborated previously in the context of position-based Markov models. The time-correlation functions calculated by swarms-of-trajectories along the string pathway constitute a natural extension of these developments. The present formulation provides a powerful theoretical framework to characterize the optimal pathway between two metastable states of a system.
Affiliation(s)
- Benoît Roux
  - Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, Illinois 60637, United States
  - Department of Chemistry, The University of Chicago, 5735 S. Ellis Avenue, Chicago, Illinois 60637, United States

22
Aydin F, Durumeric AEP, da Hora GCA, Nguyen JDM, Oh MI, Swanson JMJ. Improving the accuracy and convergence of drug permeation simulations via machine-learned collective variables. J Chem Phys 2021; 155:045101. [PMID: 34340389 DOI: 10.1063/5.0055489] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/04/2023] Open
Abstract
Understanding the permeation of biomolecules through cellular membranes is critical for many biotechnological applications, including targeted drug delivery, pathogen detection, and the development of new antibiotics. To this end, computer simulations are routinely used to probe the underlying mechanisms of membrane permeation. Despite great progress and continued development, permeation simulations of realistic systems (e.g., more complex drug molecules or biologics through heterogeneous membranes) remain extremely challenging if not intractable. In this work, we combine molecular dynamics simulations with transition-tempered metadynamics and techniques from the variational approach to conformational dynamics to study the permeation mechanism of a drug molecule, trimethoprim, through a multicomponent membrane. We show that collective variables (CVs) obtained from an unsupervised machine learning algorithm called time-structure based Independent Component Analysis (tICA) improve performance and substantially accelerate convergence of permeation potential of mean force (PMF) calculations. The addition of cholesterol to the lipid bilayer is shown to increase both the width and height of the free energy barrier due to a condensing effect (lower area per lipid) and increase bilayer thickness. Additionally, the tICA CVs reveal a subtle effect of cholesterol increasing the resistance to permeation in the lipid head group region, which is not observed when canonical CVs are used. We conclude that the use of tICA CVs can enable more efficient PMF calculations with additional insight into the permeation mechanism.
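The tICA collective variables referred to in this abstract can be illustrated with a short linear-algebra sketch. It solves the generalized eigenproblem C(τ)v = λC(0)v by whitening with C(0) and diagonalizing the symmetrized time-lagged covariance. This is a toy example under stated assumptions, not the authors' workflow: the function name `tica`, the lag of 5 frames, and the two synthetic features (one slow autoregressive coordinate and one white-noise coordinate) are hypothetical, and nothing here involves metadynamics or PMF calculations.

```python
import numpy as np

def tica(X, lag, n_components=1):
    """Leading time-lagged independent components of a feature time series.

    X has shape (n_frames, n_features). Returns the eigenvalues
    (autocorrelations at the lag) and the projection vectors.
    """
    X = X - X.mean(axis=0)
    X0, X1 = X[:-lag], X[lag:]
    n = len(X0)
    # Symmetrized instantaneous and time-lagged covariance estimates
    C0 = (X0.T @ X0 + X1.T @ X1) / (2 * n)
    Ct = (X0.T @ X1 + X1.T @ X0) / (2 * n)
    # Whitening transform C0^(-1/2); assumes C0 is well conditioned
    s, U = np.linalg.eigh(C0)
    W = U @ np.diag(s ** -0.5) @ U.T
    eigvals, V = np.linalg.eigh(W @ Ct @ W)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], W @ V[:, order]

# Synthetic features: a fast white-noise coordinate and a slow AR(1) coordinate
rng = np.random.default_rng(1)
n_frames = 20000
kicks = rng.normal(size=n_frames)
slow = np.zeros(n_frames)
for t in range(1, n_frames):
    slow[t] = 0.98 * slow[t - 1] + kicks[t]
fast = rng.normal(size=n_frames)
X = np.column_stack([fast, slow])

eigvals, components = tica(X, lag=5)
tic1 = (X - X.mean(axis=0)) @ components[:, 0]  # leading tICA CV along the trajectory
```

The leading component picks out the slowly decorrelating coordinate rather than the one with the largest instantaneous fluctuations, which is the property that makes such CVs attractive for biasing slow degrees of freedom.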
Affiliation(s)
- Fikret Aydin
  - Quantum Simulation Group, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
- Gabriel C A da Hora
  - Department of Chemistry, University of Utah, Salt Lake City, Utah 84112-0850, USA
- John D M Nguyen
  - Department of Chemistry, University of Utah, Salt Lake City, Utah 84112-0850, USA
- Myong In Oh
  - Department of Chemistry, University of Utah, Salt Lake City, Utah 84112-0850, USA
- Jessica M J Swanson
  - Department of Chemistry, University of Utah, Salt Lake City, Utah 84112-0850, USA

23
Serapian SA, Moroni E, Ferraro M, Colombo G. Atomistic Simulations of the Mechanisms of the Poorly Catalytic Mitochondrial Chaperone Trap1: Insights into the Effects of Structural Asymmetry on Reactivity. ACS Catal 2021. [DOI: 10.1021/acscatal.1c00692] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/26/2022]
Affiliation(s)
- Stefano A. Serapian
  - Department of Chemistry, University of Pavia, Via Torquato Taramelli 12, 27100 Pavia, Italy
- Elisabetta Moroni
  - “Giulio Natta” Institute of Chemical and Technological Sciences (SCITEC), Via Mario Bianco 9, 20131 Milan, Italy
- Mariarosaria Ferraro
  - “Giulio Natta” Institute of Chemical and Technological Sciences (SCITEC), Via Mario Bianco 9, 20131 Milan, Italy
- Giorgio Colombo
  - Department of Chemistry, University of Pavia, Via Torquato Taramelli 12, 27100 Pavia, Italy
  - “Giulio Natta” Institute of Chemical and Technological Sciences (SCITEC), Via Mario Bianco 9, 20131 Milan, Italy

24
Abstract
We describe a nonparametric approach for accurate determination of the slowest relaxation eigenvectors of molecular dynamics. The approach is blind as it uses no system specific information. In particular, it does not require a functional form with many parameters to closely approximate eigenvectors, e.g., linear combinations of molecular descriptors or a deep neural network, and thus no extensive expertise with the system. We suggest a rigorous and sensitive validation/optimality criterion for an eigenvector. The criterion uses only eigenvector time series and can be used to validate eigenvectors computed by other approaches. The power of the approach is illustrated on long atomistic protein folding trajectories. The determined eigenvectors pass the validation test at a time scale of 0.2 ns, much shorter than alternative approaches.
Affiliation(s)
- Sergei V Krivov
  - Astbury Center for Structural Molecular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, United Kingdom

25
Gkeka P, Stoltz G, Barati Farimani A, Belkacemi Z, Ceriotti M, Chodera JD, Dinner AR, Ferguson AL, Maillet JB, Minoux H, Peter C, Pietrucci F, Silveira A, Tkatchenko A, Trstanova Z, Wiewiora R, Lelièvre T. Machine Learning Force Fields and Coarse-Grained Variables in Molecular Dynamics: Application to Materials and Biological Systems. J Chem Theory Comput 2020; 16:4757-4775. [PMID: 32559068 PMCID: PMC8312194 DOI: 10.1021/acs.jctc.0c00355] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Indexed: 12/21/2022]
Abstract
Machine learning encompasses tools and algorithms that are now becoming popular in almost all scientific and technological fields. This is true for molecular dynamics as well, where machine learning offers promises of extracting valuable information from the enormous amounts of data generated by simulation of complex systems. We provide here a review of our current understanding of goals, benefits, and limitations of machine learning techniques for computational studies on atomistic systems, focusing on the construction of empirical force fields from ab initio databases and the determination of reaction coordinates for free energy computation and enhanced sampling.
Affiliation(s)
- Paraskevi Gkeka
  - Integrated Drug Discovery, Sanofi R&D, 91385 Chilly-Mazarin, France
- Gabriel Stoltz
  - CERMICS, Ecole des Ponts, Marne-la-Vallée, France
  - Matherials Project-Team, Inria Paris, 75012 Paris, France
- Zineb Belkacemi
  - Integrated Drug Discovery, Sanofi R&D, 91385 Chilly-Mazarin, France
  - CERMICS, Ecole des Ponts, Marne-la-Vallée, France
- Michele Ceriotti
  - Laboratory of Computational Science and Modelling, Institute of Materials, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- John D Chodera
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Aaron R Dinner
  - Department of Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, United States
- Hervé Minoux
  - Integrated Drug Discovery, Sanofi R&D, 94403 Vitry-sur-Seine, France
- Fabio Pietrucci
  - UMR CNRS 7590, MNHN, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, Sorbonne Université, 75005 Paris, France
- Ana Silveira
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Zofia Trstanova
  - School of Mathematics, The University of Edinburgh, Edinburgh EH9 3FD, U.K
- Rafal Wiewiora
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Tony Lelièvre
  - CERMICS, Ecole des Ponts, Marne-la-Vallée, France
  - Matherials Project-Team, Inria Paris, 75012 Paris, France

26
Paul S, Ainavarapu SRK, Venkatramani R. Variance of Atomic Coordinates as a Dynamical Metric to Distinguish Proteins and Protein-Protein Interactions in Molecular Dynamics Simulations. J Phys Chem B 2020; 124:4247-4262. [PMID: 32281802 DOI: 10.1021/acs.jpcb.0c01191] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/05/2023]
Abstract
Protein dynamics is a manifestation of the complex trajectories of these biomolecules on a multidimensional rugged potential energy surface (PES) driven by thermal energy. At present, computational methods such as atomistic molecular dynamics (MD) simulations can describe thermal protein conformational changes in fully solvated environments over millisecond timescales. Despite these advances, a quantitative assessment of protein dynamics remains a complicated topic, intricately linked to issues such as sampling convergence and the identification of appropriate reaction coordinates/structural features to describe protein conformational states and motions. Here, we present the cumulative variance of atomic coordinate fluctuations (CVCF) along trajectories as an intuitive PES sensitive metric to assess both the extent of sampling and protein dynamics captured in MD simulations. We first examine the sampling problem in model one- (1D) and two-dimensional (2D) PES to demonstrate that the CVCF when traced as a function of the sampling variable (time in MD simulations) can identify local and global equilibria. Further, even far from global equilibrium, a situation representative of standard MD trajectories of proteins, the CVCF can distinguish different PES and therefore resolve the resultant protein dynamics. We demonstrate the utility of our CVCF analysis by applying it to distinguish the dynamics of structurally homologous proteins from the ubiquitin family (ubiquitin, SUMO1, SUMO2) and ubiquitin protein-protein interactions. Our CVCF analysis reveals that differential side-chain dynamics from the structured part of the protein (the conserved β-grasp fold) present distinct protein PES to distinguish ubiquitin from SUMO isoforms. 
Upon binding to two functionally distinct protein partners (UBCH5A and UEV), intrinsic ubiquitin dynamics changes to reflect the binding context even though the two proteins have similar binding modes, which lead to negligible (sub-angstrom scale) structural changes.
Affiliation(s)
- Sanjoy Paul
  - Department of Chemical Sciences, Tata Institute of Fundamental Research, Dr. Homi Bhabha Road, Colaba, Mumbai 400005, Maharashtra, India
- Sri Rama Koti Ainavarapu
  - Department of Chemical Sciences, Tata Institute of Fundamental Research, Dr. Homi Bhabha Road, Colaba, Mumbai 400005, Maharashtra, India
- Ravindra Venkatramani
  - Department of Chemical Sciences, Tata Institute of Fundamental Research, Dr. Homi Bhabha Road, Colaba, Mumbai 400005, Maharashtra, India

27
Feng J, Shukla D. FingerprintContacts: Predicting Alternative Conformations of Proteins from Coevolution. J Phys Chem B 2020; 124:3605-3615. [PMID: 32283936 DOI: 10.1021/acs.jpcb.9b11869] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
Abstract
Proteins are dynamic molecules which perform diverse molecular functions by adopting different three-dimensional structures. Recent progress in residue-residue contacts prediction opens up new avenues for the de novo protein structure prediction from sequence information. However, it is still difficult to predict more than one conformation from residue-residue contacts alone. This is due to the inability to deconvolve the complex signals of residue-residue contacts, i.e., spatial contacts relevant for protein folding, conformational diversity, and ligand binding. Here, we introduce a machine learning based method, called FingerprintContacts, for extending the capabilities of residue-residue contacts. This algorithm leverages the features of residue-residue contacts, that is, (1) a single conformation outperforms the others in the structural prediction using all the top ranking residue-residue contacts as structural constraints and (2) conformation specific contacts rank lower and constitute a small fraction of residue-residue contacts. We demonstrate the capabilities of FingerprintContacts on eight ligand binding proteins with varying conformational motions. Furthermore, FingerprintContacts identifies small clusters of residue-residue contacts which are preferentially located in the dynamically fluctuating regions. With the rapid growth in protein sequence information, we expect FingerprintContacts to be a powerful first step in structural understanding of protein functional mechanisms.
|
28
|
Cummins PL, Gready JE. Kohn-Sham Density Functional Calculations Reveal Proton Wires in the Enolization and Carboxylase Reactions Catalyzed by Rubisco. J Phys Chem B 2020; 124:3015-3026. [PMID: 32208706 DOI: 10.1021/acs.jpcb.0c01169] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/27/2022]
Abstract
Ribulose 1,5-bisphosphate (RuBP) carboxylase-oxygenase (Rubisco) plays a fundamental role in the carbon cycle by fixing the atmospheric CO2 used in photosynthesis. Rubisco is all the more remarkable because it must catalyze some difficult multistep reaction chemistry involving proton transfers within the one active site. In the present study, we have used Kohn-Sham density functional theory at the B3LYP/6-31G* level with basis set superposition error and dispersion corrections (B3LYP-gCP-D3) to examine the possibility that the proton transfers can take place through molecular wires (including active-site water molecules) via the classical Grotthuss proton-shuttle mechanism. The results support an essential role for water molecules found in the crystal structures of Rubisco complexes as facilitators of proton transport in all the rate-limiting (catalytic) reaction steps through a network of short proton wires within the Rubisco active site. We suggest that completion of the initial product turnover (cycle) requires two excess protons produced in the initial carbamylation that is required for Rubisco activation. By use of proton wires, a large number of reaction steps may be accommodated within a single active site without necessitating the input of excessive conformational strain energy arising from the movement of residue side chains into positions where direct protonation of substrates can occur. The involvement of the identified types of proton wires in the kinetic mechanism is capable of providing a unique explanation for various experimental observations, including deuterium isotope effects and the results of site-directed mutagenesis experiments, and may thus provide a realistic solution to the problem of Rubisco's challenging chemistry.
Affiliation(s)
- Peter L Cummins
- Department of Genome Sciences, John Curtin School of Medical Research, The Australian National University, Canberra, ACT 0200, Australia
- Jill E Gready
- Department of Genome Sciences, John Curtin School of Medical Research, The Australian National University, Canberra, ACT 0200, Australia
|
29
|
Prakashchand DD, Ahalawat N, Bandyopadhyay S, Sengupta S, Mondal J. Nonaffine Displacements Encode Collective Conformational Fluctuations in Proteins. J Chem Theory Comput 2020; 16:2508-2516. [DOI: 10.1021/acs.jctc.9b01100] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]
Affiliation(s)
- Dube Dheeraj Prakashchand
- Tata Institute of Fundamental Research, Center for Interdisciplinary Sciences, Hyderabad 500107, India
- Navjeet Ahalawat
- Tata Institute of Fundamental Research, Center for Interdisciplinary Sciences, Hyderabad 500107, India
- Department of Molecular Biology, Biotechnology and Bioinformatics, Chaudhary Charan Singh Haryana Agricultural University, Hisar 125004, India
- Satyabrata Bandyopadhyay
- Tata Institute of Fundamental Research, Center for Interdisciplinary Sciences, Hyderabad 500107, India
- Surajit Sengupta
- Tata Institute of Fundamental Research, Center for Interdisciplinary Sciences, Hyderabad 500107, India
- Jagannath Mondal
- Tata Institute of Fundamental Research, Center for Interdisciplinary Sciences, Hyderabad 500107, India
|
30
|
Foglia NO, González Lebrero MC, Biekofsky RR, Estrin DA. Reaction Path Analysis from Potential Energy Contributions Using Forces: An Accessible Estimator of Reaction Coordinate Adequacy. J Chem Theory Comput 2020; 16:1618-1629. [PMID: 31999449 DOI: 10.1021/acs.jctc.9b01081] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
The calculation of potential energy and free-energy profiles along complex chemical reactions or rare event processes is of great interest because of their importance for many areas in chemistry, molecular biology, and material science. One typical way to generate these profiles is to add a bias potential to modify the energy surface, which can act on a selected degree of freedom in the system. However, in these cases, the quality of the result is strongly dependent on the selection of the degree of freedom over which this bias potential acts. The present work introduces a simple method for the analysis of the degree of freedom selected to describe a chemical process. The proposed methodology is based on the decomposition of contributions to the potential energy profiles by the integration of forces along a reaction path, which allows evaluating the different contributions to the energy change. This helps discriminate the contributions to the energy arising from different regions of the system, which is particularly useful in systems with complex environments that must be represented using hybrid quantum mechanics/molecular mechanics schemes. Furthermore, this methodology enables a quick and simple analysis of the degree of freedom used to describe the potential energy profile associated with the reactive process. This is computationally more accessible than the corresponding free-energy profile and can therefore be used as a simple estimator of reaction coordinate adequacy.
Affiliation(s)
- Nicolás O Foglia
- Departamento de Química Inorgánica, Analítica y Química Física/INQUIMAE-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Pab. II, Buenos Aires C1428EHA, Argentina
- Mariano C González Lebrero
- Departamento de Química Inorgánica, Analítica y Química Física/INQUIMAE-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Pab. II, Buenos Aires C1428EHA, Argentina
- Rodolfo R Biekofsky
- Moebius Research Ltd., Systems Biomedicine, 24 Chedworth House, West Green Rd, N15 5EH London, U.K.
- Darío A Estrin
- Departamento de Química Inorgánica, Analítica y Química Física/INQUIMAE-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Pab. II, Buenos Aires C1428EHA, Argentina
|
31
|
Copperman J, Aristoff D, Makarov DE, Simpson G, Zuckerman DM. Transient probability currents provide upper and lower bounds on non-equilibrium steady-state currents in the Smoluchowski picture. J Chem Phys 2019; 151:174108. [PMID: 31703496 PMCID: PMC7043855 DOI: 10.1063/1.5120511] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/18/2019] [Accepted: 10/14/2019] [Indexed: 01/04/2023] Open
Abstract
Probability currents are fundamental in characterizing the kinetics of nonequilibrium processes. Notably, the steady-state current Jss for a source-sink system can provide the exact mean-first-passage time (MFPT) for the transition from the source to sink. Because transient nonequilibrium behavior is quantified in some modern path sampling approaches, such as the "weighted ensemble" strategy, there is strong motivation to determine bounds on Jss, and hence on the MFPT, as the system evolves in time. Here, we show that Jss is bounded from above and below by the maximum and minimum, respectively, of the current as a function of the spatial coordinate at any time t for one-dimensional systems undergoing overdamped Langevin (i.e., Smoluchowski) dynamics and for higher-dimensional Smoluchowski systems satisfying certain assumptions when projected onto a single dimension. These bounds become tighter with time, making them of potential practical utility in a scheme for estimating Jss and the long time scale kinetics of complex systems. Conceptually, the bounds result from the fact that extrema of the transient currents relax toward the steady-state current.
Affiliation(s)
- Jeremy Copperman
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon 97239, USA
- David Aristoff
- Department of Mathematics, Colorado State University, Fort Collins, Colorado 80523, USA
- Dmitrii E Makarov
- Department of Chemistry and Oden Institute for Computational Engineering and Sciences, University of Texas, Austin, Texas 78712, USA
- Gideon Simpson
- Department of Mathematics, Drexel University, Philadelphia, Pennsylvania 19104, USA
- Daniel M Zuckerman
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon 97239, USA
|
32
|
Husic BE, Noé F. Deflation reveals dynamical structure in nondominant reaction coordinates. J Chem Phys 2019. [DOI: 10.1063/1.5099194] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/15/2022] Open
Affiliation(s)
- Brooke E. Husic
- Department of Mathematics and Computer Science, Freie Universität, 14195 Berlin, Germany
- Department of Chemistry, Stanford University, Stanford, California 94305, USA
- Frank Noé
- Department of Mathematics and Computer Science, Freie Universität, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, USA
|
33
|
Sengupta U, Carballo-Pacheco M, Strodel B. Automated Markov state models for molecular dynamics simulations of aggregation and self-assembly. J Chem Phys 2019; 150:115101. [DOI: 10.1063/1.5083915] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Indexed: 12/31/2022] Open
Affiliation(s)
- Ushnish Sengupta
- Institute of Complex Systems: Structural Biochemistry (ICS-6), Forschungszentrum Jülich, 52425 Jülich, Germany
- Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
- Martín Carballo-Pacheco
- Institute of Complex Systems: Structural Biochemistry (ICS-6), Forschungszentrum Jülich, 52425 Jülich, Germany
- AICES Graduate School, RWTH Aachen University, Schinkelstraße 2, 52062 Aachen, Germany
- School of Physics and Astronomy, University of Edinburgh, Peter Guthrie Tait Road, Edinburgh EH9 3FD, United Kingdom
- Birgit Strodel
- Institute of Complex Systems: Structural Biochemistry (ICS-6), Forschungszentrum Jülich, 52425 Jülich, Germany
- Institute of Theoretical and Computational Chemistry, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
|
34
|
Zhang YY, Niu H, Piccini G, Mendels D, Parrinello M. Improving collective variables: The case of crystallization. J Chem Phys 2019; 150:094509. [PMID: 30849916 DOI: 10.1063/1.5081040] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 12/27/2022] Open
Abstract
Several enhanced sampling methods, such as umbrella sampling or metadynamics, rely on the identification of an appropriate set of collective variables. Recently two methods have been proposed to alleviate the task of determining efficient collective variables. One is based on linear discriminant analysis; the other is based on a variational approach to conformational dynamics and uses time-lagged independent component analysis. In this paper, we compare the performance of these two approaches in the study of the homogeneous crystallization of two simple metals. We focus on Na and Al and search for the most efficient collective variables that can be expressed as a linear combination of X-ray diffraction peak intensities. We find that the performances of the two methods are very similar. Wherever the different metastable states are well-separated, the method based on linear discriminant analysis, in its harmonic version, is to be preferred because it is simpler to implement and less computationally demanding. The variational approach, however, has the potential to discover the existence of different metastable states.
Affiliation(s)
- Yue-Yu Zhang
- Department of Chemistry and Applied Biosciences, ETH Zurich, c/o USI Campus, Via Giuseppe Buffi 13, CH-6900 Lugano, Ticino, Switzerland
- Haiyang Niu
- Department of Chemistry and Applied Biosciences, ETH Zurich, c/o USI Campus, Via Giuseppe Buffi 13, CH-6900 Lugano, Ticino, Switzerland
- GiovanniMaria Piccini
- Department of Chemistry and Applied Biosciences, ETH Zurich, c/o USI Campus, Via Giuseppe Buffi 13, CH-6900 Lugano, Ticino, Switzerland
- Dan Mendels
- Department of Chemistry and Applied Biosciences, ETH Zurich, c/o USI Campus, Via Giuseppe Buffi 13, CH-6900 Lugano, Ticino, Switzerland
- Michele Parrinello
- Department of Chemistry and Applied Biosciences, ETH Zurich, c/o USI Campus, Via Giuseppe Buffi 13, CH-6900 Lugano, Ticino, Switzerland
|
35
|
Schöberl M, Zabaras N, Koutsourelakis PS. Predictive collective variable discovery with deep Bayesian models. J Chem Phys 2019; 150:024109. [DOI: 10.1063/1.5058063] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 11/14/2022] Open
Affiliation(s)
- Markus Schöberl
- Center for Informatics and Computational Science, University of Notre Dame, 311 Cushing Hall, Notre Dame, Indiana 46556, USA
- Continuum Mechanics Group, Technical University of Munich, Boltzmannstraße 15, 85748 Garching, Germany
- Nicholas Zabaras
- Center for Informatics and Computational Science, University of Notre Dame, 311 Cushing Hall, Notre Dame, Indiana 46556, USA
|
36
|
Ahalawat N, Mondal J. Assessment and optimization of collective variables for protein conformational landscape: GB1 β-hairpin as a case study. J Chem Phys 2018; 149:094101. [PMID: 30195312 DOI: 10.1063/1.5041073] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 11/14/2022] Open
Abstract
Collective variables (CVs), when chosen judiciously, can play an important role in recognizing rate-limiting processes and rare events in any biomolecular system. However, high dimensionality and inherent complexities associated with such biochemical systems render the identification of an optimal CV a challenging task, which in turn precludes the elucidation of an underlying conformational landscape in sufficient detail. In this context, a relevant model system is presented by a 16-residue β-hairpin of GB1 protein. Despite being the target of numerous theoretical and computational studies for understanding protein folding, the set of CVs optimally characterizing the conformational landscape of the β-hairpin of GB1 protein has remained elusive, resulting in a lack of consensus on its folding mechanism. Here we address this by proposing a pair of optimal CVs which can resolve the underlying free energy landscape of the GB1 hairpin quite efficiently. Expressed as a linear combination of a number of traditional CVs, the optimal CV for this system is derived by employing the recently introduced time-structured independent component analysis approach on a large number of independent unbiased simulations. By projecting the replica-exchange simulated trajectories along this pair of optimized CVs, the resulting free energy landscape of this system is able to resolve four distinct well-separated metastable states encompassing the extensive ensembles of folded, unfolded, and molten globule states. Importantly, the optimized CVs were found to be capable of automatically recovering a novel partial helical state of this protein, without needing to explicitly invoke helicity as a constituent CV. Furthermore, a quantitative sensitivity analysis of each constituent in the optimized CV provided key insights on the relative contributions of the constituent CVs in the overall free energy landscapes. Finally, the kinetic pathways connecting these metastable states, constructed using a Markov state model, provide an optimum description of the underlying folding mechanism of the peptide. Taken together, this work offers a quantitatively robust approach toward comprehensive mapping of the underlying folding landscape of a quintessential model system along its optimized CV.
Affiliation(s)
- Navjeet Ahalawat
- Tata Institute of Fundamental Research, Center for Interdisciplinary Sciences, Hyderabad 500107, India
- Jagannath Mondal
- Tata Institute of Fundamental Research, Center for Interdisciplinary Sciences, Hyderabad 500107, India
|
37
|
Kells A, Annibale A, Rosta E. Limiting relaxation times from Markov state models. J Chem Phys 2018; 149:072324. [DOI: 10.1063/1.5027203] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022] Open
Affiliation(s)
- Adam Kells
- Department of Chemistry, King’s College London, SE1 1DB London, United Kingdom
- Alessia Annibale
- Department of Mathematics, King’s College London, WC2R 2LS London, United Kingdom
- Edina Rosta
- Department of Chemistry, King’s College London, SE1 1DB London, United Kingdom
|
38
|
Lindahl V, Lidmar J, Hess B. Riemann metric approach to optimal sampling of multidimensional free-energy landscapes. Phys Rev E 2018; 98:023312. [PMID: 30253489 DOI: 10.1103/physreve.98.023312] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/06/2018] [Indexed: 06/08/2023]
Abstract
Exploring the free-energy landscape along reaction coordinates or system parameters λ is central to many studies of high-dimensional model systems in physics, e.g., large molecules or spin glasses. In simulations this usually requires sampling conformational transitions or phase transitions, but efficient sampling is often difficult to attain due to the roughness of the energy landscape. For Boltzmann distributions, crossing rates decrease exponentially with free-energy barrier heights. Thus, exponential acceleration can be achieved in simulations by applying an artificial bias along λ tuned such that a flat target distribution is obtained. A flat distribution is, however, an ambiguous concept unless a proper metric is used and is generally suboptimal. Here we propose a multidimensional Riemann metric, which takes the local diffusion into account, and redefine uniform sampling such that it is invariant under nonlinear coordinate transformations. We use the metric in combination with the accelerated weight histogram method, a free-energy calculation and sampling method, to adaptively optimize sampling toward the target distribution prescribed by the metric. We demonstrate that for complex problems, such as molecular dynamics simulations of DNA base-pair opening, sampling uniformly according to the metric, which can be calculated without significant computational overhead, improves sampling efficiency by 50%-70%.
Affiliation(s)
- Viveca Lindahl
- Department of Physics and Swedish e-Science Research Center, KTH Royal Institute of Technology, 10691 Stockholm, Sweden
- Jack Lidmar
- Department of Physics and Swedish e-Science Research Center, KTH Royal Institute of Technology, 10691 Stockholm, Sweden
- Berk Hess
- Department of Physics and Swedish e-Science Research Center, KTH Royal Institute of Technology, 10691 Stockholm, Sweden
|
39
|
Hernández CX, Wayment-Steele HK, Sultan MM, Husic BE, Pande VS. Variational encoding of complex dynamics. Phys Rev E 2018; 97:062412. [PMID: 30011547 DOI: 10.1103/physreve.97.062412] [Citation(s) in RCA: 84] [Impact Index Per Article: 14.0] [Received: 12/01/2017] [Indexed: 11/07/2022]
Abstract
Often the analysis of time-dependent chemical and biophysical systems produces high-dimensional time-series data for which it can be difficult to interpret which individual features are most salient. While recent work from our group and others has demonstrated the utility of time-lagged covariate models to study such systems, linearity assumptions can limit the compression of inherently nonlinear dynamics into just a few characteristic components. Recent work in the field of deep learning has led to the development of the variational autoencoder (VAE), which is able to compress complex datasets into simpler manifolds. We present the use of a time-lagged VAE, or variational dynamics encoder (VDE), to reduce complex, nonlinear processes to a single embedding with high fidelity to the underlying dynamics. We demonstrate how the VDE is able to capture nontrivial dynamics in a variety of examples, including Brownian dynamics and atomistic protein folding. Additionally, we demonstrate a method for analyzing the VDE model, inspired by saliency mapping, to determine what features are selected by the VDE model to describe dynamics. The VDE presents an important step in applying techniques from deep learning to more accurately model and interpret complex biophysics.
Affiliation(s)
- Mohammad M Sultan
- Chemistry Department, Stanford University, Stanford, California, USA
- Brooke E Husic
- Chemistry Department, Stanford University, Stanford, California, USA
- Vijay S Pande
- Biophysics Program, Stanford University, Stanford, California, USA; Chemistry Department, Stanford University, Stanford, California, USA
|
40
|
Ahn SH, Grate JW, Darve EF. Efficiently sampling conformations and pathways using the concurrent adaptive sampling (CAS) algorithm. J Chem Phys 2018; 147:074115. [PMID: 28830168 DOI: 10.1063/1.4999097] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 01/06/2023] Open
Abstract
Molecular dynamics simulations are useful in obtaining thermodynamic and kinetic properties of bio-molecules, but they are limited by the time scale barrier. That is, we may not obtain properties efficiently because we need to run simulations of microseconds or longer using femtosecond time steps. To overcome this time scale barrier, we can use the weighted ensemble (WE) method, a powerful enhanced sampling method that efficiently samples thermodynamic and kinetic properties. However, the WE method requires an appropriate partitioning of phase space into discrete macrostates, which can be problematic when we have a high-dimensional collective space or when little is known a priori about the molecular system. Hence, we developed a new WE-based method, called the "Concurrent Adaptive Sampling (CAS) algorithm," to tackle these issues. The CAS algorithm is not constrained to use only one or two collective variables, unlike most reaction coordinate-dependent methods. Instead, it can use a large number of collective variables and adaptive macrostates to enhance the sampling in the high-dimensional space. This is especially useful for systems in which we do not know what the right reaction coordinates are, in which case we can use many collective variables to sample conformations and pathways. In addition, a clustering technique based on the committor function is used to accelerate sampling the slowest process in the molecular system. In this paper, we introduce the new method and show results from two-dimensional models and bio-molecules, specifically penta-alanine and a triazine trimer.
Affiliation(s)
- Surl-Hee Ahn
- Chemistry Department, Stanford University, Stanford, California 94305, USA
- Jay W Grate
- Pacific Northwest National Laboratory, Richland, Washington 99352, USA
- Eric F Darve
- Mechanical Engineering Department, Stanford University, Stanford, California 94305, USA
|
41
|
Chattopadhyay A, Zheng M, Waller MP, Priyakumar UD. A Probabilistic Framework for Constructing Temporal Relations in Replica Exchange Molecular Trajectories. J Chem Theory Comput 2018; 14:3365-3380. [PMID: 29791153 DOI: 10.1021/acs.jctc.7b01245] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
Knowledge of the structure and dynamics of biomolecules is essential for elucidating the underlying mechanisms of biological processes. Given the stochastic nature of many biological processes, like protein unfolding, it is almost impossible that two independent simulations will generate the exact same sequence of events, which makes direct analysis of simulations difficult. Statistical models like Markov chains, transition networks, etc. help in shedding some light on the mechanistic nature of such processes by predicting long-time dynamics of these systems from short simulations. However, such methods fall short in analyzing trajectories with partial or no temporal information, for example, replica exchange molecular dynamics or Monte Carlo simulations. In this work, we propose a probabilistic algorithm, borrowing concepts from graph theory and machine learning, to extract reactive pathways from molecular trajectories in the absence of temporal data. A suitable vector representation was chosen to represent each frame in the macromolecular trajectory (as a series of interaction and conformational energies), and dimensionality reduction was performed using principal component analysis (PCA). The trajectory was then clustered using a density-based clustering algorithm, where each cluster represents a metastable state on the potential energy surface (PES) of the biomolecule under study. A graph was created with these clusters as nodes with the edges learned using an iterative expectation maximization algorithm. The most reactive path is conceived as the widest path along this graph. We have tested our method on an RNA hairpin unfolding trajectory in aqueous urea solution. Our method makes the understanding of the mechanism of unfolding in the RNA hairpin molecule more tractable. As this method does not rely on temporal data, it can be used to analyze trajectories from Monte Carlo sampling techniques and replica exchange molecular dynamics (REMD).
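Illustrative note: the abstract above identifies the most reactive path with the widest path on a graph of metastable states. The sketch below is not the authors' implementation; it is a generic widest-path (maximum-bottleneck) search using a Dijkstra-style priority queue. The function name `widest_path`, the node labels, and the edge weights are hypothetical, and the expectation-maximization step that learns the edges in the paper is not reproduced.

```python
import heapq

def widest_path(graph, source, target):
    """Return (path, width), where width is the largest achievable
    minimum edge weight over all paths from source to target.

    graph: dict mapping node -> dict of neighbor -> positive weight
    (hypothetical stand-in for learned edge strengths).
    """
    best = {source: float("inf")}   # best bottleneck width found per node
    prev = {}                       # predecessor on the best path
    heap = [(-float("inf"), source)]  # max-heap via negated widths
    done = set()
    while heap:
        neg_width, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            width = min(-neg_width, w)  # bottleneck if we extend through u
            if width > best.get(v, 0.0):
                best[v] = width
                prev[v] = u
                heapq.heappush(heap, (-width, v))
    if target not in best:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[target]

# Toy directed graph of four hypothetical metastable states.
graph = {
    "A": {"B": 0.6, "C": 0.2},
    "B": {"C": 0.4, "D": 0.5},
    "C": {"D": 0.9},
}
path, width = widest_path(graph, "A", "D")
print(path, width)
```

In this toy graph the route through B has the largest bottleneck, even though the final edge from C to D is individually the strongest.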
Affiliation(s)
- Aditya Chattopadhyay
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- Min Zheng
- Centre for Multiscale Theory and Computation, Westfälische Wilhelms-Universität Münster, Münster, Germany
- Mark P Waller
- Department of Physics and International Centre for Quantum and Molecular Structures, Shanghai University, Shanghai, 200444, People's Republic of China
- U Deva Priyakumar
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
|
42
|
Brandt S, Sittel F, Ernst M, Stock G. Machine Learning of Biomolecular Reaction Coordinates. J Phys Chem Lett 2018; 9:2144-2150. [PMID: 29630378 DOI: 10.1021/acs.jpclett.8b00759] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Indexed: 06/08/2023]
Abstract
We present a systematic approach to reduce the dimensionality of a complex molecular system. Starting with a data set of molecular coordinates (obtained from experiment or simulation) and an associated set of metastable conformational states (obtained from clustering the data), a supervised machine learning model is trained to assign unknown molecular structures to the set of metastable states. In this way, the model learns to determine the features of the molecular coordinates that are most important to discriminate the states. Using a new algorithm that exploits this feature importance via an iterative exclusion principle, we identify the essential internal coordinates (such as specific interatomic distances or dihedral angles) of the system, which are shown to represent versatile reaction coordinates that account for the dynamics of the slow degrees of freedom and explain the mechanism of the underlying processes. Moreover, these coordinates give rise to a free energy landscape that may reveal previously hidden intermediate states of the system.
Affiliation(s)
- Simon Brandt
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
- Florian Sittel
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
- Matthias Ernst
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
- Gerhard Stock
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
|
43
|
Heuer MA, Vaucher AC, Haag MP, Reiher M. Integrated Reaction Path Processing from Sampled Structure Sequences. J Chem Theory Comput 2018. [PMID: 29518323 DOI: 10.1021/acs.jctc.8b00019] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
Sampled structure sequences obtained, for instance, from real-time reactivity explorations or first-principles molecular dynamics simulations contain valuable information about chemical reactivity. Eventually, such sequences allow for the construction of reaction networks that are required for the kinetic analysis of chemical systems. For this purpose, however, the sampled information must be processed to obtain stable chemical structures and associated transition states. The manual extraction of valuable information from such reaction paths is straightforward but unfeasible for large and complex reaction networks. For real-time quantum chemistry, this implies automatization of the extraction and relaxation process while maintaining immersion in the virtual chemical environment. Here, we describe an efficient path processing scheme for the on-the-fly construction of an exploration network by approximating the explored paths as continuous basis-spline curves.
Affiliation(s)
- Michael A Heuer
- ETH Zürich, Laboratorium für Physikalische Chemie, Vladimir-Prelog-Weg 2, CH-8093 Zürich, Switzerland
- Alain C Vaucher
- ETH Zürich, Laboratorium für Physikalische Chemie, Vladimir-Prelog-Weg 2, CH-8093 Zürich, Switzerland
- Moritz P Haag
- ETH Zürich, Laboratorium für Physikalische Chemie, Vladimir-Prelog-Weg 2, CH-8093 Zürich, Switzerland
- Markus Reiher
- ETH Zürich, Laboratorium für Physikalische Chemie, Vladimir-Prelog-Weg 2, CH-8093 Zürich, Switzerland
|
44
|
Sultan MM, Wayment-Steele HK, Pande VS. Transferable Neural Networks for Enhanced Sampling of Protein Dynamics. J Chem Theory Comput 2018. [DOI: 10.1021/acs.jctc.8b00025] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Indexed: 01/08/2023]
|
45
|
Affiliation(s)
- Brooke E. Husic
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
| | - Vijay S. Pande
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
| |
|
46
|
Husic BE, McKiernan KA, Wayment-Steele HK, Sultan MM, Pande VS. A Minimum Variance Clustering Approach Produces Robust and Interpretable Coarse-Grained Models. J Chem Theory Comput 2018; 14:1071-1082. [PMID: 29253336 DOI: 10.1021/acs.jctc.7b01004] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
Abstract
Markov state models (MSMs) are a powerful framework for the analysis of molecular dynamics data sets, such as protein folding simulations, because of their straightforward construction and statistical rigor. The coarse-graining of MSMs into an interpretable number of macrostates is a crucial step for connecting theoretical results with experimental observables. Here we present the minimum variance clustering approach (MVCA) for the coarse-graining of MSMs into macrostate models. The method utilizes agglomerative clustering with Ward's minimum variance objective function, and the similarity of the microstate dynamics is determined using the Jensen-Shannon divergence between the corresponding rows in the MSM transition probability matrix. We first show that MVCA produces intuitive results for a simple tripeptide system and is robust toward long-duration statistical artifacts. MVCA is then applied to two protein folding simulations of the same protein in different force fields to demonstrate that a different number of macrostates is appropriate for each model, revealing a misfolded state present in only one of the simulations. Finally, we show that the same method can be used to analyze a data set containing many MSMs from simulations in different force fields by aggregating them into groups and quantifying their dynamical similarity in the context of force field parameter choices. The minimum variance clustering approach with the Jensen-Shannon divergence provides a powerful tool to group dynamics by similarity, both among model states and among dynamical models themselves.
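The two ingredients named in this abstract, the Jensen-Shannon divergence between rows of a transition matrix and agglomerative merging under Ward's criterion, can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the divergence is treated as a squared distance in the Lance-Williams form of Ward's update, and the function names and the 4-state toy matrix are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) of two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i = 0 contribute nothing to the sum.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def ward_cluster(T, n_macro):
    """Merge microstates (rows of T) until n_macro clusters remain.

    Assumption: the JS divergence between rows plays the role of a
    squared distance in the Lance-Williams update for Ward's method.
    """
    n = len(T)
    clusters = {i: [i] for i in range(n)}
    d = {(i, j): js_divergence(T[i], T[j])
         for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > n_macro:
        (a, b), d_ab = min(d.items(), key=lambda kv: kv[1])
        na, nb = len(clusters[a]), len(clusters[b])
        updated = {}
        for c in clusters:
            if c in (a, b):
                continue
            nc = len(clusters[c])
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            # Lance-Williams recurrence for Ward's minimum variance objective.
            updated[(c, next_id)] = (
                (na + nc) * d_ac + (nb + nc) * d_bc - nc * d_ab
            ) / (na + nb + nc)
        merged = clusters.pop(a) + clusters.pop(b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(updated)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())


# Toy row-stochastic matrix: states 0-1 and states 2-3 behave alike.
T = [
    [0.90, 0.08, 0.01, 0.01],
    [0.85, 0.13, 0.01, 0.01],
    [0.01, 0.01, 0.88, 0.10],
    [0.01, 0.01, 0.80, 0.18],
]
macrostates = ward_cluster(T, n_macro=2)
```

Microstates whose outgoing transition probabilities are similar end up in the same macrostate, which is the sense in which the abstract speaks of grouping dynamics by similarity.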
Affiliation(s)
- Brooke E Husic
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
| | - Keri A McKiernan
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
| | | | - Mohammad M Sultan
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
| | - Vijay S Pande
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
| |
|
47
|
Feng J, Shukla D. Characterizing Conformational Dynamics of Proteins Using Evolutionary Couplings. J Phys Chem B 2018; 122:1017-1025. [DOI: 10.1021/acs.jpcb.7b07529] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/28/2022]
Affiliation(s)
- Jiangyan Feng
- Department of Chemical and Biomolecular Engineering, ‡Center for Biophysics and Quantitative Biology, §Department of Plant Biology, and ∥National Center for Supercomputing Applications, University of Illinois, Urbana, Illinois 61801, United States
| | - Diwakar Shukla
- Department of Chemical and Biomolecular Engineering, ‡Center for Biophysics and Quantitative Biology, §Department of Plant Biology, and ∥National Center for Supercomputing Applications, University of Illinois, Urbana, Illinois 61801, United States
| |
|
48
|
Millisecond dynamics of BTK reveal kinome-wide conformational plasticity within the apo kinase domain. Sci Rep 2017; 7:15604. [PMID: 29142210 PMCID: PMC5688120 DOI: 10.1038/s41598-017-10697-0] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 05/17/2017] [Accepted: 08/14/2017] [Indexed: 12/20/2022] Open
Abstract
Bruton tyrosine kinase (BTK) is a key enzyme in B-cell development whose improper regulation causes severe immunodeficiency diseases. Design of selective BTK therapeutics would benefit from improved, in-silico structural modeling of the kinase’s solution ensemble. However, this remains challenging due to the immense computational cost of sampling events on biological timescales. In this work, we combine multi-millisecond molecular dynamics (MD) simulations with Markov state models (MSMs) to report on the thermodynamics, kinetics, and accessible states of BTK’s kinase domain. Our conformational landscape links the active state to several inactive states, connected via a structurally diverse intermediate. Our calculations predict a kinome-wide conformational plasticity, and indicate the presence of several new potentially druggable BTK states. We further find that the population of these states and the kinetics of their inter-conversion are modulated by protonation of an aspartate residue, establishing the power of MD & MSMs in predicting effects of chemical perturbations.
|
49
|
Bittracher A, Koltai P, Klus S, Banisch R, Dellnitz M, Schütte C. Transition Manifolds of Complex Metastable Systems: Theory and Data-Driven Computation of Effective Dynamics. J Nonlinear Sci 2017; 28:471-512. [PMID: 29527099 PMCID: PMC5835149 DOI: 10.1007/s00332-017-9415-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 04/28/2017] [Accepted: 09/15/2017] [Indexed: 06/01/2023]
Abstract
We consider complex dynamical systems showing metastable behavior, but no local separation of fast and slow time scales. The article raises the question of whether such systems exhibit a low-dimensional manifold supporting its effective dynamics. For answering this question, we aim at finding nonlinear coordinates, called reaction coordinates, such that the projection of the dynamics onto these coordinates preserves the dominant time scales of the dynamics. We show that, based on a specific reducibility property, the existence of good low-dimensional reaction coordinates preserving the dominant time scales is guaranteed. Based on this theoretical framework, we develop and test a novel numerical approach for computing good reaction coordinates. The proposed algorithmic approach is fully local and thus not prone to the curse of dimension with respect to the state space of the dynamics. Hence, it is a promising method for data-based model reduction of complex dynamical systems such as molecular dynamics.
Affiliation(s)
- Andreas Bittracher
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
| | - Péter Koltai
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
| | - Stefan Klus
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
| | - Ralf Banisch
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
| | - Michael Dellnitz
- Department of Mathematics, Paderborn University, Paderborn, Germany
| | - Christof Schütte
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Zuse Institute Berlin, Berlin, Germany
| |
|
50
|
McKiernan KA, Husic BE, Pande VS. Modeling the mechanism of CLN025 beta-hairpin formation. J Chem Phys 2017; 147:104107. [PMID: 28915754 PMCID: PMC5597441 DOI: 10.1063/1.4993207] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 06/27/2017] [Accepted: 08/24/2017] [Indexed: 01/26/2023] Open
Abstract
Beta-hairpins are substructures found in proteins that can lend insight into more complex systems. Furthermore, the folding of beta-hairpins is a valuable test case for benchmarking experimental and theoretical methods. Here, we simulate the folding of CLN025, a miniprotein with a beta-hairpin structure, at its experimental melting temperature using a range of state-of-the-art protein force fields. We construct Markov state models in order to examine the thermodynamics, kinetics, mechanism, and rate-determining step of folding. Mechanistically, we find the folding process is rate-limited by the formation of the turn region hydrogen bonds, which occurs following the downhill hydrophobic collapse of the extended denatured protein. These results are presented in the context of established and contradictory theories of the beta-hairpin folding process. Furthermore, our analysis suggests that the AMBER-FB15 force field, at this temperature, best describes the characteristics of the full experimental CLN025 conformational ensemble, while the AMBER ff99SB-ILDN and CHARMM22* force fields display a tendency to overstabilize the native state.
Affiliation(s)
- Keri A McKiernan
- Department of Chemistry, Stanford University, Stanford, California 94305, USA
| | - Brooke E Husic
- Department of Chemistry, Stanford University, Stanford, California 94305, USA
| | - Vijay S Pande
- Department of Chemistry, Stanford University, Stanford, California 94305, USA
| |
|