1
Domingues TS, Coifman R, Haji-Akbari A. Estimating Position-Dependent and Anisotropic Diffusivity Tensors from Molecular Dynamics Trajectories: Existing Methods and Future Outlook. J Chem Theory Comput 2024; 20:4427-4455. [PMID: 38815171 DOI: 10.1021/acs.jctc.4c00148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Confinement can substantially alter the physicochemical properties of materials by breaking translational isotropy and rendering all physical properties position-dependent. Molecular dynamics (MD) simulations have proven instrumental in characterizing such spatial heterogeneities and probing the impact of confinement on materials' properties. For static properties, this is a straightforward task and can be achieved via simple spatial binning. Such an approach, however, cannot be readily applied to transport coefficients due to lack of natural extensions of autocorrelations used for their calculation in the bulk. The prime example of this challenge is diffusivity, which, in the bulk, can be readily estimated from the particles' mobility statistics, which satisfy the Fokker-Planck equation. Under confinement, however, such statistics will follow the Smoluchowski equation, which lacks a closed-form analytical solution. This brief review explores the rich history of estimating profiles of the diffusivity tensor from MD simulations and discusses various approximate methods and algorithms developed for this purpose. Besides discussing heuristic extensions of bulk methods, we overview more rigorous algorithms, including kernel-based methods, Bayesian approaches, and operator discretization techniques. Additionally, we outline methods based on applying biasing potentials or imposing constraints on tracer particles. Finally, we discuss approaches that estimate diffusivity from mean first passage time or committor probability profiles, a conceptual framework originally developed in the context of collective variable spaces describing rare events in computational chemistry and biology. In summary, this paper offers a concise survey of diverse approaches for estimating diffusivity from MD trajectories, highlighting challenges and opportunities in this area.
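As a point of reference for the bulk case mentioned in the abstract, the sketch below estimates an isotropic diffusivity from the mean squared displacement using the Einstein relation, MSD(t) = 2 d D t. It is only an illustrative baseline and not one of the position-dependent estimators the review surveys; the function name `msd_diffusivity`, the synthetic Brownian trajectory, and all parameter values are assumptions made for this example.

```python
import numpy as np

def msd_diffusivity(traj, dt, max_lag):
    """Bulk diffusivity from the MSD slope (Einstein relation).

    traj has shape (n_steps, n_particles, dim) and holds unwrapped positions.
    """
    dim = traj.shape[-1]
    lags = np.arange(1, max_lag + 1)
    # Mean squared displacement averaged over time origins and particles.
    msd = np.array([
        np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=-1))
        for lag in lags
    ])
    times = lags * dt
    # Least-squares slope of a line through the origin: MSD = slope * t.
    slope = np.sum(times * msd) / np.sum(times ** 2)
    return slope / (2 * dim)

# Hypothetical test data: free Brownian motion with a known diffusivity.
rng = np.random.default_rng(0)
D_true, dt = 0.5, 0.01
steps = rng.normal(scale=np.sqrt(2 * D_true * dt), size=(2000, 200, 3))
traj = np.cumsum(steps, axis=0)

D_est = msd_diffusivity(traj, dt, max_lag=20)
print(f"true D = {D_true}, estimated D = {D_est:.3f}")
```

Under confinement this single number is no longer meaningful, which is the difficulty the review addresses.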
Affiliation(s)
- Tiago S Domingues
  - Department of Chemical and Environmental Engineering, Yale University, New Haven, Connecticut 06520, United States
- Ronald Coifman
  - Department of Mathematics, Yale University, New Haven, Connecticut 06520, United States
  - Department of Computer Science, Yale University, New Haven, Connecticut 06520, United States
- Amir Haji-Akbari
  - Department of Chemical and Environmental Engineering, Yale University, New Haven, Connecticut 06520, United States

2
Chatterjee H, Mahapatra AJ, Zacharias M, Sengupta N. Helical reorganization in the context of membrane protein folding: Insights from simulations with bacteriorhodopsin (BR) fragments. Biochimica et Biophysica Acta. Biomembranes 2024; 1866:184333. [PMID: 38740122 DOI: 10.1016/j.bbamem.2024.184333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 04/20/2024] [Accepted: 05/09/2024] [Indexed: 05/16/2024]
Abstract
Membrane protein folding is distinct from folding of soluble proteins. Conformational acquisition in major membrane protein subclasses can be delineated into insertion and folding processes. An exception to the "two stage" folding, later developed to "three stage" folding, is observed within the last two helices in bacteriorhodopsin (BR), a system that serves as a model membrane protein. We employ a reductionist approach to understand the interplay of molecular factors underlying the apparent defiance. Leveraging available solution NMR structures, we construct, sample in silico, and analyze partially (PIn) and fully inserted (FIn) BR membrane states. The membrane lateral C-terminal helix (CH) in PIn is markedly prone to transient structural distortions over microsecond timescales; a disorder prone region (DPR) is thereby identified. While clear transmembrane propensities are not acquired, the distortions induce alterations in local membrane curvature and area per lipid. Importantly, energetic decompositions reveal that overall, the N-terminal helix (NH) is thermodynamically more stable in the PIn. Higher overall stability of the FIn arises from favorable interactions between the NH and the CH. Our results establish the lack of spontaneous transition of the PIn to the FIn, and attribute their partitioning to barriers that exceed those accessible with thermal fluctuations. This work paves the way for further detailed studies aimed at determining the thermo-kinetic roles of the initial five helices, or complementary external factors, in complete helical folding and insertion in BR. We comment that complementing such efforts with the growing field of machine learning assisted energy landscape searches may offer unprecedented insights.
Affiliation(s)
- Hindol Chatterjee
  - Department of Biological Sciences, Indian Institute of Science Education and Research Kolkata, Mohanpur, West Bengal 741246, India
- Anshuman J Mahapatra
  - Department of Biological Sciences, Indian Institute of Science Education and Research Kolkata, Mohanpur, West Bengal 741246, India
- Martin Zacharias
  - Center for Functional Protein Assemblies, TUM School of Natural Sciences Technical University Munich, Ernst-Otto-Fischer-Straße 8, 85748 Garching, Germany
- Neelanjana Sengupta
  - Department of Biological Sciences, Indian Institute of Science Education and Research Kolkata, Mohanpur, West Bengal 741246, India

3
Mehdi S, Smith Z, Herron L, Zou Z, Tiwary P. Enhanced Sampling with Machine Learning. Annu Rev Phys Chem 2024; 75:347-370. [PMID: 38382572 PMCID: PMC11213683 DOI: 10.1146/annurev-physchem-083122-125941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Molecular dynamics (MD) enables the study of physical systems with excellent spatiotemporal resolution but suffers from severe timescale limitations. To address this, enhanced sampling methods have been developed to improve the exploration of configurational space. However, implementing these methods is challenging and requires domain expertise. In recent years, integration of machine learning (ML) techniques into different domains has shown promise, prompting their adoption in enhanced sampling as well. Although ML is often employed in various fields primarily due to its data-driven nature, its integration with enhanced sampling is more natural with many common underlying synergies. This review explores the merging of ML and enhanced MD by presenting different shared viewpoints. It offers a comprehensive overview of this rapidly evolving field, which can be difficult to stay updated on. We highlight successful strategies such as dimensionality reduction, reinforcement learning, and flow-based methods. Finally, we discuss open problems at the exciting ML-enhanced MD interface.
Affiliation(s)
- Shams Mehdi
  - Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, USA
  - Biophysics Program, University of Maryland, College Park, Maryland, USA
- Zachary Smith
  - Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, USA
  - Biophysics Program, University of Maryland, College Park, Maryland, USA
- Lukas Herron
  - Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, USA
  - Biophysics Program, University of Maryland, College Park, Maryland, USA
- Ziyue Zou
  - Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, USA
- Pratyush Tiwary
  - Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, USA
  - Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, USA

4
Wu Y, Cao S, Qiu Y, Huang X. Tutorial on how to build non-Markovian dynamic models from molecular dynamics simulations for studying protein conformational changes. J Chem Phys 2024; 160:121501. [PMID: 38516972 DOI: 10.1063/5.0189429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 02/20/2024] [Indexed: 03/23/2024] Open
Abstract
Protein conformational changes play crucial roles in their biological functions. In recent years, the Markov State Model (MSM) constructed from extensive Molecular Dynamics (MD) simulations has emerged as a powerful tool for modeling complex protein conformational changes. In MSMs, dynamics are modeled as a sequence of Markovian transitions among metastable conformational states at discrete time intervals (called lag time). A major challenge for MSMs is that the lag time must be long enough to allow transitions among states to become memoryless (or Markovian). However, this lag time is constrained by the length of individual MD simulations available to track these transitions. To address this challenge, we have recently developed Generalized Master Equation (GME)-based approaches, encoding non-Markovian dynamics using a time-dependent memory kernel. In this Tutorial, we introduce the theory behind two recently developed GME-based non-Markovian dynamic models: the quasi-Markov State Model (qMSM) and the Integrative Generalized Master Equation (IGME). We subsequently outline the procedures for constructing these models and provide a step-by-step tutorial on applying qMSM and IGME to study two peptide systems: alanine dipeptide and villin headpiece. This Tutorial is available at https://github.com/xuhuihuang/GME_tutorials. The protocols detailed in this Tutorial aim to be accessible for non-experts interested in studying the biomolecular dynamics using these non-Markovian dynamic models.
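To make the role of the lag time concrete, here is a minimal sketch of a plain (Markovian) MSM estimate: transitions separated by the lag are counted, rows are normalized, and implied timescales follow from t_i = -lag / ln(lambda_i). It is not taken from the linked tutorial repository and does not implement qMSM or IGME; the function names and the two-state toy chain are assumptions chosen for illustration.

```python
import numpy as np

def estimate_msm(dtraj, n_states, lag):
    """Row-normalized transition matrix from a discrete trajectory at a given lag."""
    counts = np.zeros((n_states, n_states))
    # Count every observed transition i -> j separated by `lag` steps.
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1)
    # Assumes every state is visited at least once (no empty rows).
    return counts / counts.sum(axis=1, keepdims=True)

def implied_timescales(T, lag):
    """Implied timescales t_i = -lag / ln(lambda_i) of the non-stationary modes."""
    eigvals = np.sort(np.real(np.linalg.eigvals(T)))[::-1]
    slow = eigvals[1:]          # drop the stationary eigenvalue (equal to 1)
    slow = slow[slow > 0]       # the logarithm is undefined for non-positive values
    return -lag / np.log(slow)

# Hypothetical two-state chain used only to exercise the functions.
rng = np.random.default_rng(1)
T_true = np.array([[0.9, 0.1], [0.2, 0.8]])
n_steps = 50_000
u = rng.random(n_steps)
dtraj = np.zeros(n_steps, dtype=int)
for t in range(1, n_steps):
    # Move to state 1 when the uniform draw exceeds the probability of state 0.
    dtraj[t] = int(u[t] > T_true[dtraj[t - 1], 0])

T_est = estimate_msm(dtraj, n_states=2, lag=1)
its = implied_timescales(T_est, lag=1)
print(T_est)
print(its)
```

In practice the implied timescales are recomputed for a range of lag values; the lag at which they level off is the one limited by trajectory length, which is the constraint the GME-based models are meant to relax.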
Affiliation(s)
- Yue Wu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Siqin Cao
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Yunrui Qiu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Xuhui Huang
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
  - Data Science Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA

5
Banerjee P, Monje-Galvan V, Voth GA. Cooperative Membrane Binding of HIV-1 Matrix Proteins. J Phys Chem B 2024; 128:2595-2606. [PMID: 38477117 PMCID: PMC10962350 DOI: 10.1021/acs.jpcb.3c06222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 02/24/2024] [Accepted: 02/27/2024] [Indexed: 03/14/2024]
Abstract
The HIV-1 assembly process begins with a newly synthesized Gag polyprotein being targeted to the inner leaflet of the plasma membrane of the infected cells to form immature viral particles. Gag-membrane interactions are mediated through the myristoylated (Myr) N-terminal matrix (MA) domain of Gag, which eventually multimerizes on the membrane to form trimers and higher order oligomers. The study of the structure and dynamics of peripheral membrane proteins like MA has been challenging for both experimental and computational studies due to the complex transient dynamics of protein-membrane interactions. Although the roles of anionic phospholipids (PIP2, PS) and the Myr group in the membrane targeting and stable membrane binding of MA are now well-established, the cooperative interactions between the MA monomers and MA-membrane remain elusive in the context of viral assembly and release. Our present study focuses on the membrane binding dynamics of a higher order oligomeric structure of MA protein (a dimer of trimers), which has not been explored before. Applying time-lagged independent component analysis (tICA) to our microsecond-long trajectories, we investigate conformational changes of the matrix protein induced by membrane binding. Interestingly, the Myr switch of an MA monomer correlates with the conformational switch of adjacent monomers in the same trimer. Together, our findings suggest complex protein dynamics during the formation of the immature HIV-1 lattice; while MA trimerization facilitates Myr insertion, MA trimer-trimer interactions in the immature lattice can hinder the same.
Affiliation(s)
- Puja Banerjee
  - Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
- Gregory A. Voth
  - Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States

6
Wu H, Noé F. Reaction coordinate flows for model reduction of molecular kinetics. J Chem Phys 2024; 160:044109. [PMID: 38270975 DOI: 10.1063/5.0176078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 12/26/2023] [Indexed: 01/26/2024] Open
Abstract
In this work, we introduce a flow based machine learning approach called reaction coordinate (RC) flow for the discovery of low-dimensional kinetic models of molecular systems. The RC flow utilizes a normalizing flow to design the coordinate transformation and a Brownian dynamics model to approximate the kinetics of RC, where all model parameters can be estimated in a data-driven manner. In contrast to existing model reduction methods for molecular kinetics, RC flow offers a trainable and tractable model of reduced kinetics in continuous time and space due to the invertibility of the normalizing flow. Furthermore, the Brownian dynamics-based reduced kinetic model investigated in this work yields a readily discernible representation of metastable states within the phase space of the molecular system. Numerical experiments demonstrate how effectively the proposed method discovers interpretable and accurate low-dimensional representations of given full-state kinetics from simulations.
Affiliation(s)
- Hao Wu
  - School of Mathematical Sciences, Institute of Natural Sciences and MOE-LSC, Shanghai Jiao Tong University, Shanghai, People's Republic of China
- Frank Noé
  - Department of Mathematics and Computer Science and Department of Physics, Freie Universität Berlin, Berlin, Germany
  - Department of Chemistry, Rice University, Houston, Texas 77005, USA
  - Microsoft Research AI4Science, Berlin, Germany

7
Ishizone T, Matsunaga Y, Fuchigami S, Nakamura K. Representation of Protein Dynamics Disentangled by Time-Structure-Based Prior. J Chem Theory Comput 2024; 20:436-450. [PMID: 38151233 DOI: 10.1021/acs.jctc.3c01025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2023]
Abstract
Representation learning (RL) is a universal technique for deriving low-dimensional disentangled representations from high-dimensional observations, aiding in a multitude of downstream tasks. RL has been extensively applied to various data types, including images and natural language. Here, we analyze molecular dynamics (MD) simulation data of biomolecules in terms of RL. Currently, state-of-the-art RL techniques, mainly motivated by the variational principle, try to capture slow motions in the representation (latent) space. Here, we propose two methods based on an alternative perspective on the disentanglement in the latent space. By disentanglement, we here mean the separation of underlying factors in the simulation data, aiding in detecting physically important coordinates for conformational transitions. The proposed methods introduce a simple prior that imposes temporal constraints in the latent space, serving as a regularization term to facilitate the capture of disentangled representations of dynamics. Comparison with other methods via the analysis of MD simulation trajectories for alanine dipeptide and chignolin validates that the proposed methods construct Markov state models (MSMs) whose implied time scales are comparable to those of the state-of-the-art methods. Using a measure based on total variation, we quantitatively evaluated that the proposed methods successfully disentangle physically important coordinates, aiding the interpretation of folding/unfolding transitions of chignolin. Overall, our methods provide good representations of complex biomolecular dynamics for downstream tasks, allowing for better interpretations of the conformational transitions.
Affiliation(s)
- Tsuyoshi Ishizone
  - Mathematical Sciences Program, Graduate School of Advanced Mathematical Sciences, Meiji University, Nakano 4-21-1, Nakano-ku, Tokyo 164-8525, Japan
- Yasuhiro Matsunaga
  - Graduate School of Science and Engineering, Saitama University, Shimo-Okubo 255, Sakura-ku, Saitama-shi, Saitama 338-8570, Japan
- Sotaro Fuchigami
  - Physical Biochemistry Laboratory, Division of Pharmaceutical Sciences, School of Pharmaceutical Sciences, University of Shizuoka, 52-1 Yada, Suruga-ku, Shizuoka 422-8526, Japan
- Kazuyuki Nakamura
  - Department of Mathematical Sciences Based on Modeling and Analysis, School of Interdisciplinary Mathematical Sciences, Meiji University, Nakano 4-21-1, Nakano-ku, Tokyo 164-8525, Japan

8
Bose S, Lotz SD, Deb I, Shuck M, Lee KSS, Dickson A. How Robust Is the Ligand Binding Transition State? J Am Chem Soc 2023; 145:25318-25331. [PMID: 37943667 PMCID: PMC11059145 DOI: 10.1021/jacs.3c08940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2023]
Abstract
For many drug targets, it has been shown that the kinetics of drug binding (e.g., on rate and off rate) is more predictive of drug efficacy than thermodynamic quantities alone. This motivates the development of predictive computational models that can be used to optimize compounds on the basis of their kinetics. The structural details underpinning these computational models are found not only in the bound state but also in the short-lived ligand binding transition states. Although transition states cannot be directly observed experimentally due to their extremely short lifetimes, recent successes have demonstrated that modeling the ligand binding transition state is possible with the help of enhanced sampling molecular dynamics methods. Previously, we generated unbinding paths for an inhibitor of soluble epoxide hydrolase (sEH) with a residence time of 11 min. Here, we computationally modeled unbinding events with the weighted ensemble method REVO (resampling of ensembles by variation optimization) for five additional inhibitors of sEH with residence times ranging from 14.25 to 31.75 min, with average prediction accuracy within an order of magnitude. The unbinding ensembles are analyzed in detail, focusing on features of the ligand binding transition state ensembles (TSEs). We find that ligands with similar bound poses can show significant differences in their ligand binding TSEs, in terms of their spatial distribution and protein-ligand interactions. However, we also find similarities across the TSEs when examining more general features such as ligand degrees of freedom. Together these findings show significant challenges for rational, kinetics-based drug design.
Affiliation(s)
- Samik Bose
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Samuel D Lotz
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Indrajit Deb
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Megan Shuck
  - Department of Pharmacology and Toxicology, Michigan State University, East Lansing, Michigan 48824, United States
- Kin Sing Stephen Lee
  - Department of Pharmacology and Toxicology, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
  - Institute of Integrative Toxicology, Michigan State University, East Lansing, Michigan 48824, United States
- Alex Dickson
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan 48824, United States

9
Ngo VA, Lin YT, Perez D. Improving Estimation of the Koopman Operator with Kolmogorov-Smirnov Indicator Functions. J Chem Theory Comput 2023; 19:7187-7198. [PMID: 37800673 DOI: 10.1021/acs.jctc.3c00632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/07/2023]
Abstract
It has become common to perform kinetic analysis using approximate Koopman operators that transform high-dimensional timeseries of observables into ranked dynamical modes. The key to the practical success of the approach is the identification of a set of observables that form a good basis on which to expand the slow relaxation modes. Good observables are, however, difficult to identify a priori and suboptimal choices can lead to significant underestimations of characteristic time scales. Leveraging the representation of slow dynamics in terms of Hidden Markov Models (HMM), we propose a simple and computationally efficient clustering procedure to infer surrogate observables that form a good basis for slow modes. We apply the approach to an analytically solvable model system as well as on three protein systems of different complexities. We consistently demonstrate that the inferred indicator functions can significantly improve the estimation of the leading eigenvalues of Koopman operators and correctly identify key states and transition time scales of stochastic systems, even when good observables are not known a priori.
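For readers unfamiliar with the estimator that the abstract builds on, the following sketch fits a finite-dimensional Koopman approximation by least squares on time-lagged observables and ranks the resulting modes by eigenvalue. It does not reproduce the paper's HMM-based clustering or its Kolmogorov-Smirnov indicator functions; the function name `koopman_matrix` and the two-observable autoregressive toy process are assumptions made for illustration.

```python
import numpy as np

def koopman_matrix(X, lag):
    """Least-squares Koopman approximation in the basis given by the columns of X."""
    X = X - X.mean(axis=0)  # remove the mean of each observable
    X0, Xt = X[:-lag], X[lag:]
    # Solve X0 @ K ~= Xt in the least-squares sense.
    K, *_ = np.linalg.lstsq(X0, Xt, rcond=None)
    return K

# Hypothetical data: two independent AR(1) observables with known decay factors.
rng = np.random.default_rng(2)
a = np.array([0.95, 0.5])
n_steps = 100_000
noise = rng.normal(size=(n_steps, 2))
X = np.zeros((n_steps, 2))
for t in range(1, n_steps):
    X[t] = a * X[t - 1] + noise[t]

lag = 1
K = koopman_matrix(X, lag)
# Rank modes from slowest to fastest and convert eigenvalues to time scales.
eigvals = np.sort(np.real(np.linalg.eigvals(K)))[::-1]
timescales = -lag / np.log(eigvals)
print(eigvals, timescales)
```

Here the observables happen to coincide with the slow modes, so the eigenvalues are recovered well; with a poorly chosen basis the leading eigenvalues, and hence the time scales, are underestimated, which is the problem the abstract describes.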
Affiliation(s)
- Van A Ngo
  - Advanced Computing for Life Sciences and Engineering, Computing and Computational Sciences, National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, United States
- Yen Ting Lin
  - Information Sciences Group (CCS-3), Computer, Computational and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Danny Perez
  - Physics and Chemistry of Materials Group (T-1), Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87544, United States

10
Pasarkar AP, Bencomo GM, Olsson S, Dieng AB. Vendi sampling for molecular simulations: Diversity as a force for faster convergence and better exploration. J Chem Phys 2023; 159:144108. [PMID: 37823459 DOI: 10.1063/5.0166172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 09/25/2023] [Indexed: 10/13/2023] Open
Abstract
Molecular dynamics (MD) is the method of choice for understanding the structure, function, and interactions of molecules. However, MD simulations are limited by the strong metastability of many molecules, which traps them in a single conformation basin for an extended amount of time. Enhanced sampling techniques, such as metadynamics and replica exchange, have been developed to overcome this limitation and accelerate the exploration of complex free energy landscapes. In this paper, we propose Vendi Sampling, a replica-based algorithm for increasing the efficiency and efficacy of the exploration of molecular conformation spaces. In Vendi sampling, replicas are simulated in parallel and coupled via a global statistical measure, the Vendi Score, to enhance diversity. Vendi sampling allows for the recovery of unbiased sampling statistics and dramatically improves sampling efficiency. We demonstrate the effectiveness of Vendi sampling in improving molecular dynamics simulations by showing significant improvements in coverage and mixing between metastable states and convergence of free energy estimates for four common benchmarks, including Alanine Dipeptide and Chignolin.
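The abstract does not define the Vendi Score. The sketch below assumes the commonly cited definition, the exponential of the Shannon entropy of the eigenvalues of K/n for a positive semidefinite similarity matrix K with unit diagonal, and shows only how that diversity measure behaves. It does not implement the replica coupling used in Vendi sampling; the function name `vendi_score` and the two toy similarity matrices are assumptions.

```python
import numpy as np

def vendi_score(K):
    """Exponential of the Shannon entropy of the eigenvalues of K / n.

    K is assumed to be a symmetric positive semidefinite similarity matrix
    with ones on the diagonal.
    """
    n = K.shape[0]
    eigvals = np.linalg.eigvalsh(K / n)
    eigvals = eigvals[eigvals > 1e-12]  # treat 0 * log(0) as 0
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))

# Two limiting cases for four replicas (toy inputs, not simulation data).
identical = np.ones((4, 4))  # every replica is the same conformation
distinct = np.eye(4)         # replicas are mutually dissimilar

score_identical = vendi_score(identical)
score_distinct = vendi_score(distinct)
print(score_identical, score_distinct)
```

Under this definition the score ranges from 1 for identical replicas to n for fully dissimilar ones, so it can be read as an effective number of distinct conformations.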
Affiliation(s)
- Amey P Pasarkar
  - Vertaix, Department of Computer Science, Princeton University, 35 Olden Street, Princeton, New Jersey 08544, USA
- Gianluca M Bencomo
  - Department of Computer Science, Princeton University, 35 Olden Street, Princeton, New Jersey 08544, USA
- Simon Olsson
  - Department of Computer Science and Engineering, Chalmers University of Technology, Rännvägen 6, 41258 Gothenburg, Sweden
- Adji Bousso Dieng
  - Vertaix, Department of Computer Science, Princeton University, 35 Olden Street, Princeton, New Jersey 08544, USA

11
Conflitti P, Raniolo S, Limongelli V. Perspectives on Ligand/Protein Binding Kinetics Simulations: Force Fields, Machine Learning, Sampling, and User-Friendliness. J Chem Theory Comput 2023; 19:6047-6061. [PMID: 37656199 PMCID: PMC10536999 DOI: 10.1021/acs.jctc.3c00641] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/14/2023] [Indexed: 09/02/2023]
Abstract
Computational techniques applied to drug discovery have gained considerable popularity for their ability to filter potentially active drugs from inactive ones, reducing the time scale and costs of preclinical investigations. The main focus of these studies has historically been the search for compounds endowed with high affinity for a specific molecular target to ensure the formation of stable and long-lasting complexes. Recent evidence has also correlated the in vivo drug efficacy with its binding kinetics, thus opening new fascinating scenarios for ligand/protein binding kinetic simulations in drug discovery. The present article examines the state of the art in the field, providing a brief summary of the most popular and advanced ligand/protein binding kinetics techniques and evaluating their current limitations and the potential solutions to reach more accurate kinetic models. Particular emphasis is put on the need for a paradigm change in the present methodologies toward ligand and protein parametrization, the force field problem, characterization of the transition states, the sampling issue, and algorithms' performance, user-friendliness, and data openness.
Affiliation(s)
- Paolo Conflitti
  - Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), 6900 Lugano, Switzerland
- Stefano Raniolo
  - Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), 6900 Lugano, Switzerland
- Vittorio Limongelli
  - Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), 6900 Lugano, Switzerland
  - Department of Pharmacy, University of Naples “Federico II”, 80131 Naples, Italy

12
Strahan J, Finkel J, Dinner AR, Weare J. Predicting rare events using neural networks and short-trajectory data. Journal of Computational Physics 2023; 488:112152. [PMID: 37332834 PMCID: PMC10270692 DOI: 10.1016/j.jcp.2023.112152] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/20/2023]
Abstract
Estimating the likelihood, timing, and nature of events is a major goal of modeling stochastic dynamical systems. When the event is rare in comparison with the timescales of simulation and/or measurement needed to resolve the elemental dynamics, accurate prediction from direct observations becomes challenging. In such cases a more effective approach is to cast statistics of interest as solutions to Feynman-Kac equations (partial differential equations). Here, we develop an approach to solve Feynman-Kac equations by training neural networks on short-trajectory data. Our approach is based on a Markov approximation but otherwise avoids assumptions about the underlying model and dynamics. This makes it applicable to treating complex computational models and observational data. We illustrate the advantages of our method using a low-dimensional model that facilitates visualization, and this analysis motivates an adaptive sampling strategy that allows on-the-fly identification of and addition of data to regions important for predicting the statistics of interest. Finally, we demonstrate that we can compute accurate statistics for a 75-dimensional model of sudden stratospheric warming. This system provides a stringent test bed for our method.
Affiliation(s)
- John Strahan
  - Department of Chemistry and James Franck Institute, the University of Chicago, Chicago, IL 60637
- Justin Finkel
  - Department of Earth, Atmospheric, and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- Aaron R. Dinner
  - Department of Chemistry and James Franck Institute, the University of Chicago, Chicago, IL 60637
  - Committee on Computational and Applied Mathematics, the University of Chicago, Chicago, IL 60637
- Jonathan Weare
  - Courant Institute of Mathematical Sciences, New York University, New York, New York 10012

13
Qiu Y, O’Connor MS, Xue M, Liu B, Huang X. An Efficient Path Classification Algorithm Based on Variational Autoencoder to Identify Metastable Path Channels for Complex Conformational Changes. J Chem Theory Comput 2023; 19:4728-4742. [PMID: 37382437 PMCID: PMC11042546 DOI: 10.1021/acs.jctc.3c00318] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/30/2023]
Abstract
Conformational changes (i.e., dynamic transitions between pairs of conformational states) play important roles in many chemical and biological processes. Constructing the Markov state model (MSM) from extensive molecular dynamics (MD) simulations is an effective approach to dissect the mechanism of conformational changes. When combined with transition path theory (TPT), MSM can be applied to elucidate the ensemble of kinetic pathways connecting pairs of conformational states. However, the application of TPT to analyze complex conformational changes often results in a vast number of kinetic pathways with comparable fluxes. This obstacle is particularly pronounced in heterogeneous self-assembly and aggregation processes. The large number of kinetic pathways makes it challenging to comprehend the molecular mechanisms underlying conformational changes of interest. To address this challenge, we have developed a path classification algorithm named latent-space path clustering (LPC) that efficiently lumps parallel kinetic pathways into distinct metastable path channels, making them easier to comprehend. In our algorithm, MD conformations are first projected onto a low-dimensional space containing a small set of collective variables (CVs) by time-structure-based independent component analysis (tICA) with kinetic mapping. Then, MSM and TPT are constructed to obtain the ensemble of pathways, and a deep learning architecture named the variational autoencoder (VAE) is used to learn the spatial distributions of kinetic pathways in the continuous CV space. Based on the trained VAE model, the TPT-generated ensemble of kinetic pathways can be embedded into a latent space, where the classification becomes clear. We show that LPC can efficiently and accurately identify the metastable path channels in three systems: a 2D potential, the aggregation of two hydrophobic particles in water, and the folding of the Fip35 WW domain. Using the 2D potential, we further demonstrate that our LPC algorithm outperforms the previous path-lumping algorithms by making substantially fewer incorrect assignments of individual pathways to four path channels. We expect that LPC can be widely applied to identify the dominant kinetic pathways underlying complex conformational changes.
Affiliation(s)
- Yunrui Qiu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Michael S. O’Connor
- Biophysics Graduate Program, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Mingyi Xue
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Bojun Liu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Xuhui Huang
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Biophysics Graduate Program, University of Wisconsin-Madison, Madison, WI, 53706, USA
14
Conev A, Rigo MM, Devaurs D, Fonseca AF, Kalavadwala H, de Freitas MV, Clementi C, Zanatta G, Antunes DA, Kavraki LE. EnGens: a computational framework for generation and analysis of representative protein conformational ensembles. Brief Bioinform 2023; 24:bbad242. [PMID: 37418278 PMCID: PMC10359083 DOI: 10.1093/bib/bbad242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Revised: 05/23/2023] [Accepted: 06/10/2023] [Indexed: 07/08/2023] Open
Abstract
Proteins are dynamic macromolecules that perform vital functions in cells. A protein structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understand their mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insights into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in the number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing representative protein conformational ensembles. In this work, we: (1) provide an overview of existing methods and tools for representative protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package, and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; (3) test our pipeline on a few canonical examples from the literature. Representative ensembles produced by EnGens can be used for many downstream tasks such as protein-ligand ensemble docking, Markov state modeling of protein dynamics and analysis of the effect of single-point mutations.
Affiliation(s)
- Anja Conev
- Department of Computer Science, Rice University, Houston 77005, TX, USA
- Didier Devaurs
- MRC Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK
- Hussain Kalavadwala
- Department of Biology and Biochemistry, University of Houston, Houston 77004, TX, USA
- Cecilia Clementi
- Department of Physics, Freie Universität Berlin, Berlin 14195, Germany
- Geancarlo Zanatta
- Department of Biophysics, Institute of Biosciences, Federal University of Rio Grande do Sul, Porto Alegre 91501-970, Brazil
- Dinler Amaral Antunes
- Department of Biology and Biochemistry, University of Houston, Houston 77004, TX, USA
- Lydia E Kavraki
- Department of Computer Science, Rice University, Houston 77005, TX, USA
15
Conev A, Rigo MM, Devaurs D, Fonseca AF, Kalavadwala H, de Freitas MV, Clementi C, Zanatta G, Antunes DA, Kavraki L. EnGens: a computational framework for generation and analysis of representative protein conformational ensembles. bioRxiv 2023:2023.04.24.538094. [PMID: 37163076 PMCID: PMC10168271 DOI: 10.1101/2023.04.24.538094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Proteins are dynamic macromolecules that perform vital functions in cells. A protein structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understand their mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insights into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in the number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing protein conformational ensembles. In this work, we: (1) provide an overview of existing methods and tools for protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package, and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; (3) test our pipeline on a few canonical examples found in the literature. Representative ensembles produced by EnGens can be used for many downstream tasks such as protein-ligand ensemble docking, Markov state modeling of protein dynamics and analysis of the effect of single-point mutations.
16
Shmilovich K, Ferguson AL. Girsanov Reweighting Enhanced Sampling Technique (GREST): On-the-Fly Data-Driven Discovery of and Enhanced Sampling in Slow Collective Variables. J Phys Chem A 2023; 127:3497-3517. [PMID: 37036804 DOI: 10.1021/acs.jpca.3c00505] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 04/11/2023]
Abstract
Molecular dynamics simulations of microscopic phenomena are limited by the short integration time steps which are required for numerical stability but which limit the practically achievable simulation time scales. Collective variable (CV) enhanced sampling techniques apply biases to predefined collective coordinates to promote barrier crossing, phase space exploration, and sampling of rare events. The efficacy of these techniques is contingent on the selection of good CVs correlated with the molecular motions governing the long-time dynamical evolution of the system. In this work, we introduce Girsanov Reweighting Enhanced Sampling Technique (GREST) as an adaptive sampling scheme that interleaves rounds of data-driven slow CV discovery and enhanced sampling along these coordinates. Since slow CVs are inherently dynamical quantities, a key ingredient in our approach is the use of both thermodynamic and dynamical Girsanov reweighting corrections for rigorous estimation of slow CVs from biased simulation data. We demonstrate our approach on a toy 1D 4-well potential, a simple biomolecular system alanine dipeptide, and the Trp-Leu-Ala-Leu-Leu (WLALL) pentapeptide. In each case GREST learns appropriate slow CVs and drives sampling of all thermally accessible metastable states starting from zero prior knowledge of the system. We make GREST accessible to the community via a publicly available open source Python package.
Affiliation(s)
- Kirill Shmilovich
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
17
Cardellini A, Crippa M, Lionello C, Afrose SP, Das D, Pavan GM. Unsupervised Data-Driven Reconstruction of Molecular Motifs in Simple to Complex Dynamic Micelles. J Phys Chem B 2023; 127:2595-2608. [PMID: 36891625 PMCID: PMC10041528 DOI: 10.1021/acs.jpcb.2c08726] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/10/2023]
Abstract
The reshuffling mobility of molecular building blocks in self-assembled micelles is a key determinant of many of their interesting properties, from emerging morphologies and surface compartmentalization, to dynamic reconfigurability and stimuli-responsiveness. However, the microscopic details of such complex structural dynamics are typically nontrivial to elucidate, especially in multicomponent assemblies. Here we show a machine-learning approach that allows us to reconstruct the structural and dynamic complexity of mono- and bicomponent surfactant micelles from high-dimensional data extracted from equilibrium molecular dynamics simulations. Unsupervised clustering of smooth overlap of atomic position (SOAP) data enables us to identify, in a set of multicomponent surfactant micelles, the dominant local molecular environments that emerge within them and to retrace their dynamics, in terms of exchange probabilities and transition pathways of the constituent building blocks. Tested on a variety of micelles differing in size and in the chemical nature of the constitutive self-assembling units, this approach effectively recognizes the molecular motifs populating them in an exquisitely agnostic and unsupervised way, and allows correlating them to their composition in terms of constitutive surfactant species.
Affiliation(s)
- Annalisa Cardellini
- Department of Innovative Technologies, University of Applied Sciences and Arts of Southern Switzerland, Polo Universitario Lugano, Campus Est, Via la Santa 1, 6962 Lugano-Viganello, Switzerland
- Martina Crippa
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
- Chiara Lionello
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
- Syed Pavel Afrose
- Department of Chemical Sciences and Centre for Advanced Functional Materials, Indian Institute of Science Education and Research (IISER) Kolkata, Mohanpur 741246, India
- Dibyendu Das
- Department of Chemical Sciences and Centre for Advanced Functional Materials, Indian Institute of Science Education and Research (IISER) Kolkata, Mohanpur 741246, India
- Giovanni M Pavan
- Department of Innovative Technologies, University of Applied Sciences and Arts of Southern Switzerland, Polo Universitario Lugano, Campus Est, Via la Santa 1, 6962 Lugano-Viganello, Switzerland
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
18
Maruyama Y, Igarashi R, Ushiku Y, Mitsutake A. Analysis of Protein Folding Simulation with Moving Root Mean Square Deviation. J Chem Inf Model 2023; 63:1529-1541. [PMID: 36821519 PMCID: PMC10015464 DOI: 10.1021/acs.jcim.2c01444] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 02/24/2023]
Abstract
We apply moving root-mean-square deviation (mRMSD), which does not require a reference structure, as a method for analyzing protein dynamics. This method can be used to calculate the root-mean-square deviation (RMSD) of structure between two specified time points and to analyze protein dynamics behavior through time series analysis. We applied this method to the Trp-cage trajectory calculated by the Anton supercomputer and found that it shows regions of stable states as well as the conventional RMSD. In addition, we extracted a characteristic structure in which the side chains of Asp1 and Arg16 form hydrogen bonds near the most stable structure of the Trp-cage. We also determined that ≥20 ns is an appropriate time interval to investigate protein dynamics using mRMSD. Applying this method to NuG2 protein, we found that mRMSD can be used to detect regions of metastable states in addition to the stable state. This method can be applied to molecular dynamics simulations of proteins whose stable structures are unknown.
Affiliation(s)
- Yutaka Maruyama
- OMRON SINIC X Corporation, Tokyo 113-0033, Japan; Department of Physics, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki-shi, Kanagawa 214-8571, Japan
- Ryo Igarashi
- OMRON SINIC X Corporation, Tokyo 113-0033, Japan
- Ayori Mitsutake
- Department of Physics, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki-shi, Kanagawa 214-8571, Japan
19
Ojha AA, Thakur S, Ahn SH, Amaro RE. DeepWEST: Deep Learning of Kinetic Models with the Weighted Ensemble Simulation Toolkit for Enhanced Sampling. J Chem Theory Comput 2023; 19:1342-1359. [PMID: 36719802 DOI: 10.1021/acs.jctc.2c00282] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 02/01/2023]
Abstract
Recent advances in computational power and algorithms have enabled molecular dynamics (MD) simulations to reach greater time scales. However, for observing conformational transitions associated with biomolecular processes, MD simulations still have limitations. Several enhanced sampling techniques seek to address this challenge, including the weighted ensemble (WE) method, which samples transitions between metastable states using many weighted trajectories to estimate kinetic rate constants. However, initial sampling of the potential energy surface has a significant impact on the performance of WE, i.e., convergence and efficiency. We therefore introduce deep-learned kinetic modeling approaches that extract statistically relevant information from short MD trajectories to provide a well-sampled initial state distribution for WE simulations. This hybrid approach overcomes any statistical bias to the system, as it runs short unbiased MD trajectories and identifies meaningful metastable states of the system. It is shown to provide a more refined free energy landscape closer to the steady state that could efficiently sample kinetic properties such as rate constants.
Affiliation(s)
- Anupam Anand Ojha
- Department of Chemistry, University of California San Diego, La Jolla, California 92093, United States
- Saumya Thakur
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, Maharashtra 400076, India
- Surl-Hee Ahn
- Department of Chemical Engineering, University of California Davis, Davis, California 95616, United States
- Rommie E Amaro
- Department of Chemistry, University of California San Diego, La Jolla, California 92093, United States
20
Chen H, Chipot C. Chasing collective variables using temporal data-driven strategies. QRB Discov 2023; 4:e2. [PMID: 37564298 PMCID: PMC10411323 DOI: 10.1017/qrd.2022.23] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/11/2022] [Revised: 12/21/2022] [Accepted: 12/29/2022] [Indexed: 01/09/2023] Open
Abstract
The convergence of free-energy calculations based on importance sampling depends heavily on the choice of collective variables (CVs), which, in principle, should include the slow degrees of freedom of the biological processes to be investigated. Autoencoders (AEs), as emerging data-driven dimension reduction tools, have been utilised for discovering CVs. AEs, however, are often treated as black boxes, and what AEs actually encode during training, and whether the latent variables from encoders are suitable as CVs for further free-energy calculations, remain unknown. In this contribution, we review AEs and their time-series-based variants, including time-lagged AEs (TAEs) and modified TAEs, as well as the closely related model variational approach for Markov processes networks (VAMPnets). We then show through numerical examples that AEs learn the high-variance modes instead of the slow modes. In stark contrast, time-series-based models are able to capture the slow modes. Moreover, both modified TAEs with extensions from slow feature analysis and the state-free reversible VAMPnets (SRVs) can yield orthogonal multidimensional CVs. As an illustration, we employ SRVs to discover the CVs of the isomerizations of N-acetyl-N'-methylalanylamide and trialanine by iterative learning with trajectories from biased simulations. Last, through numerical experiments with anisotropic diffusion, we investigate the potential relationship of time-series-based models and committor probabilities.
Affiliation(s)
- Haochuan Chen
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
- Christophe Chipot
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
- Theoretical and Computational Biophysics Group, Beckman Institute, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA
21
Köhs L, Kukovetz K, Rauh O, Koeppl H. Nonparametric Bayesian inference for meta-stable conformational dynamics. Phys Biol 2022; 19. [PMID: 35944548 DOI: 10.1088/1478-3975/ac885e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 08/09/2022] [Indexed: 11/11/2022]
Abstract
Analyses of structural dynamics of biomolecules hold great promise to deepen the understanding of and ability to construct complex molecular systems. To this end, both experimental and computational means are available, such as fluorescence quenching experiments or molecular dynamics simulations, respectively. We argue that while seemingly disparate, both fields of study have to deal with the same type of data about the same underlying phenomenon of conformational switching. Two central challenges typically arise in both contexts: (i) the amount of obtained data is large, and (ii) it is often unknown how many distinct molecular states underlie these data. In this study, we build on the established idea of Markov state modeling and propose a generative, Bayesian nonparametric hidden Markov state model that addresses these challenges. Utilizing hierarchical Dirichlet processes, we treat different meta-stable molecule conformations as distinct Markov states, the number of which we then do not have to set a priori. In contrast to existing approaches to both experimental as well as simulation data that are based on the same idea, we leverage a mean-field variational inference approach, enabling scalable inference on large amounts of data. Furthermore, we specify the model also for the important case of angular data, which however proves to be computationally intractable. Addressing this issue, we propose a computationally tractable approximation to the angular model. We demonstrate the method on synthetic ground truth data and apply it to known benchmark problems as well as electrophysiological experimental data from a conformation-switching ion channel to highlight its practical utility.
Affiliation(s)
- Lukas Köhs
- Centre for Synthetic Biology, Technische Universität Darmstadt, Rundeturmstrasse 12, Darmstadt, 64283, Germany
- Kerri Kukovetz
- Biology Department, Technische Universität Darmstadt, Schnittspahnstrasse 3, Darmstadt, 64287, Germany
- Oliver Rauh
- Biology Department, Technische Universität Darmstadt, Schnittspahnstrasse 3, Darmstadt, 64287, Germany
- Heinz Koeppl
- Centre for Synthetic Biology, Technische Universität Darmstadt, Rundeturmstrasse 12, Darmstadt, 64283, Germany
22
Novelli P, Bonati L, Pontil M, Parrinello M. Characterizing Metastable States with the Help of Machine Learning. J Chem Theory Comput 2022; 18:5195-5202. [PMID: 35920063 DOI: 10.1021/acs.jctc.2c00393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Present-day atomistic simulations generate long trajectories of ever more complex systems. Analyzing these data, discovering metastable states, and uncovering their nature are becoming increasingly challenging. In this paper, we first use the variational approach to conformation dynamics to discover the slowest dynamical modes of the simulations. This allows the different metastable states of the system to be located and organized hierarchically. The physical descriptors that characterize metastable states are discovered by means of a machine learning method. We show in the cases of two proteins, chignolin and bovine pancreatic trypsin inhibitor, how such analysis can be effortlessly performed in a matter of seconds. Another strength of our approach is that it can be applied to the analysis of both unbiased and biased simulations.
Affiliation(s)
- Pietro Novelli
- Computational Statistics and Machine Learning, Italian Institute of Technology, Via Enrico Melen 83, 16142 Genoa, Italy
- Luigi Bonati
- Atomistic Simulations, Italian Institute of Technology, Via Enrico Melen 83, 16142 Genoa, Italy
- Massimiliano Pontil
- Computational Statistics and Machine Learning, Italian Institute of Technology, Via Enrico Melen 83, 16142 Genoa, Italy; Department of Computer Science, University College London, London WC1E 6BT, United Kingdom
- Michele Parrinello
- Atomistic Simulations, Italian Institute of Technology, Via Enrico Melen 83, 16142 Genoa, Italy
23
Li Y, Gong H. Identifying a Feasible Transition Pathway between Two Conformational States for a Protein. J Chem Theory Comput 2022; 18:4529-4543. [PMID: 35723447 DOI: 10.1021/acs.jctc.2c00390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Proteins usually need to transit between different conformational states to fulfill their biological functions. In the mechanistic study of such transition processes by molecular dynamics simulations, identification of the minimum free energy path (MFEP) can substantially reduce the sampling space, thus enabling rigorous thermodynamic evaluation of the process. Conventionally, the MFEP is derived by iterative local optimization from an initial path, which is typically generated by simple brute-force techniques such as targeted molecular dynamics (tMD). Therefore, the quality of the initial path determines the success of MFEP estimation. In this work, we propose a method to improve derivation of the initial path. Through iterative relaxation-biasing simulations in a bidirectional manner, this method can construct a feasible transition pathway connecting two known states for a protein. Evaluation on small, fast-folding proteins against long equilibrium trajectories supports the good sampling efficiency of our method. When applied to larger proteins including the catalytic domain of human c-Src kinase as well as the converter domain of myosin VI, the paths generated by our method deviate significantly from those computed with the generic tMD approach. More importantly, free energy profiles and intermediate states obtained from our paths exhibit remarkable improvements over those from tMD paths with respect to both physical rationality and consistency with a priori knowledge.
Affiliation(s)
- Yao Li
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China; Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
- Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China; Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
24
Principal Component Analysis and Related Methods for Investigating the Dynamics of Biological Macromolecules. J 2022. [DOI: 10.3390/j5020021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
Principal component analysis (PCA) is used to reduce the dimensionalities of high-dimensional datasets in a variety of research areas. For example, biological macromolecules, such as proteins, exhibit many degrees of freedom, allowing them to adopt intricate structures and exhibit complex functions by undergoing large conformational changes. Therefore, molecular simulations of and experiments on proteins generate a large number of structure variations in high-dimensional space. PCA and many PCA-related methods have been developed to extract key features from such structural data, and these approaches have been widely applied for over 30 years to elucidate macromolecular dynamics. This review mainly focuses on the methodological aspects of PCA and related methods and their applications for investigating protein dynamics.
25
Paul TK, Taraphder S. Nonlinear Reaction Coordinate of an Enzyme Catalyzed Proton Transfer Reaction. J Phys Chem B 2022; 126:1413-1425. [PMID: 35138854 DOI: 10.1021/acs.jpcb.1c08760] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
Abstract
We present an in-depth study on the theoretical calculation of an optimum reaction coordinate as a linear or nonlinear combination of important collective variables (CVs) sampled from an ensemble of reactive transition paths for an intramolecular proton transfer reaction catalyzed by the enzyme human carbonic anhydrase (HCA) II. The linear models are optimized by likelihood maximization for a given number of CVs. The nonlinear models are based on an artificial neural network with the same number of CVs and optimized by minimizing the root-mean-square error in comparison to a training set of committor estimators generated for the given transition. The nonlinear reaction coordinate thus obtained yields the free energy of activation and rate constant as 9.46 kcal mol⁻¹ and 1.25 × 10⁶ s⁻¹, respectively. These estimates are found to be in quantitative agreement with the known experimental results. We have also used an extended autoencoder model to show that a similar analysis can be carried out using a single CV only. The resultant free energies and kinetics of the reaction slightly overestimate the experimental data. The implications of these results are discussed using a detailed microkinetic scheme of the proton transfer reaction catalyzed by HCA II.
Affiliation(s)
- Tanmoy Kumar Paul
- Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Srabani Taraphder
- Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
26
Bhakat S. Collective variable discovery in the age of machine learning: reality, hype and everything in between. RSC Adv 2022; 12:25010-25024. [PMID: 36199882 PMCID: PMC9437778 DOI: 10.1039/d2ra03660f] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 06/13/2022] [Accepted: 08/20/2022] [Indexed: 11/21/2022] Open
Abstract
Understanding the kinetics and thermodynamics profile of biomolecules is necessary to understand their functional roles which has a major impact in mechanism driven drug discovery. Molecular dynamics simulation has been routinely used to understand conformational dynamics and molecular recognition in biomolecules. Statistical analysis of high-dimensional spatiotemporal data generated from molecular dynamics simulation requires identification of a few low-dimensional variables which can describe the essential dynamics of a system without significant loss of information. In physical chemistry, these low-dimensional variables are often called collective variables. Collective variables are used to generate reduced representations of free energy surfaces and calculate transition probabilities between different metastable basins. However, the choice of collective variables is not trivial for complex systems. Collective variables range from geometric criteria such as distances and dihedral angles to abstract ones such as weighted linear combinations of multiple geometric variables. The advent of machine learning algorithms led to increasing use of abstract collective variables to represent biomolecular dynamics. In this review, I will highlight several nuances of commonly used collective variables ranging from geometric to abstract ones. Further, I will put forward some cases where machine learning based collective variables were used to describe simple systems which in principle could have been described by geometric ones. Finally, I will put forward my thoughts on artificial general intelligence and how it can be used to discover and predict collective variables from spatiotemporal data generated by molecular dynamics simulations. Data driven collective variable discovery methods to capture conformational dynamics in biological macromolecules.
Affiliation(s)
- Soumendranath Bhakat
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Pennsylvania 19104-6059, USA
27
Beyerle ER, Guenza MG. Identifying the leading dynamics of ubiquitin: A comparison between the tICA and the LE4PD slow fluctuations in amino acids' position. J Chem Phys 2021; 155:244108. [PMID: 34972386 DOI: 10.1063/5.0059688] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022] Open
Abstract
Molecular Dynamics (MD) simulations of proteins implicitly contain the information connecting the atomistic molecular structure and proteins' biologically relevant motion, where large-scale fluctuations are deemed to guide folding and function. In the complex multiscale processes described by MD trajectories, it is difficult to identify, separate, and study those large-scale fluctuations. This problem can be formulated as the need to identify a small number of collective variables that guide the slow kinetic processes. The most promising method among the ones used to study the slow leading processes in proteins' dynamics is the time-structure based on time-lagged independent component analysis (tICA), which identifies the dominant components in a noisy signal. Recently, we developed an anisotropic Langevin approach for the dynamics of proteins, called the anisotropic Langevin Equation for Protein Dynamics or LE4PD-XYZ. This approach partitions the protein's MD dynamics into mostly uncorrelated, wavelength-dependent, diffusive modes. It associates with each mode a free-energy map, where one measures the spatial extension and the time evolution of the mode-dependent, slow dynamical fluctuations. Here, we compare the tICA modes' predictions with the collective LE4PD-XYZ modes. We observe that the two methods consistently identify the nature and extension of the slowest fluctuation processes. The tICA separates the leading processes in a smaller number of slow modes than the LE4PD does. The LE4PD provides time-dependent information at short times and a formal connection to the physics of the kinetic processes that are missing in the pure statistical analysis of tICA.
Affiliation(s)
- E R Beyerle
  - Institute for Fundamental Science and Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon 97403, USA
- M G Guenza
  - Institute for Fundamental Science and Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon 97403, USA
|
28
|
Klebanov I, Sprungk B, Sullivan T. The linear conditional expectation in Hilbert space. Bernoulli 2021. [DOI: 10.3150/20-bej1308] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
Affiliation(s)
- Ilja Klebanov
  - Zuse Institute Berlin, Takustraße 7, 14195 Berlin, Germany
- Björn Sprungk
  - Technische Universität Bergakademie Freiberg, 09596 Freiberg, Germany
- T.J. Sullivan
  - Zuse Institute Berlin, Takustraße 7, 14195 Berlin, Germany
|
29
|
Bandyopadhyay S, Mondal J. A deep autoencoder framework for discovery of metastable ensembles in biomacromolecules. J Chem Phys 2021; 155:114106. [PMID: 34551528 DOI: 10.1063/5.0059965] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 01/01/2023]
Abstract
Biomacromolecules manifest dynamic conformational fluctuation and involve mutual interconversion among metastable states. A robust mapping of their conformational landscape often requires the low-dimensional projection of the conformational ensemble along optimized collective variables (CVs). However, the traditional choice for the CV is often limited by user-intuition and prior knowledge about the system, and this lacks a rigorous assessment of their optimality over other candidate CVs. To address this issue, we propose an approach in which we first choose the possible combinations of inter-residue Cα-distances within a given macromolecule as a set of input CVs. Subsequently, we derive a non-linear combination of latent space embedded CVs via auto-encoding the unbiased molecular dynamics simulation trajectories within the framework of the feed-forward neural network. We demonstrate the ability of the derived latent space variables in elucidating the conformational landscape in four hierarchically complex systems. The latent space CVs identify key metastable states of a bead-in-a-spring polymer. The combination of the adopted dimensional reduction technique with a Markov state model, built on the derived latent space, reveals multiple spatially and kinetically well-resolved metastable conformations for GB1 β-hairpin. A quantitative comparison based on the variational approach-based scoring of the auto-encoder-derived latent space CVs with the ones obtained via independent component analysis (principal component analysis or time-structured independent component analysis) confirms the optimality of the former. As a practical application, the auto-encoder-derived CVs were found to predict the reinforced folding of a Trp-cage mini-protein in aqueous osmolyte solution. Finally, the protocol was able to decipher the conformational heterogeneities involved in a complex metalloenzyme, namely, cytochrome P450.
Affiliation(s)
- Satyabrata Bandyopadhyay
  - Tata Institute of Fundamental Research, Center for Interdisciplinary Sciences, Hyderabad 500046, India
- Jagannath Mondal
  - Tata Institute of Fundamental Research, Center for Interdisciplinary Sciences, Hyderabad 500046, India
|
30
|
Glielmo A, Husic BE, Rodriguez A, Clementi C, Noé F, Laio A. Unsupervised Learning Methods for Molecular Simulation Data. Chem Rev 2021; 121:9722-9758. [PMID: 33945269 PMCID: PMC8391792 DOI: 10.1021/acs.chemrev.0c01195] [Citation(s) in RCA: 116] [Impact Index Per Article: 38.7] [Received: 11/05/2020] [Indexed: 12/21/2022]
Abstract
Unsupervised learning is becoming an essential tool to analyze the increasingly large amounts of data produced by atomistic and molecular simulations in materials science, solid state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms for dimensionality reduction, density estimation, clustering, and kinetic modeling. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used (or can be used) to analyze molecular simulation data.
Affiliation(s)
- Aldo Glielmo
  - International School for Advanced Studies (SISSA), 34014 Trieste, Italy
- Brooke E. Husic
  - Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Alex Rodriguez
  - International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
- Cecilia Clementi
  - Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
  - Rice University, Department of Chemistry, Houston, Texas 77005, United States
- Frank Noé
  - Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
  - Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
  - Rice University, Department of Chemistry, Houston, Texas 77005, United States
- Alessandro Laio
  - International School for Advanced Studies (SISSA), 34014 Trieste, Italy
  - International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
|
31
|
Lu S, He X, Yang Z, Chai Z, Zhou S, Wang J, Rehman AU, Ni D, Pu J, Sun J, Zhang J. Activation pathway of a G protein-coupled receptor uncovers conformational intermediates as targets for allosteric drug design. Nat Commun 2021; 12:4721. [PMID: 34354057 PMCID: PMC8342441 DOI: 10.1038/s41467-021-25020-9] [Citation(s) in RCA: 103] [Impact Index Per Article: 34.3] [Received: 11/08/2020] [Accepted: 07/17/2021] [Indexed: 02/07/2023]
Abstract
G protein-coupled receptors (GPCRs) are the most common proteins targeted by approved drugs. A complete mechanistic elucidation of large-scale conformational transitions underlying the activation mechanisms of GPCRs is of critical importance for therapeutic drug development. Here, we apply a combined computational and experimental framework integrating extensive molecular dynamics simulations, Markov state models, site-directed mutagenesis, and conformational biosensors to investigate the conformational landscape of activation of the angiotensin II (AngII) type 1 receptor (AT1 receptor), a prototypical class A GPCR. Our findings suggest a synergistic transition mechanism for AT1 receptor activation. A key intermediate state is identified in the activation pathway, which possesses a cryptic binding site within the intracellular region of the receptor. Mutation of this cryptic site prevents activation of the downstream G protein signaling and β-arrestin-mediated pathways by the endogenous AngII octapeptide agonist, suggesting an allosteric regulatory mechanism. Together, these findings provide a deeper understanding of AT1 receptor activation at an atomic level and suggest avenues for the design of allosteric AT1 receptor modulators with a broad range of applications in GPCR biology, biophysics, and medicinal chemistry.
Affiliation(s)
- Shaoyong Lu
  - College of Pharmacy, Ningxia Medical University, Yinchuan, Ningxia Hui Autonomous Region, China
  - State Key Laboratory of Oncogenes and Related Genes, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao Tong University, School of Medicine, Shanghai, China
- Xinheng He
  - The CAS Key Laboratory of Receptor Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhao Yang
  - Department of Biochemistry and Molecular Biology, Key Laboratory Experimental Teratology of Chinese Ministry of Education, School of Medicine, Shandong University, Jinan, Shandong, China
- Zongtao Chai
  - Department of Hepatic Surgery VI, Eastern Hepatobiliary Surgery Hospital, Second Military Medical University, Shanghai, China
- Shuhua Zhou
  - Department of Biochemistry and Molecular Biology, Key Laboratory Experimental Teratology of Chinese Ministry of Education, School of Medicine, Shandong University, Jinan, Shandong, China
- Junyan Wang
  - Department of Biochemistry and Molecular Biology, Key Laboratory Experimental Teratology of Chinese Ministry of Education, School of Medicine, Shandong University, Jinan, Shandong, China
- Ashfaq Ur Rehman
  - State Key Laboratory of Oncogenes and Related Genes, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao Tong University, School of Medicine, Shanghai, China
- Duan Ni
  - State Key Laboratory of Oncogenes and Related Genes, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao Tong University, School of Medicine, Shanghai, China
- Jun Pu
  - Department of Cardiology, Renji Hospital, Shanghai Jiao Tong University, School of Medicine, Shanghai, China
- Jinpeng Sun
  - Department of Biochemistry and Molecular Biology, Key Laboratory Experimental Teratology of Chinese Ministry of Education, School of Medicine, Shandong University, Jinan, Shandong, China
- Jian Zhang
  - College of Pharmacy, Ningxia Medical University, Yinchuan, Ningxia Hui Autonomous Region, China
  - State Key Laboratory of Oncogenes and Related Genes, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao Tong University, School of Medicine, Shanghai, China
  - School of Pharmaceutical Sciences, Zhengzhou University, Zhengzhou, China
|
32
|
Van Speybroeck V, Vandenhaute S, Hoffman AE, Rogge SM. Towards modeling spatiotemporal processes in metal–organic frameworks. Trends in Chemistry 2021. [DOI: 10.1016/j.trechm.2021.04.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/21/2022]
|
33
|
Computational methods for exploring protein conformations. Biochem Soc Trans 2021; 48:1707-1724. [PMID: 32756904 PMCID: PMC7458412 DOI: 10.1042/bst20200193] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 06/02/2020] [Revised: 07/07/2020] [Accepted: 07/09/2020] [Indexed: 12/13/2022]
Abstract
Proteins are dynamic molecules that can transition between a potentially wide range of structures comprising their conformational ensemble. The nature of these conformations and their relative probabilities are described by a high-dimensional free energy landscape. While computer simulation techniques such as molecular dynamics simulations allow the metastable conformational states and the transitions between them, and thus free energy landscapes, to be characterised, the barriers between states can be high, precluding efficient sampling without substantial computational resources. Over the past decades, a dizzying array of methods have emerged for enhancing conformational sampling, and for projecting the free energy landscape onto a reduced set of dimensions that allow conformational states to be distinguished, known as collective variables (CVs), along which sampling may be directed. Here, a brief description of what biomolecular simulation entails is followed by a more detailed exposition of the nature of CVs and methods for determining these, and, lastly, an overview of the myriad different approaches for enhancing conformational sampling, most of which rely upon CVs, including new advances in both CV determination and conformational sampling due to machine learning.
|
34
|
Abstract
We describe a nonparametric approach for accurate determination of the slowest relaxation eigenvectors of molecular dynamics. The approach is blind as it uses no system specific information. In particular, it does not require a functional form with many parameters to closely approximate eigenvectors, e.g., linear combinations of molecular descriptors or a deep neural network, and thus no extensive expertise with the system. We suggest a rigorous and sensitive validation/optimality criterion for an eigenvector. The criterion uses only eigenvector time series and can be used to validate eigenvectors computed by other approaches. The power of the approach is illustrated on long atomistic protein folding trajectories. The determined eigenvectors pass the validation test at a time scale of 0.2 ns, much shorter than alternative approaches.
Affiliation(s)
- Sergei V Krivov
  - Astbury Center for Structural Molecular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, United Kingdom
|
35
|
Bhakat S. Pepsin-like aspartic proteases (PAPs) as model systems for combining biomolecular simulation with biophysical experiments. RSC Adv 2021; 11:11026-11047. [PMID: 35423571 PMCID: PMC8695779 DOI: 10.1039/d0ra10359d] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/08/2020] [Accepted: 02/21/2021] [Indexed: 01/26/2023]
Abstract
Pepsin-like aspartic proteases (PAPs) are a class of aspartic proteases that share tremendous structural similarity with human pepsin. One of the key structural features of PAPs is the presence of a β-hairpin motif otherwise known as the flap. The biological function of the PAPs is highly dependent on the conformational dynamics of the flap region. In apo PAPs, the conformational dynamics of the flap is dominated by the rotational degrees of freedom associated with the χ1 and χ2 angles of a conserved Tyr (or Phe in some cases). However, it is plausible that dihedral order parameters associated with several other residues might play crucial roles in the conformational dynamics of apo PAPs. Due to their size, the complexities associated with their conformational dynamics, and their clinical significance (drug targets for malaria, Alzheimer's disease, etc.), PAPs provide a challenging testing ground for computational and experimental methods focusing on understanding conformational dynamics and molecular recognition in biomolecules. The opening of the flap region is necessary to accommodate the substrate/ligand in the active site of the PAPs. The big challenge is to gain atomistic details into how reversible ligand binding/unbinding (molecular recognition) affects the conformational dynamics. Recent reports of kinetics (Ki, Kd) and thermodynamic parameters (ΔH, TΔS, and ΔG) associated with macrocyclic ligands bound to BACE1 (which belongs to the PAP family) provide a perfect challenge (how to deal with big ligands with multiple torsional angles and select optimum order parameters to study reversible ligand binding/unbinding) for computational methods to predict binding free energies and kinetics beyond typical test systems, e.g. benzamidine-trypsin. In this work, I reviewed several order parameters which were proposed to capture the conformational dynamics and molecular recognition in PAPs. I further highlighted how machine learning methods can be used as order parameters in the context of PAPs. I then proposed some open ideas and challenges in the context of molecular simulation and put forward my case on how biophysical experiments, e.g. NMR, time-resolved FRET, etc., can be used in conjunction with biomolecular simulation to gain complete atomistic insights into the conformational dynamics of PAPs.
Affiliation(s)
- Soumendranath Bhakat
  - Division of Biophysical Chemistry, Center for Molecular Protein Science, Department of Chemistry, Lund University, P.O. Box 124, SE-22100 Lund, Sweden
|
36
|
Gkeka P, Stoltz G, Barati Farimani A, Belkacemi Z, Ceriotti M, Chodera JD, Dinner AR, Ferguson AL, Maillet JB, Minoux H, Peter C, Pietrucci F, Silveira A, Tkatchenko A, Trstanova Z, Wiewiora R, Lelièvre T. Machine Learning Force Fields and Coarse-Grained Variables in Molecular Dynamics: Application to Materials and Biological Systems. J Chem Theory Comput 2020; 16:4757-4775. [PMID: 32559068 PMCID: PMC8312194 DOI: 10.1021/acs.jctc.0c00355] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Indexed: 12/21/2022]
Abstract
Machine learning encompasses tools and algorithms that are now becoming popular in almost all scientific and technological fields. This is true for molecular dynamics as well, where machine learning offers promises of extracting valuable information from the enormous amounts of data generated by simulation of complex systems. We provide here a review of our current understanding of goals, benefits, and limitations of machine learning techniques for computational studies on atomistic systems, focusing on the construction of empirical force fields from ab initio databases and the determination of reaction coordinates for free energy computation and enhanced sampling.
Affiliation(s)
- Paraskevi Gkeka
  - Integrated Drug Discovery, Sanofi R&D, 91385 Chilly-Mazarin, France
- Gabriel Stoltz
  - CERMICS, Ecole des Ponts, Marne-la-Vallée, France
  - Matherials Project-Team, Inria Paris, 75012 Paris, France
- Zineb Belkacemi
  - Integrated Drug Discovery, Sanofi R&D, 91385 Chilly-Mazarin, France
  - CERMICS, Ecole des Ponts, Marne-la-Vallée, France
- Michele Ceriotti
  - Laboratory of Computational Science and Modelling, Institute of Materials, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- John D Chodera
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Aaron R Dinner
  - Department of Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, United States
- Hervé Minoux
  - Integrated Drug Discovery, Sanofi R&D, 94403 Vitry-sur-Seine, France
- Fabio Pietrucci
  - UMR CNRS 7590, MNHN, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, Sorbonne Université, 75005 Paris, France
- Ana Silveira
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Zofia Trstanova
  - School of Mathematics, The University of Edinburgh, Edinburgh EH9 3FD, U.K.
- Rafal Wiewiora
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Tony Lelièvre
  - CERMICS, Ecole des Ponts, Marne-la-Vallée, France
  - Matherials Project-Team, Inria Paris, 75012 Paris, France
|
37
|
Ngo VA, Sarkar S, Neale C, Garcia AE. How Anionic Lipids Affect Spatiotemporal Properties of KRAS4B on Model Membranes. J Phys Chem B 2020; 124:5434-5453. [PMID: 32438809 DOI: 10.1021/acs.jpcb.0c02642] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/17/2022]
Abstract
RAS proteins are small membrane-anchored GTPases that regulate key cellular signaling networks. It has been recently shown that different anionic lipid types can affect the spatiotemporal properties of RAS through dimerization/clustering and signaling fidelity. To understand the effects of anionic lipids on key spatiotemporal properties of RAS, we dissected 1 ms of data from all-atom molecular dynamics simulations for KRAS4B on two model anionic lipid membranes that have 30% of POPS mixed with neutral POPC and 8% of PIP2 mixed with POPC. We unveiled the orientation space of KRAS4B, whose kinetics were slower and more distinguishable on the membrane containing PIP2 than the membrane containing POPS. Particularly, the PIP2-mixed membrane can differentiate a third kinetic orientation state from the other two known orientation states. We observed that each orientation state may yield different binding modes with an RAF kinase, which is required for activating the MAPK/ERK signaling pathway. However, an overall occluded probability, for which RAF kinases cannot bind KRAS4B, remains unchanged on the two different membranes. We identified rare fast diffusion modes of KRAS4B that appear coupled with orientations exposed to cytosolic RAF. Particularly, on the membrane having PIP2, we found nonlinear correlations between the orientation states and the conformations of the cationic farnesylated hypervariable region, which acts as an anchor in the membrane. Using diffusion coefficients estimated from the all-atom simulations, we quantified the effect of PIP2 and POPS on the KRAS4B dimerization via Green's function reaction dynamics simulations, in which the averaged dimerization rate is 12.5% slower on PIP2-mixed membranes.
Affiliation(s)
- Van A Ngo
  - Center for Nonlinear Studies (CNLS), Los Alamos National Lab, Los Alamos, New Mexico 87545, United States
- Sumantra Sarkar
  - Center for Nonlinear Studies (CNLS), Los Alamos National Lab, Los Alamos, New Mexico 87545, United States
- Chris Neale
  - Theoretical Biology and Biophysics Group, T-6, Los Alamos National Lab, Los Alamos, New Mexico 87545, United States
- Angel E Garcia
  - Center for Nonlinear Studies (CNLS), Los Alamos National Lab, Los Alamos, New Mexico 87545, United States
|
38
|
Bozkurt Varolgüneş Y, Bereau T, Rudzinski JF. Interpretable embeddings from molecular simulations using Gaussian mixture variational autoencoders. Machine Learning: Science and Technology 2020. [DOI: 10.1088/2632-2153/ab80b7] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/08/2023]
|
39
|
Sidky H, Chen W, Ferguson AL. Machine learning for collective variable discovery and enhanced sampling in biomolecular simulation. Mol Phys 2020. [DOI: 10.1080/00268976.2020.1737742] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 12/13/2022]
Affiliation(s)
- Hythem Sidky
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL, USA
- Wei Chen
  - Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Andrew L. Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL, USA
|
40
|
McKiernan KA, Koster AK, Maduke M, Pande VS. Dynamical model of the CLC-2 ion channel reveals conformational changes associated with selectivity-filter gating. PLoS Comput Biol 2020; 16:e1007530. [PMID: 32226009 PMCID: PMC7145265 DOI: 10.1371/journal.pcbi.1007530] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/19/2018] [Revised: 04/09/2020] [Accepted: 11/05/2019] [Indexed: 12/18/2022]
Abstract
This work reports a dynamical Markov state model of CLC-2 "fast" (pore) gating, based on 600 microseconds of molecular dynamics (MD) simulation. In the starting conformation of our CLC-2 model, both outer and inner channel gates are closed. The first conformational change in our dataset involves rotation of the inner-gate backbone along residues S168-G169-I170. This change is strikingly similar to that observed in the cryo-EM structure of the bovine CLC-K channel, though the volume of the intracellular (inner) region of the ion conduction pathway is further expanded in our model. From this state (inner gate open and outer gate closed), two additional states are observed, each involving a unique rotameric flip of the outer-gate residue GLUex. Both additional states involve conformational changes that orient GLUex away from the extracellular (outer) region of the ion conduction pathway. In the first additional state, the rotameric flip of GLUex results in an open, or near-open, channel pore. The equilibrium population of this state is low (∼1%), consistent with the low open probability of CLC-2 observed experimentally in the absence of a membrane potential stimulus (0 mV). In the second additional state, GLUex rotates to occlude the channel pore. This state, which has a low equilibrium population (∼1%), is only accessible when GLUex is protonated. Together, these pathways model the opening of both an inner and outer gate within the CLC-2 selectivity filter, as a function of GLUex protonation. Collectively, our findings are consistent with published experimental analyses of CLC-2 gating and provide a high-resolution structural model to guide future investigations.
Affiliation(s)
- Keri A. McKiernan
  - Department of Chemistry, Stanford University, Stanford, California, United States of America
- Anna K. Koster
  - Department of Chemistry, Stanford University, Stanford, California, United States of America
  - Department of Molecular & Cellular Physiology, Stanford University, Stanford, California, United States of America
- Merritt Maduke
  - Department of Molecular & Cellular Physiology, Stanford University, Stanford, California, United States of America
- Vijay S. Pande
  - Department of Bioengineering, Stanford University, Stanford, California, United States of America
|
41
|
Zhang H, Gong Q, Zhang H, Chen C. FSATOOL: A useful tool to do the conformational sampling and trajectory analysis work for biomolecules. J Comput Chem 2020; 41:156-164. [PMID: 31603251 DOI: 10.1002/jcc.26083] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/01/2019] [Revised: 09/10/2019] [Accepted: 09/12/2019] [Indexed: 12/27/2022]
Abstract
Reliable conformational sampling and trajectory analysis are always important to the study of the folding or binding mechanisms of biomolecules. Generally, one has to prepare many complicated parameters and follow a lot of steps to obtain the final data. The whole process is too complicated for new users. In this article, we provide a convenient and user-friendly tool that is compatible with AMBER, called fast sampling and analysis tool (FSATOOL). FSATOOL has some useful features. First and most important, the whole work is simplified into two steps: one is the fast sampling procedure and the other is the trajectory analysis procedure. Second, it contains several powerful sampling methods for simulation on graphics processing units, including our previous mixing replica exchange molecular dynamics method. The method combines the advantages of the biased and unbiased simulations. Finally, it extracts the dominant transition pathways automatically from the folding network by Markov state model. Users do not need to do the tedious intermediate steps by hand. To illustrate the usage of FSATOOL in practice, we perform one simulation for an RNA hairpin in explicit solvent. All the results are presented. © 2019 Wiley Periodicals, Inc.
Affiliation(s)
- Haomiao Zhang
  - Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
- Qiankun Gong
  - Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
- Haozhe Zhang
  - Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
- Changjun Chen
  - Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
|
42
|
Noé F. Machine Learning for Molecular Dynamics on Long Timescales. Machine Learning Meets Quantum Physics 2020. [DOI: 10.1007/978-3-030-40245-7_16] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/21/2022]
|
43
|
Klus S, Husic BE, Mollenhauer M, Noé F. Kernel methods for detecting coherent structures in dynamical data. Chaos 2019; 29:123112. [PMID: 31893642 DOI: 10.1063/1.5100267] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/18/2019] [Accepted: 11/08/2019] [Indexed: 06/10/2023]
Abstract
We illustrate relationships between classical kernel-based dimensionality reduction techniques and eigendecompositions of empirical estimates of reproducing kernel Hilbert space operators associated with dynamical systems. In particular, we show that kernel canonical correlation analysis (CCA) can be interpreted in terms of kernel transfer operators and that it can be obtained by optimizing the variational approach for Markov processes score. As a result, we show that coherent sets of particle trajectories can be computed by kernel CCA. We demonstrate the efficiency of this approach with several examples, namely, the well-known Bickley jet, ocean drifter data, and a molecular dynamics problem with a time-dependent potential. Finally, we propose a straightforward generalization of dynamic mode decomposition called coherent mode decomposition. Our results provide a generic machine learning approach to the computation of coherent sets with an objective score that can be used for cross-validation and the comparison of different methods.
Affiliation(s)
- Stefan Klus
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Brooke E Husic
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Mattes Mollenhauer
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Frank Noé
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
|
44
|
Affiliation(s)
- Frank Noé
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
  - Department of Physics, Freie Universität Berlin, Berlin, Germany
- Edina Rosta
  - Department of Chemistry, King's College London, London, England
|
45
|
Sidky H, Chen W, Ferguson AL. High-Resolution Markov State Models for the Dynamics of Trp-Cage Miniprotein Constructed Over Slow Folding Modes Identified by State-Free Reversible VAMPnets. J Phys Chem B 2019; 123:7999-8009. [DOI: 10.1021/acs.jpcb.9b05578] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 01/31/2023]
Affiliation(s)
- Hythem Sidky: Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Wei Chen: Department of Physics, University of Illinois at Urbana-Champaign, 1110 West Green Street, Urbana, Illinois 61801, United States
- Andrew L. Ferguson: Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
|
46
|
Recent Progress towards Chemically-Specific Coarse-Grained Simulation Models with Consistent Dynamical Properties. Computation 2019. [DOI: 10.3390/computation7030042] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Indexed: 01/26/2023]
Abstract
Coarse-grained (CG) models can provide computationally efficient and conceptually simple characterizations of soft matter systems. While generic models probe the underlying physics governing an entire family of free-energy landscapes, bottom-up CG models are systematically constructed from a higher-resolution model to retain a high level of chemical specificity. The removal of degrees of freedom from the system modifies the relationship between the relative time scales of distinct dynamical processes through both a loss of friction and a “smoothing” of the free-energy landscape. While these effects typically result in faster dynamics, decreasing the computational expense of the model, they also obscure the connection to the true dynamics of the system. The lack of consistent dynamics is a serious limitation for CG models, which not only prevents quantitatively accurate predictions of dynamical observables but can also lead to qualitatively incorrect descriptions of the characteristic dynamical processes. With many methods available for optimizing the structural and thermodynamic properties of chemically-specific CG models, recent years have seen a stark increase in investigations addressing the accurate description of dynamical properties generated from CG simulations. In this review, we present an overview of these efforts, ranging from bottom-up parameterizations of generalized Langevin equations to refinements of the CG force field based on a Markov state modeling framework. We aim to make connections between seemingly disparate approaches, while laying out some of the major challenges as well as potential directions for future efforts.
|
47
|
Chen W, Sidky H, Ferguson AL. Capabilities and limitations of time-lagged autoencoders for slow mode discovery in dynamical systems. J Chem Phys 2019. [DOI: 10.1063/1.5112048] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/20/2022] Open
Affiliation(s)
- Wei Chen: Department of Physics, University of Illinois at Urbana-Champaign, 1110 West Green Street, Urbana, Illinois 61801, USA
- Hythem Sidky: Pritzker School of Molecular Engineering, University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, USA
- Andrew L. Ferguson: Pritzker School of Molecular Engineering, University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, USA
|
48
|
Chen W, Sidky H, Ferguson AL. Nonlinear discovery of slow molecular modes using state-free reversible VAMPnets. J Chem Phys 2019; 150:214114. [PMID: 31176319 DOI: 10.1063/1.5092521] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Indexed: 12/14/2022] Open
Abstract
The success of enhanced sampling molecular simulations that accelerate along collective variables (CVs) is predicated on the availability of variables coincident with the slow collective motions governing the long-time conformational dynamics of a system. It is challenging to intuit these slow CVs for all but the simplest molecular systems, and their data-driven discovery directly from molecular simulation trajectories has been a central focus of the molecular simulation community to both unveil the important physical mechanisms and drive enhanced sampling. In this work, we introduce state-free reversible VAMPnets (SRV) as a deep learning architecture that learns nonlinear CV approximants to the leading slow eigenfunctions of the spectral decomposition of the transfer operator that evolves equilibrium-scaled probability distributions through time. Orthogonality of the learned CVs is naturally imposed within network training without added regularization. The CVs are inherently explicit and differentiable functions of the input coordinates making them well-suited to use in enhanced sampling calculations. We demonstrate the utility of SRVs in capturing parsimonious nonlinear representations of complex system dynamics in applications to 1D and 2D toy systems where the true eigenfunctions are exactly calculable and to molecular dynamics simulations of alanine dipeptide and the WW domain protein.
Affiliation(s)
- Wei Chen: Department of Physics, University of Illinois at Urbana-Champaign, 1110 West Green Street, Urbana, Illinois 61801, USA
- Hythem Sidky: Institute for Molecular Engineering, University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, USA
- Andrew L Ferguson: Institute for Molecular Engineering, University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, USA
|
49
|
Scherer MK, Husic BE, Hoffmann M, Paul F, Wu H, Noé F. Variational selection of features for molecular kinetics. J Chem Phys 2019; 150:194108. [PMID: 31117766 DOI: 10.1063/1.5083040] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Indexed: 12/18/2022] Open
Abstract
The modeling of atomistic biomolecular simulations using kinetic models such as Markov state models (MSMs) has had many notable algorithmic advances in recent years. The variational principle has opened the door for a nearly fully automated toolkit for selecting models that predict the long time-scale kinetics from molecular dynamics simulations. However, one yet-unoptimized step of the pipeline involves choosing the features, or collective variables, from which the model should be constructed. In order to build intuitive models, these collective variables are often sought to be interpretable and familiar features, such as torsional angles or contact distances in a protein structure. However, previous approaches for evaluating the chosen features rely on constructing a full MSM, which in turn requires additional hyperparameters to be chosen, and hence leads to a computationally expensive framework. Here, we present a method to optimize the feature choice directly, without requiring the construction of the final kinetic model. We demonstrate our rigorous preprocessing algorithm on a canonical set of 12 fast-folding protein simulations and show that our procedure leads to more efficient model selection.
Affiliation(s)
- Martin K Scherer: Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Brooke E Husic: Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Moritz Hoffmann: Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Fabian Paul: Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Hao Wu: School of Mathematical Sciences, Tongji University, Shanghai 200092, People's Republic of China
- Frank Noé: Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
|
50
|
Paul F, Wu H, Vossel M, de Groot BL, Noé F. Identification of kinetic order parameters for non-equilibrium dynamics. J Chem Phys 2019; 150:164120. [PMID: 31042914 DOI: 10.1063/1.5083627] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 12/17/2022] Open
Abstract
A popular approach to analyze the dynamics of high-dimensional many-body systems, such as macromolecules, is to project the trajectories onto a space of slowly varying collective variables, where subsequent analyses are made, such as clustering or estimation of free energy profiles or Markov state models. However, existing "dynamical" dimension reduction methods, such as the time-lagged independent component analysis (TICA), are only valid if the dynamics obeys detailed balance (microscopic reversibility) and typically require long, equilibrated simulation trajectories. Here, we develop a dimension reduction method for non-equilibrium dynamics based on the recently developed Variational Approach for Markov Processes (VAMP) by Wu and Noé. VAMP is illustrated by obtaining a low-dimensional description of a single file ion diffusion model and by identifying long-lived states from molecular dynamics simulations of the KcsA channel protein in an external electrochemical potential. This analysis provides detailed insights into the coupling of conformational dynamics, the configuration of the selectivity filter, and the conductance of the channel. We recommend VAMP as a replacement for the less general TICA method.
Affiliation(s)
- Fabian Paul: Department of Mathematics and Computer Science, FU Berlin, Arnimallee 6, 14195 Berlin, Germany
- Hao Wu: Tongji University, School of Mathematical Sciences, Shanghai 200092, People's Republic of China
- Maximilian Vossel: Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, D-37077 Göttingen, Germany
- Bert L de Groot: Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, D-37077 Göttingen, Germany
- Frank Noé: Department of Mathematics and Computer Science, FU Berlin, Arnimallee 6, 14195 Berlin, Germany
|