1
Hradiská H, Kurečka M, Beránek J, Tedeschi G, Višňovský V, Křenek A, Spiwok V. Acceleration of Molecular Simulations by Parametric Time-Lagged tSNE Metadynamics. J Phys Chem B 2024; 128:903-913. [PMID: 38237064 PMCID: PMC10839826 DOI: 10.1021/acs.jpcb.3c05669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Revised: 12/22/2023] [Accepted: 12/28/2023] [Indexed: 02/02/2024]
Abstract
The potential of molecular simulations is limited by their computational costs. There is often a need to accelerate simulations using some of the enhanced sampling methods. Metadynamics applies a history-dependent bias potential that disfavors previously visited states. To apply metadynamics, it is necessary to select a few properties of the system, called collective variables (CVs), that can be used to define the bias potential. Over the past few years, there have been emerging opportunities for machine learning and, in particular, artificial neural networks within this domain. In this broad context, a specific unsupervised machine learning method was utilized, namely, parametric time-lagged t-distributed stochastic neighbor embedding (ptltSNE) to design CVs. The approach was tested on a Trp-cage trajectory (tryptophan cage) from the literature. The trajectory was used to generate a map of conformations, distinguish fast conformational changes from slow ones, and design CVs. Then, metadynamics simulations were performed. To accelerate the formation of the α-helix, we added the α-RMSD collective variable. This simulation led to one folding event in a 350 ns metadynamics simulation. To accelerate degrees of freedom not addressed by CVs, we performed parallel tempering metadynamics. This simulation led to 10 folding events in a 200 ns simulation with 32 replicas.
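As a rough illustration of the history-dependent bias described in this abstract, the following minimal Python sketch sums Gaussian hills centered on previously visited CV values. The function name `bias_potential`, the one-dimensional CV, and the hill height and width are illustrative assumptions, not the settings used by the authors.

```python
import math


def bias_potential(s, centers, height=1.0, width=0.2):
    """Return the bias at CV value s as a sum of Gaussian hills.

    centers holds the CV values visited so far (one hill per deposit).
    height and width are illustrative defaults, not values from the paper.
    """
    return sum(
        height * math.exp(-((s - c) ** 2) / (2.0 * width ** 2))
        for c in centers
    )


# Hypothetical history: the system has lingered near s = 0.
visited = [0.0, 0.05, -0.05, 0.1]

# The bias is large where the system has already been and negligible far away,
# which is what pushes the simulation toward unexplored CV values.
print(bias_potential(0.0, visited))
print(bias_potential(2.0, visited))
```

In a real metadynamics run the hills would be deposited on the fly and the CV would typically be multidimensional; this sketch only shows how the accumulated bias disfavors visited states.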
Affiliation(s)
- Helena Hradiská
- Department of Biochemistry and Microbiology, University of Chemistry and Technology Prague, Technická 3, Prague 6 166 28, Czech Republic
- Martin Kurečka
- Institute of Computer Science, Masaryk University, Šumavská 416/15, Brno 602 00, Czech Republic
- Jan Beránek
- Department of Biochemistry and Microbiology, University of Chemistry and Technology Prague, Technická 3, Prague 6 166 28, Czech Republic
- Guglielmo Tedeschi
- Department of Biochemistry and Microbiology, University of Chemistry and Technology Prague, Technická 3, Prague 6 166 28, Czech Republic
- Vladimír Višňovský
- Institute of Computer Science, Masaryk University, Šumavská 416/15, Brno 602 00, Czech Republic
- Aleš Křenek
- Institute of Computer Science, Masaryk University, Šumavská 416/15, Brno 602 00, Czech Republic
- Vojtěch Spiwok
- Department of Biochemistry and Microbiology, University of Chemistry and Technology Prague, Technická 3, Prague 6 166 28, Czech Republic
2
Blumer O, Reuveni S, Hirshberg B. Combining stochastic resetting with Metadynamics to speed-up molecular dynamics simulations. Nat Commun 2024; 15:240. [PMID: 38172126 PMCID: PMC10764788 DOI: 10.1038/s41467-023-44528-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Accepted: 12/18/2023] [Indexed: 01/05/2024]
Abstract
Metadynamics is a powerful method to accelerate molecular dynamics simulations, but its efficiency critically depends on the identification of collective variables that capture the slow modes of the process. Unfortunately, collective variables are usually not known a priori and finding them can be very challenging. We recently presented a collective variables-free approach to enhanced sampling using stochastic resetting. Here, we combine the two methods, showing that it can lead to greater acceleration than either of them separately. We also demonstrate that resetting Metadynamics simulations performed with suboptimal collective variables can lead to speedups comparable with those obtained with optimal collective variables. Therefore, applying stochastic resetting can be an alternative to the challenging task of improving suboptimal collective variables, at almost no additional computational cost. Finally, we propose a method to extract unbiased mean first-passage times from Metadynamics simulations with resetting, resulting in an improved tradeoff between speedup and accuracy. This work enables combining stochastic resetting with other enhanced sampling methods to accelerate a broad range of molecular simulations.
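One way to see why resetting can shorten a mean first-passage time is the renewal relation for resetting at a fixed timer τ: ⟨T_τ⟩ = ⟨min(T, τ)⟩ / P(T ≤ τ), where T is the first-passage time without resetting. The Python sketch below evaluates this relation on a list of samples. The fixed-timer protocol, the function name, and the sample values are assumptions made for illustration; they are not taken from the paper, which may use a different resetting protocol.

```python
def mean_fpt_with_fixed_timer(fpt_samples, timer):
    """Estimate the mean first-passage time under resetting every `timer` units.

    Uses the renewal relation <T_r> = <min(T, r)> / P(T <= r), with both
    averages estimated from first-passage times sampled without resetting.
    """
    n_done = sum(1 for t in fpt_samples if t <= timer)
    if n_done == 0:
        raise ValueError("no sample finishes before the timer; estimate undefined")
    capped_total = sum(min(t, timer) for t in fpt_samples)
    # (capped_total / N) / (n_done / N) simplifies to capped_total / n_done.
    return capped_total / n_done


# Hypothetical samples: most trajectories finish quickly, one gets stuck.
samples = [1.0, 1.0, 1.0, 100.0]

plain_mean = sum(samples) / len(samples)
reset_mean = mean_fpt_with_fixed_timer(samples, timer=2.0)
print(plain_mean, reset_mean)
```

Resetting helps here because the distribution is broad: restarting the rare, very long trajectories costs little and removes their contribution to the mean.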
Affiliation(s)
- Ofir Blumer
- School of Chemistry, Tel Aviv University, Tel Aviv, 6997801, Israel
- Shlomi Reuveni
- School of Chemistry, Tel Aviv University, Tel Aviv, 6997801, Israel
- The Center for Computational Molecular and Materials Science, Tel Aviv University, Tel Aviv, 6997801, Israel
- The Center for Physics and Chemistry of Living Systems, Tel Aviv University, Tel Aviv, 6997801, Israel
- Barak Hirshberg
- School of Chemistry, Tel Aviv University, Tel Aviv, 6997801, Israel
- The Center for Computational Molecular and Materials Science, Tel Aviv University, Tel Aviv, 6997801, Israel
- The Center for Physics and Chemistry of Living Systems, Tel Aviv University, Tel Aviv, 6997801, Israel
3
Poruthoor AJ, Sharma A, Grossfield A. Understanding the free-energy landscape of phase separation in lipid bilayers using molecular dynamics. Biophys J 2023; 122:4144-4159. [PMID: 37742069 PMCID: PMC10645549 DOI: 10.1016/j.bpj.2023.09.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 08/28/2023] [Accepted: 09/19/2023] [Indexed: 09/25/2023]
Abstract
Liquid-liquid phase separation inside the cell often results in biological condensates that can critically affect cell homeostasis. Such phase separation events occur in multiple parts of cells, including the cell membranes, where the "lipid raft" hypothesis posits the formation of ordered domains floating in a sea of disordered lipids. The resulting lipid domains often have functional roles. However, the thermodynamics of lipid phase separation and their resulting mechanistic effects on cell function and dysfunction are poorly understood. Understanding such complex phenomena in cell membranes, with their diverse lipid compositions, is exceptionally difficult. For these reasons, simple model systems that can recapitulate similar behavior are widely used to study this phenomenon. Despite these simplifications, the timescale and length scales of domain formation pose a challenge for molecular dynamics (MD) simulations. Thus, most MD studies focus on spontaneous lipid phase separation, essentially measuring the sign (but not the amplitude) of the free-energy change upon separation, rather than directly interrogating the thermodynamics. Here, we propose a proof-of-concept pipeline that can directly measure this free energy by combining coarse-grained MD with enhanced sampling protocols using a novel collective variable. This approach will be a useful tool to help connect the thermodynamics of phase separation with the mechanistic insights already available from MD simulations.
Affiliation(s)
- Ashlin J Poruthoor
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, New York
- Akshara Sharma
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, New York
- Alan Grossfield
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, New York
4
Conflitti P, Raniolo S, Limongelli V. Perspectives on Ligand/Protein Binding Kinetics Simulations: Force Fields, Machine Learning, Sampling, and User-Friendliness. J Chem Theory Comput 2023; 19:6047-6061. [PMID: 37656199 PMCID: PMC10536999 DOI: 10.1021/acs.jctc.3c00641] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/14/2023] [Indexed: 09/02/2023]
Abstract
Computational techniques applied to drug discovery have gained considerable popularity for their ability to filter potentially active drugs from inactive ones, reducing the time scale and costs of preclinical investigations. The main focus of these studies has historically been the search for compounds endowed with high affinity for a specific molecular target to ensure the formation of stable and long-lasting complexes. Recent evidence has also correlated the in vivo drug efficacy with its binding kinetics, thus opening new fascinating scenarios for ligand/protein binding kinetic simulations in drug discovery. The present article examines the state of the art in the field, providing a brief summary of the most popular and advanced ligand/protein binding kinetics techniques and evaluating their current limitations and the potential solutions to reach more accurate kinetic models. Particular emphasis is put on the need for a paradigm change in the present methodologies toward ligand and protein parametrization, the force field problem, characterization of the transition states, the sampling issue, and algorithms' performance, user-friendliness, and data openness.
Affiliation(s)
- Paolo Conflitti
- Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), 6900 Lugano, Switzerland
- Stefano Raniolo
- Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), 6900 Lugano, Switzerland
- Vittorio Limongelli
- Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), 6900 Lugano, Switzerland
- Department of Pharmacy, University of Naples “Federico II”, 80131 Naples, Italy
5
Chen H, Roux B, Chipot C. Discovering Reaction Pathways, Slow Variables, and Committor Probabilities with Machine Learning. J Chem Theory Comput 2023; 19:4414-4426. [PMID: 37224455 PMCID: PMC11372462 DOI: 10.1021/acs.jctc.3c00028] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 05/26/2023]
Abstract
A significant challenge faced by atomistic simulations is the difficulty, and often impossibility, to sample the transitions between metastable states of the free-energy landscape associated with slow molecular processes. Importance-sampling schemes represent an appealing option to accelerate the underlying dynamics by smoothing out the relevant free-energy barriers, but require the definition of suitable reaction-coordinate (RC) models expressed in terms of compact low-dimensional sets of collective variables (CVs). While most computational studies of slow molecular processes have traditionally relied on educated guesses based on human intuition to reduce the dimensionality of the problem at hand, a variety of machine-learning (ML) algorithms have recently emerged as powerful alternatives to discover meaningful CVs capable of capturing the dynamics of the slowest degrees of freedom. Considering a simple paradigmatic situation in which the long-time dynamics is dominated by the transition between two known metastable states, we compare two variational data-driven ML methods based on Siamese neural networks aimed at discovering a meaningful RC model: the slowest decorrelating CV of the molecular process, and the committor probability to first reach one of the two metastable states. One method is the state-free reversible variational approach for Markov processes networks (VAMPnets), or SRVs; the other, inspired by the transition path theory framework, is the variational committor-based neural networks, or VCNs. The relationship and the ability of these methodologies to discover the relevant descriptors of the slow molecular process of interest are illustrated with a series of simple model systems. We also show that both strategies are amenable to importance-sampling schemes through an appropriate reweighting algorithm that approximates the kinetic properties of the transition.
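The committor mentioned in this abstract can be illustrated on a small discrete Markov chain, where it is the probability of reaching state B before state A and satisfies q_i = Σ_j P_ij q_j with q = 0 in A and q = 1 in B. The Python sketch below solves this by repeated sweeps. It is not the neural-network approach (VCN) of the paper; the function name, the five-state random walk, and the number of sweeps are assumptions chosen for illustration.

```python
def committor(transition, state_a, state_b, n_sweeps=2000):
    """Committor of a discrete Markov chain by Gauss-Seidel sweeps.

    transition[i][j] is the probability of jumping from state i to state j.
    The committor is fixed to 0 in state_a and 1 in state_b; every other
    state is repeatedly set to the average committor of its successors.
    """
    n = len(transition)
    q = [0.0] * n
    q[state_b] = 1.0
    for _ in range(n_sweeps):
        for i in range(n):
            if i == state_a or i == state_b:
                continue
            q[i] = sum(transition[i][j] * q[j] for j in range(n))
    return q


# Hypothetical model: symmetric nearest-neighbor walk on five states,
# with A = state 0 and B = state 4.
n_states = 5
P = [[0.0] * n_states for _ in range(n_states)]
P[0][0] = 1.0
P[4][4] = 1.0
for i in range(1, 4):
    P[i][i - 1] = 0.5
    P[i][i + 1] = 0.5

q = committor(P, state_a=0, state_b=4)
print(q)
```

For this unbiased walk the committor grows linearly from A to B, which makes it a convenient check before moving to models where no closed form is available.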
Affiliation(s)
- Haochuan Chen
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, B.P. 70239, 54506 Vandœuvre-lès-Nancy cedex, France
- Benoît Roux
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, 60637, United States
- Christophe Chipot
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, B.P. 70239, 54506 Vandœuvre-lès-Nancy cedex, France
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, 60637, United States
- NIH Center for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
6
Sasmal S, McCullagh M, Hocky GM. Reaction Coordinates for Conformational Transitions Using Linear Discriminant Analysis on Positions. J Chem Theory Comput 2023; 19:4427-4435. [PMID: 37130367 PMCID: PMC10373481 DOI: 10.1021/acs.jctc.3c00051] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 01/11/2023] [Indexed: 05/04/2023]
Abstract
In this work, we demonstrate that Linear Discriminant Analysis (LDA) applied to atomic positions in two different states of a biomolecule produces a good reaction coordinate between those two states. Atomic coordinates of a macromolecule are a direct representation of a macromolecular configuration, and yet, they are not used in enhanced sampling studies due to a lack of rotational and translational invariance. We resolve this issue using the technique of our prior work, whereby a molecular configuration is considered a member of an equivalence class in size-and-shape space, which is the set of all configurations that can be translated and rotated to a single point within a reference multivariate Gaussian distribution characterizing a single molecular state. The reaction coordinates produced by LDA applied to positions are shown to be good reaction coordinates both in terms of characterizing the transition between two states of a system within a long molecular dynamics (MD) simulation and also ones that allow us to readily produce free energy estimates along that reaction coordinate using enhanced sampling MD techniques.
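To make the idea of an LDA-derived reaction coordinate concrete, the Python sketch below computes the two-class Fisher discriminant direction w = S_w⁻¹(μ_B − μ_A) for samples with two features and projects the samples onto it. It deliberately omits the size-and-shape alignment step described in the abstract; the function names, the restriction to two features, and the toy data are assumptions made for illustration only.

```python
def lda_direction(class_a, class_b):
    """Fisher LDA direction for two classes of samples with two features each."""

    def mean(rows):
        return [sum(r[k] for r in rows) / len(rows) for k in range(2)]

    def scatter(rows, mu):
        s = [[0.0, 0.0], [0.0, 0.0]]
        for r in rows:
            d = [r[0] - mu[0], r[1] - mu[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s

    mu_a, mu_b = mean(class_a), mean(class_b)
    s_a, s_b = scatter(class_a, mu_a), scatter(class_b, mu_b)
    # Within-class scatter matrix and its explicit 2x2 inverse.
    sw = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [mu_b[0] - mu_a[0], mu_b[1] - mu_a[1]]
    return [
        (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
        (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det,
    ]


def project(sample, w):
    """Value of the linear reaction coordinate for one sample."""
    return sample[0] * w[0] + sample[1] * w[1]


# Hypothetical "states": two clouds separated along the first feature.
state_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
state_b = [(5.0, 0.0), (5.0, 1.0), (6.0, 0.0), (6.0, 1.0)]

w = lda_direction(state_a, state_b)
coords_a = [project(s, w) for s in state_a]
coords_b = [project(s, w) for s in state_b]
print(w, coords_a, coords_b)
```

With real atomic positions the feature vector has 3N components and the frames must first be aligned to remove rotation and translation, which is the part this sketch leaves out.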
Affiliation(s)
- Subarna Sasmal
- Department of Chemistry and Simons Center for Computational Physical Chemistry, New York University, New York, New York 10003, United States
- Martin McCullagh
- Department of Chemistry, Oklahoma State University, Stillwater, Oklahoma 74078, United States
- Glen M. Hocky
- Department of Chemistry and Simons Center for Computational Physical Chemistry, New York University, New York, New York 10003, United States
7
Lüking M, van der Spoel D, Elf J, Tribello GA. Can molecular dynamics be used to simulate biomolecular recognition? J Chem Phys 2023; 158:2889489. [PMID: 37158325 DOI: 10.1063/5.0146899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 04/19/2023] [Indexed: 05/10/2023]
Abstract
There are many problems in biochemistry that are difficult to study experimentally. Simulation methods are appealing due to direct availability of atomic coordinates as a function of time. However, direct molecular simulations are challenged by the size of systems and the time scales needed to describe relevant motions. In theory, enhanced sampling algorithms can help to overcome some of the limitations of molecular simulations. Here, we discuss a problem in biochemistry that offers a significant challenge for enhanced sampling methods and that could, therefore, serve as a benchmark for comparing approaches that use machine learning to find suitable collective variables. In particular, we study the transitions LacI undergoes upon moving between being non-specifically and specifically bound to DNA. Many degrees of freedom change during this transition, and the transition does not occur reversibly in simulations if only a subset of these degrees of freedom are biased. We also explain why this problem is so important to biologists and the transformative impact that a simulation of it would have on the understanding of DNA regulation.
Affiliation(s)
- Malin Lüking
- Department of Cell and Molecular Biology, Uppsala University, Husargatan 3, SE-75124 Uppsala, Sweden
- David van der Spoel
- Department of Cell and Molecular Biology, Uppsala University, Husargatan 3, SE-75124 Uppsala, Sweden
- Johan Elf
- Department of Cell and Molecular Biology, Uppsala University, Husargatan 3, SE-75124 Uppsala, Sweden
- Gareth A Tribello
- Centre for Quantum Materials and Technologies, School of Mathematics and Physics, Queen's University Belfast, Belfast BT7 1NN, United Kingdom
8
Cardellini A, Crippa M, Lionello C, Afrose SP, Das D, Pavan GM. Unsupervised Data-Driven Reconstruction of Molecular Motifs in Simple to Complex Dynamic Micelles. J Phys Chem B 2023; 127:2595-2608. [PMID: 36891625 PMCID: PMC10041528 DOI: 10.1021/acs.jpcb.2c08726] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/10/2023]
Abstract
The reshuffling mobility of molecular building blocks in self-assembled micelles is a key determinant of many of their interesting properties, from emerging morphologies and surface compartmentalization, to dynamic reconfigurability and stimuli-responsiveness. However, the microscopic details of such complex structural dynamics are typically nontrivial to elucidate, especially in multicomponent assemblies. Here we show a machine-learning approach that allows us to reconstruct the structural and dynamic complexity of mono- and bicomponent surfactant micelles from high-dimensional data extracted from equilibrium molecular dynamics simulations. Unsupervised clustering of smooth overlap of atomic position (SOAP) data enables us to identify, in a set of multicomponent surfactant micelles, the dominant local molecular environments that emerge within them and to retrace their dynamics, in terms of exchange probabilities and transition pathways of the constituent building blocks. Tested on a variety of micelles differing in size and in the chemical nature of the constitutive self-assembling units, this approach effectively recognizes the molecular motifs populating them in an exquisitely agnostic and unsupervised way, and allows correlating them to their composition in terms of constitutive surfactant species.
Affiliation(s)
- Annalisa Cardellini
- Department of Innovative Technologies, University of Applied Sciences and Arts of Southern Switzerland, Polo Universitario Lugano, Campus Est, Via la Santa 1, 6962 Lugano-Viganello, Switzerland
- Martina Crippa
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
- Chiara Lionello
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
- Syed Pavel Afrose
- Department of Chemical Sciences and Centre for Advanced Functional Materials, Indian Institute of Science Education and Research (IISER) Kolkata, Mohanpur 741246, India
- Dibyendu Das
- Department of Chemical Sciences and Centre for Advanced Functional Materials, Indian Institute of Science Education and Research (IISER) Kolkata, Mohanpur 741246, India
- Giovanni M Pavan
- Department of Innovative Technologies, University of Applied Sciences and Arts of Southern Switzerland, Polo Universitario Lugano, Campus Est, Via la Santa 1, 6962 Lugano-Viganello, Switzerland
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
9
Rydzewski J, Chen M, Ghosh TK, Valsson O. Reweighted Manifold Learning of Collective Variables from Enhanced Sampling Simulations. J Chem Theory Comput 2022; 18:7179-7192. [PMID: 36367826 DOI: 10.1021/acs.jctc.2c00873] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/13/2022]
Abstract
Enhanced sampling methods are indispensable in computational chemistry and physics, where atomistic simulations cannot exhaustively sample the high-dimensional configuration space of dynamical systems due to the sampling problem. A class of such enhanced sampling methods works by identifying a few slow degrees of freedom, termed collective variables (CVs), and enhancing the sampling along these CVs. Selecting CVs to analyze and drive the sampling is not trivial and often relies on chemical intuition. Despite routinely circumventing this issue using manifold learning to estimate CVs directly from standard simulations, such methods cannot provide mappings to a low-dimensional manifold from enhanced sampling simulations, as the geometry and density of the learned manifold are biased. Here, we address this crucial issue and provide a general reweighting framework based on anisotropic diffusion maps for manifold learning that takes into account that the learning data set is sampled from a biased probability distribution. We consider manifold learning methods based on constructing a Markov chain describing transition probabilities between high-dimensional samples. We show that our framework reverts the biasing effect, yielding CVs that correctly describe the equilibrium density. This advancement enables the construction of low-dimensional CVs using manifold learning directly from the data generated by enhanced sampling simulations. We call our framework reweighted manifold learning. We show that it can be used in many manifold learning techniques on data from both standard and enhanced sampling simulations.
Affiliation(s)
- Jakub Rydzewski
- Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Toruń, Poland
- Ming Chen
- Department of Chemistry, Purdue University, West Lafayette, Indiana 47907, United States
- Tushar K Ghosh
- Department of Chemistry, Purdue University, West Lafayette, Indiana 47907, United States
- Omar Valsson
- Department of Chemistry, University of North Texas, Denton, Texas 76201, United States
10
Evans L, Cameron MK, Tiwary P. Computing committors via Mahalanobis diffusion maps with enhanced sampling data. J Chem Phys 2022; 157:214107. [PMID: 36511548 DOI: 10.1063/5.0122990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
Abstract
The study of phenomena such as protein folding and conformational changes in molecules is a central theme in chemical physics. Molecular dynamics (MD) simulation is the primary tool for the study of transition processes in biomolecules, but it is hampered by a huge timescale gap between the processes of interest and atomic vibrations that dictate the time step size. Therefore, it is imperative to combine MD simulations with other techniques in order to quantify the transition processes taking place on large timescales. In this work, the diffusion map with Mahalanobis kernel, a meshless approach for approximating the Backward Kolmogorov Operator (BKO) in collective variables, is upgraded to incorporate standard enhanced sampling techniques, such as metadynamics. The resulting algorithm, which we call the target measure Mahalanobis diffusion map (tm-mmap), is suitable for a moderate number of collective variables in which one can approximate the diffusion tensor and free energy. Imposing appropriate boundary conditions allows use of the approximated BKO to solve for the committor function and utilization of transition path theory to find the reactive current delineating the transition channels and the transition rate. The proposed algorithm, tm-mmap, is tested on the two-dimensional Moro-Cardin two-well system with position-dependent diffusion coefficient and on alanine dipeptide in two collective variables where the committor, the reactive current, and the transition rate are compared to those computed by the finite element method (FEM). Finally, tm-mmap is applied to alanine dipeptide in four collective variables where the use of finite elements is infeasible.
Affiliation(s)
- L Evans
- Department of Mathematics, University of Maryland, College Park, Maryland 20742, USA
- M K Cameron
- Department of Mathematics, University of Maryland, College Park, Maryland 20742, USA
- P Tiwary
- Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, USA
11
Mardt A, Hempel T, Clementi C, Noé F. Deep learning to decompose macromolecules into independent Markovian domains. Nat Commun 2022; 13:7101. [PMID: 36402768 PMCID: PMC9675806 DOI: 10.1038/s41467-022-34603-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/16/2022] [Accepted: 10/27/2022] [Indexed: 11/21/2022]
Abstract
The increasing interest in modeling the dynamics of ever larger proteins has revealed a fundamental problem with models that describe the molecular system as being in a global configuration state. This notion limits our ability to gather sufficient statistics of state probabilities or state-to-state transitions because for large molecular systems the number of metastable states grows exponentially with size. In this manuscript, we approach this challenge by introducing a method that combines our recent progress on independent Markov decomposition (IMD) with VAMPnets, a deep learning approach to Markov modeling. We establish a training objective that quantifies how well a given decomposition of the molecular system into independent subdomains with Markovian dynamics approximates the overall dynamics. By constructing an end-to-end learning framework, the decomposition into such subdomains and their individual Markov state models are simultaneously learned, providing a data-efficient and easily interpretable summary of the complex system dynamics. While learning the dynamical coupling between Markovian subdomains is still an open issue, the present results are a significant step towards learning Ising models of large molecular complexes from simulation data.
Affiliation(s)
- Andreas Mardt
- Freie Universität Berlin, Department of Mathematics and Computer Science, Berlin, Germany
- Tim Hempel
- Freie Universität Berlin, Department of Mathematics and Computer Science, Berlin, Germany
- Freie Universität Berlin, Department of Physics, Berlin, Germany
- Cecilia Clementi
- Freie Universität Berlin, Department of Physics, Berlin, Germany
- Rice University, Department of Chemistry, Houston, TX, USA
- Rice University, Center for Theoretical Biological Physics, Houston, TX, USA
- Frank Noé
- Freie Universität Berlin, Department of Mathematics and Computer Science, Berlin, Germany
- Freie Universität Berlin, Department of Physics, Berlin, Germany
- Rice University, Department of Chemistry, Houston, TX, USA
- Microsoft Research AI4Science, Berlin, Germany
12
Abstract
The treatment of slow and rare transitions in the simulation of complex systems poses a great computational challenge. A powerful approach to tackle this challenge is the string method, which represents the transition path as a one-dimensional curve in a multidimensional space of collective variables. Commonly used strategies for pathway optimization include aligning the tangent of the string to the local mean force or to the mean drift determined from swarms of short trajectories. Here, a novel strategy is proposed, allowing the string to be optimized based on a variational principle involving the unidirectional reactive flux expressed in terms of the time-correlation function of the committor. The method is illustrated with model systems and then probed with the alanine dipeptide and a coarse-grained model of the barstar-barnase protein complex. Successive iterations variationally refine the string toward an optimal transition pathway following the gradient of the committor between two metastable states.
Affiliation(s)
- Ziwei He
- Department of Chemistry, The University of Chicago, 5735 S. Ellis Avenue, Chicago, Illinois 60637, United States
- Christophe Chipot
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche No. 7019, Université de Lorraine, B.P. 70239, 54506 Vandœuvre-lès-Nancy cedex, France
- Benoît Roux
- Department of Chemistry, The University of Chicago, 5735 S. Ellis Avenue, Chicago, Illinois 60637, United States
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, Illinois 60637, United States
13
Bhakat S. Collective variable discovery in the age of machine learning: reality, hype and everything in between. RSC Adv 2022; 12:25010-25024. [PMID: 36199882 PMCID: PMC9437778 DOI: 10.1039/d2ra03660f] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 06/13/2022] [Accepted: 08/20/2022] [Indexed: 11/21/2022]
Abstract
Understanding the kinetic and thermodynamic profile of biomolecules is necessary to understand their functional roles, which has a major impact on mechanism-driven drug discovery. Molecular dynamics simulation has been routinely used to understand conformational dynamics and molecular recognition in biomolecules. Statistical analysis of high-dimensional spatiotemporal data generated from molecular dynamics simulation requires identification of a few low-dimensional variables which can describe the essential dynamics of a system without significant loss of information. In physical chemistry, these low-dimensional variables are often called collective variables. Collective variables are used to generate reduced representations of free energy surfaces and calculate transition probabilities between different metastable basins. However, the choice of collective variables is not trivial for complex systems. Collective variables range from geometric criteria such as distances and dihedral angles to abstract ones such as weighted linear combinations of multiple geometric variables. The advent of machine learning algorithms led to increasing use of abstract collective variables to represent biomolecular dynamics. In this review, I will highlight several nuances of commonly used collective variables ranging from geometric to abstract ones. Further, I will put forward some cases where machine-learning-based collective variables were used to describe simple systems which in principle could have been described by geometric ones. Finally, I will put forward my thoughts on artificial general intelligence and how it can be used to discover and predict collective variables from spatiotemporal data generated by molecular dynamics simulations.
Affiliation(s)
- Soumendranath Bhakat
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Pennsylvania 19104-6059, USA
14
Beyerle ER, Mehdi S, Tiwary P. Quantifying Energetic and Entropic Pathways in Molecular Systems. J Phys Chem B 2022; 126:3950-3960. [PMID: 35605180 DOI: 10.1021/acs.jpcb.2c01782] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
When examining dynamics occurring at nonzero temperatures, both energy and entropy must be taken into account to describe activated barrier crossing events. Furthermore, good reaction coordinates need to be constructed to describe different metastable states and the transition mechanisms between them. Here we use a physics-based machine learning method called state predictive information bottleneck (SPIB) to find nonlinear reaction coordinates for three systems of varying complexity. SPIB is able to correctly predict an entropic bottleneck for an analytical flat-energy double-well system and identify the entropy- and energy-dominated pathways for an analytical four-well system. Finally, for a simulation of benzoic acid permeation through a lipid bilayer, SPIB is able to discover the entropic and energetic barriers to the permeation process. Given these results, we thus establish that SPIB is a reasonable and robust method for finding the important entropy, energy, and enthalpy barriers in physical systems, which can then be used to enhance the understanding and sampling of different activated mechanisms.
Affiliation(s)
- Eric R Beyerle
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20740, United States
- Shams Mehdi
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
- Pratyush Tiwary
- Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
15
Cignoni E, Slama V, Cupellini L, Mennucci B. The atomistic modeling of light-harvesting complexes from the physical models to the computational protocol. J Chem Phys 2022; 156:120901. [DOI: 10.1063/5.0086275] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/21/2022]
Abstract
The function of light-harvesting complexes is determined by a complex network of dynamic interactions among all the different components: the aggregate of pigments, the protein, and the surrounding environment. Complete and reliable predictions on these types of composite systems can be only achieved with an atomistic description. In the last few decades, there have been important advances in the atomistic modeling of light-harvesting complexes. These advances have involved both the completeness of the physical models and the accuracy and effectiveness of the computational protocols. In this Perspective, we present an overview of the main theoretical and computational breakthroughs attained so far in the field, with particular focus on the important role played by the protein and its dynamics. We then discuss the open problems in their accurate modeling that still need to be addressed. To illustrate an effective computational workflow for the modeling of light harvesting complexes, we take as an example the plant antenna complex CP29 and its H111N mutant.
Affiliation(s)
- Edoardo Cignoni
- Dipartimento di Chimica e Chimica Industriale, University of Pisa, via G. Moruzzi 13, 56124 Pisa, Italy
- Vladislav Slama
- Dipartimento di Chimica e Chimica Industriale, University of Pisa, via G. Moruzzi 13, 56124 Pisa, Italy
- Lorenzo Cupellini
- Dipartimento di Chimica e Chimica Industriale, University of Pisa, via G. Moruzzi 13, 56124 Pisa, Italy
- Benedetta Mennucci
- Dipartimento di Chimica e Chimica Industriale, University of Pisa, via G. Moruzzi 13, 56124 Pisa, Italy
16
Zou Z, Tsai ST, Tiwary P. Toward Automated Sampling of Polymorph Nucleation and Free Energies with the SGOOP and Metadynamics. J Phys Chem B 2021; 125:13049-13056. [PMID: 34788047 DOI: 10.1021/acs.jpcb.1c07595] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Abstract
Understanding the driving forces behind the nucleation of different polymorphs is of great importance for material sciences and the pharmaceutical industry. This includes understanding the reaction coordinate that governs the nucleation process and correctly calculating the relative free energies of different polymorphs. Here, we demonstrate, for the prototypical case of urea nucleation from the melt, how one can learn such a one-dimensional reaction coordinate as a function of prespecified order parameters and use it to perform efficient biased all-atom molecular dynamics simulations. The reaction coordinate is learnt as a function of the generic thermodynamic and structural order parameters using the "spectral gap optimization of order parameters (SGOOP)" approach [Tiwary, P. and Berne, B. J. Proc. Natl. Acad. Sci. U.S.A. (2016)] and is biased using well-tempered metadynamics simulations. The reaction coordinate gives insights into the role played by different structural and thermodynamics order parameters, and the biased simulations obtain accurate relative free energies for different polymorphs. This includes an accurate prediction of the approximate pressure at which urea undergoes a phase transition and one of the metastable polymorphs becomes the most stable conformation. We believe the ideas demonstrated in this work will facilitate efficient sampling of nucleation in complex, generic systems.
Affiliation(s)
- Ziyue Zou
- Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- Sun-Ting Tsai
- Department of Physics, University of Maryland, College Park, Maryland 20742, United States; Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
- Pratyush Tiwary
- Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland 20742, United States; Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
17
Bonati L, Piccini G, Parrinello M. Deep learning the slow modes for rare events sampling. Proc Natl Acad Sci U S A 2021; 118:e2113533118. [PMID: 34706940 PMCID: PMC8612227 DOI: 10.1073/pnas.2113533118] [Citation(s) in RCA: 87] [Impact Index Per Article: 29.0] [Accepted: 09/19/2021] [Indexed: 02/08/2023]
Abstract
The development of enhanced sampling methods has greatly extended the scope of atomistic simulations, allowing long-time phenomena to be studied with accessible computational resources. Many such methods rely on the identification of an appropriate set of collective variables. These are meant to describe the system's modes that most slowly approach equilibrium under the action of the sampling algorithm. Once identified, the equilibration of these modes is accelerated by the enhanced sampling method of choice. An attractive way of determining the collective variables is to relate them to the eigenfunctions and eigenvalues of the transfer operator. Unfortunately, this requires knowing the long-term dynamics of the system beforehand, which is generally not available. However, we have recently shown that it is indeed possible to determine efficient collective variables starting from biased simulations. In this paper, we bring the power of machine learning and the efficiency of the recently developed on the fly probability-enhanced sampling method to bear on this approach. The result is a powerful and robust algorithm that, given an initial enhanced sampling simulation performed with trial collective variables or generalized ensembles, extracts transfer operator eigenfunctions using a neural network ansatz and then accelerates them to promote sampling of rare events. To illustrate the generality of this approach, we apply it to several systems, ranging from the conformational transition of a small molecule to the folding of a miniprotein and the study of materials crystallization.
Affiliation(s)
- Luigi Bonati
- Department of Physics, Eidgenössische Technische Hochschule (ETH) Zürich, 8092 Zürich, Switzerland
- Atomistic Simulations, Italian Institute of Technology, 16163 Genova, Italy
- Michele Parrinello
- Atomistic Simulations, Italian Institute of Technology, 16163 Genova, Italy