1. Huang Y, Zhang H, Lin Z, Wei Y, Xi W. RevGraphVAMP: A protein molecular simulation analysis model combining graph convolutional neural networks and physical constraints. Methods 2024; 229:163-174. [PMID: 38972499 DOI: 10.1016/j.ymeth.2024.06.011] [Received: 06/25/2023] [Revised: 06/19/2024] [Accepted: 06/24/2024] [Indexed: 07/09/2024]
Abstract
Molecular dynamics (MD) simulation is a crucial research domain within the life sciences, focusing on understanding the mechanisms of biomolecular interactions at atomic scales. Protein simulation, a critical subfield, often relies on MD, and the resulting trajectory data play a pivotal role in drug discovery. With advances in high-performance computing and deep learning, predicting protein properties from vast trajectory data has become both popular and critical, but it poses challenges in feature extraction from complicated simulation data and in dimensionality reduction. At the same time, it is essential to provide a meaningful explanation of the biological mechanism behind the reduced dimensions. To tackle these challenges, we propose a new unsupervised model named RevGraphVAMP to analyze simulation trajectories. This model is based on the variational approach for Markov processes (VAMP) and integrates graph convolutional neural networks and physical constraint optimization to enhance learning performance. Additionally, we introduce an attention mechanism to assess the importance of key interaction regions, facilitating the interpretation of molecular mechanisms. Compared with other VAMPNets models, our model shows competitive performance and improved accuracy in state transition prediction, as demonstrated through its application to two public datasets and the Shank3-Rap1 complex, which is associated with autism spectrum disorder. Moreover, it improves the discrimination between substates after dimensionality reduction and provides interpretable results for protein structural characterization.
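The VAMP framework named in this abstract scores a feature map by how well it captures slow dynamics. The following is a minimal sketch of a VAMP-2 style score computed with NumPy on mean-removed features. It is an illustration under stated assumptions, not the RevGraphVAMP implementation: the function names `vamp2_score` and `_inv_sqrt`, the regularization constant `eps`, and the toy AR(1) trajectory are all hypothetical.

```python
import numpy as np

def _inv_sqrt(C, eps=1e-10):
    """Inverse matrix square root of a symmetric PSD matrix (eps is an assumed regularizer)."""
    w, V = np.linalg.eigh(C)
    w = np.maximum(w, 0.0) + eps
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def vamp2_score(X, lag):
    """VAMP-2 style score of features X with shape (n_frames, n_features); lag must be >= 1.

    Features are mean-removed, so the trivial constant singular function is excluded.
    """
    X = np.asarray(X, dtype=float)
    X0, Xt = X[:-lag], X[lag:]
    X0 = X0 - X0.mean(axis=0)
    Xt = Xt - Xt.mean(axis=0)
    n = len(X0)
    C00 = X0.T @ X0 / n   # instantaneous covariance
    Ctt = Xt.T @ Xt / n   # time-shifted covariance
    C0t = X0.T @ Xt / n   # time-lagged cross-covariance
    K = _inv_sqrt(C00) @ C0t @ _inv_sqrt(Ctt)
    # Squared Frobenius norm equals the sum of squared singular values of K.
    return float(np.sum(K ** 2))

# Toy data: a slowly decorrelating AR(1) process standing in for one learned feature.
rng = np.random.default_rng(0)
x = np.zeros(5000)
for t in range(1, len(x)):
    x[t] = 0.9 * x[t - 1] + rng.normal()
X = x.reshape(-1, 1)

score_short = vamp2_score(X, lag=1)
score_long = vamp2_score(X, lag=10)
```

For a single feature the score is the squared time-lagged correlation, so it should shrink as the lag grows.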
Affiliation(s)
- Ying Huang
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Huiling Zhang
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
- Zhenli Lin
  - Department of Ophthalmology, Shenzhen University General Hospital, Shenzhen 518055, China
- Yanjie Wei
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen 518107, China
- Wenhui Xi
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen 518107, China
2. Wang D, Qiu Y, Beyerle ER, Huang X, Tiwary P. Information Bottleneck Approach for Markov Model Construction. J Chem Theory Comput 2024; 20:5352-5367. [PMID: 38859575 PMCID: PMC11199095 DOI: 10.1021/acs.jctc.4c00449] [Indexed: 06/12/2024]
Abstract
Markov state models (MSMs) have proven valuable in studying the dynamics of protein conformational changes via statistical analysis of molecular dynamics simulations. In MSMs, the complex configuration space is coarse-grained into conformational states, with dynamics modeled by a series of Markovian transitions among these states at discrete lag times. Constructing the Markovian model at a specific lag time necessitates defining states that circumvent significant internal energy barriers, enabling internal dynamics relaxation within the lag time. This process effectively coarse-grains time and space, integrating out rapid motions within metastable states. Thus, MSMs possess a multiresolution nature, where the granularity of states can be adjusted according to the time-resolution, offering flexibility in capturing system dynamics. This work introduces a continuous embedding approach for molecular conformations using the state predictive information bottleneck (SPIB), a framework that unifies dimensionality reduction and state space partitioning via a continuous, machine learned basis set. Without explicit optimization of the VAMP-based scores, SPIB demonstrates state-of-the-art performance in identifying slow dynamical processes and constructing predictive multiresolution Markovian models. Through applications to well-validated mini-proteins, SPIB showcases unique advantages compared to competing methods. It autonomously and self-consistently adjusts the number of metastable states based on a specified minimal time resolution, eliminating the need for manual tuning. While maintaining efficacy in dynamical properties, SPIB excels in accurately distinguishing metastable states and capturing numerous well-populated macrostates. This contrasts with existing VAMP-based methods, which often emphasize slow dynamics at the expense of incorporating numerous sparsely populated states. 
Furthermore, SPIB's ability to learn a low-dimensional continuous embedding of the underlying MSMs enhances the interpretation of dynamic pathways. With these benefits, we propose SPIB as an easy-to-implement methodology for end-to-end MSM construction.
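The MSM construction described above (Markovian transitions among discrete states at a fixed lag time) can be illustrated with a short sketch that counts lagged transitions in a discrete trajectory and row-normalizes them. This is a plain maximum-likelihood estimate without reversibility constraints, not the SPIB method; the function name `estimate_transition_matrix` and the toy trajectory are hypothetical.

```python
import numpy as np

def estimate_transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition counts from a discrete trajectory; lag must be >= 1."""
    dtraj = np.asarray(dtraj)
    counts = np.zeros((n_states, n_states))
    # Count every transition i -> j between frames separated by `lag` (sliding window).
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of states that are never left stay zero instead of triggering a division by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Toy state assignments for ten frames (three states).
dtraj = [0, 0, 1, 1, 0, 2, 2, 2, 1, 0]
T = estimate_transition_matrix(dtraj, n_states=3, lag=1)
```

Increasing `lag` coarse-grains time: transitions faster than the lag are integrated out, which is the multiresolution property the abstract refers to.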
Affiliation(s)
- Dedi Wang
  - Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
- Yunrui Qiu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
  - Data Science Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
- Eric R. Beyerle
  - Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
- Xuhui Huang
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
  - Data Science Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
- Pratyush Tiwary
  - Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
  - University of Maryland Institute for Health Computing, Bethesda, MD 20852, United States
3. Marques S, Kouba P, Legrand A, Sedlar J, Disson L, Planas-Iglesias J, Sanusi Z, Kunka A, Damborsky J, Pajdla T, Prokop Z, Mazurenko S, Sivic J, Bednar D. CoVAMPnet: Comparative Markov State Analysis for Studying Effects of Drug Candidates on Disordered Biomolecules. JACS Au 2024; 4:2228-2245. [PMID: 38938816 PMCID: PMC11200249 DOI: 10.1021/jacsau.4c00182] [Received: 02/28/2024] [Revised: 04/24/2024] [Accepted: 05/13/2024] [Indexed: 06/29/2024]
Abstract
Computational study of the effect of drug candidates on intrinsically disordered biomolecules is challenging due to their vast and complex conformational space. Here, we developed a comparative Markov state analysis (CoVAMPnet) framework to quantify changes in the conformational distribution and dynamics of a disordered biomolecule in the presence and absence of small organic drug candidate molecules. First, molecular dynamics trajectories are generated using enhanced sampling, in the presence and absence of small molecule drug candidates, and ensembles of soft Markov state models (MSMs) are learned for each system using unsupervised machine learning. Second, these ensembles of learned MSMs are aligned across different systems based on a solution to an optimal transport problem. Third, the directional importance of inter-residue distances for the assignment to different conformational states is assessed by a discriminative analysis of aggregated neural network gradients. This final step provides interpretability and biophysical context to the learned MSMs. We applied this novel computational framework to assess the effects of ongoing phase 3 therapeutics tramiprosate (TMP) and its metabolite 3-sulfopropanoic acid (SPA) on the disordered Aβ42 peptide involved in Alzheimer's disease. Based on adaptive sampling molecular dynamics and CoVAMPnet analysis, we observed that both TMP and SPA preserved more structured conformations of Aβ42 by interacting nonspecifically with charged residues. SPA impacted Aβ42 more than TMP, protecting α-helices and suppressing the formation of aggregation-prone β-strands. Experimental biophysical analyses showed only mild effects of TMP/SPA on Aβ42 and activity enhancement by the endogenous metabolization of TMP into SPA. Our data suggest that TMP/SPA may also target biomolecules other than Aβ peptides. 
The CoVAMPnet method is broadly applicable to study the effects of drug candidates on the conformational behavior of intrinsically disordered biomolecules.
Affiliation(s)
- Sérgio M. Marques
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Petr Kouba
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, Dejvice, Praha 6 160 00, Czech Republic
  - Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, Dejvice, Praha 6 166 27, Czech Republic
- Anthony Legrand
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Jiri Sedlar
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, Dejvice, Praha 6 160 00, Czech Republic
- Lucas Disson
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, Dejvice, Praha 6 160 00, Czech Republic
- Joan Planas-Iglesias
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Zainab Sanusi
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Antonin Kunka
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Tomas Pajdla
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, Dejvice, Praha 6 160 00, Czech Republic
- Zbynek Prokop
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Josef Sivic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, Dejvice, Praha 6 160 00, Czech Republic
- David Bednar
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
4. Lee SC, Z Y. Interpretation of autoencoder-learned collective variables using Morse-Smale complex and sublevelset persistent homology: An application on molecular trajectories. J Chem Phys 2024; 160:144104. [PMID: 38591676 DOI: 10.1063/5.0191446] [Received: 12/13/2023] [Accepted: 03/22/2024] [Indexed: 04/10/2024]
Abstract
Dimensionality reduction often serves as the first step toward a minimalist understanding of physical systems as well as the accelerated simulations of them. In particular, neural network-based nonlinear dimensionality reduction methods, such as autoencoders, have shown promising outcomes in uncovering collective variables (CVs). However, the physical meaning of these CVs remains largely elusive. In this work, we constructed a framework that (1) determines the optimal number of CVs needed to capture the essential molecular motions using an ensemble of hierarchical autoencoders and (2) provides topology-based interpretations to the autoencoder-learned CVs with Morse-Smale complex and sublevelset persistent homology. This approach was exemplified using a series of n-alkanes and can be regarded as a general, explainable nonlinear dimensionality reduction method.
Affiliation(s)
- Shao-Chun Lee
  - Department of Nuclear, Plasma, and Radiological Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
  - Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Y Z
  - Department of Nuclear, Plasma, and Radiological Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
  - Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
  - Department of Nuclear Engineering and Radiological Sciences, Department of Materials Science and Engineering, Department of Robotics, and Applied Physics Program, University of Michigan, Ann Arbor, Michigan 48105, USA
5. Wu Y, Cao S, Qiu Y, Huang X. Tutorial on how to build non-Markovian dynamic models from molecular dynamics simulations for studying protein conformational changes. J Chem Phys 2024; 160:121501. [PMID: 38516972 DOI: 10.1063/5.0189429] [Received: 11/28/2023] [Accepted: 02/20/2024] [Indexed: 03/23/2024]
Abstract
Protein conformational changes play crucial roles in their biological functions. In recent years, the Markov State Model (MSM) constructed from extensive Molecular Dynamics (MD) simulations has emerged as a powerful tool for modeling complex protein conformational changes. In MSMs, dynamics are modeled as a sequence of Markovian transitions among metastable conformational states at discrete time intervals (called lag time). A major challenge for MSMs is that the lag time must be long enough to allow transitions among states to become memoryless (or Markovian). However, this lag time is constrained by the length of individual MD simulations available to track these transitions. To address this challenge, we have recently developed Generalized Master Equation (GME)-based approaches, encoding non-Markovian dynamics using a time-dependent memory kernel. In this Tutorial, we introduce the theory behind two recently developed GME-based non-Markovian dynamic models: the quasi-Markov State Model (qMSM) and the Integrative Generalized Master Equation (IGME). We subsequently outline the procedures for constructing these models and provide a step-by-step tutorial on applying qMSM and IGME to study two peptide systems: alanine dipeptide and villin headpiece. This Tutorial is available at https://github.com/xuhuihuang/GME_tutorials. The protocols detailed in this Tutorial aim to be accessible for non-experts interested in studying the biomolecular dynamics using these non-Markovian dynamic models.
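The requirement that the lag time be long enough for memoryless transitions is commonly checked with implied timescales, t_i = -τ / ln|λ_i(τ)|, which should stay constant in τ once the dynamics are Markovian. The sketch below shows that standard MSM check on a toy two-state chain. It is not the qMSM or IGME procedure from the tutorial; the function name `implied_timescales` and the example matrix are hypothetical.

```python
import numpy as np

def implied_timescales(T, lag):
    """Implied timescales t_i = -lag / ln|lambda_i| for the non-stationary eigenvalues of T."""
    magnitudes = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    # magnitudes[0] is the stationary eigenvalue 1 and carries no timescale.
    return -lag / np.log(magnitudes[1:])

# A toy two-state transition matrix at lag 1 (arbitrary time units).
T1 = np.array([[0.9, 0.1],
               [0.2, 0.8]])
# For an exactly Markovian chain the lag-2 matrix is the square of the lag-1 matrix.
T2 = T1 @ T1

its_lag1 = implied_timescales(T1, lag=1)
its_lag2 = implied_timescales(T2, lag=2)
```

Here both lags give the same timescale because the chain is Markovian by construction. In matrices estimated from MD data at short lags, the timescales typically drift with the lag, which is the situation memory-kernel approaches are meant to address.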
Affiliation(s)
- Yue Wu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Siqin Cao
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Yunrui Qiu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Xuhui Huang
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
  - Data Science Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
6. Lelièvre T, Pigeon T, Stoltz G, Zhang W. Analyzing Multimodal Probability Measures with Autoencoders. J Phys Chem B 2024; 128:2607-2631. [PMID: 38466759 DOI: 10.1021/acs.jpcb.3c07075] [Indexed: 03/13/2024]
Abstract
Finding collective variables to describe some important coarse-grained information on physical systems, in particular metastable states, remains a key issue in molecular dynamics. Recently, machine learning techniques have been intensively used to complement and possibly bypass expert knowledge in order to construct collective variables. Our focus here is on neural network approaches based on autoencoders. We study some relevant mathematical properties of the loss function considered for training autoencoders and provide physical interpretations based on conditional variances and minimum energy paths. We also consider various extensions in order to better describe physical systems, by incorporating more information on transition states at saddle points, and/or allowing for multiple decoders in order to describe several transition paths. Our results are illustrated on toy two-dimensional systems and on alanine dipeptide.
Affiliation(s)
- Tony Lelièvre
  - CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
  - MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
- Thomas Pigeon
  - CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
  - MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
  - IFP Energies Nouvelles, Rond-Point de l'Echangeur de Solaize, BP 3, 69360 Solaize, France
- Gabriel Stoltz
  - CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
  - MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
- Wei Zhang
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 14, 14195 Berlin, Germany
  - Zuse Institute Berlin, Takustraße 7, 14195 Berlin, Germany
7. Rydzewski J, Gökdemir T. Learning Markovian dynamics with spectral maps. J Chem Phys 2024; 160:091102. [PMID: 38436438 DOI: 10.1063/5.0189241] [Received: 11/27/2023] [Accepted: 02/05/2024] [Indexed: 03/05/2024]
Abstract
The long-time behavior of many complex molecular systems can often be described by Markovian dynamics in a slow subspace spanned by a few reaction coordinates referred to as collective variables (CVs). However, determining CVs poses a fundamental challenge in chemical physics. Depending on intuition or trial and error to construct CVs can lead to non-Markovian dynamics with long memory effects, hindering analysis. To address this problem, we continue to develop a recently introduced deep-learning technique called spectral map [J. Rydzewski, J. Phys. Chem. Lett. 14, 5216-5220 (2023)]. Spectral map learns slow CVs by maximizing a spectral gap of a Markov transition matrix describing anisotropic diffusion. Here, to represent heterogeneous and multiscale free-energy landscapes with spectral map, we implement an adaptive algorithm to estimate transition probabilities. Through a Markov state model analysis, we validate that spectral map learns slow CVs related to the dominant relaxation timescales and discerns between long-lived metastable states.
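The quantity that spectral map maximizes, a spectral gap of a Markov transition matrix, can be illustrated on a small example. The sketch below takes the gap as the difference between the k-th and (k+1)-th largest eigenvalues for k metastable states; that indexing convention, the function name `spectral_gap`, and the toy matrix are assumptions for illustration, and the code does not reproduce the anisotropic-diffusion kernel or the training procedure of the paper.

```python
import numpy as np

def spectral_gap(T, n_metastable):
    """Gap between the k-th and (k+1)-th largest eigenvalues of T, with k = n_metastable."""
    # Real parts are taken because a reversible chain has a real spectrum.
    eigvals = np.sort(np.linalg.eigvals(T).real)[::-1]
    return eigvals[n_metastable - 1] - eigvals[n_metastable]

# Toy 4-state chain: states {0, 1} and {2, 3} form two metastable sets.
# Hops inside a set are frequent (0.10); hops between sets are rare (0.01).
T = np.array([
    [0.89, 0.10, 0.01, 0.00],
    [0.10, 0.89, 0.00, 0.01],
    [0.01, 0.00, 0.89, 0.10],
    [0.00, 0.01, 0.10, 0.89],
])

gap = spectral_gap(T, n_metastable=2)
```

The eigenvalues of this matrix are 1.00, 0.98, 0.80 and 0.78, so the gap after the second eigenvalue (0.18) separates the one slow inter-set process from the fast intra-set relaxation.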
Affiliation(s)
- Jakub Rydzewski
  - Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Toruń, Poland
- Tuğçe Gökdemir
  - Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Toruń, Poland
8. Fu H, Bian H, Shao X, Cai W. Collective Variable-Based Enhanced Sampling: From Human Learning to Machine Learning. J Phys Chem Lett 2024; 15:1774-1783. [PMID: 38329095 DOI: 10.1021/acs.jpclett.3c03542] [Indexed: 02/09/2024]
Abstract
Enhanced-sampling algorithms relying on collective variables (CVs) are extensively employed to study complex (bio)chemical processes that are not amenable to brute-force molecular simulations. The selection of appropriate CVs characterizing the slow movement modes is of paramount importance for reliable and efficient enhanced-sampling simulations. In this Perspective, we first review the application and limitations of CVs obtained from chemical and geometrical intuition. We also introduce path-sampling algorithms, which can identify path-like CVs in a high-dimensional free-energy space. Machine-learning algorithms offer a viable approach to finding suitable CVs by analyzing trajectories from preliminary simulations. We discuss both the performance of machine-learning-derived CVs in enhanced-sampling simulations of experimental models and the challenges involved in applying these CVs to realistic, complex molecular assemblies. Moreover, we provide a prospective view of the potential advancements of machine-learning algorithms for the development of CVs in the field of enhanced-sampling simulations.
Affiliation(s)
- Haohao Fu
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Hengwei Bian
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
9. Lücke M, Winkelmann S, Heitzig J, Molkenthin N, Koltai P. Learning interpretable collective variables for spreading processes on networks. Phys Rev E 2024; 109:L022301. [PMID: 38491651 DOI: 10.1103/physreve.109.l022301] [Received: 07/10/2023] [Accepted: 12/28/2023] [Indexed: 03/18/2024]
Abstract
Collective variables (CVs) are low-dimensional projections of high-dimensional system states. They are used to gain insights into complex emergent dynamical behaviors of processes on networks. The relation between CVs and network measures is not well understood and its derivation typically requires detailed knowledge of both the dynamical system and the network topology. In this Letter, we present a data-driven method for algorithmically learning and understanding CVs for binary-state spreading processes on networks of arbitrary topology. We demonstrate our method using four example networks: the stochastic block model, a ring-shaped graph, a random regular graph, and a scale-free network generated by the Albert-Barabási model. Our results deliver evidence for the existence of low-dimensional CVs even in cases that are not yet understood theoretically.
Affiliation(s)
- Marvin Lücke
  - Modeling and Simulation of Complex Processes, Zuse Institute Berlin, 14195 Berlin, Germany
- Stefanie Winkelmann
  - Modeling and Simulation of Complex Processes, Zuse Institute Berlin, 14195 Berlin, Germany
- Jobst Heitzig
  - FutureLab on Game Theory and Networks of Interacting Agents, Potsdam Institute for Climate Impact Research, 14473 Potsdam, Germany and Zuse Institute Berlin, 14195 Berlin, Germany
- Nora Molkenthin
  - Complexity Science Department, Potsdam Institute for Climate Impact Research, 14473 Potsdam, Germany
- Péter Koltai
  - Department of Mathematics, University of Bayreuth, 95447 Bayreuth, Germany
10. Herringer NSM, Dasetty S, Gandhi D, Lee J, Ferguson AL. Permutationally Invariant Networks for Enhanced Sampling (PINES): Discovery of Multimolecular and Solvent-Inclusive Collective Variables. J Chem Theory Comput 2024; 20:178-198. [PMID: 38150421 DOI: 10.1021/acs.jctc.3c00923] [Indexed: 12/29/2023]
Abstract
The typically rugged nature of molecular free-energy landscapes can frustrate efficient sampling of the thermodynamically relevant phase space due to the presence of high free-energy barriers. Enhanced sampling techniques can improve phase space exploration by accelerating sampling along particular collective variables (CVs). A number of techniques exist for the data-driven discovery of CVs parametrizing the important large-scale motions of the system. A challenge to CV discovery is learning CVs invariant to the symmetries of the molecular system, frequently rigid translation, rigid rotation, and permutational relabeling of identical particles. Of these, permutational invariance has proved a persistent challenge in frustrating the data-driven discovery of multimolecular CVs in systems of self-assembling particles and solvent-inclusive CVs for solvated systems. In this work, we integrate permutation invariant vector (PIV) featurizations with autoencoding neural networks to learn nonlinear CVs invariant to translation, rotation, and permutation and perform interleaved rounds of CV discovery and enhanced sampling to iteratively expand the sampling of configurational phase space and obtain converged CVs and free-energy landscapes. We demonstrate the permutationally invariant network for enhanced sampling (PINES) approach in applications to the self-assembly of a 13-atom argon cluster, association/dissociation of a NaCl ion pair in water, and hydrophobic collapse of a C45H92 n-pentatetracontane polymer chain. We make the approach freely available as a new module within the PLUMED2 enhanced sampling libraries.
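The invariance requirements listed in this abstract (rigid translation, rigid rotation, and permutation of identical particles) can be demonstrated with a simplified feature: the sorted vector of pairwise distances. This is only a sketch of the idea behind permutation invariant vectors. The function name `sorted_distance_vector` is hypothetical, all particles are assumed identical, and no PLUMED or PINES API is used or reproduced.

```python
import numpy as np

def sorted_distance_vector(coords):
    """Sorted pairwise distances for identical particles, coords shaped (n_atoms, 3)."""
    coords = np.asarray(coords, dtype=float)
    diffs = coords[:, None, :] - coords[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    upper = np.triu_indices(len(coords), k=1)
    # Distances remove translation and rotation; sorting removes the particle labeling.
    return np.sort(dists[upper])

rng = np.random.default_rng(1)
coords = rng.normal(size=(13, 3))  # e.g. a 13-particle cluster with random positions

# Apply a relabeling, a rotation about z, and a translation to the same configuration.
perm = rng.permutation(len(coords))
theta = 0.7
rotation = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
transformed = coords[perm] @ rotation.T + np.array([1.0, -2.0, 0.5])

features_original = sorted_distance_vector(coords)
features_transformed = sorted_distance_vector(transformed)
```

Both feature vectors are identical up to floating-point error, so any network fed with them inherits the three invariances.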
Affiliation(s)
- Siva Dasetty
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Diya Gandhi
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Junhee Lee
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
11. Ishizone T, Matsunaga Y, Fuchigami S, Nakamura K. Representation of Protein Dynamics Disentangled by Time-Structure-Based Prior. J Chem Theory Comput 2024; 20:436-450. [PMID: 38151233 DOI: 10.1021/acs.jctc.3c01025] [Indexed: 12/29/2023]
Abstract
Representation learning (RL) is a universal technique for deriving low-dimensional disentangled representations from high-dimensional observations, aiding in a multitude of downstream tasks. RL has been extensively applied to various data types, including images and natural language. Here, we analyze molecular dynamics (MD) simulation data of biomolecules in terms of RL. Currently, state-of-the-art RL techniques, mainly motivated by the variational principle, try to capture slow motions in the representation (latent) space. Here, we propose two methods based on an alternative perspective on the disentanglement in the latent space. By disentanglement, we here mean the separation of underlying factors in the simulation data, aiding in detecting physically important coordinates for conformational transitions. The proposed methods introduce a simple prior that imposes temporal constraints in the latent space, serving as a regularization term to facilitate the capture of disentangled representations of dynamics. Comparison with other methods via the analysis of MD simulation trajectories for alanine dipeptide and chignolin validates that the proposed methods construct Markov state models (MSMs) whose implied time scales are comparable to those of the state-of-the-art methods. Using a measure based on total variation, we quantitatively evaluated that the proposed methods successfully disentangle physically important coordinates, aiding the interpretation of folding/unfolding transitions of chignolin. Overall, our methods provide good representations of complex biomolecular dynamics for downstream tasks, allowing for better interpretations of the conformational transitions.
Affiliation(s)
- Tsuyoshi Ishizone
  - Mathematical Sciences Program, Graduate School of Advanced Mathematical Sciences, Meiji University, Nakano 4-21-1, Nakano-ku, Tokyo 164-8525, Japan
- Yasuhiro Matsunaga
  - Graduate School of Science and Engineering, Saitama University, Shimo-Okubo 255, Sakura-ku, Saitama-shi, Saitama 338-8570, Japan
- Sotaro Fuchigami
  - Physical Biochemistry Laboratory, Division of Pharmaceutical Sciences, School of Pharmaceutical Sciences, University of Shizuoka, 52-1 Yada, Suruga-ku, Shizuoka 422-8526, Japan
- Kazuyuki Nakamura
  - Department of Mathematical Sciences Based on Modeling and Analysis, School of Interdisciplinary Mathematical Sciences, Meiji University, Nakano 4-21-1, Nakano-ku, Tokyo 164-8525, Japan
12
Fu H, Liu H, Xing J, Zhao T, Shao X, Cai W. Deep-Learning-Assisted Enhanced Sampling for Exploring Molecular Conformational Changes. J Phys Chem B 2023; 127:9926-9935. [PMID: 37947397 DOI: 10.1021/acs.jpcb.3c05284] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 11/12/2023]
Abstract
We present a novel strategy to explore conformational changes and identify stable states of molecular objects, eliminating the need for a priori knowledge. The approach applies a deep learning method to extract information about the movement modes of the molecular object from a short, high-dimensional, and parameter-free preliminary enhanced-sampling simulation. The gathered information is described by a small set of deep-learning-based collective variables (dCVs), which steer the production-enhanced-sampling simulation. Considering the challenge of adequately exploring the configurational space using the low-dimensional, suboptimal dCVs, we incorporate a method designed for ergodic sampling, namely, Gaussian-accelerated molecular dynamics (MD), into the framework of CV-based enhanced sampling. MD simulations on both toy models and nontrivial examples demonstrate the remarkable computational efficiency of the strategy in capturing the conformational changes of molecular objects without a priori knowledge. Specifically, we achieved the blind folding of two fast folders, chignolin and villin, within a time scale of hundreds of nanoseconds and successfully reconstructed the free-energy landscapes that characterize their reversible folding. All in all, the presented strategy holds significant promise for investigating conformational changes in macromolecules, and it is anticipated to find extensive applications in the fields of chemistry and biology.
Affiliation(s)
- Haohao Fu
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Han Liu
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Jingya Xing
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Tong Zhao
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Xueguang Shao
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
13
Siddiqui GA, Stebani JA, Wragg D, Koutsourelakis PS, Casini A, Gagliardi A. Application of Machine Learning Algorithms to Metadynamics for the Elucidation of the Binding Modes and Free Energy Landscape of Drug/Target Interactions: a Case Study. Chemistry 2023; 29:e202302375. [PMID: 37555841 DOI: 10.1002/chem.202302375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 08/09/2023] [Indexed: 08/10/2023]
Abstract
In the context of drug discovery, computational methods were able to accelerate the challenging process of designing and optimizing a new drug candidate. Amongst the possible atomistic simulation approaches, metadynamics (metaD) has proven very powerful. However, the choice of collective variables (CVs) is not trivial for complex systems. To automate the process of CV identification, two different machine learning algorithms, namely DeepLDA and Autoencoder, were applied in this study to the metaD simulation of a well-researched drug/target complex, consisting of a pharmacologically relevant non-canonical DNA secondary structure (G-quadruplex) and a metallodrug acting as its stabilizer, as well as solvent molecules.
Affiliation(s)
- Gohar Ali Siddiqui
  - Professorship of Simulation of Nanosystems for Energy Conversion Department of Electrical and Computer Engineering School of Computation, Information and Technology, Technical University of Munich (TUM), Hans-Piloty-Str. 1, 85748, Garching b. München, Germany
- Julia A Stebani
  - Chair of Medicinal and Bioinorganic Chemistry Department of Chemistry, School of Natural Sciences, Technical University of Munich (TUM), Lichtenbergstr. 4, 85748, Garching b. München, Germany
- Darren Wragg
  - Chair of Medicinal and Bioinorganic Chemistry Department of Chemistry, School of Natural Sciences, Technical University of Munich (TUM), Lichtenbergstr. 4, 85748, Garching b. München, Germany
- Phaedon-Stelios Koutsourelakis
  - Professorship for Data-driven Materials Modeling School of Engineering and Design, Technical University of Munich (TUM), Boltzmannstr. 15, 85748, Garching b. München, Germany
- Angela Casini
  - Chair of Medicinal and Bioinorganic Chemistry Department of Chemistry, School of Natural Sciences, Technical University of Munich (TUM), Lichtenbergstr. 4, 85748, Garching b. München, Germany
- Alessio Gagliardi
  - Professorship of Simulation of Nanosystems for Energy Conversion Department of Electrical and Computer Engineering School of Computation, Information and Technology, Technical University of Munich (TUM), Hans-Piloty-Str. 1, 85748, Garching b. München, Germany
14
Liu B, Xue M, Qiu Y, Konovalov KA, O’Connor MS, Huang X. GraphVAMPnets for uncovering slow collective variables of self-assembly dynamics. J Chem Phys 2023; 159:094901. [PMID: 37655771 PMCID: PMC11005469 DOI: 10.1063/5.0158903] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/18/2023] [Accepted: 08/11/2023] [Indexed: 09/02/2023] Open
Abstract
Uncovering slow collective variables (CVs) of self-assembly dynamics is important to elucidate its numerous kinetic assembly pathways and drive the design of novel structures for advanced materials through the bottom-up approach. However, identifying the CVs for self-assembly presents several challenges. First, self-assembly systems often consist of identical monomers, and the feature representations should be invariant to permutations and rotational symmetries. Physical coordinates, such as aggregate size, lack high-resolution detail, while common geometric coordinates like pairwise distances are hindered by the permutation and rotational symmetry challenges. Second, self-assembly is usually a downhill process, and the trajectories often suffer from insufficient sampling of backward transitions that correspond to the dissociation of self-assembled structures. Popular dimensionality reduction methods, such as time-structure independent component analysis, impose detailed balance constraints, potentially obscuring the true dynamics of self-assembly. In this work, we employ GraphVAMPnets, which combines graph neural networks with a variational approach for Markovian process (VAMP) theory to identify the slow CVs of the self-assembly processes. First, GraphVAMPnets bears the advantages of graph neural networks, in which the graph embeddings can represent self-assembly structures in high-resolution while being invariant to permutations and rotational symmetries. Second, it is built upon VAMP theory, which studies Markov processes without forcing detailed balance constraints, which addresses the out-of-equilibrium challenge in the self-assembly process. We demonstrate GraphVAMPnets for identifying slow CVs of self-assembly kinetics in two systems: the aggregation of two hydrophobic molecules and the self-assembly of patchy particles. We expect that our GraphVAMPnets can be widely applied to molecular self-assembly.
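The VAMP objective that such networks maximize can be illustrated with a VAMP-2 score computed from time-lagged feature pairs. This is a minimal sketch, assuming mean-removed features (some definitions add 1 for the constant singular function); the function names and random test data are illustrative and are not the authors' implementation. The instantaneous and time-lagged covariances are kept separate and the cross-covariance is not symmetrized, which is how the score avoids imposing detailed balance.

```python
import numpy as np

def inv_sqrt(C, eps=1e-10):
    """Inverse square root of a symmetric PSD matrix, dropping tiny eigenvalues."""
    w, V = np.linalg.eigh(C)
    keep = w > eps
    return (V[:, keep] / np.sqrt(w[keep])) @ V[:, keep].T

def vamp2_score(x0, xt):
    """VAMP-2 score of feature pairs (x0[i], xt[i]) separated by one lag time."""
    x0 = x0 - x0.mean(axis=0)
    xt = xt - xt.mean(axis=0)
    n = len(x0)
    C00 = x0.T @ x0 / n   # instantaneous covariance
    C0t = x0.T @ xt / n   # time-lagged cross-covariance (not symmetrized)
    Ctt = xt.T @ xt / n   # covariance of the time-shifted data
    K = inv_sqrt(C00) @ C0t @ inv_sqrt(Ctt)
    # Squared Frobenius norm equals the sum of squared singular values of K.
    return np.sum(K ** 2)

rng = np.random.default_rng(0)
features = rng.normal(size=(5000, 3))
score_identical = vamp2_score(features, features)
score_independent = vamp2_score(features, rng.normal(size=(5000, 3)))
```

Perfectly correlated pairs give a score equal to the feature dimension, while statistically independent pairs give a score close to zero.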
Affiliation(s)
- Bojun Liu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Mingyi Xue
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Yunrui Qiu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Kirill A. Konovalov
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Michael S. O’Connor
  - Biophysics Graduate Program, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Xuhui Huang
  - Author to whom correspondence should be addressed:
15
Van Speybroeck V, Bocus M, Cnudde P, Vanduyfhuys L. Operando Modeling of Zeolite-Catalyzed Reactions Using First-Principles Molecular Dynamics Simulations. ACS Catal 2023; 13:11455-11493. [PMID: 37671178 PMCID: PMC10476167 DOI: 10.1021/acscatal.3c01945] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/30/2023] [Revised: 07/27/2023] [Indexed: 09/07/2023]
Abstract
Within this Perspective, we critically reflect on the role of first-principles molecular dynamics (MD) simulations in unraveling the catalytic function within zeolites under operating conditions. First-principles MD simulations refer to methods where the dynamics of the nuclei is followed in time by integrating the Newtonian equations of motion on a potential energy surface that is determined by solving the quantum-mechanical many-body problem for the electrons. Catalytic solids used in industrial applications show an intriguing high degree of complexity, with phenomena taking place at a broad range of length and time scales. Additionally, the state and function of a catalyst critically depend on the operating conditions, such as temperature, moisture, presence of water, etc. Herein we show by means of a series of exemplary cases how first-principles MD simulations are instrumental to unravel the catalyst complexity at the molecular scale. Examples show how the nature of reactive species at higher catalytic temperatures may drastically change compared to species at lower temperatures and how the nature of active sites may dynamically change upon exposure to water. To simulate rare events, first-principles MD simulations need to be used in combination with enhanced sampling techniques to efficiently sample low-probability regions of phase space. Using these techniques, it is shown how competitive pathways at operating conditions can be discovered and how broad transition state regions can be explored. Interestingly, such simulations can also be used to study hindered diffusion under operating conditions. The cases shown clearly illustrate how first-principles MD simulations reveal insights into the catalytic function at operating conditions, which could not be discovered using static or local approaches where only a few points are considered on the potential energy surface (PES). 
Despite these advantages, some major hurdles still exist to fully integrate first-principles MD methods in a standard computational catalytic workflow or to use the output of MD simulations as input for multiple length/time scale methods that aim to bridge to the reactor scale. First of all, methods are needed that allow us to evaluate the interatomic forces with quantum-mechanical accuracy, albeit at a much lower computational cost compared to currently used density functional theory (DFT) methods. The use of DFT limits the currently attainable length/time scales to hundreds of picoseconds and a few nanometers, which are much smaller than realistic catalyst particle dimensions and time scales encountered in the catalysis process. One solution could be to construct machine learning potentials (MLPs), where a numerical potential is derived from underlying quantum-mechanical data, which could be used in subsequent MD simulations. As such, much longer length and time scales could be reached; however, quite some research is still necessary to construct MLPs for the complex systems encountered in industrially used catalysts. Second, most currently used enhanced sampling techniques in catalysis make use of collective variables (CVs), which are mostly determined based on chemical intuition. To explore complex reactive networks with MD simulations, methods are needed that allow the automatic discovery of CVs or methods that do not rely on a priori definition of CVs. Recently, various data-driven methods have been proposed, which could be explored for complex catalytic systems. Lastly, first-principles MD methods are currently mostly used to investigate local reactive events. We hope that with the rise of data-driven methods and more efficient methods to describe the PES, first-principles MD methods will in the future also be able to describe longer length/time scale processes in catalysis. 
This might lead to a consistent dynamic description of all steps (diffusion, adsorption, and reaction) as they take place at the catalyst particle level.
Affiliation(s)
- Massimo Bocus
  - Center for Molecular Modeling, Ghent University, Technologiepark 46, 9052 Zwijnaarde, Belgium
- Pieter Cnudde
  - Center for Molecular Modeling, Ghent University, Technologiepark 46, 9052 Zwijnaarde, Belgium
- Louis Vanduyfhuys
  - Center for Molecular Modeling, Ghent University, Technologiepark 46, 9052 Zwijnaarde, Belgium
16
Qiu Y, O’Connor MS, Xue M, Liu B, Huang X. An Efficient Path Classification Algorithm Based on Variational Autoencoder to Identify Metastable Path Channels for Complex Conformational Changes. J Chem Theory Comput 2023; 19:4728-4742. [PMID: 37382437 PMCID: PMC11042546 DOI: 10.1021/acs.jctc.3c00318] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/30/2023]
Abstract
Conformational changes (i.e., dynamic transitions between pairs of conformational states) play important roles in many chemical and biological processes. Constructing the Markov state model (MSM) from extensive molecular dynamics (MD) simulations is an effective approach to dissect the mechanism of conformational changes. When combined with transition path theory (TPT), MSM can be applied to elucidate the ensemble of kinetic pathways connecting pairs of conformational states. However, the application of TPT to analyze complex conformational changes often results in a vast number of kinetic pathways with comparable fluxes. This obstacle is particularly pronounced in heterogeneous self-assembly and aggregation processes. The large number of kinetic pathways makes it challenging to comprehend the molecular mechanisms underlying conformational changes of interest. To address this challenge, we have developed a path classification algorithm named latent-space path clustering (LPC) that efficiently lumps parallel kinetic pathways into distinct metastable path channels, making them easier to comprehend. In our algorithm, MD conformations are first projected onto a low-dimensional space containing a small set of collective variables (CVs) by time-structure-based independent component analysis (tICA) with kinetic mapping. Then, MSM and TPT are constructed to obtain the ensemble of pathways, and a deep learning architecture named the variational autoencoder (VAE) is used to learn the spatial distributions of kinetic pathways in the continuous CV space. Based on the trained VAE model, the TPT-generated ensemble of kinetic pathways can be embedded into a latent space, where the classification becomes clear. We show that LPC can efficiently and accurately identify the metastable path channels in three systems: a 2D potential, the aggregation of two hydrophobic particles in water, and the folding of the Fip35 WW domain. 
Using the 2D potential, we further demonstrate that our LPC algorithm outperforms the previous path-lumping algorithms by making substantially fewer incorrect assignments of individual pathways to four path channels. We expect that LPC can be widely applied to identify the dominant kinetic pathways underlying complex conformational changes.
Affiliation(s)
- Yunrui Qiu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Michael S. O’Connor
  - Biophysics Graduate Program, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Mingyi Xue
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Bojun Liu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Xuhui Huang
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
  - Biophysics Graduate Program, University of Wisconsin-Madison, Madison, WI, 53706, USA
17
Sasmal S, McCullagh M, Hocky GM. Reaction Coordinates for Conformational Transitions Using Linear Discriminant Analysis on Positions. J Chem Theory Comput 2023; 19:4427-4435. [PMID: 37130367 PMCID: PMC10373481 DOI: 10.1021/acs.jctc.3c00051] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 01/11/2023] [Indexed: 05/04/2023]
Abstract
In this work, we demonstrate that Linear Discriminant Analysis (LDA) applied to atomic positions in two different states of a biomolecule produces a good reaction coordinate between those two states. Atomic coordinates of a macromolecule are a direct representation of a macromolecular configuration, and yet, they are not used in enhanced sampling studies due to a lack of rotational and translational invariance. We resolve this issue using the technique of our prior work, whereby a molecular configuration is considered a member of an equivalence class in size-and-shape space, which is the set of all configurations that can be translated and rotated to a single point within a reference multivariate Gaussian distribution characterizing a single molecular state. The reaction coordinates produced by LDA applied to positions are shown to be good reaction coordinates both in terms of characterizing the transition between two states of a system within a long molecular dynamics (MD) simulation and also ones that allow us to readily produce free energy estimates along that reaction coordinate using enhanced sampling MD techniques.
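The core LDA step can be sketched as a two-class Fisher discriminant: the reaction coordinate is the projection onto the direction that best separates the two states relative to their pooled within-state scatter. This is a minimal sketch only. It assumes each row is a flattened coordinate vector that has already been aligned, and it omits the size-and-shape-space alignment described in the abstract; the function name and synthetic data are illustrative.

```python
import numpy as np

def lda_direction(a, b):
    """Fisher LDA direction separating samples a and b (rows are configurations)."""
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = (a - mean_a).T @ (a - mean_a) + (b - mean_b).T @ (b - mean_b)
    # Solve Sw w = (mean_b - mean_a) rather than forming an explicit inverse.
    w = np.linalg.solve(Sw, mean_b - mean_a)
    return w / np.linalg.norm(w)

# Synthetic stand-in for two states that differ along the first coordinate only.
rng = np.random.default_rng(1)
state_a = rng.normal(size=(500, 6))
state_b = rng.normal(size=(500, 6))
state_b[:, 0] += 4.0

w = lda_direction(state_a, state_b)
proj_a, proj_b = state_a @ w, state_b @ w  # 1D reaction-coordinate values
```

In this toy case the recovered direction is dominated by the first coordinate, and the two projected distributions are well separated.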
Affiliation(s)
- Subarna Sasmal
  - Department of Chemistry and Simons Center for Computational Physical Chemistry, New York University, New York, New York 10003, United States
- Martin McCullagh
  - Department of Chemistry, Oklahoma State University, Stillwater, Oklahoma 74078, United States
- Glen M. Hocky
  - Department of Chemistry and Simons Center for Computational Physical Chemistry, New York University, New York, New York 10003, United States
18
Conev A, Rigo MM, Devaurs D, Fonseca AF, Kalavadwala H, de Freitas MV, Clementi C, Zanatta G, Antunes DA, Kavraki LE. EnGens: a computational framework for generation and analysis of representative protein conformational ensembles. Brief Bioinform 2023; 24:bbad242. [PMID: 37418278 PMCID: PMC10359083 DOI: 10.1093/bib/bbad242] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/11/2023] [Revised: 05/23/2023] [Accepted: 06/10/2023] [Indexed: 07/08/2023] Open
Abstract
Proteins are dynamic macromolecules that perform vital functions in cells. A protein structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understand their mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insights into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in the number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing representative protein conformational ensembles. In this work, we: (1) provide an overview of existing methods and tools for representative protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package, and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; (3) test our pipeline on a few canonical examples from the literature. Representative ensembles produced by EnGens can be used for many downstream tasks such as protein-ligand ensemble docking, Markov state modeling of protein dynamics and analysis of the effect of single-point mutations.
Affiliation(s)
- Anja Conev
  - Department of Computer Science, Rice University, Houston 77005, TX, USA
- Didier Devaurs
  - MRC Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK
- Hussain Kalavadwala
  - Department of Biology and Biochemistry, University of Houston, Houston 77004, TX, USA
- Cecilia Clementi
  - Department of Physics, Freie Universität Berlin, Berlin 14195, Germany
- Geancarlo Zanatta
  - Department of Biophysics, Institute of Biosciences, Federal University of Rio Grande do Sul, Porto Alegre 91501-970, Brazil
- Dinler Amaral Antunes
  - Department of Biology and Biochemistry, University of Houston, Houston 77004, TX, USA
- Lydia E Kavraki
  - Department of Computer Science, Rice University, Houston 77005, TX, USA
19
Strahan J, Guo SC, Lorpaiboon C, Dinner AR, Weare J. Inexact iterative numerical linear algebra for neural network-based spectral estimation and rare-event prediction. J Chem Phys 2023; 159:014110. [PMID: 37409704 PMCID: PMC10328561 DOI: 10.1063/5.0151309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 06/02/2023] [Indexed: 07/07/2023] Open
Abstract
Understanding dynamics in complex systems is challenging because there are many degrees of freedom, and those that are most important for describing events of interest are often not obvious. The leading eigenfunctions of the transition operator are useful for visualization, and they can provide an efficient basis for computing statistics, such as the likelihood and average time of events (predictions). Here, we develop inexact iterative linear algebra methods for computing these eigenfunctions (spectral estimation) and making predictions from a dataset of short trajectories sampled at finite intervals. We demonstrate the methods on a low-dimensional model that facilitates visualization and a high-dimensional model of a biomolecular system. Implications for the prediction problem in reinforcement learning are discussed.
Affiliation(s)
- John Strahan
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, USA
- Spencer C. Guo
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, USA
- Chatipat Lorpaiboon
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, USA
- Aaron R. Dinner
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, USA
- Jonathan Weare
  - Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, USA
20
Chen H, Roux B, Chipot C. Discovering Reaction Pathways, Slow Variables, and Committor Probabilities with Machine Learning. J Chem Theory Comput 2023. [PMID: 37224455 DOI: 10.1021/acs.jctc.3c00028] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 05/26/2023]
Abstract
A significant challenge faced by atomistic simulations is the difficulty, and often impossibility, to sample the transitions between metastable states of the free-energy landscape associated with slow molecular processes. Importance-sampling schemes represent an appealing option to accelerate the underlying dynamics by smoothing out the relevant free-energy barriers, but require the definition of suitable reaction-coordinate (RC) models expressed in terms of compact low-dimensional sets of collective variables (CVs). While most computational studies of slow molecular processes have traditionally relied on educated guesses based on human intuition to reduce the dimensionality of the problem at hand, a variety of machine-learning (ML) algorithms have recently emerged as powerful alternatives to discover meaningful CVs capable of capturing the dynamics of the slowest degrees of freedom. Considering a simple paradigmatic situation in which the long-time dynamics is dominated by the transition between two known metastable states, we compare two variational data-driven ML methods based on Siamese neural networks aimed at discovering a meaningful RC model: the slowest decorrelating CV of the molecular process, and the committor probability to first reach one of the two metastable states. One method is the state-free reversible variational approach for Markov processes networks (VAMPnets), or SRVs; the other, inspired by the transition path theory framework, is the variational committor-based neural networks, or VCNs. The relationship and the ability of these methodologies to discover the relevant descriptors of the slow molecular process of interest are illustrated with a series of simple model systems. We also show that both strategies are amenable to importance-sampling schemes through an appropriate reweighting algorithm that approximates the kinetic properties of the transition.
Affiliation(s)
- Haochuan Chen
  - Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, B.P. 70239, 54506 Vandœuvre-lès-Nancy cedex, France
- Benoît Roux
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, 60637, United States
- Christophe Chipot
  - Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, B.P. 70239, 54506 Vandœuvre-lès-Nancy cedex, France
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, 60637, United States
  - NIH Center for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
21
Bandyopadhyay S, Mondal J. A deep encoder-decoder framework for identifying distinct ligand binding pathways. J Chem Phys 2023; 158:2890463. [PMID: 37184003 DOI: 10.1063/5.0145197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 04/25/2023] [Indexed: 05/16/2023] Open
Abstract
The pathway(s) that a ligand would adopt en route to its trajectory to the native pocket of the receptor protein act as a key determinant of its biological activity. While Molecular Dynamics (MD) simulations have emerged as the method of choice for modeling protein-ligand binding events, the high dimensional nature of the MD-derived trajectories often remains a barrier in the statistical elucidation of distinct ligand binding pathways due to the stochasticity inherent in the ligand's fluctuation in the solution and around the receptor. Here, we demonstrate that an autoencoder based deep neural network, trained using an objective input feature of a large matrix of residue-ligand distances, can efficiently produce an optimal low-dimensional latent space that stores necessary information on the ligand-binding event. In particular, for a system of L99A mutant of T4 lysozyme interacting with its native ligand, benzene, this deep encoder-decoder framework automatically identifies multiple distinct recognition pathways, without requiring user intervention. The intermediates involve the spatially discrete location of the ligand in different helices of the protein before its eventual recognition of native pose. The compressed subspace derived from the autoencoder provides a quantitatively accurate measure of the free energy and kinetics of ligand binding to the native pocket. The investigation also recommends that while a linear dimensional reduction technique, such as time-structured independent component analysis, can do a decent job of state-space decomposition in cases where the intermediates are long-lived, autoencoder is the method of choice in systems where transient, low-populated intermediates can lead to multiple ligand-binding pathways.
Affiliation(s)
- Satyabrata Bandyopadhyay
  - Tata Institute of Fundamental Research, Center for Interdisciplinary Sciences, Hyderabad 500046, India
- Jagannath Mondal
  - Tata Institute of Fundamental Research, Center for Interdisciplinary Sciences, Hyderabad 500046, India
22
Wu M, Liao J, Shu Z, Chen C. Enhanced sampling in explicit solvent by deep learning module in FSATOOL. J Comput Chem 2023. [PMID: 37191088 DOI: 10.1002/jcc.27132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 04/21/2023] [Accepted: 04/27/2023] [Indexed: 05/17/2023]
Abstract
FSATOOL is an integrated molecular simulation and data analysis program. Its old molecular dynamics engine only supports simulations in vacuum or implicit solvent. In this work, we implement the well-known smooth particle mesh Ewald method for simulations in explicit solvent. The newly developed engine runs on both CPU and GPU. All existing analysis modules in the program are compatible with the new engine. Moreover, we also build a complete deep learning module in FSATOOL. Based on this module, we further implement two useful trajectory analysis methods: state-free reversible VAMPnets and the time-lagged autoencoder. They are good at finding the collective variables related to the conformational transitions of biomolecules. In FSATOOL, these collective variables can be further used to construct a bias potential for enhanced sampling. We introduce the implementation details of the methods and present their actual performance in FSATOOL through a few enhanced sampling simulations.
Affiliation(s)
- Mincong Wu
  - Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Jun Liao
  - Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Zirui Shu
  - Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Changjun Chen
  - Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
23
Conev A, Rigo MM, Devaurs D, Fonseca AF, Kalavadwala H, de Freitas MV, Clementi C, Zanatta G, Antunes DA, Kavraki L. EnGens: a computational framework for generation and analysis of representative protein conformational ensembles. bioRxiv 2023:2023.04.24.538094. [PMID: 37163076 PMCID: PMC10168271 DOI: 10.1101/2023.04.24.538094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Proteins are dynamic macromolecules that perform vital functions in cells. A protein's structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understanding their mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insights into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in the number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task, and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing protein conformational ensembles. In this work we: (1) provide an overview of existing methods and tools for protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package, and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; (3) test our pipeline on a few canonical examples found in the literature. Representative ensembles produced by EnGens can be used for many downstream tasks such as protein-ligand ensemble docking, Markov state modeling of protein dynamics and analysis of the effect of single-point mutations.
24
Shmilovich K, Ferguson AL. Girsanov Reweighting Enhanced Sampling Technique (GREST): On-the-Fly Data-Driven Discovery of and Enhanced Sampling in Slow Collective Variables. J Phys Chem A 2023; 127:3497-3517. [PMID: 37036804 DOI: 10.1021/acs.jpca.3c00505] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 04/11/2023]
Abstract
Molecular dynamics simulations of microscopic phenomena are limited by the short integration time steps which are required for numerical stability but which limit the practically achievable simulation time scales. Collective variable (CV) enhanced sampling techniques apply biases to predefined collective coordinates to promote barrier crossing, phase space exploration, and sampling of rare events. The efficacy of these techniques is contingent on the selection of good CVs correlated with the molecular motions governing the long-time dynamical evolution of the system. In this work, we introduce Girsanov Reweighting Enhanced Sampling Technique (GREST) as an adaptive sampling scheme that interleaves rounds of data-driven slow CV discovery and enhanced sampling along these coordinates. Since slow CVs are inherently dynamical quantities, a key ingredient in our approach is the use of both thermodynamic and dynamical Girsanov reweighting corrections for rigorous estimation of slow CVs from biased simulation data. We demonstrate our approach on a toy 1D 4-well potential, a simple biomolecular system alanine dipeptide, and the Trp-Leu-Ala-Leu-Leu (WLALL) pentapeptide. In each case GREST learns appropriate slow CVs and drives sampling of all thermally accessible metastable states starting from zero prior knowledge of the system. We make GREST accessible to the community via a publicly available open source Python package.
Affiliation(s)
- Kirill Shmilovich
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
25
Dutta P, Sengupta N. Efficient Interrogation of the Kinetic Barriers Demarcating Catalytic States of a Tyrosine Kinase with Optimal Physical Descriptors and Mixture Models. Chemphyschem 2023; 24:e202200595. [PMID: 36394126 DOI: 10.1002/cphc.202200595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 11/16/2022] [Accepted: 11/16/2022] [Indexed: 11/18/2022]
Abstract
Computer simulations are increasingly used to access thermo-kinetic information underlying the structural transformation of protein kinases. Such information is necessary to probe their roles in disease progression and interactions with drug targets. However, the investigations are frequently challenged by forbiddingly high computational expense, and by the lack of standard protocols for the design of low-dimensional physical descriptors that encode system features important for transitions. Here, we consider the demarcating characteristics of the different states of Abelson tyrosine kinase associated with distinct catalytic activity to construct a set of physically meaningful, orthogonal collective variables that preserve the slow modes of the system. Independent sampling of each metastable state is followed by the estimation of the global partition function along the appropriate physical descriptors using the modified Expectation Maximized Molecular Dynamics method. The resultant free energy barriers are in excellent agreement with experimentally known rate-limiting dynamics and activation energy computed with conventional enhanced sampling methods. We discuss possible directions for further development and applications.
Affiliation(s)
- Pallab Dutta
  - Department of Biological Sciences, Indian Institute of Science Education and Research (IISER) Kolkata, Mohanpur, 741246, India
- Neelanjana Sengupta
  - Department of Biological Sciences, Indian Institute of Science Education and Research (IISER) Kolkata, Mohanpur, 741246, India
26
Chen H, Chipot C. Chasing collective variables using temporal data-driven strategies. QRB Discovery 2023; 4:e2. [PMID: 37564298 PMCID: PMC10411323 DOI: 10.1017/qrd.2022.23] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/11/2022] [Revised: 12/21/2022] [Accepted: 12/29/2022] [Indexed: 01/09/2023] Open
Abstract
The convergence of free-energy calculations based on importance sampling depends heavily on the choice of collective variables (CVs), which in principle, should include the slow degrees of freedom of the biological processes to be investigated. Autoencoders (AEs), as emerging data-driven dimension reduction tools, have been utilised for discovering CVs. AEs, however, are often treated as black boxes, and what AEs actually encode during training, and whether the latent variables from encoders are suitable as CVs for further free-energy calculations remains unknown. In this contribution, we review AEs and their time-series-based variants, including time-lagged AEs (TAEs) and modified TAEs, as well as the closely related model variational approach for Markov processes networks (VAMPnets). We then show through numerical examples that AEs learn the high-variance modes instead of the slow modes. In stark contrast, time series-based models are able to capture the slow modes. Moreover, both modified TAEs with extensions from slow feature analysis and the state-free reversible VAMPnets (SRVs) can yield orthogonal multidimensional CVs. As an illustration, we employ SRVs to discover the CVs of the isomerizations of N-acetyl-N'-methylalanylamide and trialanine by iterative learning with trajectories from biased simulations. Last, through numerical experiments with anisotropic diffusion, we investigate the potential relationship of time-series-based models and committor probabilities.
Affiliation(s)
- Haochuan Chen
  - Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
- Christophe Chipot
  - Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
  - Theoretical and Computational Biophysics Group, Beckman Institute, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA
27
Ketkaew R, Luber S. DeepCV: A Deep Learning Framework for Blind Search of Collective Variables in Expanded Configurational Space. J Chem Inf Model 2022; 62:6352-6364. [PMID: 36445176 DOI: 10.1021/acs.jcim.2c00883] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/02/2022]
Abstract
We present Deep learning for Collective Variables (DeepCV), a computer code that provides an efficient and customizable implementation of the deep autoencoder neural network (DAENN) algorithm that has been developed in our group for computing collective variables (CVs) and can be used with enhanced sampling methods to reconstruct free energy surfaces of chemical reactions. DeepCV can be used to conveniently calculate molecular features, train models, generate CVs, validate rare events from sampling, and analyze a trajectory for chemical reactions of interest. We use DeepCV in an example study of the conformational transition of cyclohexene, where metadynamics simulations are performed using DAENN-generated CVs. The results show that the adopted CVs give free energies in line with those obtained by previously developed CVs and experimental results. DeepCV is open-source software written in Python/C++ object-oriented languages, based on the TensorFlow framework and distributed free of charge for noncommercial purposes, which can be incorporated into general molecular dynamics software. DeepCV also comes with several additional tools, i.e., an application program interface (API), documentation, and tutorials.
Affiliation(s)
- Rangsiman Ketkaew
  - Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
- Sandra Luber
  - Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
28
Mardt A, Hempel T, Clementi C, Noé F. Deep learning to decompose macromolecules into independent Markovian domains. Nat Commun 2022; 13:7101. [PMID: 36402768 PMCID: PMC9675806 DOI: 10.1038/s41467-022-34603-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Accepted: 10/27/2022] [Indexed: 11/21/2022] Open
Abstract
The increasing interest in modeling the dynamics of ever larger proteins has revealed a fundamental problem with models that describe the molecular system as being in a global configuration state. This notion limits our ability to gather sufficient statistics of state probabilities or state-to-state transitions because for large molecular systems the number of metastable states grows exponentially with size. In this manuscript, we approach this challenge by introducing a method that combines our recent progress on independent Markov decomposition (IMD) with VAMPnets, a deep learning approach to Markov modeling. We establish a training objective that quantifies how well a given decomposition of the molecular system into independent subdomains with Markovian dynamics approximates the overall dynamics. By constructing an end-to-end learning framework, the decomposition into such subdomains and their individual Markov state models are simultaneously learned, providing a data-efficient and easily interpretable summary of the complex system dynamics. While learning the dynamical coupling between Markovian subdomains is still an open issue, the present results are a significant step towards learning Ising models of large molecular complexes from simulation data.
Affiliation(s)
- Andreas Mardt
  - Freie Universität Berlin, Department of Mathematics and Computer Science, Berlin, Germany
- Tim Hempel
  - Freie Universität Berlin, Department of Mathematics and Computer Science, Berlin, Germany
  - Freie Universität Berlin, Department of Physics, Berlin, Germany
- Cecilia Clementi
  - Freie Universität Berlin, Department of Physics, Berlin, Germany
  - Rice University, Department of Chemistry, Houston, TX, USA
  - Rice University, Center for Theoretical Biological Physics, Houston, TX, USA
- Frank Noé
  - Freie Universität Berlin, Department of Mathematics and Computer Science, Berlin, Germany
  - Freie Universität Berlin, Department of Physics, Berlin, Germany
  - Rice University, Department of Chemistry, Houston, TX, USA
  - Microsoft Research AI4Science, Berlin, Germany
29
Kawada R, Endo K, Yuhara D, Yasuoka K. MD-GAN with multi-particle input: the machine learning of long-time molecular behavior from short-time MD data. Soft Matter 2022; 18:8446-8455. [PMID: 36314893 DOI: 10.1039/d2sm00852a] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/16/2023]
Abstract
Molecular dynamics simulation is a method of investigating the behavior of molecules, which is useful for analyzing a variety of structural and dynamic properties and mechanisms of phenomena. However, the huge computational cost of large-scale and long-time simulations is an enduring problem that must be addressed. MD-GAN is a machine learning-based method that can evolve part of the system at any time step, accelerating the generation of molecular dynamics data [Endo et al., Proceedings of the AAAI Conference on Artificial Intelligence, 2018, 32]. For the accurate prediction of MD-GAN, sufficient information on the dynamics of a part of the system should be included with the training data. Therefore, the selection of the part of the system is important for efficient learning. In a previous study, only one particle (or vector) of each molecule was extracted as part of the system. The effectiveness of adding information from other particles to the learning process is investigated in this study. When the dynamics of three particles of each molecule were used in the polyethylene experiment, the diffusion was successfully predicted using the training data with a time length of approximately 40%, compared to the single-particle input. Surprisingly, the unobserved transition of diffusion in the training data was also predicted using this method. The reduced cost for the generation of training MD data achieved in this study is useful for accelerating MD-GAN.
Affiliation(s)
- Ryo Kawada
  - Department of Mechanical Engineering, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa, 223-8522, Japan
- Katsuhiro Endo
  - Department of Mechanical Engineering, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa, 223-8522, Japan
- Daisuke Yuhara
  - Materials Design Laboratory, Science & Innovation Center, R&D Transformation Div., Mitsubishi Chemical Holdings Group, 1000 Kamoshida-cho, Aoba-ku, Yokohama, Kanagawa, 227-8502, Japan
- Kenji Yasuoka
  - Department of Mechanical Engineering, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa, 223-8522, Japan
30
Novelli P, Bonati L, Pontil M, Parrinello M. Characterizing Metastable States with the Help of Machine Learning. J Chem Theory Comput 2022; 18:5195-5202. [PMID: 35920063 DOI: 10.1021/acs.jctc.2c00393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Present-day atomistic simulations generate long trajectories of ever more complex systems. Analyzing these data, discovering metastable states, and uncovering their nature are becoming increasingly challenging. In this paper, we first use the variational approach to conformation dynamics to discover the slowest dynamical modes of the simulations. This allows the different metastable states of the system to be located and organized hierarchically. The physical descriptors that characterize metastable states are discovered by means of a machine learning method. We show in the cases of two proteins, chignolin and bovine pancreatic trypsin inhibitor, how such analysis can be effortlessly performed in a matter of seconds. Another strength of our approach is that it can be applied to the analysis of both unbiased and biased simulations.
Affiliation(s)
- Pietro Novelli
  - Computational Statistics and Machine Learning, Italian Institute of Technology, Via Enrico Melen 83, 16142 Genoa, Italy
- Luigi Bonati
  - Atomistic Simulations, Italian Institute of Technology, Via Enrico Melen 83, 16142 Genoa, Italy
- Massimiliano Pontil
  - Computational Statistics and Machine Learning, Italian Institute of Technology, Via Enrico Melen 83, 16142 Genoa, Italy
  - Department of Computer Science, University College London, London WC1E 6BT, United Kingdom
- Michele Parrinello
  - Atomistic Simulations, Italian Institute of Technology, Via Enrico Melen 83, 16142 Genoa, Italy
31
Gardin A, Perego C, Doni G, Pavan GM. Classifying soft self-assembled materials via unsupervised machine learning of defects. Commun Chem 2022; 5:82. [PMID: 36697761 PMCID: PMC9814741 DOI: 10.1038/s42004-022-00699-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/22/2021] [Accepted: 06/29/2022] [Indexed: 01/28/2023] Open
Abstract
Unlike molecular crystals, soft self-assembled fibers, micelles, vesicles, etc., exhibit a certain order in the arrangement of their constitutive monomers but also high structural dynamicity and variability. Defects and disordered local domains that continuously form and repair in their structures impart to such materials unique adaptive and dynamical properties, which make them, e.g., capable of communicating with each other. However, objective criteria to compare such complex dynamical features and to classify soft supramolecular materials are non-trivial to attain. Here we show a data-driven workflow allowing us to achieve this goal. Building on unsupervised clustering of Smooth Overlap of Atomic Position (SOAP) data obtained from equilibrium molecular dynamics simulations, we can compare a variety of soft supramolecular assemblies via a robust SOAP metric. This provides us with a data-driven "defectometer" to classify different types of supramolecular materials based on the structural dynamics of the ordered/disordered local molecular environments that statistically emerge within them.
Affiliation(s)
- Andrea Gardin
  - Department of Applied Science and Technology, Politecnico di Torino, Torino, Italy
- Claudio Perego
  - Department of Innovative Technologies, University of Applied Sciences and Arts of Southern Switzerland, Lugano-Viganello, Switzerland
- Giovanni Doni
  - Department of Innovative Technologies, University of Applied Sciences and Arts of Southern Switzerland, Lugano-Viganello, Switzerland
- Giovanni M Pavan
  - Department of Applied Science and Technology, Politecnico di Torino, Torino, Italy
  - Department of Innovative Technologies, University of Applied Sciences and Arts of Southern Switzerland, Lugano-Viganello, Switzerland
32
Remington JM, Ferrell JB, Schneebeli ST, Li J. Concerted Rolling and Penetration of Peptides during Membrane Binding. J Chem Theory Comput 2022; 18:3921-3929. [PMID: 35507824 DOI: 10.1021/acs.jctc.2c00014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Peptide binding to membranes is common and fundamental in biochemistry and biophysics and critical for applications ranging from drug delivery to the treatment of bacterial infections. However, it is largely unclear, from a theoretical point of view, what peptides of different sequences and structures share in the membrane-binding and insertion process. In this work, we analyze three prototypical membrane-binding peptides (α-helical magainin, PGLa, and β-hairpin tachyplesin) during membrane binding, using molecular details provided by Markov state modeling and microsecond-long molecular dynamics simulations. By leveraging both geometric and data-driven collective variables that capture the essential physics of the amphiphilic and cationic peptide-membrane interactions, we reveal how the slowest kinetic process of membrane binding is the dynamic rolling of the peptide from an attached to a fully bound state. These results not only add fundamental knowledge of the theory of how peptides bind to biological membranes but also open new avenues to study general peptides in more complex environments for further applications.
Affiliation(s)
- Jacob M Remington
  - Department of Chemistry, The University of Vermont, Burlington, Vermont 05405, United States
- Jonathon B Ferrell
  - Department of Chemistry, The University of Vermont, Burlington, Vermont 05405, United States
- Severin T Schneebeli
  - Department of Chemistry, The University of Vermont, Burlington, Vermont 05405, United States
- Jianing Li
  - Department of Chemistry, The University of Vermont, Burlington, Vermont 05405, United States
33
Ghorbani M, Prasad S, Klauda J, Brooks B. GraphVAMPNet, using graph neural networks and variational approach to Markov processes for dynamical modeling of biomolecules. J Chem Phys 2022; 156:184103. [PMID: 35568532 PMCID: PMC9094994 DOI: 10.1063/5.0085607] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 11/14/2022] Open
Abstract
Finding a low-dimensional representation of data from long-timescale trajectories of biomolecular processes such as protein folding or ligand-receptor binding is of fundamental importance, and kinetic models such as Markov modeling have proven useful in describing the kinetics of these systems. Recently, an unsupervised machine learning technique called VAMPNet was introduced to learn the low-dimensional representation and linear dynamical model in an end-to-end manner. VAMPNet is based on the variational approach to Markov processes (VAMP) and relies on neural networks to learn the coarse-grained dynamics. In this contribution, we combine VAMPNet and graph neural networks to generate an end-to-end framework to efficiently learn high-level dynamics and metastable states from long-timescale molecular dynamics trajectories. This method bears the advantages of graph representation learning and uses graph message passing operations to generate an embedding for each datapoint, which is used in the VAMPNet to generate a coarse-grained representation. This type of molecular representation results in a higher-resolution and more interpretable Markov model than the standard VAMPNet, enabling a more detailed kinetic study of biomolecular processes. Our GraphVAMPNet approach is also enhanced with an attention mechanism to find the important residues for classification into different metastable states.
Affiliation(s)
- Mahdi Ghorbani
  - University of Maryland at College Park, United States of America
- Samarjeet Prasad
  - National Heart Lung and Blood Institute, United States of America
- Jeffery Klauda
  - Chemical and Biomolecular Engineering, University of Maryland at College Park, United States of America
- Bernard Brooks
  - Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, United States of America
34
Gianti E, Percec S. Machine Learning at the Interface of Polymer Science and Biology: How Far Can We Go? Biomacromolecules 2022; 23:576-591. [PMID: 35133143 DOI: 10.1021/acs.biomac.1c01436] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
This Perspective outlines recent progress and future directions for using machine learning (ML), a data-driven method, to address critical questions in the design, synthesis, processing, and characterization of biomacromolecules. The achievement of these tasks requires the navigation of vast and complex chemical and biological spaces, difficult to accomplish with reasonable speed. Using modern algorithms and supercomputers, quantum physics methods are able to examine systems containing a few hundred interacting species and determine the probability of finding them in a particular region of phase space, thereby anticipating their properties. Likewise, modern approaches in chemistry and biomolecular simulation, supported by high performance computing, have culminated in producing data sets of escalating size and intrinsically high complexity. Hence, using ML to extract relevant information from these fields is of paramount importance to advance our understanding of chemical and biomolecular systems. At the heart of ML approaches lie statistical algorithms, which by evaluating a portion of a given data set, identify, learn, and manipulate the underlying rules that govern the whole data set. The assembly of a quality model to represent the data followed by the predictions and elimination of error sources are the key steps in ML. In addition to a growing infrastructure of ML tools to address complex problems, an increasing number of aspects related to our understanding of the fundamental properties of biomacromolecules are exposed to ML. These fields, including those residing at the interface of polymer science and biology (i.e., structure determination, de novo design, folding, and dynamics), strive to adopt and take advantage of the transformative power offered by approaches in the ML domain, which clearly has the potential of accelerating research in the field of biomacromolecules.
Affiliation(s)
- Eleonora Gianti
  - Institute for Computational Molecular Science (ICMS), Temple University, Philadelphia, Pennsylvania 19122, United States
  - Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States
- Simona Percec
  - Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States
35
Gu H, Wang W, Cao S, Unarta IC, Yao Y, Sheong FK, Huang X. RPnet: a reverse-projection-based neural network for coarse-graining metastable conformational states for protein dynamics. Phys Chem Chem Phys 2022; 24:1462-1474. [PMID: 34985469 DOI: 10.1039/d1cp03622j] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/31/2022]
Abstract
The Markov State Model (MSM) is a powerful tool for modeling long timescale dynamics based on numerous short molecular dynamics (MD) simulation trajectories, which makes it a useful tool for elucidating the conformational changes of biological macromolecules. By partitioning the phase space into discretized states and estimating the probabilities of inter-state transitions based on short MD trajectories, one can construct a kinetic network model that could be used to extrapolate long-timescale kinetics if the Markovian condition is met. However, meeting the Markovian condition often requires hundreds or even thousands of states (microstates), which greatly hinders the comprehension of the conformational dynamics of complex biomolecules. Kinetic lumping algorithms can coarse grain numerous microstates into a handful of metastable states (macrostates), which would greatly facilitate the elucidation of biological mechanisms. In this work, we have developed a reverse-projection-based neural network (RPnet) to lump microstates into macrostates, by making use of a physics-based loss function that is based on the projection operator framework of conformational dynamics. By recognizing that microstate and macrostate transition modes can be related through a projection process, we have developed a reverse-projection scheme to directly compare the microstate and macrostate dynamics. Based on this reverse-projection scheme, we designed a loss function that allows the effective assessment of the quality of a given kinetic lumping. We then make use of a neural network to efficiently minimize this loss function to obtain an optimized set of macrostates. We have demonstrated the power of our RPnet in analyzing the dynamics of a numerical 2D potential, alanine dipeptide, and the clamp opening of an RNA polymerase. 
In all these systems, we have illustrated that our method could yield comparable or better results than competing methods in terms of state partitioning and reproduction of slow dynamics. We expect that our RPnet holds promise in analyzing the conformational dynamics of biological macromolecules.
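As a point of reference for the workflow this abstract describes (discretize, count transitions at a lag time, build a kinetic model), the following is a minimal sketch of a plain maximum-likelihood MSM estimate from one discrete trajectory. It is not the RPnet method and does not perform kinetic lumping; the function names `estimate_msm` and `implied_timescales`, the toy matrix `P_true`, and the lag of one step are illustrative assumptions.

```python
import numpy as np

def estimate_msm(dtraj, n_states, lag):
    """Row-normalized transition matrix from a single discrete trajectory."""
    counts = np.zeros((n_states, n_states))
    # Count transitions i -> j separated by `lag` frames.
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    rows = counts.sum(axis=1, keepdims=True)
    # Rows of states that were never left stay all-zero instead of dividing by 0.
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

def implied_timescales(T, lag):
    """Relaxation timescales t_i = -lag / ln(lambda_i) for 0 < lambda_i < 1."""
    eigvals = np.sort(np.real(np.linalg.eigvals(T)))[::-1]
    lam = eigvals[1:]  # drop the stationary eigenvalue (equal to 1)
    lam = lam[(lam > 0) & (lam < 1)]
    return -lag / np.log(lam)

# Toy 3-state chain (hypothetical numbers) used to generate a trajectory.
P_true = np.array([[0.98, 0.02, 0.00],
                   [0.01, 0.98, 0.01],
                   [0.00, 0.02, 0.98]])
rng = np.random.default_rng(0)
dtraj = np.empty(20000, dtype=int)
dtraj[0] = 0
for t in range(1, len(dtraj)):
    dtraj[t] = rng.choice(3, p=P_true[dtraj[t - 1]])

T_est = estimate_msm(dtraj, n_states=3, lag=1)
timescales = implied_timescales(T_est, lag=1)
print(T_est.round(3))
print(timescales)
```

In practice the discrete states would come from clustering MD frames, and the lag time would be chosen by checking that the implied timescales level off as the lag increases.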
Affiliation(s)
- Hanlin Gu
- Department of Mathematics, Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Wei Wang
- Department of Chemistry, Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Siqin Cao
- Department of Chemistry, Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Ilona Christy Unarta
- Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Yuan Yao
- Department of Mathematics, Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Fu Kit Sheong
- Department of Chemistry, Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Institute for Advanced Study, Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Xuhui Huang
- Department of Chemistry, Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong
|
36
|
Guardiani C, Cecconi F, Chiodo L, Cottone G, Malgaretti P, Maragliano L, Barabash ML, Camisasca G, Ceccarelli M, Corry B, Roth R, Giacomello A, Roux B. Computational methods and theory for ion channel research. Advances in Physics: X 2022; 7:2080587. [PMID: 35874965 PMCID: PMC9302924 DOI: 10.1080/23746149.2022.2080587] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/28/2022] [Accepted: 05/15/2022] [Indexed: 06/15/2023] Open
Abstract
Ion channels are fundamental biological devices that act as gates in order to ensure selective ion transport across cellular membranes; their operation constitutes the molecular mechanism through which basic biological functions, such as nerve signal transmission and muscle contraction, are carried out. Here, we review recent results in the field of computational research on ion channels, covering theoretical advances, state-of-the-art simulation approaches, and frontline modeling techniques. We also report on a few selected applications of continuum and atomistic methods to characterize the mechanisms of permeation, selectivity, and gating in biological and model channels.
Affiliation(s)
- C. Guardiani
- Dipartimento di Ingegneria Meccanica e Aerospaziale, Sapienza Università di Roma, Rome, Italy
- F. Cecconi
- CNR - Istituto dei Sistemi Complessi, Rome, Italy and Istituto Nazionale di Fisica Nucleare, INFN, Roma1 section. 00185, Roma, Italy
- L. Chiodo
- Department of Engineering, Campus Bio-Medico University, Rome, Italy
- G. Cottone
- Department of Physics and Chemistry-Emilio Segrè, University of Palermo, Palermo, Italy
- P. Malgaretti
- Helmholtz Institute Erlangen-Nürnberg for Renewable Energy (IEK-11), Forschungszentrum Jülich, Erlangen, Germany
- L. Maragliano
- Department of Life and Environmental Sciences, Polytechnic University of Marche, Ancona, Italy, and Center for Synaptic Neuroscience and Technology, Istituto Italiano di Tecnologia, Genova, Italy
- M. L. Barabash
- Department of Materials Science and Nanoengineering, Rice University, Houston, TX 77005, USA
- G. Camisasca
- Dipartimento di Ingegneria Meccanica e Aerospaziale, Sapienza Università di Roma, Rome, Italy
- Dipartimento di Fisica, Università Roma Tre, Rome, Italy
- M. Ceccarelli
- Department of Physics and CNR-IOM, University of Cagliari, Monserrato 09042-IT, Italy
- B. Corry
- Research School of Biology, The Australian National University, Canberra, ACT 2600, Australia
- R. Roth
- Institut Für Theoretische Physik, Eberhard Karls Universität Tübingen, Tübingen, Germany
- A. Giacomello
- Dipartimento di Ingegneria Meccanica e Aerospaziale, Sapienza Università di Roma, Rome, Italy
- B. Roux
- Department of Biochemistry & Molecular Biology, University of Chicago, Chicago IL, USA
|
37
|
Belkacemi Z, Gkeka P, Lelièvre T, Stoltz G. Chasing Collective Variables Using Autoencoders and Biased Trajectories. J Chem Theory Comput 2021; 18:59-78. [PMID: 34965117 DOI: 10.1021/acs.jctc.1c00415] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 12/15/2022]
Abstract
Free energy biasing methods have proven to be powerful tools to accelerate the simulation of important conformational changes of molecules by modifying the sampling measure. However, most of these methods rely on the prior knowledge of low-dimensional slow degrees of freedom, i.e., collective variables (CVs). Alternatively, such CVs can be identified using machine learning (ML) and dimensionality reduction algorithms. In this context, approaches where the CVs are learned in an iterative way using adaptive biasing have been proposed: at each iteration, the learned CV is used to perform free energy adaptive biasing to generate new data and learn a new CV. In this paper, we introduce a new iterative method involving CV learning with autoencoders: Free Energy Biasing and Iterative Learning with AutoEncoders (FEBILAE). Our method includes a reweighting scheme to ensure that the learning model optimizes the same loss at each iteration and achieves CV convergence. Using the alanine dipeptide system and the solvated chignolin mini-protein system as examples, we present results of our algorithm using the extended adaptive biasing force as the free energy adaptive biasing method.
Affiliation(s)
- Zineb Belkacemi
- CERMICS, Ecole des Ponts ParisTech, 77455 Marne-la-Vallée, France
- Structure Design and Informatics, Sanofi 1371 R&D, 91385 Chilly-Mazarin, France
- Paraskevi Gkeka
- Structure Design and Informatics, Sanofi 1371 R&D, 91385 Chilly-Mazarin, France
- Tony Lelièvre
- CERMICS, Ecole des Ponts ParisTech, 77455 Marne-la-Vallée, France
- MATHERIALS Team-Project, Inria, 75589 Paris, France
- Gabriel Stoltz
- CERMICS, Ecole des Ponts ParisTech, 77455 Marne-la-Vallée, France
- MATHERIALS Team-Project, Inria, 75589 Paris, France
|
38
|
Beyerle ER, Guenza MG. Identifying the leading dynamics of ubiquitin: A comparison between the tICA and the LE4PD slow fluctuations in amino acids' position. J Chem Phys 2021; 155:244108. [PMID: 34972386 DOI: 10.1063/5.0059688] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022] Open
Abstract
Molecular Dynamics (MD) simulations of proteins implicitly contain the information connecting the atomistic molecular structure and proteins' biologically relevant motion, where large-scale fluctuations are deemed to guide folding and function. In the complex multiscale processes described by MD trajectories, it is difficult to identify, separate, and study those large-scale fluctuations. This problem can be formulated as the need to identify a small number of collective variables that guide the slow kinetic processes. The most promising method among the ones used to study the slow leading processes in proteins' dynamics is the time-structure based on time-lagged independent component analysis (tICA), which identifies the dominant components in a noisy signal. Recently, we developed an anisotropic Langevin approach for the dynamics of proteins, called the anisotropic Langevin Equation for Protein Dynamics or LE4PD-XYZ. This approach partitions the protein's MD dynamics into mostly uncorrelated, wavelength-dependent, diffusive modes. It associates with each mode a free-energy map, where one measures the spatial extension and the time evolution of the mode-dependent, slow dynamical fluctuations. Here, we compare the tICA modes' predictions with the collective LE4PD-XYZ modes. We observe that the two methods consistently identify the nature and extension of the slowest fluctuation processes. The tICA separates the leading processes in a smaller number of slow modes than the LE4PD does. The LE4PD provides time-dependent information at short times and a formal connection to the physics of the kinetic processes that are missing in the pure statistical analysis of tICA.
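To make the tICA step mentioned in this abstract concrete, here is a minimal sketch of a generic tICA estimator: it builds symmetrized instantaneous and time-lagged covariance matrices and solves the resulting generalized eigenproblem by whitening. It is a textbook-style illustration, not the authors' pipeline and not LE4PD; the function name `tica`, the lag of 10 frames, and the two-dimensional toy signal are assumptions made for the example.

```python
import numpy as np

def tica(X, lag, n_components):
    """Return (eigenvalues, projection vectors) of a symmetrized tICA estimate."""
    X = X - X.mean(axis=0)
    A, B = X[:-lag], X[lag:]
    n = len(A)
    # Symmetrized estimators (assume reversible dynamics).
    C0 = (A.T @ A + B.T @ B) / (2 * n)
    Ct = (A.T @ B + B.T @ A) / (2 * n)
    # Whiten with C0, dropping numerically null directions.
    s, U = np.linalg.eigh(C0)
    keep = s > 1e-10 * s.max()
    W = U[:, keep] / np.sqrt(s[keep])
    # Ordinary symmetric eigenproblem in the whitened space.
    lam, V = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(lam)[::-1][:n_components]
    return lam[order], W @ V[:, order]

# Toy data: one slow AR(1) coordinate and one fast, higher-variance noise
# coordinate, mixed by a rotation (all values hypothetical).
rng = np.random.default_rng(1)
n = 20000
slow = np.zeros(n)
kicks = rng.normal(size=n)
for t in range(1, n):
    slow[t] = 0.99 * slow[t - 1] + kicks[t]
fast = 10.0 * rng.normal(size=n)
theta = 0.6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = np.column_stack([slow, fast]) @ R.T

eigvals, components = tica(X, lag=10, n_components=2)
ic1 = (X - X.mean(axis=0)) @ components[:, 0]
corr = abs(np.corrcoef(ic1, slow)[0, 1])
print(eigvals, corr)
```

In this toy setup the fast coordinate carries the larger variance, so the example illustrates that tICA ranks directions by autocorrelation at the chosen lag rather than by variance.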
Affiliation(s)
- E R Beyerle
- Institute for Fundamental Science and Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon 97403, USA
- M G Guenza
- Institute for Fundamental Science and Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon 97403, USA
|
39
|
Vlachas PR, Zavadlav J, Praprotnik M, Koumoutsakos P. Accelerated Simulations of Molecular Systems through Learning of Effective Dynamics. J Chem Theory Comput 2021; 18:538-549. [PMID: 34890204 DOI: 10.1021/acs.jctc.1c00809] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022]
Abstract
Simulations are vital for understanding and predicting the evolution of complex molecular systems. However, despite advances in algorithms and special purpose hardware, accessing the time scales necessary to capture the structural evolution of biomolecules remains a daunting task. In this work, we present a novel framework to advance simulation time scales by up to 3 orders of magnitude by learning the effective dynamics (LED) of molecular systems. LED augments the equation-free methodology by employing a probabilistic mapping between coarse and fine scales using mixture density network (MDN) autoencoders and evolves the non-Markovian latent dynamics using long short-term memory MDNs. We demonstrate the effectiveness of LED in the Müller-Brown potential, the Trp cage protein, and the alanine dipeptide. LED identifies explainable reduced-order representations, i.e., collective variables, and can generate, at any instant, all-atom molecular trajectories consistent with the collective variables. We believe that the proposed framework provides a dramatic increase to simulation capabilities and opens new horizons for the effective modeling of complex molecular systems.
Affiliation(s)
- Pantelis R Vlachas
- Computational Science and Engineering Laboratory, ETH Zurich, CH-8092, Switzerland
- Julija Zavadlav
- Professorship of Multiscale Modeling of Fluid Materials, TUM School of Engineering and Design, Technical University of Munich, 85748 Garching bei München, Germany
- Munich Data Science Institute, Technical University of Munich, 85748 Munich, Germany
- Matej Praprotnik
- Laboratory for Molecular Modeling, National Institute of Chemistry, SI-1001 Ljubljana, Slovenia
- Department of Physics, Faculty of Mathematics and Physics, University of Ljubljana, SI-1000 Ljubljana, Slovenia
- Petros Koumoutsakos
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, United States
|
40
|
Mardt A, Noé F. Progress in deep Markov state modeling: Coarse graining and experimental data restraints. J Chem Phys 2021; 155:214106. [PMID: 34879670 DOI: 10.1063/5.0064668] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022] Open
Abstract
Recent advances in deep learning frameworks have established valuable tools for analyzing the long-timescale behavior of complex systems, such as proteins. In particular, the inclusion of physical constraints, e.g., time-reversibility, was a crucial step to make the methods applicable to biophysical systems. Furthermore, we advance the method by incorporating experimental observables into the model estimation showing that biases in simulation data can be compensated for. We further develop a new neural network layer in order to build a hierarchical model allowing for different levels of details to be studied. Finally, we propose an attention mechanism, which highlights important residues for the classification into different states. We demonstrate the new methodology on an ultralong molecular dynamics simulation of the Villin headpiece miniprotein.
Affiliation(s)
- Andreas Mardt
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
|
41
|
Bonati L, Piccini G, Parrinello M. Deep learning the slow modes for rare events sampling. Proc Natl Acad Sci U S A 2021; 118:e2113533118. [PMID: 34706940 PMCID: PMC8612227 DOI: 10.1073/pnas.2113533118] [Citation(s) in RCA: 76] [Impact Index Per Article: 25.3] [Accepted: 09/19/2021] [Indexed: 02/08/2023] Open
Abstract
The development of enhanced sampling methods has greatly extended the scope of atomistic simulations, allowing long-time phenomena to be studied with accessible computational resources. Many such methods rely on the identification of an appropriate set of collective variables. These are meant to describe the system's modes that most slowly approach equilibrium under the action of the sampling algorithm. Once identified, the equilibration of these modes is accelerated by the enhanced sampling method of choice. An attractive way of determining the collective variables is to relate them to the eigenfunctions and eigenvalues of the transfer operator. Unfortunately, this requires knowing the long-term dynamics of the system beforehand, which is generally not available. However, we have recently shown that it is indeed possible to determine efficient collective variables starting from biased simulations. In this paper, we bring the power of machine learning and the efficiency of the recently developed on the fly probability-enhanced sampling method to bear on this approach. The result is a powerful and robust algorithm that, given an initial enhanced sampling simulation performed with trial collective variables or generalized ensembles, extracts transfer operator eigenfunctions using a neural network ansatz and then accelerates them to promote sampling of rare events. To illustrate the generality of this approach, we apply it to several systems, ranging from the conformational transition of a small molecule to the folding of a miniprotein and the study of materials crystallization.
Affiliation(s)
- Luigi Bonati
- Department of Physics, Eidgenössische Technische Hochschule (ETH) Zürich, 8092 Zürich, Switzerland
- Atomistic Simulations, Italian Institute of Technology, 16163 Genova, Italy
- Michele Parrinello
- Atomistic Simulations, Italian Institute of Technology, 16163 Genova, Italy
|
42
|
Jones M, Ashwood B, Tokmakoff A, Ferguson AL. Determining Sequence-Dependent DNA Oligonucleotide Hybridization and Dehybridization Mechanisms Using Coarse-Grained Molecular Simulation, Markov State Models, and Infrared Spectroscopy. J Am Chem Soc 2021; 143:17395-17411. [PMID: 34644072 PMCID: PMC8554761 DOI: 10.1021/jacs.1c05219] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 05/20/2021] [Indexed: 11/29/2022]
Abstract
A robust understanding of the sequence-dependent thermodynamics of DNA hybridization has enabled rapid advances in DNA nanotechnology. A fundamental understanding of the sequence-dependent kinetics and mechanisms of hybridization and dehybridization remains comparatively underdeveloped. In this work, we establish new understanding of the sequence-dependent hybridization/dehybridization kinetics and mechanism within a family of self-complementary pairs of 10-mer DNA oligomers by integrating coarse-grained molecular simulation, machine learning of the slow dynamical modes, data-driven inference of long-time kinetic models, and experimental temperature-jump infrared spectroscopy. For a repetitive ATATATATAT sequence, we resolve a rugged dynamical landscape comprising multiple metastable states, numerous competing hybridization/dehybridization pathways, and a spectrum of dynamical relaxations. Introduction of a G:C pair at the terminus (GATATATATC) or center (ATATGCATAT) of the sequence reduces the ruggedness of the dynamics landscape by eliminating a number of metastable states and reducing the number of competing dynamical pathways. Only by introducing a G:C pair midway between the terminus and the center to maximally disrupt the repetitive nature of the sequence (ATGATATCAT) do we recover a canonical "all-or-nothing" two-state model of hybridization/dehybridization with no intermediate metastable states. Our results establish new understanding of the dynamical richness of sequence-dependent kinetics and mechanisms of DNA hybridization/dehybridization by furnishing quantitative and predictive kinetic models of the dynamical transition network between metastable states, present a molecular basis with which to understand experimental temperature jump data, and furnish foundational design rules by which to rationally engineer the kinetics and pathways of DNA association and dissociation for DNA nanotechnology applications.
Affiliation(s)
- Michael S. Jones
- Pritzker School of Molecular Engineering, The University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, United States
- Brennan Ashwood
- Department of Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, 929 East 57th Street, Chicago, Illinois 60637, United States
- Andrei Tokmakoff
- Department of Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, 929 East 57th Street, Chicago, Illinois 60637, United States
- Andrew L. Ferguson
- Pritzker School of Molecular Engineering, The University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, United States
|
43
|
Konovalov K, Unarta IC, Cao S, Goonetilleke EC, Huang X. Markov State Models to Study the Functional Dynamics of Proteins in the Wake of Machine Learning. JACS Au 2021; 1:1330-1341. [PMID: 34604842 PMCID: PMC8479766 DOI: 10.1021/jacsau.1c00254] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 06/07/2021] [Indexed: 05/19/2023]
Abstract
Markov state models (MSMs) based on molecular dynamics (MD) simulations are routinely employed to study protein folding; however, their application to functional conformational changes of biomolecules is still limited. In the past few years, the field of computational chemistry has experienced a surge of advancements stemming from machine learning algorithms, and MSMs have not been left out. Unlike global processes, such as protein folding, the application of MSMs to functional conformational changes is challenging because they mostly consist of localized structural transitions. Therefore, it is critical to properly select a subset of structural features that can describe the slowest dynamics of these functional conformational changes. To address this challenge, we recommend several automatic feature selection methods such as Spectral-OASIS. To identify states in MSMs, the chosen features can be subject to dimensionality reduction methods such as TICA or deep learning based VAMPNets to project MD conformations onto a few collective variables for subsequent clustering. Another challenge for the application of MSMs to the study of functional conformational changes is the ability to comprehend their biophysical mechanisms, as MSMs built for these processes often require a large number of states. We recommend the recently developed quasi-MSMs (qMSMs) to address this issue. Compared to MSMs, qMSMs encode the non-Markovian dynamics via the generalized master equation and can significantly reduce the number of states. As a result, qMSMs can be built with a handful of states to facilitate the interpretation of functional conformational changes. In the wake of machine learning, we believe that the rapid advancement in the MSM methodology will lead to their wider application in studying functional conformational changes of biomolecules.
Affiliation(s)
- Kirill A. Konovalov
- Department of Chemistry, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Hong Kong Center for Neurodegenerative Diseases, Hong Kong Science Park, Hong Kong
- Ilona Christy Unarta
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Hong Kong Center for Neurodegenerative Diseases, Hong Kong Science Park, Hong Kong
- Siqin Cao
- Department of Chemistry, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Hong Kong Center for Neurodegenerative Diseases, Hong Kong Science Park, Hong Kong
- Eshani C. Goonetilleke
- Department of Chemistry, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Hong Kong Center for Neurodegenerative Diseases, Hong Kong Science Park, Hong Kong
- Xuhui Huang
- Department of Chemistry, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Hong Kong Center for Neurodegenerative Diseases, Hong Kong Science Park, Hong Kong
|
44
|
Clark AE, Adams H, Hernandez R, Krylov AI, Niklasson AMN, Sarupria S, Wang Y, Wild SM, Yang Q. The Middle Science: Traversing Scale In Complex Many-Body Systems. ACS Central Science 2021; 7:1271-1287. [PMID: 34471670 PMCID: PMC8393217 DOI: 10.1021/acscentsci.1c00685] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 05/04/2023]
Abstract
A roadmap is developed that integrates simulation methodology and data science methods to target new theories that traverse the multiple length- and time-scale features of many-body phenomena.
Affiliation(s)
- Aurora E. Clark
- Department of Chemistry, Washington State University, Pullman, Washington 99163, United States
- Henry Adams
- Department of Mathematics, Colorado State University, Fort Collins, Colorado 80523, United States
- Rigoberto Hernandez
- Departments of Chemistry, Chemical and Biomolecular Engineering, and Materials Science and Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Anna I. Krylov
- Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States
- Anders M. N. Niklasson
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Sapna Sarupria
- Department of Chemical and Biomolecular Engineering, Center for Optical Materials Science and Engineering Technologies (COMSET), Clemson University, Clemson, South Carolina 29670, United States
- Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Yusu Wang
- Halıcıoğlu Data Science Institute, University of California, San Diego, La Jolla, California 92093, United States
- Stefan M. Wild
- Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Qian Yang
- Computer Science and Engineering Department, University of Connecticut, Storrs, Connecticut 06269-4155, United States
|
45
|
Glielmo A, Husic BE, Rodriguez A, Clementi C, Noé F, Laio A. Unsupervised Learning Methods for Molecular Simulation Data. Chem Rev 2021; 121:9722-9758. [PMID: 33945269 PMCID: PMC8391792 DOI: 10.1021/acs.chemrev.0c01195] [Citation(s) in RCA: 116] [Impact Index Per Article: 38.7] [Received: 11/05/2020] [Indexed: 12/21/2022]
Abstract
Unsupervised learning is becoming an essential tool to analyze the increasingly large amounts of data produced by atomistic and molecular simulations, in material science, solid state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms for dimensionality reduction, density estimation, clustering, and kinetic models. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used, or can be used, to analyze molecular simulation data.
Affiliation(s)
- Aldo Glielmo
- International School for Advanced Studies (SISSA) 34014 Trieste, Italy
- Brooke E. Husic
- Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Alex Rodriguez
- International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
- Cecilia Clementi
- Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
- Rice University Houston, Department of Chemistry, Houston, Texas 77005, United States
- Frank Noé
- Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
- Rice University Houston, Department of Chemistry, Houston, Texas 77005, United States
- Alessandro Laio
- International School for Advanced Studies (SISSA) 34014 Trieste, Italy
- International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
|
46
|
Unke O, Chmiela S, Sauceda HE, Gastegger M, Poltavsky I, Schütt KT, Tkatchenko A, Müller KR. Machine Learning Force Fields. Chem Rev 2021; 121:10142-10186. [PMID: 33705118 PMCID: PMC8391964 DOI: 10.1021/acs.chemrev.0c01111] [Citation(s) in RCA: 371] [Impact Index Per Article: 123.7] [Received: 10/12/2020] [Indexed: 12/27/2022]
Abstract
In recent years, the use of machine learning (ML) in computational chemistry has enabled numerous advances previously out of reach due to the computational complexity of traditional electronic-structure methods. One of the most promising applications is the construction of ML-based force fields (FFs), with the aim to narrow the gap between the accuracy of ab initio methods and the efficiency of classical FFs. The key idea is to learn the statistical relation between chemical structure and potential energy without relying on a preconceived notion of fixed chemical bonds or knowledge about the relevant interactions. Such universal ML approximations are in principle only limited by the quality and quantity of the reference data used to train them. This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them. The core concepts underlying ML-FFs are described in detail, and a step-by-step guide for constructing and testing them from scratch is given. The text concludes with a discussion of the challenges that remain to be overcome by the next generation of ML-FFs.
Affiliation(s)
- Oliver T. Unke
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- Stefan Chmiela
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Huziel E. Sauceda
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
- Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
- Igor Poltavsky
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Kristof T. Schütt
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- BIFOLD-Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
- Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
- Google Research, Brain Team, Berlin, Germany
|
47
|
Alvarado W, Moller J, Ferguson AL, de Pablo JJ. Tetranucleosome Interactions Drive Chromatin Folding. ACS Central Science 2021; 7:1019-1027. [PMID: 34235262 PMCID: PMC8227587 DOI: 10.1021/acscentsci.1c00085] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/18/2021] [Indexed: 06/10/2023]
Abstract
The multiscale organizational structure of chromatin in eukaryotic cells is instrumental to DNA transcription, replication, and repair. At mesoscopic length scales, nucleosomes pack in a manner that serves to regulate gene expression through condensation and expansion of the genome. The particular structures that arise and their respective thermodynamic stabilities, however, have yet to be fully resolved. In this study, we combine molecular modeling using the 1CPN mesoscale model of chromatin with nonlinear manifold learning to identify and characterize the structure and free energy of metastable states of short chromatin segments comprising between 4- and 16-nucleosomes. Our results reveal the formation of two previously characterized tetranucleosomal conformations, the "α-tetrahedron" and the "β-rhombus", which have been suggested to play an important role in the accessibility of DNA and, respectively, induce local chromatin compaction or elongation. The spontaneous formation of these motifs is potentially responsible for the slow nucleosome dynamics observed in experimental studies. Increases of the nucleosome repeat length are accompanied by more pronounced structural irregularity and flexibility and, ultimately, a dynamic liquid-like behavior that allows for frequent structural reorganization. Our findings indicate that tetranucleosome motifs are intrinsically stable structural states, driven by local internucleosomal interactions, and support a mechanistic picture of chromatin packing, dynamics, and accessibility that is strongly influenced by emergent local mesoscale structure.
Affiliation(s)
- Walter Alvarado
  - Biophysical Sciences, University of Chicago, Chicago, Illinois 60637, United States
- Joshua Moller
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L. Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Juan J. de Pablo
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
48
Webber RJ, Thiede EH, Dow D, Dinner AR, Weare J. Error Bounds for Dynamical Spectral Estimation. SIAM Journal on Mathematics of Data Science 2021; 3:225-252. [PMID: 34355137 PMCID: PMC8336423 DOI: 10.1137/20m1335984] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Dynamical spectral estimation is a well-established numerical approach for estimating eigenvalues and eigenfunctions of the Markov transition operator from trajectory data. Although the approach has been widely applied in biomolecular simulations, its error properties remain poorly understood. Here we analyze the error of a dynamical spectral estimation method called "the variational approach to conformational dynamics" (VAC). We bound the approximation error and estimation error for VAC estimates. Our analysis establishes VAC's convergence properties and suggests new strategies for tuning VAC to improve accuracy.
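As a rough illustration of the kind of estimate this abstract analyzes, the following minimal sketch applies a VAC-style calculation to a toy system. It is not the authors' code: the two-state chain, the indicator basis, the symmetrized covariance estimator, and the names `simulate_two_state` and `vac_eigenvalues` are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_two_state(p01, p10, n_steps, seed=0):
    """Simulate a two-state Markov chain (hypothetical toy system).

    p01 is the probability of jumping 0 -> 1, p10 of jumping 1 -> 0.
    """
    rng = np.random.default_rng(seed)
    u = rng.random(n_steps)
    states = np.zeros(n_steps, dtype=int)
    for t in range(1, n_steps):
        if states[t - 1] == 0:
            states[t] = 1 if u[t] < p01 else 0
        else:
            states[t] = 0 if u[t] < p10 else 1
    return states


def vac_eigenvalues(features, lag):
    """Estimate transition-operator eigenvalues from trajectory data.

    features: array of shape (time, basis), basis functions evaluated
    along one trajectory. Returns eigenvalues (descending) and the
    expansion coefficients of the estimated eigenfunctions.
    """
    x0, xt = features[:-lag], features[lag:]
    n = x0.shape[0]
    # Symmetrized covariance estimates (assumes reversible dynamics).
    c0 = (x0.T @ x0 + xt.T @ xt) / (2 * n)
    ct = (x0.T @ xt + xt.T @ x0) / (2 * n)
    # Solve ct v = lambda c0 v by whitening with the Cholesky factor of c0.
    chol = np.linalg.cholesky(c0)
    half = np.linalg.solve(chol, ct)
    whitened = np.linalg.solve(chol, half.T).T
    whitened = (whitened + whitened.T) / 2  # remove round-off asymmetry
    evals, vecs = np.linalg.eigh(whitened)
    order = np.argsort(evals)[::-1]
    coeffs = np.linalg.solve(chol.T, vecs[:, order])
    return evals[order], coeffs


# Basis: a constant function plus the indicator of state 1.
states = simulate_two_state(p01=0.1, p10=0.2, n_steps=100_000)
basis = np.column_stack([np.ones(len(states)), states.astype(float)])
eigenvalues, coefficients = vac_eigenvalues(basis, lag=1)
```

For this toy chain the exact transition-matrix eigenvalues are 1 and 0.7. The basis spans every function on two states, so there is no approximation error here, and the gap between the second estimate and 0.7 comes only from finite sampling (the estimation error in the abstract's terms).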
Affiliation(s)
- Robert J Webber
  - Courant Institute of Mathematical Sciences, New York University, New York, NY 10012 USA
- Erik H Thiede
  - Department of Chemistry, University of Chicago, Chicago, IL 60637 USA
- Douglas Dow
  - Department of Mathematics, University of Chicago, Chicago, IL 60637 USA
- Aaron R Dinner
  - Department of Chemistry, University of Chicago, Chicago, IL 60637 USA
- Jonathan Weare
  - Courant Institute of Mathematical Sciences, New York University, New York, NY 10012 USA
49
Appadurai R, Nagesh J, Srivastava A. High resolution ensemble description of metamorphic and intrinsically disordered proteins using an efficient hybrid parallel tempering scheme. Nat Commun 2021; 12:958. [PMID: 33574233 PMCID: PMC7878814 DOI: 10.1038/s41467-021-21105-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 06/15/2020] [Accepted: 01/08/2021] [Indexed: 12/26/2022] Open
Abstract
Mapping free energy landscapes of complex multi-funneled metamorphic proteins and weakly-funneled intrinsically disordered proteins (IDPs) remains challenging. While rare-event sampling molecular dynamics simulations can be useful, they often need to either impose restraints or reweight the generated data to match experiments. Here, we present a parallel-tempering method that takes advantage of accelerated water dynamics and allows efficient and accurate conformational sampling across a wide variety of proteins. We demonstrate the improved sampling efficiency by benchmarking against standard model systems such as alanine dipeptide, TRP-cage and β-hairpin. The method successfully scales to large metamorphic proteins such as RFA-H and to highly disordered IDPs such as Histatin-5. Across the diverse proteins, the calculated ensemble averages match well with NMR, SAXS and other biophysical experiments without the need to reweight. By allowing accurate sampling across different landscapes, the method opens doors for sampling the free energy landscapes of complex uncharted proteins.
Affiliation(s)
- Rajeswari Appadurai
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
- Jayashree Nagesh
  - Solid State & Structural Chemistry Unit, Indian Institute of Science, Bangalore, Karnataka, India
- Anand Srivastava
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
50
Pant S, Smith Z, Wang Y, Tajkhorshid E, Tiwary P. Confronting pitfalls of AI-augmented molecular dynamics using statistical physics. J Chem Phys 2020; 153:234118. [PMID: 33353347 PMCID: PMC7863682 DOI: 10.1063/5.0030931] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/25/2020] [Accepted: 11/29/2020] [Indexed: 12/31/2022] Open
Abstract
Artificial intelligence (AI)-based approaches have had an indubitable impact across the sciences through the ability to extract relevant information from raw data. Recently, AI has also found use in enhancing the efficiency of molecular simulations, wherein AI-derived slow modes are used to accelerate the simulation in targeted ways. However, while typical fields where AI is used are characterized by a plethora of data, molecular simulations, by construction, suffer from limited sampling and thus limited data. As such, the use of AI in molecular simulations can suffer from a dangerous situation where the AI-optimization could get stuck in spurious regimes, leading to incorrect characterization of the reaction coordinate (RC) for the problem at hand. When such an incorrect RC is then used to perform additional simulations, one could start to deviate progressively from the ground truth. To deal with this problem of spurious AI-solutions, here, we report a novel and automated algorithm using ideas from statistical mechanics. It is based on the notion that a more reliable AI-solution will be one that maximizes the timescale separation between slow and fast processes. To learn this timescale separation even from limited data, we use a maximum caliber-based framework. We show the applicability of this automatic protocol for three classic benchmark problems, namely, the conformational dynamics of a model peptide, ligand-unbinding from a protein, and folding/unfolding energy landscape of the C-terminal domain of protein G. We believe that our work will lead to increased and robust use of trustworthy AI in molecular simulations of complex systems.
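The timescale-separation criterion mentioned in this abstract can be illustrated with a small sketch. It does not reproduce the authors' maximum caliber-based protocol; it only shows one simple proxy, the ratio of implied timescales computed from the eigenvalues of a transition matrix. The function names, the four-state two-basin test matrix, and its parameters are hypothetical.

```python
import numpy as np


def timescale_separation(transition_matrix, n_slow=1, lag=1.0):
    """Ratio of the n_slow-th implied timescale to the next faster one.

    A larger ratio means the slow processes are more clearly separated
    from the fast ones. Needs at least n_slow + 2 states.
    """
    evals = np.sort(np.abs(np.linalg.eigvals(transition_matrix)))[::-1]
    # Drop the stationary eigenvalue (1) and keep the logarithm finite.
    evals = np.clip(evals[1:], 1e-12, 1.0 - 1e-12)
    timescales = -lag / np.log(evals)
    return timescales[n_slow - 1] / timescales[n_slow]


def two_basin_matrix(eps, mix=0.4):
    """Hypothetical 4-state chain: two 2-state basins coupled by eps."""
    block = np.array([[1.0 - mix, mix], [mix, 1.0 - mix]])
    within = np.kron(np.eye(2), block)  # block-diagonal intra-basin moves
    return (1.0 - eps) * within + eps * np.full((4, 4), 0.25)


# A candidate description with rare basin hopping versus one without it.
metastable = timescale_separation(two_basin_matrix(eps=0.02))
diffuse = timescale_separation(two_basin_matrix(eps=0.5))
```

Under this proxy, the description with rare basin-to-basin hopping (`eps=0.02`) has a much larger separation than the one with frequent hopping (`eps=0.5`), so it would be the preferred candidate.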
Affiliation(s)
- Shashank Pant
  - NIH Center for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Emad Tajkhorshid
  - NIH Center for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA