1
Rydzewski J. Spectral Map for Slow Collective Variables, Markovian Dynamics, and Transition State Ensembles. J Chem Theory Comput 2024; 20. [PMID: 39265157 PMCID: PMC11428138 DOI: 10.1021/acs.jctc.4c00428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Revised: 08/14/2024] [Accepted: 08/14/2024] [Indexed: 09/14/2024]
Abstract
Understanding the behavior of complex molecular systems is a fundamental problem in physical chemistry. To describe the long-time dynamics of such systems, which is responsible for their most informative characteristics, we can identify a few slow collective variables (CVs) while treating the remaining fast variables as thermal noise. This enables us to simplify the dynamics and treat it as diffusion in a free-energy landscape spanned by slow CVs, effectively rendering the dynamics Markovian. Our recent statistical learning technique, spectral map [Rydzewski, J. J. Phys. Chem. Lett. 2023, 14(22), 5216-5220], explores this strategy to learn slow CVs by maximizing a spectral gap of a transition matrix. In this work, we introduce several advancements into our framework, using a high-dimensional reversible folding process of a protein as an example. We implement an algorithm for coarse-graining Markov transition matrices to partition the reduced space of slow CVs kinetically and use it to define a transition state ensemble. We show that slow CVs learned by spectral map closely approach the Markovian limit for an overdamped diffusion. We demonstrate that coordinate-dependent diffusion coefficients only slightly affect the constructed free-energy landscapes. Finally, we present how spectral maps can be used to quantify the importance of features and compare slow CVs with structural descriptors commonly used in protein folding. Overall, we demonstrate that a single slow CV learned by spectral map can be used as a physical reaction coordinate to capture essential characteristics of protein folding.
Affiliation(s)
- Jakub Rydzewski
- Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Toruń, Poland

2
Brunton SL, Kutz JN. Promising directions of machine learning for partial differential equations. NATURE COMPUTATIONAL SCIENCE 2024; 4:483-494. [PMID: 38942926 DOI: 10.1038/s43588-024-00643-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Accepted: 05/13/2024] [Indexed: 06/30/2024]
Abstract
Partial differential equations (PDEs) are among the most universal and parsimonious descriptions of natural physical laws, capturing a rich variety of phenomenology and multiscale physics in a compact and symbolic representation. Here, we examine several promising avenues of PDE research that are being advanced by machine learning, including (1) discovering new governing PDEs and coarse-grained approximations for complex natural and engineered systems, (2) learning effective coordinate systems and reduced-order models to make PDEs more amenable to analysis, and (3) representing solution operators and improving traditional numerical algorithms. In each of these fields, we summarize key advances, ongoing challenges, and opportunities for further development.
Affiliation(s)
- Steven L Brunton
- Department of Mechanical Engineering, University of Washington, Seattle, WA, USA
- J Nathan Kutz
- Department of Applied Mathematics, University of Washington, Seattle, WA, USA

3
Wang D, Qiu Y, Beyerle ER, Huang X, Tiwary P. Information Bottleneck Approach for Markov Model Construction. J Chem Theory Comput 2024; 20:5352-5367. [PMID: 38859575 PMCID: PMC11199095 DOI: 10.1021/acs.jctc.4c00449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2024]
Abstract
Markov state models (MSMs) have proven valuable in studying the dynamics of protein conformational changes via statistical analysis of molecular dynamics simulations. In MSMs, the complex configuration space is coarse-grained into conformational states, with dynamics modeled by a series of Markovian transitions among these states at discrete lag times. Constructing the Markovian model at a specific lag time necessitates defining states that circumvent significant internal energy barriers, enabling internal dynamics relaxation within the lag time. This process effectively coarse-grains time and space, integrating out rapid motions within metastable states. Thus, MSMs possess a multiresolution nature, where the granularity of states can be adjusted according to the time-resolution, offering flexibility in capturing system dynamics. This work introduces a continuous embedding approach for molecular conformations using the state predictive information bottleneck (SPIB), a framework that unifies dimensionality reduction and state space partitioning via a continuous, machine learned basis set. Without explicit optimization of the VAMP-based scores, SPIB demonstrates state-of-the-art performance in identifying slow dynamical processes and constructing predictive multiresolution Markovian models. Through applications to well-validated mini-proteins, SPIB showcases unique advantages compared to competing methods. It autonomously and self-consistently adjusts the number of metastable states based on a specified minimal time resolution, eliminating the need for manual tuning. While maintaining efficacy in dynamical properties, SPIB excels in accurately distinguishing metastable states and capturing numerous well-populated macrostates. This contrasts with existing VAMP-based methods, which often emphasize slow dynamics at the expense of incorporating numerous sparsely populated states. 
Furthermore, SPIB's ability to learn a low-dimensional continuous embedding of the underlying MSMs enhances the interpretation of dynamic pathways. With these benefits, we propose SPIB as an easy-to-implement methodology for end-to-end MSM construction.
Affiliation(s)
- Dedi Wang
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
- Yunrui Qiu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
- Data Science Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
- Eric R. Beyerle
- Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
- Xuhui Huang
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
- Data Science Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
- Pratyush Tiwary
- Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
- University of Maryland Institute for Health Computing, Bethesda, MD 20852, United States

4
Marques S, Kouba P, Legrand A, Sedlar J, Disson L, Planas-Iglesias J, Sanusi Z, Kunka A, Damborsky J, Pajdla T, Prokop Z, Mazurenko S, Sivic J, Bednar D. CoVAMPnet: Comparative Markov State Analysis for Studying Effects of Drug Candidates on Disordered Biomolecules. JACS AU 2024; 4:2228-2245. [PMID: 38938816 PMCID: PMC11200249 DOI: 10.1021/jacsau.4c00182] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/28/2024] [Revised: 04/24/2024] [Accepted: 05/13/2024] [Indexed: 06/29/2024]
Abstract
Computational study of the effect of drug candidates on intrinsically disordered biomolecules is challenging due to their vast and complex conformational space. Here, we developed a comparative Markov state analysis (CoVAMPnet) framework to quantify changes in the conformational distribution and dynamics of a disordered biomolecule in the presence and absence of small organic drug candidate molecules. First, molecular dynamics trajectories are generated using enhanced sampling, in the presence and absence of small molecule drug candidates, and ensembles of soft Markov state models (MSMs) are learned for each system using unsupervised machine learning. Second, these ensembles of learned MSMs are aligned across different systems based on a solution to an optimal transport problem. Third, the directional importance of inter-residue distances for the assignment to different conformational states is assessed by a discriminative analysis of aggregated neural network gradients. This final step provides interpretability and biophysical context to the learned MSMs. We applied this novel computational framework to assess the effects of ongoing phase 3 therapeutics tramiprosate (TMP) and its metabolite 3-sulfopropanoic acid (SPA) on the disordered Aβ42 peptide involved in Alzheimer's disease. Based on adaptive sampling molecular dynamics and CoVAMPnet analysis, we observed that both TMP and SPA preserved more structured conformations of Aβ42 by interacting nonspecifically with charged residues. SPA impacted Aβ42 more than TMP, protecting α-helices and suppressing the formation of aggregation-prone β-strands. Experimental biophysical analyses showed only mild effects of TMP/SPA on Aβ42 and activity enhancement by the endogenous metabolization of TMP into SPA. Our data suggest that TMP/SPA may also target biomolecules other than Aβ peptides. 
The CoVAMPnet method is broadly applicable to study the effects of drug candidates on the conformational behavior of intrinsically disordered biomolecules.
Affiliation(s)
- Sérgio M. Marques
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Petr Kouba
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, Dejvice, Praha 6 160 00, Czech Republic
- Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, Dejvice, Praha 6 166 27, Czech Republic
- Anthony Legrand
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Jiri Sedlar
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, Dejvice, Praha 6 160 00, Czech Republic
- Lucas Disson
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, Dejvice, Praha 6 160 00, Czech Republic
- Joan Planas-Iglesias
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Zainab Sanusi
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Antonin Kunka
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Tomas Pajdla
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, Dejvice, Praha 6 160 00, Czech Republic
- Zbynek Prokop
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic
- Josef Sivic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, Dejvice, Praha 6 160 00, Czech Republic
- David Bednar
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, Brno 656 91, Czech Republic

5
Yawata K, Fukami K, Taira K, Nakao H. Phase autoencoder for limit-cycle oscillators. CHAOS (WOODBURY, N.Y.) 2024; 34:063111. [PMID: 38829787 DOI: 10.1063/5.0205718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Accepted: 05/10/2024] [Indexed: 06/05/2024]
Abstract
We present a phase autoencoder that encodes the asymptotic phase of a limit-cycle oscillator, a fundamental quantity characterizing its synchronization dynamics. This autoencoder is trained in such a way that its latent variables directly represent the asymptotic phase of the oscillator. The trained autoencoder can perform two functions without relying on the mathematical model of the oscillator: first, it can evaluate the asymptotic phase and the phase sensitivity function of the oscillator; second, it can reconstruct the oscillator state on the limit cycle in the original space from the phase value as an input. Using several examples of limit-cycle oscillators, we demonstrate that the asymptotic phase and the phase sensitivity function can be estimated only from time-series data by the trained autoencoder. We also present a simple method for globally synchronizing two oscillators as an application of the trained autoencoder.
Affiliation(s)
- Koichiro Yawata
- Department of Systems and Control Engineering, Tokyo Institute of Technology, Tokyo 152-8552, Japan
- Kai Fukami
- Department of Mechanical and Aerospace Engineering, University of California, Los Angeles, Los Angeles, California 90095, USA
- Kunihiko Taira
- Department of Mechanical and Aerospace Engineering, University of California, Los Angeles, Los Angeles, California 90095, USA
- Hiroya Nakao
- Department of Systems and Control Engineering, Tokyo Institute of Technology, Tokyo 152-8552, Japan
- Research Center for Autonomous Systems Materialogy, Institute of Innovative Research, Tokyo Institute of Technology, Kanagawa 226-8501, Japan

6
Stracke K, Evans JD. The use of collective variables and enhanced sampling in the simulations of existing and emerging microporous materials. NANOSCALE 2024. [PMID: 38647659 DOI: 10.1039/d4nr01024h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2024]
Abstract
Microporous materials, including zeolites, metal-organic frameworks, and cage compounds, offer diverse functionalities due to their unique dynamics and guest confinement properties. These materials play a significant role in separation, catalysis, and sensing, but their complexity hinders exploration using traditional atomistic simulations. This review explores collective variables (CVs) paired with enhanced sampling as a powerful approach to enable efficient investigation of key features in microporous materials. We highlight successful applications of CVs in studying adsorption, diffusion, phase transitions, and mechanical properties, demonstrating their crucial role in guiding material design and optimisation. The future of CVs lies in integration with techniques like machine learning, allowing for enhanced efficiency and accuracy. By tailoring CVs to specific materials and developing multi-scale approaches we can further unlock the intricacies of these fascinating materials. Simulations are a cornerstone in unravelling the complexities of microporous materials and are crucial for our future understanding.
Affiliation(s)
- Konstantin Stracke
- School of Physics, Chemistry and Earth Science, The University of Adelaide, 5005 Australia
- Jack D Evans
- School of Physics, Chemistry and Earth Science, The University of Adelaide, 5005 Australia

7
Sahimi M. Physics-informed and data-driven discovery of governing equations for complex phenomena in heterogeneous media. Phys Rev E 2024; 109:041001. [PMID: 38755895 DOI: 10.1103/physreve.109.041001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Indexed: 05/18/2024]
Abstract
Rapid evolution of sensor technology, advances in instrumentation, and progress in devising data-acquisition software and hardware are providing vast amounts of data for various complex phenomena that occur in heterogeneous media, ranging from those in atmospheric environment, to large-scale porous formations, and biological systems. The tremendous increase in the speed of scientific computing has also made it possible to emulate diverse multiscale and multiphysics phenomena that contain elements of stochasticity or heterogeneity, and to generate large volumes of numerical data for them. Thus, given a heterogeneous system with annealed or quenched disorder in which a complex phenomenon occurs, how should one analyze and model the system and phenomenon, explain the data, and make predictions for length and time scales much larger than those over which the data were collected? We divide such systems into three distinct classes. (i) Those for which the governing equations for the physical phenomena of interest, as well as data, are known, but solving the equations over large length scales and long times is very difficult. (ii) Those for which data are available, but the governing equations are only partially known, in the sense that they either contain various coefficients that must be evaluated based on the data, or that the number of degrees of freedom of the system is so large that deriving the complete equations is very difficult, if not impossible, as a result of which one must develop the governing equations with reduced dimensionality. (iii) In the third class are systems for which large amounts of data are available, but the governing equations for the phenomena of interest are not known. 
Several classes of physics-informed and data-driven approaches for analyzing and modeling of the three classes of systems have been emerging, which are based on machine learning, symbolic regression, the Koopman operator, the Mori-Zwanzig projection operator formulation, sparse identification of nonlinear dynamics, data assimilation combined with a neural network, and stochastic optimization and analysis. This perspective describes such methods and the latest developments in this highly important and rapidly expanding area and discusses possible future directions.
Affiliation(s)
- Muhammad Sahimi
- Mork Family Department of Chemical Engineering and Materials Science, University of Southern California, Los Angeles, California 90089-1211, USA

8
Müllender L, Rizzi A, Parrinello M, Carloni P, Mandelli D. Effective data-driven collective variables for free energy calculations from metadynamics of paths. PNAS NEXUS 2024; 3:pgae159. [PMID: 38665160 PMCID: PMC11044970 DOI: 10.1093/pnasnexus/pgae159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Accepted: 04/04/2024] [Indexed: 04/28/2024]
Abstract
A variety of enhanced sampling (ES) methods predict multidimensional free energy landscapes associated with biological and other molecular processes as a function of a few selected collective variables (CVs). The accuracy of these methods is crucially dependent on the ability of the chosen CVs to capture the relevant slow degrees of freedom of the system. For complex processes, finding such CVs is the real challenge. Machine learning (ML) CVs offer, in principle, a solution to handle this problem. However, these methods rely on the availability of high-quality datasets (ideally incorporating information about physical pathways and transition states), which are difficult to access, therefore greatly limiting their domain of application. Here, we demonstrate how these datasets can be generated by means of ES simulations in trajectory space via the metadynamics of paths algorithm. The approach is expected to provide a general and efficient way to generate efficient ML-based CVs for the fast prediction of free energy landscapes in ES simulations. We demonstrate our approach with two numerical examples, a 2D model potential and the isomerization of alanine dipeptide, using deep targeted discriminant analysis as our ML-based CV of choice.
Affiliation(s)
- Lukas Müllender
- Department of Applied Physics, Science for Life Laboratory, KTH Royal Institute of Technology, SE-171 21 Solna, Sweden
- Computational Biomedicine, Institute of Advanced Simulations IAS-5/Institute for Neuroscience and Medicine INM-9, Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Department of Physics, RWTH Aachen University, 52062 Aachen, Germany
- Andrea Rizzi
- Computational Biomedicine, Institute of Advanced Simulations IAS-5/Institute for Neuroscience and Medicine INM-9, Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Atomistic Simulations, Italian Institute of Technology, 16163 Genova, Italy
- Michele Parrinello
- Atomistic Simulations, Italian Institute of Technology, 16163 Genova, Italy
- Paolo Carloni
- Computational Biomedicine, Institute of Advanced Simulations IAS-5/Institute for Neuroscience and Medicine INM-9, Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Department of Physics, RWTH Aachen University, 52062 Aachen, Germany
- Universitätsklinikum, RWTH Aachen University, 52062 Aachen, Germany
- Davide Mandelli
- Computational Biomedicine, Institute of Advanced Simulations IAS-5/Institute for Neuroscience and Medicine INM-9, Forschungszentrum Jülich GmbH, 52428 Jülich, Germany

9
Wu Y, Cao S, Qiu Y, Huang X. Tutorial on how to build non-Markovian dynamic models from molecular dynamics simulations for studying protein conformational changes. J Chem Phys 2024; 160:121501. [PMID: 38516972 DOI: 10.1063/5.0189429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 02/20/2024] [Indexed: 03/23/2024]
Abstract
Protein conformational changes play crucial roles in their biological functions. In recent years, the Markov State Model (MSM) constructed from extensive Molecular Dynamics (MD) simulations has emerged as a powerful tool for modeling complex protein conformational changes. In MSMs, dynamics are modeled as a sequence of Markovian transitions among metastable conformational states at discrete time intervals (called lag time). A major challenge for MSMs is that the lag time must be long enough to allow transitions among states to become memoryless (or Markovian). However, this lag time is constrained by the length of individual MD simulations available to track these transitions. To address this challenge, we have recently developed Generalized Master Equation (GME)-based approaches, encoding non-Markovian dynamics using a time-dependent memory kernel. In this Tutorial, we introduce the theory behind two recently developed GME-based non-Markovian dynamic models: the quasi-Markov State Model (qMSM) and the Integrative Generalized Master Equation (IGME). We subsequently outline the procedures for constructing these models and provide a step-by-step tutorial on applying qMSM and IGME to study two peptide systems: alanine dipeptide and villin headpiece. This Tutorial is available at https://github.com/xuhuihuang/GME_tutorials. The protocols detailed in this Tutorial aim to be accessible for non-experts interested in studying the biomolecular dynamics using these non-Markovian dynamic models.
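The MSM idea summarized in this abstract (Markovian transitions among discrete states at a fixed lag time) can be illustrated with a minimal sketch. This is not code from the tutorial or its repository: the function names `count_matrix`, `transition_matrix`, and `implied_timescales` are hypothetical, the estimator is a plain row-normalized count matrix without reversibility constraints, and the input is assumed to be a single trajectory already assigned to integer state labels.

```python
import numpy as np

def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j separated by `lag` steps (sliding window)."""
    dtraj = np.asarray(dtraj, dtype=int)
    counts = np.zeros((n_states, n_states))
    # np.add.at accumulates correctly when the same (i, j) pair repeats.
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return counts

def transition_matrix(counts):
    """Row-normalize counts into a row-stochastic transition matrix."""
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every state must have at least one outgoing count")
    return counts / row_sums

def implied_timescales(T, lag):
    """t_i = -lag / ln|lambda_i| for all eigenvalues except the stationary one."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(eigvals[1:])

# Toy two-state trajectory (illustrative data, not from the paper).
dtraj = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 1])
T = transition_matrix(count_matrix(dtraj, n_states=2, lag=1))
timescales = implied_timescales(T, lag=1)
```

In practice one would repeat the estimate for several lag times and look for implied timescales that level off; the abstract's point is that the lag time at which this happens may exceed what the available trajectories allow, which is what motivates the memory-kernel models.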
Affiliation(s)
- Yue Wu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Siqin Cao
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Yunrui Qiu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Xuhui Huang
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Data Science Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA

10
Lelièvre T, Pigeon T, Stoltz G, Zhang W. Analyzing Multimodal Probability Measures with Autoencoders. J Phys Chem B 2024; 128:2607-2631. [PMID: 38466759 DOI: 10.1021/acs.jpcb.3c07075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2024]
Abstract
Finding collective variables to describe some important coarse-grained information on physical systems, in particular metastable states, remains a key issue in molecular dynamics. Recently, machine learning techniques have been intensively used to complement and possibly bypass expert knowledge in order to construct collective variables. Our focus here is on neural network approaches based on autoencoders. We study some relevant mathematical properties of the loss function considered for training autoencoders and provide physical interpretations based on conditional variances and minimum energy paths. We also consider various extensions in order to better describe physical systems, by incorporating more information on transition states at saddle points, and/or allowing for multiple decoders in order to describe several transition paths. Our results are illustrated on toy two-dimensional systems and on alanine dipeptide.
Affiliation(s)
- Tony Lelièvre
- CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
- MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
- Thomas Pigeon
- CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
- MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
- IFP Energies Nouvelles, Rond-Point de l'Echangeur de Solaize, BP 3, 69360 Solaize, France
- Gabriel Stoltz
- CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
- MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
- Wei Zhang
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 14, 14195 Berlin, Germany
- Zuse Institute Berlin, Takustraße 7, 14195 Berlin, Germany

11
Rubaiyat AHM, Thai DH, Nichols JM, Hutchinson MN, Wallen SP, Naify CJ, Geib N, Haberman MR, Rohde GK. Data-driven Identification of Parametric Governing Equations of Dynamical Systems Using the Signed Cumulative Distribution Transform. COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING 2024; 422:116822. [PMID: 38352168 PMCID: PMC10861186 DOI: 10.1016/j.cma.2024.116822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2024]
Abstract
This paper presents a novel data-driven approach to identify partial differential equation (PDE) parameters of a dynamical system. Specifically, we adopt a mathematical "transport" model for the solution of the dynamical system at specific spatial locations that allows us to accurately estimate the model parameters, including those associated with structural damage. This is accomplished by means of a newly-developed mathematical transform, the signed cumulative distribution transform (SCDT), which is shown to convert the general nonlinear parameter estimation problem into a simple linear regression. This approach has the additional practical advantage of requiring no a priori knowledge of the source of the excitation (or, alternatively, the initial conditions). By using training data, we devise a coarse regression procedure to recover different PDE parameters from the PDE solution measured at a single location. Numerical experiments show that the proposed regression procedure is capable of detecting and estimating PDE parameters with superior accuracy compared to a number of recently developed machine learning methods. Furthermore, a damage identification experiment conducted on a publicly available dataset provides strong evidence of the proposed method's effectiveness in structural health monitoring (SHM) applications. The Python implementation of the proposed system identification technique is integrated as a part of the software package PyTransKit [1].
Affiliation(s)
- Abu Hasnat Mohammad Rubaiyat
- Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, 22904, USA
- U.S. Naval Research Laboratory, Washington, DC, 20375, USA
- Duy H Thai
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908, USA
- Samuel P Wallen
- Applied Research Laboratories, The University of Texas at Austin, Austin, TX, 78712, USA
- Christina J Naify
- Applied Research Laboratories, The University of Texas at Austin, Austin, TX, 78712, USA
- Nathan Geib
- Applied Research Laboratories, The University of Texas at Austin, Austin, TX, 78712, USA
- Michael R Haberman
- Walker Department of Mechanical Engineering, The University of Texas at Austin, Austin, TX, 78712, USA
- Applied Research Laboratories, The University of Texas at Austin, Austin, TX, 78712, USA
- Gustavo K Rohde
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908, USA
- Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, 22904, USA

12
Rydzewski J, Gökdemir T. Learning Markovian dynamics with spectral maps. J Chem Phys 2024; 160:091102. [PMID: 38436438 DOI: 10.1063/5.0189241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 02/05/2024] [Indexed: 03/05/2024]
Abstract
The long-time behavior of many complex molecular systems can often be described by Markovian dynamics in a slow subspace spanned by a few reaction coordinates referred to as collective variables (CVs). However, determining CVs poses a fundamental challenge in chemical physics. Depending on intuition or trial and error to construct CVs can lead to non-Markovian dynamics with long memory effects, hindering analysis. To address this problem, we continue to develop a recently introduced deep-learning technique called spectral map [J. Rydzewski, J. Phys. Chem. Lett. 14, 5216-5220 (2023)]. Spectral map learns slow CVs by maximizing a spectral gap of a Markov transition matrix describing anisotropic diffusion. Here, to represent heterogeneous and multiscale free-energy landscapes with spectral map, we implement an adaptive algorithm to estimate transition probabilities. Through a Markov state model analysis, we validate that spectral map learns slow CVs related to the dominant relaxation timescales and discerns between long-lived metastable states.
Affiliation(s)
- Jakub Rydzewski
- Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Toruń, Poland
- Tuğçe Gökdemir
- Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Toruń, Poland
13
Karrenbrock M, Rizzi V, Procacci P, Gervasio FL. Addressing Suboptimal Poses in Nonequilibrium Alchemical Calculations. J Phys Chem B 2024; 128:1595-1605. [PMID: 38323915 DOI: 10.1021/acs.jpcb.3c06516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/08/2024]
Abstract
Alchemical transformations can be used to quantitatively estimate absolute binding free energies at a reasonable computational cost. However, most of the approaches currently in use require knowledge of the correct (crystallographic) pose. In this paper, we present a combined Hamiltonian replica exchange nonequilibrium alchemical method that allows us to reliably calculate absolute binding free energies, even when starting from suboptimal initial binding poses. Performing a preliminary Hamiltonian replica exchange enhances the sampling of slow degrees of freedom of the ligand and the target, allowing the system to populate the correct binding pose when starting from an approximate docking pose. We apply the method on 6 ligands of the first bromodomain of the BRD4 bromodomain-containing protein. For each ligand, we start nonequilibrium alchemical transformations from both the crystallographic pose and the top-scoring docked pose that are often significantly different. We show that the method produces statistically equivalent binding free energies, making it a useful tool for computational drug discovery pipelines.
Affiliation(s)
- Maurice Karrenbrock
- School of Pharmaceutical Sciences, University of Geneva, Rue Michel-Servet 1, CH-1206 Geneva, Switzerland
- Valerio Rizzi
- School of Pharmaceutical Sciences, University of Geneva, Rue Michel-Servet 1, CH-1206 Geneva, Switzerland
- Piero Procacci
- Chemistry Department, University of Florence, Via della Lastruccia 3-13, 50019 Sesto Fiorentino, Italy
- Francesco Luigi Gervasio
- School of Pharmaceutical Sciences, University of Geneva, Rue Michel-Servet 1, CH-1206 Geneva, Switzerland
- Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, CH-1206 Geneva, Switzerland
- Chemistry Department, University College London (UCL), WC1E 6BT London, U.K.
- Swiss Bioinformatics Institute, University of Geneva, CH-1206 Geneva, Switzerland
14
Fu H, Bian H, Shao X, Cai W. Collective Variable-Based Enhanced Sampling: From Human Learning to Machine Learning. J Phys Chem Lett 2024; 15:1774-1783. [PMID: 38329095 DOI: 10.1021/acs.jpclett.3c03542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Enhanced-sampling algorithms relying on collective variables (CVs) are extensively employed to study complex (bio)chemical processes that are not amenable to brute-force molecular simulations. The selection of appropriate CVs characterizing the slow movement modes is of paramount importance for reliable and efficient enhanced-sampling simulations. In this Perspective, we first review the application and limitations of CVs obtained from chemical and geometrical intuition. We also introduce path-sampling algorithms, which can identify path-like CVs in a high-dimensional free-energy space. Machine-learning algorithms offer a viable approach to finding suitable CVs by analyzing trajectories from preliminary simulations. We discuss both the performance of machine-learning-derived CVs in enhanced-sampling simulations of experimental models and the challenges involved in applying these CVs to realistic, complex molecular assemblies. Moreover, we provide a prospective view of the potential advancements of machine-learning algorithms for the development of CVs in the field of enhanced-sampling simulations.
Affiliation(s)
- Haohao Fu
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Hengwei Bian
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
15
Lücke M, Winkelmann S, Heitzig J, Molkenthin N, Koltai P. Learning interpretable collective variables for spreading processes on networks. Phys Rev E 2024; 109:L022301. [PMID: 38491651 DOI: 10.1103/physreve.109.l022301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 12/28/2023] [Indexed: 03/18/2024]
Abstract
Collective variables (CVs) are low-dimensional projections of high-dimensional system states. They are used to gain insights into complex emergent dynamical behaviors of processes on networks. The relation between CVs and network measures is not well understood and its derivation typically requires detailed knowledge of both the dynamical system and the network topology. In this Letter, we present a data-driven method for algorithmically learning and understanding CVs for binary-state spreading processes on networks of arbitrary topology. We demonstrate our method using four example networks: the stochastic block model, a ring-shaped graph, a random regular graph, and a scale-free network generated by the Albert-Barabási model. Our results deliver evidence for the existence of low-dimensional CVs even in cases that are not yet understood theoretically.
Affiliation(s)
- Marvin Lücke
- Modeling and Simulation of Complex Processes, Zuse Institute Berlin, 14195 Berlin, Germany
- Stefanie Winkelmann
- Modeling and Simulation of Complex Processes, Zuse Institute Berlin, 14195 Berlin, Germany
- Jobst Heitzig
- FutureLab on Game Theory and Networks of Interacting Agents, Potsdam Institute for Climate Impact Research, 14473 Potsdam, Germany and Zuse Institute Berlin, 14195 Berlin, Germany
- Nora Molkenthin
- Complexity Science Department, Potsdam Institute for Climate Impact Research, 14473 Potsdam, Germany
- Péter Koltai
- Department of Mathematics, University of Bayreuth, 95447 Bayreuth, Germany
16
Wu H, Noé F. Reaction coordinate flows for model reduction of molecular kinetics. J Chem Phys 2024; 160:044109. [PMID: 38270975 DOI: 10.1063/5.0176078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 12/26/2023] [Indexed: 01/26/2024] Open
Abstract
In this work, we introduce a flow based machine learning approach called reaction coordinate (RC) flow for the discovery of low-dimensional kinetic models of molecular systems. The RC flow utilizes a normalizing flow to design the coordinate transformation and a Brownian dynamics model to approximate the kinetics of RC, where all model parameters can be estimated in a data-driven manner. In contrast to existing model reduction methods for molecular kinetics, RC flow offers a trainable and tractable model of reduced kinetics in continuous time and space due to the invertibility of the normalizing flow. Furthermore, the Brownian dynamics-based reduced kinetic model investigated in this work yields a readily discernible representation of metastable states within the phase space of the molecular system. Numerical experiments demonstrate how effectively the proposed method discovers interpretable and accurate low-dimensional representations of given full-state kinetics from simulations.
Affiliation(s)
- Hao Wu
- School of Mathematical Sciences, Institute of Natural Sciences and MOE-LSC, Shanghai Jiao Tong University, Shanghai, People's Republic of China
- Frank Noé
- Department of Mathematics and Computer Science and Department of Physics, Freie Universität Berlin, Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, USA
- Microsoft Research AI4Science, Berlin, Germany
17
Beyerle ER, Tiwary P. Thermodynamically Optimized Machine-Learned Reaction Coordinates for Hydrophobic Ligand Dissociation. J Phys Chem B 2024; 128:755-767. [PMID: 38205806 DOI: 10.1021/acs.jpcb.3c08304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2024]
Abstract
Ligand unbinding is mediated by its free energy change, which has intertwined contributions from both energy and entropy. It is important, but not easy, to quantify their individual contributions to the free energy profile. We model hydrophobic ligand unbinding for two systems, a methane particle and a C60 fullerene, both unbinding from hydrophobic pockets in all-atom water. Using a modified deep learning framework, we learn a thermodynamically optimized reaction coordinate to describe the hydrophobic ligand dissociation for both systems. Interpretation of these reaction coordinates reveals the roles of entropic and enthalpic forces as the ligand and pocket sizes change. In both cases, we observe that the free-energy barrier to unbinding is dominated by entropy considerations. Furthermore, the process of methane unbinding is driven by methane solvation, while fullerene unbinding is driven first by pocket wetting and then fullerene wetting. For both solutes, the direct importance of the distance from the binding pocket to the learned reaction coordinate is present, but low. Our framework and subsequent feature importance analysis thus give useful thermodynamic insight into hydrophobic ligand dissociation problems that is otherwise difficult to glean.
Affiliation(s)
- Eric R Beyerle
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
- Pratyush Tiwary
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
- Department of Chemistry, University of Maryland, College Park, Maryland 20742, United States
18
Liu X, Xing J, Fu H, Shao X, Cai W. Analyzing Molecular Dynamics Trajectories Thermodynamically through Artificial Intelligence. J Chem Theory Comput 2024; 20:665-676. [PMID: 38193858 DOI: 10.1021/acs.jctc.3c00975] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/10/2024]
Abstract
Molecular dynamics simulations produce trajectories that correspond to vast amounts of structure when exploring biochemical processes. Extracting valuable information, e.g., important intermediate states and collective variables (CVs) that describe the major movement modes, from molecular trajectories to understand the underlying mechanisms of biological processes presents a significant challenge. To achieve this goal, we introduce a deep learning approach, coined DIKI (deep identification of key intermediates), to determine low-dimensional CVs distinguishing key intermediate conformations without a-priori assumptions. DIKI dynamically plans the distribution of latent space and groups together similar conformations within the same cluster. Moreover, by incorporating two user-defined parameters, namely, coarse focus knob and fine focus knob, to help identify conformations with low free energy and differentiate the subtle distinctions among these conformations, resolution-tunable clustering was achieved. Furthermore, the integration of DIKI with a path-finding algorithm contributes to the identification of crucial intermediates along the lowest free-energy pathway. We postulate that DIKI is a robust and flexible tool that can find widespread applications in the analysis of complex biochemical processes.
Affiliation(s)
- Xuyang Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Jingya Xing
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Haohao Fu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
19
Tam B, Qin Z, Zhao B, Sinha S, Lei CL, Wang SM. Classification of MLH1 Missense VUS Using Protein Structure-Based Deep Learning-Ramachandran Plot-Molecular Dynamics Simulations Method. Int J Mol Sci 2024; 25:850. [PMID: 38255924 PMCID: PMC10815254 DOI: 10.3390/ijms25020850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 01/04/2024] [Accepted: 01/05/2024] [Indexed: 01/24/2024] Open
Abstract
Pathogenic variation in DNA mismatch repair (MMR) gene MLH1 is associated with Lynch syndrome (LS), an autosomal dominant hereditary cancer. Of the 3798 MLH1 germline variants collected in the ClinVar database, 38.7% (1469) were missense variants, of which 81.6% (1199) were classified as Variants of Uncertain Significance (VUS) due to the lack of functional evidence. Further determination of the impact of VUS on MLH1 function is important for the VUS carriers to take preventive action. We recently developed a protein structure-based method named "Deep Learning-Ramachandran Plot-Molecular Dynamics Simulation (DL-RP-MDS)" to evaluate the deleteriousness of MLH1 missense VUS. The method extracts protein structural information by using the Ramachandran plot-molecular dynamics simulation (RP-MDS) method, then combines the variation data with an unsupervised learning model composed of auto-encoder and neural network classifier to identify the variants causing significant change in protein structure. In this report, we applied the method to classify 447 MLH1 missense VUS. We predicted 126/447 (28.2%) MLH1 missense VUS were deleterious. Our study demonstrates that DL-RP-MDS is able to classify the missense VUS based solely on their impact on protein structure.
Affiliation(s)
- Benjamin Tam
- Ministry of Education Frontiers Science Center for Precision Oncology, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Cancer Centre, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Zixin Qin
- Ministry of Education Frontiers Science Center for Precision Oncology, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Cancer Centre, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Bojin Zhao
- Ministry of Education Frontiers Science Center for Precision Oncology, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Cancer Centre, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Siddharth Sinha
- Ministry of Education Frontiers Science Center for Precision Oncology, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Cancer Centre, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Chon Lok Lei
- Ministry of Education Frontiers Science Center for Precision Oncology, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Cancer Centre, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Macau SAR, China
- San Ming Wang
- Ministry of Education Frontiers Science Center for Precision Oncology, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Cancer Centre, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Macau SAR, China
20
Herringer NSM, Dasetty S, Gandhi D, Lee J, Ferguson AL. Permutationally Invariant Networks for Enhanced Sampling (PINES): Discovery of Multimolecular and Solvent-Inclusive Collective Variables. J Chem Theory Comput 2024; 20:178-198. [PMID: 38150421 DOI: 10.1021/acs.jctc.3c00923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2023]
Abstract
The typically rugged nature of molecular free-energy landscapes can frustrate efficient sampling of the thermodynamically relevant phase space due to the presence of high free-energy barriers. Enhanced sampling techniques can improve phase space exploration by accelerating sampling along particular collective variables (CVs). A number of techniques exist for the data-driven discovery of CVs parametrizing the important large-scale motions of the system. A challenge to CV discovery is learning CVs invariant to the symmetries of the molecular system, frequently rigid translation, rigid rotation, and permutational relabeling of identical particles. Of these, permutational invariance has proved a persistent challenge in frustrating the data-driven discovery of multimolecular CVs in systems of self-assembling particles and solvent-inclusive CVs for solvated systems. In this work, we integrate permutation invariant vector (PIV) featurizations with autoencoding neural networks to learn nonlinear CVs invariant to translation, rotation, and permutation and perform interleaved rounds of CV discovery and enhanced sampling to iteratively expand the sampling of configurational phase space and obtain converged CVs and free-energy landscapes. We demonstrate the permutationally invariant network for enhanced sampling (PINES) approach in applications to the self-assembly of a 13-atom argon cluster, association/dissociation of a NaCl ion pair in water, and hydrophobic collapse of a C45H92 n-pentatetracontane polymer chain. We make the approach freely available as a new module within the PLUMED2 enhanced sampling libraries.
Affiliation(s)
- Siva Dasetty
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Diya Gandhi
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Junhee Lee
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
21
Ishizone T, Matsunaga Y, Fuchigami S, Nakamura K. Representation of Protein Dynamics Disentangled by Time-Structure-Based Prior. J Chem Theory Comput 2024; 20:436-450. [PMID: 38151233 DOI: 10.1021/acs.jctc.3c01025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2023]
Abstract
Representation learning (RL) is a universal technique for deriving low-dimensional disentangled representations from high-dimensional observations, aiding in a multitude of downstream tasks. RL has been extensively applied to various data types, including images and natural language. Here, we analyze molecular dynamics (MD) simulation data of biomolecules in terms of RL. Currently, state-of-the-art RL techniques, mainly motivated by the variational principle, try to capture slow motions in the representation (latent) space. Here, we propose two methods based on an alternative perspective on the disentanglement in the latent space. By disentanglement, we here mean the separation of underlying factors in the simulation data, aiding in detecting physically important coordinates for conformational transitions. The proposed methods introduce a simple prior that imposes temporal constraints in the latent space, serving as a regularization term to facilitate the capture of disentangled representations of dynamics. Comparison with other methods via the analysis of MD simulation trajectories for alanine dipeptide and chignolin validates that the proposed methods construct Markov state models (MSMs) whose implied time scales are comparable to those of the state-of-the-art methods. Using a measure based on total variation, we quantitatively evaluated that the proposed methods successfully disentangle physically important coordinates, aiding the interpretation of folding/unfolding transitions of chignolin. Overall, our methods provide good representations of complex biomolecular dynamics for downstream tasks, allowing for better interpretations of the conformational transitions.
Affiliation(s)
- Tsuyoshi Ishizone
- Mathematical Sciences Program, Graduate School of Advanced Mathematical Sciences, Meiji University, Nakano 4-21-1, Nakano-ku, Tokyo 164-8525, Japan
- Yasuhiro Matsunaga
- Graduate School of Science and Engineering, Saitama University, Shimo-Okubo 255, Sakura-ku, Saitama-shi, Saitama 338-8570, Japan
- Sotaro Fuchigami
- Physical Biochemistry Laboratory, Division of Pharmaceutical Sciences, School of Pharmaceutical Sciences, University of Shizuoka, 52-1 Yada, Suruga-ku, Shizuoka 422-8526, Japan
- Kazuyuki Nakamura
- Department of Mathematical Sciences Based on Modeling and Analysis, School of Interdisciplinary Mathematical Sciences, Meiji University, Nakano 4-21-1, Nakano-ku, Tokyo 164-8525, Japan
22
Kleiman DE, Nadeem H, Shukla D. Adaptive Sampling Methods for Molecular Dynamics in the Era of Machine Learning. J Phys Chem B 2023; 127:10669-10681. [PMID: 38081185 DOI: 10.1021/acs.jpcb.3c04843] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 12/22/2023]
Abstract
Molecular dynamics (MD) simulations are fundamental computational tools for the study of proteins and their free energy landscapes. However, sampling protein conformational changes through MD simulations is challenging due to the relatively long time scales of these processes. Many enhanced sampling approaches have emerged to tackle this problem, including biased sampling and path-sampling methods. In this Perspective, we focus on adaptive sampling algorithms. These techniques differ from other approaches because the thermodynamic ensemble is preserved and the sampling is enhanced solely by restarting MD trajectories at particularly chosen seeds rather than introducing biasing forces. We begin our treatment with an overview of theoretically transparent methods, where we discuss principles and guidelines for adaptive sampling. Then, we present a brief summary of select methods that have been applied to realistic systems in the past. Finally, we discuss recent advances in adaptive sampling methodology powered by deep learning techniques, as well as their shortcomings.
Affiliation(s)
- Diego E Kleiman
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Hassan Nadeem
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Diwakar Shukla
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
23
Fu H, Liu H, Xing J, Zhao T, Shao X, Cai W. Deep-Learning-Assisted Enhanced Sampling for Exploring Molecular Conformational Changes. J Phys Chem B 2023; 127:9926-9935. [PMID: 37947397 DOI: 10.1021/acs.jpcb.3c05284] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 11/12/2023]
Abstract
We present a novel strategy to explore conformational changes and identify stable states of molecular objects, eliminating the need for a priori knowledge. The approach applies a deep learning method to extract information about the movement modes of the molecular object from a short, high-dimensional, and parameter-free preliminary enhanced-sampling simulation. The gathered information is described by a small set of deep-learning-based collective variables (dCVs), which steer the production-enhanced-sampling simulation. Considering the challenge of adequately exploring the configurational space using the low-dimensional, suboptimal dCVs, we incorporate a method designed for ergodic sampling, namely, Gaussian-accelerated molecular dynamics (MD), into the framework of CV-based enhanced sampling. MD simulations on both toy models and nontrivial examples demonstrate the remarkable computational efficiency of the strategy in capturing the conformational changes of molecular objects without a priori knowledge. Specifically, we achieved the blind folding of two fast folders, chignolin and villin, within a time scale of hundreds of nanoseconds and successfully reconstructed the free-energy landscapes that characterize their reversible folding. All in all, the presented strategy holds significant promise for investigating conformational changes in macromolecules, and it is anticipated to find extensive applications in the fields of chemistry and biology.
Affiliation(s)
- Haohao Fu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Han Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Jingya Xing
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Tong Zhao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
24
Zhu J, Li Z, Tong H, Lu Z, Zhang N, Wei T, Chen HF. Phanto-IDP: compact model for precise intrinsically disordered protein backbone generation and enhanced sampling. Brief Bioinform 2023; 25:bbad429. [PMID: 38018910 PMCID: PMC10783862 DOI: 10.1093/bib/bbad429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 09/21/2023] [Accepted: 11/05/2023] [Indexed: 11/30/2023] Open
Abstract
The biological function of proteins is determined not only by their static structures but also by the dynamic properties of their conformational ensembles. Numerous high-accuracy static structure prediction tools have been recently developed based on deep learning; however, there remains a lack of efficient and accurate methods for exploring protein dynamic conformations. Traditionally, studies concerning protein dynamics have relied on molecular dynamics (MD) simulations, which incur significant computational costs for all-atom precision and struggle to adequately sample conformational spaces with high energy barriers. To overcome these limitations, various enhanced sampling techniques have been developed to accelerate sampling in MD. Traditional enhanced sampling approaches like replica exchange molecular dynamics (REMD) and frontier expansion sampling (FEXS) often follow the MD simulation approach and still cost a lot of computational resources and time. Variational autoencoders (VAEs), as a classic deep generative model, are not restricted by potential energy landscapes and can explore conformational spaces more efficiently than traditional methods. However, VAEs often face challenges in generating reasonable conformations for complex proteins, especially intrinsically disordered proteins (IDPs), which limits their application as an enhanced sampling method. In this study, we presented a novel deep learning model (named Phanto-IDP) that utilizes a graph-based encoder to extract protein features and a transformer-based decoder combined with variational sampling to generate highly accurate protein backbones. Ten IDPs and four structured proteins were used to evaluate the sampling ability of Phanto-IDP. 
The results demonstrate that Phanto-IDP has high fidelity and diversity in the generated conformation ensembles, making it a suitable tool for enhancing the efficiency of MD simulation, generating broader protein conformational space and a continuous protein transition path.
Affiliation(s)
- Junjie Zhu
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Zhengxin Li
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Haowei Tong
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Zhouyu Lu
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Ningjie Zhang
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Ting Wei
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Hai-Feng Chen
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China

25
Lederer J, Gastegger M, Schütt KT, Kampffmeyer M, Müller KR, Unke OT. Automatic identification of chemical moieties. Phys Chem Chem Phys 2023; 25:26370-26379. [PMID: 37750554 PMCID: PMC10548786 DOI: 10.1039/d3cp03845a] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/11/2023] [Accepted: 08/18/2023] [Indexed: 09/27/2023]
Abstract
In recent years, the prediction of quantum mechanical observables with machine learning methods has become increasingly popular. Message-passing neural networks (MPNNs) solve this task by constructing atomic representations, from which the properties of interest are predicted. Here, we introduce a method to automatically identify chemical moieties (molecular building blocks) from such representations, enabling a variety of applications beyond property prediction, which otherwise rely on expert knowledge. The required representation can either be provided by a pretrained MPNN, or be learned from scratch using only structural information. Beyond the data-driven design of molecular fingerprints, the versatility of our approach is demonstrated by enabling the selection of representative entries in chemical databases, the automatic construction of coarse-grained force fields, as well as the identification of reaction coordinates.
Affiliation(s)
- Jonas Lederer
  - Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
  - BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Michael Gastegger
  - Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
  - BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Kristof T Schütt
  - Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
  - BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
- Michael Kampffmeyer
  - Department of Physics and Technology, UiT The Arctic University of Norway, 9019 Tromsø, Norway
- Klaus-Robert Müller
  - Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
  - BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
  - Google Deepmind, Germany
  - Department of Artificial Intelligence, Korea University, Seoul 136-713, Korea
  - Max Planck Institut für Informatik, 66123 Saarbrücken, Germany
- Oliver T Unke
  - Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany.
  - BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany
  - Google Deepmind, Germany

26
Lemcke S, Appeldorn JH, Wand M, Speck T. Toward a structural identification of metastable molecular conformations. J Chem Phys 2023; 159:114105. [PMID: 37712784 DOI: 10.1063/5.0164145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 08/21/2023] [Indexed: 09/16/2023] Open
Abstract
Interpreting high-dimensional data from molecular dynamics simulations is a persistent challenge. In this paper, we show that for a small peptide, deca-alanine, metastable states can be identified through a neural net based on structural information alone. While processing molecular dynamics data, dimensionality reduction is a necessary step that projects high-dimensional data onto a low-dimensional representation that, ideally, captures the conformational changes in the underlying data. Conventional methods make use of the temporal information contained in trajectories generated through integrating the equations of motion, which forgoes more efficient sampling schemes. We demonstrate that EncoderMap, an autoencoder architecture with an additional distance metric, can find a suitable low-dimensional representation to identify long-lived molecular conformations using exclusively structural information. For deca-alanine, which exhibits several helix-forming pathways, we show that this approach allows us to combine simulations with different biasing forces and yields representations comparable in quality to other established methods. Our results contribute to computational strategies for the rapid automatic exploration of the configuration space of peptides and proteins.
Affiliation(s)
- Simon Lemcke
  - Institut für Physik, Johannes Gutenberg-Universität Mainz, Staudingerweg 7-9, 55128 Mainz, Germany
- Jörn H Appeldorn
  - Institut für Physik, Johannes Gutenberg-Universität Mainz, Staudingerweg 7-9, 55128 Mainz, Germany
- Michael Wand
  - Institut für Informatik, Johannes Gutenberg-Universität Mainz, Staudingerweg 9, 55128 Mainz, Germany
- Thomas Speck
  - Institut für Theoretische Physik IV, Universität Stuttgart, Heisenbergstr. 3, 70569 Stuttgart, Germany

27
Strahan J, Finkel J, Dinner AR, Weare J. Predicting rare events using neural networks and short-trajectory data. J Comput Phys 2023; 488:112152. [PMID: 37332834 PMCID: PMC10270692 DOI: 10.1016/j.jcp.2023.112152] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 06/20/2023]
Abstract
Estimating the likelihood, timing, and nature of events is a major goal of modeling stochastic dynamical systems. When the event is rare in comparison with the timescales of simulation and/or measurement needed to resolve the elemental dynamics, accurate prediction from direct observations becomes challenging. In such cases a more effective approach is to cast statistics of interest as solutions to Feynman-Kac equations (partial differential equations). Here, we develop an approach to solve Feynman-Kac equations by training neural networks on short-trajectory data. Our approach is based on a Markov approximation but otherwise avoids assumptions about the underlying model and dynamics. This makes it applicable to treating complex computational models and observational data. We illustrate the advantages of our method using a low-dimensional model that facilitates visualization, and this analysis motivates an adaptive sampling strategy that allows on-the-fly identification of and addition of data to regions important for predicting the statistics of interest. Finally, we demonstrate that we can compute accurate statistics for a 75-dimensional model of sudden stratospheric warming. This system provides a stringent test bed for our method.
Affiliation(s)
- John Strahan
  - Department of Chemistry and James Franck Institute, the University of Chicago, Chicago, IL 60637
- Justin Finkel
  - Department of Earth, Atmospheric, and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- Aaron R. Dinner
  - Department of Chemistry and James Franck Institute, the University of Chicago, Chicago, IL 60637
  - Committee on Computational and Applied Mathematics, the University of Chicago, Chicago, IL 60637
- Jonathan Weare
  - Courant Institute of Mathematical Sciences, New York University, New York, New York 10012

28
Melo MCR, Bernardi RC. Fostering discoveries in the era of exascale computing: How the next generation of supercomputers empowers computational and experimental biophysics alike. Biophys J 2023; 122:2833-2840. [PMID: 36738105 PMCID: PMC10398237 DOI: 10.1016/j.bpj.2023.01.042] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/11/2022] [Revised: 01/24/2023] [Accepted: 01/30/2023] [Indexed: 02/05/2023] Open
Abstract
Over a century ago, physicists started broadly relying on theoretical models to guide new experiments. Soon thereafter, chemists began doing the same. Now, biological research enters a new era when experiment and theory walk hand in hand. Novel software and specialized hardware became essential to understand experimental data and propose new models. In fact, current petascale computing resources already allow researchers to reach unprecedented levels of simulation throughput to connect in silico and in vitro experiments. The reduction in cost and improved access allowed a large number of research groups to adopt supercomputing resources and techniques. Here, we outline how large-scale computing has evolved to expand decades-old research, spark new research efforts, and continuously connect simulation and observation. For instance, multiple publicly and privately funded groups have dedicated extensive resources to develop artificial intelligence tools for computational biophysics, from accelerating quantum chemistry calculations to proposing protein structure models. Moreover, advances in computer hardware have accelerated data processing from single-molecule experimental observations and simulations of chemical reactions occurring throughout entire cells. The combination of software and hardware has opened the way for exascale computing and the production of the first public exascale supercomputer, Frontier, inaugurated by the Oak Ridge National Laboratory in 2022. Ultimately, the popularization and development of computational techniques and the training of researchers to use them will only accelerate the diversification of tools and learning resources for future generations.
Affiliation(s)
- Marcelo C R Melo
  - Department of Physics, Auburn University, Auburn, Alabama
- Rafael C Bernardi
  - Department of Physics, Auburn University, Auburn, Alabama

29
Qiu Y, O’Connor MS, Xue M, Liu B, Huang X. An Efficient Path Classification Algorithm Based on Variational Autoencoder to Identify Metastable Path Channels for Complex Conformational Changes. J Chem Theory Comput 2023; 19:4728-4742. [PMID: 37382437 PMCID: PMC11042546 DOI: 10.1021/acs.jctc.3c00318] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 06/30/2023]
Abstract
Conformational changes (i.e., dynamic transitions between pairs of conformational states) play important roles in many chemical and biological processes. Constructing the Markov state model (MSM) from extensive molecular dynamics (MD) simulations is an effective approach to dissect the mechanism of conformational changes. When combined with transition path theory (TPT), MSM can be applied to elucidate the ensemble of kinetic pathways connecting pairs of conformational states. However, the application of TPT to analyze complex conformational changes often results in a vast number of kinetic pathways with comparable fluxes. This obstacle is particularly pronounced in heterogeneous self-assembly and aggregation processes. The large number of kinetic pathways makes it challenging to comprehend the molecular mechanisms underlying conformational changes of interest. To address this challenge, we have developed a path classification algorithm named latent-space path clustering (LPC) that efficiently lumps parallel kinetic pathways into distinct metastable path channels, making them easier to comprehend. In our algorithm, MD conformations are first projected onto a low-dimensional space containing a small set of collective variables (CVs) by time-structure-based independent component analysis (tICA) with kinetic mapping. Then, MSM and TPT are constructed to obtain the ensemble of pathways, and a deep learning architecture named the variational autoencoder (VAE) is used to learn the spatial distributions of kinetic pathways in the continuous CV space. Based on the trained VAE model, the TPT-generated ensemble of kinetic pathways can be embedded into a latent space, where the classification becomes clear. We show that LPC can efficiently and accurately identify the metastable path channels in three systems: a 2D potential, the aggregation of two hydrophobic particles in water, and the folding of the Fip35 WW domain. 
Using the 2D potential, we further demonstrate that our LPC algorithm outperforms the previous path-lumping algorithms by making substantially fewer incorrect assignments of individual pathways to four path channels. We expect that LPC can be widely applied to identify the dominant kinetic pathways underlying complex conformational changes.
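As a rough illustration of the MSM step mentioned in the abstract above, the sketch below estimates a row-stochastic transition matrix from a discretized trajectory by counting transitions at a fixed lag time. It is not the LPC pipeline: the function name `estimate_transition_matrix`, the toy trajectory, and the simple non-reversible count estimator are all assumptions made for illustration, and the tICA, TPT, and VAE stages are omitted.

```python
import numpy as np

def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition counts at the given lag (hypothetical helper)."""
    counts = np.zeros((n_states, n_states))
    # Count every observed jump i -> j separated by `lag` frames.
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # States that were never left keep an all-zero row instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Toy discrete trajectory over two states (illustrative data only).
dtraj = np.array([0, 0, 1, 1, 0, 0, 1, 1, 0])
T = estimate_transition_matrix(dtraj, n_states=2, lag=1)
print(T)
```

In practice, reversible or Bayesian estimators and a lag-time convergence check are commonly preferred over raw counting; this version only shows the basic bookkeeping.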
Affiliation(s)
- Yunrui Qiu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Michael S. O’Connor
  - Biophysics Graduate Program, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Mingyi Xue
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Bojun Liu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Xuhui Huang
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
  - Biophysics Graduate Program, University of Wisconsin-Madison, Madison, WI, 53706, USA

30
Chen H, Roux B, Chipot C. Discovering Reaction Pathways, Slow Variables, and Committor Probabilities with Machine Learning. J Chem Theory Comput 2023; 19:4414-4426. [PMID: 37224455 PMCID: PMC11372462 DOI: 10.1021/acs.jctc.3c00028] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 05/26/2023]
Abstract
A significant challenge faced by atomistic simulations is the difficulty, and often impossibility, of sampling the transitions between metastable states of the free-energy landscape associated with slow molecular processes. Importance-sampling schemes represent an appealing option to accelerate the underlying dynamics by smoothing out the relevant free-energy barriers, but require the definition of suitable reaction-coordinate (RC) models expressed in terms of compact low-dimensional sets of collective variables (CVs). While most computational studies of slow molecular processes have traditionally relied on educated guesses based on human intuition to reduce the dimensionality of the problem at hand, a variety of machine-learning (ML) algorithms have recently emerged as powerful alternatives to discover meaningful CVs capable of capturing the dynamics of the slowest degrees of freedom. Considering a simple paradigmatic situation in which the long-time dynamics is dominated by the transition between two known metastable states, we compare two variational data-driven ML methods based on Siamese neural networks aimed at discovering a meaningful RC model: the slowest decorrelating CV of the molecular process, and the committor probability to first reach one of the two metastable states. One method is the state-free reversible variational approach for Markov processes networks (VAMPnets), or SRVs; the other, inspired by the transition path theory framework, is the variational committor-based neural networks, or VCNs. The relationship and the ability of these methodologies to discover the relevant descriptors of the slow molecular process of interest are illustrated with a series of simple model systems. We also show that both strategies are amenable to importance-sampling schemes through an appropriate reweighting algorithm that approximates the kinetic properties of the transition.
Affiliation(s)
- Haochuan Chen
  - Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, B.P. 70239, 54506 Vandœuvre-lès-Nancy cedex, France
- Benoît Roux
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, 60637, United States
- Christophe Chipot
  - Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, B.P. 70239, 54506 Vandœuvre-lès-Nancy cedex, France
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, 60637, United States
  - NIH Center for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States

31
Sasmal S, McCullagh M, Hocky GM. Reaction Coordinates for Conformational Transitions Using Linear Discriminant Analysis on Positions. J Chem Theory Comput 2023; 19:4427-4435. [PMID: 37130367 PMCID: PMC10373481 DOI: 10.1021/acs.jctc.3c00051] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 01/11/2023] [Indexed: 05/04/2023]
Abstract
In this work, we demonstrate that Linear Discriminant Analysis (LDA) applied to atomic positions in two different states of a biomolecule produces a good reaction coordinate between those two states. Atomic coordinates of a macromolecule are a direct representation of a macromolecular configuration, and yet, they are not used in enhanced sampling studies due to a lack of rotational and translational invariance. We resolve this issue using the technique of our prior work, whereby a molecular configuration is considered a member of an equivalence class in size-and-shape space, which is the set of all configurations that can be translated and rotated to a single point within a reference multivariate Gaussian distribution characterizing a single molecular state. The reaction coordinates produced by LDA applied to positions are shown to be good reaction coordinates both in terms of characterizing the transition between two states of a system within a long molecular dynamics (MD) simulation and also ones that allow us to readily produce free energy estimates along that reaction coordinate using enhanced sampling MD techniques.
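The core idea of using a two-class LDA direction as a one-dimensional coordinate can be sketched as follows. This is a minimal illustration under strong assumptions: the data are synthetic three-dimensional feature vectors treated as already aligned, so the size-and-shape-space alignment described in the abstract is skipped, and `lda_direction` is a hypothetical helper rather than the authors' code.

```python
import numpy as np

def lda_direction(X0, X1):
    """Fisher discriminant direction w proportional to Sw^{-1} (mu1 - mu0)."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two classes' unnormalized covariances.
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

# Synthetic "states": they differ only along the second coordinate (assumed data).
rng = np.random.default_rng(1)
X0 = rng.normal(loc=[0.0, 0.0, 0.0], scale=[1.0, 0.2, 1.0], size=(500, 3))
X1 = rng.normal(loc=[0.0, 1.0, 0.0], scale=[1.0, 0.2, 1.0], size=(500, 3))

w = lda_direction(X0, X1)
s0, s1 = X0 @ w, X1 @ w  # projections serve as the 1D coordinate
print("direction:", w)
print("mean coordinate in state 0 and 1:", s0.mean(), s1.mean())
```

Projecting new configurations onto `w` gives a scalar that separates the two synthetic states; for real molecular data the alignment step would have to come first.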
Affiliation(s)
- Subarna Sasmal
  - Department of Chemistry and Simons Center for Computational Physical Chemistry, New York University, New York, New York 10003, United States
- Martin McCullagh
  - Department of Chemistry, Oklahoma State University, Stillwater, Oklahoma 74078, United States
- Glen M. Hocky
  - Department of Chemistry and Simons Center for Computational Physical Chemistry, New York University, New York, New York 10003, United States

32
Strahan J, Guo SC, Lorpaiboon C, Dinner AR, Weare J. Inexact iterative numerical linear algebra for neural network-based spectral estimation and rare-event prediction. J Chem Phys 2023; 159:014110. [PMID: 37409704 PMCID: PMC10328561 DOI: 10.1063/5.0151309] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/20/2023] [Accepted: 06/02/2023] [Indexed: 07/07/2023] Open
Abstract
Understanding dynamics in complex systems is challenging because there are many degrees of freedom, and those that are most important for describing events of interest are often not obvious. The leading eigenfunctions of the transition operator are useful for visualization, and they can provide an efficient basis for computing statistics, such as the likelihood and average time of events (predictions). Here, we develop inexact iterative linear algebra methods for computing these eigenfunctions (spectral estimation) and making predictions from a dataset of short trajectories sampled at finite intervals. We demonstrate the methods on a low-dimensional model that facilitates visualization and a high-dimensional model of a biomolecular system. Implications for the prediction problem in reinforcement learning are discussed.
Affiliation(s)
- John Strahan
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, USA
- Spencer C. Guo
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, USA
- Chatipat Lorpaiboon
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, USA
- Aaron R. Dinner
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, USA
- Jonathan Weare
  - Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, USA

33
Mendels D, Byléhn F, Sirk TW, de Pablo JJ. Systematic modification of functionality in disordered elastic networks through free energy surface tailoring. Sci Adv 2023; 9:eadf7541. [PMID: 37285442 DOI: 10.1126/sciadv.adf7541] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/14/2022] [Accepted: 05/01/2023] [Indexed: 06/09/2023]
Abstract
A combined machine learning-physics-based approach is explored for molecular and materials engineering. Specifically, collective variables, akin to those used in enhanced sampled simulations, are constructed using a machine learning model trained on data gathered from a single system. Through the constructed collective variables, it becomes possible to identify critical molecular interactions in the considered system, the modulation of which enables a systematic tailoring of the system's free energy landscape. To explore the efficacy of the proposed approach, we use it to engineer allosteric regulation and uniaxial strain fluctuations in a complex disordered elastic network. Its successful application in these two cases provides insights regarding how functionality is governed in systems characterized by extensive connectivity and points to its potential for design of complex molecular systems.
Affiliation(s)
- Dan Mendels
  - Pritzker School of Molecular Engineering, University of Chicago, 5640 S. Ellis Avenue, Chicago, IL 60637 USA
- Fabian Byléhn
  - Pritzker School of Molecular Engineering, University of Chicago, 5640 S. Ellis Avenue, Chicago, IL 60637 USA
- Timothy W Sirk
  - Polymers Branch, U.S. CCDC Army Research Laboratory, Aberdeen Proving Ground, MD 21005, USA
- Juan J de Pablo
  - Pritzker School of Molecular Engineering, University of Chicago, 5640 S. Ellis Avenue, Chicago, IL 60637 USA

34
Xiao S, Song Z, Tian H, Tao P. Assessments of Variational Autoencoder in Protein Conformation Exploration. J Comput Biophys Chem 2023; 22:489-501. [PMID: 38826699 PMCID: PMC11138204 DOI: 10.1142/s2737416523500217] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/04/2024]
Abstract
Molecular dynamics (MD) simulations have been extensively used to study protein dynamics and subsequently functions. However, MD simulations are often insufficient to explore adequate conformational space for protein functions within reachable timescales. Accordingly, many enhanced sampling methods, including variational autoencoder (VAE) based methods, have been developed to address this issue. The purpose of this study is to evaluate the feasibility of using VAE to assist in the exploration of protein conformational landscapes. Using three modeling systems, we showed that VAE could capture high-level hidden information which distinguishes protein conformations. These models could also be used to generate new physically plausible protein conformations for direct sampling in favorable conformational spaces. We also found that VAE worked better in interpolation than extrapolation and increasing latent space dimension could lead to a trade-off between performances and complexities.
Affiliation(s)
- Sian Xiao
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States
- Zilin Song
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States
- Hao Tian
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States
- Peng Tao
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States

35
Purnomo A, Hayashibe M. Sparse identification of Lagrangian for nonlinear dynamical systems via proximal gradient method. Sci Rep 2023; 13:7919. [PMID: 37193704 DOI: 10.1038/s41598-023-34931-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Accepted: 05/10/2023] [Indexed: 05/18/2023] Open
Abstract
The autonomous distillation of physical laws only from data is of great interest in many scientific fields. Data-driven modeling frameworks that adopt sparse regression techniques, such as sparse identification of nonlinear dynamics (SINDy) and its modifications, have been developed to resolve difficulties in extracting underlying dynamics from experimental data. However, SINDy faces certain difficulties when the dynamics contain rational functions. The Lagrangian is substantially more concise than the actual equations of motion, especially for complex systems, and it does not usually contain rational functions for mechanical systems. Few methods proposed to date, such as our recently proposed Lagrangian-SINDy, can extract the true form of the Lagrangian of dynamical systems from data; however, these methods are easily affected by noise. In this study, we developed an extended version of Lagrangian-SINDy (xL-SINDy) to obtain the Lagrangian of dynamical systems from noisy measurement data. We incorporated the concept of SINDy and used the proximal gradient method to obtain sparse Lagrangian expressions. Further, we demonstrated the effectiveness of xL-SINDy against different noise levels using four mechanical systems. In addition, we compared its performance with SINDy-PI (parallel, implicit), a recent robust variant of SINDy that can handle implicit dynamics and rational nonlinearities. The experimental results reveal that xL-SINDy is much more robust than the existing methods for extracting the governing equations of nonlinear mechanical systems from data with noise. We believe this contribution is significant toward noise-tolerant computational methods for extracting explicit dynamical laws from data.
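For readers unfamiliar with the proximal gradient method named in this abstract, the sketch below applies it to a generic L1-regularized least-squares problem (often called ISTA). It is not xL-SINDy: the matrix `A` merely stands in for a hypothetical library of candidate terms, and the function names, regularization weight, and synthetic data are assumptions for illustration.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient_lasso(A, b, lam, n_iter=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by proximal gradient steps."""
    # Step size 1/L, with L the Lipschitz constant of the smooth term's gradient.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                    # gradient of the least-squares term
        x = soft_threshold(x - grad / L, lam / L)   # prox step enforces sparsity
    return x

# Synthetic sparse-recovery problem: only coefficients 1 and 4 are nonzero.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
x_true = np.zeros(10)
x_true[[1, 4]] = [2.0, -3.0]
b = A @ x_true + 0.01 * rng.normal(size=200)

x_hat = proximal_gradient_lasso(A, b, lam=1.0)
print(np.round(x_hat, 3))
```

The soft-thresholding step is what drives small coefficients exactly to zero, which is the property sparse-regression frameworks rely on when selecting a few terms from a large candidate library.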
Affiliation(s)
- Adam Purnomo
  - Department of Robotics, Graduate School of Engineering, Tohoku University, Sendai, 980-8579, Japan
- Mitsuhiro Hayashibe
  - Department of Robotics, Graduate School of Engineering, Tohoku University, Sendai, 980-8579, Japan

36
Wu M, Liao J, Shu Z, Chen C. Enhanced sampling in explicit solvent by deep learning module in FSATOOL. J Comput Chem 2023. [PMID: 37191088 DOI: 10.1002/jcc.27132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 04/21/2023] [Accepted: 04/27/2023] [Indexed: 05/17/2023]
Abstract
FSATOOL is an integrated molecular simulation and data analysis program. Its old molecular dynamics engine only supports simulations in vacuum or implicit solvent. In this work, we implement the well-known smooth particle mesh Ewald method for simulations in explicit solvent. The newly developed engine runs on both CPU and GPU. All existing analysis modules in the program are compatible with the new engine. Moreover, we also build a complete deep learning module in FSATOOL. Based on the module, we further implement two useful trajectory analysis methods: state-free reversible VAMPnets and time-lagged autoencoder. They are good at searching for the collective variables related to the conformational transitions of biomolecules. In FSATOOL, these collective variables can be further used to construct a bias potential for enhanced sampling. We introduce the implementation details of the methods and present their actual performance in FSATOOL through a few enhanced sampling simulations.
Affiliation(s)
- Mincong Wu
- Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Jun Liao
- Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Zirui Shu
- Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Changjun Chen
- Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China

37
Ahalawat N, Sahil M, Mondal J. Resolving Protein Conformational Plasticity and Substrate Binding via Machine Learning. J Chem Theory Comput 2023; 19:2644-2657. [PMID: 37068044 DOI: 10.1021/acs.jctc.2c00932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
A long-standing target in elucidating the biomolecular recognition process is the identification of binding-competent conformations of the receptor protein. However, protein conformational plasticity and the stochastic nature of the recognition processes often preclude the assignment of a specific protein conformation to an individual ligand-bound pose. Here, we demonstrate that a computational framework coined as RF-TICA-MD, which integrates an ensemble decision-tree-based Random Forest (RF) machine learning (ML) technique with an unsupervised dimension reduction approach time-structured independent component analysis (TICA), provides an efficient and unambiguous solution toward resolving protein conformational plasticity and the substrate binding process. In particular, we consider multimicrosecond-long molecular dynamics (MD) simulation trajectories of a ligand recognition process in solvent-inaccessible cavities of archetypal proteins T4 lysozyme and cytochrome P450cam. We show that in a scenario in which clear correspondence between protein conformation and binding-competent macrostates could not be obtained via an unsupervised dimension reduction approach, an a priori decision-tree-based supervised classification of the simulated recognition trajectories via RF would help characterize key amino acid residue pairs of the protein that are deemed sensitive for ligand binding. A subsequent unsupervised dimensional reduction of the selected residue pairs via TICA would then delineate a conformational landscape of protein which is able to demarcate ligand-bound poses from unbound ones. The proposed RF-TICA-MD approach is shown to be data agnostic and found to be robust when using other ML-based classification methods such as XGBoost. As a promising spinoff of the protocol, the framework is found to be capable of identifying distal protein locations which would be allosterically important for ligand binding and would characterize their roles in recognition pathways. 
A Python implementation of the proposed ML workflow is available on GitHub at https://github.com/navjeet0211/rf-tica-md.
Affiliation(s)
- Navjeet Ahalawat
- Department of Bioinformatics and Computational Biology, College of Biotechnology, CCS Haryana Agricultural University, Hisar 125 004, Haryana, India
- Mohammad Sahil
- Center for Interdisciplinary Sciences, Tata Institute of Fundamental Research, Hyderabad 500046, India
- Jagannath Mondal
- Center for Interdisciplinary Sciences, Tata Institute of Fundamental Research, Hyderabad 500046, India

38
Hunkler S, Diederichs K, Kukharenko O, Peter C. Fast conformational clustering of extensive molecular dynamics simulation data. J Chem Phys 2023; 158:144109. [PMID: 37061476 DOI: 10.1063/5.0142797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2023] Open
Abstract
We present an unsupervised data processing workflow that is specifically designed to obtain a fast conformational clustering of long molecular dynamics simulation trajectories. In this approach, we combine two dimensionality reduction algorithms (cc_analysis and encodermap) with a density-based spatial clustering algorithm (hierarchical density-based spatial clustering of applications with noise). The proposed scheme benefits from the strengths of the three algorithms while avoiding most of the drawbacks of the individual methods. Here, the cc_analysis algorithm is applied for the first time to molecular simulation data. The encodermap algorithm complements cc_analysis by providing an efficient way to process and assign large amounts of data to clusters. The main goal of the procedure is to maximize the number of assigned frames of a given trajectory while keeping a clear conformational identity of the clusters that are found. In practice, we achieve this by using an iterative clustering approach and a tunable root-mean-square-deviation-based criterion in the final cluster assignment. This allows us to find clusters of different densities and different degrees of structural identity. With the help of four protein systems, we illustrate the capability and performance of this clustering workflow: wild-type and thermostable mutant of the Trp-cage protein (TC5b and TC10b), NTL9, and Protein B. Each of these test systems poses its own challenges to the scheme; together, they give a good overview of the advantages and potential difficulties that can arise when using the proposed method.
Affiliation(s)
- Simon Hunkler
- Department of Chemistry, University of Konstanz, Konstanz, Germany
- Kay Diederichs
- Department of Chemistry, University of Konstanz, Konstanz, Germany
- Christine Peter
- Department of Chemistry, University of Konstanz, Konstanz, Germany

39
Xiao S, Verkhivker GM, Tao P. Machine learning and protein allostery. Trends Biochem Sci 2023; 48:375-390. [PMID: 36564251 PMCID: PMC10023316 DOI: 10.1016/j.tibs.2022.12.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/04/2022] [Revised: 11/16/2022] [Accepted: 12/02/2022] [Indexed: 12/23/2022]
Abstract
The fundamental biological importance and complexity of allosterically regulated proteins stem from their central role in signal transduction and cellular processes. Recently, machine-learning approaches have been developed and actively deployed to facilitate theoretical and experimental studies of protein dynamics and allosteric mechanisms. In this review, we survey recent developments in applications of machine-learning methods for studies of allosteric mechanisms, prediction of allosteric effects and allostery-related physicochemical properties, and allosteric protein engineering. We also review the applications of machine-learning strategies for characterization of allosteric mechanisms and drug design targeting SARS-CoV-2. Continuous development and task-specific adaptation of machine-learning methods for protein allosteric mechanisms will have an increasingly important role in bridging a wide spectrum of data-intensive experimental and theoretical technologies.
Affiliation(s)
- Sian Xiao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, TX 75205, USA
- Gennady M Verkhivker
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA; Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA 92618, USA
- Peng Tao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, TX 75205, USA

40
Tam B, Qin Z, Zhao B, Wang SM, Lei CL. Integration of deep learning with Ramachandran plot molecular dynamics simulation for genetic variant classification. iScience 2023; 26:106122. [PMID: 36879825 PMCID: PMC9984559 DOI: 10.1016/j.isci.2023.106122] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/06/2022] [Revised: 10/07/2022] [Accepted: 01/30/2023] [Indexed: 02/05/2023] Open
Abstract
Functional classification of genetic variants is key to their clinical application in patient care. However, the abundant variant data generated by next-generation DNA sequencing technologies limit the use of experimental methods for their classification. Here, we developed a protein structure and deep learning (DL)-based system for genetic variant classification, DL-RP-MDS, which comprises two principles: 1) extracting protein structural and thermodynamics information using the Ramachandran plot-molecular dynamics simulation (RP-MDS) method, and 2) combining those data with an unsupervised auto-encoder learning model and a neural network classifier to identify statistically significant patterns of structural change. We observed that DL-RP-MDS provided higher specificity than over 20 widely used in silico methods in classifying the variants of three DNA damage repair genes: TP53, MLH1, and MSH2. DL-RP-MDS offers a powerful platform for high-throughput genetic variant classification. The software and online application are available at https://genemutation.fhs.um.edu.mo/DL-RP-MDS/.
Affiliation(s)
- Benjamin Tam
- Ministry of Education Frontiers Science Center for Precision Oncology, Faculty of Health Sciences, University of Macau, Macau SAR, China; Cancer Centre, Faculty of Health Sciences, University of Macau, Macau SAR, China; Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Zixin Qin
- Ministry of Education Frontiers Science Center for Precision Oncology, Faculty of Health Sciences, University of Macau, Macau SAR, China; Cancer Centre, Faculty of Health Sciences, University of Macau, Macau SAR, China; Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Bojin Zhao
- Ministry of Education Frontiers Science Center for Precision Oncology, Faculty of Health Sciences, University of Macau, Macau SAR, China; Cancer Centre, Faculty of Health Sciences, University of Macau, Macau SAR, China; Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Macau SAR, China
- San Ming Wang
- Ministry of Education Frontiers Science Center for Precision Oncology, Faculty of Health Sciences, University of Macau, Macau SAR, China; Cancer Centre, Faculty of Health Sciences, University of Macau, Macau SAR, China; Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Macau SAR, China
- Chon Lok Lei
- Ministry of Education Frontiers Science Center for Precision Oncology, Faculty of Health Sciences, University of Macau, Macau SAR, China; Cancer Centre, Faculty of Health Sciences, University of Macau, Macau SAR, China; Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Macau SAR, China

41
Agajanian S, Alshahrani M, Bai F, Tao P, Verkhivker GM. Exploring and Learning the Universe of Protein Allostery Using Artificial Intelligence Augmented Biophysical and Computational Approaches. J Chem Inf Model 2023; 63:1413-1428. [PMID: 36827465 PMCID: PMC11162550 DOI: 10.1021/acs.jcim.2c01634] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Indexed: 02/26/2023]
Abstract
Allosteric mechanisms are commonly employed regulatory tools used by proteins to orchestrate complex biochemical processes and control communications in cells. The quantitative understanding and characterization of allosteric molecular events are among the major challenges in modern biology and require integration of innovative computational and experimental approaches to obtain atomistic-level knowledge of the allosteric states, interactions, and dynamic conformational landscapes. The growing body of computational and experimental studies empowered by emerging artificial intelligence (AI) technologies has opened up new paradigms for exploring and learning the universe of protein allostery from first principles. In this review we analyze recent developments in high-throughput deep mutational scanning of allosteric protein functions; applications and latest adaptations of AlphaFold structural prediction methods for studies of protein dynamics and allostery; new frontiers in integrating machine learning and enhanced sampling techniques for characterization of allostery; and recent advances in structural biology approaches for studies of allosteric systems. We also highlight recent computational and experimental studies of the SARS-CoV-2 spike (S) proteins revealing an important and often hidden role of allosteric regulation driving functional conformational changes, binding interactions with the host receptor, and mutational escape mechanisms of S proteins which are critical for viral infection. We conclude with a summary and outlook of future directions suggesting that AI-augmented biophysical and computer simulation approaches are beginning to transform studies of protein allostery toward systematic characterization of allosteric landscapes, hidden allosteric states, and mechanisms which may bring about a new revolution in molecular biology and drug discovery.
Affiliation(s)
- Steve Agajanian
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States
- Mohammed Alshahrani
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States
- Fang Bai
- Shanghai Institute for Advanced Immunochemical Studies, School of Life Science and Technology and Information Science and Technology, Shanghai Tech University, 393 Middle Huaxia Road, Shanghai 201210, China
- Peng Tao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States
- Gennady M Verkhivker
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, California 92618, United States

42
Dutagaci B, Duan B, Qiu C, Kaplan CD, Feig M. Characterization of RNA polymerase II trigger loop mutations using molecular dynamics simulations and machine learning. PLoS Comput Biol 2023; 19:e1010999. [PMID: 36947548 PMCID: PMC10069792 DOI: 10.1371/journal.pcbi.1010999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Revised: 04/03/2023] [Accepted: 03/06/2023] [Indexed: 03/23/2023] Open
Abstract
Catalysis and fidelity of multisubunit RNA polymerases rely on a highly conserved active site domain called the trigger loop (TL), which achieves roles in transcription through conformational changes and interaction with NTP substrates. The mutations of TL residues cause distinct effects on catalysis including hypo- and hyperactivity and altered fidelity. We applied molecular dynamics simulation (MD) and machine learning (ML) techniques to characterize TL mutations in the Saccharomyces cerevisiae RNA Polymerase II (Pol II) system. We did so to determine relationships between individual mutations and phenotypes and to associate phenotypes with MD simulated structural alterations. Using fitness values of mutants under various stress conditions, we modeled phenotypes along a spectrum of continual values. We found that ML could predict the phenotypes with an R² correlation of 0.68 from amino acid sequences alone. It was more difficult to incorporate MD data to improve predictions from machine learning, presumably because MD data is too noisy and possibly incomplete to directly infer functional phenotypes. However, a variational auto-encoder model based on the MD data allowed the clustering of mutants with different phenotypes based on structural details. Overall, we found that a subset of loss-of-function (LOF) and lethal mutations tended to increase distances of TL residues to the NTP substrate, while another subset of LOF and lethal substitutions tended to confer an increase in distances between TL and bridge helix (BH). In contrast, some of the gain-of-function (GOF) mutants appear to cause disruption of hydrophobic contacts among TL and nearby helices.
Affiliation(s)
- Bercem Dutagaci
- Department of Molecular and Cell Biology, University of California Merced, Merced, California, United States of America
- Bingbing Duan
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Chenxi Qiu
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Craig D. Kaplan
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America

43
Hsu WT, Piomponi V, Merz PT, Bussi G, Shirts MR. Alchemical Metadynamics: Adding Alchemical Variables to Metadynamics to Enhance Sampling in Free Energy Calculations. J Chem Theory Comput 2023; 19:1805-1817. [PMID: 36853624 DOI: 10.1021/acs.jctc.2c01258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
Abstract
Performing alchemical transformations, in which one molecular system is nonphysically changed to another system, is a popular approach adopted in performing free energy calculations associated with various biophysical processes, such as protein-ligand binding or the transfer of a molecule between environments. While the sampling of alchemical intermediate states in either parallel (e.g., Hamiltonian replica exchange) or serial manner (e.g., expanded ensemble) can bridge the high-probability regions in the configurational space between two end states of interest, alchemical methods can fail in scenarios where the most important slow degrees of freedom in the configurational space are, in large part, orthogonal to the alchemical variable, or if the system gets trapped in a deep basin extending in both the configurational and alchemical space. To alleviate these issues, we propose to use alchemical variables as an additional dimension in metadynamics, making it possible to both sample collective variables and to enhance sampling in free energy calculations simultaneously. In this study, we validate our implementation of "alchemical metadynamics" in PLUMED with test systems and alchemical processes with varying complexities and dimensionalities of collective variable space, including the interconversion between the torsional metastable states of a toy system and the methylation of a nucleoside both in the isolated form and in a duplex. We show that multidimensional alchemical metadynamics can address the challenges mentioned above and further accelerate sampling by introducing configurational collective variables. The method can trivially be combined with other metadynamics-based algorithms implemented in PLUMED. The necessary PLUMED code changes have already been released for general use in PLUMED 2.8.
Affiliation(s)
- Wei-Tse Hsu
- Department of Chemical and Biological Engineering, University of Colorado at Boulder, Boulder, Colorado 80305, United States
- Valerio Piomponi
- Scuola Internazionale Superiore di Studi Avanzati, via Bonomea 265, 34136 Trieste, Italy
- Pascal T Merz
- Department of Chemical and Biological Engineering, University of Colorado at Boulder, Boulder, Colorado 80305, United States
- Giovanni Bussi
- Scuola Internazionale Superiore di Studi Avanzati, via Bonomea 265, 34136 Trieste, Italy
- Michael R Shirts
- Department of Chemical and Biological Engineering, University of Colorado at Boulder, Boulder, Colorado 80305, United States

44
Šípka M, Erlebach A, Grajciar L. Constructing Collective Variables Using Invariant Learned Representations. J Chem Theory Comput 2023; 19:887-901. [PMID: 36696574 PMCID: PMC9940718 DOI: 10.1021/acs.jctc.2c00729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Indexed: 01/26/2023]
Abstract
On the time scales accessible to atomistic numerical modeling, chemical reactions are considered rare events. Therefore, the atomistic simulations are commonly biased along a low-dimensional representation of a chemical reaction in an atomic structure space, i.e., along the collective variables. However, suitable collective variables are often complicated to guess a priori. We propose a novel method of collective variable discovery based on dimensionality reduction of the atomic representation vectors. These linear-scaling and invariant representations can be either fixed (untrained) or learned by supervised training of the end-to-end machine learning potential. The learned representations are expected to reflect not only the structural but also the energetic features of the system that are transferable to all of the reactive transformation covered by the machine learning potential. We demonstrate our approach on four high-barrier reactions ranging from a simple gas-phase hydrogen jump reaction to complex reactions in periodic models of industrially relevant heterogeneous catalysts. High data efficiency, automatized feature extraction, favorable scaling, and retention of inherent invariances are all properties that are expected to enable fast and largely automatic construction of suitable collective variables even in highly complex reactive scenarios such as reactive/catalytic transformations at solid-liquid interfaces.
Affiliation(s)
- Martin Šípka
- Department of Physical and Macromolecular Chemistry, Faculty of Sciences, Charles University, Hlavova 8, 128 43 Prague 2, Czech Republic
- Mathematical Institute, Faculty of Mathematics and Physics, Charles University, Sokolovská 83, 186 75 Prague, Czech Republic
- Andreas Erlebach
- Department of Physical and Macromolecular Chemistry, Faculty of Sciences, Charles University, Hlavova 8, 128 43 Prague 2, Czech Republic
- Lukáš Grajciar
- Department of Physical and Macromolecular Chemistry, Faculty of Sciences, Charles University, Hlavova 8, 128 43 Prague 2, Czech Republic

45
Chen H, Chipot C. Chasing collective variables using temporal data-driven strategies. QRB Discovery 2023; 4:e2. [PMID: 37564298 PMCID: PMC10411323 DOI: 10.1017/qrd.2022.23] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/11/2022] [Revised: 12/21/2022] [Accepted: 12/29/2022] [Indexed: 01/09/2023] Open
Abstract
The convergence of free-energy calculations based on importance sampling depends heavily on the choice of collective variables (CVs), which in principle, should include the slow degrees of freedom of the biological processes to be investigated. Autoencoders (AEs), as emerging data-driven dimension reduction tools, have been utilised for discovering CVs. AEs, however, are often treated as black boxes, and what AEs actually encode during training, and whether the latent variables from encoders are suitable as CVs for further free-energy calculations remains unknown. In this contribution, we review AEs and their time-series-based variants, including time-lagged AEs (TAEs) and modified TAEs, as well as the closely related model variational approach for Markov processes networks (VAMPnets). We then show through numerical examples that AEs learn the high-variance modes instead of the slow modes. In stark contrast, time series-based models are able to capture the slow modes. Moreover, both modified TAEs with extensions from slow feature analysis and the state-free reversible VAMPnets (SRVs) can yield orthogonal multidimensional CVs. As an illustration, we employ SRVs to discover the CVs of the isomerizations of N-acetyl-N'-methylalanylamide and trialanine by iterative learning with trajectories from biased simulations. Last, through numerical experiments with anisotropic diffusion, we investigate the potential relationship of time-series-based models and committor probabilities.
Affiliation(s)
- Haochuan Chen
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
- Christophe Chipot
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
- Theoretical and Computational Biophysics Group, Beckman Institute, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA

46
Ketkaew R, Luber S. DeepCV: A Deep Learning Framework for Blind Search of Collective Variables in Expanded Configurational Space. J Chem Inf Model 2022; 62:6352-6364. [PMID: 36445176 DOI: 10.1021/acs.jcim.2c00883] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/02/2022]
Abstract
We present Deep learning for Collective Variables (DeepCV), a computer code that provides an efficient and customizable implementation of the deep autoencoder neural network (DAENN) algorithm that has been developed in our group for computing collective variables (CVs) and can be used with enhanced sampling methods to reconstruct free energy surfaces of chemical reactions. DeepCV can be used to conveniently calculate molecular features, train models, generate CVs, validate rare events from sampling, and analyze a trajectory for chemical reactions of interest. We use DeepCV in an example study of the conformational transition of cyclohexene, where metadynamics simulations are performed using DAENN-generated CVs. The results show that the adopted CVs give free energies in line with those obtained by previously developed CVs and experimental results. DeepCV is open-source software written in Python/C++ object-oriented languages, based on the TensorFlow framework and distributed free of charge for noncommercial purposes, which can be incorporated into general molecular dynamics software. DeepCV also comes with several additional tools, i.e., an application program interface (API), documentation, and tutorials.
Affiliation(s)
- Rangsiman Ketkaew
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
- Sandra Luber
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland

47
Zeng X, Wang F, Luo Y, Kang SG, Tang J, Lightstone FC, Fang EF, Cornell W, Nussinov R, Cheng F. Deep generative molecular design reshapes drug discovery. Cell Rep Med 2022; 3:100794. [PMID: 36306797 PMCID: PMC9797947 DOI: 10.1016/j.xcrm.2022.100794] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 05/05/2022] [Revised: 08/05/2022] [Accepted: 09/30/2022] [Indexed: 11/05/2022]
Abstract
Recent advances and accomplishments of artificial intelligence (AI) and deep generative models have established their usefulness in medicinal applications, especially in drug discovery and development. To correctly apply AI, the developer and user face questions such as which protocols to consider, which factors to scrutinize, and how the deep generative models can integrate the relevant disciplines. This review summarizes classical and newly developed AI approaches, providing an updated and accessible guide to the broad computational drug discovery and development community. We introduce deep generative models from different standpoints and describe the theoretical frameworks for representing chemical and biological structures and their applications. We discuss the data and technical challenges and highlight future directions of multimodal deep generative models for accelerating drug discovery.
Collapse
Affiliation(s)
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, P.R. China
| | - Fei Wang
- Department of Population Health Sciences, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
| | - Yuan Luo
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Seung-Gu Kang
- Healthcare & Life Sciences Research, IBM TJ Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY 10598, USA
| | - Jian Tang
- Mila-Quebec Institute for Learning Algorithms and CIFAR AI Research Chair, HEC Montreal, Montréal, QC H3T 2A7, Canada
| | - Felice C Lightstone
- Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Lab, Livermore, CA 94550, USA
| | - Evandro F Fang
- Department of Clinical Molecular Biology, University of Oslo and Akershus University Hospital, 1478 Lørenskog, Oslo, Norway; The Norwegian Centre on Healthy Ageing (NO-Age), Oslo, Norway
| | - Wendy Cornell
- Healthcare & Life Sciences Research, IBM TJ Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY 10598, USA
| | - Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, MD 21702, USA; Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
| | - Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA; Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA.
|
48
|
Shmilovich K, Stieffenhofer M, Charron NE, Hoffmann M. Temporally Coherent Backmapping of Molecular Trajectories From Coarse-Grained to Atomistic Resolution. J Phys Chem A 2022; 126:9124-9139. [PMID: 36417670 PMCID: PMC9743211 DOI: 10.1021/acs.jpca.2c07716] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/24/2022]
Abstract
Coarse-graining offers a means to extend the achievable time and length scales of molecular dynamics simulations beyond what is practically possible in the atomistic regime. Sampling molecular configurations of interest can be done efficiently using coarse-grained simulations, from which meaningful physicochemical information can be inferred if the corresponding all-atom configurations are reconstructed. However, this procedure of backmapping to reintroduce the lost atomistic detail into coarse-grain structures has proven a challenging task due to the many feasible atomistic configurations that can be associated with one coarse-grain structure. Existing backmapping methods are strictly frame-based, relying on either heuristics to replace coarse-grain particles with atomic fragments and subsequent relaxation or parametrized models to propose atomic coordinates separately and independently for each coarse-grain structure. These approaches neglect information from previous trajectory frames that is critical to ensuring temporal coherence of the backmapped trajectory, while also offering information potentially helpful to producing higher-fidelity atomic reconstructions. In this work, we present a deep learning-enabled data-driven approach for temporally coherent backmapping that explicitly incorporates information from preceding trajectory structures. Our method trains a conditional variational autoencoder to nondeterministically reconstruct atomistic detail conditioned on both the target coarse-grain configuration and the previously reconstructed atomistic configuration. We demonstrate our backmapping approach on two exemplar biomolecular systems: alanine dipeptide and the miniprotein chignolin. We show that our backmapped trajectories accurately recover the structural, thermodynamic, and kinetic properties of the atomistic trajectory data.
Affiliation(s)
- Kirill Shmilovich
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | | | - Nicholas E. Charron
- Weiss School of Natural Sciences, Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States; Department of Physics, Freie Universität Berlin, Berlin 14195, Germany
| | - Moritz Hoffmann
- Fachbereich Mathematik und Informatik, Freie Universität Berlin, Berlin 14195, Germany
|
49
|
Zhang L, Tang S, He G. Learning chaotic systems from noisy data via multi-step optimization and adaptive training. Chaos (Woodbury, N.Y.) 2022; 32:123134. [PMID: 36587345 DOI: 10.1063/5.0114542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Accepted: 11/17/2022] [Indexed: 06/17/2023]
Abstract
A data-driven sparse identification method is developed to discover the underlying governing equations from noisy measurement data through the minimization of Multi-Step-Accumulation (MSA) in error. The method focuses on the multi-step model, while conventional sparse regression methods, such as the Sparse Identification of Nonlinear Dynamics method (SINDy), are one-step models. We adopt sparse representation and assume that the underlying equations involve only a small number of functions among possible candidates in a library. The new development in MSA is to use a multi-step model, i.e., predictions from an approximate evolution scheme based on initial points. Accordingly, the loss function comprises the total error at all time steps between the measured series and predicted series with the same initial point. This enables MSA to capture the dynamics directly from the noisy measurements, resisting the corruption of noise. By use of several numerical examples, we demonstrate the robustness and accuracy of the proposed MSA method, including a two-dimensional chaotic map, the logistic map, a two-dimensional damped oscillator, the Lorenz system, and a reduced order model of a self-sustaining process in turbulent shear flows. We also perform further studies under challenging conditions, such as noisy measurements, missing data, and large time step sizes. Furthermore, in order to resolve the difficulty of the nonlinear optimization, we suggest an adaptive training strategy, namely, by gradually increasing the length of time series for training. Higher prediction accuracy is achieved in an illustrative example of the chaotic map by the adaptive strategy.
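The difference between the multi-step loss described in this abstract and a conventional one-step loss can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the two-term library `[x, x**2]`, the logistic-map parameter `r = 3.2`, and the noise level are all assumptions chosen for illustration, and the sketch only evaluates the losses (no sparse regression or adaptive training is performed).

```python
import numpy as np

def library(x):
    """Candidate functions evaluated at x: here [x, x**2] (an assumed library)."""
    return np.array([x, x * x])

def rollout(coeffs, x0, n_steps):
    """Iterate the candidate map x_{k+1} = coeffs . library(x_k) from x0."""
    xs = np.empty(n_steps + 1)
    xs[0] = x0
    for k in range(n_steps):
        xs[k + 1] = coeffs @ library(xs[k])
    return xs

def multi_step_loss(coeffs, measured):
    """Total squared error over all steps, predicting from measured[0] only."""
    predicted = rollout(coeffs, measured[0], len(measured) - 1)
    return float(np.sum((predicted - measured) ** 2))

def one_step_loss(coeffs, measured):
    """One-step loss: every prediction restarts from a measured point."""
    predicted = np.array([coeffs @ library(x) for x in measured[:-1]])
    return float(np.sum((predicted - measured[1:]) ** 2))

# Logistic map x_{k+1} = r*x_k*(1 - x_k), i.e. coefficients [r, -r] in the library.
r = 3.2  # assumed value, chosen for illustration
true_coeffs = np.array([r, -r])
clean = rollout(true_coeffs, 0.3, 10)

# Noisy measurements (assumed Gaussian noise with standard deviation 0.01).
rng = np.random.default_rng(0)
noisy = clean + 0.01 * rng.standard_normal(clean.shape)

wrong_coeffs = np.array([3.0, -3.0])
loss_true_clean = multi_step_loss(true_coeffs, clean)
loss_wrong_clean = multi_step_loss(wrong_coeffs, clean)
loss_true_noisy = multi_step_loss(true_coeffs, noisy)
one_step_true_clean = one_step_loss(true_coeffs, clean)
```

In the multi-step loss only the initial point is taken from the data, so noise in later measurements enters the target but never the model input; in the one-step loss every noisy measurement is also fed back in as an input.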
Affiliation(s)
- Lei Zhang
- The State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing 100190, China
| | - Shaoqiang Tang
- HEDPS and LTCS, College of Engineering, Peking University, Beijing 100871, China
| | - Guowei He
- The State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing 100190, China
|
50
|
Baima J, Goryaeva AM, Swinburne TD, Maillet JB, Nastar M, Marinica MC. Capabilities and limits of autoencoders for extracting collective variables in atomistic materials science. Phys Chem Chem Phys 2022; 24:23152-23163. [PMID: 36128869 DOI: 10.1039/d2cp01917e] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/21/2022]
Abstract
Free energy calculations in materials science are routinely hindered by the need to provide reaction coordinates that can meaningfully partition atomic configuration space, a prerequisite for most enhanced sampling approaches. Recent studies on molecular systems have highlighted the possibility of constructing appropriate collective variables directly from atomic motions through deep learning techniques. Here we extend this class of approaches to condensed matter problems, for which we encode the finite-temperature collective variable by an iterative procedure starting from 0 K features of the energy landscape, i.e., activation events or migration mechanisms given by a minimum–saddle point–minimum sequence. We employ autoencoder neural networks to build a scalar collective variable for use with the adaptive biasing force method. Particular attention is given to the design choices required for application to crystalline systems with defects, including the filtering of thermal motions, which otherwise dominate the autoencoder input. The machine-learning workflow is tested on body-centered cubic iron and its common defects, such as small vacancy or self-interstitial clusters and screw dislocations. For localized defects, excellent collective variables as well as their derivatives, necessary for free energy sampling, are systematically obtained. However, the approach has limited accuracy when dealing with reaction coordinates that include atomic displacements of a magnitude comparable to thermal motions, e.g., those produced by the long-range elastic field of dislocations. We then combine the extraction of collective variables by autoencoders with an adaptive biasing force free energy method based on Bayesian inference. Using vacancy migration as an example, we demonstrate the performance of coupling these two approaches for simultaneous discovery of reaction coordinates and free energy sampling in systems with localized defects.
Affiliation(s)
- Jacopo Baima
- Université Paris-Saclay, CEA, Service de Recherches de Métallurgie Physique, Gif-sur-Yvette 91191, France.
| | - Alexandra M Goryaeva
- Université Paris-Saclay, CEA, Service de Recherches de Métallurgie Physique, Gif-sur-Yvette 91191, France.
| | - Thomas D Swinburne
- Aix-Marseille Université, CNRS, CINaM UMR 7325, Campus de Luminy, 13288 Marseille, France
| | | | - Maylise Nastar
- Université Paris-Saclay, CEA, Service de Recherches de Métallurgie Physique, Gif-sur-Yvette 91191, France.
| | - Mihai-Cosmin Marinica
- Université Paris-Saclay, CEA, Service de Recherches de Métallurgie Physique, Gif-sur-Yvette 91191, France.
|