1
Costa AC, Ahamed T, Jordan D, Stephens GJ. A Markovian dynamics for Caenorhabditis elegans behavior across scales. Proc Natl Acad Sci U S A 2024; 121:e2318805121. [PMID: 39083417 PMCID: PMC11317559 DOI: 10.1073/pnas.2318805121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 07/01/2024] [Indexed: 08/02/2024] Open
Abstract
How do we capture the breadth of behavior in animal movement, from rapid body twitches to aging? Using high-resolution videos of the nematode worm Caenorhabditis elegans, we show that a single dynamics connects posture-scale fluctuations with trajectory diffusion and longer-lived behavioral states. We take short posture sequences as an instantaneous behavioral measure, fixing the sequence length for maximal prediction. Within the space of posture sequences, we construct a fine-scale, maximum entropy partition so that transitions among microstates define a high-fidelity Markov model, which we also use as a means of principled coarse-graining. We translate these dynamics into movement using resistive force theory, capturing the statistical properties of foraging trajectories. Predictive across scales, we leverage the longest-lived eigenvectors of the inferred Markov chain to perform a top-down subdivision of the worm's foraging behavior, revealing both "runs-and-pirouettes" as well as previously uncharacterized finer-scale behaviors. We use our model to investigate the relevance of these fine-scale behaviors for foraging success, recovering a trade-off between local and global search strategies.
Affiliation(s)
- Antonio C. Costa
  - Department of Physics and Astronomy, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- David Jordan
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Greg J. Stephens
  - Department of Physics and Astronomy, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
  - Biological Physics Theory Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa 904-0495, Japan

2
Shmilovich K, Ferguson AL. Girsanov Reweighting Enhanced Sampling Technique (GREST): On-the-Fly Data-Driven Discovery of and Enhanced Sampling in Slow Collective Variables. J Phys Chem A 2023; 127:3497-3517. [PMID: 37036804 DOI: 10.1021/acs.jpca.3c00505] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 04/11/2023]
Abstract
Molecular dynamics simulations of microscopic phenomena are limited by the short integration time steps which are required for numerical stability but which limit the practically achievable simulation time scales. Collective variable (CV) enhanced sampling techniques apply biases to predefined collective coordinates to promote barrier crossing, phase space exploration, and sampling of rare events. The efficacy of these techniques is contingent on the selection of good CVs correlated with the molecular motions governing the long-time dynamical evolution of the system. In this work, we introduce Girsanov Reweighting Enhanced Sampling Technique (GREST) as an adaptive sampling scheme that interleaves rounds of data-driven slow CV discovery and enhanced sampling along these coordinates. Since slow CVs are inherently dynamical quantities, a key ingredient in our approach is the use of both thermodynamic and dynamical Girsanov reweighting corrections for rigorous estimation of slow CVs from biased simulation data. We demonstrate our approach on a toy 1D 4-well potential, a simple biomolecular system alanine dipeptide, and the Trp-Leu-Ala-Leu-Leu (WLALL) pentapeptide. In each case GREST learns appropriate slow CVs and drives sampling of all thermally accessible metastable states starting from zero prior knowledge of the system. We make GREST accessible to the community via a publicly available open source Python package.
Affiliation(s)
- Kirill Shmilovich
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States

3
Costa AC, Ahamed T, Jordan D, Stephens GJ. Maximally predictive states: From partial observations to long timescales. Chaos 2023; 33:023136. [PMID: 36859220 DOI: 10.1063/5.0129398] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/04/2022] [Accepted: 01/31/2023] [Indexed: 06/18/2023]
Abstract
Isolating slower dynamics from fast fluctuations has proven remarkably powerful, but how do we proceed from partial observations of dynamical systems for which we lack underlying equations? Here, we construct maximally predictive states by concatenating measurements in time, partitioning the resulting sequences using maximum entropy, and choosing the sequence length to maximize short-time predictive information. Transitions between these states yield a simple approximation of the transfer operator, which we use to reveal timescale separation and long-lived collective modes through the operator spectrum. Applicable to both deterministic and stochastic processes, we illustrate our approach through partial observations of the Lorenz system and the stochastic dynamics of a particle in a double-well potential. We use our transfer operator approach to provide a new estimator of the Kolmogorov-Sinai entropy, which we demonstrate in discrete and continuous-time systems, as well as the movement behavior of the nematode worm C. elegans.
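As a rough illustration of the pipeline this abstract describes (concatenate measurements in time, partition the resulting sequences, count transitions between partition cells, and read timescales off the operator spectrum), here is a minimal Python sketch. Several parts are assumptions and are not taken from the paper: plain k-means stands in for the maximum entropy partition, the sequence length `K` and the number of states are fixed by hand instead of being chosen to maximize predictive information, and the function names and the toy two-state signal are hypothetical.

```python
import numpy as np

def delay_embed(x, K):
    """Stack K consecutive samples of a 1D series into one row per time point."""
    x = np.asarray(x, dtype=float)
    return np.stack([x[i:len(x) - K + 1 + i] for i in range(K)], axis=1)

def kmeans_labels(X, n_states, n_iter=20, seed=0):
    """Plain k-means; an assumed stand-in for the maximum entropy partition."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_states, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(n_states):
            members = X[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return labels

def transition_matrix(labels, lag=1):
    """Row-stochastic matrix of transition frequencies between partition cells."""
    labels = np.unique(labels, return_inverse=True)[1]  # relabel, dropping unused cells
    n = labels.max() + 1
    counts = np.zeros((n, n))
    np.add.at(counts, (labels[:-lag], labels[lag:]), 1.0)
    idx = np.flatnonzero(counts.sum(axis=1) == 0)
    counts[idx, idx] = 1.0  # a cell with no observed exit gets a self-loop
    return counts / counts.sum(axis=1, keepdims=True)

def implied_timescales(P, lag=1, dt=1.0):
    """Timescales t_i = -lag*dt / ln|lambda_i| from the non-trivial eigenvalues."""
    mags = np.sort(np.abs(np.linalg.eigvals(P)))[::-1][1:]
    mags = mags[(mags > 0) & (mags < 1)]
    return -lag * dt / np.log(mags)

# Toy data (hypothetical): a hidden two-state switch observed with noise.
rng = np.random.default_rng(1)
state, hidden = 1.0, []
for _ in range(5000):
    if rng.random() < 0.01:
        state = -state
    hidden.append(state)
x = np.array(hidden) + 0.3 * rng.standard_normal(5000)

labels = kmeans_labels(delay_embed(x, K=5), n_states=10)
P = transition_matrix(labels, lag=1)
print(implied_timescales(P)[:3])  # slowest timescales, in units of samples
```

In this sketch a large gap between the first and second implied timescale is what would signal timescale separation; how well that carries over to real data depends on the partition and sequence length, which the paper selects in a principled way that this example does not reproduce.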
Affiliation(s)
- Antonio C Costa
  - Department of Physics and Astronomy, Vrije Universiteit Amsterdam, 1081HV Amsterdam, The Netherlands
- Tosif Ahamed
  - Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
- David Jordan
  - Wellcome/CRUK Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, United Kingdom
- Greg J Stephens
  - Department of Physics and Astronomy, Vrije Universiteit Amsterdam, 1081HV Amsterdam, The Netherlands

4
Hoffmann M, Scherer M, Hempel T, Mardt A, de Silva B, Husic BE, Klus S, Wu H, Kutz N, Brunton SL, Noé F. Deeptime: a Python library for machine learning dynamical models from time series data. Machine Learning: Science and Technology 2022. [DOI: 10.1088/2632-2153/ac3de0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/31/2022] Open
Abstract
Generation and analysis of time-series data is relevant to many quantitative fields ranging from economics to fluid mechanics. In the physical sciences, structures such as metastable and coherent sets, slow relaxation processes, collective variables, dominant transition pathways or manifolds and channels of probability flow can be of great importance for understanding and characterizing the kinetic, thermodynamic and mechanistic properties of the system. Deeptime is a general purpose Python library offering various tools to estimate dynamical models based on time-series data including conventional linear learning methods, such as Markov state models (MSMs), Hidden Markov Models and Koopman models, as well as kernel and deep learning approaches such as VAMPnets and deep MSMs. The library is largely compatible with scikit-learn, having a range of Estimator classes for these different models, but in contrast to scikit-learn also provides deep Model classes, e.g. in the case of an MSM, which provide a multitude of analysis methods to compute interesting thermodynamic, kinetic and dynamical quantities, such as free energies, relaxation times and transition paths. The library is designed for ease of use but also easily maintainable and extensible code. In this paper we introduce the main features and structure of the deeptime software. Deeptime can be found under https://deeptime-ml.github.io/.

5
Glielmo A, Husic BE, Rodriguez A, Clementi C, Noé F, Laio A. Unsupervised Learning Methods for Molecular Simulation Data. Chem Rev 2021; 121:9722-9758. [PMID: 33945269 PMCID: PMC8391792 DOI: 10.1021/acs.chemrev.0c01195] [Citation(s) in RCA: 116] [Impact Index Per Article: 38.7] [Received: 11/05/2020] [Indexed: 12/21/2022]
Abstract
Unsupervised learning is becoming an essential tool to analyze the increasingly large amounts of data produced by atomistic and molecular simulations, in material science, solid state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms of dimensionality reduction, density estimation, and clustering, and kinetic models. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used-or can be used-to analyze molecular simulation data.
Affiliation(s)
- Aldo Glielmo
  - International School for Advanced Studies (SISSA), 34014 Trieste, Italy
- Brooke E. Husic
  - Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Alex Rodriguez
  - International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
- Cecilia Clementi
  - Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
  - Rice University Houston, Department of Chemistry, Houston, Texas 77005, United States
- Frank Noé
  - Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
  - Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
  - Rice University Houston, Department of Chemistry, Houston, Texas 77005, United States
- Alessandro Laio
  - International School for Advanced Studies (SISSA), 34014 Trieste, Italy
  - International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy

6
Sikorski A, Weber M, Schütte C. The Augmented Jump Chain. Advanced Theory and Simulations 2021. [DOI: 10.1002/adts.202000274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Affiliation(s)
- Alexander Sikorski
  - Zuse Institute Berlin, Mathematics for Life and Materials Sciences, Takustr. 7, D-14195 Berlin, Germany
- Marcus Weber
  - Zuse Institute Berlin, Mathematics for Life and Materials Sciences, Takustr. 7, D-14195 Berlin, Germany
- Christof Schütte
  - Zuse Institute Berlin, Mathematics for Life and Materials Sciences, Takustr. 7, D-14195 Berlin, Germany
  - Freie Universität Berlin, Department of Mathematics and Computer Science, Biocomputing Group, Arnimallee 6, D-14195 Berlin, Germany

7
Löffler M, Picard A. Spectral thresholding for the estimation of Markov chain transition operators. Electron J Stat 2021. [DOI: 10.1214/21-ejs1935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]

8
Sidky H, Chen W, Ferguson AL. Molecular latent space simulators. Chem Sci 2020; 11:9459-9467. [PMID: 34094212 PMCID: PMC8162036 DOI: 10.1039/d0sc03635h] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/01/2020] [Accepted: 08/20/2020] [Indexed: 12/30/2022] Open
Abstract
Small integration time steps limit molecular dynamics (MD) simulations to millisecond time scales. Markov state models (MSMs) and equation-free approaches learn low-dimensional kinetic models from MD simulation data by performing configurational or dynamical coarse-graining of the state space. The learned kinetic models enable the efficient generation of dynamical trajectories over vastly longer time scales than are accessible by MD, but the discretization of configurational space and/or absence of a means to reconstruct molecular configurations precludes the generation of continuous atomistic molecular trajectories. We propose latent space simulators (LSS) to learn kinetic models for continuous atomistic simulation trajectories by training three deep learning networks to (i) learn the slow collective variables of the molecular system, (ii) propagate the system dynamics within this slow latent space, and (iii) generatively reconstruct molecular configurations. We demonstrate the approach in an application to Trp-cage miniprotein to produce novel ultra-long synthetic folding trajectories that accurately reproduce atomistic molecular structure, thermodynamics, and kinetics at six orders of magnitude lower cost than MD. The dramatically lower cost of trajectory generation enables greatly improved sampling and greatly reduced statistical uncertainties in estimated thermodynamic averages and kinetic rates.
Affiliation(s)
- Hythem Sidky
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, USA
- Wei Chen
  - Department of Physics, University of Illinois at Urbana-Champaign, Urbana, USA
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, USA

9
Abstract
Machine learning (ML) is transforming all areas of science. The complex and time-consuming calculations in molecular simulations are particularly suitable for an ML revolution and have already been profoundly affected by the application of existing ML methods. Here we review recent ML methods for molecular simulation, with particular focus on (deep) neural networks for the prediction of quantum-mechanical energies and forces, on coarse-grained molecular dynamics, on the extraction of free energy surfaces and kinetics, and on generative network approaches to sample molecular equilibrium structures and compute thermodynamics. To explain these methods and illustrate open methodological problems, we review some important principles of molecular physics and describe how they can be incorporated into ML structures. Finally, we identify and describe a list of open challenges for the interface between ML and molecular simulation.
Affiliation(s)
- Frank Noé
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
  - Department of Physics, Freie Universität Berlin, 14195 Berlin, Germany
  - Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Alexandre Tkatchenko
  - Physics and Materials Science Research Unit, University of Luxembourg, 1511 Luxembourg, Luxembourg
- Klaus-Robert Müller
  - Department of Computer Science, Technical University Berlin, 10587 Berlin, Germany
  - Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany
  - Department of Brain and Cognitive Engineering, Korea University, Seoul 136-713, South Korea
- Cecilia Clementi
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
  - Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
  - Department of Physics, Rice University, Houston, Texas 77005, USA

10
Noé F. Machine Learning for Molecular Dynamics on Long Timescales. Machine Learning Meets Quantum Physics 2020. [DOI: 10.1007/978-3-030-40245-7_16] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/21/2022]

11
Klus S, Husic BE, Mollenhauer M, Noé F. Kernel methods for detecting coherent structures in dynamical data. Chaos 2019; 29:123112. [PMID: 31893642 DOI: 10.1063/1.5100267] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/18/2019] [Accepted: 11/08/2019] [Indexed: 06/10/2023]
Abstract
We illustrate relationships between classical kernel-based dimensionality reduction techniques and eigendecompositions of empirical estimates of reproducing kernel Hilbert space operators associated with dynamical systems. In particular, we show that kernel canonical correlation analysis (CCA) can be interpreted in terms of kernel transfer operators and that it can be obtained by optimizing the variational approach for Markov processes score. As a result, we show that coherent sets of particle trajectories can be computed by kernel CCA. We demonstrate the efficiency of this approach with several examples, namely, the well-known Bickley jet, ocean drifter data, and a molecular dynamics problem with a time-dependent potential. Finally, we propose a straightforward generalization of dynamic mode decomposition called coherent mode decomposition. Our results provide a generic machine learning approach to the computation of coherent sets with an objective score that can be used for cross-validation and the comparison of different methods.
Affiliation(s)
- Stefan Klus
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Brooke E Husic
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Mattes Mollenhauer
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Frank Noé
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany

12
Affiliation(s)
- Frank Noé
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
  - Department of Physics, Freie Universität Berlin, Berlin, Germany
- Edina Rosta
  - Department of Chemistry, King's College London, London, England

13
Paul F, Wu H, Vossel M, de Groot BL, Noé F. Identification of kinetic order parameters for non-equilibrium dynamics. J Chem Phys 2019; 150:164120. [PMID: 31042914 PMCID: PMC6486394 DOI: 10.1063/1.5083627] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 11/29/2018] [Accepted: 04/04/2019] [Indexed: 12/17/2022] Open
Abstract
A popular approach to analyze the dynamics of high-dimensional many-body systems, such as macromolecules, is to project the trajectories onto a space of slowly varying collective variables, where subsequent analyses are made, such as clustering or estimation of free energy profiles or Markov state models. However, existing "dynamical" dimension reduction methods, such as the time-lagged independent component analysis (TICA), are only valid if the dynamics obeys detailed balance (microscopic reversibility) and typically require long, equilibrated simulation trajectories. Here, we develop a dimension reduction method for non-equilibrium dynamics based on the recently developed Variational Approach for Markov Processes (VAMP) by Wu and Noé. VAMP is illustrated by obtaining a low-dimensional description of a single file ion diffusion model and by identifying long-lived states from molecular dynamics simulations of the KcsA channel protein in an external electrochemical potential. This analysis provides detailed insights into the coupling of conformational dynamics, the configuration of the selectivity filter, and the conductance of the channel. We recommend VAMP as a replacement for the less general TICA method.
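To make the contrast with TICA concrete, the following Python sketch shows the linear core of VAMP as it is commonly formulated: whiten the instantaneous and time-lagged covariances separately, then take a singular value decomposition of the whitened cross-covariance. Because the two time windows are treated separately and nothing is symmetrized, no detailed balance is assumed. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the regularization cutoff `eps`, and the toy autoregressive data are all hypothetical.

```python
import numpy as np

def inv_sqrt(C, eps=1e-10):
    """Inverse matrix square root of a symmetric covariance, dropping tiny modes."""
    w, V = np.linalg.eigh(C)
    keep = w > eps
    return V[:, keep] @ np.diag(w[keep] ** -0.5) @ V[:, keep].T

def vamp(X, lag):
    """Singular values and left/right projections of the whitened lagged covariance."""
    X0 = X[:-lag] - X[:-lag].mean(axis=0)   # data at time t
    X1 = X[lag:] - X[lag:].mean(axis=0)     # data at time t + lag
    n = len(X0)
    C00, C11, C01 = X0.T @ X0 / n, X1.T @ X1 / n, X0.T @ X1 / n
    W0, W1 = inv_sqrt(C00), inv_sqrt(C11)
    U, s, Vt = np.linalg.svd(W0 @ C01 @ W1)
    return s, W0 @ U, W1 @ Vt.T

# Toy data (hypothetical): two independent AR(1) processes, linearly mixed.
rng = np.random.default_rng(0)
n = 20000
a = np.array([0.95, 0.3])                   # slow and fast relaxation per step
Z = np.zeros((n, 2))
noise = rng.standard_normal((n, 2))
for t in range(1, n):
    Z[t] = a * Z[t - 1] + noise[t]
X = Z @ np.array([[1.0, 0.5], [0.2, 1.0]])  # observed features mix both modes

s, left, right = vamp(X, lag=1)
print(s)                                    # roughly the AR coefficients 0.95 and 0.3
print("VAMP-2 score:", np.sum(s ** 2))
```

The singular values lie between 0 and 1, and values near 1 mark slowly decorrelating directions. Projecting the mean-free data onto the leading columns of `left` gives the reduced coordinates in this sketch, and the sum of squared singular values is the VAMP-2 score often used to compare feature sets.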
Affiliation(s)
- Fabian Paul
  - Department of Mathematics and Computer Science, FU Berlin, Arnimallee 6, 14195 Berlin, Germany
- Hao Wu
  - Tongji University, School of Mathematical Sciences, Shanghai 200092, People's Republic of China
- Maximilian Vossel
  - Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, D-37077 Göttingen, Germany
- Bert L de Groot
  - Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, D-37077 Göttingen, Germany
- Frank Noé
  - Department of Mathematics and Computer Science, FU Berlin, Arnimallee 6, 14195 Berlin, Germany