1. Surasinghe S, Fish J, Bollt EM. Learning transfer operators by kernel density estimation. Chaos (Woodbury, N.Y.) 2024; 34:023126. [PMID: 38377289 PMCID: PMC10881226 DOI: 10.1063/5.0179937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Accepted: 01/25/2024] [Indexed: 02/22/2024]
Abstract
Inference of transfer operators from data is often formulated as a classical problem that hinges on the Ulam method. The conventional description, known as the Ulam-Galerkin method, involves projecting onto basis functions represented as characteristic functions supported over a fine grid of rectangles. From this perspective, the Ulam-Galerkin approach can be interpreted as density estimation using the histogram method. In this study, we recast the problem within the framework of statistical density estimation. This alternative perspective allows for an explicit and rigorous analysis of bias and variance, thereby facilitating a discussion on the mean square error. Through comprehensive examples utilizing the logistic map and a Markov map, we demonstrate the validity and effectiveness of this approach in estimating the eigenvectors of the Frobenius-Perron operator. We compare the performance of histogram density estimation (HDE) and kernel density estimation (KDE) methods and find that KDE generally outperforms HDE in terms of accuracy. However, it is important to note that KDE exhibits limitations around boundary points and jumps. Based on our research findings, we suggest the possibility of incorporating other density estimation methods into this field and propose future investigations into the application of KDE-based estimation for high-dimensional maps. These findings provide valuable insights for researchers and practitioners working on estimating the Frobenius-Perron operator and highlight the potential of density estimation techniques in this area of study.
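The histogram (Ulam) baseline that this abstract describes can be illustrated with a minimal sketch. It is not the authors' code: the function names, the bin count, the sample size, and the choice of the logistic map with parameter r = 4 are assumptions made for illustration, and only the histogram estimate is shown, not the KDE variant.

```python
import numpy as np

def logistic(x, r=4.0):
    """Logistic map x -> r x (1 - x); r = 4 is an assumed parameter."""
    return r * x * (1.0 - x)

def ulam_matrix(n_bins=100, n_samples=200_000, seed=0):
    """Histogram (Ulam) estimate of the transfer operator on [0, 1].

    Entry (i, j) estimates the probability that a point drawn uniformly
    from bin i is mapped into bin j.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n_samples)
    y = logistic(x)
    # Bin indices of each sample and of its image (clip the endpoint 1.0).
    i = np.minimum((x * n_bins).astype(int), n_bins - 1)
    j = np.minimum((y * n_bins).astype(int), n_bins - 1)
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (i, j), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Guard against division by zero; an empty row would stay all zeros.
    return counts / np.where(row_sums == 0, 1.0, row_sums)

def invariant_density(P):
    """Leading left eigenvector of P, scaled to a density on [0, 1]."""
    n = P.shape[0]
    vals, vecs = np.linalg.eig(P.T)
    k = np.argmax(vals.real)
    v = np.abs(vecs[:, k].real)
    return v / v.sum() * n  # bin width is 1/n

P = ulam_matrix()
rho = invariant_density(P)
```

For r = 4 the invariant density is known in closed form, 1 / (π √(x(1 − x))), so `rho` can be compared against it bin by bin; the estimate is expected to be least accurate near the endpoints, where the true density is unbounded.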
Affiliation(s)
- Sudam Surasinghe
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut 06520, USA
- Jeremie Fish
  - Clarkson Center for Complex Systems Science, Department of Electrical and Computer Engineering, Clarkson University, 8 Clarkson Ave., Potsdam, New York 13699, USA
- Erik M. Bollt
  - Clarkson Center for Complex Systems Science, Department of Electrical and Computer Engineering, Clarkson University, 8 Clarkson Ave., Potsdam, New York 13699, USA
2. Ngo VA, Lin YT, Perez D. Improving Estimation of the Koopman Operator with Kolmogorov-Smirnov Indicator Functions. J Chem Theory Comput 2023; 19:7187-7198. [PMID: 37800673 DOI: 10.1021/acs.jctc.3c00632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/07/2023]
Abstract
It has become common to perform kinetic analysis using approximate Koopman operators that transform high-dimensional timeseries of observables into ranked dynamical modes. The key to the practical success of the approach is the identification of a set of observables that form a good basis on which to expand the slow relaxation modes. Good observables are, however, difficult to identify a priori and suboptimal choices can lead to significant underestimations of characteristic time scales. Leveraging the representation of slow dynamics in terms of Hidden Markov Models (HMM), we propose a simple and computationally efficient clustering procedure to infer surrogate observables that form a good basis for slow modes. We apply the approach to an analytically solvable model system as well as on three protein systems of different complexities. We consistently demonstrate that the inferred indicator functions can significantly improve the estimation of the leading eigenvalues of Koopman operators and correctly identify key states and transition time scales of stochastic systems, even when good observables are not known a priori.
Affiliation(s)
- Van A Ngo
  - Advanced Computing for Life Sciences and Engineering, Computing and Computational Sciences, National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, United States
- Yen Ting Lin
  - Information Sciences Group (CCS-3), Computer, Computational and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Danny Perez
  - Physics and Chemistry of Materials Group (T-1), Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87544, United States
3. Salam T, Edwards V, Hsieh MA. Learning and Leveraging Features in Flow-Like Environments to Improve Situational Awareness. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3141762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
4. Hoffmann M, Scherer M, Hempel T, Mardt A, de Silva B, Husic BE, Klus S, Wu H, Kutz N, Brunton SL, Noé F. Deeptime: a Python library for machine learning dynamical models from time series data. Machine Learning: Science and Technology 2022. [DOI: 10.1088/2632-2153/ac3de0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/31/2022] Open
Abstract
Generation and analysis of time-series data is relevant to many quantitative fields ranging from economics to fluid mechanics. In the physical sciences, structures such as metastable and coherent sets, slow relaxation processes, collective variables, dominant transition pathways or manifolds and channels of probability flow can be of great importance for understanding and characterizing the kinetic, thermodynamic and mechanistic properties of the system. Deeptime is a general purpose Python library offering various tools to estimate dynamical models based on time-series data including conventional linear learning methods, such as Markov state models (MSMs), Hidden Markov Models and Koopman models, as well as kernel and deep learning approaches such as VAMPnets and deep MSMs. The library is largely compatible with scikit-learn, having a range of Estimator classes for these different models, but in contrast to scikit-learn also provides deep Model classes, e.g. in the case of an MSM, which provide a multitude of analysis methods to compute interesting thermodynamic, kinetic and dynamical quantities, such as free energies, relaxation times and transition paths. The library is designed for ease of use but also easily maintainable and extensible code. In this paper we introduce the main features and structure of the deeptime software. Deeptime can be found under https://deeptime-ml.github.io/.
5. Klebanov I, Sprungk B, Sullivan T. The linear conditional expectation in Hilbert space. Bernoulli 2021. [DOI: 10.3150/20-bej1308] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
Affiliation(s)
- Ilja Klebanov
  - Zuse Institute Berlin, Takustraße 7, 14195 Berlin, Germany
- Björn Sprungk
  - Technische Universität Bergakademie Freiberg, 09596 Freiberg, Germany
- T.J. Sullivan
  - Zuse Institute Berlin, Takustraße 7, 14195 Berlin, Germany
6. Unke O, Chmiela S, Sauceda HE, Gastegger M, Poltavsky I, Schütt KT, Tkatchenko A, Müller KR. Machine Learning Force Fields. Chem Rev 2021; 121:10142-10186. [PMID: 33705118 PMCID: PMC8391964 DOI: 10.1021/acs.chemrev.0c01111] [Citation(s) in RCA: 371] [Impact Index Per Article: 123.7] [Received: 10/12/2020] [Indexed: 12/27/2022]
Abstract
In recent years, the use of machine learning (ML) in computational chemistry has enabled numerous advances previously out of reach due to the computational complexity of traditional electronic-structure methods. One of the most promising applications is the construction of ML-based force fields (FFs), with the aim to narrow the gap between the accuracy of ab initio methods and the efficiency of classical FFs. The key idea is to learn the statistical relation between chemical structure and potential energy without relying on a preconceived notion of fixed chemical bonds or knowledge about the relevant interactions. Such universal ML approximations are in principle only limited by the quality and quantity of the reference data used to train them. This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them. The core concepts underlying ML-FFs are described in detail, and a step-by-step guide for constructing and testing them from scratch is given. The text concludes with a discussion of the challenges that remain to be overcome by the next generation of ML-FFs.
Affiliation(s)
- Oliver T. Unke
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- Stefan Chmiela
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Huziel E. Sauceda
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
- Michael Gastegger
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
  - BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
- Igor Poltavsky
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Kristof T. Schütt
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
  - Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
  - Google Research, Brain Team, Berlin, Germany
7. Meli M, Morra G, Colombo G. Simple Model of Protein Energetics To Identify Ab Initio Folding Transitions from All-Atom MD Simulations of Proteins. J Chem Theory Comput 2020; 16:5960-5971. [PMID: 32693598 PMCID: PMC8009504 DOI: 10.1021/acs.jctc.0c00524] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
A fundamental requirement to predict the native conformation, address questions of sequence design and optimization, and gain insights into the folding mechanisms of proteins lies in the definition of an unbiased reaction coordinate that reports on the folding state without the need to compare it to reference values, which might be unavailable for new (designed) sequences. Here, we introduce such a reaction coordinate, which does not depend on previous structural knowledge of the native state but relies solely on the energy partition within the protein: the spectral gap of the pair nonbonded energy matrix (ENergy Gap, ENG). This quantity can be simply calculated along unbiased MD trajectories. We show that upon folding the gap increases significantly, while its fluctuations are reduced to a minimum. This is consistently observed for a diverse set of systems and trajectories. Our approach allows one to promptly identify residues that belong to the folding core as well as residues involved in non-native contacts that need to be disrupted to guide polypeptides to the folded state. The energy gap and fluctuations criteria are then used to develop an automatic detection system which allows us to extract and analyze folding transitions from a generic MD trajectory. We speculate that our method can be used to detect conformational ensembles in dynamic and intrinsically disordered proteins, revealing potential preorganization for binding.
Affiliation(s)
- Giulia Morra
  - SCITEC-CNR, Via Mario Bianco 9, Milano 20131, Italy
  - Weill-Cornell Medicine, 1300 York Avenue, New York, New York 10065, United States
- Giorgio Colombo
  - SCITEC-CNR, Via Mario Bianco 9, Milano 20131, Italy
  - University of Pavia, Department of Chemistry, Viale Taramelli 12, Pavia 27100, Italy
8. Klus S, Nüske F, Hamzi B. Kernel-Based Approximation of the Koopman Generator and Schrödinger Operator. Entropy 2020; 22:e22070722. [PMID: 33286494 PMCID: PMC7517260 DOI: 10.3390/e22070722] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/27/2020] [Revised: 06/25/2020] [Accepted: 06/26/2020] [Indexed: 01/01/2023]
Abstract
Many dimensionality and model reduction techniques rely on estimating dominant eigenfunctions of associated dynamical operators from data. Important examples include the Koopman operator and its generator, but also the Schrödinger operator. We propose a kernel-based method for the approximation of differential operators in reproducing kernel Hilbert spaces and show how eigenfunctions can be estimated by solving auxiliary matrix eigenvalue problems. The resulting algorithms are applied to molecular dynamics and quantum chemistry examples. Furthermore, we exploit that, under certain conditions, the Schrödinger operator can be transformed into a Kolmogorov backward operator corresponding to a drift-diffusion process and vice versa. This allows us to apply methods developed for the analysis of high-dimensional stochastic differential equations to quantum mechanical systems.
Affiliation(s)
- Stefan Klus
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Feliks Nüske
  - Department of Mathematics, Paderborn University, 33098 Paderborn, Germany
- Boumediene Hamzi
  - Department of Mathematics, Imperial College London, London SW7 2AZ, UK