1
Glielmo A, Husic BE, Rodriguez A, Clementi C, Noé F, Laio A. Unsupervised Learning Methods for Molecular Simulation Data. Chem Rev 2021; 121:9722-9758. [PMID: 33945269 PMCID: PMC8391792 DOI: 10.1021/acs.chemrev.0c01195] [Citation(s) in RCA: 116] [Impact Index Per Article: 38.7] [Received: 11/05/2020] [Indexed: 12/21/2022]
Abstract
Unsupervised learning is becoming an essential tool for analyzing the increasingly large amounts of data produced by atomistic and molecular simulations in materials science, solid-state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the unsupervised learning methods most commonly used to investigate simulation data and indicate likely directions for further development in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms for dimensionality reduction, density estimation, clustering, and kinetic modeling. We divide our discussion into self-contained sections, each covering a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used, or can be used, to analyze molecular simulation data.
Affiliation(s)
- Aldo Glielmo
  - International School for Advanced Studies (SISSA), 34014 Trieste, Italy
- Brooke E. Husic
  - Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Alex Rodriguez
  - International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
- Cecilia Clementi
  - Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
  - Rice University Houston, Department of Chemistry, Houston, Texas 77005, United States
- Frank Noé
  - Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
  - Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
  - Rice University Houston, Department of Chemistry, Houston, Texas 77005, United States
- Alessandro Laio
  - International School for Advanced Studies (SISSA), 34014 Trieste, Italy
  - International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
2
Strahan J, Antoszewski A, Lorpaiboon C, Vani BP, Weare J, Dinner AR. Long-Time-Scale Predictions from Short-Trajectory Data: A Benchmark Analysis of the Trp-Cage Miniprotein. J Chem Theory Comput 2021; 17:2948-2963. [PMID: 33908762 DOI: 10.1021/acs.jctc.0c00933] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/28/2022]
Abstract
Elucidating physical mechanisms with statistical confidence from molecular dynamics simulations can be challenging owing to the many degrees of freedom that contribute to collective motions. To address this issue, we recently introduced a dynamical Galerkin approximation (DGA) [Thiede, E. H. J. Chem. Phys., 150, 2019, 244111], in which chemical kinetic statistics that satisfy equations of dynamical operators are represented by a basis expansion. Here, we reformulate this approach, clarifying (and reducing) the dependence on the choice of lag time. We present a new projection of the reactive current onto collective variables and provide improved estimators for rates and committors. We also present simple procedures for constructing suitable smoothly varying basis functions from arbitrary molecular features. To evaluate estimators and basis sets numerically, we generate and carefully validate a data set of short trajectories for the unfolding and folding of the trp-cage miniprotein, a well-studied system. Our analysis demonstrates a comprehensive strategy for characterizing reaction pathways quantitatively.
Affiliation(s)
- John Strahan
  - Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States
- Adam Antoszewski
  - Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States
- Chatipat Lorpaiboon
  - Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States
- Bodhi P Vani
  - Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States
- Jonathan Weare
  - Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, United States
- Aaron R Dinner
  - Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States
3
Webber RJ, Thiede EH, Dow D, Dinner AR, Weare J. Error Bounds for Dynamical Spectral Estimation. SIAM Journal on Mathematics of Data Science 2021; 3:225-252. [PMID: 34355137 PMCID: PMC8336423 DOI: 10.1137/20m1335984] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Dynamical spectral estimation is a well-established numerical approach for estimating eigenvalues and eigenfunctions of the Markov transition operator from trajectory data. Although the approach has been widely applied in biomolecular simulations, its error properties remain poorly understood. Here we analyze the error of a dynamical spectral estimation method called "the variational approach to conformational dynamics" (VAC). We bound the approximation error and estimation error for VAC estimates. Our analysis establishes VAC's convergence properties and suggests new strategies for tuning VAC to improve accuracy.
Affiliation(s)
- Robert J Webber
  - Courant Institute of Mathematical Sciences, New York University, New York, NY 10012 USA
- Erik H Thiede
  - Department of Chemistry, University of Chicago, Chicago, IL 60637 USA
- Douglas Dow
  - Department of Mathematics, University of Chicago, Chicago, IL 60637 USA
- Aaron R Dinner
  - Department of Chemistry, University of Chicago, Chicago, IL 60637 USA
- Jonathan Weare
  - Courant Institute of Mathematical Sciences, New York University, New York, NY 10012 USA
4
Lorpaiboon C, Thiede EH, Webber RJ, Weare J, Dinner AR. Integrated Variational Approach to Conformational Dynamics: A Robust Strategy for Identifying Eigenfunctions of Dynamical Operators. J Phys Chem B 2020; 124:9354-9364. [PMID: 32955887 PMCID: PMC7955702 DOI: 10.1021/acs.jpcb.0c06477] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/15/2022]
Abstract
One approach to analyzing the dynamics of a physical system is to search for long-lived patterns in its motions. This approach has been particularly successful for molecular dynamics data, where slowly decorrelating patterns can indicate large-scale conformational changes. Detecting such patterns is the central objective of the variational approach to conformational dynamics (VAC), as well as the related methods of time-lagged independent component analysis and Markov state modeling. In VAC, the search for slowly decorrelating patterns is formalized as a variational problem solved by the eigenfunctions of the system's transition operator. VAC computes solutions to this variational problem by optimizing a linear or nonlinear model of the eigenfunctions using time series data. Here, we build on VAC's success by addressing two practical limitations. First, VAC can give poor eigenfunction estimates when the lag time parameter is chosen poorly. Second, VAC can overfit when using flexible parametrizations such as artificial neural networks with insufficient regularization. To address these issues, we propose an extension that we call integrated VAC (IVAC). IVAC integrates over multiple lag times before solving the variational problem, making its results more robust and reproducible than VAC's.
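The idea in this abstract, solving a variational eigenproblem built from time-lagged correlations and integrating over several lag times, can be illustrated with a minimal linear sketch. It assumes a linear model of the eigenfunctions, a plain sum of symmetrized correlation matrices over a hypothetical list of lags, and synthetic data; the function names (`time_lagged_correlation`, `linear_ivac`) are illustrative and are not taken from the paper or any library, and the published estimator may differ in its details.

```python
import numpy as np


def time_lagged_correlation(x, lag):
    """Symmetrized time-lagged correlation matrix of mean-free features x (T, d)."""
    a, b = x[:-lag], x[lag:]
    c = a.T @ b / len(a)
    return 0.5 * (c + c.T)


def linear_ivac(x, lags):
    """Solve C_sum v = lambda C_0 v, where C_sum sums correlations over all lags."""
    x = x - x.mean(axis=0)
    c0 = x.T @ x / len(x)
    c_sum = sum(time_lagged_correlation(x, lag) for lag in lags)
    # Whiten with C_0 so the generalized problem becomes an ordinary symmetric one.
    s, u = np.linalg.eigh(c0)
    w = u / np.sqrt(s)  # w.T @ c0 @ w is the identity
    evals, v = np.linalg.eigh(w.T @ c_sum @ w)
    order = np.argsort(evals)[::-1]  # slowest (largest eigenvalue) first
    return evals[order], (w @ v)[:, order]


# Synthetic test data (an assumption for illustration): one slowly
# decorrelating AR(1) signal and one white-noise signal, mixed by a rotation.
rng = np.random.default_rng(0)
n_steps = 20000
noise = rng.standard_normal(n_steps)
slow = np.zeros(n_steps)
for t in range(1, n_steps):
    slow[t] = 0.99 * slow[t - 1] + noise[t]
fast = rng.standard_normal(n_steps)
angle = 0.6
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle), np.cos(angle)]])
features = np.column_stack([slow, fast]) @ rotation

evals, evecs = linear_ivac(features, lags=[1, 5, 10])
projection = (features - features.mean(axis=0)) @ evecs[:, 0]
print("eigenvalues:", evals)
print("|corr(projection, slow)|:", abs(np.corrcoef(projection, slow)[0, 1]))
```

With a single entry in `lags`, the same routine reduces to a linear VAC (TICA-like) estimate at that lag time.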
Affiliation(s)
- Chatipat Lorpaiboon
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, United States
- Erik Henning Thiede
  - Flatiron Institute, New York, New York 60637, United States
  - Department of Computer Science, University of Chicago, Chicago, Illinois 60637, United States
- Robert J. Webber
  - Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, United States
- Jonathan Weare
  - Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, United States
- Aaron R. Dinner
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, Illinois 60637, United States
5
Affiliation(s)
- Frank Noé
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
  - Department of Physics, Freie Universität Berlin, Berlin, Germany
- Edina Rosta
  - Department of Chemistry, King's College London, London, England
6
Thiede EH, Giannakis D, Dinner AR, Weare J. Galerkin approximation of dynamical quantities using trajectory data. J Chem Phys 2019; 150:244111. [PMID: 31255053 PMCID: PMC6824902 DOI: 10.1063/1.5063730] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 09/30/2018] [Accepted: 05/13/2019] [Indexed: 11/14/2022]
Abstract
Understanding chemical mechanisms requires estimating dynamical statistics such as expected hitting times, reaction rates, and committors. Here, we present a general framework for calculating these dynamical quantities by approximating boundary value problems using dynamical operators with a Galerkin expansion. A specific choice of basis set in the expansion corresponds to the estimation of dynamical quantities using a Markov state model. More generally, the boundary conditions impose restrictions on the choice of basis sets. We demonstrate how an alternative basis can be constructed using ideas from diffusion maps. In our numerical experiments, this basis gives results whose accuracy is comparable to or better than that of Markov state models. Additionally, we show that delay embedding can reduce the information lost when projecting the system's dynamics for model construction; this improves estimates of dynamical statistics considerably over the standard practice of increasing the lag time.
Affiliation(s)
- Erik H Thiede
  - Department of Chemistry and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, USA
- Dimitrios Giannakis
  - Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, USA
- Aaron R Dinner
  - Department of Chemistry and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, USA
- Jonathan Weare
  - Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, USA
7
Bittracher A, Banisch R, Schütte C. Data-driven computation of molecular reaction coordinates. J Chem Phys 2018; 149:154103. [PMID: 30342463 DOI: 10.1063/1.5035183] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 01/02/2023]
Abstract
The identification of meaningful reaction coordinates plays a key role in the study of complex molecular systems whose essential dynamics are characterized by rare or slow transition events. In a recent publication, precise defining characteristics of such reaction coordinates were identified and linked to the existence of a so-called transition manifold. This theory gives rise to a novel numerical method for the pointwise computation of reaction coordinates that relies on short parallel MD simulations only, but yields accurate approximation of the long time behavior of the system under consideration. This article presents an extension of the method towards practical applicability in computational chemistry. It links the newly defined reaction coordinates to concepts from transition path theory and Markov state model building. The main result is an alternative computational scheme that allows for a global computation of reaction coordinates based on commonly available types of simulation data, such as single long molecular trajectories or the push-forward of arbitrary canonically distributed point clouds. It is based on a Galerkin approximation of the transition manifold reaction coordinates that can be tuned to individual requirements by the choice of the Galerkin ansatz functions. Moreover, we propose a ready-to-implement variant of the new scheme, which computes data-fitted, mesh-free ansatz functions directly from the available simulation data. The efficacy of the new method is demonstrated on a small protein system.
Affiliation(s)
- Andreas Bittracher
  - Department of Mathematics, Freie Universität Berlin, 14195 Berlin, Germany
- Ralf Banisch
  - Department of Mathematics, Freie Universität Berlin, 14195 Berlin, Germany
- Christof Schütte
  - Department of Mathematics, Freie Universität Berlin, 14195 Berlin, Germany
8
Donati L, Keller BG. Girsanov reweighting for metadynamics simulations. J Chem Phys 2018; 149:072335. [DOI: 10.1063/1.5027728] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Indexed: 01/09/2023]
Affiliation(s)
- Luca Donati
  - Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Takustraße 3, D-14195 Berlin, Germany
- Bettina G. Keller
  - Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Takustraße 3, D-14195 Berlin, Germany
9
10
Affiliation(s)
- Brooke E. Husic
  - Department of Chemistry, Stanford University, Stanford, California 94305, United States
- Vijay S. Pande
  - Department of Chemistry, Stanford University, Stanford, California 94305, United States
11
Donati L, Hartmann C, Keller BG. Girsanov reweighting for path ensembles and Markov state models. J Chem Phys 2017; 146:244112. [DOI: 10.1063/1.4989474] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 12/17/2022]
Affiliation(s)
- L. Donati
  - Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Takustraße 3, D-14195 Berlin, Germany
- C. Hartmann
  - Institute of Mathematics, Brandenburgische Technische Universität Cottbus-Senftenberg, Konrad-Wachsmann-Allee 1, D-03046 Cottbus, Germany
- B. G. Keller
  - Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Takustraße 3, D-14195 Berlin, Germany
12
Wu H, Nüske F, Paul F, Klus S, Koltai P, Noé F. Variational Koopman models: Slow collective variables and molecular kinetics from short off-equilibrium simulations. J Chem Phys 2017; 146:154104. [DOI: 10.1063/1.4979344] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Indexed: 12/21/2022]
Affiliation(s)
- Hao Wu
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Feliks Nüske
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Fabian Paul
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Stefan Klus
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Péter Koltai
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Frank Noé
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
13
Noé F, Clementi C. Collective variables for the study of long-time kinetics from molecular trajectories: theory and methods. Curr Opin Struct Biol 2017; 43:141-147. [PMID: 28327454 DOI: 10.1016/j.sbi.2017.02.006] [Citation(s) in RCA: 98] [Impact Index Per Article: 14.0] [Received: 12/12/2016] [Accepted: 02/20/2017] [Indexed: 12/23/2022]
Abstract
Collective variables are an important concept for studying high-dimensional dynamical systems, such as the molecular dynamics of macromolecules, liquids, or polymers, and in particular for defining relevant metastable states and state transitions or phase transitions. Over the past decade, a rigorous mathematical theory has been formulated to define optimal collective variables that characterize slow dynamical processes. Here we review recent developments, including a variational principle for finding optimal approximations to slow collective variables from simulation data, and algorithms such as time-lagged independent component analysis. Using these concepts, a distance metric can be defined that quantifies how slowly molecular conformations interconvert. Extensions and open questions are discussed.
Affiliation(s)
- Frank Noé
  - Department of Mathematics and Computer Science, FU Berlin, Arnimallee 6, 14195 Berlin, Germany
- Cecilia Clementi
  - Center for Theoretical Biological Physics, and Department of Chemistry, Rice University, 6100 Main Street, Houston, TX 77005, United States
14
Liu S, Zhu L, Sheong FK, Wang W, Huang X. Adaptive partitioning by local density-peaks: An efficient density-based clustering algorithm for analyzing molecular dynamics trajectories. J Comput Chem 2016; 38:152-160. [PMID: 27868222 DOI: 10.1002/jcc.24664] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 07/16/2016] [Revised: 10/09/2016] [Accepted: 10/26/2016] [Indexed: 12/11/2022]
Abstract
We present an efficient density-based adaptive-resolution clustering method, APLoD, for analyzing large-scale molecular dynamics (MD) trajectories. APLoD performs a k-nearest-neighbors search to estimate the density of MD conformations in a local fashion, which allows it to group MD conformations in the same high-density region into a cluster. APLoD greatly improves on the popular density peaks algorithm by reducing the running time and the memory usage by 2-3 orders of magnitude for systems ranging from alanine dipeptide to a 370-residue Maltose-binding protein. In addition, we demonstrate that APLoD can produce clusters of various sizes that are adaptive to the underlying density (i.e., larger clusters in low-density regions and smaller clusters in high-density regions), which is a clear advantage over other popular clustering algorithms including k-centers and k-medoids. We anticipate that APLoD can be widely applied to split ultra-large MD datasets containing millions of conformations for subsequent construction of Markov State Models. © 2016 Wiley Periodicals, Inc.
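The local density estimate mentioned in this abstract can be illustrated with a generic k-nearest-neighbors density estimator. This is a sketch under stated assumptions, not APLoD itself: it uses the textbook estimate k / (N V_d r_k^d), a brute-force distance matrix, and synthetic two-dimensional points; the function name `knn_density` is hypothetical.

```python
import numpy as np
from math import gamma, pi


def knn_density(points, k):
    """Generic kNN density estimate: k / (N * V_d * r_k**d) for each point.

    r_k is the distance from a point to its k-th nearest neighbor and V_d is
    the volume of the d-dimensional unit ball.
    """
    n, d = points.shape
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbor.
    r_k = np.sort(dist, axis=1)[:, k]
    unit_ball_volume = pi ** (d / 2) / gamma(d / 2 + 1)
    return k / (n * unit_ball_volume * r_k ** d)


# Synthetic data (an assumption for illustration): a tight blob and a broad blob.
rng = np.random.default_rng(0)
tight = rng.normal(loc=[0.0, 0.0], scale=0.2, size=(300, 2))
broad = rng.normal(loc=[6.0, 0.0], scale=1.0, size=(300, 2))
points = np.vstack([tight, broad])

density = knn_density(points, k=10)
print("mean density, tight blob:", density[:300].mean())
print("mean density, broad blob:", density[300:].mean())
```

The brute-force distance matrix needs memory quadratic in the number of points, so a tree-based or approximate neighbor search would be the natural replacement for trajectories with millions of conformations.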
Affiliation(s)
- Song Liu
  - Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Lizhe Zhu
  - Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
  - Center of Systems Biology and Human Health, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Fu Kit Sheong
  - Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Wei Wang
  - Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
  - Center of Systems Biology and Human Health, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Xuhui Huang
  - Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
  - Center of Systems Biology and Human Health, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
15
Lemke O, Keller BG. Density-based cluster algorithms for the identification of core sets. J Chem Phys 2016; 145:164104. [DOI: 10.1063/1.4965440] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Indexed: 01/12/2023]
Affiliation(s)
- Oliver Lemke
  - Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Takustraße 3, D-14195 Berlin, Germany
- Bettina G. Keller
  - Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Takustraße 3, D-14195 Berlin, Germany
16
Vitalini F, Noé F, Keller BG. Molecular dynamics simulations data of the twenty encoded amino acids in different force fields. Data Brief 2016; 7:582-90. [PMID: 27054161 PMCID: PMC4802541 DOI: 10.1016/j.dib.2016.02.086] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 11/18/2015] [Revised: 02/23/2016] [Accepted: 02/29/2016] [Indexed: 11/26/2022]
Abstract
We present extensive all-atom Molecular Dynamics (MD) simulation data of the twenty encoded amino acids in explicit water, simulated with different force fields. The termini of the amino acids have been capped to ensure that the dynamics of the φ and ψ torsion angles are analogous to the dynamics within a peptide chain. We use representatives of each of the four major force field families: AMBER ff-99SBILDN [1], AMBER ff-03 [2], OPLS-AA/L [3], CHARMM27 [4] and GROMOS43a1 [5], [6]. Our data represent a library and test bed for method development for MD simulations and for force field development. Part of the data set has been previously used for comparison of the dynamic properties of force fields (Vitalini et al., 2015) [7] and for the construction of peptide basis functions for the variational approach to molecular kinetics [8].
Affiliation(s)
- F Vitalini
  - Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Takustraße 3, D-14195 Berlin, Germany
- F Noé
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, D-14195 Berlin, Germany
- B G Keller
  - Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Takustraße 3, D-14195 Berlin, Germany
17
Rudzinski JF, Kremer K, Bereau T. Communication: Consistent interpretation of molecular simulation kinetics using Markov state models biased with external information. J Chem Phys 2016; 144:051102. [DOI: 10.1063/1.4941455] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 12/25/2022]
Affiliation(s)
- Kurt Kremer
  - Max Planck Institute for Polymer Research, 55128 Mainz, Germany
- Tristan Bereau
  - Max Planck Institute for Polymer Research, 55128 Mainz, Germany
18
Nüske F, Schneider R, Vitalini F, Noé F. Variational tensor approach for approximating the rare-event kinetics of macromolecular systems. J Chem Phys 2016; 144:054105. [DOI: 10.1063/1.4940774] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Indexed: 12/14/2022]
Affiliation(s)
- Feliks Nüske
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Reinhold Schneider
  - Institut für Mathematik, Technische Universität Berlin, Straße des 17. Juni 136, 10623 Berlin, Germany
- Francesca Vitalini
  - Department of Chemistry, Freie Universität Berlin, Takustr. 3, 14195 Berlin, Germany
- Frank Noé
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
19
Scherer MK, Trendelkamp-Schroer B, Paul F, Pérez-Hernández G, Hoffmann M, Plattner N, Wehmeyer C, Prinz JH, Noé F. PyEMMA 2: A Software Package for Estimation, Validation, and Analysis of Markov Models. J Chem Theory Comput 2015; 11:5525-42. [PMID: 26574340 DOI: 10.1021/acs.jctc.5b00743] [Citation(s) in RCA: 721] [Impact Index Per Article: 80.1] [Indexed: 12/19/2022]
Abstract
Markov (state) models (MSMs) and related models of molecular kinetics have recently received a surge of interest as they can systematically reconcile simulation data from either a few long or many short simulations and allow us to analyze the essential metastable structures, thermodynamics, and kinetics of the molecular system under investigation. However, the estimation, validation, and analysis of such models is far from trivial and involves sophisticated and often numerically sensitive methods. In this work we present the open-source Python package PyEMMA ( http://pyemma.org ) that provides accurate and efficient algorithms for kinetic model construction. PyEMMA can read all common molecular dynamics data formats, helps in the selection of input features, provides easy access to dimension reduction algorithms such as principal component analysis (PCA) and time-lagged independent component analysis (TICA) and clustering algorithms such as k-means, and contains estimators for MSMs, hidden Markov models, and several other models. Systematic model validation and error calculation methods are provided. PyEMMA offers a wealth of analysis functions such that the user can conveniently compute molecular observables of interest. We have derived a systematic and accurate way to coarse-grain MSMs to few states and to illustrate the structures of the metastable states of the system. Plotting functions to produce a manuscript-ready presentation of the results are available. In this work, we demonstrate the features of the software and show new methodological concepts and results produced by PyEMMA.
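The core estimation step behind a Markov state model can be sketched in plain NumPy. This is not PyEMMA's API: the function names are hypothetical, the estimator is the simple row-normalized (non-reversible) count matrix, and the discrete trajectory is synthetic, generated from an assumed two-state transition matrix.

```python
import numpy as np


def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j observed at the given lag in a discrete trajectory."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return counts


def transition_matrix(counts):
    """Row-normalize counts (simple non-reversible maximum-likelihood estimate)."""
    return counts / counts.sum(axis=1, keepdims=True)


def implied_timescales(tmat, lag):
    """Implied timescales -lag / ln|lambda_i| for the non-stationary eigenvalues."""
    evals = np.sort(np.linalg.eigvals(tmat).real)[::-1]
    return -lag / np.log(np.abs(evals[1:]))


# Synthetic two-state trajectory (an assumption for illustration).
rng = np.random.default_rng(1)
true_tmat = np.array([[0.95, 0.05],
                      [0.10, 0.90]])
n_steps = 50000
uniform = rng.random(n_steps)
dtraj = np.zeros(n_steps, dtype=int)
for t in range(1, n_steps):
    # Move to state 0 with probability true_tmat[current, 0], else to state 1.
    dtraj[t] = 0 if uniform[t] < true_tmat[dtraj[t - 1], 0] else 1

lag = 1
tmat = transition_matrix(count_matrix(dtraj, n_states=2, lag=lag))
timescales = implied_timescales(tmat, lag)
print("estimated transition matrix:\n", tmat)
print("implied timescale (steps):", timescales)
```

In practice the discrete states would come from featurization, dimension reduction, and clustering of MD frames, and a dedicated package would add reversible estimation, validation, and error estimates that this sketch omits.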
Affiliation(s)
- Martin K Scherer
  - Department for Mathematics and Computer Science, Freie Universität, Arnimallee 6, Berlin 14195, Germany
- Fabian Paul
  - Department for Mathematics and Computer Science, Freie Universität, Arnimallee 6, Berlin 14195, Germany
- Guillermo Pérez-Hernández
  - Department for Mathematics and Computer Science, Freie Universität, Arnimallee 6, Berlin 14195, Germany
- Moritz Hoffmann
  - Department for Mathematics and Computer Science, Freie Universität, Arnimallee 6, Berlin 14195, Germany
- Nuria Plattner
  - Department for Mathematics and Computer Science, Freie Universität, Arnimallee 6, Berlin 14195, Germany
- Christoph Wehmeyer
  - Department for Mathematics and Computer Science, Freie Universität, Arnimallee 6, Berlin 14195, Germany
- Jan-Hendrik Prinz
  - Department for Mathematics and Computer Science, Freie Universität, Arnimallee 6, Berlin 14195, Germany
- Frank Noé
  - Department for Mathematics and Computer Science, Freie Universität, Arnimallee 6, Berlin 14195, Germany