1
Lee SC, Z Y. Interpretation of autoencoder-learned collective variables using Morse-Smale complex and sublevelset persistent homology: An application on molecular trajectories. J Chem Phys 2024; 160:144104. [PMID: 38591676 DOI: 10.1063/5.0191446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 03/22/2024] [Indexed: 04/10/2024] Open
Abstract
Dimensionality reduction often serves as the first step toward a minimalist understanding of physical systems as well as the accelerated simulations of them. In particular, neural network-based nonlinear dimensionality reduction methods, such as autoencoders, have shown promising outcomes in uncovering collective variables (CVs). However, the physical meaning of these CVs remains largely elusive. In this work, we constructed a framework that (1) determines the optimal number of CVs needed to capture the essential molecular motions using an ensemble of hierarchical autoencoders and (2) provides topology-based interpretations to the autoencoder-learned CVs with Morse-Smale complex and sublevelset persistent homology. This approach was exemplified using a series of n-alkanes and can be regarded as a general, explainable nonlinear dimensionality reduction method.
Affiliation(s)
- Shao-Chun Lee
- Department of Nuclear, Plasma, and Radiological Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Y Z
- Department of Nuclear, Plasma, and Radiological Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Department of Nuclear Engineering and Radiological Sciences, Department of Materials Science and Engineering, Department of Robotics, and Applied Physics Program, University of Michigan, Ann Arbor, Michigan 48105, USA
2
Barbatti M, Bondanza M, Crespo-Otero R, Demoulin B, Dral PO, Granucci G, Kossoski F, Lischka H, Mennucci B, Mukherjee S, Pederzoli M, Persico M, Pinheiro Jr M, Pittner J, Plasser F, Sangiogo Gil E, Stojanovic L. Newton-X Platform: New Software Developments for Surface Hopping and Nuclear Ensembles. J Chem Theory Comput 2022; 18:6851-6865. [PMID: 36194696 PMCID: PMC9648185 DOI: 10.1021/acs.jctc.2c00804] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 08/04/2022] [Indexed: 12/01/2022]
Abstract
Newton-X is an open-source computational platform to perform nonadiabatic molecular dynamics based on surface hopping and spectrum simulations using the nuclear ensemble approach. Both are among the most common methodologies in computational chemistry for photophysical and photochemical investigations. This paper describes the main features of these methods and how they are implemented in Newton-X. It emphasizes the newest developments, including zero-point-energy leakage correction, dynamics on complex-valued potential energy surfaces, dynamics induced by incoherent light, dynamics based on machine-learning potentials, exciton dynamics of multiple chromophores, and supervised and unsupervised machine learning techniques. Newton-X is interfaced with several third-party quantum-chemistry programs, spanning a broad spectrum of electronic structure methods.
Affiliation(s)
- Mario Barbatti
- Aix Marseille University, CNRS, ICR, 13013 Marseille, France
- Institut Universitaire de France, 75231 Paris, France
- Mattia Bondanza
- Dipartimento di Chimica e Chimica Industriale, Università di Pisa, via Moruzzi 13, 56124 Pisa, Italy
- Rachel Crespo-Otero
- Department of Chemistry, Queen Mary University of London, Mile End Road, E1 4NS London, U.K.
- Pavlo O. Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, 361005 Xiamen, China
- Giovanni Granucci
- Dipartimento di Chimica e Chimica Industriale, Università di Pisa, via Moruzzi 13, 56124 Pisa, Italy
- Fábris Kossoski
- Laboratoire de Chimie et Physique Quantiques (UMR 5626), Université de Toulouse, CNRS, UPS, 31000 Toulouse, France
- Hans Lischka
- Department of Chemistry and Biochemistry, Texas Tech University, Lubbock, Texas 79409, United States
- Benedetta Mennucci
- Dipartimento di Chimica e Chimica Industriale, Università di Pisa, via Moruzzi 13, 56124 Pisa, Italy
- Marek Pederzoli
- J. Heyrovsky Institute of Physical Chemistry, Academy of Sciences of the Czech Republic, v.v.i., Dolejškova 3, 18223 Prague 8, Czech Republic
- Maurizio Persico
- Dipartimento di Chimica e Chimica Industriale, Università di Pisa, via Moruzzi 13, 56124 Pisa, Italy
- Jiří Pittner
- J. Heyrovsky Institute of Physical Chemistry, Academy of Sciences of the Czech Republic, v.v.i., Dolejškova 3, 18223 Prague 8, Czech Republic
- Felix Plasser
- Department of Chemistry, Loughborough University, LE11 3TU Loughborough, U.K.
- Eduarda Sangiogo Gil
- Dipartimento di Chimica e Chimica Industriale, Università di Pisa, via Moruzzi 13, 56124 Pisa, Italy
- Ljiljana Stojanovic
- Department of Physics and Astronomy, University College London, Gower Street, WC1E 6BT London, U.K.
3
Appeldorn JH, Lemcke S, Speck T, Nikoubashman A. Employing Artificial Neural Networks to Identify Reaction Coordinates and Pathways for Self-Assembly. J Phys Chem B 2022; 126:5007-5016. [DOI: 10.1021/acs.jpcb.2c02232] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/23/2022]
Affiliation(s)
- Jörn H. Appeldorn
- Institute of Physics, Johannes Gutenberg-University Mainz, Staudingerweg 7-9, 55128 Mainz, Germany
- Simon Lemcke
- Institute of Physics, Johannes Gutenberg-University Mainz, Staudingerweg 7-9, 55128 Mainz, Germany
- Thomas Speck
- Institute of Physics, Johannes Gutenberg-University Mainz, Staudingerweg 7-9, 55128 Mainz, Germany
- Arash Nikoubashman
- Institute of Physics, Johannes Gutenberg-University Mainz, Staudingerweg 7-9, 55128 Mainz, Germany
4
Mahler BI. Contagion Dynamics for Manifold Learning. Front Big Data 2022; 5:668356. [PMID: 35574575 PMCID: PMC9094365 DOI: 10.3389/fdata.2022.668356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2021] [Accepted: 02/02/2022] [Indexed: 11/29/2022] Open
Abstract
Contagion maps exploit activation times in threshold contagions to assign vectors in high-dimensional Euclidean space to the nodes of a network. A point cloud that is the image of a contagion map reflects both the structure underlying the network and the spreading behavior of the contagion on it. Intuitively, such a point cloud exhibits features of the network's underlying structure if the contagion spreads along that structure, an observation which suggests contagion maps as a viable manifold-learning technique. We test contagion maps and variants thereof as a manifold-learning tool on a number of different synthetic and real-world data sets, and we compare their performance to that of Isomap, one of the most well-known manifold-learning algorithms. We find that, under certain conditions, contagion maps are able to reliably detect underlying manifold structure in noisy data, while Isomap fails due to noise-induced error. This consolidates contagion maps as a technique for manifold learning. We also demonstrate that processing distance estimates between data points before performing methods to determine geometry, topology and dimensionality of a data set leads to clearer results for both Isomap and contagion maps.
5
Glielmo A, Husic BE, Rodriguez A, Clementi C, Noé F, Laio A. Unsupervised Learning Methods for Molecular Simulation Data. Chem Rev 2021; 121:9722-9758. [PMID: 33945269 PMCID: PMC8391792 DOI: 10.1021/acs.chemrev.0c01195] [Citation(s) in RCA: 116] [Impact Index Per Article: 38.7] [Received: 11/05/2020] [Indexed: 12/21/2022]
Abstract
Unsupervised learning is becoming an essential tool to analyze the increasingly large amounts of data produced by atomistic and molecular simulations, in material science, solid state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms of dimensionality reduction, density estimation, and clustering, and kinetic models. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used, or can be used, to analyze molecular simulation data.
Affiliation(s)
- Aldo Glielmo
- International School for Advanced Studies (SISSA), 34014 Trieste, Italy
- Brooke E. Husic
- Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Alex Rodriguez
- International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
- Cecilia Clementi
- Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
- Rice University Houston, Department of Chemistry, Houston, Texas 77005, United States
- Frank Noé
- Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
- Rice University Houston, Department of Chemistry, Houston, Texas 77005, United States
- Alessandro Laio
- International School for Advanced Studies (SISSA), 34014 Trieste, Italy
- International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
6
Trozzi F, Wang X, Tao P. UMAP as a Dimensionality Reduction Tool for Molecular Dynamics Simulations of Biomacromolecules: A Comparison Study. J Phys Chem B 2021; 125:5022-5034. [PMID: 33973773 PMCID: PMC8356557 DOI: 10.1021/acs.jpcb.1c02081] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Indexed: 01/01/2023]
Abstract
Proteins are the molecular machines of life. The multitude of possible conformations that proteins can adopt determines their free-energy landscapes. However, the inherently high dimensionality of a protein free-energy landscape poses a challenge to deciphering how proteins perform their functions. For this reason, dimensionality reduction is an active field of research for molecular biologists. The uniform manifold approximation and projection (UMAP) is a dimensionality reduction method based on a fuzzy topological analysis of data. In the present study, the performance of UMAP is compared with that of other popular dimensionality reduction methods such as t-distributed stochastic neighbor embedding (t-SNE), principal component analysis (PCA), and time-structure independent components analysis (tICA) in the context of analyzing molecular dynamics simulations of the circadian clock protein VIVID. A good dimensionality reduction method should accurately represent the data structure on the projected components. The comparison of the raw high-dimensional data with the projections obtained using different dimensionality reduction methods based on various metrics showed that UMAP has superior performance when compared with linear reduction methods (PCA and tICA) and has competitive performance and scalable computational cost.
Affiliation(s)
- Francesco Trozzi
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, 75275, United States of America
- Xinlei Wang
- Department of Statistical Science, Southern Methodist University, Dallas, Texas, 75275, United States of America
- Peng Tao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, 75275, United States of America
7
Adams H, Moy M. Topology Applied to Machine Learning: From Global to Local. Front Artif Intell 2021; 4:668302. [PMID: 34056580 PMCID: PMC8160457 DOI: 10.3389/frai.2021.668302] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/16/2021] [Accepted: 04/15/2021] [Indexed: 11/24/2022] Open
Abstract
Through the use of examples, we explain one way in which applied topology has evolved since the birth of persistent homology in the early 2000s. The first applications of topology to data emphasized the global shape of a dataset, such as the three-circle model for 3 × 3 pixel patches from natural images, or the configuration space of the cyclo-octane molecule, which is a sphere with a Klein bottle attached via two circles of singularity. In these studies of global shape, short persistent homology bars are disregarded as sampling noise. More recently, however, persistent homology has been used to address questions about the local geometry of data. For instance, how can local geometry be vectorized for use in machine learning problems? Persistent homology and its vectorization methods, including persistence landscapes and persistence images, provide popular techniques for incorporating both local geometry and global topology into machine learning. Our meta-hypothesis is that the short bars are as important as the long bars for many machine learning tasks. In defense of this claim, we survey applications of persistent homology to shape recognition, agent-based modeling, materials science, archaeology, and biology. Additionally, we survey work connecting persistent homology to geometric features of spaces, including curvature and fractal dimension, and various methods that have been used to incorporate persistent homology into machine learning.
Affiliation(s)
- Henry Adams
- Department of Mathematics, Colorado State University, Fort Collins, CO, United States
- Michael Moy
- Department of Mathematics, Colorado State University, Fort Collins, CO, United States
8
Manzhos S, Carrington T. Neural Network Potential Energy Surfaces for Small Molecules and Reactions. Chem Rev 2020; 121:10187-10217. [PMID: 33021368 DOI: 10.1021/acs.chemrev.0c00665] [Citation(s) in RCA: 119] [Impact Index Per Article: 29.8] [Indexed: 12/16/2022]
Abstract
We review progress in neural network (NN)-based methods for the construction of interatomic potentials from discrete samples (such as ab initio energies) for applications in classical and quantum dynamics including reaction dynamics and computational spectroscopy. The main focus is on methods for building molecular potential energy surfaces (PES) in internal coordinates that explicitly include all many-body contributions, even though some of the methods we review limit the degree of coupling, due either to a desire to limit computational cost or to limited data. Explicit and direct treatment of all many-body contributions is only practical for sufficiently small molecules, which are therefore our primary focus. This includes small molecules on surfaces. We consider direct, single NN PES fitting as well as more complex methods that impose structure (such as a multibody representation) on the PES function, either through the architecture of one NN or by using multiple NNs. We show how NNs are effective in building representations with low-dimensional functions including dimensionality reduction. We consider NN-based approaches to build PESs in the sums-of-product form important for quantum dynamics, ways to treat symmetry, and issues related to sampling data distributions and the relation between PES errors and errors in observables. We highlight combinations of NNs with other ideas such as permutationally invariant polynomials or sums of environment-dependent atomic contributions, which have recently emerged as powerful tools for building highly accurate PESs for relatively large molecular and reactive systems.
Affiliation(s)
- Sergei Manzhos
- Centre Énergie Matériaux Télécommunications, Institut National de la Recherche Scientifique, 1650, Boulevard Lionel-Boulet, Varennes, Québec City, Québec J3X 1S2, Canada
- Tucker Carrington
- Chemistry Department, Queen's University, Kingston Ontario K7L 3N6, Canada
9
Spiwok V, Kříž P. Time-Lagged t-Distributed Stochastic Neighbor Embedding (t-SNE) of Molecular Simulation Trajectories. Front Mol Biosci 2020; 7:132. [PMID: 32714941 PMCID: PMC7344294 DOI: 10.3389/fmolb.2020.00132] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 03/06/2020] [Accepted: 06/03/2020] [Indexed: 11/30/2022] Open
Abstract
Molecular simulation trajectories represent high-dimensional data. Such data can be visualized by methods of dimensionality reduction. Non-linear dimensionality reduction methods are likely to be more efficient than linear ones due to the fact that motions of atoms are non-linear. Here we test a popular non-linear t-distributed Stochastic Neighbor Embedding (t-SNE) method on analysis of trajectories of 200 ns alanine dipeptide dynamics and 208 μs Trp-cage folding and unfolding. Furthermore, we introduce a time-lagged variant of t-SNE in order to focus on rarely occurring transitions in the molecular system. This time-lagged t-SNE efficiently separates states according to distance in time. Using this method it is possible to visualize key states of studied systems (e.g., unfolded and folded protein) as well as possible kinetic traps using a two-dimensional plot. Time-lagged t-SNE is a visualization method and other applications, such as clustering and free energy modeling, must be done with caution.
Affiliation(s)
- Vojtěch Spiwok
- Department of Biochemistry and Microbiology, University of Chemistry and Technology, Prague, Czechia
- Pavel Kříž
- Department of Mathematics, University of Chemistry and Technology, Prague, Czechia
10
Zou W, Tao Y, Kraka E. Systematic description of molecular deformations with Cremer-Pople puckering and deformation coordinates utilizing analytic derivatives: Applied to cycloheptane, cyclooctane, and cyclo[18]carbon. J Chem Phys 2020; 152:154107. [PMID: 32321269 DOI: 10.1063/1.5144278] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/14/2022] Open
Abstract
The conformational properties of ring compounds such as cycloalkanes determine to a large extent their stability and reactivity. Therefore, the investigation of conformational processes such as ring inversion and/or ring pseudorotation has attracted a lot of attention over the past decades. An in-depth conformational analysis of ring compounds requires mapping the relevant parts of the conformational energy surface at stationary and also at non-stationary points. However, the latter is not feasible by a description of the ring with Cartesian or internal coordinates. We provide in this work, a solution to this problem by introducing a new coordinate system based on the Cremer-Pople puckering and deformation coordinates. Furthermore, analytic first- and second-order derivatives of puckering and deformation coordinates, i.e., B-matrices and D-tensors, were developed simplifying geometry optimization and frequency calculations. The new coordinate system is applied to map the potential energy surfaces and reaction paths of cycloheptane (C7H14), cyclooctane (C8H16), and cyclo[18]carbon (C18) at the quantum chemical level and to determine for the first time all stationary points of these ring compounds in a systematic way.
Affiliation(s)
- Wenli Zou
- Computational and Theoretical Chemistry Group (CATCO), Department of Chemistry, Southern Methodist University, 3215 Daniel Ave., Dallas, Texas 75275-0314, USA
- Yunwen Tao
- Computational and Theoretical Chemistry Group (CATCO), Department of Chemistry, Southern Methodist University, 3215 Daniel Ave., Dallas, Texas 75275-0314, USA
- Elfi Kraka
- Computational and Theoretical Chemistry Group (CATCO), Department of Chemistry, Southern Methodist University, 3215 Daniel Ave., Dallas, Texas 75275-0314, USA
11
Fabrizio A, Meyer B, Corminboeuf C. Machine learning models of the energy curvature vs particle number for optimal tuning of long-range corrected functionals. J Chem Phys 2020; 152:154103. [DOI: 10.1063/5.0005039] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/16/2023] Open
Affiliation(s)
- Alberto Fabrizio
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Benjamin Meyer
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Clemence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
12
Alam FF, Rahman T, Shehu A. Evaluating Autoencoder-Based Featurization and Supervised Learning for Protein Decoy Selection. Molecules 2020; 25:E1146. [PMID: 32143444 PMCID: PMC7179114 DOI: 10.3390/molecules25051146] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/15/2020] [Revised: 02/18/2020] [Accepted: 02/25/2020] [Indexed: 11/24/2022] Open
Abstract
Rapid growth in molecular structure data is renewing interest in featurizing structure. Featurizations that retain information on biological activity are particularly sought for protein molecules, where decades of research have shown that indeed structure encodes function. Research on featurization of protein structure is active, but here we assess the promise of autoencoders. Motivated by rapid progress in neural network research, we investigate and evaluate autoencoders on yielding linear and nonlinear featurizations of protein tertiary structures. An additional reason we focus on autoencoders as the engine to obtain featurizations is the versatility of their architectures and the ease with which changes to architecture yield linear versus nonlinear features. While open-source neural network libraries, such as Keras, which we employ here, greatly facilitate constructing, training, and evaluating autoencoder architectures and conducting model search, autoencoders have not yet gained popularity in the structure biology community. Here we demonstrate their utility in a practical context. Employing autoencoder-based featurizations, we address the classic problem of decoy selection in protein structure prediction. Utilizing off-the-shelf supervised learning methods, we demonstrate that the featurizations are indeed meaningful and allow detecting active tertiary structures, thus opening the way for further avenues of research.
Affiliation(s)
- Fardina Fathmiul Alam
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Taseef Rahman
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Center for Advancing Human-Machine Partnerships, George Mason University, Fairfax, VA 22030, USA
- Department of Bioengineering, George Mason University, Fairfax, VA 22030, USA
- School of Systems Biology, George Mason University, Fairfax, VA 22030, USA
13
Tribello GA, Gasparotto P. Using Dimensionality Reduction to Analyze Protein Trajectories. Front Mol Biosci 2019; 6:46. [PMID: 31275943 PMCID: PMC6593086 DOI: 10.3389/fmolb.2019.00046] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 03/15/2019] [Accepted: 05/31/2019] [Indexed: 11/24/2022] Open
Abstract
In recent years the analysis of molecular dynamics trajectories using dimensionality reduction algorithms has become commonplace. These algorithms seek to find a low-dimensional representation of a trajectory that is, according to a well-defined criterion, optimal. A number of different strategies for generating projections of trajectories have been proposed but little has been done to systematically compare how these various approaches fare when it comes to analysing trajectories for biomolecules in explicit solvent. In the following paper, we have thus analyzed a molecular dynamics trajectory of the C-terminal fragment of the immunoglobulin binding domain B1 of protein G of Streptococcus modeled in explicit solvent using a range of different dimensionality reduction algorithms. We have then tried to systematically compare the projections generated using each of these algorithms by using a clustering algorithm to find the positions and extents of the basins in the high-dimensional energy landscape. We find that no algorithm outshines all the other in terms of the quality of the projection it generates. Instead, all the algorithms do a reasonable job when it comes to building a projection that separates some of the configurations that lie in different basins. Having said that, however, all the algorithms struggle to project the basins because they all have a large intrinsic dimensionality.
Affiliation(s)
- Gareth A Tribello
- Atomistic Simulation Centre, School of Mathematics and Physics, Queen's University Belfast, Belfast, United Kingdom
- Piero Gasparotto
- Department of Physics and Astronomy, Thomas Young Centre, University College London, London, United Kingdom
14
Trapl D, Horvacanin I, Mareska V, Ozcelik F, Unal G, Spiwok V. Anncolvar: Approximation of Complex Collective Variables by Artificial Neural Networks for Analysis and Biasing of Molecular Simulations. Front Mol Biosci 2019; 6:25. [PMID: 31058167 PMCID: PMC6482212 DOI: 10.3389/fmolb.2019.00025] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/22/2019] [Accepted: 04/01/2019] [Indexed: 11/23/2022] Open
Abstract
The state of a molecular system can be described in terms of collective variables. These low-dimensional descriptors of molecular structure can be used to monitor the state of the simulation, to calculate free energy profiles or to accelerate rare events by a bias potential or a bias force. Frequent calculation of some complex collective variables may slow down the simulation or analysis of trajectories. Moreover, many collective variables cannot be explicitly calculated for newly sampled structures. In order to address this problem, we developed a new package called anncolvar. This package makes it possible to build and train an artificial neural network model that approximates a collective variable. It can be used to generate an input for the open-source enhanced sampling simulation PLUMED package, so the collective variable can be monitored and biased by methods available in this program. The computational efficiency and the accuracy of anncolvar are demonstrated on selected molecular systems (cyclooctane derivative, Trp-cage miniprotein) and selected collective variables (Isomap, molecular surface area).
Affiliation(s)
- Dalibor Trapl
- Department of Biochemistry and Microbiology, University of Chemistry and Technology in Prague, Prague, Czechia
- Izabela Horvacanin
- Department of Biochemistry and Microbiology, University of Chemistry and Technology in Prague, Prague, Czechia; Faculty of Science, University of Zagreb, Zagreb, Croatia
- Vaclav Mareska
- Department of Biochemistry and Microbiology, University of Chemistry and Technology in Prague, Prague, Czechia
- Furkan Ozcelik
- Computer Engineering Department, Istanbul Technical University, Istanbul, Turkey
- Gozde Unal
- Computer Engineering Department, Istanbul Technical University, Istanbul, Turkey
- Vojtech Spiwok
- Department of Biochemistry and Microbiology, University of Chemistry and Technology in Prague, Prague, Czechia
15
Nagel D, Weber A, Lickert B, Stock G. Dynamical coring of Markov state models. J Chem Phys 2019; 150:094111. [DOI: 10.1063/1.5081767] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 01/29/2023] Open
Affiliation(s)
- Daniel Nagel
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
- Anna Weber
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
- Benjamin Lickert
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
- Gerhard Stock
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
16
Sittel F, Stock G. Perspective: Identification of collective variables and metastable states of protein dynamics. J Chem Phys 2018; 149:150901. [PMID: 30342445 DOI: 10.1063/1.5049637] [Citation(s) in RCA: 84] [Impact Index Per Article: 14.0] [Indexed: 01/25/2023] Open
Abstract
The statistical analysis of molecular dynamics simulations requires dimensionality reduction techniques, which yield a low-dimensional set of collective variables (CVs) $\{x_i\} = x$ that in some sense describe the essential dynamics of the system. Considering the distribution $P(x)$ of the CVs, the primal goal of a statistical analysis is to detect the characteristic features of $P(x)$, in particular, its maxima and their connection paths. This is because these features characterize the low-energy regions and the energy barriers of the corresponding free energy landscape $\Delta G(x) = -k_{\mathrm{B}} T \ln P(x)$, and therefore amount to the metastable states and transition regions of the system. In this perspective, we outline a systematic strategy to identify CVs and metastable states, which subsequently can be employed to construct a Langevin or a Markov state model of the dynamics. In particular, we account for the still limited sampling typically achieved by molecular dynamics simulations, which in practice seriously limits the applicability of theories (e.g., assuming ergodicity) and black-box software tools (e.g., using redundant input coordinates). We show that it is essential to use internal (rather than Cartesian) input coordinates, employ dimensionality reduction methods that avoid rescaling errors (such as principal component analysis), and perform density based (rather than k-means-type) clustering. Finally, we briefly discuss a machine learning approach to dimensionality reduction, which highlights the essential internal coordinates of a system and may reveal hidden reaction mechanisms.
Affiliation(s)
- Florian Sittel
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
- Gerhard Stock
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
|
17
|
Wehmeyer C, Noé F. Time-lagged autoencoders: Deep learning of slow collective variables for molecular kinetics. J Chem Phys 2018; 148:241703. [PMID: 29960344 DOI: 10.1063/1.5011399] [Citation(s) in RCA: 170] [Impact Index Per Article: 28.3] [Indexed: 11/14/2022] Open
Abstract
Inspired by the success of deep learning techniques in the physical and chemical sciences, we apply a modification of an autoencoder-type deep neural network to the task of dimension reduction of molecular dynamics data. We can show that our time-lagged autoencoder reliably finds low-dimensional embeddings for high-dimensional feature spaces which capture the slow dynamics of the underlying stochastic processes, beyond the capabilities of linear dimension reduction techniques.
Affiliation(s)
- Christoph Wehmeyer
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
|
18
|
Pazúriková J, Křenek A, Spiwok V, Šimková M. Reducing the number of mean-square deviation calculations with floating close structure in metadynamics. J Chem Phys 2018; 146:115101. [PMID: 28330370 DOI: 10.1063/1.4978296] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/31/2023] Open
Abstract
Metadynamics is an important collective-coordinate-based enhanced sampling simulation method. Its performance depends significantly on the capability of collective coordinates to describe the studied molecular processes. Collective coordinates based on comparison with reference landmark structures can be used to enhance sampling in highly complex systems; however, they may slow down simulations due to the high number of structure-structure distance (e.g., mean-square deviation) calculations. Here we introduce an approximation of root-mean-square or mean-square deviation that significantly reduces the number of computationally expensive operations. We evaluate its accuracy and theoretical performance gain with metadynamics simulations on two molecular systems.
Affiliation(s)
- Jana Pazúriková
- Institute of Computer Science, Masaryk University, Brno, Czech Republic
- Aleš Křenek
- Institute of Computer Science, Masaryk University, Brno, Czech Republic
- Vojtěch Spiwok
- Department of Biochemistry and Microbiology, University of Chemistry and Technology, Prague, Czech Republic
- Mária Šimková
- Institute of Computer Science, Masaryk University, Brno, Czech Republic
|
19
|
Manzhos S, Wang X, Carrington T. A multimode-like scheme for selecting the centers of Gaussian basis functions when computing vibrational spectra. Chem Phys 2018. [DOI: 10.1016/j.chemphys.2017.10.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 10/18/2022]
|
20
|
Rodríguez-Espigares I, Kaczor AA, Stepniewski TM, Selent J. Challenges and Opportunities in Drug Discovery of Biased Ligands. Methods Mol Biol 2018; 1705:321-334. [PMID: 29188569 DOI: 10.1007/978-1-4939-7465-8_14] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022]
Abstract
The observation of biased agonism in G protein-coupled receptors (GPCRs) has provided new approaches for the development of more efficacious and safer drugs. However, in order to rationally design biased drugs, one must understand the molecular basis of this phenomenon. Computational approaches can help in exploring the conformational universe of GPCRs and detecting conformational states with relevance for distinct functional outcomes. This information is extremely valuable for the development of new therapeutic agents that promote desired conformational receptor states and responses while avoiding the ones leading to undesired side effects. This book chapter intends to introduce the reader to powerful computational approaches for sampling the conformational space of these receptors, focusing first on molecular dynamics and the analysis of the produced data through methods such as dimensionality reduction, Markov State Models and adaptive sampling. Then, we show how to search for compounds that target distinct conformational states via docking and virtual screening. In addition, we describe how to detect receptor-ligand interactions that drive signaling bias and comment on current challenges and opportunities of the presented methods.
Affiliation(s)
- Ismael Rodríguez-Espigares
- Department of Experimental and Health Sciences, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Pompeu Fabra University (UPF), Dr. Aiguader 88, E-08003, Barcelona, Spain
- Agnieszka A Kaczor
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Lab, Faculty of Pharmacy with Division of Medical Analytics, Medical University of Lublin, 4A Chodzki St., PL-20093, Lublin, Poland; Department of Pharmaceutical Chemistry, School of Pharmacy, University of Eastern Finland, Yliopistonranta 1, P.O. Box 1627, FI-70211, Kuopio, Finland
- Tomasz Maciej Stepniewski
- Department of Experimental and Health Sciences, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Pompeu Fabra University (UPF), Dr. Aiguader 88, E-08003, Barcelona, Spain
- Jana Selent
- Department of Experimental and Health Sciences, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Pompeu Fabra University (UPF), Dr. Aiguader 88, E-08003, Barcelona, Spain
|
21
|
Zauleck JPP, de Vivie-Riedle R. Constructing Grids for Molecular Quantum Dynamics Using an Autoencoder. J Chem Theory Comput 2017; 14:55-62. [DOI: 10.1021/acs.jctc.7b01045] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 11/30/2022]
Affiliation(s)
- Julius P. P. Zauleck
- Department Chemie, Ludwig-Maximilians-Universität München, D-81377 München, Germany
|
22
|
Galvelis R, Sugita Y. Neural Network and Nearest Neighbor Algorithms for Enhancing Sampling of Molecular Dynamics. J Chem Theory Comput 2017; 13:2489-2500. [PMID: 28437616 DOI: 10.1021/acs.jctc.7b00188] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Indexed: 12/20/2022]
Abstract
The free energy calculations of complex chemical and biological systems with molecular dynamics (MD) are inefficient due to multiple local minima separated by high-energy barriers. The minima can be escaped using an enhanced sampling method such as metadynamics, which applies a bias (i.e., importance sampling) along a set of collective variables (CVs), but the maximum number of CVs (or dimensions) is severely limited. We propose a high-dimensional bias potential method (NN2B) based on two machine learning algorithms: the nearest neighbor density estimator (NNDE) and the artificial neural network (ANN) for the bias potential approximation. The bias potential is constructed iteratively from short biased MD simulations accounting for correlation among CVs. Our method is capable of achieving ergodic sampling and calculating free energy of polypeptides with up to 8-dimensional bias potential.
Affiliation(s)
- Raimondas Galvelis
- RIKEN Theoretical Molecular Science Laboratory, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Yuji Sugita
- RIKEN Theoretical Molecular Science Laboratory, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; RIKEN Advanced Institute for Computational Science, Integrated Innovation Building 7F, 6-7-1 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan; RIKEN iTHES, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; RIKEN Quantitative Biology Center, Integrated Innovation Building 7F, 6-7-1 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
|
23
|
Hashemian B, Millán D, Arroyo M. Charting molecular free-energy landscapes with an atlas of collective variables. J Chem Phys 2016; 145:174109. [DOI: 10.1063/1.4966262] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/19/2022] Open
Affiliation(s)
- Behrooz Hashemian
- LaCàN, Universitat Politècnica de Catalunya–BarcelonaTech, Barcelona, Spain
- Daniel Millán
- LaCàN, Universitat Politècnica de Catalunya–BarcelonaTech, Barcelona, Spain
- Marino Arroyo
- LaCàN, Universitat Politècnica de Catalunya–BarcelonaTech, Barcelona, Spain
|
24
|
Coutsias EA, Lexa KW, Wester MJ, Pollock SN, Jacobson MP. Exhaustive Conformational Sampling of Complex Fused Ring Macrocycles Using Inverse Kinematics. J Chem Theory Comput 2016; 12:4674-87. [PMID: 27447193 PMCID: PMC5465426 DOI: 10.1021/acs.jctc.6b00250] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Indexed: 11/28/2022]
Abstract
Natural product and synthetic macrocycles are chemically and topologically diverse. An efficient, accurate, and general method for generating macrocycle conformations would enable structure-based design of macrocycle drugs or host-guest complexes. Computational sampling also provides insight into transiently populated states, complementing crystallographic and NMR data. Here, we report a new algorithm, BRIKARD, that addresses this challenge through computational algebraic geometry and inverse kinematics together with local energy minimization. BRIKARD is demonstrated on 67 diverse macrocycles with structural data, encompassing various ring topologies. We find this approach enumerates diverse structures with macrocyclic RMSD < 1.0 Å to the experimental conformation for 85% of our data set in contrast to success rates of 67-75% with other approaches, while for the subset of 21 more challenging compounds in the data set, these rates are 57% and 10-29%, respectively. Because the algorithm can be efficiently run in parallel on many processors, exhaustive conformational sampling of complex cycles can be obtained in minutes rather than hours: with a 40 processor implementation protocol, BRIKARD samples the conformational diversity of a potential energy landscape in a median of 1.3 minutes of wallclock time, much faster than 3.1-10.3 hours necessary with current programs. By rigorously testing BRIKARD on a broad range of scaffolds with highly complex ring systems, we push the frontiers of macrocycle sampling to encompass multiring compounds, including those with more than 50 ring atoms and up to seven interlaced flexible rings.
Affiliation(s)
- Evangelos A. Coutsias
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York 11794, United States
- Katrina W. Lexa
- Department of Pharmaceutical Chemistry, University of California in San Francisco, San Francisco, California 94107, United States
- Michael J. Wester
- New Mexico Center for Spatiotemporal Modeling of Cell Signaling, University of New Mexico, Albuquerque, New Mexico 87131, United States
- Sara N. Pollock
- Department of Mathematics, Texas A&M University, College Station, Texas 77843, United States
- Matthew P. Jacobson
- Department of Pharmaceutical Chemistry, University of California in San Francisco, San Francisco, California 94107, United States
|
25
|
Rodríguez-Espigares I, Kaczor AA, Selent J. In silico Exploration of the Conformational Universe of GPCRs. Mol Inform 2016; 35:227-37. [PMID: 27492237 DOI: 10.1002/minf.201600012] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 01/26/2016] [Accepted: 04/14/2016] [Indexed: 12/17/2022]
Abstract
The structural plasticity of G protein-coupled receptors (GPCRs) leads to a conformational universe going from inactive to active receptor states with several intermediate states. Many of them have not been captured yet and their role in GPCR activation is not well understood. The study of this conformational space and the transition dynamics between different receptor populations is a major challenge in molecular biophysics. The rational design of effector molecules that target such receptor populations allows fine-tuning receptor signalling with higher specificity to produce drugs with safer therapeutic profiles. In this minireview, we outline highly conserved receptor regions which are considered determinant for the establishment of distinct receptor states. We then discuss in silico approaches such as dimensionality reduction methods and Markov State Models to explore the GPCR conformational universe and exploit the obtained conformations through structure-based drug design.
Affiliation(s)
- Ismael Rodríguez-Espigares
- Pharmacoinformatics group, Research Programme on Biomedical Informatics (GRIB), Universitat Pompeu Fabra (UPF)-Hospital del Mar Medical Research Institute (IMIM), Parc de Recerca Biomèdica de Barcelona (PRBB), Dr. Aiguader, 88, 08003, Barcelona, Spain
- Agnieszka A Kaczor
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modeling Lab, Faculty of Pharmacy with Division for Medical Analytics, Medical University of Lublin, 4A Chodźki St., PL-20059, Lublin, Poland; School of Pharmacy, University of Eastern Finland, Yliopistonranta 1, P.O. Box 1627, FI-70211, Kuopio, Finland
- Jana Selent
- Pharmacoinformatics group, Research Programme on Biomedical Informatics (GRIB), Universitat Pompeu Fabra (UPF)-Hospital del Mar Medical Research Institute (IMIM), Parc de Recerca Biomèdica de Barcelona (PRBB), Dr. Aiguader, 88, 08003, Barcelona, Spain
|
26
|
Perez A, MacCallum JL, Coutsias EA, Dill KA. Constraint methods that accelerate free-energy simulations of biomolecules. J Chem Phys 2015; 143:243143. [PMID: 26723628 PMCID: PMC4684272 DOI: 10.1063/1.4936911] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/04/2015] [Accepted: 11/18/2015] [Indexed: 01/07/2023] Open
Abstract
Atomistic molecular dynamics simulations of biomolecules are critical for generating narratives about biological mechanisms. The power of atomistic simulations is that these are physics-based methods that satisfy Boltzmann's law, so they can be used to compute populations, dynamics, and mechanisms. But physical simulations are computationally intensive and do not scale well to the sizes of many important biomolecules. One way to speed up physical simulations is by coarse-graining the potential function. Another way is to harness structural knowledge, often by imposing spring-like restraints. But harnessing external knowledge in physical simulations is problematic because knowledge, data, or hunches have errors, noise, and combinatoric uncertainties. Here, we review recent principled methods for imposing restraints to speed up physics-based molecular simulations that promise to scale to larger biomolecules and motions.
Affiliation(s)
- Alberto Perez
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, USA
- Justin L MacCallum
- Department of Chemistry, University of Calgary, Calgary, Alberta T2N 1N4, Canada
- Evangelos A Coutsias
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, USA
- Ken A Dill
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, USA
|
27
|
Nedialkova LV, Amat MA, Kevrekidis IG, Hummer G. Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions. J Chem Phys 2015; 141:114102. [PMID: 25240340 DOI: 10.1063/1.4893963] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022] Open
Abstract
Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small but nontrivial differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space.
Affiliation(s)
- Lilia V Nedialkova
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, USA
- Miguel A Amat
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, USA
- Ioannis G Kevrekidis
- Department of Chemical and Biological Engineering and Program in Applied and Computational Mathematics, Princeton University, Princeton, New Jersey 08544, USA
- Gerhard Hummer
- Department of Theoretical Biophysics, Max Planck Institute of Biophysics, Max-von-Laue-Str. 3, 60438 Frankfurt am Main, Germany
|
28
|
Hashemian B, Arroyo M. Topological obstructions in the way of data-driven collective variables. J Chem Phys 2015; 142:044102. [PMID: 25637964 DOI: 10.1063/1.4906425] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 01/11/2023] Open
Abstract
Nonlinear dimensionality reduction (NLDR) techniques are increasingly used to visualize molecular trajectories and to create data-driven collective variables for enhanced sampling simulations. The success of these methods relies on their ability to identify the essential degrees of freedom characterizing conformational changes. Here, we show that NLDR methods face serious obstacles when the underlying collective variables present periodicities, e.g., arising from proper dihedral angles. As a result, NLDR methods collapse very distant configurations, thus leading to misinterpretations and inefficiencies in enhanced sampling. Here, we identify this largely overlooked problem and discuss possible approaches to overcome it. We also characterize the geometry and topology of conformational changes of alanine dipeptide, a benchmark system for testing new methods to identify collective variables.
Affiliation(s)
- Behrooz Hashemian
- LaCàN, Universitat Politècnica de Catalunya–BarcelonaTech, Barcelona, Spain
- Marino Arroyo
- LaCàN, Universitat Politècnica de Catalunya–BarcelonaTech, Barcelona, Spain
|
29
|
Chen C, Chen D, Ciucci F. A molecular dynamics study of oxygen ion diffusion in A-site ordered perovskite PrBaCo2O5.5: data mining the oxygen trajectories. Phys Chem Chem Phys 2015; 17:7831-7. [DOI: 10.1039/c4cp05847j] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 10/24/2022]
Abstract
Data mining the trajectories of molecular dynamics simulations leads to a better understanding of oxygen diffusion in perovskites.
Affiliation(s)
- Chi Chen
- Department of Mechanical and Aerospace Engineering, The Hong Kong University of Science and Technology, Hong Kong, SAR China
- Dengjie Chen
- Department of Mechanical and Aerospace Engineering, The Hong Kong University of Science and Technology, Hong Kong, SAR China
- Francesco Ciucci
- Department of Mechanical and Aerospace Engineering, The Hong Kong University of Science and Technology, Hong Kong, SAR China
- Department of Chemical and Biomolecular Engineering
|
30
|
Hashemian B, Millán D, Arroyo M. Modeling and enhanced sampling of molecular systems with smooth and nonlinear data-driven collective variables. J Chem Phys 2014; 139:214101. [PMID: 24320358 DOI: 10.1063/1.4830403] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Indexed: 11/14/2022] Open
Abstract
Collective variables (CVs) are low-dimensional representations of the state of a complex system, which help us rationalize molecular conformations and sample free energy landscapes with molecular dynamics simulations. Given their importance, there is a need for systematic methods that effectively identify CVs for complex systems. In recent years, nonlinear manifold learning has shown its ability to automatically characterize molecular collective behavior. Unfortunately, these methods fail to provide a differentiable function mapping high-dimensional configurations to their low-dimensional representation, as required in enhanced sampling methods. We introduce a methodology that, starting from an ensemble representative of molecular flexibility, builds smooth and nonlinear data-driven collective variables (SandCV) from the output of nonlinear manifold learning algorithms. We demonstrate the method with a standard benchmark molecule, alanine dipeptide, and show how it can be non-intrusively combined with off-the-shelf enhanced sampling methods, here the adaptive biasing force method. We illustrate how enhanced sampling simulations with SandCV can explore regions that were poorly sampled in the original molecular ensemble. We further explore the transferability of SandCV from a simpler system, alanine dipeptide in vacuum, to a more complex system, alanine dipeptide in explicit water.
Affiliation(s)
- Behrooz Hashemian
- LaCàN, Universitat Politècnica de Catalunya - BarcelonaTech, Campus Nord, 08034 Barcelona, Spain
|
31
|
Duan M, Li M, Han L, Huo S. Euclidean sections of protein conformation space and their implications in dimensionality reduction. Proteins 2014; 82:2585-96. [PMID: 24913095 DOI: 10.1002/prot.24622] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/12/2014] [Revised: 05/06/2014] [Accepted: 05/30/2014] [Indexed: 01/05/2023]
Abstract
Dimensionality reduction is widely used in searching for the intrinsic reaction coordinates for protein conformational changes. We find that dimensionality-reduction methods using the pairwise root-mean-square deviation (RMSD) as the local distance metric face a challenge. We use Isomap as an example to illustrate the problem. We believe that there is an implied assumption for the dimensionality-reduction approaches that aim to preserve the geometric relations between the objects: both the original space and the reduced space have the same kind of geometry, such as Euclidean geometry vs. Euclidean geometry or spherical geometry vs. spherical geometry. When the protein free energy landscape is mapped onto a 2D plane or 3D space, the reduced space is Euclidean, thus the original space should also be Euclidean. For a protein with N atoms, its conformation space is a subset of the 3N-dimensional Euclidean space R^{3N}. We formally define the protein conformation space as the quotient space of R^{3N} by the equivalence relation of rigid motions. Whether the quotient space is Euclidean or not depends on how it is parameterized. When the pairwise RMSD is employed as the local distance metric, implicit representations are used for the protein conformation space, leading to no direct correspondence to a Euclidean set. We have demonstrated that an explicit Euclidean-based representation of protein conformation space and the local distance metric associated with it improve the quality of dimensionality reduction in the tetra-peptide and β-hairpin systems.
Affiliation(s)
- Mojie Duan
- Gustaf H. Carlson School of Chemistry and Biochemistry, Clark University, Worcester, Massachusetts, 01610
|
32
|
Duan M, Fan J, Li M, Han L, Huo S. Evaluation of Dimensionality-reduction Methods from Peptide Folding-unfolding Simulations. J Chem Theory Comput 2013; 9:2490-2497. [PMID: 23772182 DOI: 10.1021/ct400052y] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 11/30/2022]
Abstract
Dimensionality reduction methods have been widely used to study the free energy landscapes and low-free energy pathways of molecular systems. It was shown that the nonlinear dimensionality-reduction methods gave better embedding results than the linear methods, such as principal component analysis, in some simple systems. In this study, we have evaluated several nonlinear methods, locally linear embedding, Isomap, and diffusion maps, as well as principal component analysis from the equilibrium folding/unfolding trajectory of the second β-hairpin of the B1 domain of streptococcal protein G. The CHARMM parm19 polar hydrogen potential function was used. A series of criteria which reflect different aspects of the embedding qualities were employed in the evaluation. Our results show that principal component analysis is not worse than the nonlinear ones on this complex system. There is no clear winner in all aspects of the evaluation. Each dimensionality-reduction method has its limitations in a certain aspect. We emphasize that a fair, informative assessment of an embedding result requires a combination of multiple evaluation criteria rather than any single one. Caution should be used when dimensionality-reduction methods are employed, especially when only a few of the top embedding dimensions are used to describe the free energy landscape.
Affiliation(s)
- Mojie Duan
- Gustaf H. Carlson School of Chemistry and Biochemistry, Clark University, Worcester, MA 01610 USA
|
33
|
Porta JM, Jaillet L. Exploring the energy landscapes of flexible molecular loops using higher-dimensional continuation. J Comput Chem 2012; 34:234-44. [PMID: 23015474 DOI: 10.1002/jcc.23128] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/07/2012] [Revised: 08/06/2012] [Accepted: 08/30/2012] [Indexed: 12/27/2022]
Abstract
The conformational space of a flexible molecular loop includes the set of conformations fulfilling the geometric loop-closure constraints and its energy landscape can be seen as a scalar field defined on this implicit set. Higher-dimensional continuation tools, recently developed in dynamical systems and also applied to robotics, provide efficient algorithms to trace out implicitly defined sets. This article describes these tools and applies them to obtain full descriptions of the energy landscapes of short molecular loops that, otherwise, can only be partially explored, mainly via sampling. Moreover, to deal with larger loops, this article exploits the higher-dimensional continuation tools to find local minima and minimum energy transition paths between them, without deviating from the loop-closure constraints. The proposed techniques are applied to previously studied molecules revealing the intricate structure of their energy landscapes.
Affiliation(s)
- Josep M Porta
- Institut de Robòtica i Informàtica Industrial, UPC-CSIC, Llorens Artigas 4-6, 08028 Barcelona, Spain
|
34
|
Mitsutake A, Iijima H, Takano H. Relaxation mode analysis of a peptide system: comparison with principal component analysis. J Chem Phys 2012; 135:164102. [PMID: 22047223 DOI: 10.1063/1.3652959] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Indexed: 11/14/2022] Open
Abstract
This article reports the first attempt to apply the relaxation mode analysis method to a simulation of a biomolecular system. In biomolecular systems, the principal component analysis is a well-known method for analyzing the static properties of fluctuations of structures obtained by a simulation and classifying the structures into some groups. On the other hand, the relaxation mode analysis has been used to analyze the dynamic properties of homopolymer systems. In this article, a long Monte Carlo simulation of Met-enkephalin in gas phase has been performed. The results are analyzed by the principal component analysis and relaxation mode analysis methods. We compare the results of both methods and show the effectiveness of the relaxation mode analysis.
Affiliation(s)
- Ayori Mitsutake
- Department of Physics, Keio University, Yokohama, Kanagawa 223-8522, Japan
|
35
|
Potapov A, Stepanova M. Conformational modes in biomolecules: dynamics and approximate invariance. Phys Rev E Stat Nonlin Soft Matter Phys 2012; 85:020901. [PMID: 22463145 DOI: 10.1103/physreve.85.020901] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/27/2010] [Indexed: 05/31/2023]
Abstract
Understanding the physical mechanisms behind the folding and conformational dynamics of biomolecules is one of the major unsolved challenges of soft matter theory. In this contribution, a theoretical framework for biomolecular dynamics is introduced, employing selected aspects of statistical mechanics, dimensionality reduction, the perturbation theory, and the theory of matrices. Biomolecular dynamics is represented by time-dependent orthogonal conformational modes, the dynamics of the modes is investigated, and invariant properties that persist are identified. As an example, the dynamics of a human prion protein is considered. The theory provides a rigorous background for assessing the stable dynamical properties of biomolecules, such as their coarse-grained structure, through a multiscale approach using short subnanosecond segments of molecular dynamics trajectories. Furthermore, the paper offers a theoretical platform for models of conformational changes in macromolecules, which may allow complementing molecular dynamics simulations.
Affiliation(s)
- Alex Potapov
- Centre for Mathematical Biology, University of Alberta, Edmonton, Alberta, Canada
|
36
|
Porta JM, Jaillet L, Bohigas O. Randomized path planning on manifolds based on higher-dimensional continuation. Int J Rob Res 2011. [DOI: 10.1177/0278364911432324] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 11/16/2022]
Abstract
Despite significant advances in path planning methods, highly constrained problems are still challenging. In some situations, the presence of constraints defines a configuration space that is a non-parametrizable manifold embedded in a high-dimensional ambient space. In these cases, the use of sampling-based path planners is cumbersome, since samples in the ambient space have a low probability of lying on the configuration space manifold. In this paper, we present a new path planning algorithm specially tailored for highly constrained systems. The proposed planner builds on recently developed tools for higher-dimensional continuation, which provide numerical procedures to describe an implicitly defined manifold using a set of local charts. We propose to extend these methods by focusing the generation of charts on the path between the two configurations to connect and by randomizing the process to find alternative paths in the presence of obstacles. The advantage of this planner comes from the fact that it operates directly in the configuration space and not in the higher-dimensional ambient space, as most of the existing methods do.
Affiliation(s)
- Josep M Porta
- Institut de Robòtica i Informàtica Industrial, CSIC-UPC, Barcelona, Spain
- Léonard Jaillet
- Institut de Robòtica i Informàtica Industrial, CSIC-UPC, Barcelona, Spain
- Oriol Bohigas
- Institut de Robòtica i Informàtica Industrial, CSIC-UPC, Barcelona, Spain
|
37
|
Spiwok V, Králová B. Metadynamics in the conformational space nonlinearly dimensionally reduced by Isomap. J Chem Phys 2011; 135:224504. [DOI: 10.1063/1.3660208] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Indexed: 12/22/2022]
|
38
|
Santo KP, Berjanskii M, Wishart DS, Stepanova M. Comparative analysis of essential collective dynamics and NMR-derived flexibility profiles in evolutionarily diverse prion proteins. Prion 2011; 5:188-200. [PMID: 21869604 DOI: 10.4161/pri.5.3.16097] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 11/19/2022]
Abstract
Collective motions on ns-μs time scales are known to have a major impact on protein folding, stability, binding and enzymatic efficiency. It is also believed that these motions may have an important role in the early stages of prion protein misfolding and prion disease. In an effort to accurately characterize these motions and their potential influence on misfolding and prion disease transmissibility, we have conducted a combined analysis of molecular dynamics simulations and NMR-derived flexibility measurements over a diverse range of prion proteins. Using a recently developed numerical formalism, we have analyzed the essential collective dynamics (ECD) for prion proteins from 8 different species including human, cow, elk, cat, hamster, chicken, turtle and frog. We also compared the numerical results with flexibility profiles generated by the random coil index (RCI) from NMR chemical shifts. Prion protein backbone flexibility profiles derived from experimental NMR data and from theoretical computations show strong agreement with each other, demonstrating that it is possible to predict the observed RCI profiles employing the numerical ECD formalism. Interestingly, flexibility differences in the loop between the second beta strand (S2) and the second alpha helix (HB) appear to distinguish prion proteins from species that are susceptible to prion disease from those that are resistant. Our results show that the different levels of flexibility in the S2-HB loop in various species are predictable via the ECD method, indicating that ECD may be used to identify disease-resistant variants of prion proteins, as well as the influence of prion protein mutations on disease susceptibility or misfolding propensity.
|
40
|
Martin S, Thompson A, Coutsias EA, Watson JP. Topology of cyclo-octane energy landscape. J Chem Phys 2010; 132:234115. [PMID: 20572697 DOI: 10.1063/1.3445267] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Indexed: 11/14/2022]
Abstract
Understanding energy landscapes is a major challenge in chemistry and biology. Although a wide variety of methods have been invented and applied to this problem, very little is understood about the actual mathematical structures underlying such landscapes. Perhaps the most general assumption is the idea that energy landscapes are low-dimensional manifolds embedded in high-dimensional Euclidean space. While this is a very mild assumption, we have discovered an example of an energy landscape which is nonmanifold, demonstrating previously unknown mathematical complexity. The example occurs in the energy landscape of cyclo-octane, which was found to have the structure of a reducible algebraic variety, composed of the union of a sphere and a Klein bottle, intersecting in two rings.
Affiliation(s)
- Shawn Martin
- Computer Science and Informatics, Sandia National Laboratories, Albuquerque, New Mexico 87185-1316, USA.
|
41
|
Manzhos S, Yamashita K, Carrington T. Extracting Functional Dependence from Sparse Data Using Dimensionality Reduction: Application to Potential Energy Surface Construction. 2010. [DOI: 10.1007/978-3-642-14941-2_7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]
|