1
Liu X, Xing J, Fu H, Shao X, Cai W. Analyzing Molecular Dynamics Trajectories Thermodynamically through Artificial Intelligence. J Chem Theory Comput 2024; 20:665-676. [PMID: 38193858 DOI: 10.1021/acs.jctc.3c00975] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/10/2024]
Abstract
Molecular dynamics simulations produce trajectories that correspond to vast amounts of structure when exploring biochemical processes. Extracting valuable information, e.g., important intermediate states and collective variables (CVs) that describe the major movement modes, from molecular trajectories to understand the underlying mechanisms of biological processes presents a significant challenge. To achieve this goal, we introduce a deep learning approach, coined DIKI (deep identification of key intermediates), to determine low-dimensional CVs distinguishing key intermediate conformations without a-priori assumptions. DIKI dynamically plans the distribution of latent space and groups together similar conformations within the same cluster. Moreover, by incorporating two user-defined parameters, namely, coarse focus knob and fine focus knob, to help identify conformations with low free energy and differentiate the subtle distinctions among these conformations, resolution-tunable clustering was achieved. Furthermore, the integration of DIKI with a path-finding algorithm contributes to the identification of crucial intermediates along the lowest free-energy pathway. We postulate that DIKI is a robust and flexible tool that can find widespread applications in the analysis of complex biochemical processes.
Affiliation(s)
- Xuyang Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Jingya Xing
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Haohao Fu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
2
Chennakesavalu S, Toomer DJ, Rotskoff GM. Ensuring thermodynamic consistency with invertible coarse-graining. J Chem Phys 2023; 158:124126. [PMID: 37003724 DOI: 10.1063/5.0141888] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 04/03/2023]
Abstract
Coarse-grained models are a core computational tool in theoretical chemistry and biophysics. A judicious choice of a coarse-grained model can yield physical insights by isolating the essential degrees of freedom that dictate the thermodynamic properties of a complex, condensed-phase system. The reduced complexity of the model typically leads to lower computational costs and more efficient sampling compared with atomistic models. Designing "good" coarse-grained models is an art. Generally, the mapping from fine-grained configurations to coarse-grained configurations itself is not optimized in any way; instead, the energy function associated with the mapped configurations is. In this work, we explore the consequences of optimizing the coarse-grained representation alongside its potential energy function. We use a graph machine learning framework to embed atomic configurations into a low-dimensional space to produce efficient representations of the original molecular system. Because the representation we obtain is no longer directly interpretable as a real-space representation of the atomic coordinates, we also introduce an inversion process and an associated thermodynamic consistency relation that allows us to rigorously sample fine-grained configurations conditioned on the coarse-grained sampling. We show that this technique is robust, recovering the first two moments of the distribution of several observables in proteins such as chignolin and alanine dipeptide.
Affiliation(s)
- David J Toomer
- Department of Chemistry, Stanford University, Stanford, California 94305, USA
- Grant M Rotskoff
- Department of Chemistry, Stanford University, Stanford, California 94305, USA
3
Iida S, Kameda T. Free energy and kinetic rate calculation via non-equilibrium molecular simulation: application to biomolecules. Biophys Rev 2022; 14:1303-1314. [PMID: 36659997 PMCID: PMC9842846 DOI: 10.1007/s12551-022-01036-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/03/2022] [Accepted: 11/26/2022] [Indexed: 12/30/2022]
Abstract
Non-equilibrium molecular dynamics (NEMD) simulation has been recognized as a powerful tool for examining biomolecules and provides fruitful insights into not only non-equilibrium but also equilibrium processes. We review recent advances in NEMD simulation and relevant, fundamental results of non-equilibrium statistical mechanics. We first introduce the Crooks fluctuation theorem and the Jarzynski equality, which relate the free energy difference to the work done on a physical system during a non-equilibrium process. The theorems are beneficial for the analysis of NEMD trajectories. We then describe rate theory, a framework to calculate molecular kinetics from a non-equilibrium process; this theoretical framework enables us to calculate a reaction time (the mean first-passage time) from NEMD trajectories. We, in turn, present recent NEMD techniques that apply an external force to a system to enhance molecular dissociation and introduce their application to biomolecules. Lastly, we show the current status of an appropriate selection of reaction coordinates for NEMD simulation.
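As an illustration of the Jarzynski equality mentioned in this abstract, exp(-beta * dF) = <exp(-beta * W)>, the following is a minimal sketch of the exponential-work estimator. It is not taken from the review: the function name `jarzynski_free_energy`, the sample work values, and the choice of units (beta = 1) are assumptions made only for this example.

```python
import numpy as np


def jarzynski_free_energy(work, beta):
    """Estimate dF = -(1/beta) * ln < exp(-beta * W) > from work samples.

    `work` holds the work values from repeated non-equilibrium runs;
    `beta` is 1 / (k_B * T) in matching units. Hypothetical helper.
    """
    x = -beta * np.asarray(work, dtype=float)
    # Log-mean-exp with the maximum factored out, for numerical stability.
    x_max = x.max()
    log_mean_exp = x_max + np.log(np.mean(np.exp(x - x_max)))
    return -log_mean_exp / beta


# Made-up work values (arbitrary units), for illustration only.
works = np.array([2.0, 2.5, 3.0, 3.5])
delta_f = jarzynski_free_energy(works, beta=1.0)
print(delta_f, works.mean())
```

By Jensen's inequality the estimate never exceeds the mean work, which is the second-law statement <W> >= dF. In practice the exponential average is dominated by rare low-work trajectories, so it converges slowly when the work distribution is broad.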
Affiliation(s)
- Shinji Iida
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-Ku, Tokyo, 135-0064 Japan
- Tomoshi Kameda
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-Ku, Tokyo, 135-0064 Japan
4
Salahub DR. Multiscale molecular modelling: from electronic structure to dynamics of nanosystems and beyond. Phys Chem Chem Phys 2022; 24:9051-9081. [PMID: 35389399 DOI: 10.1039/d1cp05928a] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 12/18/2022]
Abstract
Important contemporary biological and materials problems often depend on interactions that span orders of magnitude differences in spatial and temporal dimensions. This Tutorial Review attempts to provide an introduction to such fascinating problems through a series of case studies, aimed at beginning researchers, graduate students, postdocs and more senior colleagues who are changing direction to focus on multiscale aspects of their research. The choice of specific examples is highly personal, with examples either chosen from our own work or outstanding multiscale efforts from the literature. I start with various embedding schemes, as exemplified by polarizable continuum models, 3-D RISM, molecular DFT and frozen-density embedding. Next, QM/MM (quantum mechanical/molecular mechanical) techniques are the workhorse of pm-to-nm/ps-to-ns simulations; examples are drawn from enzymes and from nanocatalysis for oil-sands upgrading. Using polarizable force-fields in the QM/MM framework represents a burgeoning subfield; with examples from ion channels and electron dynamics in molecules subject to strong external fields, probing the atto-second dynamics of the electrons with RT-TDDFT (real-time - time-dependent density functional theory) eventually coupled with nuclear motion through the Ehrenfest approximation. This is followed by a section on coarse graining, bridging dimensions from atoms to cells. The penultimate chapter gives a quick overview of multiscale approaches that extend into the meso- and macro-scales, building on atomistic and coarse-grained techniques to enter the world of materials engineering, on the one hand, and cell biology, on the other. A final chapter gives just a glimpse of the burgeoning impact of machine learning on the structure-dynamics front. 
I aim to capture the excitement of contemporary leading-edge breakthroughs in the description of physico-chemical systems and processes in complex environments, with only enough historical content to provide context and aid the next generation of methodological development. While I aim also for a clear description of the essence of methodological breakthroughs, equations are kept to a minimum and detailed formalism and implementation details are left to the references. My approach is very selective (case studies) rather than exhaustive. I think that these case studies should provide fodder to build as complete a reference tree on multiscale modelling as the reader may wish, through forward and backward citation analysis. I hope that my choices of cases will excite interest in newcomers and help to fuel the growth of multiscale modelling in general.
Affiliation(s)
- Dennis R Salahub
- Department of Chemistry, Department of Physics and Astronomy, CMS-Centre for Molecular Simulation, IQST-Institute for Quantum Science and Technology, Quantum Alberta, University of Calgary, Calgary, Alberta, T2N 1N4, Canada.
5
Belkacemi Z, Gkeka P, Lelièvre T, Stoltz G. Chasing Collective Variables Using Autoencoders and Biased Trajectories. J Chem Theory Comput 2021; 18:59-78. [PMID: 34965117 DOI: 10.1021/acs.jctc.1c00415] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 12/15/2022]
Abstract
Free energy biasing methods have proven to be powerful tools to accelerate the simulation of important conformational changes of molecules by modifying the sampling measure. However, most of these methods rely on the prior knowledge of low-dimensional slow degrees of freedom, i.e., collective variables (CVs). Alternatively, such CVs can be identified using machine learning (ML) and dimensionality reduction algorithms. In this context, approaches where the CVs are learned in an iterative way using adaptive biasing have been proposed: at each iteration, the learned CV is used to perform free energy adaptive biasing to generate new data and learn a new CV. In this paper, we introduce a new iterative method involving CV learning with autoencoders: Free Energy Biasing and Iterative Learning with AutoEncoders (FEBILAE). Our method includes a reweighting scheme to ensure that the learning model optimizes the same loss at each iteration and achieves CV convergence. Using the alanine dipeptide system and the solvated chignolin mini-protein system as examples, we present results of our algorithm using the extended adaptive biasing force as the free energy adaptive biasing method.
Affiliation(s)
- Zineb Belkacemi
- CERMICS, Ecole des Ponts ParisTech, 77455 Marne-la-Vallée, France; Structure Design and Informatics, Sanofi 1371 R&D, 91385 Chilly-Mazarin, France
- Paraskevi Gkeka
- Structure Design and Informatics, Sanofi 1371 R&D, 91385 Chilly-Mazarin, France
- Tony Lelièvre
- CERMICS, Ecole des Ponts ParisTech, 77455 Marne-la-Vallée, France; MATHERIALS Team-Project, Inria, 75589 Paris, France
- Gabriel Stoltz
- CERMICS, Ecole des Ponts ParisTech, 77455 Marne-la-Vallée, France; MATHERIALS Team-Project, Inria, 75589 Paris, France
6
Ghorbani M, Prasad S, Klauda JB, Brooks BR. Variational embedding of protein folding simulations using Gaussian mixture variational autoencoders. J Chem Phys 2021; 155:194108. [PMID: 34800961 DOI: 10.1063/5.0069708] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/14/2022]
Abstract
Conformational sampling of biomolecules using molecular dynamics simulations often produces a large amount of high dimensional data that makes it difficult to interpret using conventional analysis techniques. Dimensionality reduction methods are thus required to extract useful and relevant information. Here, we devise a machine learning method, Gaussian mixture variational autoencoder (GMVAE), that can simultaneously perform dimensionality reduction and clustering of biomolecular conformations in an unsupervised way. We show that GMVAE can learn a reduced representation of the free energy landscape of protein folding with highly separated clusters that correspond to the metastable states during folding. Since GMVAE uses a mixture of Gaussians as its prior, it can directly acknowledge the multi-basin nature of the protein folding free energy landscape. To make the model end-to-end differentiable, we use a Gumbel-softmax distribution. We test the model on three long-timescale protein folding trajectories and show that GMVAE embedding resembles the folding funnel with folded states down the funnel and unfolded states outside the funnel path. Additionally, we show that the latent space of GMVAE can be used for kinetic analysis and Markov state models built on this embedding produce folding and unfolding timescales that are in close agreement with other rigorous dynamical embeddings such as time independent component analysis.
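The Gumbel-softmax distribution mentioned in this abstract gives a differentiable relaxation of sampling a discrete cluster label. The sketch below shows the standard sampling step only, in NumPy. It is not the authors' implementation: the function name `gumbel_softmax_sample`, the cluster probabilities, and the temperature value are assumptions chosen for illustration.

```python
import numpy as np


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed one-hot sample from unnormalised log-probabilities.

    Hypothetical helper: adds Gumbel(0, 1) noise to the logits and applies
    a temperature-scaled softmax over the last axis.
    """
    logits = np.asarray(logits, dtype=float)
    # Clip the uniform draws so both logarithms stay finite.
    u = np.clip(rng.random(logits.shape), 1e-12, 1.0 - 1e-12)
    gumbel_noise = -np.log(-np.log(u))
    z = (logits + gumbel_noise) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerically stable softmax
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
cluster_probs = np.array([0.2, 0.3, 0.5])  # assumed mixture weights
sample = gumbel_softmax_sample(np.log(cluster_probs), temperature=0.5, rng=rng)
print(sample)
```

Lower temperatures push the sample toward a one-hot vector, while higher temperatures make it smoother. In an autodiff framework the same expression lets gradients flow through the cluster assignment.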
Affiliation(s)
- Mahdi Ghorbani
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20824, USA
- Samarjeet Prasad
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20824, USA
- Jeffery B Klauda
- Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, Maryland 20742, USA
- Bernard R Brooks
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20824, USA
7
Roet S, Daub CD, Riccardi E. Chemistrees: Data-Driven Identification of Reaction Pathways via Machine Learning. J Chem Theory Comput 2021; 17:6193-6202. [PMID: 34555907 PMCID: PMC8515787 DOI: 10.1021/acs.jctc.1c00458] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/03/2022]
Abstract
We propose to analyze molecular dynamics (MD) output via a supervised machine learning (ML) algorithm, the decision tree. The approach aims to identify the predominant geometric features which correlate with trajectories that transition between two arbitrarily defined states. The data-driven algorithm aims to identify these features without the bias of human “chemical intuition”. We demonstrate the method by analyzing the proton exchange reactions in formic acid solvated in small water clusters. The simulations were performed with ab initio MD combined with a method to efficiently sample the rare event, path sampling. Our ML analysis identified relevant geometric variables involved in the proton transfer reaction and how they may change as the number of solvating water molecules changes.
Affiliation(s)
- Sander Roet
- Department of Chemistry, Norwegian University of Science and Technology, Høgskoleringen 5, 7491 Trondheim, Norway
- Christopher D Daub
- Department of Chemistry, University of Helsinki, P.O. Box 55, FI-00014 Helsinki, Finland
- Enrico Riccardi
- Department of Informatics, UiO, Gaustadalléen 23B, 0373 Oslo, Norway
8
Computational methods for exploring protein conformations. Biochem Soc Trans 2021; 48:1707-1724. [PMID: 32756904 PMCID: PMC7458412 DOI: 10.1042/bst20200193] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 06/02/2020] [Revised: 07/07/2020] [Accepted: 07/09/2020] [Indexed: 12/13/2022]
Abstract
Proteins are dynamic molecules that can transition between a potentially wide range of structures comprising their conformational ensemble. The nature of these conformations and their relative probabilities are described by a high-dimensional free energy landscape. While computer simulation techniques such as molecular dynamics simulations allow the metastable conformational states and the transitions between them, and thus free energy landscapes, to be characterised, the barriers between states can be high, precluding efficient sampling without substantial computational resources. Over the past decades, a dizzying array of methods have emerged for enhancing conformational sampling, and for projecting the free energy landscape onto a reduced set of dimensions that allow conformational states to be distinguished, known as collective variables (CVs), along which sampling may be directed. Here, a brief description of what biomolecular simulation entails is followed by a more detailed exposition of the nature of CVs and methods for determining these, and, lastly, an overview of the myriad different approaches for enhancing conformational sampling, most of which rely upon CVs, including new advances in both CV determination and conformational sampling due to machine learning.
9
Schlick T, Portillo-Ledesma S. Biomolecular modeling thrives in the age of technology. Nat Comput Sci 2021; 1:321-331. [PMID: 34423314 PMCID: PMC8378674 DOI: 10.1038/s43588-021-00060-9] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 11/24/2020] [Accepted: 03/22/2021] [Indexed: 12/12/2022]
Abstract
The biomolecular modeling field has flourished since its early days in the 1970s due to the rapid adaptation and tailoring of state-of-the-art technology. The resulting dramatic increase in size and timespan of biomolecular simulations has outpaced Moore's law. Here, we discuss the role of knowledge-based versus physics-based methods and hardware versus software advances in propelling the field forward. This rapid adaptation and outreach suggests a bright future for modeling, where theory, experimentation and simulation define three pillars needed to address future scientific and biomedical challenges.
Affiliation(s)
- Tamar Schlick
- Department of Chemistry, New York University, New York, NY, USA
- Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- New York University–East China Normal University Center for Computational Chemistry at New York University Shanghai, Shanghai, China
10
Hooft F, Pérez de Alba Ortíz A, Ensing B. Discovering Collective Variables of Molecular Transitions via Genetic Algorithms and Neural Networks. J Chem Theory Comput 2021; 17:2294-2306. [PMID: 33662202 PMCID: PMC8047796 DOI: 10.1021/acs.jctc.0c00981] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 09/23/2020] [Indexed: 01/13/2023]
Abstract
With the continual improvement of computing hardware and algorithms, simulations have become a powerful tool for understanding all sorts of (bio)molecular processes. To handle the large simulation data sets and to accelerate slow, activated transitions, a condensed set of descriptors, or collective variables (CVs), is needed to discern the relevant dynamics that describes the molecular process of interest. However, proposing an adequate set of CVs that can capture the intrinsic reaction coordinate of the molecular transition is often extremely difficult. Here, we present a framework to find an optimal set of CVs from a pool of candidates using a combination of artificial neural networks and genetic algorithms. The approach effectively replaces the encoder of an autoencoder network with genes to represent the latent space, i.e., the CVs. Given a selection of CVs as input, the network is trained to recover the atom coordinates underlying the CV values at points along the transition. The network performance is used as an estimator of the fitness of the input CVs. Two genetic algorithms optimize the CV selection and the neural network architecture. The successful retrieval of optimal CVs by this framework is illustrated at the hand of two case studies: the well-known conformational change in the alanine dipeptide molecule and the more intricate transition of a base pair in B-DNA from the classic Watson-Crick pairing to the alternative Hoogsteen pairing. Key advantages of our framework include the following: optimal interpretable CVs, avoiding costly calculation of committor or time-correlation functions, and automatic hyperparameter optimization. In addition, we show that applying a time-delay between the network input and output allows for enhanced selection of slow variables. Moreover, the network can also be used to generate molecular configurations of unexplored microstates, for example, for augmentation of the simulation data.
Affiliation(s)
- Ferry Hooft
- Van ’t Hoff Institute for Molecular Sciences, AI4Science Laboratory, and Amsterdam Center for Multiscale Modeling, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
- Alberto Pérez de Alba Ortíz
- Van ’t Hoff Institute for Molecular Sciences, AI4Science Laboratory, and Amsterdam Center for Multiscale Modeling, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
- Bernd Ensing
- Van ’t Hoff Institute for Molecular Sciences, AI4Science Laboratory, and Amsterdam Center for Multiscale Modeling, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
11
Chen M. Collective variable-based enhanced sampling and machine learning. Eur Phys J B 2021; 94:211. [PMID: 34697536 PMCID: PMC8527828 DOI: 10.1140/epjb/s10051-021-00220-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/02/2021] [Accepted: 10/03/2021] [Indexed: 05/14/2023]
Abstract
Collective variable-based enhanced sampling methods have been widely used to study thermodynamic properties of complex systems. Efficiency and accuracy of these enhanced sampling methods are affected by two factors: constructing appropriate collective variables for enhanced sampling and generating accurate free energy surfaces. Recently, many machine learning techniques have been developed to improve the quality of collective variables and the accuracy of free energy surfaces. Although machine learning has achieved great successes in improving enhanced sampling methods, there are still many challenges and open questions. In this perspective, we shall review recent developments on integrating machine learning techniques and collective variable-based enhanced sampling approaches. We also discuss challenges and future research directions including generating kinetic information, exploring high-dimensional free energy surfaces, and efficiently sampling all-atom configurations.
Affiliation(s)
- Ming Chen
- Department of Chemistry, Purdue University, West Lafayette, IN 47907 USA
12
Badu S, Melnik R, Singh S. Mathematical and computational models of RNA nanoclusters and their applications in data-driven environments. Mol Simul 2020. [DOI: 10.1080/08927022.2020.1804564] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 02/06/2023]
Affiliation(s)
- Shyam Badu
- MS2Discovery Interdisciplinary Research Institute, Wilfrid Laurier University, Waterloo, Ontario, Canada
- Roderick Melnik
- MS2Discovery Interdisciplinary Research Institute, Wilfrid Laurier University, Waterloo, Ontario, Canada
- BCAM-Basque Center for Applied Mathematics, Bilbao, Spain
- Sundeep Singh
- MS2Discovery Interdisciplinary Research Institute, Wilfrid Laurier University, Waterloo, Ontario, Canada
13
Bonati L, Rizzi V, Parrinello M. Data-Driven Collective Variables for Enhanced Sampling. J Phys Chem Lett 2020; 11:2998-3004. [PMID: 32239945 DOI: 10.1021/acs.jpclett.0c00535] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Indexed: 06/11/2023]
Abstract
Designing an appropriate set of collective variables is crucial to the success of several enhanced sampling methods. Here we focus on how to obtain such variables from information limited to the metastable states. We characterize these states by a large set of descriptors and employ neural networks to compress this information in a lower-dimensional space, using Fisher's linear discriminant as an objective function to maximize the discriminative power of the network. We test this method on alanine dipeptide, using the nonlinearly separable data set composed by atomic distances. We then study an intermolecular aldol reaction characterized by a concerted mechanism. The resulting variables are able to promote sampling by drawing nonlinear paths in the physical space connecting the fluctuations between metastable basins. Lastly, we interpret the behavior of the neural network by studying its relation to the physical variables. Through the identification of its most relevant features, we are able to gain chemical insight into the process.
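For readers unfamiliar with Fisher's linear discriminant, the objective named in this abstract, the sketch below computes the classical two-class version directly on descriptors. The paper applies the criterion to neural-network outputs, which this example does not attempt. The function names and the toy descriptor values are assumptions made for illustration.

```python
import numpy as np


def fisher_direction(x0, x1):
    """Unit vector w maximising between-class over within-class scatter.

    x0, x1: arrays of shape (n_samples, n_descriptors), one per state.
    Classical closed form: w is proportional to S_w^{-1} (m1 - m0).
    """
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    s_w = (x0 - m0).T @ (x0 - m0) + (x1 - m1).T @ (x1 - m1)
    w = np.linalg.solve(s_w, m1 - m0)
    return w / np.linalg.norm(w)


def fisher_ratio(x0, x1, w):
    """Separation of the two states after projecting the samples onto w."""
    p0, p1 = x0 @ w, x1 @ w
    return (p1.mean() - p0.mean()) ** 2 / (p0.var() + p1.var())


# Toy descriptors for two metastable states (made-up numbers).
state_a = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
state_b = state_a + np.array([5.0, 0.0])
w = fisher_direction(state_a, state_b)
print(w, fisher_ratio(state_a, state_b, w))
```

Here the two states differ only along the first descriptor, so the discriminant direction aligns with it and the projected ratio is far larger than along the second descriptor.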
Affiliation(s)
- Luigi Bonati
- Department of Physics, ETH Zurich, 8092 Zurich, Switzerland
- Institute of Computational Sciences, Università della Svizzera italiana, via Buffi 13, 6900 Lugano, Switzerland
- Valerio Rizzi
- Institute of Computational Sciences, Università della Svizzera italiana, via Buffi 13, 6900 Lugano, Switzerland
- Department of Chemistry and Applied Biosciences, ETH Zurich, 8092 Zurich, Switzerland
- Michele Parrinello
- Institute of Computational Sciences, Università della Svizzera italiana, via Buffi 13, 6900 Lugano, Switzerland
- Department of Chemistry and Applied Biosciences, ETH Zurich, 8092 Zurich, Switzerland
- Italian Institute of Technology, Via Morego 30, 16163 Genova, Italy
14
Wang Y, Lamim Ribeiro JM, Tiwary P. Machine learning approaches for analyzing and enhancing molecular dynamics simulations. Curr Opin Struct Biol 2020; 61:139-145. [PMID: 31972477 DOI: 10.1016/j.sbi.2019.12.016] [Citation(s) in RCA: 119] [Impact Index Per Article: 29.8] [Received: 09/26/2019] [Revised: 12/16/2019] [Accepted: 12/26/2019] [Indexed: 10/25/2022]
Abstract
Molecular dynamics (MD) has become a powerful tool for studying biophysical systems, due to increasing computational power and availability of software. Although MD has made many contributions to better understanding these complex biophysical systems, there remain methodological difficulties to be surmounted. First, how to make the deluge of data generated in running even a microsecond long MD simulation human comprehensible. Second, how to efficiently sample the underlying free energy surface and kinetics. In this short perspective, we summarize machine learning based ideas that are solving both of these limitations, with a focus on their key theoretical underpinnings and remaining challenges.
Affiliation(s)
- Yihang Wang
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, USA
- João Marcelo Lamim Ribeiro
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1677, New York, NY 10029, USA
- Pratyush Tiwary
- Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, USA.
15
Abstract
Sampling complex free-energy surfaces is one of the main challenges of modern atomistic simulation methods. The presence of kinetic bottlenecks in such surfaces often renders a direct approach useless. A popular strategy is to identify a small number of key collective variables and to introduce a bias potential that is able to favor their fluctuations in order to accelerate sampling. Here, we propose to use machine-learning techniques in conjunction with the recent variationally enhanced sampling method [O. Valsson, M. Parrinello, Phys. Rev. Lett. 113, 090601 (2014)] in order to determine such potential. This is achieved by expressing the bias as a neural network. The parameters are determined in a variational learning scheme aimed at minimizing an appropriate functional. This required the development of a more efficient minimization technique. The expressivity of neural networks allows representing rapidly varying free-energy surfaces, removes boundary effects artifacts, and allows several collective variables to be handled.