1
Nadkarni I, Jeong J, Yalcin B, Aluru NR. Modulating Coarse-Grained Dynamics by Perturbing Free Energy Landscapes. J Phys Chem A 2024. [PMID: 39540849 DOI: 10.1021/acs.jpca.4c04530] [Indexed: 11/16/2024]
Abstract
We introduce an approach to describe the long-time dynamics of multiatomic molecules by modulating the free energy landscape (FEL) to capture dominant features of the energy-barrier crossing dynamics of the all-atom (AA) system. Notably, we establish that the self-diffusion coefficient of coarse-grained (CG) systems can be accurately delineated by enhancing conservative force fields with high-frequency perturbations. Using theoretical arguments, we show that these perturbations do not alter the lower-order distribution functions, thereby preserving the structure of the AA system after coarse-graining. We demonstrate the utility of this approach using molecular dynamics simulations of simple molecules in bulk with distinct dynamical characteristics with and without time scale separations as well as for inhomogeneous systems where a fluid is confined in a slit-like nanochannel. Additionally, we also apply our approach to more powerful many-body potentials optimized by using machine learning (ML).
Affiliation(s)
- Ishan Nadkarni
  - Walker Department of Mechanical Engineering, The University of Texas at Austin, Austin, Texas 78712, United States
- Jinu Jeong
  - Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Bugra Yalcin
  - Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, Texas 78712, United States
- Narayana R Aluru
  - Walker Department of Mechanical Engineering, The University of Texas at Austin, Austin, Texas 78712, United States
  - Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, Texas 78712, United States
2
Das M, Venkatramani R. A Mode Evolution Metric to Extract Reaction Coordinates for Biomolecular Conformational Transitions. J Chem Theory Comput 2024; 20:8422-8436. [PMID: 39287954 DOI: 10.1021/acs.jctc.4c00744] [Indexed: 09/19/2024]
Abstract
The complex, multidimensional energy landscape of biomolecules makes the extraction of suitable, nonintuitive collective variables (CVs) that describe their conformational transitions challenging. At present, dimensionality reduction approaches and machine learning (ML) schemes are employed to obtain CVs from molecular dynamics (MD)/Monte Carlo (MC) trajectories or structural databanks for biomolecules. However, minimum sampling conditions to generate reliable CVs that accurately describe the underlying energy landscape remain unclear. Here, we address this issue by developing a Mode evolution Metric (MeM) to extract CVs that can pinpoint new states and describe local transitions in the vicinity of a reference minimum from nonequilibrated MD/MC trajectories. We present a general mathematical formulation of MeM for both statistical dimensionality reduction and machine learning approaches. Application of MeM to MC trajectories of model potential energy landscapes and MD trajectories of solvated alanine dipeptide reveals that the principal components which locate new states in the vicinity of a reference minimum emerge well before the trajectories locally equilibrate between the associated states. Finally, we demonstrate a possible application of MeM in designing efficient biased sampling schemes to construct accurate energy landscape slices that link transitions between states. MeM can help speed up the search for new minima around a biomolecular conformational state and enable the accurate estimation of thermodynamics for states lying on the energy landscape and the description of associated transitions.
Affiliation(s)
- Mitradip Das
  - Department of Chemical Sciences, Tata Institute of Fundamental Research, Colaba, Mumbai 400005, India
- Ravindra Venkatramani
  - Department of Chemical Sciences, Tata Institute of Fundamental Research, Colaba, Mumbai 400005, India
3
Wang D, Qiu Y, Beyerle ER, Huang X, Tiwary P. Information Bottleneck Approach for Markov Model Construction. J Chem Theory Comput 2024; 20:5352-5367. [PMID: 38859575 PMCID: PMC11199095 DOI: 10.1021/acs.jctc.4c00449] [Indexed: 06/12/2024]
Abstract
Markov state models (MSMs) have proven valuable in studying the dynamics of protein conformational changes via statistical analysis of molecular dynamics simulations. In MSMs, the complex configuration space is coarse-grained into conformational states, with dynamics modeled by a series of Markovian transitions among these states at discrete lag times. Constructing the Markovian model at a specific lag time necessitates defining states that circumvent significant internal energy barriers, enabling internal dynamics relaxation within the lag time. This process effectively coarse-grains time and space, integrating out rapid motions within metastable states. Thus, MSMs possess a multiresolution nature, where the granularity of states can be adjusted according to the time-resolution, offering flexibility in capturing system dynamics. This work introduces a continuous embedding approach for molecular conformations using the state predictive information bottleneck (SPIB), a framework that unifies dimensionality reduction and state space partitioning via a continuous, machine learned basis set. Without explicit optimization of the VAMP-based scores, SPIB demonstrates state-of-the-art performance in identifying slow dynamical processes and constructing predictive multiresolution Markovian models. Through applications to well-validated mini-proteins, SPIB showcases unique advantages compared to competing methods. It autonomously and self-consistently adjusts the number of metastable states based on a specified minimal time resolution, eliminating the need for manual tuning. While maintaining efficacy in dynamical properties, SPIB excels in accurately distinguishing metastable states and capturing numerous well-populated macrostates. This contrasts with existing VAMP-based methods, which often emphasize slow dynamics at the expense of incorporating numerous sparsely populated states. 
Furthermore, SPIB's ability to learn a low-dimensional continuous embedding of the underlying MSMs enhances the interpretation of dynamic pathways. With these benefits, we propose SPIB as an easy-to-implement methodology for end-to-end MSM construction.
Affiliation(s)
- Dedi Wang
  - Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
- Yunrui Qiu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
  - Data Science Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
- Eric R. Beyerle
  - Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
- Xuhui Huang
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
  - Data Science Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
- Pratyush Tiwary
  - Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
  - University of Maryland Institute for Health Computing, Bethesda, MD 20852, United States
4
Wang D, Qiu Y, Beyerle ER, Huang X, Tiwary P. An Information Bottleneck Approach for Markov Model Construction. ARXIV 2024:arXiv:2404.02856v2. [PMID: 38947932 PMCID: PMC11213129] [Indexed: 07/02/2024]
Abstract
Markov state models (MSMs) have proven valuable in studying dynamics of protein conformational changes via statistical analysis of molecular dynamics (MD) simulations. In MSMs, the complex configuration space is coarse-grained into conformational states, with dynamics modeled by a series of Markovian transitions among these states at discrete lag times. Constructing the Markovian model at a specific lag time necessitates defining states that circumvent significant internal energy barriers, enabling internal dynamics relaxation within the lag time. This process effectively coarse-grains time and space, integrating out rapid motions within metastable states. Thus, MSMs possess a multi-resolution nature, where the granularity of states can be adjusted according to the time-resolution, offering flexibility in capturing system dynamics. This work introduces a continuous embedding approach for molecular conformations using the state predictive information bottleneck (SPIB), a framework that unifies dimensionality reduction and state space partitioning via a continuous, machine learned basis set. Without explicit optimization of the VAMP-based scores, SPIB demonstrates state-of-the-art performance in identifying slow dynamical processes and constructing predictive multi-resolution Markovian models. Through applications to well-validated mini-proteins, SPIB showcases unique advantages compared to competing methods. It autonomously and self-consistently adjusts the number of metastable states based on specified minimal time resolution, eliminating the need for manual tuning. While maintaining efficacy in dynamical properties, SPIB excels in accurately distinguishing metastable states and capturing numerous well-populated macrostates. This contrasts with existing VAMP-based methods, which often emphasize slow dynamics at the expense of incorporating numerous sparsely populated states. 
Furthermore, SPIB's ability to learn a low-dimensional continuous embedding of the underlying MSMs enhances the interpretation of dynamic pathways. With these benefits, we propose SPIB as an easy-to-implement methodology for end-to-end MSMs construction.
Affiliation(s)
- Dedi Wang
  - Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
- Yunrui Qiu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
  - Data Science Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
- Eric R. Beyerle
  - Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
- Xuhui Huang
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
  - Data Science Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
- Pratyush Tiwary
  - Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
  - University of Maryland Institute for Health Computing, Bethesda, MD 20852, United States
5
Mehdi S, Smith Z, Herron L, Zou Z, Tiwary P. Enhanced Sampling with Machine Learning. Annu Rev Phys Chem 2024; 75:347-370. [PMID: 38382572 PMCID: PMC11213683 DOI: 10.1146/annurev-physchem-083122-125941] [Indexed: 02/23/2024]
Abstract
Molecular dynamics (MD) enables the study of physical systems with excellent spatiotemporal resolution but suffers from severe timescale limitations. To address this, enhanced sampling methods have been developed to improve the exploration of configurational space. However, implementing these methods is challenging and requires domain expertise. In recent years, integration of machine learning (ML) techniques into different domains has shown promise, prompting their adoption in enhanced sampling as well. Although ML is often employed in various fields primarily due to its data-driven nature, its integration with enhanced sampling is more natural with many common underlying synergies. This review explores the merging of ML and enhanced MD by presenting different shared viewpoints. It offers a comprehensive overview of this rapidly evolving field, which can be difficult to stay updated on. We highlight successful strategies such as dimensionality reduction, reinforcement learning, and flow-based methods. Finally, we discuss open problems at the exciting ML-enhanced MD interface.
Affiliation(s)
- Shams Mehdi
  - Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, USA
  - Biophysics Program, University of Maryland, College Park, Maryland, USA
- Zachary Smith
  - Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, USA
  - Biophysics Program, University of Maryland, College Park, Maryland, USA
- Lukas Herron
  - Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, USA
  - Biophysics Program, University of Maryland, College Park, Maryland, USA
- Ziyue Zou
  - Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, USA
- Pratyush Tiwary
  - Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, USA
  - Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, USA
6
France-Lanord A, Vroylandt H, Salanne M, Rotenberg B, Saitta AM, Pietrucci F. Data-Driven Path Collective Variables. J Chem Theory Comput 2024; 20:3069-3084. [PMID: 38619076 DOI: 10.1021/acs.jctc.4c00123] [Indexed: 04/16/2024]
Abstract
Identifying optimal collective variables to model transformations using atomic-scale simulations is a long-standing challenge. We propose a new method for the generation, optimization, and comparison of collective variables that can be thought of as a data-driven generalization of the path collective variable concept. It consists of a kernel ridge regression of the committor probability, which encodes a transformation's progress. The resulting collective variable is one-dimensional, interpretable, and differentiable, making it appropriate for enhanced sampling simulations requiring biasing. We demonstrate the validity of the method on two different applications: a precipitation model and the association of Li+ and F- in water. For the former, we show that global descriptors such as the permutation invariant vector allow reaching an accuracy far from the one achieved via simpler, more intuitive variables. For the latter, we show that information correlated with the transformation mechanism is contained in the first solvation shell only and that inertial effects prevent the derivation of optimal collective variables from the atomic positions only.
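As an illustration of the regression step named in this abstract, the following is a minimal sketch of kernel ridge regression of a committor-like quantity on a descriptor. It is not the authors' implementation: the one-dimensional descriptor, the Gaussian kernel, the synthetic tanh-shaped committor values, and the parameters `sigma` and `lam` are all assumptions chosen for the example.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def fit_krr(X, q, sigma=0.2, lam=1e-6):
    """Solve (K + lam * I) alpha = q for the regression weights alpha."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), q)

def predict_krr(X_train, alpha, X_new, sigma=0.2):
    """Evaluate the fitted model at new descriptor values."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Synthetic data (assumption): a 1D descriptor and a sigmoidal committor estimate.
X = np.linspace(-1.0, 1.0, 21)[:, None]
q = 0.5 * (1.0 + np.tanh(3.0 * X[:, 0]))

alpha = fit_krr(X, q)
q_fit = predict_krr(X, alpha, X)
```

Because the fitted function is a smooth sum of kernels, it is differentiable with respect to the descriptor, which is the property the abstract highlights for use in biased simulations.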
Affiliation(s)
- Arthur France-Lanord
  - Institut des Sciences du Calcul et des Données, ISCD, Sorbonne Université, F-75005 Paris, France
  - Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Sorbonne Université, F-75005 Paris, France
- Hadrien Vroylandt
  - Institut des Sciences du Calcul et des Données, ISCD, Sorbonne Université, F-75005 Paris, France
- Mathieu Salanne
  - Physicochimie des Électrolytes et Nanosystèmes Interfaciaux, Sorbonne Université, CNRS, 4 Place Jussieu, F-75005 Paris, France
  - Institut Universitaire de France (IUF), 75231 Paris, France
- Benjamin Rotenberg
  - Physicochimie des Électrolytes et Nanosystèmes Interfaciaux, Sorbonne Université, CNRS, 4 Place Jussieu, F-75005 Paris, France
- A Marco Saitta
  - Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Sorbonne Université, F-75005 Paris, France
- Fabio Pietrucci
  - Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Sorbonne Université, F-75005 Paris, France
7
Wu Y, Cao S, Qiu Y, Huang X. Tutorial on how to build non-Markovian dynamic models from molecular dynamics simulations for studying protein conformational changes. J Chem Phys 2024; 160:121501. [PMID: 38516972 DOI: 10.1063/5.0189429] [Received: 11/28/2023] [Accepted: 02/20/2024] [Indexed: 03/23/2024]
Abstract
Protein conformational changes play crucial roles in their biological functions. In recent years, the Markov State Model (MSM) constructed from extensive Molecular Dynamics (MD) simulations has emerged as a powerful tool for modeling complex protein conformational changes. In MSMs, dynamics are modeled as a sequence of Markovian transitions among metastable conformational states at discrete time intervals (called lag time). A major challenge for MSMs is that the lag time must be long enough to allow transitions among states to become memoryless (or Markovian). However, this lag time is constrained by the length of individual MD simulations available to track these transitions. To address this challenge, we have recently developed Generalized Master Equation (GME)-based approaches, encoding non-Markovian dynamics using a time-dependent memory kernel. In this Tutorial, we introduce the theory behind two recently developed GME-based non-Markovian dynamic models: the quasi-Markov State Model (qMSM) and the Integrative Generalized Master Equation (IGME). We subsequently outline the procedures for constructing these models and provide a step-by-step tutorial on applying qMSM and IGME to study two peptide systems: alanine dipeptide and villin headpiece. This Tutorial is available at https://github.com/xuhuihuang/GME_tutorials. The protocols detailed in this Tutorial aim to be accessible for non-experts interested in studying the biomolecular dynamics using these non-Markovian dynamic models.
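The Markovian picture described in this abstract (transitions among discrete states at a fixed lag time) can be sketched with a simple count-based estimator. This is only the plain, non-reversible MSM estimate, not the qMSM or IGME models the tutorial covers; the function name `estimate_transition_matrix` and the toy two-state trajectory are hypothetical.

```python
import numpy as np

def estimate_transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition counts at the given lag time (in frames)."""
    dtraj = np.asarray(dtraj)
    counts = np.zeros((n_states, n_states))
    # Count every i -> j pair separated by `lag` frames (sliding window).
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of states that were never left stay zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Toy discrete trajectory (assumption): state index per frame.
dtraj = [0, 0, 1, 1, 0, 0, 1, 1]
T = estimate_transition_matrix(dtraj, n_states=2, lag=1)
```

In practice, one would repeat the estimate at several lag times and check whether the model's kinetics stop changing, which is the lag-time constraint the abstract refers to.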
Affiliation(s)
- Yue Wu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Siqin Cao
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Yunrui Qiu
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Xuhui Huang
  - Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
  - Data Science Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
8
Lelièvre T, Pigeon T, Stoltz G, Zhang W. Analyzing Multimodal Probability Measures with Autoencoders. J Phys Chem B 2024; 128:2607-2631. [PMID: 38466759 DOI: 10.1021/acs.jpcb.3c07075] [Indexed: 03/13/2024]
Abstract
Finding collective variables to describe some important coarse-grained information on physical systems, in particular metastable states, remains a key issue in molecular dynamics. Recently, machine learning techniques have been intensively used to complement and possibly bypass expert knowledge in order to construct collective variables. Our focus here is on neural network approaches based on autoencoders. We study some relevant mathematical properties of the loss function considered for training autoencoders and provide physical interpretations based on conditional variances and minimum energy paths. We also consider various extensions in order to better describe physical systems, by incorporating more information on transition states at saddle points, and/or allowing for multiple decoders in order to describe several transition paths. Our results are illustrated on toy two-dimensional systems and on alanine dipeptide.
Affiliation(s)
- Tony Lelièvre
  - CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
  - MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
- Thomas Pigeon
  - CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
  - MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
  - IFP Energies Nouvelles, Rond-Point de l'Echangeur de Solaize, BP 3, 69360 Solaize, France
- Gabriel Stoltz
  - CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
  - MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
- Wei Zhang
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 14, 14195 Berlin, Germany
  - Zuse Institute Berlin, Takustraße 7, 14195 Berlin, Germany
9
Gong S, Zheng Z. A slow feature analysis approach for the optimization of collective variables. J Chem Phys 2024; 160:094104. [PMID: 38426510 DOI: 10.1063/5.0191014] [Received: 12/11/2023] [Accepted: 02/13/2024] [Indexed: 03/02/2024]
Abstract
Molecular dynamics simulations have become increasingly important in understanding the microscopic mechanisms of various molecular systems. However, the high energy barriers in complicated molecules often make it difficult to observe events of interest within a reasonable timescale. To address this issue, researchers have developed a variety of enhanced sampling methods to explore configuration space by adding bias potentials along the slowly changing collective variables (CVs). In this study, we have developed a new tool that combines slow feature analysis and biasing-enhanced sampling methods to identify effective CVs and enhance the sampling efficiency of configuration space. We have demonstrated the effectiveness of this tool through three general examples.
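For readers unfamiliar with slow feature analysis, the sketch below shows the textbook linear version: whiten the input features, then pick the directions whose time derivative has the smallest variance. It is not the tool described in this abstract and includes no biasing step; the function name `linear_sfa` and the two-signal test data are assumptions made for the example.

```python
import numpy as np

def linear_sfa(X, n_components=1):
    """Return projection vectors for the slowest linear features of X (frames x features)."""
    Xc = X - X.mean(axis=0)
    # Whitening: rotate and rescale so the features have identity covariance.
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = evecs / np.sqrt(evals)
    Z = Xc @ W
    # Covariance of the finite-difference time derivative in whitened space.
    dvals, dvecs = np.linalg.eigh(np.cov(np.diff(Z, axis=0), rowvar=False))
    # eigh sorts eigenvalues in ascending order, so the first columns are the slowest.
    return W @ dvecs[:, :n_components]

# Synthetic data (assumption): a slow and a fast oscillation, linearly mixed.
t = np.linspace(0.0, 2.0 * np.pi, 2000)
slow, fast = np.sin(t), np.sin(40.0 * t)
X = np.column_stack([slow + fast, slow - 0.5 * fast])

w = linear_sfa(X, n_components=1)
slow_feature = (X - X.mean(axis=0)) @ w[:, 0]
corr = abs(np.corrcoef(slow_feature, slow)[0, 1])
```

The recovered feature is defined only up to sign and scale, which is why the comparison with the known slow signal uses the absolute correlation.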
Affiliation(s)
- Shuai Gong
  - School of Chemistry, Chemical Engineering and Life Science, Wuhan University of Technology, 122 Luoshi Road, Wuhan 430070, People's Republic of China
- Zheng Zheng
  - School of Chemistry, Chemical Engineering and Life Science, Wuhan University of Technology, 122 Luoshi Road, Wuhan 430070, People's Republic of China
  - Divamics Inc., Suzhou 215000, People's Republic of China
10
Fu H, Bian H, Shao X, Cai W. Collective Variable-Based Enhanced Sampling: From Human Learning to Machine Learning. J Phys Chem Lett 2024; 15:1774-1783. [PMID: 38329095 DOI: 10.1021/acs.jpclett.3c03542] [Indexed: 02/09/2024]
Abstract
Enhanced-sampling algorithms relying on collective variables (CVs) are extensively employed to study complex (bio)chemical processes that are not amenable to brute-force molecular simulations. The selection of appropriate CVs characterizing the slow movement modes is of paramount importance for reliable and efficient enhanced-sampling simulations. In this Perspective, we first review the application and limitations of CVs obtained from chemical and geometrical intuition. We also introduce path-sampling algorithms, which can identify path-like CVs in a high-dimensional free-energy space. Machine-learning algorithms offer a viable approach to finding suitable CVs by analyzing trajectories from preliminary simulations. We discuss both the performance of machine-learning-derived CVs in enhanced-sampling simulations of experimental models and the challenges involved in applying these CVs to realistic, complex molecular assemblies. Moreover, we provide a prospective view of the potential advancements of machine-learning algorithms for the development of CVs in the field of enhanced-sampling simulations.
Affiliation(s)
- Haohao Fu
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Hengwei Bian
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
11
Beyerle ER, Tiwary P. Thermodynamically Optimized Machine-Learned Reaction Coordinates for Hydrophobic Ligand Dissociation. J Phys Chem B 2024; 128:755-767. [PMID: 38205806 DOI: 10.1021/acs.jpcb.3c08304] [Indexed: 01/12/2024]
Abstract
Ligand unbinding is mediated by its free energy change, which has intertwined contributions from both energy and entropy. It is important, but not easy, to quantify their individual contributions to the free energy profile. We model hydrophobic ligand unbinding for two systems, a methane particle and a C60 fullerene, both unbinding from hydrophobic pockets in all-atom water. Using a modified deep learning framework, we learn a thermodynamically optimized reaction coordinate to describe the hydrophobic ligand dissociation for both systems. Interpretation of these reaction coordinates reveals the roles of entropic and enthalpic forces as the ligand and pocket sizes change. In both cases, we observe that the free-energy barrier to unbinding is dominated by entropy considerations. Furthermore, the process of methane unbinding is driven by methane solvation, while fullerene unbinding is driven first by pocket wetting and then fullerene wetting. For both solutes, the direct importance of the distance from the binding pocket to the learned reaction coordinate is present, but low. Our framework and subsequent feature importance analysis thus give useful thermodynamic insight into hydrophobic ligand dissociation problems that are otherwise difficult to glean.
Affiliation(s)
- Eric R Beyerle
  - Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
- Pratyush Tiwary
  - Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
  - Department of Chemistry, University of Maryland, College Park, Maryland 20742, United States
12
Arbon R, Zhu Y, Mey ASJS. Markov State Models: To Optimize or Not to Optimize. J Chem Theory Comput 2024; 20:977-988. [PMID: 38163961 PMCID: PMC10809420 DOI: 10.1021/acs.jctc.3c01134] [Received: 10/13/2023] [Revised: 12/10/2023] [Accepted: 12/11/2023] [Indexed: 01/03/2024]
Abstract
Markov state models (MSM) are a popular statistical method for analyzing the conformational dynamics of proteins including protein folding. With all statistical and machine learning (ML) models, choices must be made about the modeling pipeline that cannot be directly learned from the data. These choices, or hyperparameters, are often evaluated by expert judgment or, in the case of MSMs, by maximizing variational scores such as the VAMP-2 score. Modern ML and statistical pipelines often use automatic hyperparameter selection techniques ranging from the simple, choosing the best score from a random selection of hyperparameters, to the complex, optimization via, e.g., Bayesian optimization. In this work, we ask whether it is possible to automatically select MSM models this way by estimating and analyzing over 16,000,000 observations from over 280,000 estimated MSMs. We find that differences in hyperparameters can change the physical interpretation of the optimization objective, making automatic selection difficult. In addition, we find that enforcing conditions of equilibrium in the VAMP scores can result in inconsistent model selection. However, other parameters that specify the VAMP-2 score (lag time and number of relaxation processes scored) have only a negligible influence on model selection. We suggest that model observables and variational scores should be only a guide to model selection and that a full investigation of the MSM properties should be undertaken when selecting hyperparameters.
Affiliation(s)
- Robert E. Arbon
  - EaStCHEM School of Chemistry, David Brewster Road, Joseph Black Building, The King’s Buildings, Edinburgh EH9 3FJ, United Kingdom
  - Redesign Science, 180 Varick St., New York, New York 10014, United States
- Yanchen Zhu
  - EaStCHEM School of Chemistry, David Brewster Road, Joseph Black Building, The King’s Buildings, Edinburgh EH9 3FJ, United Kingdom
- Antonia S. J. S. Mey
  - EaStCHEM School of Chemistry, David Brewster Road, Joseph Black Building, The King’s Buildings, Edinburgh EH9 3FJ, United Kingdom
13
Herringer NSM, Dasetty S, Gandhi D, Lee J, Ferguson AL. Permutationally Invariant Networks for Enhanced Sampling (PINES): Discovery of Multimolecular and Solvent-Inclusive Collective Variables. J Chem Theory Comput 2024; 20:178-198. [PMID: 38150421 DOI: 10.1021/acs.jctc.3c00923] [Indexed: 12/29/2023]
Abstract
The typically rugged nature of molecular free-energy landscapes can frustrate efficient sampling of the thermodynamically relevant phase space due to the presence of high free-energy barriers. Enhanced sampling techniques can improve phase space exploration by accelerating sampling along particular collective variables (CVs). A number of techniques exist for the data-driven discovery of CVs parametrizing the important large-scale motions of the system. A challenge to CV discovery is learning CVs invariant to the symmetries of the molecular system, frequently rigid translation, rigid rotation, and permutational relabeling of identical particles. Of these, permutational invariance has proved a persistent challenge in frustrating the data-driven discovery of multimolecular CVs in systems of self-assembling particles and solvent-inclusive CVs for solvated systems. In this work, we integrate permutation invariant vector (PIV) featurizations with autoencoding neural networks to learn nonlinear CVs invariant to translation, rotation, and permutation and perform interleaved rounds of CV discovery and enhanced sampling to iteratively expand the sampling of configurational phase space and obtain converged CVs and free-energy landscapes. We demonstrate the permutationally invariant network for enhanced sampling (PINES) approach in applications to the self-assembly of a 13-atom argon cluster, association/dissociation of a NaCl ion pair in water, and hydrophobic collapse of a C45H92 n-pentatetracontane polymer chain. We make the approach freely available as a new module within the PLUMED2 enhanced sampling libraries.
Affiliation(s)
- Siva Dasetty
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Diya Gandhi
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Junhee Lee
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
|
14
|
Ishizone T, Matsunaga Y, Fuchigami S, Nakamura K. Representation of Protein Dynamics Disentangled by Time-Structure-Based Prior. J Chem Theory Comput 2024; 20:436-450. [PMID: 38151233 DOI: 10.1021/acs.jctc.3c01025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2023]
Abstract
Representation learning (RL) is a universal technique for deriving low-dimensional disentangled representations from high-dimensional observations, aiding in a multitude of downstream tasks. RL has been extensively applied to various data types, including images and natural language. Here, we analyze molecular dynamics (MD) simulation data of biomolecules in terms of RL. Currently, state-of-the-art RL techniques, mainly motivated by the variational principle, try to capture slow motions in the representation (latent) space. Here, we propose two methods based on an alternative perspective on the disentanglement in the latent space. By disentanglement, we here mean the separation of underlying factors in the simulation data, aiding in detecting physically important coordinates for conformational transitions. The proposed methods introduce a simple prior that imposes temporal constraints in the latent space, serving as a regularization term to facilitate the capture of disentangled representations of dynamics. Comparison with other methods via the analysis of MD simulation trajectories for alanine dipeptide and chignolin validates that the proposed methods construct Markov state models (MSMs) whose implied time scales are comparable to those of the state-of-the-art methods. Using a measure based on total variation, we quantitatively evaluated that the proposed methods successfully disentangle physically important coordinates, aiding the interpretation of folding/unfolding transitions of chignolin. Overall, our methods provide good representations of complex biomolecular dynamics for downstream tasks, allowing for better interpretations of the conformational transitions.
Affiliation(s)
- Tsuyoshi Ishizone
- Mathematical Sciences Program, Graduate School of Advanced Mathematical Sciences, Meiji University, Nakano 4-21-1, Nakano-ku, Tokyo 164-8525, Japan
- Yasuhiro Matsunaga
- Graduate School of Science and Engineering, Saitama University, Shimo-Okubo 255, Sakura-ku, Saitama-shi, Saitama 338-8570, Japan
- Sotaro Fuchigami
- Physical Biochemistry Laboratory, Division of Pharmaceutical Sciences, School of Pharmaceutical Sciences, University of Shizuoka, 52-1 Yada, Suruga-ku, Shizuoka 422-8526, Japan
- Kazuyuki Nakamura
- Department of Mathematical Sciences Based on Modeling and Analysis, School of Interdisciplinary Mathematical Sciences, Meiji University, Nakano 4-21-1, Nakano-ku, Tokyo 164-8525, Japan
|
15
|
Kozlowski N, Grubmüller H. Uncertainties in Markov State Models of Small Proteins. J Chem Theory Comput 2023; 19:5516-5524. [PMID: 37540193 PMCID: PMC10448719 DOI: 10.1021/acs.jctc.3c00372] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/03/2023] [Indexed: 08/05/2023]
Abstract
Markov state models are widely used to describe and analyze protein dynamics based on molecular dynamics simulations, specifically to extract functionally relevant characteristic time scales and motions. Particularly for larger biomolecules such as proteins, however, insufficient sampling is a notorious concern and often the source of large uncertainties that are difficult to quantify. Furthermore, there are several other sources of uncertainty, such as the choice of the number of Markov states and lag time, the choice and parameters of the dimension-reduction preprocessing step, and uncertainty due to the limited number of observed transitions; the latter is often estimated via a Bayesian approach. Here, we quantified and ranked all of these uncertainties for four small globular test proteins. We found that the largest uncertainty is due to insufficient sampling and initially increases with the total trajectory length T up to a critical tipping point, after which it decreases as 1/T, thus providing guidelines for how much sampling is required for a given accuracy. We also found that single long trajectories yielded better sampling accuracy than many shorter trajectories starting from the same structure. In comparison, the remaining sources of the above uncertainties are generally smaller by a factor of about 5, rendering them less of a concern but certainly not negligible. Importantly, the Bayes uncertainty, commonly used as the only uncertainty estimate, captures only a relatively small part of the true uncertainty, which is thus often drastically underestimated.
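To make concrete the quantities whose uncertainty this abstract discusses, the sketch below estimates a Markov state model from a discrete trajectory by counting transitions at a chosen lag time and converts the eigenvalues of the transition matrix into implied time scales, t_i = -lag / ln(lambda_i). It is a minimal sketch and not the authors' workflow: the helper names, the simple row-normalized (non-reversible) estimator, and the two-state test chain are all assumptions made for illustration.

```python
import numpy as np


def simulate_chain(P, n_steps, seed=0):
    """Sample a discrete trajectory from a known transition matrix P."""
    rng = np.random.default_rng(seed)
    cum = np.cumsum(P, axis=1)
    u = rng.random(n_steps)
    states = np.zeros(n_steps, dtype=int)
    for t in range(1, n_steps):
        nxt = np.searchsorted(cum[states[t - 1]], u[t], side="right")
        states[t] = min(nxt, len(P) - 1)  # guard against round-off in cum
    return states


def estimate_msm(dtraj, n_states, lag):
    """Row-normalized transition counts at the given lag (in steps)."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)


def implied_timescales(T, lag):
    """Implied time scales from the non-stationary eigenvalues.

    Assumes those eigenvalues are real and positive, which holds for the
    toy chain below but not for every estimated matrix.
    """
    evals = np.sort(np.linalg.eigvals(T).real)[::-1]
    return -lag / np.log(evals[1:])


# Usage: recover the slow time scale of a two-state chain.
P_true = np.array([[0.95, 0.05], [0.10, 0.90]])
dtraj = simulate_chain(P_true, n_steps=50_000)
T_est = estimate_msm(dtraj, n_states=2, lag=1)
print(implied_timescales(T_est, lag=1), -1.0 / np.log(0.85))
```

Rerunning the usage example with a shorter trajectory or a different `seed` gives a feel for how strongly the estimated time scale depends on the amount of sampling.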
Affiliation(s)
- Nicolai Kozlowski
- Department of Theoretical and Computational Biophysics, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen 37077, Germany
- Helmut Grubmüller
- Department of Theoretical and Computational Biophysics, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen 37077, Germany
|
16
|
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Karl N. Kirschner
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
|
17
|
Qiu Y, O’Connor MS, Xue M, Liu B, Huang X. An Efficient Path Classification Algorithm Based on Variational Autoencoder to Identify Metastable Path Channels for Complex Conformational Changes. J Chem Theory Comput 2023; 19:4728-4742. [PMID: 37382437 PMCID: PMC11042546 DOI: 10.1021/acs.jctc.3c00318] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 06/30/2023]
Abstract
Conformational changes (i.e., dynamic transitions between pairs of conformational states) play important roles in many chemical and biological processes. Constructing the Markov state model (MSM) from extensive molecular dynamics (MD) simulations is an effective approach to dissect the mechanism of conformational changes. When combined with transition path theory (TPT), MSM can be applied to elucidate the ensemble of kinetic pathways connecting pairs of conformational states. However, the application of TPT to analyze complex conformational changes often results in a vast number of kinetic pathways with comparable fluxes. This obstacle is particularly pronounced in heterogeneous self-assembly and aggregation processes. The large number of kinetic pathways makes it challenging to comprehend the molecular mechanisms underlying conformational changes of interest. To address this challenge, we have developed a path classification algorithm named latent-space path clustering (LPC) that efficiently lumps parallel kinetic pathways into distinct metastable path channels, making them easier to comprehend. In our algorithm, MD conformations are first projected onto a low-dimensional space containing a small set of collective variables (CVs) by time-structure-based independent component analysis (tICA) with kinetic mapping. Then, MSM and TPT are constructed to obtain the ensemble of pathways, and a deep learning architecture named the variational autoencoder (VAE) is used to learn the spatial distributions of kinetic pathways in the continuous CV space. Based on the trained VAE model, the TPT-generated ensemble of kinetic pathways can be embedded into a latent space, where the classification becomes clear. We show that LPC can efficiently and accurately identify the metastable path channels in three systems: a 2D potential, the aggregation of two hydrophobic particles in water, and the folding of the Fip35 WW domain. 
Using the 2D potential, we further demonstrate that our LPC algorithm outperforms the previous path-lumping algorithms by making substantially fewer incorrect assignments of individual pathways to four path channels. We expect that LPC can be widely applied to identify the dominant kinetic pathways underlying complex conformational changes.
Affiliation(s)
- Yunrui Qiu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Michael S. O’Connor
- Biophysics Graduate Program, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Mingyi Xue
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Bojun Liu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Xuhui Huang
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Biophysics Graduate Program, University of Wisconsin-Madison, Madison, WI, 53706, USA
|
18
|
Wu M, Liao J, Shu Z, Chen C. Enhanced sampling in explicit solvent by deep learning module in FSATOOL. J Comput Chem 2023. [PMID: 37191088 DOI: 10.1002/jcc.27132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 04/21/2023] [Accepted: 04/27/2023] [Indexed: 05/17/2023]
Abstract
FSATOOL is an integrated molecular simulation and data analysis program. Its previous molecular dynamics engine supported only simulations in vacuum or implicit solvent. In this work, we implement the well-known smooth particle mesh Ewald method for simulations in explicit solvent. The newly developed engine runs on both CPU and GPU, and all existing analysis modules in the program are compatible with it. Moreover, we build a complete deep learning module in FSATOOL. Based on this module, we further implement two useful trajectory analysis methods: state-free reversible VAMPnets and the time-lagged autoencoder. Both are well suited to finding the collective variables related to the conformational transitions of biomolecules. In FSATOOL, these collective variables can then be used to construct a bias potential for enhanced sampling. We describe the implementation details of the methods and present their performance in FSATOOL through several enhanced sampling simulations.
Affiliation(s)
- Mincong Wu
- Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Jun Liao
- Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Zirui Shu
- Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Changjun Chen
- Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China
|
19
|
Verkhivker G, Alshahrani M, Gupta G, Xiao S, Tao P. From Deep Mutational Mapping of Allosteric Protein Landscapes to Deep Learning of Allostery and Hidden Allosteric Sites: Zooming in on "Allosteric Intersection" of Biochemical and Big Data Approaches. Int J Mol Sci 2023; 24:7747. [PMID: 37175454 PMCID: PMC10178073 DOI: 10.3390/ijms24097747] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/06/2023] [Revised: 04/22/2023] [Accepted: 04/23/2023] [Indexed: 05/15/2023] Open
Abstract
Recent advances in artificial intelligence (AI) and machine learning have driven the design of new expert systems and automated workflows that are able to model complex chemical and biological phenomena. In recent years, machine learning approaches have been developed and actively deployed to facilitate computational and experimental studies of protein dynamics and allosteric mechanisms. In this review, we discuss in detail new developments along two major directions of allosteric research through the lens of data-intensive biochemical approaches and AI-based computational methods. Despite considerable progress in applications of AI methods for protein structure and dynamics studies, the intersection between allosteric regulation, the emerging structural biology technologies and AI approaches remains largely unexplored, calling for the development of AI-augmented integrative structural biology. In this review, we focus on the latest remarkable progress in deep high-throughput mining and comprehensive mapping of allosteric protein landscapes and allosteric regulatory mechanisms as well as on the new developments in AI methods for prediction and characterization of allosteric binding sites on the proteome level. We also discuss new AI-augmented structural biology approaches that expand our knowledge of the universe of protein dynamics and allostery. We conclude with an outlook and highlight the importance of developing an open science infrastructure for machine learning studies of allosteric regulation and validation of computational approaches using integrative studies of allosteric mechanisms. The development of community-accessible tools that uniquely leverage the existing experimental and simulation knowledgebase to enable interrogation of the allosteric functions can provide a much-needed boost to further innovation and integration of experimental and computational technologies empowered by the booming AI field.
Affiliation(s)
- Gennady Verkhivker
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA 92618, USA
- Mohammed Alshahrani
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA
- Grace Gupta
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA
- Sian Xiao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, TX 75275, USA
- Peng Tao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, TX 75275, USA
|
20
|
Hunkler S, Diederichs K, Kukharenko O, Peter C. Fast conformational clustering of extensive molecular dynamics simulation data. J Chem Phys 2023; 158:144109. [PMID: 37061476 DOI: 10.1063/5.0142797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2023] Open
Abstract
We present an unsupervised data processing workflow that is specifically designed to obtain a fast conformational clustering of long molecular dynamics simulation trajectories. In this approach, we combine two dimensionality reduction algorithms (cc_analysis and encodermap) with a density-based spatial clustering algorithm (hierarchical density-based spatial clustering of applications with noise). The proposed scheme benefits from the strengths of the three algorithms while avoiding most of the drawbacks of the individual methods. Here, the cc_analysis algorithm is applied for the first time to molecular simulation data. The encodermap algorithm complements cc_analysis by providing an efficient way to process and assign large amounts of data to clusters. The main goal of the procedure is to maximize the number of assigned frames of a given trajectory while keeping a clear conformational identity of the clusters that are found. In practice, we achieve this by using an iterative clustering approach and a tunable root-mean-square-deviation-based criterion in the final cluster assignment. This allows us to find clusters of different densities and different degrees of structural identity. With the help of four protein systems, we illustrate the capability and performance of this clustering workflow: wild-type and thermostable mutant of the Trp-cage protein (TC5b and TC10b), NTL9, and Protein B. Each of these test systems poses its own challenges to the scheme; together, they give a good overview of the advantages and potential difficulties that can arise when using the proposed method.
Affiliation(s)
- Simon Hunkler
- Department of Chemistry, University of Konstanz, Konstanz, Germany
- Kay Diederichs
- Department of Chemistry, University of Konstanz, Konstanz, Germany
- Christine Peter
- Department of Chemistry, University of Konstanz, Konstanz, Germany
|
21
|
Xiao S, Verkhivker GM, Tao P. Machine learning and protein allostery. Trends Biochem Sci 2023; 48:375-390. [PMID: 36564251 PMCID: PMC10023316 DOI: 10.1016/j.tibs.2022.12.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/04/2022] [Revised: 11/16/2022] [Accepted: 12/02/2022] [Indexed: 12/23/2022]
Abstract
The fundamental biological importance and complexity of allosterically regulated proteins stem from their central role in signal transduction and cellular processes. Recently, machine-learning approaches have been developed and actively deployed to facilitate theoretical and experimental studies of protein dynamics and allosteric mechanisms. In this review, we survey recent developments in applications of machine-learning methods for studies of allosteric mechanisms, prediction of allosteric effects and allostery-related physicochemical properties, and allosteric protein engineering. We also review the applications of machine-learning strategies for characterization of allosteric mechanisms and drug design targeting SARS-CoV-2. Continuous development and task-specific adaptation of machine-learning methods for protein allosteric mechanisms will have an increasingly important role in bridging a wide spectrum of data-intensive experimental and theoretical technologies.
Affiliation(s)
- Sian Xiao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, TX 75205, USA
- Gennady M Verkhivker
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA; Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA 92618, USA
- Peng Tao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, TX 75205, USA
|
22
|
Agajanian S, Alshahrani M, Bai F, Tao P, Verkhivker GM. Exploring and Learning the Universe of Protein Allostery Using Artificial Intelligence Augmented Biophysical and Computational Approaches. J Chem Inf Model 2023; 63:1413-1428. [PMID: 36827465 PMCID: PMC11162550 DOI: 10.1021/acs.jcim.2c01634] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Indexed: 02/26/2023]
Abstract
Allosteric mechanisms are commonly employed regulatory tools used by proteins to orchestrate complex biochemical processes and control communications in cells. The quantitative understanding and characterization of allosteric molecular events are among the major challenges in modern biology and require integration of innovative computational and experimental approaches to obtain atomistic-level knowledge of the allosteric states, interactions, and dynamic conformational landscapes. The growing body of computational and experimental studies empowered by emerging artificial intelligence (AI) technologies has opened up new paradigms for exploring and learning the universe of protein allostery from first principles. In this review we analyze recent developments in high-throughput deep mutational scanning of allosteric protein functions; applications and latest adaptations of AlphaFold structural prediction methods for studies of protein dynamics and allostery; new frontiers in integrating machine learning and enhanced sampling techniques for characterization of allostery; and recent advances in structural biology approaches for studies of allosteric systems. We also highlight recent computational and experimental studies of the SARS-CoV-2 spike (S) proteins revealing an important and often hidden role of allosteric regulation driving functional conformational changes, binding interactions with the host receptor, and mutational escape mechanisms of S proteins, which are critical for viral infection. We conclude with a summary and outlook of future directions suggesting that AI-augmented biophysical and computer simulation approaches are beginning to transform studies of protein allostery toward systematic characterization of allosteric landscapes, hidden allosteric states, and mechanisms, which may bring about a new revolution in molecular biology and drug discovery.
Affiliation(s)
- Steve Agajanian
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States
- Mohammed Alshahrani
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States
- Fang Bai
- Shanghai Institute for Advanced Immunochemical Studies, School of Life Science and Technology and Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China
- Peng Tao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States
- Gennady M Verkhivker
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, California 92618, United States
|
23
|
Predicting efficacy of drug-carrier nanoparticle designs for cancer treatment: a machine learning-based solution. Sci Rep 2023; 13:547. [PMID: 36631637 PMCID: PMC9834306 DOI: 10.1038/s41598-023-27729-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Accepted: 01/06/2023] [Indexed: 01/13/2023] Open
Abstract
Molecular dynamics (MD) simulations are very effective in the discovery of nanomedicines for treating cancer, but they are computationally expensive and time-consuming. Existing studies that integrate machine learning (ML) into MD simulation to enhance the process and enable efficient analysis cannot provide direct insights without the complete simulation. In this study, we present an ML-based approach for predicting the solvent accessible surface area (SASA) of a nanoparticle (NP), denoting its efficacy, from a fraction of the MD simulation data. The proposed framework uses a time series model for simulating the MD, resulting in an intermediate state, and a second model to calculate the SASA in that state. Empirically, the solution can predict the SASA value 260 timesteps ahead 7.5 times faster with a very low average error of 1956.93. We also introduce the use of an explainability technique to validate the predictions. This work can greatly reduce both the processing cost and the data size while providing reliable solutions for the nanomedicine design process.
|
24
|
Chen H, Chipot C. Chasing collective variables using temporal data-driven strategies. QRB Discovery 2023; 4:e2. [PMID: 37564298 PMCID: PMC10411323 DOI: 10.1017/qrd.2022.23] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 10/11/2022] [Revised: 12/21/2022] [Accepted: 12/29/2022] [Indexed: 01/09/2023] Open
Abstract
The convergence of free-energy calculations based on importance sampling depends heavily on the choice of collective variables (CVs), which in principle, should include the slow degrees of freedom of the biological processes to be investigated. Autoencoders (AEs), as emerging data-driven dimension reduction tools, have been utilised for discovering CVs. AEs, however, are often treated as black boxes, and what AEs actually encode during training, and whether the latent variables from encoders are suitable as CVs for further free-energy calculations remains unknown. In this contribution, we review AEs and their time-series-based variants, including time-lagged AEs (TAEs) and modified TAEs, as well as the closely related model variational approach for Markov processes networks (VAMPnets). We then show through numerical examples that AEs learn the high-variance modes instead of the slow modes. In stark contrast, time series-based models are able to capture the slow modes. Moreover, both modified TAEs with extensions from slow feature analysis and the state-free reversible VAMPnets (SRVs) can yield orthogonal multidimensional CVs. As an illustration, we employ SRVs to discover the CVs of the isomerizations of N-acetyl-N'-methylalanylamide and trialanine by iterative learning with trajectories from biased simulations. Last, through numerical experiments with anisotropic diffusion, we investigate the potential relationship of time-series-based models and committor probabilities.
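The contrast drawn in this abstract between high-variance modes and slow modes can be illustrated with linear stand-ins: principal component analysis (PCA) for a plain autoencoder and time-lagged independent component analysis (tICA) for its time-lagged variants. The sketch below is an assumption-based toy, not the authors' numerical examples: the trajectory (a slow, low-variance coordinate paired with a fast, high-variance one), the function names, and the lag of 10 steps are all chosen for illustration.

```python
import numpy as np


def toy_trajectory(n=20_000, phi=0.99, fast_std=3.0, seed=0):
    """Column 0: slow AR(1) process with unit variance.
    Column 1: fast white noise with larger variance."""
    rng = np.random.default_rng(seed)
    kicks = rng.normal(scale=np.sqrt(1.0 - phi**2), size=n)
    slow = np.zeros(n)
    for t in range(1, n):
        slow[t] = phi * slow[t - 1] + kicks[t]
    fast = rng.normal(scale=fast_std, size=n)
    return np.column_stack([slow, fast])


def pca_first_direction(X):
    """Direction of maximum variance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    _, evecs = np.linalg.eigh(cov)
    return evecs[:, -1]


def tica_first_direction(X, lag):
    """Direction of maximum autocorrelation at the given lag."""
    Xc = X - X.mean(axis=0)
    A, B = Xc[:-lag], Xc[lag:]
    c0 = (A.T @ A + B.T @ B) / (2 * len(A))  # instantaneous covariance
    ct = (A.T @ B + B.T @ A) / (2 * len(A))  # symmetrized lagged covariance
    evals, evecs = np.linalg.eigh(c0)
    W = evecs / np.sqrt(evals)  # whitening transform, W.T @ c0 @ W = I
    _, tvecs = np.linalg.eigh(W.T @ ct @ W)
    v = W @ tvecs[:, -1]
    return v / np.linalg.norm(v)


# Usage: PCA picks the noisy coordinate, tICA picks the slow one.
X = toy_trajectory()
print("PCA :", np.round(pca_first_direction(X), 3))
print("tICA:", np.round(tica_first_direction(X, lag=10), 3))
```

The sign of each direction is arbitrary, so only the magnitudes of the components are meaningful when comparing the two results.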
Affiliation(s)
- Haochuan Chen
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
- Christophe Chipot
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
- Theoretical and Computational Biophysics Group, Beckman Institute, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA
|
25
|
Baima J, Goryaeva AM, Swinburne TD, Maillet JB, Nastar M, Marinica MC. Capabilities and limits of autoencoders for extracting collective variables in atomistic materials science. Phys Chem Chem Phys 2022; 24:23152-23163. [PMID: 36128869 DOI: 10.1039/d2cp01917e] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/21/2022]
Abstract
Free energy calculations in materials science are routinely hindered by the need to provide reaction coordinates that can meaningfully partition atomic configuration space, a prerequisite for most enhanced sampling approaches. Recent studies on molecular systems have highlighted the possibility of constructing appropriate collective variables directly from atomic motions through deep learning techniques. Here we extend this class of approaches to condensed matter problems, for which we encode the finite temperature collective variable by an iterative procedure starting from 0 K features of the energy landscape i.e. activation events or migration mechanisms given by a minimum - saddle point - minimum sequence. We employ the autoencoder neural networks in order to build a scalar collective variable for use with the adaptive biasing force method. Particular attention is given to design choices required for application to crystalline systems with defects, including the filtering of thermal motions which otherwise dominate the autoencoder input. The machine-learning workflow is tested on body-centered cubic iron and its common defects, such as small vacancy or self-interstitial clusters and screw dislocations. For localized defects, excellent collective variables as well as derivatives, necessary for free energy sampling, are systematically obtained. However, the approach has a limited accuracy when dealing with reaction coordinates that include atomic displacements of a magnitude comparable to thermal motions, e.g. the ones produced by the long-range elastic field of dislocations. We then combine the extraction of collective variables by autoencoders with an adaptive biasing force free energy method based on Bayesian inference. Using a vacancy migration as an example, we demonstrate the performance of coupling these two approaches for simultaneous discovery of reaction coordinates and free energy sampling in systems with localized defects.
Affiliation(s)
- Jacopo Baima
  - Université Paris-Saclay, CEA, Service de Recherches de Métallurgie Physique, Gif-sur-Yvette 91191, France.
- Alexandra M Goryaeva
  - Université Paris-Saclay, CEA, Service de Recherches de Métallurgie Physique, Gif-sur-Yvette 91191, France.
- Thomas D Swinburne
  - Aix-Marseille Université, CNRS, CINaM UMR 7325, Campus de Luminy, 13288 Marseille, France
- Maylise Nastar
  - Université Paris-Saclay, CEA, Service de Recherches de Métallurgie Physique, Gif-sur-Yvette 91191, France.
- Mihai-Cosmin Marinica
  - Université Paris-Saclay, CEA, Service de Recherches de Métallurgie Physique, Gif-sur-Yvette 91191, France.
|
26
|
Bhakat S. Collective variable discovery in the age of machine learning: reality, hype and everything in between. RSC Adv 2022; 12:25010-25024. [PMID: 36199882 PMCID: PMC9437778 DOI: 10.1039/d2ra03660f] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 06/13/2022] [Accepted: 08/20/2022] [Indexed: 11/21/2022]
Abstract
Understanding the kinetic and thermodynamic profiles of biomolecules is necessary to understand their functional roles, which has a major impact on mechanism-driven drug discovery. Molecular dynamics simulation has been routinely used to understand conformational dynamics and molecular recognition in biomolecules. Statistical analysis of high-dimensional spatiotemporal data generated from molecular dynamics simulation requires identification of a few low-dimensional variables which can describe the essential dynamics of a system without significant loss of information. In physical chemistry, these low-dimensional variables are often called collective variables. Collective variables are used to generate reduced representations of free energy surfaces and calculate transition probabilities between different metastable basins. However, the choice of collective variables is not trivial for complex systems. Collective variables range from geometric criteria such as distances and dihedral angles to abstract ones such as weighted linear combinations of multiple geometric variables. The advent of machine learning algorithms led to increasing use of abstract collective variables to represent biomolecular dynamics. In this review, I will highlight several nuances of commonly used collective variables ranging from geometric to abstract ones. Further, I will put forward some cases where machine-learning-based collective variables were used to describe simple systems which in principle could have been described by geometric ones. Finally, I will put forward my thoughts on artificial general intelligence and how it can be used to discover and predict collective variables from spatiotemporal data generated by molecular dynamics simulations.
Affiliation(s)
- Soumendranath Bhakat
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Pennsylvania 19104-6059, USA
|
27
|
Monroe JI, Shen VK. Systematic Control of Collective Variables Learned from Variational Autoencoders. J Chem Phys 2022; 157:094116. [DOI: 10.1063/5.0105120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Variational autoencoders (VAEs) are rapidly gaining popularity within molecular simulation for discovering low-dimensional, or latent, representations, which are critical for both analyzing and accelerating simulations. However, it remains unclear how the information a VAE learns is connected to its probabilistic structure, and, in turn, its loss function. Previous studies have focused on feature engineering, ad hoc modifications to loss functions, or adjustment of the prior to enforce desirable latent space properties. By applying effectively arbitrarily flexible priors via normalizing flows, we focus instead on how adjusting the structure of the decoding model impacts the learned latent coordinate. We systematically adjust the power and flexibility of the decoding distribution, observing that this has a significant impact on the structure of the latent space as measured by a suite of metrics developed in this work. By also varying weights on separate terms within each VAE loss function, we show that the level of detail encoded can be further tuned. This provides practical guidance for utilizing VAEs to extract varying resolutions of low-dimensional information from molecular dynamics and Monte Carlo simulations.
|
28
|
Li Y, Gong H. Identifying a Feasible Transition Pathway between Two Conformational States for a Protein. J Chem Theory Comput 2022; 18:4529-4543. [PMID: 35723447 DOI: 10.1021/acs.jctc.2c00390] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
Abstract
Proteins usually need to transition between different conformational states to fulfill their biological functions. In the mechanistic study of such transition processes by molecular dynamics simulations, identification of the minimum free energy path (MFEP) can substantially reduce the sampling space, thus enabling rigorous thermodynamic evaluation of the process. Conventionally, the MFEP is derived by iterative local optimization from an initial path, which is typically generated by simple brute-force techniques like targeted molecular dynamics (tMD). Therefore, the quality of the initial path determines the success of MFEP estimation. In this work, we propose a method to improve derivation of the initial path. Through iterative relaxation-biasing simulations in a bidirectional manner, this method can construct a feasible transition pathway connecting two known states for a protein. Evaluation on small, fast-folding proteins against long equilibrium trajectories supports the good sampling efficiency of our method. When applied to larger proteins including the catalytic domain of human c-Src kinase as well as the converter domain of myosin VI, the paths generated by our method deviate significantly from those computed with the generic tMD approach. More importantly, free energy profiles and intermediate states obtained from our paths exhibit remarkable improvements over those from tMD paths with respect to both physical rationality and consistency with a priori knowledge.
Affiliation(s)
- Yao Li
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
- Haipeng Gong
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
|
29
|
Gupta A, Dey S, Hicks A, Zhou HX. Artificial intelligence guided conformational mining of intrinsically disordered proteins. Commun Biol 2022; 5:610. [PMID: 35725761 PMCID: PMC9209487 DOI: 10.1038/s42003-022-03562-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 12/14/2021] [Accepted: 06/07/2022] [Indexed: 12/29/2022]
Abstract
Artificial intelligence recently achieved the breakthrough of predicting the three-dimensional structures of proteins. The next frontier is presented by intrinsically disordered proteins (IDPs), which, representing 30% to 50% of proteomes, readily access vast conformational space. Molecular dynamics (MD) simulations are promising in sampling IDP conformations, but only at extremely high computational cost. Here, we developed generative autoencoders that learn from short MD simulations and generate full conformational ensembles. An encoder represents IDP conformations as vectors in a reduced-dimensional latent space. The mean vector and covariance matrix of the training dataset are calculated to define a multivariate Gaussian distribution, from which vectors are sampled and fed to a decoder to generate new conformations. The ensembles of generated conformations cover those sampled by long MD simulations and are validated by small-angle X-ray scattering profile and NMR chemical shifts. This work illustrates the vast potential of artificial intelligence in conformational mining of IDPs.
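The sampling step described in this abstract (fit a multivariate Gaussian to the latent vectors of the training set, then draw new latent vectors to feed to the decoder) can be sketched as follows. This is a minimal pure-Python illustration and not the authors' code: the function names, the toy 2-D latent vectors, and the use of a Cholesky factor for sampling are assumptions, and the trained encoder and decoder are omitted.

```python
import math
import random


def mean_and_covariance(vectors):
    """Sample mean and unbiased covariance of a list of equal-length vectors."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(d)]
    cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix (matrix must be positive definite)."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_latent(mean, chol, rng):
    """Draw one vector from N(mean, L L^T) as mean + L z, with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [mean[i] + sum(chol[i][k] * z[k] for k in range(i + 1))
            for i in range(len(mean))]


# Hypothetical latent vectors, standing in for encoder outputs on MD frames.
latent_train = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
mean, cov = mean_and_covariance(latent_train)
chol = cholesky(cov)
rng = random.Random(0)
new_latents = [sample_latent(mean, chol, rng) for _ in range(5)]
# Each vector in new_latents would then be passed to the trained decoder.
```

A realistic latent space would have more dimensions and many more training vectors; the covariance must be positive definite for the Cholesky step to succeed.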
Affiliation(s)
- Aayush Gupta
  - Department of Chemistry, University of Illinois at Chicago, Chicago, IL, 60607, USA
- Souvik Dey
  - Department of Chemistry, University of Illinois at Chicago, Chicago, IL, 60607, USA
- Alan Hicks
  - Department of Chemistry, University of Illinois at Chicago, Chicago, IL, 60607, USA
- Huan-Xiang Zhou
  - Department of Chemistry, University of Illinois at Chicago, Chicago, IL, 60607, USA.
  - Department of Physics, University of Illinois at Chicago, Chicago, IL, 60607, USA.
|
30
|
Monroe JI, Shen VK. Learning Efficient, Collective Monte Carlo Moves with Variational Autoencoders. J Chem Theory Comput 2022; 18:3622-3636. [PMID: 35613327 PMCID: PMC11210279 DOI: 10.1021/acs.jctc.2c00110] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
Discovering meaningful collective variables for enhancing sampling, via applied biasing potentials or tailored MC move sets, remains a major challenge within molecular simulation. While recent studies identifying collective variables with variational autoencoders (VAEs) have focused on the encoding and latent space discovered by a VAE, the impact of the decoding and its ability to act as a generative model remains unexplored. We demonstrate how VAEs may be used to learn (on-the-fly and with minimal human intervention) highly efficient, collective Monte Carlo moves that accelerate sampling along the learned collective variable. In contrast to many machine learning-based efforts to bias sampling and generate novel configurations, our methods result in exact sampling in the ensemble of interest and do not require reweighting. In fact, we show that the acceptance rates of our moves approach unity for a perfect VAE model. While this is never observed in practice, VAE-based Monte Carlo moves still enhance sampling of new configurations. We demonstrate, however, that the form of the encoding and decoding distributions, in particular the extent to which the decoder reflects the underlying physics, greatly impacts the performance of the trained VAE.
Affiliation(s)
- Jacob I Monroe
  - Chemical Sciences Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899-8320, United States
- Vincent K Shen
  - Chemical Sciences Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899-8320, United States
|
31
|
Ghorbani M, Prasad S, Klauda JB, Brooks BR. GraphVAMPNet, using graph neural networks and variational approach to Markov processes for dynamical modeling of biomolecules. J Chem Phys 2022; 156:184103. [PMID: 35568532 PMCID: PMC9094994 DOI: 10.1063/5.0085607] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 01/18/2022] [Accepted: 04/22/2022] [Indexed: 11/14/2022]
Abstract
Finding a low dimensional representation of data from long-timescale trajectories of biomolecular processes, such as protein folding or ligand-receptor binding, is of fundamental importance, and kinetic models, such as Markov modeling, have proven useful in describing the kinetics of these systems. Recently, an unsupervised machine learning technique called VAMPNet was introduced to learn the low dimensional representation and the linear dynamical model in an end-to-end manner. VAMPNet is based on the variational approach for Markov processes and relies on neural networks to learn the coarse-grained dynamics. In this paper, we combine VAMPNet and graph neural networks to generate an end-to-end framework to efficiently learn high-level dynamics and metastable states from the long-timescale molecular dynamics trajectories. This method bears the advantages of graph representation learning and uses graph message passing operations to generate an embedding for each datapoint, which is used in the VAMPNet to generate a coarse-grained dynamical model. This type of molecular representation results in a higher resolution and a more interpretable Markov model than the standard VAMPNet, enabling a more detailed kinetic study of the biomolecular processes. Our GraphVAMPNet approach is also enhanced with an attention mechanism to find the important residues for classification into different metastable states.
Affiliation(s)
- Samarjeet Prasad
  - Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20824, USA
- Jeffery B. Klauda
  - Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, Maryland 20742, USA
- Bernard R. Brooks
  - Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20824, USA
|
32
|
Integration of machine learning with computational structural biology of plants. Biochem J 2022; 479:921-928. [PMID: 35484946 DOI: 10.1042/bcj20200942] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/17/2021] [Revised: 04/01/2022] [Accepted: 04/06/2022] [Indexed: 11/17/2022]
Abstract
Computational structural biology of proteins has developed rapidly in recent decades with the development of new computational tools and the advancement of computing hardware. However, while these techniques have widely been used to make advancements in human medicine, these methods have seen less utilization in the plant sciences. In the last several years, machine learning methods have gained popularity in computational structural biology. These methods have enabled the development of new tools which are able to address the major challenges that have hampered the wide adoption of the computational structural biology of plants. This perspective examines the remaining challenges in computational structural biology and how the development of machine learning techniques enables more in-depth computational structural biology of plants.
|
33
|
Louwerse MD, Sivak D. Multidimensional minimum-work control of a 2D Ising model. J Chem Phys 2022; 156:194108. [DOI: 10.1063/5.0086079] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022]
Abstract
A system's configurational state can be manipulated using dynamic variation of control parameters, such as temperature, pressure, or magnetic field; for finite-duration driving, excess work is required above the equilibrium free-energy change. Minimum-work protocols in multidimensional control-parameter space have potential to significantly reduce work relative to one-dimensional control. By numerically minimizing a linear-response approximation to the excess work, we design protocols in control-parameter spaces of a 2D Ising model that efficiently drive the system from the all-down to all-up configuration. We find that such designed multidimensional protocols take advantage of more flexible control to avoid control-parameter regions of high system resistance, heterogeneously input and extract work to make use of system relaxation, and flatten the energy landscape, making accessible many configurations that would otherwise have prohibitively high energy and thus decreasing spin correlations. Relative to one-dimensional protocols, this speeds up the rate-limiting spin-inversion reaction, thereby keeping the system significantly closer to equilibrium for a wide range of protocol durations, and significantly reducing resistance and hence work.
|
34
|
|
35
|
Cignoni E, Slama V, Cupellini L, Mennucci B. The atomistic modeling of light-harvesting complexes from the physical models to the computational protocol. J Chem Phys 2022; 156:120901. [DOI: 10.1063/5.0086275] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/21/2022]
Abstract
The function of light-harvesting complexes is determined by a complex network of dynamic interactions among all the different components: the aggregate of pigments, the protein, and the surrounding environment. Complete and reliable predictions on these types of composite systems can be only achieved with an atomistic description. In the last few decades, there have been important advances in the atomistic modeling of light-harvesting complexes. These advances have involved both the completeness of the physical models and the accuracy and effectiveness of the computational protocols. In this Perspective, we present an overview of the main theoretical and computational breakthroughs attained so far in the field, with particular focus on the important role played by the protein and its dynamics. We then discuss the open problems in their accurate modeling that still need to be addressed. To illustrate an effective computational workflow for the modeling of light harvesting complexes, we take as an example the plant antenna complex CP29 and its H111N mutant.
Affiliation(s)
- Edoardo Cignoni
  - Dipartimento di Chimica e Chimica Industriale, University of Pisa, via G. Moruzzi 13, 56124 Pisa, Italy
- Vladislav Slama
  - Dipartimento di Chimica e Chimica Industriale, University of Pisa, via G. Moruzzi 13, 56124 Pisa, Italy
- Lorenzo Cupellini
  - Dipartimento di Chimica e Chimica Industriale, University of Pisa, via G. Moruzzi 13, 56124 Pisa, Italy
- Benedetta Mennucci
  - Dipartimento di Chimica e Chimica Industriale, University of Pisa, via G. Moruzzi 13, 56124 Pisa, Italy
|
36
|
Baltrukevich H, Podlewska S. From Data to Knowledge: Systematic Review of Tools for Automatic Analysis of Molecular Dynamics Output. Front Pharmacol 2022; 13:844293. [PMID: 35359865 PMCID: PMC8960308 DOI: 10.3389/fphar.2022.844293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2021] [Accepted: 01/26/2022] [Indexed: 12/02/2022]
Abstract
The growing number of available crystal structures, together with the increase in computational power available for computer-aided drug design, has led to intensive use of structure-based drug design tools in drug development pipelines. Docking and molecular dynamics simulations, key representatives of the structure-based approaches, provide detailed information about the potential interaction of a ligand with a target receptor. However, they require a three-dimensional structure of a protein and relatively large computational resources. As both docking and molecular dynamics are now used much more extensively, the amount of data output from these procedures is also growing. Accordingly, a growing number of approaches facilitate the analysis and interpretation of the results of structure-based tools. In this review, we comprehensively summarize approaches for handling molecular dynamics simulation output, covering both statistical and machine-learning-based tools, as well as various forms of depiction of molecular dynamics output.
Affiliation(s)
- Hanna Baltrukevich
  - Maj Institute of Pharmacology, Polish Academy of Sciences, Kraków, Poland
  - Faculty of Pharmacy, Chair of Technology and Biotechnology of Medical Remedies, Jagiellonian University Medical College in Krakow, Kraków, Poland
- Sabina Podlewska
  - Maj Institute of Pharmacology, Polish Academy of Sciences, Kraków, Poland
|
37
|
Hoffmann M, Scherer M, Hempel T, Mardt A, de Silva B, Husic BE, Klus S, Wu H, Kutz N, Brunton SL, Noé F. Deeptime: a Python library for machine learning dynamical models from time series data. Machine Learning: Science and Technology 2022. [DOI: 10.1088/2632-2153/ac3de0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/31/2022]
Abstract
Generation and analysis of time-series data is relevant to many quantitative fields ranging from economics to fluid mechanics. In the physical sciences, structures such as metastable and coherent sets, slow relaxation processes, collective variables, dominant transition pathways or manifolds and channels of probability flow can be of great importance for understanding and characterizing the kinetic, thermodynamic and mechanistic properties of the system. Deeptime is a general purpose Python library offering various tools to estimate dynamical models based on time-series data including conventional linear learning methods, such as Markov state models (MSMs), Hidden Markov Models and Koopman models, as well as kernel and deep learning approaches such as VAMPnets and deep MSMs. The library is largely compatible with scikit-learn, having a range of Estimator classes for these different models, but in contrast to scikit-learn also provides deep Model classes, e.g. in the case of an MSM, which provide a multitude of analysis methods to compute interesting thermodynamic, kinetic and dynamical quantities, such as free energies, relaxation times and transition paths. The library is designed for ease of use but also easily maintainable and extensible code. In this paper we introduce the main features and structure of the deeptime software. Deeptime can be found under https://deeptime-ml.github.io/.
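To make the idea of estimating a Markov state model from time-series data concrete, the sketch below counts transitions at a fixed lag time in a discrete trajectory and row-normalizes the counts. It is plain Python and deliberately does not use or imitate deeptime's API; the function name, the toy trajectory, and the simple non-reversible estimator are assumptions chosen for illustration.

```python
def estimate_transition_matrix(dtraj, n_states, lagtime=1):
    """Count transitions at the given lag (sliding window) and row-normalize."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(dtraj[:-lagtime], dtraj[lagtime:]):
        counts[i][j] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that never appears as a starting point gets an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return counts, matrix


# Hypothetical discrete trajectory over two metastable states (0 and 1).
dtraj = [0, 0, 1, 1, 0, 0, 1, 1, 0]
counts, transition_matrix = estimate_transition_matrix(dtraj, n_states=2)
```

Each row of `transition_matrix` gives the estimated probability of moving from one state to every state after one lag time. Dedicated libraries add reversible estimators, uncertainty quantification, and the analysis tools mentioned in the abstract.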
|
38
|
Belkacemi Z, Gkeka P, Lelièvre T, Stoltz G. Chasing Collective Variables Using Autoencoders and Biased Trajectories. J Chem Theory Comput 2021; 18:59-78. [PMID: 34965117 DOI: 10.1021/acs.jctc.1c00415] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 12/15/2022]
Abstract
Free energy biasing methods have proven to be powerful tools to accelerate the simulation of important conformational changes of molecules by modifying the sampling measure. However, most of these methods rely on the prior knowledge of low-dimensional slow degrees of freedom, i.e., collective variables (CVs). Alternatively, such CVs can be identified using machine learning (ML) and dimensionality reduction algorithms. In this context, approaches where the CVs are learned in an iterative way using adaptive biasing have been proposed: at each iteration, the learned CV is used to perform free energy adaptive biasing to generate new data and learn a new CV. In this paper, we introduce a new iterative method involving CV learning with autoencoders: Free Energy Biasing and Iterative Learning with AutoEncoders (FEBILAE). Our method includes a reweighting scheme to ensure that the learning model optimizes the same loss at each iteration and achieves CV convergence. Using the alanine dipeptide system and the solvated chignolin mini-protein system as examples, we present results of our algorithm using the extended adaptive biasing force as the free energy adaptive biasing method.
Affiliation(s)
- Zineb Belkacemi
  - CERMICS, Ecole des Ponts ParisTech, 77455 Marne-la-Vallée, France
  - Structure Design and Informatics, Sanofi 1371 R&D, 91385 Chilly-Mazarin, France
- Paraskevi Gkeka
  - Structure Design and Informatics, Sanofi 1371 R&D, 91385 Chilly-Mazarin, France
- Tony Lelièvre
  - CERMICS, Ecole des Ponts ParisTech, 77455 Marne-la-Vallée, France
  - MATHERIALS Team-Project, Inria, 75589 Paris, France
- Gabriel Stoltz
  - CERMICS, Ecole des Ponts ParisTech, 77455 Marne-la-Vallée, France
  - MATHERIALS Team-Project, Inria, 75589 Paris, France
|
39
|
Zhu J, Jiang M, Liu Z. Fault Detection and Diagnosis in Industrial Processes with Variational Autoencoder: A Comprehensive Study. Sensors (Basel, Switzerland) 2021; 22:227. [PMID: 35009769 PMCID: PMC8749793 DOI: 10.3390/s22010227] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/22/2021] [Revised: 12/13/2021] [Accepted: 12/27/2021] [Indexed: 06/14/2023]
Abstract
This work considers industrial process monitoring using a variational autoencoder (VAE). As a powerful deep generative model, the variational autoencoder and its variants have become popular for process monitoring. However, its monitoring ability, especially its fault diagnosis ability, has not been well investigated. In this paper, the process modeling and monitoring capabilities of several VAE variants are comprehensively studied. First, fault detection schemes are defined in three distinct ways, considering latent, residual, and the combined domains. Afterwards, to conduct the fault diagnosis, we first define the deep contribution plot, and then a deep reconstruction-based contribution diagram is proposed for deep domains under the fault propagation mechanism. In a case study, the performance of the process monitoring capability of four deep VAE models, namely, the static VAE model, the dynamic VAE model, and the recurrent VAE models (LSTM-VAE and GRU-VAE), has been comparatively evaluated on the industrial benchmark Tennessee Eastman process. Results show that recurrent VAEs with a deep reconstruction-based diagnosis mechanism are recommended for industrial process monitoring tasks.
Affiliation(s)
- Jinlin Zhu
  - State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi 214122, China
  - School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Muyun Jiang
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
- Zhong Liu
  - Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi 214122, China
|
40
|
Vlachas PR, Zavadlav J, Praprotnik M, Koumoutsakos P. Accelerated Simulations of Molecular Systems through Learning of Effective Dynamics. J Chem Theory Comput 2021; 18:538-549. [PMID: 34890204 DOI: 10.1021/acs.jctc.1c00809] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022]
Abstract
Simulations are vital for understanding and predicting the evolution of complex molecular systems. However, despite advances in algorithms and special purpose hardware, accessing the time scales necessary to capture the structural evolution of biomolecules remains a daunting task. In this work, we present a novel framework to advance simulation time scales by up to 3 orders of magnitude by learning the effective dynamics (LED) of molecular systems. LED augments the equation-free methodology by employing a probabilistic mapping between coarse and fine scales using mixture density network (MDN) autoencoders and evolves the non-Markovian latent dynamics using long short-term memory MDNs. We demonstrate the effectiveness of LED in the Müller-Brown potential, the Trp cage protein, and the alanine dipeptide. LED identifies explainable reduced-order representations, i.e., collective variables, and can generate, at any instant, all-atom molecular trajectories consistent with the collective variables. We believe that the proposed framework provides a dramatic increase to simulation capabilities and opens new horizons for the effective modeling of complex molecular systems.
Affiliation(s)
- Pantelis R Vlachas
  - Computational Science and Engineering Laboratory, ETH Zurich, CH-8092, Switzerland
- Julija Zavadlav
  - Professorship of Multiscale Modeling of Fluid Materials, TUM School of Engineering and Design, Technical University of Munich, 85748 Garching bei München, Germany
  - Munich Data Science Institute, Technical University of Munich, 85748 Munich, Germany
- Matej Praprotnik
  - Laboratory for Molecular Modeling, National Institute of Chemistry, SI-1001 Ljubljana, Slovenia
  - Department of Physics, Faculty of Mathematics and Physics, University of Ljubljana, SI-1000 Ljubljana, Slovenia
- Petros Koumoutsakos
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, United States
|
41
|
A quantitative paradigm for water-assisted proton transport through proteins and other confined spaces. Proc Natl Acad Sci U S A 2021; 118:2113141118. [PMID: 34857630 DOI: 10.1073/pnas.2113141118] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Accepted: 10/15/2021] [Indexed: 11/18/2022]
Abstract
Water-assisted proton transport through confined spaces influences many phenomena in biomolecular and nanomaterial systems. In such cases, the water molecules that fluctuate in the confined pathways provide the environment and the medium for the hydrated excess proton migration via Grotthuss shuttling. However, a definitive collective variable (CV) that accurately couples the hydration and the connectivity of the proton wire with the proton translocation has remained elusive. To address this important challenge, and thus to define a quantitative paradigm for facile proton transport in confined spaces, a CV is derived in this work from graph theory, which is verified to accurately describe water wire formation and breakage coupled to the proton translocation in carbon nanotubes and the Cl⁻/H⁺ antiporter protein, ClC-ec1. Significant alterations in the conformations and thermodynamics of water wires are uncovered after introducing an excess proton into them. Large barriers in the proton translocation free-energy profiles are found when water wires are defined to be disconnected according to the new CV, even though the pertinent confined space is still reasonably well hydrated and, by the simple measure of the mere existence of a water structure, the proton transport would have been predicted to be facile via that oversimplified measure. In this paradigm, however, the simple presence of water is not sufficient for inferring proton translocation, since an excess proton itself is able to drive hydration, and additionally, the water molecules themselves must be adequately connected to facilitate any successful proton transport.
|
42
|
Weber JK, Morrone JA, Bagchi S, Pabon JDE, Kang SG, Zhang L, Cornell WD. Simplified, interpretable graph convolutional neural networks for small molecule activity prediction. J Comput Aided Mol Des 2021; 36:391-404. [PMID: 34817762 PMCID: PMC9325818 DOI: 10.1007/s10822-021-00421-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/09/2021] [Accepted: 09/24/2021] [Indexed: 12/11/2022]
Abstract
We here present a streamlined, explainable graph convolutional neural network (gCNN) architecture for small molecule activity prediction. We first conduct a hyperparameter optimization across nearly 800 protein targets that produces a simplified gCNN QSAR architecture, and we observe that such a model can yield performance improvements over both standard gCNN and RF methods on difficult-to-classify test sets. Additionally, we discuss how reductions in convolutional layer dimensions potentially speak to the “anatomical” needs of gCNNs with respect to radial coarse graining of molecular substructure. We augment this simplified architecture with saliency map technology that highlights molecular substructures relevant to activity, and we perform saliency analysis on nearly 100 data-rich protein targets. We show that resultant substructural clusters are useful visualization tools for understanding substructure-activity relationships. We go on to highlight connections between our models’ saliency predictions and observations made in the medicinal chemistry literature, focusing on four case studies of past lead finding and lead optimization campaigns.
Affiliation(s)
- Jeffrey K Weber
  - IBM Thomas J Watson Research Center, Yorktown Heights, NY, USA
- Sugato Bagchi
  - IBM Thomas J Watson Research Center, Yorktown Heights, NY, USA
- Seung-Gu Kang
  - IBM Thomas J Watson Research Center, Yorktown Heights, NY, USA
- Leili Zhang
  - IBM Thomas J Watson Research Center, Yorktown Heights, NY, USA
- Wendy D Cornell
  - IBM Thomas J Watson Research Center, Yorktown Heights, NY, USA
43
Bonati L, Piccini G, Parrinello M. Deep learning the slow modes for rare events sampling. Proc Natl Acad Sci U S A 2021; 118:e2113533118. [PMID: 34706940 PMCID: PMC8612227 DOI: 10.1073/pnas.2113533118] [Citation(s) in RCA: 90] [Impact Index Per Article: 30.0] [Accepted: 09/19/2021] [Indexed: 02/08/2023] Open
Abstract
The development of enhanced sampling methods has greatly extended the scope of atomistic simulations, allowing long-time phenomena to be studied with accessible computational resources. Many such methods rely on the identification of an appropriate set of collective variables. These are meant to describe the system's modes that most slowly approach equilibrium under the action of the sampling algorithm. Once identified, the equilibration of these modes is accelerated by the enhanced sampling method of choice. An attractive way of determining the collective variables is to relate them to the eigenfunctions and eigenvalues of the transfer operator. Unfortunately, this requires knowing the long-term dynamics of the system beforehand, which is generally not available. However, we have recently shown that it is indeed possible to determine efficient collective variables starting from biased simulations. In this paper, we bring the power of machine learning and the efficiency of the recently developed on-the-fly probability-enhanced sampling method to bear on this approach. The result is a powerful and robust algorithm that, given an initial enhanced sampling simulation performed with trial collective variables or generalized ensembles, extracts transfer operator eigenfunctions using a neural network ansatz and then accelerates them to promote sampling of rare events. To illustrate the generality of this approach, we apply it to several systems, ranging from the conformational transition of a small molecule to the folding of a miniprotein and the study of materials crystallization.
Affiliation(s)
- Luigi Bonati
  - Department of Physics, Eidgenössische Technische Hochschule (ETH) Zürich, 8092 Zürich, Switzerland
  - Atomistic Simulations, Italian Institute of Technology, 16163 Genova, Italy
- Michele Parrinello
  - Atomistic Simulations, Italian Institute of Technology, 16163 Genova, Italy
44
Moritsugu K. Multiscale Enhanced Sampling Using Machine Learning. Life (Basel) 2021; 11:life11101076. [PMID: 34685447 PMCID: PMC8540671 DOI: 10.3390/life11101076] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/24/2021] [Revised: 10/06/2021] [Accepted: 10/08/2021] [Indexed: 01/18/2023] Open
Abstract
Multiscale enhanced sampling (MSES) allows for an enhanced sampling of all-atom protein structures by coupling with the accelerated dynamics of the associated coarse-grained (CG) model. In this paper, we propose an MSES extension to replace the CG model with the dynamics on the reduced subspace generated by a machine learning approach, the variational autoencoder (VAE). The molecular dynamics (MD) trajectories of the ribose-binding protein (RBP) in both the closed and open forms were used as the input by extracting the inter-residue distances as the structural features in order to train the VAE model, allowing the encoded latent layer to characterize the difference in the structural dynamics of the closed and open forms. The interpolated data characterizing the RBP structural change in between the closed and open forms were thus efficiently generated in the low-dimensional latent space of the VAE, which was then decoded into the time-series data of the inter-residue distances and was useful for driving the structural sampling at an atomistic resolution via the MSES scheme. The free energy surfaces on the latent space demonstrated the refinement of the generated data that had a single basin into the simulated data containing two closed and open basins, thus illustrating the usefulness of the MD simulation together with the molecular mechanics force field in recovering the correct structural ensemble.
Affiliation(s)
- Kei Moritsugu
  - Graduate School of Medical Life Science, Yokohama City University, Yokohama 230-0045, Japan
45
Rizzi A, Carloni P, Parrinello M. Targeted Free Energy Perturbation Revisited: Accurate Free Energies from Mapped Reference Potentials. J Phys Chem Lett 2021; 12:9449-9454. [PMID: 34555284 DOI: 10.1021/acs.jpclett.1c02135] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 06/13/2023]
Abstract
We present an approach that extends the theory of targeted free energy perturbation (TFEP) to calculate free energy differences and free energy surfaces at an accurate quantum mechanical level of theory from a cheaper reference potential. The convergence is accelerated by a mapping function that increases the overlap between the target and the reference distributions. Building on recent work, we show that this map can be learned with a normalizing flow neural network, without requiring simulations with the expensive target potential but only a small number of single-point calculations, and, crucially, avoiding the systematic error that was found previously. We validate the method by numerically evaluating the free energy difference in a system with a double-well potential and by describing the free energy landscape of a simple chemical reaction in the gas phase.
Affiliation(s)
- Andrea Rizzi
  - Computational Biomedicine, Institute of Advanced Simulations IAS-5/Institute for Neuroscience and Medicine INM-9, Forschungszentrum Jülich GmbH, Jülich 52428, Germany
  - Atomistic Simulations, Italian Institute of Technology, Via Morego 30, Genova 16163, Italy
- Paolo Carloni
  - Computational Biomedicine, Institute of Advanced Simulations IAS-5/Institute for Neuroscience and Medicine INM-9, Forschungszentrum Jülich GmbH, Jülich 52428, Germany
  - Molecular Neuroscience and Neuroimaging (INM-11), Forschungszentrum Jülich GmbH, Jülich 52428, Germany
  - Department of Physics and Universitätsklinikum, RWTH Aachen University, Aachen 52074, Germany
- Michele Parrinello
  - Atomistic Simulations, Italian Institute of Technology, Via Morego 30, Genova 16163, Italy
46
Bandyopadhyay S, Mondal J. A deep autoencoder framework for discovery of metastable ensembles in biomacromolecules. J Chem Phys 2021; 155:114106. [PMID: 34551528 DOI: 10.1063/5.0059965] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 01/01/2023] Open
Abstract
Biomacromolecules manifest dynamic conformational fluctuation and involve mutual interconversion among metastable states. A robust mapping of their conformational landscape often requires the low-dimensional projection of the conformational ensemble along optimized collective variables (CVs). However, the traditional choice for the CV is often limited by user-intuition and prior knowledge about the system, and this lacks a rigorous assessment of their optimality over other candidate CVs. To address this issue, we propose an approach in which we first choose the possible combinations of inter-residue Cα-distances within a given macromolecule as a set of input CVs. Subsequently, we derive a non-linear combination of latent space embedded CVs via auto-encoding the unbiased molecular dynamics simulation trajectories within the framework of the feed-forward neural network. We demonstrate the ability of the derived latent space variables in elucidating the conformational landscape in four hierarchically complex systems. The latent space CVs identify key metastable states of a bead-in-a-spring polymer. The combination of the adopted dimensional reduction technique with a Markov state model, built on the derived latent space, reveals multiple spatially and kinetically well-resolved metastable conformations for GB1 β-hairpin. A quantitative comparison based on the variational approach-based scoring of the auto-encoder-derived latent space CVs with the ones obtained via independent component analysis (principal component analysis or time-structured independent component analysis) confirms the optimality of the former. As a practical application, the auto-encoder-derived CVs were found to predict the reinforced folding of a Trp-cage mini-protein in aqueous osmolyte solution. Finally, the protocol was able to decipher the conformational heterogeneities involved in a complex metalloenzyme, namely, cytochrome P450.
Affiliation(s)
- Satyabrata Bandyopadhyay
  - Tata Institute of Fundamental Research, Center for Interdisciplinary Sciences, Hyderabad 500046, India
- Jagannath Mondal
  - Tata Institute of Fundamental Research, Center for Interdisciplinary Sciences, Hyderabad 500046, India
47
Abstract
We extend the nonparametric framework of reaction coordinate optimization to nonequilibrium ensembles of (short) trajectories. For example, we show how, starting from such an ensemble, one can obtain an equilibrium free-energy profile along the committor, which can be used to determine important properties of the dynamics exactly. A new adaptive sampling approach, the transition-state ensemble enrichment, is suggested, which samples the configuration space by "growing" committor segments toward each other starting from the boundary states. This framework is suggested as a general tool, alternative to the Markov state models, for a rigorous and accurate analysis of simulations of large biomolecular systems, as it has the following attractive properties. It is immune to the curse of dimensionality, does not require system-specific information, can approximate arbitrary reaction coordinates with high accuracy, and has sensitive and rigorous criteria to test optimality and convergence. The approaches are illustrated on a 50-dimensional model system and a realistic protein folding trajectory.
Affiliation(s)
- Sergei V Krivov
  - Astbury Center for Structural Molecular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, U.K.
48
Glielmo A, Husic BE, Rodriguez A, Clementi C, Noé F, Laio A. Unsupervised Learning Methods for Molecular Simulation Data. Chem Rev 2021; 121:9722-9758. [PMID: 33945269 PMCID: PMC8391792 DOI: 10.1021/acs.chemrev.0c01195] [Citation(s) in RCA: 116] [Impact Index Per Article: 38.7] [Received: 11/05/2020] [Indexed: 12/21/2022]
Abstract
Unsupervised learning is becoming an essential tool to analyze the increasingly large amounts of data produced by atomistic and molecular simulations, in material science, solid state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms of dimensionality reduction, density estimation, clustering, and kinetic models. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used, or can be used, to analyze molecular simulation data.
Affiliation(s)
- Aldo Glielmo
  - International School for Advanced Studies (SISSA) 34014 Trieste, Italy
- Brooke E. Husic
  - Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Alex Rodriguez
  - International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
- Cecilia Clementi
  - Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
  - Rice University Houston, Department of Chemistry, Houston, Texas 77005, United States
- Frank Noé
  - Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
  - Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
  - Rice University Houston, Department of Chemistry, Houston, Texas 77005, United States
- Alessandro Laio
  - International School for Advanced Studies (SISSA) 34014 Trieste, Italy
  - International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
49
Rydzewski J, Valsson O. Multiscale Reweighted Stochastic Embedding: Deep Learning of Collective Variables for Enhanced Sampling. J Phys Chem A 2021; 125:6286-6302. [PMID: 34213915 PMCID: PMC8389995 DOI: 10.1021/acs.jpca.1c02869] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 03/30/2021] [Revised: 06/17/2021] [Indexed: 12/29/2022]
Abstract
Machine learning methods provide a general framework for automatically finding and representing the essential characteristics of simulation data. This task is particularly crucial in enhanced sampling simulations. There we seek a few generalized degrees of freedom, referred to as collective variables (CVs), to represent and drive the sampling of the free energy landscape. In theory, these CVs should separate different metastable states and correspond to the slow degrees of freedom of the studied physical process. To this aim, we propose a new method that we call multiscale reweighted stochastic embedding (MRSE). Our work builds upon a parametric version of stochastic neighbor embedding. The technique automatically learns CVs that map a high-dimensional feature space to a low-dimensional latent space via a deep neural network. We introduce several new advancements to stochastic neighbor embedding methods that make MRSE especially suitable for enhanced sampling simulations: (1) weight-tempered random sampling as a landmark selection scheme to obtain training data sets that strike a balance between equilibrium representation and capturing important metastable states lying higher in free energy; (2) a multiscale representation of the high-dimensional feature space via a Gaussian mixture probability model; and (3) a reweighting procedure to account for training data from a biased probability distribution. We show that MRSE constructs low-dimensional CVs that can correctly characterize the different metastable states in three model systems: the Müller-Brown potential, alanine dipeptide, and alanine tetrapeptide.
Affiliation(s)
- Jakub Rydzewski
  - Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland
- Omar Valsson
  - Max Planck Institute for Polymer Research, Ackermannweg 10, Mainz D-55128, Germany
50
Computational methods for exploring protein conformations. Biochem Soc Trans 2021; 48:1707-1724. [PMID: 32756904 PMCID: PMC7458412 DOI: 10.1042/bst20200193] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 06/02/2020] [Revised: 07/07/2020] [Accepted: 07/09/2020] [Indexed: 12/13/2022]
Abstract
Proteins are dynamic molecules that can transition between a potentially wide range of structures comprising their conformational ensemble. The nature of these conformations and their relative probabilities are described by a high-dimensional free energy landscape. While computer simulation techniques such as molecular dynamics simulations allow the metastable conformational states and the transitions between them, and thus free energy landscapes, to be characterised, the barriers between states can be high, precluding efficient sampling without substantial computational resources. Over the past decades, a dizzying array of methods have emerged for enhancing conformational sampling, and for projecting the free energy landscape onto a reduced set of dimensions that allow conformational states to be distinguished, known as collective variables (CVs), along which sampling may be directed. Here, a brief description of what biomolecular simulation entails is followed by a more detailed exposition of the nature of CVs and methods for determining these, and, lastly, an overview of the myriad different approaches for enhancing conformational sampling, most of which rely upon CVs, including new advances in both CV determination and conformational sampling due to machine learning.