1
Ruzmetov T, Hung TI, Jonnalagedda SP, Chen SH, Fasihianifard P, Guo Z, Bhanu B, Chang CEA. Sampling Conformational Ensembles of Highly Dynamic Proteins via Generative Deep Learning. bioRxiv 2024:2024.05.05.592587. [PMID: 38979147 PMCID: PMC11230202 DOI: 10.1101/2024.05.05.592587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Proteins are inherently dynamic, and their conformational ensembles are functionally important in biology. Large-scale motions may govern protein structure-function relationship, and numerous transient but stable conformations of intrinsically disordered proteins (IDPs) can play a crucial role in biological function. Investigating conformational ensembles to understand regulations and disease-related aggregations of IDPs is challenging both experimentally and computationally. In this paper we first introduced an unsupervised deep learning-based model, termed Internal Coordinate Net (ICoN), which learns the physical principles of conformational changes from molecular dynamics (MD) simulation data. Second, we selected interpolating data points in the learned latent space that rapidly identify novel synthetic conformations with sophisticated and large-scale sidechains and backbone arrangements. Third, with the highly dynamic amyloid-β 1-42 (Aβ42) monomer, our deep learning model provided a comprehensive sampling of Aβ42's conformational landscape. Analysis of these synthetic conformations revealed conformational clusters that can be used to rationalize experimental findings. Additionally, the method can identify novel conformations with important interactions in atomistic details that are not included in the training data. New synthetic conformations showed distinct sidechain rearrangements that are probed by our EPR and amino acid substitution studies. This approach is highly transferable and can be used for any available data for training. The work also demonstrated the ability for deep learning to utilize learned natural atomistic motions in protein conformation sampling.
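The latent-space interpolation step mentioned in this abstract can be illustrated with a minimal sketch. It assumes two latent vectors have already been produced by some trained encoder; the vectors, their dimensionality, and the name `interpolate_latent` are hypothetical, and the decoder that would turn each point back into a conformation is not shown. This is not the ICoN code.

```python
def interpolate_latent(z_a, z_b, n_points):
    """Return n_points evenly spaced latent vectors strictly between z_a and z_b.

    z_a, z_b: latent vectors (lists of floats) of two training conformations.
    Each returned point would be passed to a decoder (not shown here).
    """
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same dimension")
    points = []
    for k in range(1, n_points + 1):
        t = k / (n_points + 1)  # fraction of the way from z_a to z_b
        points.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return points


# Hypothetical 3-dimensional latent vectors, for illustration only
candidates = interpolate_latent([0.0, 0.0, 0.0], [4.0, 8.0, -4.0], 3)
```

Linear interpolation is only one possible choice; whether the decoded points are physically reasonable depends entirely on the trained model.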
2
Fu H, Bian H, Shao X, Cai W. Collective Variable-Based Enhanced Sampling: From Human Learning to Machine Learning. J Phys Chem Lett 2024; 15:1774-1783. [PMID: 38329095 DOI: 10.1021/acs.jpclett.3c03542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Enhanced-sampling algorithms relying on collective variables (CVs) are extensively employed to study complex (bio)chemical processes that are not amenable to brute-force molecular simulations. The selection of appropriate CVs characterizing the slow movement modes is of paramount importance for reliable and efficient enhanced-sampling simulations. In this Perspective, we first review the application and limitations of CVs obtained from chemical and geometrical intuition. We also introduce path-sampling algorithms, which can identify path-like CVs in a high-dimensional free-energy space. Machine-learning algorithms offer a viable approach to finding suitable CVs by analyzing trajectories from preliminary simulations. We discuss both the performance of machine-learning-derived CVs in enhanced-sampling simulations of experimental models and the challenges involved in applying these CVs to realistic, complex molecular assemblies. Moreover, we provide a prospective view of the potential advancements of machine-learning algorithms for the development of CVs in the field of enhanced-sampling simulations.
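As an illustration of a CV chosen from geometrical intuition, the sketch below computes a dihedral angle from four Cartesian positions, a common choice for peptide backbones. The function name and the test coordinates are hypothetical; in practice an enhanced-sampling engine would evaluate such a CV internally.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond
    v = [x - _dot(b0, b1) * y for x, y in zip(b0, b1)]
    w = [x - _dot(b2, b1) * y for x, y in zip(b2, b1)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Planar cis arrangement: the outer points lie on the same side
cis = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
```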
Affiliation(s)
- Haohao Fu
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Hengwei Bian
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
3
Lichtinger SM, Biggin PC. Tackling Hysteresis in Conformational Sampling: How to Be Forgetful with MEMENTO. J Chem Theory Comput 2023. [PMID: 37285481 DOI: 10.1021/acs.jctc.3c00140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The structure of proteins has long been recognized to hold the key to understanding and engineering their function, and rapid advances in structural biology and protein structure prediction are now supplying researchers with an ever-increasing wealth of structural information. Most of the time, however, structures can only be determined in free energy minima, one at a time. While conformational flexibility may thus be inferred from static end-state structures, their interconversion mechanisms (a central ambition of structural biology) are often beyond the scope of direct experimentation. Given the dynamical nature of the processes in question, many studies have attempted to explore conformational transitions using molecular dynamics (MD). However, ensuring proper convergence and reversibility in the predicted transitions is extremely challenging. In particular, steered MD (SMD), a commonly used technique to map out a path from a starting to a target conformation, can suffer from starting-state dependence (hysteresis) when combined with techniques such as umbrella sampling (US) to compute the free energy profile of a transition. Here, we study this problem in detail on conformational changes of increasing complexity. We also present a new, history-independent approach that we term "MEMENTO" (Morphing End states by Modelling Ensembles with iNdependent TOpologies) to generate paths that alleviate hysteresis in the construction of conformational free energy profiles. MEMENTO utilizes template-based structure modelling to restore physically reasonable protein conformations based on coordinate interpolation (morphing) as an ensemble of plausible intermediates, from which a smooth path is picked. We compare SMD and MEMENTO on well-characterized test cases (the toy peptide deca-alanine and the enzyme adenylate kinase) before discussing its use in more complicated systems (the kinase P38α and the bacterial leucine transporter LeuT).
Our work shows that for all but the simplest systems SMD paths should not in general be used to seed umbrella sampling or related techniques, unless the paths are validated by consistent results from biased runs in opposite directions. MEMENTO, on the other hand, performs well as a flexible tool to generate intermediate structures for umbrella sampling. We also demonstrate that extended end-state sampling combined with MEMENTO can aid the discovery of collective variables on a case-by-case basis.
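The validation criterion stated here, consistent results from biased runs in opposite directions, can be sketched as a simple comparison of two free energy profiles. The sketch assumes both profiles are sampled on the same CV grid, that the reverse profile has already been reordered to match the forward one, and that aligning each profile at its minimum is acceptable; the function names, the example values, and the tolerance are hypothetical and not taken from the paper.

```python
def shift_to_zero(profile):
    """Shift a free energy profile so that its minimum is zero."""
    lowest = min(profile)
    return [f - lowest for f in profile]


def max_hysteresis(forward, reverse):
    """Largest pointwise difference between two aligned profiles."""
    if len(forward) != len(reverse):
        raise ValueError("profiles must be sampled on the same grid")
    f = shift_to_zero(forward)
    r = shift_to_zero(reverse)
    return max(abs(a - b) for a, b in zip(f, r))


def is_consistent(forward, reverse, tolerance):
    """True if the two directions agree within the given tolerance."""
    return max_hysteresis(forward, reverse) <= tolerance


# Hypothetical profiles (arbitrary energy units) from two seeding directions
fwd = [0.0, 2.0, 5.0, 3.0, 1.0]
rev = [0.0, 4.0, 9.0, 3.0, 1.0]
gap = max_hysteresis(fwd, rev)
```

A large gap between the two directions would suggest that the seeding path, rather than the underlying landscape, shapes the profile.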
Affiliation(s)
- Philip C Biggin
  - Department of Biochemistry, University of Oxford, Oxford OX1 3QU, U.K.
4
López-Blanco JR, Dehouck Y, Bastolla U, Chacón P. Local Normal Mode Analysis for Fast Loop Conformational Sampling. J Chem Inf Model 2022; 62:4561-4568. [PMID: 36099639 PMCID: PMC9516680 DOI: 10.1021/acs.jcim.2c00870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
We propose and validate a novel method to efficiently explore local protein loop conformations based on a new formalism for constrained normal mode analysis (NMA) in internal coordinates. The manifold of possible loop configurations imposed by the position and orientation of the fixed loop ends is reduced to an orthogonal set of motions (or modes) encoding concerted rotations of all the backbone dihedral angles. We validate the sampling power on a set of protein loops with highly variable experimental structures and demonstrate that our approach can efficiently explore the conformational space of closed loops. We also show an acceptable resemblance of the ensembles around equilibrium conformations generated by long molecular simulations and constrained NMA on a set of exposed and diverse loops. In comparison with other methods, the main advantage is the lack of restrictions on the number of dihedrals that can be altered simultaneously. Furthermore, the method is computationally efficient since it only requires the diagonalization of a tiny matrix, and the modes of motions are energetically contextualized by the elastic network model, which includes both the loop and the neighboring residues.
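To give a sense of what an elastic network model looks like, the sketch below builds a generic contact (Kirchhoff) matrix in Cartesian space, as used in Gaussian-network-style models. This is only an assumed, simplified stand-in: it is not the authors' internal-coordinate constrained NMA, and the cutoff value, coordinates, and function name are hypothetical.

```python
import math


def kirchhoff_matrix(coords, cutoff):
    """Contact matrix of a simple elastic network.

    Off-diagonal entries are -1 for pairs of sites within the cutoff,
    and each diagonal entry counts the contacts of that site.
    """
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


# Three sites on a line, 3.8 length units apart (hypothetical geometry)
sites = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
gamma = kirchhoff_matrix(sites, cutoff=5.0)
```

Diagonalizing a matrix of this kind yields the collective modes; the cost grows with matrix size, which is why restricting the problem to a small matrix is attractive.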
Affiliation(s)
- José Ramón López-Blanco
  - Department of Biological Physical Chemistry, Rocasolano Institute of Physical Chemistry, CSIC, Serrano 119, 28006 Madrid, Spain
- Yves Dehouck
  - Centro de Biología Molecular "Severo Ochoa," CSIC-UAM, Cantoblanco, 28049 Madrid, Spain
- Ugo Bastolla
  - Centro de Biología Molecular "Severo Ochoa," CSIC-UAM, Cantoblanco, 28049 Madrid, Spain
- Pablo Chacón
  - Department of Biological Physical Chemistry, Rocasolano Institute of Physical Chemistry, CSIC, Serrano 119, 28006 Madrid, Spain
5
Chen H, Liu H, Feng H, Fu H, Cai W, Shao X, Chipot C. MLCV: Bridging Machine-Learning-Based Dimensionality Reduction and Free-Energy Calculation. J Chem Inf Model 2021; 62:1-8. [PMID: 34939790 DOI: 10.1021/acs.jcim.1c01010] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/28/2022]
Abstract
Importance-sampling algorithms leaning on the definition of a model reaction coordinate (RC) are widely employed to probe processes relevant to chemistry and biology alike, spanning time scales not amenable to common, brute-force molecular dynamics (MD) simulations. In practice, the model RC often consists of a handful of collective variables (CVs) chosen on the basis of chemical intuition. However, constructing manually a low-dimensional RC model to describe an intricate geometrical transformation for the purpose of free-energy calculations and analyses remains a daunting challenge due to the inherent complexity of the conformational transitions at play. To solve this issue, remarkable progress has been made in employing machine-learning techniques, such as autoencoders, to extract the low-dimensional RC model from a large set of CVs. Implementation of the differentiable, nonlinear machine-learned CVs in common MD engines to perform free-energy calculations is, however, particularly cumbersome. To address this issue, we present here a user-friendly tool (called MLCV) that facilitates the use of machine-learned CVs in importance-sampling simulations through the popular Colvars module. Our approach is critically probed with three case examples consisting of small peptides, showcasing that through hard-coded neural network in Colvars, deep-learning and enhanced-sampling can be effectively bridged with MD simulations. The MLCV code is versatile, applicable to all the CVs available in Colvars, and can be connected to any kind of dense neural networks. We believe that MLCV provides an effective, powerful, and user-friendly platform accessible to experts and nonexperts alike for machine-learning (ML)-guided CV discovery and enhanced-sampling simulations to unveil the molecular mechanisms underlying complex biochemical processes.
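The idea of a machine-learned CV expressed as a dense neural network acting on simpler CVs can be sketched as a plain forward pass. The weights, layer sizes, and names below are hypothetical placeholders; this does not reproduce MLCV or the Colvars configuration syntax, only the arithmetic such a network performs.

```python
import math


def identity(x):
    return x


def dense_forward(inputs, layers):
    """Evaluate a dense network on a list of input CV values.

    layers: list of (weights, biases, activation) tuples, where
    weights[j][i] multiplies input i to produce output j.
    """
    values = list(inputs)
    for weights, biases, activation in layers:
        values = [
            activation(sum(w * v for w, v in zip(row, values)) + b)
            for row, b in zip(weights, biases)
        ]
    return values


# Hypothetical network: 2 input CVs -> 2 hidden units (tanh) -> 1 learned CV
network = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], math.tanh),
    ([[1.0, 1.0]], [0.0], identity),
]
learned_cv = dense_forward([1.0, 1.0], network)[0]
```

Because every operation here is differentiable, forces on the input CVs can in principle be obtained by the chain rule, which is what biasing along a learned CV requires.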
Affiliation(s)
- Haochuan Chen
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin 300071, China
  - Tianjin Key Laboratory of Biosensing and Molecular Recognition, Tianjin 300071, China
  - State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Han Liu
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin 300071, China
  - Tianjin Key Laboratory of Biosensing and Molecular Recognition, Tianjin 300071, China
  - State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Heying Feng
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin 300071, China
  - Tianjin Key Laboratory of Biosensing and Molecular Recognition, Tianjin 300071, China
  - State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haohao Fu
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin 300071, China
  - Tianjin Key Laboratory of Biosensing and Molecular Recognition, Tianjin 300071, China
  - State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Wensheng Cai
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin 300071, China
  - Tianjin Key Laboratory of Biosensing and Molecular Recognition, Tianjin 300071, China
  - State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Xueguang Shao
  - Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin 300071, China
  - Tianjin Key Laboratory of Biosensing and Molecular Recognition, Tianjin 300071, China
  - State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Christophe Chipot
  - Laboratoire International Associé CNRS and University of Illinois at Urbana-Champaign, UMR no. 7019, Université de Lorraine, BP 70239, F-54506 Vandœuvre-lès-Nancy, France
6
Yasuda T, Morita R, Shigeta Y, Harada R. Independent Nontargeted Parallel Cascade Selection Molecular Dynamics (Ino-PaCS-MD) to Enhance the Conformational Sampling of Proteins. J Chem Theory Comput 2021; 17:5933-5943. [PMID: 34410106 DOI: 10.1021/acs.jctc.1c00558] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
Abstract
Biological functions are related to long-time protein dynamics (rare events) that are induced over microseconds. Such protein dynamics can be investigated using molecular dynamics (MD) simulations. However, the detection of rare events remains challenging using conventional MD (cMD) since the accessible timescales of cMD are shorter than those of the biological functions. Recently, the parallel cascade selection MD (PaCS-MD) has been proposed to detect such rare events, wherein transition paths are generated between a given reactant and product. As an extension, the nontargeted PaCS-MD (nt-PaCS-MD) has been proposed to predict the transition paths without requiring reference to any product. Thus, as a further extension, we herein propose independent nt-PaCS-MD, namely, Ino-PaCS-MD, wherein multiple walkers are launched from a set of different starting configurations. Each walker repeats a cycle of restarting short-time MD simulations from configurations with high potentials for making transitions to neighboring metastable states. To further enhance the sampling ability, Ino-PaCS-MD temporarily stops the conformational search and periodically resets the starting configurations so that they are uniformly distributed in a conformational subspace, thereby preventing a given protein from being trapped in one of the metastable states. As a demonstration, Ino-PaCS-MD successfully detects rare events of a maltose-binding protein as open-close transitions with a nanosecond-order simulation time, although a microsecond-order cMD simulation failed to detect these rare events, showing the high sampling efficiency of Ino-PaCS-MD.
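The select-and-restart cycle described in this abstract can be sketched on a toy system. The sketch below replaces MD with a one-dimensional random walk and ranks snapshots by their displacement from the starting point; both choices, along with all names and parameter values, are assumptions made for illustration. It covers only a single nontargeted walker and omits the multiple independent walkers and periodic resetting that distinguish Ino-PaCS-MD.

```python
import random


def short_md(x0, n_steps, rng, step=0.1):
    """Toy stand-in for a short MD run: a 1D Gaussian random walk."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(traj[-1] + rng.gauss(0.0, step))
    return traj


def nt_pacs(x_start, n_cycles=10, n_replicas=4, n_steps=25, seed=0):
    """Repeat: run short simulations, rank snapshots, restart from the top ones.

    Returns the largest displacement from x_start reached after each cycle.
    """
    rng = random.Random(seed)
    seeds = [x_start] * n_replicas
    history = []
    for _ in range(n_cycles):
        snapshots = []
        for s in seeds:
            snapshots.extend(short_md(s, n_steps, rng))
        # Assumed ranking measure: distance travelled from the start
        snapshots.sort(key=lambda x: abs(x - x_start), reverse=True)
        seeds = snapshots[:n_replicas]
        history.append(abs(seeds[0] - x_start))
    return history


progress = nt_pacs(0.0)
```

Because each trajectory includes its own starting snapshot, the best displacement can never decrease from one cycle to the next, which is the mechanism that pushes the search outward.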
Affiliation(s)
- Takunori Yasuda
  - College of Biological Sciences, University of Tsukuba, 1-1-1, Tennodai, Tsukuba, Ibaraki 305-0821, Japan
- Rikuri Morita
  - Center for Computational Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan
- Yasuteru Shigeta
  - Center for Computational Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan
- Ryuhei Harada
  - Center for Computational Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan
7
Cao X, Tian P. "Dividing and Conquering" and "Caching" in Molecular Modeling. Int J Mol Sci 2021; 22:5053. [PMID: 34068835 PMCID: PMC8126232 DOI: 10.3390/ijms22095053] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/03/2021] [Revised: 04/26/2021] [Accepted: 04/27/2021] [Indexed: 11/17/2022]
Abstract
Molecular modeling is widely utilized in subjects including but not limited to physics, chemistry, biology, materials science and engineering. Impressive progress has been made in development of theories, algorithms and software packages. To divide and conquer, and to cache intermediate results have been long standing principles in development of algorithms. Not surprisingly, most important methodological advancements in more than half century of molecular modeling are various implementations of these two fundamental principles. In the mainstream classical computational molecular science, tremendous efforts have been invested on two lines of algorithm development. The first is coarse graining, which is to represent multiple basic particles in higher resolution modeling as a single larger and softer particle in lower resolution counterpart, with resulting force fields of partial transferability at the expense of some information loss. The second is enhanced sampling, which realizes "dividing and conquering" and/or "caching" in configurational space with focus either on reaction coordinates and collective variables as in metadynamics and related algorithms, or on the transition matrix and state discretization as in Markov state models. For this line of algorithms, spatial resolution is maintained but results are not transferable. Deep learning has been utilized to realize more efficient and accurate ways of "dividing and conquering" and "caching" along these two lines of algorithmic research. We proposed and demonstrated the local free energy landscape approach, a new framework for classical computational molecular science. This framework is based on a third class of algorithm that facilitates molecular modeling through partially transferable in resolution "caching" of distributions for local clusters of molecular degrees of freedom. 
Differences, connections and potential interactions among these three algorithmic directions are discussed, with the hope to stimulate development of more elegant, efficient and reliable formulations and algorithms for "dividing and conquering" and "caching" in complex molecular systems.
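The transition matrix and state discretization underlying Markov state models, mentioned in this abstract, can be illustrated with a minimal count-based estimate. The sketch assumes the trajectory has already been discretized into integer state labels and uses simple row normalization of transition counts, which is the most basic estimator and does not enforce detailed balance; the function name and example trajectory are hypothetical.

```python
def transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition counts from a discretized trajectory.

    dtraj: sequence of integer state labels in the range [0, n_states).
    lag: lag time in frames (must be at least 1).
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States never visited as a starting point get an all-zero row
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical two-state discretized trajectory
T = transition_matrix([0, 0, 1, 1, 0], n_states=2)
```

Each observed transition is counted once and then reused for every later query of the model, which is one way to read the "caching" idea in configurational space.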
Affiliation(s)
- Xiaoyong Cao
  - School of Life Sciences, Jilin University, Changchun 130012, China
- Pu Tian
  - School of Life Sciences, Jilin University, Changchun 130012, China
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China