1
Baltrukevich H, Podlewska S. From Data to Knowledge: Systematic Review of Tools for Automatic Analysis of Molecular Dynamics Output. Front Pharmacol 2022; 13:844293. [PMID: 35359865 PMCID: PMC8960308 DOI: 10.3389/fphar.2022.844293] [Received: 12/27/2021] [Accepted: 01/26/2022]
Abstract
An increasing number of available crystal structures on the one hand, and the growing computational power available for computer-aided drug design on the other, have led to the intensive use of structure-based drug design tools in drug development pipelines. Docking and molecular dynamics simulations, key representatives of the structure-based approaches, provide detailed information about the potential interaction of a ligand with a target receptor. However, they require a three-dimensional structure of the protein and relatively large computational resources. Because both docking and molecular dynamics are now used much more extensively, the amount of data they produce is also growing, and so is the number of approaches that facilitate the analysis and interpretation of the results of structure-based tools. In this review, we comprehensively summarize approaches for handling molecular dynamics simulation output, covering both statistical and machine-learning-based tools, as well as various forms of depicting molecular dynamics output.
Affiliation(s)
- Hanna Baltrukevich
- Maj Institute of Pharmacology, Polish Academy of Sciences, Kraków, Poland
- Faculty of Pharmacy, Chair of Technology and Biotechnology of Medical Remedies, Jagiellonian University Medical College in Krakow, Kraków, Poland
- Sabina Podlewska
- Maj Institute of Pharmacology, Polish Academy of Sciences, Kraków, Poland
2
Glielmo A, Husic BE, Rodriguez A, Clementi C, Noé F, Laio A. Unsupervised Learning Methods for Molecular Simulation Data. Chem Rev 2021; 121:9722-9758. [PMID: 33945269 PMCID: PMC8391792 DOI: 10.1021/acs.chemrev.0c01195] [Received: 11/05/2020]
Abstract
Unsupervised learning is becoming an essential tool to analyze the increasingly large amounts of data produced by atomistic and molecular simulations in materials science, solid-state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the unsupervised learning methods that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms for dimensionality reduction, density estimation, clustering, and kinetic modeling. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used, or can be used, to analyze molecular simulation data.
Affiliation(s)
- Aldo Glielmo
- International School for Advanced Studies (SISSA), 34014 Trieste, Italy
- Brooke E. Husic
- Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Alex Rodriguez
- International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
- Cecilia Clementi
- Freie Universität Berlin, Department of Physics, 14195 Berlin, Germany
- Rice University, Department of Chemistry, Houston, Texas 77005, United States
- Frank Noé
- Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Freie Universität Berlin, Department of Physics, 14195 Berlin, Germany
- Rice University, Department of Chemistry, Houston, Texas 77005, United States
- Alessandro Laio
- International School for Advanced Studies (SISSA), 34014 Trieste, Italy
- International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
3
Hruska E, Balasubramanian V, Lee H, Jha S, Clementi C. Extensible and Scalable Adaptive Sampling on Supercomputers. J Chem Theory Comput 2020; 16:7915-7925. [PMID: 33170696 DOI: 10.1021/acs.jctc.0c00991]
Abstract
The accurate sampling of protein dynamics is an ongoing challenge despite the use of high-performance computing (HPC) systems. Relying only on "brute force" molecular dynamics (MD) simulations requires an unacceptably long time to solution. Adaptive sampling methods allow a more effective sampling of protein dynamics than standard MD simulations; depending on the restarting strategy, the speedup can exceed one order of magnitude. One challenge limiting the use of adaptive sampling by domain experts is the relatively high complexity of running adaptive sampling efficiently on HPC systems. We discuss how the ExTASY framework can set up new adaptive sampling strategies and reliably execute the resulting workflows at scale on HPC platforms. Here, the folding dynamics of four proteins are predicted with no a priori information.
Affiliation(s)
- Eugen Hruska
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Department of Physics & Astronomy, Rice University, Houston, Texas 77005, United States
- Vivekanandan Balasubramanian
- Department of Electrical and Computer Engineering, Rutgers University, Piscataway, New Jersey 08854, United States
- Hyungro Lee
- Department of Electrical and Computer Engineering, Rutgers University, Piscataway, New Jersey 08854, United States
- Shantenu Jha
- Department of Electrical and Computer Engineering, Rutgers University, Piscataway, New Jersey 08854, United States
- Cecilia Clementi
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Department of Physics & Astronomy, Rice University, Houston, Texas 77005, United States
- Department of Physics, Freie Universität, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
4
Foglia NO, González Lebrero MC, Biekofsky RR, Estrin DA. Reaction Path Analysis from Potential Energy Contributions Using Forces: An Accessible Estimator of Reaction Coordinate Adequacy. J Chem Theory Comput 2020; 16:1618-1629. [PMID: 31999449 DOI: 10.1021/acs.jctc.9b01081]
Abstract
The calculation of potential energy and free-energy profiles along complex chemical reactions or rare event processes is of great interest because of their importance for many areas in chemistry, molecular biology, and materials science. One typical way to generate these profiles is to add a bias potential that modifies the energy surface and acts on a selected degree of freedom of the system. In such cases, however, the quality of the result depends strongly on the choice of the degree of freedom on which the bias acts. The present work introduces a simple method for analyzing the degree of freedom selected to describe a chemical process. The methodology decomposes the potential energy profile into contributions by integrating forces along a reaction path, which allows the different contributions to the energy change to be evaluated. This makes it possible to discriminate between energy contributions arising from different regions of the system, which is particularly useful for systems with complex environments that must be represented using hybrid quantum mechanics/molecular mechanics schemes. Furthermore, the methodology provides a quick and simple analysis of the degree of freedom used to describe the potential energy profile of the reactive process. Because this is computationally more accessible than the corresponding free-energy profile, it can serve as a simple estimator of reaction coordinate adequacy.
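The force-integration idea can be illustrated with a minimal sketch. It assumes that atomic positions and forces (F = -dE/dr) are available at each image of a discretized path and applies the trapezoidal rule; the function name `energy_contributions` and the atom-grouping scheme are hypothetical illustrations, not the authors' code. Each group's cumulative contribution is minus the sum of F·Δr over its atoms, and the per-group curves add up to the total potential-energy change along the path.

```python
import numpy as np

def energy_contributions(path, forces, groups):
    """Decompose the potential-energy change along a path into per-group terms.

    path, forces: arrays of shape (n_images, n_atoms, 3), with forces = -dE/dr.
    groups: dict mapping a (hypothetical) group label to a list of atom indices.
    Returns a dict of arrays, each of length n_images and starting at 0.
    """
    dr = np.diff(path, axis=0)                  # displacement between consecutive images
    f_mid = 0.5 * (forces[:-1] + forces[1:])    # trapezoidal-rule force estimate per segment
    dE_atom = -np.sum(f_mid * dr, axis=2)       # per-atom energy increment, (n_images-1, n_atoms)
    contributions = {}
    for label, idx in groups.items():
        steps = dE_atom[:, idx].sum(axis=1)     # energy increment carried by this group
        contributions[label] = np.concatenate([[0.0], np.cumsum(steps)])
    return contributions
```

For two uncoupled harmonic atoms (E = k x²/2) moved from x = 0 to x = 1, the two groups should recover k/2 each, and their sum gives the total energy change.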
Affiliation(s)
- Nicolás O Foglia
- Departamento de Química Inorgánica, Analítica y Química Física/INQUIMAE-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Pab. II, Buenos Aires C1428EHA, Argentina
- Mariano C González Lebrero
- Departamento de Química Inorgánica, Analítica y Química Física/INQUIMAE-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Pab. II, Buenos Aires C1428EHA, Argentina
- Rodolfo R Biekofsky
- Moebius Research Ltd., Systems Biomedicine, 24 Chedworth House, West Green Rd, N15 5EH London, U.K.
- Darío A Estrin
- Departamento de Química Inorgánica, Analítica y Química Física/INQUIMAE-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Pab. II, Buenos Aires C1428EHA, Argentina
5
Gardner JM, Abrams CF. Energetics of Flap Opening in HIV-1 Protease: String Method Calculations. J Phys Chem B 2019; 123:9584-9591. [PMID: 31640343 PMCID: PMC7375464 DOI: 10.1021/acs.jpcb.9b08348]
Abstract
HIV-1 protease (PR) is the viral protein responsible for virion maturation, and its mechanisms of action remain incompletely understood. PR is dimeric and contains two flexible, symmetry-related flaps, which act as a gate to inhibit access to the binding pocket and hold the polypeptide substrate in the binding pocket once bound. Wide flap opening, a conformational change assumed to be necessary for substrate binding, is a rare event in the closed and bound form. In this study, we use molecular dynamics (MD) simulations and advanced MD techniques including temperature acceleration and string method in collective variables to study the conformational changes associated with substrate unbinding of both wild-type and F99Y mutant PR. The F99Y mutation is shown via MD to decouple the closing of previously unrecognized distal pockets from substrate unbinding. To determine whether or not the F99Y mutation affects the energetic cost of wide flap opening, we use string method in collective variables to determine the minimum free-energy mechanism for wide flap opening in concert with distal pocket closing. The results indicate that the major energetic cost in flap opening is disengagement of the two flap-tip Ile50 residues from each other and is not affected by the F99Y mutation.
Affiliation(s)
- Jasmine M Gardner
- Dept. of Chemical and Biological Engineering, Drexel University, 3141 Chestnut Street, Philadelphia, Pennsylvania 19104, United States
- Department of Chemistry - BMC, Uppsala University, Box 576, 751 23 Uppsala, Sweden
- Cameron F Abrams
- Dept. of Chemical and Biological Engineering, Drexel University, 3141 Chestnut Street, Philadelphia, Pennsylvania 19104, United States
6
Abstract
Most current molecular dynamics simulation and analysis methods rely on the idea that the molecular system can be represented by a single global state (e.g., a Markov state in a Markov state model [MSM]). In this approach, molecules can be extensively sampled and analyzed when they only possess a few metastable states, such as small- to medium-sized proteins. However, this approach breaks down in frustrated systems and in large protein assemblies, where the number of global metastable states may grow exponentially with the system size. To address this problem, we here introduce dynamic graphical models (DGMs) that describe molecules as assemblies of coupled subsystems, akin to how spins interact in the Ising model. The change of each subsystem state is only governed by the states of itself and its neighbors. DGMs require fewer parameters than MSMs or other global state models; in particular, we do not need to observe all global system configurations to characterize them. Therefore, DGMs can predict previously unobserved molecular configurations. As a proof of concept, we demonstrate that DGMs can faithfully describe molecular thermodynamics and kinetics and predict previously unobserved metastable states for Ising models and protein simulations.
Affiliation(s)
- Simon Olsson
- Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Department of Physics, Freie Universität Berlin, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, TX 77005
7
Thiede EH, Giannakis D, Dinner AR, Weare J. Galerkin approximation of dynamical quantities using trajectory data. J Chem Phys 2019; 150:244111. [PMID: 31255053 PMCID: PMC6824902 DOI: 10.1063/1.5063730] [Received: 09/30/2018] [Accepted: 05/13/2019]
Abstract
Understanding chemical mechanisms requires estimating dynamical statistics such as expected hitting times, reaction rates, and committors. Here, we present a general framework for calculating these dynamical quantities by approximating boundary value problems for dynamical operators with a Galerkin expansion. A specific choice of basis set in the expansion corresponds to estimating dynamical quantities with a Markov state model. More generally, the boundary conditions impose restrictions on the choice of basis sets. We demonstrate how an alternative basis can be constructed using ideas from diffusion maps. In our numerical experiments, this basis gives results with accuracy comparable to or better than that of Markov state models. Additionally, we show that delay embedding can reduce the information lost when projecting the system's dynamics for model construction; this improves estimates of dynamical statistics considerably over the standard practice of increasing the lag time.
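The Markov-state-model special case mentioned in the abstract (a basis of indicator functions on discrete states) can be sketched briefly. Under the assumptions that the trajectory is already discretized and that every state is visited at least once, the snippet counts transitions at a lag time to build a row-stochastic transition matrix, then solves the committor boundary value problem: q = 0 on A, q = 1 on B, and (Tq)_i = q_i elsewhere. The function names are hypothetical, and the diffusion-map basis proposed in the paper is not shown.

```python
import numpy as np

def transition_matrix(dtraj, n_states, lag=1):
    """Row-stochastic transition matrix estimated by counting lagged transitions.

    Assumes every state appears at least once as a starting point (no empty rows).
    """
    C = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        C[i, j] += 1
    return C / C.sum(axis=1, keepdims=True)

def committor(T, A, B):
    """Forward committor: probability of reaching set B before set A."""
    n = T.shape[0]
    M = T - np.eye(n)            # (T - I) q = 0 on intermediate states
    rhs = np.zeros(n)
    for s in list(A) + list(B):  # overwrite rows with Dirichlet boundary conditions
        M[s] = 0.0
        M[s, s] = 1.0
        rhs[s] = 1.0 if s in B else 0.0
    return np.linalg.solve(M, rhs)
```

For a symmetric three-state chain A ↔ 1 ↔ B, the middle state should have committor 0.5.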
Affiliation(s)
- Erik H Thiede
- Department of Chemistry and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, USA
- Dimitrios Giannakis
- Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, USA
- Aaron R Dinner
- Department of Chemistry and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, USA
- Jonathan Weare
- Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, USA
8
Qi Y, Zhang B. Predicting three-dimensional genome organization with chromatin states. PLoS Comput Biol 2019; 15:e1007024. [PMID: 31181064 PMCID: PMC6586364 DOI: 10.1371/journal.pcbi.1007024] [Received: 12/05/2018] [Revised: 06/20/2019] [Accepted: 04/13/2019]
Abstract
We introduce a computational model to simulate chromatin structure and dynamics. Starting from one-dimensional genomics and epigenomics data that are available for hundreds of cell types, this model enables de novo prediction of chromatin structures at five-kilo-base resolution. Simulated chromatin structures recapitulate known features of genome organization, including the formation of chromatin loops, topologically associating domains (TADs) and compartments, and are in quantitative agreement with chromosome conformation capture experiments and super-resolution microscopy measurements. Detailed characterization of the predicted structural ensemble reveals the dynamical flexibility of chromatin loops and the presence of cross-talk among neighboring TADs. Analysis of the model's energy function uncovers distinct mechanisms for chromatin folding at various length scales and suggests a need to go beyond simple A/B compartment types to predict specific contacts between regulatory elements using polymer simulations.
Affiliation(s)
- Yifeng Qi
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Bin Zhang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
9
Paul F, Wu H, Vossel M, de Groot BL, Noé F. Identification of kinetic order parameters for non-equilibrium dynamics. J Chem Phys 2019; 150:164120. [PMID: 31042914 PMCID: PMC6486394 DOI: 10.1063/1.5083627] [Received: 11/29/2018] [Accepted: 04/04/2019]
Abstract
A popular approach to analyze the dynamics of high-dimensional many-body systems, such as macromolecules, is to project the trajectories onto a space of slowly varying collective variables, where subsequent analyses are made, such as clustering or estimation of free energy profiles or Markov state models. However, existing "dynamical" dimension reduction methods, such as the time-lagged independent component analysis (TICA), are only valid if the dynamics obeys detailed balance (microscopic reversibility) and typically require long, equilibrated simulation trajectories. Here, we develop a dimension reduction method for non-equilibrium dynamics based on the recently developed Variational Approach for Markov Processes (VAMP) by Wu and Noé. VAMP is illustrated by obtaining a low-dimensional description of a single file ion diffusion model and by identifying long-lived states from molecular dynamics simulations of the KcsA channel protein in an external electrochemical potential. This analysis provides detailed insights into the coupling of conformational dynamics, the configuration of the selectivity filter, and the conductance of the channel. We recommend VAMP as a replacement for the less general TICA method.
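For linear features, VAMP can be summarized as a singular value decomposition of the whitened time-lagged covariance matrix K = C00^(-1/2) C0τ Cττ^(-1/2). Unlike TICA, it does not symmetrize C0τ and therefore does not assume detailed balance. The sketch below assumes a single trajectory of linear features, removes the means separately from the instantaneous and time-lagged data, and drops near-singular covariance directions; the function names and the cutoff `eps` are illustrative choices, not taken from the paper. The sum of the squared singular values is the VAMP-2 score.

```python
import numpy as np

def _inv_sqrt(C, eps=1e-10):
    """Inverse square root of a symmetric PSD matrix, discarding near-zero eigenvalues."""
    w, V = np.linalg.eigh(C)
    keep = w > eps
    return V[:, keep] @ np.diag(w[keep] ** -0.5) @ V[:, keep].T

def vamp_singular_values(X, lag):
    """Singular values of the linear VAMP (Koopman) matrix for a trajectory X of shape (T, d)."""
    X0, Xt = X[:-lag], X[lag:]
    X0 = X0 - X0.mean(axis=0)        # mean-free instantaneous features
    Xt = Xt - Xt.mean(axis=0)        # mean-free time-lagged features
    n = len(X0)
    C00 = X0.T @ X0 / n
    C0t = X0.T @ Xt / n              # not symmetrized: no detailed-balance assumption
    Ctt = Xt.T @ Xt / n
    K = _inv_sqrt(C00) @ C0t @ _inv_sqrt(Ctt)
    return np.linalg.svd(K, compute_uv=False)   # sorted in descending order
```

For a slow autoregressive coordinate mixed with white noise, the leading singular value should approach the slow coordinate's lag-1 autocorrelation, while the noise contributes a value near zero.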
Affiliation(s)
- Fabian Paul
- Department of Mathematics and Computer Science, FU Berlin, Arnimallee 6, 14195 Berlin, Germany
- Hao Wu
- Tongji University, School of Mathematical Sciences, Shanghai 200092, People's Republic of China
- Maximilian Vossel
- Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, D-37077 Göttingen, Germany
- Bert L de Groot
- Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, D-37077 Göttingen, Germany
- Frank Noé
- Department of Mathematics and Computer Science, FU Berlin, Arnimallee 6, 14195 Berlin, Germany
10
Ceriotti M. Unsupervised machine learning in atomistic simulations, between predictions and understanding. J Chem Phys 2019; 150:150901. [PMID: 31005087 DOI: 10.1063/1.5091842]
Abstract
Automated analyses of the outcome of a simulation have been an important part of atomistic modeling since the early days, addressing the need to link the behavior of individual atoms with the collective properties that are usually the final quantity of interest. Methods such as clustering and dimensionality reduction have been used to provide a simplified, coarse-grained representation of the structure and dynamics of complex systems, from proteins to nanoparticles. In recent years, the rise of machine learning has led to an even more widespread use of these algorithms in atomistic modeling, and to different classification and inference techniques being considered as part of a coherent toolbox of data-driven approaches. This perspective briefly reviews some of the unsupervised machine-learning methods geared toward classification and coarse-graining of molecular simulations, seen in relation to the fundamental mathematical concepts that underlie all machine-learning techniques. It discusses the importance of using concise yet complete representations of atomic structures as the starting point of the analyses and highlights the risk of introducing preconceived biases when using machine learning to rationalize and understand structure-property relations. Supervised machine-learning techniques that explicitly attempt to predict the properties of a material given its structure are less susceptible to such biases. Current developments in the field suggest that using these two classes of approaches side by side and in a fully integrated mode, while keeping in mind the relations between the data analysis framework and the fundamental physical principles, will be key to realizing the full potential of machine learning to help understand the behavior of complex molecules and materials.
Affiliation(s)
- Michele Ceriotti
- Laboratory of Computational Science and Modeling, Institut des Matériaux, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
11
Hruska E, Abella JR, Nüske F, Kavraki LE, Clementi C. Quantitative comparison of adaptive sampling methods for protein dynamics. J Chem Phys 2018; 149:244119. [PMID: 30599712 DOI: 10.1063/1.5053582]
Abstract
Adaptive sampling methods, often used in combination with Markov state models, are becoming increasingly popular for speeding up the sampling of rare events in simulations such as molecular dynamics (MD) without biasing the system dynamics. Several adaptive sampling strategies have been proposed, but it is not clear which methods perform better for different physical systems. In this work, we present a systematic evaluation of selected adaptive sampling strategies on a wide selection of fast-folding proteins. The adaptive sampling strategies were emulated using models constructed from existing MD trajectories. We provide theoretical limits for the sampling speedup and compare the performance of different strategies with and without some a priori knowledge of the system. The results show that different adaptive sampling strategies are optimal for different goals. To sample slow dynamical processes such as protein folding without a priori knowledge of the system, a strategy based on the identification of a set of metastable regions is consistently the most efficient, while a strategy based on the identification of microstates performs better if the goal is to explore new regions of the conformational space. Interestingly, the maximum speedup achievable for the adaptive sampling of slow processes increases for proteins with longer folding times, encouraging the application of these methods to the characterization of slower processes, beyond the fast-folding proteins considered here.
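A common way to implement microstate-based exploration is a "least counts" rule: restart the next round of simulations from the visited microstates that have been sampled least. The sketch below assumes the trajectories have already been discretized into microstates; it illustrates that generic heuristic and is not necessarily identical to any of the strategies benchmarked in the paper. The function name is hypothetical.

```python
import numpy as np

def least_counts_restarts(dtraj, n_states, n_walkers):
    """Choose restart microstates for the next round of adaptive sampling.

    dtraj: 1-D integer array of microstate indices visited so far (all rounds concatenated).
    Returns up to n_walkers visited states, least-populated first (ties keep index order).
    """
    counts = np.bincount(dtraj, minlength=n_states)
    visited = np.flatnonzero(counts > 0)   # only observed states have structures to restart from
    order = visited[np.argsort(counts[visited], kind="stable")]
    return order[:n_walkers]
```

Unvisited states are excluded because no configuration exists to restart from; in practice one would then draw a stored frame belonging to each selected microstate.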
Affiliation(s)
- Eugen Hruska
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Jayvee R Abella
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Feliks Nüske
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Lydia E Kavraki
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Cecilia Clementi
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
12
Sittel F, Stock G. Perspective: Identification of collective variables and metastable states of protein dynamics. J Chem Phys 2018; 149:150901. [PMID: 30342445 DOI: 10.1063/1.5049637]
Abstract
The statistical analysis of molecular dynamics simulations requires dimensionality reduction techniques, which yield a low-dimensional set of collective variables (CVs) $\{x_i\} = \mathbf{x}$ that in some sense describe the essential dynamics of the system. Considering the distribution $P(\mathbf{x})$ of the CVs, the primary goal of a statistical analysis is to detect the characteristic features of $P(\mathbf{x})$, in particular its maxima and their connecting paths. These features characterize the low-energy regions and the energy barriers of the corresponding free energy landscape $\Delta G(\mathbf{x}) = -k_B T \ln P(\mathbf{x})$, and therefore correspond to the metastable states and transition regions of the system. In this perspective, we outline a systematic strategy to identify CVs and metastable states, which can subsequently be employed to construct a Langevin or a Markov state model of the dynamics. In particular, we account for the still limited sampling typically achieved by molecular dynamics simulations, which in practice seriously limits the applicability of theories (e.g., assuming ergodicity) and black-box software tools (e.g., using redundant input coordinates). We show that it is essential to use internal (rather than Cartesian) input coordinates, employ dimensionality reduction methods that avoid rescaling errors (such as principal component analysis), and perform density-based (rather than k-means-type) clustering. Finally, we briefly discuss a machine learning approach to dimensionality reduction, which highlights the essential internal coordinates of a system and may reveal hidden reaction mechanisms.
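The relation $\Delta G(\mathbf{x}) = -k_B T \ln P(\mathbf{x})$ translates directly into a histogram estimate for a single CV. The sketch below is a minimal illustration, assuming well-sampled, unbiased data; the default kT of about 2.494 kJ/mol (k_B × 300 K), the bin count, and the function name are illustrative choices. Empty bins receive an infinite free energy, and the profile is shifted so that its global minimum is zero.

```python
import numpy as np

def free_energy_profile(x, bins=50, kT=2.494):
    """Estimate Delta G(x) = -kT ln P(x) from 1-D collective-variable samples.

    kT defaults to ~2.494 kJ/mol (k_B * 300 K), an illustrative assumption.
    Returns bin centers and free energies; empty bins are +inf.
    """
    density, edges = np.histogram(x, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):      # log(0) -> -inf for empty bins
        G = -kT * np.log(density)
    G -= G[np.isfinite(G)].min()            # set the global minimum to zero
    return centers, G
```

For samples drawn from a symmetric double-well distribution, the bins near the two wells should have much lower free energy than the barrier region between them.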
Affiliation(s)
- Florian Sittel
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
- Gerhard Stock
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
13
Qi Y, Zhang B. Predicting three-dimensional genome organization with chromatin states. [DOI: 10.1101/282095]
Abstract
We introduce a computational model to simulate chromatin structure and dynamics. Starting from one-dimensional genomics and epigenomics data that are available for hundreds of cell types, this model enables de novo prediction of chromatin structures at five-kilo-base resolution. Simulated chromatin structures recapitulate known features of genome organization, including the formation of chromatin loops, topologically associating domains (TADs) and compartments, and are in quantitative agreement with chromosome conformation capture experiments and super-resolution microscopy measurements. Detailed characterization of the predicted structural ensemble reveals the dynamical flexibility of chromatin loops and the presence of cross-talk among neighboring TADs. Analysis of the model’s energy function uncovers distinct mechanisms for chromatin folding at various length scales.
14
Boninsegna L, Banisch R, Clementi C. A Data-Driven Perspective on the Hierarchical Assembly of Molecular Structures. J Chem Theory Comput 2017; 14:453-460. [PMID: 29207235 DOI: 10.1021/acs.jctc.7b00990]
Abstract
Macromolecular systems are composed of a very large number of atomic degrees of freedom. There is strong evidence that structural changes occurring in large biomolecular systems on long timescales may be captured by models coarser than atomistic ones, although a suitable or optimal coarse-graining is not known a priori. Here we propose a systematic approach to learning a coarse representation of a macromolecule from microscopic simulation data. In particular, effective coarse variables are defined by partitioning the degrees of freedom both in the structural (physical) space and in the conformational space. Identifying groups of microscopic particles that form dynamically coherent states in different metastable states leads to a multiscale description of the system, in space and time. Applying this approach to the folding dynamics of two proteins provides a revised view of the classical idea of prestructured regions (foldons) that combine during protein folding and suggests a hierarchical characterization of the assembly process of folded structures.
Affiliation(s)
- Lorenzo Boninsegna
- Department of Chemistry, and Center for Theoretical Biological Physics, Rice University, 6100 Main Street, Houston, Texas 77005, United States
- Ralf Banisch
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Cecilia Clementi
- Department of Chemistry, and Center for Theoretical Biological Physics, Rice University, 6100 Main Street, Houston, Texas 77005, United States
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
15
Hashemian B, Millán D, Arroyo M. Charting molecular free-energy landscapes with an atlas of collective variables. J Chem Phys 2016; 145:174109. [DOI: 10.1063/1.4966262]
Affiliation(s)
- Behrooz Hashemian
- LaCàN, Universitat Politècnica de Catalunya–BarcelonaTech, Barcelona, Spain
- Daniel Millán
- LaCàN, Universitat Politècnica de Catalunya–BarcelonaTech, Barcelona, Spain
- Marino Arroyo
- LaCàN, Universitat Politècnica de Catalunya–BarcelonaTech, Barcelona, Spain
16
Bottaro S, Gil-Ley A, Bussi G. RNA folding pathways in stop motion. Nucleic Acids Res 2016; 44:5883-91. [PMID: 27091499 PMCID: PMC4937309 DOI: 10.1093/nar/gkw239] [Received: 02/02/2016] [Revised: 03/29/2016] [Accepted: 03/29/2016]
Abstract
We introduce a method for predicting RNA folding pathways, with an application to the most important RNA tetraloops. The method is based on the idea that ensembles of three-dimensional fragments extracted from high-resolution crystal structures are heterogeneous enough to describe metastable as well as intermediate states. These ensembles are first validated by a quantitative comparison against available solution nuclear magnetic resonance (NMR) data for a set of RNA tetranucleotides. Notably, the agreement is better than that obtained by comparing the NMR data with extensive all-atom molecular dynamics simulations. We then propose a procedure based on diffusion maps and Markov models that makes it possible to obtain reaction pathways and their relative probabilities from fragment ensembles. This approach is applied to study the helix-to-loop folding pathway of all the tetraloops from the GNRA and UNCG families. The results give detailed insights into the folding mechanism that are compatible with available experimental data and clarify the role of intermediate states observed in previous simulation studies. The method is computationally inexpensive and can be used to study arbitrary conformational transitions.
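Diffusion maps appear in several of these entries. A basic, textbook-style construction is sketched here: a Gaussian kernel with bandwidth `epsilon`, the alpha-normalization of Coifman and Lafon to reduce sampling-density effects, and the leading nontrivial right eigenvectors of the resulting Markov matrix as coordinates. The bandwidth, alpha = 1, and the function name are illustrative assumptions; the fragment-ensemble procedure described above also involves Markov models and may differ in its details.

```python
import numpy as np

def diffusion_map(X, epsilon, n_coords=2, alpha=1.0):
    """Basic diffusion map for points X of shape (n_samples, n_features).

    Returns (eigenvalues, coordinates) for the n_coords leading nontrivial modes,
    with coordinates scaled by their eigenvalues (diffusion time t = 1).
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / epsilon)               # Gaussian kernel
    q = K.sum(axis=1)
    K = K / np.outer(q**alpha, q**alpha)          # alpha-normalization against density bias
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))               # symmetric conjugate of P = D^-1 K
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]               # largest eigenvalue (= 1) first
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs / np.sqrt(d)[:, None]             # right eigenvectors of P
    psi = psi / psi[:, [0]]                       # trivial mode is constant; scale it to 1
    return evals[1:n_coords + 1], evals[1:n_coords + 1] * psi[:, 1:n_coords + 1]
```

For two well-separated clusters, the first diffusion coordinate should take opposite signs on the two clusters, since the slowest mode of the random walk is the exchange between them.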
Affiliation(s)
- Sandro Bottaro
- Scuola Internazionale Superiore di Studi Avanzati, International School for Advanced Studies, 265, Via Bonomea I-34136 Trieste, Italy
- Alejandro Gil-Ley
- Scuola Internazionale Superiore di Studi Avanzati, International School for Advanced Studies, 265, Via Bonomea I-34136 Trieste, Italy
- Giovanni Bussi
- Scuola Internazionale Superiore di Studi Avanzati, International School for Advanced Studies, 265, Via Bonomea I-34136 Trieste, Italy
17
Affiliation(s)
- Baron Peters
- Department of Chemical Engineering, University of California, Santa Barbara, California 93106
18
Boninsegna L, Gobbo G, Noé F, Clementi C. Investigating Molecular Kinetics by Variationally Optimized Diffusion Maps. J Chem Theory Comput 2015; 11:5947-60. [DOI: 10.1021/acs.jctc.5b00749] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Indexed: 12/19/2022]
Affiliation(s)
- Lorenzo Boninsegna
- Center for Theoretical Biological Physics and Department of Chemistry, Rice University, 6100 Main Street, Houston, Texas 77005, United States
- Gianpaolo Gobbo
- Maxwell Institute for Mathematical Sciences and School of Mathematics, The University of Edinburgh, Peter Guthrie Tait Road, Edinburgh EH9 3FD, United Kingdom
- Frank Noé
- Department of Mathematics, Computer Science and Bioinformatics, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
- Cecilia Clementi
- Center for Theoretical Biological Physics and Department of Chemistry, Rice University, 6100 Main Street, Houston, Texas 77005, United States
19
Blöchliger N, Caflisch A, Vitalis A. Weighted Distance Functions Improve Analysis of High-Dimensional Data: Application to Molecular Dynamics Simulations. J Chem Theory Comput 2015; 11:5481-92. [DOI: 10.1021/acs.jctc.5b00618] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Affiliation(s)
- Nicolas Blöchliger
- Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Zurich, Switzerland
- Amedeo Caflisch
- Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Zurich, Switzerland
- Andreas Vitalis
- Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Zurich, Switzerland
20
Nedialkova LV, Amat MA, Kevrekidis IG, Hummer G. Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions. J Chem Phys 2015; 141:114102. [PMID: 25240340 DOI: 10.1063/1.4893963] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022]
Abstract
Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small but nontrivial differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space.
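The Markov-state-model step described in this abstract can be sketched as follows. This is a generic count-matrix estimator at a hypothetical lag time `lag`, with a simple count symmetrization to enforce detailed balance; it assumes every state is visited, and it does not reproduce the fuzzy C-means clustering or the transition-based state assignment of the paper.

```python
import numpy as np


def estimate_msm(dtraj, n_states, lag):
    """Row-stochastic transition matrix from one discrete trajectory at a given lag."""
    dtraj = np.asarray(dtraj, dtype=int)
    counts = np.zeros((n_states, n_states))
    # Sliding-window counts of transitions i -> j separated by `lag` steps.
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    # Symmetrize the counts: a crude way to enforce detailed balance.
    counts = counts + counts.T
    # Normalize rows (assumes every state was visited at least once).
    return counts / counts.sum(axis=1, keepdims=True)


def stationary_distribution(T):
    """Left eigenvector of T for eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(T.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    return pi / pi.sum()


def implied_timescales(T, lag):
    """t_k = -lag / ln|lambda_k| for the nontrivial eigenvalues, slowest first."""
    evals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(evals[1:])
```

Implied timescales computed at several lag times are a common check that the projected dynamics are approximately Markovian.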
Affiliation(s)
- Lilia V Nedialkova
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, USA
- Miguel A Amat
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, USA
- Ioannis G Kevrekidis
- Department of Chemical and Biological Engineering and Program in Applied and Computational Mathematics, Princeton University, Princeton, New Jersey 08544, USA
- Gerhard Hummer
- Department of Theoretical Biophysics, Max Planck Institute of Biophysics, Max-von-Laue-Str. 3, 60438 Frankfurt am Main, Germany
21
Chekmarev SF. Equilibration of Protein States: A Time Dependent Free-Energy Disconnectivity Graph. J Phys Chem B 2015; 119:8340-8. [PMID: 26068182 DOI: 10.1021/acs.jpcb.5b04336] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
Abstract
The process of equilibration of protein states in a three-stranded antiparallel β-sheet miniprotein is studied using a time-dependent free energy disconnectivity graph. To determine the rates of transitions, the molecular dynamics simulation results of a recent work (Kalgin, I. V.; J. Phys. Chem. B 2013, 117, 6092) are employed. The vertices of the graph are the free energies of characteristic states of the protein, and the edges are the transition state free energies. To determine the latter, the "complete" partition function (Eyring, 1935) is used, which includes the translational partition function corresponding to the ballistic motion of the system along the reaction coordinate. The distance along the reaction coordinate that enters the translational partition function is taken to be proportional to the observation time and thus measures the number of representative points that cross the transition state surface during a given time. As the time increases, the free energy barriers between the clusters of characteristic conformations (native-like, helical, and β-sheet conformations of different degrees of organization) decrease and (local) equilibrium between the clusters is established. With time, these clusters are grouped into larger clusters, extending the equilibrium to a larger portion of protein states.
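For orientation, the textbook Eyring relation between a transition rate and its barrier free energy is shown below, written for a hypothetical transition i → j with rate k_ij, barrier ΔG‡_ij, and transmission coefficient κ (commonly set to 1). This is only the standard form; the abstract states that the paper uses a "complete" partition function with a translational term tied to the observation time, which is not reproduced here.

```latex
k_{ij} = \kappa \, \frac{k_B T}{h} \exp\!\left(-\frac{\Delta G^{\ddagger}_{ij}}{k_B T}\right)
\quad\Longleftrightarrow\quad
\Delta G^{\ddagger}_{ij} = -k_B T \ln\!\left(\frac{h \, k_{ij}}{\kappa \, k_B T}\right)
```

Inverting the relation in this way is what turns measured transition rates into edge free energies for a disconnectivity graph.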
Affiliation(s)
- Sergei F Chekmarev
- Institute of Thermophysics, SB RAS, 630090 Novosibirsk, Russia
- Department of Physics, Novosibirsk State University, 630090 Novosibirsk, Russia
22
Affiliation(s)
- Baron Peters
- Department of Chemical Engineering, University of California, Santa Barbara, California 93106, United States
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, United States
23
Chekmarev SF. Protein folding as a complex reaction: a two-component potential for the driving force of folding and its variation with folding scenario. PLoS One 2015; 10:e0121640. [PMID: 25848943 PMCID: PMC4388825 DOI: 10.1371/journal.pone.0121640] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 11/25/2014] [Accepted: 02/11/2015] [Indexed: 11/19/2022]
Abstract
The Helmholtz decomposition of the vector field of probability fluxes in a two-dimensional space of collective variables makes it possible to introduce a potential for the driving force of protein folding [Chekmarev, J. Chem. Phys. 139 (2013) 145103]. The potential has two components: one component (Φ) is responsible for the source and sink of the folding flow, which represent, respectively, the unfolded and native state of the protein, and the other (Ψ) accounts for the flow vorticity inherently generated at the periphery of the flow field and provides the canalization of the flow between the source and sink. Both components obey Poisson’s equations with the corresponding source/sink terms. In the present paper, we consider how the shape of the potential changes depending on the scenario of protein folding. To mimic protein folding dynamics projected onto a two-dimensional space of collective variables, the two-dimensional Müller and Brown potential is employed. Three characteristic scenarios are considered: a single pathway from the unfolded to the native state without intermediates, two parallel pathways without intermediates, and a single pathway with an off-pathway intermediate. To determine the probability fluxes, the hydrodynamic description of the folding reaction is used, in which the first-passage folding is viewed as a steady flow of the representative points of the protein from the unfolded to the native state. We show that despite the possible complexity of the folding process, the Φ-component is simple and universal in shape. The Ψ-component is more complex and reveals characteristic features of the process of folding. The present approach is potentially applicable to other complex reactions, for which the transition from the reactant to the product can be described in a space of two (collective) variables.
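The two-dimensional Müller–Brown potential used in this study as a toy landscape has a standard closed form, V(x, y) = Σ_k A_k exp[a_k (x − x0_k)² + b_k (x − x0_k)(y − y0_k) + c_k (y − y0_k)²], with the four-term parameter set below taken from the original Müller–Brown definition. The sketch evaluates the potential and its analytic gradient (for example, to drive Brownian dynamics); whether the paper used exactly these parameters or a rescaled variant is not stated in the abstract.

```python
import numpy as np

# Standard Müller-Brown parameters (four Gaussian-like terms).
A = np.array([-200.0, -100.0, -170.0, 15.0])
a = np.array([-1.0, -1.0, -6.5, 0.7])
b = np.array([0.0, 0.0, 11.0, 0.6])
c = np.array([-10.0, -10.0, -6.5, 0.7])
x0 = np.array([1.0, 0.0, -0.5, -1.0])
y0 = np.array([0.0, 0.5, 1.5, 1.0])


def _terms(x, y):
    # Append a trailing axis so scalars and arrays broadcast against the 4 terms.
    x = np.asarray(x, dtype=float)[..., None]
    y = np.asarray(y, dtype=float)[..., None]
    dx, dy = x - x0, y - y0
    return dx, dy, A * np.exp(a * dx**2 + b * dx * dy + c * dy**2)


def muller_brown(x, y):
    """Potential energy V(x, y); works on scalars or NumPy arrays."""
    _, _, e = _terms(x, y)
    return np.sum(e, axis=-1)


def muller_brown_grad(x, y):
    """Analytic gradient (dV/dx, dV/dy)."""
    dx, dy, e = _terms(x, y)
    gx = np.sum(e * (2.0 * a * dx + b * dy), axis=-1)
    gy = np.sum(e * (b * dx + 2.0 * c * dy), axis=-1)
    return gx, gy
```

Its three minima lie near (−0.558, 1.442), (0.623, 0.028) and (−0.050, 0.467), with energies of roughly −146.7, −108.2 and −80.8.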
Affiliation(s)
- Sergei F. Chekmarev
- Institute of Thermophysics, 630090 Novosibirsk, Russia and Department of Physics, Novosibirsk State University, 630090 Novosibirsk, Russia
24
Kim SB, Dsilva CJ, Kevrekidis IG, Debenedetti PG. Systematic characterization of protein folding pathways using diffusion maps: Application to Trp-cage miniprotein. J Chem Phys 2015; 142:085101. [DOI: 10.1063/1.4913322] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Indexed: 11/14/2022]
Affiliation(s)
- Sang Beom Kim
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, USA
- Carmeline J. Dsilva
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, USA
- Ioannis G. Kevrekidis
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, USA
- Program in Applied and Computational Mathematics, Princeton University, Princeton, New Jersey 08544, USA
- Pablo G. Debenedetti
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, USA
25
Rohrdanz MA, Zheng W, Lambeth B, Vreede J, Clementi C. Multiscale approach to the determination of the photoactive yellow protein signaling state ensemble. PLoS Comput Biol 2014; 10:e1003797. [PMID: 25356903 PMCID: PMC4214557 DOI: 10.1371/journal.pcbi.1003797] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 10/30/2013] [Accepted: 07/08/2014] [Indexed: 02/04/2023]
Abstract
The nature of the optical cycle of photoactive yellow protein (PYP) makes its elucidation challenging for both experiment and theory. The long transition times render conventional simulation methods ineffective, and yet the short signaling-state lifetime makes experimental data difficult to obtain and interpret. Here, through an innovative combination of computational methods, a prediction and analysis of the biological signaling state of PYP is presented. Coarse-grained modeling and a locally scaled diffusion map are first used to obtain a rough bird's-eye view of the free energy landscape of photo-activated PYP. Then all-atom reconstruction, followed by an enhanced sampling scheme (diffusion-map-directed molecular dynamics), is used to focus on the signaling-state region of configuration space and obtain an ensemble of signaling-state structures. To the best of our knowledge, this is the first time an all-atom reconstruction from a coarse-grained model has been performed in a relatively unexplored region of molecular configuration space. We compare our signaling-state prediction with previous computational and more recent experimental results, and the comparison is favorable, which validates the method presented. This approach provides additional insight into the PYP photocycle, and can be applied to other systems for which more direct methods are impractical. Many protein systems of biological interest undergo dynamical changes on a time scale too long to be modeled using standard computational methods. One example is photoactive yellow protein (PYP), found in several bacterial species. Blue light, potentially harmful to DNA, triggers several structural changes in PYP, eventually resulting in a conformation that changes the swimming behavior of bacteria. This conformation is difficult to investigate, as it is too short-lived. In addition, understanding this “signaling state” is computationally difficult because of the long timescale of the transition. 
We overcome this by constructing a coarse-grained model to rapidly induce transitions to the signaling state. We then reconstruct and further sample the all-atom configurations from these coarse-grained representations. Our results are consistent with all available experimental and computational evidence.
Affiliation(s)
- Mary A. Rohrdanz
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States of America
- Chemistry Department, Rice University, Houston, Texas, United States of America
- Wenwei Zheng
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States of America
- Chemistry Department, Rice University, Houston, Texas, United States of America
- Bradley Lambeth
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States of America
- Jocelyne Vreede
- van't Hoff Institute for Molecular Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Cecilia Clementi
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States of America
- Chemistry Department, Rice University, Houston, Texas, United States of America
26
|
Zheng W, Vargiu AV, Rohrdanz MA, Carloni P, Clementi C. Molecular recognition of DNA by ligands: roughness and complexity of the free energy profile. J Chem Phys 2014; 139:145102. [PMID: 24116648 DOI: 10.1063/1.4824106] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022]
Abstract
Understanding the molecular mechanism by which probes and chemotherapeutic agents bind to nucleic acids is a fundamental issue in modern drug design. From a computational perspective, valuable insights are gained by the estimation of free energy landscapes as a function of some collective variables (CVs), which are associated with the molecular recognition event. Unfortunately, the choice of CVs is highly non-trivial because of DNA's high flexibility and the presence of multiple association-dissociation events at different locations and/or sliding within the grooves. Here we have applied a modified version of Locally-Scaled Diffusion Map (LSDMap), a nonlinear dimensionality reduction technique for decoupling multiple-timescale dynamics in macromolecular systems, to a metadynamics-based free energy landscape calculated using a set of intuitive CVs. We investigated the binding of the organic drug anthramycin to a DNA 14-mer duplex. By performing an extensive set of metadynamics simulations, we observed sliding of anthramycin along the full-length DNA minor groove, as well as several detachments from multiple sites, including the one identified by X-ray crystallography. As in the case of equilibrium processes, the LSDMap analysis is able to extract the most relevant collective motions, which are associated with the slow processes within the system, i.e., ligand diffusion along the minor groove and dissociation from it. Thus, LSDMap in combination with metadynamics (and possibly any equivalent method) emerges as a powerful method to describe the energetics of ligand binding to DNA without resorting to intuitive ad hoc reaction coordinates.
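The core of a locally scaled diffusion map is a Gaussian kernel whose width adapts to each point, K_ij = exp(−d_ij² / (2 ε_i ε_j)). In LSDMap the local scales ε_i are derived from a local analysis of the data around each point; the sketch below swaps in a simpler, hypothetical choice (the distance to the k-th nearest neighbour) and uses Euclidean distances rather than the structural metrics typical for molecular data. It assumes there are no duplicate points, which would give ε_i = 0.

```python
import numpy as np


def locally_scaled_kernel(X, k=7):
    """Gaussian kernel exp(-d_ij^2 / (2 eps_i eps_j)) with per-point scales eps_i."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Hypothetical local scale: distance to the k-th nearest neighbour
    # (column 0 of each sorted row is the point itself).
    eps = np.sqrt(np.sort(d2, axis=1)[:, k])
    return np.exp(-d2 / (2.0 * np.outer(eps, eps)))


def markov_matrix(K):
    """Row-normalize a kernel into a Markov transition matrix."""
    return K / K.sum(axis=1, keepdims=True)
```

The row-normalized matrix can then be diagonalized as in an ordinary diffusion map, with its leading nontrivial eigenvectors serving as collective coordinates.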
Affiliation(s)
- Wenwei Zheng
- Department of Chemistry, Rice University, Houston, Texas 77005, USA
27
Long AW, Ferguson AL. Nonlinear Machine Learning of Patchy Colloid Self-Assembly Pathways and Mechanisms. J Phys Chem B 2014; 118:4228-44. [DOI: 10.1021/jp500350b] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Indexed: 01/10/2023]
Affiliation(s)
- Andrew W. Long
- Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Andrew L. Ferguson
- Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
28
Leveles I, Németh V, Szabó JE, Harmat V, Nyíri K, Bendes ÁÁ, Papp-Kádár V, Zagyva I, Róna G, Ozohanics O, Vékey K, Tóth J, Vértessy BG. Structure and enzymatic mechanism of a moonlighting dUTPase. Acta Crystallogr D Biol Crystallogr 2013; 69:2298-308. [PMID: 24311572 DOI: 10.1107/s0907444913021136] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 03/14/2013] [Accepted: 07/29/2013] [Indexed: 02/08/2023]
Abstract
Genome integrity requires well controlled cellular pools of nucleotides. dUTPases are responsible for regulating cellular dUTP levels and providing dUMP for dTTP biosynthesis. In Staphylococcus, phage dUTPases are also suggested to be involved in a moonlighting function regulating the expression of pathogenicity-island genes. Staphylococcal phage trimeric dUTPase sequences include a specific insertion that is not found in other organisms. Here, a 2.1 Å resolution three-dimensional structure of a ϕ11 phage dUTPase trimer with complete localization of the phage-specific insert, which folds into a small β-pleated mini-domain reaching out from the dUTPase core surface, is presented. The insert mini-domains jointly coordinate a single Mg2+ ion per trimer at the entrance to the threefold inner channel. Structural results provide an explanation for the role of Asp95, which is suggested to have functional significance in the moonlighting activity, as the metal-ion-coordinating moiety potentially involved in correct positioning of the insert. Enzyme-kinetics studies of wild-type and mutant constructs show that the insert has no major role in dUTP binding or cleavage and provide a description of the elementary steps (fast binding of substrate and release of products). In conclusion, the structural and kinetic data allow insights into both the phage-specific characteristics and the generally conserved traits of ϕ11 phage dUTPase.
Affiliation(s)
- Ibolya Leveles
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, 29 Karolina Street, 1113 Budapest, Hungary
29
Zheng W, Rohrdanz MA, Clementi C. Rapid exploration of configuration space with diffusion-map-directed molecular dynamics. J Phys Chem B 2013; 117:12769-76. [PMID: 23865517 PMCID: PMC3808479 DOI: 10.1021/jp401911h] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Indexed: 12/28/2022]
Abstract
The gap between the time scale of interesting behavior in macromolecular systems and that which our computational resources can afford often prevents molecular dynamics (MD) from explaining experimental results and predicting what is inaccessible to experiments. In this paper, we introduce a new sampling scheme, named diffusion-map-directed MD (DM-d-MD), to rapidly explore molecular configuration space. The method uses a diffusion map to guide MD on the fly. DM-d-MD can be combined with other methods to reconstruct the equilibrium free energy, and here, we used umbrella sampling as an example. We present results from two systems: alanine dipeptide and alanine-12. In both systems, we gain a tremendous speedup with respect to standard MD both in exploring the configuration space and in reconstructing the equilibrium distribution. In particular, we obtain a speedup of 3 orders of magnitude over standard MD in the exploration of the configurational space of alanine-12 at 300 K with DM-d-MD. The method is reaction coordinate free and minimally dependent on a priori knowledge of the system. We expect wide applications of DM-d-MD to other macromolecular systems in which equilibrium sampling is not affordable by standard MD.
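One step of a diffusion-map-guided loop, choosing where to restart MD, can be sketched under stated assumptions: the first diffusion coordinate `dc1` of all frames sampled so far is taken as given, and new simulations start from frames at its extremes, i.e. at the edge of the explored region. The function name, the number of restarts, and whether one or both extremes are used are hypothetical choices here, and the umbrella-sampling reweighting is not shown.

```python
import numpy as np


def select_frontier(dc1, n_restart=2, both_ends=True):
    """Indices of frames at the extremes of the first diffusion coordinate dc1."""
    order = np.argsort(np.asarray(dc1))
    if not both_ends:
        # Only the high end of dc1 (the sign of a diffusion coordinate is arbitrary).
        return order[len(order) - n_restart:]
    # Split the restarts between the low and high ends of dc1.
    n_low = n_restart // 2
    n_high = n_restart - n_low
    return np.concatenate([order[:n_low], order[len(order) - n_high:]])
```

In a full loop, short MD runs from the selected frames would be appended to the data set and the diffusion map recomputed before the next selection.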
Affiliation(s)
- Wenwei Zheng
- Department of Chemistry, Rice University, Houston TX 77005
30
Deng NJ, Dai W, Levy RM. How kinetics within the unfolded state affects protein folding: an analysis based on Markov state models and an ultra-long MD trajectory. J Phys Chem B 2013; 117:12787-99. [PMID: 23705683 DOI: 10.1021/jp401962k] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Indexed: 02/03/2023]
Abstract
Understanding how kinetics in the unfolded state affects protein folding is a fundamentally important yet less well-understood issue. Here we employ three different models to analyze the unfolded landscape and folding kinetics of the miniprotein Trp-cage. The first is a 208 μs explicit solvent molecular dynamics (MD) simulation from D. E. Shaw Research containing tens of folding events. The second is a Markov state model (MSM-MD) constructed from the same ultralong MD simulation; MSM-MD can be used to generate thousands of folding events. The third is a Markov state model built from temperature replica exchange MD simulations in implicit solvent (MSM-REMD). All the models exhibit multiple folding pathways, and there is a good correspondence between the folding pathways from direct MD and those computed from the MSMs. The unfolded populations interconvert rapidly between extended and collapsed conformations on time scales ≤40 ns, compared with the folding time of ∼5 μs. The folding rates are independent of where the folding is initiated from within the unfolded ensemble. About 90% of the unfolded states are sampled within the first 40 μs of the ultralong MD trajectory, which on average explores ∼27% of the unfolded state ensemble between consecutive folding events. We clustered the folding pathways according to structural similarity into "tubes", and kinetically partitioned the unfolded state into populations that fold along different tubes. From our analysis of the simulations and a simple kinetic model, we find that, when the mixing within the unfolded state is comparable to or faster than folding, the folding waiting times for all the folding tubes are similar and the folding kinetics is essentially single exponential despite the presence of heterogeneous folding paths with nonuniform barriers. When the mixing is much slower than folding, different unfolded populations fold independently, leading to nonexponential kinetics. 
A kinetic partition of the Trp-cage unfolded state is constructed which reveals that different unfolded populations have almost the same probability to fold along any of the multiple folding paths. We are investigating whether the results for the kinetics in the unfolded state of the 20-residue Trp-cage are representative of larger single domain proteins.
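The two mixing regimes in this abstract can be summarized with a generic kinetic picture (an illustration under simple assumptions, not the authors' kinetic model): let unfolded subpopulation i have equilibrium weight p_i and first-order folding rate k_i, and let S(t) be the fraction of molecules still unfolded at time t.

```latex
S_{\text{slow}}(t) = \sum_i p_i \, e^{-k_i t},
\qquad
S_{\text{fast}}(t) \approx e^{-\bar{k} t},
\quad \bar{k} = \sum_i p_i \, k_i
```

When the k_i differ, the slow-mixing sum is multi-exponential; fast mixing keeps the subpopulations at their equilibrium weights, so a single averaged rate emerges and the kinetics look single-exponential.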
Affiliation(s)
- Nan-jie Deng
- BioMaPS Institute for Quantitative Biology and Department of Chemistry and Chemical Biology, Rutgers, the State University of New Jersey, Piscataway, New Jersey 08854, United States
31
Kalgin IV, Caflisch A, Chekmarev SF, Karplus M. New insights into the folding of a β-sheet miniprotein in a reduced space of collective hydrogen bond variables: application to a hydrodynamic analysis of the folding flow. J Phys Chem B 2013; 117:6092-105. [PMID: 23621790 DOI: 10.1021/jp401742y] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 11/29/2022]
Abstract
A new analysis of the 20 μs equilibrium folding/unfolding molecular dynamics simulations of the three-stranded antiparallel β-sheet miniprotein (beta3s) in implicit solvent is presented. The conformation space is reduced in dimensionality by introduction of linear combinations of hydrogen bond distances as the collective variables making use of a specially adapted principal component analysis (PCA); i.e., to make structured conformations more pronounced, only the formed bonds are included in determining the principal components. It is shown that a three-dimensional (3D) subspace gives a meaningful representation of the folding behavior. The first component, to which eight native hydrogen bonds make the major contribution (four in each beta hairpin), is found to play the role of the reaction coordinate for the overall folding process, while the second and third components distinguish the structured conformations. The representative points of the trajectory in the 3D space are grouped into conformational clusters that correspond to locally stable conformations of beta3s identified in earlier work. A simplified kinetic network based on the three components is constructed, and it is complemented by a hydrodynamic analysis. The latter, making use of "passive tracers" in 3D space, indicates that the folding flow is much more complex than suggested by the kinetic network. A 2D representation of streamlines shows there are vortices which correspond to repeated local rearrangement, not only around minima of the free energy surface but also in flat regions between minima. The vortices revealed by the hydrodynamic analysis are apparently not evident in folding pathways generated by transition-path sampling. Making use of the fact that the values of the collective hydrogen bond variables are linearly related to the Cartesian coordinate space, the RMSD between clusters is determined. 
Interestingly, the transition rates show an approximate exponential correlation with distance in the hydrogen bond subspace. Comparison with the many published studies shows good agreement with the present analysis for the parts that can be compared, supporting the robust character of our understanding of this "hydrogen atom" of protein folding.
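The dimensionality reduction in this study starts from principal component analysis of hydrogen bond distances. A plain PCA via eigendecomposition of the covariance matrix is sketched below; the paper's adaptation (including only formed bonds when determining the components) is not reproduced, and the input matrix `X` (frames × distances) is an assumption.

```python
import numpy as np


def pca(X, n_components=3):
    """Project mean-centred data onto its leading principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Sample covariance matrix of the input features.
    cov = Xc.T @ Xc / (len(X) - 1)
    # eigh returns eigenvalues in ascending order; reorder to descending variance.
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1][:n_components]
    components = evecs[:, order]
    # Scores are the linear collective variables, one column per component.
    return Xc @ components, evals[order], components
```

Because each score is a linear combination of the input distances, positions in the reduced space map back linearly onto the original variables.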
Affiliation(s)
- Igor V Kalgin
- Department of Physics, Novosibirsk State University, 630090 Novosibirsk, Russia
32
Dama JF, Sinitskiy AV, McCullagh M, Weare J, Roux B, Dinner AR, Voth GA. The Theory of Ultra-Coarse-Graining. 1. General Principles. J Chem Theory Comput 2013; 9:2466-80. [PMID: 26583735 DOI: 10.1021/ct4000444] [Citation(s) in RCA: 122] [Impact Index Per Article: 10.2] [Indexed: 02/02/2023]
Abstract
Coarse-grained (CG) models provide a computationally efficient means to study biomolecular and other soft matter processes involving large numbers of atoms correlated over distance scales of many covalent bond lengths and long time scales. Variational methods based on information from simulations of finer-grained (e.g., all-atom) models, for example the multiscale coarse-graining (MS-CG) and relative entropy minimization methods, provide attractive tools for the systematic development of CG models. However, these methods have important drawbacks when used in the "ultra-coarse-grained" (UCG) regime, e.g., at a resolution level coarser or much coarser than one amino acid residue per effective CG particle in proteins. This is due to the possible existence of multiple metastable states "within" the CG sites for a given UCG model configuration. In this work, systematic variational UCG methods are presented that are specifically designed to CG entire protein domains and subdomains into single effective CG particles. This is accomplished by augmenting existing effective particle CG schemes to allow for discrete state transitions and configuration-dependent resolution. Additionally, certain conclusions of this work connect back to single-state force matching and open up new avenues for method development in that area. These results provide a formal statistical mechanical basis for UCG methods related to force matching and relative entropy CG methods and suggest practical algorithms for constructing optimal approximate UCG models from fine-grained simulation data.
Affiliation(s)
- James F Dama
- Department of Chemistry and Institute for Biophysical Dynamics, Computation Institute, James Franck Institute, Department of Mathematics, Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637, United States
- Anton V Sinitskiy
- Department of Chemistry and Institute for Biophysical Dynamics, Computation Institute, James Franck Institute, Department of Mathematics, Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637, United States
- Martin McCullagh
- Department of Chemistry and Institute for Biophysical Dynamics, Computation Institute, James Franck Institute, Department of Mathematics, Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637, United States
- Jonathan Weare
- Department of Chemistry and Institute for Biophysical Dynamics, Computation Institute, James Franck Institute, Department of Mathematics, Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637, United States
- Benoît Roux
- Department of Chemistry and Institute for Biophysical Dynamics, Computation Institute, James Franck Institute, Department of Mathematics, Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637, United States
- Aaron R Dinner
- Department of Chemistry and Institute for Biophysical Dynamics, Computation Institute, James Franck Institute, Department of Mathematics, Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637, United States
- Gregory A Voth
- Department of Chemistry and Institute for Biophysical Dynamics, Computation Institute, James Franck Institute, Department of Mathematics, Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637, United States
33
Duan M, Fan J, Li M, Han L, Huo S. Evaluation of Dimensionality-reduction Methods from Peptide Folding-unfolding Simulations. J Chem Theory Comput 2013; 9:2490-2497. [PMID: 23772182 DOI: 10.1021/ct400052y] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 11/30/2022]
Abstract
Dimensionality-reduction methods have been widely used to study the free energy landscapes and low-free-energy pathways of molecular systems. It has been shown that non-linear dimensionality-reduction methods give better embedding results than linear methods, such as principal component analysis, in some simple systems. In this study, we evaluated several non-linear methods (locally linear embedding, Isomap, and diffusion maps), as well as principal component analysis, on the equilibrium folding/unfolding trajectory of the second β-hairpin of the B1 domain of streptococcal protein G. The CHARMM param19 polar hydrogen potential function was used. A series of criteria reflecting different aspects of embedding quality was employed in the evaluation. Our results show that principal component analysis performs no worse than the non-linear methods on this complex system. No method is best in all aspects of the evaluation; each has limitations in a certain aspect. We emphasize that a fair, informative assessment of an embedding result requires a combination of multiple evaluation criteria rather than any single one. Caution should be used when dimensionality-reduction methods are employed, especially when only the top few embedding dimensions are used to describe the free energy landscape.
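The caution about describing a landscape with only the top few embedding dimensions can be made concrete with a minimal principal component analysis sketch. It assumes a hypothetical array `coords` of shape (n_frames, n_features), for example flattened Cartesian coordinates that have already been aligned to a reference structure; the function name `pca_embed` is illustrative, and this is not the authors' evaluation protocol. It simply reports how much of the total variance the first k components capture.

```python
import numpy as np


def pca_embed(coords: np.ndarray, k: int):
    """Project trajectory frames onto the top-k principal components.

    coords: hypothetical (n_frames, n_features) array, e.g. flattened,
    pre-aligned Cartesian coordinates (an assumption, not from the source).
    Returns (embedding of shape (n_frames, k), explained-variance ratio
    for every component, sorted in decreasing order).
    """
    # Center each feature so PCA describes fluctuations about the mean structure.
    centered = coords - coords.mean(axis=0)
    # Thin SVD of the centered data; rows of vt are the principal axes,
    # and singular values in s come back sorted in decreasing order.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Variance along each principal axis is s_i^2 / (n_frames - 1).
    var = s**2 / (coords.shape[0] - 1)
    ratio = var / var.sum()
    # Coordinates of each frame in the top-k principal subspace.
    embedding = centered @ vt[:k].T
    return embedding, ratio


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for trajectory data: 500 frames, 30 features.
    coords = rng.normal(size=(500, 30))
    emb, ratio = pca_embed(coords, k=2)
    print(emb.shape)  # (500, 2)
    print(f"variance captured by top 2 PCs: {ratio[:2].sum():.2%}")
```

Reporting `ratio[:k].sum()` next to any low-dimensional plot makes explicit how much of the fluctuation the plotted dimensions leave out, which is the situation the abstract warns about; a non-linear embedding would need its own quality criteria rather than this variance measure.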
Affiliation(s)
- Mojie Duan
- Gustaf H. Carlson School of Chemistry and Biochemistry, Clark University, Worcester, MA 01610 USA
34
Rohrdanz MA, Zheng W, Clementi C. Discovering Mountain Passes via Torchlight: Methods for the Definition of Reaction Coordinates and Pathways in Complex Macromolecular Reactions. Annu Rev Phys Chem 2013; 64:295-316. [DOI: 10.1146/annurev-physchem-040412-110006] [Citation(s) in RCA: 150] [Impact Index Per Article: 12.5] [Indexed: 11/09/2022]
Affiliation(s)
- Wenwei Zheng
- Department of Chemistry, Rice University, Houston, Texas 77005
35
Guttenberg N, Dama JF, Saunders MG, Voth GA, Weare J, Dinner AR. Minimizing memory as an objective for coarse-graining. J Chem Phys 2013; 138:094111. [DOI: 10.1063/1.4793313] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
36
Abdul-Wahid B, Yu L, Rajan D, Feng H, Darve E, Thain D, Izaguirre JA. Folding Proteins at 500 ns/hour with Work Queue. PROCEEDINGS ... IEEE INTERNATIONAL CONFERENCE ON ESCIENCE. IEEE INTERNATIONAL CONFERENCE ON ESCIENCE 2012; 2012:1-8. [PMID: 25540799 DOI: 10.1109/escience.2012.6404429] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
Molecular modeling is a field that traditionally has large computational costs. Until recently, most simulation techniques relied on long trajectories, which inherently scale poorly. A new class of methods has been proposed that requires only a large number of short calculations and minimal communication between computer nodes. We considered one of the more accurate variants, Accelerated Weighted Ensemble Dynamics (AWE), for which distributed computing can be made efficient. We implemented AWE using the Work Queue framework for task management and applied it to an all-atom protein model (Fip35 WW domain). The approach runs with excellent scalability by simultaneously using heterogeneous resources from multiple computing platforms, such as clouds (Amazon EC2, Microsoft Azure), dedicated clusters, and grids, on multiple architectures (CPU/GPU, 32/64-bit), and in a dynamic environment in which processes are regularly added to or removed from the pool. This has allowed us to achieve an aggregate sampling rate of over 500 ns/hour. For comparison, a single process typically achieves 0.1 ns/hour.
Affiliation(s)
- Badi' Abdul-Wahid
- University of Notre Dame, Notre Dame, IN 46656; Department of Computer Science & Engineering; Interdisciplinary Center for Network Science and Applications
- Li Yu
- University of Notre Dame, Notre Dame, IN 46656; Department of Computer Science & Engineering
- Dinesh Rajan
- University of Notre Dame, Notre Dame, IN 46656; Department of Computer Science & Engineering
- Haoyun Feng
- University of Notre Dame, Notre Dame, IN 46656; Department of Computer Science & Engineering; Interdisciplinary Center for Network Science and Applications
- Eric Darve
- Stanford University, 450 Serra Mall, Stanford, CA 94305; Department of Mechanical Engineering; Institute for Computational and Mathematical Engineering
- Douglas Thain
- University of Notre Dame, Notre Dame, IN 46656; Department of Computer Science & Engineering
- Jesús A Izaguirre
- University of Notre Dame, Notre Dame, IN 46656; Department of Computer Science & Engineering; Interdisciplinary Center for Network Science and Applications
37
Chebaro Y, Pasquali S, Derreumaux P. The Coarse-Grained OPEP Force Field for Non-Amyloid and Amyloid Proteins. J Phys Chem B 2012; 116:8741-52. [DOI: 10.1021/jp301665f] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.6] [Indexed: 12/14/2022]
Affiliation(s)
- Yassmine Chebaro
- Laboratoire de Biochimie Théorique, CNRS UPR 9080, Université Paris Diderot, Sorbonne Paris Cité, Institut de Biologie Physico-Chimique, 13 rue Pierre et Marie Curie, 75005 Paris
- Samuela Pasquali
- Laboratoire de Biochimie Théorique, CNRS UPR 9080, Université Paris Diderot, Sorbonne Paris Cité, Institut de Biologie Physico-Chimique, 13 rue Pierre et Marie Curie, 75005 Paris
- Philippe Derreumaux
- Laboratoire de Biochimie Théorique, CNRS UPR 9080, Université Paris Diderot, Sorbonne Paris Cité, Institut de Biologie Physico-Chimique, 13 rue Pierre et Marie Curie, 75005 Paris
- Institut Universitaire de France, 103 Bvd Saint-Michel, Paris 75005, France