1
Wu Y, Cao S, Qiu Y, Huang X. Tutorial on how to build non-Markovian dynamic models from molecular dynamics simulations for studying protein conformational changes. J Chem Phys 2024; 160:121501. [PMID: 38516972 DOI: 10.1063/5.0189429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 02/20/2024] [Indexed: 03/23/2024] Open
Abstract
Protein conformational changes play crucial roles in their biological functions. In recent years, the Markov State Model (MSM) constructed from extensive Molecular Dynamics (MD) simulations has emerged as a powerful tool for modeling complex protein conformational changes. In MSMs, dynamics are modeled as a sequence of Markovian transitions among metastable conformational states at discrete time intervals (called lag time). A major challenge for MSMs is that the lag time must be long enough to allow transitions among states to become memoryless (or Markovian). However, this lag time is constrained by the length of individual MD simulations available to track these transitions. To address this challenge, we have recently developed Generalized Master Equation (GME)-based approaches, encoding non-Markovian dynamics using a time-dependent memory kernel. In this Tutorial, we introduce the theory behind two recently developed GME-based non-Markovian dynamic models: the quasi-Markov State Model (qMSM) and the Integrative Generalized Master Equation (IGME). We subsequently outline the procedures for constructing these models and provide a step-by-step tutorial on applying qMSM and IGME to study two peptide systems: alanine dipeptide and villin headpiece. This Tutorial is available at https://github.com/xuhuihuang/GME_tutorials. The protocols detailed in this Tutorial aim to be accessible for non-experts interested in studying the biomolecular dynamics using these non-Markovian dynamic models.
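To make the role of the lag time concrete, the following is a minimal sketch, not code from the tutorial repository, of how a row-normalized transition probability matrix might be estimated from a discretized trajectory by counting transitions separated by a chosen lag. The function name `estimate_transition_matrix` and the toy trajectory are hypothetical, and the sketch assumes states are labeled with integers from 0 to `n_states - 1`.

```python
from collections import Counter


def estimate_transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition matrix from a discrete trajectory.

    Counts pairs (dtraj[t], dtraj[t + lag]) with a sliding window and
    normalizes each row. Rows with no counts are left as all zeros.
    """
    if lag < 1 or lag >= len(dtraj):
        raise ValueError("lag must satisfy 1 <= lag < len(dtraj)")
    # Pair every frame with the frame `lag` steps later and count the pairs.
    counts = Counter(zip(dtraj[:-lag], dtraj[lag:]))
    matrix = []
    for i in range(n_states):
        row = [counts.get((i, j), 0) for j in range(n_states)]
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy two-state trajectory (hypothetical data, for illustration only).
dtraj = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]
T = estimate_transition_matrix(dtraj, n_states=2, lag=1)
```

Repeating the estimate at several values of `lag` and comparing the resulting matrices is one simple way to see how strongly the model depends on the lag time.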
Affiliation(s)
- Yue Wu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Siqin Cao
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Yunrui Qiu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Xuhui Huang
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Data Science Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
2
Nagel D, Sartore S, Stock G. Toward a Benchmark for Markov State Models: The Folding of HP35. J Phys Chem Lett 2023; 14:6956-6967. [PMID: 37504674 DOI: 10.1021/acs.jpclett.3c01561] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 07/29/2023]
Abstract
Adopting a 300 μs long MD trajectory of the folding of villin headpiece (HP35) by D. E. Shaw Research, we recently constructed a Markov state model (MSM) based on inter-residue contacts. The model reproduces the folding time and predicts that the native basin and unfolded region consist of metastable substates that are structurally well-characterized. Recognizing the need to establish well-defined benchmark problems, we study to what extent and in what sense this MSM can be employed as a reference model. Hence, we test the robustness of the MSM by comparing it to models that use alternative combinations of features, dimensionality reduction methods, and clustering schemes. The study suggests some main characteristics of the folding of HP35 that should be reproduced by other competitive models. Moreover, the discussion reveals which parts of the MSM workflow matter most for the considered problem and illustrates the promises and pitfalls of state-based models for the interpretation of biomolecular simulations.
Affiliation(s)
- Daniel Nagel
- Biomolecular Dynamics, Institute of Physics, University of Freiburg, 79104 Freiburg, Germany
- Sofia Sartore
- Biomolecular Dynamics, Institute of Physics, University of Freiburg, 79104 Freiburg, Germany
- Gerhard Stock
- Biomolecular Dynamics, Institute of Physics, University of Freiburg, 79104 Freiburg, Germany
3
Voelz VA, Pande VS, Bowman GR. Folding@home: Achievements from over 20 years of citizen science herald the exascale era. Biophys J 2023; 122:2852-2863. [PMID: 36945779 PMCID: PMC10398258 DOI: 10.1016/j.bpj.2023.03.028] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 09/02/2022] [Revised: 01/26/2023] [Accepted: 03/16/2023] [Indexed: 03/23/2023] Open
Abstract
Simulations of biomolecules have enormous potential to inform our understanding of biology but require extremely demanding calculations. For over 20 years, the Folding@home distributed computing project has pioneered a massively parallel approach to biomolecular simulation, harnessing the resources of citizen scientists across the globe. Here, we summarize the scientific and technical advances this perspective has enabled. As the project's name implies, the early years of Folding@home focused on driving advances in our understanding of protein folding by developing statistical methods for capturing long-timescale processes and facilitating insight into complex dynamical processes. Success laid a foundation for broadening the scope of Folding@home to address other functionally relevant conformational changes, such as receptor signaling, enzyme dynamics, and ligand binding. Continued algorithmic advances, hardware developments such as graphics processing unit (GPU)-based computing, and the growing scale of Folding@home have enabled the project to focus on new areas where massively parallel sampling can be impactful. While previous work sought to expand toward larger proteins with slower conformational changes, new work focuses on large-scale comparative studies of different protein sequences and chemical compounds to better understand biology and inform the development of small-molecule drugs. Progress on these fronts enabled the community to pivot quickly in response to the COVID-19 pandemic, expanding to become the world's first exascale computer and deploying this massive resource to provide insight into the inner workings of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus and aid the development of new antivirals. This success provides a glimpse of what is to come as exascale supercomputers come online and as Folding@home continues its work.
Affiliation(s)
- Vincent A Voelz
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania
- Gregory R Bowman
- Departments of Biochemistry & Biophysics and of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania
4
Qiu Y, O’Connor MS, Xue M, Liu B, Huang X. An Efficient Path Classification Algorithm Based on Variational Autoencoder to Identify Metastable Path Channels for Complex Conformational Changes. J Chem Theory Comput 2023; 19:4728-4742. [PMID: 37382437 PMCID: PMC11042546 DOI: 10.1021/acs.jctc.3c00318] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 06/30/2023]
Abstract
Conformational changes (i.e., dynamic transitions between pairs of conformational states) play important roles in many chemical and biological processes. Constructing the Markov state model (MSM) from extensive molecular dynamics (MD) simulations is an effective approach to dissect the mechanism of conformational changes. When combined with transition path theory (TPT), MSM can be applied to elucidate the ensemble of kinetic pathways connecting pairs of conformational states. However, the application of TPT to analyze complex conformational changes often results in a vast number of kinetic pathways with comparable fluxes. This obstacle is particularly pronounced in heterogeneous self-assembly and aggregation processes. The large number of kinetic pathways makes it challenging to comprehend the molecular mechanisms underlying conformational changes of interest. To address this challenge, we have developed a path classification algorithm named latent-space path clustering (LPC) that efficiently lumps parallel kinetic pathways into distinct metastable path channels, making them easier to comprehend. In our algorithm, MD conformations are first projected onto a low-dimensional space containing a small set of collective variables (CVs) by time-structure-based independent component analysis (tICA) with kinetic mapping. Then, MSM and TPT are constructed to obtain the ensemble of pathways, and a deep learning architecture named the variational autoencoder (VAE) is used to learn the spatial distributions of kinetic pathways in the continuous CV space. Based on the trained VAE model, the TPT-generated ensemble of kinetic pathways can be embedded into a latent space, where the classification becomes clear. We show that LPC can efficiently and accurately identify the metastable path channels in three systems: a 2D potential, the aggregation of two hydrophobic particles in water, and the folding of the Fip35 WW domain. 
Using the 2D potential, we further demonstrate that our LPC algorithm outperforms the previous path-lumping algorithms by making substantially fewer incorrect assignments of individual pathways to four path channels. We expect that LPC can be widely applied to identify the dominant kinetic pathways underlying complex conformational changes.
Affiliation(s)
- Yunrui Qiu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Michael S. O’Connor
- Biophysics Graduate Program, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Mingyi Xue
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Bojun Liu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Xuhui Huang
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Biophysics Graduate Program, University of Wisconsin-Madison, Madison, WI, 53706, USA
5
Gorgulla C, Jayaraj A, Fackeldey K, Arthanari H. Emerging frontiers in virtual drug discovery: From quantum mechanical methods to deep learning approaches. Curr Opin Chem Biol 2022; 69:102156. [PMID: 35576813 PMCID: PMC9990419 DOI: 10.1016/j.cbpa.2022.102156] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/05/2021] [Revised: 03/16/2022] [Accepted: 04/07/2022] [Indexed: 11/19/2022]
Abstract
Virtual screening-based approaches to discover initial hit and lead compounds have the potential to reduce both the cost and time of early drug discovery stages, as well as to find inhibitors for even challenging target sites such as protein-protein interfaces. Here in this review, we provide an overview of the progress that has been made in virtual screening methodology and technology on multiple fronts in recent years. The advent of ultra-large virtual screens, in which hundreds of millions to billions of compounds are screened, has proven to be a powerful approach to discover highly potent hit compounds. However, these developments are just the tip of the iceberg, with new technologies and methods emerging to propel the field forward. Examples include novel machine-learning approaches, which can reduce the computational costs of virtual screening dramatically, while progress in quantum-mechanical approaches can increase the accuracy of predictions of various small molecule properties.
Affiliation(s)
- Christoph Gorgulla
- Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School (HMS), Boston, MA, USA; Department of Physics, Faculty of Arts and Sciences, Harvard University, Cambridge, MA, USA; Department of Cancer Biology, Dana-Farber Cancer Institute (DFCI), Boston, MA, USA
- Konstantin Fackeldey
- Institute of Mathematics, Technical University Berlin, Berlin, Germany; Zuse Institute Berlin, Berlin, Germany
- Haribabu Arthanari
- Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School (HMS), Boston, MA, USA; Department of Cancer Biology, Dana-Farber Cancer Institute (DFCI), Boston, MA, USA
6
Xu P, Mou X, Guo Q, Fu T, Ren H, Wang G, Li Y, Li G. Coarse-grained molecular dynamics study based on TorchMD. Chinese J Chem Phys 2021. [DOI: 10.1063/1674-0068/cjcp2110218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Affiliation(s)
- Peijun Xu
- Liaoning Normal University, Dalian 116029, China
- Xiaohong Mou
- Liaoning Normal University, Dalian 116029, China
- Qiuhan Guo
- Liaoning Normal University, Dalian 116029, China
- Ting Fu
- Pharmacy Department of Affiliated Zhongshan Hospital of Dalian University, Dalian 116001, China
- Hong Ren
- Department of Ophthalmology Aerospace Center Hospital, Beijing 100049, China
- Guiyan Wang
- Dalian Ocean University, Dalian 116029, China
- Yan Li
- Dalian Institute of Chemical Physics, State Key Laboratory of Molecular Reaction Dynamics, Dalian 116023, China
- Guohui Li
- Dalian Institute of Chemical Physics, State Key Laboratory of Molecular Reaction Dynamics, Dalian 116023, China
7
Zhu L, Jiang H, Cao S, Unarta IC, Gao X, Huang X. Critical role of backbone coordination in the mRNA recognition by RNA induced silencing complex. Commun Biol 2021; 4:1345. [PMID: 34848812 PMCID: PMC8632932 DOI: 10.1038/s42003-021-02822-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/17/2021] [Accepted: 10/26/2021] [Indexed: 01/02/2023] Open
Abstract
Despite its functional importance, the molecular mechanism underlying target mRNA recognition by Argonaute (Ago) remains largely elusive. Based on extensive all-atom molecular dynamics simulations, we constructed a quasi-Markov State Model (qMSM) to reveal the dynamics during recognition at position 6-7 in the seed region of human Argonaute 2 (hAgo2). Interestingly, we found that the slowest mode of motion therein is not the gRNA-target base-pairing, but the coordination of the target phosphate groups with a set of positively charged residues of hAgo2. Moreover, the ability of Helix-7 to approach the PIWI and MID domains was found to reduce the effective volume accessible to the target mRNA and therefore facilitate both the backbone coordination and base-pair formation. Further mutant simulations revealed that alanine mutation of the D358 residue on Helix-7 enhanced a trap state to slow down the loading of target mRNA. A similar trap state was also observed when wobble pairs were introduced in g6 and g7, indicating the role of Helix-7 in suppressing non-canonical base-pairing. Our study pointed to a general mechanism for mRNA recognition by eukaryotic Agos and demonstrated the promise of qMSM in investigating complex conformational changes of biomolecular systems.
Affiliation(s)
- Lizhe Zhu
- Warshel Institute for Computational Biology, School of Life and Health Sciences, The Chinese University of Hong Kong (Shenzhen), Shenzhen, Guangdong, 518172, China
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Hanlun Jiang
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Department of Biochemistry, Institute for Protein Design, University of Washington, Seattle, WA, 98195, USA
- Siqin Cao
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Ilona Christy Unarta
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Xin Gao
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Xuhui Huang
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
8
Damjanovic J, Murphy JM, Lin YS. CATBOSS: Cluster Analysis of Trajectories Based on Segment Splitting. J Chem Inf Model 2021; 61:5066-5081. [PMID: 34608796 PMCID: PMC8549068 DOI: 10.1021/acs.jcim.1c00598] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/14/2023]
Abstract
Molecular dynamics (MD) simulations are an exceedingly and increasingly potent tool for molecular behavior prediction and analysis. However, the enormous wealth of data generated by these simulations can be difficult to process and render in a human-readable fashion. Cluster analysis is a commonly used way to partition data into structurally distinct states. We present a method that improves on the state of the art by taking advantage of the temporal information of MD trajectories to enable more accurate clustering at a lower memory cost. To date, cluster analysis of MD simulations has generally treated simulation snapshots as a mere collection of independent data points and attempted to separate them into different clusters based on structural similarity. This new method, cluster analysis of trajectories based on segment splitting (CATBOSS), applies density-peak-based clustering to classify trajectory segments learned by change detection. Applying the method to a synthetic toy model as well as four real-life data sets (trajectories of MD simulations of alanine dipeptide and valine dipeptide as well as two fast-folding proteins), we find CATBOSS to be robust and highly performant, yielding natural-looking cluster boundaries and greatly improving clustering resolution. As the classification of points into segments emphasizes density gaps in the data by grouping them close to the state means, CATBOSS applied to the valine dipeptide system is even able to account for a degree of freedom deliberately omitted from the input data set. We also demonstrate the potential utility of CATBOSS in distinguishing metastable states from transition segments as well as promising application to cases where there is little or no advance knowledge of intrinsic coordinates, making for a highly versatile analysis tool.
Affiliation(s)
- Jovan Damjanovic
- Department of Chemistry, Tufts University, Medford, Massachusetts 02155, United States
- James M Murphy
- Department of Mathematics, Tufts University, Medford, Massachusetts 02155, United States
- Yu-Shan Lin
- Department of Chemistry, Tufts University, Medford, Massachusetts 02155, United States
9
Konovalov K, Unarta IC, Cao S, Goonetilleke EC, Huang X. Markov State Models to Study the Functional Dynamics of Proteins in the Wake of Machine Learning. JACS Au 2021; 1:1330-1341. [PMID: 34604842 PMCID: PMC8479766 DOI: 10.1021/jacsau.1c00254] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 06/07/2021] [Indexed: 05/19/2023]
Abstract
Markov state models (MSMs) based on molecular dynamics (MD) simulations are routinely employed to study protein folding; however, their application to functional conformational changes of biomolecules is still limited. In the past few years, the field of computational chemistry has experienced a surge of advancements stemming from machine learning algorithms, and MSMs have not been left out. Unlike global processes, such as protein folding, the application of MSMs to functional conformational changes is challenging because they mostly consist of localized structural transitions. Therefore, it is critical to properly select a subset of structural features that can describe the slowest dynamics of these functional conformational changes. To address this challenge, we recommend several automatic feature selection methods such as Spectral-OASIS. To identify states in MSMs, the chosen features can be subject to dimensionality reduction methods such as TICA or deep-learning-based VAMPNets to project MD conformations onto a few collective variables for subsequent clustering. Another challenge for the application of MSMs to the study of functional conformational changes is the ability to comprehend their biophysical mechanisms, as MSMs built for these processes often require a large number of states. We recommend the recently developed quasi-MSMs (qMSMs) to address this issue. Compared to MSMs, qMSMs encode the non-Markovian dynamics via the generalized master equation and can significantly reduce the number of states. As a result, qMSMs can be built with a handful of states to facilitate the interpretation of functional conformational changes. In the wake of machine learning, we believe that the rapid advancement in the MSM methodology will lead to their wider application in studying functional conformational changes of biomolecules.
Affiliation(s)
- Kirill A. Konovalov
- Department of Chemistry, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Hong Kong Center for Neurodegenerative Diseases, Hong Kong Science Park, Hong Kong
- Ilona Christy Unarta
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Hong Kong Center for Neurodegenerative Diseases, Hong Kong Science Park, Hong Kong
- Siqin Cao
- Department of Chemistry, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Hong Kong Center for Neurodegenerative Diseases, Hong Kong Science Park, Hong Kong
- Eshani C. Goonetilleke
- Department of Chemistry, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Hong Kong Center for Neurodegenerative Diseases, Hong Kong Science Park, Hong Kong
- Xuhui Huang
- Department of Chemistry, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Hong Kong Center for Neurodegenerative Diseases, Hong Kong Science Park, Hong Kong
10
Glielmo A, Husic BE, Rodriguez A, Clementi C, Noé F, Laio A. Unsupervised Learning Methods for Molecular Simulation Data. Chem Rev 2021; 121:9722-9758. [PMID: 33945269 PMCID: PMC8391792 DOI: 10.1021/acs.chemrev.0c01195] [Citation(s) in RCA: 116] [Impact Index Per Article: 38.7] [Received: 11/05/2020] [Indexed: 12/21/2022]
Abstract
Unsupervised learning is becoming an essential tool to analyze the increasingly large amounts of data produced by atomistic and molecular simulations, in material science, solid state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms of dimensionality reduction, density estimation, and clustering, and kinetic models. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used, or can be used, to analyze molecular simulation data.
Affiliation(s)
- Aldo Glielmo
- International School for Advanced Studies (SISSA), 34014 Trieste, Italy
- Brooke E. Husic
- Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Alex Rodriguez
- International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
- Cecilia Clementi
- Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
- Rice University Houston, Department of Chemistry, Houston, Texas 77005, United States
- Frank Noé
- Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
- Rice University Houston, Department of Chemistry, Houston, Texas 77005, United States
- Alessandro Laio
- International School for Advanced Studies (SISSA), 34014 Trieste, Italy
- International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
11
Jiang H, Fan X. The Two-Step Clustering Approach for Metastable States Learning. Int J Mol Sci 2021; 22:6576. [PMID: 34205252 PMCID: PMC8233889 DOI: 10.3390/ijms22126576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/31/2021] [Revised: 06/14/2021] [Accepted: 06/14/2021] [Indexed: 01/20/2023] Open
Abstract
Understanding the energy landscape and the conformational dynamics is crucial for studying many biological or chemical processes, such as protein-protein interaction and RNA folding. Molecular Dynamics (MD) simulations have been a major source of dynamic structure. Although many methods were proposed for learning metastable states from MD data, some key problems are still in need of further investigation. Here, we give a brief review of recent progress in this field, with an emphasis on some popular methods belonging to a two-step clustering framework, and hope to draw more researchers to contribute to this area.
Affiliation(s)
- Hangjin Jiang
- Center for Data Science, Zhejiang University, Hangzhou 310058, China
- Xiaodan Fan
- Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
12
Marinova V, Dodd L, Lee SJ, Wood GPF, Marziano I, Salvalaglio M. Identifying Conformational Isomers of Organic Molecules in Solution via Unsupervised Clustering. J Chem Inf Model 2021; 61:2263-2273. [PMID: 33913713 PMCID: PMC8278389 DOI: 10.1021/acs.jcim.0c01387] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
We present a systematic approach for the identification of statistically relevant conformational macrostates of organic molecules from molecular dynamics trajectories. The approach applies to molecules characterized by an arbitrary number of torsional degrees of freedom and enables the transferability of the macrostates definition across different environments. We formulate a dissimilarity measure between molecular configurations that incorporates information on the characteristic energetic cost associated with transitions along all relevant torsional degrees of freedom. Such metric is employed to perform unsupervised clustering of molecular configurations based on the Fast Search and Find of Density Peaks algorithm. We apply this method to investigate the equilibrium conformational ensemble of Sildenafil, a conformationally complex pharmaceutical compound, in different environments including the crystal bulk, the gas phase, and three different solvents (acetonitrile, 1-butanol, and toluene). We demonstrate that while Sildenafil can adopt more than 100 metastable conformational configurations, only 12 are significantly populated across all of the environments investigated. Despite the complexity of the conformational space, we find that the most abundant conformers in solution are the closest to the conformers found in the most common Sildenafil crystal phase.
Affiliation(s)
- Veselina Marinova
- Thomas Young Centre and Department of Chemical Engineering, University College London, London WC1E 7JE, U.K.; Department of Materials Science and Engineering, The University of Sheffield, Sheffield S1 3JD, U.K.
- Laurence Dodd
- Thomas Young Centre and Department of Chemical Engineering, University College London, London WC1E 7JE, U.K.
- Song-Jun Lee
- Thomas Young Centre and Department of Chemical Engineering, University College London, London WC1E 7JE, U.K.
- Geoffrey P F Wood
- Pfizer Worldwide Research and Development, Groton Laboratories, Groton, Connecticut 06340, United States
- Ivan Marziano
- Pfizer Worldwide Research and Development, Sandwich CT13 9NJ, Kent, U.K.
- Matteo Salvalaglio
- Thomas Young Centre and Department of Chemical Engineering, University College London, London WC1E 7JE, U.K
13
Weiß RG, Ries B, Wang S, Riniker S. Volume-scaled common nearest neighbor clustering algorithm with free-energy hierarchy. J Chem Phys 2021; 154:084106. [PMID: 33639726 DOI: 10.1063/5.0025797] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022] Open
Abstract
The combination of Markov state modeling (MSM) and molecular dynamics (MD) simulations has been shown in recent years to be a valuable approach to unravel the slow processes of molecular systems with increasing complexity. While the algorithms for intermediate steps in the MSM workflow such as featurization and dimensionality reduction have been specifically adapted to MD datasets, conventional clustering methods are generally applied to the discretization step. This work adds to recent efforts to develop specialized density-based clustering algorithms for the Boltzmann-weighted data from MD simulations. We introduce the volume-scaled common nearest neighbor (vs-CNN) clustering that is an adapted version of the common nearest neighbor (CNN) algorithm. A major advantage of the proposed algorithm is that the introduced density-based criterion directly links to a free-energy notion via Boltzmann inversion. Such a free-energy perspective allows a straightforward hierarchical scheme to identify conformational clusters at different levels of a generally rugged free-energy landscape of complex molecular systems.
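The Boltzmann inversion mentioned in this abstract can be illustrated with a short sketch. This is not the vs-CNN algorithm itself; it only shows the general relation F_i = -k_B T ln(p_i / p_max) that converts relative populations into free energies relative to the most populated state. The function name `boltzmann_inversion`, the default temperature of 300 K, and the example populations are assumptions made for illustration.

```python
import math

# Molar gas constant in kJ/(mol K), so free energies come out in kJ/mol.
K_B = 0.0083144626


def boltzmann_inversion(populations, temperature=300.0):
    """Relative free energies (kJ/mol) from state populations.

    The most populated state is taken as the zero of free energy.
    States with zero population are assigned infinite free energy.
    """
    total = sum(populations)
    probs = [p / total for p in populations]
    p_max = max(probs)
    kt = K_B * temperature
    return [-kt * math.log(p / p_max) if p > 0 else math.inf for p in probs]


# Hypothetical cluster populations, e.g. frame counts per cluster.
free_energies = boltzmann_inversion([80, 15, 5])
```

Because a denser (more populated) region maps to a lower free energy, thresholding on density is equivalent to thresholding on free energy, which is the link the abstract describes.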
Affiliation(s)
- R Gregor Weiß
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
| | - Benjamin Ries
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
| | - Shuzhe Wang
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
| | - Sereina Riniker
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
| |
|
14
|
Lickert B, Stock G. Modeling non-Markovian data using Markov state and Langevin models. J Chem Phys 2020; 153:244112. [DOI: 10.1063/5.0031979] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Affiliation(s)
- Benjamin Lickert
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
| | - Gerhard Stock
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
| |
|
15
|
|
16
|
Nagel D, Weber A, Lickert B, Stock G. Dynamical coring of Markov state models. J Chem Phys 2019; 150:094111. [DOI: 10.1063/1.5081767] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 01/29/2023]
Affiliation(s)
- Daniel Nagel
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
| | - Anna Weber
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
| | - Benjamin Lickert
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
| | - Gerhard Stock
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
| |
|
17
|
Sittel F, Stock G. Perspective: Identification of collective variables and metastable states of protein dynamics. J Chem Phys 2018; 149:150901. [PMID: 30342445 DOI: 10.1063/1.5049637] [Citation(s) in RCA: 84] [Impact Index Per Article: 14.0] [Indexed: 01/25/2023]
Abstract
The statistical analysis of molecular dynamics simulations requires dimensionality reduction techniques, which yield a low-dimensional set of collective variables (CVs) {x_i} = x that in some sense describe the essential dynamics of the system. Considering the distribution P(x) of the CVs, the primal goal of a statistical analysis is to detect the characteristic features of P(x), in particular, its maxima and their connection paths. This is because these features characterize the low-energy regions and the energy barriers of the corresponding free energy landscape ΔG(x) = -k_B T ln P(x), and therefore amount to the metastable states and transition regions of the system. In this perspective, we outline a systematic strategy to identify CVs and metastable states, which subsequently can be employed to construct a Langevin or a Markov state model of the dynamics. In particular, we account for the still limited sampling typically achieved by molecular dynamics simulations, which in practice seriously limits the applicability of theories (e.g., assuming ergodicity) and black-box software tools (e.g., using redundant input coordinates). We show that it is essential to use internal (rather than Cartesian) input coordinates, employ dimensionality reduction methods that avoid rescaling errors (such as principal component analysis), and perform density based (rather than k-means-type) clustering. Finally, we briefly discuss a machine learning approach to dimensionality reduction, which highlights the essential internal coordinates of a system and may reveal hidden reaction mechanisms.
Affiliation(s)
- Florian Sittel
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
| | - Gerhard Stock
- Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany
| |
|
18
|
Wang W, Liang T, Sheong FK, Fan X, Huang X. An efficient Bayesian kinetic lumping algorithm to identify metastable conformational states via Gibbs sampling. J Chem Phys 2018; 149:072337. [PMID: 30134698 DOI: 10.1063/1.5027001] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 01/29/2023]
Abstract
Markov State Model (MSM) has become a popular approach to study the conformational dynamics of complex biological systems in recent years. Built upon a large number of short molecular dynamics simulation trajectories, MSM is able to predict the long time scale dynamics of complex systems. However, to achieve Markovianity, an MSM often contains hundreds or thousands of states (microstates), hindering human interpretation of the underlying system mechanism. One way to reduce the number of states is to lump kinetically similar states together and thus coarse-grain the microstates into macrostates. In this work, we introduce a probabilistic lumping algorithm, the Gibbs lumping algorithm, to assign a probability to any given kinetic lumping using the Bayesian inference. In our algorithm, the transitions among kinetically distinct macrostates are modeled by Poisson processes, which will well reflect the separation of time scales in the underlying free energy landscape of biomolecules. Furthermore, to facilitate the search for the optimal kinetic lumping (i.e., the lumped model with the highest probability), a Gibbs sampling algorithm is introduced. To demonstrate the power of our new method, we apply it to three systems: a 2D potential, alanine dipeptide, and a WW protein domain. In comparison with six other popular lumping algorithms, we show that our method can persistently produce the lumped macrostate model with the highest probability as well as the largest metastability. We anticipate that our Gibbs lumping algorithm holds great promise to be widely applied to investigate conformational changes in biological macromolecules.
Affiliation(s)
- Wei Wang
- HKUST-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China
| | - Tong Liang
- Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| | - Fu Kit Sheong
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
| | - Xiaodan Fan
- Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| | - Xuhui Huang
- HKUST-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China
| |
|
19
|
Peng JH, Wang W, Yu YQ, Gu HL, Huang X. Clustering algorithms to analyze molecular dynamics simulation trajectories for complex chemical and biological systems. Chin J Chem Phys 2018. [DOI: 10.1063/1674-0068/31/cjcp1806147] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 12/20/2022]
Affiliation(s)
- Jun-hui Peng
- HKUST-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
| | - Wei Wang
- HKUST-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
| | - Ye-qing Yu
- HKUST-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
| | - Han-lin Gu
- Department of Mathematics, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
| | - Xuhui Huang
- HKUST-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
| |
|
20
|
Wang Y, Pang W, Zhou Y. Density propagation based adaptive multi-density clustering algorithm. PLoS One 2018; 13:e0198948. [PMID: 30020928 PMCID: PMC6051564 DOI: 10.1371/journal.pone.0198948] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/15/2018] [Accepted: 05/29/2018] [Indexed: 11/21/2022]
Abstract
The performance of density based clustering algorithms may be greatly influenced by the chosen parameter values, and achieving optimal or near optimal results very much depends on empirical knowledge obtained from previous experiments. To address this limitation, we propose a novel density based clustering algorithm called the Density Propagation based Adaptive Multi-density clustering (DPAM) algorithm. DPAM can adaptively cluster spatial data. In order to avoid manual intervention when choosing parameters of density clustering and still achieve high performance, DPAM performs clustering in three stages: (1) generate the micro-clusters graph, (2) density propagation with redefinition of between-class margin and intra-class cohesion, and (3) calculate regional density. Experimental results demonstrated that DPAM could achieve better performance than several state-of-the-art density clustering algorithms in most of the tested cases; because no parameters need to be adjusted, the proposed algorithm achieves promising performance without manual tuning.
Affiliation(s)
- Yizhang Wang
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Changchun, China
| | - Wei Pang
- Department of Computing Science, University of Aberdeen, United Kingdom
| | - You Zhou
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Changchun, China
| |
|
21
|
A Survey of Data Mining and Deep Learning in Bioinformatics. J Med Syst 2018; 42:139. [DOI: 10.1007/s10916-018-1003-9] [Citation(s) in RCA: 81] [Impact Index Per Article: 13.5] [Received: 03/28/2018] [Accepted: 06/21/2018] [Indexed: 12/13/2022]
|
22
|
|
23
|
|
24
|
Wang W, Cao S, Zhu L, Huang X. Constructing Markov State Models to elucidate the functional conformational changes of complex biomolecules. Wiley Interdiscip Rev Comput Mol Sci 2017. [DOI: 10.1002/wcms.1343] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Indexed: 01/08/2023]
Affiliation(s)
- Wei Wang
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
| | - Siqin Cao
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
| | - Lizhe Zhu
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
| | - Xuhui Huang
- Department of Chemistry, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Hong Kong Branch of Chinese National Engineering Research Center for Tissue Restoration & Reconstruction, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- HKUST-Shenzhen Research Institute, Shenzhen, China
| |
|
25
|
Meng L, Sheong FK, Zeng X, Zhu L, Huang X. Path lumping: An efficient algorithm to identify metastable path channels for conformational dynamics of multi-body systems. J Chem Phys 2017; 147:044112. [DOI: 10.1063/1.4995558] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
Affiliation(s)
- Luming Meng
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
| | - Fu Kit Sheong
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
| | - Xiangze Zeng
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
| | - Lizhe Zhu
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
| | - Xuhui Huang
- Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- Hong Kong Branch of Chinese National Engineering Research Center for Tissue Restoration and Reconstruction, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
- HKUST-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China
| |
|