1
Woods EJ, Kannan D, Sharpe DJ, Swinburne TD, Wales DJ. Analysing ill-conditioned Markov chains. Philos Trans A Math Phys Eng Sci 2023; 381:20220245. [PMID: 37211032 DOI: 10.1098/rsta.2022.0245] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/15/2022] [Accepted: 11/15/2022] [Indexed: 05/23/2023]
Abstract
Discrete state Markov chains in discrete or continuous time are widely used to model phenomena in the social, physical and life sciences. In many cases, the model can feature a large state space, with extreme differences between the fastest and slowest transition timescales. Analysis of such ill-conditioned models is often intractable with finite precision linear algebra techniques. In this contribution, we propose a solution to this problem, namely partial graph transformation, to iteratively eliminate and renormalize states, producing a low-rank Markov chain from an ill-conditioned initial model. We show that the error induced by this procedure can be minimized by retaining both the renormalized nodes that represent metastable superbasins, and those through which reactive pathways concentrate, i.e. the dividing surface in the discrete state space. This procedure typically returns a much lower rank model, where trajectories can be efficiently generated with kinetic path sampling. We apply this approach to an ill-conditioned Markov chain for a model multi-community system, measuring the accuracy by direct comparison with trajectories and transition statistics. This article is part of a discussion meeting issue 'Supercomputing simulations of advanced materials'.
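The elimination-and-renormalization step the abstract refers to can be illustrated with the standard single-state graph transformation update. This is a minimal sketch, not the authors' implementation: it assumes a discrete-time branching-probability matrix `P` (row `i` gives the probabilities of the next state visited after `i`) and mean waiting times `tau`. The helper name `eliminate_state` is hypothetical, and the paper's criteria for choosing which superbasin and dividing-surface states to retain are not reproduced.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def eliminate_state(P: Matrix, tau: List[float], x: int) -> Tuple[Matrix, List[float]]:
    """Remove state x by graph transformation (hypothetical helper for illustration).

    P[i][j]: probability that the next state visited after i is j (rows sum to 1,
    self-loops allowed). tau[i]: mean waiting time in state i.
    Returns (P', tau') over the remaining states, kept in their original order.
    """
    n = len(P)
    # Escape probability from x, summed over off-diagonal entries rather than
    # computed as 1 - P[x][x], which loses precision when P[x][x] is close to 1.
    escape = sum(P[x][j] for j in range(n) if j != x)
    if escape <= 0.0:
        raise ValueError("cannot eliminate an absorbing state")
    keep = [k for k in range(n) if k != x]
    new_P: Matrix = []
    new_tau: List[float] = []
    for i in keep:
        # Fold every path i -> x -> (self-loops at x) -> j into a direct i -> j move.
        new_P.append([P[i][j] + P[i][x] * P[x][j] / escape for j in keep])
        # Charge the expected time spent at x, including its self-loops, to state i.
        new_tau.append(tau[i] + P[i][x] * tau[x] / escape)
    return new_P, new_tau


# Linear chain A(0) <-> I(1) -> B(2), with B absorbing and unit waiting times.
P = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.0, 1.0]]
tau = [1.0, 1.0, 1.0]
P2, tau2 = eliminate_state(P, tau, 1)  # remaining states: A (index 0), B (index 1)
# With only A and B left, the mean first passage time A -> B is tau'_A / P'[A][B].
print(tau2[0] / P2[0][1])  # 6.0, same as solving the original three-state chain
```

Because each renormalized row still sums to one and the time spent in an eliminated state is reassigned to its neighbours, the mean first passage time from a retained state to a retained absorbing state is unchanged by the elimination.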
Affiliation(s)
- Esmae J Woods
- Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge CB3 0HE, UK
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
- Deepti Kannan
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
- Daniel J Sharpe
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
- Thomas D Swinburne
- CNRS, CINaM UMR 7325, Aix-Marseille Université, Campus de Luminy, 13288 Marseille, France
- David J Wales
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
2
Baltrukevich H, Podlewska S. From Data to Knowledge: Systematic Review of Tools for Automatic Analysis of Molecular Dynamics Output. Front Pharmacol 2022; 13:844293. [PMID: 35359865 PMCID: PMC8960308 DOI: 10.3389/fphar.2022.844293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2021] [Accepted: 01/26/2022] [Indexed: 12/02/2022]
Abstract
The growing number of available crystal structures on the one hand, and the boost in computational power available for computer-aided drug design on the other, have led to structure-based drug design tools being used intensively in drug development pipelines. Docking and molecular dynamics simulations, key representatives of the structure-based approaches, provide detailed information about the potential interaction of a ligand with a target receptor. At the same time, however, they require a three-dimensional structure of the protein and relatively large computational resources. Now that both docking and molecular dynamics are used much more extensively, the amount of data these procedures output is also growing. Consequently, more and more approaches have been developed to facilitate the analysis and interpretation of the results of structure-based tools. In this review, we comprehensively summarize approaches for handling molecular dynamics simulation output, covering both statistical and machine-learning-based tools as well as various ways of depicting molecular dynamics output.
Affiliation(s)
- Hanna Baltrukevich
- Maj Institute of Pharmacology, Polish Academy of Sciences, Kraków, Poland
- Faculty of Pharmacy, Chair of Technology and Biotechnology of Medical Remedies, Jagiellonian University Medical College in Krakow, Kraków, Poland
- Sabina Podlewska
- Maj Institute of Pharmacology, Polish Academy of Sciences, Kraków, Poland
3
Abstract
Numerous linear and non-linear spectroscopic techniques have been developed to elucidate structural and functional information of complex systems, ranging from natural systems, such as proteins and light-harvesting systems, to synthetic systems, such as solar cell materials and light-emitting diodes. The experimental data obtained can be challenging to interpret because of the complexity of these systems and potentially overlapping spectral signatures. Computational spectroscopy therefore plays a crucial role in interpreting and understanding the spectral observables of complex systems. Computational modeling of spectroscopic techniques has developed significantly over the past decade in terms of the systems that can be addressed, the size and complexity of the sample types, the accuracy of the methods, and the range of spectroscopic techniques that can be modeled. In this Perspective, I review the computational spectroscopy methods that have been developed and applied for infrared and visible spectroscopies in the condensed phase, and discuss some of the questions they have made it possible to answer. Finally, I discuss current and future challenges and how these may be addressed.
Affiliation(s)
- Thomas L C Jansen
- Zernike Institute for Advanced Materials, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
4
Glielmo A, Husic BE, Rodriguez A, Clementi C, Noé F, Laio A. Unsupervised Learning Methods for Molecular Simulation Data. Chem Rev 2021; 121:9722-9758. [PMID: 33945269 PMCID: PMC8391792 DOI: 10.1021/acs.chemrev.0c01195] [Citation(s) in RCA: 116] [Impact Index Per Article: 38.7] [Received: 11/05/2020] [Indexed: 12/21/2022]
Abstract
Unsupervised learning is becoming an essential tool to analyze the increasingly large amounts of data produced by atomistic and molecular simulations in materials science, solid-state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the unsupervised learning methods that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms for dimensionality reduction, density estimation and clustering, as well as kinetic models. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used, or can be used, to analyze molecular simulation data.
Affiliation(s)
- Aldo Glielmo
- International School for Advanced Studies (SISSA), 34014 Trieste, Italy
- Brooke E. Husic
- Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Alex Rodriguez
- International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
- Cecilia Clementi
- Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
- Rice University, Department of Chemistry, Houston, Texas 77005, United States
- Frank Noé
- Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
- Rice University, Department of Chemistry, Houston, Texas 77005, United States
- Alessandro Laio
- International School for Advanced Studies (SISSA), 34014 Trieste, Italy
- International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
5
Song Z, Zhou H, Tian H, Wang X, Tao P. Unraveling the energetic significance of chemical events in enzyme catalysis via machine-learning based regression approach. Commun Chem 2020; 3:134. [PMID: 36703376 PMCID: PMC9814854 DOI: 10.1038/s42004-020-00379-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/12/2020] [Accepted: 09/11/2020] [Indexed: 01/29/2023]
Abstract
The bacterial enzyme class of β-lactamases is involved in benzylpenicillin acylation reactions, which are revisited here using hybrid quantum mechanical/molecular mechanical (QM/MM) chain-of-states pathway optimizations. Minimum energy pathways are sampled by reoptimizing the pathway geometry under different representative protein environments obtained through constrained molecular dynamics simulations. Predictive potential energy surface models in the reaction space are trained with machine-learning regression techniques. Herein, using the TEM-1/benzylpenicillin acylation reaction as the model system, we introduce two model-independent criteria for delineating the energetic contributions and correlations in the predicted reaction space. Both methods are shown to effectively quantify the energetic contribution of each chemical process and to identify the rate-limiting step of an enzymatic reaction with many degrees of freedom. The consistency of the workflow is tested with seven levels of quantum chemistry theory and three non-linear machine-learning regression models. The proposed approaches are validated by their qualitative agreement with experimental mutagenesis studies.
Affiliation(s)
- Zilin Song
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, TX 75275 USA
- Hongyu Zhou
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, TX 75275 USA
- Hao Tian
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, TX 75275 USA
- Xinlei Wang
- Department of Statistical Science, Southern Methodist University, Dallas, TX 75275 USA
- Peng Tao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, TX 75275 USA
6
Verkhivker GM, Agajanian S, Hu G, Tao P. Allosteric Regulation at the Crossroads of New Technologies: Multiscale Modeling, Networks, and Machine Learning. Front Mol Biosci 2020; 7:136. [PMID: 32733918 PMCID: PMC7363947 DOI: 10.3389/fmolb.2020.00136] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 05/01/2020] [Accepted: 06/08/2020] [Indexed: 12/12/2022]
Abstract
Allosteric regulation is a common mechanism employed by complex biomolecular systems to regulate activity and adaptability in the cellular environment, serving as an effective molecular tool for cellular communication. As an intrinsic but elusive property, allostery is a ubiquitous phenomenon in which binding at, or perturbation of, a distal site in a protein can functionally control its activity, and it is considered the "second secret of life." The fundamental biological importance and complexity of these processes require a multi-faceted platform of synergistically integrated approaches for the prediction and characterization of allosteric functional states, atomistic reconstruction of allosteric regulatory mechanisms, and discovery of allosteric modulators. The unifying theme and overarching goal of allosteric regulation studies in recent years has been the integration of emerging experimental and computational approaches and technologies to advance the quantitative characterization of allosteric mechanisms in proteins. Despite significant advances, the quantitative characterization and reliable prediction of functional allosteric states, interactions, and mechanisms continue to present highly challenging problems in the field. In this review, we discuss simulation-based multiscale approaches, experiment-informed Markovian models, and network modeling of allostery, as well as information-theoretical approaches that can describe the thermodynamics and hierarchy of allosteric states and the molecular basis of allosteric mechanisms. The wealth of structural and functional information, along with the diversity and complexity of allosteric mechanisms in therapeutically important protein families, has provided a well-suited platform for the development of data-driven research strategies. Data-centric integration of chemistry, biology and computer science using artificial intelligence technologies has gained significant momentum and is at the forefront of many cross-disciplinary efforts.
We discuss new developments in the machine learning field and the emergence of deep learning and deep reinforcement learning applications in modeling of molecular mechanisms and allosteric proteins. The experiment-guided integrated approaches empowered by recent advances in multiscale modeling, network science, and machine learning can lead to more reliable prediction of allosteric regulatory mechanisms and discovery of allosteric modulators for therapeutically important protein targets.
Affiliation(s)
- Gennady M. Verkhivker
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, United States
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA, United States
- Steve Agajanian
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, United States
- Guang Hu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Peng Tao
- Department of Chemistry, Center for Drug Discovery, Design, and Delivery (CD4), Center for Scientific Computation, Southern Methodist University, Dallas, TX, United States