1. Lee SC, Z Y. Interpretation of autoencoder-learned collective variables using Morse-Smale complex and sublevelset persistent homology: An application on molecular trajectories. J Chem Phys 2024; 160:144104. [PMID: 38591676 DOI: 10.1063/5.0191446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 03/22/2024] [Indexed: 04/10/2024] [Open Access]
Abstract
Dimensionality reduction often serves as the first step toward a minimalist understanding of physical systems as well as the accelerated simulations of them. In particular, neural network-based nonlinear dimensionality reduction methods, such as autoencoders, have shown promising outcomes in uncovering collective variables (CVs). However, the physical meaning of these CVs remains largely elusive. In this work, we constructed a framework that (1) determines the optimal number of CVs needed to capture the essential molecular motions using an ensemble of hierarchical autoencoders and (2) provides topology-based interpretations to the autoencoder-learned CVs with Morse-Smale complex and sublevelset persistent homology. This approach was exemplified using a series of n-alkanes and can be regarded as a general, explainable nonlinear dimensionality reduction method.
Affiliation(s)
- Shao-Chun Lee
- Department of Nuclear, Plasma, and Radiological Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Y Z
- Department of Nuclear, Plasma, and Radiological Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Department of Nuclear Engineering and Radiological Sciences, Department of Materials Science and Engineering, Department of Robotics, and Applied Physics Program, University of Michigan, Ann Arbor, Michigan 48105, USA
2. Fu H, Bian H, Shao X, Cai W. Collective Variable-Based Enhanced Sampling: From Human Learning to Machine Learning. J Phys Chem Lett 2024; 15:1774-1783. [PMID: 38329095 DOI: 10.1021/acs.jpclett.3c03542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Enhanced-sampling algorithms relying on collective variables (CVs) are extensively employed to study complex (bio)chemical processes that are not amenable to brute-force molecular simulations. The selection of appropriate CVs characterizing the slow movement modes is of paramount importance for reliable and efficient enhanced-sampling simulations. In this Perspective, we first review the application and limitations of CVs obtained from chemical and geometrical intuition. We also introduce path-sampling algorithms, which can identify path-like CVs in a high-dimensional free-energy space. Machine-learning algorithms offer a viable approach to finding suitable CVs by analyzing trajectories from preliminary simulations. We discuss both the performance of machine-learning-derived CVs in enhanced-sampling simulations of experimental models and the challenges involved in applying these CVs to realistic, complex molecular assemblies. Moreover, we provide a prospective view of the potential advancements of machine-learning algorithms for the development of CVs in the field of enhanced-sampling simulations.
Affiliation(s)
- Haohao Fu
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Hengwei Bian
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
3. Dominic AJ, Cao S, Montoya-Castillo A, Huang X. Memory Unlocks the Future of Biomolecular Dynamics: Transformative Tools to Uncover Physical Insights Accurately and Efficiently. J Am Chem Soc 2023; 145:9916-9927. [PMID: 37104720 DOI: 10.1021/jacs.3c01095] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 04/29/2023]
Abstract
Conformational changes underpin function and encode complex biomolecular mechanisms. Gaining atomic-level detail of how such changes occur has the potential to reveal these mechanisms and is of critical importance in identifying drug targets, facilitating rational drug design, and enabling bioengineering applications. While the past two decades have brought Markov state model techniques to the point where practitioners can regularly use them to glimpse the long-time dynamics of slow conformations in complex systems, many systems are still beyond their reach. In this Perspective, we discuss how including memory (i.e., non-Markovian effects) can reduce the computational cost to predict the long-time dynamics in these complex systems by orders of magnitude and with greater accuracy and resolution than state-of-the-art Markov state models. We illustrate how memory lies at the heart of successful and promising techniques, ranging from the Fokker-Planck and generalized Langevin equations to deep-learning recurrent neural networks and generalized master equations. We delineate how these techniques work, identify insights that they can offer in biomolecular systems, and discuss their advantages and disadvantages in practical settings. We show how generalized master equations can enable the investigation of, for example, the gate-opening process in RNA polymerase II and demonstrate how our recent advances tame the deleterious influence of statistical underconvergence of the molecular dynamics simulations used to parameterize these techniques. This represents a significant leap forward that will enable our memory-based techniques to interrogate systems that are currently beyond the reach of even the best Markov state models. We conclude by discussing some current challenges and future prospects for how exploiting memory will open the door to many exciting opportunities.
Affiliation(s)
- Anthony J Dominic
- Department of Chemistry, University of Colorado Boulder, Boulder, Colorado 80309, USA
- Siqin Cao
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Xuhui Huang
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
4. Wu S, Li H, Ma A. Exact reaction coordinates for flap opening in HIV-1 protease. Proc Natl Acad Sci U S A 2022; 119:e2214906119. [PMID: 36459640 PMCID: PMC9894123 DOI: 10.1073/pnas.2214906119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 10/25/2022] [Indexed: 12/04/2022] [Open Access]
Abstract
The primary goal of protein science is to understand how proteins function, which requires understanding the functional dynamics responsible for transitions between different functional structures of a protein. A central concept is the exact reaction coordinates (RCs), which determine the committor value for any protein configuration and thus provide the optimal description of functional dynamics. Despite intensive efforts, identifying the exact RCs in complex molecules remains a formidable challenge. Using the recently developed generalized work functional (GWF), we report the discovery of the exact RCs for an important functional process: the flap opening of HIV-1 protease. Our results show that this process has six RCs, each a linear combination of ~240 backbone dihedrals, providing the precise definition of collectivity and cooperativity in the functional dynamics of a protein. Applying bias potentials along each RC can accelerate flap opening by [Formula: see text] to [Formula: see text] fold. The success in identifying the RCs of a protein with 198 residues represents significant progress beyond the alanine dipeptide, currently the only other complex molecule for which the exact RCs for its conformational changes are known. Our results suggest that the GWF might be the fundamental operator of mechanics that controls protein dynamics.
Affiliation(s)
- Shanshan Wu
- Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill Department of Biomedical Engineering, The University of Illinois at Chicago, Chicago, IL 60607
- Huiyu Li
- Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill Department of Biomedical Engineering, The University of Illinois at Chicago, Chicago, IL 60607
- Ao Ma
- Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill Department of Biomedical Engineering, The University of Illinois at Chicago, Chicago, IL 60607
5. Manuchehrfar F, Li H, Ma A, Liang J. Reactive Vortexes in a Naturally Activated Process: Non-Diffusive Rotational Fluxes at Transition State Uncovered by Persistent Homology. J Phys Chem B 2022; 126:9297-9308. [PMID: 36346639 PMCID: PMC10495042 DOI: 10.1021/acs.jpcb.2c07015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
The dynamics of reaction coordinates during barrier-crossing are key to understanding activated processes in complex systems such as proteins. The default assumption from Kramers' physical intuition is that of a diffusion process. However, the dynamics of barrier-crossing in natural complex molecules are largely unexplored. Here we investigate the transition dynamics of alanine dipeptide isomerization, the simplest complex system with a large number of non-reaction coordinates that can serve as an adequate thermal bath feeding energy into the reaction coordinates. We separate conformations along the time axis and construct the dynamic probability surface of reaction. We quantify its topological structure and rotational flux using persistent homology and differential form. Our results uncovered a region with a strong reactive vortex in the configuration-time space, where the highest probability peak and the transition state ensemble are located. This reactive region contains strong rotational fluxes: most reactive trajectories swirl multiple times around this region in the subspace of the two most important reaction coordinates. Furthermore, the rotational fluxes result from cooperative movement along the isocommittor surfaces and orthogonal barrier-crossing. Overall, our findings offer a first glimpse into the reactive vortex regions that characterize the non-diffusive dynamics of barrier-crossing of a naturally occurring activation process.
Affiliation(s)
- Farid Manuchehrfar
- Center for Bioinformatics and Quantitative Biology and Richard and Loan Hill Department of Biomedical Engineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
- Huiyu Li
- Center for Bioinformatics and Quantitative Biology and Richard and Loan Hill Department of Biomedical Engineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
- Ao Ma
- Center for Bioinformatics and Quantitative Biology and Richard and Loan Hill Department of Biomedical Engineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
- Jie Liang
- Center for Bioinformatics and Quantitative Biology and Richard and Loan Hill Department of Biomedical Engineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
6. Petenkaya A, Manuchehrfar F, Chronis C, Liang J. Identifying Transient Cells During Reprogramming via Persistent Homology. Annu Int Conf IEEE Eng Med Biol Soc 2022; 2022:2920-2923. [PMID: 36085927 PMCID: PMC10495043 DOI: 10.1109/embc48229.2022.9871358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Single-cell RNA sequencing is a powerful method that helps delineate the regulatory mechanisms shaping diverse cellular populations. Heterogeneous cell populations consist of individual cells, and the expression of distinct sets of genes can differentiate one sub-population of cells from another, as they are responsible for the emergence of distinct cellular phenotypes. Of particular importance are cells at transition states that bridge these different cellular phenotypes. In this study, we develop a method to identify the cells at transition states bridging different cellular phenotypes. Our approach is based on persistent homology, which enabled us to identify the group of cells located on the boundaries between different sub-populations of cells. We applied this method to study the reprogramming of human fibroblasts toward induced pluripotent stem cells using single-cell time-course data. Even though only data representative of the early stages of the reprogramming process are analyzed, we are able to uncover transient cells bridging different cell sub-populations. The most prominent group of transient cells is found to be enriched for NANOG, a known stem cell transcription factor that takes part in the maintenance of pluripotency, and for other stem cell marker genes. Overall, our method can identify cells in transient states bridging major cellular phenotypes, even though they are only a small fraction of the overall cell population. We also discuss how this approach can link the topology of the surface of cellular transcripts and bring order to the transition between cellular states and how it automatically uncovers the underlying time process.
7. Kikutsuji T, Mori Y, Okazaki KI, Mori T, Kim K, Matubayasi N. Explaining reaction coordinates of alanine dipeptide isomerization obtained from deep neural networks using Explainable Artificial Intelligence (XAI). J Chem Phys 2022; 156:154108. [DOI: 10.1063/5.0087310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] [Open Access]
Abstract
A method for obtaining appropriate reaction coordinates is required to identify transition states distinguishing product and reactant in complex molecular systems. Recently, abundant research has been devoted to obtaining reaction coordinates using artificial neural networks from the deep learning literature, where many collective variables are typically utilized in the input layer. However, it is difficult to explain the details of which collective variables contribute to the predicted reaction coordinates owing to the complexity of the nonlinear functions in deep neural networks. To overcome this limitation, we used the Explainable Artificial Intelligence (XAI) methods of Local Interpretable Model-agnostic Explanation (LIME) and the game theory-based framework known as Shapley Additive exPlanations (SHAP). We demonstrated that XAI enables us to obtain the degree of contribution of each collective variable to reaction coordinates that are determined by nonlinear regression with deep learning for the committor of the alanine dipeptide isomerization in vacuum. In particular, both LIME and SHAP identify the features important to the predicted reaction coordinates, which are characterized by appropriate dihedral angles consistent with those previously reported from the committor test analysis. The present study offers an AI-aided framework to explain the appropriate reaction coordinates, which acquires considerable significance when the number of degrees of freedom increases.
Affiliation(s)
- Kei-ichi Okazaki
- Department of Theoretical and Computational Molecular Science, Institute for Molecular Science, Japan
- Toshifumi Mori
- Kyushu University Institute for Materials Chemistry and Engineering, Japan
- Kang Kim
- Graduate School of Engineering Science, Osaka University - Toyonaka Campus, Japan
- Nobuyuki Matubayasi
- Division of Chemical Engineering, Graduate School of Engineering Science, Osaka University, Japan
8. Terebus A, Manuchehrfar F, Cao Y, Liang J. Exact Probability Landscapes of Stochastic Phenotype Switching in Feed-Forward Loops: Phase Diagrams of Multimodality. Front Genet 2021; 12:645640. [PMID: 34306004 PMCID: PMC8297706 DOI: 10.3389/fgene.2021.645640] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/23/2020] [Accepted: 04/26/2021] [Indexed: 11/13/2022] [Open Access]
Abstract
Feed-forward loops (FFLs) are among the most ubiquitously found motifs of reaction networks in nature. However, little is known about their stochastic behavior and the variety of network phenotypes they can exhibit. In this study, we provide full characterizations of the properties of stochastic multimodality of FFLs, and how switching between different network phenotypes is controlled. We have computed the exact steady-state probability landscapes of all eight types of coherent and incoherent FFLs using the finite-buffer Accurate Chemical Master Equation (ACME) algorithm, and quantified the exact topological features of their high-dimensional probability landscapes using persistent homology. Through analysis of the degree of multimodality for each of a set of 10,812 probability landscapes, where each landscape resides over 10^5–10^6 microstates, we have constructed comprehensive phase diagrams of all relevant behavior of FFL multimodality over broad ranges of input and regulation intensities, as well as different regimes of promoter binding dynamics. In addition, we have quantified the topological sensitivity of the multimodality of the landscapes to regulation intensities. Our results show that with slow binding and unbinding dynamics of transcription factor to promoter, FFLs exhibit strong stochastic behavior that is very different from what would be inferred from deterministic models. In addition, input intensity plays a major role in the phenotypes of FFLs: at weak input intensity, FFLs exhibit monomodality, but strong input intensity may result in up to 6 stable phenotypes. Furthermore, we found that gene duplication can enlarge stable regions of specific multimodalities and enrich the phenotypic diversity of FFL networks, providing a means for cells to better adapt to changing environments. Our results are directly applicable to analysis of the behavior of FFLs in biological processes such as stem cell differentiation and to the design of synthetic networks when certain phenotypic behavior is desired.
Affiliation(s)
- Anna Terebus
- Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, United States; Constellation, Baltimore, MD, United States
- Farid Manuchehrfar
- Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, United States
- Youfang Cao
- Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, United States; Merck & Co., Inc., Kenilworth, NJ, United States
- Jie Liang
- Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, United States