1
Dong J, Wang S, Cui W, Sun X, Guo H, Yan H, Vogel H, Wang Z, Yuan S. Machine Learning Deciphered Molecular Mechanistics with Accurate Kinetic and Thermodynamic Prediction. J Chem Theory Comput 2024; 20:4499-4513. [PMID: 38394691 DOI: 10.1021/acs.jctc.3c01412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2024]
Abstract
Time-lagged independent component analysis (tICA) and the Markov state model (MSM) have been extensively employed for extracting conformational dynamics and kinetic community networks from unbiased trajectory ensembles. However, these techniques may not be the optimal choice for elucidating transition mechanisms within low-dimensional representations, especially for intricate biosystems. Unraveling the association mechanism in such complex systems always necessitates permutations of several essential independent components or collective variables, a process that is inherently obscure and may require empirical knowledge for selection. To address these challenges, we have implemented an integrated unsupervised dimension reduction model: uniform manifold approximation and projection (UMAP) with hierarchical density-based spatial clustering of applications with noise (HDBSCAN). This approach effectively generates low-dimensional configurational embeddings. The hierarchical application of this architecture, in conjunction with MSM, reveals global kinetic connectivity while identifying local conformational states. Consequently, our methodology establishes a multiscale mechanistic elucidation framework. Leveraging the benefits of the uniform sample distribution and a denoising approach, our model demonstrates robustness in preserving global and local data structures compared to traditional dimension reduction methods used in molecular dynamics (MD) analysis. The interpretability of hyperparameter selection and compatibility with downstream tasks are cross-validated across various simulation data sets, utilizing both computational evaluation metrics and experimental kinetic observables. Furthermore, the predicted Mcl1-BH3 association kinetics (0.76 s⁻¹) are in close agreement with surface plasmon resonance experiments (0.12 s⁻¹), affirming the plausibility of the identified pathway composed of representative conformations. We anticipate that the devised workflow will serve as a foundational framework for studying recognition patterns in complex biological systems. Its contributions extend to the exploration of protein functional dynamics and rational drug design, offering a potent avenue for advancing research in these domains.
Affiliation(s)
- Junlin Dong
  - Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Shiyu Wang
  - Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - AlphaMol Science Ltd, Shenzhen 518055, China
- Wenqiang Cui
  - Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaolin Sun
  - Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Haojie Guo
  - Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Hailu Yan
  - School of Biological Sciences, College of Science and Engineering, University of Edinburgh, Edinburgh EH8 9YL, U.K.
- Horst Vogel
  - Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Zhi Wang
  - Artificial Intelligence Department, Zhejiang Financial College, Hangzhou 310018, China
- Shuguang Yuan
  - Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - AlphaMol Science Ltd, Shenzhen 518055, China
2
Nussinov R, Liu Y, Zhang W, Jang H. Protein conformational ensembles in function: roles and mechanisms. RSC Chem Biol 2023; 4:850-864. [PMID: 37920394 PMCID: PMC10619138 DOI: 10.1039/d3cb00114h] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/30/2023] [Accepted: 09/02/2023] [Indexed: 11/04/2023] [Open Access]
Abstract
The sequence-structure-function paradigm dominated twentieth-century molecular biology. The paradigm tacitly stipulated that for each sequence there exists a single, well-organized protein structure. Yet, to sustain cell life, function requires (i) that there be more than a single structure, (ii) that there be switching between the structures, and (iii) that the structures be incompletely organized. These fundamental tenets called for an updated sequence-conformational ensemble-function paradigm. The powerful energy landscape idea, which is the foundation of modernized molecular biology, imported the conformational ensemble framework from physics and chemistry. This framework embraces the recognition that proteins are dynamic and are always interconverting between conformational states with varying energies. The more stable the conformation, the more populated it is. The changes in the populations of the states are required for cell life. As an example, in vivo, under physiological conditions, wild-type kinases commonly populate their more stable "closed", inactive conformations. However, there are minor populations of the "open", ligand-free states. Upon their stabilization, e.g., by high-affinity interactions or mutations, their ensembles shift to occupy the active states. Here we discuss the role of conformational propensities in function. We provide multiple examples of diverse systems, including protein kinases, lipid kinases, and Ras GTPases, discuss diverse conformational mechanisms, and provide a broad outlook on protein ensembles in the cell. We propose that the number of molecules in the active state (inactive for repressors) determines protein function, and that the dynamic, relative conformational propensities, rather than the rigid structures, are the hallmark of cell life.
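The abstract's statement that more stable conformations are more populated is conventionally quantified by the Boltzmann distribution. The relation below is the standard textbook form, not an equation quoted from the review; the symbols $G_i$ (free energy of state $i$), $k_B$ (Boltzmann constant), and $T$ (temperature) are introduced here for illustration only.

```latex
p_i = \frac{e^{-G_i / k_B T}}{\sum_j e^{-G_j / k_B T}},
\qquad
\frac{p_{\text{open}}}{p_{\text{closed}}} = e^{-\left(G_{\text{open}} - G_{\text{closed}}\right) / k_B T}
```

Under this relation, a state lying roughly 1.4 kcal/mol above another at room temperature is about tenfold less populated, which is why a modest stabilization by binding or mutation can shift an ensemble appreciably.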
Affiliation(s)
- Ruth Nussinov
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
  - Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
- Yonglan Liu
  - Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
- Wengang Zhang
  - Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
- Hyunbum Jang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
  - Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
3
Vila JA. Protein folding rate evolution upon mutations. Biophys Rev 2023; 15:661-669. [PMID: 37681091 PMCID: PMC10480377 DOI: 10.1007/s12551-023-01088-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 06/24/2023] [Indexed: 09/09/2023] [Open Access]
Abstract
Despite the spectacular success of cutting-edge protein fold prediction methods, many critical questions remain unanswered, including why proteins can reach their native state in a biologically reasonable time. A satisfactory answer to this simple question could shed light on the slowest folding rate of proteins as well as on how mutations (amino-acid substitutions and/or post-translational modifications) might affect it. Preliminary results indicate that (i) the validity of Anfinsen's dogma ensures that proteins reach their native state on a reasonable timescale regardless of their sequence or length, and (ii) it is feasible to determine the evolution of protein folding rates without accounting for epistasis effects or the mutational trajectories between the starting and target sequences. These results have direct implications for evolutionary biology because they lay the groundwork for a better understanding of why, and to what extent, mutations, a crucial element of evolution and a factor influencing it, affect protein evolvability. Furthermore, they may spur significant progress in our efforts to solve crucial structural biology problems, such as how a sequence encodes its folding.
Affiliation(s)
- Jorge A. Vila
  - IMASL-CONICET, Universidad Nacional de San Luis, Ejército de Los Andes 950, 5700 San Luis, Argentina
4
Lichtinger SM, Biggin PC. Tackling Hysteresis in Conformational Sampling: How to Be Forgetful with MEMENTO. J Chem Theory Comput 2023; 19:3705-3720. [PMID: 37285481 PMCID: PMC10308841 DOI: 10.1021/acs.jctc.3c00140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Indexed: 06/09/2023]
Abstract
The structure of proteins has long been recognized to hold the key to understanding and engineering their function, and rapid advances in structural biology and protein structure prediction are now supplying researchers with an ever-increasing wealth of structural information. Most of the time, however, structures can only be determined in free energy minima, one at a time. While conformational flexibility may thus be inferred from static end-state structures, their interconversion mechanisms (a central ambition of structural biology) are often beyond the scope of direct experimentation. Given the dynamical nature of the processes in question, many studies have attempted to explore conformational transitions using molecular dynamics (MD). However, ensuring proper convergence and reversibility in the predicted transitions is extremely challenging. In particular, steered MD (SMD), a commonly used technique to map out a path from a starting to a target conformation, can suffer from starting-state dependence (hysteresis) when combined with techniques such as umbrella sampling (US) to compute the free energy profile of a transition. Here, we study this problem in detail on conformational changes of increasing complexity. We also present a new, history-independent approach that we term "MEMENTO" (Morphing End states by Modelling Ensembles with iNdependent TOpologies) to generate paths that alleviate hysteresis in the construction of conformational free energy profiles. MEMENTO utilizes template-based structure modelling to restore physically reasonable protein conformations based on coordinate interpolation (morphing) as an ensemble of plausible intermediates, from which a smooth path is picked. We compare SMD and MEMENTO on well-characterized test cases (the toy peptide deca-alanine and the enzyme adenylate kinase) before discussing its use in more complicated systems (the kinase P38α and the bacterial leucine transporter LeuT). Our work shows that for all but the simplest systems SMD paths should not in general be used to seed umbrella sampling or related techniques, unless the paths are validated by consistent results from biased runs in opposite directions. MEMENTO, on the other hand, performs well as a flexible tool to generate intermediate structures for umbrella sampling. We also demonstrate that extended end-state sampling combined with MEMENTO can aid the discovery of collective variables on a case-by-case basis.
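The coordinate interpolation (morphing) step mentioned in the abstract can be illustrated with a minimal sketch. This is not the MEMENTO implementation: the function name `morph`, its parameters, and the toy two-atom coordinates are hypothetical, and the sketch covers only plain linear interpolation between two end states with identical atom ordering. Raw intermediates of this kind are generally unphysical, which is why the abstract describes a subsequent template-based modelling step to restore reasonable conformations.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]


def morph(start: List[Coord], target: List[Coord],
          n_intermediates: int) -> List[List[Coord]]:
    """Linearly interpolate Cartesian coordinates between two end states.

    Hypothetical helper for illustration only. Returns the start frame,
    n_intermediates evenly spaced intermediate frames, and the target frame.
    Both end states must list the same atoms in the same order.
    """
    if len(start) != len(target):
        raise ValueError("start and target must contain the same number of atoms")
    if n_intermediates < 0:
        raise ValueError("n_intermediates must be non-negative")

    steps = n_intermediates + 1
    frames: List[List[Coord]] = []
    for k in range(steps + 1):
        t = k / steps  # interpolation parameter: 0 at start, 1 at target
        frame = [
            (p[0] + t * (q[0] - p[0]),
             p[1] + t * (q[1] - p[1]),
             p[2] + t * (q[2] - p[2]))
            for p, q in zip(start, target)
        ]
        frames.append(frame)
    return frames


# Toy two-atom "conformations" (assumed values, arbitrary units).
start_state = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target_state = [(2.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
path = morph(start_state, target_state, n_intermediates=3)
```

Because each frame depends only on the two end states and the interpolation parameter, the resulting path carries no memory of a simulation history, which is the property the abstract contrasts with steered MD.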
Affiliation(s)
- Philip C. Biggin
  - Department of Biochemistry, University of Oxford, Oxford OX1 3QU, U.K.