1
Li S, Wu B, Luo YL, Han W. Simulations of Functional Motions of Super Large Biomolecules with a Mixed-Resolution Model. J Chem Theory Comput 2024; 20:2228-2245. [PMID: 38374639 PMCID: PMC10938502 DOI: 10.1021/acs.jctc.3c01046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 01/18/2024] [Accepted: 01/29/2024] [Indexed: 02/21/2024]
Abstract
Many large protein machines function through an interplay between large-scale movements and intricate conformational changes. Understanding functional motions of these proteins through simulations is challenging for both all-atom and coarse-grained (CG) modeling techniques because neither approach alone can readily capture the full details of these motions. In this study, we develop a multiscale model by employing the popular MARTINI CG model to represent a heterogeneous environment and structurally stable proteins and using the united-atom (UA) model PACE to describe proteins undergoing subtle conformational changes. PACE was previously developed to be compatible with the MARTINI solvent and membrane. Here, we couple the protein descriptions of the two models by directly mixing UA and CG interaction parameters to greatly simplify parameter determination. Through extensive validations with diverse protein systems in solution or membrane, we demonstrate that only additional parameter rescaling is needed to enable the resulting model to recover the stability of native structures of proteins under mixed representation. Moreover, we identify the optimal scaling factors that can be applied to various protein systems, rendering the model potentially transferable. To further demonstrate its applicability for realistic systems, we apply the model to a mechanosensitive ion channel Piezo1 that has peripheral arms for sensing membrane tension and a central pore for ion conductance. The model can reproduce the coupling between Piezo1's large-scale arm movement and subtle pore opening in response to membrane stress at a much lower computational cost than all-atom models. Therefore, our model shows promise for studying functional motions of large protein machines.
Affiliation(s)
- Shu Li
  - Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
  - State Key Laboratory of Chemical Oncogenomics, Guangdong Provincial Key Laboratory of Chemical Genomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Bohua Wu
  - State Key Laboratory of Chemical Oncogenomics, Guangdong Provincial Key Laboratory of Chemical Genomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Yun Lyna Luo
  - Department of Biotechnology and Pharmaceutical Sciences, Western University of Health Sciences, Pomona, California 91766, United States
- Wei Han
  - State Key Laboratory of Chemical Oncogenomics, Guangdong Provincial Key Laboratory of Chemical Genomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen 518055, China
  - Department of Chemistry, Faculty of Science, Hong Kong Baptist University, Hong Kong SAR 999077, China
  - Shenzhen Bay Laboratory, Institute of Chemical Biology, Shenzhen 518132, China
2
Zheng LE, Barethiya S, Nordquist E, Chen J. Machine Learning Generation of Dynamic Protein Conformational Ensembles. Molecules 2023; 28:4047. [PMID: 37241789 PMCID: PMC10220786 DOI: 10.3390/molecules28104047] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/10/2023] [Revised: 05/04/2023] [Accepted: 05/09/2023] [Indexed: 05/28/2023] Open
Abstract
Machine learning has achieved remarkable success across a broad range of scientific and engineering disciplines, particularly its use for predicting native protein structures from sequence information alone. However, biomolecules are inherently dynamic, and there is a pressing need for accurate predictions of dynamic structural ensembles across multiple functional levels. These problems range from the relatively well-defined task of predicting conformational dynamics around the native state of a protein, which traditional molecular dynamics (MD) simulations are particularly adept at handling, to generating large-scale conformational transitions connecting distinct functional states of structured proteins or numerous marginally stable states within the dynamic ensembles of intrinsically disordered proteins. Machine learning has been increasingly applied to learn low-dimensional representations of protein conformational spaces, which can then be used to drive additional MD sampling or directly generate novel conformations. These methods promise to greatly reduce the computational cost of generating dynamic protein ensembles, compared to traditional MD simulations. In this review, we examine recent progress in machine learning approaches towards generative modeling of dynamic protein ensembles and emphasize the crucial importance of integrating advances in machine learning, structural data, and physical principles to achieve these ambitious goals.
Affiliation(s)
- Li-E Zheng
  - Department of Gynecology, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
- Shrishti Barethiya
  - Department of Chemistry, University of Massachusetts Amherst, Amherst, MA 01003, USA
- Erik Nordquist
  - Department of Chemistry, University of Massachusetts Amherst, Amherst, MA 01003, USA
- Jianhan Chen
  - Department of Chemistry, University of Massachusetts Amherst, Amherst, MA 01003, USA
3
Ingólfsson H, Bhatia H, Aydin F, Oppelstrup T, López CA, Stanton LG, Carpenter TS, Wong S, Di Natale F, Zhang X, Moon JY, Stanley CB, Chavez JR, Nguyen K, Dharuman G, Burns V, Shrestha R, Goswami D, Gulten G, Van QN, Ramanathan A, Van Essen B, Hengartner NW, Stephen AG, Turbyville T, Bremer PT, Gnanakaran S, Glosli JN, Lightstone FC, Nissley DV, Streitz FH. Machine Learning-Driven Multiscale Modeling: Bridging the Scales with a Next-Generation Simulation Infrastructure. J Chem Theory Comput 2023; 19:2658-2675. [PMID: 37075065 PMCID: PMC10173464 DOI: 10.1021/acs.jctc.2c01018] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 10/12/2022] [Indexed: 04/20/2023]
Abstract
Interdependence across time and length scales is common in biology, where atomic interactions can impact larger-scale phenomena. Such dependence is especially true for a well-known cancer signaling pathway, where the membrane-bound RAS protein binds an effector protein called RAF. To capture the driving forces that bring RAS and RAF (represented as two domains, RBD and CRD) together on the plasma membrane, simulations with the ability to calculate atomic detail while reaching long time scales and large length scales are needed. The Multiscale Machine-Learned Modeling Infrastructure (MuMMI) is able to resolve RAS/RAF protein-membrane interactions that identify specific lipid-protein fingerprints that enhance protein orientations viable for effector binding. MuMMI is a fully automated, ensemble-based multiscale approach connecting three resolution scales: (1) the coarsest scale is a continuum model able to simulate milliseconds of time for a 1 μm² membrane, (2) the middle scale is a coarse-grained (CG) Martini bead model to explore protein-lipid interactions, and (3) the finest scale is an all-atom (AA) model capturing specific interactions between lipids and proteins. MuMMI dynamically couples adjacent scales in a pairwise manner using machine learning (ML). The dynamic coupling allows for better sampling of the refined scale from the adjacent coarse scale (forward) and on-the-fly feedback to improve the fidelity of the coarser scale from the adjacent refined scale (backward). MuMMI operates efficiently at any scale, from a few compute nodes to the largest supercomputers in the world, and is generalizable to simulate different systems. As computing resources continue to increase and multiscale methods continue to advance, fully automated multiscale simulations (like MuMMI) will be commonly used to address complex science questions.
Affiliation(s)
- Helgi I. Ingólfsson
  - Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Harsh Bhatia
  - Computing Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Fikret Aydin
  - Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Tomas Oppelstrup
  - Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Cesar A. López
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Liam G. Stanton
  - Department of Mathematics and Statistics, San José State University, San José, California 95192, United States
- Timothy S. Carpenter
  - Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Sergio Wong
  - Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Francesco Di Natale
  - Computing Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Xiaohua Zhang
  - Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Joseph Y. Moon
  - Computing Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Christopher B. Stanley
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, United States
- Joseph R. Chavez
  - Computing Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Kien Nguyen
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Gautham Dharuman
  - Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Violetta Burns
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Rebika Shrestha
  - RAS Initiative, The Cancer Research Technology Program, Frederick National Laboratory, Frederick, Maryland 21701, United States
- Debanjan Goswami
  - RAS Initiative, The Cancer Research Technology Program, Frederick National Laboratory, Frederick, Maryland 21701, United States
- Gulcin Gulten
  - RAS Initiative, The Cancer Research Technology Program, Frederick National Laboratory, Frederick, Maryland 21701, United States
- Que N. Van
  - RAS Initiative, The Cancer Research Technology Program, Frederick National Laboratory, Frederick, Maryland 21701, United States
- Arvind Ramanathan
  - Computing, Environment & Life Sciences (CELS) Directorate, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Brian Van Essen
  - Computing Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Nicolas W. Hengartner
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Andrew G. Stephen
  - RAS Initiative, The Cancer Research Technology Program, Frederick National Laboratory, Frederick, Maryland 21701, United States
- Thomas Turbyville
  - RAS Initiative, The Cancer Research Technology Program, Frederick National Laboratory, Frederick, Maryland 21701, United States
- Peer-Timo Bremer
  - Computing Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- S. Gnanakaran
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- James N. Glosli
  - Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Felice C. Lightstone
  - Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Dwight V. Nissley
  - RAS Initiative, The Cancer Research Technology Program, Frederick National Laboratory, Frederick, Maryland 21701, United States
- Frederick H. Streitz
  - Physical and Life Sciences (PLS) Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
4
Zhang O, Haghighatlari M, Li J, Liu ZH, Namini A, Teixeira JMC, Forman-Kay JD, Head-Gordon T. Learning to evolve structural ensembles of unfolded and disordered proteins using experimental solution data. J Chem Phys 2023; 158:174113. [PMID: 37144719 PMCID: PMC10163956 DOI: 10.1063/5.0141474] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 01/05/2023] [Accepted: 04/11/2023] [Indexed: 05/06/2023] Open
Abstract
The structural characterization of proteins with disorder requires a computational approach backed by experiments to model their diverse and dynamic structural ensembles. The selection of conformational ensembles consistent with solution experiments of disordered proteins highly depends on the initial pool of conformers, with currently available tools limited by conformational sampling. We have developed a Generative Recurrent Neural Network (GRNN) that uses supervised learning to bias the probability distributions of torsions to take advantage of experimental data types such as nuclear magnetic resonance J-couplings, nuclear Overhauser effects, and paramagnetic relaxation enhancements. We show that updating the generative model parameters according to the reward feedback on the basis of the agreement between experimental data and probabilistic selection of torsions from learned distributions provides an alternative to existing approaches that simply reweight conformers of a static structural pool for disordered proteins. Instead, the biased GRNN, DynamICE, learns to physically change the conformations of the underlying pool of the disordered protein to those that better agree with experiments.
Affiliation(s)
- Oufan Zhang
  - Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
- Mojtaba Haghighatlari
  - Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
- Jie Li
  - Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
- Ashley Namini
  - Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5S 1A8, Canada
5
Zhu JJ, Zhang NJ, Wei T, Chen HF. Enhancing Conformational Sampling for Intrinsically Disordered and Ordered Proteins by Variational Autoencoder. Int J Mol Sci 2023; 24:6896. [PMID: 37108059 PMCID: PMC10138423 DOI: 10.3390/ijms24086896] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/03/2023] [Revised: 03/26/2023] [Accepted: 03/27/2023] [Indexed: 04/29/2023] Open
Abstract
Intrinsically disordered proteins (IDPs), which have no fixed three-dimensional structure under physiological conditions, account for more than 50% of the human proteome and are closely associated with tumors, cardiovascular diseases, and neurodegeneration. Because of this conformational diversity, conventional experimental methods of structural biology, such as NMR, X-ray diffraction, and CryoEM, are unable to capture conformational ensembles. Molecular dynamics (MD) simulation can sample the dynamic conformations at the atomic level and has become an effective method for studying the structure and function of IDPs. However, the high computational cost prevents MD simulations from being widely used for conformational sampling of IDPs. In recent years, significant progress has been made in artificial intelligence, which makes it possible to solve the conformational reconstruction problem of IDPs with fewer computational resources. Here, based on short MD simulations of different IDP systems, we use variational autoencoders (VAEs) to achieve the generative reconstruction of IDP structures and include a wider range of sampled conformations from longer simulations. Compared with autoencoders (AEs), VAEs add an inference layer between the encoder and decoder in the latent space, which can cover the conformational landscape of IDPs more comprehensively and achieve the effect of enhanced sampling. In validation tests, the Cα RMSD between VAE-generated and MD-sampled conformations in the five IDP test systems was significantly lower than that of AE. The Spearman correlation coefficient on the structure was higher than that of AE. VAE can also achieve excellent performance regarding structured proteins. In summary, VAEs can be used to effectively sample protein structures.
Affiliation(s)
- Jun-Jie Zhu
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Ning-Jie Zhang
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Ting Wei
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Hai-Feng Chen
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
  - Shanghai Center for Bioinformation Technology, Shanghai 200240, China
6
Baltrukevich H, Podlewska S. From Data to Knowledge: Systematic Review of Tools for Automatic Analysis of Molecular Dynamics Output. Front Pharmacol 2022; 13:844293. [PMID: 35359865 PMCID: PMC8960308 DOI: 10.3389/fphar.2022.844293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2021] [Accepted: 01/26/2022] [Indexed: 12/02/2022] Open
Abstract
The growing number of available crystal structures on the one hand, and the boost in computational power available for computer-aided drug design on the other, have led to the intensive use of structure-based drug design tools in drug development pipelines. Docking and molecular dynamics simulations, key representatives of the structure-based approaches, provide detailed information about the potential interaction of a ligand with a target receptor. However, they require a three-dimensional structure of the protein and a relatively large amount of computational resources. As both docking and molecular dynamics are now used much more extensively, the amount of data output from these procedures is also growing. Consequently, more and more approaches are being developed to facilitate the analysis and interpretation of the results of structure-based tools. In this review, we comprehensively summarize approaches for handling molecular dynamics simulation output, covering both statistical and machine-learning-based tools as well as various forms of depiction of molecular dynamics output.
Affiliation(s)
- Hanna Baltrukevich
  - Maj Institute of Pharmacology, Polish Academy of Sciences, Kraków, Poland
  - Faculty of Pharmacy, Chair of Technology and Biotechnology of Medical Remedies, Jagiellonian University Medical College in Krakow, Kraków, Poland
- Sabina Podlewska
  - Maj Institute of Pharmacology, Polish Academy of Sciences, Kraków, Poland