1
Ruzmetov T, Hung TI, Jonnalagedda SP, Chen SH, Fasihianifard P, Guo Z, Bhanu B, Chang CEA. Sampling Conformational Ensembles of Highly Dynamic Proteins via Generative Deep Learning. bioRxiv: the preprint server for biology 2024:2024.05.05.592587. [PMID: 38979147 PMCID: PMC11230202 DOI: 10.1101/2024.05.05.592587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Proteins are inherently dynamic, and their conformational ensembles are functionally important in biology. Large-scale motions may govern protein structure-function relationship, and numerous transient but stable conformations of Intrinsically Disordered Proteins (IDPs) can play a crucial role in biological function. Investigating conformational ensembles to understand regulations and disease-related aggregations of IDPs is challenging both experimentally and computationally. In this paper we first introduce a deep learning-based model, termed Internal Coordinate Net (ICoN), which learns the physical principles of conformational changes from Molecular Dynamics (MD) simulation data. Second, we selected interpolating data points in the learned latent space that rapidly identify novel synthetic conformations with sophisticated and large-scale sidechains and backbone arrangements. Third, with the highly dynamic amyloid-β 1-42 (Aβ42) monomer, our deep learning model provided a comprehensive sampling of Aβ42's conformational landscape. Analysis of these synthetic conformations revealed conformational clusters that can be used to rationalize experimental findings. Additionally, the method can identify novel conformations with important interactions in atomistic details that are not included in the training data. New synthetic conformations showed distinct sidechain rearrangements that are probed by our EPR and amino acid substitution studies. This approach is highly transferable and can be used for any available data for training. The work also demonstrated the ability of deep learning to utilize learned natural atomistic motions in protein conformation sampling.
2
Houston L, Phillips M, Torres A, Gaalswyk K, Ghosh K. Physics-Based Machine Learning Trains Hamiltonians and Decodes the Sequence-Conformation Relation in the Disordered Proteome. J Chem Theory Comput 2024; 20:10266-10274. [PMID: 39504303 DOI: 10.1021/acs.jctc.4c01114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2024]
Abstract
Intrinsically disordered proteins and regions (IDPs) are involved in vital biological processes. To understand the IDP function, often controlled by conformation, we need to find the link between sequence and conformation. We decode this link by integrating theory, simulation, and machine learning (ML) where sequence-dependent electrostatics is modeled analytically while nonelectrostatic interaction is extracted from simulations for many sequences and subsequently trained using ML. The resulting Hamiltonian, combining physics-based electrostatics and machine-learned nonelectrostatics, accurately predicts sequence-specific global and local measures of conformations beyond the original observable used from the simulation. This is in contrast to traditional ML approaches that train and predict a specific observable, not a Hamiltonian. Our formalism reproduces experimental measurements, predicts multiple conformational features directly from sequence with high throughput that will give insights into IDP design and evolution, and illustrates the broad utility of using physics-based ML to train unknown parts of a Hamiltonian, rather than a specific observable, in combination with known physics.
Affiliation(s)
- Lilianna Houston
  - Department of Physics and Astronomy, University of Denver, Denver, Colorado 80210, United States
- Michael Phillips
  - Department of Physics and Astronomy, University of Denver, Denver, Colorado 80210, United States
- Andrew Torres
  - Department of Physics and Astronomy, University of Denver, Denver, Colorado 80210, United States
- Kari Gaalswyk
  - Department of Physics and Astronomy, University of Denver, Denver, Colorado 80210, United States
- Kingshuk Ghosh
  - Department of Physics and Astronomy, University of Denver, Denver, Colorado 80210, United States
  - Department of Molecular and Cellular Biophysics, University of Denver, Denver, Colorado 80210, United States
3
Zeng J, Yang Z, Tang Y, Wei G. Emerging Frontiers in Conformational Exploration of Disordered Proteins: Integrating Autoencoder and Molecular Simulations. ACS Chem Neurosci 2024. [PMID: 39555603 DOI: 10.1021/acschemneuro.4c00670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2024] Open
Abstract
Intrinsically disordered proteins (IDPs) are closely associated with a number of neurodegenerative diseases, such as Alzheimer's disease and Parkinson's disease. Due to the highly dynamic nature of IDPs, their structural determination and conformational exploration pose significant challenges for both experimental and computational research. Recently, the integration of machine learning with molecular dynamics (MD) simulations has emerged as a promising methodology for efficiently exploring the conformation spaces of IDPs. In this viewpoint, we briefly review recently developed autoencoder-based models designed to enhance the conformational exploration of IDPs through embedding and latent sampling. We highlight the capability of autoencoders in expanding the conformations sampled by MD simulations and discuss their limitations due to the non-Gaussian latent space distribution and the limited conformational diversity of training conformations. Potential strategies to overcome these limitations are also discussed.
Affiliation(s)
- Jiyuan Zeng
  - Department of Physics, State Key Laboratory of Surface Physics, and Key Laboratory for Computational Physical Sciences (Ministry of Education), Fudan University, Shanghai 200438, China
- Zhongyuan Yang
  - Department of Physics, State Key Laboratory of Surface Physics, and Key Laboratory for Computational Physical Sciences (Ministry of Education), Fudan University, Shanghai 200438, China
- Yiming Tang
  - Department of Physics, State Key Laboratory of Surface Physics, and Key Laboratory for Computational Physical Sciences (Ministry of Education), Fudan University, Shanghai 200438, China
- Guanghong Wei
  - Department of Physics, State Key Laboratory of Surface Physics, and Key Laboratory for Computational Physical Sciences (Ministry of Education), Fudan University, Shanghai 200438, China
4
de Bruyn E, Dorn AE, Rossetti G, Fernandez C, Outeiro TF, Schulz JB, Carloni P. Impact of Phosphorylation on the Physiological Form of Human alpha-Synuclein in Aqueous Solution. J Chem Inf Model 2024; 64:8215-8226. [PMID: 39462994 PMCID: PMC11558680 DOI: 10.1021/acs.jcim.4c01172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2024] [Revised: 10/05/2024] [Accepted: 10/15/2024] [Indexed: 10/29/2024]
Abstract
Serine 129 can be phosphorylated in pathological inclusions formed by the intrinsically disordered protein human α-synuclein (AS), a key player in Parkinson's disease and other synucleinopathies. Here, molecular simulations provide insight into the structural ensemble of phosphorylated AS. The simulations allow us to suggest that phosphorylation significantly impacts the structural content of the physiological AS conformational ensemble in aqueous solution, as the phosphate group is mostly solvated. The hydrophobic region of AS contains β-hairpin structures, which may increase the propensity of the protein to undergo amyloid formation, as seen in the nonphysiological (nonacetylated) form of the protein in a recent molecular simulation study. Our findings are consistent with existing experimental data with the caveat of the observed limitations of the force field for the phosphorylated moiety.
Affiliation(s)
- Emile de Bruyn
  - Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
  - Department of Physics, RWTH Aachen University, 52062 Aachen, Germany
- Anton Emil Dorn
  - Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
  - Faculty of Biology, University of Duisburg-Essen, 45141 Essen, Germany
- Giulia Rossetti
  - Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
  - Computational Biomedicine (IAS-5/INM-9), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
  - Department of Neurology, RWTH Aachen University, 52074 Aachen, Germany
- Claudio Fernandez
  - Max Planck Laboratory for Structural Biology, Chemistry and Molecular Biophysics of Rosario (MPLbioR, UNR-MPINAT), Partner of the Max Planck Institute for Multidisciplinary Sciences (MPINAT, MPG), Centro de Estudios Interdisciplinarios, Universidad Nacional de Rosario, S2002LRK Rosario, Argentina
  - Department of NMR-based Structural Biology, Max Planck Institute for Multidisciplinary Sciences, 37077 Göttingen, Germany
- Tiago F. Outeiro
  - Department of Experimental Neurodegeneration, Center for Biostructural Imaging of Neurodegeneration, University Medical Center Göttingen, 37075 Göttingen, Germany
  - Max Planck Institute for Multidisciplinary Sciences, 37075 Göttingen, Germany
  - Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
- Jörg B. Schulz
  - Department of Physics, RWTH Aachen University, 52062 Aachen, Germany
  - Department of Neurology, RWTH Aachen University, 52074 Aachen, Germany
  - JARA Brain Institute Molecular Neuroscience and Neuroimaging (INM-11), Research Centre Jülich and RWTH Aachen University, 52074 Aachen, Germany
- Paolo Carloni
  - Department of Physics, RWTH Aachen University, 52062 Aachen, Germany
  - Computational Biomedicine (IAS-5/INM-9), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
5
Aupič J, Pokorná P, Ruthstein S, Magistrato A. Predicting Conformational Ensembles of Intrinsically Disordered Proteins: From Molecular Dynamics to Machine Learning. J Phys Chem Lett 2024; 15:8177-8186. [PMID: 39093570 DOI: 10.1021/acs.jpclett.4c01544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
Intrinsically disordered proteins and regions (IDP/IDRs) are ubiquitous across all domains of life. Characterized by a lack of a stable tertiary structure, IDP/IDRs populate a diverse set of transiently formed structural states that can promiscuously adapt upon binding with specific interaction partners and/or certain alterations in environmental conditions. This malleability is foundational for their role as tunable interaction hubs in core cellular processes such as signaling, transcription, and translation. Tracing the conformational ensemble of an IDP/IDR and its perturbation in response to regulatory cues is thus paramount for illuminating its function. However, the conformational heterogeneity of IDP/IDRs poses several challenges. Here, we review experimental and computational methods devised to disentangle the conformational landscape of IDP/IDRs, highlighting recent computational advances that permit proteome-wide scans of IDP/IDRs conformations. We briefly evaluate selected computational methods using the disordered N-terminal of the human copper transporter 1 as a test case and outline further challenges in IDP/IDRs ensemble prediction.
Affiliation(s)
- Jana Aupič
  - CNR-IOM at International School for Advanced Studies (SISSA/ISAS), via Bonomea 265, 34136 Trieste, Italy
- Pavlína Pokorná
  - CNR-IOM at International School for Advanced Studies (SISSA/ISAS), via Bonomea 265, 34136 Trieste, Italy
- Sharon Ruthstein
  - Department of Chemistry, Faculty of Exact Sciences and the Institute for Nanotechnology and Advanced Materials (BINA), Bar-Ilan University, 5290002 Ramat-Gan, Israel
- Alessandra Magistrato
  - CNR-IOM at International School for Advanced Studies (SISSA/ISAS), via Bonomea 265, 34136 Trieste, Italy
6
Ruzmetov T, Hung TI, Jonnalagedda SP, Chen SH, Fasihianifard P, Guo Z, Bhanu B, Chang CEA. Sampling Conformational Ensembles of Highly Dynamic Proteins via Generative Deep Learning. Research Square 2024:rs.3.rs-4301803. [PMID: 38978607 PMCID: PMC11230488 DOI: 10.21203/rs.3.rs-4301803/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Proteins are inherently dynamic, and their conformational ensembles are functionally important in biology. Large-scale motions may govern protein structure-function relationship, and numerous transient but stable conformations of intrinsically disordered proteins (IDPs) can play a crucial role in biological function. Investigating conformational ensembles to understand regulations and disease-related aggregations of IDPs is challenging both experimentally and computationally. In this paper first an unsupervised deep learning-based model, termed Internal Coordinate Net (ICoN), is developed that learns the physical principles of conformational changes from molecular dynamics (MD) simulation data. Second, interpolating data points in the learned latent space are selected that rapidly identify novel synthetic conformations with sophisticated and large-scale sidechains and backbone arrangements. Third, with the highly dynamic amyloid-β1-42 (Aβ42) monomer, our deep learning model provided a comprehensive sampling of Aβ42's conformational landscape. Analysis of these synthetic conformations revealed conformational clusters that can be used to rationalize experimental findings. Additionally, the method can identify novel conformations with important interactions in atomistic details that are not included in the training data. New synthetic conformations showed distinct sidechain rearrangements that are probed by our EPR and amino acid substitution studies. The proposed approach is highly transferable and can be used for any available data for training. The work also demonstrated the ability for deep learning to utilize learned natural atomistic motions in protein conformation sampling.
Affiliation(s)
- Talant Ruzmetov
  - Department of Chemistry, University of California, Riverside, CA 92521
- Ta I Hung
  - Department of Chemistry, University of California, Riverside, CA 92521
  - Department of Bioengineering, University of California, Riverside, CA 92521
- Si-Han Chen
  - Department of Chemistry, University of California, Riverside, CA 92521
- Zhefeng Guo
  - Department of Neurology, Brain Research Institute, University of California, Los Angeles, CA 90095
- Bir Bhanu
  - Department of Bioengineering, University of California, Riverside, CA 92521
  - Department of Electrical and Computer Engineering, University of California, Riverside, CA 92521
- Chia-En A Chang
  - Department of Chemistry, University of California, Riverside, CA 92521
  - Department of Bioengineering, University of California, Riverside, CA 92521
7
Wang J, Wang X, Chu Y, Li C, Li X, Meng X, Fang Y, No KT, Mao J, Zeng X. Exploring the Conformational Ensembles of Protein-Protein Complex with Transformer-Based Generative Model. J Chem Theory Comput 2024; 20:4469-4480. [PMID: 38816696 DOI: 10.1021/acs.jctc.4c00255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Protein-protein interactions are the basis of many protein functions, and understanding the contact and conformational changes of protein-protein interactions is crucial for linking the protein structure to biological function. Although difficult to detect experimentally, molecular dynamics (MD) simulations are widely used to study the conformational ensembles and dynamics of protein-protein complexes, but there are significant limitations in sampling efficiency and computational costs. In this study, a generative neural network was trained on protein-protein complex conformations obtained from molecular simulations to directly generate novel conformations with physical realism. We demonstrated the use of a deep learning model based on the transformer architecture to explore the conformational ensembles of protein-protein complexes through MD simulations. The results showed that the learned latent space can be used to generate unsampled conformations of protein-protein complexes for obtaining new conformations complementing pre-existing ones, which can be used as an exploratory tool for the analysis and enhancement of molecular simulations of protein-protein complexes.
Affiliation(s)
- Jianmin Wang
  - The Interdisciplinary Graduate Program in Integrative Biotechnology, Yonsei University, Incheon 21983, Korea
- Xun Wang
  - School of Computer Science and Technology, China University of Petroleum, Qingdao, Shandong 266580, P. R. China
  - High Performance Computer Research Center, University of Chinese Academy of Sciences, Beijing 100190, P. R. China
- Yanyi Chu
  - Department of Pathology, Stanford University School of Medicine, Stanford, California 94305, United States
- Chunyan Li
  - School of Informatics, Yunnan Normal University, Kunming, Yunnan 650500, P. R. China
- Xue Li
  - School of Computer Science and Technology, China University of Petroleum, Qingdao, Shandong 266580, P. R. China
- Xiangyu Meng
  - School of Computer Science and Technology, China University of Petroleum, Qingdao, Shandong 266580, P. R. China
- Yitian Fang
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P. R. China
- Kyoung Tai No
  - The Interdisciplinary Graduate Program in Integrative Biotechnology, Yonsei University, Incheon 21983, Korea
- Jiashun Mao
  - School of Medical Information and Engineering, Southwest Medical University, Luzhou, Sichuan 646000, P. R. China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, P. R. China
8
Wang L, Wen Z, Liu SW, Zhang L, Finley C, Lee HJ, Fan HJS. Overview of AlphaFold2 and breakthroughs in overcoming its limitations. Comput Biol Med 2024; 176:108620. [PMID: 38761500 DOI: 10.1016/j.compbiomed.2024.108620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 05/01/2024] [Accepted: 05/14/2024] [Indexed: 05/20/2024]
Abstract
Predicting three-dimensional (3D) protein structures has been challenging for decades. The emergence of AlphaFold2 (AF2), a deep learning-based machine learning method developed by DeepMind, became a game changer in the protein folding community. AF2 can predict a protein's three-dimensional structure with high confidence based on its amino acid sequence. Accurate prediction of protein structures can dramatically accelerate our understanding of biological mechanisms and provide a solid foundation for reliable drug design. Although AF2 breaks through the barriers in predicting protein structures, much room remains for further study. This review provides a brief historical overview of the development of protein structure prediction, covering template-based, template-free, and machine learning-based methods. In addition to reviewing the potential benefits (Pros) and considerations (Cons) of using AF2, this review summarizes the diverse applications, including protein structure predictions, dynamic changes, point mutation, integration of language model and experimental data, protein complex, and protein-peptide interaction. It underscores recent advancements in efficiency, reliability, and broad application of AF2. This comprehensive review offers valuable insights into the applications of AF2 and AF2-inspired AI methods in structural biology and its potential for clinically significant drug target discovery.
Affiliation(s)
- Lei Wang
  - College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
- Zehua Wen
  - College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
- Shi-Wei Liu
  - College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
- Lihong Zhang
  - Digestive Department, Binhai New Area Hospital of TCM Tianjin, Tianjin, 300451, China
- Cierra Finley
  - Department of Natural Sciences, Southwest Tennessee Community College, Memphis, TN, 38015, USA
- Ho-Jin Lee
  - Department of Natural Sciences, Southwest Tennessee Community College, Memphis, TN, 38015, USA; Division of Natural & Mathematical Sciences, LeMoyne-Owen College, Memphis, TN, 38126, USA
- Hua-Jun Shawn Fan
  - College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
9
Su Z, Dhusia K, Wu Y. Encoding the space of protein-protein binding interfaces by artificial intelligence. Comput Biol Chem 2024; 110:108080. [PMID: 38643609 DOI: 10.1016/j.compbiolchem.2024.108080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 04/03/2024] [Accepted: 04/17/2024] [Indexed: 04/23/2024]
Abstract
The physical interactions between proteins are largely determined by the structural properties at their binding interfaces. It was found that the binding interfaces in distinctive protein complexes are highly similar. The structural properties underlying different binding interfaces could be further captured by artificial intelligence. In order to test this hypothesis, we broke protein-protein binding interfaces into pairs of interacting fragments. We employed a generative model to encode these interface fragment pairs in a low-dimensional latent space. After training, new conformations of interface fragment pairs were generated. We found that, by only using a small number of interface fragment pairs that were generated by artificial intelligence, we were able to guide the assembly of protein complexes into their native conformations. These results demonstrate that the conformational space of fragment pairs at protein-protein binding interfaces is highly degenerate. Features in this degenerate space can be well characterized by artificial intelligence. In summary, our machine learning method will be potentially useful to search for and predict the conformations of unknown protein-protein interactions.
Affiliation(s)
- Zhaoqian Su
  - Data Science Institute, Vanderbilt University, 1001 19th Ave S, Nashville, TN 37212, USA
- Kalyani Dhusia
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Yinghao Wu
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
10
Janson G, Feig M. Transferable deep generative modeling of intrinsically disordered protein conformations. PLoS Comput Biol 2024; 20:e1012144. [PMID: 38781245 PMCID: PMC11152266 DOI: 10.1371/journal.pcbi.1012144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 06/05/2024] [Accepted: 05/07/2024] [Indexed: 05/25/2024] Open
Abstract
Intrinsically disordered proteins have dynamic structures through which they play key biological roles. The elucidation of their conformational ensembles is a challenging problem requiring an integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent in the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder to learn a representation of protein geometry and a diffusion model to sample novel conformations in the encoded space. IdpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures 3D structural ensembles of test sequences with no similarity in the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents a significant progress in transferable protein ensemble modeling through machine learning.
Affiliation(s)
- Giacomo Janson
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
11
Ellaway JIJ, Anyango S, Nair S, Zaki HA, Nadzirin N, Powell HR, Gutmanas A, Varadi M, Velankar S. Identifying protein conformational states in the Protein Data Bank: Toward unlocking the potential of integrative dynamics studies. Structural Dynamics (Melville, N.Y.) 2024; 11:034701. [PMID: 38774441 PMCID: PMC11106648 DOI: 10.1063/4.0000251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Accepted: 05/08/2024] [Indexed: 05/24/2024]
Abstract
Studying protein dynamics and conformational heterogeneity is crucial for understanding biomolecular systems and treating disease. Despite the deposition of over 215 000 macromolecular structures in the Protein Data Bank and the advent of AI-based structure prediction tools such as AlphaFold2, RoseTTAFold, and ESMFold, static representations are typically produced, which fail to fully capture macromolecular motion. Here, we discuss the importance of integrating experimental structures with computational clustering to explore the conformational landscapes that manifest protein function. We describe the method developed by the Protein Data Bank in Europe - Knowledge Base to identify distinct conformational states, demonstrate the resource's primary use cases, through examples, and discuss the need for further efforts to annotate protein conformations with functional information. Such initiatives will be crucial in unlocking the potential of protein dynamics data, expediting drug discovery research, and deepening our understanding of macromolecular mechanisms.
Affiliation(s)
- Joseph I. J. Ellaway
  - Protein Data Bank in Europe, European Bioinformatics Institute, Hinxton, United Kingdom
- Stephen Anyango
  - Protein Data Bank in Europe, European Bioinformatics Institute, Hinxton, United Kingdom
- Sreenath Nair
  - Protein Data Bank in Europe, European Bioinformatics Institute, Hinxton, United Kingdom
- Hossam A. Zaki
  - The Warren Alpert Medical School of Brown University, Providence, Rhode Island 02903, USA
- Nurul Nadzirin
  - Protein Data Bank in Europe, European Bioinformatics Institute, Hinxton, United Kingdom
- Harold R. Powell
  - Imperial College London, Department of Life Sciences, London, United Kingdom
- Aleksandras Gutmanas
  - WaveBreak Therapeutics Ltd., Clarendon House, Clarendon Road, Cambridge, United Kingdom
- Mihaly Varadi
  - Protein Data Bank in Europe, European Bioinformatics Institute, Hinxton, United Kingdom
- Sameer Velankar
  - Protein Data Bank in Europe, European Bioinformatics Institute, Hinxton, United Kingdom
12
Janson G, Feig M. Transferable deep generative modeling of intrinsically disordered protein conformations. bioRxiv: the preprint server for biology 2024:2024.02.08.579522. [PMID: 38370653 PMCID: PMC10871340 DOI: 10.1101/2024.02.08.579522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Intrinsically disordered proteins have dynamic structures through which they play key biological roles. The elucidation of their conformational ensembles is a challenging problem requiring an integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent in the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder to learn a representation of protein geometry and a diffusion model to sample novel conformations in the encoded space. IdpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures 3D structural ensembles of test sequences with no similarity in the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents a significant progress in transferable protein ensemble modeling through machine learning.
Affiliation(s)
- Giacomo Janson
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
13
Ghosh C, Nagpal S, Muñoz V. Molecular simulations integrated with experiments for probing the interaction dynamics and binding mechanisms of intrinsically disordered proteins. Curr Opin Struct Biol 2024; 84:102756. [PMID: 38118365 PMCID: PMC11242915 DOI: 10.1016/j.sbi.2023.102756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Revised: 11/28/2023] [Accepted: 11/29/2023] [Indexed: 12/22/2023]
Abstract
Intrinsically disordered proteins (IDPs) exploit their plasticity to deploy a rich panoply of soft interactions and binding phenomena. Advances in tailoring molecular simulations for IDPs combined with experimental cross-validation offer an atomistic view of the mechanisms that control IDP binding, function, and dysfunction. The emerging theme is that unbound IDPs autonomously form transient local structures and self-interactions that determine their binding behavior. Recent results have shed light on whether and how IDPs fold, stay disordered or drive condensation upon binding; how they achieve binding specificity and select among competing partners. The disorder-binding paradigm is now being proactively used by researchers to target IDPs for rational drug design and engineer molecular responsive elements for biosensing applications.
Affiliation(s)
- Catherine Ghosh
- NSF-CREST Center for Cellular and Biomolecular Machines (CCBM), University of California at Merced, Merced, 95343 CA, USA; Department of Bioengineering, University of California at Merced, Merced, 95343 CA, USA. https://twitter.com/cat_ghosh
- Suhani Nagpal
- NSF-CREST Center for Cellular and Biomolecular Machines (CCBM), University of California at Merced, Merced, 95343 CA, USA; Department of Bioengineering, University of California at Merced, Merced, 95343 CA, USA; OpenEye, Cadence Molecular Sciences, Boston, 02114 MA, USA
- Victor Muñoz
- NSF-CREST Center for Cellular and Biomolecular Machines (CCBM), University of California at Merced, Merced, 95343 CA, USA; Department of Bioengineering, University of California at Merced, Merced, 95343 CA, USA.
|
14
|
Taneja I, Lasker K. Machine-learning-based methods to generate conformational ensembles of disordered proteins. Biophys J 2024; 123:101-113. [PMID: 38053335 PMCID: PMC10808026 DOI: 10.1016/j.bpj.2023.12.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 10/24/2023] [Accepted: 12/01/2023] [Indexed: 12/07/2023] Open
Abstract
Intrinsically disordered proteins are characterized by a conformational ensemble. While computational approaches such as molecular dynamics simulations have been used to generate such ensembles, their computational costs can be prohibitive. An alternative approach is to learn from data and train machine-learning models to generate conformational ensembles of disordered proteins. This has been a relatively unexplored approach, and in this work we demonstrate a proof-of-principle approach to do so. Specifically, we devised a two-stage computational pipeline: in the first stage, we employed supervised machine-learning models to predict ensemble-derived two-dimensional (2D) properties of a sequence, given the conformational ensemble of a closely related sequence. In the second stage, we used denoising diffusion models to generate three-dimensional (3D) coarse-grained conformational ensembles, given the two-dimensional predictions outputted by the first stage. We trained our models on a data set of coarse-grained molecular dynamics simulations of thousands of rationally designed synthetic sequences. The accuracy of our 2D and 3D predictions was validated across multiple metrics, and our work demonstrates the applicability of machine-learning techniques to predicting higher-dimensional properties of disordered proteins.
Affiliation(s)
- Ishan Taneja
- Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California
- Keren Lasker
- Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California.
|
15
|
Zhu J, Li Z, Tong H, Lu Z, Zhang N, Wei T, Chen HF. Phanto-IDP: compact model for precise intrinsically disordered protein backbone generation and enhanced sampling. Brief Bioinform 2023; 25:bbad429. [PMID: 38018910 PMCID: PMC10783862 DOI: 10.1093/bib/bbad429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 09/21/2023] [Accepted: 11/05/2023] [Indexed: 11/30/2023] Open
Abstract
The biological function of proteins is determined not only by their static structures but also by the dynamic properties of their conformational ensembles. Numerous high-accuracy static structure prediction tools have been recently developed based on deep learning; however, there remains a lack of efficient and accurate methods for exploring protein dynamic conformations. Traditionally, studies concerning protein dynamics have relied on molecular dynamics (MD) simulations, which incur significant computational costs for all-atom precision and struggle to adequately sample conformational spaces with high energy barriers. To overcome these limitations, various enhanced sampling techniques have been developed to accelerate sampling in MD. Traditional enhanced sampling approaches like replica exchange molecular dynamics (REMD) and frontier expansion sampling (FEXS) often follow the MD simulation approach and still cost a lot of computational resources and time. Variational autoencoders (VAEs), as a classic deep generative model, are not restricted by potential energy landscapes and can explore conformational spaces more efficiently than traditional methods. However, VAEs often face challenges in generating reasonable conformations for complex proteins, especially intrinsically disordered proteins (IDPs), which limits their application as an enhanced sampling method. In this study, we presented a novel deep learning model (named Phanto-IDP) that utilizes a graph-based encoder to extract protein features and a transformer-based decoder combined with variational sampling to generate highly accurate protein backbones. Ten IDPs and four structured proteins were used to evaluate the sampling ability of Phanto-IDP. 
The results demonstrate that Phanto-IDP has high fidelity and diversity in the generated conformation ensembles, making it a suitable tool for enhancing the efficiency of MD simulation, generating broader protein conformational space and a continuous protein transition path.
Affiliation(s)
- Junjie Zhu
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Zhengxin Li
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Haowei Tong
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Zhouyu Lu
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Ningjie Zhang
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Ting Wei
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Hai-Feng Chen
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
|
16
|
Amith W, Dutagaci B. Complex Conformational Space of the RNA Polymerase II C-Terminal Domain upon Phosphorylation. J Phys Chem B 2023; 127:9223-9235. [PMID: 37870995 PMCID: PMC10626582 DOI: 10.1021/acs.jpcb.3c02655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 10/03/2023] [Indexed: 10/25/2023]
Abstract
Intrinsically disordered proteins (IDPs) have been closely studied during the past decade due to their importance in many biological processes. The disordered nature of this group of proteins makes it difficult to observe its full span of the conformational space using either experimental or computational studies. In this article, we explored the conformational space of the C-terminal domain (CTD) of RNA polymerase II (Pol II), which is also an intrinsically disordered low complexity domain, using enhanced sampling methods. We provided a detailed conformational analysis of model systems of CTD with different lengths; first with the last 44 residues of the human CTD sequence and finally the CTD model with 2-heptapeptide repeating units. We then investigated the effects of phosphorylation on CTD conformations by performing simulations at different phosphorylated states. We obtained broad conformational spaces in nonphosphorylated CTD models, and phosphorylation has complex effects on the conformations of the CTD. These complex effects depend on the length of the CTD, spacing between the multiple phosphorylation sites, ion coordination, and interactions with the nearby residues.
Affiliation(s)
- Weththasinghage D. Amith
- Department of Molecular and Cell Biology, University of California, Merced, Merced, California 95343, United States
- Bercem Dutagaci
- Department of Molecular and Cell Biology, University of California, Merced, Merced, California 95343, United States
|
17
|
Zheng LE, Barethiya S, Nordquist E, Chen J. Machine Learning Generation of Dynamic Protein Conformational Ensembles. Molecules 2023; 28:4047. [PMID: 37241789 PMCID: PMC10220786 DOI: 10.3390/molecules28104047] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/10/2023] [Revised: 05/04/2023] [Accepted: 05/09/2023] [Indexed: 05/28/2023] Open
Abstract
Machine learning has achieved remarkable success across a broad range of scientific and engineering disciplines, particularly its use for predicting native protein structures from sequence information alone. However, biomolecules are inherently dynamic, and there is a pressing need for accurate predictions of dynamic structural ensembles across multiple functional levels. These problems range from the relatively well-defined task of predicting conformational dynamics around the native state of a protein, which traditional molecular dynamics (MD) simulations are particularly adept at handling, to generating large-scale conformational transitions connecting distinct functional states of structured proteins or numerous marginally stable states within the dynamic ensembles of intrinsically disordered proteins. Machine learning has been increasingly applied to learn low-dimensional representations of protein conformational spaces, which can then be used to drive additional MD sampling or directly generate novel conformations. These methods promise to greatly reduce the computational cost of generating dynamic protein ensembles, compared to traditional MD simulations. In this review, we examine recent progress in machine learning approaches towards generative modeling of dynamic protein ensembles and emphasize the crucial importance of integrating advances in machine learning, structural data, and physical principles to achieve these ambitious goals.
Affiliation(s)
- Li-E Zheng
- Department of Gynecology, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
- Shrishti Barethiya
- Department of Chemistry, University of Massachusetts Amherst, Amherst, MA 01003, USA
- Erik Nordquist
- Department of Chemistry, University of Massachusetts Amherst, Amherst, MA 01003, USA
- Jianhan Chen
- Department of Chemistry, University of Massachusetts Amherst, Amherst, MA 01003, USA
|
18
|
Zhang O, Haghighatlari M, Li J, Liu ZH, Namini A, Teixeira JMC, Forman-Kay JD, Head-Gordon T. Learning to evolve structural ensembles of unfolded and disordered proteins using experimental solution data. J Chem Phys 2023; 158:174113. [PMID: 37144719 PMCID: PMC10163956 DOI: 10.1063/5.0141474] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/05/2023] [Accepted: 04/11/2023] [Indexed: 05/06/2023] Open
Abstract
The structural characterization of proteins with a disorder requires a computational approach backed by experiments to model their diverse and dynamic structural ensembles. The selection of conformational ensembles consistent with solution experiments of disordered proteins highly depends on the initial pool of conformers, with currently available tools limited by conformational sampling. We have developed a Generative Recurrent Neural Network (GRNN) that uses supervised learning to bias the probability distributions of torsions to take advantage of experimental data types such as nuclear magnetic resonance J-couplings, nuclear Overhauser effects, and paramagnetic resonance enhancements. We show that updating the generative model parameters according to the reward feedback on the basis of the agreement between experimental data and probabilistic selection of torsions from learned distributions provides an alternative to existing approaches that simply reweight conformers of a static structural pool for disordered proteins. Instead, the biased GRNN, DynamICE, learns to physically change the conformations of the underlying pool of the disordered protein to those that better agree with experiments.
Affiliation(s)
- Oufan Zhang
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
- Mojtaba Haghighatlari
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
- Jie Li
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
- Ashley Namini
- Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5S 1A8, Canada
|
19
|
Gaalswyk K, Haider A, Ghosh K. Critical Assessment of Self-Consistency Checks in the All-Atom Molecular Dynamics Simulation of Intrinsically Disordered Proteins. J Chem Theory Comput 2023; 19:2973-2984. [PMID: 37133846 DOI: 10.1021/acs.jctc.2c01140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
All atom simulations can be used to quantify conformational properties of Intrinsically Disordered Proteins (IDP). However, simulations must satisfy convergence checks to ensure observables computed from simulation are reliable and reproducible. While absolute convergence is purely a theoretical concept requiring infinitely long simulation, a more practical, yet rigorous, approach is to impose Self Consistency Checks (SCCs) to gain confidence in the simulated data. Currently there is no study of SCCs in IDPs, unlike their folded counterparts. In this paper, we introduce different criteria for self-consistency checks for IDPs. Next, we impose these SCCs to critically assess the performance of different simulation protocols using the N terminal domain of HIV Integrase and the linker region of SARS-CoV-2 Nucleoprotein as two model IDPs. All simulation protocols begin with all-atom implicit solvent Monte Carlo (MC) simulation and subsequent clustering of MC generated conformations to create the representative structures of the IDPs. These representative structures serve as the initial structure for subsequent molecular dynamics (MD) runs with explicit solvent. We conclude that generating multiple short (∼3 μs) MD simulation trajectories─all starting from the most representative MC generated conformation─and merging them is the protocol of choice due to (i) its ability to satisfy multiple SCCs, (ii) consistently reproducing experimental data, and (iii) the efficiency of running independent trajectories in parallel by harnessing multiple cores available in modern GPU clusters. Running one long trajectory (greater than 20 μs) can also satisfy the first two criteria but is less desirable due to prohibitive computation time. 
These findings help resolve the challenge of identifying a usable starting configuration, provide an objective measure of SCC, and establish rigorous criteria to determine the minimum length (for one long simulation) or number of trajectories needed in all-atom simulation of IDPs.
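The idea of checking that independent trajectories agree before merging them can be illustrated with a short sketch. This is not the authors' set of SCCs, which the abstract does not specify: the function name `trajectory_consistency`, the choice of observable (for example a radius-of-gyration time series), and the tolerance `tol` are all assumptions made for illustration.

```python
import numpy as np

def trajectory_consistency(observables, tol=0.5):
    """Hypothetical consistency check across independent trajectories.

    observables: list of 1D arrays, one per trajectory, holding a scalar
                 observable per frame (e.g. radius of gyration).
    Returns the gap between each trajectory mean and the merged mean,
    in units of the merged standard deviation, plus a pass/fail flag.
    """
    merged = np.concatenate([np.asarray(o, dtype=float) for o in observables])
    mu, sigma = merged.mean(), merged.std(ddof=1)
    gaps = np.array([abs(np.mean(o) - mu) / sigma for o in observables])
    # Pass only if every trajectory mean lies within `tol` merged std devs.
    return gaps, bool(np.all(gaps < tol))
```

A trajectory that samples a different region of conformational space than the others shows up as a large gap, which signals that the merged ensemble should not yet be trusted.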
Affiliation(s)
- Kari Gaalswyk
- Department of Physics and Astronomy, University of Denver, Denver, Colorado 80208, United States
- Austin Haider
- Department of Molecular and Cellular Biophysics, University of Denver, Denver, Colorado 80208, United States
- Kingshuk Ghosh
- Department of Physics and Astronomy, University of Denver, Denver, Colorado 80208, United States
- Department of Molecular and Cellular Biophysics, University of Denver, Denver, Colorado 80208, United States
|
20
|
de Bruyn E, Dorn AE, Zimmermann O, Rossetti G. SPEADI: Accelerated Analysis of IDP-Ion Interactions from MD-Trajectories. BIOLOGY 2023; 12:581. [PMID: 37106781 PMCID: PMC10135740 DOI: 10.3390/biology12040581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Revised: 04/04/2023] [Accepted: 04/05/2023] [Indexed: 04/29/2023]
Abstract
The disordered nature of Intrinsically Disordered Proteins (IDPs) makes their structural ensembles particularly susceptible to changes in chemical environmental conditions, often leading to an alteration of their normal functions. A Radial Distribution Function (RDF) is considered a standard method for characterizing the chemical environment surrounding particles during atomistic simulations, commonly averaged over an entire or part of a trajectory. Given their high structural variability, such averaged information might not be reliable for IDPs. We introduce the Time-Resolved Radial Distribution Function (TRRDF), implemented in our open-source Python package SPEADI, which is able to characterize dynamic environments around IDPs. We use SPEADI to characterize the dynamic distribution of ions around the IDPs Alpha-Synuclein (AS) and Humanin (HN) from Molecular Dynamics (MD) simulations, and some of their selected mutants, showing that local ion-residue interactions play an important role in the structures and behaviors of IDPs.
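The difference between a trajectory-averaged RDF and a time-resolved one can be sketched in a few lines of NumPy. This does not use the SPEADI API; the function names, the fixed `box_volume`, and the neglect of periodic boundary conditions are simplifying assumptions for illustration only.

```python
import numpy as np

def rdf_frame(ref, ions, r_max, n_bins, box_volume):
    """g(r) for a single frame; ref is (n_ref, 3), ions is (n_ions, 3).

    Periodic images are ignored here, which is an illustrative shortcut.
    """
    # All reference-to-ion distances in this frame.
    d = np.linalg.norm(ref[:, None, :] - ions[None, :, :], axis=-1).ravel()
    counts, edges = np.histogram(d, bins=n_bins, range=(0.0, r_max))
    # Expected counts for an ideal gas at the same ion density.
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    ideal = shell_vol * (len(ions) / box_volume) * len(ref)
    return counts / ideal

def time_resolved_rdf(ref_traj, ion_traj, window, r_max=10.0, n_bins=50,
                      box_volume=1.0e3):
    """Average g(r) over consecutive, non-overlapping windows of frames.

    ref_traj: (n_frames, n_ref, 3); ion_traj: (n_frames, n_ions, 3).
    Returns an array of shape (n_windows, n_bins).
    """
    n_frames = ref_traj.shape[0]
    out = []
    for start in range(0, n_frames - window + 1, window):
        frames = [rdf_frame(ref_traj[t], ion_traj[t], r_max, n_bins, box_volume)
                  for t in range(start, start + window)]
        out.append(np.mean(frames, axis=0))
    return np.array(out)
```

Setting `window` equal to the trajectory length recovers the usual single averaged RDF, while shorter windows expose how the ion environment changes over time.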
Affiliation(s)
- Emile de Bruyn
- Jülich Supercomputing Centre, Forschungszentrum Jülich, 52425 Jülich, Germany
- Faculty of Mathematics, Computer Science and Natural Sciences, RWTH Aachen University, 52062 Aachen, Germany
- Anton Emil Dorn
- Faculty of Mathematics, Computer Science and Natural Sciences, RWTH Aachen University, 52062 Aachen, Germany
- Olav Zimmermann
- Jülich Supercomputing Centre, Forschungszentrum Jülich, 52425 Jülich, Germany
- Giulia Rossetti
- Jülich Supercomputing Centre, Forschungszentrum Jülich, 52425 Jülich, Germany
- Computational Biomedicine, Institute for Advanced Simulation IAS-5 and Institute of Neuroscience and Medicine INM-9, Forschungszentrum Jülich, 52425 Jülich, Germany
- Department of Neurology, RWTH Aachen University, 52062 Aachen, Germany
|
21
|
Zhu JJ, Zhang NJ, Wei T, Chen HF. Enhancing Conformational Sampling for Intrinsically Disordered and Ordered Proteins by Variational Autoencoder. Int J Mol Sci 2023; 24:ijms24086896. [PMID: 37108059 PMCID: PMC10138423 DOI: 10.3390/ijms24086896] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/03/2023] [Revised: 03/26/2023] [Accepted: 03/27/2023] [Indexed: 04/29/2023] Open
Abstract
Intrinsically disordered proteins (IDPs) account for more than 50% of the human proteome and are closely associated with tumors, cardiovascular diseases, and neurodegeneration, which have no fixed three-dimensional structure under physiological conditions. Due to the characteristic of conformational diversity, conventional experimental methods of structural biology, such as NMR, X-ray diffraction, and CryoEM, are unable to capture conformational ensembles. Molecular dynamics (MD) simulation can sample the dynamic conformations at the atomic level, which has become an effective method for studying the structure and function of IDPs. However, the high computational cost prevents MD simulations from being widely used for IDPs conformational sampling. In recent years, significant progress has been made in artificial intelligence, which makes it possible to solve the conformational reconstruction problem of IDP with fewer computational resources. Here, based on short MD simulations of different IDPs systems, we use variational autoencoders (VAEs) to achieve the generative reconstruction of IDPs structures and include a wider range of sampled conformations from longer simulations. Compared with the generative autoencoder (AEs), VAEs add an inference layer between the encoder and decoder in the latent space, which can cover the conformational landscape of IDPs more comprehensively and achieve the effect of enhanced sampling. Through experimental verification, the Cα RMSD between VAE-generated and MD simulation sampling conformations in the 5 IDPs test systems was significantly lower than that of AE. The Spearman correlation coefficient on the structure was higher than that of AE. VAE can also achieve excellent performance regarding structured proteins. In summary, VAEs can be used to effectively sample protein structures.
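The Cα RMSD used to compare generated and simulated conformations is normally computed after optimal rigid-body superposition. A minimal sketch using the standard Kabsch algorithm is given below; the abstract does not say which alignment procedure the authors used, so the function name `kabsch_rmsd` and the choice of algorithm are assumptions.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    P and Q are assumed to list the same atoms (e.g. C-alpha) in the same order.
    """
    # Remove translation by centering both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and measure the remaining deviation.
    P_rot = P @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1))))
```

Two copies of the same structure that differ only by rotation and translation give an RMSD of essentially zero, so any nonzero value reflects a genuine conformational difference.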
Affiliation(s)
- Jun-Jie Zhu
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Ning-Jie Zhang
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Ting Wei
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Hai-Feng Chen
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Shanghai Center for Bioinformation Technology, Shanghai 200240, China
|
22
|
Structural ensembles of disordered proteins from hierarchical chain growth and simulation. Curr Opin Struct Biol 2023; 78:102501. [PMID: 36463772 DOI: 10.1016/j.sbi.2022.102501] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/31/2022] [Revised: 10/26/2022] [Accepted: 10/28/2022] [Indexed: 12/03/2022]
Abstract
Disordered proteins and nucleic acids play key roles in cellular function and disease. Here, we review recent advances in the computational exploration of the conformational dynamics of flexible biomolecules. While atomistic molecular dynamics (MD) simulation has seen a lot of improvement in recent years, large-scale computing resources and careful validation are required to simulate full-length disordered biopolymers in solution. As a computationally efficient alternative, hierarchical chain growth (HCG) combines pre-sampled chain fragments in a statistically reproducible manner into ensembles of full-length atomically detailed biomolecular structures. Experimental data can be integrated during and after chain assembly. Applications to the neurodegeneration-linked proteins α-synuclein, tau, and TDP-43, including as condensate, illustrate the use of HCG. We conclude by highlighting the emerging connections to AI-based structural modeling including AlphaFold2.
|
23
|
Sun B, Kekenes-Huskey PM. Myofilament-associated proteins with intrinsic disorder (MAPIDs) and their resolution by computational modeling. Q Rev Biophys 2023; 56:e2. [PMID: 36628457 PMCID: PMC11070111 DOI: 10.1017/s003358352300001x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/12/2023]
Abstract
The cardiac sarcomere is a cellular structure in the heart that enables muscle cells to contract. Dozens of proteins belong to the cardiac sarcomere, which work in tandem to generate force and adapt to demands on cardiac output. Intriguingly, the majority of these proteins have significant intrinsic disorder that contributes to their functions, yet the biophysics of these intrinsically disordered regions (IDRs) have been characterized in limited detail. In this review, we first enumerate these myofilament-associated proteins with intrinsic disorder (MAPIDs) and recent biophysical studies to characterize their IDRs. We secondly summarize the biophysics governing IDR properties and the state-of-the-art in computational tools toward MAPID identification and characterization of their conformation ensembles. We conclude with an overview of future computational approaches toward broadening the understanding of intrinsic disorder in the cardiac sarcomere.
Affiliation(s)
- Bin Sun
- Research Center for Pharmacoinformatics (The State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Department of Medicinal Chemistry and Natural Medicine Chemistry, College of Pharmacy, Harbin Medical University, Harbin 150081, China
|
24
|
Lee KE, Procopio R, Pulido JS, Gunton KB. Initial Investigations of Intrinsically Disordered Regions in Inherited Retinal Diseases. Int J Mol Sci 2023; 24:ijms24021060. [PMID: 36674574 PMCID: PMC9861917 DOI: 10.3390/ijms24021060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 12/16/2022] [Accepted: 12/30/2022] [Indexed: 01/08/2023] Open
Abstract
Intrinsically disordered regions (IDRs) are protein regions that are unable to fold into stable tertiary structures, enabling their involvement in key signaling and regulatory functions via dynamic interactions with diverse binding partners. An understanding of IDRs and their association with biological function may help elucidate the pathogenesis of inherited retinal diseases (IRDs). The main focus of this work was to investigate the degree of disorder in 14 proteins implicated in IRDs and their relationship with the number of pathogenic missense variants. Metapredict, an accurate, high-performance predictor that reproduces consensus disorder scores, was used to probe the degree of disorder as a function of the amino acid sequence. Publicly available data on gnomAD and ClinVar was used to analyze the number of pathogenic missense variants. We show that proteins with an over-representation of missense variation exhibit a high degree of disorder, and proteins with a high amount of disorder tolerate a higher degree of missense variation. These proteins also exhibit a lower amount of pathogenic missense variants with respect to total missense variants. These data suggest that protein function may be related to the overall level of disorder and could be used to refine variant interpretation in IRDs.
Affiliation(s)
- Karen E. Lee
- Pediatric Ophthalmology & Adult Strabismus Service, Wills Eye Hospital, 840 Walnut Street, Philadelphia, PA 19107, USA
- Rebecca Procopio
- Pediatric Ophthalmology & Adult Strabismus Service, Wills Eye Hospital, 840 Walnut Street, Philadelphia, PA 19107, USA
- Jose S. Pulido
- Retina Service, Wills Eye Hospital and Mid Atlantic Retina, 840 Walnut Street, Philadelphia, PA 19107, USA
- Department of Translational Ophthalmology, Wills Eye Hospital, 840 Walnut Street, Philadelphia, PA 19107, USA
- Kammi B. Gunton
- Pediatric Ophthalmology & Adult Strabismus Service, Wills Eye Hospital, 840 Walnut Street, Philadelphia, PA 19107, USA
|
25
|
Agard DA, Bowman GR, DeGrado W, Dokholyan NV, Zhou HX. Solution of the protein structure prediction problem at last: crucial innovations and next frontiers. Fac Rev 2022; 11:38. [PMID: 36644294 PMCID: PMC9815721 DOI: 10.12703/r-01-0000020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/16/2022] Open
Abstract
The protein structure prediction problem is solved, at last, thanks in large part to the use of artificial intelligence. The structures predicted by AlphaFold and RoseTTAFold are becoming the requisite starting point for many protein scientists. New frontiers, such as the conformational sampling of intrinsically disordered proteins, are emerging.
Affiliation(s)
- David A Agard
- University of California San Francisco; Chan Zuckerberg Institute for Advanced Biological Imaging
|