1
Chang L, Perez A. AlphaFold2 knows some protein folding principles. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.25.609581. [PMID: 39253449 PMCID: PMC11383045 DOI: 10.1101/2024.08.25.609581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
AlphaFold2 (AF2) has revolutionized protein structure prediction. However, a common confusion lies in equating the protein structure prediction problem with the protein folding problem. The former provides a static structure, while the latter explains the dynamic folding pathway to that structure. We challenge the current status quo and advocate that AF2 has indeed learned some protein folding principles, despite being designed for structure prediction. AF2's high-dimensional parameters encode an imperfect biophysical scoring function. Typically, AF2 uses multiple sequence alignments (MSAs) to guide the search within a narrow region of its learned surface. In our study, we operate AF2 without MSAs or initial templates, forcing it to sample its entire energy landscape - more akin to an ab initio approach. Among over 7,000 proteins, a fraction fold using sequence alone, highlighting the smoothness of AF2's learned energy surface. Additionally, by combining recycling and iterative predictions, we discover multiple AF2 intermediate structures in good agreement with known experimental data. AF2 appears to follow a "local first, global later" folding mechanism. For designed proteins with more optimized local interactions, AF2's energy landscape is too smooth to detect intermediates even when it should. Our current work sheds new light on what AF2 has learned and opens exciting possibilities to advance our understanding of protein folding and for experimental discovery of folding intermediates.
Affiliation(s)
- Liwei Chang
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Alberto Perez
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
2
Zhao K, Zhao P, Wang S, Xia Y, Zhang G. FoldPAthreader: predicting protein folding pathway using a novel folding force field model derived from known protein universe. Genome Biol 2024; 25:152. [PMID: 38862984 PMCID: PMC11167914 DOI: 10.1186/s13059-024-03291-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 05/29/2024] [Indexed: 06/13/2024] Open
Abstract
Protein folding has become a tractable problem with the significant advances in deep learning-driven protein structure prediction. Here we propose FoldPAthreader, a protein folding pathway prediction method that uses a novel folding force field model by exploring the intrinsic relationship between protein evolution and folding from the known protein universe. Further, the folding force field is used to guide Monte Carlo conformational sampling, driving the protein chain fold into its native state by exploring potential intermediates. On 30 example targets, FoldPAthreader successfully predicts 70% of the proteins whose folding pathway is consistent with biological experimental data.
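The abstract above describes Monte Carlo conformational sampling guided by a force field. A minimal sketch of the generic Metropolis acceptance rule is given below, assuming a toy one-dimensional energy in place of the folding force field. The names `metropolis`, `toy_energy`, and `toy_propose` are hypothetical, and this is not the FoldPAthreader implementation.

```python
import math
import random


def metropolis(energy, propose, x0, steps, temperature=1.0, seed=0):
    """Generic Metropolis sampler; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        y = propose(x, rng)
        e_new = energy(y)
        # Downhill moves are always accepted; uphill moves are accepted
        # with the Boltzmann probability exp(-dE / T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = y, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def toy_energy(x):
    """Assumed stand-in for a force field: a tilted 1D double well."""
    return (x * x - 1.0) ** 2 - 0.3 * x


def toy_propose(x, rng):
    """Assumed move set: a small uniform random displacement."""
    return x + rng.uniform(-0.5, 0.5)


# Start in the shallower well and let the sampler explore.
best_x, best_e = metropolis(toy_energy, toy_propose, x0=-1.0,
                            steps=5000, temperature=0.2)
```

In a real pathway predictor the state would be a chain conformation and the move set would perturb torsions or fragments; states visited before the lowest-energy one are what would be inspected as candidate intermediates.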
Affiliation(s)
- Kailong Zhao
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Pengxin Zhao
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Suhui Wang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Yuhao Xia
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
3
Bitran A, Park K, Serebryany E, Shakhnovich EI. Co-translational formation of disulfides guides folding of the SARS-CoV-2 receptor binding domain. Biophys J 2023; 122:3238-3253. [PMID: 37422697 PMCID: PMC10465708 DOI: 10.1016/j.bpj.2023.07.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Revised: 05/27/2023] [Accepted: 07/03/2023] [Indexed: 07/10/2023] Open
Abstract
Many secreted proteins, including viral proteins, contain multiple disulfide bonds. How disulfide formation is coupled to protein folding in the cell remains poorly understood at the molecular level. Here, we combine experiment and simulation to address this question as it pertains to the SARS-CoV-2 receptor binding domain (RBD). We show that the RBD can only refold reversibly if its native disulfides are present before folding. But in their absence, the RBD spontaneously misfolds into a nonnative, molten-globule-like state that is structurally incompatible with complete disulfide formation and that is highly prone to aggregation. Thus, the RBD native structure represents a metastable state on the protein's energy landscape with reduced disulfides, indicating that nonequilibrium mechanisms are needed to ensure native disulfides form before folding. Our atomistic simulations suggest that this may be achieved via co-translational folding during RBD secretion into the endoplasmic reticulum. Namely, at intermediate translation lengths, native disulfide pairs are predicted to come together with high probability, and thus, under suitable kinetic conditions, this process may lock the protein into its native state and circumvent highly aggregation-prone nonnative intermediates. This detailed molecular picture of the RBD folding landscape may shed light on SARS-CoV-2 pathology and molecular constraints governing SARS-CoV-2 evolution.
Affiliation(s)
- Amir Bitran
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts; PhD Program in Biophysics, Harvard University, Cambridge, Massachusetts
- Kibum Park
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
- Eugene Serebryany
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
- Eugene I Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
4
Serebryany E, Zhao VY, Park K, Bitran A, Trauger SA, Budnik B, Shakhnovich EI. Systematic conformation-to-phenotype mapping via limited deep sequencing of proteins. Mol Cell 2023; 83:1936-1952.e7. [PMID: 37267908 PMCID: PMC10281453 DOI: 10.1016/j.molcel.2023.05.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 01/29/2023] [Accepted: 05/03/2023] [Indexed: 06/04/2023]
Abstract
Non-native conformations drive protein-misfolding diseases, complicate bioengineering efforts, and fuel molecular evolution. No current experimental technique is well suited for elucidating them and their phenotypic effects. Especially intractable are the transient conformations populated by intrinsically disordered proteins. We describe an approach to systematically discover, stabilize, and purify native and non-native conformations, generated in vitro or in vivo, and directly link conformations to molecular, organismal, or evolutionary phenotypes. This approach involves high-throughput disulfide scanning (HTDS) of the entire protein. To reveal which disulfides trap which chromatographically resolvable conformers, we devised a deep-sequencing method for double-Cys variant libraries of proteins that precisely and simultaneously locates both Cys residues within each polypeptide. HTDS of the abundant E. coli periplasmic chaperone HdeA revealed distinct classes of disordered hydrophobic conformers with variable cytotoxicity depending on where the backbone was cross-linked. HTDS can bridge conformational and phenotypic landscapes for many proteins that function in disulfide-permissive environments.
Affiliation(s)
- Eugene Serebryany
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA
- Victor Y Zhao
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA
- Kibum Park
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA
- Amir Bitran
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA
- Sunia A Trauger
- Center for Mass Spectrometry, Harvard University, Cambridge, MA 02138, USA
- Bogdan Budnik
- Center for Mass Spectrometry, Harvard University, Cambridge, MA 02138, USA
- Eugene I Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA
5
Serebryany E, Zhao VY, Park K, Bitran A, Trauger SA, Budnik B, Shakhnovich EI. Systematic conformation-to-phenotype mapping via limited deep-sequencing of proteins. ARXIV 2023:2204.06159. [PMID: 36776823 PMCID: PMC9915745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/14/2023]
Abstract
Non-native conformations drive protein misfolding diseases, complicate bioengineering efforts, and fuel molecular evolution. No current experimental technique is well-suited for elucidating them and their phenotypic effects. Especially intractable are the transient conformations populated by intrinsically disordered proteins. We describe an approach to systematically discover, stabilize, and purify native and non-native conformations, generated in vitro or in vivo, and directly link conformations to molecular, organismal, or evolutionary phenotypes. This approach involves high-throughput disulfide scanning (HTDS) of the entire protein. To reveal which disulfides trap which chromatographically resolvable conformers, we devised a deep-sequencing method for double-Cys variant libraries of proteins that precisely and simultaneously locates both Cys residues within each polypeptide. HTDS of the abundant E. coli periplasmic chaperone HdeA revealed distinct classes of disordered hydrophobic conformers with variable cytotoxicity depending on where the backbone was cross-linked. HTDS can bridge conformational and phenotypic landscapes for many proteins that function in disulfide-permissive environments.
Affiliation(s)
- Eugene Serebryany
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA
- Victor Y. Zhao
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA
- Kibum Park
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA
- Amir Bitran
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA
- Bogdan Budnik
- Center for Mass Spectrometry, Harvard University, Cambridge, MA
6
Bitran A, Park K, Serebryany E, Shakhnovich EI. Cotranslational formation of disulfides guides folding of the SARS COV-2 receptor binding domain. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2022:2022.11.10.516025. [PMID: 36380756 PMCID: PMC9665344 DOI: 10.1101/2022.11.10.516025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Many secreted proteins contain multiple disulfide bonds. How disulfide formation is coupled to protein folding in the cell remains poorly understood at the molecular level. Here, we combine experiment and simulation to address this question as it pertains to the SARS-CoV-2 receptor binding domain (RBD). We show that, whereas RBD can refold reversibly when its disulfides are intact, their disruption causes misfolding into a nonnative molten-globule state that is highly prone to aggregation and disulfide scrambling. Thus, non-equilibrium mechanisms are needed to ensure disulfides form prior to folding in vivo. Our simulations suggest that co-translational folding may accomplish this, as native disulfide pairs are predicted to form with high probability at intermediate lengths, ultimately committing the RBD to its metastable native state and circumventing nonnative intermediates. This detailed molecular picture of the RBD folding landscape may shed light on SARS-CoV-2 pathology and molecular constraints governing SARS-CoV-2 evolution.
7
Woodard J, Iqbal S, Mashaghi A. Circuit topology predicts pathogenicity of missense mutations. Proteins 2022; 90:1634-1644. [PMID: 35394672 PMCID: PMC9543832 DOI: 10.1002/prot.26342] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/15/2021] [Revised: 03/07/2022] [Accepted: 03/30/2022] [Indexed: 12/05/2022]
Abstract
The contact topology of a protein determines important aspects of the folding process. The topological measure of contact order has been shown to be predictive of the rate of folding. Circuit topology is emerging as another fundamental descriptor of biomolecular structure, with predicted effects on the folding rate. We analyze the residue‐based circuit topological environments of 21 K mutations labeled as pathogenic or benign. Multiple statistical lines of reasoning support the conclusion that the number of contacts in two specific circuit topological arrangements, namely inverse parallel and cross relations, with contacts involving the mutated residue have discriminatory value in determining the pathogenicity of human variants. We investigate how results vary with residue type and according to whether the gene is essential. We further explore the relationship to a number of structural features and find that circuit topology provides nonredundant information on protein structures and pathogenicity of mutations. Results may have implications for the polymer physics of protein folding and suggest that “local” topological information, including residue‐based circuit topology and residue contact order, could be useful in improving state‐of‐the‐art machine learning algorithms for pathogenicity prediction.
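The two descriptors named in the abstract above can be illustrated with a short sketch. Relative contact order is taken here as the mean sequence separation of contacts divided by chain length, and pairs of contacts are classified as series, parallel, inverse parallel, or cross from the ordering of their endpoints. The function names are hypothetical, the direction convention for parallel versus inverse parallel is an assumption, and "concerted" is used as a catch-all for contacts that share a residue.

```python
from collections import Counter


def relative_contact_order(contacts, length):
    """Mean sequence separation |j - i| over contacts, divided by chain length."""
    if not contacts:
        return 0.0
    total = sum(abs(j - i) for i, j in contacts)
    return total / (len(contacts) * length)


def circuit_relation(a, b):
    """Classify how contact a = (i, j) is arranged relative to contact b = (k, l)."""
    (i, j), (k, l) = sorted(a), sorted(b)
    if j < k or l < i:
        return "series"            # the two loops do not overlap
    if i < k and l < j:
        return "parallel"          # b is nested inside a (assumed convention)
    if k < i and j < l:
        return "inverse parallel"  # a is nested inside b (assumed convention)
    if i < k < j < l or k < i < l < j:
        return "cross"             # the loops interleave
    return "concerted"             # the contacts share a residue


def relation_counts(residue, contacts):
    """Count relations between contacts involving `residue` and all other contacts."""
    own = [c for c in contacts if residue in c]
    others = [c for c in contacts if residue not in c]
    return Counter(circuit_relation(a, b) for a in own for b in others)


# Hypothetical contact list for a 20-residue chain.
contacts = [(1, 10), (3, 6), (4, 12), (14, 18)]
rco = relative_contact_order(contacts, length=20)
counts_for_residue_3 = relation_counts(3, contacts)
```

Per-residue counts of this kind are the sort of feature the abstract describes comparing between pathogenic and benign variants; how contacts are defined from a structure (distance cutoff, atom selection) is left open here.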
Affiliation(s)
- Jaie Woodard
- Medical Systems Biophysics and Bioengineering, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University, Leiden, The Netherlands; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
- Sumaiya Iqbal
- Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
- Alireza Mashaghi
- Medical Systems Biophysics and Bioengineering, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University, Leiden, The Netherlands; Centre for Interdisciplinary Genome Research, Faculty of Science, Leiden University, Leiden, The Netherlands
8
Chang L, Perez A. Deciphering the Folding Mechanism of Proteins G and L and Their Mutants. J Am Chem Soc 2022; 144:14668-14677. [PMID: 35930769 DOI: 10.1021/jacs.2c04488] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 01/24/2023]
Abstract
Much of our understanding of folding mechanisms comes from interpretations of experimental ϕ and ψ value analysis, relating the differences in stability of the transition state ensemble (TSE) and folded state. We introduce a unified approach combining simulations and Bayesian inference to provide atomistic detail for the folding mechanism of proteins G and L and their mutants. Proteins G and L fold to similar topologies despite low sequence similarity, but differ in their folding pathways. A fast folding redesign of protein G, NuG2, switches folding pathways and folds through a similar pathway with protein L. A redesign of protein L also leads to faster folding, respecting the original folding pathway. Our Bayesian inference approach starts from the same prior on all systems and correctly identifies the folding mechanism for each of the four proteins, a success of the force field and sampling strategy. The approach is computationally efficient and correctly identifies the TSE and intermediate structures along the folding pathway in good agreement with experiments. We complement our findings by using two orthogonal approaches that differ in computational cost and interpretability. Adaptive sampling MD combined with the Markov state model provides a kinetic model that confirms the more complex folding mechanism of protein G and its mutant. Finally, a novel fragment decomposition approach using AlphaFold identifies preferences for secondary structure element combinations that follow the order of events observed in the folding pathways.
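The kinetic model mentioned in the abstract above is a Markov state model. As a rough illustration of the idea only, the sketch below estimates a transition matrix from a discretised trajectory by counting transitions at a fixed lag and row-normalising, then obtains the stationary distribution by power iteration. The function names and the example trajectory are hypothetical; this is a plain count-based estimator, not the authors' adaptive-sampling pipeline, and it assumes every state is left at least once.

```python
def transition_matrix(dtraj, n_states, lag=1):
    """Row-normalised transition counts between discrete states at the given lag."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a][b] += 1.0
    matrix = []
    for state, row in enumerate(counts):
        total = sum(row)
        if total == 0.0:
            # Assumption of this sketch: no state lacks outgoing transitions.
            raise ValueError(f"state {state} has no outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(matrix, iterations=1000):
    """Power iteration for the left eigenvector with eigenvalue 1."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical two-state trajectory (0 = unfolded, 1 = folded).
dtraj = [0, 0, 0, 1, 0, 0, 0, 1, 0]
T = transition_matrix(dtraj, n_states=2)
pi = stationary_distribution(T)
```

With a model of this kind, folding pathways are read off as the most probable sequences of transitions between unfolded, intermediate, and folded states; production analyses would typically use a dedicated MSM library and validate the choice of lag time.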
Affiliation(s)
- Liwei Chang
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States; Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
- Alberto Perez
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States; Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States