1. Chu SKS, Narang K, Siegel JB. Protein stability prediction by fine-tuning a protein language model on a mega-scale dataset. PLoS Comput Biol 2024; 20:e1012248. [PMID: 39038042 PMCID: PMC11293664 DOI: 10.1371/journal.pcbi.1012248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 08/01/2024] [Accepted: 06/13/2024] [Indexed: 07/24/2024]
Abstract
Protein stability plays a crucial role in a variety of applications, such as food processing, therapeutics, and the identification of pathogenic mutations. Engineering campaigns commonly seek to improve protein stability, and there is a strong interest in streamlining these processes to enable rapid optimization of highly stabilized proteins with fewer iterations. In this work, we explore utilizing a mega-scale dataset to develop a protein language model optimized for stability prediction. ESMtherm is trained on the folding stability of 528k natural and de novo sequences derived from 461 protein domains and can accommodate deletions, insertions, and multiple-point mutations. We show that a protein language model can be fine-tuned to predict folding stability. ESMtherm performs reasonably on small protein domains and generalizes to sequences distal from the training set. Lastly, we discuss our model's limitations compared to other state-of-the-art methods in generalizing to larger protein scaffolds. Our results highlight the need for large-scale stability measurements on a diverse dataset that mirrors the distribution of sequence lengths commonly observed in nature.
Affiliation(s)
- Simon K. S. Chu
- Biophysics Graduate Program, University of California Davis, Davis, California, United States of America
- Kush Narang
- College of Biological Sciences, University of California Davis, Davis, California, United States of America
- Justin B. Siegel
- Genome Center, University of California Davis, Davis, California, United States of America
- Department of Chemistry, University of California Davis, Davis, California, United States of America
- Department of Biochemistry and Molecular Medicine, University of California Davis, Davis, California, United States of America

2. De Filippis V, Pozzi N, Acquasaliente L, Artusi I, Pontarollo G, Peterle D. Protein engineering by chemical methods: Incorporation of nonnatural amino acids as a tool for studying protein folding, stability, and function. Pept Sci (Hoboken) 2018. [DOI: 10.1002/pep2.24090] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022]
Affiliation(s)
- Vincenzo De Filippis
- Laboratory of Protein Chemistry, Department of Pharmaceutical & Pharmacological Sciences, University of Padua, Padua, Italy
- Nicola Pozzi
- Laboratory of Protein Chemistry, Department of Pharmaceutical & Pharmacological Sciences, University of Padua, Padua, Italy
- Laura Acquasaliente
- Laboratory of Protein Chemistry, Department of Pharmaceutical & Pharmacological Sciences, University of Padua, Padua, Italy
- Ilaria Artusi
- Laboratory of Protein Chemistry, Department of Pharmaceutical & Pharmacological Sciences, University of Padua, Padua, Italy
- Giulia Pontarollo
- Laboratory of Protein Chemistry, Department of Pharmaceutical & Pharmacological Sciences, University of Padua, Padua, Italy
- Daniele Peterle
- Laboratory of Protein Chemistry, Department of Pharmaceutical & Pharmacological Sciences, University of Padua, Padua, Italy

3. Kurochkina N, Guha U, Lu Z. SH Domains and Epidermal Growth Factor Receptors. SH Domains 2015:133-158. [DOI: 10.1007/978-3-319-20098-9_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]

4. Trellet M, Melquiond ASJ, Bonvin AMJJ. A unified conformational selection and induced fit approach to protein-peptide docking. PLoS One 2013; 8:e58769. [PMID: 23516555 PMCID: PMC3596317 DOI: 10.1371/journal.pone.0058769] [Citation(s) in RCA: 146] [Impact Index Per Article: 12.2] [Received: 12/12/2012] [Accepted: 02/05/2013] [Indexed: 01/01/2023]
Abstract
Protein-peptide interactions are vital for the cell. They mediate, inhibit or serve as structural components in nearly 40% of all macromolecular interactions, and are often associated with diseases, making them interesting leads for protein drug design. In recent years, large-scale technologies have enabled exhaustive studies on the peptide recognition preferences for a number of peptide-binding domain families. Yet, the paucity of data regarding their molecular binding mechanisms together with their inherent flexibility makes the structural prediction of protein-peptide interactions very challenging. This leaves flexible docking as one of the few amenable computational techniques to model these complexes. We present here an ensemble, flexible protein-peptide docking protocol that combines conformational selection and induced fit mechanisms. Starting from an ensemble of three peptide conformations (extended, α-helix, polyproline-II), flexible docking with HADDOCK generates 79.4% of high-quality models for bound/unbound and 69.4% for unbound/unbound docking when tested against the largest protein-peptide complexes benchmark dataset available to date. Conformational selection at the rigid-body docking stage successfully recovers the most relevant conformation for a given protein-peptide complex and the subsequent flexible refinement further improves the interface by up to 4.5 Å interface RMSD. Cluster-based scoring of the models results in a selection of near-native solutions in the top three for ∼75% of the successfully predicted cases. This unified conformational selection and induced fit approach to protein-peptide docking should open the route to the modeling of challenging systems such as disorder-order transitions taking place upon binding, significantly expanding the applicability limit of biomolecular interaction modeling by docking.
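
The interface RMSD figure quoted above is, at its core, a root-mean-square deviation between matched atom positions, RMSD = sqrt((1/N) Σ |model_i − reference_i|²). The Python sketch below shows only that final arithmetic step and is not HADDOCK code. It assumes the interface atoms have already been selected, paired one-to-one and superimposed; those preparatory steps are omitted here. The function name `rmsd` and the example coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between matched atoms, in input units (e.g. Å).

    Assumes model[i] corresponds to reference[i] and that both sets are
    already superimposed; no fitting is performed here.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (xm, ym, zm), (xr, yr, zr) in zip(model, reference):
        # Squared Euclidean distance between one pair of matched atoms.
        total += (xm - xr) ** 2 + (ym - yr) ** 2 + (zm - zr) ** 2
    return math.sqrt(total / len(model))


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    # Hypothetical model: every atom displaced by 2 Å along y.
    model = [(x, y + 2.0, z) for x, y, z in ref]
    print(rmsd(model, ref))  # 2.0
```

Because no fitting is done, passing structures that have not been superimposed inflates the value; a fuller treatment would first apply an optimal rigid-body superposition (for example, the Kabsch algorithm) over the chosen atoms.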
Affiliation(s)
- Mikael Trellet
- Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands
- Adrien S. J. Melquiond
- Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands
- Alexandre M. J. J. Bonvin
- Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands

5. SH3 domains: modules of protein-protein interactions. Biophys Rev 2012; 5:29-39. [PMID: 28510178 DOI: 10.1007/s12551-012-0081-z] [Citation(s) in RCA: 134] [Impact Index Per Article: 10.3] [Received: 04/04/2012] [Accepted: 05/29/2012] [Indexed: 01/01/2023]
Abstract
Src homology 3 (SH3) domains are involved in the regulation of important cellular pathways, such as cell proliferation, migration and cytoskeletal modifications. Recognition of polyproline and a number of noncanonical sequences by SH3 domains has been extensively studied by crystallography, nuclear magnetic resonance and other methods. High-affinity peptides that bind SH3 domains are used in drug development as candidates for anticancer treatment. This review summarizes the latest achievements in deciphering structural determinants of SH3 function.

6. Fernandez-Ballester G, Beltrao P, Gonzalez JM, Song YH, Wilmanns M, Valencia A, Serrano L. Structure-Based Prediction of the Saccharomyces cerevisiae SH3–Ligand Interactions. J Mol Biol 2009; 388:902-16. [DOI: 10.1016/j.jmb.2009.03.038] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Received: 09/04/2008] [Revised: 03/11/2009] [Accepted: 03/15/2009] [Indexed: 01/21/2023]

7. Kiel C, Beltrao P, Serrano L. Analyzing Protein Interaction Networks Using Structural Information. Annu Rev Biochem 2008; 77:415-41. [DOI: 10.1146/annurev.biochem.77.062706.133317] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.6] [Indexed: 11/09/2022]
Affiliation(s)
- Christina Kiel
- EMBL-CRG Systems Biology Unit, Centre de Regulació Genòmica, Barcelona 08003, Spain
- Pedro Beltrao
- European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Luis Serrano
- EMBL-CRG Systems Biology Unit, Centre de Regulació Genòmica, Barcelona 08003, Spain

8. De Filippis V, Draghi A, Frasson R, Grandi C, Musi V, Fontana A, Pastore A. o-Nitrotyrosine and p-iodophenylalanine as spectroscopic probes for structural characterization of SH3 complexes. Protein Sci 2007; 16:1257-65. [PMID: 17567746 PMCID: PMC2206685 DOI: 10.1110/ps.062726807] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 02/08/2023]
Abstract
High-throughput screening of protein-protein and protein-peptide interactions is of high interest both for biotechnological and pharmacological applications. Here, we propose the use of the noncoded amino acids o-nitrotyrosine and p-iodophenylalanine as spectroscopic probes in combination with circular dichroism and fluorescence quenching techniques (i.e., collisional quenching and resonance energy transfer) as a means to determine the peptide orientation in complexes with SH3 domains. Proline-rich peptides bind SH3 modules in two alternative orientations, according to their sequence motifs, classified as class I and class II. The method was tested on an SH3 domain from a yeast myosin that is known to recognize specifically class I peptides. We exploited the fluorescence quenching effects induced by o-nitrotyrosine and p-iodophenylalanine on the fluorescence signal of a highly conserved Trp residue, which is the signature of SH3 domains and sits directly in the binding pocket. In particular, we studied how the introduction of the two probes at different positions of the peptide sequence (i.e., N-terminally or C-terminally) influences the spectroscopic properties of the complex. This approach provides clear-cut evidence of the orientation of the binding peptide in the SH3 pocket. The chemical strategy outlined here can be easily extended to other protein modules, known to bind linear sequence motifs in a highly directional manner.
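
The class I / class II distinction mentioned above is commonly summarized by where a basic residue (+, Arg or Lys) sits relative to the PxxP core: +xxPxxP for class I and PxxPx+ for class II. The Python sketch below encodes these two simplified consensus patterns as regular expressions to flag which class a proline-rich peptide matches. It is an illustrative assumption, not the method of the cited paper (which determines orientation spectroscopically), and real SH3 specificity is broader than two patterns; the function name and example peptides are hypothetical.

```python
import re

# Simplified consensus motifs (illustrative assumption only):
#   class I:  [R/K]xxPxxP  -> basic residue N-terminal to the PxxP core
#   class II: PxxPx[R/K]   -> basic residue C-terminal to the PxxP core
CLASS_I = re.compile(r"[RK]..P..P")
CLASS_II = re.compile(r"P..P.[RK]")


def classify_sh3_peptide(seq: str) -> str:
    """Return 'I', 'II', 'both' or 'none' for a one-letter peptide sequence."""
    seq = seq.upper()
    is_class_i = CLASS_I.search(seq) is not None
    is_class_ii = CLASS_II.search(seq) is not None
    if is_class_i and is_class_ii:
        return "both"
    if is_class_i:
        return "I"
    if is_class_ii:
        return "II"
    return "none"


if __name__ == "__main__":
    # Hypothetical peptides chosen to hit each pattern.
    for peptide in ["RKLPPRPS", "PPPLPPRK", "RALPALPAR", "AAAAAAAA"]:
        print(peptide, classify_sh3_peptide(peptide))
```

A sequence can match both patterns, which is why the sketch reports "both" rather than forcing a single class; deciding the actual binding orientation in such cases needs experimental or structural data.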

9. Beltrao P, Kiel C, Serrano L. Structures in systems biology. Curr Opin Struct Biol 2007; 17:378-84. [PMID: 17574836 DOI: 10.1016/j.sbi.2007.05.005] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Received: 02/13/2007] [Revised: 04/11/2007] [Accepted: 05/29/2007] [Indexed: 11/18/2022]
Abstract
Oil and water do not normally mix, and structural biology and systems biology appear to be two different universes. It can be argued, however, that structural biology could play a very important role in systems biology. Although at the final stage of understanding a signal transduction pathway, a cell, an organ or a living system, structures could be obviated, we need them to be able to reach that stage. Structures of macromolecules, especially molecular machines, could provide quantitative parameters, help to elucidate functional networks or enable rationally designed perturbation experiments for reverse engineering. The role of structural biology in systems biology should be to provide enough understanding so that macromolecules can be translated into dots or even into equations devoid of atoms.
Affiliation(s)
- Pedro Beltrao
- European Molecular Biology Laboratory, Meyerhofstrasse 1, Heidelberg D69115, Germany

10. Paxillin and ponsin interact in nascent costameres of muscle cells. J Mol Biol 2007; 369:665-82. [PMID: 17462669 DOI: 10.1016/j.jmb.2007.03.050] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Received: 12/18/2006] [Revised: 03/13/2007] [Accepted: 03/13/2007] [Indexed: 11/30/2022]
Abstract
Muscle differentiation requires the transition from motile myoblasts to sessile myotubes and the assembly of a highly regular contractile apparatus. This striking cytoskeletal remodelling is coordinated with a transformation of focal adhesion-like cell-matrix contacts into costameres. To assess mechanisms underlying this differentiation process, we searched for muscle-specific binding partners of paxillin. We identified an interaction of paxillin with the vinexin adaptor protein family member ponsin in nascent costameres during muscle differentiation, which is mediated by an interaction of the second Src homology 3 (SH3) domain of ponsin with the proline-rich region of paxillin. To understand the molecular basis of this interaction, we determined the structure of this SH3 domain at 0.83 Å resolution, as well as its complex with the paxillin binding peptide at 1.63 Å resolution. Upon binding, the paxillin peptide adopts a polyproline-II helix conformation in the complex. In contrast to the charged SH3 binding interface, the peptide contains only non-polar residues; this is the first time such an interaction has been observed structurally in SH3 domains. Fluorescence titration confirmed the ponsin/paxillin interaction and showed its binding affinity to be weak. Transfection experiments revealed further characteristics of ponsin function in muscle cells: all three SH3 domains in the C terminus of ponsin appeared to synergise in targeting the protein to force-transducing structures. The overexpression of ponsin resulted in altered muscle cell-matrix contact morphology, suggesting its involvement in the establishment of mature costameres. Further evidence for the role of ponsin in the maintenance of mature mechanotransduction sites in cardiomyocytes comes from the observation that ponsin expression was down-regulated in end-stage failing hearts, and that this effect was reverted upon mechanical unloading. These results provide new insights into how low-affinity protein-protein interactions may contribute to the fine-tuning of cytoskeletal remodelling processes during muscle differentiation and in adult cardiomyocytes.

11. Hwang KJ, Mahmoodian F, Ferretti JA, Korn ED, Gruschus JM. Intramolecular interaction in the tail of Acanthamoeba myosin IC between the SH3 domain and a putative pleckstrin homology domain. Proc Natl Acad Sci U S A 2007; 104:784-9. [PMID: 17215368 PMCID: PMC1783391 DOI: 10.1073/pnas.0610231104] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 12/18/2022]
Abstract
The 466-aa tail of the heavy chain of Acanthamoeba myosin IC (AMIC) comprises an N-terminal 220-residue basic region (BR) followed by a 56-residue Gly/Pro/Ala-rich region (GPA1), a 55-residue Src homology 3 (SH3) domain, and a C-terminal 135-residue Gly/Pro/Ala-rich region (GPA2). Cryo-electron microscopy of AMIC had shown previously that the AMIC tail is folded back on itself, suggesting the possibility of interactions between its N- and C-terminal regions. We now show specific differences between the NMR spectrum of bacterially expressed full-length tail and the sum of the spectra of individually expressed BR and GPA1-SH3-GPA2 (GSG) regions. These results are indicative of interactions between the two subdomains in the full-length tail. From the NMR data, we could assign many of the residues in BR and GSG that are involved in these interactions. By combining homology modeling with the NMR data, we identify a putative pleckstrin homology (PH) domain within BR, and show that the PH domain interacts with the SH3 domain.
Affiliation(s)
- Fatemeh Mahmoodian
- Laboratory of Cell Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892
- Edward D. Korn
- Laboratory of Cell Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892
- To whom correspondence should be addressed at: National Institutes of Health, Building 50, Room 2517, Bethesda, MD 20892.

12. Current awareness on yeast. Yeast 2006. [DOI: 10.1002/yea.1319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]