1. Zhou S, Zhou Y, Liu T, Zheng J, Jia C. PredLLPS_PSSM: a novel predictor for liquid-liquid protein separation identification based on evolutionary information and a deep neural network. Brief Bioinform 2023; 24:bbad299. [PMID: 37609923 DOI: 10.1093/bib/bbad299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 08/01/2023] [Accepted: 08/02/2023] [Indexed: 08/24/2023] Open
Abstract
The formation of biomolecular condensates by liquid-liquid phase separation (LLPS) has become a universal mechanism for spatiotemporal coordination of biological activities in cells and has been widely observed to directly regulate the key cellular processes involved in cancer cell pathology. However, the complexity of protein sequences and the diversity of conformations are inherently disordered, which poses great challenges for LLPS protein calculations and experimental research. Herein, we proposed a novel predictor named PredLLPS_PSSM for LLPS protein identification based only on sequence evolution information. Because finding real and reliable samples is the cornerstone of building predictors, we collected anew and collated the LLPS proteins from the latest versions of three databases. By comparing the performance of the position-specific score matrix (PSSM) and word embedding, PredLLPS_PSSM combined PSSM-based information and two deep learning frameworks. Independent tests using three existing independent test datasets and two newly constructed independent test datasets demonstrated the superiority of PredLLPS_PSSM compared with state-of-the-art methods. Furthermore, we tested PredLLPS_PSSM on nine experimentally identified LLPS proteins from three insects that were not included in any of the databases. In addition, the powerful Shapley Additive exPlanation algorithm and heatmap were applied to find the most critical amino acids relevant to LLPS.
Affiliation(s)
- Shengming Zhou
  - School of Science, Dalian Maritime University, Dalian 116026, China
- Yetong Zhou
  - School of Science, Dalian Maritime University, Dalian 116026, China
- Tian Liu
  - School of Bioengineering, Dalian University of Technology, Dalian 116024, China
- Jia Zheng
  - School of Science, Dalian Maritime University, Dalian 116026, China
- Cangzhi Jia
  - School of Science, Dalian Maritime University, Dalian 116026, China
2. Saar KL, Morgunov AS, Qi R, Arter WE, Krainer G, Lee AA, Knowles TPJ. Learning the molecular grammar of protein condensates from sequence determinants and embeddings. Proc Natl Acad Sci U S A 2021; 118:e2019053118. [PMID: 33827920 DOI: 10.1073/pnas.2019053118] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Indexed: 12/13/2022] Open
Abstract
The tendency of many cellular proteins to form protein-rich biomolecular condensates underlies the formation of subcellular compartments and has been linked to various physiological functions. Understanding the molecular basis of this fundamental process and predicting protein phase behavior have therefore become important objectives. To develop a global understanding of how protein sequence determines its phase behavior, we constructed bespoke datasets of proteins of varying phase separation propensity and identified explicit biophysical and sequence-specific features common to phase-separating proteins. Moreover, by combining this insight with neural network-based sequence embeddings, we trained machine-learning classifiers that identified phase-separating sequences with high accuracy, including from independent external test data. Intracellular phase separation of proteins into biomolecular condensates is increasingly recognized as a process with a key role in cellular compartmentalization and regulation. Different hypotheses about the parameters that determine the tendency of proteins to form condensates have been proposed, with some of them probed experimentally through the use of constructs generated by sequence alterations. To broaden the scope of these observations, we established an in silico strategy for understanding on a global level the associations between protein sequence and phase behavior and further constructed machine-learning models for predicting protein liquid–liquid phase separation (LLPS). Our analysis highlighted that LLPS-prone proteins are more disordered, less hydrophobic, and of lower Shannon entropy than sequences in the Protein Data Bank or the Swiss-Prot database and that they show a fine balance in their relative content of polar and hydrophobic residues. 
To further learn in a hypothesis-free manner the sequence features underpinning LLPS, we trained a neural network-based language model and found that a classifier constructed on such embeddings learned the underlying principles of phase behavior at a comparable accuracy to a classifier that used knowledge-based features. By combining knowledge-based features with unsupervised embeddings, we generated an integrated model that distinguished LLPS-prone sequences both from structured proteins and from unstructured proteins with a lower LLPS propensity and further identified such sequences from the human proteome at a high accuracy. These results provide a platform rooted in molecular principles for understanding protein phase behavior. The predictor, termed DeePhase, is accessible from https://deephase.ch.cam.ac.uk/.
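Two of the sequence-level quantities named in this abstract, Shannon entropy and hydrophobic content, can be illustrated with a minimal sketch. This is not the DeePhase implementation: the function names and the `HYDROPHOBIC` residue set are assumptions chosen for illustration, and the abstract does not state how the authors defined these features.

```python
import math
from collections import Counter

# Hypothetical hydrophobic residue set, assumed for illustration only;
# the abstract does not specify which residues the authors counted.
HYDROPHOBIC = set("AVILMFWC")


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a sequence."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    n = len(seq)
    counts = Counter(seq)
    # H = -sum(p * log2(p)) over the observed residue frequencies p
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def hydrophobic_fraction(seq: str) -> float:
    """Fraction of residues that belong to the assumed hydrophobic set."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    return sum(residue in HYDROPHOBIC for residue in seq) / len(seq)
```

A low-complexity sequence such as a poly-glycine stretch gives an entropy near zero, while a sequence that uses many residue types evenly gives a higher value, which is the sense in which the abstract describes LLPS-prone proteins as having lower Shannon entropy.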
3. Cristóvão JS, Romão MA, Gallardo R, Schymkowitz J, Rousseau F, Gomes CM. Targeting S100B with Peptides Encoding Intrinsic Aggregation-Prone Sequence Segments. Molecules 2021; 26:molecules26020440. [PMID: 33467751 PMCID: PMC7830867 DOI: 10.3390/molecules26020440] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/19/2020] [Revised: 01/12/2021] [Accepted: 01/12/2021] [Indexed: 12/11/2022] Open
Abstract
S100 proteins assume a diversity of oligomeric states including large order self-assemblies, with an impact on protein structure and function. Previous work has uncovered that S100 proteins, including S100B, are prone to undergo β-aggregation under destabilizing conditions. This propensity is encoded in aggregation-prone regions (APR) mainly located in segments at the homodimer interface, and which are therefore mostly shielded from the solvent and from deleterious interactions, under native conditions. As in other systems, this characteristic may be used to develop peptides with pharmacological potential that selectively induce the aggregation of S100B through homotypic interactions with its APRs, resulting in functional inhibition through a loss of function. Here we report initial studies towards this goal. We applied the TANGO algorithm to identify specific APR segments in S100B helix IV and used this information to design and synthesize S100B-derived APR peptides. We then combined fluorescence spectroscopy, transmission electron microscopy, biolayer interferometry, and aggregation kinetics and determined that the synthetic peptides have strong aggregation propensity, interact with S100B, and may promote co-aggregation reactions. In this framework, we discuss the considerable potential of such APR-derived peptides to act pharmacologically over S100B in numerous physiological and pathological conditions, for instance as modifiers of the S100B interactome or as promoters of S100B inactivation by selective aggregation.
Affiliation(s)
- Joana S. Cristóvão
  - Biosystems and Integrative Sciences Institute, Faculdade de Ciências, Universidade Lisboa, 1749-016 Lisbon, Portugal
  - Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade Lisboa, 1749-016 Lisbon, Portugal
- Mariana A. Romão
  - Biosystems and Integrative Sciences Institute, Faculdade de Ciências, Universidade Lisboa, 1749-016 Lisbon, Portugal
  - Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade Lisboa, 1749-016 Lisbon, Portugal
- Rodrigo Gallardo
  - VIB Switch Laboratory, Flanders Institute for Biotechnology (VIB), 3000 Leuven, Belgium
  - Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, PB 802, 3000 Leuven, Belgium
- Joost Schymkowitz
  - VIB Switch Laboratory, Flanders Institute for Biotechnology (VIB), 3000 Leuven, Belgium
  - Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, PB 802, 3000 Leuven, Belgium
  - Correspondence: (C.M.G.); (F.R.); (J.S.)
- Frederic Rousseau
  - VIB Switch Laboratory, Flanders Institute for Biotechnology (VIB), 3000 Leuven, Belgium
  - Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, PB 802, 3000 Leuven, Belgium
  - Correspondence: (C.M.G.); (F.R.); (J.S.)
- Cláudio M. Gomes
  - Biosystems and Integrative Sciences Institute, Faculdade de Ciências, Universidade Lisboa, 1749-016 Lisbon, Portugal
  - Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade Lisboa, 1749-016 Lisbon, Portugal
  - Correspondence: (C.M.G.); (F.R.); (J.S.)
4. Mayorov A, Dal Peraro M, Abriata LA. Active Site-Induced Evolutionary Constraints Follow Fold Polarity Principles in Soluble Globular Enzymes. Mol Biol Evol 2020; 36:1728-1733. [PMID: 31004173 DOI: 10.1093/molbev/msz096] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/23/2022] Open
Abstract
A recent analysis of evolutionary rates in >500 globular soluble enzymes revealed pervasive conservation gradients toward catalytic residues. By looking at amino acid preference profiles rather than evolutionary rates in the same data set, we quantified the effects of active sites on site-specific constraints for physicochemical traits. We found that conservation gradients respond to constraints for polarity, hydrophobicity, flexibility, rigidity and structure in ways consistent with fold polarity principles; while sites far from active sites seem to experience no physicochemical constraint, rather being highly variable and favoring amino acids of low metabolic cost. Globally, our results highlight that amino acid variation contains finer information about protein structure than usually regarded in evolutionary models, and that this information is retrievable automatically with simple fits. We propose that analyses of the kind presented here incorporated into models of protein evolution should allow for better description of the physical chemistry that underlies molecular evolution.
Affiliation(s)
- Alexander Mayorov
  - Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Matteo Dal Peraro
  - Laboratory for Biomolecular Modeling, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Luciano A Abriata
  - Laboratory for Biomolecular Modeling, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Protein Production and Structure Core Facility, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
5. Ferrada E. Gene Families, Epistasis and the Amino Acid Preferences of Protein Homologs. Evol Bioinform Online 2019; 15:1176934319870485. [PMID: 31452598 PMCID: PMC6698995 DOI: 10.1177/1176934319870485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2019] [Accepted: 07/27/2019] [Indexed: 11/16/2022] Open
Abstract
In order to preserve structure and function, proteins tend to preferentially conserve amino acids at particular sites along the sequence. Because mutations can affect structure and function, the question arises whether the preference of a protein site for a particular amino acid varies between protein homologs, and to what extent that variation depends on sequence divergence. Answering these questions can help in the development of models of sequence evolution, as well as provide insights on the dependence of the fitness effects of mutations on the genetic background of sequences, a phenomenon known as epistasis. Here, I comment on recent computational work providing a systematic analysis of the extent to which the amino acid preferences of proteins depend on the background mutations of protein homologs.
Affiliation(s)
- Evandro Ferrada
  - Center for Genomics and Bioinformatics, Faculty of Science, Universidad Mayor, Santiago, Chile
6.
Abstract
What happens inside an enzyme's active site to allow slow and difficult chemical reactions to occur so rapidly? This question has occupied biochemists' attention for a long time. Computer models of increasing sophistication have predicted an important role for electrostatic interactions in enzymatic reactions, yet this hypothesis has proved vexingly difficult to test experimentally. Recent experiments utilizing the vibrational Stark effect make it possible to measure the electric field a substrate molecule experiences when bound inside its enzyme's active site. These experiments have provided compelling evidence supporting a major electrostatic contribution to enzymatic catalysis. Here, we review these results and develop a simple model for electrostatic catalysis that enables us to incorporate disparate concepts introduced by many investigators to describe how enzymes work into a more unified framework stressing the importance of electric fields at the active site.
Affiliation(s)
- Stephen D Fried
  - Proteins and Nucleic Acid Chemistry Division, Medical Research Council Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom
- Steven G Boxer
  - Department of Chemistry, Stanford University, Stanford, California 94305
7. Wang Y, Roose BW, Palovcak EJ, Carnevale V, Dmochowski IJ. A Genetically Encoded β-Lactamase Reporter for Ultrasensitive ¹²⁹Xe NMR in Mammalian Cells. Angew Chem Int Ed Engl 2016; 55:8984-7. [PMID: 27305488 DOI: 10.1002/anie.201604055] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 04/26/2016] [Revised: 05/20/2016] [Indexed: 01/27/2023]
Abstract
Molecular imaging holds considerable promise for elucidating biological processes in normal physiology as well as disease states, but requires noninvasive methods for identifying analytes at sub-micromolar concentrations. Particularly useful are genetically encoded, single-protein reporters that harness the power of molecular biology to visualize specific molecular processes, but such reporters have been conspicuously lacking for in vivo magnetic resonance imaging (MRI). Herein, we report TEM-1 β-lactamase (bla) as a single-protein reporter for hyperpolarized (HP) ¹²⁹Xe NMR, with significant saturation contrast at 0.1 μM. Xenon chemical exchange saturation transfer (CEST) interactions with the primary allosteric site in bla give rise to a unique saturation peak at 255 ppm, well removed (≈60 ppm downfield) from the ¹²⁹Xe-H₂O peak. Useful saturation contrast was also observed for bla expressed in bacterial cells and mammalian cells.
Affiliation(s)
- Yanfei Wang
  - Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, PA, 19104-6323, USA
- Benjamin W Roose
  - Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, PA, 19104-6323, USA
- Eugene J Palovcak
  - Institute for Computational Molecular Science, College of Science and Technology, Temple University, 1925 N. 12th Street, Philadelphia, PA, 19122, USA
- Vincenzo Carnevale
  - Institute for Computational Molecular Science, College of Science and Technology, Temple University, 1925 N. 12th Street, Philadelphia, PA, 19122, USA
- Ivan J Dmochowski
  - Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, PA, 19104-6323, USA
8. Kan ZY, Walters BT, Mayne L, Englander SW. Protein hydrogen exchange at residue resolution by proteolytic fragmentation mass spectrometry analysis. Proc Natl Acad Sci U S A 2013; 110:16438-43. [PMID: 24019478 DOI: 10.1073/pnas.1315532110] [Citation(s) in RCA: 116] [Impact Index Per Article: 10.5] [Indexed: 11/18/2022] Open
Abstract
Hydrogen exchange technology provides a uniquely powerful instrument for measuring protein structural and biophysical properties, quantitatively and in a nonperturbing way, and determining how these properties are implemented to produce protein function. A developing hydrogen exchange-mass spectrometry method (HX MS) is able to analyze large biologically important protein systems while requiring only minuscule amounts of experimental material. The major remaining deficiency of the HX MS method is the inability to deconvolve HX results to individual amino acid residue resolution. To pursue this goal we used an iterative optimization program (HDsite) that integrates recent progress in multiple peptide acquisition together with previously unexamined isotopic envelope-shape information and a site-resolved back-exchange correction. To test this approach, residue-resolved HX rates computed from HX MS data were compared with extensive HX NMR measurements, and analogous comparisons were made in simulation trials. These tests found excellent agreement and revealed the important computational determinants.
9. Xiao S, Patsalo V, Shan B, Bi Y, Green DF, Raleigh DP. Rational modification of protein stability by targeting surface sites leads to complicated results. Proc Natl Acad Sci U S A 2013; 110:11337-42. [PMID: 23798426 PMCID: PMC3710877 DOI: 10.1073/pnas.1222245110] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Indexed: 11/18/2022] Open
Abstract
The rational modification of protein stability is an important goal of protein design. Protein surface electrostatic interactions are not evolutionarily optimized for stability and are an attractive target for the rational redesign of proteins. We show that surface charge mutants can exert stabilizing effects in distinct and unanticipated ways, including ones that are not predicted by existing methods, even when only solvent-exposed sites are targeted. Individual mutation of three solvent-exposed lysines in the villin headpiece subdomain significantly stabilizes the protein, but the mechanism of stabilization is very different in each case. One mutation destabilizes native-state electrostatic interactions but has a larger destabilizing effect on the denatured state, a second removes the desolvation penalty paid by the charged residue, whereas the third introduces unanticipated native-state interactions but does not alter electrostatics. Our results show that even seemingly intuitive mutations can exert their effects through unforeseen and complex interactions.
Affiliation(s)
- Vadim Patsalo
  - Applied Mathematics, and
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794-3600
- Yuan Bi
  - Departments of Chemistry and
- David F. Green
  - Departments of Chemistry and
  - Applied Mathematics, and
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794-3600
10. Gruszka DT, Wojdyla JA, Bingham RJ, Turkenburg JP, Manfield IW, Steward A, Leech AP, Geoghegan JA, Foster TJ, Clarke J, Potts JR. Staphylococcal biofilm-forming protein has a contiguous rod-like structure. Proc Natl Acad Sci U S A 2012; 109:E1011-8. [PMID: 22493247 PMCID: PMC3340054 DOI: 10.1073/pnas.1119456109] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.8] [Indexed: 11/18/2022] Open
Abstract
Staphylococcus aureus and Staphylococcus epidermidis form communities (called biofilms) on inserted medical devices, leading to infections that affect many millions of patients worldwide and cause substantial morbidity and mortality. As biofilms are resistant to antibiotics, device removal is often required to resolve the infection. Thus, there is a need for new therapeutic strategies and molecular data that might assist their development. Surface proteins S. aureus surface protein G (SasG) and accumulation-associated protein (S. epidermidis) promote biofilm formation through their "B" regions. B regions contain tandemly arrayed G5 domains interspersed with approximately 50 residue sequences (herein called E) and have been proposed to mediate intercellular accumulation through Zn²⁺-mediated homodimerization. Although E regions are predicted to be unstructured, SasG and accumulation-associated protein form extended fibrils on the bacterial surface. Here we report structures of E-G5 and G5-E-G5 from SasG and biophysical characteristics of single and multidomain fragments. E sequences fold cooperatively and form interlocking interfaces with G5 domains in a head-to-tail fashion, resulting in a contiguous, elongated, monomeric structure. E and G5 domains lack a compact hydrophobic core, and yet G5 domain and multidomain constructs have thermodynamic stabilities only slightly lower than globular proteins of similar size. Zn²⁺ does not cause SasG domains to form dimers. The work reveals a paradigm for formation of fibrils on the 100-nm scale and suggests that biofilm accumulation occurs through a mechanism distinct from the "zinc zipper." Finally, formation of two domains by each repeat (as in SasG) might reduce misfolding in proteins when the tandem arrangement of highly similar sequences is advantageous.
Affiliation(s)
- Justyna A. Wojdyla
  - Department of Biology, University of York, York YO10 5DD, United Kingdom
- Richard J. Bingham
  - Department of Chemical and Biological Sciences, University of Huddersfield, Huddersfield HD1 3DH, United Kingdom
- Iain W. Manfield
  - Astbury Centre for Structural Molecular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, United Kingdom
- Annette Steward
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
- Andrew P. Leech
  - Department of Biology, University of York, York YO10 5DD, United Kingdom
- Joan A. Geoghegan
  - Microbiology Department, Moyne Institute of Preventive Medicine, Trinity College, Dublin 2, Ireland
- Timothy J. Foster
  - Microbiology Department, Moyne Institute of Preventive Medicine, Trinity College, Dublin 2, Ireland
- Jane Clarke
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
- Jennifer R. Potts
  - Department of Biology, University of York, York YO10 5DD, United Kingdom
  - Department of Chemistry, University of York, York YO10 5DD, United Kingdom