1
|
Mutational biases favor complexity increases in protein interaction networks after gene duplication. Mol Syst Biol 2024; 20:549-572. [PMID: 38499674 PMCID: PMC11066126 DOI: 10.1038/s44320-024-00030-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 02/27/2024] [Accepted: 02/28/2024] [Indexed: 03/20/2024] Open
Abstract
Biological systems can gain complexity over time. While some of these transitions are likely driven by natural selection, the extent to which they occur without providing an adaptive benefit is unknown. At the molecular level, one example is heteromeric complexes replacing homomeric ones following gene duplication. Here, we build a biophysical model and simulate the evolution of homodimers and heterodimers following gene duplication using distributions of mutational effects inferred from available protein structures. We keep the specific activity of each dimer identical, so their concentrations drift neutrally without new functions. We show that for more than 60% of tested dimer structures, the relative concentration of the heteromer increases over time due to mutational biases that favor the heterodimer. However, allowing mutational effects on synthesis rates and differences in the specific activity of homo- and heterodimers can limit or reverse the observed bias toward heterodimers. Our results show that the accumulation of more complex protein quaternary structures is likely under neutral evolution, and that natural selection would be needed to reverse this tendency.
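As a rough illustration of what "relative concentration of the heteromer" means, the sketch below computes equilibrium dimer concentrations for two paralogs A and B in the simplest symmetric case (equal total concentrations, identical homodimer affinities). It is not the authors' biophysical model: the function name, the dissociation-constant parameters `k_hom` and `k_het`, and the symmetric simplification are all assumptions made for this example. With the conventions used here, two indistinguishable paralogs correspond to `k_het = k_hom / 2`, which gives the familiar 1:2:1 ratio of AA:AB:BB.

```python
import math

def symmetric_dimer_equilibrium(total, k_hom, k_het):
    """Equilibrium for paralogs A and B with equal total concentration.

    Assumed conventions (hypothetical, for illustration only):
      k_hom = [A]^2 / [AA] = [B]^2 / [BB]   (homodimer dissociation constant)
      k_het = [A][B] / [AB]                 (heterodimer dissociation constant)
    All quantities share the same concentration unit.
    """
    # By symmetry the free monomer concentrations are equal: [A] = [B] = x.
    # Mass balance for A: total = x + 2*[AA] + [AB] = x + c * x^2
    c = 2.0 / k_hom + 1.0 / k_het
    x = (-1.0 + math.sqrt(1.0 + 4.0 * c * total)) / (2.0 * c)
    homo = x * x / k_hom      # [AA] and, by symmetry, [BB]
    hetero = x * x / k_het    # [AB]
    return {
        "free": x,
        "AA": homo,
        "BB": homo,
        "AB": hetero,
        # Share of all dimers that are heterodimers
        "hetero_fraction": hetero / (2.0 * homo + hetero),
    }

# Indistinguishable paralogs: half of all dimers are heterodimers.
neutral = symmetric_dimer_equilibrium(total=1.0, k_hom=1.0, k_het=0.5)
# A tighter heterodimer interface shifts the balance toward AB.
biased = symmetric_dimer_equilibrium(total=1.0, k_hom=1.0, k_het=0.25)
```

In this toy setting the heterodimer share depends only on the ratio of the two dissociation constants, so any mutational bias that lowers `k_het` relative to `k_hom` raises the heterodimer fraction.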
|
2
|
An atlas of protein homo-oligomerization across domains of life. Cell 2024; 187:999-1010.e15. [PMID: 38325366 DOI: 10.1016/j.cell.2024.01.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 11/03/2023] [Accepted: 01/15/2024] [Indexed: 02/09/2024]
Abstract
Protein structures are essential to understanding cellular processes in molecular detail. While advances in artificial intelligence revealed the tertiary structure of proteins at scale, their quaternary structure remains mostly unknown. We devise a scalable strategy based on AlphaFold2 to predict homo-oligomeric assemblies across four proteomes spanning the tree of life. Our results suggest that approximately 45% of an archaeal proteome and a bacterial proteome and 20% of two eukaryotic proteomes form homomers. Our predictions accurately capture protein homo-oligomerization, recapitulate megadalton complexes, and unveil hundreds of homo-oligomer types, including three confirmed experimentally by structure determination. Integrating these datasets with omics information suggests that a majority of known protein complexes are symmetric. Finally, these datasets provide a structural context for interpreting disease mutations and reveal coiled-coil regions as major enablers of quaternary structure evolution in human. Our strategy is applicable to any organism and provides a comprehensive view of homo-oligomerization in proteomes.
|
3
|
CC+: A searchable database of validated coiled coils in PDB structures and AlphaFold2 models. Protein Sci 2023; 32:e4789. [PMID: 37768271 PMCID: PMC10588367 DOI: 10.1002/pro.4789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 09/10/2023] [Accepted: 09/23/2023] [Indexed: 09/29/2023]
Abstract
α-Helical coiled coils are common tertiary and quaternary elements of protein structure. In coiled coils, two or more α helices wrap around each other to form bundles. This apparently simple structural motif can generate many architectures and topologies. Coiled coil-forming sequences can be predicted from heptad repeats of hydrophobic and polar residues, hpphppp, although this is not always reliable. Alternatively, coiled-coil structures can be identified using the program SOCKET, which finds knobs-into-holes (KIH) packing between side chains of neighboring helices. SOCKET also classifies coiled-coil architecture and topology, thus allowing sequence-to-structure relationships to be garnered. In 2009, we used SOCKET to create a relational database of coiled-coil structures, CC+, from the RCSB Protein Data Bank (PDB). Here, we report an update of CC+ following an update of SOCKET (to Socket2), the recent explosion of structural data, and the success of AlphaFold2 in predicting protein structures from genome sequences. With the most-stringent SOCKET parameters, CC+ contains ≈12,000 coiled-coil assemblies from experimentally determined structures, and ≈120,000 potential coiled-coil structures within single-chain models predicted by AlphaFold2 across 48 proteomes. CC+ allows these and other less-stringently defined coiled coils to be searched at various levels of structure, sequence, and side-chain interactions. The identified coiled coils can be viewed directly from CC+ using the Socket2 application, and their associated data can be downloaded for further analyses. CC+ is available freely at http://coiledcoils.chm.bris.ac.uk/CCPlus/Home.html. It will be updated automatically. We envisage that CC+ could be used to understand coiled-coil assemblies and their sequence-to-structure relationships, and to aid protein design and engineering.
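The heptad-repeat idea mentioned in this abstract can be sketched as a simple pattern match. The example below is a toy scorer, not SOCKET or any published predictor: the hydrophobic residue set and the function names are assumptions chosen for illustration. As the abstract itself cautions, sequence-based prediction of this kind is not always reliable.

```python
# Assumed (hypothetical) set of residues treated as hydrophobic ("h");
# every other residue is treated as polar ("p").
HYDROPHOBIC = set("AILMFVWC")
HEPTAD = "hpphppp"

def to_hp(seq):
    """Reduce an amino-acid sequence to a string of 'h' and 'p' classes."""
    return "".join("h" if aa in HYDROPHOBIC else "p" for aa in seq.upper())

def heptad_score(seq, register=0):
    """Fraction of positions whose class matches hpphppp in the given register."""
    hp = to_hp(seq)
    if not hp:
        return 0.0
    matches = sum(
        1 for i, cls in enumerate(hp) if cls == HEPTAD[(i + register) % 7]
    )
    return matches / len(hp)

def best_register(seq):
    """Return (register, score) for the best of the seven possible registers."""
    scores = [heptad_score(seq, r) for r in range(7)]
    best = max(range(7), key=scores.__getitem__)
    return best, scores[best]

# An idealized repeat, LKELEKK, follows hpphppp exactly in register 0.
register, score = best_register("LKELEKK" * 4)
```

A score near 1.0 only says that the hydrophobic/polar pattern is compatible with a heptad repeat; structure-based checks such as knobs-into-holes packing remain the stronger evidence.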
|
4
|
Mesoscale molecular assembly is favored by the active, crowded cytoplasm. bioRxiv 2023:2023.09.19.558334. [PMID: 37781612 PMCID: PMC10541124 DOI: 10.1101/2023.09.19.558334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/03/2023]
Abstract
The mesoscale organization of molecules into membraneless biomolecular condensates is emerging as a key mechanism of rapid spatiotemporal control in cells. Principles of biomolecular condensation have been revealed through in vitro reconstitution. However, intracellular environments are much more complex than test-tube environments: they are viscoelastic, highly crowded at the mesoscale, and far from thermodynamic equilibrium due to the constant action of energy-consuming processes. We developed synDrops, a synthetic phase separation system, to study how the cellular environment affects condensate formation. Three key features enable physical analysis: synDrops are inducible, bioorthogonal, and have well-defined geometry. This design allows kinetic analysis of synDrop assembly and facilitates computational simulation of the process. We compared experiments and simulations to determine that macromolecular crowding promotes condensate nucleation but inhibits droplet growth through coalescence. ATP-dependent cellular activities help overcome the frustration of growth. In particular, actomyosin dynamics potentiate droplet growth by reducing confinement and elasticity in the mammalian cytoplasm, thereby enabling synDrop coarsening. Our results demonstrate that mesoscale molecular assembly is favored by the combined effects of crowding and active matter in the cytoplasm. These results move toward a better predictive understanding of condensate formation in vivo.
|
5
|
Affinity and Valence Impact the Extent and Symmetry of Phase Separation of Multivalent Proteins. Phys Rev Lett 2022; 129:128102. [PMID: 36179193 DOI: 10.1103/physrevlett.129.128102] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/24/2019] [Revised: 07/07/2022] [Accepted: 08/11/2022] [Indexed: 06/16/2023]
Abstract
Biomolecular self-assembly spatially segregates proteins with a limited number of binding sites (valence) into condensates that coexist with a dilute phase. We develop a many-body lattice model for a three-component system of proteins with fixed valence in a solvent. We compare the predictions of the model to experimental phase diagrams that we measure in vivo, which allows us to vary specifically a binding site's affinity and valency. We find that the extent of phase separation varies exponentially with affinity and increases with valency. Valency alone determines the symmetry of the phase diagram.
|
6
|
How gene duplication diversifies the landscape of protein oligomeric state and function. Curr Opin Genet Dev 2022; 76:101966. [PMID: 36007298 PMCID: PMC9548406 DOI: 10.1016/j.gde.2022.101966] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/01/2022] [Revised: 07/01/2022] [Accepted: 07/08/2022] [Indexed: 11/29/2022]
Abstract
Oligomeric proteins are central to cellular life and the duplication and divergence of their genes is a key driver of evolutionary innovations. The duplication of a gene coding for an oligomeric protein has numerous possible outcomes, which motivates questions on the relationship between structural and functional divergence. How do protein oligomeric states diversify after gene duplication? In the simple case of duplication of a homo-oligomeric protein gene, what properties can influence the fate of descendant paralogs toward forming independent homomers or maintaining their interaction as a complex? Furthermore, how are functional innovations associated with the diversification of oligomeric states? Here, we review recent literature and present specific examples in an attempt to illustrate and answer these questions.
|
7
|
A unified statistical potential reveals that amino acid stickiness governs nonspecific recruitment of client proteins into condensates. Protein Sci 2022; 31:e4361. [PMID: 35762716 PMCID: PMC9207749 DOI: 10.1002/pro.4361] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/23/2022] [Revised: 05/06/2022] [Accepted: 05/10/2022] [Indexed: 11/07/2022]
Abstract
Membraneless organelles are cellular compartments that form by liquid-liquid phase separation of one or more components. Other molecules, such as proteins and nucleic acids, will distribute between the cytoplasm and the liquid compartment in accordance with the thermodynamic drive to lower the free energy of the system. The resulting distribution colocalizes molecular species to carry out a diversity of functions. Two factors could drive this partitioning: the difference in solvation between the dilute versus dense phase and intermolecular interactions between the client and scaffold proteins. Here, we develop a set of knowledge-based potentials that allow for the direct comparison between stickiness, which is dominated by desolvation energy, and pairwise residue contact propensity terms. We use these scales to examine experimental data from two systems: protein cargo dissolving within phase-separated droplets made from FG repeat proteins of the nuclear pore complex and client proteins dissolving within phase-separated FUS droplets. These analyses reveal a close agreement between the stickiness of the client proteins and the experimentally determined values of the partition coefficients (R > 0.9), while pairwise residue contact propensities between client and scaffold show weaker correlations. Hence, the stickiness of client proteins is sufficient to explain their differential partitioning within these two phase-separated systems without taking into account the composition of the condensate. This result implies that selective trafficking of client proteins to distinct membraneless organelles requires recognition elements beyond the client sequence composition. STATEMENT: Empirical potentials for amino acid stickiness and pairwise residue contact propensities are derived. These scales are unique in that they enable direct comparison of desolvation versus contact terms. We find that partitioning of a client protein to a condensate is best explained by amino acid stickiness.
|
8
|
Synthetic condensate size correlates with yeast replicative cell age. MicroPubl Biol 2022; 2022:10.17912/micropub.biology.000582. [PMID: 35673323 PMCID: PMC9167435 DOI: 10.17912/micropub.biology.000582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Revised: 06/02/2022] [Accepted: 05/31/2022] [Indexed: 11/25/2022]
Abstract
Yeast divides asymmetrically, with an aging mother cell and a 'rejuvenated' daughter cell, and serves as a model organism for studying aging. At the same time, determining the age of yeast cells is technically challenging, requiring complex experimental setups or genetic strategies. We developed a synthetic system composed of two interacting oligomers, which forms condensates in living yeast cells. Here, we report that these synthetic condensates' size correlates with yeast replicative age, making these condensates age reporters for this model organism.
|
9
|
The modular cell gets connected. Science 2022; 375:1093-1094. [PMID: 35271323 DOI: 10.1126/science.abo2360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Abstract
Integrative molecular cell biology can be used to interpret networks beyond modules.
|
10
|
PDBe-KB: collaboratively defining the biological context of structural data. Nucleic Acids Res 2022; 50:D534-D542. [PMID: 34755867 PMCID: PMC8728252 DOI: 10.1093/nar/gkab988] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 09/14/2021] [Revised: 10/01/2021] [Accepted: 10/14/2021] [Indexed: 12/15/2022] Open
Abstract
The Protein Data Bank in Europe - Knowledge Base (PDBe-KB, https://pdbe-kb.org) is an open collaboration between world-leading specialist data resources contributing functional and biophysical annotations derived from or relevant to the Protein Data Bank (PDB). The goal of PDBe-KB is to place macromolecular structure data in their biological context by developing standardised data exchange formats and integrating functional annotations from the contributing partner resources into a knowledge graph that can provide valuable biological insights. Since we described PDBe-KB in 2019, there have been significant improvements in the variety of available annotation data sets and user functionality. Here, we provide an overview of the consortium, highlighting the addition of annotations such as predicted covalent binders, phosphorylation sites, effects of mutations on the protein structure and energetic local frustration. In addition, we describe a library of reusable web-based visualisation components and introduce new features such as a bulk download data service and a novel superposition service that generates clusters of superposed protein chains weekly for the whole PDB archive.
|
11
|
QSalignWeb: A Server to Predict and Analyze Protein Quaternary Structure. Front Mol Biosci 2022; 8:787510. [PMID: 35071324 PMCID: PMC8769216 DOI: 10.3389/fmolb.2021.787510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Accepted: 12/02/2021] [Indexed: 11/16/2022] Open
Abstract
The identification of physiologically relevant quaternary structures (QSs) in crystal lattices is challenging. To predict the physiological relevance of a particular QS, QSalign searches for homologous structures in which subunits interact in the same geometry. This approach proved accurate but was limited to structures already present in the Protein Data Bank (PDB). Here, we introduce a webserver (www.QSalign.org) allowing users to submit homo-oligomeric structures of their choice to the QSalign pipeline. Given a user-uploaded structure, the sequence is extracted and used to search homologs based on sequence similarity and PFAM domain architecture. If structural conservation is detected between a homolog and the user-uploaded QS, physiological relevance is inferred. The web server also generates alternative QSs with PISA and processes them the same way as the query submitted to widen the predictions. The result page also shows representative QSs in the protein family of the query, which is informative if no QS conservation was detected or if the protein appears monomeric. These representative QSs can also serve as a starting point for homology modeling.
|
12
|
Corrigendum: The Dihydrofolate Reductase Protein-Fragment Complementation Assay: A Survival-Selection Assay for Large-Scale Analysis of Protein-Protein Interactions. Cold Spring Harb Protoc 2022; 2022:pdb.corr107812. [PMID: 34983863 DOI: 10.1101/pdb.corr107812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
13
|
Abstract
Large-scale mapping of protein structures and their different states is crucial for gaining a mechanistic understanding of proteome function and regulation. In this issue of Cell, Cappelletti et al. achieve such a feat and identify hundreds of protein structural changes in response to outside stressors, providing a rich "structuromics" resource characterizing cellular adaptation.
|
14
|
Abundance Imparts Evolutionary Constraints of Similar Magnitude on the Buried, Surface, and Disordered Regions of Proteins. Front Mol Biosci 2021; 8:626729. [PMID: 33996892 PMCID: PMC8119896 DOI: 10.3389/fmolb.2021.626729] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/06/2020] [Accepted: 03/29/2021] [Indexed: 12/02/2022] Open
Abstract
An understanding of the forces shaping protein conservation is key, both for the fundamental knowledge it represents and to allow for optimal use of evolutionary information in practical applications. Sequence conservation is typically examined at one of two levels. The first is a residue-level, where intra-protein differences are analyzed and the second is a protein-level, where inter-protein differences are studied. At a residue level, we know that solvent-accessibility is a prime determinant of conservation. By inverting this logic, we inferred that disordered regions are slightly more solvent-accessible on average than the most exposed surface residues in domains. By integrating abundance information with evolutionary data within and across proteins, we confirmed a previously reported strong surface-core association in the evolution of structured regions, but we found a comparatively weak association between disordered and structured regions. The facts that disordered and structured regions experience different structural constraints and evolve independently provide a unique setup to examine an outstanding question: why is a protein’s abundance the main determinant of its sequence conservation? Indeed, any structural or biophysical property linked to the abundance-conservation relationship should increase the relative conservation of regions concerned with that property (e.g., disordered residues with mis-interactions, domain residues with misfolding). Surprisingly, however, we found the conservation of disordered and structured regions to increase in equal proportion with abundance. This observation implies that either abundance-related constraints are structure-independent, or multiple constraints apply to different regions and perfectly balance each other.
|
15
|
Abstract
Defining the principles underlying the organization of biomolecules within cells is a key challenge of current cell biology research. Persson et al. now identify a powerful layer of regulation that allows cells to decouple diffusion from temperature by modulating their intracellular viscosity. This so-called viscoadaptation is mediated through trehalose and glycogen activities, which alter diffusion dynamics and self-assembly propensity inside the cell globally.
|
16
|
Designer protein assemblies with tunable phase diagrams in living cells. Nat Chem Biol 2020; 16:939-945. [PMID: 32661377 DOI: 10.1038/s41589-020-0576-z] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 08/06/2019] [Accepted: 05/22/2020] [Indexed: 12/14/2022]
Abstract
Protein self-organization is a hallmark of biological systems. Although the physicochemical principles governing protein-protein interactions have long been known, the principles by which such nanoscale interactions generate diverse phenotypes of mesoscale assemblies, including phase-separated compartments, remain challenging to characterize. To illuminate such principles, we create a system of two proteins designed to interact and form mesh-like assemblies. We devise a new strategy to map high-resolution phase diagrams in living cells, which provide self-assembly signatures of this system. The structural modularity of the two protein components allows straightforward modification of their molecular properties, enabling us to characterize how interaction affinity impacts the phase diagram and material state of the assemblies in vivo. The phase diagrams and their dependence on interaction affinity were captured by theory and simulations, including out-of-equilibrium effects seen in growing cells. Finally, we find that cotranslational protein binding suffices to recruit a messenger RNA to the designed micron-scale structures.
|
17
|
Proteomic analysis reveals the direct recruitment of intrinsically disordered regions to stress granules in S. cerevisiae. J Cell Sci 2020; 133:jcs244657. [PMID: 32503941 DOI: 10.1242/jcs.244657] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/30/2020] [Accepted: 05/15/2020] [Indexed: 01/21/2023] Open
Abstract
Stress granules (SGs) are stress-induced membraneless condensates that store non-translating mRNA and stalled translation initiation complexes. Although metazoan SGs are dynamic compartments where proteins can rapidly exchange with their surroundings, yeast SGs seem largely static. To gain a better understanding of yeast SGs, we identified proteins that sediment after heat shock using mass spectrometry. Proteins that sediment upon heat shock are biased toward a subset of abundant proteins that are significantly enriched in intrinsically disordered regions (IDRs). Heat-induced SG localization of over 80 proteins was confirmed using microscopy, including 32 proteins not previously known to localize to SGs. We found that several IDRs were sufficient to mediate SG recruitment. Moreover, the dynamic exchange of IDRs can be observed using fluorescence recovery after photobleaching, whereas other components remain immobile. Lastly, we showed that the IDR of the Ubp3 deubiquitinase was critical for yeast SG formation. This work shows that IDRs can be sufficient for SG incorporation, can remain dynamic in vitrified SGs, and can play an important role in cellular compartmentalization upon stress. This article has an associated First Person interview with the first author of the paper.
|
18
|
YeastRGB: comparing the abundance and localization of yeast proteins across cells and libraries. Nucleic Acids Res 2019; 47:D1245-D1249. [PMID: 30357397 PMCID: PMC6324022 DOI: 10.1093/nar/gky941] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 08/14/2018] [Accepted: 10/18/2018] [Indexed: 01/06/2023] Open
Abstract
The ability to measure the abundance and visualize the localization of proteins across the yeast proteome has stimulated hypotheses on gene function and fueled discoveries. While the classic C’ tagged GFP yeast library has been the only resource for over a decade, the recent development of the SWAT technology has led to the creation of multiple novel yeast libraries where new-generation fluorescent reporters are fused at the N’ and C’ of open reading frames. Efficient access to these data requires a user interface to visualize and compare protein abundance, localization and co-localization across cells, strains, and libraries. YeastRGB (www.yeastRGB.org) was designed to address such a need, through a user-friendly interface that maximizes informative content. It employs a compact display where cells are cropped and tiled together into a ‘cell-grid.’ This representation enables viewing dozens of cells for a particular strain within a display unit, and up to 30 display units can be arrayed on a standard high-definition screen. Additionally, the display unit allows users to control zoom-level and overlay of images acquired using different color channels. Thus, YeastRGB makes comparing abundance and localization efficient, across thousands of cells from different strains and libraries.
|
19
|
PDBe-KB: a community-driven resource for structural and functional annotations. Nucleic Acids Res 2020; 48:D344-D353. [PMID: 31584092 PMCID: PMC6943075 DOI: 10.1093/nar/gkz853] [Citation(s) in RCA: 68] [Impact Index Per Article: 17.0] [Received: 08/14/2019] [Revised: 09/11/2019] [Accepted: 10/01/2019] [Indexed: 11/23/2022] Open
Abstract
The Protein Data Bank in Europe - Knowledge Base (PDBe-KB, https://pdbe-kb.org) is a community-driven, collaborative resource for literature-derived, manually curated and computationally predicted structural and functional annotations of macromolecular structure data, contained in the Protein Data Bank (PDB). The goal of PDBe-KB is two-fold: (i) to increase the visibility and reduce the fragmentation of annotations contributed by specialist data resources, and to make these data more findable, accessible, interoperable and reusable (FAIR) and (ii) to place macromolecular structure data in their biological context, thus facilitating their use by the broader scientific community in fundamental and applied research. Here, we describe the guidelines of this collaborative effort, the current status of contributed data, and the PDBe-KB infrastructure, which includes the data exchange format, the deposition system for added value annotations, the distributable database containing the assembled data, and programmatic access endpoints. We also describe a series of novel web pages, the PDBe-KB aggregated views of structure data, which combine information on macromolecular structures from many PDB entries. We have recently released the first set of pages in this series, which provide an overview of available structural and functional information for a protein of interest, referenced by a UniProtKB accession.
|
20
|
Protein Abundance Biases the Amino Acid Composition of Disordered Regions to Minimize Non-functional Interactions. J Mol Biol 2019; 431:4978-4992. [PMID: 31442477 PMCID: PMC6941228 DOI: 10.1016/j.jmb.2019.08.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 05/07/2019] [Revised: 08/07/2019] [Accepted: 08/10/2019] [Indexed: 02/07/2023]
Abstract
In eukaryotes, disordered regions cover up to 50% of proteomes and mediate fundamental cellular processes. In contrast to globular domains, where about half of the amino acids are buried in the protein interior, disordered regions show higher solvent accessibility, which makes them prone to engage in non-functional interactions. Such interactions are exacerbated by the law of mass action, prompting the question of how they are minimized in abundant proteins. We find that interaction propensity or "stickiness" of disordered regions negatively correlates with their cellular abundance, both in yeast and human. Strikingly, considering yeast proteins where a large fraction of the sequence is disordered, the correlation between stickiness and abundance reaches R=-0.55. Beyond this global amino-acid composition bias, we identify three rules by which amino-acid composition of disordered regions adjusts with high abundance. First, lysines are preferred over arginines, consistent with the latter amino acid being stickier than the former. Second, compensatory effects exist, whereby a sticky region can be tolerated if it is compensated by a distal non-sticky region. Third, such compensation requires a lower average stickiness at the same abundance when compared to a scenario where stickiness is homogeneous throughout the sequence. We validate these rules experimentally, employing them as different strategies to rescue an otherwise sticky protein fragment from aggregation. Our results highlight that non-functional interactions represent a significant constraint in cellular systems and reveal simple rules by which protein sequences adapt to that constraint. Data from this work are deposited in Figshare, at https://doi.org/10.6084/m9.figshare.8068937.v3.
|
21
|
Infinite Assemblies of Folded Proteins in the Context of Evolution, Disease, and Protein Engineering (article in German; original title: Infinite Ansammlungen gefalteter Proteine im Kontext von Evolution, Krankheiten und Proteinentwicklung). Angew Chem Int Ed Engl 2019. [DOI: 10.1002/ange.201806092] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/11/2022]
|
22
|
Abstract
In experimental evolution, scientists evolve organisms in the lab, typically by challenging them with new environmental conditions. How best to evolve a desired trait? Should the challenge be applied abruptly, gradually, periodically, sporadically? Should one apply chemical mutagenesis, and do strains with high innate mutation rate evolve faster? What are ideal population sizes of evolving populations? There are endless strategies, beyond those that can be exposed by individual labs. We therefore arranged a community challenge, Evolthon, in which students and scientists from different labs were asked to evolve Escherichia coli or Saccharomyces cerevisiae for an abiotic stress, low temperature. About 30 participants from around the world explored diverse environmental and genetic regimes of evolution. After a period of evolution in each lab, all strains of each species were competed with one another. In yeast, the most successful strategies were those that used mating, underscoring the importance of sex in evolution. In bacteria, the fittest strain used a strategy based on exploration of different mutation rates. Different strategies displayed variable levels of performance and stability across additional challenges and conditions. This study therefore uncovers principles of effective experimental evolutionary regimens and might prove useful also for biotechnological developments of new strains and for understanding natural strategies in evolutionary arms races between species. Evolthon constitutes a model for community-based scientific exploration that encourages creativity and cooperation. This Community Page article describes Evolthon, a first-of-its-kind community-based effort, involving about 30 participant labs around the world, aiming to explore the best strategy for evolving microorganisms to cope with an environmental challenge.
Collapse
|
23
|
Infinite Assembly of Folded Proteins in Evolution, Disease, and Engineering. Angew Chem Int Ed Engl 2019; 58:5514-5531. [PMID: 30133878 PMCID: PMC6471489 DOI: 10.1002/anie.201806092] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2018] [Revised: 08/06/2018] [Indexed: 12/14/2022]
Abstract
Mutations and changes in a protein's environment are well known for their potential to induce misfolding and aggregation, including amyloid formation. Alternatively, such perturbations can trigger new interactions that lead to the polymerization of folded proteins. In contrast to aggregation, this process does not require misfolding and, to highlight this difference, we refer to it as agglomeration. This term encompasses the amorphous assembly of folded proteins as well as the polymerization in one, two, or three dimensions. We stress the remarkable potential of symmetric homo‐oligomers to agglomerate even by single surface point mutations, and we review the double‐edged nature of this potential: how aberrant assemblies resulting from agglomeration can lead to disease, but also how agglomeration can serve in cellular adaptation and be exploited for the rational design of novel biomaterials.
|
24
|
Genome-wide SWAp-Tag yeast libraries for proteome exploration. Nat Methods 2018; 15:617-622. [PMID: 29988094 PMCID: PMC6076999 DOI: 10.1038/s41592-018-0044-9] [Citation(s) in RCA: 91] [Impact Index Per Article: 15.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2017] [Accepted: 05/10/2018] [Indexed: 12/31/2022]
Abstract
Yeast libraries revolutionized the systematic study of cell biology. To extensively increase the number of such libraries and the type of information that can be gleaned from them, we previously devised the SWAp-Tag (SWAT) approach that enables rapid, easy and efficient creation of yeast strain collections. Here we present the construction and investigation of a full genome library of ~5500 strains carrying the SWAT NOP1promoter-GFP module at the N terminus of proteins, as well as its use in creating six additional libraries that either restore the native regulation, create an overexpression library with a Cherry tag or enable protein complementation assays from two fragments of an enzyme or fluorophore. We show methods to utilize these SWAT collections to systematically characterize the yeast proteome on multiple levels spanning protein abundance, localization, topology and interactions. Our findings demonstrate how diverse full-genome SWAT libraries facilitate obtaining insights into numerous aspects of the proteome.
|
25
|
Abstract
A precise knowledge of the quaternary structure of proteins is essential to illuminate both their function and their evolution. The major part of our knowledge on quaternary structure is inferred from X-ray crystallography data, but this inference process is hard and error-prone. The difficulty lies in discriminating fortuitous protein contacts, which make up the lattice of protein crystals, from biological protein contacts that exist in the native cellular environment. Here, we review methods devised to discriminate between both types of contacts and describe resources for downloading protein quaternary structure information and identifying high-confidence quaternary structures. The use of high-confidence datasets of quaternary structures will be critical for the analysis of structural, functional, and evolutionary properties of proteins.
|
26
|
Exhaustive search of linear information encoding protein-peptide recognition. PLoS Comput Biol 2017; 13:e1005499. [PMID: 28426660 PMCID: PMC5417721 DOI: 10.1371/journal.pcbi.1005499] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2015] [Revised: 05/04/2017] [Accepted: 04/04/2017] [Indexed: 11/24/2022] Open
Abstract
High-throughput in vitro methods have been extensively applied to identify linear information that encodes peptide recognition. However, these methods are limited in the number of peptides, sequence variation, and length of peptides that can be explored, and often produce solutions that are not found in the cell. Despite the large number of methods developed to address these issues, the exhaustive search of linear information encoding protein-peptide recognition has so far been physically unfeasible. Here, we describe a strategy, called DALEL, for the exhaustive search of linear sequence information encoded in proteins that bind to a common partner. We applied DALEL to explore binding specificity of SH3 domains in the budding yeast Saccharomyces cerevisiae. Using only the polypeptide sequences of SH3 domain binding proteins, we succeeded in identifying the majority of known SH3 binding sites previously discovered either in vitro or in vivo. Moreover, we discovered a number of sites with both non-canonical sequences and distinct properties that may serve ancillary roles in peptide recognition. We compared DALEL to a variety of state-of-the-art algorithms in the blind identification of known binding sites of the human Grb2 SH3 domain. We also benchmarked DALEL on curated biological motifs derived from the ELM database to evaluate the effect of increasing/decreasing the enrichment of the motifs. Our strategy can be applied in conjunction with experimental data of proteins interacting with a common partner to identify binding sites among them. Yet, our strategy can also be applied to any group of proteins of interest to identify enriched linear motifs or to exhaustively explore the space of linear information encoded in a polypeptide sequence. Finally, we have developed a webserver located at http://michnick.bcm.umontreal.ca/dalel, offering a user-friendly interface and providing different scenarios utilizing DALEL.
Here we describe the first strategy for the exhaustive search of the linear information encoding protein-peptide recognition, an approach that has previously been physically unfeasible because the combinatorial space of polypeptide sequences is too vast. The search covers the entire space of sequences with no restriction on motif length or composition, and includes all possible combinations of amino acids at distinct positions of each sequence, as well as positions with correlated preferences for amino acids.
|
27
|
Protein-Fragment Complementation Assays for Large-Scale Analysis, Functional Dissection, and Spatiotemporal Dynamic Studies of Protein-Protein Interactions in Living Cells. Cold Spring Harb Protoc 2016; 2016:2016/11/pdb.top083543. [PMID: 27803260 DOI: 10.1101/pdb.top083543] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
Protein-fragment complementation assays (PCAs) comprise a family of assays that can be used to study protein-protein interactions (PPIs), conformation changes, and protein complex dimensions. We developed PCAs to provide simple and direct methods for the study of PPIs in any living cell, subcellular compartments or membranes, multicellular organisms, or in vitro. Because they are complete assays, requiring no cell-specific components other than reporter fragments, they can be applied in any context. PCAs provide a general strategy for the detection of proteins expressed at endogenous levels within appropriate subcellular compartments and with normal posttranslational modifications, in virtually any cell type or organism under any conditions. Here we introduce a number of applications of PCAs in budding yeast, Saccharomyces cerevisiae. These applications represent the full range of PPI characteristics that might be studied, from simple detection on a large scale to visualization of spatiotemporal dynamics.
|
28
|
The Dihydrofolate Reductase Protein-Fragment Complementation Assay: A Survival-Selection Assay for Large-Scale Analysis of Protein-Protein Interactions. Cold Spring Harb Protoc 2016; 2016:2016/11/pdb.prot090027. [PMID: 27803252 DOI: 10.1101/pdb.prot090027] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
Protein-fragment complementation assays (PCAs) can be used to study protein-protein interactions (PPIs) in any living cell, in vivo or in vitro, in any subcellular compartment or membranes. Here, we present a detailed protocol for performing and analyzing a high-throughput PCA screening to study PPIs in yeast, using dihydrofolate reductase (DHFR) as the reporter protein. The DHFR PCA is a simple survival-selection assay in which Saccharomyces cerevisiae DHFR (scDHFR) is inhibited by methotrexate, thus preventing nucleotide synthesis and causing arrest of cell division. Complementation of cells with a methotrexate-insensitive murine DHFR restores nucleotide synthesis, allowing cell proliferation. The methotrexate-resistant DHFR has two mutations (L22F and F31S) and is 10,000 times less sensitive to methotrexate than wild-type scDHFR, but retains full catalytic activity. The DHFR PCA is sensitive enough for PPIs to be detected for open reading frame (ORF)-PCA fragments expressed off of their endogenous promoters.
|
29
|
|
30
|
Symmetry breaking in homo-oligomers: the curious case of mega-hemocyanin. Structure 2015; 23:3-5. [PMID: 25565100 DOI: 10.1016/j.str.2014.12.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
Abstract
Mega-hemocyanin is a 13.5 MDa oxygen transporter found in snails. It is built from three stacked rings involving ten subunits each. The cryo-EM structure of the complex presented by Gatsogiannis and colleagues in this issue of Structure revealed an unexpected breaking of 5-fold symmetry in the central ring and a nonequivalent packing of the subunits.
|
31
|
Fast and accurate discovery of degenerate linear motifs in protein sequences. PLoS One 2014; 9:e106081. [PMID: 25207816 PMCID: PMC4160167 DOI: 10.1371/journal.pone.0106081] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2013] [Accepted: 08/01/2014] [Indexed: 11/20/2022] Open
Abstract
Linear motifs mediate a wide variety of cellular functions, which makes their characterization in protein sequences crucial to understanding cellular systems. However, the short length and degenerate nature of linear motifs make their discovery a difficult problem. Here, we introduce MotifHound, an algorithm particularly suited for the discovery of small and degenerate linear motifs. MotifHound performs an exact and exhaustive enumeration of all motifs present in proteins of interest, including all of their degenerate forms, and scores the overrepresentation of each motif based on its occurrence in proteins of interest relative to a background (e.g., proteome) using the hypergeometric distribution. To assess MotifHound, we benchmarked it together with state-of-the-art algorithms. The benchmark consists of 11,880 sets of proteins from S. cerevisiae; in each set, we artificially spiked-in one motif varying in terms of three key parameters, (i) number of occurrences, (ii) length and (iii) the number of degenerate or “wildcard” positions. The benchmark enabled the evaluation of the impact of these three properties on the performance of the different algorithms. The results showed that MotifHound and SLiMFinder were the most accurate in detecting degenerate linear motifs. Interestingly, MotifHound was 15 to 20 times faster at comparable accuracy and performed best in the discovery of highly degenerate motifs. We complemented the benchmark by an analysis of proteins experimentally shown to bind the FUS1 SH3 domain from S. cerevisiae. Using the full-length protein partners as sole information, MotifHound recapitulated most experimentally determined motifs binding to the FUS1 SH3 domain. Moreover, these motifs exhibited properties typical of SH3 binding peptides, e.g., high intrinsic disorder and evolutionary conservation, despite the fact that none of these properties were used as prior information. 
MotifHound is available (http://michnick.bcm.umontreal.ca or http://tinyurl.com/motifhound) together with the benchmark that can be used as a reference to assess future developments in motif discovery.
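The overrepresentation score described in this abstract can be illustrated with a small sketch. It is a minimal illustration, assuming the score is the upper tail of the hypergeometric distribution, P(X >= k), computed over counts of proteins containing a motif; the function name `hypergeom_enrichment_pvalue` and its parameters are hypothetical and are not taken from MotifHound itself.

```python
from math import comb


def hypergeom_enrichment_pvalue(N: int, K: int, n: int, k: int) -> float:
    """Upper-tail hypergeometric probability P(X >= k).

    N: number of proteins in the background (e.g., the proteome)
    K: number of background proteins containing the motif
    n: number of proteins of interest (assumed drawn from the background)
    k: number of proteins of interest containing the motif
    All names are illustrative assumptions, not MotifHound's API.
    """
    upper = min(K, n)          # the count cannot exceed either margin
    total = comb(N, n)         # all ways to draw n proteins from N
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy numbers (made up for illustration): 6000 background proteins,
# 120 of which carry the motif; 40 proteins of interest, 8 with the motif.
p = hypergeom_enrichment_pvalue(6000, 120, 40, 8)
print(f"enrichment p-value: {p:.3e}")
```

A small value indicates that the motif occurs in the set of interest more often than expected from the background. Exact integer binomials keep the sketch dependency-free; for very large sets a log-space implementation would be faster.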
|
32
|
Structural, evolutionary, and assembly principles of protein oligomerization. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2013; 117:25-51. [PMID: 23663964 DOI: 10.1016/b978-0-12-386931-9.00002-7] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/02/2022]
Abstract
In the protein universe, 30-50% of proteins self-assemble to form symmetrical complexes consisting of multiple copies of themselves, called homomers. The prevalence of homomers motivates us to review many of their properties. In Section 1, we describe the methods and challenges associated with quaternary structure inference; these methods underlie any analysis of homomers. In Section 2, we describe the morphological properties of homomers, as well as the database 3DComplex, which provides a taxonomy for both homomeric and heteromeric protein complexes. In Section 3, we review interface properties of homomeric complexes. In Section 4, we then present recent findings on the evolution of homomer interfaces, which we link in Section 5 to the evolution of homomers as entire entities. In Section 6, we discuss mechanisms involved in their assembly and how these mechanisms can be linked to evolution.
|
33
|
Protein abundance is key to distinguish promiscuous from functional phosphorylation based on evolutionary information. Philos Trans R Soc Lond B Biol Sci 2012; 367:2594-606. [PMID: 22889910 DOI: 10.1098/rstb.2012.0078] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open
Abstract
In eukaryotic cells, protein phosphorylation is an important and widespread mechanism used to regulate protein function. Yet, of the thousands of phosphosites identified to date, only a few hundred at best have a characterized function. It was recently shown that these functional sites are significantly more conserved than phosphosites of unknown function, stressing the importance of considering evolutionary conservation in assessing the global functional landscape of phosphosites. This leads us to review studies that examined the impact of phosphorylation on evolutionary conservation. While all these studies have shown that conservation is greater among phosphorylated sites compared with non-phosphorylated ones, the magnitude of this difference varies greatly. Further, not all studies have considered key factors that may influence the rate of phosphosite evolution. Such key factors are their localization in ordered or disordered regions, their stoichiometry or the abundance of their corresponding protein. Here we take into account all of these factors simultaneously, which reveals remarkable evolutionary patterns. First, while it is well established that protein conservation increases with abundance, we show that phosphosites partly follow an opposite trend. More precisely, Saccharomyces cerevisiae phosphosites present among abundant proteins are 1.5 times more likely to diverge in the closely related species Saccharomyces bayanus when compared with phosphosites present in the 5 per cent least abundant proteins. Second, we show that conservation is coupled to stoichiometry, whereby sites frequently phosphorylated are more conserved than those rarely phosphorylated. Finally, we provide a model of functional and noisy or 'accidental' phosphorylation that explains these observations.
|
34
|
Abstract
In living cells, functional protein-protein interactions compete with a much larger number of nonfunctional, or promiscuous, interactions. Several cellular properties contribute to avoiding unwanted protein interactions, including regulation of gene expression, cellular compartmentalization, and high specificity and affinity of functional interactions. Here we investigate whether other mechanisms exist that shape the sequence and structure of proteins to favor their correct assembly into functional protein complexes. To examine this question, we project evolutionary and cellular abundance information onto 397, 196, and 631 proteins of known 3D structure from Escherichia coli, Saccharomyces cerevisiae, and Homo sapiens, respectively. On the basis of amino acid frequencies in interface patches versus the solvent-accessible protein surface, we define a propensity or "stickiness" scale for each of the 20 amino acids. We find that the propensity to interact in a nonspecific manner is inversely correlated with abundance. In other words, high abundance proteins have less sticky surfaces. We also find that stickiness constrains protein evolution, whereby residues in sticky surface patches are more conserved than those found in nonsticky patches. Finally, we find that the constraint imposed by stickiness on protein divergence is proportional to protein abundance, which provides mechanistic insights into the correlation between protein conservation and protein abundance. Overall, the avoidance of nonfunctional interactions significantly influences the physico-chemical and evolutionary properties of proteins. Remarkably, the effects observed are consistently larger in E. coli and S. cerevisiae than in H. sapiens, suggesting that promiscuous protein-protein interactions may be freer to accumulate in the human lineage.
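The propensity scale mentioned in this abstract compares amino acid frequencies at interfaces with those on the solvent-accessible surface. One common way to express such a comparison is a log-ratio of the two frequencies. The sketch below assumes that form; the abstract does not give the exact formula, so the definition, the function name `propensity_scale`, and the toy residue lists are illustrative assumptions.

```python
from collections import Counter
from math import log


def propensity_scale(interface_residues, surface_residues):
    """Log-ratio propensity per amino acid (assumed form, for illustration).

    interface_residues: iterable of one-letter codes observed at interfaces
    surface_residues:   iterable of one-letter codes observed on the surface
    Returns {amino_acid: ln(freq_interface / freq_surface)} for amino acids
    seen in both sets; positive values indicate interface enrichment.
    """
    iface = Counter(interface_residues)
    surf = Counter(surface_residues)
    n_iface = sum(iface.values())
    n_surf = sum(surf.values())
    scale = {}
    for aa in iface.keys() & surf.keys():
        freq_iface = iface[aa] / n_iface
        freq_surf = surf[aa] / n_surf
        scale[aa] = log(freq_iface / freq_surf)
    return scale


# Toy data (made up): leucine enriched at the interface, lysine depleted.
toy = propensity_scale("LLLK", "LKKK")
for aa, score in sorted(toy.items()):
    print(aa, round(score, 3))
```

Amino acids absent from either set are skipped here to avoid taking the logarithm of zero; a real analysis would more likely use pseudocounts and far larger residue samples.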
|
35
|
Molecular characterization of the evolution of phagosomes. Mol Syst Biol 2011; 6:423. [PMID: 20959821 PMCID: PMC2990642 DOI: 10.1038/msb.2010.80] [Citation(s) in RCA: 101] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2010] [Accepted: 09/15/2010] [Indexed: 11/23/2022] Open
Abstract
First large-scale comparative proteomics/phosphoproteomics study characterizing some of the key steps that contributed to the remodeling of phagosomes that occurred during evolution. Comparison of profiling analyses of isolated phagosomes from three distant organisms (Dictyostelium, Drosophila, and mouse) revealed a protein core that defines a potential ‘ancient' phagosome and a set of 50 proteins that emerged while adaptive immunity was already well established. Gene duplication events of mouse phagosome paralogs occurred mostly in Bilateria and Euteleostomi, coinciding with the emergence of innate and adaptive immunity, and thus, provided the functional innovations needed for the establishment of these two crucial evolutionary steps of the immune system. Phosphoproteomics of isolated phagosomes from the same three distant species indicate that the phagosome phosphoproteome has been extensively modified during evolution. Still, some phosphosites have been maintained for >1.2 billion years, and thus, highlight their particular significance in the regulation of key phagosomal functions.
Phagocytosis is the process by which multiple cell types internalize large particulate material from the external milieu. The functional properties of phagosomes are acquired through a complex maturation process, referred to as phagolysosome biogenesis. This pathway involves a series of rapid interactions with organelles of the endocytic apparatus, enabling the gradual transformation of newly formed phagosomes into phagolysosomes in which proteolytic degradation occurs. The degradative environment encountered in the phagosome lumen has enabled the use of phagocytosis as a predation mechanism for feeding (phagotrophy) in amoeba, whereas multicellular organisms utilize this process as a defense mechanism to kill microbes and, in jawed vertebrates (fish), initiate a sustained immune response. High-throughput proteomics profiling of isolated phagosomes has been tremendously helpful for the molecular comprehension of this organelle. This approach is achieved by feeding low buoyancy latex beads to phagocytic cells, enabling the subsequent isolation of latex bead-containing phagosomes, away from all the other cell organelles, by a single isopycnic centrifugation in a sucrose gradient. In order to characterize some of the key steps that contributed to the remodeling of phagosomes during evolution, we isolated this organelle from three distant organisms: the amoeba Dictyostelium discoideum, the fruit fly Drosophila melanogaster, and mouse (Mus musculus) that use phagocytosis for different purposes, and performed detailed proteomics and phosphoproteomics analyses with unparalleled protein coverage for this organelle (two- to four-fold enhancements in identified proteins). In order to establish the origin of the mouse phagosome proteome, we performed comparative analyses among 39 taxa including plants/algae, unicellular organisms, fungi, and more complex animal multicellular organisms.
These genomic comparisons indicated that a large proportion of the mouse phagosome proteome is of ancient origin (73.1% of the proteome is conserved in eukaryotic organisms) (Figure 2A). This stresses the fact that phagocytosis is a very ancient process, as shown by its possible involvement in the emergence of eukaryotic cells (eukaryogenesis). Indeed, we identified close to 300 phagosome mouse proteins also present on Drosophila and Dictyostelium phagosomes, defining a potential ‘ancient' core of proteins from which the immune functions of phagosomes likely evolved. Around 16.7% of the mouse phagosome proteins appeared in organisms that use phagocytosis for innate immunity (Bilateria to Chordata), whereas 10.2% appeared in Euteleostomi or Tetrapoda where phagosomes have an important function in linking the killing of microorganisms with the development of a specific sustained immune response following antigen recognition. The phagosome is made of molecules taken from a variety of sources within the cell, including the cytoplasm, the cytoskeleton and membrane organelles. Despite the evolution and diversification of these various cellular systems, the mammalian phagosome proteome is made preferentially of ancient proteins (Figure 2B). Comparison of functional annotation during evolution highlighted the emergence of specific phagosomal functions at various steps during evolution (Figure 2C). Some of these proteins and their point of origin during evolution are highlighted in Figure 2D. Strikingly, we identified in Tetrapods a set of 50 proteins that arose while adaptive immunity was already well established in teleosts (fish), indicating that the phagocytic system is still evolving. Our study highlights the fact that the functional properties of phagosomes emerged by the remodeling of ancient molecules, the addition of novel components, and the duplication of existing proteins (paralogs) leading to the formation of molecular machines of mixed origin. 
Gene duplication is a process that contributed continuously to the complexification of the mouse proteome during evolution. In sharp contrast, paralog analysis indicated that the phagosome proteome was mainly reorganized through two periods of gene duplication, in Bilateria and Euteleostomi, coinciding with the emergence of innate immunity (at the split between Metazoa and Bilateria) and adaptive immunity (in jawed fish), respectively. These results strongly suggest that selective constraints may have favored the maintenance of phagosome paralogs to ensure the establishment of novel functions associated with this organelle at these two crucial evolutionary steps of the immune system. The emergence of genes associated with the MHC locus in mammals, which appeared originally in the genome of jawed fishes, contributed to the development of complex molecular mechanisms linking innate immunity (the part of the immune system that defends the host from infection in a non-specific manner) and adaptive immunity (the part of the immune system triggered specifically after antigen recognition). Several of the genes of this locus encode proteins known to have important functions in antigen presentation, such as subunits of the immunoproteasome (LMP2 and LMP7), MHC class I and class II molecules, as well as tapasin and the transporter associated with antigen processing (TAP1 and TAP2), involved in the transport and loading of peptides on MHC class I molecules (Figure 6). In addition to their ability to present peptides on MHC class II molecules, phagosomes of vertebrates have been shown to be competent for the presentation of exogenous peptides on MHC class I molecules, a process referred to as cross-presentation. From a functional point of view, the involvement of phagosomes in antigen cross-presentation is the outcome of the successful integration of a wide range of multimolecular components that emerged throughout evolution (Figure 6).
The trimming of exogenous proteins into small peptides that can be loaded on MHC class I molecules is inherited from the phagotrophic properties of unicellular organisms, where internalized bacteria are degraded into basic molecules and used as a source of nutrients. Ancient processes have therefore been co-opted (the use of an existing biological structure or feature for a new function) for new functionalities. A summarizing model of the various steps that enabled phagosome antigen presentation is presented in Figure 6. This model highlights the fact that although antigen presentation is unique to evolutionarily recent phagosomes (starting in jawed fishes about 450 million years ago), it uses and integrates molecular machines composed of proteins that emerged throughout evolution. In summary, we present here the first large-scale comparative proteomics/phosphoproteomics study characterizing some of the key evolutionary steps that contributed to the remodeling of phagosomes during evolution. Functional properties of this organelle emerged by the remodeling of ancient molecules, the addition of novel components, the extensive adaptation of protein phosphorylation sites and the duplication of existing proteins leading to the formation of molecular machines of mixed origin. Amoeba use phagocytosis to internalize bacteria as a source of nutrients, whereas multicellular organisms utilize this process as a defense mechanism to kill microbes and, in vertebrates, initiate a sustained immune response. By using a large-scale approach to identify and compare the proteome and phosphoproteome of phagosomes isolated from distant organisms, and by comparative analysis over 39 taxa, we identified an ‘ancient’ core of phagosomal proteins around which the immune functions of this organelle have likely organized.
Our data indicate that a larger proportion of the phagosome proteome, compared with the whole cell proteome, has been acquired through gene duplication at a period coinciding with the emergence of innate and adaptive immunity. Our study also characterizes in detail the acquisition of novel proteins and the significant remodeling of the phagosome phosphoproteome that contributed to modify the core constituents of this organelle in evolution. Our work thus provides the first thorough analysis of the changes that enabled the transformation of the phagosome from a phagotrophic compartment into an organelle fully competent for antigen presentation.
|
36
|
A Simple Definition of Structural Regions in Proteins and Its Use in Analyzing Interface Evolution. J Mol Biol 2010; 403:660-70. [DOI: 10.1016/j.jmb.2010.09.028] [Citation(s) in RCA: 133] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2010] [Revised: 08/19/2010] [Accepted: 09/13/2010] [Indexed: 10/19/2022]
|
37
|
Systemizing the structures and structuring the system. Expert Rev Proteomics 2010; 7:319-22. [PMID: 20536302 DOI: 10.1586/epr.10.33] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
This Keystone symposium, entitled 'Biomolecular Interactions and Networks: function and disease', was held in Quebec City, Canada, 7-12 March 2010. The conference was distinctive in that it bridged two fields that may be perceived as having little in common: structural and systems biology. However, the growth in structural and omics data brings these two fields closer and closer. Indeed, in two sections of this article we cover talks on systematic analyses of protein structures, as well as systems level approaches that incorporate structural information. In two other sections, we report studies that aim at charting and analyzing cellular systems, and finally we discuss talks that pointed to the issue of promiscuity in biological networks.
|
38
|
|
39
|
Physicochemical principles that regulate the competition between functional and dysfunctional association of proteins. Proc Natl Acad Sci U S A 2009; 106:10159-64. [PMID: 19502422 PMCID: PMC2700930 DOI: 10.1073/pnas.0812414106] [Citation(s) in RCA: 124] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2008] [Indexed: 01/31/2023] Open
Abstract
To maintain protein homeostasis, a variety of quality control mechanisms, such as the unfolded protein response and the heat shock response, enable proteins to fold and to assemble into functional complexes while avoiding the formation of aberrant and potentially harmful aggregates. We show here that a complementary contribution to the regulation of the interactions between proteins is provided by the physicochemical properties of their amino acid sequences. The results of a systematic analysis of the protein-protein complexes in the Protein Data Bank (PDB) show that interface regions are more prone to aggregate than other surface regions, indicating that many of the interactions that promote the formation of functional complexes, including hydrophobic and electrostatic forces, can potentially also cause abnormal intermolecular association. We also show, however, that aggregation-prone interfaces are prevented from triggering uncontrolled assembly by being stabilized into their functional conformations by disulfide bonds and salt bridges. These results indicate that functional and dysfunctional association of proteins are promoted by similar forces but also that they are closely regulated by the presence of specific interactions that stabilize native states.
|
40
|
Weak functional constraints on phosphoproteomes. Trends Genet 2009; 25:193-7. [DOI: 10.1016/j.tig.2009.03.003] [Citation(s) in RCA: 194] [Impact Index Per Article: 12.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2008] [Revised: 03/06/2009] [Accepted: 03/06/2009] [Indexed: 10/20/2022]
|
41
|
Self-assembly and evolution of homomeric protein complexes. Phys Rev Lett 2009; 102:118106. [PMID: 19392244 DOI: 10.1103/physrevlett.102.118106] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Received: 11/22/2008] [Indexed: 05/27/2023]
Abstract
We introduce a simple "patchy particle" model to study the thermodynamics and dynamics of self-assembly of homomeric protein complexes. Our calculations allow us to rationalize recent results for dihedral complexes. Namely, why evolution of such complexes naturally takes the system into a region of interaction space where (i) the evolutionarily newer interactions are weaker, (ii) subcomplexes involving the stronger interactions are observed to be thermodynamically stable on destabilization of the protein-protein interactions, and (iii) the self-assembly dynamics are hierarchical with these same subcomplexes acting as kinetic intermediates.
|
42
|
|
43
|
Assembly reflects evolution of protein complexes. Nature 2008; 453:1262-5. [PMID: 18563089 DOI: 10.1038/nature06942] [Citation(s) in RCA: 323] [Impact Index Per Article: 20.2] [Received: 11/15/2007] [Accepted: 03/20/2008] [Indexed: 02/03/2023]
Abstract
A homomer is formed by self-interacting copies of a protein unit. This is functionally important, as in allostery, and structurally crucial because mis-assembly of homomers is implicated in disease. Homomers are widespread, with 50-70% of proteins with a known quaternary state assembling into such structures. Despite their prevalence, their role in the evolution of cellular machinery and the potential for their use in the design of new molecular machines, little is known about the mechanisms that drive formation of homomers at the level of evolution and assembly in the cell. Here we present an analysis of over 5,000 unique atomic structures and show that the quaternary structure of homomers is conserved in over 70% of protein pairs sharing as little as 30% sequence identity. Where quaternary structure is not conserved among the members of a protein family, a detailed investigation revealed well-defined evolutionary pathways by which proteins transit between different quaternary structure types. Furthermore, we show by perturbing subunit interfaces within complexes and by mass spectrometry analysis, that the (dis)assembly pathway mimics the evolutionary pathway. These data represent a molecular analogy to Haeckel's evolutionary paradigm of embryonic development, where an intermediate in the assembly of a complex represents a form that appeared in its own evolutionary history. Our model of self-assembly allows reliable prediction of evolution and assembly of a complex solely from its crystal structure.
|
44
|
Evolution and dynamics of protein interactions and networks. Curr Opin Struct Biol 2008; 18:349-57. [DOI: 10.1016/j.sbi.2008.03.003] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.8] [Received: 12/13/2007] [Revised: 03/04/2008] [Accepted: 03/04/2008] [Indexed: 12/29/2022]
|
45
|
Evolution of protein complexes by duplication of homomeric interactions. Genome Biol 2007; 8:R51. [PMID: 17411433 PMCID: PMC1895999 DOI: 10.1186/gb-2007-8-4-r51] [Citation(s) in RCA: 150] [Impact Index Per Article: 8.8] [Received: 10/03/2006] [Revised: 01/15/2007] [Accepted: 04/05/2007] [Indexed: 12/02/2022] Open
Abstract
A study of yeast protein complexes, complexes of known three-dimensional structure in the Protein Data Bank and clusters of pair-wise protein interactions in the networks of several organisms revealed that duplication of homomeric interactions often results in the formation of complexes of paralogous proteins. Background: Cellular functions are accomplished by the concerted actions of functional modules. The mechanisms driving the emergence and evolution of these modules are still unclear. Here we investigate the evolutionary origins of protein complexes, modules in physical protein-protein interaction networks. Results: We studied protein complexes in Saccharomyces cerevisiae, complexes of known three-dimensional structure in the Protein Data Bank and clusters of pairwise protein interactions in the networks of several organisms. We found that duplication of homomeric interactions, a large class of protein interactions, frequently results in the formation of complexes of paralogous proteins. This route is a common mechanism for the evolution of complexes and clusters of protein interactions. Our conclusions are further confirmed by theoretical modelling of network evolution. We propose reasons for why this is favourable in terms of structure and function of protein complexes. Conclusion: Our study provides the first insight into the evolution of functional modularity in protein-protein interaction networks, and the origins of a large class of protein complexes.
|
46
|
Abstract
Using a previously developed automated method for enzyme annotation, we report the re-annotation of the ENZYME database and the analysis of local error rates per class. In control experiments, we demonstrate that the method is able to correctly re-annotate 91% of all Enzyme Classification (EC) classes with high coverage (755 out of 827). Only 44 enzyme classes are found to contain false positives, while the remaining 28 enzyme classes are not represented. We also show cases where the re-annotation procedure results in partial overlaps for those few enzyme classes where a certain inconsistency might appear between homologous proteins, mostly due to function specificity. Our results allow the interactive exploration of the EC hierarchy for known enzyme families as well as putative enzyme sequences that may need to be classified within the EC hierarchy. These aspects of our framework have been incorporated into a web-server, called CORRIE, which stands for Correspondence Indicator Estimation and allows the interactive prediction of a functional class for putative enzymes from sequence alone, supported by probabilistic measures in the context of the pre-calculated Correspondence Indicators of known enzymes with the functional classes of the EC hierarchy. The CORRIE server is available at: http://www.genomes.org/services/corrie/.
|
47
|
Abstract
Most of the proteins in a cell assemble into complexes to carry out their function. It is therefore crucial to understand the physicochemical properties as well as the evolution of interactions between proteins. The Protein Data Bank represents an important source of information for such studies, because more than half of the structures are homo- or heteromeric protein complexes. Here we propose the first hierarchical classification of whole protein complexes of known 3-D structure, based on representing their fundamental structural features as a graph. This classification provides the first overview of all the complexes in the Protein Data Bank and allows nonredundant sets to be derived at different levels of detail. This reveals that between one-half and two-thirds of known structures are multimeric, depending on the level of redundancy accepted. We also analyse the structures in terms of the topological arrangement of their subunits and find that they form a small number of arrangements compared with all theoretically possible ones. This is because most complexes contain four subunits or less, and the large majority are homomeric. In addition, there is a strong tendency for symmetry in complexes, even for heteromeric complexes. Finally, through comparison of Biological Units in the Protein Data Bank with the Protein Quaternary Structure database, we identified many possible errors in quaternary structure assignments. Our classification, available as a database and Web server at http://www.3Dcomplex.org, will be a starting point for future work aimed at understanding the structure and evolution of protein complexes.
The millions of genes sequenced over the past decade correspond to a much smaller set of protein structural domains, or folds, probably only a few thousand. Since structural data is being accumulated at a fast pace, classifications of domains such as SCOP help significantly in understanding the sequence–structure relationship. More recently, classifications of interacting domain pairs address the relationship between sequence divergence and domain–domain interaction. One level of description that has yet to be investigated is the protein complex level, which is the physiologically relevant state for most proteins within the cell. Here, Levy and colleagues propose a classification scheme for protein complexes, which will allow a better understanding of their structural properties and evolution.
|
48
|
The origins and evolution of functional modules: lessons from protein complexes. Philos Trans R Soc Lond B Biol Sci 2006; 361:507-17. [PMID: 16524839 PMCID: PMC1609335 DOI: 10.1098/rstb.2005.1807] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.3] [Indexed: 11/12/2022] Open
Abstract
Modularity is an attribute of a system that can be decomposed into a set of cohesive entities that are loosely coupled. Many cellular networks can be decomposed into functional modules, each functionally separable from the other modules. The protein complexes in physical protein interaction networks are a good example of this, and here we focus on their origins and evolution. We investigate the emergence of protein complexes and physical interactions between proteins by duplication, and review other mechanisms. We dissect the dataset of protein complexes of known three-dimensional structure, and show that roughly 90% of these complexes contain contacts between identical proteins within the same complex. Proteins that are shared across different complexes occur frequently, and they tend to be essential genes more often than members of a single protein complex. We also provide a perspective on the evolutionary mechanisms driving the growth of other modular cellular networks such as transcriptional regulatory and metabolic networks.
|
49
|
Probabilistic annotation of protein sequences based on functional classifications. BMC Bioinformatics 2005; 6:302. [PMID: 16354297 PMCID: PMC1361783 DOI: 10.1186/1471-2105-6-302] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 05/20/2005] [Accepted: 12/14/2005] [Indexed: 11/17/2022] Open
Abstract
Background: One of the most evident achievements of bioinformatics is the development of methods that transfer biological knowledge from characterised proteins to uncharacterised sequences. This mode of protein function assignment is mostly based on the detection of sequence similarity and the premise that functional properties are conserved during evolution. Most automatic approaches developed to date rely on the identification of clusters of homologous proteins and the mapping of new proteins onto these clusters, which are expected to share functional characteristics. Results: Here, we inverse the logic of this process, by considering the mapping of sequences directly to a functional classification instead of mapping functions to a sequence clustering. In this mode, the starting point is a database of labelled proteins according to a functional classification scheme, and the subsequent use of sequence similarity allows defining the membership of new proteins to these functional classes. In this framework, we define the Correspondence Indicators as measures of relationship between sequence and function and further formulate two Bayesian approaches to estimate the probability for a sequence of unknown function to belong to a functional class. This approach allows the parametrisation of different sequence search strategies and provides a direct measure of annotation error rates. We validate this approach with a database of enzymes labelled by their corresponding four-digit EC numbers and analyse specific cases. Conclusion: The performance of this method is significantly higher than the simple strategy consisting in transferring the annotation from the highest scoring BLAST match and is expected to find applications in automated functional annotation pipelines.
|
50
|
Ultrastructure of mesangial and juxtaglomerular cells in the kidney of a hibernator. Z Zellforsch Mikrosk Anat 1971; 118:326-32. [PMID: 4327754 DOI: 10.1007/bf00331191] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.2] [Indexed: 01/10/2023]
|