1. Weibel CA, Wheeler AL, James JE, Willis SM, McShea H, Masel J. The protein domains of vertebrate species in which selection is more effective have greater intrinsic structural disorder. bioRxiv 2024:2023.03.02.530449. [PMID: 38712167 PMCID: PMC11071303 DOI: 10.1101/2023.03.02.530449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
The nearly neutral theory of molecular evolution posits variation among species in the effectiveness of selection. In an idealized model, the census population size determines both this minimum magnitude of the selection coefficient required for deleterious variants to be reliably purged, and the amount of neutral diversity. Empirically, an "effective population size" is often estimated from the amount of putatively neutral genetic diversity and is assumed to also capture a species' effectiveness of selection. A potentially more direct measure of the effectiveness of selection is the degree to which selection maintains preferred codons. However, past metrics that compare codon bias across species are confounded by among-species variation in %GC content and/or amino acid composition. Here we propose a new Codon Adaptation Index of Species (CAIS), based on Kullback-Leibler divergence, that corrects for both confounders. We demonstrate the use of CAIS correlations, as well as the Effective Number of Codons, to show that the protein domains of more highly adapted vertebrate species evolve higher intrinsic structural disorder.
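The abstract says only that CAIS is based on Kullback-Leibler divergence and does not give its formula. As a rough illustration of that underlying quantity, and not of CAIS itself, the sketch below computes the divergence between an observed set of synonymous codon frequencies and an expected set. The function name `kl_divergence` and all frequencies are hypothetical, invented for this example.

```python
import math


def kl_divergence(observed, expected):
    """Return D_KL(observed || expected) in nats for two discrete distributions.

    Both arguments map the same keys (here, codons) to probabilities.
    """
    total = 0.0
    for key, p in observed.items():
        if p == 0.0:
            continue  # terms with p = 0 contribute nothing by convention
        q = expected[key]
        if q == 0.0:
            raise ValueError(f"expected probability for {key} is zero")
        total += p * math.log(p / q)
    return total


# Made-up frequencies of the four alanine codons (illustrative only).
observed = {"GCU": 0.40, "GCC": 0.30, "GCA": 0.20, "GCG": 0.10}
# Hypothetical null expectation: all synonymous codons equally likely.
expected = {"GCU": 0.25, "GCC": 0.25, "GCA": 0.25, "GCG": 0.25}

divergence = kl_divergence(observed, expected)
print(f"KL divergence: {divergence:.4f} nats")
```

A divergence of zero means the observed codon usage matches the expectation exactly. Larger values indicate a stronger departure from the null, which could be read as stronger codon bias.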
Affiliation(s)
- Catherine A. Weibel
  - Department of Mathematics, University of Arizona, Tucson, Arizona 85721, USA
  - Department of Physics, University of Arizona, Tucson, Arizona 85721, USA
  - present address: Department of Applied Physics, Stanford University, California, USA
- Andrew L. Wheeler
  - Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, Arizona 85721, USA
- Jennifer E. James
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721, USA
  - present address: Department of Ecology and Genetics, Evolutionary Biology Center, Uppsala University, Sweden
- Sara M. Willis
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721, USA
  - present address: University Information Technology Services, University of Arizona, Tucson, Arizona 85721, USA
- Hanon McShea
  - Department of Earth System Science, Stanford University
- Joanna Masel
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721, USA

2. Acheson K, Kirrander A. Automatic Clustering of Excited-State Trajectories: Application to Photoexcited Dynamics. J Chem Theory Comput 2023; 19:6126-6138. [PMID: 37703098 PMCID: PMC10536988 DOI: 10.1021/acs.jctc.3c00776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Indexed: 09/14/2023]
Abstract
We introduce automatic clustering as a computationally efficient tool for classifying and interpreting trajectories from simulations of photo-excited dynamics. Trajectories are treated as time-series data, with the features for clustering selected by variance mapping of normalized data. The L2-norm and dynamic time warping are proposed as suitable similarity measures for calculating the distance matrices, and these are clustered using the unsupervised density-based DBSCAN algorithm. The silhouette coefficient and the number of trajectories classified as noise are used as quality measures for the clustering. The ability of clustering to provide rapid overview of large and complex trajectory data sets, and its utility for extracting chemical and physical insight, is demonstrated on trajectories corresponding to the photochemical ring-opening reaction of 1,3-cyclohexadiene, noting that the clustering can be used to generate reduced dimensionality representations in an unbiased manner.
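The two similarity measures named in the abstract can be sketched for one-dimensional time series as follows. This is a minimal textbook version under stated assumptions, not the authors' implementation. The function names and the toy series are hypothetical, and the DBSCAN clustering step is omitted because it would need a third-party library.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Dynamic time warping distance with absolute difference as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # repeat b[j-1] against a new point of a
                cost[i][j - 1],      # repeat a[i-1] against a new point of b
                cost[i - 1][j - 1],  # advance both series together
            )
    return cost[n][m]


# Toy trajectories: the second is the first with its initial point held one step longer.
traj_a = [0.0, 1.0, 2.0, 1.0, 0.0]
traj_b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]

print(dtw_distance(traj_a, traj_b))
print(l2_distance([0.0, 0.0], [3.0, 4.0]))
```

Dynamic time warping tolerates shifts in timing between trajectories, whereas the L2 norm compares them point by point and needs series of equal length. A matrix of pairwise distances from either measure is the kind of input a density-based clustering algorithm takes.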
Affiliation(s)
- Kyle Acheson
  - EaStCHEM, School of Chemistry and Centre for Science at Extreme Conditions, University of Edinburgh, David Brewster Road, Edinburgh EH9 3FJ, U.K.
  - Department of Chemistry, University of Warwick, Coventry CV4 7AL, U.K.
- Adam Kirrander
  - Physical and Theoretical Chemistry Laboratory, Department of Chemistry, University of Oxford, South Parks Road, Oxford OX1 3QZ, U.K.

3. Tang QY, Ren W, Wang J, Kaneko K. The Statistical Trends of Protein Evolution: A Lesson from AlphaFold Database. Mol Biol Evol 2022; 39:6701686. [PMID: 36108094 PMCID: PMC9550990 DOI: 10.1093/molbev/msac197] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/15/2022]
Abstract
The recent development of artificial intelligence provides us with new and powerful tools for studying the mysterious relationship between organism evolution and protein evolution. In this work, based on the AlphaFold Protein Structure Database (AlphaFold DB), we perform comparative analyses of the proteins of different organisms. The statistics of AlphaFold-predicted structures show that, for organisms with higher complexity, their constituent proteins will have larger radii of gyration, higher coil fractions, and slower vibrations, statistically. By conducting normal mode analysis and scaling analyses, we demonstrate that higher organismal complexity correlates with lower fractal dimensions in both the structure and dynamics of the constituent proteins, suggesting that higher functional specialization is associated with higher organismal complexity. We also uncover the topology and sequence bases of these correlations. As the organismal complexity increases, the residue contact networks of the constituent proteins will be more assortative, and these proteins will have a higher degree of hydrophilic-hydrophobic segregation in the sequences. Furthermore, by comparing the statistical structural proximity across the proteomes with the phylogenetic tree of homologous proteins, we show that statistical structural proximity across the proteomes may indirectly reflect the phylogenetic proximity, indicating a statistical trend of protein evolution in parallel with organism evolution. This study provides new insights into how the diversity in the functionality of proteins increases and how the dimensionality of the manifold of protein dynamics reduces during evolution, contributing to the understanding of the origin and evolution of life.
Affiliation(s)
- Weitong Ren
  - Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Jun Wang
  - School of Physics, National Laboratory of Solid State Microstructure, and Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing 210093, People’s Republic of China

4. Kosinski LJ, Aviles NR, Gomez K, Masel J. Random peptides rich in small and disorder-promoting amino acids are less likely to be harmful. Genome Biol Evol 2022; 14:evac085. [PMID: 35668555 PMCID: PMC9210321 DOI: 10.1093/gbe/evac085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2021] [Revised: 04/01/2022] [Accepted: 05/27/2022] [Indexed: 11/15/2022]
Abstract
Proteins are the workhorses of the cell, yet they carry great potential for harm via misfolding and aggregation. Despite the dangers, proteins are sometimes born de novo from non-coding DNA. Proteins are more likely to be born from non-coding regions that produce peptides that do little to no harm when translated than from regions that produce harmful peptides. To investigate which newborn proteins are most likely to "first, do no harm", we estimate fitnesses from an experiment that competed Escherichia coli lineages that each expressed a unique random peptide. A variety of peptide metrics significantly predict lineage fitness, but this predictive power stems from simple amino acid frequencies rather than the ordering of amino acids. Amino acids that are smaller and that promote intrinsic structural disorder have more benign fitness effects. We validate that the amino acids that indicate benign effects in random peptides expressed in E. coli also do so in an independent dataset of random N-terminal tags in which it is possible to control for expression level. The same amino acids are also enriched in young animal proteins.
Affiliation(s)
- Luke J Kosinski
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, USA
- Nathan R Aviles
  - Graduate Interdisciplinary Program in Statistics, University of Arizona, Tucson, USA
- Kevin Gomez
  - Graduate Interdisciplinary Program in Applied Math, University of Arizona, Tucson, USA
- Joanna Masel
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, USA

5. Weisman CM, Murray AW, Eddy SR. Mixing genome annotation methods in a comparative analysis inflates the apparent number of lineage-specific genes. Curr Biol 2022; 32:2632-2639.e2. [PMID: 35588743 DOI: 10.1016/j.cub.2022.04.085] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 01/04/2022] [Revised: 03/17/2022] [Accepted: 04/21/2022] [Indexed: 12/16/2022]
Abstract
Comparisons of genomes of different species are used to identify lineage-specific genes, those genes that appear unique to one species or clade. Lineage-specific genes are often thought to represent genetic novelty that underlies unique adaptations. Identification of these genes depends not only on genome sequences, but also on inferred gene annotations. Comparative analyses typically use available genomes that have been annotated using different methods, increasing the risk that orthologous DNA sequences may be erroneously annotated as a gene in one species but not another, appearing lineage specific as a result. To evaluate the impact of such "annotation heterogeneity," we identified four clades of species with sequenced genomes with more than one publicly available gene annotation, allowing us to compare the number of lineage-specific genes inferred when differing annotation methods are used to those resulting when annotation method is uniform across the clade. In these case studies, annotation heterogeneity increases the apparent number of lineage-specific genes by up to 15-fold, suggesting that annotation heterogeneity is a substantial source of potential artifact.
Affiliation(s)
- Caroline M Weisman
  - Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, South Drive, Princeton, NJ 08540, USA
- Andrew W Murray
  - Department of Molecular & Cellular Biology, Harvard University, Divinity Avenue, Cambridge, MA 02138, USA
- Sean R Eddy
  - Department of Molecular & Cellular Biology, Harvard University, Divinity Avenue, Cambridge, MA 02138, USA; Howard Hughes Medical Institute, Jones Bridge Road, Chevy Chase, MD 20815, USA; John A. Paulson School of Engineering and Applied Sciences, Harvard University, Oxford Street, Cambridge, MA 02138, USA

6. Parakkunnel R, Bhojaraja Naik K, Susmita C, Girimalla V, Bhaskar KU, Sripathy KV, Shantharaja CS, Aravindan S, Kumar S, Lakhanpaul S, Bhat KV. Evolution and co-evolution: insights into the divergence of plant heat shock factor genes. Physiol Mol Biol Plants 2022; 28:1029-1047. [PMID: 35722513 PMCID: PMC9203654 DOI: 10.1007/s12298-022-01183-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Revised: 04/27/2022] [Accepted: 05/04/2022] [Indexed: 05/03/2023]
Abstract
The Heat Shock Factor (Hsf) genes are widely distributed across the plant kingdom, regulating the plant response to various abiotic stresses. In addition to natural selection, breeding and accelerated selection changed the structure and function of Hsf genes. 1076 Hsf genes from 30 genera, from primitive algae to the most advanced plant species and major crop plants, were used for phylogenetic analysis. The interspecific divergence was studied with 11 members of genus Oryza, while intraspecific divergence was studied with sesame pan-genome adapted to diverse ecological niches. B2 genes in eudicots and monocots originated separately, while A1 gave rise to the recently evolved Class-C genes, and land colonization happened with evolution of A1 genes. An increase in the number of lineages in the Oryza clade with the evolution of AA genome indicated independent domestication, and positive selection was observed in > 53% of loci, whereas the highly conserved homologues were under purifying selection. The paralogous genes under positive selection exhibited more domain changes for diversified function and increased fitness. A significant co-evolving cluster involving amino acids Phenylalanine, Lysine and Valine played a crucial role in maintaining the hydrophobic core along with highly conserved Tryptophan residues. A mutation of Glutamic acid to Glutamine was observed in A8 genes of Lamiales affecting protein solvency. Breeding resulted in accumulation of mutations reducing the hydrophobicity of proteins and a further reduction in protein aggregation. This study identifies genome duplications, non-neutral selection and co-evolving residues as causing drastic changes in the conserved domain of Hsf proteins. Supplementary information: The online version contains supplementary material available at 10.1007/s12298-022-01183-7.
Affiliation(s)
- Ramya Parakkunnel
  - ICAR- Indian Institute of Seed Science, Regional Station, GKVK Campus, Bengaluru, Karnataka 560065 India
- K Bhojaraja Naik
  - ICAR- Indian Institute of Seed Science, Regional Station, GKVK Campus, Bengaluru, Karnataka 560065 India
- C Susmita
  - ICAR- Indian Institute of Seed Science, Mau, Uttar Pradesh 275103 India
- Vanishree Girimalla
  - ICAR- Indian Institute of Seed Science, Regional Station, GKVK Campus, Bengaluru, Karnataka 560065 India
- K Udaya Bhaskar
  - ICAR- Indian Institute of Seed Science, Regional Station, GKVK Campus, Bengaluru, Karnataka 560065 India
- KV Sripathy
  - ICAR- Indian Institute of Seed Science, Regional Station, GKVK Campus, Bengaluru, Karnataka 560065 India
- CS Shantharaja
  - ICAR- Indian Institute of Seed Science, Regional Station, GKVK Campus, Bengaluru, Karnataka 560065 India
- S Aravindan
  - Division of Genomic Resources, ICAR- National Bureau of Plant Genetic Resources, New Delhi, 110012 India
- Sanjay Kumar
  - ICAR- Indian Institute of Seed Science, Mau, Uttar Pradesh 275103 India
- KV Bhat
  - Division of Genomic Resources, ICAR- National Bureau of Plant Genetic Resources, New Delhi, 110012 India

7. Papadopoulos C, Callebaut I, Gelly JC, Hatin I, Namy O, Renard M, Lespinet O, Lopes A. Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution. Genome Res 2021; 31:2303-2315. [PMID: 34810219 PMCID: PMC8647833 DOI: 10.1101/gr.275638.121] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/13/2021] [Accepted: 09/23/2021] [Indexed: 01/08/2023]
Abstract
The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how noncoding sequences' properties could promote the birth of novel genes and shape the evolution and the structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic open reading frames (ORFs) of S. cerevisiae with the aim of (1) exploring whether the structural states' diversity of proteomes is already present in noncoding sequences, and (2) estimating the potential of the noncoding genome to produce novel protein bricks that could either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural state diversity of canonical proteins, with the majority predicted as foldable. Then, we investigated the early stages of de novo gene birth by reconstructing the ancestral sequences of 70 yeast de novo genes and characterized the sequence and structural properties of intergenic ORFs with a strong translation signal. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and one of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.
Affiliation(s)
- Chris Papadopoulos
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Isabelle Callebaut
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, 75005 Paris, France
- Jean-Christophe Gelly
  - Université de Paris, Biologie Intégrée du Globule Rouge, UMR_S1134, BIGR, INSERM, F-75015 Paris, France
  - Laboratoire d'Excellence GR-Ex, 75015 Paris, France
  - Institut National de la Transfusion Sanguine, F-75015 Paris, France
- Isabelle Hatin
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Olivier Namy
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Maxime Renard
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Olivier Lespinet
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Anne Lopes
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France

8. Dubreuil B, Levy ED. Abundance Imparts Evolutionary Constraints of Similar Magnitude on the Buried, Surface, and Disordered Regions of Proteins. Front Mol Biosci 2021; 8:626729. [PMID: 33996892 PMCID: PMC8119896 DOI: 10.3389/fmolb.2021.626729] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/06/2020] [Accepted: 03/29/2021] [Indexed: 12/02/2022]
Abstract
An understanding of the forces shaping protein conservation is key, both for the fundamental knowledge it represents and to allow for optimal use of evolutionary information in practical applications. Sequence conservation is typically examined at one of two levels. The first is a residue-level, where intra-protein differences are analyzed and the second is a protein-level, where inter-protein differences are studied. At a residue level, we know that solvent-accessibility is a prime determinant of conservation. By inverting this logic, we inferred that disordered regions are slightly more solvent-accessible on average than the most exposed surface residues in domains. By integrating abundance information with evolutionary data within and across proteins, we confirmed a previously reported strong surface-core association in the evolution of structured regions, but we found a comparatively weak association between disordered and structured regions. The facts that disordered and structured regions experience different structural constraints and evolve independently provide a unique setup to examine an outstanding question: why is a protein’s abundance the main determinant of its sequence conservation? Indeed, any structural or biophysical property linked to the abundance-conservation relationship should increase the relative conservation of regions concerned with that property (e.g., disordered residues with mis-interactions, domain residues with misfolding). Surprisingly, however, we found the conservation of disordered and structured regions to increase in equal proportion with abundance. This observation implies that either abundance-related constraints are structure-independent, or multiple constraints apply to different regions and perfectly balance each other.
Affiliation(s)
- Benjamin Dubreuil
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Emmanuel D Levy
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel

9. James JE, Willis SM, Nelson PG, Weibel C, Kosinski LJ, Masel J. Universal and taxon-specific trends in protein sequences as a function of age. eLife 2021; 10:e57347. [PMID: 33416492 PMCID: PMC7819706 DOI: 10.7554/elife.57347] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/28/2020] [Accepted: 01/05/2021] [Indexed: 01/12/2023]
Abstract
Extant protein-coding sequences span a huge range of ages, from those that emerged only recently to those present in the last universal common ancestor. Because evolution has had less time to act on young sequences, there might be 'phylostratigraphy' trends in any properties that evolve slowly with age. A long-term reduction in hydrophobicity and hydrophobic clustering was found in previous, taxonomically restricted studies. Here we perform integrated phylostratigraphy across 435 fully sequenced species, using sensitive HMM methods to detect protein domain homology. We find that the reduction in hydrophobic clustering is universal across lineages. However, only young animal domains have a tendency to have higher structural disorder. Among ancient domains, trends in amino acid composition reflect the order of recruitment into the genetic code, suggesting that the composition of the contemporary descendants of ancient sequences reflects amino acid availability during the earliest stages of life, when these sequences first emerged.
Affiliation(s)
- Jennifer E James
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Sara M Willis
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Paul G Nelson
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Catherine Weibel
  - Department of Physics, University of Arizona, Tucson, United States
  - Department of Mathematics, University of Arizona, Tucson, United States
- Luke J Kosinski
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, United States
- Joanna Masel
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States

10. Evolution Rapidly Optimizes Stability and Aggregation in Lattice Proteins Despite Pervasive Landscape Valleys and Mazes. Genetics 2020; 214:1047-1057. [PMID: 32107278 DOI: 10.1534/genetics.120.302815] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/16/2019] [Accepted: 02/18/2020] [Indexed: 11/18/2022]
Abstract
The "fitness" landscapes of genetic sequences are characterized by high dimensionality and "ruggedness" due to sign epistasis. Ascending from low to high fitness on such landscapes can be difficult because adaptive trajectories get stuck at low-fitness local peaks. Compounding matters, recent theoretical arguments have proposed that extremely long, winding adaptive paths may be required to reach even local peaks: a "maze-like" landscape topography. The extent to which peaks and mazes shape the mode and tempo of evolution is poorly understood, due to empirical limitations and the abstractness of many landscape models. We explore the prevalence, scale, and evolutionary consequences of landscape mazes in a biophysically grounded computational model of protein evolution that captures the "frustration" between "stability" and aggregation propensity. Our stability-aggregation landscape exhibits extensive sign epistasis and local peaks galore. Although this frequently obstructs adaptive ascent to high fitness and virtually eliminates reproducibility of evolutionary outcomes, many adaptive paths do successfully complete the ascent from low to high fitness, with hydrophobicity a critical mediator of success. These successful paths exhibit maze-like properties on a global landscape scale, in which taking an indirect path helps to avoid low-fitness local peaks. This delicate balance of "hard but possible" adaptation could occur more broadly in other biological settings where competing interactions and frustration are important.

11. Dubreuil B, Matalon O, Levy ED. Protein Abundance Biases the Amino Acid Composition of Disordered Regions to Minimize Non-functional Interactions. J Mol Biol 2019; 431:4978-4992. [PMID: 31442477 PMCID: PMC6941228 DOI: 10.1016/j.jmb.2019.08.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 05/07/2019] [Revised: 08/07/2019] [Accepted: 08/10/2019] [Indexed: 02/07/2023]
Abstract
In eukaryotes, disordered regions cover up to 50% of proteomes and mediate fundamental cellular processes. In contrast to globular domains, where about half of the amino acids are buried in the protein interior, disordered regions show higher solvent accessibility, which makes them prone to engage in non-functional interactions. Such interactions are exacerbated by the law of mass action, prompting the question of how they are minimized in abundant proteins. We find that interaction propensity or "stickiness" of disordered regions negatively correlates with their cellular abundance, both in yeast and human. Strikingly, considering yeast proteins where a large fraction of the sequence is disordered, the correlation between stickiness and abundance reaches R=-0.55. Beyond this global amino-acid composition bias, we identify three rules by which amino-acid composition of disordered regions adjusts with high abundance. First, lysines are preferred over arginines, consistent with the latter amino acid being stickier than the former. Second, compensatory effects exist, whereby a sticky region can be tolerated if it is compensated by a distal non-sticky region. Third, such compensation requires a lower average stickiness at the same abundance when compared to a scenario where stickiness is homogeneous throughout the sequence. We validate these rules experimentally, employing them as different strategies to rescue an otherwise sticky protein fragment from aggregation. Our results highlight that non-functional interactions represent a significant constraint in cellular systems and reveal simple rules by which protein sequences adapt to that constraint. Data from this work are deposited in Figshare, at https://doi.org/10.6084/m9.figshare.8068937.v3.
Affiliation(s)
- Benjamin Dubreuil
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Or Matalon
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Emmanuel D Levy
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel

12.
Affiliation(s)
- Stephen Branden Van Oss
  - Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
- Anne-Ruxandra Carvunis
  - Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America