1
Safadi A, Lovell SC, Doig AJ. Essentiality, protein-protein interactions and evolutionary properties are key predictors for identifying cancer-associated genes using machine learning. Sci Rep 2024; 14:9199. [PMID: 38649399 PMCID: PMC11035574 DOI: 10.1038/s41598-023-44118-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 10/04/2023] [Indexed: 04/25/2024]
Abstract
The distinctive nature of cancer as a disease prompts an exploration of the special characteristics exhibited by the genes implicated in cancer. Identifying cancer-associated genes and their characteristics is crucial to furthering our understanding of this disease and to improving the likelihood that therapeutic drug targets succeed. However, the rate at which cancer genes are being identified experimentally is slow. Applying predictive analysis techniques, by building accurate machine learning models, is a potentially useful approach for increasing the rate at which these genes and their characteristics are identified. Here, we investigated gene essentiality scores and found that they tend to be higher for cancer-associated genes than for other protein-coding human genes. We built a dataset of extended gene properties linked to essentiality and used it to train a machine-learning model; this model reached 89% accuracy and an Area Under the Curve (AUC) > 0.85. The model showed that essentiality, evolutionary-related properties, and properties arising from protein-protein interaction networks are particularly effective in predicting cancer-associated genes. We were able to use the model to identify potential candidate genes that have not been previously linked to cancer. Prioritising genes that score highly by our methods could aid scientists in their research on cancer genes.
Affiliation(s)
- Amro Safadi
- Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, M13 9PT, UK
- Simon C Lovell
- Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, M13 9PT, UK
- Andrew J Doig
- Division of Neuroscience, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, M13 9BL, UK
2
Bouvier JW, Emms DM, Kelly S. Rubisco is evolving for improved catalytic efficiency and CO2 assimilation in plants. Proc Natl Acad Sci U S A 2024; 121:e2321050121. [PMID: 38442173 PMCID: PMC10945770 DOI: 10.1073/pnas.2321050121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 01/25/2024] [Indexed: 03/07/2024]
Abstract
Rubisco is the primary entry point for carbon into the biosphere. However, rubisco is widely regarded as inefficient, leading many to question whether the enzyme can adapt to become a better catalyst. Through a phylogenetic investigation of the molecular and kinetic evolution of Form I rubisco, we uncover the trajectory of rubisco kinetic evolution in angiosperms. We show that rbcL is among the 1% of slowest-evolving genes and enzymes on Earth, accumulating one nucleotide substitution every 0.9 My and one amino acid mutation every 7.2 My. Despite this, rubisco catalysis has been continually evolving toward improved CO2/O2 specificity, carboxylase turnover, and carboxylation efficiency. Consistent with this kinetic adaptation, increased rubisco evolution has led to a concomitant improvement in leaf-level CO2 assimilation. Thus, rubisco has been slowly but continually evolving toward improved catalytic efficiency and CO2 assimilation in plants.
Affiliation(s)
- Jacques W Bouvier
- Department of Biology, University of Oxford, Oxford OX1 3RB, United Kingdom
- David M Emms
- Department of Biology, University of Oxford, Oxford OX1 3RB, United Kingdom
- Steven Kelly
- Department of Biology, University of Oxford, Oxford OX1 3RB, United Kingdom
3
Xing Z, Zhang Y, Tian Z, Wang M, Xiao W, Zhu C, Zhao S, Zhu Y, Hu L, Kong X. Escaping but not the inactive X-linked protein complex coding genes may achieve X-chromosome dosage compensation and underlie X chromosome inactivation-related diseases. Heliyon 2023; 9:e17721. [PMID: 37449161 PMCID: PMC10336589 DOI: 10.1016/j.heliyon.2023.e17721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Revised: 06/05/2023] [Accepted: 06/26/2023] [Indexed: 07/18/2023]
Abstract
X chromosome dosage compensation (XDC) refers to the process by which X-linked genes acquire expression equivalence between the two sexes. Ohno proposed that XDC is achieved by two-fold upregulation of X-linked genes in both sexes and by silencing one X chromosome (X chromosome inactivation, XCI) in females. However, the genes subject to two-fold upregulation, as well as the underlying mechanism, remain unclear. It has been reported that gene dosage changes may affect only X-linked dosage-sensitive genes, such as protein complex coding genes (PCGs). Our results showed that in humans, PCGs are more likely to escape XCI, and escaping PCGs (EsP) show two-fold higher expression than inactivated PCGs (InP) or other X-linked genes at the RNA and protein levels in both sexes, which suggests that EsP may achieve upregulation and XDC. The higher expression of EsP possibly results from upregulation of the single active X chromosome (Xa), rather than from escape expression from the inactive X chromosome (Xi). EsP genes have relatively high expression levels in humans and lower dN/dS ratios, suggesting that they are likely under stronger selection pressure over evolutionary time. Our study also suggests that the SP1 transcription factor is significantly enriched in EsP and may be involved in the upregulation of EsP on the active X. Finally, human EsP genes in this study are enriched in the toll-like receptor pathway, NF-kB pathway, apoptotic pathway, and abnormal mental, developmental and reproductive phenotypes. These findings suggest that misregulation of EsP may be involved in autoimmune, reproductive, and neurological diseases, providing insight for the diagnosis and treatment of these diseases.
Affiliation(s)
- Zhihao Xing
- Clinical Laboratory, Institute of Pediatrics, Shenzhen Children’s Hospital, Shenzhen, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China
- Yuchao Zhang
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China
- Zhongyuan Tian
- Zhoukou Traditional Chinese Medicine Hospital, Zhoukou, Henan, China
- Meng Wang
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China
- Weiwei Xiao
- Clinical Laboratory, Institute of Pediatrics, Shenzhen Children’s Hospital, Shenzhen, China
- Chunqing Zhu
- Clinical Laboratory, Institute of Pediatrics, Shenzhen Children’s Hospital, Shenzhen, China
- Songhui Zhao
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China
- Yufei Zhu
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China
- Landian Hu
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China
- Xiangyin Kong
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China
4
Lai HY, Yu YH, Jhou YT, Liao CW, Leu JY. Multiple intermolecular interactions facilitate rapid evolution of essential genes. Nat Ecol Evol 2023; 7:745-755. [PMID: 36997737 PMCID: PMC10172115 DOI: 10.1038/s41559-023-02029-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/06/2022] [Accepted: 02/21/2023] [Indexed: 04/01/2023]
Abstract
Essential genes are commonly assumed to function in basic cellular processes and to change slowly. However, it remains unclear whether all essential genes are similarly conserved or if their evolutionary rates can be accelerated by specific factors. To address these questions, we replaced 86 essential genes of Saccharomyces cerevisiae with orthologues from four other species that diverged from S. cerevisiae about 50, 100, 270 and 420 Myr ago. We identify a group of fast-evolving genes that often encode subunits of large protein complexes, including anaphase-promoting complex/cyclosome (APC/C). Incompatibility of fast-evolving genes is rescued by simultaneously replacing interacting components, suggesting it is caused by protein co-evolution. Detailed investigation of APC/C further revealed that co-evolution involves not only primary interacting proteins but also secondary ones, suggesting the evolutionary impact of epistasis. Multiple intermolecular interactions in protein complexes may provide a microenvironment facilitating rapid evolution of their subunits.
Affiliation(s)
- Huei-Yi Lai
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Yen-Hsin Yu
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Yu-Ting Jhou
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Chia-Wei Liao
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Jun-Yi Leu
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
5
Laloum D, Robinson-Rechavi M. Rhythmicity is linked to expression cost at the protein level but to expression precision at the mRNA level. PLoS Comput Biol 2022; 18:e1010399. [PMID: 36095022 PMCID: PMC9518874 DOI: 10.1371/journal.pcbi.1010399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Revised: 09/28/2022] [Accepted: 07/17/2022] [Indexed: 11/18/2022]
Abstract
Many genes have nycthemeral rhythms of expression, i.e. a 24-hour periodic variation, at the mRNA level, the protein level, or both, and most rhythmic genes are tissue-specific. Here, we investigate and discuss the evolutionary origins of rhythms in gene expression. Our results suggest that rhythmicity of protein expression could have been favored by selection to minimize costs. Trends are consistent in bacteria, plants and animals, and are also supported by tissue-specific patterns in mouse. Unlike at the protein level, cost cannot explain rhythmicity at the RNA level. We suggest that, instead, rhythmicity allows expression noise to be reduced periodically. Noise control had the strongest support in mouse, with limited evidence in other species. We have also found that genes under stronger purifying selection are rhythmically expressed at the mRNA level, and we propose that this is because they are noise-sensitive genes. Finally, the adaptive role of rhythmic expression is supported by rhythmic genes being highly expressed yet tissue-specific. This provides a good evolutionary explanation for the observation that nycthemeral rhythms are often tissue-specific.
Author summary
For many genes, expression, i.e. the production of RNA and proteins, is rhythmic with a 24-hour period. Here, we study and discuss the evolutionary origins of these rhythms. Our analyses of data from different species suggest that the rhythmicity of protein levels may have been favored by selection for cost minimization. Furthermore, we have shown that cost cannot explain the rhythmic variations in RNA levels. Instead, we suggest that this rhythmicity periodically reduces the stochasticity of gene expression. We also found that genes under stronger purifying selection are rhythmically expressed at the mRNA level, and propose that this is because they are noise-sensitive genes. Finally, rhythmic expression involves genes that are often highly expressed and tissue-specific. This provides a good evolutionary explanation for the tissue-specificity of these rhythms.
Affiliation(s)
- David Laloum
- Department of Ecology and Evolution, Batiment Biophore, Quartier UNIL-Sorge, Université de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Batiment Génopode, Quartier UNIL-Sorge, Université de Lausanne, Lausanne, Switzerland
- Marc Robinson-Rechavi
- Department of Ecology and Evolution, Batiment Biophore, Quartier UNIL-Sorge, Université de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Batiment Génopode, Quartier UNIL-Sorge, Université de Lausanne, Lausanne, Switzerland
6
Pollet L, Lambourne L, Xia Y. Structural Determinants of Yeast Protein-Protein Interaction Interface Evolution at the Residue Level. J Mol Biol 2022; 434:167750. [PMID: 35850298 DOI: 10.1016/j.jmb.2022.167750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2021] [Revised: 06/09/2022] [Accepted: 07/12/2022] [Indexed: 12/01/2022]
Abstract
Interfaces of contact between proteins play important roles in determining the proper structure and function of protein-protein interactions (PPIs). Therefore, to fully understand PPIs, we need to better understand the evolutionary design principles of PPI interfaces. Previous studies have shown that interfacial sites are more evolutionarily conserved than other protein surface sites. Yet, little is known about the nature and relative importance of evolutionary constraints in PPI interfaces. Here, we explore the constraints that the structure of the microenvironment surrounding interfacial residues imposes on residue evolutionary rate, using a large dataset of over 700 structural models of baker's yeast PPIs. We find that interfacial residues are, on average, systematically more conserved than all other residues with a similar degree of total burial as measured by relative solvent accessibility (RSA). In addition, we find that the RSA of a residue when the PPI is formed is a better predictor of interfacial residue evolutionary rate than its RSA in the monomer state. Furthermore, we investigate four structure-based measures of residue interfacial involvement, including change in RSA upon binding (ΔRSA), number of residue-residue contacts across the interface, and distance from the center or the periphery of the interface. Integrated modeling for evolutionary rate prediction in interfaces shows that ΔRSA plays a dominant role among the four measures of interfacial involvement, with minor but independent contributions from the other measures. These results yield insight into the evolutionary design of interfaces, improving our understanding of the role that structure plays in the molecular evolution of PPIs at the residue level.
Affiliation(s)
- Léah Pollet
- Department of Bioengineering, Faculty of Engineering, McGill University, Montreal, QC, Canada
- Luke Lambourne
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA; Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Yu Xia
- Department of Bioengineering, Faculty of Engineering, McGill University, Montreal, QC, Canada
7
Wang Y, Jiang B, Wu Y, He X, Liu L. Rapid intraspecies evolution of fitness effects of yeast genes. Genome Biol Evol 2022; 14:6575331. [PMID: 35482054 PMCID: PMC9113246 DOI: 10.1093/gbe/evac061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/21/2022] [Indexed: 11/14/2022]
Abstract
Organisms within a species carry numerous genetic and phenotypic variations. Growing evidence shows that intraspecies variation in mutant phenotypes may be more complicated than expected. Current studies on intraspecies variation in mutant phenotypes are limited to just a few strains. This study investigated the intraspecies variation in fitness effects of 5,630 gene mutants across ten Saccharomyces cerevisiae strains using CRISPR–Cas9 screening. We found that the variability of fitness effects induced by gene disruptions is very large across different strains. Over 75% of genes affected cell fitness in a strain-specific manner to varying degrees. The strain specificity of a gene's fitness effect is related to its evolutionary and functional properties. Subsequent analysis revealed that younger genes, especially those newly acquired in the S. cerevisiae species, are more likely to be strongly strain-specific. Intriguingly, there seems to be a ceiling on fitness effect size for strongly strain-specific genes, and among them, the newly acquired genes are still evolving and have yet to reach this ceiling. Additionally, for a large proportion of protein complexes, the strain specificity profile is inconsistent among genes encoding the same complex. Taken together, these results offer a genome-wide map of intraspecies variation in fitness effect as a mutant phenotype and provide updated insight into intraspecies phenotypic evolution.
Affiliation(s)
- Yayu Wang
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
- Bei Jiang
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
- Yue Wu
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
- Xionglei He
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
- Li Liu
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
8
Vedelek B, Kovács Á, Boros IM. Evolutionary mode for the functional preservation of fast-evolving Drosophila telomere capping proteins. Open Biol 2021; 11:210261. [PMID: 34784790 PMCID: PMC8596017 DOI: 10.1098/rsob.210261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023]
Abstract
DNA end protection is fundamental for the long-term preservation of the genome. In vertebrates, the Shelterin protein complex protects telomeric DNA ends, thereby contributing to the maintenance of genome integrity. In the Drosophila genus, this function is thought to be performed by the Terminin complex, an assembly of fast-evolving subunits. Considering that DNA end protection is fundamental for successful genome replication, the accelerated evolution of Terminin subunits is counterintuitive, as conservation is supposed to maintain the assembly and concerted function of the interacting partners. This problem extends beyond Drosophila telomere biology and provides insight into the evolution of protein assemblies. To learn more about the mechanistic details of this phenomenon, we have used in vitro assays to investigate the intra- and interspecies assemblies of Verrocchio (Ver) and Modigliani (Moi), two Terminin subunits. Based on our results and on homology-based three-dimensional models for Ver and Moi, we conclude that both proteins contain an OB-fold and contribute to the ssDNA binding of the Terminin complex. We propose that the preservation of Ver function is achieved by conservation of specific amino acids responsible for folding or localized in interacting surfaces. We also provide here the first evidence of Moi DNA binding.
Affiliation(s)
- Balázs Vedelek
- Department of Biochemistry and Molecular Biology, University of Szeged, Szeged, Hungary; Institute of Biochemistry, Biological Research Centre, Szeged, Hungary
- Ákos Kovács
- Department of Biochemistry and Molecular Biology, University of Szeged, Szeged, Hungary
- Imre M. Boros
- Department of Biochemistry and Molecular Biology, University of Szeged, Szeged, Hungary; Institute of Biochemistry, Biological Research Centre, Szeged, Hungary
9
Abstract
Because gene expression is important for evolutionary adaptation, its misregulation is an important cause of maladaptation. A misregulated gene can be incorrectly silent ("off") when a transcription factor (TF) that is required for its activation does not bind its regulatory region. Conversely, a misregulated gene can be incorrectly active ("on") when a TF not normally involved in its activation binds its regulatory region, a phenomenon also known as regulatory crosstalk. DNA mutations that destroy or create TF binding sites are an important source of misregulation and crosstalk. Although misregulation reduces fitness in an environment to which an organism is well adapted, it may become adaptive in a new environment. Here, I derive simple yet general mathematical expressions that delimit the conditions under which misregulation can be adaptive. These expressions depend on the strength of selection against misregulation, on the fraction of DNA sequence space filled with TF binding sites, and on the fraction of genes that must be expressed for optimal adaptation. I then use empirical data from RNA sequencing, protein-binding microarrays, and genome evolution, together with population genetic simulations, to ask when these conditions are likely to be met. I show that they can be met under realistic circumstances, but these circumstances may vary among organisms and environments. My analysis provides a framework in which improved theory and data collection can help us demonstrate the role of misregulation in adaptation. It also shows that misregulation, like DNA mutation, is one of life's many imperfections that can help propel Darwinian evolution.
Affiliation(s)
- Andreas Wagner
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, CH-8057, Switzerland; The Santa Fe Institute, Santa Fe, NM 87501, USA; Swiss Institute of Bioinformatics, Lausanne, Switzerland
10
Razban RM, Dasmeh P, Serohijos AWR, Shakhnovich EI. Avoidance of protein unfolding constrains protein stability in long-term evolution. Biophys J 2021; 120:2413-2424. [PMID: 33932438 PMCID: PMC8390877 DOI: 10.1016/j.bpj.2021.03.042] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/16/2020] [Revised: 02/24/2021] [Accepted: 03/17/2021] [Indexed: 11/28/2022]
Abstract
Every amino acid residue can influence a protein's overall stability, making stability highly susceptible to change throughout evolution. We consider the distribution of protein stabilities that are evolutionarily permissible under two previously reported protein fitness functions: flux dynamics and misfolding avoidance. We develop an evolutionary dynamics theory and find that it agrees better with an extensive protein stability data set for dihydrofolate reductase orthologs under the misfolding avoidance fitness function than under the flux dynamics fitness function. Further investigation with ribonuclease H data demonstrates that not every misfolded state is avoided; rather, only the unfolded state is. Finally, we discuss how our work pertains to the universal correlation between protein abundance and evolutionary rate seen across organisms' proteomes. We derive a closed-form expression relating protein abundance to evolutionary rate that captures Escherichia coli, Saccharomyces cerevisiae, and Homo sapiens experimental trends without fitted parameters.
Affiliation(s)
- Rostam M Razban
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
- Pouria Dasmeh
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts; Departement de Biochimie, Université de Montréal, Montreal, Quebec, Canada
- Eugene I Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
11
Moran J, Finlay D, Tikhonov M. Improve it or lose it: Evolvability cost of competition for expression. Phys Rev E 2021; 103:062402. [PMID: 34271680 DOI: 10.1103/physreve.103.062402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2021] [Accepted: 05/24/2021] [Indexed: 11/07/2022]
Abstract
Expression level is known to be a strong determinant of a protein's rate of evolution. But the converse can also be true: evolutionary dynamics can affect the expression levels of proteins. Influence in both directions raises the possibility of an "improve it or lose it" feedback loop, in which more highly expressed systems are more likely to improve and become expressed even more highly, while those expressed at lower levels are eventually lost to drift. Using a minimal model to study this in the context of a changing environment, we demonstrate that one unexpected consequence of such a feedback loop is that a slow switch to a new environment can allow genotypes to reach higher fitness sooner than direct exposure to it.
Affiliation(s)
- Jacob Moran
- Department of Physics, Washington University in St. Louis, St. Louis, Missouri 63130, USA
- Devon Finlay
- Department of Physics, Washington University in St. Louis, St. Louis, Missouri 63130, USA
- Mikhail Tikhonov
- Department of Physics, Washington University in St. Louis, St. Louis, Missouri 63130, USA
12
Dubreuil B, Levy ED. Abundance Imparts Evolutionary Constraints of Similar Magnitude on the Buried, Surface, and Disordered Regions of Proteins. Front Mol Biosci 2021; 8:626729. [PMID: 33996892 PMCID: PMC8119896 DOI: 10.3389/fmolb.2021.626729] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/06/2020] [Accepted: 03/29/2021] [Indexed: 12/02/2022]
Abstract
An understanding of the forces shaping protein conservation is key, both for the fundamental knowledge it represents and to allow for optimal use of evolutionary information in practical applications. Sequence conservation is typically examined at one of two levels: the residue level, where intra-protein differences are analyzed, and the protein level, where inter-protein differences are studied. At the residue level, we know that solvent accessibility is a prime determinant of conservation. By inverting this logic, we inferred that disordered regions are slightly more solvent-accessible on average than the most exposed surface residues in domains. By integrating abundance information with evolutionary data within and across proteins, we confirmed a previously reported strong surface-core association in the evolution of structured regions, but we found a comparatively weak association between disordered and structured regions. The fact that disordered and structured regions experience different structural constraints and evolve independently provides a unique setup to examine an outstanding question: why is a protein’s abundance the main determinant of its sequence conservation? Indeed, any structural or biophysical property linked to the abundance-conservation relationship should increase the relative conservation of regions concerned with that property (e.g., disordered residues with mis-interactions, domain residues with misfolding). Surprisingly, however, we found the conservation of disordered and structured regions to increase in equal proportion with abundance. This observation implies either that abundance-related constraints are structure-independent, or that multiple constraints apply to different regions and perfectly balance each other.
Affiliation(s)
- Benjamin Dubreuil
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Emmanuel D Levy
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
13
Evans P, Cox NJ, Gamazon ER. The regulatory genome constrains protein sequence evolution: implications for the search for disease-associated genes. PeerJ 2020; 8:e9554. [PMID: 32765967 PMCID: PMC7380284 DOI: 10.7717/peerj.9554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2020] [Accepted: 06/24/2020] [Indexed: 11/20/2022]
Abstract
The development of explanatory models of protein sequence evolution has broad implications for our understanding of cellular biology, population history, and disease etiology. Here we analyze the GTEx transcriptome resource to quantify the effect of the transcriptome on protein sequence evolution in a multi-tissue framework. We find substantial variation among the central nervous system tissues in the effect of expression variance on evolutionary rate, with highly variable genes in the cortex showing significantly greater purifying selection than highly variable genes in subcortical regions (Mann-Whitney U p = 1.4 × 10⁻⁴). The remaining tissues cluster in observed expression correlation with evolutionary rate, enabling evolutionary analysis of genes in diverse physiological systems, including digestive, reproductive, and immune systems. Importantly, the tissue in which a gene attains its maximum expression variance significantly varies (p = 5.55 × 10⁻²⁸⁴) with evolutionary rate, suggesting a tissue-anchored model of protein sequence evolution. Using a large-scale reference resource, we show that the tissue-anchored model provides a transcriptome-based approach to predicting the primary affected tissue of developmental disorders. Using gradient boosted regression trees to model evolutionary rate under a range of model parameters, selected features explain up to 62% of the variation in evolutionary rate and provide additional support for the tissue model. Finally, we investigate several methodological implications, including the importance of evolutionary-rate-aware gene expression imputation models using genetic data for improved search for disease-associated genes in transcriptome-wide association studies. Collectively, this study presents a comprehensive transcriptome-based analysis of a range of factors that may constrain molecular evolution and proposes a novel framework for the study of gene function and disease mechanism.
Affiliation(s)
- Patrick Evans
- Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Nancy J Cox
- Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Eric R Gamazon
- Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Clare Hall, University of Cambridge, Cambridge, United Kingdom
- MRC Epidemiology Unit, University of Cambridge, Cambridge, United Kingdom
- Data Science Institute, Vanderbilt University, Nashville, TN, United States of America
14
Abstract
Darwin's theory of evolution emphasized that positive selection of functional proficiency provides the fitness that ultimately determines the structure of life, a view that has dominated biochemical thinking of enzymes as perfectly optimized for their specific functions. The 20th-century modern synthesis, structural biology, and the central dogma explained the machinery of evolution, and nearly neutral theory explained how selection competes with random fixation dynamics that produce molecular clocks essential, e.g., for dating evolutionary histories. However, quantitative proteomics revealed that selection pressures not relating to optimal function play much larger roles than previously thought, acting perhaps most importantly via protein expression levels. This paper first summarizes recent progress in the 21st century toward recovering this universal selection pressure. Then, the paper argues that proteome cost minimization is the dominant, underlying 'non-function' selection pressure controlling most of the evolution of already functionally adapted living systems. A theory of proteome cost minimization is described and argued to have consequences for understanding evolutionary trade-offs, aging, cancer, and neurodegenerative protein-misfolding diseases.
15
Aligning functional network constraint to evolutionary outcomes. BMC Evol Biol 2020; 20:58. [PMID: 32448114 PMCID: PMC7245893 DOI: 10.1186/s12862-020-01613-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/01/2018] [Accepted: 04/15/2020] [Indexed: 12/12/2022]
Abstract
Background: Functional constraint through genomic architecture has been suggested to be an important dimension of genome evolution, but quantitative evidence for this idea is rare. In this contribution, existing evidence and discussions on genomic architecture as a constraint on convergent evolution, rapid adaptation, and genic adaptation are summarized into alternative, testable hypotheses. Network architecture statistics from protein-protein interaction networks are then used to calculate differences in evolutionary outcomes, using genomic evolution in yeast as an example, and the results are used to evaluate statistical support for these longstanding hypotheses. Results: A discriminant function analysis lent statistical support to classifying the yeast interactome into hub, intermediate, and peripheral nodes based on network neighborhood connectivity, betweenness centrality, and average shortest path length. Quantitative support for the existence of genomic architecture as a mechanistic basis for evolutionary constraint is then revealed by using these statistical parameters of the protein-protein interaction network in combination with estimators of protein evolution. Conclusions: As functional genetic networks become increasingly available, it will be possible to evaluate functional genetic network constraint against variables describing complex phenotypes and environments, for a better understanding of the deterministic patterns of evolution commonly observed in non-model organisms. The hypothesis framework and methodological approach outlined here may help to quantify the extrinsic versus intrinsic dimensions of evolutionary constraint, and lead to a better understanding of how quickly, effectively, or deterministically organisms adapt.
16
Gangele K, Jamsandekar M, Mishra A, Poluri KM. Unraveling the evolutionary origin of ELR motif using fish CXC chemokine CXCL8. Fish Shellfish Immunol 2019; 93:17-27. [PMID: 31310848 DOI: 10.1016/j.fsi.2019.07.034] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 05/24/2019] [Revised: 07/12/2019] [Accepted: 07/12/2019] [Indexed: 05/19/2023]
Abstract
Chemokines are chemotactic proteins involved in host defense through the migration of immune-regulatory cells to the site of infection. Interleukin-8 (CXCL8/IL8) is the most studied "ELR-CXC" chemokine, or neutrophil-activating chemokine (NAC), which regulates neutrophil trafficking during infection and inflammation by binding to its cognate G-protein-coupled receptors CXCR1/CXCR2. The "ELR" motif of NAC chemokines is essential for CXCR1/CXCR2 receptor activation. To understand the evolutionary origin of the "ELR" motif in CXC chemokines, a thorough evolutionary study of the CXCL8 gene from various fishes and primates was performed. Phylogenetic analysis revealed that the CXCL8 gene can be classified into four distinct lineages (CXCL8-L1a, CXCL8-L1b, CXCL8-L2, and CXCL8-L3), of which CXCL8-L1a is the fastest evolving and CXCL8-L3 the slowest. Selection analysis suggested that the "ELR/DLR" motif-containing branches (gadoid and coelacanth) are positively selected. The probable evolutionary trend of the "ELR" motif suggests that this motif in ancestral CXCL8 evolved from the GGR motif of lamprey (Agnatha), followed by duplication giving rise to two main motifs in CXCL8: "NXH" in the L3 lineage and "ELR/DLR" in the L1a/L1b lineages. Although structural analysis suggested that the overall topology of the CXCL8 proteins is similar, differences exist in individual structural elements among members of different lineages. Functional distance analysis suggested that the CXCL8-L3 lineage is more distant from the inferred ancestor than the CXCL8-L1a and L1b lineages. Functional divergence analysis between different lineages suggested that most of the selected residues are important for receptor or glycosaminoglycan binding. Such functional diversification can be attributed to the novel set of functions adopted by CXCL8 in various species.
Affiliation(s)
- Krishnakant Gangele
- Department of Biotechnology, Indian Institute of Technology Roorkee, Roorkee, 247667, Uttarakhand, India
- Minal Jamsandekar
- Department of Biotechnology, Indian Institute of Technology Roorkee, Roorkee, 247667, Uttarakhand, India
- Amit Mishra
- Cellular and Molecular Neurobiology Unit, Indian Institute of Technology Jodhpur, Jodhpur, 342011, Rajasthan, India
- Krishna Mohan Poluri
- Department of Biotechnology, Indian Institute of Technology Roorkee, Roorkee, 247667, Uttarakhand, India
17
Razban RM. Protein Melting Temperature Cannot Fully Assess Whether Protein Folding Free Energy Underlies the Universal Abundance-Evolutionary Rate Correlation Seen in Proteins. Mol Biol Evol 2019; 36:1955-1963. [PMID: 31093676 PMCID: PMC6736436 DOI: 10.1093/molbev/msz119] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 02/07/2023]
Abstract
The protein misfolding avoidance hypothesis explains the universal negative correlation between protein abundance and sequence evolutionary rate across the proteome by identifying protein folding free energy (ΔG) as the confounding variable. Abundant proteins resist toxic misfolding events by being more stable, and more stable proteins evolve slower because their mutations are more destabilizing. Direct supporting evidence consists only of computer simulations. A study taking advantage of a recent experimental breakthrough in measuring protein stability proteome-wide through melting temperature (Tm) (Leuenberger et al. 2017), found weak misfolding avoidance hypothesis support for the Escherichia coli proteome, and no support for the Saccharomyces cerevisiae, Homo sapiens, and Thermus thermophilus proteomes (Plata and Vitkup 2018). I find that the nontrivial relationship between Tm and ΔG and inaccuracy in Tm measurements by Leuenberger et al. 2017 can be responsible for not observing strong positive abundance-Tm and strong negative Tm-evolutionary rate correlations.
Affiliation(s)
- Rostam M Razban
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA
18
Systematic analysis reveals the prevalence and principles of bypassable gene essentiality. Nat Commun 2019; 10:1002. [PMID: 30824696 PMCID: PMC6397241 DOI: 10.1038/s41467-019-08928-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 10/24/2018] [Accepted: 02/07/2019] [Indexed: 12/12/2022]
Abstract
Gene essentiality is a variable phenotypic trait, but to what extent and how essential genes can become dispensable for viability remain unclear. Here, we investigate 'bypass of essentiality (BOE)' - an underexplored type of digenic genetic interaction that renders essential genes dispensable. Through analyzing essential genes on one of the six chromosome arms of the fission yeast Schizosaccharomyces pombe, we find that, remarkably, as many as 27% of them can be converted to non-essential genes by BOE interactions. Using this dataset we identify three principles of essentiality bypass: bypassable essential genes tend to have lower importance, tend to exhibit differential essentiality between species, and tend to act with other bypassable genes. In addition, we delineate mechanisms underlying bypassable essentiality, including the previously unappreciated mechanism of dormant redundancy between paralogs. The new insights gained on bypassable essentiality deepen our understanding of genotype-phenotype relationships and will facilitate drug development related to essential genes.
19
Marek A, Tomala K. The Contribution of Purifying Selection, Linkage, and Mutation Bias to the Negative Correlation between Gene Expression and Polymorphism Density in Yeast Populations. Genome Biol Evol 2018; 10:2986-2996. [PMID: 30321329 PMCID: PMC6250307 DOI: 10.1093/gbe/evy225] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Accepted: 10/10/2018] [Indexed: 11/13/2022]
Abstract
The negative correlation between the rate of protein evolution and the expression level of a gene has been recognized as a universal law of evolutionary biology (Koonin 2011). In our study, we apply a population-based approach to systematically investigate the relative importance of unequal mutation rate, linkage, and selection in the origin of the expression-polymorphism anticorrelation. We analyzed the DNA sequences of protein-coding genes of 24 Saccharomyces cerevisiae and 58 Schizosaccharomyces pombe strains. We found that highly expressed genes had substantially fewer polymorphic sites than genes transcribed less extensively. This expression-dependent reduction was especially strong at nonsynonymous sites, although it was also present at synonymous sites and in untranslated regions, both upstream and downstream of a gene. Most importantly, no such trend was found in introns. We used these observations, as well as analyses of site frequency spectra and data from mutation accumulation experiments, to show that purifying selection acting on nonsynonymous sites was the main, but not exclusive, factor impeding molecular evolution within the coding sequences of highly expressed genes. Linkage could not fully explain the observed pattern of polymorphism within the untranslated regions and synonymous sites, although the contribution of selection acting directly on synonymous variants was extremely small. Finally, we found that the impact of mutational bias was rather negligible.
Affiliation(s)
- Agnieszka Marek
- Institute of Environmental Sciences, Jagiellonian University, Krakow, Poland
- Katarzyna Tomala
- Institute of Environmental Sciences, Jagiellonian University, Krakow, Poland
20
Jaquiéry J, Peccoud J, Ouisse T, Legeai F, Prunier-Leterme N, Gouin A, Nouhaud P, Brisson JA, Bickel R, Purandare S, Poulain J, Battail C, Lemaitre C, Mieuzet L, Le Trionnaire G, Simon JC, Rispe C. Disentangling the Causes for Faster-X Evolution in Aphids. Genome Biol Evol 2018; 10:507-520. [PMID: 29360959 PMCID: PMC5798017 DOI: 10.1093/gbe/evy015] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Accepted: 01/18/2018] [Indexed: 12/22/2022]
Abstract
The faster evolution of X chromosomes has been documented in several species, and results from the increased efficiency of selection on recessive alleles in hemizygous males and/or from increased drift due to the smaller effective population size of X chromosomes. Aphids are excellent models for evaluating the importance of selection in faster-X evolution because their peculiar life cycle and unusual inheritance of sex chromosomes should generally lead to equivalent effective population sizes for X and autosomes. Because we lack a high-density genetic map for the pea aphid, whose complete genome has been sequenced, we first assigned its entire genome to the X or autosomes based on ratios of sequencing depth in males (X0) to females (XX). Then, we computed nonsynonymous to synonymous substitutions ratios (dN/dS) for the pea aphid gene set and found faster evolution of X-linked genes. Our analyses of substitution rates, together with polymorphism and expression data, showed that relaxed selection is likely to be the greatest contributor to faster-X because a large fraction of X-linked genes are expressed at low rates and thus escape selection. Yet, a minor role for positive selection is also suggested by the difference between substitution rates for X and autosomes for male-biased genes (but not for asexual female-biased genes) and by lower Tajima’s D for X-linked compared with autosomal genes with highly male-biased expression patterns. This study highlights the relevance of organisms displaying alternative chromosomal inheritance to the understanding of forces shaping genome evolution.
Affiliation(s)
- Julie Jaquiéry
- INRA UMR IGEPP Domaine de la Motte, Le Rheu, France
- CNRS UMR 6553 ECOBIO, Université de Rennes 1, France
- Jean Peccoud
- CNRS UMR 7267 Ecologie et Biologie des Interactions, Equipe Ecologie Evolution Symbiose, Université de Poitiers, France
- Fabrice Legeai
- INRA UMR IGEPP Domaine de la Motte, Le Rheu, France
- INRIA Centre Rennes - Bretagne Atlantique, GenOuest, Rennes, France
- Anais Gouin
- INRA UMR IGEPP Domaine de la Motte, Le Rheu, France
- INRIA Centre Rennes - Bretagne Atlantique, GenOuest, Rennes, France
- Pierre Nouhaud
- Institute of Population Genetics, Vetmeduni Vienna, Vienna, Austria
- Ryan Bickel
- Department of Biology, University of Rochester
- Swapna Purandare
- Multidisciplinary Center for Advance Research and Studies (MCARS), Jamia Millia Islamia, New Delhi, India
- Julie Poulain
- Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG), Genoscope, Evry, France
- Christophe Battail
- Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG), Centre National de Génotypage (CNG), Evry, France
- Claire Lemaitre
- INRIA Centre Rennes - Bretagne Atlantique, GenOuest, Rennes, France
- Claude Rispe
- BIOEPAR, INRA, ONIRIS, La Chantrerie, Nantes, France
21
Razban RM, Gilson AI, Durfee N, Strobelt H, Dinkla K, Choi JM, Pfister H, Shakhnovich EI. ProteomeVis: a web app for exploration of protein properties from structure to sequence evolution across organisms' proteomes. Bioinformatics 2018; 34:3557-3565. [PMID: 29741573 PMCID: PMC6184454 DOI: 10.1093/bioinformatics/bty370] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/01/2017] [Revised: 03/27/2018] [Accepted: 05/03/2018] [Indexed: 01/27/2023]
Abstract
Motivation: Protein evolution spans time scales and its effects span the length of an organism. A web app named ProteomeVis is developed to provide a comprehensive view of protein evolution in the Saccharomyces cerevisiae and Escherichia coli proteomes. ProteomeVis interactively creates protein chain graphs, where edges between nodes represent structure and sequence similarities within user-defined ranges, to study the long time scale effects of protein structure evolution. The short time scale effects of protein sequence evolution are studied by sequence evolutionary rate (ER) correlation analyses with protein properties that span from the molecular to the organismal level. Results: We demonstrate the utility and versatility of ProteomeVis by investigating the distribution of edges per node in organismal protein chain universe graphs (oPCUGs) and putative ER determinants. S. cerevisiae and E. coli oPCUGs are scale-free with scaling constants of 1.79 and 1.56, respectively. Both scaling constants can be explained by a previously reported theoretical model describing protein structure evolution. Protein abundance most strongly correlates with ER among properties in ProteomeVis, with Spearman correlations of -0.49 (P-value < 10⁻¹⁰) and -0.46 (P-value < 10⁻¹⁰) for S. cerevisiae and E. coli, respectively. This result is consistent with previous reports that found protein expression to be the most important ER determinant. Availability and implementation: ProteomeVis is freely accessible at http://proteomevis.chem.harvard.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rostam M Razban
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Amy I Gilson
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Niamh Durfee
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Hendrik Strobelt
- School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Kasper Dinkla
- School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Jeong-Mo Choi
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Hanspeter Pfister
- School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Eugene I Shakhnovich
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
22
Albert FW, Bloom JS, Siegel J, Day L, Kruglyak L. Genetics of trans-regulatory variation in gene expression. eLife 2018; 7:e35471. [PMID: 30014850 PMCID: PMC6072440 DOI: 10.7554/elife.35471] [Citation(s) in RCA: 97] [Impact Index Per Article: 16.2] [Received: 01/28/2018] [Accepted: 06/30/2018] [Indexed: 12/02/2022]
Abstract
Heritable variation in gene expression forms a crucial bridge between genomic variation and the biology of many traits. However, most expression quantitative trait loci (eQTLs) remain unidentified. We mapped eQTLs by transcriptome sequencing in 1012 yeast segregants. The resulting eQTLs accounted for over 70% of the heritability of mRNA levels, allowing comprehensive dissection of regulatory variation. Most genes had multiple eQTLs. Most expression variation arose from trans-acting eQTLs distant from their target genes. Nearly all trans-eQTLs clustered at 102 hotspot locations, some of which influenced the expression of thousands of genes. Fine-mapped hotspot regions were enriched for transcription factor genes. While most genes had a local eQTL, most of these had no detectable effects on the expression of other genes in trans. Hundreds of non-additive genetic interactions accounted for small fractions of expression variation. These results reveal the complexity of genetic influences on transcriptome variation in unprecedented depth and detail.
Affiliation(s)
- Frank Wolfgang Albert
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, United States
- Joshua S Bloom
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, Los Angeles, United States
- Jake Siegel
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, Los Angeles, United States
- Laura Day
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, Los Angeles, United States
- Leonid Kruglyak
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, Los Angeles, United States
23
Diepeveen ET, Gehrmann T, Pourquié V, Abeel T, Laan L. Patterns of Conservation and Diversification in the Fungal Polarization Network. Genome Biol Evol 2018; 10:1765-1782. [PMID: 29931311 PMCID: PMC6054225 DOI: 10.1093/gbe/evy121] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Accepted: 06/18/2018] [Indexed: 12/12/2022]
Abstract
The combined actions of proteins in networks underlie all fundamental cellular functions. Deeper insights into the dynamics of network composition across species and their functional consequences are crucial to fully understand protein network evolution. Large-scale comparative studies with high phylogenetic resolution are now feasible through the recent rise in available genomic data sets of both model and nonmodel species. Here, we focus on the polarity network, which is universally essential for cell proliferation and has been studied in great detail in the model organism Saccharomyces cerevisiae. We examine 42 proteins directly related to cell polarization across 298 fungal strains/species to determine the composition of the network and patterns of conservation and diversification. We observe strong protein conservation for a group of 23 core proteins: >95% of all examined strains/species possess at least 14 of these core proteins, albeit in varying compositions, and none of the individual core proteins is 100% conserved. We find high levels of variation in prevalence and sequence identity in the remaining 19 proteins, resulting in distinct lineage-specific compositions of the network in the majority of strains/species. We show that the observed diversification in network composition correlates with lineage, lifestyle, and genetic distance. Yeasts, filamentous fungi, and basal unicellular fungi form distinct groups in these analyses, with substantial differences in their polarization networks. Our study shows that the fungal polarization network is highly dynamic, even between closely related species, and that functional conservation appears to be achieved by varying the specific components of the fungal polarization repertoire.
Affiliation(s)
- Eveline T Diepeveen
- Department of Bionanoscience, Faculty of Applied Sciences, Kavli Institute of NanoScience, Delft University of Technology, The Netherlands
- Thies Gehrmann
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Intelligent Systems, Delft University of Technology, The Netherlands
- Department of Molecular Epidemiology, Leiden Computational Biology Center, Leiden University Medical Centre, The Netherlands
- Valérie Pourquié
- Department of Bionanoscience, Faculty of Applied Sciences, Kavli Institute of NanoScience, Delft University of Technology, The Netherlands
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Intelligent Systems, Delft University of Technology, The Netherlands
- Thomas Abeel
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Intelligent Systems, Delft University of Technology, The Netherlands
- Genome Sequencing and Analysis Program, Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts
- Liedewij Laan
- Department of Bionanoscience, Faculty of Applied Sciences, Kavli Institute of NanoScience, Delft University of Technology, The Netherlands
24
Alvarez-Ponce D, Feyertag F, Chakraborty S. Position Matters: Network Centrality Considerably Impacts Rates of Protein Evolution in the Human Protein-Protein Interaction Network. Genome Biol Evol 2017; 9:1742-1756. [PMID: 28854629 PMCID: PMC5570066 DOI: 10.1093/gbe/evx117] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Accepted: 07/01/2017] [Indexed: 02/06/2023]
Abstract
The proteins of any organism evolve at disparate rates. A long list of factors affecting rates of protein evolution has been identified. However, the relative importance of each factor in determining rates of protein evolution remains unresolved. The prevailing view is that evolutionary rates are dominantly determined by gene expression, and that other factors such as network centrality have only a marginal effect, if any. However, this view is largely based on analyses in yeasts, and accurately measuring the importance of the determinants of rates of protein evolution is complicated by the fact that the different factors are often correlated with each other, and by the relatively poor quality of available functional genomics data sets. Here, we use correlation, partial correlation and principal component regression analyses to measure the contributions of several factors to the variability of the rates of evolution of human proteins. For this purpose, we analyzed the entire human protein–protein interaction data set and the human signal transduction network—a network data set of exceptionally high quality, obtained by manual curation, which is expected to be virtually free from false positives. In contrast with the prevailing view, we observe that network centrality (measured as the number of physical and nonphysical interactions, betweenness, and closeness) has a considerable impact on rates of protein evolution. Surprisingly, the impact of centrality on rates of protein evolution seems to be comparable, or even superior according to some analyses, to that of gene expression. Our observations seem to be independent of potentially confounding factors and of the limitations (biases and errors) of interactomic data sets.
25
Schumacher J, Herlyn H. Correlates of evolutionary rates in the murine sperm proteome. BMC Evol Biol 2018; 18:35. [PMID: 29580206 PMCID: PMC5870804 DOI: 10.1186/s12862-018-1157-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/14/2017] [Accepted: 03/19/2018] [Indexed: 01/20/2023]
Abstract
Background: Protein-coding genes expressed in sperm evolve at different rates. To gain deeper insight into the factors underlying this heterogeneity, we examined the relative importance of a diverse set of previously described rate correlates in determining the evolution of murine sperm proteins. Results: Using partial rank correlations, we detected several major rate indicators: phyletic gene age, numbers of protein-protein interactions, and survival essentiality emerged as particularly important rate correlates in murine sperm proteins. Tissue specificity, numbers of paralogs, and untranslated region lengths also correlate significantly with sperm genes’ evolutionary rates, albeit to a lesser extent. Multifunctionality, coding sequence or average intron lengths, and mean expression level have insignificant or virtually no independent effects on evolutionary rates in murine sperm genes. Gene ontology enrichment analyses of three equally sized murine sperm protein groups classified by evolutionary rate indicate the strongest sperm-specific functional specialization in the most quickly evolving gene class. Conclusions: We propose a model according to which slowly evolving murine sperm proteins tend to be constrained by factors such as survival essentiality, network connectivity, and/or broad expression. In contrast, evolutionary change may arise especially in less constrained sperm proteins, which might, moreover, be prone to specialize to reproduction-related functions. Our results should be taken into account in future studies on rate variations of reproductive genes. Electronic supplementary material: The online version of this article (10.1186/s12862-018-1157-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Julia Schumacher
- Institute of Organismic and Molecular Evolution, Anthropology, Johannes Gutenberg University, Mainz, Germany
- Holger Herlyn
- Institute of Organismic and Molecular Evolution, Anthropology, Johannes Gutenberg University, Mainz, Germany
26
Mariño-Ramírez L, Bodenreider O, Kantz N, Jordan IK. Co-Evolutionary Rates of Functionally Related Yeast Genes. Evol Bioinform Online 2017. [DOI: 10.1177/117693430600200017] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/15/2022]
Abstract
Evolutionary knowledge is often used to facilitate computational attempts at gene function prediction. One rich source of evolutionary information is the relative rates of gene sequence divergence, and in this report we explore the connection between gene evolutionary rates and function. We performed a genome-scale evaluation of the relationship between evolutionary rates and functional annotations for the yeast Saccharomyces cerevisiae. Non-synonymous (dN) and synonymous (dS) substitution rates were calculated for 1,095 orthologous gene sets common to S. cerevisiae and six other closely related yeast species. Differences in evolutionary rates between pairs of genes (ΔdN and ΔdS) were then compared to their functional similarities (sGO), which were measured using Gene Ontology (GO) annotations. Substantial and statistically significant correlations were found between ΔdN and sGO, whereas there is no apparent relationship between ΔdS and sGO. These results are consistent with a mode of action for natural selection that is based on similar rates of elimination of deleterious protein coding sequence variants for functionally related genes. The connection between gene evolutionary rates and function was stronger than seen for phylogenetic profiles, which have previously been employed to inform functional inference. The co-evolution of functionally related yeast genes points to the relevance of specific function for the efficacy of natural selection and underscores the utility of gene evolutionary rates for functional predictions.
Affiliation(s)
- Leonardo Mariño-Ramírez
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, U.S.A
- Olivier Bodenreider
- National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, U.S.A
- Natalie Kantz
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, U.S.A
- I. King Jordan
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, U.S.A
27
Effects of different kinds of essentiality on sequence evolution of human testis proteins. Sci Rep 2017; 7:43534. [PMID: 28272493 PMCID: PMC5341092 DOI: 10.1038/srep43534] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 08/25/2016] [Accepted: 01/25/2017] [Indexed: 11/17/2022]
Abstract
We asked if essentiality for either fertility or viability differentially affects sequence evolution of human testis proteins. Based on murine knockout data, we classified a set of 965 proteins expressed in human seminiferous tubules into three categories: proteins essential for prepubertal survival ("lethality proteins"), proteins associated with male sub- or infertility ("male sub-/infertility proteins"), and nonessential proteins. In our testis protein dataset, lethality genes evolved significantly slower than nonessential and male sub-/infertility genes, which is in line with other authors' findings. Using tissue specificity, connectivity in the protein-protein interaction (PPI) network, and multifunctionality as proxies for evolutionary constraints, we found that of the three categories, proteins linked to male sub- or infertility are the least constrained. Lethality proteins, on the other hand, are characterized by broad expression, many PPI partners, and high multifunctionality, all of which point to strong evolutionary constraints. We conclude that compared with lethality proteins, those linked to male sub- or infertility are nonetheless indispensable, but evolve under more relaxed constraints. Finally, adaptive evolution in response to postmating sexual selection could further accelerate evolutionary rates of male sub- or infertility proteins expressed in human testis. These findings may become useful for in silico detection of human sub-/infertility genes.
28
Invergo BM, Montanucci L, Bertranpetit J. Dynamic sensitivity and nonlinear interactions influence the system-level evolutionary patterns of phototransduction proteins. Proc Biol Sci 2017; 282:20152215. [PMID: 26631565 DOI: 10.1098/rspb.2015.2215] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/28/2023]
Abstract
Determining the influence of complex, molecular-system dynamics on the evolution of proteins is hindered by the significant challenge of quantifying the control exerted by the proteins on system output. We have employed a combination of systems biology and molecular evolution analyses in a first attempt to unravel this relationship. We employed a comprehensive mathematical model of mammalian phototransduction to predict the degree of influence that each protein in the system exerts on the high-level dynamic behaviour. We found that the genes encoding the most dynamically sensitive proteins exhibit relatively relaxed evolutionary constraint. We also investigated the evolutionary and epistatic influences of the many nonlinear interactions between proteins in the system and found several pairs to have coevolved, including those whose interactions are purely dynamical with respect to system output. This evidence points to a key role played by nonlinear system dynamics in influencing patterns of molecular evolution.
Affiliation(s)
- Brandon M Invergo
- IBE-Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), CEXS-UPF-PRBB, Barcelona, Catalonia 08003, Spain
- Ludovica Montanucci
- IBE-Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), CEXS-UPF-PRBB, Barcelona, Catalonia 08003, Spain
- Jaume Bertranpetit
- IBE-Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), CEXS-UPF-PRBB, Barcelona, Catalonia 08003, Spain
29
Cohen O, Oberhardt M, Yizhak K, Ruppin E. Essential Genes Embody Increased Mutational Robustness to Compensate for the Lack of Backup Genetic Redundancy. PLoS One 2016; 11:e0168444. [PMID: 27997585 PMCID: PMC5173180 DOI: 10.1371/journal.pone.0168444] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/07/2016] [Accepted: 12/01/2016] [Indexed: 11/23/2022]
Abstract
Genetic robustness is a hallmark of cells, occurring through many mechanisms and at many levels. Essential genes lack the common robustness mechanism of genetic redundancy (i.e., existing alongside other genes with the same function), and thus appear at first glance to leave cells highly vulnerable to genetic or environmental perturbations. Here we explore a hypothesis that cells might protect against essential gene loss through mechanisms that occur at various cellular levels aside from the level of the gene. Using Escherichia coli and Saccharomyces cerevisiae as models, we find that essential genes are enriched over non-essential genes for properties we call "coding efficiency" and "coding robustness", denoting respectively a gene's efficiency of translation and robustness to non-synonymous mutations. The coding efficiency levels of essential genes are highly positively correlated with their evolutionary conservation levels, suggesting that this feature plays a key role in protecting conserved, evolutionarily important genes. We then extend our hypothesis into the realm of metabolic networks, showing that essential metabolic reactions are encoded by more "robust" genes than non-essential reactions, and that essential metabolites are produced by more reactions than non-essential metabolites. Taken together, these results testify that robustness at the gene-loss level and at the mutation level (and more generally, at two cellular levels that are usually treated separately) are not decoupled, but rather, that cellular vulnerability exposed due to complete gene loss is compensated by increased mutational robustness. Why some genes are backed up primarily against loss and others against mutations still remains an open question.
Affiliation(s)
- Osher Cohen
- School of Computer Sciences and Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Matthew Oberhardt
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, United States of America
- Keren Yizhak
- School of Computer Sciences and Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eytan Ruppin
- School of Computer Sciences and Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, United States of America
30
Chesmore KN, Bartlett J, Cheng C, Williams SM. Complex Patterns of Association between Pleiotropy and Transcription Factor Evolution. Genome Biol Evol 2016; 8:3159-3170. [PMID: 27635052 PMCID: PMC5174740 DOI: 10.1093/gbe/evw228] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/14/2022]
Abstract
Pleiotropy has been claimed to constrain gene evolution, but the specific mechanisms and extent of these constraints have been difficult to demonstrate. The expansion of molecular data makes it possible to investigate these pleiotropic effects. Few classes of genes have been characterized as intensely as human transcription factors (TFs). We therefore analyzed the evolutionary rates of full TF proteins, along with their DNA binding domains and protein-protein interacting domains (PIDs), in light of the degree of pleiotropy, measured by the number of TF-TF interactions or the number of DNA-binding targets. Data were extracted from the ENCODE ChIP-seq dataset, the STRING v9.2 database, and the NHGRI GWAS catalog. Evolutionary rates of proteins and domains were calculated using the PAML CodeML package. Our analysis shows that the numbers of TF-TF interactions and DNA binding targets are associated with constrained gene evolution; however, the constraint caused by the number of DNA binding targets was restricted to the DNA binding domains, whereas the number of TF-TF interactions constrained the full protein and did so more strongly. Additionally, we found a positive correlation between the number of protein-PIDs and the evolutionary rates of the protein-PIDs. These findings show not only that pleiotropy is associated with constrained protein evolution but also that the constraint differs by domain function. Finally, we show that GWAS-associated TF genes are more highly pleiotropic: the GWAS data illustrate that mutations in highly pleiotropic genes are more likely to be associated with disease phenotypes.
Affiliation(s)
- Kevin N Chesmore
- Department of Genetics, Geisel School of Medicine, Dartmouth College, Hanover, NH
- Jacquelaine Bartlett
- Department of Genetics, Geisel School of Medicine, Dartmouth College, Hanover, NH
- Chao Cheng
- Department of Genetics, Geisel School of Medicine, Dartmouth College, Hanover, NH
- Scott M Williams
- Department of Genetics, Geisel School of Medicine, Dartmouth College, Hanover, NH
31
Alvarez-Ponce D, Sabater-Muñoz B, Toft C, Ruiz-González MX, Fares MA. Essentiality Is a Strong Determinant of Protein Rates of Evolution during Mutation Accumulation Experiments in Escherichia coli. Genome Biol Evol 2016; 8:2914-2927. [PMID: 27566759 PMCID: PMC5630975 DOI: 10.1093/gbe/evw205] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 12/16/2022]
Abstract
The Neutral Theory of Molecular Evolution is considered the most powerful theory to understand the evolutionary behavior of proteins. One of the main predictions of this theory is that essential proteins should evolve slower than dispensable ones owing to increased selective constraints. Comparison of genomes of different species, however, has revealed only small differences between the rates of evolution of essential and nonessential proteins. In some analyses, these differences vanish once confounding factors are controlled for, whereas in other cases essentiality seems to have an independent, albeit small, effect. It has been argued that comparing relatively distant genomes may entail a number of limitations. For instance, many of the genes that are dispensable in controlled lab conditions may be essential in some of the conditions faced in nature. Moreover, essentiality can change during evolution, and rates of protein evolution are simultaneously shaped by a variety of factors, whose individual effects are difficult to isolate. Here, we conducted two parallel mutation accumulation experiments in Escherichia coli over 5,500–5,750 generations, and compared the genomes at different points of the experiments. Our approach (a short-term experiment, under highly controlled conditions) enabled us to overcome many of the limitations of previous studies. We observed that essential proteins evolved substantially slower than nonessential ones during our experiments. Strikingly, rates of protein evolution were only moderately affected by expression level and protein length.
Affiliation(s)
- Beatriz Sabater-Muñoz
- Instituto de Biología Molecular y Celular de Plantas (CSIC-UPV), Valencia, Spain; Department of Genetics, Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Dublin, Ireland
- Christina Toft
- Department of Genetics, University of Valencia, Valencia, Spain; Departamento de Biotecnología, Instituto de Agroquímica y Tecnología de los Alimentos (CSIC), Valencia, Spain
- Mario X Ruiz-González
- Instituto de Biología Molecular y Celular de Plantas (CSIC-UPV), Valencia, Spain; Current Address: Secretaría de Educación Superior, Ciencia, Tecnología e Innovación, Proyecto Prometeo; Departamento de Ciencias Biológicas, Universidad Técnica Particular de Loja, Loja, Ecuador
- Mario A Fares
- Instituto de Biología Molecular y Celular de Plantas (CSIC-UPV), Valencia, Spain; Department of Genetics, Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Dublin, Ireland
32
Shen XX, Salichos L, Rokas A. A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference. Genome Biol Evol 2016; 8:2565-80. [PMID: 27492233 PMCID: PMC5010910 DOI: 10.1093/gbe/evw179] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Accepted: 07/25/2016] [Indexed: 12/13/2022]
Abstract
Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships of 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, guanine-cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal, in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property for phylogenetic signal identified gene alignment length, along with the number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers.
Affiliation(s)
- Xing-Xing Shen
- Department of Biological Sciences, Vanderbilt University
- Leonidas Salichos
- Department of Biological Sciences, Vanderbilt University; Department of Molecular Biophysics and Biochemistry, Yale University
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University
33
Banerjee S, Chakraborty S, De RK. Deciphering the cause of evolutionary variance within intrinsically disordered regions in human proteins. J Biomol Struct Dyn 2016; 35:233-249. [PMID: 26790343 DOI: 10.1080/07391102.2016.1143877] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 02/07/2023]
Abstract
Why intrinsically disordered regions evolve within the human proteome has been an interesting question for a decade. To date, it remains an unsolved yet intriguing issue why some disordered regions evolve rapidly while others are highly conserved across mammalian species. Identifying the key biological factors responsible for the variation in conservation rate among disordered regions within the human proteome may help resolve this issue. We found that, among the biological features considered in our study (multifunctionality, gene essentiality, protein connectivity, number of unique domains, gene expression level and expression breadth), the number of unique protein domains acts as a strong determinant that negatively influences the conservation of disordered regions. In this context, we argue that proteins with fewer types of domains preferentially conserve their disordered regions to enhance their structural flexibility, which in turn facilitates their molecular interactions. In contrast, the selection pressure acting on stretches of disordered regions is weaker in multi-domain proteins. We therefore reason that conserved disordered stretches may compensate for the functions of multiple domains within a single-domain protein. Interestingly, we noticed that unique domain number and expression level influence the evolution of disordered regions differently from that of well-structured ones.
Affiliation(s)
- Sanghita Banerjee
- Machine Intelligence Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata 700108, India
- Rajat K De
- Machine Intelligence Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata 700108, India
34
Selection maintaining protein stability at equilibrium. J Theor Biol 2016; 391:21-34. [DOI: 10.1016/j.jtbi.2015.12.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/31/2015] [Revised: 11/29/2015] [Accepted: 12/01/2015] [Indexed: 11/24/2022]
35
Acharya D, Ghosh TC. Global analysis of human duplicated genes reveals the relative importance of whole-genome duplicates originated in the early vertebrate evolution. BMC Genomics 2016; 17:71. [PMID: 26801093 PMCID: PMC4724117 DOI: 10.1186/s12864-016-2392-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 09/28/2015] [Accepted: 01/13/2016] [Indexed: 12/13/2022]
Abstract
Background: Gene duplication is a genetic mutation that creates functionally redundant gene copies that are initially relieved from selective pressures and may adapt to new functions over time. The levels of gene duplication may vary from small-scale duplication (SSD) to whole-genome duplication (WGD). Studies in yeast revealed ample differences between these duplicates: yeast WGD pairs were functionally more similar, less divergent in subcellular localization, and contained a lower proportion of essential genes. In this study, we explored the differences in evolutionary and genomic properties between human SSD and WGD genes, with the identifiable human WGD genes originating from the two rounds of whole-genome duplication that occurred early in vertebrate evolution. Results: We observed that these two groups of duplicates were also dissimilar in their evolutionary and genomic properties, but, interestingly, not in the same way as in yeast. The human WGDs were functionally less similar, diverged more at the subcellular level, and contained a higher proportion of essential genes than the SSDs, all of which is the opposite of yeast. Additionally, we found that human WGDs were more divergent in their gene expression profiles, had higher multifunctionality, were more often associated with disease, and were evolutionarily more conserved than human SSDs. Conclusions: Our study suggests that human WGD duplicates are more divergent, and that this reflects the adaptation of WGDs to novel and important functions, which consequently led to their conservation over the course of evolution.
Affiliation(s)
- Debarun Acharya
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata, 700054, West Bengal, India
- Tapash C Ghosh
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata, 700054, West Bengal, India
36
Glastad KM, Goodisman MAD, Yi SV, Hunt BG. Effects of DNA Methylation and Chromatin State on Rates of Molecular Evolution in Insects. G3 (Bethesda) 2015; 6:357-63. [PMID: 26637432 PMCID: PMC4751555 DOI: 10.1534/g3.115.023499] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 10/06/2015] [Accepted: 11/30/2015] [Indexed: 01/03/2023]
Abstract
Epigenetic information is widely appreciated for its role in gene regulation in eukaryotic organisms. However, epigenetic information can also influence genome evolution. Here, we investigate the effects of epigenetic information on gene sequence evolution in two disparate insects: the fly Drosophila melanogaster, which lacks substantial DNA methylation, and the ant Camponotus floridanus, which possesses a functional DNA methylation system. We found that DNA methylation was positively correlated with the synonymous substitution rate in C. floridanus, suggesting a key effect of DNA methylation on patterns of gene evolution. However, our data suggest the link between DNA methylation and elevated rates of synonymous substitution was explained, in large part, by the targeting of DNA methylation to genes with signatures of transcriptionally active chromatin, rather than the mutational effect of DNA methylation itself. This phenomenon may be explained by an elevated mutation rate for genes residing in transcriptionally active chromatin, or by increased structural constraints on genes in inactive chromatin. This result highlights the importance of chromatin structure as the primary epigenetic driver of genome evolution in insects. Overall, our study demonstrates how different epigenetic systems contribute to variation in the rates of coding sequence evolution.
Affiliation(s)
- Karl M Glastad
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30332
- Soojin V Yi
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30332
- Brendan G Hunt
- Department of Entomology, University of Georgia, Griffin, Georgia 30223
37
Gu X, Tang W. Model parameters of molecular evolution explain genomic correlations. Brief Bioinform 2015; 18:37-42. [PMID: 26628558 DOI: 10.1093/bib/bbv098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2015] [Revised: 10/01/2015] [Indexed: 11/13/2022]
Abstract
One long-standing research focus in evolutionary genomics is resolving how biological variables (expression, essentiality, protein-protein interaction, structural stability, etc.) determine the rate of protein evolution. While these studies have considerably deepened our understanding of molecular evolution, many issues remain unsolved. In this opinion article, after a brief survey of the literature, we establish relationships between model parameters of molecular evolution and genomic variables. On this basis, most observed genomic correlations and confounds can be explained by combinations of model parameters under different conditions, including the strength of stabilizing selection, mutational variance, expression sufficiency, gene pleiotropy, and the effective population size. We suggest that the problem of discerning which biological variable(s) may determine the rate of protein evolution can be tackled at two levels. The first level, discussed here, is to demonstrate how the model of molecular evolution can predict potential genomic correlations under various conditions. The second level is to estimate genome-wide variation in model parameters (or their combinations) to help identify the canonical biological variables that may underlie the rate variation among genes, which spans at least three orders of magnitude.
38
Slavney A, Arbiza L, Clark AG, Keinan A. Strong Constraint on Human Genes Escaping X-Inactivation Is Modulated by their Expression Level and Breadth in Both Sexes. Mol Biol Evol 2015; 33:384-93. [PMID: 26494842 PMCID: PMC4751236 DOI: 10.1093/molbev/msv225] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 12/11/2022]
Abstract
In eutherian mammals, X-linked gene expression is normalized between XX females and XY males through the process of X chromosome inactivation (XCI). XCI results in silencing of transcription from one ChrX homolog per female cell. However, approximately 25% of human ChrX genes escape XCI to some extent and exhibit biallelic expression in females. The evolutionary basis of this phenomenon is not entirely clear, but high sequence conservation of XCI escapers suggests that purifying selection may directly or indirectly drive XCI escape at these loci. One hypothesis is that this signal results from contributions to developmental and physiological sex differences, but presently there is limited evidence supporting this model in humans. Another potential driver of this signal is selection for high and/or broad gene expression in both sexes, which are strong predictors of reduced nucleotide substitution rates in mammalian genes. Here, we compared purifying selection and gene expression patterns of human XCI escapers with those of X-inactivated genes in both sexes. When we accounted for the functional status of each ChrX gene’s Y-linked homolog (or “gametolog”), we observed that XCI escapers exhibit greater degrees of purifying selection in the human lineage than X-inactivated genes, as well as higher and broader gene expression than X-inactivated genes across tissues in both sexes. These results highlight a significant role for gene expression in both sexes in driving purifying selection on XCI escapers, and emphasize these genes’ potential importance in human disease.
Affiliation(s)
- Andrea Slavney
- Department of Biological Statistics and Computational Biology, Cornell University; Department of Molecular Biology and Genetics, Cornell University
- Leonardo Arbiza
- Department of Biological Statistics and Computational Biology, Cornell University
- Andrew G Clark
- Department of Biological Statistics and Computational Biology, Cornell University; Department of Molecular Biology and Genetics, Cornell University
- Alon Keinan
- Department of Biological Statistics and Computational Biology, Cornell University
39
Whittle CA, Extavour CG. Codon and Amino Acid Usage Are Shaped by Selection Across Divergent Model Organisms of the Pancrustacea. G3 (Bethesda) 2015; 5:2307-21. [PMID: 26384771 PMCID: PMC4632051 DOI: 10.1534/g3.115.021402] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 06/15/2015] [Accepted: 08/28/2015] [Indexed: 01/24/2023]
Abstract
In protein-coding genes, synonymous codon usage and amino acid composition correlate with expression in some eukaryotes, and may result from translational selection. Here, we studied large-scale RNA-seq data from three divergent arthropod models, the cricket (Gryllus bimaculatus), the milkweed bug (Oncopeltus fasciatus), and the amphipod crustacean Parhyale hawaiensis, and tested for optimization of codon and amino acid usage relative to expression level. We report strong signals of AT3 optimal codons (those favored in highly expressed genes) in G. bimaculatus and O. fasciatus, whereas weaker signs of GC3 optimal codons were found in P. hawaiensis, suggesting selection on codon usage in all three organisms. Further, in G. bimaculatus and O. fasciatus, high expression was associated with lowered frequency of amino acids with large size/complexity (S/C) scores in favor of those with intermediate S/C values; thus, selection may favor smaller amino acids while retaining those of moderate size for protein stability or conformation. In P. hawaiensis, highly transcribed genes had elevated frequency of amino acids with large and small S/C scores, suggesting a complex dynamic in this crustacean. In all species, highly transcribed genes appeared to favor short proteins, high optimal codon usage, and specific amino acids, and were preferentially involved in cell-cycling and protein synthesis. Together, based on examination of 1,680,067, 1,667,783, and 1,326,896 codon sites in G. bimaculatus, O. fasciatus, and P. hawaiensis, respectively, we conclude that translational selection shapes codon and amino acid usage in these three Pancrustacean arthropods.
Affiliation(s)
- Carrie A Whittle
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138
- Cassandra G Extavour
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138; Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138
40
Filleton F, Chuffart F, Nagarajan M, Bottin-Duplus H, Yvert G. The complex pattern of epigenomic variation between natural yeast strains at single-nucleosome resolution. Epigenetics Chromatin 2015; 8:26. [PMID: 26229551 PMCID: PMC4520285 DOI: 10.1186/s13072-015-0019-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 03/25/2015] [Accepted: 07/22/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND Epigenomic studies on humans and model species have revealed substantial inter-individual variation in histone modification profiles. However, the pattern of this variation has not been precisely characterized, particularly regarding which genomic features are enriched for variability and whether distinct histone marks co-vary synergistically. Yeast allows us to investigate intra-species variation at high resolution while avoiding other sources of variation, such as cell type or subtype. RESULTS We profiled histone marks H3K4me3, H3K9ac, H3K14ac, H4K12ac and H3K4me1 in three unrelated wild strains of Saccharomyces cerevisiae at single-nucleosome resolution and analyzed inter-strain differences statistically. All five marks varied significantly at specific loci, but to different extents. The number of nucleosomes varying for a given mark between two strains ranged from 20 to several thousand; +1 nucleosomes were significantly less subject to variation. Genes with highly evolvable or responsive expression showed higher variability; however, the variation pattern could not be explained by known transcriptional differences between the strains. Synergistic variation of distinct marks was not systematic, with surprising differences between the functionally related H3K9ac and H3K14ac. Interestingly, H3K14ac differences that persisted through transient hyperacetylation were supported by H3K4me3 differences, suggesting stabilization via crosstalk. CONCLUSIONS Quantitative variation of histone marks among S. cerevisiae strains is abundant and complex. Its relation to functional characteristics is modular and seems modest, with partial association with gene expression divergences, differences between functionally related marks, and partial co-variation between marks that may confer stability. Thus, the specific context of a study, such as which precise marks, individuals and genomic loci are investigated, is of primary importance in population epigenomics. The complexity found in this pilot survey in yeast suggests that high complexity can be anticipated among higher eukaryotes, including humans.
Affiliation(s)
- Fabien Filleton
- Laboratoire de Biologie Moléculaire de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, Université de Lyon, 46 Allée d'Italie, 69007 Lyon, France
- Florent Chuffart
- Laboratoire de Biologie Moléculaire de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, Université de Lyon, 46 Allée d'Italie, 69007 Lyon, France
- Muniyandi Nagarajan
- Laboratoire de Biologie Moléculaire de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, Université de Lyon, 46 Allée d'Italie, 69007 Lyon, France; Department of Genomic Science, School of Biological Sciences, Central University of Kerala, Kerala, India
- Hélène Bottin-Duplus
- Laboratoire de Biologie Moléculaire de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, Université de Lyon, 46 Allée d'Italie, 69007 Lyon, France
- Gaël Yvert
- Laboratoire de Biologie Moléculaire de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, Université de Lyon, 46 Allée d'Italie, 69007 Lyon, France
41
Abstract
The rate and mechanism of protein sequence evolution have been central questions in evolutionary biology since the 1960s. Although the rate of protein sequence evolution depends primarily on the level of functional constraint, exactly what determines functional constraint has remained unclear. The increasing availability of genomic data has enabled much-needed empirical examinations of the nature of functional constraint. These studies found that the evolutionary rate of a protein is predominantly influenced by its expression level rather than its functional importance. A combination of theoretical and empirical analyses has identified multiple mechanisms behind these observations and demonstrated that selection against errors in molecular and cellular processes plays a prominent role in protein evolution.
Affiliation(s)
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, 830 North University Avenue, Ann Arbor, Michigan 48109, USA
- Jian-Rong Yang
- Department of Ecology and Evolutionary Biology, University of Michigan, 830 North University Avenue, Ann Arbor, Michigan 48109, USA
42
Kryuchkova-Mostacci N, Robinson-Rechavi M. Tissue-Specific Evolution of Protein Coding Genes in Human and Mouse. PLoS One 2015; 10:e0131673. [PMID: 26121354 PMCID: PMC4488272 DOI: 10.1371/journal.pone.0131673] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 03/23/2015] [Accepted: 06/04/2015] [Indexed: 12/23/2022]
Abstract
Protein-coding genes evolve at different rates, and the influence of different parameters, from gene size to expression level, has been extensively studied. While in yeast gene expression level is the major causal factor of gene evolutionary rate, the situation is more complex in animals. Here we investigate these relations further, especially taking into account gene expression in different organs as well as indirect correlations between parameters. We used RNA-seq data from two large datasets, covering 22 mouse tissues and 27 human tissues. Over all tissues, evolutionary rate only correlates weakly with levels and breadth of expression. The strongest explanatory factors of purifying selection are GC content, expression in many developmental stages, and expression in brain tissues. While the main component of evolutionary rate is purifying selection, we also find tissue-specific patterns for sites under neutral evolution and for positive selection. We observe fast evolution of genes expressed in testis, but also in other tissues, notably liver, which is explained by weak purifying selection rather than by positive selection.
Affiliation(s)
- Nadezda Kryuchkova-Mostacci
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marc Robinson-Rechavi
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
43
Wojtkowska M, Buczek D, Stobienia O, Karachitos A, Antoniewicz M, Slocinska M, Makałowski W, Kmita H. The TOM Complex of Amoebozoans: the Cases of the Amoeba Acanthamoeba castellanii and the Slime Mold Dictyostelium discoideum. Protist 2015; 166:349-62. [PMID: 26074248 DOI: 10.1016/j.protis.2015.05.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 11/14/2014] [Revised: 05/10/2015] [Accepted: 05/14/2015] [Indexed: 11/29/2022]
Abstract
Protein import into mitochondria requires a wide variety of proteins, forming complexes in both mitochondrial membranes. The TOM complex (translocase of the outer membrane) is responsible for decoding of targeting signals, translocation of imported proteins across or into the outer membrane, and their subsequent sorting. Thus the TOM complex is regarded as the main gate into mitochondria for imported proteins. Available data indicate that mitochondria of representative organisms from across the major phylogenetic lineages of eukaryotes differ in subunit organization of the TOM complex. The subunit organization of the TOM complex in the Amoebozoa is still elusive, so we decided to investigate its organization in the soil amoeba Acanthamoeba castellanii and the slime mold Dictyostelium discoideum. They represent two major subclades of the Amoebozoa: the Lobosa and Conosa, respectively. Our results confirm the presence of Tom70, Tom40 and Tom7 in the A. castellanii and D. discoideum TOM complex, while the presence of Tom22 and Tom20 is less supported. Interestingly, the Tom proteins display the highest similarity to Opisthokonta cognate proteins, with the exception of Tom40. Thus representatives of two major subclades of the Amoebozoa appear to be similar in organization of the TOM complex, despite differences in their lifestyle.
Affiliation(s)
- Małgorzata Wojtkowska
- Adam Mickiewicz University, Faculty of Biology, Institute of Molecular Biology and Biotechnology, Department of Bioenergetics, Poznań, Poland.
- Dorota Buczek
- Adam Mickiewicz University, Faculty of Biology, Institute of Molecular Biology and Biotechnology, Department of Bioenergetics, Poznań, Poland; University of Muenster, Faculty of Medicine Institute of Bioinformatics, Muenster, Germany
- Olgierd Stobienia
- Adam Mickiewicz University, Faculty of Biology, Institute of Molecular Biology and Biotechnology, Department of Bioenergetics, Poznań, Poland
- Andonis Karachitos
- Adam Mickiewicz University, Faculty of Biology, Institute of Molecular Biology and Biotechnology, Department of Bioenergetics, Poznań, Poland
- Monika Antoniewicz
- Adam Mickiewicz University, Faculty of Biology, Institute of Molecular Biology and Biotechnology, Department of Bioenergetics, Poznań, Poland
- Małgorzata Slocinska
- Adam Mickiewicz University, Faculty of Biology, Institute of Experimental Biology, Department of Animal Physiology and Development, Poznań, Poland
- Wojciech Makałowski
- University of Muenster, Faculty of Medicine Institute of Bioinformatics, Muenster, Germany
- Hanna Kmita
- Adam Mickiewicz University, Faculty of Biology, Institute of Molecular Biology and Biotechnology, Department of Bioenergetics, Poznań, Poland
44
Faure G, Koonin EV. Universal distribution of mutational effects on protein stability, uncoupling of protein robustness from sequence evolution and distinct evolutionary modes of prokaryotic and eukaryotic proteins. Phys Biol 2015; 12:035001. [PMID: 25927823 DOI: 10.1088/1478-3975/12/3/035001] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 01/04/2023]
Abstract
Robustness to destabilizing effects of mutations is thought to be a key factor in protein evolution. The connections between two measures of robustness (the relative core size and the computationally estimated effect of mutations on protein stability, ΔΔG), protein abundance and the selection pressure on protein-coding genes (dN/dS) were analyzed for organisms with a large number of available protein structures, including four eukaryotes, two bacteria and one archaeon. The distribution of the effects of mutations in the core on protein stability is universal and indistinguishable in eukaryotes and bacteria, centered at slightly destabilizing amino acid replacements, and with a heavy tail of more strongly destabilizing replacements. The distribution of mutational effects in the hyperthermophilic archaeon Thermococcus gammatolerans is significantly shifted toward strongly destabilizing replacements, which is indicative of stronger constraints imposed on proteins in hyperthermophiles. The median effect of mutations is strongly and positively correlated with the relative core size, evidence of the congruence between the two measures of protein robustness. However, both measures show only limited correlations with the expression level and selection pressure on protein-coding genes. Thus, the degree of robustness reflected in the universal distribution of mutational effects appears to be a fundamental, ancient feature of globular protein folds, whereas the observed variations are largely neutral and uncoupled from short-term protein evolution. A weak anticorrelation between protein core size and selection pressure is observed only for surface residues in prokaryotes, but a stronger anticorrelation is observed for all residues in eukaryotic proteins. This substantial difference between proteins of prokaryotes and eukaryotes is likely to stem from the demonstrably higher compactness of prokaryotic proteins.
Affiliation(s)
- Guilhem Faure
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
45
Bloch NI, Price TD, Chang BSW. Evolutionary dynamics of Rh2 opsins in birds demonstrate an episode of accelerated evolution in the New World warblers (Setophaga). Mol Ecol 2015; 24:2449-62. [PMID: 25827331 DOI: 10.1111/mec.13180] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 01/12/2015] [Revised: 03/14/2015] [Accepted: 03/23/2015] [Indexed: 12/23/2022]
Abstract
Low rates of sequence evolution associated with purifying selection can be interrupted by episodic changes in selective regimes. Visual pigments are a unique system in which we can investigate the functional consequences of genetic changes, therefore connecting genotype to phenotype in the context of natural and sexual selection pressures. We study the RH2 and RH1 visual pigments (opsins) across 22 bird species belonging to two ecologically convergent clades, the New World warblers (Parulidae) and Old World warblers (Phylloscopidae), and evaluate rates of evolution in these clades along with data from 21 additional species. We demonstrate generally slow evolution of these opsins: both Rh1 and Rh2 are highly conserved across Old World and New World warblers. However, Rh2 underwent a burst of evolution within the New World genus Setophaga, where it accumulated substitutions at 6 amino acid sites across the species we studied. Evolutionary analyses revealed a significant increase in dN/dS in Setophaga, implying relatively strong selective pressures to overcome long-standing purifying selection. We studied the effects of each substitution on spectral tuning and found they do not cause large spectral shifts. Thus, substitutions may reflect other aspects of opsin function, such as those affecting photosensitivity and/or dark-light adaptation. Although it is unclear what these alterations mean for colour perception, we suggest that rapid evolution is linked to sexual selection, given the exceptional plumage colour diversification in Setophaga.
Affiliation(s)
- Natasha I Bloch
- Department of Ecology & Evolution, University of Chicago, 1101 E 57th Street, Chicago, IL, 60637, USA
46
Mukherjee S, Panda A, Ghosh TC. Elucidating evolutionary features and functional implications of orphan genes in Leishmania major. Infect Genet Evol 2015; 32:330-7. [PMID: 25843649 DOI: 10.1016/j.meegid.2015.03.031] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 01/13/2015] [Revised: 03/25/2015] [Accepted: 03/26/2015] [Indexed: 11/28/2022]
Abstract
Orphan genes are protein coding genes that lack recognizable homologs in other organisms. These genes were reported to comprise a considerable fraction of coding regions in all sequenced genomes and are thought to be associated with an organism's lineage-specific traits. However, their evolutionary persistence and functional significance still remain elusive. Because they lack homologs in the host genome and probably have lineage-specific functional roles, the orphan gene products of pathogenic protozoa might be considered possible therapeutic targets. Leishmania major is an important parasitic protozoan of the genus Leishmania that is associated with the disease cutaneous leishmaniasis. Therefore, evolutionary and functional characterization of orphan genes in this organism may help in understanding the factors governing pathogen evolution and parasitic adaptation. In this study, we systematically identified orphan genes of L. major and employed several in silico analyses to understand their evolutionary and functional attributes. To trace the signatures of molecular evolution, we compared their evolutionary rate with that of non-orphan genes. In agreement with prior observations, we found that orphan genes evolve at a higher rate than non-orphan genes. The lower sequence conservation of orphan genes was previously attributed solely to their younger gene age. However, here we observed that, together with gene age, a number of genomic factors (such as expression level, GC content and variation in codon usage) and proteomic factors (such as protein length, intrinsic disorder content and hydropathicity) could independently modulate their evolutionary rate. We considered the interplay of all these factors and analyzed their relative contributions to protein evolutionary rate by regression analysis. At the functional level, we observed that orphan genes are associated with regulatory, growth factor and transport related processes. Moreover, these genes were found to be enriched with various types of interaction and trafficking motifs, implying their possible involvement in host-parasite interactions. Thus, our comprehensive analysis of L. major orphan genes provided evidence for their extensive roles in host-pathogen interactions and virulence.
Affiliation(s)
- Sumit Mukherjee
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, West Bengal, India; Department of Physical Sciences, Indian Institute of Science Education and Research-Kolkata, Mohanpur 741246, Nadia, West Bengal, India
- Arup Panda
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, West Bengal, India
- Tapash Chandra Ghosh
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, West Bengal, India.
47
Cheng C, Andrews E, Yan KK, Ung M, Wang D, Gerstein M. An approach for determining and measuring network hierarchy applied to comparing the phosphorylome and the regulome. Genome Biol 2015; 16:63. [PMID: 25880651 PMCID: PMC4404648 DOI: 10.1186/s13059-015-0624-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 11/25/2014] [Accepted: 03/10/2015] [Indexed: 12/16/2022]
Abstract
Many biological networks naturally form a hierarchy with a preponderance of downward information flow. In this study, we define a score to quantify the degree of hierarchy in a network and develop a simulated-annealing algorithm to maximize the hierarchical score globally over a network. We apply our algorithm to determine the hierarchical structure of the phosphorylome in detail and investigate the correlation between its hierarchy and kinase properties. We also compare it to the regulatory network, finding that the phosphorylome is more hierarchical than the regulome.
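The abstract names a hierarchy score and a simulated-annealing search but does not give the score's definition. As a rough, hypothetical illustration only (not the authors' algorithm), the Python sketch below assumes that every node is assigned one of a fixed number of integer levels and that the score is the fraction of directed edges pointing from a higher level to a lower one. Simulated annealing then reassigns one node at a time and occasionally accepts a worse assignment so that the search can escape local optima. The function names `hierarchy_score` and `anneal_levels`, the number of levels and the cooling schedule are all assumptions.

```python
import math
import random


def hierarchy_score(edges, levels):
    """Hypothetical score: fraction of directed edges (u -> v) pointing strictly
    downward, i.e. from a higher level to a lower one. Ties count as not downward."""
    if not edges:
        return 0.0
    down = sum(1 for u, v in edges if levels[u] > levels[v])
    return down / len(edges)


def anneal_levels(nodes, edges, n_levels=3, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Maximise hierarchy_score over level assignments by simulated annealing.

    All parameters are illustrative assumptions, not values from the paper.
    Returns the best assignment seen and its score.
    """
    rng = random.Random(seed)
    levels = {n: rng.randrange(n_levels) for n in nodes}
    score = hierarchy_score(edges, levels)
    best, best_score = dict(levels), score
    t = t0
    for _ in range(steps):
        node = rng.choice(nodes)
        old = levels[node]
        levels[node] = rng.randrange(n_levels)  # propose moving one node to a random level
        new_score = hierarchy_score(edges, levels)
        delta = new_score - score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            score = new_score
            if score > best_score:
                best, best_score = dict(levels), score
        else:
            levels[node] = old  # reject: undo the move
        t *= cooling  # geometric cooling schedule
    return best, best_score
```

For a simple chain A → B → C the search should reach a perfect score of 1.0, whereas a three-node cycle A → B → C → A can reach at most 2/3, because under any level assignment at least one edge of the cycle must point upward or sideways.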
Affiliation(s)
- Chao Cheng
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA; Institute for Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, USA; Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, USA.
- Erik Andrews
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA.
- Koon-Kiu Yan
- Program in Computational Biology and Bioinformatics, Yale University, 260 Whitney Avenue, New Haven, CT, 06520, USA.
- Matthew Ung
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA.
- Daifeng Wang
- Program in Computational Biology and Bioinformatics, Yale University, 260 Whitney Avenue, New Haven, CT, 06520, USA.
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, 260 Whitney Avenue, New Haven, CT, 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, 260 Whitney Avenue, New Haven, CT, 06520, USA; Department of Computer Science, Yale University, 260 Whitney Avenue, New Haven, CT, 06520, USA.
48
Gunawardana Y, Fujiwara S, Takeda A, Woo J, Woelk C, Niranjan M. Outlier detection at the transcriptome-proteome interface. Bioinformatics 2015; 31:2530-6. [PMID: 25819671 DOI: 10.1093/bioinformatics/btv182] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 11/30/2014] [Accepted: 03/24/2015] [Indexed: 11/14/2022]
Abstract
BACKGROUND In high-throughput experimental biology, it is widely acknowledged that expression levels measured at the transcriptome and proteome levels do not, in general, correlate well; nevertheless, messenger RNA levels are used as convenient proxies for protein levels. Our interest is in developing data-driven computational models that can bridge the gap between these two levels of measurement, at which different mechanisms of regulation may act on different molecular species and cause any observed lack of correlation. To this end, we build data-driven predictors of protein levels using mRNA levels and known proxies of translation efficiency as covariates. Previous work showed that in such a setting, outliers with respect to the model are reliable candidates for post-translational regulation. RESULTS Here, we present and compare two novel formulations for deriving a protein concentration predictor from which outliers may be extracted in a systematic manner. The first approach, outlier rejecting regression, allows explicit specification of a certain fraction of the data as outliers. In a regression setting, this is a non-convex optimization problem, which we solve by deriving a difference of convex functions algorithm (DCA). For post-translationally regulated proteins, one expects concentrations to be affected primarily by disruption of protein stability. Our second algorithm exploits this observation by minimizing an asymmetric loss using quantile regression and extracts outlier proteins whose measured concentrations are lower than what a genome-wide regression would predict. We validate the two approaches on a dataset of yeast transcriptome and proteome measurements. Functional annotation checks on the detected outliers demonstrate that the methods are able to identify post-translationally regulated genes with high statistical confidence.
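The second approach in this abstract rests on the asymmetric "pinball" loss used in quantile regression. The Python sketch below is a hypothetical, single-covariate illustration, not the authors' implementation: their predictors also use translation-efficiency proxies, and the quantile `tau`, the `margin` threshold and all function names here are assumptions. For brevity the fit searches every line through two observations. This is exact for linear quantile regression on small inputs, because some optimal solution interpolates at least as many observations as there are parameters; real data sets would call for a linear-programming solver instead.

```python
from itertools import combinations


def pinball_loss(residual, tau):
    """Asymmetric (quantile) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def fit_quantile_line(xs, ys, tau):
    """Exact linear quantile regression y ~ a + b*x for small data sets.

    Some optimal solution passes through at least two observations, so it is
    enough to score every line through a pair of points with distinct x values.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line: not of the form a + b*x
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        loss = sum(pinball_loss(y - (a + b * x), tau) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def low_outliers(xs, ys, tau=0.5, margin=1.0):
    """Hypothetical helper: indices whose observed y lies more than `margin`
    below the fitted quantile line (e.g. proteins less abundant than predicted)."""
    a, b = fit_quantile_line(xs, ys, tau)
    return [k for k, (x, y) in enumerate(zip(xs, ys)) if y - (a + b * x) < -margin]
```

With made-up data in which six points lie on y = 1 + 2x and two points sit 5 units below that line (`xs = [0, 1, 2, 3, 4, 5, 6, 7]`, `ys = [1, 3, 5, 2, 9, 11, 8, 15]`), the median fit (`tau = 0.5`) recovers the line and `low_outliers(xs, ys)` flags indices 3 and 6.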
Affiliation(s)
- Yawwani Gunawardana
- School of Electronics and Computer Science, University of Southampton, Southampton, UK
- Shuhei Fujiwara
- Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
- Akiko Takeda
- Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
- Jeongmin Woo
- Faculty of Medicine, Southampton General Hospital, University of Southampton, Southampton, UK
- Christopher Woelk
- Faculty of Medicine, Southampton General Hospital, University of Southampton, Southampton, UK
- Mahesan Niranjan
- School of Electronics and Computer Science, University of Southampton, Southampton, UK
49
Yang L, Hao D, Lv Y, Zuo Y, Jiang W. Genome-wide characterization of essential, toxicity-modulating and no-phenotype genes in S. cerevisiae. Gene 2015; 559:1-8. [PMID: 25576218 DOI: 10.1016/j.gene.2015.01.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2014] [Revised: 12/25/2014] [Accepted: 01/04/2015] [Indexed: 11/30/2022]
Abstract
Based on the requirements for an organism's viability, genes can be classified into essential genes and non-essential genes. Non-essential genes can be further classified into toxicity-modulating genes and no-phenotype genes based on the fitness phenotype of yeast cells when the gene is deleted under DNA-damaging conditions. In this study, graph theoretical approaches were used to characterize essential, toxicity-modulating and no-phenotype genes for S. cerevisiae in the physical interaction (PI) network and the perturbation sensitivity (PS) network. We also used previously published biological datasets to gain a more complete understanding of the differences and relationships between essential, toxicity-modulating and no-phenotype genes. The analysis results indicate that toxicity-modulating genes have properties similar to those of essential genes and might represent a middle ground between essential genes and no-phenotype genes, suggesting that cells initiate highly coordinated responses to damage that are similar to those needed for vital cellular functions. These findings may help elucidate the mechanisms underlying toxicity-modulating processes relevant to certain diseases.
Affiliation(s)
- Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, PR China
- Dapeng Hao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, PR China
- Yingli Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, PR China
- Yongchun Zuo
- The Key Laboratory of Mammalian Reproductive Biology and Biotechnology of the Ministry of Education, College of Life Sciences, Inner Mongolia University, Hohhot 010021, PR China.
- Wei Jiang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, PR China.
50
Barker B, Xu L, Gu Z. Dynamic epistasis under varying environmental perturbations. PLoS One 2015; 10:e0114911. [PMID: 25625594 PMCID: PMC4308068 DOI: 10.1371/journal.pone.0114911] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/16/2013] [Accepted: 11/15/2014] [Indexed: 01/17/2023]
Abstract
Epistasis describes the phenomenon that mutations at different loci do not have independent effects with regard to certain phenotypes. Understanding the global epistatic landscape is vital for many genetic and evolutionary theories. Current knowledge of epistatic dynamics under multiple conditions is limited by the technological difficulties of experimentally screening epistatic relations among genes. We explored this issue by applying flux balance analysis to simulate epistatic landscapes under various environmental perturbations. Specifically, we looked at gene-gene epistatic interactions, where the mutations were assumed to occur in different genes. We predicted that epistasis tends to become more positive from glucose-abundant to nutrient-limiting conditions, indicating that selection might be less effective in removing deleterious mutations in the latter. We also observed a stable core of epistatic interactions in all tested conditions, as well as many epistatic interactions unique to each condition. Interestingly, genes in the stable epistatic interaction network are directly linked to most other genes, whereas genes with condition-specific epistasis form a scale-free network. Furthermore, genes with stable epistasis tend to have similar evolutionary rates, whereas this co-evolving relationship does not hold for genes with condition-specific epistasis. Our findings provide a novel genome-wide picture of epistatic dynamics under environmental perturbations.
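The abstract does not state which epistasis measure or normalisation Barker et al. apply to their flux balance analysis (FBA) results. The Python sketch below assumes the widely used multiplicative definition, ε = W_ab − W_a·W_b, where each W is a mutant's growth rate divided by the wild-type growth rate; under this convention "more positive" epistasis means double mutants grow better than the product of the single-mutant fitnesses predicts. The growth rates are taken as plain numbers (an FBA solver that would produce them is not shown), and the function names are hypothetical.

```python
def relative_fitness(mutant_growth, wild_type_growth):
    """Fitness of a mutant relative to wild type, e.g. a ratio of
    FBA-predicted growth rates (assumed input, not computed here)."""
    if wild_type_growth <= 0:
        raise ValueError("wild-type growth must be positive")
    return mutant_growth / wild_type_growth


def epistasis(growth_wt, growth_a, growth_b, growth_ab):
    """Multiplicative epistasis (assumed definition): eps = W_ab - W_a * W_b.

    eps > 0: double mutant fitter than expected (positive epistasis);
    eps < 0: double mutant less fit than expected (negative epistasis);
    eps == 0: the two mutations combine independently.
    """
    w_a = relative_fitness(growth_a, growth_wt)
    w_b = relative_fitness(growth_b, growth_wt)
    w_ab = relative_fitness(growth_ab, growth_wt)
    return w_ab - w_a * w_b
```

With made-up growth rates of 1.0 (wild type), 0.8 and 0.5 (single mutants), a double mutant growing at 0.5 gives ε ≈ 0.1 (positive), whereas one growing at 0.2 gives ε ≈ −0.2 (negative). Comparing ε for the same gene pair across simulated media is one way to track the condition-dependent shifts the abstract describes.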
Affiliation(s)
- Brandon Barker
- Center for Advanced Computing, Cornell University, Ithaca, New York, United States of America
- Lin Xu
- Division of Hematology/Oncology, Department of Pediatrics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Zhenglong Gu
- Division of Nutritional Sciences, Cornell University, Ithaca, New York, United States of America
- Tri-Institutional Training Program in Computational Biology and Medicine, New York, New York, United States of America