1
Basu Choudhury G, Datta S. Implication of Molecular Constraints Facilitating the Functional Evolution of Pseudomonas aeruginosa KPR2 into a Versatile α-Keto-Acid Reductase. Biochemistry 2024; 63:1808-1823. [PMID: 38962820 DOI: 10.1021/acs.biochem.4c00087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/05/2024]
Abstract
Theoretical concepts linking the structure, function, and evolution of a protein, while often intuitive, require validation in real-world systems. Our study empirically explores the evolutionary implications of multiple gene copies in an organism by examining the structure-function modulations observed in the second copy of ketopantoate reductase in Pseudomonas aeruginosa (PaKPR2). Using two apo structures, we showed that the typical active-site cleft of the protein is transformed into a two-sided pocket in which a molecular gate formed by two residues controls the substrate entry site, rendering the protein inactive toward the natural substrate ketopantoate. Strikingly, this structural modification made the protein active, with varying efficiency, against several important α-keto-acid substrates. Structural constraints at the binding site underlying this altered functional trait were analyzed with two binary complexes, which show that the conserved residue microenvironment undergoes restricted movements due to domain closure. Finally, mechanistic insights gathered from a ternary complex structure help delineate the molecular basis of its kinetic cooperativity toward this broad range of substrates. The detailed structural characterization presented here also identified four key amino acid residues responsible for its versatile α-keto-acid reductase activity, which could be further modified through protein engineering to improve its functional properties.
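The abstract attributes kinetic cooperativity to PaKPR2. Cooperativity of this kind is commonly summarized with the Hill equation, v = Vmax·[S]^n / (K^n + [S]^n). The Python sketch below is a generic illustration under that assumption, not the authors' kinetic model; the function name and all parameter values are hypothetical.

```python
def hill_rate(s: float, vmax: float, k_half: float, n_h: float) -> float:
    """Initial velocity from the Hill equation: v = Vmax * S^n / (K^n + S^n).

    s      -- substrate concentration (same units as k_half)
    vmax   -- maximal velocity
    k_half -- substrate concentration giving half-maximal velocity
    n_h    -- Hill coefficient (>1 positive, <1 negative cooperativity,
              1 reduces to Michaelis-Menten kinetics)
    """
    if s < 0 or k_half <= 0:
        raise ValueError("s must be >= 0 and k_half > 0")
    s_n = s ** n_h
    return vmax * s_n / (k_half ** n_h + s_n)


if __name__ == "__main__":
    # Hypothetical parameters: compare a non-cooperative and a cooperative enzyme.
    for n_h in (1.0, 2.0):
        rates = [round(hill_rate(s, vmax=10.0, k_half=1.0, n_h=n_h), 2) for s in (0.5, 1.0, 2.0)]
        print(f"n_h={n_h}: {rates}")
```

With n_h > 1 the curve is sigmoidal: velocity is lower than the Michaelis-Menten case below K and higher above it, which is the usual kinetic signature of positive cooperativity.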
Affiliation(s)
- Gourab Basu Choudhury
- CSIR-Indian Institute of Chemical Biology, Raja S C Mullick Road, Jadavpur, Kolkata 700032, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Saumen Datta
- CSIR-Indian Institute of Chemical Biology, Raja S C Mullick Road, Jadavpur, Kolkata 700032, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
2
Ding J, Liu F, Zeng J, Gu H, Zhang D, Yang X, Wu B, Shu L, He Z, Wang C. Homologous recombination and gene-specific selection co-shape the vertical nucleotide diversity of mangrove sediment microbial populations. Ecol Evol 2024; 14:e70040. [PMID: 39021733 PMCID: PMC11254452 DOI: 10.1002/ece3.70040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Revised: 07/01/2024] [Accepted: 07/08/2024] [Indexed: 07/20/2024]
Abstract
Mangrove sediments host a diverse array of microbial populations and are highly heterogeneous along their vertical depth profile. However, the genetic diversity within these populations is largely unknown, hindering our understanding of their adaptive evolution across sediment depths. To elucidate this genetic diversity, we used metagenome sequencing to identify 16 high-frequency microbial populations, comprising two archaeal and 14 bacterial populations, from mangrove sediment cores (0-100 cm, 10 depths) on Qi'ao Island, China. Our analysis of genome-wide genetic variation revealed extensive nucleotide diversity in the microbial populations. Genes involved in transport and energy metabolism displayed high nucleotide diversity (HND; 0.0045-0.0195; an indicator of minor alleles shared within the microbial populations). By tracking homologous recombination, we found that each microbial population was subject to different levels of purifying selection at different depths (44.12% of genes). This selection resulted in significant differences in the synonymous/non-synonymous mutation ratio between the 0-20 cm and 20-100 cm layers, indicating an adaptive evolutionary process in the microbial populations. Furthermore, our assessment of allele-frequency differentiation between these two layers showed that functional genes involved in the metabolism of amino acids or cofactors were highly differentiated in more than half of the populations. Together, these results show that the nucleotide diversity of microbial populations was shaped by homologous recombination and gene-specific selection, ultimately resulting in stratified differentiation between 0-20 cm and 20-100 cm. These findings improve our understanding of how microbes adapt to vertical environmental changes during sedimentation in coastal blue carbon ecosystems.
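Nucleotide diversity (π) in metagenomic population studies is typically computed per site from allele (read) counts and then averaged over a gene. The abstract does not say which tool or estimator the authors used, so the Python sketch below assumes the common unbiased per-site form π = n/(n−1)·(1 − Σp²); the function names and the read counts in the example are hypothetical.

```python
from typing import Dict, List


def site_diversity(allele_counts: Dict[str, int]) -> float:
    """Unbiased per-site nucleotide diversity from allele (read) counts.

    pi_site = n / (n - 1) * (1 - sum_i p_i**2), where n is the total count
    at the site and p_i is the frequency of allele i.
    """
    n = sum(allele_counts.values())
    if n < 2:
        raise ValueError("need at least two reads/sequences at a site")
    homozygosity = sum((c / n) ** 2 for c in allele_counts.values())
    return n / (n - 1) * (1.0 - homozygosity)


def gene_diversity(sites: List[Dict[str, int]]) -> float:
    """Mean per-site diversity over all sites of a gene (monomorphic sites contribute 0)."""
    if not sites:
        raise ValueError("gene has no sites")
    return sum(site_diversity(s) for s in sites) / len(sites)


if __name__ == "__main__":
    # Hypothetical read counts at four positions of one gene.
    gene = [
        {"A": 20},            # monomorphic site
        {"C": 18, "T": 2},    # minor allele at 10%
        {"G": 20},            # monomorphic site
        {"A": 10, "G": 10},   # balanced polymorphism
    ]
    print(f"gene-level pi = {gene_diversity(gene):.4f}")
```

Real metagenomic pipelines also apply coverage and base-quality filters before this step; those are omitted here for brevity.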
Affiliation(s)
- Jijuan Ding
- School of Environmental Science and Engineering, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), State Key Laboratory for Biocontrol, Sun Yat-Sen University, Guangzhou, China
- Fei Liu
- School of Environmental Science and Engineering, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), State Key Laboratory for Biocontrol, Sun Yat-Sen University, Guangzhou, China
- Jiaxiong Zeng
- School of Environmental Science and Engineering, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), State Key Laboratory for Biocontrol, Sun Yat-Sen University, Guangzhou, China
- Hang Gu
- School of Environmental Science and Engineering, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), State Key Laboratory for Biocontrol, Sun Yat-Sen University, Guangzhou, China
- Dandan Zhang
- School of Environmental Science and Engineering, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), State Key Laboratory for Biocontrol, Sun Yat-Sen University, Guangzhou, China
- Xueqin Yang
- School of Environmental Science and Engineering, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), State Key Laboratory for Biocontrol, Sun Yat-Sen University, Guangzhou, China
- Bo Wu
- School of Environmental Science and Engineering, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), State Key Laboratory for Biocontrol, Sun Yat-Sen University, Guangzhou, China
- Longfei Shu
- School of Environmental Science and Engineering, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), State Key Laboratory for Biocontrol, Sun Yat-Sen University, Guangzhou, China
- Zhili He
- School of Environmental Science and Engineering, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), State Key Laboratory for Biocontrol, Sun Yat-Sen University, Guangzhou, China
- Cheng Wang
- School of Environmental Science and Engineering, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), State Key Laboratory for Biocontrol, Sun Yat-Sen University, Guangzhou, China
- Key Laboratory of Watershed Earth Surface Processes and Ecological Security, Zhejiang Normal University, Jinhua, China
3
Balogun EJ, Ness RW. The Effects of De Novo Mutation on Gene Expression and the Consequences for Fitness in Chlamydomonas reinhardtii. Mol Biol Evol 2024; 41:msae035. [PMID: 38366781 PMCID: PMC10910851 DOI: 10.1093/molbev/msae035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 02/01/2024] [Accepted: 02/13/2024] [Indexed: 02/18/2024]
Abstract
Mutation is the ultimate source of genetic variation, the bedrock of evolution. Yet predicting the consequences of new mutations remains a challenge in biology. Gene expression provides a potential link between genotype and phenotype, but the variation in gene expression created by de novo mutation, and the fitness consequences of mutational changes in expression, remain relatively unexplored. Here, we investigate the effects of >2,600 de novo mutations on gene expression across the transcriptomes of 28 mutation accumulation (MA) lines derived from 2 independent wild-type genotypes of the green alga Chlamydomonas reinhardtii. We observed that the genetic variance in gene expression created by mutation (Vm) was similar to the variance that mutation generates in typical polygenic phenotypic traits and approximately 15-fold greater than the variance seen in the few species in which Vm in gene expression has been estimated. Despite the clear effect of mutation on expression, we did not observe a simple additive effect of mutation on expression change: there was no linear correlation between the total expression change and the mutation count of individual MA lines. We therefore inferred the distribution of expression effects (DEE) of new mutations to connect the number of mutations to the number of differentially expressed genes (DEGs). Our inferred DEE is highly L-shaped, with 95% of mutations causing 0-1 DEG, while the remaining 5% are spread over a long tail of large-effect mutations that cause multiple genes to change expression. This distribution is consistent with many cis-acting mutation targets that affect the expression of only 1 gene and a large set of trans-acting targets that have the potential to affect tens or hundreds of genes. Further evidence for cis-acting mutations can be seen in the overabundance of mutations in or near differentially expressed genes. Supporting evidence for trans-acting mutations comes from the 15:1 ratio of DEGs to mutations and from clusters of DEGs in the co-expression network, indicative of shared regulatory architecture. Lastly, we show that there is a negative correlation between the extent of expression divergence from the ancestor and fitness, providing direct evidence of the deleterious effects of perturbing gene expression.
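Mutational variance (Vm) is usually estimated from the variance that builds up among MA lines. As a minimal sketch, assuming a balanced design (equal replicates per line) and haploid lines descended from one ancestor, the among-line variance component Vb from a one-way ANOVA grows by roughly Vm per generation, so Vm ≈ Vb / t. This is a textbook simplification, not the authors' RNA-seq pipeline (which would need normalization and per-gene models); the function names and data are hypothetical.

```python
from statistics import mean
from typing import List


def among_line_variance(lines: List[List[float]]) -> float:
    """Among-line variance component from a balanced one-way ANOVA.

    lines -- one list of replicate measurements per MA line (equal lengths)
    Returns (MS_between - MS_within) / r, where r is replicates per line.
    The estimate can be negative by chance; callers may truncate at zero.
    """
    k = len(lines)
    r = len(lines[0]) if lines else 0
    if k < 2 or r < 2 or any(len(x) != r for x in lines):
        raise ValueError("need >= 2 lines with an equal number (>= 2) of replicates")
    grand = mean([v for x in lines for v in x])
    line_means = [mean(x) for x in lines]
    ms_between = r * sum((m - grand) ** 2 for m in line_means) / (k - 1)
    ms_within = sum((v - m) ** 2 for x, m in zip(lines, line_means) for v in x) / (k * (r - 1))
    return (ms_between - ms_within) / r


def mutational_variance(lines: List[List[float]], generations: float) -> float:
    """Vm ~= Vb / t, assuming haploid MA lines from a single ancestor."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    return among_line_variance(lines) / generations


if __name__ == "__main__":
    # Hypothetical expression values (e.g. log counts) of one gene: 3 lines x 2 replicates.
    expr = [[5.1, 5.3], [4.6, 4.8], [5.9, 6.1]]
    print(f"Vm per generation ~ {mutational_variance(expr, generations=1000):.2e}")
```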
Affiliation(s)
- Eniolaye J Balogun
- Department of Biology, William G. Davis Building, University of Toronto, Mississauga L5L-1C6, Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto M5S-3B2, Canada
- Rob W Ness
- Department of Biology, William G. Davis Building, University of Toronto, Mississauga L5L-1C6, Canada
4
Li G, Luo X, Hu Z, Wu J, Peng W, Liu J, Zhu X. Essential proteins discovery based on dominance relationship and neighborhood similarity centrality. Health Inf Sci Syst 2023; 11:55. [PMID: 37981988 PMCID: PMC10654316 DOI: 10.1007/s13755-023-00252-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 10/13/2023] [Indexed: 11/21/2023]
Abstract
Essential proteins play a vital role in cell development and reproduction, and identifying them helps us understand the basic requirements for cell survival. Because biological experimental methods for discovering essential proteins are time-consuming, costly, and inefficient, computational methods have gained increasing attention. Early computational approaches identified essential proteins mainly through centrality measures based on protein-protein interaction (PPI) networks, whose identification rate is limited by the many false positives in PPI networks. In this study, a purified PPI network is first introduced to reduce the impact of these false positives. Second, by analyzing the similarity between a protein and its neighbors in the PPI network, a new centrality called neighborhood similarity centrality (NSC) is proposed. Third, a protein subcellular localization score and an ortholog score are calculated from subcellular localization and orthology data, respectively. Fourth, an analysis of many methods based on multi-feature fusion revealed a special relationship among features, termed the dominance relationship, on which a novel model is built. Finally, NSC, the subcellular localization score, and the ortholog score are fused by the dominance relationship model, yielding a new method called NSO. To verify its performance, NSO was compared with seven representative methods (ION, NCCO, E_POC, SON, JDC, PeC, WDC) on yeast datasets. The experimental results show that NSO has a higher identification rate than the other methods.
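The abstract does not define its "dominance relationship" model. One common reading of dominance among multi-feature scores is Pareto dominance: protein A dominates B if A scores at least as high on every feature and strictly higher on at least one. The Python sketch below ranks proteins by how many others they dominate under that assumption; it is an illustration of Pareto dominance, not the NSO model itself, and the function names and scores are hypothetical.

```python
from typing import Dict, List, Tuple

Features = Tuple[float, ...]


def dominates(a: Features, b: Features) -> bool:
    """True if a is >= b on every feature and strictly > on at least one."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have equal length")
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def dominance_rank(scores: Dict[str, Features]) -> List[Tuple[str, int]]:
    """Rank proteins by how many other proteins they dominate (highest first, ties by name)."""
    counts = {
        p: sum(dominates(f, g) for q, g in scores.items() if q != p)
        for p, f in scores.items()
    }
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical (NSC, subcellular-localization, ortholog) scores per protein.
    proteins = {
        "P1": (0.9, 0.8, 0.7),
        "P2": (0.5, 0.9, 0.6),
        "P3": (0.4, 0.3, 0.2),
        "P4": (0.6, 0.7, 0.5),
    }
    for name, n_dominated in dominance_rank(proteins):
        print(name, n_dominated)
```

Pareto dominance avoids choosing weights for a linear fusion of features, at the cost of leaving many proteins mutually non-dominated when features disagree.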
Affiliation(s)
- Gaoshi Li
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Xinlong Luo
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Zhipeng Hu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Jingli Wu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
- Jiafei Liu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Xiaoshu Zhu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- School of Computer and Information Security & School of Software Engineering, Guilin University of Electronic Science and Technology, Guilin, China
5
Johnson MM, Hockenberry AJ, McGuffie MJ, Vieira LC, Wilke CO. Growth-dependent Gene Expression Variation Influences the Strength of Codon Usage Biases. Mol Biol Evol 2023; 40:msad189. [PMID: 37619989 PMCID: PMC10482319 DOI: 10.1093/molbev/msad189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 08/11/2023] [Indexed: 08/26/2023]
Abstract
The most highly expressed genes in microbial genomes tend to use a limited set of synonymous codons, often referred to as "preferred codons." The existence of preferred codons is commonly attributed to selection pressures on various aspects of protein translation including accuracy and/or speed. However, gene expression is condition-dependent and even within single-celled organisms transcript and protein abundances can vary depending on a variety of environmental and other factors. Here, we show that growth rate-dependent expression variation is an important constraint that significantly influences the evolution of gene sequences. Using large-scale transcriptomic and proteomic data sets in Escherichia coli and Saccharomyces cerevisiae, we confirm that codon usage biases are strongly associated with gene expression but highlight that this relationship is most pronounced when gene expression measurements are taken during rapid growth conditions. Specifically, genes whose relative expression increases during periods of rapid growth have stronger codon usage biases than comparably expressed genes whose expression decreases during rapid growth conditions. These findings highlight that gene expression measured in any particular condition tells only part of the story regarding the forces shaping the evolution of microbial gene sequences. More generally, our results imply that microbial physiology during rapid growth is critical for explaining long-term translational constraints.
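The strength of a gene's codon usage bias toward "preferred codons" can be summarized by indices such as the frequency of optimal codons (Fop): the fraction of codons, among amino acids that have a preferred codon, that are the preferred ones. The abstract does not say which index the authors used, and which codons count as preferred is organism-specific, so the codon sets in this Python sketch are hypothetical placeholders supplied by the caller.

```python
from typing import Iterator, Set


def codons(seq: str) -> Iterator[str]:
    """Yield in-frame codons of a DNA/RNA sequence, ignoring a trailing partial codon."""
    seq = seq.upper().replace("U", "T")
    for i in range(0, len(seq) - len(seq) % 3, 3):
        yield seq[i:i + 3]


def fop(seq: str, preferred: Set[str], optimizable: Set[str]) -> float:
    """Frequency of optimal codons: preferred codons / codons eligible to be preferred.

    preferred   -- codons deemed "preferred" (organism-specific, supplied by caller)
    optimizable -- all codons of amino acids that have a preferred codon
    """
    eligible = [c for c in codons(seq) if c in optimizable]
    if not eligible:
        raise ValueError("no eligible codons in sequence")
    return sum(c in preferred for c in eligible) / len(eligible)


if __name__ == "__main__":
    # Hypothetical example restricted to Leu and Lys codons.
    leu_lys = {"CTG", "CTT", "CTC", "CTA", "TTA", "TTG", "AAA", "AAG"}
    print(fop("ATGCTGCTTAAAAAGTAA", preferred={"CTG", "AAA"}, optimizable=leu_lys))
```

Correlating such a per-gene index with expression measured under fast versus slow growth is one way to examine the growth-dependent pattern the abstract describes.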
Affiliation(s)
- Mackenzie M Johnson
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
- Adam J Hockenberry
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
- Matthew J McGuffie
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, TX, USA
- Luiz Carlos Vieira
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
- Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
6
Johnson MM, Hockenberry AJ, McGuffie MJ, Vieira LC, Wilke CO. Growth-dependent gene expression variation influences the strength of codon usage biases. bioRxiv 2023:2023.03.14.532645. [PMID: 36993177 PMCID: PMC10055066 DOI: 10.1101/2023.03.14.532645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
The most highly expressed genes in microbial genomes tend to use a limited set of synonymous codons, often referred to as "preferred codons." The existence of preferred codons is commonly attributed to selection pressures on various aspects of protein translation including accuracy and/or speed. However, gene expression is condition-dependent and even within single-celled organisms transcript and protein abundances can vary depending on a variety of environmental and other factors. Here, we show that growth rate-dependent expression variation is an important constraint that significantly influences the evolution of gene sequences. Using large-scale transcriptomic and proteomic data sets in Escherichia coli and Saccharomyces cerevisiae, we confirm that codon usage biases are strongly associated with gene expression but highlight that this relationship is most pronounced when gene expression measurements are taken during rapid growth conditions. Specifically, genes whose relative expression increases during periods of rapid growth have stronger codon usage biases than comparably expressed genes whose expression decreases during rapid growth conditions. These findings highlight that gene expression measured in any particular condition tells only part of the story regarding the forces shaping the evolution of microbial gene sequences. More generally, our results imply that microbial physiology during rapid growth is critical for explaining long-term translational constraints.
Affiliation(s)
- Mackenzie M Johnson
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, United States of America
- Adam J Hockenberry
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, United States of America
- Matthew J McGuffie
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, TX, United States of America
- Luiz Carlos Vieira
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, United States of America
- Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, United States of America
7
Dolatyabi S, Peighambari SM, Razmyar J. Molecular detection and analysis of beak and feather disease viruses in Iran. Front Vet Sci 2022; 9:1053886. [PMID: 36532332 PMCID: PMC9751380 DOI: 10.3389/fvets.2022.1053886] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/26/2022] [Accepted: 11/14/2022] [Indexed: 10/25/2023]
Abstract
Beak and feather disease virus (BFDV) is one of the few pathogens capable of causing the extinction of psittacine species. To determine the prevalence of BFDV and the nature of its mutations, this study investigated the presence of BFDV among 1,095 individual birds of 17 psittacine species in Iran and then analyzed the DNA sequences of seven replication-associated protein (rep) genes and 10 capsid protein (cap) genes of the virus. BFDV was found to be the foremost pathogen among more than 12 psittacine species, and phylogenetic analysis showed that BFDV sequences published in GenBank from Poland, Saudi Arabia, South Africa, Taiwan, and Thailand were the most similar to those of this study. Evolutionary analysis showed that arginine, leucine, and glycine were the amino acids most frequently involved in the least-conserved substitution patterns of BFDV, whereas methionine, glutamine, and tryptophan were highly conserved across the substitution patterns. High rates of arginine-to-lysine and glycine-to-serine substitution also contributed substantially to BFDV gene mutation. Relative synonymous codon usage (RSCU) analysis of the two genes revealed that the cap gene frequently used fewer codons, whereas the rep gene used more codons but only at moderate frequency, which explains the broader divergence of the cap sequence compared with the rep sequence. The analysis also identified a new BFDV variant in the rep and cap sequences from budgerigars. Although the existence of further new variants was suspected, more solid evidence is required to substantiate this suspicion.
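Relative synonymous codon usage (RSCU) compares how often each codon is used with the count expected if all synonymous codons of its amino acid were used equally: RSCU = observed count × (family size) / (total count for that amino acid). Values above 1 mean over-use and below 1 under-use. The Python sketch below applies this standard definition with the standard genetic code; the abstract does not name the authors' software, so treat this as an illustration only.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

# Standard genetic code (NCBI translation table 1); codons ordered TCAG x TCAG x TCAG.
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AA)}


def rscu(seq: str) -> Dict[str, float]:
    """Relative synonymous codon usage for every codon of amino acids present in seq.

    RSCU = count * family_size / total_family_count. Stop codons and amino
    acids absent from the sequence are skipped; codons with ambiguous bases
    are ignored.
    """
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    families: Dict[str, List[str]] = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    result: Dict[str, float] = {}
    for syn in families.values():
        total = sum(counts[c] for c in syn)
        if total == 0:
            continue  # amino acid not present: RSCU undefined
        for c in syn:
            result[c] = counts[c] * len(syn) / total
    return result


if __name__ == "__main__":
    # Toy sequence: four Ala codons (GCT used twice) followed by Met.
    for codon, value in sorted(rscu("GCTGCTGCCGCAATG").items()):
        print(codon, value)
```

In practice RSCU is computed over all coding sequences of a gene set, and single-codon amino acids (Met, Trp) are often excluded because their RSCU is always 1.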
8
Bédard C, Cisneros AF, Jordan D, Landry CR. Correlation between protein abundance and sequence conservation: what do recent experiments say? Curr Opin Genet Dev 2022; 77:101984. [PMID: 36162152 DOI: 10.1016/j.gde.2022.101984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 08/23/2022] [Accepted: 08/26/2022] [Indexed: 01/27/2023]
Abstract
Cells evolve in a space of parameter values set by physical and chemical forces. These constraints create associations among cellular properties. A particularly strong association is the negative correlation between the rate of evolution of proteins and their abundance in the cell: highly expressed proteins evolve more slowly than lowly expressed ones. Multiple hypotheses have been put forward to explain this relationship, including the requirement of highly expressed proteins for higher mRNA stability, misfolding avoidance, and misinteraction avoidance. Here, we review some of these hypotheses, their predictions, and the evidence supporting them, and finally discuss recent experiments that have been performed to test these predictions.
Affiliation(s)
- Camille Bédard
- Département de Biologie, Faculté des Sciences et de Génie, Université Laval, G1V 0A6, Canada; Institut de Biologie Intégrative et des Systèmes, Université Laval, G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, G1V 0A6, Canada; Centre de Recherche sur les Données Massives, Université Laval, G1V 0A6, Canada. https://twitter.com/@CamilleBed17
- Angel F Cisneros
- Institut de Biologie Intégrative et des Systèmes, Université Laval, G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, G1V 0A6, Canada; Centre de Recherche sur les Données Massives, Université Laval, G1V 0A6, Canada; Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, G1V 0A6, Canada. https://twitter.com/@AngelFCC119
- David Jordan
- Institut de Biologie Intégrative et des Systèmes, Université Laval, G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, G1V 0A6, Canada; Centre de Recherche sur les Données Massives, Université Laval, G1V 0A6, Canada; Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, G1V 0A6, Canada. https://twitter.com/@DavidJordan1997
- Christian R Landry
- Département de Biologie, Faculté des Sciences et de Génie, Université Laval, G1V 0A6, Canada; Institut de Biologie Intégrative et des Systèmes, Université Laval, G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, G1V 0A6, Canada; Centre de Recherche sur les Données Massives, Université Laval, G1V 0A6, Canada; Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, G1V 0A6, Canada.
9
Moldovan MA, Gaydukova SA. Unusual Dependence between Gene Expression and Negative Selection in Euplotes. Mol Biol 2022. [DOI: 10.1134/s0026893323010090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
10
Moutinho AF, Eyre-Walker A, Dutheil JY. Strong evidence for the adaptive walk model of gene evolution in Drosophila and Arabidopsis. PLoS Biol 2022; 20:e3001775. [PMID: 36099311 PMCID: PMC9470001 DOI: 10.1371/journal.pbio.3001775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2021] [Accepted: 08/01/2022] [Indexed: 11/19/2022]
Abstract
Understanding how species adapt to their environments has long been a central focus of the study of evolution. Theories of adaptation propose that populations evolve by “walking” in a fitness landscape. This “adaptive walk” is characterised by a pattern of diminishing returns, in which populations further from their fitness optimum take larger steps than those closer to it. Hence, we expect young genes to evolve faster and to experience mutations with stronger fitness effects than older genes, because they are further from their fitness optimum. Testing this hypothesis, however, is difficult. Young genes are short, encode proteins with a higher degree of intrinsic disorder, are expressed at lower levels, and are involved in species-specific adaptations. Since all these factors lead to increased protein evolutionary rates, they could mask the effect of gene age. While controlling for these factors, we used population genomic data sets of Arabidopsis and Drosophila to estimate the rate of adaptive substitutions across genes from different phylostrata. We found that a gene’s evolutionary age significantly affects its molecular rate of adaptation. Moreover, we observed that substitutions in young genes tend to have larger physicochemical effects. Our study therefore provides strong evidence that molecular evolution follows an adaptive walk model across a large evolutionary timescale.
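The rate of adaptive substitution is usually derived from McDonald-Kreitman (MK) style counts: the proportion of adaptive non-synonymous substitutions is α = 1 − (Ds·Pn)/(Dn·Ps), and the adaptive rate relative to neutral is ωa = α·(dN/dS). The Python sketch below implements only this simplest estimator, which is biased by slightly deleterious polymorphisms; methods that model the distribution of fitness effects are generally preferred, and the abstract does not state which estimator the authors used. All counts are hypothetical.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Proportion of adaptive non-synonymous substitutions: 1 - (Ds*Pn)/(Dn*Ps).

    dn, ds -- fixed non-synonymous / synonymous differences between species
    pn, ps -- non-synonymous / synonymous polymorphisms within a species
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


def omega_a(dn: int, ds: int, pn: int, ps: int, ln: float, ls: float) -> float:
    """Adaptive substitution rate relative to neutral: alpha * (dN/dS).

    ln, ls -- numbers of non-synonymous / synonymous sites used to normalise Dn and Ds
    """
    if ds <= 0 or ln <= 0 or ls <= 0:
        raise ValueError("ds, ln and ls must be positive")
    dn_ds = (dn / ln) / (ds / ls)
    return mk_alpha(dn, ds, pn, ps) * dn_ds


if __name__ == "__main__":
    # Hypothetical counts pooled over genes of one phylostratum.
    print(f"alpha   = {mk_alpha(20, 10, 10, 10):.3f}")
    print(f"omega_a = {omega_a(20, 10, 10, 10, ln=300, ls=100):.3f}")
```

Comparing α or ωa between genes grouped by age (phylostratum), while controlling for length, disorder and expression, mirrors the comparison the abstract describes.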
Affiliation(s)
- Ana Filipa Moutinho
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Julien Y. Dutheil
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Unité Mixte de Recherche 5554 Institut des Sciences de l’Evolution, CNRS, IRD, EPHE, Université de Montpellier, Montpellier, France
11
Shibai A, Kotani H, Sakata N, Furusawa C, Tsuru S. Purifying selection enduringly acts on the sequence evolution of highly expressed proteins in Escherichia coli. G3 Genes|Genomes|Genetics 2022; 12:6694045. [PMID: 36073932 PMCID: PMC9635659 DOI: 10.1093/g3journal/jkac235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Accepted: 08/27/2022] [Indexed: 11/17/2022]
Abstract
The evolutionary speed of a protein sequence is constrained by its expression level, with highly expressed proteins evolving relatively slowly. This negative correlation between expression levels and evolutionary rates (known as the E–R anticorrelation) has been widely observed in past macroevolution between species, from bacteria to animals. However, it remains unclear whether this seemingly general law also governs recent evolution within a species, including both past and de novo evolution. Moreover, the advent of genome sequencing and high-throughput phenotyping, particularly for bacteria, has revealed fundamental gaps between the 2 evolutionary processes and has provided empirical data that contradict widely held views on the possible underlying mechanisms. These conflicts raise questions about the generality of the E–R anticorrelation and the relevance of the plausible mechanisms. To explore the general impact of expression levels on molecular evolution and test the relevance of the possible underlying mechanisms, we analyzed the genome sequences of 99 Escherichia coli strains as a model of within-species evolution in nature. We also analyzed genomic mutations accumulated under laboratory conditions as a model of de novo evolution within species. Here, we show that the E–R anticorrelation is significant in both past and de novo evolution within species in E. coli. Our data also confirmed ongoing purifying selection on highly expressed genes. This ongoing selection included codon-level purifying selection, supporting the relevance of the underlying mechanisms. However, the impact of codon-level purifying selection on the constraints in evolution within species might be smaller than previously expected from evolution between species.
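The E–R anticorrelation is typically quantified with a rank correlation between per-gene expression level and evolutionary rate (for example dN/dS), since both variables are heavily skewed. The dependency-free Python sketch below computes Spearman's ρ as the Pearson correlation of average ranks, which matches the usual tie handling (for instance in `scipy.stats.spearmanr`); the gene values in the example are hypothetical, and the abstract does not specify the authors' exact statistic.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical genes: expression (e.g. TPM) and dN/dS.
    expression = [5.0, 120.0, 40.0, 900.0, 15.0]
    dn_ds = [0.21, 0.06, 0.11, 0.02, 0.15]
    print(f"rho = {spearman(expression, dn_ds):.2f}")  # negative: E-R anticorrelation
```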
Affiliation(s)
- Atsushi Shibai
- Center for Biosystems Dynamics Research (BDR), RIKEN, Osaka 565-0874, Japan
- Hazuki Kotani
- Center for Biosystems Dynamics Research (BDR), RIKEN, Osaka 565-0874, Japan
- Natsue Sakata
- Center for Biosystems Dynamics Research (BDR), RIKEN, Osaka 565-0874, Japan
- Chikara Furusawa
- Center for Biosystems Dynamics Research (BDR), RIKEN, Osaka 565-0874, Japan
- Universal Biology Institute, School of Science, The University of Tokyo, Tokyo 113-0033, Japan
- Saburo Tsuru
- Universal Biology Institute, School of Science, The University of Tokyo, Tokyo 113-0033, Japan
12
Palenchar PM. The Influence of Codon Usage, Protein Abundance, and Protein Stability on Protein Evolution Vary by Evolutionary Distance and the Type of Protein. Protein J 2022; 41:216-229. [PMID: 35147896 DOI: 10.1007/s10930-022-10045-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 02/01/2022] [Indexed: 12/01/2022]
Abstract
In general, the evolutionary rate of proteins is not primarily related to protein and amino acid functions; factors such as protein abundance, codon usage, and the protein's melting temperature (TM) are more important. To better understand the factors that affect protein evolution, E. coli MG1655 orthologs were compared to those in closely related bacteria and in more distantly related prokaryotes, eukaryotes, and archaea. The evolution of different types of proteins was also studied. The analyses indicate that the amino acid conservation of enzymes that do not use macromolecules (e.g., DNA, RNA, and proteins) as substrates and that carry out metabolic processes involving small molecules (i.e., small molecule enzymes) differs from that of other enzymes. For example, the small molecule enzymes have a lower percent identity than other enzymes when sequences from closely related bacteria are compared. Analyses indicate that the lower percent identity is not a result of the amino acid or codon usage of the small molecule enzymes. The small molecule enzymes also do not have a significantly lower protein abundance, indicating that abundance is also not likely an important factor driving differences in amino acid conservation. Analyses also indicate that different methods of measuring protein TM show different relationships with amino acid conservation over different evolutionary distances. Taken together, the results demonstrate that the relationships between the factors thought to affect protein evolution (protein abundance, codon usage, and protein TM) and protein evolution are complex and depend on the factor, the organisms, and the type of proteins being analyzed.
Affiliation(s)
- Peter M Palenchar
- Department of Chemistry, Villanova University, 800 E. Lancaster Ave, Villanova, PA, 19085, USA.
13
Divyashree M, Prakash SK, Aditya V, Aljabali AA, Alzahrani KJ, Azevedo V, Góes-Neto A, Tambuwala MM, Barh D. Bugs as drugs: neglected but a promising future therapeutic strategy in cancer. Future Oncol 2022; 18:1609-1626. [PMID: 35137604 DOI: 10.2217/fon-2021-1137] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
Effective cancer treatment is an urgent need due to the rising incidence of cancer. One of the most promising future strategies in cancer treatment is the use of microorganisms as cancer indicators, prophylactic agents, immune activators, vaccines or vectors in antitumor therapy. The success of bacteria-mediated chemotherapy will depend on the balance between therapeutic benefit and the control of bacterial infection in the body. Additionally, protozoans and viruses have the potential to be used in cancer therapy. This review summarizes how these microorganisms interact with tumor microenvironments and the challenges of a 'bugs as drugs' approach in cancer therapy. Several standpoints are discussed, such as bacteria as vectors for gene therapy that shuttle therapeutic compounds into tumor tissues, their intrinsic antitumor activities and their combination with chemotherapy or radiotherapy. Bug-based cancer therapy is a double-edged sword, and its opportunities can be realized only by overcoming these challenges.
Affiliation(s)
- Mithoor Divyashree
- Nitte University Centre for Science Education & Research (NUCSER), NITTE (Deemed to be University), Paneer Campus, Deralakatte, Mangalore, 575018, Karnataka, India
- Shama K Prakash
- K. S. Hegde Medical Academy, NITTE (Deemed to be University), Deralakatte, Mangalore, 575018, Karnataka, India
- Vankadari Aditya
- Nitte University Centre for Science Education & Research (NUCSER), NITTE (Deemed to be University), Paneer Campus, Deralakatte, Mangalore, 575018, Karnataka, India
- Alaa Aa Aljabali
- Department of Pharmaceutics & Pharmaceutical Technology, Yarmouk University-Faculty of Pharmacy, Irbid, 566, Jordan
- Khalid J Alzahrani
- Department of Clinical Laboratories Sciences, College of Applied Medical Sciences, Taif University, Taif, 21944, Saudi Arabia
- Vasco Azevedo
- Department of Genetics, Laboratory of Cellular & Molecular Genetics, Ecology & Evolution, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, CEP, 31270-901, Brazil
- Aristóteles Góes-Neto
- Department of Microbiology, Molecular & Computational Biology of Fungi Laboratory, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, CEP, 31270-901, Brazil
- Murtaza M Tambuwala
- School of Pharmacy & Pharmaceutical Science, Ulster University, Coleraine, Northern Ireland, BT52 1SA, UK
- Debmalya Barh
- Department of Genetics, Laboratory of Cellular & Molecular Genetics, Ecology & Evolution, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, CEP, 31270-901, Brazil
- Institute of Integrative Omics & Applied Biotechnology (IIOAB), Nonakuri, Purba Medinipur WB, 721172, India
14
Soni V, Eyre-Walker A. OUP accepted manuscript. Genome Biol Evol 2022; 14:6528851. [PMID: 35166775 PMCID: PMC8882387 DOI: 10.1093/gbe/evac028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/09/2022] [Indexed: 12/05/2022]
Abstract
The rate of amino acid substitution has been shown to be correlated with a number of factors, including the rate of recombination, the age of the gene, the length of the protein, mean expression level, and gene function. However, the extent to which these correlations are due to adaptive and nonadaptive evolution has not been studied in detail, at least not in hominids. We find that the rate of adaptive evolution is significantly positively correlated with the rate of recombination, protein length, and gene expression level, and negatively correlated with gene age. These correlations remain significant when each factor is controlled for in turn, except when controlling for expression in an analysis of protein length; they also generally remain significant when biased gene conversion is taken into account. However, the positive correlations could be an artifact of population size contraction. We also find that the rate of nonadaptive evolution is negatively correlated with each factor, and all these correlations survive controlling for each other and for biased gene conversion. Finally, we examine the effect of gene function on rates of adaptive and nonadaptive evolution; we confirm that virus-interacting proteins (VIPs) have higher rates of adaptive and lower rates of nonadaptive evolution, but we also demonstrate that there is significant variation in the rates of adaptive and nonadaptive evolution between GO categories when VIPs are removed. We estimate that the VIP/non-VIP axis explains about 5–8-fold more of the variance in evolutionary rate than GO categories.
Affiliation(s)
- Vivak Soni
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Corresponding author
15
Abstract
It is known that methods to estimate the rate of adaptive evolution based on the McDonald–Kreitman test can be biased by changes in effective population size. Here, we demonstrate theoretically that changes in population size can also generate an artifactual correlation between the rate of adaptive evolution and any factor that is correlated with the strength of selection acting against deleterious mutations. In this context, we have investigated whether several site-level factors influence the rate of adaptive evolution in the divergence of humans and chimpanzees, two species that have been inferred to have undergone population size contraction since they diverged. We find that the rate of adaptive evolution, relative to the rate of mutation, is higher for more exposed amino acids, lower for amino acid pairs that are more dissimilar in terms of their polarity and volume, and lower for amino acid pairs that are subject to stronger purifying selection, as measured by the ratio of the numbers of nonsynonymous to synonymous polymorphisms (pN/pS). All of these correlations are opposite to the artifactual correlations expected under a contracting population size. We therefore conclude that these correlations are genuine.
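For readers unfamiliar with McDonald–Kreitman-based estimates, the simplest version compares nonsynonymous and synonymous divergence (Dn, Ds) with nonsynonymous and synonymous polymorphism (Pn, Ps). The sketch below computes the classic estimate of the adaptive fraction, α = 1 − (Ds·Pn)/(Dn·Ps), and a rate of adaptive substitution relative to the synonymous rate, ωa = α·dN/dS. This is an illustrative assumption, not necessarily the estimator used in the study; the counts and site numbers (Ln, Ls) are hypothetical, and the simple estimator is itself sensitive to slightly deleterious polymorphisms and to changes in population size.

```python
def mk_alpha(dn, ds, pn, ps):
    """Estimated fraction of nonsynonymous substitutions that were adaptive,
    from McDonald-Kreitman counts: alpha = 1 - (Ds * Pn) / (Dn * Ps)."""
    if dn <= 0 or ps <= 0:
        raise ValueError("Dn and Ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


def omega_a(dn, ds, pn, ps, ln, ls):
    """Adaptive substitution rate relative to the synonymous (assumed neutral)
    rate: omega_a = alpha * dN/dS, with dN = Dn/Ln and dS = Ds/Ls, where Ln and
    Ls are the numbers of nonsynonymous and synonymous sites.
    No multiple-hit correction is applied."""
    if ds <= 0:
        raise ValueError("Ds must be positive")
    dn_ds = (dn / ln) / (ds / ls)
    return mk_alpha(dn, ds, pn, ps) * dn_ds


# Hypothetical counts for one class of sites, for illustration only.
counts = dict(dn=40, ds=50, pn=20, ps=50)
print(mk_alpha(**counts))                  # 0.5
print(omega_a(**counts, ln=750, ls=250))   # about 0.133
```

Comparing ωa across site classes (for example, exposed versus buried residues) is the kind of contrast the abstract describes; dividing by dS is what makes the rate relative to the mutation rate, under the assumption that synonymous sites evolve neutrally.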
Affiliation(s)
- Vivak Soni
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Ana Filipa Moutinho
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Department for Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Corresponding author
16
Wan Y, Mills E, Leung RC, Vieira A, Zhi X, Croucher NJ, Woodford N, Jauneikaite E, Ellington MJ, Sriskandan S. Alterations in chromosomal genes nfsA, nfsB, and ribE are associated with nitrofurantoin resistance in Escherichia coli from the United Kingdom. Microb Genom 2021; 7:000702. [PMID: 34860151 PMCID: PMC8767348 DOI: 10.1099/mgen.0.000702] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/18/2023]
Abstract
Antimicrobial resistance in enteric or urinary Escherichia coli is a risk factor for invasive E. coli infections. Owing to widespread trimethoprim resistance amongst urinary E. coli and increased bacteraemia incidence, a national recommendation to prescribe nitrofurantoin for uncomplicated urinary tract infection was made in 2014. Nitrofurantoin resistance is reported in <6% of urinary E. coli isolates in the UK; however, the mechanisms underpinning nitrofurantoin resistance in these isolates remain unknown. This study aimed to identify the genetic basis of nitrofurantoin resistance in urinary E. coli isolates collected from north west London and then to elucidate resistance-associated genetic alterations in available UK E. coli genomes. In the course of this work, an algorithm was developed to predict nitrofurantoin susceptibility. Deleterious mutations and gene-inactivating insertion sequences in the chromosomal nitroreductase genes nfsA and/or nfsB were identified in the genomes of nine confirmed nitrofurantoin-resistant urinary E. coli isolates and of an additional 11 E. coli isolates that were flagged by the prediction algorithm and subsequently confirmed to be nitrofurantoin-resistant. Eight categories of allelic changes in nfsA, nfsB, and the associated gene ribE were detected in 12,412 E. coli genomes from the UK. Evolutionary analysis of these three genes revealed homoplasic mutations and explained the previously reported order of stepwise mutations. The mobile gene complex oqxAB, which is associated with reduced nitrofurantoin susceptibility, was identified in only one of the 12,412 genomes. In conclusion, mutations and insertion sequences in nfsA and nfsB were the leading causes of nitrofurantoin resistance in UK E. coli. As nitrofurantoin exposure increases in human populations, the prevalence of nitrofurantoin resistance in carriage E. coli isolates and in those from urinary and bloodstream infections should be monitored.
Affiliation(s)
- Yu Wan
- NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Department of Infectious Disease, Imperial College London, London, United Kingdom
- Ewurabena Mills
- NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Department of Infectious Disease, Imperial College London, London, United Kingdom
- MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, United Kingdom
- Rhoda C.Y. Leung
- NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Department of Infectious Disease, Imperial College London, London, United Kingdom
- Present address: Department of Microbiology, Queen Mary Hospital, Hong Kong S.A.R., PR China
- Ana Vieira
- NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Department of Infectious Disease, Imperial College London, London, United Kingdom
- MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, United Kingdom
- Xiangyun Zhi
- NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Department of Infectious Disease, Imperial College London, London, United Kingdom
- MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, United Kingdom
- Nicholas J. Croucher
- Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
- MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, London, United Kingdom
- Neil Woodford
- Antimicrobial Resistance and Healthcare Associated Infections Reference Unit, National Infection Service, Public Health England, Colindale, London, United Kingdom
- Elita Jauneikaite
- NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Department of Infectious Disease, Imperial College London, London, United Kingdom
- Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
- MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, London, United Kingdom
- Matthew J. Ellington
- NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Department of Infectious Disease, Imperial College London, London, United Kingdom
- Antimicrobial Resistance and Healthcare Associated Infections Reference Unit, National Infection Service, Public Health England, Colindale, London, United Kingdom
- Shiranee Sriskandan
- NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Department of Infectious Disease, Imperial College London, London, United Kingdom
- MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, United Kingdom
- Correspondence: Shiranee Sriskandan
17
Tralamazza SM, Abraham LN, Reyes-Avila CS, Corrêa B, Croll D. Histone H3K27 methylation perturbs transcriptional robustness and underpins dispensability of highly conserved genes in fungi. Mol Biol Evol 2021; 39:6424003. [PMID: 34751371 PMCID: PMC8789075 DOI: 10.1093/molbev/msab323] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Epigenetic modifications are key regulators of gene expression and underpin genome integrity. Yet, how epigenetic changes affect the evolution and transcriptional robustness of genes remains largely unknown. Here, we show how the repressive histone mark H3K27me3 underpins the trajectory of highly conserved genes in fungi. We first performed transcriptomic profiling on closely related species of the plant pathogen Fusarium graminearum species complex. We measured the transcriptional responsiveness of genes across environmental conditions to assess expression robustness. To infer evolutionary conservation, we used a framework of 23 species across the Fusarium genus, including three species with histone methylation data. Gene expression variation is negatively correlated with gene conservation, confirming that highly conserved genes show higher expression robustness. In contrast, genes marked by H3K27me3 do not show such associations. Furthermore, highly conserved genes marked by H3K27me3 encode smaller proteins, exhibit weaker codon usage bias and higher levels of hydrophobicity, contain fewer intrinsically disordered regions, and are enriched for functions related to regulation and membrane transport. The evolutionary age of conserved genes with H3K27me3 histone marks typically falls within the origins of the Fusarium genus. We show that highly conserved genes marked by H3K27me3 are more likely to be dispensable for survival during host infection. Lastly, we show that conserved genes exposed to repressive H3K27me3 marks across distantly related Fusarium fungi are associated with transcriptional perturbation at the microevolutionary scale. In conclusion, we show how repressive histone marks are entangled in the evolutionary fate of highly conserved genes across evolutionary timescales.
Affiliation(s)
- Sabina Moser Tralamazza
- Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchatel, Switzerland
- Department of Microbiology, Institute of Biomedical Sciences, University of Sao Paulo, Brazil
- Leen Nanchira Abraham
- Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchatel, Switzerland
- Benedito Corrêa
- Department of Microbiology, Institute of Biomedical Sciences, University of Sao Paulo, Brazil
- Daniel Croll
- Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchatel, Switzerland
18
Latrille T, Lartillot N. Quantifying the impact of changes in effective population size and expression level on the rate of coding sequence evolution. Theor Popul Biol 2021; 142:57-66. [PMID: 34563555 DOI: 10.1016/j.tpb.2021.09.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/27/2021] [Revised: 09/08/2021] [Accepted: 09/11/2021] [Indexed: 02/07/2023]
Abstract
Molecular sequences are shaped by selection, where the strength of selection relative to drift is determined by effective population size (Ne). Populations with high Ne are expected to undergo stronger purifying selection and, consequently, to show a lower substitution rate for selected mutations relative to the substitution rate for neutral mutations (ω). However, computational models based on the biophysics of protein stability have suggested that ω can also be independent of Ne. Thus, the response of ω to changes in Ne depends on the specific mapping from sequence to fitness. Importantly, an increase in protein expression level has been found empirically to result in a decrease in ω, an observation predicted by theoretical models assuming selection for protein stability. Here, we derive a theoretical approximation for the response of ω to changes in Ne and expression level, under an explicit genotype-phenotype-fitness map. The method is generally valid for additive traits and log-concave fitness functions. We applied these results to proteins undergoing selection for their conformational stability and corroborated our findings with simulations under more complex models. We predict a weak response of ω to changes in either Ne or expression level, which are interchangeable. Based on empirical data, we propose that fitness based on conformational stability may not be a sufficient mechanism to explain the empirically observed variation in ω across species. Other aspects of protein biophysics, such as protein-protein interactions, which can lead to a stronger response of ω to changes in Ne, might be worth exploring.
Affiliation(s)
- T Latrille
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France; École Normale Supérieure de Lyon, Université de Lyon, Université Lyon 1, Lyon, France.
- N Lartillot
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France
19
Variables Influencing Differences in Sequence Conservation in the Fission Yeast Schizosaccharomyces pombe. J Mol Evol 2021; 89:601-610. [PMID: 34436628 PMCID: PMC8599406 DOI: 10.1007/s00239-021-10028-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2021] [Accepted: 08/17/2021] [Indexed: 11/17/2022]
Abstract
Which variables determine the constraints on gene sequence evolution is one of the most central questions in molecular evolution. In the fission yeast Schizosaccharomyces pombe, an important model organism, the variables influencing the rate of sequence evolution have yet to be determined. Previous studies in other single-celled organisms have generally found gene expression level to be the most significant variable, with numerous other variables, such as gene length and functional importance, identified as having a smaller impact. Using publicly available data, we used partial least squares regression, principal components regression, and partial correlations to determine the variables most strongly associated with sequence evolution constraints. We identify centrality in the protein–protein interaction network, amino acid composition, and cellular location as the most important determinants of sequence conservation. However, each factor explains only a small amount of variance, and numerous variables have a significant or heterogeneous influence. Our models explain more than half of the variance in dN, raising the possibility that future refined models could quantify the role of stochastics in evolutionary rate variation.
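Partial correlations ask whether two variables remain associated once a third (for example, expression level) is held fixed. The sketch below computes a first-order partial correlation from the three pairwise Pearson correlations; it is an assumed, minimal formulation, and the study may have used rank-based or higher-order variants. The variable names and values are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y controlling for z:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2)).
    Undefined when z is perfectly correlated with x or y."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical per-gene values, for illustration only.
dn = [0.10, 0.04, 0.08, 0.02, 0.06, 0.03]        # evolutionary rate
centrality = [2, 9, 4, 12, 5, 8]                  # PPI-network degree
expression = [30.0, 250.0, 60.0, 400.0, 90.0, 180.0]
print(round(pearson(dn, centrality), 3))
print(round(partial_corr(dn, centrality, expression), 3))
```

Equivalently, the partial correlation is the correlation between the residuals of x and y after each is linearly regressed on z, which is a convenient way to check an implementation.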
20
Biesiadecka MK, Sliwa P, Tomala K, Korona R. An Overexpression Experiment Does Not Support the Hypothesis That Avoidance of Toxicity Determines the Rate of Protein Evolution. Genome Biol Evol 2021; 12:589-596. [PMID: 32259256 PMCID: PMC7250497 DOI: 10.1093/gbe/evaa067] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Accepted: 04/01/2020] [Indexed: 12/22/2022]
Abstract
The misfolding avoidance hypothesis postulates that sequence mutations render proteins cytotoxic and, therefore, that the higher the gene expression, the stronger the selection against substitutions. This translates into the prediction that the relative toxicity of extant proteins is higher for those evolving faster. In the present experiment, we selected pairs of yeast genes that were paralogous but evolving at different rates. We artificially expressed them to high levels. We expected toxicity to be higher for the ones bearing more mutations, especially since overcrowding should exacerbate rather than reverse the already existing differences in misfolding rates. We did find that the applied mode of overexpression caused a considerable decrease in fitness and that the decrease was proportional to the amount of excess protein. However, it was not higher for proteins that are normally expressed at lower levels (and have less conserved sequences). This result was obtained consistently, regardless of whether the rate of growth or the ability to compete in common cultures was used as a proxy for fitness. In additional experiments, we applied factors that reduce the accuracy of translation or enhance the structural instability of proteins. These did not change the consistent pattern of independence between the fitness cost caused by overexpression of a protein and the rate of its sequence evolution.
Affiliation(s)
- Piotr Sliwa
- Department of Genetics, Faculty of Biotechnology, University of Rzeszów, Poland
- Katarzyna Tomala
- Institute of Environmental Sciences, Faculty of Biology, Jagiellonian University, Cracow, Poland
- Ryszard Korona
- Institute of Environmental Sciences, Faculty of Biology, Jagiellonian University, Cracow, Poland
21
Dubreuil B, Levy ED. Abundance Imparts Evolutionary Constraints of Similar Magnitude on the Buried, Surface, and Disordered Regions of Proteins. Front Mol Biosci 2021; 8:626729. [PMID: 33996892 PMCID: PMC8119896 DOI: 10.3389/fmolb.2021.626729] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/06/2020] [Accepted: 03/29/2021] [Indexed: 12/02/2022]
Abstract
An understanding of the forces shaping protein conservation is key, both for the fundamental knowledge it represents and to allow for optimal use of evolutionary information in practical applications. Sequence conservation is typically examined at one of two levels: the residue level, at which intra-protein differences are analyzed, and the protein level, at which inter-protein differences are studied. At the residue level, we know that solvent accessibility is a prime determinant of conservation. By inverting this logic, we inferred that disordered regions are slightly more solvent-accessible on average than the most exposed surface residues in domains. By integrating abundance information with evolutionary data within and across proteins, we confirmed a previously reported strong surface-core association in the evolution of structured regions, but we found a comparatively weak association between disordered and structured regions. The fact that disordered and structured regions experience different structural constraints and evolve independently provides a unique setup to examine an outstanding question: why is a protein’s abundance the main determinant of its sequence conservation? Indeed, any structural or biophysical property linked to the abundance-conservation relationship should increase the relative conservation of regions concerned with that property (e.g., disordered residues with mis-interactions, domain residues with misfolding). Surprisingly, however, we found that the conservation of disordered and structured regions increases in equal proportion with abundance. This observation implies that either abundance-related constraints are structure-independent or multiple constraints apply to different regions and perfectly balance each other.
Affiliation(s)
- Benjamin Dubreuil
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Emmanuel D Levy
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
22
Das JK, Roy S. A study on non-synonymous mutational patterns in structural proteins of SARS-CoV-2. Genome 2021; 64:665-678. [PMID: 33788636 DOI: 10.1139/gen-2020-0157] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/20/2022]
Abstract
SARS-CoV-2 is mutating and creating divergent variants across the world. An in-depth investigation of the amino acid substitutions in the genomic signature of SARS-CoV-2 proteins is essential for understanding its host adaptation and infection biology. A total of 9587 SARS-CoV-2 structural protein sequences collected from 49 different countries were used to characterize protein-wise variants, substitution patterns (type and location), and major substitution changes. The majority of the substitutions are distinct, mostly occur at a particular location, and change an amino acid's biochemical properties. In terms of mutational changes, the envelope (E) and membrane (M) proteins are relatively more stable than the nucleocapsid (N) and spike (S) proteins. Several co-occurring substitutions are observed, particularly in the S and N proteins. Analysis of substitutions within active sub-domains reveals that the heptapeptide repeat, fusion peptide, and transmembrane regions of the S protein and the N-terminal and C-terminal domains of the N protein are heavily mutated. We also observe a few deleterious mutations in these domains. Overall, this study of non-synonymous mutations in the structural proteins of SARS-CoV-2 at the start of the pandemic indicates diversity among virus sequences.
Affiliation(s)
- Jayanta Kumar Das
- Department of Pediatrics, Johns Hopkins University School of Medicine, Maryland, USA
- Swarup Roy
- Network Reconstruction & Analysis (NetRA) Lab, Department of Computer Applications, Sikkim University, Gangtok, India
23
Rousset F, Cabezas-Caballero J, Piastra-Facon F, Fernández-Rodríguez J, Clermont O, Denamur E, Rocha EPC, Bikard D. The impact of genetic diversity on gene essentiality within the Escherichia coli species. Nat Microbiol 2021; 6:301-312. [PMID: 33462433 DOI: 10.1038/s41564-020-00839-y] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 05/26/2020] [Accepted: 11/20/2020] [Indexed: 01/28/2023]
Abstract
Bacteria from the same species can differ widely in their gene content. In Escherichia coli, the set of genes shared by all strains, known as the core genome, represents about half the number of genes present in any strain. Although recent advances in bacterial genomics have unravelled genes required for fitness in various experimental conditions, most studies have focused on single model strains. As a result, the impact of the species' genetic diversity on core processes of the bacterial cell remains largely under-investigated. Here, we have developed a CRISPR interference platform for high-throughput gene repression that is compatible with most E. coli isolates and closely related species. We have applied it to assess the importance of ~3,400 nearly ubiquitous genes in three growth conditions in 18 representative E. coli strains spanning most common phylogroups and lifestyles of the species. Our screens revealed extensive variations in gene essentiality between strains and conditions. Investigation of the genetic determinants for these variations highlighted the importance of epistatic interactions with mobile genetic elements. In particular, we have shown how prophage-encoded defence systems against phage infection can trigger the essentiality of persistent genes that are usually non-essential. This study provides broad insights into the evolvability of gene essentiality and argues for the importance of studying various isolates from the same species under diverse conditions.
Affiliation(s)
- François Rousset
- Synthetic Biology, Department of Microbiology, Institut Pasteur, Paris, France
- Sorbonne Université, Collège Doctoral, Paris, France
- Erick Denamur
- Université de Paris, IAME, INSERM UMR1137, Paris, France
- AP-HP, Laboratoire de Génétique Moléculaire, Hôpital Bichat, Paris, France
- Eduardo P C Rocha
- Microbial Evolutionary Genomics, Institut Pasteur, CNRS, UMR3525, Paris, France.
- David Bikard
- Synthetic Biology, Department of Microbiology, Institut Pasteur, Paris, France.
24
Lato DF, Golding GB. The Location of Substitutions and Bacterial Genome Arrangements. Genome Biol Evol 2020; 13:6035136. [PMID: 33320172 PMCID: PMC7851589 DOI: 10.1093/gbe/evaa260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 12/11/2020] [Indexed: 01/09/2023]
Abstract
Increasing evidence supports the notion that different regions of a genome have unique rates of molecular change. This variation is particularly evident in bacterial genomes, where previous studies have reported that gene expression and essentiality tend to decrease, whereas substitution rates usually increase, with increasing distance from the origin of replication. Genomic reorganizations such as rearrangements occur frequently in bacteria and allow for the introduction and restructuring of genetic content, creating gradients of molecular traits along genomes. Here, we explore the interplay of these phenomena by mapping substitutions to the genomes of Escherichia coli, Bacillus subtilis, Streptomyces, and Sinorhizobium meliloti, quantifying how many substitutions have occurred at each position in the genome. Preceding work indicates that substitution rate significantly increases with distance from the origin. Using a larger sample size and accounting for genome rearrangements through ancestral reconstruction, our analysis demonstrates that the correlation between the number of substitutions and the distance from the origin of replication is significant but small and inconsistent in direction. Some replicons had a significantly decreasing trend (E. coli and the chromosome of S. meliloti), whereas others showed the opposite significant trend (B. subtilis, Streptomyces, pSymA and pSymB in S. meliloti). dN, dS, and ω were examined across all genes, and there was no significant correlation between those values and distance from the origin. This study highlights the impact that genomic rearrangements and location have on molecular trends in some bacteria, illustrating the importance of considering spatial trends in molecular evolutionary analysis. Assuming that molecular trends are exclusively in one direction can be problematic.
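The core quantity in this kind of analysis is each site's distance from the origin of replication, tested for correlation with substitution counts. A minimal Python sketch of that step follows, assuming a single circular replicon with a known origin coordinate and a rank (Spearman) correlation; the function names are hypothetical, and the sketch omits the ancestral reconstruction of rearrangements that the study relies on, so it is not the authors' pipeline.

```python
from typing import List, Sequence


def distance_from_origin(pos: int, ori: int, genome_len: int) -> int:
    """Shortest distance (bp) between pos and the origin on a circular replicon."""
    d = abs(pos - ori) % genome_len
    return min(d, genome_len - d)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

On a hypothetical 1,000 bp replicon with the origin at position 0, `distance_from_origin(900, 0, 1000)` is 100, the same as for position 100, because the chromosome is circular. A negative rho between such distances and per-site substitution counts would correspond to the decreasing trend reported above for E. coli.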
Affiliation(s)
- Daniella F Lato
- Department of Biology, McMaster University, Hamilton, Ontario, Canada
- G Brian Golding
- Department of Biology, McMaster University, Hamilton, Ontario, Canada
25
Evans P, Cox NJ, Gamazon ER. The regulatory genome constrains protein sequence evolution: implications for the search for disease-associated genes. PeerJ 2020; 8:e9554. [PMID: 32765967 PMCID: PMC7380284 DOI: 10.7717/peerj.9554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2020] [Accepted: 06/24/2020] [Indexed: 11/20/2022]
Abstract
The development of explanatory models of protein sequence evolution has broad implications for our understanding of cellular biology, population history, and disease etiology. Here we analyze the GTEx transcriptome resource to quantify the effect of the transcriptome on protein sequence evolution in a multi-tissue framework. We find substantial variation among the central nervous system tissues in the effect of expression variance on evolutionary rate, with highly variable genes in the cortex showing significantly greater purifying selection than highly variable genes in subcortical regions (Mann-Whitney U p = 1.4 × 10⁻⁴). The remaining tissues cluster in observed expression correlation with evolutionary rate, enabling evolutionary analysis of genes in diverse physiological systems, including digestive, reproductive, and immune systems. Importantly, the tissue in which a gene attains its maximum expression variance significantly varies (p = 5.55 × 10⁻²⁸⁴) with evolutionary rate, suggesting a tissue-anchored model of protein sequence evolution. Using a large-scale reference resource, we show that the tissue-anchored model provides a transcriptome-based approach to predicting the primary affected tissue of developmental disorders. Using gradient boosted regression trees to model evolutionary rate under a range of model parameters, selected features explain up to 62% of the variation in evolutionary rate and provide additional support for the tissue model. Finally, we investigate several methodological implications, including the importance of evolutionary-rate-aware gene expression imputation models using genetic data for improved search for disease-associated genes in transcriptome-wide association studies. Collectively, this study presents a comprehensive transcriptome-based analysis of a range of factors that may constrain molecular evolution and proposes a novel framework for the study of gene function and disease mechanism.
Affiliation(s)
- Patrick Evans
- Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Nancy J Cox
- Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Eric R Gamazon
- Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, United States of America.,Clare Hall, University of Cambridge, Cambridge, United Kingdom.,MRC Epidemiology Unit, University of Cambridge, Cambridge, United Kingdom.,Data Science Institute, Vanderbilt University, Nashville, TN, United States of America
26
Design and Unique Expression of a Novel Antibacterial Fusion Protein Cecropin B-Human Lysozyme to Be Toxic to Prokaryotic Host Cells. Probiotics Antimicrob Proteins 2020; 11:1362-1369. [PMID: 30835077 DOI: 10.1007/s12602-019-09527-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/27/2022]
Abstract
A novel antibacterial fusion protein, cecropin B-human lysozyme (CB-hLyso), was designed and expressed in a prokaryotic system. The full-length CB gene was first synthesized and fused to the 5' end of the hLyso gene. The recombinant CB-hLyso was then subcloned into plasmid pET32a, and pET32a-CB-hLyso was transferred into Escherichia coli (E. coli) BL21(DE3) and BL21(DE3)pLysS. The results showed that in the original culture media, Luria-Bertani (LB) medium and terrific broth (TB), at 37 or 25 °C, CB-hLyso was barely expressed; however, when the original culture medium was replaced with an equal volume of fresh medium, clear expression occurred in BL21(DE3)pLysS/pET32a-CB-hLyso at 25 °C, and the expression in TB (25%) was higher than that in LB (15%). Through a two-step chromatographic method consisting of Ni-chelated Sepharose Fast Flow affinity and Sephadex G-75 size-exclusion chromatography, the crude fusion CB-hLyso was isolated in a homogeneous form, and preliminary bacteriostasis experiments showed that the fusion CB-hLyso had a strong inhibitory effect on the growth of Staphylococci. This work provides useful insights into the design of novel fusion polypeptides with higher bacteriolytic activity and wider antimicrobial spectra, and into the expression of polypeptide products that are toxic to prokaryotic host cells, eukaryotic host cells or insect cells. Graphical Abstract: Schematic representation of expression vector pET-32a-CB-hLyso, with Factor Xa and Asn-Gly.
27
Li G, Li M, Wang J, Li Y, Pan Y. United Neighborhood Closeness Centrality and Orthology for Predicting Essential Proteins. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1451-1458. [PMID: 30596582 DOI: 10.1109/tcbb.2018.2889978] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 05/06/2023]
Abstract
Identifying essential proteins plays an important role in disease study, drug design, and understanding the minimal requirements for cellular life. Computational methods for essential protein discovery overcome the disadvantages of biological experimental methods, which are often time-consuming, expensive, and inefficient. The topological features of protein-protein interaction (PPI) networks are often used to design computational prediction methods, such as Degree Centrality (DC), Betweenness Centrality (BC), Closeness Centrality (CC), Subgraph Centrality (SC), Eigenvector Centrality (EC), Information Centrality (IC), and Neighborhood Centrality (NC). However, the prediction accuracy of these individual methods still leaves room for improvement. Studies show that additional information, such as orthologous relations, helps discover essential proteins, and many researchers have proposed methods that combine multiple information sources to improve prediction accuracy. In this study, we find that essential proteins appear in triangular structures in PPI networks significantly more often than nonessential ones. Based on this phenomenon, we propose a novel pure centrality measure, Neighborhood Closeness Centrality (NCC). We then develop a new combination model, the Extended Pareto Optimality Consensus model (EPOC), to fuse NCC and orthology information, yielding a novel essential protein identification method, NCCO. Compared with seven existing classic centrality methods (DC, BC, IC, CC, SC, EC, and NC) and three consensus methods (PeC, ION, and CSC), our results on S. cerevisiae and E. coli datasets show that NCCO has clear advantages. As a consensus method, EPOC also yields better performance than the random walk model.
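The two network ingredients named here, how often a protein sits in a triangle and the classical Closeness Centrality (CC) baseline, can be made concrete with a short Python sketch. It assumes an undirected, unweighted PPI network given as an edge list and uses hypothetical function names; it counts triangles per protein and computes CC from breadth-first-search distances, but it does not reproduce the NCC formula or the EPOC fusion, whose exact definitions are given in the paper.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_adjacency(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Undirected adjacency sets; self-interactions are ignored."""
    adj: Graph = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangle_counts(adj: Graph) -> Dict[Hashable, int]:
    """Triangles per protein: pairs of its neighbors that also interact."""
    return {
        node: sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        for node, nbrs in adj.items()
    }


def closeness(adj: Graph, source: Hashable) -> float:
    """Classical closeness: (reachable nodes - 1) / sum of BFS distances."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0
```

For a hypothetical network with edges A-B, B-C, A-C and C-D, proteins A, B and C each lie in one triangle and D in none; C has closeness 1.0 (one hop from everything), whereas D has 3/5 = 0.6.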
28
Abstract
Darwin's theory of evolution emphasized that positive selection of functional proficiency provides the fitness that ultimately determines the structure of life, a view that has dominated biochemical thinking of enzymes as perfectly optimized for their specific functions. The 20th-century modern synthesis, structural biology, and the central dogma explained the machinery of evolution, and nearly neutral theory explained how selection competes with random fixation dynamics that produce molecular clocks, which are essential, e.g., for dating evolutionary histories. However, quantitative proteomics revealed that selection pressures not relating to optimal function play much larger roles than previously thought, acting perhaps most importantly via protein expression levels. This paper first summarizes recent progress in the 21st century toward recovering this universal selection pressure. Then, the paper argues that proteome cost minimization is the dominant, underlying 'non-function' selection pressure controlling most of the evolution of already functionally adapted living systems. A theory of proteome cost minimization is described and argued to have consequences for understanding evolutionary trade-offs, aging, cancer, and neurodegenerative protein-misfolding diseases.
29
Abstract
Adaptive mutations play an important role in molecular evolution. However, the frequency and nature of these mutations at the intramolecular level are poorly understood. To address this, we analyzed the impact of protein architecture on the rate of adaptive substitutions, aiming to understand how protein biophysics influences fitness and adaptation. Using Drosophila melanogaster and Arabidopsis thaliana population genomics data, we fitted models of distribution of fitness effects and estimated the rate of adaptive amino-acid substitutions both at the protein and amino-acid residue level. We performed a comprehensive analysis covering genome, gene, and protein structure, by exploring a multitude of factors with a plausible impact on the rate of adaptive evolution, such as intron number, protein length, secondary structure, relative solvent accessibility, intrinsic protein disorder, chaperone affinity, gene expression, protein function, and protein-protein interactions. We found that the relative solvent accessibility is a major determinant of adaptive evolution, with most adaptive mutations occurring at the surface of proteins. Moreover, we observe that the rate of adaptive substitutions differs between protein functional classes, with genes encoding for protein biosynthesis and degradation signaling exhibiting the fastest rates of protein adaptation. Overall, our results suggest that adaptive evolution in proteins is mainly driven by intermolecular interactions, with host-pathogen coevolution likely playing a major role.
Affiliation(s)
- Ana Filipa Moutinho
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Fernanda Fontes Trancoso
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Julien Yann Dutheil
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany.,Unité Mixte de Recherche 5554 Institut des Sciences de l'Evolution, CNRS, IRD, EPHE, Université de Montpellier, Montpellier, France
30
Dubreuil B, Matalon O, Levy ED. Protein Abundance Biases the Amino Acid Composition of Disordered Regions to Minimize Non-functional Interactions. J Mol Biol 2019; 431:4978-4992. [PMID: 31442477 PMCID: PMC6941228 DOI: 10.1016/j.jmb.2019.08.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 05/07/2019] [Revised: 08/07/2019] [Accepted: 08/10/2019] [Indexed: 02/07/2023]
Abstract
In eukaryotes, disordered regions cover up to 50% of proteomes and mediate fundamental cellular processes. In contrast to globular domains, where about half of the amino acids are buried in the protein interior, disordered regions show higher solvent accessibility, which makes them prone to engage in non-functional interactions. Such interactions are exacerbated by the law of mass action, prompting the question of how they are minimized in abundant proteins. We find that interaction propensity or "stickiness" of disordered regions negatively correlates with their cellular abundance, both in yeast and human. Strikingly, considering yeast proteins where a large fraction of the sequence is disordered, the correlation between stickiness and abundance reaches R=-0.55. Beyond this global amino-acid composition bias, we identify three rules by which amino-acid composition of disordered regions adjusts with high abundance. First, lysines are preferred over arginines, consistent with the latter amino acid being stickier than the former. Second, compensatory effects exist, whereby a sticky region can be tolerated if it is compensated by a distal non-sticky region. Third, such compensation requires a lower average stickiness at the same abundance when compared to a scenario where stickiness is homogeneous throughout the sequence. We validate these rules experimentally, employing them as different strategies to rescue an otherwise sticky protein fragment from aggregation. Our results highlight that non-functional interactions represent a significant constraint in cellular systems and reveal simple rules by which protein sequences adapt to that constraint. Data from this work are deposited in Figshare, at https://doi.org/10.6084/m9.figshare.8068937.v3.
Affiliation(s)
- Benjamin Dubreuil
- Department of Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Or Matalon
- Department of Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Emmanuel D Levy
- Department of Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel.
31
Li G, Li M, Peng W, Li Y, Pan Y, Wang J. A novel extended Pareto Optimality Consensus model for predicting essential proteins. J Theor Biol 2019; 480:141-149. [PMID: 31398315 DOI: 10.1016/j.jtbi.2019.08.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/07/2019] [Revised: 08/02/2019] [Accepted: 08/06/2019] [Indexed: 12/11/2022]
Abstract
Essential proteins have vital functions: when they are destroyed, cells die or stop reproducing. It is therefore very important to identify essential proteins among the large number of other proteins. Because biological experimental methods are time-consuming, expensive, and inefficient, computational methods have become increasingly popular for recognizing them. Early methods relied mainly on protein-protein interaction (PPI) information, which limited their discovery capacity, and researchers have since improved prediction accuracy by fusing multiple sources of information. Essential proteins tend to be important and conserved during evolution, their neighbors in PPI networks are often essential as well, and PPI data contain many false positives; whether a protein is essential can therefore be assessed from the importance of the protein itself, the relevance of its neighbors, and the reliability of its PPIs. The importance of neighbors and the reliability of PPIs can be further integrated into a neighborhood feature. In this study, orthologous information, the edge-clustering coefficient, and gene expression information are used to measure the importance of a protein itself, the importance of its neighbors, and the reliability of PPIs, respectively. A novel extended POC model, named E_POC, is then proposed to fuse this information and construct a weighted PPI network, in which proteins ranked highly according to their weights are treated as candidate essential proteins. E_POC outperforms existing classical methods on S. cerevisiae and E. coli data.
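The edge-clustering coefficient (ECC) mentioned here is commonly defined as the number of triangles through an edge divided by the most it could hold, ECC(u, v) = z(u, v) / min(d(u) - 1, d(v) - 1), where z is the number of shared neighbors and d is degree. The Python sketch below assumes that definition and an undirected, unweighted network; the function names are hypothetical, and it does not reproduce the gene-expression weighting or the E_POC fusion described in the abstract.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_adjacency(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Undirected adjacency sets; self-interactions are ignored."""
    adj: Graph = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_clustering_coefficient(adj: Graph, u: Hashable, v: Hashable) -> float:
    """ECC(u, v) = z / min(deg(u) - 1, deg(v) - 1), with z = shared neighbors.

    Returns 0.0 when either endpoint has degree 1, where the ratio is undefined.
    """
    if v not in adj.get(u, set()):
        raise ValueError(f"{u!r} and {v!r} do not interact")
    z = len(adj[u] & adj[v])  # each shared neighbor closes one triangle on (u, v)
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return z / denom if denom > 0 else 0.0
```

In a hypothetical network with edges A-B, B-C, A-C and C-D, the edges inside the A-B-C triangle score 1.0, while the pendant edge C-D scores 0.0; returning 0.0 rather than raising for degree-1 endpoints is a design choice of this sketch.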
Affiliation(s)
- Gaoshi Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China; Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, Guangxi 541004, China.
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China.
- Wei Peng
- Computer Center/Faculty of Information Engineering and Automation of Kunming University of Science and Technology, Kunming, Yunnan 650093, China
- Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA.
- Yi Pan
- Department of Computer Science, Georgia State University, Atlanta, GA 30302-4110, USA.
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China.
32
Guo Y, Peng Z, Liu J, Yuan N, Wang Z, Du J. Systematic Comparisons of Positively Selected Genes between Gossypium arboreum and Gossypium raimondii Genomes. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190227151013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
Abstract
Background:
Studies of positively selected genes (PSGs) in microorganisms and mammals have provided insights into the dynamics of genome evolution and the genetic basis of differences between species by using whole-genome scans. Systematic investigations and comparisons of PSGs in plants, however, are still limited.
Objective:
A systematic comparison of PSGs between the genomes of two cotton species, Gossypium arboreum (G. arboreum) and G. raimondii, should help reveal molecular evolutionary differences in plants.
Methods:
Genome sequences of G. arboreum and G. raimondii were compared, including whole-genome duplication (WGD) events and genomic features such as gene number, gene length, codon bias index, evolutionary rate, number of expressed genes, and retention of duplicated copies.
Results:
Compared with G. raimondii, G. arboreum contained more PSGs, with smaller gene size and fewer expressed genes. In addition, the PSGs evolved at a higher rate of synonymous substitutions but were subject to lower selection pressure. The PSGs in G. arboreum also retained fewer duplicate gene copies than those in G. raimondii after a single WGD event involving Gossypium.
Conclusion:
These data indicate that PSGs in G. arboreum and G. raimondii differ not only in Ka/Ks but also in their evolutionary, structural, and expression properties, suggesting that the divergence of G. arboreum and G. raimondii was associated with differences in PSGs in terms of evolutionary rates, gene length, expression patterns, and WGD retention in Gossypium.
Affiliation(s)
- Yue Guo
- Provincial Key Laboratory of Agrobiology, Institute of Crop Germplasm and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Zhen Peng
- Provincial Key Laboratory of Agrobiology, Institute of Crop Germplasm and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Jing Liu
- Provincial Key Laboratory of Agrobiology, Institute of Crop Germplasm and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Na Yuan
- Provincial Key Laboratory of Agrobiology, Institute of Crop Germplasm and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Zhen Wang
- Provincial Key Laboratory of Agrobiology, Institute of Crop Germplasm and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Jianchang Du
- Provincial Key Laboratory of Agrobiology, Institute of Crop Germplasm and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
33
Razban RM. Protein Melting Temperature Cannot Fully Assess Whether Protein Folding Free Energy Underlies the Universal Abundance-Evolutionary Rate Correlation Seen in Proteins. Mol Biol Evol 2019; 36:1955-1963. [PMID: 31093676 PMCID: PMC6736436 DOI: 10.1093/molbev/msz119] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 02/07/2023]
Abstract
The protein misfolding avoidance hypothesis explains the universal negative correlation between protein abundance and sequence evolutionary rate across the proteome by identifying protein folding free energy (ΔG) as the confounding variable. Abundant proteins resist toxic misfolding events by being more stable, and more stable proteins evolve slower because their mutations are more destabilizing. Direct supporting evidence consists only of computer simulations. A study taking advantage of a recent experimental breakthrough in measuring protein stability proteome-wide through melting temperature (Tm) (Leuenberger et al. 2017), found weak misfolding avoidance hypothesis support for the Escherichia coli proteome, and no support for the Saccharomyces cerevisiae, Homo sapiens, and Thermus thermophilus proteomes (Plata and Vitkup 2018). I find that the nontrivial relationship between Tm and ΔG and inaccuracy in Tm measurements by Leuenberger et al. 2017 can be responsible for not observing strong positive abundance-Tm and strong negative Tm-evolutionary rate correlations.
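Why Tm alone cannot pin down ΔG can be illustrated with the standard two-state Gibbs-Helmholtz expression for the unfolding free energy; this is textbook thermodynamics, not necessarily the specific model used in the paper. Here ΔH_m is the unfolding enthalpy at the melting temperature T_m and ΔC_p is the heat-capacity change on unfolding, both assumed temperature-independent:

```latex
\Delta G_u(T) = \Delta H_m\left(1 - \frac{T}{T_m}\right)
              + \Delta C_p\left[(T - T_m) - T\ln\frac{T}{T_m}\right]
```

By construction ΔG_u(T_m) = 0, so T_m fixes where the stability curve crosses zero but not its height at physiological temperature; two proteins with the same T_m but different ΔH_m or ΔC_p can therefore differ in ΔG.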
Affiliation(s)
- Rostam M Razban
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA
34
Ghoneim DH, Zhang X, Brule CE, Mathews DH, Grayhack EJ. Conservation of location of several specific inhibitory codon pairs in the Saccharomyces sensu stricto yeasts reveals translational selection. Nucleic Acids Res 2019; 47:1164-1177. [PMID: 30576464 PMCID: PMC6379720 DOI: 10.1093/nar/gky1262] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/30/2018] [Revised: 11/19/2018] [Accepted: 12/06/2018] [Indexed: 12/30/2022]
Abstract
Synonymous codons provide redundancy in the genetic code that influences translation rates in many organisms, in which overall codon use is driven by selection for optimal codons. It is unresolved whether, or to what extent, translational selection drives use of suboptimal codons or codon pairs. In Saccharomyces cerevisiae, 17 specific inhibitory codon pairs, each composed of adjacent suboptimal codons, inhibit translation efficiency in a manner distinct from their constituent codons, and many are translated slowly in native genes. We show here that selection operates within Saccharomyces sensu stricto yeasts to conserve nine of these codon pairs at defined positions in genes. Conservation of these inhibitory codon pairs is significantly greater than expected, relative to conservation of their constituent codons, with seven pairs more highly conserved than any other synonymous pair. Conservation is strongly correlated with slow translation of the pairs. Conservation of suboptimal codon pairs extends to two related Candida species, fungi that diverged from Saccharomyces ∼270 million years ago, with an enrichment for codons decoded by I•A and U•G wobble in both Candida and Saccharomyces. Thus, conservation of inhibitory codon pairs strongly implies selection for slow translation at particular gene locations, executed by suboptimal codon pairs.
Affiliation(s)
- Dalia H Ghoneim
- Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642, USA.,Center for RNA Biology, University of Rochester, Rochester, NY 14642, USA
- Xiaoju Zhang
- Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642, USA.,Center for RNA Biology, University of Rochester, Rochester, NY 14642, USA
- Christina E Brule
- Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642, USA.,Center for RNA Biology, University of Rochester, Rochester, NY 14642, USA
- David H Mathews
- Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642, USA.,Center for RNA Biology, University of Rochester, Rochester, NY 14642, USA
- Elizabeth J Grayhack
- Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642, USA.,Center for RNA Biology, University of Rochester, Rochester, NY 14642, USA
35
Chiok KLR, Shah DH. Identification of common highly expressed genes of Salmonella Enteritidis by in silico prediction of gene expression and in vitro transcriptomic analysis. Poult Sci 2019; 98:2948-2963. [PMID: 30953073 DOI: 10.3382/ps/pez119] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/14/2018] [Accepted: 02/27/2019] [Indexed: 01/02/2023]
Abstract
Chickens are the reservoir host of Salmonella Enteritidis. Salmonella Enteritidis colonizes the gastro-intestinal tract of chickens and replicates within macrophages without causing clinically discernable illness. Persistence of S. Enteritidis in the hostile environments of intestinal tract and macrophages allows it to disseminate extra-intestinally to liver, spleen, and reproductive tract. Extra-intestinal dissemination into reproductive tract leads to contamination of internal contents of eggs, which is a major risk factor for human infection. Understanding the genes that contribute to S. Enteritidis persistence in the chicken host is central to elucidate the genetic basis of the unique pathobiology of this public health pathogen. The aim of this study was to identify a succinct set of genes associated with infection-relevant in vitro environments to provide a rational foundation for subsequent biologically-relevant research. We used in silico prediction of gene expression and RNA-seq technology to identify a core set of 73 S. Enteritidis genes that are consistently highly expressed in multiple S. Enteritidis strains cultured at avian physiologic temperature under conditions that represent intestinal and intracellular environments. These common highly expressed (CHX) genes encode proteins involved in bacterial metabolism, protein synthesis, cell-envelope biogenesis, stress response, and a few proteins with uncharacterized functions. Further studies are needed to dissect the contribution of these CHX genes to the pathobiology of S. Enteritidis in the avian host. Several of the CHX genes could serve as promising targets for studies towards the development of immunoprophylactic and novel therapeutic strategies to prevent colonization of chickens and their environment with S. Enteritidis.
Affiliation(s)
- Kim Lam R Chiok
- Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA 99164-7040
- Devendra H Shah
- Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA 99164-7040
36
Davydov II, Salamin N, Robinson-Rechavi M. Large-Scale Comparative Analysis of Codon Models Accounting for Protein and Nucleotide Selection. Mol Biol Evol 2019; 36:1316-1332. [PMID: 30847475 PMCID: PMC6526913 DOI: 10.1093/molbev/msz048] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 12/17/2022]
Abstract
There are numerous sources of variation in the rate of synonymous substitutions inside genes, such as direct selection on the nucleotide sequence, or mutation rate variation. Yet scans for positive selection rely on codon models that assume an effectively neutral synonymous substitution rate, constant across sites within each gene. Here we perform a large-scale comparison of approaches that incorporate codon substitution rate variation and propose our own simple yet effective modification of existing models. We find strong effects of substitution rate variation on positive selection inference. More than 70% of the genes detected by the classical branch-site model are presumably false positives caused by the incorrect assumption of a uniform synonymous substitution rate. We propose a new model which is strongly favored by the data while remaining computationally tractable. With the new model we can capture signatures of nucleotide-level selection acting on translation initiation and on splicing sites within the coding region. Finally, we show that rate variation is highest in highly recombining regions, and we propose that recombination and mutation rate variation, such as a high CpG mutation rate, are the two main sources of nucleotide rate variation. Although we detect fewer genes under positive selection in Drosophila than without rate variation, the genes that we detect contain a stronger signal of adaptation of dynein, which could be associated with Wolbachia infection. We provide software to perform positive selection analysis using the new model.
Affiliation(s)
- Iakov I Davydov
- Department of Computational Biology, Biophore, University of Lausanne, Lausanne, Switzerland.,Department of Ecology and Evolution, Biophore, University of Lausanne, Lausanne, Switzerland.,Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Nicolas Salamin
- Department of Computational Biology, Biophore, University of Lausanne, Lausanne, Switzerland.,Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marc Robinson-Rechavi
- Department of Ecology and Evolution, Biophore, University of Lausanne, Lausanne, Switzerland.,Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
Collapse
|
37
|
Bolívar P, Mugal CF, Rossi M, Nater A, Wang M, Dutoit L, Ellegren H. Biased Inference of Selection Due to GC-Biased Gene Conversion and the Rate of Protein Evolution in Flycatchers When Accounting for It. Mol Biol Evol 2019; 35:2475-2486. [PMID: 30085180 PMCID: PMC6188562 DOI: 10.1093/molbev/msy149] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 12/11/2022]
Abstract
The rate of recombination influences rates of protein evolution for at least two reasons: it alters the efficacy of selection due to linkage, and it shapes sequence evolution through GC-biased gene conversion (gBGC). We studied how recombination, via gBGC, affects inferences of selection in gene sequences using comparative genomic and population genomic data from the collared flycatcher (Ficedula albicollis). We separately analyzed different mutation categories ("strong-to-weak," "weak-to-strong," and GC-conservative changes) and found that gBGC affects the distribution of fitness effects of new mutations and causes the rate of adaptive evolution and the proportion of adaptive mutations among nonsynonymous substitutions to be underestimated by 22–33%. It also biases inferences of demographic history based on the site frequency spectrum. In light of this impact, we suggest that inferences of selection (and demography) in lineages with pronounced gBGC should be based on GC-conservative changes only. Doing so, we estimate that 10% of nonsynonymous mutations are effectively neutral and that 27% of nonsynonymous substitutions have been fixed by positive selection in the flycatcher lineage. We also find that gene expression level, sex bias in expression, and the number of protein–protein interactions, but not Hill–Robertson interference (HRI), are strong determinants of selective constraint and rate of adaptation in collared flycatcher genes. This study therefore illustrates the importance of disentangling the effects of different evolutionary forces and genetic factors when interpreting sequence data and inferring the role of natural selection in DNA sequence evolution.
Affiliation(s)
- Paulina Bolívar
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Carina F Mugal
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Matteo Rossi
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
  - Department of Biology II, Faculty of Biology, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
- Alexander Nater
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
  - Chair in Zoology and Evolutionary Biology, Department of Biology, University of Konstanz, Konstanz, Germany
- Mi Wang
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Ludovic Dutoit
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Hans Ellegren
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden

38
Venev SV, Zeldovich KB. Thermophilic Adaptation in Prokaryotes Is Constrained by Metabolic Costs of Proteostasis. Mol Biol Evol 2019; 35:211-224. [PMID: 29106597 PMCID: PMC5850847 DOI: 10.1093/molbev/msx282] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/13/2022]
Abstract
Prokaryotes have evolved to thrive in an extremely diverse set of habitats, and their proteomes bear signatures of environmental conditions. Although correlations between amino acid usage and environmental temperature are well documented, understanding of the mechanisms of thermal adaptation remains incomplete. Here, we couple the energetic costs of protein folding and protein homeostasis to build a microscopic model explaining both the overall amino acid composition and its temperature trends. Low biosynthesis costs lead to low diversity of physical interactions between amino acid residues, which in turn makes proteins less stable and drives up chaperone activity to maintain appropriate levels of folded, functional proteins. Assuming that the cost of chaperone activity is proportional to the fraction of unfolded client proteins, we simulated thermal adaptation of model proteins subject to minimization of the total cost of amino acid synthesis and chaperone activity. For the first time, we predicted both the proteome-average amino acid abundances and their temperature trends simultaneously, and found strong correlations between model predictions and data from 402 bacterial and archaeal genomes. The energetic constraint on protein evolution is more apparent in highly expressed proteins, identified by the codon adaptation index. We found that in bacteria, highly expressed proteins are similar in composition to thermophilic ones, whereas in archaea no correlation between predicted expression level and thermostability was observed. At the same time, thermal adaptations of highly expressed proteins in bacteria and archaea are nearly identical, suggesting that universal energetic constraints prevail over the phylogenetic differences between these domains of life.
Affiliation(s)
- Sergey V Venev
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation St, Worcester, MA
- Konstantin B Zeldovich
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation St, Worcester, MA

39
Salvador-Martínez I, Coronado-Zamora M, Castellano D, Barbadilla A, Salazar-Ciudad I. Mapping Selection within Drosophila melanogaster Embryo's Anatomy. Mol Biol Evol 2019; 35:66-79. [PMID: 29040697 DOI: 10.1093/molbev/msx266] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/21/2022]
Abstract
We present a survey of selection across Drosophila melanogaster embryonic anatomy. Our approach integrates genomic variation, spatial gene expression patterns, and development with the aim of mapping adaptation over the entire embryo's anatomy. Our adaptation map is based on analyzing spatial gene expression information for 5,969 genes (from text-based annotations of in situ hybridization data taken directly from the BDGP database, Tomancak et al. 2007) and the polymorphism and divergence in these genes (from the DGRP project, Mackay et al. 2012). The proportions of nonsynonymous substitutions that are adaptive, neutral, or slightly deleterious are estimated for the set of genes expressed in each embryonic anatomical structure using the distribution-of-fitness-effects alpha (DFE-alpha) method (Eyre-Walker and Keightley 2009). This method is a robust derivative of the McDonald and Kreitman test (McDonald and Kreitman 1991). We also explore whether different anatomical structures differ in the phylogenetic age, codon usage, or expression bias of the genes they express, and whether genes expressed in many anatomical structures show more adaptive substitutions than other genes. We found that: 1) most of the digestive system and ectoderm-derived structures are under selective constraint, 2) the germ line and some specific mesoderm-derived structures show high rates of adaptive substitution, and 3) genes that are expressed in a small number of anatomical structures show higher expression bias, lower phylogenetic ages, and less constraint.
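The McDonald and Kreitman (MK) test is the framework that DFE-alpha builds on. The sketch below computes only the classic, uncorrected MK-based estimate of α, the proportion of nonsynonymous substitutions fixed by positive selection:

α = 1 - (Ds·Pn)/(Dn·Ps)

Here Dn and Ds are the nonsynonymous and synonymous fixed differences, and Pn and Ps are the nonsynonymous and synonymous polymorphisms. This is not the DFE-alpha method itself, which additionally models the distribution of fitness effects to correct for slightly deleterious polymorphisms. The function name `mk_alpha` and the counts in the example are hypothetical.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Classic (uncorrected) McDonald-Kreitman estimate of alpha.

    dn, ds: nonsynonymous / synonymous fixed differences (divergence)
    pn, ps: nonsynonymous / synonymous polymorphisms
    Assumption: slightly deleterious polymorphisms are ignored, which
    DFE-alpha is designed to correct for.
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive (they appear in the denominator)")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts for one gene set.
print(mk_alpha(dn=10, ds=10, pn=5, ps=10))  # 0.5
```

A positive α means that nonsynonymous changes are overrepresented among fixed differences relative to polymorphisms, the signature that the MK approach attributes to adaptive substitution.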
Affiliation(s)
- Irepan Salvador-Martínez
  - Evo-devo Helsinki Community, Centre of Excellence in Experimental and Computational Developmental Biology, Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Marta Coronado-Zamora
  - Departament de Genètica i de Microbiologia, Genomics, Bioinformatics and Evolution, Departament de Genètica i Microbiologia, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- David Castellano
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Antonio Barbadilla
  - Departament de Genètica i de Microbiologia, Genomics, Bioinformatics and Evolution, Departament de Genètica i Microbiologia, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Isaac Salazar-Ciudad
  - Evo-devo Helsinki Community, Centre of Excellence in Experimental and Computational Developmental Biology, Institute of Biotechnology, University of Helsinki, Helsinki, Finland
  - Departament de Genètica i de Microbiologia, Genomics, Bioinformatics and Evolution, Departament de Genètica i Microbiologia, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain

40
Pervasive population genomic consequences of genome duplication in Arabidopsis arenosa. Nat Ecol Evol 2019; 3:457-468. [DOI: 10.1038/s41559-019-0807-4] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Received: 08/09/2018] [Accepted: 01/10/2019] [Indexed: 12/30/2022]

41
Gislason AS, Turner K, Domaratzki M, Cardona ST. Comparative analysis of the Burkholderia cenocepacia K56-2 essential genome reveals cell envelope functions that are uniquely required for survival in species of the genus Burkholderia. Microb Genom 2019; 3. [PMID: 29208119 PMCID: PMC5729917 DOI: 10.1099/mgen.0.000140] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/15/2022]
Abstract
Burkholderia cenocepacia K56-2 belongs to the Burkholderia cepacia complex, a group of Gram-negative opportunistic pathogens that have large and dynamic genomes. In this work, we identified the essential genome of B. cenocepacia K56-2 using high-density transposon mutagenesis and insertion site sequencing (Tn-seq circle). We constructed a library of one million transposon mutants and identified transposon insertions at an average density of one insertion per 27 bp. The probability of gene essentiality was determined by comparing the insertion density per gene with the variance of neutral datasets generated by Monte Carlo simulations. Five hundred and eight genes were not significantly disrupted, suggesting that these genes are essential for survival in rich, undefined medium. Comparison of the B. cenocepacia K56-2 essential genome with that of the closely related B. cenocepacia J2315 revealed only partial overlap, suggesting that some essential genes are strain-specific. Furthermore, 158 essential genes were conserved across B. cenocepacia and two species belonging to the Burkholderia pseudomallei complex, B. pseudomallei K96243 and Burkholderia thailandensis E264. Porins (including OpcC), the lysophospholipid transporter LplT, and a protein involved in the modification of lipid A with aminoarabinose were found to be essential in Burkholderia genomes but not in other bacterial essential genomes identified so far. Our results highlight the existence of cell envelope processes that are uniquely essential in those species of the genus Burkholderia whose essential genomes have been identified by Tn-seq.
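The comparison against neutral, simulated datasets can be illustrated with a small Monte Carlo sketch. For illustration only, the sketch assumes two things:

- insertions fall uniformly at random along the genome;
- a gene is a candidate essential gene when genes with the same coordinates in the simulations rarely carry as few insertions as observed.

The published analysis compared insertion density against the variance of its neutral datasets. Its exact statistic and thresholds are not reproduced here. The function `essentiality_pvalues` and all coordinates and counts are hypothetical.

```python
import bisect
import random


def essentiality_pvalues(genes, observed_counts, genome_length,
                         total_insertions, n_sims=1000, seed=0):
    """Hypothetical Monte Carlo test for genes with too few transposon insertions.

    genes: sorted, non-overlapping (start, end) half-open intervals
    observed_counts: observed insertions per gene, in the same order as genes
    Returns an empirical lower-tail p-value per gene: how often a gene receives
    as few insertions (or fewer) under uniform random insertion.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    starts = [start for start, _ in genes]
    at_most_observed = [0] * len(genes)

    for _ in range(n_sims):
        sim_counts = [0] * len(genes)
        # Neutral model (assumption): every genome position is equally likely.
        for _ in range(total_insertions):
            pos = rng.randrange(genome_length)
            i = bisect.bisect_right(starts, pos) - 1  # last gene starting <= pos
            if i >= 0 and pos < genes[i][1]:
                sim_counts[i] += 1
        for i, count in enumerate(sim_counts):
            if count <= observed_counts[i]:
                at_most_observed[i] += 1

    # Add-one correction keeps p-values strictly positive.
    return [(k + 1) / (n_sims + 1) for k in at_most_observed]


# Hypothetical 10 kb genome with two 1 kb genes and ~1 insertion per 27 bp.
genes = [(0, 1_000), (5_000, 6_000)]
pvals = essentiality_pvalues(genes, observed_counts=[0, 40],
                             genome_length=10_000, total_insertions=370,
                             n_sims=200)
print(pvals)
```

At roughly one insertion per 27 bp, a 1 kb gene is expected to carry about 37 insertions. A gene with none therefore receives a very small p-value, while one with 40 does not.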
Affiliation(s)
- April S Gislason
  - Department of Microbiology, University of Manitoba, Winnipeg, MB, R3T 2N2, Canada
- Keith Turner
  - Monsanto Company, 700 Chesterfield Parkway W, Chesterfield, MO, 63017, USA
- Mike Domaratzki
  - Department of Computer Science, University of Manitoba, Winnipeg, R3T 2N2, Canada
- Silvia T Cardona
  - Department of Medical Microbiology & Infectious Diseases, University of Manitoba, Winnipeg, MB, R3E 0J9, Canada

42
Ferrada E. The Site-Specific Amino Acid Preferences of Homologous Proteins Depend on Sequence Divergence. Genome Biol Evol 2019; 11:121-135. [PMID: 30496400 PMCID: PMC6326188 DOI: 10.1093/gbe/evy261] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Accepted: 11/26/2018] [Indexed: 12/20/2022]
Abstract
The propensity of protein sites to be occupied by each of the 20 amino acids is known as site-specific amino acid preferences (SSAP). Under the assumption that SSAP are conserved among homologs, they can be used to parameterize evolutionary models for the reconstruction of accurate phylogenetic trees. However, simulations and experimental studies have not been able to fully assess the relative conservation of SSAP as a function of sequence divergence between protein homologs. Here, we implement a computational procedure to predict the SSAP of proteins based on the changes in thermodynamic stability caused by mutation. An advantage of this computational approach is that it allows us to interrogate a large and unbiased sample of homologous proteins, over the entire spectrum of sequence divergence, and under selection for the same molecular trait. We show that computational predictions have reproducibilities that resemble those obtained in experimental replicates, and can largely recapitulate the SSAP observed in a large-scale mutagenesis experiment. Our results support recent experimental reports on the conservation of SSAP between related homologs: at sequence distances below 40%, the fraction of sites with different SSAP increases slowly, reaching up to 15%. However, even under the sole contribution of thermodynamic stability, our conservative approach identifies up to 30% of sites as significantly different between divergent homologs. We show that this relation holds for homologs of diverse sizes and structural classes. Analyses of residue contact networks suggest that an important determinant of these differences is the increasing accumulation of structural deviations that results from sequence divergence.
Affiliation(s)
- Evandro Ferrada
  - Center for Genomics and Bioinformatics, Faculty of Science, Universidad Mayor, Camino La Pirámide 5750, Huechuraba, 8580745, Santiago, Chile

43
Assis R. Lineage-Specific Expression Divergence in Grasses Is Associated with Male Reproduction, Host-Pathogen Defense, and Domestication. Genome Biol Evol 2019; 11:207-219. [PMID: 30398650 PMCID: PMC6331041 DOI: 10.1093/gbe/evy245] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Accepted: 11/03/2018] [Indexed: 02/02/2023]
Abstract
Poaceae (grasses) is an agriculturally important and widely distributed family of plants with extraordinary phenotypic diversity, much of which was generated under recent lineage-specific evolution. Yet, little is known about the genes and functional modules involved in the lineage-specific divergence of grasses. Here, I address this question on a genome-wide scale by applying a novel branch-based statistic of lineage-specific expression divergence, LED, to RNA-seq data from nine tissues of the wild grass Brachypodium distachyon and its domesticated relatives Oryza sativa japonica (rice) and Sorghum bicolor (sorghum). I find that LED is generally smallest in B. distachyon and largest in O. sativa japonica, which underwent domestication earlier than S. bicolor, supporting the hypothesis that domestication may increase the rate of lineage-specific expression divergence in grasses. Moreover, in all three species, LED is positively correlated with protein-coding sequence divergence and tissue specificity, and negatively correlated with network connectivity. Further analysis reveals that genes with large LED are often primarily expressed in anther, implicating lineage-specific expression divergence in the evolution of male reproductive phenotypes. Gene ontology enrichment analysis also identifies an overrepresentation of terms related to male reproduction in the two domesticated grasses, as well as to those involved in host-pathogen defense in all three species. Last, examinations of genes with the largest LED reveal that their lineage-specific expression divergence may have contributed to antimicrobial functions in B. distachyon, to enhanced adaptation and yield during domestication in O. sativa japonica, and to defense against a widespread and devastating fungal pathogen in S. bicolor. 
Together, these findings suggest that lineage-specific expression divergence in grasses may increase under domestication and preferentially target rapidly evolving genes involved in male reproduction, host-pathogen defense, and the origin of domesticated phenotypes.
Affiliation(s)
- Raquel Assis
  - Department of Biology, Pennsylvania State University, University Park

44
Dong C, Jin YT, Hua HL, Wen QF, Luo S, Zheng WX, Guo FB. Comprehensive review of the identification of essential genes using computational methods: focusing on feature implementation and assessment. Brief Bioinform 2018; 21:171-181. [PMID: 30496347 DOI: 10.1093/bib/bby116] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 06/13/2018] [Revised: 11/01/2018] [Accepted: 11/02/2018] [Indexed: 02/06/2023]
Abstract
Essential genes have attracted increasing attention in recent years because of their important functions in organisms. Among the methods used to identify essential genes, accurate and efficient computational methods can compensate for the deficiencies of expensive and time-consuming experimental technologies. In this review, we have collected studies on essential gene prediction in prokaryotes and eukaryotes and summarized the five predominant types of features used in these studies: evolutionary conservation, domain information, network topology, sequence composition and expression level. We have described how to implement useful forms of these features and evaluated their performance on data from Escherichia coli MG1655, Bacillus subtilis 168 and human. The prerequisites and applicable range of these features are also described. In addition, we have investigated the techniques used to weight features in various models. To assist researchers in the field, we point to two freely accessible online tools that can be used directly to predict gene essentiality in prokaryotes and humans. This article provides a simple guide to the identification of essential genes in prokaryotes and eukaryotes.
Affiliation(s)
- Chuan Dong
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Yan-Ting Jin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Hong-Li Hua
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Qing-Feng Wen
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Sen Luo
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Wen-Xin Zheng
  - School of Biomedical Engineering, Capital Medical University, Beijing, China
- Feng-Biao Guo
  - School of Life Science and Technology, Center for Informational Biology, Intelligent Learning Institute for Science and Application, University of Electronic Science and Technology of China, Chengdu, China

45
Aguilar-Rodríguez J, Wagner A. Metabolic Determinants of Enzyme Evolution in a Genome-Scale Bacterial Metabolic Network. Genome Biol Evol 2018; 10:3076-3088. [PMID: 30351420 PMCID: PMC6257574 DOI: 10.1093/gbe/evy234] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Accepted: 10/22/2018] [Indexed: 11/12/2022]
Abstract
Different genes and proteins evolve at very different rates. Identifying the factors that explain these differences is an important aspect of research in molecular evolution. One such factor is the role a protein plays in a large molecular network. Here, we analyze the evolutionary rates of enzyme-coding genes in the genome-scale metabolic network of Escherichia coli to find the evolutionary constraints imposed by the structure and function of this complex metabolic system. Central and highly connected enzymes appear to evolve more slowly than less connected enzymes, but we find that they do so as a by-product of their high abundance, and not because of their position in the metabolic network. In contrast, enzymes catalyzing reactions with high metabolic flux (high substrate-to-product conversion rates) evolve slowly even after we account for their abundance. Moreover, enzymes catalyzing reactions that are difficult to bypass through alternative pathways, such that they are essential in many different genetic backgrounds, also evolve more slowly. Our analyses show that an enzyme's role in the function of a metabolic network affects its evolution more than its place in the network's structure. They highlight the value of a system-level perspective for studies of molecular evolution.
Affiliation(s)
- José Aguilar-Rodríguez
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Department of Biology, Stanford University, Stanford, CA and Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA
- Andreas Wagner
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - The Santa Fe Institute, Santa Fe, New Mexico

46
Marek A, Tomala K. The Contribution of Purifying Selection, Linkage, and Mutation Bias to the Negative Correlation between Gene Expression and Polymorphism Density in Yeast Populations. Genome Biol Evol 2018; 10:2986-2996. [PMID: 30321329 PMCID: PMC6250307 DOI: 10.1093/gbe/evy225] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Accepted: 10/10/2018] [Indexed: 11/13/2022]
Abstract
The negative correlation between the rate of protein evolution and the expression level of a gene has been recognized as a universal law of evolutionary biology (Koonin 2011). In our study, we apply a population-based approach to systematically investigate the relative importance of unequal mutation rates, linkage, and selection in the origin of the expression-polymorphism anticorrelation. We analyzed the DNA sequences of protein-coding genes from 24 Saccharomyces cerevisiae and 58 Schizosaccharomyces pombe strains. We found that highly expressed genes had substantially fewer polymorphic sites than genes transcribed less extensively. This expression-dependent reduction was especially strong at nonsynonymous sites, although it was also present at synonymous sites and in untranslated regions, both upstream and downstream of a gene. Most importantly, no such trend was found in introns. We used these observations, together with analyses of site frequency spectra and data from mutation accumulation experiments, to show that purifying selection acting on nonsynonymous sites was the main, but not the exclusive, factor impeding molecular evolution within the coding sequences of highly expressed genes. Linkage could not fully explain the observed pattern of polymorphism within untranslated regions and synonymous sites, although the contribution of selection acting directly on synonymous variants was extremely small. Finally, we found that the impact of mutational bias was largely negligible.
Affiliation(s)
- Agnieszka Marek
  - Institute of Environmental Sciences, Jagiellonian University, Krakow, Poland
- Katarzyna Tomala
  - Institute of Environmental Sciences, Jagiellonian University, Krakow, Poland

47
Alvarez-Ponce D, Feyertag F, Chakraborty S. Position Matters: Network Centrality Considerably Impacts Rates of Protein Evolution in the Human Protein-Protein Interaction Network. Genome Biol Evol 2018; 9:1742-1756. [PMID: 28854629 PMCID: PMC5570066 DOI: 10.1093/gbe/evx117] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Accepted: 07/01/2017] [Indexed: 02/06/2023]
Abstract
The proteins of any organism evolve at disparate rates. A long list of factors affecting rates of protein evolution has been identified. However, the relative importance of each factor in determining rates of protein evolution remains unresolved. The prevailing view is that evolutionary rates are determined predominantly by gene expression, and that other factors such as network centrality have only a marginal effect, if any. However, this view is largely based on analyses in yeasts, and accurately measuring the importance of the determinants of rates of protein evolution is complicated by the fact that the different factors are often correlated with each other, and by the relatively poor quality of available functional genomics data sets. Here, we use correlation, partial correlation and principal component regression analyses to measure the contributions of several factors to the variability of the rates of evolution of human proteins. For this purpose, we analyzed the entire human protein–protein interaction data set and the human signal transduction network, a network data set of exceptionally high quality, obtained by manual curation, which is expected to be virtually free from false positives. In contrast with the prevailing view, we observe that network centrality (measured as the number of physical and nonphysical interactions, betweenness, and closeness) has a considerable impact on rates of protein evolution. Surprisingly, the impact of centrality on rates of protein evolution seems to be comparable to, or according to some analyses even greater than, that of gene expression. Our observations seem to be independent of potentially confounding factors and of the limitations (biases and errors) of interactomic data sets.

48
Aflorei ED, Klapholz B, Chen C, Radian S, Dragu AN, Moderau N, Prodromou C, Ribeiro PS, Stanewsky R, Korbonits M. In vivo bioassay to test the pathogenicity of missense human AIP variants. J Med Genet 2018; 55:522-529. [PMID: 29632148 PMCID: PMC6073908 DOI: 10.1136/jmedgenet-2017-105191] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 12/01/2017] [Revised: 02/23/2018] [Accepted: 03/01/2018] [Indexed: 12/17/2022]
Abstract
Background: Heterozygous germline loss-of-function mutations in the aryl hydrocarbon receptor-interacting protein gene (AIP) predispose to childhood-onset pituitary tumours. Uncertainty about the pathogenicity of missense variants can complicate genetic counselling and family follow-up.
Objective: To develop an in vivo system to test the pathogenicity of human AIP mutations using the fruit fly Drosophila melanogaster.
Methods: We generated a null mutant of the Drosophila AIP orthologue CG1847, a gene located on the X chromosome, which displayed lethality at the larval stage in hemizygous knockout male mutants (CG1847exon1_3). We tested human missense variants of ‘unknown significance’, with ‘pathogenic’ variants as positive controls.
Results: We found that human AIP can functionally substitute for CG1847, as heterologous overexpression of human AIP rescued male CG1847exon1_3 lethality, while a truncated version of AIP did not restore viability. Flies harbouring patient-specific missense AIP variants (p.C238Y, p.I13N, p.W73R and p.G272D) failed to rescue CG1847exon1_3 mutants, while seven variants (p.R16H, p.Q164R, p.E293V, p.A299V, p.R304Q, p.R314W and p.R325Q) showed rescue, supporting a non-pathogenic role for these latter variants, consistent with prevalence and clinical data.
Conclusion: Our in vivo model represents a valuable tool to characterise putative disease-causing human AIP variants and to assist the genetic counselling and management of families carrying AIP variants.
Affiliation(s)
- Elena Daniela Aflorei
  - Centre for Endocrinology, Barts and the London School of Medicine, Queen Mary University of London, London, UK
- Benjamin Klapholz
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
- Chenghao Chen
  - Department of Cell and Developmental Biology, Division of Biosciences, Faculty of Life Sciences, University College London, London, UK
- Serban Radian
  - Centre for Endocrinology, Barts and the London School of Medicine, Queen Mary University of London, London, UK
  - Department of Endocrinology, C.I. Parhon National Institute of Endocrinology, Carol Davila University of Medicine and Pharmacy, Bucharest, Romania
- Anca Neluta Dragu
  - Centre for Endocrinology, Barts and the London School of Medicine, Queen Mary University of London, London, UK
  - Department of Cell and Developmental Biology, Division of Biosciences, Faculty of Life Sciences, University College London, London, UK
- Nina Moderau
  - Protein Dynamics and Cell Signalling Laboratory, Centre for Tumour Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
- Paulo S Ribeiro
  - Protein Dynamics and Cell Signalling Laboratory, Centre for Tumour Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
- Ralf Stanewsky
  - Department of Cell and Developmental Biology, Division of Biosciences, Faculty of Life Sciences, University College London, London, UK
  - Institute of Neuro- and Behavioural Biology, Westfälische Wilhelms University, Münster, Germany
- Márta Korbonits
  - Centre for Endocrinology, Barts and the London School of Medicine, Queen Mary University of London, London, UK

49
Schumacher J, Herlyn H. Correlates of evolutionary rates in the murine sperm proteome. BMC Evol Biol 2018; 18:35. [PMID: 29580206 PMCID: PMC5870804 DOI: 10.1186/s12862-018-1157-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 06/14/2017] [Accepted: 03/19/2018] [Indexed: 01/20/2023]
Abstract
Background: Protein-coding genes expressed in sperm evolve at different rates. To gain deeper insight into the factors underlying this heterogeneity, we examined the relative importance of a diverse set of previously described rate correlates in determining the evolution of murine sperm proteins.
Results: Using partial rank correlations, we detected several major rate indicators: phyletic gene age, number of protein-protein interactions, and survival essentiality emerged as particularly important rate correlates in murine sperm proteins. Tissue specificity, number of paralogs, and untranslated region lengths also correlate significantly with sperm genes' evolutionary rates, albeit to a lesser extent. Multifunctionality, coding sequence or average intron lengths, and mean expression level have insignificant or virtually no independent effects on evolutionary rates in murine sperm genes. Gene ontology enrichment analyses of three equally sized murine sperm protein groups, classified by evolutionary rate, indicate the strongest sperm-specific functional specialization in the most rapidly evolving gene class.
Conclusions: We propose a model according to which slowly evolving murine sperm proteins tend to be constrained by factors such as survival essentiality, network connectivity, and/or broad expression. In contrast, evolutionary change may arise especially in less constrained sperm proteins, which might, moreover, be prone to specialize in reproduction-related functions. Our results should be taken into account in future studies on rate variation of reproductive genes.
Affiliation(s)
- Julia Schumacher
  - Institute of Organismic and Molecular Evolution, Anthropology, Johannes Gutenberg University, Mainz, Germany
- Holger Herlyn
  - Institute of Organismic and Molecular Evolution, Anthropology, Johannes Gutenberg University, Mainz, Germany

50
Feyertag F, Alvarez-Ponce D. Disulfide Bonds Enable Accelerated Protein Evolution. Mol Biol Evol 2018; 34:1833-1837. [PMID: 28431018 DOI: 10.1093/molbev/msx135] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 02/07/2023]
Abstract
The different proteins of any proteome evolve at enormously different rates. What factors contribute to this variability, and to what extent, is still a largely open question. We hypothesized that disulfide bonds, by increasing protein stability, should make proteins' structures relatively independent of their amino acid sequences, thus acting as buffers of deleterious mutations and enabling accelerated sequence evolution. In agreement with this hypothesis, we observed that membrane proteins with disulfide bonds evolved 88% faster than those without disulfide bonds, and that extracellular proteins with disulfide bonds evolved 49% faster than those without disulfide bonds. In addition, genes encoding proteins with disulfide bonds exhibit an increased likelihood of showing signatures of positive selection. Multivariate analyses indicate that the trend is independent of a number of potentially confounding factors. The effect, however, is not observed among the longest proteins, which can become stabilized by mechanisms other than disulfide bonds.
Affiliation(s)
- Felix Feyertag
  - Department of Biology, University of Nevada-Reno, Reno, NV