1
Roberts M, Josephs EB. Weaker selection on genes with treatment-specific expression consistent with a limit on plasticity evolution in Arabidopsis thaliana. Genetics 2023; 224:iyad074. [PMID: 37094602 PMCID: PMC10484170 DOI: 10.1093/genetics/iyad074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 03/06/2023] [Accepted: 04/07/2023] [Indexed: 04/26/2023] Open
Abstract
Differential gene expression between environments often underlies phenotypic plasticity. However, environment-specific expression patterns are hypothesized to relax selection on genes, and thus limit plasticity evolution. We collated over 27 terabases of RNA-sequencing data on Arabidopsis thaliana from over 300 peer-reviewed studies and 200 treatment conditions to investigate this hypothesis. Consistent with relaxed selection, genes with more treatment-specific expression have higher levels of nucleotide diversity and divergence at nonsynonymous sites but lack stronger signals of positive selection. This result persisted even after controlling for expression level, gene length, GC content, the tissue specificity of expression, and technical variation between studies. Overall, our investigation supports the existence of a hypothesized trade-off between the environment specificity of a gene's expression and the strength of selection on said gene in A. thaliana. Future studies should leverage multiple genome-scale datasets to tease apart the contributions of many variables in limiting plasticity evolution.
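As a rough illustration of the nucleotide diversity statistic mentioned in this abstract, the sketch below computes the mean number of pairwise differences per site for a small alignment. The function name and the toy sequences are hypothetical, and the study's actual pipeline (including how nonsynonymous sites were classified) is not described here, so this shows only the basic definition.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Mean pairwise differences per site for aligned, equal-length sequences."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    # Count mismatched positions for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignment: three haplotypes, four sites.
toy_alignment = ["AAAA", "AAAT", "AATT"]
pi = nucleotide_diversity(toy_alignment)
```

Restricting the input to nonsynonymous sites only would give the site-class-specific diversity the abstract refers to, assuming such a site classification is available.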
Affiliation(s)
- Miles Roberts
- Genetics and Genome Sciences Program, Michigan State University, East Lansing, MI 48824, USA
- Emily B Josephs
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI 48824, USA
2
Saidi A, Hajibarat Z, Hajibarat Z. Phylogeny, gene structure and GATA genes expression in different tissues of Solanaceae species. Biocatalysis and Agricultural Biotechnology 2021. [DOI: 10.1016/j.bcab.2021.102015] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/22/2023]
3
Han Y, Luthe D. Identification and evolution analysis of the JAZ gene family in maize. BMC Genomics 2021; 22:256. [PMID: 33838665 PMCID: PMC8037931 DOI: 10.1186/s12864-021-07522-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 01/10/2021] [Accepted: 03/08/2021] [Indexed: 02/07/2023] Open
Abstract
Background Jasmonates (JAs) are important for plants to coordinate growth, reproduction, and defense responses. In JA signaling, jasmonate ZIM-domain (JAZ) proteins serve as master regulators at the initial stage of herbivore attack. Although JAZ genes have been discovered in many plant species, little in-depth characterization of JAZ gene expression has been reported in the agronomically important crop maize (Zea mays L.). Results In this study, 16 JAZ genes from the maize genome were identified and classified. Phylogenetic analyses of maize, rice, sorghum, Brachypodium, and Arabidopsis were performed using deduced protein sequences; a total of six clades were proposed, and conservation was observed within each group, such as similar gene exon/intron structures. Synteny analysis across four monocots indicated that these JAZ gene families had a common ancestor, and that duplication events in the maize genome, including genome-wide duplication (GWD), transposon, and/or tandem duplication, may have driven the expansion of the JAZ gene family. Strong purifying selection acted on all JAZ genes except those in group 4, which were under neutral selection. Further, we cloned three paralogous JAZ gene pairs from two maize inbreds differing in JA levels and insect resistance, and gene polymorphisms were observed between the two inbreds. Conclusions Here we analyzed the composition and evolution of JAZ genes in maize alongside three other monocot plants. Extensive phylogenetic and synteny analysis revealed the expansion and selection fate of maize JAZ. This is the first study comparing the difference between two inbreds, and we propose that genotype-specific JAZ gene expression might be present in maize plants. Since genetic redundancy in the JAZ gene family hampers our understanding of their role in response to specific elicitors, we hope this research will be pertinent to elucidating defensive responses in plants. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07522-4.
Affiliation(s)
- Yang Han
- The Pennsylvania State University, Plant Science, University Park, PA, USA
- Dawn Luthe
- The Pennsylvania State University, Plant Science, University Park, PA, USA.
4
Alvarez-Ponce D, Aguilar-Rodríguez J, Fares MA. Molecular Chaperones Accelerate the Evolution of Their Protein Clients in Yeast. Genome Biol Evol 2020; 11:2360-2375. [PMID: 31297528 PMCID: PMC6735891 DOI: 10.1093/gbe/evz147] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Accepted: 07/05/2019] [Indexed: 12/23/2022] Open
Abstract
Protein stability is a major constraint on protein evolution. Molecular chaperones, also known as heat-shock proteins, can relax this constraint and promote protein evolution by diminishing the deleterious effect of mutations on protein stability and folding. This effect, however, has only been established for a few chaperones. Here, we use a comprehensive chaperone–protein interaction network to study the effect of all yeast chaperones on the evolution of their protein substrates, that is, their clients. In particular, we analyze how yeast chaperones affect the evolutionary rates of their clients at two very different evolutionary time scales. We first study the effect of chaperone-mediated folding on protein evolution over the evolutionary divergence of Saccharomyces cerevisiae and S. paradoxus. We then test whether yeast chaperones have left a similar signature on the patterns of standing genetic variation found in modern wild and domesticated strains of S. cerevisiae. We find that genes encoding chaperone clients have diverged faster than genes encoding non-client proteins when controlling for their number of protein–protein interactions. We also find that genes encoding client proteins have accumulated more intraspecific genetic diversity than those encoding non-client proteins. In a number of multivariate analyses, controlling for other well-known factors that affect protein evolution, we find that chaperone dependence explains the largest fraction of the observed variance in the rate of evolution at both evolutionary time scales. Chaperones affecting rates of protein evolution mostly belong to two major chaperone families: Hsp70s and Hsp90s. Our analyses show that protein chaperones, by virtue of their ability to buffer destabilizing mutations and their role in modulating protein genotype–phenotype maps, have a considerable accelerating effect on protein evolution.
Affiliation(s)
- David Alvarez-Ponce
- Biology Department, University of Nevada, Reno; Instituto de Biología Molecular y Celular de Plantas, CSIC-UPV, Valencia, Spain
- José Aguilar-Rodríguez
- Department of Biology, Stanford University, CA; Department of Chemical and Systems Biology, Stanford University School of Medicine, CA
- Mario A Fares
- Instituto de Biología Molecular y Celular de Plantas, CSIC-UPV, Valencia, Spain; Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Ireland
5
Liu J, Sun Z, Mao X, Gerken H, Wang X, Yang W. Multiomics analysis reveals a distinct mechanism of oleaginousness in the emerging model alga Chromochloris zofingiensis. The Plant Journal: for Cell and Molecular Biology 2019; 98:745-758. [PMID: 30828893 DOI: 10.1111/tpj.14270] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 05/23/2018] [Revised: 12/24/2018] [Accepted: 01/28/2019] [Indexed: 05/03/2023]
Abstract
Chromochloris zofingiensis, notable for its capability to simultaneously synthesize triacylglycerol (TAG) and astaxanthin, is emerging as a leading candidate alga for production uses. To better understand the oleaginous mechanism of this alga, we conducted a multiomics analysis by systematically integrating time-resolved transcriptomes, lipidomes and metabolomes in response to nitrogen deprivation. The data analysis unraveled the distinct mechanism of TAG accumulation, which involved coordinated stimulation of multiple biological processes including supply of energy and reductants, carbon reallocation from protein and starch, and 'pushing' and 'pulling' carbon to TAG synthesis. Unlike the model alga Chlamydomonas, de novo fatty acid synthesis in C. zofingiensis was promoted, together with enhanced turnover of both glycolipids and phospholipids, supporting the drastic need of acyls for TAG assembly. Moreover, genomewide analysis identified many key functional enzymes and transcription factors that had engineering potential for TAG modulation. Two genes encoding glycerol-3-phosphate acyltransferase (GPAT), the first committed enzyme for TAG assembly, were found in the C. zofingiensis genome; in vivo functional characterization revealed that extrachloroplastic GPAT instead of chloroplastic GPAT played a central role in TAG synthesis. These findings illuminate distinct oleaginousness mechanisms in C. zofingiensis and pave the way towards rational manipulation of this alga to become an emerging model for trait improvements.
Affiliation(s)
- Jin Liu
- Laboratory for Algae Biotechnology & Innovation, College of Engineering, Peking University, Beijing, 100871, China
- Zheng Sun
- International Research Center for Marine Biosciences, Ministry of Science and Technology, Shanghai Ocean University, Shanghai, 201306, China
- Xuemei Mao
- Laboratory for Algae Biotechnology & Innovation, College of Engineering, Peking University, Beijing, 100871, China
- Henri Gerken
- School of Sustainable Engineering and the Built Environment, Arizona State University Polytechnic campus, Mesa, AZ, 85212, USA
- Xiaofei Wang
- Laboratory for Algae Biotechnology & Innovation, College of Engineering, Peking University, Beijing, 100871, China
- Wenqiang Yang
- Photosynthesis Research Center, Key Laboratory of Photobiology, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
6
Abstract
An attractive and long-standing hypothesis regarding the evolution of genes after duplication posits that the duplication event creates new evolutionary possibilities by releasing a copy of the gene from constraint. Apparent support was found in numerous analyses, particularly, the observation of higher rates of evolution in duplicated as compared with singleton genes. Could it, instead, be that more duplicable genes (owing to mutation, fixation, or retention biases) are intrinsically faster evolving? To uncouple the measurement of rates of evolution from the determination of duplicate or singleton status, we measure the rates of evolution in singleton genes in outgroup primate lineages but classify these genes as to whether they have duplicated or not in a crown group of great apes. We find that rates of evolution are higher in duplicable genes prior to the duplication event. In part this is owing to a negative correlation between coding sequence length and rate of evolution, coupled with a bias toward smaller genes being more duplicable. The effect is masked by difference in expression rate between duplicable genes and singletons. Additionally, in contradiction to the classical assumption, we find no convincing evidence for an increase in dN/dS after duplication, nor for rate asymmetry between duplicates. We conclude that high rates of evolution of duplicated genes are not solely a consequence of the duplication event, but are rather a predictor of duplicability. These results are consistent with a model in which successful gene duplication events in mammals are skewed toward events of minimal phenotypic impact.
Affiliation(s)
- Áine N O'Toole
- Department of Genetics, Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, Somerset, United Kingdom
- Aoife McLysaght
- Department of Genetics, Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
7
Alvarez-Ponce D, Feyertag F, Chakraborty S. Position Matters: Network Centrality Considerably Impacts Rates of Protein Evolution in the Human Protein-Protein Interaction Network. Genome Biol Evol 2018; 9:1742-1756. [PMID: 28854629 PMCID: PMC5570066 DOI: 10.1093/gbe/evx117] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Accepted: 07/01/2017] [Indexed: 02/06/2023] Open
Abstract
The proteins of any organism evolve at disparate rates. A long list of factors affecting rates of protein evolution have been identified. However, the relative importance of each factor in determining rates of protein evolution remains unresolved. The prevailing view is that evolutionary rates are dominantly determined by gene expression, and that other factors such as network centrality have only a marginal effect, if any. However, this view is largely based on analyses in yeasts, and accurately measuring the importance of the determinants of rates of protein evolution is complicated by the fact that the different factors are often correlated with each other, and by the relatively poor quality of available functional genomics data sets. Here, we use correlation, partial correlation and principal component regression analyses to measure the contributions of several factors to the variability of the rates of evolution of human proteins. For this purpose, we analyzed the entire human protein–protein interaction data set and the human signal transduction network—a network data set of exceptionally high quality, obtained by manual curation, which is expected to be virtually free from false positives. In contrast with the prevailing view, we observe that network centrality (measured as the number of physical and nonphysical interactions, betweenness, and closeness) has a considerable impact on rates of protein evolution. Surprisingly, the impact of centrality on rates of protein evolution seems to be comparable, or even superior according to some analyses, to that of gene expression. Our observations seem to be independent of potentially confounding factors and from the limitations (biases and errors) of interactomic data sets.
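The partial correlation approach named in this abstract can be sketched as follows, assuming a single control variable and the standard first-order formula. All variable names and the simulated values are hypothetical and are not taken from the study; the toy data are built so that a centrality-like variable keeps a direct association with a rate-like variable once an expression-like covariate is controlled for.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simulated (hypothetical) data: expression influences both centrality and
# rate, and centrality also has its own direct negative effect on rate.
random.seed(0)
expression = [random.gauss(0, 1) for _ in range(500)]
centrality = [0.8 * e + random.gauss(0, 1) for e in expression]
rate = [
    -1.0 * e - 0.5 * c + random.gauss(0, 1)
    for e, c in zip(expression, centrality)
]

raw_r = pearson(centrality, rate)
partial_r = partial_correlation(centrality, rate, expression)
```

Comparing `raw_r` with `partial_r` shows how much of the raw association is attributable to the shared covariate; controlling for several covariates at once would need a regression-based version rather than this single-covariate formula.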
8
Tine M. Evolutionary significance and diversification of the phosphoglucose isomerase genes in vertebrates. BMC Res Notes 2015; 8:799. [PMID: 26682538 PMCID: PMC4684624 DOI: 10.1186/s13104-015-1683-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/17/2015] [Accepted: 11/09/2015] [Indexed: 01/20/2024] Open
Abstract
Background Phosphoglucose isomerase (PGI) genes encode important multifunctional proteins whose evolution has, until now, not been well elucidated because of the limited number of completely sequenced genomes. Although the multifunctionality of this gene family has been considered as an original and innate characteristic, PGI genes may have acquired novel functions through changes in coding sequences and exon/intron structure, which are known to lead to functional divergence after gene duplication. A whole-genome comparative approach was used to estimate the rates of molecular evolution of this protein family. Results The results confirm the presence of two isoforms in teleost fishes and only one variant in all other vertebrates. Phylogenetic reconstructions grouped the PGI genes into five main groups: lungfishes/coelacanth/cartilaginous fishes, teleost fishes, amphibians, reptiles/birds and mammals, with the teleost group being subdivided into two subclades comprising PGI1 and PGI2. This PGI partitioning into groups is consistent with the synteny and molecular evolution results based on the estimation of the ratios of nonsynonymous to synonymous changes (Ka/Ks) and divergence rates between both PGI paralogs and orthologs. Teleost PGI2 shares more similarity with the variant found in all other vertebrates, suggesting that it has evolved less than PGI1 relative to the PGI of the common vertebrate ancestor. Conclusions The diversification of PGI genes into PGI1 and PGI2 is consistent with a teleost-specific duplication before the radiation of this lineage, and after its split from the other infraclasses of ray-finned fishes. The low average Ka/Ks ratios within teleost and mammalian lineages suggest that both PGI1 and PGI2 are functionally constrained by purifying selection and may, therefore, have the same functions. By contrast, the high average Ka/Ks ratios and divergence rates within reptiles and birds indicate that PGI may be involved in different functions. The synteny analyses show that the genomic region harbouring PGI genes has independently undergone genomic rearrangements in mammals versus the reptile/bird lineage in particular, which may have contributed to the actual functional diversification of this gene family. Electronic supplementary material The online version of this article (doi:10.1186/s13104-015-1683-x) contains supplementary material, which is available to authorized users.
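To make the Ka/Ks ratio used in this abstract concrete, here is a minimal sketch that applies the Jukes-Cantor correction to the proportions of nonsynonymous and synonymous differences, as in the final step of a Nei-Gojobori style estimate. The abstract does not state which estimator was used, so this choice is an assumption, and the function names and counts below are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from a proportion of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    return ka / ks  # undefined (division by zero) when Ks is 0


# Hypothetical counts for one pair of aligned coding sequences.
omega = ka_ks(nonsyn_diffs=6, nonsyn_sites=300, syn_diffs=20, syn_sites=100)
```

A ratio well below 1, as in this toy case, is conventionally read as purifying selection, while values near 1 are read as neutral evolution.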
Affiliation(s)
- Mbaye Tine
- Molecular Zoology Laboratory, Department of Zoology, University of Johannesburg, Auckland Park, 2006, South Africa; Genome Centre Cologne at MPI for Plant Breeding Research, 22 Carl-von-Linné-Weg 10, 50829, Cologne, Germany.
9
Pich I Roselló O, Kondrashov FA. Long-term asymmetrical acceleration of protein evolution after gene duplication. Genome Biol Evol 2014; 6:1949-55. [PMID: 25070510 PMCID: PMC4159008 DOI: 10.1093/gbe/evu159] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022] Open
Abstract
Rapid divergence of gene copies after duplication is thought to determine the fate of the copies and evolution of novel protein functions. However, data on how long the gene copies continue to experience an elevated rate of evolution remain scarce. Standard theory of gene duplications based on some level of genetic redundancy of gene copies predicts that the period of accelerated evolution must end relatively quickly. Using a maximum-likelihood approach we estimate preduplication, initial postduplication, and recent postduplication rates of evolution that occurred in the mammalian lineage. We find that both gene copies experience a similar in magnitude acceleration in their rate of evolution. The copy located in the original genomic position typically returns to the preduplication rates of evolution in a short period of time. The burst of faster evolution of the copy that is located in a new genomic position typically lasts longer. Furthermore, the fast-evolving copies on average continue to evolve faster than the preduplication rates far longer than predicted by standard theory of gene duplications. We hypothesize that the prolonged elevated rates of evolution are determined by functional properties that were acquired during, or soon after, the gene duplication event.
Affiliation(s)
- Oriol Pich I Roselló
- Facultat de Medicina, Universitat de Barcelona (UB), Spain; Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Fyodor A Kondrashov
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
10
Pegueroles C, Laurie S, Albà MM. Accelerated evolution after gene duplication: a time-dependent process affecting just one copy. Mol Biol Evol 2013; 30:1830-42. [PMID: 23625888 DOI: 10.1093/molbev/mst083] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.9] [Indexed: 12/19/2022] Open
Abstract
Gene duplication is widely regarded as a major mechanism modeling genome evolution and function. However, the mechanisms that drive the evolution of the two, initially redundant, gene copies are still ill defined. Many gene duplicates experience evolutionary rate acceleration, but the relative contribution of positive selection and random drift to the retention and subsequent evolution of gene duplicates, and for how long the molecular clock may be distorted by these processes, remains unclear. Focusing on rodent genes that duplicated before and after the mouse and rat split, we find significantly increased sequence divergence after duplication in only one of the copies, which in nearly all cases corresponds to the novel daughter copy, independent of the mechanism of duplication. We observe that the evolutionary rate of the accelerated copy, measured as the ratio of nonsynonymous to synonymous substitutions, is on average 5-fold higher in the period spanning 4-12 My after the duplication than it was before the duplication. This increase can be explained, at least in part, by the action of positive selection according to the results of the maximum likelihood-based branch-site test. Subsequently, the rate decelerates until purifying selection completely returns to preduplication levels. Reversion to the original rates has already been accomplished 40.5 My after the duplication event, corresponding to a genetic distance of about 0.28 synonymous substitutions per site. Differences in tissue gene expression patterns parallel those of substitution rates, reinforcing the role of neofunctionalization in explaining the evolution of young gene duplicates.
Affiliation(s)
- Cinta Pegueroles
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Universitat Pompeu Fabra (UPF), Barcelona, Spain
11
Genome-wide identification and divergent transcriptional expression of StAR-related lipid transfer (START) genes in teleosts. Gene 2013; 519:18-25. [DOI: 10.1016/j.gene.2013.01.058] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/12/2012] [Revised: 01/28/2013] [Accepted: 01/30/2013] [Indexed: 12/20/2022]
12
Chen FC, Liao BY, Pan CL, Lin HY, Chang AYF. Assessing determinants of exonic evolutionary rates in mammals. Mol Biol Evol 2012; 29:3121-9. [PMID: 22504521 DOI: 10.1093/molbev/mss116] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 12/17/2022] Open
Abstract
From studies investigating the differences in evolutionary rates between genes, gene compactness and gene expression level have been identified as important determinants of gene-level protein evolutionary rate, as represented by nonsynonymous to synonymous substitution rate (d(N)/d(S)) ratio. However, the causes of exon-level variances in d(N)/d(S) are less understood. Here, we use principal component regression to examine to what extent 13 exon features explain the variance in d(N), d(S), and the d(N)/d(S) ratio of human-rhesus macaque or human-mouse orthologous exons. The exon features were grouped into six functional categories: expression features, mRNA splicing features, structural-functional features, compactness features, exon duplicability, and other features, including G + C content and exon length. Although expression features are important for determining d(N) and d(N)/d(S) between exons of different genes, structural-functional features and splicing features explained more of the variance for exons of the same genes. Furthermore, we show that compactness features can explain only a relatively small percentage of variance in exon-level d(N) or d(N)/d(S) in either between-gene or within-gene comparison. By contrast, d(S) yielded inconsistent results in the human-mouse comparison and the human-rhesus macaque comparison. This inconsistency may suggest rapid evolutionary changes of the mutation landscape in mammals. Our results suggest that between-gene and within-gene variation in d(N)/d(S) (and d(N)) are driven by different evolutionary forces and that the role of mRNA splicing in causing the variation in evolutionary rates of coding sequences may be underappreciated.
Affiliation(s)
- Feng-Chi Chen
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan, Republic of China.
13
Corbi J, Dutheil JY, Damerval C, Tenaillon MI, Manicacci D. Accelerated evolution and coevolution drove the evolutionary history of AGPase sub-units during angiosperm radiation. Annals of Botany 2012; 109:693-708. [PMID: 22307567 PMCID: PMC3286274 DOI: 10.1093/aob/mcr303] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/15/2011] [Accepted: 11/07/2011] [Indexed: 05/10/2023]
Abstract
BACKGROUND AND AIMS ADP-glucose pyrophosphorylase (AGPase) is a key enzyme of starch biosynthesis. In the green plant lineage, it is composed of two large (LSU) and two small (SSU) sub-units encoded by paralogous genes, as a consequence of several rounds of duplication. First, our aim was to detect specific patterns of molecular evolution following duplication events and the divergence between monocotyledons and dicotyledons. Secondly, we investigated coevolution between amino acids both within and between sub-units. METHODS A phylogeny of each AGPase sub-unit was built using all gymnosperm and angiosperm sequences available in databases. Accelerated evolution along specific branches was tested using the ratio of the non-synonymous to the synonymous substitution rate. Coevolution between amino acids was investigated taking into account compensatory changes between co-substitutions. KEY RESULTS We showed that SSU paralogues evolved under high functional constraints during angiosperm radiation, with a significant level of coevolution between amino acids that participate in SSU major functions. In contrast, in the LSU paralogues, we identified residues under positive selection (1) following the first LSU duplication that gave rise to two paralogues mainly expressed in angiosperm source and sink tissues, respectively; and (2) following the emergence of grass-specific paralogues expressed in the endosperm. Finally, we found coevolution between residues that belong to the interaction domains of both sub-units. CONCLUSIONS Our results support the view that coevolution among amino acid residues, especially those lying in the interaction domain of each sub-unit, played an important role in AGPase evolution. First, within SSU, coevolution allowed compensating mutations in a highly constrained context. Secondly, the LSU paralogues probably acquired tissue-specific expression and regulatory properties via the coevolution between sub-unit interacting domains. 
Finally, the pattern we observed during LSU evolution is consistent with repeated sub-functionalization under 'Escape from Adaptive Conflict', a model rarely illustrated in the literature.
Affiliation(s)
- Jonathan Corbi
- CNRS, UMR 0320/UMR 8120 Génétique Végétale, Ferme du Moulon, F-91190 Gif sur Yvette, France
- Julien Y. Dutheil
- BiRC-Bioinformatics Research Center, Aarhus University, C.F. Møllers Alle 8, Building 1110, DK-8000 Århus C, Denmark
- Catherine Damerval
- CNRS, UMR 0320/UMR 8120 Génétique Végétale, Ferme du Moulon, F-91190 Gif sur Yvette, France
- Maud I. Tenaillon
- CNRS, UMR 0320/UMR 8120 Génétique Végétale, Ferme du Moulon, F-91190 Gif sur Yvette, France
- Domenica Manicacci
- Université Paris-Sud, UMR 0320/UMR 8120 Génétique Végétale, Ferme du Moulon, F-91190 Gif sur Yvette, France
14
Chen TW, Wu TH, Ng WV, Lin WC. Interrogation of alternative splicing events in duplicated genes during evolution. BMC Genomics 2011; 12 Suppl 3:S16. [PMID: 22369477 PMCID: PMC3333175 DOI: 10.1186/1471-2164-12-s3-s16] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/05/2022] Open
Abstract
Background Gene duplication provides resources for developing novel genes and new functions while retaining the original functions. In addition, alternative splicing could increase the complexity of expression at the transcriptome and proteome level without increasing the number of gene copies in the genome. Duplication and alternative splicing are thought to work together to provide the diverse functions or expression patterns for eukaryotes. Previously, it was believed that duplication and alternative splicing were negatively correlated and probably interchangeable. Results We look into the relationship between occurrence of alternative splicing and duplication at different times after duplication events. We found duplication and alternative splicing were indeed inversely correlated if only recently duplicated genes were considered, but they became positively correlated when we took ancient duplications into account. Specifically, for slightly or moderately duplicated genes with gene families containing 2-7 paralogs, genes were more likely to evolve alternative splicing and had on average a greater number of alternative splicing isoforms after long-term evolution compared to singleton genes. On the other hand, large gene families (containing at least 8 paralogs) had a lower proportion of alternative splicing, and fewer alternative splicing isoforms on average even when ancient duplicated genes were taken into consideration. We also found that duplicated genes with alternative splicing were under tighter evolutionary constraints compared to those without alternative splicing, and had an enrichment of genes that participate in molecular transducer activities. Conclusions We studied the association between occurrences of alternative splicing and gene duplication. Our results imply that there are key differences in functions and evolutionary constraints among singleton genes or duplicated genes with or without alternative splicing incidences. This implies that gene duplication and alternative splicing may have different functional significance in the evolution of speciation diversity.
Affiliation(s)
- Ting-Wen Chen
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
15
Bu L, Bergthorsson U, Katju V. Local synteny and codon usage contribute to asymmetric sequence divergence of Saccharomyces cerevisiae gene duplicates. BMC Evol Biol 2011; 11:279. [PMID: 21955875 PMCID: PMC3190396 DOI: 10.1186/1471-2148-11-279] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/06/2011] [Accepted: 09/28/2011] [Indexed: 11/10/2022] Open
Abstract
Background Duplicated genes frequently experience asymmetric rates of sequence evolution. Relaxed selective constraints and positive selection have both been invoked to explain the observation that one paralog within a gene-duplicate pair exhibits an accelerated rate of sequence evolution. In the majority of studies where asymmetric divergence has been established, there is no indication as to which gene copy, ancestral or derived, is evolving more rapidly. In this study we investigated the effect of local synteny (gene-neighborhood conservation) and codon usage on the sequence evolution of gene duplicates in the S. cerevisiae genome. We further distinguish the gene duplicates into those that originated from a whole-genome duplication (WGD) event (ohnologs) versus small-scale duplications (SSD) to determine if there exist any differences in their patterns of sequence evolution. Results For SSD pairs, the derived copy evolves faster than the ancestral copy. However, there is no relationship between rate asymmetry and synteny conservation (ancestral-like versus derived-like) in ohnologs. mRNA abundance and optimal codon usage as measured by the CAI is lower in the derived SSD copies relative to ancestral paralogs. Moreover, in the case of ohnologs, the faster-evolving copy has lower CAI and lowered expression. Conclusions Together, these results suggest that relaxation of selection for codon usage and gene expression contribute to rate asymmetry in the evolution of duplicated genes and that in SSD pairs, the relaxation of selection stems from the loss of ancestral regulatory information in the derived copy.
Affiliation(s)
- Lijing Bu
- Department of Biology, University of New Mexico, Albuquerque, NM 87131, USA
16
Forslund K, Schreiber F, Thanintorn N, Sonnhammer ELL. OrthoDisease: tracking disease gene orthologs across 100 species. Brief Bioinform 2011; 12:463-73. [PMID: 21565935 DOI: 10.1093/bib/bbr024] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 12/14/2022] Open
Abstract
Orthology is one of the most important tools available to modern biology, as it allows making inferences from easily studied model systems to much less tractable systems of interest, such as ourselves. This becomes important not least in the study of genetic diseases. We here review work on the orthology of disease-associated genes and also present an updated version of the InParanoid-based disease orthology database and web site OrthoDisease, with 14-fold increased species coverage since the previous version. Using this resource, we survey the taxonomic distribution of orthologs of human genes involved in different disease categories. The hypothesis that paralogs can mask the effect of deleterious mutations predicts that known heritable disease genes should have fewer close paralogs. We found large-scale support for this hypothesis as significantly fewer duplications were observed for disease genes in the OrthoDisease ortholog groups.
Affiliation(s)
- Kristoffer Forslund
- Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Albanova, 10691 Stockholm, Sweden
17
Yang L, Gaut BS. Factors that contribute to variation in evolutionary rate among Arabidopsis genes. Mol Biol Evol 2011; 28:2359-69. [PMID: 21389272 DOI: 10.1093/molbev/msr058] [Citation(s) in RCA: 136] [Impact Index Per Article: 10.5] [Indexed: 12/23/2022] Open
Abstract
Surprisingly, few studies have described evolutionary rate variation among plant nuclear genes, with little investigation of the causes of rate variation. Here, we describe evolutionary rates for 11,492 ortholog pairs between Arabidopsis thaliana and A. lyrata and investigate possible contributors to rate variation among these genes. Rates of evolution at synonymous sites vary along chromosomes, suggesting that mutation rates vary on genomic scales, perhaps as a function of recombination rate. Rates of evolution at nonsynonymous sites correlate most strongly with expression patterns, but they also vary as to whether a gene is duplicated and retained after a whole-genome duplication (WGD) event. WGD genes evolve more slowly, on average, than nonduplicated genes and non-WGD duplicates. We hypothesize that levels and patterns of expression are not only the major determinants that explain nonsynonymous rate variation among genes but also a critical determinant of gene retention after duplication.
Affiliation(s)
- Liang Yang
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, USA
18
Chen FC, Chen CJ, Li WH, Chuang TJ. Gene family size conservation is a good indicator of evolutionary rates. Mol Biol Evol 2010; 27:1750-8. [PMID: 20194423 PMCID: PMC2908708 DOI: 10.1093/molbev/msq055] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 01/05/2023] Open
Abstract
The evolution of duplicate genes has been a topic of broad interest. Here, we propose that the conservation of gene family size is a good indicator of the rate of sequence evolution and some other biological properties. By comparing the human–chimpanzee–macaque orthologous gene families with and without family size conservation, we demonstrate that genes with family size conservation evolve more slowly than those without family size conservation. Our results further demonstrate that both family expansion and contraction events may accelerate gene evolution, resulting in elevated evolutionary rates in the genes without family size conservation. In addition, we show that the duplicate genes with family size conservation evolve significantly more slowly than those without family size conservation. Interestingly, the median evolutionary rate of singletons falls in between those of the above two types of duplicate gene families. Our results thus suggest that the controversy on whether duplicate genes evolve more slowly than singletons can be resolved when family size conservation is taken into consideration. Furthermore, we also observe that duplicate genes with family size conservation have the highest level of gene expression/expression breadth, the highest proportion of essential genes, and the lowest gene compactness, followed by singletons and then by duplicate genes without family size conservation. Such a trend accords well with our observations of evolutionary rates. Our results thus point to the importance of family size conservation in the evolution of duplicate genes.
Affiliation(s)
- Feng-Chi Chen
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Miaoli County, Taiwan
19
Farré D, Albà MM. Heterogeneous patterns of gene-expression diversification in mammalian gene duplicates. Mol Biol Evol 2009; 27:325-35. [PMID: 19822635 DOI: 10.1093/molbev/msp242] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 12/29/2022] Open
Abstract
Gene duplication is a major mechanism for molecular evolutionary innovation. Young gene duplicates typically exhibit elevated rates of protein evolution and, according to a number of recent studies, increased expression divergence. However, the nature of these changes is still poorly understood. To gain novel insights into the functional consequences of gene duplication, we have undertaken an in-depth analysis of a large data set of gene families containing primate- and/or rodent-specific gene duplicates. We have found a clear tendency toward an increase in protein, promoter, and expression divergence with increasing number of duplication events undergone by each gene since the human-mouse split. In addition, gene duplication is significantly associated with a reduction in expression breadth and intensity. Interestingly, it is possible to identify three main groups regarding the evolution of gene expression following gene duplication. The first group, which comprises around 25% of the families, shows patterns compatible with tissue-expression partitioning. The second and largest group, comprising 33-53% of the families, shows broad expression of one of the gene copies and reduced, overlapping, expression of the other copy or copies. This can be attributed, in most cases, to loss of expression in several tissues of one or more gene copies. Finally, a substantial number of families, 19-35%, maintain a very high level of tissue-expression overlap (>0.8) after tens of millions of years of evolution. These families may have been subject to selection for increased gene dosage.
20
Mallick S, Gnerre S, Muller P, Reich D. The difficulty of avoiding false positives in genome scans for natural selection. Genome Res 2009; 19:922-33. [PMID: 19411606 DOI: 10.1101/gr.086512.108] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.5] [Indexed: 11/25/2022]
Abstract
Several studies have found evidence for more positive selection on the chimpanzee lineage compared with the human lineage since the two species split. A potential concern, however, is that these findings may simply reflect artifacts of the data: inaccuracies in the underlying chimpanzee genome sequence, which is of lower quality than human. To test this hypothesis, we generated de novo genome assemblies of chimpanzee and macaque and aligned them with human. We also implemented a novel bioinformatic procedure for producing alignments of closely related species that uses synteny information to remove misassembled and misaligned regions, and sequence quality scores to remove nucleotides that are less reliable. We applied this procedure to re-examine 59 genes recently identified as candidates for positive selection in chimpanzees. The great majority of these signals disappear after application of our new bioinformatic procedure. We also carried out laboratory-based resequencing of 10 of the regions in multiple chimpanzees and humans, and found that our alignments were correct wherever there was a conflict with the published results. These findings throw into question previous findings that there has been more positive selection in chimpanzees than in humans since the two species diverged. Our study also highlights the challenges of searching the extreme tails of distributions for signals of natural selection. Inaccuracies in the genome sequence at even a tiny fraction of genes can produce false-positive signals, which make it difficult to identify loci that have genuinely been targets of selection.
Affiliation(s)
- Swapan Mallick
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.
21
Zou C, Lehti-Shiu MD, Thomashow M, Shiu SH. Evolution of stress-regulated gene expression in duplicate genes of Arabidopsis thaliana. PLoS Genet 2009; 5:e1000581. [PMID: 19649161 PMCID: PMC2709438 DOI: 10.1371/journal.pgen.1000581] [Citation(s) in RCA: 93] [Impact Index Per Article: 6.2] [Received: 01/28/2009] [Accepted: 06/30/2009] [Indexed: 01/10/2023] Open
Abstract
Due to the selection pressure imposed by highly variable environmental conditions, stress sensing and regulatory response mechanisms in plants are expected to evolve rapidly. One potential source of innovation in plant stress response mechanisms is gene duplication. In this study, we examined the evolution of stress-regulated gene expression among duplicated genes in the model plant Arabidopsis thaliana. Key to this analysis was reconstructing the putative ancestral stress regulation pattern. By comparing the expression patterns of duplicated genes with the patterns of their ancestors, duplicated genes likely lost and gained stress responses at a rapid rate initially, but the rate is close to zero when the synonymous substitution rate (a proxy for time) is >∼0.8. When considering duplicated gene pairs, we found that partitioning of putative ancestral stress responses occurred more frequently compared to cases of parallel retention and loss. Furthermore, the pattern of stress response partitioning was extremely asymmetric. An analysis of putative cis-acting DNA regulatory elements in the promoters of the duplicated stress-regulated genes indicated that the asymmetric partitioning of ancestral stress responses are likely due, at least in part, to differential loss of DNA regulatory elements; the duplicated genes losing most of their stress responses were those that had lost more of the putative cis-acting elements. Finally, duplicate genes that lost most or all of the ancestral responses are more likely to have gained responses to other stresses. Therefore, the retention of duplicates that inherit few or no functions seems to be coupled to neofunctionalization. Taken together, our findings provide new insight into the patterns of evolutionary changes in gene stress responses after duplication and lay the foundation for testing the adaptive significance of stress regulatory changes under highly variable biotic and abiotic environments. 
Plants have developed a multitude of response mechanisms to survive stressful environments. Since the environment is highly variable, these stress response mechanisms are expected to undergo frequent innovation. Duplicate genes represent a potential source for such innovation. In this paper, we explored the evolutionary changes in stress responses at the transcriptional level among duplicated genes in the model plant Arabidopsis thaliana. We found that after gene duplication, ancestral stress responses tend to be retained by only one of the gene duplicates (partitioning). In addition, the pattern of partitioning of multiple stress responses is extremely asymmetric, where one duplicate tends to inherit most or all of the ancestral stress responses. We present evidence that the asymmetric loss of stress responses is correlated with the asymmetric loss of putative transcription factor binding sites. Interestingly, those duplicate genes inheriting few or no ancestral responses tend to have gained new stress responses, providing support for the model that gene duplicates are a source of innovation. Our findings provide important insight into the mechanisms of gene function evolution and lay the foundation for experimental studies to determine the significance of gain of stress responses in plant adaptation.
Affiliation(s)
- Cheng Zou
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
- Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, United States of America
- Melissa D. Lehti-Shiu
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
- Michael Thomashow
- MSU-DOE Plant Research Lab, Michigan State University, East Lansing, Michigan, United States of America
- Shin-Han Shiu
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
22
Multifunctionality dominantly determines the rate of human housekeeping and tissue specific interacting protein evolution. Gene 2009; 439:11-6. [PMID: 19306918 DOI: 10.1016/j.gene.2009.03.005] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Received: 10/27/2008] [Revised: 03/02/2009] [Accepted: 03/06/2009] [Indexed: 01/09/2023]
Abstract
Elucidation of the determinants of the rate of protein sequence evolution is one of the great challenges in evolutionary biology. It has been proposed that housekeeping genes evolve more slowly than tissue-specific genes. In the present communication, we have examined different determinants that influence evolutionary rate variation in human housekeeping and tissue-specific proteins present in the protein-protein interaction network. Studies on the yeast proteome revealed a predominant role of protein connectivity in determining the rate of protein evolution. However, in human, we did not observe any significant influence of protein connectivity on evolutionary rate. Rather, the proportion of a protein's interacting length (the amount of protein interface involved in interaction with its partners), expression level, and multifunctionality had a significant impact on the rate of protein evolution. We also observed that multi-interface proteins are evolutionarily conserved between housekeeping and tissue-specific genes, and that the average number of biological processes with which they are associated is similar in these two sets of genes. Moreover, single-interface proteins in housekeeping genes evolve more slowly than those in tissue-specific genes, owing to their involvement in a different number of biological processes. Partial correlation analysis suggests that the relative importance of the three individual factors in determining evolutionary rate variation between housekeeping and tissue-specific proteins is in the order protein multifunctionality > protein expression level > interacting protein length.
23
Almeida FC, Desalle R. Orthology, function and evolution of accessory gland proteins in the Drosophila repleta group. Genetics 2009; 181:235-45. [PMID: 19015541 PMCID: PMC2621172 DOI: 10.1534/genetics.108.096263] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 09/15/2008] [Accepted: 11/10/2008] [Indexed: 01/03/2023] Open
Abstract
The accessory gland proteins (Acps) of Drosophila have become a model for the study of reproductive protein evolution. A major step in the study of Acps is to identify biological causes and consequences of the observed patterns of molecular evolution by comparing species groups with different biology. Here we characterize the Acp complement of Drosophila mayaguana, a repleta group representative. Species of this group show important differences in ecology and reproduction as compared to other Drosophila. Our results show that the extremely high rates of Acp evolution previously found are likely to be ubiquitous among species of the repleta group. These evolutionary rates are considerably higher than the ones observed in other Drosophila groups' Acps. This disparity, however, is not accompanied by major differences in the estimated number of Acps or in the functional categories represented as previously suggested. Among the genes expressed in accessory glands of D. mayaguana almost half are likely products of recent duplications. This allowed us to test predictions of the neofunctionalization model for gene duplication and paralog evolution in a more or less constrained timescale. We found that positive selection is a strong force in the early divergence of these gene pairs.
24
Probabilistic cross-species inference of orthologous genomic regions created by whole-genome duplication in yeast. Genetics 2008; 179:1681-92. [PMID: 18562662 DOI: 10.1534/genetics.107.074450] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Indexed: 11/18/2022] Open
Abstract
Identification of orthologous genes across species becomes challenging in the presence of a whole-genome duplication (WGD). We present a probabilistic method for identifying orthologs that considers all possible orthology/paralogy assignments for a set of genomes with a shared WGD (here five yeast species). This approach allows us to estimate how confident we can be in the orthology assignments in each genomic region. Two inferences produced by this model are indicative of purifying selection acting to prevent duplicate gene loss. First, our model suggests that there are significant differences (up to a factor of seven) in duplicate gene half-life. Second, we observe differences between the genes that the model infers to have been lost soon after WGD and those lost more recently. Gene losses soon after WGD appear uncorrelated with gene expression level and knockout fitness defect. However, later losses are biased toward genes whose paralogs have high expression and large knockout fitness defects, as well as showing biases toward certain functional groups such as ribosomal proteins. We suggest that while duplicate copies of some genes may be lost neutrally after WGD, another set of genes may be initially preserved in duplicate by natural selection for reasons including dosage.
25
Chain FJJ, Ilieva D, Evans BJ. Duplicate gene evolution and expression in the wake of vertebrate allopolyploidization. BMC Evol Biol 2008; 8:43. [PMID: 18261230 PMCID: PMC2275784 DOI: 10.1186/1471-2148-8-43] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Received: 10/31/2007] [Accepted: 02/08/2008] [Indexed: 12/21/2022] Open
Abstract
Background The mechanism by which duplicate genes originate – whether by duplication of a whole genome or of a genomic segment – influences their genetic fates. To study events that trigger duplicate gene persistence after whole genome duplication in vertebrates, we have analyzed molecular evolution and expression of hundreds of persistent duplicate gene pairs in allopolyploid clawed frogs (Xenopus and Silurana). We collected comparative data that allowed us to tease apart the molecular events that occurred soon after duplication from those that occurred later on. We also quantified expression profile divergence of hundreds of paralogs during development and in different tissues. Results Our analyses indicate that persistent duplicates generated by allopolyploidization are subjected to strong purifying selection soon after duplication. The level of purifying selection is relaxed compared to a singleton ortholog, but not significantly variable over a period spanning about 40 million years. Despite persistent functional constraints, however, analysis of paralogous expression profiles indicates that quantitative aspects of their expression diverged substantially during this period. Conclusion These results offer clues into how vertebrate transcriptomes are sculpted in the wake of whole genome duplication (WGD), such as those that occurred in our early ancestors. That functional constraints were relaxed relative to a singleton ortholog but not significantly different in the early compared to the later stage of duplicate gene evolution suggests that the timescale for a return to pre-duplication levels is drawn out over tens of millions of years – beyond the age of these tetraploid species. Quantitative expression divergence can occur soon after WGD and with a magnitude that is not correlated with the rate of protein sequence divergence. 
On a coarse scale, quantitative expression divergence appears to be more prevalent than spatial and temporal expression divergence, and also faster or more frequent than other processes that operate at the protein level, such as some types of neofunctionalization.
Affiliation(s)
- Frédéric J J Chain
- Center for Environmental Genomics, Department of Biology, Life Sciences Building Room 328 McMaster University, 1280 Main Street West, Hamilton, ON, L8S 4K1, Canada.
26
Scannell DR, Wolfe KH. A burst of protein sequence evolution and a prolonged period of asymmetric evolution follow gene duplication in yeast. Genome Res 2008; 18:137-47. [PMID: 18025270 PMCID: PMC2134778 DOI: 10.1101/gr.6341207] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.4] [Received: 02/04/2007] [Accepted: 09/23/2007] [Indexed: 11/24/2022]
Abstract
It is widely accepted that newly arisen duplicate gene pairs experience an altered selective regime that is often manifested as an increase in the rate of protein sequence evolution. Many details about the nature of the rate acceleration remain unknown, however, including its typical magnitude and duration, and whether it applies to both gene copies or just one. We provide initial answers to these questions by comparing the rate of protein sequence evolution among eight yeast species, between a large set of duplicate gene pairs that were created by a whole-genome duplication (WGD) and a set of genes that were returned to single-copy after this event. Importantly, we use a new method that takes into account the tendency for slowly evolving genes to be retained preferentially in duplicate. We show that, on average, proteins encoded by duplicate gene pairs evolved at least three times faster immediately after the WGD than single-copy genes to which they behave identically in non-WGD lineages. Although the high rate in duplicated genes subsequently declined rapidly, it has not yet returned to the typical rate for single-copy genes. In addition, we show that although duplicate gene pairs often have highly asymmetric rates of evolution, even the slower members of pairs show evidence of a burst of protein sequence evolution immediately after duplication. We discuss the contribution of neofunctionalization to duplicate gene preservation and propose that a form of subfunctionalization mediated by coding region activity-reducing mutations is likely to have played an important role.
Affiliation(s)
- Devin R Scannell
- Smurfit Institute of Genetics, Trinity College Dublin, Dublin 2, Ireland.
27
Artamonova II, Gelfand MS. Comparative Genomics and Evolution of Alternative Splicing: The Pessimists' Science. Chem Rev 2007; 107:3407-30. [PMID: 17645315 DOI: 10.1021/cr068304c] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Indexed: 11/29/2022]
Affiliation(s)
- Irena I Artamonova
- Group of Bioinformatics, Vavilov Institute of General Genetics, RAS, Gubkina 3, Moscow 119991, Russia
28
Liao BY, Scott NM, Zhang J. Impacts of Gene Essentiality, Expression Pattern, and Gene Compactness on the Evolutionary Rate of Mammalian Proteins. Mol Biol Evol 2006; 23:2072-80. [PMID: 16887903 DOI: 10.1093/molbev/msl076] [Citation(s) in RCA: 155] [Impact Index Per Article: 8.6] [Indexed: 11/13/2022] Open
Abstract
Understanding the determinants of the rate of protein sequence evolution is of fundamental importance in evolutionary biology. Many recent studies have focused on the yeast because of the availability of many genome-wide expressional and functional data. Yeast studies revealed a predominant role of gene expression level and a minor role of gene essentiality in determining the rate of protein sequence evolution. Whether these rules apply to complex organisms such as mammals is unclear. Here we assemble a list of 1,138 essential and 2,341 nonessential mouse genes based on targeted gene deletion experiments and report a significant impact of gene essentiality on the rate of mammalian protein evolution. Gene expression level has virtually no effect, although tissue specificity in expression pattern has a strong influence. Unexpectedly, gene compactness, measured by average intron size and untranslated region length, has the greatest influence. Hence, the relative importance of the various factors in determining the rate of mammalian protein evolution is gene compactness > gene essentiality approximately tissue specificity > expression level. Our results suggest a considerable variation in rate determinants between unicellular organisms such as the yeast and multicellular organisms such as mammals.
Affiliation(s)
- Ben-Yang Liao
- Department of Ecology and Evolutionary Biology, University of Michigan, USA
29
Brunet FG, Roest Crollius H, Paris M, Aury JM, Gibert P, Jaillon O, Laudet V, Robinson-Rechavi M. Gene loss and evolutionary rates following whole-genome duplication in teleost fishes. Mol Biol Evol 2006; 23:1808-16. [PMID: 16809621 DOI: 10.1093/molbev/msl049] [Citation(s) in RCA: 281] [Impact Index Per Article: 15.6] [Indexed: 11/13/2022] Open
Abstract
Teleost fishes provide the first unambiguous support for ancient whole-genome duplication in an animal lineage. Studies in yeast or plants have shown that the effects of such duplications can be mediated by a complex pattern of gene retention and changes in evolutionary pressure. To explore such patterns in fishes, we have determined by phylogenetic analysis the evolutionary origin of 675 Tetraodon duplicated genes assigned to chromosomes, using additional data from other species of actinopterygian fishes. The subset of genes, which was retained in double after the genome duplication, is enriched in development, signaling, behavior, and regulation functional categories. The evolutionary rate of duplicate fish genes appears to be determined by 3 forces: 1) fish proteins evolve faster than mammalian orthologs; 2) the genes kept in double after genome duplication represent the subset under strongest purifying selection; and 3) following duplication, there is an asymmetric acceleration of evolutionary rate in one of the paralogs. These results show that similar mechanisms are at work in fishes as in yeast or plants and provide a framework for future investigation of the consequences of duplication in fishes and other animals.
Affiliation(s)
- Frédéric G Brunet
- Laboratoire de Biologie Moléculaire de la Cellule, INRA LA 1237, CNRS UMR5161, IFR 128 BioSciences Lyon-Gerland, Ecole Normale Supérieure de Lyon, Lyon, France
30
Chain FJJ, Evans BJ. Multiple mechanisms promote the retained expression of gene duplicates in the tetraploid frog Xenopus laevis. PLoS Genet 2006; 2:e56. [PMID: 16683033 PMCID: PMC1449897 DOI: 10.1371/journal.pgen.0020056] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.4] [Received: 11/10/2005] [Accepted: 02/28/2006] [Indexed: 01/19/2023] Open
Abstract
Gene duplication provides a window of opportunity for biological variants to persist under the protection of a co-expressed copy with similar or redundant function. Duplication catalyzes innovation (neofunctionalization), subfunction degeneration (subfunctionalization), and genetic buffering (redundancy), and the genetic survival of each paralog is triggered by mechanisms that add, compromise, or do not alter protein function. We tested the applicability of three types of mechanisms for promoting the retained expression of duplicated genes in 290 expressed paralogs of the tetraploid clawed frog, Xenopus laevis. Tests were based on explicit expectations concerning the ka/ks ratio, and the number and location of nonsynonymous substitutions after duplication. Functional constraints on the majority of paralogs are not significantly different from a singleton ortholog. However, we recover strong support that some of them have an asymmetric rate of nonsynonymous substitution: 6% match predictions of the neofunctionalization hypothesis in that (1) each paralog accumulated nonsynonymous substitutions at a significantly different rate and (2) the one that evolves faster has a higher ka/ks ratio than the other paralog and than a singleton ortholog. Fewer paralogs (3%) exhibit a complementary pattern of substitution at the protein level that is predicted by enhancement or degradation of different functional domains, and the remaining 13% have a higher average ka/ks ratio in both paralogs that is consistent with altered functional constraints, diversifying selection, or activity-reducing mutations after duplication. We estimate that these paralogs have been retained since they originated by genome duplication between 21 and 41 million years ago. Multiple mechanisms operate to promote the retained expression of duplicates in the same genome, in genes in the same functional class, over the same period of time following duplication, and sometimes in the same pair of paralogs. 
None of these paralogs are superfluous; degradation or enhancement of different protein subfunctions and neofunctionalization are plausible hypotheses for the retained expression of some of them. Evolution of most X. laevis paralogs, however, is consistent with retained expression via mechanisms that do not radically alter functional constraints, such as selection to preserve post-duplication stoichiometry or temporal, quantitative, or spatial subfunctionalization.
Gene duplication plays a fundamental role in biological innovation but it is not clear how both copies of a duplicated gene manage to circumvent degradation by mutation if neither is unique. This study explores genetic mechanisms that could make each copy of a duplicate gene different, and therefore distinguishable and potentially preserved by natural selection. It is based on DNA sequences of the protein-coding region of 290 expressed duplicated genes in a frog, Xenopus laevis, that underwent complete duplication of its entire genome. Results provide evidence for multiple mechanisms acting within the same genome, within the same functional classes of genes, within the same period of time following duplication, and even on the same set of duplicated genes. Each copy of a duplicate gene may be subject to distinct evolutionary constraints, and this could be associated with degradation or enhancement of function. Functional constraints of most of these duplicates, however, are not substantially different from a single copy gene; their persistence in the first dozens of millions of years after duplication may more frequently be explained by mechanisms acting on their expression rather than their function.
Affiliation(s)
- Frédéric J. J Chain
- Center for Environmental Genomics, Department of Biology, McMaster University, Hamilton, Ontario, Canada
- Ben J Evans
- Center for Environmental Genomics, Department of Biology, McMaster University, Hamilton, Ontario, Canada
31
Cusack BP, Wolfe KH. Changes in alternative splicing of human and mouse genes are accompanied by faster evolution of constitutive exons. Mol Biol Evol 2005; 22:2198-208. [PMID: 16049198 DOI: 10.1093/molbev/msi218] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022] Open
Abstract
Alternative splicing is known to be an important source of protein sequence variation, but its evolutionary impact has not been explored in detail. Studying alternative splicing requires extensive sampling of the transcriptome, but new data sets based on expressed sequence tags aligned to chromosomes make it possible to study alternative splicing on a genome-wide scale. Although genes showing alternative splicing by exon skipping are conserved as compared to the genome as a whole, we find that genes where structural differences between human and mouse result in genome-specific alternatively spliced exons in one species show almost 60% greater nonsynonymous divergence in constitutive exons than genes where exon skipping is conserved. This effect is also seen for genes showing species-specific patterns of alternative splicing where gene structure is conserved. Our observations are not attributable to an inherent difference in rate of evolution between these two sets of proteins or to differences with respect to predictors of evolutionary rate such as expression level, tissue specificity, or genetic redundancy. Where genome-specific alternatively spliced exons are seen in mammals, the vast majority of skipped exons appear to be recent additions to gene structures. Furthermore, among genes with genome-specific alternatively spliced exons, the degree of nonsynonymous divergence in constitutive sequence is a function of the frequency of incorporation of these alternative exons into transcripts. These results suggest that alterations in alternative splicing pattern can have knock-on effects in terms of accelerated sequence evolution in constant regions of the protein.
Affiliation(s)
- Brian P Cusack
- Department of Genetics, Smurfit Institute, University of Dublin, Trinity College, Dublin, Ireland
32
Abstract
Over 35 years ago, Susumu Ohno stated that gene duplication was the single most important factor in evolution. He reiterated this point a few years later in proposing that without duplicated genes the creation of metazoans, vertebrates, and mammals from unicellular organisms would have been impossible. Such big leaps in evolution, he argued, required the creation of new gene loci with previously nonexistent functions. Bold statements such as these, combined with his proposal that at least one whole-genome duplication event facilitated the evolution of vertebrates, have made Ohno an icon in the literature on genome evolution. However, discussion on the occurrence and consequences of gene and genome duplication events has a much longer, and often neglected, history. Here we review literature dealing with the occurrence and consequences of gene duplication, beginning in 1911. We document conceptual and technological advances in gene duplication research from this early research in comparative cytology up to recent research on whole genomes, "transcriptomes," and "interactomes."
Affiliation(s)
- John S Taylor
- Department of Biology, University of Victoria, British Columbia V8W 3N5, Canada.
33
Abstract
We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000–40,000. Only 2%–3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family. Comparative genome sequencing of indica and japonica rice reveals that duplication of genes and genomic regions has played a major part in the evolution of grass genomes.
34
Huminiecki L, Wolfe KH. Divergence of spatial gene expression profiles following species-specific gene duplications in human and mouse. Genome Res 2004; 14:1870-9. [PMID: 15466287 PMCID: PMC524410 DOI: 10.1101/gr.2705204] [Citation(s) in RCA: 126] [Impact Index Per Article: 6.3] [Indexed: 11/25/2022]
Abstract
To examine the process by which duplicated genes diverge in function, we studied how the gene expression profiles of orthologous gene sets in human and mouse are affected by the presence of additional recent species-specific paralogs. Gene expression profiles were compared across 16 homologous tissues in human and mouse using microarray data from the Gene Expression Atlas for 1575 sets of orthologs including 250 with species-specific paralogs. We find that orthologs that have undergone recent duplication are less likely to have strongly correlated expression profiles than those that remain in a one-to-one relationship between human and mouse. There is a general trend for paralogous genes to become more specialized in their expression patterns, with decreased breadth and increased specificity of expression as gene family size increases. Despite this trend, detailed examination of some particular gene families where species-specific duplications have occurred indicated several examples of apparent neofunctionalization of duplicated genes, but only one case of subfunctionalization. Often, the expression of both copies of a duplicated gene appears to have changed relative to the ancestral state. Our results suggest that gene expression profiles are surprisingly labile and that expression in a particular tissue may be gained or lost repeatedly during the evolution of even small gene families. We conclude that gene duplication is a major driving force behind the emergence of divergent gene expression patterns.
Affiliation(s)
- Lukasz Huminiecki
- Department of Genetics, Smurfit Institute, University of Dublin, Trinity College, Dublin 2, Ireland.
35
Abstract
One of the greatest promises of genome sequencing projects is to further the understanding of human diseases and to develop new therapies. Model organism genomes have been sequenced in parallel to human genomes to provide effective tools for the investigation of human gene function. Many of their genes share a common ancestry and function with human genes, and this is particularly true for orthologous genes. Here we present OrthoDisease, a comprehensive database of model organism genes that are orthologous to human disease genes. OrthoDisease was constructed by applying the Inparanoid ortholog detection algorithm to disease genes derived from the Online Mendelian Inheritance in Man database (OMIM). Pairwise whole genome/proteome comparisons between Homo sapiens and six other organisms were performed to identify ortholog clusters. OMIM numbers were extracted from the OMIM Morbid Map and were converted to gene sequences using the Locuslink mim2loc and loc2acc tables. These were mapped to Inparanoid ortholog clusters using Blast. The number of ortholog clusters in OrthoDisease with each respective species is currently: M. musculus, 1,354; D. melanogaster, 724; C. elegans, 533; A. thaliana, 398; S. cerevisiae, 290; and E. coli, 153. The database is accessible online at http://orthodisease.cgb.ki.se, and can be searched with disease or protein names. The web interface presents all ortholog clusters that include a selected disease gene. A capability to download the entire dataset is also provided.
Affiliation(s)
- Kevin P O'Brien
- Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden.
36
Castillo-Davis CI, Hartl DL, Achaz G. cis-Regulatory and protein evolution in orthologous and duplicate genes. Genome Res 2004; 14:1530-6. [PMID: 15256508 PMCID: PMC509261 DOI: 10.1101/gr.2662504] [Citation(s) in RCA: 111] [Impact Index Per Article: 5.6] [Indexed: 11/25/2022]
Abstract
The relationship between protein and regulatory sequence evolution is a central question in molecular evolution. It is currently not known to what extent changes in gene expression are coupled with the evolution of protein coding sequences, or whether these changes differ among orthologs (species homologs) and paralogs (duplicate genes). Here, we develop a method to measure the extent of functionally relevant cis-regulatory sequence change in homologous genes, and validate it using microarray data and experimentally verified regulatory elements in different eukaryotic species. By comparing the genomes of Caenorhabditis elegans and C. briggsae, we found that protein and regulatory evolution is weakly coupled in orthologs but not paralogs, suggesting that selective pressure on gene expression and protein evolution is quite similar and persists for a significant amount of time following speciation but not gene duplication. Additionally, duplicates of both species exhibit a dramatic acceleration of both regulatory and protein evolution compared to orthologs, suggesting increased directional selection and/or relaxed selection on both gene expression patterns and protein function in duplicate genes.
Affiliation(s)
- Cristian I Castillo-Davis
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, 02138 USA
37
Jordan IK, Wolf YI, Koonin EV. Duplicated genes evolve slower than singletons despite the initial rate increase. BMC Evol Biol 2004; 4:22. [PMID: 15238160 PMCID: PMC481058 DOI: 10.1186/1471-2148-4-22] [Citation(s) in RCA: 156] [Impact Index Per Article: 7.8] [Received: 03/26/2004] [Accepted: 07/06/2004] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Gene duplication is an important mechanism that can lead to the emergence of new functions during evolution. The impact of duplication on the mode of gene evolution has been the subject of several theoretical and empirical comparative-genomic studies. It has been shown that, shortly after the duplication, genes seem to experience a considerable relaxation of purifying selection.
RESULTS: Here we demonstrate two opposite effects of gene duplication on evolutionary rates. Sequence comparisons between paralogs show that, in accord with previous observations, a substantial acceleration in the evolution of paralogs occurs after duplication, presumably due to relaxation of purifying selection. The effect of gene duplication on evolutionary rate was also assessed by sequence comparison between orthologs that have paralogs (duplicates) and those that do not (singletons). It is shown that, in eukaryotes, duplicates, on average, evolve significantly slower than singletons. Eukaryotic ortholog evolutionary rates for duplicates are also negatively correlated with the number of paralogs per gene and the strength of selection between paralogs. A tally of annotated gene functions shows that duplicates tend to be enriched for proteins with known functions, particularly those involved in signaling and related cellular processes; by contrast, singletons include an over-abundance of poorly characterized proteins.
CONCLUSIONS: These results suggest that whether or not a gene duplicate is retained by selection depends critically on the pre-existing functional utility of the protein encoded by the ancestral singleton. Duplicates of genes of a higher biological import, which are subject to strong functional constraints on the sequence, are retained relatively more often. Thus, the evolutionary trajectory of duplicated genes appears to be determined by two opposing trends, namely, the post-duplication rate acceleration and the generally slow evolutionary rate owing to the high level of functional constraints.
MESH Headings
- Animals
- Base Composition/genetics
- DNA/genetics
- DNA, Archaeal/genetics
- DNA, Bacterial/genetics
- Evolution, Molecular
- Genes/genetics
- Genes/physiology
- Genes, Archaeal/genetics
- Genes, Archaeal/physiology
- Genes, Bacterial/genetics
- Genes, Bacterial/physiology
- Genes, Duplicate/genetics
- Genes, Duplicate/physiology
- Genes, Fungal/genetics
- Genes, Fungal/physiology
- Genes, Insect/genetics
- Genes, Insect/physiology
- Gram-Negative Bacteria/genetics
- Gram-Positive Bacteria/genetics
- Humans
- Mice
- Mutation/genetics
- Sequence Homology, Nucleic Acid
Affiliation(s)
- I King Jordan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
38
Davis JC, Petrov DA. Preferential duplication of conserved proteins in eukaryotic genomes. PLoS Biol 2004; 2:E55. [PMID: 15024414 PMCID: PMC368158 DOI: 10.1371/journal.pbio.0020055] [Citation(s) in RCA: 124] [Impact Index Per Article: 6.2] [Received: 10/29/2003] [Accepted: 12/18/2003] [Indexed: 11/18/2022] Open
Abstract
A central goal in genome biology is to understand the origin and maintenance of genic diversity. Over evolutionary time, each gene's contribution to the genic content of an organism depends not only on its probability of long-term survival, but also on its propensity to generate duplicates that are themselves capable of long-term survival. In this study we investigate which types of genes are likely to generate functional and persistent duplicates. We demonstrate that genes that have generated duplicates in the C. elegans and S. cerevisiae genomes were 25%-50% more constrained prior to duplication than the genes that failed to leave duplicates. We further show that conserved genes have been consistently prolific in generating duplicates for hundreds of millions of years in these two species. These findings reveal one way in which gene duplication shapes the content of eukaryotic genomes. Our finding that the set of duplicate genes is biased has important implications for genome-scale studies.
Affiliation(s)
- Jerel C Davis
- Department of Biological Sciences, Stanford University, Stanford, California, USA.
39
Swart EC, Hide WA, Seoighe C. FRAGS: estimation of coding sequence substitution rates from fragmentary data. BMC Bioinformatics 2004; 5:8. [PMID: 15005802 PMCID: PMC344743 DOI: 10.1186/1471-2105-5-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/22/2003] [Accepted: 01/29/2004] [Indexed: 01/06/2023] Open
Abstract
Background: Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account.
Results: We have developed FRAGS, an application framework that uses existing, freely available software components to construct in-frame alignments and estimate coding substitution rates from fragmentary sequence data. Coding sequence substitution estimates for human and chimpanzee sequences, generated by FRAGS, reveal that methodological differences can give rise to significantly different estimates of important substitution parameters. The estimated substitution rates were also used to infer upper-bounds on the amount of sequencing error in the datasets that we have analysed.
Conclusion: We have developed a system that performs robust estimation of substitution rates for orthologous sequences from a pair of organisms. Our system can be used when fragmentary genomic or transcript data is available from one of the organisms and the other is a completely sequenced genome within the Ensembl database. As well as estimating substitution statistics our system enables the user to manage and query alignment and substitution data.
Affiliation(s)
- Estienne C Swart
- South African National Bioinformatics Institute, University of the Western Cape, Private Bag X17, Bellville 7535, South Africa
- Winston A Hide
- South African National Bioinformatics Institute, University of the Western Cape, Private Bag X17, Bellville 7535, South Africa
- Cathal Seoighe
- South African National Bioinformatics Institute, University of the Western Cape, Private Bag X17, Bellville 7535, South Africa
40
Abstract
We compared genes at which mutations are known to cause human disease (disease genes) with other human genes (nondisease genes) using a large set of human-rodent alignments to infer evolutionary patterns. Such comparisons may be of use both in predicting disease genes and in understanding the general evolution of human genes. Four features were found to differ significantly between disease and nondisease genes, with disease genes (i) evolving with higher nonsynonymous/synonymous substitution rate ratios (Ka/Ks), (ii) evolving at higher synonymous substitution rates, (iii) with longer protein-coding sequences, and (iv) expressed in a narrower range of tissues. Discriminant analysis showed that these differences may help to predict human disease genes. We also investigated other factors affecting the mode of evolution in the disease genes: Ka/Ks is significantly affected by protein function, mode of inheritance, and the reduction of life expectancy caused by disease.
Affiliation(s)
- Nick G C Smith
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden.
41
Abstract
Complete genome sequence data led rapidly to the conclusion that ancient genome duplications had shaped the genomes of the model organisms Saccharomyces cerevisiae and Arabidopsis thaliana. Recent contributions have gone on to refine date estimates for these duplications and, in the case of Arabidopsis, to infer additional, more ancient, rounds of duplication by reconstructing gene order before the most recent duplication event. It is becoming widely accepted that an ancient duplication occurred before the radiation of the ray-finned fish. However, despite methodological advances and the availability of complete genome sequence data the debate over whether very ancient genome duplications have occurred early in the vertebrate lineage has not yet been fully resolved.
Affiliation(s)
- Cathal Seoighe
- South African National Bioinformatics Institute, University of the Western Cape, Private Bag X17, Bellville 7535, South Africa.
42
Doyle CK, Davis BK, Cook RG, Rich RR, Rodgers JR. Hyperconservation of the N-formyl peptide binding site of M3: evidence that M3 is an old eutherian molecule with conserved recognition of a pathogen-associated molecular pattern. J Immunol 2003; 171:836-44. [PMID: 12847252 DOI: 10.4049/jimmunol.171.2.836] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
Abstract
The mouse MHC class I-b molecule H2-M3 has unique specificity for N-formyl peptides, derived from bacteria (and mitochondria), and is thus a pathogen-associated molecular pattern recognition receptor (PRR). To test whether M3 was selected for this PRR function, we studied M3 sequences from diverse murid species of murine genera Mus, Rattus, Apodemus, Diplothrix, Hybomys, Mastomys, and Tokudaia and of sigmodontine genera Sigmodon and Peromyscus. We found that M3 is highly conserved, and the 10 residues coordinating the N-formyl group are almost invariant. The ratio of nonsynonymous and synonymous substitution rates suggests the Ag recognition site of M3, unlike the Ag recognition site of class I-a molecules, is under strong negative (purifying) selection and has been for at least 50-65 million years. Consistent with this, M3 α1α2 domains from Rattus norvegicus and Sigmodon hispidus and from the "null" allele H2-M3(b) specifically bound N-formyl peptides. The pattern of nucleotide substitution in M3 suggests M3 arose rapidly from murid I-a precursors by an evolutionary leap ("saltation"), perhaps involving intense selective pressure from bacterial pathogens. Alternatively, M3 arose more slowly but prior to the radiation of eutherian (placental) mammals. Older dates for the emergence of M3, and the accepted antiquity of CD1, suggest that primordial class I MHC molecules could have evolved originally as monomorphic PRR, presenting pathogen-associated molecular patterns. Such MHC PRR molecules could have been preadaptations for the evolution of acquired immunity during the early vertebrate radiation.
Affiliation(s)
- C Kuyler Doyle
- Baylor College of Medicine, Department of Immunology, Houston, TX 77030, USA
43
Cooper GM, Brudno M, Green ED, Batzoglou S, Sidow A. Quantitative estimates of sequence divergence for comparative analyses of mammalian genomes. Genome Res 2003; 13:813-20. [PMID: 12727901 PMCID: PMC430923 DOI: 10.1101/gr.1064503] [Citation(s) in RCA: 95] [Impact Index Per Article: 4.5] [Indexed: 11/24/2022]
Abstract
Comparative sequence analyses on a collection of carefully chosen mammalian genomes could facilitate identification of functional elements within the human genome and allow quantification of evolutionary constraint at the single nucleotide level. High-resolution quantification would be informative for determining the distribution of important positions within functional elements and for evaluating the relative importance of nucleotide sites that carry single nucleotide polymorphisms (SNPs). Because the level of resolution in comparative sequence analyses is a direct function of sequence diversity, we propose that the information content of a candidate mammalian genome be defined as the sequence divergence it would add relative to already-sequenced genomes. We show that reliable estimates of genomic sequence divergence can be obtained from small genomic regions. On the basis of a multiple sequence alignment of approximately 1.4 megabases each from eight mammals, we generate such estimates for five unsequenced mammals. Estimates of the neutral divergence in these data suggest that a small number of diverse mammalian genomes in addition to human, mouse, and rat would allow single nucleotide resolution in comparative sequence analyses.
Affiliation(s)
- Gregory M Cooper
- Department of Genetics, Stanford University, Stanford, California 94305, USA
44
Larhammar D, Lundin LG, Hallböök F. The human Hox-bearing chromosome regions did arise by block or chromosome (or even genome) duplications. Genome Res 2002; 12:1910-20. [PMID: 12466295 PMCID: PMC187569 DOI: 10.1101/gr.445702] [Citation(s) in RCA: 98] [Impact Index Per Article: 4.5] [Received: 05/22/2002] [Accepted: 09/30/2002] [Indexed: 11/25/2022]
Abstract
Many chromosome regions in the human genome exist in four similar copies, suggesting that the entire genome was duplicated twice in early vertebrate evolution, a concept called the 2R hypothesis. Forty-two gene families on the four Hox-bearing chromosomes were recently analyzed by others, and 32 of these were reported to have evolutionary histories incompatible with duplications concomitant with the Hox clusters, thereby contradicting the 2R hypothesis. However, we show here that nine of the families have probably been translocated to the Hox-bearing chromosomes more recently, and that three of these belong to other chromosome quartets where they actually support the 2R hypothesis. We consider 13 families too complex to shed light on the chromosome duplication hypothesis. Among the remaining 20 families, 14 display phylogenies that support or are at least consistent with the Hox-cluster duplications. Only six families seem to have other phylogenies, but these trees are highly uncertain due to shortage of sequence information. We conclude that all relevant and analyzable families support or are consistent with block/chromosome duplications and that none clearly contradicts the 2R hypothesis.
Affiliation(s)
- Dan Larhammar
- Unit of Pharmacology, Uppsala University, SE-75124 Uppsala, Sweden.