251
Jovelin R, Phillips PC. Functional constraint and divergence in the G protein family in Caenorhabditis elegans and Caenorhabditis briggsae. Mol Genet Genomics 2005; 273:299-310. [PMID: 15856303 DOI: 10.1007/s00438-004-1105-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 07/02/2004] [Accepted: 12/09/2004] [Indexed: 10/25/2022]
Abstract
Part of the challenge of the post-genomic world is to identify functional elements within the wide array of information generated by genome sequencing. Although cross-species comparisons and investigation of rates of sequence divergence are an efficient approach, the relationship between sequence divergence and functional conservation is not clear. Here, we use a comparative approach to examine questions of evolutionary rates and conserved function within the guanine nucleotide-binding protein (G protein) gene family in nematodes of the genus Caenorhabditis. In particular, we show that, in cases where the Caenorhabditis elegans ortholog shows a loss-of-function phenotype, G protein genes of C. elegans and Caenorhabditis briggsae diverge on average three times more slowly than G protein genes that do not exhibit any phenotype when mutated in C. elegans, suggesting that genes with loss-of-function phenotypes are subject to stronger selective constraints in relation to their function in both species. Our results also indicate that selection is as strong on G proteins involved in environmental perception as it is on those controlling other important processes. Finally, using phylogenetic footprinting, we identify a conserved non-coding motif present in multiple copies in the genomes of four species of Caenorhabditis. The presence of this motif in the same intron in the gpa-1 genes of C. elegans, C. briggsae and Caenorhabditis remanei suggests that it plays a role in the regulation of gpa-1, as well as other loci.
Affiliation(s)
- Richard Jovelin
- Center for Ecology and Evolutionary Biology, University of Oregon, Eugene, OR, 97403-5289, USA
252
Wall DP, Hirsh AE, Fraser HB, Kumm J, Giaever G, Eisen MB, Feldman MW. Functional genomic analysis of the rates of protein evolution. Proc Natl Acad Sci U S A 2005; 102:5483-8. [PMID: 15800036 PMCID: PMC555735 DOI: 10.1073/pnas.0501761102] [Citation(s) in RCA: 225] [Impact Index Per Article: 11.8] [Indexed: 11/18/2022]
Abstract
The evolutionary rates of proteins vary over several orders of magnitude. Recent work suggests that analysis of large data sets of evolutionary rates in conjunction with the results from high-throughput functional genomic experiments can identify the factors that cause proteins to evolve at such dramatically different rates. To this end, we estimated the evolutionary rates of >3,000 proteins in four species of the yeast genus Saccharomyces and investigated their relationship with levels of expression and protein dispensability. Each protein's dispensability was estimated by the growth rate of mutants deficient for the protein. Our analyses of these improved evolutionary and functional genomic data sets yield three main results. First, dispensability and expression have independent, significant effects on the rate of protein evolution. Second, measurements of expression levels in the laboratory can be used to filter data sets of dispensability estimates, removing variates that are unlikely to reflect real biological effects. Third, structural equation models show that although we may reasonably infer that dispensability and expression have significant effects on protein evolutionary rate, we cannot yet accurately estimate the relative strengths of these effects.
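The first result, that dispensability and expression have independent effects on evolutionary rate, is the kind of claim often probed with a partial correlation: the correlation between two variables after the linear effect of a third is removed from both. The sketch below is illustrative only. The authors report structural equation models, not this statistic, and the function names and toy vectors are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation r_xy.z: correlation of x and y
    after removing the linear effect of z from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical toy data: z plays the role of expression level, and x (rate)
# and y (dispensability) each equal z plus a component orthogonal to z and
# to each other, so their raw correlation is driven entirely by z.
z = [1, 1, -1, -1]
x = [2, 0, -2, 0]
y = [2, 0, 0, -2]
raw = pearson(x, y)               # 0.5: apparent association
adjusted = partial_corr(x, y, z)  # ~0: vanishes once z is controlled for
```

With real data, a partial correlation that stays clearly non-zero after controlling for expression would be in line with an independent effect of dispensability.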
Affiliation(s)
- Dennis P Wall
- Department of Biological Sciences, and Stanford Genome Technology Center, Stanford University, Stanford, CA 94305, USA.
253
Bhardwaj N, Lu H. Correlation between gene expression profiles and protein-protein interactions within and across genomes. Bioinformatics 2005; 21:2730-8. [PMID: 15797912 DOI: 10.1093/bioinformatics/bti398] [Citation(s) in RCA: 112] [Impact Index Per Article: 5.9] [Indexed: 12/24/2022]
Abstract
MOTIVATION Functional annotation of an unclassified protein on the basis of its interaction partners is well documented in the literature. Reliable predictions of interactions from other data sources such as gene expression measurements would provide a useful route to function annotation. We investigate the global relationship of protein-protein interactions with gene expression. This relationship is studied in four evolutionarily diverse species, for which substantial information regarding their interactions and expression is available: human, mouse, yeast and Escherichia coli. RESULTS In E. coli the expression of interacting pairs is highly correlated in comparison to random pairs, while in the other three species, the correlation of expression of interacting pairs is only slightly stronger than that of random pairs. To strengthen the correlation, we developed a protocol to integrate ortholog information into the interaction and expression datasets. In all four genomes, the likelihood of predicting protein interactions from highly correlated expression data is increased using our protocol. In yeast, for example, the likelihood of predicting a true interaction, when the correlation is > 0.9, increases from 1.4 to 9.4. The improvement demonstrates that protein interactions are reflected in gene expression and that the correlation between the two is strengthened by evolutionary information. The results establish that co-expression of interacting protein pairs is more conserved than that of random ones.
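The "likelihood" figures above compare how often highly co-expressed pairs turn out to be true interactions. A minimal sketch, assuming the quantity is the fraction of interacting pairs with Pearson r above a cutoff divided by the same fraction among random pairs (the paper's exact definition may differ), is shown below; the profile dictionary, gene names and function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two expression profiles (same conditions, same order)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def high_corr_likelihood_ratio(profiles, interacting, random_pairs, cutoff=0.9):
    """Enrichment of r > cutoff among interacting pairs relative to random pairs.

    profiles: dict gene -> list of expression values
    interacting, random_pairs: lists of (gene_a, gene_b) tuples
    """
    def frac_above(pairs):
        hits = sum(1 for a, b in pairs if pearson(profiles[a], profiles[b]) > cutoff)
        return hits / len(pairs)

    baseline = frac_above(random_pairs)
    if baseline == 0:
        raise ValueError("no random pairs exceed the cutoff; ratio undefined")
    return frac_above(interacting) / baseline

# Hypothetical toy profiles across four conditions.
profiles = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1], "D": [1, 3, 2, 4]}
interacting = [("A", "B"), ("A", "D")]                          # 1 of 2 has r > 0.9
random_pairs = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]  # 1 of 4 has r > 0.9
ratio = high_corr_likelihood_ratio(profiles, interacting, random_pairs)
```

A ratio above 1 means that a high expression correlation is more common among interacting pairs than among random ones.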
Affiliation(s)
- Nitin Bhardwaj
- Bioinformatics Program, Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA
254
Gu X, Zhang Z, Huang W. Rapid evolution of expression and regulatory divergences after yeast gene duplication. Proc Natl Acad Sci U S A 2005; 102:707-12. [PMID: 15647348 PMCID: PMC545572 DOI: 10.1073/pnas.0409186102] [Citation(s) in RCA: 148] [Impact Index Per Article: 7.8] [Received: 10/08/2004] [Indexed: 11/18/2022]
Abstract
Although gene duplication is widely believed to be the major source of genetic novelty, how the expression or regulatory network of duplicate genes evolves remains poorly understood. In this article, we propose an additive expression distance between duplicate genes, so that the evolutionary rate of expression divergence after gene duplication can be estimated through phylogenomic analysis. We have analyzed yeast genome sequences, microarrays, and transcriptional regulatory networks, showing a >10-fold increase in the initial rate for both expression and regulatory network evolution after gene duplication but only an approximately 20% rate increase in the early stage for protein sequences. Based on the estimated age distribution of yeast duplicate genes, we roughly estimate that the initial rate of expression divergence shortly after gene duplication is 2.9 × 10⁻⁹ per year, whereas the baseline rate for very ancient gene duplication is 0.14 × 10⁻⁹ per year. Relative expression rate tests suggest that the expression of duplicate genes tends to evolve asymmetrically, that is, the expression of one copy evolves rapidly, whereas the other one largely maintains the ancestral expression profile. Our study highlights the crucial role of early rapid evolution after gene/genome duplication for continuously increasing the complexity of the yeast regulatory network.
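A quick arithmetic check on the two rates quoted above: their ratio is about 21, in line with the reported greater-than-10-fold increase in the initial rate of expression divergence.

```latex
\frac{2.9 \times 10^{-9}\ \mathrm{yr}^{-1}}{0.14 \times 10^{-9}\ \mathrm{yr}^{-1}} \approx 20.7
```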
Affiliation(s)
- Xun Gu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Institutes of Biomedical Sciences, Fudan University, Shanghai 200433, China.
255
Jordan IK, Mariño-Ramírez L, Koonin EV. Evolutionary significance of gene expression divergence. Gene 2005; 345:119-26. [PMID: 15716085 PMCID: PMC1859841 DOI: 10.1016/j.gene.2004.11.034] [Citation(s) in RCA: 104] [Impact Index Per Article: 5.5] [Received: 09/15/2004] [Revised: 11/08/2004] [Accepted: 11/15/2004] [Indexed: 11/25/2022]
Abstract
Recent large-scale studies of evolutionary changes in gene expression among mammalian species have led to the proposal that gene expression divergence may be neutral with respect to organismic fitness. Here, we employ a comparative analysis of mammalian gene sequence divergence and gene expression divergence to test the hypothesis that the evolution of gene expression is predominantly neutral. Two models of neutral gene expression evolution are considered: (1) purely neutral evolution (i.e., no selective constraint) of gene expression levels and patterns and (2) neutral evolution accompanied by selective constraint. With respect to purely neutral evolution, levels of change in gene expression between human-mouse orthologs are correlated with levels of gene sequence divergence that are determined largely by purifying selection. In contrast, evolutionary changes of tissue-specific gene expression profiles do not show such a correlation with sequence divergence. However, divergence of both gene expression levels and profiles is significantly lower for orthologous human-mouse gene pairs than for pairs of randomly chosen human and mouse genes. These data clearly point to the action of selective constraint on gene expression divergence and are inconsistent with the purely neutral model; however, there is likely to be a neutral component in the evolution of gene expression, particularly in tissues where the expression of a given gene is low and functionally irrelevant. The model of neutral evolution with selective constraint predicts a regular, clock-like accumulation of gene expression divergence. However, relative rate tests of the divergence among human-mouse-rat orthologous gene sets reveal clock-like evolution for gene sequence divergence, and to a lesser extent for gene expression level divergence, but not for the divergence of tissue-specific gene expression profiles. Taken together, these results indicate that gene expression divergence is subject to the effects of purifying selective constraint and suggest that it might also be substantially influenced by positive Darwinian selection.
Affiliation(s)
- I King Jordan
- National Center for Biotechnology Information, National Institutes of Health, 8600 Rockville Pike, Bldg 38A/Room 5N511-M, Bethesda, MD 20894, USA.
256
Hazkani-Covo E, Wool D, Graur D. In search of the vertebrate phylotypic stage: A molecular examination of the developmental hourglass model and von Baer's third law. J Exp Zool B Mol Dev Evol 2005; 304:150-8. [PMID: 15779077 DOI: 10.1002/jez.b.21033] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.2] [Indexed: 02/01/2023]
Abstract
In 1828, Karl von Baer proposed a set of four evolutionary "laws" pertaining to embryological development. According to von Baer's third law, young embryos from different species are relatively undifferentiated and resemble one another, but as development proceeds, distinguishing features of the species begin to appear and embryos of different species progressively diverge from one another. An expansion of this law, called "the hourglass model," was proposed independently by Denis Duboule and Rudolf Raff in the 1990s. According to the hourglass model, ontogeny is characterized by a starting point at which different taxa differ markedly from one another, followed by a stage of reduced intertaxonomic variability (the phylotypic stage), and ending in a von-Baer-like progressive divergence among the taxa. A possible "translation" of the hourglass model into molecular terminology would suggest that orthologs expressed in stages described by the tapered part of the hourglass should resemble one another more than orthologs expressed in the expansive parts that precede or succeed the phylotypic stage. We tested this hypothesis using 1,585 mouse genes expressed during 26 embryonic stages, and their human orthologs. Evolutionary divergence was estimated at different embryonic stages by calculating pairwise distances between corresponding orthologous proteins from mouse and human. Two independent datasets were used. One dataset contained genes that are expressed solely in a single developmental stage; the second was made of genes expressed at different developmental stages. In the second dataset the genes were classified according to their earliest stage of expression. We fitted second-order polynomials to the two datasets. The two polynomials displayed minima as expected from the hourglass model. The molecular results suggest, albeit weakly, that a phylotypic stage (or period) indeed exists. Its temporal location, somewhere between the first-somites stage and the formation of the posterior neuropore, was in approximate agreement with the morphologically defined phylotypic stage. The molecular evidence for the later parts of the hourglass model, i.e., for von Baer's third law, was stronger than that for the earlier parts.
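The analysis fits a second-order polynomial of divergence against developmental stage and checks for a minimum. A minimal sketch of that step, using ordinary least squares solved through the 3×3 normal equations, follows; the stage indices and divergence values are hypothetical, and the authors' actual fitting details (weights, stage coding) may differ. Under the hourglass prediction the fitted leading coefficient is positive and the vertex falls inside the sampled stage range.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    # Normal equations (X^T X) beta = X^T y with design columns x^2, x, 1.
    s = [sum(x ** k for x in xs) for k in range(5)]            # s[k] = sum of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        beta[r] = (m[r][3] - sum(m[r][c] * beta[c] for c in range(r + 1, 3))) / m[r][r]
    return tuple(beta)

def vertex(a, b):
    """x-coordinate of the extremum of a*x**2 + b*x + c (a minimum when a > 0)."""
    return -b / (2 * a)

# Hypothetical example: divergence lowest at mid-development (stage 2).
stages = [0, 1, 2, 3, 4]
divergence = [5, 2, 1, 2, 5]
a, b, c = fit_quadratic(stages, divergence)
has_minimum = a > 0
minimum_stage = vertex(a, b)
```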
Affiliation(s)
- Einat Hazkani-Covo
- Department of Zoology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Israel
257
Albertson RC, Payne-Ferreira TL, Postlethwait J, Yelick PC. Zebrafish acvr2a and acvr2b exhibit distinct roles in craniofacial development. Dev Dyn 2005; 233:1405-18. [PMID: 15977175 DOI: 10.1002/dvdy.20480] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Indexed: 01/01/2023]
Abstract
To examine the roles of activin type II receptor signaling in craniofacial development, full-length zebrafish acvr2a and acvr2b clones were isolated. Although ubiquitously expressed as maternal mRNAs and in early embryogenesis, by 24 hr postfertilization (hpf), acvr2a and acvr2b exhibit restricted expression in neural, hindbrain, and neural crest cells (NCCs). A morpholino-based targeted protein depletion approach was used to reveal discrete functions for each acvr2 gene product. The acvr2a morphants exhibited defects in the development of most cranial NCC-derived cartilage, bone, and pharyngeal tooth structures, whereas acvr2b morphant defects were largely restricted to posterior arch structures and included the absence and/or aberrant migration of posterior NCC streams, defects in NCC-derived posterior arch cartilages, and dysmorphic pharyngeal tooth development. These studies revealed previously uncharacterized roles for acvr2a and acvr2b in hindbrain and NCC patterning, in NCC-derived pharyngeal arch cartilage and joint formation, and in tooth development.
Affiliation(s)
- R Craig Albertson
- Department of Cytokine Biology, The Forsyth Institute, Harvard School of Dental Medicine, Boston, Massachusetts 02115, USA
258
Nayak S, Goree J, Schedl T. fog-2 and the evolution of self-fertile hermaphroditism in Caenorhabditis. PLoS Biol 2004; 3:e6. [PMID: 15630478 PMCID: PMC539060 DOI: 10.1371/journal.pbio.0030006] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.4] [Received: 07/23/2004] [Accepted: 10/16/2004] [Indexed: 01/06/2023]
Abstract
Somatic and germline sex determination pathways have diverged significantly in animals, making comparisons between taxa difficult. To overcome this difficulty, we compared the genes in the germline sex determination pathways of Caenorhabditis elegans and C. briggsae, two Caenorhabditis species with similar reproductive systems and sequenced genomes. We demonstrate that C. briggsae has orthologs of all known C. elegans sex determination genes with one exception: fog-2. Hermaphroditic nematodes are essentially females that produce sperm early in life, which they use for self-fertilization. In C. elegans, this brief period of spermatogenesis requires FOG-2 and the RNA-binding protein GLD-1, which together repress translation of the tra-2 mRNA. FOG-2 is part of a large C. elegans FOG-2-related protein family defined by the presence of an F-box and a Duf38/FOG-2 homology domain. A fog-2-related gene family is also present in C. briggsae; however, the branch containing fog-2 appears to have arisen relatively recently in C. elegans, post-speciation. The C-terminus of FOG-2 is rapidly evolving, is required for GLD-1 interaction, and is likely critical for the role of FOG-2 in sex determination. In addition, C. briggsae gld-1 appears to play the opposite role in sex determination (promoting the female fate) while maintaining conserved roles in meiotic progression during oogenesis. Our data indicate that the regulation of the hermaphrodite germline sex determination pathway at the level of FOG-2/GLD-1/tra-2 mRNA is fundamentally different between C. elegans and C. briggsae, providing functional evidence in support of the independent evolution of self-fertile hermaphroditism. We speculate on the convergent evolution of hermaphroditism in Caenorhabditis based on the plasticity of the C. elegans germline sex determination cascade, in which multiple mutant paths yield self-fertility.
A comparison of sex determination genes in C. elegans and C. briggsae provides evidence in support of the convergent evolution of self-fertile hermaphroditism in the Caenorhabditis clade.
Affiliation(s)
- Sudhir Nayak
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Johnathan Goree
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Tim Schedl
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
259
Hahn MW, Kern AD. Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol Biol Evol 2004; 22:803-6. [PMID: 15616139 DOI: 10.1093/molbev/msi072] [Citation(s) in RCA: 464] [Impact Index Per Article: 23.2] [Indexed: 01/15/2023]
Abstract
Most proteins do not evolve in isolation, but as components of complex genetic networks. Therefore, a protein's position in a network may indicate how central it is to cellular function and, hence, how constrained it is evolutionarily. To look for an effect of position on evolutionary rate, we examined the protein-protein interaction networks in three eukaryotes: yeast, worm, and fly. We find that the three networks have remarkably similar structure, such that the number of interactors per protein and the centrality of proteins in the networks have similar distributions. Proteins that have a more central position in all three networks, regardless of the number of direct interactors, evolve more slowly and are more likely to be essential for survival. Our results are thus consistent with a classic proposal of Fisher's that pleiotropy constrains evolution.
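The abstract separates a protein's centrality from its number of direct interactors (its degree). Betweenness centrality, which counts how many shortest paths between other proteins pass through a node, is one common measure of this kind. The sketch below assumes it as the measure of interest (check the paper for its exact definition) and implements Brandes' algorithm for an unweighted, undirected interaction network; the adjacency-dict representation and function name are illustrative choices, not taken from the paper.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm) for an
    unweighted, undirected graph given as {node: set_of_neighbours}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths.
        stack = []
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair was counted from both endpoints.
    return {v: c / 2 for v, c in bc.items()}

# Hypothetical example: in a path A-B-C, B has degree 2 but lies on the only
# shortest path between A and C, so its betweenness is 1.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
scores = betweenness(path)
```

Because betweenness depends on where a node sits in the whole network rather than on its degree alone, a low-degree protein that bridges two modules can score highly.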
260
Bolotin A, Quinquis B, Renault P, Sorokin A, Ehrlich SD, Kulakauskas S, Lapidus A, Goltsman E, Mazur M, Pusch GD, Fonstein M, Overbeek R, Kyprides N, Purnelle B, Prozzi D, Ngui K, Masuy D, Hancy F, Burteau S, Boutry M, Delcour J, Goffeau A, Hols P. Complete sequence and comparative genome analysis of the dairy bacterium Streptococcus thermophilus. Nat Biotechnol 2004; 22:1554-8. [PMID: 15543133 PMCID: PMC7416660 DOI: 10.1038/nbt1034] [Citation(s) in RCA: 357] [Impact Index Per Article: 17.9] [Received: 07/08/2004] [Accepted: 09/21/2004] [Indexed: 02/06/2023]
Abstract
The lactic acid bacterium Streptococcus thermophilus is widely used for the manufacture of yogurt and cheese. This dairy species of major economic importance is phylogenetically close to pathogenic streptococci, raising the possibility that it has a potential for virulence. Here we report the genome sequences of two yogurt strains of S. thermophilus. We found a striking level of gene decay (10% pseudogenes) in both microorganisms. Many genes involved in carbon utilization are nonfunctional, in line with the paucity of carbon sources in milk. Notably, most streptococcal virulence-related genes that are not involved in basic cellular processes are either inactivated or absent in the dairy streptococcus. Adaptation to the constant milk environment appears to have resulted in the stabilization of the genome structure. We conclude that S. thermophilus has evolved mainly through loss-of-function events that remarkably mirror the environment of the dairy niche resulting in a severely diminished pathogenic potential.
Affiliation(s)
- Alexander Bolotin
- Génétique Microbienne, Centre de Recherche de Jouy en Josas, Institut National de la Recherche Agronomique, 78352 Jouy en Josas Cedex, France
- Benoît Quinquis
- Génétique Microbienne, Centre de Recherche de Jouy en Josas, Institut National de la Recherche Agronomique, 78352 Jouy en Josas Cedex, France
- Pierre Renault
- Génétique Microbienne, Centre de Recherche de Jouy en Josas, Institut National de la Recherche Agronomique, 78352 Jouy en Josas Cedex, France
- Alexei Sorokin
- Génétique Microbienne, Centre de Recherche de Jouy en Josas, Institut National de la Recherche Agronomique, 78352 Jouy en Josas Cedex, France
- S Dusko Ehrlich
- Génétique Microbienne, Centre de Recherche de Jouy en Josas, Institut National de la Recherche Agronomique, 78352 Jouy en Josas Cedex, France
- Saulius Kulakauskas
- Unité de Recherche Laitière et Génétique Appliquée, Centre de Recherche de Jouy en Josas, Institut National de la Recherche Agronomique, 78352 Jouy en Josas Cedex, France
- Alla Lapidus
- Integrated Genomics, Chicago, Illinois 60612, USA
- Present Address: Microbial Genomics, Department of Energy Joint Genome Institute, 2800 Mitchell Drive, B400, Walnut Creek, California 94598, USA
- Eugene Goltsman
- Integrated Genomics, Chicago, Illinois 60612, USA
- Present Address: Microbial Genomics, Department of Energy Joint Genome Institute, 2800 Mitchell Drive, B400, Walnut Creek, California 94598, USA
- Gordon D Pusch
- Integrated Genomics, Chicago, Illinois 60612, USA
- Present Address: Fellowship for Interpretation of Genomes, 15W155 81st Street, Burr Ridge, Illinois 60527, USA
- Michael Fonstein
- Integrated Genomics, Chicago, Illinois 60612, USA
- Present Address: Cleveland BioLabs, Inc., 10265 Carnegie Ave., Cleveland, Ohio 44106, USA
- Ross Overbeek
- Integrated Genomics, Chicago, Illinois 60612, USA
- Present Address: Fellowship for Interpretation of Genomes, 15W155 81st Street, Burr Ridge, Illinois 60527, USA
- Nikos Kyprides
- Integrated Genomics, Chicago, Illinois 60612, USA
- Present Address: Microbial Genomics, Department of Energy Joint Genome Institute, 2800 Mitchell Drive, B400, Walnut Creek, California 94598, USA
- Bénédicte Purnelle
- Institut des Sciences de la Vie, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
- Deborah Prozzi
- Institut des Sciences de la Vie, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
- Katrina Ngui
- Institut des Sciences de la Vie, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
- Present Address: Department of Anatomy and Cell Biology, University of Melbourne, Victoria 3010, Australia
- David Masuy
- Institut des Sciences de la Vie, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
- Frédéric Hancy
- Institut des Sciences de la Vie, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
- Sophie Burteau
- Institut des Sciences de la Vie, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
- Present Address: Unité de Recherche en Biologie Cellulaire, Facultés Universitaires Notre-Dame de la Paix, 61 Rue de Bruxelles, 5000 Namur, Belgium
- Marc Boutry
- Institut des Sciences de la Vie, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
- Jean Delcour
- Institut des Sciences de la Vie, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
- André Goffeau
- Institut des Sciences de la Vie, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
- Pascal Hols
- Institut des Sciences de la Vie, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
261
Albà MM, Castresana J. Inverse relationship between evolutionary rate and age of mammalian genes. Mol Biol Evol 2004; 22:598-606. [PMID: 15537804 DOI: 10.1093/molbev/msi045] [Citation(s) in RCA: 126] [Impact Index Per Article: 6.3] [Indexed: 11/13/2022]
Abstract
A large number of genes is shared by all living organisms, whereas many others are unique to some specific lineages, indicating their different times of origin. The availability of a growing number of eukaryotic genomes allows us to estimate which mammalian genes are novel genes and, approximately, when they arose. In this article, we classify human genes into four different age groups and estimate evolutionary rates in human and mouse orthologs. We show that older genes tend to evolve more slowly than newer ones; that is, proteins that arose earlier in evolution currently have a larger proportion of sites subjected to negative selection. Interestingly, this property is maintained when a fraction of the fastest-evolving genes is excluded or when only genes belonging to a given functional class are considered. One way to explain this relationship is by assuming that genes maintain their functional constraints along all their evolutionary history, but the nature of more recent evolutionary innovations is such that the functional constraints operating on them are increasingly weaker. Alternatively, our results would also be consistent with a scenario in which the functional constraints acting on a gene would not need to be constant through evolution. Instead, starting from weak functional constraints near the time of origin of a gene (as supported by mechanisms proposed for the origin of orphan genes), there would be a gradual increase in selective pressures with time, resulting in fewer accepted mutations in older versus more novel genes.
Affiliation(s)
- M Mar Albà
- Research Group on Biomedical Informatics, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain.
262
Conservation and evolution of cis-regulatory systems in ascomycete fungi. PLoS Biol 2004; 2:e398. [PMID: 15534694 PMCID: PMC526180 DOI: 10.1371/journal.pbio.0020398] [Citation(s) in RCA: 179] [Impact Index Per Article: 9.0] [Received: 03/31/2004] [Accepted: 09/09/2004] [Indexed: 12/18/2022]
Abstract
Relatively little is known about the mechanisms through which gene expression regulation evolves. To investigate this, we systematically explored the conservation of regulatory networks in fungi by examining the cis-regulatory elements that govern the expression of coregulated genes. We first identified groups of coregulated Saccharomyces cerevisiae genes enriched for genes with known upstream or downstream cis-regulatory sequences. Reasoning that many of these gene groups are coregulated in related species as well, we performed similar analyses on orthologs of coregulated S. cerevisiae genes in 13 other ascomycete species. We find that many species-specific gene groups are enriched for the same flanking regulatory sequences as those found in the orthologous gene groups from S. cerevisiae, indicating that those regulatory systems have been conserved in multiple ascomycete species. In addition to these clear cases of regulatory conservation, we find examples of cis-element evolution that suggest multiple modes of regulatory diversification, including alterations in transcription factor-binding specificity, incorporation of new gene targets into an existing regulatory system, and cooption of regulatory systems to control a different set of genes. We investigated one example in greater detail by measuring the in vitro activity of the S. cerevisiae transcription factor Rpn4p and its orthologs from Candida albicans and Neurospora crassa. Our results suggest that the DNA binding specificity of these proteins has coevolved with the sequences found upstream of the Rpn4p target genes and suggest that Rpn4p has a different function in N. crassa.
A systematic examination of the gene regulatory elements in ascomycete fungi reveals striking conservation along with some examples of the ways in which regulatory systems can evolve.
263
Pereira-Leal JB, Audit B, Peregrin-Alvarez JM, Ouzounis CA. An exponential core in the heart of the yeast protein interaction network. Mol Biol Evol 2004; 22:421-5. [PMID: 15496552 DOI: 10.1093/molbev/msi024] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.2] [Indexed: 01/07/2023]
Abstract
Protein interactions in the budding yeast have been shown to form a scale-free network, a feature of other organized networks such as bacterial and archaeal metabolism and the World Wide Web. Here, we study the connections established by yeast proteins and discover a preferential attachment between essential proteins. The essential-essential connections are long ranged and form a subnetwork where the giant component includes 97% of these proteins. Unexpectedly, this subnetwork displays an exponential connectivity distribution, in sharp contrast to the scale-free topology of the complete network. Furthermore, the wide phylogenetic extent of these core proteins and interactions provides evidence that they represent the ancestral state of the yeast protein interaction network. Finally, we propose that this core exponential network may represent a generic scaffold around which organism-specific and taxon-specific proteins and interactions coalesce.
Affiliation(s)
- José B Pereira-Leal
- Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Wellcome Trust Genome Campus, Cambridge, UK.
264
Chen Y, Xu D. Understanding protein dispensability through machine-learning analysis of high-throughput data. Bioinformatics 2004; 21:575-81. [PMID: 15479713 DOI: 10.1093/bioinformatics/bti058] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.0] [Indexed: 11/14/2022]
Abstract
MOTIVATION Protein dispensability is fundamental to the understanding of gene function and evolution. Recent advances in generating high-throughput data such as genomic sequence data, protein-protein interaction data, gene-expression data and growth-rate data of mutants allow us to investigate protein dispensability systematically at the genome scale. RESULTS In our studies, protein dispensability is represented as a fitness score that is measured by the growth rate of gene-deletion mutants. By the analyses of high-throughput data in yeast Saccharomyces cerevisiae, we found that a protein's dispensability had significant correlations with its evolutionary rate and duplication rate, as well as its connectivity in protein-protein interaction network and gene-expression correlation network. Neural networks and support vector machines were applied to predict protein dispensability through high-throughput data. Our studies shed some light on global characteristics of protein dispensability and evolution. AVAILABILITY The original datasets for protein dispensability analysis and prediction, together with related scripts, are available at http://digbio.missouri.edu/~ychen/ProDispen/ CONTACT xudong@missouri.edu.
Affiliation(s)
- Yu Chen
- UT-ORNL Graduate School of Genome Science and Technology, Oak Ridge, TN, USA
265
Hirsh AE, Fraser HB, Wall DP. Adjusting for selection on synonymous sites in estimates of evolutionary distance. Mol Biol Evol 2004; 22:174-7. [PMID: 15371530 DOI: 10.1093/molbev/msh265] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.7] [Indexed: 11/13/2022]
Abstract
Evolution at silent sites is often used to estimate the pace of selectively neutral processes or to infer differences in divergence times of genes. However, silent sites are subject to selection in favor of preferred codons, and the strength of such selection varies dramatically across genes. Here, we use the relationship between codon bias and synonymous divergence observed in four species of the genus Saccharomyces to provide a simple correction for selection on silent sites.
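The abstract does not give the form of the correction. Purely as a sketch, assume a linear relationship between a per-gene codon-bias index (for instance, the frequency of optimal codons) and synonymous divergence Ks. One could then fit that line and shift every gene's Ks to a shared reference level of codon bias. The function names and the linear model are our assumptions, not the authors' published procedure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def adjust_ks(ks, bias, reference_bias=0.0):
    """Move each gene's Ks along the fitted line to a common codon-bias level.

    ks   -- synonymous divergence per gene (hypothetical input)
    bias -- codon-bias index per gene, same order as ks (hypothetical input)
    """
    _, slope = fit_line(bias, ks)
    return [k + slope * (reference_bias - b) for k, b in zip(ks, bias)]
```

After this adjustment, genes that differ only because of codon bias end up with similar Ks values. That is the intent of correcting for selection on silent sites.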
Affiliation(s)
- Aaron E Hirsh
- Department of Biological Sciences, Stanford University, Stanford, California, USA
266
Baudot A, Jacq B, Brun C. A scale of functional divergence for yeast duplicated genes revealed from analysis of the protein-protein interaction network. Genome Biol 2004; 5:R76. [PMID: 15461795 PMCID: PMC545596 DOI: 10.1186/gb-2004-5-10-r76] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Received: 03/24/2004] [Revised: 06/11/2004] [Accepted: 08/02/2004] [Indexed: 11/10/2022]
Abstract
BACKGROUND Studying the evolution of the function of duplicated genes usually implies an estimation of the extent of functional conservation/divergence between duplicates from comparison of actual sequences. This only reveals the possible molecular function of genes without taking into account their cellular function(s). We took into consideration this latter dimension of gene function to approach the functional evolution of duplicated genes by analyzing the protein-protein interaction network in which their products are involved. For this, we derived a functional classification of the proteins using PRODISTIN, a bioinformatics method allowing comparison of protein function. Our work focused on the duplicated yeast genes, remnants of an ancient whole-genome duplication. RESULTS Starting from 4,143 interactions, we analyzed 41 duplicated protein pairs with the PRODISTIN method. We showed that duplicated pairs behaved differently in the classification with respect to their interactors. The different observed behaviors allowed us to propose a functional scale of conservation/divergence for the duplicated genes, based on interaction data. By comparing our results to the functional information carried by GO annotations and sequence comparisons, we showed that the interaction network analysis reveals functional subtleties, which are not discernible by other means. Finally, we interpreted our results in terms of evolutionary scenarios. CONCLUSIONS Our analysis might provide a new way to analyse the functional evolution of duplicated genes and constitutes the first attempt of protein function evolutionary comparisons based on protein-protein interactions.
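PRODISTIN compares proteins by the interactors they share. As we understand the method, it uses the Czekanowski-Dice index. For each protein, the index is computed on a set made of the protein itself plus its direct partners. This is an assumption worth checking against the original PRODISTIN description. Here is a minimal sketch with a hypothetical adjacency map:

```python
def czekanowski_dice(adj, i, j):
    """Czekanowski-Dice distance between proteins i and j.

    adj maps each protein to its set of direct interactors (hypothetical input).
    Each protein's set is the protein itself plus its interactors;
    0 means identical interaction neighborhoods, 1 means nothing shared.
    """
    ni = set(adj.get(i, ())) | {i}
    nj = set(adj.get(j, ())) | {j}
    return len(ni ^ nj) / (len(ni | nj) + len(ni & nj))
```

A duplicated pair with a distance near 0 shares most of its interactors. A pair with a distance near 1 shares almost none. A functional scale of conservation or divergence could be built on that spread.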
Affiliation(s)
- Anaïs Baudot
- Laboratoire de Génétique et Physiologie du Développement, IBDM, CNRS INSERM Université de la Méditerranée, Parc Scientifique de Luminy, Case 907, 13288 Marseille Cedex 9, France
- Bernard Jacq
- Laboratoire de Génétique et Physiologie du Développement, IBDM, CNRS INSERM Université de la Méditerranée, Parc Scientifique de Luminy, Case 907, 13288 Marseille Cedex 9, France
- Christine Brun
- Laboratoire de Génétique et Physiologie du Développement, IBDM, CNRS INSERM Université de la Méditerranée, Parc Scientifique de Luminy, Case 907, 13288 Marseille Cedex 9, France
267
Premzl M, Gready JE, Jermiin LS, Simonic T, Marshall Graves JA. Evolution of vertebrate genes related to prion and Shadoo proteins--clues from comparative genomic analysis. Mol Biol Evol 2004; 21:2210-31. [PMID: 15342797 DOI: 10.1093/molbev/msh245] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.2] [Indexed: 02/07/2023]
Abstract
Recent findings of new genes in fish related to the prion protein (PrP) gene PRNP, including our recent report of SPRN coding for Shadoo (Sho) protein found also in mammals, raise issues of their function and evolution. Here we report additional novel fish genes found in public databases, including a duplicated SPRN gene, SPRNB, in Fugu, Tetraodon, carp, and zebrafish encoding the Sho2 protein, and we use comparative genomic analysis to analyze the evolutionary relationships and to infer evolutionary trajectories of the complete data set. Phylogenetic footprinting performed on aligned human, mouse, and Fugu SPRN genes to define candidate regulatory promoter regions detected 16 conserved motifs, three of which are known transcription factor-binding sites for a receptor and transcription factors specific to or associated with expression in brain. This result and other homology-based (VISTA global genomic alignment; protein sequence alignment and phylogenetics) and context-dependent (genomic context; relative gene order and orientation) criteria indicate that fish and mammalian SPRN genes are orthologous and suggest a strongly conserved basic function in brain. Whereas tetrapod PRNPs share context with the analogous stPrP-2-coding gene in fish, their sequences are diverged, suggesting that the tetrapod and fish genes are likely to have significantly different functions. Phylogenetic analysis predicts the SPRN/SPRNB duplication occurred before divergence of fish from tetrapods, whereas that of stPrP-1 and stPrP-2 occurred in fish. Whereas Sho appears to have a conserved function in vertebrate brain, PrP seems to have an adaptive role fine-tuned in a lineage-specific fashion. An evolutionary model consistent with our findings and literature knowledge is proposed that has an ancestral prevertebrate SPRN-like gene leading to all vertebrate PrP-related and Sho-related genes. This provides a new framework for exploring the evolution of this unusual family of proteins and for searching for members in other fish branches and intermediate vertebrate groups.
Affiliation(s)
- Marko Premzl
- Computational Proteomics Group, John Curtin School of Medical Research, Australian National University, Canberra, Australia
268
Jordan IK, Mariño-Ramírez L, Wolf YI, Koonin EV. Conservation and coevolution in the scale-free human gene coexpression network. Mol Biol Evol 2004; 21:2058-70. [PMID: 15282333 DOI: 10.1093/molbev/msh222] [Citation(s) in RCA: 153] [Impact Index Per Article: 7.7] [Indexed: 12/13/2022]
Abstract
The role of natural selection in biology is well appreciated. Recently, however, a critical role for physical principles of network self-organization in biological systems has been revealed. Here, we employ a systems level view of genome-scale sequence and expression data to examine the interplay between these two sources of order, natural selection and physical self-organization, in the evolution of human gene regulation. The topology of a human gene coexpression network, derived from tissue-specific expression profiles, shows scale-free properties that imply evolutionary self-organization via preferential node attachment. Genes with numerous coexpressed partners (the hubs of the coexpression network) evolve more slowly on average than genes with fewer coexpressed partners, and genes that are coexpressed show similar rates of evolution. Thus, the strength of selective constraints on gene sequences is affected by the topology of the gene coexpression network. This connection is strong for the coding regions and 3' untranslated regions (UTRs), but the 5' UTRs appear to evolve under a different regime. Surprisingly, we found no connection between the rate of gene sequence divergence and the extent of gene expression profile divergence between human and mouse. This suggests that distinct modes of natural selection might govern sequence versus expression divergence, and we propose a model, based on rapid, adaptation-driven divergence and convergent evolution of gene expression patterns, for how natural selection could influence gene expression divergence.
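A coexpression network of this kind is often built by linking genes whose expression profiles across tissues are strongly correlated. The abstract does not specify the metric or the cut-off. The Pearson correlation and the 0.8 threshold below are therefore illustrative assumptions, and the gene names are hypothetical. The degree returned for each gene is what marks the hubs.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def coexpression_degrees(profiles, threshold=0.8):
    """Number of coexpressed partners per gene.

    profiles maps gene -> list of expression values across tissues;
    a pair is linked when its correlation is at least `threshold`.
    """
    degree = {gene: 0 for gene in profiles}
    for g1, g2 in combinations(profiles, 2):
        if pearson(profiles[g1], profiles[g2]) >= threshold:
            degree[g1] += 1
            degree[g2] += 1
    return degree
```

These degrees could then be compared with per-gene evolutionary rates to test whether hubs evolve more slowly.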
Affiliation(s)
- I King Jordan
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland, USA
269
Abstract
There is growing interest in the evolutionary dynamics of molecular genetic pathways and networks, and the extent to which the molecular evolution of a gene depends on its position within a pathway or network, as well as overall network topology. Investigations of the relationships between network organization, topological architecture and evolutionary dynamics provide intriguing hints as to how networks evolve. Recent studies also suggest that genetic pathway and network structures may influence the action of evolutionary forces, and may play a role in maintaining phenotypic robustness in organisms.
Affiliation(s)
- Jennifer M Cork
- Department of Genetics, North Carolina State University, Raleigh, NC 27695, USA
270
López-Bigas N, Ouzounis CA. Genome-wide identification of genes likely to be involved in human genetic disease. Nucleic Acids Res 2004; 32:3108-14. [PMID: 15181176 PMCID: PMC434425 DOI: 10.1093/nar/gkh605] [Citation(s) in RCA: 208] [Impact Index Per Article: 10.4] [Indexed: 11/15/2022]
Abstract
Sequence analysis of the group of proteins known to be associated with hereditary diseases allows the detection of key distinctive features shared within this group. The disease proteins are characterized by greater length of their amino acid sequence, a broader phylogenetic extent, and specific conservation and paralogy profiles compared with all human proteins. This unique property pattern provides insights into the global nature of hereditary diseases and moreover can be used to predict novel disease genes. We have developed a computational method that allows the detection of genes likely to be involved in hereditary disease in the human genome. The probability score assignments for the human genome are accessible at http://maine.ebi.ac.uk:8000/services/dgp.
Affiliation(s)
- Núria López-Bigas
- Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK
271
Fraser HB, Hirsh AE, Giaever G, Kumm J, Eisen MB. Noise minimization in eukaryotic gene expression. PLoS Biol 2004; 2:e137. [PMID: 15124029 PMCID: PMC400249 DOI: 10.1371/journal.pbio.0020137] [Citation(s) in RCA: 286] [Impact Index Per Article: 14.3] [Received: 01/08/2004] [Accepted: 03/09/2004] [Indexed: 01/08/2023]
Abstract
All organisms have elaborate mechanisms to control rates of protein production. However, protein production is also subject to stochastic fluctuations, or “noise.” Several recent studies in Saccharomyces cerevisiae and Escherichia coli have investigated the relationship between transcription and translation rates and stochastic fluctuations in protein levels, or more generally, how such randomness is a function of intrinsic and extrinsic factors. However, the fundamental question of whether stochasticity in protein expression is generally biologically relevant has not been addressed, and it remains unknown whether random noise in the protein production rate of most genes significantly affects the fitness of any organism. We propose that organisms should be particularly sensitive to variation in the protein levels of two classes of genes: genes whose deletion is lethal to the organism and genes that encode subunits of multiprotein complexes. Using an experimentally verified model of stochastic gene expression in S. cerevisiae, we estimate the noise in protein production for nearly every yeast gene, and confirm our prediction that the production of essential and complex-forming proteins involves lower levels of noise than does the production of most other genes. Our results support the hypothesis that noise in gene expression is a biologically important variable, is generally detrimental to organismal fitness, and is subject to natural selection. Analysis of gene expression data for nearly every gene in yeast provides evidence that random variation in the production rate of proteins could significantly affect the fitness of an organism.
Affiliation(s)
- Hunter B Fraser
- Department of Molecular and Cell Biology, University of California, Berkeley, USA.
272
Davis JC, Petrov DA. Preferential duplication of conserved proteins in eukaryotic genomes. PLoS Biol 2004; 2:E55. [PMID: 15024414 PMCID: PMC368158 DOI: 10.1371/journal.pbio.0020055] [Citation(s) in RCA: 124] [Impact Index Per Article: 6.2] [Received: 10/29/2003] [Accepted: 12/18/2003] [Indexed: 11/18/2022]
Abstract
A central goal in genome biology is to understand the origin and maintenance of genic diversity. Over evolutionary time, each gene's contribution to the genic content of an organism depends not only on its probability of long-term survival, but also on its propensity to generate duplicates that are themselves capable of long-term survival. In this study we investigate which types of genes are likely to generate functional and persistent duplicates. We demonstrate that genes that have generated duplicates in the C. elegans and S. cerevisiae genomes were 25%-50% more constrained prior to duplication than the genes that failed to leave duplicates. We further show that conserved genes have been consistently prolific in generating duplicates for hundreds of millions of years in these two species. These findings reveal one way in which gene duplication shapes the content of eukaryotic genomes. Our finding that the set of duplicate genes is biased has important implications for genome-scale studies.
Affiliation(s)
- Jerel C Davis
- Department of Biological Sciences, Stanford University, Stanford, California, USA.
273
Kondrashov FA, Ogurtsov AY, Kondrashov AS. Bioinformatical assay of human gene morbidity. Nucleic Acids Res 2004; 32:1731-7. [PMID: 15020709 PMCID: PMC390328 DOI: 10.1093/nar/gkh330] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.1] [Indexed: 11/14/2022]
Abstract
Only a fraction of eukaryotic genes affect the phenotype drastically. We compared 18 parameters in 1273 human morbid genes, known to cause diseases, and in the remaining 16 580 unambiguous human genes. Morbid genes evolve more slowly, have wider phylogenetic distributions, are more similar to essential genes of Drosophila melanogaster, code for longer proteins containing more alanine and glycine and less histidine, lysine and methionine, possess larger numbers of longer introns with more accurate splicing signals and have higher and broader expressions. These differences make it possible to classify as non-morbid 34% of human genes with unknown morbidity, when only 5% of known morbid genes are incorrectly classified as non-morbid. This classification can help to identify disease-causing genes among multiple candidates.
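The 5% figure defines a cut-off on whatever score separates morbid genes from the rest. Assume a hypothetical per-gene score in which higher values look more morbid-like. The abstract does not describe how the 18 parameters were combined into such a score. Under that assumption, the cut-off and the share of unknown genes called non-morbid could be computed as:

```python
import math


def nonmorbid_threshold(morbid_scores, max_miss_rate=0.05):
    """Score cut-off such that at most `max_miss_rate` of known morbid genes
    score strictly below it (requires 0 <= max_miss_rate < 1)."""
    ranked = sorted(morbid_scores)
    k = math.floor(max_miss_rate * len(ranked))
    return ranked[k]


def fraction_called_nonmorbid(candidate_scores, threshold):
    """Share of genes of unknown morbidity falling below the cut-off."""
    below = sum(1 for s in candidate_scores if s < threshold)
    return below / len(candidate_scores)
```

With real scores, the second function's output would correspond to the 34% quoted in the abstract. The first function fixes the 5% error rate on known morbid genes.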
Affiliation(s)
- Fyodor A Kondrashov
- National Center for Biotechnology Information, National Institutes of Health, 38a Center Drive, 6S602, Bethesda, MD 20892, USA.
274
Affiliation(s)
- Lars M Steinmetz
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
275
Koonin EV, Fedorova ND, Jackson JD, Jacobs AR, Krylov DM, Makarova KS, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Rogozin IB, Smirnov S, Sorokin AV, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA. A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol 2004; 5:R7. [PMID: 14759257 PMCID: PMC395751 DOI: 10.1186/gb-2004-5-2-r7] [Citation(s) in RCA: 676] [Impact Index Per Article: 33.8] [Received: 10/23/2003] [Revised: 12/01/2003] [Accepted: 12/04/2003] [Indexed: 11/10/2022]
Abstract
BACKGROUND Sequencing the genomes of multiple, taxonomically diverse eukaryotes enables in-depth comparative-genomic analysis which is expected to help in reconstructing ancestral eukaryotic genomes and major events in eukaryotic evolution and in making functional predictions for currently uncharacterized conserved genes. RESULTS We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs (eukaryotic orthologous groups or KOGs) from seven eukaryotic genomes: Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Encephalitozoon cuniculi. Conservation of KOGs through the phyletic range of eukaryotes strongly correlates with their functions and with the effect of gene knockout on the organism's viability. The approximately 40% of KOGs that are represented in six or seven species are enriched in proteins responsible for housekeeping functions, particularly translation and RNA processing. These conserved KOGs are often essential for survival and might approximate the minimal set of essential eukaryotic genes. The 131 single-member, pan-eukaryotic KOGs we identified were examined in detail. For around 20 that remained uncharacterized, functions were predicted by in-depth sequence analysis and examination of genomic context. Nearly all these proteins are subunits of known or predicted multiprotein complexes, in agreement with the balance hypothesis of evolution of gene copy number. Other KOGs show a variety of phyletic patterns, which points to major contributions of lineage-specific gene loss and the 'invention' of genes new to eukaryotic evolution. Examination of the sets of KOGs lost in individual lineages reveals co-elimination of functionally connected genes. Parsimonious scenarios of eukaryotic genome evolution and gene sets for ancestral eukaryotic forms were reconstructed. The gene set of the last common ancestor of the crown group consists of 3,413 KOGs and largely includes proteins involved in genome replication and expression, and central metabolism. Only 44% of the KOGs, mostly from the reconstructed gene set of the last common ancestor of the crown group, have detectable homologs in prokaryotes; the remainder apparently evolved via duplication with divergence and invention of new genes. CONCLUSIONS The KOG analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes. The results provide quantitative support for major trends of eukaryotic evolution noticed previously at the qualitative level and a basis for detailed reconstruction of evolution of eukaryotic genomes and biology of ancestral forms.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
276
Krylov DM, Wolf YI, Rogozin IB, Koonin EV. Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution. Genome Res 2003; 13:2229-35. [PMID: 14525925 PMCID: PMC403683 DOI: 10.1101/gr.1589103] [Citation(s) in RCA: 300] [Impact Index Per Article: 14.3] [Indexed: 11/25/2022]
Abstract
Lineage-specific gene loss, to a large extent, accounts for the differences in gene repertoires between genomes, particularly among eukaryotes. We derived a parsimonious scenario of gene losses for eukaryotic orthologous groups (KOGs) from seven complete eukaryotic genomes. The scenario involves substantial gene loss in fungi, nematodes, and insects. Based on this evolutionary scenario and estimates of the divergence times between major eukaryotic phyla, we introduce a numerical measure, the propensity for gene loss (PGL). We explore the connection among the propensity of a gene to be lost in evolution (PGL value), protein sequence divergence, the effect of gene knockout on fitness, the number of protein-protein interactions, and expression level for the genes in KOGs. Significant correlations between PGL and each of these variables were detected. Genes that have a lower propensity to be lost in eukaryotic evolution accumulate fewer substitutions in their protein sequences and tend to be essential for the organism's viability, tend to be highly expressed, and have many interaction partners. The dependence between PGL and gene dispensability and interactivity is much stronger than that for sequence evolution rate. Thus, propensity of a gene to be lost during evolution seems to be a direct reflection of its biological importance.
Affiliation(s)
- Dmitri M Krylov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
277
278
Affiliation(s)
- Cristian I Castillo-Davis
- Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
279
Rocha EPC, Danchin A. An analysis of determinants of amino acids substitution rates in bacterial proteins. Mol Biol Evol 2003; 21:108-16. [PMID: 14595100 DOI: 10.1093/molbev/msh004] [Citation(s) in RCA: 213] [Impact Index Per Article: 10.1] [Indexed: 11/12/2022]
Abstract
The variation of amino acid substitution rates in proteins depends on several variables. Among these, the protein's expression level, functional category, essentiality, or metabolic costs of its amino acid residues may play an important role. However, the relative importance of each variable has not yet been evaluated in comparative analyses. To this aim, we made regression analyses combining data available on these variables and on evolutionary rates, in two well-documented model bacteria, Escherichia coli and Bacillus subtilis. In both bacteria, the level of expression of the protein in the cell was by far the most important driving force constraining the amino acids substitution rate. Subsequent inclusion in the analysis of the other variables added little further information. Furthermore, when the rates of synonymous substitutions were included in the analysis of the E. coli data, only the variable expression levels remained statistically significant. The rate of nonsynonymous substitution was shown to correlate with expression levels independently of the rate of synonymous substitution. These results suggest an important direct influence of expression levels, or at least codon usage bias for translation optimization, on the rates of nonsynonymous substitutions in bacteria. They also indicate that when a control for this variable is included, essentiality plays no significant role in the rate of protein evolution in bacteria, as is the case in eukaryotes.
280
Wuchty S, Oltvai ZN, Barabási AL. Evolutionary conservation of motif constituents in the yeast protein interaction network. Nat Genet 2003; 35:176-9. [PMID: 12973352 DOI: 10.1038/ng1242] [Citation(s) in RCA: 211] [Impact Index Per Article: 10.0] [Received: 06/05/2003] [Accepted: 08/26/2003] [Indexed: 11/09/2022]
Abstract
Understanding why some cellular components are conserved across species but others evolve rapidly is a key question of modern biology. Here we show that in Saccharomyces cerevisiae, proteins organized in cohesive patterns of interactions are conserved to a substantially higher degree than those that do not participate in such motifs. We find that the conservation of proteins in distinct topological motifs correlates with the interconnectedness and function of that motif and also depends on the structure of the overall interactome topology. These findings indicate that motifs may represent evolutionary conserved topological units of cellular networks molded in accordance with the specific biological function in which they participate.
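A triangle, in which three proteins all interact with one another, is the smallest cohesive motif of the kind the abstract describes. The study itself examined larger motifs as well. The following sketch counts only the triangles each protein belongs to, on a hypothetical edge list, so it is a simplification of the authors' analysis.

```python
from itertools import combinations


def triangles_per_node(edges):
    """Count, for each protein, the triangles (mutually interacting triples)
    it belongs to in an undirected interaction list."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    counts = {node: 0 for node in adj}
    for node, partners in adj.items():
        # Two partners that also interact with each other close a triangle.
        for u, v in combinations(partners, 2):
            if v in adj[u]:
                counts[node] += 1
    return counts
```

Proteins with nonzero counts take part in at least one cohesive motif. Their conservation could then be compared with that of proteins that take part in none.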
Affiliation(s)
- S Wuchty
- Department of Physics, University of Notre Dame, Notre Dame, Indiana 46556, USA
281
Affiliation(s)
- David B Searls
- Bioinformatics Division, Genetics Research, GlaxoSmithKline Pharmaceuticals, 709 Swedeland Road, P.O. Box 1539, King of Prussia, Pennsylvania 19406, USA.
282
Affiliation(s)
- Xun Gu
- Department of Genetics, Development and Cell Biology, Centre for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA.
283
Fraser HB, Wall DP, Hirsh AE. A simple dependence between protein evolution rate and the number of protein-protein interactions. BMC Evol Biol 2003; 3:11. [PMID: 12769820 PMCID: PMC166126 DOI: 10.1186/1471-2148-3-11] [Citation(s) in RCA: 130] [Impact Index Per Article: 6.2] [Received: 03/27/2003] [Accepted: 05/23/2003] [Indexed: 11/10/2022]
Abstract
BACKGROUND It has been shown for an evolutionarily distant genomic comparison that the number of protein-protein interactions a protein has correlates negatively with their rates of evolution. However, the generality of this observation has recently been challenged. Here we examine the problem using protein-protein interaction data from the yeast Saccharomyces cerevisiae and genome sequences from two other yeast species. RESULTS In contrast to a previous study that used an incomplete set of protein-protein interactions, we observed a highly significant correlation between number of interactions and evolutionary distance to either Candida albicans or Schizosaccharomyces pombe. This study differs from the previous one in that it includes all known protein interactions from S. cerevisiae, and a larger set of protein evolutionary rates. In both evolutionary comparisons, a simple monotonic relationship was found across the entire range of the number of protein-protein interactions. In agreement with our earlier findings, this relationship cannot be explained by the fact that proteins with many interactions tend to be important to yeast. The generality of these correlations in other kingdoms of life unfortunately cannot be addressed at this time, due to the incompleteness of protein-protein interaction data from organisms other than S. cerevisiae. CONCLUSIONS Protein-protein interactions tend to slow the rate at which proteins evolve. This may be due to structural constraints that must be met to maintain interactions, but more work is needed to definitively establish the mechanism(s) behind the correlations we have observed.
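The abstract does not name the correlation statistic. A rank-based coefficient is one natural choice for a monotonic relationship involving interaction counts, and that choice is our assumption here. This is a minimal Spearman sketch in which tied values share their average rank:

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```

The inputs would be interaction counts per protein and an evolutionary distance per protein. A negative coefficient corresponds to the reported slower evolution of proteins with more interactions.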
Affiliation(s)
- Hunter B Fraser
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Dennis P Wall
- Center for Computational Genetics and Biological Modeling, Department of Biological Sciences, Stanford University, Stanford, CA, 94305, USA
- Aaron E Hirsh
- Center for Computational Genetics and Biological Modeling, Department of Biological Sciences, Stanford University, Stanford, CA, 94305, USA
284
Pál C, Hurst LD. Evidence for co-evolution of gene order and recombination rate. Nat Genet 2003; 33:392-5. [PMID: 12577060 DOI: 10.1038/ng1111] [Citation(s) in RCA: 123] [Impact Index Per Article: 5.9] [Received: 06/13/2002] [Accepted: 01/22/2003] [Indexed: 12/31/2022]
Abstract
There is increasing evidence in eukaryotic genomes that gene order is not random, even allowing for tandem duplication. Notably, in numerous genomes, genes of similar expression tend to be clustered. Are there other reasons for clustering of functionally similar genes? If genes are linked to enable genetic, rather than physical clustering, then we also expect that clusters of certain genes might be associated with blocks of reduced recombination rates. Here we show that, in yeast, essential genes are highly clustered and this clustering is independent of clustering of co-expressed genes and of tandem duplications. Adjacent pairs of essential genes are preferentially conserved through evolution. Notably, we also find that clusters of essential genes are in regions of low recombination and that larger clusters have lower recombination rates. These results suggest that selection acts to modify both the fine-scale intragenomic variation in the recombination rate and the distribution of genes and provide evidence for co-evolution of gene order and recombination rate.
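A simplified check of whether essential genes cluster is to count adjacent essential-essential pairs along a chromosome. That count is then compared with the counts from randomly shuffled gene orders. This sketch uses hypothetical gene names and a single chromosome. It does not control for co-expression or tandem duplicates, both of which the authors accounted for.

```python
import random


def adjacent_essential_pairs(gene_order, essential):
    """Count neighboring gene pairs in which both genes are essential."""
    return sum(1 for a, b in zip(gene_order, gene_order[1:])
               if a in essential and b in essential)


def clustering_p_value(gene_order, essential, n_perm=1000, seed=0):
    """One-sided permutation test: how often does a random gene order give
    at least as many essential-essential neighbors as observed?"""
    rng = random.Random(seed)
    observed = adjacent_essential_pairs(gene_order, essential)
    shuffled = list(gene_order)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if adjacent_essential_pairs(shuffled, essential) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value would indicate more essential neighbors than a random order tends to produce. It says nothing by itself about recombination rate, which the study examined separately.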
Affiliation(s)
- Csaba Pál
- Department of Biology and Biochemistry, University of Bath, BA2 7AY, Bath, Somerset, UK
285
Abstract
Changes in technology in the past decade have had such an impact on the way that molecular evolution research is done that it is difficult now to imagine working in a world without genomics or the Internet. In 1992, GenBank was less than a hundredth of its current size and was updated every three months on a huge spool of tape. Homology searches took 30 minutes and rarely found a hit. Now it is difficult to find sequences with only a few homologs to use as examples for teaching bioinformatics. For molecular evolution researchers, the genomics revolution has showered us with raw data and the information revolution has given us the wherewithal to analyze it. In broad terms, the most significant outcome from these changes has been our newfound ability to examine the evolution of genomes as a whole, enabling us to infer genome-wide evolutionary patterns and to identify subsets of genes whose evolution has been in some way atypical.
Affiliation(s)
- Kenneth H Wolfe
- Department of Genetics, Smurfit Institute, University of Dublin, Trinity College, Dublin 2, Ireland.
286
Abstract
The pathway that controls sexual fate in the nematode Caenorhabditis elegans has been well characterized at the molecular level. By identifying differences between the sex-determination mechanisms in C. elegans and other nematode species, it should be possible to understand how complex sex-determining pathways evolve. Towards this goal, orthologues of many of the C. elegans sex regulators have been isolated from other members of the genus Caenorhabditis. Rapid sequence evolution is observed in every case, but several of the orthologues appear to have conserved sex-determining roles. Thus extensive sequence divergence does not necessarily coincide with changes in pathway structure, although the same forces may contribute to both. This review summarizes recent findings and, with reference to results from other animals, offers explanations for why sex-determining genes and pathways appear to be evolving rapidly. Experimental strategies that hold promise for illuminating pathway differences between nematodes are also discussed.
Affiliation(s)
- Paul Stothard
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta, T6G 2E9, Canada
287
Pál C, Papp B, Hurst LD. Genomic function: Rate of evolution and gene dispensability. Nature 2003; 421:496-7; discussion 497-8. [PMID: 12556881 DOI: 10.1038/421496b] [Citation(s) in RCA: 110] [Impact Index Per Article: 5.2] [Indexed: 11/08/2022]
Affiliation(s)
- Csaba Pál
- Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
288
Warringer J, Blomberg A. Automated screening in environmental arrays allows analysis of quantitative phenotypic profiles in Saccharomyces cerevisiae. Yeast 2003; 20:53-67. [PMID: 12489126 DOI: 10.1002/yea.931] [Citation(s) in RCA: 204] [Impact Index Per Article: 9.7] [Indexed: 11/12/2022]
Abstract
A methodology for large-scale automated phenotypic profiling that uses quantitative changes in yeast growth has been tested and applied to the analysis of some commonly used laboratory strains. This yeast-adjusted methodology is based on microcultivation in 350 µl of liquid medium, in which growth is recorded optically at frequent intervals, followed by automated extraction of relevant variables from the resulting growth curves. We report that cultivation at this micro-scale displayed overall growth features and protein expression patterns highly similar to those of growth in well-aerated medium-scale (10 ml) culture. However, differences were also encountered, mainly relating to the respiratory potential and the production of stress-induced proteins. Quantitative phenotypic profiles for the laboratory yeast strains W303, FY1679 and CEN-PK.2 were screened in environmental arrays, including 98 different conditions composed of low, medium and high concentrations of 33 growth inhibitors. We introduce the concepts of phenotypic index(rate) and phenotypic index(stationary), which relate, respectively, to changes in the rate of growth and in the stationary-phase optical density increment in a particular environment, relative to a reference strain. The laboratory strains presented selective phenotypic profiles in both phenotypic indexes, and in many cases the two features appeared to be independent characteristics. We propose the use of this methodology in large-scale screening of the complete collection of yeast deletion mutants.
Affiliation(s)
- Jonas Warringer
- Department of Cell and Molecular Biology, Lundberg Laboratory, Göteborg University, Medicinaregatan 9c, 413 90 Göteborg, Sweden
289
Jordan IK, Wolf YI, Koonin EV. No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly. BMC Evol Biol 2003; 3:1. [PMID: 12515583 PMCID: PMC140311 DOI: 10.1186/1471-2148-3-1] [Citation(s) in RCA: 165] [Impact Index Per Article: 7.9] [Received: 11/10/2002] [Accepted: 01/06/2003] [Indexed: 11/10/2022]
Abstract
BACKGROUND It has been suggested that rates of protein evolution are influenced, to a great extent, by the proportion of amino acid residues that are directly involved in protein function. In agreement with this hypothesis, recent work has shown a negative correlation between evolutionary rates and the number of protein-protein interactions. However, the extent to which the number of protein-protein interactions influences evolutionary rates remains unclear. Here, we address this question at several different levels of evolutionary relatedness. RESULTS Manually curated data on the number of protein-protein interactions among Saccharomyces cerevisiae proteins were examined for possible correlation with evolutionary rates between S. cerevisiae and Schizosaccharomyces pombe orthologs. Only a very weak negative correlation between the number of interactions and the evolutionary rate of a protein was observed. Furthermore, no relationship was found between a more general measure of the evolutionary conservation of S. cerevisiae proteins, based on the taxonomic distribution of their homologs, and the number of protein-protein interactions. However, when the proteins from yeast were assorted into discrete bins according to the number of interactions, it turned out that the 6.5% of proteins with the greatest number of interactions evolved, on average, significantly more slowly than the rest of the proteins. Comparisons were also performed using protein-protein interaction data obtained with high-throughput analysis of Helicobacter pylori proteins. No convincing relationship between the number of protein-protein interactions and evolutionary rates was detected, either for comparisons of orthologs from two completely sequenced H. pylori strains or for comparisons of H. pylori and Campylobacter jejuni orthologs, even when the proteins were classified into bins by the number of interactions.
CONCLUSION The currently available comparative-genomic data do not support the hypothesis that the evolutionary rates of the majority of proteins substantially depend on the number of protein-protein interactions they are involved in. However, a small fraction of yeast proteins with the largest number of interactions (the hubs of the interaction network) tend to evolve slower than the bulk of the proteins.
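The binning analysis described above can be sketched as two steps: group proteins by interaction count, then average the evolutionary rate within each group. The abstract does not state the bin boundaries used. The edges, function name and values here are therefore hypothetical, chosen only to show the mechanics.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Bin = Tuple[float, float]


def mean_rate_by_bin(interactions: Sequence[int], rates: Sequence[float],
                     edges: Sequence[int]) -> Dict[Bin, float]:
    """Assign each protein to a half-open bin [lo, hi) of interaction count
    (edges must be increasing; the last bin is open-ended) and return the
    mean rate per non-empty bin. Counts below edges[0] are left out."""
    bounds = list(edges) + [float("inf")]
    groups: Dict[Bin, List[float]] = defaultdict(list)
    for k, r in zip(interactions, rates):
        for lo, hi in zip(bounds, bounds[1:]):
            if lo <= k < hi:
                groups[(lo, hi)].append(r)
                break
    return {b: mean(v) for b, v in sorted(groups.items())}


# Hypothetical data and bin edges, chosen only for illustration.
counts = [0, 1, 1, 3, 4, 12, 20]
rates = [0.40, 0.35, 0.30, 0.33, 0.31, 0.15, 0.12]
for (lo, hi), r in mean_rate_by_bin(counts, rates, edges=[0, 2, 10]).items():
    print(f"{lo}-{hi}: mean rate {r:.3f}")
```

A top bin whose mean rate is clearly lower than the others would match the "hubs evolve slower" pattern. A formal comparison would still need a significance test between bins.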
Affiliation(s)
- I King Jordan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894
290
291
Abstract
Structural analyses of a small number of protein families have shown that residues in protein interfaces are more conserved than average amino acid residues. This is also true of other ligand-binding and active site residues. This raises the question of whether protein interactions place additional constraints on sequence divergence beyond this general background of functional restrictions on all different types of proteins. In order to investigate this, the sequence identities of Saccharomyces cerevisiae (SC) proteins to their Schizosaccharomyces pombe (SP) orthologues were used as a measure of sequence divergence. The SC proteins were divided into those in stable complexes, those that participate in transient interactions and the remaining proteins. All types of proteins can undergo extensive divergence: all three sequence identity distributions range from less than 20% to over 90%. However, overall, protein interactions do place additional constraints on sequence divergence and the distributions differ significantly: proteins not known to be involved in interactions have an average sequence identity of 38%, while this value is 46% for proteins in stable complexes. Proteins that have transient interactions are intermediate between the two, with an average sequence identity of 41%. This trend is independent both of whether or not the proteins are involved in informational functions (transcription, translation and replication) and of protein dispensability.
Affiliation(s)
- Sarah A Teichmann
- MRC Laboratory of Molecular Biology, Hills Road, CB2 2QH, Cambridge, UK.
292
Torgerson DG, Kulathinal RJ, Singh RS. Mammalian sperm proteins are rapidly evolving: evidence of positive selection in functionally diverse genes. Mol Biol Evol 2002; 19:1973-80. [PMID: 12411606 DOI: 10.1093/oxfordjournals.molbev.a004021] [Citation(s) in RCA: 187] [Impact Index Per Article: 8.5] [Indexed: 11/14/2022]
Abstract
A growing number of genes involved in sex and reproduction have been demonstrated to be rapidly evolving. Here, we show that genes expressed solely in spermatozoa represent a highly diverged subset among mouse and human tissue-specific orthologs. The average rate of nonsynonymous substitutions per site (K(a)) is significantly higher in sperm proteins (mean K(a) = 0.18; N = 35) than in proteins expressed specifically in all other tissues (mean K(a) = 0.074; N = 473). No differences, however, are found in the synonymous substitution rate (K(s)) between tissues, suggesting that selective forces, and not mutation rate, explain the high rate of replacement substitutions in sperm proteins. Four out of 19 sperm-specific genes with characterized function demonstrated evidence of strong positive Darwinian selection, including a protein involved in gene regulation, Protamine-1 (PRM1), a protein involved in glycolysis, GAPDS, and two egg-binding proteins, Adam-2 precursor (ADAM2) and sperm-adhesion molecule-1 (SAM1). These results demonstrate the rapid evolution of sperm-specific genes and highlight the molecular action of sexual selection on a variety of characters involved in mammalian sperm function.
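The abstract compares Ka with Ks to separate selection from mutation rate. The sketch below computes mean Ka for a set of genes. It then flags genes whose whole-gene Ka/Ks exceeds 1, a conventional but conservative signature of positive selection. The abstract does not say which test the authors applied, so this illustrates the ratio only. The gene names and values are hypothetical, and estimating Ka and Ks from aligned codons is out of scope.

```python
from statistics import mean
from typing import Dict, List, Tuple


def ka_ks_ratio(ka: float, ks: float) -> float:
    """Ka/Ks (also written dN/dS); undefined when Ks is zero."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form Ka/Ks")
    return ka / ks


def summarize(genes: Dict[str, Tuple[float, float]]) -> Tuple[float, List[str]]:
    """Return the mean Ka over all genes and the sorted names of genes whose
    whole-gene Ka/Ks exceeds 1 (candidates for positive selection)."""
    mean_ka = mean(ka for ka, _ in genes.values())
    candidates = sorted(name for name, (ka, ks) in genes.items()
                        if ka_ks_ratio(ka, ks) > 1)
    return mean_ka, candidates


# Hypothetical per-gene (Ka, Ks) estimates for a mouse-human ortholog set.
sperm_genes = {"geneA": (0.30, 0.20), "geneB": (0.05, 0.40), "geneC": (0.10, 0.25)}
print(summarize(sperm_genes))
```

Averaging Ka over a whole gene can mask selection confined to a few sites. That is why a whole-gene ratio above 1 is treated here as strong evidence but not a required condition.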
Affiliation(s)
- Dara G Torgerson
- Department of Biology, McMaster University, Hamilton, Ontario, Canada.
293
Lipman DJ, Souvorov A, Koonin EV, Panchenko AR, Tatusova TA. The relationship of protein conservation and sequence length. BMC Evol Biol 2002; 2:20. [PMID: 12410938 PMCID: PMC137605 DOI: 10.1186/1471-2148-2-20] [Citation(s) in RCA: 107] [Impact Index Per Article: 4.9] [Received: 09/06/2002] [Accepted: 11/01/2002] [Indexed: 11/22/2022]
Abstract
BACKGROUND In general, the length of a protein sequence is determined by its function and the wide variance in the lengths of an organism's proteins reflects the diversity of specific functional roles for these proteins. However, additional evolutionary forces that affect the length of a protein may be revealed by studying the length distributions of proteins evolving under weaker functional constraints. RESULTS We performed sequence comparisons to distinguish highly conserved and poorly conserved proteins from the bacterium Escherichia coli, the archaeon Archaeoglobus fulgidus, and the eukaryotes Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens. For all organisms studied, the conserved and nonconserved proteins have strikingly different length distributions. The conserved proteins are, on average, longer than the poorly conserved ones, and the length distributions for the poorly conserved proteins have a relatively narrow peak, in contrast to the conserved proteins whose lengths spread over a wider range of values. For the two prokaryotes studied, the poorly conserved proteins approximate the minimal length distribution expected for a diverse range of structural folds. CONCLUSIONS There is a relationship between protein conservation and sequence length. For all the organisms studied, there seems to be a significant evolutionary trend favoring shorter proteins in the absence of other, more specific functional constraints.
Affiliation(s)
- David J Lipman
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Alexander Souvorov
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Anna R Panchenko
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Tatiana A Tatusova
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
294
Kitami T, Nadeau JH. Biochemical networking contributes more to genetic buffering in human and mouse metabolic pathways than does gene duplication. Nat Genet 2002; 32:191-4. [PMID: 12161750 DOI: 10.1038/ng945] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.1] [Indexed: 11/08/2022]
Abstract
During evolution, different genes evolve at unequal rates, reflecting the varying functional constraints on phenotype. An important contributor to this variation is genetic buffering, which reduces the potential detrimental effects of mutations. We studied whether gene duplication and redundant metabolic networks affect genetic buffering by comparing the evolutionary rates of 242 human and mouse orthologous genes involved in metabolic pathways. A gene with a redundant network is defined as one for which the structural layout of metabolic pathways provides an alternative metabolic route that can, in principle, compensate for the loss of a protein function encoded by the gene. We found that genes with redundant networks evolve at rates similar to those of genes without redundant networks, [corrected] but no significant difference was detected between single-copy genes and gene families. This implies that redundancy in metabolic networks provides significantly more genetic buffering than do gene families. We also found that genes encoding proteins involved in glycolysis and gluconeogenesis showed, as a group, a distinct pattern of variation, in contrast to genes involved in other pathways. These results suggest that redundant networks provide genetic buffering and contribute to the functional diversification of metabolic pathways.
Affiliation(s)
- Toshimori Kitami
- Department of Genetics, Center for Computational Genomics, Case Western Reserve University and University Hospitals of Cleveland, Cleveland, Ohio 44106, USA
295
Abstract
Human single nucleotide polymorphisms (SNPs) represent the most frequent type of human population DNA variation. One of the main goals of SNP research is to understand the genetics of human phenotypic variation and especially the genetic basis of human complex diseases. Non-synonymous coding SNPs (nsSNPs) comprise a group of SNPs that, together with SNPs in regulatory regions, are believed to have the highest impact on phenotype. Here we present a World Wide Web server to predict the effect of an nsSNP on protein structure and function. The prediction method enabled analysis of the publicly available SNP database HGVbase, which gave rise to a dataset of nsSNPs with predicted functionality. The dataset was further used to compare the effects of various structural and functional characteristics of amino acid substitutions responsible for phenotypic display of nsSNPs. We also studied the dependence of selective pressure on the structural and functional properties of proteins. We found that in our dataset the selective pressure against deleterious SNPs depends on the molecular function of the protein, although it is insensitive to several other protein features considered. The strongest selective pressure was detected for proteins involved in transcription regulation.
Affiliation(s)
- Vasily Ramensky
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
296
Jordan IK, Rogozin IB, Wolf YI, Koonin EV. Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res 2002; 12:962-8. [PMID: 12045149 PMCID: PMC1383730 DOI: 10.1101/gr.87702] [Citation(s) in RCA: 340] [Impact Index Per Article: 15.5] [Indexed: 02/02/2023]
Abstract
The "knockout-rate" prediction holds that essential genes should be more evolutionarily conserved than are nonessential genes. This is because negative (purifying) selection acting on essential genes is expected to be more stringent than that for nonessential genes, which are more functionally dispensable and/or redundant. However, a recent survey of evolutionary distances between Saccharomyces cerevisiae and Caenorhabditis elegans proteins did not reveal any difference between the rates of evolution for essential and nonessential genes. An analysis of mouse and rat orthologous genes also found that essential and nonessential genes evolved at similar rates when genes thought to evolve under directional selection were excluded from the analysis. In the present study, we combine genomic sequence data with experimental knockout data to compare the rates of evolution and the levels of selection for essential versus nonessential bacterial genes. In contrast to the results obtained for eukaryotic genes, essential bacterial genes appear to be more conserved than are nonessential genes over both relatively short (microevolutionary) and longer (macroevolutionary) time scales.
Affiliation(s)
- I King Jordan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
297
Abstract
The availability of multiple complete genome sequences from the same species can facilitate attempts to systematically address basic questions in genome evolution. We refer to such efforts as "microevolutionary genomics". We report the results of comparative analyses of complete intraspecific genome (and proteome) sequences from four bacterial species--Chlamydophila pneumoniae, Escherichia coli, Helicobacter pylori and Neisseria meningitidis. Comparisons of average synonymous (K(s)) and nonsynonymous (K(a)) substitution rates were used to assess the influence of various biological factors on the rate of protein evolution. For example, E. coli experiences the most intense purifying selection of the species analyzed, and this may be due to the relatively larger population size of this species. In addition, essential genes were shown to be more evolutionarily conserved than nonessential genes in E. coli and duplicated genes have higher rates of evolution than unique genes for all species studied except C. pneumoniae. Different functional categories of genes were shown to evolve at significantly different rates emphasizing the role of category-specific functional constraints in determining evolutionary rates. Finally, functionally characterized genes tend to be conserved between strains, while uncharacterized genes are over-represented among the unique, strain-specific genes. This suggests the possibility that nonessential genes are responsible for driving the evolutionary diversification between strains.
Affiliation(s)
- I King Jordan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
298
Harrison PM, Gerstein M. Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. J Mol Biol 2002; 318:1155-74. [PMID: 12083509 DOI: 10.1016/s0022-2836(02)00109-2] [Citation(s) in RCA: 120] [Impact Index Per Article: 5.5] [Indexed: 11/18/2022]
Abstract
Protein families can be used to understand many aspects of genomes, both their "live" and their "dead" parts (i.e. genes and pseudogenes). Surveys of genomes have revealed that, in every organism, there are always a few large families and many small ones, with the overall distribution following a power-law. This commonality is equally true for both genes and pseudogenes, and exists despite the fact that the specific families that are enlarged differ greatly between organisms. Furthermore, because of family structure there is great redundancy in proteomes, a fact linked to the large number of dispensable genes for each organism and the small size of the minimal, indispensable sub-proteome. Pseudogenes in prokaryotes represent families that are in the process of being dispensed with. In particular, the genome sequences of certain pathogenic bacteria (Mycobacterium leprae, Yersinia pestis and Rickettsia prowazekii) show how an organism can undergo reductive evolution on a large scale (i.e. the dying out of families) as a result of niche change. There appears to be less pressure to delete pseudogenes in eukaryotes. These can be divided into two varieties, duplicated and processed, where the latter involves reverse transcription from an mRNA intermediate. We discuss these collectively in yeast, worm, fly, and human. The fly has few pseudogenes apparently because of its high rate of genomic DNA deletion. In the other three organisms, the distribution of pseudogenes on the chromosome and amongst different families is highly non-uniform. Pseudogenes tend not to occur in the middle of chromosome arms, and tend to be associated with lineage-specific (as opposed to highly conserved) families that have environmental-response functions. This may be because, rather than being dead, they may form a reservoir of diverse "extra parts" that can be resurrected to help an organism adapt to its surroundings. 
In yeast, there may be a novel mechanism involving the [PSI+] prion that potentially enables this resurrection. In worm, the pseudogenes tend to arise out of families (e.g. chemoreceptors) that are greatly expanded in it compared to the fly. The human genome stands out in having many processed pseudogenes. These have a character very different from those of the duplicated variety, to a large extent just representing random insertions. Thus, their occurrence tends to be roughly in proportion to the amount of mRNA for a particular protein and to reflect the extent of the intergenic sequences. Further information about pseudogenes is available at http://genecensus.org/pseudogene
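A quick diagnostic for the power-law family-size distribution mentioned above is to fit a straight line to log(number of families) against log(family size). This is only a rough sketch under that assumption. Least squares on a log-log histogram is known to be biased, and maximum-likelihood estimators are generally preferred for a formal fit. The function name and input format are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def loglog_slope(family_sizes: Sequence[int]) -> Tuple[float, float]:
    """Least-squares line through (log size, log number of families of that size).
    For N(s) proportional to s**(-a), the fitted slope is close to -a."""
    counts = Counter(family_sizes)  # family size -> number of families
    if len(counts) < 2 or any(s <= 0 for s in counts):
        raise ValueError("need at least two distinct positive family sizes")
    xs = [math.log(s) for s in counts]
    ys = [math.log(c) for c in counts.values()]  # same key order as xs
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical proteome: many singleton families, few large ones.
sizes = [1] * 256 + [2] * 64 + [4] * 16 + [8] * 4 + [16]
print(loglog_slope(sizes))
```

The synthetic sizes here follow N(s) = 256 s^-2 exactly, so the fitted slope comes out at -2 up to floating-point rounding. Real family-size data are noisier, particularly in the sparsely populated tail of large families.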
Affiliation(s)
- Paul M Harrison
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520-8114, USA
299
Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW. Evolutionary rate in the protein interaction network. Science 2002; 296:750-2. [PMID: 11976460 DOI: 10.1126/science.1068696] [Citation(s) in RCA: 625] [Impact Index Per Article: 28.4] [Indexed: 11/02/2022]
Abstract
High-throughput screens have begun to reveal the protein interaction network that underpins most cellular functions in the yeast Saccharomyces cerevisiae. How the organization of this network affects the evolution of the proteins that compose it is a fundamental question in molecular evolution. We show that the connectivity of well-conserved proteins in the network is negatively correlated with their rate of evolution. Proteins with more interactors evolve more slowly not because they are more important to the organism, but because a greater proportion of the protein is directly involved in its function. At sites important for interaction between proteins, evolutionary changes may occur largely by coevolution, in which substitutions in one protein result in selection pressure for reciprocal changes in interacting partners. We confirm one predicted outcome of this process, namely that interacting proteins evolve at similar rates.
Affiliation(s)
- Hunter B Fraser
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA.
300
Abstract
Until recently, it was impracticable to identify the genes that are responsible for variation in continuous traits, or to directly observe the effects of their different alleles. Now, the abundance of genetic markers has made it possible to identify quantitative trait loci (QTL)--the regions of a chromosome or, ideally, individual sequence variants that are responsible for trait variation. What kind of QTL do we expect to find and what can our observations of QTL tell us about how organisms evolve? The key to understanding the evolutionary significance of QTL is to understand the nature of inherited variation, not in the immediate mechanistic sense of how genes influence phenotype, but, rather, to know what evolutionary forces maintain genetic variability.
Affiliation(s)
- N H Barton
- Institute of Cell, Animal and Population Biology, University of Edinburgh, West Mains Road, Edinburgh EH9 3JT, UK.