26
|
Kondrashov FA. Gene duplication as a mechanism of genomic adaptation to a changing environment. Proc Biol Sci 2012; 279:5048-57. [PMID: 22977152 PMCID: PMC3497230 DOI: 10.1098/rspb.2012.1108] [Citation(s) in RCA: 385] [Impact Index Per Article: 32.1] [Received: 05/14/2012] [Accepted: 08/21/2012] [Indexed: 01/13/2023] Open
Abstract
A subject of extensive study in evolutionary theory has been the issue of how neutral, redundant copies can be maintained in the genome for long periods of time. Concurrently, examples of adaptive gene duplications to various environmental conditions in different species have been described. At this point, it is too early to tell whether or not a substantial fraction of gene copies have initially achieved fixation by positive selection for increased dosage. Nevertheless, enough examples have accumulated in the literature that such a possibility should be considered. Here, I review the recent examples of adaptive gene duplications and make an attempt to draw generalizations on what types of genes may be particularly prone to be selected for under certain environmental conditions. The identification of copy-number variation in ecological field studies of species adapting to stressful or novel environmental conditions may improve our understanding of gene duplications as a mechanism of adaptation and its relevance to the long-term persistence of gene duplications.
|
27
|
Povolotskaya IS, Kondrashov FA, Ledda A, Vlasov PK. Stop codons in bacteria are not selectively equivalent. Biol Direct 2012; 7:30. [PMID: 22974057 PMCID: PMC3549826 DOI: 10.1186/1745-6150-7-30] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 05/13/2012] [Accepted: 08/22/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The evolution and genomic stop codon frequencies have not been rigorously studied with the exception of coding of non-canonical amino acids. Here we study the rate of evolution and frequency distribution of stop codons in bacterial genomes. RESULTS We show that in bacteria stop codons evolve slower than synonymous sites, suggesting the action of weak negative selection. However, the frequency of stop codons relative to genomic nucleotide content indicated that this selection regime is not straightforward. The frequency of TAA and TGA stop codons is GC-content dependent, with TAA decreasing and TGA increasing with GC-content, while TAG frequency is independent of GC-content. Applying a formal, analytical model to these data we found that the relationship between stop codon frequencies and nucleotide content cannot be explained by mutational biases or selection on nucleotide content. However, with weak nucleotide content-dependent selection on TAG, -0.5 < Nes < 1.5, the model fits all of the data and recapitulates the relationship between TAG and nucleotide content. For biologically plausible rates of mutations we show that, in bacteria, TAG stop codon is universally associated with lower fitness, with TAA being the optimal for G-content < 16% while for G-content > 16% TGA has a higher fitness than TAG. CONCLUSIONS Our data indicate that TAG codon is universally suboptimal in the bacterial lineage, such that TAA is likely to be the preferred stop codon for low GC content while the TGA is the preferred stop codon for high GC content. The optimization of stop codon usage may therefore be useful in genome engineering or gene expression optimization applications.
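As a rough illustration of the kind of tabulation this abstract describes, the following Python sketch counts terminal stop codons in a set of coding sequences and computes GC content. The function names and the toy sequences are hypothetical and are not taken from the paper; the authors' actual pipeline and analytical model are not reproduced here.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TAG", "TGA")  # the three standard stop codons


def gc_content(seq):
    """Fraction of G and C nucleotides in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def stop_codon_frequencies(coding_sequences):
    """Relative usage of TAA, TAG and TGA as the terminal codon.

    Sequences that do not end in a recognised stop codon are skipped.
    """
    counts = Counter()
    for cds in coding_sequences:
        last = cds.upper()[-3:]
        if last in STOP_CODONS:
            counts[last] += 1
    total = sum(counts.values())
    if total == 0:
        return {codon: 0.0 for codon in STOP_CODONS}
    return {codon: counts[codon] / total for codon in STOP_CODONS}


# Toy input, made up for illustration (not real genes).
toy_genes = ["ATGGCCTAA", "ATGAAATAA", "ATGCCCTGA", "ATGGGGTAG"]
usage = stop_codon_frequencies(toy_genes)
genome_gc = gc_content("".join(toy_genes))
```

Running the same two functions over many genomes would give one (GC content, stop codon usage) point per genome, which is the sort of relationship the abstract reports for TAA, TGA and TAG.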
|
28
|
Koblik EA, Red'kin YA, Meer MS, Derelle R, Golenkina SA, Kondrashov FA, Arkhipov VY. Acrocephalus orinus: a case of mistaken identity. PLoS One 2011; 6:e17716. [PMID: 21526114 PMCID: PMC3081296 DOI: 10.1371/journal.pone.0017716] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/25/2010] [Accepted: 02/11/2011] [Indexed: 11/19/2022] Open
Abstract
Recent discovery of the Large-billed Reed Warbler (Acrocephalus orinus) in museums and in the wild significantly expanded our knowledge of its morphological traits and genetic variability, and revealed new data on geographical distribution of the breeding grounds, migration routes and wintering locations of this species. It is now certain that A. orinus is breeding in Central Asia; however, the precise area of distribution remains unclear. The difficulty in the further study of this species lies in the small number of known specimens, with only 13 currently available in museums, and in the relative uncertainty of the breeding area and habitat of this species. Following morphological and genetic analyses from Svensson, et al, we describe 14 new A. orinus specimens from collections of Zoological Museums of the former USSR from the territory of Central Asian states. All of these specimens were erroneously labeled as Blyth's Reed Warbler (A. dumetorum), which is thought to be a breeding species in these areas. The 14 new A. orinus specimens were collected during breeding season while most of the 85 A. dumetorum specimens from the same area were collected during the migration period. Our data indicate that the Central Asian territory previously attributed as breeding grounds of A. dumetorum is likely to constitute the breeding territory of A. orinus. This rare case of a re-description of the breeding territory of a lost species emphasizes the importance of maintenance of museum collections around the world. If the present data on the breeding grounds of A. orinus are confirmed with field observations and collections, the literature on the biology of A. dumetorum from the southern part of its range may have to be reconsidered.
|
29
|
Breen MS, Kondrashov FA. Mitochondrial pathogenic mutations are population-specific. Biol Direct 2010; 5:68. [PMID: 21194457 PMCID: PMC3022564 DOI: 10.1186/1745-6150-5-68] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 12/13/2010] [Accepted: 12/31/2010] [Indexed: 01/09/2023] Open
Abstract
Background Surveying deleterious variation in human populations is crucial for our understanding, diagnosis and potential treatment of human genetic pathologies. A number of recent genome-wide analyses focused on the prevalence of segregating deleterious alleles in the nuclear genome. However, such studies have not been conducted for the mitochondrial genome. Results We present a systematic survey of polymorphisms in the human mitochondrial genome, including those predicted to be deleterious and those that correspond to known pathogenic mutations. Analyzing 4458 completely sequenced mitochondrial genomes we characterize the genetic diversity of different types of single nucleotide polymorphisms (SNPs) in African (L haplotypes) and non-African (M and N haplotypes) populations. We find that the overall level of polymorphism is higher in the mitochondrial compared to the nuclear genome, although the mitochondrial genome appears to be under stronger selection as indicated by proportionally fewer nonsynonymous than synonymous substitutions. The African mitochondrial genomes show higher heterozygosity, a greater number of polymorphic sites and higher frequencies of polymorphisms for synonymous, benign and damaging polymorphism than non-African genomes. However, African genomes carry significantly fewer SNPs that have been previously characterized as pathogenic compared to non-African genomes. Conclusions Finding SNPs classified as pathogenic to be the only category of polymorphisms that are more abundant in non-African genomes is best explained by a systematic ascertainment bias that favours the discovery of pathogenic polymorphisms segregating in non-African populations. This further suggests that, contrary to the common disease-common variant hypothesis, pathogenic mutations are largely population-specific and different SNPs may be associated with the same disease in different populations. 
Therefore, to obtain a comprehensive picture of the deleterious variability in the human population, as well as to improve the diagnostics of individuals carrying African mitochondrial haplotypes, it is necessary to survey different populations independently. Reviewers This article was reviewed by Dr Mikhail Gelfand, Dr Vasily Ramensky (nominated by Dr Eugene Koonin) and Dr David Rand (nominated by Dr Laurence Hurst).
|
30
|
Povolotskaya IS, Kondrashov FA. Sequence space and the ongoing expansion of the protein universe. Nature 2010; 465:922-6. [PMID: 20485343 DOI: 10.1038/nature09105] [Citation(s) in RCA: 142] [Impact Index Per Article: 10.1] [Received: 11/23/2009] [Accepted: 04/19/2010] [Indexed: 11/09/2022]
Abstract
The need to maintain the structural and functional integrity of an evolving protein severely restricts the repertoire of acceptable amino-acid substitutions. However, it is not known whether these restrictions impose a global limit on how far homologous protein sequences can diverge from each other. Here we explore the limits of protein evolution using sequence divergence data. We formulate a computational approach to study the rate of divergence of distant protein sequences and measure this rate for ancient proteins, those that were present in the last universal common ancestor. We show that ancient proteins are still diverging from each other, indicating an ongoing expansion of the protein sequence universe. The slow rate of this divergence is imposed by the sparseness of functional protein sequences in sequence space and the ruggedness of the protein fitness landscape: approximately 98 per cent of sites cannot accept an amino-acid substitution at any given moment but a vast majority of all sites may eventually be permitted to evolve when other, compensatory, changes occur. Thus, approximately 3.5 × 10^9 yr has not been enough to reach the limit of divergent evolution of proteins, and for most proteins the limit of sequence similarity imposed by common function may not exceed that of random sequences.
|
31
|
Kondrashov FA, Kondrashov AS. Measurements of spontaneous rates of mutations in the recent past and the near future. Philos Trans R Soc Lond B Biol Sci 2010; 365:1169-76. [PMID: 20308091 PMCID: PMC2871817 DOI: 10.1098/rstb.2009.0286] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Indexed: 12/13/2022] Open
Abstract
The rate of spontaneous mutation in natural populations is a fundamental parameter for many evolutionary phenomena. Because the rate of mutation is generally low, most of what is currently known about mutation has been obtained through indirect, complex and imprecise methodological approaches. However, in the past few years genome-wide sequencing of closely related individuals has made it possible to estimate the rates of mutation directly at the level of the DNA, avoiding most of the problems associated with using indirect methods. Here, we review the methods used in the past with an emphasis on next generation sequencing, which may soon make the accurate measurement of spontaneous mutation rates a matter of routine.
|
32
|
Meer MV, Kondrashov AS, Artzy-Randrup Y, Kondrashov FA. Compensatory evolution in mitochondrial tRNAs navigates valleys of low fitness. Nature 2010; 464:279-82. [PMID: 20182427 DOI: 10.1038/nature08691] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Received: 06/03/2009] [Accepted: 11/16/2009] [Indexed: 12/25/2022]
Abstract
A long-standing controversy in evolutionary biology is whether or not evolving lineages can cross valleys on the fitness landscape that correspond to low-fitness genotypes, which can eventually enable them to reach isolated fitness peaks. Here we study the fitness landscapes traversed by switches between different AU and GC Watson-Crick nucleotide pairs at complementary sites of mitochondrial transfer RNA stem regions in 83 mammalian species. We find that such Watson-Crick switches occur 30-40 times more slowly than pairs of neutral substitutions, and that alleles corresponding to GU and AC non-Watson-Crick intermediate states segregate within human populations at low frequencies, similar to those of non-synonymous alleles. Substitutions leading to a Watson-Crick switch are strongly correlated, especially in mitochondrial tRNAs encoded on the GT-nucleotide-rich strand of the mitochondrial genome. Using these data we estimate that a typical Watson-Crick switch involves crossing a fitness valley of a depth of about 10^-3 or even about 10^-2, with AC intermediates being slightly more deleterious than GU intermediates. This compensatory evolution must proceed through rare intermediate variants that never reach fixation. The ubiquitous nature of compensatory evolution in mammalian mitochondrial tRNAs and other molecules implies that simultaneous fixation of two alleles that are individually deleterious may be a common phenomenon at the molecular level.
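The pair states named in this abstract can be made concrete with a small Python sketch. It is only an illustration: the function names are hypothetical, and pairs are treated as unordered (so AU and UA count as the same state), which is a simplification assumed here and not something the abstract specifies.

```python
# Unordered nucleotide pairs at complementary stem sites.
WATSON_CRICK = {frozenset("AU"), frozenset("GC")}
INTERMEDIATE = {frozenset("GU"), frozenset("AC")}  # non-Watson-Crick states named in the abstract


def classify_pair(base5, base3):
    """Classify an RNA stem pair as 'watson-crick', 'intermediate' or 'other'."""
    pair = frozenset((base5.upper(), base3.upper()))
    if pair in WATSON_CRICK:
        return "watson-crick"
    if pair in INTERMEDIATE:
        return "intermediate"
    return "other"


def is_watson_crick_switch(ancestral, derived):
    """True when both states are Watson-Crick pairs of different type (AU <-> GC)."""
    a = frozenset(base.upper() for base in ancestral)
    d = frozenset(base.upper() for base in derived)
    return a in WATSON_CRICK and d in WATSON_CRICK and a != d
```

Under this classification, a switch from AU to GC needs two substitutions, and a single substitution from AU leads to either a GU or an AC pair, which are the intermediate states the abstract refers to.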
|
33
|
Kondrashov AS, Povolotskaya IS, Ivankov DN, Kondrashov FA. Rate of sequence divergence under constant selection. Biol Direct 2010; 5:5. [PMID: 20092641 PMCID: PMC2835663 DOI: 10.1186/1745-6150-5-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 12/29/2009] [Accepted: 01/21/2010] [Indexed: 12/02/2022] Open
Abstract
Background Divergence of two independently evolving sequences that originated from a common ancestor can be described by two parameters, the asymptotic level of divergence E and the rate r at which this level of divergence is approached. Constant negative selection impedes allele replacements and, therefore, is routinely assumed to decelerate sequence divergence. However, its impact on E and on r has not been formally investigated. Results Strong selection that favors only one allele can make E arbitrarily small and r arbitrarily large. In contrast, in the case of 4 possible alleles and equal mutation rates, the lowest value of r, attained when two alleles confer equal fitnesses and the other two are strongly deleterious, is only two times lower than its value under selective neutrality. Conclusions Constant selection can strongly constrain the level of sequence divergence, but cannot reduce substantially the rate at which this level is approached. In particular, under any constant selection the divergence of sequences that accumulated one substitution per neutral site since their origin from the common ancestor must already constitute at least one half of the asymptotic divergence at sites under such selection. Reviewers This article was reviewed by Drs. Nicolas Galtier, Sergei Maslov, and Nick Grishin.
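The two-parameter description in this abstract can be illustrated with a worked sketch. It assumes a Jukes-Cantor-style model (four alleles, equal mutation rates), which is one simple instance of the setting the abstract mentions and not necessarily the authors' exact formulation; the function name, the parameter values and the choice of measuring time in expected substitutions per neutral site (d) are assumptions made here. Divergence is written as D(d) = E * (1 - exp(-r * d)).

```python
import math


def divergence(d, asymptote, rate):
    """Expected divergence D(d) = E * (1 - exp(-r * d)).

    d is the expected number of substitutions per neutral site
    separating the two sequences.
    """
    return asymptote * (1.0 - math.exp(-rate * d))


# Assumed parameter values for a four-allele, equal-mutation-rate model.
NEUTRAL = {"asymptote": 3 / 4, "rate": 4 / 3}      # all four alleles selectively neutral
TWO_ALLELES = {"asymptote": 1 / 2, "rate": 2 / 3}  # two equally fit alleles, two strongly deleterious

# Ratio of the two rates; approximately 2 under these assumptions.
rate_ratio = NEUTRAL["rate"] / TWO_ALLELES["rate"]

for d in (0.1, 0.5, 1.0, 2.0):
    print(d, divergence(d, **NEUTRAL), divergence(d, **TWO_ALLELES))
```

In this sketch the selection regime changes the asymptote E and halves the rate r relative to neutrality, which is consistent with the abstract's statement that the lowest value of r in the four-allele case is only two times lower than its neutral value.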
|
34
|
Alkalaeva E, Eliseev B, Ambrogelly A, Vlasov P, Kondrashov FA, Gundllapalli S, Frolova L, Söll D, Kisselev L. Translation termination in pyrrolysine-utilizing archaea. FEBS Lett 2009; 583:3455-60. [PMID: 19796638 DOI: 10.1016/j.febslet.2009.09.044] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 08/28/2009] [Accepted: 09/24/2009] [Indexed: 10/20/2022]
Abstract
Although some data link archaeal and eukaryotic translation, the overall mechanism of protein synthesis in archaea remains largely obscure. Both archaeal (aRF1) and eukaryotic (eRF1) single release factors recognize all three stop codons. The archaeal family Methanosarcinaceae contains two aRF1 homologs, and also uses the UAG stop to encode the 22nd amino acid, pyrrolysine. Here we provide an analysis of the last stage of archaeal translation in pyrrolysine-utilizing species. We demonstrated that only one of two Methanosarcina barkeri aRF1 homologs possesses activity and recognizes all three stop codons. The second aRF1 homolog may have another unknown function. The mechanism of pyrrolysine incorporation in the Methanosarcinaceae is discussed.
|
35
|
|
36
|
Schmidt S, Gerasimova A, Kondrashov FA, Adzuhbei IA, Kondrashov AS, Sunyaev S. Hypermutable non-synonymous sites are under stronger negative selection. PLoS Genet 2008; 4:e1000281. [PMID: 19043566 PMCID: PMC2583910 DOI: 10.1371/journal.pgen.1000281] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 12/08/2007] [Accepted: 10/27/2008] [Indexed: 12/04/2022] Open
Abstract
Mutation rate varies greatly between nucleotide sites of the human genome and depends both on the global genomic location and the local sequence context of a site. In particular, CpG context elevates the mutation rate by an order of magnitude. Mutations also vary widely in their effect on the molecular function, phenotype, and fitness. Independence of the probability of occurrence of a new mutation from its effect has been a fundamental premise in genetics. However, highly mutable contexts may be preserved by negative selection at important sites but destroyed by mutation at sites under no selection. Thus, there may be a positive correlation between the rate of mutations at a nucleotide site and the magnitude of their effect on fitness. We studied the impact of CpG context on the rate of human–chimpanzee divergence and on intrahuman nucleotide diversity at non-synonymous coding sites. We compared nucleotides that occupy identical positions within codons of identical amino acids and only differ by being within versus outside CpG context. Nucleotides within CpG context are under a stronger negative selection, as revealed by their lower, proportionally to the mutation rate, rate of evolution and nucleotide diversity. In particular, the probability of fixation of a non-synonymous transition at a CpG site is two times lower than at a non-CpG site. Thus, sites with different mutation rates are not necessarily selectively equivalent. This suggests that the mutation rate may complement sequence conservation as a characteristic predictive of functional importance of nucleotide sites. Mutations occur in some sites in the genome more frequently than in others. Similarly, mutations in some sites have greater consequences than in others. The effect of mutations might not be independent of the frequency with which mutations occur.
Indeed, sites where mutations happen frequently will be preserved if the effects of these mutations are severe or will otherwise be allowed to mutate if there are no consequences for the organism. We compared both human–chimpanzee differences and sequence variation among humans in protein coding genes. We found that highly mutable nucleotide sites, such as the dinucleotide CpG, are on average more important and more frequently preserved by natural selection. Using this information, together with other features such as sequence conservation, opens a new perspective to predict the effect of human mutations, including their potential involvement in diseases.
|
37
|
Assis R, Kondrashov AS, Koonin EV, Kondrashov FA. Nested genes and increasing organizational complexity of metazoan genomes. Trends Genet 2008; 24:475-8. [PMID: 18774620 PMCID: PMC3380635 DOI: 10.1016/j.tig.2008.08.003] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Received: 04/23/2008] [Revised: 07/24/2008] [Accepted: 08/01/2008] [Indexed: 10/21/2022]
Abstract
The most common form of protein-coding gene overlap in eukaryotes is a simple nested structure, whereby one gene is embedded in an intron of another. Analysis of nested protein-coding genes in vertebrates, fruit flies and nematodes revealed substantially higher rates of evolutionary gains than losses. The accumulation of nested gene structures could not be attributed to any obvious functional relationships between the genes involved and represents an increase of the organizational complexity of animal genomes via a neutral process.
|
38
|
Donaldson ZR, Kondrashov FA, Putnam A, Bai Y, Stoinski TL, Hammock EAD, Young LJ. Evolution of a behavior-linked microsatellite-containing element in the 5' flanking region of the primate AVPR1A gene. BMC Evol Biol 2008; 8:180. [PMID: 18573213 PMCID: PMC2483724 DOI: 10.1186/1471-2148-8-180] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Received: 03/06/2008] [Accepted: 06/23/2008] [Indexed: 11/29/2022] Open
Abstract
Background The arginine vasopressin V1a receptor (V1aR) modulates social cognition and behavior in a wide variety of species. Variation in a repetitive microsatellite element in the 5' flanking region of the V1aR gene (AVPR1A) in rodents has been associated with variation in brain V1aR expression and in social behavior. In humans, the 5' flanking region of AVPR1A contains a tandem duplication of two ~350 bp, microsatellite-containing elements located approximately 3.5 kb upstream of the transcription start site. The first block, referred to as DupA, contains a polymorphic (GT)25 microsatellite; the second block, DupB, has a complex (CT)4-(TT)-(CT)8-(GT)24 polymorphic motif, known as RS3. Polymorphisms in RS3 have been associated with variation in sociobehavioral traits in humans, including autism spectrum disorders. Thus, evolution of these regions may have contributed to variation in social behavior in primates. We examined the structure of these regions in six ape, six monkey, and one prosimian species. Results Both tandem repeat blocks are present upstream of the AVPR1A coding region in five of the ape species we investigated, while monkeys have only one copy of this region. As in humans, the microsatellites within DupA and DupB are polymorphic in many primate species. Furthermore, both single (lacking DupB) and duplicated alleles (containing both DupA and DupB) are present in chimpanzee (Pan troglodytes) populations with allele frequencies of 0.795 and 0.205 for the single and duplicated alleles, respectively, based on the analysis of 47 wild-caught individuals. Finally, a phylogenetic reconstruction suggests two alternate evolutionary histories for this locus. Conclusion There is no obvious relationship between the presence of the RS3 duplication and social organization in primates. However, polymorphisms identified in some species may be useful in future genetic association studies. 
In particular, the presence of both single and duplicated alleles in chimpanzees provides a unique opportunity to assess the functional role of this duplication in contributing to variation in social behavior in primates. While our initial studies show no signs of directional selection on this locus in chimps, pharmacological and genetic association studies support a potential role for this region in influencing V1aR expression and social behavior.
|
39
|
Popadin KY, Mamirova LA, Kondrashov FA. A manually curated database of tetrapod mitochondrially encoded tRNA sequences and secondary structures. BMC Bioinformatics 2007; 8:441. [PMID: 17999775 PMCID: PMC2206058 DOI: 10.1186/1471-2105-8-441] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 03/14/2007] [Accepted: 11/14/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Mitochondrial tRNAs have been studied by structural biologists interested in their secondary structure characteristics, by evolutionary biologists researching patterns of compensatory and structural evolution, and in medical studies directed towards understanding the basis of human disease. However, an up-to-date, manually curated database of mitochondrially encoded tRNAs from higher animals is currently not available. DESCRIPTION We obtained the complete mitochondrial sequence for 277 tetrapod species from GenBank and re-annotated all of the tRNAs based on a multiple alignment of each tRNA gene and secondary structure prediction made independently for each tRNA. The mitochondrial (mt) tRNA sequences and the secondary structure based multiple alignments are freely available as Supplemental Information online. CONCLUSION We compiled a manually curated database of mitochondrially encoded tRNAs from tetrapods with completely sequenced genomes. In the course of our work, we reannotated more than 10% of all tetrapod mt-tRNAs and subsequently predicted the secondary structures of 6060 mitochondrial tRNAs. This carefully constructed database can be utilized to enhance our knowledge in several different fields including the evolution of mt-tRNA secondary structure and prediction of pathogenic mt-tRNA mutations. In addition, researchers reporting novel mitochondrial genome sequences should check their tRNA gene annotations against our database to ensure a higher level of fidelity of their annotation.
|
40
|
Bazykin GA, Kondrashov FA, Brudno M, Poliakov A, Dubchak I, Kondrashov AS. Extensive parallelism in protein evolution. Biol Direct 2007; 2:20. [PMID: 17705846 PMCID: PMC2020468 DOI: 10.1186/1745-6150-2-20] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Received: 05/25/2007] [Accepted: 08/16/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Independently evolving lineages mostly accumulate different changes, which leads to their gradual divergence. However, parallel accumulation of identical changes is also common, especially in traits with only a small number of possible states. RESULTS We characterize parallelism in evolution of coding sequences in three four-species sets of genomes of mammals, Drosophila, and yeasts. Each such set contains two independent evolutionary paths, which we call paths I and II. An amino acid replacement which occurred along path I also occurs along path II with the probability 50-80% of that expected under selective neutrality. Thus, the per site rate of parallel evolution of proteins is several times higher than their average rate of evolution, but still lower than the rate of evolution of neutral sequences. This deficit may be caused by changes in the fitness landscape, leading to a replacement being possible along path I but not along path II. However, constant, weak selection assumed by the nearly neutral model of evolution appears to be a more likely explanation. Then, the average coefficient of selection associated with an amino acid replacement, in the units of the effective population size, must exceed approximately 0.4, and the fraction of effectively neutral replacements must be below approximately 30%. At a majority of evolvable amino acid sites, only a relatively small number of different amino acids is permitted. CONCLUSION High, but below-neutral, rates of parallel amino acid replacements suggest that a majority of amino acid replacements that occur in evolution are subject to weak, but non-trivial, selection, as predicted by Ohta's nearly-neutral theory.
|
41
|
Plotnikova OV, Kondrashov FA, Vlasov PK, Grigorenko AP, Ginter EK, Rogaev EI. Conversion and compensatory evolution of the gamma-crystallin genes and identification of a cataractogenic mutation that reverses the sequence of the human CRYGD gene to an ancestral state. Am J Hum Genet 2007; 81:32-43. [PMID: 17564961 PMCID: PMC1950927 DOI: 10.1086/518616] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 01/03/2007] [Accepted: 03/30/2007] [Indexed: 11/03/2022] Open
Abstract
We identified a mutation in the CRYGD gene (P23S) of the gamma-crystallin gene cluster that is associated with a polymorphic congenital cataract that occurs with frequency of approximately 0.3% in a human population. To gain insight into the molecular mechanism of the pathogenesis of gamma-crystallin isoforms, we undertook an evolutionary analysis of the available mammalian and newly obtained primate sequences of the gamma-crystallin genes. The cataract-associated serine at site 23 corresponds to the ancestral state, since it was found in CRYGD of a lower primate and all the surveyed nonprimate mammals. Crystallin proteins include two structurally similar domains, and substitutions in mammalian CRYGD protein at site 23 of the first domain were always associated with substitutions in the structurally reciprocal sites 109 and 136 of the second domain. These data suggest that the cataractogenic effect of serine at site 23 in the N-terminal domain of CRYGD may be compensated indirectly by amino acid changes in a distal domain. We also found that gene conversion was a factor in the evolution of the gamma-crystallin gene cluster throughout different mammalian clades. The high rate of gene conversion observed between the functional CRYGD gene and two primate gamma-crystallin pseudogenes (CRYGEP1 and CRYGFP1) coupled with a surprising finding of apparent negative selection in primate pseudogenes suggest a deleterious impact of recently derived pseudogenes involved in gene conversion in the gamma-crystallin gene cluster.
|
42
|
Kondrashov FA, Gurbich TA, Vlasov PK. Selection for functional uniformity of tuf duplicates in gamma-proteobacteria. Trends Genet 2007; 23:215-8. [PMID: 17383049 DOI: 10.1016/j.tig.2007.03.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 08/28/2006] [Revised: 02/14/2007] [Accepted: 03/07/2007] [Indexed: 10/23/2022]
Abstract
Having an extra copy of a gene is thought to provide some functional redundancy, which results in a higher rate of evolution in duplicated genes. In this article, we estimate the impact of gene duplication on the selection of tuf paralogs, and we find that in the absence of gene conversion, tuf paralogs have evolved significantly slower than when gene conversion has been a factor in their evolution. Thus, tuf gene copies evolve under a selective pressure that ensures their functional uniformity, and gene conversion reduces selection against amino acid substitutions that affect the function of the encoded protein, EF-Tu.
|
43
|
Kondrashov FA, Koonin EV, Morgunov IG, Finogenova TV, Kondrashova MN. Evolution of glyoxylate cycle enzymes in Metazoa: evidence of multiple horizontal transfer events and pseudogene formation. Biol Direct 2006; 1:31. [PMID: 17059607 PMCID: PMC1630690 DOI: 10.1186/1745-6150-1-31] [Citation(s) in RCA: 111] [Impact Index Per Article: 6.2] [Received: 10/16/2006] [Accepted: 10/23/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The glyoxylate cycle is thought to be present in bacteria, protists, plants, fungi, and nematodes, but not in other Metazoa. However, activity of the glyoxylate cycle enzymes, malate synthase (MS) and isocitrate lyase (ICL), in animal tissues has been reported. In order to clarify the status of the MS and ICL genes in animals and get an insight into their evolution, we undertook a comparative-genomic study. RESULTS Using sequence similarity searches, we identified MS genes in arthropods, echinoderms, and vertebrates, including platypus and opossum, but not in the numerous sequenced genomes of placental mammals. The regions of the placental mammals' genomes expected to code for malate synthase, as determined by comparison of the gene orders in vertebrate genomes, show clear similarity to the opossum MS sequence but contain stop codons, indicating that the MS gene became a pseudogene in placental mammals. By contrast, the ICL gene is undetectable in animals other than the nematodes that possess a bifunctional, fused ICL-MS gene. Examination of phylogenetic trees of MS and ICL suggests multiple horizontal gene transfer events that probably went in both directions between several bacterial and eukaryotic lineages. The strongest evidence was obtained for the acquisition of the bifunctional ICL-MS gene from an as yet unknown bacterial source with the corresponding operonic organization by the common ancestor of the nematodes. CONCLUSION The distribution of the MS and ICL genes in animals suggests that either they encode alternative enzymes of the glyoxylate cycle that are not orthologous to the known MS and ICL or the animal MS acquired a new function that remains to be characterized. Regardless of the ultimate solution to this conundrum, the genes for the glyoxylate cycle enzymes present a remarkable variety of evolutionary events including unusual horizontal gene transfer from bacteria to animals.
|
44
|
Babenko VN, Basu MK, Kondrashov FA, Rogozin IB, Koonin EV. Signs of positive selection of somatic mutations in human cancers detected by EST sequence analysis. BMC Cancer 2006; 6:36. [PMID: 16469093 PMCID: PMC1431556 DOI: 10.1186/1471-2407-6-36] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 10/15/2005] [Accepted: 02/09/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Carcinogenesis typically involves multiple somatic mutations in caretaker (DNA repair) and gatekeeper (tumor suppressors and oncogenes) genes. Analysis of mutation spectra of the tumor suppressor that is most commonly mutated in human cancers, p53, unexpectedly suggested that somatic evolution of the p53 gene during tumorigenesis is dominated by positive selection for gain of function. This conclusion is supported by accumulating experimental evidence of evolution of new functions of p53 in tumors. These findings prompted a genome-wide analysis of possible positive selection during tumor evolution. METHODS A comprehensive analysis of probable somatic mutations in the sequences of Expressed Sequence Tags (ESTs) from malignant tumors and normal tissues was performed in order to assess the prevalence of positive selection in cancer evolution. For each EST, the numbers of synonymous and non-synonymous substitutions were calculated. In order to identify genes with a signature of positive selection in cancers, these numbers were compared to: i) expected numbers and ii) the numbers for the respective genes in the ESTs from normal tissues. RESULTS We identified 112 genes with a signature of positive selection in cancers, i.e., a significantly elevated ratio of non-synonymous to synonymous substitutions, in tumors as compared to 37 such genes in an approximately equal-sized EST collection from normal tissues. A substantial fraction of the tumor-specific positive-selection candidates have experimentally demonstrated or strongly predicted links to cancer. CONCLUSION The results of EST analysis should be interpreted with extreme caution given the noise introduced by sequencing errors and undetected polymorphisms. Furthermore, an inherent limitation of EST analysis is that multiple mutations amenable to statistical analysis can be detected only in relatively highly expressed genes. Nevertheless, the present results suggest that positive selection might affect a substantial number of genes during tumorigenic somatic evolution.
|
45
|
Rogaev EI, Moliaka YK, Malyarchuk BA, Kondrashov FA, Derenko MV, Chumakov I, Grigorenko AP. Complete mitochondrial genome and phylogeny of Pleistocene mammoth Mammuthus primigenius. PLoS Biol 2006; 4:e73. [PMID: 16448217 PMCID: PMC1360101 DOI: 10.1371/journal.pbio.0040073] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.3] [Received: 12/07/2005] [Accepted: 01/10/2006] [Indexed: 11/18/2022] Open
Abstract
Phylogenetic relationships between the extinct woolly mammoth (Mammuthus primigenius) and the Asian (Elephas maximus) and African savanna (Loxodonta africana) elephants remain unresolved. Here, we report the sequence of the complete mitochondrial genome (16,842 base pairs) of a woolly mammoth extracted from permafrost-preserved remains from the Pleistocene epoch, the oldest mitochondrial genome sequence determined to date. We demonstrate that well-preserved mitochondrial genome fragments, as long as approximately 1,600-1,700 base pairs, can be retrieved from pre-Holocene remains of an extinct species. Phylogenetic reconstruction of the Elephantinae clade suggests that M. primigenius and E. maximus are sister species that diverged soon after their common ancestor split from the L. africana lineage. Low nucleotide diversity found between independently determined mitochondrial genomic sequences of woolly mammoths separated geographically and in time suggests that north-eastern Siberia was occupied by a relatively homogeneous population of M. primigenius throughout the late Pleistocene.
|
46
|
Kondrashov FA, Ogurtsov AY, Kondrashov AS. Selection in favor of nucleotides G and C diversifies evolution rates and levels of polymorphism at mammalian synonymous sites. J Theor Biol 2005; 240:616-26. [PMID: 16343547 DOI: 10.1016/j.jtbi.2005.10.020] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.7] [Received: 06/20/2005] [Revised: 10/26/2005] [Accepted: 10/27/2005] [Indexed: 11/24/2022]
Abstract
The impact of synonymous nucleotide substitutions on fitness in mammals remains controversial. Despite some indications of selective constraint, synonymous sites are often assumed to be neutral, and the rate of their evolution is used as a proxy for mutation rate. We subdivide all sites into four classes in terms of the mutable CpG context, nonCpG, postC, preG, and postCpreG, and compare four-fold synonymous sites and intron sites residing outside transposable elements. The distribution of the rate of evolution across all synonymous sites is trimodal. Rate of evolution at nonCpG synonymous sites, not preceded by C and not followed by G, is approximately 10% below that at such intron sites. In contrast, rate of evolution at postCpreG synonymous sites is approximately 30% above that at such intron sites. Finally, synonymous and intron postC and preG sites evolve at similar rates. The relationship between the levels of polymorphism at the corresponding synonymous and intron sites is very similar to that between their rates of evolution. Within every class, synonymous sites are occupied by G or C much more often than intron sites, whose nucleotide composition is consistent with neutral mutation-drift equilibrium. These patterns suggest that synonymous sites are under weak selection in favor of G and C, with the average coefficient s ≈ 0.25/Ne ≈ 10^-5, where Ne is the effective population size. Such selection decelerates evolution and reduces variability at sites with symmetric mutation, but has the opposite effects at sites where the favored nucleotides are more mutable. The amino-acid composition of proteins dictates that many synonymous sites are CpG-prone, which causes them, on average, to evolve faster and to be more polymorphic than intron sites. An average genotype carries approximately 10^7 suboptimal nucleotides at synonymous sites, implying synergistic epistasis in selection against them.
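The four context classes named in this abstract can be illustrated with a short sketch. It is not the authors' code: the function name `cpg_context` and the rule for sequence ends (a missing neighbour is treated as neither C nor G) are assumptions made only for illustration.

```python
def cpg_context(seq: str, i: int) -> str:
    """Classify position i of seq by its immediate neighbours.

    Hypothetical helper: a site preceded by C and/or followed by G can be
    part of a mutable CpG dinucleotide, depending on which base it carries.
    """
    s = seq.upper()
    prev_is_c = i > 0 and s[i - 1] == "C"
    next_is_g = i + 1 < len(s) and s[i + 1] == "G"
    if prev_is_c and next_is_g:
        return "postCpreG"
    if prev_is_c:
        return "postC"
    if next_is_g:
        return "preG"
    return "nonCpG"


if __name__ == "__main__":
    # Classify the middle position (index 2) of a few toy 5-mers.
    for toy in ("ACAGT", "ACATT", "AAAGT", "AAATT"):
        print(toy, cpg_context(toy, 2))
```

Only the context is inspected, not the base at the site itself, which matches the abstract's description of nonCpG sites as "not preceded by C and not followed by G".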
|
47
|
Kondrashov FA, Kondrashov AS. Role of selection in fixation of gene duplications. J Theor Biol 2005; 239:141-51. [PMID: 16242725 DOI: 10.1016/j.jtbi.2005.08.033] [Citation(s) in RCA: 138] [Impact Index Per Article: 7.3] [Received: 02/05/2005] [Revised: 05/25/2005] [Accepted: 05/26/2005] [Indexed: 02/02/2023]
Abstract
New genes commonly appear through complete or partial duplications of pre-existing genes. Duplications of long DNA segments are constantly produced by rare mutations, may become fixed in a population by selection or random drift, and are subject to divergent evolution of the paralogous sequences after fixation, although gene conversion can impede this process. New data shed some light on each of these processes. Mutations which involve duplications can occur through at least two different mechanisms, backward strand slippage during DNA replication and unequal crossing-over. The background rate of duplication of a complete gene in humans is 10^-9 to 10^-10 per generation, although many genes located within hot-spots of large-scale mutation are duplicated much more often. Many gene duplications affect fitness strongly, and are responsible, through gene dosage effects, for a number of genetic diseases. However, high levels of intrapopulation polymorphism caused by presence or absence of long, gene-containing DNA segments imply that some duplications are not under strong selection. The polymorphism to fixation ratios appear to be approximately the same for gene duplications and for presumably selectively neutral nucleotide substitutions, which, according to the McDonald-Kreitman test, is consistent with selective neutrality of duplications. However, this pattern can also be due to negative selection against most segregating duplications and positive selection for at least some duplications which become fixed. Patterns in post-fixation evolution of duplicated genes do not easily reveal the causes of fixations. Many gene duplications which became fixed recently in a variety of organisms were positively selected because the increased expression of the corresponding genes was beneficial. The effects of gene dosage provide a unified framework for studying all phases of the life history of a gene duplication. Application of well-known methods of evolutionary genetics to accumulating data on new, polymorphic, and fixed duplications will enhance our understanding of the role of natural selection in evolution by gene duplication.
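The McDonald-Kreitman-style comparison of polymorphism-to-fixation ratios mentioned in this abstract can be sketched as a 2x2 table test. This is an illustrative sketch only, not the analysis from the paper: the function name `mk_table_test`, the use of Fisher's exact test (one common choice for such tables), and all counts are assumptions.

```python
from math import comb


def mk_table_test(p_test, d_test, p_neutral, d_neutral):
    """Compare polymorphism (p) to fixation (d) counts in two classes.

    Returns (neutrality_index, two_sided_fisher_p). Hypothetical helper;
    d_test and p_neutral must be non-zero for the index to be defined.
    """
    # Ratio of the two polymorphism-to-fixation ratios; 1.0 means equal ratios.
    neutrality_index = (p_test / d_test) / (p_neutral / d_neutral)

    row1 = p_test + d_test        # all events in the tested class
    col1 = p_test + p_neutral     # all polymorphisms
    n = row1 + p_neutral + d_neutral

    def prob(k):
        # Hypergeometric probability of k polymorphisms in the tested class.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(p_test)
    # Two-sided p-value: sum over all tables no more probable than the observed one.
    p_value = sum(prob(k) for k in range(lo, hi + 1)
                  if prob(k) <= p_obs * (1 + 1e-9))
    return neutrality_index, min(1.0, p_value)


if __name__ == "__main__":
    # Made-up counts: duplications (tested class) vs. presumably neutral substitutions.
    print(mk_table_test(10, 10, 20, 20))
```

As the abstract cautions, equal ratios are consistent with neutrality but do not prove it, since negative selection on segregating duplications and positive selection on fixed ones can produce the same pattern.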
|
48
|
Yampolsky LY, Kondrashov FA, Kondrashov AS. Distribution of the strength of selection against amino acid replacements in human proteins. Hum Mol Genet 2005; 14:3191-201. [PMID: 16174645 DOI: 10.1093/hmg/ddi350] [Citation(s) in RCA: 74] [Impact Index Per Article: 3.9] [Indexed: 11/14/2022] Open
Abstract
The impact of an amino acid replacement on the organism's fitness can vary from lethal to selectively neutral and even, in rare cases, beneficial. Substantial data are available on either pathogenic or acceptable replacements. However, the whole distribution of coefficients of selection against individual replacements is not known for any organism. To ascertain this distribution for human proteins, we combined data on pathogenic missense mutations, on human non-synonymous SNPs and on human-chimpanzee divergence of orthologous proteins. Fractions of amino acid replacements which reduce fitness by >10^-2, 10^-2 to 10^-4, 10^-4 to 10^-5 and <10^-5 are 25, 49, 14 and 12%, respectively. On average, the strength of selection against a replacement is substantially higher when chemically dissimilar amino acids are involved, and Grantham's index of a replacement explains 35% of variance in the average logarithm of selection coefficients associated with different replacements. Still, the impact of a replacement depends on its context within the protein more than on its own nature. Reciprocal replacements are often associated with rather different selection coefficients; in particular, replacements of non-polar amino acids with polar ones are typically much more deleterious than replacements in the opposite direction. However, differences between evolutionary fluxes of reciprocal replacements are only weakly correlated with the differences between the corresponding selection coefficients.
|
49
|
Kondrashov FA. Prediction of pathogenic mutations in mitochondrially encoded human tRNAs. Hum Mol Genet 2005; 14:2415-9. [PMID: 16014637 DOI: 10.1093/hmg/ddi243] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 01/13/2023] Open
Abstract
Some mutations in human mitochondrial tRNAs are severely pathogenic. The available computational methods have a poor record of predicting the impact of a tRNA mutation on the phenotype and fitness. Here patterns of evolution at tRNA sites that harbor pathogenic mutations and at sites that harbor phenotypically cryptic polymorphisms were compared. Mutations that are pathogenic to humans occupy more conservative sites, are only rarely fixed in closely related species, and, when located in stem structures, often disrupt Watson-Crick pairing and display signs of compensatory evolution. These observations make it possible to classify approximately 90% of all known pathogenic mutations as deleterious together with only approximately 30% of polymorphisms. These polymorphisms segregate at frequencies that are more than two times lower than frequencies of polymorphisms classified as benign, indicating that at least approximately 30% of known polymorphisms in mitochondrial tRNAs affect fitness negatively.
|
50
|
Kondrashov FA. [The convergent evolution of the secondary structure of mitochondrial cysteine tRNA in the nine-banded armadillo Dasypus novemcinctus]. BIOFIZIKA 2005; 50:396-403. [PMID: 15977827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
The case of the convergent loss of the D-hairpin in mitochondrial cysteine tRNA of the nine-banded armadillo Dasypus novemcinctus is described. This evolutionary event sheds light on the molecular structure-function relationship and on the effect of this relationship on the processes of evolution of biopolymers and macromolecules.
|