551
|
Khatri P, Desai V, Tarca AL, Sellamuthu S, Wildman DE, Romero R, Draghici S. New Onto-Tools: Promoter-Express, nsSNPCounter and Onto-Translate. Nucleic Acids Res 2006; 34:W626-31. [PMID: 16845086 PMCID: PMC1538776 DOI: 10.1093/nar/gkl213] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 12/01/2022] Open
Abstract
The Onto-Tools suite is composed of an annotation database and eight complementary, web-accessible data mining tools: Onto-Express, Onto-Compare, Onto-Design, Onto-Translate, Onto-Miner, Pathway-Express, Promoter-Express and nsSNPCounter. Promoter-Express is a new tool added to the Onto-Tools ensemble that facilitates the identification of transcription factor binding sites active in specific conditions. nsSNPCounter is another new tool that allows computation and analysis of synonymous and non-synonymous codon substitutions for studying evolutionary rates of protein coding genes. Onto-Translate has also been enhanced to expand its scope and accuracy by fully utilizing the capabilities of the Onto-Tools database. Currently, Onto-Translate allows arbitrary mappings between 28 types of IDs for 53 organisms. Onto-Tools are freely available at .
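The distinction between synonymous and non-synonymous codon substitutions mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the nsSNPCounter algorithm; it assumes the standard genetic code, two aligned in-frame sequences of equal length, and a simple per-codon comparison without any correction for the number of synonymous and non-synonymous sites. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: nuclear code).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_a, codon_b):
    """Return 'identical', 'synonymous' or 'nonsynonymous' for two codons."""
    a, b = codon_a.upper(), codon_b.upper()
    if a == b:
        return "identical"
    # Different codons encoding the same amino acid are synonymous.
    return "synonymous" if CODON_TABLE[a] == CODON_TABLE[b] else "nonsynonymous"


def count_substitutions(seq_a, seq_b):
    """Count (synonymous, nonsynonymous) codon differences between two
    aligned, in-frame coding sequences of equal length."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        kind = classify_substitution(seq_a[i:i + 3], seq_b[i:i + 3])
        if kind == "synonymous":
            syn += 1
        elif kind == "nonsynonymous":
            nonsyn += 1
    return syn, nonsyn
```

For example, `count_substitutions("ATGGAAGTT", "ATGGAGCTT")` compares three codons: the first is unchanged, the second (GAA to GAG) keeps glutamate, and the third (GTT to CTT) replaces valine with leucine.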
Affiliation(s)
- Adi L. Tarca
- Perinatology Research Branch, NIH/NICHD, 4 Brush, 3990 John R, Detroit, MI 48201, USA
- Derek E. Wildman
- Perinatology Research Branch, NIH/NICHD, 4 Brush, 3990 John R, Detroit, MI 48201, USA
- Roberto Romero
- Perinatology Research Branch, NIH/NICHD, 4 Brush, 3990 John R, Detroit, MI 48201, USA
|
552
|
Sikela JM. The jewels of our genome: the search for the genomic changes underlying the evolutionarily unique capacities of the human brain. PLoS Genet 2006; 2:e80. [PMID: 16733552 PMCID: PMC1464830 DOI: 10.1371/journal.pgen.0020080] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Indexed: 01/19/2023] Open
Abstract
The recent publication of the initial sequence and analysis of the chimp genome allows us, for the first time, to compare our genome with that of our closest living evolutionary relative. With more primate genome sequences being pursued, and with other genome-wide, cross-species comparative techniques emerging, we are entering an era in which we will be able to carry out genomic comparisons of unprecedented scope and detail. These studies should yield a bounty of new insights about the genes and genomic features that are unique to our species as well as those that are unique to other primate lineages, and may begin to causally link some of these to lineage-specific phenotypic characteristics. The most intriguing potential of these new approaches will be in the area of evolutionary neurogenomics and in the possibility that the key human lineage–specific (HLS) genomic changes that underlie the evolution of the human brain will be identified. Such new knowledge should provide fresh insights into neuronal development and higher cognitive function and dysfunction, and may possibly uncover biological mechanisms for information storage, analysis, and retrieval never previously seen.
Affiliation(s)
- James M Sikela
- Human Medical Genetics Program, Department of Pharmacology, University of Colorado at Denver and Health Sciences Center, USA.
|
553
|
Yu XJ, Zheng HK, Wang J, Wang W, Su B. Detecting lineage-specific adaptive evolution of brain-expressed genes in human using rhesus macaque as outgroup. Genomics 2006; 88:745-751. [PMID: 16857340 DOI: 10.1016/j.ygeno.2006.05.008] [Citation(s) in RCA: 102] [Impact Index Per Article: 5.7] [Received: 04/12/2006] [Revised: 05/16/2006] [Accepted: 05/23/2006] [Indexed: 11/28/2022]
Abstract
Comparative genetic analysis between human and chimpanzee may detect genetic divergences responsible for human-specific characteristics. Previous studies have identified a series of genes that potentially underwent Darwinian positive selection during human evolution. However, without a closely related species as outgroup, it is difficult to identify human-lineage-specific changes, which is critical in delineating the biological uniqueness of humans. In this study, we conducted phylogeny-based analyses of 2633 human brain-expressed genes using rhesus macaque as the outgroup. We identified 47 candidate genes showing strong evidence of positive selection in the human lineage. Genes with maximal expression in the brain showed a higher evolutionary rate in human than in chimpanzee. We observed that many immune-defense-related genes were under strong positive selection, and this trend was more prominent in chimpanzee than in human. We also demonstrated that rhesus macaque performed much better than mouse as an outgroup in identifying lineage-specific selection in humans.
Affiliation(s)
- Xiao-Jing Yu
- Key Laboratory of Cellular and Molecular Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China; Kunming Primate Research Center, Chinese Academy of Sciences, Kunming, Yunnan 650223, China; Graduate School, Chinese Academy of Sciences, Beijing, China
- Hong-Kun Zheng
- Beijing Genomics Institute, Chinese Academy of Sciences, Beijing, China; The Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Jun Wang
- Beijing Genomics Institute, Chinese Academy of Sciences, Beijing, China; The Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark; Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230, Odense M, Denmark
- Wen Wang
- Key Laboratory of Cellular and Molecular Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
- Bing Su
- Key Laboratory of Cellular and Molecular Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China; Kunming Primate Research Center, Chinese Academy of Sciences, Kunming, Yunnan 650223, China.
|
554
|
Goidts V, Cooper DN, Armengol L, Schempp W, Conroy J, Estivill X, Nowak N, Hameister H, Kehrer-Sawatzki H. Complex patterns of copy number variation at sites of segmental duplications: an important category of structural variation in the human genome. Hum Genet 2006; 120:270-84. [PMID: 16838144 DOI: 10.1007/s00439-006-0217-y] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Received: 04/04/2006] [Accepted: 05/26/2006] [Indexed: 10/24/2022]
Abstract
The structural diversity of the human genome is much higher than previously assumed although its full extent remains unknown. To investigate the association between segmental duplications that display constitutive copy number differences (CNDs) between humans and the great apes and those which exhibit polymorphic copy number variations (CNVs) between humans, we analysed a BAC array enriched with segmental duplications displaying such CNDs. This study documents for the first time that in addition to human-specific gains common to all humans, these duplication clusters (DCs) also exhibit polymorphic CNVs > 40 kb. Segmental duplication is known to have been a frequent event during human genome evolution. Importantly, among the CNV-associated genes identified here, those involved in transcriptional regulation were found to be significantly overrepresented. Complex patterns of variation were evident at sites of DCs, manifesting as inter-individual differentially sized copy number alterations at the same genomic loci. Thus, CNVs associated with segmental duplications do not simply represent insertion/deletion polymorphisms, but rather constitute a wide variety of rearrangements involving differential amplification and partial gains and losses with high inter-individual variability. Although the number of CNVs was not found to differ between Africans and Caucasians/Asians, the average number of variant patterns per locus was significantly lower in Africans. Thus, complex variation patterns characterizing segmental duplications result from relatively recent genomic rearrangements. The high number of these rearrangements, some of which are potentially recurrent, together with differences in population size and expansion dynamics, may account for the greater diversity of CNV in Caucasians/Asians as compared with Africans.
Affiliation(s)
- Violaine Goidts
- Department of Human Genetics, University of Ulm, Albert Einstein Allee 11, 89081, Ulm, Germany
|
555
|
Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res 2006; 16:980-9. [PMID: 16825663 PMCID: PMC1524870 DOI: 10.1101/gr.5157306] [Citation(s) in RCA: 158] [Impact Index Per Article: 8.8] [Indexed: 11/24/2022]
Abstract
Identifying regions of the human genome that have been targets of positive selection will provide important insights into recent human evolutionary history and may facilitate the search for complex disease genes. However, the confounding effects of population demographic history and selection on patterns of genetic variation complicate inferences of selection when a small number of loci are studied. To this end, identifying outlier loci from empirical genome-wide distributions of genetic variation is a promising strategy to detect targets of selection. Here, we evaluate the power and efficiency of a simple outlier approach and describe a genome-wide scan for positive selection using a dense catalog of 1.58 million SNPs that were genotyped in three human populations. In total, we analyzed 14,589 genes, 385 of which possess patterns of genetic variation consistent with the hypothesis of positive selection. Furthermore, several extended genomic regions were found, spanning >500 kb, that contained multiple contiguous candidate selection genes. More generally, these data provide important practical insights into the limits of outlier approaches in genome-wide scans for selection, provide strong candidate selection genes to study in greater detail, and may have important implications for disease related research.
Affiliation(s)
- Joanna L. Kelley
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Jennifer Madeoy
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- John C. Calhoun
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Willie Swanson
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Joshua M. Akey
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Corresponding author. Fax (206) 685-7301
|
556
|
Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. Positive natural selection in the human lineage. Science 2006; 312:1614-20. [PMID: 16778047 DOI: 10.1126/science.1124309] [Citation(s) in RCA: 769] [Impact Index Per Article: 42.7] [Indexed: 11/02/2022]
Abstract
Positive natural selection is the force that drives the increase in prevalence of advantageous traits, and it has played a central role in our development as a species. Until recently, the study of natural selection in humans has largely been restricted to comparing individual candidate genes to theoretical expectations. The advent of genome-wide sequence and polymorphism data brings fundamental new tools to the study of natural selection. It is now possible to identify new candidates for selection and to reevaluate previous claims by comparison with empirical distributions of DNA sequence variation across the human genome and among populations. The flood of data and analytical methods, however, raises many new challenges. Here, we review approaches to detect positive natural selection, describe results from recent analyses of genome-wide data, and discuss the prospects and challenges ahead as we expand our understanding of the role of natural selection in shaping the human genome.
Affiliation(s)
- P C Sabeti
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
557
|
Ehrenreich IM, Purugganan MD. The molecular genetic basis of plant adaptation. Am J Bot 2006; 93:953-962. [PMID: 21642159 DOI: 10.3732/ajb.93.7.953] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 05/30/2023]
Abstract
How natural selection on adaptive traits is filtered to the genetic level remains largely unknown. Theory and quantitative trait locus (QTL) mapping have provided insights into the number and effect of genes underlying adaptations, but these results have been hampered by questions of applicability to real biological systems and poor resolution, respectively. Advances in molecular technologies have expedited the cloning of adaptive genes through both forward and reverse genetic approaches. Forward approaches start with adaptive traits and attempt to characterize their underlying genetic architectures through linkage disequilibrium mapping, QTL mapping, and other methods. Reverse screens search large sequence data sets for genes that possess the signature of selection. Though both approaches have been successful in identifying adaptive genes in plants, very few, if any, of these adaptations' molecular bases have been fully resolved. The continued isolation of plant adaptive genes will lead to a more comprehensive understanding of natural selection's effect on genes and genomes.
Affiliation(s)
- Ian M Ehrenreich
- Department of Genetics, Box 7614, North Carolina State University, Raleigh, North Carolina 27695 USA
|
558
|
Biswas S, Akey JM. Genomic insights into positive selection. Trends Genet 2006; 22:437-46. [PMID: 16808986 DOI: 10.1016/j.tig.2006.06.005] [Citation(s) in RCA: 283] [Impact Index Per Article: 15.7] [Received: 12/20/2005] [Revised: 03/27/2006] [Accepted: 06/06/2006] [Indexed: 01/28/2023]
Abstract
The traditional way of identifying targets of adaptive evolution has been to study a few loci that one hypothesizes a priori to have been under selection. This approach is complicated because of the confounding effects that population demographic history and selection have on patterns of DNA sequence variation. In principle, multilocus analyses can facilitate robust inferences of selection at individual loci. The deluge of large-scale catalogs of genetic variation has stimulated many genome-wide scans for positive selection in several species. Here, we review some of the salient observations of these studies, identify important challenges ahead, consider the limitations of genome-wide scans for selection and discuss the potential significance of a comprehensive understanding of genomic patterns of selection for disease-related research.
Affiliation(s)
- Shameek Biswas
- Department of Genome Sciences, University of Washington, 1705 NE Pacific, Seattle, WA 98195, USA
|
559
|
Gilad Y, Oshlack A, Rifkin SA. Natural selection on gene expression. Trends Genet 2006; 22:456-61. [PMID: 16806568 DOI: 10.1016/j.tig.2006.06.002] [Citation(s) in RCA: 147] [Impact Index Per Article: 8.2] [Received: 01/27/2006] [Revised: 04/07/2006] [Accepted: 06/05/2006] [Indexed: 01/30/2023]
Abstract
Changes in genetic regulation contribute to adaptations in natural populations and influence susceptibility to human diseases. Despite their potential phenotypic importance, the selective pressures acting on regulatory processes in general and gene expression levels in particular are largely unknown. Studies in model organisms suggest that the expression levels of most genes evolve under stabilizing selection, although a few are consistent with adaptive evolution. However, it has been proposed that gene expression levels in primates evolve largely in the absence of selective constraints. In this article, we discuss the microarray-based observations that led to these disparate interpretations. We conclude that in both primates and model organisms, stabilizing selection is likely to be the dominant mode of gene expression evolution. An important implication is that mutations affecting gene expression will often be deleterious and might underlie many human diseases.
Affiliation(s)
- Yoav Gilad
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA.
|
560
|
Abstract
Why do proteins evolve at different rates? Advances in systems biology and genomics have facilitated a move from studying individual proteins to characterizing global cellular factors. Systematic surveys indicate that protein evolution is not determined exclusively by selection on protein structure and function, but is also affected by the genomic position of the encoding genes, their expression patterns, their position in biological networks and possibly their robustness to mistranslation. Recent work has allowed insights into the relative importance of these factors. We discuss the status of a much-needed coherent view that integrates studies on protein evolution with biochemistry and functional and structural genomics.
Affiliation(s)
- Csaba Pál
- European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69012 Heidelberg, Germany
|
561
|
|
562
|
Evans SN, Shvets Y, Slatkin M. Non-equilibrium theory of the allele frequency spectrum. Theor Popul Biol 2006; 71:109-19. [PMID: 16887160 DOI: 10.1016/j.tpb.2006.06.005] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Received: 04/13/2006] [Revised: 06/05/2006] [Accepted: 06/07/2006] [Indexed: 10/24/2022]
Abstract
A forward diffusion equation describing the evolution of the allele frequency spectrum is presented. The influx of mutations is accounted for by imposing a suitable boundary condition. For a Wright-Fisher diffusion with or without selection and varying population size, the boundary condition is $\lim_{x \downarrow 0} x f(x,t) = \theta \rho(t)$, where $f(\cdot,t)$ is the frequency spectrum of derived alleles at independent loci at time $t$ and $\rho(t)$ is the relative population size at time $t$. When population size and selection intensity are independent of time, the forward equation is equivalent to the backwards diffusion usually used to derive the frequency spectrum, but this approach allows computation of the time dependence of the spectrum both before an equilibrium is attained and when population size and selection intensity vary with time. From the diffusion equation, a set of ordinary differential equations for the moments of $f(\cdot,t)$ is derived and the expected spectrum of a finite sample is expressed in terms of those moments. The use of the forward equation is illustrated by considering neutral and selected alleles in a highly simplified model of human history. For example, it is shown that approximately 30% of the expected total heterozygosity of neutral loci is attributable to mutations that arose since the onset of population growth in roughly the last 150,000 years.
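As a sanity check on the boundary condition, consider the textbook special case of neutral alleles at equilibrium in a population of constant size (relative size 1). The spectrum is then f(x) = θ/x, so x·f(x) equals θ for every x, in agreement with the limit stated above. The Python sketch below covers only this special case and is not the paper's non-equilibrium method; it computes the expected number of sites at which the derived allele appears i times in a sample of n by integrating the binomial sampling probability against θ/x, which reduces to a Beta function. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integer(a, b):
    """Beta function B(a, b) for positive integers, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def expected_sample_spectrum(n, theta):
    """Expected counts of sites with i derived copies (i = 1..n-1) in a sample
    of size n, assuming the neutral equilibrium spectrum f(x) = theta / x.

    E[xi_i] = integral over (0, 1) of C(n, i) x^i (1 - x)^(n - i) * theta / x dx
            = theta * C(n, i) * B(i, n - i + 1)
    """
    return [theta * comb(n, i) * beta_integer(i, n - i + 1) for i in range(1, n)]
```

With `theta = Fraction(2)` and `n = 5` every entry simplifies to θ/i, the familiar neutral expectation.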
Affiliation(s)
- Steven N Evans
- Department of Statistics #3860, University of California at Berkeley, 367 Evans Hall, Berkeley, CA 94720-3860, USA.
|
563
|
Levine MT, Jones CD, Kern AD, Lindfors HA, Begun DJ. Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression. Proc Natl Acad Sci U S A 2006; 103:9935-9. [PMID: 16777968 PMCID: PMC1502557 DOI: 10.1073/pnas.0509809103] [Citation(s) in RCA: 236] [Impact Index Per Article: 13.1] [Indexed: 12/31/2022] Open
Abstract
Descriptions of recently evolved genes suggest several mechanisms of origin including exon shuffling, gene fission/fusion, retrotransposition, duplication-divergence, and lateral gene transfer, all of which involve recruitment of preexisting genes or genetic elements into new function. The importance of noncoding DNA in the origin of novel genes remains an open question. We used the well annotated genome of the genetic model system Drosophila melanogaster and genome sequences of related species to carry out a whole-genome search for new D. melanogaster genes that are derived from noncoding DNA. Here, we describe five such genes, four of which are X-linked. Our RT-PCR experiments show that all five putative novel genes are expressed predominantly in testes. These data support the idea that these novel genes are derived from ancestral noncoding sequence and that new, favored genes are likely to invade populations under selective pressures relating to male reproduction.
Affiliation(s)
- Mia T Levine
- Center for Population Biology, University of California-Davis, Davis, CA 95616, USA.
|
564
|
Pavlicek A, Jurka J. Positive selection on the nonhomologous end-joining factor Cernunnos-XLF in the human lineage. Biol Direct 2006; 1:15. [PMID: 16749933 PMCID: PMC1552050 DOI: 10.1186/1745-6150-1-15] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 05/03/2006] [Accepted: 06/02/2006] [Indexed: 01/23/2023] Open
Abstract
Background: Cernunnos-XLF is a nonhomologous end-joining factor that is mutated in patients with a rare immunodeficiency with microcephaly. Several other microcephaly-associated genes such as ASPM and microcephalin experienced recent adaptive evolution apparently linked to brain size expansion in humans. In this study we investigated whether Cernunnos-XLF experienced similar positive selection during human evolution. Results: We obtained or reconstructed full-length coding sequences of chimpanzee, rhesus macaque, canine, and bovine Cernunnos-XLF orthologs from sequence databases and sequence trace archives. Comparison of coding sequences revealed an excess of nonsynonymous substitutions consistent with positive selection on Cernunnos-XLF in the human lineage. The hotspots of adaptive evolution are concentrated around a specific structural domain, whose analogue in the structurally similar XRCC4 protein is involved in binding of another nonhomologous end-joining factor, DNA ligase IV. Conclusion: Cernunnos-XLF is a microcephaly-associated locus newly identified to be under adaptive evolution in humans, and possibly played a role in human brain expansion. We speculate that Cernunnos-XLF may have contributed to the increased number of brain cells in humans by efficient double strand break repair, which helps to prevent frequent apoptosis of neuronal progenitors and aids mitotic cell cycle progression. Reviewers: This article was reviewed by Chris Ponting and Richard Emes (nominated by Chris Ponting), Kateryna Makova, Gáspár Jékely and Eugene V. Koonin.
Affiliation(s)
- Adam Pavlicek
- Genetic Information Research Institute, Mountain View, CA 94043, USA
- Jerzy Jurka
- Genetic Information Research Institute, Mountain View, CA 94043, USA
|
565
|
Verrelli BC, Tishkoff SA, Stone AC, Touchman JW. Contrasting histories of G6PD molecular evolution and malarial resistance in humans and chimpanzees. Mol Biol Evol 2006; 23:1592-601. [PMID: 16751255 DOI: 10.1093/molbev/msl024] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 12/27/2022] Open
Abstract
Although mutations in the glucose-6-phosphate dehydrogenase (G6PD) gene result in several blood-related diseases in humans, they also confer resistance to malarial infection. This association between G6PD and malaria was supported by population genetic analyses of the G6PD locus, which indicated that these mutations may have recently risen in frequency in certain geographic regions as a result of positive selection. Here we characterize nucleotide sequence variation in a 5.2-kb region of the G6PD locus in a population sample of 56 chimpanzees, as well as among 7 other nonhuman primates, to compare with that in humans in determining whether other primates that are impacted by malaria also exhibit patterns of G6PD polymorphism or divergence consistent with positive selection. We find that chimpanzees have several amino acid variants but that the overall pattern at G6PD in chimpanzees, as well as in Old and New World primates in general, can be explained by recent purifying selection as well as strong functional constraint dating back to at least 30-40 MYA. These comparative analyses suggest that the recent signature of positive selection at G6PD in humans is unique.
Affiliation(s)
- Brian C Verrelli
- Center for Evolutionary Functional Genomics, The Biodesign Institute, Tempe, Arizona, USA.
|
566
|
Abstract
The beneficial substitution of an allele shapes patterns of genetic variation at linked sites. Thus, in principle, adaptations can be mapped by looking for the signature of directional selection in polymorphism data. In practice, such efforts are hampered by the need for an accurate characterization of the demographic history of the species and of the effects of positive selection. In an attempt to circumvent these difficulties, researchers are increasingly taking a purely empirical approach, in which a large number of genomic regions are ordered by summaries of the polymorphism data, and loci with extreme values are considered to be likely targets of positive selection. We evaluated the reliability of the "empirical" approach, focusing on applications to human data and to maize. To do so, we considered a coalescent model of directional selection in a sensible demographic setting, allowing for selection on standing variation as well as on a new mutation. Our simulations suggest that while empirical approaches will identify several interesting candidates, they will also miss many--in some cases, most--loci of interest. The extent of the trade-off depends on the mode of positive selection and the demographic history of the population. Specifically, the false-discovery rate is higher when directional selection involves a recessive rather than a co-dominant allele, when it acts on a previously neutral rather than a new allele, and when the population has experienced a population bottleneck rather than maintained a constant size. One implication of these results is that, insofar as attributes of the beneficial mutation (e.g., the dominance coefficient) affect the power to detect targets of selection, genomic scans will yield an unrepresentative subset of loci that contribute to adaptations.
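The "empirical" approach described in this abstract, ordering genomic regions by a summary of the polymorphism data and treating the extreme tail as candidate targets, can be sketched in a few lines of Python. This is an illustration only, not the simulation framework evaluated by the authors; the function name, the locus labels, the choice of statistic, and the assumption that larger values are more extreme are all hypothetical.

```python
def empirical_outliers(stat_by_locus, tail_fraction=0.01):
    """Return the loci whose summary statistic falls in the upper
    `tail_fraction` of the empirical genome-wide distribution.

    stat_by_locus: dict mapping a locus name to its summary statistic
                   (assumption: larger values are more extreme).
    """
    if not 0 < tail_fraction <= 1:
        raise ValueError("tail_fraction must be in (0, 1]")
    # Order loci from most to least extreme value of the statistic.
    ranked = sorted(stat_by_locus, key=stat_by_locus.get, reverse=True)
    # Always report at least one locus when any are supplied.
    n_keep = max(1, int(len(ranked) * tail_fraction))
    return ranked[:n_keep]
```

Because the cutoff is a fixed fraction of the ranked list, the procedure always returns candidates whether or not any locus was actually selected, which is one way to see why the abstract treats the false-discovery rate and the miss rate as dependent on the underlying demography and mode of selection.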
Affiliation(s)
- Kosuke M. Teshima
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Corresponding author. Fax (773) 834-0505
- Graham Coop
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Molly Przeworski
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Corresponding author. Fax (773) 834-0505
|
567
|
Abstract
Understanding the genes that contribute to reproductive isolation is essential to understanding speciation, but isolating such genes has proven very difficult. In this study I apply a multilocus test statistic to >10,000 SNP markers assayed in wild-derived inbred strains of house mice to identify genomic regions of elevated differentiation between two subspecies of house mice, Mus musculus musculus and M. m. domesticus. Differentiation was high through approximately 90% of the X chromosome. In addition, eight regions of high differentiation were identified on the autosomes, totaling 7.5% of the autosomal genome. Regions of high differentiation were confirmed by direct sequencing of samples collected from the wild. Some regions of elevated differentiation have an overrepresentation of genes with host-pathogen interactions and olfaction. The most strongly differentiated region on the X has previously been shown to fail to introgress across a hybrid zone between the two subspecies. This survey indicates autosomal regions that should also be examined for differential introgression across the hybrid zone, as containing potential genes causing hybrid unfitness.
Affiliation(s)
- Bettina Harr
- Institute for Genetics, Department of Evolutionary Genetics, 50674 Köln, Germany.
|
568
|
Clark AG. Genomics of the evolutionary process. Trends Ecol Evol 2006; 21:316-21. [PMID: 16769431 DOI: 10.1016/j.tree.2006.04.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 12/05/2005] [Revised: 03/27/2006] [Accepted: 04/18/2006] [Indexed: 12/29/2022]
Abstract
Comparative analysis of genome sequences has become the primary means by which functional elements are first identified, often preceding even the identification of their function. Although this approach capitalizes on the conservation of homologous functions, it has also been successful in identifying evolutionary novelties, including new genes and pathways. As I discuss here, the analysis of multiple alignments of sequences from species on a known phylogeny has provided rich detail about the heterogeneities in the process of genome changes. Inferences of positive selection acting on protein-encoding genes have provided clues about the role of adaptive evolution in the past. These methods also identify negatively selected genes, providing some clue to genes that are most likely to be mutable to a disease-causing state.
Affiliation(s)
- Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA.
|
569
|
Wong P, Frishman D. Fold designability, distribution, and disease. PLoS Comput Biol 2006; 2:e40. [PMID: 16680196 PMCID: PMC1456317 DOI: 10.1371/journal.pcbi.0020040] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 11/02/2005] [Accepted: 03/17/2006] [Indexed: 12/04/2022] Open
Abstract
Fold designability has been estimated by the number of families contained in that fold. Here, we show that among orthologous proteins, sequence divergence is higher for folds with greater numbers of families. Folds with greater numbers of families also tend to have families that appear more often in the proteome and greater promiscuity (the number of unique “partner” folds that the fold is found with within the same protein). We also find that many disease-related proteins have folds with relatively few families. In particular, a number of these proteins are associated with diseases occurring at high frequency. These results suggest that family counts reflect how certain structures are distributed in nature and is an important characteristic associated with many human diseases. Most proteins are composed of structural domains that can be classified into “folds.” Domains with the same fold type share overall structural similarity. The number of amino acid sequences that encode a fold is termed the “designability” of the fold. Folds that have higher designability are thought to be more robust to stresses and mutations. Such features may also allow the fold to appear in a greater variety of contexts. Here, the authors show that proteins with folds estimated to be of higher designability are more widespread amongst proteins in human, mouse, and yeast, consistent with this hypothesis. The authors also find that many hereditary disease-associated proteins have folds estimated to be of low designability. A number of these diseases occur at a relatively high frequency. These results suggest that the estimate of designability employed reflects how certain structures are distributed in nature and is an important characteristic associated with many human diseases.
Affiliation(s)
- Philip Wong
- Institute for Bioinformatics, GSF–National Research Center for Environment and Health, Neuherberg, Germany
- Dmitrij Frishman
- Institute for Bioinformatics, GSF–National Research Center for Environment and Health, Neuherberg, Germany
- Department of Genome-Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- * To whom correspondence should be addressed. E-mail:
570
Suzuki Y. Ancient positive selection on CD155 as a possible cause for susceptibility to poliovirus infection in simians. Gene 2006; 373:16-22. [PMID: 16500041 DOI: 10.1016/j.gene.2005.12.016] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 10/14/2005] [Revised: 12/16/2005] [Accepted: 12/19/2005] [Indexed: 11/15/2022]
Abstract
Poliovirus is the etiological agent of poliomyelitis. From the observations that only simians are susceptible to poliovirus infection and that 37 amino acid sites (the poliovirus-binding associated [PBA] sites) in the domain D1 of CD155 are involved in the binding to poliovirus, it is considered that the susceptibility to poliovirus infection evolved through amino acid substitutions that occurred at the PBA sites on the ancestral branch of simians. Here it is shown that positive selection has operated on these substitutions by analyzing the nucleotide sequences encoding almost the entire region of D1 in humans, non-human hominoids (chimpanzees and gorillas), Old World monkeys (African green monkeys), New World monkeys (brown capuchins, squirrel monkeys, and marmosets), prosimians (ring-tailed lemurs), and non-primate mammals (rabbits). Positive selection is unlikely to have operated on the susceptibility to poliovirus infection, but possibly on the binding to another molecule. Elimination of susceptibility to poliovirus infection in simians may be difficult, because it also requires elimination of advantageous effects that have been exerted by CD155.
Affiliation(s)
- Yoshiyuki Suzuki
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, 1111 Yata, Mishima-shi, Shizuoka-ken 411-8540, Japan.
571
Gilad Y, Oshlack A, Smyth GK, Speed TP, White KP. Expression profiling in primates reveals a rapid evolution of human transcription factors. Nature 2006; 440:242-5. [PMID: 16525476 DOI: 10.1038/nature04559] [Citation(s) in RCA: 218] [Impact Index Per Article: 12.1] [Received: 11/05/2005] [Accepted: 12/29/2005] [Indexed: 12/13/2022]
Abstract
Although it has been hypothesized for thirty years that many human adaptations are likely to be due to changes in gene regulation, almost nothing is known about the modes of natural selection acting on regulation in primates. Here we identify a set of genes for which expression is evolving under natural selection. We use a new multi-species complementary DNA array to compare steady-state messenger RNA levels in liver tissues within and between humans, chimpanzees, orangutans and rhesus macaques. Using estimates from a linear mixed model, we identify a set of genes for which expression levels have remained constant across the entire phylogeny (approximately 70 million years), and are therefore likely to be under stabilizing selection. Among the top candidates are five genes with expression levels that have previously been shown to be altered in liver carcinoma. We also find a number of genes with similar expression levels among non-human primates but significantly elevated or reduced expression in the human lineage, features that point to the action of directional selection. Among the gene set with a human-specific increase in expression, there is an excess of transcription factors; the same is not true for genes with increased expression in chimpanzee.
Affiliation(s)
- Yoav Gilad
- Department of Genetics, Yale University, New Haven, Connecticut 06510, USA.
572
573
Patin E, Barreiro LB, Sabeti PC, Austerlitz F, Luca F, Sajantila A, Behar DM, Semino O, Sakuntabhai A, Guiso N, Gicquel B, McElreavey K, Harding RM, Heyer E, Quintana-Murci L. Deciphering the ancient and complex evolutionary history of human arylamine N-acetyltransferase genes. Am J Hum Genet 2006; 78:423-36. [PMID: 16416399 PMCID: PMC1380286 DOI: 10.1086/500614] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.0] [Received: 09/09/2005] [Accepted: 12/21/2005] [Indexed: 12/24/2022] Open
Abstract
The human N-acetyltransferase genes NAT1 and NAT2 encode two phase-II enzymes that metabolize various drugs and carcinogens. Functional variability at these genes has been associated with adverse drug reactions and cancer susceptibility. Mutations in NAT2 leading to the so-called slow-acetylation phenotype reach high frequencies worldwide, which questions the significance of altered acetylation in human adaptation. To investigate the role of population history and natural selection in shaping NATs variation, we characterized genetic diversity through the resequencing and genotyping of NAT1, NAT2, and the pseudogene NATP in a collection of 13 different populations with distinct ethnic backgrounds and demographic pasts. This combined study design allowed us to define a detailed map of linkage disequilibrium of the NATs region as well as to perform a number of sequence-based neutrality tests and the long-range haplotype (LRH) test. Our data revealed distinctive patterns of variability for the two genes: the reduced diversity observed at NAT1 is consistent with the action of purifying selection, whereas NAT2 functional variation contributes to high levels of diversity. In addition, the LRH test identified a particular NAT2 haplotype (NAT2*5B) under recent positive selection in western/central Eurasians. This haplotype harbors the mutation 341T→C and encodes the "slowest-acetylator" NAT2 enzyme, suggesting a general selective advantage for the slow-acetylator phenotype. Interestingly, the NAT2*5B haplotype, which seems to have conferred a selective advantage during the past approximately 6,500 years, exhibits today the strongest association with susceptibility to bladder cancer and adverse drug reactions. On the whole, the patterns observed for NAT2 well illustrate how geographically and temporally fluctuating xenobiotic environments may have influenced not only our genome variability but also our present-day susceptibility to disease.
Affiliation(s)
- Etienne Patin
- Centre National de la Recherche Scientifique, CNRS, FRE 2849, Unit of Molecular Prevention and Therapy of Human Diseases, Paris, France
574
Pollinger JP, Bustamante CD, Fledel-Alon A, Schmutz S, Gray MM, Wayne RK. Selective sweep mapping of genes with large phenotypic effects. Genome Res 2005; 15:1809-19. [PMID: 16339379 PMCID: PMC1356119 DOI: 10.1101/gr.4374505] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.9] [Indexed: 11/25/2022]
Abstract
Many domestic dog breeds have originated through fixation of discrete mutations by intense artificial selection. As a result of this process, markers in the proximity of genes influencing breed-defining traits will have reduced variation (a selective sweep) and will show divergence in allele frequency. Consequently, low-resolution genomic scans can potentially be used to identify regions containing genes that have a major influence on breed-defining traits. We model the process of breed formation and show that the probability of two or three adjacent marker loci showing a spurious signal of selection within at least one breed (i.e., Type I error or false-positive rate) is low if highly variable and moderately spaced markers are utilized. We also use simulations with selection to demonstrate that even a moderately spaced set of highly polymorphic markers (e.g., one every 0.8 cM) has high power to detect regions targeted by strong artificial selection in dogs. Further, we show that a gene responsible for black coat color in the Large Munsterlander has a 40-Mb region surrounding the gene that is very low in heterozygosity for microsatellite markers. Similarly, we survey 302 microsatellite markers in the Dachshund and find three linked monomorphic microsatellite markers all within a 10-Mb region on chromosome 3. This region contains the FGFR3 gene, which is responsible for achondroplasia in humans, but not in dogs. Consequently, our results suggest that the causative mutation lies in a gene or regulatory region closely linked to FGFR3.
Affiliation(s)
- John P Pollinger
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095-1606, USA.
575
Lee S, Kohane I, Kasif S. Genes involved in complex adaptive processes tend to have highly conserved upstream regions in mammalian genomes. BMC Genomics 2005; 6:168. [PMID: 16309559 PMCID: PMC1310621 DOI: 10.1186/1471-2164-6-168] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 08/28/2005] [Accepted: 11/27/2005] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND: Recent advances in genome sequencing suggest a remarkable conservation in gene content of mammalian organisms. The similarity in gene repertoire present in different organisms has increased interest in studying regulatory mechanisms of gene expression aimed at elucidating the differences in phenotypes. In particular, a proximal promoter region contains a large number of regulatory elements that control the expression of its downstream gene. Although many studies have focused on identification of these elements, a broader picture on the complexity of transcriptional regulation of different biological processes has not been addressed in mammals. The regulatory complexity may strongly correlate with gene function, as different evolutionary forces must act on the regulatory systems under different biological conditions. We investigate this hypothesis by comparing the conservation of promoters upstream of genes classified in different functional categories.
RESULTS: By conducting a rank correlation analysis between functional annotation and upstream sequence alignment scores obtained by human-mouse and human-dog comparison, we found a significantly greater conservation of the upstream sequence of genes involved in development, cell communication, neural functions and signaling processes than those involved in more basic processes shared with unicellular organisms such as metabolism and ribosomal function. This observation persists after controlling for G+C content. Considering conservation as a functional signature, we hypothesize a higher density of cis-regulatory elements upstream of genes participating in complex and adaptive processes.
CONCLUSION: We identified a class of functions that are associated with either high or low promoter conservation in mammals. We detected a significant tendency for complex and adaptive processes to be associated with higher promoter conservation, despite the fact that they have emerged relatively recently during evolution. We described and contrasted several hypotheses that provide a deeper insight into how transcriptional complexity might have emerged during evolution.
Affiliation(s)
- Soohyun Lee
- Bioinformatics Program, Boston University, Boston, MA 02215, USA
- Center for Advanced Genomic Technology, Boston University, Boston, MA 02215, USA
- Isaac Kohane
- Children's Hospital Informatics Program at Harvard-MIT Health Sciences and Technology, Boston, MA 02215, USA
- Simon Kasif
- Bioinformatics Program, Boston University, Boston, MA 02215, USA
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Center for Advanced Genomic Technology, Boston University, Boston, MA 02215, USA
- Children's Hospital Informatics Program at Harvard-MIT Health Sciences and Technology, Boston, MA 02215, USA
576
Abstract
The distribution of mutational effects on fitness is of fundamental importance for many aspects of evolution. We develop two methods for characterizing the fitness effects of deleterious, nonsynonymous mutations, using polymorphism data from two related species. These methods also provide estimates of the proportion of amino acid substitutions that are selectively favorable, when combined with data on between-species sequence divergence. The methods are applicable to species with different effective population sizes, but that share the same distribution of mutational effects. The first, simpler, method assumes that diversity for all nonneutral mutations is given by the value under mutation-selection balance, while the second method allows for stronger effects of genetic drift and yields estimates of the parameters of the probability distribution of mutational effects. We apply these methods to data on populations of Drosophila miranda and D. pseudoobscura and find evidence for the presence of deleterious nonsynonymous mutations, mostly with small heterozygous selection coefficients (a mean of the order of 10^-5 for segregating variants). A leptokurtic gamma distribution of mutational effects with a shape parameter between 0.1 and 1 can explain observed diversities, in the absence of a separate class of completely neutral nonsynonymous mutations. We also describe a simple approximate method for estimating the harmonic mean selection coefficient from diversity data on a single species.
Affiliation(s)
- Laurence Loewe
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom.
577
Clark AG, Hubisz MJ, Bustamante CD, Williamson SH, Nielsen R. Ascertainment bias in studies of human genome-wide polymorphism. Genome Res 2005; 15:1496-502. [PMID: 16251459 PMCID: PMC1310637 DOI: 10.1101/gr.4107905] [Citation(s) in RCA: 330] [Impact Index Per Article: 17.4] [Received: 06/26/2005] [Accepted: 09/06/2005] [Indexed: 11/25/2022]
Abstract
Large-scale SNP genotyping studies rely on an initial assessment of nucleotide variation to identify sites in the DNA sequence that harbor variation among individuals. This "SNP discovery" sample may be quite variable in size and composition, and it has been well established that properties of the SNPs that are found are influenced by the discovery sampling effort. The International HapMap project relied on nearly any piece of information available to identify SNPs (including BAC end sequences, shotgun reads, and differences between public and private sequences) and even made use of chimpanzee data to confirm human sequence differences. In addition, the ascertainment criteria shifted from using only SNPs that had been validated in population samples, to double-hit SNPs, to finally accepting SNPs that were singletons in small discovery samples. In contrast, Perlegen's primary discovery was a resequencing-by-hybridization effort using the 24 people of diverse origin in the Polymorphism Discovery Resource. Here we take these two data sets and contrast two basic summary statistics, heterozygosity and F_ST, as well as the site frequency spectra, for 500-kb windows spanning the genome. The magnitude of disparity between these samples in these measures of variability indicates that population genetic analysis on the raw genotype data is ill advised. Given the knowledge of the discovery samples, we perform an ascertainment correction and show how the post-correction data are more consistent across these studies. However, discrepancies persist, suggesting that the heterogeneity in the SNP discovery process of the HapMap project resulted in a data set resistant to complete ascertainment correction. Ascertainment bias will likely erode the power of tests of association between SNPs and complex disorders, but the effect will likely be small, and perhaps more importantly, it is unlikely that the bias will introduce false-positive inferences.
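For readers unfamiliar with the two summary statistics named in this abstract, the following is a minimal sketch of how expected heterozygosity and Wright's fixation index can be computed from allele frequencies at a single biallelic site. It is an illustration only: the function names are hypothetical, populations are weighted equally, and it is not the estimator or the ascertainment correction used by the authors.

```python
from typing import Sequence


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fixation_index(freqs: Sequence[float]) -> float:
    """Wright's fixation index (H_T - H_S) / H_T for equally weighted populations.

    freqs holds the frequency of one allele in each population (illustrative input).
    """
    # H_S: mean within-population expected heterozygosity.
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    # H_T: expected heterozygosity of the pooled population.
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(p_bar)
    # A site that is monomorphic overall carries no differentiation signal.
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t
```

Identical frequencies in every population give a value of 0, and populations fixed for different alleles give 1. A windowed analysis such as the one described would average these quantities over many sites, which this sketch does not attempt.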
Affiliation(s)
- Andrew G Clark
- Molecular Biology and Genetics and Computational Biology, Cornell University, Ithaca, New York 14853, USA.