1
Abstract
Human accelerated regions (HARs) are the fastest-evolving sequences in the human genome. When HARs were discovered in 2006, their function was mysterious due to scant annotation of the noncoding genome. Diverse technologies, from transgenic animals to machine learning, have consistently shown that HARs function as gene regulatory enhancers with significant enrichment in neurodevelopment. It is now possible to quantitatively measure the enhancer activity of thousands of HARs in parallel and model how each nucleotide contributes to gene expression. These strategies have revealed that many human HAR sequences function differently than their chimpanzee orthologs, though individual nucleotide changes in the same HAR may have opposite effects, consistent with compensatory substitutions. To fully evaluate the role of HARs in human evolution, it will be necessary to experimentally and computationally dissect them across more cell types and developmental stages.
Affiliation(s)
- Sean Whalen
- Gladstone Institute of Data Science and Biotechnology, San Francisco, California, USA
- Katherine S Pollard
- Gladstone Institute of Data Science and Biotechnology, San Francisco, California, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California, USA
- Chan Zuckerberg Biohub, San Francisco, California, USA
2
Keränen SVE, Villahoz-Baleta A, Bruno AE, Halfon MS. REDfly: An Integrated Knowledgebase for Insect Regulatory Genomics. Insects 2022; 13:618. [PMID: 35886794 PMCID: PMC9323752 DOI: 10.3390/insects13070618] [Received: 06/16/2022] [Revised: 07/01/2022] [Accepted: 07/06/2022]
Abstract
We provide here an updated description of the REDfly (Regulatory Element Database for Fly) database of transcriptional regulatory elements, a unique resource that provides regulatory annotation for the genome of Drosophila and other insects. The genomic sequences regulating insect gene expression, namely transcriptional cis-regulatory modules (CRMs, e.g., "enhancers") and transcription factor binding sites (TFBSs), are not currently curated by any other major database resources. However, knowledge of such sequences is important, as CRMs play critical roles with respect to disease as well as normal development, phenotypic variation, and evolution. Characterized CRMs also provide useful tools for both basic and applied research, including developing methods for insect control. REDfly, which is the most detailed existing platform for metazoan regulatory-element annotation, includes over 40,000 experimentally verified CRMs and TFBSs along with their DNA sequences, their associated genes, and the expression patterns they direct. Here, we briefly describe REDfly's contents and data model, with an emphasis on the new features implemented since 2020. We then provide an illustrated walk-through of several common REDfly search use cases.
Affiliation(s)
- Angel Villahoz-Baleta
- Center for Computational Research, State University of New York at Buffalo, Buffalo, NY 14203, USA
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Andrew E. Bruno
- Center for Computational Research, State University of New York at Buffalo, Buffalo, NY 14203, USA
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Marc S. Halfon
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Biochemistry, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Biomedical Informatics, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Biological Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Molecular and Cellular Biology and Program in Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
3
Asma H, Halfon MS. Annotating the Insect Regulatory Genome. Insects 2021; 12:591. [PMID: 34209769 PMCID: PMC8305585 DOI: 10.3390/insects12070591] [Received: 05/28/2021] [Revised: 06/23/2021] [Accepted: 06/25/2021]
Abstract
An ever-growing number of insect genomes is being sequenced across the evolutionary spectrum. Comprehensive annotation of not only genes but also regulatory regions is critical for reaping the full benefits of this sequencing. Driven by developments in sequencing technologies and in both empirical and computational discovery strategies, the past few decades have witnessed dramatic progress in our ability to identify cis-regulatory modules (CRMs), sequences such as enhancers that play a major role in regulating transcription. Nevertheless, providing a timely and comprehensive regulatory annotation of newly sequenced insect genomes is an ongoing challenge. We review here the methods being used to identify CRMs in both model and non-model insect species, and focus on two tools that we have developed, REDfly and SCRMshaw. These resources can be paired together in a powerful combination to facilitate insect regulatory annotation over a broad range of species, with an accuracy equal to or better than that of other state-of-the-art methods.
Affiliation(s)
- Hasiba Asma
- Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Marc S. Halfon
- Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- NY State Center of Excellence in Bioinformatics & Life Sciences, Buffalo, NY 14203, USA
4
Rivera J, Keränen SVE, Gallo SM, Halfon MS. REDfly: the transcriptional regulatory element database for Drosophila. Nucleic Acids Res 2020; 47:D828-D834. [PMID: 30329093 PMCID: PMC6323911 DOI: 10.1093/nar/gky957] [Received: 09/06/2018] [Accepted: 10/04/2018]
Abstract
The REDfly database provides a comprehensive curation of experimentally-validated Drosophila transcriptional cis-regulatory elements and includes information on DNA sequence, experimental evidence, patterns of regulated gene expression, and more. Now in its thirteenth year, REDfly has grown to over 23 000 records of tested reporter gene constructs and 2200 tested transcription factor binding sites. Recent developments include the start of curation of predicted cis-regulatory modules in addition to experimentally-verified ones, improved search and filtering, and increased interaction with the authors of curated papers. An expanded data model that will capture information on temporal aspects of gene regulation, regulation in response to environmental and other non-developmental cues, sexually dimorphic gene regulation, and non-endogenous (ectopic) aspects of reporter gene expression is under development and expected to be in place within the coming year. REDfly is freely accessible at http://redfly.ccr.buffalo.edu, and news about database updates and new features can be followed on Twitter at @REDfly_database.
Affiliation(s)
- John Rivera
- Center for Computational Research, State University of New York at Buffalo, Buffalo, NY 14203, USA; New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Steven M Gallo
- Center for Computational Research, State University of New York at Buffalo, Buffalo, NY 14203, USA; New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Marc S Halfon
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA; Department of Biochemistry, State University of New York at Buffalo, Buffalo, NY 14203, USA; Department of Biomedical Informatics, State University of New York at Buffalo, Buffalo, NY 14203, USA; Department of Biological Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA; Department of Molecular and Cellular Biology and Program in Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
5
Kostka D, Holloway AK, Pollard KS. Developmental Loci Harbor Clusters of Accelerated Regions That Evolved Independently in Ape Lineages. Mol Biol Evol 2019; 35:2034-2045. [PMID: 29897475 PMCID: PMC6063267 DOI: 10.1093/molbev/msy109]
Abstract
Some of the fastest evolving regions of the human genome are conserved noncoding elements with many human-specific DNA substitutions. These human accelerated regions (HARs) are enriched near regulatory genes, and several HARs function as developmental enhancers. To investigate whether this evolutionary signature is unique to humans, we quantified evidence of accelerated substitutions in conserved genomic elements across multiple lineages and applied this approach simultaneously to the genomes of five apes: human, chimpanzee, gorilla, orangutan, and gibbon. We find roughly similar numbers and genomic distributions of lineage-specific accelerated regions (linARs) in all five apes. In particular, apes share an enrichment of linARs in regulatory DNA near genes involved in development, especially transcription factors and other regulators. Many developmental loci harbor clusters of nonoverlapping linARs from multiple apes, suggesting that accelerated evolution in each species affected distinct regulatory elements that control a shared set of developmental pathways. Our statistical tests distinguish between GC-biased and unbiased accelerated substitution rates, allowing us to quantify the roles of different evolutionary forces in creating linARs. We find evidence of GC-biased gene conversion in each ape, but unbiased acceleration consistent with positive selection or loss of constraint is more common in all five lineages. It therefore appears that similar evolutionary processes created independent accelerated regions in the genomes of different apes, and that these lineage-specific changes to conserved noncoding sequences may have differentially altered expression of a core set of developmental genes across ape evolution.
Affiliation(s)
- Dennis Kostka
- Departments of Developmental Biology and Computational & Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Alisha K Holloway
- Gladstone Institutes, San Francisco, CA; Phylos Bioscience, Portland, OR; Department of Epidemiology & Biostatistics, Institute for Human Genetics, Quantitative Biology Institutes, and Institute for Computational Health Sciences, University of California, San Francisco, CA
- Katherine S Pollard
- Gladstone Institutes, San Francisco, CA; Department of Epidemiology & Biostatistics, Institute for Human Genetics, Quantitative Biology Institutes, and Institute for Computational Health Sciences, University of California, San Francisco, CA; Chan-Zuckerberg Biohub, San Francisco, CA
6
Abstract
Populations evolve as mutations arise in individual organisms and, through hereditary transmission, may become "fixed" (shared by all individuals) in the population. Most mutations are lethal or have negative fitness consequences for the organism. Others have essentially no effect on organismal fitness and can become fixed through the neutral stochastic process known as random drift. However, mutations may also produce a selective advantage that boosts their chances of reaching fixation. Regions of genomes where new mutations are beneficial, rather than neutral or deleterious, tend to evolve more rapidly due to positive selection. Genes involved in immunity and defense are a well-known example; rapid evolution in these genes presumably occurs because new mutations help organisms to prevail in evolutionary "arms races" with pathogens. In recent years genome-wide scans for selection have enlarged our understanding of the genome evolution of various species. In this chapter, we will focus on methods to detect selection on the genome. In particular, we will discuss probabilistic models and how they have changed with the advent of new genome-wide data now available.
Affiliation(s)
- Carolin Kosiol
- Centre of Biological Diversity, School of Biology, University of St Andrews, Fife, UK.
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria.
- Maria Anisimova
- Institute of Applied Simulation, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
7
Shadow Enhancers Are Pervasive Features of Developmental Regulatory Networks. Curr Biol 2015; 26:38-51. [PMID: 26687625 PMCID: PMC4712172 DOI: 10.1016/j.cub.2015.11.034] [Received: 09/20/2015] [Revised: 11/16/2015] [Accepted: 11/17/2015]
Abstract
Embryogenesis is remarkably robust to segregating mutations and environmental variation; under a range of conditions, embryos of a given species develop into stereotypically patterned organisms. Such robustness is thought to be conferred, in part, through elements within regulatory networks that perform similar, redundant tasks. Redundant enhancers (or "shadow" enhancers), for example, can confer precision and robustness to gene expression, at least at individual, well-studied loci. However, the extent to which enhancer redundancy exists and can thereby have a major impact on developmental robustness remains unknown. Here, we systematically assessed this, identifying over 1,000 predicted shadow enhancers during Drosophila mesoderm development. The activity of 23 elements, associated with five genes, was examined in transgenic embryos, while natural structural variation among individuals was used to assess their ability to buffer against genetic variation. Our results reveal three clear properties of enhancer redundancy within developmental systems. First, it is much more pervasive than previously anticipated, with 64% of loci examined having shadow enhancers. Their spatial redundancy is often partial in nature, while the non-overlapping function may explain why these enhancers are maintained within a population. Second, over 70% of loci do not follow the simple situation of having only two shadow enhancers; often there are three (rols), four (CadN and ade5), or five (Traf1), at least one of which can be deleted with no obvious phenotypic effects. Third, although shadow enhancers can buffer variation, patterns of segregating variation suggest that they play a more complex role in development than generally considered.
8
Wang MS, Adeola AC, Li Y, Zhang YP, Wu DD. Accelerated evolution of constraint elements for hematophagic adaptation in mosquitoes. Zool Res 2015; 36:320-327. [PMID: 26646568 PMCID: PMC4771951 DOI: 10.13918/j.issn.2095-8137.2015.6.320] [Received: 08/18/2015] [Accepted: 10/09/2015]
Abstract
Comparative genomics is a powerful approach for comprehensively interpreting genomes. Herein, we performed whole genome comparative analysis of 16 Diptera genomes, including four mosquitoes and 12 Drosophila species. We found more than 540 000 constraint elements (CEs) in the Diptera genomes, with the majority found in the intergenic, coding and intronic regions. Accelerated elements (AEs) identified in mosquitoes were mostly in the protein-coding regions (>93%), which differs from vertebrates in genomic distribution. Some genes functionally enriched in blood digestion, body temperature regulation and insecticide resistance showed rapid evolution not only in the lineage of the recent common ancestor of mosquitoes (RCAM), but also in some mosquito lineages. This may be associated with lineage-specific traits and/or adaptations in comparison with other insects. Our findings revealed that although universally fast evolution acted on biological systems in RCAM, such as hematophagy, the same adaptations also appear to have occurred through distinct degrees of evolution in different mosquito species, enabling them to be successful blood feeders in different environments.
Affiliation(s)
- Ming-Shan Wang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Adeniyi C Adeola
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yan Li
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Ya-Ping Zhang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Laboratory for Conservation and Utilization of Bio-resources, Yunnan University, Kunming 650091, China.
- Dong-Dong Wu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China.
9
Hubisz MJ, Pollard KS. Exploring the genesis and functions of Human Accelerated Regions sheds light on their role in human evolution. Curr Opin Genet Dev 2014; 29:15-21. [PMID: 25156517 DOI: 10.1016/j.gde.2014.07.005] [Received: 05/21/2014] [Revised: 07/23/2014] [Accepted: 07/25/2014]
Abstract
Human accelerated regions (HARs) are DNA sequences that changed very little throughout mammalian evolution, but then experienced a burst of changes in humans since divergence from chimpanzees. This unexpected evolutionary signature is suggestive of deeply conserved function that was lost or changed on the human lineage. Since their discovery, the actual roles of HARs in human evolution have remained somewhat elusive, due to their being almost exclusively non-coding sequences with no annotation. Ongoing research is beginning to crack this problem by leveraging new genome sequences, functional genomics data, computational approaches, and genetic assays to reveal that many HARs are developmental gene regulatory elements and RNA genes, most of which evolved their uniquely human mutations through positive selection before divergence of archaic hominins and diversification of modern humans.
Affiliation(s)
- Melissa J Hubisz
- Department of Biological Statistics and Computational Biology, Cornell University, 102D Weill Hall, Ithaca, NY 14853, USA
- Katherine S Pollard
- Gladstone Institutes, Division of Biostatistics & Institute for Human Genetics, University of California, 1650 Owens Street, San Francisco, CA 94158, USA.
10
Poh YP, Ting CT, Fu HW, Langley CH, Begun DJ. Population genomic analysis of base composition evolution in Drosophila melanogaster. Genome Biol Evol 2013; 4:1245-55. [PMID: 23160062 PMCID: PMC3542573 DOI: 10.1093/gbe/evs097]
Abstract
The relative importance of mutation, selection, and biased gene conversion to patterns of base composition variation in Drosophila melanogaster, and to a lesser extent, D. simulans, has been investigated for many years. However, genomic data from sufficiently large samples to thoroughly characterize patterns of base composition polymorphism within species have been lacking. Here, we report a genome-wide analysis of coding and noncoding polymorphism in a large sample of inbred D. melanogaster strains from Raleigh, North Carolina. Consistent with previous results, we observed that AT mutations fix more frequently than GC mutations in D. melanogaster. Contrary to predictions of previous models of codon usage in D. melanogaster, we found that synonymous sites segregating for derived AT polymorphisms were less skewed toward low frequencies compared with sites segregating a derived GC polymorphism. However, no such pattern was observed for comparable base composition polymorphisms in noncoding DNA. These results suggest that AT-ending codons could currently be favored by natural selection in the D. melanogaster lineage.
Affiliation(s)
- Yu-Ping Poh
- Institute of Molecular and Cellular Biology, National Tsing Hua University, Taiwan, Republic of China.
11
Bullaughey K. Multidimensional adaptive evolution of a feed-forward network and the illusion of compensation. Evolution 2012; 67:49-65. [PMID: 23289561 DOI: 10.1111/j.1558-5646.2012.01735.x]
Abstract
When multiple substitutions affect a trait in opposing ways, they are often assumed to be compensatory, not only with respect to the trait, but also with respect to fitness. This type of compensatory evolution has been suggested to underlie the evolution of protein structures and interactions, RNA secondary structures, and gene regulatory modules and networks. The possibility for compensatory evolution results from epistasis. Yet if epistasis is widespread, then it is also possible that the opposing substitutions are individually adaptive. I term this possibility an adaptive reversal. Although possible for arbitrary phenotype-fitness mappings, it has not yet been investigated whether such epistasis is prevalent in a biologically realistic setting. I investigate a particular regulatory circuit, the type I coherent feed-forward loop, which is ubiquitous in natural systems and is accurately described by a simple mathematical model. I show that such reversals are common during adaptive evolution, can result solely from the topology of the fitness landscape, and can occur even when adaptation follows a modest environmental change and the network was well adapted to the original environment. The possibility of adaptive reversals warrants a systems perspective when interpreting substitution patterns in gene regulatory networks.
Affiliation(s)
- Kevin Bullaughey
- Department of Ecology & Evolution, University of Chicago, Chicago, IL 60637, USA.
12
Busser BW, Taher L, Kim Y, Tansey T, Bloom MJ, Ovcharenko I, Michelson AM. A machine learning approach for identifying novel cell type-specific transcriptional regulators of myogenesis. PLoS Genet 2012; 8:e1002531. [PMID: 22412381 PMCID: PMC3297574 DOI: 10.1371/journal.pgen.1002531] [Received: 03/30/2011] [Accepted: 12/23/2011]
Abstract
Transcriptional enhancers integrate the contributions of multiple classes of transcription factors (TFs) to orchestrate the myriad spatio-temporal gene expression programs that occur during development. A molecular understanding of enhancers with similar activities requires the identification of both their unique and their shared sequence features. To address this problem, we combined phylogenetic profiling with a DNA-based enhancer sequence classifier that analyzes the TF binding sites (TFBSs) governing the transcription of a co-expressed gene set. We first assembled a small number of enhancers that are active in Drosophila melanogaster muscle founder cells (FCs) and other mesodermal cell types. Using phylogenetic profiling, we increased the number of enhancers by incorporating orthologous but divergent sequences from other Drosophila species. Functional assays revealed that the diverged enhancer orthologs were active in largely similar patterns as their D. melanogaster counterparts, although there was extensive evolutionary shuffling of known TFBSs. We then built and trained a classifier using this enhancer set and identified additional related enhancers based on the presence or absence of known and putative TFBSs. Predicted FC enhancers were over-represented in proximity to known FC genes; and many of the TFBSs learned by the classifier were found to be critical for enhancer activity, including POU homeodomain, Myb, Ets, Forkhead, and T-box motifs. Empirical testing also revealed that the T-box TF encoded by org-1 is a previously uncharacterized regulator of muscle cell identity. Finally, we found extensive diversity in the composition of TFBSs within known FC enhancers, suggesting that motif combinatorics plays an essential role in the cellular specificity exhibited by such enhancers. 
In summary, machine learning combined with evolutionary sequence analysis is useful for recognizing novel TFBSs and for facilitating the identification of cognate TFs that coordinate cell type-specific developmental gene expression patterns.
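The abstract above describes classifying enhancers by the presence or absence of known and putative TFBSs. A minimal sketch of that kind of feature encoding follows, assuming exact consensus-string matching on both strands as a simplification; a real classifier would more likely score position weight matrices. The function names and the two motif strings are illustrative assumptions, not taken from the paper.

```python
from typing import Dict

# Translation table used to complement a DNA string.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_presence_features(sequence: str, motifs: Dict[str, str]) -> Dict[str, int]:
    """Encode a sequence as 1/0 flags, one per motif.

    A motif counts as present if its consensus string occurs on either
    strand. Exact matching is a hypothetical stand-in for motif scoring.
    """
    forward = sequence.upper()
    reverse = reverse_complement(forward)
    return {
        name: int(consensus.upper() in forward or consensus.upper() in reverse)
        for name, consensus in motifs.items()
    }


# Hypothetical motif set for illustration only.
example_motifs = {"Ets": "GGAA", "Tbox": "TCACACCT"}
features = motif_presence_features("ACGTTTCCAAGT", example_motifs)
print(features)
```

Each enhancer then becomes a fixed-length binary vector, which is the form most off-the-shelf classifiers expect as input.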
Affiliation(s)
- Brian W. Busser
- Laboratory of Developmental Systems Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Leila Taher
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Yongsok Kim
- Laboratory of Developmental Systems Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Terese Tansey
- Laboratory of Developmental Systems Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Molly J. Bloom
- Laboratory of Developmental Systems Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Ivan Ovcharenko
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Alan M. Michelson
- Laboratory of Developmental Systems Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, United States of America
13
Abstract
Vast tracts of noncoding DNA contain elements that regulate gene expression in higher eukaryotes. Describing these regulatory elements and understanding how they evolve represent major challenges for biologists. Advances in the ability to survey genome-scale DNA sequence data are providing unprecedented opportunities to use evolutionary models and computational tools to identify functionally important elements and the mode of selection acting on them in multiple species. This chapter reviews some of the current methods that have been developed and applied on noncoding DNA, what they have shown us, and how they are limited. Results of several recent studies reveal that a significantly larger fraction of noncoding DNA in eukaryotic organisms is likely to be functional than previously believed, implying that the functional annotation of most noncoding DNA in these organisms is largely incomplete. In Drosophila, recent studies have further suggested that a large fraction of noncoding DNA divergence observed between species may be the product of recurrent adaptive substitution. Similar studies in humans have revealed a more complex pattern, with signatures of recurrent positive selection being largely concentrated in conserved noncoding DNA elements. Understanding these patterns and the extent to which they generalize to other organisms awaits the analysis of forthcoming genome-scale polymorphism and divergence data from more species.
Affiliation(s)
- Ying Zhen
- Department of Ecology and Evolutionary Biology, The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
14
Abstract
Populations evolve as mutations arise in individual organisms and, through hereditary transmission, may become "fixed" (shared by all individuals) in the population. Most mutations are lethal or have negative fitness consequences for the organism. Others have essentially no effect on organismal fitness and can become fixed through the neutral stochastic process known as random drift. However, mutations may also produce a selective advantage that boosts their chances of reaching fixation. Regions of genes where new mutations are beneficial, rather than neutral or deleterious, tend to evolve more rapidly due to positive selection. Genes involved in immunity and defense are a well-known example; rapid evolution in these genes presumably occurs because new mutations help organisms to prevail in evolutionary "arms races" with pathogens. In recent years, genome-wide scans for selection have enlarged our understanding of the evolution of the protein-coding regions of the various species. In this chapter, we focus on the methods to detect selection in protein-coding genes. In particular, we discuss probabilistic models and how they have changed with the advent of new genome-wide data now available.
15
Kostka D, Hubisz MJ, Siepel A, Pollard KS. The role of GC-biased gene conversion in shaping the fastest evolving regions of the human genome. Mol Biol Evol 2011; 29:1047-57. [PMID: 22075116 PMCID: PMC3278478 DOI: 10.1093/molbev/msr279]
Abstract
GC-biased gene conversion (gBGC) is a recombination-associated evolutionary process that accelerates the fixation of guanine or cytosine alleles, regardless of their effects on fitness. gBGC can increase the overall rate of substitutions, a hallmark of positive selection. Many fast-evolving genes and noncoding sequences in the human genome have GC-biased substitution patterns, suggesting that gBGC, in contrast to adaptive processes, may have driven the human changes in these sequences. To investigate this hypothesis, we developed a substitution model for DNA sequence evolution that quantifies the nonlinear interacting effects of selection and gBGC on substitution rates and patterns. Based on this model, we used a series of lineage-specific likelihood ratio tests to evaluate sequence alignments for evidence of changes in mode of selection, action of gBGC, or both. With a false positive rate of less than 5% for individual tests, we found that the majority (76%) of previously identified human accelerated regions are best explained without gBGC, whereas a substantial minority (19%) are best explained by the action of gBGC alone. Further, more than half (55%) have substitution rates that significantly exceed local estimates of the neutral rate, suggesting that these regions may have been shaped by positive selection rather than by relaxation of constraint. By distinguishing the effects of gBGC, relaxation of constraint, and positive selection we provide an integrated analysis of the evolutionary forces that shaped the fastest evolving regions of the human genome, which facilitates the design of targeted functional studies of adaptation in humans.
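The GC-biased substitution pattern mentioned in this abstract can be illustrated by tallying weak-to-strong (A/T to G/C) against strong-to-weak (G/C to A/T) differences between two aligned sequences. This is only a counting sketch, not the likelihood-based substitution model the authors describe. The function name and the example sequences are hypothetical, and it assumes the ancestral state is already known for each site.

```python
WEAK = {"A", "T"}    # bases forming two hydrogen bonds
STRONG = {"G", "C"}  # bases forming three hydrogen bonds


def classify_substitutions(ancestral: str, derived: str) -> dict:
    """Count W>S, S>W, and other substitutions between aligned sequences.

    Sites with gaps or ambiguous bases, and sites that are identical in
    both sequences, are skipped.
    """
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"W>S": 0, "S>W": 0, "other": 0}
    valid = WEAK | STRONG
    for a, d in zip(ancestral.upper(), derived.upper()):
        if a == d or a not in valid or d not in valid:
            continue
        if a in WEAK and d in STRONG:
            counts["W>S"] += 1
        elif a in STRONG and d in WEAK:
            counts["S>W"] += 1
        else:
            counts["other"] += 1  # W>W or S>S changes
    return counts


# Toy alignment for illustration only.
print(classify_substitutions("ATGCA", "GCGTA"))
```

An excess of W>S over S>W counts in a region is the kind of skew that motivates testing for gBGC, though a raw count says nothing about statistical significance.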
Affiliation(s)
- Dennis Kostka
- Gladstone Institute of Cardiovascular Disease, University of California, San Francisco, USA.
16
Balakirev ES, Anisimova M, Ayala FJ. Complex interplay of evolutionary forces in the ladybird homeobox genes of Drosophila melanogaster. PLoS One 2011; 6:e22613. [PMID: 21799919 PMCID: PMC3142176 DOI: 10.1371/journal.pone.0022613] [Received: 02/07/2011] [Accepted: 06/29/2011]
Abstract
Tandemly arranged paralogous genes lbe and lbl are members of the Drosophila NK homeobox family. We analyzed population samples of Drosophila melanogaster from Africa, Europe, North and South America, and single strains of D. sechellia, D. simulans, and D. yakuba within two linked regions encompassing partial sequences of lbe and lbl. The evolution of lbe and lbl is highly constrained due to their important regulatory functions. Despite this, a variety of forces have shaped the patterns of variation in lb genes: recombination, intragenic gene conversion and natural selection strongly influence background variation created by linkage disequilibrium and dimorphic haplotype structure. The two genes exhibited similar levels of nucleotide diversity and positive selection was detected in the noncoding regions of both genes. However, synonymous variability was significantly higher for lbe: no nonsynonymous changes were observed in this gene. We argue that balancing selection impacts some synonymous sites of the lbe gene. Stability of mRNA secondary structure was significantly different between the lbe (but not lbl) haplotype groups and may represent a driving force of balancing selection in epistatically interacting synonymous sites. Balancing selection on synonymous sites may be the first, or one of a few such observations, in Drosophila. In contrast, recurrent positive selection on lbl at the protein level influenced evolution at three codon sites. Transcription factor binding-site profiles were different for lbe and lbl, suggesting that their developmental functions are not redundant. Combined with our previous results on nucleotide variation in esterase and other homeobox genes, these results suggest that interplay of balancing and directional selection may be a general feature of molecular evolution in Drosophila and other eukaryote genomes.
Affiliation(s)
- Evgeniy S Balakirev
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America.
17
Hubisz MJ, Pollard KS, Siepel A. PHAST and RPHAST: phylogenetic analysis with space/time models. Brief Bioinform 2010; 12:41-51. [PMID: 21278375 DOI: 10.1093/bib/bbq072]
Abstract
The PHylogenetic Analysis with Space/Time models (PHAST) software package consists of a collection of command-line programs and supporting libraries for comparative genomics. PHAST is best known as the engine behind the Conservation tracks in the University of California, Santa Cruz (UCSC) Genome Browser. However, it also includes several other tools for phylogenetic modeling and functional element identification, as well as utilities for manipulating alignments, trees and genomic annotations. PHAST has been in development since 2002 and has now been downloaded more than 1000 times, but so far it has been released only as provisional ('beta') software. Here, we describe the first official release (v1.0) of PHAST, with improved stability, portability and documentation and several new features. We outline the components of the package and detail recent improvements. In addition, we introduce a new interface to the PHAST libraries from the R statistical computing environment, called RPHAST, and illustrate its use in a series of vignettes. We demonstrate that RPHAST can be particularly useful in applications involving both large-scale phylogenomics and complex statistical analyses. The R interface also makes the PHAST libraries accessible to non-C programmers, and is useful for rapid prototyping. PHAST v1.0 and RPHAST v1.0 are available for download at http://compgen.bscb.cornell.edu/phast, under the terms of an unrestrictive BSD-style license. RPHAST can also be obtained from the Comprehensive R Archive Network (CRAN; http://cran.r-project.org).
Affiliation(s)
- Melissa J Hubisz
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA.
18
Ridout KE, Dixon CJ, Filatov DA. Positive selection differs between protein secondary structure elements in Drosophila. Genome Biol Evol 2010; 2:166-79. [PMID: 20624723 PMCID: PMC2997536 DOI: 10.1093/gbe/evq008]
Abstract
Different protein secondary structure elements have different physicochemical properties and roles in the protein, which may determine their evolutionary flexibility. However, it is not clear to what extent protein structure affects the way Darwinian selection acts at the amino acid level. Using phylogeny-based likelihood tests for positive selection, we have examined the relationship between protein secondary structure and selection across six species of Drosophila. We find that amino acids that form disordered regions, such as random coils, are far more likely to be under positive selection than expected from their proportion in the proteins, and residues in helices and β-structures are subject to less positive selection than predicted. In addition, it appears that sites undergoing positive selection are more likely than expected to occur close to one another in the protein sequence. Finally, on a genome-wide scale, we have determined that positively selected sites are found more frequently toward the gene ends. Our results demonstrate that protein structures with a greater degree of organization and strong hydrophobicity, represented here as helices and β-structures, are less tolerant to molecular adaptation than disordered, hydrophilic regions, across a diverse set of proteins.
Affiliation(s)
- Kate E Ridout
- Department of Plant Sciences, University of Oxford, Oxford, United Kingdom
19
Polygenic and directional regulatory evolution across pathways in Saccharomyces. Proc Natl Acad Sci U S A 2010; 107:5058-63. [PMID: 20194736 DOI: 10.1073/pnas.0912959107]
Abstract
The search to understand how genomes innovate in response to selection dominates the field of evolutionary biology. Powerful molecular evolution approaches have been developed to test individual loci for signatures of selection. In many cases, however, an organism's response to changes in selective pressure may be mediated by multiple genes, whose products function together in a cellular process or pathway. Here we assess the prevalence of polygenic evolution in pathways in the yeasts Saccharomyces cerevisiae and S. bayanus. We first established short-read sequencing methods to detect cis-regulatory variation in a diploid hybrid between the species. We then tested for the scenario in which selective pressure in one species to increase or decrease the activity of a pathway has driven the accumulation of cis-regulatory variants that act in the same direction on gene expression. Application of this test revealed a variety of yeast pathways with evidence for directional regulatory evolution. In parallel, we also used population genomic sequencing data to compare protein and cis-regulatory variation within and between species. We identified pathways with evidence for divergence within S. cerevisiae, and we detected signatures of positive selection between S. cerevisiae and S. bayanus. Our results point to polygenic, pathway-level change as a common evolutionary mechanism among yeasts. We suggest that pathway analyses, including our test for directional regulatory evolution, will prove to be a relevant and powerful strategy in many evolutionary genomic applications.
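The idea of testing whether cis-regulatory variants in a pathway act in the same direction can be sketched with a simple exact sign test. This is a stand-in chosen for illustration, since the abstract does not specify the authors' test statistic; the function name `directional_sign_test` and the example counts are hypothetical.

```python
from math import comb

def directional_sign_test(n_up, n_total):
    """Two-sided exact binomial (sign) test with success probability 0.5.

    n_up    -- genes in the pathway whose cis-regulatory difference
               favors higher expression from one species' allele
    n_total -- genes in the pathway with any cis-regulatory difference

    Under the null hypothesis, each gene is equally likely to be shifted
    in either direction, so a strong excess in one direction suggests
    directional regulatory evolution of the pathway.
    """
    # Fold the count so the test is symmetric in the two directions.
    k = max(n_up, n_total - n_up)
    tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2.0 * tail)

# Hypothetical pathway: 14 of 16 genes shifted in the same direction.
print(directional_sign_test(14, 16))
```

In practice such a test would be run per pathway and corrected for multiple testing across all pathways examined.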
20
DuMont VLB, Singh ND, Wright MH, Aquadro CF. Locus-specific decoupling of base composition evolution at synonymous sites and introns along the Drosophila melanogaster and Drosophila sechellia lineages. Genome Biol Evol 2009; 1:67-74. [PMID: 20333178 PMCID: PMC2817403 DOI: 10.1093/gbe/evp008] [Accepted: 05/15/2009]
Abstract
Selection is thought to be partially responsible for patterns of molecular evolution at synonymous sites within numerous Drosophila species. Recently, “per-site” and likelihood methods have been developed to detect loci for which positive selection is a major component of synonymous site evolution. An underlying assumption of these methods, however, is a homogeneous mutation process. To address this potential shortcoming, we perform a complementary analysis making gene-by-gene comparisons of paired synonymous site and intron substitution rates toward and away from the nucleotides G and C because preferred codons are G or C ending in Drosophila. This comparison may reduce both the false-positive rate (due to broadscale heterogeneity in mutation) and false-negative rate (due to lack of power comparing small numbers of sites) of the per-site and likelihood methods. We detect loci with patterns of evolution suggestive of synonymous site selection pressures predominately favoring unpreferred and preferred codons along the Drosophila melanogaster and Drosophila sechellia lineages, respectively. Intron selection pressures do not appear sufficient to explain all these results as the magnitude of the difference in synonymous and intron evolution is dependent on recombination environment and chromosomal location in a direction supporting the hypothesis of selectively driven synonymous fixations. This comparison identifies 101 loci with an apparent switch in codon preference between D. melanogaster and D. sechellia, a pattern previously only observed at the Notch locus.