1
Dubinkina V, Bhogale S, Hsieh PH, Dibaeinia P, Nambiar A, Maslov S, Yoshikuni Y, Sinha S. A transcriptomic atlas of acute stress response to low pH in multiple Issatchenkia orientalis strains. Microbiol Spectr 2024; 12:e0253623. [PMID: 38018981 PMCID: PMC10783018 DOI: 10.1128/spectrum.02536-23] [Received: 06/16/2023] [Accepted: 10/27/2023]
Abstract
IMPORTANCE Issatchenkia orientalis is a promising industrial chassis for producing biofuels and bioproducts because of its high tolerance to multiple environmental stresses, such as low pH, heat, and other chemicals that are toxic to the most widely used microbes. Yet little is known about the specific mechanisms of this tolerance, which hinders our ability to engineer the species to produce valuable biochemicals. Here, we report a comprehensive study of the mechanisms of acid tolerance in this species, based on transcriptome profiling of 12 strains with different phenotypes across a range of pH values. We found multiple regulatory mechanisms involved in tolerance to low pH in different strains of I. orientalis, marking potential targets for future gene editing and perturbation experiments.
Affiliation(s)
- Veronika Dubinkina
- Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- The Gladstone Institute of Data Science and Biotechnology, San Francisco, California, USA
- Shounak Bhogale
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Ping-Hung Hsieh
- Center for Advanced Bioenergy and Bioproducts Innovation, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Payam Dibaeinia
- Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Ananthan Nambiar
- Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Sergei Maslov
- Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Department of Physics, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Yasuo Yoshikuni
- Center for Advanced Bioenergy and Bioproducts Innovation, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Global Institution for Collaborative Research and Education, Hokkaido University, Hokkaido, Japan
- Institute of Global Innovation Research, Tokyo University of Agriculture and Technology, Tokyo, Japan
- Saurabh Sinha
- Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Cancer Center at Illinois, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Department of Biomedical Engineering at Georgia Tech and Emory University, Atlanta, Georgia, USA
- Department of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
2
Abstract
Because gene expression is important for evolutionary adaptation, its misregulation is an important cause of maladaptation. A misregulated gene can be incorrectly silent ("off") when a transcription factor (TF) that is required for its activation does not bind its regulatory region. Conversely, a misregulated gene can be incorrectly active ("on") when a TF not normally involved in its activation binds its regulatory region, a phenomenon also known as regulatory crosstalk. DNA mutations that destroy or create TF binding sites are an important source of misregulation and crosstalk. Although misregulation reduces fitness in an environment to which an organism is well adapted, it may become adaptive in a new environment. Here, I derive simple yet general mathematical expressions that delimit the conditions under which misregulation can be adaptive. These expressions depend on the strength of selection against misregulation, on the fraction of DNA sequence space filled with TF binding sites, and on the fraction of genes that must be expressed for optimal adaptation. I then use empirical data from RNA sequencing, protein-binding microarrays, and genome evolution, together with population genetic simulations, to ask when these conditions are likely to be met. I show that they can be met under realistic circumstances, but these circumstances may vary among organisms and environments. My analysis provides a framework in which improved theory and data collection can help us demonstrate the role of misregulation in adaptation. It also shows that misregulation, like DNA mutation, is one of life's many imperfections that can help propel Darwinian evolution.
Affiliation(s)
- Andreas Wagner
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, CH-8057, Switzerland; The Santa Fe Institute, Santa Fe, NM 87501, USA; Swiss Institute of Bioinformatics, Lausanne, Switzerland
3
DiFrisco J, Jaeger J. Homology of process: developmental dynamics in comparative biology. Interface Focus 2021; 11:20210007. [PMID: 34055306 PMCID: PMC8086918 DOI: 10.1098/rsfs.2021.0007] [Accepted: 02/22/2021]
Abstract
Comparative biology builds up systematic knowledge of the diversity of life, across evolutionary lineages and levels of organization, starting with evidence from a sparse sample of model organisms. In developmental biology, a key obstacle to the growth of comparative approaches is that the concept of homology is not very well defined for levels of organization that are intermediate between individual genes and morphological characters. In this paper, we investigate what it means for ontogenetic processes to be homologous, focusing specifically on the examples of insect segmentation and vertebrate somitogenesis. These processes can be homologous without homology of the underlying genes or gene networks, since the latter can diverge over evolutionary time, while the dynamics of the process remain the same. Ontogenetic processes like these therefore constitute a dissociable level and distinctive unit of comparison requiring their own specific criteria of homology. In addition, such processes are typically complex and nonlinear, such that their rigorous description and comparison requires not only observation and experimentation, but also dynamical modelling. We propose six criteria of process homology, combining recognized indicators (sameness of parts, morphological outcome and topological position) with novel ones derived from dynamical systems modelling (sameness of dynamical properties, dynamical complexity and evidence for transitional forms). We show how these criteria apply to animal segmentation and other ontogenetic processes. We conclude by situating our proposed dynamical framework for homology of process in relation to similar research programmes, such as process structuralism and developmental approaches to morphological homology.
Affiliation(s)
- James DiFrisco
- Institute of Philosophy, KU Leuven, 3000 Leuven, Belgium
- Johannes Jaeger
- Complexity Science Hub (CSH) Vienna, Josefstädter Strasse 39, 1080 Vienna, Austria
4
Conner WR, Delaney EK, Bronski MJ, Ginsberg PS, Wheeler TB, Richardson KM, Peckenpaugh B, Kim KJ, Watada M, Hoffmann AA, Eisen MB, Kopp A, Cooper BS, Turelli M. A phylogeny for the Drosophila montium species group: A model clade for comparative analyses. Mol Phylogenet Evol 2020; 158:107061. [PMID: 33387647 DOI: 10.1016/j.ympev.2020.107061] [Received: 04/04/2020] [Revised: 12/18/2020] [Accepted: 12/24/2020]
Abstract
The Drosophila montium species group is a clade of 94 named species, closely related to the model species D. melanogaster. The montium species group is distributed over a broad geographic range throughout Asia, Africa, and Australasia. Species of this group possess a wide range of morphologies, mating behaviors, and endosymbiont associations, making this clade useful for comparative analyses. We use genomic data from 42 available species to estimate the phylogeny and relative divergence times within the montium species group, and its relative divergence time from D. melanogaster. To assess the robustness of our phylogenetic inferences, we use 3 non-overlapping sets of 20 single-copy coding sequences and analyze all 60 genes with both Bayesian and maximum likelihood methods. Our analyses support monophyly of the group. Apart from the uncertain placement of a single species, D. baimaii, our analyses also support the monophyly of all seven subgroups proposed within the montium group. Our phylograms and relative chronograms provide a highly resolved species tree, with discordance restricted to estimates of relatively short branches deep in the tree. In contrast, age estimates for the montium crown group, relative to its divergence from D. melanogaster, depend critically on prior assumptions concerning variation in rates of molecular evolution across branches, and hence have not been reliably determined. We discuss methodological issues that limit phylogenetic resolution - even when complete genome sequences are available - as well as the utility of the current phylogeny for understanding the evolutionary and biogeographic history of this clade.
Affiliation(s)
- William R Conner
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA; Division of Biological Sciences, University of Montana, Missoula, MT 59812, USA
- Emily K Delaney
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
- Michael J Bronski
- Department of Molecular & Cell Biology, University of California, Berkeley, CA 94720, USA
- Paul S Ginsberg
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA; Department of Genetics, University of Georgia, Athens, GA 30602, USA
- Timothy B Wheeler
- Division of Biological Sciences, University of Montana, Missoula, MT 59812, USA
- Kelly M Richardson
- Bio21 Institute, School of BioScience, University of Melbourne, Victoria 3010, Australia
- Brooke Peckenpaugh
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA; Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Kevin J Kim
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
- Masayoshi Watada
- Graduate School of Science and Engineering, Ehime University, Matsuyama, Ehime, Japan
- Ary A Hoffmann
- Bio21 Institute, School of BioScience, University of Melbourne, Victoria 3010, Australia
- Michael B Eisen
- Department of Molecular & Cell Biology, University of California, Berkeley, CA 94720, USA
- Artyom Kopp
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
- Brandon S Cooper
- Division of Biological Sciences, University of Montana, Missoula, MT 59812, USA
- Michael Turelli
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
5
Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses. G3-GENES GENOMES GENETICS 2020; 10:1443-1455. [PMID: 32220952 PMCID: PMC7202002 DOI: 10.1534/g3.119.400959]
Abstract
Large groups of species with well-defined phylogenies are excellent systems for testing evolutionary hypotheses. In this paper, we describe the creation of a comparative genomic resource consisting of 23 genomes from the species-rich Drosophila montium species group, 22 of which are presented here for the first time. The montium group is well positioned for clade genomics. Within the montium clade, evolutionary distances are such that large numbers of sequences can be accurately aligned while also recovering strong signals of divergence, and the distance between the montium group and D. melanogaster is short enough that orthologous sequences can be readily identified. All genomes were assembled from a single, small-insert library using MaSuRCA, before going through an extensive post-assembly pipeline. Estimated genome sizes within the montium group range from 155 Mb to 223 Mb (mean = 196 Mb). The absence of long-distance information during the assembly process resulted in fragmented assemblies, with the scaffold NG50s varying widely based on repeat content and sample heterozygosity (min = 18 kb, max = 390 kb, mean = 74 kb). The total scaffold length for most assemblies is also shorter than the estimated genome size, typically by 5-15%. However, subsequent analysis showed that our assemblies are highly complete. Despite large differences in contiguity, all assemblies contain at least 96% of known single-copy Dipteran genes (BUSCOs, n = 2,799). Similarly, by aligning our assemblies to the D. melanogaster genome and remapping coordinates for a large set of transcriptional enhancers (n = 3,457), we showed that each montium assembly contains orthologs for at least 91% of D. melanogaster enhancers. Importantly, the genic and enhancer contents of our assemblies are comparable to those of far more contiguous Drosophila assemblies. The alignment of our own D. serrata assembly to a previously published PacBio D. serrata assembly also showed that our longest scaffolds (up to 1 Mb) are free of large-scale misassemblies. Our genome assemblies are a valuable resource that can be used to further resolve the montium group phylogeny, study the evolution of protein-coding genes and cis-regulatory sequences, and determine the genetic basis of ecological and behavioral adaptations.
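The abstract reports contiguity as scaffold NG50 without defining it. As a hedged illustration (the abstract does not describe the authors' post-assembly pipeline, so this is not their code), NG50 is the length of the scaffold at which the running total of scaffold lengths, summed from longest to shortest, first reaches half of the *estimated genome size*. N50 uses the assembly's own total length instead, so NG50 can be no higher than N50 when an assembly falls short of the genome estimate. The function name `ng50` and the toy lengths below are hypothetical.

```python
from typing import Iterable, Optional


def ng50(scaffold_lengths: Iterable[int], genome_size: float) -> Optional[int]:
    """Return the NG50 of an assembly, or None if it is undefined.

    Scaffolds are summed from longest to shortest; NG50 is the length of
    the scaffold at which the running total first reaches half of the
    estimated genome size. If the whole assembly covers less than half of
    the estimate, NG50 is undefined and None is returned.
    """
    half_genome = genome_size / 2
    covered = 0
    for length in sorted(scaffold_lengths, reverse=True):
        covered += length
        if covered >= half_genome:
            return length
    return None


# Toy example with hypothetical scaffold lengths in kb (not data from the study).
scaffolds = [100, 80, 50, 30, 20]  # assembly total = 280 kb
print(ng50(scaffolds, genome_size=300))  # 80, since 100 + 80 = 180 >= 150
print(ng50(scaffolds, genome_size=400))  # 50, since 100 + 80 + 50 = 230 >= 200
```

Raising the genome estimate lowers NG50 for the same scaffolds, which is why the metric is sensitive to the shortfall between total scaffold length and estimated genome size.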
6
Mitchelmore J, Grinberg NF, Wallace C, Spivakov M. Functional effects of variation in transcription factor binding highlight long-range gene regulation by epromoters. Nucleic Acids Res 2020; 48:2866-2879. [PMID: 32112106 PMCID: PMC7102942 DOI: 10.1093/nar/gkaa123] [Received: 04/28/2019] [Revised: 02/14/2020] [Accepted: 02/17/2020]
Abstract
Identifying DNA cis-regulatory modules (CRMs) that control the expression of specific genes is crucial for deciphering the logic of transcriptional control. Natural genetic variation can point to the possible gene regulatory function of specific sequences through their allelic associations with gene expression. However, comprehensive identification of causal regulatory sequences in brute-force association testing without incorporating prior knowledge is challenging due to limited statistical power and effects of linkage disequilibrium. Sequence variants affecting transcription factor (TF) binding at CRMs have a strong potential to influence gene regulatory function, which provides a motivation for prioritizing such variants in association testing. Here, we generate an atlas of CRMs showing predicted allelic variation in TF binding affinity in human lymphoblastoid cell lines and test their association with the expression of their putative target genes inferred from Promoter Capture Hi-C and immediate linear proximity. We reveal >1300 CRM TF-binding variants associated with target gene expression, the majority of them undetected with standard association testing. A large proportion of CRMs showing associations with the expression of genes they contact in 3D localize to the promoter regions of other genes, supporting the notion of 'epromoters': dual-action CRMs with promoter and distal enhancer activity.
Affiliation(s)
- Joanna Mitchelmore
- Nuclear Dynamics Programme, Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
- Nastasiya F Grinberg
- Cambridge Institute of Therapeutic Immunology & Infectious Disease (CITIID), University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0AW, UK
- Chris Wallace
- Cambridge Institute of Therapeutic Immunology & Infectious Disease (CITIID), University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0AW, UK
- MRC Biostatistics Unit, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0SR, UK
- Mikhail Spivakov
- Nuclear Dynamics Programme, Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
- MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College, Du Cane Road, London W12 0NN, UK
7
Alexandre CM, Urton JR, Jean-Baptiste K, Huddleston J, Dorrity MW, Cuperus JT, Sullivan AM, Bemm F, Jolic D, Arsovski AA, Thompson A, Nemhauser JL, Fields S, Weigel D, Bubb KL, Queitsch C. Complex Relationships between Chromatin Accessibility, Sequence Divergence, and Gene Expression in Arabidopsis thaliana. Mol Biol Evol 2018; 35:837-854. [PMID: 29272536 DOI: 10.1093/molbev/msx326]
Abstract
Variation in regulatory DNA is thought to drive phenotypic variation, evolution, and disease. Prior studies of regulatory DNA and transcription factors across animal species highlighted a fundamental conundrum: Transcription factor binding domains and cognate binding sites are conserved, while regulatory DNA sequences are not. It remains unclear how conserved transcription factors and dynamic regulatory sites produce conserved expression patterns across species. Here, we explore regulatory DNA variation and its functional consequences within Arabidopsis thaliana, using chromatin accessibility to delineate regulatory DNA genome-wide. Unlike in previous cross-species comparisons, the positional homology of regulatory DNA is maintained among A. thaliana ecotypes and less nucleotide divergence has occurred. Of the ∼50,000 regulatory sites in A. thaliana, we found that 15% varied in accessibility among ecotypes. Some of these accessibility differences were associated with extensive, previously unannotated sequence variation, encompassing many deletions and ancient hypervariable alleles. Unexpectedly, for the majority of such regulatory sites, nearby gene expression was unaffected. Nevertheless, regulatory sites with high levels of sequence variation and differential chromatin accessibility were the most likely to be associated with differential gene expression. Finally, and most surprising, we found that the vast majority of differentially accessible sites show no underlying sequence variation. We argue that these surprising results highlight the necessity to consider higher-order regulatory context in evaluating regulatory variation and predicting its phenotypic consequences.
Affiliation(s)
- James R Urton
- Department of Genome Sciences, University of Washington, Seattle, WA
- Ken Jean-Baptiste
- Department of Genome Sciences, University of Washington, Seattle, WA
- John Huddleston
- Department of Genome Sciences, University of Washington, Seattle, WA; Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA
- Michael W Dorrity
- Department of Genome Sciences, University of Washington, Seattle, WA
- Josh T Cuperus
- Department of Genome Sciences, University of Washington, Seattle, WA
- Felix Bemm
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Dino Jolic
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Stan Fields
- Department of Genome Sciences, University of Washington, Seattle, WA; Howard Hughes Medical Institute, University of Washington, Seattle, WA
- Detlef Weigel
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Kerry L Bubb
- Department of Genome Sciences, University of Washington, Seattle, WA
- Christin Queitsch
- Department of Genome Sciences, University of Washington, Seattle, WA
8
Marxer M, Vollenweider V, Schmid-Hempel P. Insect antimicrobial peptides act synergistically to inhibit a trypanosome parasite. Philos Trans R Soc Lond B Biol Sci 2016; 371:20150302. [PMID: 27160603 PMCID: PMC4874398 DOI: 10.1098/rstb.2015.0302] [Accepted: 02/08/2016]
Abstract
The innate immune system provides protection from infection by producing essential effector molecules, such as antimicrobial peptides (AMPs) that possess broad-spectrum activity. This is also the case for bumblebees, Bombus terrestris, when infected by the trypanosome Crithidia bombi. Furthermore, the expressed mixture of AMPs varies with host genetic background and infecting parasite strain (genotype). Here, we used the fact that clones of C. bombi can be cultivated and kept as strains in medium to test the effect of various combinations of AMPs on the growth rate of the parasite. In particular, we used pairwise combinations and a range of physiological concentrations of three AMPs, namely Abaecin, Defensin and Hymenoptaecin, synthesized from the respective genomic sequences. We found that these AMPs indeed suppress the growth of eight different strains of C. bombi, and that combinations of AMPs were typically more effective than a single AMP alone. Furthermore, the most effective combinations were rarely those consisting of maximum concentrations. In addition, the AMP combination treatments revealed parasite strain specificity, such that strains varied in their sensitivity towards the same mixtures. Hence, variable expression of AMPs could be an alternative strategy to combat highly variable infections. This article is part of the themed issue 'Evolutionary ecology of arthropod antimicrobial peptides'.
Affiliation(s)
- Monika Marxer
- ETH Zurich, Institute of Integrative Biology (IBZ), Universitätsstrasse 16, 8092 Zürich, Switzerland
- Vera Vollenweider
- ETH Zurich, Institute of Integrative Biology (IBZ), Universitätsstrasse 16, 8092 Zürich, Switzerland
- Paul Schmid-Hempel
- ETH Zurich, Institute of Integrative Biology (IBZ), Universitätsstrasse 16, 8092 Zürich, Switzerland
9
Rothschild JB, Tsimiklis P, Siggia ED, François P. Predicting Ancestral Segmentation Phenotypes from Drosophila to Anopheles Using In Silico Evolution. PLoS Genet 2016; 12:e1006052. [PMID: 27227405 PMCID: PMC4882032 DOI: 10.1371/journal.pgen.1006052] [Received: 02/24/2016] [Accepted: 04/23/2016]
Abstract
Molecular evolution is an established technique for inferring gene homology, but regulatory DNA turns over so rapidly that inference of ancestral networks is often impossible. In silico evolution is used to compute the most parsimonious path in regulatory space for anterior-posterior patterning linking two Dipteran species. The expression pattern of gap genes has evolved between Drosophila (fly) and Anopheles (mosquito), yet one of their targets, eve, has remained invariant. Our model predicts that stripe 5 in fly disappears and a new posterior stripe is created in mosquito; thus eve stripe modules 3+7 and 4+6 in fly are homologous to 3+6 and 4+5 in mosquito. We can place Clogmia on this evolutionary pathway, and it shares the mosquito homologies. To account for the evolution of the other pair-rule genes in the posterior, we have to assume that the ancestral Dipteran utilized a dynamic method to phase those genes in relation to eve.
The last common ancestor of the fruit fly (Drosophila) and mosquito (Anopheles) lived more than 200 million years ago. Can we use available data on insects alive today to infer what their ancestor looked like? In this manuscript, we focus on early embryonic development, when stripes of gene expression appear and define the location of insect segments (“segmentation”). We use an evolutionary algorithm to reconstruct and predict dynamics of genes controlling stripes in the last common ancestor of fly and mosquito. We predict a new and different combinatorial logic of stripe formation in mosquito compared to fly, which is fully consistent with development of intermediate species such as moth-fly (Clogmia). Our simulations further suggest that the dynamics of gene expression in this last common ancestor were similar to those of other insects, such as wasps (Nasonia). Our method illustrates how computational methods inspired by machine learning and nonlinear physics can be used to infer gene dynamics in species that disappeared millions of years ago.
Affiliation(s)
- Jeremy B. Rothschild
- Physics Department, McGill University, Ernest Rutherford Physics Building, Montreal, Quebec, Canada
- Panagiotis Tsimiklis
- Physics Department, McGill University, Ernest Rutherford Physics Building, Montreal, Quebec, Canada
- Eric D. Siggia
- Center for Studies in Physics and Biology, The Rockefeller University, New York, New York, United States of America
- Paul François
- Physics Department, McGill University, Ernest Rutherford Physics Building, Montreal, Quebec, Canada
10
Bergen AC, Olsen GM, Fay JC. Divergent MLS1 Promoters Lie on a Fitness Plateau for Gene Expression. Mol Biol Evol 2016; 33:1270-9. [PMID: 26782997 PMCID: PMC4839218 DOI: 10.1093/molbev/msw010]
Abstract
Qualitative patterns of gene activation and repression are often conserved despite an abundance of quantitative variation in expression levels within and between species. A major challenge to interpreting patterns of expression divergence is knowing which changes in gene expression affect fitness. To characterize the fitness effects of gene expression divergence, we placed orthologous promoters from eight yeast species upstream of malate synthase (MLS1) in Saccharomyces cerevisiae. As expected, we found these promoters varied in their expression level under activated and repressed conditions as well as in their dynamic response following loss of glucose repression. Despite these differences, only a single promoter driving near basal levels of expression caused a detectable loss of fitness. We conclude that the MLS1 promoter lies on a fitness plateau whereby even large changes in gene expression can be tolerated without a substantial loss of fitness.
Affiliation(s)
- Andrew C Bergen
- Molecular Genetics and Genomics Program, Washington University, St. Louis
- Justin C Fay
- Department of Genetics, Washington University, St. Louis; Center for Genome Sciences and Systems Biology, Washington University, St. Louis
11
Abe H, Gemmell NJ. Evolutionary Footprints of Short Tandem Repeats in Avian Promoters. Sci Rep 2016; 6:19421. [PMID: 26766026 PMCID: PMC4725869 DOI: 10.1038/srep19421] [Received: 08/20/2015] [Accepted: 12/11/2015]
Abstract
Short tandem repeats (STRs), or microsatellites, are well-known sequence elements that may change the spacing between transcription factor binding sites (TFBSs) in promoter regions through expansion or contraction of repetitive units. Some of these mutations have the potential to contribute to phenotypic diversity by altering patterns of gene expression. To explore how repetitive sequence motifs within promoters have evolved in avian lineages under mutation-selection balance, more than 400 evolutionarily conserved STRs (ecSTRs) were identified in this study by comparing the 2 kb upstream promoter sequences of chicken against those of other birds (turkey, duck, zebra finch, and flycatcher). The rate of conservation was significantly higher in AG dinucleotide repeats than in AC or AT repeats, with the expansion of AG motifs being noticeably constrained in passerines. Analysis of the relative distance between ecSTRs and TFBSs revealed a significantly higher rate of conserved TFBSs in the vicinity of ecSTRs in both chicken-duck and chicken-passerine comparisons. Our comparative study provides novel insight into which intrinsic factors have influenced the degree of constraint on repeat expansion/contraction during avian promoter evolution.
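The abstract compares AG, AC, and AT dinucleotide repeats but does not say how the repeats were called. Purely as an illustration, the sketch below scans one promoter sequence for perfect dinucleotide repeats and labels each with a motif class. The function names, the `min_units` threshold, and the rule that treats phase shifts (AG vs GA) and reverse complements (AG vs CT) as one class are all assumptions, not the authors' method. Real STR callers also tolerate imperfect repeats, and the cross-species conservation step is not modeled here.

```python
import re
from typing import List, Tuple

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def motif_class(motif: str) -> str:
    """Collapse a dinucleotide motif to one label (assumed convention):
    phase shifts (AG/GA) and reverse complements (AG/CT) map together."""
    rev_comp = "".join(_COMPLEMENT[base] for base in reversed(motif))
    return min(motif, motif[::-1], rev_comp, rev_comp[::-1])


def find_dinucleotide_strs(seq: str, min_units: int = 5) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif class, unit count) for every perfect
    dinucleotide repeat with at least min_units copies (hypothetical helper)."""
    if min_units < 2:
        raise ValueError("min_units must be at least 2")
    # Group 1 is the two-base motif; the lookahead (?!\2) rejects
    # homodinucleotides such as "AA", which are mononucleotide runs.
    pattern = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_units - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        units = len(match.group(0)) // 2
        hits.append((match.start(), motif_class(match.group(1)), units))
    return hits


# Hypothetical promoter fragment containing an (AG)6 and a (CA)5 repeat.
promoter = "TTTT" + "AG" * 6 + "CCCC" + "CA" * 5 + "GG"
print(find_dinucleotide_strs(promoter))
# [(4, 'AG', 6), (20, 'AC', 5)]
```

Grouping motifs into classes before counting is one way to make per-class comparisons like "AG vs AC vs AT" independent of which strand or phase a repeat was first read in.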
Affiliation(s)
- Hideaki Abe
- Department of Anatomy, University of Otago, Dunedin 9054, New Zealand
- Neil J Gemmell
- Department of Anatomy, University of Otago, Dunedin 9054, New Zealand; Allan Wilson Centre for Molecular Ecology and Evolution, University of Otago, Dunedin 9054, New Zealand
12
Read T, Richmond PA, Dowell RD. A trans-acting Variant within the Transcription Factor RIM101 Interacts with Genetic Background to Determine its Regulatory Capacity. PLoS Genet 2016; 12:e1005746. [PMID: 26751950 PMCID: PMC4709078 DOI: 10.1371/journal.pgen.1005746] [Received: 07/10/2015] [Accepted: 11/25/2015]
Abstract
Most genetic variants associated with disease occur within regulatory regions of the genome, underscoring the importance of defining the mechanisms underlying differences in regulation of gene expression between individuals. We discovered a pair of co-regulated, divergently oriented transcripts, AQY2 and ncFRE6, that are expressed in one strain of Saccharomyces cerevisiae, Σ1278b, but not in another, S288c. By combining classical genetics techniques with high-throughput sequencing, we identified a trans-acting single nucleotide polymorphism within the transcription factor RIM101 that causes the background-dependent expression of both transcripts. Subsequent RNA-seq experiments revealed that RIM101 regulates many more targets in S288c than in Σ1278b and that deletion of RIM101 in both backgrounds abrogates the majority of differential expression between the strains. Strikingly, only three transcripts undergo a significant change in expression after swapping RIM101 alleles between backgrounds, implying that the differences in the RIM101 allele lead to a remarkably focused transcriptional response. However, hundreds of RIM101-dependent targets undergo a subtle but consistent shift in expression in the S288c RIM101-swapped strain, but not its Σ1278b counterpart. We conclude that Σ1278b may harbor a variant(s) that buffers against widespread transcriptional dysregulation upon introduction of a non-native RIM101 allele, emphasizing the importance of accounting for genetic background when assessing the impact of a regulatory variant.
Affiliation(s)
- Timothy Read
- Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, Boulder, Colorado, United States of America
- Phillip A. Richmond
- Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, Boulder, Colorado, United States of America
- BioFrontiers Institute, University of Colorado, Boulder, Boulder, Colorado, United States of America
- Robin D. Dowell
- Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, Boulder, Colorado, United States of America
- BioFrontiers Institute, University of Colorado, Boulder, Boulder, Colorado, United States of America
13
Thompson D, Regev A, Roy S. Comparative analysis of gene regulatory networks: from network reconstruction to evolution. Annu Rev Cell Dev Biol 2015; 31:399-428. [PMID: 26355593 DOI: 10.1146/annurev-cellbio-100913-012908] [Citation(s) in RCA: 95] [Impact Index Per Article: 10.6] [Indexed: 11/09/2022]
Abstract
Regulation of gene expression is central to many biological processes. Although reconstruction of regulatory circuits from genomic data alone is therefore desirable, this remains a major computational challenge. Comparative approaches that examine the conservation and divergence of circuits and their components across strains and species can help reconstruct circuits as well as provide insights into the evolution of gene regulatory processes and their adaptive contribution. In recent years, advances in genomic and computational tools have led to a wealth of methods for such analysis at the sequence, expression, pathway, module, and entire network level. Here, we review computational methods developed to study transcriptional regulatory networks using comparative genomics, from sequence to functional data. We highlight how these methods use evolutionary conservation and divergence to reliably detect regulatory components as well as estimate the extent and rate of divergence. Finally, we discuss the promise and open challenges in linking regulatory divergence to phenotypic divergence and adaptation.
Affiliation(s)
- Dawn Thompson
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
14
Duque T, Sinha S. What does it take to evolve an enhancer? A simulation-based study of factors influencing the emergence of combinatorial regulation. Genome Biol Evol 2015; 7:1415-31. [PMID: 25956793 PMCID: PMC4494070 DOI: 10.1093/gbe/evv080] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022]
Abstract
There is widespread interest today in understanding enhancers, which are regulatory elements typically harboring several transcription factor binding sites and mediating the combinatorial effect of transcription factors on gene expression. The evolution of enhancers poses interesting unanswered questions, for example, the evolutionary time taken for a typical enhancer to emerge or the factors shaping its evolution. Existing approaches to cis-regulatory evolution have often ignored the combinatorial nature and varied biochemical mechanisms of gene regulation encoded in enhancers. We report on our investigation of enhancer evolution through the use of PEBCRES, a framework for evolutionary simulation of enhancers that employs a mechanistic and well-supported sequence-to-expression model to assign fitness to the evolving enhancer genotype. We estimated the time necessary to evolve, from genomic background, enhancers capable of driving complex gene expression patterns similar to those involved in early development in Drosophila. We found the time-to-evolve to range between 0.5 and 10 Myr, and to vary greatly with the target expression pattern, complexity of the real enhancer known to encode that pattern, and the strength of input from specific transcription factors. To our knowledge, this is the first estimate of waiting times for realistic enhancers to evolve. The in silico evolved enhancers had, with a few interesting exceptions, site compositions similar to those seen in real enhancers for the same patterns. Our simulations also revealed that certain features of an enhancer might evolve not due to their biological function but as aids to the evolutionary process itself.
Affiliation(s)
- Thyago Duque
- Department of Computer Science, University of Illinois at Urbana-Champaign
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign
15
Carl SH, Russell S. Common binding by redundant group B Sox proteins is evolutionarily conserved in Drosophila. BMC Genomics 2015; 16:292. [PMID: 25887553 PMCID: PMC4419465 DOI: 10.1186/s12864-015-1495-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 01/02/2015] [Accepted: 03/27/2015] [Indexed: 01/08/2023]
Abstract
Background: Group B Sox proteins are a highly conserved group of transcription factors that act extensively to coordinate nervous system development in higher metazoans while showing both co-expression and functional redundancy across a broad group of taxa. In Drosophila melanogaster, the two group B Sox proteins Dichaete and SoxNeuro show widespread common binding across the genome. While some instances of functional compensation have been observed in Drosophila, the function of common binding and the extent of its evolutionary conservation is not known.
Results: We used DamID-seq to examine the genome-wide binding patterns of Dichaete and SoxNeuro in four species of Drosophila. Through a quantitative comparison of Dichaete binding, we evaluated the rate of binding site turnover across the genome as well as at specific functional sites. We also examined the presence of Sox motifs within binding intervals and the correlation between sequence conservation and binding conservation. To determine whether common binding between Dichaete and SoxNeuro is conserved, we performed a detailed analysis of the binding patterns of both factors in two species.
Conclusion: We find that, while the regulatory networks driven by Dichaete and SoxNeuro are largely conserved across the drosophilids studied, binding site turnover is widespread and correlated with phylogenetic distance. Nonetheless, binding is preferentially conserved at known cis-regulatory modules and core, independently verified binding sites. We observed the strongest binding conservation at sites that are commonly bound by Dichaete and SoxNeuro, suggesting that these sites are functionally important. Our analysis provides insights into the evolution of group B Sox function, highlighting the specific conservation of shared binding sites and suggesting alternative sources of neofunctionalisation between paralogous family members.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1495-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sarah H Carl
- Department of Genetics and Cambridge Systems Biology Centre, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK.
- Steven Russell
- Department of Genetics and Cambridge Systems Biology Centre, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK.
16
Nadimpalli S, Persikov AV, Singh M. Pervasive variation of transcription factor orthologs contributes to regulatory network evolution. PLoS Genet 2015; 11:e1005011. [PMID: 25748510 PMCID: PMC4351887 DOI: 10.1371/journal.pgen.1005011] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 06/13/2014] [Accepted: 01/18/2015] [Indexed: 01/17/2023]
Abstract
Differences in transcriptional regulatory networks underlie much of the phenotypic variation observed across organisms. Changes to cis-regulatory elements are widely believed to be the predominant means by which regulatory networks evolve, yet examples of regulatory network divergence due to transcription factor (TF) variation have also been observed. To systematically ascertain the extent to which TFs contribute to regulatory divergence, we analyzed the evolution of the largest class of metazoan TFs, Cys2-His2 zinc finger (C2H2-ZF) TFs, across 12 Drosophila species spanning ~45 million years of evolution. Remarkably, we uncovered that a significant fraction of all C2H2-ZF 1-to-1 orthologs in flies exhibit variations that can affect their DNA-binding specificities. In addition to loss and recruitment of C2H2-ZF domains, we found diverging DNA-contacting residues in ~44% of domains shared between D. melanogaster and the other fly species. These diverging DNA-contacting residues, found in ~70% of the D. melanogaster C2H2-ZF genes in our analysis and corresponding to ~26% of all annotated D. melanogaster TFs, show evidence of functional constraint: they tend to be conserved across phylogenetic clades and evolve slower than other diverging residues. These same variations were rarely found as polymorphisms within a population of D. melanogaster flies, indicating their rapid fixation. The predicted specificities of these dynamic domains gradually change across phylogenetic distances, suggesting stepwise evolutionary trajectories for TF divergence. Further, whereas proteins with conserved C2H2-ZF domains are enriched in developmental functions, those with varying domains exhibit no functional enrichments. Our work suggests that a subset of highly dynamic and largely unstudied TFs are a likely source of regulatory variation in Drosophila and other metazoans.
Affiliation(s)
- Shilpa Nadimpalli
- Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Anton V. Persikov
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Mona Singh
- Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
17
Villar D, Berthelot C, Aldridge S, Rayner TF, Lukk M, Pignatelli M, Park TJ, Deaville R, Erichsen JT, Jasinska AJ, Turner JMA, Bertelsen MF, Murchison EP, Flicek P, Odom DT. Enhancer evolution across 20 mammalian species. Cell 2015; 160:554-66. [PMID: 25635462 PMCID: PMC4313353 DOI: 10.1016/j.cell.2015.01.006] [Citation(s) in RCA: 460] [Impact Index Per Article: 51.1] [Received: 08/29/2014] [Revised: 10/31/2014] [Accepted: 12/15/2014] [Indexed: 12/21/2022]
Abstract
The mammalian radiation has corresponded with rapid changes in noncoding regions of the genome, but we lack a comprehensive understanding of regulatory evolution in mammals. Here, we track the evolution of promoters and enhancers active in liver across 20 mammalian species from six diverse orders by profiling genomic enrichment of H3K27 acetylation and H3K4 trimethylation. We report that rapid evolution of enhancers is a universal feature of mammalian genomes. Most of the recently evolved enhancers arise from ancestral DNA exaptation, rather than lineage-specific expansions of repeat elements. In contrast, almost all liver promoters are partially or fully conserved across these species. Our data further reveal that recently evolved enhancers can be associated with genes under positive selection, demonstrating the power of this approach for annotating regulatory adaptations in genomic sequences. These results provide important insight into the functional genetics underpinning mammalian regulatory evolution.
Affiliation(s)
- Diego Villar
- University of Cambridge, Cancer Research UK Cambridge Institute, Robinson Way, Cambridge, CB2 0RE, UK
- Camille Berthelot
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Sarah Aldridge
- University of Cambridge, Cancer Research UK Cambridge Institute, Robinson Way, Cambridge, CB2 0RE, UK
- Tim F Rayner
- University of Cambridge, Cancer Research UK Cambridge Institute, Robinson Way, Cambridge, CB2 0RE, UK
- Margus Lukk
- University of Cambridge, Cancer Research UK Cambridge Institute, Robinson Way, Cambridge, CB2 0RE, UK
- Miguel Pignatelli
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Thomas J Park
- Department of Biological Sciences, University of Illinois at Chicago (UIC), 845 West Taylor Street, Chicago, IL 60607, USA
- Robert Deaville
- UK Cetacean Strandings Investigation Programme (CSIP) and Institute of Zoology, Zoological Society of London, Outer Circle, Regent's Park, London NW1 4RY, UK
- Jonathan T Erichsen
- School of Optometry and Vision Sciences, Cardiff University, Maindy Road, Cardiff CF24 4HQ, UK
- Anna J Jasinska
- UCLA Center for Neurobehavioral Genetics, 695 Charles E. Young Drive South, Los Angeles, CA 90095, USA
- James M A Turner
- Division of Stem Cell Biology and Developmental Genetics, MRC National Institute for Medical Research, Mill Hill, London NW7 1AA, UK
- Mads F Bertelsen
- Center for Zoo and Wild Animal Health, Copenhagen Zoo, Roskildevej 38, DK-2000 Frederiksberg, Denmark
- Elizabeth P Murchison
- Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge CB3 0ES, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
- Duncan T Odom
- University of Cambridge, Cancer Research UK Cambridge Institute, Robinson Way, Cambridge, CB2 0RE, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
18
McCandlish DM, Stoltzfus A. Modeling evolution using the probability of fixation: history and implications. Q Rev Biol 2014; 89:225-52. [PMID: 25195318 DOI: 10.1086/677571] [Citation(s) in RCA: 123] [Impact Index Per Article: 12.3] [Indexed: 11/03/2022]
Abstract
Many models of evolution calculate the rate of evolution by multiplying the rate at which new mutations originate within a population by a probability of fixation. Here we review the historical origins, contemporary applications, and evolutionary implications of these "origin-fixation" models, which are widely used in evolutionary genetics, molecular evolution, and phylogenetics. Origin-fixation models were first introduced in 1969, in association with an emerging view of "molecular" evolution. Early origin-fixation models were used to calculate an instantaneous rate of evolution across a large number of independently evolving loci; in the 1980s and 1990s, a second wave of origin-fixation models emerged to address a sequence of fixation events at a single locus. Although origin fixation models have been applied to a broad array of problems in contemporary evolutionary research, their rise in popularity has not been accompanied by an increased appreciation of their restrictive assumptions or their distinctive implications. We argue that origin-fixation models constitute a coherent theory of mutation-limited evolution that contrasts sharply with theories of evolution that rely on the presence of standing genetic variation. A major unsolved question in evolutionary biology is the degree to which these models provide an accurate approximation of evolution in natural populations.
19
Yokoyama KD, Zhang Y, Ma J. Tracing the evolution of lineage-specific transcription factor binding sites in a birth-death framework. PLoS Comput Biol 2014; 10:e1003771. [PMID: 25144359 PMCID: PMC4140645 DOI: 10.1371/journal.pcbi.1003771] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 04/23/2014] [Accepted: 06/27/2014] [Indexed: 11/24/2022]
Abstract
Changes in cis-regulatory element composition that result in novel patterns of gene expression are thought to be a major contributor to the evolution of lineage-specific traits. Although transcription factor binding events show substantial variation across species, most computational approaches to study regulatory elements focus primarily upon highly conserved sites, and rely heavily upon multiple sequence alignments. However, sequence conservation based approaches have limited ability to detect lineage-specific elements that could contribute to species-specific traits. In this paper, we describe a novel framework that utilizes a birth-death model to trace the evolution of lineage-specific binding sites without relying on detailed base-by-base cross-species alignments. Our model was applied to analyze the evolution of binding sites based on the ChIP-seq data for six transcription factors (GATA1, SOX2, CTCF, MYC, MAX, ETS1) along the human lineage since the human-mouse common ancestor. We estimate that a substantial fraction of binding sites (∼58–79% for each factor) in humans originated after the divergence from mouse. Over 15% of all binding sites are unique to hominids. Such elements are often enriched near genes associated with specific pathways, and harbor more common SNPs than older binding sites in the human genome. These results support the ability of our method to identify lineage-specific regulatory elements and help understand their roles in shaping variation in gene regulation across species.
Recent experimental studies showed that the evolution of transcription factor binding sites (TFBS) is highly dynamic, with sites differing a great deal even between closely related mammalian species. Despite the substantial experimental evidence for rapid divergence of regulatory protein-binding events across species, computational methods designed to analyze regulatory element evolution have focused primarily on phylogenetic footprinting approaches, in which putative functional regulatory elements are identified by strong sequence conservation. Cross-species comparisons of non-coding sequences are limited in their ability to explain the evolution of regulatory sequences, particularly when the elements are selected for novelty or are species-specific. We have developed a novel framework to reconstruct the history of lineage-specific TFBS and showed that a large number of human TFBS arose after the human-mouse divergence. These elements also have distinct biological implications compared with more ancient ones. This method can help clarify the roles of lineage-specific TFBS in shaping gene regulation across species.
Affiliation(s)
- Ken Daigoro Yokoyama
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Yang Zhang
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Jian Ma
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
20
Abstract
Transcription factor binding sites (TFBSs) on the DNA are generally accepted as the key nodes of gene control. However, the multitudes of TFBSs identified in genome-wide studies, some of them seemingly unconstrained in evolution, have prompted the view that in many cases TF binding may serve no biological function. Yet, insights from transcriptional biochemistry, population genetics and functional genomics suggest that rather than segregating into 'functional' or 'non-functional', TFBS inputs to their target genes may be generally cumulative, with varying degrees of potency and redundancy. As TFBS redundancy can be diminished by mutations and environmental stress, some of the apparently 'spurious' sites may turn out to be important for maintaining adequate transcriptional regulation under these conditions. This has significant implications for interpreting the phenotypic effects of TFBS mutations, particularly in the context of genome-wide association studies for complex traits.
21
Naturally occurring deletions of hunchback binding sites in the even-skipped stripe 3+7 enhancer. PLoS One 2014; 9:e91924. [PMID: 24786295 PMCID: PMC4006794 DOI: 10.1371/journal.pone.0091924] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 09/19/2013] [Accepted: 02/18/2014] [Indexed: 11/23/2022]
Abstract
Changes in regulatory DNA contribute to phenotypic differences within and between taxa. Comparative studies show that many transcription factor binding sites (TFBS) are conserved between species, whereas functional studies reveal that some mutations segregating within species alter TFBS function. Consistently, in this analysis of 13 regulatory elements in Drosophila melanogaster populations, single-base and insertion/deletion polymorphisms are rare in characterized regulatory elements. Experimentally defined TFBS are nearly devoid of segregating mutations and, as has been shown before, are quite conserved. For instance, 8 of 11 Hunchback binding sites in the stripe 3+7 enhancer of even-skipped are conserved between D. melanogaster and Drosophila virilis. Oddly, we found a 72 bp deletion that removes one of these binding sites (Hb8), segregating within D. melanogaster. Furthermore, a 45 bp deletion polymorphism in the spacer between the stripe 3+7 and stripe 2 enhancers removes another predicted Hunchback site. These two deletions are separated by ∼250 bp, sit on distinct haplotypes, and segregate at appreciable frequency. Hb8Δ is at 5 to 35% frequency in the New World, but also shows a cosmopolitan distribution. Sequence variation is depleted on the Hb8Δ-carrying haplotype. Quantitative genetic tests indicate that Hb8Δ affects developmental time, but not viability of offspring. The Eve expression pattern differs between inbred lines, but the stripe 3 and 7 boundaries seem unaffected by Hb8Δ. The data reveal segregating variation in regulatory elements, which may reflect evolutionary turnover of characterized TFBS due to drift or co-evolution.
22
Cis-regulatory variation: significance in biomedicine and evolution. Cell Tissue Res 2014; 356:495-505. [PMID: 24744265 DOI: 10.1007/s00441-014-1855-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 01/08/2014] [Accepted: 02/19/2014] [Indexed: 12/29/2022]
Abstract
Cis-regulatory regions (CRR) control gene expression and chromatin modifications. Genetic variation at CRR in individuals across a population contributes to phenotypic differences of biomedical relevance. This standing variation is important for personalized genomic medicine as well as for adaptive evolution and speciation. This review focuses on genetic variation at CRR, its influence on chromatin, gene expression, and ultimately disease phenotypes. In addition, we summarize our understanding of how this variation may contribute to evolution. Recent technological and computational advances have accelerated research in the direction of personalized medicine, combining strengths of molecular biology and genomics. This will pave new ways to understand how CRR variation affects phenotypes and chart out possible avenues of intervention.
23
Villar D, Flicek P, Odom DT. Evolution of transcription factor binding in metazoans - mechanisms and functional implications. Nat Rev Genet 2014; 15:221-33. [PMID: 24590227 PMCID: PMC4175440 DOI: 10.1038/nrg3481] [Citation(s) in RCA: 166] [Impact Index Per Article: 16.6] [Indexed: 12/16/2022]
Abstract
Differences in transcription factor binding can contribute to organismal evolution by altering downstream gene expression programmes. Genome-wide studies in Drosophila melanogaster and mammals have revealed common quantitative and combinatorial properties of in vivo DNA binding, as well as marked differences in the rate and mechanisms of evolution of transcription factor binding in metazoans. Here, we review the recently discovered rapid 're-wiring' of in vivo transcription factor binding between related metazoan species and summarize general principles underlying the observed patterns of evolution. We then consider what might explain the differences in genome evolution between metazoan phyla and outline the conceptual and technological challenges facing this research field.
Affiliation(s)
- Diego Villar
- University of Cambridge, Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Duncan T Odom
- University of Cambridge, Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
24
Martinez C, Rest JS, Kim AR, Ludwig M, Kreitman M, White K, Reinitz J. Ancestral resurrection of the Drosophila S2E enhancer reveals accessible evolutionary paths through compensatory change. Mol Biol Evol 2014; 31:903-16. [PMID: 24408913 DOI: 10.1093/molbev/msu042] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/12/2022]
Abstract
Upstream regulatory sequences that control gene expression evolve rapidly, yet the expression patterns and functions of most genes are typically conserved. To address this paradox, we have computationally reconstructed and resurrected in vivo the cis-regulatory regions of the ancestral Drosophila eve stripe 2 element and evaluated its evolution using a mathematical model of promoter function. Our feed-forward transcriptional model predicts gene expression patterns directly from enhancer sequence. We used this functional model along with phylogenetics to generate a set of possible ancestral eve stripe 2 sequences for the common ancestors of 1) D. simulans and D. sechellia; 2) D. melanogaster, D. simulans, and D. sechellia; and 3) D. erecta and D. yakuba. These ancestral sequences were synthesized and resurrected in vivo. Using a combination of quantitative and computational analysis, we find clear support for functional compensation between the binding sites for Bicoid, Giant, and Krüppel over the course of 40-60 My of Drosophila evolution. We show that this compensation is driven by a coupling interaction between Bicoid activation and repression at the anterior and posterior border necessary for proper placement of the anterior stripe 2 border. A multiplicity of mechanisms for binding site turnover, exemplified by Bicoid, Giant, and Krüppel sites, explains how rapid sequence change may occur while maintaining the function of the cis-regulatory element.
Affiliation(s)
- Carlos Martinez
- Institute for Genomics and Systems Biology, University of Chicago
25
Duque T, Samee MAH, Kazemian M, Pham HN, Brodsky MH, Sinha S. Simulations of enhancer evolution provide mechanistic insights into gene regulation. Mol Biol Evol 2013; 31:184-200. [PMID: 24097306 PMCID: PMC3879441 DOI: 10.1093/molbev/mst170] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 01/14/2023]
Abstract
There is growing interest in models of regulatory sequence evolution. However, existing models specifically designed for regulatory sequences consider the independent evolution of individual transcription factor (TF)-binding sites, ignoring that the function and evolution of a binding site depend on its context, typically the cis-regulatory module (CRM) in which the site is located. Moreover, existing models do not account for the gene-specific roles of TF-binding sites, primarily because their roles often are not well understood. We introduce two models of regulatory sequence evolution that address some of the shortcomings of existing models and implement simulation frameworks based on them. One model simulates the evolution of an individual binding site in the context of a CRM, while the other evolves an entire CRM. Both models use a state-of-the-art sequence-to-expression model to predict the effects of mutations on the regulatory output of the CRM and determine the strength of selection. We use the new framework to simulate the evolution of TF-binding sites in 37 well-studied CRMs belonging to the anterior-posterior patterning system in Drosophila embryos. We show that these simulations provide accurate fits to evolutionary data from 12 Drosophila genomes, including statistics of binding site conservation on relatively short evolutionary scales and site loss across larger divergence times. The new framework allows us, for the first time, to test hypotheses regarding the underlying cis-regulatory code by directly comparing the evolutionary implications of the hypothesis with the observed evolutionary dynamics of binding sites. Using this capability, we find that explicitly modeling self-cooperative DNA binding by the TF Caudal (CAD) provides significantly better fits than an otherwise identical evolutionary simulation that lacks this mechanistic aspect. This hypothesis is further supported by a statistical analysis of the distribution of intersite spacing between adjacent CAD sites. Experimental tests confirm direct homodimeric interaction between CAD molecules as well as self-cooperative DNA binding by CAD. We note that computational modeling of the D. melanogaster CRMs alone did not yield significant evidence to support CAD self-cooperativity. We thus demonstrate how specific mechanistic details encoded in CRMs can be revealed by modeling their evolution and fitting such models to multispecies data.
Affiliation(s)
- Thyago Duque
- Department of Computer Science, University of Illinois at Urbana-Champaign
26
Karagodin DA, Omelina ES, Fedorova EV, Baricheva EM. Identification of functionally significant elements in the second intron of the Drosophila melanogaster Trithorax-like gene. Gene 2013; 520:178-84. [PMID: 23481306 DOI: 10.1016/j.gene.2013.02.012] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/14/2012] [Revised: 01/14/2013] [Accepted: 02/07/2013] [Indexed: 10/27/2022]
Abstract
Many genes with distinct expression patterns are known to require complex systems of transcriptional regulation. The regulatory regions of such genes can include not only the 5'-flanking regions but also other regions, particularly intron sequences. The Drosophila melanogaster Trithorax-like (Trl) gene, encoding the GAGA protein, is one of the genes with a complex expression pattern. GAGA is one of a few transcription factors that can regulate gene expression at multiple levels. GAGA-mediated modulation of expression appears to be linked to modifications of chromatin structure. To date, the regulatory potential of the Trl 5'-flanking region, which contains multiple GAGA binding sites, has been analyzed, but the presence of functionally significant elements in other Trl regions has not been examined. We found DNase I hypersensitive sites, evolutionarily conserved sequences, and numerous GAGA binding sites in the second intron of the Trl gene. Interestingly, these sequences localize in two main regions of the intron in immediate proximity to preferred regions of transposon insertions. Additionally, we found that deletion of the intron fragment in the Trl(1-72) mutants altered the Trl expression pattern. These results allow us to conclude that the second intron of the Trl gene contains functionally significant elements.
Affiliation(s)
- D A Karagodin
- Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, 10 Lavrentieva Street, Novosibirsk 630090, Russian Federation
27
Hupalo D, Kern AD. Conservation and functional element discovery in 20 angiosperm plant genomes. Mol Biol Evol 2013; 30:1729-44. [PMID: 23640124 DOI: 10.1093/molbev/mst082] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 12/11/2022]
Abstract
Here, we describe the construction of a phylogenetically deep, whole-genome alignment of 20 flowering plants, along with an analysis of plant genome conservation. Each included angiosperm genome was aligned to a reference genome, Arabidopsis thaliana, using the LASTZ/MULTIZ paradigm and tools from the University of California-Santa Cruz Genome Browser source code. In addition to the multiple alignment, we created a local genome browser displaying multiple tracks of newly generated genome annotation, as well as annotation sourced from published data of other research groups. An investigation into A. thaliana gene features present in the aligned A. lyrata genome revealed better conservation of start codons, stop codons, and splice sites within our alignments (51% of features from A. thaliana conserved without interruption in A. lyrata) when compared with previous publicly available plant pairwise alignments (34% of features conserved). The detailed view of conservation across angiosperms revealed not only high coding-sequence conservation but also a large set of previously uncharacterized intergenic conservation. From this, we annotated the collection of conserved features, revealing dozens of putative noncoding RNAs, including some with recorded small RNA expression. Comparing conservation between kingdoms revealed a faster decay of vertebrate genome features when compared with angiosperm genomes. Finally, conserved sequences were searched for folding RNA features, including but not limited to noncoding RNA (ncRNA) genes. Among these, we highlight a double hairpin in the 5'-untranslated region (5'-UTR) of the PRIN2 gene and a putative ncRNA with homology targeting the LAF3 protein.
Affiliation(s)
- Daniel Hupalo
- Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire, USA.
28
Wunderlich Z, Bragdon MD, Eckenrode KB, Lydiard-Martin T, Pearl-Waserman S, DePace AH. Dissecting sources of quantitative gene expression pattern divergence between Drosophila species. Mol Syst Biol 2013; 8:604. [PMID: 22893002 PMCID: PMC3435502 DOI: 10.1038/msb.2012.35] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 02/27/2012] [Accepted: 07/12/2012] [Indexed: 12/21/2022]
Abstract
Gene expression patterns can diverge between species due to changes in a gene's regulatory DNA or changes in the proteins, e.g., transcription factors (TFs), that regulate the gene. We developed a modeling framework to uncover the sources of expression differences in blastoderm embryos of three Drosophila species, focusing on the regulatory circuit controlling expression of the hunchback (hb) posterior stripe. Using this framework and cellular-resolution expression measurements of hb and its regulating TFs, we found that changes in the expression patterns of hb's TFs account for much of the expression divergence. We confirmed our predictions using transgenic D. melanogaster lines, which demonstrate that this set of orthologous cis-regulatory elements (CREs) direct similar, but not identical, expression patterns. We related expression pattern differences to sequence changes in the CRE using a calculation of the CRE's TF binding site content. By applying this calculation in both the transgenic and endogenous contexts, we found that changes in binding site content affect sensitivity to regulating TFs and that compensatory evolution may occur in circuit components other than the CRE.
Affiliation(s)
- Zeba Wunderlich
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
29
Goode DK, Elgar G. Capturing the regulatory interactions of eukaryote genomes. Brief Funct Genomics 2012; 12:142-60. [PMID: 23117864 DOI: 10.1093/bfgp/els041] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/20/2023]
Abstract
A key finding from early genomics research is the remarkable consistency in the number of protein-coding regions across diverse species. This has led many researchers to look to the cis-regulatory elements of genes as the fundamental influence behind evolving gene function and subsequent species diversification. Historically, since these elements are often located in vast intergenic and intronic regions of the genome, their identification has been recalcitrant. Now, with the deluge of whole-genome data from representatives of numerous eukaryotic lineages, various approaches have enabled us to begin to recognize features that characterize regulatory regions of the genome. Here we endeavour to collate these approaches in order to give an overview of the complexities involved in extrapolating regulatory signatures. The resource provided by the escalating richness of whole-genome datasets enables more sophisticated modelling of these regulatory signatures yet at the same time introduces increasing potential for noise. While we are only at the advent of making these discoveries, the next decade promises to be a very exciting and rewarding time for genome researchers.
Affiliation(s)
- Debbie K Goode
- Cambridge Institute for Medical Research, Department of Haematology, Addenbrooke's Hospital, Hills Road, Cambridge, UK
30
Spivakov M, Akhtar J, Kheradpour P, Beal K, Girardot C, Koscielny G, Herrero J, Kellis M, Furlong EEM, Birney E. Analysis of variation at transcription factor binding sites in Drosophila and humans. Genome Biol 2012; 13:R49. [PMID: 22950968 PMCID: PMC3491393 DOI: 10.1186/gb-2012-13-9-r49] [Citation(s) in RCA: 77] [Impact Index Per Article: 6.4] [Received: 03/28/2012] [Revised: 05/23/2012] [Accepted: 06/08/2012] [Indexed: 12/31/2022]
Abstract
BACKGROUND Advances in sequencing technology have boosted population genomics and made it possible to map the positions of transcription factor binding sites (TFBSs) with high precision. Here we investigate TFBS variability by combining transcription factor binding maps generated by ENCODE, modENCODE, our previously published data and other sources with genomic variation data for human individuals and Drosophila isogenic lines. RESULTS We introduce a metric of TFBS variability that takes into account changes in motif match associated with mutation and makes it possible to investigate TFBS functional constraints instance-by-instance as well as in sets that share common biological properties. We also take advantage of the emerging per-individual transcription factor binding data to show evidence that TFBS mutations, particularly at evolutionarily conserved sites, can be efficiently buffered to ensure coherent levels of transcription factor binding. CONCLUSIONS Our analyses provide insights into the relationship between individual and interspecies variation and show evidence for the functional buffering of TFBS mutations in both humans and flies. In a broad perspective, these results demonstrate the potential of combining functional genomics and population genetics approaches for understanding gene regulation.
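A common way to quantify how a point mutation changes a motif match is the difference in position weight matrix (PWM) log-odds score between the reference site and the mutant site. The sketch below shows that generic calculation only. It is not the variability metric defined in the paper, and the toy three-position PWM, the uniform background and the function names are hypothetical.

```python
import math

BASES = "ACGT"

# Hypothetical toy PWM: rows are motif positions, columns are P(A), P(C), P(G), P(T).
TOY_PWM = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.7, 0.1, 0.1],
]


def log_odds_score(site: str, pwm: list[list[float]], background: float = 0.25) -> float:
    """Sum over positions of log2(P(base | position) / background)."""
    if len(site) != len(pwm):
        raise ValueError("site length must match PWM length")
    return sum(
        math.log2(pwm[i][BASES.index(base)] / background)
        for i, base in enumerate(site)
    )


def mutation_score_change(site: str, pos: int, alt: str, pwm: list[list[float]]) -> float:
    """Score(mutant) - Score(reference) for a single-base substitution."""
    mutant = site[:pos] + alt + site[pos + 1:]
    return log_odds_score(mutant, pwm) - log_odds_score(site, pwm)


if __name__ == "__main__":
    # Consensus AGC -> ATC: a negative value means a weaker motif match.
    print(mutation_score_change("AGC", 1, "T", TOY_PWM))
```

Only the mutated position contributes to the difference, so the change reduces to log2 of the ratio of the two base probabilities at that position.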
Affiliation(s)
- Mikhail Spivakov
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
31
Hartmann H, Guthöhrlein EW, Siebert M, Luehr S, Söding J. P-value-based regulatory motif discovery using positional weight matrices. Genome Res 2012; 23:181-94. [PMID: 22990209 PMCID: PMC3530678 DOI: 10.1101/gr.139881.112] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Indexed: 11/25/2022]
Abstract
To analyze gene regulatory networks, the sequence-dependent DNA/RNA binding affinities of proteins and noncoding RNAs are crucial. Often, these are deduced from sets of sequences enriched in factor binding sites. Two classes of computational approaches exist. The first class describes binding motifs by sequence patterns and searches for the patterns with the highest statistical significance of enrichment. The second class uses the more powerful position weight matrices (PWMs). Instead of maximizing the statistical significance of enrichment, they maximize a likelihood. Here we present XXmotif (eXhaustive evaluation of matriX motifs), the first PWM-based motif discovery method that can optimize PWMs by directly minimizing their P-values of enrichment. Optimization requires computing millions of enrichment P-values for thousands of PWMs. For a given PWM, the enrichment P-value is calculated efficiently from the match P-values of all possible motif placements in the input sequences using order statistics. The approach can naturally combine P-values for motif enrichment, conservation, and localization. On ChIP-chip/seq, miRNA knock-down, and coexpression data sets from yeast and metazoans, XXmotif outperformed state-of-the-art tools, both in numbers of correctly identified motifs and in the quality of PWMs. In segmentation modules of D. melanogaster, we detect the known key regulators and several new motifs. In human core promoters, XXmotif reports most previously described and eight novel motifs sharply peaked around the transcription start site, among them an Initiator motif similar to the fly and yeast versions. XXmotif's sensitivity, reliability, and usability will help to leverage the quickly accumulating wealth of functional genomics data.
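The order-statistics idea described above can be illustrated in simplified form. This sketch assumes that match P-values are independent and uniform under the null. Under that assumption, the best of L placements in one sequence has P-value 1 - (1 - p)^L. The probability that at least k of N sequences reach a threshold p is a binomial tail, which is the distribution of the k-th order statistic. The function names are hypothetical. The sketch simply takes the minimum over k, with no correction for having tried several values of k, so it should not be read as XXmotif's actual statistic.

```python
from math import comb


def best_placement_pvalue(p_match: float, n_placements: int) -> float:
    """P(min of n independent Uniform(0,1) P-values <= p_match)."""
    return 1.0 - (1.0 - p_match) ** n_placements


def order_statistic_pvalue(p: float, k: int, n: int) -> float:
    """P(the k-th smallest of n independent Uniform(0,1) values <= p).

    Equivalent to P(at least k of the n values are <= p), a binomial tail.
    """
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


def enrichment_pvalue(per_sequence_pvalues: list[float]) -> float:
    """Smallest order-statistic P-value over all k (uncorrected for the choice of k)."""
    ps = sorted(per_sequence_pvalues)
    n = len(ps)
    return min(order_statistic_pvalue(p, k, n) for k, p in enumerate(ps, start=1))


if __name__ == "__main__":
    # Hypothetical best-match P-values, first corrected for 200 placements per sequence.
    seq_p = [best_placement_pvalue(p, 200) for p in (1e-5, 2e-5, 0.01, 0.3)]
    print(enrichment_pvalue(seq_p))
```

For k = 1 the binomial tail reduces to the same 1 - (1 - p)^n form used per sequence, which makes a handy sanity check.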
Affiliation(s)
- Holger Hartmann
- Gene Center and Department of Biochemistry, Ludwig-Maximilians-Universität München, Feodor-Lynen-Straße 25, 81377 Munich, Germany
32
Zhang X, Moret BME. Refining regulatory networks through phylogenetic transfer of information. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:1032-45. [PMID: 22547434 DOI: 10.1109/tcbb.2012.62] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 05/31/2023]
Abstract
The experimental determination of transcriptional regulatory networks in the laboratory remains difficult and time-consuming, while computational methods to infer these networks provide only modest accuracy. The latter can be attributed partly to the limitations of a single-organism approach. Computational biology has long used comparative and evolutionary approaches to extend the reach and accuracy of its analyses. In this paper, we describe ProPhyC, a probabilistic phylogenetic model and associated inference algorithms, designed to improve the inference of regulatory networks for a family of organisms by using known evolutionary relationships among these organisms. ProPhyC can be used with various network evolutionary models and any existing inference method. Extensive experimental results on both biological and synthetic data confirm that our model (through its associated refinement algorithms) yields substantial improvement in the quality of inferred networks over all current methods. We also compare ProPhyC with a transfer learning approach we design. This approach also uses phylogenetic relationships while inferring regulatory networks for a family of organisms. Using similar input information but designed in a very different framework, this transfer learning approach does not perform better than ProPhyC, which indicates that ProPhyC makes good use of the evolutionary information.
Affiliation(s)
- Xiuwei Zhang
- Laboratory for Computational Biology and Bioinformatics, Ecole Polytechnique Fédérale de Lausanne, Swiss Institute of Bioinformatics, EPFL IC IIF LCBB INJ211, Lausanne CH-1015, Switzerland.
33
Widespread site-dependent buffering of human regulatory polymorphism. PLoS Genet 2012; 8:e1002599. [PMID: 22457641 PMCID: PMC3310774 DOI: 10.1371/journal.pgen.1002599] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Received: 09/29/2011] [Accepted: 02/03/2012] [Indexed: 11/19/2022]
Abstract
The average individual is expected to harbor thousands of variants within non-coding genomic regions involved in gene regulation. However, it is currently not possible to interpret reliably the functional consequences of genetic variation within any given transcription factor recognition sequence. To address this, we comprehensively analyzed heritable genome-wide binding patterns of a major sequence-specific regulator (CTCF) in relation to genetic variability in binding site sequences across a multi-generational pedigree. We localized and quantified CTCF occupancy by ChIP-seq in 12 related and unrelated individuals spanning three generations, followed by comprehensive targeted resequencing of the entire CTCF–binding landscape across all individuals. We identified hundreds of variants with reproducible quantitative effects on CTCF occupancy (both positive and negative). While these effects paralleled protein–DNA recognition energetics when averaged, they were extensively buffered by striking local context dependencies. In the significant majority of cases buffering was complete, resulting in silent variants spanning every position within the DNA recognition interface irrespective of level of binding energy or evolutionary constraint. The prevalence of complex partial or complete buffering effects severely constrained the ability to predict reliably the impact of variation within any given binding site instance. Surprisingly, 40% of variants that increased CTCF occupancy occurred at positions of human–chimp divergence, challenging the expectation that the vast majority of functional regulatory variants should be deleterious. Our results suggest that, even in the presence of “perfect” genetic information afforded by resequencing and parallel studies in multiple related individuals, genomic site-specific prediction of the consequences of individual variation in regulatory DNA will require systematic coupling with empirical functional genomic measurements. 
A comprehensive understanding of the contribution of individual genome sequences to disease and quantitative traits will require the general ability to predict consequences of genetic variation in non-protein-coding regions, particularly those involved in gene regulation. Here we tested the power to predict such consequences when presented with “complete” information encompassing the genomic DNA binding site patterns of a well-studied regulatory protein across multiple related individuals, coupled with all individual genome sequences at the binding positions. We find that, while there is reasonable ability to predict the average effects of variation within the consensus recognition sequence of a transcriptional regulator, it is not possible to determine reliably the consequences of variation at any given genomic instance. This suggests that the interpretation of individual genome sequences will require comprehensive complementation with functional genomic studies.
34
Abstract
Copy number variation has recently received considerable attention, and copy number variants (CNVs) have been shown to be both common in mammalian genomes and important for understanding genetic and phenotypic variation. As empirical knowledge and detection methods are quickly advancing, evolutionary theories about CNVs are rapidly updated and often revised. Here, we review recent progress on understanding CNVs, and we discuss some key issues for future research. In essence, we discuss four major forces in population genetics (recombination, mutation, selection, and demography) in relation to CNVs.
35
He X, Duque TSPC, Sinha S. Evolutionary origins of transcription factor binding site clusters. Mol Biol Evol 2011; 29:1059-70. [PMID: 22075113 DOI: 10.1093/molbev/msr277] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Indexed: 12/12/2022]
Abstract
Empirical studies have revealed that regulatory DNA sequences such as enhancers or promoters often harbor multiple binding sites for the same transcription factor. Such "homotypic site clustering" has been hypothesized as arising out of functional requirements of the sequences. Here, we propose an alternative explanation of this phenomenon that multisite enhancers are common because they are favored by evolutionary sampling of the genotype-phenotype landscape. To test this hypothesis, we developed a new computational framework specialized for population genetic simulations of enhancer evolution. It uses a thermodynamics-based model of enhancer function, integrating information from strong as well as weak binding sites, to determine the strength of selection. Using this framework, we found that even when simpler genotypes exist for a desired strength of regulation, relatively complex genotypes (enhancers with more sites) are more readily reached by the simulated evolutionary process. We show that there are more ways to "build" a fit genotype with many weak sites than with a few strong sites, and this is why evolution finds complex genotypes more often. Our claims are consistent with an empirical analysis of binding site content in enhancers characterized in Drosophila melanogaster and their orthologs in other Drosophila species. We also characterized a subtle but significant difference between genotypes likely to be sampled by evolution and equally fit genotypes one would obtain by uniform sampling of the fitness landscape, that is, an "evolutionary signature" in enhancer sequences. Finally, we investigated potential effects of other factors, such as rugged fitness landscapes, short local duplications, and noise characteristics of enhancers, on the emergence of homotypic site clustering. Homotypic site clustering is an important contributor to the complexity and function of cis-regulatory sequences. 
This work provides a simple null hypothesis for its origin, against which alternative adaptationist explanations may be evaluated, and cautions against "evolutionary mirages" present in common features of genomic sequence. The quantitative framework we develop here can be used more generally to understand how mechanisms of enhancer action influence their composition and evolution.
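Population-genetic simulations of enhancer evolution need a rule that turns a fitness difference into the probability that a new mutation fixes. One standard choice, assumed here rather than taken from the paper, is Kimura's diffusion approximation. For a new mutant in a diploid population of effective size N with selection coefficient s, it gives u = (1 - exp(-2s)) / (1 - exp(-4Ns)), which tends to 1/(2N) as s approaches 0. The sketch evaluates this formula in a numerically stable way. It also uses the result in a hypothetical origin-fixation accept/reject step, where the two fitness values would come from an enhancer-function model.

```python
import math
import random


def fixation_probability(s: float, n: int) -> float:
    """Kimura's fixation probability for a new mutant in a diploid population.

    u = (1 - exp(-2s)) / (1 - exp(-4Ns)), assuming effective size == census size n.
    """
    if s == 0.0:
        return 1.0 / (2 * n)  # neutral limit
    if s > 0:
        # expm1 keeps precision for small s; numerator and denominator are both negative.
        return math.expm1(-2 * s) / math.expm1(-4 * n * s)
    # For s < 0, multiply numerator and denominator by exp(4Ns) so that
    # no exponential overflows when 4N|s| is large.
    return -math.exp(4 * n * s) * math.expm1(-2 * s) / math.expm1(4 * n * s)


def accept_mutation(fitness_current: float, fitness_mutant: float, n: int, rng: random.Random) -> bool:
    """Hypothetical origin-fixation step: does the mutant replace the resident genotype?"""
    s = fitness_mutant / fitness_current - 1.0
    return rng.random() < fixation_probability(s, n)


if __name__ == "__main__":
    rng = random.Random(0)
    for s in (-0.01, 0.0, 0.01):
        print(s, fixation_probability(s, 1000))
    print(accept_mutation(1.0, 1.02, 1000, rng))
```

Repeating `accept_mutation` over proposed mutations gives a strong-selection, weak-mutation walk on the fitness landscape. Beneficial mutations fix more often than 1/(2N), and deleterious ones fix less often.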
Affiliation(s)
- Xin He
- Department of Biochemistry, University of California at San Francisco, CA, USA
36
A conserved developmental patterning network produces quantitatively different output in multiple species of Drosophila. PLoS Genet 2011; 7:e1002346. [PMID: 22046143 PMCID: PMC3203197 DOI: 10.1371/journal.pgen.1002346] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Received: 04/04/2011] [Accepted: 08/27/2011] [Indexed: 11/18/2022]
Abstract
Differences in the level, timing, or location of gene expression can contribute to alternative phenotypes at the molecular and organismal level. Understanding the origins of expression differences is complicated by the fact that organismal morphology and gene regulatory networks could potentially vary even between closely related species. To assess the scope of such changes, we used high-resolution imaging methods to measure mRNA expression in blastoderm embryos of Drosophila yakuba and Drosophila pseudoobscura and assembled these data into cellular resolution atlases, where expression levels for 13 genes in the segmentation network are averaged into species-specific, cellular resolution morphological frameworks. We demonstrate that the blastoderm embryos of these species differ in their morphology in terms of size, shape, and number of nuclei. We present an approach to compare cellular gene expression patterns between species, while accounting for varying embryo morphology, and apply it to our data and an equivalent dataset for Drosophila melanogaster. Our analysis reveals that all individual genes differ quantitatively in their spatio-temporal expression patterns between these species, primarily in terms of their relative position and dynamics. Despite many small quantitative differences, cellular gene expression profiles for the whole set of genes examined are largely similar. This suggests that cell types at this stage of development are conserved, though they can differ in their relative position by up to 3–4 cell widths and in their relative proportion between species by as much as 5-fold. Quantitative differences in the dynamics and relative level of a subset of genes between corresponding cell types may reflect altered regulatory functions between species. 
Our results emphasize that transcriptional networks can diverge over short evolutionary timescales and that even small changes can lead to distinct output in terms of the placement and number of equivalent cells. For a gene to function properly, it must be active in the right place, at the right time, and in the right amount. Changes in any of these features can lead to observable differences between individuals and species and in some cases can lead to disease. We do not currently understand how the position, timing, and amount of gene expression is encoded in DNA sequence. One approach to this problem is to compare how gene expression differs between species and to try to relate changes in DNA sequence to changes in gene expression. Here, we take the first step by comparing gene expression patterns at high spatial and temporal resolution between embryos of three species of fruit flies. We develop methods for comparing gene expression in individual cells, which allow us to control for variation in the size, shape, and number of nuclei between embryos. We find measurable quantitative differences in the patterns for all individual genes that we have examined. However, by considering all genes in our dataset at once, we show that many genes are changing together, leading to largely equivalent types of cells in these three species.
37
Zheng W, Gianoulis TA, Karczewski KJ, Zhao H, Snyder M. Regulatory Variation Within and Between Species. Annu Rev Genomics Hum Genet 2011; 12:327-46. [DOI: 10.1146/annurev-genom-082908-150139] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.5] [Indexed: 11/09/2022]
Affiliation(s)
- Wei Zheng
- Department of Molecular, Cellular, and Developmental Biology, Biostatistics Resources, Keck Laboratory, Yale University, New Haven, Connecticut 06520
- Tara A. Gianoulis
- Department of Genetics and Wyss Institute for Biologically Inspired Engineering, Harvard Medical School, Boston, Massachusetts 02115
- Konrad J. Karczewski
- Department of Genetics, Stanford University School of Medicine, Stanford, California 94305
- Hongyu Zhao
- Biostatistics Division, Yale School of Public Health, New Haven, Connecticut 06520
- Michael Snyder
- Department of Genetics, Stanford University School of Medicine, Stanford, California 94305
38
Cruickshank T, Nista P. Selection and constraint on regulatory elements in Drosophila simulans. J Mol Evol 2011; 73:94-100. [DOI: 10.1007/s00239-011-9458-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/17/2011] [Accepted: 08/19/2011] [Indexed: 10/17/2022]
39
Abstract
We tested whether functionally important sites in bacterial, yeast, and animal promoters are more conserved than their neighbors. We found that substitutions are predominantly seen in less important sites and that those that occurred tended to have less impact on gene expression than possible alternatives. These results suggest that purifying selection operates on promoter sequences.
40
Wunderlich Z, DePace AH. Modeling transcriptional networks in Drosophila development at multiple scales. Curr Opin Genet Dev 2011; 21:711-8. [PMID: 21889888 DOI: 10.1016/j.gde.2011.07.005] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 07/02/2011] [Accepted: 07/20/2011] [Indexed: 11/29/2022]
Abstract
Quantitative models of developmental processes can provide insights at multiple scales. Ultimately, models may be particularly informative for key questions about network-level behavior during development, such as how the system responds to environmental perturbation or operates reliably in different genetic backgrounds. The transcriptional networks that pattern the Drosophila embryo have been the subject of numerous quantitative experimental studies coupled to modeling frameworks in recent years. In this review, we describe three studies that consider these networks at different levels of molecular detail and therefore result in different types of insights. We also discuss other developmental transcriptional networks operating in Drosophila, with the goal of highlighting what additional insights they may provide.
Affiliation(s)
- Zeba Wunderlich
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
41
Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat Rev Genet 2011; 12:628-40. [PMID: 21850043 DOI: 10.1038/nrg3046] [Citation(s) in RCA: 394] [Impact Index Per Article: 30.3] [Indexed: 02/07/2023]
Abstract
Genome and exome sequencing yield extensive catalogues of human genetic variation. However, pinpointing the few phenotypically causal variants among the many variants present in human genomes remains a major challenge, particularly for rare and complex traits wherein genetic information alone is often insufficient. Here, we review approaches to estimate the deleteriousness of single nucleotide variants (SNVs), which can be used to prioritize disease-causal variants. We describe recent advances in comparative and functional genomics that enable systematic annotation of both coding and non-coding variants. Application and optimization of these methods will be essential to find the genetic answers that sequencing promises to hide in plain sight.
42
Swamy KBS, Chu WY, Wang CY, Tsai HK, Wang D. Evidence of association between nucleosome occupancy and the evolution of transcription factor binding sites in yeast. BMC Evol Biol 2011; 11:150. [PMID: 21627806 PMCID: PMC3124427 DOI: 10.1186/1471-2148-11-150] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 03/30/2011] [Accepted: 05/31/2011] [Indexed: 11/14/2022]
Abstract
Background Divergence of transcription factor binding sites is considered to be an important source of regulatory evolution. The associations between transcription factor binding sites and phenotypic diversity have been investigated in many model organisms. However, the understanding of other factors that contribute to it is still limited. Recent studies have elucidated the effect of chromatin structure on molecular evolution of genomic DNA. Though the profound impact of nucleosome positions on gene regulation has been reported, their influence on transcriptional evolution is still less explored. With the availability of genome-wide nucleosome maps in yeast species, it is thus desirable to investigate their impact on transcription factor binding site evolution. Here, we present a comprehensive analysis of the role of nucleosome positioning in the evolution of transcription factor binding sites. Results We compared the transcription factor binding site frequency in nucleosome occupied regions and nucleosome depleted regions in promoters of old (orthologs among Saccharomycetaceae) and young (Saccharomyces specific) genes, and in duplicate gene pairs. We demonstrated that nucleosome occupied regions accommodate greater binding site variations than nucleosome depleted regions in young genes and in duplicate genes. This finding was confirmed by measuring the difference in evolutionary rates of binding sites in sensu stricto yeasts at nucleosome occupied regions and nucleosome depleted regions. The binding sites at nucleosome occupied regions exhibited a consistently higher evolution rate than those at nucleosome depleted regions, corroborating the difference in the selection constraints at the two regions. Finally, through site-directed mutagenesis experiments, we found that binding site gain or loss events at nucleosome depleted regions may cause more expression differences than those in nucleosome occupied regions.
Conclusions Our study indicates the existence of different selection constraints on binding sites at nucleosome occupied regions than at nucleosome depleted regions. We found that the binding sites have a different rate of evolution at nucleosome occupied and depleted regions. Finally, using transcription factor binding site-directed mutagenesis experiments, we confirmed the difference in the impact of binding site changes on expression at these regions. Thus, our work demonstrates the importance of composite analysis of chromatin and transcriptional evolution.
Affiliation(s)
- Krishna B S Swamy
- Institute of Information Science, Academia Sinica, Taipei, 115, Taiwan
43
Integrated genome-scale prediction of detrimental mutations in transcription networks. PLoS Genet 2011; 7:e1002077. [PMID: 21637788 PMCID: PMC3102745 DOI: 10.1371/journal.pgen.1002077] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 09/27/2010] [Accepted: 03/25/2011] [Indexed: 01/10/2023]
Abstract
A central challenge in genetics is to understand when and why mutations alter the phenotype of an organism. The consequences of gene inhibition have been systematically studied and can be predicted reasonably well across a genome. However, many sequence variants important for disease and evolution may alter gene regulation rather than gene function. The consequences of altering a regulatory interaction (or “edge”) rather than a gene (or “node”) in a network have not been as extensively studied. Here we use an integrative analysis and evolutionary conservation to identify features that predict when the loss of a regulatory interaction is detrimental in the extensively mapped transcription network of budding yeast. Properties such as the strength of an interaction, its location and context in a promoter, regulator and target gene importance, and the potential for compensation (redundancy) are each associated to some extent with interaction importance. Combined, however, these features predict quite well whether the loss of a regulatory interaction is detrimental across many promoters and for many different transcription factors. Thus, despite the potential for regulatory diversity, common principles can be used to understand and predict when changes in regulation are most harmful to an organism.
The genomes of individuals differ in sequence at thousands of base pairs. Some of these polymorphisms affect the sequence of proteins, but many are likely to alter how genes are regulated. When are changes in gene regulation detrimental to an organism? We have used an integrative analysis of transcription factor binding site conservation in budding yeast to address the extent to which different features predict when potential changes in gene regulation are detrimental. We found that, despite the diversity of transcription factors and regulatory regions in a genome, a few simple properties can be used to predict and understand when changes in regulation are most harmful.
44
He BZ, Holloway AK, Maerkl SJ, Kreitman M. Does positive selection drive transcription factor binding site turnover? A test with Drosophila cis-regulatory modules. PLoS Genet 2011; 7:e1002053. [PMID: 21572512 PMCID: PMC3084208 DOI: 10.1371/journal.pgen.1002053] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.8] [Received: 11/22/2010] [Accepted: 03/02/2011] [Indexed: 12/04/2022]
Abstract
Transcription factor binding site(s) (TFBS) gain and loss (i.e., turnover) is a well-documented feature of cis-regulatory module (CRM) evolution, yet little attention has been paid to the evolutionary force(s) driving this turnover process. The predominant view, motivated by its widespread occurrence, emphasizes the importance of compensatory mutation and genetic drift. Positive selection, in contrast, although it has been invoked in specific instances of adaptive gene expression evolution, has not been considered as a general alternative to neutral compensatory evolution. In this study we evaluate the two hypotheses by analyzing patterns of single nucleotide polymorphism in the TFBS of well-characterized CRM in two closely related Drosophila species, Drosophila melanogaster and Drosophila simulans. An important feature of the analysis is classification of TFBS mutations according to the direction of their predicted effect on binding affinity, which allows gains and losses to be evaluated independently along the two phylogenetic lineages. The observed patterns of polymorphism and divergence are not compatible with neutral evolution for either class of mutations. Instead, multiple lines of evidence are consistent with contributions of positive selection to TFBS gain and loss as well as purifying selection in its maintenance. In discussion, we propose a model to reconcile the finding of selection driving TFBS turnover with constrained CRM function over long evolutionary time.
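Classifying TFBS mutations by the predicted direction of their effect on binding affinity can be illustrated with a position weight matrix (PWM). The sketch below is a hypothetical illustration, not He et al.'s scoring procedure: it assumes a made-up log-odds PWM for a 4-bp motif with consensus "ACGT" and uses the PWM score as a stand-in for binding affinity.

```python
# Hypothetical log-odds PWM for a 4-bp motif with consensus "ACGT".
# Each entry maps a base to its score at that motif position.
PWM = [
    {"A": 2.0, "C": -2.0, "G": -2.0, "T": -2.0},
    {"A": -2.0, "C": 2.0, "G": -2.0, "T": -2.0},
    {"A": -2.0, "C": -2.0, "G": 2.0, "T": -2.0},
    {"A": -2.0, "C": -2.0, "G": -2.0, "T": 2.0},
]


def pwm_score(site, pwm):
    """Sum of per-position log-odds scores; higher means stronger predicted binding."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif length")
    return sum(col[base] for col, base in zip(pwm, site))


def classify_mutation(site, pos, alt, pwm):
    """Label a single-nucleotide change by the sign of its predicted affinity effect."""
    mutant = site[:pos] + alt + site[pos + 1:]
    delta = pwm_score(mutant, pwm) - pwm_score(site, pwm)
    if delta > 0:
        return "gain"     # predicted to increase binding affinity
    if delta < 0:
        return "loss"     # predicted to decrease binding affinity
    return "neutral"      # no predicted change in affinity


print(classify_mutation("ACGA", 3, "T", PWM))  # gain
print(classify_mutation("ACGT", 0, "T", PWM))  # loss
print(classify_mutation("CCGT", 0, "G", PWM))  # neutral
```

Because each change is assigned to one class, polymorphism and divergence can then be tallied separately for affinity-increasing and affinity-decreasing mutations on each lineage.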
Affiliation(s)
- Bin Z He
- Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois, USA.
45
Kaplan T, Li XY, Sabo PJ, Thomas S, Stamatoyannopoulos JA, Biggin MD, Eisen MB. Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development. PLoS Genet 2011; 7:e1001290. [PMID: 21304941 PMCID: PMC3033374 DOI: 10.1371/journal.pgen.1001290] [Citation(s) in RCA: 139] [Impact Index Per Article: 10.7] [Received: 11/04/2010] [Accepted: 01/01/2011] [Indexed: 01/01/2023]
Abstract
Transcription factors that drive complex patterns of gene expression during animal development bind to thousands of genomic regions, with quantitative differences in binding across bound regions mediating their activity. While we now have tools to characterize the DNA affinities of these proteins and to precisely measure their genome-wide distribution in vivo, our understanding of the forces that determine where, when, and to what extent they bind remains primitive. Here we use a thermodynamic model of transcription factor binding to evaluate the contribution of different biophysical forces to the binding of five regulators of early embryonic anterior-posterior patterning in Drosophila melanogaster. Predictions based on DNA sequence and in vitro protein-DNA affinities alone achieve a correlation of ∼0.4 with experimental measurements of in vivo binding. Incorporating cooperativity and competition among the five factors, and accounting for spatial patterning by modeling binding in every nucleus independently, had little effect on prediction accuracy. A major source of error was the prediction of binding events that do not occur in vivo, which we hypothesized reflected reduced accessibility of chromatin. To test this, we incorporated experimental measurements of genome-wide DNA accessibility into our model, effectively restricting predicted binding to regions of open chromatin. This dramatically improved our predictions to a correlation of 0.6-0.9 for various factors across known target genes. Finally, we used our model to quantify the roles of DNA sequence, accessibility, and binding competition and cooperativity. Our results show that, in regions of open chromatin, binding can be predicted almost exclusively by the sequence specificity of individual factors, with a minimal role for protein interactions. 
We suggest that a combination of experimentally determined chromatin accessibility data and simple computational models of transcription factor binding may be used to predict the binding landscape of any animal transcription factor with significant precision.
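The idea of restricting sequence-based binding predictions to open chromatin can be sketched with a simple thermodynamic occupancy calculation. This is not Kaplan et al.'s model: it assumes a hypothetical PWM whose scores are treated as natural-log affinities relative to the consensus, scores the forward strand only, treats sites as independent (no cooperativity or competition), and represents accessibility as a single multiplier in [0, 1] per region.

```python
import math

# Hypothetical log-odds PWM for a 4-bp motif with consensus "ACGT".
PWM = [
    {"A": 2.0, "C": -2.0, "G": -2.0, "T": -2.0},
    {"A": -2.0, "C": 2.0, "G": -2.0, "T": -2.0},
    {"A": -2.0, "C": -2.0, "G": 2.0, "T": -2.0},
    {"A": -2.0, "C": -2.0, "G": -2.0, "T": 2.0},
]


def window_scores(seq, pwm):
    """PWM score of every motif-length window on the forward strand only."""
    width = len(pwm)
    return [
        sum(pwm[j][seq[i + j]] for j in range(width))
        for i in range(len(seq) - width + 1)
    ]


def expected_occupancy(seq, pwm, concentration, accessibility=1.0):
    """Expected number of bound sites in a region, treating sites as independent.

    Relative affinity of a site is exp(score - best), where best is the consensus
    score; occupancy follows the single-site isotherm x / (1 + x) with
    x = concentration * relative affinity. The sum is scaled by accessibility.
    """
    best = sum(max(col.values()) for col in pwm)
    total = 0.0
    for score in window_scores(seq, pwm):
        x = concentration * math.exp(score - best)
        total += x / (1.0 + x)
    return accessibility * total


print(expected_occupancy("ACGT", PWM, 1.0))       # 0.5
print(expected_occupancy("ACGT", PWM, 1.0, 0.0))  # 0.0
```

Setting accessibility to zero suppresses predicted binding in closed chromatin regardless of sequence, which is the kind of false-positive prediction the abstract describes accessibility data as removing.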
Affiliation(s)
- Tommy Kaplan
- Department of Molecular and Cell Biology, California Institute of Quantitative Biosciences, University of California Berkeley, Berkeley, California, United States of America
- Xiao-Yong Li
- Howard Hughes Medical Institute, University of California Berkeley, Berkeley, California, United States of America
- Peter J. Sabo
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Sean Thomas
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Mark D. Biggin
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Michael B. Eisen
- Department of Molecular and Cell Biology, California Institute of Quantitative Biosciences, University of California Berkeley, Berkeley, California, United States of America
- Howard Hughes Medical Institute, University of California Berkeley, Berkeley, California, United States of America
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
46
Bullaughey K. Changes in selective effects over time facilitate turnover of enhancer sequences. Genetics 2011; 187:567-82. [PMID: 21098721 PMCID: PMC3030497 DOI: 10.1534/genetics.110.121590] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 08/02/2010] [Accepted: 11/10/2010] [Indexed: 11/18/2022]
Abstract
Correct gene expression is often critical, and consequently stabilizing selection on expression is widespread. Yet few genes possess highly conserved regulatory DNA, and for the few enhancers that have been carefully characterized, substantial functional reorganization has often occurred. Given that natural selection removes mutations of even very small deleterious effect, how can transcription factor binding evolve so readily when it underlies a conserved phenotype? As a first step toward addressing this question, I combine a computational model for regulatory function that incorporates many aspects of our present biological knowledge with a model for the fitness effects of misexpression. I then use this model to study the evolution of enhancers. Several robust behaviors emerge. First, the selective effects of mutations at a site change dramatically over time due to substitutions elsewhere in the enhancer, and even the overall degree of constraint across the enhancer can change considerably. Second, many of the substitutions responsible for changes in binding occur at sites where the same mutation would previously have been strongly deleterious, suggesting that fluctuations in selective effects at a site are important for functional turnover. Third, most substitutions contributing to the repatterning of binding and constraint are effectively neutral, highlighting the importance of genetic drift, even for enhancers underlying conserved phenotypes. These findings have important implications for phylogenetic inference of function and for interpretations of selection coefficients estimated for regulatory DNA.
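The first behavior listed above, a site's selective effect shifting as substitutions accumulate elsewhere, arises whenever fitness is a nonlinear function of expression. The toy sketch below is not Bullaughey's model, which couples a detailed regulatory model to misexpression fitness; it assumes expression is an additive sum of hypothetical per-site contributions and that fitness follows a Gaussian stabilizing-selection curve with an arbitrary optimum and width.

```python
import math


def fitness(expression, optimum=1.0, width=0.2):
    """Gaussian stabilizing selection: fitness falls off with distance from the optimum."""
    return math.exp(-((expression - optimum) ** 2) / (2.0 * width ** 2))


def selection_coefficient(background, mutation_effect, optimum=1.0, width=0.2):
    """Relative fitness change caused by one mutation on a given background.

    Expression is modeled (hypothetically) as the sum of per-site contributions,
    so the mutation's effect depends on where the background already sits.
    """
    wild_type = sum(background)
    mutant = wild_type + mutation_effect
    return fitness(mutant, optimum, width) / fitness(wild_type, optimum, width) - 1.0


# The same +0.3 mutation on two backgrounds that differ at one other site.
print(selection_coefficient([0.5, 0.5], 0.3) < 0)  # True: deleterious at the optimum
print(selection_coefficient([0.5, 0.2], 0.3) > 0)  # True: beneficial below the optimum
```

Under these assumptions, a single substitution at another site (0.5 to 0.2) flips the sign of the selection coefficient for an otherwise identical mutation.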
Affiliation(s)
- Kevin Bullaughey
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA.
47
Moses AM, Landry CR. Moving from transcriptional to phospho-evolution: generalizing regulatory evolution? Trends Genet 2010; 26:462-7. [PMID: 20817339 DOI: 10.1016/j.tig.2010.08.002] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Received: 06/21/2010] [Revised: 07/29/2010] [Accepted: 08/03/2010] [Indexed: 12/31/2022]
Abstract
Much of biological diversity is thought to arise from changes in regulatory networks. Although the role of transcriptional regulation has been well established, the contribution to evolution of changes at other levels of regulation has yet to be addressed. Using examples from the literature and recent studies on the evolution of protein phosphorylation, we argue that protein regulatory networks also play a prime role in generating diversity within and between species. Because there are several analogies between the regulation of protein functions by kinases and the regulation of gene expression by transcription factors, the principles that guide transcriptional regulatory evolution can also be explored in kinase-substrate networks. These comparisons will allow us to generalize existing models of evolution across the complex layers of the cell's regulatory links.
Affiliation(s)
- Alan M Moses
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario M5S 3B2, Canada
48
Lang M, Juan E. Binding site number variation and high-affinity binding consensus of Myb-SANT-like transcription factor Adf-1 in Drosophilidae. Nucleic Acids Res 2010; 38:6404-17. [PMID: 20542916 PMCID: PMC2965233 DOI: 10.1093/nar/gkq504] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/28/2023]
Abstract
There is growing interest in the evolution of transcription factor binding sites and the corresponding functional changes in transcriptional regulation. In this context, we examined the structural changes of the ADF-1 binding sites at the Adh promoters of Drosophila funebris and D. virilis. We detected an expanded footprinted region in D. funebris that contains various adjacent binding sites with different binding affinities. ADF-1 has been described as directing sequence-specific DNA binding to sites consisting of the multiple trinucleotide repeat . The ADF-1 recognition sites with high binding affinity differ from this trinucleotide repeat consensus sequence, and a new consensus sequence is proposed for high-affinity ADF-1 binding sites. In vitro transcription experiments with the D. funebris and D. virilis ADF-1 binding regions revealed that stronger ADF-1 binding to the expanded D. funebris region led to only a moderate increase in transcriptional activity of the Adh gene. The potential role of this regional expansion is discussed in the context of different cellular ADF-1 concentrations and maintenance of the ADF-1 stimulus. Altogether, evolutionary change of ADF-1 binding regions involves both rearrangements of complex binding site clusters and nucleotide substitutions within sites that lead to different binding affinities.
Affiliation(s)
- Michael Lang
- Departament de Genètica, Universitat de Barcelona, 08028 Barcelona, Spain
49
Tian S, Haney RA, Feder ME. Phylogeny disambiguates the evolution of heat-shock cis-regulatory elements in Drosophila. PLoS One 2010; 5:e10669. [PMID: 20498853 PMCID: PMC2871787 DOI: 10.1371/journal.pone.0010669] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Received: 02/05/2010] [Accepted: 04/23/2010] [Indexed: 11/19/2022]
Abstract
Heat-shock genes have a well-studied control mechanism for their expression that is mediated through cis-regulatory motifs known as heat-shock elements (HSEs). The evolution of important features of this control mechanism has not been investigated in detail, however. Here we exploit the genome sequencing of multiple Drosophila species, combined with a wealth of available information on the structure and function of HSEs in D. melanogaster, to undertake this investigation. We find that in single-copy heat-shock genes, entire HSEs have evolved or disappeared 14 times, and the phylogenetic approach bounds the timing and direction of these evolutionary events in relation to speciation. In contrast, in the multi-copy gene Hsp70, the number of HSEs is nearly constant across species. HSEs evolve in size, position, and sequence within heat-shock promoters. Certain features have nonetheless been preserved despite this evolutionary change, which implicates them as functionally significant; these features include tail-to-tail arrangements of HSEs, gapped HSEs, and the presence or absence of entire HSEs. The variation among Drosophila species indicates that the cis-regulatory encoding of responsiveness to heat and other stresses is diverse. The broad variation uncovered is particularly important because it poses a substantial challenge for functional studies.
Affiliation(s)
- Sibo Tian
- Department of Organismal Biology and Anatomy, University of Chicago, Chicago, Illinois, United States of America
- Robert A. Haney
- Department of Organismal Biology and Anatomy, University of Chicago, Chicago, Illinois, United States of America
- Martin E. Feder
- Department of Organismal Biology and Anatomy, University of Chicago, Chicago, Illinois, United States of America
50
Kim J, Sinha S. Towards realistic benchmarks for multiple alignments of non-coding sequences. BMC Bioinformatics 2010; 11:54. [PMID: 20102627 PMCID: PMC2823711 DOI: 10.1186/1471-2105-11-54] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 08/07/2009] [Accepted: 01/26/2010] [Indexed: 02/02/2023]
Abstract
BACKGROUND With the continued development of new computational tools for multiple sequence alignment, it is necessary today to develop benchmarks that aid the selection of the most effective tools. Simulation-based benchmarks have been proposed to meet this necessity, especially for non-coding sequences. However, it is not clear if such benchmarks truly represent real sequence data from any given group of species, in terms of the difficulty of alignment tasks. RESULTS We find that the conventional simulation approach, which relies on empirically estimated values for various parameters such as substitution rate or insertion/deletion rates, is unable to generate synthetic sequences reflecting the broad genomic variation in conservation levels. We tackle this problem with a new method for simulating non-coding sequence evolution, by relying on genome-wide distributions of evolutionary parameters rather than their averages. We then generate synthetic data sets to mimic orthologous sequences from the Drosophila group of species, and show that these data sets truly represent the variability observed in genomic data in terms of the difficulty of the alignment task. This allows us to make significant progress towards estimating the alignment accuracy of current tools in an absolute sense, going beyond only a relative assessment of different tools. We evaluate six widely used multiple alignment tools in the context of Drosophila non-coding sequences, and find the accuracy to be significantly different from previously reported values. Interestingly, the performance of most tools degrades more rapidly when there are more insertions than deletions in the data set, suggesting an asymmetric handling of insertions and deletions, even though none of the evaluated tools explicitly distinguishes these two types of events. 
We also examine the accuracy of two existing tools for annotating insertions versus deletions, and find their performance to be close to optimal in Drosophila non-coding sequences if provided with the true alignments. CONCLUSION We have developed a method to generate benchmarks for multiple alignments of Drosophila non-coding sequences, and shown it to be more realistic than traditional benchmarks. Apart from helping to select the most effective tools, these benchmarks will help practitioners of comparative genomics deal with the effects of alignment errors, by providing accurate estimates of the extent of these errors.
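The contrast between simulating with averaged parameters and simulating with genome-wide parameter distributions can be sketched in a few lines. This is not Kim and Sinha's simulator: it models substitutions only (no insertions or deletions), treats each locus's parameter as the probability that a site shows an observed substitution, and draws those probabilities from a small hypothetical list standing in for a genome-wide distribution.

```python
import random
import statistics

BASES = "ACGT"


def evolve(seq, p_sub, rng):
    """Replace each base, independently with probability p_sub, by a different random base."""
    out = []
    for base in seq:
        if rng.random() < p_sub:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_identities(p_sub_values, n_loci, length, use_distribution, seed=0):
    """Per-locus fraction of unchanged sites after one round of substitution.

    If use_distribution is True, each locus draws its own substitution
    probability from p_sub_values; otherwise every locus uses their mean.
    """
    rng = random.Random(seed)
    mean_p = statistics.mean(p_sub_values)
    identities = []
    for _ in range(n_loci):
        ancestor = "".join(rng.choice(BASES) for _ in range(length))
        p_sub = rng.choice(p_sub_values) if use_distribution else mean_p
        descendant = evolve(ancestor, p_sub, rng)
        same = sum(a == d for a, d in zip(ancestor, descendant))
        identities.append(same / length)
    return identities


# Hypothetical per-locus substitution probabilities (not estimates from Drosophila data).
P_SUB = [0.02, 0.05, 0.10, 0.40, 0.60]
averaged = simulate_identities(P_SUB, 200, 200, use_distribution=False)
sampled = simulate_identities(P_SUB, 200, 200, use_distribution=True)
print(statistics.pvariance(averaged), statistics.pvariance(sampled))
```

With a single averaged parameter, per-locus identity clusters tightly around one value; sampling parameters per locus spreads identity across highly conserved and highly diverged loci, which is the variation in conservation levels that the abstract says averaged-parameter simulations fail to reproduce.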
Affiliation(s)
- Jaebum Kim
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA