1
Krieger G, Lupo O, Wittkopp P, Barkai N. Evolution of transcription factor binding through sequence variations and turnover of binding sites. Genome Res 2022; 32:1099-1111. [PMID: 35618416 DOI: 10.1101/gr.276715.122] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/28/2022] [Accepted: 05/20/2022] [Indexed: 01/08/2023]
Abstract
Variations in noncoding regulatory sequences play a central role in evolution. Interpreting such variations, however, remains difficult even in the context of defined attributes such as transcription factor (TF) binding sites. Here, we systematically link variations in cis-regulatory sequences to TF binding by profiling the allele-specific binding of 27 TFs expressed in a yeast hybrid, in which two related genomes are present within the same nucleus. TFs localize preferentially to sites containing their known consensus motifs but occupy only a small fraction of the motif-containing sites available within the genomes. Differential binding of TFs to the orthologous alleles was well explained by variations that alter motif sequence, whereas differences in chromatin accessibility between alleles were of little apparent effect. Motif variations that abolished binding when present in only one allele were still bound when present in both alleles, suggesting evolutionary compensation, with a potential role for sequence conservation at the motif's vicinity. At the level of the full promoter, we identify cases of binding-site turnover, in which binding sites are reciprocally gained and lost, yet most interspecific differences remained uncompensated. Our results show the flexibility of TFs to bind imprecise motifs and the fast evolution of TF binding sites between related species.
Affiliation(s)
- Gat Krieger
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Offir Lupo
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Patricia Wittkopp
  - Department of Ecology and Evolutionary Biology, Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, Michigan 48109, USA
- Naama Barkai
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
2
Shih CH, Fay J. Cis-regulatory variants affect gene expression dynamics in yeast. eLife 2021; 10:e68469. [PMID: 34369376 PMCID: PMC8367379 DOI: 10.7554/elife.68469] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/17/2021] [Accepted: 08/06/2021] [Indexed: 12/14/2022]
Abstract
Evolution of cis-regulatory sequences depends on how they affect gene expression and motivates both the identification and prediction of cis-regulatory variants responsible for expression differences within and between species. While much progress has been made in relating cis-regulatory variants to expression levels, the timing of gene activation and repression may also be important to the evolution of cis-regulatory sequences. We investigated allele-specific expression (ASE) dynamics within and between Saccharomyces species during the diauxic shift and found appreciable cis-acting variation in gene expression dynamics. Within-species ASE is associated with intergenic variants, and ASE dynamics are more strongly associated with insertions and deletions than ASE levels. To refine these associations, we used a high-throughput reporter assay to test promoter regions and individual variants. Within the subset of regions that recapitulated endogenous expression, we identified and characterized cis-regulatory variants that affect expression dynamics. Between species, chimeric promoter regions generate novel patterns and indicate constraints on the evolution of gene expression dynamics. We conclude that changes in cis-regulatory sequences can tune gene expression dynamics and that the interplay between expression dynamics and other aspects of expression is relevant to the evolution of cis-regulatory sequences.
Affiliation(s)
- Ching-Hua Shih
  - Department of Biology, University of Rochester, Rochester, United States
- Justin Fay
  - Department of Biology, University of Rochester, Rochester, United States
3
Choudhary MN, Friedman RZ, Wang JT, Jang HS, Zhuo X, Wang T. Co-opted transposons help perpetuate conserved higher-order chromosomal structures. Genome Biol 2020; 21:16. [PMID: 31973766 PMCID: PMC6979391 DOI: 10.1186/s13059-019-1916-8] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 07/05/2019] [Accepted: 12/08/2019] [Indexed: 12/26/2022]
Abstract
BACKGROUND: Transposable elements (TEs) make up half of mammalian genomes and shape genome regulation by harboring binding sites for regulatory factors. These include binding sites for architectural proteins, such as CTCF, RAD21, and SMC3, that are involved in tethering chromatin loops and marking domain boundaries. The 3D organization of the mammalian genome is intimately linked to its function and is remarkably conserved. However, the mechanisms by which these structural intricacies emerge and evolve have not been thoroughly probed.
RESULTS: Here, we show that TEs contribute extensively to both the formation of species-specific loops in humans and mice through deposition of novel anchoring motifs, as well as to the maintenance of conserved loops across both species through CTCF binding site turnover. The latter function demonstrates the ability of TEs to contribute to genome plasticity and reinforce conserved genome architecture as redundant loop anchors. Deleting such candidate TEs in human cells leads to the collapse of conserved loop and domain structures. These TEs are also marked by reduced DNA methylation and bear mutational signatures of hypomethylation through evolutionary time.
CONCLUSIONS: TEs have long been considered a source of genetic innovation. By examining their contribution to genome topology, we show that TEs can contribute to regulatory plasticity by inducing redundancy and potentiating genetic drift locally while conserving genome architecture globally, revealing a paradigm for defining regulatory conservation in the noncoding genome beyond classic sequence-level conservation.
Affiliation(s)
- Mayank Nk Choudhary
  - The Edison Family Center for Genome Sciences & Systems Biology, Department of Genetics, Washington University, 4515 McKinley Avenue, Campus Box 8510, St. Louis, MO, 63110, USA
- Ryan Z Friedman
  - The Edison Family Center for Genome Sciences & Systems Biology, Department of Genetics, Washington University, 4515 McKinley Avenue, Campus Box 8510, St. Louis, MO, 63110, USA
- Julia T Wang
  - The Edison Family Center for Genome Sciences & Systems Biology, Department of Genetics, Washington University, 4515 McKinley Avenue, Campus Box 8510, St. Louis, MO, 63110, USA
- Hyo Sik Jang
  - The Edison Family Center for Genome Sciences & Systems Biology, Department of Genetics, Washington University, 4515 McKinley Avenue, Campus Box 8510, St. Louis, MO, 63110, USA
- Xiaoyu Zhuo
  - The Edison Family Center for Genome Sciences & Systems Biology, Department of Genetics, Washington University, 4515 McKinley Avenue, Campus Box 8510, St. Louis, MO, 63110, USA
- Ting Wang
  - The Edison Family Center for Genome Sciences & Systems Biology, Department of Genetics, Washington University, 4515 McKinley Avenue, Campus Box 8510, St. Louis, MO, 63110, USA
4
Lenzini L, Di Patti F, Livi R, Fondi M, Fani R, Mengoni A. A Method for the Structure-Based, Genome-Wide Analysis of Bacterial Intergenic Sequences Identifies Shared Compositional and Functional Features. Genes (Basel) 2019; 10:genes10100834. [PMID: 31652625 PMCID: PMC6826451 DOI: 10.3390/genes10100834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2019] [Revised: 10/07/2019] [Accepted: 10/16/2019] [Indexed: 11/16/2022]
Abstract
In this paper, we propose a computational strategy for performing genome-wide analyses of intergenic sequences in bacterial genomes. Building on a previous paper that proposed a method for genome-wide analysis of eukaryotic intergenic sequences, we developed a tool that implements similar concepts for bacterial genomes. This allows us to (i) classify intergenic sequences into clusters characterized by specific global structural features and (ii) draw possible relations with their functional features.
Affiliation(s)
- Leonardo Lenzini
  - Dipartimento di Fisica e Astronomia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
  - Istituto Nazionale di Fisica Nucleare, Sesto Fiorentino, 50019, Italy.
- Francesca Di Patti
  - Dipartimento di Fisica e Astronomia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
  - Centro Interdipartimentale per lo Studio delle Dinamiche Complesse, Sesto Fiorentino, 50019, Italy.
- Roberto Livi
  - Dipartimento di Fisica e Astronomia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
  - Istituto Nazionale di Fisica Nucleare, Sesto Fiorentino, 50019, Italy.
  - Centro Interdipartimentale per lo Studio delle Dinamiche Complesse, Sesto Fiorentino, 50019, Italy.
  - Istituto dei Sistemi Complessi, Consiglio Nazionale delle Ricerche, Sesto Fiorentino, 50019, Italy.
- Marco Fondi
  - Dipartimento di Biologia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
- Renato Fani
  - Istituto dei Sistemi Complessi, Consiglio Nazionale delle Ricerche, Sesto Fiorentino, 50019, Italy.
  - Dipartimento di Biologia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
- Alessio Mengoni
  - Dipartimento di Biologia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
5
Huh I, Mendizabal I, Park T, Yi SV. Functional conservation of sequence determinants at rapidly evolving regulatory regions across mammals. PLoS Comput Biol 2018; 14:e1006451. [PMID: 30289877 PMCID: PMC6192654 DOI: 10.1371/journal.pcbi.1006451] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/23/2018] [Revised: 10/17/2018] [Accepted: 08/20/2018] [Indexed: 01/08/2023]
Abstract
Recent advances in epigenomics have made it possible to map genome-wide regulatory regions using empirical methods. Subsequent comparative epigenomic studies have revealed that regulatory regions diverge rapidly between genomes of different species, and that the divergence is more pronounced in enhancers than in promoters. To understand genomic changes underlying these patterns, we investigated if we can identify specific sequence fragments that are over-enriched in regulatory regions, thus potentially contributing to regulatory functions of such regions. Here we report numerous sequence fragments that are statistically over-enriched in enhancers and promoters of different mammals (which we refer to as ‘sequence determinants’). Interestingly, the degree of statistical enrichment, which presumably is associated with the degree of regulatory impacts of the specific sequence determinant, was significantly higher for promoter sequence determinants than enhancer sequence determinants. We further used a machine learning method to construct prediction models using sequence determinants. Remarkably, prediction models constructed from one species could be used to predict regulatory regions of other species with high accuracy. This observation indicates that even though the precise locations of regulatory regions diverge rapidly during evolution, the functional potential of sequence determinants underlying regulatory sequences may be conserved between species.
Regions of the genome that do not encode genes but affect expression of other genes, such as enhancers and promoters, are referred to as regulatory regions. Because of their regulatory functions, it was thought that enhancers and promoters should be evolutionarily conserved. Regulatory regions can now be epigenomically identified because they are marked by specific modifications of histone tails at the chromatin level. Interestingly, when we compare epigenomically identified regulatory regions from different mammals, the specific positions of regulatory regions are often divergent between species. Enhancers in particular are highly divergent between species. In this study, we show that we can find sequence fragments that are statistically enriched in enhancers and promoters of different species, and that the degree of statistical enrichment can explain different levels of evolutionary sequence conservation between enhancers and promoters. We further constructed predictive models of enhancers and promoters using the enriched sequence fragments, and show that these models can not only accurately predict enhancers and promoters of the same species, but also work comparably well when applied to other species. These results indicate that even though the specific positions of regulatory regions have diverged between species, the functions of sequence fragments that comprise those regions may be conserved.
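The idea of sequence fragments being over-enriched in regulatory regions can be illustrated with a minimal sketch. It is not the authors' method: the fixed k-mer length, the presence/absence counting, the add-one smoothing, and the function names `kmer_presence` and `log2_enrichment` are all assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def kmer_presence(seqs, k):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        # A set ensures each k-mer is counted once per sequence.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def log2_enrichment(foreground, background, k, pseudocount=1.0):
    """Log2 ratio of smoothed presence rates (foreground vs. background).

    Hypothetical scoring: an illustrative stand-in for a proper
    statistical enrichment test.
    """
    fg = kmer_presence(foreground, k)
    bg = kmer_presence(background, k)
    n_fg, n_bg = len(foreground), len(background)
    scores = {}
    for kmer in fg:
        fg_rate = (fg[kmer] + pseudocount) / (n_fg + 2 * pseudocount)
        bg_rate = (bg[kmer] + pseudocount) / (n_bg + 2 * pseudocount)
        scores[kmer] = log2(fg_rate / bg_rate)
    return scores


# Toy data: "regulatory" sequences share ACGTG; background ones do not.
regulatory = ["ACGTGACC", "TTACGTGA", "CACGTGTT"]
control = ["TTTTTTTT", "GGGGCCCC", "ACACACAC"]
scores = log2_enrichment(regulatory, control, k=5)
top_kmer = max(scores, key=scores.get)
```

A real analysis would replace the log ratio with a significance test and correct for multiple testing across all k-mers.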
Affiliation(s)
- Iksoo Huh
  - School of Biological Sciences, Institute of Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, GA, United States of America
  - College of Nursing, The Research Institute of Nursing Science, Seoul National University, Seoul, Korea
- Isabel Mendizabal
  - School of Biological Sciences, Institute of Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, GA, United States of America
- Taesung Park
  - Department of Statistics, College of Natural Sciences, Seoul National University, Seoul, Korea
- Soojin V. Yi
  - School of Biological Sciences, Institute of Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, GA, United States of America
6
Liang P, Saqib HSA, Zhang X, Zhang L, Tang H. Single-Base Resolution Map of Evolutionary Constraints and Annotation of Conserved Elements across Major Grass Genomes. Genome Biol Evol 2018; 10:473-488. [PMID: 29378032 PMCID: PMC5798027 DOI: 10.1093/gbe/evy006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Accepted: 01/08/2018] [Indexed: 12/20/2022]
Abstract
Conserved noncoding sequences (CNSs) are evolutionarily conserved DNA sequences that do not encode proteins but may have potential regulatory roles in gene expression. CNS in crop genomes could be linked to many important agronomic traits and ecological adaptations. Compared with the relatively mature exon annotation protocols, efficient methods are lacking to predict the location of noncoding sequences in the plant genomes. We implemented a computational pipeline that is tailored to the comparisons of plant genomes, yielding a large number of conserved sequences using rice genome as the reference. In this study, we used 17 published grass genomes, along with five monocot genomes as well as the basal angiosperm genome of Amborella trichopoda. Genome alignments among these genomes suggest that at least 12.05% of the rice genome appears to be evolving under constraints in the Poaceae lineage, with close to half of the evolutionarily constrained sequences located outside protein-coding regions. We found evidence for purifying selection acting on the conserved sequences by analyzing segregating SNPs within the rice population. Furthermore, we found that known functional motifs were significantly enriched within CNS, with many motifs associated with the preferred binding of ubiquitous transcription factors. The conserved elements that we have curated are accessible through our public database and the JBrowse server. In-depth functional annotations and evolutionary dynamics of the identified conserved sequences provide a solid foundation for studying gene regulation, genome evolution, as well as to inform gene isolation for cereal biologists.
Affiliation(s)
- Pingping Liang
  - Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Center for Genomics and Biotechnology, Ministry of Education; Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, China
- Hafiz Sohaib Ahmed Saqib
  - Institute of Applied Ecology, Fujian Agriculture and Forestry University, Fuzhou, China
  - State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, Fujian Agriculture and Forestry University, Fuzhou, China
- Xingtan Zhang
  - Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Center for Genomics and Biotechnology, Ministry of Education; Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
- Liangsheng Zhang
  - Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Center for Genomics and Biotechnology, Ministry of Education; Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
- Haibao Tang
  - Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Center for Genomics and Biotechnology, Ministry of Education; Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
7
Leclercq M, Diallo AB, Blanchette M. Prediction of human miRNA target genes using computationally reconstructed ancestral mammalian sequences. Nucleic Acids Res 2016; 45:556-566. [PMID: 27899600 PMCID: PMC5314757 DOI: 10.1093/nar/gkw1085] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 02/29/2016] [Revised: 09/26/2016] [Accepted: 11/13/2016] [Indexed: 11/14/2022]
Abstract
MicroRNAs (miRNAs) are short single-stranded RNA molecules derived from hairpin-forming precursors that play a crucial role as post-transcriptional regulators in eukaryotes and viruses. In the past years, many microRNA target genes (MTGs) have been identified experimentally. However, because of the high costs of experimental approaches, target gene databases remain incomplete. Although several target prediction programs have been developed in recent years to identify MTGs in silico, their specificity and sensitivity remain low. Here, we propose a new approach called MirAncesTar, which uses ancestral genome reconstruction to boost the accuracy of existing MTG prediction tools for human miRNAs. For each miRNA and each putative human target UTR, our algorithm makes use of existing prediction tools to identify putative target sites in the human UTR, as well as in its mammalian orthologs and inferred ancestral sequences. It then evaluates evidence in support of selective pressure to maintain target site counts (rather than sequences), accounting for the possibility of target site turnover. It finally integrates this measure with several simpler ones using a logistic regression predictor. MirAncesTar improves the accuracy of existing MTG predictors by 26% to 157%. Source code and prediction results for human miRNAs, as well as supporting evolutionary data are available at http://cs.mcgill.ca/~blanchem/mirancestar.
Affiliation(s)
- Mickael Leclercq
  - School of Computer Science and McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, H3A0E9, Canada
- Abdoulaye Baniré Diallo
  - Laboratoire de bio-informatique du département informatique, Université du Québec à Montréal, Montréal, Québec H2X 3Y7, Canada
- Mathieu Blanchette
  - School of Computer Science and McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, H3A0E9, Canada
8
Bergen AC, Olsen GM, Fay JC. Divergent MLS1 Promoters Lie on a Fitness Plateau for Gene Expression. Mol Biol Evol 2016; 33:1270-9. [PMID: 26782997 PMCID: PMC4839218 DOI: 10.1093/molbev/msw010] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 12/13/2022]
Abstract
Qualitative patterns of gene activation and repression are often conserved despite an abundance of quantitative variation in expression levels within and between species. A major challenge to interpreting patterns of expression divergence is knowing which changes in gene expression affect fitness. To characterize the fitness effects of gene expression divergence, we placed orthologous promoters from eight yeast species upstream of malate synthase (MLS1) in Saccharomyces cerevisiae. As expected, we found these promoters varied in their expression level under activated and repressed conditions as well as in their dynamic response following loss of glucose repression. Despite these differences, only a single promoter driving near basal levels of expression caused a detectable loss of fitness. We conclude that the MLS1 promoter lies on a fitness plateau whereby even large changes in gene expression can be tolerated without a substantial loss of fitness.
Affiliation(s)
- Andrew C Bergen
  - Molecular Genetics and Genomics Program, Washington University, St. Louis
- Justin C Fay
  - Department of Genetics, Washington University, St. Louis
  - Center for Genome Sciences and Systems Biology, Washington University, St. Louis
9
De Witte D, Van de Velde J, Decap D, Van Bel M, Audenaert P, Demeester P, Dhoedt B, Vandepoele K, Fostier J. BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements. Bioinformatics 2015; 31:3758-66. [PMID: 26254488 PMCID: PMC4653392 DOI: 10.1093/bioinformatics/btv466] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/09/2014] [Accepted: 08/03/2015] [Indexed: 11/14/2022]
Abstract
MOTIVATION: The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected.
RESULTS: We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O. sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z. mays.
AVAILABILITY AND IMPLEMENTATION: BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller
CONTACT: Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
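The alignment-free screening step described in the abstract above (an IUPAC word matched in orthologous promoters, then scored by branch length) can be sketched as follows. This is an illustrative Python sketch rather than BLSSpeller's Java implementation: the nested-tuple tree encoding, the forward-strand-only matching, the toy sequences and branch lengths, and the names `species_with_motif` and `branch_length_score` are all assumptions.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def species_with_motif(word, promoters):
    """Return the species whose promoter contains the IUPAC word.

    Assumption: only the forward strand is scanned.
    """
    pattern = re.compile("".join(IUPAC[c] for c in word.upper()))
    return {sp for sp, seq in promoters.items() if pattern.search(seq.upper())}


def branch_length_score(tree, hits):
    """Sum the branch lengths of the minimal subtree connecting `hits`.

    A node is (label_or_children, branch_length): a leaf has a string
    label, an internal node has a list of child nodes.
    """
    hits = set(hits)
    total = 0.0

    def visit(node):
        nonlocal total
        label, length = node
        if isinstance(label, str):
            below = {label}
        else:
            below = set()
            for child in label:
                below |= visit(child)
        # A branch counts only if hits lie on both sides of it.
        if hits & below and hits - below:
            total += length
        return below

    visit(tree)
    return total


# Hypothetical promoters and tree (branch lengths are made up).
promoters = {"osa": "TTGACGTGTCAA", "zma": "AACACGTGGG", "sbi": "TTTTTTTT"}
tree = ([("osa", 0.2), ([("zma", 0.1), ("sbi", 0.1)], 0.15)], 0.0)
conserved_in = species_with_motif("SACGTG", promoters)
score = branch_length_score(tree, conserved_in)
```

Scoring by connected branch length rather than by species count rewards conservation across distant lineages over conservation among close relatives.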
Affiliation(s)
- Dieter De Witte
  - Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
- Jan Van de Velde
  - Department of Plant Systems Biology, VIB and Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Dries Decap
  - Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
- Michiel Van Bel
  - Department of Plant Systems Biology, VIB and Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Pieter Audenaert
  - Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
- Piet Demeester
  - Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
- Bart Dhoedt
  - Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
- Klaas Vandepoele
  - Department of Plant Systems Biology, VIB and Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Jan Fostier
  - Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
10
Nadimpalli S, Persikov AV, Singh M. Pervasive variation of transcription factor orthologs contributes to regulatory network evolution. PLoS Genet 2015; 11:e1005011. [PMID: 25748510 PMCID: PMC4351887 DOI: 10.1371/journal.pgen.1005011] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 06/13/2014] [Accepted: 01/18/2015] [Indexed: 01/17/2023]
Abstract
Differences in transcriptional regulatory networks underlie much of the phenotypic variation observed across organisms. Changes to cis-regulatory elements are widely believed to be the predominant means by which regulatory networks evolve, yet examples of regulatory network divergence due to transcription factor (TF) variation have also been observed. To systematically ascertain the extent to which TFs contribute to regulatory divergence, we analyzed the evolution of the largest class of metazoan TFs, Cys2-His2 zinc finger (C2H2-ZF) TFs, across 12 Drosophila species spanning ~45 million years of evolution. Remarkably, we uncovered that a significant fraction of all C2H2-ZF 1-to-1 orthologs in flies exhibit variations that can affect their DNA-binding specificities. In addition to loss and recruitment of C2H2-ZF domains, we found diverging DNA-contacting residues in ~44% of domains shared between D. melanogaster and the other fly species. These diverging DNA-contacting residues, found in ~70% of the D. melanogaster C2H2-ZF genes in our analysis and corresponding to ~26% of all annotated D. melanogaster TFs, show evidence of functional constraint: they tend to be conserved across phylogenetic clades and evolve slower than other diverging residues. These same variations were rarely found as polymorphisms within a population of D. melanogaster flies, indicating their rapid fixation. The predicted specificities of these dynamic domains gradually change across phylogenetic distances, suggesting stepwise evolutionary trajectories for TF divergence. Further, whereas proteins with conserved C2H2-ZF domains are enriched in developmental functions, those with varying domains exhibit no functional enrichments. Our work suggests that a subset of highly dynamic and largely unstudied TFs are a likely source of regulatory variation in Drosophila and other metazoans.
Affiliation(s)
- Shilpa Nadimpalli
  - Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Anton V. Persikov
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Mona Singh
  - Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
11
Genome-wide analysis of promoters: clustering by alignment and analysis of regular patterns. PLoS One 2014; 9:e85260. [PMID: 24465517 PMCID: PMC3898993 DOI: 10.1371/journal.pone.0085260] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/19/2013] [Accepted: 11/26/2013] [Indexed: 01/08/2023]
Abstract
In this paper we perform a genome-wide analysis of H. sapiens promoters. To this aim, we developed and combined two mathematical methods that allow us to (i) classify promoters into groups characterized by specific global structural features, and (ii) recover, in full generality, any regular sequence in the different classes of promoters. One of the main findings of this analysis is that H. sapiens promoters can be classified into three main groups. Two of them are distinguished by the prevalence of weak or strong nucleotides and are characterized by short compositionally biased sequences, while the most frequent regular sequences in the third group are strongly correlated with transposons. Taking advantage of the generality of these mathematical procedures, we have compared the promoter database of H. sapiens with those of other species. We have found that the above-mentioned features characterize also the evolutionary content appearing in mammalian promoters, at variance with ancestral species in the phylogenetic tree, that exhibit a definitely lower level of differentiation among promoters.
12
Briones-Martin-Del-Campo M, Orta-Zavalza E, Juarez-Cepeda J, Gutierrez-Escobedo G, Cañas-Villamar I, Castaño I, De Las Peñas A. The oxidative stress response of the opportunistic fungal pathogen Candida glabrata. Rev Iberoam Micol 2013; 31:67-71. [PMID: 24270068 DOI: 10.1016/j.riam.2013.09.012] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 08/26/2013] [Accepted: 09/27/2013] [Indexed: 11/28/2022]
Abstract
Organisms have evolved different strategies to respond to oxidative stress generated as a by-product of aerobic respiration and thus maintain the redox homeostasis within the cell. In particular, fungal pathogens are exposed to reactive oxygen species (ROS) when they interact with the phagocytic cells of the host, which are the first line of defense against fungal infections. These pathogens have co-opted the enzymatic (catalases, superoxide dismutases (SODs), and peroxidases) and non-enzymatic (glutathione) mechanisms used to maintain the redox homeostasis within the cell, to resist oxidative stress and ensure survival within the host. Several virulence factors have been related to the response to oxidative stress in pathogenic fungi. The opportunistic fungal pathogen Candida glabrata (C. glabrata) is the second most common cause of candidiasis after Candida albicans (C. albicans). C. glabrata has a well-defined oxidative stress response (OSR), which includes both enzymatic and non-enzymatic mechanisms. C. glabrata OSR is controlled by the well-conserved transcription factors Yap1, Skn7, Msn2 and Msn4. In this review, we describe the OSR of C. glabrata, what is known about its core elements, its regulation and how C. glabrata interacts with the host. This manuscript is part of the series of works presented at the "V International Workshop: Molecular genetic approaches to the study of human pathogenic fungi" (Oaxaca, Mexico, 2012).
Affiliation(s)
- Marcela Briones-Martin-Del-Campo
- División de Biología Molecular, IPICYT, Instituto Potosino de Investigación Científica y Tecnológica, San Luis Potosí, San Luis Potosí, México
- Emmanuel Orta-Zavalza
- División de Biología Molecular, IPICYT, Instituto Potosino de Investigación Científica y Tecnológica, San Luis Potosí, San Luis Potosí, México
- Jacqueline Juarez-Cepeda
- División de Biología Molecular, IPICYT, Instituto Potosino de Investigación Científica y Tecnológica, San Luis Potosí, San Luis Potosí, México
- Guadalupe Gutierrez-Escobedo
- División de Biología Molecular, IPICYT, Instituto Potosino de Investigación Científica y Tecnológica, San Luis Potosí, San Luis Potosí, México
- Israel Cañas-Villamar
- División de Biología Molecular, IPICYT, Instituto Potosino de Investigación Científica y Tecnológica, San Luis Potosí, San Luis Potosí, México
- Irene Castaño
- División de Biología Molecular, IPICYT, Instituto Potosino de Investigación Científica y Tecnológica, San Luis Potosí, San Luis Potosí, México
- Alejandro De Las Peñas
- División de Biología Molecular, IPICYT, Instituto Potosino de Investigación Científica y Tecnológica, San Luis Potosí, San Luis Potosí, México
|
13
|
Behnam E, Waterman MS, Smith AD. A geometric interpretation for local alignment-free sequence comparison. J Comput Biol 2013; 20:471-85. [PMID: 23829649 PMCID: PMC3704055 DOI: 10.1089/cmb.2012.0280] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
Abstract
Local alignment-free sequence comparison arises in the context of identifying similar segments of sequences that may not be alignable in the traditional sense. We propose a randomized approximation algorithm that is both accurate and efficient. We show that under D2 and its important variant [Formula: see text] as the similarity measure, local alignment-free comparison between a pair of sequences can be formulated as the problem of finding the maximum bichromatic dot product between two sets of points in high dimensions. We introduce a geometric framework that reduces this problem to that of finding the bichromatic closest pair (BCP), allowing the properties of the underlying metric to be leveraged. Local alignment-free sequence comparison can be solved by making a quadratic number of alignment-free substring comparisons. We show both theoretically and through empirical results on simulated data that our approximation algorithm requires a subquadratic number of such comparisons and trades only a small amount of accuracy to achieve this efficiency. Therefore, our algorithm can extend the current usage of alignment-free-based methods and can also be regarded as a substitute for local alignment algorithms in many biological studies.
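As an illustration of the quadratic baseline mentioned in this abstract, the following is a minimal Python sketch, not the authors' algorithm. It assumes the standard definition of D2 as the inner product of k-mer count vectors and compares every pair of fixed-length windows by brute force. The function names (`kmer_counts`, `d2`, `local_d2_bruteforce`) and the fixed `window` parameter are hypothetical choices made for this example.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a, b, k):
    """D2 similarity: dot product of the k-mer count vectors of a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * cb[w] for w, n in ca.items())


def local_d2_bruteforce(x, y, window, k):
    """Return (score, i, j) for the best-scoring pair of windows.

    Every window of x is compared with every window of y, so the number
    of substring comparisons is quadratic. This is the baseline that the
    abstract's approximation algorithm is said to improve on.
    """
    best = (-1, 0, 0)
    for i in range(len(x) - window + 1):
        for j in range(len(y) - window + 1):
            score = d2(x[i:i + window], y[j:j + window], k)
            if score > best[0]:
                best = (score, i, j)
    return best
```

Because each window maps to a vector of k-mer counts, the best window pair is the pair of vectors with the largest dot product, which is the "maximum bichromatic dot product" formulation the abstract refers to.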
Affiliation(s)
- Ehsan Behnam
- Molecular and Computational Biology, University of Southern California, Los Angeles, California 90089-2910, USA
|
14
|
Turco G, Schnable JC, Pedersen B, Freeling M. Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses. Front Plant Sci 2013; 4:170. [PMID: 23874343 PMCID: PMC3708275 DOI: 10.3389/fpls.2013.00170] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 04/04/2013] [Accepted: 05/13/2013] [Indexed: 05/07/2023]
Abstract
Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize.
Affiliation(s)
- James C. Schnable
- Michael Freeling
- *Correspondence: James C. Schnable and Michael Freeling, Department of Plant and Microbial Biology, University of California, 111 Koshland Hall, Berkeley, CA 94720, USA
|
15
|
Tsai ZTY, Tsai HK, Cheng JH, Lin CH, Tsai YF, Wang D. Evolution of cis-regulatory elements in yeast de novo and duplicated new genes. BMC Genomics 2012; 13:717. [PMID: 23256513 PMCID: PMC3553024 DOI: 10.1186/1471-2164-13-717] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 08/07/2012] [Accepted: 12/18/2012] [Indexed: 12/22/2022]
Abstract
Background: New genes that originate from non-coding DNA rather than being duplicated from parent genes are called de novo genes. Their short evolution time and lack of parent genes provide a chance to study the evolution of cis-regulatory elements in the initial stage of gene emergence. Although a few reports have discussed cis-regulatory elements in new genes, knowledge of the characteristics of these elements in de novo genes is lacking. Here, we conducted a comprehensive investigation to depict the emergence and establishment of cis-regulatory elements in de novo yeast genes. Results: In a genome-wide investigation, we found that the number of transcription factor binding sites (TFBSs) in de novo genes of S. cerevisiae increased rapidly and quickly became comparable to the number of TFBSs in established genes. This phenomenon might have resulted from certain characteristics of de novo genes; namely, a relatively frequent gain of TFBSs, an unexpectedly high number of preexisting TFBSs, or lower selection pressure in the promoter regions of the de novo genes. Furthermore, we identified differences in the promoter architecture between de novo genes and duplicated new genes, suggesting that distinct regulatory strategies might be employed by genes of different origin. Finally, our functional analyses of the yeast de novo genes revealed that they might be related to reproduction. Conclusions: Our observations showed that de novo genes and duplicated new genes possess mutually distinct regulatory characteristics, implying that these two types of genes might have different roles in evolution.
|
16
|
Stewart AJ, Seymour RM, Pomiankowski A, Plotkin JB. The population genetics of cooperative gene regulation. BMC Evol Biol 2012; 12:173. [PMID: 22954408 PMCID: PMC3537746 DOI: 10.1186/1471-2148-12-173] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/13/2012] [Accepted: 08/31/2012] [Indexed: 12/25/2022]
Abstract
Background: Changes in gene regulatory networks drive the evolution of phenotypic diversity both within and between species. Rewiring of transcriptional networks is achieved either by changes to transcription factor binding sites or by changes to the physical interactions among transcription factor proteins. It has been suggested that the evolution of cooperative binding among factors can facilitate the adaptive rewiring of a regulatory network. Results: We use a population-genetic model to explore when cooperative binding of transcription factors is favored by evolution, and what effects cooperativity then has on the adaptive rewiring of regulatory networks. We consider a pair of transcription factors that regulate multiple targets and overlap in the sets of target genes they regulate. We show that, under stabilising selection, cooperative binding between the transcription factors is favoured provided the amount of overlap between their target genes exceeds a threshold. The value of this threshold depends on several population-genetic factors: strength of selection on binding sites, cost of pleiotropy associated with protein-protein interactions, rates of mutation and population size. Once it is established, we find that cooperative binding of transcription factors significantly accelerates the adaptive rewiring of transcriptional networks under positive selection. We compare our qualitative predictions to systematic data on Saccharomyces cerevisiae transcription factors, their binding sites, and their protein-protein interactions. Conclusions: Our study reveals a rich set of evolutionary dynamics driven by a tradeoff between the beneficial effects of cooperative binding at targets shared by a pair of factors, and the detrimental effects of cooperative binding for non-shared targets. We find that cooperative regulation will evolve when transcription factors share a sufficient proportion of their target genes. These findings help to explain empirical patterns in datasets of transcription factors in Saccharomyces cerevisiae, and they suggest that changes to physical interactions between transcription factors can play a critical role in the evolution of gene regulatory networks.
|
17
|
Engle EK, Fay JC. Divergence of the yeast transcription factor FZF1 affects sulfite resistance. PLoS Genet 2012; 8:e1002763. [PMID: 22719269 PMCID: PMC3375221 DOI: 10.1371/journal.pgen.1002763] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 01/18/2012] [Accepted: 04/26/2012] [Indexed: 01/06/2023]
Abstract
Changes in gene expression are commonly observed during evolution. However, the phenotypic consequences of expression divergence are frequently unknown and difficult to measure. Transcriptional regulators provide a mechanism by which phenotypic divergence can occur through multiple, coordinated changes in gene expression during development or in response to environmental changes. Yet, some changes in transcriptional regulators may be constrained by their pleiotropic effects on gene expression. Here, we use a genome-wide screen for promoters that are likely to have diverged in function and identify a yeast transcription factor, FZF1, that has evolved substantial differences in its ability to confer resistance to sulfites. Chimeric alleles from four Saccharomyces species show that divergence in FZF1 activity is due to changes in both its coding and upstream noncoding sequence. Between the two closest species, noncoding changes affect the expression of FZF1, whereas coding changes affect the expression of SSU1, a sulfite efflux pump activated by FZF1. Both coding and noncoding changes also affect the expression of many other genes. Our results show how divergence in the coding and promoter region of a transcription factor alters the response to an environmental stress.

Changes in gene regulation are thought to play an important role in evolution. While variation in gene expression between species is common, it is hard to identify the phenotypic consequences of this variation since many changes in gene expression may have subtle or no phenotypic effects. In this study, we investigate changes in sulfite resistance and gene expression caused by the transcription factor, FZF1, that has evolved rapidly during the divergence of related yeast species. We find that divergence in the ability of FZF1 to confer sulfite resistance is mediated by changes in its expression as well as changes in its protein structure, both of which cause changes in the expression of other genes. Our results show how the combination of multiple changes within a transcription factor can produce substantial changes in phenotype and the expression of many genes.
Affiliation(s)
- Elizabeth K. Engle
- Molecular Genetics and Genomics Program, Washington University, St. Louis, Missouri, United States of America
- Justin C. Fay
- Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri, United States of America
|
18
|
Hsu C, Scherrer S, Buetti-Dinh A, Ratna P, Pizzolato J, Jaquet V, Becskei A. Stochastic signalling rewires the interaction map of a multiple feedback network during yeast evolution. Nat Commun 2012; 3:682. [PMID: 22353713 PMCID: PMC3293423 DOI: 10.1038/ncomms1687] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 07/20/2011] [Accepted: 01/17/2012] [Indexed: 01/18/2023]
Abstract
During evolution, genetic networks are rewired through strengthening or weakening their interactions to develop new regulatory schemes. In the galactose network, the GAL1/GAL3 paralogues and the GAL2 gene enhance their own expression mediated by the Gal4p transcriptional activator. The wiring strength in these feedback loops is set by the number of Gal4p binding sites. Here we show using synthetic circuits that multiplying the binding sites increases the expression of a gene under the direct control of an activator, but this enhancement is not fed back in the circuit. The feedback loops are rather activated by genes that have frequent stochastic bursts and fast RNA decay rates. In this way, rapid adaptation to galactose can be triggered even by weakly expressed genes. Our results indicate that nonlinear stochastic transcriptional responses enable feedback loops to function autonomously, or contrary to what is dictated by the strength of interactions enclosing the circuit.
Affiliation(s)
- Chieh Hsu
- Biozentrum, University of Basel, Klingelbergstrasse 50/70, Basel 4056, Switzerland
|
19
|
Positive evolutionary selection of an HD motif on Alzheimer precursor protein orthologues suggests a functional role. PLoS Comput Biol 2012; 8:e1002356. [PMID: 22319430 PMCID: PMC3271017 DOI: 10.1371/journal.pcbi.1002356] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 08/25/2011] [Accepted: 12/07/2011] [Indexed: 12/31/2022]
Abstract
HD amino acid duplex has been found in the active center of many different enzymes. The dyad plays remarkably different roles in their catalytic processes that usually involve metal coordination. An HD motif is positioned directly on the amyloid beta fragment (Aβ) and on the carboxy-terminal region of the extracellular domain (CAED) of the human amyloid precursor protein (APP) and a taxonomically well defined group of APP orthologues (APPOs). In human Aβ HD is part of a presumed, RGD-like integrin-binding motif RHD; however, neither RHD nor RXD demonstrates reasonable conservation in APPOs. The sequences of CAEDs and the position of the HD are not particularly conserved either, yet we show with a novel statistical method using evolutionary modeling that the presence of HD on CAEDs cannot be the result of neutral evolutionary forces (p<0.0001). The motif is positively selected along the evolutionary process in the majority of APPOs, despite the fact that HD motif is underrepresented in the proteomes of all species of the animal kingdom. Position migration can be explained by high probability occurrence of multiple copies of HD on intermediate sequences, from which only one is kept by selective evolutionary forces, in a similar way as in the case of the “transcription binding site turnover.” CAED of all APP orthologues and homologues are predicted to bind metal ions including Amyloid-like protein 1 (APLP1) and Amyloid-like protein 2 (APLP2). Our results suggest that HDs on the CAEDs are most probably key components of metal-binding domains, which facilitate and/or regulate inter- or intra-molecular interactions in a metal ion-dependent or metal ion concentration-dependent manner. The involvement of naturally occurring mutations of HD (Tottori (D7N) and English (H6R) mutations) in early onset Alzheimer's disease gives additional support to our finding that HD has an evolutionary preserved function on APPOs. 
HD amino acid duplex can be found in the active center of different metallo-enzymes. An HD motif is positioned directly on the amyloid beta (Aβ) fragment and on the carboxy-terminal region of the extracellular domain of the human amyloid precursor protein (APP) and a taxonomically well defined group of APP orthologues (APPOs). The conservation of the HD dyad is not position specific and it cannot be seen in a multiple alignment. Yet we show with a novel statistical method using evolutionary modeling that HD motif is positively selected by evolution on APPOs, despite the fact that HD dyad is underrepresented in the proteomes of all species of the animal kingdom. CAED of all APP orthologues and homologues are predicted to bind metal ions including Amyloid-like protein 1 (APLP1) and Amyloid-like protein 2 (APLP2). Our results suggest that HDs on the APPOs are most probably key components of metal-binding domains, which facilitate and/or regulate inter- or intra-molecular interactions in a metal ion-dependent or metal ion concentration-dependent manner. The involvement of naturally occurring mutations of HD (Tottori (D7N) and English (H6R)) in early onset Alzheimer's disease gives additional support to our finding that HD has an evolutionary preserved function on APPOs.
|
20
|
Ramazzotti M, Berná L, Stefanini I, Cavalieri D. A computational pipeline to discover highly phylogenetically informative genes in sequenced genomes: application to Saccharomyces cerevisiae natural strains. Nucleic Acids Res 2012; 40:3834-48. [PMID: 22266652 PMCID: PMC3351171 DOI: 10.1093/nar/gks005] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 12/26/2022]
Abstract
The quest for genes representing genetic relationships of strains or individuals within populations and their evolutionary history is acquiring a novel dimension of complexity with the advancement of next-generation sequencing (NGS) technologies. In fact, sequencing an entire genome uncovers genetic variation in coding and non-coding regions and offers the possibility of studying Saccharomyces cerevisiae populations at the strain level. Nevertheless, the disadvantageous cost-benefit ratio (the amount of detail disclosed by NGS against the time-expensive and expertise-demanding data assembly process) still precludes the application of these techniques to the routine assignment of yeast strains, making the selection of the most reliable molecular markers greatly desirable. In this work we propose an original computational approach to discover genes that can be used as a descriptor of the population structure. We found 13 genes whose variability can be used to recapitulate the phylogeny obtained from genome-wide sequences. The same approach that we prove to be successful in yeasts can be generalized to any other population of individuals given the availability of high-quality genomic sequences and of a clear population structure to be targeted.
Affiliation(s)
- Matteo Ramazzotti
- Department of Preclinical and Clinical Pharmacology, University of Florence, Viale G. Pieraccini 6, 50139 Firenze, Italy
|