1
Maseko NN, Steenkamp ET, Wingfield BD, Wilken PM. An in Silico Approach to Identifying TF Binding Sites: Analysis of the Regulatory Regions of BUSCO Genes from Fungal Species in the Ceratocystidaceae Family. Genes (Basel) 2023; 14:848. [PMID: 37107606 PMCID: PMC10137650 DOI: 10.3390/genes14040848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 03/26/2023] [Accepted: 03/27/2023] [Indexed: 04/03/2023]
Abstract
Transcriptional regulation controls gene expression through regulatory promoter regions that contain conserved sequence motifs. These motifs, also known as regulatory elements, are critically important to expression, which is driving research efforts to identify and characterize them. Yeasts have been the focus of such studies in fungi, including in several in silico approaches. This study aimed to determine whether in silico approaches could be used to identify motifs in the Ceratocystidaceae family, and if present, to evaluate whether these correspond to known transcription factors. This study targeted the 1000 base-pair region upstream of the start codon of 20 single-copy genes from the BUSCO dataset for motif discovery. Using the MEME and Tomtom analysis tools, conserved motifs at the family level were identified. The results show that such in silico approaches could identify known regulatory motifs in the Ceratocystidaceae and other unrelated species. This study provides support to ongoing efforts to use in silico analyses for motif discovery.
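The promoter-extraction step described in this abstract can be illustrated with a minimal Python sketch. It assumes the chromosome is already loaded as a string and that the position of the start codon is known; the function names and the 0-based coordinate convention are hypothetical choices made for illustration and are not taken from the study.

```python
# Hypothetical helper names; coordinates are 0-based on the forward strand.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom: str, start_codon_pos: int, strand: str,
                    length: int = 1000) -> str:
    """Return up to `length` bases upstream of a start codon.

    start_codon_pos is the forward-strand index of the first base of the
    start codon as the gene is read (the A of ATG). For a minus-strand
    gene this is the highest forward-strand coordinate of the codon.
    """
    if strand == "+":
        begin = max(0, start_codon_pos - length)  # clip at the sequence start
        return chrom[begin:start_codon_pos]
    if strand == "-":
        begin = start_codon_pos + 1
        # Slicing clips automatically at the sequence end.
        return reverse_complement(chrom[begin:begin + length])
    raise ValueError("strand must be '+' or '-'")
```

The returned sequences could then be written to a FASTA file and passed to a motif-discovery tool; regions shorter than the requested length occur when a gene lies close to a contig edge.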
2
Bernardini A, Lorenzo M, Chaves-Sanjuan A, Swuec P, Pigni M, Saad D, Konarev PV, Graewert MA, Valentini E, Svergun DI, Nardini M, Mantovani R, Gnesutta N. The USR domain of USF1 mediates NF-Y interactions and cooperative DNA binding. Int J Biol Macromol 2021; 193:401-413. [PMID: 34673109 DOI: 10.1016/j.ijbiomac.2021.10.056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2021] [Revised: 10/07/2021] [Accepted: 10/08/2021] [Indexed: 10/20/2022]
Abstract
The trimeric CCAAT-binding NF-Y is a "pioneer" transcription factor (TF) known to cooperate with neighboring TFs to regulate gene expression. Genome-wide analyses detected a precise stereo-alignment (10/12 bp) of CCAAT with E-box elements and corresponding colocalization of NF-Y with basic-Helix-Loop-Helix (bHLH) TFs. We dissected here NF-Y interactions with USF1 and MAX. USF1, but not MAX, cooperates in DNA binding with NF-Y. NF-Y and USF1 synergize to activate target promoters. Reconstruction of complexes by structural means shows independent DNA binding of MAX, whereas USF1 has extended contacts with NF-Y, involving the USR, a USF-specific amino acid sequence stretch required for trans-activation. The USR is an intrinsically disordered domain and adopts different conformations based on E-box-CCAAT distances. Deletion of the USR abolishes cooperative DNA binding with NF-Y. Our data indicate that the functionality of certain unstructured domains involves adapting to small variations in the stereo-alignment of multimeric TF sites.
Affiliation(s)
- Andrea Bernardini
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Milano 20133, Italy
- Mariangela Lorenzo
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Milano 20133, Italy
- Paolo Swuec
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Milano 20133, Italy
- Matteo Pigni
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Milano 20133, Italy
- Dana Saad
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Milano 20133, Italy
- Petr V Konarev
  - A.V. Shubnikov Institute of Crystallography, Federal Scientific Research Centre "Crystallography and Photonics" of Russian Academy of Science, Moscow 119333, Russian Federation
- Erica Valentini
  - European Molecular Biology Laboratory, Hamburg Unit, Hamburg 22607, Germany
- Dmitri I Svergun
  - European Molecular Biology Laboratory, Hamburg Unit, Hamburg 22607, Germany
- Marco Nardini
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Milano 20133, Italy
- Roberto Mantovani
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Milano 20133, Italy
- Nerina Gnesutta
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Milano 20133, Italy
3
Markus BM, Waldman BS, Lorenzi HA, Lourido S. High-Resolution Mapping of Transcription Initiation in the Asexual Stages of Toxoplasma gondii. Front Cell Infect Microbiol 2021; 10:617998. [PMID: 33553008 PMCID: PMC7854901 DOI: 10.3389/fcimb.2020.617998] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/15/2020] [Accepted: 12/03/2020] [Indexed: 12/13/2022]
Abstract
Toxoplasma gondii is a common parasite of humans and animals, causing life-threatening disease in the immunocompromised, fetal abnormalities when contracted during gestation, and recurrent ocular lesions in some patients. Central to the prevalence and pathogenicity of this protozoan is its ability to adapt to a broad range of environments, and to differentiate between acute and chronic stages. These processes are underpinned by a major rewiring of gene expression, yet the mechanisms that regulate transcription in this parasite are only partially characterized. Deciphering these mechanisms requires a precise and comprehensive map of transcription start sites (TSSs); however, Toxoplasma TSSs have remained incompletely defined. To address this challenge, we used 5'-end RNA sequencing to genomically assess transcription initiation in both acute and chronic stages of Toxoplasma. Here, we report an in-depth analysis of transcription initiation at promoters, and provide empirically-defined TSSs for 7603 (91%) protein-coding genes, of which only 1840 concur with existing gene models. Comparing data from acute and chronic stages, we identified instances of stage-specific alternative TSSs that putatively generate mRNA isoforms with distinct 5' termini. Analysis of the nucleotide content and nucleosome occupancy around TSSs allowed us to examine the determinants of TSS choice, and outline features of Toxoplasma promoter architecture. We also found pervasive divergent transcription at Toxoplasma promoters, clustered within the nucleosomes of highly-symmetrical phased arrays, underscoring chromatin contributions to transcription initiation. Corroborating previous observations, we asserted that Toxoplasma 5' leaders are among the longest of any eukaryote studied thus far, displaying a median length of approximately 800 nucleotides. Further highlighting the utility of a precise TSS map, we pinpointed motifs associated with transcription initiation, including the binding sites of the master regulator of chronic-stage differentiation, BFD1, and a novel motif with a similar positional arrangement present at 44% of Toxoplasma promoters. This work provides a critical resource for functional genomics in Toxoplasma, and lays down a foundation to study the interactions between genomic sequences and the regulatory factors that control transcription in this parasite.
Affiliation(s)
- Benedikt M. Markus
  - Whitehead Institute for Biomedical Research, Cambridge, MA, United States
  - Faculty of Biology, University of Freiburg, Freiburg, Germany
- Benjamin S. Waldman
  - Whitehead Institute for Biomedical Research, Cambridge, MA, United States
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, United States
- Sebastian Lourido
  - Whitehead Institute for Biomedical Research, Cambridge, MA, United States
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, United States
4
Genome wide analysis of W-box element in Arabidopsis thaliana reveals TGAC motif with genes down regulated by heat and salinity. Sci Rep 2019; 9:1681. [PMID: 30737427 PMCID: PMC6368537 DOI: 10.1038/s41598-019-38757-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 04/23/2018] [Accepted: 12/21/2018] [Indexed: 01/10/2023]
Abstract
To design synthetic promoters that drive stress-specific induction of a transgene, the study of cis-regulatory elements is of great importance. Cis-regulatory elements play a major role in regulating gene expression spatially and temporally at the transcriptional level. The present work focuses on one important cis-regulatory element, the W-box, whose TGAC core motif serves as a binding site for members of the WRKY transcription factor family. In the present study, we have analyzed the occurrence frequency of TGAC core motifs for varying spacer lengths (ranging from 0 to 30 base pairs) across the Arabidopsis thaliana genome in order to determine the biological and functional significance of these conserved sequences. Further, the available microarray data were used to determine the role of the TGAC motif in the abiotic stresses of salinity, osmolarity and heat. It was observed that TGAC motifs with spacer sequences, such as TGACCCATTTTGAC and TGACCCATGAATTTTGAC, deviated significantly in frequency and were thought to be favored for transcription factor binding. The microarray data analysis revealed the involvement of the TGAC motif mainly with genes down-regulated under abiotic stress conditions. These results were further confirmed by transient expression studies with promoter-reporter cassettes carrying TGAC and TGAC-ACGT variant motifs with spacer lengths of 5 and 10.
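The spacer-length survey described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, only the forward strand is scanned, and the spacer is allowed to contain any bases (including further TGAC cores), which may differ from the counting rules the authors used.

```python
import re


def spacer_counts(seq: str, core: str = "TGAC", max_spacer: int = 30) -> dict:
    """Count core-spacer-core arrangements for each spacer length 0..max_spacer.

    A zero-width lookahead is used so that overlapping arrangements
    are each counted once per start position.
    """
    seq = seq.upper()
    counts = {}
    for n in range(max_spacer + 1):
        pattern = re.compile(
            "(?=%s[ACGT]{%d}%s)" % (re.escape(core), n, re.escape(core))
        )
        counts[n] = sum(1 for _ in pattern.finditer(seq))
    return counts
```

Applied to the two sequences quoted in the abstract, TGACCCATTTTGAC registers under spacer length 6 and TGACCCATGAATTTTGAC under spacer length 10. Judging whether a given count deviates from expectation would additionally need a background model, which is not sketched here.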
5
Bekiaris PS, Tekath T, Staiger D, Danisman S. Computational exploration of cis-regulatory modules in rhythmic expression data using the "Exploration of Distinctive CREs and CRMs" (EDCC) and "CRM Network Generator" (CNG) programs. PLoS One 2018; 13:e0190421. [PMID: 29298348 PMCID: PMC5752016 DOI: 10.1371/journal.pone.0190421] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/01/2017] [Accepted: 12/14/2017] [Indexed: 11/19/2022]
Abstract
Understanding the effect of cis-regulatory elements (CRE) and clusters of CREs, which are called cis-regulatory modules (CRM), in eukaryotic gene expression is a challenge of computational biology. We developed two programs that allow simple, fast and reliable analysis of candidate CREs and CRMs that may affect specific gene expression and that determine positional features between individual CREs within a CRM. The first program, "Exploration of Distinctive CREs and CRMs" (EDCC), correlates candidate CREs and CRMs with specific gene expression patterns. For pairs of CREs, EDCC also determines positional preferences of the single CREs in relation to each other and to the transcriptional start site. The second program, "CRM Network Generator" (CNG), prioritizes these positional preferences using a neural network and thus allows unbiased rating of the positional preferences that were determined by EDCC. We tested these programs with data from a microarray study of circadian gene expression in Arabidopsis thaliana. Analyzing more than 1.5 million pairwise CRE combinations, we found 22 candidate combinations, of which several contained known clock promoter elements together with elements that had not been identified as relevant to circadian gene expression before. CNG analysis further identified positional preferences of these CRE pairs, hinting at positional information that may be relevant for circadian gene expression. Future wet lab experiments will have to determine which of these combinations confer daytime specific circadian gene expression.
Affiliation(s)
- Tobias Tekath
  - RNA Biology and Molecular Physiology, Faculty of Biology, Bielefeld University, Bielefeld, Germany
- Dorothee Staiger
  - RNA Biology and Molecular Physiology, Faculty of Biology, Bielefeld University, Bielefeld, Germany
- Selahattin Danisman
  - RNA Biology and Molecular Physiology, Faculty of Biology, Bielefeld University, Bielefeld, Germany
6
Bagshaw AT. Functional Mechanisms of Microsatellite DNA in Eukaryotic Genomes. Genome Biol Evol 2017; 9:2428-2443. [PMID: 28957459 PMCID: PMC5622345 DOI: 10.1093/gbe/evx164] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Accepted: 08/23/2017] [Indexed: 02/06/2023]
Abstract
Microsatellite repeat DNA is best known for its length mutability, which is implicated in several neurological diseases and cancers, and often exploited as a genetic marker. Less well-known is the body of work exploring the widespread and surprisingly diverse functional roles of microsatellites. Recently, emerging evidence includes the finding that normal microsatellite polymorphism contributes substantially to the heritability of human gene expression on a genome-wide scale, calling attention to the task of elucidating the mechanisms involved. At present, these are underexplored, but several themes have emerged. I review evidence demonstrating roles for microsatellites in modulation of transcription factor binding, spacing between promoter elements, enhancers, cytosine methylation, alternative splicing, mRNA stability, selection of transcription start and termination sites, unusual structural conformations, nucleosome positioning and modification, higher order chromatin structure, noncoding RNA, and meiotic recombination hot spots.
7
Balancing selection maintains polymorphisms at neurogenetic loci in field experiments. Proc Natl Acad Sci U S A 2017; 114:3690-3695. [PMID: 28325880 DOI: 10.1073/pnas.1621228114] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 12/22/2022]
Abstract
Most variation in behavior has a genetic basis, but the processes determining the level of diversity at behavioral loci are largely unknown for natural populations. Expression of arginine vasopressin receptor 1a (Avpr1a) and oxytocin receptor (Oxtr) in specific regions of the brain regulates diverse social and reproductive behaviors in mammals, including humans. That these genes have important fitness consequences and that natural populations contain extensive diversity at these loci implies the action of balancing selection. In Myodes glareolus, Avpr1a and Oxtr each contain a polymorphic microsatellite locus located in their 5' regulatory region (the regulatory region-associated microsatellite, RRAM) that likely regulates gene expression. To test the hypothesis that balancing selection maintains diversity at behavioral loci, we released artificially bred females and males with different RRAM allele lengths into field enclosures that differed in population density. The length of Avpr1a and Oxtr RRAMs was associated with reproductive success, but population density and the sex interacted to determine the optimal genotype. In general, longer Avpr1a RRAMs were more beneficial for males, and shorter RRAMs were more beneficial for females; the opposite was true for Oxtr RRAMs. Moreover, Avpr1a RRAM allele length is correlated with the reproductive success of the sexes during different phases of reproduction; for males, RRAM length correlated with the numbers of newborn offspring, but for females selection was evident on the number of weaned offspring. This report of density-dependence and sexual antagonism acting on loci within the arginine vasopressin-oxytocin pathway explains how genetic diversity at Avpr1a and Oxtr could be maintained in natural populations.
8
Acevedo-Luna N, Mariño-Ramírez L, Halbert A, Hansen U, Landsman D, Spouge JL. Most of the tight positional conservation of transcription factor binding sites near the transcription start site reflects their co-localization within regulatory modules. BMC Bioinformatics 2016; 17:479. [PMID: 27871221 PMCID: PMC5117513 DOI: 10.1186/s12859-016-1354-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2016] [Accepted: 11/11/2016] [Indexed: 11/24/2022]
Abstract
Background: Transcription factors (TFs) form complexes that bind regulatory modules (RMs) within DNA, to control specific sets of genes. Some transcription factor binding sites (TFBSs) near the transcription start site (TSS) display tight positional preferences relative to the TSS. Furthermore, near the TSS, RMs can co-localize TFBSs with each other and the TSS. The proportion of TFBS positional preferences due to TFBS co-localization within RMs is unknown, however. ChIP experiments confirm co-localization of some TFBSs genome-wide, including near the TSS, but they typically examine only a few TFs at a time, using non-physiological conditions that can vary from lab to lab. In contrast, sequence analysis can examine many TFs uniformly and methodically, broadly surveying the co-localization of TFBSs with tight positional preferences relative to the TSS.
Results: Our statistics found 43 significant sets of human motifs in the JASPAR TF Database with positional preferences relative to the TSS, with 38 preferences tight (±5 bp). Each set of motifs corresponded to a gene group of 135 to 3304 genes, with 42/43 (98%) gene groups independently validated by DAVID, a gene ontology database, with FDR < 0.05. Motifs corresponding to two TFBSs in a RM should co-occur more than by chance alone, enriching the intersection of the gene groups corresponding to the two TFs. Thus, a gene-group intersection systematically enriched beyond chance alone provides evidence that the two TFs participate in an RM. Of the 903 = 43*42/2 intersections of the 43 significant gene groups, we found 768/903 (85%) pairs of gene groups with significantly enriched intersections, with 564/768 (73%) intersections independently validated by DAVID with FDR < 0.05. A user-friendly web site at http://go.usa.gov/3kjsH permits biologists to explore the interaction network of our TFBSs to identify candidate subunit RMs.
Conclusions: Gene duplication and convergent evolution within a genome provide obvious biological mechanisms for replicating an RM near the TSS that binds a particular TF subunit. Of all intersections of our 43 significant gene groups, 85% were significantly enriched, with 73% of the significant enrichments independently validated by gene ontology. The co-localization of TFBSs within RMs therefore likely explains much of the tight TFBS positional preferences near the TSS.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1354-5) contains supplementary material, which is available to authorized users.
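The pair count and the notion of an intersection "enriched beyond chance" can be made concrete with a short Python sketch. The abstract does not say which statistical test the authors applied, so the one-sided hypergeometric tail below is an assumption chosen for illustration, and the function names are hypothetical.

```python
from math import comb


def n_pairs(n_groups: int) -> int:
    """Number of unordered pairs among n_groups gene groups."""
    return comb(n_groups, 2)


def overlap_pvalue(overlap: int, size_a: int, size_b: int, n_genes: int) -> float:
    """P(intersection >= overlap) if group B were drawn at random.

    Hypergeometric upper tail: n_genes is the background size, size_a and
    size_b are the two gene-group sizes, overlap is the observed intersection.
    """
    total = comb(n_genes, size_b)
    tail = sum(
        comb(size_a, i) * comb(n_genes - size_a, size_b - i)
        for i in range(overlap, min(size_a, size_b) + 1)
    )
    return tail / total  # exact integer arithmetic, converted once at the end
```

With 43 gene groups, `n_pairs(43)` gives the 903 intersections quoted in the abstract. Because 903 tests are run, any p-values from such a sketch would still need a multiple-testing correction before being called significant.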
Affiliation(s)
- Natalia Acevedo-Luna
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, 50011, USA
- Leonardo Mariño-Ramírez
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Armand Halbert
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Ulla Hansen
  - Department of Biology, Boston University, 5 Cummington Mall, Boston, MA, 02215, USA
- David Landsman
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- John L Spouge
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
9
Guo H, Huo H, Yu Q. SMCis: An Effective Algorithm for Discovery of Cis-Regulatory Modules. PLoS One 2016; 11:e0162968. [PMID: 27637070 PMCID: PMC5026350 DOI: 10.1371/journal.pone.0162968] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/17/2016] [Accepted: 08/31/2016] [Indexed: 12/02/2022]
Abstract
The discovery of cis-regulatory modules (CRMs) is a challenging problem in computational biology. Limited by the difficulty of using an HMM to model dependent features in transcriptional regulatory sequences (TRSs), the probabilistic modeling methods based on HMMs cannot accurately represent the distance between regulatory elements in TRSs and are cumbersome to model the prevailing dependencies between motifs within CRMs. We propose a probabilistic modeling algorithm called SMCis, which builds a more powerful CRM discovery model based on a hidden semi-Markov model. Our model characterizes the regulatory structure of CRMs and effectively models dependencies between motifs at a higher level of abstraction based on segments rather than nucleotides. Experimental results on three benchmark datasets indicate that our method performs better than the compared algorithms.
Affiliation(s)
- Haitao Guo
  - School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi, China
- Hongwei Huo
  - School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi, China
- Qiang Yu
  - School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi, China
10
Brown AS, Mohanty BK, Howe PH. Identification and characterization of an hnRNP E1 translational silencing motif. Nucleic Acids Res 2016; 44:5892-907. [PMID: 27067543 PMCID: PMC4937310 DOI: 10.1093/nar/gkw241] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 08/10/2015] [Accepted: 03/28/2016] [Indexed: 12/19/2022]
Abstract
Non-canonical transforming growth factor β (TGFβ) signaling through protein kinase B (Akt2) induces phosphorylation of heterogeneous nuclear ribonucleoprotein E1 (hnRNP E1) at serine-43 (p-hnRNP E1). This post-translational modification (PTM) of hnRNP E1 promotes its dissociation from a 3′ untranslated region (UTR) nucleic acid regulatory motif, driving epithelial to mesenchymal transition (EMT) and metastasis. We have identified an hnRNP E1 consensus-binding motif and genomically resolved a subset of genes in which it is contained. This study characterizes the binding kinetics of the consensus-binding motif and hnRNP E1, its various K-homology (KH) domains and p-hnRNP E1. Levels of p-hnRNP E1 are highly upregulated in metastatic cancer cells and low in normal epithelial tissue. We show a correlation between this PTM and levels of Akt2 and its activated form, phosphorylated serine-474 (p-Akt2). Using cellular progression models of metastasis, we observed a signature high level of Akt2, p-Akt2 and p-hnRNP E1 protein expression, coupled to a significantly reduced level of total hnRNP E1 in metastatic cells. Genes that are translationally silenced by hnRNP E1 and expressed by its dissociation are highly implicated in the progression of EMT and metastasis. This study provides insight into a non-canonical TGFβ signaling cascade that is responsible for inducing EMT by aberrant expression of hnRNP E1 silenced targets. The relevance of this system in metastatic progression is clearly shown in cellular models by the high abundance of p-hnRNP E1 and low levels of hnRNP E1. New insights provided by the resolution of this molecular mechanism provide targets for therapeutic intervention and give further insight into the role of the TGFβ microenvironment.
Affiliation(s)
- Andrew S Brown
  - Department of Biochemistry, Medical University of South Carolina, 173 Ashley Avenue, Charleston, SC 29425, USA
  - Department of Biomedical Science, Kent State University, 800 East Summit Street, Kent, OH 44240, USA
- Bidyut K Mohanty
  - Department of Biochemistry, Medical University of South Carolina, 173 Ashley Avenue, Charleston, SC 29425, USA
- Philip H Howe
  - Department of Biochemistry, Medical University of South Carolina, 173 Ashley Avenue, Charleston, SC 29425, USA
11
Dolfini D, Zambelli F, Pedrazzoli M, Mantovani R, Pavesi G. A high definition look at the NF-Y regulome reveals genome-wide associations with selected transcription factors. Nucleic Acids Res 2016; 44:4684-702. [PMID: 26896797 PMCID: PMC4889920 DOI: 10.1093/nar/gkw096] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 01/28/2015] [Accepted: 02/09/2016] [Indexed: 12/11/2022]
Abstract
NF-Y is a trimeric transcription factor (TF), binding the CCAAT box element, for which several results suggest a pioneering role in activation of transcription. In this work, we integrated 380 ENCODE ChIP-Seq experiments for 154 TFs and cofactors with sequence analysis, protein–protein interactions and RNA profiling data, in order to identify genome-wide regulatory modules resulting from the co-association of NF-Y with other TFs. We identified three main degrees of co-association with NF-Y for sequence-specific TFs. In the most relevant one, we found TFs having a significant overlap with NF-Y in their DNA binding loci, some with a precise spacing of binding sites with respect to the CCAAT box, others (FOS, Sp1/2, RFX5, IRF3, PBX3) mostly lacking their canonical binding site and bound to arrays of well spaced CCAAT boxes. As expected, NF-Y binding also correlates with RNA Pol II General TFs and with subunits of complexes involved in the control of H3K4 methylations. Co-association patterns are confirmed by protein–protein interactions, and correspond to specific functional categorizations and expression level changes of target genes following NF-Y inactivation. These data define genome-wide rules for the organization of NF-Y-centered regulatory modules, supporting a model of distinct categorization and synergy with well defined sets of TFs.
Affiliation(s)
- Diletta Dolfini
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Milano, Via Celoria 26, 20133, Italy
- Federico Zambelli
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Milano, Via Celoria 26, 20133, Italy
  - Istituto di Biomembrane e Bioenergetica, Consiglio Nazionale delle Ricerche, Bari, Via Amendola 165/A, 70126, Italy
- Maurizio Pedrazzoli
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Milano, Via Celoria 26, 20133, Italy
- Roberto Mantovani
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Milano, Via Celoria 26, 20133, Italy
- Giulio Pavesi
  - Dipartimento di Bioscienze, Università degli Studi di Milano, Milano, Via Celoria 26, 20133, Italy
12
Abe H, Gemmell NJ. Evolutionary Footprints of Short Tandem Repeats in Avian Promoters. Sci Rep 2016; 6:19421. [PMID: 26766026 PMCID: PMC4725869 DOI: 10.1038/srep19421] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 08/20/2015] [Accepted: 12/11/2015] [Indexed: 01/12/2023]
Abstract
Short tandem repeats (STRs) or microsatellites are well-known sequence elements that may change the spacing between transcription factor binding sites (TFBSs) in promoter regions by expansion or contraction of repetitive units. Some of these mutations have the potential to contribute to phenotypic diversity by altering patterns of gene expression. To explore how repetitive sequence motifs within promoters have evolved in avian lineages under mutation-selection balance, more than 400 evolutionary conserved STRs (ecSTRs) were identified in this study by comparing the 2 kb upstream promoter sequences of chicken against those of other birds (turkey, duck, zebra finch, and flycatcher). The rate of conservation was significantly higher in AG dinucleotide repeats than in AC or AT repeats, with the expansion of AG motifs being noticeably constrained in passerines. Analysis of the relative distance between ecSTRs and TFBSs revealed a significantly higher rate of conserved TFBSs in the vicinity of ecSTRs in both chicken-duck and chicken-passerine comparisons. Our comparative study provides a novel insight into which intrinsic factors have influenced the degree of constraint on repeat expansion/contraction during avian promoter evolution.
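To make the idea of scanning promoter sequences for dinucleotide repeats concrete, here is a minimal Python sketch. It is not the pipeline used in the study: the function name, the default repeat units and the minimum of four repeats are assumptions chosen for illustration, and only the forward strand and the given phase of each unit are searched.

```python
import re


def find_strs(seq: str, units=("AG", "AC", "AT"), min_repeats: int = 4) -> list:
    """Locate perfect tandem repeats of each unit.

    Returns sorted tuples (start, end, unit, n_repeats) with 0-based,
    end-exclusive coordinates.
    """
    seq = seq.upper()
    hits = []
    for unit in units:
        pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_repeats))
        for m in pattern.finditer(seq):
            n_repeats = (m.end() - m.start()) // len(unit)
            hits.append((m.start(), m.end(), unit, n_repeats))
    return sorted(hits)
```

Comparing the positions and repeat counts returned for orthologous promoters from two species would be one simple way to flag repeats that are present in both, although a real comparison would rely on sequence alignment.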
Affiliation(s)
- Hideaki Abe
  - Department of Anatomy, University of Otago, Dunedin 9054, New Zealand
- Neil J Gemmell
  - Department of Anatomy, University of Otago, Dunedin 9054, New Zealand
  - Allan Wilson Centre for Molecular Ecology and Evolution, University of Otago, Dunedin 9054, New Zealand
13
Zhang Y, Wang H, Zhou D, Moody L, Lezmi S, Chen H, Pan YX. High-fat diet caused widespread epigenomic differences on hepatic methylome in rat. Physiol Genomics 2015. [DOI: 10.1152/physiolgenomics.00110.2014] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 11/22/2022]
Abstract
A high-fat (HF) diet is associated with progression of liver diseases. To illustrate the genome-wide landscape of DNA methylation in the liver of rats fed either a control or HF diet, two enrichment-based methods, namely methyl-DNA immunoprecipitation assay with high-throughput sequencing (MeDIP-seq) and methylation-sensitive restriction enzyme sequencing (MRE-seq), were performed in our study. Rats fed the HF diet exhibited increased body weight and liver fat accumulation compared with the control group at 12 wk of age. Genome-wide analysis of differentially methylated regions (DMRs) showed that 12,494 DMRs induced by the HF diet were hypomethylated and 6,404 were hypermethylated. DMRs with gene annotations [differentially methylated genes (DMGs)] were further analyzed to show gene-specific methylation profiles. There were 88, 2,680, and 95 hypomethylated DMGs identified with changes in DNA methylation in the promoter, intragenic and downstream regions, respectively, compared with fewer hypermethylated DMGs (45, 1,623, and 50 in the respective regions). Some of these genes also contained an ACGT cis-acting motif whose DNA methylation status may affect gene expression. Pathway analysis showed that these DMGs were involved in critical hepatic signaling networks related to hepatic development. Therefore, the HF diet had global impacts on the DNA methylation profile in the liver of rats, leading to differential expression of genes in hepatic pathways that may be involved in functional changes in liver development.
Affiliation(s)
- Yukun Zhang
  - Department of Food Science and Human Nutrition, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Huan Wang
  - Department of Food Science and Human Nutrition, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Dan Zhou
  - Department of Food Science and Human Nutrition, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Laura Moody
  - Division of Nutritional Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Stéphane Lezmi
  - Department of Pathobiology, College of Veterinary Medicine, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Hong Chen
  - Department of Food Science and Human Nutrition, University of Illinois at Urbana-Champaign, Urbana, Illinois
  - Division of Nutritional Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Yuan-Xiang Pan
  - Department of Food Science and Human Nutrition, University of Illinois at Urbana-Champaign, Urbana, Illinois
  - Division of Nutritional Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois
  - Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, Illinois
Collapse
|
14
|
Deb A, Kundu S. Deciphering Cis-Regulatory Element Mediated Combinatorial Regulation in Rice under Blast Infected Condition. PLoS One 2015; 10:e0137295. [PMID: 26327607 PMCID: PMC4556519 DOI: 10.1371/journal.pone.0137295] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 05/02/2015] [Accepted: 08/14/2015] [Indexed: 01/15/2023]
Abstract
Combinations of cis-regulatory elements (CREs) present at promoters facilitate the binding of several transcription factors (TFs), thereby altering gene expression. Due to the complexity of the regulatory mechanism, the combinatorics of CRE-mediated transcriptional regulation has been elusive. In this work, we have developed a new methodology that quantifies the co-occurrence tendencies of CREs present in a set of promoter sequences; these co-occurrence scores are filtered in three consecutive steps to test their statistical significance; and the significantly co-occurring CRE pairs are presented as networks. These networks of co-occurring CREs are further transformed to derive higher-order regulatory combinatorics. We have further applied this methodology to the differentially up-regulated gene sets of rice tissues under fungal (Magnaporthe) infection to demonstrate how it helps to understand CRE-mediated combinatorial gene regulation. Our analysis includes a wide spectrum of biologically important results. The CRE pairs having a strong tendency to co-occur often exhibit very similar joint distribution patterns at the promoters of rice. We couple the network approach with experimental results of plant gene regulation and defense mechanisms and find evidence of auto- and cross-regulation among TF families, cross-talk among multiple hormone signaling pathways, and similarities and dissimilarities in regulatory combinatorics between different tissues. Our analyses point to a highly distributed nature of combinatorial gene regulation, facilitating an efficient alteration in response to fungal attack. Altogether, our proposed methodology could be an important approach to understanding combinatorial gene regulation. It can be further applied to unravel tissue- and/or condition-specific combinatorial gene regulation in other eukaryotic systems, given annotated genomic sequences and suitable experimental data.
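The abstract does not state how the co-occurrence score is computed. As a rough illustration only, the sketch below uses an observed/expected ratio under an independence assumption, which is a stand-in and not the authors' score or their three-step significance filter. The function name, the toy promoters, and the two example motifs are all hypothetical.

```python
from itertools import combinations


def cooccurrence_ratios(promoters, motifs):
    """Observed/expected co-occurrence ratio for each motif pair.

    The expected count assumes the two motifs occur independently
    across promoters (an assumption of this sketch).
    """
    n = len(promoters)
    # Indices of promoters in which each motif occurs at least once.
    hits = {m: {i for i, seq in enumerate(promoters) if m in seq} for m in motifs}
    ratios = {}
    for a, b in combinations(motifs, 2):
        observed = len(hits[a] & hits[b])
        expected = len(hits[a]) * len(hits[b]) / n
        ratios[(a, b)] = observed / expected if expected > 0 else 0.0
    return ratios


# Toy promoters, for illustration only.
toy_promoters = ["TTGACCACGTGA", "TTGACAAAAAAA", "GGGGCACGTGTT", "TTGACCCACGTG"]
toy_ratios = cooccurrence_ratios(toy_promoters, ["TTGAC", "CACGTG"])
```

A ratio above 1 would suggest the pair co-occurs more often than independence predicts; any real analysis would still need a significance test.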
Affiliation(s)
- Arindam Deb
  - Department of Biophysics Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, West Bengal, India
- Sudip Kundu
  - Department of Biophysics Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, West Bengal, India
  - Center of Excellence in Systems Biology and Biomedical Engineering (TEQIP Phase II), University of Calcutta, Kolkata, West Bengal, India
  - * E-mail:
|
15
|
Abe H, Gemmell NJ. Abundance, arrangement, and function of sequence motifs in the chicken promoters. BMC Genomics 2014; 15:900. [PMID: 25318583 PMCID: PMC4203960 DOI: 10.1186/1471-2164-15-900] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 04/04/2014] [Accepted: 10/08/2014] [Indexed: 01/01/2023]
Abstract
BACKGROUND Eukaryotic promoters are regions containing various sequence motifs necessary to control gene transcription. Much evidence has emerged showing that structural and/or contextual changes in regulatory elements can critically affect cis-regulatory activity. As sequence motifs can be key factors in maintaining complex promoter architectures, one effective approach to further understand the evolution of promoter regions in vertebrates is to compare the abundance and distribution patterns of sequence motifs in these regions between divergent species. When compared with mammals, the chicken (Gallus gallus) has a very different genome composition and sufficient genomic information to make it a good model for the exploration of promoter structure and evolution. RESULTS More than 10% of chicken genes contained short tandem repeats (STRs) in the region 2 kb upstream of promoters, but the total number of STRs observed in chicken is approximately half of that detected in human promoters. In terms of the STR motif frequencies, chicken promoter regions were more similar to other avian and mammalian promoters than these were to the entire chicken genome. Unlike other STRs, nearly half of the trinucleotide repeats found in promoters partly or entirely overlapped with CpG islands, indicating potential association with nucleosome positions. Moreover, the chicken promoters are rich in sequence motifs such as poly-A, poly-G and G-quadruplexes, especially in the core region, that are otherwise rare in the genome. Most sequence motifs showed strong functional enrichment for particular gene ontology (GO) categories, indicating roles in regulation of transcription and gene expression, as well as immune response and cognition. CONCLUSIONS Chicken promoter regions share some, but not all, of the structural features observed in mammalian promoters. The findings presented here provide empirical evidence suggesting that the frequencies and locations of STR motifs have been conserved through promoter evolution in a lineage-specific manner. Correlation analysis between GO categories and sequence motifs suggests motif-specific constraints acting on gene function.
Affiliation(s)
- Hideaki Abe
  - Department of Anatomy, University of Otago, Dunedin, New Zealand.
|
16
|
Liu J, Spulber M, Wu D, Talom RM, Palivan CG, Meier W. Poly(N-isopropylacrylamide-co-tris-nitrilotriacetic acid acrylamide) for a Combined Study of Molecular Recognition and Spatial Constraints in Protein Binding and Interactions. J Am Chem Soc 2014; 136:12607-14. [DOI: 10.1021/ja503632w] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Affiliation(s)
- Juan Liu
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, Basel 4056, Switzerland
- Mariana Spulber
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, Basel 4056, Switzerland
- Dalin Wu
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, Basel 4056, Switzerland
- Renee M. Talom
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, Basel 4056, Switzerland
- Cornelia G. Palivan
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, Basel 4056, Switzerland
- Wolfgang Meier
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, Basel 4056, Switzerland
|
17
|
Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 2013; 9:e1003840. [PMID: 24098147 PMCID: PMC3789834 DOI: 10.1371/journal.pgen.1003840] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Received: 05/28/2013] [Accepted: 08/14/2013] [Indexed: 11/19/2022]
Abstract
Understanding gene regulatory networks requires discovery of expression modules within gene co-expression networks and identification of promoter motifs and corresponding transcription factors that regulate their expression. A commonly used method for this purpose is a top-down approach based on clustering the network into a range of densely connected segments, treating these segments as expression modules, and extracting promoter motifs from these modules. Here, we describe a novel bottom-up approach to identify gene expression modules driven by known cis-regulatory motifs in the gene promoters. For a specific motif, genes in the co-expression network are ranked according to their probability of belonging to an expression module regulated by that motif. The ranking is conducted via motif enrichment or motif position bias analysis. Our results indicate that motif position bias analysis is an effective tool for genome-wide motif analysis. Sub-networks containing the top ranked genes are extracted and analyzed for inherent gene expression modules. This approach identified novel expression modules for the G-box, W-box, site II, and MYB motifs from an Arabidopsis thaliana gene co-expression network based on the graphical Gaussian model. The novel expression modules include those involved in house-keeping functions, primary and secondary metabolism, and abiotic and biotic stress responses. In addition to confirmation of previously described modules, we identified modules that include new signaling pathways. To associate transcription factors that regulate genes in these co-expression modules, we developed a novel reporter system. Using this approach, we evaluated MYB transcription factor-promoter interactions within MYB motif modules.
Gene co-expression networks unite genes with similar expression patterns. From these networks, gene co-expression modules can be identified. A specific family of transcription factor(s) may regulate the genes within a co-expression module. Thus, module identification is important to decipher the gene regulatory network. Previously, module identification relied on clustering the gene network into gene clusters that were then treated as modules. This represents a top-down approach. Here, we introduce a reverse approach aiming at identifying gene co-expression modules regulated by known promoter motifs. For a given promoter motif, we calculated the probability of each gene within the network to belong to a module regulated by that motif via motif enrichment analysis or motif position bias analysis. A sub-network containing the genes with a high probability of belonging to a motif-driven module was then extracted from the gene co-expression network. From this sub-network, the modular structure can be identified via visual inspection. Our bottom-up approach recovered many known and novel modules for the G-box, MYB, W-box and site II element motifs, whose expression may be regulated by the transcription factors that bind to these motifs. Additionally, we developed a rapid transcription factor-promoter interaction screening system to validate predicted interactions.
|
18
|
Mehrotra R, Sethi S, Zutshi I, Bhalothia P, Mehrotra S. Patterns and evolution of ACGT repeat cis-element landscape across four plant genomes. BMC Genomics 2013; 14:203. [PMID: 23530833 PMCID: PMC3622567 DOI: 10.1186/1471-2164-14-203] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 07/30/2012] [Accepted: 03/18/2013] [Indexed: 12/01/2022]
Abstract
BACKGROUND Transcription factor binding is regulated by several interactions, primarily involving cis-element binding. These binding sites maintain specificity by means of their sequence and additional factors such as inter-motif distance and spacer specificity. The ACGT core sequence has been established as a functionally important cis-element which frequently regulates gene expression in synergy with other cis-elements. In this study, we used two monocotyledonous species (Oryza sativa and Sorghum bicolor) and two dicotyledonous species (Arabidopsis thaliana and Glycine max) to analyze the conservation of co-occurring ACGT core elements in plant promoters with respect to the spacer distance between them. Using data generated from Arabidopsis thaliana and Oryza sativa, we also identified conserved regions across all spacers and possible conditions regulating gene promoters with multiple ACGT cis-elements. RESULTS Our data indicated specific predominant spacer lengths between co-occurring ACGT elements, but these lengths were not universally conserved across all species under analysis. However, the frequency distribution indicated local regions of high correlation among monocots and dicots. Sequence specificity data clearly revealed a preference for G at the first and C at the terminal position of a spacer sequence, suggesting that the G-box motif is the most prevalent for the ACGT class of promoters. Using gene expression databases, we also observed trends suggesting that co-occurring ACGT elements are responsible for gene regulation in response to exogenous stress. Conservation in patterns of ACGT (N) ACGT among orthologous genes also indicated the possibility that emergence of functional significance across species was a result of parallel evolution of these cis-elements. CONCLUSIONS Although the importance of ACGT elements has been acknowledged for several plant species, ours is the first study that attempts to compare their occurrence across four species and analyze conservation among them. The apparent preference for particular spacer distances suggests that these motifs might be implicated in important physiological functions which are yet to be identified. Variations in correlation patterns among monocots and dicots might arise from differences in transcriptional regulation in the two classes. In accordance with the literature, we established the involvement of co-occurring ACGT elements in stress responses and showed how this regulation differs with variation in the ACGT (N) ACGT motif. We believe that our study will be an essential resource in determining optimum spacer length and spacer sequence between ACGT elements for promoter design in future.
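The ACGT (N) ACGT pattern described above can be tallied with a short script. The following is a minimal sketch, assuming that only adjacent pairs of ACGT cores on one strand are of interest; the function name, the `max_spacer` cutoff, and the toy sequence are hypothetical and are not taken from the study.

```python
import re
from collections import Counter


def acgt_spacer_lengths(seq, max_spacer=30):
    """Count spacer lengths N between adjacent ACGT cores in one sequence."""
    # ACGT cannot overlap itself, so non-overlapping regex matches find every core.
    starts = [m.start() for m in re.finditer("ACGT", seq.upper())]
    counts = Counter()
    for left, right in zip(starts, starts[1:]):
        spacer = right - (left + 4)  # bases lying between the two 4-bp cores
        if 0 <= spacer <= max_spacer:
            counts[spacer] += 1
    return counts


# Toy promoter fragment, for illustration only.
toy_counts = acgt_spacer_lengths("GACGTCCACGTGACGT")
```

Summing such counters over all promoters of a genome would give a spacer-length frequency distribution of the kind compared across species in the abstract.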
Affiliation(s)
- Rajesh Mehrotra
  - Biological Sciences Department, Birla Institute of Technology and Science, Pilani, RJ, India
- Sachin Sethi
  - Biological Sciences Department, Birla Institute of Technology and Science, Pilani, RJ, India
- Ipshita Zutshi
  - Biological Sciences Department, Birla Institute of Technology and Science, Pilani, RJ, India
- Purva Bhalothia
  - Biological Sciences Department, Birla Institute of Technology and Science, Pilani, RJ, India
- Sandhya Mehrotra
  - Biological Sciences Department, Birla Institute of Technology and Science, Pilani, RJ, India
|
19
|
Vandenbon A, Kumagai Y, Teraguchi S, Amada KM, Akira S, Standley DM. A Parzen window-based approach for the detection of locally enriched transcription factor binding sites. BMC Bioinformatics 2013; 14:26. [PMID: 23331723 PMCID: PMC3602658 DOI: 10.1186/1471-2105-14-26] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/06/2012] [Accepted: 01/14/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Identification of cis- and trans-acting factors regulating gene expression remains an important problem in biology. Bioinformatics analyses of regulatory regions are hampered by several difficulties. One is that binding sites for regulatory proteins are often not significantly over-represented in the set of DNA sequences of interest, because of high levels of false positive predictions, and because of positional restrictions on functional binding sites with regard to the transcription start site. RESULTS We have developed a novel method for the detection of regulatory motifs based on their local over-representation in sets of regulatory regions. The method makes use of a Parzen window-based approach for scoring local enrichment, and during evaluation of significance it takes into account GC content of sequences. We show that the accuracy of our method compares favourably to that of other methods, and that our method is capable of detecting not only generally over-represented regulatory motifs, but also locally over-represented motifs that are often missed by standard motif detection approaches. Using a number of examples we illustrate the validity of our approach and suggest applications, such as the analysis of weaker binding sites. CONCLUSIONS Our approach can be used to suggest testable hypotheses for wet-lab experiments. It has potential for future analyses, such as the prediction of weaker binding sites. An online application of our approach, called LocaMo Finder (Local Motif Finder), is available at http://sysimm.ifrec.osaka-u.ac.jp/tfbs/locamo/.
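A Parzen window is a kernel density estimate. The sketch below shows only that general idea, applied to motif hit positions relative to the TSS with a Gaussian kernel. It is not the LocaMo Finder implementation: the kernel choice, the bandwidth, and the function names are assumptions, and the GC-content correction mentioned in the abstract is omitted.

```python
import math


def parzen_density(hit_positions, x, bandwidth=20.0):
    """Gaussian Parzen-window density estimate of motif hits at position x."""
    if not hit_positions:
        return 0.0
    norm = 1.0 / (len(hit_positions) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in hit_positions
    )


def density_profile(hit_positions, start, end, step=10, bandwidth=20.0):
    """Evaluate the density on a grid of positions relative to the TSS."""
    return [
        (x, parzen_density(hit_positions, x, bandwidth))
        for x in range(start, end + 1, step)
    ]


# Hypothetical hit positions (bp relative to the TSS): a cluster near -50 and one outlier.
toy_hits = [-60, -55, -50, -48, -300]
toy_profile = density_profile(toy_hits, -400, 0)
```

Comparing such a profile between a foreground and a background sequence set is one way a locally enriched motif could stand out even when its overall count is unremarkable.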
Affiliation(s)
- Alexis Vandenbon
  - Laboratory of Systems Immunology, Immunology Frontier Research Center, Osaka University, Osaka, Japan.
|
20
|
Mehdi AM, Sehgal MSB, Kobe B, Bailey TL, Bodén M. DLocalMotif: a discriminative approach for discovering local motifs in protein sequences. Bioinformatics 2012; 29:39-46. [PMID: 23142965 DOI: 10.1093/bioinformatics/bts654] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
MOTIVATION Local motifs are patterns of DNA or protein sequences that occur within a sequence interval relative to a biologically defined anchor or landmark. Current protein motif discovery methods do not adequately consider such constraints to identify biologically significant motifs that are only weakly over-represented but spatially confined. Using negatives, i.e. sequences known to not contain a local motif, can further increase the specificity of their discovery. RESULTS This article introduces the method DLocalMotif that makes use of positional information and negative data for local motif discovery in protein sequences. DLocalMotif combines three scoring functions, measuring degrees of motif over-representation, entropy and spatial confinement, specifically designed to discriminatively exploit the availability of negative data. The method is shown to outperform current methods that use only a subset of these motif characteristics. We apply the method to several biological datasets. The analysis of peroxisomal targeting signals uncovers several novel motifs that occur immediately upstream of the dominant peroxisomal targeting signal-1 signal. The analysis of proline-tyrosine nuclear localization signals uncovers multiple novel motifs that overlap with C2H2 zinc finger domains. We also evaluate the method on classical nuclear localization signals and endoplasmic reticulum retention signals and find that DLocalMotif successfully recovers biologically relevant sequence properties. AVAILABILITY http://bioinf.scmb.uq.edu.au/dlocalmotif/
Affiliation(s)
- Ahmed M Mehdi
  - Institute for Molecular Bioscience, The University of Queensland, Australia
|
21
|
Ma S, Bachan S, Porto M, Bohnert HJ, Snyder M, Dinesh-Kumar SP. Discovery of stress responsive DNA regulatory motifs in Arabidopsis. PLoS One 2012; 7:e43198. [PMID: 22912824 PMCID: PMC3418279 DOI: 10.1371/journal.pone.0043198] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 06/20/2012] [Accepted: 07/17/2012] [Indexed: 11/25/2022]
Abstract
The discovery of DNA regulatory motifs in sequenced genomes using computational methods remains challenging. Here, we present MotifIndexer, a comprehensive strategy for de novo identification of DNA regulatory motifs at a genome level. Using word-counting methods, we indexed the existence of every 8-mer oligo composed of bases A, C, G, T, r, y, s, w, m, k, n or 12-mer oligo composed of A, C, G, T, n, in the promoters of all predicted genes of the Arabidopsis thaliana genome and of selected stress-induced co-expressed genes. From this analysis, we identified a number of over-represented motifs. Among these, major critical motifs were identified using a position filter. We used a model based on uniform distribution and the z-scores derived from this model to describe position bias. Interestingly, many motifs showed position bias towards the transcription start site. We extended this model to show biased distribution of motifs in the genomes of both A. thaliana and rice. We also used MotifIndexer to identify conserved motifs in co-expressed gene groups from two Arabidopsis species, A. thaliana and A. lyrata. This new comparative genomics method does not depend on alignments of homologous gene promoter sequences.
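The abstract mentions word counting and a z-score against a uniform-distribution model but gives no formula. The sketch below is one plausible reading, not MotifIndexer itself: it matches exact k-mers only (no degenerate bases), assumes all promoters have equal length with the TSS at the 3' end, and scores one positional bin with a binomial z-score. All names and the binning scheme are hypothetical.

```python
import math


def kmer_positions(promoters, kmer):
    """Start positions of kmer across all promoters (overlapping matches included)."""
    positions = []
    for seq in promoters:
        i = seq.find(kmer)
        while i != -1:
            positions.append(i)
            i = seq.find(kmer, i + 1)
    return positions


def position_bias_z(positions, promoter_len, k, n_bins=10, bin_index=-1):
    """z-score of the hit count in one bin versus uniform placement of hits."""
    n = len(positions)
    if n == 0:
        return 0.0
    n_starts = promoter_len - k + 1  # number of valid start positions
    bin_width = n_starts / n_bins
    target = bin_index % n_bins  # -1 selects the bin closest to the TSS
    observed = sum(
        1 for p in positions if min(int(p / bin_width), n_bins - 1) == target
    )
    p_bin = 1.0 / n_bins  # probability of landing in any one bin under uniformity
    return (observed - n * p_bin) / math.sqrt(n * p_bin * (1.0 - p_bin))


# Hypothetical 8-mer start positions in 100-bp promoters, all close to the TSS.
toy_z = position_bias_z([88, 90, 91, 92], promoter_len=100, k=8)
```

A large positive score for the TSS-proximal bin would correspond to the position bias towards the transcription start site reported in the abstract.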
Affiliation(s)
- Shisong Ma
  - Department of Plant Biology and the Genome Center, College of Biological Sciences, University of California Davis, Davis, California, United States of America
  - * E-mail: (SPD-K); (SM)
- Shawn Bachan
  - Department of Plant Biology and the Genome Center, College of Biological Sciences, University of California Davis, Davis, California, United States of America
- Matthew Porto
  - Department of Plant Biology and the Genome Center, College of Biological Sciences, University of California Davis, Davis, California, United States of America
- Hans J. Bohnert
  - Departments of Plant Biology and Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Michael Snyder
  - Department of Genetics, Stanford University, Stanford, California, United States of America
- Savithramma P. Dinesh-Kumar
  - Department of Plant Biology and the Genome Center, College of Biological Sciences, University of California Davis, Davis, California, United States of America
  - * E-mail: (SPD-K); (SM)
|
22
|
Huang Q, Gong C, Li J, Zhuo Z, Chen Y, Wang J, Hua ZC. Distance and helical phase dependence of synergistic transcription activation in cis-regulatory module. PLoS One 2012; 7:e31198. [PMID: 22299056 PMCID: PMC3267773 DOI: 10.1371/journal.pone.0031198] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 07/13/2011] [Accepted: 01/03/2012] [Indexed: 01/21/2023]
Abstract
The spatial and stereospecific constraints on synergistic transcription activation mediated between activators bound to cis-regulatory elements are important for understanding gene regulation but remain largely unknown. It has been commonly believed that two activators will activate transcription most effectively when they are bound on the same face of the DNA double helix and within a boundary distance from the transcription initiation complex attached to the TATA box. In this work, we studied the spatial and stereospecific constraints on activation by multiple copies of bound model activators using a series of engineered relative distances and stereospecific orientations. We observed that multiple copies of the activators GAL4-VP16 and ZEBRA bound to engineered promoters activated transcription more effectively when bound on opposite faces of the DNA double helix. This phenomenon was not affected by the spatial relationship between the proximal activator and initiation complex. To explain these results, we proposed the novel concentration field model, which posits that the effective concentration of bound activators, and therefore the transcription activation potential, is affected by their stereospecific positioning. These results could be used to understand synergistic transcription activation anew and to aid the development of predictive models for the identification of cis-regulatory elements.
Affiliation(s)
- Qilai Huang
  - The State Key Laboratory of Pharmaceutical Biotechnology and Affiliated Stomatological Hospital, Nanjing University, Nanjing, People's Republic of China
  - The State Key Laboratory of Quality Research in Chinese Medicine and Macau Institute for Applied Research in Medicine, Macau University of Science and Technology, Macau, People's Republic of China
  - Changzhou High-Tech Research Institute of Nanjing University and Jiangsu TargetPharma Laboratories Inc., Changzhou, People's Republic of China
- Chenguang Gong
  - The State Key Laboratory of Pharmaceutical Biotechnology and Affiliated Stomatological Hospital, Nanjing University, Nanjing, People's Republic of China
- Jiahuang Li
  - The State Key Laboratory of Pharmaceutical Biotechnology and Affiliated Stomatological Hospital, Nanjing University, Nanjing, People's Republic of China
- Zhu Zhuo
  - The State Key Laboratory of Pharmaceutical Biotechnology and Affiliated Stomatological Hospital, Nanjing University, Nanjing, People's Republic of China
- Yuan Chen
  - The State Key Laboratory of Pharmaceutical Biotechnology and Affiliated Stomatological Hospital, Nanjing University, Nanjing, People's Republic of China
- Jin Wang
  - The State Key Laboratory of Pharmaceutical Biotechnology and Affiliated Stomatological Hospital, Nanjing University, Nanjing, People's Republic of China
  - * E-mail: (JW); (ZH)
- Zi-Chun Hua
  - The State Key Laboratory of Pharmaceutical Biotechnology and Affiliated Stomatological Hospital, Nanjing University, Nanjing, People's Republic of China
  - The State Key Laboratory of Quality Research in Chinese Medicine and Macau Institute for Applied Research in Medicine, Macau University of Science and Technology, Macau, People's Republic of China
  - Changzhou High-Tech Research Institute of Nanjing University and Jiangsu TargetPharma Laboratories Inc., Changzhou, People's Republic of China
  - * E-mail: (JW); (ZH)
|
23
|
Ma X, Kulkarni A, Zhang Z, Xuan Z, Serfling R, Zhang MQ. A highly efficient and effective motif discovery method for ChIP-seq/ChIP-chip data using positional information. Nucleic Acids Res 2012; 40:e50. [PMID: 22228832 PMCID: PMC3326300 DOI: 10.1093/nar/gkr1135] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 01/25/2023]
Abstract
Identification of DNA motifs from ChIP-seq/ChIP-chip [chromatin immunoprecipitation (ChIP)] data is a powerful method for understanding the transcriptional regulatory network. However, most established methods are designed for small sample sizes and are inefficient for ChIP data. Here we propose a new k-mer occurrence model to reflect the fact that functional DNA k-mers often cluster around ChIP peak summits. With this model, we introduced a new measure to discover functional k-mers. Using simulation, we demonstrated that our method is more robust against noise in ChIP data than available methods. A novel word clustering method is also implemented to group similar k-mers into position weight matrices (PWMs). Our method was applied to a diverse set of ChIP experiments to demonstrate its high sensitivity and specificity. Importantly, our method is much faster than several other methods for large sample sizes. Thus, we have developed an efficient and effective motif discovery method for ChIP experiments.
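The observation that functional k-mers cluster around peak summits can be illustrated with a very simple statistic. The sketch below is not the authors' k-mer occurrence model or their measure: it just reports the fraction of exact k-mer occurrences whose center lies within a window of the summit, assuming each input sequence is centered on its summit. The function name, the window size, and the toy sequences are hypothetical.

```python
def summit_concentration(peak_seqs, kmer, window=25):
    """Fraction of k-mer occurrences centered within `window` bp of the summit.

    Each sequence is assumed to be centered on its ChIP peak summit.
    """
    near = total = 0
    k = len(kmer)
    for seq in peak_seqs:
        summit = len(seq) / 2.0
        i = seq.find(kmer)
        while i != -1:
            total += 1
            if abs(i + k / 2.0 - summit) <= window:
                near += 1
            i = seq.find(kmer, i + 1)
    return near / total if total else 0.0


# Toy peaks: one occurrence at the summit, one far from it.
toy_peaks = ["A" * 40 + "CACGTG" + "A" * 40, "CACGTG" + "A" * 80]
toy_fraction = summit_concentration(toy_peaks, "CACGTG")
```

Under uniform placement, the expected fraction is roughly the window span divided by the sequence length, so values well above that baseline would hint at summit clustering.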
Affiliation(s)
- Xiaotu Ma
  - Department of Molecular and Cell Biology, Center for Systems Biology, University of Texas at Dallas, 800 W. Campbell Road, Richardson, TX 75080, USA
|
24
|
Promoter microsatellites as modulators of human gene expression. Adv Exp Med Biol 2012; 769:41-54. [PMID: 23560304 DOI: 10.1007/978-1-4614-5434-2_4] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 01/06/2023]
Abstract
Microsatellites in and around genes have been shown to modulate levels of gene expression in multiple organisms, ranging from bacteria to humans. Here we will discuss promoter microsatellites known to modulate gene expression, with a few key examples related to the human brain. Many of the microsatellites we discuss are highly conserved in mammals, indicating that selection may favor their retention as "tuning knobs" of gene expression. We will also discuss the mechanisms by which microsatellites in promoters can alter gene expression as they expand and contract, with particular attention to secondary structures like Z-DNA and H-DNA. We suggest that promoter microsatellites, especially those that are highly conserved, may be an important source of human phenotypic variation.
|
25
|
Rao XJ, Xu XX, Yu XQ. Manduca sexta moricin promoter elements can increase promoter activities of Drosophila melanogaster antimicrobial peptide genes. Insect Biochem Mol Biol 2011; 41:982-92. [PMID: 22005212 PMCID: PMC3210862 DOI: 10.1016/j.ibmb.2011.09.007] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 07/01/2011] [Revised: 09/19/2011] [Accepted: 09/30/2011] [Indexed: 05/13/2023]
Abstract
Insects produce a variety of antimicrobial peptides (AMPs). Induction of insect AMP genes is regulated by the Toll and IMD (immune deficiency) pathways via NF-κB and GATA factors. Little is known about species-specific regulation of AMP genes. In this report, we showed that activities of most Manduca sexta and Drosophila melanogaster AMP gene promoters were regulated in a species-specific manner in Drosophila (Dipteran) S2 cells and Spodoptera frugiperda (Lepidopteran) Sf9 cells. A κB-GATA element (22 bp) from M. sexta moricin (MsMoricin) promoter could significantly increase activities of Drosophila AMP gene promoters in S2 cells, and an MsMoricin promoter activating element (MPAE) (140 bp) could increase activity of drosomycin promoter specifically in Sf9 cells. However, κB and GATA factors alone were not sufficient for MsMoricin gene activation, suggesting that other co-regulators may be required to fully activate AMP genes. Our results suggest that induction of insect AMP genes may require a transcription complex composed of common nuclear factors (such as NF-κB and GATA factors) and species-related co-regulators, and it is the co-regulators that may confer species-specific regulation of AMP genes. In addition, we showed that activity of Drosophila drosomycin promoter could be activated cooperatively by the inserted exogenous κB-GATA element and the endogenous κB element. These findings revealed an approach of engineering AMP genes with enhanced activities, which may lead to broad applications.
Affiliation(s)
- Xiao-Qiang Yu
  - Send correspondence to: Xiao-Qiang Yu, PhD, Division of Cell Biology and Biophysics, School of Biological Sciences, University of Missouri-Kansas City, 5007 Rockhill Road, Kansas City, MO 64110, Telephone: (816)-235-6379, Fax: (816)-235-1503,
|
26
|
Kranz AL, Eils R, König R. Enhancers regulate progression of development in mammalian cells. Nucleic Acids Res 2011; 39:8689-702. [PMID: 21785139 PMCID: PMC3203619 DOI: 10.1093/nar/gkr602] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
Abstract
During development and differentiation of an organism, accurate gene regulation is central for cells to maintain and balance their differentiation processes. Transcriptional interactions between cis-acting DNA elements such as promoters and enhancers are the basis for precise and balanced transcriptional regulation. We identified modules of combinations of binding sites in proximal and distal regulatory regions upstream of all transcription start sites (TSSs) in silico and applied these modules to gene expression time-series of mouse embryonic development and differentiation of human stem cells. In addition to tissue-specific regulation controlled by combinations of transcription factors (TFs) binding at promoters, we observed that in particular the combination of TFs binding at promoters together with TFs binding at the respective enhancers regulate highly specifically temporal progression during development: whereas 40% of TFs were specific for time intervals, 79% of TF pairs and even 97% of promoter-enhancer modules showed specificity for single time intervals of the human stem cells. Predominantly SP1 and E2F contributed to temporal specificity at promoters and the forkhead (FOX) family of TFs at enhancer regions. Altogether, we characterized three classes of TFs: with binding sites being enriched at the TSS (like SP1), depleted at the TSS (like FOX), and rather uniformly distributed.
Affiliation(s)
- Anna-Lena Kranz
- Department of Bioinformatics and Functional Genomics, Institute of Pharmacy and Molecular Biotechnology, and Bioquant, University of Heidelberg, INF 267, 69120 Heidelberg, Germany
|
27
|
Qin J, Li MJ, Wang P, Zhang MQ, Wang J. ChIP-Array: combinatory analysis of ChIP-seq/chip and microarray gene expression data to discover direct/indirect targets of a transcription factor. Nucleic Acids Res 2011; 39:W430-6. [PMID: 21586587 PMCID: PMC3125757 DOI: 10.1093/nar/gkr332] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Indexed: 12/14/2022] Open
Abstract
Chromatin immunoprecipitation (ChIP) coupled with high-throughput techniques (ChIP-X), such as next generation sequencing (ChIP-Seq) and microarray (ChIP–chip), has been successfully used to map active transcription factor binding sites (TFBS) of a transcription factor (TF). The targeted genes can be activated or suppressed by the TF, or are unresponsive to the TF. Microarray technology has been used to measure the actual expression changes of thousands of genes under the perturbation of a TF, but is unable to determine if the affected genes are direct or indirect targets of the TF. Furthermore, both ChIP-X and microarray methods produce a large number of false positives. Combining microarray expression profiling and ChIP-X data allows more effective TFBS analysis for studying the function of a TF. However, current web servers only provide tools to analyze either ChIP-X or expression data, but not both. Here, we present ChIP-Array, a web server that integrates ChIP-X and expression data from human, mouse, yeast, fruit fly and Arabidopsis. This server will assist biologists to detect direct and indirect target genes regulated by a TF of interest and to aid in the functional characterization of the TF. ChIP-Array is available at http://jjwanglab.hku.hk/ChIP-Array, with free access to academic users.
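The core idea of combining binding and expression evidence can be sketched with plain set operations. This is only a minimal illustration of that idea, not the ChIP-Array server's actual procedure; the function name, the category labels, and the gene identifiers are hypothetical.

```python
def classify_targets(bound_genes, responsive_genes):
    """Split genes by ChIP-X binding and expression response to a TF.

    bound_genes: genes with a TF binding site nearby (from ChIP-X data).
    responsive_genes: genes whose expression changes when the TF is perturbed.
    The three categories are illustrative labels, not the server's output.
    """
    bound = set(bound_genes)
    responsive = set(responsive_genes)
    return {
        # bound and responsive: candidate direct targets
        "direct": bound & responsive,
        # responsive but not bound: candidate indirect targets
        "indirect_candidates": responsive - bound,
        # bound but unresponsive: binding without a measured effect
        "bound_unresponsive": bound - responsive,
    }


# Hypothetical gene identifiers, for illustration only.
result = classify_targets(
    bound_genes=["geneA", "geneB", "geneC"],
    responsive_genes=["geneB", "geneC", "geneD"],
)
print(sorted(result["direct"]))
```

Keeping the three categories separate mirrors the abstract's point that bound genes can be activated, suppressed, or unresponsive.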
Affiliation(s)
- Jing Qin
- Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, 21 Sassoon Road, Hong Kong SAR, China
|
28
|
Kocha KM, Genge CE, Moyes CD. Origins of interspecies variation in mammalian muscle metabolic enzymes. Physiol Genomics 2011; 43:873-83. [PMID: 21586671 DOI: 10.1152/physiolgenomics.00025.2011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022] Open
Abstract
Do the transcriptional mechanisms that control an individual's mitochondrial content, PGC1α (peroxisome proliferator-activated receptor γ coactivator-1α) and NRF1 (nuclear respiratory factor-1), also cause differences between species? We explored the determinants of cytochrome c oxidase (COX) activities in muscles from 12 rodents differing 1,000-fold in mass. Hindlimb muscles differed in scaling patterns from isometric (soleus, gastrocnemius) to allometric (tibialis anterior, scaling coefficient = -0.16). Consideration of myonuclear domain reduced the differences within species, but interspecies differences remained. For tibialis anterior, there was no significant scaling relationship in mRNA/g for COX4-1, PGC1α, or NRF1, yet COX4-1 mRNA/g was a good predictor of COX activity (r² = 0.55), PGC1α and NRF1 mRNA correlated with each other (r² = 0.42), and both could predict COX4-1 mRNA (r² = 0.48 and 0.52) and COX activity (r² = 0.55 and 0.49). This paradox was resolved by multivariate analysis, which explained 90% of interspecies variation, about equally partitioned between mass effects and PGC1α (or NRF1) mRNA levels, independent of mass. To explore the determinants of PGC1α mRNA, we analyzed 52 mammalian PGC1α proximal promoters and found no size dependence in regulatory element distribution. Likewise, the activity of PGC1α promoter reporter genes from 30 mammals showed no significant relationship with body mass. Collectively, these studies suggest that not all muscles scale equivalently, but for those that show allometric scaling, transcriptional regulation of the master regulators, PGC1α and NRF1, does not account for scaling patterns, though it does contribute to interspecies differences in COX activities independent of mass.
Affiliation(s)
- K M Kocha
- Department of Biology, Queen's University, Kingston, Ontario, Canada
|
29
|
Rouault M, Nielsen DA, Ho A, Kreek MJ, Yuferov V. Cell-specific effects of variants of the 68-base pair tandem repeat on prodynorphin gene promoter activity. Addict Biol 2011; 16:334-46. [PMID: 20731629 DOI: 10.1111/j.1369-1600.2010.00248.x] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 11/29/2022]
Abstract
A polymorphic 68-bp tandem repeat has been identified within the promoter of the human prodynorphin (PDYN) gene. We found that this 68-bp repeat in the PDYN promoter occurs naturally up to five times. We studied the effect of the number of 68-bp repeats, and of a SNP (rs61761346) found within the repeat, on PDYN gene promoter activity. Thirteen promoter forms, different naturally occurring combinations of repeats and the internal SNP, were cloned upstream of the luciferase reporter gene and transfected into human SK-N-SH, H69, or HEK293 cells. Cells were then stimulated with TPA or caffeine. We found cell-specific effects of the number of 68-bp repeats on the transcriptional activity of the PDYN promoter. In SK-N-SH and H69 cells, three or four repeats led to lower expression of luciferase than did one or two repeats. The opposite effect was found in HEK293 cells. The SNP also had an effect on PDYN gene expression in both SK-N-SH and H69 cells; promoter forms with the A allele had significantly higher expression than promoter forms with the G allele. These results further our understanding of the complex transcriptional regulation of the PDYN gene promoter.
Affiliation(s)
- Morgane Rouault
- The Laboratory of the Biology of Addictive Diseases, The Rockefeller University, USA
|
30
|
Cserháti M, Turóczy Z, Zombori Z, Cserzo M, Dudits D, Pongor S, Györgyey J. Prediction of new abiotic stress genes in Arabidopsis thaliana and Oryza sativa according to enumeration-based statistical analysis. Mol Genet Genomics 2011; 285:375-91. [PMID: 21437642 DOI: 10.1007/s00438-011-0605-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/03/2010] [Accepted: 01/31/2011] [Indexed: 10/18/2022]
Abstract
Plants undergo an extensive change in gene regulation during abiotic stress. It is of great agricultural importance to know which genes are affected during stress response. The genome sequences of a number of plant species have been determined, among them Arabidopsis and Oryza sativa, whose genomes are the most completely annotated to date and which are well-known organisms widely used as experimental systems. This paper applies a statistical algorithm for predicting new stress-induced motifs and genes by analyzing promoter sets co-regulated by abiotic stress in these two species. After identifying characteristic putative regulatory motif sequence pairs (dyads) in the promoters of 125 stress-regulated Arabidopsis genes and 87 O. sativa genes, these dyads were used to screen the entire Arabidopsis and O. sativa promoteromes to find related stress-induced genes whose promoters contained a large number of these dyads. We were able to predict a number of putative dyads characteristic of a large number of stress-regulated genes; some of them are newly discovered by our algorithm and serve as putative transcription factor binding sites. Our new motif prediction algorithm comes complete with a stand-alone program and may be used for motif discovery in other species in the future. The more than 1,200 Arabidopsis and 1,700 Oryza sativa genes found by our algorithm are good candidates for further experimental studies in abiotic stress.
Affiliation(s)
- Mátyás Cserháti
- Biological Research Center, Institute of Plant Biology, Hungarian Academy of Sciences, P.O. BOX 521, Temesvári Krt. 62, 6701 Szeged, Hungary.
|
31
|
Soccio RE, Tuteja G, Everett LJ, Li Z, Lazar MA, Kaestner KH. Species-specific strategies underlying conserved functions of metabolic transcription factors. Mol Endocrinol 2011; 25:694-706. [PMID: 21292830 DOI: 10.1210/me.2010-0454] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Indexed: 12/23/2022] Open
Abstract
The winged helix protein FOXA2 and the nuclear receptor peroxisome proliferator-activated receptor-γ (PPARγ) are highly conserved, regionally expressed transcription factors (TFs) that regulate networks of genes controlling complex metabolic functions. Cistrome analysis for Foxa2 in mouse liver and PPARγ in mouse adipocytes has previously produced consensus-binding sites that are nearly identical to those used by the corresponding TFs in human cells. We report here that, despite the conservation of the canonical binding motif, the great majority of binding regions for FOXA2 in human liver and for PPARγ in human adipocytes are not in the orthologous locations corresponding to the mouse genome, and vice versa. Of note, TF binding can be absent in one species despite sequence conservation, including motifs that do support binding in the other species, demonstrating a major limitation of in silico binding site prediction. Whereas only approximately 10% of binding sites are conserved, gene-centric analysis reveals that about 50% of genes with nearby TF occupancy are shared across species for both hepatic FOXA2 and adipocyte PPARγ. Remarkably, for both TFs, many of the shared genes function in tissue-specific metabolic pathways, whereas species-unique genes fail to show enrichment for these pathways. Nonetheless, the species-unique genes, like the shared genes, showed the expected transcriptional regulation by the TFs in loss-of-function experiments. Thus, species-specific strategies underlie the biological functions of metabolic TFs that are highly conserved across mammalian species. Analysis of factor binding in multiple species may be necessary to distinguish apparent species-unique noise and reveal functionally relevant information.
Affiliation(s)
- Raymond E Soccio
- Division of Endocrinology, Diabetes, and Metabolism, Department of Medicine, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104-6149, USA
|
32
|
van Heeringen SJ, Akhtar W, Jacobi UG, Akkers RC, Suzuki Y, Veenstra GJC. Nucleotide composition-linked divergence of vertebrate core promoter architecture. Genome Res 2011; 21:410-21. [PMID: 21284373 DOI: 10.1101/gr.111724.110] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 12/27/2022]
Abstract
Transcription initiation involves the recruitment of basal transcription factors to the core promoter. A variety of core promoter elements exists; however for most of these motifs, the distribution across species is unknown. Here we report on the comparison of human and amphibian promoter sequences. We have used oligo-capping in combination with deep sequencing to determine transcription start sites in Xenopus tropicalis. To systematically predict regulatory elements, we have developed a de novo motif finding pipeline using an ensemble of computational tools. A comprehensive comparison of human and amphibian promoter sequences revealed both similarities and differences in core promoter architecture. Some of the differences stem from a highly divergent nucleotide composition of Xenopus and human promoters. Whereas the distribution of some core promoter motifs is conserved independently of species-specific nucleotide bias, the frequency of another class of motifs correlates with the single nucleotide frequencies. This class includes the well-known TATA box and SP1 motifs, which are more abundant in Xenopus and human promoters, respectively. While these motifs are enriched above the local nucleotide background in both organisms, their frequency varies in step with this background. These differences are likely adaptive as these motifs can recruit TFIID to either CpG island or sharply initiating promoters. Our results highlight both the conserved and diverged aspects of vertebrate transcription, most notably showing co-opted motif usage to recruit the transcriptional machinery to promoters with diverging nucleotide composition. This shows how sweeping changes in nucleotide composition are compatible with highly conserved mechanisms of transcription initiation.
Affiliation(s)
- Simon J van Heeringen
- Radboud University Nijmegen, Department of Molecular Biology, Faculty of Science, Nijmegen Centre for Molecular Life Sciences, 6500 HB Nijmegen, The Netherlands
|
33
|
Yokoyama KD, Thorne JL, Wray GA. Coordinated genome-wide modifications within proximal promoter cis-regulatory elements during vertebrate evolution. Genome Biol Evol 2010; 3:66-74. [PMID: 21118975 PMCID: PMC3021792 DOI: 10.1093/gbe/evq078] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 01/05/2023] Open
Abstract
There often exists a "one-to-many" relationship between a transcription factor and a multitude of binding sites throughout the genome. It is commonly assumed that transcription factor binding motifs remain largely static over the course of evolution because changes in binding specificity can alter the interactions with potentially hundreds of sites across the genome. Focusing on regulatory motifs overrepresented at specific locations within or near the promoter, we find that a surprisingly large number of cis-regulatory elements have been subject to coordinated genome-wide modifications during vertebrate evolution, such that the motif frequency changes on a single branch of vertebrate phylogeny. This was found to be the case even between closely related mammal species, with nearly a third of all location-specific consensus motifs exhibiting significant modifications within the human or mouse lineage since their divergence. Many of these modifications are likely to be compensatory changes throughout the genome following changes in protein factor binding affinities, whereas others may be due to changes in mutation rates or effective population size. The likelihood that this happened many times during vertebrate evolution highlights the need to examine additional taxa and to understand the evolutionary and molecular mechanisms underlying the evolution of protein-DNA interactions.
|
34
|
Carstensen L, Sandelin A, Winther O, Hansen NR. Multivariate Hawkes process models of the occurrence of regulatory elements. BMC Bioinformatics 2010; 11:456. [PMID: 20828413 PMCID: PMC2949889 DOI: 10.1186/1471-2105-11-456] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 03/01/2010] [Accepted: 09/09/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: A central question in molecular biology is how transcriptional regulatory elements (TREs) act in combination. Recent high-throughput data provide us with the location of multiple regulatory regions for multiple regulators, and thus with the possibility of analyzing the multivariate distribution of the occurrences of these TREs along the genome. RESULTS: We present a model of TRE occurrences known as the Hawkes process. We illustrate the use of this model by analyzing two different publicly available data sets. We are able to model, in detail, how the occurrence of one TRE is affected by the occurrences of others, and we can test a range of natural hypotheses about the dependencies among the TRE occurrences. In contrast to earlier efforts, pre-processing steps such as clustering or binning are not needed, and we thus retain information about the dependencies among the TREs that is otherwise lost. For each of the two data sets we provide two results: first, a qualitative description of the dependencies among the occurrences of the TREs, and second, quantitative results on the favored or avoided distances between the different TREs. CONCLUSIONS: The Hawkes process is a novel way of modeling the joint occurrences of multiple TREs along the genome that is capable of providing new insights into dependencies among elements involved in transcriptional regulation. The method is available as an R package from http://www.math.ku.dk/~richard/ppstat/.
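To make the model class concrete, the sketch below evaluates the conditional intensity of a multivariate Hawkes process in its standard form: a baseline rate for element type k plus a kernel contribution from every earlier occurrence of every element type. The exponential kernel, the parameter values, and the function name are assumptions chosen for brevity; they are not taken from the paper or from the ppstat package.

```python
import math


def hawkes_intensity(k, t, events, baseline, alpha, beta):
    """Intensity of element type k at genomic position t.

    events[m] holds the positions of element type m along the genome.
    alpha[k][m] scales the influence of type m on type k through an
    assumed exponential kernel g(d) = alpha[k][m] * exp(-beta * d).
    """
    rate = baseline[k]
    for m, positions in enumerate(events):
        for p in positions:
            if p < t:  # only occurrences before t contribute
                rate += alpha[k][m] * math.exp(-beta * (t - p))
    return rate


# Toy data: two element types with made-up positions and parameters.
events = [[100.0, 250.0], [120.0]]
baseline = [0.01, 0.02]
alpha = [[0.0, 0.5], [0.3, 0.0]]
print(hawkes_intensity(0, 130.0, events, baseline, alpha, beta=0.1))
```

A positive alpha[k][m] corresponds to a favored distance between the two element types and a value near zero to independence; modeling avoided distances would need a kernel that can also lower the rate, which this simple sketch does not cover.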
Affiliation(s)
- Lisbeth Carstensen
- Department of Mathematical Sciences, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen Ø, Denmark
|
35
|
Abstract
MOTIVATION: Discovery of nucleotide motifs that are localized with respect to a certain biological landmark is important in several applications, such as in regulatory sequences flanking the transcription start site, in the neighborhood of known transcription factor binding sites, and in transcription factor binding regions discovered by massively parallel sequencing (ChIP-Seq). RESULTS: We report an algorithm called LocalMotif to discover such localized motifs. The algorithm is based on a novel scoring function, called spatial confinement score, which can determine the exact interval of localization of a motif. This score is combined with other existing scoring measures including over-representation and relative entropy to determine the overall prominence of the motif. The approach successfully discovers biologically relevant motifs and their intervals of localization in scenarios where the motifs cannot be discovered by general motif finding tools. It is especially useful for discovering multiple co-localized motifs in a set of regulatory sequences, such as those identified by ChIP-Seq. AVAILABILITY AND IMPLEMENTATION: The LocalMotif software is available at http://www.comp.nus.edu.sg/~bioinfo/LocalMotif.
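Of the scoring measures named in this abstract, relative entropy has a standard textbook form that is easy to sketch: the per-column Kullback-Leibler divergence between observed base frequencies and a background distribution, summed over the motif. The code below illustrates only that general measure under an assumed uniform background; it does not reproduce LocalMotif's spatial confinement score or its combined scoring, and the function name is hypothetical.

```python
import math


def relative_entropy(aligned_sites, background=None):
    """Sum over columns of sum_b f(b) * log2(f(b) / q(b)), in bits.

    aligned_sites: equal-length DNA strings, one per motif occurrence.
    background: base -> probability; uniform if omitted (an assumption).
    """
    bases = "ACGT"
    if background is None:
        background = {b: 0.25 for b in bases}
    total = 0.0
    for j in range(len(aligned_sites[0])):
        column = [site[j] for site in aligned_sites]
        for b in bases:
            freq = column.count(b) / len(column)
            if freq > 0:  # a zero frequency contributes nothing
                total += freq * math.log2(freq / background[b])
    return total


# Made-up sites: a strongly conserved TATA-like motif.
print(relative_entropy(["TATAAA", "TATAAA", "TATATA", "TATAAA"]))
```

Under a uniform background a fully conserved column contributes 2 bits and a column matching the background contributes 0.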
Affiliation(s)
- Vipin Narang
- Department of Computer Science, National University of Singapore, Singapore
|
36
|
Algorithms and methods for correlating experimental results with annotation databases. Methods Mol Biol 2010; 593:315-40. [PMID: 19957156 DOI: 10.1007/978-1-60327-194-3_15] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/05/2023]
Abstract
An important procedure in biomedical research is the detection of genes that are differentially expressed under pathologic conditions. These genes, or at least a subset of them, are key biomarkers and are thought to be important to describe and understand the analyzed biological system (the pathology) at a molecular level. To obtain this understanding, it is indispensable to link those genes to biological knowledge stored in databases. Ontological analysis is nowadays a standard procedure to analyze large gene lists. By detecting enriched and depleted gene properties and functions, important insights on the biological system can be obtained. In this chapter, we will give a brief survey of the general layout of the methods used in an ontological analysis and of the most important tools that have been developed.
|
37
|
Vandenbon A, Nakai K. Modeling tissue-specific structural patterns in human and mouse promoters. Nucleic Acids Res 2009; 38:17-25. [PMID: 19850720 PMCID: PMC2800225 DOI: 10.1093/nar/gkp866] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 12/12/2022] Open
Abstract
Sets of genes expressed in the same tissue are believed to be under the regulation of a similar set of transcription factors, and can thus be assumed to contain similar structural patterns in their regulatory regions. Here we present a study of the structural patterns in promoters of genes expressed specifically in 26 human and 34 mouse tissues. For each tissue we constructed promoter structure models, taking into account presences of motifs, their positioning to the transcription start site, and pairwise positioning of motifs. We found that 35 out of 60 models (58%) were able to distinguish positive test promoter sequences from control promoter sequences with statistical significance. Models with high performance include those for liver, skeletal muscle, kidney and tongue. Many of the important structural patterns in these models involve transcription factors of known importance in the tissues in question and structural patterns tend to be conserved between human and mouse. In addition to that, promoter models for related tissues tend to have high inter-tissue performance, indicating that their promoters share common structural patterns. Together, these results illustrate the validity of our models, but also indicate that the promoter structures for some tissues are easier to model than those of others.
Affiliation(s)
- Alexis Vandenbon
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
|
38
|
Zeng J, Zhu S, Yan H. Towards accurate human promoter recognition: a review of currently used sequence features and classification methods. Brief Bioinform 2009; 10:498-508. [PMID: 19531545 DOI: 10.1093/bib/bbp027] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Indexed: 11/14/2022] Open
Abstract
This review describes important advances that have been made during the past decade for genome-wide human promoter recognition. Interest in promoter recognition algorithms on a genome-wide scale is worldwide and touches on a number of practical systems that are important in analysis of gene regulation and in genome annotation without experimental support of ESTs, cDNAs or mRNAs. The main focus of this review is on feature extraction and model selection for accurate human promoter recognition, with descriptions of what they are, what has been accomplished, and what remains to be done.
Affiliation(s)
- Jia Zeng
- Department of Computer Science, Hong Kong Baptist University, Kowloon, Hong Kong.
|
39
|
Yokoyama KD, Ohler U, Wray GA. Measuring spatial preferences at fine-scale resolution identifies known and novel cis-regulatory element candidates and functional motif-pair relationships. Nucleic Acids Res 2009; 37:e92. [PMID: 19483094 PMCID: PMC2715254 DOI: 10.1093/nar/gkp423] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 02/03/2023] Open
Abstract
Transcriptional regulation is mediated by the collective binding of proteins called transcription factors to cis-regulatory elements. A handful of factors are known to function at particular distances from the transcription start site, although the extent to which this occurs is not well understood. Spatial dependencies can also exist between pairs of binding motifs, facilitating factor-pair interactions. We sought to determine to what extent spatial preferences measured at high-scale resolution could be utilized to predict cis-regulatory elements as well as motif-pairs binding interacting proteins. We introduce the ‘motif positional function’ model which predicts spatial biases using regression analysis, differentiating noise from true position-specific overrepresentation at single-nucleotide resolution. Our method predicts 48 consensus motifs exhibiting positional enrichment within human promoters, including fourteen motifs without known binding partners. We then extend the model to analyze distance preferences between pairs of motifs. We find that motif-pairs binding interacting factors often co-occur preferentially at multiple distances, with intervals between preferred distances often corresponding to the turn of the DNA double-helix. This offers a novel means by which to predict sequence elements with a collective role in gene regulation.
Affiliation(s)
- Ken Daigoro Yokoyama
- Biology Department, Institute for Genome Sciences and Policy, Duke University, Durham, NC 27708, USA
|
40
|
Hutchins LN, Murphy SM, Singh P, Graber JH. Position-dependent motif characterization using non-negative matrix factorization. Bioinformatics 2008; 24:2684-90. [PMID: 18852176 PMCID: PMC2639279 DOI: 10.1093/bioinformatics/btn526] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.4] [Indexed: 01/12/2023]
Abstract
Motivation: Cis-acting regulatory elements are frequently constrained by both sequence content and positioning relative to a functional site, such as a splice or polyadenylation site. We describe an approach to regulatory motif analysis based on non-negative matrix factorization (NMF). Whereas existing pattern recognition algorithms commonly focus primarily on sequence content, our method simultaneously characterizes both positioning and sequence content of putative motifs. Results: Tests on artificially generated sequences show that NMF can faithfully reproduce both positioning and content of test motifs. We show how the variation of the residual sum of squares can be used to give a robust estimate of the number of motifs or patterns in a sequence set. Our analysis distinguishes multiple motifs with significant overlap in sequence content and/or positioning. Finally, we demonstrate the use of the NMF approach through characterization of biologically interesting datasets. Specifically, an analysis of mRNA 3′-processing (cleavage and polyadenylation) sites from a broad range of higher eukaryotes reveals a conserved core pattern of three elements. Contact: joel.graber@jax.org Supplementary information: Supplementary data are available at Bioinformatics online.
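As a rough illustration of the factorization this abstract relies on, the sketch below runs NMF with the widely used Lee-Seung multiplicative updates on a small non-negative matrix and reports the residual sum of squares (RSS) at increasing rank. The choice of update rule, the toy matrix (imagined as positions by word counts), and all function names are assumptions; the paper's own implementation and data layout may differ.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][x] * b[x][j] for x in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def rss(v, w, h):
    """Residual sum of squares between V and the product W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor V (n x m) into non-negative W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return w, h


# Toy matrix with two obvious row patterns.
v = [[5, 0, 1], [4, 0, 1], [0, 3, 2], [0, 4, 2]]
for rank in (1, 2):
    w, h = nmf(v, rank)
    print(rank, round(rss(v, w, h), 4))
```

Comparing the RSS across ranks is the kind of diagnostic the abstract mentions for estimating the number of patterns: the drop in RSS flattens once the rank exceeds the number of real patterns in the data.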
Affiliation(s)
- Lucie N Hutchins
- Center for Genome Dynamics, The Jackson Laboratory, Bar Harbor, ME 04609, USA
|
41
|
Friedman BA, Stadler MB, Shomron N, Ding Y, Burge CB. Ab initio identification of functionally interacting pairs of cis-regulatory elements. Genome Res 2008; 18:1643-51. [PMID: 18799692 DOI: 10.1101/gr.080085.108] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 12/13/2022]
Abstract
Cooperatively acting pairs of cis-regulatory elements play important roles in many biological processes. Here, we describe a statistical approach, compositionally orthogonalized co-occurrence analysis (coCOA) that detects pairs of oligonucleotides that preferentially co-occur in pairs of sequence regions, controlling for correlations between the compositions of the analyzed regions. coCOA identified three clusters of oligonucleotide pairs that frequently co-occur at 5' and 3' ends of human and mouse introns. The largest cluster involved GC-rich sequences at the 5' ends of introns that co-occur and are co-conserved with specific AU-rich sequences near intron 3' ends. These motifs are preferentially conserved when they occur together, as measured by a new co-conservation measure, supporting common in vivo function. These motif pairs are also enriched in introns flanking alternative "cassette" exons, suggesting a role in silencing of intervening exons, and we showed that these motifs can cooperatively silence splicing of an intervening exon in a splicing reporter assay. This approach can be easily generalized to problems beyond RNA splicing.
Affiliation(s)
- Brad A Friedman
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
|
42
|
Computational analysis of constraints on noncoding regions, coding regions and gene expression in relation to Plasmodium phenotypic diversity. PLoS One 2008; 3:e3122. [PMID: 18769675 PMCID: PMC2518851 DOI: 10.1371/journal.pone.0003122] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 07/28/2008] [Accepted: 08/02/2008] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND: Malaria-causing Plasmodium species exhibit marked differences including host choice and preference for invading particular cell types. The genetic bases of phenotypic differences between parasites can be understood, in part, by investigating constraints on gene expression and genic sequences, both coding and regulatory. METHODOLOGY/PRINCIPAL FINDINGS: We investigated the evolutionary constraints on sequence and expression of parasitic genes by applying comparative genomics approaches to 6 Plasmodium genomes and 2 genome-wide expression studies. We found that the coding regions of Plasmodium transcription factor and sexual development genes are relatively less constrained, as are those of genes encoding CCCH zinc fingers and invasion proteins, which all play important roles in these parasites. Transcription factors and genes with stage-restricted expression have conserved upstream regions and so do several gene classes critical to the parasite's lifestyle, namely, ion transport, invasion, chromatin assembly and CCCH zinc fingers. Additionally, a cross-species comparison of expression patterns revealed that Plasmodium-specific genes exhibit significant expression divergence. CONCLUSIONS/SIGNIFICANCE: Overall, constraints on Plasmodium's protein coding regions confirm observations from other eukaryotes in that transcription factors are under relatively lower constraint. Proteins relevant to the parasite's unique lifestyle also have lower constraint on their coding regions. Greater conservation between Plasmodium species in terms of promoter motifs suggests tight regulatory control of lifestyle genes. However, an interspecies divergence in expression patterns of these genes suggests that either expression is controlled via genomic or epigenomic features not encoded in the proximal promoter sequence, or alternatively, the combinatorial interactions between motifs confer species-specific expression patterns.
|
43
|
Hackenberg M, Matthiesen R. Annotation-Modules: a tool for finding significant combinations of multisource annotations for gene lists. Bioinformatics 2008; 24:1386-93. [PMID: 18434345 DOI: 10.1093/bioinformatics/btn178] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 12/16/2022]
Affiliation(s)
- Michael Hackenberg
- Bioinformatics Group, CIC bioGUNE, CIBER-HEPAD, Technology Park of Bizkaia, 48160 Derio, Bizkaia, Spain.
|
44
|
Tharakaraman K, Bodenreider O, Landsman D, Spouge JL, Mariño-Ramírez L. The biological function of some human transcription factor binding motifs varies with position relative to the transcription start site. Nucleic Acids Res 2008; 36:2777-86. [PMID: 18367472 PMCID: PMC2377430 DOI: 10.1093/nar/gkn137] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 12/11/2022]
Abstract
A number of previous studies have predicted transcription factor binding sites (TFBSs) by exploiting the position of genomic landmarks like the transcriptional start site (TSS). The studies’ methods are generally too computationally intensive for genome-scale investigation, so the full potential of ‘positional regulomics’ to discover TFBSs and determine their function remains unknown. Because databases often annotate the genomic landmarks in DNA sequences, the methodical exploitation of positional regulomics has become increasingly urgent. Accordingly, we examined a set of 7914 human putative promoter regions (PPRs) with a known TSS. Our methods identified 1226 eight-letter DNA words with significant positional preferences with respect to the TSS, of which only 608 of the 1226 words matched known TFBSs. Many groups of genes whose PPRs contained a common word displayed similar expression profiles and related biological functions, however. Most interestingly, our results included 78 words, each of which clustered significantly in two or three different positions relative to the TSS. Often, the gene groups corresponding to different positional clusters of the same word corresponded to diverse functions, e.g. activation or repression in different tissues. Thus, different clusters of the same word likely reflect the phenomenon of ‘positional regulation’, i.e. a word's regulatory function can vary with its position relative to a genomic landmark, a conclusion inaccessible to methods based purely on sequence. Further integrative analysis of words co-occurring in PPRs also yielded 24 different groups of genes, likely identifying cis-regulatory modules de novo. Whereas comparative genomics requires precise sequence alignments, positional regulomics exploits genomic landmarks to provide a ‘poor man's alignment’. By exploiting the phenomenon of positional regulation, it uses position to differentiate the biological functions of subsets of TFBSs sharing a common sequence motif.
Affiliation(s)
- Kannan Tharakaraman
- Computational Biology Branch, National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, MSC 6075 Bethesda, MD 20894-6075, USA
|
45
|
Wang J, Ungar LH, Tseng H, Hannenhalli S. MetaProm: a neural network based meta-predictor for alternative human promoter prediction. BMC Genomics 2007; 8:374. [PMID: 17941982 PMCID: PMC2194789 DOI: 10.1186/1471-2164-8-374] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 05/23/2007] [Accepted: 10/17/2007] [Indexed: 01/21/2023]
Abstract
BACKGROUND: De novo eukaryotic promoter prediction is important for discovering novel genes and understanding gene regulation. In spite of the great advances made in the past decade, recent studies revealed that the overall performances of the current promoter prediction programs (PPPs) are still poor, and predictions made by individual PPPs do not overlap each other. Furthermore, most PPPs are trained and tested on the most-upstream promoters; their performances on alternative promoters have not been assessed. RESULTS: In this paper, we evaluate the performances of current major promoter prediction programs (i.e., PSPA, FirstEF, McPromoter, DragonGSF, DragonPF, and FProm) using 42,536 distinct human gene promoters on a genome-wide scale, and with emphasis on alternative promoters. We describe an artificial neural network (ANN) based meta-predictor program that integrates predictions from the current PPPs and the predicted promoters' relation to CpG islands. Our specific analysis of recently discovered alternative promoters reveals that although only 41% of the 3' most promoters overlap a CpG island, 74% of 5' most promoters overlap a CpG island. CONCLUSION: Our assessment of six PPPs on 1.06 × 10^9 bps of human genome sequence reveals the specific strengths and weaknesses of individual PPPs. Our meta-predictor outperforms any individual PPP in sensitivity and specificity. Furthermore, we discovered that the 5' alternative promoters are more likely to be associated with a CpG island.
Affiliation(s)
- Junwen Wang
- Center for Bioinformatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
|