1
|
Bredesen BA, Rehmsmeier M. DNA sequence models of genome-wide Drosophila melanogaster Polycomb binding sites improve generalization to independent Polycomb Response Elements. Nucleic Acids Res 2019; 47:7781-7797. [PMID: 31340029 PMCID: PMC6735708 DOI: 10.1093/nar/gkz617] [Received: 01/25/2019] [Revised: 07/01/2019] [Accepted: 07/11/2019]
Abstract
Polycomb Response Elements (PREs) are cis-regulatory DNA elements that maintain gene transcription states through DNA replication and mitosis. PREs have little sequence similarity, but are enriched in a number of sequence motifs. Previous methods for modelling Drosophila melanogaster PRE sequences (PREdictor and EpiPredictor) have used a set of 7 motifs and a training set of 12 PREs and 16-23 non-PREs. Advances in experimental methods for mapping chromatin binding factors and modifications have led to the publication of several genome-wide sets of Polycomb targets. In addition to the seven motifs previously used, PREs are enriched in the GTGT motif, recently associated with the sequence-specific DNA binding protein Combgap. We investigated whether models trained on genome-wide Polycomb sites generalize to independent PREs when trained with control sequences generated by naive PRE models and including the GTGT motif. We also developed a new PRE predictor: SVM-MOCCA. Training PRE predictors with genome-wide experimental data improves generalization to independent data, and SVM-MOCCA predicts the majority of PREs in three independent experimental sets. We present 2908 candidate PREs enriched in sequence and chromatin signatures. 2412 of these are also enriched in H3K4me1, a mark of Trithorax activated chromatin, suggesting that PREs/TREs have a common sequence code.
Affiliation(s)
- Bjørn André Bredesen
- Computational Biology Unit, Department of Informatics, University of Bergen, P.O. Box 7803, N-5020 Bergen, Norway
- Marc Rehmsmeier
- Computational Biology Unit, Department of Informatics, University of Bergen, P.O. Box 7803, N-5020 Bergen, Norway; Integrated Research Institute (IRI) for the Life Sciences and Department of Biology, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
|
2
|
Abstract
Polycomb group response elements (PREs) play an essential role in gene regulation by the Polycomb group (PcG) repressor proteins in Drosophila. PREs are required for the recruitment and maintenance of repression by the PcG proteins. PREs are made up of binding sites for multiple DNA-binding proteins, but it is still unclear what combination(s) of binding sites is required for PRE activity. Here we compare the binding sites and activities of two closely linked yet separable PREs of the Drosophila engrailed (en) gene, PRE1 and PRE2. Both PRE1 and PRE2 contain binding sites for multiple PRE-DNA-binding proteins, but the number, arrangement, and spacing of the sites differ between the two PREs. These differences have functional consequences. Both PRE1 and PRE2 mediate pairing-sensitive silencing of mini-white, a functional assay for PcG repression; however, PRE1 requires two binding sites for Pleiohomeotic (Pho), whereas PRE2 requires only one Pho-binding site for this activity. Furthermore, for full pairing-sensitive silencing activity, PRE1 requires an AT-rich region not found in PRE2. These two PREs behave differently in a PRE embryonic and larval reporter construct inserted at an identical location in the genome. Our data illustrate the diversity of architecture and function of PREs.
|
3
|
Chen Y, Leal AD, Patel S, Gorski DH. The homeobox gene GAX activates p21WAF1/CIP1 expression in vascular endothelial cells through direct interaction with upstream AT-rich sequences. J Biol Chem 2007; 282:507-17. [PMID: 17074759 PMCID: PMC1865102 DOI: 10.1074/jbc.m606604200]
Abstract
Tumors secrete pro-angiogenic factors to induce the ingrowth of blood vessels from the surrounding stroma, the end targets of which are vascular endothelial cells (ECs). The homeobox gene GAX inhibits angiogenesis and induces p21(WAF1/CIP1) expression in vascular ECs. To elucidate the mechanism through which GAX activates p21(WAF1/CIP1) expression, we constructed GAX cDNAs with deletions of the N-terminal domain, the homeodomain, or the C-terminal domain and then assessed these constructs for their ability to activate p21(WAF1/CIP1). There was an absolute requirement for the homeodomain, whereas deleting the C-terminal domain decreased but did not abolish transactivation of the p21(WAF1/CIP1) promoter by GAX. Deleting the N-terminal domain did abolish transactivation. Next, we performed chromatin immunoprecipitation and found, approximately 15 kb upstream of the p21(WAF1/CIP1) ATG codon, an ATTA-containing GAX-binding site (designated A6) with a sequence similar to that of other homeodomain-binding sites. GAX was able to bind to A6 in a homeodomain-dependent manner and thereby activate the expression of a reporter gene coupled to this sequence, and this activation was abolished by mutating specific residues in this sequence. On the basis of the sequence of A6, we were then able to locate other ATTA-containing sequences that also bound GAX and activated transcription in reporter constructs. Finally, we found that the ability of these GAX deletions to induce G(0)/G(1) arrest correlates with their ability to transactivate the p21(WAF1/CIP1) promoter. We conclude that GAX activates p21(WAF1/CIP1) through multiple upstream AT-rich sequences. Given the multiple biological activities of GAX in regulating EC function, identification of a putative GAX-binding site will allow the study of how GAX activates or represses other downstream targets to inhibit angiogenesis.
Affiliation(s)
- Yun Chen
- From the Division of Surgical Oncology, UMDNJ-Robert Wood Johnson Medical School, The Cancer Institute of New Jersey, New Brunswick, NJ 088901
- Alejandro D. Leal
- From the Division of Surgical Oncology, UMDNJ-Robert Wood Johnson Medical School, The Cancer Institute of New Jersey, New Brunswick, NJ 088901
- Sejal Patel
- From the Division of Surgical Oncology, UMDNJ-Robert Wood Johnson Medical School, The Cancer Institute of New Jersey, New Brunswick, NJ 088901
- David H. Gorski
- From the Division of Surgical Oncology, UMDNJ-Robert Wood Johnson Medical School, The Cancer Institute of New Jersey, New Brunswick, NJ 088901
|
4
|
Fiedler T, Rehmsmeier M. jPREdictor: a versatile tool for the prediction of cis-regulatory elements. Nucleic Acids Res 2006; 34:W546-50. [PMID: 16845067 PMCID: PMC1538890 DOI: 10.1093/nar/gkl250]
Abstract
Gene regulation is the process through which an organism effects spatial and temporal differences in gene expression levels. Knowledge of cis-regulatory elements as key players in gene regulation is indispensable for the understanding of the latter and of the development of organisms. Here we present the tool jPREdictor for the fast and versatile prediction of cis-regulatory elements on a genome-wide scale. The prediction is based on clusters of individual motifs and any combination of these into multi-motifs with selectable minimal and maximal distances. Individual motifs can be of heterogeneous classes, such as simple sequence motifs or position-specific scoring matrices. Cluster scores are weighted occurrences of multi-motifs, where the weights are derived from positive and negative training sets. We illustrate the flexibility of the jPREdictor with a new prediction of Polycomb/Trithorax Response Elements in Drosophila melanogaster. jPREdictor is available as a graphical user interface for online use and for download at .
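The scoring idea in this abstract (weighted motif occurrences, with weights derived from positive and negative training sets) can be illustrated with a small sketch. This is not jPREdictor's implementation: the log-odds form of the weights, the pseudocount, the restriction to single plain-string motifs (no multi-motifs or distance limits), and all function names and toy sequences are assumptions made for illustration only.

```python
import math
import re


def count_occurrences(motif: str, seq: str) -> int:
    """Count occurrences of a plain-string motif, allowing overlaps."""
    return len(re.findall(f"(?={re.escape(motif)})", seq))


def log_odds_weights(motifs, positives, negatives, pseudocount=1.0):
    """Assumed weighting: log ratio of per-base motif frequency in the
    positive versus the negative training set, with a pseudocount so
    that motifs absent from one set do not give infinite weights."""
    pos_len = sum(len(s) for s in positives)
    neg_len = sum(len(s) for s in negatives)
    weights = {}
    for motif in motifs:
        pos = sum(count_occurrences(motif, s) for s in positives) + pseudocount
        neg = sum(count_occurrences(motif, s) for s in negatives) + pseudocount
        weights[motif] = math.log((pos / pos_len) / (neg / neg_len))
    return weights


def window_score(seq: str, weights) -> float:
    """Score a sequence window as the weighted sum of motif occurrences."""
    return sum(w * count_occurrences(m, seq) for m, w in weights.items())


# Toy data, invented for this example (not real PREs or controls).
toy_positives = ["GAGAGGTGTGAGAG", "AAGTGTGAGAGAA"]
toy_negatives = ["ACACACACACACAC", "TTTTCCCCAAAAT"]
toy_weights = log_odds_weights(["GAGAG", "GTGT"], toy_positives, toy_negatives)
```

A genome scan would then slide a fixed-width window along the sequence and call `window_score` on each window; how a score threshold is chosen is outside what this sketch covers.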
Affiliation(s)
- Marc Rehmsmeier
- To whom correspondence should be addressed. Tel: +49 0 521 106 2905; Fax: +49 0 521 106 6411;
|
5
|
Emberly E, Rajewsky N, Siggia ED. Conservation of regulatory elements between two species of Drosophila. BMC Bioinformatics 2003; 4:57. [PMID: 14629780 PMCID: PMC302112 DOI: 10.1186/1471-2105-4-57] [Received: 05/06/2003] [Accepted: 11/20/2003]
Abstract
Background: One of the important goals in the post-genomic era is to determine the regulatory elements within the non-coding DNA of a given organism's genome. The identification of functional cis-regulatory modules has proven difficult since the component factor binding sites are small and the rules governing their arrangement are poorly understood. However, the genomes of suitably diverged species help to predict regulatory elements based on the generally accepted assumption that conserved blocks of genomic sequence are likely to be functional. To judge the efficacy of strategies that prefilter by sequence conservation it is important to know to what extent the converse assumption holds, namely that functional elements common to both species will fall within these conserved blocks. The recently completed sequence of a second Drosophila species provides an opportunity to test this assumption for one of the experimentally best studied regulatory networks in multicellular organisms, the body patterning of the fly embryo. Results: We find that 50%–70% of known binding sites reside in conserved sequence blocks, but these percentages are not greatly enriched over what is expected by chance. Finally, a computational genome-wide search in both species for regulatory modules based on clusters of binding sites suggests that genes central to the regulatory network are consistently recovered. Conclusions: Our results indicate that binding sites remain clustered for these "core modules" while not necessarily residing in conserved blocks. This is an important clue as to how regulatory information is encoded in the genome and how modules evolve.
Affiliation(s)
- Eldon Emberly
- Center for Studies in Physics and Biology, The Rockefeller University, 1230 York Avenue, New York, NY, USA
- Nikolaus Rajewsky
- Department of Biology, New York University, 1009 Main Building, 100 Washington Square East, New York, NY, USA
- Eric D Siggia
- Center for Studies in Physics and Biology, The Rockefeller University, 1230 York Avenue, New York, NY, USA
|
6
|
Papatsenko DA, Makeev VJ, Lifanov AP, Régnier M, Nazina AG, Desplan C. Extraction of functional binding sites from unique regulatory regions: the Drosophila early developmental enhancers. Genome Res 2002; 12:470-81. [PMID: 11875036 PMCID: PMC155290 DOI: 10.1101/gr.212502]
Abstract
The early developmental enhancers of Drosophila melanogaster comprise one of the most sophisticated regulatory systems in higher eukaryotes. An elaborate code in their DNA sequence translates both maternal and early embryonic regulatory signals into spatial distribution of transcription factors. One of the most striking features of this code is the redundancy of binding sites for these transcription factors (BSTF). Using this redundancy, we explored the possibility of predicting functional binding sites in a single enhancer region without any prior consensus/matrix description or evolutionary sequence comparisons. We developed a conceptually simple algorithm, Scanseq, that employs an original statistical evaluation for identifying the most redundant motifs and locates the position of potential BSTF in a given regulatory region. To estimate the biological relevance of our predictions, we built thorough literature-based annotations for the best-known Drosophila developmental enhancers and we generated detailed distribution maps for the most robust binding sites. The high statistical correlation between the location of BSTF in these experiment-based maps and the location predicted in silico by Scanseq confirmed the relevance of our approach. We also discuss the definition of true binding sites and the possible biological principles that govern patterning of regulatory regions and the distribution of transcriptional signals.
|
7
|
Misquitta L, Paterson BM. Targeted disruption of gene function in Drosophila by RNA interference (RNA-i): a role for nautilus in embryonic somatic muscle formation. Proc Natl Acad Sci U S A 1999; 96:1451-6. [PMID: 9990044 PMCID: PMC15483 DOI: 10.1073/pnas.96.4.1451]
Abstract
The expression of the MyoD gene homolog, nautilus (nau), in the Drosophila embryo defines a subset of mesodermal cells known as the muscle "pioneer" or "founder" cells. These cells are thought to establish the future muscle pattern in each hemisegment. Founders appear to recruit fusion-competent mesodermal cells to establish a particular muscle fiber type. In support of this concept every somatic muscle in the embryo is associated with one or more nautilus-positive cells. However, because of the lack of known (isolated) nautilus mutations, no direct test of the founder cell hypothesis has been possible. We now have utilized toxin ablation and genetic interference by double-stranded RNA (RNA interference or RNA-i) to determine both the role of the nautilus-expressing cells and the nautilus gene, respectively, in embryonic muscle formation. In the absence of nautilus-expressing cells muscle formation is severely disrupted or absent. A similar phenotype is observed with the elimination of the nautilus gene product by genetic interference upon injection of nautilus double-stranded RNA. These results define a crucial role for nautilus in embryonic muscle formation. The application of RNA interference to a variety of known Drosophila mutations as controls gave phenotypes essentially indistinguishable from the original mutation. RNA-i provides a powerful approach for the targeted disruption of a given genetic function in Drosophila.
Affiliation(s)
- L Misquitta
- Laboratory of Biochemistry, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
|
8
|
Clark RF, Elgin SC. Heterochromatin protein 1, a known suppressor of position-effect variegation, is highly conserved in Drosophila. Nucleic Acids Res 1992; 20:6067-74. [PMID: 1461737 PMCID: PMC334474 DOI: 10.1093/nar/20.22.6067]
Abstract
The Su(var)205 gene of Drosophila melanogaster encodes heterochromatin protein 1 (HP1), a protein located preferentially within beta-heterochromatin. Mutation of this gene has been associated with dominant suppression of position-effect variegation. We have cloned and sequenced the gene encoding HP1 from Drosophila virilis, a distantly related species. Comparison of the predicted amino acid sequence with Drosophila melanogaster HP1 shows two regions of strong homology, one near the N-terminus (57/61 amino acids identical) and the other near the C-terminus (62/68 amino acids identical) of the protein. Little homology is seen in the 5' and 3' untranslated portions of the gene, as well as in the intronic sequences, although intron/exon boundaries are generally conserved. A comparison of the deduced amino acid sequences of HP1-like proteins from other species shows that the cores of the N-terminal and C-terminal domains have been conserved from insects to mammals. The high degree of conservation suggests that these N- and C-terminal domains could interact with other macromolecules in the formation of the condensed structure of heterochromatin.
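The identity counts quoted in this abstract convert directly to percentages. The sketch below does that arithmetic and also shows how such a count could be obtained from two pre-aligned sequences. The function name and the toy peptide strings are hypothetical, and treating every aligned column (gaps included) as a compared position is an assumption of the example, not something the abstract specifies.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned positions carrying the same residue.

    Both inputs must already be aligned to the same length; every
    column, including gap columns, counts as a compared position.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y)
    return 100.0 * matches / len(aligned_a)


# Counts taken from the abstract: 57/61 (N-terminal) and 62/68 (C-terminal).
n_terminal_identity = 100.0 * 57 / 61   # about 93.4 %
c_terminal_identity = 100.0 * 62 / 68   # about 91.2 %

# Toy aligned peptides, invented for illustration.
toy_identity = percent_identity("MKVL", "MKIL")
```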
Affiliation(s)
- R F Clark
- Department of Biology, Washington University, St Louis, MO 63130
|
9
|
Friedman TB, Burnett JB, Lootens S, Steinman R, Wallrath LL. The urate oxidase gene of Drosophila pseudoobscura and Drosophila melanogaster: evolutionary changes of sequence and regulation. J Mol Evol 1992; 34:62-77. [PMID: 1556745 DOI: 10.1007/bf00163853]
Abstract
The urate oxidase (UO) transcription unit of Drosophila pseudoobscura was cloned, sequenced, and compared to the UO transcription unit from Drosophila melanogaster. In both species the UO coding region is divided into two exons of approximately equal size. The deduced D. pseudoobscura and D. melanogaster UO peptides have 346 and 352 amino acid residues, respectively. The nucleotide sequences of the D. pseudoobscura and D. melanogaster UO protein-coding regions are 82.2% identical whereas the deduced amino acid sequences are 87.6% identical with 42 amino acid changes, 33 of which occur in the first exon. Although the UO gene is expressed exclusively within the cells of the Malpighian tubules in both of these species, the temporal patterns of UO gene activity during development are markedly different. UO enzyme activity, UO protein, and UO mRNA are found in the third instar larva and adult of D. melanogaster but only in the adult stage of D. pseudoobscura. The intronic sequences and the extragenic 5' and 3' flanking regions of the D. pseudoobscura and D. melanogaster UO genes are highly divergent with the exception of eight small islands of conserved sequence along 772 bp 5' of the UO protein-coding region. These islands of conserved sequence are possible UO cis-acting regulatory elements as they reside along the 5' flanking DNA of the D. melanogaster UO gene that is capable of conferring a wild-type D. melanogaster pattern of UO regulation on a UO-lacZ fusion gene.
Affiliation(s)
- T B Friedman
- Graduate Program in Genetics, Michigan State University, East Lansing 48824
|
10
|
Abstract
The fushi tarazu (ftz) gene of Drosophila melanogaster encodes a homeodomain-containing transcription factor that functions in the formation of body segments. Here we report an analysis of the DNA-binding properties of the ftz homeodomain in vitro. We provide evidence that the homeodomain binds to DNA as a monomer, with an equilibrium dissociation constant of 2.5 x 10(-11) M for binding to a consensus binding site. A single ftz binding site occupies 10 to 12 bp, as judged by the ability of protein bound at one site to interfere with binding to an adjacent site. These experiments also demonstrated a lack of cooperative binding between ftz homeodomains. Analysis of single-nucleotide substitutions over an 11-bp sequence shows that a stretch of 6 bp is critical for binding, with an optimal sequence of 5'CTAATTA3'. These data correlate well with recent structural evidence for base-specific contact at these positions. In addition, we found that sequences flanking the region of direct contact have effects on DNA binding that could be of biological significance.
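To put the reported dissociation constant in perspective: for a monomer binding one site without cooperativity, as the abstract describes, the standard single-site binding isotherm gives the fraction of sites occupied as [P] / ([P] + Kd), where [P] is the free protein concentration. The sketch below applies that textbook relation to the reported Kd. The function name and the example concentrations are assumptions for illustration, not values from the study.

```python
KD_FTZ_MOLAR = 2.5e-11  # equilibrium dissociation constant reported in the abstract


def fraction_bound(free_protein_molar: float, kd_molar: float = KD_FTZ_MOLAR) -> float:
    """Fraction of binding sites occupied under simple 1:1, non-cooperative
    binding (single-site isotherm): [P] / ([P] + Kd)."""
    if free_protein_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_protein_molar / (free_protein_molar + kd_molar)


# Half of the sites are occupied when free protein equals Kd (25 pM here).
half_occupancy = fraction_bound(2.5e-11)
# At ten times Kd (250 pM, a hypothetical concentration) occupancy is 10/11.
tenfold_occupancy = fraction_bound(2.5e-10)
```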
Affiliation(s)
- B Florence
- Departments of Genetics, University of Wisconsin, Madison 53706
|
13
|
Wallrath LL, Burnett JB, Friedman TB. Molecular characterization of the Drosophila melanogaster urate oxidase gene, an ecdysone-repressible gene expressed only in the malpighian tubules. Mol Cell Biol 1990; 10:5114-27. [PMID: 2118989 PMCID: PMC361181 DOI: 10.1128/mcb.10.10.5114-5127.1990]
Abstract
The urate oxidase (UO) gene of Drosophila melanogaster is expressed during the third-instar larval and adult stages, exclusively within a subset of cells of the Malpighian tubules. The UO gene contains a 69-base-pair intron and encodes mature mRNAs of 1,224, 1,227, and 1,244 nucleotides, depending on the site of 3' endonucleolytic cleavage prior to polyadenylation. A direct repeat, 5'-AAGTGAGAGTGAT-3', is the proposed cis-regulatory element involved in 20-hydroxyecdysone repression of the UO gene. The deduced amino acid sequences of UO of D. melanogaster, rat, mouse, and pig and uricase II of soybean show 32 to 38% identity, with 22% of amino acid residues identical in all species. With use of P-element-mediated germ line transformation, 826 base pairs 5' and approximately 1,200 base pairs 3' of the D. melanogaster UO transcribed region contain all of the cis elements allowing for appropriate temporal regulation and Malpighian tubule-specific expression of the UO gene.
Affiliation(s)
- L L Wallrath
- Graduate Program in Genetics, Michigan State University, East Lansing 48824
|
14
|
Structural and functional comparisons of the Drosophila virilis and Drosophila melanogaster rough genes. Proc Natl Acad Sci U S A 1990; 87:5916-20. [PMID: 1974051 PMCID: PMC54440 DOI: 10.1073/pnas.87.15.5916]
Abstract
We have isolated the homeobox gene rough (ro) from Drosophila virilis. Comparison of the predicted amino acid sequences of the D. melanogaster and D. virilis rough proteins reveals that domains of high conservation, including the homeodomain, are interspersed with highly diverged regions. Stretches of significant sequence conservation are also observed in the 5' promoter region and in the introns. The D. virilis rough gene rescues the rough mutant phenotype and is properly regulated when introduced into the D. melanogaster genome. Thus the rough protein as well as the cis-regulatory elements that ensure proper temporal and spatial regulation are functionally conserved between these Drosophila species.
|