401
Huber BR, Bulyk ML. Meta-analysis discovery of tissue-specific DNA sequence motifs from mammalian gene expression data. BMC Bioinformatics 2006; 7:229. [PMID: 16643658 PMCID: PMC1522027 DOI: 10.1186/1471-2105-7-229] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 10/27/2005] [Accepted: 04/27/2006] [Indexed: 11/23/2022] Open
Abstract
Background: A key step in the regulation of gene expression is the sequence-specific binding of transcription factors (TFs) to their DNA recognition sites. However, elucidating TF binding site (TFBS) motifs in higher eukaryotes has been challenging, even when employing cross-species sequence conservation. We hypothesized that for human and mouse, many orthologous genes expressed in a similarly tissue-specific manner in both human and mouse gene expression data are likely to be co-regulated by orthologous TFs that bind to DNA sequence motifs present within noncoding sequence conserved between these genomes. Results: We performed automated motif searching and merging across four different motif finding algorithms, followed by filtering of the resulting motifs for those that contain blocks of information content. Applying this motif finding strategy to conserved noncoding regions surrounding co-expressed tissue-specific human genes allowed us to discover both previously known, and many novel candidate, regulatory DNA motifs in all 18 tissue-specific expression clusters that we examined. For previously known TFBS motifs, we observed that if a TF was expressed in the specified tissue of interest, then in most cases we identified a motif that matched its TRANSFAC motif; conversely, of all those discovered motifs that matched TRANSFAC motifs, most of the corresponding TF transcripts were expressed in the tissue(s) corresponding to the expression cluster for which the motif was found. Conclusion: Our results indicate that the integration of the results from multiple motif finding tools identifies and ranks highly more known and novel motifs than does the use of just one of these tools. In addition, we believe that our simultaneous enrichment strategies helped to identify likely human cis regulatory elements. A number of the discovered motifs may correspond to novel binding site motifs for as yet uncharacterized tissue-specific TFs. We expect this strategy to be useful for identifying motifs in other metazoan genomes.
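The abstract above mentions filtering motifs for those that contain blocks of information content. The following is a minimal sketch of one way such a filter could look, assuming a uniform nucleotide background and a motif given as per-position base frequencies. The function names and the thresholds `min_bits` and `block_len` are hypothetical and are not taken from the paper.

```python
import math

def column_information(freqs):
    """Information content (bits) of one motif column, uniform background assumed."""
    # IC = 2 + sum(p * log2 p) over A, C, G, T; zero frequencies contribute nothing.
    return 2.0 + sum(p * math.log2(p) for p in freqs if p > 0)

def has_information_block(matrix, min_bits=1.0, block_len=3):
    """True if at least `block_len` consecutive columns each carry >= `min_bits` bits.

    Both thresholds are illustrative assumptions, not published values.
    """
    run = 0
    for column in matrix:
        run = run + 1 if column_information(column) >= min_bits else 0
        if run >= block_len:
            return True
    return False

# Toy motifs: each row is one position, holding frequencies of A, C, G, T.
strong = [[0.97, 0.01, 0.01, 0.01]] * 4 + [[0.25, 0.25, 0.25, 0.25]] * 2
weak = [[0.25, 0.25, 0.25, 0.25]] * 6

print(has_information_block(strong))
print(has_information_block(weak))
```

In this toy example the first motif has four adjacent well-conserved positions and passes, while the uniform motif carries no information and is rejected.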
Affiliation(s)
- Bertrand R Huber
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA 02115, USA
- Martha L Bulyk
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA 02115, USA
402
Nguyen DH, D'haeseleer P. Deciphering principles of transcription regulation in eukaryotic genomes. Mol Syst Biol 2006; 2:2006.0012. [PMID: 16738557 PMCID: PMC1681486 DOI: 10.1038/msb4100054] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.4] [Received: 04/08/2005] [Accepted: 02/08/2006] [Indexed: 11/22/2022] Open
Abstract
Transcription regulation has been responsible for organismal complexity and diversity in the course of biological evolution and adaptation, and it is determined largely by the context-dependent behavior of cis-regulatory elements (CREs). Therefore, understanding principles underlying CRE behavior in regulating transcription constitutes a fundamental objective of quantitative biology, yet these remain poorly understood. Here we present a deterministic mathematical strategy, the motif expression decomposition (MED) method, for deriving principles of transcription regulation at the single-gene resolution level. MED operates on all genes in a genome without requiring any a priori knowledge of gene cluster membership, or manual tuning of parameters. Applying MED to Saccharomyces cerevisiae transcriptional networks, we identified four functions describing four different ways that CREs can quantitatively affect gene expression levels. These functions, three of which have extrema in different positions in the gene promoter (short-, mid-, and long-range) whereas the other depends on the motif orientation, are validated by expression data. We illustrate how nature could use these principles as an additional dimension to amplify the combinatorial power of a small set of CREs in regulating transcription.
Affiliation(s)
- Dat H Nguyen
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.
403
Wu G, Nie L, Zhang W. Relation between mRNA expression and sequence information in Desulfovibrio vulgaris: combinatorial contributions of upstream regulatory motifs and coding sequence features to variations in mRNA abundance. Biochem Biophys Res Commun 2006; 344:114-21. [PMID: 16603130 DOI: 10.1016/j.bbrc.2006.03.124] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 03/14/2006] [Accepted: 03/21/2006] [Indexed: 11/29/2022]
Abstract
The context-dependent expression of genes is central to biological activity, and significant attention has been given to identifying the various factors that contribute to gene expression at the genomic scale. However, so far this type of analysis has focused either on the relation between mRNA expression and non-coding sequence features such as upstream regulatory motifs or on the correlation between mRNA abundance and non-random features in coding sequences (e.g., codon usage and amino acid usage). In this study multiple regression analyses of the mRNA abundance and all sequence information in Desulfovibrio vulgaris were performed, with the goal of investigating how much coding and non-coding sequence features contribute to the variations in mRNA expression, and in what manner they act together. Using the AlignACE program, 442 over-represented motifs were identified from the upstream 100-bp region of 293 genes located in the known regulons. Regression of mRNA expression data against the measures of coding and non-coding sequence features indicated that 54.1% of the variations in mRNA abundance can be explained by the presence of upstream motifs, while coding sequences alone contribute to 29.7% of the variations in mRNA abundance. Interestingly, most of the contribution from coding sequences overlaps with that from upstream motifs; thereby a total of 60.3% of the variations in mRNA abundance can be explained when both coding and non-coding information was included. This result demonstrates that upstream regulatory motifs and coding sequence information contribute to the overall mRNA expression in a combinatorial rather than an additive manner.
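The three percentages reported in this abstract imply how much explained variance the two feature sets share. The short calculation below works this out, assuming the figures can be read as R²-style values from regressions on each feature set alone and on both together. That reading is an illustrative assumption, and the variable names are hypothetical.

```python
# Percent of variation in mRNA abundance explained, as quoted in the abstract.
upstream_only = 54.1   # upstream motifs alone
coding_only = 29.7     # coding-sequence features alone
combined = 60.3        # both feature sets together

# Portion explained by either set (overlap), by inclusion-exclusion.
shared = upstream_only + coding_only - combined
# Portion each set adds beyond what the other already explains.
unique_upstream = combined - coding_only
unique_coding = combined - upstream_only

print(f"shared: {shared:.1f}%")
print(f"unique to upstream motifs: {unique_upstream:.1f}%")
print(f"unique to coding features: {unique_coding:.1f}%")
```

Under this reading, about 23.5 of the 29.7 percentage points attributed to coding sequences overlap with the upstream-motif contribution, which is consistent with the abstract's remark that most of the coding contribution overlaps.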
Affiliation(s)
- Gang Wu
- Department of Biological Sciences, University of Maryland at Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
404
Nicolas P, Tocquet AS, Miele V, Muri F. A Reversible Jump Markov Chain Monte Carlo Algorithm for Bacterial Promoter Motifs Discovery. J Comput Biol 2006; 13:651-67. [PMID: 16706717 DOI: 10.1089/cmb.2006.13.651] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022] Open
Abstract
Effective probabilistic modeling approaches have been developed to find motifs of biological function in DNA sequences. However, the problem of automated model choice remains largely open and becomes more essential as the number of sequences to be analyzed is constantly increasing. Here we propose a reversible jump Markov chain Monte Carlo algorithm for estimating both parameters and model dimension of a Bayesian hidden semi-Markov model dedicated to bacterial promoter motif discovery. Bacterial promoters are complex motifs composed of two boxes separated by a spacer of variable but constrained length and occurring close to the protein translation start site. The algorithm allows simultaneous estimations of the width of the boxes, of the support size of the spacer length distribution, and of the order of the Markovian model used for the "background" nucleotide composition. The application of this method on three sequence sets points out the good behavior of the algorithm and the biological relevance of the estimated promoter motifs.
Affiliation(s)
- Pierre Nicolas
- Laboratoire Statistique et Génome, CNRS, Tour Evry2, 523 place des terrasses de l'Agora, F-91034 Evry, France.
405
Monsieurs P, Thijs G, Fadda AA, De Keersmaecker SCJ, Vanderleyden J, De Moor B, Marchal K. More robust detection of motifs in coexpressed genes by using phylogenetic information. BMC Bioinformatics 2006; 7:160. [PMID: 16549017 PMCID: PMC1525208 DOI: 10.1186/1471-2105-7-160] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 12/14/2005] [Accepted: 03/20/2006] [Indexed: 11/30/2022] Open
Abstract
Background: Several motif detection algorithms have been developed to discover overrepresented motifs in sets of coexpressed genes. However, in a noisy gene list, the number of genes containing the motif versus the number lacking the motif might not be sufficiently high to allow detection by classical motif detection tools. To still recover motifs which are not significantly enriched but still present, we developed a procedure in which we use phylogenetic footprinting to first delineate all potential motifs in each gene. Then we mutually compare all detected motifs and identify the ones that are shared by at least a few genes in the data set as potential candidates. Results: We applied our methodology to a compiled test data set containing known regulatory motifs and to two biological data sets derived from genome wide expression studies. By executing four consecutive steps of 1) identifying conserved regions in orthologous intergenic regions, 2) aligning these conserved regions, 3) clustering the conserved regions containing similar regulatory regions followed by extraction of the regulatory motifs and 4) screening the input intergenic sequences with detected regulatory motif models, our methodology proves to be a powerful tool for detecting regulatory motifs when a low signal to noise ratio is present in the input data set. Comparing our results with two other motif detection algorithms points out the robustness of our algorithm. Conclusion: We developed an approach that can reliably identify multiple regulatory motifs lacking a high degree of overrepresentation in a set of coexpressed genes (motifs belonging to sparsely connected hubs in the regulatory network) by exploiting the advantages of using both coexpression and phylogenetic information.
Affiliation(s)
- Pieter Monsieurs
- ESAT-SCD/SISTA, K.U. Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Gert Thijs
- ESAT-SCD/SISTA, K.U. Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Abeer A Fadda
- Centre of Microbial and Plant Genetics, K.U. Leuven, Kasteelpark Arenberg 20, 3001 Leuven-Heverlee, Belgium
- Sigrid CJ De Keersmaecker
- Centre of Microbial and Plant Genetics, K.U. Leuven, Kasteelpark Arenberg 20, 3001 Leuven-Heverlee, Belgium
- Jozef Vanderleyden
- Centre of Microbial and Plant Genetics, K.U. Leuven, Kasteelpark Arenberg 20, 3001 Leuven-Heverlee, Belgium
- Bart De Moor
- ESAT-SCD/SISTA, K.U. Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Kathleen Marchal
- Centre of Microbial and Plant Genetics, K.U. Leuven, Kasteelpark Arenberg 20, 3001 Leuven-Heverlee, Belgium
406
Kundaje A, Middendorf M, Shah M, Wiggins CH, Freund Y, Leslie C. A classification-based framework for predicting and analyzing gene regulatory response. BMC Bioinformatics 2006; 7 Suppl 1:S5. [PMID: 16723008 PMCID: PMC1810316 DOI: 10.1186/1471-2105-7-s1-s5] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND: We have recently introduced a predictive framework for studying gene transcriptional regulation in simpler organisms using a novel supervised learning algorithm called GeneClass. GeneClass is motivated by the hypothesis that in model organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular microarray experiment based on the presence of binding site subsequences ("motifs") in the gene's regulatory region and the expression levels of regulators such as transcription factors in the experiment ("parents"). GeneClass formulates the learning task as a classification problem: predicting +1 and -1 labels corresponding to up- and down-regulation beyond the levels of biological and measurement noise in microarray measurements. Using the Adaboost algorithm, GeneClass learns a prediction function in the form of an alternating decision tree, a margin-based generalization of a decision tree. METHODS: In the current work, we introduce a new, robust version of the GeneClass algorithm that increases stability and computational efficiency, yielding a more scalable and reliable predictive model. The improved stability of the prediction tree enables us to introduce a detailed post-processing framework for biological interpretation, including individual and group target gene analysis to reveal condition-specific regulation programs and to suggest signaling pathways. Robust GeneClass uses a novel stabilized variant of boosting that allows a set of correlated features, rather than single features, to be included at nodes of the tree; in this way, biologically important features that are correlated with the single best feature are retained rather than decorrelated and lost in the next round of boosting. Other computational developments include fast matrix computation of the loss function for all features, allowing scalability to large datasets, and the use of abstaining weak rules, which results in a more shallow and interpretable tree. We also show how to incorporate genome-wide protein-DNA binding data from ChIP-chip experiments into the GeneClass algorithm, and we use an improved noise model for gene expression data. RESULTS: Using the improved scalability of Robust GeneClass, we present larger scale experiments on a yeast environmental stress dataset, training and testing on all genes and using a comprehensive set of potential regulators. We demonstrate the improved stability of the features in the learned prediction tree, and we show the utility of the post-processing framework by analyzing two groups of genes in yeast (the protein chaperones and a set of putative targets of the Nrg1 and Nrg2 transcription factors) and suggesting novel hypotheses about their transcriptional and post-transcriptional regulation. Detailed results and Robust GeneClass source code are available for download from http://www.cs.columbia.edu/compbio/robust-geneclass.
Affiliation(s)
- Anshul Kundaje
- Department of Computer Science, Columbia University, New York, NY 10027, USA
- Mihir Shah
- Department of Computer Science, Columbia University, New York, NY 10027, USA
- Chris H Wiggins
- Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027, USA
- Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10027, USA
- Yoav Freund
- Department of Computer Science, Columbia University, New York, NY 10027, USA
- Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10027, USA
- Center for Computational Learning Systems, Columbia University, New York, NY 10027, USA
- Christina Leslie
- Department of Computer Science, Columbia University, New York, NY 10027, USA
- Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10027, USA
- Center for Computational Learning Systems, Columbia University, New York, NY 10027, USA
407
Abstract
One of the goals of systems-biology research is to discover networks and interactions by integrating diverse data sets. So far, systems-biology research has focused on model organisms, which are well characterized and therefore suited to testing new methods. Systems biology has great potential for use in the search for therapies for disease. Here, the potential of systems-biology approaches in the search for new drugs and vaccines to treat malaria is examined.
Affiliation(s)
- Elizabeth A Winzeler
- Department of Cell Biology, ICND202, The Scripps Research Institute, La Jolla, California 92037, USA.
408
Andersson CR, Isaksson A, Gustafsson MG. Bayesian detection of periodic mRNA time profiles without use of training examples. BMC Bioinformatics 2006; 7:63. [PMID: 16469110 PMCID: PMC1413563 DOI: 10.1186/1471-2105-7-63] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 07/28/2005] [Accepted: 02/09/2006] [Indexed: 11/10/2022] Open
Abstract
Background: Detection of periodically expressed genes from microarray data without use of known periodic and non-periodic training examples is an important problem, e.g. for identifying genes regulated by the cell-cycle in poorly characterised organisms. Commonly the investigator is only interested in genes expressed at a particular frequency that characterizes the process under study but this frequency is seldom exactly known. Previously proposed detector designs require access to labelled training examples and do not allow systematic incorporation of diffuse prior knowledge available about the period time. Results: A learning-free Bayesian detector that does not rely on labelled training examples and allows incorporation of prior knowledge about the period time is introduced. It is shown to outperform two recently proposed alternative learning-free detectors on simulated data generated with models that are different from the one used for detector design. Results from applying the detector to mRNA expression time profiles from S. cerevisiae show that the genes detected as periodically expressed only contain a small fraction of the cell-cycle genes inferred from mutant phenotype. For example, when the probability of false alarm was equal to 7%, only 12% of the cell-cycle genes were detected. The genes detected as periodically expressed were found to have a statistically significant overrepresentation of known cell-cycle regulated sequence motifs. One known sequence motif and 18 putative motifs, previously not associated with periodic expression, were also overrepresented. Conclusion: In comparison with recently proposed alternative learning-free detectors for periodic gene expression, Bayesian inference allows systematic incorporation of diffuse a priori knowledge about, e.g. the period time. This results in relative performance improvements due to increased robustness against errors in the underlying assumptions. Results from applying the detector to mRNA expression time profiles from S. cerevisiae include several new findings that deserve further experimental studies.
Affiliation(s)
- Claes R Andersson
- The Linnaeus Centre for Bioinformatics, BMC, Uppsala University, Box 598, S-751 24 Uppsala, Sweden
- Anders Isaksson
- Department of Genetics and Pathology, Rudbecklaboratoriet, Uppsala University, S-751 85 Uppsala, Sweden
- Mats G Gustafsson
- Department of Genetics and Pathology, Rudbecklaboratoriet, Uppsala University, S-751 85 Uppsala, Sweden
- Department of Engineering Sciences, Uppsala University, Box 528, S-751 20 Uppsala, Sweden
409
Chang LW, Nagarajan R, Magee JA, Milbrandt J, Stormo GD. A systematic model to predict transcriptional regulatory mechanisms based on overrepresentation of transcription factor binding profiles. Genome Res 2006; 16:405-13. [PMID: 16449500 PMCID: PMC1415218 DOI: 10.1101/gr.4303406] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.4] [Indexed: 01/31/2023]
Abstract
An important aspect of understanding a biological pathway is to delineate the transcriptional regulatory mechanisms of the genes involved. Two important tasks are often encountered when studying transcription regulation, i.e., (1) the identification of common transcriptional regulators of a set of coexpressed genes; (2) the identification of genes that are regulated by one or several transcription factors. In this study, a systematic and statistical approach was taken to accomplish these tasks by establishing an integrated model considering all of the promoters and characterized transcription factors (TFs) in the genome. A promoter analysis pipeline (PAP) was developed to implement this approach. PAP was tested using coregulated gene clusters collected from the literature. In most test cases, PAP identified the transcription regulators of the input genes accurately. When compared with chromatin immunoprecipitation experiment data, PAP's predictions are consistent with the experimental observations. When PAP was used to analyze one published expression-profiling data set and two novel coregulated gene sets, PAP was able to generate biologically meaningful hypotheses. Therefore, by taking a systematic approach of considering all promoters and characterized TFs in our model, we were able to make more reliable predictions about the regulation of gene expression in mammalian organisms.
Affiliation(s)
- Li-Wei Chang
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
410
Choi D, Fang Y, Mathers WD. Condition-specific coregulation with cis-regulatory motifs and modules in the mouse genome. Genomics 2006; 87:500-8. [PMID: 16431075 DOI: 10.1016/j.ygeno.2005.11.015] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 06/06/2005] [Accepted: 11/26/2005] [Indexed: 11/30/2022]
Abstract
Deciphering genetic regulatory codes remains a challenge. Here, we present an effective approach to identifying in vivo condition-specific coregulation with cis-regulatory motifs and modules in the mouse genome. A resampling-based algorithm was adopted to cluster our microarray data of a stress response, which generated 35 tight clusters with unique expression patterns containing 811 genes of 5652 genes significantly altered. Database searches identified many known motifs within the 3-kb regulatory regions of 40 genes from 3 clusters and modules with six to nine motifs that were commonly shared by 60-100% of these genes. The upstream regulatory region contained the highest frequency of these common motifs. CisModule program predictions were comparable with the results from database searches and found four potentially novel motifs. This result indicates that these motifs and modules could be responsible for gene coregulation of the stress response in the lacrimal gland.
Affiliation(s)
- Dongseok Choi
- Division of Biostatistics, Department of Public Health & Preventive Medicine, Oregon Health & Science University, 3375 SW Terwilliger Boulevard, Portland, OR 97239, USA
411
Kong KF, Jayawardena SR, Indulkar SD, Del Puerto A, Koh CL, Høiby N, Mathee K. Pseudomonas aeruginosa AmpR is a global transcriptional factor that regulates expression of AmpC and PoxB beta-lactamases, proteases, quorum sensing, and other virulence factors. Antimicrob Agents Chemother 2005; 49:4567-75. [PMID: 16251297 PMCID: PMC1280116 DOI: 10.1128/aac.49.11.4567-4575.2005] [Citation(s) in RCA: 125] [Impact Index Per Article: 6.9] [Indexed: 11/20/2022] Open
Abstract
In members of the family Enterobacteriaceae, ampC, which encodes a beta-lactamase, is regulated by an upstream, divergently transcribed gene, ampR. However, in Pseudomonas aeruginosa, the regulation of ampC is not understood. In this study, we compared the characteristics of a P. aeruginosa ampR mutant, PAOampR, with that of an isogenic ampR+ parent. The ampR mutation greatly altered AmpC production. In the absence of antibiotic, PAOampR expressed increased basal beta-lactamase levels. However, this increase was not followed by a concomitant increase in the P(ampC) promoter activity. The discrepancy in protein and transcription analyses led us to discover the presence of another chromosomal AmpR-regulated beta-lactamase, PoxB. We found that the expression of P. aeruginosa ampR greatly altered the beta-lactamase production from ampC and poxB in Escherichia coli: it up-regulated AmpC but down-regulated PoxB activities. In addition, the constitutive P(ampR) promoter activity in PAOampR indicated that AmpR did not autoregulate in the absence or presence of inducers. We further demonstrated that AmpR is a global regulator because the strain carrying the ampR mutation produced higher levels of pyocyanin and LasA protease and lower levels of LasB elastase than the wild-type strain. The increase in LasA levels was positively correlated with the P(lasA), P(lasI), and P(lasR) expression. The reduction in the LasB activity was positively correlated with the P(rhlR) expression. Thus, AmpR plays a dual role, positively regulating the ampC, lasB, and rhlR expression levels and negatively regulating the poxB, lasA, lasI, and lasR expression levels.
Affiliation(s)
- Kok-Fai Kong
- Department of Biological Sciences, Florida International University, University Park, Miami, Florida 33199, USA
412
Warren CL, Kratochvil NCS, Hauschild KE, Foister S, Brezinski ML, Dervan PB, Phillips GN, Ansari AZ. Defining the sequence-recognition profile of DNA-binding molecules. Proc Natl Acad Sci U S A 2006; 103:867-72. [PMID: 16418267 PMCID: PMC1347994 DOI: 10.1073/pnas.0509843102] [Citation(s) in RCA: 176] [Impact Index Per Article: 9.8] [Indexed: 11/18/2022] Open
Abstract
Determining the sequence-recognition properties of DNA-binding proteins and small molecules remains a major challenge. To address this need, we have developed a high-throughput approach that provides a comprehensive profile of the binding properties of DNA-binding molecules. The approach is based on displaying every permutation of a duplex DNA sequence (up to 10 positional variants) on a microfabricated array. The entire sequence space is interrogated simultaneously, and the affinity of a DNA-binding molecule for every sequence is obtained in a rapid, unbiased, and unsupervised manner. Using this platform, we have determined the full molecular recognition profile of an engineered small molecule and a eukaryotic transcription factor. The approach also yielded unique insights into the altered sequence-recognition landscapes as a result of cooperative assembly of DNA-binding molecules in a ternary complex. Solution studies strongly corroborated the sequence preferences identified by the array analysis.
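The array described in this abstract displays every sequence variant of a fixed length. The sketch below illustrates the size of that sequence space, assuming the phrase refers to all 4^k sequences over A, C, G, T at k positions. Treating a sequence and its reverse complement as the same duplex is a further assumption made here for illustration, and the function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def all_kmers(k):
    """Yield every DNA sequence of length k."""
    for bases in product("ACGT", repeat=k):
        yield "".join(bases)

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def distinct_duplexes(k):
    """Count k-mers after merging each sequence with its reverse complement."""
    return len({min(s, reverse_complement(s)) for s in all_kmers(k)})

# Single-strand sequence space for 10 positions.
print(4 ** 10)
# Duplex count for a small k, which keeps the enumeration fast.
print(distinct_duplexes(4))
```

The point of the sketch is only that the space grows as 4^k, so ten variable positions already require on the order of a million array features.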
413
Chen JCY, Powers T. Coordinate regulation of multiple and distinct biosynthetic pathways by TOR and PKA kinases in S. cerevisiae. Curr Genet 2006; 49:281-93. [PMID: 16397762 DOI: 10.1007/s00294-005-0055-9] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Received: 09/27/2005] [Revised: 11/14/2005] [Accepted: 11/15/2005] [Indexed: 10/25/2022]
Abstract
The target of rapamycin (TOR) signaling pathway is an essential regulator of cell growth in eukaryotic cells. In Saccharomyces cerevisiae, TOR controls the expression of many genes involved in a wide array of distinct nutrient-responsive metabolic pathways. By exploring the TOR pathway under different growth conditions, we have identified novel TOR-regulated genes, including genes required for branched-chain amino acid biosynthesis as well as lysine biosynthesis (LYS genes). We show that TOR-dependent control of LYS gene expression occurs independently from previously identified LYS gene regulators and is instead coupled to cAMP-regulated protein kinase A (PKA). Additional genome-wide expression analyses reveal that TOR and PKA coregulate LYS gene expression in a pattern that is remarkably similar to genes within the ribosomal protein and "Ribi" regulon genes required for ribosome biogenesis. Moreover, this pattern of coregulation is distinct from other clusters of TOR/PKA coregulated genes, which includes genes involved in fermentation as well as aerobic respiration, suggesting that control of gene expression by TOR and PKA involves multiple modes of crosstalk. Our results underscore how multiple signaling pathways, general growth conditions, as well as the availability of specific nutrients contribute to the maintenance of appropriate patterns of gene activity in yeast.
Affiliation(s)
- Jenny C-Y Chen
- Section of Molecular and Cellular Biology, College of Biological Sciences, University of California, Davis, CA 95616, USA
414
Abstract
DNA-binding proteins are important for various cellular processes, such as transcriptional regulation, recombination, replication, repair, and DNA modification. Of particular interest are transcription factors (TFs), since through interactions with their DNA binding sites, they modulate gene expression in a manner required for normal cellular growth and differentiation, and also for response to environmental stimuli. To date, the DNA-binding specificities of most DNA-binding proteins remain unknown, as earlier technologies aimed at characterizing DNA-protein interactions have been laborious and not highly scalable. New DNA microarray-based technology, termed protein binding microarrays (PBMs), has been developed that allows rapid, high-throughput characterization of in vitro DNA binding site sequence specificities of TFs or of any DNA binding protein. DNA binding site data from PBMs can be used to predict what genes are regulated by a given TF, what the functions are of a given TF and its predicted target genes, and how that TF may fit into the transcriptional regulatory networks of the cell.
Affiliation(s)
- Martha L. Bulyk
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115
- Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115
- Harvard/MIT Division of Health Sciences and Technology (HST), Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115
|
415
|
Wang G, Zhang W. A steganalysis-based approach to comprehensive identification and characterization of functional regulatory elements. Genome Biol 2006; 7:R49. [PMID: 16787547 PMCID: PMC1779545 DOI: 10.1186/gb-2006-7-6-r49] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 02/03/2006] [Revised: 04/10/2006] [Accepted: 05/17/2006] [Indexed: 11/23/2022] Open
Abstract
The comprehensive identification of cis-regulatory elements on a genome scale is a challenging problem. We develop a novel, steganalysis-based approach for genome-wide motif finding, called WordSpy, by viewing regulatory regions as a stegoscript with cis-elements embedded in 'background' sequences. We apply WordSpy to the promoters of cell-cycle-related genes of Saccharomyces cerevisiae and Arabidopsis thaliana, identifying all known cell-cycle motifs with high ranking. WordSpy can discover a complete set of cis-elements and facilitate the systematic study of regulatory networks.
Affiliation(s)
- Guandong Wang
- Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
- Weixiong Zhang
- Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
- Department of Genetics, Washington University, St. Louis, MO 63130, USA
|
416
|
Van Hellemont R, Monsieurs P, Thijs G, De Moor B, Van de Peer Y, Marchal K. A novel approach to identifying regulatory motifs in distantly related genomes. Genome Biol 2005; 6:R113. [PMID: 16420672 PMCID: PMC1414112 DOI: 10.1186/gb-2005-6-13-r113] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 05/31/2005] [Revised: 08/22/2005] [Accepted: 12/01/2005] [Indexed: 11/25/2022] Open
Abstract
A two-step procedure for identifying regulatory motifs in distantly related organisms is described that combines the advantages of sequence alignment and motif detection approaches. Although proven successful in the identification of regulatory motifs, phylogenetic footprinting methods still show some shortcomings. To assess these difficulties, most apparent when applying phylogenetic footprinting to distantly related organisms, we developed a two-step procedure that combines the advantages of sequence alignment and motif detection approaches. The results on well-studied benchmark datasets indicate that the presented method outperforms other methods when the sequences become either too long or too heterogeneous in size.
Affiliation(s)
- Ruth Van Hellemont
- ESAT-SCD, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Pieter Monsieurs
- ESAT-SCD, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Gert Thijs
- ESAT-SCD, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Bart De Moor
- ESAT-SCD, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Yves Van de Peer
- Plant Systems Biology, Bioinformatics and Evolutionary Genomics, VIB/Ghent University, Technologiepark 927, 9052 Gent, Belgium
- Kathleen Marchal
- ESAT-SCD, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Department of Microbial and Molecular Systems, KU Leuven, Kasteelpark Arenberg 20, 3001 Leuven-Heverlee, Belgium
|
417
|
Siddharthan R, Siggia ED, van Nimwegen E. PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny. PLoS Comput Biol 2005; 1:e67. [PMID: 16477324 PMCID: PMC1309704 DOI: 10.1371/journal.pcbi.0010067] [Citation(s) in RCA: 176] [Impact Index Per Article: 9.3] [Received: 07/27/2005] [Accepted: 10/28/2005] [Indexed: 12/27/2022] Open
Abstract
A central problem in the bioinformatics of gene regulation is to find the binding sites for regulatory proteins. One of the most promising approaches toward identifying these short and fuzzy sequence patterns is the comparative analysis of orthologous intergenic regions of related species. This analysis is complicated by various factors. First, one needs to take the phylogenetic relationship between the species into account in order to distinguish conservation that is due to the occurrence of functional sites from spurious conservation that is due to evolutionary proximity. Second, one has to deal with the complexities of multiple alignments of orthologous intergenic regions, and one has to consider the possibility that functional sites may occur outside of conserved segments. Here we present a new motif sampling algorithm, PhyloGibbs, that runs on arbitrary collections of multiple local sequence alignments of orthologous sequences. The algorithm searches over all ways in which an arbitrary number of binding sites for an arbitrary number of transcription factors (TFs) can be assigned to the multiple sequence alignments. These binding site configurations are scored by a Bayesian probabilistic model that treats aligned sequences by a model for the evolution of binding sites and "background" intergenic DNA. This model takes the phylogenetic relationship between the species in the alignment explicitly into account. The algorithm uses simulated annealing and Monte Carlo Markov-chain sampling to rigorously assign posterior probabilities to all the binding sites that it reports. In tests on synthetic data and real data from five Saccharomyces species our algorithm performs significantly better than four other motif-finding algorithms, including algorithms that also take phylogeny into account. Our results also show that, in contrast to the other algorithms, PhyloGibbs can make realistic estimates of the reliability of its predictions. 
Our tests suggest that, running on the five-species multiple alignment of a single gene's upstream region, PhyloGibbs on average recovers over 50% of all binding sites in S. cerevisiae at a specificity of about 50%, and 33% of all binding sites at a specificity of about 85%. We also tested PhyloGibbs on collections of multiple alignments of intergenic regions that were recently annotated, based on ChIP-on-chip data, to contain binding sites for the same TF. We compared PhyloGibbs's results with the previous analysis of these data using six other motif-finding algorithms. For 16 of 21 TFs for which all other motif-finding methods failed to find a significant motif, PhyloGibbs did recover a motif that matches the literature consensus. In 11 cases where there was disagreement in the results we compiled lists of known target genes from the literature, and found that running PhyloGibbs on their regulatory regions yielded a binding motif matching the literature consensus in all but one of the cases. Interestingly, these literature gene lists had little overlap with the targets annotated based on the ChIP-on-chip data. The PhyloGibbs code can be downloaded from http://www.biozentrum.unibas.ch/~nimwegen/cgi-bin/phylogibbs.cgi or http://www.imsc.res.in/~rsidd/phylogibbs. The full set of predicted sites from our tests on yeast are available at http://www.swissregulon.unibas.ch.
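As a rough illustration of the sampling idea behind motif finders of this family, the following is a minimal sketch of a plain Gibbs site sampler. It is not PhyloGibbs: it ignores phylogeny, alignments, simulated annealing, and posterior tracking, and it assumes exactly one site per sequence, a uniform background, and uppercase A/C/G/T input. The function names `column_profile` and `gibbs_motif_search` are hypothetical.

```python
import random
from collections import Counter

ALPHABET = "ACGT"

def column_profile(sites, pseudocount=1.0):
    """Per-position base frequencies of equally long sites, with pseudocounts."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    profile = []
    for j in range(width):
        counts = Counter(site[j] for site in sites)
        profile.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return profile

def gibbs_motif_search(seqs, width, n_iter=200, seed=0):
    """Return one sampled site start per sequence (needs at least two sequences)."""
    rng = random.Random(seed)
    # Start from a random site in every sequence.
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(n_iter):
        for i, s in enumerate(seqs):
            # Profile built from the current sites of all other sequences.
            others = [t[p:p + width]
                      for k, (t, p) in enumerate(zip(seqs, pos)) if k != i]
            profile = column_profile(others)
            # Likelihood ratio (profile vs. uniform background) for each window.
            weights = []
            for start in range(len(s) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= profile[j][s[start + j]] / 0.25
                weights.append(w)
            # Resample this sequence's site in proportion to the weights.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos
```

Because the sampler is stochastic, different seeds can settle on different site sets; in practice several restarts are compared.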
Affiliation(s)
- Rahul Siddharthan
- Center for Studies in Physics and Biology, The Rockefeller University, New York, New York, United States of America
- Institute of Mathematical Sciences, Taramani, Chennai, India
- Eric D Siggia
- Center for Studies in Physics and Biology, The Rockefeller University, New York, New York, United States of America
- Erik van Nimwegen
- Center for Studies in Physics and Biology, The Rockefeller University, New York, New York, United States of America
- Division of Bioinformatics, Biozentrum, University of Basel, Basel, Switzerland
- * To whom correspondence should be addressed. E-mail:
|
418
|
Ettwiller L, Paten B, Souren M, Loosli F, Wittbrodt J, Birney E. The discovery, positioning and verification of a set of transcription-associated motifs in vertebrates. Genome Biol 2005; 6:R104. [PMID: 16356267 PMCID: PMC1414082 DOI: 10.1186/gb-2005-6-12-r104] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.2] [Received: 08/22/2005] [Revised: 10/18/2005] [Accepted: 11/08/2005] [Indexed: 11/10/2022] Open
Abstract
We have developed several new methods to investigate transcriptional motifs in vertebrates. We developed a specific alignment tool appropriate for regions involved in transcription control, and exhaustively enumerated all possible 12-mers for involvement in transcription by virtue of their mammalian conservation. We then used deeper comparative analysis across vertebrates to identify the active instances of these motifs. We have shown experimentally in Medaka fish that a subset of these predictions is involved in transcription.
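The idea of enumerating k-mers and keeping those that are conserved can be sketched on a toy pairwise alignment. The snippet below is only an assumption-laden illustration, not the authors' alignment tool or scoring scheme: it counts k-mers that are identical and gap-free at the same columns of two pre-aligned sequences. The function name `conserved_kmers` and the use of `-` as the gap character are choices made here.

```python
from collections import Counter

def conserved_kmers(aligned_a, aligned_b, k):
    """Count k-mers that are identical and gap-free in both aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for i in range(len(aligned_a) - k + 1):
        window_a = aligned_a[i:i + k]
        window_b = aligned_b[i:i + k]
        # Keep the window only if both species agree at every column
        # and the window contains no alignment gap.
        if window_a == window_b and "-" not in window_a:
            counts[window_a] += 1
    return counts
```

For the 12-mers mentioned in the abstract, `k` would be 12; a realistic analysis would also need a background model to decide which counts are surprising.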
Affiliation(s)
- Laurence Ettwiller
- EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Benedict Paten
- EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Felix Loosli
- EMBL, Meyerhofstrasse, 69012 Heidelberg, Germany
- Ewan Birney
- EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
|
419
|
Silva WLDS, Cavalcanti ARDO, Guimarães KS, Morais Jr. MAD. Identification in silico of putative damage responsive elements (DRE) in promoter regions of the yeast genome. Genet Mol Biol 2005. [DOI: 10.1590/s1415-47572005000500025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022] Open
|
420
|
Jiao Y, Ma L, Strickland E, Deng XW. Conservation and divergence of light-regulated genome expression patterns during seedling development in rice and Arabidopsis. Plant Cell 2005; 17:3239-56. [PMID: 16284311 PMCID: PMC1315367 DOI: 10.1105/tpc.105.035840] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.6] [Indexed: 05/05/2023]
Abstract
Genome-wide 70-mer oligonucleotide microarrays of rice (Oryza sativa) and Arabidopsis thaliana were used to profile genome expression changes during light-regulated seedling development. We estimate that the expression of approximately 20% of the genome in both rice and Arabidopsis seedlings is regulated by white light. Qualitatively similar expression profiles from seedlings grown under different light qualities were observed in both species; however, a quantitatively weaker effect on genome expression was observed in rice. Most metabolic pathways exhibited qualitatively similar light regulation in both species with a few species-specific differences. Global comparison of expression profiles between rice and Arabidopsis reciprocal best-matched gene pairs revealed a higher correlation of genome expression patterns in constant light than in darkness, suggesting that the genome expression profile of photomorphogenesis is more conserved. Transcription factor gene expression under constant light exposure was poorly conserved between the two species, implying a faster-evolving rate of transcription factor gene expression in light-grown plants. Organ-specific expression profiles during seedling photomorphogenesis provide genome-level evidence for divergent light effects in different higher plant organs. Finally, overrepresentation of specific promoter motifs in root- and leaf-specific light-regulated genes in both species suggests that these cis-elements are important for gene expression responses to light.
Affiliation(s)
- Yuling Jiao
- Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, Connecticut 06520-8014, USA
|
421
|
Perco P, Kainz A, Mayer G, Lukas A, Oberbauer R, Mayer B. Detection of coregulation in differential gene expression profiles. Biosystems 2005; 82:235-47. [PMID: 16181729 DOI: 10.1016/j.biosystems.2005.08.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 03/22/2005] [Revised: 08/02/2005] [Accepted: 08/02/2005] [Indexed: 01/04/2023]
Abstract
Genomics and proteomics approaches generate distinct gene expression and protein profiles, listing individual genes embedded in broad functional terms as gene ontologies. However, interpretation of gene profiles in a regulatory and functional context remains a major issue. Elucidation of regulatory mechanisms at the gene expression level via analysis of promoter regions is a prominent procedure to decipher such gene regulatory networks. We propose a novel genetic algorithm (GA) to extract joint promoter modules in a set of coexpressed genes as resulting from differential gene expression experiments. Algorithm design has focused on the following constraints: (I) identification of the major promoter modules, which are (II) characterized by a maximum number of joint motifs and (III) are found in a maximum number of coexpressed genes. The capability of the GA in detecting multiple modules was evaluated on various test data sets, analyzing the impact of the number of motifs per promoter module, the number of genes associated with a module, as well as the total number of distinct promoter modules encoded in a sequence set. In addition to the test data sets, the GA was evaluated on two biological examples, namely a muscle-specific data set and the upstream sequences of the beta-actin gene (ACTB) derived from different species, complemented by a comparison to alternative promoter module identification routines.
Affiliation(s)
- Paul Perco
- Institute for Biomolecular Structural Chemistry, University of Vienna, Campus Vienna Biocenter 6, 1030 Vienna, Austria
|
422
|
Barnes DW, Mattingly CJ, Parton A, Dowell LM, Bayne CJ, Forrest JN. Marine organism cell biology and regulatory sequence discovery in comparative functional genomics. Cytotechnology 2005; 46:123-37. [PMID: 19003267 PMCID: PMC3449718 DOI: 10.1007/s10616-005-1719-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 08/04/2005] [Accepted: 08/04/2005] [Indexed: 01/28/2023] Open
Abstract
The use of bioinformatics to integrate phenotypic and genomic data from mammalian models is well established as a means of understanding human biology and disease. Beyond direct biomedical applications of these approaches in predicting structure–function relationships between coding sequences and protein activities, comparative studies also promote understanding of molecular evolution and the relationship between genomic sequence and morphological and physiological specialization. Recently recognized is the potential of comparative studies to identify functionally significant regulatory regions and to generate experimentally testable hypotheses that contribute to understanding mechanisms that regulate gene expression, including transcriptional activity, alternative splicing and transcript stability. Functional tests of hypotheses generated by computational approaches require experimentally tractable in vitro systems, including cell cultures. Comparative sequence analysis strategies that use genomic sequences from a variety of evolutionarily diverse organisms are critical for identifying conserved regulatory motifs in the 5′-upstream, 3′-downstream and introns of genes. Genomic sequences and gene orthologues in the first aquatic vertebrate and protovertebrate organisms to be fully sequenced (Fugu rubripes, Ciona intestinalis, Tetraodon nigroviridis, Danio rerio) as well as in the elasmobranchs, spiny dogfish shark (Squalus acanthias) and little skate (Raja erinacea), and marine invertebrate models such as the sea urchin (Strongylocentrotus purpuratus) are valuable in the prediction of putative genomic regulatory regions. Cell cultures have been derived for these and other model species. Data and tools resulting from these kinds of studies will contribute to understanding transcriptional regulation of biomedically important genes and provide new avenues for medical therapeutics and disease prevention.
Affiliation(s)
- David W Barnes
- Mount Desert Island Biological Laboratory, Center for Marine Functional Genomics Studies, P.O. Box 35, Old Bar Harbour Road, Salisbury Cove, ME, 04672, USA
|
423
|
Futschik ME, Carlisle B. Noise-robust soft clustering of gene expression time-course data. J Bioinform Comput Biol 2005; 3:965-88. [PMID: 16078370 DOI: 10.1142/s0219720005001375] [Citation(s) in RCA: 286] [Impact Index Per Article: 15.1] [Received: 10/14/2004] [Revised: 01/24/2005] [Accepted: 01/30/2005] [Indexed: 11/18/2022]
Abstract
Clustering is an important tool in microarray data analysis. This unsupervised learning technique is commonly used to reveal structures hidden in large gene expression data sets. The vast majority of clustering algorithms applied so far produce hard partitions of the data, i.e. each gene is assigned exactly to one cluster. Hard clustering is favourable if clusters are well separated. However, this is generally not the case for microarray time-course data, where gene clusters frequently overlap. Additionally, hard clustering algorithms are often highly sensitive to noise. To overcome the limitations of hard clustering, we applied soft clustering which offers several advantages for researchers. First, it generates accessible internal cluster structures, i.e. it indicates how well corresponding clusters represent genes. This can be used for the more targeted search for regulatory elements. Second, the overall relation between clusters, and thus a global clustering structure, can be defined. Additionally, soft clustering is more noise robust and a priori pre-filtering of genes can be avoided. This prevents the exclusion of biologically relevant genes from the data analysis. Soft clustering was implemented here using the fuzzy c-means algorithm. Procedures to find optimal clustering parameters were developed. A software package for soft clustering has been developed based on the open-source statistical language R. The package called Mfuzz is freely available.
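The fuzzy c-means updates named in the abstract can be sketched in a few lines. This is a generic textbook version written in Python for illustration, not the Mfuzz package (which is in R). The function names, the fuzzifier default `m=2.0`, and the optional `init_centers` argument are assumptions made here. Each point receives a membership in every cluster, and each row of memberships sums to 1.

```python
import math
import random

def _memberships(points, centers, m):
    """Soft membership of every point in every cluster (each row sums to 1)."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Clamp distances so a point sitting on a center does not divide by zero.
        dists = [max(math.dist(p, c), 1e-12) for c in centers]
        rows.append([1.0 / sum((dj / dk) ** exponent for dk in dists)
                     for dj in dists])
    return rows

def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, seed=0, init_centers=None):
    """Alternate membership and center updates for a fixed number of iterations."""
    rng = random.Random(seed)
    if init_centers is None:
        init_centers = rng.sample(list(points), n_clusters)
    centers = [list(c) for c in init_centers]
    dim = len(points[0])
    for _ in range(n_iter):
        u = _memberships(points, centers, m)
        for j in range(len(centers)):
            # Each center is the mean of all points, weighted by membership**m.
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            centers[j] = [sum(w * p[d] for w, p in zip(weights, points)) / total
                          for d in range(dim)]
    return centers, _memberships(points, centers, m)
```

A gene whose largest membership is low sits between clusters, which is the kind of internal cluster structure the abstract refers to.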
Affiliation(s)
- Matthias E Futschik
- Institute of Theoretical Biology, Humboldt-University, Invalidenstr. 43, 10115 Berlin, Germany.
|
424
|
Li X, Zhong S, Wong WH. Reliable prediction of transcription factor binding sites by phylogenetic verification. Proc Natl Acad Sci U S A 2005; 102:16945-50. [PMID: 16286651 PMCID: PMC1283155 DOI: 10.1073/pnas.0504201102] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.0] [Received: 05/20/2005] [Accepted: 10/03/2005] [Indexed: 11/18/2022] Open
Abstract
We present a statistical methodology that substantially improves the accuracy of computational predictions of transcription factor (TF) binding sites in eukaryote genomes. This method models the cross-species conservation of binding sites without relying on accurate sequence alignment. It can be coupled with any motif-finding algorithm that searches for overrepresented sequence motifs in individual species and can increase the accuracy of the coupled motif-finding algorithm. Because this method is capable of accurately detecting TF binding sites, it also enhances our ability to predict cis-regulatory modules. We applied this method to published chromatin immunoprecipitation (ChIP)-chip data in Saccharomyces cerevisiae and found that its sensitivity and specificity are 9% and 14% higher than those of two recent methods. We also recovered almost all of the previously verified TF binding sites and made predictions on the cis-regulatory elements that govern the tight regulation of ribosomal protein genes in 13 eukaryote species (2 plants, 4 yeasts, 2 worms, 2 insects, and 3 mammals). These results give insights into transcriptional regulation in eukaryotic organisms.
Affiliation(s)
- Xiaoman Li
- Department of Statistics, Stanford University, Sequoia Hall, 390 Serra Mall, Stanford, CA 94305-4065, USA.
|
425
|
Eriksson PR, Mendiratta G, McLaughlin NB, Wolfsberg TG, Mariño-Ramírez L, Pompa TA, Jainerin M, Landsman D, Shen CH, Clark DJ. Global regulation by the yeast Spt10 protein is mediated through chromatin structure and the histone upstream activating sequence elements. Mol Cell Biol 2005; 25:9127-37. [PMID: 16199888 PMCID: PMC1265784 DOI: 10.1128/mcb.25.20.9127-9137.2005] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.8] [Indexed: 11/20/2022] Open
Abstract
The yeast SPT10 gene encodes a putative histone acetyltransferase (HAT) implicated as a global transcription regulator acting through basal promoters. Here we address the mechanism of this global regulation. Although microarray analysis confirmed that Spt10p is a global regulator, Spt10p was not detected at any of the most strongly affected genes in vivo. In contrast, the presence of Spt10p at the core histone gene promoters in vivo was confirmed. Since Spt10p activates the core histone genes, a shortage of histones could occur in spt10Δ cells, resulting in defective chromatin structure and a consequent activation of basal promoters. Consistent with this hypothesis, the spt10Δ phenotype can be rescued by extra copies of the histone genes and chromatin is poorly assembled in spt10Δ cells, as shown by irregular nucleosome spacing and reduced negative supercoiling of the endogenous 2μm plasmid. Furthermore, Spt10p binds specifically and highly cooperatively to pairs of upstream activating sequence elements in the core histone promoters [consensus sequence, (G/A)TTCCN(6)TTCNC], consistent with a direct role in histone gene regulation. No other high-affinity sites are predicted in the yeast genome. Thus, Spt10p is a sequence-specific activator of the histone genes, possessing a DNA-binding domain fused to a likely HAT domain.
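The consensus quoted in the abstract, (G/A)TTCCN(6)TTCNC, maps directly onto a regular expression, so a scan of both strands can be sketched as follows. This is only an exact-match illustration: the function names are hypothetical, and the paper's actual site prediction and its treatment of paired elements are not reproduced here.

```python
import re

# (G/A)TTCC N(6) TTC N C, wrapped in a lookahead so overlapping hits are reported.
CONSENSUS = re.compile(r"(?=([GA]TTCC[ACGT]{6}TTC[ACGT]C))")

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_sites(seq):
    """Return (start, strand, matched sequence) for every exact consensus match."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in CONSENSUS.finditer(seq)]
    rc = reverse_complement(seq)
    for m in CONSENSUS.finditer(rc):
        # Convert the position on the reverse strand to forward-strand coordinates.
        start = len(seq) - (m.start() + len(m.group(1)))
        hits.append((start, "-", m.group(1)))
    return sorted(hits)
```

For example, `find_sites("AAGTTCCAAAAAATTCGCTT")` reports a single forward-strand hit starting at position 2.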
Affiliation(s)
- Peter R Eriksson
- Laboratory of Molecular Growth Regulation, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, USA
|
426
|
Hwang D, Smith JJ, Leslie DM, Weston AD, Rust AG, Ramsey S, de Atauri P, Siegel AF, Bolouri H, Aitchison JD, Hood L. A data integration methodology for systems biology: experimental verification. Proc Natl Acad Sci U S A 2005; 102:17302-7. [PMID: 16301536 PMCID: PMC1297683 DOI: 10.1073/pnas.0508649102] [Citation(s) in RCA: 109] [Impact Index Per Article: 5.7] [Indexed: 11/18/2022] Open
Abstract
The integration of data from multiple global assays is essential to understanding dynamic spatiotemporal interactions within cells. In a companion paper, we reported a data integration methodology, designated Pointillist, that can handle multiple data types from technologies with different noise characteristics. Here we demonstrate its application to the integration of 18 data sets relating to galactose utilization in yeast. These data include global changes in mRNA and protein abundance, genome-wide protein-DNA interaction data, database information, and computational predictions of protein-DNA and protein-protein interactions. We divided the integration task to determine three network components: key system elements (genes and proteins), protein-protein interactions, and protein-DNA interactions. Results indicate that the reconstructed network efficiently focuses on and recapitulates the known biology of galactose utilization. It also provided new insights, some of which were verified experimentally. The methodology described here addresses a critical need across all domains of molecular and cell biology: to effectively integrate large and disparate data sets.
Affiliation(s)
- Daehee Hwang
- Institute for Systems Biology, Seattle, WA 98103, USA
|
427
|
Murillo LA, Newport G, Lan CY, Habelitz S, Dungan J, Agabian NM. Genome-wide transcription profiling of the early phase of biofilm formation by Candida albicans. Eukaryot Cell 2005; 4:1562-73. [PMID: 16151249 PMCID: PMC1214198 DOI: 10.1128/ec.4.9.1562-1573.2005] [Citation(s) in RCA: 122] [Impact Index Per Article: 6.4] [Indexed: 11/20/2022]
Abstract
The ability to adhere to surfaces and develop as a multicellular community is an adaptation used by most microorganisms to survive in changing environments. Biofilm formation proceeds through distinct developmental phases and impacts not only medicine but also industry and evolution. In organisms such as the opportunistic pathogen Candida albicans, the ability to grow as biofilms is also an important mechanism for persistence, facilitating its growth on different tissues and a broad range of abiotic surfaces used in medical devices. The early stage of C. albicans biofilm is characterized by the adhesion of single cells to the substratum, followed by the formation of an intricate network of hyphae and the beginning of a dense structure. Changes in the transcriptome begin within 30 min of contact with the substrate and include expression of genes related to sulfur metabolism, in particular MET3, and the equivalent gene homologues of the Ribi regulon in Saccharomyces cerevisiae. Some of these changes are initiated early and maintained throughout the process; others are restricted to the earliest stages of biofilm formation. We identify here a potential alternative pathway for cysteine metabolism and the biofilm-associated expression of genes involved in glutathione production in C. albicans.
Affiliation(s)
- Luis A Murillo
- Department of Cell and Tissue Biology, University of California, San Francisco, 521 Parnassus Ave., San Francisco, CA 94143-0422, USA
|
428
|
Chan BY, Kibler D. Using hexamers to predict cis-regulatory motifs in Drosophila. BMC Bioinformatics 2005; 6:262. [PMID: 16253142 PMCID: PMC1291357 DOI: 10.1186/1471-2105-6-262] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.8] [Received: 06/30/2005] [Accepted: 10/27/2005] [Indexed: 12/22/2022] Open
Abstract
Background: Cis-regulatory modules (CRMs) are short stretches of DNA that help regulate gene expression in higher eukaryotes. They have been found up to 1 megabase away from the genes they regulate and can be located upstream, downstream, and even within their target genes. Due to the difficulty of finding CRMs using biological and computational techniques, even well-studied regulatory systems may contain CRMs that have not yet been discovered. Results: We present a simple, efficient method (HexDiff) based only on hexamer frequencies of known CRMs and non-CRM sequence to predict novel CRMs in regulatory systems. On a data set of 16 gap and pair-rule genes containing 52 known CRMs, predictions made by HexDiff had a higher correlation with the known CRMs than several existing CRM prediction algorithms: Ahab, Cluster Buster, MSCAN, MCAST, and LWF. After combining the results of the different algorithms, 10 putative CRMs were identified and are strong candidates for future study. The hexamers used by HexDiff to distinguish between CRMs and non-CRM sequence were also analyzed and were shown to be enriched in regulatory elements. Conclusion: HexDiff provides an efficient and effective means for finding new CRMs based on known CRMs, rather than known binding sites.
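The general idea of discriminating CRM from non-CRM sequence by hexamer frequencies can be sketched as below. This is not the published HexDiff scoring: the frequency-ratio threshold `min_ratio`, the pseudocount, and all function names are assumptions chosen to show the flow of counting hexamers, selecting enriched ones, and scoring a window.

```python
from collections import Counter

def kmer_freqs(seqs, k=6):
    """Relative frequency of every k-mer over a set of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()} if total else {}

def enriched_kmers(crm_seqs, background_seqs, k=6, min_ratio=2.0, pseudo=1e-6):
    """k-mers at least min_ratio times more frequent in CRMs than in background."""
    crm = kmer_freqs(crm_seqs, k)
    bg = kmer_freqs(background_seqs, k)
    return {kmer for kmer, f in crm.items()
            if f / (bg.get(kmer, 0.0) + pseudo) >= min_ratio}

def window_score(window, kmers, k=6):
    """Number of positions in the window that start an enriched k-mer."""
    return sum(window[i:i + k] in kmers for i in range(len(window) - k + 1))
```

Sliding `window_score` along a genomic region and thresholding the result would give candidate CRM intervals under these assumptions.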
Affiliation(s)
- Bob Y Chan
- School of Information and Computer Science, University of California, Irvine, Irvine, California, USA
- Dennis Kibler
- School of Information and Computer Science, University of California, Irvine, Irvine, California, USA
|
429
|
Mahony S, Hendrix D, Smith TJ, Golden A. Self-Organizing Maps of Position Weight Matrices for Motif Discovery in Biological Sequences. Artif Intell Rev 2005. [DOI: 10.1007/s10462-005-9011-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/25/2022]
|
430
|
Tabach Y, Milyavsky M, Shats I, Brosh R, Zuk O, Yitzhaky A, Mantovani R, Domany E, Rotter V, Pilpel Y. The promoters of human cell cycle genes integrate signals from two tumor suppressive pathways during cellular transformation. Mol Syst Biol 2005; 1:2005.0022. [PMID: 16729057 PMCID: PMC1681464 DOI: 10.1038/msb4100030] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.3] [Received: 06/09/2005] [Accepted: 09/22/2005] [Indexed: 12/28/2022] Open
Abstract
Deciphering regulatory events that drive malignant transformation represents a major challenge for systems biology. Here, we analyzed genome-wide transcription profiling of an in vitro cancerous transformation process. We focused on a cluster of genes whose expression levels increased as a function of inactivation of the p53 and p16(INK4A) tumor suppressors. This cluster predominantly consists of cell cycle genes and constitutes a signature of a diversity of cancers. By linking expression profiles of the genes in the cluster with the dynamic behavior of p53 and p16(INK4A), we identified a promoter architecture that integrates signals from the two tumor suppressive channels and that maps their activity onto distinct levels of expression of the cell cycle genes, which, in turn, correspond to different cellular proliferation rates. Taking components of the mitotic spindle as an example, we experimentally verified our predictions that p53-mediated transcriptional repression of several of these novel targets is dependent on the activities of p21, NFY, and E2F. Our study demonstrates how a well-controlled transformation process allows linking between gene expression, promoter architecture, and activity of upstream signaling molecules.
MESH Headings
- Animals
- Cell Cycle Proteins/biosynthesis
- Cell Cycle Proteins/physiology
- Cell Division
- Cell Line, Transformed/metabolism
- Cell Line, Transformed/transplantation
- Cell Transformation, Neoplastic/genetics
- Computational Biology
- Cyclin-Dependent Kinase Inhibitor p16/physiology
- DNA-Binding Proteins/genetics
- DNA-Binding Proteins/physiology
- Fibroblasts/cytology
- Fibroblasts/metabolism
- Gene Expression Profiling
- Gene Expression Regulation
- Genes, Tumor Suppressor
- Genes, cdc
- Genes, p16
- Genes, p53
- Humans
- Mice
- Mice, Nude
- Promoter Regions, Genetic/genetics
- Promoter Regions, Genetic/physiology
- Recombinant Fusion Proteins/physiology
- Regulatory Sequences, Nucleic Acid
- Spindle Apparatus/metabolism
- Telomerase/genetics
- Telomerase/physiology
- Transcription, Genetic
- Transplantation, Heterologous
- Tumor Suppressor Protein p53/physiology
Affiliation(s)
- Yuval Tabach
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
- Michael Milyavsky
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Igor Shats
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Ran Brosh
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Or Zuk
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
- Assif Yitzhaky
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
- Roberto Mantovani
- Dipartimento di Scienze Biomolecolare e Biotecnologie, Universita di Milano, Milan, Italy
- Eytan Domany
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
- Varda Rotter
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel. Tel.: +972 8 934 4501; Fax: +972 8 946 5265; E-mail:
- Yitzhak Pilpel
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel. Tel.: +972 8 934 6058; Fax: +972 8 934 4108; E-mail:
431
Abstract
Many short DNA motifs, such as transcription factor binding sites (TFBS) and splice sites, exhibit strong local as well as nonlocal dependence. We introduce permuted variable length Markov models (PVLMM) which could capture the potentially important dependencies among positions and apply them to the problem of detecting splice and TFB sites. They have been satisfactory from the viewpoint of prediction performance and also give ready biological interpretations of the sequence dependence observed. The issue of model selection is also studied.
Affiliation(s)
- Xiaoyue Zhao
- Department of Statistics, University of California- Berkeley, 367 Evans Hall, Berkeley CA 94720-3860, USA.
432
Hindemitt T, Mayer KFX. CREDO: a web-based tool for computational detection of conserved sequence motifs in noncoding sequences. Bioinformatics 2005; 21:4304-6. [PMID: 16204349 DOI: 10.1093/bioinformatics/bti691] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022] Open
Abstract
SUMMARY CREDO is a user-friendly, web-based tool that integrates the analysis and results of different algorithms widely used for the computational detection of conserved sequence motifs in noncoding sequences. It enables easy comparison of the individual results. CREDO offers intuitive interfaces for easy and rapid configuration of the applied algorithms and convenient views on the results in graphical and tabular formats. AVAILABILITY http://mips.gsf.de/proj/regulomips/credo.htm.
Affiliation(s)
- Tobias Hindemitt
- MIPS/Institute for Bioinformatics, GSF Research Centre for Environment and Health Ingolstaedter Landstrasse 1, 85758 Neuherberg, Germany
433
Shalgi R, Lapidot M, Shamir R, Pilpel Y. A catalog of stability-associated sequence elements in 3' UTRs of yeast mRNAs. Genome Biol 2005; 6:R86. [PMID: 16207357 PMCID: PMC1257469 DOI: 10.1186/gb-2005-6-10-r86] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.2] [Received: 05/10/2005] [Revised: 07/25/2005] [Accepted: 09/06/2005] [Indexed: 12/02/2022] Open
Abstract
By analyzing 3' UTR sequences and mRNA decay profiles in yeast, 53 sequence motifs have been identified that may be implicated in stabilization or destabilization of mRNA. Background In recent years, intensive computational efforts have been directed towards the discovery of promoter motifs that correlate with mRNA expression profiles. Nevertheless, it is still not always possible to predict steady-state mRNA expression levels based on promoter signals alone, suggesting that other factors may be involved. Other genic regions, in particular 3' UTRs, which are known to exert regulatory effects especially through controlling RNA stability and localization, were less comprehensively investigated, and deciphering regulatory motifs within them is thus crucial. Results By analyzing 3' UTR sequences and mRNA decay profiles of Saccharomyces cerevisiae genes, we derived a catalog of 53 sequence motifs that may be implicated in stabilization or destabilization of mRNAs. Some of the motifs correspond to known RNA-binding protein sites, and one of them may act in destabilization of ribosome biogenesis genes during stress response. In addition, we present for the first time a catalog of 23 motifs associated with subcellular localization. A significant proportion of the 3' UTR motifs is highly conserved in orthologous yeast genes, and some of the motifs are strikingly similar to recently published mammalian 3' UTR motifs. We classified all genes into those regulated only at transcription initiation level, only at degradation level, and those regulated by a combination of both. Interestingly, different biological functionalities and expression patterns correspond to such classification. Conclusion The present motif catalogs are a first step towards the understanding of the regulation of mRNA degradation and subcellular localization, two important processes which - together with transcription regulation - determine the cell transcriptome.
Affiliation(s)
- Reut Shalgi
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel
- Michal Lapidot
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel
- Ron Shamir
- School of Computer Science, Tel-Aviv University, Tel-Aviv, 69978, Israel
- Yitzhak Pilpel
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel
434
Granek JA, Clarke ND. Explicit equilibrium modeling of transcription-factor binding and gene regulation. Genome Biol 2005; 6:R87. [PMID: 16207358 PMCID: PMC1257470 DOI: 10.1186/gb-2005-6-10-r87] [Citation(s) in RCA: 96] [Impact Index Per Article: 5.1] [Received: 05/03/2005] [Revised: 06/17/2005] [Accepted: 08/30/2005] [Indexed: 12/02/2022] Open
Abstract
A computational model, GOMER, is presented that predicts transcription-factor binding and incorporates effects of cooperativity and competition. We have developed a computational model that predicts the probability of transcription factor binding to any site in the genome. GOMER (generalizable occupancy model of expression regulation) calculates binding probabilities on the basis of position weight matrices, and incorporates the effects of cooperativity and competition by explicit calculation of coupled binding equilibria. GOMER can be used to test hypotheses regarding gene regulation that build upon this physically principled prediction of protein-DNA binding.
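As a rough illustration of the equilibrium idea described in this abstract, the sketch below turns a PWM match into a relative affinity and then into a binding probability for a single factor. It is not the GOMER implementation. The function names, the use of a frequency-to-background ratio as a relative association constant, and the assumption that sites bind independently are all assumptions made here; cooperativity and competition are left out.

```python
def relative_affinity(site, pwm, background):
    """Relative affinity of a site: product of PWM frequency over background.

    pwm is a list of dicts (one per position) mapping base -> frequency.
    Treating this ratio as a relative association constant is an assumption.
    """
    if len(site) != len(pwm):
        raise ValueError("site and PWM must have the same length")
    affinity = 1.0
    for base, column in zip(site, pwm):
        affinity *= column[base] / background[base]
    return affinity


def site_occupancy(k_assoc, tf_conc):
    """Equilibrium probability that one site is bound: K[TF] / (1 + K[TF])."""
    x = k_assoc * tf_conc
    return x / (1.0 + x)


def region_occupancy(k_values, tf_conc):
    """Probability that at least one site is bound, assuming independent sites."""
    unbound = 1.0
    for k in k_values:
        unbound *= 1.0 - site_occupancy(k, tf_conc)
    return 1.0 - unbound
```

With `k_assoc` and `tf_conc` both set to 1, a single site is bound half of the time, and two such independent sites give a region-level probability of 0.75.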
Affiliation(s)
- Joshua A Granek
- Department of Biophysics and Biophysical Chemistry, Johns Hopkins University School of Medicine, North Wolfe Street, Baltimore, MD 21205, USA
- National Evolutionary Synthesis Center, Broad Street, Durham, NC 27705, USA
- Neil D Clarke
- Department of Biophysics and Biophysical Chemistry, Johns Hopkins University School of Medicine, North Wolfe Street, Baltimore, MD 21205, USA
- Genome Institute of Singapore, Biopolis Street, Singapore 138672, Republic of Singapore
435
He X, Zhang J. Gene complexity and gene duplicability. Curr Biol 2005; 15:1016-21. [PMID: 15936271 DOI: 10.1016/j.cub.2005.04.035] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.2] [Received: 01/19/2005] [Revised: 04/13/2005] [Accepted: 04/19/2005] [Indexed: 11/22/2022]
Abstract
Eukaryotic genes are on average more complex than prokaryotic genes in terms of expression regulation, protein length, and protein-domain structure [1-5]. Eukaryotes are also known to have a higher rate of gene duplication than prokaryotes do [6, 7]. Because gene duplication is the primary source of new genes [], the average gene complexity in a genome may have been increased by gene duplication if complex genes are preferentially duplicated. Here, we test this "gene complexity and gene duplicability" hypothesis with yeast genomic data. We show that, on average, duplicate genes from either whole-genome or individual-gene duplication have longer protein sequences, more functional domains, and more cis-regulatory motifs than singleton genes. This phenomenon is not a by-product of previously known mechanisms, such as protein function [10-13], evolutionary rate [14, 15], dosage [11], and dosage balance [16], that influence gene duplicability. Rather, it appears to have resulted from the sub-neo-functionalization process in duplicate-gene evolution [11]. Under this process, complex genes are more likely to be retained after duplication because they are prone to subfunctionalization, and gene complexity is regained via subsequent neofunctionalization. Thus, gene duplication increases both gene number and gene complexity, two important factors in the origin of genomic and organismal complexity.
Affiliation(s)
- Xionglei He
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor 48109, USA
436
Kielbasa SM, Gonze D, Herzel H. Measuring similarities between transcription factor binding sites. BMC Bioinformatics 2005; 6:237. [PMID: 16191190 PMCID: PMC1261160 DOI: 10.1186/1471-2105-6-237] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.3] [Received: 11/22/2004] [Accepted: 09/28/2005] [Indexed: 11/22/2022] Open
Abstract
Background Collections of transcription factor binding profiles (Transfac, Jaspar) are essential to identify regulatory elements in DNA sequences. Subsets of highly similar profiles complicate large scale analysis of transcription factor binding sites. Results We propose to identify and group similar profiles using two independent similarity measures: χ2 distances between position frequency matrices (PFMs) and correlation coefficients between position weight matrices (PWMs) scores. Conclusion We show that these measures complement each other and allow to associate Jaspar and Transfac matrices. Clusters of highly similar matrices are identified and can be used to optimise the search for regulatory elements. Moreover, the application of the measures is illustrated by assigning E-box matrices of a SELEX experiment and of experimentally characterised binding sites of circadian clock genes to the Myc-Max cluster.
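To make the first of the two similarity measures concrete, here is a minimal sketch of a χ2 comparison between two position frequency matrices of equal length. It treats each pair of aligned columns as a 2 x 4 contingency table of base counts and sums the column statistics. The function names, the summation over columns, and the absence of any alignment or shifting step are assumptions for illustration and may differ from the authors' procedure.

```python
def column_chi2(col_a, col_b):
    """Chi-square statistic for a 2 x 4 contingency table of base counts."""
    total_a, total_b = sum(col_a), sum(col_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each column needs at least one observed base")
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(col_a, col_b):
        base_total = a + b
        if base_total == 0:
            continue  # a base absent from both columns contributes nothing
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * base_total / grand
            stat += (observed - expected) ** 2 / expected
    return stat


def pfm_chi2(pfm_a, pfm_b):
    """Sum of per-column chi-square statistics for two aligned PFMs.

    Each PFM is a list of columns; each column holds counts for A, C, G, T.
    """
    if len(pfm_a) != len(pfm_b):
        raise ValueError("PFMs must have the same number of columns")
    return sum(column_chi2(a, b) for a, b in zip(pfm_a, pfm_b))
```

Columns with the same base proportions score 0 regardless of how many sites were counted, so the statistic reflects differences in composition rather than in sample size.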
Affiliation(s)
- Szymon M Kielbasa
- Institute for Theoretical Biology, Humboldt University, Invalidenstraße 43, D-10115 Berlin, Germany
- Didier Gonze
- Institute for Theoretical Biology, Humboldt University, Invalidenstraße 43, D-10115 Berlin, Germany
- Unité de Chronobiologie Théorique, Université Libre de Bruxelles, CP 231, Campus Plaine, Bvd du Triomphe, B-1050 Bruxelles, Belgium
- Hanspeter Herzel
- Institute for Theoretical Biology, Humboldt University, Invalidenstraße 43, D-10115 Berlin, Germany
437
Kruus E, Thumfort P, Tang C, Wingreen NS. Gibbs sampling and helix-cap motifs. Nucleic Acids Res 2005; 33:5343-53. [PMID: 16174845 PMCID: PMC1234247 DOI: 10.1093/nar/gki842] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 04/13/2005] [Revised: 08/08/2005] [Accepted: 08/30/2005] [Indexed: 11/25/2022] Open
Abstract
Protein backbones have characteristic secondary structures, including alpha-helices and beta-sheets. Which structure is adopted locally is strongly biased by the local amino acid sequence of the protein. Accurate (probabilistic) mappings from sequence to structure are valuable for both secondary-structure prediction and protein design. For the case of alpha-helix caps, we test whether the information content of the sequence-structure mapping can be self-consistently improved by using a relaxed definition of the structure. We derive helix-cap sequence motifs using database helix assignments for proteins of known structure. These motifs are refined using Gibbs sampling in competition with a null motif. Then Gibbs sampling is repeated, allowing for frameshifts of +/-1 amino acid residue, in order to find sequence motifs of higher total information content. All helix-cap motifs were found to have good generalization capability, as judged by training on a small set of non-redundant proteins and testing on a larger set. For overall prediction purposes, frameshift motifs using all training examples yielded the best results. Frameshift motifs using a fraction of all training examples performed best in terms of true positives among top predictions. However, motifs without frameshifts also performed well, despite a roughly one-third lower total information content.
Affiliation(s)
- Erik Kruus
- NEC Laboratories America, Inc. 4 Independence Way, Princeton, NJ 08544, USA.
438
Abstract
Gene duplication plays an important role in evolution because it is the primary source of new genes. Many recent studies showed that gene duplicability varies considerably among genes. Several considerations led us to hypothesize that less important genes have higher rates of successful duplications, where gene importance is measured by the fitness reduction caused by the deletion of the gene. Here, we test this hypothesis by comparing the importance of two groups of singleton genes in the yeast Saccharomyces cerevisiae (Sce). Group S genes did not duplicate in four other yeast species examined, whereas group D experienced duplication in these species. Consistent with our hypothesis, we found group D genes to be less important than group S genes. Specifically, 17% of group D genes are essential in Sce, compared to 28% for group S. Furthermore, deleting a group D gene in Sce reduces the fitness by 24% on average, compared to 38% for group S. Our subsequent analysis showed that less important genes have more cis-regulatory motifs, which could lead to a higher chance of subfunctionalization of duplicate genes and result in an enhanced rate of gene retention. Less important genes may also have weaker dosage imbalance effects and cause fewer genetic perturbations when duplicated. Regardless of the cause, our observation indicates that the previous finding of a less severe fitness consequence of deleting a duplicate gene than deleting a singleton gene is at least in part due to the fact that duplicate genes are intrinsically less important than singleton genes and suggests that the contribution of duplicate genes to genetic robustness has been overestimated.
Affiliation(s)
- Xionglei He
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, USA
439
Vavouri T, Elgar G. Prediction of cis-regulatory elements using binding site matrices--the successes, the failures and the reasons for both. Curr Opin Genet Dev 2005; 15:395-402. [PMID: 15950456 DOI: 10.1016/j.gde.2005.05.002] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.9] [Received: 03/07/2005] [Accepted: 05/23/2005] [Indexed: 01/02/2023]
Abstract
Protein-DNA interactions control many aspects of animal development and cellular responses to the environment. Although profiling of individual transcription factor binding sites is not a reliable guide for predicting the position of cis-regulatory elements in large genomes, modelling the evolution and the organization of regulatory elements has provided enough information to make some successful predictions. For vertebrate genomes, the field is limited by the lack of sufficient experimental data upon which to build reliable models. Nonetheless, a combination of experimental, computational and comparative data is likely to reveal aspects of complex regulatory networks in vertebrates, just as it has already done for simple eukaryotic genomes.
Affiliation(s)
- Tanya Vavouri
- Comparative Genomics Group, MRC Rosalind Franklin Centre for Genomics Research, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
440
Suzuki M, Ketterling MG, McCarty DR. Quantitative statistical analysis of cis-regulatory sequences in ABA/VP1- and CBF/DREB1-regulated genes of Arabidopsis. Plant Physiol 2005; 139:437-47. [PMID: 16113229 PMCID: PMC1203392 DOI: 10.1104/pp.104.058412] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 05/04/2023]
Abstract
We have developed a simple quantitative computational approach for objective analysis of cis-regulatory sequences in promoters of coregulated genes. The program, designated MotifFinder, identifies oligo sequences that are overrepresented in promoters of coregulated genes. We used this approach to analyze promoter sequences of Viviparous1 (VP1)/abscisic acid (ABA)-regulated genes and cold-regulated genes, respectively, of Arabidopsis (Arabidopsis thaliana). We detected significantly enriched sequences in up-regulated genes but not in down-regulated genes. This result suggests that gene activation but not repression is mediated by specific and common sequence elements in promoters. The enriched motifs include several known cis-regulatory sequences as well as previously unidentified motifs. With respect to known cis-elements, we dissected the flanking nucleotides of the core sequences of Sph element, ABA response elements (ABREs), and the C repeat/dehydration-responsive element. This analysis identified the motif variants that may correlate with qualitative and quantitative differences in gene expression. While both VP1 and cold responses are mediated in part by ABA signaling via ABREs, these responses correlate with unique ABRE variants distinguished by nucleotides flanking the ACGT core. ABRE and Sph motifs are tightly associated uniquely in the coregulated set of genes showing a strict dependence on VP1 and ABA signaling. Finally, analysis of distribution of the enriched sequences revealed a striking concentration of enriched motifs in a proximal 200-base region of VP1/ABA and cold-regulated promoters. Overall, each class of coregulated genes possesses a discrete set of the enriched motifs with unique distributions in their promoters that may account for the specificity of gene regulation.
Affiliation(s)
- Masaharu Suzuki
- Plant Molecular and Cellular Biology Program, Horticultural Sciences Department, University of Florida, Gainesville, 32611, USA.
441
Vandepoele K, Vlieghe K, Florquin K, Hennig L, Beemster GTS, Gruissem W, Van de Peer Y, Inzé D, De Veylder L. Genome-wide identification of potential plant E2F target genes. Plant Physiol 2005; 139:316-28. [PMID: 16126853 PMCID: PMC1203381 DOI: 10.1104/pp.105.066290] [Citation(s) in RCA: 85] [Impact Index Per Article: 4.5] [Indexed: 05/04/2023]
Abstract
Entry into the S phase of the cell cycle is controlled by E2F transcription factors that induce the transcription of genes required for cell cycle progression and DNA replication. Although the E2F pathway is highly conserved in higher eukaryotes, only a few E2F target genes have been experimentally validated in plants. We have combined microarray analysis and bioinformatics tools to identify plant E2F-responsive genes. Promoter regions of genes that were induced at the transcriptional level in Arabidopsis (Arabidopsis thaliana) seedlings ectopically expressing genes for the E2Fa and DPa transcription factors were searched for the presence of E2F-binding sites, resulting in the identification of 181 putative E2F target genes. In most cases, the E2F-binding element was located close to the transcription start site, but occasionally could also be localized in the 5' untranslated region. Comparison of our results with available microarray data sets from synchronized cell suspensions revealed that the E2F target genes were expressed almost exclusively during G1 and S phases and activated upon reentry of quiescent cells into the cell cycle. To test the robustness of the data for the Arabidopsis E2F target genes, we also searched for the presence of E2F-cis-acting elements in the promoters of the putative orthologous rice (Oryza sativa) genes. Using this approach, we identified 70 potential conserved plant E2F target genes. These genes encode proteins involved in cell cycle regulation, DNA replication, and chromatin dynamics. In addition, we identified several genes for potentially novel S phase regulatory proteins.
Affiliation(s)
- Klaas Vandepoele
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology, Ghent University, Belgium
442
Trindade LM, van Berloo R, Fiers M, Visser RGF. PRECISE: software for prediction of cis-acting regulatory elements. J Hered 2005; 96:618-22. [PMID: 16135709 DOI: 10.1093/jhered/esi094] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
The regulation of gene expression at the transcription initiation level is highly complex and requires the presence of multiple transcription factors. These transcription factors are often proteins or peptides that bind to the so-called cis-acting elements, which are present in the promoter regions and conserved among different species. In order to predict these cis-acting elements, a computer program called PRECISE (Prediction of REgulatory CIS-acting Elements) was developed. The power of the tool lies in its user-friendly interface and in the possibility of using empirical motif frequency tables to filter through the many discovered motifs. The tools to create the empirical motif frequency table (e.g., from a whole genome sequence) are included in the package. In the first case study, the upstream regions of all the genes in the Arabidopsis genome were used to create an empirical motif frequency table and a set of 64 upstream sequences of genes known to be involved in starch metabolism was subjected to analysis by PRECISE. The 20 motifs with the highest specificity in the selected set were analyzed in more detail. Of these 20 motifs, 15 showed a very high or complete homology to the sequences of known cis-acting elements. These cis-acting elements are regulated by light, auxin, and abscisic acid, and confer specific expression in sink organs such as leaves and seeds. All these factors have been shown to play an important role in starch biosynthesis. In the second case study, the upstream regions of 16 genes whose transcription is induced by gibberellins (GA) in Arabidopsis were analyzed with PRECISE and compared to the motifs present in the PLACE database. Among the most promising motifs found by PRECISE were 6 of the 17 known GA motifs. These results indicate the power of the PRECISE software package in the prediction of regulatory elements.
Affiliation(s)
- L M Trindade
- Graduate School of Experimental Plant Sciences, Laboratory of Plant Breeding, Department of Plant Sciences, Wageningen University, P.O. Box 386, 6700 AJ Wageningen, The Netherlands.
443
Ding LH, Shingyoji M, Chen F, Hwang JJ, Burma S, Lee C, Cheng JF, Chen DJ. Gene expression profiles of normal human fibroblasts after exposure to ionizing radiation: a comparative study of low and high doses. Radiat Res 2005; 164:17-26. [PMID: 15966761 DOI: 10.1667/rr3354] [Citation(s) in RCA: 160] [Impact Index Per Article: 8.4] [Indexed: 11/03/2022]
Abstract
Several types of cellular responses to ionizing radiation, such as the adaptive response or the bystander effect, suggest that low-dose radiation may possess characteristics that distinguish it from its high-dose counterpart. Accumulated evidence also implies that the biological effects of low-dose and high-dose ionizing radiation are not linearly distributed. We have investigated, for the first time, global gene expression changes induced by ionizing radiation at doses as low as 2 cGy and have compared this to expression changes at 4 Gy. We applied cDNA microarray analyses to G1-arrested normal human skin fibroblasts subjected to X irradiation. Our data suggest that both qualitative and quantitative differences exist between gene expression profiles induced by 2 cGy and 4 Gy. The predominant functional groups responding to low-dose radiation are those involved in cell-cell signaling, signal transduction, development and DNA damage responses. At high dose, the responding genes are involved in apoptosis and cell proliferation. Interestingly, several genes, such as cytoskeleton components ANLN and KRT15 and cell-cell signaling genes GRAP2 and GPR51, were found to respond to low-dose radiation but not to high-dose radiation. Pathways that are specifically activated by low-dose radiation were also evident. These quantitative and qualitative differences in gene expression changes may help explain the non-linear correlation of biological effects of ionizing radiation from low dose to high dose.
Affiliation(s)
- Liang-Hao Ding
- Lawrence Berkeley National Laboratory, Life Sciences Division, Berkeley, California, 94720, USA
444
Petti AA, Church GM. A network of transcriptionally coordinated functional modules in Saccharomyces cerevisiae. Genome Res 2005; 15:1298-306. [PMID: 16109970 PMCID: PMC1199545 DOI: 10.1101/gr.3847105] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Indexed: 01/25/2023]
Abstract
Recent computational and experimental work suggests that functional modules underlie much of cellular physiology and are a useful unit of cellular organization from the perspective of systems biology. Because interactions among modules can give rise to higher-level properties that are essential to cellular function, a complete knowledge of these interactions is necessary for future work in systems biology, including in silico modeling and metabolic engineering. Here we present a computational method for the systematic identification and analysis of functional modules whose activity is coordinated at the level of transcription. We applied this method, Search for Pairwise Interactions (SPIN), to obtain a global view of functional module connectivity in Saccharomyces cerevisiae and to provide insight into the biological mechanisms underlying this coordination. We also examined this global network at higher resolution to obtain detailed information about the interactions of particular module pairs. For instance, our results reveal possible transcriptional coordination of glycolysis and lipid metabolism by the transcription factor Gcr1p, and further suggest that glycolysis and phosphoinositide signaling may regulate each other reciprocally.
Affiliation(s)
- Allegra A Petti
- Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA.
445
Zhu Z, Shendure J, Church GM. Discovering functional transcription-factor combinations in the human cell cycle. Genome Res 2005; 15:848-55. [PMID: 15930495 PMCID: PMC1142475 DOI: 10.1101/gr.3394405] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.1] [Indexed: 12/13/2022]
Abstract
With the completion of full genome sequences and advancement in high-throughput technologies, in silico methods have been successfully used to integrate diverse data sources toward unraveling the combinatorial nature of transcriptional regulation. So far, almost all of these studies are restricted to lower eukaryotes such as budding yeast. We describe here a computational search for functional transcription-factor (TF) combinations using phylogenetically conserved sequences and microarray-based expression data. Taking into account both orientational and positional constraints, we investigated the overrepresentation of binding sites in the vicinity of one another and whether these combinations result in more coherent expression profiles. Without any prior biological knowledge, the search led to the discovery of several experimentally established TF associations, as well as some novel ones. In particular, we identified a regulatory module controlling cell cycle-dependent transcription of G2-M genes and expanded its functional generality. We also detected many homotypic combinations, supporting the importance of binding-site density in transcriptional regulation of higher eukaryotes.
Affiliation(s)
- Zhou Zhu
- Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA.
446
Kankainen M, Holm L. POCO: discovery of regulatory patterns from promoters of oppositely expressed gene sets. Nucleic Acids Res 2005; 33:W427-31. [PMID: 15980504 PMCID: PMC1160228 DOI: 10.1093/nar/gki467] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022] Open
Abstract
Functionally associated genes tend to be co-expressed, which indicates that they could also be co-regulated. Since co-regulation is usually governed by transcription factors via their specific binding elements, putative regulators can be identified from promoter sets of (co-expressed) genes by screening for over-represented nucleotide patterns. Here, we present a program, POCO, which discovers such over-represented patterns from either one or two promoter sets. Typical microarray experiments yield up- and down-regulated gene sets that may represent, for example, distinct defense pathways. Assuming that a functional transcription factor cannot simultaneously both up- and down-regulate the gene sets, its binding element should respectively be over- and under-represented in the corresponding promoter sets. This idea is implemented in POCO, which tests the hypothesis that the distributions of a pattern differ among three sets of promoters: up-regulated, down-regulated and randomly-chosen. In the program, pattern discovery is based on explicit enumeration of all possible patterns on the alphabet (A, C, G, T and N). The mean occurrences and SDs of the patterns are estimated using bootstrapping and their significance is assessed using ANOVA F-statistics, Tukey's honestly significant difference test and P-values. The program is freely available at .
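The explicit enumeration step mentioned in this abstract can be sketched as follows, assuming that N acts as a wildcard matching any base and that overlapping occurrences are counted. The function names are hypothetical, and the bootstrapping, ANOVA and Tukey steps of POCO are not reproduced here.

```python
from itertools import product

ALPHABET = "ACGTN"  # N is treated as a wildcard (an assumption of this sketch)


def enumerate_patterns(length):
    """All patterns of the given length over A, C, G, T and N."""
    return ["".join(p) for p in product(ALPHABET, repeat=length)]


def matches_at(pattern, sequence, start):
    """True if the pattern matches the sequence at the given start position."""
    window = sequence[start:start + len(pattern)]
    return all(p == "N" or p == s for p, s in zip(pattern, window))


def count_occurrences(pattern, sequence):
    """Number of (possibly overlapping) matches of the pattern in a sequence."""
    last_start = len(sequence) - len(pattern) + 1
    return sum(matches_at(pattern, sequence, i) for i in range(max(last_start, 0)))


def mean_occurrences(pattern, promoters):
    """Mean number of matches per promoter in a non-empty promoter set."""
    return sum(count_occurrences(pattern, s) for s in promoters) / len(promoters)
```

Comparing `mean_occurrences` for the same pattern across up-regulated, down-regulated and random promoter sets gives the raw quantities that a significance test would then evaluate. Note that the number of patterns grows as 5 to the power of the pattern length.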
Affiliation(s)
- Matti Kankainen
- Institute of Biotechnology, University of Helsinki, PO Box 56 (Viikinkaari 5), FIN-00014, Helsinki, Finland
- Liisa Holm
- Institute of Biotechnology, University of Helsinki, PO Box 56 (Viikinkaari 5), FIN-00014, Helsinki, Finland
- Department of Biosciences, Division of Genetics, University of Helsinki, PO Box 56 (Viikinkaari 5), FIN-00014, Helsinki, Finland
- To whom correspondence should be addressed. Tel: +358 9 19159115; Fax: +358 9 19159079
|
447
|
Boorsma A, Foat BC, Vis D, Klis F, Bussemaker HJ. T-profiler: scoring the activity of predefined groups of genes using gene expression data. Nucleic Acids Res 2005; 33:W592-5. [PMID: 15980543 PMCID: PMC1160244 DOI: 10.1093/nar/gki484] [Citation(s) in RCA: 164] [Impact Index Per Article: 8.6] [Indexed: 11/25/2022] Open
Abstract
One of the key challenges in the analysis of gene expression data is how to relate the expression level of individual genes to the underlying transcriptional programs and cellular state. Here we describe T-profiler, a tool that uses the t-test to score changes in the average activity of predefined groups of genes. The gene groups are defined based on Gene Ontology categorization, ChIP-chip experiments, upstream matches to a consensus transcription factor binding motif or location on the same chromosome. If desired, an iterative procedure can be used to select a single, optimal representative from sets of overlapping gene groups. T-profiler makes it possible to interpret microarray data in a way that is both intuitive and statistically rigorous, without the need to combine experiments or choose parameters. Currently, gene expression data from Saccharomyces cerevisiae and Candida albicans are supported. Users can upload their microarray data for analysis on the web at .
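The idea of scoring a predefined gene group with a t-test can be illustrated with a two-sample t-statistic that compares the expression values of genes inside the group with those of all other genes. The sketch below assumes a pooled-variance (equal-variance) form of the test and uses hypothetical gene names and values; it is not T-profiler's implementation.

```python
import math
import statistics


def group_t_score(expression, group):
    """Pooled-variance t-statistic: genes in `group` versus all remaining genes.

    expression: dict mapping gene name -> expression value (e.g. a log-ratio)
    group: set of gene names forming the predefined group
    """
    inside = [v for g, v in expression.items() if g in group]
    outside = [v for g, v in expression.items() if g not in group]
    n_in, n_out = len(inside), len(outside)
    if n_in < 2 or n_out < 2:
        raise ValueError("need at least two genes inside and outside the group")
    pooled_var = (
        (n_in - 1) * statistics.variance(inside)
        + (n_out - 1) * statistics.variance(outside)
    ) / (n_in + n_out - 2)
    if pooled_var == 0:
        raise ValueError("zero variance: t-statistic is undefined")
    diff = statistics.mean(inside) - statistics.mean(outside)
    return diff / math.sqrt(pooled_var * (1 / n_in + 1 / n_out))


# Hypothetical data: six genes, three of which share a motif.
expression = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0, "g5": 5.0, "g6": 6.0}
t_value = group_t_score(expression, {"g4", "g5", "g6"})
```

A large positive score would indicate that the group's genes are, on average, more highly expressed than the rest; the iterative selection among overlapping groups described in the abstract is not shown.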
Affiliation(s)
- Barrett C. Foat
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Harmen J. Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10032, USA
- To whom correspondence should be addressed. Tel: +1 212 854 9932; Fax: +1 212 865 8246
|
448
|
Corcoran DL, Feingold E, Benos PV. FOOTER: a web tool for finding mammalian DNA regulatory regions using phylogenetic footprinting. Nucleic Acids Res 2005; 33:W442-6. [PMID: 15980508 PMCID: PMC1160181 DOI: 10.1093/nar/gki420] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022] Open
Abstract
FOOTER is a newly developed algorithm that analyzes homologous mammalian promoter sequences in order to identify transcriptional DNA regulatory 'signals'. FOOTER uses prior knowledge about the binding site preferences of the transcription factors (TFs) in the form of position-specific scoring matrices (PSSMs). The PSSM models are generated from known mammalian binding sites from the TRANSFAC database. In a test set of 72 confirmed binding sites (most of them not present in TRANSFAC) of 19 TFs, it exhibited 83% sensitivity and 72% specificity. FOOTER is accessible over the web at http://biodev.hgen.pitt.edu/Footer/.
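To make the notion of a position-specific scoring matrix concrete, the sketch below builds a log-odds PSSM from a handful of aligned binding sites and scans a sequence for the best-scoring window. It is a generic textbook construction, not FOOTER's algorithm: the pseudocount, the uniform background frequency, the example sites, and all function names are assumptions, and sequences are assumed to contain only A, C, G and T.

```python
import math

BASES = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Return one dict per column mapping base -> log2 odds versus background."""
    total = len(sites) + 4 * pseudocount
    pssm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pssm


def score_window(pssm, window):
    """Sum the per-column log-odds scores for one window of the sequence."""
    return sum(column[base] for column, base in zip(pssm, window))


def best_hit(pssm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    k = len(pssm)
    return max((score_window(pssm, seq[i:i + k]), i)
               for i in range(len(seq) - k + 1))


# Hypothetical aligned sites and promoter fragment.
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
score, start = best_hit(build_pssm(sites), "GGCGTATAATGCC")
```

Phylogenetic footprinting would additionally require that a candidate site be found at corresponding positions in homologous promoters from more than one species; that step is outside this sketch.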
Affiliation(s)
- David L. Corcoran
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Eleanor Feingold
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Panayiotis V. Benos
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Computational Biology, University of Pittsburgh, Pittsburgh, PA, USA
- University of Pittsburgh Cancer Institute, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- To whom correspondence should be addressed. Tel: +1 412 648 3315; Fax: +1 412 624 3020
|
449
|
Gertz J, Riles L, Turnbaugh P, Ho SW, Cohen BA. Discovery, validation, and genetic dissection of transcription factor binding sites by comparative and functional genomics. Genome Res 2005; 15:1145-52. [PMID: 16077013 PMCID: PMC1182227 DOI: 10.1101/gr.3859605] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 02/21/2005] [Accepted: 05/03/2005] [Indexed: 11/24/2022]
Abstract
Completing the annotation of a genome sequence requires identifying the regulatory sequences that control gene expression. To identify these sequences, we developed an algorithm that searches for short, conserved sequence motifs in the genomes of related species. The method is effective in finding motifs de novo and for refining known regulatory motifs in Saccharomyces cerevisiae. We tested one novel motif prediction of the algorithm and found it to be the binding site of Stp2; it is significantly different from the previously predicted Stp2 binding site. We show that Stp2 physically interacts with this sequence motif, and that stp2 mutations affect the expression of genes associated with the motif. We demonstrate that the Stp2 binding site also interacts genetically with Stp1, a regulator of amino acid permease genes and, with Sfp1, a key regulator of cell growth. These results illuminate an important transcriptional circuit that regulates cell growth through external nutrient uptake.
Affiliation(s)
- Jason Gertz
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63108, USA
|
450
|
Wilson IW, Kennedy GC, Peacock JW, Dennis ES. Microarray Analysis Reveals Vegetative Molecular Phenotypes of Arabidopsis Flowering-time Mutants. Plant Cell Physiol 2005; 46:1190-201. [PMID: 15908439 DOI: 10.1093/pcp/pci128] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Indexed: 01/28/2023]
Abstract
The transition to flowering occurs at the shoot apex; however, most of the characterized genes that affect the timing of floral induction are expressed throughout the plant. To further our understanding of these genes and the flowering process, the vegetative molecular phenotypes of 16 Arabidopsis mutants associated with the major flowering initiation pathways were assayed using a 13,000 clone microarray under two different conditions that affect flowering. All mutants showed at least one change in gene expression other than the mutant flowering gene. Metabolism- and defence-related pathways were the areas with the most frequent gene expression changes detected in the mutants. Several genes such as EARLI1 were differentially expressed in a number of flowering mutants from different flowering pathways. Analysis of the promoter regions of genes differentially expressed identified common promoter elements, indicating some form of common regulation.
Affiliation(s)
- Iain W Wilson
- CSIRO Plant Industry, GPO Box 1600, Canberra ACT 2601, Australia
|