551
Cliften P, Sudarsanam P, Desikan A, Fulton L, Fulton B, Majors J, Waterston R, Cohen BA, Johnston M. Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science 2003; 301:71-6. [PMID: 12775844 DOI: 10.1126/science.1084337] [Citation(s) in RCA: 634] [Impact Index Per Article: 30.2] [Indexed: 11/02/2022]
Abstract
The sifting and winnowing of DNA sequence that occur during evolution cause nonfunctional sequences to diverge, leaving phylogenetic footprints of functional sequence elements in comparisons of genome sequences. We searched for such footprints among the genome sequences of six Saccharomyces species and identified potentially functional sequences. Comparison of these sequences allowed us to revise the catalog of yeast genes and identify sequence motifs that may be targets of transcriptional regulatory proteins. Some of these conserved sequence motifs reside upstream of genes with similar functional annotations or similar expression patterns or those bound by the same transcription factor and are thus good candidates for functional regulatory sequences.
Affiliation(s)
- Paul Cliften
- Department of Genetics, Washington University School of Medicine, 660 South Euclid Avenue, St. Louis, MO 63110, USA
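Phylogenetic footprinting, as described in this abstract, keeps the sequence that stays conserved across species. The minimal Python sketch below illustrates that idea. It assumes the orthologous upstream regions are already aligned as equal-length strings with '-' for gaps. The window size and conservation threshold are illustrative choices, not the parameters used by Cliften et al., and the function names are hypothetical.

```python
def column_conserved(column: str) -> bool:
    """A column counts as conserved when it has no gap and one base in every species."""
    return "-" not in column and len(set(column)) == 1


def conserved_windows(alignment: list[str], window: int = 6, min_conserved: int = 6) -> list[int]:
    """Return start offsets of windows containing at least `min_conserved` conserved columns.

    Hypothetical helper for illustration; not code from Cliften et al.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    # 1 marks a column where every species shares the same base.
    flags = [int(column_conserved("".join(seq[i] for seq in alignment))) for i in range(length)]
    return [
        start
        for start in range(length - window + 1)
        if sum(flags[start:start + window]) >= min_conserved
    ]


# Usage with a toy three-species alignment.
alignment = [
    "ACGTTTCACGTGAAAT",
    "AGGATTCACGTGACAT",
    "ACGTATCACGTGAAGT",
]
print(conserved_windows(alignment))  # [5, 6, 7]
```

Real footprinting tools weigh divergence along the phylogeny rather than demanding identity in every species; the all-identical rule here only keeps the example short.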
552
Halees AS, Leyfer D, Weng Z. PromoSer: A large-scale mammalian promoter and transcription start site identification service. Nucleic Acids Res 2003; 31:3554-9. [PMID: 12824364 PMCID: PMC168956 DOI: 10.1093/nar/gkg549] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.1] [Indexed: 11/14/2022]
Abstract
Proximal promoters have a major impact on transcriptional regulation. Studies of the sequence-based nature of this regulation usually require collection of proximal promoter sequences for large sets of co-regulated genes. We report a newly implemented web service that facilitates extraction of user specified regions around the transcription start site of all annotated human, mouse or rat genes. The transcription start sites have been identified computationally by considering alignments of a large number of partial and full-length mRNA sequences to genomic DNA, with provision for alternative promoters. The service is publicly available at http://biowulf.bu.edu/zlab/PromoSer/.
Affiliation(s)
- Anason S Halees
- Bioinformatics Program, Boston University, 44 Cummington Street, Boston, MA 02215, USA
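Extracting a user-specified region around a transcription start site comes down to strand-aware coordinate arithmetic. The sketch below assumes a 0-based TSS coordinate and a chromosome held as a Python string. `promoter_region` and its default window of 500 bp upstream and 100 bp downstream are hypothetical. They are not PromoSer's interface or defaults.

```python
# Complement table for upper- and lower-case nucleotides (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def promoter_region(chrom_seq: str, tss: int, strand: str,
                    upstream: int = 500, downstream: int = 100) -> str:
    """Return `upstream` bases before and `downstream` bases from a 0-based TSS.

    The result is in transcript orientation. Hypothetical helper, not the PromoSer API.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, upstream lies at higher chromosome coordinates.
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip at chromosome ends instead of failing.
    region = chrom_seq[max(start, 0):min(end, len(chrom_seq))]
    return region if strand == "+" else reverse_complement(region)


chrom = "AAAACCCCGGGGTTTT"
print(promoter_region(chrom, tss=4, strand="+", upstream=2, downstream=3))   # AACCC
print(promoter_region(chrom, tss=11, strand="-", upstream=2, downstream=3))  # AACCC
```

Returning minus-strand regions reverse-complemented keeps every extracted promoter in the same 5'-to-3' orientation. Downstream motif searches expect that orientation.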
553
Rombauts S, Florquin K, Lescot M, Marchal K, Rouzé P, van de Peer Y. Computational approaches to identify promoters and cis-regulatory elements in plant genomes. Plant Physiol 2003; 132:1162-76. [PMID: 12857799 PMCID: PMC167057 DOI: 10.1104/pp.102.017715] [Citation(s) in RCA: 77] [Impact Index Per Article: 3.7] [Received: 11/14/2002] [Revised: 01/10/2003] [Accepted: 03/17/2003] [Indexed: 05/19/2023]
Abstract
The identification of promoters and their regulatory elements is one of the major challenges in bioinformatics and integrates comparative, structural, and functional genomics. Many different approaches have been developed to detect conserved motifs in a set of genes that are either coregulated or orthologous. However, although recent approaches seem promising, in general, unambiguous identification of regulatory elements is not straightforward. The delineation of promoters is even harder, due to its complex nature, and in silico promoter prediction is still in its infancy. Here, we review the different approaches that have been developed for identifying promoters and their regulatory elements. We discuss the detection of cis-acting regulatory elements using word-counting or probabilistic methods (so-called "search by signal" methods) and the delineation of promoters by considering both sequence content and structural features ("search by content" methods). As an example of search by content, we explored in greater detail the association of promoters with CpG islands. However, due to differences in sequence content, the parameters used to detect CpG islands in humans and other vertebrates cannot be used for plants. Therefore, a preliminary attempt was made to define parameters that could possibly define CpG and CpNpG islands in Arabidopsis, by exploring the compositional landscape around the transcriptional start site. To this end, a data set of more than 5,000 gene sequences was built, including the promoter region, the 5'-untranslated region, and the first introns and coding exons. Preliminary analysis shows that promoter location based on the detection of potential CpG/CpNpG islands in the Arabidopsis genome is not straightforward. 
Nevertheless, because the landscape of CpG/CpNpG islands differs considerably between promoters and introns on the one side and exons (whether coding or not) on the other, more sophisticated approaches can probably be developed for the successful detection of "putative" CpG and CpNpG islands in plants.
Affiliation(s)
- Stephane Rombauts
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology, Ghent University, B-9000 Gent, Belgium
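The abstract says the vertebrate CpG-island test does not carry over to plants. That test is commonly stated as a window of at least 200 bp with GC content above about 50% and an observed/expected CpG ratio above about 0.6 (the Gardiner-Garden and Frommer criteria). The sketch below computes those two statistics over sliding windows. It exposes the thresholds as parameters because plant-specific values are exactly what the authors were still exploring. CpNpG islands are not covered, and the function names are hypothetical.

```python
def cpg_stats(window: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    seq = window.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is approximately c * g / n.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def cpg_island_windows(seq: str, window: int = 200, step: int = 1,
                       min_gc: float = 0.5, min_obs_exp: float = 0.6) -> list[int]:
    """Start offsets of windows passing both thresholds (vertebrate-style defaults)."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc >= min_gc and oe >= min_obs_exp:
            hits.append(start)
    return hits


print(len(cpg_island_windows("CG" * 150)))  # 101
print(cpg_island_windows("AT" * 150))       # []
```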
554
Schwartz S, Elnitski L, Li M, Weirauch M, Riemer C, Smit A, Green ED, Hardison RC, Miller W. MultiPipMaker and supporting tools: Alignments and analysis of multiple genomic DNA sequences. Nucleic Acids Res 2003; 31:3518-24. [PMID: 12824357 PMCID: PMC168985 DOI: 10.1093/nar/gkg579] [Citation(s) in RCA: 169] [Impact Index Per Article: 8.0] [Indexed: 11/12/2022]
Abstract
Analysis of multiple sequence alignments can generate important, testable hypotheses about the phylogenetic history and cellular function of genomic sequences. We describe the MultiPipMaker server, which aligns multiple, long genomic DNA sequences quickly and with good sensitivity (available at http://bio.cse.psu.edu/ since May 2001). Alignments are computed between a contiguous reference sequence and one or more secondary sequences, which can be finished or draft sequence. The outputs include a stacked set of percent identity plots, called a MultiPip, comparing the reference sequence with subsequent sequences, and a nucleotide-level multiple alignment. New tools are provided to search MultiPipMaker output for conserved matches to a user-specified pattern and for conserved matches to position weight matrices that describe transcription factor binding sites (singly and in clusters). We illustrate the use of MultiPipMaker to identify candidate regulatory regions in WNT2 and then demonstrate by transfection assays that they are functional. Analysis of the alignments also confirms the phylogenetic inference that horses are more closely related to cats than to cows.
Affiliation(s)
- Scott Schwartz
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
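MultiPipMaker's tools search for conserved matches to position weight matrices. The sketch below covers only the scoring half of that task. It builds a log2-odds matrix from site counts and scans one sequence. The pseudocount of 0.5, the uniform background, the threshold and the toy counts are all assumptions. The step that keeps only hits at conserved positions in the alignment is not shown.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=0.5, background=None):
    """Turn per-position base counts into a log2-odds position weight matrix."""
    background = background or {b: 0.25 for b in BASES}  # uniform background assumed
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background[b])
            for b in BASES
        })
    return matrix


def scan(seq, matrix, threshold):
    """Return (offset, score) for every window whose log-odds score reaches threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        word = seq[start:start + width].upper()
        if any(b not in BASES for b in word):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][b] for i, b in enumerate(word))
        if score >= threshold:
            hits.append((start, round(score, 2)))
    return hits


# Toy matrix built from ten aligned "TGA" sites (hypothetical counts).
pwm = log_odds_matrix([{"T": 10}, {"G": 10}, {"A": 10}])
print(scan("CCTGACC", pwm, threshold=5.0))  # [(2, 5.42)]
```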
555
Sandelin A, Höglund A, Lenhard B, Wasserman WW. Integrated analysis of yeast regulatory sequences for biologically linked clusters of genes. Funct Integr Genomics 2003; 3:125-34. [PMID: 12827523 DOI: 10.1007/s10142-003-0086-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.2] [Received: 12/17/2002] [Revised: 04/07/2003] [Accepted: 04/29/2003] [Indexed: 10/26/2022]
Abstract
Dramatic progress in deciphering the regulatory controls in Saccharomyces cerevisiae has been enabled by the fusion of high-throughput genomics technologies with advanced sequence analysis algorithms. Sets of genes likely to function together and with similar expression profiles have been identified in diverse studies. By fusing an advanced pattern recognition algorithm for identification of transcription factor binding sites with a new method for the quantitative comparison of binding properties of transcription factors, we provide an integrated means to move from expression data to biological insights. The Yeast Regulatory Sequence Analysis system, YRSA, combines standard functions with a novel pattern characterization procedure in an intuitive interface designed for use by a broad range of scientists. The features of the system include automated retrieval of user-defined promoter sequences, binding site discovery by pattern recognition, graphical displays of the observed pattern and positions of similar sequences in the specified genes, and comparison of the new pattern against a collection of binding patterns for characterized transcription factors. The comprehensive YRSA system was used to study the regulatory mechanisms of yeast regulons. Analysis of the regulatory controls of a battery of genes induced by DNA damaging agents supports a putative mediating role for the cell-cycle checkpoint regulatory element MCB. YRSA is available at http://yrsa.cgb.ki.se. [YRSA: ancient Scandinavian name meaning old she-bear (Latin Ursus arctos = brown bear/grizzly).]
Affiliation(s)
- Albin Sandelin
- Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
556
Thompson W, Rouchka EC, Lawrence CE. Gibbs Recursive Sampler: finding transcription factor binding sites. Nucleic Acids Res 2003; 31:3580-5. [PMID: 12824370 PMCID: PMC169014 DOI: 10.1093/nar/gkg608] [Citation(s) in RCA: 231] [Impact Index Per Article: 11.0] [Received: 02/14/2003] [Revised: 04/09/2003] [Accepted: 04/09/2003] [Indexed: 11/14/2022]
Abstract
The Gibbs Motif Sampler is a software package for locating common elements in collections of biopolymer sequences. In this paper we describe a new variation of the Gibbs Motif Sampler, the Gibbs Recursive Sampler, which has been developed specifically for locating multiple transcription factor binding sites for multiple transcription factors simultaneously in unaligned DNA sequences that may be heterogeneous in DNA composition. Here we describe the basic operation of the web-based version of this sampler. The sampler may be accessed at http://bayesweb.wadsworth.org/gibbs/gibbs.html and at http://www.bioinfo.rpi.edu/applications/bayesian/gibbs/gibbs.html. An online user guide is available at http://bayesweb.wadsworth.org/gibbs/bernoulli.html and at http://www.bioinfo.rpi.edu/applications/bayesian/gibbs/manual/bernoulli.html. Solaris, Solaris.x86 and Linux versions of the sampler are available as stand-alone programs for academic and not-for-profit users. Commercial licenses are also available. The Gibbs Recursive Sampler is distributed in accordance with the ISCB level 0 guidelines and a requirement for citation of use in scientific publications.
Affiliation(s)
- William Thompson
- The Wadsworth Center, New York State Department of Health, Albany, NY 12201-0509, USA.
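The Gibbs Recursive Sampler builds on the basic site-sampling scheme of the Gibbs Motif Sampler. The sketch below implements only that basic scheme, with one motif, exactly one site per sequence and a uniform background, to show the hold-one-out update. It does not model multiple factors, multiple sites or heterogeneous composition, which are the Recursive Sampler's additions. The function names, the pseudocount and the toy sequences with a planted GATTACA are hypothetical.

```python
import math
import random

BASES = "ACGT"


def profile_excluding(seqs, positions, width, skip, pseudocount=1.0):
    """Per-column base frequencies from every current site except sequence `skip`."""
    counts = [{b: pseudocount for b in BASES} for _ in range(width)]
    for idx, (seq, pos) in enumerate(zip(seqs, positions)):
        if idx == skip:
            continue
        for j, base in enumerate(seq[pos:pos + width]):
            counts[j][base] += 1
    return [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]


def gibbs_site_sampler(seqs, width, iterations=1000, seed=0):
    """Basic one-site-per-sequence Gibbs site sampler (uppercase ACGT input assumed)."""
    rng = random.Random(seed)
    background = 0.25  # uniform background; real samplers estimate composition
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # hold one sequence out
        profile = profile_excluding(seqs, positions, width, skip=i)
        seq = seqs[i]
        weights = [
            math.exp(sum(math.log(profile[j][b] / background)
                         for j, b in enumerate(seq[start:start + width])))
            for start in range(len(seq) - width + 1)
        ]
        # Resample the held-out site in proportion to its motif/background likelihood ratio.
        positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions


# Toy data with the motif GATTACA planted once per sequence.
seqs = [
    "GTCAGGATTACAGCTAGT",
    "TTGATTACATGCAGGCAA",
    "CCGTAGCGATTACAGTTC",
    "AGATTACATTCCGTAGCA",
]
print(gibbs_site_sampler(seqs, width=7, iterations=500, seed=1))
```

Because the sampler is stochastic, a practical run would use several random starts and keep the alignment with the best overall score.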
557
Lapidot M, Pilpel Y. Comprehensive quantitative analyses of the effects of promoter sequence elements on mRNA transcription. Nucleic Acids Res 2003; 31:3824-8. [PMID: 12824429 PMCID: PMC168999 DOI: 10.1093/nar/gkg593] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 02/15/2003] [Revised: 04/02/2003] [Accepted: 04/02/2003] [Indexed: 11/13/2022]
Abstract
We have generated a WWW interface for automated comprehensive analyses of promoter regulatory motifs and the effect they exert on mRNA expression profiles. The server provides a wide spectrum of analysis tools that allow de novo discovery of regulatory motifs, along with refinement and in-depth investigation of fully or partially characterized motifs. The presented discovery and analysis tools are fundamentally different from existing tools in their basic rationale, statistical background and specificity and sensitivity towards true regulatory elements. We thus anticipate that the service will be of great importance to the experimental and computational biology communities alike. The motif discovery and diagnosis workbench is available at http://longitude.weizmann.ac.il/rMotif/.
Affiliation(s)
- Michal Lapidot
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel
558
Chiang DY, Moses AM, Kellis M, Lander ES, Eisen MB. Phylogenetically and spatially conserved word pairs associated with gene-expression changes in yeasts. Genome Biol 2003; 4:R43. [PMID: 12844359 PMCID: PMC193630 DOI: 10.1186/gb-2003-4-7-r43] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.0] [Received: 01/28/2003] [Revised: 04/28/2003] [Accepted: 05/15/2003] [Indexed: 12/02/2022]
Abstract
BACKGROUND Transcriptional regulation in eukaryotes often involves multiple transcription factors binding to the same transcription control region, and to understand the regulatory content of eukaryotic genomes it is necessary to consider the co-occurrence and spatial relationships of individual binding sites. The determination of conserved sequences (often known as phylogenetic footprinting) has identified individual transcription factor binding sites. We extend this concept of functional conservation to higher-order features of transcription control regions.
RESULTS We used the genome sequences of four yeast species of the genus Saccharomyces to identify sequences potentially involved in multifactorial control of gene expression. We found 989 potential regulatory 'templates': pairs of hexameric sequences that are jointly conserved in transcription regulatory regions and also exhibit non-random relative spacing. Many of the individual sequences in these templates correspond to known transcription factor binding sites, and the sets of genes containing a particular template in their transcription control regions tend to be differentially expressed in conditions where the corresponding transcription factors are known to be active. The incorporation of word pairs to define sequence features yields more specific predictions of average expression profiles and more informative regression models for genome-wide expression data than considering sequence conservation alone.
CONCLUSIONS The incorporation of both joint conservation and spacing constraints of sequence pairs predicts groups of target genes that are specific for common patterns of gene expression. Our work suggests that positional information, especially the relative spacing between transcription factor binding sites, may represent a common organizing principle of transcription control regions.
Affiliation(s)
- Derek Y Chiang
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Alan M Moses
- Graduate Group in Biophysics, University of California, Berkeley, CA 94720, USA
- Manolis Kellis
- Whitehead/MIT Center for Genome Research, Department of Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Eric S Lander
- Whitehead/MIT Center for Genome Research, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Michael B Eisen
- Department of Genome Sciences, Life Sciences Division, Ernest Orlando Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Center for Integrative Genomics and Division of Genetics and Development, Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
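The spacing half of a word-pair "template" can be illustrated by tallying the gap between two words across a set of promoters. The sketch below does only that. Chiang et al. additionally require joint conservation across the four species and test whether a spacing is non-random, and neither step is shown here. The two hexamers and the toy promoters are illustrative, not templates reported in the paper.

```python
from collections import Counter


def word_positions(seq: str, word: str) -> list[int]:
    """0-based start offsets of every occurrence of `word`, overlaps included."""
    return [i for i in range(len(seq) - len(word) + 1) if seq.startswith(word, i)]


def pair_spacings(seqs: list[str], word_a: str, word_b: str, max_gap: int = 50) -> Counter:
    """Histogram of gaps (in bp) between each word_a and every downstream word_b."""
    spacing = Counter()
    for seq in seqs:
        for a in word_positions(seq, word_a):
            end_a = a + len(word_a)
            for b in word_positions(seq, word_b):
                gap = b - end_a
                if 0 <= gap <= max_gap:
                    spacing[gap] += 1
    return spacing


# Toy promoters; the two hexamers are illustrative, not templates from the paper.
promoters = ["CACGTGAAAAGATAAG", "TTCACGTGAAAAGATAAGTT"]
print(pair_spacings(promoters, "CACGTG", "GATAAG"))  # Counter({4: 2})
```

A sharp peak in this histogram, compared against the same histogram computed on shuffled or unrelated promoters, is the kind of signal a non-random spacing test looks for.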
559
Michael TP, McClung CR. Enhancer trapping reveals widespread circadian clock transcriptional control in Arabidopsis. Plant Physiol 2003; 132:629-39. [PMID: 12805593 PMCID: PMC167003 DOI: 10.1104/pp.021006] [Citation(s) in RCA: 86] [Impact Index Per Article: 4.1] [Received: 01/26/2003] [Revised: 02/21/2003] [Accepted: 03/01/2003] [Indexed: 05/18/2023]
Abstract
The circadian clock synchronizes the internal biology of an organism with the environment and has been shown to be widespread among organisms. Microarray experiments have shown that the circadian clock regulates mRNA abundance of about 10% of the transcriptome in plants, invertebrates, and mammals. In contrast, the circadian clock regulates the transcription of virtually all cyanobacterial genes. To determine the extent to which the circadian clock controls transcription in Arabidopsis, we used in vivo enhancer trapping. We found that 36% of our enhancer trap lines display circadian-regulated transcription, which is much higher than estimates of circadian regulation based on analysis of steady-state mRNA abundance. Individual lines identified by enhancer trapping exhibit peak transcription rates at circadian phases spanning the complete circadian cycle. Flanking genomic sequence was identified for 23 enhancer trap lines to identify clock-controlled genes (CCG-ETs). Promoter analysis of CCG-ETs failed to predict new circadian clock response elements (CCREs), although previously defined CCREs, the CCA1-binding site, and the evening element were identified. However, many CCGs lack either the CCA1-binding site or the evening element; therefore, the presence of these CCREs is insufficient to confer circadian regulation, and it is clear that additional elements play critical roles.
Affiliation(s)
- Todd P Michael
- Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire 03755, USA
560
Kirouac M, Sternberg PW. cis-Regulatory control of three cell fate-specific genes in vulval organogenesis of Caenorhabditis elegans and C. briggsae. Dev Biol 2003; 257:85-103. [PMID: 12710959 DOI: 10.1016/s0012-1606(03)00032-0] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.1] [Indexed: 11/18/2022]
Abstract
The great-grandprogeny of the Caenorhabditis elegans vulval precursor cells (VPCs) adopt one of the final vulA, B1, B2, C, D, E, and F cell fates in a precise spatial pattern. This pattern of vulval cell types is likely to depend on the cis-regulatory regions of the transcriptional targets of intercellular signals in vulval development. egl-17, zmp-1, and cdh-3 are expressed differentially in the developing vulva cells, providing a potential readout for different signaling pathways. To understand how such pathways interact to specify unique vulval cell types in a precise pattern, we have identified cis-regulatory regions sufficient to confer vulval cell type-specific regulation when fused in cis to the basal pes-10 promoter. We have identified the C. briggsae homologs of these three genes, with their corresponding control regions, and tested these regions in both C. elegans and C. briggsae. These regions of similarity in C. elegans and C. briggsae upstream of egl-17, zmp-1, and cdh-3 promote expression in vulval cells and the anchor cell (AC). By using the cis-regulatory analysis and phylogenetic footprinting, we have identified overrepresented sequences involved in conferring vulval and AC expression.
Affiliation(s)
- Martha Kirouac
- Howard Hughes Medical Institute and Division of Biology, mail code 156-29, California Institute of Technology, Pasadena, CA 91125, USA
561
Ettwiller LM, Rung J, Birney E. Discovering novel cis-regulatory motifs using functional networks. Genome Res 2003; 13:883-95. [PMID: 12727907 PMCID: PMC430934 DOI: 10.1101/gr.866403] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022]
Abstract
We combined functional information such as protein-protein interactions or metabolic networks with genome information in Saccharomyces cerevisiae to predict cis-regulatory motifs in the upstream region of genes. We developed a new scoring metric combining these two information sources and used this metric in motif discovery. To estimate the statistical significance of this metric, we used brute-force randomization, which shows a consistent well-behaved trend. In contrast, real data showed complex nonrandom behavior. With conservative parameters we were able to find 42 degenerate motifs (that touch 40% of yeast genes) based on 647 original patterns, five of which are well known. Some of these motifs also show limited spatial position in the promoter, indicative of a true motif. We also tested the metric on other known motifs and show that this metric is a good discriminator of real motifs. As well as a pragmatic motif discovery method, with many applications beyond this work, these results also show that interacting proteins are often coordinated at the level of transcription, even in the absence of obvious coregulation in gene expression data sets.
Affiliation(s)
- Laurence M Ettwiller
- European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
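The brute-force randomization used to judge the motif score can be sketched as an empirical P value over random gene sets of the same size. The `metric` argument is a placeholder. The paper's actual metric, which combines functional-network and genome information, is not reproduced here, and the toy genome and motif set are hypothetical.

```python
import random


def randomization_p_value(metric, gene_set, genome, n_random=1000, seed=0):
    """Empirical P value: how often a random same-size gene set scores >= the real one."""
    rng = random.Random(seed)
    observed = metric(gene_set)
    k = len(gene_set)
    population = list(genome)  # random.sample needs a sequence, not a set
    exceed = sum(1 for _ in range(n_random) if metric(rng.sample(population, k)) >= observed)
    # The +1 terms count the observed set itself, so the P value is never exactly zero.
    return observed, (exceed + 1) / (n_random + 1)


# Toy example: the metric counts genes carrying a (hypothetical) motif.
genome = [f"gene{i}" for i in range(100)]
has_motif = set(genome[:10])
obs, p = randomization_p_value(lambda genes: sum(g in has_motif for g in genes),
                               genome[:10], genome)
print(obs, p)
```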
562
Elkon R, Linhart C, Sharan R, Shamir R, Shiloh Y. Genome-wide in silico identification of transcriptional regulators controlling the cell cycle in human cells. Genome Res 2003; 13:773-80. [PMID: 12727897 PMCID: PMC430898 DOI: 10.1101/gr.947203] [Citation(s) in RCA: 252] [Impact Index Per Article: 12.0] [Received: 10/31/2002] [Accepted: 02/25/2003] [Indexed: 11/24/2022]
Abstract
Dissection of regulatory networks that control gene transcription is one of the greatest challenges of functional genomics. Using human genomic sequences, models for binding sites of known transcription factors, and gene expression data, we demonstrate that the reverse engineering approach, which infers regulatory mechanisms from gene expression patterns, can reveal transcriptional networks in human cells. To date, such methodologies were successfully demonstrated only in prokaryotes and low eukaryotes. We developed computational methods for identifying putative binding sites of transcription factors and for evaluating the statistical significance of their prevalence in a given set of promoters. Focusing on transcriptional mechanisms that control cell cycle progression, our computational analyses revealed eight transcription factors whose binding sites are significantly overrepresented in promoters of genes whose expression is cell-cycle-dependent. The enrichment of some of these factors is specific to certain phases of the cell cycle. In addition, several pairs of these transcription factors show a significant co-occurrence rate in cell-cycle-regulated promoters. Each such pair indicates functional cooperation between its members in regulating the transcriptional program associated with cell cycle progression. The methods presented here are general and can be applied to the analysis of transcriptional networks controlling any biological process.
Affiliation(s)
- Ran Elkon
- The David and Inez Myers Laboratory for Genetic Research, Department of Human Genetics, Sackler School of Medicine, and School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
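The abstract does not name the statistic used to judge whether binding sites are over-represented in a promoter set. A hypergeometric tail probability is one common choice, and the sketch below assumes it. The probability is computed exactly with integer binomial coefficients. The counts in the usage line are hypothetical.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k): chance that at least k of `draws` promoters carry the site,
    when `successes` of all `population` promoters carry it."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(comb(successes, x) * comb(population - successes, draws - x)
               for x in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 30 of 120 cell-cycle promoters carry a site found in 800 of 13,000 promoters.
print(hypergeom_tail(30, 13_000, 800, 120))
```

Testing many factors this way calls for a multiple-testing correction before any factor is reported as enriched.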
563
Miyoshi K, Shirai C, Mizuta K. Transcription of genes encoding trans-acting factors required for rRNA maturation/ribosomal subunit assembly is coordinately regulated with ribosomal protein genes and involves Rap1 in Saccharomyces cerevisiae. Nucleic Acids Res 2003; 31:1969-73. [PMID: 12655014 PMCID: PMC152794 DOI: 10.1093/nar/gkg278] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.9] [Received: 11/25/2002] [Revised: 02/06/2003] [Accepted: 02/06/2003] [Indexed: 11/13/2022]
Abstract
We demonstrate that the genes encoding trans-acting factors essential for pre-rRNA processing/ribosomal subunit assembly are responsive to various kinds of stresses such as heat shock, nitrogen deprivation and a secretory defect, in coordination with ribosomal protein genes in Saccharomyces cerevisiae. The rap1-17 mutation, which produces a C-terminally truncated form of the transcription factor Rap1p, affects transcriptional repression of the trans-acting factor genes due to a secretory defect as shown previously for both ribosomal protein and rRNA genes.
Affiliation(s)
- Keita Miyoshi
- Department of Bioresource Science and Technology, Graduate School of Biosphere Science, Hiroshima University, Kagamiyama 1-4-4, Higashi-Hiroshima 739-8528, Japan
564
Qin ZS, McCue LA, Thompson W, Mayerhofer L, Lawrence CE, Liu JS. Identification of co-regulated genes through Bayesian clustering of predicted regulatory binding sites. Nat Biotechnol 2003; 21:435-9. [PMID: 12627170 DOI: 10.1038/nbt802] [Citation(s) in RCA: 74] [Impact Index Per Article: 3.5] [Received: 05/06/2002] [Accepted: 12/07/2002] [Indexed: 02/01/2023]
Abstract
The identification of co-regulated genes and their transcription-factor binding sites (TFBS) are key steps toward understanding transcription regulation. In addition to effective laboratory assays, various computational approaches for the detection of TFBS in promoter regions of coexpressed genes have been developed. The availability of complete genome sequences combined with the likelihood that transcription factors and their cognate sites are often conserved during evolution has led to the development of phylogenetic footprinting. The modus operandi of this technique is to search for conserved motifs upstream of orthologous genes from closely related species. The method can identify hundreds of TFBS without prior knowledge of co-regulation or coexpression. Because many of these predicted sites are likely to be bound by the same transcription factor, motifs with similar patterns can be put into clusters so as to infer the sets of co-regulated genes, that is, the regulons. This strategy utilizes only genome sequence information and is complementary to and confirmative of gene expression data generated by microarray experiments. However, the limited data available to characterize individual binding patterns, the variation in motif alignment, motif width, and base conservation, and the lack of knowledge of the number and sizes of regulons make this inference problem difficult. We have developed a Gibbs sampling-based Bayesian motif clustering (BMC) algorithm to address these challenges. Tests on simulated data sets show that BMC produces many fewer errors than hierarchical and K-means clustering methods. The application of BMC to hundreds of predicted gamma-proteobacterial motifs correctly identified many experimentally reported regulons, inferred the existence of previously unreported members of these regulons, and suggested novel regulons.
Affiliation(s)
- Zhaohui S Qin
- Department of Statistics, Harvard University, Cambridge, MA 02138, USA
565
Ureta-Vidal A, Ettwiller L, Birney E. Comparative genomics: genome-wide analysis in metazoan eukaryotes. Nat Rev Genet 2003; 4:251-62. [PMID: 12671656 DOI: 10.1038/nrg1043] [Citation(s) in RCA: 156] [Impact Index Per Article: 7.4] [Indexed: 11/10/2022]
Abstract
The increasing number of complete and nearly complete metazoan genome sequences provides a significant amount of material for large-scale comparative genomic analysis. Finding new effective methods to analyse such enormous datasets has been the object of intense research. Three main areas in comparative genomics have recently shown important developments: whole-genome alignment, gene prediction and regulatory-region prediction. Each of these areas improves the methods of deciphering long genomic sequences and uncovering what lies hidden in them.
Affiliation(s)
- Abel Ureta-Vidal
- EnsEMBL Project, Room A2-06, EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
566
Conlon EM, Liu XS, Lieb JD, Liu JS. Integrating regulatory motif discovery and genome-wide expression analysis. Proc Natl Acad Sci U S A 2003; 100:3339-44. [PMID: 12626739 PMCID: PMC152294 DOI: 10.1073/pnas.0630591100] [Citation(s) in RCA: 252] [Impact Index Per Article: 12.0] [Indexed: 11/18/2022]
Abstract
We propose motif regressor for discovering sequence motifs upstream of genes that undergo expression changes in a given condition. The method combines the advantages of matrix-based motif finding and oligomer motif-expression regression analysis, resulting in high sensitivity and specificity. motif regressor is particularly effective in discovering expression-mediating motifs of medium to long width with multiple degenerate positions. When applied to Saccharomyces cerevisiae, motif regressor identified the ROX1 and YAP1 motifs from Rox1p and Yap1p overexpression experiments, respectively; predicted that Gcn4p may have increased activity in YAP1 deletion mutants; reported a group of motifs (including GCN4, PHO4, MET4, STRE, USR1, RAP1, M3A, and M3B) that may mediate the transcriptional response to amino acid starvation; and found all of the known cell-cycle regulation motifs from 18 expression microarrays over two cell cycles.
Affiliation(s)
- Erin M Conlon
- Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, MA 02138, USA
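motif regressor links motif matches to expression changes through regression. The sketch below shows only the simplest form of that step. It fits expression on precomputed match scores for one candidate motif by ordinary least squares and returns the slope and its t statistic. How the paper computes match scores and selects motifs is not reproduced, and the toy data are hypothetical.

```python
import math


def motif_regression(scores, expression):
    """Least-squares fit of expression = a + b * score; returns (b, t statistic for b)."""
    n = len(scores)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(scores) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    if sxx == 0:
        raise ValueError("motif scores must vary across genes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(scores, expression))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, (slope / se if se > 0 else math.inf)


# Toy data: motif match scores and log expression ratios for four genes.
slope, t = motif_regression([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.5])
print(round(slope, 2))  # 2.15
```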
567
Aerts S, Thijs G, Coessens B, Staes M, Moreau Y, De Moor B. Toucan: deciphering the cis-regulatory logic of coregulated genes. Nucleic Acids Res 2003; 31:1753-64. [PMID: 12626717 PMCID: PMC152870 DOI: 10.1093/nar/gkg268] [Citation(s) in RCA: 147] [Impact Index Per Article: 7.0] [Indexed: 11/14/2022]
Abstract
TOUCAN is a Java application for the rapid discovery of significant cis-regulatory elements from sets of coexpressed or coregulated genes. Biologists can automatically (i) retrieve genes and intergenic regions, (ii) identify putative regulatory regions, (iii) score sequences for known transcription factor binding sites, (iv) identify candidate motifs for unknown binding sites, and (v) detect those statistically over-represented sites that are characteristic for a gene set. Genes or intergenic regions are retrieved from Ensembl or EMBL, together with orthologs and supporting information. Orthologs are aligned and syntenic regions are selected as candidate regulatory regions. Putative sites for known transcription factors are detected using our MotifScanner, which scores position weight matrices using a probabilistic model. New motifs are detected using our MotifSampler based on Gibbs sampling. Binding sites characteristic for a gene set--and thus statistically over-represented with respect to a reference sequence set--are found using a binomial test. We have validated Toucan by analyzing muscle-specific genes, liver-specific genes and E2F target genes; we have easily detected many known binding sites within intergenic DNA and identified new biologically plausible sites for known and unknown transcription factors. Software available at http://www.esat.kuleuven.ac.be/~dna/BioI/Software.html.
Affiliation(s)
- Stein Aerts
- Department of Electrical Engineering (ESAT-SCD), Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Heverlee, Leuven, Belgium.
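The abstract states that over-represented binding sites are found with a binomial test against a reference sequence set. The sketch below shows one way such a test can be set up, assuming that a site's per-position occurrence rate in the reference set serves as the null probability; the helper names and this parameterization are illustrative assumptions, not code from Toucan.

```python
import math


def binomial_sf(k: int, n: int, p: float) -> float:
    """Upper tail P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1-p)^(n-i); lgamma avoids huge integers
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def overrepresentation_pvalue(hits_in_set: int, positions_in_set: int,
                              hits_in_ref: int, positions_in_ref: int) -> float:
    """Hypothetical helper: chance of seeing at least `hits_in_set` site
    occurrences among `positions_in_set` scanned positions if the per-position
    rate equals the one measured in the reference sequence set."""
    rate = hits_in_ref / positions_in_ref
    return binomial_sf(hits_in_set, positions_in_set, rate)


# Example: 12 hits in 5,000 bp of the gene set vs 100 hits in 500,000 bp of reference.
print(overrepresentation_pvalue(12, 5_000, 100, 500_000))
```

Working in log space keeps the sum stable for long sequence sets, where the binomial coefficients alone would overflow a float.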
568
Qiu P, Qin L, Sorrentino RP, Greene JR, Wang L, Partridge NC. Comparative promoter analysis and its application in analysis of PTH-regulated gene expression. J Mol Biol 2003; 326:1327-36. [PMID: 12595247 DOI: 10.1016/s0022-2836(03)00053-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]
Abstract
Taking advantage of the "working draft" of the human genome and the MIT shotgun assembly of the mouse genome, we performed a comparative promoter analysis of human RefSeq mRNA (sequences from GenBank's RefSeq database). By combining this analysis with a transcription factor (TF) binding site analysis using a TRANSFAC position weight matrix (PWM) search, 86% of non-specific TF sites were removed. Using a set of genes that are regulated by parathyroid hormone (PTH), a statistical analysis was performed on the conserved TF binding sites among a set of eight human and mouse genes. From among the eight genes tested, we obtained a set of 31 TFs, suggesting possible roles for associated genes in PTH-mediated pathways. All three known PTH-responsive TFs (AP1, RUNX2, CREB) were correctly predicted by this analysis as well as two other potential TFs (VDR and CEBP Delta). Additionally, a model was made to describe the TF site characteristic module of PTH-regulated genes. This model was then used to search all human RefSeq gene promoters with established human-mouse ortholog relationships to identify other PTH-regulated genes. This comparative approach combined with statistical analysis proved to be sufficiently specific to decipher critical TFs involved in PTH-regulated pathways.
Affiliation(s)
- Ping Qiu
- Bioinformatics Group and Discovery Technology Department, Schering-Plough Research Institute, 2015 Galloping Hill Road, Kenilworth, NJ 07033, USA.
569
Hedenfalk I, Ringnér M, Ben-Dor A, Yakhini Z, Chen Y, Chebil G, Ach R, Loman N, Olsson H, Meltzer P, Borg A, Trent J. Molecular classification of familial non-BRCA1/BRCA2 breast cancer. Proc Natl Acad Sci U S A 2003; 100:2532-7. [PMID: 12610208 PMCID: PMC151375 DOI: 10.1073/pnas.0533805100] [Citation(s) in RCA: 141] [Impact Index Per Article: 6.7] [Indexed: 11/18/2022]
Abstract
In the decade since their discovery, the two major breast cancer susceptibility genes BRCA1 and BRCA2 have been shown conclusively to be involved in a significant fraction of families segregating breast and ovarian cancer. However, it has become equally clear that a large proportion of families segregating breast cancer alone are not caused by mutations in BRCA1 or BRCA2. Unfortunately, despite intensive effort, the identification of additional breast cancer predisposition genes has so far been unsuccessful, presumably because of genetic heterogeneity, low penetrance, or recessive/polygenic mechanisms. These non-BRCA1/2 breast cancer families (termed BRCAx families) comprise a histopathologically heterogeneous group, further supporting their origin from multiple genetic events. Accordingly, the identification of a method to successfully subdivide BRCAx families into recognizable groups could be of considerable value to further genetic analysis. We have previously shown that global gene expression analysis can identify unique and distinct expression profiles in breast tumors from BRCA1 and BRCA2 mutation carriers. Here we show that gene expression profiling can discover novel classes among BRCAx tumors, and differentiate them from BRCA1 and BRCA2 tumors. Moreover, microarray-based comparative genomic hybridization (CGH) to cDNA arrays revealed specific somatic genetic alterations within the BRCAx subgroups. These findings illustrate that, when gene expression-based classifications are used, BRCAx families can be grouped into homogeneous subsets, thereby potentially increasing the power of conventional genetic analysis.
Affiliation(s)
- Ingrid Hedenfalk
- Cancer Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
570
Abstract
The transition to malignancy requires an extensive reconfiguration of the genome's expression program that does not result entirely from actual changes in primary DNA sequence (i.e., mutation). Epigenetic (meta-DNA) gene expression states result from an assembly over a given locus of a poorly understood nucleoprotein entity that includes histones and other architectural components of chromatin, nonhistone DNA-bound regulators, and additional chromatin-bound polypeptides. This structure is rapidly reestablished in the wake of the DNA replication fork, thus ensuring its persistence in rapidly proliferating cells and thereby yielding an exceptionally stable mode of gene expression. Chromatin is the perfect vehicle for enabling such genome control. During S phase both covalently modified histones and histone-associated regulatory proteins distribute to the newly synthesized daughter chromatids in a form of "molecular dowry" inherited from the G(1) state of the genome, and impose a specific mode of function on the underlying DNA. An extensively studied example of chromatin-based epigenetic inheritance connects DNA methylation to the targeting of chromatin remodeling and modification. In a broad sense, however, genome reprogramming in cancer is associated with the remodeling of a multitude of regulatory DNA stretches (e.g., promoters, enhancers, locus control regions (LCRs), and insulators) into a specific chromatin architecture. This architectural entity provides a general molecular signature of the cancer epigenome that complements and significantly expands its DNA methylation-based component.
Affiliation(s)
- Fyodor D Urnov
- Sangamo BioSciences, Inc., Point Richmond Tech Center, 501 Canal Boulevard, Suite A100, Richmond, California 94804, USA.
571
Chhabra SR, Shockley KR, Conners SB, Scott KL, Wolfinger RD, Kelly RM. Carbohydrate-induced differential gene expression patterns in the hyperthermophilic bacterium Thermotoga maritima. J Biol Chem 2003; 278:7540-52. [PMID: 12475972 DOI: 10.1074/jbc.m211748200] [Citation(s) in RCA: 110] [Impact Index Per Article: 5.2] [Indexed: 11/06/2022]
Abstract
The hyperthermophilic bacterium Thermotoga maritima MSB8 was grown on a variety of carbohydrates to determine the influence of carbon and energy source on differential gene expression. Despite the fact that T. maritima has been phylogenetically characterized as a primitive microorganism from an evolutionary perspective, results here suggest that it has versatile and discriminating mechanisms for regulating and effecting complex carbohydrate utilization. Growth of T. maritima on monosaccharides was found to be slower than growth on polysaccharides, although growth to cell densities of 10^8 to 10^9 cells/ml was observed on all carbohydrates tested. Differential expression of genes encoding carbohydrate-active proteins encoded in the T. maritima genome was followed using a targeted cDNA microarray in conjunction with mixed model statistical analysis. Coordinated regulation of genes responding to specific carbohydrates was noted. Although glucose generally repressed expression of all glycoside hydrolase genes, other sugars induced or repressed these genes to varying extents. Expression profiles of most endo-acting glycoside hydrolase genes correlated well with their reported biochemical properties, although exo-acting glycoside hydrolase genes displayed less specific expression patterns. Genes encoding selected putative ABC sugar transporters were found to respond to specific carbohydrates, and in some cases putative oligopeptide transporter genes were also found to respond to specific sugar substrates. Several genes encoding putative transcriptional regulators were expressed during growth on specific sugars, thus suggesting functional assignments. The transcriptional response of T. maritima to specific carbohydrate growth substrates indicated that sugar backbone- and linkage-specific regulatory networks are operational in this organism during the uptake and utilization of carbohydrate substrates. Furthermore, the wide-ranging collection of such networks in T. maritima suggests that this organism is capable of adapting to a variety of growth environments containing carbohydrate growth substrates.
Affiliation(s)
- Swapnil R Chhabra
- Department of Chemical Engineering, North Carolina State University, Raleigh, North Carolina 27695, USA
572
Kessler MM, Zeng Q, Hogan S, Cook R, Morales AJ, Cottarel G. Systematic discovery of new genes in the Saccharomyces cerevisiae genome. Genome Res 2003; 13:264-71. [PMID: 12566404 PMCID: PMC420365 DOI: 10.1101/gr.232903] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.5] [Received: 03/01/2002] [Accepted: 11/07/2002] [Indexed: 11/24/2022]
Abstract
We used genome-wide comparative analysis of predicted protein sequences to identify many novel small genes, named smORFs for small open reading frames, within the budding yeast genome. Further analysis of 117 of these new genes showed that 84 are transcribed. We extended our analysis of one smORF conserved from yeast to human. This investigation provides an updated and comprehensive annotation of the yeast genome, validates additional concepts in the study of genomes in silico, and increases the expected numbers of coding sequences in a genome with the corresponding impact on future functional genomics and proteomics studies.
Affiliation(s)
- Marco M Kessler
- Genome Therapeutics Corporation, Waltham, Massachusetts 02453, USA
573
Abstract
Decomposing a biological sequence into modular domains is a basic prerequisite for identifying functional units in biological molecules. The commonly used segmentation procedures usually have two steps. First, collect and align a set of sequences that are homologous to the target sequence. Then, parse this multiple alignment into several blocks and identify the functionally important ones by using a semi-automatic method, which combines manual analysis and expert knowledge. In this paper, we present a novel exploratory approach to parsing and analyzing such multiple alignments. It is based on a type of analysis-of-variance (ANOVA) decomposition of the sequence information content. Unlike the traditional change-point method, this approach takes into account not only the composition biases but also the overdispersion effects among the blocks. The new approach is tested on the families of ribosomal proteins and shows promising performance. It is shown that the new approach provides a better way of judging some important residues in these proteins. This allows one to find some subsets of residues that are critical to these proteins.
Affiliation(s)
- Jian Zhang
- EURANDOM, Den Dolech 2, 5612 AZ Eindhoven, The Netherlands; The Chinese Academy of Sciences, Beijing.
574
575
Yeast functional genomics and metabolic engineering: past, present and future. Topics in Current Genetics 2003. [DOI: 10.1007/3-540-37003-x_11] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 02/01/2023]
576
Whitham SA, Quan S, Chang HS, Cooper B, Estes B, Zhu T, Wang X, Hou YM. Diverse RNA viruses elicit the expression of common sets of genes in susceptible Arabidopsis thaliana plants. Plant J 2003; 33:271-83. [PMID: 12535341 DOI: 10.1046/j.1365-313x.2003.01625.x] [Citation(s) in RCA: 236] [Impact Index Per Article: 11.2] [Indexed: 05/18/2023]
Abstract
Systemic infections of plants by viruses require that viruses modify host cells in order to facilitate infections. These modifications include induction of host factors required for replication, propagation and movement, and suppression of host defense responses, which are likely to be associated with changes in host gene expression. Past studies of the effects of viral infection on gene expression in susceptible hosts have been limited to only a handful of genes. To gain broader insight into the responses elicited by viruses in susceptible hosts, high-density oligonucleotide probe microarray technology was used. Arabidopsis leaves were either mock inoculated or inoculated with cucumber mosaic cucumovirus, oil seed rape tobamovirus, turnip vein clearing tobamovirus, potato virus X potexvirus, or turnip mosaic potyvirus. Inoculated leaves were collected at 1, 2, 4, and 5 days after inoculation, total RNA was isolated, and samples were hybridized to Arabidopsis GeneChip microarrays (Affymetrix). Microarray hybridization revealed co-ordinated changes in gene expression in response to infection by diverse viruses. These changes include virus-general and virus-specific alterations in the expression of genes associated with distinct defense or stress responses. Analyses of the promoters of these genes further suggest that diverse RNA viruses elicit common responses in susceptible plant hosts through signaling pathways that have not been previously characterized.
Affiliation(s)
- Steven A Whitham
- Department of Plant Pathology, Iowa State University, Ames, IA 50011-1020, USA.
577
Abstract
A common approach to the analysis of gene expression data is to define clusters of genes that have similar expression. A critical step in cluster analysis is the determination of similarity between the expression levels of two genes. We introduce a neural network-based similarity index as a non-linear similarity index and compare the results with other proximity measures for Saccharomyces cerevisiae gene expression data. We show that the clusters obtained using Euclidean distance, correlation coefficients, and mutual information were not significantly different. The clusters formed with the neural network-based index were more in agreement with those defined by functional categories and common regulatory motifs.
Affiliation(s)
- Tomohiro Sawa
- Division of Health Sciences and Technology, Harvard Medical School and Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.
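Three of the proximity measures compared in this abstract (Euclidean distance, Pearson correlation and mutual information) can be computed for a pair of expression profiles as shown below. This is a minimal sketch: the equal-width binning used to estimate mutual information is an assumption, and the paper's neural network-based index is not reproduced.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Distance between two expression profiles (lower means more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x, y):
    """Correlation coefficient between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def _bin(values, bins):
    """Assign each value to one of `bins` equal-width intervals (assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=3):
    """Plug-in estimate of mutual information (in nats) from binned profiles."""
    bx, by = _bin(x, bins), _bin(y, bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum((c / n) * math.log((c / n) / ((px[i] / n) * (py[j] / n)))
               for (i, j), c in joint.items())


gene_a = [0.1, 0.9, 1.8, 2.2, 3.1]
gene_b = [0.2, 1.1, 1.7, 2.4, 2.9]
print(euclidean(gene_a, gene_b), pearson(gene_a, gene_b), mutual_information(gene_a, gene_b))
```

Euclidean distance is sensitive to absolute expression levels, whereas correlation compares only the shape of the profiles; mutual information can also capture non-linear dependence, at the cost of depending on how values are binned.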
578
Jiménez JL, Mitchell MP, Sgouros JG. Microarray analysis of orthologous genes: conservation of the translational machinery across species at the sequence and expression level. Genome Biol 2002; 4:R4. [PMID: 12537549 PMCID: PMC151285 DOI: 10.1186/gb-2002-4-1-r4] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Received: 05/16/2002] [Revised: 08/28/2002] [Accepted: 10/31/2002] [Indexed: 12/04/2022]
Abstract
BACKGROUND Genome projects have provided a vast amount of sequence information. Sequence comparison between species helps to establish functional catalogues within organisms and to study how they are maintained and modified across phylogenetic groups during evolution. Microarray studies allow us to determine groups of genes with similar temporal regulation and perhaps also common regulatory upstream regions for binding of transcription factors. The integration of sequence and expression data is expected to refine our current annotations and provide some insight into the evolution of gene regulation across organisms. RESULTS We have investigated how well the protein subcellular localization and functional categories established from clustering of orthologous genes agree with gene-expression data in Saccharomyces cerevisiae. An increase in the resolution of biologically meaningful classes is observed upon the combination of experiments under different conditions. The functional categories deduced by sequence comparison approaches are, in general, preserved at the level of expression and can sometimes interact into larger co-regulated networks, such as the protein translation process. Differences and similarities in the expression between cytoplasmic-mitochondrial and interspecies translation machineries complement evolutionary information from sequence similarity. CONCLUSIONS Combination of several microarray experiments is a powerful tool for the identification of upstream regulatory motifs of yeast genes involved in protein synthesis. Comparison of these yeast co-regulated genes against the archaeal and bacterial operons indicates that the components of the protein translation process are conserved across organisms at the expression level with minor specific adaptations.
Affiliation(s)
- Jose L Jiménez
- Computational Genome Analysis Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK.
579
Sinha S, Tompa M. Discovery of novel transcription factor binding sites by statistical overrepresentation. Nucleic Acids Res 2002; 30:5549-60. [PMID: 12490723 PMCID: PMC140044 DOI: 10.1093/nar/gkf669] [Citation(s) in RCA: 139] [Impact Index Per Article: 6.3] [Received: 07/26/2002] [Accepted: 09/17/2002] [Indexed: 11/12/2022]
Abstract
Understanding the complex and varied mechanisms that regulate gene expression is an important and challenging problem. A fundamental sub-problem is to identify DNA binding sites for unknown regulatory factors, given a collection of genes believed to be co-regulated. We discuss a computational method that identifies good candidates for such binding sites. Unlike local search techniques such as expectation maximization and Gibbs samplers that may not reach a global optimum, the method discussed enumerates all motifs in the search space, and is guaranteed to produce the motifs with greatest z-scores. We discuss the results of validation experiments in which this algorithm was used to identify candidate binding sites in several well-studied regulons of Saccharomyces cerevisiae, where the most prominent transcription factor binding sites are largely known. We then discuss the results on gene families in the functional and mutant phenotype catalogs of S. cerevisiae, where the algorithm suggests many promising novel transcription factor binding sites. The program is available at http://bio.cs.washington.edu/software.html.
Affiliation(s)
- Saurabh Sinha
- Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA 98195-2350, USA
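The exhaustive strategy described in this abstract (enumerate every motif in the search space, keep the highest z-scores) can be illustrated with a deliberately simplified sketch. It assumes exact k-mers over ACGT, a single strand, a zero-order background estimated from the input sequences, and a binomial variance that ignores overlapping occurrences; these simplifications are assumptions of the sketch, not the published algorithm, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def kmer_zscores(seqs, k, background=None):
    """Score every DNA k-mer by (observed - expected) / standard deviation."""
    if background is None:
        # Zero-order background estimated from the input sequences (assumption).
        counts = Counter("".join(seqs))
        total = sum(counts[b] for b in "ACGT")
        background = {b: counts[b] / total for b in "ACGT"}
    observed = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    n_positions = sum(max(len(s) - k + 1, 0) for s in seqs)
    scores = {}
    for letters in product("ACGT", repeat=k):  # exhaustive enumeration
        word = "".join(letters)
        p = math.prod(background[b] for b in word)
        expected = n_positions * p
        sd = math.sqrt(n_positions * p * (1 - p))
        scores[word] = (observed[word] - expected) / sd if sd > 0 else 0.0
    return scores


def top_motifs(seqs, k, n=5):
    """Return the n k-mers with the greatest z-scores."""
    scores = kmer_zscores(seqs, k)
    return sorted(scores, key=scores.get, reverse=True)[:n]


promoters = ["ATATCACGTATA", "TTACACGATTAT", "CACGATATATTA"]
print(top_motifs(promoters, 4, n=3))
```

Because every word is scored, the best-scoring motif under this model cannot be missed, which is the contrast the abstract draws with local search methods; the price is a search space of 4^k words.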
580
Sharan R, Elkon R, Shamir R. Cluster analysis and its applications to gene expression data. Ernst Schering Res Found Workshop 2002:83-108. [PMID: 12061008 DOI: 10.1007/978-3-662-04747-7_5] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Affiliation(s)
- R Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel.
581
Abstract
A large amount of microarray gene expression data relevant to the yeast cell cycle has been collected, and several hundred genes have been placed into a model transcriptional control network. Genome-wide studies of the location of cell cycle transcription factors, and a variety of computational approaches, have allowed refinement of the model, and at the same time show how other genome-wide data sets may be organised into model networks.
Affiliation(s)
- Bruce Futcher
- Department of Molecular Genetics and Microbiology, Life Science Building, University of Stony Brook, Stony Brook, NY 11794-5222, USA.
582
Chen ZY, Corey DP. Understanding inner ear development with gene expression profiling. J Neurobiol 2002; 53:276-85. [PMID: 12382281 DOI: 10.1002/neu.10125] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
Abstract
Understanding the development of the inner ear requires knowing the spatial and temporal pattern of gene expression, and the functions of those gene products. In the last decade, hearing research has benefited tremendously from the progress of the human and mouse genome projects, as amply illustrated by the identification of many deafness genes in both human and mouse. However, the sheer amount of information generated from the genome project has far outpaced the rate at which it is utilized. Microarray technology offers a means to quantify the expression level of transcripts at a whole-genome scale. Cross-tissue comparisons will identify genes unique to the inner ear, which will expedite the identification of new deafness genes. Microdissection and subtraction after ablation of cell types can reveal genes expressed in certain cells, such as hair cells. Expression profiling of both inner ear and other tissues, under a variety of conditions (such as during development, with drug treatment or in knock-out animals), can be used for cluster analysis to group genes of similar expression. Coexpression can suggest functional pathways and interactions between known genes, and can identify new genes in a structure or pathway. In this review we give examples for both transcription factors and cochlear structures.
Affiliation(s)
- Zheng-Yi Chen
- Neurology Service, Massachusetts General Hospital, WEL425, Boston, Massachusetts 02114, USA
583
Sudarsanam P, Pilpel Y, Church GM. Genome-wide co-occurrence of promoter elements reveals a cis-regulatory cassette of rRNA transcription motifs in Saccharomyces cerevisiae. Genome Res 2002; 12:1723-31. [PMID: 12421759 PMCID: PMC187556 DOI: 10.1101/gr.301202] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.2] [Received: 04/17/2002] [Accepted: 09/10/2002] [Indexed: 11/25/2022]
Abstract
Combinatorial regulation is an important feature of eukaryotic transcription. However, only a limited number of studies have characterized this aspect on a whole-genome level. We have conducted a genome-wide computational survey to identify cis-regulatory motif pairs that co-occur in a significantly high number of promoters in the S. cerevisiae genome. A pair of novel motifs, mRRPE and PAC, co-occur most highly in the genome, primarily in the promoters of genes involved in rRNA transcription and processing. The two motifs show significant positional and orientational bias with mRRPE being closer to the ATG than PAC in most promoters. Two additional rRNA-related motifs, mRRSE3 and mRRSE10, also co-occur with mRRPE and PAC. mRRPE and PAC are the primary determinants of expression profiles while mRRSE3 and mRRSE10 modulate these patterns. We describe a new computational approach, combining analyses of genome sequence and microarray data, for studying the functional significance of the physical locations of promoter elements. Applying this methodology to the regulatory cassette containing the four rRNA motifs demonstrates that the relative promoter locations of these elements have a profound effect on the expression patterns of the downstream genes. These findings provide a function for these novel motifs and insight into the mechanism by which they regulate gene expression. The methodology introduced here should prove particularly useful for analyzing transcriptional regulation in more complex genomes.
Affiliation(s)
- Priya Sudarsanam
- Department of Genetics and Lipper Center for Computational Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA
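Whether two motifs share more promoters than expected by chance is commonly assessed with a hypergeometric tail probability. The sketch below uses that statistic as an assumption (the abstract does not name the test) and ignores the positional and orientational effects the paper goes on to study; the function names are hypothetical.

```python
import math


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    upper = min(K, n)
    favourable = sum(math.comb(K, i) * math.comb(N - K, n - i)
                     for i in range(k, upper + 1))
    # Exact integer arithmetic; int / int true division is correctly rounded.
    return favourable / math.comb(N, n)


def cooccurrence_pvalue(promoters_with_a, promoters_with_b, n_promoters):
    """Chance that motifs A and B share at least the observed number of
    promoters if B's promoters were drawn at random from the genome."""
    a, b = set(promoters_with_a), set(promoters_with_b)
    return hypergeom_sf(len(a & b), n_promoters, len(a), len(b))


# Hypothetical example: 6,000 promoters, A in 300, B in 200, 60 shared.
motif_a = set(range(300))
motif_b = set(range(240, 440))
print(cooccurrence_pvalue(motif_a, motif_b, 6_000))
```

Scanning all motif pairs this way produces many tests, so in practice the resulting p-values would need a multiple-testing correction.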
584
Gasch AP, Eisen MB. Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biol 2002; 3:RESEARCH0059. [PMID: 12429058 PMCID: PMC133443 DOI: 10.1186/gb-2002-3-11-research0059] [Citation(s) in RCA: 194] [Impact Index Per Article: 8.8] [Received: 08/01/2002] [Revised: 09/06/2002] [Accepted: 09/11/2002] [Indexed: 01/06/2023]
Abstract
BACKGROUND Organisms simplify the orchestration of gene expression by coregulating genes whose products function together in the cell. Many proteins serve different roles depending on the demands of the organism, and therefore the corresponding genes are often coexpressed with different groups of genes under different situations. This poses a challenge in analyzing whole-genome expression data, because many genes will be similarly expressed to multiple, distinct groups of genes. Because most commonly used analytical methods cannot appropriately represent these relationships, the connections between conditionally coregulated genes are often missed. RESULTS We used a heuristically modified version of fuzzy k-means clustering to identify overlapping clusters of yeast genes based on published gene-expression data following the response of yeast cells to environmental changes. We have validated the method by identifying groups of functionally related and coregulated genes, and in the process we have uncovered new correlations between yeast genes and between the experimental conditions based on similarities in gene-expression patterns. To investigate the regulation of gene expression, we correlated the clusters with known transcription factor binding sites present in the genes' promoters. These results give insights into the mechanism of the regulation of gene expression in yeast cells responding to environmental changes. CONCLUSIONS Fuzzy k-means clustering is a useful analytical tool for extracting biological insights from gene-expression data. Our analysis presented here suggests that a prevalent theme in the regulation of yeast gene expression is the condition-specific coregulation of overlapping sets of genes.
Affiliation(s)
- Audrey P Gasch
- Department of Genome Science, Life Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Michael B Eisen
- Department of Genome Science, Life Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
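The membership and centroid updates of standard fuzzy k-means (also called fuzzy c-means) are sketched below to show how a gene can belong partially to several clusters at once. The sketch uses Euclidean distance, a fuzziness exponent m = 2 and deterministic farthest-point seeding, all of which are assumptions; the heuristic modifications described in the paper are not reproduced.

```python
import math


def _farthest_first(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def fuzzy_kmeans(points, k, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy c-means; returns (centroids, memberships from the last pass)."""
    centroids = _farthest_first(points, k)
    dims = len(points[0])
    memberships = []
    for _ in range(iters):
        # Membership step: u_ij = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1))
        memberships = []
        for p in points:
            d = [math.dist(p, c) for c in centroids]
            zeros = d.count(0.0)
            if zeros:  # point coincides with centroid(s): split membership among them
                row = [1.0 / zeros if di == 0.0 else 0.0 for di in d]
            else:
                row = [1.0 / sum((di / dl) ** (2 / (m - 1)) for dl in d) for di in d]
            memberships.append(row)
        # Centroid step: mean of all points weighted by membership ** m
        new_centroids = []
        for j in range(k):
            w = [u[j] ** m for u in memberships]
            s = sum(w)
            if s == 0.0:
                new_centroids.append(centroids[j])
                continue
            new_centroids.append([sum(wi * p[x] for wi, p in zip(w, points)) / s
                                  for x in range(dims)])
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, memberships


profiles = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, U = fuzzy_kmeans(profiles, 2)
print([[round(u, 3) for u in row] for row in U])
```

Each membership row sums to 1, so a gene lying between two expression programs receives intermediate weights instead of being forced into a single cluster.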
585
Abstract
Phylogenetic footprinting is a technique that identifies regulatory elements by finding unusually well conserved regions in a set of orthologous noncoding DNA sequences from multiple species. We introduce a new motif-finding problem, the Substring Parsimony Problem, which is a formalization of the ideas behind phylogenetic footprinting, and we present an exact dynamic programming algorithm to solve it. We then present a number of algorithmic optimizations that allow our program to run quickly on most biologically interesting datasets. We show how to handle data sets in which only an unknown subset of the sequences contains the regulatory element. Finally, we describe how to empirically assess the statistical significance of the motifs found. Each technique is implemented and successfully identifies a number of known binding sites, as well as several highly conserved but uncharacterized regions. The program is available at http://bio.cs.washington.edu/software.html.
Affiliation(s)
- Mathieu Blanchette
- Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA 98195-2350, USA.
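A small exact dynamic program for the Substring Parsimony Problem, as the abstract frames it, is sketched below. Each leaf allows only the k-mers occurring in its sequence. For every candidate k-mer, each internal node adds up, over its children, the cheapest child labelling plus the Hamming distance along the edge, and the minimum at the root is the parsimony score. This is a naive sketch for small k under a unit substitution cost (an assumption). It omits the algorithmic optimizations the paper mentions and the handling of sequences that lack the element, and its nested-tuple tree format is hypothetical.

```python
from itertools import product

INF = float("inf")


def hamming(a, b):
    """Number of substitutions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def substring_parsimony(tree, k):
    """Return (best score, root k-mers achieving it).

    `tree` is a leaf sequence (str) or a tuple of subtrees (hypothetical format).
    """
    words = ["".join(p) for p in product("ACGT", repeat=k)]

    def solve(node):
        if isinstance(node, str):
            # Leaf: cost 0 for any k-mer present in the sequence, else infinite.
            present = {node[i:i + k] for i in range(len(node) - k + 1)}
            return {w: (0 if w in present else INF) for w in words}
        tables = [solve(child) for child in node]
        # Internal node labelled s: best labelling of each child plus edge cost.
        return {s: sum(min(t[w] + hamming(s, w) for w in words) for t in tables)
                for s in words}

    root = solve(tree)
    best = min(root.values())
    return best, sorted(w for w, c in root.items() if c == best)


orthologs = (("ACGTTT", "TTACGA"), ("GGACGT", "ACGAAA"))
print(substring_parsimony(orthologs, 3))
```

The running time grows with 16^k per tree edge, which is why speed-ups of the kind the abstract mentions matter for realistic motif lengths.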
586
Thijs G, Marchal K, Lescot M, Rombauts S, De Moor B, Rouzé P, Moreau Y. A Gibbs sampling method to detect overrepresented motifs in the upstream regions of coexpressed genes. J Comput Biol 2002; 9:447-64. [PMID: 12015892 DOI: 10.1089/10665270252935566] [Citation(s) in RCA: 260] [Impact Index Per Article: 11.8] [Indexed: 11/12/2022]
Abstract
Microarray experiments can reveal important information about transcriptional regulation. In our case, we look for potential promoter regulatory elements in the upstream region of coexpressed genes. Here we present two modifications of the original Gibbs sampling algorithm for motif finding (Lawrence et al., 1993). First, we introduce the use of a probability distribution to estimate the number of copies of the motif in a sequence. Second, we describe the technical aspects of the incorporation of a higher-order background model whose application we discussed in Thijs et al. (2001). Our implementation is referred to as the Motif Sampler. We successfully validate our algorithm on several data sets. First, we show results for three sets of upstream sequences containing known motifs: 1) the G-box light-response element in plants, 2) elements involved in methionine response in Saccharomyces cerevisiae, and 3) the FNR O(2)-responsive element in bacteria. We use these data sets to explain the influence of the parameters on the performance of our algorithm. Second, we show results for upstream sequences from four clusters of coexpressed genes identified in a microarray experiment on wounding in Arabidopsis thaliana. Several motifs could be matched to regulatory elements from plant defence pathways in our database of plant cis-acting regulatory elements (PlantCARE). Some other strong motifs do not have corresponding motifs in PlantCARE but are promising candidates for further analysis.
Affiliation(s)
- Gert Thijs
- ESAT-SCD, KULeuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium.
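For context, the basic Gibbs site sampler of Lawrence et al. (1993), which the Motif Sampler modifies, can be sketched as below. It assumes one motif occurrence per sequence and builds a position weight matrix from all other sequences. It then draws a new start position in proportion to the motif-to-background likelihood ratio. The zero-order background, the pseudocount of 0.5 and the fixed iteration count are assumptions, and neither modification described in the abstract (a distribution over motif copy number and a higher-order background) is implemented.

```python
import math
import random

ALPHABET = "ACGT"


def _profile(seqs, starts, w, skip, pseudo=0.5):
    """Position weight matrix from current sites, leaving out sequence `skip`."""
    counts = [{b: pseudo for b in ALPHABET} for _ in range(w)]
    for idx, (s, st) in enumerate(zip(seqs, starts)):
        if idx == skip:
            continue
        for j in range(w):
            counts[j][s[st + j]] += 1
    return [{b: c[b] / sum(c.values()) for b in ALPHABET} for c in counts]


def gibbs_site_sampler(seqs, w, iters=500, seed=0):
    """Return one motif start per sequence (uppercase ACGT input assumed)."""
    rng = random.Random(seed)
    bg_counts = {b: 1.0 for b in ALPHABET}
    for s in seqs:
        for ch in s:
            bg_counts[ch] += 1
    total = sum(bg_counts.values())
    bg = {b: c / total for b, c in bg_counts.items()}
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iters):
        i = rng.randrange(len(seqs))  # sequence whose site is resampled
        prof = _profile(seqs, starts, w, skip=i)
        s = seqs[i]
        weights = []
        for st in range(len(s) - w + 1):
            # Likelihood ratio of motif model vs background for this window
            log_ratio = sum(math.log(prof[j][s[st + j]] / bg[s[st + j]])
                            for j in range(w))
            weights.append(math.exp(log_ratio))
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


upstream = ["ATGACTCAGG", "CCTGACTCAT", "TGACTCAAAA"]
sites = gibbs_site_sampler(upstream, 7, iters=200, seed=1)
print([s[st:st + 7] for s, st in zip(upstream, sites)])
```

Because the sampler is stochastic, runs with different seeds can settle on different (for example, phase-shifted) solutions; comparing several runs is one way to check stability.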
587
Abstract
The recent growth in genomic data and measurements of genome-wide expression patterns allows us to apply computational tools to examine gene regulation by transcription factors. In this work, we present a class of mathematical models that help in understanding the connections between transcription factors and functional classes of genes based on genetic and genomic data. Such a model represents the joint distribution of transcription factor binding sites and of expression levels of a gene in a unified probabilistic model. Learning a combined probability model of binding sites and expression patterns enables us to improve the clustering of the genes based on the discovery of putative binding sites and to detect which binding sites and experiments best characterize a cluster. To learn such models from data, we introduce a new search method that rapidly learns a model according to a Bayesian score. We evaluate our method on synthetic data as well as on real life data and analyze the biological insights it provides. Finally, we demonstrate the applicability of the method to other data analysis problems in gene expression data.
Affiliation(s)
- Yoseph Barash
- School of Computer Science and Engineering, Hebrew University, Jerusalem 91904, Israel
588
Moseyko N, Zhu T, Chang HS, Wang X, Feldman LJ. Transcription profiling of the early gravitropic response in Arabidopsis using high-density oligonucleotide probe microarrays. Plant Physiol 2002; 130:720-8. [PMID: 12376639 PMCID: PMC166601 DOI: 10.1104/pp.009688] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.9] [Received: 06/09/2002] [Accepted: 06/14/2002] [Indexed: 05/18/2023]
Abstract
Studies of plant tropisms, the directed growth toward or away from external stimuli such as light and gravity, began more than a century ago. Yet the biochemical, physiological, and especially molecular mechanisms of plant tropic responses remain for the most part unclear. We examined expression of 8,300 genes during early stages of the gravitropic response using high-density oligonucleotide probe microarrays. Approximately 1.7% of the genes represented on the array exhibited significant expression changes within the first 30 min of gravity stimulation. Among gravity-induced genes were a number of genes previously implicated in gravitropism. However, a much larger number of the identified genes had not previously been associated with gravitropism. Because reorientation may also expose plants to mechanical perturbation, we also compared the effects of a gentle mechanical perturbation on mRNA levels during the gravity response. Approximately 39% of apparently gravity-regulated genes were also regulated by the mechanical perturbation caused by plant reorientation. Our study revealed the induction of complex gene expression patterns as a consequence of gravitropic reorientation and points to an interplay between the gravitropic and mechanical responses and to the extreme sensitivity of plants to even very gentle mechanical perturbations.
Affiliation(s)
- Nick Moseyko
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall, Berkeley, CA 94720-3102, USA
589
Guglielmi B, Werner M. The yeast homolog of human PinX1 is involved in rRNA and small nucleolar RNA maturation, not in telomere elongation inhibition. J Biol Chem 2002; 277:35712-9. [PMID: 12107183 DOI: 10.1074/jbc.m205526200] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.0] [Indexed: 11/06/2022]
Abstract
In human cells, the PinX1 protein has recently been shown to regulate telomere length by repressing telomerase. In this work, we show that the putative yeast homolog of PinX1, encoded by the YGR280c open reading frame (ORF), is a new component of the ribosomal RNA processing machinery. The protein has a KK(E/D) C-terminal domain typical of nucleolar proteins and bears a putative RNA interacting domain widespread in eukaryotes called the G-patch. The protein was hence renamed Gno1p (G-patch nucleolar protein). GNO1 deletion results in a large growth defect due to the inhibition of the first cleavage steps of pre-ribosomal RNA processing at sites A0, A1, and A2. Furthermore, Gno1p is involved in the final 3'-end trimming of U18 and U24 small nucleolar RNAs. A mutational analysis showed that the G-patch of Gno1p is essential for both functions, whereas the KK(E/D) repeats are only required for U18 small nucleolar RNA maturation. We found that PinX1 complemented the gno1Δ mutation, suggesting that it has a dual function in telomere length regulation and ribosomal RNA maturation in agreement with its telomeric and nucleolar localization in human cells. Conversely, we found that Gno1p does not exhibit the in vivo telomerase inhibitor activity of PinX1.
Affiliation(s)
- Benjamin Guglielmi
- Service de Biochimie et Génétique Moléculaire, Bâtiment 144, Commissariat à l'Energie Atomique/Saclay, F-91191 Gif-sur-Yvette Cedex, France
590
Halfon MS, Michelson AM. Exploring genetic regulatory networks in metazoan development: methods and models. Physiol Genomics 2002; 10:131-43. [PMID: 12209016 DOI: 10.1152/physiolgenomics.00072.2002] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Indexed: 11/22/2022]
Abstract
One of the foremost challenges of 21st century biological research will be to decipher the complex genetic regulatory networks responsible for embryonic development. The recent explosion of whole genome sequence data and of genome-wide transcriptional profiling methods, such as microarrays, coupled with the development of sophisticated computational tools for exploiting and analyzing genomic data, provides a significant starting point for regulatory network analysis. In this article we review some of the main methodological issues surrounding genome annotation, transcriptional profiling, and computational prediction of cis-regulatory elements and discuss how the power of model genetic organisms can be used to experimentally verify and extend the results of genomic research.
Affiliation(s)
- Marc S Halfon
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Howard Hughes Medical Institute, Boston, Massachusetts 02115, USA
591
Grünenfelder B, Winzeler EA. Treasures and traps in genome-wide data sets: case examples from yeast. Nat Rev Genet 2002; 3:653-61. [PMID: 12209140 DOI: 10.1038/nrg886] [Citation(s) in RCA: 60] [Impact Index Per Article: 2.7] [Indexed: 11/08/2022]
Abstract
Since the publication of the Saccharomyces cerevisiae genome sequence, much effort has been dedicated to developing high-throughput techniques to generate comprehensive information about the function and dynamics of all genes in this yeast's genome. These techniques have generated data sets that typically contain large amounts of reliable and valuable biological information. Nevertheless, there are also uncertainties that are associated with such large-scale studies, which we discuss in this review. These uncertainties increase with the complexity of the organism under study. On the basis of the results from yeast, we should learn much from human and mouse genomic data sets. However, as with yeast data sets, they might also contain misleading results.
Affiliation(s)
- Björn Grünenfelder
- Department of Cell Biology, ICND 202, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, USA
592
Ramonell KM, Zhang B, Ewing RM, Chen Y, Xu D, Stacey G, Somerville S. Microarray analysis of chitin elicitation in Arabidopsis thaliana. Mol Plant Pathol 2002; 3:301-11. [PMID: 20569338 DOI: 10.1046/j.1364-3703.2002.00123.x] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.7] [Indexed: 05/11/2023]
Abstract
Chitin oligomers, released from fungal cell walls by endochitinase, induce defence and related cellular responses in many plants. However, little is known about chitin responses in the model plant Arabidopsis. We describe here a large-scale characterization of gene expression patterns in Arabidopsis in response to chitin treatment using an Arabidopsis microarray consisting of 2375 EST clones representing putative defence-related and regulatory genes. Transcript levels for 71 ESTs, representing 61 genes, were altered three-fold or more in chitin-treated seedlings relative to control seedlings. A number of transcripts exhibited altered accumulation as early as 10 min after exposure to chitin, representing some of the earliest changes in gene expression observed in chitin-treated plants. Included among the 61 genes were those that have been reported to be elicited by various pathogen-related stimuli in other plants. Other genes, including genes of unknown function, were also identified, broadening our understanding of chitin-elicited responses. Among transcripts with enhanced accumulation, one cluster was enriched in genes with both the W-box promoter element and a novel regulatory element. In addition, a number of transcripts encoding proteins involved in cell wall strengthening and wall deposition had decreased abundance. The chalcone synthase promoter element was identified in the upstream regions of these genes, suggesting that pathogen signals may suppress the expression of some genes. These data indicate that Arabidopsis should be an excellent model to elucidate the mechanisms of chitin elicitation in plant defence.
Affiliation(s)
- Katrina M Ramonell
- Department of Plant Biology, Carnegie Institution of Washington, 260 Panama Street, Stanford, CA 94305, USA
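The abstract above selects features whose transcript level changed three-fold or more, and then distinguishes ESTs (71) from the genes they represent (61). A minimal sketch of that kind of filter, assuming simple treated/control intensity ratios per EST, is shown below; it is not the authors' normalization pipeline, and the EST identifiers, gene names and intensities are hypothetical.

```python
import math

def changed_features(treated, control, fold=3.0):
    """Return {feature: log2(treated/control)} for features changed >= `fold` in either direction."""
    cutoff = math.log2(fold)
    changed = {}
    for fid, t in treated.items():
        ratio = math.log2(t / control[fid])
        # A symmetric cutoff on the log scale catches both induction and repression.
        if abs(ratio) >= cutoff:
            changed[fid] = ratio
    return changed

def collapse_to_genes(features, feature_to_gene):
    """Several ESTs can represent one gene, so count distinct genes separately."""
    return {feature_to_gene[f] for f in features}

# Hypothetical intensities for four ESTs; est1 and est4 represent the same gene.
treated = {"est1": 300.0, "est2": 100.0, "est3": 20.0, "est4": 45.0}
control = {"est1": 100.0, "est2": 90.0, "est3": 100.0, "est4": 15.0}
hits = changed_features(treated, control)
print(sorted(hits), collapse_to_genes(hits, {"est1": "gA", "est3": "gB", "est4": "gA"}))
```

Working on log2 ratios makes "three-fold up" and "three-fold down" the same distance from zero, which is why a single absolute cutoff suffices.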
593
Smith JJ, Marelli M, Christmas RH, Vizeacoumar FJ, Dilworth DJ, Ideker T, Galitski T, Dimitrov K, Rachubinski RA, Aitchison JD. Transcriptome profiling to identify genes involved in peroxisome assembly and function. J Cell Biol 2002; 158:259-71. [PMID: 12135984 PMCID: PMC2173120 DOI: 10.1083/jcb.200204059] [Citation(s) in RCA: 163] [Impact Index Per Article: 7.4] [Indexed: 11/22/2022]
Abstract
Yeast cells were induced to proliferate peroxisomes, and microarray transcriptional profiling was used to identify PEX genes encoding peroxins involved in peroxisome assembly and genes involved in peroxisome function. Clustering algorithms identified 224 genes with expression profiles similar to those of genes encoding peroxisomal proteins and genes involved in peroxisome biogenesis. Several previously uncharacterized genes were identified, two of which, YPL112c and YOR084w, encode proteins of the peroxisomal membrane and matrix, respectively. Ypl112p, renamed Pex25p, is a novel peroxin required for the regulation of peroxisome size and maintenance. These studies demonstrate the utility of comparative gene profiling as an alternative to functional assays to identify genes with roles in peroxisome biogenesis.
Affiliation(s)
- Jennifer J Smith
- The Institute for Systems Biology, 1441 N. 34th Street, Seattle, WA 98103-8904, USA
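The abstract above finds candidate genes by their expression profiles being similar to those of known peroxisomal and peroxisome-biogenesis genes. One simple way to express "similar profile", assumed here purely for illustration (the paper's clustering algorithms are not specified in this abstract), is Pearson correlation against the mean profile of a reference set. The 0.8 cutoff, the gene names used as keys and all values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile instead of dividing by zero."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def similar_to_reference(profiles, reference_ids, min_r=0.8):
    """Genes outside the reference set whose profile correlates with the reference centroid."""
    ref = [profiles[g] for g in reference_ids]
    centroid = [sum(col) / len(col) for col in zip(*ref)]
    return sorted(
        g for g, p in profiles.items()
        if g not in reference_ids and pearson(p, centroid) >= min_r
    )

# Hypothetical time-course profiles after a peroxisome-inducing shift.
profiles = {
    "PEX11": [0.0, 1.0, 2.0, 3.0],
    "POX1": [0.0, 1.1, 2.2, 2.9],
    "geneX": [0.1, 0.9, 2.1, 3.2],
    "geneY": [3.0, 2.0, 1.0, 0.0],
    "geneZ": [1.0, 1.0, 1.0, 1.0],
}
print(similar_to_reference(profiles, {"PEX11", "POX1"}))  # ['geneX']
```

Correlation compares the shape of a profile rather than its absolute level, so a weakly expressed gene with the same induction kinetics can still be recovered.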
594
Jorgensen P, Nishikawa JL, Breitkreutz BJ, Tyers M. Systematic identification of pathways that couple cell growth and division in yeast. Science 2002; 297:395-400. [PMID: 12089449 DOI: 10.1126/science.1070850] [Citation(s) in RCA: 582] [Impact Index Per Article: 26.5] [Indexed: 11/03/2022]
Abstract
Size homeostasis in budding yeast requires that cells grow to a critical size before commitment to division in the late prereplicative growth phase of the cell cycle, an event termed Start. We determined cell size distributions for the complete set of approximately 6000 Saccharomyces cerevisiae gene deletion strains and identified approximately 500 abnormally small (whi) or large (lge) mutants. Genetic analysis revealed a complex network of newly found factors that govern critical cell size at Start, the most potent of which were Sfp1, Sch9, Cdh1, Prs3, and Whi5. Ribosome biogenesis is intimately linked to cell size through Sfp1, a transcription factor that controls the expression of at least 60 genes implicated in ribosome assembly. Cell growth and division appear to be coupled by multiple conserved mechanisms.
Affiliation(s)
- Paul Jorgensen
- Department of Medical Genetics and Microbiology, University of Toronto, Toronto, Ontario, Canada M5S 1A8
595
Kurdistani SK, Robyr D, Tavazoie S, Grunstein M. Genome-wide binding map of the histone deacetylase Rpd3 in yeast. Nat Genet 2002; 31:248-54. [PMID: 12089521 DOI: 10.1038/ng907] [Citation(s) in RCA: 214] [Impact Index Per Article: 9.7] [Indexed: 11/08/2022]
Abstract
We describe the genome-wide distribution of the histone deacetylase and repressor Rpd3 and its associated proteins Ume1 and Ume6 in Saccharomyces cerevisiae. Using a new cross-linking protocol, we found that Rpd3 binds upstream of many individual genes and upstream of members of gene classes with similar functions in anabolic processes. In addition, Rpd3 is preferentially associated with promoters that direct high transcriptional activity. We also found that Rpd3 was absent from large sub-telomeric domains. We show by co-immunoprecipitation and by the high similarity of their binding maps that Ume1 interacts with Rpd3. In contrast, despite the known role of Ume6 in Rpd3 recruitment, only a limited number of the genes targeted by Rpd3 are also enriched for (or targeted by) Ume6. This suggests that Rpd3 is brought to many promoters by alternative recruiters, some of which may bind the putative cis-regulatory DNA elements that we have identified in sets of Rpd3 target genes. Finally, we show that comparing the genome-wide pattern of Rpd3 binding with gene expression and histone acetylation in the rpd3Δ mutant strain reveals new sites of Rpd3 function.
Affiliation(s)
- Siavash K Kurdistani
- Department of Biological Chemistry, University of California School of Medicine, Los Angeles, California 90095, USA
596
Supekova L, Pezacki JP, Su AI, Loweth CJ, Riedl R, Geierstanger B, Schultz PG, Wemmer DE. Genomic effects of polyamide/DNA interactions on mRNA expression. Chem Biol 2002; 9:821-7. [PMID: 12144926 DOI: 10.1016/s1074-5521(02)00174-6] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.2] [Indexed: 12/31/2022]
Abstract
Here we characterize the biological activity of a hairpin polyamide 1 that inhibits binding of the minor-groove transcription factor LEF-1, constitutively expressed in colon cancers. Genome-wide analysis of mRNA expression in DLD1 colon cancer cells treated with 1 reveals that a limited number of genes are affected; the most significant changes correspond to genes related to cell cycle, signaling, and proteolysis rather than the anticipated WNT signaling pathway. Treated cells display increased doubling time and hypersensitivity to DNA damage that most likely results from downregulation of DNA-damage checkpoint genes, including YWAE (14-3-3ε protein) and DDIT3. Promoter analyses on a genomic level revealed numerous potential polyamide binding sites and multiple possible mechanisms for transcriptional antagonism, underscoring the utility of gene expression profiling in understanding the effects of polyamides on transcription at the cellular level.
Affiliation(s)
- Lubica Supekova
- Department of Chemistry, The Scripps Research Institute, La Jolla, CA 92037, USA
597
Shedden K, Cooper S. Analysis of cell-cycle gene expression in Saccharomyces cerevisiae using microarrays and multiple synchronization methods. Nucleic Acids Res 2002; 30:2920-9. [PMID: 12087178 PMCID: PMC117069 DOI: 10.1093/nar/gkf414] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.8] [Indexed: 11/14/2022]
Abstract
Microarray analysis of gene expression during the yeast division cycle has led to the proposal that a significant number of genes in Saccharomyces cerevisiae are expressed in a cell-cycle-specific manner. Four different methods of synchronization were used for cell-cycle analysis. Randomized data exhibit periodic patterns of lesser strength than the experimental data. Thus the cyclicities in the expression measurements in the four experiments presented do not arise from chance fluctuations or noise in the data. However, when the degree of cyclicity of genes in different experiments is compared, a large degree of non-reproducibility is found. Re-examining the phase timing of peak expression, we find that three of the experiments (those using alpha-factor, CDC28 and CDC15 synchronization) show consistent patterns of phasing, but the elutriation synchrony results demonstrate a pattern different from that of the arrest-release synchronization methods. Specific genes can show a wide range of cyclical behavior between different experiments; a gene with high cyclicity in one experiment can show essentially no cyclicity in another experiment. The elutriation experiment, possibly being the least perturbing of the four synchronization methods, may give the most accurate characterization of the state of gene expression during the normal, unperturbed cell cycle. Under this alternative explanation, the observed cyclicities in the other three experiments are a stress response to synchronization and may not be reproduced in unperturbed cells.
Affiliation(s)
- Kerby Shedden
- Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1285, USA.
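The abstract above argues that cyclicity is real because randomized data show weaker periodic patterns than the measured data. One standard way to make that comparison, assumed here for illustration (the paper's own cyclicity statistic is not given in the abstract), is a permutation test on a Fourier-style periodicity score: shuffle the time order many times and ask how often the shuffled series looks as periodic as the real one. The sampling times, the 60-minute period and the seed are hypothetical.

```python
import math
import random

def periodic_strength(series, times, period):
    """Squared Fourier amplitude at `period`, divided by total variance (0.0 for a flat series)."""
    mean = sum(series) / len(series)
    centered = [x - mean for x in series]
    ss = sum(v * v for v in centered)
    if ss == 0:
        return 0.0
    c = sum(v * math.cos(2 * math.pi * t / period) for v, t in zip(centered, times))
    s = sum(v * math.sin(2 * math.pi * t / period) for v, t in zip(centered, times))
    return (c * c + s * s) / ss

def permutation_pvalue(series, times, period, n_perm=1000, seed=0):
    """Fraction of time-shuffled series at least as periodic as the observed one."""
    rng = random.Random(seed)
    observed = periodic_strength(series, times, period)
    shuffled = list(series)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroys temporal order but keeps the value distribution
        if periodic_strength(shuffled, times, period) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical series: 24 samples every 5 min, one clean 60-min oscillation.
times = list(range(0, 120, 5))
series = [math.sin(2 * math.pi * t / 60) for t in times]
print(permutation_pvalue(series, times, 60, n_perm=500))
```

A permutation test of this kind only shows that a series is more periodic than its own shuffles; as the abstract stresses, it says nothing about whether the same genes cycle under a different synchronization method.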
598
Müller F, Blader P, Strähle U. Search for enhancers: teleost models in comparative genomic and transgenic analysis of cis regulatory elements. Bioessays 2002; 24:564-72. [PMID: 12111739 DOI: 10.1002/bies.10096] [Citation(s) in RCA: 73] [Impact Index Per Article: 3.3] [Indexed: 11/12/2022]
Abstract
Homology searches between DNA sequences of evolutionarily distant species (phylogenetic footprinting) offer a fast method for detecting regulatory sequences. Because of the small size of their genomes, tetraodontid species such as the Japanese pufferfish and green spotted pufferfish have become attractive models for comparative genomics. A disadvantage of the tetraodontid species, however, is that they cannot be bred and manipulated routinely under laboratory conditions, so they are less attractive for developmental and genetic analysis. In contrast, an increasing arsenal of transgenic techniques in the developmental model species zebrafish and medaka is being used for functional analysis of cis-regulatory sequences. The main disadvantage of these species is their much larger genomes. While comparisons between many loci have proved the suitability of phylogenetic footprinting using fish and mammalian sequences, the fast rate of change in enhancer structure and gene duplication within teleosts may obscure the detection of homologies. Here we discuss the contributions and potential of different teleost models for the detection and functional analysis of conserved cis-regulatory elements.
Affiliation(s)
- Ferenc Müller
- Institute of Toxicology and Genetics, Research Center Karlsruhe, Germany.
599
Blanchette M, Tompa M. Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res 2002; 12:739-48. [PMID: 11997340 PMCID: PMC186562 DOI: 10.1101/gr.6902] [Citation(s) in RCA: 235] [Impact Index Per Article: 10.7] [Received: 01/11/2002] [Accepted: 02/28/2002] [Indexed: 01/17/2023]
Abstract
Phylogenetic footprinting is a method for the discovery of regulatory elements in a set of orthologous regulatory regions from multiple species. It does so by identifying the best conserved motifs in those orthologous regions. We describe a computer algorithm designed specifically for this purpose, making use of the phylogenetic relationships among the sequences under study to make more accurate predictions. The program is guaranteed to report all sets of motifs with the lowest parsimony scores, calculated with respect to the phylogenetic tree relating the input species. We report the results of this algorithm on several data sets of interest. A large number of known functional binding sites are identified by our method, but we also find several highly conserved motifs for which no function is yet known.
Affiliation(s)
- Mathieu Blanchette
- Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195-2350, USA
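The abstract above ranks candidate motifs by their parsimony score with respect to the species tree. To make that score concrete, the sketch below uses Fitch's small-parsimony algorithm (a simpler, unit-cost stand-in, not the authors' dynamic program) to count the minimum substitutions needed to explain one equal-length word per species, plus a brute-force search over k-mers that is only feasible for tiny inputs. The tree, species names and sequences are hypothetical.

```python
from itertools import product

def fitch_site(tree, leaf_base):
    """Fitch's algorithm for one column: returns (candidate state set, substitution count)."""
    if isinstance(tree, str):  # a leaf is a species name
        return {leaf_base[tree]}, 0
    left, right = tree
    ls, lc = fitch_site(left, leaf_base)
    rs, rc = fitch_site(right, leaf_base)
    common = ls & rs
    if common:
        return common, lc + rc
    return ls | rs, lc + rc + 1

def parsimony_score(tree, words):
    """Minimum substitutions explaining equal-length words at the leaves, summed over positions."""
    length = len(next(iter(words.values())))
    return sum(
        fitch_site(tree, {sp: w[i] for sp, w in words.items()})[1] for i in range(length)
    )

def best_footprint(tree, seqs, k):
    """Exhaustively choose one k-mer per species minimizing parsimony (tiny inputs only)."""
    species = list(seqs)
    choices = [[seqs[sp][i:i + k] for i in range(len(seqs[sp]) - k + 1)] for sp in species]
    best = None
    for combo in product(*choices):
        words = dict(zip(species, combo))
        score = parsimony_score(tree, words)
        if best is None or score < best[0]:
            best = (score, words)
    return best

tree = (("human", "chimp"), ("mouse", "rat"))
seqs = {"human": "AATGACTCAGG", "chimp": "CCTGACTCATT", "mouse": "GTGACTCAAC", "rat": "TTTGACTCAC"}
print(best_footprint(tree, seqs, 7))
```

Scoring against the tree, rather than counting raw mismatches, means a difference shared by two sister species costs one substitution instead of two, which is the advantage the abstract attributes to using the phylogeny.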
600
GuhaThakurta D, Palomar L, Stormo GD, Tedesco P, Johnson TE, Walker DW, Lithgow G, Kim S, Link CD. Identification of a novel cis-regulatory element involved in the heat shock response in Caenorhabditis elegans using microarray gene expression and computational methods. Genome Res 2002; 12:701-12. [PMID: 11997337 PMCID: PMC186591 DOI: 10.1101/gr.228902] [Citation(s) in RCA: 193] [Impact Index Per Article: 8.8] [Received: 12/18/2001] [Accepted: 03/15/2002] [Indexed: 11/24/2022]
Abstract
We report here the identification of a previously unknown transcription regulatory element for heat shock (HS) genes in Caenorhabditis elegans. We monitored the expression pattern of 11,917 genes from C. elegans to determine which genes were up-regulated on HS. Twenty-eight genes were observed to be consistently up-regulated in several different repetitions of the experiments. We analyzed the upstream regions of these genes using computational DNA pattern recognition methods. Two potential cis-regulatory motifs were identified in this way. One of these motifs (TTCTAGAA) was the DNA binding motif for the heat shock factor (HSF), whereas the other (GGGTGTC) was previously unreported in the literature. We determined the significance of these motifs for the HS genes using different statistical tests and parameters. Comparative sequence analysis of orthologous HS genes from C. elegans and Caenorhabditis briggsae indicated that the identified DNA regulatory motifs are conserved across related species. The role of the identified DNA sites in regulation of HS genes was tested by in vitro mutagenesis of a green fluorescent protein (GFP) reporter transgene driven by the C. elegans hsp-16-2 promoter. DNA sites corresponding to both motifs are shown to play a significant role in up-regulation of the hsp-16-2 gene on HS. This is one of the rare instances in which a novel regulatory element, identified using computational methods, is shown to be biologically active. The contributions of individual sites toward induction of transcription on HS are nonadditive, which indicates interaction and cross-talk between the sites, possibly through the transcription factors (TFs) binding to these sites.
Affiliation(s)
- Debraj GuhaThakurta
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63114, USA
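The abstract above reports testing whether the two motifs are significant for the HS genes "using different statistical tests". One common test of this kind, assumed here for illustration and not necessarily one the authors used, asks whether promoters of the induced genes contain a motif (on either strand) more often than promoters genome-wide, using a hypergeometric tail probability. The sequences below are hypothetical and far shorter than real upstream regions.

```python
from math import comb

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(s):
    """Reverse complement of an uppercase ACGT string."""
    return s.translate(COMPLEMENT)[::-1]

def has_motif(seq, motif):
    """True if the motif occurs on either strand."""
    return motif in seq or revcomp(motif) in seq

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)

def motif_enrichment(motif, target_seqs, all_seqs):
    """(hits among targets, enrichment p-value); target_seqs must be a subset of all_seqs."""
    N, n = len(all_seqs), len(target_seqs)
    K = sum(has_motif(s, motif) for s in all_seqs)
    k = sum(has_motif(s, motif) for s in target_seqs)
    return k, hypergeom_sf(k, N, K, n)

# Hypothetical promoters: five induced genes carry GGGTGTC (or GACACCC), fifteen others do not.
targets = ["ACGGGTGTCAT", "TTGACACCCGA", "GGGTGTCAAAA", "CCCCGACACCC", "AGGGTGTCTTT"]
background = ["ATATATATATA"] * 15
print(motif_enrichment("GGGTGTC", targets, targets + background))
```

TTCTAGAA is its own reverse complement, so the two-strand check matters only for asymmetric motifs such as GGGTGTC.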