1
Glenwinkel L, Taylor SR, Langebeck-Jensen K, Pereira L, Reilly MB, Basavaraju M, Rafi I, Yemini E, Pocock R, Sestan N, Hammarlund M, Miller DM, Hobert O. In silico analysis of the transcriptional regulatory logic of neuronal identity specification throughout the C. elegans nervous system. eLife 2021; 10:e64906. [PMID: 34165430 PMCID: PMC8225391 DOI: 10.7554/elife.64906] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/14/2020] [Accepted: 05/07/2021] [Indexed: 12/11/2022] Open
Abstract
The generation of the enormous diversity of neuronal cell types in a differentiating nervous system entails the activation of neuron type-specific gene batteries. To examine the regulatory logic that controls the expression of neuron type-specific gene batteries, we interrogate single cell expression profiles of all 118 neuron classes of the Caenorhabditis elegans nervous system for the presence of DNA binding motifs of 136 neuronally expressed C. elegans transcription factors. Using a phylogenetic footprinting pipeline, we identify cis-regulatory motif enrichments among neuron class-specific gene batteries and we identify cognate transcription factors for 117 of the 118 neuron classes. In addition to predicting novel regulators of neuronal identities, our nervous system-wide analysis at single cell resolution supports the hypothesis that many transcription factors directly co-regulate the cohort of effector genes that define a neuron type, thereby corroborating the concept of so-called terminal selectors of neuronal identity. Our analysis provides a blueprint for how individual components of an entire nervous system are genetically specified.
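The abstract does not say which statistic underlies the motif enrichment calls, so the following is only an illustrative sketch. It assumes a one-sided hypergeometric test for over-representation of a motif in a gene battery; the function name and all counts are hypothetical and are not taken from the paper.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, genes_with_motif, battery_size, battery_with_motif):
    """One-sided hypergeometric p-value (an assumed test, not the paper's stated method).

    Probability of seeing at least `battery_with_motif` motif-bearing genes when
    `battery_size` genes are drawn without replacement from `total_genes` genes,
    of which `genes_with_motif` carry the motif.
    """
    upper = min(genes_with_motif, battery_size)
    denominator = comb(total_genes, battery_size)
    numerator = sum(
        comb(genes_with_motif, k) * comb(total_genes - genes_with_motif, battery_size - k)
        for k in range(battery_with_motif, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts: 20000 genes, 1500 with the motif,
# a battery of 80 genes of which 20 carry the motif.
p_value = hypergeom_enrichment_p(20000, 1500, 80, 20)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value would suggest that the motif occurs in the battery more often than expected by chance; in practice such values would also need correction for the number of motifs and neuron classes tested.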
Affiliation(s)
- Lori Glenwinkel
  - Department of Biological Sciences, Columbia University, Howard Hughes Medical Institute, New York, United States
- Seth R Taylor
  - Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, United States
- Laura Pereira
  - Department of Biological Sciences, Columbia University, Howard Hughes Medical Institute, New York, United States
- Molly B Reilly
  - Department of Biological Sciences, Columbia University, Howard Hughes Medical Institute, New York, United States
- Manasa Basavaraju
  - Department of Neurobiology, Yale University School of Medicine, New Haven, United States
  - Department of Genetics, Yale University School of Medicine, New Haven, United States
- Ibnul Rafi
  - Department of Biological Sciences, Columbia University, Howard Hughes Medical Institute, New York, United States
- Eviatar Yemini
  - Department of Biological Sciences, Columbia University, Howard Hughes Medical Institute, New York, United States
- Roger Pocock
  - Biotech Research and Innovation Centre, University of Copenhagen, Copenhagen, Denmark
  - Development and Stem Cells Program, Monash Biomedicine Discovery Institute and Department of Anatomy and Developmental Biology, Monash University, Melbourne, Australia
- Nenad Sestan
  - Department of Neurobiology, Yale University School of Medicine, New Haven, United States
  - Department of Genetics, Yale University School of Medicine, New Haven, United States
- Marc Hammarlund
  - Department of Neurobiology, Yale University School of Medicine, New Haven, United States
  - Department of Genetics, Yale University School of Medicine, New Haven, United States
- David M Miller
  - Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, United States
- Oliver Hobert
  - Department of Biological Sciences, Columbia University, Howard Hughes Medical Institute, New York, United States
2
Glenwinkel L, Wu D, Minevich G, Hobert O. TargetOrtho: a phylogenetic footprinting tool to identify transcription factor targets. Genetics 2014; 197:61-76. [PMID: 24558259 PMCID: PMC4012501 DOI: 10.1534/genetics.113.160721] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/23/2014] [Accepted: 02/09/2014] [Indexed: 11/18/2022] Open
Abstract
The identification of the regulatory targets of transcription factors is central to our understanding of how transcription factors fulfill their many key roles in development and homeostasis. DNA-binding sites have been uncovered for many transcription factors through a number of experimental approaches, but it has proven difficult to use this binding site information to reliably predict transcription factor target genes in genomic sequence space. Using the nematode Caenorhabditis elegans and other related nematode species as a starting point, we describe here a bioinformatic pipeline that identifies potential transcription factor target genes from genomic sequences. Among the key features of this pipeline is the use of sequence conservation of transcription-factor-binding sites in related species. Rather than using aligned genomic DNA sequences from the genomes of multiple species as a starting point, TargetOrtho scans related genome sequences independently for matches to user-provided transcription-factor-binding motifs, assigns motif matches to adjacent genes, and then determines whether orthologous genes in different species also contain motif matches. We validate TargetOrtho by identifying previously characterized targets of three different types of transcription factors in C. elegans, and we use TargetOrtho to identify novel target genes of the Collier/Olf/EBF transcription factor UNC-3 in C. elegans ventral nerve cord motor neurons. We have also implemented the use of TargetOrtho in Drosophila melanogaster using conservation among five species in the D. melanogaster species subgroup for target gene discovery.
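The three steps named in the abstract (scan each genome independently, assign motif matches to adjacent genes, check whether orthologs also carry a match) can be sketched roughly as follows. This is a toy illustration and not TargetOrtho's actual code or interface: the motif is written as a regular expression on one strand only, the assignment of sequence to genes is assumed to be done already, and every name, sequence, and ortholog mapping below is hypothetical.

```python
import re


def genes_with_motif(upstream_seqs, pattern):
    """Return the genes whose (pre-assigned) upstream sequence contains a motif match."""
    return {gene for gene, seq in upstream_seqs.items() if pattern.search(seq)}


def conserved_targets(upstream_by_species, orthologs, reference, motif_regex):
    """For each reference gene with a motif match, count the other species
    in which its ortholog also has a match (each species is scanned independently)."""
    pattern = re.compile(motif_regex)
    hits = {
        species: genes_with_motif(seqs, pattern)
        for species, seqs in upstream_by_species.items()
    }
    support = {}
    for gene in sorted(hits[reference]):
        support[gene] = sum(
            1
            for species, ortholog in orthologs.get(gene, {}).items()
            if ortholog in hits.get(species, set())
        )
    return support


# Hypothetical toy data: upstream sequence per gene, per species.
UPSTREAM = {
    "species_a": {
        "gene1": "AAATCCCATGGGAAA",
        "gene2": "ACGTACGTACGT",
        "gene3": "GGTCCCAAGGGAGG",
    },
    "species_b": {
        "gene1_b": "TTTCCCTTGGGATT",
        "gene2_b": "ACGTTTTTACGT",
        "gene3_b": "GGGGGGGG",
    },
}
# Hypothetical ortholog table: reference gene -> {species: ortholog}.
ORTHOLOGS = {
    "gene1": {"species_b": "gene1_b"},
    "gene2": {"species_b": "gene2_b"},
    "gene3": {"species_b": "gene3_b"},
}
MOTIF = r"TCCC[ACGT]{2}GGGA"  # illustrative pattern only

result = conserved_targets(UPSTREAM, ORTHOLOGS, "species_a", MOTIF)
print(result)
```

Because each genome is scanned on its own, no genome alignment is needed; conservation is judged at the level of orthologous genes rather than aligned positions.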
Affiliation(s)
- Lori Glenwinkel
  - Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Columbia University Medical Center, New York, New York 10032
- Gregory Minevich
  - Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Columbia University Medical Center, New York, New York 10032
- Oliver Hobert
  - Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Columbia University Medical Center, New York, New York 10032
3
Hsieh YW, Chang C, Chuang CF. The microRNA mir-71 inhibits calcium signaling by targeting the TIR-1/Sarm1 adaptor protein to control stochastic L/R neuronal asymmetry in C. elegans. PLoS Genet 2012; 8:e1002864. [PMID: 22876200 PMCID: PMC3410857 DOI: 10.1371/journal.pgen.1002864] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 12/07/2011] [Accepted: 06/12/2012] [Indexed: 01/06/2023] Open
Abstract
The Caenorhabditis elegans left and right AWC olfactory neurons communicate to establish stochastic asymmetric identities, AWC(ON) and AWC(OFF), by inhibiting a calcium-mediated signaling pathway in the future AWC(ON) cell. NSY-4/claudin-like protein and NSY-5/innexin gap junction protein are the two parallel signals that antagonize the calcium signaling pathway to induce the AWC(ON) fate. However, it is not known how the calcium signaling pathway is downregulated by nsy-4 and nsy-5 in the AWC(ON) cell. Here we identify a microRNA, mir-71, that represses the TIR-1/Sarm1 adaptor protein in the calcium signaling pathway to promote the AWC(ON) identity. Similar to tir-1 loss-of-function mutants, overexpression of mir-71 generates two AWC(ON) neurons. tir-1 expression is downregulated through its 3' UTR in AWC(ON), in which mir-71 is expressed at a higher level than in AWC(OFF). In addition, mir-71 is sufficient to inhibit tir-1 expression in AWC through the mir-71 complementary site in the tir-1 3' UTR. Our genetic studies suggest that mir-71 acts downstream of nsy-4 and nsy-5 to promote the AWC(ON) identity in a cell autonomous manner. Furthermore, the stability of mature mir-71 is dependent on nsy-4 and nsy-5. Together, these results provide insight into the mechanism by which nsy-4 and nsy-5 inhibit calcium signaling to establish stochastic asymmetric AWC differentiation.
Affiliation(s)
- Yi-Wen Hsieh
  - Division of Developmental Biology, Cincinnati Children's Hospital Medical Center Research Foundation, Cincinnati, Ohio, United States of America
- Chieh Chang
  - Division of Developmental Biology, Cincinnati Children's Hospital Medical Center Research Foundation, Cincinnati, Ohio, United States of America
  - * E-mail: (CC); (C-FC)
- Chiou-Fen Chuang
  - Division of Developmental Biology, Cincinnati Children's Hospital Medical Center Research Foundation, Cincinnati, Ohio, United States of America
  - * E-mail: (CC); (C-FC)
4
Burghoorn J, Piasecki BP, Crona F, Phirke P, Jeppsson KE, Swoboda P. The in vivo dissection of direct RFX-target gene promoters in C. elegans reveals a novel cis-regulatory element, the C-box. Dev Biol 2012; 368:415-26. [PMID: 22683808 DOI: 10.1016/j.ydbio.2012.05.033] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 12/31/2011] [Revised: 04/23/2012] [Accepted: 05/25/2012] [Indexed: 11/26/2022]
Abstract
At the core of the primary transcriptional network regulating ciliary gene expression in Caenorhabditis elegans sensory neurons is the RFX/DAF-19 transcription factor, which binds and thereby positively regulates 13-15 bp X-box promoter motifs found in the cis-regulatory regions of many ciliary genes. However, the variable expression of direct RFX-target genes in various sets of ciliated sensory neurons (CSNs) occurs through as of yet uncharacterized mechanisms. In this study the cis-regulatory regions of 41 direct RFX-target genes are compared using in vivo genetic analyses and computational comparisons of orthologous nematode sequences. We find that neither the proximity to the translational start site nor the exact sequence composition of the X-box promoter motif of the respective ciliary gene can explain the variation in expression patterns observed among different direct RFX-target genes. Instead, a novel enhancer element appears to co-regulate ciliary genes in a DAF-19 dependent manner. This cytosine- and thymidine-rich sequence, the C-box, was found in the cis-regulatory regions in close proximity to the respective X-box motif for 84% of the most broadly expressed direct RFX-target genes sampled in this study. Molecular characterization confirmed that these 8-11 bp C-box sequences act as strong enhancer elements for direct RFX-target genes. An artificial promoter containing only an X-box promoter motif and two of the C-box enhancer elements was able to drive strong expression of a GFP reporter construct in many C. elegans CSNs. These data provide a much-improved understanding of how direct RFX-target genes are differentially regulated in C. elegans and will provide a molecular model for uncovering the transcriptional network mediating ciliary gene expression in animals.
Affiliation(s)
- Jan Burghoorn
  - Karolinska Institute, Center for Biosciences at NOVUM, Department of Biosciences and Nutrition, Hälsovägen 7, S-141 83 Huddinge, Sweden
5
Coordinated regulation of cholinergic motor neuron traits through a conserved terminal selector gene. Nat Neurosci 2011; 15:205-14. [PMID: 22119902 PMCID: PMC3267877 DOI: 10.1038/nn.2989] [Citation(s) in RCA: 119] [Impact Index Per Article: 9.2] [Received: 08/11/2011] [Accepted: 10/28/2011] [Indexed: 11/08/2022]
Abstract
Cholinergic motor neurons are defined by the coexpression of a battery of genes encoding proteins that act sequentially to synthesize, package and degrade acetylcholine and reuptake its breakdown product, choline. How expression of these critical motor neuron identity determinants is controlled and coordinated is not understood. We show here that, in the nematode Caenorhabditis elegans, all members of the cholinergic gene battery, as well as many other markers of terminal motor neuron fate, are co-regulated by a shared cis-regulatory signature and a common trans-acting factor, the phylogenetically conserved COE (Collier, Olf, EBF)-type transcription factor UNC-3. UNC-3 initiated and maintained expression of cholinergic fate markers and was sufficient to induce cholinergic fate in other neuron types. UNC-3 furthermore operated in negative feedforward loops to induce the expression of transcription factors that repress individual UNC-3-induced terminal fate markers, resulting in diversification of motor neuron differentiation programs in specific motor neuron subtypes. A chordate ortholog of UNC-3, Ciona intestinalis COE, was also both required and sufficient for inducing a cholinergic fate. Thus, UNC-3 is a terminal selector for cholinergic motor neuron differentiation whose function is conserved across phylogeny.
6
Analysis of multiple ethyl methanesulfonate-mutagenized Caenorhabditis elegans strains by whole-genome sequencing. Genetics 2010; 185:417-30. [PMID: 20439776 DOI: 10.1534/genetics.110.116319] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.5] [Indexed: 12/22/2022] Open
Abstract
Whole-genome sequencing (WGS) of organisms displaying a specific mutant phenotype is a powerful approach to identify the genetic determinants of a plethora of biological processes. We have previously validated the feasibility of this approach by identifying a point-mutated locus responsible for a specific phenotype, observed in an ethyl methanesulfonate (EMS)-mutagenized Caenorhabditis elegans strain. Here we describe the genome-wide mutational profile of 17 EMS-mutagenized genomes as assessed with a bioinformatic pipeline, called MAQGene. Surprisingly, we find that while outcrossing mutagenized strains does reduce the total number of mutations, a striking mutational load is still observed even in outcrossed strains. Such genetic complexity has to be taken into account when establishing a causative relationship between genotype and phenotype. Even though unintentional, the 17 sequenced strains described here provide a resource of allelic variants in almost 1000 genes, including 62 premature stop codons, which represent candidate knockout alleles that will be of further use for the C. elegans community to study gene function.
7
Meireles-Filho ACA, Stark A. Comparative genomics of gene regulation-conservation and divergence of cis-regulatory information. Curr Opin Genet Dev 2009; 19:565-70. [PMID: 19913403 DOI: 10.1016/j.gde.2009.10.006] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.5] [Received: 08/24/2009] [Revised: 10/06/2009] [Accepted: 10/06/2009] [Indexed: 01/13/2023]
Abstract
We recently witnessed a tremendous increase in genomics studies on gene regulation and in entirely sequenced genomes from closely related species. This has triggered analyses that suggest a wide range of evolutionary dynamics of gene regulation, from rapid turnover of transcription-factor binding sites to conservation of enhancer function across large evolutionary distances. Many examples show that enhancers can evolve beyond recognizable sequence similarity while retaining function. However, bioinformatics approaches are increasingly able to detect conserved regulatory elements through characteristic evolutionary sequence signatures. Cis-regulatory changes are also a major source of morphological evolution, which might be facilitated by many biochemically functional elements that are selectively neutral and by the buffering function of redundant enhancers and 'shadow' enhancers.
8
Wang S, Yang S, Yin Y, Guo X, Wang S, Hao D. An in silico strategy identified the target gene candidates regulated by dehydration responsive element binding proteins (DREBs) in Arabidopsis genome. Plant Mol Biol 2009; 69:167-78. [PMID: 18931920 DOI: 10.1007/s11103-008-9414-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 03/20/2008] [Accepted: 10/01/2008] [Indexed: 05/23/2023]
Abstract
Identification of downstream target genes of stress-relating transcription factors (TFs) is desirable in understanding cellular responses to various environmental stimuli. However, this has long been a difficult work for both experimental and computational practices. In this research, we presented a novel computational strategy which combined the analysis of the transcription factor binding site (TFBS) contexts and machine learning approach. Using this strategy, we conducted a genome-wide investigation into novel direct target genes of dehydration responsive element binding proteins (DREBs), the members of AP2-EREBPs transcription factor super family which is reported to be responsive to various abiotic stresses in Arabidopsis. The genome-wide searching yielded in total 474 target gene candidates. With reference to the microarray data for abiotic stresses-inducible gene expression profile, 268 target gene candidates out of the total 474 genes predicted, were induced during the 24-h exposure to abiotic stresses. This takes about 57% of total predicted targets. Furthermore, GO annotations revealed that these target genes are likely involved in protein amino acid phosphorylation, protein binding and Endomembrane sorting system. The results suggested that the predicted target gene candidates were adequate to meet the essential biological principle of stress-resistance in plants.
Affiliation(s)
- Shichen Wang
  - College of Animal Science and Veterinary Medicine, Jilin University, Changchun 130062, People's Republic of China
9
Kuntz SG, Schwarz EM, DeModena JA, De Buysscher T, Trout D, Shizuya H, Sternberg PW, Wold BJ. Multigenome DNA sequence conservation identifies Hox cis-regulatory elements. Genome Res 2008; 18:1955-68. [PMID: 18981268 DOI: 10.1101/gr.085472.108] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 01/31/2023]
Abstract
To learn how well ungapped sequence comparisons of multiple species can predict cis-regulatory elements in Caenorhabditis elegans, we made such predictions across the large, complex ceh-13/lin-39 locus and tested them transgenically. We also examined how prediction quality varied with different genomes and parameters in our comparisons. Specifically, we sequenced approximately 0.5% of the C. brenneri and C. sp. 3 PS1010 genomes, and compared five Caenorhabditis genomes (C. elegans, C. briggsae, C. brenneri, C. remanei, and C. sp. 3 PS1010) to find regulatory elements in 22.8 kb of noncoding sequence from the ceh-13/lin-39 Hox subcluster. We developed the MUSSA program to find ungapped DNA sequences with N-way transitive conservation, applied it to the ceh-13/lin-39 locus, and transgenically assayed 21 regions with both high and low degrees of conservation. This identified 10 functional regulatory elements whose activities matched known ceh-13/lin-39 expression, with 100% specificity and a 77% recovery rate. One element was so well conserved that a similar mouse Hox cluster sequence recapitulated the native nematode expression pattern when tested in worms. Our findings suggest that ungapped sequence comparisons can predict regulatory elements genome-wide.
Affiliation(s)
- Steven G Kuntz
  - Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
10
Dieterich C, Sommer RJ. A Caenorhabditis motif compendium for studying transcriptional gene regulation. BMC Genomics 2008; 9:30. [PMID: 18215260 PMCID: PMC2248174 DOI: 10.1186/1471-2164-9-30] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/06/2007] [Accepted: 01/23/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Controlling gene expression is fundamental to biological complexity. The nematode Caenorhabditis elegans is an important model for studying principles of gene regulation in multi-cellular organisms. A comprehensive parts list of putative regulatory motifs was yet missing for this model system. In this study, we compile a set of putative regulatory motifs by combining evidence from conservation and expression data.
DESCRIPTION: We present an unbiased comparative approach to a regulatory motif compendium for Caenorhabditis species. This involves the assembly of a new nematode genome, whole genome alignments and assessment of conserved k-mers counts. Candidate motifs are selected from a set of 9,500 randomly picked genes by three different motif discovery strategies. Motif candidates have to pass a conservation enrichment filter. Motif degeneracy and length are optimized. Retained motif descriptions are evaluated by expression data using a non-parametric test, which assesses expression changes due to the presence/absence of individual motifs. Finally, we also provide condition-specific motif ensembles by conditional tree analysis.
CONCLUSION: The nematode genomes align surprisingly well despite high neutral substitution rates. Our pipeline delivers motif sets by three alternative strategies. Each set contains less than 400 motifs, which are significantly conserved and correlated with 214 out of 270 tested gene expression conditions. This motif compendium is an entry point to comprehensive studies on nematode gene regulation. The website: http://corg.eb.tuebingen.mpg.de/CMC has extensive query capabilities, supplements this article and supports the experimental list.
Affiliation(s)
- Christoph Dieterich
  - Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, Spemannstrasse 35 - 37, Tübingen, Germany.
11
Yang E, Simcha D, Almon RR, Dubois DC, Jusko WJ, Androulakis IP. Context specific transcription factor prediction. Ann Biomed Eng 2007; 35:1053-67. [PMID: 17377845 PMCID: PMC4184431 DOI: 10.1007/s10439-007-9268-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 09/14/2006] [Accepted: 01/25/2007] [Indexed: 01/13/2023]
Abstract
One of the goals of systems biology is the identification of regulatory mechanisms that govern an organism's response to external stimuli. Transcription factors have been hypothesized as a major contributor to an organism's response to various outside stimuli, and a great deal of work has been done to predict the set of transcription factors which regulate a given gene. Most of the current methods seek to identify possible binding sites from genomic sequence. Initial attempts at predicting transcription factors from genomic sequences suffered from the problem of false positives. Making the problem more difficult, it has also been shown that while predicted binding sites might be false positives, they can be shown to bind to their corresponding sequences in vitro. One method for rectifying this is through the use of phylogenetic analysis in which only regions which show high evolutionary conservation are analyzed. However such an approach may be too stringent because of the level of degeneracy shown in transcription factor binding site position weight matrices. Due to the degeneracy, there may be only a few bases that need to be conserved across species. Therefore, while a sequence may not show a high level of evolutionary conservation, these sequences may still show high affinity for the same transcription factor. In predicting transcription factor binding we explore the notion that "Co-expression implies co-regulation" [Allocco et al. BMC Bioinformatics 5:18, 2004]. With multiple genes requiring similar transcription factors binding sites, there exists a basis for eliminating false positives. This method allows for the selection of transcription factors binding sites that are active under a given experimental paradigm, thereby allowing us to indirectly incorporate the effects of chromosome and recognition site presentation upon transcription factor binding prediction. 
Rather than having to rationalize that a few transcription factors binding sites are over-represented in a cluster of genes, one can show that a few transcription factors are active in the cluster of genes that have been grouped together. Although the method focuses on predicting experiment-specific transcription factor binding sites, it is possible that if such a methodology were used in an iterative process where different experiments were analyzed, one could obtain a comprehensive set of transcription factors binding sites which regulate the various dynamic responses shown by biological systems under a variety of conditions hence building a more comprehensive model of transcriptional regulation.
Affiliation(s)
- Eric Yang
  - Biomedical Engineering Department, Rutgers University, 617 Bowser Road, Piscataway, NJ, 08854, USA
- David Simcha
  - Biomedical Engineering Department, Rutgers University, 617 Bowser Road, Piscataway, NJ, 08854, USA
- Richard R. Almon
  - Department of Biological Sciences, State University of New York at Buffalo, Buffalo, NY, 14260, USA
  - Department of Pharmaceutical Sciences, State University of New York at Buffalo, Buffalo, NY, 14260, USA
- Debra C. Dubois
  - Department of Biological Sciences, State University of New York at Buffalo, Buffalo, NY, 14260, USA
  - Department of Pharmaceutical Sciences, State University of New York at Buffalo, Buffalo, NY, 14260, USA
- William J. Jusko
  - Department of Pharmaceutical Sciences, State University of New York at Buffalo, Buffalo, NY, 14260, USA
- Ioannis P. Androulakis
  - Biomedical Engineering Department, Rutgers University, 617 Bowser Road, Piscataway, NJ, 08854, USA
  - Chemical and Biochemical Engineering Department, Rutgers University, Piscataway, NJ, 08854, USA
12
Müller F, Borycki AG. Sequence analyses to study the evolutionary history and cis-regulatory elements of Hedgehog genes. Methods Mol Biol 2007; 397:231-250. [PMID: 18025724 DOI: 10.1007/978-1-59745-516-9_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
Sequence analysis and comparative genomics are powerful tools to gain knowledge on multiple aspects of gene and protein regulation and function. These have been widely used to understand the evolutionary history and the biochemistry of Hedgehog (Hh) proteins, and the molecular control of Hedgehog gene expression. Here, we report on some of the methods available to retrieve protein and genomic sequences. We describe how protein sequence comparison can produce information on the evolutionary history of Hh proteins. Moreover, we describe the use of genomic sequence analysis including phylogenetic footprinting and transcription factor-binding site search tools, techniques that allow for the characterization of cis-regulatory elements of developmental genes such as the Hedgehog genes.
13
Jegga AG, Chen J, Gowrisankar S, Deshmukh MA, Gudivada R, Kong S, Kaimal V, Aronow BJ. GenomeTrafac: a whole genome resource for the detection of transcription factor binding site clusters associated with conventional and microRNA encoding genes conserved between mouse and human gene orthologs. Nucleic Acids Res 2006; 35:D116-21. [PMID: 17178752 PMCID: PMC1781107 DOI: 10.1093/nar/gkl1011] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022] Open
Abstract
Transcriptional cis-regulatory control regions frequently are found within non-coding DNA segments conserved across multi-species gene orthologs. Adopting a systematic gene-centric pipeline approach, we report here the development of a web-accessible database resource--GenomeTraFac (http://genometrafac.cchmc.org)--that allows genome-wide detection and characterization of compositionally similar cis-clusters that occur in gene orthologs between any two genomes for both microRNA genes as well as conventional RNA-encoding genes. Each ortholog gene pair can be scanned to visualize overall conserved sequence regions, and within these, the relative density of conserved cis-element motif clusters form graph peak structures. The results of these analyses can be mined en masse to identify most frequently represented cis-motifs in a list of genes. The system also provides a method for rapid evaluation and visualization of gene model-consistency between orthologs, and facilitates consideration of the potential impact of sequence variation in conserved non-coding regions to impact complex cis-element structures. Using the mouse and human genomes via the NCBI Reference Sequence database and the Sanger Institute miRBase, the system demonstrated the ability to identify validated transcription factor targets within promoter and distal genomic regulatory regions of both conventional and microRNA genes.
Affiliation(s)
- Anil G. Jegga
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
  - Department of Pediatrics, College of Medicine, Cincinnati, OH 45229, USA
- Jing Chen
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
  - Department of Biomedical Engineering, University of Cincinnati, Cincinnati, OH 45229, USA
- Sivakumar Gowrisankar
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
  - Department of Biomedical Engineering, University of Cincinnati, Cincinnati, OH 45229, USA
- Mrunal A. Deshmukh
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- RangaChandra Gudivada
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
  - Department of Biomedical Engineering, University of Cincinnati, Cincinnati, OH 45229, USA
- Sue Kong
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Vivek Kaimal
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
  - Department of Biomedical Engineering, University of Cincinnati, Cincinnati, OH 45229, USA
- Bruce J. Aronow
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
  - Department of Pediatrics, College of Medicine, Cincinnati, OH 45229, USA
  - Department of Biomedical Engineering, University of Cincinnati, Cincinnati, OH 45229, USA
  - To whom correspondence should be addressed at Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue–MLC 7024, Cincinnati, OH 45229-3039, USA. Tel: +1 513 636 4865; Fax: +1 513 636 2056;
Collapse
|
14
|
McGhee JD, Sleumer MC, Bilenky M, Wong K, McKay SJ, Goszczynski B, Tian H, Krich ND, Khattra J, Holt RA, Baillie DL, Kohara Y, Marra MA, Jones SJM, Moerman DG, Robertson AG. The ELT-2 GATA-factor and the global regulation of transcription in the C. elegans intestine. Dev Biol 2006; 302:627-45. [PMID: 17113066 DOI: 10.1016/j.ydbio.2006.10.024] [Citation(s) in RCA: 135] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2006] [Revised: 10/08/2006] [Accepted: 10/14/2006] [Indexed: 12/18/2022]
Abstract
A SAGE library was prepared from hand-dissected intestines from adult Caenorhabditis elegans, allowing the identification of >4000 intestinally-expressed genes; this gene inventory provides fundamental information for understanding intestine function, structure and development. Intestinally-expressed genes fall into two broad classes: widely-expressed "housekeeping" genes and genes that are either intestine-specific or significantly intestine-enriched. Within this latter class of genes, we identified a subset of highly-expressed highly-validated genes that are expressed either exclusively or primarily in the intestine. Over half of the encoded proteins are candidates for secretion into the intestinal lumen to hydrolyze the bacterial food (e.g. lysozymes, amoebapores, lipases and especially proteases). The promoters of this subset of intestine-specific/intestine-enriched genes were analyzed computationally, using both a word-counting method (RSAT oligo-analysis) and a method based on Gibbs sampling (MotifSampler). Both methods returned the same over-represented site, namely an extended GATA-related sequence of the general form AHTGATAARR, which agrees with experimentally determined cis-acting control sequences found in intestine genes over the past 20 years. All promoters in the subset contain such a site, compared to <5% for control promoters; moreover, our analysis suggests that the majority (perhaps all) of genes expressed exclusively or primarily in the worm intestine are likely to contain such a site in their promoters. There are three zinc-finger GATA-type factors that are candidates to bind this extended GATA site in the differentiating C. elegans intestine: ELT-2, ELT-4 and ELT-7. All evidence points to ELT-2 being the most important of the three. We show that worms in which both the elt-4 and the elt-7 genes have been deleted from the genome are essentially wildtype, demonstrating that ELT-2 provides all essential GATA-factor functions in the intestine. 
The SAGE analysis also identifies more than a hundred other transcription factors in the adult intestine but few show an RNAi-induced loss-of-function phenotype and none (other than ELT-2) show a phenotype primarily in the intestine. We thus propose a simple model in which the ELT-2 GATA factor directly participates in the transcription of all intestine-specific/intestine-enriched genes, from the early embryo through to the dying adult. Other intestinal transcription factors would thus modulate the action of ELT-2, depending on the worm's nutritional and physiological needs.
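The degenerate consensus AHTGATAARR quoted in this abstract is written in IUPAC nucleotide codes (H = A, C or T; R = A or G). As a minimal sketch, and not the RSAT or MotifSampler procedure the authors used, the following shows how one might test promoters for such a site on either strand and compute the fraction that contain it. The function names and the toy promoter sequences in the usage note are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


def has_site(promoter, pattern):
    """True if the pattern matches the promoter on either strand."""
    seq = promoter.upper()
    return bool(pattern.search(seq) or pattern.search(reverse_complement(seq)))


def fraction_with_site(promoters, consensus="AHTGATAARR"):
    """Fraction of promoters containing at least one consensus match."""
    pattern = motif_regex(consensus)
    hits = sum(has_site(p, pattern) for p in promoters)
    return hits / len(promoters)
```

Calling `fraction_with_site` on a set of intestine-enriched promoters and again on a control set would give the kind of comparison the abstract reports (all promoters in the subset versus fewer than 5% of controls), although exact string matching of a consensus is cruder than the statistical motif models used in the study.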
Affiliation(s)
- James D McGhee
- Department of Biochemistry and Molecular Biology, University of Calgary, 3330 Hospital Drive N.W., Calgary, Alberta, Canada T2N 4N1.
|
15
|
Perco P, Rapberger R, Siehs C, Lukas A, Oberbauer R, Mayer G, Mayer B. Transforming omics data into context: Bioinformatics on genomics and proteomics raw data. Electrophoresis 2006; 27:2659-75. [PMID: 16739231 DOI: 10.1002/elps.200600064] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Differential gene expression analysis and proteomics have exerted significant impact on the elucidation of concerted cellular processes, as simultaneous measurement of hundreds to thousands of individual objects on the level of RNA and protein ensembles became technically feasible. The availability of such data sets has promised a profound understanding of phenomena on an aggregate level, expressed as the phenotypic response (observables) of cells, e.g., in the presence of drugs, or characterization of cells and tissue displaying distinct patho-physiological states. However, the step of transforming these data into context, i.e., linking distinct expression or abundance patterns with phenotypic observables - and furthermore enabling a sound biological interpretation on the level of reaction networks and concerted pathways, is still a major shortcoming. This finding is certainly based on the enormous complexity embedded in cellular reaction networks, but a variety of computational approaches have been developed over the last few years to overcome these issues. This review provides an overview on computational procedures for analysis of genomic and proteomic data introducing a sequential analysis workflow: Explorative statistics for deriving a first, from the purely statistical viewpoint, relevant candidate gene/protein list, followed by co-regulation and network analysis to biologically expand this core list toward functional networks and pathways. The review on these procedures is complemented by example applications tailored at identification of disease-associated proteins. Optimization of computational procedures involved, in conjunction with the continuous increase in additional biological data, clearly has the potential of boosting our understanding of processes on a cell-wide level.
Affiliation(s)
- Paul Perco
- Department of Nephrology, Medical University of Vienna, Austria
|
16
|
Ortiz CO, Etchberger JF, Posy SL, Frøkjaer-Jensen C, Lockery S, Honig B, Hobert O. Searching for neuronal left/right asymmetry: genomewide analysis of nematode receptor-type guanylyl cyclases. Genetics 2006; 173:131-49. [PMID: 16547101 PMCID: PMC1461427 DOI: 10.1534/genetics.106.055749] [Citation(s) in RCA: 98] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2006] [Accepted: 03/03/2006] [Indexed: 11/18/2022] Open
Abstract
Functional left/right asymmetry ("laterality") is a fundamental feature of many nervous systems, but only very few molecular correlates to functional laterality are known. At least two classes of chemosensory neurons in the nematode Caenorhabditis elegans are functionally lateralized. The gustatory neurons ASE left (ASEL) and ASE right (ASER) are two bilaterally symmetric neurons that sense distinct chemosensory cues and express a distinct set of four known chemoreceptors of the guanylyl cyclase (gcy) gene family. To examine the extent of lateralization of gcy gene expression patterns in the ASE neurons, we have undertaken a genomewide analysis of all gcy genes. We report the existence of a total of 27 gcy genes encoding receptor-type guanylyl cyclases and of 7 gcy genes encoding soluble guanylyl cyclases in the complete genome sequence of C. elegans. We describe the expression pattern of all previously uncharacterized receptor-type guanylyl cyclases and find them to be highly biased but not exclusively restricted to the nervous system. We find that >41% (11/27) of all receptor-type guanylyl cyclases are expressed in the ASE gustatory neurons and that one-third of all gcy genes (9/27) are expressed in a lateral, left/right asymmetric manner in the ASE neurons. The expression of all laterally expressed gcy genes is under the control of a gene regulatory network composed of several transcription factors and miRNAs. The complement of gcy genes in the related nematode C. briggsae differs from C. elegans as evidenced by differences in chromosomal localization, number of gcy genes, and expression patterns. Differences in gcy expression patterns in the ASE neurons of C. briggsae arise from a difference in cis-regulatory elements and trans-acting factors that control ASE laterality. 
In sum, our results indicate the existence of a surprising multitude of putative chemoreceptors in the gustatory ASE neurons and suggest the existence of a substantial degree of laterality in gustatory signaling mechanisms in nematodes.
Affiliation(s)
- Christopher O Ortiz
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University Medical Center, New York, New York 10032, USA
|
17
|
Hristova M, Birse D, Hong Y, Ambros V. The Caenorhabditis elegans heterochronic regulator LIN-14 is a novel transcription factor that controls the developmental timing of transcription from the insulin/insulin-like growth factor gene ins-33 by direct DNA binding. Mol Cell Biol 2005; 25:11059-72. [PMID: 16314527 PMCID: PMC1316966 DOI: 10.1128/mcb.25.24.11059-11072.2005] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open
Abstract
A temporal gradient of the novel nuclear protein LIN-14 specifies the timing and sequence of stage-specific developmental events in Caenorhabditis elegans. The profound effects of lin-14 mutations on worm development suggest that LIN-14 directly or indirectly regulates stage-specific gene expression. We show that LIN-14 can associate with chromatin in vivo and has in vitro DNA binding activity. A bacterially expressed C-terminal domain of LIN-14 was used to select DNA sequences that contain a putative consensus binding site from a pool of randomized double-stranded oligonucleotides. To identify candidates for genes directly regulated by lin-14, we employed DNA microarray hybridization to compare the mRNA abundance of C. elegans genes in wild-type animals to that in mutants with reduced or elevated lin-14 activity. Five of the candidate LIN-14 target genes identified by microarrays, including the insulin/insulin-like growth factor family gene ins-33, contain putative LIN-14 consensus sites in their upstream DNA sequences. Genetic analysis indicates that the developmental regulation of ins-33 mRNA involves the stage-specific repression of ins-33 transcription by LIN-14 via sequence-specific DNA binding. These results reinforce the conclusion that lin-14 encodes a novel class of transcription factor.
Affiliation(s)
- Marta Hristova
- Dartmouth Medical School, Department of Genetics, Hanover, NH 03755, USA
|
18
|
Junion G, Jagla T, Duplant S, Tapin R, Da Ponte JP, Jagla K. Mapping Dmef2-binding regulatory modules by using a ChIP-enriched in silico targets approach. Proc Natl Acad Sci U S A 2005; 102:18479-84. [PMID: 16339902 PMCID: PMC1317932 DOI: 10.1073/pnas.0507030102] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2005] [Accepted: 10/27/2005] [Indexed: 11/18/2022] Open
Abstract
Mapping the regulatory modules to which transcription factors bind in vivo is a key step toward understanding of global gene expression programs. We have developed a chromatin immunoprecipitation (ChIP)-chip strategy for identifying factor-specific regulatory regions acting in vivo. This method, called the ChIP-enriched in silico targets (ChEST) approach, combines immunoprecipitation of cross-linked protein-DNA complexes (X-ChIP) with in silico prediction of targets and generation of computed DNA microarrays. We report the use of ChEST in Drosophila to identify several previously unknown targets of myocyte enhancer factor 2 (MEF2), a key regulator of myogenic differentiation. Our approach was validated by demonstrating that the identified sequences act as enhancers in vivo and are able to drive reporter gene expression specifically in MEF2-positive muscle cells. Presented here, the ChEST strategy was originally designed to identify regulatory modules in Drosophila, but it can be adapted for any sequenced and annotated genome.
Affiliation(s)
- Guillaume Junion
- Institut National de la Santé et de la Recherche Médicale Unité 384, Faculté de Médecine, 28 Place Henri Dunant, 63000 Clermont-Ferrand, France
|
19
|
Rambaldi D, Guffanti A, Morandi P, Cassata G. NemaFootPrinter: a web based software for the identification of conserved non-coding genome sequence regions between C. elegans and C. briggsae. BMC Bioinformatics 2005; 6 Suppl 4:S22. [PMID: 16351749 PMCID: PMC1866385 DOI: 10.1186/1471-2105-6-s4-s22] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Background: NemaFootPrinter (Nematode Transcription Factor Scan Through Philogenetic Footprinting) is a web-based software for interactive identification of conserved, non-exonic DNA segments in the genomes of C. elegans and C. briggsae. It has been implemented according to the following project specifications: a) Automated identification of orthologous gene pairs. b) Interactive selection of the boundaries of the genes to be compared. c) Pairwise sequence comparison with a range of different methods. d) Identification of putative transcription factor binding sites on conserved, non-exonic DNA segments. Results: Starting from a C. elegans or C. briggsae gene name or identifier, the software identifies the putative ortholog (if any), based on information derived from public nematode genome annotation databases. The investigator can then retrieve the genome DNA sequences of the two orthologous genes; visualize graphically the genes' intron/exon structure and the surrounding DNA regions; select, through an interactive graphical user interface, subsequences of the two gene regions. Using a bioinformatics toolbox (Blast2seq, Dotmatcher, Ssearch and connection to the rVista database) the investigator is able at the end of the procedure to identify and analyze significant sequences similarities, detecting the presence of transcription factor binding sites corresponding to the conserved segments. The software automatically masks exons. Discussion: This software is intended as a practical and intuitive tool for the researchers interested in the identification of non-exonic conserved sequence segments between C. elegans and C. briggsae. These sequences may contain regulatory transcriptional elements since they are conserved between two related, but rapidly evolving genomes. 
This software also highlights the power of genome annotation databases when they are conceived as an open resource and the possibilities offered by seamless integration of different web services via the http protocol. Availability: the program is freely available at
Affiliation(s)
- Davide Rambaldi
- IFOM-FIRC Institute of Molecular Oncology Foundation, Milan, Italy.
|
20
|
Marinescu VD, Kohane IS, Riva A. The MAPPER database: a multi-genome catalog of putative transcription factor binding sites. Nucleic Acids Res 2005; 33:D91-7. [PMID: 15608292 PMCID: PMC540057 DOI: 10.1093/nar/gki103] [Citation(s) in RCA: 105] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open
Abstract
We describe a comprehensive map of putative transcription factor binding sites (TFBSs) across multiple genomes created using a search method that relies on hidden Markov models built from experimentally determined TFBSs. Using the information in the TRANSFAC and JASPAR databases, we built 1134 models for TFBSs and used them to scan regions 10 kb upstream of the start of the transcript for all known genes in the human, mouse and Drosophila melanogaster genomes. The results, together with homology information on clusters of ortholog genes across the three genomes, were used to create a multi-organism catalog of annotated TFBSs. The catalog can be queried through a web interface accessible at http://bio.chip.org/mapper that allows the identification, visualization and selection of TFBSs occurring in the promoter of a gene of interest and also the common factors predicted to bind across the cluster of orthologs that includes that gene. Alternatively, the interface allows the user to retrieve binding sites for a single transcription factor of interest in a single gene or in all genes of the human, mouse or fruit fly genomes.
Affiliation(s)
- Voichita D Marinescu
- Children's Hospital Informatics Program, Children's Hospital Boston, Harvard Medical School, Enders Research Building EN-150.9, 300 Longwood Avenue, Boston, MA 02115, USA
|
21
|
Chen N, Harris TW, Antoshechkin I, Bastiani C, Bieri T, Blasiar D, Bradnam K, Canaran P, Chan J, Chen CK, Chen WJ, Cunningham F, Davis P, Kenny E, Kishore R, Lawson D, Lee R, Muller HM, Nakamura C, Pai S, Ozersky P, Petcherski A, Rogers A, Sabo A, Schwarz EM, Van Auken K, Wang Q, Durbin R, Spieth J, Sternberg PW, Stein LD. WormBase: a comprehensive data resource for Caenorhabditis biology and genomics. Nucleic Acids Res 2005; 33:D383-9. [PMID: 15608221 PMCID: PMC540020 DOI: 10.1093/nar/gki066] [Citation(s) in RCA: 126] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
WormBase (http://www.wormbase.org), the model organism database for information about Caenorhabditis elegans and related nematodes, continues to expand in breadth and depth. Over the past year, WormBase has added multiple large-scale datasets including SAGE, interactome, 3D protein structure datasets and NCBI KOGs. To accommodate this growth, the International WormBase Consortium has improved the user interface by adding new features to aid in navigation, visualization of large-scale datasets, advanced searching and data mining. Internally, we have restructured the database models to rationalize the representation of genes and to prepare the system to accept the genome sequences of three additional Caenorhabditis species over the coming year.
Affiliation(s)
- Nansheng Chen
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA.
|
22
|
Marinescu VD, Kohane IS, Riva A. MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes. BMC Bioinformatics 2005; 6:79. [PMID: 15799782 PMCID: PMC1131891 DOI: 10.1186/1471-2105-6-79] [Citation(s) in RCA: 171] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2004] [Accepted: 03/30/2005] [Indexed: 12/19/2022] Open
Abstract
Background: Cis-regulatory modules are combinations of regulatory elements occurring in close proximity to each other that control the spatial and temporal expression of genes. The ability to identify them in a genome-wide manner depends on the availability of accurate models and of search methods able to detect putative regulatory elements with enhanced sensitivity and specificity. Results: We describe the implementation of a search method for putative transcription factor binding sites (TFBSs) based on hidden Markov models built from alignments of known sites. We built 1,079 models of TFBSs using experimentally determined sequence alignments of sites provided by the TRANSFAC and JASPAR databases and used them to scan sequences of the human, mouse, fly, worm and yeast genomes. In several cases tested the method identified correctly experimentally characterized sites, with better specificity and sensitivity than other similar computational methods. Moreover, a large-scale comparison using synthetic data showed that in the majority of cases our method performed significantly better than a nucleotide weight matrix-based method. Conclusion: The search engine, available at , allows the identification, visualization and selection of putative TFBSs occurring in the promoter or other regions of a gene from the human, mouse, fly, worm and yeast genomes. In addition it allows the user to upload a sequence to query and to build a model by supplying a multiple sequence alignment of binding sites for a transcription factor of interest. Due to its extensive database of models, powerful search engine and flexible interface, MAPPER represents an effective resource for the large-scale computational analysis of transcriptional regulation.
Affiliation(s)
- Voichita D Marinescu
- Children's Hospital Informatics Program, Children's Hospital Boston, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA
- Isaac S Kohane
- Children's Hospital Informatics Program, Children's Hospital Boston, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA
- Alberto Riva
- Children's Hospital Informatics Program, Children's Hospital Boston, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA
|
23
|
McCarroll SA, Li H, Bargmann CI. Identification of Transcriptional Regulatory Elements in Chemosensory Receptor Genes by Probabilistic Segmentation. Curr Biol 2005; 15:347-52. [PMID: 15723796 DOI: 10.1016/j.cub.2005.02.023] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2004] [Revised: 12/19/2004] [Accepted: 12/21/2004] [Indexed: 11/16/2022]
Abstract
Genome sequencing has allowed many gene regulatory elements to be identified through cross-species comparisons. However, the expression of genes in multigene families can diverge rapidly between related species. An alternative approach to characterizing multigene families utilizes the fact that genes within the group are likely to share aspects of their regulation. Here, we use a statistical approach, probabilistic segmentation, to identify sequences that are overrepresented in the regions upstream of C. elegans chemosensory receptor genes. Although each of these elements is present in only a subset of the genes, their distribution across and within the promoters of chemosensory receptor genes makes it possible to detect them. Many of the motifs show positional preference with respect to the translational start site and correspond to the binding sites of known families of transcription factors. We verified one motif, the E-box sequence WWYCACSTGYY, by showing that it directs expression of reporter genes to the ADL chemosensory neurons. Thus, probabilistic segmentation can be used to identify functional regulatory elements with no previous knowledge of gene expression or regulation. This approach may be of particular value for rapidly evolving genes in the immune system and the nervous system.
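The positional preference mentioned in this abstract can be illustrated with a small sketch. It is not probabilistic segmentation; it only locates forward-strand matches to the reported E-box consensus WWYCACSTGYY and reports how far upstream of the translational start each match begins. The function name is hypothetical, and the input is assumed to be an upstream sequence that ends at the base immediately before the start codon.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def upstream_offsets(upstream, consensus="WWYCACSTGYY"):
    """Distances (bp) from each forward-strand match start to the start codon.

    Assumption: `upstream` ends at the base just before the start codon,
    so a match beginning at index i lies len(upstream) - i bp upstream.
    """
    seq = upstream.upper()
    core = "".join(IUPAC[base] for base in consensus)
    # A lookahead lets overlapping matches be reported as well.
    pattern = re.compile(f"(?=({core}))")
    return [len(seq) - m.start() for m in pattern.finditer(seq)]
```

Pooling these offsets over many upstream regions and tabulating them in distance bins would show whether matches cluster at a preferred distance from the start site; scanning the reverse strand as well would be a natural extension.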
Affiliation(s)
- Steven A McCarroll
- Department of Anatomy, University of California, San Francisco, San Francisco, CA 94143 USA
|
24
|
Kutlu B, Naamane N, Berthou L, Eizirik DL. New Approaches for in Silico Identification of Cytokine-Modified β Cell Gene Networks. Ann N Y Acad Sci 2004; 1037:41-58. [PMID: 15699492 DOI: 10.1196/annals.1337.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]
Abstract
Beta cell dysfunction and death in type 1 diabetes mellitus (T1DM) is caused by direct contact with activated macrophages and T lymphocytes and by exposure to soluble mediators secreted by these cells, such as cytokines and nitric oxide. Cytokine-induced apoptosis depends on the expression of pro- and anti-apoptotic genes that remain to be characterized. Using microarray analyses, we identified several transcription factor and "effector" gene networks regulated by interleukin-1beta and/or interferon-gamma in beta cells. This suggests that beta cell fate following exposure to cytokines is a complex and highly regulated process, depending on the duration and severity of perturbation of key gene networks. In order to draw correct conclusions from these massive amounts of data, we need to utilize novel bioinformatics and statistical tools. Thus, we are presently performing in silico analysis for the localization of binding sites for the transcription factor NF-kappaB (previously shown to be pivotal for beta cell apoptosis) in 15 temporally related gene clusters, identified by time-course microarray analysis. In silico analysis is based on a broad range of computational techniques used to detect motifs in a DNA sequence corresponding to the binding site of a transcription factor. These computer-based findings must be validated by use of positive and negative controls, and by "ChIP on chip" analysis. Moreover, new statistical approaches are required to decrease false positive findings. These novel approaches will constitute a "proof of principle" for the integrated use of bioinformatics and functional genomics in the characterization of relevant cytokine-regulated beta cell gene networks leading to beta cell apoptosis in T1DM.
Affiliation(s)
- Burak Kutlu
- Laboratory of Experimental Medicine, ULB, 808 route de Lennik, B-1070 Brussels, Belgium
|
25
|
MONKEY: identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model. Genome Biol 2004; 5:R98. [PMID: 15575972 PMCID: PMC545801 DOI: 10.1186/gb-2004-5-12-r98] [Citation(s) in RCA: 125] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2004] [Revised: 10/21/2004] [Accepted: 10/28/2004] [Indexed: 11/20/2022] Open
Abstract
MONKEY is a new method for identifying conserved transcription-factor binding sites from multiple-sequence alignments. We introduce a method (MONKEY) to identify conserved transcription-factor binding sites in multispecies alignments. MONKEY employs probabilistic models of factor specificity and binding-site evolution, on which basis we compute the likelihood that putative sites are conserved and assign statistical significance to each hit. Using genomes from the genus Saccharomyces, we illustrate how the significance of real sites increases with evolutionary distance and explore the relationship between conservation and function.
|
26
|
Wenick AS, Hobert O. Genomic cis-regulatory architecture and trans-acting regulators of a single interneuron-specific gene battery in C. elegans. Dev Cell 2004; 6:757-70. [PMID: 15177025 DOI: 10.1016/j.devcel.2004.05.004] [Citation(s) in RCA: 163] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2004] [Revised: 04/09/2004] [Accepted: 04/12/2004] [Indexed: 10/26/2022]
Abstract
Gene batteries are sets of coregulated genes with common cis-regulatory elements that define the differentiated state of a cell. The nature of gene batteries for individual neuronal cellular subtypes and their linked cis-regulatory elements is poorly defined. Through molecular dissection of the highly modular cis-regulatory architecture of individual neuronally expressed genes, we have defined a conserved 16 bp cis-regulatory motif that drives gene expression in a single interneuron subtype, termed AIY, in the nematode Caenorhabditis elegans. This motif is bound and activated by the Paired- and LIM-type homeodomain proteins CEH-10 and TTX-3. Using genome-wide phylogenetic footprinting, we delineated the location, distribution, and evolution of AIY-specific cis-regulatory elements throughout the genome and thereby defined a large battery of AIY-expressed genes, all of which represent direct Paired/LIM homeodomain target genes. The identity of these homeodomain targets provides novel insights into the biology of the AIY interneuron.
Affiliation(s)
- Adam S Wenick
- Department of Biochemistry and Molecular Biophysics, Center for Neurobiology and Behavior, Columbia University Medical Center, 701 West 168th Street, New York, NY 10032, USA
|
27
|
Portman DS, Bohmann D. Toward the computable transcriptome. Mol Cell 2004; 14:693-4. [PMID: 15200946 DOI: 10.1016/j.molcel.2004.06.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Applying a combination of innovative approaches to understanding neuronal gene regulation in C. elegans, an article in the latest Developmental Cell (Wenick and Hobert, 2004) gives hope that reading the genome's transcriptional regulatory code may one day be possible.
Affiliation(s)
- Douglas S Portman
- Department of Biomedical Genetics, University of Rochester Medical Center, Rochester, NY 14642, USA
|
28
|
Force A, Shashikant C, Stadler P, Amemiya CT. Comparative Genomics, cis-Regulatory Elements, and Gene Duplication. Methods Cell Biol 2004; 77:545-61. [PMID: 15602931 DOI: 10.1016/s0091-679x(04)77029-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/01/2023]
Affiliation(s)
- Allan Force
- Molecular Genetics Program, Benaroya Research Institute, Seattle, Washington 98101, USA
|