Reference Citation Analysis

For: GuhaThakurta D. Computational identification of transcriptional regulatory elements in DNA sequence. Nucleic Acids Res 2006;34:3585-98. [PMID: 16855295 PMCID: PMC1524905 DOI: 10.1093/nar/gkl372] [Citation(s) in RCA: 98] [Impact Index Per Article: 5.4] [Indexed: 01/13/2023]

Cited by Other Article(s)

Localization of the cis-enhancer element for mouse type X collagen expression in hypertrophic chondrocytes in vivo. J Bone Miner Res 2009;24:1022-32. [PMID: 19113928 PMCID: PMC2683646 DOI: 10.1359/jbmr.081249] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 01/19/2023]

Abstract

The type X collagen gene (Col10a1) is a specific molecular marker of hypertrophic chondrocytes during endochondral bone formation. Mutations in human COL10A1 and altered chondrocyte hypertrophy have been associated with multiple skeletal disorders. However, until recently, the cis-enhancer element that specifies Col10a1 expression in hypertrophic chondrocytes in vivo has remained unidentified. Previously, we and others have shown in transgenic mouse studies that the Col10a1 distal promoter (-4.4 to -3.8 kb) may harbor a critical enhancer that mediates its tissue specificity. Here, we report further localization of the cis-enhancer element within this Col10a1 distal promoter by using a similar transgenic mouse approach. We identify a 150-bp Col10a1 promoter element (-4296 to -4147 bp) that is sufficient to direct its tissue-specific expression in vivo. In silico analysis identified several putative transcription factor binding sites, including two potential activator protein-1 (AP-1) sites at its 5' and 3' ends (-4276 to -4243 and -4166 to -4152 bp, respectively). Interestingly, transgenic mice carrying a reporter construct deleted for these two AP-1 elements still showed tissue-specific reporter activity. EMSAs using oligonucleotide probes derived from this region and MCT cell nuclear extracts identified DNA/protein complexes that were enriched in cells stimulated to undergo hypertrophy. Moreover, these elements mediated increased reporter activity on transfection into MCT cells. These data define a 90-bp cis-enhancer required for tissue-specific Col10a1 expression in vivo and putative DNA/protein complexes that contribute to the regulation of chondrocyte hypertrophy. This work will enable us to identify candidate transcription factors essential both for skeletal development and for the pathogenesis of skeletal disorders.

Temiz NA, Camacho CJ. Experimentally based contact energies decode interactions responsible for protein-DNA affinity and the role of molecular waters at the binding interface. Nucleic Acids Res 2009;37:4076-88. [PMID: 19429892 PMCID: PMC2709573 DOI: 10.1093/nar/gkp289] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/23/2022]

Zhang S, Xu M, Li S, Su Z. Genome-wide de novo prediction of cis-regulatory binding sites in prokaryotes. Nucleic Acids Res 2009;37:e72. [PMID: 19383880 PMCID: PMC2691844 DOI: 10.1093/nar/gkp248] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 01/17/2023]

Zhang CQ, Wang J, Gao X. [Computational identification of transcriptional regulatory elements in Arabidopsis TCH4 promoter]. YI CHUAN = HEREDITAS 2009;30:620-6. [PMID: 18487153 DOI: 10.3724/sp.j.1005.2008.00620] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]

Pape UJ, Klein H, Vingron M. Statistical detection of cooperative transcription factors with similarity adjustment. Bioinformatics 2009;25:2103-9. [PMID: 19286833 PMCID: PMC2722994 DOI: 10.1093/bioinformatics/btp143] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]

Miklós I, Novák Á, Satija R, Lyngsø R, Hein J. Stochastic models of sequence evolution including insertion-deletion events. Stat Methods Med Res 2009;18:453-85. [DOI: 10.1177/0962280208099500] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/15/2022]

Kang K, Chung JH, Kim J. Evolutionary Conserved Motif Finder (ECMFinder) for genome-wide identification of clustered YY1- and CTCF-binding sites. Nucleic Acids Res 2009;37:2003-13. [PMID: 19208640 PMCID: PMC2665242 DOI: 10.1093/nar/gkp077] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 02/07/2023]

Alon S, Eisenberg E, Jacob-Hirsch J, Rechavi G, Vatine G, Toyama R, Coon SL, Klein DC, Gothilf Y. A new cis-acting regulatory element driving gene expression in the zebrafish pineal gland. Bioinformatics 2009;25:559-62. [PMID: 19147662 DOI: 10.1093/bioinformatics/btp031] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/26/2022]

Wichadakul D, McDermott J, Samudrala R. Prediction and integration of regulatory and protein-protein interactions. Methods Mol Biol 2009;541:101-43. [PMID: 19381527 DOI: 10.1007/978-1-59745-243-4_6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/18/2023]

Yaragatti M, Sandler T, Ungar L. A predictive model for identifying mini-regulatory modules in the mouse genome. Bioinformatics 2008;25:353-7. [DOI: 10.1093/bioinformatics/btn622] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 02/06/2023]

Chaivorapol C, Melton C, Wei G, Yeh RF, Ramalho-Santos M, Blelloch R, Li H. CompMoby: comparative MobyDick for detection of cis-regulatory motifs. BMC Bioinformatics 2008;9:455. [PMID: 18950538 PMCID: PMC2605473 DOI: 10.1186/1471-2105-9-455] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 06/10/2008] [Accepted: 10/27/2008] [Indexed: 12/31/2022]

Abstract

BACKGROUND

In metazoans, the regulation of gene expression is complex and occurs at many levels, including the transcriptional and post-transcriptional levels. Transcriptional regulation is mainly determined by sequence elements within the promoter regions of genes, while sequence elements within the 3' untranslated regions of mRNAs play important roles in post-transcriptional regulation, such as mRNA stability and translation efficiency. Identifying cis-regulatory elements, or motifs, is more difficult in multicellular than in unicellular eukaryotes because of the larger intergenic sequence space and the increased complexity of regulation. Experimental techniques for discovering functional elements are often time-consuming and not easily applied genome-wide. Consequently, computational methods are advantageous for genome-wide cis-regulatory motif detection. To reduce the search space in metazoans, many algorithms use cross-species alignment, although studies have shown that a large portion of the binding sites for the same trans-acting factor do not reside in alignable regions. Therefore, a computational algorithm should account for both conserved and nonconserved cis-regulatory elements in metazoans.

RESULTS

We present CompMoby (Comparative MobyDick), software developed to identify cis-regulatory binding sites at both the transcriptional and post-transcriptional levels in metazoans without prior knowledge of the trans-acting factors. The CompMoby algorithm was previously shown to identify cis-regulatory binding sites in upstream regions of genes co-regulated in embryonic stem cells. In this paper, we extend the software to identify putative cis-regulatory motifs in 3' UTR sequences and verify our results using experimentally validated data sets in mouse and human. We also describe the implementation of CompMoby as a user-friendly tool with a web interface for streamlined analysis. Our software detects motifs in three categories: (1) those that are alignable and conserved; (2) those that are conserved but not alignable; and (3) those that are species-specific. One of CompMoby's output files lets users decide which category of cis-regulatory element to pursue experimentally, depending on their biological question. Using experimentally validated biological datasets, we demonstrate that CompMoby successfully detects cis-regulatory target sites of known and novel trans-acting factors at the transcriptional and post-transcriptional levels.

CONCLUSION

CompMoby is a powerful software tool for systematic de novo discovery of evolutionarily conserved and nonconserved cis-regulatory sequences involved in transcriptional or post-transcriptional regulation in metazoans. This software is freely available to users at http://genome.ucsf.edu/compmoby/.

da Fonseca PGS, Guimarães KS, Sagot MF. Efficient representation and P-value computation for high-order Markov motifs. Bioinformatics 2008;24:i160-6. [PMID: 18689819 DOI: 10.1093/bioinformatics/btn282] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]

Fauteux F, Blanchette M, Strömvik MV. Seeder: discriminative seeding DNA motif discovery. Bioinformatics 2008;24:2303-7. [PMID: 18718942 PMCID: PMC2562012 DOI: 10.1093/bioinformatics/btn444] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 11/24/2022]

Murray JI, Voelker RB, Henscheid KL, Warf MB, Berglund JA. Identification of motifs that function in the splicing of non-canonical introns. Genome Biol 2008;9:R97. [PMID: 18549497 PMCID: PMC2481429 DOI: 10.1186/gb-2008-9-6-r97] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Received: 09/20/2007] [Revised: 12/27/2007] [Accepted: 06/12/2008] [Indexed: 01/22/2023]

Abstract

The enrichment of specific intronic splicing enhancers upstream of weak PY tracts suggests a novel mechanism for intron recognition that compensates for a weakened canonical pre-mRNA splicing motif.

Background

While the current model of pre-mRNA splicing is based on the recognition of four canonical intronic motifs (5' splice site, branchpoint sequence, polypyrimidine (PY) tract and 3' splice site), it is becoming increasingly clear that splicing is regulated by both canonical and non-canonical splicing signals located in the RNA sequence of introns and exons that act to recruit the spliceosome and associated splicing factors. The diversity of human intronic sequences suggests the existence of novel recognition pathways for non-canonical introns. This study addresses the recognition and splicing of human introns that lack a canonical PY tract. The PY tract is a uridine-rich region at the 3' end of introns that acts as a binding site for U2AF65, a key factor in splicing machinery recruitment.

Results

Human introns were classified computationally into low- and high-scoring PY tracts by scoring the likely U2AF65 binding site strength. Biochemical studies confirmed that low-scoring PY tracts are weak U2AF65 binding sites while high-scoring PY tracts are strong U2AF65 binding sites. A large population of human introns contains weak PY tracts. Computational analysis revealed many families of motifs, including C-rich and G-rich motifs, that are enriched upstream of weak PY tracts. In vivo splicing studies show that C-rich and G-rich motifs function as intronic splicing enhancers in a combinatorial manner to compensate for weak PY tracts.

Conclusion

The enrichment of specific intronic splicing enhancers upstream of weak PY tracts suggests that a novel mechanism for intron recognition exists, which compensates for a weakened canonical pre-mRNA splicing motif.

Lähdesmäki H, Rust AG, Shmulevich I. Probabilistic inference of transcription factor binding from multiple data sources. PLoS One 2008;3:e1820. [PMID: 18364997 PMCID: PMC2268002 DOI: 10.1371/journal.pone.0001820] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Received: 09/17/2007] [Accepted: 02/04/2008] [Indexed: 11/21/2022]

Abstract

An important problem in molecular biology is to build a complete understanding of transcriptional regulatory processes in the cell. We have developed a flexible, probabilistic framework to predict TF binding from multiple data sources that differs from the standard hypothesis testing (scanning) methods in several ways. Our probabilistic modeling framework estimates the probability of binding and thus naturally reflects our degree of belief in binding. Probabilistic modeling also allows easy and systematic integration of our binding predictions into other probabilistic modeling methods, such as expression-based gene network inference. The method answers the question of whether the whole analyzed promoter has a binding site, but can also be extended to estimate the binding probability at each nucleotide position. Further, we introduce an extension to model combinatorial regulation by several TFs. Most importantly, the proposed methods can make principled probabilistic inference from multiple evidence sources, such as multiple statistical models (motifs) of the TFs, evolutionary conservation, regulatory potential, CpG islands, nucleosome positioning, DNase hypersensitive sites, ChIP-chip binding segments and other (prior) sequence-based biological knowledge. We developed both a likelihood and a Bayesian method, the latter implemented with a Markov chain Monte Carlo algorithm. Results on a carefully constructed test set from the mouse genome demonstrate that principled data fusion can significantly improve the performance of TF binding prediction methods. We also applied the probabilistic modeling framework to all promoters in the mouse genome, and the results indicate sparse connectivity between transcriptional regulators and their target promoters. To facilitate analysis of other sequences and additional data, we have developed an online web tool, ProbTF, which implements our probabilistic TF binding prediction method using multiple data sources.
Test data set, a web tool, source codes and supplementary data are available at: http://www.probtf.org.
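
To illustrate what "the probability that the whole analyzed promoter has a binding site" can mean, here is a minimal sketch using a textbook zero-or-one-occurrence-per-sequence (ZOOPS) model: each window gets a motif-versus-background likelihood ratio from a log-odds matrix, the site position has a uniform prior, and `prior` is an assumed prior probability that the promoter is bound. This is not ProbTF's likelihood or MCMC method, and it does not fuse multiple evidence sources; the function names, the toy "TATA" motif and the 0.1 prior are hypothetical.

```python
import math

def consensus_log_odds(consensus, p_match=0.7, background=0.25):
    """Toy per-position natural-log odds for a consensus string (hypothetical)."""
    p_other = (1.0 - p_match) / 3
    return [
        {b: math.log((p_match if b == c else p_other) / background) for b in "ACGT"}
        for c in consensus
    ]

def promoter_binding_probability(seq, log_odds, prior=0.1):
    """Posterior P(bound | seq) under a zero-or-one-site (ZOOPS) model."""
    width = len(log_odds)
    n_windows = len(seq) - width + 1
    if n_windows < 1:
        raise ValueError("sequence is shorter than the motif")
    # Likelihood ratio P(seq | site at i) / P(seq | background) for each window.
    ratios = [
        math.exp(sum(col[base] for col, base in zip(log_odds, seq[i:i + width])))
        for i in range(n_windows)
    ]
    # A uniform prior over site positions turns P(seq | bound) into an average.
    mean_ratio = sum(ratios) / n_windows
    bound = prior * mean_ratio
    return bound / (bound + (1.0 - prior))

motif = consensus_log_odds("TATA")
p_with = promoter_binding_probability("GCGCTATAGCGC", motif)
p_without = promoter_binding_probability("GCGCGCGCGCGC", motif)
print(round(p_with, 3), round(p_without, 3))  # 0.442 0.003
```

When every window's ratio is 1 (no evidence either way), the posterior falls back to the prior, which is a useful sanity check for this kind of model.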

Levitsky VG, Ignatieva EV, Ananko EA, Turnaev II, Merkulova TI, Kolchanov NA, Hodgman TC. Effective transcription factor binding site prediction using a combination of optimization, a genetic algorithm and discriminant analysis to capture distant interactions. BMC Bioinformatics 2007;8:481. [PMID: 18093302 PMCID: PMC2265442 DOI: 10.1186/1471-2105-8-481] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Received: 07/22/2007] [Accepted: 12/19/2007] [Indexed: 12/22/2022]

Abstract

Background

Reliable transcription factor binding site (TFBS) prediction methods are essential for computer annotation of large amounts of genome sequence data. However, current TFBS prediction methods are hampered by high false-positive rates when only sequence conservation at the core binding site is considered.

Results

To improve this situation, we quantified the performance of several position weight matrix (PWM) algorithms, using exhaustive approaches to find their optimal length and position. We applied these approaches to biomedically important TFBSs involved in the regulation of cell growth and proliferation, as well as in inflammatory, immune, and antiviral responses (NF-κB, ISGF3, IRF1, STAT1), obesity and lipid metabolism (PPAR, SREBP, HNF4), and the expression of steroidogenic (SF-1) and cell-cycle (E2F) genes. We also gained extra specificity using a method called SiteGA, which takes into account structural interactions within the TFBS core and flanking regions by means of a genetic algorithm (GA) with a discriminant function of locally positioned dinucleotide (LPD) frequencies.

To increase confidence in our approach, we applied jackknife and bootstrap resampling tests to the comparison; optimized PWMs and SiteGA showed similar recognition performance. We then applied SiteGA and optimized PWMs (both separately and together) to sequences in the Eukaryotic Promoter Database (EPD). The resulting SiteGA recognition models can now be used to search sequences for binding sites (BSs) with the SiteGA web tool.

Analysis of dependencies between close and distant LPDs revealed by SiteGA models showed that the most significant correlations are between close LPDs and are generally located in the core (footprint) region. A greater number of less significant correlations are mainly between distant LPDs, spanning both core and flanking regions. Applying SiteGA and optimized PWM models together substantially reduced false positives, at least at higher stringencies.
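
For readers unfamiliar with PWMs, the sketch below shows the basic position weight matrix workflow the Results build on: count bases at each aligned position, add a pseudocount, convert to log2 odds against a background, and scan a sequence for the best-scoring window. It assumes a uniform background, a pseudocount of 1, forward-strand scanning only and uppercase A/C/G/T input; the toy sites and function names are hypothetical. It does not implement the paper's length/position optimization or SiteGA's LPD discriminant function.

```python
import math
from collections import Counter

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=None):
    """Build a log2-odds PWM from equal-length, aligned binding sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumption: uniform background
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(width):
        counts = Counter(s[pos] for s in sites)
        pwm.append({
            b: math.log2((counts[b] + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm

def best_hit(pwm, seq):
    """Return (score, offset) of the best-scoring window on the forward strand."""
    width = len(pwm)
    best = (float("-inf"), -1)
    for i in range(len(seq) - width + 1):
        score = sum(col[base] for col, base in zip(pwm, seq[i:i + width]))
        best = max(best, (score, i))
    return best

# Hypothetical toy sites; uppercase A/C/G/T only.
sites = ["TGACTCA", "TGAGTCA", "TGACTAA", "TGACTCA"]
pwm = build_pwm(sites)
score, offset = best_hit(pwm, "CCGTTGACTCAGGA")
print(offset)  # 4
```

A plain PWM like this treats positions as independent, which is the limitation SiteGA's dinucleotide-based model is described as addressing.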

Conclusion

Based on this analysis, SiteGA adds substantial specificity even to optimized PWMs and may be considered for large-scale genome analysis. It adds to the range of techniques available for TFBS prediction, and EPD analysis has led to a list of genes which appear to be regulated by the above TFs.

Bi C, Leeder JS, Vyhlidal CA. A comparative study on computational two-block motif detection: algorithms and applications. Mol Pharm 2007;5:3-16. [PMID: 18076137 DOI: 10.1021/mp7001126] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]

Wang X, Gu J, Zhang MQ, Li Y. Identification of phylogenetically conserved microRNA cis-regulatory elements across 12 Drosophila species. Bioinformatics 2007;24:165-71. [DOI: 10.1093/bioinformatics/btm572] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 01/21/2023]

Cogburn LA, Porter TE, Duclos MJ, Simon J, Burgess SC, Zhu JJ, Cheng HH, Dodgson JB, Burnside J. Functional genomics of the chicken--a model organism. Poult Sci 2007;86:2059-94. [PMID: 17878436 DOI: 10.1093/ps/86.10.2059] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.1] [Indexed: 11/13/2022]

Abstract

Since the sequencing of the genome and the development of high-throughput tools for the exploration of functional elements of the genome, the chicken has reached model organism status. Functional genomics focuses on understanding the function and regulation of genes and gene products on a global or genome-wide scale. Systems biology attempts to integrate functional information derived from multiple high-content data sets into a holistic view of all biological processes within a cell or organism. Generation of a large collection (approximately 600K) of chicken expressed sequence tags, representing most tissues and developmental stages, has enabled the construction of high-density microarrays for transcriptional profiling. Comprehensive analysis of this large expressed sequence tag collection and a set of approximately 20K full-length cDNA sequences indicates that the transcriptome of the chicken represents approximately 20,000 genes. Furthermore, comparative analyses of these sequences have facilitated functional annotation of the genome and the creation of several bioinformatic resources for the chicken. Recently, about 20 papers have been published on transcriptional profiling with DNA microarrays in chicken tissues under various conditions. Proteomics is another powerful high-throughput tool currently used for examining the dynamics of protein expression in chicken tissues and fluids. Computational analyses of the chicken genome are providing new insight into the evolution of gene families in birds and other organisms. Abundant functional genomic resources now support large-scale analyses in the chicken and will facilitate identification of transcriptional mechanisms, gene networks, and metabolic or regulatory pathways that will ultimately determine the phenotype of the bird. New technologies such as marker-assisted selection, transgenics, and RNA interference offer the opportunity to modify the phenotype of the chicken to fit defined production goals.
This review focuses on functional genomics in the chicken and provides a road map for large-scale exploration of the chicken genome.

Dinkel H, Sticht H. A computational strategy for the prediction of functional linear peptide motifs in proteins. Bioinformatics 2007;23:3297-303. [PMID: 17977881 DOI: 10.1093/bioinformatics/btm524] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Indexed: 11/15/2022]

Grskovic M, Chaivorapol C, Gaspar-Maia A, Li H, Ramalho-Santos M. Systematic identification of cis-regulatory sequences active in mouse and human embryonic stem cells. PLoS Genet 2007;3:e145. [PMID: 17784790 PMCID: PMC1959362 DOI: 10.1371/journal.pgen.0030145] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.4] [Received: 03/26/2007] [Accepted: 07/10/2007] [Indexed: 01/06/2023]

Abstract

Understanding the transcriptional regulation of pluripotent cells is of fundamental interest and will greatly inform efforts aimed at directing differentiation of embryonic stem (ES) cells or reprogramming somatic cells. We first analyzed the transcriptional profiles of mouse ES cells and primordial germ cells and identified genes upregulated in pluripotent cells both in vitro and in vivo. These genes are enriched for roles in transcription, chromatin remodeling, cell cycle, and DNA repair. We developed a novel computational algorithm, CompMoby, which combines analyses of sequences both aligned and non-aligned between different genomes with a probabilistic segmentation model to systematically predict short DNA motifs that regulate gene expression. CompMoby was used to identify conserved overrepresented motifs in genes upregulated in pluripotent cells. We show that the motifs are preferentially active in undifferentiated mouse ES and embryonic germ cells in a sequence-specific manner, and that they can act as enhancers in the context of an endogenous promoter. Importantly, the activity of the motifs is conserved in human ES cells. We further show that the transcription factor NF-Y specifically binds to one of the motifs, is differentially expressed during ES cell differentiation, and is required for ES cell proliferation. This study provides novel insights into the transcriptional regulatory networks of pluripotent cells. Our results suggest that this systematic approach can be broadly applied to understanding transcriptional networks in mammalian species.

Embryonic stem cells have two remarkable properties: they can proliferate very rapidly, and they can give rise to all of the body's cell types. Understanding how gene activity is regulated in embryonic stem cells will be an important step towards therapeutic applications. The activity of genes is regulated by proteins called transcription factors, which bind to stretches of DNA sequences that act as on or off switches. We identified genes that are active in mouse embryonic stem cells but not in differentiated cells. We reasoned that if these genes have similar patterns of activity, they may be regulated by the same transcription factors. We therefore developed a computational approach that takes information on gene activity and predicts DNA sequences that may act as switches. Using this approach, we discovered new DNA switches that regulate gene activity in mouse and human embryonic stem cells. Furthermore, we identified a transcription factor that binds to one of these DNA switches and is important for the rapid proliferation of embryonic stem cells. Our approach sheds light on the genetic regulation of embryonic stem cells and will be broadly applicable to questions of how gene activity is regulated in other cell types of interest.

Affiliation(s)

Marica Grskovic: Institute for Regeneration Medicine, University of California San Francisco, San Francisco, California, United States of America; Diabetes Center, University of California San Francisco, San Francisco, California, United States of America
Christina Chaivorapol: Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, California, United States of America; California Institute for Quantitative Biomedical Research, University of California San Francisco, San Francisco, California, United States of America; Graduate Program in Biological and Medical Informatics, University of California San Francisco, San Francisco, California, United States of America
Alexandre Gaspar-Maia: Institute for Regeneration Medicine, University of California San Francisco, San Francisco, California, United States of America; Diabetes Center, University of California San Francisco, San Francisco, California, United States of America; Doctoral Program in Biomedicine and Experimental Biology, Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
Hao Li*: Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, California, United States of America; California Institute for Quantitative Biomedical Research, University of California San Francisco, San Francisco, California, United States of America; Graduate Program in Biological and Medical Informatics, University of California San Francisco, San Francisco, California, United States of America
Miguel Ramalho-Santos*: Institute for Regeneration Medicine, University of California San Francisco, San Francisco, California, United States of America; Diabetes Center, University of California San Francisco, San Francisco, California, United States of America
* To whom correspondence should be addressed.

Goto N, Kurokawa K, Yasunaga T. Analysis of invariant sequences in 266 complete genomes. Gene 2007;401:172-80. [PMID: 17728079 DOI: 10.1016/j.gene.2007.07.017] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 12/12/2006] [Revised: 07/13/2007] [Accepted: 07/16/2007] [Indexed: 11/29/2022]

Doyon JB, Liu DR. Identification of eukaryotic promoter regulatory elements using nonhomologous random recombination. Nucleic Acids Res 2007;35:5851-60. [PMID: 17720707 PMCID: PMC2034452 DOI: 10.1093/nar/gkm634] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]

A novel ensemble learning method for de novo computational identification of DNA binding sites. BMC Bioinformatics 2007;8:249. [PMID: 17626633 PMCID: PMC1950314 DOI: 10.1186/1471-2105-8-249] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.2] [Received: 03/06/2007] [Accepted: 07/12/2007] [Indexed: 12/02/2022]

Okumura T, Makiguchi H, Makita Y, Yamashita R, Nakai K. Melina II: a web tool for comparisons among several predictive algorithms to find potential motifs from promoter regions. Nucleic Acids Res 2007;35:W227-31. [PMID: 17537821 PMCID: PMC1933176 DOI: 10.1093/nar/gkm362] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Indexed: 01/30/2023]

Carlson JM, Chakravarty A, DeZiel CE, Gross RH. SCOPE: a web server for practical de novo motif discovery. Nucleic Acids Res 2007;35:W259-64. [PMID: 17485471 PMCID: PMC1933170 DOI: 10.1093/nar/gkm310] [Citation(s) in RCA: 85] [Impact Index Per Article: 5.0] [Indexed: 11/14/2022]

Jegga AG, Chen J, Gowrisankar S, Deshmukh MA, Gudivada R, Kong S, Kaimal V, Aronow BJ. GenomeTrafac: a whole genome resource for the detection of transcription factor binding site clusters associated with conventional and microRNA encoding genes conserved between mouse and human gene orthologs. Nucleic Acids Res 2006;35:D116-21. [PMID: 17178752 PMCID: PMC1781107 DOI: 10.1093/nar/gkl1011] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]

Brilli M, Fani R, Lió P. MotifScorer: using a compendium of microarrays to identify regulatory motifs. Bioinformatics 2006;23:493-5. [PMID: 17138590 DOI: 10.1093/bioinformatics/btl607] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]

GuhaThakurta D, Xie T, Anand M, Edwards SW, Li G, Wang SS, Schadt EE. Cis-regulatory variations: a study of SNPs around genes showing cis-linkage in segregating mouse populations. BMC Genomics 2006;7:235. [PMID: 16978413 PMCID: PMC1618400 DOI: 10.1186/1471-2164-7-235] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Received: 06/07/2006] [Accepted: 09/15/2006] [Indexed: 11/10/2022]

Abstract

Background

Changes in gene expression are known to be responsible for phenotypic variation and susceptibility to disease. Identifying and annotating the genomic sequence variants that cause gene expression changes is therefore likely to lead to a better understanding of the causes of disease at the molecular level. In this study we investigate the pattern of single nucleotide polymorphisms (SNPs) in genes whose mRNA levels show cis-genetic linkage (gene expression quantitative trait loci mapping in cis, or cis-eQTLs) in segregating mouse populations. Such genes are expected to have polymorphisms near their physical location (cis-variations) that affect their mRNA levels by altering one or more cis-regulatory elements. This led us to characterize SNPs in promoter regions (5 kb upstream) and non-coding gene regions (introns and 5 kb downstream), collectively termed cis-SNPs, and the effects they may have on putative transcription factor binding sites.
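
The region definitions above (5 kb upstream, introns, 5 kb downstream) depend on the strand of the gene, since "upstream" means lower coordinates for a plus-strand gene and higher coordinates for a minus-strand gene. The sketch below assigns a SNP position to one of these regions. It assumes 1-based inclusive coordinates, a gene span running from transcription start to transcription end, and a known exon list; the `Gene` class and `classify_snp` function are hypothetical illustrations, not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Gene:
    start: int  # 1-based inclusive, lowest genomic coordinate of the gene
    end: int    # 1-based inclusive, highest genomic coordinate of the gene
    strand: str  # "+" or "-"
    exons: List[Tuple[int, int]]  # 1-based inclusive exon intervals

def classify_snp(gene: Gene, pos: int, flank: int = 5000) -> Optional[str]:
    """Label a SNP as 'promoter', 'exon', 'intron', 'downstream' or None."""
    if gene.start <= pos <= gene.end:
        if any(s <= pos <= e for s, e in gene.exons):
            return "exon"
        return "intron"
    # Upstream/downstream flip with the strand of transcription.
    before = gene.start - flank <= pos < gene.start
    after = gene.end < pos <= gene.end + flank
    if gene.strand == "+":
        return "promoter" if before else "downstream" if after else None
    return "promoter" if after else "downstream" if before else None

gene = Gene(start=10_000, end=20_000, strand="+", exons=[(10_000, 10_500), (19_000, 20_000)])
print([classify_snp(gene, p) for p in (7_000, 10_200, 12_000, 22_000, 30_000)])
# ['promoter', 'exon', 'intron', 'downstream', None]
```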

Results

We demonstrate that cis-eQTL genes (CEGs) have a significantly higher frequency of cis-SNPs than non-CEGs when both sets are taken from non-IBD regions (i.e., regions not identical by descent). Most CEGs that have cis-SNPs do not contain these SNPs in phylogenetically conserved regions. In those CEGs that do contain cis-SNPs in phylogenetically conserved regions, enrichment of cis-SNPs occurs both within and outside the conserved sequences. A higher fraction of CEGs also harbor cis-SNPs that affect predicted transcription factor binding sites, likely a consequence of the higher cis-SNP density in these genes.

Conclusion

The present study provides the first genome-wide investigation of putative cis-regulatory variations in a large set of genes whose expression levels give rise to cis-linkage in segregating mammalian populations. Our results provide insights into the challenges of identifying polymorphisms that regulate gene expression using bioinformatic sequence analysis approaches. The data provided herein should benefit future investigations in this area.
