Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Siddharthan R, Siggia ED, van Nimwegen E. PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny. PLoS Comput Biol 2005;1:e67. [PMID: 16477324 PMCID: PMC1309704 DOI: 10.1371/journal.pcbi.0010067] [Citation(s) in RCA: 176] [Impact Index Per Article: 9.3] [Received: 07/27/2005] [Accepted: 10/28/2005] [Indexed: 12/27/2022]

Cited by Other Article(s)

Seitzer P, Wilbanks EG, Larsen DJ, Facciotti MT. A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs. BMC Bioinformatics 2012. [PMID: 23181585 PMCID: PMC3542263 DOI: 10.1186/1471-2105-13-317] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Discovery of functionally significant short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Oftentimes, not all sequences in the set contain a motif. These non-motif-containing sequences complicate the algorithmic discovery of motifs. Filtering the non-motif-containing sequences from the larger set of sequences while simultaneously determining the identity of the motif is, therefore, desirable and a non-trivial problem in motif discovery research.

RESULTS

We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm, each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant discovered motifs into a tree, which allowed us to gain additional insights. In all cases, we were able to support our findings with experimental work from the literature.

CONCLUSIONS

Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we suggested binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesized 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif across several data sets. We suggest that small differences in our discovered motif could confer specificity for one or more homologous GTF proteins. We offer a free implementation of the MotifCatcher software package at http://www.bme.ucdavis.edu/facciotti/resources_data/software/.

Pachkov M, Balwierz PJ, Arnold P, Ozonov E, van Nimwegen E. SwissRegulon, a database of genome-wide annotations of regulatory sites: recent updates. Nucleic Acids Res 2012. [PMID: 23180783 PMCID: PMC3531101 DOI: 10.1093/nar/gks1145] [Citation(s) in RCA: 102] [Impact Index Per Article: 8.5] [Indexed: 12/31/2022]

Simcha D, Price ND, Geman D. The limits of de novo DNA motif discovery. PLoS One 2012;7:e47836. [PMID: 23144830 PMCID: PMC3492406 DOI: 10.1371/journal.pone.0047836] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 08/09/2012] [Accepted: 09/21/2012] [Indexed: 12/02/2022]

Abstract

A major challenge in molecular biology is reverse-engineering the cis-regulatory logic that plays a major role in the control of gene expression. This program includes searching through DNA sequences to identify “motifs” that serve as the binding sites for transcription factors or, more generally, are predictive of gene expression across cellular conditions. Several approaches have been proposed for de novo motif discovery–searching sequences without prior knowledge of binding sites or nucleotide patterns. However, unbiased validation is not straightforward. We consider two approaches to unbiased validation of discovered motifs: testing the statistical significance of a motif using a DNA “background” sequence model to represent the null hypothesis and measuring performance in predicting membership in gene clusters. We demonstrate that the background models typically used are “too null,” resulting in overly optimistic assessments of significance, and argue that performance in predicting TF binding or expression patterns from DNA motifs should be assessed by held-out data, as in predictive learning. Applying this criterion to common motif discovery methods resulted in universally poor performance, although there is a marked improvement when motifs are statistically significant against real background sequences. Moreover, on synthetic data where “ground truth” is known, discriminative performance of all algorithms is far below the theoretical upper bound, with pronounced “over-fitting” in training. A key conclusion from this work is that the failure of de novo discovery approaches to accurately identify motifs is basically due to statistical intractability resulting from the fixed size of co-regulated gene clusters, and thus such failures do not necessarily provide evidence that unfound motifs are not active biologically. Consequently, the use of prior knowledge to enhance motif discovery is not just advantageous but necessary. 
An implementation of the LR and ALR algorithms is available at http://code.google.com/p/likelihood-ratio-motifs/.

Hafner M, Lianoglou S, Tuschl T, Betel D. Genome-wide identification of miRNA targets by PAR-CLIP. Methods 2012;58:94-105. [PMID: 22926237 PMCID: PMC3508682 DOI: 10.1016/j.ymeth.2012.08.006] [Citation(s) in RCA: 84] [Impact Index Per Article: 7.0] [Received: 06/15/2012] [Revised: 08/10/2012] [Accepted: 08/12/2012] [Indexed: 01/08/2023]

Katara P, Grover A, Sharma V. Phylogenetic footprinting: a boost for microbial regulatory genomics. Protoplasma 2012;249:901-907. [PMID: 22113593 DOI: 10.1007/s00709-011-0351-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 07/27/2011] [Accepted: 11/09/2011] [Indexed: 05/31/2023]

Cornish JP, Matthews F, Thomas JR, Erill I. Inference of self-regulated transcriptional networks by comparative genomics. Evol Bioinform Online 2012;8:449-61. [PMID: 23032607 PMCID: PMC3422134 DOI: 10.4137/ebo.s9205] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/05/2022]

Zia A, Moses AM. Towards a theoretical understanding of false positives in DNA motif finding. BMC Bioinformatics 2012;13:151. [PMID: 22738169 PMCID: PMC3436861 DOI: 10.1186/1471-2105-13-151] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 09/23/2011] [Accepted: 06/27/2012] [Indexed: 11/10/2022]

Arnold P, Erb I, Pachkov M, Molina N, van Nimwegen E. MotEvo: integrated Bayesian probabilistic methods for inferring regulatory sites and motifs on multiple alignments of DNA sequences. Bioinformatics 2012;28:487-94. [PMID: 22334039 DOI: 10.1093/bioinformatics/btr695] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.4] [Indexed: 11/14/2022]

Midha M, Prasad NK, Vindal V. MycoRRdb: a database of computationally identified regulatory regions within intergenic sequences in mycobacterial genomes. PLoS One 2012;7:e36094. [PMID: 22563442 PMCID: PMC3338573 DOI: 10.1371/journal.pone.0036094] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 12/01/2011] [Accepted: 03/29/2012] [Indexed: 11/18/2022]

Zambelli F, Pesole G, Pavesi G. Motif discovery and transcription factor binding sites before and after the next-generation sequencing era. Brief Bioinform 2012;14:225-37. [PMID: 22517426 PMCID: PMC3603212 DOI: 10.1093/bib/bbs016] [Citation(s) in RCA: 93] [Impact Index Per Article: 7.8] [Indexed: 02/07/2023]

Pearson JC, Watson JD, Crews ST. Drosophila melanogaster Zelda and Single-minded collaborate to regulate an evolutionarily dynamic CNS midline cell enhancer. Dev Biol 2012;366:420-32. [PMID: 22537497 DOI: 10.1016/j.ydbio.2012.04.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 02/24/2012] [Revised: 04/04/2012] [Accepted: 04/06/2012] [Indexed: 10/28/2022]

Abstract

The Drosophila Zelda transcription factor plays an important role in regulating transcription at the embryonic maternal-to-zygotic transition. However, expression of zelda continues throughout embryogenesis in cells including the developing CNS and trachea, but little is known about its post-blastoderm functions. In this paper, it is shown that zelda directly controls CNS midline and tracheal expression of the link (CG13333) gene, as well as link blastoderm expression. The link gene contains a 5' enhancer with multiple Zelda TAGteam binding sites that in vivo mutational studies show are required for link transcription. The link enhancer also has a binding site for the Single-minded:Tango and Trachealess:Tango bHLH-PAS proteins that also influences link midline and tracheal expression. These results provide an example of how a transcription factor (Single-minded or Trachealess) can interact with distinct co-regulatory proteins (Zelda or Sox/POU-homeodomain proteins) to control a similar pattern of expression of different target genes in a mechanistically different manner. While zelda and single-minded midline expression is well-conserved in Drosophila, midline expression of link is not well-conserved. Phylogenetic analysis of link expression suggests that ~60 million years ago, midline expression was nearly or completely absent, and first appeared in the melanogaster group (including D. melanogaster, D. yakuba, and D. erecta) >13 million years ago. The differences in expression are due, in part, to sequence polymorphisms in the link enhancer and likely due to altered binding of multiple transcription factors. Less than 6 million years ago, a second change occurred that resulted in high levels of expression in D. melanogaster. This change may be due to alterations in a putative Zelda binding site. 
Within the CNS, the zelda gene is alternatively spliced beginning at mid-embryogenesis into transcripts that encode a Zelda isoform missing three zinc fingers from the DNA binding domain. This may result in a protein with altered, possibly non-functional, DNA-binding properties. In summary, Zelda collaborates with bHLH-PAS proteins to directly regulate midline and tracheal expression of an evolutionarily dynamic enhancer in the post-blastoderm embryo.

Vaughn JN, Ellingson SR, Mignone F, von Arnim A. Known and novel post-transcriptional regulatory sequences are conserved across plant families. RNA 2012;18:368-84. [PMID: 22237150 PMCID: PMC3285926 DOI: 10.1261/rna.031179.111] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Indexed: 05/03/2023]

Contribution of transcription factor binding site motif variants to condition-specific gene expression patterns in budding yeast. PLoS One 2012;7:e32274. [PMID: 22384202 PMCID: PMC3285675 DOI: 10.1371/journal.pone.0032274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2011] [Accepted: 01/24/2012] [Indexed: 11/19/2022]

Sanchez-Alberola N, Campoy S, Barbé J, Erill I. Analysis of the SOS response of Vibrio and other bacteria with multiple chromosomes. BMC Genomics 2012;13:58. [PMID: 22305460 PMCID: PMC3323433 DOI: 10.1186/1471-2164-13-58] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 07/26/2011] [Accepted: 02/03/2012] [Indexed: 12/18/2022]

Abstract

Background

The SOS response is a well-known regulatory network present in most bacteria and aimed at addressing DNA damage. It has also been linked extensively to stress-induced mutagenesis, virulence and the emergence and dissemination of antibiotic resistance determinants. Recently, the SOS response has been shown to regulate the activity of integrases in the chromosomal superintegrons of the Vibrionaceae, which encompasses a wide range of pathogenic species harboring multiple chromosomes. Here we combine in silico and in vitro techniques to perform a comparative genomics analysis of the SOS regulon in the Vibrionaceae, and we extend the methodology to map this transcriptional network in other bacterial species harboring multiple chromosomes.

Results

Our analysis provides the first comprehensive description of the SOS response in a family (Vibrionaceae) that includes major human pathogens. It also identifies several previously unreported members of the SOS transcriptional network, including two proteins of unknown function. The analysis of the SOS response in other bacterial species with multiple chromosomes uncovers additional regulon members and reveals that there is a conserved core of SOS genes, and that specialized additions to this basic network take place in different phylogenetic groups. Our results also indicate that across all groups the main elements of the SOS response are always found in the large chromosome, whereas specialized additions are found in the smaller chromosomes and plasmids.

Conclusions

Our findings confirm that the SOS response of the Vibrionaceae is strongly linked with pathogenicity and dissemination of antibiotic resistance, and suggest that the characterization of the newly identified members of this regulon could provide key insights into the pathogenesis of Vibrio. The persistent location of key SOS genes in the large chromosome across several bacterial groups confirms that the SOS response plays an essential role in these organisms and sheds light into the mechanisms of evolution of global transcriptional networks involved in adaptability and rapid response to environmental changes, suggesting that small chromosomes may act as evolutionary test beds for the rewiring of transcriptional networks.

König J, Zarnack K, Luscombe NM, Ule J. Protein-RNA interactions: new genomic technologies and perspectives. Nat Rev Genet 2012;13:77-83. [PMID: 22251872 DOI: 10.1038/nrg3141] [Citation(s) in RCA: 349] [Impact Index Per Article: 29.1] [Indexed: 12/16/2022]

Erb I, González-Vallinas JR, Bussotti G, Blanco E, Eyras E, Notredame C. Use of ChIP-Seq data for the design of a multiple promoter-alignment method. Nucleic Acids Res 2012;40:e52. [PMID: 22230796 PMCID: PMC3326335 DOI: 10.1093/nar/gkr1292] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 01/22/2023]

Aerts S. Computational strategies for the genome-wide identification of cis-regulatory elements and transcriptional targets. Curr Top Dev Biol 2012;98:121-45. [PMID: 22305161 DOI: 10.1016/b978-0-12-386499-4.00005-7] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 12/13/2022]

Ascano M, Hafner M, Cekan P, Gerstberger S, Tuschl T. Identification of RNA-protein interaction networks using PAR-CLIP. Wiley Interdiscip Rev RNA 2011;3:159-77. [PMID: 22213601 DOI: 10.1002/wrna.1103] [Citation(s) in RCA: 177] [Impact Index Per Article: 13.6] [Indexed: 11/05/2022]

Erb I, van Nimwegen E. Transcription factor binding site positioning in yeast: proximal promoter motifs characterize TATA-less promoters. PLoS One 2011;6:e24279. [PMID: 21931670 PMCID: PMC3170328 DOI: 10.1371/journal.pone.0024279] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 05/10/2011] [Accepted: 08/09/2011] [Indexed: 12/26/2022]

Abstract

The availability of sequence specificities for a substantial fraction of yeast's transcription factors and comparative genomic algorithms for binding site prediction has made it possible to comprehensively annotate transcription factor binding sites genome-wide. Here we use such a genome-wide annotation for comprehensively studying promoter architecture in yeast, focusing on the distribution of transcription factor binding sites relative to transcription start sites, and the architecture of TATA and TATA-less promoters. For most transcription factors, binding sites are positioned further upstream and vary over a wider range in TATA promoters than in TATA-less promoters. In contrast, a group of ‘proximal promoter motifs’ (GAT1/GLN3/DAL80, FKH1/2, PBF1/2, RPN4, NDT80, and ROX1) occur preferentially in TATA-less promoters and show a strong preference for binding close to the transcription start site in these promoters. We provide evidence that suggests that pre-initiation complexes are recruited at TATA sites in TATA promoters and at the sites of the other proximal promoter motifs in TATA-less promoters. TATA-less promoters can generally be classified by the proximal promoter motif they contain, with different classes of TATA-less promoters showing different patterns of transcription factor binding site positioning and nucleosome coverage. These observations suggest that different modes of regulation of transcription initiation may be operating in the different promoter classes. In addition we show that, across all promoter classes, there is a close match between nucleosome free regions and regions of highest transcription factor binding site density. This close agreement between transcription factor binding site density and nucleosome depletion suggests a direct and general competition between transcription factors and nucleosomes for binding to promoters.

Corcoran DL, Georgiev S, Mukherjee N, Gottwein E, Skalsky RL, Keene JD, Ohler U. PARalyzer: definition of RNA binding sites from PAR-CLIP short-read sequence data. Genome Biol 2011;12:R79. [PMID: 21851591 PMCID: PMC3302668 DOI: 10.1186/gb-2011-12-8-r79] [Citation(s) in RCA: 264] [Impact Index Per Article: 20.3] [Received: 02/11/2011] [Revised: 06/16/2011] [Accepted: 08/18/2011] [Indexed: 01/17/2023]

Zhang C, Wang J, Hua X, Fang J, Zhu H, Gao X. A mutation degree model for the identification of transcriptional regulatory elements. BMC Bioinformatics 2011;12:262. [PMID: 21708002 PMCID: PMC3228546 DOI: 10.1186/1471-2105-12-262] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/12/2010] [Accepted: 06/27/2011] [Indexed: 11/10/2022]

Zhang S, Li S, Niu M, Pham PT, Su Z. MotifClick: prediction of cis-regulatory binding sites via merging cliques. BMC Bioinformatics 2011;12:238. [PMID: 21679436 PMCID: PMC3225181 DOI: 10.1186/1471-2105-12-238] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 12/23/2010] [Accepted: 06/16/2011] [Indexed: 11/21/2022]

Xie D, Chen CC, He X, Cao X, Zhong S. Towards an evolutionary model of transcription networks. PLoS Comput Biol 2011;7:e1002064. [PMID: 21695281 PMCID: PMC3111474 DOI: 10.1371/journal.pcbi.1002064] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/10/2010] [Accepted: 04/08/2011] [Indexed: 11/18/2022]

Abstract

DNA evolution models made invaluable contributions to comparative genomics, although it seemed formidable to include non-genomic features into these models. In order to build an evolutionary model of transcription networks (TNs), we had to forfeit the substitution model used in DNA evolution and to start from modeling the evolution of the regulatory relationships. We present a quantitative evolutionary model of TNs, subjecting the phylogenetic distance and the evolutionary changes of cis-regulatory sequence, gene expression and network structure to one probabilistic framework. Using the genome sequences and gene expression data from multiple species, this model can predict regulatory relationships between a transcription factor (TF) and its target genes in all species, and thus identify TN re-wiring events. Applying this model to analyze the pre-implantation development of three mammalian species, we identified the conserved and re-wired components of the TNs downstream to a set of TFs including Oct4, Gata3/4/6, cMyc and nMyc. Evolutionary events on the DNA sequence that led to turnover of TF binding sites were identified, including a birth of an Oct4 binding site by a 2nt deletion. In contrast to recent reports of large interspecies differences of TF binding sites and gene expression patterns, the interspecies difference in TF-target relationship is much smaller. The data showed increasing conservation levels from genomic sequences to TF-DNA interaction, gene expression, TN, and finally to morphology, suggesting that evolutionary changes are larger at molecular levels and smaller at functional levels. The data also showed that evolutionarily older TFs are more likely to have conserved target genes, whereas younger TFs tend to have larger re-wiring rates.

DNA evolution models made invaluable contributions to comparative genomic studies. Still lacking is an evolutionary model of transcription networks (TNs). To develop such a model, we had to forfeit the substitution model used in DNA evolution and to start from modeling the evolution of the regulatory relationships, and then subject the phylogenetic distance and the multi-species DNA sequence and gene expression data to one probabilistic framework. This model enabled us to infer the evolutionary changes of transcriptional regulatory relationships. Applying this model to analyze three yeast species, we found the anaerobic phenotype in two species was associated with the evolutionary loss of a larger cis-regulatory motif than previously thought. Analyzing three mammalian species, we found increasing conservation levels from genomic sequences to transcription factor-DNA interaction, gene expression, TN, and finally to morphology, suggesting that evolutionary changes are larger at molecular levels and smaller at functional levels. We also found that evolutionarily younger TFs are more likely to regulate different target genes in different species.

Carvalho AM, Oliveira AL. GRISOTTO: A greedy approach to improve combinatorial algorithms for motif discovery with prior knowledge. Algorithms Mol Biol 2011;6:13. [PMID: 21513505 PMCID: PMC3112114 DOI: 10.1186/1748-7188-6-13] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/10/2010] [Accepted: 04/22/2011] [Indexed: 11/30/2022]

Ng P, Keich U. Alignment Constrained Sampling. J Comput Biol 2011;18:155-68. [DOI: 10.1089/cmb.2010.0220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]

Chen G, Zhou Q. Heterogeneity in DNA multiple alignments: modeling, inference, and applications in motif finding. Biometrics 2011;66:694-704. [PMID: 19995355 DOI: 10.1111/j.1541-0420.2009.01362.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/31/2022]

Li G, Liu B, Ma Q, Xu Y. A new framework for identifying cis-regulatory motifs in prokaryotes. Nucleic Acids Res 2010;39:e42. [PMID: 21149261 PMCID: PMC3074163 DOI: 10.1093/nar/gkq948] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 11/24/2022]

Kishore S, Luber S, Zavolan M. Deciphering the role of RNA-binding proteins in the post-transcriptional control of gene expression. Brief Funct Genomics 2010;9:391-404. [PMID: 21127008 DOI: 10.1093/bfgp/elq028] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.6] [Indexed: 12/15/2022]

Jayaraman G, Siddharthan R. Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model. BMC Bioinformatics 2010;11:464. [PMID: 20846408 PMCID: PMC2949893 DOI: 10.1186/1471-2105-11-464] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/30/2010] [Accepted: 09/16/2010] [Indexed: 11/10/2022]

Abstract

BACKGROUND

While most multiple sequence alignment programs expect that all or most of their input is known to be homologous, and penalise insertions and deletions, this is not a reasonable assumption for non-coding DNA, which is much less strongly conserved than protein-coding genes. Arguing that the goal of sequence alignment should be the detection of homology and not similarity, we incorporate an evolutionary model into a previously published multiple sequence alignment program for non-coding DNA, Sigma, as a sensitive likelihood-based way to assess the significance of alignments. Version 1 of Sigma was successful in eliminating spurious alignments but exhibited relatively poor sensitivity on synthetic data. Sigma 1 used a p-value (the probability under the "null hypothesis" of non-homology) to assess the significance of alignments, and, optionally, a background model that captured short-range genomic correlations. Sigma version 2, described here, retains these features, but calculates the p-value using a sophisticated evolutionary model that we describe here, and also allows for a transition matrix for different substitution rates from and to different nucleotides. Our evolutionary model takes separate account of mutation and fixation, and can be extended to allow for locally differing functional constraints on sequence.

RESULTS

We demonstrate that, on real and synthetic data, Sigma-2 significantly outperforms other programs in specificity to genuine homology (that is, it minimises alignment of spuriously similar regions that do not have a common ancestry) while it is now as sensitive as the best current programs.

CONCLUSIONS

Comparing these results with an extrapolation of the best results from other available programs, we suggest that conservation rates in intergenic DNA are often significantly over-estimated. It is increasingly important to align non-coding DNA correctly, in regulatory genomics and in the context of whole-genome alignment, and Sigma-2 is an important step in that direction.

Chen K, van Nimwegen E, Rajewsky N, Siegal ML. Correlating gene expression variation with cis-regulatory polymorphism in Saccharomyces cerevisiae. Genome Biol Evol 2010;2:697-707. [PMID: 20829281 PMCID: PMC2953268 DOI: 10.1093/gbe/evq054] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 12/12/2022]

Sahota G, Stormo GD. Novel sequence-based method for identifying transcription factor binding sites in prokaryotic genomes. Bioinformatics 2010;26:2672-7. [PMID: 20807838 DOI: 10.1093/bioinformatics/btq501] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]

Baumbach J. On the power and limits of evolutionary conservation--unraveling bacterial gene regulatory networks. Nucleic Acids Res 2010;38:7877-84. [PMID: 20699275 PMCID: PMC3001071 DOI: 10.1093/nar/gkq699] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 11/12/2022]

Zhu XG, Shan L, Wang Y, Quick WP. C4 rice - an ideal arena for systems biology research. J Integr Plant Biol 2010;52:762-70. [PMID: 20666931 DOI: 10.1111/j.1744-7909.2010.00983.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 05/05/2023]

Genome-wide identification of cis-regulatory motifs and modules underlying gene coregulation using statistics and phylogeny. Proc Natl Acad Sci U S A 2010;107:14615-20. [PMID: 20671200 DOI: 10.1073/pnas.1002876107] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 01/14/2023]

Evans KJ. Most transcription factor binding sites are in a few mosaic classes of the human genome. BMC Genomics 2010;11:286. [PMID: 20459624 PMCID: PMC2881025 DOI: 10.1186/1471-2164-11-286] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2010] [Accepted: 05/06/2010] [Indexed: 12/02/2022] Open

Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P, Rothballer A, Ascano M, Jungkamp AC, Munschauer M, Ulrich A, Wardle GS, Dewell S, Zavolan M, Tuschl T. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 2010;141:129-41. [PMID: 20371350 DOI: 10.1016/j.cell.2010.03.009] [Citation(s) in RCA: 2184] [Impact Index Per Article: 156.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2009] [Revised: 01/11/2010] [Accepted: 02/27/2010] [Indexed: 12/17/2022]

Bailey TL, Bodén M, Whitington T, Machanick P. The value of position-specific priors in motif discovery using MEME. BMC Bioinformatics 2010;11:179. [PMID: 20380693 PMCID: PMC2868008 DOI: 10.1186/1471-2105-11-179] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2010] [Accepted: 04/09/2010] [Indexed: 11/23/2022] Open

Gordân R, Narlikar L, Hartemink AJ. Finding regulatory DNA motifs using alignment-free evolutionary conservation information. Nucleic Acids Res 2010;38:e90. [PMID: 20047961 PMCID: PMC2847231 DOI: 10.1093/nar/gkp1166] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2009] [Revised: 10/30/2009] [Accepted: 11/23/2009] [Indexed: 01/01/2023] Open

Jiang L, Pearson JC, Crews ST. Diverse modes of Drosophila tracheal fusion cell transcriptional regulation. Mech Dev 2010;127:265-80. [PMID: 20347970 DOI: 10.1016/j.mod.2010.03.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2009] [Revised: 03/18/2010] [Accepted: 03/21/2010] [Indexed: 10/19/2022]

Siddharthan R. Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix. PLoS One 2010;5:e9722. [PMID: 20339533 PMCID: PMC2842295 DOI: 10.1371/journal.pone.0009722] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2009] [Accepted: 02/26/2010] [Indexed: 01/27/2023] Open

Abstract

Background

Identifying transcription factor binding sites (TFBS) in silico is key in understanding gene regulation. TFBS are string patterns that exhibit some variability, commonly modelled as “position weight matrices” (PWMs). Though convenient, the PWM has significant limitations, in particular the assumed independence of positions within the binding motif; and predictions based on PWMs are usually not very specific to known functional sites. Analysis here on binding sites in yeast suggests that correlation of dinucleotides is not limited to near-neighbours, but can extend over considerable gaps.

Methodology/Principal Findings

I describe a straightforward generalization of the PWM model that considers frequencies of dinucleotides instead of individual nucleotides. Unlike previous efforts, this method considers all dinucleotides within an extended binding region, and does not attempt to determine a priori the significance of particular dinucleotide correlations. I describe how to use a “dinucleotide weight matrix” (DWM) to predict binding sites, dealing in particular with the complication that its entries are not independent probabilities. Benchmarks show, for many factors, a dramatic improvement over PWMs in precision of predicting known targets. In most cases, significant further improvement arises by extending the commonly defined “core motifs” by about 10 bp on either side. Though this flanking sequence shows no strong motif at the nucleotide level, the predictive power of the dinucleotide model suggests that the “signature” in DNA sequence of protein-binding affinity extends beyond the core protein-DNA contact region.
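
The contrast between per-position and dinucleotide frequencies described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's scoring method: the function names, the pseudocount of 1, and the six toy sites are all assumptions made for illustration, and the sketch only tabulates frequencies. It does not address how the paper handles the non-independence of DWM entries.

```python
from collections import Counter
from itertools import combinations

ALPHABET = "ACGT"

def position_frequencies(sites, pseudocount=1.0):
    """Per-position base frequencies (a simple PWM) from equal-length aligned sites."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pwm

def dinucleotide_frequencies(sites, pseudocount=1.0):
    """Frequencies of base pairs at every position pair i < j, not only neighbours."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET) ** 2
    table = {}
    for i, j in combinations(range(width), 2):
        counts = Counter((site[i], site[j]) for site in sites)
        table[(i, j)] = {(a, b): (counts[(a, b)] + pseudocount) / total
                         for a in ALPHABET for b in ALPHABET}
    return table

# Hypothetical toy sites: positions 1 and 2 are always "CG" or "GC".
sites = ["ACGT", "ACGA", "TCGT", "AGCT", "AGCA", "TGCT"]
pwm = position_frequencies(sites)
dwm = dinucleotide_frequencies(sites)

# Under the PWM's independence assumption, the pair probability is a product.
independent = pwm[1]["C"] * pwm[2]["G"]
# The dinucleotide table records the pair frequency directly.
joint = dwm[(1, 2)][("C", "G")]
print(f"independent estimate: {independent:.3f}, observed pair frequency: {joint:.3f}")
```

In this toy data the observed pair frequency exceeds the product of the single-position frequencies, which is the kind of correlation a PWM cannot represent.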

Conclusion/Significance

While computationally more demanding and slower than PWM-based approaches, this dinucleotide method is straightforward, both conceptually and in implementation, and can serve as a basis for future improvements.


Cenik C, Derti A, Mellor JC, Berriz GF, Roth FP. Genome-wide functional analysis of human 5' untranslated region introns. Genome Biol 2010;11:R29. [PMID: 20222956 PMCID: PMC2864569 DOI: 10.1186/gb-2010-11-3-r29] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2010] [Accepted: 03/11/2010] [Indexed: 12/19/2022] Open

Georgiev S, Boyle AP, Jayasurya K, Ding X, Mukherjee S, Ohler U. Evidence-ranked motif identification. Genome Biol 2010;11:R19. [PMID: 20156354 PMCID: PMC2872879 DOI: 10.1186/gb-2010-11-2-r19] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2009] [Revised: 09/30/2009] [Accepted: 02/15/2010] [Indexed: 11/13/2022] Open

The effect of orthology and coregulation on detecting regulatory motifs. PLoS One 2010;5:e8938. [PMID: 20140085 PMCID: PMC2815771 DOI: 10.1371/journal.pone.0008938] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2009] [Accepted: 01/05/2010] [Indexed: 11/19/2022] Open

Sleumer MC, Mah AK, Baillie DL, Jones SJM. Conserved elements associated with ribosomal genes and their trans-splice acceptor sites in Caenorhabditis elegans. Nucleic Acids Res 2010;38:2990-3004. [PMID: 20100800 PMCID: PMC2875031 DOI: 10.1093/nar/gkq003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023] Open

Won KJ, Ren B, Wang W. Genome-wide prediction of transcription factor binding sites using an integrated model. Genome Biol 2010;11:R7. [PMID: 20096096 PMCID: PMC2847719 DOI: 10.1186/gb-2010-11-1-r7] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2009] [Revised: 10/30/2009] [Accepted: 01/22/2010] [Indexed: 12/19/2022] Open

Reid JE, Evans KJ, Dyer N, Wernisch L, Ott S. Variable structure motifs for transcription factor binding sites. BMC Genomics 2010;11:30. [PMID: 20074339 PMCID: PMC2824720 DOI: 10.1186/1471-2164-11-30] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2009] [Accepted: 01/14/2010] [Indexed: 02/06/2023] Open

Abstract

Background

Classically, models of DNA-transcription factor binding sites (TFBSs) have been based on relatively few known instances and have treated them as sites of fixed length using position weight matrices (PWMs). Various extensions to this model have been proposed, most of which take account of dependencies between the bases in the binding sites. However, some transcription factors are known to exhibit some flexibility and bind to DNA in more than one possible physical configuration. In some cases this variation is known to affect the function of binding sites. With the increasing volume of ChIP-seq data available it is now possible to investigate models that incorporate this flexibility. Previous work on variable length models has been constrained by: a focus on specific zinc finger proteins in yeast using restrictive models; a reliance on hand-crafted models for just one transcription factor at a time; and a lack of evaluation on realistically sized data sets.

Results

We re-analysed binding sites from the TRANSFAC database and found motivating examples where our new variable length model provides a better fit. We analysed several ChIP-seq data sets with a novel motif search algorithm and compared the results to one of the best standard PWM finders and a recently developed alternative method for finding motifs of variable structure. All the methods performed comparably in held-out cross validation tests. Known motifs of variable structure were recovered for p53, Stat5a and Stat5b. In addition our method recovered a novel generalised version of an existing PWM for Sp1 that allows for variable length binding. This motif improved classification performance.

Conclusions

We have presented a new gapped PWM model for variable length DNA binding sites that is neither too restrictive nor over-parameterised. Our comparison with existing tools shows that on average it does not have better predictive accuracy than existing methods. However, it does provide more interpretable models of motifs of variable structure that are suitable for follow-up structural studies. To our knowledge, we are the first to apply variable length motif models to eukaryotic ChIP-seq data sets and consequently the first to show their value in this domain. The results include a novel motif for the ubiquitous transcription factor Sp1.
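
One simple way to picture a variable-length motif of the kind this abstract discusses is a PWM in which a single column may be skipped. The Python sketch below is an assumption-laden illustration, not the authors' gapped PWM model: the helper names, the 0.7/0.1 column probabilities, and the choice to take the better of the two alignments are all hypothetical.

```python
import math

def log_score(columns, seq):
    """Log-probability of seq under independent columns; -inf if lengths differ."""
    if len(columns) != len(seq):
        return float("-inf")
    return sum(math.log(col[base]) for col, base in zip(columns, seq))

def gapped_log_score(columns, optional_index, seq):
    """Best score of seq when the column at optional_index may be present or skipped."""
    with_column = log_score(columns, seq)
    shortened = columns[:optional_index] + columns[optional_index + 1:]
    without_column = log_score(shortened, seq)
    return max(with_column, without_column)

def column(preferred):
    """Hypothetical column: 0.7 for the preferred base, 0.1 for each other base."""
    return {b: (0.7 if b == preferred else 0.1) for b in "ACGT"}

# Toy motif G-C-(A)-T, where the third column is optional.
motif = [column("G"), column("C"), column("A"), column("T")]
long_site = gapped_log_score(motif, 2, "GCAT")   # scored against all four columns
short_site = gapped_log_score(motif, 2, "GCT")   # scored with the optional column skipped
print(long_site, short_site)
```

Both the four-base and the three-base site match the motif's preferred bases here, whereas a fixed-width PWM could score only one of the two lengths.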


He X, Sinha S. Evolution of cis-regulatory sequences in Drosophila. Methods Mol Biol 2010;674:283-296. [PMID: 20827599 DOI: 10.1007/978-1-60761-854-6_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]

Ladunga I. An overview of the computational analyses and discovery of transcription factor binding sites. Methods Mol Biol 2010;674:1-22. [PMID: 20827582 DOI: 10.1007/978-1-60761-854-6_1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]

Ho ES, Jakubowski CD, Gunderson SI. iTriplet, a rule-based nucleic acid sequence motif finder. Algorithms Mol Biol 2009;4:14. [PMID: 19874606 PMCID: PMC2784457 DOI: 10.1186/1748-7188-4-14] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2009] [Accepted: 10/29/2009] [Indexed: 12/29/2022] Open


Discovering multiple realistic TFBS motifs based on a generalized model. BMC Bioinformatics 2009;10:321. [PMID: 19811641 PMCID: PMC2770069 DOI: 10.1186/1471-2105-10-321] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2009] [Accepted: 10/07/2009] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Identification of transcription factor binding sites (TFBSs) is a central problem in the bioinformatics of gene regulation. De novo motif discovery serves as a promising way to predict and better understand TFBSs for biological verification. Real TFBSs of a motif may vary in their widths and their degrees of conservation within a certain range. Fixing a single motif width, as existing models do, may be biased and misleading. Additionally, multiple, possibly overlapping, candidate motifs are desired and necessary for biological verification in practice. However, current techniques either prohibit overlapping TFBSs or lack explicit control of different motifs.

RESULTS

We propose a new generalized model to tackle the motif widths by considering and evaluating a width range of interest simultaneously, which should better address the width uncertainty. Moreover, a meta-convergence framework for genetic algorithms (GAs) is proposed to provide multiple overlapping optimal motifs simultaneously in an effective and flexible way. Users can easily specify the difference amongst expected motif kinds via a similarity test. Incorporating Genetic Algorithm with Local Filtering (GALF) for searching, the new GALF-G (G for generalized) algorithm is proposed based on the generalized model and meta-convergence framework.

CONCLUSION

GALF-G was tested extensively on over 970 synthetic, real and benchmark datasets, and is usually better than the state-of-the-art methods. The range model shows an increase in sensitivity compared with the single-width ones, while providing competitive precision on the E. coli benchmark. Effectiveness can be maintained even using a very small population, exhibiting very competitive efficiency. In discovering multiple overlapping motifs in a real liver-specific dataset, GALF-G outperforms MEME by up to 73% in overall F-scores. GALF-G also helps to discover an additional motif which has probably not been annotated in the dataset. http://www.cse.cuhk.edu.hk/%7Etmchan/GALFG/
