Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Siddharthan R, Siggia ED, van Nimwegen E. PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny. PLoS Comput Biol 2005;1:e67. [PMID: 16477324 PMCID: PMC1309704 DOI: 10.1371/journal.pcbi.0010067] [Citation(s) in RCA: 176] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2005] [Accepted: 10/28/2005] [Indexed: 12/27/2022] Open

For:	Siddharthan R, Siggia ED, van Nimwegen E. PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny. PLoS Comput Biol 2005;1:e67. [PMID: 16477324 PMCID: PMC1309704 DOI: 10.1371/journal.pcbi.0010067] [Citation(s) in RCA: 176] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2005] [Accepted: 10/28/2005] [Indexed: 12/27/2022] Open

Number

Cited by Other Article(s)

Torres-Tiji Y, Sethuram H, Gupta A, McCauley J, Dutra-Molino JV, Pathania R, Saxton L, Kang K, Hillson NJ, Mayfield SP. Bioinformatic Prediction and High Throughput In Vivo Screening to Identify Cis-Regulatory Elements for the Development of Algal Synthetic Promoters. ACS Synth Biol 2024;13:2150-2165. [PMID: 38986010 DOI: 10.1021/acssynbio.4c00199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/12/2024]

Selvakumar P, Siddharthan R. Position-specific evolution in transcription factor binding sites, and a fast likelihood calculation for the F81 model. ROYAL SOCIETY OPEN SCIENCE 2024;11:231088. [PMID: 38269075 PMCID: PMC10805598 DOI: 10.1098/rsos.231088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/26/2023] [Accepted: 12/20/2023] [Indexed: 01/26/2024]

Katsantoni M, van Nimwegen E, Zavolan M. Improved analysis of (e)CLIP data with RCRUNCH yields a compendium of RNA-binding protein binding sites and motifs. Genome Biol 2023;24:77. [PMID: 37069586 PMCID: PMC10108518 DOI: 10.1186/s13059-023-02913-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2022] [Accepted: 03/29/2023] [Indexed: 04/19/2023] Open

Danan C, Manickavel S, Hafner M. PAR-CLIP: A Method for Transcriptome-Wide Identification of RNA Binding Protein Interaction Sites. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2021;2404:167-188. [PMID: 34694609 DOI: 10.1007/978-1-0716-1851-6_9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]

Hafner M, Katsantoni M, Köster T, Marks J, Mukherjee J, Staiger D, Ule J, Zavolan M. CLIP and complementary methods. ACTA ACUST UNITED AC 2021. [DOI: 10.1038/s43586-021-00018-1] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]

Carazo F, Romero JP, Rubio A. Upstream analysis of alternative splicing: a review of computational approaches to predict context-dependent splicing factors. Brief Bioinform 2020;20:1358-1375. [PMID: 29390045 DOI: 10.1093/bib/bby005] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2017] [Revised: 12/14/2017] [Indexed: 12/13/2022] Open

Sankaranarayanan SR, Ianiri G, Coelho MA, Reza MH, Thimmappa BC, Ganguly P, Vadnala RN, Sun S, Siddharthan R, Tellgren-Roth C, Dawson TL, Heitman J, Sanyal K. Loss of centromere function drives karyotype evolution in closely related Malassezia species. eLife 2020;9:e53944. [PMID: 31958060 PMCID: PMC7025860 DOI: 10.7554/elife.53944] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2019] [Accepted: 01/20/2020] [Indexed: 12/14/2022] Open

Agrawal A, Sambare SV, Narlikar L, Siddharthan R. THiCweed: fast, sensitive detection of sequence features by clustering big datasets. Nucleic Acids Res 2019;46:e29. [PMID: 29267972 PMCID: PMC5861420 DOI: 10.1093/nar/gkx1251] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2017] [Accepted: 12/01/2017] [Indexed: 11/19/2022] Open

Berger S, Pachkov M, Arnold P, Omidi S, Kelley N, Salatino S, van Nimwegen E. Crunch: integrated processing and modeling of ChIP-seq data in terms of regulatory motifs. Genome Res 2019;29:1164-1177. [PMID: 31138617 PMCID: PMC6633267 DOI: 10.1101/gr.239319.118] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2018] [Accepted: 05/14/2019] [Indexed: 01/10/2023]

Lu J, Cao X, Zhong S. A likelihood approach to testing hypotheses on the co-evolution of epigenome and genome. PLoS Comput Biol 2018;14:e1006673. [PMID: 30586383 PMCID: PMC6324829 DOI: 10.1371/journal.pcbi.1006673] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2018] [Revised: 01/08/2019] [Accepted: 11/26/2018] [Indexed: 01/03/2023] Open

Abstract

Central questions to epigenome evolution include whether interspecies changes of histone modifications are independent of evolutionary changes of DNA, and if there is dependence whether they depend on any specific types of DNA sequence changes. Here, we present a likelihood approach for testing hypotheses on the co-evolution of genome and histone modifications. The gist of this approach is to convert evolutionary biology hypotheses into probabilistic forms, by explicitly expressing the joint probability of multispecies DNA sequences and histone modifications, which we refer to as a class of Joint Evolutionary Model for the Genome and the Epigenome (JEMGE). JEMGE can be summarized as a mixture model of four components representing four evolutionary hypotheses, namely dependence and independence of interspecies epigenomic variations to underlying sequence substitutions and to underlying sequence insertions and deletions (indels). We implemented a maximum likelihood method to fit the models to the data. Based on comparison of likelihoods, we inferred whether interspecies epigenomic variations depended on substitution or indels in local genomic sequences based on DNase hypersensitivity and spermatid H3K4me3 ChIP-seq data from human and rhesus macaque. Approximately 5.5% of homologous regions in the genomes exhibited H3K4me3 modification in either species, among which approximately 67% homologous regions exhibited local-sequence-dependent interspecies H3K4me3 variations. Substitutions accounted for less local-sequence-dependent H3K4me3 variations than indels. Among transposon-mediated indels, ERV1 insertions and L1 insertions were most strongly associated with H3K4me3 gains and losses, respectively. By initiating probabilistic formulation on the co-evolution of genomes and epigenomes, JEMGE helps to bring evolutionary biology principles to comparative epigenomic studies.

Epigenetic modifications play a significant role in gene regulations and thus heavily influence phenotypic outcomes. Whereas cross-species epigenomic comparisons have been fruitful in revealing the function of epigenetic modifications, it still remains unclear how the epigenome changes across species. A central question in epigenome evolution studies is whether interspecies epigenomic variations rely on genomic changes in cis and, if partially yes, whether different genomic changes have distinct impacts. To tackle this question, we initiated a likelihood-based approach, in which different hypotheses related to the co-evolution of the genome and the epigenome could be converted into probabilistic models. By fitting the models to actual data, each model yielded a likelihood, and the hypothesis corresponded to the largest likelihood was selected as most supported by observed data. In this work, we focused on the influence of two types of underlying sequence changes: substitutions, and insertions and deletions (indels). We quantitatively assessed the dependence of H3K4me3 variations on substitutions and indels between human and rhesus, and separated their relative impacts within each genomic region with H3K4me3. The methodology presented here provides a framework for modeling the epigenome together with the genome and a quantitative approach to test different evolutionary hypotheses.

Collapse

Dempster-Shafer Theory for the Prediction of Auxin-Response Elements (AuxREs) in Plant Genomes. BIOMED RESEARCH INTERNATIONAL 2018;2018:3837060. [PMID: 30515394 PMCID: PMC6236769 DOI: 10.1155/2018/3837060] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/17/2018] [Accepted: 10/15/2018] [Indexed: 11/17/2022]

Dotu I, Adamson SI, Coleman B, Fournier C, Ricart-Altimiras E, Eyras E, Chuang JH. SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data. PLoS Comput Biol 2018;14:e1006078. [PMID: 29596423 PMCID: PMC5892938 DOI: 10.1371/journal.pcbi.1006078] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2017] [Revised: 04/10/2018] [Accepted: 03/05/2018] [Indexed: 12/02/2022] Open

Abstract

RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited; in particular, identification of RNA motifs that bind proteins has long been challenging, especially when such motifs depend on both sequence and structure. Moreover, although RNA binding proteins (RBPs) often contain more than one binding domain, algorithms capable of identifying more than one binding motif simultaneously have not been developed. In this paper we present a novel pipeline to determine binding peaks in crosslinking immunoprecipitation (CLIP) data, to discover multiple possible RNA sequence/structure motifs among them, and to experimentally validate such motifs. At the core is a new semi-automatic algorithm SARNAclust, the first unsupervised method to identify and deconvolve multiple sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. Application of SARNAclust to synthetic data shows its capability of clustering 5 motifs at once with a V-measure value of over 0.95, while GraphClust achieves only a V-measure of 0.083 and RNAcontext cannot detect any of the motifs. When applied to existing eCLIP sets, SARNAclust finds known motifs for SLBP and HNRNPC and novel motifs for several other RBPs such as AGGF1, AKAP8L and ILF3. We demonstrate an experimental validation protocol, a targeted Bind-n-Seq-like high-throughput sequencing approach that relies on RNA inverse folding for oligo pool design, that can validate the components within the SLBP motif. Finally, we use this protocol to experimentally interrogate the SARNAclust motif predictions for protein ILF3. Our results support a newly identified partially double-stranded UUUUUGAGA motif similar to that known for the splicing factor HNRNPC.

RNA-protein binding is critical to gene regulation, and aberrant RNA-protein interactions play a role in a wide variety of diseases. However, molecular understanding of these interactions remains limited because of the difficulty of ascertaining the motifs that bind each protein. To address this challenge, we have developed a novel algorithm, SARNAclust, to computationally identify combined structure/sequence motifs from immunoprecipitation data. SARNAclust can deconvolve multiple motifs simultaneously and determine the importance of specific features through a graph kernel and bulge graph formalism. We have verified SARNAclust to be effective on synthetic motif data and also tested it on ENCODE eCLIP datasets, identifying known motifs and novel predictions. We have experimentally validated SARNAclust for two proteins, SLBP and ILF3, using RNA Bind-n-Seq measurements. Applying SARNAclust to ENCODE data provides new evidence for previously unknown regulatory interactions, notably splicing co-regulation by ILF3 and the splicing factor hnRNPC.

Collapse

Nettling M, Treutler H, Cerquides J, Grosse I. Unrealistic phylogenetic trees may improve phylogenetic footprinting. Bioinformatics 2018;33:1639-1646. [PMID: 28130227 PMCID: PMC5447242 DOI: 10.1093/bioinformatics/btx033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2016] [Accepted: 01/19/2017] [Indexed: 01/10/2023] Open

Caldonazzo Garbelini JM, Kashiwabara AY, Sanches DS. Sequence motif finder using memetic algorithm. BMC Bioinformatics 2018;19:4. [PMID: 29298679 PMCID: PMC5751424 DOI: 10.1186/s12859-017-2005-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2017] [Accepted: 12/18/2017] [Indexed: 11/10/2022] Open

Fu H, Zhang X. Noncoding Variants Functional Prioritization Methods Based on Predicted Regulatory Factor Binding Sites. Curr Genomics 2017;18:322-331. [PMID: 29081688 PMCID: PMC5635616 DOI: 10.2174/1389202918666170228143619] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2016] [Revised: 10/16/2016] [Accepted: 11/02/2016] [Indexed: 12/31/2022] Open

Abstract

BACKGROUNDS

With the advent of the post genomic era, the research for the genetic mechanism of the diseases has found to be increasingly depended on the studies of the genes, the gene-networks and gene-protein interaction networks. To explore gene expression and regulation, the researchers have carried out many studies on transcription factors and their binding sites (TFBSs). Based on the large amount of transcription factor binding sites predicting values in the deep learning models, further computation and analysis have been done to reveal the relationship between the gene mutation and the occurrence of the disease. It has been demonstrated that based on the deep learning methods, the performances of the prediction for the functions of the noncoding variants are outperforming than those of the conventional methods. The research on the prediction for functions of Single Nucleotide Polymorphisms (SNPs) is expected to uncover the mechanism of the gene mutation affection on traits and diseases of human beings.

RESULTS

We reviewed the conventional TFBSs identification methods from different perspectives. As for the deep learning methods to predict the TFBSs, we discussed the related problems, such as the raw data preprocessing, the structure design of the deep convolution neural network (CNN) and the model performance measure et al. And then we summarized the techniques that usually used in finding out the functional noncoding variants from de novo sequence.

CONCLUSION

Along with the rapid development of the high-throughout assays, more and more sample data and chromatin features would be conducive to improve the prediction accuracy of the deep convolution neural network for TFBSs identification. Meanwhile, getting more insights into the deep CNN framework itself has been proved useful for both the promotion on model performance and the development for more suitable design to sample data. Based on the feature values predicted by the deep CNN model, the prioritization model for functional noncoding variants would contribute to reveal the affection of gene mutation on the diseases.

Collapse

Omidi S, Zavolan M, Pachkov M, Breda J, Berger S, van Nimwegen E. Automated incorporation of pairwise dependency in transcription factor binding site prediction using dinucleotide weight tensors. PLoS Comput Biol 2017;13:e1005176. [PMID: 28753602 PMCID: PMC5550003 DOI: 10.1371/journal.pcbi.1005176] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2016] [Revised: 08/09/2017] [Accepted: 06/02/2017] [Indexed: 11/17/2022] Open

Abstract

Gene regulatory networks are ultimately encoded by the sequence-specific binding of (TFs) to short DNA segments. Although it is customary to represent the binding specificity of a TF by a position-specific weight matrix (PSWM), which assumes each position within a site contributes independently to the overall binding affinity, evidence has been accumulating that there can be significant dependencies between positions. Unfortunately, methodological challenges have so far hindered the development of a practical and generally-accepted extension of the PSWM model. On the one hand, simple models that only consider dependencies between nearest-neighbor positions are easy to use in practice, but fail to account for the distal dependencies that are observed in the data. On the other hand, models that allow for arbitrary dependencies are prone to overfitting, requiring regularization schemes that are difficult to use in practice for non-experts. Here we present a new regulatory motif model, called dinucleotide weight tensor (DWT), that incorporates arbitrary pairwise dependencies between positions in binding sites, rigorously from first principles, and free from tunable parameters. We demonstrate the power of the method on a large set of ChIP-seq data-sets, showing that DWTs outperform both PSWMs and motif models that only incorporate nearest-neighbor dependencies. We also demonstrate that DWTs outperform two previously proposed methods. Finally, we show that DWTs inferred from ChIP-seq data also outperform PSWMs on HT-SELEX data for the same TF, suggesting that DWTs capture inherent biophysical properties of the interactions between the DNA binding domains of TFs and their binding sites. We make a suite of DWT tools available at dwt.unibas.ch, that allow users to automatically perform ‘motif finding’, i.e. the inference of DWT motifs from a set of sequences, binding site prediction with DWTs, and visualization of DWT ‘dilogo’ motifs.

Gene regulatory networks are ultimately encoded in constellations of short binding sites in the DNA and RNA that are recognized by regulatory factors such as transcription factors (TFs). For several decades, computational analysis of regulatory networks has relied on a model of TF sequence-specificity, the position-specific weight-matrix (PSWM), that assumes different positions in a binding site contribute independently to the total binding energy of the TF. However, in recent years evidence has been accumulating that, at least for some TFs, this assumption does not hold. Here we present a new model for the sequence-specificity of TFs, the dinucleotide weight tensor (DWT), that takes arbitrary dependencies between positions in binding sites into account and show that it consistently outperforms PSWMs on high-throughput datasets on TF binding. Moreover, in contrast to previous approaches, DWTs are directly derived from first principles within a Bayesian framework, and contain no tunable parameters. This allows them to be easily applied in practice and we make a suite of tools available for computational analysis with DWTs.

Collapse

Recurrent rewiring and emergence of RNA regulatory networks. Proc Natl Acad Sci U S A 2017;114:E2816-E2825. [PMID: 28320951 DOI: 10.1073/pnas.1617777114] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Liu B, Yang J, Li Y, McDermaid A, Ma Q. An algorithmic perspective of de novo cis-regulatory motif finding based on ChIP-seq data. Brief Bioinform 2017;19:1069-1081. [DOI: 10.1093/bib/bbx026] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2016] [Indexed: 01/06/2023] Open

Nettling M, Treutler H, Cerquides J, Grosse I. Combining phylogenetic footprinting with motif models incorporating intra-motif dependencies. BMC Bioinformatics 2017;18:141. [PMID: 28249564 PMCID: PMC5333389 DOI: 10.1186/s12859-017-1495-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2016] [Accepted: 01/24/2017] [Indexed: 11/23/2022] Open

Abstract

Background

Transcriptional gene regulation is a fundamental process in nature, and the experimental and computational investigation of DNA binding motifs and their binding sites is a prerequisite for elucidating this process. Approaches for de-novo motif discovery can be subdivided in phylogenetic footprinting that takes into account phylogenetic dependencies in aligned sequences of more than one species and non-phylogenetic approaches based on sequences from only one species that typically take into account intra-motif dependencies. It has been shown that modeling (i) phylogenetic dependencies as well as (ii) intra-motif dependencies separately improves de-novo motif discovery, but there is no approach capable of modeling both (i) and (ii) simultaneously.

Results

Here, we present an approach for de-novo motif discovery that combines phylogenetic footprinting with motif models capable of taking into account intra-motif dependencies. We study the degree of intra-motif dependencies inferred by this approach from ChIP-seq data of 35 transcription factors. We find that significant intra-motif dependencies of orders 1 and 2 are present in all 35 datasets and that intra-motif dependencies of order 2 are typically stronger than those of order 1. We also find that the presented approach improves the classification performance of phylogenetic footprinting in all 35 datasets and that incorporating intra-motif dependencies of order 2 yields a higher classification performance than incorporating such dependencies of only order 1.

Conclusion

Combining phylogenetic footprinting with motif models incorporating intra-motif dependencies leads to an improved performance in the classification of transcription factor binding sites. This may advance our understanding of transcriptional gene regulation and its evolution.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-017-1495-1) contains supplementary material, which is available to authorized users.

Collapse

Furnholm T, Rehan M, Wishart J, Tisa LS. Pb2+ tolerance by Frankia sp. strain EAN1pec involves surface-binding. MICROBIOLOGY-SGM 2017;163:472-487. [PMID: 28141503 DOI: 10.1099/mic.0.000439] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Liu B, Zhang H, Zhou C, Li G, Fennell A, Wang G, Kang Y, Liu Q, Ma Q. An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes. BMC Genomics 2016;17:578. [PMID: 27507169 PMCID: PMC4977642 DOI: 10.1186/s12864-016-2982-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2016] [Accepted: 07/29/2016] [Indexed: 11/10/2022] Open

Abstract

Background

Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve slower than their surrounding non-functional sequences. Its application, however, has several difficulties for optimizing the selection of orthologous data and reducing the false positives in motif prediction.

Results

Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP³). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to Escherichia coli k12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP³ consistently outperformed other popular motif finding tools. We have integrated MP³ into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes.

Conclusion

The performance evaluation indicated that MP³ is effective for predicting regulatory motifs in prokaryotic genomes. Its application may enhance progress in elucidating transcription regulation mechanism, thus provide benefit to the genomic research community and prokaryotic genome researchers in particular.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-016-2982-x) contains supplementary material, which is available to authorized users.

Collapse

Danan C, Manickavel S, Hafner M. PAR-CLIP: A Method for Transcriptome-Wide Identification of RNA Binding Protein Interaction Sites. Methods Mol Biol 2016;1358:153-73. [PMID: 26463383 PMCID: PMC5142217 DOI: 10.1007/978-1-4939-3067-8_10] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/17/2023]

Yen IEH, Lin X, Zhang J, Ravikumar P, Dhillon IS. A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery. JMLR WORKSHOP AND CONFERENCE PROCEEDINGS 2016;48:2272-2280. [PMID: 27559428 PMCID: PMC4993214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]

Davies NJ, Krusche P, Tauber E, Ott S. Analysis of 5' gene regions reveals extraordinary conservation of novel non-coding sequences in a wide range of animals. BMC Evol Biol 2015;15:227. [PMID: 26482678 PMCID: PMC4613772 DOI: 10.1186/s12862-015-0499-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2015] [Accepted: 09/28/2015] [Indexed: 01/20/2023] Open

Abstract

Background

Phylogenetic footprinting is a comparative method based on the principle that functional sequence elements will acquire fewer mutations over time than non-functional sequences. Successful comparisons of distantly related species will thus yield highly important sequence elements likely to serve fundamental biological roles. RNA regulatory elements are less well understood than those in DNA. In this study we use the emerging model organism Nasonia vitripennis, a parasitic wasp, in a comparative analysis against 12 insect genomes to identify deeply conserved non-coding elements (CNEs) conserved in large groups of insects, with a focus on 5’ UTRs and promoter sequences.

Results

We report the identification of 322 CNEs conserved across a broad range of insect orders. The identified regions are associated with regulatory and developmental genes, and contain short footprints revealing aspects of their likely function in translational regulation. The most ancient regions identified in our analysis were all found to overlap transcribed regions of genes, reflecting stronger conservation of translational regulatory elements than transcriptional elements. Further expanding sequence analyses to non-insect species we also report the discovery of, to our knowledge, the two oldest and most ubiquitous CNE’s yet described in the animal kingdom (700 MYA). These ancient conserved non-coding elements are associated with the two ribosomal stalk genes, RPLP1 and RPLP2, and were very likely functional in some of the earliest animals.

Conclusions

We report the identification of the most deeply conserved CNE’s found to date, and several other deeply conserved elements which are without exception, part of 5’ untranslated regions of transcripts, and occur in a number of key translational regulatory genes, highlighting translational regulation of translational regulators as a conserved feature of insect genomes.

Electronic supplementary material

The online version of this article (doi:10.1186/s12862-015-0499-6) contains supplementary material, which is available to authorized users.

Collapse

Bespoke RNA recognition by Pumilios. Biochem Soc Trans 2015;43:801-6. [DOI: 10.1042/bst20150072] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]

Thompson D, Regev A, Roy S. Comparative analysis of gene regulatory networks: from network reconstruction to evolution. Annu Rev Cell Dev Biol 2015;31:399-428. [PMID: 26355593 DOI: 10.1146/annurev-cellbio-100913-012908] [Citation(s) in RCA: 95] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Benner P, Bačák M, Bourguignon PY. Point estimates in phylogenetic reconstructions. Bioinformatics 2015;30:i534-40. [PMID: 25161244 PMCID: PMC4147914 DOI: 10.1093/bioinformatics/btu461] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Taher L, Narlikar L, Ovcharenko I. Identification and computational analysis of gene regulatory elements. Cold Spring Harb Protoc 2015;2015:pdb.top083642. [PMID: 25561628 PMCID: PMC5885252 DOI: 10.1101/pdb.top083642] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]

Cook KB, Hughes TR, Morris QD. High-throughput characterization of protein-RNA interactions. Brief Funct Genomics 2014;14:74-89. [PMID: 25504152 PMCID: PMC4303715 DOI: 10.1093/bfgp/elu047] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023] Open

Reyes-Herrera PH, Ficarra E. Computational Methods for CLIP-seq Data Processing. Bioinform Biol Insights 2014;8:199-207. [PMID: 25336930 PMCID: PMC4196881 DOI: 10.4137/bbi.s16803] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2014] [Revised: 07/29/2014] [Accepted: 08/01/2014] [Indexed: 12/25/2022] Open

Baresic M, Salatino S, Kupr B, van Nimwegen E, Handschin C. Transcriptional network analysis in muscle reveals AP-1 as a partner of PGC-1α in the regulation of the hypoxic gene program. Mol Cell Biol 2014;34:2996-3012. [PMID: 24912679 PMCID: PMC4135604 DOI: 10.1128/mcb.01710-13] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2013] [Revised: 01/26/2014] [Accepted: 06/03/2014] [Indexed: 12/16/2022] Open

iRegulon: from a gene list to a gene regulatory network using large motif and track collections. PLoS Comput Biol 2014;10:e1003731. [PMID: 25058159 PMCID: PMC4109854 DOI: 10.1371/journal.pcbi.1003731] [Citation(s) in RCA: 606] [Impact Index Per Article: 60.6] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2014] [Accepted: 05/27/2014] [Indexed: 01/17/2023] Open

Abstract

Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, followed by RNA-seq and ChIP-seq, and could identify an extensive up-regulated network controlled directly by p53. Similarly we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org.

Gene regulatory networks control developmental, homeostatic, and disease processes by governing precise levels and spatio-temporal patterns of gene expression. Determining their topology can provide mechanistic insight into these processes. Gene regulatory networks consist of interactions between transcription factors and their direct target genes. Each regulatory interaction represents the binding of the transcription factor to a specific DNA binding site near its target gene. Here we present a computational method, called iRegulon, to identify master regulators and direct target genes in a human gene signature, i.e. a set of co-expressed genes. iRegulon relies on the analysis of the regulatory sequences around each gene in the gene set to detect enriched TF motifs or ChIP-seq peaks, using databases of nearly 10.000 TF motifs and 1000 ChIP-seq data sets or “tracks”. Next, it associates enriched motifs and tracks with candidate transcription factors and determines the optimal subset of direct target genes. We validate iRegulon on ENCODE data, and use it in combination with RNA-seq and ChIP-seq data to map a p53 downstream network with new predicted co-factors and targets. iRegulon is available as a Cytoscape plugin, supporting human, mouse, and Drosophila genes, and provides access to hundreds of cancer-related TF-target subnetworks or “regulons”.

Collapse

Kloetgen A, Münch PC, Borkhardt A, Hoell JI, McHardy AC. Biochemical and bioinformatic methods for elucidating the role of RNA-protein interactions in posttranscriptional regulation. Brief Funct Genomics 2014;14:102-14. [PMID: 24951655 PMCID: PMC4471435 DOI: 10.1093/bfgp/elu020] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open

Santolini M, Mora T, Hakim V. A general pairwise interaction model provides an accurate description of in vivo transcription factor binding sites. PLoS One 2014;9:e99015. [PMID: 24926895 PMCID: PMC4057186 DOI: 10.1371/journal.pone.0099015] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2013] [Accepted: 05/09/2014] [Indexed: 11/19/2022] Open

Abstract

The identification of transcription factor binding sites (TFBSs) on genomic DNA is of crucial importance for understanding and predicting regulatory elements in gene networks. TFBS motifs are commonly described by Position Weight Matrices (PWMs), in which each DNA base pair contributes independently to the transcription factor (TF) binding. However, this description ignores correlations between nucleotides at different positions, and is generally inaccurate: analysing fly and mouse in vivo ChIPseq data, we show that in most cases the PWM model fails to reproduce the observed statistics of TFBSs. To overcome this issue, we introduce the pairwise interaction model (PIM), a generalization of the PWM model. The model is based on the principle of maximum entropy and explicitly describes pairwise correlations between nucleotides at different positions, while being otherwise as unconstrained as possible. It is mathematically equivalent to considering a TF-DNA binding energy that depends additively on each nucleotide identity at all positions in the TFBS, like the PWM model, but also additively on pairs of nucleotides. We find that the PIM significantly improves over the PWM model, and even provides an optimal description of TFBS statistics within statistical noise. The PIM generalizes previous approaches to interdependent positions: it accounts for co-variation of two or more base pairs, and predicts secondary motifs, while outperforming multiple-motif models consisting of mixtures of PWMs. We analyse the structure of pairwise interactions between nucleotides, and find that they are sparse and dominantly located between consecutive base pairs in the flanking region of TFBS. Nonetheless, interactions between pairs of non-consecutive nucleotides are found to play a significant role in the obtained accurate description of TFBS statistics. The PIM is computationally tractable, and provides a general framework that should be useful for describing and predicting TFBSs beyond PWMs.

Collapse

Jablonska A, Polouliakh N. In silico discovery of novel transcription factors regulated by mTOR-pathway activities. Front Cell Dev Biol 2014;2:23. [PMID: 25364730 PMCID: PMC4206986 DOI: 10.3389/fcell.2014.00023] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2014] [Accepted: 05/09/2014] [Indexed: 12/21/2022] Open

Azmi AM, Al-Ssulami A. Encoded expansion: an efficient algorithm to discover identical string motifs. PLoS One 2014;9:e95148. [PMID: 24871320 PMCID: PMC4037181 DOI: 10.1371/journal.pone.0095148] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2013] [Accepted: 03/24/2014] [Indexed: 11/19/2022] Open

Abstract

A major task in computational biology is the discovery of short recurring string patterns known as motifs. Most of the schemes to discover motifs are either stochastic or combinatorial in nature. Stochastic approaches do not guarantee finding the correct motifs, while the combinatorial schemes tend to have an exponential time complexity with respect to motif length. To alleviate the cost, the combinatorial approach exploits dynamic data structures such as trees or graphs. Recently (Karci (2009) Efficient automatic exact motif discovery algorithms for biological sequences, Expert Systems with Applications 36:7952-7963) devised a deterministic algorithm that finds all the identical copies of string motifs of all sizes [Formula: see text] in theoretical time complexity of [Formula: see text] and a space complexity of [Formula: see text] where [Formula: see text] is the length of the input sequence and [Formula: see text] is the length of the longest possible string motif. In this paper, we present a significant improvement on Karci's original algorithm. The algorithm that we propose reports all identical string motifs of sizes [Formula: see text] that occur at least [Formula: see text] times. Our algorithm starts with string motifs of size 2, and at each iteration it expands the candidate string motifs by one symbol throwing out those that occur less than [Formula: see text] times in the entire input sequence. We use a simple array and data encoding to achieve theoretical worst-case time complexity of [Formula: see text] and a space complexity of [Formula: see text] Encoding of the substrings can speed up the process of comparison between string motifs. Experimental results on random and real biological sequences confirm that our algorithm has indeed a linear time complexity and it is more scalable in terms of sequence length than the existing algorithms.

Collapse

Glenwinkel L, Wu D, Minevich G, Hobert O. TargetOrtho: a phylogenetic footprinting tool to identify transcription factor targets. Genetics 2014;197:61-76. [PMID: 24558259 PMCID: PMC4012501 DOI: 10.1534/genetics.113.160721] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2014] [Accepted: 02/09/2014] [Indexed: 11/18/2022] Open

Rouault H, Santolini M, Schweisguth F, Hakim V. Imogene: identification of motifs and cis-regulatory modules underlying gene co-regulation. Nucleic Acids Res 2014;42:6128-45. [PMID: 24682824 PMCID: PMC4041412 DOI: 10.1093/nar/gku209] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open

Diepenbruck M, Waldmeier L, Ivanek R, Berninger P, Arnold P, van Nimwegen E, Christofori G. Tead2 expression levels control the subcellular distribution of Yap and Taz, zyxin expression and epithelial-mesenchymal transition. J Cell Sci 2014;127:1523-36. [PMID: 24554433 DOI: 10.1242/jcs.139865] [Citation(s) in RCA: 101] [Impact Index Per Article: 10.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022] Open

Eggeling R, Gohr A, Keilwagen J, Mohr M, Posch S, Smith AD, Grosse I. On the value of intra-motif dependencies of human insulator protein CTCF. PLoS One 2014;9:e85629. [PMID: 24465627 PMCID: PMC3899044 DOI: 10.1371/journal.pone.0085629] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2013] [Accepted: 12/05/2013] [Indexed: 01/08/2023] Open

Re A, Joshi T, Kulberkyte E, Morris Q, Workman CT. RNA-protein interactions: an overview. Methods Mol Biol 2014;1097:491-521. [PMID: 24639174 DOI: 10.1007/978-1-62703-709-9_23] [Citation(s) in RCA: 76] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]

Duque T, Samee MAH, Kazemian M, Pham HN, Brodsky MH, Sinha S. Simulations of enhancer evolution provide mechanistic insights into gene regulation. Mol Biol Evol 2013;31:184-200. [PMID: 24097306 PMCID: PMC3879441 DOI: 10.1093/molbev/mst170] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023] Open

Abstract

There is growing interest in models of regulatory sequence evolution. However, existing models specifically designed for regulatory sequences consider the independent evolution of individual transcription factor (TF)-binding sites, ignoring that the function and evolution of a binding site depends on its context, typically the cis-regulatory module (CRM) in which the site is located. Moreover, existing models do not account for the gene-specific roles of TF-binding sites, primarily because their roles often are not well understood. We introduce two models of regulatory sequence evolution that address some of the shortcomings of existing models and implement simulation frameworks based on them. One model simulates the evolution of an individual binding site in the context of a CRM, while the other evolves an entire CRM. Both models use a state-of-the art sequence-to-expression model to predict the effects of mutations on the regulatory output of the CRM and determine the strength of selection. We use the new framework to simulate the evolution of TF-binding sites in 37 well-studied CRMs belonging to the anterior-posterior patterning system in Drosophila embryos. We show that these simulations provide accurate fits to evolutionary data from 12 Drosophila genomes, which includes statistics of binding site conservation on relatively short evolutionary scales and site loss across larger divergence times. The new framework allows us, for the first time, to test hypotheses regarding the underlying cis-regulatory code by directly comparing the evolutionary implications of the hypothesis with the observed evolutionary dynamics of binding sites. Using this capability, we find that explicitly modeling self-cooperative DNA binding by the TF Caudal (CAD) provides significantly better fits than an otherwise identical evolutionary simulation that lacks this mechanistic aspect. This hypothesis is further supported by a statistical analysis of the distribution of intersite spacing between adjacent CAD sites. Experimental tests confirm direct homodimeric interaction between CAD molecules as well as self-cooperative DNA binding by CAD. We note that computational modeling of the D. melanogaster CRMs alone did not yield significant evidence to support CAD self-cooperativity. We thus demonstrate how specific mechanistic details encoded in CRMs can be revealed by modeling their evolution and fitting such models to multispecies data.

Collapse

Brümmer A, Kishore S, Subasic D, Hengartner M, Zavolan M. Modeling the binding specificity of the RNA-binding protein GLD-1 suggests a function of coding region-located sites in translational repression. RNA (NEW YORK, N.Y.) 2013;19:1317-1326. [PMID: 23974436 PMCID: PMC3854522 DOI: 10.1261/rna.037531.112] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/03/2012] [Accepted: 06/25/2013] [Indexed: 06/02/2023]

Saha S, Lindeberg M. Bound to Succeed: transcription factor binding-site prediction and its contribution to understanding virulence and environmental adaptation in bacterial plant pathogens. MOLECULAR PLANT-MICROBE INTERACTIONS : MPMI 2013;26:1123-1130. [PMID: 23802990 DOI: 10.1094/mpmi-04-13-0090-cr] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]

Nucleosome free regions in yeast promoters result from competitive binding of transcription factors that interact with chromatin modifiers. PLoS Comput Biol 2013;9:e1003181. [PMID: 23990766 PMCID: PMC3749953 DOI: 10.1371/journal.pcbi.1003181] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2012] [Accepted: 07/04/2013] [Indexed: 11/19/2022] Open

Abstract

Because DNA packaging in nucleosomes modulates its accessibility to transcription factors (TFs), unraveling the causal determinants of nucleosome positioning is of great importance to understanding gene regulation. Although there is evidence that intrinsic sequence specificity contributes to nucleosome positioning, the extent to which other factors contribute to nucleosome positioning is currently highly debated. Here we obtained both in vivo and in vitro reference maps of positions that are either consistently covered or free of nucleosomes across multiple experimental data-sets in Saccharomyces cerevisiae. We then systematically quantified the contribution of TF binding to nucleosome positiong using a rigorous statistical mechanics model in which TFs compete with nucleosomes for binding DNA. Our results reconcile previous seemingly conflicting results on the determinants of nucleosome positioning and provide a quantitative explanation for the difference between in vivo and in vitro positioning. On a genome-wide scale, nucleosome positioning is dominated by the phasing of nucleosome arrays over gene bodies, and their positioning is mainly determined by the intrinsic sequence preferences of nucleosomes. In contrast, larger nucleosome free regions in promoters, which likely have a much more significant impact on gene expression, are determined mainly by TF binding. Interestingly, of the 158 yeast TFs included in our modeling, we find that only 10–20 significantly contribute to inducing nucleosome-free regions, and these TFs are highly enriched for having direct interations with chromatin remodelers. Together our results imply that nucleosome free regions in yeast promoters results from the binding of a specific class of TFs that recruit chromatin remodelers.

The DNA of all eukaryotic organisms is packaged into nucleosomes, which cover roughly of the genome. As nucleosome positioning profoundly affects DNA accessibility to other DNA binding proteins such as transcription factors (TFs), it plays an important role in transcription regulation. However, to what extent nucleosome positioning is guided by intrinsic DNA sequence preferences of nucleosomes, and to what extent other DNA binding factors play a role, is currently highly debated. Here we use a rigorous biophysical model to systematically study the relative contributions of intrinsic sequence preferences and competitive binding of TFs to nucleosome positioning in yeast. We find that, on the one hand, the phasing of the many small spacers within dense nucleosome arrays that cover gene bodies are mainly determined by intrinsic sequence preferences. On the other hand, larger nucleosome free regions (NFRs) in promoters are explained predominantly by TF binding. Strikingly, we find that only 10–20 TFs make a significant contribution to explaining NFRs, and these TFs are highly enriched for directly interacting with chromatin modifiers. Thus, the picture that emerges is that binding by a specific class of TFs recruits chromatin modifiers which mediate local nucleosome expulsion.

Collapse

Leibovich L, Paz I, Yakhini Z, Mandel-Gutfreund Y. DRIMust: a web server for discovering rank imbalanced motifs using suffix trees. Nucleic Acids Res 2013;41:W174-9. [PMID: 23685432 PMCID: PMC3692051 DOI: 10.1093/nar/gkt407] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Signal correlations in ecological niches can shape the organization and evolution of bacterial gene regulatory networks. Adv Microb Physiol 2013;61:1-36. [PMID: 23046950 DOI: 10.1016/b978-0-12-394423-8.00001-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]

Shao J, Zhang J, Zhang Z, Jiang H, Lou X, Huang B, Foltz G, Lan Q, Huang Q, Lin B. Alternative polyadenylation in glioblastoma multiforme and changes in predicted RNA binding protein profiles. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2013;17:136-49. [PMID: 23421905 DOI: 10.1089/omi.2012.0098] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Sahu TK, Rao AR, Vasisht S, Singh N, Singh UP. Computational approaches, databases and tools for in silico motif discovery. Interdiscip Sci 2012;4:239-255. [PMID: 23354813 DOI: 10.1007/s12539-012-0141-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2011] [Revised: 04/12/2012] [Accepted: 06/13/2012] [Indexed: 06/01/2023]

Müller-Molina AJ, Schöler HR, Araúzo-Bravo MJ. Comprehensive human transcription factor binding site map for combinatory binding motifs discovery. PLoS One 2012;7:e49086. [PMID: 23209563 PMCID: PMC3509107 DOI: 10.1371/journal.pone.0049086] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2012] [Accepted: 10/08/2012] [Indexed: 11/18/2022] Open

Abstract

To know the map between transcription factors (TFs) and their binding sites is essential to reverse engineer the regulation process. Only about 10%-20% of the transcription factor binding motifs (TFBMs) have been reported. This lack of data hinders understanding gene regulation. To address this drawback, we propose a computational method that exploits never used TF properties to discover the missing TFBMs and their sites in all human gene promoters. The method starts by predicting a dictionary of regulatory "DNA words." From this dictionary, it distills 4098 novel predictions. To disclose the crosstalk between motifs, an additional algorithm extracts TF combinatorial binding patterns creating a collection of TF regulatory syntactic rules. Using these rules, we narrowed down a list of 504 novel motifs that appear frequently in syntax patterns. We tested the predictions against 509 known motifs confirming that our system can reliably predict ab initio motifs with an accuracy of 81%-far higher than previous approaches. We found that on average, 90% of the discovered combinatorial binding patterns target at least 10 genes, suggesting that to control in an independent manner smaller gene sets, supplementary regulatory mechanisms are required. Additionally, we discovered that the new TFBMs and their combinatorial patterns convey biological meaning, targeting TFs and genes related to developmental functions. Thus, among all the possible available targets in the genome, the TFs tend to regulate other TFs and genes involved in developmental functions. We provide a comprehensive resource for regulation analysis that includes a dictionary of "DNA words," newly predicted motifs and their corresponding combinatorial patterns. Combinatorial patterns are a useful filter to discover TFBMs that play a major role in orchestrating other factors and thus, are likely to lock/unlock cellular functional clusters.

Collapse