Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Stormo GD, Hartzell GW. Identifying protein-binding sites from unaligned DNA fragments. Proc Natl Acad Sci U S A 1989;86:1183-7. [PMID: 2919167 PMCID: PMC286650 DOI: 10.1073/pnas.86.4.1183] [Citation(s) in RCA: 224] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open

For:	Stormo GD, Hartzell GW. Identifying protein-binding sites from unaligned DNA fragments. Proc Natl Acad Sci U S A 1989;86:1183-7. [PMID: 2919167 PMCID: PMC286650 DOI: 10.1073/pnas.86.4.1183] [Citation(s) in RCA: 224] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open

Number

Cited by Other Article(s)

Selvakumar P, Siddharthan R. Position-specific evolution in transcription factor binding sites, and a fast likelihood calculation for the F81 model. ROYAL SOCIETY OPEN SCIENCE 2024;11:231088. [PMID: 38269075 PMCID: PMC10805598 DOI: 10.1098/rsos.231088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/26/2023] [Accepted: 12/20/2023] [Indexed: 01/26/2024]

Proft S, Leiz J, Heinemann U, Seelow D, Schmidt-Ott KM, Rutkiewicz M. Discovery of a non-canonical GRHL1 binding site using deep convolutional and recurrent neural networks. BMC Genomics 2023;24:736. [PMID: 38049725 PMCID: PMC10696883 DOI: 10.1186/s12864-023-09830-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2023] [Accepted: 11/22/2023] [Indexed: 12/06/2023] Open

Affiliation(s)

Sebastian Proft Exploratory Diagnostic Sciences, Berlin Institute of Health, Charité - Universitätsmedizin Berlin, 10117, Berlin, Germany Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, 13353, Berlin, Germany
Janna Leiz Department of Nephrology and Hypertension, Hannover Medical School, 30625, Hannover, Germany Department of Nephrology and Intensive Care Medicine, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, 12203, Berlin, Germany Molecular and Translational Kidney Research, Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
Udo Heinemann Macromolecular Structure and Interaction, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany.
Dominik Seelow Exploratory Diagnostic Sciences, Berlin Institute of Health, Charité - Universitätsmedizin Berlin, 10117, Berlin, Germany. Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, 13353, Berlin, Germany.
Kai M Schmidt-Ott Department of Nephrology and Hypertension, Hannover Medical School, 30625, Hannover, Germany. Department of Nephrology and Intensive Care Medicine, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, 12203, Berlin, Germany. Molecular and Translational Kidney Research, Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany.
Maria Rutkiewicz Macromolecular Structure and Interaction, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany Department of Structural Biology of Eukaryotes, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznań, 61-704, Poland

Collapse

Alexandari AM, Horton CA, Shrikumar A, Shah N, Li E, Weilert M, Pufall MA, Zeitlinger J, Fordyce PM, Kundaje A. De novo distillation of thermodynamic affinity from deep learning regulatory sequence models of in vivo protein-DNA binding. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.11.540401. [PMID: 37214836 PMCID: PMC10197627 DOI: 10.1101/2023.05.11.540401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/24/2023]

Abstract

Transcription factors (TF) are proteins that bind DNA in a sequence-specific manner to regulate gene transcription. Despite their unique intrinsic sequence preferences, in vivo genomic occupancy profiles of TFs differ across cellular contexts. Hence, deciphering the sequence determinants of TF binding, both intrinsic and context-specific, is essential to understand gene regulation and the impact of regulatory, non-coding genetic variation. Biophysical models trained on in vitro TF binding assays can estimate intrinsic affinity landscapes and predict occupancy based on TF concentration and affinity. However, these models cannot adequately explain context-specific, in vivo binding profiles. Conversely, deep learning models, trained on in vivo TF binding assays, effectively predict and explain genomic occupancy profiles as a function of complex regulatory sequence syntax, albeit without a clear biophysical interpretation. To reconcile these complementary models of in vitro and in vivo TF binding, we developed Affinity Distillation (AD), a method that extracts thermodynamic affinities de-novo from deep learning models of TF chromatin immunoprecipitation (ChIP) experiments by marginalizing away the influence of genomic sequence context. Applied to neural networks modeling diverse classes of yeast and mammalian TFs, AD predicts energetic impacts of sequence variation within and surrounding motifs on TF binding as measured by diverse in vitro assays with superior dynamic range and accuracy compared to motif-based methods. Furthermore, AD can accurately discern affinities of TF paralogs. Our results highlight thermodynamic affinity as a key determinant of in vivo binding, suggest that deep learning models of in vivo binding implicitly learn high-resolution affinity landscapes, and show that these affinities can be successfully distilled using AD. This new biophysical interpretation of deep learning models enables high-throughput in silico experiments to explore the influence of sequence context and variation on both intrinsic affinity and in vivo occupancy.

Collapse

Alamos S, Reimer A, Westrum C, Turner MA, Talledo P, Zhao J, Luu E, Garcia HG. Minimal synthetic enhancers reveal control of the probability of transcriptional engagement and its timing by a morphogen gradient. Cell Syst 2023;14:220-236.e3. [PMID: 36696901 PMCID: PMC10125799 DOI: 10.1016/j.cels.2022.12.008] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2021] [Revised: 09/03/2022] [Accepted: 12/21/2022] [Indexed: 01/26/2023]

Zhang Y, Bao W, Cao Y, Cong H, Chen B, Chen Y. A survey on protein–DNA-binding sites in computational biology. Brief Funct Genomics 2022;21:357-375. [DOI: 10.1093/bfgp/elac009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2022] [Revised: 04/07/2022] [Accepted: 04/22/2022] [Indexed: 01/08/2023] Open

Abstract Abstract Transcription factors are important cellular components of the process of gene expression control. Transcription factor binding sites are locations where transcription factors specifically recognize DNA sequences, targeting gene-specific regions and recruiting transcription factors or chromatin regulators to fine-tune spatiotemporal gene regulation. As the common proteins, transcription factors play a meaningful role in life-related activities. In the face of the increase in the protein sequence, it is urgent how to predict the structure and function of the protein effectively. At present, protein–DNA-binding site prediction methods are based on traditional machine learning algorithms and deep learning algorithms. In the early stage, we usually used the development method based on traditional machine learning algorithm to predict protein–DNA-binding sites. In recent years, methods based on deep learning to predict protein–DNA-binding sites from sequence data have achieved remarkable success. Various statistical and machine learning methods used to predict the function of DNA-binding proteins have been proposed and continuously improved. Existing deep learning methods for predicting protein–DNA-binding sites can be roughly divided into three categories: convolutional neural network (CNN), recursive neural network (RNN) and hybrid neural network based on CNN–RNN. The purpose of this review is to provide an overview of the computational and experimental methods applied in the field of protein–DNA-binding site prediction today. This paper introduces the methods of traditional machine learning and deep learning in protein–DNA-binding site prediction from the aspects of data processing characteristics of existing learning frameworks and differences between basic learning model frameworks. Our existing methods are relatively simple compared with natural language processing, computational vision, computer graphics and other fields. Therefore, the summary of existing protein–DNA-binding site prediction methods will help researchers better understand this field. Collapse

Clauwaert J, Waegeman W. Novel Transformer Networks for Improved Sequence Labeling in genomics. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:97-106. [PMID: 33125335 DOI: 10.1109/tcbb.2020.3035021] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]

Clauwaert J, Menschaert G, Waegeman W. Explainability in transformer models for functional genomics. Brief Bioinform 2021;22:6214646. [PMID: 33834200 PMCID: PMC8425421 DOI: 10.1093/bib/bbab060] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2020] [Revised: 01/28/2021] [Accepted: 02/05/2021] [Indexed: 11/16/2022] Open

He Y, Shen Z, Zhang Q, Wang S, Huang DS. A survey on deep learning in DNA/RNA motif mining. Brief Bioinform 2020;22:5916939. [PMID: 33005921 PMCID: PMC8293829 DOI: 10.1093/bib/bbaa229] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2020] [Revised: 08/19/2020] [Accepted: 08/24/2020] [Indexed: 01/18/2023] Open

DNA motif discovery using chemical reaction optimization. EVOLUTIONARY INTELLIGENCE 2020. [DOI: 10.1007/s12065-020-00444-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]

Ashraf FB, Shafi MSR. MFEA: An evolutionary approach for motif finding in DNA sequences. INFORMATICS IN MEDICINE UNLOCKED 2020. [DOI: 10.1016/j.imu.2020.100466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022] Open

Agrawal A, Sambare SV, Narlikar L, Siddharthan R. THiCweed: fast, sensitive detection of sequence features by clustering big datasets. Nucleic Acids Res 2019;46:e29. [PMID: 29267972 PMCID: PMC5861420 DOI: 10.1093/nar/gkx1251] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2017] [Accepted: 12/01/2017] [Indexed: 11/19/2022] Open

Hashim FA, Mabrouk MS, Al-Atabany W. Review of Different Sequence Motif Finding Algorithms. Avicenna J Med Biotechnol 2019;11:130-148. [PMID: 31057715 PMCID: PMC6490410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2018] [Accepted: 05/26/2018] [Indexed: 11/05/2022] Open

Hashim FA, Mabrouk MS, Atabany WA. Comparative Analysis of DNA Motif Discovery Algorithms: A Systemic Review. CURRENT CANCER THERAPY REVIEWS 2019. [DOI: 10.2174/1573394714666180417161728] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Dempster-Shafer Theory for the Prediction of Auxin-Response Elements (AuxREs) in Plant Genomes. BIOMED RESEARCH INTERNATIONAL 2018;2018:3837060. [PMID: 30515394 PMCID: PMC6236769 DOI: 10.1155/2018/3837060] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/17/2018] [Accepted: 10/15/2018] [Indexed: 11/17/2022]

Latif H, Federowicz S, Ebrahim A, Tarasova J, Szubin R, Utrilla J, Zengler K, Palsson BO. ChIP-exo interrogation of Crp, DNA, and RNAP holoenzyme interactions. PLoS One 2018;13:e0197272. [PMID: 29771928 PMCID: PMC5957442 DOI: 10.1371/journal.pone.0197272] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2017] [Accepted: 04/30/2018] [Indexed: 12/17/2022] Open

Comprehensive, high-resolution binding energy landscapes reveal context dependencies of transcription factor binding. Proc Natl Acad Sci U S A 2018;115:E3702-E3711. [PMID: 29588420 PMCID: PMC5910820 DOI: 10.1073/pnas.1715888115] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Caldonazzo Garbelini JM, Kashiwabara AY, Sanches DS. Sequence motif finder using memetic algorithm. BMC Bioinformatics 2018;19:4. [PMID: 29298679 PMCID: PMC5751424 DOI: 10.1186/s12859-017-2005-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2017] [Accepted: 12/18/2017] [Indexed: 11/10/2022] Open

Inherent limitations of probabilistic models for protein-DNA binding specificity. PLoS Comput Biol 2017;13:e1005638. [PMID: 28686588 PMCID: PMC5521849 DOI: 10.1371/journal.pcbi.1005638] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2017] [Revised: 07/21/2017] [Accepted: 06/21/2017] [Indexed: 01/10/2023] Open

Abstract

The specificities of transcription factors are most commonly represented with probabilistic models. These models provide a probability for each base occurring at each position within the binding site and the positions are assumed to contribute independently. The model is simple and intuitive and is the basis for many motif discovery algorithms. However, the model also has inherent limitations that prevent it from accurately representing true binding probabilities, especially for the highest affinity sites under conditions of high protein concentration. The limitations are not due to the assumption of independence between positions but rather are caused by the non-linear relationship between binding affinity and binding probability and the fact that independent normalization at each position skews the site probabilities. Generally probabilistic models are reasonably good approximations, but new high-throughput methods allow for biophysical models with increased accuracy that should be used whenever possible.

Transcription factors (TFs), a class of DNA-binding proteins, play a central role in the regulation of gene expression. TFs control the rate of transcription by binding to the genome in a sequence-specific manner. Thus, one important aspect in the study of gene regulation mechanism is to model the binding specificities of TFs, namely the features of the DNA sequences that a TF prefers to bind. Multiple models have been proposed to characterize the binding specificities of TFs, among which the class of probabilistic models is the most popular. In this study, we point out several major limitations of the well-established probabilistic model by comparing it with the biophysical model. Through simulations we demonstrate that the probabilistic model is only an approximation of the biophysical model. The latter has most of the advantages of the former, and is a more accurate representation of binding specificities. We propose a shift from the probabilistic model to the biophysical model in future studies of protein-DNA interactions.

Collapse

Validating regulatory predictions from diverse bacteria with mutant fitness data. PLoS One 2017;12:e0178258. [PMID: 28542589 PMCID: PMC5443562 DOI: 10.1371/journal.pone.0178258] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2016] [Accepted: 04/27/2017] [Indexed: 11/26/2022] Open

Ren C, Chen H, Yang B, Liu F, Ouyang Z, Bo X, Shu W. iFORM: Incorporating Find Occurrence of Regulatory Motifs. PLoS One 2016;11:e0168607. [PMID: 27992540 PMCID: PMC5167396 DOI: 10.1371/journal.pone.0168607] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2016] [Accepted: 12/02/2016] [Indexed: 11/18/2022] Open

Tuğrul M, Paixão T, Barton NH, Tkačik G. Dynamics of Transcription Factor Binding Site Evolution. PLoS Genet 2015;11:e1005639. [PMID: 26545200 PMCID: PMC4636380 DOI: 10.1371/journal.pgen.1005639] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2015] [Accepted: 10/09/2015] [Indexed: 11/19/2022] Open

Abstract

Evolution of gene regulation is crucial for our understanding of the phenotypic differences between species, populations and individuals. Sequence-specific binding of transcription factors to the regulatory regions on the DNA is a key regulatory mechanism that determines gene expression and hence heritable phenotypic variation. We use a biophysical model for directional selection on gene expression to estimate the rates of gain and loss of transcription factor binding sites (TFBS) in finite populations under both point and insertion/deletion mutations. Our results show that these rates are typically slow for a single TFBS in an isolated DNA region, unless the selection is extremely strong. These rates decrease drastically with increasing TFBS length or increasingly specific protein-DNA interactions, making the evolution of sites longer than ∼ 10 bp unlikely on typical eukaryotic speciation timescales. Similarly, evolution converges to the stationary distribution of binding sequences very slowly, making the equilibrium assumption questionable. The availability of longer regulatory sequences in which multiple binding sites can evolve simultaneously, the presence of “pre-sites” or partially decayed old sites in the initial sequence, and biophysical cooperativity between transcription factors, can all facilitate gain of TFBS and reconcile theoretical calculations with timescales inferred from comparative genomics.

Evolution has produced a remarkable diversity of living forms that manifests in qualitative differences as well as quantitative traits. An essential factor that underlies this variability is transcription factor binding sites, short pieces of DNA that control gene expression levels. Nevertheless, we lack a thorough theoretical understanding of the evolutionary times required for the appearance and disappearance of these sites. By combining a biophysically realistic model for how cells read out information in transcription factor binding sites with model for DNA sequence evolution, we explore these timescales and ask what factors crucially affect them. We find that the emergence of binding sites from a random sequence is generically slow under point and insertion/deletion mutational mechanisms. Strong selection, sufficient genomic sequence in which the sites can evolve, the existence of partially decayed old binding sites in the sequence, as well as certain biophysical mechanisms such as cooperativity, can accelerate the binding site gain times and make them consistent with the timescales suggested by comparative analyses of genomic data.

Collapse

Ponomarenko PM, Ponomarenko MP. Sequence-based prediction of transcription upregulation by auxin in plants. J Bioinform Comput Biol 2015;13:1540009. [PMID: 25666655 DOI: 10.1142/s0219720015400090] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Abstract

Auxin is one of the main regulators of growth and development in plants. Prediction of auxin response based on gene sequence is of high importance. We found the TGTCNC consensus of 111 known natural and artificially mutated auxin response elements (AuxREs) with measured auxin-caused relative increase in genes' transcription levels, so-called either "a response to auxin" or "an auxin response." This consensus was identical to the most cited AuxRE motif. Also, we found several DNA sequence features that correlate with auxin-caused increase in genes' transcription levels, namely: number of matches with TGTCNC, homology score based on nucleotide frequencies at the consensus positions, abundances of five trinucleotides and five B-helical DNA features around these known AuxREs. We combined these correlations using a four-step empirical model of auxin response based on a gene's sequence with four steps, namely: (1) search for AuxREs with no auxin; (2) stop at the found AuxRE; (3) repression of the basal transcription of the gene having this AuxRE; and (4) manifold increase of this gene's transcription in response to auxin. Independently measured increases in transcription levels in response to auxin for 70 Arabidopsis genes were found to significantly correlate with predictions of this equation (r = 0.44, p < 0.001) as well as with TATA-binding protein (TBP)'s affinity to promoters of these genes and with nucleosome packing of these promoters (both, p < 0.025). Finally, we improved our equation for prediction of a gene's transcription increase in response to auxin by taking into account TBP-binding and nucleosome packing (r = 0.53, p < 10(-6)). Fisher's F-test validated the significant impact of both TBP/promoter-affinity and promoter nucleosome on auxin response in addition to those of AuxRE, F = 4.07, p < 0.025. It means that both TATA-box and nucleosome should be taken into account to recognize transcription factor binding sites upon DNA sequences: in the case of the TATA-less nucleosome-rich promoters, recognition scores must be higher than in the case of the TATA-containing nucleosome-free promoters at the same transcription activity.

Collapse

Kilpatrick AM, Ward B, Aitken S. Stochastic EM-based TFBS motif discovery with MITSU. Bioinformatics 2014;30:i310-8. [PMID: 24931999 PMCID: PMC4058950 DOI: 10.1093/bioinformatics/btu286] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Hu J, Wang D, Li J, Jing G, Ning K, Xu J. Genome-wide identification of transcription factors and transcription-factor binding sites in oleaginous microalgae Nannochloropsis. Sci Rep 2014;4:5454. [PMID: 24965723 PMCID: PMC5154493 DOI: 10.1038/srep05454] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2014] [Accepted: 06/09/2014] [Indexed: 12/25/2022] Open

Costanzo MC, Engel SR, Wong ED, Lloyd P, Karra K, Chan ET, Weng S, Paskov KM, Roe GR, Binkley G, Hitz BC, Cherry JM. Saccharomyces genome database provides new regulation data. Nucleic Acids Res 2013;42:D717-25. [PMID: 24265222 PMCID: PMC3965049 DOI: 10.1093/nar/gkt1158] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open

Nandi S, Blais A, Ioshikhes I. Identification of cis-regulatory modules in promoters of human genes exploiting mutual positioning of transcription factors. Nucleic Acids Res 2013;41:8822-41. [PMID: 23913413 PMCID: PMC3799424 DOI: 10.1093/nar/gkt578] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022] Open

Liu W, Chen H, Chen L. An ant colony optimization based algorithm for identifying gene regulatory elements. Comput Biol Med 2013;43:922-32. [PMID: 23746735 DOI: 10.1016/j.compbiomed.2013.04.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2011] [Revised: 04/10/2013] [Accepted: 04/11/2013] [Indexed: 11/15/2022]

Zhang Y, Huo H, Yu Q. A heuristic cluster-based EM algorithm for the planted (l, d) problem. J Bioinform Comput Biol 2013;11:1350009. [PMID: 23859273 DOI: 10.1142/s0219720013500091] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Yu Q, Huo H, Zhang Y, Guo H, Guo H. PairMotif+: a fast and effective algorithm for de novo motif discovery in DNA sequences. Int J Biol Sci 2013;9:412-24. [PMID: 23678291 PMCID: PMC3654438 DOI: 10.7150/ijbs.5786] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2013] [Accepted: 04/15/2013] [Indexed: 11/25/2022] Open

Modeling the specificity of protein-DNA interactions. QUANTITATIVE BIOLOGY 2013;1:115-130. [PMID: 25045190 DOI: 10.1007/s40484-013-0012-4] [Citation(s) in RCA: 132] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]

Hu ZP, Chen LS, Jia CY, Zhu HZ, Wang W, Zhong J. Screening of potential pseudo att sites of Streptomyces phage ΦC31 integrase in the human genome. Acta Pharmacol Sin 2013;34:561-9. [PMID: 23416928 DOI: 10.1038/aps.2012.173] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open

Weiss V, Medina-Rivera A, Huerta AM, Santos-Zavaleta A, Salgado H, Morett E, Collado-Vides J. Evidence classification of high-throughput protocols and confidence integration in RegulonDB. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2013;2013:bas059. [PMID: 23327937 PMCID: PMC3548332 DOI: 10.1093/database/bas059] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]

Efficient identification of transcription factor binding sites with a graph theoretic approach. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2013;2013:856281. [PMID: 23365625 PMCID: PMC3549379 DOI: 10.1155/2013/856281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/09/2012] [Accepted: 12/13/2012] [Indexed: 12/02/2022]

Sebastian A, Contreras-Moreira B. The twilight zone of cis element alignments. Nucleic Acids Res 2012;41:1438-49. [PMID: 23268451 PMCID: PMC3561995 DOI: 10.1093/nar/gks1301] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/26/2023] Open

Regulatory regions of the C elegans genome contain more low-affinity REF-1 transcription-factor binding sites than high-affinity sites. Genes Genomics 2012. [DOI: 10.1007/s13258-012-0213-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Müller-Molina AJ, Schöler HR, Araúzo-Bravo MJ. Comprehensive human transcription factor binding site map for combinatory binding motifs discovery. PLoS One 2012;7:e49086. [PMID: 23209563 PMCID: PMC3509107 DOI: 10.1371/journal.pone.0049086] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2012] [Accepted: 10/08/2012] [Indexed: 11/18/2022] Open

Abstract

To know the map between transcription factors (TFs) and their binding sites is essential to reverse engineer the regulation process. Only about 10%-20% of the transcription factor binding motifs (TFBMs) have been reported. This lack of data hinders understanding gene regulation. To address this drawback, we propose a computational method that exploits never used TF properties to discover the missing TFBMs and their sites in all human gene promoters. The method starts by predicting a dictionary of regulatory "DNA words." From this dictionary, it distills 4098 novel predictions. To disclose the crosstalk between motifs, an additional algorithm extracts TF combinatorial binding patterns creating a collection of TF regulatory syntactic rules. Using these rules, we narrowed down a list of 504 novel motifs that appear frequently in syntax patterns. We tested the predictions against 509 known motifs confirming that our system can reliably predict ab initio motifs with an accuracy of 81%-far higher than previous approaches. We found that on average, 90% of the discovered combinatorial binding patterns target at least 10 genes, suggesting that to control in an independent manner smaller gene sets, supplementary regulatory mechanisms are required. Additionally, we discovered that the new TFBMs and their combinatorial patterns convey biological meaning, targeting TFs and genes related to developmental functions. Thus, among all the possible available targets in the genome, the TFs tend to regulate other TFs and genes involved in developmental functions. We provide a comprehensive resource for regulation analysis that includes a dictionary of "DNA words," newly predicted motifs and their corresponding combinatorial patterns. Combinatorial patterns are a useful filter to discover TFBMs that play a major role in orchestrating other factors and thus, are likely to lock/unlock cellular functional clusters.

Collapse

Chemes LB, Glavina J, Alonso LG, Marino-Buslje C, de Prat-Gay G, Sánchez IE. Sequence evolution of the intrinsically disordered and globular domains of a model viral oncoprotein. PLoS One 2012;7:e47661. [PMID: 23118886 PMCID: PMC3485249 DOI: 10.1371/journal.pone.0047661] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2012] [Accepted: 09/14/2012] [Indexed: 12/11/2022] Open

Ding J, Li X, Hu H. Systematic prediction of cis-regulatory elements in the Chlamydomonas reinhardtii genome using comparative genomics. PLANT PHYSIOLOGY 2012;160:613-23. [PMID: 22915576 PMCID: PMC3461543 DOI: 10.1104/pp.112.200840] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]

Wang Y, Ding J, Daniell H, Hu H, Li X. Motif analysis unveils the possible co-regulation of chloroplast genes and nuclear genes encoding chloroplast proteins. PLANT MOLECULAR BIOLOGY 2012;80:177-87. [PMID: 22733202 DOI: 10.1007/s11103-012-9938-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/18/2012] [Accepted: 06/15/2012] [Indexed: 06/01/2023]

Bi C. Memetic algorithms for de novo motif-finding in biomedical sequences. Artif Intell Med 2012;56:1-17. [DOI: 10.1016/j.artmed.2012.04.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2011] [Revised: 04/03/2012] [Accepted: 04/10/2012] [Indexed: 11/26/2022]

Nandi S, Ioshikhes I. Optimizing the GATA-3 position weight matrix to improve the identification of novel binding sites. BMC Genomics 2012;13:416. [PMID: 22913572 PMCID: PMC3481455 DOI: 10.1186/1471-2164-13-416] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2011] [Accepted: 08/02/2012] [Indexed: 11/21/2022] Open

Federico M, Leoncini M, Montangero M, Valente P. Direct vs 2-stage approaches to structured motif finding. Algorithms Mol Biol 2012;7:20. [PMID: 22908910 PMCID: PMC3564690 DOI: 10.1186/1748-7188-7-20] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2011] [Accepted: 07/25/2012] [Indexed: 12/03/2022] Open

Abstract

Background

The notion of DNA motif is a mathematical abstraction used to model regions of the DNA (known as Transcription Factor Binding Sites, or TFBSs) that are bound by a given Transcription Factor to regulate gene expression or repression. In turn, DNA structured motifs are a mathematical counterpart that models sets of TFBSs that work in concert in the gene regulations processes of higher eukaryotic organisms. Typically, a structured motif is composed of an ordered set of isolated (or simple) motifs, separated by a variable, but somewhat constrained number of “irrelevant” base-pairs. Discovering structured motifs in a set of DNA sequences is a computationally hard problem that has been addressed by a number of authors using either a direct approach, or via the preliminary identification and successive combination of simple motifs.

Results

We describe a computational tool, named SISMA, for the de-novo discovery of structured motifs in a set of DNA sequences. SISMA is an exact, enumerative algorithm, meaning that it finds all the motifs conforming to the specifications. It does so in two stages: first it discovers all the possible component simple motifs, then combines them in a way that respects the given constraints. We developed SISMA mainly with the aim of understanding the potential benefits of such a 2-stage approach w.r.t. direct methods. In fact, no 2-stage software was available for the general problem of structured motif discovery, but only a few tools that solved restricted versions of the problem. We evaluated SISMA against other published tools on a comprehensive benchmark made of both synthetic and real biological datasets. In a significant number of cases, SISMA outperformed the competitors, exhibiting a good performance also in most of the cases in which it was inferior.

Conclusions

A reflection on the results obtained lead us to conclude that a 2-stage approach can be implemented with many advantages over direct approaches. Some of these have to do with greater modularity, ease of parallelization, and the possibility to perform adaptive searches of structured motifs. As another consideration, we noted that most hard instances for SISMA were easy to detect in advance. In these cases one may initially opt for a direct method; or, as a viable alternative in most laboratories, one could run both direct and 2-stage tools in parallel, halting the computations when the first halts.

Collapse

Mahdevar G, Sadeghi M, Nowzari-Dalini A. Transcription factor binding sites detection by using alignment-based approach. J Theor Biol 2012;304:96-102. [PMID: 22504445 DOI: 10.1016/j.jtbi.2012.03.039] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2011] [Revised: 03/27/2012] [Accepted: 03/29/2012] [Indexed: 11/25/2022]

Zhao Y, Ruan S, Pandey M, Stormo GD. Improved models for transcription factor binding site identification using nonindependent interactions. Genetics 2012;191:781-90. [PMID: 22505627 PMCID: PMC3389974 DOI: 10.1534/genetics.112.138685] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2011] [Accepted: 04/07/2012] [Indexed: 12/27/2022] Open

Ihuegbu NE, Stormo GD, Buhler J. Fast, sensitive discovery of conserved genome-wide motifs. J Comput Biol 2012;19:139-47. [PMID: 22300316 DOI: 10.1089/cmb.2011.0249] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open

Zandevakili P, Hu M, Qin Z. GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units. PLoS One 2012;7:e36865. [PMID: 22662128 PMCID: PMC3360745 DOI: 10.1371/journal.pone.0036865] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2012] [Accepted: 04/15/2012] [Indexed: 11/18/2022] Open

Kim JK, Kwon O, Kim J, Kim EK, Park HK, Lee JE, Kim KL, Choi JW, Lim S, Seok H, Lee-Kwon W, Choi JH, Kang BH, Kim S, Ryu SH, Suh PG. PDZ domain-containing 1 (PDZK1) protein regulates phospholipase C-β3 (PLC-β3)-specific activation of somatostatin by forming a ternary complex with PLC-β3 and somatostatin receptors. J Biol Chem 2012;287:21012-24. [PMID: 22528496 DOI: 10.1074/jbc.m111.337865] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open

Kim J, Kim I, Yang JS, Shin YE, Hwang J, Park S, Choi YS, Kim S. Rewiring of PDZ domain-ligand interaction network contributed to eukaryotic evolution. PLoS Genet 2012;8:e1002510. [PMID: 22346764 PMCID: PMC3276551 DOI: 10.1371/journal.pgen.1002510] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2011] [Accepted: 12/12/2011] [Indexed: 12/04/2022] Open

Abstract

PDZ domain-mediated interactions have greatly expanded during metazoan evolution, becoming important for controlling signal flow via the assembly of multiple signaling components. The evolutionary history of PDZ domain-mediated interactions has never been explored at the molecular level. It is of great interest to understand how PDZ domain-ligand interactions emerged and how they become rewired during evolution. Here, we constructed the first human PDZ domain-ligand interaction network (PDZNet) together with binding motif sequences and interaction strengths of ligands. PDZNet includes 1,213 interactions between 97 human PDZ proteins and 591 ligands that connect most PDZ protein-mediated interactions (98%) in a large single network via shared ligands. We examined the rewiring of PDZ domain-ligand interactions throughout eukaryotic evolution by tracing changes in the C-terminal binding motif sequences of the PDZ ligands. We found that interaction rewiring by sequence mutation frequently occurred throughout evolution, largely contributing to the growth of PDZNet. The rewiring of PDZ domain-ligand interactions provided an effective means of functional innovations in nervous system development. Our findings provide empirical evidence for a network evolution model that highlights the rewiring of interactions as a mechanism for the development of new protein functions. PDZNet will be a valuable resource to further characterize the organization of the PDZ domain-mediated signaling proteome.

Rewiring of interactions is a powerful tool for the evolution of organism complexity. Rewiring among preexisting proteins provides a simple mechanism for the development of new signaling circuits by redirecting information flows without a gain or loss of genes. Particularly, interactions mediated by short linear motifs can be easily changed by mutations during evolution, resulting in a rewiring of interactions. However, how interaction rewiring of linear motif interactions facilitates the emergence of new protein function during evolution is poorly understood. Here, we systematically investigated the rewiring of interactions mediated by PDZ domains, which are one of the most commonly found peptide recognition modules. We found that PDZ domain-ligand interactions are frequently rewired by C-terminal sequence mutations in PDZ ligands during evolution. Especially, rewiring of PDZ domain-ligand interactions was involved in neuronal function development, occurring concurrently with the emergence of vertebrates and suggesting that reorganization of signaling pathways by rewiring PDZ domain-ligand interactions significantly contributed to the evolution of nervous systems in vertebrates. Our findings highlight the rewiring of interactions as an effective means for functional innovation, providing new insight into eukaryotic evolution, which has not been fully explained by only the expansion of protein families.

Collapse

Aittokallio T, Kurki M, Nevalainen O, Nikula T, West A, Lahesmaa R. Computational Strategies for Analyzing Data in Gene Expression Microarray Experiments. J Bioinform Comput Biol 2012;1:541-86. [PMID: 15290769 DOI: 10.1142/s0219720003000319] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2003] [Revised: 07/02/2003] [Indexed: 11/18/2022]

Ascano M, Hafner M, Cekan P, Gerstberger S, Tuschl T. Identification of RNA-protein interaction networks using PAR-CLIP. WILEY INTERDISCIPLINARY REVIEWS-RNA 2011;3:159-77. [PMID: 22213601 DOI: 10.1002/wrna.1103] [Citation(s) in RCA: 177] [Impact Index Per Article: 13.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]