1
Augustijn HE, Roseboom AM, Medema MH, van Wezel GP. Harnessing regulatory networks in Actinobacteria for natural product discovery. J Ind Microbiol Biotechnol 2024; 51:kuae011. [PMID: 38569653 PMCID: PMC10996143 DOI: 10.1093/jimb/kuae011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 04/02/2024] [Indexed: 04/05/2024]
Abstract
Microbes typically live in complex habitats where they need to rapidly adapt to continuously changing growth conditions. To do so, they produce an astonishing array of natural products with diverse structures and functions. Actinobacteria stand out for their prolific production of bioactive molecules, including antibiotics, anticancer agents, antifungals, and immunosuppressants. Attention has been directed especially towards the identification of the compounds they produce and the mining of the large diversity of biosynthetic gene clusters (BGCs) in their genomes. However, the current return on investment in random screening for bioactive compounds is low, while it is hard to predict which of the millions of BGCs should be prioritized. Moreover, many of the BGCs for yet undiscovered natural products are silent or cryptic under laboratory growth conditions. To identify ways to prioritize and activate these BGCs, knowledge regarding the way their expression is controlled is crucial. Intricate regulatory networks control global gene expression in Actinobacteria, governed by a staggering number of up to 1000 transcription factors per strain. This review highlights recent advances in experimental and computational methods for characterizing and predicting transcription factor binding sites and their applications to guide natural product discovery. We propose that regulation-guided genome mining approaches will open new avenues toward eliciting the expression of BGCs, as well as prioritizing subsets of BGCs for expression using synthetic biology approaches. ONE-SENTENCE SUMMARY This review provides insights into advances in experimental and computational methods aimed at predicting transcription factor binding sites and their applications to guide natural product discovery.
Affiliation(s)
- Hannah E Augustijn
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
  - Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Anna M Roseboom
  - Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Marnix H Medema
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
  - Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Gilles P van Wezel
  - Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
  - Netherlands Institute for Ecology (NIOO-KNAW), Wageningen, The Netherlands
2
Jimmy JL, Karn R, Kumari S, Sruthilaxmi CB, Pooja S, Emerson IA, Babu S. Rice WRKY13 TF protein binds to motifs in the promoter region to regulate downstream disease resistance-related genes. Funct Integr Genomics 2023; 23:249. [PMID: 37474674 DOI: 10.1007/s10142-023-01167-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 06/22/2023] [Accepted: 07/03/2023] [Indexed: 07/22/2023]
Abstract
In plants, pathogen resistance is brought about by the binding of specific transcription factor (TF) proteins to cis-elements of their target genes. These cis-elements occur as motifs in the upstream promoter region of each gene, so that a specific TF binds a specific promoter and thereby regulates the expression of that gene. Studying the promoter sequences of all rice genes would therefore help identify the target genes of a given TF. The 1 kb upstream promoter sequences of 55,986 annotated rice genes were analyzed with a Perl program to detect WRKY13 binding motifs (bm). The resulting genes were grouped using Gene Ontology and gene set enrichment analysis. Genes with more than four TF bm in their promoter were selected. Ten genes reported to have a role in rice disease resistance were selected for further analysis. Cis-acting regulatory element analysis was carried out to find the cis-elements and confirm the presence of the corresponding motifs in the promoter sequences of these genes. The 3D structures of the WRKY13 TF and the corresponding ten genes were built, and the interacting residues were determined. The binding capacity of WRKY13 to the promoters of these selected genes was analyzed using docking studies. WRKY13 was considered for docking analysis based on prior reports of autoregulation. Molecular dynamics simulations provided more details regarding the interactions. Expression data for these genes helped clarify the mechanism of interaction. A co-expression network further helped to characterize the interaction of these selected disease resistance-related genes with the WRKY13 TF protein. This study suggests downstream target genes that are regulated by the WRKY13 TF and explores the molecular mechanism of the gene network it regulates in disease resistance against rice fungal pathogens.
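The promoter scan described in this abstract can be illustrated with a minimal Python sketch. This is not the authors' Perl program. It assumes the W-box core `TTGAC[CT]`, which WRKY factors are generally reported to recognize, as a stand-in for the WRKY13 binding motif, because the abstract does not state the exact pattern. The function names and the toy promoters are hypothetical.

```python
import re

# Assumption: the W-box core TTGAC[CT] stands in for the WRKY13 binding motif.
# The lookahead lets overlapping occurrences be counted as well.
W_BOX = re.compile(r"(?=(TTGAC[CT]))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_motif_hits(promoter):
    """Count motif occurrences on both strands of one promoter sequence."""
    seq = promoter.upper()
    forward = len(W_BOX.findall(seq))
    reverse = len(W_BOX.findall(reverse_complement(seq)))
    return forward + reverse


def select_genes(promoters, threshold=4):
    """Keep genes whose promoter carries more than `threshold` motif hits."""
    selected = {}
    for gene, seq in promoters.items():
        hits = count_motif_hits(seq)
        if hits > threshold:
            selected[gene] = hits
    return selected


# Hypothetical toy promoters, not data from the study.
toy_promoters = {"geneA": "TTGACC" * 5, "geneB": "ACGTACGT"}
print(select_genes(toy_promoters))
```

Counting both strands is a design choice of this sketch; whether the original analysis did so is not stated in the abstract.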
Affiliation(s)
- John Lilly Jimmy
  - School of Bio Science and Technology, Vellore Institute of Technology, Vellore, 632014, India
- Rohit Karn
  - School of Bio Science and Technology, Vellore Institute of Technology, Vellore, 632014, India
- Sweta Kumari
  - School of Bio Science and Technology, Vellore Institute of Technology, Vellore, 632014, India
- Singh Pooja
  - School of Science, Monash University Malaysia, Bandar Sunway, Selangor, Malaysia
- Isaac Arnold Emerson
  - School of Bio Science and Technology, Vellore Institute of Technology, Vellore, 632014, India
- Subramanian Babu
  - VIT School of Agricultural Innovations and Advanced Learning, Vellore Institute of Technology, Vellore, 632014, India
3
Whata A, Chimedza C. Deep Learning for SARS COV-2 Genome Sequences. IEEE Access 2021; 9:59597-59611. [PMID: 34812391 PMCID: PMC8545213 DOI: 10.1109/access.2021.3073728] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/25/2021] [Accepted: 04/11/2021] [Indexed: 05/04/2023]
Abstract
The SARS-CoV-2 virus, which originated in Wuhan, China, has since spread throughout the world and is affecting millions of people. When there is a novel virus outbreak, it is crucial to quickly determine whether the epidemic is a result of the novel virus or of a well-known virus. We propose a deep learning algorithm that combines a convolutional neural network (CNN) with a bi-directional long short-term memory (Bi-LSTM) neural network for the classification of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) amongst coronaviruses. In addition, we classify whether or not a genome sequence contains candidate regulatory motifs. Regulatory motifs bind to transcription factors. Transcription factors are responsible for the expression of genes. The experimental results show that at peak performance, the proposed CNN-Bi-LSTM model achieves a classification accuracy of 99.95%, an area under the receiver operating characteristic curve (AUC ROC) of 100.00%, a specificity of 99.97%, a sensitivity of 99.97%, a Cohen's Kappa of 0.9978, and a Matthews Correlation Coefficient (MCC) of 0.9978 for the classification of SARS-CoV-2 amongst coronaviruses. The CNN-Bi-LSTM also correctly detects whether a sequence has candidate regulatory motifs or binding sites, with a classification accuracy of 99.76%, an AUC ROC of 100.00%, a specificity of 99.76%, a sensitivity of 99.76%, an MCC of 0.9980, and a Cohen's Kappa of 0.9970 at peak performance. These results are encouraging enough to recognise deep learning algorithms as alternative avenues for detecting SARS-CoV-2 as well as for detecting regulatory motifs in SARS-CoV-2 genes.
Affiliation(s)
- Albert Whata
  - School of Natural and Applied Sciences, Sol Plaatje University, Kimberley 8301, South Africa
- Charles Chimedza
  - School of Statistics and Actuarial Science, University of the Witwatersrand, Johannesburg 2050, South Africa
4
Carazo F, Romero JP, Rubio A. Upstream analysis of alternative splicing: a review of computational approaches to predict context-dependent splicing factors. Brief Bioinform 2020; 20:1358-1375. [PMID: 29390045 DOI: 10.1093/bib/bby005] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 10/24/2017] [Revised: 12/14/2017] [Indexed: 12/13/2022]
Abstract
Alternative splicing (AS) has been shown to play a pivotal role in the development of diseases, including cancer. Specifically, all the hallmarks of cancer (angiogenesis, cell immortality, avoiding immune system response, etc.) are found to have a counterpart in aberrant splicing of key genes. Identifying the context-specific regulators of splicing provides valuable information to find new biomarkers, as well as to define alternative therapeutic strategies. The computational models to identify these regulators are not trivial and require three conceptual steps: the detection of AS events, the identification of splicing factors that potentially regulate these events, and the contextualization of these pieces of information for a specific experiment. In this work, we review the different algorithmic methodologies developed for each of these tasks. The main weaknesses and strengths of the different steps of the pipeline are discussed. Finally, a case study is detailed to make the reader aware of the potential and limitations of this computational approach.
5
Pronsato L, Milanesi L, Vasconsuelo A. Testosterone induces up-regulation of mitochondrial gene expression in murine C2C12 skeletal muscle cells accompanied by an increase of nuclear respiratory factor-1 and its downstream effectors. Mol Cell Endocrinol 2020; 500:110631. [PMID: 31676390 DOI: 10.1016/j.mce.2019.110631] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/06/2019] [Revised: 10/25/2019] [Accepted: 10/25/2019] [Indexed: 01/03/2023]
Abstract
The reduction in muscle mass and strength with age, sarcopenia, is a prevalent condition among the elderly, linked to skeletal muscle dysfunction and cell apoptosis. We previously demonstrated that testosterone protects against H2O2-induced apoptosis in C2C12 muscle cells. Here, we analyzed the effect of testosterone on mitochondrial gene expression in C2C12 skeletal muscle cells. We found that testosterone increases mRNA expression of genes encoded by mitochondrial DNA, such as NADPH dehydrogenase subunit 1 (ND1), subunit 4 (ND4), cytochrome b (CytB), cytochrome c oxidase subunit 1 (Cox1) and subunit 2 (Cox2) in C2C12. Additionally, the hormone induced the expression of nuclear respiratory factors 1 and 2 (Nrf-1 and Nrf-2), mitochondrial transcription factors A (Tfam) and B2 (TFB2M), and optic atrophy 1 (OPA1). Simultaneous treatment with testosterone and the androgen receptor antagonist Flutamide reduced these effects. Treatment with H2O2 to induce oxidative stress significantly decreased mitochondrial gene expression. Computational analysis revealed that mitochondrial DNA contains specific sequences that the androgen receptor could recognize and bind, which suggests direct regulation of mitochondrial transcription by the receptor. These findings indicate that androgen plays an important role in the regulation of mitochondrial transcription and biogenesis in skeletal muscle.
Affiliation(s)
- Lucía Pronsato
  - Instituto de Investigaciones Biológicas y Biomédicas del Sur (INBIOSUR-CONICET), 8000, Bahía Blanca, Argentina
- Lorena Milanesi
  - Instituto de Investigaciones Biológicas y Biomédicas del Sur (INBIOSUR-CONICET), 8000, Bahía Blanca, Argentina
- Andrea Vasconsuelo
  - Instituto de Investigaciones Biológicas y Biomédicas del Sur (INBIOSUR-CONICET), 8000, Bahía Blanca, Argentina
6
Guo WL, Huang DS. An efficient method to transcription factor binding sites imputation via simultaneous completion of multiple matrices with positional consistency. Mol Biosyst 2018; 13:1827-1837. [PMID: 28718849 DOI: 10.1039/c7mb00155j] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 12/20/2022]
Abstract
Transcription factors (TFs) are DNA-binding proteins that have a central role in regulating gene expression. Identification of DNA-binding sites of TFs is a key task in understanding transcriptional regulation, cellular processes and disease. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) enables genome-wide identification of in vivo TF binding sites. However, it is still difficult to map every TF in every cell line owing to cost and biological material availability, which poses an enormous obstacle for integrated analysis of gene regulation. To address this problem, we propose a novel computational approach, TFBSImpute, for predicting additional TF binding profiles by leveraging information from available ChIP-seq TF binding data. TFBSImpute fuses the dataset into a 3-mode tensor and imputes missing TF binding signals via simultaneous completion of multiple TF binding matrices with positional consistency. By assessing the performance of imputation methods against observed ChIP-seq TF binding profiles, we show that signals predicted by our method achieve overall similarity with experimental data and that TFBSImpute significantly outperforms baseline approaches. In addition, motif analysis shows that TFBSImpute performs better than the baselines in capturing binding motifs enriched in observed data, indicating that the higher performance of TFBSImpute is not simply due to averaging related samples. We anticipate that our approach will constitute a useful complement to experimental mapping of TF binding, which is beneficial for further study of regulation mechanisms and disease.
Affiliation(s)
- Wei-Li Guo
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, 201804, China
7
Pan WH, Sommer F, Falk-Paulsen M, Ulas T, Best L, Fazio A, Kachroo P, Luzius A, Jentzsch M, Rehman A, Müller F, Lengauer T, Walter J, Künzel S, Baines JF, Schreiber S, Franke A, Schultze JL, Bäckhed F, Rosenstiel P. Exposure to the gut microbiota drives distinct methylome and transcriptome changes in intestinal epithelial cells during postnatal development. Genome Med 2018; 10:27. [PMID: 29653584 PMCID: PMC5899322 DOI: 10.1186/s13073-018-0534-5] [Citation(s) in RCA: 95] [Impact Index Per Article: 15.8] [Received: 02/01/2018] [Accepted: 03/20/2018] [Indexed: 12/26/2022]
Abstract
BACKGROUND The interplay of epigenetic processes and the intestinal microbiota may play an important role in intestinal development and homeostasis. Previous studies have established that the microbiota regulates a large proportion of the intestinal epithelial transcriptome in the adult host, but microbial effects on DNA methylation and gene expression during early postnatal development are still poorly understood. Here, we sought to investigate the microbial effects on DNA methylation and the transcriptome of intestinal epithelial cells (IECs) during postnatal development. METHODS We collected IECs from the small intestine of each of five 1-, 4- and 12 to 16-week-old mice representing the infant, juvenile, and adult states, raised either in the presence or absence of a microbiota. The DNA methylation profile was determined using reduced representation bisulfite sequencing (RRBS) and the epithelial transcriptome by RNA sequencing using paired samples from each individual mouse to analyze the link between microbiota, gene expression, and DNA methylation. RESULTS We found that microbiota-dependent and -independent processes act together to shape the postnatal development of the transcriptome and DNA methylation signatures of IECs. The bacterial effect on the transcriptome increased over time, whereas most microbiota-dependent DNA methylation differences were detected already early after birth. Microbiota-responsive transcripts could be attributed to stage-specific cellular programs during postnatal development and regulated gene sets involved primarily immune pathways and metabolic processes. Integrated analysis of the methylome and transcriptome data identified 126 genomic loci at which coupled differential DNA methylation and RNA transcription were associated with the presence of intestinal microbiota. 
We validated a subset of differentially expressed and methylated genes in an independent mouse cohort, indicating the existence of microbiota-dependent "functional" methylation sites which may impact on long-term gene expression signatures in IECs. CONCLUSIONS Our study represents the first genome-wide analysis of microbiota-mediated effects on maturation of DNA methylation signatures and the transcriptional program of IECs after birth. It indicates that the gut microbiota dynamically modulates large portions of the epithelial transcriptome during postnatal development, but targets only a subset of microbially responsive genes through their DNA methylation status.
Affiliation(s)
- Wei-Hung Pan
  - Institute for Clinical Molecular Biology, University of Kiel, Rosalind-Franklin-Straße 12, 24105, Kiel, Germany
- Felix Sommer
  - Institute for Clinical Molecular Biology, University of Kiel, Rosalind-Franklin-Straße 12, 24105, Kiel, Germany
  - The Wallenberg Laboratory, Department of Molecular and Clinical Medicine, University of Gothenburg, 41345, Gothenburg, Sweden
- Maren Falk-Paulsen
  - Institute for Clinical Molecular Biology, University of Kiel, Rosalind-Franklin-Straße 12, 24105, Kiel, Germany
- Thomas Ulas
  - Genomics and Immunoregulation, LIMES-Institute, University of Bonn, 53115, Bonn, Germany
- Lena Best
  - Institute for Clinical Molecular Biology, University of Kiel, Rosalind-Franklin-Straße 12, 24105, Kiel, Germany
- Antonella Fazio
  - Institute for Clinical Molecular Biology, University of Kiel, Rosalind-Franklin-Straße 12, 24105, Kiel, Germany
- Priyadarshini Kachroo
  - Institute for Clinical Molecular Biology, University of Kiel, Rosalind-Franklin-Straße 12, 24105, Kiel, Germany
- Anne Luzius
  - Institute for Clinical Molecular Biology, University of Kiel, Rosalind-Franklin-Straße 12, 24105, Kiel, Germany
- Marlene Jentzsch
  - Institute for Clinical Molecular Biology, University of Kiel, Rosalind-Franklin-Straße 12, 24105, Kiel, Germany
- Ateequr Rehman
  - Institute for Clinical Molecular Biology, University of Kiel, Rosalind-Franklin-Straße 12, 24105, Kiel, Germany
- Fabian Müller
  - Max Planck Institute for Informatics, 66123, Saarbrücken, Germany
- Thomas Lengauer
  - Max Planck Institute for Informatics, 66123, Saarbrücken, Germany
  - Graduate School of Computer Science, Saarland University, 66123, Saarbrücken, Germany
- Jörn Walter
  - Department of Genetics, University of Saarland, 66123, Saarbrücken, Germany
- Sven Künzel
  - Institute for Experimental Medicine, Christian Albrechts University of Kiel, Kiel, Germany
- John F Baines
  - Institute for Experimental Medicine, Christian Albrechts University of Kiel, Kiel, Germany
  - Max Planck Institute for Evolutionary Biology, Evolutionary Genomics, August-Thienemann-Str. 2, 24306, Plön, Germany
- Stefan Schreiber
  - Institute for Clinical Molecular Biology, University of Kiel, Rosalind-Franklin-Straße 12, 24105, Kiel, Germany
  - Department of Internal Medicine I, University Hospital Schleswig Holstein, 24105, Kiel, Germany
- Andre Franke
  - Institute for Clinical Molecular Biology, University of Kiel, Rosalind-Franklin-Straße 12, 24105, Kiel, Germany
- Joachim L Schultze
  - Genomics and Immunoregulation, LIMES-Institute, University of Bonn, 53115, Bonn, Germany
  - Platform for Single Cell Genomics and Epigenomics (PRECISE), German Center for Neurodegenerative Diseases and the University of Bonn, Bonn, Germany
- Fredrik Bäckhed
  - The Wallenberg Laboratory, Department of Molecular and Clinical Medicine, University of Gothenburg, 41345, Gothenburg, Sweden
  - Novo Nordisk Foundation Center for Basic Metabolic Research, Section for Metabolic Receptology and Enteroendocrinology, Faculty of Health Sciences, University of Copenhagen, 2200, Copenhagen, Denmark
- Philip Rosenstiel
  - Institute for Clinical Molecular Biology, University of Kiel, Rosalind-Franklin-Straße 12, 24105, Kiel, Germany
8
Djordjevic M, Djordjevic M, Zdobnov E. Scoring Targets of Transcription in Bacteria Rather than Focusing on Individual Binding Sites. Front Microbiol 2017; 8:2314. [PMID: 29213263 PMCID: PMC5702782 DOI: 10.3389/fmicb.2017.02314] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/31/2017] [Accepted: 11/09/2017] [Indexed: 11/13/2022]
Abstract
Reliable identification of targets of bacterial regulators is necessary to understand bacterial gene expression regulation. These targets are commonly predicted by searching for high-scoring binding sites in the upstream genomic regions, which typically leads to a large number of false positives. In contrast to the common approach, here we propose a novel concept, where overrepresentation of the scoring distribution that corresponds to the entire searched region is assessed, as opposed to predicting individual binding sites. We explore two implementations of this concept, based on Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests, which both provide straightforward P-value estimates for predicted targets. This approach is implemented for pleiotropic bacterial regulators, including σ70 (bacterial housekeeping σ factor) target predictions, which is a classical bioinformatics problem characterized by low specificity. We show that the KS-based approach is both faster and more accurate, departing from the current paradigm of AD being slower but more accurate. Moreover, the KS approach leads to a significant increase in search accuracy compared to the standard approach, while at the same time straightforwardly assigning well-established P-values to each potential target. Consequently, the new KS-based method proposed here, which assigns P-values to fixed-length upstream regions, provides a fast and accurate approach for predicting bacterial transcription targets.
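The idea of testing a whole upstream region's score distribution, rather than calling individual sites, can be sketched as follows. This is a generic one-sided two-sample Kolmogorov-Smirnov test written with the Python standard library, not the authors' implementation. The argument names `region_scores` and `background_scores` are hypothetical, and the P-value uses the standard large-sample approximation exp(-2·(nm/(n+m))·D²), which is an assumption of this sketch.

```python
import math
from bisect import bisect_right


def ks_one_sided(region_scores, background_scores):
    """One-sided two-sample KS test.

    Tests whether scores in the region are shifted toward higher values
    than background scores. Returns (D_plus, approximate P-value).
    """
    region = sorted(region_scores)
    background = sorted(background_scores)
    n, m = len(region), len(background)

    d_plus = 0.0
    # Evaluate both empirical CDFs at every observed score.
    for x in sorted(set(region) | set(background)):
        cdf_region = bisect_right(region, x) / n
        cdf_background = bisect_right(background, x) / m
        # Higher region scores mean the region CDF lags behind the background CDF.
        d_plus = max(d_plus, cdf_background - cdf_region)

    effective_n = n * m / (n + m)
    # Large-sample approximation; it is rough for very small samples.
    p_value = math.exp(-2.0 * effective_n * d_plus ** 2)
    return d_plus, p_value


# Hypothetical binding-site scores for every position in one upstream region
# and for a background sample; these numbers are illustrative only.
d, p = ks_one_sided([5.0, 6.0, 7.0, 8.0], [1.0, 2.0, 3.0, 4.0])
print(d, p)
```

A region would be reported as a candidate target when its P-value falls below a chosen cutoff, so each fixed-length upstream region receives a single significance estimate.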
Affiliation(s)
- Marko Djordjevic
  - Institute of Physiology and Biochemistry, Faculty of Biology, University of Belgrade, Belgrade, Serbia
- Evgeny Zdobnov
  - Swiss Institute of Bioinformatics and Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland
9
Identifying novel transcription factors involved in the inflammatory response by using binding site motif scanning in genomic regions defined by histone acetylation. PLoS One 2017; 12:e0184850. [PMID: 28922390 PMCID: PMC5602638 DOI: 10.1371/journal.pone.0184850] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/22/2017] [Accepted: 08/31/2017] [Indexed: 02/07/2023]
Abstract
The innate immune response to pathogenic challenge is a complex, multi-staged process involving thousands of genes. While numerous transcription factors that act as master regulators of this response have been identified, the temporal complexity of gene expression changes in response to pathogen-associated molecular pattern receptor stimulation strongly suggests that additional layers of regulation remain to be uncovered. The evolved pathogen response program in mammalian innate immune cells is understood to reflect a compromise between the probability of clearing the infection and the extent of tissue damage and inflammatory sequelae it causes. Because of that, a key challenge in delineating the regulators that control the temporal inflammatory response is that an innate immune regulator that may confer a selective advantage in the wild may be dispensable in the lab setting. In order to better understand the complete transcriptional response of primary macrophages to the bacterial endotoxin lipopolysaccharide (LPS), we designed a method that integrates temporally resolved gene expression and chromatin-accessibility measurements from mouse macrophages. By correlating changes in transcription factor binding site motif enrichment scores, calculated within regions of accessible chromatin, with the average temporal expression profile of a gene cluster, we screened for transcription factors that regulate the cluster. We have validated our predictions of LPS-stimulated transcriptional regulators using ChIP-seq data for three transcription factors with experimentally confirmed functions in innate immunity. In addition, we predict a role in the macrophage LPS response for several novel transcription factors that have not previously been implicated in immune responses. This method is applicable to any experimental situation where temporal gene expression and chromatin-accessibility data are available.
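The screening step, correlating motif enrichment over time with a cluster's mean expression profile, might look like the sketch below. It assumes Pearson correlation as the measure and ranks by absolute correlation; the abstract does not specify either choice. The names `enrichment_by_tf` and `cluster_profile` and all numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rank_candidate_tfs(enrichment_by_tf, cluster_profile):
    """Rank TFs by |correlation| between motif enrichment and expression.

    enrichment_by_tf maps a TF name to its motif enrichment score at each
    time point; cluster_profile is the cluster's mean expression at the
    same time points.
    """
    scored = [
        (tf, pearson(scores, cluster_profile))
        for tf, scores in enrichment_by_tf.items()
    ]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Illustrative time courses only; these are not values from the study.
toy_enrichment = {"TF_A": [1.0, 2.0, 3.0, 4.0], "TF_B": [2.0, 1.0, 2.0, 1.0]}
toy_profile = [2.0, 4.0, 6.0, 8.0]
for tf, r in rank_candidate_tfs(toy_enrichment, toy_profile):
    print(tf, round(r, 3))
```

Ranking by absolute value keeps candidate repressors, whose enrichment would be anticorrelated with expression, in the same list as candidate activators.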
10
Jayaram N, Usvyat D, Martin ACR. Evaluating tools for transcription factor binding site prediction. BMC Bioinformatics 2016; 17:547. [PMID: 27806697 PMCID: PMC6889335 DOI: 10.1186/s12859-016-1298-9] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Received: 06/19/2016] [Accepted: 10/20/2016] [Indexed: 12/21/2022]
Abstract
Background: Binding of transcription factors to transcription factor binding sites (TFBSs) is key to the mediation of transcriptional regulation. Information on experimentally validated functional TFBSs is limited, and consequently there is a need for accurate prediction of TFBSs for gene annotation and in applications such as evaluating the effects of single nucleotide variations in causing disease. TFBSs are generally recognized by scanning a position weight matrix (PWM) against DNA using one of a number of available computer programs. Thus we set out to evaluate the best tools that can be used locally (and are therefore suitable for large-scale analyses) for creating PWMs from high-throughput ChIP-Seq data and for scanning them against DNA. Results: We evaluated a set of de novo motif discovery tools that could be downloaded and installed locally using ENCODE ChIP-Seq data and showed that rGADEM was the best-performing tool. TFBS prediction tools used to scan PWMs against DNA fall into two classes: those that predict individual TFBSs and those that identify clusters. Our evaluation showed that FIMO and MCAST performed best in these two classes, respectively. Conclusions: Selection of the best-performing tools for generating PWMs from ChIP-Seq data and for scanning PWMs against DNA has the potential to improve prediction of precise transcription factor binding sites within regions identified by ChIP-Seq experiments for gene finding, understanding regulation and in evaluating the effects of single nucleotide variations in causing disease.
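To make "scanning a position weight matrix against DNA" concrete, here is a minimal Python sketch. It is not how rGADEM, FIMO or MCAST are implemented. It assumes a log-odds PWM built from aligned example sites with a pseudocount and a uniform background, and a fixed score threshold; the example sites, the sequence and the threshold are all hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from equal-length aligned binding sites."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(width):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Slide the PWM along one strand and report windows scoring >= threshold.

    Assumes the sequence contains only A, C, G and T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical aligned sites and target sequence for illustration.
toy_pwm = build_pwm(["TGACA", "TGACA", "TGACT", "TGACA"])
for start, window, score in scan("CCTGACAGG", toy_pwm, threshold=4.0):
    print(start, window, round(score, 2))
```

Production tools additionally scan the reverse strand and convert scores to P-values, and cluster-based tools combine nearby hits; none of that is attempted here.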
Affiliation(s)
- Narayan Jayaram
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
- Daniel Usvyat
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
- Andrew C R Martin
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
11
Sharmin M, Bravo HC, Hannenhalli S. Heterogeneity of transcription factor binding specificity models within and across cell lines. Genome Res 2016; 26:1110-23. [PMID: 27311443 PMCID: PMC4971765 DOI: 10.1101/gr.199166.115] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 09/04/2015] [Accepted: 06/16/2016] [Indexed: 12/24/2022]
Abstract
Complex gene expression patterns are mediated by the binding of transcription factors (TFs) to specific genomic loci. The in vivo occupancy of a TF is, in large part, determined by the TF's DNA binding interaction partners, motivating genomic context-based models of TF occupancy. However, approaches thus far have assumed a uniform TF binding model to explain genome-wide cell-type–specific binding sites. Therefore, the cell type heterogeneity of TF occupancy models, as well as the extent to which binding rules underlying a TF's occupancy are shared across cell types, has not been investigated. Here, we develop an ensemble-based approach (TRISECT) to identify the heterogeneous binding rules for cell-type–specific TF occupancy and analyze the inter-cell-type sharing of such rules. Comprehensive analysis of 23 TFs, each with ChIP-seq data in four to 12 different cell types, shows that by explicitly capturing the heterogeneity of binding rules, TRISECT accurately identifies in vivo TF occupancy. Importantly, many of the binding rules derived from individual cell types are shared across cell types and reveal distinct yet functionally coherent putative target genes in different cell types. Closer inspection of the predicted cell-type–specific interaction partners provides insights into the context-specific functional landscape of a TF. Together, our novel ensemble-based approach reveals, for the first time, a widespread heterogeneity of binding rules, comprising the interaction partners within a cell type, many of which nevertheless transcend cell types. Notably, the putative targets of shared binding rules in different cell types, while distinct, exhibit significant functional coherence.
Affiliation(s)
- Mahfuza Sharmin
  - Department of Computer Science, University of Maryland, College Park, Maryland 20742, USA
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA
- Héctor Corrada Bravo
  - Department of Computer Science, University of Maryland, College Park, Maryland 20742, USA
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA
- Sridhar Hannenhalli
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA
  - Department of Cell and Molecular Biology, University of Maryland, College Park, Maryland 20742, USA
|
12
|
Behura SK, Sarro J, Li P, Mysore K, Severson DW, Emrich SJ, Duman-Scheel M. High-throughput cis-regulatory element discovery in the vector mosquito Aedes aegypti. BMC Genomics 2016; 17:341. [PMID: 27161480 PMCID: PMC4862039 DOI: 10.1186/s12864-016-2468-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 12/16/2015] [Accepted: 02/12/2016] [Indexed: 12/15/2022]
Abstract
Background Despite substantial progress in mosquito genomic and genetic research, few cis-regulatory elements (CREs), DNA sequences that control gene expression, have been identified in mosquitoes or other non-model insects. Formaldehyde-assisted isolation of regulatory elements paired with DNA sequencing, FAIRE-seq, is emerging as a powerful new high-throughput tool for global CRE discovery. FAIRE results in the preferential recovery of open chromatin DNA fragments that are not bound by nucleosomes, an evolutionarily conserved indicator of regulatory activity, which are then sequenced. Despite the power of the approach, FAIRE-seq has not yet been applied to the study of non-model insects. In this investigation, we utilized FAIRE-seq to profile open chromatin and identify likely regulatory elements throughout the genome of the human disease vector mosquito Aedes aegypti. We then assessed genetic variation in the regulatory elements of dengue virus susceptible (Moyo-S) and refractory (Moyo-R) mosquito strains. Results Analysis of sequence data obtained through next generation sequencing of FAIRE DNA isolated from A. aegypti embryos revealed >121,000 FAIRE peaks (FPs), many of which clustered in the 1 kb 5’ upstream flanking regions of genes known to be expressed at this stage. As expected, known transcription factor consensus binding sites were enriched in the FPs, and of these FoxA1, Hunchback, Gfi, Klf4, MYB/ph3 and Sox9 are most predominant. All of the elements tested in vivo were confirmed to drive gene expression in transgenic Drosophila reporter assays. Of the >13,000 single nucleotide polymorphisms (SNPs) recently identified in dengue virus-susceptible and refractory mosquito strains, 3365 were found to map to FPs. Conclusion FAIRE-seq analysis of open chromatin in A. aegypti permitted genome-wide discovery of CREs. 
The results of this investigation indicate that FAIRE-seq is a powerful tool for identification of regulatory DNA in the genomes of non-model organisms, including human disease vector mosquitoes.
Affiliation(s)
- Susanta K Behura
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA; Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, 46556, USA
- Joseph Sarro
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA; Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, 46556, USA; Department of Medical and Molecular Genetics, Indiana University School of Medicine, 1234 Notre Dame Ave., South Bend, IN, 46617, USA
- Ping Li
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA; Department of Medical and Molecular Genetics, Indiana University School of Medicine, 1234 Notre Dame Ave., South Bend, IN, 46617, USA
- Keshava Mysore
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA; Department of Medical and Molecular Genetics, Indiana University School of Medicine, 1234 Notre Dame Ave., South Bend, IN, 46617, USA
- David W Severson
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA; Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, 46556, USA; Department of Medical and Molecular Genetics, Indiana University School of Medicine, 1234 Notre Dame Ave., South Bend, IN, 46617, USA
- Scott J Emrich
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA; Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, 46556, USA; Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
- Molly Duman-Scheel
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA; Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, 46556, USA; Department of Medical and Molecular Genetics, Indiana University School of Medicine, 1234 Notre Dame Ave., South Bend, IN, 46617, USA
|
13
|
Verma S, Kesh K, Gupta A, Swarnakar S. An Overview of Matrix Metalloproteinase 9 Polymorphism and Gastric Cancer Risk. Asian Pac J Cancer Prev 2015; 16:7393-400. [DOI: 10.7314/apjcp.2015.16.17.7393] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022]
|
14
|
Clifford J, Adami C. Discovery and information-theoretic characterization of transcription factor binding sites that act cooperatively. Phys Biol 2015; 12:056004. [PMID: 26331781 DOI: 10.1088/1478-3975/12/5/056004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/28/2023]
Abstract
Transcription factor binding to the surface of DNA regulatory regions is one of the primary causes of regulating gene expression levels. A probabilistic approach to model protein-DNA interactions at the sequence level is through position weight matrices (PWMs) that estimate the joint probability of a DNA binding site sequence by assuming positional independence within the DNA sequence. Here we construct conditional PWMs that depend on the motif signatures in the flanking DNA sequence, by conditioning known binding site loci on the presence or absence of additional binding sites in the flanking sequence of each site's locus. Pooling known sites with similar flanking sequence patterns allows for the estimation of the conditional distribution function over the binding site sequences. We apply our model to the Dorsal transcription factor binding sites active in patterning the Dorsal-Ventral axis of Drosophila development. We find that those binding sites that cooperate with nearby Twist sites on average contain about 0.5 bits of information about the presence of Twist transcription factor binding sites in the flanking sequence. We also find that Dorsal binding site detectors conditioned on flanking sequence information make better predictions about what is a Dorsal site relative to background DNA than detection without information about flanking sequence features.
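The PWM idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the pseudocount, the uniform background, and the toy sites are all assumptions made for illustration. It builds a PWM under positional independence and then one PWM per flanking context, which is the simplest reading of "conditioning on the presence or absence of additional binding sites".

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Estimate per-position base probabilities from aligned, equal-length sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm


def log_odds(pwm, seq, background=0.25):
    """Score a sequence; summing per-position terms encodes positional independence."""
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, seq))


def conditional_pwms(sites_with_context, pseudocount=1.0):
    """Build one PWM per context label (hypothetical: partner site present or absent)."""
    groups = {}
    for site, has_partner in sites_with_context:
        groups.setdefault(has_partner, []).append(site)
    return {label: build_pwm(group, pseudocount) for label, group in groups.items()}


# Toy, made-up sites; real input would be curated binding sites with flank annotations.
toy = [("GGGAAA", True), ("GGGAAT", True), ("GGGTAA", False), ("GGGTTA", False)]
by_context = conditional_pwms(toy)
print(log_odds(by_context[True], "GGGAAA"), log_odds(by_context[False], "GGGAAA"))
```

Comparing the two scores for the same sequence shows how a context-specific matrix can rank a site differently from a pooled one.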
Affiliation(s)
- Jacob Clifford
- Department of Physics and Astronomy, Michigan State University, East Lansing, MI, USA. BEACON Center for the Study of Evolution in Action, Michigan State University, East Lansing, MI, USA
|
15
|
Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast. PLoS Comput Biol 2015; 11:e1004418. [PMID: 26291518 PMCID: PMC4546298 DOI: 10.1371/journal.pcbi.1004418] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 11/19/2014] [Accepted: 06/29/2015] [Indexed: 11/19/2022]
Abstract
Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA “intrinsic properties” (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome. 
Identification of transcription factor binding sites based on sequence motifs is typically accompanied by a high false positive rate. Increasing evidence suggests that there are many other factors besides DNA sequence that may affect the binding and interaction of TFs with DNA. Through the integration of sequence motif, chromatin state, and DNA structure properties, we show that TF binding can be better predicted. Moreover, considering chromatin state and DNA structure properties simultaneously yields a significant improvement. While the binding of some TFs can be readily predicted using either chromatin state information or DNA structure, other TFs need both. Thus, our findings provide insights on how different histone modifications and DNA structure properties may influence the binding of a particular TF and thus how TFs regulate gene expression. These features are referred to as sequence “intrinsic properties” because they can be predicted from sequences alone. These intrinsic properties can be used to build a TF binding prediction model that has a similar performance to considering all features. Moreover, the intrinsic property model allows TFBS predictions not only across TFs, but also across DNA-binding domain families that are present in most eukaryotes, suggesting that the model likely can be used across species.
|
16
|
Homotypic clustering of OsMYB4 binding site motifs in promoters of the rice genome and cellular-level implications on sheath blight disease resistance. Gene 2015; 561:209-18. [DOI: 10.1016/j.gene.2015.02.031] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 07/16/2014] [Revised: 02/08/2015] [Accepted: 02/12/2015] [Indexed: 11/18/2022]
|
17
|
Akdeli N, Riemann K, Westphal J, Hess J, Siffert W, Bachmann HS. A 3'UTR polymorphism modulates mRNA stability of the oncogene and drug target Polo-like Kinase 1. Mol Cancer 2014; 13:87. [PMID: 24767679 PMCID: PMC4020576 DOI: 10.1186/1476-4598-13-87] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 09/11/2013] [Accepted: 04/15/2014] [Indexed: 12/17/2022]
Abstract
BACKGROUND The Polo-like Kinase 1 (PLK1) protein regulates cell cycle progression and is overexpressed in many malignant tissues. Overexpression is associated with poor prognosis in several cancer entities, whereby expression of PLK1 shows high inter-individual variability. Although PLK1 is extensively studied, not much is known about the genetic variability of the PLK1 gene. The function of PLK1 and the expression of the corresponding gene could be influenced by genomic variations. Hence, we investigated the gene for functional polymorphisms. Such polymorphisms could be useful to investigate whether PLK1 alters the risk for and the course of cancer and they could have an impact on the response to PLK1 inhibitors. METHODS The coding region, the 5' and 3'UTRs and the regulatory regions of PLK1 were systematically sequenced. We determined the allele frequencies and genotype distributions of putatively functional SNPs in 120 Caucasians and analyzed the linkage and haplotype structure using Haploview. The functional analysis included electrophoretic mobility shift assay (EMSA) for detected variants of the silencer and promoter regions and reporter assays for a 3'UTR polymorphism. RESULTS Four putatively functional polymorphisms were detected and further analyzed, one in the silencer region (rs57973275), one in the core promoter region (rs16972787), one in intron 3 (rs40076) and one polymorphism in the 3'untranslated region (3'UTR) of PLK1 (rs27770). Alleles of rs27770 display different secondary mRNA structures and showed a distinct allele-dependent difference in mRNA stability with a significantly higher reporter activity of the A allele (p < 0.01). CONCLUSION The present study provides evidence that at least one genomic variant of PLK1 has functional properties and influences expression of PLK1. 
This suggests polymorphisms of the PLK1 gene as an interesting target for further studies that might affect cancer risk, tumor progression as well as the response to PLK1 inhibitors.
Affiliation(s)
- Neval Akdeli
- Institute of Pharmacogenetics, University Hospital Essen, Hufelandstr. 55, 45147 Essen, Germany
- Kathrin Riemann
- Institute of Pharmacogenetics, University Hospital Essen, Hufelandstr. 55, 45147 Essen, Germany
- Jana Westphal
- Institute of Pharmacogenetics, University Hospital Essen, Hufelandstr. 55, 45147 Essen, Germany
- Jochen Hess
- Institute of Pharmacogenetics, University Hospital Essen, Hufelandstr. 55, 45147 Essen, Germany
- Department of Urology, University Hospital Essen, Hufelandstr. 55, 45147 Essen, Germany
- Winfried Siffert
- Institute of Pharmacogenetics, University Hospital Essen, Hufelandstr. 55, 45147 Essen, Germany
- Hagen S Bachmann
- Institute of Pharmacogenetics, University Hospital Essen, Hufelandstr. 55, 45147 Essen, Germany
|
18
|
Identifying functional transcription factor binding sites in yeast by considering their positional preference in the promoters. PLoS One 2014; 8:e83791. [PMID: 24386279 PMCID: PMC3873331 DOI: 10.1371/journal.pone.0083791] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/30/2013] [Accepted: 11/08/2013] [Indexed: 11/25/2022]
Abstract
Transcription factor binding site (TFBS) identification plays an important role in deciphering gene regulatory codes. With comprehensive knowledge of TFBSs, one can understand molecular mechanisms of gene regulation. In the recent decades, various computational approaches have been proposed to predict TFBSs in the genome. The TFBS dataset of a TF generated by each algorithm is a ranked list of predicted TFBSs of that TF, where top ranked TFBSs are statistically significant ones. However, whether these statistically significant TFBSs are functional (i.e. biologically relevant) is still unknown. Here we develop a post-processor, called the functional propensity calculator (FPC), to assign a functional propensity to each TFBS in the existing computationally predicted TFBS datasets. It is known that functional TFBSs reveal strong positional preference towards the transcriptional start site (TSS). This motivates us to take TFBS position relative to the TSS as the key idea in building our FPC. Based on our calculated functional propensities, the TFBSs of a TF in the original TFBS dataset could be reordered, where top ranked TFBSs are now the ones with high functional propensities. To validate the biological significance of our results, we perform three published statistical tests to assess the enrichment of Gene Ontology (GO) terms, the enrichment of physical protein-protein interactions, and the tendency of being co-expressed. The top ranked TFBSs in our reordered TFBS dataset outperform the top ranked TFBSs in the original TFBS dataset, justifying the effectiveness of our post-processor in extracting functional TFBSs from the original TFBS dataset. More importantly, assigning functional propensities to putative TFBSs enables biologists to easily identify which TFBSs in the promoter of interest are likely to be biologically relevant and are good candidates to do further detailed experimental investigation. 
The FPC is implemented as a web tool at http://santiago.ee.ncku.edu.tw/FPC/.
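The reordering idea described in this abstract, ranking predicted TFBSs by how typical their position relative to the TSS is, can be sketched as follows. This is only an illustration and not the FPC's actual formula, which the abstract does not give: the binned empirical frequency used as a "propensity", the 50 bp bin width, and all function names and toy distances are assumptions.

```python
from collections import Counter


def bin_index(distance_to_tss, bin_width=50):
    """Map a signed distance to the TSS onto a non-negative bin number."""
    return abs(distance_to_tss) // bin_width


def positional_propensity(reference_distances, bin_width=50):
    """Fraction of reference (assumed functional) sites that fall in each distance bin."""
    counts = Counter(bin_index(d, bin_width) for d in reference_distances)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def rerank(predicted, propensity, bin_width=50):
    """Reorder (site_id, distance_to_tss) pairs by descending positional propensity."""
    scored = [
        (site_id, propensity.get(bin_index(d, bin_width), 0.0))
        for site_id, d in predicted
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy, made-up distances (bp upstream of the TSS are negative).
reference = [-120, -130, -110, -400]
predicted = [("site_a", -410), ("site_b", -125), ("site_c", -900)]
for site_id, score in rerank(predicted, positional_propensity(reference)):
    print(site_id, score)
```

A post-processor of this kind leaves the original predictions untouched and only changes their order, which is why it can be applied to the output of any upstream TFBS predictor.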
|
19
|
Kumari S, Ware D. Genome-wide computational prediction and analysis of core promoter elements across plant monocots and dicots. PLoS One 2013; 8:e79011. [PMID: 24205361 PMCID: PMC3812177 DOI: 10.1371/journal.pone.0079011] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 02/11/2013] [Accepted: 09/18/2013] [Indexed: 01/22/2023]
Abstract
Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large scale genome-wide report on the computational prediction of CPEs across eight plant genomes to help better understand the transcription initiation complex assembly. The distribution of thirteen known CPEs across four monocots (Brachypodium distachyon, Oryza sativa ssp. japonica, Sorghum bicolor, Zea mays) and four dicots (Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Glycine max) reveals the structural organization of the core promoter in relation to the TATA-box as well as with respect to other CPEs. The distribution of known CPE motifs with respect to transcription start site (TSS) exhibited positional conservation within monocots and dicots with slight differences across all eight genomes. Further, a more refined subset of annotated genes based on orthologs of the model monocot (O. sativa ssp. japonica) and dicot (A. thaliana) genomes supported the positional distribution of these thirteen known CPEs. DNA free energy profiles provided evidence that the structural properties of promoter regions are distinctly different from that of the non-regulatory genome sequence. It also showed that monocot core promoters have lower DNA free energy than dicot core promoters. The comparison of monocot and dicot promoter sequences highlights both the similarities and differences in the core promoter architecture irrespective of the species-specific nucleotide bias. This study will be useful for future work related to genome annotation projects and can inspire research efforts aimed to better understand regulatory mechanisms of transcription.
Affiliation(s)
- Sunita Kumari
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- United States Department of Agriculture-Agriculture Research Service, Robert W. Holley Center for Agriculture and Health, Ithaca, New York, United States of America
|
20
|
Plasschaert RN, Vigneau S, Tempera I, Gupta R, Maksimoska J, Everett L, Davuluri R, Mamorstein R, Lieberman PM, Schultz D, Hannenhalli S, Bartolomei MS. CTCF binding site sequence differences are associated with unique regulatory and functional trends during embryonic stem cell differentiation. Nucleic Acids Res 2013; 42:774-89. [PMID: 24121688 PMCID: PMC3902912 DOI: 10.1093/nar/gkt910] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Indexed: 12/17/2022]
Abstract
CTCF (CCCTC-binding factor) is a highly conserved multifunctional DNA-binding protein with thousands of binding sites genome-wide. Our previous work suggested that differences in CTCF’s binding site sequence may affect the regulation of CTCF recruitment and its function. To investigate this possibility, we characterized changes in genome-wide CTCF binding and gene expression during differentiation of mouse embryonic stem cells. After separating CTCF sites into three classes (LowOc, MedOc and HighOc) based on similarity to the consensus motif, we found that developmentally regulated CTCF binding occurs preferentially at LowOc sites, which have lower similarity to the consensus. By measuring the affinity of CTCF for selected sites, we show that sites lost during differentiation are enriched in motifs associated with weaker CTCF binding in vitro. Specifically, enrichment for T at the 18th position of the CTCF binding site is associated with regulated binding in the LowOc class and can predictably reduce CTCF affinity for binding sites. Finally, by comparing changes in CTCF binding with changes in gene expression during differentiation, we show that LowOc and HighOc sites are associated with distinct regulatory functions. Our results suggest that the regulatory control of CTCF is dependent in part on specific motifs within its binding site.
Affiliation(s)
- Robert N Plasschaert
- Department of Cell & Developmental Biology, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA, Program of Gene Expression and Regulation, The Wistar Institute, Philadelphia, PA 19104, USA, Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA and Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
|
21
|
Persikov AV, Singh M. De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins. Nucleic Acids Res 2013; 42:97-108. [PMID: 24097433 PMCID: PMC3874201 DOI: 10.1093/nar/gkt890] [Citation(s) in RCA: 134] [Impact Index Per Article: 12.2] [Indexed: 12/13/2022]
Abstract
Proteins with sequence-specific DNA binding function are important for a wide range of biological activities. De novo prediction of their DNA-binding specificities from sequence alone would be a great aid in inferring cellular networks. Here we introduce a method for predicting DNA-binding specificities for Cys2His2 zinc fingers (C2H2-ZFs), the largest family of DNA-binding proteins in metazoans. We develop a general approach, based on empirical calculations of pairwise amino acid–nucleotide interaction energies, for predicting position weight matrices (PWMs) representing DNA-binding specificities for C2H2-ZF proteins. We predict DNA-binding specificities on a per-finger basis and merge predictions for C2H2-ZF domains that are arrayed within sequences. We test our approach on a diverse set of natural C2H2-ZF proteins with known binding specificities and demonstrate that for >85% of the proteins, their predicted PWMs are accurate in 50% of their nucleotide positions. For proteins with several zinc finger isoforms, we show via case studies that this level of accuracy enables us to match isoforms with their known DNA-binding specificities. A web server for predicting a PWM given a protein containing C2H2-ZF domains is available online at http://zf.princeton.edu and can be used to aid in protein engineering applications and in genome-wide searches for transcription factor targets.
Affiliation(s)
- Anton V Persikov
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton NJ 08544, USA and Department of Computer Science, Princeton University, Princeton NJ 08544, USA
|
22
|
Wang H, Guan S, Zhu Z, Wang Y, Lu Y. A valid strategy for precise identifications of transcription factor binding sites in combinatorial regulation using bioinformatic and experimental approaches. Plant Methods 2013; 9:34. [PMID: 23971995 PMCID: PMC3847620 DOI: 10.1186/1746-4811-9-34] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/01/2013] [Accepted: 08/13/2013] [Indexed: 05/04/2023]
Abstract
BACKGROUND Transcription factor (TF) binding sites (cis element) play a central role in gene regulation, and eukaryotic organisms frequently adapt a combinatorial regulation to render sophisticated local gene expression patterns. Knowing the precise cis element on a distal promoter is a prerequisite for studying a typical transcription process; however, identifications of cis elements have lagged behind those of their associated trans acting TFs due to technical difficulties. Consequently, gene regulations via combinatorial TFs, as widely observed across biological processes, have remained vague in many cases. RESULTS We present here a valid strategy for identifying cis elements in combinatorial TF regulations. It consists of bioinformatic searches of available databases to generate candidate cis elements and tests of the candidates using improved experimental assays. Taking the MYB and the bHLH that collaboratively regulate the anthocyanin pathway genes as examples, we demonstrate how candidate cis motifs for the TFs are found on multi-specific promoters of chalcone synthase (CHS) genes, and how to experimentally test the candidate sites by designing DNA fragments hosting the candidate motifs based on a known promoter (us1 allele of Ipomoea purpurea CHS-D in our case) and applying site-mutagenesis at the motifs. It was shown that TF-DNA interactions could be unambiguously analyzed by assays of electrophoretic mobility shift (EMSA) and dual-luciferase transient expressions, and the resulting evidence precisely delineated a cis element. The cis element for R2R3 MYBs including Ipomoea MYB1 and Magnolia MYB1, for instance, was found to be ANCNACC, and that for bHLHs (exemplified by Ipomoea bHLH2 and petunia AN1) was CACNNG. 
A re-analysis was conducted on previously reported promoter segments recognized by maize C1 and apple MYB10, which indicated that cis elements similar to ANCNACC were indeed present on these segments, and tested positive for their bindings to Ipomoea MYB1. CONCLUSION Identification of cis elements in combinatorial regulation is now feasible with the strategy outlined. The working pipeline integrates the existing databases with experimental techniques, providing an open framework for precisely identifying cis elements. This strategy is widely applicable to various biological systems, and may enhance future analyses on gene regulation.
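The two consensus elements reported in this abstract, ANCNACC for R2R3 MYBs and CACNNG for bHLHs, can be located in a promoter sequence with a short Python sketch. This illustrates only the bioinformatic candidate-search step and is not the authors' pipeline: the function names and the toy promoter are assumptions, and N is read as "any of A, C, G, T".

```python
import re

# Only the symbols needed for the two motifs; N matches any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Compile a motif into a lookahead regex so overlapping hits are reported."""
    return re.compile("(?=(" + "".join(IUPAC[c] for c in motif) + "))")


def find_motif(seq, motif):
    """Return sorted (start, strand, match) hits; starts are 0-based on the forward strand."""
    seq = seq.upper()
    pattern = motif_to_regex(motif)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        start = len(seq) - m.start() - len(motif)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


# Toy, made-up promoter fragment containing one site of each kind.
promoter = "TTAACTACCGGCACGTGAA"
for motif in ("ANCNACC", "CACNNG"):
    print(motif, find_motif(promoter, motif))
```

Hits found this way are only candidates; the abstract stresses that each one still needs experimental testing, for example by EMSA and reporter assays after site mutagenesis.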
Affiliation(s)
- Hailong Wang
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, 20 Nan Xin Cun, Beijing 100093, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Shan Guan
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, 20 Nan Xin Cun, Beijing 100093, China
- Zhixin Zhu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, 20 Nan Xin Cun, Beijing 100093, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yan Wang
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, 20 Nan Xin Cun, Beijing 100093, China
- Yingqing Lu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, 20 Nan Xin Cun, Beijing 100093, China
|
23
|
Qu H, Fang X. A brief review on the Human Encyclopedia of DNA Elements (ENCODE) project. Genomics Proteomics & Bioinformatics 2013; 11:135-41. [PMID: 23722115 PMCID: PMC4357814 DOI: 10.1016/j.gpb.2013.05.001] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.0] [Received: 05/10/2013] [Revised: 05/15/2013] [Accepted: 05/18/2013] [Indexed: 12/18/2022]
Abstract
The ENCyclopedia Of DNA Elements (ENCODE) project is an international research consortium that aims to identify all functional elements in the human genome sequence. The second phase of the project comprised 1640 datasets from 147 different cell types, yielding a set of 30 publications across several journals. These data revealed that 80.4% of the human genome displays some functionality in at least one cell type. Many of these regulatory elements are physically associated with one another and further form a network or three-dimensional conformation to affect gene expression. These elements are also related to sequence variants associated with diseases or traits. All these findings provide us new insights into the organization and regulation of genes and genome, and serve as an expansive resource for understanding human health and disease.
|
24
|
Fertig EJ, Favorov AV, Ochs MF. Identifying context-specific transcription factor targets from prior knowledge and gene expression data. IEEE Trans Nanobioscience 2013; 12:142-9. [PMID: 23694699 DOI: 10.1109/tnb.2013.2263390] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/26/2023]
Abstract
Numerous methodologies, assays, and databases presently provide candidate targets of transcription factors (TFs). However, TFs rarely regulate their targets universally. The context of activation of a TF can change the transcriptional response of targets. Direct multiple regulation typical to mammalian genes complicates direct inference of TF targets from gene expression data. We present a novel statistic that infers context-specific TF regulation based upon the CoGAPS algorithm, which infers overlapping gene expression patterns resulting from coregulation. Numerical experiments with simulated data showed that this statistic correctly inferred targets that are common to multiple TFs, except in cases where the signal from a TF is negligible relative to noise level and signal from other TFs. The statistic is robust to moderate levels of error in the simulated gene sets, identifying fewer false positives than false negatives. Significantly, the regulatory statistic refines the number of TF targets relevant to cell signaling in gastrointestinal stromal tumors (GIST) to genes consistent with the phosphorylation patterns of TFs identified in previous studies. As formulated, the proposed regulatory statistic has wide applicability to inferring set membership in integrated datasets. This statistic could be naturally extended to account for prior probabilities of set membership or to add candidate gene targets.
Affiliation(s)
- Elana J Fertig
- Department of Oncology, SKCCC, School of Medicine, Johns Hopkins University, Baltimore, MD 21218, USA.
|
25
|
Khan MAF, Soto-Jimenez LM, Howe T, Streit A, Sosinsky A, Stern CD. Computational tools and resources for prediction and analysis of gene regulatory regions in the chick genome. Genesis 2013; 51:311-24. [PMID: 23355428 PMCID: PMC3664090 DOI: 10.1002/dvg.22375] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 08/31/2012] [Revised: 01/16/2013] [Accepted: 01/17/2013] [Indexed: 11/07/2022]
Abstract
The discovery of cis-regulatory elements is a challenging problem in bioinformatics, owing to distal locations and context-specific roles of these elements in controlling gene regulation. Here we review the current bioinformatics methodologies and resources available for systematic discovery of cis-acting regulatory elements and conserved transcription factor binding sites in the chick genome. In addition, we propose and make available a novel workflow using computational tools that integrate CTCF analysis to predict putative insulator elements, enhancer prediction, and TFBS analysis. To demonstrate the usefulness of this computational workflow, we then use it to analyze the locus of the gene Sox2, whose developmental expression is known to be controlled by a complex array of cis-acting regulatory elements. The workflow accurately predicts most of the experimentally verified elements along with some that have not yet been discovered. A web version of the CTCF tool, together with instructions for using the workflow, can be accessed from http://toolshed.g2.bx.psu.edu/view/mkhan1980/ctcf_analysis. For local installation of the tool, relevant Perl scripts and instructions are provided in the directory named "code" in the supplementary materials.
Affiliation(s)
- Mohsin A F Khan
- Department of Cell & Developmental Biology, University College London, London, United Kingdom
|
26
|
Lee C, Huang CH. LASAGNA: a novel algorithm for transcription factor binding site alignment. BMC Bioinformatics 2013; 14:108. [PMID: 23522376 PMCID: PMC3747862 DOI: 10.1186/1471-2105-14-108] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 07/24/2012] [Accepted: 03/08/2013] [Indexed: 12/11/2022]
Abstract
BACKGROUND Scientists routinely scan DNA sequences for transcription factor (TF) binding sites (TFBSs). Most of the available tools rely on position-specific scoring matrices (PSSMs) constructed from aligned binding sites. Because of the resolutions of assays used to obtain TFBSs, databases such as TRANSFAC, ORegAnno and PAZAR store unaligned variable-length DNA segments containing binding sites of a TF. These DNA segments need to be aligned to build a PSSM. While the TRANSFAC database provides scoring matrices for TFs, nearly 78% of the TFs in the public release do not have matrices available. As work on TFBS alignment algorithms has been limited, it is highly desirable to have an alignment algorithm tailored to TFBSs. RESULTS We designed a novel algorithm named LASAGNA, which is aware of the lengths of input TFBSs and utilizes position dependence. Results on 189 TFs of 5 species in the TRANSFAC database showed that our method significantly outperformed ClustalW2 and MEME. We further compared a PSSM method dependent on LASAGNA to an alignment-free TFBS search method. Results on 89 TFs whose binding sites can be located in genomes showed that our method is significantly more precise at fixed recall rates. Finally, we described LASAGNA-ChIP, a more sophisticated version for ChIP (Chromatin immunoprecipitation) experiments. Under the one-per-sequence model, it showed comparable performance with MEME in discovering motifs in ChIP-seq peak sequences. CONCLUSIONS We conclude that the LASAGNA algorithm is simple and effective in aligning variable-length binding sites. It has been integrated into a user-friendly webtool for TFBS search and visualization called LASAGNA-Search. The tool currently stores precomputed PSSM models for 189 TFs and 133 TFs built from TFBSs in the TRANSFAC Public database (release 7.0) and the ORegAnno database (08Nov10 dump), respectively. The webtool is available at http://biogrid.engr.uconn.edu/lasagna_search/.
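The PSSM-based scanning that this abstract takes as its starting point can be illustrated with a minimal sketch. It assumes the binding sites are already aligned to equal length, which is exactly the step LASAGNA is designed to automate, so this is not the LASAGNA algorithm itself. The function names `build_pssm` and `scan_best_window`, the uniform background of 0.25, and the pseudocount of 1 are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM from equal-length, already aligned sites.

    Assumption: uniform background and a flat pseudocount; real tools
    usually estimate both from data.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("sites must be aligned to the same length")
    total = len(sites) + 4 * pseudocount
    pssm = []
    for position in range(width):
        counts = Counter(site[position] for site in sites)
        column = {
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in "ACGT"
        }
        pssm.append(column)
    return pssm


def scan_best_window(sequence, pssm):
    """Return (score, offset) of the best-scoring window, or None if too short."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Hypothetical aligned sites, used only to exercise the functions.
example_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
example_pssm = build_pssm(example_sites)
best_score, best_offset = scan_best_window("GGGGTGACTCAGGGG", example_pssm)
```

The sketch scores only the forward strand and expects uppercase A, C, G, T; a practical scanner would also score the reverse complement and apply a score threshold.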
Affiliation(s)
- Chih Lee
- Department of Computer Science and Engineering, University of Connecticut, Fairfield Road, Storrs, CT 06269, USA
- Chun-Hsi Huang
- Department of Computer Science and Engineering, University of Connecticut, Fairfield Road, Storrs, CT 06269, USA
|
27
|
Chalmel F, Lardenois A, Georg I, Barrionuevo F, Demougin P, Jégou B, Scherer G, Primig M. Genome-wide identification of Sox8-, and Sox9-dependent genes during early post-natal testis development in the mouse. Andrology 2013; 1:281-92. [PMID: 23315995 DOI: 10.1111/j.2047-2927.2012.00049.x] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 10/17/2012] [Revised: 11/14/2012] [Accepted: 11/20/2012] [Indexed: 01/15/2023]
Abstract
The SOX8 and SOX9 transcription factors are involved in, among others, sex differentiation, male gonad development and adult maintenance of spermatogenesis. Sox8(-/-) mice lacking Sox9 in Sertoli cells fail to form testis cords and cannot establish spermatogenesis. Although genetic and histological data show an important role for these transcription factors in regulating spermatogenesis, it is not clear which genes depend upon them at a genome-wide level. To identify transcripts that respond to the absence of Sox8 in all cells and Sox9 in Sertoli cells we measured mRNA concentrations in testicular samples from mice at 0, 6 and 18 days post-partum. In total, 621 and 629 transcripts were found at decreased or increased levels, respectively, at different time points in the mutant as compared to the control samples. These mRNAs were categorized as preferentially expressed in Sertoli cells or germ cells using data obtained with male and female gonad samples and enriched testicular cell populations. Five candidate genes were validated at the protein level. Furthermore, we identified putative direct SOX8 and SOX9 target genes by integrating predicted SOX-binding sites present in potential regulatory regions upstream of the transcription start site. Finally, we used protein network data to gain insight into the effects on regulatory interactions that occur when Sox8 and Sox9 are absent in developing Sertoli cells. The integration of testicular samples with enriched Sertoli cells, germ cells and female gonads enabled us to broadly distinguish transcripts directly affected in Sertoli cells from others that respond to secondary events in testicular cell types. Thus, combined RNA profiling signals, motif predictions and network data identified putative SOX8/SOX9 target genes in Sertoli cells and yielded insight into regulatory interactions that depend upon these transcription factors. In addition, our results will facilitate the interpretation of genome-wide in vivo SOX8 and SOX9 DNA binding data.
Affiliation(s)
- F Chalmel
- Inserm, U1085-Irset, University of Rennes 1, Rennes, France
|
28
|
Piro RM, Molineris I, Di Cunto F, Eils R, König R. Disease-gene discovery by integration of 3D gene expression and transcription factor binding affinities. Bioinformatics 2012; 29:468-75. [PMID: 23267172 DOI: 10.1093/bioinformatics/bts720] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/19/2023]
Abstract
MOTIVATION The computational evaluation of candidate genes for hereditary disorders is a non-trivial task. Several excellent methods for disease-gene prediction have been developed in the past 2 decades, exploiting widely differing data sources to infer disease-relevant functional relationships between candidate genes and disorders. We have shown recently that spatially mapped, i.e. 3D, gene expression data from the mouse brain can be successfully used to prioritize candidate genes for human Mendelian disorders of the central nervous system. RESULTS We improved our previous work 2-fold: (i) we demonstrate that condition-independent transcription factor binding affinities of the candidate genes' promoters are relevant for disease-gene prediction and can be integrated with our previous approach to significantly enhance its predictive power; and (ii) we define a novel similarity measure, termed Relative Intensity Overlap, for both 3D gene expression patterns and binding affinity profiles that better exploits their disease-relevant information content. Finally, we present novel disease-gene predictions for eight loci associated with different syndromes of unknown molecular basis that are characterized by mental retardation.
Affiliation(s)
- Rosario M Piro
- Department of Theoretical Bioinformatics, German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ), University of Heidelberg, Im 69120 Heidelberg, Germany.
|
29
|
He Y, Zhang Y, Zheng G, Wei C. CTF: a CRF-based transcription factor binding sites finding system. BMC Genomics 2012; 13 Suppl 8:S18. [PMID: 23282203 PMCID: PMC3535700 DOI: 10.1186/1471-2164-13-s8-s18] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
Abstract
Background Identifying the locations of transcription factor binding is crucial to understanding transcriptional regulation. Currently, Chromatin Immunoprecipitation followed by high-throughput Sequencing (ChIP-seq) is able to locate transcription factor binding sites (TFBSs) accurately and in high throughput, and it has become the gold-standard experimental method for TFBS finding. However, due to its high cost, it is impractical to apply the method on a very large scale. Considering the large number of transcription factors, numerous cell types and various conditions, computational methods are still very valuable for accurate TFBS identification. Results In this paper, we proposed a novel integrated TFBS prediction system, CTF, based on Conditional Random Fields (CRFs). Integrating information from different sources, CTF was able to capture patterns of TFBSs contained in different features (sequence, chromatin, etc.) and predicted the TFBS locations with high accuracy. We compared CTF with several existing tools as well as the PWM baseline method on a dataset generated by ChIP-seq experiments (TFBSs of 13 transcription factors in the mouse genome). Results showed that CTF performed significantly better than the existing methods tested. Conclusions CTF is a powerful tool to predict TFBSs by integrating high-throughput data and different features. It can be a useful complement to ChIP-seq and other experimental methods for TFBS identification and thus improve our ability to investigate functional elements in the post-genomic era. Availability: CTF is freely available to academic users at: http://cbb.sjtu.edu.cn/~ccwei/pub/software/CTF/CTF.php
Affiliation(s)
- Yupeng He
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
|
30
|
Blanco E, Corominas M. CBS: an open platform that integrates predictive methods and epigenetics information to characterize conserved regulatory features in multiple Drosophila genomes. BMC Genomics 2012; 13:688. [PMID: 23228284 PMCID: PMC3564944 DOI: 10.1186/1471-2164-13-688] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/24/2012] [Accepted: 11/28/2012] [Indexed: 12/11/2022]
Abstract
Background Information about the composition of regulatory regions is of great value for designing experiments to functionally characterize gene expression. The multiplicity of available applications to predict transcription factor binding sites in a particular locus contrasts with the substantial computational expertise that is demanded to manipulate them, which may constitute a potential barrier for the experimental community. Results CBS (Conserved regulatory Binding Sites, http://compfly.bio.ub.es/CBS) is a public platform of evolutionarily conserved binding sites and enhancers predicted in multiple Drosophila genomes that is furnished with published chromatin signatures associated with transcriptionally active regions and other experimental sources of information. The rapid access to this novel body of knowledge through a user-friendly web interface enables non-expert users to identify the binding sequences available for any particular gene, transcription factor, or genome region. Conclusions The CBS platform is a powerful resource that provides tools for data mining individual sequences and groups of co-expressed genes with epigenomics information to conduct regulatory screenings in Drosophila.
Affiliation(s)
- Enrique Blanco
- Departament de Genètica and Institut de Biomedicina (IBUB), Universitat de Barcelona, Av. Diagonal 643, 08028, Barcelona, Spain.
|
31
|
Abstract
Understanding regulation of gene transcription is central to molecular biology as well as being of great interest in medicine. The molecular syntax of the concerted transcriptional activation/repression of gene networks in mammal cells, which shape the physiological response to the molecular signals, is often unknown or not completely understood. Combining genome-wide experiments with in silico approaches opens the way to a more systematic comprehension of the molecular mechanisms of transcription regulation. Diverse bioinformatics tools have been developed to help unravel these mechanisms, by handling and processing data at different stages: from data collection and storage to the identification of molecular targets and from the detection of DNA motif signatures in the regulatory sequences of functionally related genes to the identification of relevant regulatory networks. Moreover, the large amount of genome-wide scale data recently produced has attracted professionals from diverse backgrounds to this cutting-edge realm of molecular biology. This mini-review is intended as an orientation for multidisciplinary professionals, introducing a streamlined workflow in gene transcription regulation with emphasis on sequence analysis. It provides an outlook on tools and methods, selected from a host of bioinformatics resources available today. It has been designed for the benefit of students, investigators, and professionals who seek a coherent yet quick introduction to in silico approaches to analyzing regulation of gene transcription in the post-genomic era.
Affiliation(s)
- Gioia Altobelli
- Department of Endocrinology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK.
|
32
|
Whitfield TW, Wang J, Collins PJ, Partridge EC, Aldred SF, Trinklein ND, Myers RM, Weng Z. Functional analysis of transcription factor binding sites in human promoters. Genome Biol 2012; 13:R50. [PMID: 22951020 PMCID: PMC3491394 DOI: 10.1186/gb-2012-13-9-r50] [Citation(s) in RCA: 108] [Impact Index Per Article: 9.0] [Received: 12/04/2011] [Revised: 04/19/2012] [Accepted: 06/18/2012] [Indexed: 12/19/2022]
Abstract
Background The binding of transcription factors to specific locations in the genome is integral to the orchestration of transcriptional regulation in cells. To characterize transcription factor binding site function on a large scale, we predicted and mutagenized 455 binding sites in human promoters. We carried out functional tests on these sites in four different immortalized human cell lines using transient transfections with a luciferase reporter assay, primarily for the transcription factors CTCF, GABP, GATA2, E2F, STAT, and YY1. Results In each cell line, between 36% and 49% of binding sites made a functional contribution to the promoter activity; the overall rate for observing function in any of the cell lines was 70%. Transcription factor binding resulted in transcriptional repression in more than a third of functional sites. When compared with predicted binding sites whose function was not experimentally verified, the functional binding sites had higher conservation and were located closer to transcriptional start sites (TSSs). Among functional sites, repressive sites tended to be located further from TSSs than were activating sites. Our data provide significant insight into the functional characteristics of YY1 binding sites, most notably the detection of distinct activating and repressing classes of YY1 binding sites. Repressing sites were located closer to, and often overlapped with, translational start sites and presented a distinctive variation on the canonical YY1 binding motif. Conclusions The genomic properties that we found to associate with functional TF binding sites on promoters -- conservation, TSS proximity, motifs and their variations -- point the way to improved accuracy in future TFBS predictions.
Affiliation(s)
- Troy W Whitfield
- Program in Bioinformatics and Integrative Biology and Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA
|
33
|
Arnold P, Erb I, Pachkov M, Molina N, van Nimwegen E. MotEvo: integrated Bayesian probabilistic methods for inferring regulatory sites and motifs on multiple alignments of DNA sequences. Bioinformatics 2012; 28:487-94. [PMID: 22334039 DOI: 10.1093/bioinformatics/btr695] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.4] [Indexed: 11/14/2022]
Abstract
MOTIVATION Probabilistic approaches for inferring transcription factor binding sites (TFBSs) and regulatory motifs from DNA sequences have been developed for over two decades. Previous work has shown that prediction accuracy can be significantly improved by incorporating features such as the competition of multiple transcription factors (TFs) for binding to nearby sites, the tendency of TFBSs for co-regulated TFs to cluster and form cis-regulatory modules and explicit evolutionary modeling of conservation of TFBSs across orthologous sequences. However, currently available tools only incorporate some of these features, and significant methodological hurdles hampered their synthesis into a single consistent probabilistic framework. RESULTS We present MotEvo, an integrated suite of Bayesian probabilistic methods for the prediction of TFBSs and inference of regulatory motifs from multiple alignments of phylogenetically related DNA sequences, which incorporates all features just mentioned. In addition, MotEvo incorporates a novel model for detecting unknown functional elements that are under evolutionary constraint, and a new robust model for treating gain and loss of TFBSs along a phylogeny. Rigorous benchmarking tests on ChIP-seq datasets show that MotEvo's novel features significantly improve the accuracy of TFBS prediction, motif inference and enhancer prediction. AVAILABILITY Source code, a user manual and files with several example applications are available at www.swissregulon.unibas.ch.
Affiliation(s)
- Phil Arnold
- Biozentrum, University of Basel, Swiss Institute of Bioinformatics, Klingelbergstrasse 50-70, 4056 Basel, Switzerland
|
34
|
Pairó E, Maynou J, Marco S, Perera A. A subspace method for the detection of transcription factor binding sites. Bioinformatics 2012; 28:1328-35. [DOI: 10.1093/bioinformatics/bts147] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
|
35
|
Habib N, Wapinski I, Margalit H, Regev A, Friedman N. A functional selection model explains evolutionary robustness despite plasticity in regulatory networks. Mol Syst Biol 2012; 8:619. [PMID: 23089682 PMCID: PMC3501536 DOI: 10.1038/msb.2012.50] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 07/20/2012] [Accepted: 08/29/2012] [Indexed: 11/09/2022]
Abstract
Evolutionary rewiring of regulatory networks is an important source of diversity among species. Previous evidence suggested substantial divergence of regulatory networks across species. However, systematically assessing the extent of this plasticity and its functional implications has been challenging due to limited experimental data and the noisy nature of computational predictions. Here, we introduce a novel approach to study cis-regulatory evolution, and use it to trace the regulatory history of 88 DNA motifs of transcription factors across 23 Ascomycota fungi. While motifs are conserved, we find a pervasive gain and loss in the regulation of their target genes. Despite this turnover, the biological processes associated with a motif are generally conserved. We explain these trends using a model with a strong selection to conserve the overall function of a transcription factor, and a much weaker selection over the specific genes it targets. The model also accounts for the turnover of bound targets measured experimentally across species in yeasts and mammals. Thus, selective pressures on regulatory networks mostly tolerate local rewiring, and may allow for subtle fine-tuning of gene regulation during evolution.
Affiliation(s)
- Naomi Habib
- School of Computer Science and Engineering, Hebrew University, Jerusalem, Israel
- Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University, Jerusalem, Israel
- Alexander Silberman Institute of Life Sciences, Hebrew University, Jerusalem, Israel
- Ilan Wapinski
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Broad Institute, 7 Cambridge Center, Cambridge, MA, USA
- Hanah Margalit
- Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University, Jerusalem, Israel
- Aviv Regev
- Broad Institute, 7 Cambridge Center, Cambridge, MA, USA
- Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Nir Friedman
- School of Computer Science and Engineering, Hebrew University, Jerusalem, Israel
- Alexander Silberman Institute of Life Sciences, Hebrew University, Jerusalem, Israel
|
36
|
Oh YM, Kim JK, Choi S, Yoo JY. Identification of co-occurring transcription factor binding sites from DNA sequence using clustered position weight matrices. Nucleic Acids Res 2011; 40:e38. [PMID: 22187154 PMCID: PMC3300004 DOI: 10.1093/nar/gkr1252] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
Abstract
Accurate prediction of transcription factor binding sites (TFBSs) is a prerequisite for identifying cis-regulatory modules that underlie transcriptional regulatory circuits encoded in the genome. Here, we present a computational framework for detecting TFBSs, when multiple position weight matrices (PWMs) for a transcription factor are available. Grouping multiple PWMs of a transcription factor (TF) based on their sequence similarity improves the specificity of TFBS prediction, which was evaluated using multiple genome-wide ChIP-Seq data sets from 26 TFs. The Z-scores of the area under a receiver operating characteristic curve (AUC) values of 368 TFs were calculated and used to statistically identify co-occurring regulatory motifs in the TF bound ChIP loci. Motifs that are co-occurring along with the empirical bindings of E2F, JUN or MYC have been evaluated, in the basal or stimulated condition. Results prove our method can be useful to systematically identify the co-occurring motifs of the TF for the given conditions.
Affiliation(s)
- Young Min Oh
- Department of Life Sciences, Pohang University of Science and Technology, Pohang, Republic of Korea
|
37
|
Bishop EP, Rohs R, Parker SCJ, West SM, Liu P, Mann RS, Honig B, Tullius TD. A map of minor groove shape and electrostatic potential from hydroxyl radical cleavage patterns of DNA. ACS Chem Biol 2011; 6:1314-20. [PMID: 21967305 DOI: 10.1021/cb200155t] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.5] [Indexed: 01/04/2023]
Abstract
DNA shape variation and the associated variation in minor groove electrostatic potential are widely exploited by proteins for DNA recognition. Here we show that the hydroxyl radical cleavage pattern is a quantitative measure of DNA backbone solvent accessibility, minor groove width, and minor groove electrostatic potential, at single nucleotide resolution. We introduce maps of DNA shape and electrostatic potential as tools for understanding how proteins recognize binding sites in a genome. These maps reveal periodic structural signals in yeast and Drosophila genomic DNA sequences that are associated with positioned nucleosomes.
Affiliation(s)
- Remo Rohs
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, United States
- Stephen C. J. Parker
- National Human Genome Research Institute, National Institutes of Health, Rockville, Maryland 20852, United States
|
38
|
Regulation of POU4F3 gene expression in hair cells by 5' DNA in mice. Neuroscience 2011; 197:48-64. [PMID: 21958861 DOI: 10.1016/j.neuroscience.2011.09.033] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Received: 03/09/2011] [Revised: 09/12/2011] [Accepted: 09/13/2011] [Indexed: 01/21/2023]
Abstract
The POU-domain transcription factor POU4F3 is expressed in the sensory cells of the inner ear. Expression begins shortly after commitment to the hair cell (HC) fate, and continues throughout life. It is required for terminal HC differentiation and survival. To explore regulation of the murine Pou4f3 gene, we linked enhanced green fluorescent protein (eGFP) to 8.5 kb of genomic sequence 5' to the start codon in transgenic mice. eGFP was uniformly present in all embryonic and neonatal HCs. Expression of eGFP was also observed in developing Merkel cells and olfactory neurons as well as adult inner and vestibular HCs, mimicking the normal expression pattern of POU4F3 protein, with the exception of adult outer HCs. Apparently ectopic expression was observed in developing inner ear neurons. On a Pou4f3 null background, the transgene produced expression in embryonic HCs which faded soon after birth both in vivo and in vitro. Pou4f3 null HCs treated with caspase 3 and 9 inhibitors survived longer than untreated HCs, but still showed reduced expression of eGFP. The results suggest the existence of separate enhancers for different HC types, as well as strong autoregulation of the Pou4f3 gene. Bioinformatic analysis of four divergent mammalian species revealed three highly conserved regions within the transgene: 400 bp immediately 5' to the Pou4f3 ATG, a short sequence at -1.3 kb, and a longer region at -8.2 to -8.5 kb. The latter contained E-box motifs that bind basic helix-loop-helix (bHLH) transcription factors, including motifs activated by ATOH1. Cotransfection of HEK293 or VOT-E36 cells with ATOH1 and the transgene as a reporter enhanced eGFP expression when compared with the transgene alone. Chromatin immunoprecipitation of the three highly conserved regions revealed binding of ATOH1 to the distal-most conserved region. The results are consistent with regulation of Pou4f3 in HCs by ATOH1 at a distal enhancer.
|
39
|
Gruel J, LeBorgne M, LeMeur N, Théret N. Simple Shared Motifs (SSM) in conserved region of promoters: a new approach to identify co-regulation patterns. BMC Bioinformatics 2011; 12:365. [PMID: 21910886 PMCID: PMC3215511 DOI: 10.1186/1471-2105-12-365] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/30/2010] [Accepted: 09/12/2011] [Indexed: 01/07/2023]
Abstract
Background Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Results Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Conclusions Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks.
Affiliation(s)
- Jérémy Gruel
- EA 4427 SeRAIC IFR140, Université de Rennes 1, 2 avenue du Pr. Léon Bernard, Rennes 35043, France.
|
40
|
Yang S, Yalamanchili HK, Li X, Yao KM, Sham PC, Zhang MQ, Wang J. Correlated evolution of transcription factors and their binding sites. Bioinformatics 2011; 27:2972-8. [PMID: 21896508 DOI: 10.1093/bioinformatics/btr503] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022]
Abstract
MOTIVATION The interaction between transcription factor (TF) and transcription factor binding site (TFBS) is essential for gene regulation. Mutation in either the TF or the TFBS may weaken their interaction and thus result in abnormalities. To maintain such vital interaction, a mutation in one of the interacting partners might be compensated by a corresponding mutation in its binding partner during the course of evolution. Confirming this co-evolutionary relationship will guide us in designing protein sequences to target a specific DNA sequence or in predicting TFBS for poorly studied proteins, or even correcting and rescuing disease mutations in clinical applications. RESULTS Based on six, publicly available, experimentally validated TF-TFBS binding datasets for the basic Helix-Loop-Helix (bHLH) family, Homeo family, High-Mobility Group (HMG) family and Transient Receptor Potential channels (TRP) family, we showed that the evolutions of the TFs and their TFBSs are significantly correlated across eukaryotes. We further developed a mutual information-based method to identify co-evolved protein residues and DNA bases. This research sheds light on the dynamic relationship between TF and TFBS during their evolution. The same principle and strategy can be applied to co-evolutionary studies on protein-DNA interactions in other protein families. AVAILABILITY All the datasets, scripts and other related files have been made freely available at: http://jjwanglab.org/co-evo. CONTACT junwen@uw.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
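The mutual-information idea mentioned in this abstract can be sketched for a single pair of alignment columns. The sketch assumes one column of protein residues and one column of DNA bases, with entry i in each column coming from the same TF-TFBS pair. The function name `mutual_information` and the toy columns are hypothetical, and the published method may differ in its details (for example, in how it corrects for small samples or phylogenetic relatedness).

```python
import math
from collections import Counter


def mutual_information(residues, bases):
    """Mutual information (in bits) between two paired categorical columns."""
    if len(residues) != len(bases):
        raise ValueError("columns must have the same number of entries")
    n = len(residues)
    residue_counts = Counter(residues)
    base_counts = Counter(bases)
    pair_counts = Counter(zip(residues, bases))
    mi = 0.0
    for (residue, base), count in pair_counts.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(
            count * n / (residue_counts[residue] * base_counts[base])
        )
    return mi


# Toy columns, invented for illustration only.
coupled = mutual_information(["R", "R", "K", "K"], ["G", "G", "A", "A"])
independent = mutual_information(["R", "R", "K", "K"], ["G", "A", "G", "A"])
```

A residue position and a base position that always change together give the maximum value for their alphabet sizes, while statistically independent columns give zero; intermediate values would need a significance test before being read as co-evolution.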
Affiliation(s)
- Shu Yang
- Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
|
41
|
Tree-based position weight matrix approach to model transcription factor binding site profiles. PLoS One 2011; 6:e24210. [PMID: 21912677 PMCID: PMC3166302 DOI: 10.1371/journal.pone.0024210] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 03/22/2011] [Accepted: 08/02/2011] [Indexed: 11/30/2022] Open
Abstract
Most of the position weight matrix (PWM) based bioinformatics methods developed to predict transcription factor binding sites (TFBS) assume that each nucleotide in the sequence motif contributes independently to the interaction between protein and DNA sequence, which usually produces many false-positive predictions. The increasing availability of TF enrichment profiles from recent ChIP-Seq methodology facilitates the investigation of dependent structure and accurate prediction of TFBSs. We develop a novel Tree-based PWM (TPWM) approach to accurately model the interaction between TF and its binding site. The whole tree-structured PWM could be considered as a mixture of different conditional-PWMs. We propose a discriminative approach, called TPD (TPWM based Discriminative Approach), to construct the TPWM from the ChIP-Seq data with a pre-existing PWM. To achieve the maximum discriminative power between the positive and negative datasets, the cutoff value is determined based on the Matthews correlation coefficient (MCC). The resulting TPWMs are evaluated with respect to accuracy on extensive synthetic datasets. We then apply our TPWM discriminative approach on several real ChIP-Seq datasets to refine the current TFBS models stored in the TRANSFAC database. Experiments on both the simulated and real ChIP-Seq data show that the proposed method starting from existing PWM has consistently better performance than existing tools in detecting the TFBSs. The improved accuracy is the result of modelling the complete dependent structure of the motifs and better prediction of true positive rate. The findings could lead to better understanding of the mechanisms of TF-DNA interactions.
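For orientation, the baseline that the abstract above improves on can be sketched as follows: a conventional PWM in which every position contributes independently, with the score cutoff chosen to maximize the MCC between a positive and a negative sequence set. This is not the tree-based TPWM itself. The function names, pseudocount, uniform background, and toy sequences are all assumptions made for illustration.

```python
from math import log2, sqrt

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length aligned sites; positions are treated independently."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({b: log2((column.count(b) + pseudocount) / total / background) for b in BASES})
    return pwm


def score(pwm, seq):
    """Sum of per-position log-odds for a sequence of the same length as the PWM."""
    return sum(column[base] for column, base in zip(pwm, seq))


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator vanishes."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_cutoff(pwm, positives, negatives):
    """Return (cutoff, mcc) for the score threshold that maximizes the MCC."""
    pos = [score(pwm, s) for s in positives]
    neg = [score(pwm, s) for s in negatives]
    best_cut, best_mcc = None, -2.0  # any real MCC is >= -1
    for cutoff in sorted(set(pos + neg)):
        tp = sum(x >= cutoff for x in pos)
        fp = sum(x >= cutoff for x in neg)
        value = mcc(tp, fp, len(neg) - fp, len(pos) - tp)
        if value > best_mcc:
            best_cut, best_mcc = cutoff, value
    return best_cut, best_mcc


# Toy data (hypothetical): bound sites versus background sequences.
positives = ["TATAAT", "TATAAT", "TACAAT", "TATATT"]
negatives = ["GCGCGC", "CCGGCC", "GGCCGG"]
pwm = build_pwm(positives)
cutoff, cutoff_mcc = best_cutoff(pwm, positives, negatives)
```

A sequence is called a binding site when its score is at or above `cutoff`. Because the columns are scored independently, this sketch cannot capture the dependencies between positions that the tree-structured model is designed to represent.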
|
42
|
Benson CC, Zhou Q, Long X, Miano JM. Identifying functional single nucleotide polymorphisms in the human CArGome. Physiol Genomics 2011; 43:1038-48. [PMID: 21771879 DOI: 10.1152/physiolgenomics.00098.2011] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Indexed: 01/22/2023] Open
Abstract
Regulatory SNPs (rSNPs) reside primarily within the nonprotein coding genome and are thought to disturb normal patterns of gene expression by altering DNA binding of transcription factors. Nevertheless, despite the explosive rise in SNP association studies, there is little information as to the function of rSNPs in human disease. Serum response factor (SRF) is a widely expressed DNA-binding transcription factor that has variable affinity to at least 1,216 permutations of a 10 bp transcription factor binding site (TFBS) known as the CArG box. We developed a robust in silico bioinformatics screening method to evaluate sequences around RefSeq genes for conserved CArG boxes. Utilizing a predetermined phastCons threshold score, we identified 8,252 strand-specific CArGs within an 8 kb window around the transcription start site of 5,213 genes, including all previously defined SRF target genes. We then interrogated this CArG dataset for the presence of previously annotated common polymorphisms. We found a total of 118 unique CArG boxes harboring a SNP within the 10 bp CArG sequence and 1,130 CArG boxes with SNPs located just outside the CArG element. Gel shift and luciferase reporter assays validated SRF binding and functional activity of several new CArG boxes. Importantly, SNPs within or just outside the CArG box often resulted in altered SRF binding and activity. Collectively, these findings demonstrate a powerful approach to computationally define rSNPs in the human CArGome and provide a foundation for similar analyses of other TFBS. Such information may find utility in genetic association studies of human disease where little insight is known regarding the functionality of rSNPs.
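The figure of 1,216 CArG permutations quoted above coincides with the number of 10-mers that differ from the consensus CC(A/T)6GG at no more than one position. That interpretation is an assumption, not something the abstract states, and the scan below is a minimal sketch rather than the authors' screening pipeline: it ignores conservation scores and reports only forward-strand matches. Function names and the example sequence are hypothetical.

```python
from itertools import product

CONSENSUS = "CCWWWWWWGG"  # CArG box consensus; W = A or T
ALLOWED = {"C": "C", "G": "G", "W": "AT"}


def mismatches(site):
    """Number of positions at which a 10-mer deviates from the CArG consensus."""
    return sum(base not in ALLOWED[code] for base, code in zip(site, CONSENSUS))


def carg_variants_within_one_mismatch():
    """All 10-mers with zero or one deviation from the consensus."""
    exact = {"".join(p) for p in product(*(ALLOWED[code] for code in CONSENSUS))}
    variants = set(exact)
    for site in exact:
        for i in range(len(site)):
            for base in "ACGT":
                variants.add(site[:i] + base + site[i + 1:])
    return variants


def scan(sequence, max_mismatch=1):
    """Return (position, site, mismatch count) for every CArG-like window."""
    seq = sequence.upper()
    hits = []
    for i in range(len(seq) - len(CONSENSUS) + 1):
        window = seq[i:i + len(CONSENSUS)]
        if set(window) <= set("ACGT") and mismatches(window) <= max_mismatch:
            hits.append((i, window, mismatches(window)))
    return hits


n_variants = len(carg_variants_within_one_mismatch())
hits = scan("AAACCATATATGGAAA", max_mismatch=0)
```

A SNP-aware extension would call `mismatches` on both alleles of a site and flag cases where one allele crosses the mismatch threshold.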
Affiliation(s)
- Craig C Benson
- University of Rochester Medical Center, Rochester, NY, USA
|
43
|
Schnall-Levin M, Rissland OS, Johnston WK, Perrimon N, Bartel DP, Berger B. Unusually effective microRNA targeting within repeat-rich coding regions of mammalian mRNAs. Genome Res 2011; 21:1395-403. [PMID: 21685129 DOI: 10.1101/gr.121210.111] [Citation(s) in RCA: 108] [Impact Index Per Article: 8.3] [Indexed: 12/28/2022]
Abstract
MicroRNAs (miRNAs) regulate numerous biological processes by base-pairing with target messenger RNAs (mRNAs), primarily through sites in 3' untranslated regions (UTRs), to direct the repression of these targets. Although miRNAs have sometimes been observed to target genes through sites in open reading frames (ORFs), large-scale studies have shown such targeting to be generally less effective than 3' UTR targeting. Here, we show that several miRNAs each target significant groups of genes through multiple sites within their coding regions. This ORF targeting, which mediates both predictable and effective repression, arises from highly repeated sequences containing miRNA target sites. We show that such sequence repeats largely arise through evolutionary duplications and occur particularly frequently within families of paralogous C2H2 zinc-finger genes, suggesting the potential for their coordinated regulation. Examples of ORFs targeted by miR-181 include both the well-known tumor suppressor RB1 and RBAK, encoding a C2H2 zinc-finger protein and transcriptional binding partner of RB1. Our results indicate a function for repeat-rich coding sequences in mediating post-transcriptional regulation and reveal circumstances in which miRNA-mediated repression through ORF sites can be reliably predicted.
Affiliation(s)
- Michael Schnall-Levin
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
44
|
Kurki MI, Paananen J, Storvik M, Ylä-Herttuala S, Jääskeläinen JE, von Und Zu Fraunberg M, Wong G, Pehkonen P. TAFFEL: Independent Enrichment Analysis of gene sets. BMC Bioinformatics 2011; 12:171. [PMID: 21592412 PMCID: PMC3120704 DOI: 10.1186/1471-2105-12-171] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/19/2011] [Accepted: 05/19/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A major challenge in genomic research is identifying significant biological processes and generating new hypotheses from large gene sets. Gene sets often consist of multiple separate biological pathways, controlled by distinct regulatory mechanisms. Many of these pathways and the associated regulatory mechanisms might be obscured by a large number of other significant processes and thus not identified as significant by standard gene set enrichment analysis tools. RESULTS We present a novel method called Independent Enrichment Analysis (IEA) and software TAFFEL that eases the task by clustering genes into subgroups using Gene Ontology categories and transcription regulators. IEA indicates transcriptional regulators putatively controlling biological functions in the studied condition. CONCLUSIONS We demonstrate that the developed method and TAFFEL tool give new insight into the analysis of differentially expressed genes and can generate novel hypotheses. Our comparison to other popular methods showed that the IEA method implemented in TAFFEL can find important biological phenomena, which are not reported by other methods.
Affiliation(s)
- Mitja I Kurki
- Department of Biosciences, University of Eastern Finland, PO Box 1627, FIN-70211 Kuopio, Finland
|
45
|
Qin J, Li MJ, Wang P, Zhang MQ, Wang J. ChIP-Array: combinatory analysis of ChIP-seq/chip and microarray gene expression data to discover direct/indirect targets of a transcription factor. Nucleic Acids Res 2011; 39:W430-6. [PMID: 21586587 PMCID: PMC3125757 DOI: 10.1093/nar/gkr332] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Indexed: 12/14/2022] Open
Abstract
Chromatin immunoprecipitation (ChIP) coupled with high-throughput techniques (ChIP-X), such as next generation sequencing (ChIP-Seq) and microarray (ChIP–chip), has been successfully used to map active transcription factor binding sites (TFBS) of a transcription factor (TF). The targeted genes can be activated or suppressed by the TF, or are unresponsive to the TF. Microarray technology has been used to measure the actual expression changes of thousands of genes under the perturbation of a TF, but is unable to determine if the affected genes are direct or indirect targets of the TF. Furthermore, both ChIP-X and microarray methods produce a large number of false positives. Combining microarray expression profiling and ChIP-X data allows more effective TFBS analysis for studying the function of a TF. However, current web servers only provide tools to analyze either ChIP-X or expression data, but not both. Here, we present ChIP-Array, a web server that integrates ChIP-X and expression data from human, mouse, yeast, fruit fly and Arabidopsis. This server will assist biologists to detect direct and indirect target genes regulated by a TF of interest and to aid in the functional characterization of the TF. ChIP-Array is available at http://jjwanglab.hku.hk/ChIP-Array, with free access to academic users.
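The core idea of combining binding and expression evidence can be illustrated with simple set logic. The abstract does not describe how the ChIP-Array server itself classifies targets, so the sketch below is only an assumption-laden illustration: the function name, the log2 fold-change threshold, and the gene identifiers are hypothetical.

```python
def classify_targets(bound_genes, expression_changes, min_abs_log2fc=1.0):
    """Split genes by combining TF binding evidence with expression response.

    bound_genes: genes with a ChIP-X peak assigned to them.
    expression_changes: gene -> log2 fold change after perturbing the TF.
    """
    bound = set(bound_genes)
    changed = {g for g, lfc in expression_changes.items() if abs(lfc) >= min_abs_log2fc}
    return {
        "direct": sorted(changed & bound),              # bound and responsive
        "indirect": sorted(changed - bound),            # responsive but not bound
        "bound_unresponsive": sorted(bound - changed),  # bound, no expression change
    }


# Hypothetical example data.
result = classify_targets(
    bound_genes=["geneA", "geneB", "geneC"],
    expression_changes={"geneA": 2.1, "geneB": 0.2, "geneD": -1.8, "geneE": 0.1},
)
```

Genes in the "indirect" group would be candidates for regulation through an intermediate factor that is itself a direct target.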
Affiliation(s)
- Jing Qin
- Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, 21 Sassoon Road, Hong Kong SAR, China
|
46
|
Bais AS, Kaminski N, Benos PV. Finding subtypes of transcription factor motif pairs with distinct regulatory roles. Nucleic Acids Res 2011; 39:e76. [PMID: 21486752 PMCID: PMC3113591 DOI: 10.1093/nar/gkr205] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 01/27/2023] Open
Abstract
DNA sequences bound by a transcription factor (TF) are presumed to contain sequence elements that reflect its DNA binding preferences and its downstream-regulatory effects. Experimentally identified TF binding sites (TFBSs) are usually similar enough to be summarized by a ‘consensus’ motif, representative of the TF DNA binding specificity. Studies have shown that groups of nucleotide TFBS variants (subtypes) can contribute to distinct modes of downstream regulation by the TF via differential recruitment of cofactors. A TF (TF A) may bind to TFBS subtypes a1 or a2 depending on whether it associates with cofactor TF B or TF C, respectively. While some approaches can discover motif pairs (dyads), none address the problem of identifying ‘variants’ of dyads. TFs are key components of multiple regulatory pathways targeting different sets of genes perhaps with different binding preferences. Identifying the discriminating TF–DNA associations that lead to the differential downstream regulation is thus essential. We present DiSCo (Discovery of Subtypes and Cofactors), a novel approach for identifying variants of dyad motifs (and their respective target sequence sets) that are instrumental for differential downstream regulation. Using both simulated and experimental datasets, we demonstrate how current motif discovery can be successfully leveraged to address this question.
Affiliation(s)
- Abha Singh Bais
- Department of Computational and Systems Biology, Dorothy P. and Richard P. Simmons Center for Interstitial Lung Disease, Division of Pulmonary, Allergy and Critical Care Medicine and Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15260, USA
|
47
|
Everett LJ, Jensen ST, Hannenhalli S. Transcriptional regulation via TF-modifying enzymes: an integrative model-based analysis. Nucleic Acids Res 2011; 39:e78. [PMID: 21470963 PMCID: PMC3130287 DOI: 10.1093/nar/gkr172] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022] Open
Abstract
Transcription factor activity is largely regulated through post-translational modification. Here, we report the first integrative model of transcription that includes both interactions between transcription factors and promoters, and between transcription factors and modifying enzymes. Simulations indicate that our method is robust against noise. We validated our tool on a well-studied stress response network in yeast and on a STAT1-mediated regulatory network in human B cells. Our work represents a significant step toward a comprehensive model of gene transcription.
Affiliation(s)
- Logan J Everett
- Genomics and Computational Biology Program, 700 Clinical Research Building, 415 Curie Boulevard, Philadelphia, PA 19104, USA.
|
48
|
Geeven G, Macgillavry HD, Eggers R, Sassen MM, Verhaagen J, Smit AB, de Gunst MCM, van Kesteren RE. LLM3D: a log-linear modeling-based method to predict functional gene regulatory interactions from genome-wide expression data. Nucleic Acids Res 2011; 39:5313-27. [PMID: 21422075 PMCID: PMC3141251 DOI: 10.1093/nar/gkr139] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 12/11/2022] Open
Abstract
All cellular processes are regulated by condition-specific and time-dependent interactions between transcription factors and their target genes. While in simple organisms, e.g. bacteria and yeast, a large amount of experimental data is available to support functional transcription regulatory interactions, in mammalian systems reconstruction of gene regulatory networks still heavily depends on the accurate prediction of transcription factor binding sites. Here, we present a new method, log-linear modeling of 3D contingency tables (LLM3D), to predict functional transcription factor binding sites. LLM3D combines gene expression data, gene ontology annotation and computationally predicted transcription factor binding sites in a single statistical analysis, and offers a methodological improvement over existing enrichment-based methods. We show that LLM3D successfully identifies novel transcriptional regulators of the yeast metabolic cycle, and correctly predicts key regulators of mouse embryonic stem cell self-renewal more accurately than existing enrichment-based methods. Moreover, in a clinically relevant in vivo injury model of mammalian neurons, LLM3D identified peroxisome proliferator-activated receptor γ (PPARγ) as a neuron-intrinsic transcriptional regulator of regenerative axon growth. In conclusion, LLM3D provides a significant improvement over existing methods in predicting functional transcription regulatory interactions in the absence of experimental transcription factor binding data.
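To make the phrase "log-linear modeling of 3D contingency tables" concrete, the sketch below fits the simplest such model, mutual independence of three factors, and measures its lack of fit with the likelihood-ratio statistic G². The three axes are assumed here to be expression cluster membership, GO annotation, and presence of a predicted binding site. The abstract does not specify which models LLM3D actually compares, so this is an illustration of the general technique only; the function name and counts are hypothetical.

```python
from itertools import product
from math import log


def independence_g2(table):
    """Fit the mutual-independence log-linear model to a 3D table of counts.

    table[i][j][k] holds observed counts. Returns (expected, g2), where
    expected has the same shape and g2 = 2 * sum(O * ln(O / E)).
    """
    I, J, K = len(table), len(table[0]), len(table[0][0])
    cells = list(product(range(I), range(J), range(K)))
    n = sum(table[i][j][k] for i, j, k in cells)
    # One-way marginal totals for each of the three factors.
    m1 = [sum(table[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
    m2 = [sum(table[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
    m3 = [sum(table[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]
    expected = [[[m1[i] * m2[j] * m3[k] / n ** 2 for k in range(K)]
                 for j in range(J)] for i in range(I)]
    g2 = 2 * sum(table[i][j][k] * log(table[i][j][k] / expected[i][j][k])
                 for i, j, k in cells if table[i][j][k] > 0)
    return expected, g2


# Hypothetical 2x2x2 counts: in cluster? x has GO term? x has predicted TFBS?
associated = [[[30, 5], [5, 5]], [[5, 5], [5, 20]]]
expected, g2 = independence_g2(associated)
```

For a 2×2×2 table the mutual-independence model leaves 4 degrees of freedom, so a large G² relative to a chi-squared distribution with 4 degrees of freedom indicates that at least some of the three factors are associated.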
Affiliation(s)
- Geert Geeven
- Department of Mathematics, Faculty of Sciences, VU University, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands
|
49
|
Abstract
The number of known mutations in human nuclear genes, underlying or associated with human inherited disease, has now exceeded 100,000 in more than 3700 different genes (Human Gene Mutation Database). However, for a variety of reasons, this figure is likely to represent only a small proportion of the clinically relevant genetic variants that remain to be identified in the human genome (the 'mutome'). With the advent of next-generation sequencing, we are currently witnessing a revolution in medical genetics. In particular, whole-genome sequencing (WGS) has the potential to identify all disease-causing or disease-associated DNA variants in a given individual. Here, we use examples of recent advances in our understanding of mutational/pathogenic mechanisms to guide our thinking about possible locations outwith gene-coding sequences for those disease-causing or disease-associated variants that are likely so often to have been overlooked because of the inadequacy of current mutation screening protocols. Such considerations are important not only for improving mutation-screening strategies but also for enhancing the interpretation of findings derived from genome-wide association studies, whole-exome sequencing and WGS. An improved understanding of the human mutome will not only lead to the development of improved diagnostic testing procedures but should also improve our understanding of human genome biology.
Affiliation(s)
- J M Chen
- Etablissement Français du Sang (EFS) - Bretagne, Brest, France.
|
50
|
Abstract
A powerful method to identify binding sites in target genes is chromatin immunoprecipitation (ChIP), which allows the purification of in vivo formed complexes of a DNA-binding protein and associated DNA. Briefly, the method involves the fixation of plant tissue and the isolation of the total protein-DNA mixture, followed by an immunoprecipitation step with an antibody directed against the protein of interest, after which the DNA can be purified. Finally, the DNA can be analyzed by PCR for the enrichment of specific regions. A drawback of ChIP is that a different antibody is needed for each protein. To overcome this, a generic strategy is possible using tags fused to the protein of interest. In this case, only a single antibody, directed against the tag, is needed. This protocol describes the tagging of proteins and how to perform ChIP.
|