51
Abstract
A powerful method to identify binding sites in target genes is chromatin immunoprecipitation (ChIP), which allows the purification of in vivo formed complexes of a DNA-binding protein and its associated DNA. Briefly, the method involves fixation of plant tissue and isolation of the total protein-DNA mixture, followed by an immunoprecipitation step with an antibody directed against the protein of interest, after which the DNA can be purified. Finally, the DNA can be analyzed by PCR for enrichment of specific regions. A drawback of ChIP is that a different antibody is needed for each protein. To overcome this, a generic strategy is possible using tags fused to the protein of interest. In this case, only one antibody, directed against the tag, is needed. This protocol describes the tagging of proteins and how to perform ChIP.
52
Li G, Chan TM, Leung KS, Lee KH. A cluster refinement algorithm for motif discovery. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2010; 7:654-668. [PMID: 21030733 DOI: 10.1109/tcbb.2009.25] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 05/30/2023]
Abstract
Finding Transcription Factor Binding Sites, i.e., motif discovery, is crucial for understanding the gene regulatory relationship. Motifs are weakly conserved and motif discovery is an NP-hard problem. We propose a new approach called Cluster Refinement Algorithm for Motif Discovery (CRMD). CRMD employs a flexible statistical motif model allowing a variable number of motifs and motif instances. CRMD first uses a novel entropy-based clustering to find complete and good starting candidate motifs from the DNA sequences. CRMD then employs an effective greedy refinement to search for optimal motifs from the candidate motifs. The refinement is fast, and it changes the number of motif instances based on the adaptive thresholds. The performance of CRMD is further enhanced if the problem has one occurrence of motif instance per sequence. Using an appropriate similarity test of motifs, CRMD is also able to find multiple motifs. CRMD has been tested extensively on synthetic and real data sets. The experimental results verify that CRMD usually outperforms four other state-of-the-art algorithms in terms of the qualities of the solutions with competitive computing time. It finds a good balance between finding true motif instances and screening false motif instances, and is robust on problems of various levels of difficulty.
Affiliation(s)
- Gang Li
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Hong Kong.
53
Transcription factor binding variation in the evolution of gene regulation. Trends Genet 2010; 26:468-75. [PMID: 20864205 DOI: 10.1016/j.tig.2010.08.005] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Received: 08/09/2010] [Revised: 08/22/2010] [Accepted: 08/22/2010] [Indexed: 01/17/2023]
Abstract
Transcription factor interactions with DNA are one of the primary mechanisms by which expression is modulated, yet their evolution remains poorly understood. Chromatin immunoprecipitation followed by microarray (ChIP-chip) or sequencing (ChIP-Seq) has revolutionized the study of protein-DNA interactions. However, only recently has attention focused on determining to what extent these regulatory interactions vary between species across entire genomes. A series of recent studies have compared in vivo binding data across a range of evolutionary distances. Binding events diverge rapidly, indicating gene regulation is an evolutionarily flexible process.
54
Li MJ, Sham PC, Wang J. FastPval: a fast and memory efficient program to calculate very low P-values from empirical distribution. Bioinformatics 2010; 26:2897-9. [PMID: 20861029 PMCID: PMC2971576 DOI: 10.1093/bioinformatics/btq540] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 11/25/2022]
Abstract
Motivation: Resampling methods, such as permutation and bootstrap, have been widely used to generate an empirical distribution for assessing the statistical significance of a measurement. However, to obtain a very low P-value, a large number of resamples is required, at which point computing speed, memory and storage consumption become bottlenecks and the calculation sometimes becomes impossible, even on a computer cluster. Results: We have developed a multiple stage P-value calculating program called FastPval that can efficiently calculate very low (down to 10^-9) P-values from a large number of resampled measurements. With only two input files and a few parameter settings from the users, the program can compute P-values from empirical distribution very efficiently, even on a personal computer. When tested on the order of 10^9 resampled data, our method uses only 52.94% of the time used by the conventional method, implemented by standard quicksort and binary search algorithms, and consumes only 0.11% of the memory and storage. Furthermore, our method can be applied to extra large datasets that the conventional method fails to calculate. The accuracy of the method was tested on data generated from Normal, Poisson and Gumbel distributions and was found to be no different from the exact ranking approach. Availability: The FastPval executable file, the Java GUI and source code, and the Java Web Start server with example data and introduction, are available at http://wanglab.hku.hk/pvalue Contact: junwen@hku.hk Supplementary information: Supplementary data are available at Bioinformatics online and http://wanglab.hku.hk/pvalue/.
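The conventional baseline named in this abstract (sort the resampled values, then locate the observed value by binary search) can be sketched as follows. This is only an illustration of that baseline, not of FastPval's multiple stage procedure, which the abstract does not detail. The function name `empirical_p_value` and the +1 correction in the numerator and denominator are assumptions made for the example.

```python
from bisect import bisect_left


def empirical_p_value(sorted_null, observed):
    """Upper-tail empirical P-value from an ascending list of resampled values.

    Assumption: the common (count + 1) / (n + 1) correction is used so that
    the estimate is never exactly zero.
    """
    n = len(sorted_null)
    # Binary search: index of the first resampled value >= observed.
    first_ge = bisect_left(sorted_null, observed)
    n_at_least = n - first_ge
    return (n_at_least + 1) / (n + 1)


# Hypothetical resampled statistics; sorting is the expensive step at scale.
null_values = sorted([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.55, 0.9, 0.05, 0.6])
p = empirical_p_value(null_values, 0.75)  # two values (0.8, 0.9) are >= 0.75
```

With n resamples the smallest attainable estimate is 1 / (n + 1), which is why P-values near 10^-9 need on the order of 10^9 resampled values and why memory and sorting time become the bottleneck.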
Affiliation(s)
- Mulin Jun Li
- Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, 21 Sassoon Rd, Pokfulam, Hong Kong SAR, China
55
Piechota M, Korostynski M, Przewlocki R. Identification of cis-regulatory elements in the mammalian genome: the cREMaG database. PLoS One 2010; 5:e12465. [PMID: 20824209 PMCID: PMC2930848 DOI: 10.1371/journal.pone.0012465] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 02/10/2010] [Accepted: 08/02/2010] [Indexed: 12/20/2022]
Abstract
Background: A growing number of gene expression-profiling datasets provides a reliable source of information about gene co-expression. In silico analyses of the properties shared among the promoters of co-expressed genes facilitate the identification of transcription factors (TFs) involved in the co-regulation of those genes. Our previous experience with microarray data led to the development of a database suitable for the examination of regulatory motifs in the promoters of co-expressed genes. Methodology: We introduce the cREMaG (cis-Regulatory Elements in the Mammalian Genome) system designed for in silico studies of the promoter properties of co-regulated mammalian genes. The cREMaG system offers an analysis of data obtained from human, mouse, rat, bovine and canine gene expression-profiling studies. More than eight analysis parameters can be utilized in user-defined combinations. The selection of alternative transcription start sites and information about CpG islands are also available. Conclusions: Using the cREMaG system, we successfully identified TFs mediating transcriptional responses in reference gene sets. The cREMaG system facilitates in silico studies of mammalian transcriptional gene regulation. The resource is freely available at http://www.cremag.org.
Affiliation(s)
- Marcin Piechota
- Department of Molecular Neuropharmacology, Institute of Pharmacology, Polish Academy of Sciences, Kraków, Poland.
56
Ramsey SA, Knijnenburg TA, Kennedy KA, Zak DE, Gilchrist M, Gold ES, Johnson CD, Lampano AE, Litvak V, Navarro G, Stolyar T, Aderem A, Shmulevich I. Genome-wide histone acetylation data improve prediction of mammalian transcription factor binding sites. Bioinformatics 2010; 26:2071-5. [PMID: 20663846 PMCID: PMC2922897 DOI: 10.1093/bioinformatics/btq405] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Indexed: 11/12/2022]
Abstract
Motivation: Histone acetylation (HAc) is associated with open chromatin, and HAc has been shown to facilitate transcription factor (TF) binding in mammalian cells. In the innate immune system context, epigenetic studies strongly implicate HAc in the transcriptional response of activated macrophages. We hypothesized that using data from large-scale sequencing of a HAc chromatin immunoprecipitation assay (ChIP-Seq) would improve the performance of computational prediction of binding locations of TFs mediating the response to a signaling event, namely, macrophage activation. Results: We tested this hypothesis using a multi-evidence approach for predicting binding sites. As a training/test dataset, we used ChIP-Seq-derived TF binding site locations for five TFs in activated murine macrophages. Our model combined TF binding site motif scanning with evidence from sequence-based sources and from HAc ChIP-Seq data, using a weighted sum of thresholded scores. We find that using HAc data significantly improves the performance of motif-based TF binding site prediction. Furthermore, we find that within regions of high HAc, local minima of the HAc ChIP-Seq signal are particularly strongly correlated with TF binding locations. Our model, using motif scanning and HAc local minima, improves the sensitivity for TF binding site prediction by ∼50% over a model based on motif scanning alone, at a false positive rate cutoff of 0.01. Availability: The data and software source code for model training and validation are freely available online at http://magnet.systemsbiology.net/hac. Contact: aderem@systemsbiology.org; ishmulevich@systemsbiology.org Supplementary information: Supplementary data are available at Bioinformatics online.
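A "weighted sum of thresholded scores" can be read in more than one way. The sketch below assumes the simplest reading, in which each evidence source contributes its weight only when its raw score reaches a per-source threshold. The evidence names, thresholds and weights are hypothetical and are not taken from the paper.

```python
def combined_score(evidence, thresholds, weights):
    """Sum the weights of all evidence sources whose score meets its threshold.

    Assumption: thresholding is a hard 0/1 indicator per evidence source.
    """
    return sum(
        weight
        for name, weight in weights.items()
        if evidence[name] >= thresholds[name]
    )


# Hypothetical evidence sources for one candidate site.
weights = {"motif": 2.0, "conservation": 1.0, "hac_local_minimum": 1.5}
thresholds = {"motif": 8.0, "conservation": 0.9, "hac_local_minimum": 0.5}
site = {"motif": 9.3, "conservation": 0.4, "hac_local_minimum": 0.7}

score = combined_score(site, thresholds, weights)  # motif + HAc terms pass
is_predicted_site = score >= 3.0  # hypothetical decision cutoff
```

In practice the weights and thresholds would be fitted on the training portion of the ChIP-Seq-derived binding sites, and the decision cutoff chosen to meet a target false positive rate.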
Affiliation(s)
- Stephen A Ramsey
- Institute for Systems Biology, 1441 North 34th Street, Seattle, WA 98103, USA.
57
Tagu D, Dugravot S, Outreman Y, Rispe C, Simon JC, Colella S. The anatomy of an aphid genome: From sequence to biology. C R Biol 2010; 333:464-73. [DOI: 10.1016/j.crvi.2010.03.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
58
Laurila K, Yli-Harja O, Lähdesmäki H. A protein-protein interaction guided method for competitive transcription factor binding improves target predictions. Nucleic Acids Res 2010; 37:e146. [PMID: 19786498 PMCID: PMC2794167 DOI: 10.1093/nar/gkp789] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/19/2022]
Abstract
An important milestone in revealing cells' functions is to build a comprehensive understanding of transcriptional regulation processes. These processes are largely regulated by transcription factors (TFs) binding to DNA sites. Several TF binding site (TFBS) prediction methods have been developed, but they usually model binding of a single TF at a time, although a few methods for predicting binding of multiple TFs also exist. In this article, we propose a probabilistic model that predicts binding of several TFs simultaneously. Our method explicitly models the competitive binding between TFs and uses the prior knowledge of existing protein-protein interactions (PPIs), which mimics the situation in the nucleus. Modeling DNA binding for multiple TFs improves the accuracy of binding site prediction remarkably when compared with other programs and with the cases where individual binding prediction results of separate TFs have been combined. Traditional TFBS prediction methods usually predict an overwhelming number of false positives. This lack of specificity is overcome remarkably with our competitive binding prediction method. In addition, previously unpredictable binding sites can be detected with the help of PPIs. Source codes are available at http://www.cs.tut.fi/~harrila/.
Affiliation(s)
- Kirsti Laurila
- Department of Signal Processing, Tampere University of Technology, P.O. Box 527, FI-33101 Tampere, Finland
59
Reid JE, Evans KJ, Dyer N, Wernisch L, Ott S. Variable structure motifs for transcription factor binding sites. BMC Genomics 2010; 11:30. [PMID: 20074339 PMCID: PMC2824720 DOI: 10.1186/1471-2164-11-30] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 08/11/2009] [Accepted: 01/14/2010] [Indexed: 02/06/2023]
Abstract
Background: Classically, models of DNA-transcription factor binding sites (TFBSs) have been based on relatively few known instances and have treated them as sites of fixed length using position weight matrices (PWMs). Various extensions to this model have been proposed, most of which take account of dependencies between the bases in the binding sites. However, some transcription factors are known to exhibit some flexibility and bind to DNA in more than one possible physical configuration. In some cases this variation is known to affect the function of binding sites. With the increasing volume of ChIP-seq data available it is now possible to investigate models that incorporate this flexibility. Previous work on variable length models has been constrained by: a focus on specific zinc finger proteins in yeast using restrictive models; a reliance on hand-crafted models for just one transcription factor at a time; and a lack of evaluation on realistically sized data sets. Results: We re-analysed binding sites from the TRANSFAC database and found motivating examples where our new variable length model provides a better fit. We analysed several ChIP-seq data sets with a novel motif search algorithm and compared the results to one of the best standard PWM finders and a recently developed alternative method for finding motifs of variable structure. All the methods performed comparably in held-out cross validation tests. Known motifs of variable structure were recovered for p53, Stat5a and Stat5b. In addition our method recovered a novel generalised version of an existing PWM for Sp1 that allows for variable length binding. This motif improved classification performance. Conclusions: We have presented a new gapped PWM model for variable length DNA binding sites that is neither too restrictive nor over-parameterised. Our comparison with existing tools shows that on average it does not have better predictive accuracy than existing methods. However, it does provide more interpretable models of motifs of variable structure that are suitable for follow-up structural studies. To our knowledge, we are the first to apply variable length motif models to eukaryotic ChIP-seq data sets and consequently the first to show their value in this domain. The results include a novel motif for the ubiquitous transcription factor Sp1.
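To make the contrast between a fixed-length PWM and a variable-length site concrete, the sketch below scores a site with two half-site PWMs separated by an optional run of unscored bases and keeps the best-scoring gap length. It is a simplified illustration under stated assumptions (uniform background, hypothetical half-site matrices, hard maximum over gap lengths) and is not the gapped PWM model defined in the paper.

```python
import math

BACKGROUND = 0.25  # assumption: uniform base frequencies


def score_fixed(pwm, site):
    """Log-odds score of a site with exactly len(pwm) bases."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, site))


def best_gapped_score(left, right, seq, start, max_gap=1):
    """Best (score, gap) at `start`, allowing 0..max_gap unscored bases
    between the two half-site PWMs. Returns None if nothing fits."""
    best = None
    left_end = start + len(left)
    for gap in range(max_gap + 1):
        right_start = left_end + gap
        if right_start + len(right) > len(seq):
            break
        total = score_fixed(left, seq[start:left_end]) + score_fixed(
            right, seq[right_start:right_start + len(right)]
        )
        if best is None or total > best[0]:
            best = (total, gap)
    return best


def column(preferred):
    """Hypothetical PWM column: 0.7 for the preferred base, 0.1 otherwise."""
    return {b: (0.7 if b == preferred else 0.1) for b in "ACGT"}


left_half = [column("G"), column("G")]
right_half = [column("C"), column("C")]
result = best_gapped_score(left_half, right_half, "GGACC", start=0)
```

For the sequence `GGACC` the ungapped reading forces the right half-site onto `AC`, while a one-base gap aligns it with `CC`, so the gapped alignment scores higher.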
Affiliation(s)
- John E Reid
- MRC Biostatistics Unit, Institute of Public Health, Forvie Site, Cambridge, CB2 0SR, UK.
60
Huttenhower C, Mutungu KT, Indik N, Yang W, Schroeder M, Forman JJ, Troyanskaya OG, Coller HA. Detailing regulatory networks through large scale data integration. Bioinformatics 2009; 25:3267-74. [PMID: 19825796 DOI: 10.1093/bioinformatics/btp588] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 11/13/2022]
Abstract
MOTIVATION Much of a cell's regulatory response to changing environments occurs at the transcriptional level. Particularly in higher organisms, transcription factors (TFs), microRNAs and epigenetic modifications can combine to form a complex regulatory network. Part of this system can be modeled as a collection of regulatory modules: co-regulated genes, the conditions under which they are co-regulated and sequence-level regulatory motifs. RESULTS We present the Combinatorial Algorithm for Expression and Sequence-based Cluster Extraction (COALESCE) system for regulatory module prediction. The algorithm is efficient enough to discover expression biclusters and putative regulatory motifs in metazoan genomes (>20,000 genes) and very large microarray compendia (>10,000 conditions). Using Bayesian data integration, it can also include diverse supporting data types such as evolutionary conservation or nucleosome placement. We validate its performance using a functional evaluation of co-clustered genes, known yeast and Escherichia coli TF targets, synthetic data and various metazoan data compendia. In all cases, COALESCE performs as well as or better than current biclustering and motif prediction tools, with high accuracy in functional and TF/target assignments and zero false positives on synthetic data. COALESCE provides an efficient and flexible platform within which large, diverse data collections can be integrated to predict metazoan regulatory networks. AVAILABILITY Source code (C++) is available at http://function.princeton.edu/sleipnir, and supporting data and a web interface are provided at http://function.princeton.edu/coalesce. CONTACT ogt@cs.princeton.edu; hcoller@princeton.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Curtis Huttenhower
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540, USA
61
Oh YM, Kim JK, Choi Y, Choi S, Yoo JY. Prediction and experimental validation of novel STAT3 target genes in human cancer cells. PLoS One 2009; 4:e6911. [PMID: 19730699 PMCID: PMC2731854 DOI: 10.1371/journal.pone.0006911] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 04/02/2009] [Accepted: 08/03/2009] [Indexed: 11/23/2022]
Abstract
The comprehensive identification of functional transcription factor binding sites (TFBSs) is an important step in understanding complex transcriptional regulatory networks. This study presents a motif-based comparative approach, STAT-Finder, for identifying functional DNA binding sites of STAT3 transcription factor. STAT-Finder combines STAT-Scanner, which was designed to predict functional STAT TFBSs with improved sensitivity, and a motif-based alignment to minimize false positive prediction rates. Using two reference sets containing promoter sequences of known STAT3 target genes, STAT-Finder identified functional STAT3 TFBSs with enhanced prediction efficiency and sensitivity relative to other conventional TFBS prediction tools. In addition, STAT-Finder identified novel STAT3 target genes among a group of genes that are over-expressed in human cancer cells. The binding of STAT3 to the predicted TFBSs was also experimentally confirmed through chromatin immunoprecipitation. Our proposed method provides a systematic approach to the prediction of functional TFBSs that can be applied to other TFs.
Affiliation(s)
- Young Min Oh
- Department of Life Sciences, Pohang University of Science and Technology, Pohang, Republic of Korea
- Jong Kyoung Kim
- Department of Computer Science, Pohang University of Science and Technology, Pohang, Republic of Korea
- Yongwook Choi
- Department of Life Sciences, Pohang University of Science and Technology, Pohang, Republic of Korea
- Seungjin Choi
- Department of Computer Science, Pohang University of Science and Technology, Pohang, Republic of Korea
- * E-mail: (JY); (SC)
- Joo-Yeon Yoo
- Department of Life Sciences, Pohang University of Science and Technology, Pohang, Republic of Korea
- * E-mail: (JY); (SC)
62
Marco A, Konikoff C, Karr TL, Kumar S. Relationship between gene co-expression and sharing of transcription factor binding sites in Drosophila melanogaster. Bioinformatics 2009; 25:2473-7. [PMID: 19633094 DOI: 10.1093/bioinformatics/btp462] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Indexed: 01/16/2023]
Abstract
MOTIVATION In functional genomics, it is frequently useful to correlate expression levels of genes to identify transcription factor binding sites (TFBS) via the presence of common sequence motifs. The underlying assumption is that co-expressed genes are more likely to contain shared TFBS and, thus, TFBS can be identified computationally. Indeed, gene pairs with a very high expression correlation show a significant excess of shared binding sites in yeast. We have tested this assumption in a more complex organism, Drosophila melanogaster, by using experimentally determined TFBS and microarray expression data. We have also examined the reverse relationship between the expression correlation and the extent of TFBS sharing. RESULTS Pairs of genes with shared TFBS show, on average, a higher degree of co-expression than those with no common TFBS in Drosophila. However, the reverse does not hold true: gene pairs with high expression correlations do not share significantly larger numbers of TFBS. An exception to this observation arises when comparing the expression of genes from the earliest stages of embryonic development. Interestingly, semantic similarity between gene annotations (Biological Process) is much better associated with TFBS sharing than the expression correlation is. We discuss these results in light of reverse engineering approaches to computationally predict regulatory sequences by using comparative genomics.
Affiliation(s)
- Antonio Marco
- Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287-5301, USA.
63
Abstract
Motivation: The motif discovery problem consists of finding over-represented patterns in a collection of biosequences. It is one of the classical sequence analysis problems, but still has not been satisfactorily solved in an exact and efficient manner. This is partly due to the large number of possibilities of defining the motif search space and the notion of over-representation. Even for well-defined formalizations, the problem is frequently solved in an ad hoc manner with heuristics that do not guarantee to find the best motif. Results: We show how to solve the motif discovery problem (almost) exactly on a practically relevant space of IUPAC generalized string patterns, using the p-value with respect to an i.i.d. model or a Markov model as the measure of over-representation. In particular, (i) we use a highly accurate compound Poisson approximation for the null distribution of the number of motif occurrences. We show how to compute the exact clump size distribution using a recently introduced device called probabilistic arithmetic automaton (PAA). (ii) We define two p-value scores for over-representation, the first one based on the total number of motif occurrences, the second one based on the number of sequences in a collection with at least one occurrence. (iii) We describe an algorithm to discover the optimal pattern with respect to either of the scores. The method exploits monotonicity properties of the compound Poisson approximation and is by orders of magnitude faster than exhaustive enumeration of IUPAC strings (11.8 h compared with an extrapolated runtime of 4.8 years). (iv) We justify the use of the proposed scores for motif discovery by showing our method to outperform other motif discovery algorithms (e.g. MEME, Weeder) on benchmark datasets. We also propose new motifs on Mycobacterium tuberculosis. Availability and Implementation: The method has been implemented in Java. 
It can be obtained from http://ls11-www.cs.tu-dortmund.de/people/marschal/paa_md/ Contact: tobias.marschall@tu-dortmund.de; sven.rahmann@tu-dortmund.de
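The two quantities that the p-value scores in this abstract are built on, the total number of motif occurrences and the number of sequences with at least one occurrence, can be counted for an IUPAC pattern as sketched below. Only the counting step is shown; the compound Poisson approximation and the probabilistic arithmetic automaton are not reproduced. The function names are hypothetical, and only one strand is scanned.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_regex(pattern):
    """Compile an IUPAC pattern; the lookahead lets overlapping hits all match."""
    return re.compile("(?=" + "".join(IUPAC[c] for c in pattern) + ")")


def occurrence_counts(pattern, sequences):
    """Return (total occurrences, number of sequences with >= 1 occurrence)."""
    rx = iupac_regex(pattern)
    per_sequence = [sum(1 for _ in rx.finditer(seq)) for seq in sequences]
    return sum(per_sequence), sum(1 for n in per_sequence if n > 0)


# Hypothetical input: TGAY matches TGAC and TGAT.
total, n_sequences = occurrence_counts("TGAY", ["TGACTGAT", "AAAA", "TGAC"])
```

Overlapping hits are counted deliberately: self-overlapping patterns occur in clumps, which is the reason a compound Poisson rather than a plain Poisson approximation is appropriate for the occurrence count.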
Affiliation(s)
- Tobias Marschall
- Computer Science Department, Bioinformatics for High-Throughput Technologies at the Chair of Algorithm Engineering, TU Dortmund, Dortmund, Germany.
64
Hou L, Qian MP, Zhu YP, Deng MH. Advances on bioinformatic research in transcription factor binding sites. Yi Chuan = Hereditas 2009; 31:365-73. [DOI: 10.3724/sp.j.1005.2009.00365] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
65
Narlikar L, Ovcharenko I. Identifying regulatory elements in eukaryotic genomes. Briefings in Functional Genomics and Proteomics 2009; 8:215-30. [PMID: 19498043 DOI: 10.1093/bfgp/elp014] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.9] [Indexed: 12/30/2022]
Abstract
Proper development and functioning of an organism depends on precise spatial and temporal expression of all its genes. These coordinated expression-patterns are maintained primarily through the process of transcriptional regulation. Transcriptional regulation is mediated by proteins binding to regulatory elements on the DNA in a combinatorial manner, where particular combinations of transcription factor binding sites establish specific regulatory codes. In this review, we survey experimental and computational approaches geared towards the identification of proximal and distal gene regulatory elements in the genomes of complex eukaryotes. Available approaches that decipher the genetic structure and function of regulatory elements by exploiting various sources of information like gene expression data, chromatin structure, DNA-binding specificities of transcription factors, cooperativity of transcription factors, etc. are highlighted. We also discuss the relevance of regulatory elements in the context of human health through examples of mutations in some of these regions having serious implications in misregulation of genes and being strongly associated with human disorders.
Affiliation(s)
- Leelavati Narlikar
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
66
Luca F, Kashyap S, Southard C, Zou M, Witonsky D, Di Rienzo A, Conzen SD. Adaptive variation regulates the expression of the human SGK1 gene in response to stress. PLoS Genet 2009; 5:e1000489. [PMID: 19461886 PMCID: PMC2679193 DOI: 10.1371/journal.pgen.1000489] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Received: 02/25/2009] [Accepted: 04/22/2009] [Indexed: 12/22/2022]
Abstract
The Serum and Glucocorticoid-regulated Kinase1 (SGK1) gene is a target of the glucocorticoid receptor (GR) and is central to the stress response in many human tissues. Because environmental stress varies across habitats, we hypothesized that natural selection shaped the geographic distribution of genetic variants regulating the level of SGK1 expression following GR activation. By combining population genetics and molecular biology methods, we identified a variant (rs9493857) with marked allele frequency differences between populations of African and European ancestry and with a strong correlation between allele frequency and latitude in worldwide population samples. This SNP is located in a GR-binding region upstream of SGK1 that was identified using a GR ChIP-chip. SNP rs9493857 also lies within a predicted binding site for Oct1, a transcription factor known to cooperate with the GR in the transactivation of target genes. Using ChIP assays, we show that both GR and Oct1 bind to this region and that the ancestral allele at rs9493857 binds the GR-Oct1 complex more efficiently than the derived allele. Finally, using a reporter gene assay, we demonstrate that the ancestral allele is associated with increased glucocorticoid-dependent gene expression when compared to the derived allele. Our results suggest a novel paradigm in which hormonal responsiveness is modulated by sequence variation in the regulatory regions of nuclear receptor target genes. Identifying such functional variants may shed light on the mechanisms underlying inter-individual variation in response to environmental stressors and to hormonal therapy, as well as in the susceptibility to hormone-dependent diseases. Susceptibility to many common human diseases including hypertension, heart disease, and the metabolic syndrome is associated with increased neuroendocrine signaling in response to environmental stressors. 
A key component of the human stress response involves increased systemic glucocorticoid secretion that in turn leads to glucocorticoid receptor (GR) activation. As a result, a variety of GR-expressing cell types undergo gene expression changes, thereby providing an integrated physiological response to stress. The SGK1 gene is a well-established GR target that promotes cellular homeostasis in response to stress. Here, we use a combination of population genetics and molecular biology approaches to identify an SNP (rs9493857) in a distant SGK1 GR-binding region with unusually large differences in allele frequency between populations of European and African ancestry. Furthermore, rs9493857 shows a strong correlation between allele frequency and distance from the equator, a pattern consistent with a varying selective advantage across environments. Indeed, the ancestral allele at rs9493857 results in increased GR-binding and glucocorticoid-regulated gene expression, suggesting that an increased stress response (i.e., glucocorticoid responsiveness) was advantageous in ancestral human populations. We speculate that, in modern times, such variation could favor the negative effects of a heightened glucocorticoid response, potentially predisposing individuals to chronic diseases such as metabolic syndrome and hypertension.
Affiliation(s)
- Francesca Luca
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Sonal Kashyap
- Department of Medicine, The University of Chicago, Chicago, Illinois, United States of America
- Catherine Southard
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Min Zou
- Department of Medicine, The University of Chicago, Chicago, Illinois, United States of America
- David Witonsky
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Anna Di Rienzo
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- * E-mail: (ADR); (SDC)
- Suzanne D. Conzen
- Department of Medicine, The University of Chicago, Chicago, Illinois, United States of America
- * E-mail: (ADR); (SDC)
67
Courchesne NMD, Parisien A, Wang B, Lan CQ. Enhancement of lipid production using biochemical, genetic and transcription factor engineering approaches. J Biotechnol 2009; 141:31-41. [PMID: 19428728 DOI: 10.1016/j.jbiotec.2009.02.018] [Citation(s) in RCA: 257] [Impact Index Per Article: 17.1] [Received: 08/19/2008] [Revised: 02/15/2009] [Accepted: 02/20/2009] [Indexed: 01/03/2023]
Abstract
This paper compares three possible strategies for enhanced lipid overproduction in microalgae: the biochemical engineering (BE) approaches, the genetic engineering (GE) approaches, and the transcription factor engineering (TFE) approaches. The BE strategy relies on creating a physiological stress such as nutrient starvation or high salinity to channel metabolic fluxes to lipid accumulation. The GE strategy exploits our understanding of the lipid metabolic pathway, especially the rate-limiting enzymes, to channel metabolites to lipid biosynthesis by overexpressing one or more key enzymes in recombinant microalgal strains. The TFE strategy is an emerging technology that aims to enhance the production of a particular metabolite by overexpressing TFs that regulate the metabolic pathways involved in the accumulation of target metabolites. Currently, BE approaches are the most established in microalgal lipid production. TFE is a very promising strategy because it may avoid the inhibitive effects of the BE approaches and the limitation of "secondary bottlenecks" commonly observed in the GE approaches. However, it is still a novel concept that remains to be investigated systematically.
|
68
|
Affiliation(s)
- Debopriya Das
- Life Sciences Division, Ernest O Lawrence Berkeley National Laboratory, Berkeley, California, United States of America.
|
69
|
Hao P, Yu Y, Zhang X, Tu K, Fan H, Zhong Y. The contribution of cis-regulatory elements to head-to-head gene pairs' co-expression pattern. SCIENCE IN CHINA. SERIES C, LIFE SCIENCES 2009; 52:74-9. [PMID: 19152086 DOI: 10.1007/s11427-009-0004-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/16/2008] [Accepted: 08/06/2008] [Indexed: 10/21/2022]
Abstract
Transcription regulation is one of the most critical processes in biology, in which cis-elements act as regulators of gene expression. We attempt to deduce the principles underlying the co-expression of "head-to-head" gene pairs by analyzing the activities of their shared cis-elements. A network component analysis was performed to estimate the impact of cis-elements on gene promoters and their activities under different conditions. Our findings reveal how biological systems use these regulatory elements to control the expression pattern of "head-to-head" gene pairs and the transcription regulation system as a whole.
Affiliation(s)
- Pei Hao
- School of Life Sciences, Fudan University, Shanghai, 200433, China
|
70
|
Cai Y, He J, Li X, Lu L, Yang X, Feng K, Lu W, Kong X. A Novel Computational Approach To Predict Transcription Factor DNA Binding Preference. J Proteome Res 2008; 8:999-1003. [DOI: 10.1021/pr800717y] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Yudong Cai, JianFeng He, XinLei Li, Lin Lu, XinYi Yang, KaiYan Feng, WenCong Lu, XiangYin Kong
- CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China, Department of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200040, People’s Republic of China, Institute of Health Sciences, Shanghai Jiao Tong University School of Medicine and Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200025, China, Division of Imaging Science & Biomedical
|
71
|
Persikov AV, Osada R, Singh M. Predicting DNA recognition by Cys2His2 zinc finger proteins. Bioinformatics 2008; 25:22-9. [PMID: 19008249 DOI: 10.1093/bioinformatics/btn580] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Cys2His2 zinc finger (ZF) proteins represent the largest class of eukaryotic transcription factors. Their modular structure and well-conserved protein-DNA interface allow the development of computational approaches for predicting their DNA-binding preferences even when no binding sites are known for a particular protein. The 'canonical model' for ZF protein-DNA interaction consists of only four amino acid-nucleotide contacts per zinc finger domain. RESULTS: We present an approach for predicting ZF binding based on support vector machines (SVMs). While most previous computational approaches have been based solely on examples of known ZF protein-DNA interactions, ours additionally incorporates information about protein-DNA pairs known to bind weakly or not at all. Moreover, SVMs with a linear kernel can naturally incorporate constraints about the relative binding affinities of protein-DNA pairs; this type of information has not been used previously in predicting ZF protein-DNA binding. Here, we build a high-quality literature-derived experimental database of ZF-DNA binding examples and utilize it to test both linear and polynomial kernels for predicting ZF protein-DNA binding on the basis of the canonical binding model. The polynomial SVM outperforms previously published prediction procedures as well as the linear SVM. This may indicate the presence of dependencies between contacts in the canonical binding model and suggests that modification of the underlying structural model may result in further improved performance in predicting ZF protein-DNA binding. Overall, this work demonstrates that methods incorporating information about non-binding and relative binding of protein-DNA pairs have great potential for effective prediction of protein-DNA interactions. AVAILABILITY: An online tool for predicting ZF DNA binding is available at http://compbio.cs.princeton.edu/zf/.
Affiliation(s)
- Anton V Persikov
- Lewis-Sigler Institute for Integrative Genomics and Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
|